A blazing fast geo database with LevelDB, Go and Geohashes
You have probably heard of LevelDB: it's a blazing fast key-value store (a library, not a daemon) that uses Snappy compression.
There are plenty of uses for it, and the API is very simple, at least in Go (I will be using Goleveldb).
The key is a []byte, the value is a []byte, and you can "get", "put" & "delete". That's it.
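To give an idea of how minimal the API is, here is a small sketch of basic usage with Goleveldb (the path and the error handling are simplified for brevity):
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	// open (or create) the database directory
	db, err := leveldb.OpenFile("/tmp/geodb", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// put, get & delete: keys and values are plain []byte
	db.Put([]byte("hello"), []byte("world"), nil)
	v, _ := db.Get([]byte("hello"), nil)
	log.Println(string(v)) // world
	db.Delete([]byte("hello"), nil)
}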
I needed a low-memory, low-CPU system that could collect millions of geo data points and query over them. Geohash has an interesting property: you can encode latitude and longitude into a string, for example f2m616nn.
This hash represents the lat & long 46.770, -71.304. If you shorten the string to f2m61, it still refers to the same position, just with less precision.
A 4-character hash gives roughly 19545 meters of precision; to perform a lookup around a position you simply query for the 8 adjacent blocks. There is a Geohash library for Go.
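For instance, with a Go geohash library exposing the EncodeWithPrecision and CalculateAllAdjacent functions used later in this post (the import path below is an assumption, use whichever library you prefer):
package main

import (
	"log"

	geohash "github.com/TomiHiltunen/geohash-golang" // assumed import path
)

func main() {
	// 8 characters: the full hash from the example above
	log.Println(geohash.EncodeWithPrecision(46.770, -71.304, 8)) // f2m616nn
	// 4 characters: ~19545 m of precision
	block := geohash.EncodeWithPrecision(46.770, -71.304, 4)
	// the 8 blocks surrounding the position, used for proximity lookups
	log.Println(geohash.CalculateAllAdjacent(block))
}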
With this scheme you would store all of the data points matching a geohash in the same set.
Problem: there is no such thing as a set in LevelDB.
But there is a cursor, so you can seek to a position and then iterate over the next or previous keys (byte ordered).
So your data could be stored this way: a 4-character geohash + a unique id.
Then you can perform a proximity lookup by searching for the 8 hashes adjacent to the position you are looking at, with ~20 km precision. Good, but not very flexible.
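A rough sketch of that naive layout, assuming db is an open Goleveldb handle and uniqID/payload are whatever id and data you attach to a point (util.BytesPrefix comes from goleveldb's util package):
// store a point under "4-char geohash + unique id"
block := geohash.EncodeWithPrecision(lat, lng, 4)
db.Put(append([]byte(block), uniqID...), payload, nil)

// proximity lookup: scan the block and its 8 neighbors (~20 km)
for _, b := range append(geohash.CalculateAllAdjacent(block), block) {
	iter := db.NewIterator(util.BytesPrefix([]byte(b)), nil)
	for iter.Next() {
		log.Println(iter.Value()) // one data point in this block
	}
	iter.Release()
}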
We can build a more generic solution. First we need a key: a simple int64 unique id.
// NewKey generates a new key using the current time, prefixed by 'K'
func NewKey() Key {
	return NewKeyWithInt(time.Now().UnixNano())
}

// NewKeyWithInt returns a key prefixed by 'K' with value id
func NewKeyWithInt(id int64) Key {
	key := bytes.NewBufferString("K")
	binary.Write(key, binary.BigEndian, id)
	return key.Bytes()
}
Here we encode the key with a Unix timestamp, so our key is not just a key, it is also an encoded time value; it will be unique thanks to the nanosecond precision. We use BigEndian so keys can be byte-compared: older keys sort before newer ones.
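A quick sketch to illustrate that ordering property (bytes.Compare returns -1 when the first key sorts before the second):
older := NewKeyWithInt(time.Date(2015, 8, 1, 0, 0, 0, 0, time.UTC).UnixNano())
newer := NewKeyWithInt(time.Now().UnixNano())
// BigEndian keeps chronological order in byte order
fmt.Println(bytes.Compare(older, newer)) // -1: the older key sorts first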
Now, about geo encoding: our key will be of the form G201508282105dhv766K��Ϸ�Y� (note that the end of the key is binary encoded).
You always need a prefix for your keys so you can seek and browse them without running over different key types; here I have a G as in Geo, then a string-encoded date prefix so we can search by date. We don't want extra precision there, as it would add extra seeks to LevelDB (that's why minutes are rounded down to 10-minute intervals), then we add a precise geohash and finally our previous unique id.
// NewGeoKey generates a new key from a position
func NewGeoKey(latitude, longitude float64) GeoKey {
	t := time.Now().UTC()
	kint := t.UnixNano()
	kid := NewKeyWithInt(kint)
	// G + string date + geohash 6 + timestamped key
	// G201508282105dhv766K....
	gk := geohash.EncodeWithPrecision(latitude, longitude, 6)
	ts := t.Format("2006010215")
	// round minutes down to the 10 minute interval
	m := t.Minute() - t.Minute()%10
	zkey := []byte("G" + ts + fmt.Sprintf("%02d", m) + gk)
	zkey = append(zkey, kid...)
	return zkey
}
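Writing a point is then a single Put with the generated key; a minimal sketch, assuming db is an open Goleveldb handle and payload is whatever serialized data you attach to the position:
// store the payload under its geo + time key
key := NewGeoKey(46.770, -71.304)
if err := db.Put(key, payload, nil); err != nil {
	log.Println(err)
}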
We can now look up by flexible date & flexible proximity, a bit like a Redis ZRANGE; you simply need to reverse the process.
// GeoKeyPrefix returns the time-based key prefixes covering a time range
func GeoKeyPrefix(start, stop time.Time) []string {
	var res []string
	d := 10 * time.Minute
	t := start
	for {
		if t.After(stop) {
			break
		}
		key := "G" + t.Format("2006010215") + fmt.Sprintf("%02d", t.Minute()-t.Minute()%10)
		res = append(res, key)
		t = t.Add(d)
	}
	return res
}
Then look up this way:
d := time.Duration(-10) * time.Minute
geoPrefixes := GeoKeyPrefix(time.Now().UTC().Add(d), time.Now().UTC())

// geohash length => precision in meters
// 1: 5003530
// 2: 625441
// 3: 123264
// 4: 19545
// 5: 3803
// 6: 610
gk := geohash.EncodeWithPrecision(lat, long, 4)
adjs := geohash.CalculateAllAdjacent(gk)
adjs = append(adjs, gk)

// for each adjacent block
for _, gkl := range adjs {
	// for each 10 minute time range
	for _, geoPrefix := range geoPrefixes {
		startGeoKey := []byte(geoPrefix + gkl)
		iter := s.NewIterator(util.BytesPrefix(startGeoKey), nil)
		for iter.Next() {
			log.Println(iter.Value())
		}
		iter.Release()
	}
}
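Since the key itself carries the date and the geohash, you can also recover them from iter.Key() inside that loop, without even reading the value; a sketch based on the key layout described above (G + 12 date characters + 6 geohash characters + the 'K'-prefixed id):
k := iter.Key()
// G + YYYYMMDDHHMM (12 chars) + 6-char geohash + 'K' + 8-byte BigEndian id
datePart := string(k[1:13])  // e.g. 201508282100 (minutes rounded down to 10)
hashPart := string(k[13:19]) // e.g. dhv766
var ns int64
binary.Read(bytes.NewReader(k[20:]), binary.BigEndian, &ns)
created := time.Unix(0, ns) // exact creation time recovered from the id
log.Println(datePart, hashPart, created)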
It could be optimized by reducing the size of the keys, but it already performs extremely well, storing around 3 million geo points per day, using less than 3% CPU, and serving hundreds of queries per second.
Oh did I forget to mention it’s running on a Raspberry Pi? :)
I could maybe turn it into a library, but it's so simple it's probably not worth it.
Next blog post: what are those millions points used for?