A blazing fast geo database with LevelDB, Go and Geohashes


You have probably heard of LevelDB: it's a blazing fast key value store (as a library, not a daemon) that uses Snappy compression.
There are plenty of uses for it, and the API is very simple, at least in Go (I will be using Goleveldb).

The key is a []byte, the value is a []byte, so you can “get”, “put” & “delete”. That’s it.
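
For reference, here is a minimal sketch of that API surface with goleveldb (the "geo.db" path and the key/value are placeholders, and error handling is trimmed):

	// import "github.com/syndtr/goleveldb/leveldb"
	db, err := leveldb.OpenFile("geo.db", nil) // "geo.db" is a placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	db.Put([]byte("Ksomekey"), []byte("some value"), nil) // put
	v, _ := db.Get([]byte("Ksomekey"), nil)               // get
	log.Println(string(v))                                // "some value"
	db.Delete([]byte("Ksomekey"), nil)                    // delete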

I needed a low-memory, low-CPU system that could collect millions of geo data points and query over them. Geohash has an interesting property: you can encode longitude and latitude into a string, e.g. f2m616nn. This hash represents the lat & long 46.770, -71.304; if you shorten the string to f2m61 it still refers to the same position, just with less precision.
A 4-character hash gives roughly 20 km (19545 m) of precision; to perform a lookup around a position you simply query the 8 adjacent blocks. I’m using a Geohash library for Go for the encoding.
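
A quick sketch of that truncation, using the same geohash package as the code below (values taken from the example above):

	gh := geohash.EncodeWithPrecision(46.770, -71.304, 8) // "f2m616nn" per the example above
	log.Println(gh[:5])                                   // "f2m61": same area, coarser cell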

Here you would store all of your data points matching a geohash in the same set.
Problem: there is no such thing as a set in LevelDB.

But there is a cursor, so you can seek to a position and then iterate over the next or previous entries (keys are byte ordered).
So your data could be stored this way: a 4-character geohash + a unique id.
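
A minimal sketch of that layout with goleveldb's prefix iterator (the "f2m6" prefix is a placeholder):

	// import "github.com/syndtr/goleveldb/leveldb/util"
	// seek to the first key in the "f2m6" block and walk forward in byte order
	iter := db.NewIterator(util.BytesPrefix([]byte("f2m6")), nil)
	for iter.Next() {
		// every key here starts with the geohash "f2m6", followed by a unique id
		log.Printf("%s -> %s", iter.Key(), iter.Value())
	}
	iter.Release()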

Then you can perform a proximity lookup by searching the 8 hashes adjacent to the position you are looking at, with a precision of ~20 km. Good, but not very flexible.
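
Computing those 8 blocks is a one-liner with the geohash package; a small sketch reusing the example coordinates:

	gk := geohash.EncodeWithPrecision(46.770, -71.304, 4) // 4 chars: ~20 km cells
	adjs := geohash.CalculateAllAdjacent(gk)              // the 8 surrounding blocks
	adjs = append(adjs, gk)                               // plus the center block itself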

We can have a more generic solution. First we need a key: a simple int64 unique id.

// Key is our LevelDB key type (assumed here to be a raw []byte)
type Key []byte

// NewKey generates a new key using the current time, prefixed by 'K'
func NewKey() Key {
	return NewKeyWithInt(time.Now().UnixNano())
}

// NewKeyWithInt returns a key prefixed by 'K' encoding the int64 value id
func NewKeyWithInt(id int64) Key {
	key := bytes.NewBufferString("K")
	// BigEndian keeps byte order == numeric order, so keys sort by time
	binary.Write(key, binary.BigEndian, id)
	return key.Bytes()
}

Here we encode a Unix timestamp into the key, so our key is not just a key, it’s also an encoded time value, and it will be unique thanks to the nanosecond precision. We use BigEndian so keys can be byte-compared: older keys sort before newer ones.
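
Because the key embeds its creation time, you can also go the other way. Here is a hypothetical KeyTime helper (not in the original code) sketching that reverse step:

// KeyTime extracts the encoded time from a key built by NewKeyWithInt.
// Hypothetical helper, shown for illustration only.
func KeyTime(k Key) time.Time {
	var id int64
	// skip the 'K' prefix and read the big-endian int64 back
	binary.Read(bytes.NewReader(k[1:]), binary.BigEndian, &id)
	return time.Unix(0, id)
}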

Now about the geo encoding, our key will be of the form:
G201508282105dhv766K… (the end of the key is the binary encoded id). You always need a prefix for your keys so you can seek and browse them without running into different key types. Here I have a G as in Geo, then a string encoded date prefix so we can search by date. We don’t want extra precision in that date part, as it would mean extra seeks in LevelDB (that’s why minutes are rounded to a modulo of 10). Then we add a precise geohash and finally our previous unique id.

// GeoKey is a geo-indexed LevelDB key (a raw []byte, like Key)
type GeoKey []byte

// NewGeoKey generates a new geo key for a position
func NewGeoKey(latitude, longitude float64) GeoKey {
	t := time.Now().UTC()
	kint := t.UnixNano()
	kid := NewKeyWithInt(kint)
	// G + string date + geohash 6 + timestamped key
	// G201508282105dhv766K....
	gk := geohash.EncodeWithPrecision(latitude, longitude, 6)
	ts := t.Format("2006010215")

	// round minutes down to a 10 minute interval
	m := t.Minute() - t.Minute()%10
	zkey := []byte("G" + ts + fmt.Sprintf("%02d", m) + gk)
	zkey = append(zkey, kid...)
	return zkey
}

We can now look up by flexible date & flexible proximity, like a Redis ZRANGE; you simply need to reverse the process.

// GeoKeyPrefix returns the key prefixes covering a time range
func GeoKeyPrefix(start, stop time.Time) []string {
	var res []string
	d := 10 * time.Minute
	for t := start; !t.After(stop); t = t.Add(d) {
		// same 10 minute bucket as in NewGeoKey
		key := "G" + t.Format("2006010215") + fmt.Sprintf("%02d", t.Minute()-t.Minute()%10)
		res = append(res, key)
	}
	return res
}

Then look up like this:

	d := -10 * time.Minute
	geoPrefixes := GeoKeyPrefix(time.Now().UTC().Add(d), time.Now().UTC())

	// geohash length -> approximate precision in meters
	// 1: 5003530
	// 2: 625441
	// 3: 123264
	// 4: 19545
	// 5: 3803
	// 6: 610
	gk := geohash.EncodeWithPrecision(lat, long, 4)
	adjs := geohash.CalculateAllAdjacent(gk)
	adjs = append(adjs, gk)

	// for each adjacent block
	for _, gkl := range adjs {

		// for each 10 minute bucket
		for _, geoPrefix := range geoPrefixes {
			startGeoKey := []byte(geoPrefix + gkl)
			// s is our LevelDB handle
			iter := s.NewIterator(util.BytesPrefix(startGeoKey), nil)

			for iter.Next() {
				log.Println(iter.Value())
			}
			iter.Release()
		}
	}

It could be optimized further by reducing the size of the keys, but it already performs extremely well: it stores around 3 million geopoints per day, uses less than 3% CPU and can serve hundreds of queries per second.

Oh did I forget to mention it’s running on a Raspberry Pi? :)

I could maybe turn it into a library, but it’s so simple it’s probably not worth it.
Next blog post: what are those millions points used for?