A geo database for polygons, foundations
In a previous post, I described how to use the S2 geo library to create a fast geo database, but it only stored locations (points) and only performed range queries; a complete geo database also needs region/polygon queries.
Looking for a solution
I had this need: querying for the countries or subregions of hundreds of coordinates per second, without relying on an external service.
One solution, using my previous technique, could have been to store every city in the world and then perform a proximity query around my point to get the closest cities, but that only works in populated areas and is only an approximation.
I looked into other solutions; there are some smart ideas using UTF-grid, but that is a client-side solution and also an approximation tied to the resolution of the computed grid.
S2 to the rescue
S2 cells have some nice properties: they are segments on the Hilbert curve, expressed as ranges of uint64, so I had the intuition that fast region lookup could be reduced to finding all the mathematical segments containing my location, itself expressed as a uint64.
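To make that intuition concrete, here is a minimal, stdlib-only sketch of how an S2-style 64-bit cell id maps to a contiguous range of leaf ids. It assumes the S2 bit layout (3 face bits, then 2 bits per level, then a trailing 1 bit marking the level); `newCellID` is a hypothetical helper for illustration, not the library's constructor. The Go S2 package exposes comparable `RangeMin`, `RangeMax` and `Contains` methods on `s2.CellID`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// CellID mimics the S2 64-bit cell id layout: 3 face bits, then 2 bits per
// level along the Hilbert curve, then a trailing 1 bit whose position
// encodes the level (bit 60 for level 0, bit 0 for level 30).
type CellID uint64

// newCellID is a hypothetical helper: it builds the id of the cell reached
// from a face (0..5) by following child positions (each 0..3), one per
// level, for a path of 0..30 levels.
func newCellID(face uint64, path []uint64) CellID {
	id := face << 61
	for i, child := range path {
		id |= child << (59 - 2*uint(i))
	}
	id |= uint64(1) << (60 - 2*uint(len(path)))
	return CellID(id)
}

// lsb isolates the trailing 1 bit.
func (c CellID) lsb() uint64 { return uint64(c) & -uint64(c) }

// Level derives the level from the position of the trailing 1 bit.
func (c CellID) Level() int { return 30 - bits.TrailingZeros64(uint64(c))/2 }

// RangeMin and RangeMax bound the level-30 (leaf) ids inside the cell.
func (c CellID) RangeMin() CellID { return CellID(uint64(c) - (c.lsb() - 1)) }
func (c CellID) RangeMax() CellID { return CellID(uint64(c) + (c.lsb() - 1)) }

// Contains is nothing more than an integer range check.
func (c CellID) Contains(o CellID) bool { return o >= c.RangeMin() && o <= c.RangeMax() }

func main() {
	parent := newCellID(2, []uint64{1, 3}) // a level-2 cell on face 2
	// Level-30 descendants: 28 more levels of child position 0.
	leaf := newCellID(2, append([]uint64{1, 3}, make([]uint64, 28)...))
	outside := newCellID(2, append([]uint64{1, 2}, make([]uint64, 28)...)) // leaf of the sibling cell
	fmt.Println(parent.Level(), leaf.Level())                    // 2 30
	fmt.Println(parent.Contains(leaf), parent.Contains(outside)) // true false
}
```

Because containment is a plain comparison on two integers, any structure that indexes integer intervals can answer "which cells contain this point?".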
Using a Segment Tree data structure, I first tried an in-memory engine with Natural Earth Data: load all the world's country shapes into S2 loops (a Loop represents a simple spherical polygon), transform them into cells using the region coverer (which returns cells of different levels), and add those cells to the segment tree.
To query, simply transform the location into an S2 Cell (level 30) and perform a stabbing query against the segments: every segment crossed is a cell that covers part of a Loop.
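The index and the stabbing query can be sketched as a static segment tree over inclusive uint64 intervals, each tagged with the id of the loop whose covering produced it. This is a minimal sketch under stated assumptions: intervals are cell ranges such as [RangeMin, RangeMax], the tree is built once (no deletions), and no interval ends at MaxUint64 (true for S2 leaf ids). `Interval`, `SegmentTree` and `Stab` are hypothetical names, not regionagogo's code.

```go
package main

import (
	"fmt"
	"sort"
)

// Interval is an inclusive range tagged with the loop whose covering produced it.
type Interval struct {
	Lo, Hi uint64
	LoopID int
}

// SegmentTree answers stabbing queries: which intervals contain a point?
type SegmentTree struct {
	xs    []uint64 // sorted unique boundaries; slot j covers [xs[j], xs[j+1])
	nodes [][]int  // loop ids stored on each tree node
	n     int      // number of elementary slots
}

func NewSegmentTree(ivs []Interval) *SegmentTree {
	seen := map[uint64]bool{}
	var xs []uint64
	for _, iv := range ivs {
		// Hi+1 turns the inclusive end into a half-open boundary (assumes Hi < MaxUint64).
		for _, x := range []uint64{iv.Lo, iv.Hi + 1} {
			if !seen[x] {
				seen[x] = true
				xs = append(xs, x)
			}
		}
	}
	sort.Slice(xs, func(i, j int) bool { return xs[i] < xs[j] })
	t := &SegmentTree{xs: xs, n: len(xs) - 1}
	if t.n < 1 {
		return t
	}
	t.nodes = make([][]int, 4*t.n)
	for _, iv := range ivs {
		l := sort.Search(len(xs), func(i int) bool { return xs[i] >= iv.Lo })
		r := sort.Search(len(xs), func(i int) bool { return xs[i] >= iv.Hi+1 }) - 1
		t.insert(1, 0, t.n-1, l, r, iv.LoopID)
	}
	return t
}

// insert stores id on the O(log n) canonical nodes covering slots [l, r].
func (t *SegmentTree) insert(node, lo, hi, l, r, id int) {
	if r < lo || hi < l {
		return
	}
	if l <= lo && hi <= r {
		t.nodes[node] = append(t.nodes[node], id)
		return
	}
	mid := (lo + hi) / 2
	t.insert(2*node, lo, mid, l, r, id)
	t.insert(2*node+1, mid+1, hi, l, r, id)
}

// Stab collects the loop ids stored along the root-to-leaf path of p's slot.
func (t *SegmentTree) Stab(p uint64) []int {
	if t.n < 1 || p < t.xs[0] || p >= t.xs[t.n] {
		return nil
	}
	j := sort.Search(len(t.xs), func(i int) bool { return t.xs[i] > p }) - 1
	var out []int
	node, lo, hi := 1, 0, t.n-1
	for {
		out = append(out, t.nodes[node]...)
		if lo == hi {
			return out
		}
		mid := (lo + hi) / 2
		if j <= mid {
			node, hi = 2*node, mid
		} else {
			node, lo = 2*node+1, mid+1
		}
	}
}

func main() {
	tree := NewSegmentTree([]Interval{
		{Lo: 10, Hi: 19, LoopID: 1},
		{Lo: 15, Hi: 30, LoopID: 2},
		{Lo: 40, Hi: 50, LoopID: 3},
	})
	fmt.Println(tree.Stab(16)) // [1 2]
	fmt.Println(tree.Stab(35)) // []
}
```

Each interval lands on O(log n) nodes and a query walks a single path, so a lookup costs O(log n + k) for k hits; in the geo setting, p would be the level-30 cell id of the queried location.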
This reduces the problem to testing a few Loops instead of thousands of them. Finally, perform ContainsPoint against the found loops, because the point could be inside the Cell but not inside the Loop itself.
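The final filtering step can be sketched as a second pass over the few candidate loops returned by the index. S2's `Loop.ContainsPoint` works on the sphere; to stay dependency-free, this sketch swaps in a plain planar even-odd ray-casting test on (lng, lat) degrees, which ignores spherical edges and antimeridian crossings, so read it as an illustration of the two-phase filter rather than a replacement. `Polygon`, `containsPoint` and `refine` are hypothetical names.

```go
package main

import "fmt"

// Point is a (lng, lat) pair in degrees, treated as planar coordinates.
type Point struct{ Lng, Lat float64 }

// Polygon is a closed ring: the last vertex connects back to the first.
type Polygon struct {
	Name     string
	Vertices []Point
}

// containsPoint is a planar even-odd ray-casting test, used here only as a
// stand-in for S2's spherical Loop.ContainsPoint.
func containsPoint(poly Polygon, p Point) bool {
	inside := false
	v := poly.Vertices
	for i, j := 0, len(v)-1; i < len(v); j, i = i, i+1 {
		a, b := v[i], v[j]
		if (a.Lat > p.Lat) != (b.Lat > p.Lat) {
			// Longitude where edge a-b crosses the horizontal line through p.
			x := a.Lng + (p.Lat-a.Lat)*(b.Lng-a.Lng)/(b.Lat-a.Lat)
			if p.Lng < x {
				inside = !inside
			}
		}
	}
	return inside
}

// refine keeps only the candidates that really contain p: a point can sit
// in a covering cell and still be outside the polygon itself.
func refine(candidates []Polygon, p Point) []string {
	var names []string
	for _, poly := range candidates {
		if containsPoint(poly, p) {
			names = append(names, poly.Name)
		}
	}
	return names
}

func main() {
	square := Polygon{Name: "square", Vertices: []Point{{0, 0}, {10, 0}, {10, 10}, {0, 10}}}
	triangle := Polygon{Name: "triangle", Vertices: []Point{{0, 0}, {10, 0}, {0, 10}}}
	candidates := []Polygon{square, triangle} // e.g. what the stabbing query returned
	fmt.Println(refine(candidates, Point{Lng: 2, Lat: 2})) // [square triangle]
	fmt.Println(refine(candidates, Point{Lng: 8, Lat: 8})) // [square]
}
```

The expensive exact test only runs on the handful of polygons the index selected, which is what keeps the lookup fast.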
Et voilà! It works!
The segment tree structure itself is very light on memory, and the loops/polygons data can be stored on disk and loaded on request; I’ve tested a second implementation using LevelDB with this technique.
If you have a very large tree (for example, city limits for the whole world), you can even put the segment tree in a KV store, following the paper Interval Indexing and Querying on Key-Value Cloud Stores.
Region a gogo
As a demonstration, here is a working microservice called regionagogo, which simply returns the country & state for a given location.
It loads geo data for the whole world and answers HTTP queries using a small amount of memory.
GET /country?lat=19.542915&lng=-155.665857
{
"code": "US",
"name": "Hawaii"
}
Here is a Docker image so you can deploy it on your stack.
Note that it performs really well but can still be improved a lot; for example, the current Go S2 implementation still relies on Rect bounding boxes around loops, which is why regionagogo uses a data file that can be generated from the C++ version.
Future
This technique seems to work well for stabbing queries, region queries, geofencing…
It can be a solid foundation to create a flexible and simple geo database.