Using DuckDB With Apache Supserset, Bonus Spatial Data

- duckdb superset geo

Apache Superset is a popular data visualization tool, often referred to as a Business Intelligence (BI) tool. It provides polyglot visual interfaces to various databases.

Integrating DuckDB with Superset can significantly enhance its capabilities, enabling it to read Parquet files and perform spatial queries among other features.

TL;DR: You can use DuckDB with Superset using a slightly modified image

Here is how I’m using it.

Add DuckDB drivers to the Docker image

First let’s build a docker image with DuckDB supports on top of the existing Superset image:

FROM apache/superset:4.1.1

USER root

RUN pip install duckdb==1.1.3
RUN pip install duckdb-engine==0.15.0

USER superset

Build:

docker build -t ghcr.io/akhenakh/duckdb-superset:latest .

Start a local instance

Start it locally:

docker run -p 8088:8088 \                                         
  -e "SUPERSET_SECRET_KEY=XXXXX" \
  --mount type=bind,source=/$(pwd)/data,target=/app/superset_home \
  --name superset-duckdb \
  ghcr.io/akhenakh/duckdb-superset:latest

Create your admin user and init the local db (Supserset can use a SQLite to store its config):

docker exec -it superset-duckdb superset fab create-admin \
    --username admin \
    --firstname Superset \
    --lastname Admin \
    --email admin@superset.com \
    --password mypassword

docker exec -it superset-duckdb superset db upgrade

docker exec -it superset-duckdb superset init

Note that this quick solution is for a local development setup scenario, you probably want to deploy it with its own database and cache, like in Kubernetes

Configure Superset for DuckDB

Spatial Extension

To perform spatial queries, follow these steps:

Validate the installation by running a spatial query using geoparquet, it could take several minutes:

    SELECT
       names.primary as name,
       ST_X(geometry) as longitude,
       ST_Y(geometry) as latitude   
		FROM read_parquet('s3://overturemaps-us-west-2/release/2024-12-18.0/theme=places/type=place/*', filename=true, hive_partitioning=1)
    WHERE categories.primary = 'pizza_restaurant'
    AND bbox.xmin BETWEEN -75 AND -73 
    AND bbox.ymin BETWEEN 40 AND 41;

Supserset showing a heatmap