Hello everyone.
This is my first post in the forum and I hope my post doesn’t come across as inappropriate.
I am a former energy statistician who worked for several years in the team that publishes DUKES, Energy Trends and related official energy statistics releases. I was involved in producing much of the data you can find here: Statistics at DESNZ
I have always had an interest in data engineering (which is my current job, albeit not in energy), and one of the most frustrating tasks in my old role was providing data to external stakeholders (modellers, analysts, policy makers etc.). The published spreadsheets were reformatted to a new, improved standard, but they are still heavily human-centric and do not allow easy acquisition by whoever needs them in a processable format. I know from experience that the production of these spreadsheets is more of a craft than an automated process - i.e. it is common to copy and paste data into a template from a number of sources that are not in the same format. Hence, fulfilling a data request is not as simple for the team as "look at the working tables".
QUEENS (Queryable Energy National Statistics) is designed to fill this gap: it takes published Excel workbooks in human-readable format, reshapes them, checks that the result satisfies some consistency constraints, and writes it to a SQL database in a long format ideal for quick queries. Once ingested, the data can be served through an API or through library methods. The package can ingest multiple versions of the same spreadsheet, and the user is only ever served a coherent snapshot of these data.
- GitHub (code, docs, examples): https://github.com/alebgz-91/queens
- PyPI: https://pypi.org/project/queens/
- Live demo API (Render): https://queens-5stl.onrender.com/docs (free tier; first request may be slow)
What it does in brief

- Reads from GOV.UK directly - no need to download files manually and risk version mismatches.
- ETL (extract, transform, load): parses chapter workbooks, normalizes headers, validates types, enforces uniqueness.
- Versioning: logs each ingest; staging selects a coherent snapshot "as of" a date.
- Querying: metadata (+ dtypes) and data filters with typed operators (`eq`/`neq`/`lt`/`lte`/`gt`/`gte`/`like`); text comparisons are case-insensitive.
- Interfaces: CLI, Python facade, and FastAPI endpoints for programmatic access.
- Extensible: driven by JSON configs + Excel templates; can be adapted to other publications.
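For instance, the typed operators and `$or` groups compose into a JSON filter payload; the shape below mirrors the API example later in this post (the uppercase `%WOOD%` pattern is deliberate, to show that text comparisons are case-insensitive):

```python
import json

# Sketch of a filter payload using the typed operators listed above.
# Bare values mean equality; {"operator": value} dicts express comparisons;
# "$or" groups alternatives. See docs/api.md for the authoritative shape.
filters = {
    "year": {"gte": 2015},                # year >= 2015
    "$or": [
        {"fuel": {"like": "%WOOD%"}},     # case-insensitive pattern match
        {"fuel": "Landfill gas"}          # bare value = equality
    ]
}

# Filters travel as a JSON string in the "filters" query parameter:
params = {"table_name": "6.1", "limit": 1000, "filters": json.dumps(filters)}
```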
Minimal API example (demo)
Using the demo deployment (data available for DUKES 2025):
```python
import json

import pandas as pd
import requests

BASE = "https://queens-5stl.onrender.com"

# 1) Discover queryable columns for a table
m = requests.get(f"{BASE}/metadata/dukes", params={"table_name": "6.1"})
m.raise_for_status()
meta = m.json()["data"]  # list[dict]: [{"column_name": "...", "dtype": "TEXT/INTEGER/REAL"}, ...]
pd.DataFrame(meta).head()

# 2) Fetch data with filters + cursor pagination
filters = {
    "year": {"gte": 2015},
    "$or": [
        {"fuel": {"like": "%wood%"}},
        {"fuel": "Landfill gas"}
    ]
}

rows, cursor = [], None
while True:
    params = {
        "table_name": "6.1",
        "limit": 1000,
        "filters": json.dumps(filters)
    }
    if cursor is not None:
        params["cursor"] = cursor
    r = requests.get(f"{BASE}/data/dukes", params=params)
    r.raise_for_status()
    out = r.json()  # {"data": [...], "table_description": "...", "next_cursor": <int|None>}
    rows.extend(out["data"])
    cursor = out["next_cursor"]
    if not cursor:
        break

df = pd.DataFrame(rows)
df.head()
```
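A possible follow-up step: since the metadata endpoint reports SQL-style dtypes (TEXT/INTEGER/REAL), those can be used to cast the DataFrame columns explicitly. The helper below is my own sketch, not part of the package; the stand-in data mimics the metadata shape shown above:

```python
import pandas as pd

def apply_metadata_dtypes(df: pd.DataFrame, meta: list[dict]) -> pd.DataFrame:
    """Cast columns according to a [{"column_name": ..., "dtype": ...}] list."""
    out = df.copy()
    for m in meta:
        col, dtype = m["column_name"], m["dtype"]
        if col not in out.columns:
            continue
        if dtype == "INTEGER":
            # nullable Int64 so missing values survive the cast
            out[col] = pd.to_numeric(out[col]).astype("Int64")
        elif dtype == "REAL":
            out[col] = pd.to_numeric(out[col])
        elif dtype == "TEXT":
            out[col] = out[col].astype("string")
    return out

# Stand-in metadata and data (values as strings, as if freshly parsed):
meta = [{"column_name": "year", "dtype": "INTEGER"},
        {"column_name": "fuel", "dtype": "TEXT"},
        {"column_name": "value", "dtype": "REAL"}]
df = pd.DataFrame({"year": ["2015", "2016"],
                   "fuel": ["Wood", "Landfill gas"],
                   "value": ["1.5", "2.0"]})
typed = apply_metadata_dtypes(df, meta)
```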
Docs & examples

- Overview & architecture: see the repo's `docs/` directory and README
- CLI & library usage: `docs/cli.md`, `docs/library.md`
- API details: `docs/api.md`
- Example notebooks: `examples/` (client usage and API demo)
Reuse / adapt

Although the default configuration targets DUKES, the approach is generic. Adapting it to another statistical series is largely a matter of:

- pointing to the source pages (for URL scraping),
- providing mapping templates (Excel) per table shape,
- updating `schema.json` and `etl_config.json`.
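To give a flavour of that last step, here is a purely hypothetical sketch of the kind of entry such a config might hold; none of these key names are taken from the actual `etl_config.json`, so check the repo's files and `docs/` for the real structure:

```json
{
  "publication": "dukes",
  "source_page": "https://www.gov.uk/...",
  "tables": [
    {"table_name": "6.1", "template": "templates/table_6_1.xlsx", "time_index": "year"}
  ]
}
```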
The package currently supports time-series-like tables, i.e. those with a clear unique time index (e.g. annual data). As it stands, it can be made to work with higher-frequency datasets, but proper support would need to be coded. This is a future development aim that will naturally be addressed when extending support to Energy Trends.
I hope this is useful to you or your colleagues. If you have feedback or would like to discuss details/features, do not hesitate to get in touch.