Domain-wide data projects

Introduction

The three tables below summarize a number of data projects broadly located within the open energy modeling community and primarily intended to serve that community. The EERAdata metadata project is an exception in this regard, in that it is designed to assist closed data communities as well.

Most of the projects outlined either use open data or are suitable for use with open data. The “open” matters because, without Creative Commons CC‑BY‑4.0 licensing (or CC0‑1.0 or something inbound compatible) applied to the datasets, most initiatives would founder for lack of legal interoperability, a condition otherwise known as data siloing.
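
As a rough illustration of the licensing point, the sketch below sorts candidate datasets by whether their license could flow into a CC‑BY‑4.0 pool. The dataset names are hypothetical, and the compatibility set simply mirrors the two licenses named above. It is an assumption for illustration rather than a complete compatibility table.

```python
# Licenses assumed inbound compatible with a CC-BY-4.0 data pool.
# This set mirrors the paragraph above and is illustrative only.
INBOUND_COMPATIBLE = {"CC-BY-4.0", "CC0-1.0"}


def split_by_license(datasets):
    """Split (name, license) pairs into usable and siloed dataset names."""
    usable, siloed = [], []
    for name, license_id in datasets:
        target = usable if license_id in INBOUND_COMPATIBLE else siloed
        target.append(name)
    return usable, siloed


# Hypothetical dataset names with SPDX-style license identifiers.
candidates = [
    ("plant_inventory", "CC-BY-4.0"),
    ("load_timeseries", "CC0-1.0"),
    ("grid_topology", "CC-BY-NC-4.0"),
    ("price_history", None),  # no license stated, so treated as siloed
]

usable, siloed = split_by_license(candidates)
print("usable:", usable)
print("siloed:", siloed)
```

A dataset with no stated license lands in the siloed group here, on the assumption that reuse cannot be taken for granted without one.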

The majority of the projects listed have only been public facing since 2019 or 2020. But their breadth and sophistication are an indication that the open energy system modeling domain is maturing.

That said, the quality of data drops off substantially as one moves specifically into the energy domain. These issues are discussed further in the following threads:

In addition, the availability of open data ranges from good to practically nonexistent, depending on the region in question. And the quality and usability of data under statutory reporting also varies markedly. For these reasons, the content projects listed typically rely on community curation to source, clean, maintain, and repair the datasets they offer.

And briefly, for those not familiar with energy system modeling, this activity involves selecting a suitable modeling framework, populating it with data that represents the current state of the system, hypothesizing various future development trajectories known as “scenarios”, electing one such scenario to act as a reference (normally some estimate of “business as usual” or “policy as usual”), and then comparing the remaining scenarios against this particular reference. Furthermore, the data described in this posting is classed as “non‑personal”, meaning that issues related to individual privacy do not apply.
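
The final step of that workflow, comparing scenarios against a reference, can be sketched in a few lines. The scenario names and emissions figures below are invented for illustration and do not come from any real study.

```python
# Hypothetical scenario results: annual CO2 emissions in Mt for selected years.
scenarios = {
    "business_as_usual": {2030: 700.0, 2040: 650.0, 2050: 600.0},
    "high_renewables": {2030: 550.0, 2040: 300.0, 2050: 50.0},
    "efficiency_first": {2030: 600.0, 2040: 400.0, 2050: 200.0},
}

# The scenario elected to act as the reference.
REFERENCE = "business_as_usual"


def compare_to_reference(scenarios, reference):
    """Return per-year differences of each non-reference scenario from the reference."""
    baseline = scenarios[reference]
    return {
        name: {year: values[year] - baseline[year] for year in baseline}
        for name, values in scenarios.items()
        if name != reference
    }


differences = compare_to_reference(scenarios, REFERENCE)
for name, delta in differences.items():
    print(name, delta)
```

Negative values indicate lower emissions than the reference in that year.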

Finally, be aware that the Germany‑based Open Energy Platform and Family are distinct from the United Kingdom‑based Icebreaker One Open Energy project.

High‑level concepts

Data interoperability requires that the domain community agree on high‑level concepts covering semantics, labeling, and collection protocols. A degree of convergence on technical practices is also needed, but that process tends to be relatively organic.

The Open Energy Ontology (OEO) is located within the openmod community and is licensed CC0‑1.0. As indicated earlier, the EERAdata metadata project is more broadly cast and is intended to support both shared and open data.

| Project | Name / tagline | Coordinator / host | Role |
|---|---|---|---|
| OEO | Open Energy Ontology | Open Energy Family | community ontology |
| EERAdata | metadata project | European Energy Research Alliance | metadata standards |

Emphasis on infrastructure and/or tooling

The following infrastructure projects are designed to support modeling activities. In addition, energy modelers have largely settled on Python and Julia as the programming languages of choice, as the entries below indicate.

| Project | Name / tagline | Coordinator / host | Role |
|---|---|---|---|
| Open Energy Family | open collaborative framework for energy data management | Open Energy Platform Community, OVGU, RLI, and many more | research data infrastructure including scenario management |
| LOD–GEOSS | linked open data and the global earth observation system of systems in energy systems analysis | German Aerospace Center, Institute of Networked Energy Systems | concept development for networked databases and metadata discovery based on DBpedia Databus project |
| SENTINEL archive package | project‑specific model‑agnostic data management system | European institutions | facilitate metadata‑enabled data flow between linked models, bridge to external data APIs, offer standardized analysis |
| SPINE | open source toolbox for modeling | VTT Finland and others | cross‑model data interoperability |
| pyam | analysis and visualization of integrated assessment scenarios | IIASA and IAMC | Python package for processing, validation, and plotting of energy system scenario data |
| PowerSystems.jl | | National Renewable Energy Laboratory | efficient cross‑model data management utility |
| powerplantmatching | power plant inventories | Frankfurt Institute for Advanced Studies | a toolset for cleaning, standardizing, and combining multiple power plant databases |
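
The cleaning, standardizing, and combining role listed for powerplantmatching can be illustrated with a deliberately simple sketch. This is not the method or API of that package. It is a toy name‑normalization merge using only the Python standard library, and the plant records are hypothetical.

```python
import re

# Generic words dropped before matching (an illustrative choice).
GENERIC_WORDS = {"power", "plant", "station"}


def normalize(name):
    """Lowercase a plant name, strip punctuation, and drop generic words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in GENERIC_WORDS)


def combine(primary, secondary):
    """Merge two inventories keyed on normalized name.

    Values from the primary source win; secondary values fill the gaps.
    """
    merged = {}
    # Secondary records go in first so that primary records overwrite them.
    for record in secondary + primary:
        key = normalize(record["name"])
        known = {k: v for k, v in record.items() if v is not None}
        merged.setdefault(key, {}).update(known)
    return merged


# Hypothetical records from two hypothetical databases.
db_a = [
    {"name": "Alpha Power Plant", "fuel": "gas", "capacity_mw": None},
    {"name": "Gamma Station", "fuel": "wind", "capacity_mw": 120.0},
]
db_b = [
    {"name": "ALPHA power-plant", "fuel": None, "capacity_mw": 400.0},
    {"name": "Beta Station", "fuel": "coal", "capacity_mw": 800.0},
]

merged = combine(db_a, db_b)
for key, record in sorted(merged.items()):
    print(key, record)
```

Real inventories need much more careful record linkage than exact matching on a cleaned name, which is part of why dedicated tooling exists.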

Emphasis on content

The following content projects repackage datasets for use by modelers. They are normally either constituted as web‑interfaced portals or as programming libraries installed via package management systems.

The Atlite project intends to support the CORDEX regional climate projection datasets and thereby allow modelers to readily apply future climate‑modified weather sets to their numerical investigations.

| Project | Name / tagline | Coordinator / host | Role |
|---|---|---|---|
| OPSD | Open Power System Data | Neon GmbH | European electricity data portal with emphasis on curation |
| Atlite | weather data to renewables potentials | German institutions | Python library containing ECMWF ERA5 and SARAH datasets to produce renewables timeseries |
| PowerSystemCaseBuilder.jl | | National Renewable Energy Laboratory | library of datasets for PowerSystems.jl (see above) |
| PowerGenome | | Carbon Impact Consulting LLC | US electricity capacity expansion data platform with model transformation |
| Open Energy Outlook | system data part of project here | North Carolina State University and others | fully open public policy analysis for deep decarbonization of United States |
| PUDL | Public Utility Data Liberation | Catalyst Cooperative | wide‑ranging United States data project |
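
To give a feel for what “weather data to renewables timeseries” means in the Atlite entry, the sketch below converts wind speeds into capacity factors using an idealized power curve. It does not use the Atlite API. The cut‑in, rated, and cut‑out speeds and the wind speed values are assumed for illustration only.

```python
def capacity_factor(wind_speed, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Idealized turbine power curve with a cubic ramp from cut-in to rated speed.

    All speeds are in m/s; the default thresholds are illustrative assumptions.
    """
    if wind_speed < cut_in or wind_speed >= cut_out:
        return 0.0
    if wind_speed >= rated:
        return 1.0
    return (wind_speed**3 - cut_in**3) / (rated**3 - cut_in**3)


# Hypothetical hourly wind speeds in m/s at hub height.
hourly_wind_speed = [2.0, 3.0, 7.5, 12.0, 20.0, 26.0]

timeseries = [capacity_factor(v) for v in hourly_wind_speed]
for speed, cf in zip(hourly_wind_speed, timeseries):
    print(f"{speed:5.1f} m/s -> {cf:.3f}")
```

Multiplying such a capacity factor series by installed capacity gives a generation timeseries that a model can consume.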

Note that official databases (for instance, the SMARD portal from the German BNetzA) and industry databases (for instance, Open Data Energy Networks from the French RTE and partners) are not listed here.

Closure

It is relatively easy to speak of an “information ecosystem” but very much harder to create one. The listed projects, taken collectively, arguably now represent such an ecosystem.

There are, nonetheless, many challenges ahead. These include creating and refining community consensuses on semantics and on metadata, converging on technical guidelines, and collecting and curating legally usable and reusable datasets. Taken together, these activities should yield an integrated, coherent, common‑pool information resource.

Although this thread covers data projects, it is not possible to separate code from data. Indeed the two are completely intertwined on closer examination.

On that note, data systems projects like PowerSystemCaseBuilder.jl, PowerGenome, and SPINE are starting to provide coherent, integrated, integrity‑checked, and relatively complete snapshots for a number of electricity systems of interest, and by extension, energy systems more broadly in due course. These projects utilize high‑level abstractions that allow their inventories to be translated to serve a range of modeling frameworks. Indeed, the days of simply treating the input data as ad‑hoc collections of resources that surround and feed various numerical simulations may well be numbered. That said, each modeling project will probably also need to supplement these common data systems with additional data to meet specific requirements. The use of data systems can also foster more consistent semantics, encourage cross‑model comparisons, and assist standardized reporting.
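
The idea of a high‑level abstraction that can be translated for several modeling frameworks can be sketched as follows. The two target formats, their field names, and the generator records are hypothetical and do not reflect the actual schemas of any project named here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """Framework-neutral description of one generating unit."""
    name: str
    bus: str
    fuel: str
    capacity_mw: float


def to_framework_a(gen):
    """Hypothetical target format that takes capacity in MW."""
    return {"name": gen.name, "bus": gen.bus, "carrier": gen.fuel,
            "p_nom": gen.capacity_mw}


def to_framework_b(gen):
    """Hypothetical target format that takes capacity in GW and upper-case fuel codes."""
    return {"id": gen.name, "node": gen.bus, "tech": gen.fuel.upper(),
            "cap_gw": gen.capacity_mw / 1000.0}


# One shared inventory (hypothetical units) feeding both targets.
inventory = [
    Generator("unit_1", "north", "gas", 400.0),
    Generator("unit_2", "south", "wind", 150.0),
]

inputs_a = [to_framework_a(g) for g in inventory]
inputs_b = [to_framework_b(g) for g in inventory]
print(inputs_a[0])
print(inputs_b[0])
```

Keeping unit conversions and naming conventions in the translators, rather than in the inventory itself, means a correction to the shared data reaches every model that draws on it.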

The projects listed are collectively designed to support public interest analysis. We have just one generation to reach carbon neutrality and it is hard to imagine that journey being successful and also somewhat optimal in the absence of open and transparent analysis underpinned by genuinely open data and open source numerics.

Project URLs

The URLs for the above projects are contained in this separate wikipost — which any registered user can edit and maintain. Otherwise message @robbie.morrison with updates.

Worth noting that some of the projects listed are wide‑ranging and that the data management component is just one of several workstreams.

The lack of literature is due to the young age of the projects. Notwithstanding, several papers are in the pipeline and may well surface later in 2021.

Also worth noting is the Wikipedia page on open energy system databases, which lists a number of official and industry information portals containing open data.


| Project | Explanation | URLs | Literature |
|---|---|---|---|
| OEO (Open Energy Ontology) | community ontology | | |
| EERAdata | metadata project | | |

| Project | Explanation | URLs | Literature |
|---|---|---|---|
| Open Energy Family | data platform | | |
| LOD–GEOSS | linked open data project | | |
| SENTINEL archive package | model data management | | |
| SPINE | model data management | | |
| pyam | analysis and visualization of energy systems and integrated assessment scenarios | | Gidden and Huppmann (2019) |
| PowerSystems.jl | model data management | | |
| powerplantmatching | power plants | | Gotzens et al (2019) |

| Project | Explanation | URLs | Literature |
|---|---|---|---|
| OPSD (Open Power System Data) | data portal for Europe | | Wiese et al (2019) |
| Atlite | climate data to energy data | | |
| PowerSystemCaseBuilder.jl | power systems data | | |
| PowerGenome | power systems data | | Schivley et al (2021) |
| Open Energy Outlook | energy systems data | | DeCarolis et al (2020) |
| PUDL (Public Utility Data Liberation) | United States data collection project | | Selvans (2020) |

References

DeCarolis, Joseph F, Paulina Jaramillo, Jeremiah X Johnson, David L McCollum, Evelina Trutnevyte, David C Daniels, Gökçe Akın‑Olçum, Joule Bergerson, Soolyeon Cho, Joon‑Ho Choi, Michael T Craig, Anderson R de Queiroz, Hadi Eshraghi, Christopher S Galik, Timothy G Gutowski, Karl R Haapala, Bri‑Mathias Hodge, Simi Hoque, Jesse D Jenkins, Alan Jenn, Daniel J A Johansson, Noah Kaufman, Juha Kiviluoma, Zhenhong Lin, and Heather L MacLean (16 December 2020). “Leveraging open‑source tools for collaborative macro‑energy system modeling efforts”. Joule. 4 (12): 2523–2526. ISSN 2542‑4785. doi:10.1016/j.joule.2020.11.002. Open access.

Gidden, Matthew J and Daniel Huppmann (7 January 2019). “pyam: a Python package for the analysis and visualization of models of the interaction of climate, human, and environmental systems”. Journal of Open Source Software. 4 (33): 1095. ISSN 2475-9066. doi:10.21105/joss.01095. Creative Commons CC‑BY‑4.0.

Gotzens, Fabian, Heidi Heinrichs, Jonas Hörsch, and Fabian Hofmann (1 January 2019). “Performing energy modelling exercises in a transparent way: the issue of data quality in power plant databases”. Energy Strategy Reviews. 23: 1–12. ISSN 2211‑467X. doi:10.1016/j.esr.2018.11.004. Creative Commons CC‑BY‑NC‑ND‑4.0.

Wiese, Frauke, Ingmar Schlecht, Wolf‑Dieter Bunke, Clemens Gerbaulet, Lion Hirth, Martin Jahn, Friedrich Kunz, Casimir Lorenz, Jonathan Mühlenpfordt, Juliane Reimann, and Wolf‑Peter Schill (15 February 2019). “Open Power System Data: frictionless data for electricity system modelling”. Applied Energy. 236: 401–409. ISSN 0306‑2619. doi:10.1016/j.apenergy.2018.11.097. Postprint.

Schivley, Greg, Ethan Welty, and Neha Patankar (19 January 2021). PowerGenome/PowerGenome: v0.4.1. Geneva, Switzerland: Zenodo. doi:10.5281/zenodo.4552835. Snapshot.

Selvans, Zane (30 November 2020). Publishing a PUDL API with Datasette. Catalyst Cooperative. Boulder, Colorado, USA.

Digital twins

Perhaps also worth noting some recent digital twin projects that may share similarities with the more sophisticated data system projects listed above. One digital twin in planning is the European Commission DestinE project.

References

Arup (November 2019). Digital twin: towards a meaningful framework. London, United Kingdom: Arup.

European Commission (4 March 2021). Destination Earth (DestinE). Shaping Europe’s digital future - European Commission. Last modified date given.

Vaughan, Adam (12 October 2020). Building digital twins of earth could help Europe cut carbon emissions. New Scientist. London, United Kingdom. Online version. Paywalled.

Vaughan, Adam (17 October 2020). “Virtual Earths to be created”. New Scientist. (3304): 8. ISSN 0262-4079. Print version.