Domain-wide data projects

robbie.morrison · 30 March 2021 20:55

Introduction

The three tables below summarize a number of data projects broadly located within the open energy modeling community and primarily intended to serve that community. The EERAdata metadata project is an exception in this regard, in that it is designed to assist closed data communities as well.

Most of the projects outlined either use open data or are suitable for use with open data. The “open” is especially important: unless Creative Commons CC‑BY‑4.0 licensing (or CC0‑1.0 or something inbound compatible) is applied to the datasets, most initiatives would flounder for lack of legal interoperability, a condition otherwise known as data siloing.

The majority of projects listed have only been public facing since 2019 or 2020. But their breadth and sophistication indicate that the open energy system modeling domain is maturing.

That said, the quality of data drops off substantially as one moves specifically into the energy domain. These issues are discussed further in the following threads:

In addition, the availability of open data ranges from good to practically nonexistent, depending on the region in question. And the quality and usability of data under statutory reporting also varies markedly. For these reasons, the content projects listed typically rely on community curation to source, clean, maintain, and repair the datasets they offer.

And briefly, for those not familiar with energy system modeling, this activity involves selecting a suitable modeling framework, populating it with data that represents the current state of the system, hypothesizing various future development trajectories known as “scenarios”, electing one such scenario to act as a reference (normally some estimate of “business as usual” or “policy as usual”), and then comparing the remaining scenarios against this reference. Furthermore, the data described in this posting is classed as “non‑personal”, meaning that issues related to individual privacy do not apply.
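
The scenario comparison described above can be sketched in a few lines of Python. This is only a minimal illustration and not the method of any particular framework: the scenario names, the single indicator (annual CO2 emissions in Mt), and the values are all made up for the example.

```python
# Hypothetical scenario results: annual CO2 emissions in Mt (illustrative values only).
scenarios = {
    "business_as_usual": 700.0,
    "coal_phaseout": 520.0,
    "high_renewables": 410.0,
}


def compare_to_reference(results, reference):
    """Return the absolute and relative difference of each scenario from the reference."""
    base = results[reference]
    return {
        name: {"delta": value - base, "relative": (value - base) / base}
        for name, value in results.items()
        if name != reference
    }


# The reference scenario is excluded from the output; all others are measured against it.
comparison = compare_to_reference(scenarios, "business_as_usual")
for name, diff in comparison.items():
    print(f"{name}: {diff['delta']:+.1f} Mt ({diff['relative']:+.1%})")
```

Real studies compare many indicators over time, but the principle of measuring each scenario against one elected reference is the same.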

Finally, be aware that the Germany‑based Open Energy Platform and Family are distinct from the United Kingdom‑based Icebreaker One Open Energy project.

High‑level concepts

Data interoperability requires that the domain community agree on high‑level concepts covering semantics, labeling, and collection protocols. A degree of convergence on technical practices is also needed, but that process tends to be relatively organic.

The Open Energy Ontology (OEO) is located within the openmod community and is licensed CC0‑1.0. As indicated earlier, the EERAdata metadata project is more broadly cast and is intended to support both shared and open data.

Project	Name / tagline	Coordinator / host	Role
OEO	Open Energy Ontology	Open Energy Family	community ontology
EERAdata	metadata project	European Energy Research Alliance	metadata standards

Emphasis on infrastructure and/or tooling

The following infrastructure projects are designed to support modeling activities. In addition, energy modelers have largely settled on Python and Julia as the programming languages of choice, as indicated.

Project	Name / tagline	Coordinator / host	Role
Open Energy Family	open collaborative framework for energy data management	Open Energy Platform Community, OVGU, RLI, and many more	research data infrastructure including scenario management
LOD–GEOSS	linked open data and the global earth observation system of systems in energy systems analysis	German Aerospace Center, Institute of Networked Energy Systems	concept development for networked databases and metadata discovery based on DBpedia Databus project
SENTINEL archive package	project-specific model‑agnostic data management system	European institutions	facilitate metadata-enabled data flow between linked models, bridge to external data APIs, offer standardized analysis
SPINE	open source toolbox for modeling	VTT Finland and others	cross‑model data interoperability
pyam	analysis and visualization of integrated assessment scenarios	IIASA and IAMC	Python package for processing, validation, and plotting of energy system scenario data
PowerSystems.jl	—	National Renewable Energy Laboratory	efficient cross‑model data management utility
powerplantmatching	power plant inventories	Frankfurt Institute for Advanced Studies	a toolset for cleaning, standardizing, and combining multiple power plant databases
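
To give a feel for what cleaning, standardizing, and combining power plant databases involves, here is a toy sketch in plain Python. It is not the algorithm used by powerplantmatching or any other project listed: the plant names, field names, stop-word list, and matching rule (normalized name plus fuel) are all assumptions made up for illustration.

```python
import re

# Generic words removed before comparing names (an assumed, illustrative list).
STOP_WORDS = {"power", "plant", "station", "kraftwerk"}


def normalize(name):
    """Lower-case a plant name, strip punctuation, and drop generic words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in STOP_WORDS)


def combine(source_a, source_b):
    """Merge two plant lists keyed on (normalized name, fuel); source A wins on conflicts."""
    merged = {}
    for source in (source_b, source_a):  # A is applied last, so it overrides B
        for plant in source:
            key = (normalize(plant["name"]), plant["fuel"].lower())
            # Missing values (None) never overwrite values from the other source.
            merged.setdefault(key, {}).update(
                {k: v for k, v in plant.items() if v is not None}
            )
    return list(merged.values())


# Two hypothetical source databases describing overlapping plants.
a = [{"name": "Nordhafen Power Station", "fuel": "Lignite",
      "capacity_mw": 1000, "commissioned": None}]
b = [
    {"name": "Kraftwerk Nordhafen", "fuel": "lignite",
     "capacity_mw": 950, "commissioned": 1985},
    {"name": "Westfeld", "fuel": "Gas", "capacity_mw": 400, "commissioned": None},
]
plants = combine(a, b)
```

Production tools need far more robust matching (fuzzy names, locations, unit-level detail), which is precisely why dedicated projects exist.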

Emphasis on content

The following content projects repackage datasets for use by modelers. They are normally either constituted as web‑interfaced portals or as programming libraries installed via package management systems.

The Atlite project intends to support the CORDEX regional climate projection datasets and thereby allow modelers to readily apply future climate‑modified weather sets to their numerical investigations.

Project	Name / tagline	Coordinator / host	Role
OPSD	Open Power System Data	Neon GmbH	European electricity data portal with emphasis on curation
Atlite	weather data to renewables potentials	German institutions	Python library containing ECMWF ERA5 and SARAH datasets to produce renewables timeseries
PowerSystemCaseBuilder.jl	—	National Renewable Energy Laboratory	library of datasets for PowerSystems.jl (see above)
PowerGenome	—	Carbon Impact Consulting LLC	US electricity capacity expansion data platform with model transformation
Open Energy Outlook	system data part of project here	North Carolina State University and others	fully open public policy analysis for deep decarbonization of United States
PUDL	Public Utility Data Liberation	Catalyst Cooperative	wide‑ranging United States data project

Note that official databases (for instance, the SMARD portal from the German BNetzA) and industry databases (for instance, Open Data Energy Networks from the French RTE and partners) are not listed here.

Closure

It is relatively easy to speak of an “information ecosystem” but very much harder to create one. The listed projects, taken collectively, arguably now represent such an ecosystem.

There are, nonetheless, many challenges ahead. These include creating and refining community consensuses on semantics and on metadata, converging on technical guidelines, and collecting and curating legally usable and reusable datasets. Taken together, these activities should yield an integrated, coherent, common‑pool information resource.

Although this thread covers data projects, it is not possible to separate code from data. Indeed, on closer examination the two are completely intertwined.

On that note, data systems projects like PowerSystemCaseBuilder.jl, PowerGenome, and SPINE are starting to provide coherent, integrated, integrity‑checked, and relatively complete snapshots for a number of electricity systems of interest and, in due course, for energy systems more broadly. These projects use high‑level abstractions that allow their inventories to be translated to serve a range of modeling frameworks. Indeed, the days of simply treating the input data as ad‑hoc collections of resources that surround and feed various numerical simulations may well be numbered. That said, each modeling project will probably also need to supplement these common data systems with additional data to meet specific requirements. The use of data systems can also foster more consistent semantics, encourage cross‑model comparisons, and assist standardized reporting.
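
The idea of a high‑level abstraction that is checked once and then translated for several modeling frameworks can be sketched as follows. This is an illustration only: the `Generator` schema, the two target formats ("framework A" and "framework B"), and the field names are hypothetical and do not correspond to any project or framework named in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """Framework-neutral description of one generating unit (hypothetical schema)."""
    name: str
    bus: str
    fuel: str
    capacity_mw: float


def check(gen):
    """A basic integrity check applied once, before any export."""
    if gen.capacity_mw <= 0:
        raise ValueError(f"{gen.name}: capacity must be positive")
    return gen


def to_framework_a(gen):
    """Export as a flat record with capacity in MW (made-up format)."""
    return {"id": gen.name, "node": gen.bus, "carrier": gen.fuel,
            "p_nom": gen.capacity_mw}


def to_framework_b(gen):
    """Export as a nested record with capacity in GW (made-up format)."""
    return {"unit": gen.name, "location": {"bus": gen.bus},
            "tech": gen.fuel, "capacity_gw": gen.capacity_mw / 1000}


# One checked inventory feeds both exporters.
inventory = [check(Generator("wind_north", "bus1", "wind", 250.0))]
records_a = [to_framework_a(g) for g in inventory]
records_b = [to_framework_b(g) for g in inventory]
```

Because validation happens on the neutral representation, every downstream framework receives the same checked data, which is what makes cross‑model comparison easier.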

The projects listed are collectively designed to support public interest analysis. We have just one generation to reach carbon neutrality and it is hard to imagine that journey being successful and also somewhat optimal in the absence of open and transparent analysis underpinned by genuinely open data and open source numerics.

▢

robbie.morrison · 1 April 2021 07:38

Project URLs

The URLs for the above projects are contained in this separate wikipost — which any registered user can edit and maintain. Otherwise message @robbie.morrison with updates.

Worth noting that some of the projects listed are wide‑ranging and that the data management component is just one of several workstreams.

The lack of literature is due to the young age of the projects. Notwithstanding, several papers are in the pipeline and may well surface later in 2021.

Also worth noting is the Wikipedia page on open energy system databases, which lists a number of official and industry information portals containing open data.

Project	Explanation	URLs	Literature
OEO (Open Energy Ontology)	community ontology	project page GitHub
EERAdata	metadata project	project page

Project	Explanation	URLs	Literature
Open Energy Family	data platform	landing page
LOD–GEOSS	linked open data project	project page DBpedia page Databus page
SENTINEL archive package	model data management	project page documentation GitHub
SPINE	model data management	project page
pyam	analysis and visualization of energy systems and integrated assessment scenarios	Read the Docs GitHub	Gidden and Huppmann (2019)
PowerSystems.jl	model data management	GitHub
powerplantmatching	power plants	GitHub	Gotzens et al (2019)

Project	Explanation	URLs	Literature
OPSD (Open Power System Data)	data portal for Europe	landing page Wikipedia	Wiese et al (2019)
Atlite	climate data to energy data	documentation GitHub
PowerSystemCaseBuilder.jl	power systems data	GitHub
PowerGenome	power systems data	GitHub	Schivley et al (2021)
Open Energy Outlook	energy systems data	project page	DeCarolis et al (2020)
PUDL (Public Utility Data Liberation)	United States data collection project	project page GitHub	Selvans (2020)

References

DeCarolis, Joseph F, Paulina Jaramillo, Jeremiah X Johnson, David L McCollum, Evelina Trutnevyte, David C Daniels, Gökçe Akın‑Olçum, Joule Bergerson, Soolyeon Cho, Joon‑Ho Choi, Michael T Craig, Anderson R de Queiroz, Hadi Eshraghi, Christopher S Galik, Timothy G Gutowski, Karl R Haapala, Bri‑Mathias Hodge, Simi Hoque, Jesse D Jenkins, Alan Jenn, Daniel J A Johansson, Noah Kaufman, Juha Kiviluoma, Zhenhong Lin, and Heather L MacLean (16 December 2020). “Leveraging open‑source tools for collaborative macro‑energy system modeling efforts”. Joule. 4 (12): 2523–2526. ISSN 2542‑4785. doi:10.1016/j.joule.2020.11.002. Open access.

Gidden, Matthew J and Daniel Huppmann (7 January 2019). “pyam: a Python package for the analysis and visualization of models of the interaction of climate, human, and environmental systems”. Journal of Open Source Software. 4 (33): 1095. ISSN 2475-9066. doi:10.21105/joss.01095. Creative Commons CC‑BY‑4.0.

Gotzens, Fabian, Heidi Heinrichs, Jonas Hörsch, and Fabian Hofmann (1 January 2019). “Performing energy modelling exercises in a transparent way: the issue of data quality in power plant databases”. Energy Strategy Reviews. 23: 1–12. ISSN 2211‑467X. doi:10.1016/j.esr.2018.11.004. Creative Commons CC‑BY‑NC‑ND‑4.0.

Schivley, Greg, Ethan Welty, and Neha Patankar (19 January 2021). PowerGenome/PowerGenome: v0.4.1. Geneva, Switzerland: Zenodo. doi:10.5281/zenodo.4552835. Snapshot.

Wiese, Frauke, Ingmar Schlecht, Wolf‑Dieter Bunke, Clemens Gerbaulet, Lion Hirth, Martin Jahn, Friedrich Kunz, Casimir Lorenz, Jonathan Mühlenpfordt, Juliane Reimann, and Wolf‑Peter Schill (15 February 2019). “Open Power System Data: frictionless data for electricity system modelling”. Applied Energy. 236: 401–409. ISSN 0306‑2619. doi:10.1016/j.apenergy.2018.11.097. Postprint.

Selvans, Zane (30 November 2020). Publishing a PUDL API with Datasette. Catalyst Cooperative. Boulder, Colorado, USA.

robbie.morrison · 13 April 2021 10:11

Digital twins

Perhaps also worth noting some recent digital twin projects that may share similarities with the more sophisticated data system projects listed above. One digital twin in planning is the European Commission DestinE project.

References

Arup (November 2019). Digital twin: towards a meaningful framework. London, United Kingdom: Arup.

European Commission (4 March 2021). Destination Earth (DestinE). Shaping Europe’s digital future - European Commission. Last modified date given.

Vaughan, Adam (12 October 2020). Building digital twins of earth could help Europe cut carbon emissions. New Scientist. London, United Kingdom. Online version. Paywalled.

Vaughan, Adam (17 October 2020). “Virtual Earths to be created”. New Scientist. (3304): 8. ISSN 0262-4079. Print version.