Introduction
The three tables below summarize a number of data projects broadly located within the open energy modeling community and primarily intended to serve that community. The EERAdata metadata project is an exception in this regard, in that it is designed to assist closed data communities as well.
Most of the projects outlined either utilize open data or are suitable for use with open data. The “open” is especially important because without Creative Commons CC‑BY‑4.0 licensing (or CC0‑1.0 or something inbound compatible) applied to the datasets, most initiatives would flounder in the absence of legal interoperability, a condition otherwise known as data siloing.
The majority of the projects listed have only been public facing since 2019 or 2020, but their breadth and sophistication indicate that the open energy system modeling domain is maturing.
That said, the quality of data drops off substantially as one moves specifically into the energy domain. These issues are discussed further in other threads.
In addition, the availability of open data ranges from good to practically nonexistent, depending on the region in question. And the quality and usability of data under statutory reporting also varies markedly. For these reasons, the content projects listed typically rely on community curation to source, clean, maintain, and repair the datasets they offer.
And briefly, for those not familiar with energy system modeling, the activity involves: selecting a suitable modeling framework; populating it with data that represents the current state of the system; hypothesizing various future development trajectories known as “scenarios”; electing one such scenario to act as a reference, normally some estimate of “business as usual” or “policy as usual”; and then comparing the remaining scenarios against that reference. Furthermore, the data described in this posting is classed as “non‑personal”, meaning that issues related to individual privacy do not apply.
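For readers who prefer code, the final comparison step can be sketched in a few lines of Python. This is only a toy illustration: the scenario names, indicators, and numbers are invented for the example and do not come from any real model or from the projects listed here.

```python
# Toy scenario results. All names and values are illustrative assumptions,
# not output from any real energy system model.
scenarios = {
    "business_as_usual": {"co2_mt": 100.0, "system_cost_bn": 50.0},
    "high_renewables": {"co2_mt": 60.0, "system_cost_bn": 55.0},
    "deep_decarbonization": {"co2_mt": 20.0, "system_cost_bn": 65.0},
}


def compare_to_reference(scenarios, reference):
    """Return the difference of each indicator from the reference scenario.

    The reference scenario itself is excluded from the result.
    """
    ref = scenarios[reference]
    return {
        name: {key: values[key] - ref[key] for key in ref}
        for name, values in scenarios.items()
        if name != reference
    }


deltas = compare_to_reference(scenarios, "business_as_usual")
for name, delta in deltas.items():
    print(name, delta)
```

Real workflows would of course compare full timeseries and many more indicators, but the structure is the same: one scenario is nominated as the baseline and every other scenario is reported relative to it.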
Finally, be aware that the Germany‑based Open Energy Platform and Family are distinct and different from the United Kingdom‑based Icebreaker One Open Energy project.
High‑level concepts
Data interoperability requires that the domain community agree on high‑level concepts covering semantics, labeling, and collection protocols. A degree of convergence on technical practices is also needed, but that process tends to be relatively organic.
The Open Energy Ontology (OEO) is located within the openmod community and is licensed CC0‑1.0. As indicated earlier, the EERAdata metadata project is more broadly cast and is intended to support both shared and open data.
Project | Name / tagline | Coordinator / host | Role |
---|---|---|---|
OEO | Open Energy Ontology | Open Energy Family | community ontology |
EERAdata | metadata project | European Energy Research Alliance | metadata standards |
Emphasis on infrastructure and/or tooling
The following infrastructure projects are designed to support modeling activities. In addition, energy modelers have largely settled on Python and Julia as the programming languages of choice, as indicated.
Project | Name / tagline | Coordinator / host | Role |
---|---|---|---|
Open Energy Family | open collaborative framework for energy data management | Open Energy Platform Community, OVGU, RLI, and many more | research data infrastructure including scenario management |
LOD–GEOSS | linked open data and the global earth observation system of systems in energy systems analysis | German Aerospace Center, Institute of Networked Energy Systems | concept development for networked databases and metadata discovery based on DBpedia Databus project |
SENTINEL archive package | project-specific model‑agnostic data management system | European institutions | facilitate metadata-enabled data flow between linked models, bridge to external data APIs, offer standardized analysis |
SPINE | open source toolbox for modeling | VTT Finland and others | cross‑model data interoperability |
pyam | analysis and visualization of integrated assessment scenarios | IIASA and IAMC | Python package for processing, validation, and plotting of energy system scenario data |
PowerSystems.jl | — | National Renewable Energy Laboratory | efficient cross‑model data management utility |
powerplantmatching | power plant inventories | Frankfurt Institute for Advanced Studies | a toolset for cleaning, standardizing, and combining multiple power plant databases |
Emphasis on content
The following content projects repackage datasets for use by modelers. They are normally either constituted as web‑interfaced portals or as programming libraries installed via package management systems.
The Atlite project intends to support the CORDEX regional climate projection datasets and thereby allow modelers to readily apply future climate‑modified weather sets to their numerical investigations.
Project | Name / tagline | Coordinator / host | Role |
---|---|---|---|
OPSD | Open Power System Data | Neon GmbH | European electricity data portal with emphasis on curation |
Atlite | weather data to renewables potentials | German institutions | Python library containing ECMWF ERA5 and SARAH datasets to produce renewables timeseries |
PowerSystemCaseBuilder.jl | — | National Renewable Energy Laboratory | library of datasets for PowerSystems.jl (see above) |
PowerGenome | — | Carbon Impact Consulting LLC | US electricity capacity expansion data platform with model transformation |
Open Energy Outlook | system data part of project | North Carolina State University and others | fully open public policy analysis for deep decarbonization of United States |
PUDL | Public Utility Data Liberation | Catalyst Cooperative | wide‑ranging United States data project |
Note that official databases (for instance, the SMARD portal from the German BNetzA) and industry databases (for instance, Open Data Energy Networks from the French RTE and partners) are not listed here.
Closure
It is relatively easy to speak of an “information ecosystem” but very much harder to create one. The listed projects, taken collectively, arguably now represent such an ecosystem.
There are, nonetheless, many challenges ahead. These include creating and refining community consensuses on semantics and on metadata, converging on technical guidelines, and collecting and curating legally usable and reusable datasets. Taken together, these activities should yield an integrated coherent common pool information resource.
Although this thread covers data projects, it is not possible to separate code from data. Indeed the two are completely intertwined on closer examination.
On that note, data systems projects like PowerSystemCaseBuilder.jl, PowerGenome, and SPINE are starting to provide coherent, integrated, integrity‑checked, and relatively complete snapshots for a number of electricity systems of interest and, by extension, energy systems more broadly in due course. These projects utilize high‑level abstractions that allow their inventories to be translated to serve a range of modeling frameworks. Indeed, the days of simply treating the input data as ad‑hoc collections of resources that surround and feed various numerical simulations may well be numbered. That said, each modeling project will probably also need to supplement these common data systems with additional data to meet specific requirements. The use of data systems can also foster more consistent semantics, encourage cross‑model comparisons, and assist standardized reporting.
The projects listed are collectively designed to support public interest analysis. We have just one generation to reach carbon neutrality and it is hard to imagine that journey being successful and also somewhat optimal in the absence of open and transparent analysis underpinned by genuinely open data and open source numerics.
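To make the idea of a framework‑neutral inventory more concrete, here is a minimal Python sketch. It is not how PowerSystemCaseBuilder.jl, PowerGenome, or SPINE actually work. The `Generator` schema, the integrity rules, and the two target formats ("framework A" and "framework B") are all hypothetical and chosen only to show one integrity‑checked inventory feeding two differently shaped inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generator:
    """Framework-neutral description of a generating unit (hypothetical schema)."""
    name: str
    fuel: str
    capacity_mw: float


def check_integrity(inventory):
    """Reject duplicate names and non-positive capacities."""
    names = [g.name for g in inventory]
    if len(names) != len(set(names)):
        raise ValueError("duplicate generator names")
    for g in inventory:
        if g.capacity_mw <= 0:
            raise ValueError(f"non-positive capacity for {g.name}")


def to_framework_a(inventory):
    """Hypothetical target A: one record per unit, capacity in MW."""
    return [
        {"id": g.name, "carrier": g.fuel, "capacity_mw": g.capacity_mw}
        for g in inventory
    ]


def to_framework_b(inventory):
    """Hypothetical target B: capacity in GW, aggregated per fuel."""
    totals = {}
    for g in inventory:
        totals[g.fuel] = totals.get(g.fuel, 0.0) + g.capacity_mw / 1000.0
    return totals


# Illustrative inventory; the units and values are invented.
inventory = [
    Generator("wind_north", "wind", 500.0),
    Generator("wind_south", "wind", 250.0),
    Generator("gas_central", "gas", 400.0),
]
check_integrity(inventory)
records = to_framework_a(inventory)
totals_gw = to_framework_b(inventory)
```

The point of the design is that cleaning and checking happen once, on the shared representation, while each modeling framework only needs a thin translation layer.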
▢