Hacking energy sector data

@robbie.morrison took the liberty of transferring a recent posting on data hacking by @tom_brown across from the openmod mailing list to this forum:

There are some copy‑edits here to better suit markdown and improve clarity. But that said, no substantial changes to the content were made.

Introduction

The open energy modeling world has changed since the openmod was founded in September 2014. There are more open modeling frameworks than anyone can count. Research journals and funding bodies increasingly demand open models for studies. In some senses, the initial vision of openmod has been realized, but at the same time I think it is under‑performing compared to its potential.

In particular, the community could do more on data and data quality, which could be called “the new frontier”, and specifically:

  • catalog on the openmod wiki what data is available to help us better discover data
  • crowdsource currently unavailable data
  • catalog model implementations for different regions, to avoid people constantly building new models from scratch

More details below, along with concrete action steps.

Obviously this posting is a personal view by @tom_brown and reflects his own biases and interests.

Cataloging data on the openmod wiki

There is still a need to catalog data and make known what is available. This is distinct from projects like the Open Energy Platform (OEP) which host data. We’re still missing the first step of identifying what is available and linking it. This is not in competition with hosting platforms like OEP, which can be used in a second step to host data in a uniformly accessible way.

We have data pages on the wiki which need filling!

This has worked well on “transmission network datasets”:

but there are big gaps elsewhere, including industry, detailed demand data, and gas networks.

There are virtuous network effects here: if the wiki pages become a standard reference, meaning the first place to look for data, then everyone will want to list their data there.

We can also use the wiki pages to identify gaps in the datasphere.

Concrete action

Take a few minutes to look at pages like the one below and add links to databases (open or commercial) that you know about:

Crowdsourcing data

We should also be crowdsourcing data.

Example: datasets like worldwide steel and cement plants typically have 2000 entries. This is more than a single person can collect, but doable for a team of 10–20 (roughly 100–200 entries each).

Openmod could be organizing data hackathons.

We could be creating open datasets where every data point is referenced from a press release or other official source.

Or even better, contributing to existing open projects, like the Global Steel Plant Tracker:

We seem to spend 90% of our time talking about metadata and licensing, only 10% about the data itself, so let’s reverse this ratio!
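To make the idea of a fully referenced dataset a bit more concrete, here is a minimal sketch in Python of what one crowdsourced entry could look like, with a check that flags entries still lacking a source. All names here (`PlantRecord`, its fields, and `unreferenced`) are hypothetical and chosen only for illustration, and the two entries are placeholders rather than real plants or real sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlantRecord:
    """One crowdsourced data point; all field names are illustrative assumptions."""
    name: str
    country: str
    capacity_mt_per_year: float
    source_url: str   # press release or other official source
    contributor: str  # who added the entry


def unreferenced(records: List[PlantRecord]) -> List[PlantRecord]:
    """Return the records whose source field does not hold a web reference."""
    return [
        r for r in records
        if not r.source_url.strip().lower().startswith(("http://", "https://"))
    ]


# Placeholder entries, not real plants or real sources.
records = [
    PlantRecord("Example Steelworks A", "XX", 1.5, "https://example.org/press/a", "alice"),
    PlantRecord("Example Steelworks B", "XX", 0.8, "", "bob"),
]

missing = unreferenced(records)
for r in missing:
    print(f"needs a source: {r.name} (added by {r.contributor})")
```

A check like this could be run at the end of a hackathon to list the entries that still need a reference before the dataset is published. It only tests that a link is present, not that the link actually supports the data point, so a human review step would still be needed.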

Concrete action

Volunteer to organize a hackathon! Identify some missing data whose collection could be coordinated over a one‑day hackathon and recruit volunteers over the mailing list. Make sure it’s doable, and try to make sure there is some social element, such as regular coffee breaks and maybe a zoom dinner afterwards, so it’s not all hard work.

Cataloging model implementations for different regions

I have seen or heard of three different open models of the Chinese power system in the past few days. I’m not sure these projects know about each other. I’ve started a page here:

There is also some overlap with the OEP factsheets here:

But the idea would be to organize by region.

Concrete action

Contribute to the above page on the wiki, which lists the open model implementations for each region, with links: for example, all the different implementations in SWITCH, TIMES, calliope, oemof, PyPSA, and so on.

Thanks! All comments, feedback welcome!

Thanks a lot @robbie.morrison and @tom_brown for this initiative. We are already tackling the cataloging issue in the LOD-GEOSS project. We want to use the DBpedia Databus as a catalog for all kinds of data, to ease data discovery and to track data modifications and updates through unique identifiers for each dataset. We have developed the general architecture, and some documentation is coming to this forum soon. We are currently developing some demonstrations of the Databus and the catalog, which will also follow soon.

DBpedia Databus project and distributed data architectures

Note that the DBpedia Databus project was covered in a recent community submission (Morrison 2020, plus five other submitters). Here is figure 1 (page 8) from that document:


Figure: Schematic showing various components of a distributed data architecture (DDA) being developed within the open energy modelling community. Legal interoperability and shared data semantics are two essential requirements. Credit: Genaro Longoria and Robbie Morrison. CC‑BY‑4.0 license (see the PNG metadata for details).

Distributed data architectures (DDA), such as the one depicted, offer advantages over standalone data portals. In some respects, a DDA can be interpreted as a virtual portal. Also important is that Creative Commons CC‑BY‑4.0 and CC0‑1.0 licensing (or something inbound compatible if you must) is a necessary condition for a DDA, like the DBpedia Databus, to operate successfully.

References

Morrison, Robbie (30 May 2020). Submission on a European strategy for data with an emphasis on energy sector datasets — Release 08. Creative Commons CC BY 4.0 license.

Thanks @robbie.morrison for posting! A nice recent example of a crowdsourcing effort for open energy data:

Dan Stowell, @jack_kelly, Damien Tanner, Jamie Taylor, Ethan Jones, James Geddes & Ed Chalstrey, A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK, Scientific Data (2020)

“We present the results of a major crowd-sourcing campaign to create open geographic data for over 260,000 solar PV installations across the UK, covering an estimated 86% of the capacity in the country.”