Ithaka questionnaire on community data usage

robbie.morrison · 23 February 2021 08:03

1. What does the energy modelling community look like? What kinds of researchers are involved in this work (e.g. disciplinary and organizational affiliations), how do they collaborate, and what kinds of formal structures have been established to organize them?

The community arose in Berlin, Germany in September 2014. Most people involved are completing their higher education or classify as early‑stage researchers. A few are mid‑stage researchers and beyond. And some work for consultancies, companies, start‑ups, or government agencies.

Geographically, the community started in the German‑speaking DACH world, later spread to the United States, and is now making inroads into the United Kingdom. Other participants are sprinkled throughout the planet, including the Russian Federation, India, and the Global South. Aside from the first workshop, the working language has always been English.

The community has no formal structures. Its ethos derives from open source software development. By common understanding, those running the various online services or twice‑annual physical workshops are accorded complete dominion. The mailing list is the principal place for making community decisions.

Much of the discussion that follows below centers on European Union law — in part, because Europe provides a more restrictive legal context for data than that found in the United States. But this focus is equally a reflection of our roots.

2. How and what kinds of data are typically incorporated into energy modelling?

Modelers do not generally deal with personal information — as defined under EU law. If such information is required for numerical models, it can normally be anonymized from real data, generated using estimated statistics, or otherwise synthesized — the key issue is that the information remains representative but need not be exact.
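As a rough illustration of generating representative data from estimated statistics, the sketch below draws synthetic annual household electricity demands from a normal distribution and rejects non-physical negative values. The function name, the mean of 3500 kWh, and the standard deviation of 1200 kWh are all hypothetical placeholders, not figures from any real dataset, and a real project might well prefer a different distribution.

```python
import random
import statistics


def synthesize_annual_demand(n_households, mean_kwh, stdev_kwh, seed=0):
    """Draw synthetic annual demands (kWh) from a normal distribution.

    Negative draws are rejected, so the result is a truncated normal.
    All parameter values are assumptions supplied by the caller.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    demands = []
    while len(demands) < n_households:
        value = rng.gauss(mean_kwh, stdev_kwh)
        if value > 0:  # reject non-physical negative demands
            demands.append(value)
    return demands


# Hypothetical statistics, for illustration only.
sample = synthesize_annual_demand(1000, mean_kwh=3500.0, stdev_kwh=1200.0)
print(round(statistics.mean(sample)), round(statistics.stdev(sample)))
```

The synthetic sample contains no real household records, yet its summary statistics stay close to the estimates it was drawn from, which is the sense in which such data is representative rather than exact.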

Energy system models require general information about component technologies, such as windfarms, coal‑fired electricity generation, and high‑voltage transmission lines, together with their engineering and cost characteristics. Cost information is necessarily estimated in most cases because this information is normally commercially sensitive. Notwithstanding, the European Commission, as well as other governing agencies around the world, could collect cost and performance information under a public interest rationale and make key metrics available in generic form. Future cost and performance projections, sometimes also subject to technological learning, are necessarily speculative.

Energy system models require specific details about the system being modeled — including the location, age, and connectivity of all represented assets. That includes information about the networks under investigation — usually the electricity grid but perhaps also gas and district heat infrastructure. Current and potential future demand profiles are needed. Location‑specific resource potentials are needed too, including solar and wind assessments and land availability. And possibly also information concerning the built environment and mobility, depending on the scope of the model. Some models may also require historical market clearance information or information on how households and firms may take short‑run and long‑run decisions.

The bulk of models capture national and supra‑national systems but some research groups investigate municipal systems, islanded microgrids, and standalone systems. Most research questions provide natural boundaries.

Some of the information indicated above is subject to statutory reporting. But the processes for assembling and publishing that information are often archaic and error prone, leading to poor quality disclosure. Projects within the openmod community assemble and curate this information so it can be more readily utilized by modelers and analysts. One such project is the OPSD portal.

Information on future climate patterns is sometimes required but this information can be readily sourced from the climate science literature and is not legally encumbered.

Most of the modeling within the community is intended to inform public policy options for our rapid trajectory to net‑zero carbon. Research either concentrates on methodologies or seeks to provide policy‑relevant results and insights.

3. What infrastructure is currently available to facilitate the sharing of this data among researchers?

Within the orbit of the openmod, the Open Energy Platform (OEP) is the primary resource. This platform is specifically designed to handle the needs of energy system modeling and, in particular, scenario analysis. Energy system modeling differs from other forms of computational science in that testable outcomes are not possible and a range of speculative scenarios, each with its own explicit objectives, constraints, and assumptions, must instead be analyzed and traded off against one another.

In addition, there are initiatives specifically aimed at allowing data to be transferred between different modeling projects in order to facilitate cross‑model comparisons. Each model has necessarily evolved its own data interface and internal semantics.

4. Why is open data sharing important to energy modelling? What are the typical positions on this issue among stakeholders engaged with energy modelling?

We adopt the European Commission description for open data (EU Directive 2019/1024, recital 16):

Open data as a concept is generally understood to denote data in an open format that can be freely used, re‑used and shared by anyone for any purpose

Data sharing reduces duplicated work, improves data quality and coverage, and facilitates cross‑model comparisons — that last point being necessary for strengthening confidence in both the direct results and in subsequent interpretations.

Conversely, data without appropriate open licensing may well be legally encumbered and this lack of certainty makes it unsuitable for open modelling.

5. What challenges or barriers to widespread data sharing are unique to research involving energy modelling?

Our primary challenge is the lack of open licensing, particularly on public sector information and information published under statutory reporting. European Union legislation on the terms of use of public sector information is unclear and contradictory and legislation on energy sector disclosure is silent on licensing. These defects need fixing at the level of the European Parliament. The best that researchers can do until then is to push relentlessly for Creative Commons CC‑BY‑4.0 licensing on all such information.

That means that suitable open licensing is key. In most cases, such licenses do not grant binding permissions but rather confer certainty. That certainty matters particularly given the presence of Directive 96/9/EC database protection within the European Economic Area (EEA), under which one cannot know whether a data extraction from a public portal was insignificant or not.

The power exchanges that run the wholesale electricity markets are particularly resistant to providing disclosed information in any kind of usable form, and deploy techniques such as serving data that cannot be highlighted and copied in order to frustrate recovery. This is certainly against the spirit of the legislation, even if technically compliant.

Another emerging problem is the proliferation of national open data licenses — such as the recent German Government dl‑de/by‑2‑0. Such licenses could well lead to legally siloed data when not inbound compatible with the CC‑BY‑4.0 license, even if only on some trivial legal point.

Data lacking CC‑BY‑4.0 licensing (or CC0‑1.0 waivers or something inbound compatible) is particularly problematic in the United Kingdom because the threshold for copyright is effort‑based and addressable collections of data may also attract database protection. The situation in the United States is considerably better because datasets and databases are unlikely to be intellectual property. Europe falls somewhere in between.

6. What are the most important supports needed in order to cultivate a thriving data community among energy modelers?

Recognition by science funding organizations of several necessities would help. First, the need to require suitable licenses on all appropriate outputs. Second, the need to support ongoing maintenance once the underlying data projects have completed. Third, the need to provide stable online archiving for non‑deliverable artifacts such as project websites, wikis, public mailing lists, and code repositories.

But beyond that, most solutions have to come from within the modeling community.

7. How is openmod working to address the open data sharing needs of the energy modelling community? Who else is doing important work in this area and what else is on the horizon?

For the openmod, the concept of genuinely open data was central from day one. But maturity has brought forward two vitally important related agendas:

a community ontology — a shared worldview
agreement on collection protocols and metadata — the latter being data about data

Both initiatives are interconnected, both involve deep buy‑in from within the community, and both will take significant effort to work through and bed in. The Open Energy Ontology is addressing the first and the EERAdata initiative is pursuing the second. The EERA and openmod communities have begun to work together on the latter.

Open is not the only paradigm for energy system modeling. Another is the closed consortium that effectively remains only within the reach of government ministries, multilateral agencies, and allied research teams. How that paradigm evolves in an increasingly open world remains to be seen. In any case, there is virtually no crossover between these two realms at present. A third paradigm is the single‑institution closed project — and again one whose future looks doubtful.

An upcoming challenge is the tracking of both data provenance and data versioning at scale — taken together these represent active research questions in computer science and are certainly not unique to the domain of energy analysis.
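One simple building block for this kind of tracking is a content fingerprint: hash a canonical serialization of the data, then record that hash alongside the source, license, and parent version. The sketch below is only an illustration under stated assumptions. The function names, the record fields, the plant data, and the "hypothetical-portal" source label are all invented for the example and do not describe any existing openmod tooling.

```python
import hashlib
import json


def dataset_fingerprint(rows):
    """Return a SHA-256 hex digest of a canonical JSON form of the rows.

    Sorting keys makes the digest independent of dictionary key order.
    Row order still matters in this simple sketch.
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_record(rows, source, license_id, parent=None):
    """Bundle a fingerprint with minimal provenance fields (hypothetical schema)."""
    return {
        "fingerprint": dataset_fingerprint(rows),
        "source": source,
        "license": license_id,
        "parent": parent,  # fingerprint of the version this one derives from
    }


# Invented example data: an upstream table and a locally corrected fork.
original = [{"plant": "A", "capacity_mw": 400}, {"plant": "B", "capacity_mw": 150}]
corrected = [{"plant": "A", "capacity_mw": 420}, {"plant": "B", "capacity_mw": 150}]

upstream = provenance_record(original, "hypothetical-portal", "CC-BY-4.0")
fork = provenance_record(corrected, "local correction", "CC-BY-4.0",
                         parent=upstream["fingerprint"])
print(fork["parent"] == upstream["fingerprint"])
```

Because the fork stores the fingerprint of the version it derives from, a chain of such records lets a later reader trace a modified dataset back to its origin. Doing this at scale, across many projects and formats, is the harder open problem.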

The prospect of supporting and using linked open data (LOD) is now surfacing. Some in our community are working with the DBpedia Databus project to explore the possibilities that semi‑smart knowledge graph systems can offer.

Returning to the present, another issue is dataset forking and fragmentation. Under this process, researchers grab whatever data they need for the issues at hand, modify it to suit their needs, and perhaps later publish as a static archive to support transparency and reproducibility. But any corrections and improvements are not propagated back upstream for wider uptake and benefit. LOD clearly has the potential to assist here.

Finally, members from within the openmod community make written and oral submissions to European Union public consultations on law reform and science policy. Making one's voice heard in such processes is an important and necessary activity.