Energy Modelling Platform — Europe May 2017 meeting

robbie.morrison · 22 May 2017 13:28

EMP–E meeting

Energy Modelling Platform — Europe
Reflections on the first EMP–E meeting

17 and 18 May 2017
Joint Research Centre (JRC), European Commission
21 Rue du Champ de Mars – Marsveldstraat, Brussels, Belgium

Author: Robbie Morrison
Date: see revision history
Documentation license: Creative Commons CC BY 4.0 International

Please note:

event website
slides and videos will be available from the REEEM project in due course
the European Commission is welcome to treat this posting as input for any consultation it may undertake on open modeling, open data, and open science

Please reply below if you have comments, corrections, or additional information.

Entrance to the European Commission JRC building in Brussels

Introduction

This first EMP–E meeting aimed “to initiate a long-term forum for exchanging research, development, and practice of energy system modelling in Europe and, where feasible, promote the sharing of data and resources and improve the efficiency of research in the area” (REEEM website).

I attended the event on behalf of the Open Energy Modelling Initiative. Other people present who also participate in the openmod initiative were: Paul Deane, Mark Howells, Jonas Hörsch, Ludwig Hülk, and Berit Müller.

A number of modeling teams presented their work. These models and their results are not covered here. The purpose of this report is instead to examine key themes and threads. These observations are necessarily incomplete as I attended just two of a total of six focus groups.

These notes also contain personal impressions and interpretations, particularly in relation to the concept of openness and on open data licensing.

Quotes from German copyright law (Urheberrechtsgesetz shortened to UrhG) are from the official English translation, given as Juris (2017). According to Wikipedia, the UrhG is more akin to an author rights law than a conventional copyright law.

For the record, the models presented included: E3ME-FTT, ESI, EUCalc, InSmart, MAGIC, MEDEAS, METIS, POTEnCIA, PRIMES, REflex, SET-Nav, SIM4NEXUS.

Key themes

From my perspective, some of the more interesting themes were:

transparency versus openness: a public policy ideal and an open source ethos
energy metadata: work towards a voluntary standard
dataset license compliance: not discussed much at the meeting but covered here
model linking: soft, hard, integrated
model classification: a notoriously difficult task

Transparency versus openness

One central and often misunderstood theme (from my perspective) was the difference between transparency and openness. In the context of this meeting:

Public transparency is a public policy ideal which requires, at the least, that the model in question be fully documented and that the data used be made available for inspection, but neither necessarily under open copyright licenses. Some authors prefer to term the headline concept comprehensibility rather than transparency (Cao et al 2016:2).

Open development (or openness), on the other hand (and I concede this definition is not universal), derives from an open source development ethos, whereby publication under an open copyright license is key, irrespective of the state or usefulness of the code or data in question. Notwithstanding, the open source ethos is as much a software development paradigm as it is a copyright licensing model (Meeker 2017:ii). A strong case for openness in the energy system modeling domain can be made on both public policy and academic grounds (Pfenninger 2017).

Research reproducibility is a related ideal from the open science movement and constitutes a third prism for viewing this general theme. This perspective, grounded in good science practice, is not much considered here, although it may be the primary motivation for some individuals and teams.

Hence, transparency can be achieved using standard copyright, implicit or explicit, while openness demands an appropriate open license (or public domain waiver) be deployed. Openness allows the code and data to be legally studied, used, modified, and redistributed, the latter sometimes with conditions (copyleft provisions) to protect these freedoms.

Research reproducibility involving source code, strictly speaking, cannot be achieved under standard copyright because there is no right granted to build (interpret or compile and link) and run (execute) the program (discussed later). In terms of data and under German law, a “substantial part of a database” may be reproduced and used “for personal scientific use if and insofar as the reproduction is justified for that purpose and the scientific use does not serve commercial purposes” (§87c UrhG, emphasis added). Other legal jurisdictions do not offer this exemption. Moreover, if the data is under standard copyright, then it cannot be republished (beyond fair use, a legal doctrine originating in the US and weaker in Europe) to support reproducibility findings.

This section was difficult to draft because participants mixed and switched terminology and repeatedly referred to transparency as openness and increasing transparency as opening up.

Participants and projects were located at various points along the closed/transparent/open spectrum.

The least open was PRIMES lead Pantelis Capros, who argued that there is a trade-off between transparency and model quality. This view was challenged by a researcher from the climate modeling community, who asserted the opposite. Capros continued with regard to model know-how: “[there] must be copyright on the mathematics” he said. This statement is nonsense. Under US copyright law, “mathematical principles” and “formulas or algorithms” cannot be copyrighted (§102 of the United States Copyright Act, see US Copyright Office 2012). European copyright law (expressed through EU directives, national laws, and national and European Court of Justice (ECJ) rulings) is a little different: it does not expressly exclude mathematical concepts, but neither does it include them.

As noted, the question of “opening up” models and data was traversed often.

Legacy projects present a particular problem in terms of publication. In order to switch to an open license for a codebase which has not employed contribution agreements (which assign copyright to the project), each contributor or their respective institution must give written consent. The JRC is trying to open up one of its models and is currently undertaking this process. Projects that are open from day zero do not normally face this problem.

Several projects (no names though) have simply made their code or data available via the web, including through cloud services like Dropbox, without considering copyright issues or adding copyright license notices. Individuals from two of these projects recounted their experiences which were invariably positive. Going “open” meant additional email traffic, but important data errors were identified and valuable feedback was received. Another participant expressed concern that modeling teams would effectively need to operate “help desks” if they published their models. However, experience suggests otherwise.

Incidentally, it is a breach of copyright to run a model lacking an open license under US law (see next sentence) and German law (see §69c UrhG for what is permitted). In relation to GitHub explicitly but applicable more generally: “unfortunately, if no licensing terms are applied, the default is that no rights are granted to use the software.” (Meeker 2017:148, emphasis added)

Some participants raised valid points countering the publishing or open sourcing of existing projects. The first concerns the intellectual and financial investment in the project to date. Whether to regard that investment as sunk or not is a matter for each team.

The second point concerns academic reputation. The open source mantra of “release early and release often” does not readily apply in this context. Academic teams may instead want a degree of polish before placing their codebases and datasets on public view.

One person remarked that published models (with or without open licenses) would only be of interest to associates of that project. A representative from a TIMES variant (not sure which particular one) countered by saying they averaged seven downloads per week and have no idea where these requests originate.

Open sourcing need not be an end in itself. The real objective may be to generate a community around the project in question. This community can then use and develop the software and also document the project (often on wikis) and assist with collective support (via mailing lists and forums).

Participants from earth science disciplines say this process is exactly what happened within the climate and ocean modeling communities over the last decade. Furthermore, there was a rationalization of models and a consolidation of effort. Whether this same dynamic will develop in the more diverse energy systems modeling domain remains unanswered. But I suspect it will.

Patents were not mentioned during the meeting. Software patents do not seem to be a material issue for energy system modeling.

Although this section has so far discussed code and data together, there are very distinct differences between these two camps. The legal context for open data is where open source was a decade or so ago: widespread confusion over licensing and license types, limited case law, and little or no robust analysis. The remainder of this section focuses on issues involving data.

Code and data are divergent in terms of reliability assessment. Source code can be code reviewed and running programs can be tested for fidelity. But assessing the reliability of conventional data requires a knowledge of its provenance and any data cleansing that has occurred en route. Some sites offer the ability for users to comment on and even correct spurious entries.

There is another approach. Public observations and publicly available facts (there is a lower bound on what can be subject to copyright due to an absence of creativity) are increasingly being crowdsourced, databased, and published on the web. Relentless contributions can improve the accuracy and scope of the collection.

Publishing data can also raise issues of privacy, both personal and commercial, in a way that code does not. This is not a direct concern for crowdsourced data, but big data nonetheless carries its own ethical considerations.

While not discussed at the meeting, the legal issues related to derived works created by interrogating open energy database projects are quite simply a nightmare. Arbitrary multi‑license, multi‑author works can be created in a blink using SQL (single database) and SPARQL (cross‑site) requests. License incompatibilities are difficult to avoid and manage, and adding back the legally necessary license notices can be complicated and difficult to automate. Even complying with the attribution-only notice requirements from permissive licenses in such circumstances is onerous (Meeker 2017:260).

Similar issues apply to combined datasets created on the client side rather than the server side. That is, collections of input data combined and assembled manually may not be much different in legal terms. Republishing will require a careful stock-take of all licensing requirements, including database protection provisions (see European Union 1996).
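As a rough illustration of what such a stock-take might look like in practice, the Python sketch below inventories the licenses and attribution notices carried by a combined dataset. The source names, the attribution strings, and the "NONE" marker for material without a stated license are all hypothetical. The sketch only lists what is present and flags mixed or missing licenses. It makes no attempt to decide whether any two licenses are legally compatible, which remains a matter for human judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One input to a combined dataset (all values below are made up)."""
    name: str
    license_id: str   # identifier as stated by the publisher, "NONE" if absent
    attribution: str  # notice text the publisher asks to be reproduced


def stock_take(sources):
    """Summarise the licenses and notices carried by a combined dataset."""
    licenses = sorted({s.license_id for s in sources})
    return {
        "licenses": licenses,
        # More than one license means compatibility needs human review.
        "mixed": len(licenses) > 1,
        # Material without any stated license defaults to all rights reserved.
        "unlicensed": sorted(s.name for s in sources if s.license_id == "NONE"),
        "notices": sorted({f"{s.attribution} ({s.license_id})" for s in sources}),
    }


sources = [
    Source("plants", "CC-BY-4.0", "Example Project A"),
    Source("demand", "ODbL-1.0", "Example Project B"),
    Source("prices", "NONE", "Unknown publisher"),
]
report = stock_take(sources)
for notice in report["notices"]:
    print(notice)
```

Even this toy inventory shows why automation is hard: the notices are easy to collect, but the "mixed" and "unlicensed" flags only mark where the legal analysis has to begin.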

Notwithstanding, one team present at the meeting (not named here) simply ignored the matter of copyright and any associated restrictions and license notice requirements. Indeed, they republished their combined dataset on the Elsevier website and on Dropbox. The team leader opined it was better to ask for forgiveness than seek permission. Their university administration may take a different view, however.

Finally, only about half the live open energy database projects support structured queries. The other half simply act as file servers, albeit with application programming interfaces (APIs) to support programmatic access.

A voluntary energy metadata standard

Ludwig Hülk presented work in progress on a voluntary energy metadata standard. His proposal uses JSON, a hierarchical (tree-based) human and machine-readable format. The standard would be license-agnostic, supporting all rights reserved copyright, Creative Commons licenses, and public domain waivers, among others. Background information and discussions related to this focus group are located here.

A JSON mock-up is available too. Focus group participants felt that a format field would be a necessary addition, in order to suitably identify, for example, the presence and characteristics of an associated comma-separated values (CSV) data file.
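To make the idea concrete, here is a minimal Python sketch of what a license-agnostic JSON metadata record with a format field might look like. Every field name and value is an assumption made for illustration and is not taken from the actual proposal or mock-up.

```python
import json

# Hypothetical record: field names are illustrative, not the draft standard.
metadata = {
    "title": "Example hourly load series",
    "description": "Illustrative record, not taken from any real dataset",
    # License-agnostic: this could equally name a Creative Commons license,
    # a public domain waiver, or "all rights reserved".
    "license": {"id": "CC-BY-4.0"},
    "resources": [
        {
            "path": "load.csv",
            # The format field suggested by the focus group participants.
            "format": "csv",
            "dialect": {"delimiter": ",", "header": True, "encoding": "utf-8"},
        }
    ],
}

REQUIRED_FIELDS = ("title", "license", "resources")


def missing_fields(record):
    """Return the required top-level fields absent from a record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def resources_without_format(record):
    """Return the paths of resources that do not declare a format."""
    return [
        resource.get("path", "<unnamed>")
        for resource in record.get("resources", [])
        if "format" not in resource
    ]


# JSON is both human-readable and machine-readable: serialize and parse back.
text = json.dumps(metadata, indent=2)
round_trip = json.loads(text)
print(text)
```

A check like resources_without_format is the kind of simple validation a voluntary standard would make possible across projects.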

See below for comments on the JRC energy metadata proposal.

In my view, the development and first release of an energy metadata standard for open data should be a priority for the open energy modeling community. Close behind this is the question of promoting voluntary standards for energy data. And close behind that is the question of open data licensing and license management.

JRC–IDEES energy database and licensing

IDEES stands for the Integrated Database of the European Energy Sector, under development by the Joint Research Centre (JRC) of the European Commission. The IDEES database will offer a comprehensive decomposed database of energy use within Europe (Mantzos 2016). The database should go live later this year, 2017 (see presentation by Tobias Wiesenthal). Persistent URIs to individual datasets will be provided for programmatic access, but versioning was not discussed. There will be internal and external versions of the platform. IDEES will also underpin the JRC POTEnCIA model, currently under development.

The database will initially span the years 2000–2015 for all member states. An iterative consultation process will take place to determine a common reference for future energy policy assessments within the European Energy Union.

Dataset licensing is to be governed by the JRC policy on data, given by Doldirina et al (2015). This means that the “acquisition of data by the JRC from third parties shall, where possible and feasible, be governed by the Open Data principles, and all efforts shall be made to avoid imposition of restrictions to their access and use by the JRC and subsequent users” (Doldirina et al 2015:6). The Open Data principles (also page 6), however, are silent on the right of ordinary users to distribute original and modified works.

With regard to Commission-sourced data, some kind of attribution license (perhaps the EU reuse and copyright notice, see European Commission 2011) was indicated when appropriate (see the presentation by Andreas Zucker). It would be great if the Commission could state exactly which open license they intend to use for these data contributions.

Metadata is to follow the JRC Data Policy Implementation Guidelines. As of May 2017, these guidelines are not yet public.

In conclusion, it seems that the use of open licenses in cases where the data provider would agree has not been satisfactorily traversed and resolved. It therefore remains to be seen whether the IDEES database will deliver openness or merely transparency (both terms as defined earlier). Openness would provide data that modelers can use and adapt to their own requirements and then republish to satisfy their own needs for transparency and scientific reproducibility. This would also provide downstream researchers with that exact same opportunity.

Open access publishing

The European Commission has policies on open access publishing. Under the Horizon 2020 research funding programme, open access publishing should become routine (European Commission 2017).

The European Commission is to be commended for this initiative, which will mean that the interested public can easily and legally access and circulate these publications.

Energy model linking

Although discussed for many years, model linking has become a major research focus. A number of projects now link software, often in complex ways using up to five different stand-alone models. Linking may be implemented manually (soft), script-wise (hard), or programmatically (integrated).

Exchange may be better achieved using parameterized functions than simply passing scalars or arrays.
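The difference between the two exchange styles can be sketched in Python as follows. The function names, the linear supply curve, and all the numbers are assumptions chosen for illustration and do not describe any model presented at the meeting. In the first style the upstream model hands over a fixed array of prices. In the second it hands over a function, so the downstream model can evaluate prices at whatever quantities it actually needs.

```python
from typing import Callable, Sequence


def cost_from_array(prices: Sequence[float], demand: Sequence[float]) -> float:
    """Array exchange: prices were fixed upstream, one per period."""
    return sum(p * d for p, d in zip(prices, demand))


def make_linear_supply_curve(intercept: float, slope: float) -> Callable[[float], float]:
    """Parameterized exchange: upstream hands over price as a function of quantity."""
    def price(quantity: float) -> float:
        return intercept + slope * quantity
    return price


def cost_from_function(price: Callable[[float], float], demand: Sequence[float]) -> float:
    """Downstream evaluates the curve at its own demand levels."""
    return sum(price(d) * d for d in demand)


demand = [10.0, 20.0]

# Upstream sampled a single price and passed it on as an array.
prices = [30.0, 30.0]
print(cost_from_array(prices, demand))

# Upstream instead passed the curve, so price responds to quantity.
curve = make_linear_supply_curve(intercept=20.0, slope=1.0)
print(cost_from_function(curve, demand))
```

With the array, the downstream model inherits whatever operating point the upstream model assumed. With the function, the response to quantity travels across the link as well.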

Energy system models may be coupled to specialized domain models. For instance, an energy system optimization model that includes heating and transport was coupled with air dispersion and health impact models to assess the case for stricter air quality regulations.

Several projects couple a computable general equilibrium (CGE) model with a high-resolution multi-sector dispatch model, in an attempt to address the weaknesses inherent in both. The practice remains a research question with a number of issues still unresolved.

As an aside, several participants commented that the European Commission is asking modelers to analyze increasingly complex public policy questions.

Energy model classification

Keynote speaker Michael Grubb touched briefly on the topic of model classification. Michael suggested that at least eight dimensions are needed to capture model characteristics (given on one of his slides, which I guess will be available shortly) and that this is way too many dimensions to sensibly cluster and interpret.

As an aside, Michael said that project financing was a major real world issue affecting the uptake of new renewables technologies. Incorporating financing dynamics into energy models would represent an interesting new line of inquiry.

Energy Modelling Platform — Europe

Somewhat surprisingly, the future shape and organization of the EMP–E did not receive much attention. However, the following were discussed in this context:

backcasting and the need to better understand past energy system trajectories
model comparison exercises (a practice common in the physical sciences)

Closure

It is enormously encouraging that the European Commission is moving towards transparency and/or openness in the area of energy policy models. Notwithstanding, the Commission does need to select an open data license for its JRC–IDEES database for material for which standard copyright need not apply. Even so, the licensing and license management of open data remains a major, difficult, and perhaps even intractable problem.

The difference between transparency and openness needs to be accorded greater recognition and care. Publishing under default copyright confers on users and modelers very different rights from publishing under open licenses or public domain waivers. This is equally true for code and data, although the licensing details employed will differ in each case. Research reproducibility also demands, effectively, the use of open licenses.

References

Cao, Karl-Kiên, Felix Cebulla, Jonatan J Gómez Vilchez, Babak Mousavi, and Sigrid Prehofer (28 September 2016). “Raising awareness in model-based energy scenario studies — a transparency checklist”. Energy, Sustainability and Society. 6: 28–47. ISSN 2192-0567. doi:10.1186/s13705-016-0090-z. Open access.

Doldirina, Catherine, Anders Friis-Christensen, Nicole Ostlaender, Andrea Perego, Alessandro Annoni, Ioannis Kanellopoulos, Massimo Craglia, Lorenzino Vaccari, Giacinto Tartaglia, Fabrizio Bonato, Jean-Paul Triaille, and Stefano Gentile (2015). JRC data policy — Report EUR 27163 EN. Luxembourg: Publications Office of the European Union. ISBN 978-92-79-47104-9. doi:10.2788/607378.

European Commission (14 December 2011). “Commission decision of 12 December 2011 on the reuse of Commission documents — 2011/833/EU”. Official Journal of the European Union. L 330: 39–42.

European Commission (21 March 2017). H2020 Programme: guidelines to the rules on open access to scientific publications and open access to research data in Horizon 2020 — Version 3.2. Brussels, Belgium: European Commission Directorate-General for Research and Innovation.

European Union (1996). Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases.

Juris (2017). Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG) — Amendments to 20 December 2016 — Official translation. Saarbrücken, Germany: Juris.

Mantzos, Leonidas (1 March 2016). Introducing the JRC-IDEES database — Presentation. Brussels, Belgium: European Commission Joint Research Centre (JRC). IDEES stands for Integrated Database of the European Energy Sector.

Meeker, Heather (4 April 2017). Open (source) for business: a practical guide to open source software licensing (2nd ed). North Charleston, South Carolina, USA: CreateSpace Independent Publishing Platform. ISBN 978-154473764-5.

Pfenninger, Stefan (23 February 2017). “Energy scientists must show their workings”. Nature News. 542: 393. doi:10.1038/542393a.

REEEM. “Energy Modelling Platform for Europe (EMP-E) 2017 in Brussels”. REEEM (Role of technologies in an energy efficient economy — model based analysis policy measures and transformation pathways to a sustainable energy system). Accessed 2017-05-19.

US Copyright Office (January 2012). Ideas, methods, or systems — Circular 31. Washington DC, USA: United States Copyright Office.

▢

berit.mueller · 1 June 2017 14:32

There is a second site where the presentations, posters and speeches will be linked:

http://www.reeem.org/index.php/emp-e-main/

robbie.morrison · 16 August 2017 12:10

URLs

The agenda with links to the slides (PDF) and videos (MP4) is here:

http://www.reeem.org/index.php/emp-e-main/

The posters (PDF) from the networking space are here:

http://www.reeem.org/index.php/emp-e-poster/

And some pictures of the event are here:

http://www.reeem.org/index.php/emp-e-pictures-impressions/

The remarkable comments by Pantelis Capros on the trade-off between transparency and quality (00:21:30–00:23:30) and copyrighting the mathematics used in energy models (00:23:10), as discussed in the main article, are viewable on this YouTube video.

robbie.morrison · 20 February 2018 13:55

The posting above eventually turned into a fully-fledged paper:

Morrison, Robbie (April 2018). “Energy system modeling: public transparency, scientific reproducibility, and open development”. Energy Strategy Reviews. 20: 49–63. ISSN 2211-467X. doi:10.1016/j.esr.2017.12.010. Open access.