Legal issues related to an energy modeling knowledge commons

Release | 11

Introduction

The following discussion traverses key legal issues — as I see them — in relation to building a knowledge commons from the perspective of the open energy system modeling community.[1]

That community is informally structured as the Open Energy Modelling Initiative or openmod. I cannot speak on their behalf, so I use the more generic term “modeling community” here instead to record my observations.

Community members have been engaged in open data and data licensing since the inception of the openmod in 2014. Indeed the concept of genuinely open data was foundational. These two grouped listings capture some of our more recent concerns and efforts:

This posting is somewhat ad‑hoc — in part because our positions are mostly reactive. We are not well placed to propose sweeping changes to public policy and legislation in this realm. The focus here is nonetheless Europe, although our membership is geographically diverse.

Our main strategy is defensive in the sense that we advocate for Creative Commons CC‑BY‑4.0 licensing on primary data as the best solution. We do so because European legislators, at least, have historically shown little interest in providing for a genuine knowledge commons for information of public interest (a limited critique of the Open Data Directive follows later). Accompanying metadata, in contrast, should be released under Creative Commons CC0‑1.0 waivers to minimize friction.

The posting covers only information that has been or can be made public. That then takes matters of commercial and personal privacy off the table in relation to these discussions.

Mixed-content material

There are two significant legal corner cases that have not attracted much attention: mixed‑content material and semantic standards (the latter covered in the next section). These types of artifacts occur regularly in numerical analysis, particularly analysis serving public interests. The reason is that public interest analysis necessarily places greater emphasis on repeatability, and thus on mixed‑content methods, and on consistency, hence semantic standardization. Taken together, both contribute to the somewhat fuzzy notion of transparency.

I recently blogged on this topic:

So mixed‑content material includes code, data, and written documentation presented together as one collective work. A common example would be a Jupyter Notebook containing the artifacts needed to run a particular rapid decarbonization scenario and then allow one to confirm current outputs against previously reported feasibilities and attributes.[2] A couple of points arise in relation to mixed‑content material specifically:

  1. software should not be excluded from the concept of knowledge — even though software copyright is a specialist form of intellectual property

  2. European 96/9/EC database rights are a significant problem — indeed they are, arguably, the major legal impediment for public interest numerical modeling

  3. combined license usage needs more consideration — in this case, the mixing of content and software licenses to gain full spectrum coverage of copyright, 96/9/EC rights, and patents, both software and general

Point 2 gives rise to the thought that mixed‑content material could well be viewed as a 96/9/EC database and subject to associated rights. This suggestion is perhaps fanciful, but the legal definition provided in the database directive is extremely wide‑ranging.[3] Worth noting too that because 96/9/EC rights attach automatically, there is always legal uncertainty concerning data and mixed‑content material. Moreover, most institutions in the systems modeling space are extremely risk‑averse.

The only real solution is the use of open licenses that expressly traverse and waive 96/9/EC rights (which excludes all versions of the Creative Commons suite earlier than 4.0) and waive patents. In my blog, I propose CC0‑1.0 AND MIT be applied to this kind of material.[4]

Semantic standards

Published specifications that inform semantics can equally influence data schemas and software designs. The issues described in this section are likely to be more prominent for systems modeling than for statistical analysis. Semantic standards fall under the umbrella of multi‑role material in the sense that they are intended to influence both data collection and interpretation and code development — in essence, the way practitioners conceive and agree upon the systems they work with. Formal ontologies fall under the rubric of semantic standards.

The following presentation examines the possibility that non‑open semantic standards can potentially taint informed works:

Again, the only real solution I see is the use of CC0‑1.0 OR MIT licensing if the objective is maximally permissive usage.[5] Indeed, I recently suggested that the Open Energy Ontology project covering energy systems modeling adopt this particular arrangement.

That said, the use of proprietary semantic standards is common in the energy domain. Many of those standards are published by the IEEE and remain both legally encumbered and expensive. Both attributes provide barriers to the development of an open knowledge commons.

I acknowledge the ideas in this and the previous section on combined licensing and on standards‑informed derivative works are speculative — but that does not necessarily make them any less valid or important.
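
The difference between the AND connector proposed for mixed‑content material and the OR connector suggested for semantic standards can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not a full SPDX parser: the helper name `license_options` is hypothetical, and only flat expressions without parentheses or WITH clauses are handled.

```python
def license_options(expression: str) -> list[frozenset[str]]:
    """Return the alternative license sets a recipient may choose between.

    Hypothetical helper for illustration only. Handles flat expressions
    built from AND and OR (no parentheses, no WITH exceptions).
    AND binds more tightly than OR, as in SPDX license expressions.
    """
    options = []
    # Each OR branch is one alternative the recipient may pick.
    for alternative in expression.split(" OR "):
        # Within a branch, every AND term applies simultaneously.
        terms = frozenset(term.strip() for term in alternative.split(" AND "))
        options.append(terms)
    return options


# Conjunction: both instruments apply together to the same material.
mixed_content = license_options("CC0-1.0 AND MIT")

# Disjunction: the recipient selects one instrument or the other.
semantic_standard = license_options("CC0-1.0 OR MIT")

print(len(mixed_content), "option(s) under AND:", mixed_content)
print(len(semantic_standard), "option(s) under OR:", semantic_standard)
```

Under AND there is a single option that contains both instruments, whereas OR yields two separate single-instrument options. That is the sense in which AND layers coverage while OR offers choice.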

Public license interoperability

I believe that robust legal analysis on public license interoperability would be very useful. (Although I don’t know whether that task falls within the scope of your respective projects?) Open licenses are that subset of public licenses that meet the consensus norms established by the Open Knowledge Foundation for data and the Open Source Initiative for software.

A poor example of implementation in this context, in my view, is the Joinup Licensing Assistant (JLA) from the European Commission.

Indeed, while I advocate strongly for such tooling, I recognize that such advice needs to be underpinned by sound legal assessment. And there are a number of results that the JLA returns that certainly left me puzzled.

And therein lies part of the problem: there is very little case law covering the public licensing of data and content. The same holds for software, although there has been some litigation at least, including Geniatech v McHardy:

In fact, it was not known prior to this litigation in Germany whether the courts would even warm to the idea of public licensing, so the McHardy trial, albeit truncated, provided a very welcome result in that regard.

To return to my main point: the legal interoperability of common public licenses (which is, incidentally, directional) needs robust analysis, together with an associated discussion of the legal uncertainties present.

Perhaps starting with the common licenses: CC‑BY‑4.0, ODbL‑1.0, CDLA‑Permissive‑2.0, EUPL‑1.2, DL‑DE‑BY‑2.0, and OGL‑UK‑3.0.[6] My view is that none of these other listed instruments is inbound‑compatible with CC‑BY‑4.0 — and their use naturally creates license‑delineated information silos.
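
The directional nature of license compatibility can be made concrete with a small directed lookup. The sketch below is illustrative only and is not a legal assessment: the name `INBOUND_COMPATIBLE` and the helper `blocking_sources` are hypothetical, the CC0‑1.0 entries are my assumption, and the absence of the other listed licenses simply encodes the view stated above.

```python
# Directed pairs (source license, target license): material under the source
# is assumed usable inside a work released under the target.
# Illustrative entries only. This is NOT a legal assessment.
INBOUND_COMPATIBLE = {
    ("CC0-1.0", "CC0-1.0"),
    ("CC0-1.0", "CC-BY-4.0"),    # assumption: waived material can flow into attribution-licensed work
    ("CC-BY-4.0", "CC-BY-4.0"),
}


def blocking_sources(sources: list[str], target: str) -> list[str]:
    """Return the source licenses that cannot flow into the target license."""
    return [s for s in sources if (s, target) not in INBOUND_COMPATIBLE]


# Direction matters: CC0-1.0 into CC-BY-4.0 passes, the reverse does not.
print(blocking_sources(["CC0-1.0", "CC-BY-4.0"], "CC-BY-4.0"))  # []
print(blocking_sources(["CC-BY-4.0"], "CC0-1.0"))               # ['CC-BY-4.0']

# Sources outside the table are flagged, which is how silos arise.
print(blocking_sources(["CC-BY-4.0", "ODbL-1.0", "DL-DE-BY-2.0"], "CC-BY-4.0"))
```

A robust legal analysis would, in effect, populate such a table with defensible entries and record the residual uncertainty attached to each pair.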

Dual licensing and more specifically the use of SPDX OR and AND connectors, providing for disjunction and conjunction respectively, should also be covered (see the first two sections).

To give an indication of the size of the issue — the Linux Foundation SPDX project has harvested and identified 17 000 distinct public software licenses in the wild.

Turning back to data, a number of scientific institutions in Europe publish datasets under bespoke licenses — some public and some requiring explicit individual consent — and none classifying as open under established norms.

If you are not aware of efforts to automatically process the terms of public software licenses, this licensing ontology project might be of technical interest:

Legal opinions

You may find the following legal opinions relevant to your projects. They cover the restrictive legal context of energy data under statutory reporting within Europe:

And two related publications:

The key takeaway is that information provided under statutory reporting still gets tangled up in 96/9/EC database protection and possibly also copyright. Fixing that would require just a simple legislative change, I imagine, to remove database rights from information provided under mandatory disclosure. Better still would be to opt for Creative Commons CC‑BY‑4.0 licensing as the default in the absence of other considerations. The official SMARD site uses that exact same license, without any operational problems arising to my knowledge:

European Regulation 543/2013 establishing the Transparency Platform (section §2.23) introduces the erroneous notion of a “primary owner of the data” without articulating the property rights involved.[7] Indeed, the Transparency Platform would better be subject to open licensing, as argued in this non‑public document:[8]

  • Schmid, Eva, Ingmar Schlecht, Tomas Šumskas, and Florence Melchior (October 2019). Why do we need an open data licence on the ENTSO-E Transparency Platform? An FAQ.

I can provide your studies with considerable background regarding the ENTSO‑E Transparency Platform in this context. I have had multiple interactions with the ENTSO‑E legal division and one interaction with the board to try and resolve some of these legal issues. I can also comment on the use of their datasets by US‑based NGOs who believe the database directive does not govern their conduct.

Suitable licensing for energy sector reporting would greatly benefit civil tech projects like this initiative, layered on OpenStreetMap:

And again to stress that the open energy modeling community is desperate for statutory reporting that is free from legal encumbrance and has made considerable efforts to pursue solutions. Indeed, one success was the BNetzA SMARD site mentioned earlier.

Critical infrastructure

To digress from the legal context for a moment: there is some controversy brewing over whether the ENTSO‑E Transparency Platform should intentionally damage reported data covering critical infrastructure for reasons of system security, for example information like the geolocations of high‑voltage transmission towers. Energy system modelers are rather disturbed at that prospect because they will need to increasingly rely on second‑best solutions for their data needs. This process of information delegitimization could not happen in California, for instance, with its strong public right‑to‑know traditions.[9]

This general discussion on the specifics of critical infrastructure and how best to balance competing concerns will doubtless increase with time. And those outcomes will likely have a direct bearing on the development of both official and citizen‑based data portal projects and the associated public interest analyses that depend on these primary sources for information.

Case law

This evolving case law is highly material, I would say: Gesellschaft für Freiheitsrechte v Free State of Bavaria, with 96/9/EC database rights being pivotal.

I blogged about that case as follows — and while my analysis is probably of limited interest, the links to primary materials may well be of value (although most are in German):

I try to follow these events and I have not noticed any recent developments.

The original information — detailed building stock thermal performance data and detailed wind farm land availability information — is clearly of public interest. And the GFF (official translation is Society for Civil Rights) response on behalf of data journalist Michael Kreil is also motivated by public interest considerations. And again this conflict centers on 96/9/EC database rights. Indeed, some of the raw material in play is, ironically, under CC‑BY‑4.0 conditions.

On that general note, even if all datasets are made public under open licensing, some lawyers advocate providing overarching licenses on the assemblage as well to cover collective works under copyright law and databases under the scope of the 96/9/EC directive. I support that approach too.

A more general case relevant to this discussion is CV-Online Latvia v Melons, reported as ECLI:EU:C:2021:434 by the CJEU in 2021, which tightened the criteria governing investment and risk‑exposure before 96/9/EC database protection can attach.[10] This judgment probably means that very few of the data portals that energy system modelers would wish to extract information from are likely to gain 96/9/EC protection in such circumstances. But that result does not remove the associated legal uncertainty and modelers should still rely on explicit waivers when building out a knowledge commons for social benefit. Hamilton (2021) summarizes the judicial outcome as follows:[11]

In a judgment handed down on 3 June 2021, the European Court of Justice (CJEU) has redefined the test that EEA courts must apply when establishing whether EU database rights have been infringed. In what appears to be a significant departure from existing case law, in addition to the defendant having extracted or re‑utilised all or a substantial part of the contents of the database without the owner’s permission, a claimant will now also need to show that the infringing acts caused “significant detriment” to the database maker’s investment.

For completeness, the British Horseracing Board v William Hill Organization litigation in 2004 warrants mention. This ruling, which surprised many, found that the resources used for the creation of materials which make up the contents of a database are not protected under this right.[12]

Legislative issues

The definition for “re‑use” in the European Open Data Directive (§2.11, page 70) is wholly inadequate, in my view. The legislation maps the concept of “re‑use” to “use” without much further concrete explanation. Recital 16 (page 58) provides some context, but not the grants we need. Simply stated, we need the unequivocal right to redistribute covered material in original or modified form. My preferred legal conditions are “CC‑BY‑4.0, CC0‑1.0, or something inbound compatible”. Furthermore, recital 18 (page 59) discusses appropriate levels of charging, so clearly this public sector information is not open in any meaningful sense and the title of the directive is also misleading.

I talked to a US copyright lawyer about the phrase “use” and it is not a term of art under US intellectual property law. So no pointers regarding interpretation there.

The Open Data Directive removes 96/9/EC database rights from portals serving public sector information (PSI) (van Eechoud 2021). Note correction:[13]

See elsewhere for comments on Regulation 543/2013.

Non-open science

Major science organizations in Europe are well represented among those who publish data under problematic and/or restrictive terms covering extraction and reuse. Science itself may have carve‑outs provided under legislation, but those exemptions do not apply more generally and cannot therefore contribute to the development of a true knowledge commons. I sometimes interact with these organizations and the responses vary greatly. We can talk about those conversations if that would be of interest.

The widely promoted FAIR principles are also meaningless without explicit and suitable open licensing. And the need for proper public licensing has largely been absent from most of this discourse. Indeed, I have not noticed much interest in operationalizing principle R1.1: “(Meta)data are released with a clear and accessible data usage license”. And even then R1.1 falls far short of what is required in practice for genuinely open data.

That said, some commentators do advocate FAIR / O instead — with “open” now added to the acronym.[14] Machine‑readability for metadata, including legal metadata and licensing information specifically, is also promoted.[15][16]
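
To indicate what machine‑readable legal metadata might enable, here is a minimal sketch of an automated license check. Everything in it is an assumption made for illustration: the `license` field name, the record layout, and the helper `check_license` are hypothetical and do not follow any particular metadata standard. The accepted set mirrors the two instruments advocated in this posting.

```python
import json

# SPDX identifiers treated as acceptable for a data commons in this sketch.
ACCEPTED = {"CC-BY-4.0", "CC0-1.0"}


def check_license(record_json: str) -> str:
    """Classify a dataset metadata record by its declared license.

    Hypothetical helper: assumes the record is a JSON object that carries
    an SPDX license identifier under the key "license".
    """
    record = json.loads(record_json)
    license_id = record.get("license")
    if not license_id:
        return "missing"      # no usable license statement at all
    if license_id in ACCEPTED:
        return "accepted"
    return "review"           # needs human assessment of inbound compatibility


print(check_license('{"title": "Load series", "license": "CC-BY-4.0"}'))  # accepted
print(check_license('{"title": "Grid topology", "license": "ODbL-1.0"}'))  # review
print(check_license('{"title": "Wind sites"}'))                            # missing
```

The "review" outcome is deliberate: whether another instrument is inbound compatible remains a legal question that a lookup like this cannot settle.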

Some further context

Two recent blogs of mine can provide some further context. The first covers the need for radically open climate protection analysis — well beyond current IPCC practices, for instance:

And the second opines that the recent uptick in interest in data portals is likely a distraction and arguably even counterproductive as far as building a knowledge commons is concerned:

Closure

This posting toggles between scientific and public interest considerations. For the open energy modeling community, these two dimensions are usually seen as inseparable.

In my view, licensing under “CC‑BY‑4.0, CC0‑1.0, or something inbound compatible” is a necessary condition for a data commons.

When faced with immiscible licensing and/or lack of licensing, data maintainers are sometimes forced to forward URLs that point to download APIs in order to circumvent legal restrictions on the direct distribution of third‑party information. This strategy is clearly second best — and riddled with self‑evident risks and problems. That is not to criticize genuine community-maintained linked open data (LOD) architectures of course — in this latter case, the “open” qualifier is both central and material.

(On that general note, I recently reworked a ten‑year‑old publication and I was stunned by how much of the highbrow gray literature I had cited earlier had disappeared completely from the internet: legal judgments, press releases from regulators, authoritative reports, and so on.)

Much of this legal road has already been walked by the free and open source software movement. So it remains a mystery to me as to why there has been so little engagement on licensing by those seeking the ideal of freely reusable information.

More specifically, the FOSS world has been careful to understand open license interoperability. That task needs to be repeated for data and for mixed‑content material more generally. On that note specifically, I believe the non‑interoperability of most public licenses intended for content and data to be one of the key barriers to creating a workable knowledge commons.

Aspects of European Union law could clearly be improved. The Open Data Directive needs to be amended to provide clarity on what constitutes “use”.[17] And 96/9/EC database rights should be expressly removed from material under statutory reporting.[18] Regulation 543/2013 establishing the ENTSO‑E Transparency Platform should revise the language covering primary data ownership to provide legal clarity.

A more radical take would be to freeze the provisions of the 96/9/EC Database Directive so no new rights could accrue. The database directive has long been subject to criticism by law faculty academics and others.[19] The United States, in contrast to Europe, has no intellectual property protection for published data in circumstances where no “modicum of creativity” applies.

Generative AI looks set to swamp our knowledge space — whether deliberately or inadvertently — with information lacking accuracy, veracity, and provenance. Which means that the rapid development of a functioning knowledge commons with core rights and responsibilities becomes an imperative.[20]

The open licenses we seek in many instances provide the modeling community with legal certainty rather than extant grants, because the underpinning intellectual property rights are either non‑existent or unlikely to be enforced for a range of reasons. This legal gray zone in which we are currently forced to operate is highly damaging for public interest numerics. And, I would argue, society more generally will pay a significant price through an impaired analysis of the general solution space, whether the overarching objective is climate protection, energy security, biodiversity loss reversal, social justice, or any number of similar systemic issues.

Legislation

European Parliament and European Council (27 March 1996). “Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases”. Official Journal of the European Union. L 77: 20–28. Known as the database directive.

European Commission (15 June 2013). “Commission Regulation (EU) No 543/2013 of 14 June 2013 on submission and publication of data in electricity markets and amending Annex I to Regulation (EC) No 714/2009 of the European Parliament and of the Council (text with EEA relevance)”. Official Journal of the European Union. L 163: 1–12. Establishing legislation for the ENTSO‑E Transparency Platform.

European Commission (26 June 2019). “Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information — PE/28/2019/REV/1”. Official Journal of the European Union. L 172: 56–83. Covers public sector information and is known as the open data directive.

Addendum

Not directly related to this missive, but this recent posting provides relevant context from the perspective of a United Kingdom electricity distribution company serving open data under Creative Commons CC‑BY‑4.0 licensing:


  1. This posting provides background and references for two upcoming stakeholder interviews that I have scheduled (as of 27 November 2023). The first interview is for the Knowledge Rights 21 project. And the second is for the Open Data Ecosystem (ODECO) Project (in French) run by the French National Centre for Scientific Research.

    Both initiatives are examining legal issues related to the development of a knowledge commons and whether official policies and/or European legislation can and should be modified to assist and facilitate the build‑out of such commons.

    An earlier version of this posting was circulated under the title: Some legal issues related to building a knowledge commons from an energy modeling perspective. ↩︎

  2. For more on this particular application, see: Hodencq, Sacha (22 March 2023). Jupyter notebooks as intermediary objects for energy modelling. Open Energy Modelling Initiative (openmod). YouTube. Duration 00:04:22.

    The licensing dilemma has been around for almost a decade (with the name IPython preceding Jupyter), see: Migdal, Piotr (23 July 2015). Are IPython Notebooks code or slides? (for licensing purpose). Academia Stack Exchange.

    For further background: Kluyver, Thomas, Benjamin Ragan-Kelley, Fernando Pérez, Brian Granger, Matthias Bussonnier, Jonathan Frederic, Kyle Kelley, Jessica Hamrick, Jason Grout, Sylvain Corlay, Paul Ivanov, Damián Avila, Safia Abdalla, and Carol Willing (2016). Jupyter notebooks: a publishing format for reproducible computational workflows. In Fernando Loizides and Birgit Schmidt (editors) Positioning and power in academic publishing: players, agents and agendas. Amsterdam, The Netherlands: IOS Press. ISBN 978‑1‑61499‑648‑4. doi:10.3233/978-1-61499-649-1-87. Proceedings of the 20th International Conference on Electronic Publishing. Download PDF from landing page. ↩︎

  3. Section §1.2 of the 96/9/EC database directive defines a database as “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means”.

    Ordinary printed maps can, for instance, class as 96/9/EC databases, see: Schweizer, Mark (5 November 2015). C-490/14 — Verlag Esterbauer: Get off my map!. The IPKat. London, United Kingdom. Legal blog. The underlying case is ECLI:EU:C:2015:735. ↩︎

  4. The AND is the SPDX logical conjunction operator. See my aforementioned blog for details. ↩︎

  5. The OR is the SPDX logical disjunction operator. See my aforementioned blog for details. ↩︎

  6. SPDX license identifiers are used here. The SPDX scheme is now officially standardized as ISO/IEC 5962:2021. ↩︎

  7. The criteria for copyright under United Kingdom law could apply to datasets supplied by UK transmission system operators — but this is nonetheless unlikely. The United Kingdom exited the European Union in 2020 but remains part of the European electricity system. All other European jurisdictions require some level of creativity before copyright can attach.

    The establishing legislation for the ENTSO‑E Transparency Platform is cited in full at the end of this posting. ↩︎

  8. I am currently trying to determine whether this document can be published in some format. More soon hopefully. ↩︎

  9. Indeed, it is not clear to me which statute or decision gives ENTSO‑E the ability to intentionally damage material published under mandatory disclosure.

    Such downgrading of information does not appear to be provided for in Regulation 543/2013, which established the Transparency Platform and is cited in full at the end of this posting.

    Other considerations may apply, noting that inadequate climate protection is now being litigated under European human rights law. On that general theme, Sanders (2006:854) opines that “The right of access to information may be enforced through the European Convention on Human Rights. It not only guarantees the freedom of speech, it also recognises the freedom to receive information.”

    Sanders, Anselm Kamperman (July 2006). “Limits to database protection: fair use and scientific research exemptions”. Research Policy. 35 (5): 854–874. ISSN 0048-7333. doi:10.1016/j.respol.2006.04.007. ↩︎

  10. See: Husovec, Martin and Estelle Derclaye (17 June 2021). Access to information and competition concerns enter the sui generis right’s infringement test – The CJEU redefines the database right. Kluwer Copyright Blog. Concerns case C‑762/19 and ruling ECLI:EU:C:2021:434. Quoting:

    Therefore, the CJEU concludes that these acts fall under art 7(2)(a) and (b) of the Database Directive (the directive) but constitute infringement only “provided that they have the effect of depriving that person of income intended to enable him or her to redeem the cost of that investment” (para 37). This important caveat introduces the raison d’être of the database protection into the infringement test. For the Court, its basis is recital 42 of the directive stating that the infringing acts must cause “significant detriment” to the database maker’s investment.

    However, the Court does not stop there. Following AG Szpunar, it notes that (para 41 of the judgment, and paras 3 and 43 of the AG Opinion): “it is necessary to strike a fair balance between, on the one hand, the legitimate interest of the makers of databases in being able to redeem their substantial investment and, on the other hand, that of users and competitors of those makers in having access to the information contained in those databases and the possibility of creating innovative products based on that information”.

    The ruling is reported as: CJEU (3 June 2021). Judgment of the Court on ‘CV-Online Latvia’ SIA v ‘Melons’ SIA, case C‑762/19, document ECLI:EU:C:2021:434. Luxembourg City, Luxembourg: Court of Justice of the European Union (CJEU). 6 pages.

    The database directive is cited in full at the end of this posting. ↩︎

  11. Quote from: Hamilton, Chloe (5 July 2021). CJEU ruling narrows the scope of EU database rights. Mishcon de Reya LLP. London, United Kingdom. Concerns case C‑762/19 and ruling ECLI:EU:C:2021:434. ↩︎

  12. That case in full: The British Horseracing Board Ltd and Others v William Hill Organization Ltd.

    See: European Court of Justice (9 November 2004). Judgment of the Court (Grand Chamber) of 9 November 2004 — Case C-203/02 — ECLI:EU:C:2004:695. Kirchberg, Luxembourg: European Court of Justice (ECJ). ↩︎

  13. A preliminary version of this paragraph stated erroneously that the opposite applied — that 96/9/EC rights could automatically attach to public sector information provision.

    van Eechoud (2021:376) makes it clear that “By contrast, public sector bodies that own sui generis database rights are no longer allowed to exercise these — the Directive now makes this explicit.” That provision is given under section §1.6 of the Open Data Directive. It should be noted that the associated definition of a public sector body is more limited than one might imagine.

    ENTSO‑E does not class as a public sector body under European Union law.

    van Eechoud, Mireille (1 April 2021). “A serpent eating its tail: the Database Directive meets the Open Data Directive”. IIC — International Review of Intellectual Property and Competition Law. 52 (4): 375–378. ISSN 2195-0237. doi:10.1007/s40319-021-01049-7. Editorial.

    The open data directive is cited in full at the end of this posting. ↩︎

  14. See: Hasselbring, Wilhelm, Leslie Carr, Simon Hettrick, Heather Packer, and Thanassis Tiropanis (11 February 2020). “From FAIR research data toward FAIR and open research software”. it - Information Technology. 62 (1). doi:10.1515/itit-2019-0040.

    Labastida, Ignasi and Thomas Margoni (1 January 2020). “Licensing FAIR data for reuse”. Data Intelligence. 2 (1-2): 199–207. ISSN 2641-435X. doi:10.1162/dint_a_00042. ↩︎

  15. In the energy modeling domain, see: Schwanitz, Valeria Jana, August Wierling, Mehmet Efe Biresselioglu, Massimo Celino, Muhittin Hakan Demir, Maria Bałazińska, Mariusz Kruczek, Manfred Paier, and Demet Suna (25 March 2022). “Current state and call for action to accomplish findability, accessibility, interoperability, and reusability of low carbon energy data”. Scientific Reports. 12 (1): 5208. ISSN 2045-2322. doi:10.1038/s41598-022-08774-0. ↩︎

  16. The free and open source software world has developed sophisticated tooling for recovering and confirming software license compliance by the myriad of software components that make up most software systems. A similar effort is needed for data. ↩︎

  17. It is not widely recognized, but the Open Data Directive only grants rights to “use” and not “re‑use” or “reuse” and certainly not to “open data” by any established norms. See my comments elsewhere. ↩︎

  18. As of 2023, the database directive is being reviewed as part of a proposed Data Act. Public submissions closed on 25 June 2021 and a proposal for new harmonized rules on data was published on 23 February 2022.

    See: European Commission (23 February 2022). Data Act: Proposal for a Regulation on harmonised rules on fair access to and use of data (text with EEA significance) — COM(2022) 68 final. Brussels, Belgium: European Commission. ↩︎

  19. One recent example being: Tarkowski, Alek and Francesco Vogelezang (10 December 2021). The argument against property rights in data — Policy brief #1. Europe: Open Future. ↩︎

  20. Zuckerman, Ethan (11 October 2023). Heather Ford: is the web eating itself? LLMs versus verifiability. Ethan Zuckerman. Blog. (LLM is large language model.) ↩︎