Toward a community view on open data licensing

Traffic on the openmod mailing list in early April 2018 indicated that forming a community view (or views) on open data licensing might be useful. @berit.mueller suggested migrating that discussion to this forum. The original thread began with the forthcoming European Commission JRC-IDEES energy sector database and POTEnCIA energy system model. Contributions soon spread to the licensing of IDEES, then to open data licensing more generally, and finally to whether a community position on this question should be attempted and used to influence official and semi-official data providers, including the JRC.

I offered to write a summary to frame the discussion and start the ball rolling. So here it is.

European focus

The analysis which follows is based on European law. As a result, some of the phrasing used here derives from German copyright law (abbreviated UrhG) (Juris 2018) and some from the European Union database directive (European Parliament and European Council 1996).

Aside from minor variations in terminology, the principal difference between European and US law is that, if a substantial direct effort underpins a public database located within the European Union, then the database maker can optionally claim a sui generis database right. This right protects against substantial extraction and spans 15 years. Courts have yet to determine what “substantial” means (somewhere above 20% to 50% perhaps?).

Participants from outside Europe are nonetheless encouraged to join in the discussion.

Datasets and databases

A reasonable jumping-off point is to underscore the distinction between a dataset and a database.

For our purposes, a dataset is a collection of measurements or observations in the form of an array or table. Ideally, a dataset is packaged with metadata documenting its semantics (meaning), provenance (history), licensing (or public domain waiver), and technical details (encodings, datatypes, technical conventions).
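To make the packaging idea concrete, here is a minimal sketch of a metadata descriptor covering the four groups just listed. The field names loosely follow the Frictionless Data Package layout, but the dataset name, source, and values are all invented for illustration, and the `missing_metadata` helper is hypothetical.

```python
import json

# Hypothetical descriptor for one packaged dataset. Field names loosely follow
# the Frictionless Data Package layout; every value is invented for illustration.
descriptor = {
    "name": "example-power-plants",
    # semantics: what the data means
    "description": "Illustrative table of plant capacities in MW.",
    # provenance: where the data came from
    "sources": [{"title": "Hypothetical national registry"}],
    # licensing: license or public domain dedication
    "licenses": [
        {"name": "CC-BY-4.0",
         "title": "Creative Commons Attribution 4.0 International"}
    ],
    # technical details: encodings and datatypes
    "resources": [
        {
            "name": "plants",
            "path": "plants.csv",
            "encoding": "utf-8",
            "schema": {
                "fields": [
                    {"name": "plant_id", "type": "string"},
                    {"name": "capacity_mw", "type": "number"},
                ]
            },
        }
    ],
}

REQUIRED_KEYS = ("description", "sources", "licenses", "resources")


def missing_metadata(desc):
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not desc.get(key)]


print(missing_metadata(descriptor))
print(json.dumps(descriptor["licenses"], indent=2))
```

A check like this could run before publication so that a dataset never ships without a license statement or provenance note.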

For our purposes, a database, at the minimum, holds individually accessible datasets arranged in a systematic or methodical way. This definition does not preclude deeper database designs such as relational or GIS databases. It is presumed for this discussion that the selection and arrangement of datasets within the database is not sufficiently creative for the database itself to attract copyright.

Copyrightability

It is an open question as to how much of the information hosted on an energy sector data portal does attract copyright. Machine-generated data (say from electricity meters or market clearing algorithms) clearly does not, as the “author’s own intellectual creation” is self-evidently absent (UrhG §2(2)). It is also debatable whether asset inventories (of power plants and similar) would, in terms of their selection and arrangement of information, meet the “intellectual creation” threshold just indicated (a phone book, for example, does not).

Notwithstanding, adding an open data license or public domain dedication to a dataset that is not eligible for copyright is not a problem in and of itself (ANDS 2017:2).

Permissive licensing versus public domain dedication

There are two schools of thought regarding dataset licensing.

The first favors public domain dedications (or alternatively public domain marks). This yields the maximum flexibility and minimum hindrance for the end user. Stodden (2009) for instance prefers this approach.

The second school opts for permissive licenses. This requires that the copyright holders be attributed when datasets are redistributed. Some argue that attribution can become problematic (Meeker 2017:260) while others suggest not (Pollock 2009).
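As a rough illustration of what attribution management involves when several datasets are combined, the sketch below gathers the notices that a remixed dataset would need to carry. It is deliberately simplified: the provider names and dataset records are invented, the `collect_attributions` helper is hypothetical, and a real CC-BY-4.0 attribution includes more than the holder's name.

```python
# License identifiers as used in this post; the dataset records are invented.
ATTRIBUTION_REQUIRED = {"CC-BY-4.0": True, "CC0-1.0": False}


def collect_attributions(datasets):
    """Return de-duplicated holders that must be attributed, in input order."""
    notices = []
    for ds in datasets:
        if ATTRIBUTION_REQUIRED[ds["license"]] and ds["holder"] not in notices:
            notices.append(ds["holder"])
    return notices


inputs = [
    {"name": "load", "license": "CC-BY-4.0", "holder": "Provider A"},
    {"name": "plants", "license": "CC0-1.0", "holder": "Provider B"},
    {"name": "prices", "license": "CC-BY-4.0", "holder": "Provider C"},
    {"name": "weather", "license": "CC-BY-4.0", "holder": "Provider A"},
]

print(collect_attributions(inputs))  # ['Provider A', 'Provider C']
```

The list grows with every attributed source that is mixed in, which is the concern raised about attribution at scale. Datasets under CC0-1.0 add nothing to it.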

To inject my opinion, both approaches should work for energy modelers. Note also that packaged datasets can be nested (like software in distros), so some of this large-scale remixing and overblown attribution management is both unnecessary and arguably poor data practice. It would be better to ship nested datasets with extraction scripts, or more sophisticated databases with preformed views or SQL code.
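One way to picture the preformed view idea is the sketch below, which uses Python's standard `sqlite3` module. The table layout, registry names, and figures are all invented for illustration. Each row keeps its upstream source and license, and the view derives a summary without copying the data into a new remixed dataset.

```python
import sqlite3

# In-memory database standing in for a shipped database file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plants (
        plant_id    TEXT PRIMARY KEY,
        fuel        TEXT,
        capacity_mw REAL,
        source      TEXT,   -- upstream dataset the row came from
        license     TEXT    -- license or dedication of that upstream dataset
    );
    INSERT INTO plants VALUES
        ('p1', 'wind',  12.0, 'registry-a', 'CC-BY-4.0'),
        ('p2', 'gas',  400.0, 'registry-b', 'CC0-1.0'),
        ('p3', 'wind',  30.0, 'registry-b', 'CC0-1.0');
    -- Preformed view: wind capacity per upstream source, license retained.
    CREATE VIEW wind_capacity AS
        SELECT source, license, SUM(capacity_mw) AS total_mw
        FROM plants
        WHERE fuel = 'wind'
        GROUP BY source, license;
""")

rows = conn.execute("SELECT * FROM wind_capacity ORDER BY source").fetchall()
print(rows)
```

Because the source and license columns travel with every row, the attribution needed for any extract can be read off the query result itself.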

Framing

I suggest we initially limit our discussion to the licensing of datasets. Participants are, of course, free to ignore this comment.

Two instruments, one public domain dedication and one permissive license, are usually put on the table:

  • Creative Commons Public Domain Dedication 1.0 : CC0-1.0
  • Creative Commons Attribution 4.0 International : CC-BY-4.0

Compatibility chart

The following compatibility chart may help guide the discussion. Its accuracy is not guaranteed but it has been circulated within FSFE and OKI circles for comment.

 

Figure: Common data licenses and public domain dedications

Note that the following commonly discussed licenses are missing from the diagram: ODC-By, CDLA-Sharing, CDLA-Permissive. (I’ll be attending a presentation on the CDLA family on 18 April 2018 and can report back.)

References

Australian National Data Service (4 January 2017). Copyright, data and licensing. Melbourne, Australia: Australian National Data Service (ANDS).

Ball, Alex (17 July 2014). How to license research data. Edinburgh, United Kingdom: Digital Curation Centre (DCC).

Doldirina, Catherine, Anders Friis-Christensen, Nicole Ostlaender, Andrea Perego, Alessandro Annoni, Ioannis Kanellopoulos, Massimo Craglia, Lorenzino Vaccari, Giacinto Tartaglia, Fabrizio Bonato, Jean-Paul Triaille, and Stefano Gentile (2015). JRC data policy — Report EUR 27163 EN. Luxembourg: Publications Office of the European Union. ISBN 978-92-79-47104-9. doi:10.2788/607378.

European Parliament and European Council (27 March 1996). “Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases”. Official Journal of the European Union. L 77: 20–28.

Frictionless Data (ongoing). Applying licenses, waivers or public domain marks. Frictionless data. Cambridge, United Kingdom.

Juris (2018). Act on Copyright and Related Rights (Urheberrechtsgesetz, UrhG) — Amendments to 1 September 2017 — Official translation. Saarbrücken, Germany: Juris.

Meeker, Heather (4 April 2017). Open (source) for business: a practical guide to open source software licensing (2nd edition). North Charleston, South Carolina, USA: CreateSpace Independent Publishing Platform. ISBN 978-154473764-5.

Morrison, Robbie, Tom Brown, and Matteo De Felice (10 December 2017). Submission on the re-use of public sector information: with an emphasis on energy system datasets — Release 09. Berlin, Germany. Published under a Creative Commons CC BY 4.0 license.

Pollock, Rufus (9 February 2009). Comments on the Science Commons protocol for implementing open access data. Open Knowledge International Blog.

Stodden, Victoria (3 March 2009). “Enabling reproducible research: open licensing for scientific innovation”. International Journal of Communications Law and Policy. 13: 1–25.

Straw poll results

I have been running straw polls on this topic when I can and posting the results here. This entry is user-editable (you must be logged in, though) and others are invited to add their informal poll results too.

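| date         | event                  | audience | CC-BY-4.0 | CC0-1.0 | votes cast | notes |
|--------------|------------------------|----------|-----------|---------|------------|-------|
| 18 May 2018  | 13th ÖGOR—IHS Workshop | ~100     | 60%       | 40%     | ~10        |       |
| 08 June 2018 | 8th Openmod Workshop   | ~60      | 25%       | 75%     | 20         |       |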

Other comments:

  • ODbL-1.0 should be favored over CC-BY-SA-4.0 due to compatibility with OpenStreetMap data