Definitions for open

This thread considers how open and public differ as concepts and provides several widely‑used definitions for “open”. Questions like these often surface during discussions on the advantages of open in its various contexts: open source, open data, open access, open content, and open science.

This thread lists widely accepted touchstone definitions rather than reviewing individual licenses and their relative merits. Nor does this thread consider the various processes involved in endorsing licenses in relation to these various touchstone definitions — a complicated and ofttimes controversial exercise.

The concept of public simply means that the content is made generally available to anyone. In the internet era, that can include publication on a website, offering a standalone file for download, or using a clonable code hosting site. Copyright normally attaches and the author or authors retain all rights by default, even if no explicit claim is made. All‑rights‑reserved copyright is debilitating for open science for reasons covered later. In addition, for data, database rights may apply in some jurisdictions, principally Europe and the United Kingdom.

Widely‑accepted definitions for open apply to each category of content: software, data, metadata, and academic publishing. These definitions are indicated in the diagram below and listed later, with other forms of artifact also considered. Each category has different legal considerations, different definitions, and therefore different sets of licenses. For instance, software licenses should address software patents whereas licenses suitable for data need to consider European database protection. Open source hardware licensing is not considered here.

Figure 1: Different categories of content and their respective touchstone definitions.

In operational terms, open means that the content carries a suitable open license. Open licenses grant recipients permission to use and modify the material and republish any changes that they might make under the same or compatible contractual terms. This last point is the key — such re‑use means improvements can be returned to the information commons for all to benefit. Unlike proprietary licenses, open licenses grant these permissions to anyone who complies with the stated conditions — without the need to negotiate with rights holders on an individual basis.

Open licenses, however, fall into two distinct camps. So‑called reciprocal licenses (also known as copyleft) are legally sticky and require that the licensed material be reissued under the same conditions, so that the material effectively remains in the information commons forever. By contrast, so‑called permissive licenses intentionally allow information leakage into proprietary products. Permissive licenses may be selected by software projects seeking to broaden uptake and foster informal standards. Aside from software, permissive licenses are used extensively for open data and for academic publishing.
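The two camps can be made concrete with a small lookup, offered here only as a minimal sketch. The table below is a hand‑picked, non‑exhaustive set of SPDX identifiers, and the names `LicenseCamp`, `CAMP_BY_SPDX_ID`, and `license_camp` are hypothetical and chosen purely for illustration.

```python
from enum import Enum
from typing import Optional


class LicenseCamp(Enum):
    """The two camps of open license described in the text."""
    RECIPROCAL = "reciprocal (copyleft)"
    PERMISSIVE = "permissive"


# Illustrative, hand-picked table keyed by SPDX identifier.
# This is an assumption for the example, not a complete registry.
CAMP_BY_SPDX_ID = {
    "GPL-3.0-or-later": LicenseCamp.RECIPROCAL,  # software, copyleft
    "CC-BY-SA-4.0": LicenseCamp.RECIPROCAL,      # content and data, share-alike
    "ODbL-1.0": LicenseCamp.RECIPROCAL,          # databases, share-alike
    "MIT": LicenseCamp.PERMISSIVE,               # software
    "Apache-2.0": LicenseCamp.PERMISSIVE,        # software
    "CC-BY-4.0": LicenseCamp.PERMISSIVE,         # content and data, attribution only
}


def license_camp(spdx_id: str) -> Optional[LicenseCamp]:
    """Return the camp for a known identifier, or None if it is not in the table."""
    return CAMP_BY_SPDX_ID.get(spdx_id)


if __name__ == "__main__":
    for spdx_id in ("GPL-3.0-or-later", "CC-BY-4.0", "Some-Unknown-License"):
        print(spdx_id, "->", license_camp(spdx_id))
```

Returning `None` for unlisted identifiers is deliberate: a license missing from the table should prompt a manual check rather than a guess.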

Open software

The term software covers both source code and object code — with executable programs falling into the latter grouping. The gold standard definition for open software is the Open Source Initiative (OSI) Open Source Definition (OSD):

In addition, the OSI act as a license steward and maintain a list of approved licenses. The Free Software Foundation (FSF) also publish an operationally equivalent definition to the OSD, widely referred to as the “four freedoms”.

As indicated, software licenses need to consider legal issues specific to software and only explicit software licenses should be applied to source code.

Open data

Like software, data requires specific legal considerations. Database protection for addressable collections of facts and values served from within Europe or the United Kingdom may arise. Copyright may attach to collections of facts and values in jurisdictions that employ an efforts‑based threshold, including the United Kingdom. The gold standard definition for open data is the Open Knowledge Foundation (OKF) Open Definition:

A key sentence in that definition is that open data is data that:

can be freely used, modified, and shared by anyone for any purpose — subject, at most, to measures that preserve provenance and openness

In 2019, the European Union introduced a definition for open data in their reworked public sector information (PSI) directive:

Recital 16 of that directive (page 58) begins (see note 1):

Open data as a concept is generally understood to denote data in an open format that can be freely used, re‑used and shared by anyone for any purpose.

Readers should note that Creative Commons licenses prior to version 4.0 do not specifically traverse database rights and similar issues and should not be applied to datasets and databases.
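As a rough illustration of that version rule, the following sketch checks a Creative Commons identifier written in SPDX style. The function name `cc_version_handles_database_rights` and the regular expression are assumptions made for this example. The sketch also treats the CC0‑1.0 public domain dedication as acceptable for data, in line with its recommended use elsewhere in this thread. Note that the check concerns database rights only and says nothing about whether a given license variant is open.

```python
import re

# Matches SPDX-style identifiers such as CC-BY-4.0, CC-BY-SA-3.0, or CC-BY-3.0-DE.
# The pattern is an illustrative assumption, not an official grammar.
_CC_LICENSE = re.compile(
    r"^CC-BY(?:-NC)?(?:-SA|-ND)?-(\d+)\.(\d+)(?:-[A-Za-z]+)?$"
)


def cc_version_handles_database_rights(spdx_id: str) -> bool:
    """Return True if the Creative Commons instrument is version 4.0 or later.

    CC0-1.0 is special-cased as acceptable because it is a public domain
    dedication rather than a license from the pre-4.0 suite.
    """
    if spdx_id == "CC0-1.0":
        return True
    match = _CC_LICENSE.match(spdx_id)
    if match is None:
        raise ValueError(f"not a recognized Creative Commons identifier: {spdx_id!r}")
    major_version = int(match.group(1))
    return major_version >= 4


if __name__ == "__main__":
    for spdx_id in ("CC-BY-4.0", "CC-BY-3.0", "CC-BY-SA-2.5", "CC0-1.0"):
        print(spdx_id, "->", cc_version_handles_database_rights(spdx_id))
```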

Readers should also note that the FAIR data principles articulated by Wilkinson et al (2016) do not mandate that scientific data be legally open, but merely that such data be “released with a clear and accessible data usage license” (principle R1.1). This omission leads to the phrase “FAIR and open data” — meaning both conditions are honored — being used more often these days.
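The gap between principle R1.1 and legal openness can be sketched as two separate checks on a dataset record. Everything named here (`DatasetRecord`, `OPEN_DATA_LICENSES`, and the three functions) is hypothetical, and the set of open licenses is limited to the two instruments recommended for data in this thread. The first check covers only principle R1.1, not the full set of FAIR principles.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for illustration: only the two instruments this thread
# recommends for data are treated as open here.
OPEN_DATA_LICENSES = frozenset({"CC-BY-4.0", "CC0-1.0"})


@dataclass
class DatasetRecord:
    name: str
    license_id: Optional[str] = None  # SPDX-style identifier, if any


def meets_fair_r1_1(record: DatasetRecord) -> bool:
    """R1.1 asks only that some clear usage license is attached."""
    return bool(record.license_id)


def is_legally_open(record: DatasetRecord) -> bool:
    """Openness additionally requires that the license itself is open."""
    return record.license_id in OPEN_DATA_LICENSES


def has_license_and_is_open(record: DatasetRecord) -> bool:
    """Both conditions honored, in the sense of 'FAIR and open data'."""
    return meets_fair_r1_1(record) and is_legally_open(record)


if __name__ == "__main__":
    records = [
        DatasetRecord("load-profiles", "CC-BY-4.0"),
        DatasetRecord("plant-register", "CC-BY-NC-4.0"),  # licensed, but not open
        DatasetRecord("grid-topology"),                   # no license at all
    ]
    for record in records:
        print(record.name, meets_fair_r1_1(record), is_legally_open(record))
```

The second record shows the case the paragraph warns about: a non‑commercial license satisfies R1.1 yet leaves the data legally closed.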

Open metadata

There is no explicit definition for open metadata but many researchers recommend the Creative Commons CC0‑1.0 public domain dedication to provide for the widest use across jurisdictions (for instance, Kreutzer 2011).

Open academic publishing

Open access publishing attracts a range of definitions, starting with the mere absence of a paywall. But to be generally useful, open access should allow for the re‑use of content. The gold standard definition for open access is the Berlin Declaration:

Note that the phrase in the declaration “for any responsible purpose” is merely advisory under German law and therefore not legally binding in that jurisdiction at least.

The Berlin Declaration was shortly preceded by the Budapest Open Access Initiative (BOAI) with the first statement released on 14 February 2002:

Ten years later the BOAI published a review:

Licensing and reuse are covered in section 2. Section 2.1 endorses the Creative Commons attribution license:

We recommend CC-BY or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.

And section 3.9 states citation information is public domain and would benefit from being standardized:

We should improve and apply the tools necessary to harvest the references or bibliographic citations from published literature. The facts about who cited whom are in the public domain, and should be [open access] in standard formats for use, reuse, and analysis. This will assist researchers and research institutions in knowing what literature exists, even if they don’t have access to it, and in the development of new metrics for access and impact.

Other forms of content

Other forms of content more generally should be published under conditions that meet the OKF Open Definition (see above).

Discussion

The concept of open is additional to the concept of public. The concept of public results simply from the act of publishing, that is, of making the material available to the public, whether at some cost or free of charge. If no open license is provided, copyright may be retained in jurisdictions that deploy an efforts‑based threshold and database protection may apply in jurisdictions that support this particular property right.

Of the several categories of open content listed above, it is open data that poses the most problems for open energy system modelers. That is why there has been a broad effort to get public sector information providers to add either Creative Commons CC‑BY‑4.0 licenses or CC0‑1.0 public domain dedications at their discretion.

In terms of conception, it is useful to distinguish between an open work and an open license. An open work remains the primary objective and a particular open license is simply the legal vehicle to articulate the binding conditions and ascribed rights under which the open work exists. For instance, opensource.com defines open source in terms of “open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community‑oriented development” without explicitly mentioning intellectual property rights or licensing. Likewise, the OSD for software does not mention specific intellectual property rights either. In a similar spirit, the Open Definition defines the characteristics of an open work and an open license as separate concepts.

Choice‑of‑license issues are not traversed in this posting, aside from those for data. Selecting a suitable license is a complicated subject; refer instead to the references below. The purpose of this thread is to list the gold standard definitions for each category of content for general reference.

Content without an open license is highly debilitating for open science. Source code without an open license cannot be lawfully built or run. Data without an open license cannot be modified and republished — at least not with legal certainty. Academic publications which languish behind paywalls hinder the dispersion of knowledge. And academic publications without an open license prevent both written text and interpreted data from being generally used, developed, and republished.

Further reading

Software

Jaeger, Till and Axel Metzger (6 February 2020). Open Source Software: Rechtliche Rahmenbedingungen der Freien Software (5th ed). Munich, Germany: CH Beck. ISBN 978‑3‑406‑73497‑7. In German.

Meeker, Heather (26 March 2020). Open (source) for business: a practical guide to open source software licensing (3rd ed). South Carolina, USA: Kindle Direct Publishing Platform. ISBN 979‑861820177‑3. Paperback edition.

Opensource (no date). What is open source?. Opensource.com.

Data

Ball, Alex (17 July 2014). How to license research data. Edinburgh, United Kingdom: Digital Curation Centre (DCC).

Hirth, Lion, Ingmar Schlecht, and Jonathan Mühlenpfordt (6 November 2018). Open data for electricity modeling: an assessment of input data for modeling the European electricity system regarding legal and technical usability — White paper. Berlin, Germany: Neon Neue Energieökonomik. A report for the Federal Ministry for Economic Affairs and Energy, Germany.

Hirth, Lion (1 January 2020). “Open data for electricity modeling: legal aspects”. Energy Strategy Reviews. 27: 100433. ISSN 2211‑467X. doi:10.1016/j.esr.2019.100433.

Lämmerhirt, Danny (December 2017). Avoiding data use silos: how governments can simplify the open licensing landscape. Open Knowledge International. London, United Kingdom.

Morrison, Robbie, Tom Brown, and Matteo De Felice (10 December 2017). Submission on the re‑use of public sector information: with an emphasis on energy system datasets — Release 09. Berlin, Germany. Published under a Creative Commons CC BY 4.0 license.

Morrison, Robbie (30 May 2020). Submission on a European strategy for data with an emphasis on energy sector datasets — Release 08. Published under a Creative Commons CC BY 4.0 license.

Wilkinson, Mark D et al (15 March 2016). “The FAIR Guiding Principles for scientific data management and stewardship — Comment”. Scientific Data. 3: 160018. doi:10.1038/sdata.2016.18. 53 authors in total. Open access.

Metadata

Kreutzer, Till (2011). Validity of the Creative Commons Zero 1.0 universal public domain dedication and its usability for bibliographic metadata from the perspective of German copyright law. Berlin, Germany: Büro für Informationsrechtliche Expertise.

Poblet, Marta, Amir Aryani, Paolo Manghi, Kathryn Unsworth, Jingbo Wang, Brigitte Hausstein, Sunje Dallmeier‑Tiessen, Claus‑Peter Klas, Pompeu Casanovas, and Victor Rodriguez‑Doncel (September 2016). Assigning creative commons licenses to research metadata: issues and cases — Preprint. See also doi:10.1007/978‑3‑030‑00178‑0_16.

Academic publishing

Kreutzer, Till (13 November 2014). Open content: a practical guide to using Creative Commons licences. Germany: German Commission for UNESCO, North Rhine‑Westphalian Library Service Centre (hbz), Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens. ISBN 978‑3‑940785‑57‑2.

License stewards

A license steward uses a public review process to check that prospective new licenses conform to existing community norms and expectations.

Notes

  1. The concept of open data is regrettably not supported in the body of European directive 2019/1024. Referring to recital 16 (page 58), the term “shared” is neither defined nor deployed in the body of the directive, and the term “re‑use” is perversely remapped to “use” under definition §2.11 (page 70). Moreover, the notion of “use” has a longstanding and restrictive tradition under established copyright law and this body of law is likely to be material. These legal defaults therefore require that genuinely open public sector information be explicitly signaled as such through the application of established data‑capable open licenses and preferably the Creative Commons CC‑BY‑4.0 license.

Another open definition (see reference below), this one from OpenForum Europe (OFE), focuses on technology interoperability. Interoperability is also a consideration for energy system modelers in light of growing interest in energy sector digitalization, smart grids, and smart consumption. The Linux Foundation LF Energy initiative also represents a step in this direction, in which open (and royalty‑free, one would hope) interface standards and open implementations of same are central. (My thanks to OFE for the pointer.)

References

OFE (no date). Our vision — the OFE openness principles. OpenForum Europe (OFE). United Kingdom.

LF Energy — Open Source for Energy Transition. LF Energy. Website.

These ideas were further developed for a seminar in early‑2021. In particular, the bright line between code and data is somewhat of a legal fiction:

I created this simplified graphic that might help to explain the difference between transparency (available) and openness:

Here is a more elaborate version of that same diagram above:

The imperative to release data of public interest under CC‑BY‑4.0 licensing (or CC0‑1.0 or something inbound compatible) grows as the models themselves become open source. In short, open source and genuinely open data go hand in hand.

But note that the term “data” as used here covers only non‑personal information that has been or can be legitimately made public. That caveat then naturally excludes both personally identifiable information and commercially sensitive information such as trade secrets, but naturally includes information published under statutory reporting.

The tarball containing the Inkscape SVG and exported PNG files for release number 06, licensed CC‑BY‑4.0, follows. Inkscape version 1.1.2 was used to produce the vector art.

Great illustration Robbie! :slight_smile:

Let me just say: I saved this thread for a later occasion and have used it as reference and for illustration to others for the third time this week already. So: Thank you! :slight_smile: