This thread considers how open and public differ as concepts and provides several widely‑used definitions for “open”. Questions like these often surface during discussions on the advantages of open in its various contexts: open source, open data, open access, open content, and open science.
This thread lists widely accepted touchstone definitions rather than reviewing individual licenses and their relative merits. Nor does this thread consider the various processes involved in endorsing licenses in relation to these various touchstone definitions — a complicated and ofttimes controversial exercise.
The concept of public simply means the content is made generally available to anyone. In the internet era, that can include publication on a website, offering a standalone file for download, or using a clonable code hosting site. Copyright normally attaches and the author or authors retain all rights by default, even if no explicit claim is made. All-rights-reserved copyright is debilitating for open science for reasons covered later. In addition, database rights may attach to data in some jurisdictions, principally Europe and the United Kingdom.
Widely‑accepted definitions for open apply to each category of content: software, data, metadata, and academic publishing. These definitions are indicated in the diagram below and listed later, with other forms of artifact also considered. Each category has different legal considerations, different definitions, and therefore different sets of licenses. For instance, software licenses should address software patents whereas licenses suitable for data need to consider European database protection.
Figure 1: Different categories of content and their respective touchstone definitions.
In operational terms, open means that the content carries a suitable open license. Open licenses grant recipients permission to use and modify the material and republish any changes that they might make under the same or compatible contractual terms. This last point is the key — such re‑use means improvements can be returned to the information commons for all to benefit. Unlike proprietary licenses, open licenses grant these permissions to anyone who complies with the stated conditions — without the need to negotiate with rights holders on an individual basis.
Open licenses, however, fall into two distinct camps. So‑called reciprocal licenses (also known as copyleft) are legally sticky and require that the licensed material be reissued under the same conditions, so that the material effectively remains in the information commons forever. By contrast, so‑called permissive licenses intentionally allow information leakage into proprietary products. Permissive licenses may be selected by software projects seeking to broaden uptake and foster informal standards. Outside of software, permissive licenses are used extensively for open data and for academic publishing.
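As a rough illustration of the two camps, the following Python sketch sorts a handful of well‑known licenses by their SPDX identifiers. The table is a hand‑picked, incomplete subset and the names `LicenseKind`, `LICENSE_KINDS`, and `classify` are hypothetical, introduced only for this example. Public domain dedications such as CC0‑1.0 sit outside both camps and are therefore given their own label here.

```python
from enum import Enum
from typing import Optional


class LicenseKind(Enum):
    COPYLEFT = "reciprocal (copyleft)"
    PERMISSIVE = "permissive"
    PUBLIC_DOMAIN = "public domain dedication"


# Illustrative subset keyed by SPDX identifier; this is not a complete registry.
LICENSE_KINDS = {
    "GPL-3.0-or-later": LicenseKind.COPYLEFT,
    "AGPL-3.0-or-later": LicenseKind.COPYLEFT,
    "CC-BY-SA-4.0": LicenseKind.COPYLEFT,
    "ODbL-1.0": LicenseKind.COPYLEFT,
    "MIT": LicenseKind.PERMISSIVE,
    "Apache-2.0": LicenseKind.PERMISSIVE,
    "BSD-3-Clause": LicenseKind.PERMISSIVE,
    "CC-BY-4.0": LicenseKind.PERMISSIVE,
    "CC0-1.0": LicenseKind.PUBLIC_DOMAIN,
}


def classify(spdx_id: str) -> Optional[LicenseKind]:
    """Return the camp for a known identifier, or None if it is not in the table."""
    return LICENSE_KINDS.get(spdx_id.strip())


if __name__ == "__main__":
    for name in ("MIT", "GPL-3.0-or-later", "CC0-1.0", "Some-Unlisted-License"):
        kind = classify(name)
        print(name, "->", kind.value if kind else "not in table")
```

A lookup that returns `None` for unlisted identifiers is deliberate: a missing entry says nothing about whether a license is open, only that this small table does not cover it.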
The term software covers both source code and object code — with executable programs falling into the latter grouping. The gold standard definition for open software is the Open Source Initiative (OSI) Open Source Definition (OSD):
- OSI. Open Source Definition — Last modified 2007-03-22. Open Source Initiative (OSI). Palo Alto, California, USA.
In addition, the OSI act as a license steward and maintain a list of approved licenses. The Free Software Foundation (FSF) also publish an operationally equivalent definition to the OSD, widely referred to as the “four freedoms”.
As indicated, software licenses need to consider legal issues specific to software and only explicit software licenses should be applied to source code.
Like software, data requires specific legal considerations, including the question of database protection for collections of facts and values served from within Europe. The gold standard definition for open data is the Open Knowledge Foundation (OKF) Open Definition:
- Open Knowledge International (no date). Open Definition 2.1 — Defining open in open data, open content and open knowledge. Open Knowledge Foundation (OKF). Cambridge, United Kingdom.
In 2019, the European Union introduced a definition for open data in their reworked public sector information (PSI) directive:
- European Commission (26 June 2019). “Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on open data and the re-use of public sector information — PE/28/2019/REV/1”. Official Journal of the European Union. L 172: 56–83. The directive entered into force on 16 July 2019.
Recital 16 of that directive (page 58) begins (see the note at the end of this thread):
Open data as a concept is generally understood to denote data in an open format that can be freely used, re-used and shared by anyone for any purpose.
Readers should note that Creative Commons licenses prior to version 4.0 do not specifically traverse database rights and similar issues and should not be applied to datasets and databases.
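The version rule above can be sketched as a small check, assuming licenses are named by SPDX‑style identifiers such as `CC-BY-4.0`. The function name `cc_version_suits_data` and the regular expression are hypothetical and written for this example only. The check looks at the version number alone; it does not decide whether a license is open, so a non‑commercial or no‑derivatives variant at version 4.0 would still pass this test.

```python
import re

# Matches SPDX-style identifiers such as "CC-BY-4.0", "CC-BY-SA-3.0", or "CC-BY-NC-ND-4.0".
# Jurisdiction suffixes (for example "CC-BY-3.0-DE") are tolerated because the
# pattern is not anchored at the end.
_CC_PATTERN = re.compile(r"^CC-BY(?:-NC)?(?:-(?:SA|ND))?-(\d+)\.(\d+)")


def cc_version_suits_data(spdx_id: str) -> bool:
    """Return True if the Creative Commons instrument is version 4.0 or later.

    CC0-1.0 is treated as a special case: it is a public domain dedication
    rather than a license and is recommended for data elsewhere in this thread.
    """
    if spdx_id == "CC0-1.0":
        return True
    match = _CC_PATTERN.match(spdx_id)
    if match is None:
        raise ValueError(f"not a recognized Creative Commons identifier: {spdx_id!r}")
    major_version = int(match.group(1))
    return major_version >= 4


if __name__ == "__main__":
    for name in ("CC-BY-4.0", "CC-BY-3.0", "CC-BY-SA-2.5", "CC0-1.0"):
        print(name, "->", cc_version_suits_data(name))
```

Raising an error for anything that is not a Creative Commons identifier keeps the check from silently passing judgment on licenses it was never meant to assess.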
Readers should also note that the FAIR data principles articulated by Wilkinson et al (2016) do not mandate that scientific data be legally open, but merely that such data be “released with a clear and accessible data usage license” (principle R1.1). This omission leads to the phrase “FAIR and open data” being used more often these days.
There is no explicit definition for open metadata but some researchers recommend the Creative Commons CC0‑1.0 public domain dedication to provide for the widest use across jurisdictions (for instance, Kreutzer 2011).
Open academic publishing
Open access publishing attracts a range of definitions, starting with the mere absence of a paywall. But to be generally useful, open access should allow for the re‑use of content. The gold standard definition for open access is the Berlin Declaration:
- Berlin Declaration (22 October 2003). Berlin Declaration on open access to knowledge in the sciences and humanities. Munich, Germany: Max‑Planck-Gesellschaft, München.
Note that the phrase in the declaration “for any responsible purpose” is merely advisory under German law and therefore not legally binding in that jurisdiction at least.
The Berlin Declaration was shortly preceded by the Budapest Open Access Initiative (BOAI) with the first statement released on 14 February 2002:
- BOAI (no date). Budapest Open Access Initiative. Budapest Open Access Initiative. Landing page.
Ten years later the BOAI published a review:
- BOAI (12 September 2012). Ten years on from the Budapest Open Access Initiative: setting the default to open. Budapest Open Access Initiative.
Licensing and reuse are covered in section 2. Section 2.1 endorses the Creative Commons attribution license:
We recommend CC-BY or an equivalent license as the optimal license for the publication, distribution, use, and reuse of scholarly work.
And section 3.9 states citation information is public domain and would benefit from being standardized:
We should improve and apply the tools necessary to harvest the references or bibliographic citations from published literature. The facts about who cited whom are in the public domain, and should be [open access] in standard formats for use, reuse, and analysis. This will assist researchers and research institutions in knowing what literature exists, even if they don’t have access to it, and in the development of new metrics for access and impact.
Other forms of content
Other forms of content more generally should be published under conditions that meet the OKF Open Definition (see above).
The concept of open is additional to the concept of public. The concept of public results simply from the act of publishing, that is, of making the material available to the public, whether at some cost or free of charge. If no open license is provided, full copyright is typically retained plus other relevant protections such as European database rights.
Of the several categories of open content listed above, it is open data which poses the most problems for open energy system modelers. That is why there has been a broad effort to get public sector information providers to add either Creative Commons CC‑BY‑4.0 licenses or CC0‑1.0 public domain dedications at their discretion.
In conceptual terms, it is useful to distinguish between an open work and an open license. An open work remains the primary objective and a particular open license is simply the legal vehicle to articulate the binding conditions and ascribed rights under which the open work exists. For instance, opensource.com defines open source in terms of “open exchange, collaborative participation, rapid prototyping, transparency, meritocracy, and community‑oriented development” without explicitly mentioning intellectual property rights or licensing. Likewise, the OSD for software does not mention specific intellectual property rights either. In a similar spirit, the Open Definition defines the characteristics of an open work and an open license as separate concepts.
The choice of license is not traversed here. Selecting a suitable license is a complicated subject; refer instead to the references below. The purpose of this thread is to list the gold standard definitions for each category of content for general reference.
Content without an open license is highly debilitating for open science. Source code without an open license cannot be lawfully built or run. Data without an open license cannot be modified and republished, at least not with legal certainty. Academic publications which languish behind paywalls hinder the dissemination of knowledge. And academic publications without an open license prevent both text and interpreted data from being generally used, developed, and republished.
Jaeger, Till and Axel Metzger (6 February 2020). Open Source Software: Rechtliche Rahmenbedingungen der Freien Software (5th ed). Munich, Germany: CH Beck. ISBN 978‑3‑406‑73497‑7. In German.
Meeker, Heather (26 March 2020). Open (source) for business: a practical guide to open source software licensing (3rd ed). South Carolina, USA: Kindle Direct Publishing Platform. ISBN 979‑861820177‑3. Paperback edition.
Ball, Alex (17 July 2014). How to license research data. Edinburgh, United Kingdom: Digital Curation Centre (DCC).
Hirth, Lion, Ingmar Schlecht, and Jonathan Mühlenpfordt (6 November 2018). Open data for electricity modeling: an assessment of input data for modeling the European electricity system regarding legal and technical usability — White paper. Berlin, Germany: Neon Neue Energieökonomik. A report for the Federal Ministry for Economic Affairs and Energy, Germany.
Hirth, Lion (1 January 2020). “Open data for electricity modeling: legal aspects”. Energy Strategy Reviews. 27: 100433. ISSN 2211-467X. doi:10.1016/j.esr.2019.100433.
Lämmerhirt, Danny (December 2017). Avoiding data use silos: how governments can simplify the open licensing landscape. Open Knowledge International. Cambridge, United Kingdom.
Morrison, Robbie, Tom Brown, and Matteo De Felice (10 December 2017). Submission on the re-use of public sector information: with an emphasis on energy system datasets — Release 09. Berlin, Germany. Published under a Creative Commons CC BY 4.0 license.
Morrison, Robbie (30 May 2020). Submission on a European strategy for data with an emphasis on energy sector datasets — Release 08. Published under a Creative Commons CC BY 4.0 license.
Wilkinson, Mark D et al (15 March 2016). “The FAIR Guiding Principles for scientific data management and stewardship — Comment”. Scientific Data. 3: 160018. doi:10.1038/sdata.2016.18. 53 authors in total. Open access.
Kreutzer, Till (2011). Validity of the Creative Commons Zero 1.0 universal public domain dedication and its usability for bibliographic metadata from the perspective of German copyright law. Berlin, Germany: Büro für Informationsrechtliche Expertise.
Poblet, Marta, Amir Aryani, Paolo Manghi, Kathryn Unsworth, Jingbo Wang, Brigitte Hausstein, Sunje Dallmeier-Tiessen, Claus-Peter Klas, Pompeu Casanovas, and Victor Rodriguez-Doncel (September 2016). Assigning creative commons licenses to research metadata: issues and cases — Preprint. See also doi:10.1007/978-3-030-00178-0_16.
Kreutzer, Till (13 November 2014). Open content: a practical guide to using Creative Commons licences. Germany: German Commission for UNESCO, North Rhine-Westphalian Library Service Centre (hbz), Wikimedia Deutschland — Gesellschaft zur Förderung Freien Wissens. ISBN 978-3-940785-57-2.
- The concept of open data is regrettably not supported in the body of European directive 2019/1024. Referring to recital 16 (page 58), the term “shared” is neither defined nor deployed in the body of the directive, and the term “re‑use” is perversely remapped to “use” under definition §2.11 (page 70). Moreover, the notion of “use” has a longstanding and restrictive tradition under established copyright law and this body of law is likely to be material. These legal defaults therefore require that genuinely open public sector information be explicitly signaled as such through the application of established data‑capable open licenses and preferably the Creative Commons CC‑BY‑4.0 license.