Distribution network data without license: what to do

marvha · 27 June 2024 08:18

Hi all,

I am a researcher working on distribution network modelling, and I have an edited version of two low voltage networks from the North-West region of the UK that I would like to release (the editing process basically consists of making them four-wire with explicit neutral, instead of three-wire as they currently are, in case you are curious).

I would like to make them available together with some code on a github repository, with a CC-BY-4.0 license. However, I am not sure about the implications as I am not the owner/creator of the original data set, and the original data set comes without a license; it consists of the ZIP folder “LV network models” located here: https://www.enwl.co.uk/future-energy/innovation/smaller-projects/low-carbon-networks-fund/low-voltage-network-solutions/

Can anyone help me with my doubts, i.e., am I allowed to edit and openly share the edits when the original dataset comes with no license?

I am based in the EU, in case that’s relevant.

Thanks!

mathieu-vallee · 27 June 2024 12:57

Looking at the website, it seems that the content is “© Electricity North West 2024”.

So I guess you should ask for their permission to republish the (modified) data set. Since they published it openly, they will probably be OK?

In the process, you could also advise them to add an explicit licence, if they do want to grant others the right to use this work.

marvha · 27 June 2024 14:52

Thanks for the prompt reply.
Unfortunately, they are not very responsive, but I’ll try again, I guess…
But also, isn’t there an exception to copyright for non-commercial research purposes if there is no license?

robbie.morrison · 27 June 2024 18:12

Hi @marvha.

Comment: This main section mistakenly assumes the original data would be added to GitHub too. So read on with this misunderstanding in mind …

A couple of general comments first. Jurisdiction is important. Given that the data originates in the United Kingdom, UK law almost certainly applies. The UK has an unusually low threshold for copyright and still has the European database directive on its statute books. So one has to assume that copyright and/or database rights can apply. Corporations may be more cavalier in this regard, but researchers are normally highly risk averse and need clear and suitable terms of use and reuse.

For some more specific material, as it relates to the United Kingdom:

Pang, Yiu-Shing (7 November 2022). Open licences. LinkedIn. Blog.
NGED (2023). National Grid — Open data licence. National Grid. London, United Kingdom.

I believe the National Grid license is based on the United Kingdom Open Government Licence v3.0. The OGL‑UK‑3.0 claims the choice of law to be English law. That provision may well not stand as it is difficult to write binding choice of law provisions into public licenses. But doubtless most forms of infringement will point to the United Kingdom through considerations like the origin of content, location of servers, and place of alleged infringement. The choice of law provision also disqualifies that license as “open” as per The Open Definition.

I have seen civil society organizations based in the United States do much what you propose with third‑party network data. In one case, the original provider followed up and the civil society organization essentially invited the original provider to litigate. Although the original provider easily had the means, they chose not to pursue their claim. Read into that what you will.

Another anecdote. A person on the panel at a recent webinar asked why energy researchers are not making better use of the ENTSO‑E Transparency Platform (TP). A major reason, in my view, is that the data remains legally encumbered, or at least there is no certainty that it is not. I should add that there were major efforts, when the openmod first formed, to gain Creative Commons CC‑BY‑4.0 licensing on the TP, but that decision never happened. Compared with English law, there is even less chance that copyright or database rights apply to the TP and could be successfully litigated, but that legal uncertainty nonetheless remains under present conditions.

As for your question about legal exceptions for research purposes: unfortunately, I cannot comment in the context of English law. In the United States, you would not need to seek a defence because this material would be free of intellectual property. In Germany, there are detailed exceptions for research and education. There are also European-level exceptions for data scraping, duly enacted in national laws (and probably still present, post‑Brexit, in UK law). But neither set of provisions is broad enough to allow a third party (such as yourself) to relicense under CC‑BY‑4.0 and republish. Note that the US affirmative defences you could employ fall under the rubric of “fair use”, and these have largely been built up through case law. Germany, on the other hand, lists detailed exemptions in its copyright statute (UrhG), which is also officially translated into English.

Is this all a bit of a mess? Yes. Is it going to be cleaned up anytime soon? Unlikely. So what can we do? Push for CC‑BY‑4.0 licensing on primary data and CC0‑1.0 on metadata. There have been a few successes along the way, one being the German BNetzA SMARD site.

One final thought. Perhaps you can get the National Grid license applied rather than push for the Creative Commons attribution license? HTH, R.

Addendum: Sorry, I missed your original question about sharing just the edits or diffs. The same kind of question could arise with software, except that a public license is required to run software, be it in binary or source form, so that question cannot really occur there. I will ask elsewhere and get back to you in, say, a week's time. There is lots of copyright law on derivative works as it applies to original creative material, but I’ve not heard of anything related to numerical data in this context. Back in due course.

Follow up questions: @marvha: Can you clarify how deep your edits are? And how much of the original material remains in the datasets you want to republish? And were your edits mechanical or did you add something creative? Can you distribute just the diffs or are you seeking to make available runnable datasets? TIA, R

marvha · 27 June 2024 20:14

hi Robbie, thanks for the very extensive reply and follow-up, it is certainly helpful!

Indeed, I am not interested in uploading the original dataset to github.

The original dataset (fully contained in the zip folder linked above) consists of 128 low voltage feeders in OpenDSS format, grouped in 25 networks. The network data is complete, i.e., it includes connectivity and branch impedance values. User connection locations are part of the connectivity information. Some anonymized/synthetic demand/generation profiles are also provided in the datasets, which can be assigned to the users. Node (bus) coordinates are also provided (these are not representative of actual geographic locations; they are there to give the relative distances between the nodes).

The edits are limited to two of these feeders (each from a different network). We do not use the demand/generation from this data set at all (we take some from elsewhere), nor the coordinates (don’t care about the geographical distances in our work).

To be honest, all I want is for other researchers to be able to run exactly the same scripts as I do to (re-)generate the results of a paper I am preparing. This means that I would like everyone to be able to use the very same input data that I use, including the edited feeders. All other data sources (including user profiles) and code are fine to share and redistribute; these edited feeders are the only thing I am unsure about, because they are not accompanied by an explicit license. I thought that making the edited data available under CC would be a natural way to do so, but I am actually happy to upload them in any other allowed, reusable form.

I suppose you would classify the edits as “creative”. They can be summarized as:

1) an explicit neutral wire is added to all branches
2) four-wire impedance data from another source replaces the original impedance information altogether; the original impedances are discarded
3) for one of the two feeders, one user location is slightly edited (because it was originally overlapping another object, I think; I need to check this though)

These changes are captured and stored in edited versions of the OpenDSS files of the feeders.
Furthermore, within the script:
4) users are arbitrarily assigned power profiles from a different (non-problematic) dataset
5) in a subset of the analyzed scenarios, neutral grounding information is added at some nodes.

But the changes in 4) and 5) are not “stored” in any format; they are just on-the-fly changes to the parsed OpenDSS data.
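Since 4) and 5) only exist at run time, a minimal sketch of how they could still be made exactly reproducible, assuming plain string identifiers for users, profiles and buses: seeding a dedicated random generator makes the "arbitrary" assignment identical on every run, and grounding can be emitted as OpenDSS command strings. The function names, the `gnd_` prefix, node 4 as the neutral and the 10 ohm grounding resistance are illustrative assumptions, not taken from the scripts described above.

```python
import random

def assign_profiles(user_ids, profile_ids, seed=42):
    """Assign each user an arbitrary profile; a fixed seed gives the same mapping every run."""
    rng = random.Random(seed)  # local generator, unaffected by the global random state
    # Sorting makes the result independent of the order in which users were parsed.
    return {user: rng.choice(profile_ids) for user in sorted(user_ids)}

def grounding_commands(buses, r_ohm=10.0):
    """Hypothetical OpenDSS commands tying the neutral (assumed node 4) of each bus to ground (node 0)."""
    return [
        f"New Reactor.gnd_{bus} phases=1 bus1={bus}.4 bus2={bus}.0 R={r_ohm} X=0"
        for bus in buses
    ]

if __name__ == "__main__":
    users = ["load3", "load1", "load2"]
    profiles = ["profile_A", "profile_B"]
    # Same seed and same set of users give the same assignment, whatever the input order.
    assert assign_profiles(users, profiles) == assign_profiles(list(reversed(users)), profiles)
    for cmd in grounding_commands(["bus12", "bus40"]):
        print(cmd)
```

Using a dedicated `random.Random` instance rather than the module-level generator means other code that draws random numbers cannot shift the assignment. The command strings would be passed to OpenDSS through whatever interface the scripts already use.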

Please let me know if the steps are not clearly and sufficiently explained.

I suppose I could also avoid uploading any of the ENWL data, and just add some instructions for users in the GitHub README, along the lines of:

- go to the ENWL website and download the files
- put them in folder xxx
- run script yyyy, which edits the two ENWL feeders and generates local OpenDSS files
- run the paper’s script using these files as input, as originally intended.

But the latter is a bit more cumbersome (but allowed?)…
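Script yyyy in this workflow could look roughly like the following sketch. It assumes that each line is defined by a single-line OpenDSS `New Line.` command whose bus names carry no explicit node suffix (so three phases default to nodes 1.2.3), and that the replacement four-wire line codes, with impedance data from the other source, are defined in a separate file. `LINECODE_MAP`, the line-code names and the example line are hypothetical placeholders, and the single user-location fix is not covered.

```python
import re
from pathlib import Path

# Hypothetical mapping from original 3-wire line codes (lower case) to
# 4-wire line codes defined elsewhere; the names are placeholders.
LINECODE_MAP = {"lc_3w_a": "lc_4w_a"}

def to_four_wire(line: str, linecode_map: dict) -> str:
    """Rewrite a single 'New Line.' definition as a 4-wire line with explicit neutral."""
    if not line.strip().lower().startswith("new line."):
        return line  # leave all other OpenDSS commands untouched
    # Buses without a node suffix get explicit phase nodes 1-3 plus neutral node 4;
    # buses that already carry a node suffix are left as they are.
    line = re.sub(r"(?i)\b(bus[12])=([^\s.]+)(?=\s|$)", r"\1=\2.1.2.3.4", line)
    line = re.sub(r"(?i)\bphases=3\b", "phases=4", line)

    def swap_linecode(match: re.Match) -> str:
        code = match.group(2)
        return f"{match.group(1)}={linecode_map.get(code.lower(), code)}"

    # The original impedances are dropped by pointing the line at a 4-wire line code.
    return re.sub(r"(?i)\b(linecode)=(\S+)", swap_linecode, line)

def convert_file(src: Path, dst: Path, linecode_map: dict = LINECODE_MAP) -> None:
    """Write an edited copy of an OpenDSS file, line by line."""
    lines = src.read_text().splitlines(keepends=True)
    dst.write_text("".join(to_four_wire(text, linecode_map) for text in lines))

if __name__ == "__main__":
    example = "New Line.LINE1 Bus1=1 Bus2=2 phases=3 Linecode=LC_3W_A Length=10 Units=m"
    print(to_four_wire(example, LINECODE_MAP))
```

Because the script, not the repository, produces the edited feeder files, only the transformation logic would be published; the ENWL files stay on the user's machine.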

The code is all of my doing so no problem with that either.

robbie.morrison · 27 June 2024 22:20

Your explanation of the workflow makes almost complete sense!

Some preliminary remarks before I address your questions. I doubt that 96/9/EC database rights would apply, but I cannot be sure. For them to apply, your download (technically an “extraction”) would have to be “substantial”, although that can be gamed somewhat by divvying up the database. The operator would need to have made a “substantial” investment in the structure, excluding the content. And your particular use or reuse would need to have a material impact on their financial viability. These are all fairly normal constructs in the law of civil wrongs. But unlikely to be met across the board in your circumstances?

Copyright could well apply in your case. As mentioned earlier, the threshold in Britain is particularly low with “intellectual effort” being sufficient. In the United States, some modicum of creativity is required. So I would imagine copyright is quite possible here?

If there are indeed no intellectual property rights present, you are perfectly welcome to add whatever public license you like to the material you circulate, including the CC‑BY‑4.0 license. Downstream users can potentially assess the IP status of the datasets and then elect to ignore your public license notice and treat the material as legally unencumbered. In the United States, that would be public domain.

If you decide to make your modifications public, I would omit any data not relevant to your numerics. But that does not solve the problem of creating open data for your reproducibility, development, and so on.

Your edits seem relatively minor and your GitHub proposal would retain much of the original content. So your modifications would create a derivative work and the original provider could seek redress, provided that copyright is indeed present.

Your suggestion to describe API or even manual access and then let users download directly from the primary data provider is used quite widely to deal with protected or legally uncertain data. There are some clear technical drawbacks, such as servers going dark or the data itself being altered. But the scheme works fine from a legal perspective. You could compute MD5 checksums to ensure that users have identical inputs?
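The checksum idea could be a few lines of Python run before the paper's scripts: hash each downloaded file and compare it against digests recorded when the paper was prepared. The folder `xxx`, the relative file name and the digest below are placeholders the author would have to fill in. MD5 is adequate for spotting a changed or corrupted download; `hashlib.sha256` would be a drop-in alternative if deliberate tampering were a concern.

```python
import hashlib
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_inputs(folder: Path, expected: dict) -> list:
    """Return the relative paths that are missing or whose MD5 differs from the recorded value."""
    problems = []
    for rel_path, md5_hex in expected.items():
        path = folder / rel_path
        if not path.is_file() or md5sum(path) != md5_hex:
            problems.append(rel_path)
    return problems

if __name__ == "__main__":
    # Hypothetical file name and digest; replace with the values recorded for the paper.
    EXPECTED = {"feeder_a/Master.dss": "0123456789abcdef0123456789abcdef"}
    mismatches = verify_inputs(Path("xxx"), EXPECTED)
    print("All inputs match." if not mismatches else f"Check these files: {mismatches}")
```

A mismatch does not say what changed upstream, only that results may no longer be reproduced exactly, which is the drawback of relying on the provider's servers noted above.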

It would be nice if the primary data provider could be persuaded to license under CC‑BY‑4.0, as the SMARD portal does, as I mentioned earlier. I once asked a senior manager whether they had encountered any issues, and the answer was no. Their new license notice probably just provides legal certainty rather than explicit grants, on the assumption that no enforceable intellectual property is present in the material?

You could also try contacting LF Energy. They must run into these kinds of issues quite regularly. LF Energy are meeting in Brussels starting 05 September 2024. That said, the original OpenDSS files you described would not be protected in the United States. HTH, R.

marvha · 1 July 2024 15:22

Thanks Robbie, this certainly helps, and I feel like I know what to do now (including trying again to contact the people who released the original files to ask them to license them).
I will also attend the LF Energy event in Brussels, so I’ll try to have a chat about this there. Would love to meet if you attend the event, too.

Hope you had a great cycling trip!

robbie.morrison · 2 July 2024 07:44

One general issue I encounter in relation to data is that legal advisors, whether in‑house or external, instinctively adopt defensive positions solely from the perspective of the organizations they counsel, under a narrow interpretation of their duty-to-client protocols, I suppose. The downsides for wider community usage are often not considered.

Nor is the possibility that enforceable intellectual property rights may be absent altogether. As I suggested earlier, that assessment can be left to downstream users, who can decide whether such rights apply or not. But there is no direct legal downside for the original publisher in implicitly or explicitly asserting the presence of non-existent copyrights and database protections. The commercial world is normally more cavalier about these issues, but research institutes are invariably extremely risk averse and need clear and explicit terms of use and reuse.

The Creative Commons CC‑BY‑4.0 public license works well in this regard and is genuinely international and widely understood, paired with the CC0‑1.0 waiver on accompanying metadata.

@marvha: good luck with your reproducibility solutions and any interactions back upstream, R