Hi @marvha.
Comment: This main section mistakenly assumes the original data would be added to GitHub too. So read on with this misunderstanding in mind …
A couple of general comments first. Jurisdiction is important. Given that the data originates in the United Kingdom, UK law almost certainly applies. The UK has an unusually low threshold for copyright and still has the European database directive on its statutes. So one has to assume that copyright and/or database rights can apply. Corporations may be more cavalier in this regard, but researchers are normally highly risk-averse and need clear and suitable terms of use and reuse.
For some more specific material, as it relates to the United Kingdom:
I believe the National Grid license is based on the United Kingdom Open Government Licence v3.0. The OGL‑UK‑3.0 specifies English law as its choice of law. That provision may well not stand, as it is difficult to write binding choice-of-law provisions into public licenses. But doubtless most forms of infringement will point to the United Kingdom through considerations like the origin of the content, the location of servers, and the place of alleged infringement. The choice-of-law provision also disqualifies that license from being "open" as per The Open Definition.
I have seen civil society organizations based in the United States do much of what you propose with third‑party network data. In one case, the original provider followed up and the civil society organization essentially invited the original provider to litigate. Although the original provider easily had the means, they chose not to pursue their claim. Read into that what you will.
Another anecdote. A person on the panel at a recent webinar asked why energy researchers are not making better use of the ENTSO‑E Transparency Platform (TP). A major reason, in my view, is that the data remains legally encumbered, or at least there is no certainty that it is not. I should add that there were major efforts, when the openmod first formed, to gain Creative Commons CC‑BY‑4.0 licensing on the TP, but that decision never happened. Relative to English law, there is even less chance that copyright or database rights apply to the TP and could be successfully litigated, but that legal uncertainty nonetheless remains under present conditions.
As for your question about legal exceptions for research purposes, unfortunately I cannot comment in the context of English law. In the United States, you would not need to seek a defence because this material would be free of intellectual property. In Germany, there are detailed exceptions for research and education. There are also European-level exceptions for data scraping, duly enacted in national laws (and probably still present, post‑Brexit, in UK law). But neither set of provisions is broad enough to allow a third party (such as yourself) to relicense under CC‑BY‑4.0 and republish. Note that the US affirmative defences you could employ fall under the rubric of "fair use" and have been built up largely through case law. Germany, on the other hand, lists detailed exemptions in its copyright statute (UrhG), which is also officially translated into English.
Is this all a bit of a mess? Yes. Is it going to be cleaned up anytime soon? Unlikely. So what can we do? Push for CC‑BY‑4.0 licensing on primary data and CC0‑1.0 on metadata. There have been a few successes along the way, one being the German BNetzA SMARD site.
One final thought. Perhaps you could apply the National Grid license rather than push for the Creative Commons attribution license? HTH, R.
Addendum: And sorry, I missed your original question about sharing just the edits or diffs. The same kind of question could arise with software, except that a public license is required to run software, be it in binary or source form, so that question cannot really occur. I will ask elsewhere and get back to you in, say, a week's time. There is a lot of copyright law on derivative works as it applies to original creative material. But I've not heard of anything related to numerical data in this context. Back in due course.
Follow up questions: @marvha: Can you clarify how deep your edits are? And how much of the original material remains in the datasets you want to republish? And were your edits mechanical or did you add something creative? Can you distribute just the diffs or are you seeking to make available runnable datasets? TIA, R