Not having seen this draft before, I read it with a fresh pair of eyes.
I am not sure who the intended audience is or what the ultimate objective is, so my comments may reflect my own misinterpretations in this regard.
The early part of the document seems to dwell rather on the problems of sourcing and cleaning data, whereas in practice a number of projects are now successfully tackling these issues; the document could be a bit more upbeat here.
Moreover, when discussing case studies, some additional energy database projects (perhaps OPSD) should be included.
The first half makes a number of more-or-less unsupported statements about open data and open-source development, while the second half discusses specific projects and their experiences. The two halves need to be better integrated, but I am not sure how best to tackle this.
A few comments:
- a need to separate the concepts of data availability (a retrieval issue) and data use and reuse (a licensing issue)
- the section on data processing should make reference to the need to establish community data standards and perhaps ultimately published norms
- the distinction between optimization models and simulation models is too polarized: LP and MIP models contain elements of simulation and simulation models can use optimization to resolve processes like market clearing
- more focus on real-world examples may be needed early on, especially in relation to the open-source ideals of cooperation and community development
- the discussion on mathematical programming is too technical and contributes little or nothing
- the release date and details of the Open Government Licence (OGL) for the UKTM example are needed (I was not aware that UKTM had been published, and it is now 2017; correct me if I am wrong)
- more analysis of real-world license choice would help: for instance, how has license choice affected the success of various projects? (I tend to the view that the licensing of energy models is not particularly significant; we are not talking about systems programming or library development here)
- licensing issues could be accorded their own section
- given that a fair chunk of model development is undertaken by PhD students, some discussion of code with multiple authors in this context is probably warranted
- in the discussion, the list of benefits accruing from the UKTM being open-source needs to be backed up with secondary sources
And a few trivial points:
- the “NaN” term is rather obscure; knowledge of floating-point arithmetic should not be a requirement for the reader
- incomplete references make some parts hard to follow
- the term “open code” might be better than “open source” in some places, including figure 1
- I have no idea what this is getting at: “Reproducibility is not guaranteed due to possible changes in the back-end software.”
- figure 2 should note that hybrid typologies exist
- if you are going to discuss mathematical programming languages, you should mention GLPK MathProg
- why do open-source platforms “regularly lack … sound support”? My experience is quite the opposite
- I don’t agree with the statement “the most important point is that publishing something is better than publishing nothing”; the best open-source projects have a tremendous commitment to software quality (to give just one example, the GLPK solver)
- describing copyleft licenses as “viral” is somewhat pejorative and unnecessary
- the GPL licensing of GAMS code is described as “incompatible”: this needs a reference
- I don’t get the point about the use of commercial solvers being problematic; surely the only issue is cost, as there should be no problem linking CPLEX to your own open-source C++ model (or am I missing something?)
To address Tom’s question about how the statement differs from Pfenninger et al. (2017): the statement offers a number of case studies that the paper does not, and it is directed toward practitioners (I would guess), whereas the paper is more general. That said, there are some distinct similarities.
Another key difference is that the statement covers data and code licensing in substantially more detail. (I presume that is a central objective of the statement.) However, that treatment would benefit from being pulled out of the main thread and given dedicated sections on licensing and copyright for data and code. Database rights should also be mentioned. A paragraph on the licensing of documentation (under either a GNU documentation license or a Creative Commons license) would complete the discussion.
Finally, and without being too fat-headed, the two Wikipedia pages I started on open data and open code contain useful background and references that could contribute to this statement. Ditto for the openmod article. On that note, which license is going to be used for this document? I suppose CC BY 4.0 (like the openmod wiki), in which case material cannot simply be copied from Wikipedia.