Looking to verify model quality and accuracy

I am looking for:

  1. expert practitioners who have a significant, credible track record of verifying the performance of specific models.
  2. research and standards documentation on the process one should follow to obtain certification of a particular model in terms of best practice and/or industry standards.
  3. benchmark studies comparing the performance of various energy modelling software.

I found only one thread on this topic, dated 2017: "Projects and publications regarding model evaluation".

Is there any new (more recent) research or publications that deal with this topic?

Thanks to anyone who is able to provide advice on this topic.

The following framework comparison exercise was reported in June 2022 and at least one publication covers the comparison methodologies used:

Regarding the terminology in this context:

  • a framework essentially refers to the codebase and supporting software development infrastructure
  • a model essentially refers to a particular instance on which a number of scenarios are run in order to investigate some predefined research question

Some care is therefore required to distinguish between framework quality and model quality, although clearly the two are intertwined. There is a considerable literature on software quality assurance and practice, but less so on model quality assurance.

Good software engineering suggests that the code should not contain hardcoded parameters and that all necessary values should be read in at runtime. Model quality is critically dependent on data availability, so data quality assurance necessarily plays a key role. Model quality also depends on the purpose of the research being undertaken.
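As a minimal sketch of the "no hardcoded parameters" point, the snippet below reads model parameters from JSON text at runtime and validates them before use. The parameter names (`discount_rate`, `co2_price_eur_per_t`) and the validation bounds are hypothetical and chosen only for illustration; they are not taken from any particular framework.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParameters:
    # Hypothetical parameters, used only to illustrate the pattern.
    discount_rate: float
    co2_price_eur_per_t: float


def load_parameters(text: str) -> ModelParameters:
    """Parse parameters from JSON text and fail loudly on missing or invalid values."""
    raw = json.loads(text)
    params = ModelParameters(
        discount_rate=float(raw["discount_rate"]),
        co2_price_eur_per_t=float(raw["co2_price_eur_per_t"]),
    )
    # Basic data quality checks belong next to the loading code.
    if not 0.0 <= params.discount_rate < 1.0:
        raise ValueError("discount_rate must lie in [0, 1)")
    if params.co2_price_eur_per_t < 0.0:
        raise ValueError("co2_price_eur_per_t must be non-negative")
    return params


# In practice the text would come from a file or a command-line argument.
example = '{"discount_rate": 0.05, "co2_price_eur_per_t": 80}'
params = load_parameters(example)
print(params)
```

Keeping the values outside the code means the same framework can be rerun on a different dataset without editing the source, which also makes the data itself easier to review.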

The quality of production data under statutory reporting in Europe is poor, as the posting below indicates. Some market information under statutory reporting in Europe is also actively made more difficult to recover while still technically complying with the relevant legislation (doubtless in an attempt to increase data sales revenues). Moreover, neither class of information is served under clear legal terms concerning its use and reuse.


Henry et al (2021) report on a recent cross‑framework comparison between modeling projects based in the United States. The authors distinguish between parametric uncertainty and structural uncertainty.

They control for parametric uncertainty by proposing a simplified example, a common dataset, and a reduced set of scenarios that all four frameworks involved can support natively. This methodology means that only structural differences can remain. The projects are Temoa, MEM, energyRt, and SECTR. The authors close with a call for greater use of community benchmarking.
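To illustrate the idea, here is a minimal sketch of how results from several frameworks run on a common dataset might be compared. Because the inputs are identical, any remaining spread is attributed to structural differences. The framework labels and cost values below are hypothetical placeholders, not results from Henry et al, and the metric (relative deviation from the cross-framework mean) is just one simple choice.

```python
from statistics import mean


def relative_spread(results: dict[str, float]) -> dict[str, float]:
    """Relative deviation of each framework's result from the cross-framework mean."""
    centre = mean(results.values())
    if centre == 0:
        raise ValueError("mean result is zero, relative spread is undefined")
    return {name: (value - centre) / centre for name, value in results.items()}


# Hypothetical total system costs for one scenario, same dataset in every framework.
total_system_cost = {
    "framework_a": 100.0,
    "framework_b": 104.0,
    "framework_c": 96.0,
    "framework_d": 100.0,
}

spread = relative_spread(total_system_cost)
for name, deviation in spread.items():
    print(f"{name}: {deviation:+.1%}")
```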

  • Henry, Candise L, Hadi Eshraghi, Oleg Lugovoy, Michael B Waite, Joseph F DeCarolis, David J Farnham, Tyler H Ruggles, Rebecca AM Peer, Yuezi Wu, Anderson de Queiroz, Vladimir Potashnikov, Vijay Modi, and Ken Caldeira (15 December 2021). “Promoting reproducibility and increased collaboration in electric sector capacity expansion models with community benchmarking and intercomparison efforts”. Applied Energy. 304: 117745. ISSN 0306‑2619. doi:10.1016/j.apenergy.2021.117745. (closed access)

Hi @danielschwab, I’m not aware of any formal “model verification standard”, but there is a recent manuscript about indicators to assess model behavior.

Also, there is a GitHub repo associated with that article (I think), see https://github.com/kvanderwijst/IAMDiagnostics.

Also, there are a couple of tools that could be useful to easily compare the results of your model to a scenario ensemble such as the one used by IPCC AR6 WG3 (IXMP Scenario Explorer developed by IIASA).
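A comparison of this kind can be sketched in plain Python without any of those tools: take the ensemble values for one variable and year, compute a percentile band, and check whether your own model's value falls inside it. The ensemble values and the 5th to 95th percentile band below are hypothetical choices for illustration only.

```python
from statistics import quantiles


def ensemble_band(values, lower_pct=5, upper_pct=95):
    """Return the (lower, upper) percentile band of an ensemble of values."""
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def within_band(value, band):
    """True if the value lies inside the band, bounds included."""
    lower, upper = band
    return lower <= value <= upper


# Hypothetical ensemble of one variable in one year across many scenarios.
ensemble = [410.0, 455.0, 470.0, 480.0, 495.0, 510.0, 525.0, 560.0, 640.0]
my_model_value = 530.0

band = ensemble_band(ensemble)
print(f"band: {band[0]:.1f} to {band[1]:.1f}")
print("inside band" if within_band(my_model_value, band) else "outside band")
```

Falling outside the band does not by itself mean a model is wrong, but it is a useful flag for results that deserve a closer look.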

For example, in the Horizon 2020 project NAVIGATE, I wrote a Jupyter notebook to compute and compare growth rates of various energy sector technologies and the implied learning rates (where investment costs for technologies become cheaper as cumulative installed capacity increases). The aim was to identify the ranges of these parameters across models, distinguishing between granular and “lumpy” technologies (i.e., technologies with few but large units, e.g., nuclear power plants).
See https://github.com/iiasa/navigate-technology-analysis. This could be a useful starting point for a more structured comparison.
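For readers unfamiliar with these indicators, the following is a minimal sketch, not the code from that notebook. It assumes a compound annual growth rate for capacity and a standard one-factor learning curve, where cost is proportional to cumulative capacity raised to the power -b, so that the learning rate (cost reduction per doubling of capacity) is 1 - 2^(-b). The input series are hypothetical.

```python
import math


def compound_annual_growth_rate(start, end, years):
    """Average annual growth rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1.0 / years) - 1.0


def implied_learning_rate(cumulative_capacity, unit_cost):
    """Fit log(cost) = a + slope * log(capacity) by least squares.

    With slope = -b, the learning rate per doubling of capacity is 1 - 2**slope.
    """
    xs = [math.log(c) for c in cumulative_capacity]
    ys = [math.log(c) for c in unit_cost]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return 1.0 - 2.0 ** slope


# Hypothetical series: capacity doubles each period, cost falls 20% per doubling.
capacity_gw = [1.0, 2.0, 4.0, 8.0]
cost_per_kw = [1000.0, 800.0, 640.0, 512.0]

growth = compound_annual_growth_rate(capacity_gw[0], capacity_gw[-1], years=3)
learning = implied_learning_rate(capacity_gw, cost_per_kw)
print(f"growth rate: {growth:.1%}, learning rate: {learning:.1%}")
```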

If you would rather work with R, you could take a look at https://github.com/ecemf/model_comparison, which is used for scenario validation and comparison in the Horizon 2020 project ECEMF.


Thanks Robbie, your insights into this important topic are much appreciated.

Thanks Daniel, I appreciate the valuable insights into this important topic.
