Break-out group on the integration with external models

kais_siala · 12 October 2017 12:09

Best practices for documentation, APIs, and data formats to enable deep (calling models back and forth) or shallow (using outputs) integration.

Possible problems requiring integration

electricity model with air-emission model: study the impacts of pollution
Investment planning model, coupled to unit commitment: do invested technologies get committed
A core model of the European grid. Side models look at e.g. transport choices into the future
Shouldn’t be trying to build one giant model, it just won’t be good enough for everything!
- But you might be able to get better results than parceling out the modelling to lots of submodels (in which the solution space might be more limited)

Multi-model ecology paradigm:

Linking

Soft: output from model A is input of model B (eventually with some changes)
Hard: more advanced coupling, automated, e.g. code of model MASTER calls code of model SLAVE.
Third: you read the results from one model and try…
Should the research question come first? Would that then feed into possible links that need to exist
- not everyone agrees that it should come first
Data formats are important!
- Open Energy project has worked on metdata for formatting data

Some example projects:

uncertweb: http://www.iia.cnr.it/project/uncert-web/
SET-NAV: http://www.set-nav.eu/
- The SET-Nav team will bring together modelling groups from varied backgrounds. These groups use a diverse set of approaches (econometric, optimization, equilibrium, simulation) and focus on distinct aspects of the energy-economy-environment system.

Experiences

Soft linking is fine if the output data from model A is good enough quality. This requires manually checking model A output to ensure model B input is reasonable.
Agreed on ontology in some coupling projects, which is usually fine within the same field (e.g. electrical engineering)
Chain breaks (models failing) are difficult to deal with if it isn’t your model. We wish our models would fail gracefully with all the relevant information, but that just isn’t the case
ENTSO-E database to be a possible input to multiple models, which is difficult to do, especially with model linkage. Naming convention is really important, because people don’t understand the name and then incorrectly fill in a field
- CGMS(?) is an existing naming convention, but it is too complex

Creating an ontology (but maybe it isn’t an ontology that we’ve been talking about all along)

There is a glossary on the openmod wiki: https://wiki.openmod-initiative.org/wiki/Category:Glossary
Maybe just everyone has good documentation on what their model does

Visualisation - should we have a simple tool to visualise data for quick eye checks

Uncertainty - Is there a common definition we can use here?

Can we rate datasets for quality on a platform like Open Energy?

OpenEI (DOE) does something like this already. traffic light system and a forum
Can data processing metadata include s.d. etc. (i.e. ways in which data preprocessing is communicated)

Examples of bad integration:

Building level modelling went into middleware at one point (timestep coordination, moves data in and out of different models) - acted as a bottleneck and has been mostly removed/moved to better processes like Modellica

Data format

CSV? JSON?
Should we just be writing mini scripts to glue together different models, to change the data structure between two models
Maybe NetCDF would work?
- NETCDF4 built on HDF5, but with metadata stuff. Used well by the climate community
- Need to be careful with model sizes becoming too large, they’d take forever to process though

Unit-aware computing

Used in modelica
you put in your input units and the output will have known units, followed through from the input
PINT: https://pint.readthedocs.io/en/latest/

Model quality:

not just the data, but can you also trust the quality of the model itself
how many models do unit testing? There were some hands not up
Common set of input models for comparative model runs. Like the JS community, you could have a number of models (introducing different features) and a model is marked based on how many of those models it was able to run, and their outputs
Google took a model from Berkley’s Switch and simplified it to make a short model, that is good for teaching people about energy models

let’s all make simple models, with simple APIs and good documentations, then link them until we hit bottlenecks caused by the interlinking

APIs

The meaning is different in different settings (C, Python, web)
Web interface is particularly useful, as many want a simple interface (e.g. industry). Easy to do at the moment for simple models, but not for long running models
- Possibly could have a login page to come back to it when your model has run, so you can see your solutions. This is more difficult to implement
Web-based APIs helps to ask new questions, with new users bringing with them new questions

Getting up to speed on what a model is doing

Workflow management for reproducability

kais_siala · 12 October 2017 12:17

Example: electricity model with air-emission model: study the impacts of pollution