Best practices for documentation, APIs, and data formats to enable deep (calling models back and forth) or shallow (using outputs) integration.
Possible problems requiring integration
- electricity model with air-emission model: study the impacts of pollution
- Investment planning model, coupled to unit commitment: do invested technologies get committed?
- A core model of the European grid, with side models looking at e.g. transport choices into the future
- Shouldn't be trying to build one giant model; it just won't be good enough for everything!
- But you might be able to get better results than by parcelling the modelling out into lots of submodels (in which the solution space might be more limited)
Multi-model ecology paradigm:
- Each model does its own thing well, linking them gives complexity
- Should be easy to get up to speed on what the model is
- Need to have reproducibility (executable paper)
Linking
- Soft: output from model A is input to model B (possibly with some changes)
- Hard: more advanced, automated coupling, e.g. the code of model MASTER calls the code of model SLAVE.
- Third: you read the results from one model and try…
- Should the research question come first? Would that then feed into the possible links that need to exist?
- not everyone agrees that it should come first
- Data formats are important!
- Open Energy project has worked on metadata for formatting data
Some example projects:
- uncertweb: http://www.iia.cnr.it/project/uncert-web/
- SET-NAV: http://www.set-nav.eu/
- The SET-Nav team will bring together modelling groups from varied backgrounds. These groups use a diverse set of approaches (econometric, optimization, equilibrium, simulation) and focus on distinct aspects of the energy-economy-environment system.
Experiences
- Soft linking is fine if the output data from model A is of good enough quality. This requires manually checking model A's output to ensure model B's input is reasonable.
- An ontology was agreed on in some coupling projects, which is usually fine within the same field (e.g. electrical engineering)
- Chain breaks (models failing) are difficult to deal with if it isn’t your model. We wish our models would fail gracefully with all the relevant information, but that just isn’t the case
- The ENTSO-E database could be an input to multiple models, which is difficult to achieve, especially with model linkage. Naming conventions are really important, because people misunderstand a name and then fill in a field incorrectly
- CGMS(?) is an existing naming convention, but it is too complex
Creating an ontology (though maybe what we've been talking about all along isn't really an ontology)
- There is a glossary on the openmod wiki: https://wiki.openmod-initiative.org/wiki/Category:Glossary
- Maybe it is enough if everyone just has good documentation of what their model does
Visualisation - should we have a simple tool to visualise data for quick eye checks?
- Pandas DataFrames are definitely the quickest way to do that right now
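A hedged sketch of such a quick eye check with pandas; the file and column names ("model_a_output.csv", "timestamp", "generation_mw") are illustrative, not from any particular model:

```python
# Quick eye check on model output using pandas; names are illustrative only.
import pandas as pd

df = pd.read_csv("model_a_output.csv", parse_dates=["timestamp"])

print(df.describe())      # ranges and means reveal obvious outliers
print(df.isna().sum())    # missing values per column
# Quick visual sanity check of one series (requires matplotlib)
df.set_index("timestamp")["generation_mw"].plot()
```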
Uncertainty - Is there a common definition we can use here?
- How do people communicate their uncertain parameters?
Can we rate datasets for quality on a platform like Open Energy?
- OpenEI (DOE) does something like this already: a traffic-light system and a forum
- Can data-processing metadata include standard deviations etc.? (i.e. ways of communicating how data was preprocessed)
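One possible way to communicate this, sketched purely for illustration (the field names below are invented, not an existing standard), is a small JSON metadata file written alongside the dataset:

```python
# Illustrative only: attach preprocessing info and standard deviations
# to a dataset as a JSON metadata file. Field names are made up.
import json

metadata = {
    "name": "national_load_2015",
    "unit": "MW",
    "preprocessing": {
        "gap_filling": "linear interpolation for gaps shorter than 3 h",
        "original_resolution": "15 min",
        "resampled_to": "1 h",
    },
    "uncertainty": {
        "type": "normal",
        "standard_deviation": 120.0,  # MW, assumed estimate for illustration
    },
}

with open("national_load_2015.meta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```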
Examples of bad integration:
- Building-level modelling went through middleware at one point (timestep coordination, moving data in and out of different models); it acted as a bottleneck and has mostly been removed or replaced with better approaches like Modelica
Data format
- CSV? JSON?
- Should we just be writing mini scripts to glue different models together, converting the data structure between two models? (see the sketch after this list)
- Maybe NetCDF would work?
- NetCDF4 is built on HDF5, but adds metadata conventions; used widely by the climate community
- Need to be careful with model sizes becoming too large, though; they'd take forever to process
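A minimal sketch of such a glue script, assuming model A writes a wide CSV of hourly values and model B expects JSON records; all file and column names are hypothetical:

```python
# Glue script sketch: reshape model A's CSV output into model B's JSON input.
import pandas as pd

# Model A output: one column per technology, indexed by timestamp (assumed).
df = pd.read_csv("model_a_output.csv", index_col="timestamp")

# Model B input: a list of {timestamp, technology, value} records (assumed).
records = (
    df.stack()
      .rename_axis(["timestamp", "technology"])
      .reset_index(name="value")
)
records.to_json("model_b_input.json", orient="records")
```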
Unit-aware computing
- Used in Modelica
- You put in your input units and the output has known units, carried through from the inputs
- PINT: https://pint.readthedocs.io/en/latest/
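A small sketch of what unit-aware computation with Pint looks like: quantities carry their units through the calculation, so the output unit follows from the inputs instead of being assumed:

```python
# Units are attached to the inputs and propagate to the result.
import pint

ureg = pint.UnitRegistry()

capacity = 50 * ureg.MW
hours = 8760 * ureg.hour
capacity_factor = 0.35

energy = capacity * hours * capacity_factor
print(energy.to("GWh"))  # the output unit is known, not guessed
```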
Model quality:
- not just the data, but can you also trust the quality of the model itself
- How many models have unit tests? Not all hands went up (a minimal example follows this list)
- A common set of input models for comparative model runs. Like the JS community, you could have a number of reference input models (each introducing different features), and a model is marked based on how many of those it was able to run, and on its outputs
- Google took a model from Berkeley's Switch and simplified it into a short model that is good for teaching people about energy models
- Let's all make simple models, with simple APIs and good documentation, then link them until we hit bottlenecks caused by the interlinking
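A minimal sketch of the kind of unit test mentioned above; the `dispatch` function and its behaviour are hypothetical stand-ins for a real model component:

```python
# Toy model component plus a unit test (runnable with pytest).
def dispatch(demand, capacities):
    """Toy merit-order dispatch: fill demand from the cheapest capacity first."""
    met, result = 0.0, {}
    for name, cap in capacities:  # capacities assumed sorted by cost
        result[name] = min(cap, demand - met)
        met += result[name]
    return result

def test_dispatch_never_exceeds_demand():
    result = dispatch(80.0, [("wind", 50.0), ("gas", 100.0)])
    assert sum(result.values()) == 80.0
    assert result["gas"] == 30.0
```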
APIs
- The meaning is different in different settings (C, Python, web)
- A web interface is particularly useful, as many users (e.g. industry) want a simple interface. Easy to do at the moment for simple models, but not for long-running models
- Possibly one could have a login page to come back to once your model has run, so you can see your solutions; this is more difficult to implement (see the sketch after this list)
- Web-based APIs help to ask new questions, with new users bringing new questions with them
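A hedged sketch of the submit-and-poll pattern for long-running models, using Flask and a background thread purely for illustration (a real deployment would want a task queue, persistence and authentication):

```python
# Minimal submit-and-poll web API sketch for a long-running model.
import threading
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)
runs = {}  # run_id -> {"status": ..., "result": ...}

def run_model(run_id, params):
    # Placeholder for the actual (long-running) model call.
    runs[run_id] = {"status": "done", "result": {"objective": 42.0, "params": params}}

@app.post("/runs")
def submit_run():
    run_id = str(uuid.uuid4())
    runs[run_id] = {"status": "running", "result": None}
    threading.Thread(target=run_model, args=(run_id, request.get_json())).start()
    return jsonify({"run_id": run_id}), 202

@app.get("/runs/<run_id>")
def get_run(run_id):
    return jsonify(runs.get(run_id, {"status": "unknown"}))
```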
Getting up to speed on what a model is doing
- Perhaps one or two weeks
Workflow management for reproducibility
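A tiny illustrative workflow script in that spirit: each step re-runs only if its output is missing or older than its inputs, so the model A -> glue -> model B chain can be reproduced with one command. Script and file names are hypothetical; a real project might prefer a dedicated workflow tool such as snakemake:

```python
# Re-run each step only if its outputs are missing or stale.
import subprocess
from pathlib import Path

STEPS = [
    # (command, inputs, outputs)
    (["python", "run_model_a.py", "results/model_a_output.csv"],
     [], ["results/model_a_output.csv"]),
    (["python", "glue_a_to_b.py", "results/model_a_output.csv", "results/model_b_input.json"],
     ["results/model_a_output.csv"], ["results/model_b_input.json"]),
    (["python", "run_model_b.py", "results/model_b_input.json", "results/model_b_results.csv"],
     ["results/model_b_input.json"], ["results/model_b_results.csv"]),
]

def stale(inputs, outputs):
    outs = [Path(o) for o in outputs]
    if not all(o.exists() for o in outs):
        return True
    newest_input = max((Path(i).stat().st_mtime for i in inputs), default=0)
    return any(o.stat().st_mtime < newest_input for o in outs)

Path("results").mkdir(exist_ok=True)
for cmd, inputs, outputs in STEPS:
    if stale(inputs, outputs):
        subprocess.run(cmd, check=True)
```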