Compiled results from past data breakout groups

This topic summarizes past discussions. You can complete or correct it by editing this post, which is a wiki. Please reply to this topic only to discuss the breakout groups. For discussions of individual data topics, I suggest opening a new thread (topic).

Open data is a prominent topic that has been discussed at every openmod meeting, always in one or more breakout groups.

1st meeting

link to results :
summary: (For some reason I cannot access the site, therefore I am writing what I remember; it needs additions/corrections from the community!)

  • We discussed the differences between the approaches to open data and open source modelling.
  • We decided that openmod should go beyond open data.
  • Open data was the common denominator of all participants.
  • We wanted to work out useful ways to share data and information about data, to reduce the time we spend on data research.

2nd meeting

two consecutive breakout-groups
link to results :
we defined

  • short term steps:
    ** use the wiki and put links there, as well as our own scripts for processing the data
    ** the form of the links and of the comments on the links should be standardized
    ** a set of requirements and metadata has already been listed, as well as tags and the most necessary datasets (see google doc)
  • Long-term vision (part one):
    ** we want to cover the full process of input => process => output (input to models) on a central server to make everything transparent
    ** possibility to link the used data directly to the sources
    ** get the output data through an API or as a CSV download
    ** reach total transparency for processed data
    ** challenges discussed: complying with international initiatives; quality and updating of data; participation
    => we stated that it is only realistic to go beyond some basic datasets if it is done in a paid project
  • Long-term vision (part two):
    ** who can provide us with a server and a database for our visions? => we discussed finding someone to fund the server (as e.g. the Klimarechenzentrum for meteorology data): our idea was to approach OKFN (who already provide CKAN for openEI and [...]); we drafted three possible ways of working together with them:
    *** dissolve the openmod initiative into an OKFN working group on “Open Energy”
    *** Only use selected resources that OKFN might provide us with: Hosted CKAN, mailing list, etc…
    *** Do our own stuff
  • additional point
    throughout the whole discussion it was mentioned that a glossary should be opened where we could discuss the terminology we use (which is especially important for defining the data, tags and data processes, but also for other subjects)

##Follow up of the second meeting:

  • ZNES Flensburg (Clemens) was in contact with OKFN; the answer was that they did not have the capacity to deal with this at the moment.
  • He further suggested that OKFN become a partner in a project proposal (H 2020) in which database questions for open energy data could be addressed in depth and which could serve to build up a database along the lines of the visions above. Unfortunately, OKFN did not have the capacity to participate in that either.
  • As mentioned in the presentations at the second meeting, ZNES and RLI had a proposal for an open source model that was very likely to be funded, in which they succeeded in adding one supplementary work package that provides paid time to further discuss the visions and improve the tools (especially open data) for the community.

3rd meeting

two breakout-groups dealing with the data subject: 1. concrete data discussion (Open Weather Data), 2. discussion of the requirements of the platform (that also contains the database) (OpenEnergy Platform)
link to results :
summary :
1. Open Weather Data (@participants of the group: please go through this and add important missing points. I took it from the google doc but didn’t participate in this group). Topics were:

  • Share and improve our understanding of the sources of open weather data - both historic and future - wind, solar, hydro, wave/ocean.
  • Discover and document best practices for using weather data
  • Potential for an inter-comparison project
  • Understand what we require from the next generation of reanalyses for energy applications

Next steps were defined:

  • Iain to fill out the openmod website with a “how to do this” (make a first stab)
  • Reading meteorologists to fix it
  • Reading to compile a best-practices list for reanalysis

some requirements for reanalysis and best practice were listed:

  • Higher level wind speeds at fixed heights.
  • Higher spatial resolution
  • Long time span (30 years is the typical length of a climate period)
  • True hourly timesteps. MERRA’s internal resolution is 3 hours; they then do some fancy post-processing to get to hourly

best practice:

  • Use several years of weather data
  • How to go from 3- or 6-hourly to 1-hourly data – bootstrapping to get the distributions?
  • Use multiple sources
  • Testing & validation
  • A good question to put to a meteorology department
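
To illustrate the question of going from 3-hourly to hourly data, here is a minimal sketch in Python. It uses plain linear interpolation, not the bootstrapping mentioned above, and the function name `interpolate_to_hourly` is made up for this example. Linear interpolation cannot produce values beyond the neighbouring samples, so it smooths out hourly variability, which is one reason why a distribution-based approach was raised in the group.

```python
def interpolate_to_hourly(values, step_hours=3):
    """Linearly interpolate equally spaced samples to hourly values.

    `values` are samples taken every `step_hours` hours (hypothetical input,
    e.g. wind speeds). The result starts at the first sample and ends at the
    last one.
    """
    if step_hours < 1:
        raise ValueError("step_hours must be at least 1")
    hourly = []
    for left, right in zip(values, values[1:]):
        for k in range(step_hours):
            # Fraction k/step_hours of the way from `left` to `right`.
            hourly.append(left + (right - left) * k / step_hours)
    if values:
        hourly.append(values[-1])  # close the series with the last sample
    return hourly


three_hourly = [3.0, 6.0, 3.0]
print(interpolate_to_hourly(three_hourly))
```

The same function works for 6-hourly input with `step_hours=6`.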

=> @participants of the group: the google document already contains a lot of discussion and pointers to datasets. Was there a follow-up where the links were included in the wiki? Please link here if so.

2. OpenEnergy Platform
The requirements that were envisioned in the former meetings can’t be achieved with a database alone. For features like tags, rating, discussion of datasets, glossary and revisioning (linking of results/scenarios to the exact dataset, full transparency of data processing) additional tools have to be implemented. These were collected in the discussion of the “OpenEnergy Platform”. Here it is only noted that the platform was discussed at the meeting. The summary of the discussion should be added to the category “website”.

##Follow up 3rd Meeting
The question of what is needed for full transparency/reproducibility was discussed in a workshop in October 2015.
Short summary of the WS results: to make a scenario transparent from assumptions to results we need a detailed description of the scenario, which should be linked to the model that was used (exact version; if needed, also a link to the framework the model was built from) as well as to the datasets used (exact revisions). Therefore the contents of a scenario factsheet and of a model factsheet were discussed. This was continued as a follow-up in a google spreadsheet and in the openmod wiki.
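
The linking described above could look roughly like the following Python sketch. It is not the factsheet format agreed in the workshop; all class and field names (`ScenarioFactsheet`, `ModelRef`, `DatasetRef`, `missing_links`) are assumptions made for illustration. The point is only that a scenario record carries an exact model version and exact dataset revisions, and that missing links can be detected automatically.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ModelRef:
    name: str
    version: str                     # exact model version
    framework: Optional[str] = None  # framework the model was built from, if any


@dataclass(frozen=True)
class DatasetRef:
    name: str
    revision: str  # exact dataset revision


@dataclass
class ScenarioFactsheet:
    title: str
    description: str
    model: ModelRef
    datasets: List[DatasetRef] = field(default_factory=list)

    def missing_links(self) -> List[str]:
        """Return the pieces of information still needed for reproducibility."""
        problems = []
        if not self.model.version:
            problems.append("model version")
        if not self.datasets:
            problems.append("datasets")
        for dataset in self.datasets:
            if not dataset.revision:
                problems.append(f"revision of dataset '{dataset.name}'")
        return problems


# Hypothetical example values, not taken from any real scenario.
sheet = ScenarioFactsheet(
    title="Example scenario",
    description="Illustrative only",
    model=ModelRef(name="examplemodel", version="1.2.0"),
    datasets=[DatasetRef("demand", "rev-7"), DatasetRef("weather", "")],
)
print(sheet.missing_links())
```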

4th meeting

breakout-groups dealing with the data subject: 1. Data Collection (with subgroups: o) European data set, a) Grid data, b) Open Weather Data and c) Power Plant Data), 2. Data management, 3. Data matching/harmonisation
link to results :
1 and o),:
1. Data Collection
Goals of the group:

=> Check datasets are not duplicated in the wiki/CKAN
After a common introduction and a discussion of the state of development of the database (especially of the frontend for the user), this group split up into different parts (European weather data, power plants, network data).

subgroup European data
We want to have an overview of what kind of data is necessary or “nice to have” for modelling Europe. We want to know which data is available and which data has to go on our “shopping list”.
We filled in TODOs for general missing data categories (e.g. environmental and socio-economic data).

We didn’t spend time discussing where exactly to put the data because the mid-term objective is to include it in the database.

subgroup Grid data
summary: the following goals were stated:

  • to discuss the GIS aspect of grid data
  • A method to collect different aspects of grid data for energy system modelling
  • Gather the needs of energy system modellers on grid data


  • OSM (OpenStreetMap) data in particular was discussed (see google doc), e.g.: besides the geographical information there are different parameters with varying completeness; relations solve the problem of correct electrical connections.
  • SciGRID (European grid data) offers a method to include relations for transmission grids.
  • Mapping relations for the 110 kV (lower high-voltage) level is not realistic.

Problems were identified (whole list in google doc):

  • How to import missing grid data and substation data into OSM, especially in other (European/worldwide) countries?
  • How to communicate the needs of energy system modellers to the OSM community?
  • What kind of parameters are needed for simulating grid data?
  • It is nearly impossible to include electrical parameters of substations.

Output and Outreach

  • The vision is a global dataset of transmission grid data in OSM, together with specific OS tools, for everybody to use
  • Write a statement to the OSM community on how important OSM data is for (open) energy system modellers.
  • Combine different methods of processing OSM-data (SciGrid, GridKit, see: )
  • Implement the grid processing in the OEP (openmod platform)
  • Be careful about approaching the community with a to-do list. Rather, motivate and help them pursue their hobby as well as possible.
  • Adjust the data structure for substations (perhaps based on the PyPSA approach?)
  • Connect electrical parameters to the OSM/SciGRID-substation object AND the relations (lines) in order to complete electrical information on infrastructure

subgroup weather data
The discussion from meeting 3 was continued, especially on the use of weather data in energy systems analysis. The participants had special interests in certain regions.
Best practice and necessary features of weather data were gathered (resolution, time steps, height…)
=> happy if someone adds more details

subgroup Power Plant data
According to the google doc, metadata and attributes for power plant data were discussed (see list in doc)
=> happy if someone adds more details

2. data management
We discussed how data is stored in different energy models and how data quality is assured; the current state of the concept for addressing this in the database was also discussed.

  • Quality of forecast data: the datasets often don’t include all the assumptions behind the numbers
  • data gaps: how to add information about the assumptions for several cells (database concept: JSON files can be used for that)
  • two values for the same year

Concerning quality, we discussed what kind of (automated) testing could be done to avoid bad quality:

  • structure of data
  • are sources given?
  • differentiate the quality tests between real and synthetic data
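
The automated tests listed above could be sketched as follows. This is an illustration in Python, not the checks implemented in the database; the function `check_dataset` and the metadata keys `sources`, `kind` and `assumptions` are assumptions made for this example. It checks the structure of the rows, flags two entries for the same key (e.g. the same region and year), asks for sources, and applies a stricter rule to synthetic data.

```python
def check_dataset(rows, metadata, required_columns, key_columns):
    """Return a list of human-readable quality issues (empty if none found).

    rows: list of dicts, one per table row (hypothetical in-memory format)
    metadata: dict with the assumed keys 'sources', 'kind', 'assumptions'
    """
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Structure: every required column must be present and filled.
        missing = [c for c in required_columns if row.get(c) is None]
        if missing:
            issues.append(f"row {i}: missing {missing}")
        # Duplicates: two entries for the same key, e.g. the same year.
        key = tuple(row.get(c) for c in key_columns)
        if key in seen:
            issues.append(f"row {i}: duplicate entry for {key}")
        seen.add(key)
    # Sources must be given.
    if not metadata.get("sources"):
        issues.append("no sources given")
    # Real and synthetic data are tested differently.
    kind = metadata.get("kind")
    if kind not in ("real", "synthetic"):
        issues.append("metadata 'kind' must be 'real' or 'synthetic'")
    elif kind == "synthetic" and not metadata.get("assumptions"):
        issues.append("synthetic data without documented assumptions")
    return issues


rows = [
    {"region": "DE", "year": 2030, "value": 1.0},
    {"region": "DE", "year": 2030, "value": 2.0},
    {"region": "FR", "year": 2030},
]
for issue in check_dataset(rows, {"kind": "synthetic", "sources": []},
                           ["region", "year", "value"], ["region", "year"]):
    print(issue)
```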

3. Data matching/harmonisation
Data such as power plant information comes from multiple sources of differing reliability and has to be condensed into one harmonized, maximum-likelihood database while keeping track of the original sources.
The central problem is de-duplicating the sources/finding common entities.
=> happy if someone adds more details
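
As a starting point for discussion, the de-duplication problem described above can be sketched in Python. This is a simple heuristic based on name similarity from the standard library's `difflib`, not the maximum-likelihood approach the group had in mind. The record fields (`name`, `country`, `source`), the threshold and the plant names are assumptions made for this example. Each merged entity keeps all of its original records, so the sources stay traceable.

```python
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a name and unify hyphens and whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


def same_plant(a, b, threshold=0.85):
    """Heuristic: same country and sufficiently similar names."""
    if a["country"] != b["country"]:
        return False
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold


def merge_records(records):
    """Group records that seem to describe the same plant.

    Each entity keeps every original record, including its source.
    """
    entities = []
    for record in records:
        for entity in entities:
            # Compare against the first record of each existing entity.
            if same_plant(entity["records"][0], record):
                entity["records"].append(record)
                break
        else:
            entities.append({"records": [record]})
    return entities


# Made-up records from two hypothetical sources "A" and "B".
records = [
    {"name": "Example Plant North", "country": "DE", "source": "A"},
    {"name": "example-plant  north", "country": "DE", "source": "B"},
    {"name": "Example Plant North", "country": "GB", "source": "A"},
]
for entity in merge_records(records):
    print([r["source"] for r in entity["records"]])
```

A real solution would also need to compare further attributes (capacity, fuel, location) and weight the sources by reliability.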

5th meeting

two breakout-groups on data: 1. Participating in the OpenEnergy Database (oedb) (with some subgroups); 2. a database for people working on rural electrification
link to results :
rural electrification: at the bottom of the same document

The whole group started with an introduction to the state of development and the concept of the database (oedb). The common discussion tackled questions like

  • Which other databases exist or are under construction?
  • What kind of open data do you use?
  • What kinds of APIs are available?
  • Discussion about metadata (metadata standards)
  • Discussion about tables/schemas and the setup of the datasets (tables should always be merged and harmonized if possible).

subgroup Discussion on merging databases

Aim: how can we merge the existing open databases?
Different possibilities were discussed:

  • Simple data access → technical aspect
  • Standardised data model(s)
  • Decentralized databases linked to decentralized energy systems → no global database
  • Metadata conversion
  • Connections between databases: not the same structure, but compatibility

Example: the dat data project (Berlin), but it is at a very early stage.
ToDo: start communication with the developers of other open databases

subgroup Discussion on Tags
Aim: suggest how to handle the tags
Why: tags are the main way to find datasets. You can search across the whole DB or within one schema.
The tables should be organised by tags, not by splitting tables.
A minimum number of tags should be obligatory for each table; the following were discussed as suggestions but need to be discussed again:
=> universal tags (global database search/filter)
Countries (continents shall be added automatically) or continent
Type of data (measurements, real data; future projections; processed data, …)
=> schema-specific tags should be defined as obligatory and free; it would be good if they were offered when saving new data
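
A small Python sketch of the universal tags suggested above, with the continent added automatically from the country. Everything here is an assumption made for illustration: the function `universal_tags`, the tag format `key:value`, the short list of data types and the tiny country table are not part of the oedb. For simplicity the sketch always expects at least one country, whereas the suggestion also allows tagging by continent only.

```python
# Hypothetical lookup table; a real one would cover all countries.
COUNTRY_TO_CONTINENT = {"DE": "Europe", "FR": "Europe", "KE": "Africa", "IN": "Asia"}

# Assumed short names for the data types named in the discussion.
DATA_TYPES = {"measurement", "projection", "processed"}


def universal_tags(countries, data_type):
    """Build the obligatory universal tags for a table.

    Raises ValueError if an obligatory tag is missing or unknown.
    """
    if not countries:
        raise ValueError("at least one country is required")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type}")
    tags = set()
    for country in countries:
        if country not in COUNTRY_TO_CONTINENT:
            raise ValueError(f"unknown country code: {country}")
        tags.add(f"country:{country}")
        # The continent tag is derived, so users never have to enter it.
        tags.add(f"continent:{COUNTRY_TO_CONTINENT[country]}")
    tags.add(f"type:{data_type}")
    return sorted(tags)


print(universal_tags(["DE", "KE"], "measurement"))
```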

subgroup Discussion on Dataset use (rural electrification)
Aim: data for electricity demand analysis; load curves are needed
What kind of data do we need at a minimum?
→ before electrification: number of people, HHs, sample size
→ appliances, hours of operation, time windows (World Bank DATA / Surveys / Databases)
How should we share our data?
→ mailing list, add “Rural Electrification” to the glossary, list of useful references via google docs

Some sources (or even possibility to put data?) were mentioned like:

subgroup Discussion on datasets about energy profiles
Datasets of interest:
Energy demand profiles (electricity, natural gas, heating)
Fluxes (persons, goods, energy) for transportation analyses
Different detail levels are considered by different researchers: national, regional, city, district in the city, single building.

Sources and possibilities for getting the data were discussed.

Text and images licensed under CC BY 4.0. Data licensed under CC0 1.0. Code licensed under MIT.