Working out global, annual figures for carbon intensity of electricity for servers to use for scheduling compute jobs

mrchrisadams · 17 March 2022 13:44

hi there.

I’m not an energy modeller - I’m a software engineer and I’m trying to build some level of awareness of carbon intensity into tools for managing datacentres and working out where and when to schedule computing loads, and I have a couple of questions for areas outside my domain of expertise. I hope that’s ok.

Background

For those interested - here’s the issue where I’m listing notes to build some of these into a Go software library, for consumption inside software for managing datacentres and servers, like Kubernetes, Nomad and so on.

github.com/thegreenwebfoundation/grid-intensity-go

Add UNFCCC - IFI as a provider of annual country carbon intensity emissions figures

opened 09:22AM - 21 Feb 22 UTC

mrchrisadams

I've been looking around, and it looks like there's a helpful set of annual emissions factors for countries that have been commissioned and created by the [United Nations International Financial Institutions project](https://unfccc.int/climate-action/sectoral-engagement/ifis-harmonization-of-standards-for-ghg-accounting).

### TODO

_this is the TLDR for this issue_

- [ ] Add default global figure for 2021 (if we have this as a fallback we can at least implement most of the code itself)
- [ ] clarify the license of the figures in the spreadsheet (can we include them in an Apache 2 licensed codebase? Are we able to relicense this data for use in OSS projects?)
- [x] create test harness, to flesh out the API, following the [same Provider Interface as the other providers](https://github.com/thegreenwebfoundation/grid-intensity-go/blob/main/provider.go#L7-L9)
- [x] implement IFI Provider, and check about the correct name.

In particular, there's a section about why they were created in the first place on the project, and I think it's worth sharing here:

> ### Stakeholders' expectations
>
> Donors as well as institutional investors expect standards that are simple to use but that also ensure comparability of GHG emissions estimates to inform their decision-making related to project finance. They also expect the standards to be credible, transparent and widely accepted.
>
> ![expectations](https://user-images.githubusercontent.com/17906/154918584-aef35327-39f4-47f7-a4cb-64bcd6f91930.png)

These provide:

- a) recent figures for every single country in the UN (although without the corresponding ISO two or three letter codes. These can be added into a Golang `map` or JSON file to read).
- b) guidance on how to use these in various contexts.

I understand the intention of these to be widely available, providing _some_ kind of sensible baseline for discussions about carbon emissions from activity. I can't see how you'd do this without the information being open, because if the information isn't open, then only the people who can afford the data get to make any data-informed arguments. That seems to go against the stakeholder expectations, although this is a thing we'd need to confirm, obvs.

### A sample of the data

Here's the [link to the data in the spreadsheet - Harmonized IFI Default Grid Factors 2021 v3.1](https://unfccc.int/sites/default/files/resource/IFI%20Default%20Grid%20Factors%202021%20v3.1_unfccc.xlsx)

The first four numeric columns are combined margin emission factors - as I understand it, this looks like it factors in future deployment, which is in most cases expected to be lower carbon than the existing infra.

Country / Territory / Island | Firm Energy (e.g., Hydro, Geothermal) | Intermittent Energy (e.g., Solar, Wind, Tidal) | Energy Efficiency | Electricity Consumption | Operating Margin Grid Emission Factor, gCO2/kWh (including for use in PCAF GHG accounting)
-- | -- | -- | -- | -- | --
Afghanistan | 193 | 331 | 193 | 193 | 414
Albania | 0 | 0 | 0 | 0 | 0
Algeria | 397 | 479 | 397 | 397 | 528
American Samoa (U.S.) | 516 | 664 | 516 | 516 | 753
Andorra | 70 | 144 | 70 | 70 | 188
Angola | 748 | 1203 | 748 | 748 | 1476

### What the columns mean

> The common dataset containing DEFs is constructed using a Combined Margin (CM) for the grid that is comprised of an Operating Margin (OM) and a Build Margin (BM). The OM and BM are terms defined under the clean development mechanism (CDM) for grid connected electricity generation from renewable sources:
>
> (a) The OM represents the cohort of existing power plants whose operation will be most affected (reduced) by the project;
> (b) The BM represents the cohort of the prospective/future power plants whose construction and operation could be affected by the renewable energy project, based on an assessment of planned and expected new generation capacity.

So, in other words:

1. The OM is a figure for energy from existing power plants - i.e. this is how much carbon is produced for every kilowatt hour of electricity at present. These come from a range of different places, and it's not clear what the licensing is for this information.
2. The BM is for future generation that _might or might not get built_. i.e. if you have a nice geothermal plant providing 2 GW of clean, firm power, then perhaps you don't need to generate 2 GW from that planned coal plant after all. We're not sure if these will be built though, so we calculate them separately, as it doesn't make sense to try factoring either into the emissions at present.

The first four columns with numbers are _combined margin_ grid emission factors. These are different to the OM ones, as you can see. As I mentioned before, I think the OM figures are higher because they're not trying to incorporate future planned infrastructure - you might care about this if you were making an investment with a 10-20 year time horizon (i.e. do I build this solar farm?), but it's less relevant for understanding the carbon footprint of electricity today.

### Guidance on their use

There is some guidance on where you would use these country level figures from [Methodological Approach for the Common Default Grid Emission Factor Dataset - AHG 001](https://unfccc.int/sites/default/files/resource/IFITWG_Methodological_approach_to_common_dataset.pdf):

> This approach is most appropriate to apply in the case where grid emissions associated with project electricity consumption cannot be estimated accurately (e.g., due to paucity of hourly generation profile data or estimation of such would require sophisticated modelling). A tier 1 approach cannot be used with projects that actively manage electricity loads by modifying consumption profiles to achieve desired goals (e.g., matching consumption with the availability of electricity produced from a specific energy source).

Tier 1 here basically means _default_ energy figures at country level, when you have no other info available. It's one step better than a global estimate, i.e. 440 g CO2/kWh. Tier 2 and tier 3 refer to higher resolution, like grid or even grid region level, where you _do_ have information on how the number differs depending on where the generation is in a country, and when it's being run. This might be the case if you had access to one of the other APIs we have support for.

For the purposes of understanding your emissions from compute you use _now_, where you aren't doing anything clever like moving compute loads through time or space in response to the grid itself changing, the OM figures look like they would work as a baseline.

Here's some quoted guidance on their usage as outlined in [Methodology/approach to account project emissions associated with grid electricity consumption - AHG-002](https://unfccc.int/climate-action/sectoral-engagement/ifis-harmonization-of-standards-for-ghg-accounting/ifi-twg-list-of-methodologies). Emphasis is mine:

> This guidance applies to any investment project that uses grid electricity as an energy source. Examples of project types that are relevant for the approach are provided below (the list below is not exhaustive).
>
> (a) Heat pumps, lights and appliances in buildings;
> (b) Electric motors, pumps, robots, etc. in manufacturing facilities;
> (c) Pumps, sensors and control systems in waste-water treatment plants;
> **(d) Servers, telecommunications towers, computers, telephones and other ICT devices;**
> (e) Electric vehicles (buses, cars, trucks, lawn motors, forklifts at ports, and tractors in agriculture, etc.).

This makes me think they would work as a useful default set of numbers for the countries, to use for calculating emissions from an infra set up at present - _a baseline of sorts_.

### Comparing these to higher time resolution marginal emissions numbers from APIs

I don't think this is the same as the data from marginal computing APIs, but I'll readily admit that even now, I'm not sure. I understood that marginal intensity from APIs on a short time horizon was for people figuring out _right now_, for example, whether turning something on or off would have much of an effect on the carbon intensity of the grid.

If you wanted to work out the carbon savings from clever scheduling of compute you might compare these numbers with a marginal API, like that provided by [WattTime](https://nextjournal.com/greenweb/experiments-with-the-free-marginal-carbon-intensity-from-wattime), [Electricity Map's Marginal API](https://electricitymap.org/blog/marginal-carbon-intensity-of-electricity-with-machine-learning/), the [Carbon intensity in the UK provided by the National Grid](https://carbonintensity.org.uk/), and so on.

These APIs give you some idea of what each marginal unit of usage might be - so rather than _looking at the whole fleet of power plants, and allocating a share to you that's the same as the energy your server is using_, they're looking at the ones that would need to be switched on based on their current view of the conditions on the entire grid, _as a consequence of you running that compute, and causing the extra demand for electricity to power that server_. These aren't the same, and they tend to be higher, as you'd typically switch on faster responding fossil fuel generation to meet this need.

These seem conceptually similar to the marginal figures in the dataset linked above, but after reading the guidance, I get the impression that the annual, by-country figures in the dataset are more about deciding whether to finance a whole new energy project, rather than deciding if existing, often dirtier capacity should be spun up to meet an uptick in demand.

It may be the case that these balance out anyway over the long term, but I'm leaving a note here, for discussion, and to ask others for pointers.
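As a rough illustration of the idea in the issue (a Go `map` of country codes with a global fallback), here is a minimal sketch. It is not the grid-intensity-go Provider interface: the function names `IntensityFor` and `EmissionsGrams` are hypothetical, the ISO 3166-1 alpha-2 codes are added by hand because the spreadsheet does not include them, and the values are just the operating margin figures from the sample rows above plus the 440 g CO2/kWh global estimate mentioned in the issue.

```go
package main

import (
	"fmt"
	"strings"
)

// globalFallback is the rough global estimate quoted in the issue, in g CO2/kWh.
const globalFallback = 440.0

// operatingMargin holds the operating margin column from the sample rows,
// keyed by ISO 3166-1 alpha-2 code (assumption: codes added by hand).
var operatingMargin = map[string]float64{
	"AF": 414,  // Afghanistan
	"AL": 0,    // Albania
	"DZ": 528,  // Algeria
	"AS": 753,  // American Samoa (U.S.)
	"AD": 188,  // Andorra
	"AO": 1476, // Angola
}

// IntensityFor returns the annual figure for a country code, or the global
// fallback (with found == false) when the country is not in the table.
func IntensityFor(countryCode string) (gramsPerKWh float64, found bool) {
	code := strings.ToUpper(strings.TrimSpace(countryCode))
	v, ok := operatingMargin[code]
	if !ok {
		return globalFallback, false
	}
	return v, true
}

// EmissionsGrams estimates emissions as energy used multiplied by intensity.
func EmissionsGrams(energyKWh float64, countryCode string) float64 {
	intensity, _ := IntensityFor(countryCode)
	return energyKWh * intensity
}

func main() {
	for _, code := range []string{"dz", "AO", "ZZ"} {
		intensity, found := IntensityFor(code)
		fmt.Printf("%s: %.0f g CO2/kWh (from table: %t), 10 kWh is about %.0f g CO2\n",
			code, intensity, found, EmissionsGrams(10, code))
	}
}
```

Returning a `found` flag alongside the figure lets a caller tell a real country figure apart from the fallback, which matters if the fallback is only meant as a last resort.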

For those new to it, Nomad is a fairly well known open source piece of software that powers a significant chunk of the internet. Amongst others, it’s used by Cloudflare, for example, to handle tonnes of traffic, and they detail their use here.

I’ve been chatting to the team there, and there’s now a prototype branch being worked on that consumes the Go library I’ve linked above.

That would mean anything that uses Nomad can also use carbon intensity information, to move computing loads through space with annual data, or, assuming there is hourly data, to move computing loads through time and space, to where the carbon intensity is lower.

You can see the readme below, which outlines at a high level how it’s planned to work:

github.com

hashicorp/nomad/blob/d8b3a61d9ab85d3102e6c666af9f11c2a4a04ae5/CARBON.md

# Carbon-aware Nomad Experiment

This branch is an experiment to enable Nomad to minimize the climate impact of
the compute it manages. In particular it takes the carbon impact of nodes into
account when scheduling: prioritizing the use of lower-carbon-producing
compute.

## Changes

### Scheduling

A new Carbon Scoring algorithm has been added to the scheduler. When enabled
the higher a node's carbon score, the less likely it will receive work.

Scoring weighting has also been added. To enable carbon scoring you must give
it a non-zero weight either on startup with the following server config:

```hcl
server {
  default_scheduler_config {
```

This file has been truncated.
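The README excerpt above describes a carbon score with a configurable weight, where a weight of zero leaves carbon scoring disabled. The following is a hypothetical sketch of how such a weighting could work, not Nomad's actual algorithm: the `Node` type, `RankScore`, `PickNode` and the normalisation against the highest intensity among the candidates are all assumptions made for illustration. The intensities in `main` are the Algeria and Andorra operating margin figures from the sample table; the fit scores are made up.

```go
package main

import "fmt"

// Node is a hypothetical scheduling candidate (not a Nomad type).
type Node struct {
	Name        string
	FitScore    float64 // 0..1, higher is better (for example a bin-packing fit)
	GramsPerKWh float64 // carbon intensity of the grid the node runs on
}

// RankScore blends the fit score with a carbon term. A carbonWeight of 0
// disables carbon scoring; a higher intensity always lowers the result.
func RankScore(n Node, carbonWeight, maxIntensity float64) float64 {
	if carbonWeight == 0 || maxIntensity <= 0 {
		return n.FitScore
	}
	carbon := n.GramsPerKWh / maxIntensity // normalise to 0..1
	if carbon > 1 {
		carbon = 1
	}
	return (n.FitScore + carbonWeight*(1-carbon)) / (1 + carbonWeight)
}

// PickNode returns the candidate with the highest blended score.
func PickNode(nodes []Node, carbonWeight float64) (Node, bool) {
	if len(nodes) == 0 {
		return Node{}, false
	}
	maxIntensity := 0.0
	for _, n := range nodes {
		if n.GramsPerKWh > maxIntensity {
			maxIntensity = n.GramsPerKWh
		}
	}
	best := nodes[0]
	bestScore := RankScore(best, carbonWeight, maxIntensity)
	for _, n := range nodes[1:] {
		if s := RankScore(n, carbonWeight, maxIntensity); s > bestScore {
			best, bestScore = n, s
		}
	}
	return best, true
}

func main() {
	nodes := []Node{
		{Name: "dz-1", FitScore: 0.9, GramsPerKWh: 528},
		{Name: "ad-1", FitScore: 0.7, GramsPerKWh: 188},
	}
	for _, w := range []float64{0, 1} {
		n, _ := PickNode(nodes, w)
		fmt.Printf("carbon weight %.0f picks %s\n", w, n.Name)
	}
}
```

With the weight at zero the better-fitting node wins; with a weight of one, the lower-carbon node overtakes it despite its lower fit score.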

For context, datacentres have a surprising amount of flexibility in demand.

A typical rack can consume 10-20 kW of power, and well… datacentres can fit lots of racks into rooms, of which less than 20% are typically being used by compute jobs, with a significant majority waiting idle.

Sidenote: Tools like Nomad allow this idle capacity to be reduced, by moving reliability to another level - instead of having reliability through redundancy within the datacentre, you achieve redundancy across datacentres, and increase the ‘density’ of compute on the servers that are running.

Methodology qn 1 - global, annual marginal intensity of electricity:

I have a methodology question about marginal intensity figures, and I’d like to be able to derive sensible global or “Europe” fallback figures, for cases when a machine doesn’t have access to information about where in the world it’s running.

I know that publicly accessible marginal intensity figures exist on an annual basis for various countries. We’ve spoken about this a few times before, and the GitHub issue linked above adds some more information.

The dataset listed above gives annual, per-country marginal intensity figures, based on the ‘operating margin’ and the ‘build margin’ numbers as documented in the issue.

If you wanted a global, annual figure for the marginal intensity of electricity for, say, 2021, I’m a little unsure of how you’d derive it.

Would you need a weighted average based on the assumed capacity or generation in each country? Or is there a more sensible approach than this?

If so, I know that Ember release some of this capacity / generation data, to make it possible to figure these numbers out, and they are licensed along very permissive lines. Is this a valid approach to take? If not, what would you suggest instead?
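To make the question concrete, this is what a generation-weighted average would look like as arithmetic, assuming that approach turns out to be valid at all (which is exactly what is being asked). The `CountryFigure` type and `WeightedIntensity` function are hypothetical names. The intensities are operating margin figures from the sample table in the issue; the generation values are made-up placeholders, not real Ember data.

```go
package main

import (
	"errors"
	"fmt"
)

// CountryFigure pairs an annual intensity with the weight used for averaging.
type CountryFigure struct {
	Code          string
	GramsPerKWh   float64 // annual per-country figure, e.g. the operating margin
	GenerationTWh float64 // annual generation used as the weight
}

// WeightedIntensity returns sum(intensity * generation) / sum(generation).
func WeightedIntensity(figs []CountryFigure) (float64, error) {
	var weighted, total float64
	for _, f := range figs {
		if f.GenerationTWh < 0 {
			return 0, fmt.Errorf("negative generation for %s", f.Code)
		}
		weighted += f.GramsPerKWh * f.GenerationTWh
		total += f.GenerationTWh
	}
	if total == 0 {
		return 0, errors.New("no generation data to weight by")
	}
	return weighted / total, nil
}

func main() {
	// PLACEHOLDER generation values (3 and 1), chosen only to show the maths.
	figs := []CountryFigure{
		{Code: "DZ", GramsPerKWh: 528, GenerationTWh: 3},
		{Code: "AO", GramsPerKWh: 1476, GenerationTWh: 1},
	}
	avg, err := WeightedIntensity(figs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("weighted average: %.1f g CO2/kWh\n", avg)
}
```

Swapping `GenerationTWh` for consumption or installed capacity changes only the weights, so the same function could be used to compare how sensitive the resulting fallback figure is to that choice.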

Qn 2 - Use cases beyond computing scheduling tools.

The project I’m working on is part of this project, where the goal is to find a way to annotate every single public address on the internet with some baseline carbon intensity data, so that other projects can consume this information for other interesting uses.

One example I’ve seen is the idea of carbon-aware routing, which ranks the routes for transmitting packets of data across the web based on carbon intensity.

I’ve outlined some more examples here:

https://www.thegreenwebfoundation.org/wp-content/uploads/2021-front-conference/index.html#47

If you know of others, I’d be interested in hearing as I’m coming to energy from a computing background, not the other way around.

robbie.morrison · 17 March 2022 16:01

Some fascinating ideas blogged. Let me add a couple of thoughts. Use of the instantaneous short‑run marginal carbon intensity of electricity at the point‑of‑supply is certainly conceptually attractive. In some senses it mirrors nodal pricing (aka locational marginal pricing, or LMP), with the unit price replaced by the carbon intensity in this case. Hence marginal carbon shares some of the attributes of LMP clearance, while assuming away any strategic behavior here of course.

In LMP, however, plant operators look forward and take stock, for instance, of their hydro reserves against their expected hydrology. But there is no equivalent foresight in marginal carbon intensity in the first instance. That said, future values could be estimated using system models and duly incorporated into the management of server farm loadings. But the raw marginal intensity does not provide especially useful information in this context, I don’t think.

Much the same problem is faced by those devising charging strategies and price incentives for electric vehicle users. Indeed with growing emobility, the low voltage network peak is increasingly being felt at midnight and not early evening as is historically the case.

So my suggestion is that some form of anticipatory marginal carbon intensity would be required to manage low carbon server farms. That said, the idea of routing internet traffic and shifting data crunching to better locations in time and space is very appealing across a range of scales from minutes to hours — but some prescience about the future evolution of the energy system is doubtless required too.
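To illustrate what using an anticipatory signal could mean for a scheduler, here is a minimal sketch that picks the start hour for a job from an hourly intensity forecast. The `BestWindow` function and the forecast values are hypothetical, and the sketch simply assumes some forecast is available; producing that forecast is the hard part described above.

```go
package main

import (
	"errors"
	"fmt"
)

// BestWindow returns the start index and mean intensity of the contiguous
// run of `hours` forecast slots with the lowest average carbon intensity.
func BestWindow(forecast []float64, hours int) (int, float64, error) {
	if hours <= 0 || hours > len(forecast) {
		return 0, 0, errors.New("job length must fit inside the forecast")
	}
	// Sum of the first window.
	sum := 0.0
	for _, v := range forecast[:hours] {
		sum += v
	}
	bestSum, bestStart := sum, 0
	// Slide the window one slot at a time, updating the sum incrementally.
	for i := hours; i < len(forecast); i++ {
		sum += forecast[i] - forecast[i-hours]
		if sum < bestSum {
			bestSum, bestStart = sum, i-hours+1
		}
	}
	return bestStart, bestSum / float64(hours), nil
}

func main() {
	// Hypothetical hourly forecast in g CO2/kWh, starting from the current hour.
	forecast := []float64{500, 450, 300, 200, 250, 400}
	start, mean, err := BestWindow(forecast, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("start in %d hours, mean intensity %.0f g CO2/kWh\n", start, mean)
}
```

The same function works for shifting in space if it is run once per candidate location and the results are compared.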

Once everything on the system is carbon neutral, this question naturally evaporates. HTH, R.

mrchrisadams · 13 May 2022 08:25

Hi Robbie!

The libraries I’ve linked consume APIs from providers like WattTime - they provide marginal intensity APIs specifically to provide this kind of _anticipatory marginal carbon intensity_ functionality.

https://www.watttime.org/api-documentation/#real-time-emissions-index

Electricity Map offer a similar service in Europe/rest of the world for tracking marginal emissions. You can see them below:

However, I think these all refer to forward-looking instances of grid marginal intensity.

I’m not familiar enough with models like PyPSA et al.

Do they have any way to represent concepts like ‘value stacking’ grid services from onsite storage and DER behaviour?

These at least seem to be more directly linked to local carbon intensity, and to physical activity on the grid, than PPAs.