France Grid

Author

Ian Lyttle

Published

2025-01-30

The goal of this project is to execute and document a data pipeline for aspects of the French electrical grid. These are data made available publicly via APIs from RTE France.

I understand RTE France’s terms and conditions allow for republication, so long as the data are credited to RTE France, and are not distorted.

When finished, this site will publish tables as parquet files:

This will also be an opportunity for me to develop my skills with:

I am also developing an Observable notebook to be a consumer of this pipeline.

There are two sections you can access from the menu bar: the pipeline section contains the files in the pipeline; the about section has a little more material on how this pipeline was put together.

Data

In this section, we summarize the published data, as of the last run of the pipeline.

import polars as pl
from pyprojroot.here import here

Each API call to fetch data from RTE France contains, at most, two weeks of data. The pipeline runs on a daily schedule, so it will take a number of days before the pipeline “catches up” to the present day.
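As a rough illustration of why catching up takes several runs, the sketch below splits a date range into windows of at most two weeks, one window per API call. The function name two_week_windows is hypothetical, and how many windows the pipeline fetches per run is not stated here.

```python
from datetime import date, timedelta

MAX_SPAN = timedelta(days=14)  # at most two weeks of data per API call


def two_week_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive windows no longer than MAX_SPAN."""
    windows = []
    cursor = start
    while cursor < end:
        # the last window is shortened so that it stops exactly at `end`
        stop = min(cursor + MAX_SPAN, end)
        windows.append((cursor, stop))
        cursor = stop
    return windows


# Illustrative dates only: 35 days need three calls
windows = two_week_windows(date(2017, 1, 1), date(2017, 2, 5))
for window_start, window_end in windows:
    print(window_start, "->", window_end)
```

Each window's end is the next window's start, so no interval is skipped or fetched twice.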

Generation

generation = pl.read_parquet(here("data/99-publish/standard/generation.parquet"))

Two parquet files are published, each with the same information:

Because of JavaScript’s current time-zone limitations (soon to be solved), I am also writing a version of the data in which the date-times are relabeled as UTC while preserving the wall-clock time; these are the fake-UTC data.
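A minimal sketch of the fake-UTC idea, using only the standard library: the time-zone label is swapped for UTC while the clock fields stay as they are. The fixed +02:00 offset standing in for Europe/Paris in summer and the helper name to_fake_utc are illustrative assumptions. In polars the analogous operation is dt.replace_time_zone("UTC"); whether the pipeline does it that way is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +02:00 offset stands in for Europe/Paris during CEST
CEST = timezone(timedelta(hours=2), "CEST")


def to_fake_utc(dt: datetime) -> datetime:
    """Relabel an aware datetime as UTC, leaving the wall-clock fields unchanged."""
    return dt.replace(tzinfo=timezone.utc)


paris = datetime(2023, 8, 9, 23, 45, tzinfo=CEST)
fake_utc = to_fake_utc(paris)                 # same clock reading, labeled UTC
true_utc = paris.astimezone(timezone.utc)     # same instant, different clock reading

print("Paris:   ", paris.isoformat())
print("fake UTC:", fake_utc.isoformat())
print("true UTC:", true_utc.isoformat())
```

The fake-UTC value is not the same instant as the original; it differs from the true UTC value by the Paris offset, which is the price paid for keeping the local clock reading.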

The last few observations:

generation.tail()
shape: (5, 4)
type       interval_start              interval_end                generation
str        datetime[ms, Europe/Paris]  datetime[ms, Europe/Paris]  i64
"HYDRO"    2023-08-09 23:45:00 CEST    2023-08-10 00:00:00 CEST    1314
"NUCLEAR"  2023-08-09 23:45:00 CEST    2023-08-10 00:00:00 CEST    31675
"PUMPING"  2023-08-09 23:45:00 CEST    2023-08-10 00:00:00 CEST    -1
"SOLAR"    2023-08-09 23:45:00 CEST    2023-08-10 00:00:00 CEST    208
"WIND"     2023-08-09 23:45:00 CEST    2023-08-10 00:00:00 CEST    2348
  • type: type of generation
  • interval_start, interval_end: date-times describing the interval
  • generation: average (?) of generation during this interval (MW)

We count the number of observations and null values for the generation files (the counts will be the same for both files):

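# per generation type: time range covered, observation count, null count
# (group_by is the current polars name for the older groupby)
generation.group_by(pl.col("type")).agg(
    pl.col("interval_start").min(),
    pl.col("interval_end").max(),
    pl.col("generation").count().alias("n_observations"),
    pl.col("generation").null_count().alias("n_value_null"),
)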
shape: (10, 5)
type              interval_start              interval_end                n_observations  n_value_null
str               datetime[ms, Europe/Paris]  datetime[ms, Europe/Paris]  u32             u32
"NUCLEAR"         2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"EXCHANGE"        2017-01-01 00:00:00 CET     2023-07-27 18:30:00 CEST    230254          0
"FOSSIL_GAS"      2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"FOSSIL_OIL"      2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"PUMPING"         2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"FOSSIL_HARD_CO…  2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"SOLAR"           2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"WIND"            2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0
"HYDRO"           2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231523          0
"BIOENERGY"       2017-01-01 00:00:00 CET     2023-08-10 00:00:00 CEST    231524          0

I don’t know, right now, why "HYDRO" has one fewer observation than the others. I’ll try to find out! As a first step, here are the intervals with the fewest observations:

# count observations per interval, then show the intervals with the fewest
generation.group_by(pl.col("interval_start")).agg(
    pl.col("generation").count().alias("n_observations")
).sort(pl.col("n_observations")).head(2)
shape: (2, 2)
interval_start              n_observations
datetime[ms, Europe/Paris]  u32
2023-07-29 14:30:00 CEST    9
2023-07-28 07:00:00 CEST    9

I don’t think this was at the end of an API call…
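Counts per interval show that something is missing, but not which type. One way to narrow it down, sketched here in plain Python on made-up rows, is to compare the set of types seen in each interval against the full set of types. The function name missing_types and the sample rows are hypothetical; they are not part of the pipeline.

```python
from collections import defaultdict


def missing_types(rows):
    """Map each interval_start to the set of types that have no row for it."""
    all_types = {type_ for type_, _ in rows}
    seen = defaultdict(set)
    for type_, interval_start in rows:
        seen[interval_start].add(type_)
    # keep only the intervals where at least one type is absent
    return {
        start: all_types - types
        for start, types in seen.items()
        if types != all_types
    }


# Made-up rows: "HYDRO" is absent from the second interval
rows = [
    ("HYDRO", "2023-01-01 00:00"),
    ("SOLAR", "2023-01-01 00:00"),
    ("WIND", "2023-01-01 00:00"),
    ("SOLAR", "2023-01-01 00:15"),
    ("WIND", "2023-01-01 00:15"),
]
gaps = missing_types(rows)
print(gaps)
```

Applied to the real table, the same comparison would also flag every interval after the last "EXCHANGE" observation, so the result needs to be filtered by type to isolate the "HYDRO" gap.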

Flow

flow = pl.read_parquet(here("data/99-publish/standard/flow.parquet"))

Two parquet files are published, each with the same information:

The last few observations:

flow.tail()
shape: (5, 4)
partner        interval_start              interval_end                flow_net
str            datetime[ms, Europe/Paris]  datetime[ms, Europe/Paris]  i64
"Belgium"      2017-07-10 23:00:00 CEST    2017-07-11 00:00:00 CEST    1154
"England-IFA"  2017-07-10 23:00:00 CEST    2017-07-11 00:00:00 CEST    -2023
"Germany"      2017-07-10 23:00:00 CEST    2017-07-11 00:00:00 CEST    -662
"Italy"        2017-07-10 23:00:00 CEST    2017-07-11 00:00:00 CEST    -1150
"Switzerland"  2017-07-10 23:00:00 CEST    2017-07-11 00:00:00 CEST    527
  • partner: interchange, usually a country
  • interval_start, interval_end: date-times describing the interval
  • flow_net: average (?) of net power flow during this interval (MW); positive means France received power
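To make the sign convention concrete, this short sketch sums the flow_net values from the tail() output above for the interval starting 2017-07-10 23:00 CEST. Only the five partners shown there are included ("Spain" does not appear in that output), so the total is illustrative rather than a complete national balance.

```python
# flow_net values (MW) copied from the tail() output above, for a single interval
flow_net = {
    "Belgium": 1154,
    "England-IFA": -2023,
    "Germany": -662,
    "Italy": -1150,
    "Switzerland": 527,
}

# positive values are power received by France, negative values are power sent
total = sum(flow_net.values())
direction = "received" if total > 0 else "sent"
print(f"France {direction} {abs(total)} MW net across these partners")
```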

We count the number of observations and null values for the flow files (the counts will be the same for both files):

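# per partner: time range covered, observation count, null count
# (group_by is the current polars name for the older groupby)
flow.group_by(pl.col("partner")).agg(
    pl.col("interval_start").min(),
    pl.col("interval_end").max(),
    pl.col("flow_net").count().alias("n_observations"),
    pl.col("flow_net").null_count().alias("n_value_null"),
)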
shape: (6, 5)
partner        interval_start              interval_end                n_observations  n_value_null
str            datetime[ms, Europe/Paris]  datetime[ms, Europe/Paris]  u32             u32
"Switzerland"  2017-01-01 00:00:00 CET     2017-07-11 00:00:00 CEST    4574            0
"Belgium"      2017-01-01 00:00:00 CET     2017-07-11 00:00:00 CEST    4574            0
"Germany"      2017-01-01 00:00:00 CET     2017-07-11 00:00:00 CEST    4574            0
"Italy"        2017-01-01 00:00:00 CET     2017-07-11 00:00:00 CEST    4574            0
"Spain"        2017-01-01 00:00:00 CET     2017-06-27 16:00:00 CEST    4254            0
"England-IFA"  2017-01-01 00:00:00 CET     2017-07-11 00:00:00 CEST    4574            0

Secrets

To interact with the APIs and data-storage, the code in this report will expect certain environment variables to be set:

  • AWS_ACCESS_KEY_ID, AWS_ACCESS_KEY_SECRET: can also be set using the AWS CLI
    • if you clone this repo, you will likely need to configure your own remote storage.
  • RTE_FRANCE_BASE64: base-64 encoding, available from the RTE application page
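A small sketch of how code might check for these variables before doing any work. The helper names missing_variables and basic_auth_header are hypothetical, and using RTE_FRANCE_BASE64 as an HTTP Basic credential when requesting a token is an assumption about how the API is called, not something confirmed in this report.

```python
import os

# Variable names as listed above
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY_SECRET", "RTE_FRANCE_BASE64")


def missing_variables(environ, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


def basic_auth_header(environ):
    """Build an HTTP Basic Authorization header (assumed use of the base-64 value)."""
    return {"Authorization": f"Basic {environ['RTE_FRANCE_BASE64']}"}


missing = missing_variables(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Checking up front gives a clear message naming the missing variable, rather than a failure partway through a pipeline run.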

These allow you access to an application (that you will have to configure on your RTE France account); this application will need access to these APIs: