import polars as pl
from pyprojroot.here import here
France Grid
The goal of this project is to execute and document a data pipeline for aspects of the French electrical grid. These are data made available publicly via APIs from RTE France.
I understand RTE France’s terms and conditions allow for republication, so long as the data are credited to RTE France, and are not distorted.
When finished, this site will publish tables as parquet files:
- generation sources, e.g. wind, solar
- exchanges with other countries, e.g. England, Belgium
- outdoor temperature in Paris (I know there’s more to France than Paris), not yet implemented
This will also be an opportunity for me to develop my skills with:
- Polars: alternative to Pandas
- DVC: pipeline orchestration and remote data management
- Quarto: technical publishing (this website)
- GitHub Actions: run the pipeline on a schedule
I am also developing an Observable notebook to be a consumer of this pipeline.
There are two sections you can access from the menu bar: the pipeline section contains the files in the pipeline; the about section has a little more material on how this pipeline was put together.
Data
In this section, we summarize the published data, as of the last run of the pipeline.
Each API call to fetch data from RTE France contains, at most, two weeks of data. The pipeline runs on a daily schedule, so it will take a number of days before the pipeline “catches up” to the present day.
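As a rough sketch of what that implies (not the pipeline's actual code; the function name and dates are made up for the example), stepping through an archive under a two-week-per-call limit might look like this:

```python
from datetime import date, timedelta

# maximum span of a single RTE France API call, per the note above
WINDOW = timedelta(days=14)

def windows(start: date, end: date):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + WINDOW, end)
        yield lo, hi
        lo = hi

# for example, the windows needed to cover the first two months of 2017
list(windows(date(2017, 1, 1), date(2017, 3, 1)))
```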
Generation
generation = pl.read_parquet(here("data/99-publish/standard/generation.parquet"))
Two parquet files are published, each with the same information: a standard version and a fake-UTC version.
Because of JavaScript’s current timezone limitations (soon to be solved), I am writing a version of the data where the date-times are projected into UTC, preserving the wall-clock time; these are the fake-UTC data.
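As a sketch of that projection (assuming Polars’ `dt.replace_time_zone`, which re-labels the time zone while keeping the wall-clock time), the fake-UTC table could be derived from the standard one like this:

```python
# re-label the Europe/Paris date-times as UTC, preserving the wall-clock time
fake_utc = generation.with_columns(
    pl.col("interval_start").dt.replace_time_zone("UTC"),
    pl.col("interval_end").dt.replace_time_zone("UTC"),
)
```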
The last few observations:
generation.tail()
type | interval_start | interval_end | generation |
---|---|---|---|
str | datetime[ms, Europe/Paris] | datetime[ms, Europe/Paris] | i64 |
"HYDRO" | 2023-08-09 23:45:00 CEST | 2023-08-10 00:00:00 CEST | 1314 |
"NUCLEAR" | 2023-08-09 23:45:00 CEST | 2023-08-10 00:00:00 CEST | 31675 |
"PUMPING" | 2023-08-09 23:45:00 CEST | 2023-08-10 00:00:00 CEST | -1 |
"SOLAR" | 2023-08-09 23:45:00 CEST | 2023-08-10 00:00:00 CEST | 208 |
"WIND" | 2023-08-09 23:45:00 CEST | 2023-08-10 00:00:00 CEST | 2348 |
`type`
: type of generation

`interval_start`, `interval_end`
: date-times describing the interval

`generation`
: average (?) of generation during this interval (MW)
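If `generation` really is an average power over each 15-minute interval (an assumption, given the question mark above), a daily-energy estimate per source could be sketched as:

```python
# sketch: 15-minute average power (MW) × 0.25 h ≈ energy (MWh), summed per day
daily_energy = (
    generation.with_columns(
        (pl.col("generation") * 0.25).alias("energy_mwh"),
        pl.col("interval_start").dt.date().alias("date"),
    )
    .groupby(["date", "type"])
    .agg(pl.col("energy_mwh").sum())
)
```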
We count the number of observations and null values for the generation data (the counts will be the same for both files):
"type")).agg(
generation.groupby(pl.col("interval_start").min(),
pl.col("interval_end").max(),
pl.col("generation").count().alias("n_observations"),
pl.col("generation").null_count().alias("n_value_null"),
pl.col( )
type | interval_start | interval_end | n_observations | n_value_null |
---|---|---|---|---|
str | datetime[ms, Europe/Paris] | datetime[ms, Europe/Paris] | u32 | u32 |
"NUCLEAR" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"EXCHANGE" | 2017-01-01 00:00:00 CET | 2023-07-27 18:30:00 CEST | 230254 | 0 |
"FOSSIL_GAS" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"FOSSIL_OIL" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"PUMPING" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"FOSSIL_HARD_CO… | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"SOLAR" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"WIND" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
"HYDRO" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231523 | 0 |
"BIOENERGY" | 2017-01-01 00:00:00 CET | 2023-08-10 00:00:00 CEST | 231524 | 0 |
I don’t know, right now, why "HYDRO" has one fewer observation than the others; I’ll try to find out!
"interval_start")).agg(
generation.groupby(pl.col("generation").count().alias("n_observations")
pl.col("n_observations")).head(2) ).sort(pl.col(
interval_start | n_observations |
---|---|
datetime[ms, Europe/Paris] | u32 |
2023-07-29 14:30:00 CEST | 9 |
2023-07-28 07:00:00 CEST | 9 |
I don’t think this was at the end of an API call…
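One way I might track these gaps down is an anti-join between the full grid of (interval, type) pairs and the observations actually present; a sketch (not part of the pipeline):

```python
# sketch: list the (interval_start, type) pairs that are expected but absent
expected = (
    generation.select("interval_start").unique()
    .join(generation.select("type").unique(), how="cross")
)
expected.join(generation, on=["interval_start", "type"], how="anti")
```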
Flow
flow = pl.read_parquet(here("data/99-publish/standard/flow.parquet"))
Two parquet files are published (a standard version and a fake-UTC version), each with the same information.
The last few observations:
flow.tail()
partner | interval_start | interval_end | flow_net |
---|---|---|---|
str | datetime[ms, Europe/Paris] | datetime[ms, Europe/Paris] | i64 |
"Belgium" | 2017-07-10 23:00:00 CEST | 2017-07-11 00:00:00 CEST | 1154 |
"England-IFA" | 2017-07-10 23:00:00 CEST | 2017-07-11 00:00:00 CEST | -2023 |
"Germany" | 2017-07-10 23:00:00 CEST | 2017-07-11 00:00:00 CEST | -662 |
"Italy" | 2017-07-10 23:00:00 CEST | 2017-07-11 00:00:00 CEST | -1150 |
"Switzerland" | 2017-07-10 23:00:00 CEST | 2017-07-11 00:00:00 CEST | 527 |
`partner`
: interchange, usually a country

`interval_start`, `interval_end`
: date-times describing the interval

`flow_net`
: average (?) of power flow during this interval (MW); positive means France received power
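Given that sign convention, a quick, informal summary of the average net flow with each partner (not part of the published pipeline) could be:

```python
# average net flow per partner (MW); positive means France received power
flow.groupby("partner").agg(
    pl.col("flow_net").mean().alias("mean_flow_net_mw")
).sort("mean_flow_net_mw")
```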
We count the number of observations and null values for the flow data (the counts will be the same for both files):
"partner")).agg(
flow.groupby(pl.col("interval_start").min(),
pl.col("interval_end").max(),
pl.col("flow_net").count().alias("n_observations"),
pl.col("flow_net").null_count().alias("n_value_null"),
pl.col( )
partner | interval_start | interval_end | n_observations | n_value_null |
---|---|---|---|---|
str | datetime[ms, Europe/Paris] | datetime[ms, Europe/Paris] | u32 | u32 |
"Switzerland" | 2017-01-01 00:00:00 CET | 2017-07-11 00:00:00 CEST | 4574 | 0 |
"Belgium" | 2017-01-01 00:00:00 CET | 2017-07-11 00:00:00 CEST | 4574 | 0 |
"Germany" | 2017-01-01 00:00:00 CET | 2017-07-11 00:00:00 CEST | 4574 | 0 |
"Italy" | 2017-01-01 00:00:00 CET | 2017-07-11 00:00:00 CEST | 4574 | 0 |
"Spain" | 2017-01-01 00:00:00 CET | 2017-06-27 16:00:00 CEST | 4254 | 0 |
"England-IFA" | 2017-01-01 00:00:00 CET | 2017-07-11 00:00:00 CEST | 4574 | 0 |
Secrets
To interact with the APIs and data storage, the code in this report expects certain environment variables to be set (a sketch of how they might be read follows this list):
`AWS_ACCESS_KEY_ID`, `AWS_ACCESS_KEY_SECRET`
: can also be set using the `aws` CLI; if you clone this repo, you will likely need to configure your own remote storage.

`RTE_FRANCE_BASE64`
: base-64 encoding, available from the RTE application page
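A minimal sketch (not the repository’s actual code) of how these variables might be read at runtime; treating `RTE_FRANCE_BASE64` as the credential string for a Basic authorization header is an assumption on my part:

```python
import os

# credentials for the remote data storage
aws_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_ACCESS_KEY_SECRET"]

# base-64 credential string copied from the RTE application page
rte_auth = os.environ["RTE_FRANCE_BASE64"]

# assumption: the RTE France token request uses this string in a Basic header
headers = {"Authorization": f"Basic {rte_auth}"}
```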
These allow you access to an application (that you will have to configure on your RTE France account); this application will need access to these APIs: