```python
import os
import json

import polars as pl
from pyprojroot.here import here
```
Generation: Publish
In the publish step:
- Write out data as parquet files in “standard” and “fake UTC” formats.
- Write out metadata as a JSON file.
```python
table = pl.read_parquet(here("data/02-transform/flow.parquet"))
```
JavaScript does not (yet) have a built-in time-zone database; you can use the time zone of the browser, or you can use UTC. The idea here is to use a “fake UTC”: we project the date-times from their original time zone (Europe/Paris) to UTC while preserving the wall-clock time. This helps with any date-time math in JavaScript, at the price of introducing a gap and a duplication at the daylight-saving-time transitions.
```python
table_fake_utc = table.with_columns(
    pl.col(["interval_start", "interval_end"]).map(
        lambda x: x.dt.replace_time_zone(time_zone="UTC")
    ),
)
```
We publish both the standard and fake-UTC tables:
```python
path_standard = here("data/99-publish/standard")
os.makedirs(path_standard, exist_ok=True)
table.write_parquet(f"{path_standard}/flow.parquet")
```
```python
path_fake_utc = here("data/99-publish/fake-utc")
os.makedirs(path_fake_utc, exist_ok=True)
table_fake_utc.write_parquet(f"{path_fake_utc}/flow.parquet")
```
We also calculate and write out some metadata:
- `interval_end`: latest observation for each `partner`, then aggregated using the earliest of these.
```python
interval_end = (
    table.groupby(pl.col("partner"))
    .agg(pl.col("interval_end").max())
    .get_column("interval_end")
    .min()
)
```
We publish this to a metadata file:
```python
# Avoid shadowing the built-in `dict`.
meta = {"interval_end": interval_end.isoformat()}
with open(here("data/99-publish/flow-meta.json"), "w") as file:
    json.dump(meta, file)
```
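The published file is a one-key JSON object; a consumer can recover the timestamp with `datetime.fromisoformat` (the value below is hypothetical, for illustration only):

```python
import json
from datetime import datetime

# Hypothetical metadata payload with the {"interval_end": ...} shape above.
serialized = json.dumps({"interval_end": datetime(2023, 1, 2).isoformat()})

meta = json.loads(serialized)
recovered = datetime.fromisoformat(meta["interval_end"])
# recovered == datetime(2023, 1, 2)
```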