## Publish

```python
import json
import os

import polars as pl
from pyprojroot.here import here
```
In the publish step:
- Write out data as parquet files in “standard” and “fake UTC” formats.
- Write out metadata as a JSON file.
First, we read the transformed data:

```python
table = pl.read_parquet(here("data/02-transform/flow.parquet"))
```

JavaScript does not (yet) have a timezone database available; you can use the timezone of the browser, or you can use UTC. The idea here is to use a “fake UTC”: project the date-times from their original timezone (Europe/Paris) to UTC, preserving the wall-clock time. This will help us with any date-time math in JavaScript, at the price of introducing a gap and a duplication at the daylight-saving time transitions.
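To see what that duplication looks like, here is a minimal sketch; it is not part of the pipeline, and it assumes a Polars version recent enough to provide `pl.datetime_range` and named-expression syntax:

```python
from datetime import datetime

import polars as pl

# Half-hourly instants spanning the autumn DST transition in Europe/Paris
# (2023-10-29: at 03:00 local time the clocks go back to 02:00).
frame = pl.DataFrame(
    {
        "instant_utc": pl.datetime_range(
            datetime(2023, 10, 29, 0, 0),
            datetime(2023, 10, 29, 2, 0),
            interval="30m",
            time_zone="UTC",
            eager=True,
        )
    }
).with_columns(
    pl.col("instant_utc").dt.convert_time_zone("Europe/Paris").alias("paris"),
)

# "Fake UTC": keep the Paris wall-clock time, relabel the timezone as UTC.
frame = frame.with_columns(
    pl.col("paris").dt.replace_time_zone(time_zone="UTC").alias("fake_utc"),
)
print(frame)
# The wall-clock values 02:00 and 02:30 each appear twice in `fake_utc`,
# even though the underlying instants are an hour apart.
```

With that picture in mind, here is the conversion of the flow table: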
```python
table_fake_utc = table.with_columns(
    pl.col(["interval_start", "interval_end"]).map(
        lambda x: x.dt.replace_time_zone(time_zone="UTC")
    ),
)
```

We publish both the standard and fake-UTC tables:
```python
path_standard = here("data/99-publish/standard")
os.makedirs(path_standard, exist_ok=True)
table.write_parquet(f"{path_standard}/flow.parquet")
```

```python
path_fake_utc = here("data/99-publish/fake-utc")
os.makedirs(path_fake_utc, exist_ok=True)
table_fake_utc.write_parquet(f"{path_fake_utc}/flow.parquet")
```

We also calculate and write out some metadata:
- interval_end: the latest observation for each partner, then aggregated using the earliest of these (see the sketch below).
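To make the aggregation concrete, here is a toy sketch with made-up partners and times (it uses the same groupby spelling as the pipeline code; newer Polars renames it group_by). Taking the maximum per partner and then the minimum across partners yields the latest time for which every partner has reported:

```python
from datetime import datetime

import polars as pl

# Hypothetical data: partner "A" has reported up to 06:00, partner "B" only up to 05:00.
toy = pl.DataFrame(
    {
        "partner": ["A", "A", "B"],
        "interval_end": [
            datetime(2024, 1, 1, 5, 0),
            datetime(2024, 1, 1, 6, 0),
            datetime(2024, 1, 1, 5, 0),
        ],
    }
)

latest_common = (
    toy.groupby(pl.col("partner"))
    .agg(pl.col("interval_end").max())
    .get_column("interval_end")
    .min()
)
print(latest_common)  # 2024-01-01 05:00:00, the latest time covered by all partners
```

The pipeline applies the same pattern to the flow table: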
```python
interval_end = (
    table.groupby(pl.col("partner"))
    .agg(pl.col("interval_end").max())
    .get_column("interval_end")
    .min()
)
```

We publish this to a metadata file:
```python
# Use a name other than the builtin `dict` for the metadata mapping.
meta = {"interval_end": interval_end.isoformat()}

with open(here("data/99-publish/flow-meta.json"), "w") as file:
    json.dump(meta, file)
```
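As an optional sanity check, not part of the pipeline itself, the metadata file can be read back and the timestamp parsed:

```python
import json
from datetime import datetime

from pyprojroot.here import here

with open(here("data/99-publish/flow-meta.json")) as file:
    meta = json.load(file)

# interval_end was serialized with isoformat(), so fromisoformat() round-trips it.
print(datetime.fromisoformat(meta["interval_end"]))
```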