2 Using Python
For this chapter, I will use Python via R’s reticulate package. Code blocks that do not use Python have a comment at the top naming the language, like this one, which uses R:
# r
library("reticulate")
use_virtualenv("renv/python/virtualenvs/renv-python-3.10")
from pins import board_folder, board_urls
from pyarrow import feather
import pandas as pd
2.1 Folder board
board_here = board_folder("pins")
board_here.pin_list()
['pins.txt', 'time-csv', 'time-json', 'penguins-csv', 'penguins-json', '_pins.yaml', 'time-arrow', 'penguins-arrow']
Should this be listing a text file? Both pins.txt and _pins.yaml appear here, although neither is a pin.
2.1.1 Read
penguins_csv = board_here.pin_read("penguins-csv")
penguins_csv
 species island bill_length_mm ... body_mass_g sex year
0 Adelie Torgersen 39.1 ... 3750.0 male 2007
1 Adelie Torgersen 39.5 ... 3800.0 female 2007
2 Adelie Torgersen 40.3 ... 3250.0 female 2007
3 Adelie Torgersen NaN ... NaN NaN 2007
4 Adelie Torgersen 36.7 ... 3450.0 female 2007
.. ... ... ... ... ... ... ...
339 Chinstrap Dream 55.8 ... 4000.0 male 2009
340 Chinstrap Dream 43.5 ... 3400.0 female 2009
341 Chinstrap Dream 49.6 ... 3775.0 male 2009
342 Chinstrap Dream 50.8 ... 4100.0 male 2009
343 Chinstrap Dream 50.2 ... 3775.0 female 2009
[344 rows x 8 columns]
2.1.2 Timeseries
time_csv = board_here.pin_read("time-csv")
time_csv
 date datetime value
0 2010-01-01 2020-09-01 00:00:00 0
1 2010-01-02 2020-09-01 00:00:01 1
2 2010-01-03 2020-09-01 00:00:02 2
3 2010-01-04 2020-09-01 00:00:03 3
4 2010-01-05 2020-09-01 00:00:04 4
5 2010-01-06 2020-09-01 00:00:05 5
6 2010-01-07 2020-09-01 00:00:06 6
7 2010-01-08 2020-09-01 00:00:07 7
8 2010-01-09 2020-09-01 00:00:08 8
9 2010-01-10 2020-09-01 00:00:09 9
10 2010-01-11 2020-09-01 00:00:10 10
time_csv.dtypes
date object
datetime object
value int64
dtype: object
Reading the CSV pin does not parse the dates or datetimes: both columns come back as object (strings), and the timezone is lost.
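One way to recover the types by hand is to convert the columns after reading. The following is a minimal sketch, not something pins does itself: it uses a few inline rows in place of the pin, and it assumes the datetimes are wall-clock times in America/Denver, which is the zone reported by the Arrow version of this pin.

```python
import io

import pandas as pd

# Inline stand-in for the first rows of the time-csv pin.
csv_text = """date,datetime,value
2010-01-01,2020-09-01 00:00:00,0
2010-01-02,2020-09-01 00:00:01,1
2010-01-03,2020-09-01 00:00:02,2
"""

# read_csv leaves both columns as strings unless told otherwise.
raw = pd.read_csv(io.StringIO(csv_text))

parsed = raw.assign(
    # Keep plain calendar dates, matching what the Arrow pin returns.
    date=pd.to_datetime(raw["date"]).dt.date,
    # Assumption: the naive datetimes are local times in America/Denver.
    datetime=pd.to_datetime(raw["datetime"]).dt.tz_localize("America/Denver"),
)

print(parsed.dtypes)
```

The timezone has to be supplied from outside knowledge, because the CSV text itself does not carry it.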
There’s no driver yet for JSON, so let’s try Arrow:
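# pin_read() does not return a data frame for this pin, so its first
# element is passed to read_feather();
# this seems hacky, but I'm sure it will get sorted out
time_arrow = pd.read_feather(board_here.pin_read("time-arrow")[0])
time_arrow
 date datetime value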
0 2010-01-01 2020-09-01 00:00:00-06:00 0
1 2010-01-02 2020-09-01 00:00:01-06:00 1
2 2010-01-03 2020-09-01 00:00:02-06:00 2
3 2010-01-04 2020-09-01 00:00:03-06:00 3
4 2010-01-05 2020-09-01 00:00:04-06:00 4
5 2010-01-06 2020-09-01 00:00:05-06:00 5
6 2010-01-07 2020-09-01 00:00:06-06:00 6
7 2010-01-08 2020-09-01 00:00:07-06:00 7
8 2010-01-09 2020-09-01 00:00:08-06:00 8
9 2010-01-10 2020-09-01 00:00:09-06:00 9
10 2010-01-11 2020-09-01 00:00:10-06:00 10
time_arrow.dtypes
date object
datetime datetime64[ns, America/Denver]
value int32
dtype: object
time_arrow['date'].values
array([datetime.date(2010, 1, 1), datetime.date(2010, 1, 2),
datetime.date(2010, 1, 3), datetime.date(2010, 1, 4),
datetime.date(2010, 1, 5), datetime.date(2010, 1, 6),
datetime.date(2010, 1, 7), datetime.date(2010, 1, 8),
datetime.date(2010, 1, 9), datetime.date(2010, 1, 10),
datetime.date(2010, 1, 11)], dtype=object)
2.2 Remote board
I don’t think I will create a Python package; maybe I can just put together a quick script to convert a pins manifest into a dictionary that maps each pin name to the path of its most recent version.
import requests
import yaml
# read file, parse into manifest
url_root = "https://ijlyttle.github.io/pins-three-ways/pins"
req = requests.get(f"{url_root}/_pins.yaml")
manifest = yaml.safe_load(req.text)
# use the most-recent version
pin_paths = {}
for key, value in manifest.items():
pin_paths[key] = max(value)
pin_paths
{'penguins-arrow': 'penguins-arrow/20220811T170224Z-ef034/', 'penguins-csv': 'penguins-csv/20220811T170157Z-809e9/', 'penguins-json': 'penguins-json/20220811T170152Z-fa33e/', 'time-arrow': 'time-arrow/20220811T224202Z-b1900/', 'time-csv': 'time-csv/20220811T224202Z-06d53/', 'time-json': 'time-json/20220811T224202Z-70d59/'}
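Taking max() of each entry works because every version path begins with the pin name followed by a UTC timestamp in YYYYMMDDTHHMMSSZ form, so comparing the strings alphabetically also compares them chronologically. Here is a minimal sketch of the same selection on a small in-memory manifest; the helper name latest_versions and the older time-csv version (with the hash aaaaa) are made up for illustration.

```python
def latest_versions(manifest):
    """Map each pin name to the path of its most recent version."""
    # Version paths sort chronologically because the timestamp has fixed
    # width and runs from the largest unit (year) to the smallest (second).
    return {name: max(versions) for name, versions in manifest.items()}


# Hypothetical manifest: the first time-csv entry is invented for this example.
example_manifest = {
    "time-csv": [
        "time-csv/20220811T170201Z-aaaaa/",
        "time-csv/20220811T224202Z-06d53/",
    ],
    "penguins-csv": ["penguins-csv/20220811T170157Z-809e9/"],
}

latest = latest_versions(example_manifest)
print(latest)
```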
The Python version has a board constructor board_urls():
board_remote = board_urls(url_root, pin_paths)
board_remote.pin_list()
['penguins-arrow', 'penguins-csv', 'penguins-json', 'time-arrow', 'time-csv', 'time-json']
2.2.1 Read
We can read the CSV pin:
board_remote.pin_read("penguins-csv")
 species island bill_length_mm ... body_mass_g sex year
0 Adelie Torgersen 39.1 ... 3750.0 male 2007
1 Adelie Torgersen 39.5 ... 3800.0 female 2007
2 Adelie Torgersen 40.3 ... 3250.0 female 2007
3 Adelie Torgersen NaN ... NaN NaN 2007
4 Adelie Torgersen 36.7 ... 3450.0 female 2007
.. ... ... ... ... ... ... ...
339 Chinstrap Dream 55.8 ... 4000.0 male 2009
340 Chinstrap Dream 43.5 ... 3400.0 female 2009
341 Chinstrap Dream 49.6 ... 3775.0 male 2009
342 Chinstrap Dream 50.8 ... 4100.0 male 2009
343 Chinstrap Dream 50.2 ... 3775.0 female 2009
[344 rows x 8 columns]
penguins_arrow = pd.read_feather(board_here.pin_read("penguins-arrow")[0])
penguins_arrow
 species island bill_length_mm ... body_mass_g sex year
0 Adelie Torgersen 39.1 ... 3750.0 male 2007
1 Adelie Torgersen 39.5 ... 3800.0 female 2007
2 Adelie Torgersen 40.3 ... 3250.0 female 2007
3 Adelie Torgersen NaN ... NaN NaN 2007
4 Adelie Torgersen 36.7 ... 3450.0 female 2007
.. ... ... ... ... ... ... ...
339 Chinstrap Dream 55.8 ... 4000.0 male 2009
340 Chinstrap Dream 43.5 ... 3400.0 female 2009
341 Chinstrap Dream 49.6 ... 3775.0 male 2009
342 Chinstrap Dream 50.8 ... 4100.0 male 2009
343 Chinstrap Dream 50.2 ... 3775.0 female 2009
[344 rows x 8 columns]
penguins_arrow.dtypes
species category
island category
bill_length_mm float64
bill_depth_mm float64
flipper_length_mm float64
body_mass_g float64
sex category
year int32
dtype: object
penguins_arrow['species'].values
['Adelie', 'Adelie', 'Adelie', 'Adelie', 'Adelie', ..., 'Chinstrap', 'Chinstrap', 'Chinstrap', 'Chinstrap', 'Chinstrap']
Length: 344
Categories (3, object): ['Adelie', 'Chinstrap', 'Gentoo']