About Quarto
Quarto is a technical publishing package; it offers a lot more features than are used in this demonstration. Here, it is used to build a website that includes:
- the data pipeline
- the files published from the pipeline
- some discussion about the process (you are here)
It is an implementation of literate programming, where code and prose are bound together.
Quarto websites can be customized for organizations. For example if you are a colleague of Ian’s, he has an unofficial template that aims to comport with the company style.
Execution strategy
To render an entire Quarto website, you would make a global render:
quarto render
Quarto’s default rendering strategy is to render all the files it can find within the working directory, generally in alphabetical order. For a data pipeline, we want to render only those files that need updating, and to render in the optimal order. This is where DVC comes in; it uses the dependency graph to determine the execution order.
As a result, we make some adaptations to Quarto:
- we use
freeze: true
as the default in_quarto.yml
. As a result a globalquarto render
will not re-render these files. - for files not a part of the data pipeline, we use
freeze: auto
. These are re-rendered if the file has changed. - within the DVC pipeline, each stage has a
render <some-file>.qmd
command. This will render the file regardless.
This was discussed in theDVC writeup, a full rendering has two steps:
dvc repro
: run the pipeline according to dependenciesquarto render
: render the other files, compile the website
Jupyter notebooks
There is an equivalence between .qmd
files and .ipynb
files. Quarto offers a conversion utility; both files have markdown and code blocks.
That said, there are a some important differences:
.ipynb
files are much more familiar to the general Python community than.qmd
files (though, the visual editor using the Quarto extension for VS Code is pretty nice)A .ipynb
file renders into itself, then into an HTML file, whereas a.qmd
file renders into an HTML file without modifying the.qmd
file. For a DVC pipeline, this distinction is important, because a stage cannot depend on an.ipynb
file - rendering it changes the file, inducing a circular dependency.
That .ipynb
files contain both the source and result is why this project uses .qmd
files, rather than more-familiar .ipynb
files.
There is hope, however, for using .ipynb
files in DVC. There is an issue at the DVC repo asking about the possibility to filter a source file to determine the dependency. In our case, we would want to filter to keep all the code cells in a notebook, possibly using nbconvert
. Unfortunately, this issue has not been updated since July 2020.