About Quarto

Quarto is a technical publishing package; it offers a lot more features than are used in this demonstration. Here, it is used to build a website that includes:

It is an implementation of literate programming, where code and prose are bound together.

Note

Quarto websites can be customized for organizations. For example if you are a colleague of Ian’s, he has an unofficial template that aims to comport with the company style.

Execution strategy

To render an entire Quarto website, you would make a global render:

quarto render

Quarto’s default rendering strategy is to render all the files it can find within the working directory, generally in alphabetical order. For a data pipeline, we want to render only those files that need updating, and to render in the optimal order. This is where DVC comes in; it uses the dependency graph to determine the execution order.

As a result, we make some adaptations to Quarto:

  • we use freeze: true as the default in _quarto.yml. As a result a global quarto render will not re-render these files.
  • for files not a part of the data pipeline, we use freeze: auto. These are re-rendered if the file has changed.
  • within the DVC pipeline, each stage has a render <some-file>.qmd command. This will render the file regardless.

This was discussed in theDVC writeup, a full rendering has two steps:

  • dvc repro: run the pipeline according to dependencies

  • quarto render: render the other files, compile the website

Jupyter notebooks

There is an equivalence between .qmd files and .ipynb files. Quarto offers a conversion utility; both files have markdown and code blocks.

That said, there are a some important differences:

  • .ipynb files are much more familiar to the general Python community than .qmd files (though, the visual editor using the Quarto extension for VS Code is pretty nice)
  • A .ipynb file renders into itself, then into an HTML file, whereas a .qmd file renders into an HTML file without modifying the .qmd file. For a DVC pipeline, this distinction is important, because a stage cannot depend on an .ipynb file - rendering it changes the file, inducing a circular dependency.

That .ipynb files contain both the source and result is why this project uses .qmd files, rather than more-familiar .ipynb files.

There is hope, however, for using .ipynb files in DVC. There is an issue at the DVC repo asking about the possibility to filter a source file to determine the dependency. In our case, we would want to filter to keep all the code cells in a notebook, possibly using nbconvert. Unfortunately, this issue has not been updated since July 2020.