= html.div(...) app.layout
2 Dash
Although it is a gross oversimplification, at first glance Dash seems like Python’s answer to Shiny.
Accordingly, the goals of this chapter are:
- to show the ways the Dash is like Shiny.
- to introduce the ways Dash is different from Shiny.
- to help you form strategies for how to “think” about Dash.
To provide some context, we will use this demonstration app, and examine its code.
2.1 Principles
Both Shiny and Dash use the idea of a reactive graph, which indicates what things depend on what other things:
In Shiny, the reactive graph (what depends on what) is inferred using the code in reactive expressions.
In Dash, it is explicit, which is a mixed blessing.
For Dash, this explicitness provides the flexibility to do a lot of things, but the price is that you have to specify:
- all the DOM components (UI elements) in the layout.
- each connection between components is governed by a callback function that you provide.
This is substantially similar to Shiny: part of the Dash app is used to define the UI; the rest defines what happens on the server.
However, there are a couple of big considerations:
- the state of the app cannot be stored in the server application.
- it is easiest to move you data around using JSON and base64-encoded data.
This is different from Shiny, which stores the state of the application in the server function. Further, Shiny manages the serialization/de-serialization of data to/from the UI; with Dash, you have to manage that yourself.
These are not insurmountable obstacles, as well see in the rest of this section. For example, one place to store the state is the user’s browser, in the web-page (DOM) itself.
2.1.1 Everything that exists is a component
These first two subsections are an homage to the famous John Chambers quote:
To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.
Similarly, everything that exists on Dash app’s web page is a component.
As we’ll see in the demo, a Dash app contains a layout that you specify:
You need to fill in the ...
. A component might be a straightforward HTML element, or it might be a Dash component, where you define the attributes.
The html
object (imported from the dash
package) behaves very similarly to R’s htmltools package; they are both based on the HTML5 spec.
We’ll see more in the demo, but here’s an example of a Dash component:
id='cols-group', multi=True) dcc.Dropdown(
This is a dropdown component; we define the id
and multi
properties at definition. In this case, we don’t define the options
or value
properties. We’ll update the options
dynamically, and let the user set the value
.
Like other components, dropdowns have a number of properties; we can set them either at initialization, as we did here, or we can set them using a callback.
2.1.2 Everything that happens is a callback
If you want something to happen in a Dash app, it has to happen in a callback function. Dash lets you write callbacks using Python. It also lets you write callbacks in JavaScript, but that gets beyond the scope of this book.
We’ll see this in more detail in the demo app, but a callback is a standard Python function with a decorator:
@app.callback(Output('cols-group', 'options'),
'inp', 'data'))
Input(def update_cols_group(data_records):
return cols_choice(data_records, 'object')
The decorator, @app.callback(...)
tells Dash which layout components to map to the function’s inputs and outputs. When an Input()
changes:
- the browser calls the Dash server to run the callback function.
- the Dash server runs the Python function.
- the Dash server sends the
Output()
to the browser. - the browser updates the DOM.
2.1.3 Server cannot store state
Managing state is a pain. However, by remaining stateless, Dash is able to easily scale to as many server instances it needs because it does not matter which instance of a callback-function responds to which browser (user) making the call.
Coming from Shiny, this might seem like a show-stopper; we are used to manipulating, then storing data using the server side of an app. But there are ways around this. It’s not that you can’t store the state - you just can’t store it “here”. Your options are:
- store data in the DOM, then send it when needed.
- store data in an external database, or the like.
We’ll use the first option here. Here’s one of the components in our layout:
id='inp', data=penguins.to_dict('records')) dcc.Store(
Note that this component is initialized using the penguins
data, but that we are using pandas’ to_dict()
method. This is because the component will receive the data using JSON; it is stored in the DOM as a JavaScript object.
2.1.4 Use JSON or base64
The final thing to keep in mind is that when we communicate data between the browser DOM and the callback functions, it does not use native Python objects. Instead, from the Python callback-functions’ perspective, data is serialized to JSON when sent to the DOM, and deserialized from JSON when received from the DOM.
For Python dictionaries and lists containing numbers and strings, the serialization process is implied.
There are (at least) a couple of conventions for serializing a data frame to JSON: by row or by column.
Coming from R, you may think of a data frame as a list of vectors, each of the same length. This is column-based, for example:
{
"species": ["Adelie", "Adelie"],
"bill_length_mm": [39.1, 18.7]
}
Alternatively, the row-based approach:
[
{"species": "Adelie", "bill_length_mm": 39.1},
{"species": "Adelie", "bill_length_mm": 18.7}
]
The row-based approach seems to be the convention in Dash; this is the approach used by D3. Here, we think of a data frame as a collection of records. To serialize from Pandas, we’ll use to_dict('records')
; to deserialize (import) into Pandas, we’ll use from_dict()
.
In the context of the Python code, I’ll refer to data formats as either:
- data-frame format, i.e. Pandas data frame
- records format, i.e. JSON collection of records
The other option is to use base64 encoding; I have seen this used for uploading/downloading text files, e.g. CSV or JSON files.
2.2 Demonstration app
I’ve tried to outline, briefly, some of the principles used to build a Dash app; I think they will make more sense in the context of the Dash demo-app.
2.2.1 Description
Here is the reactive graph for the demonstration app.
It’s a little busier than the Shiny app; Dash forces the developer to be explicit, Shiny makes some things implicit.
A few things to note about this diagram:
Each rectangle is a layout component; each circle is a callback function.
Components have properties, which are associated with arguments to callback functions.
When an argument to a callback function changes, the function is (re-)run.
The output(s) of callback functions are used to update the properties of components.
Functions are run, properties are updated, and so on, until the app “comes to rest”.
There are a few more things formalized in the diagram, but we’ll get to them as we discuss the demo app:
2.2.2 Prelims
Like we did with Shiny, in the rest of this chapter, we’ll highlight the code used to make app, and the design choices behind the code. In the repository, there are a some files to pay attention to:
app-aggregate-local.py
helpers/
__init__.py
aggregate.py
cols.py
Here’s the app file, app-aggregate-local.py
, with the components and callbacks removed:
# Run this app with `python app-aggregate-local.py` and
# visit http://127.0.0.1:8050/ in your web browser.
import dash
from dash.dependencies import Input, Output, State
from dash import dash_table
from dash import html
from dash import dcc
import dash_bootstrap_components as dbc
import pandas as pd
from palmerpenguins import load_penguins
from helpers.aggregate import aggregate_df, agg_function_choices
from helpers.cols import cols_choice, cols_header
= load_penguins()
penguins
= dash.Dash(
app __name__,
=[dbc.themes.BOOTSTRAP]
external_stylesheets
)
# make `server` available to Heroku
= app.server
server
# <component layout>
# <callback functions>
if __name__ == '__main__':
=True) app.run_server(debug
The first part of the file handles the imports. You will need to have installed (hopefully in your virtual environment): dash
, dash_bootstrap_components
, pandas
, and palmerpenguins
. For example:
> pip install dash
The remaining bits of the file set up our app:
Load the
penguins
data.Create the
app
. This is the standard way to create a Dash app; note that we are using the Bootstrap styling.We make the
app.server
available at the top level asserver
. This goes a little beyond the scope of this chapter, but we are doing this as a part of deploying the app on Heroku.Components and callbacks, we’ll look at these in detail in following sections.
A standard bit of code that tells Python to
app.runServer()
when we execute this file.
We also have a helpers
directory with some files. The __init__.py
file is empty; its purpose is to signal to Python that there are .py
files in this directory with functions that can be imported.
2.2.3 Helper functions
Just like in R, I like to write as much “vanilla” Python code as I can. I want to demonstrate to myself, as much as I can, that the “guts” of the app behaves as I expect. That way, when I’m building the components and the callbacks, I can narrow down the things that might be going wrong.
# using only to demonstrate code; not used in Python Dash.
library("reticulate")
use_virtualenv("./venv")
I’m reproducing this code in an Quarto document using the R engine, so I’m using the reticulate package to run Python. You will not need to do this to create a Dash app; you can work in a purely Python environment.
In fact, if you are newish to Python, you might find the “Field Guide” appendix useful to get things set up. For Python programming, there is a great introduction at reticulate.
Our first step is to get a look at the penguins
dataset, to verify it’s the same as we use in R.
# importing into RMarkdown environment for exposition
import pandas as pd
from palmerpenguins import load_penguins
import pprint
= load_penguins()
penguins print(penguins)
species island bill_length_mm ... body_mass_g sex year
0 Adelie Torgersen 39.1 ... 3750.0 male 2007
1 Adelie Torgersen 39.5 ... 3800.0 female 2007
2 Adelie Torgersen 40.3 ... 3250.0 female 2007
3 Adelie Torgersen NaN ... NaN NaN 2007
4 Adelie Torgersen 36.7 ... 3450.0 female 2007
.. ... ... ... ... ... ... ...
339 Chinstrap Dream 55.8 ... 4000.0 male 2009
340 Chinstrap Dream 43.5 ... 3400.0 female 2009
341 Chinstrap Dream 49.6 ... 3775.0 male 2009
342 Chinstrap Dream 50.8 ... 4100.0 male 2009
343 Chinstrap Dream 50.2 ... 3775.0 female 2009
[344 rows x 8 columns]
Because we will tell Dash to move data back and forth using a dictionary, we’ll make a variable that stores penguins
as records:
= penguins.to_dict('records')
penguins_records 0:2]) # just to get a flavor pprint.pp(penguins_records[
[{'species': 'Adelie',
'island': 'Torgersen',
'bill_length_mm': 39.1,
'bill_depth_mm': 18.7,
'flipper_length_mm': 181.0,
'body_mass_g': 3750.0,
'sex': 'male',
'year': 2007},
{'species': 'Adelie',
'island': 'Torgersen',
'bill_length_mm': 39.5,
'bill_depth_mm': 17.4,
'flipper_length_mm': 186.0,
'body_mass_g': 3800.0,
'sex': 'female',
'year': 2007}]
The first function we’ll verify is used to tell us, for a given a data frame and a given Pandas type, which columns are of that type. (In the app, we’ll convert to records-format in the callback function.)
def cols_choice (df, include):
return df.select_dtypes(include=include).columns.to_list()
Let’s test the function. In Pandas, strings have type 'object'
:
'object') cols_choice(penguins,
['species', 'island', 'sex']
Numbers have type 'number'
:
'number') cols_choice(penguins,
['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g', 'year']
So far, so good.
Next, we need a function to generate the (display) table properties. We need to return a list where each element is dictionary that describes a column in the data frame. As we’ll see later, this is the format we need to specify the table headers.
Note that we are not using the data-frame format in this function, so we can keep everything in records format. We use a trick by looking at only the first (well, zeroth) record.
def cols_header (data_records):
if (len(data_records) == 0):
return []
return [{'name': v, 'id': v} for v in data_records[0].keys()]
There’s all sorts of ways to do this in Python. I’m a sucker for functional programming, so here’s how to use a map()
function. This returns a map object, so we need to change it to a list.
list(
map(lambda v: {'name': v, 'id': v}, data_records[0].keys())
)
This works largely like R’s purrr::map()
, but in Python, we provide the function first, then the thing we are iterating over. Here’s a purrr equivalent:
::map(names(data_records[1]), ~list(name = .x, id = .x)) purrr
Anyhow, let’s try out our Python function:
cols_header(penguins_records)
[{'name': 'species', 'id': 'species'}, {'name': 'island', 'id': 'island'}, {'name': 'bill_length_mm', 'id': 'bill_length_mm'}, {'name': 'bill_depth_mm', 'id': 'bill_depth_mm'}, {'name': 'flipper_length_mm', 'id': 'flipper_length_mm'}, {'name': 'body_mass_g', 'id': 'body_mass_g'}, {'name': 'sex', 'id': 'sex'}, {'name': 'year', 'id': 'year'}]
This seems OK.
Finally, we need an aggregation function. We send a data frame, then some lists of column-names, and a string naming an aggregation-function. This is an exercise in Pandas - I think this works, but my lack of experience in Pandas suggests there may be a “better” way. We’ll write this as in terms of data frames; we’ll convert to/from records-format in the callback function.
= ['mean', 'min', 'max']
agg_function_choices
def aggregate_df (df, cols_group, cols_agg, func_agg,
= agg_function_choices):
str_fn_choices
if not func_agg in str_fn_choices:
raise AssertionError(f"{func_agg} not a legal function-choice")
if (cols_group != None):
= df.groupby(cols_group)
df
if (cols_agg == None or len(cols_agg) == 0):
return []
# dictionary, keys: column-names, values: function-name
= {i: func_agg for i in cols_agg}
dict_agg
= df.agg(dict_agg).reset_index()
df
return df
aggregate_df(
penguins, = "island",
cols_group = ["bill_length_mm", "bill_depth_mm"],
cols_agg = "mean"
func_agg )
island bill_length_mm bill_depth_mm
0 Biscoe 45.257485 15.874850
1 Dream 44.167742 18.344355
2 Torgersen 38.950980 18.429412
This may be interesting only to me, but there is a more “functional” way to write the statement:
= {i: func_agg for i in cols_agg} dict_agg
Using a reducer:
from functools import reduce
= reduce(lambda acc, val: dict(acc, **{val: func_agg}), cols_agg, {}) dict_agg
In this situation, it’s more verbose and, I think, less clear what is going on unless you are into functional programming. Please pardon this diversion.
Confident that the “guts” of the app works, we move onto the components and the callbacks.
2.2.4 Compmonent layout
Let’s start with the formatting. In the app, we need to create an app.layout
, which will be the HTML used to render our app in the browser. Just like with Shiny, we’re using Bootstrap, but we have to be a little more explicit. In Shiny, Bootstrap is “baked in”; in Dash, it’s an add-on.
Here’s the “formatting” bits of the layout; the ...
represent Dash components that we’ll discuss presently:
= html.Div(
app.layout ='container-fluid',
className=[
children
...'Aggregator'),
html.H2(
html.Div(='row',
className=[
children
html.Div(='col-sm-4',
className=[
children
dbc.Card(['Aggregation'),
dbc.CardHeader(
dbc.CardBody([
...
])
])
]
),
html.Div(='col-sm-8',
className=[
children'Input data'),
html.H3(
...
html.Hr(),'Aggregated data'),
html.H3(
...
]
)
]
)
] )
2.2.4.1 Data stores
The first parts I want to highlight are the data stores:
id='inp', data=penguins.to_dict('records')),
dcc.Store(id='agg', data=[]) dcc.Store(
Recall that the server part of a Dash app cannot store its state. Instead, we will store the state, in our case: the input and aggregated data, in the DOM. Dash provides a dcc.Store()
component; in its data
property, we store the records-based data (note the use of .to_dict('records')
. Also note that, just like Shiny, components each have an id
property.
2.2.4.2 Inputs
Let’s look at the components that specify the aggregation:
id='cols-group', multi=True),
dcc.Dropdown(id='cols-agg', multi=True),
dcc.Dropdown(
dbc.Select(id='func-agg',
=[{'label': v, 'value': v} for v in agg_function_choices],
options=agg_function_choices[0]
value
),id='button-agg', children='Submit', class_name='btn btn-secondary') dbc.Button(
Here, we’re using a couple of “standard” (dcc
) dropdowns and a couple of Bootstrap (dbc
) components.
With the dropdowns, other than providing the id
, we specify that multiple selections can be made. We don’t populate the options; we’ll do this using a callback. If we were only ever going to use one input-dataset, in this case penguins
, it might make sense to populate the options
when defining the component. Populating using a callback function makes our code more general, allowing the case where we could upload an abitrary dataset. Perhaps it wasn’t necessary to follow the more-general approach here, but it allowed me to learn more about how Dash works.
We define our dbc.Select()
component completely in the layout.
Note also that we can apply Bootstrap classes todbc.Button()
. What we might think of as a label is specified as the children
of the button element.
2.2.4.3 Outputs
It remains to look at the data tables. There are two tables, identical in form; we will examine only one of them:
dash_table.DataTable(id='table-inp',
=10,
page_size='native'
sort_action )
The properties here are straightforward: we need an id
, we want to show ten entries at a time, sort_action='native'
indicates that we want sorting to be available (and for Dash to take care of it).
We will populate the data tables using callback functions.
2.2.5 Callback functions
In Dash, callback functions are just regular functions with decorators. Here’s a very generic example:
@app.callback(Output('output-component', 'output-property'),
'input-component', 'input-property'))
Input(def some_function_name(x):
= some_function(x)
y
return y
The decorator tells Dash how to build a function that wraps the “actual” function. We are not concerned with the implementation; we are interested in the interface.
The decorator is this bit of code: @app.callback()
. It takes a series of arguments which map to the function’s parameters and return values:
Output('output-component', 'output-property')
is mapped to the return value.Input('input-component', 'input-property')
is mapped to thex
parameter.
This tells Dash that whenever the input-property
of the input-component
changes, it should:
- run the function using the
input-property
as an argument, then - send the return value to the
output-property
of theoutput-component
We’ll see some more-complex cases in the following examples.
2.2.5.1 Inputs
We have a couple of input components that need updating: the grouping columns and the aggregation columns. Each has its own callback, but they are virtually identical, so I’ll describe only one.
@app.callback(Output('cols-group', 'options'),
'inp', 'data'))
Input(def update_cols_group(data_records):
= pd.DataFrame.from_dict(data_records)
df return cols_choice(df, 'object')
Whenever the inp
data
changes:
- the function is called using the input data (in records form).
- the function converts to data-frame format, then
- uses our helper function to determine the columns that have string values.
- it returns a list of column names, which Dash uses to update the
cols-group
options
.
2.2.5.2 Calculations
The callback that performs the aggregation has more going on, but we’ll get to the bottom of it.
@app.callback(Output('agg', 'data'),
'button-agg', 'n_clicks'),
Input('inp', 'data'),
State('cols-group', 'value'),
State('cols-agg', 'value'),
State('func-agg', 'value'),
State(=True)
prevent_initial_calldef aggregate(n_clicks, data_records, cols_group, cols_agg, func_agg):
# create DataFrame
= pd.DataFrame.from_dict(data_records)
df
# aggregate
= aggregate_df(df, cols_group, cols_agg, func_agg)
df_new
# serialize DataFrame
return df_new.to_dict('records')
A couple of things to sort through:
The function has five parameters; the decorator has one
Input()
and four instances ofState()
.The decorator is provided an additional argument:
prevent_initial_call
.
A State()
is similar to an Input()
; the difference that the function is not run in response to a State()
change. In this case, the function is run only in response to the value of the button changing, i.e. being clicked.
The prevent_inital_call
argument describes itself well. By setting it to True
, we ensure that the only way the aggregation function will be run is when the button is clicked.
Finally note that we convert our input records into a data frame, then use records-format for the return value.
2.2.5.3 Outputs
The final set of callbacks is for our data tables. Again, although we have two data tables, the callbacks are virtually identical, so I’ll highlight only one.
@app.callback(Output('table-inp', 'columns'),
'table-inp', 'data'),
Output('inp', 'data'))
Input(def update_table_inp(data_records):
return cols_header(data_records), data_records
The thing to note there is that Dash and Python support multiple return-values, mapping each to an Output()
.
This situation merits multiple return-values because a Dash data table has a properties for data
and columns
(headers). This information comes from the same source, so it made sense to use multiple return-values.
Certainly, you could write two callbacks to do the same things; writing a single callback made more sense to me. Of course, you should do what makes sense to you.