1 Shiny
Shiny was my introduction to reactive programming. Like many folks, I started by hacking to “get stuff working”; this is a perfectly honorable path. Then I watched Joe Cheng’s tutorials (Part 1, Part 2), in which he explained some of the theory behind Shiny. These talks started me on a path that completely changed my perspective and, eventually, my abilities as a programmer.
This chapter is meant to be a review of Shiny; we will:
- touch on some of the principles I learned from Joe’s talks.
- show how these principles are implemented in the demonstration app.
1.1 Principles
These are some things to keep in mind to help you write more understandable and predictable Shiny apps.
1.1.1 Pure functions vs. side effects
This is the single biggest concept I have learned as a programmer, and I learned it relatively late in my career.
A pure function has two properties:
- given the same set of arguments, it always returns the same value.
- it makes no changes outside of its scope.
This can provide us some big benefits:
- it doesn’t matter where or how the return value is computed; we can rely on getting the same answer.
- we don’t have to worry about the environment changing as a result of calling the function.
Here are a couple of examples of pure functions:
function(x) {
  x^2 - 1
}

function(x) {
  paste0(x, " using a pure function.")
}
Pure functions are relatively straightforward to test because the output depends only on the inputs.
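To make that concrete, here is a minimal sketch that gives the two pure functions above hypothetical names, square_minus_one() and describe_pure(), and checks them with base R’s stopifnot() (a package such as testthat would work just as well). Because the outputs depend only on the inputs, each check needs nothing more than an argument and an expected value.

```r
# hypothetical names for the two pure functions shown above
square_minus_one <- function(x) {
  x^2 - 1
}

describe_pure <- function(x) {
  paste0(x, " using a pure function.")
}

# no setup or teardown needed: the same arguments always give the same values
stopifnot(
  square_minus_one(3) == 8,
  identical(square_minus_one(c(0, 1, 2)), c(-1, 0, 3)),
  describe_pure("Computed") == "Computed using a pure function."
)
```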
“Side effect” is a catch-all term for when a function’s behavior either:
- depends on something not passed in as an argument.
- changes something outside of its scope, e.g. writes a file or displays a plot.
Here are a couple of functions that either depend on or cause side effects:
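# return value depends on the *contents* of the file, not just file_name
function(file_name) {
  read.csv(file_name)
}

# this might make a change in a remote service
function(url, data) {
  # build a handle that will send `data` (a named list) as form fields
  h <- curl::new_handle()
  curl::handle_setform(h, .list = data)
  # perform the request using that handle
  curl::curl_fetch_memory(url, handle = h)
}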
Aside from being non-deterministic, functions with side effects can take a long time to execute.
Of course, side effects are not necessarily bad things, but we need to be aware of them. Your Shiny server-function will make much more sense, and be much easier to debug, if you recognize pure functions and side effects.
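One way to act on that awareness, sketched here with hypothetical function names, is to keep the computation in a pure function and confine the side effect (writing a file) to a thin wrapper. The pure part can then be tested on its own, and the wrapper is small enough to check by inspection.

```r
# pure: the result depends only on `x`, and nothing outside the function changes
summarise_values <- function(x) {
  data.frame(n = length(x), mean = mean(x))
}

# side effect: writes a CSV file; the returned path is incidental
write_summary <- function(x, path) {
  utils::write.csv(summarise_values(x), path, row.names = FALSE)
  invisible(path)
}

path <- write_summary(c(1, 2, 3), tempfile(fileext = ".csv"))
```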
1.1.2 Reactives vs. observers
Shiny server-functions provide two broad mechanisms for updating the state of your app:
- reactive(): these return values, and work well with pure functions. In other words, the returned value depends only on the reactive values it reads.
- observe(): there is no return value; instead, these cause side effects. Very often, the effect is to change something in the UI, such as the choices in an input, or to render a plot.
In Shiny, reactive expressions are designed to run quickly and often; observers are designed to be run sparingly.
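The distinction can be seen outside a running app using shiny::testServer(). In this minimal sketch, the server function, the input n, and the names doubled and last_seen are all hypothetical: the reactive() returns a value, while the observe() returns nothing useful and exists only to change something else (here, a reactiveVal()).

```r
library("shiny")

server <- function(input, output, session) {
  # reactive(): returns a value computed from the reactive values it reads
  doubled <- reactive({
    input$n * 2
  })

  # observe(): no useful return value; it exists for its side effect,
  # here, recording the most recent value of input$n
  last_seen <- reactiveVal(NULL)
  observe({
    last_seen(input$n)
  })
}

# testServer() evaluates the block with access to the server's local objects
testServer(server, {
  session$setInputs(n = 3)
  stopifnot(doubled() == 6, last_seen() == 3)
})
```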
1.1.3 Using tidyverse functions
The tidyverse is designed with interactive programming in mind. It is meant to support code like this, without a lot of quotes or namespace qualifiers:
penguins |>
  group_by(island, sex) |>
  summarise(bill_length_mm = mean(bill_length_mm))
In Shiny, variable (column) names usually arrive from inputs as strings, rather than as bare variable names. As well, in Shiny, we may want to summarise() an arbitrary set of variables. Thus, it can be a challenge to use tidyverse code in Shiny.
It should not surprise us that the tidyverse offers tools to address this situation:
- <tidy-select> is a set of tools to select variables within a data frame. Functions that use <tidy-select> include dplyr::select() and tidyr::pivot_longer(). Of particular use in Shiny are the selection helpers for strings: dplyr::any_of() and dplyr::all_of().
- across() lets us use a <tidy-select> specification in a data-masking function. More concretely, it lets us group_by() or summarise() over an arbitrary set of variables in a data frame.
- If you need to use data-masking with (by definition) a single variable, you can use subsetting with the .data pronoun, e.g. ggplot2::aes(x = .data[[str_var_x]]).
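Here is a minimal sketch of these tools driven by strings, as they might arrive from Shiny inputs. The small data frame and the variables str_group, str_agg, and str_var are made up for illustration; they are not part of the demonstration app.

```r
library("dplyr", warn.conflicts = FALSE)

# made-up data standing in for penguins
df <- data.frame(
  island = c("A", "A", "B"),
  bill = c(1, 3, 5),
  depth = c(2, 4, 6)
)

# column names as strings, as they might come from Shiny inputs
str_group <- "island"
str_agg <- c("bill", "depth")
str_var <- "bill"

# <tidy-select>: all_of() selects the columns named by a character vector
selected <- select(df, all_of(str_agg))

# across() brings a <tidy-select> specification into group_by() and summarise()
summarised <- df |>
  group_by(across(all_of(str_group))) |>
  summarise(across(all_of(str_agg), mean), .groups = "drop")

# the .data pronoun subsets a single variable named by a string
filtered <- filter(df, .data[[str_var]] > 2)
```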
1.2 Demonstration App
The goal of this chapter is to highlight some design choices in the source code of this demonstration Shiny app.
1.2.1 Description
To start with, spend a few minutes playing with the app, while referring back to these diagrams:
Each input and output you see in the diagram is a part of the UI of the app. The reactive expressions, in this case inp and agg, are found only in the app’s server function.
The solid lines indicate immediate downstream evaluation if the upstream value changes; this is what we think of when we hear “reactivity”. The dashed lines indicate that downstream evaluation does not immediately follow an upstream change. For example, the reactive expression agg is updated only when the button is pushed.
Spend some time studying the app, to make sure that these diagrams agree with your understanding of how the app operates. In the following sections, we’ll discuss how to implement this in your Shiny code.
1.2.2 Prelims
In the rest of this chapter, we’ll highlight the code used to make the app, and the design choices behind the code. In the repository, there are a couple of files to pay attention to:
- app-aggregate-local.R
- R/
  - aggregate-local.R
Here’s the start of the app file, app-aggregate-local.R:
library("shiny")
# -------------------
# global functions
# -------------------
#
# created outside of reactive environment, making it easier:
# - to test
# - to migrate to a package
source("./R/aggregate-local.R")
As you can see, it sources R/aggregate-local.R, which contains our helper functions.
1.2.3 Helper functions
Before writing a Shiny app, I like to write out a set of non-reactive functions that will do the “heavy lifting”. To the extent possible, these are pure functions, which makes them easier to test. I keep these functions in an R folder alongside my app; here’s a link to the actual code.
Just like in the app, we’ll use the palmerpenguins dataset:
# this is not part of the helper functions - it's for exposition here
library("palmerpenguins")
library("tibble")
penguins
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# … with 334 more rows, and 2 more variables: sex <fct>, year <int>
In fact, the first bit of code is not even a function. It is an enumeration of the choices for the aggregation function:
# choices for aggregation functions
agg_function_choices <- c("mean", "min", "max")
We’ll use it in a few places, so I want to define it only once.
Next, a couple of functions that, given a data frame, return the names of:
- numerical variables
- categorical variables
You might quibble with how I’ve defined these here, but it works for me, for this example.
# given a data frame, return the names of numeric columns
cols_number <- function(df) {
  df_select <- dplyr::select(df, where(~is.numeric(.x) | is.integer(.x)))
  names(df_select)
}

# given a data frame, return the names of string and factor columns
cols_category <- function(df) {
  df_select <- dplyr::select(df, where(~is.character(.x) | is.factor(.x)))
  names(df_select)
}
You may have noticed that I refer to functions using the package name, e.g. dplyr::select(). This is a habit I learned following Hadley Wickham; basically:
- I like to be as explicit as possible when writing functions. It provides fewer opportunities for strange things to happen; I provide enough opportunities as it is.
- The function is more ready to be included in a package.
As advertised, testing (or at least spot-verification) is straightforward:
cols_number(penguins)
[1] "bill_length_mm" "bill_depth_mm" "flipper_length_mm"
[4] "body_mass_g" "year"
cols_category(penguins)
[1] "species" "island" "sex"
Let’s look at the aggregation function:
group_aggregate <- function(df, str_group, str_agg, str_fn_agg,
                            str_fn_choices = agg_function_choices) {

  # validate the aggregation function
  stopifnot(
    str_fn_agg %in% str_fn_choices
  )

  # get the aggregation function
  func <- get(str_fn_agg)

  df |>
    dplyr::group_by(dplyr::across(dplyr::all_of(str_group))) |>
    dplyr::summarise(
      # pass na.rm through an anonymous function; supplying extra
      # arguments via the `...` of across() is deprecated as of dplyr 1.1.0
      dplyr::across(dplyr::all_of(str_agg), function(x) func(x, na.rm = TRUE))
    )
}
There are a few things I want to point out about this function:
- Aside from the data frame, all the arguments are strings. It is designed for use with Shiny, not for interactive use.
- We are using agg_function_choices to make sure that we won’t execute arbitrary code. We use get() to turn the validated string into the function it names.
- We use dplyr’s across() function, which lets us use select() semantics in “data-masking” functions, e.g. group_by(), summarise().
- To select data-frame variables using strings, we use all_of().
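The same guard can be sketched as a small standalone helper. The name lookup_agg_function() is hypothetical, and it swaps get() for base R’s match.fun(), which only ever returns a function; in either version, it is the allow-list check that keeps arbitrary function names from being called.

```r
agg_function_choices <- c("mean", "min", "max")

# hypothetical helper: turn an allow-listed string into a function
lookup_agg_function <- function(str_fn_agg, choices = agg_function_choices) {
  # refuse anything that is not exactly one of the allowed names
  stopifnot(
    length(str_fn_agg) == 1,
    str_fn_agg %in% choices
  )
  match.fun(str_fn_agg)
}

lookup_agg_function("max")(c(3, 9, 4)) # returns 9

# a string naming some other function is rejected before it can be called
rejected <- tryCatch(
  lookup_agg_function("system"),
  error = function(e) "rejected"
)
```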
For example, if we were grouping by "island", then aggregating over "bill_length_mm" and "bill_depth_mm" using "mean", our interactive code might look like:
library("dplyr", quietly = TRUE)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
aggregate_interactive <-
  penguins |>
  group_by(island) |>
  summarise(
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)
  )
aggregate_interactive
# A tibble: 3 × 3
island bill_length_mm bill_depth_mm
<fct> <dbl> <dbl>
1 Biscoe 45.3 15.9
2 Dream 44.2 18.3
3 Torgersen 39.0 18.4
We can use this result to help verify that our “string” version is working:
aggregate_string <- group_aggregate(
  penguins,
  str_group = "island",
  str_agg = c("bill_length_mm", "bill_depth_mm"),
  str_fn_agg = "mean"
)
identical(aggregate_interactive, aggregate_string)
[1] TRUE
1.2.4 UI
The UI object is relatively straightforward; we use a fluidPage() with a narrower column for inputs and a wider column for outputs.
To give a clearer view of the high-level structure of the page, I replaced the code for the inputs and outputs with ...:
library("shiny")

ui <- fluidPage(
  titlePanel("Aggregator"),
  fluidRow(
    column(
      width = 4,
      wellPanel(
        h3("Aggregation"),
        ...
      )
    ),
    column(
      width = 8,
      h3("Input data"),
      ...,
      hr(),
      h3("Aggregated data"),
      ...
    )
  )
)
1.2.4.1 Inputs
wellPanel(
  h3("Aggregation"),
  selectizeInput(
    inputId = "cols_group",
    label = "Grouping columns",
    choices = c(),
    multiple = TRUE
  ),
  selectizeInput(
    inputId = "cols_agg",
    label = "Aggregation columns",
    choices = c(),
    multiple = TRUE
  ),
  selectizeInput(
    inputId = "func_agg",
    label = "Aggregation function",
    choices = agg_function_choices,
    multiple = FALSE
  ),
  actionButton(
    inputId = "button",
    label = "Submit"
  )
)
Let’s look more closely at input$cols_group (this also applies to input$cols_agg):
selectizeInput(
inputId = "cols_group",
label = "Grouping columns",
choices = c(),
multiple = TRUE
)
Note that choices is specified, initially, as empty (c() evaluates to NULL). The reactivity diagram for cols_group indicates that we use an observer to update this input. We’ll do this in the server function, where we update the choices.
1.2.4.2 Outputs
The outputs are fairly straightforward; we are using DT::DTOutput() as placeholders for DT DataTables.
column(
  width = 8,
  h3("Input data"),
  DT::DTOutput(
    outputId = "table_inp"
  ),
  hr(),
  h3("Aggregated data"),
  DT::DTOutput(
    outputId = "table_agg"
  )
)
1.2.5 Server function
This may be a habit particular to me, but I like to organize a server-function into groups:
server <- function(input, output, session) {
  # input observers

  # reactive expressions and values

  # outputs
}
1.2.5.1 Input observers
There are two inputs, cols_group and cols_agg, whose choices change when the input data frame changes.
To make such a change, we use a Shiny observe(), which runs when any of its reactive dependencies change. An observe() does not return a value; instead, it causes a side effect. In this case, it changes an input element in the DOM.
The observers are substantially similar, so I’ll show only cols_group:
observe({
  # this runs whenever the parsed input data changes
  updateSelectizeInput(
    session,
    inputId = "cols_group",
    choices = cols_category(inp())
  )
})
Note that one of our helper functions, cols_category(), makes an appearance. The choices for the cols_group input are updated according to the names of the categorical variables in the data frame returned by inp().
1.2.5.2 Reactive expressions
This app uses two reactive expressions:
- inp(), which returns the input data frame.
- agg(), which returns the aggregated data frame.
inp <- reactive({
  palmerpenguins::penguins
})
For this app, we probably did not need to wrap palmerpenguins::penguins in a reactive(). I did this with future expansion in mind, where inp() could also return a data frame according to a choice, or even a data frame parsed from an uploaded CSV file.
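As a sketch of that kind of expansion, here is a hypothetical version of inp() that parses an uploaded CSV file. The input ID upload, the one-control UI, and the use of utils::read.csv() are assumptions for illustration. The value of a fileInput() is a data frame whose datapath column points to where Shiny stored the uploaded file, and req() holds the reactive until a file is present; testServer() lets us check it by setting the input to a stand-in value.

```r
library("shiny")

# hypothetical UI: a single file-upload control
ui <- fluidPage(
  fileInput(inputId = "upload", label = "Upload a CSV file", accept = ".csv")
)

server <- function(input, output, session) {
  inp <- reactive({
    # wait until a file has been uploaded
    req(input$upload)
    # input$upload is a data frame; datapath is where the file was stored
    utils::read.csv(input$upload$datapath)
  })
}

# exercise the server without a browser: write a CSV, then point the input at it
path <- tempfile(fileext = ".csv")
utils::write.csv(head(mtcars), path, row.names = FALSE)

testServer(server, {
  session$setInputs(upload = data.frame(datapath = path))
  stopifnot(nrow(inp()) == 6)
})
```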
The reactive expression for agg(), the aggregated data frame, is more interesting:
agg <- reactive({
  req(input$func_agg %in% agg_function_choices)
  group_aggregate(
    inp(),
    str_group = input$cols_group,
    str_agg = input$cols_agg,
    str_fn_agg = input$func_agg
  )
}) |>
  bindEvent(input$button, ignoreNULL = TRUE, ignoreInit = TRUE)
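The first thing we do in the reactive is make sure that the value of input$func_agg is among the choices we specified. You may have noticed that this is an extra check: group_aggregate() already validates its str_fn_agg argument. Although redundant, I am careful to validate using the same values: agg_function_choices. One difference is that a failed req() halts the reactive silently, rather than raising a visible error the way stopifnot() does. You can read more about input validation in the security chapter of Mastering Shiny.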
Then, we use our group_aggregate() helper function. For me, having tested it outside of Shiny helped me focus on getting the rest of the code working.
The reactive() expression returns the data; the expression itself is piped to bindEvent(), which will run the reactive(), and return its value, only when the value of input$button changes. This is a relatively new pattern in Shiny; it appeared in v1.6.0.
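bindEvent() has a couple of options, both set to TRUE here:
- ignoreNULL = TRUE: the reactive() is not evaluated if input$button is NULL or zero (an actionButton() that has not yet been clicked has the value zero). This is the default.
- ignoreInit = TRUE: the reactive() is not evaluated when the app is first initialized. The default is FALSE.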
In this case, the reactive() is evaluated only in response to a button click. This can be a useful pattern if the reactive() contains a long-running computation, or a call to an external resource. You may also be interested in Shiny’s bindCache() function.
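Here is a minimal sketch of that behavior, checked with shiny::testServer(). The input n, the button values, and the name result are hypothetical, and multiplying by 10 stands in for an expensive computation. To keep the sketch simple outside a running app, ignoreInit is left at its default (FALSE).

```r
library("shiny")

server <- function(input, output, session) {
  # bindEvent() makes the button the only trigger; input$n is read in isolation
  result <- reactive({
    input$n * 10
  }) |>
    bindEvent(input$button, ignoreNULL = TRUE)
}

testServer(server, {
  session$setInputs(n = 1, button = 1)
  stopifnot(result() == 10)

  # changing n alone does not re-run the reactive; the cached value is returned
  session$setInputs(n = 2)
  stopifnot(result() == 10)

  # a new button value does
  session$setInputs(button = 2)
  stopifnot(result() == 20)
})
```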
1.2.5.3 Outputs
There are two outputs: one for the inp() data, the other for the agg() data; each is a table output.
These outputs are similar to one another; we’ll focus on output$table_inp:
output$table_inp <- DT::renderDT(inp())
The table output is a straightforward use of DT::renderDT().