1 Shiny

Published

2022-08-05

Shiny was my introduction to reactive programming. Like many folks, I started by hacking to “get stuff working”; this is a perfectly-honorable path. Then, I watched Joe Cheng’s tutorials (Part 1, Part 2), in which he explained some of the theory behind Shiny. These talks started me on a path that completely changed my persepective and, eventually, my abilities as a programmer.

This chapter is meant to be a review of Shiny; we will:

touch on some of the principles I learned from Joe’s talks.
show how these principles are implemented the demonstration app.

1.1 Principles

These are some things to keep in mind to help you write more-understandable and predictable Shiny apps.

1.1.1 Pure functions vs. side effects

This is the single biggest concept I have learned as a programmer, and I learned it relatively late in my career.

A pure function has two properties:

given the same set of arguments, it always returns the same value.
it makes no changes outside of its scope.

This can provide us some big benefits:

it doesn’t matter where or how the return value is computed, we can rely on getting the same answer.
we don’t have to worry about the environment changing as a result of calling the function.

Here’s a couple of examples of pure functions:

function(x) {
  x**2 - 1
}

function(x) {
  paste(x, " using a pure function.")
}

Pure functions are relatively striaghtforward to test because the output depends only on the inputs.

Side effects is a catch-all term for when a function’s behavior either:

depends on something not passed in as an argument.
changes the something outside of its scope, e.g.: writes a file, displays a plot.

Here’s a couple of functions that either depend on or cause side effects:

# return value depends on the *contents* of the file, not just file_name
function(file_name) {
  read.csv(file_name)
}

# this might make a change in a remote service
function(url, data) {
  
  h <- curl::new_handle()
  curl::handle_setform(h, data)
  
  curl::curl(url)
}

Aside from being non-deterministic, functions with side effects can take a long time to execute.

Of course, side effects are not necessarily bad things, but we need to be aware of them. Your Shiny server-function will make much more sense, and be much easier to debug, if you recognize pure functions and side effects.

1.1.2 Reactives vs. observers

Shiny server-functions provide two broad mechanisms for updating the state of your app:

reactive(): these return values, and work well with pure functions. In other words, the returned value depends only on the reactive values it depends on.
observe(): there is no return value; instead, these cause side-effects. Very often, the effect is to change something in the UI, such as the choices in an input, or to render a plot.

In Shiny, reactive expressions are designed to run quickly and often; observers are designed to be run sparingly.

1.1.3 Using tidyverse functions

The tidyverse is designed with interactive programming in mind. It is meant to support code like this, without a lot of quotes or namespace qualifiers:

penguins |>
  group_by(island, sex) |>
  summarise(bill_length_mm = mean(bill_length_mm))

In Shiny, variable (column) names in data frames are expressed as strings, rather than as bare variable-names. As well, in Shiny, we may want to summarise() an arbitrary set of variables. Thus, it can be a challenge to use tidyverse code in Shiny.

It should not surprise us that the tidyverse offers tools to address this situation:

<tidy-select> is a set of tools to select variables within a data frame. Functions that use <tidy-select> include dplyr::select(), tidyr::pivot_longer(). Of particular use in Shiny are the selection helpers for strings: dplyr::any_of() and dplyr::all_of().
across() lets us use a <tidy-select> specification in a data-masking function. More concretely, it lets us group_by() or summarize() over an arbitrary set of variables in a data frame.
If you need to use data-masking with (by definition) a single variable, you can use subsetting with the .data pronoun, e.g. ggplot2::aes(x = .data[[str_var_x]]).

1.2 Demonstration App

The goal of this chapter is to highlight some design choices in the source code of this demonstration Shiny app.

1.2.1 Description

To start with, spend a few minutes playing with the app, while referring back to these diagrams:

Each input and output you see in the diagram is a part of the UI of the app. The reactive expressions, in this case: inp and agg, are found only in the app’s server-function.

Legend: Reactivity diagram for Shiny demo-app

The solid lines indicate immediate downstream-evaluation if the upstream value changes; this is what we think of when we hear “reactivity”. The dashed lines indicate that downstream-evaluation does not immediate follow an upstream change. For example, the reactive-expression agg is updated only when the button is pushed.

Spend some time to study the app, to make sure that these diagrams agree with your understanding of how the app operates. In the following sections, we’ll discuss how to implement in your Shiny code.

1.2.2 Prelims

In the rest of this chapter, we’ll highlight the code used to make app, and the design choices behind the code. In the repository, there are a couple of files to pay attention to:

app-aggregate-local.R
R/
  aggregate-local.R

Here’s the start of the app file, app-aggregate-local.R:

library("shiny")

# ------------------- 
# global functions
# ------------------- 
#
# created outside of reactive environment, making it easier:
#   - to test
#   - to migrate to a package
source("./R/aggregate-local.R")

As you can see, it sources R/aggregate-local.R, which contains our helper functions.

1.2.3 Helper functions

Before writing a Shiny app, I like to write out a set of non-reactive functions that will do the “heavy lifting”. To the extent possible, these are pure functions, which makes it easier to test. I keep these functions in an R folder alongside my app; here’s a link to the actual code.

Just like in the app, we’ll use the palmerpenguins dataset:

# this is not part of the helper functions - it's for exposition here
library("palmerpenguins")
library("tibble")

penguins

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# … with 334 more rows, and 2 more variables: sex <fct>, year <int>

In fact, the first bit of code is not even a function. It is an enumeration of the choices for the aggregation function:

# choices for aggregation functions
agg_function_choices <- c("mean", "min", "max")

We’ll use it in a few places, so I want to define it only once.

Next, a couple of functions that, given a data frame, return the names of:

numerical variables
categorical variables

You might quibble with how I’ve defined these here, but it works for me, for this example.

# given a data frame, return the names of numeric columns
cols_number <- function(df) {
  df_select <- dplyr::select(df, where(~is.numeric(.x) | is.integer(.x)) ) 
  names(df_select)
}

# given a data frame, return the names of string and factor columns
cols_category <- function(df) {
  df_select <- dplyr::select(df, where(~is.character(.x) | is.factor(.x)) ) 
  names(df_select)
}

You may have noticed that I refer to functions using the package name, e.g. dplyr::select(). This is a habit I learned following Hadley Wickham; basically:

I like to be as explicit as possible when writing functions. It provides fewer opportunities for strange things to happen; I provide enough opportunities as it is.
The function is more ready to be included in a package.

As advertised, testing (or at least spot-verification) is straightforward:

cols_number(penguins)

[1] "bill_length_mm"    "bill_depth_mm"     "flipper_length_mm"
[4] "body_mass_g"       "year"

cols_category(penguins)

[1] "species" "island"  "sex"

Let’s look at the aggregation function:

group_aggregate <- function(df, str_group, str_agg, str_fn_agg, 
                            str_fn_choices = agg_function_choices) {
  
  # validate the aggregation function
  stopifnot(
    str_fn_agg %in% str_fn_choices
  )
  
  # get the aggregation function
  func <- get(str_fn_agg)
  
  df |>
    dplyr::group_by(dplyr::across(dplyr::all_of(str_group))) |>
    dplyr::summarise(
      dplyr::across(dplyr::all_of(str_agg), func, na.rm = TRUE)
    )
}

There’s a few things I want to point out about this function:

Aside from the data frame, all the arguments are strings. It is designed for use with Shiny, not for interactive use.
We are using agg_function_choices to make sure that we won’t execute arbitrary code. We turn the string into binding to a function using get().
We use dplyr’s across() function, which lets us use select() semantics in “data-masking” functions, e.g. group_by(), summarise().
To select data-frame variables using strings, we use all_of().

For example if we were grouping by "island", then aggregating over "bill_length_mm" and "bill_depth_mm" using "mean", our interactive code might look like:

library("dplyr", quietly = TRUE)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

aggregate_interactive <- 
  penguins |>
  group_by(island) |>
  summarise(
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)
  )

aggregate_interactive

# A tibble: 3 × 3
  island    bill_length_mm bill_depth_mm
  <fct>              <dbl>         <dbl>
1 Biscoe              45.3          15.9
2 Dream               44.2          18.3
3 Torgersen           39.0          18.4

We can use this result to help verify that our “string” version is working:

aggregate_string <- group_aggregate(
  penguins, 
  str_group = "island", 
  str_agg = c("bill_length_mm", "bill_depth_mm"),
  str_fn_agg = "mean"
)

identical(aggregate_interactive, aggregate_string)

[1] TRUE

1.2.4 UI

The UI object is relatively straightforward; we use a fluidPage() with a narrower column for inputs and a wider column for outputs.

To give a clearer view of the high-level structure of the page, I replaced the code for the inputs and outputs with ...:

library("shiny")

ui <- fluidPage(
  titlePanel("Aggregator"),
  fluidRow(
    column(
      width = 4, 
      wellPanel(
        h3("Aggregation"),
        ...
      )s
    ),
    column(
      width = 8,
      h3("Input data"),
      ...
      hr(),
      h3("Aggregated data"),
      ...
    )
  )
)

1.2.4.1 Inputs

wellPanel(
  h3("Aggregation"),
  selectizeInput(
    inputId = "cols_group",
    label = "Grouping columns",
    choices = c(),
    multiple = TRUE
  ),        
  selectizeInput(
    inputId = "cols_agg",
    label = "Aggregation columns",
    choices = c(),
    multiple = TRUE
  ),
  selectizeInput(
    inputId = "func_agg",
    label = "Aggregation function",
    choices = agg_function_choices,
    multiple = FALSE
  ),
  actionButton(
    inputId = "button",
    label = "Submit"
  )
)

Let’s look more closely at input$cols_group (this also applies to input$cols_agg):

selectizeInput(
  inputId = "cols_group",
  label = "Grouping columns",
  choices = c(),
  multiple = TRUE
)

Note that choices is specified, initially, as an empty vector. The reactivity diagram for cols_group indicates that, we use an observer function to update this input. We’ll do this in the server function, where we update the choices.

1.2.4.2 Outputs

The outputs are fairly strightforward; we are using DT::DTOutput() as placeholders for DT DataTables.

column(
  width = 8,
  h3("Input data"),
  DT::DTOutput(
    outputId = "table_inp"
  ),
  hr(),
  h3("Aggregated data"),
  DT::DTOutput(
    outputId = "table_agg"
  )      
)

1.2.5 Server function

This may be a habit particular to me, but I like to organize a server-function into groups:

server <- function(input, output, session) {
  # input observers
  # reactive expressions and values
  # outputs
}

1.2.5.1 Input observers

There are two inputs: cols_group and cols_agg, whose choices change when the input data-frame changes.

To make such a change, we use a Shiny observe(), which runs when any of its reactive dependencies change. An observe() does not return a value; instead, it causes a side-effect. In this case, it changes an input element in the DOM.

The observers are substantially similar, so I’ll show only cols_group:

observe({
  # this runs whenever the parsed input data changes
  updateSelectizeInput(
    session,
    inputId = "cols_group",
    choices = cols_category(inp())
  )
})

Note that one of our helper functions, cols_category(), makes an appearance. The choices for the cols_group input are updated according to the names of the categorical variables in the data frame returned by inp().

1.2.5.2 Reactive expressions

This app uses two reactive expressions:

inp(), which returns the input data-frame.
agg(), which returns the aggregated data-frame.

inp <- 
  reactive({ 
    palmerpenguins::penguins
  })

For this app, we probably did not need to wrap palmerpenguins::penguins in a reactive(). I did this with future expansion in mind, where inp() could also return a data frame according to a choice, or even a data frame parsed from an uploaded CSV file.

The reactive expression for agg(), the aggregated data-frame, is more interesting:

agg <- 
  reactive({
         
    req(input$func_agg %in% agg_function_choices)
 
    group_aggregate(
      inp(), 
      str_group = input$cols_group, 
      str_agg = input$cols_agg, 
      str_fn_agg = input$func_agg
    )
  }) |>
  bindEvent(input$button, ignoreNULL = TRUE, ignoreInit = TRUE)

The first thing we do in the reactive is make sure that the value of input$func_agg is among the choices we specified. I’m sure you noticed that this is an extra check. Although redundant, I am careful to validate using the same values: agg_function_choices. You can read more about input validation in the security chapter of Mastering Shiny.

Then, we use our group_aggregate() helper function. For me, having tested it outside of Shiny helped me focus on getting the rest of the code working.

The reactive() expression returns the data; the expression itself is piped to bindEvent(), which will run the reactive(), and return its value, only when the value of input$button changes. This is a relatively new pattern in Shiny; it appeared in v1.6.0.

bindEvent() has a couple of options:

ignoreNULL = FALSE: the reactive() is not evaluated if input$button is zero.
ignoreInit = FALSE: the reactive() is not evaluated when the app is first initialized.

In this case, the reactive() is evaluated only in response to a button-click. This can be a useful pattern if the reactive() contains a long-running computation, or a call to an external resource. You may also be interested in Shiny’s bindCache() function.

1.2.5.3 Outputs

There two outputs: one for the inp() data, the other for the agg() data; each is a table output.

These outputs are similar to one another; we’ll focus on output$table_inp:

output$table_inp <- DT::renderDT(inp())

The table output is a straightforward use of DT::renderDT().