1.2 Demonstration App

The goal of this chapter is to highlight some design choices in the source code of this demonstration Shiny app.

1.2.1 Description

To start with, spend a few minutes playing with the app, while referring back to these diagrams:

Figure 1.1: Reactivity diagram

Each input and output you see in the diagram is a part of the UI of the app. The reactive expressions, in this case: inp and agg, are found only in the app’s server-function.

Figure 1.2: Legend

The solid lines indicate immediate downstream-evaluation if the upstream value changes; this is what we think of when we hear “reactivity”. The dashed lines indicate that downstream-evaluation does not immediate follow an upstream change. For example, the reactive-expression agg is updated only when the button is pushed.

Spend some time to study the app, to make sure that these diagrams agree with your understanding of how the app operates. In the following sections, we’ll discuss how to implement in your Shiny code.

1.2.2 Prelims

In the rest of this chapter, we’ll highlight the code used to make app, and the design choices behind the code. In the repository, there are a couple of files to pay attention to:

app-aggregate-local.R
R/
  aggregate-local.R

Here’s the start of the app file, app-aggregate-local.R:

library("shiny")

# ------------------- 
# global functions
# ------------------- 
#
# created outside of reactive environment, making it easier:
#   - to test
#   - to migrate to a package
source("./R/aggregate-local.R")

As you can see, it sources R/aggregate-local.R, which contains our helper functions.

1.2.3 Helper functions

Before writing a Shiny app, I like to write out a set of non-reactive functions that will do the “heavy lifting”. To the extent possible, these are pure functions, which makes it easier to test. I keep these functions in an R folder alongside my app; here’s a link to the actual code.

Just like in the app, we’ll use the palmerpenguins dataset:

# this is not part of the helper functions - it's for exposition here
library("palmerpenguins")
library("tibble")

penguins

## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>

In fact, the first bit of code is not even a function. It is an enumeration of the choices for the aggregation function:

# choices for aggregation functions
agg_function_choices <- c("mean", "min", "max")

We’ll use it in a few places, so I want to define it only once.

Next, a couple of functions that, given a data frame, return the names of:

numerical variables
categorical variables

You might quibble with how I’ve defined these here, but it works for me, for this example.

# given a data frame, return the names of numeric columns
cols_number <- function(df) {
  df_select <- dplyr::select(df, where(~is.numeric(.x) | is.integer(.x)) ) 
  names(df_select)
}

# given a data frame, return the names of string and factor columns
cols_category <- function(df) {
  df_select <- dplyr::select(df, where(~is.character(.x) | is.factor(.x)) ) 
  names(df_select)
}

You may have noticed that I refer to functions using the package name, e.g. dplyr::select(). This is a habit I learned following Hadley Wickham; basically:

I like to be as explicit as possible when writing functions. It provides fewer opportunities for strange things to happen; I provide enough opportunities as it is.
The function is more ready to be included in a package.

As advertised, testing (or at least spot-verification) is straightforward:

cols_number(penguins)

## [1] "bill_length_mm"    "bill_depth_mm"     "flipper_length_mm"
## [4] "body_mass_g"       "year"

cols_category(penguins)

## [1] "species" "island"  "sex"

Let’s look at the aggregation function:

group_aggregate <- function(df, str_group, str_agg, str_fn_agg, 
                            str_fn_choices = agg_function_choices) {
  
  # validate the aggregation function
  stopifnot(
    str_fn_agg %in% str_fn_choices
  )
  
  # get the aggregation function
  func <- get(str_fn_agg)
  
  df |>
    dplyr::group_by(dplyr::across(dplyr::all_of(str_group))) |>
    dplyr::summarise(
      dplyr::across(dplyr::all_of(str_agg), func, na.rm = TRUE)
    )
}

There’s a few things I want to point out about this function:

Aside from the data frame, all the arguments are strings. It is designed for use with Shiny, not for interactive use.
We are using agg_function_choices to make sure that we won’t execute arbitrary code. We turn the string into binding to a function using get().
We use dplyr’s across() function, which lets us use select() semantics in “data-masking” functions, e.g. group_by(), summarise().
To select data-frame variables using strings, we use all_of().

For example if we were grouping by "island", then aggregating over "bill_length_mm" and "bill_depth_mm" using "mean", our interactive code might look like:

library("dplyr", quietly = TRUE)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

aggregate_interactive <- 
  penguins |>
  group_by(island) |>
  summarise(
    bill_length_mm = mean(bill_length_mm, na.rm = TRUE),
    bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)
  )

aggregate_interactive

## # A tibble: 3 × 3
##   island    bill_length_mm bill_depth_mm
##   <fct>              <dbl>         <dbl>
## 1 Biscoe              45.3          15.9
## 2 Dream               44.2          18.3
## 3 Torgersen           39.0          18.4

We can use this result to help verify that our “string” version is working:

aggregate_string <- group_aggregate(
  penguins, 
  str_group = "island", 
  str_agg = c("bill_length_mm", "bill_depth_mm"),
  str_fn_agg = "mean"
)

identical(aggregate_interactive, aggregate_string)

## [1] TRUE

1.2.4 UI

The UI object is relatively straightforward; we use a fluidPage() with a narrower column for inputs and a wider column for outputs.

To give a clearer view of the high-level structure of the page, I replaced the code for the inputs and outputs with ...:

library("shiny")

ui <- fluidPage(
  titlePanel("Aggregator"),
  fluidRow(
    column(
      width = 4, 
      wellPanel(
        h3("Aggregation"),
        ...
      )s
    ),
    column(
      width = 8,
      h3("Input data"),
      ...
      hr(),
      h3("Aggregated data"),
      ...
    )
  )
)

1.2.4.1 Inputs

wellPanel(
  h3("Aggregation"),
  selectizeInput(
    inputId = "cols_group",
    label = "Grouping columns",
    choices = c(),
    multiple = TRUE
  ),        
  selectizeInput(
    inputId = "cols_agg",
    label = "Aggregation columns",
    choices = c(),
    multiple = TRUE
  ),
  selectizeInput(
    inputId = "func_agg",
    label = "Aggregation function",
    choices = agg_function_choices,
    multiple = FALSE
  ),
  actionButton(
    inputId = "button",
    label = "Submit"
  )
)

Let’s look more closely at input$cols_group (this also applies to input$cols_agg):

selectizeInput(
  inputId = "cols_group",
  label = "Grouping columns",
  choices = c(),
  multiple = TRUE
)

Note that choices is specified, initially, as an empty vector. The reactivity diagram for cols_group indicates that, we use an observer function to update this input. We’ll do this in the server function, where we update the choices.

1.2.4.2 Outputs

The outputs are fairly strightforward; we are using DT::DTOutput() as placeholders for DT DataTables.

column(
  width = 8,
  h3("Input data"),
  DT::DTOutput(
    outputId = "table_inp"
  ),
  hr(),
  h3("Aggregated data"),
  DT::DTOutput(
    outputId = "table_agg"
  )      
)

1.2.5 Server function

This may be a habit particular to me, but I like to organize a server-function into groups:

server <- function(input, output, session) {
  # input observers
  # reactive expressions and values
  # outputs
}

1.2.5.1 Input observers

There are two inputs: cols_group and cols_agg, whose choices change when the input data-frame changes.

To make such a change, we use a Shiny observe(), which runs when any of its reactive dependencies change. An observe() does not return a value; instead, it causes a side-effect. In this case, it changes an input element in the DOM.

The observers are substantially similar, so I’ll show only cols_group:

observe({
  # this runs whenever the parsed input data changes
  updateSelectizeInput(
    session,
    inputId = "cols_group",
    choices = cols_category(inp())
  )
})

Note that one of our helper functions, cols_category(), makes an appearance. The choices for the cols_group input are updated according to the names of the categorical variables in the data frame returned by inp().

1.2.5.2 Reactive expressions

This app uses two reactive expressions:

inp(), which returns the input data-frame.
agg(), which returns the aggregated data-frame.

inp <- 
  reactive({ 
    palmerpenguins::penguins
  })

For this app, we probably did not need to wrap palmerpenguins::penguins in a reactive(). I did this with future expansion in mind, where inp() could also return a data frame according to a choice, or even a data frame parsed from an uploaded CSV file.

The reactive expression for agg(), the aggregated data-frame, is more interesting:

agg <- 
  reactive({
         
    req(input$func_agg %in% agg_function_choices)
 
    group_aggregate(
      inp(), 
      str_group = input$cols_group, 
      str_agg = input$cols_agg, 
      str_fn_agg = input$func_agg
    )
  }) |>
  bindEvent(input$button, ignoreNULL = TRUE, ignoreInit = TRUE)

The first thing we do in the reactive is make sure that the value of input$func_agg is among the choices we specified. I’m sure you noticed that this is an extra check. Although redundant, I am careful to validate using the same values: agg_function_choices. You can read more about input validation in the security chapter of Mastering Shiny.

Then, we use our group_aggregate() helper function. For me, having tested it outside of Shiny helped me focus on getting the rest of the code working.

The reactive() expression returns the data; the expression itself is piped to bindEvent(), which will run the reactive(), and return its value, only when the value of input$button changes. This is a relatively new pattern in Shiny; it appeared in v1.6.0.

bindEvent() has a couple of options:

ignoreNULL = FALSE: the reactive() is not evaluated if input$button is zero.
ignoreInit = FALSE: the reactive() is not evaluated when the app is first initialized.

In this case, the reactive() is evaluated only in response to a button-click. This can be a useful pattern if the reactive() contains a long-running computation, or a call to an external resource. You may also be interested in Shiny’s bindCache() function.

1.2.5.3 Outputs

There two outputs: one for the inp() data, the other for the agg() data; each is a table output.

These outputs are similar to one another; we’ll focus on output$table_inp:

output$table_inp <- DT::renderDT(inp())

The table output is a straightforward use of DT::renderDT().