Slides

This R Markdown file is available on GitHub.

View the video of the presentation.

Download the slides from today’s talk

Resources

Books

If you want to learn enough graphic design to be dangerous, get this book:

  • Robin Williams, The Non-Designer’s Design & Type Books: Design and Typographic Principles for the Visual Novice, Deluxe Edition. (Berkeley, California: Peachpit Press, 2008). (Or any more recent version too)

There are some helpful summary articles about CRAP principles online too:

There are a ton of excellent data visualization books, including two new (free!) books by Kieran Healy and Claus Wilke:

  • Kieran Healy, Data Visualization for Social Science: A practical introduction with R and ggplot2
  • Claus Wilke, Fundamentals of Data Visualization
  • Alberto Cairo, The Truthful Art: Data, Charts, and Maps for Communication (Berkeley, California: New Riders, 2016).
  • Stephanie D. H. Evergreen, Effective Data Visualization: The Right Chart for the Right Data (Thousand Oaks, CA: Sage, 2017).
  • Dona M. Wong, The Wall Street Journal Guide to Information Graphics: The Dos and Don’ts of Presenting Data, Facts, and Figures (London: W. W. Norton & Company, 2010).
  • Hadley Wickham and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (Sebastopol, California: O’Reilly Media, 2017). [FREE online]
  • Alberto Cairo, The Functional Art: An Introduction to Information Graphics and Visualization (Berkeley, California: New Riders, 2013).

How to select the appropriate chart type

Many people have created useful tools for selecting the right chart type for a given dataset or question. Here are some of the best:

  • The Data Visualisation Catalogue: Descriptions, explanations, examples, and tools for creating 60 different types of visualizations.
  • The Data Viz Project: Descriptions and examples for 150 different types of visualizations. Also allows you to search by data shape and chart function (comparison, correlation, distribution, geographical, part to whole, trend over time, etc.).
  • From Data to Viz: A decision tree for dozens of chart types with links to R and Python code.
  • The Chartmaker Directory: Examples of how to create 51 different types of visualizations in 31 different software packages, including Excel, Tableau, and R.
  • R Graph Catalog: R code for 124 ggplot graphs.
  • Emery’s Essentials: Descriptions and examples of 26 different chart types.

Colors

  • Adobe Color: Create, share, and explore rule-based and custom color palettes.
  • ColorBrewer: Sequential, diverging, and qualitative color palettes that take accessibility into account.
  • viridis: Perceptually uniform color scales.
  • Scientific Colour-Maps: Perceptually uniform color scales like viridis. Use them in R with scico.
  • Colorgorical: Create color palettes based on fancy mathematical rules for perceptual distance.
  • Colorpicker for data: More fancy mathematical rules for color palettes (explanation).
  • iWantHue: Yet another perceptual distance-based color palette builder.
  • ColourLovers: Like Facebook for color palettes.
  • Photochrome: Word-based color palettes.

Fonts

Other helpful data visualization resources

🎶 Take a sad plot and make it CRAPier 🎶

library(tidyverse)
library(httr)
library(ggstance)

By default, R graphics don’t really respect CRAP rules. In base R, everything is centered:

plot(mtcars$wt, mtcars$mpg, main = "Here's a title")

Nowadays in ggplot, titles are left aligned, but they used to be centered by default. Even so, the plot still mixes alignments: axis titles are centered, the title is left aligned, and a caption (if you add one) is right aligned.

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Here's a title",
       caption = "I'm right aligned")

To get R graphics in a more publishable state, you have to tweak them a lot. It seems daunting at first, but most of the tweaks can be put in a custom theme that you can reuse over and over again.

First, let’s get some data about characters in the Harry Potter series from the Harry Potter API. You’ll need your own API key for this to work.

HP_KEY <- "put-api-key-here"
response <- GET("https://www.potterapi.com/v1/characters/", query = list(key = HP_KEY))

parsed <- jsonlite::fromJSON(content(response, "text"), simplifyVector = FALSE)

characters <- bind_rows(parsed)
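
If you don’t have an API key, you can still try the parsing step offline. This is a minimal sketch that uses a made-up two-record JSON payload; the field names (name, house, role) are assumed from the processing code in this post rather than taken from the API documentation. It shows how a list of records becomes a data frame:

```r
library(dplyr)
library(jsonlite)

# Hypothetical payload standing in for the API response; not real API output
sample_json <- '[
  {"name": "Character A", "house": "Gryffindor", "role": "student"},
  {"name": "Character B", "role": "Professor, Potions"}
]'

# simplifyVector = FALSE keeps each record as a plain named list
parsed_sample <- fromJSON(sample_json, simplifyVector = FALSE)

# bind_rows() stacks the records and fills missing fields (house here) with NA
characters_sample <- bind_rows(parsed_sample)
characters_sample
```

Records that lack a field end up with NA in that column, which is why the processing step later replaces missing houses with "Unknown".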

Next, do some quick data processing (which goes beyond the scope of this talk):

professors <- characters %>% 
  filter(str_detect(role, "[Pp]rofessor") | name == "Albus Dumbledore") %>% 
  mutate(type = "Professors")

students <- characters %>% 
  filter(str_detect(role, "[Ss]tudent"), str_detect(school, "Hogwarts School")) %>% 
  mutate(type = "Students")

hogwarts <- bind_rows(professors, students)

plot_houses_df <- hogwarts %>% 
  count(type, house) %>% 
  replace_na(list(house = "Unknown")) %>% 
  # Order levels by first appearance, then reverse them so the first level plots at the top
  mutate(across(c(type, house), ~ fct_rev(fct_inorder(.x, ordered = TRUE))))

plot_houses_df
## # A tibble: 9 x 3
##   type       house          n
##   <ord>      <ord>      <int>
## 1 Professors Gryffindor     5
## 2 Professors Hufflepuff     1
## 3 Professors Ravenclaw      4
## 4 Professors Slytherin      2
## 5 Professors Unknown        8
## 6 Students   Gryffindor    18
## 7 Students   Hufflepuff     5
## 8 Students   Ravenclaw      9
## 9 Students   Slytherin      7
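
The factor reordering in that last step is the least obvious part, so here is a small standalone sketch of what it does. The houses vector is just an illustration, not the real data:

```r
library(forcats)

houses <- c("Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin", "Unknown")

# Levels in order of first appearance, stored as an ordered factor
in_order <- fct_inorder(houses, ordered = TRUE)
levels(in_order)

# Reversed levels: ggplot draws the first level at the bottom of the y axis,
# so the last level here ("Gryffindor") ends up at the top of the chart
reversed <- fct_rev(in_order)
levels(reversed)
```

Without the reversal, a horizontal bar chart would list the houses from bottom to top.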

The default ggplot settings create multiple alignments, have very little typographic contrast, and have some extra chart junk:

# Non CRAPy default
ggplot(plot_houses_df, aes(x = n, y = house)) +
  geom_barh(stat = "identity") +
  labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  facet_wrap(~ type)

Making stuff CRAPier with theme()

We can fix alignment with some tweaks in theme(), left aligning the strip text and the caption. We can also use scale_x_continuous(expand = c(0, 0)) to get rid of the gap between the axis labels and the start of the bars and add some extra space between the panels with theme(panel.spacing.x = ...). Finally, we don’t need axis titles here because it’s fairly obvious what the axes are. If we needed them, we could change their alignment with theme() too.

# Alignment
ggplot(plot_houses_df, aes(x = n, y = house)) +
  geom_barh(stat = "identity") +
  labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  facet_wrap(~ type) + 
  labs(x = NULL, y = NULL) +
  scale_x_continuous(expand = c(0, 0)) +
  theme(strip.text = element_text(hjust = 0),
        plot.caption = element_text(hjust = 0, 
                                    margin = margin(t = 10)),
        panel.spacing.x = unit(1, "lines"))

We can also get rid of some bits of chart junk like minor gridlines, major y gridlines, axis ticks, etc.

# Chart junk
ggplot(plot_houses_df, aes(x = n, y = house)) +
  geom_barh(stat = "identity") +
  labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  facet_wrap(~ type) + 
  labs(x = NULL, y = NULL) +
  scale_x_continuous(expand = c(0, 0)) +
  theme(strip.text = element_text(hjust = 0),
        plot.caption = element_text(hjust = 0, 
                                    margin = margin(t = 10)),
        panel.spacing.x = unit(1, "lines")) +
  theme(axis.ticks = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank())

All those theme() settings are getting long and unwieldy. We can move them into a new custom theme. Technically, ggplot has theme_replace() and %+replace% for changing theme settings, but I don’t like them as much as creating my own function, which I find more flexible and intuitive.

# Move all these tweaks to a new theme based on theme_bw()
theme_hp <- function() {
  new_theme <- theme_bw() +
    theme(strip.text = element_text(hjust = 0),
          strip.background = element_blank(),
          strip.text.x = element_text(margin = margin(l = 0, t = 5)),
          panel.spacing.x = unit(1, "lines"),
          plot.caption = element_text(hjust = 0, 
                                      margin = margin(t = 10)),
          axis.ticks = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank())
  
  return(new_theme)
}

ggplot(plot_houses_df, aes(x = n, y = house)) +
  geom_barh(stat = "identity") +
  labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  scale_x_continuous(expand = c(0, 0)) +
  labs(x = NULL, y = NULL) +
  facet_wrap(~ type) + 
  theme_hp()

Colors

There are lots of ways to work with colors in ggplot. With more recent versions of ggplot, you can use viridis palettes with scale_color_viridis_c() (for continuous data) or scale_color_viridis_d() (for discrete data). You can use Scientific Colour-Maps with scico. You can also define your own palettes with a named list (so you can access colors like my_colors$red), with your own fancy self-made palettes, or even with a simple vector of colors (like c("grey50", "#0d6217", "#000a90", "#eee117", "#7f0909"), though this is hard because it’s impossible to remember what the hex colors actually are).

# Colors
# via http://thisismyworld2003.blogspot.com/2016/03/hogwarts-house-colors.html
house_colors <- list(Gryffindor = "#7f0909",
                     Hufflepuff = "#eee117",
                     Ravenclaw = "#000a90",
                     Slytherin = "#0d6217")

ggplot(plot_houses_df, aes(x = n, y = house, fill = house)) +
  geom_barh(stat = "identity") +
  labs(x = NULL, y = NULL,
       title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_fill_manual(values = c(Unknown = "grey50", unlist(house_colors)), 
                    guide = "none") +
  facet_wrap(~ type) +
  theme_hp()
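
Here is a quick sketch of how the named list turns into the values that scale_fill_manual() receives. It only uses base R and the colors defined above:

```r
house_colors <- list(Gryffindor = "#7f0909",
                     Hufflepuff = "#eee117",
                     Ravenclaw = "#000a90",
                     Slytherin = "#0d6217")

# unlist() turns the named list into a named character vector
fill_values <- c(Unknown = "grey50", unlist(house_colors))
fill_values

# Individual colors are available by name from either structure
house_colors$Ravenclaw
fill_values[["Ravenclaw"]]
```

Because the vector is named, scale_fill_manual() matches each color to the house with the same name, so the order of the colors doesn’t need to match the order of the factor levels.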

Fonts

So far, the plot has good alignment and color, but it lacks typographic contrast. We can use theme() to change the font on specific plot elements. A few important notes about these element_text() settings:

  • Setting size with rel() makes it easy to deal with font sizes without worrying about the size of the final plot. If you set the size with actual point sizes, the text won’t rescale.
  • Notice how face = "plain" appears in all the options. This is because I’m using light and semibold versions of the font. If I were to use face = "bold" at the same time as family = "Encode Sans Condensed Light", ggplot would try to use the bold version of the light font, which doesn’t exist.
  • Setting font options in the theme will apply to everything in the plot except geom elements. If you use geom_label() or geom_text(), those will use Arial unless you specify geom_text(..., family = "your font here"). You can use update_geom_defaults() to automatically use a custom font in those geoms without having to always repeat your custom font settings.
  • I downloaded Encode Sans Condensed from Google Fonts.

theme_hp <- function() {
  # Make geom_label() and geom_text() use custom fonts by default
  update_geom_defaults("label", list(family = "Encode Sans Condensed"))
  update_geom_defaults("text", list(family = "Encode Sans Condensed"))
  
  new_theme <- theme_bw(base_family = "Encode Sans Condensed") +
    theme(plot.title = element_text(size = rel(1.4), face = "plain",
                                    family = "Encode Sans Condensed SemiBold"),
          plot.subtitle = element_text(size = rel(1), face = "plain",
                                       family = "Encode Sans Condensed Light"),
          plot.caption = element_text(size = rel(0.8), color = "grey50", face = "plain",
                                      family = "Encode Sans Condensed Light",
                                      margin = margin(t = 10), hjust = 0),
          strip.text = element_text(size = rel(1), face = "plain",
                                    family = "Encode Sans Condensed SemiBold", hjust = 0),
          strip.text.x = element_text(margin = margin(l = 0, t = 5)),
          strip.background = element_blank(),
          panel.spacing.x = unit(1, "lines"),
          axis.ticks = element_blank(),
          panel.border = element_blank(),
          panel.grid.major.y = element_blank(),
          panel.grid.minor = element_blank())
  
  return(new_theme)
}
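
As a rough worked example of the rel() sizes used in the theme: they are multipliers applied to the size inherited from the parent element. This sketch assumes the default base_size = 11 of theme_bw() and that the title, subtitle, and caption all inherit from the base text size:

```r
library(ggplot2)

base_size <- 11  # default base_size of theme_bw()

# rel() stores a plain multiplier; unclass() recovers the number
multipliers <- c(title = unclass(rel(1.4)),
                 subtitle = unclass(rel(1)),
                 caption = unclass(rel(0.8)))

# Effective sizes in points: 15.4, 11, and 8.8
multipliers * base_size
```

If you call theme_bw() with a different base_size, all three sizes scale with it, which is the point of using rel() instead of fixed point sizes.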

And here’s that shiny new theme in use (this time I save it to an object for later use):

# Put text annotations in a separate data frame. Without this, the annotations
# show up in both panels, but we only want it in the students panel
text_to_add <- tribble(
  ~x, ~y, ~type,      ~text,
  0.1,  1,  "Students", "We know all students’ houses"
) %>% 
  mutate(type = factor(type, levels = levels(plot_houses_df$type)))

houses_nice <- ggplot(plot_houses_df, aes(x = n, y = house, fill = house)) +
  geom_barh(stat = "identity") +
  geom_label(data = text_to_add, 
             aes(x = x, y = y, label = text), 
             inherit.aes = FALSE, hjust = 0) +
  labs(x = NULL, y = NULL,
       title = "Gryffindor and Ravenclaw dominate Hogwarts",
       subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
       caption = "Source: potterapi.com") +
  scale_x_continuous(expand = c(0, 0)) +
  scale_fill_manual(values = c(Unknown = "grey50", unlist(house_colors)), 
                    guide = "none") +
  facet_wrap(~ type) +
  theme_hp()

houses_nice

Tricky issues with saving

R will yell at you when you save PDFs with custom fonts because by default it can’t embed them. You can get around this by using the Cairo graphics library, which nowadays is installed with R. See this post for full details.

To embed custom fonts in a PDF, use the device = cairo_pdf argument in ggsave():

# Save as PDF with the default device. This will yell at you.
ggsave(houses_nice, filename = "output/houses.pdf", 
       width = 7, height = 4, units = "in")

# Save with the Cairo PDF device so the custom fonts get embedded
ggsave(houses_nice, filename = "output/houses_with_fonts.pdf", 
       width = 7, height = 4, units = "in", device = cairo_pdf)
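
Whether Cairo is available depends on how your copy of R was built, so it can be worth checking before relying on it. This is a minimal sketch, not part of the original talk: it uses a throwaway mtcars plot and a temporary file, and falls back to the standard pdf device when Cairo is missing:

```r
library(ggplot2)

# TRUE when this R build was compiled with Cairo support
has_cairo <- isTRUE(unname(capabilities("cairo")))

# Use cairo_pdf where possible; otherwise fall back to the standard pdf
# device, which may complain about custom fonts
pdf_device <- if (has_cairo) cairo_pdf else "pdf"

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

out_file <- tempfile(fileext = ".pdf")  # temporary path for illustration
ggsave(filename = out_file, plot = p,
       width = 7, height = 4, units = "in", device = pdf_device)
file.exists(out_file)
```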

Using Cairo with PNGs is also helpful, since R’s default PNG engine doesn’t create images with correct dimensions for whatever reason.

# Save as PNG. ggsave() defaults to 300 dpi, but you can also be
# explicit about the resolution
ggsave(houses_nice, filename = "output/houses.png",
       width = 7, height = 4, units = "in")

ggsave(houses_nice, filename = "output/houses_hires.png", 
       width = 7, height = 4, units = "in", dpi = 300)

Here’s what this looks like if you put this in a Word document:

If you save the file with type = "cairo", it’ll create a PNG with the correct dimensions:

# Use Cairo
ggsave(houses_nice, filename = "output/houses_hires_correct.png", 
       width = 7, height = 4, units = "in", dpi = 300, type = "cairo")
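
If you want to check the pixel dimensions of a saved PNG yourself, here is a small sketch that reads them from the file header with base R. The read_png_dims() helper is hypothetical (written for this example), and it only checks pixel counts, not the resolution metadata that some programs use to decide how large to display an image:

```r
library(ggplot2)

# Read pixel width and height from a PNG file's IHDR header using base R
read_png_dims <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  readBin(con, "raw", n = 16)  # skip the 8-byte signature and IHDR length/type
  dims <- readBin(con, "integer", n = 2, size = 4, endian = "big")
  c(width = dims[1], height = dims[2])
}

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
out_file <- tempfile(fileext = ".png")  # temporary path for illustration
ggsave(filename = out_file, plot = p,
       width = 7, height = 4, units = "in", dpi = 300)

# Pixel size should be inches times dpi: 7 * 300 by 4 * 300
read_png_dims(out_file)
c(width = 7, height = 4) * 300
```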


  1. 🙊 o hi byu! 🙊