If you want to learn enough graphic design to be dangerous, get this book:
There are some helpful summary articles about CRAP principles online too:
There are a ton of excellent data visualization books, including two new (free!) books by Kieran Healy and Claus Wilke:
Many people have created many useful tools for selecting the correct chart type for a given dataset or question. Here are some of the best:
library(tidyverse)
library(httr)
library(ggstance)
By default, R graphics don’t really respect CRAP rules. In base R, everything is centered:
plot(mtcars$wt, mtcars$mpg, main = "Here's a title")
Nowadays in ggplot, titles are left aligned, but they used to be centered by default. Even so, there are now multiple alignments: some things are centered, others are left aligned (and the caption, if you add one, is right aligned).
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point() +
labs(title = "Here's a title",
caption = "I'm right aligned")
To get R graphics in a more publishable state, you have to tweak them a lot. It seems daunting at first, but most of the tweaks can be put in a custom theme that you can reuse over and over again.
First, let’s get some data about characters in the Harry Potter series from the Harry Potter API. You’ll need your own API key for this to work.
HP_KEY <- "put-api-key-here"
response <- GET("https://www.potterapi.com/v1/characters/", query = list(key = HP_KEY))
parsed <- jsonlite::fromJSON(content(response, "text"), simplifyVector = FALSE)
characters <- bind_rows(parsed)
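To see what bind_rows() does with the parsed JSON, here is a minimal sketch that skips the API call entirely. It assumes the parsed response is a list with one named list per character. The two records below are invented for illustration and are not real API output.

```r
library(dplyr)

# Hypothetical stand-in for `parsed`: one named list per character.
# These records are made up for illustration, not real API output.
parsed_example <- list(
  list(name = "Character A", house = "Gryffindor", role = "student"),
  list(name = "Character B", role = "Professor, Potions")
)

# bind_rows() stacks the records into one tibble, one row per character.
# Fields missing from a record (here, house for Character B) become NA.
characters_example <- bind_rows(parsed_example)
characters_example
```

Those NA values are why the later processing step replaces missing houses with "Unknown".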
Next, do some quick data processing (which goes beyond the scope of this talk):
professors <- characters %>%
filter(str_detect(role, "[Pp]rofessor") | name == "Albus Dumbledore") %>%
mutate(type = "Professors")
students <- characters %>%
filter(str_detect(role, "[Ss]tudent"), str_detect(school, "Hogwarts School")) %>%
mutate(type = "Students")
hogwarts <- bind_rows(professors, students)
plot_houses_df <- hogwarts %>%
count(type, house) %>%
replace_na(list(house = "Unknown")) %>%
mutate_at(vars(type, house), ~ fct_rev(fct_inorder(., ordered = TRUE)))
plot_houses_df
## # A tibble: 9 x 3
## type house n
## <ord> <ord> <int>
## 1 Professors Gryffindor 5
## 2 Professors Hufflepuff 1
## 3 Professors Ravenclaw 4
## 4 Professors Slytherin 2
## 5 Professors Unknown 8
## 6 Students Gryffindor 18
## 7 Students Hufflepuff 5
## 8 Students Ravenclaw 9
## 9 Students Slytherin 7
The default ggplot settings create multiple alignments, have very little typographic contrast, and have some extra chart junk:
# Non CRAPy default
ggplot(plot_houses_df, aes(x = n, y = house)) +
geom_barh(stat = "identity") +
labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
facet_wrap(~ type) +
theme()
We can fix alignment with some tweaks in theme(), left aligning the strip text and the caption. We can also use scale_x_continuous(expand = c(0, 0)) to get rid of the gap between the axis labels and the start of the bars and add some extra space between the panels with theme(panel.spacing.x = ...). Finally, we don’t need axis titles here because it’s fairly obvious what the axes are. If we needed them, we could change their alignment with theme() too.
# Alignment
ggplot(plot_houses_df, aes(x = n, y = house)) +
geom_barh(stat = "identity") +
labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
facet_wrap(~ type) +
labs(x = NULL, y = NULL) +
scale_x_continuous(expand = c(0, 0)) +
theme(strip.text = element_text(hjust = 0),
plot.caption = element_text(hjust = 0,
margin = margin(t = 10)),
panel.spacing.x = unit(1, "lines"))
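If you did want to keep axis titles, a sketch of aligning them might look like the following. It uses mtcars rather than the Hogwarts data so it runs on its own, and the axis labels and the object name aligned_titles are just placeholders.

```r
library(ggplot2)

# Left align the x-axis title and top align the y-axis title so they
# line up with the left-aligned plot title.
# (The y-axis title is rotated 90 degrees, so hjust = 1 moves it to the top.)
aligned_titles <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Here's a title", x = "Weight", y = "Miles per gallon") +
  theme(axis.title.x = element_text(hjust = 0),
        axis.title.y = element_text(hjust = 1))
aligned_titles
```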
We can also get rid of some bits of chart junk like minor gridlines, major y gridlines, axis ticks, etc.
# Chart junk
ggplot(plot_houses_df, aes(x = n, y = house)) +
geom_barh(stat = "identity") +
labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
facet_wrap(~ type) +
labs(x = NULL, y = NULL) +
scale_x_continuous(expand = c(0, 0)) +
theme(strip.text = element_text(hjust = 0),
plot.caption = element_text(hjust = 0,
margin = margin(t = 10)),
panel.spacing.x = unit(1, "lines")) +
theme(axis.ticks = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank())
All those theme() settings are getting long and unwieldy. We can move them into a new custom theme. Technically, ggplot has theme_replace() and %+replace% for changing theme settings, but I don’t like them as much as creating my own function, which I find more flexible and intuitive.
# Move all these tweaks to a new theme based on theme_bw()
theme_hp <- function() {
new_theme <- theme_bw() +
theme(strip.text = element_text(hjust = 0),
strip.background = element_blank(),
strip.text.x = element_text(margin = margin(l = 0, t = 5)),
panel.spacing.x = unit(1, "lines"),
plot.caption = element_text(hjust = 0,
margin = margin(t = 10)),
axis.ticks = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank())
return(new_theme)
}
ggplot(plot_houses_df, aes(x = n, y = house)) +
geom_barh(stat = "identity") +
labs(title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
scale_x_continuous(expand = c(0, 0)) +
labs(x = NULL, y = NULL) +
facet_wrap(~ type) +
theme_hp()
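For comparison, here is a small sketch of how + and %+replace% differ when they modify a theme element. The object names added and replaced are only for illustration.

```r
library(ggplot2)

# `+` merges the new settings into the existing element, so properties
# you don't mention (like size) are kept from theme_bw().
added <- theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))

# `%+replace%` swaps out the entire element, so properties you don't
# mention are reset to NULL.
replaced <- theme_bw() %+replace%
  theme(plot.title = element_text(hjust = 0.5))

added$plot.title$size     # still set, carried over from theme_bw()
replaced$plot.title$size  # NULL
```

With + inside a theme function, you only have to list the settings you actually want to change.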
There are lots of ways to work with colors in ggplot. With more recent versions of ggplot, you can use viridis palettes with scale_color_viridis_c() (for continuous data) or scale_color_viridis_d() (for discrete data). You can use Scientific Colour-Maps with scico. You can also define your own palettes with a named list (so you can access colors like my_colors$red), with your own fancy self-made palettes, or even with a simple vector of colors (like c("grey50", "#0d6217", "#000a90", "#eee117", "#7f0909"), though this is hard because it’s impossible to remember what the hex colors actually are).
# Colors
# via http://thisismyworld2003.blogspot.com/2016/03/hogwarts-house-colors.html
house_colors <- list(Gryffindor = "#7f0909",
Hufflepuff = "#eee117",
Ravenclaw = "#000a90",
Slytherin = "#0d6217")
ggplot(plot_houses_df, aes(x = n, y = house, fill = house)) +
geom_barh(stat = "identity") +
labs(x = NULL, y = NULL,
title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
scale_x_continuous(expand = c(0, 0)) +
scale_fill_manual(values = c(Unknown = "grey50", unlist(house_colors)),
guide = "none") +
facet_wrap(~ type) +
theme_hp()
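The named-list approach can be sketched on its own, without any plotting. The name fill_values is just a placeholder for this example.

```r
# A named list lets you pull out one color by name with `$`
house_colors <- list(Gryffindor = "#7f0909",
                     Hufflepuff = "#eee117",
                     Ravenclaw = "#000a90",
                     Slytherin = "#0d6217")
house_colors$Ravenclaw

# unlist() turns the list into a named character vector, which is the
# form passed to scale_fill_manual(values = ...) above. The extra
# "Unknown" entry is the grey used for characters without a house.
fill_values <- c(Unknown = "grey50", unlist(house_colors))
fill_values
```

Because the vector is named, scale_fill_manual() matches colors to houses by name rather than by position, so the order of the entries doesn't matter.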
So far, the plot has good alignment and color, but it lacks typographic contrast. We can use theme() to change the font on specific plot elements. A few important notes about these element_text() settings:
- rel() makes it easy to deal with font sizes without worrying about the size of the final plot. If you set the size with actual point sizes, the text won’t rescale.
- face = "plain" is set in all the options. This is because I’m using light and semibold versions of the font. If I were to use face = "bold" at the same time as family = "Encode Sans Condensed Light", ggplot would try to use the bold version of the light font, which doesn’t exist.
- If you use geom_label() or geom_text(), those will use Arial unless you specify geom_text(..., family = "your font here"). You can use update_geom_defaults() to automatically use a custom font in those geoms without having to always repeat your custom font settings.
theme_hp <- function() {
# Make geom_label() and geom_text() use custom fonts by default
update_geom_defaults("label", list(family = "Encode Sans Condensed"))
update_geom_defaults("text", list(family = "Encode Sans Condensed"))
new_theme <- theme_bw(base_family = "Encode Sans Condensed") +
theme(plot.title = element_text(size = rel(1.4), face = "plain",
family = "Encode Sans Condensed SemiBold"),
plot.subtitle = element_text(size = rel(1), face = "plain",
family = "Encode Sans Condensed Light"),
plot.caption = element_text(size = rel(0.8), color = "grey50", face = "plain",
family = "Encode Sans Condensed Light",
margin = margin(t = 10), hjust = 0),
strip.text = element_text(size = rel(1), face = "plain",
family = "Encode Sans Condensed SemiBold", hjust = 0),
strip.text.x = element_text(margin = margin(l = 0, t = 5)),
strip.background = element_blank(),
panel.spacing.x = unit(1, "lines"),
axis.ticks = element_blank(),
panel.border = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank())
return(new_theme)
}
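One way to see what rel() buys you is to compare computed sizes at different base font sizes. This is only a sketch of that idea, and it assumes that "rescale" refers to changing the theme's base size. It uses the stock theme_bw() so no custom fonts are needed.

```r
library(ggplot2)

# theme_bw() sets the plot title size with rel(), so the final size is
# computed from the theme's base font size.
title_small <- calc_element("plot.title", theme_bw(base_size = 11))$size
title_large <- calc_element("plot.title", theme_bw(base_size = 22))$size

# Doubling the base size doubles the title size
title_large / title_small

# A fixed point size ignores the base size entirely
fixed <- theme_bw(base_size = 22) +
  theme(plot.title = element_text(size = 14))
calc_element("plot.title", fixed)$size
```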
And here’s that shiny new theme in use (this time I save it to an object for later use):
# Put text annotations in a separate data frame. Without this, the annotations
# show up in both panels, but we only want it in the students panel
text_to_add <- tribble(
~x, ~y, ~type, ~text,
0.1, 1, "Students", "We know all students’ houses"
) %>%
mutate(type = factor(type, levels = levels(plot_houses_df$type)))
houses_nice <- ggplot(plot_houses_df, aes(x = n, y = house, fill = house)) +
geom_barh(stat = "identity") +
geom_label(data = text_to_add,
aes(x = x, y = y, label = text),
inherit.aes = FALSE, hjust = 0) +
labs(x = NULL, y = NULL,
title = "Gryffindor and Ravenclaw dominate Hogwarts",
subtitle = "At least according to J.K. Rowling, whose account is clearly biased",
caption = "Source: potterapi.com") +
scale_x_continuous(expand = c(0, 0)) +
scale_fill_manual(values = c(Unknown = "grey50", unlist(house_colors)),
guide = "none") +
facet_wrap(~ type) +
theme_hp()
houses_nice
R will yell at you when you save PDFs with custom fonts because by default it can’t embed them. You can get around this by using the Cairo graphics library, which nowadays is installed with R. See this post for full details.
To embed custom fonts in a PDF, use the device = cairo_pdf argument in ggsave():
# Save as PDF. This will yell at you.
ggsave(houses_nice, filename = "output/houses.pdf",
width = 7, height = 4, units = "in")
ggsave(houses_nice, filename = "output/houses_with_fonts.pdf",
width = 7, height = 4, units = "in", device = cairo_pdf)
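If you're not sure whether your copy of R was built with Cairo, a defensive sketch like this checks first and falls back to the regular PDF device. The plot, the temporary file path, and the names example_plot, pdf_device, and out_file are all placeholders for this example.

```r
library(ggplot2)

example_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# capabilities("cairo") reports whether this R build has Cairo support.
# Fall back to the default PDF device (no font embedding) if it does not.
pdf_device <- if (capabilities("cairo")) cairo_pdf else "pdf"

# Save to a temporary file so the example doesn't need an output/ folder
out_file <- file.path(tempdir(), "example.pdf")
ggsave(example_plot, filename = out_file,
       width = 7, height = 4, units = "in", device = pdf_device)
file.exists(out_file)
```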
Using Cairo with PNGs is also helpful, since R’s default PNG engine doesn’t create images with correct dimensions for whatever reason.
# Save as PNG. ggsave() defaults to 300 dpi, but you can also be
# explicit about the resolution
ggsave(houses_nice, filename = "output/houses.png",
width = 7, height = 4, units = "in")
ggsave(houses_nice, filename = "output/houses_hires.png",
width = 7, height = 4, units = "in", dpi = 300)
Here’s what this looks like if you put this in a Word document:
If you save the file with type = "cairo", it’ll create a PNG with the correct dimensions:
# Use Cairo
ggsave(houses_nice, filename = "output/houses_hires_correct.png",
width = 7, height = 4, units = "in", dpi = 300, type = "cairo")
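As a quick sanity check, the pixel dimensions you should expect are just inches times dpi. This sketch assumes that is what "correct dimensions" means here, and expected_pixels() is a made-up helper, not part of ggplot.

```r
# Expected pixel dimensions for an image saved at a given size and resolution
expected_pixels <- function(width_in, height_in, dpi) {
  c(width = width_in * dpi, height = height_in * dpi)
}

# A 7 x 4 inch image at 300 dpi should be 2100 x 1200 pixels
expected_pixels(7, 4, 300)
```

You can compare those numbers with what your image viewer reports for the saved file.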