Mental arithmetic in dataviz

A collection of common dataviz caveats by Data-to-Viz.com




Example


Let’s consider the number of people entering (red curve) and leaving (blue curve) a shop from 8am to 8pm, counted every half hour. A line plot represents this accurately, and it answers very well the question of how many people are entering or leaving the shop at a given time.


# Libraries
library(tidyverse)
library(hrbrthemes)

# Create data
data <- data.frame(
  x = seq(8,20,0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

# Reshape to long format (one row per time and curve), then plot
data %>%
  pivot_longer( -x, names_to="type", values_to="value") %>%
  ggplot( aes(x=x, y=value, color=type)) +
    geom_line() +
    ylim(0,40) +
    scale_color_discrete(name="") +
    scale_x_continuous(breaks=seq(8,20,1)) +
    annotate( "text", x=c(12.5, 16.3, 17.5), y=c(39, 27, 31), label=LETTERS[1:3] ) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank(),
      legend.position = c(0.9, 0.9)
    ) +
    ylab("# of people") +
    xlab("Hour of day")

Now, what if somebody asks you:

  • what is the trend of the total number of people in the shop?
  • at what time does the number of people in the shop start declining?

Mental arithmetic


To answer these questions, your audience must combine the two curves in their head and will probably be confused.

  • Is it at marker A, where the number of people entering the shop starts decreasing?
  • Or at marker B, where more people start leaving than entering?
  • Or at marker C, where the number of people leaving starts decreasing?
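
Each of these candidate answers corresponds to a different computation on the data. A minimal sketch in base R, with the vectors copied from the example above and illustrative variable names:

```r
# Same values as in the example data
x        <- seq(8, 20, 0.5)
entering <- c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9)
leaving  <- c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)

# Marker A: time at which entries peak
marker_a <- x[which.max(entering)]            # 12.5

# Marker B: first half-hour slot in which more people leave than enter
marker_b <- x[which(leaving > entering)[1]]   # 16.5

# Marker C: time at which departures peak
marker_c <- x[which.max(leaving)]             # 17.5
```

The three candidates span five hours, which is why reading the answer off the two curves is error-prone.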

Instead of forcing the reader to make the calculation, it is probably better to represent the number of people in the shop directly:

# Net flow per half hour, accumulated into the number of people in the shop
data %>%
  mutate(difference = Entering - Leaving) %>%
  mutate(tot = cumsum(difference)) %>%
  ggplot( aes(x=x, y=tot)) +
    geom_line() +
    annotate( "text", x=c(12.5, 16.3, 17.5), y=c(155, 212, 190), label=LETTERS[1:3] ) +
    scale_x_continuous(breaks=seq(8,20,1)) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank()
    ) +
    ylab("# of people") +
    xlab("Hour of day")

Of course, the number of people in the shop starts decreasing as soon as more people leave than enter (marker B). But if you want your audience to focus on your point, do not give them extra work.
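
The arithmetic the reader would otherwise do mentally can be checked numerically. A minimal sketch in base R, assuming the shop is empty before 8am and that each value counts people during one half-hour slot (both are assumptions, not stated with the data); the variable names are illustrative:

```r
# Same values as in the example data
x        <- seq(8, 20, 0.5)
entering <- c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9)
leaving  <- c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)

# Net flow per slot, then running total of people in the shop
net       <- entering - leaving
occupancy <- cumsum(net)

# Time and size of the peak
peak_time  <- x[which.max(occupancy)]   # 16
peak_count <- max(occupancy)            # 225

# First slot in which the running total goes down
first_decline <- x[which(net < 0)[1]]   # 16.5
```

Under these assumptions the shop is fullest at 16:00 and starts emptying in the following half hour, which is where marker B sits.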





A work by Yan Holtz for data-to-viz.com