Mental arithmetic in dataviz

A collection of common dataviz caveats by Data-to-Viz.com




Example


Let’s consider the number of people entering (red curve) and leaving (blue curve) a shop from 8am to 8pm, counted every half hour. The line plot below is an accurate representation, and it clearly answers the question of how many people are entering or leaving the shop at any given time.


# Libraries
library(tidyverse)
library(hrbrthemes)

# Create data
data <- data.frame(
  x = seq(8,20,0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

# Reshape to long format (one row per time slot and flow type), then plot
data %>%
  pivot_longer(-x, names_to = "type", values_to = "value") %>%
  ggplot(aes(x = x, y = value, color = type)) +
    geom_line() +
    ylim(0, 40) +
    scale_color_discrete(name = "") +
    scale_x_continuous(breaks = seq(8, 20, 1)) +
    # Markers A, B and C are referred to in the text
    annotate("text", x = c(12.5, 16.3, 17.5), y = c(39, 27, 31), label = LETTERS[1:3]) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank(),
      legend.position = c(0.9, 0.9)
    ) +
    ylab("# of people") +
    xlab("Hour of day")

Now, what if somebody asks you how many people are inside the shop at a given time, say at markers A, B, or C?

Mental arithmetic


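To answer this kind of question, your audience must subtract the blue curve from the red one at every time step and add up the results in their head. That is hard work, and most readers will get confused.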
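To make that effort concrete, here is a minimal sketch of the arithmetic a reader would need to do for marker A (12:30). It assumes that each value counts the people who entered or left during the half-hour slot labelled by `x`, and that the shop is empty before 8am; neither assumption is stated in the data. The names `shop`, `upto`, and `inside_at_A` are only illustrative.

```r
# Same values as in the example above
shop <- data.frame(
  x = seq(8, 20, 0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

# Assumption: one row per half-hour slot, shop empty before the first slot
upto <- shop$x <= 12.5                 # every slot up to marker A

entered <- sum(shop$Entering[upto])    # 271 people came in
left    <- sum(shop$Leaving[upto])     # 104 people went out
inside_at_A <- entered - left
inside_at_A                            # 167
```

That is ten values to read and add for each curve, plus one subtraction, for a single marker.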

Instead of forcing the reader to make the calculation, it is probably better to represent the number of people in the shop directly:

# Compute the number of people in the shop, then plot it
data %>%
  mutate(
    difference = Entering - Leaving,  # net flow in each half-hour slot
    tot = cumsum(difference)          # running total = people in the shop
  ) %>%
  ggplot(aes(x = x, y = tot)) +
    geom_line() +
    # Same markers as above, placed just below the curve
    annotate("text", x = c(12.5, 16.3, 17.5), y = c(155, 212, 192), label = LETTERS[1:3]) +
    scale_x_continuous(breaks = seq(8, 20, 1)) +
    theme_ipsum() +
    theme(
      panel.grid.minor = element_blank()
    ) +
    ylab("# of people") +
    xlab("Hour of day")

Of course, once more people leave the shop than enter it, the total starts decreasing (marker B). The point is that if you want your audience to focus on your message, you should not give them extra work.
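The turning point around marker B can also be checked numerically. The sketch below, in base R, computes the running total as a plain cumulative sum of `Entering - Leaving` and looks up where it peaks. It assumes half-hour slot counts and an empty shop before 8am; `shop`, `net`, and `inside` are illustrative names.

```r
# Same values as in the example above
shop <- data.frame(
  x = seq(8, 20, 0.5),
  Entering = c(20,22,19,24,28,29,26,32,34,37,33,34,30,28,29,30,27,21,19,21,17,13,15,12,9),
  Leaving = c(0,4,8,7,10,13,15,16,15,16,17,19,22,21,24,26,24,25,28,29,28,26,23,20,19)
)

net    <- shop$Entering - shop$Leaving   # net flow per half-hour slot
inside <- cumsum(net)                    # people in the shop after each slot

peak_time     <- shop$x[which.max(inside)]   # slot with the highest occupancy
first_outflow <- shop$x[which(net < 0)[1]]   # first slot where more leave than enter

peak_time       # 16
first_outflow   # 16.5
```

Under these assumptions the occupancy peaks at 16:00 and the net flow first turns negative at 16:30, which brackets the crossing of the two curves at marker B.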

Link with stacking


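This is closely related to the problem of [stacking], where the reader must also subtract one boundary from another to read the value of a single group.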
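As a rough illustration of that subtraction, the sketch below uses made-up numbers: the `stacked` data frame and the groups `a` and `b` are hypothetical and do not come from the shop example.

```r
# Hypothetical values for two groups over four time steps
stacked <- data.frame(
  time = 1:4,
  a = c(10, 12, 15, 11),
  b = c(5, 9, 4, 8)
)

# In a stacked chart, group b is drawn on top of group a,
# so the upper boundary the reader sees is a + b
upper <- stacked$a + stacked$b
upper            # 15 21 19 19

# To recover b, the reader must subtract the lower boundary from the upper one
b_read <- upper - stacked$a
b_read           # 5 9 4 8
```

Between time steps 3 and 4 the upper boundary stays flat at 19 while `b` doubles from 4 to 8, so the shape of the top curve alone says little about the group drawn on top.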

Going further


A work by Yan Holtz for data-to-viz.com