The issue with pie chart

A collection of common dataviz caveats by Data-to-Viz.com




Bad by definition


A pie chart is a circle divided into sectors that each represent a proportion of the whole. It is often used to show percentage, where the sum of the sectors equals 100%.



The problem is that humans are pretty bad at reading angles. In the adjacent pie chart, try to figure out which group is the biggest one and try to order them by value. You will probably struggle to do so and this is why pie charts must be avoided.

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(patchwork)

# create 3 data frame:
data1 <- data.frame( name=letters[1:5], value=c(17,18,20,22,24) )
data2 <- data.frame( name=letters[1:5], value=c(20,18,21,20,20) )
data3 <- data.frame( name=letters[1:5], value=c(24,23,21,19,18) )

# Plot
plot_pie <- function(data, vec){

ggplot(data, aes(x="name", y=value, fill=name)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start=0, direction = -1) +
  scale_fill_viridis(discrete = TRUE,  direction=-1) + 
  geom_text(aes(y = vec, label = rev(name), size=4, color=c( "white", rep("black", 4)))) +
  scale_color_manual(values=c("black", "white")) +
  theme_ipsum() +
  theme(
    legend.position="none",
    plot.title = element_text(size=14),
    panel.grid = element_blank(),
    axis.text = element_blank(),
    legend.margin=unit(0, "null")
  ) +
  xlab("") +
  ylab("")
  
}

plot_pie(data1, c(10,35,55,75,93))

If you’re still not convinced, let’s try to compare several pie plots. Once more, try to understand which group has the highest value in these 3 graphics. Also, try to figure out what is the evolution of the value among groups.

a <- plot_pie(data1, c(10,35,55,75,93))
b <- plot_pie(data2, c(10,35,53,75,93))
c <- plot_pie(data3, c(10,29,50,75,93))
a + b + c

Now, let’s represent exactly the same data using a barplot:

# A function to make barplots
plot_bar <- function(data){
  ggplot(data, aes(x=name, y=value, fill=name)) +
    geom_bar( stat = "identity") +
    scale_fill_viridis(discrete = TRUE, direction=-1) + 
    scale_color_manual(values=c("black", "white")) +
    theme_ipsum() +
    theme(
      legend.position="none",
      plot.title = element_text(size=14),
      panel.grid = element_blank(),
    ) +
    ylim(0,25) +
    xlab("") +
    ylab("")
}

# Make 3 barplots
a <- plot_bar(data1)
b <- plot_bar(data2)
c <- plot_bar(data3)

# Put them together with patchwork
a + b + c

As you can see on this barplot, there is a heavy difference between the three pie plots with a hidden pattern that you definitely don’t want to miss when you tell your story.

And often made even worse


Even if pie charts are bad by definition, it is still possible to make them even worse by adding other bad features:

Alternatives


The barplot is the best alternative to pie plots. If you have many values to display, you can also consider a lollipop plot that is a bit more elegant in my opinion. Here is an example based on the amount of weapons sold by a few countries in the world:

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/7_OneCatOneNum.csv", header=TRUE, sep=",")

# plot
data %>%
  filter(!is.na(Value)) %>%
  arrange(Value) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=Value) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=Value), color="grey") +
    geom_point(size=3, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("")

Another possibility would be to create a treemap if your goal is to describe what the whole is composed of.

# Package
library(treemap)

# Plot
treemap(data,
            
            # data
            index="Country",
            vSize="Value",
            type="index",
            
            # Main
            title="",
            palette="Dark2",

            # Borders:
            border.col=c("black"),             
            border.lwds=1,                         
        
            # Labels
            fontsize.labels=0.5,
            fontcolor.labels="white",
            fontface.labels=1,            
            bg.labels=c("transparent"),              
            align.labels=c("left", "top"),                                  
            overlap.labels=0.5,
            inflate.labels=T                        # If true, labels are bigger when rectangle is bigger.

            
            )

Going further


Comments


Any thoughts on this? Found any mistake? Disagree? Please drop me a word on twitter or in the comment section below:

 

A work by Yan Holtz for data-to-viz.com