Don’t use colors if they communicate nothing

A collection of common dataviz caveats by Data-to-Viz.com




Colors are one of the main mediums used to convey information in a dataviz. They allow us to highlight groups or variation when used appropriately, but can be confusing or misleading otherwise.

Here is an example showing the quantity of weapons exported by the 20 largest exporters in 2017:

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/7_OneCatOneNum.csv", header=TRUE, sep=",")

# create random color palette
mycolors <- colors()[sample(1:400, nrow(data))]

# Barplot
data %>%
  filter(!is.na(Value)) %>%
  arrange(Value) %>%
  tail(20) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=Value, fill=Country) ) +
    geom_bar(stat="identity") +
    scale_fill_manual( values = mycolors ) +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Weapon quantity (SIPRI trend-indicator value)")

Here, each country is drawn in a random color. Readers will likely try to work out what the colors mean, which distracts them from the main point: the bar length.

Color: only if it adds information


Here is the same graphic using a single color for all countries. In my opinion, it conveys the information better:

# Barplot
data %>%
  filter(!is.na(Value)) %>%
  arrange(Value) %>%
  tail(20) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=Value) ) +
    geom_bar(stat="identity", fill="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Weapon quantity (SIPRI trend-indicator value)")

Note that you can also map color to the value itself, so that it reinforces the variation already shown by the bar length. I’m personally not a big fan of it, but it is a common practice:

# Barplot
data %>%
  filter(!is.na(Value)) %>%
  arrange(Value) %>%
  tail(20) %>%
  mutate(Country=factor(Country, Country)) %>%
  ggplot( aes(x=Country, y=Value, fill=Value) ) +
    geom_bar(stat="identity") +
    scale_fill_viridis() +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("") +
    ylab("Weapon quantity (SIPRI trend-indicator value)")

To bear in mind


When adding color to your chart, double check that the colors add insight. Colors are often useful to:

  • Show groups
  • Highlight an item
  • Show gradient
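
As a rough sketch of the "highlight an item" case, the snippet below draws one bar in an accent color and leaves the others grey. The data frame `toy`, its values, and the choice of item "D" are invented for illustration and are not part of the weapons dataset; only ggplot2 is assumed.

```r
# Libraries
library(ggplot2)

# Toy data, invented for illustration only (not the SIPRI dataset)
toy <- data.frame(
  Item  = c("A", "B", "C", "D", "E"),
  Value = c(12, 30, 7, 22, 15)
)

# Order bars by value and flag the single item to highlight
toy <- toy[order(toy$Value), ]
toy$Item <- factor(toy$Item, levels = toy$Item)
toy$Highlight <- ifelse(toy$Item == "D", "yes", "no")

# Barplot: grey for context, one accent color for the highlighted item
p <- ggplot(toy, aes(x = Item, y = Value, fill = Highlight)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("no" = "grey70", "yes" = "#69b3a2")) +
  coord_flip() +
  theme(legend.position = "none") +
  xlab("") +
  ylab("Value (made-up units)")

print(p)
```

Here the color carries exactly one piece of information (which bar to look at), so the reader does not have to decode anything else.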

Going further


  • What to consider when choosing colors for data visualization, by Lisa Charlotte Rost on the Datawrapper blog.



A work by Yan Holtz for data-to-viz.com