Why you should order your data

A collection of common dataviz caveats by Data-to-Viz.com




By default, most data visualization tools order the groups of a categorical variable alphabetically, or in the order they appear in your input table. It is good practice to think about this order, since changing it can add a lot of insight to your graphic.
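To see where the alphabetical default comes from in R, here is a minimal sketch using base R only. The country names are toy input chosen for illustration, not the dataset used below. It shows that `factor()` sorts its levels alphabetically unless you pass `levels` yourself, and ggplot2 draws a discrete axis in the order of those levels.

```r
# Toy country names in arbitrary input order (illustrative, not the article's dataset)
countries <- c("Russia", "France", "United States", "Germany")

# factor() sorts the unique values, so levels are alphabetical by default
default_levels <- levels(factor(countries))
print(default_levels)

# Passing levels explicitly keeps the order of appearance instead
appearance_levels <- levels(factor(countries, levels = unique(countries)))
print(appearance_levels)
```

Either way, the order is a choice you can control through the factor levels, which is what the rest of this page relies on.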

Unordered lollipop plot


Let’s start with a lollipop plot showing the quantity of weapons sold by a few countries. Each row represents a country, and the X-axis shows how many weapons were sold in 2017. Countries appear in alphabetical order by default.

# Libraries
library(tidyverse)
library(hrbrthemes)
library(kableExtra)
options(knitr.table.format = "html")

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/7_OneCatOneNum.csv", header=TRUE, sep=",")

# Plot 
data %>%
  filter(!is.na(Value)) %>%
  ggplot( aes(x=Country, y=Value) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=Value), color="grey") +
    geom_point(size=3, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("")

It is quite obvious that the US and Russia sell many more weapons than the other countries. However, it is hard to see the differences among the remaining countries: the reader has to jump back and forth between them to compare values. This is a lot of work and will likely cost your graphic the reader's attention.

Reorder it


Instead, let’s make the exact same chart, but reorder the groups by their value:

data %>%
  filter(!is.na(Value)) %>%
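  # Sort rows by increasing value, then freeze that order as the factor levels
  # (this relies on each country appearing only once in the table)
  arrange(Value) %>%
  mutate(Country=factor(Country, levels=Country)) %>%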
  ggplot( aes(x=Country, y=Value) ) +
    geom_segment( aes(x=Country ,xend=Country, y=0, yend=Value), color="grey") +
    geom_point(size=3, color="#69b3a2") +
    coord_flip() +
    theme_ipsum() +
    theme(
      panel.grid.minor.y = element_blank(),
      panel.grid.major.y = element_blank(),
      legend.position="none"
    ) +
    xlab("")

The figure is now much more insightful: France is the third biggest exporting country, followed by Germany, Israel and the UK. Note that it would also make sense to normalize this graphic by the population of each country to get more comparable data.
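The reordering step above can also be written without sorting the table first. The sketch below uses base R's `reorder()` on a toy data frame; the country labels and values are made up for illustration and are not the article's data. In a tidyverse pipeline, `fct_reorder()` from forcats plays a similar role.

```r
# Toy data with made-up values (hypothetical, not the article's dataset)
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Value   = c(30, 5, 120, 60)
)

# reorder() returns a factor whose levels are sorted by Value (ascending)
toy$Country <- reorder(toy$Country, toy$Value)
ordered_levels <- levels(toy$Country)
print(ordered_levels)

# Negate the value to get a descending order instead
desc_levels <- levels(reorder(toy$Country, -toy$Value))
print(desc_levels)
```

Because the ordering lives in the factor levels and not in the row order, the same column can be passed straight to `aes(x=Country, ...)` and the plot follows the new order.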

Conclusion


Reordering your data is an easy step you should always consider when building a chart. Sometimes the order of groups is dictated by what they are rather than by their values, as with the months of the year, but it is always worth thinking about.
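For groups with a natural order such as months, one possible approach is to set the factor levels explicitly instead of sorting by value. This is a minimal base R sketch on toy input; `month.name` is R's built-in vector of English month names.

```r
# Month labels in arbitrary order (toy input)
months_seen <- c("March", "January", "December", "July")

# Default levels are alphabetical, which scrambles the calendar
alphabetical <- levels(factor(months_seen))
print(alphabetical)

# Use month.name as the levels to get calendar order,
# then drop the months that do not occur in the data
calendar <- droplevels(factor(months_seen, levels = month.name))
calendar_levels <- levels(calendar)
print(calendar_levels)
```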



A work by Yan Holtz for data-to-viz.com