The Spaghetti plot

A collection of common dataviz caveats by Data-to-Viz.com




A Spaghetti plot is a line plot with many lines displayed together. With more than a few (~5?) groups this kind of graphic gets really hard to read, and thus provides little insight about the data. Let’s make an example with the evolution of baby names in the US from 1880 to 2015.

It is very hard to follow a line to understand the evolution of a specific name’s popularity. Plus, even if you manage to follow a line, you then need to link it with the legend which is even harder. Let’s try to find a few workarounds to improve this graphic.

#Target a specific group *** Let’s say you plot many groups, but the actual reason for that is to explain the feature of one particular group compared to the others. Then a good workaround is to highlight this group: make it appear different, and give it a proper annotation. Here, the evolution of Amanda’s popularity is obvious. Leaving the other lines is important since it allows you to compare Amanda to all other names.

#Use small multiples *** Area charts can be used to give a more general overview of the dataset, especially when used in combination with small multiples. In the following chart, it is easy to get a glimpse of the evolution of any name:

For instance, Linda was a really popular name for a really short period of time. On another hand, Ida has never been very popular, but was used a little during several decades.

#Combine approaches *** Note that if you want to compare the evolution of each line compared to the others, you can combine both approaches:

#Going further ***

#Comments *** Any thoughts on this? Found any mistake? Disagree? Please drop me a word on twitter or in the comment section below:

 

A work by Yan Holtz for data-to-viz.com