Stacked Area Graph

Definition

A `stacked area chart` is the extension of a basic area chart. It displays the evolution of the value of several groups on the same graphic. The values of each group are displayed on top of each other, what allows to check on the same figure the evolution of both the total of a numeric variable, and the importance of each group.

The following example shows the evolution of baby name frequencies in the US between 1880 and 2015.

``````# Libraries
library(tidyverse)
library(babynames)
library(streamgraph)
library(viridis)
library(hrbrthemes)
library(plotly)

data <- babynames %>%
filter(name %in% c("Ashley", "Amanda", "Jessica",    "Patricia", "Linda", "Deborah",   "Dorothy", "Betty", "Helen")) %>%
filter(sex=="F")

# Plot
p <- data %>%
ggplot( aes(x=year, y=n, fill=name, text=name)) +
geom_area( ) +
scale_fill_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
theme(legend.position="none")
ggplotly(p, tooltip="text")``````

Note: This graphic does not have a legend since it is interactive. Hover a group to get its name. The dataset is available through the babynames R library and a `.csv` version is available on github.

What for

The efficiency of stacked area graph is discussed and it must be used with care. To put it in a nutshell:

• stacked area graph are `appropriate` to study the evolution of the `whole` and the `relative proportions` of each group. Indeed, the top of the areas allows to visualize how the whole behaves, like for a classic area chart.

• however they are not appropriate to study the evolution of each `individual group`: it is very hard to substract the height of other groups at each time point. For a more accurate but less attractive figure, consider a line chart or area chart using small multiple.

This website dedicates a whole page about stacking and its potential pitfalls, visit it to go further.

Variation

A variation of the stacked area graph is the `percent stacked area graph`. It is the same thing but value of each group are normalized at each time stamp. That allows to study the percentage of each group in the whole more efficiently:

``````p <- data %>%
# Compute the proportions:
group_by(year) %>%
mutate(freq = n / sum(n)) %>%
ungroup() %>%

# Plot
ggplot( aes(x=year, y=freq, fill=name, color=name, text=name)) +
geom_area(  ) +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
theme(legend.position="none") +
ggtitle("Popularity of American names in the previous 30 years") +
theme_ipsum() +
theme(legend.position="none")
ggplotly(p, tooltip="text")``````

Common caveats

• Use it with care, try using small multiple with area chart or line chart instead.
• The group order (from bottom to top) can have an influence, try several orders.

Related

The R and Python graph galleries are 2 websites providing hundreds of chart example, always providing the reproducible code. Click the button below to see how to build the chart you need with your favorite programing language.