Heatmap


Definition


A heatmap is a graphical representation of data where the individual values contained in a matrix are represented as colors. It is a bit like looking at a data table from above.
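To make the value-to-color idea concrete, here is a minimal base R sketch on a small made-up 3 x 4 matrix. The values and the row and column names are purely illustrative. Each value is assigned to one of five equal-width bins, and each bin is one color of a sequential palette. `heatmap()` from the stats package then draws the grid. Setting `Rowv = NA, Colv = NA` turns off the dendrograms, so rows and columns keep their original order.

```r
# Toy 3 x 4 matrix with made-up values (illustration only)
m <- matrix(1:12, nrow = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3", "c4")))

# A sequential palette of 5 colors (hcl.colors() is available in R >= 3.6)
pal <- hcl.colors(5)

# Assign each value to one of 5 equal-width bins, i.e. to one color.
# cut() returns a vector in column-major order, so refill the matrix column-wise.
bins <- cut(m, breaks = 5, labels = FALSE)
cell_colors <- matrix(pal[bins], nrow = nrow(m), dimnames = dimnames(m))

# Draw the matrix as a grid of colored cells, without reordering or scaling
heatmap(m, Rowv = NA, Colv = NA, scale = "none", col = pal)
```

Equal-width bins are the simplest choice. Quantile-based breaks are a common alternative when the values are strongly skewed.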

Here is an example showing 8 general features, such as population or life expectancy, for about 30 countries in 2015. The data come from the French National Institute of Demographic Studies.

# Libraries
library(tidyverse)  # dplyr verbs and the %>% pipe, plus ggplot2's theme()
library(heatmaply)  # interactive heatmaps built on plotly
# Optional alternative, only used in the commented-out line further down.
# d3heatmap can be found here: https://github.com/talgalili/d3heatmap
# library(d3heatmap)

# Load data
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/multivariate.csv", header = TRUE, sep = ";")
# read.table() replaces spaces in column names with dots: turn them back into spaces
colnames(data) <- gsub("\\.", " ", colnames(data))

# Select a few countries
data <- data %>%
  filter(Country %in% c("France", "Sweden", "Italy", "Spain", "England", "Portugal", "Greece", "Peru", "Chile", "Brazil", "Argentina", "Bolivia", "Venezuela", "Australia", "New Zealand", "Fiji", "China", "India", "Thailand", "Afghanistan", "Bangladesh", "United States of America", "Canada", "Burundi", "Angola", "Kenya", "Togo")) %>%
  arrange(Country) %>%
  mutate(Country = factor(Country, Country))

# Matrix format: one row per country, one column per numeric feature
mat <- data %>%
  dplyr::select(-Country, -Group, -Continent) %>%
  as.matrix()
rownames(mat) <- as.character(data$Country)

# Heatmap
# d3heatmap(mat, scale = "column", dendrogram = "none", width = "800px", height = "800px", colors = "Blues")

p <- heatmaply(mat,
        dendrogram = "none",   # no clustering: rows keep their alphabetical order
        xlab = "", ylab = "",
        main = "",
        scale = "column",      # normalize each feature so all features share one color scale
        margins = c(60,100,40,20),
        grid_color = "white",
        grid_width = 0.00001,
        titleX = FALSE,
        hide_colorbar = TRUE,
        branches_lwd = 0.1,
        label_names = c("Country", "Feature:", "Value"),
        fontsize_row = 5, fontsize_col = 5,
        labCol = colnames(mat),
        labRow = rownames(mat),
        heatmap_layers = theme(axis.line = element_blank())
        )
p  # print the widget to display it

Note: you can learn more about this dataset and how to visualize it on the dedicated page.

What for


A heatmap is useful for giving a general overview of numerical data, not for extracting specific data points. In the graphic above, for example, the huge populations of China and India stand out.


A heatmap is also useful for displaying the result of hierarchical clustering. Clustering identifies which countries have similar values across their numeric variables, and therefore which countries are similar. The usual way to represent the result is a dendrogram, which can be drawn around the heatmap:

p <- heatmaply(mat,
        dendrogram = "both",   # cluster both rows (countries) and columns (features)
        xlab = "", ylab = "",
        main = "",
        scale = "column",
        margins = c(60,100,40,20),
        grid_color = "white",
        grid_width = 0.00001,
        titleX = FALSE,
        hide_colorbar = TRUE,
        branches_lwd = 0.1,
        label_names = c("Country", "Feature:", "Value"),
        fontsize_row = 5, fontsize_col = 5,
        labCol = colnames(mat),
        labRow = rownames(mat),
        heatmap_layers = theme(axis.line = element_blank())
        )
p

# save the widget
# library(htmlwidgets)
# saveWidget(p, file = "heatmapInter.html")

Here, Burundi and Angola are grouped together. Both are countries in strong expansion, with many children per woman but still a high mortality rate.
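Groupings like this come from a hierarchical clustering of the rows. Below is a minimal base R sketch of that step on a made-up matrix. The rows `A` to `D` are hypothetical and are not countries from the dataset. `dist()` computes Euclidean distances between rows, `hclust()` builds the tree, and `hc$order` gives the leaf order that places similar rows next to each other. heatmaply runs a comparable computation internally, but its exact defaults are not reproduced here.

```r
# Made-up data: rows A and C are similar, rows B and D are similar
m <- matrix(c(1.0, 1.1, 0.9,
              5.0, 5.2, 4.8,
              1.2, 1.0, 1.1,
              5.1, 4.9, 5.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("x1", "x2", "x3")))

# Euclidean distance between rows, then complete-linkage hierarchical clustering
hc <- hclust(dist(m), method = "complete")

# Reorder rows following the dendrogram leaves: members of a branch become adjacent
m_ordered <- m[hc$order, ]

# Cut the tree into 2 groups: A and C share one label, B and D the other
groups <- cutree(hc, k = 2)

# Base R heatmap draws the row dendrogram next to the reordered matrix
heatmap(m, Colv = NA, scale = "none")
```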

Note: in the country heatmap, features are also clustered. For instance, birth rate and children per woman are grouped together because they are highly correlated.
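The features are clustered in the same way, this time on the columns. The sketch below uses 1 minus the Pearson correlation as the distance between columns. That choice is an assumption for illustration and not necessarily heatmaply's default, but it places strongly correlated features close together. The features are simulated, hypothetical stand-ins: `f2` is built as a noisy copy of `f1`, and `f3` is unrelated to both.

```r
set.seed(42)
n <- 20
# Simulated features: f2 is a noisy linear function of f1, f3 is independent noise
f1 <- rnorm(n)
f2 <- 2 * f1 + rnorm(n, sd = 0.1)
f3 <- rnorm(n)
m <- cbind(f1 = f1, f2 = f2, f3 = f3)

# Correlation-based distance between columns: close to 0 for highly correlated features
d_cols <- as.dist(1 - cor(m))
hc_cols <- hclust(d_cols, method = "average")

# The first merge in the tree joins the two closest columns
# (negative entries in hc_cols$merge refer to single columns)
first_merge <- colnames(m)[-hc_cols$merge[1, ]]
print(sort(first_merge))  # expected: "f1" "f2"

# Reuse this column dendrogram in a base R heatmap (rows left unclustered)
heatmap(m, Rowv = NA, Colv = as.dendrogram(hc_cols), scale = "column")
```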

Note: hierarchical clustering is a complex statistical method. You can learn more about it on its dedicated page.

Variation


  • We’ve seen in the previous section that a heatmap is often used to display the result of a clustering algorithm. A common task is to compare this result with expectations. For instance, a color bar showing the continent of each country makes it easy to check whether countries cluster by continent.

  • For a static heatmap, it is common practice to print the exact value in each cell, since it is hard to translate a color into a precise number.

  • Heatmaps can also be used for time series that follow a regular pattern in time, for instance with days of the week in columns and successive weeks in rows.

  • Heatmaps can also be applied to adjacency matrices, where each cell represents the connection between two nodes of a network.
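The first two variations can be sketched with base R graphics on made-up data. The group labels below are hypothetical stand-ins for continents. `heatmap()` accepts a `RowSideColors` vector with one color per row, which adds the color bar. To print the values, one approach is to draw the cells with `image()` and write each number with `text()`.

```r
# Made-up 4 x 3 matrix and a hypothetical group (e.g. continent) for each row
m <- matrix(c(2.1, 3.4, 1.0,
              2.3, 3.1, 0.8,
              7.9, 0.5, 4.4,
              8.2, 0.7, 4.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("row", 1:4), c("v1", "v2", "v3")))
group <- c("Group 1", "Group 1", "Group 2", "Group 2")

# 1) Color bar: look up one color per row from the group of that row
group_colors <- c("Group 1" = "orange", "Group 2" = "purple")
side_cols <- unname(group_colors[group])
heatmap(m, scale = "column", RowSideColors = side_cols)

# 2) Printed values: image() draws the cells, text() writes the number in each one.
# image() puts the rows of z on the x axis, so pass t(m) to get m's columns on x.
pal <- hcl.colors(9)
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
      col = pal, axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))
axis(2, at = seq_len(nrow(m)), labels = rownames(m), las = 1)

# Cell (row i, column j) is centered at x = j, y = i. as.vector(m) runs down the
# columns, which matches expand.grid() varying i fastest.
cells <- expand.grid(i = seq_len(nrow(m)), j = seq_len(ncol(m)))
text(cells$j, cells$i, labels = sprintf("%.1f", as.vector(m)))
```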

Common mistakes


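  • Forgetting to normalize: when variables are measured on very different scales (population versus life expectancy, for instance), they usually need to be normalized, otherwise the largest values dominate the color scale.
  • Not reordering the matrix: use cluster analysis to permute the rows and the columns of the matrix so that similar values are placed near each other.
  • Neglecting the color palette: the choice of palette strongly affects how the values are read.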
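The normalization issue is easy to check numerically. The matrix below is made up: column `big` is on the scale of millions, a stand-in for population, and column `small` is on the scale of tens, a stand-in for life expectancy. The numbers do not come from the dataset. `scale()` converts each column to z-scores (mean 0, standard deviation 1). This is the same idea as `scale = "column"` in heatmaply or base `heatmap()`.

```r
# Made-up values: column 'big' spans millions, column 'small' spans tens
m <- cbind(big   = c(1.2e6, 3.4e6, 0.5e6, 8.0e6),
           small = c(55, 80, 62, 71))
rownames(m) <- paste0("row", 1:4)

# With one color scale for the whole matrix, each 'small' value sits at almost
# exactly the same position in the overall range, so they all get the same color
overall_range <- range(m)
share_of_range <- (m[, "small"] - overall_range[1]) / diff(overall_range)

# scale() standardizes each column: subtract the column mean, divide by the column sd
m_scaled <- scale(m)

# With column scaling, both columns spread across the color scale
heatmap(m, Rowv = NA, Colv = NA, scale = "column")
```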

Build your own


The R, Python, React and D3 graph galleries are four websites providing hundreds of chart examples, always with reproducible code. Visit them to see how to build the chart you need with your favorite programming language.

Dataviz decision tree

Data To Viz is a comprehensive classification of chart types organized by data input format.

A work by Yan Holtz for data-to-viz.com