Choropleth map


Definition


A choropleth map displays divided geographical areas or regions that are coloured in relation to a numeric variable. It allows you to study how a variable evolves across a territory. It is a powerful and widely used data visualization technique. However, its downside is that larger regions tend to carry more weight in the interpretation of the map, which introduces a bias.


Here is an example describing the distribution of restaurants in the South of France.

# Libraries
library(sf)
library(cartogram)
library(tidyverse)
library(broom)
library(viridis)
library(patchwork)
library(geojsonio)

# Import region boundaries
spdf <- geojson_read("https://raw.githubusercontent.com/gregoiredavid/france-geojson/master/communes.geojson",  what = "sp")

# Select a subset of the data
spdf <- spdf[substr(spdf@data$code, 1, 2) %in% c("06", "83", "13", "30", "34", "11", "66"), ]

# Convert the spatial data to an sf object
spdf_sf <- st_as_sf(spdf)

# Read the additional data
data <- read.table("https://raw.githubusercontent.com/holtzy/R-graph-gallery/master/DATA/data_on_french_states.csv", header = TRUE, sep = ";")

# Make sure the column names match for the join
names(spdf_sf)[names(spdf_sf) == "code"] <- "id"

# Merge the sf object with the additional data
spdf_fortified <- spdf_sf %>%
  left_join(data, by = c("id" = "depcom"))
# A missing value means the district has no restaurant (0).
# Use a small positive value instead of 0 so the log-transformed fill scale can handle it
spdf_fortified$nb_equip[is.na(spdf_fortified$nb_equip)] <- 0.001

# Plot
p <- ggplot(spdf_fortified) +
  geom_sf(aes(fill = nb_equip), linewidth=0, alpha=0.9) +
  theme_void() +
  scale_fill_viridis(
    direction = -1, trans = "log", breaks = c(1, 5, 10, 20, 50, 100),
    name = "Number of restaurants",
    guide = guide_legend(keyheight = unit(3, units = "mm"), keywidth = unit(12, units = "mm"), label.position = "bottom", title.position = "top", nrow = 1)
  ) +
  labs(
    title = "South of France Restaurant concentration",
    subtitle = "Number of restaurants per city district",
    caption = "Data: INSEE | Creation: Yan Holtz | r-graph-gallery.com"
  ) +
  theme(
    text = element_text(color = "#22211d"), 
    plot.background = element_rect(fill = "#f5f5f2", color = NA), 
    panel.background = element_rect(fill = "#f5f5f2", color = NA), 
    legend.background = element_rect(fill = "#f5f5f2", color = NA),
    
    plot.title = element_text(size= 15, hjust=0.01, color = "#4e4d47", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
    plot.subtitle = element_text(size= 12, hjust=0.01, color = "#4e4d47", margin = margin(b = -0.1, t = 0.43, l = 2, unit = "cm")),
    plot.caption = element_text( size=8, color = "#4e4d47", margin = margin(b = 0.3, r=-99, unit = "cm") ),
    
    legend.position = c(0.7, 0.09)
  ) +
  coord_sf(datum = NA)
p

Note: The boundaries of city districts and the number of restaurants per district are loaded from the two URLs given in the code above.
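The join in the code above matches district codes stored as text, such as codes starting with "06". As a minimal sketch with made-up codes and counts (the `regions` and `counts` data frames are hypothetical stand-ins, not the real files), the following shows why such codes are best kept as character strings, and how to list the districts that found no match before deciding how to treat them:

```r
# Hypothetical district codes and counts, for illustration only
regions <- data.frame(id = c("06001", "13002", "83003"), stringsAsFactors = FALSE)
counts  <- data.frame(depcom = c("06001", "83003"), nb_equip = c(12, 4),
                      stringsAsFactors = FALSE)

# Converting a code to a number drops the leading zero, so it would no longer match
as.character(as.integer("06001"))  # "6001"

# Left join in base R: keep every region, even those without a count
merged <- merge(regions, counts, by.x = "id", by.y = "depcom", all.x = TRUE)

# Regions without a matching row get NA; list them before plotting
unmatched <- merged$id[is.na(merged$nb_equip)]
unmatched
```

Whether an unmatched district really means "zero" or "no data" depends on the data source, so it is worth checking this list rather than filling it blindly.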

Important Note: Here, the absolute number of restaurants per district is shown. Keep in mind that this introduces an important bias: districts with a large area and/or a large number of inhabitants are more likely to have many restaurants.
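One way to reduce this bias is to map a rate instead of a raw count. The sketch below uses a small made-up table (the district names, populations and areas are hypothetical, and the restaurant dataset used above may not include such columns) to show how counts per 10,000 inhabitants or per square kilometre could be computed:

```r
# Hypothetical districts: counts, population and area are made-up values
districts <- data.frame(
  name       = c("A", "B", "C"),
  nb_equip   = c(120, 30, 5),
  population = c(300000, 20000, 1000),
  area_km2   = c(240, 60, 10)
)

# Restaurants per 10,000 inhabitants
districts$per_10k <- districts$nb_equip / districts$population * 10000

# Restaurants per square kilometre
districts$per_km2 <- districts$nb_equip / districts$area_km2

# Ranking by rate can differ from ranking by raw count
districts[order(-districts$per_10k), c("name", "nb_equip", "per_10k", "per_km2")]
```

In this toy table, district A has the most restaurants but the lowest rate per inhabitant, which is exactly the kind of reversal a raw-count map hides.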

What for


Choropleth maps represent a variable on a map, which makes them a great way to show how that variable is distributed across a territory. They are often used in demography, sociology and economics.

They offer several key benefits:

  • Highlight Spatial Patterns: Easily identify trends and outliers across regions.
  • Facilitate Comparison: Use color gradients to compare data values between areas.
  • Integrate Data: Combine various data sources for a comprehensive geographical analysis.
  • Scalability: Adapt to different geographical levels, from global to local.
  • Customization: Adjust colors, classification methods, and add interactivity.
  • Bias Mitigation: Normalize data and consider alternatives like cartograms to reduce biases from region size or population density.

Variation


  • Hexbin map: A hexbin map is a variation of the choropleth map where the regions are replaced by hexagons. It is useful when the regions have a broad range of sizes.
  • Cartogram: A cartogram is a variation of the choropleth map where the regions are resized according to the variable of interest. It is useful to avoid the bias introduced by the size of the regions, or to illustrate this bias.

Common mistakes


  • Normalize your variable: you cannot compare raw numbers between regions of distinct size or population.
  • Take great care when choosing the continuous color palette.
  • Don’t forget the legend.
  • If your regions have a broad range of sizes, this introduces a bias. You could consider using hexbin maps instead.
  • Don’t call it chLoropleth map.
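For the color palette point, one possible approach for skewed data is to group values into classes before assigning colors. This is a minimal sketch with made-up values, using quantile breaks and the base R `hcl.colors()` palette function. It is only one option among several, and differs from the log-transformed continuous scale used in the example above:

```r
# Made-up, right-skewed values standing in for a mapped variable
values <- c(1, 2, 2, 3, 5, 8, 13, 40, 120, 400)

# Quantile breaks put roughly the same number of regions in each class
n_classes <- 5
breaks <- quantile(values, probs = seq(0, 1, length.out = n_classes + 1))
classes <- cut(values, breaks = breaks, include.lowest = TRUE)

# One color per class from a sequential palette (hcl.colors is in base R >= 3.6)
pal <- hcl.colors(n_classes, palette = "viridis")
fill <- pal[as.integer(classes)]

# Number of regions per class
table(classes)
```

With a linear continuous scale, the single value of 400 would push almost every other region into the same end of the palette. Classes keep the smaller values distinguishable.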

Build your own


The R, Python, React and D3 graph galleries are four websites providing hundreds of chart examples, each with reproducible code. They show how to build the chart you need with your favorite programming language.


A work by Yan Holtz for data-to-viz.com