A boxplot gives a nice summary of one or several numeric variables. It is composed of several elements:

- The line that divides the box into 2 parts represents the median of the data. If the median is 10, it means that there are the same number of data points below and above 10.
- The ends of the box show the lower and upper quartiles. If the third quartile is 15, it means that 75% of the observations are lower than 15.
- The difference between quartiles 1 and 3 is called the interquartile range (IQR).
- The extreme lines (the whiskers) show the highest and lowest values excluding outliers.
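
To make these elements concrete, here is a minimal sketch that computes them by hand in base R. The vector `x` is a made-up sample used only for illustration, and the 1.5 × IQR rule for flagging outliers is a common convention (it is the default in ggplot2's `geom_boxplot()`), not the only possible one.

```r
# Hypothetical sample values, used only for illustration
x <- c(2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40)

# Quartiles: Q1, median and Q3
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Interquartile range: Q3 - Q1
iqr <- IQR(x)

# Fences of the 1.5 * IQR convention (an assumption, see above)
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Whiskers end at the most extreme points that lie inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

# Points outside the fences are drawn individually as outliers
outliers <- x[x < lower_fence | x > upper_fence]

print(q)
print(iqr)
print(whiskers)
print(outliers)
```

Note that other tools may compute quartiles with a slightly different method, so the box edges can differ a little between software packages on small samples.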

Here is a diagram showing the boxplot anatomy:

A boxplot summarizes the distribution of a numeric variable for several groups. The problem is that summarizing also means losing information, and that can become a pitfall. If we consider the boxplot below, it is easy to conclude that the group `C` has higher values than the others. However, we cannot see the underlying distribution of data points in each group or their number of observations.

```
# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(plotly)

# Create a dataset with very different group sizes.
# Note: "B" appears twice, so group B pools two samples (means 13 and 18).
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Plot one box per group
data %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_boxplot() +
  scale_fill_viridis(discrete = TRUE) +
  theme_ipsum() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 11)
  ) +
  ggtitle("A somewhat misleading boxplot") +
  xlab("")
```
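
One possible way to recover the hidden information is to count the observations per group and to draw the individual points on top of the boxes. The sketch below is only an illustration: it rebuilds a dataset with the same structure as above, adds a fixed seed (an assumption, the original code has none), and uses plain ggplot2 defaults rather than the theme and palette used above.

```r
library(ggplot2)

# Assumption: a fixed seed, added here only so the sketch is reproducible
set.seed(42)

# Same structure as the dataset above
data <- data.frame(
  name = c(rep("A", 500), rep("B", 500), rep("B", 500), rep("C", 20), rep("D", 100)),
  value = c(rnorm(500, 10, 5), rnorm(500, 13, 1), rnorm(500, 18, 1), rnorm(20, 25, 4), rnorm(100, 12, 1))
)

# Number of observations per group, which the boxplot does not show
group_sizes <- table(data$name)
print(group_sizes)

# Same boxplot with the raw observations overlaid as jittered points
p <- ggplot(data, aes(x = name, y = value, fill = name)) +
  geom_boxplot(outlier.shape = NA) +  # outlier marks hidden: the jittered points already include them
  geom_jitter(width = 0.2, size = 0.4, alpha = 0.5) +
  theme(legend.position = "none") +
  xlab("")
print(p)
```

The counts make it clear that group `C` rests on far fewer observations than the others, and the jittered points reveal the shape of each distribution, including the two clusters pooled in group `B`.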