Network diagram

Definition


Network diagrams (also called graphs) show interconnections between a set of entities. Each entity is represented by a node (or vertex). Connections between nodes are represented by links (or edges).
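
As a minimal sketch of this vocabulary, the snippet below builds a tiny graph with igraph and counts its nodes and links. The edge list and the names in it are made up for illustration; they are not taken from the dataset used on this page.

```r
library(igraph)

# Hypothetical edge list: each row is one link between two entities
links <- data.frame(
  from = c("Ann", "Ann", "Bob"),
  to   = c("Bob", "Cat", "Cat")
)

# Build an undirected graph; the nodes are inferred from the edge list
g <- graph_from_data_frame(links, directed = FALSE)

vcount(g)    # number of nodes (vertices)
ecount(g)    # number of links (edges)
V(g)$name    # node labels
```

The same edge-list shape (one row per link) is what the long-format data used further down this page looks like.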

Here is an example showing the co-author network of Vincent Ranwez, a researcher who was my former supervisor. Everyone who has published at least one research paper with him is represented by a node. If two people have been listed on the same publication at least once, they are connected by a link.

# Libraries
library(tidyverse)
library(viridis)
library(patchwork)
library(hrbrthemes)
library(ggraph)
library(igraph)
library(networkD3)

# Load researcher data
dataUU <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyUndirectedUnweighted.csv", header=TRUE)

# Transform the adjacency matrix into a long format (one row per link)
connect <- dataUU %>% 
  gather(key="to", value="value", -1) %>%
  na.omit()

# Number of connections per person
# (as.tibble() is deprecated, so the tibble is built directly and counted)
coauth <- tibble(name = c(as.character(connect$from), as.character(connect$to))) %>%
  count(name)

# Plot with networkD3
simpleNetwork(connect,     
        Source = 1,                 # column number of the source
        Target = 2,                 # column number of the target
        height = 880,               # height of the frame area in pixels
        width = 1980,               # width of the frame area in pixels
        linkDistance = 10,          # distance between nodes. Increase this value to have more space between nodes
        charge = -4,                # strength of the node repulsion (negative value) or attraction (positive value)
        fontSize = 5,               # size of the node names
        fontFamily = "serif",       # font of the node names
        linkColour = "#666",        # colour of edges, MUST be a common colour for the whole graph
        nodeColour = "#69b3a2",     # colour of nodes, MUST be a common colour for the whole graph
        opacity = 0.9,              # opacity of nodes. 0=transparent. 1=no transparency
        zoom = TRUE                 # allow zooming on the figure
        )

Note: This chart is interactive: zoom in on a specific cluster to see researcher names. The data were retrieved using the scholar package; the pipeline is described in this GitHub repository. You can read more about this story here.

Four types of input


Four main types of network diagram exist, depending on whether the links are directed and whether they carry a weight. Here is a short description of each.


Undirected and Unweighted

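Tom, Cherelle and Melanie live in the same house. They are connected, but the connections have no direction and no weight.

# Create data: a random 0/1 adjacency matrix
set.seed(2)
data <- matrix(sample(0:1, 25, replace=TRUE), nrow=5)
colnames(data) <- rownames(data) <- LETTERS[1:5]

# Transform it into a graph object. Only the upper triangle is read,
# so each pair of nodes is linked at most once, with no direction.
# diag=FALSE drops self-loops.
network <- graph_from_adjacency_matrix(data, mode="upper", diag=FALSE)

# Make the graph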
ggraph(network) + 
  geom_edge_link(edge_colour="black", edge_alpha=0.3, edge_width=0.2) +
  geom_node_point( color="#69b3a2", size=5) +
  geom_node_text( aes(label=name), repel = TRUE, size=8, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 


Undirected and Weighted

In the previous co-authors network, people are connected if they published a scientific paper together. The weight is the number of times it happened.

# Create data: a random adjacency matrix whose values (0 to 3) are the weights
set.seed(1)
data <- matrix(sample(0:3, 25, replace=TRUE), nrow=5)
colnames(data) <- rownames(data) <- LETTERS[1:5]
 
# Transform it into a graph object (upper triangle only, values kept as weights)
network <- graph_from_adjacency_matrix(data, mode="upper", weighted=TRUE, diag=FALSE)

# Make the graph: edge width is mapped to the weight attribute
ggraph(network) + 
  geom_edge_link( aes(edge_width=weight), edge_colour="black", edge_alpha=0.3) +
  geom_node_point( color="#69b3a2", size=5) +
  geom_node_text( aes(label=name), repel = TRUE, size=8, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 
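
The weight described above (how many papers two people share) can be derived from a plain authorship table. The sketch below assumes a hypothetical table with one row per (paper, author) pair; the column names and values are invented for illustration and are not the format of the dataset used on this page.

```r
library(igraph)

# Hypothetical authorship table: one row per (paper, author) pair
authorship <- data.frame(
  paper  = c("p1", "p1", "p2", "p2", "p3", "p3", "p3"),
  author = c("Ann", "Bob", "Ann", "Bob", "Ann", "Bob", "Cat"),
  stringsAsFactors = FALSE
)

# List every pair of co-authors, paper by paper
# (assumes each paper has at least two authors)
pairs <- lapply(split(authorship$author, authorship$paper), function(a) {
  t(combn(sort(a), 2))
})
pairs <- as.data.frame(do.call(rbind, unname(pairs)))
colnames(pairs) <- c("from", "to")

# The weight of a link is the number of papers the two people share
pairs$weight <- 1
edges <- aggregate(weight ~ from + to, data = pairs, FUN = sum)

# Extra columns of the edge table become edge attributes
g <- graph_from_data_frame(edges, directed = FALSE)
E(g)$weight
```

Sorting the authors within each paper keeps "Ann, Bob" and "Bob, Ann" from being counted as two different links.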


Directed and Unweighted

Tom follows Shirley on Twitter, but the opposite is not necessarily true. The connection is unweighted: two people are either connected or not.

# Create data: a random 0/1 adjacency matrix (rows = source, columns = target)
set.seed(10)
data <- matrix(sample(0:1, 25, replace=TRUE), nrow=5)
diag(data) <- 0      # no self-loops
colnames(data) <- rownames(data) <- LETTERS[1:5]
 
# Transform it into a directed graph object
network <- graph_from_adjacency_matrix(data, mode="directed")

# Make the graph
ggraph(network) + 
  geom_edge_link(edge_colour="black", edge_alpha=0.8, edge_width=0.2, arrow = arrow(angle=20)) +
  geom_node_point( color="#69b3a2", size=3) +
  geom_node_text( aes(label=name), repel = TRUE, size=6, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 


Directed and Weighted

People migrate from one country to another: the weight is the number of people, and the direction points to the destination.

# Create data: a random adjacency matrix whose values (0 to 3) are the weights
set.seed(10)
data <- matrix(sample(0:3, 16, replace=TRUE), nrow=4)
diag(data) <- 0      # no self-loops
colnames(data) <- rownames(data) <- LETTERS[1:4]
 
# Transform it into a directed graph object, keeping the values as weights
network <- graph_from_adjacency_matrix(data, mode="directed", weighted=TRUE)

# Make the graph: edge width is mapped to the weight attribute
ggraph(network) + 
  geom_edge_link(edge_colour="black", edge_alpha=0.3, aes(edge_width=weight) , arrow=arrow()) +
  scale_edge_width(range=c(1,3)) +
  geom_node_point( color="#69b3a2", size=3) +
  geom_node_text( aes(label=name), repel = TRUE, size=6, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 
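
To check which of the four types a given graph object belongs to, igraph provides is_directed() and is_weighted(). The sketch below builds the four variants from one small made-up adjacency matrix (the values are arbitrary) and summarizes them in a table.

```r
library(igraph)

# Small made-up adjacency matrix (rows = origin, columns = destination)
m <- matrix(c(0, 2, 0,
              0, 0, 1,
              3, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(LETTERS[1:3], LETTERS[1:3]))

# Binary version of the same matrix: 1 if a link exists, 0 otherwise
b <- (m > 0) * 1

graphs <- list(
  undirected_unweighted = graph_from_adjacency_matrix(b, mode = "max"),
  undirected_weighted   = graph_from_adjacency_matrix(m, mode = "max", weighted = TRUE),
  directed_unweighted   = graph_from_adjacency_matrix(b, mode = "directed"),
  directed_weighted     = graph_from_adjacency_matrix(m, mode = "directed", weighted = TRUE)
)

# One row per graph, one column per property
types <- data.frame(
  directed = sapply(graphs, is_directed),
  weighted = sapply(graphs, is_weighted)
)
types
```

Here mode = "max" makes the matrix symmetric by keeping the larger of the two values for each pair, which is one of several ways to drop the direction.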



Note: as you can see in the examples above, directed graphs are quite hard to represent with this type of visualization. More appropriate techniques exist to represent flows, such as the Sankey diagram or the chord diagram.

Variation


Many customizations are available for network diagrams. One of the most important is the layout algorithm, which decides where each node is placed. Here are three layouts applied to the same co-author network:


Fruchterman-Reingold


Probably the most widely used layout algorithm. It relies on a force-directed method.

# Load researcher data
dataUU <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyUndirectedUnweighted.csv", header=TRUE)

# Transform the adjacency matrix into a long format (one row per link)
connect <- dataUU %>% 
  gather(key="to", value="value", -1) %>%
  na.omit()

# Number of connections per person
# (as.tibble() is deprecated, so the tibble is built directly and counted)
coauth <- tibble(name = c(as.character(connect$from), as.character(connect$to))) %>%
  count(name)

# Create a graph object with igraph
mygraph <- graph_from_data_frame( connect, vertices = coauth )

# Make the graph
ggraph(mygraph, layout="fr") + 
  #geom_edge_density(edge_fill="#69b3a2") +
  geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
  geom_node_point(aes(size=n, alpha=n)) +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 
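
Force-directed layouts such as Fruchterman-Reingold start from random node positions, so two runs on the same graph can give different pictures. A minimal sketch of how to inspect the coordinates and make them reproducible, using igraph's layout_with_fr() on a hypothetical random graph (not the co-author data):

```r
library(igraph)

# Hypothetical random graph, for illustration only
set.seed(1)
g <- sample_gnp(20, 0.15)

# Fixing the seed right before the call fixes the random starting positions
set.seed(42)
coords_a <- layout_with_fr(g)

set.seed(42)
coords_b <- layout_with_fr(g)

dim(coords_a)                   # one row per node, two columns (x and y)
identical(coords_a, coords_b)   # compare the two runs
```

Calling set.seed() before ggraph(mygraph, layout="fr") serves the same purpose when the layout is computed inside the plotting call.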

DrL


A force-directed graph layout toolbox focused on real-world large-scale graphs.

# Make the graph
ggraph(mygraph, layout="drl") + 
  #geom_edge_density(edge_fill="#69b3a2") +
  geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
  geom_node_point(aes(size=n, alpha=n)) +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 

Randomly


This is what happens if node positions are set randomly.

ggraph(mygraph, layout="igraph", algorithm="randomly") + 
  #geom_edge_density(edge_fill="#69b3a2") +
  geom_edge_link(edge_colour="black", edge_alpha=0.2, edge_width=0.3) +
  geom_node_point(aes(size=n, alpha=n)) +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  ) 

Common mistakes


The hairball is the main caveat when plotting networks: when there are too many connections and no obvious pattern, the figure gets cluttered and unreadable.
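
One possible way to tame a hairball, sketched here under assumptions rather than as a recommended recipe, is to measure how dense the graph is and keep only the best-connected nodes before plotting. The random graph and the 90% degree threshold below are arbitrary choices made for illustration.

```r
library(igraph)

# Hypothetical dense random graph standing in for a cluttered network
set.seed(1)
g <- sample_gnp(100, 0.2)

# Share of all possible links that actually exist
edge_density(g)

# Keep only the nodes whose degree is in the top 10% (arbitrary threshold)
deg  <- degree(g)
keep <- V(g)[deg >= quantile(deg, 0.9)]
g_small <- induced_subgraph(g, keep)

vcount(g_small)   # far fewer nodes to draw
ecount(g_small)   # and far fewer links
```

Filtering changes what the figure shows, so the threshold should be stated alongside the chart.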

Related


Build your own


The R and Python graph galleries are two websites providing hundreds of chart examples, each with reproducible code. Click the button below to see how to build the chart you need with your favorite programming language.

R graph gallery Python gallery

A work by Yan Holtz for data-to-viz.com