statnet, igraph: social network analysis in R
===

{%hackmd @88u1wNUtQpyVz9FsQYeBRg/r1vSYkogS %}

> Lee Tsung-Tang
>
> ###### tags: `R` `statnet` `network` `igraph`
>
> [TOC]

---

## Preparation

- The `statnet` suite of network analysis packages is used for the analyses here.
- The data used in this chapter (and throughout the rest of the book) are from the `UserNetR` package that accompanies the book.
- The specific dataset used here is called Moreno, and contains a friendship network of fourth grade students first collected by Jacob Moreno in the 1930s.
- Dataset

```R=
# library(devtools)
# install_github("DougLuke/UserNetR")
```

```R=
library(statnet)
library(UserNetR)
data(Moreno)
```

---

## Network fundamentals

### Simple Visualization

```R=
gender <- Moreno %v% "gender"
plot(Moreno, vertex.col = gender + 2, vertex.cex = 1.2)
```

![](https://i.imgur.com/hPLxeKG.png)

### Basic Description

- Size

```R=
network.size(Moreno)
# [1] 33

summary(Moreno, print.adj = FALSE)
# Network attributes:
#   vertices = 33
#   directed = FALSE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 46
#     missing edges = 0
#     non-missing edges = 46
#   density = 0.0871
#
# Vertex attributes:
#
#  gender:
#    numeric valued attribute
#    attribute summary:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#    1.00    1.00    2.00    1.52    2.00    2.00
#   vertex.names:
#    character valued attribute
#    33 valid vertex names
#
# No edge attributes
```

- Density

> For a directed network, the maximum number of possible ties among $k$ actors is $k \times (k-1)$, so the formula for density is:
>
> $\frac{L}{k\times (k-1)}$
>
> where $L$ is the number of observed ties in the network. Density, as defined here, does not allow for ties between a particular node and itself (called a loop).
>
> For an undirected network, density becomes:
>
> $\frac{2L}{k\times (k-1)}$

```R=
den_hand <- 2*46/(33*32)
den_hand
# [1] 0.08712121
gden(Moreno)
# [1] 0.08712121
```

- Components

```R=
components(Moreno)
# [1] 2
```

- Diameter

> The diameter of an entire network is the longest of the shortest paths across all pairs of nodes.
>
> The Moreno network has two components. The smaller component has only two nodes, so we use the larger component, which contains the other 31 connected students.

```R=
lgc <- component.largest(Moreno, result = "graph")
gd <- geodist(lgc)
max(gd$gdist)
# [1] 11
```

It takes 11 steps to connect the two nodes that are situated furthest apart in this friendship network.

- Clustering Coefficient

> Transitivity is defined as the proportion of closed triangles (triads where all three ties are observed) to the total number of open and closed triangles (triads where either two or all three ties are observed).

```R=
gtrans(Moreno, mode = "graph")
# [1] 0.286
```

---

## Network Data Management

### Network Data Structures

- Sociomatrices (adjacency matrices)

![](https://i.imgur.com/3WhaLNZ.png)

- Edge lists

![](https://i.imgur.com/nU5vdoJ.png)

### Information Stored in Network Objects

> In general, a network data object (here, a statnet object) can contain up to five types of information.

![](https://i.imgur.com/8URmjyQ.png)

### Creating and Managing Network Objects

- Creating a Network Object in statnet

```R=
netmat1 <- rbind(c(0,1,1,0,0),
                 c(0,0,1,1,0),
                 c(0,1,0,0,0),
                 c(0,0,0,0,0),
                 c(0,0,1,0,0))
rownames(netmat1) <- c("A","B","C","D","E")
colnames(netmat1) <- c("A","B","C","D","E")
net1 <- network(netmat1, matrix.type = "adjacency")
class(net1)
# [1] "network"
summary(net1)
# Network attributes:
#   vertices = 5
#   directed = TRUE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 6
#     missing edges = 0
#     non-missing edges = 6
#   density = 0.3
#
# Vertex attributes:
#   vertex.names:
#    character valued attribute
#    5 valid vertex names
#
# No edge attributes
#
# Network adjacency matrix:
#   A B C D E
# A 0 1 1 0 0
# B 0 0 1 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 1 0 0
```

> The same network can be created using an edge list format.

```R=
netmat2 <- rbind(c(1,2),
                 c(1,3),
                 c(2,3),
                 c(2,4),
                 c(3,2),
                 c(5,3))
net2 <- network(netmat2, matrix.type = "edgelist")
network.vertex.names(net2) <- c("A","B","C","D","E")
summary(net2)
# Network attributes:
#   vertices = 5
#   directed = TRUE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 6
#     missing edges = 0
#     non-missing edges = 6
#   density = 0.3
#
# Vertex attributes:
#   vertex.names:
#    character valued attribute
#    5 valid vertex names
#
# No edge attributes
#
# Network adjacency matrix:
#   A B C D E
# A 0 1 1 0 0
# B 0 0 1 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 1 0 0
```

- Reverse flow

> Coercing network data into other matrix formats.

```R=
as.sociomatrix(net1)
#   A B C D E
# A 0 1 1 0 0
# B 0 0 1 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 1 0 0
class(as.sociomatrix(net1))
# [1] "matrix"
```

> A more general coercion function is `as.matrix()`. It can be used to produce a **sociomatrix** or an **edgelist** matrix.

```R=
all(as.matrix(net1) == as.sociomatrix(net1))
# [1] TRUE
as.matrix(net1, matrix.type = "edgelist")
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    2
# [3,]    1    3
# [4,]    2    3
# [5,]    5    3
# [6,]    2    4
# attr(,"n")
# [1] 5
# attr(,"vnames")
```

### Managing Node and Tie Attributes

- Node Attributes

> The first example uses the more *formal* method to assign gender codes to the nodes in `net1`.
> The second example uses a *shorthand method* to assign a numeric vector as an attribute.

```R=
set.vertex.attribute(net1, "gender", c("F", "F", "M", "F", "M"))
net1 %v% "alldeg" <- degree(net1)
list.vertex.attributes(net1)
# [1] "alldeg"       "gender"       "na"           "vertex.names"
summary(net1)
# Network attributes:
#   vertices = 5
#   directed = TRUE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 6
#     missing edges = 0
#     non-missing edges = 6
#   density = 0.3
#
# Vertex attributes:
#
#  alldeg:
#    numeric valued attribute
#    attribute summary:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#     1.0     1.0     2.0     2.4     4.0     4.0
#
#  gender:
#    character valued attribute
#    attribute summary:
#    F M
#    3 2
#   vertex.names:
#    character valued attribute
#    5 valid vertex names
#
# No edge attributes
#
# Network adjacency matrix:
#   A B C D E
# A 0 1 1 0 0
# B 0 0 1 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 1 0 0
```

> To see the actual values stored in a vertex attribute, you can use the following two equivalent methods.

```R=
get.vertex.attribute(net1, "gender")
# [1] "F" "F" "M" "F" "M"
net1 %v% "alldeg"
# [1] 2 4 4 1 1
```

- Tie Attributes

> Edge attributes are managed with the `set.edge.attribute()` and `get.edge.attribute()` functions.

```R=
list.edge.attributes(net1)
# [1] "na"
# one random value per edge (6 edges), not per vertex
set.edge.attribute(net1, "rndval",
                   runif(network.edgecount(net1), 0, 1))
list.edge.attributes(net1)
# [1] "na"     "rndval"
summary(net1 %e% "rndval")
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.0376  0.1615  0.2062  0.3136  0.2965  0.9586
summary(get.edge.attribute(net1, "rndval"))
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.0376  0.1615  0.2062  0.3136  0.2965  0.9586
```

> In statnet, the actual values of valued ties are stored in an edge attribute.
>

```R=
netval1 <- rbind(c(0,2,3,0,0),
                 c(0,0,3,1,0),
                 c(0,1,0,0,0),
                 c(0,0,0,0,0),
                 c(0,0,2,0,0))
netval1 <- network(netval1, matrix.type = "adjacency",
                   ignore.eval = FALSE, names.eval = "like")
network.vertex.names(netval1) <- c("A","B","C","D","E")
list.edge.attributes(netval1)
# [1] "like" "na"
get.edge.attribute(netval1, "like")
# [1] 2 1 3 3 2 1
```

:::info
The key here is the pair of `ignore.eval` and `names.eval` options. As set here, they tell the `network()` function to evaluate the actual values in the sociomatrix and store those values in a new edge attribute called ‘like.’
:::

```R=
as.sociomatrix(netval1)
#   A B C D E
# A 0 1 1 0 0
# B 0 0 1 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 1 0 0
as.sociomatrix(netval1, "like")
#   A B C D E
# A 0 2 3 0 0
# B 0 0 3 1 0
# C 0 1 0 0 0
# D 0 0 0 0 0
# E 0 0 2 0 0
```

### Creating a Network Object in igraph

```R=
detach(package:statnet)
library(igraph)
```

```R=
inet1 <- graph.adjacency(netmat1)
class(inet1)
# [1] "igraph"
```

```R=
summary(inet1)
# IGRAPH 5625b4a DN-- 5 6 --
# + attr: name (v/c)
inet1
# IGRAPH 5625b4a DN-- 5 6 --
# + attr: name (v/c)
# + edges from 5625b4a (vertex names):
# [1] A->B A->C B->C B->D C->B E->C
```

:::info
1. In this case the ‘D’ indicates a directed graph, and the ‘N’ indicates that the vertices are named.
2. After these codes, the number of vertices (5) and the number of edges (6) are displayed.
:::

> Similarly, an igraph graph object can be created from an edge list.

```R=
inet2 <- graph.edgelist(netmat2)
summary(inet2)
# IGRAPH D--- 5 6 --
```

> To create and use node attributes, the `V()` vertex accessor function is used. Similarly, to manage edge attributes, the `E()` edge accessor function is used.
>

```R=
V(inet2)$name <- c("A","B","C","D","E")
E(inet2)$val <- c(1:6)
summary(inet2)
# IGRAPH DN-- 5 6 --
# + attr: name (v/c), val (e/n)
inet2
# IGRAPH DN-- 5 6 --
# + attr: name (v/c), val (e/n)
# + edges (vertex names):
# [1] A->B A->C B->C B->D C->B E->C
```

### Going Back and Forth Between statnet and igraph

```R=
library(intergraph)
class(net1)
# [1] "network"
net1igraph <- asIgraph(net1)
class(net1igraph)
# [1] "igraph"
net1igraph
# IGRAPH 7549448 D--- 5 6 --
# + attr: alldeg (v/n), gender (v/c), na (v/l), vertex.names (v/c), na (e/l),
# | rndval (e/n)
# + edges from 7549448:
# [1] 1->2 3->2 1->3 2->3 5->3 2->4
```

### Importing Network Data

```R=
detach("package:igraph", unload = TRUE)
library(statnet)

netmat3 <- rbind(c("A","B"),
                 c("A","C"),
                 c("B","C"),
                 c("B","D"),
                 c("C","B"),
                 c("E","C"))
net.df <- data.frame(netmat3)
net.df
#   X1 X2
# 1  A  B
# 2  A  C
# 3  B  C
# 4  B  D
# 5  C  B
# 6  E  C

write.csv(net.df, file = "MyData.csv", row.names = FALSE)
net.edge <- read.csv(file = "MyData.csv")
net_import <- network(net.edge, matrix.type = "edgelist")
```

---

### Common Network Data Tasks

- Filtering Based on Node Values

```R=
n1F <- get.inducedSubgraph(net1, which(net1 %v% "gender" == "F"))
n1F[,]
#   A B D
# A 0 1 0
# B 0 0 1
# D 0 0 0
```

:::info
The `get.inducedSubgraph()` function returns a new network object that is filtered based on the vertex attribute criteria.
:::

> This works the same way but uses the `%s%` operator, which is a shortcut for the `get.inducedSubgraph()` function.

```R=
deg <- net1 %v% "alldeg"
# degree greater than or equal to 2
n2 <- net1 %s% which(deg > 1)
gplot(n2, displaylabels = TRUE)
```

![](https://i.imgur.com/MN1DYXr.png)

- Removing Isolates

:::danger
DATA: The members of this network are scientists, and they have a tie if they worked together on a scientific grant submission.
:::

```R=
data(ICTS_G10)
gden(ICTS_G10)
# [1] 0.0112
length(isolates(ICTS_G10))
# [1] 96
```

> The `isolates()` function returns a vector of vertex IDs.
> This vector can be fed to the `delete.vertices()` function. However, unlike most R functions we have seen, `delete.vertices()` does *not* return an object; it operates directly on the network that is passed to it.

```R=
n3 <- ICTS_G10
delete.vertices(n3, isolates(n3))
gden(n3)
# [1] 0.0173
length(isolates(n3))
# [1] 0
```

- Filtering Based on Edge Values

:::danger
The DHHS Collaboration Network (DHHS) contains network data from a study of the relationships among 54 tobacco control experts working in 11 different agencies in the Department of Health and Human Services in 2005.
:::

```R=
data(DHHS)
d <- DHHS
gden(d)
# [1] 0.312
```

```R=
op <- par(mar = rep(0, 4))
gplot(d, gmode = "graph", edge.lwd = d %e% 'collab',
      edge.col = "grey50", vertex.col = "lightblue",
      vertex.cex = 1.0, vertex.sides = 20)
par(op)
```

![](https://i.imgur.com/Qstm2o9.png)

```R=
as.sociomatrix(d)[1:6,1:6]
#        ACF-1 ACF-2 AHRQ-1 AHRQ-2 AHRQ-3 AHRQ-4
# ACF-1      0     1      0      0      0      0
# ACF-2      1     0      0      0      0      0
# AHRQ-1     0     0      0      1      1      1
# AHRQ-2     0     0      1      0      1      1
# AHRQ-3     0     0      1      1      0      1
# AHRQ-4     0     0      1      1      1      0

list.edge.attributes(d)
# [1] "collab" "na"

as.sociomatrix(d, attrname = "collab")[1:6,1:6]
#        ACF-1 ACF-2 AHRQ-1 AHRQ-2 AHRQ-3 AHRQ-4
# ACF-1      0     1      0      0      0      0
# ACF-2      1     0      0      0      0      0
# AHRQ-1     0     0      0      3      3      3
# AHRQ-2     0     0      3      0      3      2
# AHRQ-3     0     0      3      3      0      3
# AHRQ-4     0     0      3      2      3      0
```

We can easily see the distribution of tie values:

```R=
table(d %e% "collab")
#   1   2   3   4
# 163 111  94  79
```

> Now we can filter the edges to include only formal collaboration ties (value > 2). This takes three steps.

1. First, a valued sociomatrix is created that contains the tie values stored in the ‘collab’ edge attribute.
2. Then we filter out the ties that we want to ignore. In this case the ties that are coded 1 and 2 are replaced with 0s.
3. Then, we create a new network based on the filtered sociomatrix. The key here is that a tie will be created anywhere a non-zero value is found in `d.val`.
   Also, by using the `ignore.eval` and `names.eval` options we store the retained edge values in an edge attribute called ‘collab.’

```R=
d.val <- as.sociomatrix(d, attrname = "collab")
d.val[d.val < 3] <- 0
d.filt <- as.network(d.val, directed = FALSE,
                     matrix.type = "a", ignore.eval = FALSE,
                     names.eval = "collab")
```

```R=
summary(d.filt, print.adj = FALSE)
# Network attributes:
#   vertices = 54
#   directed = FALSE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 173
#     missing edges = 0
#     non-missing edges = 173
#   density = 0.1208945
#
# Vertex attributes:
#   vertex.names:
#    character valued attribute
#    54 valid vertex names
#
# Edge attributes:
#
#  collab:
#    numeric valued attribute
#    attribute summary:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#   3.000   3.000   3.000   3.457   4.000   4.000

gden(d.filt)
# [1] 0.1208945
```

Plot:

```R=
op <- par(mar = rep(0, 4))
gplot(d.filt, gmode = "graph", displaylabels = TRUE,
      vertex.col = "lightblue", vertex.cex = 1.3,
      label.cex = 0.4, label.pos = 5,
      displayisolates = FALSE)
par(op)
```

![](https://i.imgur.com/UqZkRxj.png)

- Transforming a Directed Network to a Non-directed Network

```R=
net1mat <- symmetrize(net1, rule = "weak")
net1mat
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    1    1    0    0
# [2,]    1    0    1    1    0
# [3,]    1    1    0    0    1
# [4,]    0    1    0    0    0
# [5,]    0    0    1    0    0
```

```R=
net1symm <- network(net1mat, matrix.type = "adjacency")
network.vertex.names(net1symm) <- c("A","B","C","D","E")
summary(net1symm)
# Network attributes:
#   vertices = 5
#   directed = TRUE
#   hyper = FALSE
#   loops = FALSE
#   multiple = FALSE
#   bipartite = FALSE
#   total edges = 10
#     missing edges = 0
#     non-missing edges = 10
#   density = 0.5
#
# Vertex attributes:
#   vertex.names:
#    character valued attribute
#    5 valid vertex names
#
# No edge attributes
#
# Network adjacency matrix:
#   A B C D E
# A 0 1 1 0 0
# B 1 0 1 1 0
# C 1 1 0 0 1
# D 0 1 0 0 0
# E 0 0 1 0 0
```

:::info
`symmetrize(net1,rule="weak")`

`rule`: with 'weak', a tie between i and j is kept after the transformation if it exists in either direction; with 'strong', the tie is kept only if it exists in both directions.
:::

---

## Affiliation Networks

### Affiliations as 2-Mode Networks

```R=
C1 <- c(1,1,1,0,0,0)
C2 <- c(0,1,1,1,0,0)
C3 <- c(0,0,1,1,1,0)
C4 <- c(0,0,0,0,1,1)
aff.df <- data.frame(C1,C2,C3,C4)
row.names(aff.df) <- c("S1","S2","S3","S4","S5","S6")
```

![](https://i.imgur.com/Ed9syZT.png)

> This type of data matrix is called an *incidence matrix*, and it depicts how $n$ actors belong to $g$ groups.

### Bipartite Graphs

```R=
library(igraph)
bn <- graph.incidence(aff.df)
```

```R=
plt.x <- c(rep(2,6),rep(4,4))
plt.y <- c(7:2,6:3)
lay <- as.matrix(cbind(plt.x,plt.y))
```

```R=
shapes <- c("circle","square")
colors <- c("blue","red")
plot(bn, vertex.color = colors[V(bn)$type+1],
     vertex.shape = shapes[V(bn)$type+1],
     vertex.size = 10, vertex.label.degree = -pi/2,
     vertex.label.dist = 1.2, vertex.label.cex = 0.9,
     layout = lay)
```

![](https://i.imgur.com/LWDX0qN.png)

### Affiliation Network Basics

- Creating Affiliation Networks from Incidence Matrices

```R=
bn <- graph.incidence(aff.df)
bn
# IGRAPH 1f8ada3 UN-B 10 11 --
# + attr: type (v/l), name (v/c)
# + edges from 1f8ada3 (vertex names):
# [1] S1--C1 S2--C1 S2--C2 S3--C1 S3--C2 S3--C3 S4--C2 S4--C3 S5--C3 S5--C4 S6--C4
```

:::info
The ‘B’ in the ‘UN-B’ string tells us that this is a bipartite network.
Furthermore, the second line shows that this network has two vertex attributes: `name` stores the name of the vertex, and `type` is a logical vector that igraph uses to distinguish between the two different types of nodes.
:::

```R=
get.incidence(bn)
#    C1 C2 C3 C4
# S1  1  0  0  0
# S2  1  1  0  0
# S3  1  1  1  0
# S4  0  1  1  0
# S5  0  0  1  1
# S6  0  0  0  1
V(bn)$type
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
V(bn)$name
# [1] "S1" "S2" "S3" "S4" "S5" "S6" "C1" "C2" "C3" "C4"
```

- Creating Affiliation Networks from Edge Lists

```R=
el.df <- data.frame(rbind(c("S1","C1"),
                          c("S2","C1"),
                          c("S2","C2"),
                          c("S3","C1"),
                          c("S3","C2"),
                          c("S3","C3"),
                          c("S4","C2"),
                          c("S4","C3"),
                          c("S5","C3"),
                          c("S5","C4"),
                          c("S6","C4")))
el.df
#    X1 X2
# 1  S1 C1
# 2  S2 C1
# 3  S2 C2
# 4  S3 C1
# 5  S3 C2
# 6  S3 C3
# 7  S4 C2
# 8  S4 C3
# 9  S5 C3
# 10 S5 C4
# 11 S6 C4
```

```R=
bn2 <- graph.data.frame(el.df, directed = FALSE)
bn2
# IGRAPH 2e7b9de UN-- 10 11 --
# + attr: name (v/c)
# + edges from 2e7b9de (vertex names):
# [1] S1--C1 S2--C1 S2--C2 S3--C1 S3--C2 S3--C3 S4--C2 S4--C3 S5--C3 S5--C4 S6--C4
```

> Set the `type` vertex attribute so that igraph treats the graph as bipartite.

```R=
V(bn2)$type <- V(bn2)$name %in% el.df[,1]
bn2
# IGRAPH 2e7b9de UN-B 10 11 --
# + attr: name (v/c), type (v/l)
# + edges from 2e7b9de (vertex names):
# [1] S1--C1 S2--C1 S2--C2 S3--C1 S3--C2 S3--C3 S4--C2 S4--C3 S5--C3 S5--C4 S6--C4
```
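
The incidence matrix and the edge list above are two encodings of the same affiliation data. The following is a minimal base R sketch (no igraph calls) that checks this by cross-tabulating the edge list and comparing it with the incidence matrix. The names `inc_from_el`, `aff.mat`, `same_structure`, `nodes`, and `node_type` are introduced here for illustration only and are not part of the original notes; the edge list is also retyped with explicit character columns, which is an assumption made so the sketch behaves the same across R versions.

```R
# Edge list from the section above: one row per (student, club) membership.
el.df <- data.frame(
  X1 = c("S1","S2","S2","S3","S3","S3","S4","S4","S5","S5","S6"),
  X2 = c("C1","C1","C2","C1","C2","C3","C2","C3","C3","C4","C4"),
  stringsAsFactors = FALSE
)

# Cross-tabulate students (rows) against clubs (columns).
inc_from_el <- unclass(table(el.df$X1, el.df$X2))

# Incidence matrix from the section above, rebuilt as a plain matrix.
aff.mat <- cbind(C1 = c(1,1,1,0,0,0),
                 C2 = c(0,1,1,1,0,0),
                 C3 = c(0,0,1,1,1,0),
                 C4 = c(0,0,0,0,1,1))
rownames(aff.mat) <- paste0("S", 1:6)

# TRUE when both encodings describe exactly the same memberships.
same_structure <- all(inc_from_el == aff.mat)

# Node list in order of first appearance, and a logical flag marking
# the nodes that come from the first edge-list column (the students).
nodes <- unique(c(el.df$X1, el.df$X2))
node_type <- nodes %in% el.df$X1
```

Note that a flag built from the first edge-list column marks the students as `TRUE`, whereas the `V(bn)$type` output shown earlier marks the students as `FALSE` and the clubs as `TRUE`. Either coding separates the two node sets, but code that indexes on `type` (such as the `colors[V(bn)$type+1]` plotting call) will swap its assignments between the two.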