# Superspreaders and Superblockers Based Community Evolution Tracking in Dynamic Social Networks
[TOC]
[ZhiouXu, Xiaobin Rui, Jing He, Zhixiao Wang, T. Hadzibeganovic (2019) *Superspreaders and superblockers based community evolution tracking in dynamic social networks*](https://www.sciencedirect.com/science/article/abs/pii/S0950705119306264)
###### tags: `articles`
## Abstract
- tracking communities in dynamic social network
- dynamic community detection
- incremental clustering
- error accumulation
- evolutionary events identification
- cored-node-based methods
- reduced accurency duo to non-distinction of heterogeneous contrib
- what's new
- error accumulation sensitive(EAS) incremental community detection
- superspreaders and superblockers(SAS) based community evolution tracking method
- two stage EAS-SAS approach
## Introduction
### Dynamic Community Detection
discover nodes with more intensive internal links
- 變化時重算
- 耗時
- incremental clustering
- 變化時計算變化
- 誤差隨時間擴大
### Evolutionary Events Identification
- superspreader
- birth, merge, expension
- superblocker
- death, split, shrink
### EAS-SAS approach
#### EAS
estimate the error accumulation, when exceeds a given threshold, totally recalculate
#### SAS
identify superspreaders and superblockers uncover the birth and death with superspreaders and superblockers respectively
#### Summary
- EAS
- balancer between estimated partition accuracy and time cost(BATC)
- optimize the threshold
- define evolutionary events identification in SAS algorithm
- numerical experiment on symthetic and real-world dynamic social networks
## Proposed Method
### Dynamic Network
A *dynamic network* $G=(V,E)$ consists of $T$ snapshots
- $G=\{G_t\mid t=1,2,\ldots,T\}$
- $G_t=(V_t,E_t)$
- $V=\{V_t\mid t=1,2,\ldots,T\}$
- $E=\{E_t\mid t=1,2,\ldots,T\}$
- the *comminuty structure* $C=\{C_t\mid t=1,2,\ldots,T\}$
- $C_t=\{c_t^i\mid i=1,2,\ldots,N_t\}$ be a set of communities
### Incremental Community Detection
Compute $C_{t+1}$ based on $C_t$ and updated part of $G_{t+1}$
- the *Louvain algorithm*
- *[V.D. Blondel, J.L. Guillaume, R. Lambiotte "Fast unfolding of communities in large networks" J. Stat. Mech. 2008(10) P10008](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Fast+unfolding+of+communities+in+large+networks&btnG=)*

### Superspreaders and Superblockers
$n$ *superspreaders* are $n$ core nodes maximizing the average node number affected by a spreading process launched by all of them
$n$ *superblockers* are $n$ core nodes leading to the minimum average size of outbreak when immunizing all of them
- choose superblockers by *CoreHD algorithm*
- *[L. Zdeborová, P. Zhang, H.J. Zhou "Fast and simple decycling and dismantling of networks" Scientific Reports 6, 37954 (2016)](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Fast+and+simple+decycling+and+dismantling+of+networks&btnG=)*
- choose superspreaders by *degree discount algorithm*
- *[W. Chen, Y. Wang, S. Yang "Efficient influence maximization in social networks" ACM 2009 pp. 199-208](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Efficient+influence+maximization+in+social+networks&btnG=)*


### Incremental Error Accumulation
Given a snapshot $G_t=(V_t,E_t)$, the *incremental error accumulation* of $G_t$ is estimated by
$$
IEA_t=\frac{\sum_{i=t_0}^{t}(t+1-i)|\Delta V_i|}{\sum_{i=t_0}^t|V_i|}
$$
where
- $\Delta V_i$ is the set of incremental nodes in $G_i$
- $t_0$ is the last totally partition time
### EAS incremental Community Detection Algorithm

### Balancer between Estimated Partition Accuracy and Time Cost
$$
BATC=\frac{\sum_{i=1}^TCC_t}{freq+ratio\times(T-freq)}
$$
where
- $freq$ is the times of re-partition operation
- $$ratio=\frac{\sum_{t=1}^T|\Delta E_t|}{\sum_{t=1}^T|E_t|}$$
- $\Delta E_t$ is the set of incremental edge set of $G_t$
- $$CC_t=\begin{cases}\frac{|E_t\cap E_{t_0}|}{|E_t|}&\mbox{if}&IEA_t\leq threshold\\1&\mbox{if}&IEA_t>threshold\end{cases}$$
- $t_0$ is the last totally partition time

### Community Evolution Events
#### Birth
An event is a *birth* event of $c_t^i$ if all superspeaders in $c_t^i$ are not a superspeader in $G_{t-1}$
$$
Birth(C_t^i)=1\quad\mbox{if}\\
\forall\,ss_t^i\in c_t^i\cap SS_t,\quad ss_t^i\not\in SS_{t-1}
$$
#### Death
An event is a *death* event of $c_{t-1}^i$ if all superblockers in $c_{t-1}^i$ are not superblocker in $G_t$
$$
Death(c_{t-1}^i)=1\quad\mbox{if}\\
\forall\,sb_{t-1}^i\in c_{t-1}^i\cap SB_{t-1},\quad sb_{t-1}^i\not\in SB_{t}
$$
#### Merging
An event is a *merging* event of $c_{t-1}^i,c_{t-1}^j$ into $c_t^k$ if some superspreaders appears in $c_t^k$
$$
Merging(c_{t-1}^i,c_{t-1}^j)=\{c_t^k\}\quad\mbox{if}\\
\exists\,ss_{t-1}^i\in c_{t-1}^i\cap SS_{t-1},ss_{t-1}^j\in c_{t-1}^j\cap SS_{t-1},\quad ss_{t-1}^i,ss_{t-1}^j\in c_t^k
$$
#### Split
An event is a *split event* of $c_{t-1}^i$ into $c_t^k$ and $c_t^l$ if some superblockers of $c_{t-1}^i$ appears in both $c_t^k$ and $c_t^l$
$$
Split(c_{t-1}^i)=\{c_t^k,c_t^l\}\quad\mbox{if}\\
\exists\,sb_{t-1}^i,sb_{t-1}^j\in c_{t-1}^i\cap SB_{t-1},\quad sb_{t-1}^i\in c_t^k,sb_{t-1}^j\in c_t^l
$$
#### Expansion
An event is an *expansion event* of $c_{t-1}^i$ into $c_t^k$ if $c_t^k$ has more and contains all superspeaders of $c_{t-1}^i$
$$
Expansion(c_{t-1}^i)=\{c_t^k\}\quad\mbox{if}\\
c_{t-1}^i\cap SS_{t-1}\subsetneq c_t^k\cap SS_t
$$
#### Shrink
An event is a *shrink event* of $c_{t-1}^i$ into $c_t^k$ if all superblocks of $c_t^k$ are in but not all superblockers of $c_{t-1}^i$
$$
Shrink(c_{t-1}^i)=\{c_t^k\}\quad\mbox{if}\\
c_{t-1}^i\cap SB_{t-1}\supsetneq c_t^k\cap SB_t
$$

## Experiments
### Network datasets used
- Real-world networks:
1. **arXiv**
- A well-known e-print citation network dataset
- Articles are nodes and citations are edges
- 13 consecutive snapshots
2. **Enron**
- A network containing emails exchanged between employees of Enron
- Employees are nodes and emails are edges
- 10 consecutive snapshots
3. **VAST2008**
- A network containing the information about 9834 calls among 400 people over a 10-day period in June 2006
- 10 consecutive snapshots
- Synthetic networks for community detection:
1. **SYN-FIX**
- A dynamic network with a fixed community number
- 10 consecutive snapshots
2. **SYN-VAR**
- A dynamic network with a varying community number
- 10 consecutive snapshots
- Synthetic networks for evolutionary events identification:
- We generated two different synthetic networks using the method by Greene et al.
- Let's call them _Syn1_ and _Syn2_ respectively
### Performance of community detection methods
- **Control groups:** typical incremental clustering methods
1. QCA
2. BatchInc
3. LBTR
- **Experimental groups:** EAS algorithm combined with typical methods
1. EAS-QCA
2. EAS-BatchInc
3. EAS-LBTR
:::info
**Procedure and results**
1. **_Algorithm 2_** was used to obtain the optimal threshold for these networks
- Real-world networks
|arXiv|Enron|VAST2008|
|---|---|---|
||||
|optimal threshold = 0.21|optimal threshold = 1.11|optimal threshold = 0.88|
- Synthetic networks
|SYN-FIX|SYN-VAR|
|---|---|
|||
|optimal threshold = 0.99|optimal threshold = 0.92|
2. **NMI ([Normalized Mutual Information](https://course.ccs.neu.edu/cs6140sp15/7_locality_cluster/Assignment-6/NMI.pdf))** method was employed to evaluate the partition performance of different algorithms
- Real-world networks:
_Since the ground-truth community structures are unavailable, we used the partition results of the Louvain algorithm as the standard reference._
- arXiv

- Enron

- VAST2008

- Synthetic networks
- SYN-FIX

- SYN-VAR

3. We further analyzed the efficiency of our proposed method

- EAS methods did not show any significant increases in the running-time costs, in spite of several repeated re-partitionings
:::
### Performance of the SAS algorithm
- Tests on real-world networks
- arXiv

- Enron

- VAST2008

- Tests on synthetic networks (_Syn1_ and _Syn2_)
- The generator provides the **ground truth** about community partition and evolution, so let's compare the accuracy of our proposed method with 5 other methods
- **Experimental groups:** SAS
- **Control groups:** iLCD, Greene, Facetnet, HOCTracker, gravitational
- Results of Syn1

- Results of Syn2

## Discussion and conclusions
- EAS incremental community detection method estimates the error accumulation degree via the IEA metric, and repartitions the network if the estimated IEA value exceeds a pre-selected threshold
- We also gives a proof for the existence of the optimal threshold, which maximizes the BATC value
- We utilized superspreader and superblocker nodes to identify critical evolution events in dynamic social networks
- Numerical experiments conducted on both synthetic and real-world networks demonstrated that our novel two-stage method outperforms a total of eight competing methods