# Superspreaders and Superblockers Based Community Evolution Tracking in Dynamic Social Networks [TOC] [ZhiouXu, Xiaobin Rui, Jing He, Zhixiao Wang, T. Hadzibeganovic (2019) *Superspreaders and superblockers based community evolution tracking in dynamic social networks*](https://www.sciencedirect.com/science/article/abs/pii/S0950705119306264) ###### tags: `articles` ## Abstract - tracking communities in dynamic social network - dynamic community detection - incremental clustering - error accumulation - evolutionary events identification - cored-node-based methods - reduced accurency duo to non-distinction of heterogeneous contrib - what's new - error accumulation sensitive(EAS) incremental community detection - superspreaders and superblockers(SAS) based community evolution tracking method - two stage EAS-SAS approach ## Introduction ### Dynamic Community Detection discover nodes with more intensive internal links - 變化時重算 - 耗時 - incremental clustering - 變化時計算變化 - 誤差隨時間擴大 ### Evolutionary Events Identification - superspreader - birth, merge, expension - superblocker - death, split, shrink ### EAS-SAS approach #### EAS estimate the error accumulation, when exceeds a given threshold, totally recalculate #### SAS identify superspreaders and superblockers uncover the birth and death with superspreaders and superblockers respectively #### Summary - EAS - balancer between estimated partition accuracy and time cost(BATC) - optimize the threshold - define evolutionary events identification in SAS algorithm - numerical experiment on symthetic and real-world dynamic social networks ## Proposed Method ### Dynamic Network A *dynamic network* $G=(V,E)$ consists of $T$ snapshots - $G=\{G_t\mid t=1,2,\ldots,T\}$ - $G_t=(V_t,E_t)$ - $V=\{V_t\mid t=1,2,\ldots,T\}$ - $E=\{E_t\mid t=1,2,\ldots,T\}$ - the *comminuty structure* $C=\{C_t\mid t=1,2,\ldots,T\}$ - $C_t=\{c_t^i\mid i=1,2,\ldots,N_t\}$ be a set of communities ### Incremental Community Detection Compute $C_{t+1}$ based on $C_t$ and updated part of $G_{t+1}$ - the *Louvain algorithm* - *[V.D. Blondel, J.L. Guillaume, R. Lambiotte "Fast unfolding of communities in large networks" J. Stat. Mech. 2008(10) P10008](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Fast+unfolding+of+communities+in+large+networks&btnG=)* ![](https://i.imgur.com/dyRQSUP.png) ### Superspreaders and Superblockers $n$ *superspreaders* are $n$ core nodes maximizing the average node number affected by a spreading process launched by all of them $n$ *superblockers* are $n$ core nodes leading to the minimum average size of outbreak when immunizing all of them - choose superblockers by *CoreHD algorithm* - *[L. Zdeborová, P. Zhang, H.J. Zhou "Fast and simple decycling and dismantling of networks" Scientific Reports 6, 37954 (2016)](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Fast+and+simple+decycling+and+dismantling+of+networks&btnG=)* - choose superspreaders by *degree discount algorithm* - *[W. Chen, Y. Wang, S. Yang "Efficient influence maximization in social networks" ACM 2009 pp. 199-208](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=Efficient+influence+maximization+in+social+networks&btnG=)* ![](https://i.imgur.com/5lxQGo3.png) ![](https://i.imgur.com/DJOGJVW.png) ### Incremental Error Accumulation Given a snapshot $G_t=(V_t,E_t)$, the *incremental error accumulation* of $G_t$ is estimated by $$ IEA_t=\frac{\sum_{i=t_0}^{t}(t+1-i)|\Delta V_i|}{\sum_{i=t_0}^t|V_i|} $$ where - $\Delta V_i$ is the set of incremental nodes in $G_i$ - $t_0$ is the last totally partition time ### EAS incremental Community Detection Algorithm ![](https://i.imgur.com/p8ZQG2s.png) ### Balancer between Estimated Partition Accuracy and Time Cost $$ BATC=\frac{\sum_{i=1}^TCC_t}{freq+ratio\times(T-freq)} $$ where - $freq$ is the times of re-partition operation - $$ratio=\frac{\sum_{t=1}^T|\Delta E_t|}{\sum_{t=1}^T|E_t|}$$ - $\Delta E_t$ is the set of incremental edge set of $G_t$ - $$CC_t=\begin{cases}\frac{|E_t\cap E_{t_0}|}{|E_t|}&\mbox{if}&IEA_t\leq threshold\\1&\mbox{if}&IEA_t>threshold\end{cases}$$ - $t_0$ is the last totally partition time ![](https://i.imgur.com/exaNVW1.png) ### Community Evolution Events #### Birth An event is a *birth* event of $c_t^i$ if all superspeaders in $c_t^i$ are not a superspeader in $G_{t-1}$ $$ Birth(C_t^i)=1\quad\mbox{if}\\ \forall\,ss_t^i\in c_t^i\cap SS_t,\quad ss_t^i\not\in SS_{t-1} $$ #### Death An event is a *death* event of $c_{t-1}^i$ if all superblockers in $c_{t-1}^i$ are not superblocker in $G_t$ $$ Death(c_{t-1}^i)=1\quad\mbox{if}\\ \forall\,sb_{t-1}^i\in c_{t-1}^i\cap SB_{t-1},\quad sb_{t-1}^i\not\in SB_{t} $$ #### Merging An event is a *merging* event of $c_{t-1}^i,c_{t-1}^j$ into $c_t^k$ if some superspreaders appears in $c_t^k$ $$ Merging(c_{t-1}^i,c_{t-1}^j)=\{c_t^k\}\quad\mbox{if}\\ \exists\,ss_{t-1}^i\in c_{t-1}^i\cap SS_{t-1},ss_{t-1}^j\in c_{t-1}^j\cap SS_{t-1},\quad ss_{t-1}^i,ss_{t-1}^j\in c_t^k $$ #### Split An event is a *split event* of $c_{t-1}^i$ into $c_t^k$ and $c_t^l$ if some superblockers of $c_{t-1}^i$ appears in both $c_t^k$ and $c_t^l$ $$ Split(c_{t-1}^i)=\{c_t^k,c_t^l\}\quad\mbox{if}\\ \exists\,sb_{t-1}^i,sb_{t-1}^j\in c_{t-1}^i\cap SB_{t-1},\quad sb_{t-1}^i\in c_t^k,sb_{t-1}^j\in c_t^l $$ #### Expansion An event is an *expansion event* of $c_{t-1}^i$ into $c_t^k$ if $c_t^k$ has more and contains all superspeaders of $c_{t-1}^i$ $$ Expansion(c_{t-1}^i)=\{c_t^k\}\quad\mbox{if}\\ c_{t-1}^i\cap SS_{t-1}\subsetneq c_t^k\cap SS_t $$ #### Shrink An event is a *shrink event* of $c_{t-1}^i$ into $c_t^k$ if all superblocks of $c_t^k$ are in but not all superblockers of $c_{t-1}^i$ $$ Shrink(c_{t-1}^i)=\{c_t^k\}\quad\mbox{if}\\ c_{t-1}^i\cap SB_{t-1}\supsetneq c_t^k\cap SB_t $$ ![](https://i.imgur.com/qUusYcv.png) ## Experiments ### Network datasets used - Real-world networks: 1. **arXiv** - A well-known e-print citation network dataset - Articles are nodes and citations are edges - 13 consecutive snapshots 2. **Enron** - A network containing emails exchanged between employees of Enron - Employees are nodes and emails are edges - 10 consecutive snapshots 3. **VAST2008** - A network containing the information about 9834 calls among 400 people over a 10-day period in June 2006 - 10 consecutive snapshots - Synthetic networks for community detection: 1. **SYN-FIX** - A dynamic network with a fixed community number - 10 consecutive snapshots 2. **SYN-VAR** - A dynamic network with a varying community number - 10 consecutive snapshots - Synthetic networks for evolutionary events identification: - We generated two different synthetic networks using the method by Greene et al. - Let's call them _Syn1_ and _Syn2_ respectively ### Performance of community detection methods - **Control groups:** typical incremental clustering methods 1. QCA 2. BatchInc 3. LBTR - **Experimental groups:** EAS algorithm combined with typical methods 1. EAS-QCA 2. EAS-BatchInc 3. EAS-LBTR :::info **Procedure and results** 1. **_Algorithm 2_** was used to obtain the optimal threshold for these networks - Real-world networks |arXiv|Enron|VAST2008| |---|---|---| |![](https://i.imgur.com/wNEeZRb.jpg)|![](https://i.imgur.com/2utI5by.jpg)|![](https://i.imgur.com/5kYwk8K.jpg)| |optimal threshold = 0.21|optimal threshold = 1.11|optimal threshold = 0.88| - Synthetic networks |SYN-FIX|SYN-VAR| |---|---| |![](https://i.imgur.com/o5nqYIq.jpg)|![](https://i.imgur.com/06KXOTE.jpg)| |optimal threshold = 0.99|optimal threshold = 0.92| 2. **NMI ([Normalized Mutual Information](https://course.ccs.neu.edu/cs6140sp15/7_locality_cluster/Assignment-6/NMI.pdf))** method was employed to evaluate the partition performance of different algorithms - Real-world networks: _Since the ground-truth community structures are unavailable, we used the partition results of the Louvain algorithm as the standard reference._ - arXiv ![](https://i.imgur.com/ayRV49s.jpg) - Enron ![](https://i.imgur.com/NJaB6o9.jpg) - VAST2008 ![](https://i.imgur.com/mPtl0o5.jpg) - Synthetic networks - SYN-FIX ![](https://i.imgur.com/jrnCbhA.jpg) - SYN-VAR ![](https://i.imgur.com/pZY47BD.jpg) 3. We further analyzed the efficiency of our proposed method ![](https://i.imgur.com/x4qvowN.jpg) - EAS methods did not show any significant increases in the running-time costs, in spite of several repeated re-partitionings ::: ### Performance of the SAS algorithm - Tests on real-world networks - arXiv ![](https://i.imgur.com/ZBw99MN.jpg) - Enron ![](https://i.imgur.com/FBe6BDu.jpg) - VAST2008 ![](https://i.imgur.com/nhJfDwO.jpg) - Tests on synthetic networks (_Syn1_ and _Syn2_) - The generator provides the **ground truth** about community partition and evolution, so let's compare the accuracy of our proposed method with 5 other methods - **Experimental groups:** SAS - **Control groups:** iLCD, Greene, Facetnet, HOCTracker, gravitational - Results of Syn1 ![](https://i.imgur.com/MVYMHJ8.jpg) - Results of Syn2 ![](https://i.imgur.com/MayJEvT.jpg) ## Discussion and conclusions - EAS incremental community detection method estimates the error accumulation degree via the IEA metric, and repartitions the network if the estimated IEA value exceeds a pre-selected threshold - We also gives a proof for the existence of the optimal threshold, which maximizes the BATC value - We utilized superspreader and superblocker nodes to identify critical evolution events in dynamic social networks - Numerical experiments conducted on both synthetic and real-world networks demonstrated that our novel two-stage method outperforms a total of eight competing methods