# DRAGON: Detection of Related Account Groups for Online services with uncertain graphs ### Purpose of this study Use DRAGON to detect suspicious groups. ### Backgrounds Existing methods usually focus on the **behavior level** (association bases on their behaviors using services). While this study focuses on **identity level** (relationship between service account and physical world). ### The structure of DRAGON 1. Using **web tracking** to track visitors. Three approaches are introduced: * HTTP cookie * Store in user's browser. * Can be cleared or using incognito mode. * IP Address * Stateless (not store in user's device). * 87% user reusing at least one IP over a month. * browser fringerprinting * Collect device information. * Reveal OS, hardware version, or geolocation. 2. **Modeling a bipartite graph** between physical devices and service accounts. * $e(M,N)$ is added in the graph if an identifier $M$ appears when the account $N$ is signed in. * Unweighted: the times it signs in are ignored. 3. Project the bipartite graph into an **uncertain graph**. * An **unvertain graph** $\mathcal{G}$ is a graph with a probability function $p:e\rightarrow (0,1]$ $\forall e\in \mathcal{E}(\mathcal{G})$. Note that the probability of each edge are independent. * The **Identity Degree** $n$ is the number of common identifiers they share with each other given any two accounts. * The **Coincident identifier probability** $q$ is the probability that one identifier originates from different devices. * We can now construct an uncertain graph $\mathcal{G}=(\mathcal{V},\mathcal{E},p)$ by the Bipartite gragh $\mathcal{B}(M,N,E)$: * $\mathcal{V} = N$. * $\mathcal{E} = \{(x,y)|x,y\in \mathcal{V}\}$. * $\forall e \in \mathcal{E}$, $p(e)=1-q^n$, where $n=|\{m\in M|e=(x,y), (m,x)\in E,(m,y)\in E\}|$. 4. Clipping at low identity degree * Set a **clipping threshold $n_{clip}$.** * Replace $p$ by $p(e)=(1-q^n)[n>n_{clip}]$. 5. **Cluster** the uncertain graph * Uncertain graph clustering algorithm(better). * Markov clustering algorithm. 6. Detecting suspicious account groups * A tendency: malicious groups usually use more device to manipulate a set of accounts than normal people do. * The **Expected Account Relation Ratio (EAR)** is the expected ratio of the number of edge existence to the number of all edges in the community: $$\displaystyle EAR = \sum_{G_i\subseteq\mathcal{G_c}}Pr(G_i)\cdot\frac{|E_i|}{|\mathcal{E}|} = \sum_{e\in \mathcal{E}}\frac{p(e)}{|\mathcal{E}|},$$ where $|\mathcal{E}|$ denotes the edge number in the uncertain graph $\mathcal{G_c}$ and $|E_i|$ stands for the edge number in a possible world $G_i$. The second equation due to the independency of edges. * A thershold of EAR can then be used to determine whether an account group is suspicious. ### Real-world evaluation of DRAGON 1. Real-world Dataset * 220 thousand login records collected for six months. * Each of the login records contains an anonymized account id and a User-Agent request header. * Only $0.2\%$ accounts are marked as malicious ones. 2. Approoaches * Set $q = 0.6$ since around 60% identifiers are found not unique in a dataset of a recent study. * Set $n_{clip}=1$ * Set the threshold of EAR in DRAGON when it achieves the best F1 score. ($0.9$) 3. Result * V.S. binary classifier: DRAGON greatly reduces the false positives. * Providing precision-recall curves, we can also see DRAGON is better than binary classifier and $n_{clip} = 1$ brings the best performance. * DRAGON can discover not only suspicious accounts but also the relationships among these accounts. 4. Ablation studies * The importance of identifier-account relations is shown by replacing all edges with 220 thousand randomly generated ones. * The performance of DRAGON without clipping is shown. * Results indicate UGC performs better than MCL in this task.