# DRAGON: Detection of Related Account Groups for Online services with uncertain graphs
### Purpose of this study
Use DRAGON to detect suspicious groups.
### Backgrounds
Existing methods usually focus on the **behavior level** (association bases on their behaviors using services). While this study focuses on **identity level** (relationship between service account and physical world).
### The structure of DRAGON
1. Using **web tracking** to track visitors. Three approaches are introduced:
* HTTP cookie
* Store in user's browser.
* Can be cleared or using incognito mode.
* IP Address
* Stateless (not store in user's device).
* 87% user reusing at least one IP over a month.
* browser fringerprinting
* Collect device information.
* Reveal OS, hardware version, or geolocation.
2. **Modeling a bipartite graph** between physical devices and service accounts.
* $e(M,N)$ is added in the graph if an identifier $M$ appears when the account $N$ is signed in.
* Unweighted: the times it signs in are ignored.
3. Project the bipartite graph into an **uncertain graph**.
* An **unvertain graph** $\mathcal{G}$ is a graph with a probability function $p:e\rightarrow (0,1]$ $\forall e\in \mathcal{E}(\mathcal{G})$. Note that the probability of each edge are independent.
* The **Identity Degree** $n$ is the number of common identifiers they share with each other given any two accounts.
* The **Coincident identifier probability** $q$ is the probability that one identifier originates from different devices.
* We can now construct an uncertain graph $\mathcal{G}=(\mathcal{V},\mathcal{E},p)$ by the Bipartite gragh $\mathcal{B}(M,N,E)$:
* $\mathcal{V} = N$.
* $\mathcal{E} = \{(x,y)|x,y\in \mathcal{V}\}$.
* $\forall e \in \mathcal{E}$, $p(e)=1-q^n$, where $n=|\{m\in M|e=(x,y), (m,x)\in E,(m,y)\in E\}|$.
4. Clipping at low identity degree
* Set a **clipping threshold $n_{clip}$.**
* Replace $p$ by $p(e)=(1-q^n)[n>n_{clip}]$.
5. **Cluster** the uncertain graph
* Uncertain graph clustering algorithm(better).
* Markov clustering algorithm.
6. Detecting suspicious account groups
* A tendency: malicious groups usually use more device to manipulate a set of accounts than normal people do.
* The **Expected Account Relation Ratio (EAR)** is the expected ratio of the number of edge existence to the number of all edges in the community:
$$\displaystyle EAR = \sum_{G_i\subseteq\mathcal{G_c}}Pr(G_i)\cdot\frac{|E_i|}{|\mathcal{E}|} = \sum_{e\in \mathcal{E}}\frac{p(e)}{|\mathcal{E}|},$$
where $|\mathcal{E}|$ denotes the edge number in the uncertain graph $\mathcal{G_c}$ and $|E_i|$ stands for the edge number in a possible world $G_i$. The second equation due to the independency of edges.
* A thershold of EAR can then be used to determine whether an account group is suspicious.
### Real-world evaluation of DRAGON
1. Real-world Dataset
* 220 thousand login records collected for six months.
* Each of the login records contains an anonymized account id and a User-Agent request header.
* Only $0.2\%$ accounts are marked as malicious ones.
2. Approoaches
* Set $q = 0.6$ since around 60% identifiers are found not unique in a dataset of a recent study.
* Set $n_{clip}=1$
* Set the threshold of EAR in DRAGON when it achieves the best F1 score. ($0.9$)
3. Result
* V.S. binary classifier: DRAGON greatly reduces the false positives.
* Providing precision-recall curves, we can also see DRAGON is better than binary classifier and $n_{clip} = 1$ brings the best performance.
* DRAGON can discover not only suspicious accounts but also the relationships among these accounts.
4. Ablation studies
* The importance of identifier-account relations is shown by replacing all edges with 220 thousand randomly generated ones.
* The performance of DRAGON without clipping is shown.
* Results indicate UGC performs better than MCL in this task.