
# Case Study: Network Graph with Channels and Users

## A First Approach

### Goal

We aim to demonstrate the potential of creating relationships between channels and users using a network graph approach. Our objectives are to:

1. Understand common behaviors and patterns among channels.
2. Create clusters of channels and users to gain insights into their characteristics and detect fraudulent behavior.

In this first attempt, we aim to detect:

1. Channels that are similar enough to suggest that they are managed by the same group of people or by colluding users.
2. For users, similarities in the features of the clusters created.

### Context

#### Feature Engineering

To work with channels, we generate a first relationship between channels based on three variables: the type of signal, the commodity mentioned, and the time frame of the messages. Using these variables, we compute weighted correlations between channels and use them to assign a numeric value to the relative importance of each relationship.

For users, we created a similar coefficient, based instead on the number of groups a user shares with other users and the language of the messages in those channels.

With these relationships established, we create two network graphs:

1. `(Channels)-[:importance]->(Channels)`
2. `(Users)-[:importance]->(Users)`

(Note: "()" represents nodes, and "-[:importance]->" represents the weighted edges of the relationships.)

#### Pattern Detection

With the network graph created, we can add new properties to users and channels to reveal the connections between them. We use centrality measures and cluster creation to achieve this; both are key in this use case, as we will see next.

### Use Cases

After running the algorithm and applying filters to show only relevant nodes (*measured by the relative importance of their connections*), we can see the following figure:

The size of each node represents its degree of centrality, and the color represents the cluster to which it belongs.
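The Feature Engineering and Pattern Detection steps above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions, not the pipeline behind the figures. Each channel is assumed to be summarised as message counts per signal type, commodity, and posting time window. Cosine similarity stands in for the "weighted correlation". The feature weights (0.4/0.4/0.2), the `min_importance` filter, and all function names are hypothetical. Degree centrality (node size) and weighted label propagation (node color) are also swapped-in choices, because the section does not name the algorithms it used.

```python
from collections import Counter
from itertools import combinations

# Assumed weights for the three channel features (hypothetical values).
FEATURE_WEIGHTS = {"signal": 0.4, "commodity": 0.4, "timeframe": 0.2}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def importance(p: dict, q: dict) -> float:
    """Edge weight: weighted sum of per-feature similarities, in [0, 1]."""
    return sum(w * cosine(p[f], q[f]) for f, w in FEATURE_WEIGHTS.items())


def build_graph(profiles: dict, min_importance: float) -> dict:
    """Undirected weighted adjacency dict. Edges below the filter are dropped,
    and channels left without any edge do not appear in the graph."""
    graph: dict = {}
    for u, v in combinations(profiles, 2):
        w = importance(profiles[u], profiles[v])
        if w >= min_importance:
            graph.setdefault(u, {})[v] = w
            graph.setdefault(v, {})[u] = w
    return graph


def degree_centrality(graph: dict) -> dict:
    """Share of the other nodes each node is connected to (used as node size)."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) if n > 1 else 0.0
            for node, nbrs in graph.items()}


def label_propagation(graph: dict, rounds: int = 10) -> dict:
    """Weighted label propagation (used as node color): each node repeatedly
    adopts the label with the largest total edge weight among its neighbours."""
    labels = {node: node for node in graph}
    for _ in range(rounds):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for nbr, w in graph[node].items():
                scores[labels[nbr]] += w
            # Ties are broken by label name so the result is deterministic.
            best = max(scores.items(), key=lambda kv: (kv[1], kv[0]))[0]
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Toy profiles (made-up data): message counts per feature value.
profiles = {
    "A": {"signal": Counter(buy=5, sell=1), "commodity": Counter(BTC=6), "timeframe": Counter(h18=6)},
    "B": {"signal": Counter(buy=4, sell=1), "commodity": Counter(BTC=5), "timeframe": Counter(h18=5)},
    "C": {"signal": Counter(pump=5), "commodity": Counter(DOGE=5), "timeframe": Counter(h02=5)},
    "D": {"signal": Counter(pump=3), "commodity": Counter(DOGE=3), "timeframe": Counter(h02=3)},
}

graph = build_graph(profiles, min_importance=0.5)
node_size = degree_centrality(graph)
node_color = label_propagation(graph)
print(node_color)  # {'A': 'B', 'B': 'B', 'C': 'D', 'D': 'D'}
```

Label propagation is used here only because it is short and needs no dependencies. In practice a graph library (for example NetworkX or Neo4j Graph Data Science) would typically supply centrality and community detection, including alternatives such as Louvain.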
#### Channels analysis

Let's examine clusters that are not at the core of the graph but have strong relationships:

![](https://i.imgur.com/6YAAoMY.jpg)

We can zoom in between clusters and manually check the Telegram channels:

![](https://i.imgur.com/UtF9J1y.jpg)

**Red Cluster** has these groups:

1. Premiumbinancesignal
2. BinanceSygnalsTurkey

These groups have the following type of message:

![searchforit]

**Brown Cluster** has these groups:

1. CoinCoachSignals
2. VIPExpertSignals

These groups have the following type of message:

![](https://i.imgur.com/djsjUfQ.jpg)
![](https://i.imgur.com/1w4r6EI.jpg)

**Light Blue Cluster** has these groups:

1. crypto_pump_island
2. bitcoinpumpgroup

These groups have the following type of message:

![](https://i.imgur.com/WFdsBn3.jpg)
![](https://i.imgur.com/xVWf8kp.jpg)

**Green Cluster** has these groups:

1. cprofit
2. PrivateGroupforpump

These groups have the following type of message:

![](https://i.imgur.com/xXXdC7s.jpg)
![](https://i.imgur.com/HxcOkP2.jpg)

As we can see, within each cluster the channels share the same patterns: the type of emojis, the structure of the text, the images they send, and other characteristics that suggest they are potentially managed by the same group of users.

Coming back to the first image, we can also see a central group of clusters with strong relations between them. We decided to check those groups and found that they share some interesting similarities, while we can still differentiate clusters among them.
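The manual comparison above (same emojis, same text structure) could be approximated with a simple near-duplicate score between messages. This is a minimal sketch, assuming message texts are available per channel. Character 4-gram shingles with digits masked, a Jaccard blend with emoji overlap, the 0.3 emoji weight, the 0.8 threshold, and all function names are hypothetical choices rather than the method used in this study. Image similarity is not covered.

```python
import re
import unicodedata


def emoji_set(text: str) -> set:
    """Characters in Unicode category 'So' (Symbol, other), which covers most emoji."""
    return {ch for ch in text if unicodedata.category(ch) == "So"}


def shingles(text: str, k: int = 4) -> set:
    """Character k-grams after lower-casing, collapsing whitespace and masking
    digits, so different prices or targets in the same template still match."""
    norm = re.sub(r"\d+", "0", text.lower())
    norm = re.sub(r"\s+", " ", norm).strip()
    return {norm[i:i + k] for i in range(max(len(norm) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard index; two empty sets are treated as identical."""
    return len(a & b) / len(a | b) if a | b else 1.0


def template_similarity(msg_a: str, msg_b: str, emoji_weight: float = 0.3) -> float:
    """Blend of text-structure overlap and emoji overlap, in [0, 1]."""
    text_sim = jaccard(shingles(msg_a), shingles(msg_b))
    emoji_sim = jaccard(emoji_set(msg_a), emoji_set(msg_b))
    return (1 - emoji_weight) * text_sim + emoji_weight * emoji_sim


def channel_overlap(msgs_a: list, msgs_b: list, threshold: float = 0.8) -> float:
    """Fraction of channel A's messages that have a near-duplicate in channel B."""
    if not msgs_a or not msgs_b:
        return 0.0
    hits = sum(
        1 for a in msgs_a
        if max(template_similarity(a, b) for b in msgs_b) >= threshold
    )
    return hits / len(msgs_a)


# Made-up example messages.
m1 = "🚀 BUY #BTC at 42000, target 45000 🚀"
m2 = "🚀 BUY #BTC at 39000, target 41000 🚀"
m3 = "Join our VIP group now"
print(round(template_similarity(m1, m2), 2))  # 1.0
print(round(template_similarity(m1, m3), 2))  # 0.0
print(channel_overlap([m1, m3], [m2]))        # 0.5
```

Masking digits is the key design choice. Pump-and-dump templates usually reuse the same wording with different coins, prices, and targets, so exact-match comparison would miss them.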
**Yellow Cluster**

- criptomillonaire01
- scalping_300
- binancekillercalls

**Violet Cluster**

- BinanceSignalsAbuDh

**Orange Cluster**

- CryptoCashFlowReal

These groups have the following type of message:

![](https://i.imgur.com/45FTxO8.jpg)
![](https://i.imgur.com/6irfOb9.png)
![](https://i.imgur.com/gRjXDvM.png)
![](https://i.imgur.com/Yc1zlpe.png)
![](https://i.imgur.com/kXAHDpr.png)

They share several messages that are almost identical and others that are exactly the same. We can infer that they are managed by a set of users who are very likely the same people, or who are colluding to carry out the pumps and dumps.

Both types of clusters show that we are detecting real relations between channels, which we can exploit to gain a better understanding of these channels.

#### Users analysis

Let's look at the users graph, restricted to the users with the strongest relations:

![](https://i.imgur.com/iCMdfDR.jpg)

As expected, the user nodes create a denser graph than the channels graph. We can see 4 main clusters (violet, yellow, red, light blue) and 3 secondary clusters (green, orange, and grey). Our question here is: "What are these clusters telling us?"

By just adding one value to a property of the nodes (Detected characters = Arabic), we can see this:

![](https://i.imgur.com/IKz2e4N.jpg)
![](https://i.imgur.com/4eBAey1.jpg)

As a first approach, we can see that the clusters have very different densities of users who write in Arabic characters. In ascending order:

1. The light blue cluster has almost no users who use Arabic characters.
2. The violet cluster has only sporadic Arabic users.
3. The yellow cluster has almost all the Arabic users.

But that is just the surface. When checking the violet cluster, we can see that its Arabic users:

1. Have strong connections with the yellow cluster.
2. Are key pivotal nodes between a subset of strongly related nodes that are separated from the core of the whole graph.

As we can see in this zoom-in of the cluster:

![](https://i.imgur.com/uudHfeo.png)

This reveals some very powerful uses:

1. Create a denser graph by estimating the probability that a user is Arabic for every node where we don't have that information.
2. Focus on specific behaviors per cluster in order to detect relevant agents.
3. Add features/characteristics to central nodes that connect a group of users for whom we don't have access to that information.

It's important to highlight that this analysis is just the tip of the iceberg. We are still working on more powerful node and edge detection that will provide an even more detailed set of relations for both the channels and the users graphs.

### A deeper analysis of users

What if we look at the problem through a new lens? We can use 2 new pieces of information:

1. Highlight users who are group administrators.
2. Score users based on the relative importance of the groups they are part of.

By adding these metrics, we can split the work into 2 analyses:

1. The same perspective as before, but with administrators drawn at a bigger size.
2. A higher edge-strength requirement, to filter out less important detected relations.

#### Highlight administrators

In the image below we can see the graph structure:

![](https://i.imgur.com/1zUsiKV.png)

**All administrators belong to the same cluster!**

This is a huge step that raises several questions:

a. Who are the users that are in the same "administrator cluster" but are not administrators?

b. Are those administrators related somehow? Are they in cahoots?

We can also see that every administrator is more closely related to one specific cluster. Let's zoom in to see this in more detail:

![](https://i.imgur.com/VYdpU2h.png)

This is interesting for understanding who they are and what their zone of influence is!
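Two steps of the users analysis can be sketched in standard-library Python: the per-cluster share of Arabic users, and use 1 in the list above (estimating the probability that a user is Arabic where that information is missing). This is a minimal sketch under stated assumptions. "Detected characters = Arabic" is approximated by checking the Unicode Arabic-script blocks. The estimate uses a swapped-in technique, weighted-neighbour averaging (the harmonic-function form of label propagation): known users stay fixed and each unknown user takes the edge-weighted mean of its neighbours. The toy graph and all function names are hypothetical.

```python
from collections import Counter

# Unicode blocks for Arabic script: Arabic, Arabic Supplement, Arabic Extended-A,
# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
ARABIC_RANGES = [(0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF),
                 (0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]


def has_arabic(text: str) -> bool:
    """True if any character falls in one of the Arabic-script blocks."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ARABIC_RANGES)


def arabic_share_by_cluster(cluster_of: dict, is_arabic: dict) -> dict:
    """Share of Arabic users among the users with known language, per cluster."""
    totals, hits = Counter(), Counter()
    for user, flag in is_arabic.items():
        cluster = cluster_of.get(user)
        if cluster is not None:
            totals[cluster] += 1
            hits[cluster] += bool(flag)
    return {c: hits[c] / totals[c] for c in totals}


def estimate_arabic(graph: dict, known: dict, iterations: int = 50) -> dict:
    """Weighted-neighbour averaging. Known users stay fixed at 0.0 or 1.0.
    Unknown users start at the share of Arabic users among the known ones and
    then repeatedly take the edge-weighted mean of their neighbours."""
    prior = sum(known.values()) / len(known) if known else 0.5
    prob = {u: float(known.get(u, prior)) for u in graph}
    unknown = sorted(u for u in graph if u not in known)
    for _ in range(iterations):
        for u in unknown:
            total = sum(graph[u].values())
            if total:
                prob[u] = sum(w * prob[v] for v, w in graph[u].items()) / total
    return prob


# Made-up data: only u1 and u4 have messages we can inspect.
messages = {"u1": "مرحبا بالجميع", "u4": "hello everyone"}
known = {u: has_arabic(m) for u, m in messages.items()}  # u1: True, u4: False
graph = {
    "u1": {"u2": 1.0},
    "u2": {"u1": 1.0, "u3": 1.0},
    "u3": {"u2": 1.0, "u4": 1.0},
    "u4": {"u3": 1.0},
}
prob = estimate_arabic(graph, known)
print({u: round(p, 2) for u, p in prob.items()})
# {'u1': 1.0, 'u2': 0.67, 'u3': 0.33, 'u4': 0.0}
```

A user with no edges keeps the prior, which is why filtering weak edges too aggressively before this step can leave many users with an uninformative estimate.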
#### Increase the edge-strength requirement

Interestingly, when we do this, the administrators no longer share one cluster; each administrator now belongs to its own distinct cluster:

![](https://i.imgur.com/AVyKuPe.png)

As expected, each administrator becomes part of the cluster it was closest to. Let's zoom in again:

![](https://i.imgur.com/ZQ6PDJF.png)

Wait a second! We can see a clear relation when checking the language! This makes a lot of sense, right? We are automatically detecting a pattern of:

1. Potential collusion between administrators.
2. Clusters of users who work together.

Let's zoom in on two final areas:

![](https://i.imgur.com/WRAWj7u.png)
![](https://i.imgur.com/MTiyFmz.jpg)

As we can see, this different analysis leads to the same conclusion we reached before! There are zones with similar geographical movement of users. Some conclusions are obvious and some are not so obvious. Let's start with the obvious ones:

1. Users tend to work in the language they feel most comfortable with.
2. We can automatically detect administrators with a high probability of collusion within the same "language of influence".
3. We can segment users whose region we DON'T know and assign them the most probable one based on the clustering method.
4. We can find administrators from different regions who are highly likely to be working together!
5. We can add information about users we know very little about, based on relations, clustering, and machine learning modeling.

### Conclusions

The power of network graph analysis is huge. This first approach provides highly relevant information about users and channels that we can focus on and study further to detect the most important collusive and fraudulent behavior.

There are several points to improve, since we ran this analysis with a first approach to the detection of characteristics, the sophistication of the relation detection, and other relevant variables.
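Raising the edge-strength requirement and assigning the most probable region to users whose region is unknown (point 3 of the obvious conclusions) can be sketched together. This is a minimal sketch under stated assumptions. Connected components over the edges that pass the threshold stand in for re-running the clustering algorithm. Regions are inferred by majority vote of the known members of each component. The region labels, thresholds, and function names are hypothetical.

```python
from collections import Counter


def components_above(nodes, edges, min_weight):
    """Connected components after dropping edges lighter than `min_weight`
    (a simple stand-in for re-running the clustering), via union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, w in edges:
        if w >= min_weight:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def infer_regions(components, known_region):
    """Give each user without a region the majority region of its component,
    together with the share of known members that voted for it."""
    inferred = {}
    for comp in components:
        votes = Counter(known_region[n] for n in comp if n in known_region)
        if not votes:
            continue  # nobody to vote; leave these users unlabelled
        region, count = votes.most_common(1)[0]
        for n in comp:
            if n not in known_region:
                inferred[n] = (region, count / sum(votes.values()))
    return inferred


# Made-up data: (user, user, edge strength).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.2),
         ("d", "e", 0.95), ("e", "f", 0.85)]
known_region = {"a": "region_1", "b": "region_1", "d": "region_2"}

strict = components_above(nodes, edges, min_weight=0.5)
print(sorted(infer_regions(strict, known_region).items()))
# [('c', ('region_1', 1.0)), ('e', ('region_2', 1.0)), ('f', ('region_2', 1.0))]
```

With a loose threshold (for example `min_weight=0.1`) the weak `c`-`d` edge merges everything into one component, and `e` and `f` would be assigned `region_1` with only a 2/3 share. This mirrors how a stricter edge requirement separated the administrators' zones of influence.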
> **Some other unresolved and interesting questions:**
> *Who is that cluster of users that seems to have no administrator?
> (The orange one in the middle of the last graph)*
>
> This is a particularly interesting segment to study, and these are the questions we need to answer:
>
> * Are they key users recruiting others?
> * Are they very proactive users who participate in several pumps?
> * Are they related to a common administrator?
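A first step toward these questions is to list the clusters that contain no administrator and rank their members by how strongly they are tied to the rest of the cluster. This is a minimal sketch, assuming each user already has a cluster label and the set of administrator accounts is known. The function names, `min_size`, and toy data are hypothetical. Weighted degree is only a rough proxy: by itself it cannot tell a recruiter from a very active participant.

```python
from collections import Counter


def clusters_without_admin(cluster_of: dict, admins: set, min_size: int = 2) -> list:
    """(cluster, size) pairs for clusters that contain no administrator,
    largest first."""
    sizes = Counter(cluster_of.values())
    with_admin = {cluster_of[a] for a in admins if a in cluster_of}
    rows = [(c, n) for c, n in sizes.items() if c not in with_admin and n >= min_size]
    return sorted(rows, key=lambda row: (-row[1], str(row[0])))


def most_connected(graph: dict, members: set, top: int = 3) -> list:
    """Members ranked by the total weight of their edges inside the cluster."""
    strength = {
        u: sum(w for v, w in graph.get(u, {}).items() if v in members)
        for u in members
    }
    return sorted(strength, key=lambda u: (-strength[u], u))[:top]


# Made-up data.
cluster_of = {"u1": "violet", "u2": "violet", "u3": "orange",
              "u4": "orange", "u5": "orange", "u6": "grey"}
admins = {"u1"}
graph = {"u3": {"u4": 0.9, "u5": 0.8, "u1": 0.1}, "u4": {"u3": 0.9},
         "u5": {"u3": 0.8}, "u1": {"u3": 0.1}}

print(clusters_without_admin(cluster_of, admins))  # [('orange', 3)]
orange = {u for u, c in cluster_of.items() if c == "orange"}
print(most_connected(graph, orange, top=1))         # ['u3']
```

Only edges inside the cluster count toward the ranking. Edges from members to administrators in other clusters (such as `u3`-`u1` here) could be inspected separately to address the third question.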