# **GRAPH NEURAL NETWORK (GNN)**

> [Deepak Yadav](https://hackmd.io/profile)

- **Why Graph Neural Networks?**
- **What are Graph Neural Networks?**
- **What do we want to achieve from a GNN? (GNN output)**
- **How to build Graph Neural Networks, and how do they work?**
- **Graph Learning Python Libraries**
- **Models of Graph Neural Networks**
- **Use of GNN Models in Different Domains**

---

## Why Graph Neural Networks?

Over the years, **Deep Learning (DL)** has been key to solving many **machine learning** problems in fields such as image processing, natural language processing, and even the video game industry. The data in these applications is represented in spaces with a regular, grid-like structure and a finite number of dimensions, e.g. 2D images or 1D sequences. Yet in many current applications, data comes from **non-Euclidean** domains (for example, 3D meshes or data in graph form) and is naturally represented as graphs, with relationships and mutual dependencies between objects. This has led to growing interest in the branch of deep learning research that focuses on the structure of graph data, called [Geometric Deep Learning](https://hackmd.io/@w37LrfUISRWz42e0mlwy0A/HJStR9hEO). The neural networks used to solve problems on such graph or 3D data are called **GNNs**.

---

## What are Graph Neural Networks?

Graphs have tremendous expressive power and are therefore gaining a lot of attention in the field of machine learning. Every node has an embedding associated with it that defines the node in the data space. **Graph neural networks** refer to the **neural network architectures** that operate on a **graph**. The aim of a GNN is for each node in the graph to learn an embedding containing information about its neighborhood (the nodes directly connected to the target node via edges). This embedding can then be used for different problems such as node labelling, node prediction, and edge prediction.

![](https://i.imgur.com/bJG2qhc.png)

---

## What do we want to achieve from a GNN? (GNN output)

Using neural networks, nodes in a GNN aggregate information gathered from their neighboring nodes. The last layer then combines all this aggregated information and outputs either a prediction or a classification.

**A GNN's output supports:**

- Node classification
- Link prediction
- Graph classification

**Node classification**

Here, every node in the network is assigned a label, and the network learns to determine the labels of new nodes introduced without ground truth. A clear example of node classification is illustrated in the image above, where the GNN establishes whether node A is a toxic or a safe drug. Another application of node classification is on protein interaction networks, where the nodes of the graph represent different types of proteins and the edges describe the kinds of biological interactions between them. For example, take a network of 16 proteins in which the 0th (node 0) and 15th (node 15) proteins are related to a cancerous disease; the network should classify which of the remaining proteins are most related to each of them.

**Link prediction**

The goal of a link prediction task is to predict the likelihood of two nodes being linked. Link prediction is used in maps to predict the occurrence of traffic jams, enabling the suggestion of alternative routes (links) given the current traffic pattern. It is also applied in social networking sites such as Facebook. I'm sure we have all encountered the friend suggestion notification on Facebook; this is a good application of link prediction. The nodes are the users on Facebook, while the edges are the friendships between them. The network is able to establish the likelihood of a link between two users, for instance through mutual friends, and once it establishes a likely link, it can make a prediction and suggest that friend to you. The image below depicts the link prediction problem in social networks.

![](https://i.imgur.com/pYVcuXe.png)
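A common way to realize link prediction is to score a candidate pair of nodes by comparing their learned embeddings, for example with a dot product squashed through a sigmoid. Here is a minimal, hypothetical sketch of that idea; the `embeddings` table and the `score_link` helper are illustrative assumptions, not any particular library's API:

```python
import numpy as np

# Hypothetical learned node embeddings (user -> d-dimensional vector).
# In practice these would come from a trained GNN, not random values.
rng = np.random.default_rng(0)
embeddings = {user: rng.normal(size=8) for user in ["alice", "bob", "carol"]}

def score_link(u, v):
    """Score a candidate edge (u, v) via the sigmoid of the embedding dot product."""
    logit = embeddings[u] @ embeddings[v]
    return 1.0 / (1.0 + np.exp(-logit))  # probability-like score in (0, 1)

# Rank candidate friend suggestions for "alice" by predicted link score.
candidates = ["bob", "carol"]
ranked = sorted(candidates, key=lambda v: score_link("alice", v), reverse=True)
print(ranked)
```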
**Graph classification**

The idea behind graph classification is to classify entire graphs into different classes. This is related to image classification, but the target changes from classifying images to classifying graphs. Social networks have changed the way we socialize and communicate in this era. Graph classification in social network analysis helps discover patterns in users' interactions, and this analysis helps summarize the perspectives and interests of social media users. The information gathered from these analyses can then be used for targeted online advertising. For example, the network could categorize users into different age groups, and targeted advertisements could then be tailored for these different classes of users.

---

## How to build Graph Neural Networks, and how do they work?

To build a graph neural network, we have to perform two kinds of operations:

- Take the input data in the form of a graph.
- Feed this input graph into a neural network, using **node embedding** to create the feature vectors.

![](https://i.imgur.com/K0MyBqM.png)

`As we have already discussed, we cannot train/feed a neural network with the graph converted into only an adjacency matrix, for two reasons: [1] it is not invariant to node ordering, and [2] it is not applicable to graphs of different sizes. That is why we use node embedding, whose working is close to how a CNN works.`

> We have already discussed how graph nodes are fed into an adjacency matrix.

**Node Embedding**

In graph learning, we implement the concept of **node embedding**. It means mapping nodes to a d-dimensional embedding space (a low-dimensional space, rather than the actual dimension of the graph, **i.e. real dimension >> embedding dimension**), so that similar nodes in the graph are embedded close to each other.

`Note: To do that, we have to introduce an encoder function that carries out the operations needed to perform the node embedding in this space.`

Our goal is to map nodes so that similarity in the embedding space approximates similarity in the network. Let's define **u** and **v** as two nodes in a graph, with **x_u** and **x_v** as their two feature vectors. Now we define the **encoder functions Enc(u)** and **Enc(v)**, which convert the feature vectors to **z_u** and **z_v**.

`Note: the similarity function could be, for example, Euclidean distance.`

![](https://i.imgur.com/outqHvQ.png)
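In the simplest ("shallow") setting, the encoder is just an embedding lookup: each node gets its own trainable d-dimensional vector, and training pushes the embeddings of similar nodes together. Below is a minimal sketch of that idea in plain NumPy; the names `Z`, `enc`, and `similarity` are illustrative assumptions, with the dot product standing in for whichever similarity function is chosen:

```python
import numpy as np

num_nodes, d = 5, 3  # a 5-node graph embedded into a 3-dimensional space

# Shallow encoder: one trainable d-dimensional vector per node.
rng = np.random.default_rng(42)
Z = rng.normal(size=(num_nodes, d))

def enc(u):
    """Enc(u): look up the embedding z_u of node u."""
    return Z[u]

def similarity(u, v):
    """Similarity in the embedding space, here the dot product of z_u and z_v."""
    return enc(u) @ enc(v)

# Training would adjust Z so that similarity(u, v) in the embedding space
# approximates the similarity of u and v in the graph (e.g. adjacency, or
# co-occurrence on random walks).
print(similarity(0, 1))
```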
**So the challenge now is: how do we come up with the encoder function?**

The encoder function should be able to perform:

- Locality (local network neighborhoods)
- Aggregation of information
- Stacking of multiple layers (computation)

**Locality (local network neighborhoods):**

Locality information can be achieved by using a computational graph.

`Computational graph: a graph created to show how a node is connected to the other nodes that are nearby.`

To explore more, click [here](https://medium.com/tebs-lab/deep-neural-networks-as-computational-graphs-867fcaa56c9).

As shown in the graph below, **i** is the red node, and we see how this node is connected to its neighbors and those neighbors' neighbors. We trace all the possible connections and form a computation graph. By doing this, we capture the structure of the graph while also borrowing feature information at the same time.

![](https://i.imgur.com/xsT9Q1K.png)

**Aggregating information (message passing):**

Once the computational graph preserves the locality information, we start aggregating. This is basically done using neural networks.

![](https://i.imgur.com/gctyAZl.png)

The neural networks are presented as grey boxes. They require the aggregations to be order-invariant, so **permutation-invariant** functions such as sum, average, or maximum are used; this property is what makes the aggregation well defined regardless of the order in which neighbors arrive.

**How neighborhood aggregation (message passing) works internally**

Message passing refers to nodes passing and receiving information about their neighborhoods. Consider a target node with its initial embedding: it receives information from its neighbors, passed via edge neural networks. The data from these edges is aggregated (many techniques can be used, such as max pooling or averaging) and passed to the activation unit of the node to get a new set of embeddings for the node.

Every node in the initial setup has features x_v. The embedding for each node after message passing can be defined as:

![](https://i.imgur.com/LLktKwJ.png)

where x_ne[v] denotes the features of the neighbors of v, x_co[v] the features of the edges connected to v, and h_ne[v] the embeddings of the neighbors of v.

![](https://i.imgur.com/dpMWsAE.png)

In the figure above, hₐ⁽¹⁾ is the initial embedding of node A, and hNₐ⁽¹⁾ is the aggregated embedding of its neighbors. Combining these and passing them to the node's activation unit (or filter) provides the new embedding for node A, which now also contains information about its neighbors. In this manner, each node gets a new set of embeddings for itself, which determines its position in the graph. With repeated iterations, i.e. K layers of message passing, a node learns more and more about its neighborhood, including its distant neighbors. Eventually, each node has a rough idea about the complete graph (or a part of it, depending on the number of iterations/layers and the node-to-node distances considered).

An example: consider a graph with social media posts as nodes. If these nodes have embeddings and are labelled with tags like romance, science, and comedy, and we get a new post that needs a tag, then, using the existing network and embeddings, neighborhood aggregation will help us predict the label and embedding of the unseen node.
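The core of one round of message passing can be sketched in just a few lines: collect each node's neighbor embeddings, reduce them with a permutation-invariant function such as the mean, and combine the result with the node's own embedding through an activation. This is a minimal sketch under simplifying assumptions (an adjacency-list graph, mean aggregation, no edge features, no trainable weights); the forward-propagation rule below adds the trainable parts:

```python
import numpy as np

# Toy graph as an adjacency list: node id -> list of neighbor ids.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Initial node embeddings h_v: here 4 nodes with 2-dimensional features x_v.
h = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])

def message_passing_round(h, neighbors):
    """One round: every node aggregates its neighbors' embeddings by mean."""
    h_new = np.zeros_like(h)
    for v, nbrs in neighbors.items():
        aggregated = h[nbrs].mean(axis=0)      # permutation-invariant reduce
        h_new[v] = np.tanh(aggregated + h[v])  # combine with the node's own state
    return h_new

# After K rounds, each node's embedding reflects its K-hop neighborhood.
for _ in range(2):  # K = 2 layers
    h = message_passing_round(h, neighbors)
print(h)
```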
Let's move on to the **forward propagation rule** in GNNs. It determines how the information from the input flows to the output side of the neural network.

![](https://i.imgur.com/eavMycO.png)

Every node has a feature vector. For example, X_A is the feature vector of node A. The inputs are those feature vectors, and each box takes the incoming feature vectors (e.g. X_A and X_C), aggregates them, and then passes the result on to the next layer. Notice that, for example, the input at node C is the raw features of node C, but the representation of node C in layer 1 is a hidden, latent representation of the node, and in layer 2 it is another latent representation.

So, in order to perform forward propagation in this computational graph, we need 3 steps:

**1. Initialize the activation units:**

![](https://i.imgur.com/PDFAqZY.png)

**2. At every layer of the network:**

![](https://i.imgur.com/3ieNjOp.png)

We can notice that there are two parts to this equation:

- The first part basically averages over all the neighbors of node v:

![](https://i.imgur.com/NMjoRTG.png)

- The second part is the previous-layer embedding of node v multiplied by Bk, a trainable weight matrix; it is basically a self-loop activation for node v:

![](https://i.imgur.com/NlSCLsy.png)

- σ is the non-linear activation that is applied to the sum of the two parts.

**3. The last equation (at the final layer):**

![](https://i.imgur.com/8JOqZOH.png)

This is the embedding after K layers of neighborhood aggregation.

Now, to train the model, we need to define a loss function on the embeddings:

![](https://i.imgur.com/LrdZzpp.png)

We can feed the embeddings into any loss function and run stochastic gradient descent to train the weight parameters. Training can be unsupervised or supervised:

- **Unsupervised training:** uses only the graph structure: similar nodes should have similar embeddings. An unsupervised loss function can be based on node proximity in the graph, or on random walks.
- **Supervised training:** trains the model for a supervised task, such as classifying nodes as normal or anomalous.

To recap, in this section we described the basic idea of generating node embeddings by aggregating neighborhood information. This is how GNNs find the feature vectors that we then feed onward to get the final result.

**Conclusion:** You can think of a GNN as a method that represents a graph with two components, G(A, F): A, the adjacency matrix (storing how the nodes are connected), and F, the feature vectors (for the node embeddings), which are then fed into a model such as a GCNN or GRNN.
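Putting steps 1 to 3 together, the per-layer update (average the neighbors, multiply by a weight matrix Wk, add Bk times the node's previous embedding as the self-loop part, then apply σ) can be sketched from scratch. In this hedged sketch, the matrices `W` and `B` are random stand-ins for parameters that stochastic gradient descent would actually learn:

```python
import numpy as np

neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
d = 4                                    # embedding dimension
rng = np.random.default_rng(7)

# Step 1: initialize the activations h_v^0 with the node features x_v
# (random stand-ins here).
h = rng.normal(size=(4, d))

# One (W_k, B_k) pair of trainable matrices per layer k (random stand-ins).
K = 2
Ws = [rng.normal(size=(d, d)) * 0.1 for _ in range(K)]
Bs = [rng.normal(size=(d, d)) * 0.1 for _ in range(K)]

sigma = np.tanh                          # the non-linearity

# Step 2: at every layer k,
#   h_v^k = sigma(W_k . mean(h_u^{k-1} for u in N(v)) + B_k . h_v^{k-1})
for W, B in zip(Ws, Bs):
    h_next = np.zeros_like(h)
    for v, nbrs in neighbors.items():
        neighbor_avg = h[nbrs].mean(axis=0)             # first part
        h_next[v] = sigma(W @ neighbor_avg + B @ h[v])  # plus self-loop part
    h = h_next

# Step 3: z_v = h_v^K, the embeddings after K layers of aggregation.
z = h
print(z)
```

These final `z` embeddings are what would then be fed into a loss function (for example, cross-entropy for node classification) and optimized with stochastic gradient descent.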
---

## Graph Learning Python Libraries

Why would you need to use a library, you might wonder? As a beginner, using libraries enables a faster and more efficient way of experimenting with GNNs. The top three libraries are:

- PyTorch Geometric
- Graph Nets
- Deep Graph Library (DGL)

They contain implemented examples that make it easier for a beginner to understand, are free and easy to use, and the most benchmarked datasets in the graph domain have already been integrated into them.

1. **PyTorch Geometric**
It has the latest types of graph networks already built in for you. These graph networks are available as single-line functions that are ready to be called in the PyTorch library. Read more about it [here](https://pytorch-geometric.readthedocs.io/en/latest).

2. **Graph Nets**
A Python library created by DeepMind Technologies. It helps build graph networks on platforms such as TensorFlow and Sonnet. This library provides a lot of documentation and ready-made collaboration notebooks that showcase how to use it. However, it is important to note that its code is not as beginner-friendly as the other two libraries. Learn more about it [here](https://github.com/deepmind/graph_nets).

3. **Deep Graph Library (DGL)**
Created by the Distributed Machine Learning community on GitHub. The platform has readable code, is well maintained, and is cross-platform, which makes DGL the top pick for beginners. Learn more about it on the official [website](https://www.dgl.ai/).

To better understand the use of these libraries, [here](https://docs.dgl.ai/tutorials/basics/1_first.html#sphx-glr-download-tutorials-basics-1-first-py) is an example problem implemented using the DGL library. It is based on the Zachary's karate club problem.

Zachary's karate club is a popular social network of a university karate club, widely used in network science, that dates back to the 1970s. The club consisted of a total of 34 members, each with close associations outside the club setting. Due to internal conflicts, the club was forced to split into two communities. One community was led by the instructor (node 0), while the other was led by the club's president (node 33). The task is to predict which of the two communities, node 0's or node 33's, each member of the club is likely to join.

---

## Models of Graph Neural Networks

GNN models consist of four main types:

- Recurrent Graph Neural Networks (RGNNs)
- Convolutional Graph Neural Networks (CGNNs)
- Graph Auto-Encoders (GAEs)
- Spatial-Temporal Graph Neural Networks (STGNNs)

---

## **Use of GNN Models in Different Domains**

**In the Text Domain**

![](https://i.imgur.com/YwviAie.png)

---

**In the Image Domain:**

![](https://i.imgur.com/527xYGt.png)

---

**In the Science Domain:**

![](https://i.imgur.com/l0QFxmF.png)

---

**In the Computational Domain**

![](https://i.imgur.com/zidX1ND.png)

---
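To close, here is a minimal node-classification sketch for the karate club task described in the libraries section above. It is a hedged sketch that assumes DGL's bundled `KarateClubDataset` and `GraphConv` layer together with PyTorch (see the linked DGL tutorial for the canonical version), and it trains semi-supervised from only the two labelled leaders, node 0 and node 33:

```python
import torch
import torch.nn as nn
import dgl
from dgl.data import KarateClubDataset
from dgl.nn import GraphConv

# Load the 34-member karate club graph; ndata["label"] holds the two communities.
g = KarateClubDataset()[0]
g = dgl.add_self_loop(g)              # avoid zero-in-degree issues in GraphConv
feats = torch.eye(g.num_nodes())      # no real node features, so use one-hot ids

class GCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GraphConv(g.num_nodes(), 16)
        self.conv2 = GraphConv(16, 2)  # 2 output classes: the two communities

    def forward(self, graph, x):
        h = torch.relu(self.conv1(graph, x))
        return self.conv2(graph, h)

model = GCN()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
labeled = torch.tensor([0, 33])       # only the instructor and the president
labels = g.ndata["label"][labeled]

for epoch in range(50):               # semi-supervised training loop
    logits = model(g, feats)
    loss = nn.functional.cross_entropy(logits[labeled], labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

pred = model(g, feats).argmax(dim=1)  # predicted community for every member
print(pred)
```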