# Hyperbolic Graph Convolutional Neural Networks

- **Motivation**
- **What is Hyperbolic Space**
- **What is a Hyperbolic Graph Convolutional Neural Network (HGCNN)**
- **Method to Implement HGCNN**
- **HYPERGCN architecture**
- **Final Key Points**
- **Advantages of HGCNN**

---

## **Motivation**

The motivation for HGCNN comes from the GCN model. When we implement a GCN, we perform a node embedding operation that reduces the size of the input graph, and from it we create feature vectors. Graph convolutional neural networks (GCNs) map nodes in a graph to Euclidean embeddings, which have been shown to incur a large distortion when embedding real-world graphs with scale-free or hierarchical structure. To reduce this distortion, researchers have looked for embedding spaces that distort such graphs less. Hyperbolic geometry offers an exciting alternative, as it enables embeddings with much smaller distortion.

However, it is not clear how to define neural network operations, such as feature transformation and aggregation, in hyperbolic space. Since input features are often Euclidean, it is also unclear how to transform the features into hyperbolic embeddings with the right amount of curvature. The authors therefore propose the Hyperbolic Graph Convolutional Neural Network (HYPERGCN), the first inductive hyperbolic GCN, which leverages both the expressiveness of GCNs and hyperbolic geometry to learn inductive node representations for hierarchical and scale-free graphs. They derive GCN operations in the hyperboloid model of hyperbolic space and map Euclidean input features to embeddings in hyperbolic spaces with different trainable curvatures at each layer.

---

## **What is Hyperbolic Space:**

A few differences we have noticed between flat and spherical geometry so far:

- In Euclidean geometry, there is exactly one line parallel to an original line that goes through some other point. In spherical geometry, there are none.
- In Euclidean geometry, the angles of every triangle add up to 180 degrees. In spherical geometry, they add up to more than 180 degrees.
- (Side note: this number-of-degrees-in-a-triangle fact is equivalent to the parallel postulate, so these two facts are basically the same.)

Two natural questions come up from these differences:

- Is there a geometry with more than one line parallel to an original line that goes through some other point?
- Is there a geometry where all triangles add up to less than 180 degrees?

The answer is yes! This is called **hyperbolic geometry**, and it is where lots of research lives nowadays. In this land, if you draw a line, there are infinitely many lines parallel to it that go through some other point, and all triangles add up to less than 180 degrees.

There are many models of hyperbolic space, but we will just look at two. The first one is the Poincaré disk model. Escher made a really good picture of this:

![](https://i.imgur.com/BRTDmER.jpg)

---

## **Properties**

**Lines**

Once a line is introduced, intersecting lines can have properties that differ from intersecting lines in Euclidean geometry. For example, given two intersecting lines, there are infinitely many lines that do not intersect either of the given lines. In short: two parallel lines are always the same distance apart in Euclidean space, but in hyperbolic space parallel lines are not equidistant. We can construct two hyperbolic straight lines that do not intersect yet are separated by an increasing distance as we move away from the origin. The sketch below gives a feel for how distances behave in the Poincaré disk.
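As a quick numerical illustration (not from the original paper), here is a minimal sketch of the Poincaré disk distance, assuming the standard model with curvature −1; the function name is my own. A fixed Euclidean gap of 0.05 corresponds to a far larger hyperbolic distance near the boundary of the disk, which is why the disk has far more "room" near its edge than the picture suggests.

```python
import numpy as np

def poincare_distance(u, v):
    """Hyperbolic distance between two points inside the unit disk
    (Poincaré disk model, curvature -1)."""
    sq_gap = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_gap / denom)

# Two pairs of points with the same Euclidean gap of 0.05:
near_origin = poincare_distance(np.array([0.00, 0.0]), np.array([0.05, 0.0]))
near_boundary = poincare_distance(np.array([0.90, 0.0]), np.array([0.95, 0.0]))
print(near_origin)    # ~0.10 -- close to the Euclidean gap
print(near_boundary)  # ~0.72 -- much larger near the boundary
```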
**Non-intersecting / parallel lines**

Lines through a given point P and asymptotic to a line R. Non-intersecting lines in hyperbolic geometry also have properties that differ from non-intersecting lines in Euclidean geometry:

- For any line R and any point P which does not lie on R, in the plane containing line R and point P there are at least two distinct lines through P that do not intersect R. This implies that there are, through P, infinitely many coplanar lines that do not intersect R.

These non-intersecting lines are divided into two classes:

![](https://i.imgur.com/U8WBlBu.png)

- Two of the lines (x and y in the diagram) are limiting parallels (sometimes called critically parallel, horoparallel or just parallel): there is one in the direction of each of the ideal points at the "ends" of R, asymptotically approaching R, always getting closer to R, but never meeting it.
- All other non-intersecting lines have a point of minimum distance and diverge from both sides of that point; they are called ultraparallel, diverging parallel or sometimes simply non-intersecting.

---

## **What is a Hyperbolic Graph Convolutional Neural Network (HGCNN)**

Hyperbolic Graph Convolutional Networks (HYPERGCN) are a class of graph representation learning models that combine the expressiveness of GCNs with hyperbolic geometry to learn improved representations for real-world hierarchical and scale-free graphs in inductive settings. As in a GCN, node embeddings are computed to reduce the graph size, and an aggregation operation is then performed by passing messages through the neural network.

**The Limitations of Euclidean Space for Hierarchical Data**

The whole reason why we went through that primer on hyperbolic geometry is that we want to embed data in it! Consider some hierarchical data, such as a tree. A tree with branching factor b has (b+1)b<sup>l−1</sup> nodes at level l and ((b+1)b<sup>l</sup> − 2)/(b − 1) nodes on levels less than or equal to l. So as we grow the levels of the tree, the number of nodes grows exponentially.

There are really two important pieces of information in a tree. One is the hierarchical nature of the tree: **the parent-child relationships**. The other is a relative **"distance" between the nodes**: children and their parents should be close, but leaf nodes in totally different branches of the tree should be very far apart (roughly proportional to the number of links between them).

Let's imagine putting this into a Euclidean (say 2D) space. First, we would probably put the root at the origin. Then, place the first level equidistant around it. Then place the second level equidistant from all those points. The figure shows these two levels.

![](https://i.imgur.com/x9ItqYB.png)

You can see we're already running out of space. The hierarchical links are sort of represented **(distances between child and parent are maintained)**, **but importantly, the distances between siblings get smaller.** Look at the leaf nodes, which are squeezed at the edge because we don't have enough "space". One way around this might be to increase the number of dimensions, but then you would need to keep increasing it with the number of levels, which brings a whole host of other problems. Long story short, Euclidean space isn't a good representation for graph-like data; the sketch below makes the counting argument concrete.
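To see why the crowding is unavoidable, here is a minimal, purely illustrative sketch (the layout assumption, each tree level placed on a Euclidean circle of radius equal to its level, is mine and not the paper's): the number of nodes per level grows exponentially, while the circumference available to hold them grows only linearly, so the arc length available per node shrinks toward zero.

```python
import math

b = 2                      # branching factor (illustrative choice)
for level in range(1, 8):
    nodes_at_level = (b + 1) * b ** (level - 1)   # (b+1) * b^(l-1) nodes at level l
    circumference = 2 * math.pi * level           # room on a Euclidean circle of radius l
    arc_per_node = circumference / nodes_at_level
    print(f"level {level}: {nodes_at_level:4d} nodes, "
          f"{arc_per_node:.4f} units of arc per node")

# The arc per node roughly halves at every level: siblings are forced closer
# and closer together, which is exactly the crowding visible in the figure.
```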
**Embedding Hierarchies in Hyperbolic Space**

It shouldn't be a surprise at this point that hyperbolic space is a good representation of hierarchical data. Using the same sort of algorithm as we tried above (placing the root at the center and spacing the children out equidistantly, recursively) does work in hyperbolic space. The reason is that distances grow exponentially as we move toward the edge of the disk, eliminating the "crowding" effect we saw above. The figure shows a visualization for a tree with branching factor two.

![](https://i.imgur.com/Z6KOa2C.png)

Of course, this figure looks crowded at the lower levels just like the previous one, but that is only because hyperbolic space is hard to visualize intuitively. In fact, all the points in the figure are equidistant from their parent, and sibling and cousin nodes are much further apart than they appear. So crowding is not actually much of an issue. We can also use the exact same idea but, instead of the Poincaré disk, use the Poincaré ball in d dimensions.

The problem now becomes: how do I map the nodes of my hierarchical data to the Poincaré ball? As usual, via an optimization. We have to use a variant of the usual stochastic gradient descent called **Riemannian Stochastic Gradient Descent (RSGD)**, because we are optimizing over a **manifold**, not just simple Euclidean space.

**Note:** The manifold (here, hyperbolic space) is the space in which we generate the feature vectors (embeddings) for the hierarchical data.

---

## **Method to Implement HGCNN**

- We derive the core transformations of GCNs in the hyperboloid model of hyperbolic space to transform the input features, which lie in Euclidean space, into hyperbolic embeddings.
- We introduce a hyperbolic attention-based aggregation scheme that captures node hierarchies.
- We apply feature transformations in hyperbolic spaces of different trainable curvatures at different layers, to learn hyperbolic embeddings that preserve the graph structure and a notion of hierarchy for the nodes in the graph.

> We introduce the essential components of HYPERGCN. First, since input features are often Euclidean, we derive a mapping from Euclidean features to hyperbolic space. Next, we derive the two components of graph convolution: the analogs of the Euclidean feature transformation and aggregation in the hyperboloid model. Finally, we introduce the HYPERGCN algorithm with trainable curvature.

## **Implementation**

**Mapping from Euclidean to hyperbolic spaces**

- HYPERGCN first maps the input features to the hyperboloid manifold via the exp map. Let x<sup>0,E</sup> ∈ R<sup>d</sup> denote input Euclidean features. For instance, these features could be produced by pre-trained Euclidean neural networks.

**Feature transform in hyperbolic space**

- The feature transform (Equation 1 in the paper) is used in GCNs to map the embedding space of one layer to the embedding space of the next layer and to capture large neighborhood structures. We now want to learn transformations of points on the hyperboloid manifold. However, there is no notion of vector space structure in hyperbolic space. We build upon Hyperbolic Neural Networks (HNN) and derive transformations in the hyperboloid model. The main idea is to leverage the exp and log maps, so that we can use the tangent space T<sub>o</sub>H<sup>d,K</sup> to perform Euclidean transformations.

**Neighborhood aggregation on the hyperboloid manifold**

- Aggregation (Equation 2 in the paper) is a crucial step in GCNs, as it captures neighborhood structures and features with message passing. Suppose that x<sub>i</sub> aggregates information from its neighbors (x<sub>j</sub>)<sub>j∈N(i)</sub> with weights (w<sub>j</sub>)<sub>j∈N(i)</sub>. Mean aggregation in a Euclidean GCN computes the weighted average Σ<sub>j∈N(i)</sub> w<sub>j</sub>x<sub>j</sub>. The analog of mean aggregation in hyperbolic space is the Fréchet mean, which has no closed-form solution. Instead, the authors propose to perform aggregation in tangent spaces using hyperbolic attention. Attention in GCNs learns a notion of neighbors' importance and aggregates neighbors' messages according to their importance to the center node. However, attention on Euclidean embeddings does not take the nodes' hierarchies into account. Thus, they further propose hyperbolic attention-based aggregation: given hyperbolic embeddings (x<sup>H</sup><sub>i</sub>, x<sup>H</sup><sub>j</sub>), first map x<sup>H</sup><sub>i</sub> and x<sup>H</sup><sub>j</sub> to the tangent space of the origin to compute the attention weights w<sub>ij</sub> with concatenation and a Euclidean multi-layer perceptron (MLP), and then perform a hyperbolic aggregation to average the nodes' representations. The sketch after this list illustrates the exp and log maps that these steps rely on.
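Below is a minimal sketch of the exp and log maps at the origin of the hyperboloid model, assuming unit curvature (K = 1); the function names are my own. For simplicity, the toy aggregation here happens in the tangent space of the origin, whereas HYPERGCN computes attention there but aggregates in the tangent space of the center node and also learns the curvature per layer.

```python
import numpy as np

def minkowski_dot(x, y):
    """Minkowski inner product <x, y>_L = -x0*y0 + x1*y1 + ... + xd*yd."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def expmap_origin(v):
    """Map a tangent vector v = (0, v1, ..., vd) at the origin o = (1, 0, ..., 0)
    onto the hyperboloid {x : <x, x>_L = -1, x0 > 0}."""
    o = np.zeros_like(v); o[0] = 1.0
    norm = np.sqrt(max(minkowski_dot(v, v), 1e-12))
    return np.cosh(norm) * o + np.sinh(norm) * v / norm

def logmap_origin(x):
    """Map a point on the hyperboloid back to the tangent space at the origin."""
    o = np.zeros_like(x); o[0] = 1.0
    p = x + minkowski_dot(o, x) * o            # component orthogonal to o
    p_norm = np.sqrt(max(minkowski_dot(p, p), 1e-12))
    dist = np.arccosh(-minkowski_dot(o, x))    # hyperbolic distance from o to x
    return dist * p / p_norm

# Euclidean input feature -> hyperbolic embedding (prepend a zero coordinate):
x_euclidean = np.array([0.3, -0.1, 0.7])
x_hyperbolic = expmap_origin(np.concatenate(([0.0], x_euclidean)))
print(minkowski_dot(x_hyperbolic, x_hyperbolic))   # ~ -1.0, i.e. on the manifold

# Toy tangent-space aggregation of two neighbors with attention weights w:
neighbors = [x_hyperbolic, expmap_origin(np.array([0.0, 0.5, 0.2, -0.4]))]
w = np.array([0.6, 0.4])
tangent_mean = sum(wi * logmap_origin(xh) for wi, xh in zip(w, neighbors))
aggregated = expmap_origin(tangent_mean)           # project back onto the hyperboloid
```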
---

## **HYPERGCN architecture**

Having introduced all the building blocks of HYPERGCN, we now summarize the model architecture. Given a graph G = (V, E) and input Euclidean features (x<sup>0,E</sup>)<sub>i∈V</sub>, the first layer of HYPERGCN is the mapping from Euclidean to hyperbolic space discussed above. HYPERGCN then stacks multiple hyperbolic graph convolution layers. At each layer, HYPERGCN transforms and aggregates the neighbors' embeddings in the tangent space of the center node and projects the result to a hyperbolic space with a different curvature. Hence the message passing in a HYPERGCN layer is:

![](https://i.imgur.com/NqUBpN2.png)

---

## **Final Key Points**

- **Manifold.** A d-dimensional manifold M is a topological space that locally resembles the topological space R<sup>d</sup> near each point. More concretely, for each point x on M, we can find a homeomorphism (a continuous bijection with continuous inverse) between a neighbourhood of x and R<sup>d</sup>. The notion of a manifold is a generalization of surfaces to high dimensions.
- **Tangent space.** Intuitively, if we think of M as a d-dimensional manifold embedded in R<sup>d+1</sup>, the tangent space T<sub>x</sub>M at a point x on M is the d-dimensional hyperplane in R<sup>d+1</sup> that best approximates M around x. Another possible interpretation of T<sub>x</sub>M is that it contains all the possible directions of curves on M passing through x. The elements of T<sub>x</sub>M are called tangent vectors, and the union of all tangent spaces is called the tangent bundle TM = ∪<sub>x∈M</sub> T<sub>x</sub>M.
- **Riemannian manifold.** A Riemannian manifold is a pair (M, g), where M is a smooth manifold and g = (g<sub>x</sub>)<sub>x∈M</sub> is a Riemannian metric, that is, a family of smoothly varying inner products on tangent spaces, g<sub>x</sub> : T<sub>x</sub>M × T<sub>x</sub>M → R. Riemannian metrics can be used to measure distances on manifolds.
- **Distances and geodesics.** Let (M, g) be a Riemannian manifold. For v ∈ T<sub>x</sub>M, define the norm of v by ‖v‖<sub>g</sub> := √(g<sub>x</sub>(v, v)). If γ : [a, b] → M is a smooth curve on M, its length is obtained by integrating ‖γ'(t)‖<sub>g</sub> along the curve; the distance between two points is the infimum of the lengths of all smooth curves joining them, and geodesics are the curves that realize this infimum (see the formulation below).
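For completeness, here is the standard Riemannian-geometry formulation that the last bullet paraphrases (this is textbook background rather than anything specific to HYPERGCN):

```latex
% Length of a smooth curve \gamma : [a, b] \to M
L(\gamma) = \int_a^b \lVert \gamma'(t) \rVert_g \, dt
          = \int_a^b \sqrt{g_{\gamma(t)}\bigl(\gamma'(t), \gamma'(t)\bigr)} \, dt

% Induced (geodesic) distance between x, y \in M:
d(x, y) = \inf \bigl\{\, L(\gamma) : \gamma \text{ smooth},\ \gamma(a) = x,\ \gamma(b) = y \,\bigr\}

% A geodesic is a curve that locally attains this infimum; in hyperbolic
% space, geodesics play the role that straight lines play in Euclidean space.
```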
---

## **Advantages of HGCNN**

The gains of HYPERGCN are demonstrated on node classification and link prediction over the following benchmark datasets:

- **Citation networks.** CORA and PUBMED are standard benchmarks describing citation networks where nodes represent scientific papers, edges are citations between them, and node labels are academic (sub)areas. CORA contains 2,708 machine learning papers divided into 7 classes, while PUBMED has 19,717 publications in the area of medicine grouped into 3 classes.
- **Disease propagation tree.** We simulate the SIR disease-spreading model, where the label of a node is whether or not it was infected. Based on the model, we build tree networks in which the node features indicate susceptibility to the disease. We build transductive and inductive variants of this dataset, namely DISEASE and DISEASE-M, with 1,044 and 43,193 nodes respectively.
- **Protein-protein interaction (PPI) networks.** We compose a dataset of PPI networks containing the human proteins in all tissues, where edges represent interactions between proteins. The node classification task is to predict the stem cell growth rate after 19 days. The 16-dimensional feature vector of each node represents the RNA expression levels of the corresponding proteins, and we apply a log transform to the features.
- **Flight networks.** AIRPORT is a transductive dataset where nodes represent airports and edges represent airline routes, as taken from OpenFlights.org. Compared to previous compilations, this dataset is larger (2,236 nodes). We also augment the graph with geographic information (longitude, latitude and altitude) and the GDP of the country the airport belongs to. We use the population of the country the airport belongs to as the label for node classification.