# Geometric DL [Michael Bronstein] Video notes
## Formulation:
Let $x_i \in \mathbb{R}^k$ be the features of node $i$.
Stack these to form
$X = [x_1, x_2, \dots, x_n]^T$,
an $n \times k$ matrix whose $i$-th row corresponds to node $i$.
A function $f$ is permutation invariant if $f(PX)=f(X)$, where $P$ is a permutation matrix: a matrix of 1s and 0s with exactly one 1 in each row and each column.
### Deep Sets:
$$
f(X) = \phi\left(\sum_{i\in V}\psi(x_i)\right)
$$
The sum can be replaced by max or average.
Hence the general form of Deep Sets is:
$$
f(X) = \phi\left(\bigoplus_{i\in V}\psi(x_i)\right)
$$
where $\bigoplus$ is any permutation invariant op.
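A minimal NumPy sketch of this form; $\psi$ and $\phi$ below are placeholder maps rather than learned networks, and the assertion is just a sanity check of permutation invariance.

```python
import numpy as np

def deep_sets(X, psi, phi, agg=np.sum):
    # Apply psi to each element, aggregate with a permutation-invariant
    # op, then transform the pooled vector with phi.
    H = np.stack([psi(x) for x in X])   # shape (n, d)
    return phi(agg(H, axis=0))          # pooled over the set dimension

psi = lambda x: np.tanh(x)              # placeholder element-wise map
phi = lambda z: z.sum()                 # placeholder readout

X = np.random.randn(5, 3)               # 5 set elements, 3 features each
perm = np.random.permutation(5)
# Invariance: shuffling the rows does not change the output.
assert np.isclose(deep_sets(X, psi, phi), deep_sets(X[perm], psi, phi))
```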
## Permutation equivariance
Permutation invariance is only appropriate when we want answers at the level of the entire set. If we want answers at the node level, we want permutation equivariance instead.
A function $f$ is permutation EQUI-variant if $f(PX)=Pf(X)$, where $P$ is again a permutation matrix.
## We also want locality!
Natural data carries some noise at the local level, and any good model should be able to see through this noise.
One way to enforce locality is to apply the same function $\psi$ to all nodes:
$h_i = \psi(x_i)$ for all i.
Deep sets do exactly this.
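A quick numerical check (with the same kind of placeholder $\psi$ as above) that applying a shared $\psi$ row-wise is permutation equivariant:

```python
import numpy as np

def rowwise(X, psi):
    # Apply the same psi to every row (node) independently.
    return np.stack([psi(x) for x in X])

psi = lambda x: np.tanh(x)                    # placeholder shared map
X = np.random.randn(4, 3)
P = np.eye(4)[np.random.permutation(4)]       # permutation matrix

# f(PX) = P f(X): permuting the input rows permutes the output rows.
assert np.allclose(rowwise(P @ X, psi), P @ rowwise(X, psi))
```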
## Learning on graphs
$G=(V,E)$ graph.
Adjacency matrix A:
$A_{ij} = 1_{(i,j) \in E}$
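A small sketch of building such a dense adjacency matrix from an edge list, assuming an undirected graph:

```python
import numpy as np

def adjacency(n, edges):
    # Dense adjacency matrix: A[i, j] = 1 iff (i, j) is in E.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0   # assuming an undirected graph
    return A

A = adjacency(4, [(0, 1), (1, 2), (2, 3)])   # path graph on 4 nodes
```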
### Permutation invariance and equivariance on graphs:
Permuting the nodes with a permutation matrix $P$ means the adjacency matrix must be permuted as well: $A \mapsto PAP^T$.
Invariance:
$f(PX, PAP^T) = f(X,A)$
Equivariance:
$F(PX, PAP^T) = PF(X,A)$
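A numerical sanity check of both properties, using neighborhood feature summation $F(X,A)=AX$ as a simple equivariant layer (this particular choice of $F$ is for illustration only):

```python
import numpy as np

def F(X, A):
    # Simple equivariant layer: each node's output is the sum of its
    # neighbors' features, F(X, A) = A X.
    return A @ X

n = 5
X = np.random.randn(n, 3)
A = (np.random.rand(n, n) > 0.5).astype(float)
A = np.maximum(A, A.T)                        # symmetrize
np.fill_diagonal(A, 0.0)                      # no self-loops
P = np.eye(n)[np.random.permutation(n)]

# Equivariance: F(PX, P A P^T) = P F(X, A)
assert np.allclose(F(P @ X, P @ A @ P.T), P @ F(X, A))

# Invariance of a pooled readout: f(PX, P A P^T) = f(X, A)
f = lambda X, A: F(X, A).sum(axis=0)
assert np.allclose(f(P @ X, P @ A @ P.T), f(X, A))
```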
### Locality preservation in graph
Define one-hop neighborhood as:
$N_i = \{j : (i,j) \in E\}$
Extract neighborhood features:
$X_{N_i} = \{x_j : j \in N_i\}$, stacked as a matrix.
$X_{N_i}$ has the same number of columns as $x_i$ has features ($k$) and $|N_i|$ rows.
Now define $\phi$ as a local function $\phi(x_i, X_{N_i})$
Given that $\phi$ is permutation invariant over the rows of $X_{N_i}$, the following function on the graph is permutation equivariant:
$F(X,A)_i = \phi(x_i, X_{N_i})$
The equation above is applied row-wise: row $i$ of $F(X,A)$ is the output for node $i$.
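A sketch of applying such a local $\phi$ row-wise; the specific $\phi$ (add the sum of neighbor features to $x_i$) is just one permutation-invariant choice:

```python
import numpy as np

def local_layer(X, A, phi):
    # F(X, A)_i = phi(x_i, X_{N_i}), computed one row (node) at a time.
    out = []
    for i in range(X.shape[0]):
        neigh = np.flatnonzero(A[i])          # one-hop neighborhood N_i
        out.append(phi(X[i], X[neigh]))
    return np.stack(out)

# phi must be permutation invariant over the rows of X_{N_i};
# summing the neighbor features is one simple choice.
phi = lambda x_i, X_N: x_i + X_N.sum(axis=0)
```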
### Blueprint for learning on graphs:
- Node classification:
$z_i = f(h_i)$
- Graph classification:
$z_G = f(\bigoplus_{i \in V} h_i)$
- Link prediction
$z_{ij} = f(h_i,h_j,e_{ij})$
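A toy illustration of these three task heads on top of node embeddings $h_i$; the linear readout weights and the element-wise product for link scoring are illustrative assumptions, and the edge feature $e_{ij}$ is omitted:

```python
import numpy as np

# Node embeddings H (rows h_i) as produced by some GNN.
H = np.random.randn(6, 8)                     # 6 nodes, 8-dim embeddings
W_node = np.random.randn(8, 2)                # placeholder readout weights
W_graph = np.random.randn(8, 2)
W_link = np.random.randn(8, 2)

z_nodes = H @ W_node                          # node classification: one z_i per node
z_graph = H.sum(axis=0) @ W_graph             # graph classification: pool, then read out
z_link_01 = (H[0] * H[1]) @ W_link            # link prediction: score for the pair (0, 1)
```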
### Common lingo
$F$ is the GNN layer.
$\phi$ is called diffusion / propagation / message passing.
## GNN flavors
### Convolutional GNN
$$h_i = \phi\left(x_i, \bigoplus_{j \in N_i} c_{ij}\psi(x_j)\right)$$
The coefficients $c_{ij}$ are fixed! They usually come from the adjacency matrix $A$, e.g. $c_{ij}=\frac{1}{\deg_j}$.
Examples: ChebyNet, GCN, SGC
This is useful for **homophilous** graphs, i.e. graphs in which neighboring nodes are likely to share the same label or be similar.
Used extensively in industry.
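A minimal sketch of a convolutional layer with the fixed coefficients $c_{ij}=1/\deg_j$ from above (real GCNs typically use symmetric normalization and a bias, omitted here):

```python
import numpy as np

def conv_gnn_layer(X, A, W, activation=np.tanh):
    # Fixed coefficients c_ij = A_ij / deg_j, taken from the adjacency matrix.
    deg = A.sum(axis=0)
    C = A / np.maximum(deg, 1.0)              # broadcasts 1/deg_j over column j
    # phi combines the node's own features with the aggregated neighbors.
    return activation(X @ W + C @ X @ W)

# Usage: X is (n, k), W is a (k, d) weight matrix (here random, untrained).
```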
### Attentional GNN
Replace $c_{ij}$ by $\alpha_{ij}$ where $\alpha_{ij}=a(x_i,x_j)$ is the learned attention weight.
$$
h_i = \phi\left( x_i, \bigoplus_{j \in N_i} a(x_i,x_j) \psi(x_j) \right)
$$
Examples: MoNet, GAT, GATv2
Transformers are also in this space.
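A rough sketch of the attentional flavor, with a user-supplied scoring function $a$ and softmax normalization over each neighborhood (GAT-style in spirit, not the exact GAT parameterization):

```python
import numpy as np

def attn_gnn_layer(X, A, W, a, activation=np.tanh):
    # alpha_ij = a(x_i, x_j), normalized with a softmax over each neighborhood.
    H = X @ W
    out = np.array(H)
    for i in range(X.shape[0]):
        neigh = np.flatnonzero(A[i])
        if len(neigh) == 0:
            continue                           # isolated node: keep its own features
        scores = np.array([a(X[i], X[j]) for j in neigh])
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()
        out[i] = H[i] + alpha @ H[neigh]       # phi(x_i, weighted aggregate)
    return activation(out)

a = lambda x_i, x_j: x_i @ x_j                 # placeholder scoring function
```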
### Message-passing GNN
Can pass arbitrary message vectors $m_{ij} = \psi(x_i,x_j)$
$$
h_i = \phi\left( x_i, \bigoplus_{j \in N_i} \psi(x_i,x_j) \right)
$$
Examples: Interaction Nets, MPNN, GraphNets
This is the most generic GNN.
Conv-GNN is a subset of Att-GNN which is further a subset of MP-GNN.
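A sketch of the generic message-passing layer; $\psi$ and $\phi$ below are placeholder functions where a real MPNN would use small MLPs:

```python
import numpy as np

def mp_gnn_layer(X, A, psi, phi):
    # m_ij = psi(x_i, x_j) computed per edge, summed per receiving node,
    # then combined with x_i through phi.
    out = []
    for i in range(X.shape[0]):
        neigh = np.flatnonzero(A[i])
        m = sum((psi(X[i], X[j]) for j in neigh), np.zeros(X.shape[1]))
        out.append(phi(X[i], m))
    return np.stack(out)

# Placeholder psi and phi.
psi = lambda x_i, x_j: np.tanh(x_i - x_j)
phi = lambda x_i, m: x_i + m
```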