## Continue Graph Thinking

https://hackmd.io/@teacher144123/SJh9ZSzpP

---

## Graph Representation Learning ch4

- Presenter: Dragon Chen
- Date: 12/25

---

## Intro

We focus on one task type: **knowledge graph completion**.

Using **shallow embeddings** to deal with **multi-relational graphs** was mentioned in the last chapter.

----

### Recap: Multi-Relational Graphs

![](https://i.imgur.com/UCpa3Zf.png =1000x)

Edges can have different types.

----

A multi-relational graph is often referred to as a **knowledge graph**. Knowledge graph completion is the task of **predicting missing edges**.

---

## Decoder

![](https://i.imgur.com/2EYoPF6.png =800x)

We make our decoder **multi-relational**.

Without relation types:

$DEC(ENC(u), ENC(v)) = S[u, v]$

$DEC(z_u, z_v) = S[u, v]$

----

### Multi-Relational Decoder

With relation types:

$DEC(z_u, r, z_v) = A[u, r, v]$

$r$ is the **relation type**, and $A[u, r, v]$ represents the **likelihood that this edge exists** in the graph.

----

### Simplest Example

$DEC(z_u, r, z_v) = z_u^\top R_r z_v$

$R_r \in \mathbb{R}^{d \times d}$ is a learnable matrix for each relation.

$loss = \sum_{u \in V} \sum_{v \in V} \sum_{r \in R} || DEC(z_u, r, z_v) - A[u, r, v]||^2$

----

### Problems

- The first issue is that this loss is **extremely expensive to compute**: its complexity is $O(|V|^2 |R|)$, while the ideal complexity for a sparse graph is $O(|E|)$.
- The second is that we want a loss like **cross-entropy instead of mean-squared error**: MSE is natural for regression, whereas cross-entropy is better suited to an edge classification task.

---

## Cross-Entropy Loss

One popular loss function that is both efficient and suited to our task is the **cross-entropy loss with negative sampling**:

$$
loss = \sum_{(u,r,v)\in E} -\log(\sigma(DEC(z_u, r, z_v))) \\
- \gamma \, \mathbb{E}_{v_n \sim P_{n,u}(V)} [\log(\sigma(-DEC(z_u, r, z_{v_n})))]
$$

$\sigma$ is the sigmoid function; $\gamma$ is a hyperparameter.

----

### Detail

This loss comes from the **binary cross-entropy**.

The first part is the **log-likelihood that we predict "true"** for an edge that does actually exist:

$\sum_{(u,r,v)\in E} -\log(\sigma(DEC(z_u, r, z_v)))$<!-- .element: class="fragment" -->

The second part is the log-likelihood that we correctly **predict "false" for an edge that does not exist**:

$- \gamma \, \mathbb{E}_{v_n \sim P_{n,u}(V)} [\log(\sigma(-DEC(z_u, r, z_{v_n})))]$<!-- .element: class="fragment" -->

----

The expectation is evaluated using a Monte Carlo approximation:

$$
L = \sum_{(u,r,v)\in E} \Big( -\log(\sigma(DEC(z_u, r, z_v))) \\
- \sum_{v_n \in P_{n,u}} \log(\sigma(-DEC(z_u, r, z_{v_n}))) \Big)
$$

$P_{n,u}$ is a (usually small) set of nodes sampled from the negative-sampling distribution $P_{n,u}(V)$.
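----

### Loss in Code

As a minimal sketch of the Monte Carlo loss above (assuming PyTorch; `pos_scores` and `neg_scores` are hypothetical names for decoder outputs, not a library API):

```python
import torch
import torch.nn.functional as F

def negative_sampling_loss(pos_scores, neg_scores):
    """Cross-entropy loss with negative sampling (Monte Carlo form).

    pos_scores: DEC(z_u, r, z_v) for edges in E, shape (batch,)
    neg_scores: DEC(z_u, r, z_{v_n}) for sampled negatives, shape (batch, num_neg)
    """
    # -log(sigmoid(score)): predict "true" for edges that do exist
    pos_term = -F.logsigmoid(pos_scores)
    # -log(sigmoid(-score)): predict "false" for each sampled negative in P_{n,u}
    neg_term = -F.logsigmoid(-neg_scores).sum(dim=1)
    return (pos_term + neg_term).sum()

# Toy usage: 8 positive edges with 5 sampled negatives each
loss = negative_sampling_loss(torch.randn(8), torch.randn(8, 5))
```

`F.logsigmoid` is used instead of `torch.log(torch.sigmoid(...))` for numerical stability.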
---

### Max-Margin Loss

The other popular loss function used for multi-relational node embedding is the **margin loss**:

$$
L = \sum_{(u, r, v) \in E} \sum_{v_n \in P_{n, u}} \\
\max(0, -DEC(z_u, r, z_v) + DEC(z_u, r, z_{v_n}) + \Delta)
$$

$\Delta$ is the margin. If the score **for the "true" pair is bigger than the score for the "negative" pair** by at least $\Delta$, the loss is small.

---

## Decoders

We have already mentioned one multi-relational decoder, the so-called RESCAL decoder:

$DEC(z_u, r, z_v) = z_u^\top R_r z_v$

This is not often used because of its **high computational cost and high parameter complexity**.<!-- .element: class="fragment" -->

$R_r \in \mathbb{R}^{d \times d}$ costs $O(d^2)$ parameters per relation; we want our decoder to be $O(d)$.<!-- .element: class="fragment" -->

----

### Translational Decoders

- TransE

$DEC(z_u, r, z_v) = - || z_u + r_r - z_v ||$

$r_r \in \mathbb{R}^d$ is a learnable embedding of relation $r$. This is one of the earliest decoders and continues to be a strong baseline.<!-- .element: class="fragment" -->

- TransX

$DEC(z_u, r, z_v) = - || g_{1,r}(z_u) + r_r - g_{2,r}(z_v) ||$

$g_{i,r}$ are trainable transformations that depend on the relation $r$.<!-- .element: class="fragment" -->

----

- TransH (Wang et al., 2014)

$DEC(z_u, r, z_v) = - || (z_u - w_r^\top z_u w_r) + r_r - (z_v - w_r^\top z_v w_r) ||$

---

## Different Decoders

### Multi-Linear Dot Product

Another approach develops multi-relational decoders by generalizing the dot-product decoder from simple graphs. It is called the DistMult decoder:

$$
DEC(z_u, r, z_v) = \langle z_u, r_r, z_v \rangle \\
= \sum_{i=1}^d z_u[i] \times r_r[i] \times z_v[i]
$$

----

One problem is that **DistMult is symmetric**:

$DEC(z_u, r, z_v) = DEC(z_v, r, z_u)$

This is a limitation because **many multi-relational graphs are directed and asymmetric**.

----

### Complex Decoder

Trouillon et al. [2016] proposed augmenting the DistMult decoder by employing complex-valued embeddings:

- ComplEx

$$
DEC(z_u, r, z_v) = Re(\langle z_u, r_r, \bar{z}_v \rangle) \\
= Re\Big(\sum_{i=1}^d z_u[i] \times r_r[i] \times \bar{z}_v[i]\Big)
$$

where $\bar{z}_v$ denotes the complex conjugate of $z_v$.
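----

### Decoders in Code

For illustration, a minimal sketch of the decoders above (assuming PyTorch; the function names are hypothetical). It also demonstrates that DistMult is symmetric while ComplEx is not:

```python
import torch

def transe_score(z_u, r_r, z_v):
    # TransE: -||z_u + r_r - z_v||
    return -torch.norm(z_u + r_r - z_v, dim=-1)

def distmult_score(z_u, r_r, z_v):
    # DistMult: sum_i z_u[i] * r_r[i] * z_v[i]
    return (z_u * r_r * z_v).sum(dim=-1)

def complex_score(z_u, r_r, z_v):
    # ComplEx: Re(sum_i z_u[i] * r_r[i] * conj(z_v)[i])
    return torch.real((z_u * r_r * torch.conj(z_v)).sum(dim=-1))

d = 16
z_u, z_v, r_r = torch.randn(d), torch.randn(d), torch.randn(d)
# Symmetric: swapping u and v gives the same score
assert torch.allclose(distmult_score(z_u, r_r, z_v),
                      distmult_score(z_v, r_r, z_u))

zc_u = torch.randn(d, dtype=torch.cfloat)
zc_v = torch.randn(d, dtype=torch.cfloat)
rc_r = torch.randn(d, dtype=torch.cfloat)
# Generally different: ComplEx can model asymmetric (directed) relations
print(complex_score(zc_u, rc_r, zc_v), complex_score(zc_v, rc_r, zc_u))
```

All three score functions are $O(d)$ per edge, in contrast to the $O(d^2)$ RESCAL decoder.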
{"metaMigratedAt":"2023-06-15T17:39:58.836Z","metaMigratedFrom":"YAML","title":"Graph Representation Learning ch4~5","breaks":true,"slideOptions":"{\"theme\":\"serif\",\"transition\":\"slide\"}","contributors":"[{\"id\":\"1155b8f4-1581-4900-adcf-1b4abbb2c7fd\",\"add\":5289,\"del\":685}]"}
    548 views