## Continue Graph Thinking
https://hackmd.io/@teacher144123/SJh9ZSzpP
---
## Graph Representation Learning ch4
- Presenter: Dragon Chen
- Date: 12/25
---
## Intro
We focus on one task type: knowledge graph completion.
Using **shallow embeddings** to deal with **multi-relation graphs** was mentioned in the last chapter.
----
### Recap: Multi-Relation Graph

Edges have different types
----
A multi-relation graph is often referred to as a **knowledge graph**.
Knowledge graph completion is the task of **predicting missing edges**.
---
## Decoder

We make our decoder **multi-relational**.
Without relation types:
$DEC(ENC(u), ENC(v)) = S[u, v]$
$DEC(z_u, z_v) = S[u, v]$
----
### Multi-Relation Decoder
With relation types:
$DEC(z_u, r, z_v) = A[u, r, v]$
$r$ is the **relation type**.
$A[u, r, v]$ represents the **likelihood that this edge exists** in the graph.
----
### Simplest example
$DEC(z_u, r, z_v) = z_u^\top R_r z_v$
$R_r \in \mathbb{R}^{d \times d}$ is a learnable matrix for each relation $r$
$loss = \sum_{u \in V} \sum_{v \in V} \sum_{r \in R} || DEC(z_u, r, z_v) - A[u, r, v]||^2$
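A minimal NumPy sketch of this decoder and one squared-error term of the loss (the dimension and variable names are illustrative, not from the book):
```python
import numpy as np

d = 8                             # embedding dimension (illustrative)
rng = np.random.default_rng(0)
z_u = rng.normal(size=d)          # node embeddings z_u, z_v
z_v = rng.normal(size=d)
R_r = rng.normal(size=(d, d))     # one learnable d x d matrix per relation r

def rescal_score(z_u, R_r, z_v):
    """DEC(z_u, r, z_v) = z_u^T R_r z_v"""
    return z_u @ R_r @ z_v

score = rescal_score(z_u, R_r, z_v)   # approximates A[u, r, v]
term = (score - 1.0) ** 2             # squared-error term for an existing edge
```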
----
### Problem
- The first issue is that it is **extremely expensive to compute**:
the complexity is $O(|V|^2 |R|)$, while the ideal complexity on a sparse graph is $O(|E|)$
- We also want a loss function like **cross-entropy instead of mean-squared error**:
MSE is natural for regression; cross-entropy is better suited to an edge classification task
---
## Cross-Entropy Loss
One popular loss function that is both efficient and suited to our task is the **cross-entropy loss with negative sampling**.
$$
loss = \sum_{(u,r,v)\in E} -\log(\sigma(DEC(z_u, r, z_v))) \\ -
\gamma \, \mathbb{E}_{v_n \sim P_{n,u}(V)} [\log(\sigma(-DEC(z_u, r, z_{v_n})))]
$$
$\sigma$ is the sigmoid function
$\gamma$ is a hyperparameter
----
### Detail
This loss comes from **binary cross-entropy**.
The first part is the **log-likelihood that we correctly predict "true"** for an edge that does actually exist:
$loss = \sum_{(u,r,v)\in E} -\log(\sigma(DEC(z_u, r, z_v)))$<!-- .element: class="fragment" -->
The second part is the log-likelihood that we correctly **predict "false" for an edge that does not exist**:
$- \gamma \, \mathbb{E}_{v_n \sim P_{n,u}(V)} [\log(\sigma(-DEC(z_u, r, z_{v_n})))]$<!-- .element: class="fragment" -->
----
The expectation is evaluated using a Monte Carlo approximation.
$$
L = \sum_{(u,r,v)\in E} \big( -\log(\sigma(DEC(z_u, r, z_v))) \\ -
\sum_{v_n \in \mathcal{P}_{n, u}} \log(\sigma(-DEC(z_u, r, z_{v_n}))) \big)
$$
$\mathcal{P}_{n,u}$ is a (usually small) set of nodes sampled from the negative-sampling distribution $P_{n,u}(V)$.
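A sketch of this Monte Carlo loss in NumPy. The uniform negative sampling and the generic `decode` callback are illustrative assumptions; any of the decoders below can be plugged in:
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(decode, triples, z, r_emb, num_neg=5, seed=0):
    """Cross-entropy loss over observed triples, with negatives
    drawn uniformly from V (one simple choice of P_{n,u})."""
    rng = np.random.default_rng(seed)
    loss = 0.0
    for u, r, v in triples:
        # -log sigma(DEC(z_u, r, z_v)) for the observed edge (u, r, v)
        loss -= np.log(sigmoid(decode(z[u], r_emb[r], z[v])))
        # -log sigma(-DEC(z_u, r, z_{v_n})) for each sampled negative v_n
        for v_n in rng.integers(0, len(z), size=num_neg):
            loss -= np.log(sigmoid(-decode(z[u], r_emb[r], z[v_n])))
    return loss
```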
---
### Max-Margin Loss
The other popular loss function used for multi-relational node embedding is the **margin loss**.
$$
L = \sum_{(u, r, v) \in E} \sum_{v_n \in \mathcal{P}_{n, u}} \\
\max(0, -DEC(z_u, r, z_v) + DEC(z_u, r, z_{v_n}) + \Delta)
$$
$\Delta$ is the margin
If the score **for the "true" pair exceeds the "negative" pair's score** by at least the margin $\Delta$, the loss for that pair is zero
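A small sketch of this hinge loss for one positive triple and its sampled negatives (the scores can come from any decoder; the names are illustrative):
```python
import numpy as np

def margin_loss(pos_score, neg_scores, delta=1.0):
    """Sum over negatives of max(0, -DEC(true) + DEC(negative) + delta)."""
    return np.sum(np.maximum(0.0, -pos_score + np.asarray(neg_scores) + delta))

# The true pair beats the first negative by more than delta (no loss),
# but not the second one (positive loss).
print(margin_loss(pos_score=2.0, neg_scores=[0.5, 1.5]))  # 0.5
```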
---
## Decoders
We have mentioned one multi-relational decoder, the so-called RESCAL decoder
$DEC(z_u, r, z_v) = z_u^\top R_r z_v$
This is not often used because of its **high computational cost and parameter complexity**.<!-- .element: class="fragment" -->
$R_r \in \mathbb{R}^{d \times d}$ needs $O(d^2)$ parameters per relation; we want our decoder to be $O(d)$<!-- .element: class="fragment" -->
----
### Translational decoders
- TransE
$DEC(z_u, r, z_v) = - || z_u + r_r - z_v ||$
$r_r \in \mathbb{R}^d$ is an embedding of the relation. This is one of the earliest decoders and continues to be a strong baseline.<!-- .element: class="fragment" -->
- TransX
$DEC(z_u, r, z_v) = - || g_{1,r}(z_u) + r_r - g_{2,r}(z_v) ||$
$g_{i,r}$ are trainable transformations that depend on the relation<!-- .element: class="fragment" --> $r$
----
- TransH (Wang et al. 2014)
$DEC(z_u, r, z_v) = - || (z_u - w_r^\top z_u w_r) + r_r - (z_v - w_r^\top z_v w_r) ||$
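A NumPy sketch of the TransE and TransH scores, assuming relation embeddings $r_r \in \mathbb{R}^d$ (TransH also commonly constrains $w_r$ to unit norm):
```python
import numpy as np

def transe_score(z_u, r_r, z_v):
    """TransE: -||z_u + r_r - z_v||"""
    return -np.linalg.norm(z_u + r_r - z_v)

def transh_score(z_u, r_r, w_r, z_v):
    """TransH: translate after projecting the node embeddings
    onto the hyperplane with (unit) normal vector w_r."""
    proj_u = z_u - (w_r @ z_u) * w_r
    proj_v = z_v - (w_r @ z_v) * w_r
    return -np.linalg.norm(proj_u + r_r - proj_v)
```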
---
## Different Decoder
### Multi-Linear Dot Product
Another approach develops multi-relational decoders by generalizing the dot-product decoder from simple graphs. This is called the **DistMult** decoder.
$$
DEC(z_u, r, z_v) = \langle z_u, r_r, z_v \rangle \\
= \sum_{i=1}^d z_u[i] \times r_r[i] \times z_v[i]
$$
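As a sketch, the DistMult score is just an element-wise three-way product summed over the embedding dimension:
```python
import numpy as np

def distmult_score(z_u, r_r, z_v):
    """<z_u, r_r, z_v> = sum_i z_u[i] * r_r[i] * z_v[i]"""
    return np.sum(z_u * r_r * z_v)
```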
----
One problem is that **DistMult is symmetric**:
$DEC(z_u, r, z_v) = DEC(z_v, r, z_u)$
This is a limitation because **many multi-relational graphs are directed and asymmetric**.
----
### Complex Decoder
Trouillon et al. [2016] proposed augmenting the DistMult decoder by employing complex-valued embeddings.
- ComplEx
$$
DEC(z_u, r, z_v) = Re(\langle z_u, r_r, \bar{z}_v \rangle) \\
= Re\Big(\sum_{i=1}^d z_u[i] \times r_r[i] \times \bar{z}_v[i]\Big)
$$
where $z_u, z_v, r_r \in \mathbb{C}^d$ and $\bar{z}_v$ is the complex conjugate of $z_v$
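A sketch of the ComplEx score with complex-valued embeddings; the conjugate on $z_v$ is what makes the score asymmetric, unlike DistMult:
```python
import numpy as np

d = 8
rng = np.random.default_rng(0)
z_u = rng.normal(size=d) + 1j * rng.normal(size=d)  # z_u, z_v, r_r in C^d
z_v = rng.normal(size=d) + 1j * rng.normal(size=d)
r_r = rng.normal(size=d) + 1j * rng.normal(size=d)

def complex_score(z_u, r_r, z_v):
    """Re(<z_u, r_r, conj(z_v)>)"""
    return np.real(np.sum(z_u * r_r * np.conj(z_v)))

# Swapping u and v generally changes the score (asymmetric decoder).
print(complex_score(z_u, r_r, z_v), complex_score(z_v, r_r, z_u))
```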
{"metaMigratedAt":"2023-06-15T17:39:58.836Z","metaMigratedFrom":"YAML","title":"Graph Representation Learning ch4~5","breaks":true,"slideOptions":"{\"theme\":\"serif\",\"transition\":\"slide\"}","contributors":"[{\"id\":\"1155b8f4-1581-4900-adcf-1b4abbb2c7fd\",\"add\":5289,\"del\":685}]"}