
Optimization on manifolds


References

  • Differential Geometry
    • Frederic Schuller's lectures on YouTube
    • Lee's series
  • Optimization: Absil et al.
  • Details in notes

Metagoals

  • This talk fails to provide examples (yet)
  • Aims at stating the problems
  • and actually asking questions

"Riemannian Adaptive Optimization Methods"

(Becigneul, Ganea)

Gradient descent

  • \(p_0\in K,~\alpha\in\mathbb{R},~f: K\subset \mathbb{R}^d\to \mathbb{R},\)
  • \(f(p)\to \min.\)
  1. \(g_t = f'(p_t)^\top\)
  2. \(p_{t+1} = p_t - \alpha g_t\)
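
A minimal NumPy sketch of this loop (the function names are mine):

```python
import numpy as np

def gradient_descent(grad_f, p0, alpha=0.1, steps=100):
    """Plain gradient descent in R^d: p_{t+1} = p_t - alpha * grad_f(p_t)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p - alpha * grad_f(p)
    return p

# Example: f(p) = ||p||^2 has gradient 2p; descent converges to the origin.
p_star = gradient_descent(lambda p: 2 * p, p0=[3.0, -4.0])
```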

Manifolds

An \(n\)-manifold: "a 'nice' topological space \((M, \Theta)\) in which every point \(p\in M\) has an open neighbourhood homeomorphic to (a piece of) \(\mathbb{R}^n\)"

\[\phi: U\xrightarrow{\sim} \phi(U)\subset \mathbb{R}^n\]


Manifolds. Missing tools

  • Descent direction: no derivatives
    • No notion of differentiability
  • Step along direction: no addition

Smooth manifolds

"A manifold covered with a maximal atlas of smoothly-compatible charts", \((M, \Theta, \mathcal{A})\)


Smooth functions

\(f:M\to\mathbb{R}\) is smooth if \(f\circ x^{-1}: x(U)\to\mathbb{R}\) is smooth for every chart \((U,x)\in\mathcal{A}\).

Directional derivatives

  • \((M, \Theta, \mathcal{A}),\)
  • \(p\in M,\)
  • Smooth curve \(\gamma:I\subset\mathbb{R}\to M,\) \(\gamma(0)=p,\)
  • \(f: M\to \mathbb{R}\) smooth.

\[(f\circ \gamma)'(0).\]


Directional derivatives

\[X_{\gamma, p}f = (f\circ \gamma)'(0)\]

"Directional derivative of \(f\) along \(\gamma\) at \(p\)"



Directional derivatives

\[X_{\gamma, p}f = (f\circ \gamma)'(0).\]

  • \(X_{\gamma,p}:C^\infty(M)\to\mathbb{R}\) is a linear operator.
  • Leibniz rule: \(X_{\gamma, p}(fg) = f(p)X_{\gamma, p}g + X_{\gamma, p}(f)g(p).\)


Tangent space

\[T_pM = \left\{ X_{\gamma,p}\left| \gamma:I\to M~\text{smooth}, ~\gamma(0)=p \right. \right\}\]

  • "Tangent space to \(M\) at \(p\)"
  • Coincides with the set of all derivations
  • Derivations :left_right_arrow: curve segments via charts
  • Forms a linear space

Coordinate basis

A chart \((U,x)\), \(p\in U\), corresponds to a basis in \(T_p M\), represented by coordinate curves.

  • Denoted: \(\frac{\partial}{\partial x^1},~\ldots,~\frac{\partial}{\partial x^n}.\)
  • Acts: \(\frac{\partial}{\partial x^k}f = \partial_k (\underbrace{f\circ x^{-1}}_{\mathbb{R}^n\to\mathbb{R}})(\underbrace{x(p)}_{\in\mathbb{R}^n})\) where \(\partial_k\) is the \(k\)-th partial derivative

Exponential map

TODO: move to after geodesics

  • "Direction" means \(X\in T_pM\)
  • \(\exists\) open \(C(p)\subset M\) (bounded by the cut locus) on which the geodesic \(\gamma\) with \(\gamma(0)=p,~\dot\gamma(0)=X\) is unique

\[\exp_pX = \gamma(1)~\text{if defined}.\]
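
For instance, on the unit sphere \(S^{n-1}\subset\mathbb{R}^n\) with the round metric, the exponential map has a closed form: \[\exp_p X = \cos(\|X\|)\,p + \frac{\sin(\|X\|)}{\|X\|}\,X.\]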


Smooth manifolds. Missing tools

  • Before: \(p_{t+1} = p_t - \alpha g_{t},\)
  • After: \(p_{t+1} = \exp_{p_t}(-\alpha X_t).\)

:confused:

Given objective \(f:M\to\mathbb{R}\) and \(p_t\in M\), how to pick \(X_t\)?


Derivative of a function

Given smooth \(F:M\to N,\)

"the push-forward of \(F\) at \(p\)" is the linear map \(F_{*,p}:T_p{M}\to T_{F(p)}N\)

\[(\underbrace{F_{*,p}X}_{T_{F(p)}N})\underbrace{g}_{C^\infty(N)} = \underbrace{X}_{T_p M}(\underbrace{g\circ F}_{C^\infty(M)}).\]

  • In coordinate basis: Jacobian matrix
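
Concretely: pick charts \((U, x)\) on \(M\) and \((V, y)\) on \(N\); then \[\left(F_{*,p}\right)^i_{~j} = \partial_j\!\left(y^i\circ F\circ x^{-1}\right)\!\left(x(p)\right).\]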

Derivative of objective

Given objective \(f:M\to \mathbb{R}\), its derivative at a point \(p\in M\) is a linear functional \[T_p{M}\to T_{f(p)}\mathbb{R}\cong \mathbb{R}.\]

  • \(f_{*,p}\in T_p^* M\),
  • but a descent direction must be a vector in \(T_p M\) :confused: .

Metric manifolds

"A smooth manifold endowed with smooth assignment of (almost) 'inner products' \(g:T_p M\times T_p M\to \mathbb{R}\) at each \(p\)". Notes for details

  • Riesz in \(\mathbb{R}^n\): vectors :left_right_arrow: linear functionals via inner product
  • \(\omega = g(X, \cdot) = X^\flat\)
  • \(\omega \mapsto X=\omega^\sharp\) as unique \(X\) s.t. \(\omega(Y) = g(X, Y)\).
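  • In a coordinate basis this is index raising/lowering with \(g_{ij} = g\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)\) and its inverse \(g^{ij}\): \((X^\flat)_i = g_{ij}X^j\), \((\omega^\sharp)^i = g^{ij}\omega_j\).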

RGD

\[X_t = \operatorname{grad}f(p_t) \equiv f_{*,p_t}^\sharp,\] \[p_{t+1} = \exp_{p_t}(-\alpha X_t).\]

Ready to implement?
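
As a sanity check, here is a self-contained NumPy sketch for the unit sphere, where \(\sharp\) and \(\exp\) both have closed forms (a toy, not geoopt's implementation):

```python
import numpy as np

def sphere_rgd(egrad_f, p0, alpha=0.1, steps=100):
    """Riemannian GD on the unit sphere S^{n-1}; egrad_f returns the
    Euclidean gradient of f at a point of R^n."""
    p = np.asarray(p0, dtype=float)
    p /= np.linalg.norm(p)
    for _ in range(steps):
        g = egrad_f(p)
        X = g - np.dot(g, p) * p  # grad f(p): project onto T_p S^{n-1}
        v = -alpha * X
        t = np.linalg.norm(v)
        if t < 1e-12:
            break
        p = np.cos(t) * p + np.sin(t) / t * v  # exp_p(v) on the sphere
    return p

# Example: minimizing f(p) = p^T A p over the sphere recovers the
# eigenvector of A with the smallest eigenvalue.
A = np.diag([3.0, 2.0, 1.0])
p_min = sphere_rgd(lambda p: 2 * A @ p, p0=np.array([1.0, 1.0, 1.0]))
```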


Extrinsic representation

Immersion: smooth \(\phi: M\to\mathbb{R}^n\) such that \(\phi_{*,p}\) is injective for every \(p\in M\). Embedding: an immersion that is a homeomorphism onto its image.

[Strong] Whitney: can embed in \(\mathbb{R}^{2\operatorname{dim}M}.\)


Extrinsic representation

Let \(M\subset\mathbb{R}^d\) be a level set \(\{p \mid F(p)=0\}\) of a smooth map \(F\).

  • \(\left.\operatorname{id}\right|_{M}\) is an embedding and

  • can represent points and tangents via (redundant) \(\mathbb{R}^d\) coordinates,

  • can induce metric from Euclidean using \(F\),

  • need projection op.

  • (prevalent approach, including geoopt)

  • (examples using local charts?)
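
For a hypersurface \(\{F=0\}\) with \(F:\mathbb{R}^d\to\mathbb{R}\), the projection just removes the normal component \(\nabla F(p)\). A minimal NumPy sketch (names are mine):

```python
import numpy as np

def tangent_project(grad_F, p, v):
    """Orthogonal projection of an ambient vector v onto T_p{F=0},
    assuming grad_F(p) != 0 (so the level set is a hypersurface)."""
    n = grad_F(p)
    return v - (np.dot(n, v) / np.dot(n, n)) * n

# Example: the sphere as {||p||^2 - 1 = 0}, grad F(p) = 2p.
p = np.array([0.0, 0.0, 1.0])
v_tan = tangent_project(lambda q: 2 * q, p, np.array([1.0, 2.0, 3.0]))  # -> [1, 2, 0]
```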


(Becigneul, Ganea)

Retractions

  • Closed-form \(\exp\) and \(\sharp\)?
    • Usually unknown or infeasible
  • First-order approximations to \(\exp\): retractions

\[R_p:T_p M \to M,\] \[R_p(0) = p,\] \[(R_p)_{*,0} "=" \operatorname{id}_{T_p M}.\]
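
On a level set, metric projection back onto \(M\) typically gives such a retraction. For the sphere (a sketch):

```python
import numpy as np

def retract_sphere(p, v):
    """R_p(v) = (p + v) / ||p + v||: maps T_p S^{n-1} back to the sphere,
    fixes p at v = 0, and agrees with exp_p to first order in v."""
    q = p + v
    return q / np.linalg.norm(q)
```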


Faster methods

  • In vector spaces:
    • second-order methods (Newton)
    • dynamic approximations of Hessian (quasi-Newton)
      • "momentum", "adaptivity term"
  • On manifold:
    • First derivative is a vector field
    • How to differentiate vector fields?

Vector fields

A bundle is a triple \((E, \pi, M)\), \[E\xrightarrow{\pi} M\]

  • \(E\) "total space" (smooth mfd),
  • \(\pi\) "projection" (smooth surjective map),
  • \(M\) "base space" (smooth mfd).

Vector fields

Tangent bundle: \[E = TM = \bigcup_{p\in M} T_p M,\] \[\pi: X_{\gamma, p} \mapsto p.\]


Vector fields

A smooth section of a bundle \(E\xrightarrow\pi M\) is a smooth map \[\sigma:M\to E\] such that \[\pi\circ\sigma = \operatorname{id}_M\]

Set of all smooth sections is denoted \(\Gamma(E)\).


Vector fields

A vector field is a smooth section of the tangent bundle \(TM\).

  • To each \(p\in M\) assigns \(X_p\in T_p M\)
  • but in a "smooth way"

Another example: metric \(g\) is a smooth section of the bundle of bilinear forms (\((0,2)\)-tensor field)


Covariant derivative

"Covariant derivative of vf \(Y\) along \(X\)": \[\nabla_XY\] Wishlist:

  1. \(\nabla_{fX+Y}Z = f\nabla_XZ + \nabla_Y Z,\)
  2. \(\nabla_X(Y+Z) = \nabla_XY + \nabla_X Z,\)
  3. \(\nabla_X(fY) = X(f)Y + f\nabla_XY.\)


Covariant derivative

In chart:

\[\nabla_XY = \left(X^i\,\partial_i Y^k + \Gamma^k_{ij}\,X^iY^j\right)\frac{\partial}{\partial x^k}\]

(Christoffel symbols: \(\nabla_{\frac{\partial}{\partial x^i}}\frac{\partial}{\partial x^j} = \Gamma^k_{ij}\,\frac{\partial}{\partial x^k}\))


Levi-Civita connection

The unique connection that is torsion-free, \[\nabla_XY - \nabla_YX = [X, Y],\] and compatible with the metric, \[X\,g(Y,Z) = g(\nabla_XY, Z) + g(Y, \nabla_XZ).\] In chart: \(\Gamma^k_{ij} = \frac{1}{2}\,g^{kl}\left(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij}\right).\)


Vector field along a curve

  • \(\gamma:I\to M\)
  • \(X:I\to TM\) s.t. \(X_\lambda \in T_{\gamma(\lambda)}M\)

Extendible if there is \(\tilde{X}\in\Gamma(TM)\) s.t. \(X_\lambda = \tilde{X}_{\gamma(\lambda)}.\)

  • Thm: locally unique

Parallel translate

If \(X:I \to TM\) is a vf along \(\gamma\) extendible to \(\tilde{X}\) and \[\nabla_{\dot\gamma}\tilde{X} = 0\] along \(\gamma\), then \(X\) is called autoparallely-transported.

  • "Transports" \(X_0\) in \(T_{\gamma(0)}M\) to \(X_\lambda\in T_{\gamma(\lambda)}M\)


(Becigneul, Ganea)

Vector transport

  • No closed-form parallel translation either
  • "Vector transport" 1st-order approx (see Absil)

(here should go a qualitative derivation of vector transports for level sets; a sketch follows)
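
One standard construction for level sets (see Absil et al.): move the vector to the new point unchanged in ambient \(\mathbb{R}^d\) coordinates, then project it onto the new tangent space. A sketch for hypersurfaces:

```python
import numpy as np

def projection_vector_transport(grad_F, q, w):
    """Transport w to T_q{F=0} by orthogonal projection at the new point q;
    agrees with parallel translation to first order."""
    n = grad_F(q)
    return w - (np.dot(n, w) / np.dot(n, n)) * n
```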


Newton

later addition

\(f: \mathbb{R} \to \mathbb{R},\) find \(x\) s.t. \(f(x)=0\).

  • Instead, replace \(f\) with linear \(h(x) = f(x_t) + f'(x_t)(x - x_t)\)
  • and solve \(h(x)=0\):
  • \(x = x_t-f'(x_t)^{-1}f(x_t)\)
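
A minimal sketch of the iteration:

```python
def newton_root(f, df, x0, steps=20, tol=1e-12):
    """Newton's iteration for f(x) = 0: solve the linearization at x_t."""
    x = x0
    for _ in range(steps):
        step = f(x) / df(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Example: sqrt(2) as the root of x^2 - 2.
r = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```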

Newton

later addition

\(f: \mathbb{R}\to \mathbb{R},\) find \(x\) s.t. \(f'(x)=0\).

  • Apply Newton's iteration to \(f'\)
  • \(x_{t+1} = x_t - f''(x_t)^{-1} f'(x_t)\)

Newton

later addition

\(f: \mathbb{R}^n\to\mathbb{R}^m\), find \(x\) s.t. \(f(x)=0\).

  • \(\tilde{f}(x) = f(x_t) + f'(x_t)(x-x_t),\)
  • \(\tilde{f}(x_{t+1}) = 0.\)

Newton

later addition

\(f: \mathbb{R}^n\to\mathbb{R}\), find \(x\) s.t. \(f'(x)=0\).

  • \(\tilde{f'}(x) = f'(x_t) + \underbrace{f''(x_t)(x - x_t)}_{\in~{^n}\mathbb{R}},\)
  • \(f''(x_t)(x - x_t) = -f'(x_t)\)
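
In code the Newton step is a linear solve against the Hessian; a NumPy sketch:

```python
import numpy as np

def newton_opt(grad_f, hess_f, x0, steps=20):
    """Newton's method for minimizing f on R^n:
    solve f''(x_t) d = -f'(x_t), then x_{t+1} = x_t + d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        d = np.linalg.solve(hess_f(x), -grad_f(x))
        x = x + d
    return x

# Example: a quadratic f(x) = 0.5 x^T A x - b^T x is solved in one step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_opt(lambda x: A @ x - b, lambda x: A, x0=np.zeros(2))
```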

Newton

Solve \[\nabla_{X_t}\operatorname{grad}f(p_t) = -\operatorname{grad}f(p_t)\] for \(X_t\in T_{p_t}M\).

\[p_{t+1} = \exp_{p_t}X_t.\]


Accelerated first-order methods

  • Stochastic optimization
  • Accelerated SGD?
  • Adam: coordinate-wise adaptive learning rate (more or less)
    • Parallel translation of "momentum"

"Coordinates" on manifolds?


Product manifolds

\[M = M_1\times \cdots\times M_n,\]

Product topology, product metric.

RAdam: adaptivity term for each product component!


(Becigneul, Ganea)

\(\varkappa\)-Stereographic space

  • (Becigneul, Ganea) use RAdam to embed WordNet in a Poincaré ball (constant curvature \(c\))
  • Hyperbolic neural networks
  • \(c\) hyperparameter
  • Optimize for \(c\)?
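
A minimal sketch of what this looks like with geoopt (the toy objective and initialization are illustrative, not the paper's setup):

```python
import torch
import geoopt

ball = geoopt.PoincareBall(c=1.0)  # curvature c as a fixed hyperparameter

# 2D embeddings near the origin; ManifoldParameter records the manifold
# so the optimizer knows to take Riemannian steps.
emb = geoopt.ManifoldParameter(torch.randn(10, 2) * 1e-3, manifold=ball)
opt = geoopt.optim.RiemannianAdam([emb], lr=1e-2)

for _ in range(100):
    opt.zero_grad()
    loss = ball.dist(emb[0], emb[1])  # toy loss: pull two points together
    loss.backward()
    opt.step()  # rescaled gradient + retraction + transport of Adam state
```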

\(\varkappa\)-Stereographic space

  • \(S^\varkappa\) \(\varkappa\)-stereographic space, universal for positive and negative curvature (Bloch)
  • "Almost \(\mathbb{R}\)-line bundle"
    • \(E\xrightarrow\pi \mathbb{R}\)
    • \(\pi^{-1}(\varkappa) \simeq S^\varkappa\)
  • Can we induce metric, etc? OK for optimization?