Riemannopt@HSE
https://hackmd.io/@SomeoneS/manopt-slides
It's a two-part talk about optimization on manifolds and hyperbolic embeddings in deep learning.
Dates: March 6, March 13 (covers the basics); March 13, March 20.
Notes and slides:
References:
Additional:
Starting point – RAdam
Let us first set the goals for today. Somewhat unconventionally, we jump off with the algorithm from "Riemannian Adaptive Optimization Methods" by Becigneul and Ganea. It might look scary at first, but what we are going to do now is iterate over it, decrypting it bit by bit.
Manifolds
An \(n\)-dimensional manifold \(M\) is a topological space \((M, \theta)\) that is Hausdorff, second countable, and locally homeomorphic to \(\mathbb{R}^n\): every point \(p\in M\) has an open neighbourhood \(U\) together with a homeomorphism \(x: U\to x(U)\subset\mathbb{R}^n\), called a chart.
Roughly speaking, it's a "locally-Euclidean space".
For instance, \(\mathbb{R}^n\) itself is a manifold, with the (global) chart being the identity function.
An additional piece of structure we're going to need (on top of the topology and local Euclidean-ness) is that of a smooth manifold. Consider two charts \((U, x)\), \((V, y)\) whose domains overlap: \(U\cap V \neq \varnothing\). Both charts being invertible functions, we can construct a "change of coordinates", or "transition map", from \(x\) to \(y\) as \(y\circ x^{-1}\). We say that \(x\) and \(y\) are smoothly compatible if either \(U\cap V = \varnothing\) or the transition map is a diffeomorphism (in the Euclidean sense): \[y\circ x^{-1}: x(U\cap V)\subset\mathbb{R}^n \to y(U\cap V)\subset\mathbb{R}^n.\] Now, to endow \(M\) with a smooth structure means to provide a maximal atlas \(A\) on \(M\) – a collection of smoothly compatible charts whose domains cover all of \(M\), and such that the collection cannot be extended with a new chart in a smoothly compatible way.
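To make this concrete, here is a minimal numpy sketch of my own (not from the references): two charts on the circle \(S^1\) – the "first coordinate" chart on the upper half-circle and the "second coordinate" chart on the right half-circle – and their transition map on the overlap, which turns out to be the smooth function \(u\mapsto\sqrt{1-u^2}\).

```python
import numpy as np

# Two charts on the unit circle S^1 ⊂ R^2.
# U = upper half-circle, x(p) = p[0]   (first coordinate as local coordinate)
# V = right half-circle, y(p) = p[1]   (second coordinate as local coordinate)

def x_inv(u):
    """Inverse of the chart x on U: local coordinate u in (-1, 1) -> point on S^1."""
    return np.array([u, np.sqrt(1.0 - u**2)])

def y(p):
    """Chart y on V: point on S^1 -> local coordinate in (-1, 1)."""
    return p[1]

def transition(u):
    """Transition map y ∘ x^{-1} on x(U ∩ V) = (0, 1)."""
    return y(x_inv(u))   # = sqrt(1 - u^2), a diffeomorphism (0, 1) -> (0, 1)

us = np.linspace(0.1, 0.9, 5)
print([transition(u) for u in us])
```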
Now that we've done some hard work, we probably want to see some benefits. What the smooth structure allows us to define is what it means for a function between manifolds to be smooth. Given two smooth manifolds \((M, \theta_M, A_M)\) and \((N, \theta_N, A_N)\), we say that a function \(F: M\to N\) is smooth (resp. \(C^k\)) if \[y\circ F\circ x^{-1}: x(U)\subset\mathbb{R}^m \to y(V)\subset\mathbb{R}^n\] is smooth (resp. \(C^k\)) for every \((U, x)\in A_M\), \((V, y) \in A_N\).
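A toy illustration of my own (the map \(F\) and the chart are made up): checking smoothness of \(F: S^1\to\mathbb{R}\) reduces to checking smoothness of its chart representative, which is an ordinary real function.

```python
import numpy as np

# F: S^1 -> R, defined chart-free as F(p) = p[0] * p[1].
def F(p):
    return p[0] * p[1]

# Angle chart on (part of) S^1: x^{-1}(theta) = (cos theta, sin theta).
def x_inv(theta):
    return np.array([np.cos(theta), np.sin(theta)])

# The chart representative F ∘ x^{-1} is an ordinary function (0, pi) -> R;
# here it equals cos(theta) * sin(theta) = sin(2 theta) / 2, which is C^inf.
def F_in_chart(theta):
    return F(x_inv(theta))

print(F_in_chart(0.3), 0.5 * np.sin(0.6))   # both ≈ 0.2823
```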
Submanifolds, product manifolds, bundles, fibers, sections
On one hand, I want to introduce bundles &c. after tangent spaces, in the context of vector fields &c.; on the other, it seems meaningful to introduce submanifolds &c. at the very beginning so that we have examples early on. This section will probably be split and moved later; it follows Frederic Schuller's lectures.
I also need bundles to try and introduce curvature learning on stereographic model spaces
One way to construct manifolds, and the one that we're going to use the most, is from other manifolds.
Given a topological \(n\)-manifold \((M, \theta)\), we call \((N, \left.\theta\right|_N)\), \(N\subset M\), a submanifold of \(M\) if \((N, \left.\theta\right|_N)\) is itself a manifold.
Given an \(m\)-manifold \(M\) and an \(n\)-manifold \(N\), their product manifold is the manifold whose underlying set is the Cartesian product, endowed with the product topology: \((M{\times}N, \theta_{M{\times}N})\).
A bundle of topological manifolds is a triple \((E, \pi, M)\) of a total space \(E\), a continuous surjective projection map \(\pi:E\to M\), and a base space \(M\). A bundle is a weaker notion than a product: instead of saying that \(E\) can be constructed as a product of its components, we only say that, locally, \(E\) can be projected onto a component.
Example: product manifold is a bundle (REMOVE?)
Example: Mobius strip is a bundle, but not a product
A fiber bundle is a bundle \(\pi:E\to M\) such that \(\pi^{-1}(p) \cong F\) for all \(p\in M\) and some fixed manifold \(F\), called the typical fiber.
Lo and behold, given a bundle \(\pi:E\to M\), a section \(\sigma: M\to E\) assigns to each point \(p\) of the manifold \(M\) a point in its fiber, \(\sigma(p)\in \pi^{-1}(p)\subset E\). In other words, it's a map \(\sigma: M\to E\) such that \(\pi \circ \sigma = \operatorname{id}_M\). Example: vector fields, as sketched below.
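A minimal numpy sketch of that last example (my own, with the tangent bundle of \(S^1\) modelled naively as pairs of a point and a vector): a vector field is a map \(\sigma\) satisfying \(\pi\circ\sigma=\operatorname{id}_M\).

```python
import numpy as np

# Model the tangent bundle of S^1 naively as pairs (p, v) with p on S^1, v ⟂ p.
def pi(point_and_vector):
    """Bundle projection: forget the vector, keep the base point."""
    p, _ = point_and_vector
    return p

def sigma(p):
    """A section (= vector field): attach to p the tangent obtained by a 90° rotation."""
    v = np.array([-p[1], p[0]])     # tangent to the circle at p
    return (p, v)

p = np.array([np.cos(0.7), np.sin(0.7)])
print(np.allclose(pi(sigma(p)), p))   # True: pi ∘ sigma = id_M
```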
TODO
Gradient descent, "step in a direction", and differentiation
In this section we talk about line-search along geodesics, derivations and differentiation of functions between manifolds
TODO: actually, we need Riemannian metric before we can finish this section – in order to convert derivative into a tangent vector.
One of the simplest optimization methods one can think of is gradient descent: given a (piecewise) differentiable objective function \(f:X\subset\mathbb{R}^n \to \mathbb{R}\), an initial estimate \(x_0 \in X\), and a learning rate \(\alpha\in\mathbb{R}\), we gradually improve the estimate by making a small linear step in the direction of steepest descent of the loss function: \[x_t = x_{t-1} - \alpha \operatorname{grad} f (x_{t-1}).\]
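As a baseline for everything that follows, a minimal numpy sketch of this Euclidean iteration (the objective and step size are made up for illustration):

```python
import numpy as np

def grad_f(x):
    """Gradient of f(x) = ||x - target||^2 / 2 for a made-up target."""
    target = np.array([1.0, -2.0])
    return x - target

alpha = 0.1            # learning rate
x = np.zeros(2)        # initial estimate x_0
for t in range(200):
    x = x - alpha * grad_f(x)   # x_t = x_{t-1} - alpha * grad f(x_{t-1})

print(x)               # ≈ [1.0, -2.0]
```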
One way to interpret this iteration is that at step \(t\) we approximate our objective \(f\) by its first-order Taylor expansion around \(x_{t-1}\) and minimize that linear model, penalized by the squared distance from \(x_{t-1}\): \[x_t = \arg\min_x\; f(x_{t-1}) + \langle \operatorname{grad} f(x_{t-1}),\, x - x_{t-1}\rangle + \tfrac{1}{2\alpha}\lVert x - x_{t-1}\rVert^2.\] To repeat this on a manifold, we need to make sense of both ingredients: a "direction of steepest descent" and a "small step in a direction".
Let's begin with the notion of "direction". The key observation for generalizing "directions" and "direction fields" to manifolds is that a "direction" attached to a point acts on scalar functions as a directional derivative at that point. Specifically, given a "test function" \(f:\mathbb{R}^n \to \mathbb{R}\), a vector \(X\in\mathbb{R}^n\) attached to a point \(p\in\mathbb{R}^n\) acts on \(f\) by computing \[(\lambda\mapsto f(p + \lambda X))'(0).\]
In fact, for a fixed \(p\), there is a one-to-one correspondence between such linear functionals and vectors in \(\mathbb{R}^n\).
For an abstract smooth manifold \(M\), we can take \(C^\infty(M)\) for the space of test functions, while "directional derivatives" would be derivatives along smooth curves.
Given a smooth curve \(\gamma: I\subset\mathbb{R}\to M\), passing through a point \(p\in M\) (s.t. \(\gamma(0)=p\)), the derivative of a test function \(f:M\to \mathbb{R}\) at \(p\) along \(\gamma\) is \[X_{\gamma, p}(f) = (f\circ \gamma)'(0).\]
We could read this formula as "initial velocity of \(f\) along \(\gamma\)".
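A quick numerical sanity check, as a toy example of my own on \(S^1\): the number \(X_{\gamma,p}(f)=(f\circ\gamma)'(0)\) can be approximated by a finite difference, without ever leaving the curve.

```python
import numpy as np

def gamma(t, theta0=0.7):
    """A smooth curve on S^1 with gamma(0) = p = (cos theta0, sin theta0)."""
    return np.array([np.cos(theta0 + t), np.sin(theta0 + t)])

def f(p):
    """A test function f: S^1 -> R (restriction of a function on R^2)."""
    return p[0] ** 2 + 3.0 * p[1]

def derivative_along(curve, fun, h=1e-6):
    """Central finite difference for (fun ∘ curve)'(0)."""
    return (fun(curve(h)) - fun(curve(-h))) / (2 * h)

print(derivative_along(gamma, f))
# analytic value: d/dt [cos^2(0.7+t) + 3 sin(0.7+t)] at t = 0
print(-2 * np.cos(0.7) * np.sin(0.7) + 3 * np.cos(0.7))
```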
There's a handful of observations to be made about this concept:
Leibniz rule and derivations
Derivatives
Total derivative, pullback, pushforward
A \(C^k\) function \(F:M\to N\) between manifolds induces maps between (co-)tangent bundles of \(M\) and \(N\).
The push-forward \(F_*X_p\) of \(X_p\in T_pM\) ("on the left") under \(F\) is the tangent vector in \(T_{F(p)}N\) such that for \(f\in C^\infty(N)\) \[(F_*X_p)(f) = X_p(f\circ F).\]
Similarly, for a co-tangent \(\omega\in T^*_{F(p)}N\) ("on the right"), its pull-back under \(F\) is \(F^*\omega \in T^*_pM\) such that for \(X\in T_pM\) \[(F^*\omega)(X) = \omega(F_*X).\]
The notion of push-forward generalizes that of a derivative: in a coordinate basis, the push-forward corresponds to the Jacobian matrix acting on column vectors (tangents), while the pull-back corresponds to the same Jacobian acting on row covectors from the right, i.e. to its transpose. Note that no metric tensor is needed here – we never identify tangents with cotangents. Nor is the derivative a \((1,1)\)-tensor on a single manifold: it takes one index from the left manifold and the other from the right one, as the sketch below makes explicit.
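A small numpy sketch (the map \(F\) here is a made-up coordinate representative, not something from the notes): the push-forward is a Jacobian-vector product, the pull-back is a transposed-Jacobian-vector product, and no metric enters.

```python
import numpy as np

# Coordinate representative of a map F: R^2 -> R^3 (think: a chart-level map).
def F(u):
    return np.array([np.cos(u[0]) * np.sin(u[1]),
                     np.sin(u[0]) * np.sin(u[1]),
                     np.cos(u[1])])

def jacobian(F, u, h=1e-6):
    """Numerical Jacobian of F at u: columns are partial derivatives."""
    cols = [(F(u + h * e) - F(u - h * e)) / (2 * h) for e in np.eye(len(u))]
    return np.stack(cols, axis=1)

u = np.array([0.3, 1.1])
J = jacobian(F, u)                 # shape (3, 2): one index per manifold

v = np.array([1.0, -0.5])          # tangent vector "on the left",   in T_u R^2
w = np.array([0.2, 0.0, 1.0])      # cotangent vector "on the right", in T*_{F(u)} R^3

push_forward = J @ v               # element of T_{F(u)} R^3
pull_back    = J.T @ w             # element of T*_u R^2
print(push_forward, pull_back)
```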
Coordinate frame in \(T_pM\)
Push-forwards of the canonical basis vectors along inverse charts form a basis of \(T_pM\). This also lets us determine the dimension of the tangent bundle as a manifold.
Given a chart \((U, x)\), we can construct a basis in \(T_pM,~p\in\operatorname{dom}x\) from the canonical basis in \(T_{x(p)}\mathbb{R}^n \cong \mathbb{R}^n\), by pushing it forward via the inverse chart: \[E_i = \frac{\partial}{\partial x_i} = x^{-1}_* \partial_i,\] where \(\partial_i\) is the \(i\)-th partial derivative operator acting on functions on \(\mathbb{R}^n\).
This can be visualized as follows: one can construct a curve \(\gamma_i\) emanating from \(p\) by following the \(i\)-th coordinate in the chart \(x\): \[\gamma_i(t)=x^{-1}(x(p) + t e_i).\]
Then \[\frac{\partial}{\partial x_i}(f) = \partial_i (f \circ x^{-1})(x(p)) = (t\mapsto f(x^{-1}(x(p) + te_i)))'(0) = X_{\gamma_i, p}(f).\]
In other words, \(\frac{\partial}{\partial x_i}\) is represented by the curve \(\gamma_i\). It now takes just a little exercise to show that these vectors are linearly independent and form a basis of \(T_pM\).
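A numerical check of the last identity, as a sketch of my own (spherical-angle chart on \(S^2\), test function chosen arbitrarily): the action of \(\frac{\partial}{\partial x_i}\) on \(f\) is computed as the derivative of \(f\) along the coordinate curve \(\gamma_i\).

```python
import numpy as np

# Inverse of the spherical-angle chart on S^2: (theta, phi) -> point on the sphere.
def x_inv(c):
    theta, phi = c
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def f(p):
    """Test function on S^2 (restriction of a linear function on R^3)."""
    return 2.0 * p[0] - p[1] + 0.5 * p[2]

c_p = np.array([1.0, 0.4])   # coordinates x(p) of a point p

def coordinate_vector(i, h=1e-6):
    """d/dx_i acting on f: derivative of f along gamma_i(t) = x^{-1}(x(p) + t e_i)."""
    e = np.eye(2)[i]
    return (f(x_inv(c_p + h * e)) - f(x_inv(c_p - h * e))) / (2 * h)

# Equivalently: the i-th partial derivative of f ∘ x^{-1} at x(p).
print(coordinate_vector(0), coordinate_vector(1))
```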
Distances, angles, isometry of \(T_pM\) and \(T_p^*M\)
So far, we've only built machinery for reasoning about continuity and smoothness; we still don't have any tools for discussing geometry. Now we introduce the notion of a Riemannian metric, which will allow us to measure lengths of curves, compute angles, and, most importantly, convert the derivative of our objective function, which is a cotangent vector, into an element of the tangent space – the gradient, that is, a direction to make a step in.
Roughly, a metric on \(M\) is a smooth assignment of a symmetric "inner product" \(g_p = \langle\cdot,\cdot\rangle_p\) to each \(T_pM\). The metric is Riemannian if it is positive definite (\(g(X, X) > 0\) for \(X \neq 0\)). In fact, in the second talk, on applications, we'll be more interested in Lorentzian metrics, with signature \((+--\cdots-)\), which correspond to hyperbolic spaces.
TLDR:
Now, by a "smooth assignment" we mean that a metric on a smooth manifold \(M\) is a smooth map \[g:\Gamma(\operatorname{T}M)\times\Gamma(\operatorname{T}M)\to C^\infty(M)\] that takes two vector fields and produces a scalar function. In other words, it's a \(\operatorname{T}_0^2M\) field.
Comparing vectors from different tangent spaces
In this section we talk about differentiation of vector fields, linear connection, zero-acceleration, and parallel translation
NB: in some of the following pictures we temporarily use the sometimes abusive and mostly obsolete notation from Absil et al.
Newton method
Now that we've learned to differentiate vector fields, we can generalize Newton's method to manifolds: at each step, solve the linear equation \[\nabla_{\eta_t}\operatorname{grad}f(x_t) = -\operatorname{grad}f(x_t)\] for the step \(\eta_t\in T_{x_t}M\), and then move along it (via the exponential map, or a retraction – see below).
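In the Euclidean special case this is just "solve the Hessian system and step"; a minimal numpy sketch on a made-up quadratic objective:

```python
import numpy as np

# Euclidean special case of the Newton equation above: Hessian @ step = -gradient.
# Objective (made up): f(x) = 0.5 * x^T A x - b^T x, so grad f = A x - b, Hess f = A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

x = np.zeros(2)
for t in range(5):
    grad = A @ x - b
    hess = A
    eta = np.linalg.solve(hess, -grad)   # solve Hess f [eta] = -grad f
    x = x + eta                          # in R^n, "moving along eta" is just addition

print(x, np.linalg.solve(A, b))          # Newton converges in one step on a quadratic
```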
Curvature
Not even sure if I should include it… would be useful to prepare for the second talk
Hard truth – we don't know closed-form expmaps and parallel translations
Here we discuss retractions and vector transports – first-order approximations to the exponential map and parallel transport, respectively.
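To close the loop, a sketch of my own of Riemannian gradient descent on the sphere with the metric-projection retraction \(R_x(v)=(x+v)/\lVert x+v\rVert\) in place of the exponential map (the manifold, objective, and step size are all made up for illustration): the Riemannian gradient is the tangent projection of the Euclidean one, and each step is followed by a retraction back onto the manifold.

```python
import numpy as np

def tangent_projection(x, g):
    """Project a Euclidean gradient onto T_x S^{n-1} = {v : <v, x> = 0}."""
    return g - np.dot(g, x) * x

def retraction(x, v):
    """Metric-projection retraction: a first-order approximation of exp_x(v) on the sphere."""
    y = x + v
    return y / np.linalg.norm(y)

# Toy objective: f(x) = x^T A x; its minimizer over the sphere is the
# eigenvector of A with the smallest eigenvalue.
A = np.diag([3.0, 2.0, 0.5])

x = np.ones(3) / np.sqrt(3.0)      # initial point on the sphere
alpha = 0.1                        # step size
for t in range(500):
    euclid_grad = 2.0 * A @ x
    riem_grad = tangent_projection(x, euclid_grad)
    x = retraction(x, -alpha * riem_grad)

print(x)   # ≈ e_3, the smallest-eigenvalue direction
```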