Optimization on manifolds
References
Differential Geometry
Frederic Schuller's lectures on YouTube
Lee's series
Optimization: Absil et al.
Details in notes
Metagoals
This talk fails to provide examples (yet)
Aims at stating the problems
… and actually asking questions
"Riemannian Adaptive Optimization Methods"
(Becigneul, Ganea)
Gradient descent
\(p_0\in K,~\alpha\in\mathbb{R},~f: K\subset \mathbb{R}^d\to \mathbb{R},\)
\(f(p)\to \min.\)
\(g_t = f'(p_t)^\top\)
\(p_{t+1} = p_t - \alpha g_t\)
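A minimal sketch of this Euclidean update in numpy; the quadratic objective and step size are illustrative choices, not from the talk:

```python
import numpy as np

def gradient_descent(grad, p0, alpha=0.1, steps=100):
    """Plain gradient descent: p_{t+1} = p_t - alpha * grad(p_t)."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p - alpha * grad(p)
    return p

# Illustrative objective f(p) = |p|^2 / 2, so grad f(p) = p.
p_star = gradient_descent(lambda p: p, p0=[1.0, -2.0])
```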
Manifolds
An \(n\)-manifold: "a 'nice' top. space \((M, \Theta)\)
in which every \(p\in M\) has an open neighbourhood
homeomorphic to (a piece of) \(\mathbb{R}^n\)"
\[\phi: U\xrightarrow{\sim} \phi(U)\subset \mathbb{R}^n\]
Manifolds. Missing tools
Descent direction: no derivatives
No notion of differentiability
Step along direction: no addition
Smooth manifolds
"A manifold covered with a maximal atlas of smoothly-compatible charts", \((M, \Theta, \mathcal{A})\)
Smooth functions
Directional derivatives
\((M, \Theta, \mathcal{A}),\)
\(p\in M,\)
Smooth curve \(\gamma:I\subset\mathbb{R}\to M,\) \(\gamma(0)=p,\)
\(f: M\to \mathbb{R}\) smooth.
\[(f\circ \gamma)'(0).\]
Directional derivatives
\[X_{\gamma, p}f = (f\circ \gamma)'(0)\]
"Directional derivative of \(f\) along \(\gamma\) at \(p\) "
Directional derivatives
\[X_{\gamma, p}f = (f\circ \gamma)'(0).\]
\(X_{\gamma,p}:C^\infty(M)\to\mathbb{R}\) is a linear operator.
Leibniz rule:
\(X_{\gamma, p}(fg) = f(p)X_{\gamma, p}g + X_{\gamma, p}(f)g(p).\)
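A quick worked instance (an illustrative example, not from the slides): on \(M=\mathbb{R}^2\), take \(p=(1,0)\), \(\gamma(t)=(\cos t,\sin t)\), \(f(x,y)=xy\); then
\[(f\circ\gamma)(t)=\cos t\,\sin t=\tfrac12\sin 2t,\qquad X_{\gamma,p}f=(f\circ\gamma)'(0)=\cos(2\cdot 0)=1.\]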
Tangent space
\[T_pM = \left\{
X_{\gamma,p}\left|
\gamma:I\to M~\text{smooth},
~\gamma(0)=p
\right.
\right\}\]
"Tangent space to \(M\) at \(p\) "
Coincides with the set of all derivations
Derivations ↔ curve segments, via charts
Forms a linear space
Coordinate basis
A chart \((U,x)\), \(p\in U\), corresponds to a basis in \(T_p M\),
represented by coordinate curves.
Denoted: \(\frac{\partial}{\partial x^1},~\ldots,~\frac{\partial}{\partial x^n}.\)
Acts: \(\frac{\partial}{\partial x^k}f
= \partial_k (\underbrace{f\circ x^{-1}}_{\mathbb{R}^n\to\mathbb{R}})(\underbrace{x(p)}_{\in\,\mathbb{R}^n})\)
where \(\partial_k\) is the \(k\)-th partial derivative
Exponential map
TODO: move to after geodesics
" Direction " means \(X\in T_pM\)
\(\exists\) open \(C(p)\subset M\) (cut locus) s.t. curve geodesic \(\gamma\) representing \(X\) is unique
\[\exp_pX = \gamma(1)~\text{if defined}.\]
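A standard closed form, added for concreteness: on the unit sphere \(S^{n-1}\subset\mathbb{R}^n\) with the induced metric,
\[\exp_p X=\cos(\lVert X\rVert)\,p+\sin(\lVert X\rVert)\,\frac{X}{\lVert X\rVert},\qquad 0\neq X\in T_pS^{n-1}.\]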
Smooth manifolds. Missing tools
Before: \(p_{t+1} = p_t - \alpha g_{t},\)
After: \(p_{t+1} = \exp_{p_t}(-\alpha X_t).\)
Given objective \(f:M\to\mathbb{R}\) and \(p_t\in M\) ,
how to pick \(X_t\) ?
Derivative of a function
Given smooth \(F:M\to N,\)
"the push-forward of \(F\) at \(p\) " is the linear map
\(F_{*,p}:T_p{M}\to T_{F(p)}N\)
\[(\underbrace{F_{*,p}X}_{T_{F(p)}N})\underbrace{g}_{C^\infty(N)} = \underbrace{X}_{T_p M}(\underbrace{g\circ F}_{C^\infty(M)}).\]
In coordinate basis: Jacobian matrix
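Concretely, in charts \((U,x)\) on \(M\) and \((V,y)\) on \(N\) (standard formula, added for reference):
\[(F_{*,p})^i{}_j=\partial_j\big(y^i\circ F\circ x^{-1}\big)\big(x(p)\big).\]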
Derivative of objective
Given objective \(f:M\to \mathbb{R}\)
its derivative at point \(p\in M\)
is a linear functional \[T_p{M}\to T_{f(p)}\mathbb{R}\cong \mathbb{R}.\]
\(f_{*,p}\in T_p^* M\),
but a descent direction must live in \(T_p M\).
Metric manifolds
"A smooth manifold endowed with smooth assignment of (almost) ' inner products ' \(g:T_p M\times T_p M\to \mathbb{R}\) at each \(p\) ". Notes for details
Rieszc in \(\mathbb{R}^n\) : vectors linear functionals via inner product
\(\omega = g(X, \cdot) = X^\flat\)
\(\omega \mapsto X=\omega^\sharp\) as unique \(X\) s.t. \(\omega(Y) = g(X, Y)\) .
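In a coordinate basis the musical isomorphisms are index lowering/raising (standard formulas):
\[(X^\flat)_i=g_{ij}X^j,\qquad(\omega^\sharp)^i=g^{ij}\omega_j,\]
with \(g^{ij}\) the inverse matrix of \(g_{ij}\).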
RGD
\[X_t = \operatorname{grad}f(p_t) \equiv f_{*,p_t}^\sharp,\]
\[p_{t+1} = \exp_{p_t}(-\alpha X_t).\]
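A minimal generic sketch of this loop in Python; `expmap` and `rgrad` are assumed callables supplied by a concrete manifold (see the level-set instantiation below):

```python
def rgd(expmap, rgrad, p, alpha=0.1, steps=100):
    """Riemannian gradient descent: p <- exp_p(-alpha * grad f(p))."""
    for _ in range(steps):
        p = expmap(p, -alpha * rgrad(p))
    return p
```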
Ready to implement?
Extrinsic representation
Immersion: smooth \(\phi: M\to\mathbb{R}^n\) such that \(\phi_{*,p}\) is injective for every \(p\in M\).
Embedding: an injective immersion that is a homeomorphism onto its image.
[Strong] Whitney: can embed in \(\mathbb{R}^{2\operatorname{dim}M}.\)
Extrinsic representation
Let \(M\subset\mathbb{R}^d\) be a level set (\(\{p\left|F(p)=0\right.\}\)).
\(\left.\operatorname{id}\right|_{M}\) is an embedding and
can represent points and tangents via (redundant) \(\mathbb{R}^d\) coordinates,
can induce metric from Euclidean using \(F\) ,
need projection op.
(prevalent approach, including geoopt)
(examples using local charts?)
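Instantiating this on the unit sphere viewed as the level set \(F(p)=\lVert p\rVert^2-1=0\); a minimal numpy sketch, with an illustrative quadratic objective of my choosing:

```python
import numpy as np

def sphere_exp(p, v):
    """Closed-form exponential map on the unit sphere S^{n-1}."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_rgrad(p, egrad):
    """Project the ambient gradient onto T_p M; for F(p) = |p|^2 - 1
    the (unnormalized) normal at p is grad F(p) = 2p."""
    g = egrad(p)
    return g - (g @ p) * p            # uses |p| = 1

# Illustrative objective f(p) = p^T A p; the minimum over the sphere
# is attained at the eigenvector of the smallest eigenvalue of A.
A = np.diag([3.0, 2.0, 1.0])
p = np.ones(3) / np.sqrt(3.0)
for _ in range(200):
    p = sphere_exp(p, -0.1 * sphere_rgrad(p, lambda q: 2 * A @ q))
```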
(Becigneul, Ganea)
Retractions
Closed-form \(\exp\) and \(\sharp\) ?
Usually unknown or infeasible
First-order approximations to \(\exp\): retractions
\[R_p:T_p M \to M,\]
\[R_p(0) = p,\]
\[(R_p)_{*,0} "=" \operatorname{id}_{T_p M}.\]
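A standard example (added here for concreteness): on the sphere, stepping in the ambient space and renormalizing is a retraction,
\[R_p(X)=\frac{p+X}{\lVert p+X\rVert},\]
far cheaper than the trigonometric \(\exp\) above.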
Faster methods
In vector spaces:
second-order methods (Newton)
dynamic approximations of Hessian (quasi-Newton)
"momentum", "adaptivity term"
On manifold:
First derivative is a vector field
How to differentiate vector fields ?
Vector fields
A bundle is a triple \((E, \pi, M)\), \[E\xrightarrow{\pi} M\]
\(E\) – "total space" (smooth mfd),
\(\pi\) – "projection" (smooth map),
\(M\) – "base space" (smooth mfd).
Vector fields
Tangent bundle:
\[E = TM = \cup_p T_p M\]
\[\pi : X_{\gamma, p} \mapsto p.\]
Vector fields
A smooth section of a bundle \(E\xrightarrow\pi M\)
is a smooth map \[\sigma:M\to E\] such that \[\pi\circ\sigma = \operatorname{id}_M\]
Set of all smooth sections is denoted \(\Gamma(E)\) .
Vector fields
A vector field is a smooth section of the tangent bundle \(TM\) .
To each \(p\in M\) assigns \(X_p\in T_p M\)
but in a "smooth way"
Another example: metric \(g\) is a smooth section of the bundle of bilinear forms ( \((0,2)\) -tensor field)
Covariant derivative
"Covariant derivative of vf \(Y\) along \(X\) ":
\[\nabla_XY\]
Wishlist:
\(\nabla_{fX+Y}Z = f\nabla_XZ + \nabla_Y Z,\)
\(\nabla_X(Y+Z) = \nabla_XY + \nabla_X Z,\)
\(\nabla_X(fY) = X(f)Y + f\nabla_XY.\)
Covariant derivative
In chart:
…
(Christoffel symbols)
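The elided formula is standard (filled in here for reference): in a chart \((U,x)\),
\[\nabla_XY=\left(X^i\,\partial_iY^k+\Gamma^k_{ij}\,X^iY^j\right)\frac{\partial}{\partial x^k}.\]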
Vector field along a curve
\(\gamma:I\to M\)
\(X:I\to TM\) s.t. \(X_\lambda \in T_{\gamma(\lambda)}M\)
Extendible if there is \(\tilde{X}\in\Gamma(TM)\) s.t.
\(X_\lambda = \tilde{X}_{\gamma(\lambda)}.\)
Parallel translate
If \(X:I \to TM\) is a vf along \(\gamma\)
extendible to \(\tilde{X}\)
and
\[\nabla_{\dot\gamma}\tilde{X} = 0\]
then \(X\) is called autoparallelly transported
"Transports" \(X_0\) in \(T_{\gamma(0)}M\) to \(X_\lambda\in T_{\gamma(\lambda)}M\)
(Becigneul, Ganea)
Vector transport
No closed-form parallel translation either
"Vector transport" – 1st-order approx (see Absil)
(here should go a qualitative derivation of vector transports for level sets)
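One standard choice for level sets (projection-based transport, see Absil et al.): re-project the vector onto the tangent space at the destination point, the same projection as in the RGD sketch above. A minimal sketch under the same level-set assumptions:

```python
import numpy as np

def transport(q, v, gradF):
    """Projection-based vector transport for M = {F = 0}: drop the
    component of v along the constraint normal at the new point q."""
    n = gradF(q)
    return v - (v @ n) / (n @ n) * n
```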
Newton
later addition
\(f: \mathbb{R} \to \mathbb{R},\) find \(x\) s.t. \(f(x)=0\) .
Instead, replace \(f\) with linear \(h(x) = f(x_t) + f'(x_t)(x - x_t)\)
and solve \(h(x)=0\) :
\(x = x_t-f'(x_t)^{-1}f(x_t)\)
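A minimal sketch of the iteration (the test function is an illustrative choice):

```python
def newton_root(f, df, x, steps=20):
    """Newton's iteration for f(x) = 0: x <- x - f(x)/f'(x)."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

# Illustrative: sqrt(2) as the root of x^2 - 2.
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, x=1.0)
```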
Newton
later addition
\(f: \mathbb{R}\to \mathbb{R},\) find \(x\) s.t. \(f'(x)=0\) .
Apply Newton's iteration to \(f'\)
\(x_{t+1} = x_t - f''(x_t)^{-1} f'(x_t)\)
Newton
later addition
\(f: \mathbb{R}^n\to\mathbb{R}^m\) , find \(x\) s.t. \(f(x)=0\) .
\(\tilde{f}(x) = f(x_t) + f'(x_t)(x-x_t),\)
\(\tilde{f}(x_{t+1}) = 0.\)
Newton
later addition
\(f: \mathbb{R}^n\to\mathbb{R}\) , find \(x\) s.t. \(f'(x)=0\) .
\(\tilde{f'}(x) = f'(x_t) + \underbrace{f''(x_t)(x - x_t)}_{\in~{^n}\mathbb{R}},\)
\(f''(x_t)(x - x_t) = -f'(x_t)\)
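The resulting linear solve, sketched in numpy (the quadratic test objective is an assumption of mine):

```python
import numpy as np

def newton_opt_step(grad, hess, x):
    """One Newton step for grad f(x) = 0: solve H d = -g, then x <- x + d."""
    d = np.linalg.solve(hess(x), -grad(x))
    return x + d

# Illustrative: f(x) = 0.5 x^T A x - b^T x, minimized in a single step.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
x_min = newton_opt_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```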
Newton
Solve
\[\nabla_{X_t}\operatorname{grad}f(p_t)
= -\operatorname{grad}f(p_t)\]
for \(X_t\in T_{p_t}M\) .
\[p_{t+1} = \exp_{p_t}X_t.\]
Accelerated first-order methods
Stochastic optimization
Accelerated SGD?
Adam: coordinate-wise adaptive learning rate (more or less)
Parallel translation of "momentum"
"Coordinates" on manifolds?
Product manifolds
\[M = M_1\times \cdots\times M_n,\]
Product topology, product metric, …
RAdam: adaptivity term for each product component!
(Becigneul, Ganea)
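A usage sketch with geoopt; `PoincareBall`, `ManifoldParameter`, and `RiemannianAdam` are geoopt names I believe exist, but treat the exact signatures as assumptions:

```python
import torch
import geoopt

# Poincare ball of curvature c (the hyperparameter from the slides).
ball = geoopt.PoincareBall(c=1.0)

# Points stored as a parameter that remembers its manifold;
# RiemannianAdam keeps an adaptivity term per parameter, which on a
# product manifold means one per component.
emb = geoopt.ManifoldParameter(torch.randn(100, 2) * 1e-3, manifold=ball)
opt = geoopt.optim.RiemannianAdam([emb], lr=1e-2)

loss = emb.pow(2).sum()     # placeholder objective
loss.backward()
opt.step()
```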
\(\varkappa\) -Stereographic space
(Becigneul, Ganea) use RAdam to embed WordNet in a Poincaré ball (constant curvature \(c\))
Hyperbolic neural networks
\(c\) – hyperparameter
Optimize for \(c\) ?
\(\varkappa\) -Stereographic space
\(S^\varkappa\) – \(\varkappa\)-stereographic space, universal for positive and negative curvature (Bloch)
"Almost \(\mathbb{R}\) -line bundle"
\(E\xrightarrow\pi \mathbb{R}\)
\(\pi^{-1}(\varkappa) \simeq S^\varkappa\)
Can we induce metric, etc? OK for optimization?
Optimization on manifolds
geoopt/geoopt (@ferrine, @rrkarim, @SomeoneS)
March 13: an overview, @SomeoneS. Notes: https://hackmd.io/@SomeoneS/S1gDMMFV8
March 20: Hyperbolic deep learning, graph embeddings, @ferrine96
{"metaMigratedAt":"2023-06-15T04:45:26.758Z","metaMigratedFrom":"YAML","title":"Riemannopt@HSE - SLIDES, pt. 1/2","breaks":false,"description":"Sketch of slides for the Riemann opts talk at HSE. View the slide with \"Slide Mode\".","slideOptions":"{\"transition\":\"fade\",\"allottedMinutes\":60,\"theme\":\"white\"}","contributors":"[{\"id\":\"5df6d742-95e4-4a3e-8568-902ecaabcfca\",\"add\":15267,\"del\":4364}]"}