Riemannopt@HSE
https://hackmd.io/@SomeoneS/manopt-slides
This is a two-part talk about optimization on manifolds
and hyperbolic embeddings in deep learning.
- The preliminary talk, given by @SomeoneS, takes place on
March 6 and March 13 and covers the basics.
- What are manifolds? Why Riemannian optimization?
- What does it mean to make a "step in a direction" on a manifold?
- How to differentiate functions between manifolds?
- What's tricky about higher-order derivatives, and what do "acceleration" and "parallel" mean?
- How to measure distances?
- How to represent manifolds with numerical arrays? Embedded manifolds
- How to get by without closed-form expressions? Retractions
- How to speed up? Adaptive methods on product manifolds, and discussion.
- The second talk, given by @ferrine96, is devoted to applications in deep learning, in particular to graph embeddings. It takes place on
March 13 and March 20.
Notes and slides:
- Notes and slides for the first talk.
The notes are a work in progress; many sections are still incomplete (some lack visualizations, some lack accompanying text, some both). We'll be updating them as we approach D-day.
- Notes and slides for the second talk: coming soon
Starting point – RAdam
Let us first set the goals for today.
Somewhat unconventionally, we jump off with the algorithm from
"Riemannian Adaptive Optimization Methods"
by Bécigneul and Ganea. It might look scary at first,
but we're going to iterate over it, decoding it bit by bit.
(Figure: the RADAM algorithm from Bécigneul & Ganea; image unavailable.)
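To fix ideas before the definitions, here is a loose Python sketch of one RADAM step on a single manifold factor (an addition to these notes; the `manifold` interface is hypothetical, and every operation it bundles is something this talk will define):

```python
import numpy as np

class Euclidean:
    """Trivial 'manifold' for illustration: with these four operations the
    step below reduces to ordinary Adam; Riemannian versions swap them out."""
    def egrad2rgrad(self, x, g):  return g      # derivative -> gradient (needs a metric)
    def inner(self, x, u, v):     return float(np.dot(u, v))  # inner product at x
    def exp(self, x, v):          return x + v  # "step in a direction" from x
    def transport(self, x, y, u): return u      # carry u from T_x to T_y

def radam_step(manifold, x, m, v, t, egrad,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of (a loose sketch of) Riemannian Adam for a single factor."""
    g = manifold.egrad2rgrad(x, egrad)
    m = beta1 * m + (1 - beta1) * g                        # 1st moment, a tangent vector
    v = beta2 * v + (1 - beta2) * manifold.inner(x, g, g)  # 2nd moment, a scalar
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)  # bias correction
    x_new = manifold.exp(x, -lr * m_hat / (np.sqrt(v_hat) + eps))
    m = manifold.transport(x, x_new, m)  # momentum must move between tangent spaces
    return x_new, m, v
```

Every unfamiliar word in the comments – metric, tangent space, exponential map, transport – is a target of the sections below.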
Manifolds
An $n$-dimensional manifold is a topological space $M$
that
- is "nice":
  - Hausdorff (distinct points have disjoint neighbourhoods)
  - second-countable (has a countable basis for its topology)
- at every point $p \in M$ has a neighbourhood $U \ni p$
in which it looks like Euclidean space:
there is a homeomorphism $x : U \to x(U) \subseteq \mathbb{R}^n$ onto an open subset.
  - $x$ is called a (local) "chart" at $p$; it provides "coordinates" on that piece of the manifold
  - "homeomorphism" means: continuous, invertible, with continuous inverse

Roughly speaking, a manifold is a "locally Euclidean space".
For instance, $\mathbb{R}^n$ itself is a manifold,
with the (global) chart being the identity function.
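A worked example (an addition to these notes) of a manifold that needs more than one chart is the circle:

$$
S^1 = \{p \in \mathbb{R}^2 : \|p\| = 1\},
\qquad
\begin{aligned}
x &: S^1 \setminus \{(-1, 0)\} \to (-\pi, \pi), & x(\cos\theta, \sin\theta) &= \theta,\\
y &: S^1 \setminus \{(1, 0)\} \to (0, 2\pi), & y(\cos\theta, \sin\theta) &= \theta.
\end{aligned}
$$

No single chart can cover all of $S^1$ (a chart would be a homeomorphism from the compact $S^1$ onto an open subset of $\mathbb{R}$, which cannot exist), so the two charts together form an atlas.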
An additional piece of structure we're going to need
(in addition to that of topology and local Euclidean-ness)
is that of a smooth manifold. Consider
two charts $x : U \to \mathbb{R}^n$ and $y : V \to \mathbb{R}^n$ that share some part of their domains, $U \cap V \neq \varnothing$.
Both charts being invertible functions,
we can construct a "change of coordinates", or "transition map",
from $x$ to $y$ as $y \circ x^{-1} : x(U \cap V) \to y(U \cap V)$. We say that $x$ and $y$
are smoothly compatible if either $U \cap V = \varnothing$,
or the transition map is a diffeomorphism (in the Euclidean sense).

Now, to endow $M$ with a smooth structure means to provide
a maximal atlas on $M$ – a collection of smoothly compatible
charts whose domains cover the entire $M$, and such that
the collection cannot be extended with a new chart in a smoothly compatible way.
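Continuing the circle example (again an addition): the two angle charts overlap on $S^1$ minus two points, and on the overlap

$$
y \circ x^{-1}(\theta) =
\begin{cases}
\theta, & \theta \in (0, \pi),\\
\theta + 2\pi, & \theta \in (-\pi, 0),
\end{cases}
$$

which is smooth on each connected component of $x(U \cap V)$, so the two charts are smoothly compatible and generate a smooth atlas.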
Now, we've done some hard work; time to see some benefits.
What smooth structure allows us to define is what it means for a
function to be a smooth function between manifolds.
Given two smooth manifolds
$(M, \mathcal{A}_M)$ and $(N, \mathcal{A}_N)$, we say that a function $f : M \to N$
is smooth (resp. $C^k$) if
$y \circ f \circ x^{-1}$ is smooth (resp. $C^k$)
for every $x \in \mathcal{A}_M$, $y \in \mathcal{A}_N$.
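As a quick sanity check (an added example), take the height function on the circle and express it in the angle chart $x$ from above:

$$
f : S^1 \to \mathbb{R},\quad f(p_1, p_2) = p_2,
\qquad
(f \circ x^{-1})(\theta) = \sin\theta,
$$

which is smooth as an ordinary function $(-\pi, \pi) \to \mathbb{R}$ (the chart on $\mathbb{R}$ being the identity), and similarly in the chart $y$ – so $f$ is a smooth function between manifolds.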
Submanifolds, product manifolds, bundles, fibers, sections
On one hand, I want to introduce bundles etc. after tangent spaces, in the
context of vector fields; but on the other, it seems meaningful
to introduce submanifolds at the very beginning, so that we have examples
early on; this section will probably be split and moved later;
this section follows Frederic Schuller's lecture
I also need bundles to try and introduce curvature learning on stereographic model spaces
One way to construct manifolds, and the one that we're going
to use the most, is from other manifolds.
Given a topological $n$-manifold $M$,
we call $S \subseteq M$, with the subspace topology, a submanifold of $M$
if $S$ is itself a manifold.
Given an $m$-manifold $M$ and an $n$-manifold $N$,
their product manifold $M \times N$ is the $(m+n)$-manifold
with underlying set the cartesian product of $M$ and $N$
and endowed with the product topology; charts are products of charts:
$(x \times y)(p, q) = (x(p), y(q)) \in \mathbb{R}^{m+n}$.
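For example (added for concreteness), the $n$-torus is a product of circles:

$$
\mathbb{T}^n = \underbrace{S^1 \times \cdots \times S^1}_{n \text{ times}}.
$$

Product manifolds are exactly the setting of the "adaptive methods on product manifolds" item from the outline above.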
A bundle of topological manifolds
is a triple $(E, \pi, B)$
of total space $E$, continuous surjective projection map $\pi : E \to B$,
and base space $B$.
A bundle is a weaker version of a product: instead of saying that
$E$ can be constructed from its components, we say that locally $E$ can
be projected onto components.
- A fiber of $E$ at $p \in B$ is: $\pi^{-1}(p) = \{e \in E : \pi(e) = p\}$
- Future uses: vector bundles, tangent/cotangent bundle, sections
Example: a product manifold is a bundle (REMOVE?)
- $E = M \times N$, $\pi = \mathrm{pr}_1$ (continuous w.r.t. the product topology), $B = M$

Example: the Möbius strip is a bundle, but not a product
- TODO: PICTURE
- $E$ – the Möbius strip,
- $B = S^1$, with the fibers being intervals

A fiber bundle is a bundle s.t. $\pi^{-1}(p) \cong F$
for all $p \in B$ and some fixed ("typical fiber") manifold $F$.
Lo and behold: given a bundle $(E, \pi, B)$,
a section $\sigma$
assigns to each point $p$ in the manifold $B$
a point from its fiber $\pi^{-1}(p)$.
In other words, it's a map $\sigma : B \to E$ such that $\pi \circ \sigma = \mathrm{id}_B$.
Example: vector fields.
TODO
- Bundle morphisms
- Bundle isomorphisms
- Locally isomorphic bundles
- Trivial bundles, locally trivial bundles, and local trivialization
- Pullback bundle
Gradient descent, "step in a direction", and differentiation
In this section we talk about line-search along geodesics,
derivations and differentiation of functions between manifolds
TODO: actually, we need a Riemannian metric
before we can finish this section – in order to convert
the derivative into a tangent vector.
One of the simplest optimization methods one can think of
is gradient descent:
given a (piecewise-)differentiable objective function $f : \mathbb{R}^n \to \mathbb{R}$,
an initial estimate $x_0 \in \mathbb{R}^n$,
and a learning rate $\alpha > 0$,
we gradually improve our estimate by making
a small linear step in the direction of steepest descent
of the loss function:

$$x_{t+1} = x_t - \alpha \nabla f(x_t).$$

One way to interpret this iteration is that at step $t$
we approximate our objective by its first-order Taylor expansion around $x_t$,
$f(x) \approx f(x_t) + \langle \nabla f(x_t),\, x - x_t \rangle$,
and take a small step that decreases this linear model.
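In code, the whole method is a few lines (a minimal sketch added for reference; `grad_f` stands for whatever computes $\nabla f$):

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, n_steps=100):
    """Vanilla gradient descent: x_{t+1} = x_t - lr * grad_f(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad_f(x)  # a small linear step in the steepest-descent direction
    return x

# usage: minimize f(x) = ||x||^2, whose gradient is 2x
x_min = gradient_descent(lambda x: 2 * x, x0=[1.0, -2.0])
```

On a manifold, both the subtraction and the very meaning of $\nabla f(x_t)$ break down – which is exactly what the rest of this section unpacks.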
Let's begin with the notion of "direction".
The key observation for generalizing "directions" and "direction fields" to manifolds is that a "direction" attached to a point acts on scalar functions as a directional derivative at that point.
Specifically, given a "test function" $f : \mathbb{R}^n \to \mathbb{R}$, a vector $v$
attached to a point $p \in \mathbb{R}^n$
acts on $f$ by computing

$$v[f] = \left.\tfrac{d}{dt}\right|_{t=0} f(p + tv) = \langle \nabla f(p), v \rangle.$$

In fact, for a fixed $p$, there's a one-to-one correspondence between such linear functionals and vectors $v \in \mathbb{R}^n$.
For an abstract smooth manifold $M$,
we can take $C^\infty(M)$ for the space of test functions,
while "directional derivatives" would be
derivatives along smooth curves.
Given a smooth curve $\gamma : (-\varepsilon, \varepsilon) \to M$, passing through a point $p$ (s.t. $\gamma(0) = p$), the derivative of a test function $f$ at $p$ along $\gamma$ is

$$\dot\gamma(0)[f] := \left.\tfrac{d}{dt}\right|_{t=0} (f \circ \gamma)(t).$$

We could read this formula as "the initial velocity of $f$ along $\gamma$".
There's a handful of observations to be made about this concept:
- Only the local behaviour of $\gamma$ matters: two different curves may represent the same directional derivative
- $f \mapsto \dot\gamma(0)[f]$ is a linear functional
- It satisfies the Leibniz rule: $\dot\gamma(0)[fg] = f(p)\,\dot\gamma(0)[g] + g(p)\,\dot\gamma(0)[f]$. Such linear functionals are called derivations.
- The set of all such derivations at $p$ actually forms a linear space, denoted $T_pM$: the action of a linear combination of derivations is defined pointwise, and an example curve representing the combination may be constructed in a chart.
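The definition is directly computable; here is a small numerical check (added; uses JAX, and the choice of curve and test function is arbitrary):

```python
import jax
import jax.numpy as jnp

# a smooth curve on the unit circle (embedded in R^2) with gamma(0) = (1, 0)
def gamma(t):
    return jnp.array([jnp.cos(t), jnp.sin(t)])

# a test function on the ambient space
def f(p):
    return p[0] * p[1]

# derivative of f at p = gamma(0) along gamma: (f ∘ gamma)'(0)
deriv = jax.grad(lambda t: f(gamma(t)))(0.0)
print(deriv)  # d/dt [cos(t)sin(t)] at t=0 is cos^2(0) - sin^2(0) = 1.0
```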
Leibniz rule and derivations
Derivatives
Total derivative, pullback, pushforward
A function $f : M \to N$ between manifolds
induces maps between the (co-)tangent bundles
of $M$ and $N$.
The push-forward of $v \in T_pM$ ("on the left") under $f$
is the tangent vector $f_* v \in T_{f(p)}N$ such that for every test function $g \in C^\infty(N)$:

$$(f_* v)[g] = v[g \circ f].$$

Similarly, for a co-tangent $\omega \in T^*_{f(p)}N$ ("on the right"), its pull-back under $f$ is the co-tangent $f^* \omega \in T^*_pM$ such that for every $v \in T_pM$:

$$(f^* \omega)(v) = \omega(f_* v).$$

The notion of push-forward generalizes that of a derivative.
In coordinate bases, the push-forward corresponds to the Jacobian matrix acting on column vectors.
Hmm, pull-backs should correspond to the Jacobian matrix too, the matrix acting from the left, am I right? Yet we can't have a correspondence between tangents and cotangents without a metric tensor? Oh wait, the derivative apparently should be a $(1,1)$-tensor?! Ah, not even so: it takes one index from the left manifold and the other from the right mfd.
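In automatic-differentiation terms (an added aside): the push-forward is exactly a Jacobian-vector product, and the pull-back of a cotangent is a vector-Jacobian product, e.g. in JAX:

```python
import jax
import jax.numpy as jnp

# f : R^2 -> R^3, a smooth map between (trivial) manifolds
def f(p):
    return jnp.array([p[0] * p[1], jnp.sin(p[0]), p[1] ** 2])

p = jnp.array([1.0, 2.0])
v = jnp.array([0.5, -1.0])      # a tangent vector at p ("on the left")
w = jnp.array([1.0, 0.0, 0.0])  # a cotangent at f(p) ("on the right")

# push-forward f_* v: Jacobian-vector product, a tangent vector at f(p)
_, pushforward = jax.jvp(f, (p,), (v,))

# pull-back f^* w: vector-Jacobian product, a cotangent at p
_, vjp_fn = jax.vjp(f, p)
(pullback,) = vjp_fn(w)
```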
Coordinate frame in $T_pM$
Push-forwards of inverse charts form a basis in $T_pM$.
This also lets us specify the dimension of the tangent bundle as a manifold: $\dim TM = 2n$.
Given a chart $x : U \to \mathbb{R}^n$, we can construct a basis in $T_pM$ from the canonical basis $(e_i)$ in $\mathbb{R}^n$,
by pushing it forward via the inverse chart:

$$\left.\frac{\partial}{\partial x^i}\right|_p := (x^{-1})_*\, \partial_i,$$

where $\partial_i$ is the $i$'th partial derivative operator acting on functions on $\mathbb{R}^n$.
This is visualized as follows: one can construct a curve $\gamma_i$ emanating from $p$
as following the $i$'th coordinate in the chart $x$:

$$\gamma_i(t) = x^{-1}(x(p) + t e_i).$$

Then

$$\left.\frac{\partial}{\partial x^i}\right|_p [f] = \dot\gamma_i(0)[f] = \partial_i (f \circ x^{-1})\big|_{x(p)}.$$

In other words, $\partial/\partial x^i|_p$ is represented by the curve $\gamma_i$. It now takes just a little exercise to show that they
are linearly independent and form a basis in $T_pM$.
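A familiar instance (an added example): polar coordinates on the punctured plane, $x^{-1}(r, \theta) = (r\cos\theta, r\sin\theta)$, give the coordinate frame

$$
\left.\frac{\partial}{\partial r}\right|_p = (\cos\theta, \sin\theta),
\qquad
\left.\frac{\partial}{\partial \theta}\right|_p = (-r\sin\theta, r\cos\theta),
$$

i.e. the columns of the Jacobian of $x^{-1}$: each basis vector is the velocity of the curve that varies one coordinate while freezing the other.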
Distances, angles, isometry of $TM$ and $T^*M$
So far, we've only built machinery for reasoning
about continuity and smoothness. We still
don't have any tools for discussing geometry.
Now we introduce the notion of a Riemannian metric,
which will allow us to measure lengths of curves,
compute angles, and, most importantly,
convert the derivative of our objective function,
which is a cotangent vector, into an element
of the tangent space – the gradient,
that is, a direction to make a step in.
Roughly, a metric on $M$ is a smooth assignment $p \mapsto g_p$
of a symmetric "inner product" $g_p : T_pM \times T_pM \to \mathbb{R}$ on each $T_pM$.
A metric is Riemannian if it's positive definite ($g_p(v, v) > 0$ for all $v \neq 0$). In fact, in the second talk on applications, we'll be more interested in Lorentzian metrics, which have signature $(-, +, \dots, +)$ and give rise to hyperbolic spaces (via the hyperboloid model).
TLDR:
- Now, finally, given a curve $\gamma$, we can measure its speed: $\|\dot\gamma(t)\| = \sqrt{g_{\gamma(t)}(\dot\gamma(t), \dot\gamma(t))}$ for $t \in (0, 1)$.
- We can measure angles: $\cos\angle(u, v) = \frac{g_p(u, v)}{\|u\|\,\|v\|}$.
- We can now measure lengths: $L(\gamma) = \int_0^1 \|\dot\gamma(t)\|\, dt$.
- We can now convert elements of the tangent and cotangent bundles back and forth:
  - a tangent $u \in T_pM$ now defines a linear functional $g_p(u, \cdot) \in T^*_pM$
  - and a cotangent $\omega \in T^*_pM$ corresponds to the tangent vector $u$ satisfying the equation $g_p(u, \cdot) = \omega$ (i.e. resulting from the application of the inverse metric tensor)
  - TODO: Show in a chart and under change of coordinates?
- This allows us to define, for $f : M \to \mathbb{R}$, the gradient: $\operatorname{grad} f(p)$ is the tangent vector corresponding to the differential $df_p$.
Now, by a "smooth assignment" we mean that
a metric on a smooth manifold is a smooth map
that takes two vector fields
and produces a scalar function.
In other words, it's a $(0, 2)$-tensor field.
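To preview how this cashes out in the second talk, here is a minimal sketch (added; the formula is the standard conformal metric of the Poincaré ball, the function name is ours) of applying the inverse metric to turn the derivative of an objective into a gradient:

```python
import numpy as np

def poincare_egrad2rgrad(x, egrad):
    """Convert a Euclidean gradient (the derivative in ambient coordinates)
    into the Riemannian gradient on the Poincare ball, whose metric is
    conformal: g_x = lambda_x^2 * I with lambda_x = 2 / (1 - ||x||^2).
    Applying the inverse metric just rescales by 1 / lambda_x^2."""
    lam = 2.0 / (1.0 - np.dot(x, x))
    return egrad / lam**2

# usage: near the boundary the metric blows up, so the same Euclidean
# derivative corresponds to a much smaller Riemannian step
x = np.array([0.9, 0.0])
egrad = np.array([1.0, 1.0])
rgrad = poincare_egrad2rgrad(x, egrad)
```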
Comparing vectors from different tangent spaces
In this section we talk about differentiation of vector fields,
linear connection, zero-acceleration, and parallel translation
NB: in some of the following pictures we temporarily
use the sometimes abusive and mostly obsolete notation of Absil et al.

Newton method
Now that we've learned to differentiate vector fields,
we can generalize Newton's method to manifolds.
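Concretely (sketched here in the standard form, cf. Absil et al.): solve a linear system in the tangent space, then map the solution back to the manifold,

$$
\operatorname{Hess} f(x_t)[v_t] = -\operatorname{grad} f(x_t),
\qquad
x_{t+1} = \operatorname{Exp}_{x_t}(v_t),
$$

where $\operatorname{Hess} f(x_t)$ is the covariant derivative of the gradient vector field – which is exactly why we needed to learn to differentiate vector fields first (and $\operatorname{Exp}$ may be replaced by a retraction, see below).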
Curvature
Not even sure if I should include it…
would be useful to prepare for the second talk
Hard truth – we don't know closed-form expmaps and parallel translations
Here we discuss retractions and vector transports – first-order approximations to the exponential map and parallel transport.
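For instance (an added sketch; the sphere is the textbook example), on the unit sphere the exponential map happens to have a closed form, and simple metric projection gives a retraction that agrees with it to first order:

```python
import numpy as np

def sphere_exp(x, v):
    """Exact exponential map on the unit sphere: follow the great circle
    from x with initial velocity v (v must be tangent: <x, v> = 0)."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def sphere_retract(x, v):
    """First-order retraction: step in the ambient space, then project
    back onto the sphere by normalizing."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.1, 0.0])  # tangent at x
print(sphere_exp(x, v))
print(sphere_retract(x, v))    # agrees with the above up to O(||v||^2)
```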