Riemannopt@HSE

https://hackmd.io/@SomeoneS/manopt-slides

It's a two-part talk about optimization on manifolds and hyperbolic embeddings in deep learning.

  • The preliminary talk, given by @SomeoneS, takes place on March 6 and March 13 and covers the basics.
    • What are manifolds? Why Riemannian optimization?
    • What does it mean to make a "step in a direction" on a manifold?
    • How to differentiate functions between manifolds?
    • What's tricky about higher-order derivatives, and what do "acceleration" and "parallel" mean?
    • How to measure distances?
    • How to represent manifolds with numerical arrays? Embedded manifolds
    • How to get by without closed-form expressions? Retractions
    • How to speed up? Adaptive methods on product manifolds and discussion.
  • The second talk, given by @ferrine96, is devoted to applications in deep learning, in particular to graph embeddings. Dates are March 13 and March 20.
    • Embedding graphs in hyperbolic spaces
    • Mixed curvature and the \(\kappa\)-stereographic model
    • Hyperbolic neural networks
    • Batch-Normalization on manifolds
    • Generalizing convolutional layers
    • And more

Notes and slides:

  • Notes and slides for the first talk. The notes are a work-in-progress, many sections are still incomplete (some lack visualizations, some lack accompanying text, some both). We'll be updating the script as we approach the D-day.
  • Notes and slides for the second talk: coming soon

References:

  • Gravity and Light workshop: https://gravity-and-light.herokuapp.com/tutorials
  • "Geometrical Anatomy of Theoretical Physics" by Frederic Schuller on YouTube
  • Lee's series:
    • Introduction to Topological Manifolds: https://archive.org/details/springer_10.1007-978-0-387-22727-6/mode/2up
    • Introduction to Smooth Manifolds: http://webmath2.unito.it/paginepersonali/sergio.console/lee.pdf
    • Riemannian Manifolds: https://www.maths.ed.ac.uk/~v1ranick/papers/leeriemm.pdf
  • Absil's Optimization on Matrix Manifolds:
    • Book online: https://press.princeton.edu/absil
    • Book: http://www.eeci-institute.eu/GSC2011/Photos-EECI/EECI-GSC-2011-M5/book_AMS.pdf
    • Slides, Feb 2011: http://www.eeci-institute.eu/GSC2011/Photos-EECI/EECI-GSC-2011-M5/slides_lectures.pdf
    • Slides, Nov 2014: http://gdr-mia.math.cnrs.fr/events/optimgeo-14/program/slides/absil.pdf

Additional:

  • http://www.cnbc.cmu.edu/cns/papers/edelman98.pdf
  • https://arxiv.org/abs/1907.00949
  • https://arxiv.org/abs/1607.02833

Starting point – RAdam

Let us first set the goals for today. Somewhat unconventionally, we jump off with the algorithm from "Riemannian Adaptive Optimization Methods" by Becigneul and Ganea. It might look scary at first, but what we're now going to do is iterate over it, decrypting it bit by bit.

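To give a flavour of where we're headed, here is a minimal sketch of one adaptive step in the spirit of that algorithm (my own simplification, not the paper's exact listing), assuming a hypothetical manifold object that exposes a Riemannian gradient conversion, a retraction, a vector transport, and an inner product; the unit sphere serves as the running example.

```python
import numpy as np

class Sphere:
    """Hypothetical manifold interface, here the unit sphere S^{n-1} embedded in R^n."""

    def egrad2rgrad(self, x, g):
        # Riemannian gradient: project the Euclidean gradient onto the tangent space at x.
        return g - np.dot(g, x) * x

    def retr(self, x, v):
        # Retraction: a cheap first-order substitute for the exponential map.
        y = x + v
        return y / np.linalg.norm(y)

    def transp(self, x, y, v):
        # (Approximate) vector transport: project v onto the tangent space at y.
        return v - np.dot(v, y) * y

    def inner(self, x, u, v):
        # The round metric is inherited from the embedding space.
        return np.dot(u, v)


def radam_step(man, x, m, v, egrad, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One adaptive update in the spirit of Riemannian Adam (sketch; bias correction omitted)."""
    g = man.egrad2rgrad(x, egrad)                # Riemannian gradient at x
    m = b1 * m + (1 - b1) * g                    # first moment: a tangent vector at x
    v = b2 * v + (1 - b2) * man.inner(x, g, g)   # second moment: a scalar
    x_new = man.retr(x, -lr * m / (np.sqrt(v) + eps))
    m = man.transp(x, x_new, m)                  # carry the momentum to the new tangent space
    return x_new, m, v


# Usage: minimize f(x) = <a, x> on the sphere (minimum at -a / ||a||).
man, a = Sphere(), np.array([1.0, 2.0, 2.0])
x, m, v = np.array([1.0, 0.0, 0.0]), np.zeros(3), 0.0
for _ in range(2000):
    x, m, v = radam_step(man, x, m, v, egrad=a, lr=1e-2)
```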

Manifolds

An \(n\)-dimensional manifold \(M\) is a topological space \((M, \theta)\) that

  • is "nice":
    • Hausdorff (points have disjoint neighbourhoods)
    • second-countable (has a countable basis for the topology \(\theta\))
  • at every point \(p\in M\) has a neighbourhood \(U\subset M\) in which it looks like Euclidean space: there is a homeomorphism \(x:U \to x(U)\subset \mathbb{R}^n\).
    • \(x\) is called a (local) "chart" at \(p\); it provides "coordinates" on that piece of the manifold
    • "homeomorphism" means: continuous, invertible, with continuous inverse

Roughly speaking, it's a "locally-Euclidean space".

For instance, \(\mathbb{R}^n\) itself is a manifold, with the (global) chart being the identity function.
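
As a toy illustration (my own, not from the notes): on the upper half of the circle \(S^1\subset\mathbb{R}^2\), projection onto the first coordinate is a chart, a homeomorphism onto the open interval \((-1, 1)\).

```python
import numpy as np

# Chart on U = upper half of S^1: x(p) = p_0, with inverse x^{-1}(c) = (c, sqrt(1 - c^2)).
def chart(p):
    return p[0]

def chart_inv(c):
    return np.array([c, np.sqrt(1.0 - c**2)])

p = np.array([0.6, 0.8])              # a point on S^1 with p_1 > 0
c = chart(p)                          # its coordinate in (-1, 1)
assert np.allclose(chart_inv(c), p)   # the continuous inverse recovers the point
```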

An additional piece of structure we're going to need (in addition to that of topology and local-Euclidean-ness) is that of a smooth manifold. Consider two charts \((U, x)\), \((V, y)\) that share some part of their domains: \(U\cap V \neq \varnothing\). Both charts being invertible functions, we can construct a "change of coordinates", or "transition map", from \(x\) to \(y\) as \(y\circ x^{-1}\). We say that \(x\) and \(y\) are smoothly compatible if either \(U\cap V = \varnothing\) or the transition map is a diffeomorphism (in the Euclidean sense):

\[y\circ x^{-1}: x(U\cap V)\subset\mathbb{R}^n \to y(U\cap V)\subset\mathbb{R}^n.\]

Now, to endow \(M\) with a smooth structure means to provide a maximal atlas \(A\) on \(M\): a collection of smoothly-compatible charts whose domains cover the entire \(M\), and such that this collection cannot be extended with a new chart in a smoothly-compatible way.
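
Continuing the toy example (again mine, not from the notes): take the upper-half chart above and a second chart on the right half of \(S^1\) given by projection onto the second coordinate; on the overlap (the open quarter arc) the transition map is a smooth bijection, so the two charts are smoothly compatible.

```python
import numpy as np

# Chart x on U = {p in S^1 : p_1 > 0}: x(p) = p_0,  x^{-1}(a) = (a, sqrt(1 - a^2))
# Chart y on V = {p in S^1 : p_0 > 0}: y(p) = p_1,  y^{-1}(b) = (sqrt(1 - b^2), b)
def transition(a):
    """y o x^{-1} on x(U ∩ V) = (0, 1): a |-> sqrt(1 - a^2), a diffeomorphism onto (0, 1)."""
    return np.sqrt(1.0 - a**2)

a = np.linspace(0.1, 0.9, 5)
b = transition(a)
assert np.allclose(transition(b), a)  # on this overlap the transition map is its own inverse
```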


Now that we've done some hard work, we probably already want to see some benefits. What the smooth structure allows us to define is: what does it mean for a function to be a smooth function between manifolds? Given two smooth manifolds \((M, \theta_M, A_M)\) and \((N, \theta_N, A_N)\), we say that a function \(F: M\to N\) is smooth (resp. \(C^k\)) if

\[y\circ F\circ x^{-1}: x(U)\subset\mathbb{R}^m \to y(V)\subset\mathbb{R}^n\]

is smooth (resp. \(C^k\)) for every \((U, x)\in A_M\), \((V, y) \in A_N\).


Submanifolds, product manifolds, bundles, fibers, sections

On one hand, I want to introduce bundles&c after tangent spaces, in the context of vector fields&c; but on the other, it seems meaningful to introduce submanifolds&c at very beginning so that we have examples early on; this section will be probably split and moved later; this section follows Frederic Schuller's lecture

I also need bundles to try and introduce curvature learning on stereographic model spaces

One way to construct manifolds, and the one that we're going to use the most, is from other manifolds.

Given a topological \(n\)-manifold \((M, \theta)\), we call \((N, \left.\theta\right|_N)\), \(N\subset M\), a submanifold of \(M\) if \((N, \left.\theta\right|_N)\) is itself a manifold.

Given an \(m\)-manifold \(M\) and an \(n\)-manifold \(N\), their product manifold is the manifold with underlying set the cartesian product, endowed with the product topology: \((M{\times}N, \theta_{M{\times}N})\).

  • The \(n\)-torus \(T^n\) is the product \(S^1{\times}\cdots{\times}S^1\).

A bundle of topological manifolds is a triple \((E, \pi, M)\) of a total space \(E\), a continuous surjective projection map \(\pi:E\to M\), and a base space \(M\). A bundle is a weaker version of a product: instead of saying that \(E\) can be constructed from its components, we say that locally \(E\) can be projected onto components.

  • \(\pi: E\to M\)
  • A fiber of \(E\) at \(p\in M\) is \(\pi^{-1}(p)\)
  • Future uses: vector bundles, the tangent/cotangent bundle, sections

Example: product manifold is a bundle (REMOVE?)

  • \(E = M{\times}N\),
  • \(\pi: M{\times}N \to M\),
  • \((p, q) \mapsto p\) (continuous w.r.t. the product topology).

Example: Mobius strip is a bundle, but not a product

  • TODO: PICTURE
  • \(E\) – the Mobius strip,
  • \(M = S^1\),
  • \(\pi^{-1}(p) = [-1, 1]\).

A fiber bundle is a bundle \(\pi:E\to M\) s.t. \(\pi^{-1}(p) \cong F\) for all \(p\in M\) and some ("typical fiber") manifold \(F\).

Lo and behold, given a bundle \(\pi:E\to M\), a section \(\sigma: M\to E\) assigns to each point \(p\) of the manifold \(M\) a point from its fiber: \(\sigma(p)\in \pi^{-1}(p)\subset E\). In other words, it's a map \(\sigma: M\to E\) such that \(\pi \circ \sigma = \operatorname{id}_M\). Example: vector fields.
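
A tiny sketch of a section (my own illustration): if we model the tangent bundle of \(S^1\) as pairs \((p, v)\) with \(v\) tangent at \(p\), then a vector field picks one element of each fiber, and composing it with the projection gives the identity on \(S^1\).

```python
import numpy as np

# Total space E = TS^1, represented as pairs (p, v) with p on S^1 and v tangent at p (v ⟂ p).
def proj(point_and_vector):
    """Bundle projection π: TS^1 -> S^1."""
    p, _ = point_and_vector
    return p

def sigma(p):
    """A section of TS^1, i.e. a vector field: rotate p by 90 degrees to get a tangent vector."""
    v = np.array([-p[1], p[0]])
    return (p, v)

p = np.array([0.6, 0.8])
assert np.allclose(proj(sigma(p)), p)  # π ∘ σ = id on S^1
```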

TODO

  • Bundle morphisms
  • Bundle isomorphisms
  • Locally isomorphic bundles
  • Trivial bundles, locally trivial bundles, and local trivialization
  • Pullback bundle

Gradient descent, "step in a direction", and differentiation

In this section we talk about line-search along geodesics, derivations and differentiation of functions between manifolds

TODO: actually, we need a Riemannian metric before we can finish this section, in order to convert the derivative into a tangent vector.

One of the simplest optimization methods one can think of is gradient descent: given a (piecewise-)differentiable objective function \(f:X\subset\mathbb{R}^n \to \mathbb{R}\), an initial estimate \(x_0 \in X\), and a learning rate \(\alpha\in\mathbb{R}\), we gradually improve our estimate by making a small linear step in the direction of steepest descent of the loss function:

\[x_t = x_{t-1} - \alpha \operatorname{grad} f (x_{t-1}).\]
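
As the Euclidean baseline we will be generalizing, here is a minimal sketch of that loop on a quadratic objective of my own choosing:

```python
import numpy as np

def grad_descent(grad_f, x0, lr=0.1, n_steps=100):
    """Vanilla gradient descent: x_t = x_{t-1} - lr * grad f(x_{t-1})."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - lr * grad_f(x)
    return x

# Example objective f(x) = ||x - a||^2 with gradient 2 (x - a); minimum at a.
a = np.array([1.0, -2.0])
x_star = grad_descent(lambda x: 2.0 * (x - a), x0=np.zeros(2))
assert np.allclose(x_star, a, atol=1e-6)
```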

One way to interpret such an iteration is that at step \(t\) we approximate our objective \(f\) by its first-order Taylor expansion around \(x_{t-1}\) and take a small step that decreases this linear model the most.

Let's begin with the notion of "direction". The key observation for generalizing "directions" and "direction fields" to manifolds is that a "direction" attached to a point acts on scalar functions as a directional derivative at that point. Specifically, given a "test function" \(f:\mathbb{R}^n \to \mathbb{R}\), a vector \(X\in\mathbb{R}^n\) attached to a point \(p\in\mathbb{R}^n\) acts on \(f\) by computing

\[(\lambda\mapsto f(p + \lambda X))'(0).\]

In fact, for a fixed \(p\), there's a one-to-one correspondence between such linear functionals and \(\mathbb{R}^n\)-vectors.

For an abstract smooth manifold \(M\), we can take \(C^\infty(M)\) for the space of test functions, while "directional derivatives" would be derivatives along smooth curves.

Given a smooth curve \(\gamma: I\subset\mathbb{R}\to M\) passing through a point \(p\in M\) (s.t. \(\gamma(0)=p\)), the derivative of a test function \(f:M\to \mathbb{R}\) at \(p\) along \(\gamma\) is

\[X_{\gamma, p}(f) = (f\circ \gamma)'(0).\]

We could read this formula as "the initial velocity of \(f\) along \(\gamma\)".

There's a handful of observations to be made about this concept:

  • Only the local behaviour of \(\gamma\) matters. Two curves may represent the same directional derivative.
  • \(X_{\gamma, p}\) is a linear functional.
  • It satisfies the Leibniz rule: \(X(fg) = X(f)g(p) + f(p)X(g)\). Such linear functionals are called derivations.
  • The set \(T_pM = \{X_{\gamma, p} : \gamma \text{ smooth curve through } p\}\) actually forms a linear space: the action of linear combinations of derivations is defined pointwise, and an example curve representing the combination may be constructed in a chart.

Leibniz rule and derivations

Derivatives

Total derivative, pullback, pushforward

A \(C^k\) function \(F:M\to N\) between manifolds induces maps between the (co-)tangent bundles of \(M\) and \(N\).

The push-forward \(F_*X_p\) of \(X_p\in T_pM\) ("on the left") under \(F\) is a tangent vector in \(T_{F(p)}N\) such that for \(f\in C^\infty(N)\)

\[(F_*X_p)(f) = X_p(f\circ F).\]

Similarly, for a co-tangent \(\omega\in T^*_{F(p)}N\) ("on the right"), its pull-back under \(F\) is \(F^*\omega \in T^*_pM\) such that for \(X\in T_pM\)

\[(F^*\omega)(X) = \omega(F_*X).\]

The notion of push-forward generalizes that of a derivative. In a coordinate basis, the push-forward corresponds to the Jacobian matrix acting on column vectors. Hmm, pull-backs should correspond to the Jacobian matrix too, acting from the left, am I right? Yet we can't have a correspondence between tangents and cotangents without a metric tensor? Oh wait, the derivative, apparently, should be a \((1,1)\)-tensor?! Ah, not even so: it takes one index from the left manifold and the other from the right manifold.
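
To make the coordinate picture concrete, here is a small sketch (my own example map \(F:\mathbb{R}^2\to\mathbb{R}^3\)): the push-forward is the Jacobian acting on a column vector, the pull-back is its transpose acting on a covector, and the two are consistent with the defining identity.

```python
import numpy as np

# A smooth map F: R^2 -> R^3 in (global) coordinates, and its Jacobian at p.
def F(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian_F(x):
    return np.array([[x[1],          x[0]],
                     [np.cos(x[0]),  0.0],
                     [0.0,           2.0 * x[1]]])

p = np.array([1.0, 2.0])
J = jacobian_F(p)

X = np.array([0.5, -1.0])          # a tangent vector at p
omega = np.array([1.0, 0.0, 2.0])  # a cotangent at F(p)

FX = J @ X             # push-forward: the Jacobian acting on a column vector
F_omega = J.T @ omega  # pull-back: the Jacobian (transposed) acting "from the left"

# Consistency with the definition: (F^* ω)(X) = ω(F_* X).
assert np.isclose(F_omega @ X, omega @ FX)
```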

Coordinate frame in \(T_pM\)

Push-forwards of inverse charts form a basis in \(T_pM\). This also allows us to specify the dimension of the tangent bundle as a manifold.

Given a chart \((U, x)\), we can construct a basis in \(T_pM\), \(p\in\operatorname{dom}x\), from the canonical basis in \(T_{x(p)}\mathbb{R}^n \cong \mathbb{R}^n\), by push-forwarding it via the inverse chart:

\[E_i = \frac{\partial}{\partial x_i} = x^{-1}_* \partial_i,\]

where \(\partial_i\) is the \(i\)'th partial derivative operator acting on functions on \(\mathbb{R}^n\).

This is visualized as follows: one can construct a curve \(\gamma_i\) emanating from \(p\) by following the \(i\)'th coordinate in the chart \(x\):

\[\gamma_i(t)=x^{-1}(x(p) + t e_i).\]

Then

\[\frac{\partial}{\partial x_i}(f) = \partial_i (f \circ x^{-1})(x(p)) = (t\mapsto f(x^{-1}(x(p) + te_i)))'(0) = X_{\gamma_i, p}(f).\]

In other words, \(\frac{\partial}{\partial x_i}\) is represented by the curve \(\gamma_i\). It now takes just a little exercise to show that they are linearly independent and form a basis in \(T_pM\).
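
As a concrete check (my own example): take \(M=\mathbb{R}^2\setminus\{0\}\) with the polar chart \(x(p)=(r,\varphi)\); the basis vector \(\partial/\partial r\) at \(p\) is represented by the radial curve \(\gamma_r\), and its action on a test function agrees with the chain-rule expression.

```python
import numpy as np

# Polar chart on R^2 \ {0}: x(p) = (r, phi), with inverse x^{-1}(r, phi) = (r cos phi, r sin phi).
def chart(p):
    return np.array([np.hypot(p[0], p[1]), np.arctan2(p[1], p[0])])

def chart_inv(c):
    r, phi = c
    return np.array([r * np.cos(phi), r * np.sin(phi)])

def f(p):                          # a test function on R^2
    return p[0] ** 2 + 3.0 * p[1]

p = np.array([1.0, 1.0])
e_r = np.array([1.0, 0.0])         # canonical basis vector in chart coordinates (the r-direction)

def gamma(t):                      # curve following the r-coordinate: γ(t) = x^{-1}(x(p) + t e_r)
    return chart_inv(chart(p) + t * e_r)

h = 1e-6
d_dr_f = (f(gamma(h)) - f(gamma(-h))) / (2 * h)   # (∂/∂r f)(p) via the representing curve

# The same number from the chain rule: ∂/∂r = cos φ ∂/∂x + sin φ ∂/∂y.
r, phi = chart(p)
assert np.isclose(d_dr_f, np.cos(phi) * 2 * p[0] + np.sin(phi) * 3.0, atol=1e-5)
```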

Distances, angles, isometry of \(T_pM\) and \(T_p^*M\)

So far, we've only built machinery for reasoning about continuity and smoothness. We still don't have any tools for discussing geometry. Now we introduce the notion of a Riemannian metric, which will allow us to measure lengths of curves, compute angles, and, most importantly, convert the derivative of our objective function, which is a cotangent vector, into an element of the tangent space: the gradient, that is, a direction to make a step in.

Roughly, a metric on \(M\) is a smooth assignment of a symmetric "inner product" \(g_p = \langle\cdot,\cdot\rangle_p\) on each \(T_pM\). A metric is Riemannian if it's positive definite (\(g(X, X) > 0\) for \(X \neq 0\)). In fact, in the second talk on applications, we'll be more interested in Lorentzian metrics, which have signature \((+--\cdots-)\) and correspond to hyperbolic spaces.

TLDR:

  • Now, finally, given a curve, we can measure its speed: \(\|X\|_p^2 = \langle X, X \rangle_p\), for \(X\in T_pM\).
  • We can measure (cosines of) angles.
  • We can now measure lengths: \(L_{[a,b]}(\gamma) = \int_a^b \|\gamma'(t)\|_{\gamma(t)}\,\operatorname{d}t.\)
  • We can now convert elements of the tangent and cotangent bundles back and forth (see the sketch after this list):
    • \(X\in T_pM\) now defines a linear functional \(X^\flat = Y\mapsto \langle X, Y \rangle_p,\)
    • and a cotangent \(\omega\in T_p^*M\) corresponds to the tangent vector \(\omega^\sharp\in T_pM\) satisfying the equation \(\langle \omega^\sharp, Y\rangle=\omega(Y)\) (i.e. resulting from application of the inverse metric tensor)
    • TODO: Show in a chart and under change of coordinates?
  • This allows us to define, for \(f:M\to\mathbb{R}\), \(\operatorname{grad}f = \operatorname{d}f^\sharp \in \Gamma(\operatorname{T}M).\)
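
To make \(\flat\) and \(\sharp\) concrete, here is a minimal sketch assuming the conformal metric of the Poincaré ball, \(g_p = \lambda_p^2 I\) with \(\lambda_p = 2/(1-\|p\|^2)\) (my own example; the model itself belongs to the second talk): raising the index of \(\operatorname{d}f\) is just an application of the inverse metric tensor.

```python
import numpy as np

def conformal_factor(p):
    """λ_p = 2 / (1 - ||p||^2) for the Poincaré ball metric g_p = λ_p^2 I."""
    return 2.0 / (1.0 - np.dot(p, p))

def sharp(p, df):
    """Raise an index: ω^♯ = g_p^{-1} ω. For a conformal metric this is just a rescaling."""
    return df / conformal_factor(p) ** 2

def inner(p, u, v):
    """The Riemannian inner product ⟨u, v⟩_p = λ_p^2 (u · v)."""
    return conformal_factor(p) ** 2 * np.dot(u, v)

p = np.array([0.3, -0.4])
df = np.array([1.0, 2.0])     # the (Euclidean) differential of some f at p, acting as df(Y) = df · Y
grad_f = sharp(p, df)         # the Riemannian gradient

Y = np.array([0.5, 0.5])
assert np.isclose(inner(p, grad_f, Y), df @ Y)   # ⟨grad f, Y⟩_p = df(Y)
```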

Now, by a "smooth assignment" we mean that a metric on a smooth manifold

M is a smooth map
g:Γ(TM)×Γ(TM)C(M)
that takes two vector fields and produces a scalar function. In other words, it's a
T02M
field.

Comparing vectors from different tangent spaces

In this section we talk about differentiation of vector fields, linear connection, zero-acceleration, and parallel translation

NB: in some of the following pictures we temporarily use the sometimes abusive and mostly obsolete notation from Absil et al.

Newton method

Now that we've learned to differentiate vector fields, we can generalize Newton's method to manifolds: find the step \(\eta\in T_{x_t}M\) solving

\[\nabla_{\eta, x_t}\operatorname{grad}f(x_t) = -\operatorname{grad}f(x_t).\]

Curvature

Not even sure if I should include it… would be useful to prepare for the second talk

Hard truth – we don't know closed-form expmaps and parallel translations

Here we discuss retractions and vector transport – first-order approximations to the exponential map and parallel transport.
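
On the sphere both maps happen to be available in closed form, which lets us compare a cheap retraction against the true exponential map (a sketch of my own; in general one settles for the retraction):

```python
import numpy as np

def expmap_sphere(x, v):
    """Exact exponential map on S^{n-1}: follow the great circle with initial velocity v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def retraction_sphere(x, v):
    """Projective retraction: step in the embedding space, then normalize back onto the sphere."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.1, 0.05])          # a small tangent vector at x (v ⟂ x)

exact = expmap_sphere(x, v)
approx = retraction_sphere(x, v)
print(np.linalg.norm(exact - approx))   # small for small ||v||: the retraction tracks the expmap
```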
