# Dynamical Systems
https://www.vutbr.cz/www_base/zav_prace_soubor_verejne.php?file_id=175032








https://users.encs.concordia.ca/~concoco/MLEofaTS.pdf


## On discretization
https://nms.kcl.ac.uk/MSc_Th_Phys/cm131a-notes-1819.pdf
https://elmer.unibas.ch/pendulum/bterm.htm
https://math.libretexts.org/Bookshelves/Scientific_Computing_Simulations_and_Modeling/Book%3A_Introduction_to_the_Modeling_and_Analysis_of_Complex_Systems_(Sayama)/05%3A_DiscreteTime_Models_II__Analysis/5.07%3A_5.7_Linear_Stability_Analysis_of_Discrete-Time_Nonlinear_Dynamical_Systems
## On Maps
https://fada.birzeit.edu/bitstream/20.500.11889/1310/1/thesis_18032015_12641.pdf
# Future directions
Thus far, we have provided the reader with a dynamical-systems perspective on game optimization. Through both analytical results on the bi-linear min-max game and empirical results on GANs, we showed that the optimization dynamics can suffer from rotational dynamics in the vicinity of the equilibrium. The analysis in the previous sections suggests several directions for future research, which we outline in this section.
### On the discrepancy between continuous-time and discrete-time dynamics
Recall the example of the bi-linear game in equation ?. As pointed out in Sec ?, the continuous-time dynamics is
where the Jacobian is ? and hence the dynamics around the equilibrium form an orbit that maintains a fixed distance to the equilibrium. On the other hand, the discrete-time dynamics has the following form,
where the dynamics spiral out and diverge when initialized in the vicinity of the equilibrium. Figure ? depicts the discrepancy between continuous-time and discrete-time dynamics.
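As a minimal sketch of this discrepancy, assume the bi-linear game is $\min_x \max_y xy$ (an assumption, since the equation is not reproduced here) and that the discrete-time dynamics are simultaneous gradient descent-ascent with step size `eta`. Under that assumption the continuous flow $\dot{x} = -y$, $\dot{y} = x$ is a rotation that preserves the distance to the equilibrium at the origin, whereas each discrete step multiplies that distance by $\sqrt{1 + \eta^2}$. The function names below are illustrative.

```python
import math

def flow_state(x0, y0, t):
    """Exact solution of the assumed flow dx/dt = -y, dy/dt = x (a rotation by angle t)."""
    c, s = math.cos(t), math.sin(t)
    return c * x0 - s * y0, s * x0 + c * y0

def gda_steps(x0, y0, eta, n_steps):
    """Simultaneous gradient descent (on x) / ascent (on y) for the assumed game f(x, y) = x * y."""
    x, y = x0, y0
    for _ in range(n_steps):
        # Both players update from the same iterate (simultaneous update).
        x, y = x - eta * y, y + eta * x
    return x, y

eta, n_steps = 0.1, 100
x0, y0 = 1.0, 0.0

r_flow = math.hypot(*flow_state(x0, y0, eta * n_steps))
r_gda = math.hypot(*gda_steps(x0, y0, eta, n_steps))

print(f"distance to equilibrium, continuous flow: {r_flow:.4f}")
print(f"distance to equilibrium, discrete GDA:   {r_gda:.4f}")
print(f"predicted growth (1 + eta^2)^(n/2):     {(1 + eta**2) ** (n_steps / 2):.4f}")
```

The discrete distance matches the predicted geometric growth, so in this toy setting the outward spiral is entirely an artifact of the finite step size.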
This observation suggests that instabilities in game optimization could, at least partly, be attributed to the integration error introduced by the discretization.
In the following preliminary analysis, we characterize the discretization error and show that this error could be seen
The effect of discretization when modeling continuous dynamics is a well-studied problem in the numerical analysis of partial differential equations. One tool commonly used in this setting is modified equation analysis (Warming & Hyett, 1974), which determines how to better model discrete steps with a continuous differential equation by introducing higher-order spatial or temporal derivatives. We present two methods based on modified equation analysis, which modify gradient flow to account for the effect of discretization.
Modified loss. Gradient descent always moves in the direction of steepest descent on a loss function $L$ at each step; however, due to the finite learning rate, it fails to remain on the continuous steepest descent path given by gradient flow. Li et al. (2017), Feng et al. (2019), and most recently Barrett & Dherin (2020) demonstrate that the gradient descent trajectory closely follows the steepest descent path of a modified loss function $\tilde{L}$. The divergence between these trajectories fundamentally depends on the learning rate $\eta$ and the curvature $H$. As derived in Barrett & Dherin (2020), and summarized in appendix D, this divergence is given by the gradient correction $-\frac{\eta}{2} H g$, where $g = \nabla L$ is the gradient. See Fig. 4 for an illustrative example of this method applied to a quadratic loss in $\mathbb{R}^2$.
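The correction can be checked numerically on a quadratic loss. The sketch below assumes the quadratic from Fig. 4, $L(w) = w^\top A w$ with $g = 2Aw$ and constant Hessian $H = 2A$, and assumes the correction takes the form $-\frac{\eta}{2} H g$, which follows from the modified loss $\tilde{L} = L + \frac{\eta}{4}\lVert\nabla L\rVert^2$. The learning rate and starting point are illustrative choices. For a quadratic, gradient flow $\dot{w} = -Hw$ and the modified flow $\dot{w} = -(H + \frac{\eta}{2}H^2)w$ both have closed-form solutions, so no ODE solver is needed.

```python
import numpy as np

# Quadratic loss L(w) = w^T A w, so g(w) = 2 A w and the Hessian is H = 2 A.
A = np.array([[2.5, -1.5], [-1.5, 2.0]])
H = 2.0 * A
eta = 0.1                    # learning rate (illustrative choice)
w0 = np.array([1.0, 1.0])    # starting point (illustrative choice)
n_steps = 10

# H is symmetric, so it diagonalizes in an orthonormal basis.
lam, Q = np.linalg.eigh(H)

def linear_flow(rates, t):
    """Solution at time t of dw/dt = -M w, where M = Q diag(rates) Q^T."""
    return Q @ (np.exp(-rates * t) * (Q.T @ w0))

# Discrete gradient descent: w_{k+1} = w_k - eta * H w_k.
w_gd = w0.copy()
for _ in range(n_steps):
    w_gd = w_gd - eta * (H @ w_gd)

t = eta * n_steps
w_flow = linear_flow(lam, t)                       # gradient flow
w_mod = linear_flow(lam + 0.5 * eta * lam**2, t)   # flow with the -(eta/2) H g correction

err_flow = np.linalg.norm(w_flow - w_gd)
err_mod = np.linalg.norm(w_mod - w_gd)
print(f"|gradient flow - GD| = {err_flow:.5f}")
print(f"|modified flow - GD| = {err_mod:.5f}")
```

In this setting the modified flow ends closer to the gradient descent iterate than plain gradient flow does, because $1 - x < e^{-x - x^2/2} < e^{-x}$ for $0 < x < 1$ along each eigendirection.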
Modified flow. Rather than modifying gradient flow with higher-order “spatial” derivatives of the loss function, here we introduce higher-order temporal derivatives. We start by assuming the existence of a continuous trajectory $\theta(t)$ that weaves through the discrete steps taken by gradient descent and then identify the differential equation that generates the trajectory. Rearranging the update equation for gradient descent, $\theta_{t+1} = \theta_t - \eta g(\theta_t)$, and assuming $\theta(t) = \theta_t$ and $\theta(t+\eta) = \theta_{t+1}$, gives the equality $-g(\theta_t) = \frac{\theta(t+\eta) - \theta(t)}{\eta}$. Taylor expanding the right side results in the differential equation
$$-g(\theta) = \dot{\theta} + \frac{\eta}{2}\ddot{\theta} + O(\eta^2).$$
Notice that in the limit as $\eta \to 0$ we regain gradient flow. For small $\eta$, we obtain a modified version of gradient flow with an additional second-order term,
$$\frac{\eta}{2}\ddot{\theta} + \dot{\theta} + g(\theta) = 0.$$
This approach to modifying a first-order differential equation with higher-order temporal derivatives was applied by Kovachki & Stuart (2019) to construct a more realistic continuous model for momentum, as illustrated in Fig. 4.
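A quick check of the modified flow, under the assumption of a one-dimensional quadratic loss $L(\theta) = \frac{\lambda}{2}\theta^2$, so that $g(\theta) = \lambda\theta$ (the values of `lam` and `eta` are illustrative). Gradient descent contracts $\theta$ by $1 - \eta\lambda$ per step, which corresponds to an effective continuous decay rate of $\log(1 - \eta\lambda)/\eta$. Gradient flow decays at rate $-\lambda$, while the slowly decaying solution of $\frac{\eta}{2}\ddot{\theta} + \dot{\theta} + \lambda\theta = 0$ decays at the root of $\frac{\eta}{2}r^2 + r + \lambda = 0$ nearest zero.

```python
import math

lam = 1.0   # curvature of the assumed loss L(theta) = lam * theta**2 / 2
eta = 0.1   # learning rate (illustrative choice)

# Gradient descent: theta_{k+1} = (1 - eta * lam) * theta_k,
# i.e. theta_k = theta_0 * exp(rate_gd * k * eta).
rate_gd = math.log(1.0 - eta * lam) / eta

# Gradient flow: d(theta)/dt = -lam * theta.
rate_flow = -lam

# Modified flow: (eta / 2) * theta'' + theta' + lam * theta = 0.
# Solutions are exp(r * t) with (eta / 2) * r**2 + r + lam = 0; take the root nearest zero.
rate_mod = (-1.0 + math.sqrt(1.0 - 2.0 * eta * lam)) / eta

print(f"gradient descent (effective): {rate_gd:.5f}")
print(f"gradient flow:                {rate_flow:.5f}")
print(f"modified flow (slow root):    {rate_mod:.5f}")
```

Expanding in $\eta$, both the gradient descent rate and the slow root equal $-\lambda - \frac{\eta}{2}\lambda^2$ up to $O(\eta^2)$, which is why the second-order term brings the continuous model closer to the discrete iterates in this example.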
Figure 4: Modeling discretization. We visualize the trajectories of gradient descent and momentum (black dots), gradient flow with and without momentum (blue lines), and the modified dynamics (red lines) on the quadratic loss $L(w) = w^\top \begin{bmatrix} 2.5 & -1.5 \\ -1.5 & 2 \end{bmatrix} w$. On the left we visualize gradient dynamics using modified loss. On the right we visualize momentum dynamics using modified flow. In both settings the modified continuous dynamics visually track the discrete dynamics better than the original continuous dynamics. See appendix D for further details.
The centrifugal effect due to discretization originates from the spherical geometry of the gradient field in parameter space: because scale symmetry implies that the gradient is always orthogonal to the parameter itself, each discrete update with a finite learning rate effectively pushes the parameters away from the origin.
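The geometric argument can be made concrete with a small check. The loss below, $L(\theta) = a \cdot \theta / \lVert\theta\rVert$, is a hypothetical scale-invariant example chosen for illustration (it is not taken from the analysis above). Since $L(c\theta) = L(\theta)$ for every $c > 0$, its gradient satisfies $\theta \cdot g = 0$, so a gradient descent step gives $\lVert\theta - \eta g\rVert^2 = \lVert\theta\rVert^2 + \eta^2\lVert g\rVert^2$.

```python
import numpy as np

a = np.array([1.0, -2.0, 0.5])   # fixed direction defining the toy loss (hypothetical)

def grad(theta):
    """Gradient of the scale-invariant toy loss L(theta) = a . theta / ||theta||."""
    norm = np.linalg.norm(theta)
    return a / norm - (a @ theta) * theta / norm**3

theta = np.array([0.3, 0.4, 1.2])   # starting point (illustrative choice)
eta = 0.1                            # learning rate (illustrative choice)
norms = [np.linalg.norm(theta)]
max_dot = 0.0

for _ in range(50):
    g = grad(theta)
    # Track how far the gradient is from being orthogonal to the parameters.
    max_dot = max(max_dot, abs(theta @ g))
    theta = theta - eta * g
    norms.append(np.linalg.norm(theta))

print(f"largest |theta . g| observed: {max_dot:.2e}")
print(f"norm before: {norms[0]:.4f}, after 50 steps: {norms[-1]:.4f}")
```

Under gradient flow the norm would stay constant for this loss, so any growth seen here comes from the $\eta^2\lVert g\rVert^2$ term contributed by each finite step.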