# Notes on "Opening the Black Box: Low-Dimensional Dynamics in High-Dimensional Recurrent Neural Networks"

#### Author: [Sharath Chandra](https://sharathraparthy.github.io/)

## [Paper Link](https://www.mitpressjournals.org/doi/pdf/10.1162/NECO_a_00409)

## Introduction

1. RNNs are employed in domains that have temporal structure.
2. There is a large body of work on training RNNs; however, work on unpacking this black box and understanding what is happening under the hood is sparse.
3. This paper tries to understand RNNs by viewing them as non-linear dynamical systems and studying their fixed points and invariant subspaces.
4. To study the nature of these fixed points, the authors make use of the well-known technique of linearization in the neighbourhood of fixed points.

## Linearization

Consider a dynamical system $\dot{x} = F(x)$, where $x \in \mathbb{R}^n$ and $F$ is the vector field that defines the evolution of the system. Suppose $x^\star$ is a point around which we want to analyze the behaviour of the system. We can Taylor-expand $F$ in a $\delta x$ neighbourhood of the query point $x^\star$, which gives

\begin{equation}
F(x^\star + \delta x) = F(x^\star) + F'(x^\star)\delta x + \frac{1}{2}\delta x^\top F''(x^\star)\delta x + \dots
\end{equation}

In the linear regime we can drop every term except the first-order one, provided the first-order term dominates both the zeroth- and second-order terms:

\begin{equation}
| F'(x^\star)\delta x | \gg |F(x^\star)|
\end{equation}

\begin{equation}
| F'(x^\star)\delta x | \gg \left|\frac{1}{2}\delta x^\top F''(x^\star)\delta x\right|
\end{equation}

From these conditions we can see that the linearization is valid around fixed points $x^\star$, where $F(x^\star) = 0$, and also around "slow points", where $|F(x^\star)|$ is very small. Based on this observation, the authors construct a scalar function, the (squared) norm of the dynamics, whose minima are exactly the points where the dynamics are zero or nearly zero:

$$ q(x) = \frac{1}{2}|F(x)|^2 $$

By minimizing $q(x)$ we can find candidate regions for linearization that are not just fixed points but also slow points. The paper employs standard numerical optimization to find these fixed and slow points.
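To make this concrete, here is a minimal sketch of how one might search for fixed and slow points by minimizing $q(x)$ with an off-the-shelf optimizer. This is not the paper's code: the vanilla continuous-time dynamics $F(x) = -x + W\tanh(x) + b$, the randomly drawn weights, and the use of `scipy.optimize.minimize` are illustrative assumptions; in the actual analysis $F$ would be the trained network's dynamics and the initial conditions would be hidden states visited while the network performs the task.

```python
import numpy as np
from scipy.optimize import minimize


def make_rnn_dynamics(W, b):
    """Vanilla continuous-time RNN (an assumed form): dx/dt = F(x) = -x + W tanh(x) + b."""
    def F(x):
        return -x + W @ np.tanh(x) + b
    return F


def q(x, F):
    """Scalar 'speed' function q(x) = 1/2 |F(x)|^2 from the paper."""
    f = F(x)
    return 0.5 * f @ f


def find_slow_points(F, initial_states, tol=1e-12):
    """Minimize q(x) from many initial conditions. Minima with q ~ 0 are
    fixed points; minima with small but nonzero q are slow points."""
    points = []
    for x0 in initial_states:
        res = minimize(q, x0, args=(F,), method="L-BFGS-B", tol=tol)
        points.append((res.x, res.fun))
    return points


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50                                                # number of hidden units
    W = rng.normal(scale=1.5 / np.sqrt(n), size=(n, n))   # toy weights, not a trained network
    b = np.zeros(n)
    F = make_rnn_dynamics(W, b)

    # In practice the initial conditions come from hidden states visited by the
    # trained RNN during the task; here we just sample them randomly.
    inits = rng.normal(size=(20, n))
    for x_star, q_val in find_slow_points(F, inits)[:5]:
        print(f"q(x*) = {q_val:.3e}")
```

The candidates with the smallest $q$ values are then the points around which the linearization above is applied.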
The paper presents several interesting experiments; one such task is the 3-bit flip-flop example.

![](https://i.imgur.com/VFipPAt.png)

Here the aim is to train an RNN whose outputs track the input bits, which are flipped at random times. Each output of the RNN starts at a random sign (+1 or -1), and whenever the network notices a flip on an input, the corresponding output should align itself with that flip. For example, in the first row of the image above, the RNN starts at a random output sign, and when it notices a change in the input (+1 in this case) it flips its output (red) and stays there until it notices the next change.

One way to visualize the high-dimensional hidden-state trajectories is principal component analysis (PCA): we form the covariance matrix of the high-dimensional trajectories, take its singular value decomposition, and keep the most influential modes to visualize the trajectories. In short, we use PCA to plot a 3D picture of the N-dimensional dynamics.

![](https://i.imgur.com/nZAcbVL.png)

Here we pick the top three principal components and unpack what is going on with the dynamics. We can see that all 8 (that is, $2^3$) memory states are encoded as asymptotically stable fixed points at the corners of a cube, and the edges of the cube represent the transitions of the bits from one fixed point to another. Separating these fixed points are other points (green in the figure), which are essentially saddle points: the eigenvalues of the linearized system have both positive and negative real parts. By disentangling the eigenspace into stable, unstable, and center subspaces ($E^s, E^u, E^c$), we see that each saddle has an unstable subspace transverse to a stable subspace. In other words, there is a stable manifold $E^s$ that separates the cube into two parts: everything that lies on the stable manifold converges to the saddle point, while any component along the unstable manifold $E^u$ deflects the dynamics away from it. The mechanism the network learns for transitioning a bit from one locally stable fixed point to another works precisely because these invariant manifolds around the saddle points separate the regions of the cube.

<!-- I discussed only one cool experiment in the paper out many many interesting ones. But across all these experiments the core idea is the same In future, I will try to explain these more in depth. -->
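Below is a similarly hedged sketch of the two analysis steps used in this experiment: projecting hidden-state trajectories onto their top principal components, and classifying each candidate point by the eigenvalue spectrum of the Jacobian $F'(x^\star)$. The finite-difference Jacobian and the toy 2-D demo system are illustrative assumptions; with a trained RNN, `F`, the trajectories, and the candidate points would come from the previous sketch.

```python
import numpy as np


def pca_project(hidden_states, k=3):
    """Project N-dimensional hidden states (rows of a (T, N) array) onto their
    top-k principal components via SVD of the centred data, which is equivalent
    to eigendecomposing the covariance matrix."""
    X = hidden_states - hidden_states.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T, Vt[:k]          # (T, k) projections and the k PCs


def jacobian(F, x, eps=1e-6):
    """Finite-difference approximation of the Jacobian F'(x) at a candidate point."""
    n = x.size
    J = np.zeros((n, n))
    f0 = F(x)
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        J[:, i] = (F(x + dx) - f0) / eps
    return J


def classify_point(F, x_star):
    """Linearize around x_star and inspect the eigenvalues of F'(x_star):
    all real parts negative -> asymptotically stable fixed point;
    mixed signs             -> saddle, with stable/unstable subspaces spanned
                               by the corresponding eigenvectors."""
    eigvals, eigvecs = np.linalg.eig(jacobian(F, x_star))
    n_unstable = int(np.sum(eigvals.real > 0))
    label = "stable" if n_unstable == 0 else f"saddle with {n_unstable} unstable direction(s)"
    return label, eigvals, eigvecs


if __name__ == "__main__":
    # Toy 2-D demo: one contracting and one expanding direction at the origin,
    # i.e. a saddle point like the green points in the flip-flop figure.
    A = np.array([[-1.0, 0.0],
                  [ 0.0, 0.5]])
    F = lambda x: A @ x
    label, eigvals, _ = classify_point(F, np.zeros(2))
    print(label, eigvals.real)
```

The finite-difference Jacobian keeps the sketch dependency-free; with an autodiff framework one would instead compute $F'(x^\star)$ exactly and read off the stable and unstable subspaces from its eigenvectors.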