# Learning strange attractors with reservoir systems
We have organised some notes for this paper.
Paper: [Learning strange attractors with reservoir systems](https://arxiv.org/pdf/2108.05024)
Implement: [Google colab](https://colab.research.google.com/drive/1geoz8rqQA8nAVzlh5HE88mPl6bqtEnCQ?usp=sharing)
Other References:
- Application: [A systematic exploration of reservoir computing for forecasting chaotic nonlinear dynamical systems](https://arxiv.org/pdf/2201.08910)
- Proof: [Takens' Theorem: Proof and Application](https://dspace.uib.es/xmlui/bitstream/handle/11201/149295/tfm_2017-18_MFMA_jpv514_1566.pdf?sequence=1&isAllowed=y)
- Definition:
- [Manifolds, Tensor Analysis, and Applications](https://link.springer.com/book/10.1007/978-1-4612-1029-0)
- [An Introduction to Manifolds](https://link.springer.com/book/10.1007/978-1-4419-7400-6)
## Introduction
In the implementation linked above, we observed that by using data from only a single dimension of a (chaotic) dynamical system, we can reconstruct the "structure" of the entire dynamical system via a Recurrent Neural Network (RNN) constructed with random matrices. (This involves projecting the data into a high-dimensional space and then using PCA to reduce it to three dimensions for visualization.) The contribution of this paper is a theoretical proof of why this approach works.
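The pipeline described above can be sketched in a few lines (a minimal illustration, not the paper's Colab; the Euler integrator, reservoir size, and all parameter values are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def lorenz_x(n_steps, dt=0.01, sigma=10.0, rho=28.0, beta=8/3):
    """x-coordinate of a Lorenz trajectory via a simple Euler scheme."""
    state = np.array([1.0, 1.0, 1.0])
    xs = np.empty(n_steps)
    for t in range(n_steps):
        x, y, z = state
        state = state + dt * np.array([sigma*(y - x), x*(rho - z) - y, x*y - beta*z])
        xs[t] = state[0]
    return xs

N = 100                                      # reservoir dimension
A = rng.standard_normal((N, N))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))    # rescale so that rho(A) < 1
C = rng.standard_normal(N)

# Drive the linear reservoir x_t = A x_{t-1} + C z_t with the scalar input.
z = lorenz_x(5000)
x = np.zeros(N)
states = np.empty((len(z), N))
for t, zt in enumerate(z):
    x = A @ x + C * zt
    states[t] = x

# PCA via SVD of the centered states: keep 3 components for visualization;
# the projected cloud should trace out a copy of the Lorenz butterfly.
centered = states - states.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
proj3d = centered @ Vt[:3].T                 # shape (5000, 3)
```

Plotting `proj3d` as a 3D scatter reproduces the qualitative reconstruction discussed in the notes.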
::: danger
**Theorem 4.5 (Linear reservoir embeddings)**
Let $\phi \in \text{Diff}^2(M)$ be a dynamical system on a compact manifold $M$ of dimension $q$ that exhibits finitely many periodic orbits. Suppose that for each periodic orbit $m$ of $\phi$ with period $n \in \mathbb{N}$, the derivative $T_m\phi^{-n}$ has $q$ distinct eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_q$. Let now $\ell \in \mathbb{N}$ be the lowest common multiple of all the periods of the finite periodic points of $\phi$ and let $N \in \mathbb{N}$ such that $N > \max\{2q, \ell\}$.
(This theorem is the primary result of the entire paper.)
Construct now $\overline{A} \in \mathbb{M}_{N,N}$ and $\overline{\mathbf{C}} \in \mathbb{R}^N$ by drawing their entries using *independent regular real-valued distributions* (see prop 4.4). Then, there exist rescaled versions $A$ and $\mathbf{C}$ of $\overline{A}$ and $\overline{\mathbf{C}}$ respectively such that the generalized synchronization $f_{(\phi, \omega, F)} \in C^2(M, \mathbb{R}^N)$ (associated to the state map $F(\mathbf{x}, z) := A\mathbf{x} + \mathbf{C}z$) is **almost surely an embedding** (for generic $\omega \in C^2(M, \mathbb{R})$).
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/r18PboEBbg.png" alt="Description" style="width: 60%; margin: 0 auto; display: block;">
</div>
:::
This paper primarily extends **Takens' Embedding Theorem**. For the proof of Takens' Theorem, please refer to: [Detecting strange attractors in turbulence](https://link.springer.com/chapter/10.1007/BFb0091924).
In the following steps, we will systematically discuss the properties of this "reservoir map", ranging from smoothness to its embedding capabilities. Finally, we will prove that a map constructed via random matrices satisfies all necessary conditions.
## Definition
$\phi \in \text{Diff}^2(M)$ denotes the dynamical system.
> $\phi$ is a *twice-differentiable diffeomorphism*, i.e. $\phi \in C^2(M)$ is a bijection and its inverse $\phi^{-1}$ is also twice-differentiable.
$\omega$ is a **generic observation** used to observe data from a single dimension.
> Taking our implementation (Lorenz system) as an example, $\phi^t(m) = (x(t), y(t), z(t))$, where $m\in M$ is the initial condition; then $\omega(\phi^t(m)) = x(t)$.
>
> The term "generic" means that such observations belong to an open and dense subset of $C^r(M, \mathbb{R}).$
**Generalised synchronisation** $f_{(\phi,\omega, F)}(\cdot)$ is defined as follows: a drive system $m\in\mathcal M \subseteq\mathbb R^q$ and a response system $\mathbf x\in \mathbb R^N$ are synchronised if there exists a function $\psi$ such that $\mathbf x = \psi(m)$.
:::success
In this paper, the definition is stricter: the role of $\psi$ is played by $f_{(\phi, \omega, F)}(\cdot)$.
:::
>$f_{(\phi, \omega, F)} \in C^2(M, \mathbb{R}^N)$ can be interpreted as the map representing the entire RNN, where $\mathbb{R}^N$ is the **reservoir system** (state space).
>Intuitively, this implies that the dynamics of the response system are **entirely predictable** from the history of the input.
$F(\mathbf{x}, z) := A\mathbf{x} + \mathbf{C}z$ corresponds to the "recurrence" within the hidden layer of the RNN.
[More explicit definition](https://hackmd.io/@2dofLGozQqud6yU_088mEA/BJPPohNHbl)
## Flow/Dependence Chart
```mermaid
graph TD
subgraph Setup ["Foundational Setup"]
A["Dynamical System (M, φ)<br/>dim(M) = q"] --- B["Linear Reservoir<br/>xₜ = Axₜ₋₁ + Czₜ"]
end
subgraph Conditions ["Hypotheses (Thm 3.1)"]
C1["Stability: ρ(A) < 1<br/>(Echo State Property)"]
C2["Condition (i): Orbit Distinction<br/>(Local Independence at periodic points)"]
C3["Condition (ii): Reachability<br/>(Krylov space rank N)"]
end
subgraph Tools ["Intermediate Propositions"]
P23["Prop 2.3:<br/>Existence, Smoothness & Continuity<br/>of Generalized Sync (f)"]
P44["Prop 4.4 (Friedrich Philipp):<br/>Random A, C satisfy (i) & (ii)<br/>Almost Surely"]
end
subgraph Theorems ["Main Results"]
T31["<b>Theorem 3.1: Immersion</b><br/>Map f is smooth & immersive<br/>for generic observations"]
T41["<b>Theorem 4.1: Embedding</b><br/>Adds Dimension Constraint:<br/>N > max{2q, l}"]
T45["<b>Theorem 4.5: Linear Reservoir Embedding</b><br/>The 'Randomized' Version"]
end
subgraph Goal ["Applications"]
L["Topological Conjugacy<br/>& Learnability"]
R["Attractor Reconstruction<br/>& Forecasting"]
end
%% Relations
A --> P23
B --> P23
P23 --> C1
C1 & C2 & C3 --> T31
T31 --> T41
H["LCM of Periods (l)"] --> T41
P44 --> T45
T41 --> T45
T45 --> L
L --> R
style T31 fill:#f9f,stroke:#333,stroke-width:2px
style T41 fill:#f9f,stroke:#333,stroke-width:2px
style T45 fill:#bbf,stroke:#333,stroke-width:4px
style L fill:#dfd,stroke:#333
style R fill:#dfd,stroke:#333
```
## Differentiable (Section 2)
::: info
First, we must establish that our map $f_{(\phi, \omega, F)}: M \to \mathbb{R}^N$ exists, is unique, and sufficiently smooth.
Section 2 presents three key statements: **Theorem 2.1, Corollary 2.2, and Proposition 2.3**.
**Theorem 2.1** provides the generalized framework: as long as the (possibly nonlinear) state map $F(\mathbf x, z)$ satisfies $L_{F_x} < \min \{1, 1/\|T\phi^{-1}\|_\infty \}$, then $F$ has the $(\phi, \omega)\text{-Echo State Property}$, and there exists a unique, differentiable *generalized synchronization (GS)* $f_{(\phi, \omega, F)}:M \longrightarrow W$.
> The state sequence $x$ can be determined by recursions: $x_t = F\left( x_{t-1}, S_{(\phi, \omega)}(m)_t \right)$, where $t \in \mathbb{Z}, m \in M$ and $S_{(\phi, \omega)} : M \longrightarrow \ell^\infty(\mathbb{R}^d)$ is the $(\phi, \omega)$-delay map.
>
> We say that a **generalized synchronization** occurs in this configuration when there exists a map $f_{(\phi, \omega, F)} : M \longrightarrow \mathbb{R}^N$ such that, for any time $t \in \mathbb{Z}$, the state of the reservoir $x_t$ satisfies: $\mathbf{x}_t = f_{(\phi, \omega, F)}(\phi^t(m))$.
>
> We say that $F$ has a $(\phi, \omega)-\text{Echo State Property (ESP)}$ if the solution sequence $x$ is unique.
**Corollary 2.2** specializes Thm 2.1 to the linear case $F(\mathbf{x}, z) := A\mathbf{x} + \mathbf{C}z$. The most critical conclusion here is the explicit form of the solution: $f_{(\phi, \omega, F)}(m) = \sum_{j=0}^{\infty} A^j \mathbf{C}\, \omega (\phi^{-j}(m))$.
**Proposition 2.3** provides the foundation for subsequent proofs. Here, the condition is restricted so that the spectral radius of the matrix $A$ satisfies $\rho(A) < 1$. It proves that as long as $\phi \in \text{Diff}^r(M)$ and $\omega \in C^r(M, \mathbb{R})$, then $f_{(\phi, \omega, F)}$ is sufficiently smooth (belonging to $C^r(M, \mathbb{R}^N)$).
Additionally, point (iii) establishes the continuity of the mapping $\omega \longmapsto f_{(\phi, \omega, F)}$ and the openness of the **generic** (open and dense) set.
:::
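The explicit series in Corollary 2.2 can be checked numerically on a toy system (our own illustration, not from the paper: a circle rotation $\phi(\theta)=\theta+\alpha$ with observation $\omega=\cos$; by the ESP, running the recursion long enough from any initial state converges to the series):

```python
import numpy as np

rng = np.random.default_rng(1)
N, alpha = 8, 0.7
A = rng.standard_normal((N, N))
A *= 0.8 / max(abs(np.linalg.eigvals(A)))    # rho(A) < 1 gives the ESP
C = rng.standard_normal(N)
omega = np.cos
m = 0.3                                       # a point on the circle

# Left side: run x_t = A x_{t-1} + C omega(phi^t(m)) up to t = 0.
T = 400
x = np.zeros(N)                               # arbitrary start; the ESP forgets it
for t in range(-T, 1):
    x = A @ x + C * omega(m + t * alpha)

# Right side: truncated series f(m) = sum_{j>=0} A^j C omega(phi^{-j}(m)).
f = np.zeros(N)
Aj = np.eye(N)
for j in range(T):
    f += Aj @ C * omega(m - j * alpha)
    Aj = Aj @ A
# x and f agree up to the truncation error, which is O(rho(A)^T).
```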
### Proposition 2.3
Let $\phi \in \text{Diff}^1(M)$ be a dynamical system on the manifold $M$ (not necessarily compact) and consider the observation map $\omega \in C^1(M, \mathbb{R})$. Let $F : \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}^N$ be a linear state map given by $F(\mathbf{x}, z) := A\mathbf{x} + \mathbf{C}z$ with $A \in \mathbb{M}_{N,N}$, $\mathbf{C} \in \mathbb{R}^N$, $N \in \mathbb{N}$.
1. If the spectral radius of $A$ satisfies $\rho(A) < 1$ and $\omega$ maps into a bounded set of $\mathbb{R}$, then the *GS* $f_{(\phi, \omega, F)} : M \to \mathbb{R}^N$ given by:
$$
f_{(\phi, \omega, F)}(m) = \sum_{j=0}^{\infty} A^j \mathbf{C} \omega (\phi^{-j}(m)). \tag{2.6}
$$
exists and it is a continuous map.
2. Additionally, let $r \in \mathbb{N}$ and suppose that $\phi \in \text{Diff}^r(M)$ and that there exist constants $k_1, \dots, k_r \in \mathbb{N}$ such that $\|A^{k_i}\| \, \|T^i \phi^{-k_i}\|_\infty < 1$, $\|T^i \phi^{-1}\|_\infty < \infty$ for all $i \in \{1, \dots, r\}$. Then for any $\omega \in C^r(M, \mathbb{R})$ such that $\|D^i \omega\|_\infty < \infty$, for all $i \in \{1, \dots, r\}$, the map $f_{(\phi, \omega, F)}$ belongs to $C^r(M, \mathbb{R}^N)$ and the higher order derivatives are given by:
$$
D^i f_{(\phi, \omega, F)}(m) = \sum_{j=0}^{\infty} A^j \mathbf{C} D^i (\omega \circ \phi^{-j}) (m), \text{ for all } i \in \{1, \dots, r\}. \tag{2.7}
$$
3. Suppose now that $M$ is compact. Under the hypotheses of points **(i)** and **(ii)** above, the map
$$
\begin{aligned}
\Theta_{(\phi, F)} : \quad C^r(M, \mathbb{R}) \quad &\longrightarrow \quad C^r(M, \mathbb{R}^N) \\
\omega \quad &\longmapsto \quad f_{(\phi, \omega, F)}
\end{aligned} \tag{2.8}
$$
is continuous. Moreover, the subsets $\Omega_i$ and $\Omega_e$ of $C^r(M, \mathbb{R})$ for which the corresponding *GS* are immersions and embeddings, respectively, are open.
## Immersion (Section 3)
::: info
Section 3 provides the mathematical guarantee that a randomly generated linear reservoir can accurately "mirror" the internal behavior of a complex dynamical system using only a single stream of data.
- **Immersive Synchronization**: Under certain conditions, the reservoir’s internal states will naturally synchronize with the hidden states of the original system.
- **Geometric Preservation (Immersion)**: This synchronization is an immersion, meaning the reservoir produces a smooth "copy" of the original system's trajectory without collapsing any tangent directions.
- **Minimal Requirements**: It proves that you don't need a perfectly designed system; generic (random) connections and generic (standard) observations are enough to capture the system's essence.
- **The Foundation for Learning**: By establishing this immersion, it proves that the reservoir state contains all the necessary information to reconstruct and eventually predict the dynamics of the "strange attractor".
:::
### Theorem 3.1 (Immersion)
Let $\phi \in\text{Diff}^2(M)$ be a dynamical system on a **compact** manifold $M$ of dimension $q$ that exhibits **finitely many periodic orbits**. Let $F:\mathbb R^N\times \mathbb R\longrightarrow\mathbb R^N$ be a **linear state map** of the form
$$
F(\mathbf{x}, z):= A\mathbf x + \mathbf Cz, \qquad A\in \mathbb M_{N\times N}, \;\mathbf C\in\mathbb R^N
$$
with $N\geq 2q$, where the matrix $A$ satisfies $$\rho(A) :=\max_{i}|\lambda_i|<1$$ and such that $$\forall \omega\in\mathcal C^2(M,\mathbb R), \;f_{(\phi,\omega,F)}\in\mathcal C^2(M, \mathbb R^N) \;\text{and}\; \Theta_{(\phi, F)}\in\mathcal C(\mathcal C^2(M, \mathbb R),\mathcal C^{2}(M,\mathbb R^N)).$$ Suppose the following conditions also hold:
1. For each periodic orbit $m$ of $\phi$ with period $n\in\mathbb Z^+$, the derivative $T_m\phi^{-n}$ has $q$ **distinct eigenvalues** $\lambda_{m,1},\ldots,\lambda_{m,q}$. Define
$$
\lambda_{\text{max}} = \max_{m\in \text{Periodic Set}}\left(\max_{i\in\{1,\ldots,q_m\}}\|\lambda_{m,i}\|\right)
$$
and
$$
n_{\text{min}}=\min_{m\in\text{Periodic Set}}n_m
$$
Suppose that $\rho(\lambda_{\text{max}}A^{n_{\text{min}}})<1$ and
$$
\{(\mathbb I-\lambda_{m,j}A^n)^{-1}(\mathbb I-A)^{-1}(\mathbb I-A^n)\mathbf C\}_{j\in\{1,\ldots q_m\}}\;\text{forms a linearly independent set},
$$
where $\lambda_{m,j}$ are eigenvalues of $T_m\phi^{-n}, \;\forall m\in \text{Periodic Set}$.
2. The vectors $\{A^j\mathbf C\}_{j\in\{0,\ldots,N-1\}}$ form a linearly independent set.
Then, for generic $\omega\in\mathcal C^2(M,\mathbb R)$ the generalised synchronisation $f_{(\phi, \omega, F)}\in\mathcal C^2(M, \mathbb R^N)$ is an **immersion**.
## Embedding (Section 4)
::: info
This tells us that whenever $N$ is large enough ($N>\max\{2q,\,l\}$, where $l=\underset{m\in\text{Periodic Set}}{\text{lcm}}\{n_m\}$), our RNN $f_{(\phi, \omega, F)}$ *automatically* becomes an **embedding**.
:::
### Theorem 4.1 (Embedding)
Assume that the hypotheses of **Theorem 3.1** hold true and that, additionally, $N>\max\{2q,l\}$ with $l\in\mathbb Z^+$ the **lowest common multiple** of all the periods of the finite periodic points of $\phi$:
$$
l := \underset{m\in \text{Periodic Set}}{\text{lcm}}\{n_m\}
$$
Then, for generic $\omega\in\mathcal C^2(M,\mathbb R)$, the generalised synchronisation $f_{( \phi,\omega,F)} \in \mathcal C^2(M,\mathbb R^N)$ is an **embedding**.
::: success
**Note:** The condition that $N>2q$ is derived from [Whitney's embedding theorem](https://en.wikipedia.org/wiki/Whitney_embedding_theorem).
:::
## Randomness (Section 4)
### Proposition 4.4 (Friedrich Philipp)
Let $N \in\mathbb N$, $A \in \mathbb M_{N\times N}$, and $\mathbf C \in \mathbb R^N$, and assume that the entries of $A$ and $\mathbf C$ are drawn using **independent regular real-valued distributions**, meaning the following:
1. The collection of random variables is defined as
$$
\mathcal E:=\{A_{ij}\}_{1\leq i,j\leq N}\;\cup\{\mathbf C_i\}_{1\leq i\leq N}
$$
2. $\forall X,Y\in\mathcal E, \;X \mathrel{\perp\mspace{-9mu}\perp} Y$ ($X$ and $Y$ are independent).
3. $\forall X\in\mathcal E,\,\mathbb P(X=x) = 0,\quad\forall x\in\mathbb R$.
Then the following statements hold:
1. The vectors $\mathbf C,\;A\mathbf C,A^2\mathbf C,\ldots,A^{N - 1}\mathbf C$ are **linearly independent almost surely**.
2. Given $m$ distinct complex numbers $\lambda_1,\ldots, \lambda_m \in \mathbb C$, where $m \in\mathbb N$, the event that $1, \lambda_1,\ldots, \lambda_m \notin \sigma(A)$ (where $\sigma(A)$ is the spectrum of $A$) and that the vectors
$$
(\mathbb I - \lambda_j A)^{-1}(\mathbb I - A)^{-1}(\mathbb I - A^N)\mathbf C,\quad j = 1,\ldots,m
$$
are **linearly independent** holds **almost surely**.
:::info
We can immediately draw the following insights:
- (i) means that a randomly drawn matrix $A$ and vector $\mathbf C$ satisfy hypothesis (ii) of **Theorem 3.1/4.1** with probability 1.
- (ii) plays the same role for hypothesis (i) of **Theorem 3.1/4.1**.

So, once $N$ is sufficiently large and $A$ and $\mathbf C$ are drawn randomly, the map is almost surely an **embedding**; this is essentially the conclusion of **Theorem 4.5**, the core result of the paper.
:::
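Proposition 4.4(i) is easy to probe empirically (an illustrative experiment with our own parameters, not a proof): draw $A$ and $\mathbf C$ with i.i.d. Gaussian entries and check that the Krylov matrix $[\mathbf C \mid A\mathbf C \mid \cdots \mid A^{N-1}\mathbf C]$ has full rank in every trial.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 8

def krylov_rank(A, C):
    """Rank of [C, AC, ..., A^{N-1}C], columns normalized for conditioning."""
    cols, v = [], C.copy()
    for _ in range(N):
        cols.append(v / np.linalg.norm(v))
        v = A @ v
    return np.linalg.matrix_rank(np.column_stack(cols))

# 50 independent draws: full rank (= N) is expected in every single trial,
# matching the "almost surely" statement.
ranks = [krylov_rank(rng.standard_normal((N, N)), rng.standard_normal(N))
         for _ in range(50)]
```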
## Prediction Algorithm
### Grigoryeva et al.
- Algorithm:

- Implementation: [Here](https://colab.research.google.com/drive/1FVloEwObmCgCkQ4jLDyNY-RMYmzFKKbV?usp=sharing)

This paper mainly focuses on one-step forecasting; the results are quite good, but long-term prediction remains poor.
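A minimal one-step forecasting loop in this spirit (our own sketch with illustrative hyperparameters and a toy signal, not the paper's exact algorithm): drive the reservoir with the series, fit a ridge readout from the state $x_t$ to the next observation $z_{t+1}$, and evaluate on held-out data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy scalar series (a quasi-periodic signal stands in for real data here).
n = 3000
z = np.sin(0.1 * np.arange(n)) + 0.5 * np.sin(0.23 * np.arange(n))

N = 200
A = rng.standard_normal((N, N))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))     # rho(A) < 1
C = rng.standard_normal(N)

x = np.zeros(N)
states = np.empty((n, N))
for t, zt in enumerate(z):
    x = A @ x + C * zt                        # x_t = A x_{t-1} + C z_t
    states[t] = x

split = 2500
X_train = states[100:split - 1]               # skip a washout of 100 steps
y_train = z[101:split]                        # target: next observation
W = np.linalg.solve(X_train.T @ X_train + 1e-6 * np.eye(N),
                    X_train.T @ y_train)      # ridge-regularized readout

pred = states[split:-1] @ W                   # one-step forecasts on test set
rmse = np.sqrt(np.mean((pred - z[split + 1:]) ** 2))
```

On this linear toy signal the one-step RMSE is tiny; on a chaotic series the same loop still works well one step ahead, which mirrors the observation above.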
---
### Platt et al.
- Algorithm:

- Implementation: [Here](https://colab.research.google.com/drive/1L61hYG37HROE2cxkD1o9W_fLuJfqCaBz?usp=sharing)

This paper mainly focuses on long-term prediction; on average, the predicted series visibly diverges once the horizon exceeds roughly 500 steps.
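The long-term setting can be sketched by running a trained reservoir in closed loop (again our own illustration with a toy signal: the one-step readout's prediction is fed back as the next input; on chaotic signals this feedback is exactly where trajectories eventually diverge):

```python
import numpy as np

rng = np.random.default_rng(4)
z = np.sin(0.1 * np.arange(2000))             # toy training signal

N = 100
A = rng.standard_normal((N, N))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))     # rho(A) < 1
C = rng.standard_normal(N)

x = np.zeros(N)
states = np.empty((len(z), N))
for t, zt in enumerate(z):
    x = A @ x + C * zt
    states[t] = x

# One-step ridge readout, trained open loop (teacher forcing).
X, y = states[100:-1], z[101:]
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ y)

# Closed loop: each predicted observation becomes the next reservoir input.
x = states[-1].copy()
preds = np.empty(300)
for k in range(300):
    preds[k] = W @ x
    x = A @ x + C * preds[k]
```

For this periodic toy signal the free run stays on track; for chaotic inputs, small readout errors are amplified by the feedback, producing the visual divergence noted above.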