# ICASSP Rebuttal
First, we would like to thank all the reviewers for their useful and largely enthusiastic comments on our submission. We answer each reviewer individually below.
**Answer to reviewer #1**
We are glad that you think we presented a nice idea in this paper.
*"I would like to see a bit more discussion elaborating on the continuous case"*
We would indeed have liked to elaborate further on the continuous modeling of dynamical systems, yet we needed to describe the discrete setting in detail since, in the experiments that we presented, the training setup was fully discrete and the continuous modeling was only used in the testing phase with a fixed trained model. Training a model directly in the continuous setting, although conceptually simple, raises technical difficulties (e.g., making the training efficiently parallelizable), which we plan to discuss in future work.
*"For example, the authors should describe why the exponential is a better choice, compared to a linear step in the direction of the derivative (i.e., why should I not take z(t0)+ΔtDz(t0) as an estimate of z(t0+Δt))?"*
In fact, we used an exponential because it is the exact solution of the linear ODE that we introduced. There is indeed a large literature that uses numerical integration schemes to track the evolution of latent states, notably neural ODEs and previous work from our team (e.g. https://arxiv.org/abs/1712.07003). This is unnecessary here because the dynamical system is trivial: the matrix K (and therefore D) is fixed over time. We will comment on this by adding the following sentence to the paper: "While many works, e.g. neural ODEs [appropriate reference], use a numerical integration scheme to model a continuous output, we do not need to do so since our model can be solved analytically."
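For concreteness, the comparison the reviewer asks about can be written as follows (a sketch, with $D$ the generator of the latent linear dynamics; the exact symbols may differ slightly from the submission):

$$
z(t_0 + \Delta t) = e^{\Delta t\, D}\, z(t_0) \quad \text{(exact solution of } \dot{z} = D z\text{)},
\qquad
z(t_0 + \Delta t) \approx (I + \Delta t\, D)\, z(t_0) \quad \text{(forward Euler step)}.
$$

The Euler step is the first-order truncation of the matrix exponential, so it matches the exact solution only up to an $O(\Delta t^2)$ error per step; since $D$ is fixed, the exponential can be computed once and reused, so there is no computational reason to prefer the approximation.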
**Answer to reviewer #5**
*"The discussion of the orthogonality lacks references and theoretical proof to support the arguments"*
We indeed decided to rely on intuitive arguments rather than equations and references in this discussion. The same regularization term as ours has been used in other works, for other tasks, and we will refer to some of them to strengthen our argument (e.g. equation 7 from http://proceedings.mlr.press/v70/vorontsov17a/vorontsov17a.pdf or equation 1 from https://proceedings.neurips.cc/paper/2018/file/bf424cb7b0dea050a42b9739eb261a3a-Paper.pdf).
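For reference, the regularizer in those works (and ours, up to the exact norm and weighting used in the paper; this is a sketch with our latent matrix $K$) takes the soft-orthogonality form:

$$
\mathcal{L}_{\text{orth}} = \big\| K K^{\top} - I \big\|_F^{2}.
$$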
*"How effective is the introduced loss term?"*
This question is addressed in the experiments with our "ablated version", which is precisely our method without the orthogonality term in the training loss. See Table 1 of the paper and the associated discussion.
*"Will we get a consistent trend when varying sampling rates ?"*
We considered addressing this question when writing the paper, but chose to focus on simpler experiments due to space limitations. However, we have run experiments with various sampling periods on the fluid flow dataset, which indeed show a consistent trend:
| Sampling period | DeepKoopman MSE | Our method MSE |
| -------- | -------- | -------- |
| 2 time steps | 1.44 x 10^-6 | 1.11 x 10^-6 |
| 5 time steps | 1.36 x 10^-5 | 1.39 x 10^-6 |
| 10 time steps | 2.15 x 10^-4 | 1.62 x 10^-6 |
| 20 time steps | 2.31 x 10^-3 | 2.31 x 10^-6 |
While both methods see their performance decrease as the sampling period increases, our method (without the orthogonality loss here) still provides good interpolations even at very low sampling rates, whereas the DeepKoopman interpolation becomes hardly better than a linear interpolation in the observation space.
*"(1) Why does the proposed method (ablated) perform better in LF (low frequency) setting than in HF?"*
The answer is that in the HF setting the training time series are 20 times longer, which means that if K is not orthogonal, the latent states will diverge exponentially fast over a much greater number of steps, leading to large errors. This only occurs on the pendulum dataset, since the fluid flow time series are not long enough for the problem to emerge. We already discuss this in the paper: "the orthogonality constraint is crucial to keep the predictions stable when modeling very long time series, in particular for conservative models such as this one" (there is a mistake here: we will replace "models" with "systems").
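To make the argument explicit (a standard linear-algebra sketch in our notation, with latent dynamics $z_{n+1} = K z_n$):

$$
K v = \lambda v,\ |\lambda| > 1 \;\Longrightarrow\; \| K^{n} v \| = |\lambda|^{n} \| v \| \xrightarrow{\ n \to \infty\ } \infty,
$$

whereas an orthogonal $K$ preserves norms, $\|K^{n} z_0\| = \|z_0\|$ for every $n$, so the latent trajectory cannot blow up no matter how long the series is.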
*"(2) Follow up (1) is the proposed method sensitive to hyperparameter choices?"*
Our method is only marginally sensitive to the hyperparameters it introduces. As mentioned in the paper, we never felt the need to tune the relative weight of the prediction and linearity loss terms. To set the weight of the orthogonality loss term, we simply looked for the first power of 10 that yields a near-orthogonal final matrix K, i.e. such that K K^T is close to the identity. Without the orthogonality loss term it is very unlikely to obtain an (almost) orthogonal matrix K, so we only need to make this term strong enough to enforce approximate orthogonality while still representing a small part of the total loss.
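Written out, the heuristic amounts to the following (a sketch; the symbols $\lambda_{\text{orth}}$, $K_{\lambda}$ and the tolerance $\varepsilon$ are ours and do not appear in the paper):

$$
\lambda_{\text{orth}} = \min \Big\{ 10^{k},\ k \in \mathbb{Z} \ \Big|\ \big\| K_{\lambda} K_{\lambda}^{\top} - I \big\|_F \le \varepsilon \Big\},
$$

where $K_{\lambda}$ denotes the matrix obtained after training with orthogonality weight $\lambda = 10^{k}$.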
**Answer to reviewer #8**
Thank you for pointing out some mistakes in the document layout: we will correct them.
We were not aware of the paper that you mentioned, but it is indeed fairly close to ours despite relying on standard methods. We will definitely mention it and may draw inspiration from it in future work. We will cite it along with another reference in this sentence: "Some works have introduced practical ways to obtain stable Koopman models: [ref1], [ref2]."