## 1. High Efficiency

### 1.1 Computational efficiency

We provide additional results for all datasets. Please see [Tab. 4](https://anonymous.4open.science/r/SKOLR-1D6F/Tab4_Computation.pdf). SKOLR achieves a compelling trade-off between memory, computation time, and accuracy.

### 1.2 Theoretical Complexity Analysis

SKOLR achieves computational efficiency through its structured design and linear operations. For a time series (length L, patch length P, embedding dimension D, N branches):

- **Time complexity**: O(N × (L/P) × D²) from the spectral decomposition, encoder/decoder MLPs, and linear RNN computation
- **Memory complexity**: O(N × D²) for parameters and O(N × (L/P) × D) for activations

Compared to a non-structured model with dimension D' = N×D:

- Non-structured approach: O((L/P) × N²D²) time and O(N²D²) memory
- SKOLR provides an N-fold reduction in computational requirements

SKOLR also avoids the quadratic scaling with sequence length seen in Transformers (O((L/P)² × D + (L/P) × D²) time, O((L/P)² + (L/P) × D) memory).

### 1.3 Parallel computing

The N separate branches are processed independently (in our code), reducing the effective time complexity to O((L/P) × D²). The linear RNN computation has no activation functions, so the hidden state evolution is $h_k = g(y_k) + \sum_{s=1}^{k-1} W^s g(y_{k-s})$. This allows efficient matrix operations, reducing the time complexity to O($D^3 \log(L/P)$ + $(L/P)^2 \times D$) per branch. For time series where $L/P \ll D$, this is a significant speedup.

## 2. Exceptional Performance Claim

Our claims are based on Tabs. 1 & 2 and Figs. 2 & 4. SKOLR requires less than half the memory of any baseline and less than half of Koopa's training time (Fig. 4). SKOLR has the (equal-)lowest MSE in 17 out of 32 tasks and ranks second in a further 7 (Tab. 1). SKOLR significantly outperforms Koopa on synthetic non-linear systems (Tab. 2). Given the very low memory footprint, the low training time, and the impressive accuracy, we consider the claim of "exceptional" performance to be supported, but we can use a more measured adjective.

We agree that the paper would be strengthened by more examples (showcases) demonstrating the capture of complex patterns. We do have one example in Fig. 3, but we will include more. Please see [Fig. 2](https://anonymous.4open.science/r/SKOLR-1D6F/Fig2_Compare.pdf) as an example. SKOLR's predictions have much lower variance than Koopa's and track the oscillatory behaviour better.

## 3. Relevant Reference

We will cite this paper and add a discussion. It evaluates existing non-linear RNNs and Koopman methods for near-wall turbulence, but it does not draw connections between them or develop a new method.

## 4. Clarity

### 4.1 Frequency Decomposition

We agree that the motivation for our learnable frequency decomposition could be presented more clearly. Please see the response to Reviewer kXAS. Learnable frequency decomposition offers three key advantages. (1) Each branch can focus on specific frequency bands, decomposing complex dynamics. (2) Learnable components adaptively determine which frequencies are most informative for prediction. (3) This approach aligns with Koopman theory, as different frequency components often correspond to different Koopman modes.
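To make points (1) and (2) concrete, below is a minimal PyTorch sketch of one way a learnable frequency split across N branches could be implemented; the class name `LearnableFreqSplit`, the softmax-over-branches gating, and the tensor shapes are illustrative assumptions for this sketch, not the exact SKOLR module.

```python
import torch
import torch.nn as nn

class LearnableFreqSplit(nn.Module):
    """Illustrative sketch (not the exact SKOLR module): softly assign
    frequency bins to N branches using learnable weights."""

    def __init__(self, num_branches: int, seq_len: int):
        super().__init__()
        num_bins = seq_len // 2 + 1                      # rFFT bin count
        # One learnable logit per (branch, frequency bin).
        self.logits = nn.Parameter(torch.zeros(num_branches, num_bins))

    def forward(self, x):                                # x: (batch, seq_len)
        spec = torch.fft.rfft(x, dim=-1)                 # (batch, num_bins), complex
        weights = torch.softmax(self.logits, dim=0)      # bins softly partitioned over branches
        branch_spec = weights.unsqueeze(1) * spec.unsqueeze(0)      # (N, batch, num_bins)
        # Back to the time domain: one filtered signal per branch.
        return torch.fft.irfft(branch_spec, n=x.shape[-1], dim=-1)  # (N, batch, seq_len)
```

Because the per-bin weights are trained end-to-end with the forecasting loss, each branch can settle on the frequency bands that are most informative for prediction, which is the adaptivity referred to in point (2).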
### 4.2 Generalization capability

Constructing theoretical guarantees that the approach generalizes is challenging and would be a paper all on its own. We do provide experimental results for a large variety of time series that exhibit very different characteristics, ranging from strongly seasonal temperature series to pendulums exhibiting highly non-linear dynamics. We consider that the experimental evidence in the paper strongly supports the capability to generalize to a diverse range of time series.

## 5. Theoretical analysis: eigenvalues

This is an excellent suggestion. Our focus is forecasting, so our results and analysis concentrate on that task. However, the analysis of the learned Koopman operator's eigenvalues can indeed reveal important characteristics. We analyzed eigenvalue plots for the Traffic dataset (see [Fig. 3](https://anonymous.4open.science/r/SKOLR-1D6F/Fig3_Eigenvalue.pdf)). We see that each branch learns complementary spectral properties, with all eigenvalues within the unit circle, indicating stable dynamics. Branch 1 shows a concentration of eigenvalues around magnitude 0.4, while Branch 2 exhibits a more uniform distribution. The presence of larger magnitudes (0.7–0.9) indicates that longer-term patterns are captured.

## 6. Questions

### 6.1

It is a typo. We will revise Fig. 1 to show the proper time indexing across all components.

### 6.2 Long horizon prediction

To show SKOLR's capability for longer horizons, we included experiments with extended prediction horizons (T=336, 720) in App. B.2 (Tab. 9). SKOLR maintains its performance advantage at these extended horizons, with lower error growth rates. Please also see the response to Reviewer kXAS concerning test-time horizon extension and [Tab. 1](https://anonymous.4open.science/r/SKOLR-1D6F/Tab1_ScaleUp.pdf).
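As a concrete illustration of test-time horizon extension, and of why the eigenvalue analysis in Sec. 5 matters for it, here is a minimal Python sketch of rolling a learned linear operator forward in the latent space; `K`, `h0`, `decoder`, and the shapes are placeholders for this sketch, not SKOLR's actual interfaces.

```python
import numpy as np

def koopman_rollout(K, h0, decoder, num_steps):
    """Illustrative sketch: extend the forecast horizon at test time by
    repeatedly applying a learned linear operator K to the encoded state
    and decoding each latent state (K, h0, decoder are placeholders)."""
    preds, h = [], h0
    for _ in range(num_steps):
        h = K @ h                    # one linear step in the latent (Koopman) space
        preds.append(decoder(h))     # map the latent state back to a forecast segment
    return np.stack(preds)

# Stability at long horizons: if all eigenvalues of K lie inside the unit
# circle (as in the Traffic eigenvalue plots above), the latent state and
# hence the rollout remain bounded as num_steps grows.
# assert np.all(np.abs(np.linalg.eigvals(K)) <= 1.0)
```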