## Longer test-time forecast horizon
We have now conducted experiments with an increased test-time horizon; please see [Tab. 1](https://anonymous.4open.science/r/SKOLR-1D6F/Tab1_ScaleUp.pdf). Because SKOLR has a recursive structure, a model trained over a given horizon can be applied recursively to predict over a longer one. Performance does deteriorate, but our results show the degradation is not severe, and SKOLR continues to compare favorably with Koopa.
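To illustrate what we mean by recursive prediction, here is a minimal sketch (the `model` callable and window handling are illustrative, not SKOLR's actual API): a forecaster trained for a fixed horizon is rolled forward, feeding its own predictions back into the context window until the longer target horizon is covered.

```python
import numpy as np

def recursive_forecast(model, context, target_horizon):
    """Roll a fixed-horizon forecaster forward to cover a longer horizon.

    `model` maps a context window (1-D array) to a forecast over the
    training horizon; names here are illustrative, not SKOLR's API.
    """
    window = np.asarray(context, dtype=float)
    preds, total = [], 0
    while total < target_horizon:
        step = np.asarray(model(window))  # forecast the next training-horizon steps
        preds.append(step)
        total += len(step)
        # slide the context window forward over the model's own predictions
        window = np.concatenate([window, step])[-len(window):]
    return np.concatenate(preds)[:target_horizon]
```

Each rollout step conditions on predicted values rather than ground truth, which is why some error accumulation (the deterioration noted above) is expected.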
## Non-overlapping patch tokenization
We apply patch processing before the linear RNN branches. This reduces the number of input tokens from L (one per timestamp) to L/P, lowering complexity from O(L) to O(L/P). Our efficiency results *do* apply this to the baselines (PatchTST, iTransformer, Koopa), and it slightly improves their performance.
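For clarity, non-overlapping patch tokenization amounts to the following (a minimal sketch, not the exact SKOLR implementation):

```python
import numpy as np

def patchify(series, patch_len):
    """Split a length-L series into non-overlapping patches of length P,
    yielding L // P tokens instead of L per-timestamp tokens.
    Any incomplete trailing patch is dropped (illustrative choice)."""
    n = len(series) // patch_len
    return np.asarray(series[:n * patch_len]).reshape(n, patch_len)
```

The recurrence then runs over the L/P patch tokens rather than L timestamps, which is the source of the complexity reduction above.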
## Hyperparameter selection + sensitivity
We will clearly specify the hyperparameter selection protocol in App. A.2. We selected the optimal configuration based on validation-split MSE. Our setup strictly separates the training, validation, and test sets, ensuring no information leakage.
Tab. 4 of the paper analyzes how the number of branches N and the dimension D impact performance. Response [Tab. 3](https://anonymous.4open.science/r/SKOLR-1D6F/Tab3_PatchLen.pdf) provides new results on sensitivity to patch length: SKOLR exhibits little sensitivity to the patch length P. We did not tune P in our experiments; we used P = L/6 for all datasets.
## Revise Tab. 8
We apologize for the inconsistencies in Tab. 8. We have verified all results against our original records and fixed the MASE metric issues for the Quarter and Others categories in the revised [Tab. 8](https://anonymous.4open.science/r/SKOLR-1D6F/Tab8_ShortTerm.pdf).
## Relationship to [1] Orvieto et al. 2023
Thank you for highlighting this paper. We will modify the introduction: "In this work, we consider time-series forecasting, and establish a connection between Koopman operator approximation and linear RNNs, building on the observation made by [1]. We make a more explicit connection and devise an architecture that is a more direct match."
In Related Work, we will add:
"Orvieto et al. made the observation, based on a qualitative discussion, that the Koopman operator representation of a dynamical system can be implemented by combining a wide MLP and a linear RNN. We provide a more explicit connection, giving equations that show a direct analogy between a structured approximation of a Koopman operator and an architecture comprised of an MLP encoder plus a linear RNN. Although [1] observe this connection, their studied architecture stacks linear recurrent units interspersed with non-linear activations or MLPs. While excellent for long-range reasoning tasks, this stacked design departs from the architecture in their App. E. By contrast, our architecture does consist of (multiple branches of) an MLP encoder, a single-layer linear RNN, and an MLP decoder. It thus adheres exactly to the demonstrated analogy between Eq. (5) and (8) of our paper. Whereas our focus is time series forecasting, [1] target long-range reasoning. Although it is possible to convert their architecture to address forecasting, performance suffers because that is not its design goal."
We recognize the importance of acknowledging [1], but we don't believe that its existence significantly diminishes our contribution. The connection observed by Orvieto et al. is qualitative; there are no supporting equations. In contrast, we develop an explicit connection by expressing a linear RNN in (5) and a structured Koopman operator approximation in (8). This explicit connection adds value beyond the qualitative insights.
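To make the stated analogy concrete, here is a minimal numerical sketch (random weights and illustrative dimensions, not our trained model): the MLP encoder plays the role of the learned observable functions, the single-layer linear RNN's transition matrix plays the role of the finite-dimensional Koopman operator approximation, and a linear decoder maps lifted states back.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_lift = 1, 16  # illustrative dimensions

# MLP encoder: stands in for the learned observable functions g(x).
W1 = rng.normal(size=(32, d_in))
W2 = rng.normal(size=(d_lift, 32))
encode = lambda x: W2 @ np.tanh(W1 @ x)

# Single-layer linear RNN: its transition matrix K acts as the
# finite-dimensional Koopman operator approximation.
K = rng.normal(size=(d_lift, d_lift)) * 0.1

# Linear decoder maps lifted states back to the observation space.
C = rng.normal(size=(d_in, d_lift))

def rollout(x0, steps):
    """Lift once, evolve purely linearly with K, decode each lifted state."""
    z = encode(x0)
    outs = []
    for _ in range(steps):
        z = K @ z          # linear recurrence in the lifted space
        outs.append(C @ z)
    return np.stack(outs)
```

The key point of the analogy is that all non-linearity sits in the encoder: once lifted, the dynamics evolve by repeated multiplication with a single matrix, exactly as in a Koopman operator approximation.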
## Presentation - Sec. 3
We apologize for the notational confusion and will correct it. There are strong motivations for the frequency-decomposition design choice. In classical time-frequency analysis, the value of adapting to different frequencies has long been recognized; wavelet analysis, for example, applies different filters at different frequency scales. In the more recent forecasting literature, frequency decomposition has been shown to be highly effective in TimesNet (Wu 2023), Koopa (Liu 2023), and MTST (Zhang 2024). Low-frequency dynamics may differ considerably from high-frequency dynamics and are more easily learnt after disentanglement. This motivates us to allow the observable functions to be learned both in frequency and in time, with explicit consideration of the frequency aspect.
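As a small illustration of the disentanglement idea (an FFT-mask sketch of generic frequency splitting, not SKOLR's exact spectral branching), a series can be separated into low- and high-frequency components that may then be modeled separately:

```python
import numpy as np

def band_split(series, cutoff):
    """Split a real series into low- and high-frequency components by
    zeroing FFT bins at or above `cutoff` (illustrative of frequency
    disentanglement, not SKOLR's exact mechanism)."""
    spec = np.fft.rfft(series)
    low_spec = spec.copy()
    low_spec[cutoff:] = 0          # keep only the low-frequency bins
    low = np.fft.irfft(low_spec, n=len(series))
    return low, series - low       # components sum back to the input
```

Each component exhibits simpler dynamics than their mixture, which is the intuition behind learning separate observables per frequency band.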
We will modify Sec.3 to describe the whole pipeline for a single branch, and then describe the multiple-branch case.
## Questions: Non-linear systems; Fig. label
The signals in Tab. 1 are measurements of real systems (e.g., Electricity Transformer Temperature), so we do not know the exact dynamics. The four systems in Sec. 4.2 are commonly studied synthetic non-linear systems; they allow us to study a setting where modeling non-linear dynamics is essential. We will write "Synthetic Non-linear Systems" to stress their synthetic nature.
The left panel of Fig. 3 corresponds to (a) and the right panel to (b). We will add clear labels.