Tools / sktime
===
###### tags: `ML`, `time series`, `sktime`

<br>
[TOC]
<br>
## ==Key Points==
### Terminology
- ### fh: forecasting horizon
- Relative fh = N+1, N+2, N+3, ... (`is_relative=True`)
- Absolute fh = 2021-12, 2022-01, 2022-02, ... (`is_relative=False`)
- An absolute fh can be converted to a relative one, given a `cutoff`
- e.g. with `cutoff='2022-01'`:
fh = -1, 0, 1, 2, ... (like t-1, t, t+1, t+2)
- Usually `cutoff` is set to the last date of the training set, 2021-11:
fh = 1, 2, 3, 4, ... (like t+1, t+2, t+3, t+4)
- ### sp: seasonal period
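- ### FFT: Fast Fourier Transform
Can be used to decompose a composite curve/wave into its periodic components; the trend can be thought of as the main (first) component and the seasonal cycles as secondary components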
- ### estimator
Umbrella term for classifiers, regressors, transformers, forecasters, etc.
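As a rough illustration of using the FFT to find a seasonal period, the sketch below builds a made-up series (a linear trend plus a period-12 sine), removes the linear trend, and picks the frequency with the largest amplitude. The function name `dominant_period` and the synthetic data are assumptions for illustration; removing the trend first matters because otherwise its energy piles up in the lowest-frequency bins.

```python
import numpy as np


def dominant_period(y: np.ndarray) -> int:
    """Estimate the dominant period of a 1-D series from its FFT amplitude spectrum."""
    t = np.arange(len(y))
    # Remove a linear trend first; otherwise it dominates the lowest frequencies.
    slope, intercept = np.polyfit(t, y, deg=1)
    detrended = y - (slope * t + intercept)
    amplitudes = np.abs(np.fft.rfft(detrended))
    freqs = np.fft.rfftfreq(len(y), d=1.0)  # cycles per time step
    # Skip the zero-frequency (mean) bin, then take the strongest frequency.
    k = int(np.argmax(amplitudes[1:])) + 1
    return int(round(1.0 / freqs[k]))


# Made-up data: 10 "years" of monthly values with a yearly (period-12) cycle.
t = np.arange(120)
y = 0.5 * t + 10.0 * np.sin(2 * np.pi * t / 12)
print(dominant_period(y))  # 12
```

On real data the spectrum is noisier, so the result is better treated as a candidate `sp` to check than as a final answer.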
<br>
### Supported tasks
- classifier
- regressor
- transformer
- forecaster
<br>
### Currently supported forecaster models
| name | estimator<br>(prefix: sktime.forecasting.) |
| ---- | -------- |
| ARIMA | .arima.ARIMA |
| AutoARIMA | .arima.AutoARIMA |
| AutoETS | .ets.AutoETS |
| AutoEnsembleForecaster | .compose._ensemble.AutoEnsembleForecaster |
| ColumnEnsembleForecaster | .compose._column_ensemble.ColumnEnsembleForecaster |
| Croston | .croston.Croston |
| DirRecTabularRegressionForecaster | .compose._reduce.DirRecTabularRegressionForecaster |
| DirRecTimeSeriesRegressionForecaster | .compose._reduce.DirRecTimeSeriesRegressionForecaster |
| DirectTabularRegressionForecaster | .compose._reduce.DirectTabularRegressionForecaster |
| DirectTimeSeriesRegressionForecaster | .compose._reduce.DirectTimeSeriesRegressionForecaster |
| EnsembleForecaster | .compose._ensemble.EnsembleForecaster |
| ExponentialSmoothing | .exp_smoothing.ExponentialSmoothing |
| **ForecastingGridSearchCV** | .model_selection._tune.ForecastingGridSearchCV |
| ForecastingPipeline | .compose._pipeline.ForecastingPipeline |
| **ForecastingRandomizedSearchCV** | .model_selection._tune.ForecastingRandomizedSearchCV |
| MultioutputTabularRegressionForecaster | .compose._reduce.MultioutputTabularRegressionForecaster |
| MultioutputTimeSeriesRegressionForecaster | .compose._reduce.MultioutputTimeSeriesRegressionForecaster |
| MultiplexForecaster | .compose._multiplexer.MultiplexForecaster |
| NaiveForecaster | .naive.NaiveForecaster |
| OnlineEnsembleForecaster | .online_learning._online_ensemble.OnlineEnsembleForecaster |
| PolynomialTrendForecaster | .trend.PolynomialTrendForecaster |
| RecursiveTabularRegressionForecaster | .compose._reduce.RecursiveTabularRegressionForecaster |
| RecursiveTimeSeriesRegressionForecaster | .compose._reduce.RecursiveTimeSeriesRegressionForecaster |
| StackingForecaster | .compose._stack.StackingForecaster |
| ThetaForecaster | .theta.ThetaForecaster |
| TransformedTargetForecaster | .compose._pipeline.TransformedTargetForecaster |
| TrendForecaster | .trend.TrendForecaster |
| UnobservedComponents | .structural.UnobservedComponents |
| VAR | .var.VAR |
<br>
### Overview of state-of-the-art models
> [Forecasters in sktime - main families](https://www.sktime.org/en/latest/examples/01_forecasting.html#2.-Forecasters-in-sktime---main-families)
> 
| packages | models |
| -------- | ------ |
| statsmodels | ExponentialSmoothing, ThetaForecaster, autoETS |
| pmdarima | ARIMA, autoARIMA |
| tbats | BATS, TBATS |
| sktime | PolynomialTrend |
| Facebook prophet | Prophet |
- Prophet: the integration is still working out how to handle fh; for example, passing a `PeriodIndex` fails with:
> NotImplementedError:
> <class 'pandas.core.indexes.period.PeriodIndex'> is not supported for input,
> use type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'> instead.
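A common workaround for the error above is to convert the `PeriodIndex` into a `DatetimeIndex` before fitting, e.g. with pandas' `Series.to_timestamp()`. The sketch only shows the pandas side with a made-up monthly series; whether a given sktime version's Prophet wrapper then accepts the converted series is not verified here.

```python
import pandas as pd

# Made-up monthly series indexed by a PeriodIndex (the index type sktime's load_airline returns).
y = pd.Series(
    [112.0, 118.0, 132.0, 129.0],
    index=pd.period_range("1949-01", periods=4, freq="M"),
)

# Convert each monthly period to a timestamp (the start of the period by default).
y_dt = y.to_timestamp()

print(type(y_dt.index).__name__)  # DatetimeIndex
print(y_dt.index[0])  # 1949-01-01 00:00:00
```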
<br>
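### Considerations for integrating with AI Maker
1. How should a continuously updated dataset be handled?
2. Once a model algorithm is chosen, how should the model be updated iteratively as new data arrives?
3. Input/output interfaces
> Each algorithm supports different input/output interfaces
- Input: univariate / multivariate
- Output: univariate / multivariate
4. Some sktime algorithms require parameters such as `sp` (which assumes the seasonal period is known)
- In practice, the actual period of the dataset may be unknown;
the best period then has to be found by implementing a **hyperparameter search**
- Even if `sp` is exposed as a user input, users may not know the actual period either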
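One simple way to run the search from item 4 is to backtest a cheap model for each candidate `sp` and keep the one with the lowest holdout error. The sketch below uses a seasonal naive rule (forecast = the value one season earlier) in plain NumPy; the rule, the function names, the candidate list and the synthetic data are all assumptions for illustration, not sktime's API. Within sktime, the same idea would more naturally be expressed with `ForecastingGridSearchCV` over a forecaster's `sp` parameter.

```python
import numpy as np


def seasonal_naive_mae(y: np.ndarray, sp: int, horizon: int) -> float:
    """Holdout MAE of the seasonal naive rule y_hat[t] = y[t - sp] on the last `horizon` points."""
    train, test = y[:-horizon], y[-horizon:]
    # Repeat the last observed season of the training data across the horizon.
    y_hat = np.resize(train[-sp:], horizon)
    return float(np.mean(np.abs(test - y_hat)))


def search_sp(y: np.ndarray, candidates, horizon: int) -> int:
    """Return the candidate sp with the lowest holdout MAE (ties go to the smaller sp)."""
    # Round so floating-point noise cannot break ties: multiples of the true
    # period (e.g. 24 for a period of 12) fit equally well, and we prefer the smallest.
    scores = {sp: round(seasonal_naive_mae(y, sp, horizon), 9) for sp in sorted(candidates)}
    return min(scores, key=scores.get)


t = np.arange(120)
y = 10.0 * np.sin(2 * np.pi * t / 12)  # synthetic series with a true period of 12
print(search_sp(y, candidates=[3, 4, 6, 12, 24], horizon=12))  # 12
```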
<br>
<hr>
<hr>
<br>
## Source Code
> https://github.com/alan-turing-institute/sktime
- License: BSD-3-Clause License
- [Release notes](https://www.sktime.org/en/latest/changelog.html)
- [v0.8.1 (2021-10-28)](https://github.com/alan-turing-institute/sktime/tree/v0.8.1/sktime)
<br>
## [doc](https://www.sktime.org/en/stable/)
### [Project aims](https://www.sktime.org/en/stable/roadmap.html#project-aims)
- **Develop a unified framework for machine learning with time series in Python**
- [Developer Guide](https://www.sktime.org/en/stable/developer_guide.html)
A time series machine learning toolkit committed to integration with scikit-learn
- **Advance research on algorithm development and software design for machine learning toolboxes**
- **Build a more connected community of researchers and domain experts who work with time series**
- **Create and deliver educational material including documentation and user guides**
<br>
### [Get Started](https://www.sktime.org/en/stable/get_started.html)
- ### Installation
```bash
pip install sktime  # core package only
pip install "sktime[all_extras]"  # or: with all soft dependencies (quotes stop zsh from globbing [])
```
:::warning
:warning: ImportWarning
> No module named 'xxx'. 'xxx' is a soft dependency and not included in the sktime installation. Please run: `pip install xxx`. To install all soft dependencies, run: `pip install sktime[all_extras]`
>
| module name | install command |
| ------------ | -------------------------- |
| esig | `pip install esig` |
| fbprophet | `pip install fbprophet` |
| tbats | `pip install tbats` |
| hcrystalball | `pip install hcrystalball` |
| tsfresh | `pip install tsfresh` |
| stumpy | `pip install stumpy` |
- Source: these warnings appeared when running `df = all_estimators("forecaster", as_dataframe=True)`
:::
- ### Quickstart
```python=
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = ThetaForecaster(sp=12) # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
mean_absolute_percentage_error(y_test, y_pred)
```
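The Quickstart builds an absolute horizon from `y_test.index`. To make the relative/absolute `fh` distinction from the terminology section concrete, here is a plain-pandas sketch of the conversion: for monthly periods, the relative step is just the difference between a period's ordinal and the `cutoff`'s ordinal. The helper `to_relative_steps` is hypothetical and is not sktime's implementation; in sktime the horizon object does this itself (e.g. via its `to_relative(cutoff)` method).

```python
import pandas as pd


def to_relative_steps(absolute: pd.PeriodIndex, cutoff: pd.Period) -> list:
    """Express absolute periods as integer steps after `cutoff` (0 = the cutoff itself)."""
    return [p.ordinal - cutoff.ordinal for p in absolute]


absolute_fh = pd.period_range("2021-12", periods=4, freq="M")  # 2021-12 ... 2022-03

# Cutoff at 2022-01: steps run -1, 0, 1, 2 (like t-1, t, t+1, t+2).
print(to_relative_steps(absolute_fh, pd.Period("2022-01", freq="M")))  # [-1, 0, 1, 2]

# Cutoff at the last training month 2021-11: steps run 1, 2, 3, 4.
print(to_relative_steps(absolute_fh, pd.Period("2021-11", freq="M")))  # [1, 2, 3, 4]
```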
<br>
### [Tutorials](https://www.sktime.org/en/latest/tutorials.html)
- ### [Forecasting with sktime](https://www.sktime.org/en/latest/examples/01_forecasting.html) | [example / notebook](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)
> at current time (v0.6.x), ==forecasting of multivariate time series is **not a stable functionality**==
>
> **Example**: as the running example in this tutorial, we use a textbook data set, the Box-Jenkins airline data set, which consists of the number of monthly totals of international airline passengers, from 1949 - 1960. Values are in thousands. **See "Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications", exercises sections 2 and 3.**
1. **Basic forecasting workflows**
1. **Data format**
- `pd.Series` for univariate data
- `pd.DataFrame` for multivariate data
2. **Basic workflow**
- Prepare the data
- Specify the forecasting horizon (`ForecastingHorizon`)
- Choose a forecaster
- Call `fit`
- Call `predict`
3. **Evaluating model quality (metrics)**
- Step 5: testing whether this performance is statistically better than a chosen baseline performance
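- `mean_absolute_percentage_error` and `MeanAbsolutePercentageError` give different results:
- `mean_absolute_percentage_error`: 0.047958874937125424
- `MeanAbsolutePercentageError`: 0.04909243523851268
- Same as scikit-learn
- The cause turned out to be the `symmetric` parameter (`symmetric=False` is plain MAPE; `symmetric=True` is symmetric MAPE, sMAPE)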
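The gap between the two numbers is the gap between plain MAPE and symmetric MAPE. A minimal NumPy sketch of the two textbook definitions that the `symmetric` flag switches between (the function names are hypothetical, and sktime's own implementation may differ in details such as how zeros are handled):

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error: mean(|y - y_hat| / |y|)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


def smape(y_true, y_pred) -> float:
    """Symmetric MAPE: mean(2 * |y - y_hat| / (|y| + |y_hat|))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(2 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred))))


y_true = [100.0, 200.0]
y_pred = [110.0, 180.0]
print(mape(y_true, y_pred))   # 0.1
print(smape(y_true, y_pred))  # about 0.1003
```

Because sMAPE divides by the average of actual and predicted magnitudes, over-forecasts and under-forecasts of the same size are penalized differently than in MAPE, so the two metrics rarely agree exactly.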
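4. **Rolling updates & prediction**
- Update both the data and the model parameters (effectively the same as retraining, then predicting)
- Usage: `forecaster.update(...)` followed by `forecaster.predict(...)`
- Update only the data, keeping the model parameters unchanged
- **Scenario 1**: time has passed, but no new data has been observed
> if no new data was observed, but time has progressed
- **Scenario 2**: computation takes too long
> if computations take too long, and forecasts have to be queried.
- Usage: `forecaster.update(y_1958Apr, update_params=False)`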
- simulate the update/predict
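To see the difference between the two update modes, here is a toy forecaster that mimics the `fit` / `update(..., update_params=...)` / `predict` calling pattern. The class `MeanForecaster` is hypothetical and deliberately trivial (its only parameter is a mean); it is not sktime's implementation, only an illustration of what "update the data but keep the parameters" means.

```python
import numpy as np


class MeanForecaster:
    """Hypothetical toy forecaster: predicts the mean computed when parameters were last fitted."""

    def fit(self, y):
        self._y = np.asarray(y, dtype=float)
        self.mean_ = float(self._y.mean())  # the single fitted parameter
        return self

    def update(self, y_new, update_params=True):
        # Always append the new observations, which moves the cutoff forward.
        self._y = np.concatenate([self._y, np.asarray(y_new, dtype=float)])
        if update_params:
            # Re-estimate the parameter on all data seen so far (like refitting).
            self.mean_ = float(self._y.mean())
        return self

    @property
    def cutoff(self):
        """0-based index of the last observation seen."""
        return len(self._y) - 1

    def predict(self, fh):
        """Return (absolute index, forecast) pairs for the relative steps in fh."""
        return [(self.cutoff + h, self.mean_) for h in fh]


f = MeanForecaster().fit([1.0, 2.0, 3.0])
f.update([10.0], update_params=False)
print(f.predict([1]))  # [(4, 2.0)]: cutoff moved, parameter unchanged
f.update([10.0], update_params=True)
print(f.predict([1]))  # [(5, 5.2)]: parameter re-estimated on all 5 points
```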
5. **Evaluation workflow**
> the forecaster needs to be tested in a set-up mimicking rolling forecasting, usually on past data
> ![](https://i.imgur.com/eWGNdNH.png)
- [ExpandingWindowSplitter](https://www.sktime.org/en/latest/api_reference/auto_generated/sktime.forecasting.model_selection.ExpandingWindowSplitter.html) | [API doc](https://www.sktime.org/en/v0.7.0/api_reference/modules/auto_generated/sktime.forecasting.model_evaluation.evaluate.html)
- fh=1, initial_window=10, step_length=1, start_with_window=True
- `AutoARIMA` has no incremental `update`; it is refit from scratch every time `update` is called
> UserWarning: NotImplementedWarning: AutoARIMA does not have a custom `update` method implemented. AutoARIMA will be refit each time `update` is called.
>
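The expanding-window scheme with the parameters above (`fh=1, initial_window=10, step_length=1`) can be sketched with plain index arithmetic: each split trains on everything up to the cutoff and tests on the next `fh` point(s). The generator `expanding_window_splits` is a hypothetical, hand-rolled illustration under those assumptions, not `ExpandingWindowSplitter` itself, whose edge handling may differ between versions.

```python
import numpy as np


def expanding_window_splits(n, initial_window=10, step_length=1, fh=(1,)):
    """Yield (train_idx, test_idx) index arrays for an expanding-window backtest."""
    fh = np.asarray(fh)
    cutoff = initial_window - 1  # index of the last training point in the first split
    while cutoff + fh.max() < n:
        train_idx = np.arange(cutoff + 1)  # the training window always starts at 0 and grows
        test_idx = cutoff + fh             # relative horizon steps -> absolute indices
        yield train_idx, test_idx
        cutoff += step_length


splits = list(expanding_window_splits(n=15))
print(len(splits))  # 5
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), test_idx)  # 10 [10], then 11 [11]
```

A forecaster without an incremental `update` (such as `AutoARIMA`, per the warning above) has to be refit on every one of these splits, which is why this backtest can be slow.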
2. **forecaster**
- **state-of-art**

- statsmodels
- ExponentialSmoothing, ThetaForecaster, and autoETS
- pmdarima
- ARIMA and autoARIMA
- tbats
- BATS and TBATS
- PolynomialTrend
- Facebook prophet
- Prophet
- **Listing available forecasters**
```python=
from sktime.registry import all_estimators

all_estimators("forecaster", as_dataframe=True)
```
```python=
# to obtain the fitted parameters, run
forecaster.get_fitted_params()
```
<br>
1. **statsmodels: exponential smoothing, theta forecaster, autoETS**
```python=
from sktime.forecasting.exp_smoothing import ExponentialSmoothing

forecaster = ExponentialSmoothing(trend="add", seasonal="additive", sp=12)
```
> airline:
> ![](https://i.imgur.com/9VJptbx.png)
> - mape: 0.050276529037763404
> - rmse: 26.375869286307378
> - r2: 0.8862942567527256
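- `additive` gives the same result as `add`
- `multiplicative` gives the same result as `mul`
- [Is my time series additive or multiplicative?](https://www.r-bloggers.com/2017/02/is-my-time-series-additive-or-multiplicative/)
- Trend
- how the series changes overall
- Seasonality
- how things change within a given period, e.g. a year, month, week or day
- error / residual / irregular
- activity that cannot be explained by the trend or seasonal values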
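A small NumPy sketch of the two decomposition models described above, with made-up trend and seasonal components: in the additive model the seasonal swing stays the same size, in the multiplicative model it grows with the trend, and taking logs turns a multiplicative series into an additive one (a common reason to log-transform before fitting an additive model).

```python
import numpy as np

t = np.arange(48)
trend = 100.0 + 2.0 * t                        # made-up linear trend
season = np.sin(2 * np.pi * t / 12)            # made-up seasonal pattern with period 12

additive = trend + 10.0 * season               # y = T + S
multiplicative = trend * (1.0 + 0.1 * season)  # y = T * S

# Seasonal swing = series minus trend.
add_swing = np.abs(additive - trend)
mul_swing = np.abs(multiplicative - trend)
print(add_swing[:12].max().round(2), add_swing[-12:].max().round(2))  # same size early and late
print(mul_swing[:12].max().round(2), mul_swing[-12:].max().round(2))  # larger late than early

# log(T * S) = log(T) + log(S): the multiplicative series becomes additive in log space.
print(np.allclose(np.log(multiplicative), np.log(trend) + np.log(1.0 + 0.1 * season)))  # True
```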
<br>
```python=
from sktime.forecasting.ets import AutoETS

forecaster = AutoETS(auto=True, sp=12, n_jobs=-1)
```
> airline:
> ![](https://i.imgur.com/PSoJpuN.png)
> - mape: 0.06317607156941238
> - rmse: 33.96800911550286
> - r2: 0.811414184402218
2. **ARIMA and autoARIMA**
```python=
from sktime.forecasting.arima import ARIMA
forecaster = ARIMA(
    order=(1, 1, 0), seasonal_order=(0, 1, 0, 12), suppress_warnings=True
)
```
> airline:
> ![](https://i.imgur.com/0TCZRQE.png)
> - mape: 0.04257105757347649
> - rmse: 21.18778136161134
> - r2: 0.926626404165151
```python=
from sktime.forecasting.arima import AutoARIMA
forecaster = AutoARIMA(sp=12, suppress_warnings=True)
```
> airline:
> ![](https://i.imgur.com/thyPdoK.png)
> - mape: 0.04117062367046531
> - rmse: 22.132236754717276
> - r2: 0.9199392872227382
<br>
3. **BATS and TBATS**
```python=
from sktime.forecasting.bats import BATS
forecaster = BATS(sp=12, use_trend=True, use_box_cox=False)
```
> airline:
> ![](https://i.imgur.com/8eLSDo2.png)
> - mape: 0.08689500820486913
> - rmse: 54.529606793009044
> - r2: 0.5140030209344162
```python=
from sktime.forecasting.tbats import TBATS
forecaster = TBATS(sp=12, use_trend=True, use_box_cox=False)
```
> airline:
> ![](https://i.imgur.com/UAwtwES.png)
> - mape: 0.08493353477049946
> - rmse: 53.36423527634923
> - r2: 0.5345538759430307
3. **Reduction, tuning, pipelines, autoML**
> **pipeline**:
> - STL is an acronym for “Seasonal and Trend decomposition using Loess,”
> **autoML**:
> - autoML, also known as automated model selection
1. **Reducing forecasting to regression**

#tabulate, tabulation, tabularise
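Reduction turns a series into a tabular regression problem: slide a window of length `window_length` over the series, use each window as a feature row and a later value as the target. Below is a NumPy-only sketch of that tabularization plus the recursive strategy (one one-step-ahead model fed its own predictions) and the direct strategy (one model per horizon step), with ordinary least squares as the regressor. All function names are hypothetical and DirRec/multioutput are not shown; in sktime the same idea is packaged by `make_reduction(estimator, window_length=..., strategy=...)`.

```python
import numpy as np


def tabularize(y, window_length, horizon=1):
    """Rows of `window_length` lagged values; target is the value `horizon` steps after each window."""
    n_rows = len(y) - window_length - horizon + 1
    X = np.array([y[i:i + window_length] for i in range(n_rows)])
    target = np.array([y[i + window_length + horizon - 1] for i in range(n_rows)])
    return X, target


def fit_ols(X, target):
    """Least-squares linear regression with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef


def predict_ols(coef, x):
    return coef[0] + x @ coef[1:]


def recursive_forecast(y, window_length, steps):
    """One one-step-ahead model; each prediction is appended and reused as an input."""
    coef = fit_ols(*tabularize(y, window_length, horizon=1))
    history = list(y[-window_length:])
    preds = []
    for _ in range(steps):
        y_hat = predict_ols(coef, np.array(history[-window_length:]))
        preds.append(y_hat)
        history.append(y_hat)
    return np.array(preds)


def direct_forecast(y, window_length, steps):
    """A separate model for each horizon step h, all fed the same last observed window."""
    last_window = np.asarray(y[-window_length:])
    return np.array([
        predict_ols(fit_ols(*tabularize(y, window_length, horizon=h)), last_window)
        for h in range(1, steps + 1)
    ])


y = np.arange(20, dtype=float)  # a straight line, which both strategies should extend exactly
print(recursive_forecast(y, window_length=3, steps=3).round(6))  # [20. 21. 22.]
print(direct_forecast(y, window_length=3, steps=3).round(6))     # [20. 21. 22.]
```

The design trade-off: recursive fits one model but compounds its own errors over the horizon, while direct avoids feeding predictions back in at the cost of one model per step.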
- **`strategy` parameter**
- recursive (default)
- direct
- dirrec (direct + recursive)
- multioutput
- **[4 Strategies for Multi-Step Time Series Forecasting](https://machinelearningmastery.com/multi-step-time-series-forecasting/)**
- **[How to Develop Multi-Output Regression Models with Python](https://machinelearningmastery.com/multi-output-regression-models-with-python/)**
- includes code :+1: :100:
- **[[sktime] Forecasting wish list #220](https://github.com/alan-turing-institute/sktime/issues/220)**
- Recursive strategy
- basically https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.RegressorChain.html
- Direct strategy
- basically https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html
- DirRec strategy
- [Implement DirRec strategy for regression forecasting #226](https://github.com/alan-turing-institute/sktime/issues/226)
- ### Univariate time series classification with sktime
- ### Multivariate time series classification with sktime
<br>
### [User Guide](https://www.sktime.org/en/stable/user_guide.html)
- ### Forecasting
> #univariate, forecasting horizon

- [Example](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)
<br>
### [Datasets](https://www.sktime.org/en/stable/api_reference/datasets.html)
- `dir(sktime.datasets)`
```=
'load_PBS_dataset',
'load_UCR_UEA_dataset',
'load_acsf1',
'load_airline',
'load_arrow_head',
'load_basic_motions',
'load_electric_devices_segmentation',
'load_gun_point_segmentation',
'load_gunpoint',
'load_italy_power_demand',
'load_japanese_vowels',
'load_longley',
'load_lynx',
'load_macroeconomic',
'load_osuleaf',
'load_shampoo_sales',
'load_unit_test',
'load_uschange'
```
<br>
<hr>
<br>
## Wish list
- ### [[sktime] Forecasting wish list #220](https://github.com/alan-turing-institute/sktime/issues/220) :+1: :100:
- Atomic
- Reduction
- Composition
- Multivariate/vector forecasting
- Transformers
- Model selection
- Pipelines
- Enhancements
<br>
<hr>
<br>
## Multivariate time series classification
> [[代码先锋网] sktime](https://www.codeleading.com/article/98486100407/)
```python=
import numpy as np
from IPython.display import display  # built into notebooks; import explicitly when running as a script
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

# BasicMotions: multivariate panel data (6 dimensions per instance) with multi-class labels
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
# multivariate input data
display(X_train.head())
# multi-class target variable
display(np.unique(y_train))

# Approach 1: concatenate all dimensions into one long series, then use a univariate classifier
steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Approach 2: column-wise ensembling, one classifier per selected dimension (columns 0 and 3)
clf = ColumnEnsembleClassifier(
    estimators=[
        ("TSF0", TimeSeriesForestClassifier(n_estimators=100), [0]),
        ("BOSSEnsemble3", BOSSEnsemble(max_ensemble_size=5), [3]),
    ]
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Approach 3: a bespoke classifier applied to the multivariate data directly (MrSEQL)
clf = MrSEQLClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```
Output:

Ensembling one single-column classifier per dimension (all 6 dimensions of BasicMotions):
```python=
from sklearn.metrics import accuracy_score

clf = ColumnEnsembleClassifier(estimators=[
    ("TSF0", TimeSeriesForestClassifier(), [0]),
    ("TSF1", TimeSeriesForestClassifier(), [1]),
    ("TSF2", TimeSeriesForestClassifier(), [2]),
    ("TSF3", TimeSeriesForestClassifier(), [3]),
    ("TSF4", TimeSeriesForestClassifier(), [4]),
    ("TSF5", TimeSeriesForestClassifier(), [5]),
])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
clf.score(X_test, y_test)  # 1.0
accuracy_score(y_test, y_pred)  # 1.0 (same as clf.score)
```
<br>
<hr>
<br>
## References
- ### [[代码先锋网] sktime](https://www.codeleading.com/article/98486100407/) :+1: :100:
- ### Phrasing in English
- [Forecasting with sktime](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)
fitting the forecaster to the seen data