Tools / sktime
===
###### tags: `ML / Time Series`
###### tags: `ML`, `Time Series`, `sktime`

![](https://i.imgur.com/S6FuR9B.png =50%x)

[TOC]

## ==Key Points==
### Terminology
- ### fh: forecasting horizon
    - Relative fh = N+1, N+2, N+3, ... (`is_relative=True`)
    - Absolute fh = 2021-12, 2022-01, 2022-02, ... (`is_relative=False`)
        - Can be converted to relative; a `cutoff` must be provided
        - For example, with `cutoff='2022-01'`:
          fh = -1, 0, 1, 2, ... (the same idea as t-1, t, t+1, t+2)
        - `cutoff` is normally set to the last date of the train set, here 2021-11:
          fh = 1, 2, 3, 4, ... (the same idea as t+1, t+2, t+3, t+4)
- ### sp: seasonal period
- ### FFT: Fast Fourier Transform. It can be used to decompose a composite curve/wave into its periodic components. Loosely, the trend can be thought of as the first principal component and the periods as secondary components
- ### estimator: the umbrella term for classifier, regressor, transformer, forecaster, and so on

### Supported tasks
- classifier
- regressor
- transformer
- forecaster

### Currently supported forecaster models
| name | estimator (prefix: sktime.forecasting.) |
| ---- | -------- |
| ARIMA | .arima.ARIMA |
| AutoARIMA | .arima.AutoARIMA |
| AutoETS | .ets.AutoETS |
| AutoEnsembleForecaster | .compose._ensemble.AutoEnsembleForecaster |
| ColumnEnsembleForecaster | .compose._column_ensemble.ColumnEnsembleForecaster |
| Croston | .croston.Croston |
| DirRecTabularRegressionForecaster | .compose._reduce.DirRecTabularRegressionForecaster |
| DirRecTimeSeriesRegressionForecaster | .compose._reduce.DirRecTimeSeriesRegressionForecaster |
| DirectTabularRegressionForecaster | .compose._reduce.DirectTabularRegressionForecaster |
| DirectTimeSeriesRegressionForecaster | .compose._reduce.DirectTimeSeriesRegressionForecaster |
| EnsembleForecaster | .compose._ensemble.EnsembleForecaster |
| ExponentialSmoothing | .exp_smoothing.ExponentialSmoothing |
| **ForecastingGridSearchCV** | .model_selection._tune.ForecastingGridSearchCV |
| ForecastingPipeline | .compose._pipeline.ForecastingPipeline |
| **ForecastingRandomizedSearchCV** | .model_selection._tune.ForecastingRandomizedSearchCV |
| MultioutputTabularRegressionForecaster | .compose._reduce.MultioutputTabularRegressionForecaster |
| MultioutputTimeSeriesRegressionForecaster | .compose._reduce.MultioutputTimeSeriesRegressionForecaster |
| MultiplexForecaster | .compose._multiplexer.MultiplexForecaster |
| NaiveForecaster | .naive.NaiveForecaster |
| OnlineEnsembleForecaster | .online_learning._online_ensemble.OnlineEnsembleForecaster |
| PolynomialTrendForecaster | .trend.PolynomialTrendForecaster |
| RecursiveTabularRegressionForecaster | .compose._reduce.RecursiveTabularRegressionForecaster |
| RecursiveTimeSeriesRegressionForecaster | .compose._reduce.RecursiveTimeSeriesRegressionForecaster |
| StackingForecaster | .compose._stack.StackingForecaster |
| ThetaForecaster | .theta.ThetaForecaster |
| TransformedTargetForecaster | .compose._pipeline.TransformedTargetForecaster |
| TrendForecaster | .trend.TrendForecaster |
| UnobservedComponents | .structural.UnobservedComponents |
| VAR | .var.VAR |

### State-of-the-art models overview
> [Forecasters in sktime - main families](https://www.sktime.org/en/latest/examples/01_forecasting.html#2.-Forecasters-in-sktime---main-families)
> ![](https://i.imgur.com/ntx9D3F.png)

| packages | models |
| -------- | ------ |
| statsmodels | ExponentialSmoothing, ThetaForecaster, AutoETS |
| pmdarima | ARIMA, AutoARIMA |
| tbats | BATS, TBATS |
| sktime | PolynomialTrend |
| Facebook prophet | Prophet |

- For the Prophet integration, how to handle fh is still an open question
    > NotImplementedError:
    > <class 'pandas.core.indexes.period.PeriodIndex'> is not supported for input,
    > use type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'> instead.

### Thoughts on integrating with AI Maker
1. How should a continuously updated dataset be handled?
2. Once a model family is chosen, how should the model be handled as it keeps being retrained on new data?
3. Input/output interface
    > The input/output interface each algorithm can handle is not the same
    - Input: univariate / multivariate
    - Output: univariate / multivariate
4. Some sktime algorithms require input parameters, e.g. sp (which assumes the period is known)
    - In practice the true period of a dataset may be unknown, so a **hyperparameter search** is needed to find the best period
    - Even if sp is exposed as a user input, the user may not know the true period either

<hr>

## Source Code
> https://github.com/alan-turing-institute/sktime
- License: BSD-3-Clause License
- [Release notes](https://www.sktime.org/en/latest/changelog.html)
    - [v0.8.1 (2021-10-28)](https://github.com/alan-turing-institute/sktime/tree/v0.8.1/sktime)

## [doc](https://www.sktime.org/en/stable/)
### [Project aims](https://www.sktime.org/en/stable/roadmap.html#project-aims)
- **Develop a unified framework for machine learning with time series in Python**
    - [Developer Guide](https://www.sktime.org/en/stable/developer_guide.html)
      A time series machine learning toolkit committed to interoperating with scikit-learn
- **Advance research on algorithm development and software design for machine learning toolboxes**
- **Build a more connected community of researchers and domain experts who work with time series**
- **Create and deliver educational material including documentation and user guides**

### [Get Started](https://www.sktime.org/en/stable/get_started.html)
- ### Installation
    ```bash
    pip install sktime
    pip install sktime[all_extras]
    ```

:::warning
:warning: ImportWarning
> No module named 'xxx'. 'xxx' is a soft dependency and not included in the sktime installation. Please run: `pip install xxx`.
> To install all soft dependencies, run: `pip install sktime[all_extras]`

| module name  | package                    |
| ------------ | -------------------------- |
| esig         | `pip install esig`         |
| fbprophet    | `pip install fbprophet`    |
| tbats        | `pip install tbats`        |
| hcrystalball | `pip install hcrystalball` |
| tsfresh      | `pip install tsfresh`      |
| stumpy       | `pip install stumpy`       |

- Source: running `df = all_estimators("forecaster", as_dataframe=True)`
:::

- ### Quickstart
    ```python=
    from sktime.datasets import load_airline
    from sktime.forecasting.base import ForecastingHorizon
    from sktime.forecasting.model_selection import temporal_train_test_split
    from sktime.forecasting.theta import ThetaForecaster
    from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

    y = load_airline()
    y_train, y_test = temporal_train_test_split(y)
    fh = ForecastingHorizon(y_test.index, is_relative=False)
    forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
    forecaster.fit(y_train)
    y_pred = forecaster.predict(fh)
    mean_absolute_percentage_error(y_test, y_pred)
    ```

### [Tutorials](https://www.sktime.org/en/latest/tutorials.html)
- ### [Forecasting with sktime](https://www.sktime.org/en/latest/examples/01_forecasting.html) | [example / notebook](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)
    > at current time (v0.6x), ==forecasting of multivariate time series is **not a stable functionality**==
    >
    > **Example**: as the running example in this tutorial, we use a textbook data set, the Box-Jenkins airline data set, which consists of the number of monthly totals of international airline passengers, from 1949 - 1960. Values are in thousands. **See "Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications", exercises sections 2 and 3.**

    1.
        1. **Data format**
            - pd.Series for univariate data
            - pd.DataFrame for multivariate data
        2. **Basic workflow**
            - Prepare the data
            - Set the forecasting horizon (ForecastingHorizon)
            - Choose a forecaster
            - Run fit
            - Run predict
        3. **Evaluating model quality (metric)**
            - 5. testing whether this performance is statistically better than a chosen baseline performance
            - `mean_absolute_percentage_error` and `MeanAbsolutePercentageError` return different results
                - mean_absolute_percentage_error: 0.047958874937125424
                - MeanAbsolutePercentageError: 0.04909243523851268
                    - Same as scikit-learn
                - The difference turned out to come from the parameter `symmetric=False`
        4. **Rolling update & predict**
            - Update the data and also the model parameters (i.e. the effect is the same as retraining and then predicting)
                - Usage: `forecaster.update()` followed by `predict()`
            - Update the data only, without updating the model parameters
                - **Scenario 1**: time has passed but there are no new observations
                    > if no new data was observed, but time has progressed
                - **Scenario 2**: the computation takes too long
                    > if computations take too long, and forecasts have to be queried.
                - Usage: `forecaster.update(y_1958Apr, update_params=False)`
            - simulate the update/predict
        5. **Evaluation workflow**
            > the forecaster needs to be tested in a set-up mimicking rolling forecasting, usually on past data
            > [![](https://i.imgur.com/eWGNdNH.png)](https://i.imgur.com/eWGNdNH.png)
            - [ExpandingWindowSplitter](https://www.sktime.org/en/latest/api_reference/auto_generated/sktime.forecasting.model_selection.ExpandingWindowSplitter.html) | [API doc](https://www.sktime.org/en/v0.7.0/api_reference/modules/auto_generated/sktime.forecasting.model_evaluation.evaluate.html)
                - fh=1, initial_window=10, step_length=1, start_with_window=True
            - `AutoARIMA` has no custom `update` implementation, so every `update` call refits the model from scratch
                > UserWarning: NotImplementedWarning: AutoARIMA does not have a custom `update` method implemented. AutoARIMA will be refit each time `update` is called.
    2. **forecaster**
        - **State of the art**
          ![](https://i.imgur.com/ntx9D3F.png)
            - statsmodels
                - ExponentialSmoothing, ThetaForecaster, and AutoETS
            - pmdarima
                - ARIMA and AutoARIMA
            - tbats
                - BATS and TBATS
            - PolynomialTrend
            - Facebook prophet
                - Prophet
        - **Listing the available forecasters**
            ```python
            from sktime.registry import all_estimators

            all_estimators("forecaster", as_dataframe=True)
            ```
            - The first argument selects the estimator type: classifier, regressor, transformer, or forecaster
            - For "forecaster", the output is the forecaster table already listed under Key Points
            ```python=
            # to obtain the fitted parameters, run
            forecaster.get_fitted_params()
            ```
        1. **statsmodels: exponential smoothing, theta forecaster, AutoETS**
            ```python
            from sktime.forecasting.exp_smoothing import ExponentialSmoothing

            forecaster = ExponentialSmoothing(trend="add", seasonal="additive", sp=12)
            ```
            > airline:
            > [![](https://i.imgur.com/9VJptbx.png)](https://i.imgur.com/9VJptbx.png)
            > - mape: 0.050276529037763404
            > - rmse: 26.375869286307378
            > - r2: 0.8862942567527256
            - `additive` gives the same result as `add`
            - `multiplicative` gives the same result as `mul`
            - [Is my time series additive or multiplicative?](https://www.r-bloggers.com/2017/02/is-my-time-series-additive-or-multiplicative/)
                - Trend
                    - How things are changing overall
                - Seasonality
                    - How things change within a given period, e.g. a year, month, week, or day
                - error / residual / irregular
                    - Activity that cannot be explained by the trend or the seasonal value
            ```python
            from sktime.forecasting.ets import AutoETS

            forecaster = AutoETS(auto=True, sp=12, n_jobs=-1)
            ```
            > airline:
            > [![](https://i.imgur.com/PSoJpuN.png)](https://i.imgur.com/PSoJpuN.png)
            > - mape: 0.06317607156941238
            > - rmse: 33.96800911550286
            > - r2: 0.811414184402218
        2. **ARIMA and AutoARIMA**
            ```python=
            from sktime.forecasting.arima import ARIMA
            forecaster = ARIMA(
                order=(1, 1, 0), seasonal_order=(0, 1, 0, 12), suppress_warnings=True
            )
            ```
            > airline:
            > [![](https://i.imgur.com/0TCZRQE.png)](https://i.imgur.com/0TCZRQE.png)
            > - mape: 0.04257105757347649
            > - rmse: 21.18778136161134
            > - r2: 0.926626404165151
            ```python=
            from sktime.forecasting.arima import AutoARIMA
            forecaster = AutoARIMA(sp=12, suppress_warnings=True)
            ```
            > airline:
            > [![](https://i.imgur.com/thyPdoK.png)](https://i.imgur.com/thyPdoK.png)
            > - mape: 0.04117062367046531
            > - rmse: 22.132236754717276
            > - r2: 0.9199392872227382
        3. **BATS and TBATS**
            ```python=
            from sktime.forecasting.bats import BATS
            forecaster = BATS(sp=12, use_trend=True, use_box_cox=False)
            ```
            > airline:
            > [![](https://i.imgur.com/8eLSDo2.png)](https://i.imgur.com/8eLSDo2.png)
            > - mape: 0.08689500820486913
            > - rmse: 54.529606793009044
            > - r2: 0.5140030209344162
            ```python=
            from sktime.forecasting.tbats import TBATS
            forecaster = TBATS(sp=12, use_trend=True, use_box_cox=False)
            ```
            > airline:
            > [![](https://i.imgur.com/UAwtwES.png)](https://i.imgur.com/UAwtwES.png)
            > - mape: 0.08493353477049946
            > - rmse: 53.36423527634923
            > - r2: 0.5345538759430307
    3. Reduction, tuning, pipelines, autoML
        > **pipeline**:
        > - STL is an acronym for “Seasonal and Trend decomposition using Loess,”
        > **autoML**:
        > - autoML, also known as automated model selection
        1. **Reducing forecasting to regression**
          ![](https://i.imgur.com/ODjRE9s.png)
          #tabulate, tabulation, tabularise
            - **The `strategy` parameter**
                - recursive (default)
                - direct
                - dirrec (direct + recursive)
                - multioutput
            - **[4 Strategies for Multi-Step Time Series Forecasting](https://machinelearningmastery.com/multi-step-time-series-forecasting/)**
            - **[How to Develop Multi-Output Regression Models with Python](https://machinelearningmastery.com/multi-output-regression-models-with-python/)**
                - Includes code :+1: :100:
            - **[[sktime] Forecasting wish list #220](https://github.com/alan-turing-institute/sktime/issues/220)**
                - Recursive strategy - basically https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.RegressorChain.html
                - Direct strategy - basically https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html
                - DirRec strategy
                    - [Implement DirRec strategy for regression forecasting #226](https://github.com/alan-turing-institute/sktime/issues/226)
- ### Univariate time series classification with sktime
- ### Multivariate time series classification with sktime

### [User Guide](https://www.sktime.org/en/stable/user_guide.html)
- ### Forecasting
    > #univariate, forecasting horizon
    ![](https://i.imgur.com/Ls5oKQK.png)
    - [Example](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)

### [Datasets](https://www.sktime.org/en/stable/api_reference/datasets.html)
- `dir(sktime.datasets)`
    ```=
    'load_PBS_dataset',
    'load_UCR_UEA_dataset',
    'load_acsf1',
    'load_airline',
    'load_arrow_head',
    'load_basic_motions',
    'load_electric_devices_segmentation',
    'load_gun_point_segmentation',
    'load_gunpoint',
    'load_italy_power_demand',
    'load_japanese_vowels',
    'load_longley',
    'load_lynx',
    'load_macroeconomic',
    'load_osuleaf',
    'load_shampoo_sales',
    'load_unit_test',
    'load_uschange'
    ```

<hr>

## Wish list
- ### [[sktime] Forecasting wish list #220](https://github.com/alan-turing-institute/sktime/issues/220) :+1: :100:
    - Atomic
    - Reduction
    - Composition
    - Multivariate/vector forecasting
    - Transformers
    - Model selection
    - Pipelines
    - Enhancements

<hr>

## Multivariate time series classification
> [[代码先锋网] sktime](https://www.codeleading.com/article/98486100407/)

```python=
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.dictionary_based import BOSSEnsemble
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_basic_motions
from sktime.transformations.panel.compose import ColumnConcatenator

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

# multivariate input data (display() is available in Jupyter/IPython)
display(X_train.head())

# multi-class target variable
display(np.unique(y_train))

# Option 1: concatenate all columns into one long series, then classify
steps = [
    ("concatenate", ColumnConcatenator()),
    ("classify", TimeSeriesForestClassifier(n_estimators=100)),
]
clf = Pipeline(steps)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Option 2: one classifier per selected column, combined into an ensemble
clf = ColumnEnsembleClassifier(
    estimators=[
        ("TSF0", TimeSeriesForestClassifier(n_estimators=100), [0]),
        ("BOSSEnsemble3", BOSSEnsemble(max_ensemble_size=5), [3]),
    ]
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Option 3: a classifier that handles multivariate input directly
clf = MrSEQLClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Output:
![](https://i.imgur.com/rXo8FW8.png)

Combining several single-column classifiers:
```python=
# one TimeSeriesForestClassifier per column (columns 0 to 5)
clf = ColumnEnsembleClassifier(estimators=[
    ("TSF0", TimeSeriesForestClassifier(), [0]),
    ("TSF1", TimeSeriesForestClassifier(), [1]),
    ("TSF2", TimeSeriesForestClassifier(), [2]),
    ("TSF3", TimeSeriesForestClassifier(), [3]),
    ("TSF4", TimeSeriesForestClassifier(), [4]),
    ("TSF5", TimeSeriesForestClassifier(), [5]),
])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

clf.score(X_test, y_test)
# 1.0

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)  # arguments are (y_true, y_pred)
# 1.0
```

<hr>

## References
- ### [[代码先锋网] sktime](https://www.codeleading.com/article/98486100407/) :+1: :100:
- ### English phrasing
    - [Forecasting with sktime](https://github.com/alan-turing-institute/sktime/blob/main/examples/01_forecasting.ipynb)
        - "fitting the forecaster to the seen data" (Chinese: 將預測器擬合到所見數據)
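
A supplement to the fh terminology under Key Points: converting an absolute horizon to a relative one amounts to counting periods from the `cutoff`. The following is a minimal plain-Python sketch of that arithmetic for monthly data. The helper names `month_index` and `to_relative_fh` are hypothetical and only for illustration; they are not part of sktime, where the horizon is represented by `ForecastingHorizon`.

```python
def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month)


def to_relative_fh(absolute_fh, cutoff):
    """Return the number of months between each absolute period and the cutoff."""
    base = month_index(cutoff)
    return [month_index(p) - base for p in absolute_fh]


absolute_fh = ["2021-12", "2022-01", "2022-02", "2022-03"]

# cutoff inside the horizon: zero and negative steps appear (t-1, t, t+1, t+2)
rel_mid = to_relative_fh(absolute_fh, cutoff="2022-01")

# cutoff = last date of the train set: steps start at 1 (t+1, t+2, ...)
rel_train_end = to_relative_fh(absolute_fh, cutoff="2021-11")

print(rel_mid)        # [-1, 0, 1, 2]
print(rel_train_end)  # [1, 2, 3, 4]
```

This is why the notes recommend setting `cutoff` to the last training date: every step of the relative horizon is then a positive "steps ahead" count.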
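
A supplement to the metric note in the Forecasting tutorial, where the two MAPE results differ because of the `symmetric` parameter. The sketch below uses plain Python rather than sktime, and assumes that the symmetric variant is the usual sMAPE, 2|y - ŷ| / (|y| + |ŷ|), and the non-symmetric variant is |y - ŷ| / |y|. The function names `mape` and `smape` and the sample values are made up for illustration.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error: mean(|y - yhat| / |y|)."""
    terms = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


def smape(y_true, y_pred):
    """Symmetric MAPE: mean(2 * |y - yhat| / (|y| + |yhat|))."""
    terms = [2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)


# Illustrative values only, not taken from the airline data set
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

mape_value = mape(y_true, y_pred)
smape_value = smape(y_true, y_pred)

# The two definitions give close but different numbers on the same forecast
print(mape_value, smape_value)
```

When comparing scores across libraries, it helps to check which of the two definitions each call uses before concluding that the models differ.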