Tools / FEDOT
===
###### tags: `ML / Time series`
###### tags: `ML`, `Time series`, `FEDOT`
<br>
[TOC]
<br>
## Official documentation
:::info
:bulb: **Official documentation**<br><br>
- **[Github](https://github.com/nccr-itmo/FEDOT)**
- **[FEDOT.Docs](https://itmo-nss-team.github.io/FEDOT.Docs/)**
- **[FEDOT API](https://fedot.readthedocs.io/en/master/api/api.html)**
:::
<br>
## Introduction
### Sources
- [FEDOT](https://github.com/nccr-itmo/FEDOT)
> github, open-source framework
> ED: Evolutionary Design
> FEDOT [fɪˋdɑt] (pronunciation taken from the [official YouTube video](https://www.youtube.com/watch?v=RjbuV6i6de4))
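### Origin
- Most AutoML frameworks can handle classification and regression,
but few can handle text, image processing, or time series forecasting
### Features
- FEDOT is an AutoML framework (i.e., it can assemble pipelines)
- Supports:
- classification / regression / clustering / time series forecasting
- gap filling inside a series
- automatic generation of ML pipelines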
#pipeline #chain #composite model
- pipeline
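- Each node is a container, and each container does exactly one thing, for example:
- normalization / standardization / missing-value imputation / random forest / nearest neighbors, etc.
:::warning
:warning: **Note (2021/09/13)**
"Container" here means a program-level wrapper,
not a container in the Docker sense
:::
- A pipeline is these containers chained together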
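
The idea of single-purpose wrappers chained into a pipeline can be illustrated without FEDOT. The following is a minimal sketch in plain Python. `Node`, `run`, `fill_missing`, and `min_max_scale` are hypothetical names chosen for illustration and are not FEDOT's API.

```python
class Node:
    """A wrapper ("container") around one operation, fed by parent nodes."""

    def __init__(self, name, operation, nodes_from=None):
        self.name = name
        self.operation = operation          # callable: list -> list
        self.nodes_from = nodes_from or []  # upstream nodes; empty for a primary node

    def run(self, data):
        # A primary node consumes the raw data; a secondary node consumes
        # the output of its (single, in this sketch) parent node.
        if self.nodes_from:
            data = self.nodes_from[0].run(data)
        return self.operation(data)


def fill_missing(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


imputation = Node('imputation', fill_missing)
scaling = Node('scaling', min_max_scale, nodes_from=[imputation])

result = scaling.run([2.0, None, 4.0, 6.0])
print(result)
```

Each node knows only its own operation and where its input comes from, which is what makes it possible to recombine nodes into different pipelines.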
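### [Task] How does FEDOT handle time series?
- Both classifiers and regressors can be used for time series forecasting
- Uses the SSA (singular spectrum analysis) approach: a sliding window plus a trajectory matrix
- The sliding window size is a hyperparameter
- Used for both gap filling and forecasting
- Implemented features
- Preprocessing: moving-average smoothing, Gaussian smoothing
- Models: AR, ARIMA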
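
To make the sliding window and trajectory matrix concrete, here is a minimal sketch in plain Python. It is an illustration of the general idea, not FEDOT's implementation, and `trajectory_matrix` is a hypothetical helper name. Whether the windows are stored as rows or as columns is a matter of convention; rows are used here.

```python
def trajectory_matrix(series, window_size):
    """Stack every length-`window_size` slice of `series` as one row."""
    if not 1 <= window_size <= len(series):
        raise ValueError('window_size must be between 1 and len(series)')
    n_rows = len(series) - window_size + 1
    return [list(series[i:i + window_size]) for i in range(n_rows)]


series = [10, 11, 12, 13, 14, 15]
matrix = trajectory_matrix(series, window_size=3)
for row in matrix:
    print(row)
```

A larger window gives each row more history but leaves fewer rows to learn from, which is why the window size is treated as a hyperparameter.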
<br>
<hr>
<br>
## Examples
### [[Official] Notebook examples](https://github.com/nccr-itmo/FEDOT#examples--tutorials)
- ### [ITMO-NSS-team / fedot-examples](https://github.com/ITMO-NSS-team/fedot-examples/tree/main/notebooks/latest)
1. [Intro to AutoML](https://github.com/ITMO-NSS-team/fedot-examples/blob/main/notebooks/latest/1_intro_to_automl.ipynb)
2. [Intro to FEDOT functionality](https://github.com/ITMO-NSS-team/fedot-examples/blob/main/notebooks/latest/2_intro_to_fedot.ipynb)
3. [Intro to time series forecasting with FEDOT](https://github.com/ITMO-NSS-team/fedot-examples/blob/main/notebooks/latest/3_intro_ts_forecasting.ipynb)
4. [Advanced time series forecasting](https://github.com/ITMO-NSS-team/fedot-examples/blob/main/notebooks/latest/4_auto_ts_forecasting.ipynb)
5. [Gap-filling in time series and out-of-sample forecasting](https://github.com/ITMO-NSS-team/fedot-examples/blob/main/notebooks/latest/5_ts_specific_cases.ipynb)
<br>
### [AutoML for time series: advanced approaches with FEDOT framework](https://towardsdatascience.com/automl-for-time-series-advanced-approaches-with-fedot-framework-4f9d8ea3382c)
- [Fedot time-series forecasting electricity case](https://github.com/ITMO-NSS-team/fedot_electro_ts_case)
<br>
### [Example] iris (using `InputData`)
- ### load data
```python=
# data
import sklearn.datasets
# Dataclass for wrapping arrays into it
from fedot.core.data.data import InputData
# Tasks to solve
from fedot.core.repository.tasks import Task, TaskTypesEnum
# Type of the input data
from fedot.core.repository.dataset_types import DataTypesEnum
iris = sklearn.datasets.load_iris()
task_type = Task(TaskTypesEnum.classification)
data_type = DataTypesEnum.table
input_data = InputData(
task=task_type, # classification
data_type=data_type, # tabular data
idx=range(0, len(iris.data)), # row index
features=iris.data, # features
target=iris.target # target
)
```
- To feed a pandas.DataFrame into InputData:
```python=
import numpy
import pandas
df = pandas.read_csv(
iris.filename, skiprows=1, header=None,
names=numpy.append(iris.feature_names, ['target']))
input_data = InputData(
idx=range(0, len(df)),
features=df[iris.feature_names].to_numpy(),
task=task_type,
data_type=data_type,
target=df['target'].to_numpy()
)
```
- ### train
```python=
#import warnings
#warnings.filterwarnings("ignore")
from fedot.api.main import Fedot
#task selection, initialisation of the framework
model = Fedot(problem="classification")
# run of the AutoML-based model generation
pipeline = model.fit(input_data)
pipeline.show()
pipeline.print_structure()
```
- Framework parameters: `seed`, `verbose_level`
`Fedot(problem='classification', seed = 42, verbose_level=0)`
- `show()`

- `print_structure`
```
Pipeline structure:
{'depth': 2, 'length': 2, 'nodes': [xgboost, scaling]}
xgboost - {'learning_rate': 0.8235413666886693, 'max_depth': 1, 'min_child_weight': 3, 'n_estimators': 100, 'nthread': 1, 'subsample': 0.7378644163117208}
scaling - default_params
```
- Too many warnings; they can be suppressed

```
import warnings
warnings.filterwarnings("ignore")
```
- ### predict
```python=
import pandas
model.predict(pandas.DataFrame(iris.data))
model.get_metrics(iris.target, ['acc', 'f1'])
```
- Output
```
{'acc': 0.98, 'f1': 0.97999799979998}
```
### [Example] iris (without `InputData`)
```python=
# load data-frame
import pandas
import sklearn.datasets
iris = sklearn.datasets.load_iris()
# copy the list so that iris.feature_names itself is not modified
# (otherwise 'target' would leak into the feature columns)
columns = list(iris.feature_names) + ['target']
df = pandas.read_csv(iris.filename, skiprows=1, header=None, names=columns)
features = df[iris.feature_names]
target = df['target']  # 1-D Series; a 2-D target can cause errors in fit()
# fit the training data
from fedot.api.main import Fedot
model = Fedot(problem='classification')
pipeline = model.fit(features=features, target=target)
pipeline.show()
pipeline.print_structure()
# predict the training data
model.predict(features=features)
model.get_metrics(target, ['acc', 'f1'])
- **worst case**
- acc=0.3333

### [Example] iris (simpler)
```python=
import sklearn.datasets
iris = sklearn.datasets.load_iris()
from fedot.api.main import Fedot
model = Fedot(problem="classification")
pipeline = model.fit(features=iris.data, target=iris.target)
pipeline.show()
pipeline.print_structure()
model.predict(iris.data)
model.get_metrics(iris.target, ['acc', 'f1'])
```
<br>
<hr>
<br>
## Supported algorithms
- ### xgboost
- From fedot to xgboost

- How this was observed
- Interrupt execution midway (KeyboardInterrupt) and inspect the traceback
- `operation_implementation` appears to point to the xgboost instance
- ### svm / svc
[](https://i.imgur.com/4S5R0Mo.png)
- Changing the name to uppercase raises an error
[](https://i.imgur.com/9Ee4HnX.png)
- ### Algorithm list
> site-packages/fedot/core/repository/data/model_repository.json
- adareg
- ar
- arima
- bernb
- catboost
- catboostreg
- dt
- dtreg
- gbr
- kmeans
- knn
- knnreg
- lasso
- lda
- lgbm
- lgbmreg
- linear
- logit
- mlp
- multinb
- qda
- rf
- rfr
- ridge
- sgdr
- stl_arima
- svc
- svr
- tfidf
- treg
- xgboost
- xgbreg
- cnn
- ### Query / list / enumerate the operations (model algorithms, data operations) available for a given task
```python=
# TaskTypesEnum
# - classification
# - clustering
# - regression
# - ts_forecasting
# mode
# - all (=model+data_operation)
# - model
# - data_operation
from fedot.core.repository.tasks import Task, TaskTypesEnum
from fedot.core.repository.operation_types_repository import get_operations_for_task
get_operations_for_task(Task(TaskTypesEnum.ts_forecasting))
```
Output:
```=
['adareg',
'ar',
'arima',
'catboostreg',
'dtreg',
'gbr',
'lasso',
'lgbmreg',
'linear',
'rfr',
'ridge',
'sgdr',
'stl_arima',
'svr',
'treg',
'xgbreg',
'scaling',
'normalization',
'pca',
'poly_features',
'ransac_lin_reg',
'ransac_non_lin_reg',
'rfe_lin_reg',
'rfe_non_lin_reg',
'lagged',
'sparse_lagged',
'smoothing',
'gaussian_filter',
'exog_ts_data_source']
```
- [List models only](https://github.com/nccr-itmo/FEDOT/blob/master/cases/multi_modal_rating_prediction.py)
`available_model_types = get_operations_for_task(task=task, mode='model')`
- [API doc](https://fedot.readthedocs.io/en/master/api/repository.html#fedot.core.repository.operation_types_repository.get_operations_for_task)
<br>
<hr>
<br>
## Supported metrics
> site-packages/fedot/api/api_utils.py
```python=
composer_metrics_mapping = {
'acc': ClassificationMetricsEnum.accuracy,
'roc_auc': ClassificationMetricsEnum.ROCAUC,
'f1': ClassificationMetricsEnum.f1,
'logloss': ClassificationMetricsEnum.logloss,
'mae': RegressionMetricsEnum.MAE,
'mse': RegressionMetricsEnum.MSE,
'msle': RegressionMetricsEnum.MSLE,
'mape': RegressionMetricsEnum.MAPE,
'r2': RegressionMetricsEnum.R2,
'rmse': RegressionMetricsEnum.RMSE,
'rmse_pen': RegressionMetricsEnum.RMSE_penalty,
'silhouette': ClusteringMetricsEnum.silhouette,
'node_num': ComplexityMetricsEnum.node_num
}
```
<br>
<hr>
<br>
## Experimental properties
### Are the results reproducible?
```python=
# new instance to be used as AutoML tool
auto_model = Fedot(problem='classification', seed = 42, verbose_level=0)
```
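- Conclusion

- Even with the seed fixed at 42,
the run is limited to 2 minutes, and the amount of work completed within those 2 minutes varies from run to run,
so the final result still differs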
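
One plausible explanation, stated here as an assumption rather than something verified against FEDOT's internals: a fixed seed fixes the sequence of candidates, but a wall-clock limit decides how many of them get evaluated, and that depends on machine load. The toy random search below sketches the effect. `random_search` and its `evaluations` argument are hypothetical and stand in for "how far the search got before the timeout".

```python
import random


def random_search(seed, evaluations):
    """Return the lowest loss found after `evaluations` random candidates."""
    rng = random.Random(seed)
    best = float('inf')
    for _ in range(evaluations):
        candidate = rng.uniform(-10, 10)
        loss = (candidate - 3) ** 2   # toy objective with its minimum at 3
        best = min(best, loss)
    return best


# Same seed and same number of evaluations: identical result.
same_budget = random_search(seed=42, evaluations=50) == random_search(seed=42, evaluations=50)
print(same_budget)

# Same seed, but one run fits more evaluations into the time limit:
# its result can differ (it stays equal or improves, never gets worse).
slow_run = random_search(seed=42, evaluations=50)
fast_run = random_search(seed=42, evaluations=200)
print(slow_run, fast_run)
```

Under this assumption, limiting the search by a fixed number of evaluations instead of by time would be the way to make runs comparable.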
<br>
### Can the pipeline be fine-tuned further?
```python=
from sklearn.metrics import accuracy_score
pipeline.fine_tune_all_nodes(
loss_function=accuracy_score,
loss_params=None,
input_data=input_data)
```
<br>
<hr>
<br>
## API usage notes
- `TsForecastingParams(forecast_length)`
- Both `InputData` and `Fedot` take a `task_params` setting
- `InputData` + `Fedot` (AutoML): the `Fedot` setting takes precedence
- The `Fedot` default is
`TsForecastingParams(forecast_length=30)`
- `InputData` + custom pipeline: the `InputData` setting takes precedence
- Otherwise, shape mismatch errors occur
<br>
<hr>
<br>
## Troubleshooting
### model.fit() warnings
- #### Q1: `UserWarning: Operation ridge not found in the repository`
`warnings.warn(f'Operation {operation_id} not found in the repository')`

### model.fit() errors
- #### Q1: `KeyError: 'class_weight'` (does not occur on every run)

...

- **Code**
```python=
import pandas
import sklearn.datasets
iris = sklearn.datasets.load_iris()
columns = iris.feature_names
columns.extend(['target'])
df = pandas.read_csv(iris.filename, skiprows=1, header=None, names=columns)
df.features = df[iris.feature_names]
df.target = df[['target']] # <--- root cause ?
from fedot.api.main import Fedot
model = Fedot(problem='classification')
model.fit(features=df.features, target=df.target) # <---
model.predict(features=df.features)
model.get_metrics(df.target, ['acc', 'f1'])
```
- ~~**Solution 1**~~
> No documented solution was found; this one came from trial and error
```python=
model = Fedot(
problem='classification',
task_params={'class_weight': None})
```
- `task_params` parameter description
> additional parameters of the task
- Pass a dict object as `task_params`
- Use 'class_weight' as the key; the value can be None or 1.0 (both worked in testing)
<br>
- **Solution 2**: the target must be a one-dimensional array

- #### Q2: `TypeError: '(slice(None, None, None), 0)' is an invalid key`
- **Code**
```python=
input_data = InputData(
idx=range(0, len(df)),
features=df[[0,1,2,3]],  # <---
task=task_type,
data_type=data_type,
target=df[[4]]  # <---
)
```
```python=
from fedot.api.main import Fedot
model = Fedot(problem="classification")
pipeline = model.fit(input_data)  # <---
```
- **Cause**
Although the data arguments nominally accept a pandas.DataFrame, passing one raises this error
- **Solution**
- Convert to numpy.ndarray
```python=
input_data = InputData(
idx=range(0, len(df)),
features=df[[0,1,2,3]].to_numpy(),  # <---
task=task_type,
data_type=data_type,
target=df[[4]].to_numpy()  # <---
)
```
- #### Q3:
`TypeError: Singleton array array('target', dtype='<U6') cannot be considered a valid collection.`
`ValueError: Found input variables with inconsistent numbers of samples: [20000, 4000]`
- **Code**
```python=
model.fit(features=df_train, target=df_train.target)
```
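- **Solution**
> No documented solution was found; this one came from trial and error
`model.fit(features=df_train, target=df_train.target)` ... fails
`model.fit(features=df_train, target='target')` ... works
- The `target` argument must be the column name; passing the DataFrame data itself fails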
- #### Q4: `AttributeError: 'GraphNode' object has no attribute 'operation'`
- Possibly because the model is not supported?
- #### Q5: `TypeError: '>' not supported between instances of 'float' and 'NoneType'` (+9)

- Possibly because the model is not supported?
- #### Q6: `TypeError: only integer scalar arrays can be converted to a scalar index` (+3)

- Possibly because the model is not supported?
- #### Q7: `KeyError: 'max_depth'` (+8)

> #classification
- Possibly because the model is not supported?
- #### Q8: `ValueError: Mix of label input types (string and number)` (+2)

> #classification
- Possibly because the model is not supported?
- #### Q9: All 88 CPU cores are saturated and execution hangs indefinitely (+3)
[](https://i.imgur.com/O88AORz.png)

- #### Q10: `AttributeError: Too much fitness evaluation errors. Composing stopped.`
```python=
import pandas
df = pandas.read_csv('air-passengers-full.csv', index_col=[0])
from fedot.api.main import Fedot
model = Fedot(problem="ts_forecasting")
pipeline = model.fit(
features=df.index.to_numpy(),
target=df["#Passengers"].to_numpy())
```
- **Solution**
- For time series data, only InputData appears to work (?)
<br>
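
For the `KeyError: 'class_weight'` case above, the stated fix is a one-dimensional target. Selecting a column with double brackets (`df[['target']]`) yields a two-dimensional, single-column object, whereas `df['target']` is one-dimensional. The sketch below shows the difference in shape with NumPy and one way to flatten it. It assumes NumPy is installed, and the array values are made up.

```python
import numpy as np

# Shape (3, 1): what a single-column DataFrame becomes after .to_numpy()
target_2d = np.array([[0], [1], [2]])

# Shape (3,): the flattened, one-dimensional form
target_1d = target_2d.ravel()

print(target_2d.shape, target_1d.shape)
```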
### model.predict() errors
- #### Q1: `IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices`

- **Cause**
- Although the docs say a numpy.array is accepted (see the screenshot below), passing one raises this error

- `type(iris.data)`
numpy.ndarray
- `iris.data.shape`
(150, 4)
- **Solution**
- Pass a pandas.DataFrame instead
```python=
import pandas
model.predict(pandas.DataFrame(iris.data))
```
- #### Q2: `TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`
- fit: `{'depth': 1, 'length': 1, 'nodes': [lda]}`
- Possibly because the model is not supported?
- #### Q3: `AttributeError: 'NoneType' object has no attribute 'forecast_length'`
- **Code**
```
predict_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting),
data_type=DataTypesEnum.ts,
idx=range(len(train_data), len(train_data) + len(test_data)),
features=train_data, # df_train
target=test_data # df_test
)
predict_data = pipeline.predict(predict_input) # <---
```
- **Solution**
`pipeline.predict` cannot be called directly; call `model.predict` first :warning:
```
predict_data1 = model.predict(predict_input)
predict_data2 = pipeline.predict(predict_input)
```


- #### Q4: `ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 13 is different from 2)`
- **Code**
```
model.task_params.forecast_length=len(train_data)
predict_data = model.predict(predict_input)
```
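- **Solution**
Do not change `model.task_params.forecast_length` after fitting; even after the value is restored, the model remains unusable :warning: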
<br>
### model.get_metrics() errors
- #### Q1: `AttributeError: 'NoneType' object has no attribute 'predict'`
- **Code**
```python=
model.get_metrics(train_data.target, ['acc', 'f1'])
```
- **Solution**
Run predict first
`model.predict(features=train_data.features)`
<br>
<hr>
<br>
## Test set: iris (classification test)
```python=
import pandas
import sklearn.datasets
df_train_x = pandas.read_csv('../dataset/classification/iris/input/train_x.csv', header=None)
df_train_y = pandas.read_csv('../dataset/classification/iris/input/train_y.csv', header=None)
df_test_x = pandas.read_csv('../dataset/classification/iris/input/test_x.csv', header=None)
df_test_y = pandas.read_csv('../dataset/classification/iris/input/test_y.csv', header=None)
iris = sklearn.datasets.load_iris()
from fedot.api.main import Fedot
model = Fedot(problem="classification")
pipeline = model.fit(features=df_train_x, target=df_train_y)
pipeline.show()
pipeline.print_structure()
model.predict(df_train_x)
model.get_metrics(df_train_y, ['acc', 'f1'])
```
```
{'acc': 0.9833333333333333, 'f1': 0.9833333333333334}
```
<br>
<hr>
<br>
## Test set: synthetic time series
### load data
[[Code] Synthetic time series](https://hackmd.io/7h4bK5EqRvSANplJSoq2Mg#%E7%A8%8B%E5%BC%8F%E7%A2%BC)
### train data
```python=
from fedot.api.main import Fedot, TsForecastingParams, Task, TaskTypesEnum, DataTypesEnum, InputData
model = Fedot(
problem='ts_forecasting',
task_params=TsForecastingParams(
forecast_length=len(test_data))
)
train_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting),
data_type=DataTypesEnum.ts,
idx=range(0, len(train_data)),
features=train_data, # <-- train_data
target=train_data # <-- train_data
)
pipeline = model.fit(train_input)
print(pipeline)
pipeline.print_structure()
pipeline.show()
```
```
{'depth': 4, 'length': 4, 'nodes': [ridge, adareg, ridge, lagged]}
```
```
Pipeline structure:
{'depth': 4, 'length': 4, 'nodes': [ridge, adareg, ridge, lagged]}
ridge - {'alpha': 3.5389205189223545}
adareg - default_params
ridge - {'alpha': 3.5389205189223545}
lagged - {'window_size': 13.062557112045752}
```

### predict data
```python=
predict_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting),
data_type=DataTypesEnum.ts,
idx=range(len(train_data), len(train_data) + len(test_data)), # <--- forecast range
features=train_data, # <-- train_data
target=test_data # <-- test_data !!!
)
predict_data = model.predict(predict_input)
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 5), dpi=100)
plt.plot(synthetic_time_series, label='test_data')
plt.plot(train_data, label='train_data')
plt.plot(range(len(train_data), len(synthetic_time_series)),
predict_data, label='predict_data', linestyle='dotted', color='red')
plt.legend()
plt.show()
```

### Metrics
```python=
from sklearn.metrics import r2_score, mean_squared_error
print('test:')
print(' - R2:', round(r2_score(test_data, predict_data), 6))
print(' - MSE:', round(mean_squared_error(test_data, predict_data, squared=True), 6))
print(' - RMSE:', round(mean_squared_error(test_data, predict_data, squared=False), 6))
print(model.get_metrics(test_data, ['r2', 'mse', 'rmse']))
```
```
test:
- R2: -0.262834
- MSE: 0.163648
- RMSE: 0.404534
{'r2': 0.2628341357356181, 'mse': 0.16364794289776338, 'rmse': 0.40453422957490676}
```
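- Evaluation on train_data is not possible
- Note the sign: sklearn reports R2 = -0.262834, while `get_metrics` reports `r2` = 0.2628 for the same predictions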
<br>
<hr>
<br>
## Test set: monthly airline passengers (AutoML)
### load data
> [[Kaggle] Air Passengers](/7h4bK5EqRvSANplJSoq2Mg#Kaggle-Air-Passengers)
```python=
import pandas
df_train = pandas.read_csv('air-passengers-train.csv', index_col=[0])
display(df_train)
df_test = pandas.read_csv('air-passengers-test.csv', index_col=[0])
display(df_test)
print(df_test.shape)
```
 
### train data
```python=
from fedot.api.main import InputData, Task, TaskTypesEnum, DataTypesEnum
from fedot.api.main import Fedot, TsForecastingParams
train_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting),
data_type=DataTypesEnum.ts,
idx=range(0, len(df_train)),
features=df_train.to_numpy(),
target=df_train.to_numpy()
)
model = Fedot(
problem='ts_forecasting',
task_params=TsForecastingParams(forecast_length=len(df_test)),
timeout=5 # min
)
pipeline = model.fit(train_input)
print(pipeline)
pipeline.print_structure()
pipeline.show()
```
```
{'depth': 3, 'length': 3, 'nodes': [ridge, lagged, simple_imputation]}
```
```
Pipeline structure:
{'depth': 3, 'length': 3, 'nodes': [ridge, lagged, simple_imputation]}
ridge - {'alpha': 9.096192326637516}
lagged - {'window_size': 9.962640983790191}
simple_imputation - default_params
```

### predict data
```python=
predict_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting),
data_type=DataTypesEnum.ts,
idx=range(len(df_train), len(df_train) + len(df_test)), # <--- forecast range
features=df_train.to_numpy(), # df_train
target=df_test.to_numpy() # df_test
)
predict_data = model.predict(predict_input)
print('predict_data:', predict_data)
df_predict = pandas.DataFrame(
predict_data, index=df_test.index, columns=['#Passengers'])
import matplotlib.pyplot as plt
# figure size
plt.figure(figsize=(12, 5), dpi=100)
# plot for the training data
plt.plot(df_train, label='true')
# plot for the testing data
# draw the line between the last point of df_train and the first point of df_test
xn_to_x0 = [df_train.index[-1], df_test.index[0]]
yn_to_y0 = [df_train['#Passengers'].iloc[-1], df_test['#Passengers'].iloc[0]]
plt.plot(xn_to_x0, yn_to_y0, color='tab:orange')
plt.plot(df_test, label='test', color='tab:orange')
# plot for the prediction data
xn_to_x0 = [df_train.index[-1], df_predict.index[0]]
yn_to_y0 = [df_train['#Passengers'].iloc[-1], df_predict['#Passengers'].iloc[0]]
plt.plot(xn_to_x0, yn_to_y0, color='tab:red', linestyle='dotted')
plt.plot(df_predict, label='pred', color='tab:red', linestyle='dotted')
plt.xticks([str(y) + '-01' for y in range(1949, 1962, 1)])
plt.legend()
plt.grid()
plt.show()
```
```
predict_data: [468.03845328 407.53694038 363.90146783 397.62346053 397.60541397
372.90541843 409.46790505 398.23674565 414.33977806 498.25005239
556.20503571 568.70702608 524.80747675 453.53027667 405.48368098
439.43015816 436.17662455 408.84883172 446.69847665 439.03161391
460.10774953 559.91364224 628.70725312 641.91153143 586.50213839
502.08357083 448.57156687 477.38240074]
```

### Metrics
```python=
from sklearn.metrics import r2_score, mean_squared_error
print('test:')
print(' - R2:', round(r2_score(df_test, df_predict), 6))
print(' - MSE:', round(mean_squared_error(df_test, df_predict, squared=True), 6))
print(' - RMSE:', round(mean_squared_error(df_test, df_predict, squared=False), 6))
print(model.get_metrics(df_test, ['r2', 'mse', 'rmse']))
```
```
test:
- R2: 0.752505
- MSE: 1526.476332
- RMSE: 39.070146
{'r2': 0.7525053429306533, 'mse': 1526.476332044568, 'rmse': 39.070146301806545}
```
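
The three figures reported above follow from simple formulas. The sketch below computes them in plain Python so the relationships are visible, in particular that RMSE is the square root of MSE. The functions are hypothetical stand-ins for the sklearn calls, and the small `y_true` / `y_pred` lists are made-up values.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, for illustration only
y_true = [100.0, 120.0, 140.0, 160.0]
y_pred = [110.0, 120.0, 130.0, 160.0]

print('R2:', r2(y_true, y_pred))
print('MSE:', mse(y_true, y_pred))
print('RMSE:', rmse(y_true, y_pred))

# The RMSE reported above is the square root of the reported MSE
reported_rmse = math.sqrt(1526.476332044568)
print(reported_rmse)
```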
<br>
<hr>
<br>
## Test set: monthly airline passengers (Custom)
### load data
> [[Kaggle] Air Passengers](/7h4bK5EqRvSANplJSoq2Mg#Kaggle-Air-Passengers)
```python=
import pandas
df_train = pandas.read_csv('air-passengers-train.csv', index_col=[0])
display(df_train)
df_test = pandas.read_csv('air-passengers-test.csv', index_col=[0])
display(df_test)
print(df_test.shape)
```
 
### train data
```python=
from fedot.api.main import InputData, Task, TaskTypesEnum, DataTypesEnum, TsForecastingParams
from fedot.api.main import PrimaryNode, SecondaryNode, Pipeline
forecast_length = 28
window_size = 24
train_input = InputData(
task=Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=forecast_length)),
data_type=DataTypesEnum.ts,
idx=range(0, len(df_train)),
features=df_train["#Passengers"].to_numpy(),
target=df_train["#Passengers"].to_numpy())
imput_node = PrimaryNode('simple_imputation')
lagged_node = SecondaryNode('lagged', nodes_from=[imput_node])
lagged_node.custom_params = {'window_size': window_size}
ridge_node = SecondaryNode('ridge', nodes_from=[lagged_node])
pipeline = Pipeline(ridge_node)
pipeline.fit(train_input)
print(pipeline)
pipeline.print_structure()
pipeline.show()
```
```
{'depth': 3, 'length': 3, 'nodes': [ridge, lagged, simple_imputation]}
```
```
Pipeline structure:
{'depth': 3, 'length': 3, 'nodes': [ridge, lagged, simple_imputation]}
ridge - default_params
lagged - {'window_size': 24}
simple_imputation - default_params
```
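
Conceptually, a `lagged` node followed by `ridge` turns forecasting into ordinary regression: each window of `window_size` past values is used to predict the next `forecast_length` values. The sketch below builds such training pairs in plain Python. It illustrates the idea under that assumption and is not FEDOT's implementation; `make_lagged_pairs` is a hypothetical helper.

```python
def make_lagged_pairs(series, window_size, forecast_length):
    """Split a series into (past window, future horizon) training pairs."""
    features, targets = [], []
    last_start = len(series) - window_size - forecast_length
    for start in range(last_start + 1):
        split = start + window_size
        features.append(list(series[start:split]))
        targets.append(list(series[split:split + forecast_length]))
    return features, targets


series = list(range(10))   # 0, 1, ..., 9
X, y = make_lagged_pairs(series, window_size=3, forecast_length=2)
print(len(X))
print(X[0], y[0])
print(X[-1], y[-1])
```

In this sketch, `window_size = 24` with `forecast_length = 28` would need a training series of at least 52 points to produce a single pair.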

### predict data
```python=
predict_input = InputData(
task = Task(TaskTypesEnum.ts_forecasting, TsForecastingParams(forecast_length=forecast_length)),
data_type = DataTypesEnum.ts,
idx = range(0, len(df_train)),
features = df_train["#Passengers"].to_numpy(),
target = df_test["#Passengers"].to_numpy())
predicted_data = pipeline.predict(predict_input)
predicted_data = predicted_data.predict.ravel()
print(predicted_data)
df_predict = pandas.DataFrame(
predicted_data, index=df_test.index, columns=['#Passengers'])
import matplotlib.pyplot as plt
# figure size
plt.figure(figsize=(12, 5), dpi=100)
# plot for the training data
plt.plot(df_train, label='train')
# plot for the testing data
# draw the line between the last point of df_train and the first point of df_test
xn_to_x0 = [df_train.index[-1], df_test.index[0]]
yn_to_y0 = [df_train['#Passengers'].iloc[-1], df_test['#Passengers'].iloc[0]]
plt.plot(xn_to_x0, yn_to_y0, color='tab:orange')
plt.plot(df_test, label='test', color='tab:orange')
# plot for the prediction data
xn_to_x0 = [df_train.index[-1], df_predict.index[0]]
yn_to_y0 = [df_train['#Passengers'].iloc[-1], df_predict['#Passengers'].iloc[0]]
plt.plot(xn_to_x0, yn_to_y0, color='tab:red', linestyle='dotted')
plt.plot(df_predict, label='pred', color='tab:red', linestyle='dotted')
plt.xticks([str(y) + '-01' for y in range(1949, 1962, 1)])
plt.legend()
plt.grid()
plt.show()
```
```
[436.85707658 372.64061775 328.12607297 362.44832921 378.17602095
357.40503243 407.35332788 403.43492861 426.53023322 513.78619656
580.40143265 587.82320514 509.72866988 434.61819386 391.40753476
421.87465698 436.45401765 405.46591674 461.86552967 466.04925102
489.05312366 593.53557485 677.16376665 679.57127175 583.6557558
495.16898657 434.8809138 469.03468683]
```

### Metrics
```python=
from sklearn.metrics import r2_score, mean_squared_error
print('test:')
print(' - R2:', round(r2_score(df_test, df_predict), 6))
print(' - MSE:', round(mean_squared_error(df_test, df_predict, squared=True), 6))
print(' - RMSE:', round(mean_squared_error(df_test, df_predict, squared=False), 6))
#print(model.get_metrics(df_test, ['r2', 'mse', 'rmse']))
```
```
test:
- R2: 0.794247
- MSE: 1269.025435
- RMSE: 35.623383
```
<br>
<hr>
<br>
## References
- ### [AutoML for time series: advanced approaches with FEDOT framework](https://towardsdatascience.com/automl-for-time-series-advanced-approaches-with-fedot-framework-4f9d8ea3382c)




- ### [Multi-step Time Series Forecasting with ARIMA, LightGBM, and Prophet](https://towardsdatascience.com/multi-step-time-series-forecasting-with-arima-lightgbm-and-prophet-cc9e3f95dfb0)
<br>
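## References (to be digested)
- ### [Python practice: Autoregression](http://blog.udn.com/gwogo/130821369)
- Forecasting commuter train passenger counts
- Trend values of the commuter train data
- How to make the data stationary
- ### [Autoregressive model](https://zh.wikipedia.org/wiki/%E8%87%AA%E8%BF%B4%E6%AD%B8%E6%A8%A1%E5%9E%8B)
- The series must be autocorrelated; the autocorrelation coefficient is the key. If the autocorrelation coefficient (R) is below 0.5, the model should not be used, because the forecasts will be highly inaccurate.
- Autoregression is only suitable for forecasting economic phenomena that are correlated with their own past values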
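
To check a series against the 0.5 rule of thumb quoted above, the lag-1 sample autocorrelation can be computed directly. This is a minimal sketch in plain Python using the common sample estimator. `autocorrelation` is a hypothetical helper and the two series are synthetic.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    numerator = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


trend = [float(t) for t in range(100)]   # each value is close to the previous one
alternating = [1.0, -1.0] * 50            # flips sign at every step

print(autocorrelation(trend))
print(autocorrelation(alternating))
```

The trending series scores well above 0.5, while the alternating series scores strongly negative.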