# Using tsfresh
###### tags: `tsfresh`
## Quick Introduction
tsfresh is a Python package for automatic time-series feature extraction. The official description:
> tsfresh is a python package. It automatically calculates a large number of time series characteristics, the so called features. Further the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks.
## Installation
Install it with pip:
```
pip install tsfresh
```
## Getting Started
You can get going quickly with the example dataset that tsfresh provides.
### Quick start
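* Download the dataset
```python=
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, \
    load_robot_execution_failures

# Downloads the Robot Execution Failures dataset (needs network access)
download_robot_execution_failures()
# timeseries: long-format DataFrame with columns id, time, F_x, F_y, F_z, T_x, T_y, T_z
# y: boolean label per id (whether that robot run failed)
timeseries, y = load_robot_execution_failures()
```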
* Take a quick look at the data
```python=
print(timeseries.head())
```
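`timeseries` is in tsfresh's flat, long format: one row per (id, time) pair and one column per measured variable (in the robot dataset, `F_x`, `F_y`, `F_z`, `T_x`, `T_y`, `T_z`). If your own data is stored as one list of values per series instead, a sketch like the following can reshape it. `to_tsfresh_frame` is a hypothetical helper, and the `id`/`time`/`value` column names are my own choice.

```python
import pandas as pd


def to_tsfresh_frame(series_by_id):
    """Hypothetical helper: convert {id: [v0, v1, ...]} into a flat
    DataFrame with one row per (id, time) pair."""
    frames = []
    for series_id, values in series_by_id.items():
        frames.append(pd.DataFrame({
            "id": series_id,             # same id on every row of this series
            "time": range(len(values)),  # 0, 1, 2, ... used as the sort key
            "value": values,
        }))
    return pd.concat(frames, ignore_index=True)


df = to_tsfresh_frame({1: [0.5, 0.7, 0.2], 2: [1.0, 1.1]})
print(df)
```

The result can then be passed to `extract_features(df, column_id="id", column_sort="time")`.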

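* Extract features with `extract_features`
According to the official documentation, this function extracts about 1200 different features here.
**Note: on data containing only a single time series (i.e., no other ids), I got roughly 700 to 800 features.**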
```python=
from tsfresh import extract_features
extracted_features = extract_features(timeseries, column_id="id", column_sort="time")
# Parameters:
# timeseries  : the input data (a pandas DataFrame)
# column_id   : column that says which time series each row belongs to (here, "id")
# column_sort : column used to order the rows within each time series (here, "time")
```
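* Use `impute` and `select_features` to keep only the relevant features
Together, these two functions cut the roughly 1200 extracted features down to around 300 in this example.
`select_features` does not accept `NaN` or infinite values, so `impute` cleans the feature matrix first, column by column: `-inf` is replaced by the column's minimum, `+inf` by the column's maximum, and `NaN` by the column's median (all computed over the column's finite values).
For details, see the [official documentation](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.transformers.html#module-tsfresh.transformers.per_column_imputer)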
```python=
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute
impute(extracted_features)  # modifies extracted_features in place
features_filtered = select_features(extracted_features, y)
# extracted_features : the feature matrix (one row per id)
# y                  : boolean target for each id (True/False)
```
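To make the replacement rules concrete, here is a small pandas re-implementation of them. It is my own sketch for illustration, not tsfresh's code; it returns a new DataFrame instead of modifying in place, and it assumes every column has at least one finite value.

```python
import numpy as np
import pandas as pd


def impute_sketch(df):
    """Illustrative re-implementation of the rules:
    -inf -> column min, +inf -> column max, NaN -> column median,
    with min/max/median computed over the finite values only."""
    out = df.astype(float).copy()
    for col in out.columns:
        finite = out[col][np.isfinite(out[col])]
        out[col] = (out[col]
                    .replace(-np.inf, finite.min())
                    .replace(np.inf, finite.max())
                    .fillna(finite.median()))
    return out


features = pd.DataFrame({
    "a": [1.0, -np.inf, 3.0, np.nan, 5.0],
    "b": [np.inf, 2.0, 4.0, 6.0, np.nan],
})
print(impute_sketch(features))
```

In practice, use tsfresh's own `impute`; the sketch only shows what happens to each value.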
* Extract, impute and select in one step: `extract_relevant_features`
If you would rather skip the separate steps, this function does all three at once:
```python=
from tsfresh import extract_relevant_features
features_filtered_direct = extract_relevant_features(timeseries, y,
column_id='id', column_sort='time')
```
## Advanced Tips
### rolling
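* `roll_time_series` turns each time series into a set of rolling windows
```python=
from tsfresh.utilities.dataframe_functions import roll_time_series
# my_data: your own long-format DataFrame with "id" and "time" columns
df_rolled = roll_time_series(my_data[:100], column_id="id", column_sort="time")
```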

* `min_timeshift` and `max_timeshift` control the size of the rolling windows
```python=
from tsfresh.utilities.dataframe_functions import roll_time_series
# Here the data uses "filename" as the id column and "timestamp" for ordering
df_rolled = roll_time_series(my_data[:100], column_id="filename", column_sort="timestamp",
                             min_timeshift=30, max_timeshift=30)
```

### prediction
The documentation mentions the function
```
tsfresh.utilities.dataframe_functions.make_forecasting_frame
```
這個function去對時間序列進行預測
但我尚未進行實作的部分
## Reference
[tsfresh documentation](https://tsfresh.readthedocs.io/en/latest/index.html)