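# Using tsfresh

###### tags: `tsfresh`

## Quick introduction

tsfresh is a Python package that automatically computes a large number of time series features. It also includes methods for evaluating how much explanatory power and importance those features have for regression or classification tasks.

> tsfresh is a python package. It automatically calculates a large number of time series characteristics, the so called features. Further the package contains methods to evaluate the explaining power and importance of such characteristics for regression or classification tasks.

## Installation

Run the following command:

```
pip install tsfresh
```

## Getting started

The dataset bundled with tsfresh is a quick way to get started.

### Quick start

* Download the dataset

```python
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, \
    load_robot_execution_failures

download_robot_execution_failures()
timeseries, y = load_robot_execution_failures()
```

* Take a quick look at the dataset

```python
print(timeseries.head())
```

![](https://i.imgur.com/8fjfjXU.png)

* Extract features with `extract_features`

According to the official documentation, this function can extract about 1200 different features.

**Note: in my tests, data containing only a single time series yields roughly 700 to 800 features. The count grows with the number of value columns; each additional id only adds a row to the result.**

```python
from tsfresh import extract_features

extracted_features = extract_features(timeseries, column_id="id", column_sort="time")

# The arguments are:
# timeseries:  the input data (a pandas DataFrame)
# column_id:   the column that identifies which time series each row belongs to (here, "id")
# column_sort: the column used to order the rows within each series (here, "time")
```

* Use `impute` and `select_features` to keep only the features you need

Together, these two functions reduce the original 1200-plus feature columns to about 300.

`impute` cleans up the extracted features column by column:

  * `-inf` is replaced with the column minimum
  * `+inf` is replaced with the column maximum
  * `NaN` is replaced with the column median

See the [official documentation](https://tsfresh.readthedocs.io/en/latest/api/tsfresh.transformers.html#module-tsfresh.transformers.per_column_imputer) for details.

```python
from tsfresh import select_features
from tsfresh.utilities.dataframe_functions import impute

impute(extracted_features)  # modifies extracted_features in place
features_filtered = select_features(extracted_features, y)

# extracted_features: the feature matrix returned by extract_features
# y: the target for each id (here a Boolean, True/False)
```

* Extract, impute, and select in one call with `extract_relevant_features`

If you would rather not run the steps separately, this function performs the feature extraction, imputation, and selection in one go.

```python
from tsfresh import extract_relevant_features

features_filtered_direct = extract_relevant_features(timeseries, y,
                                                     column_id='id', column_sort='time')
```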
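## Some advanced techniques

### rolling

* `roll_time_series` produces rolling windows over the data

```python
from tsfresh.utilities.dataframe_functions import roll_time_series

# my_data is my own DataFrame (not included in this note);
# column_id and column_sort must match the column names of your data
df_rolled = roll_time_series(my_data[:100], column_id="id", column_sort="time")
```

![](https://i.imgur.com/4s1Bt6S.png)

* `min_timeshift` and `max_timeshift` control the size of the rolling windows; setting both to the same value gives windows of one fixed length

```python
from tsfresh.utilities.dataframe_functions import roll_time_series

df_rolled = roll_time_series(my_data[:100],
                             column_id="filename",
                             column_sort="timestamp",
                             min_timeshift=30,
                             max_timeshift=30)
```

![](https://i.imgur.com/ZbCTqww.png)

### prediction

The documentation mentions the function

```
tsfresh.utilities.dataframe_functions.make_forecasting_frame
```

for time series forecasting. It does not make predictions itself: it turns a single time series into rolled windows plus the matching target values, which can then be used to train a forecasting model. I have not tried it out yet.

## Reference

[tsfresh documentation](https://tsfresh.readthedocs.io/en/latest/index.html)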