<style>
/* .reveal .slides h2 { text-align: left; }*/
.reveal section img { background: #F0F0F0; border: 0px solid #fff; box-shadow: 0 0 0px rgba(0, 0, 0, 0); padding: 2.5rem; border-radius: 2.5rem; width: 80%; }
.markdown-body hr { height: .25em; padding: 0; margin: 24px 0; background-color: #ffffff; border: 0; }
pre.flow-chart, pre.sequence-diagram, pre.graphviz, pre.mermaid, pre.abc, pre.vega { background-color: rgba(0, 0, 0, 0); }
.reveal pre { box-shadow: 0px 0px 0px rgb(0 0 0 / 15%); }
.alert-info { color: #31708f; background-color: #f7d9d900; border-color: #bce8f1; padding: 1rem; padding-right: 0rem; padding-left: 0rem; border-radius: 2.5rem; font-size: 2rem; }
.reveal blockquote { font-size: 2.5rem; text-align: left; padding: 1rem; padding-left: 3rem; padding-right: 3rem; width: 100%; }
.reveal pre { display: block; position: relative; width: 100%; margin: var(--r-block-margin) auto; text-align: left; font-size: 0.55em; font-family: var(--r-code-font); line-height: 1.2em; word-wrap: break-word; box-shadow: 0px 5px 15px rgb(0 0 0 / 15%); }
.reveal pre code { max-height: 600px; }
</style>

# Artificial Intelligence[$^{\tiny\#}$](https://hackmd.io/@hsiangjenli/NKUST-AI#/)

###### --------------- Trend Prediction for the Taiwan Capitalization Weighted Stock Index ---------------

###### Department of Money and Banking, 4$^{th}$ Year, Class B (金融四乙)

###### 李享紝 (C107125248)

---

## Motivation

<div style='text-align: left'>

> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Finance has many econometric models for forecasting time series, such as AR, ARMA, ARIMA, ARCH, GARCH…. Using them usually requires many rigorous hypothesis tests and elaborate transformations, which is why relatively few people use them. These models rest on a good deal of theory, for example how to transform a nonstationary time series into a stationary one. The goal of this project is to combine that statistical theory with PyTorch to improve the accuracy of the trained model.
>
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;I therefore picked the simplest and easiest to understand of these, the AR model, as a baseline, to see whether LSTM or AR performs better.

</div>

---

![](https://i.imgur.com/WpOqLeW.jpg)

---

## Model Selection

<div style='text-align: left'>

1. **LSTM**
2. **Autoregressive model (AR) + PyTorch Regression**

> Ordinary regression
> $$
> y_{i} = \alpha + \beta_{0}X_{1i} + \beta_{1}X_{2i} + \dots
> $$
> Autoregressive model
> $$
> y_{t+1} = \alpha + \beta_{0}y_{t} + \beta_{1}y_{t-1} + \beta_{2}y_{t-2} + \dots
> $$

</div>

---

## Dataset

![Taiwan Stock Exchange](https://upload.wikimedia.org/wikipedia/zh/thumb/0/00/Taiwan_Stock_Exchange.svg/560px-Taiwan_Stock_Exchange.svg.png)

```python=
import requests

# Query the TWSE FMTQIK report as JSON.
# The URL is split with implicit string concatenation so it contains no newline.
url = (
    'https://www.twse.com.tw/exchangeReport/FMTQIK'
    '?response=json&date=20211213'
)
r = requests.get(url)
rawJson = r.json()
```

---

### raw DataFrame

<div style='font-size: 2.3rem'>

| Date       | TWII    | 成交金額 (trade value) | 成交股數 (trade volume) | 成交筆數 (transactions) | 漲跌點數 (change in points) |
|:-----------|--------:|------------:|------------:|------------:|--------:|
| 2011-01-03 | 9025.3  | 1.49701e+11 | 6.66862e+09 | 1.29521e+06 | 52.8    |
| 2011-01-04 | 8997.19 | 1.62515e+11 | 7.00041e+09 | 1.41335e+06 | -28.11  |
| 2011-01-05 | 8846.31 | 1.80318e+11 | 7.66616e+09 | 1.50807e+06 | -150.88 |
| 2011-01-06 | 8883.21 | 1.39196e+11 | 5.88292e+09 | 1.153e+06   | 36.9    |
| 2011-01-07 | 8782.72 | 1.59552e+11 | 6.30132e+09 | 1.24305e+06 | -100.49 |

</div>

---

![](https://i.imgur.com/Itx2teA.png)

---

## Preprocessing the Data

1. Convert the index into log returns

---

![](https://i.imgur.com/ABwXOoq.png)
![](https://i.imgur.com/fY9ihkf.png)

---

### Time series in statistics...

1. Stationary -> mean reversion
1. Nonstationary

> **Mean reversion** means that when a value deviates from its mean, subsequent values tend to move back toward the mean.

---

#### Ways to turn a nonstationary series into a stationary one

<div style='text-align: center'>

> 1. Simple return or log return
$$
\frac{P_{1} - P_{0}}{P_{0}} \quad \text{or} \quad \ln\left(\frac{P_{1}}{P_{0}}\right)
$$
> 2. Differencing
$$
P_{1} - P_{0}
$$

</div>

---

![](https://i.imgur.com/1g74Oyy.png)
![](https://i.imgur.com/Fc5BVXX.png)

---

![](https://i.imgur.com/JKDcA9G.png)

---

### Add Features (1/3) $\tiny{-Rolling\space Window}$

<div style='font-size: 2.3rem'>

| Date       |            t |         t-1 |         t-2 |         t-3 |         t-4 |
|:-----------|-------------:|------------:|------------:|------------:|------------:|
| 2011-01-04 | -0.00311944  | nan         | nan         | nan         | nan         |
| 2011-01-05 | -0.0169119   | -0.00311944 | nan         | nan         | nan         |
| 2011-01-06 | 0.00416256   | -0.0169119  | -0.00311944 | nan         | nan         |
| 2011-01-07 | -0.0113768   | 0.00416256  | -0.0169119  | -0.00311944 | nan         |
| 2011-01-10 | 0.00399532   | -0.0113768  | 0.00416256  | -0.0169119  | -0.00311944 |
| 2011-01-11 | 0.0127872    | 0.00399532  | -0.0113768  | 0.00416256  | -0.0169119  |
| 2011-01-12 | 0.00375943   | 0.0127872   | 0.00399532  | -0.0113768  | 0.00416256  |
| 2011-01-13 | 0.00117945   | 0.00375943  | 0.0127872   | 0.00399532  | -0.0113768  |
| 2011-01-14 | -0.000342098 | 0.00117945  | 0.00375943  | 0.0127872   | 0.00399532  |

</div>

---

### Computing log returns

<div style='font-size: 3.2rem'>

```python=
import copy

import numpy as np


class logReturn:
    def __init__(self):
        pass

    def fit(self, df, windows: int = 0):
        self.df = copy.deepcopy(df)  # price series (a pandas Series)
        self.windows = windows       # number of lagged returns used as features
        self.transformR()
        self.rollingWindow()

    def transformR(self):
        # Log return ln(P_t / P_{t-1}); the first element is NaN
        self.lr = np.log(self.df / self.df.shift(1))
        return self

    def backwardR(self, logR, pred: bool = False):
        # Invert the transform: P_t = exp(r_t) * P_{t-1}
        prev = self.df.shift(1)
        if pred:
            # Predictions start after the first `windows` rows, so shift
            # before slicing to keep the previous price of the first one
            return np.exp(logR) * prev.iloc[self.windows:]
        else:
            return np.exp(logR) * prev

    def rollingWindow(self):
        lr = self.lr.to_numpy()  # positional indexing, independent of the index type
        rolling = len(lr) - self.windows
        # X: `windows` consecutive returns; Y: the return that follows them
        self.rollingX = np.array([lr[i: i + self.windows] for i in range(rolling)])
        self.rollingY = np.array([lr[i + self.windows] for i in range(rolling)])
        return self
```

</div>

---

<div style='font-size: 3.5rem'>

```python=
windows = 10
LR = logReturn()
LR.fit(df.close, windows=windows)
```

```python=
LR.lr                            # plain log returns
x, y = LR.rollingX, LR.rollingY  # after applying the rolling window
```

```python=
# Convert the plain log returns back to prices
pd.DataFrame(
    {
        'True': df.close.values,
        'Backward': LR.backwardR(LR.lr).values
    }
)

# Convert predictions back to prices; the rolling window drops the first few rows
pd.DataFrame(
    {
        'Pred': outputs,
        'Pred-Backward': LR.backwardR(outputs, pred=True)
    }
)
```

</div>

---

### Add Features (2/3) $\tiny{-X\space more\space Features}$

> Take squares, higher powers, logarithms, ... of the features generated by the rolling window to add some variety.

---

### Add Features (3/3) $\tiny{-Talib}$

> The raw data includes the columns **成交股數** (trade volume), **成交筆數** (transactions), and **漲跌點數** (change in points); [**Talib**](https://github.com/mrjbq7/ta-lib) is used to generate technical indicators from them.

```python=
import talib

# Lag each column by one day so that only prior-day values are used
diff = df["漲跌點數"].shift(1)
volume = df["成交股數"].shift(1)
records = df["成交筆數"].shift(1)

# Momentum and RSI of each lagged column over a 5-day period
df['MOM_volume'] = talib.MOM(volume, timeperiod=5)
df['MOM_records'] = talib.MOM(records, timeperiod=5)
df['MOM_diff'] = talib.MOM(diff, timeperiod=5)
df['RSI_volume'] = talib.RSI(volume, timeperiod=5)
df['RSI_records'] = talib.RSI(records, timeperiod=5)
df['RSI_diff'] = talib.RSI(diff, timeperiod=5)

# WILLR and CCI take (high, low, close); here volume, records, and diff
# are passed in those positions
df['willR'] = talib.WILLR(volume, records, diff, timeperiod=5)
df['CCI'] = talib.CCI(volume, records, diff, timeperiod=5)
```

---

## Glossary

> 1. **AR (autoregressive model)**
> Uses past values of y as the features for the prediction.
> 2. **Rolling-Window**
> The window is the number of past days whose values are used as features.
> e.g. windows=10: the values of the past 10 days are the features for predicting day 11.
> 3. **Talib**
> A package for generating technical indicators.
> 4. **log-return**
> A common way of computing returns in quantitative investing; log returns are additive over time.

---

## References

<div style='font-weight: bold; font-size: 2.5rem'>

1. [David Macêdo (2020) Time Series Prediction with LSTM Using PyTorch](https://github.com/dlmacedo/starter-academic/blob/master/content/courses/deeplearning/notebooks/pytorch/Time_Series_Prediction_with_LSTM_Using_PyTorch.ipynb)
1. [How to transfer logReturns back to original prices with pandas?](https://stackoverflow.com/questions/54620686/how-to-transfer-logreturns-back-to-original-prices-with-pandas)
1. [何謂均值回歸?](https://service.hket.com/knowledge/2158464/何謂均值回歸?)
1. [[量化投資基本功]為什麼對數收益率在量化投資這麼重要? log return 與累積報酬率](https://pyecontech.com/2020/11/03/量化投資基本功為什麼對數收益率在量化投資這麼/)

</div>
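
---

To make the log-return and rolling-window steps from the preprocessing slides concrete, here is a minimal pure-Python sketch. It is not the `logReturn` class itself: the helper names `log_returns`, `rolling_window`, and `to_prices` are hypothetical and introduced only for illustration, and the window size of 2 is chosen just to fit the tiny sample. The prices are the first five TWII values from the raw DataFrame table.

```python
import math

# First five TWII closes from the raw DataFrame table
prices = [9025.3, 8997.19, 8846.31, 8883.21, 8782.72]


def log_returns(prices):
    """r_t = ln(P_t / P_{t-1}); the result is one element shorter than prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_window(values, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    count = len(values) - window
    xs = [values[i:i + window] for i in range(count)]
    ys = [values[i + window] for i in range(count)]
    return xs, ys


def to_prices(start, returns):
    """Invert the transform step by step: P_t = exp(r_t) * P_{t-1}."""
    out = [start]
    for r in returns:
        out.append(out[-1] * math.exp(r))
    return out


returns = log_returns(prices)
xs, ys = rolling_window(returns, window=2)

# Additivity: daily log returns sum to the log return over the whole span
total = math.log(prices[-1] / prices[0])

# Applying the inverse transform recovers the original prices
rebuilt = to_prices(prices[0], returns)
```

The first computed return corresponds to the 2011-01-04 row of the rolling-window table, and `sum(returns)` agrees with `total` up to floating-point error, which is the additivity property mentioned in the glossary.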