
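# LSTM Neural Network (Mostly Used for Time-Series Forecasting)

###### tags: `tensorflow.keras`

## Load the data (download it from [**my Google Drive**](https://drive.google.com/file/d/1kwmo-3PYVo_SExw7_Uo1rKkoOHsEzyEQ/view?usp=sharing))

```python
import pandas as pd

df = pd.read_csv('2317.csv')
df.isna().sum()  # isna() marks missing cells as True; sum() counts the missing values in each column
```

![](https://i.imgur.com/cDN25t4.png)

```python
df.drop(['Change'], axis=1, inplace=True)  # inplace=True: modify df itself instead of returning a copy
df.columns = ['date', 'open', 'high', 'low', 'close', 'rd', 'volume']  # rd is the stock's daily amplitude
```

```python
df.set_index(['date'], inplace=True)
df.rd.plot()
```

![](https://i.imgur.com/f8EOCY8.png)

## Normalization (values are scaled to lie between 0 and 1)

```python
from sklearn.preprocessing import MinMaxScaler

train_raw = df.iloc[:2670]  # first 2670 rows for training
test_raw = df.iloc[2670:]   # remaining rows for testing

scaler = MinMaxScaler()
# Fit the scaler on the training set only, then reuse the same min/max for the test set.
# Calling fit_transform on the test set again would scale it with different statistics.
train = pd.DataFrame(scaler.fit_transform(train_raw), columns=df.columns, index=train_raw.index)
test = pd.DataFrame(scaler.transform(test_raw), columns=df.columns, index=test_raw.index)
```

## Extract the input data and the target values

```python
import numpy as np
from tqdm.auto import tqdm  # progress bar, purely cosmetic

n = 30  # use the previous 30 days of x to predict y; the data starts on 1/4, so the first predictable y falls on 2/3
feature_names = list(train.drop('rd', axis=1).columns)
X = []
y = []
indexes = []
norm_data_x = train[feature_names]
for i in tqdm(range(0, len(train) - n)):
    X.append(norm_data_x.iloc[i:i + n].values)  # iloc[n, m] picks one cell; iloc[n:m] picks rows n to m-1
    y.append(train['rd'].iloc[i + n - 1])       # y: the rd value on row i+n-1, the 30th day counted from row i
    indexes.append(train.index[i + n - 1])      # date of that y

X = np.array(X)  # shape: (samples, 30, 5)
y = np.array(y)  # shape: (samples,)
```

![](https://i.imgur.com/rk1zgAu.png)

## Build the LSTM model

### Mostly used for time-series data

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

n_steps = 30    # number of past days in each window
n_features = 5  # number of feature columns (open, high, low, close, volume)

model = Sequential()
# LSTM(number of units, activation, return_sequences, input_shape=(steps, features)):
# return_sequences=True outputs every time step (e.g. when stacking another LSTM);
# False outputs only the last time step, which is enough for a single predicted value.
model.add(LSTM(50, activation='relu', return_sequences=False, input_shape=(n_steps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse', metrics=['mse', 'mape'])
```

## Training

```python
history = model.fit(X, y, batch_size=100, epochs=30)
```

```python
print(history.__dict__)
```

![](https://i.imgur.com/VoZKoN9.png)

```python
import matplotlib.pyplot as plt

plt.title("train_loss")
plt.plot(history.history['loss'])
plt.show()
```

![](https://i.imgur.com/wQambc4.png)

## Testing

```python
# Predict with the model. Note: X holds the training windows, so these are in-sample predictions.
predictions = model.predict(X)
predictions = pd.DataFrame(predictions).rename(columns={0: 'predicted'})  # convert to a DataFrame
y_true = pd.DataFrame(y).rename(columns={0: 'actual'})
final = pd.concat([predictions, y_true], axis=1)  # put predicted and actual values side by side
final['mae'] = abs(final['predicted'] - final['actual'])  # absolute error of each sample (scaled units)
final
```

![](https://i.imgur.com/C7NjIM6.png)

```python
# Turn the normalized data back into the original values.
# inverse_transform must be applied to scaled data (train), not to the raw df.
raw_data = pd.DataFrame(scaler.inverse_transform(train), columns=df.columns, index=train.index)
raw_data
```

![](https://i.imgur.com/ZMjnSVn.png)

# Conclusion: the error is quite large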
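The normalization and windowing steps in these notes can be sketched with plain NumPy. This makes it easier to check two things: the test set reuses the training min/max, and the window shapes match `input_shape=(30, 5)`. This is a minimal sketch, not the notes' actual pipeline. The data is a hypothetical random array, and the helper names `minmax_fit`, `minmax_apply` and `make_windows` are illustrative. The label indexing (`i + n - 1`) follows the loop in the notes.

```python
import numpy as np


def minmax_fit(train):
    """Return per-column min and range, computed on the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, col_range


def minmax_apply(data, col_min, col_range):
    """Scale data with the training statistics (same idea as MinMaxScaler.transform)."""
    return (data - col_min) / col_range


def make_windows(features, target, n=30):
    """Stack sliding windows of n rows; the label is the target on row i + n - 1."""
    X, y = [], []
    for i in range(len(features) - n):
        X.append(features[i:i + n])
        y.append(target[i + n - 1])
    return np.array(X), np.array(y)


# Hypothetical toy data: 100 days x 6 columns (open, high, low, close, rd, volume)
rng = np.random.default_rng(0)
data = rng.random((100, 6)) * 100
train_raw, test_raw = data[:80], data[80:]

col_min, col_range = minmax_fit(train_raw)
train = minmax_apply(train_raw, col_min, col_range)
test = minmax_apply(test_raw, col_min, col_range)  # may fall slightly outside [0, 1]

rd_col = 4
feature_cols = [0, 1, 2, 3, 5]  # every column except rd
X_train, y_train = make_windows(train[:, feature_cols], train[:, rd_col], n=30)
print(X_train.shape, y_train.shape)
```

With 80 training rows and `n=30`, the loop yields 50 windows of shape `(30, 5)`. The test set is deliberately scaled with the training statistics, so its values are not guaranteed to stay inside [0, 1].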
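The `mae` column in the testing step is measured in scaled units. To judge whether the error is "large" in terms of the real amplitude, the scaled `rd` values can be mapped back with `x = s * (max - min) + min`. This is a minimal sketch under the assumption that `rd` was scaled with the default 0 to 1 range. In the notes' setup, `min` and `max` would presumably be `scaler.data_min_[i]` and `scaler.data_max_[i]`, with `i = df.columns.get_loc('rd')`. The numbers below are hypothetical, and the helper functions are illustrative rather than from any library.

```python
import numpy as np


def inverse_minmax_column(scaled, col_min, col_max):
    """Map MinMax-scaled values of ONE column back to original units: s * (max - min) + min."""
    return np.asarray(scaled) * (col_max - col_min) + col_min


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))


# Hypothetical values for illustration: min and max of the rd column in the training set
rd_min, rd_max = 0.5, 8.5
pred_scaled = np.array([0.10, 0.25, 0.50])    # model outputs (scaled)
true_scaled = np.array([0.125, 0.25, 0.375])  # targets (scaled)

pred_rd = inverse_minmax_column(pred_scaled, rd_min, rd_max)
true_rd = inverse_minmax_column(true_scaled, rd_min, rd_max)
mae_rd = mean_absolute_error(true_rd, pred_rd)  # MAE in original rd units
print(mae_rd)
```

Min-max scaling is linear, so the MAE in original units equals the scaled MAE multiplied by `(max - min)` of `rd`. Only that one column's statistics are needed, and the whole feature matrix does not have to be inverse-transformed.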