Time Series Prediction

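2019deeplearn hw3 DeeplearnHW03 410535020 Information Management, 4th year, 葉松

###### tags: `Fundamental Deep Learning Assignments 2019`

---

# Table of Contents

[TOC]

# 1. Problem

---

In this assignment, you are given the first part of a time series, $\{x(t)\}_{t=1}^{1000}$. The goal is to design a recurrent neural network to predict the second part of the series, $\{x(t)\}_{t=1001}^{1500}$, which the teacher will use to test your network. This testing data set will not be available to you.

# 2. Solution Steps

---

Since we want to use the first 1000 data points to predict the values of the following 500, I used an LSTM recurrent neural network. Starting from the first data point, each training sample takes 99 consecutive values as the input and the value that follows them (the 100th) as the target. These samples are fed into the model for training so that it can eventually forecast the series. The steps are:

**Data preprocessing -> Build model -> Train model -> Predict -> Evaluate**

### Data preprocessing

First, install and import the modules we need.

```python
import keras
import numpy as np
```

Next, download the training data from cloud storage.

```python
path = keras.utils.get_file('A3_train.txt', origin='https://drive.google.com/uc?export=download&id=184ipV1dPL8tEW_SgXJqDcngOOKNRN4Lx')
text = open(path)
```

Read the values from the TXT file into the `times` list so they are easy to use later.

```python
times = []
# Read the 1000 data points into the times list
for i in range(1000):
    s = text.readline().rstrip()
    times.append(float(s))
text.close()
```

Then, starting from the first data point, slide a window of 100 values over the series and split the resulting windows into training and test sets.

```python
result = []
# Each window holds 100 values: the first 99 are the input, the 100th is the target
sequence_length = 100
for index in range(len(times) - sequence_length):
    result.append(times[index: index + sequence_length])
result = np.array(result)

row = 500
train = result[:row, :]
np.random.shuffle(train)

# The LSTM expects 3-D input: (samples, timesteps, features)
X_train = train[:, :-1].reshape(-1, sequence_length - 1, 1)
y_train = train[:, -1]
X_test = result[row:, :-1].reshape(-1, sequence_length - 1, 1)
y_test = result[row:, -1]
```

### Build the model

```python
from keras.models import Sequential
from keras.layers import Dense, Flatten, LSTM, TimeDistributed

model = Sequential()
# Two stacked LSTM layers that return the full output sequence
model.add(LSTM(20, input_shape=(sequence_length - 1, 1), return_sequences=True))
model.add(LSTM(25, return_sequences=True))
# One value per timestep, flattened and reduced to a single prediction
model.add(TimeDistributed(Dense(1)))
model.add(Flatten())
model.add(Dense(20, activation='linear'))
model.add(Dense(1))
model.summary()
```

### Train the model

Once the loss function is chosen, training can begin!

```python
model.compile(loss="mse", optimizer="adam", metrics=['mean_absolute_error'])
epochs = 50
model.fit(X_train, y_train, batch_size=50, epochs=epochs, validation_split=0)
```

### Predict the next 500 values

Now the trained model is used to make predictions. Each predicted value is appended to the input window and the oldest value is dropped, so later predictions are based on earlier ones.

```python
# Start from the last 99 known values
s1 = list(times[-(sequence_length - 1):])
predictedset = []
for i in range(500):
    x = np.array(s1).reshape(1, sequence_length - 1, 1)
    predicted = model.predict(x)
    out = predicted[0, 0]
    predictedset.append(out)
    # Print the predictions as a comma-separated list
    if i != 499:
        print(str(out) + ",", end='')
    else:
        print(str(out), end='')
    # Slide the window: append the prediction, drop the oldest value
    s1.append(out)
    s1 = s1[1:]
```

### Plot the predicted curve

Plot the predictions we just produced.

```python
import matplotlib.pyplot as plt

try:
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.plot(predictedset)
    plt.show()
except Exception as e:
    print(str(e))
```

# 3. Performance Evaluation

![](https://i.imgur.com/OPhsrzc.png)

The training loss turned out fairly well, dropping below 0.04, and the overall curve looks reasonably close to the trend of the first 1000 points, although some finer details still differ from the original plot.

One problem I ran into: if each training sample uses 50 points or fewer instead of 100, the peaks and troughs of the predicted curve tend to be roughly half the size of those in the first 1000 points, and toward the 500th predicted point the curve even gradually flattens out. I think this may be because each training sample covers a narrower range when the window is small, so the predictions cannot escape that small local region and the values just bounce back and forth within it.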
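
To try different window sizes without retraining the LSTM each time, the windowing and feed-back prediction steps can be separated from the model. The following is only a minimal sketch under stated assumptions, not the code used in this assignment: `make_windows`, `recursive_forecast`, and `mean_predictor` are hypothetical names introduced here, the sine series is synthetic stand-in data, and the window mean is just a placeholder predictor so that the example runs without Keras.

```python
import numpy as np


def make_windows(series, sequence_length):
    """Split a 1-D series into overlapping windows of `sequence_length` values.

    Returns (X, y): X holds the first sequence_length - 1 values of each
    window, y holds the last value of each window.
    """
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - sequence_length + 1
    windows = np.stack([series[i:i + sequence_length] for i in range(n_windows)])
    return windows[:, :-1], windows[:, -1]


def recursive_forecast(predict_next, history, n_steps):
    """Forecast n_steps values, feeding each prediction back into the window."""
    window = [float(v) for v in history]
    forecast = []
    for _ in range(n_steps):
        value = float(predict_next(np.array(window)))
        forecast.append(value)
        # Slide the window: drop the oldest value, append the prediction
        window = window[1:] + [value]
    return np.array(forecast)


def mean_predictor(window):
    """Placeholder predictor (assumption): returns the mean of the window."""
    return window.mean()


# Synthetic stand-in data (assumption): 1000 samples of a sine wave
series = np.sin(np.linspace(0, 20 * np.pi, 1000))
X, y = make_windows(series, sequence_length=100)
forecast = recursive_forecast(mean_predictor, series[-99:], n_steps=500)
print(X.shape, y.shape, forecast.shape)
```

With a trained Keras model, the placeholder could be swapped for something like `lambda w: model.predict(w.reshape(1, -1, 1))[0, 0]`, which keeps the forecasting loop unchanged while `sequence_length` is varied.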