###### tags: `畢專` `paper`
# Listening to Chaotic Whispers: A Deep Learning Framework for News-oriented Stock Trend Prediction
[TOC]
## ABSTRACT
* Stock trend prediction plays a critical role in seeking to maximize profit from stock investment.
* The advancing development of natural language processing and text mining techniques has enabled investors to unveil market trends and volatility from online content.
* Difficulties
    * The highly volatile and non-stationary nature of the stock market.
    * A large portion of online content consists of low-quality news, comments, or even rumors.
* Solution
    * Imitate the learning process of human beings facing such chaotic online news.
    * Use Hybrid Attention Networks (HAN) to predict the stock trend based on the sequence of recent related news.
    * Use a self-paced learning mechanism to imitate the third of the three principles below:
        * Sequential context dependency
        * Diverse influence
        * Effective and efficient learning
* Extensive experiments on real-world stock market data demonstrate the effectiveness of our framework. A further simulation illustrates that a straightforward trading strategy based on our proposed framework can significantly increase the annualized return.
## INTRODUCTION
### Traditional Approaches
1. Technical analysis upon recent prices and trading volumes on the market.
    * Problem: it is limited in unveiling the rules that govern the drastic dynamics of the market.
2. Fundamental analysis of the financial statements of each company.
    * Problem: it is incapable of capturing the impact of recent trends.
### The Advancing Development of Natural Language Processing Techniques
* This has inspired increasing efforts on stock trend prediction by automatically analyzing stock-related articles.
* **Tetlock** extracts and quantifies the optimism and pessimism of Wall Street Journal reports, observing that pessimistic reports tend to be followed by a downtrend and then a reversion of market prices.
* However, a large portion of the online content consists of low-quality news, comments, or even rumors.
### Three Principles
* These imitate the learning process of human beings facing such chaotic online news:
> 1. Sequential Context Dependency
> * Even with incomplete information, human beings can still make good decisions based on the current context, and can identify the importance of each part to treat it differently.
> 2. Diverse Influence
> * Even when information comes from diverse sources, human beings can comprehensively consider the real influence of each piece of news, giving major news (e.g., wars) more weight.
> 3. Effective and Efficient Learning
> * Human beings tend to first gain knowledge by focusing on informative occasions, and then turn to disturbing evidence to gain experience with the harder cases.
### Hybrid Attention Networks (HAN)
* To capture the first two principles of the human learning process:
* Predict the stock trend based on the sequence of recent related news.
* recurrent neural networks (RNN)
    * Enables the processing of recent related news for a stock in a unified sequence.
    * The attention mechanism is capable of identifying the more influential time periods of the sequence.
* We propose news-level attention-based neural networks at the lower level, which aim at recognizing the more important news among others at the same time point.
### Self-paced Learning (SPL) Mechanism
* To imitate the effective and efficient learning of human beings.
* Since news-based stock trend prediction is more challenging in some situations, SPL enables us to automatically skip training samples from those challenging periods in the early stage of model training, and progressively increase the complexity of the training samples.
## RELATED WORK
* Traditional approaches can be categorized into two primary types: technical and fundamental analysis.
* Technical analysis
: The main goal of this type of approach is to discover the trading patterns that can be leveraged for future prediction.
    * A widely used model is the linear and stationary Autoregressive (AR) model, but the non-linear and non-stationary nature of stock prices limits its applicability.
    * To further model the long-term dependency in time series, recurrent neural networks (RNN), especially the Long Short-Term Memory (LSTM) network, have also been employed in financial prediction.
    * Zhang proposed a new RNN, called State Frequency Memory (SFM), to discover multi-frequency trading patterns for stock price prediction.
    * However, technical analysis is incapable of unveiling the rules that govern the dynamics of the market beyond price data.
* Fundamental analysis
: Seeks information beyond historic market data, such as geopolitics, the financial environment, and business principles, especially news.
    * Nassirtoussi proposed a multi-layer dimension-reduction algorithm with semantics and sentiment to predict intra-day directional movements of a currency pair in the foreign exchange market.
    * Ding proposed a deep learning method for event-driven stock market prediction.
    * Wang performed a text regression task to predict the volatility of stock prices.
    * Xie introduced a novel tree representation and used it to train predictive models with tree kernels using support vector machines.
    * Hagenau extracted a large set of expressive features to represent the unstructured text data and employed a robust feature selection to enhance stock prediction.
    * ...among many other examples.
## EMPIRICAL ANALYSIS
### Sequential Context Dependency
* By broadly analyzing a sequence of news and combining it into a unified context, each piece of news can provide complementary information, so a more reliable assessment of the stock trend can be made.
> * The oil-industry reforms announced in September 2014 and March 2017.
> * The stock price of SINOPEC, one of the largest gasoline companies in China.
> * In the two charts, the yellow circles with volatility symbols mark the two news items announcing the start of each reform. The difference between the two reforms can be inferred from the preceding news sequence: the news before September 2014 carried falling signs, suggesting that the reform might have a negative impact, while the news before March 2017 carried rising signs, indicating a rather positive signal.
> * An ideal analysis process imitating humans should integrate and interpret each piece of news within its chronological context, rather than analyzing each one separately.
### Diverse Influence
> * Major news has a greater impact on the market than trivial news.
> * For example, the third news item in the figure, marked with a falling symbol, summarizes that SINOPEC's stock price had been falling continuously, which should send a negative signal about the price trend. However, compared with the positive news reporting the gasoline-industry reform in the same period, this negative news is far less important.
> * After SINOPEC's price fell for only one day, it started rising and kept the uptrend for quite a long period, ==demonstrating that the influence of the negative news was indeed weaker than that of the positive news==.
> * Conclusion: an ideal framework should be able to distinguish news with stronger and more lasting influence and pay more attention to it.
### Effective and Efficient Learning
> 
> * Within those 8.4% of time periods when there is no news reported for more than 10 days, it is quite tough to make any news-oriented prediction.
> * An ideal learning framework for stock prediction should follow a similar process, learning from periods with more news first, especially in the early stage, and then further optimizing to cope with the more difficult samples.
## DEEP LEARNING FRAMEWORK FOR NEWS-ORIENTED TREND PREDICTION
### 1. Problem Statement
* Stock trend prediction is treated as a classification problem.
* For date t and stock s, the rise percent of the stock price is computed and discretized into three classes, DOWN, UP, and PRESERVE, representing the state on the next date.

* Given a time sequence of length N, each C~i~ contains a news set of size L, C~i~ = [ C~i1~, C~i2~, ..., C~iL~ ], denoting the L related news on date i.
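As a minimal sketch (not the paper's exact definition), the labeling rule can be written as follows, where the price reference p~t~ and the class thresholds δ~1~, δ~2~ are assumptions; the thresholds would be chosen so that the three classes are roughly balanced:

```latex
% rise percent of stock s from date t to t+1 (price reference is an assumption)
\mathrm{rise\_percent}_t(s) = \frac{p_{t+1}(s) - p_t(s)}{p_t(s)}

% three-class label with hypothetical thresholds \delta_1 < 0 < \delta_2
y_t(s) =
\begin{cases}
  \text{DOWN}     & \text{if } \mathrm{rise\_percent}_t(s) < \delta_1 \\
  \text{PRESERVE} & \text{if } \delta_1 \le \mathrm{rise\_percent}_t(s) \le \delta_2 \\
  \text{UP}       & \text{if } \mathrm{rise\_percent}_t(s) > \delta_2
\end{cases}
```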
### 2. Hybrid Attention Networks (HAN)
* Based on the Sequential Context Dependency principle:
    * Our framework should interpret and analyze news in the sequential temporal context and pay more attention to critical time periods.
* Based on the Diverse Influence principle:
    * Our framework should distinguish the more significant news from the rest.
> 
> * The input is a news corpus sequence.
> * A news embedding layer encodes each news item into a news vector n~ti~.
> * A news-level attention layer assigns an attention value to each news vector within a date, and calculates the weighted mean of these news vectors as the corpus vector for that date.
> * These corpus vectors are encoded by bi-directional Gated Recurrent Units (GRU).
> * A temporal attention layer assigns an attention value to each date, and calculates the weighted mean of the encoded corpus vectors to represent the overall sequential context information.
> * The classification is made by a discriminative network.
#### News Embedding
* For the i^th^ news in the news corpus C~t~ of date t, we use a word embedding layer to calculate the embedded vector of each word, and then average all the word vectors to construct a news vector n~ti~.
* To reduce the complexity of the framework, we pre-train an unsupervised Word2Vec model as the word embedding layer rather than tuning its parameters during the learning process.
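A minimal sketch of this step, assuming a gensim Word2Vec model; the tokenized corpus and dimensionality here are placeholders (in the paper the embedding model is pre-trained beforehand, not tuned during learning):

```python
# Sketch: average pre-trained word vectors into a news vector n_ti.
import numpy as np
from gensim.models import Word2Vec

def embed_news(tokens, w2v):
    """Average the word vectors of one news item to get its news vector."""
    vecs = [w2v.wv[w] for w in tokens if w in w2v.wv]
    if not vecs:                       # no known word: fall back to zeros
        return np.zeros(w2v.vector_size)
    return np.mean(vecs, axis=0)

# hypothetical usage: corpus is a list of tokenized news items
corpus = [["sinopec", "announces", "reform"], ["oil", "price", "falls"]]
w2v = Word2Vec(corpus, vector_size=100, min_count=1)   # pre-trained in practice
news_vectors = [embed_news(doc, w2v) for doc in corpus]
```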
#### News-level Attention
* Since not all news contributes equally to predicting the stock trend, we introduce an attention mechanism that aggregates the news weighted by an assigned attention value, in order to reward the news offering critical information.
* We first estimate attention values by feeding the news vector n~ti~ through a one-layer network to get the news-level attention value u~ti~, and then calculate a normalized attention weight α~ti~ through a softmax function:
:::info
* softmax function

:::
* We then calculate the overall corpus vector d~t~ as the weighted sum of the news vectors, d~t~ = Σ~i~ α~ti~ n~ti~, and use this vector to represent all the news information for date t.
* Thus, we get a temporal sequence of corpus vectors D = [ d~i~ ], i ∈ [1, N ]. The attention layer can be trained end-to-end and thus gradually learns to assign more attention to reliable and informative news based on its content.
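A minimal PyTorch sketch of the news-level attention; the one-layer scoring network follows the description above, but the dimensions are illustrative rather than the paper's hyper-parameters:

```python
# Sketch: news-level attention producing the corpus vector d_t for one date.
import torch
import torch.nn as nn

class NewsAttention(nn.Module):
    def __init__(self, dim=100):
        super().__init__()
        self.proj = nn.Linear(dim, 1)          # one-layer network -> u_ti

    def forward(self, news_vecs):
        # news_vecs: (L, dim) news vectors n_ti of one date t
        u = self.proj(news_vecs).squeeze(-1)   # attention values u_ti, shape (L,)
        alpha = torch.softmax(u, dim=0)        # normalized weights alpha_ti
        d_t = (alpha.unsqueeze(-1) * news_vecs).sum(dim=0)  # corpus vector d_t
        return d_t, alpha
```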
#### Sequential Modeling
* To encode the temporal sequence of corpus vectors, we adopt Gated Recurrent Units (GRU)
:::info
* GRU is a variant of recurrent neural networks that uses a gating mechanism to track the state of sequences without separate memory cells.
> 
> * At date t, the GRU computes the news state h~t~ by linearly interpolating the previous state h~t−1~ and the current updated state ˜ht(GRU通過linearly interpolated先前的h~t-1~和當前更新狀態˜ht來計算新聞的狀態h~t~)
> 這個該怎翻阿, 我英文就爛-[name=UsePerson]
> 
> * The current updated state h̃~t~ is computed by non-linearly combining the corpus vector input of this time-stamp and the previous state:
> $$\tilde{h}_t = \tanh\big(W_h d_t + r_t \odot (U_h h_{t-1}) + b_h\big)$$
> * Here r~t~ denotes the reset gate, controlling how much of the past state should be used when updating the new state, and z~t~ is the update gate, deciding how much past information should be kept and how much new information should be added. The two gates are calculated by:
> $$r_t = \sigma(W_r d_t + U_r h_{t-1} + b_r), \qquad z_t = \sigma(W_z d_t + U_z h_{t-1} + b_z)$$
> * Therefore, we can get the latent vector for each date t through the GRU. In order to capture information from both the past and the future of a news item as its context, we concatenate the latent vectors from both directions to construct a bi-directional encoded vector h~i~.
> * h~i~ incorporates the information of both its surrounding context and itself. In this way, we encode the temporal sequence of corpus vectors D = [ d~i~ ], i ∈ [1, N ].
:::
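A minimal sketch of the sequential encoding using PyTorch's built-in bi-directional GRU, which implements the update/reset-gate equations above; all dimensions are illustrative:

```python
# Sketch: encode the corpus-vector sequence D with a bi-directional GRU.
import torch
import torch.nn as nn

dim, hidden = 100, 64
gru = nn.GRU(input_size=dim, hidden_size=hidden,
             bidirectional=True, batch_first=True)

D = torch.randn(1, 10, dim)   # (batch, N, dim) corpus vectors d_1..d_N
H, _ = gru(D)                 # H: (batch, N, 2*hidden); the forward and
                              # backward states are concatenated into the
                              # bi-directional encoded vectors h_i
```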
#### Temporal Attention
:::info
> * The temporal-level attention mechanism incorporates both the inherent temporal pattern and the news content to distinguish the temporal differences:
> $$\beta_i = \frac{\exp(o_i^{\top} \theta_i)}{\sum_{j=1}^{N} \exp(o_j^{\top} \theta_j)}$$
> * θ~i~ is the parameter for each date in the softmax layer, indicating which date is in general more significant.
> * o is the latent representation of the encoded corpus vectors.
> * The attention vector β is used to distinguish the temporal differences between dates.
> * We use β to calculate the weighted sum V, so that V incorporates the sequential news context information together with the temporal attention, and is then used for classification.
:::
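A minimal PyTorch sketch of the temporal attention layer; the exact form of the latent representation o is an assumption (here simply tanh of the encoded vectors), and θ is the per-date parameter of the softmax layer described above:

```python
# Sketch: temporal attention over the encoded corpus vectors h_i.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, n_dates=10, enc_dim=128):
        super().__init__()
        self.theta = nn.Parameter(torch.randn(n_dates, enc_dim))  # one theta_i per date

    def forward(self, H):
        # H: (N, enc_dim) bi-directionally encoded corpus vectors h_i
        o = torch.tanh(H)                                       # latent representations o
        beta = torch.softmax((o * self.theta).sum(-1), dim=0)   # attention vector, (N,)
        V = (beta.unsqueeze(-1) * H).sum(dim=0)                 # weighted sum for classification
        return V, beta
```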
#### Trend Prediction
> The final discriminative network is a standard Multi-Layer Perceptron (MLP), which takes V as input and produces the three-class classification of the future stock trend.
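A minimal sketch of this discriminative network, assuming PyTorch; the hidden size is illustrative, and only the three-way output comes from the text:

```python
# Sketch: final MLP classifier over the attended context vector V.
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),   # V (the attended context vector) as input
    nn.Linear(64, 3),                # logits for DOWN / PRESERVE / UP
)
V = torch.randn(1, 128)              # placeholder context vector
trend_probs = torch.softmax(mlp(V), dim=-1)
```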
### 3. Self-paced Learning Mechanism
* For effective and efficient learning: since stock prediction is difficult in some situations, we first skip those difficult parts and then gradually increase the difficulty.
* Curriculum Learning can imitate the above learning process, but the order of its training samples is fixed by predetermined heuristics and cannot adapt to the dynamically learned model. Therefore, Kumar et al. designed Self-paced Learning (SPL).
> * Given a training set D = (x~i~, y~i~)$^n_{i=1}$, where x~i~ ∈ R^m^ denotes all the news inputs of the i^th^ observed sample, and y~i~ represents the corresponding stock trend label.
> * Let L(y~i~, HAN(x~i~, w)) denote the loss between the label y~i~ and the output of the whole model HAN(x~i~, w), where w represents the model parameters to be learned.
> * We assign each learning sample an importance weight v~i~. The goal of SPL is to jointly learn the model parameter w and the latent weight v = [v~1~, ..., v~n~]:
> $$\min_{w,\, v \in [0,1]^n} \; \sum_{i=1}^{n} v_i\, L\big(y_i, \mathrm{HAN}(x_i, w)\big) + f(v; \lambda) \tag{1}$$
> * f denotes a self-paced regularizer which controls the learning scheme by penalizing the latent weight variables; we adopt the linear regularizer:
> $$f(v; \lambda) = \lambda \left( \tfrac{1}{2} \lVert v \rVert_2^2 - \sum_{i=1}^{n} v_i \right) \tag{2}$$
> * λ is a hyper-parameter that controls the pace at which the model learns new samples.
> * We adopt Alternative Convex Search (ACS) to solve Equation (1).
> * ACS divides the variables into two disjoint blocks. In each iteration, one block of variables is optimized while the other block is fixed. With w fixed, the unconstrained closed-form solution for the linear regularizer (2) can be calculated by:
> $$v_i^{*} = \begin{cases} 1 - l_i / \lambda & \text{if } l_i \le \lambda \\ 0 & \text{otherwise} \end{cases} \tag{3}$$
> * v$_i^∗$ denotes the i^th^ element of the iterated optimal solution.
> * l~i~ denotes the loss of each element, L(y~i~, HAN(x~i~, w)).
> * In this way, the latent weights of samples that differ from what the model has already learned, i.e., samples with larger loss, receive a linearly decaying weight.
> 
> * The two blocks separated by ACS are updated alternately until the model converges.
> * Step 3 learns the optimal model parameters w^∗^ with the fixed and most recent v^∗^ by the standard back-propagation mechanism.
> * Step 4 learns the optimal weight variables v^∗^ with the fixed w^∗^ by the linear regularizer.
> * As training progresses and λ increases in Step 6, samples with larger loss are gradually incorporated, realizing the effective and efficient learning.
> :::warning
> * A small initial λ lets only samples with very small loss influence the learning.
> :::
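A minimal PyTorch sketch of the SPL training loop under the linear regularizer; `model`, `loader`, `optimizer`, and the λ schedule (`lam`, `grow`) are placeholders, not the paper's settings:

```python
# Sketch: self-paced learning loop with the closed-form weight update (3).
import torch

def spl_weights(losses, lam):
    """Closed-form solution (3): v_i = 1 - l_i/lambda if l_i <= lambda, else 0."""
    return torch.clamp(1.0 - losses / lam, min=0.0)

def train_spl(model, loader, optimizer, lam=0.5, grow=1.3, epochs=10):
    ce = torch.nn.CrossEntropyLoss(reduction="none")
    for _ in range(epochs):
        for x, y in loader:
            logits = model(x)
            losses = ce(logits, y)              # per-sample loss l_i
            with torch.no_grad():
                v = spl_weights(losses, lam)    # Step 4: update v with w fixed
            (v * losses).mean().backward()      # Step 3: update w with v fixed
            optimizer.step()
            optimizer.zero_grad()
        lam *= grow                             # Step 6: increase lambda each epoch
    return model
```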
## EVALUATION
We first introduce our experimental setup. Then we conduct comprehensive experiments to evaluate the effectiveness of our deep learning framework, followed by a systematic trading simulation on the real-world market to examine its practical value.
### 1. Experimental Setup:
* **Data Collection:**
We first collect daily stock data for 2014-2017 (trading time, price, volume), and also collect economy-related news over the same four years (publish time, title, content). We then check whether the title or content of each news item is related to a specific stock, group the matched news accordingly, and discard the irrelevant news. Finally, the news is aggregated by date to build a daily news corpus.
* **Learning Setting:**
The rise percent of the stock price is divided into three classes: DOWN, UP, and PRESERVE. The dataset is split into a training set and a test set, accounting for 33% and 66% respectively. A validation set, 10% of the training set, is further randomly sampled from the training set to optimize the hyper-parameters and select the best epoch.
In the experiments, we tokenize all the news and remove unimportant words to build a vocabulary.
* **Compared Methods:**
To evaluate the effectiveness of our framework, we compare it with the following methods.
**1. Random Forest:**
We use a Random Forest (RF) classifier with 200 trees in the forest. We define a corpus vector for a date by averaging all the news vectors of that date, and concatenate the corpus vectors of 10 days as the input (see the sketch after this list).
**2. Multi-Layer Perceptron:**
We use a Multi-Layer Perceptron (MLP) classifier with three layers of sizes 256, 128, and 64. The input of the MLP is the same as that of the RF.
**3. News-RNN:**
We use a bi-directional GRU-based RNN whose output layer has 64 dimensions. The RNN model takes the corpus vectors organized in temporal order as input.
**4. One-RNN:**
To evaluate the effectiveness of the bi-directional setting, we use a standard one-direction GRU-based RNN for comparison.
**5. Temporal-Attention-RNN:**
To evaluate the temporal attention layer, we use Temporal-Attention-RNN (Temp-ATT), built by adding a single temporal-level attention layer on top of News-RNN.
**6. News-Attention-RNN:**
To evaluate the news attention layer, we use News-Attention-RNN (News-ATT), built by adding a single news-level attention layer that processes the corpus vectors before News-RNN.
**7. HAN:**
The hybrid attention network with the normal learning process.
**8. HAN-SPL:**
HAN with the self-paced learning mechanism.
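A minimal sketch of the input construction shared by the RF and MLP baselines, assuming scikit-learn and pre-computed news vectors; all shapes and data here are placeholders:

```python
# Sketch: build the 10-day concatenated corpus vector fed to RF/MLP.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def corpus_vector(news_vecs):
    """Average all news vectors of one date into a corpus vector."""
    return np.mean(news_vecs, axis=0)

# hypothetical data: 10 days, each with a few 100-dim news vectors
days = [np.random.randn(np.random.randint(1, 6), 100) for _ in range(10)]
x = np.concatenate([corpus_vector(d) for d in days])   # (10 * 100,) input

rf = RandomForestClassifier(n_estimators=200)          # 200 trees, as above
# rf.fit(X_train, y_train)  # X_train stacks such 10-day vectors per sample
```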
### 2. Effects of Two Attention Mechanisms
* Demonstration of **News Attention**:
Since the news attention values are computed by a single perceptron taking the embedded news vector as input, we can check whether our framework really identifies the more important news by computing the attention value of each news item.
According to the results, news falls into three levels by attention value:
Lowest: news that merely shows the price and code of the related stock.
Middle: news that records past market performance.
Highest: news that contains information predictive of the market.
* Visualization of **Temporal Attention**:
To further illustrate the effect of temporal attention, we compute all the temporal attention vectors on the test dataset. The last plot shows the overall distribution, which implies that recently reported news has more influence on the current stock trend than earlier reports.
The figure above shows five different patterns of temporal attention. Beyond the overall distribution, the attention distribution can be highly diverse and dynamic for different news content in different situations: informative news can express its influence in the temporal attention layer even from several days earlier.
### 3. Effects of Self-paced Learning
To evaluate the effectiveness of self-paced learning, we compare the training process with and without self-paced learning, using the convergence speed and the final result as the comparison metrics.
At the beginning, learning with SPL is slightly slower than without it, probably because SPL ignores some difficult samples in the early stage. However, as the epochs increase, the testing accuracy of the model with SPL rises and finally converges to a better result.
### 4. Overall Performance Experiments
* **Classification Accuracy Result:**

在實驗環境中,三標籤分類問題,因為三標籤幾乎
均分,我們選擇用accuracy,即將在全部test dataset 中正確結果的比例作為評估指標。 顯示了所有方法的最終結果
在圖8中,其中每個條形圖表示testing dataset的平均accuracy,而我們的框架是所有的方法中最確率的。
* **Market Trading Simulation:**
To further evaluate the effectiveness, we use back-testing to simulate stock trading for about one year, with our strategy trading every day. At the beginning of each trading day, the model gives each stock a score, defined as the probability of an uptrend minus the probability of a downtrend.
Based on these scores, a straightforward portfolio construction strategy called top-K selects the K highest-scoring stocks as the new portfolio for the next day; the selected K stocks are invested evenly at their opening prices on the next trading day.
To stay close to real trading conditions, we also count 0.3% of each transaction as transaction cost. In addition, we compute a baseline indicating the overall market trend by evenly holding every stock.

To evaluate the performance of each prediction method, we use the annualized return, i.e., the cumulative profit per year, as the metric. In practice, investors always select multiple stocks to avoid risk, so we invest evenly in the top K stocks, with K set to 20, 40, 60, and 80, and compare the results in Figure 9.
In general, the hottest stocks should be selected. However, when K is 20, the improvement is less obvious, because selecting a small number of stocks is likely to cause a high turnover rate, which leads to high transaction costs that offset the profit.
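A minimal sketch of this top-K back-test, assuming daily score and opening-price matrices; the cost handling is simplified to full daily turnover, the 250-day year is one common annualization convention, and all names are hypothetical:

```python
# Sketch: top-K daily-rebalanced back-test with a flat transaction cost.
import numpy as np

def backtest_topk(scores, opens, k=40, cost=0.003):
    """scores, opens: (days, stocks) arrays; score = P(UP) - P(DOWN).
    Buy the K top-scored stocks evenly at the next day's open, hold one day."""
    capital, curve = 1.0, []
    for t in range(len(scores) - 2):
        picks = np.argsort(scores[t])[-k:]            # K highest-scoring stocks
        ret = opens[t + 2, picks] / opens[t + 1, picks] - 1.0
        capital *= 1.0 + ret.mean() - cost            # even split + 0.3% cost
        curve.append(capital)
    years = (len(scores) - 2) / 250.0                 # ~250 trading days per year
    annualized = capital ** (1.0 / years) - 1.0
    return annualized, curve
```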

We also plot the cumulative profit curve of each method for each K. As shown, the results are consistent with the overall performance analysis above, and our proposed framework achieves the best profit.
The two evaluation metrics, accuracy and annualized return, give consistent results across the different methods.
The profit curves in the figure show that the profit gaps between the methods barely change over time.
More importantly, our HAN-SPL framework achieves the best accuracy of 0.478 (Figure 8) and the highest profit under all settings (Figure 9); in particular, investing in the top 40 stocks yields the highest annualized return of 0.611, which is remarkable compared with the market performance of 0.04.
Next, we further analyze the results of the different methods to discuss why our framework works.
### 5. Performance Discussion
* **Discussion on the RNN setting:**
MLP and RF perform relatively poorly, probably because their inputs are not organized as sequential contexts. This shows the importance of using an RNN, since an RNN processes the news within its sequential context.
* **Discussion on the bi-directional GRU setting:**
News-RNN and One-RNN use similar module architectures, but
News-RNN: bi-directional GRU
One-RNN: one-directional GRU
The results show that News-RNN performs better, which indicates that the bi-directional News-RNN can use both past and future information to strengthen the prediction.
* **Discussion on the attention mechanisms:**
The results show that the two models with one attention layer, Temp-ATT and News-ATT, both outperform News-RNN, which has no attention layer. This demonstrates the effectiveness of distinguishing the influence of different news items and dates.
* **Discussion on our proposed HAN framework:**
The results show that HAN performs the best among all the modules, which indicates that combining the two attention layers achieves the highest effectiveness.
Moreover, HAN trained with self-paced learning is the strongest of all, showing that effective and efficient learning indeed matters.
## CONCLUSION AND FUTURE WORK
* **Conclusion:**
This paper identifies the principles of human behavior when predicting the stock market, namely sequential context dependency, diverse influence, and effective and efficient learning. Based on these principles, we propose a novel learning framework, HAN with a self-paced learning mechanism, to predict the future trend of stocks from online news. Extensive experiments on real stock market data demonstrate that the proposed framework is practical and highly accurate. Meanwhile, back-testing shows that our framework can generate considerable profit and obtain excess returns in a one-year trading simulation.
* **Future Work:**
In the future, besides modeling a single stock and all of its related news, we plan to further use real-world relationships between industries to understand the relations between news events and individual stocks.
In addition, we will study how to combine news-oriented methods with technical analysis for more accurate stock trend prediction.