Portfolio Management Using Reinforcement Learning
1. Introduction to Portfolio Management
A portfolio is a collection of financial assets. Portfolio management is the decision-making process of continuously reallocating a fund across a number of financial assets to maximize return while restraining risk.
Previous work has run RL experiments on single-stock trading.
We planned to run more experiments on stock portfolio management and to explore the impact of the observation window size.
1.1 Assets
Assets can be cash, bonds, stocks, options, etc. In our project, the candidate assets are only stocks and cash, and the portfolio consists of cash plus \(m\) stocks. The notation is listed below:
\[\pmb{v_t} := (v_{0,t}, v_{1,t}, ..., v_{m,t} )^T \in R^{m+1}\]
\[{V_t} \in R^{(m+1)\times 5} \]
\[\pmb{y_t} := \pmb{v_t} \oslash \pmb{v_{t-1}} = (1, \frac{v_{1,t}}{v_{1,t-1}}, ..., \frac{v_{m,t}}{v_{m,t-1}})^T \in R^{m+1} \tag{1}\]
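Equation (1) can be checked numerically with a short sketch. It assumes that \(\pmb{v_t}\) holds closing prices and that cash is the asset with index 0, whose price relative is always 1. The function name `price_relatives` and the sample prices are hypothetical.

```python
import numpy as np

def price_relatives(prices):
    """Compute y_t = v_t / v_{t-1} element-wise, with a leading cash column of ones.

    prices: array of shape (T, m) with the prices of m stocks over T days
            (assumed here to be closing prices).
    Returns an array of shape (T - 1, m + 1).
    """
    prices = np.asarray(prices, dtype=float)
    stock_rel = prices[1:] / prices[:-1]          # v_{i,t} / v_{i,t-1} for each stock
    cash_rel = np.ones((stock_rel.shape[0], 1))   # the cash price never changes
    return np.hstack([cash_rel, stock_rel])

# Two stocks over three days (made-up numbers)
prices = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.1, 19.0]])
y = price_relatives(prices)
print(y.shape)  # (2, 3)
```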
1.2 Portfolio
In our project, we assume we hold \(p_0\) in cash at the beginning, \(t=0\). We adjust the investment weights of the assets at the end of every trading day. The notation is listed below:
\[\pmb{w_t} := (w_{0,t}, w_{1,t}, ..., w_{m,t} )^T \in R^{m+1}, \ s.t.\ \sum_{i=0}^{m}w_{i,t} = 1\]
\[p_{t} := p_{t-1}\pmb{y_t} \cdot \pmb{w_{t-1}} \tag{2}\]
\[r_t := log(\frac{p_t}{p_{t-1}}) = log(\pmb{y_t} \cdot \pmb{w_{t-1}}) \tag{3}\]
Based on the above notation, we get:
\[p_{t}=p_{0} e^{\sum_{i=1}^{t} r_{i}}=p_{0} \prod_{i=1}^{t} \pmb{y}_{i} \cdot \pmb{w}_{i-1} \tag{4}\]
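The relationship between Equations (2), (3) and (4) can be illustrated with a small sketch. It ignores transaction costs, and the function name `portfolio_values` and the sample numbers are hypothetical.

```python
import numpy as np

def portfolio_values(p0, y, w):
    """Evolve the portfolio value without transaction costs.

    y: shape (T, m + 1), row k holds the price relatives y_{k+1}
    w: shape (T, m + 1), row k holds the weights w_k chosen at the end of day k
    Returns (values, log_returns) for t = 1..T.
    """
    growth = np.sum(y * w, axis=1)                 # y_t . w_{t-1}
    log_returns = np.log(growth)                   # Equation (3)
    values = p0 * np.exp(np.cumsum(log_returns))   # Equation (4)
    return values, log_returns

y = np.array([[1.0, 1.1, 0.95],
              [1.0, 1.1, 1.00]])
w = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.5, 0.5]])
values, log_returns = portfolio_values(1000.0, y, w)

# The product form of Equation (4) gives the same final value
product_form = 1000.0 * np.prod(np.sum(y * w, axis=1))
```

Summing log returns rather than multiplying growth factors is numerically convenient over long horizons, and it is also the quantity that the reward in Section 2 is built from.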
1.3 Transaction Cost
Buying and selling stocks incurs commission fees. Assuming a constant commission rate \(\mu\) applied to the traded value, we get:
\[c_t := \mu\| \frac{\pmb{y_t} \odot \pmb{w_{t-1}}}{\pmb{y_t} \cdot \pmb{w_{t-1}}} - \pmb{w_t}\|_1 \tag{5}\]
\[r_{t} = log((1-c_t)\pmb{y}_{t} \cdot \pmb{w}_{t-1}) \tag{6}\]
\[p_{t} = p_{0} \prod_{i=1}^{t} (1-c_i)\pmb{y}_{i} \cdot \pmb{w}_{i-1} \tag{7}\]
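A minimal sketch of Equations (5) and (6) follows. The term \(\frac{\pmb{y_t} \odot \pmb{w_{t-1}}}{\pmb{y_t} \cdot \pmb{w_{t-1}}}\) is the weight vector after prices have moved but before rebalancing, so the cost is proportional to how far the new weights are from those drifted weights. The function names and the rate `mu=0.0025` are hypothetical values chosen for illustration.

```python
import numpy as np

def transaction_cost(y_t, w_prev, w_t, mu):
    """Equation (5): cost factor c_t of rebalancing from the drifted weights to w_t."""
    growth = float(np.dot(y_t, w_prev))       # y_t . w_{t-1}
    drifted = (y_t * w_prev) / growth         # weights after the price move
    return mu * np.abs(drifted - w_t).sum()   # mu times the L1 norm of the change

def net_log_return(y_t, w_prev, w_t, mu):
    """Equation (6): log return after deducting the transaction cost."""
    c_t = transaction_cost(y_t, w_prev, w_t, mu)
    return np.log((1.0 - c_t) * np.dot(y_t, w_prev))

y_t = np.array([1.0, 1.1, 0.9])
w_prev = np.array([0.0, 0.5, 0.5])
w_t = np.array([0.0, 0.5, 0.5])   # rebalance back to equal stock weights
mu = 0.0025                        # hypothetical commission rate

c_t = transaction_cost(y_t, w_prev, w_t, mu)
r_t = net_log_return(y_t, w_prev, w_t, mu)
```

If the agent keeps the drifted weights unchanged, the L1 term is zero and no cost is charged, which matches the intuition that holding is free.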
2. Reinforcement Learning Problem Formulation
2.1 Assumptions
2.2 Markov Decision Processes Formulation
Figure 1: Markov Decision Processes (MDP)
As Figure 1 shows, the Markov Decision Process (MDP) is a fundamental framework for reinforcement learning. We transform our financial problem into an MDP. Figure 2 further illustrates the transformation. The notation in both figures is defined as follows:
\[ O_t = (V_{t-n+1:t}, \pmb{w_{t-1}}), \ where\ V_{t-n+1:t} \in R^{\ n\times (m+1)\times 5} \]
\[ \pmb{a_t} = \pmb{w_{t}} \]
\[r'_t = \frac{1}{T}r_{t+1} = \frac{1}{T}log((1-c_{t+1})\pmb{y}_{t+1} \cdot \pmb{w}_{t}) \tag{8}\]
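The observation and reward definitions above can be sketched as array operations. This is an illustration under assumptions: `V` is taken to be a price tensor of shape (days, m+1, 5), where the 5 features are presumably the Open, High, Low, Close and Volume fields described in Section 4.1.1, and the function names are hypothetical.

```python
import numpy as np

def make_observation(V, w_prev, t, n):
    """Build O_t = (V_{t-n+1:t}, w_{t-1}) from the full history tensor V."""
    if t - n + 1 < 0:
        raise ValueError("not enough history for a window of length n")
    window = V[t - n + 1 : t + 1]   # the last n days, inclusive of day t
    return window, w_prev

def scaled_reward(y_next, w_t, c_next, T):
    """Equation (8): next-step log return scaled by the episode length T."""
    return np.log((1.0 - c_next) * np.dot(y_next, w_t)) / T

rng = np.random.default_rng(0)
V = rng.random((100, 4, 5))                  # 100 days, cash + 3 stocks, 5 features
w_prev = np.array([1.0, 0.0, 0.0, 0.0])      # all cash before the first trade
window, w = make_observation(V, w_prev, t=59, n=20)
print(window.shape)  # (20, 4, 5)
```

The window size `n` is the quantity varied in the experiments (20 and 60 days), so it determines the first dimension of the network input.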
Figure 2: MDPs for Portfolio Management
3. Method
Our problem differs from the traditional reinforcement learning problem in the following ways:
Because of the above characteristics, we choose to focus on exploitation rather than exploration. Hence, a deterministic policy may be a better choice for us than a stochastic one.
3.1 DDPG
DDPG (Lillicrap et al., 2015), short for Deep Deterministic Policy Gradient, is a model-free, off-policy actor-critic algorithm that combines DPG with DQN. DQN (Deep Q-Network) stabilizes the learning of the Q-function with experience replay and a frozen target network. The original DQN works in a discrete action space, and DDPG extends it to a continuous action space with the actor-critic framework while learning a deterministic policy. Figure 3 shows the pseudocode for this algorithm.
Figure 3: DDPG algorithm
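The two stabilizing ingredients mentioned above, experience replay and a slowly moving target network, can be sketched without any deep learning framework. This is an illustration only, not the implementation used in our experiments: `ReplayBuffer`, `soft_update` and the value `tau=0.1` are hypothetical, and parameters are represented as plain NumPy arrays. The soft update rule itself follows Lillicrap et al. (2015).

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Fixed-size store of (obs, action, reward, next_obs) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first

    def add(self, obs, action, reward, next_obs):
        self.buffer.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between transitions
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [tau * online + (1.0 - tau) * target
            for target, online in zip(target_params, online_params)]

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(obs=step, action=0.0, reward=0.0, next_obs=step + 1)
batch = buffer.sample(2)

target = [np.zeros(2)]
online = [np.ones(2)]
target = soft_update(target, online, tau=0.1)
```

A small `tau` keeps the target network close to frozen, which plays the same role in DDPG as the periodically copied target network in DQN.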
4. Experiments
4.1 Data Preparation
4.1.1 Data Source and Stock Selection
We used the S&P 500 dataset provided by Yahoo. The dataset contains the historical data of stocks in the U.S. market from 2015-01-02 to 2020-01-01. The historical data contains the Open, High, Low, Close and Volume of every stock on each day.
The dataset for our project consists of the historical price data of 17 stocks chosen from five S&P 500 sectors: Information Technology, Health Care, Financials, Consumer Discretionary, Communication Services (sorted by size in descending order). We chose stocks from the top five sectors to reduce industry-specific risk and improve model generalization. To evaluate model performance, we use both the seen stocks and 17 additional unseen stocks.
4.1.2 Data Visualization
To get an overview of the data, we visualized the close-price trends of the 17 stocks in the training dataset, shown in Figure 4. We noticed that the stock prices of Google and Amazon consistently outperformed those of the other companies, which is consistent with reality.
Figure 4: Normalized Close Price of 17 stocks in the training dataset
4.2 Evaluation Metrics
We use the following metrics to evaluate the performance of our agent on the test set.
4.3 Baseline
We choose market value as the baseline for our series of experiments. The market value is obtained by investing equally in all stocks in the pool.
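One way to compute such a baseline is sketched below. It assumes a buy-and-hold reading of "investing equally": the initial fund is split evenly across the stocks on the first day and never rebalanced. A daily rebalanced equal-weight portfolio would be another valid reading. The function name `market_value` and the prices are hypothetical.

```python
import numpy as np

def market_value(prices, p0):
    """Value over time of an equal initial allocation that is held without rebalancing.

    prices: array of shape (T, m) with the prices of m stocks over T days.
    """
    prices = np.asarray(prices, dtype=float)
    normalized = prices / prices[0]        # each stock's growth since day 0
    return p0 * normalized.mean(axis=1)    # equal share of p0 in every stock

prices = np.array([[10.0, 20.0],
                   [11.0, 19.0],
                   [12.1, 19.0]])
baseline = market_value(prices, p0=1000.0)
```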
4.4 DDPG Methods
Before analyzing our results, we clarify that:
4.4.1 DDPG with CNN
Training Performance
From Figure 5 and Figure 6 we notice that:
Figure 5: DDPG-CNN Training performance (window size=20)
Figure 6: DDPG-CNN Training performance (window size=60)
Testing Performance
In Figure 7, we show the results of trading on the 17 seen stocks using our model. We use the Sharpe ratio and cumulative returns to evaluate the model's performance over the test period (2019-01-01 to 2020-01-01). The Sharpe ratio takes risk into account.
The results show that a larger window size works better. For larger window sizes, the portfolio value exceeds the market value, which means that the DDPG+CNN model can make more profit than the equally distributed strategy.
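The two metrics named above can be computed from a series of daily portfolio values as in the sketch below. The exact definitions used for our results are not spelled out here, so the choices in this sketch are assumptions: simple daily returns, a risk-free rate of zero, and annualization with 252 trading days. The function names are hypothetical.

```python
import numpy as np

def cumulative_return(values):
    """Total return over the whole period, e.g. 0.04 means +4%."""
    values = np.asarray(values, dtype=float)
    return values[-1] / values[0] - 1.0

def sharpe_ratio(values, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of the daily simple returns (assumed definition)."""
    values = np.asarray(values, dtype=float)
    daily = values[1:] / values[:-1] - 1.0
    excess = daily - risk_free_rate / periods_per_year
    std = excess.std(ddof=1)               # sample standard deviation
    if std == 0.0:
        return float("nan")                # undefined when returns never vary
    return float(np.sqrt(periods_per_year) * excess.mean() / std)

values = np.array([100.0, 102.0, 101.0, 104.0])   # made-up portfolio values
total = cumulative_return(values)
sharpe = sharpe_ratio(values)
```

Two strategies with the same cumulative return can have very different Sharpe ratios, which is why both numbers are reported.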
Figure 7: DDPG-CNN Back-test Results on 17 Seen Stocks
Generalization Test Performance
Figure 8 shows the results of trading on 17 unseen stocks, which evaluates the model's generalization ability. The results show that the generalization performance is fairly good. The agent with window size = 60 beats the baseline at some points in the testing period.
Figure 8: DDPG-CNN Back-test Results on 17 Unseen Stocks
4.4.2 DDPG with LSTM
Training Performance
The two figures below show the training performance of two agents with window size=20 and window size=60. We notice that:
Figure 9: DDPG-LSTM Training performance (window size=20)
Figure 10: DDPG-LSTM Training performance (window size=60)
Testing Performance
Figure 11 shows the results of trading on the 17 seen stocks using DDPG+LSTM. In contrast to the training performance, the agent with the larger window size performs better and achieves a higher portfolio return. All agents make more profit than the equal-distribution strategy (the baseline).
Figure 11: DDPG-LSTM Back-test Results on 17 Seen Stocks
Generalization Test Performance
Figure 12 shows the results of trading on 17 unseen stocks, which evaluates the model's generalization ability. The generalization performance is not very good. The agents make a profit at some points in the test period, but the performance is not stable. In the last 50 steps, the agents with window size=20 and window size=50 make profits, which is still an optimistic trend.
Figure 12: DDPG-LSTM Back-test Results on 17 Unseen Stocks
5. Conclusion and future work
5.1. Conclusion
5.2. Future Work
6. References
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra. 2015. Continuous control with deep reinforcement learning
Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, Qionghai Dai. 2017. Deep Direct Reinforcement Learning for Financial Signal Representation and Trading
Zhengyao Jiang, Dixing Xu, Jinjun Liang. 2017. A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem
Derek Snow. 2020. Machine Learning in Asset Management—Part 1: Portfolio Construction—Trading Strategies
Appendix
Figure 13: 17 Seen Stocks
Figure 14: 17 Unseen Stocks