# ML in Practice Project Report (IEOR4579)
## Abstract
Our aim for this project is twofold: 1) replicate the project detailed in the paper [Stock price prediction using Generative Adversarial Networks](https://paperswithcode.com/paper/stock-price-prediction-using-generative) by H. Lin *et al.*, and 2) improve on some of the potential shortcomings we found in the paper.
## Paper Summary
The paper proposes a Generative Adversarial Network (GAN) to predict the closing price of AAPL over the following three days using the past 30 days' data. It uses a Gated Recurrent Unit (GRU) network as the generator and a Convolutional Neural Network (CNN) as the discriminator.
The paper then compares the performance of a traditional LSTM, a traditional GRU, a basic GAN, and a WGAN-GP model on stock data from Yahoo Finance and news data from SeekingAlpha. Model performance is evaluated using Root Mean Square Error (RMSE). The conclusion is that both the basic GAN and WGAN-GP models outperform the traditional LSTM and GRU models. Moreover, while the basic GAN model performs better during normal periods, the WGAN-GP model performs better during unexpected events such as COVID-19.
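For reference, RMSE is defined in the usual way over the $N$ predicted closing prices $\hat{y}_i$ and their true values $y_i$:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}
$$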
## Extensions
### Datasets and Features
#### Dataset in Original Paper
The original dataset used in the paper spans from DATE_START to DATE_END. There are 36 features in total.
The features arrive at different frequencies, which raises an open question:
- How do we structure the model so that it treats each frequency separately (e.g., with different learning rates)?
#### Our Dataset Descriptions
Data are downloaded from three main sources: 1) [Polygon.io](https://polygon.io/), 2) [FRED](https://fred.stlouisfed.org/) and 3) [Nasdaq Data Link](https://data.nasdaq.com/). We downloaded tick-level trades data for the top 5 cryptocurrencies (ranked by market cap) and converted them into bar data. Two kinds of bars are used: time bars (1-hour frequency) and dollar-value bars (one bar per $1MM of traded value). The target for the model is the Bitcoin (BTCUSD) closing price.
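Below is a minimal sketch of how tick-level trades can be aggregated into dollar-value bars, assuming a pandas DataFrame of trades with `price` and `size` columns indexed by timestamp; the column names and the helper itself are illustrative, not our exact pipeline code.

```python
import pandas as pd

# Sketch of dollar-value bar construction (illustrative, not pipeline code).
# A bar closes each time roughly $1MM of notional value has traded.
def dollar_bars(trades: pd.DataFrame, threshold: float = 1_000_000) -> pd.DataFrame:
    bars, bucket = [], []
    traded_value = 0.0
    for ts, row in trades.iterrows():
        bucket.append((ts, row["price"], row["size"]))
        traded_value += row["price"] * row["size"]
        if traded_value >= threshold:  # close the bar once the threshold is hit
            prices = [p for _, p, _ in bucket]
            bars.append({
                "time": bucket[-1][0],  # stamp the bar at its last trade
                "open": prices[0],
                "high": max(prices),
                "low": min(prices),
                "close": prices[-1],
                "volume": sum(s for _, _, s in bucket),
            })
            bucket, traded_value = [], 0.0
    return pd.DataFrame(bars).set_index("time")
```

Unlike time bars, this scheme samples more often when trading activity is high, which is the usual motivation for value-based bars.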
### Model Architecture
#### Methodology in original paper
In the paper by Lin et al., the authors set up a GAN with a Gated Recurrent Unit (GRU) network as the generator and a Convolutional Neural Network (CNN) as the discriminator. The generator consists of 3 GRU layers with 1024, 512, and 256 units respectively, followed by 2 Dense layers with 128 and 64 neurons. The discriminator consists of 3 one-dimensional convolution layers followed by 3 Dense layers. Following the core idea underlying GANs, the generator produces fake data in an attempt to deceive the discriminator, whereas the discriminator's job is to distinguish generated data from real data. Lin et al. use cross-entropy loss for both the generator and the discriminator.
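The Keras sketch below mirrors the layer sizes reported in the paper (GRU 1024/512/256 and Dense 128/64 for the generator; 3 Conv1D + 3 Dense layers for the discriminator). The input/output shapes, Conv1D filter counts, discriminator Dense widths, and activations are not fully specified in the paper and are our assumptions.

```python
from tensorflow.keras import Sequential, layers

# Generator: layer sizes follow the paper; the output head (3 units for the
# 3-day horizon) and the lack of activations on the Dense layers are assumptions.
def build_generator(n_timesteps=30, n_features=36, n_outputs=3):
    return Sequential([
        layers.Input(shape=(n_timesteps, n_features)),
        layers.GRU(1024, return_sequences=True),
        layers.GRU(512, return_sequences=True),
        layers.GRU(256),
        layers.Dense(128),
        layers.Dense(64),
        layers.Dense(n_outputs),  # predicted closing prices for the next 3 days
    ])

# Discriminator: 3 Conv1D + 3 Dense layers as in the paper; filter counts,
# widths, and LeakyReLU activations are illustrative assumptions. The input
# is assumed to be the 30 past prices concatenated with the 3 predicted ones.
def build_discriminator(seq_len=33):
    return Sequential([
        layers.Input(shape=(seq_len, 1)),
        layers.Conv1D(32, kernel_size=5, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv1D(64, kernel_size=5, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv1D(128, kernel_size=5, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(128),
        layers.LeakyReLU(0.2),
        layers.Dense(64),
        layers.LeakyReLU(0.2),
        layers.Dense(1, activation="sigmoid"),  # real vs. generated probability
    ])
```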
#### Potential Shortcomings
During the replication process, one issue we encountered was that we did not get the same results when plotting the generator loss (G_loss) and discriminator loss (D_loss) for the basic GAN model. The original D_loss and G_loss plots for the basic GAN model (Figure 1) are very smooth and both eventually converge. However, in our replication using the same code and dataset, the D_loss and G_loss curves for the basic GAN model fluctuate considerably and show no clear trend toward convergence. This is an issue we spotted when attempting to replicate the authors' original work. By contrast, when replicating the WGAN-GP model, our D_loss and G_loss plots are much smoother and resemble the ones presented in the paper. In the WGAN-GP model, D_loss decreases sharply to a small value below 1 and G_loss increases slightly, indicating that the training process is successful and complete, since the generator cannot be improved further. Although our replication of the D_loss and G_loss curves for the basic GAN model encountered problems, we arrive at the same conclusion as the authors: the WGAN-GP model performs better for this specific task.
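For context, the key difference in the WGAN-GP training loop is the gradient penalty added to the critic loss, which is what typically produces the smoother, more stable loss curves we observed. A minimal sketch follows, assuming a critic that scores a `(batch, timesteps, 1)` price sequence; the variable names are ours, and the penalty weight of 10 follows the standard WGAN-GP formulation rather than the authors' exact code.

```python
import tensorflow as tf

# Sketch of the WGAN-GP gradient penalty (standard formulation, not the
# authors' exact code). Penalizes the critic's gradient norm deviating from 1
# along random interpolations between real and generated sequences.
def gradient_penalty(discriminator, real, fake, gp_weight=10.0):
    batch_size = tf.shape(real)[0]
    # Interpolate uniformly between real and generated sequences
    alpha = tf.random.uniform([batch_size, 1, 1], 0.0, 1.0)
    interpolated = alpha * real + (1.0 - alpha) * fake
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = discriminator(interpolated, training=True)
    grads = tape.gradient(scores, interpolated)
    # Penalize deviation of the gradient norm from 1
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-12)
    return gp_weight * tf.reduce_mean(tf.square(norms - 1.0))
```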
#### Our Methodology
Hyperparameter tuning (a sketch of a possible search loop is shown below).
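A minimal sketch of what such a search could look like, assuming a hypothetical `train_wgan_gp` helper that trains the model for a given configuration and returns its validation RMSE; the grid values are illustrative.

```python
import itertools

# Illustrative grid; `train_wgan_gp` is a hypothetical training helper that
# returns validation RMSE for the given configuration.
grid = {
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "batch_size": [64, 128],
    "critic_steps": [1, 3, 5],  # discriminator updates per generator update
}

best_rmse, best_params = float("inf"), None
for lr, bs, k in itertools.product(*grid.values()):
    rmse = train_wgan_gp(learning_rate=lr, batch_size=bs, critic_steps=k)
    if rmse < best_rmse:
        best_rmse = rmse
        best_params = {"learning_rate": lr, "batch_size": bs, "critic_steps": k}
```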
### Conclusion
#### Conclusion from original paper
#### Our conclusion
# Proposed Framework
## 1. Replication
### 1.1 Paper Summary
Summary
Model Architecture Detail:
### 1.2 Shortcomings and Adjustments
Data preprocessing
### 1.3 Results
Which plots to show
RMSE table
## 2. Extensions: Apply the Model to Crypto Data
### 2.1 Data
### 2.2 Model modifications (if any)
### 2.3 Results and conclusions