10.039 Deep Learning Small Project
--
[TOC]
# Team
**Keith Ng** 1003515
**Li Yuxuan** 1003607
**Ng Jia Yi** 1003696
# Introduction
In this project, we design a deep learning model to assist with the diagnosis of pneumonia, distinguishing COVID from non-COVID cases, using X-ray images of patients. We engineered a lightweight convolutional neural network that can be trained and tested on the given X-ray scans. In this report, we cover our design process and the challenges that came with it.
# Proposed Methodology
## Selected Classifier
Why did we choose one 3-class classifier instead of two binary classifiers?
## Selected Model
We started off by researching models for COVID-19 chest X-ray image classification. One of the research papers we found (https://www.biorxiv.org/content/biorxiv/early/2020/07/17/2020.07.15.205567.full.pdf) compares VGG16 and ResNet50 and shows that VGG16 performs better on both the original and the augmented X-ray datasets. With a 90%-10% train-test split, VGG16 achieves 99.25% accuracy on the original dataset whereas ResNet50 achieves 50%; on the augmented dataset, VGG16 achieves 93.84% accuracy whereas ResNet50 achieves 93.38%.


Therefore, we decided to draw inspiration from VGG16 and designed our layers according to its architecture.

<section style="text-align:center">VGG16 architecture (<a href="https://neurohive.io/en/popular-networks/vgg16/">https://neurohive.io/en/popular-networks/vgg16/</a>)</section>
However, due to GPU limitations and the very slow convergence of the original VGG16, we cut down the network by removing some of its convolution layers. Our proposed model architecture is as follows:
```
Conv2d: in: 1, out: 64, k=3, s=1, p=1
ReLU
MaxPool: k=2, s=2
Conv2d: in: 64, out: 128, k=3, s=1, p=1
ReLU
MaxPool: k=2, s=2
Conv2d: in: 128, out: 256, k=3, s=1, p=1
ReLU
Conv2d: in: 256, out: 256, k=3, s=1, p=1
ReLU
MaxPool: k=2, s=2
Conv2d: in: 256, out: 512, k=3, s=1, p=1
ReLU
MaxPool: k=2, s=2
Conv2d: in: 512, out: 512, k=3, s=1, p=1
ReLU
MaxPool: k=2, s=2
AdaptiveAvgPool 7x7
Flatten
FC layer 1: in: 512*7*7, out: 4096
ReLU
Dropout
FC layer 2: in: 4096, out: 4096
ReLU
Dropout
FC layer 3: in: 4096, out: 1000
ReLU
Dropout
FC layer 4: in: 1000, out: 3
ReLU
```
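For reference, a minimal PyTorch sketch of this architecture is shown below. The class name `LightVGG` and the dropout probability (PyTorch's default of 0.5) are our own assumptions; everything else follows the listing above.
```
import torch.nn as nn

# Minimal sketch of the trimmed VGG16-style network listed above.
# Class name and dropout probability are assumptions, not part of the listing.
class LightVGG(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 1000), nn.ReLU(), nn.Dropout(),
            nn.Linear(1000, num_classes), nn.ReLU(),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.flatten(1)          # Flatten before the FC layers
        return self.classifier(x)
```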
## Dataset and Image Augmentation
The dataset given to us was relatively small, and we decided that we would not be able to train our model sufficiently on it alone. To cope with this, we used data augmentation to expand our training and validation sets. The code below shows our transform implementation.
```
# torchvision transform: resize to 224x224, then with probability 0.7
# apply a chain of crop-zoom, rotation and horizontal flip
from torchvision import transforms

train_transformer = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomApply([
        transforms.CenterCrop((180, 180)),        # zoom in on the centre
        transforms.Resize((224, 224)),            # scale back up to 224x224
        transforms.RandomRotation(20, fill=(0,)),
        transforms.RandomHorizontalFlip(),
    ], p=0.7),
])
```
Each image passed to our transformer has a 70% chance of going through a series of fixed transformations.
```
# Pass each image through the random transformer 10 times; every pass
# yields one augmented copy that is converted to a tensor and stored.
for _ in range(10):
    img = train_transformer(img)
    image = np.asarray(img) / 255    # scale pixel values to [0, 1]
    image = transforms.functional.to_tensor(image).float()
    imgs.append(image)
```
Every image goes through the transformer 10 times, and at each iteration we save the resultant image. Hence, every original image produces 10 augmented images in total. Below is an example of how 10 augmented images (derived from the same original image) turn out.
# Implementation
We did most of our initial model testing on Google Colab, before we were given access to the school GPU.
## Custom Dataset
We wrote a custom Dataset that categorizes the data into three classes: **Normal**, **Infected-COVID**, and **Infected-NONCOVID**. It inherits from the torch `Dataset` class, and the `__getitem__` method is overridden to output the augmented image together with its label. We used **0, 1, 2** to represent **Normal**, **Infected-COVID**, and **Infected-NONCOVID** respectively.
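A minimal sketch of such a Dataset subclass is shown below. The class name, the `samples` list of `(path, label)` pairs, and the grayscale conversion are illustrative assumptions rather than our exact implementation.
```
from PIL import Image
from torch.utils.data import Dataset

# Illustrative sketch only: names and the internal (path, label) list
# are assumptions, not our exact implementation.
class LungDataset(Dataset):
    CLASSES = {"normal": 0, "infected_covid": 1, "infected_noncovid": 2}

    def __init__(self, samples, transform=None):
        # samples: list of (image_path, label) tuples, label in {0, 1, 2}
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        img = Image.open(path).convert("L")   # grayscale X-ray
        if self.transform is not None:
            img = self.transform(img)         # apply augmentation
        return img, label
```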
## Data Augmentation
Since we have a limited number of images in our original dataset, we applied random image augmentation to every image, with a 70% probability that a transformation is applied. The augmentation combines *rotation*, *zooming* (a centre crop followed by a resize) and *horizontal flipping*. Each image is passed through the transformer **10** times, and the random transformations are chained, so each original image yields 10 augmented variants. Hence, if we supply 5,216 training images, we actually train our model on 52,160 images (though some of them will be identical to the original, since each transformation is only applied with 70% probability).
## Training
During training, we use a batch size of 32 images per training step. After every 200 batches, we run one validation pass. We wrote a custom function that prints out a graphical representation of the model's current validation performance, shown as follows:

The column label represents the model prediction, whereas the row label is the ground-truth label. If the prediction is correct, i.e. the predicted label matches the ground truth, it is colored green; otherwise it is colored red. We can also see that the data augmentation has taken effect on some of the images, which are rotated, flipped and zoomed in.
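A simplified sketch of this training loop is given below. The optimizer choice, learning rate, and the `validate` helper (which would render the prediction grid described above) are assumptions for illustration, not our exact code.
```
from torch import nn, optim
from torch.utils.data import DataLoader

def train(model, train_set, val_set, device, epochs=1):
    # Batch size of 32, as described above; optimizer and learning rate
    # are assumptions.
    loader = DataLoader(train_set, batch_size=32, shuffle=True)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(epochs):
        for batch_idx, (images, labels) in enumerate(loader):
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

            # Validate (and visualise predictions) once every 200 batches.
            if (batch_idx + 1) % 200 == 0:
                validate(model, val_set, device)  # hypothetical helper
```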
## Saving and Re-training
When we migrated our training onto the school GPU, we could train in less time and with a larger batch size of 32 (originally we trained on Colab with a batch size of only 8). However, due to the complicated layers and large number of parameters, training would crash around the third epoch when the CUDA GPU ran out of memory. Hence, we implemented a saving mechanism: during each training run (until it crashed), the model is saved whenever the validation accuracy reaches a new high. This ensures that we always resume each re-run from the best model so far. Moreover, for visual representation, the training loss, validation loss and accuracy are recorded for each batch in separate CSV files. In the event of a crash, the re-run appends to the previous accuracy, training-loss and validation-loss CSVs, so in the end we can plot an overall graph of how the three metrics evolve over time (number of batches).
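A simplified sketch of this save-and-resume bookkeeping is shown below. The function names and file names are assumptions, and the sketch writes all three metrics to a single CSV for brevity, whereas our actual runs kept separate files.
```
import csv
import os
import torch

def save_if_best(model, accuracy, best_accuracy, path="best_model.pt"):
    # Persist the weights whenever validation accuracy reaches a new high,
    # so a crashed run can resume from the best checkpoint.
    if accuracy > best_accuracy:
        torch.save(model.state_dict(), path)
        return accuracy
    return best_accuracy

def append_metrics(batch_idx, train_loss, val_loss, accuracy, path="metrics.csv"):
    # Append per-batch metrics; re-runs continue the same file, so one
    # overall plot can be made across crashes. (Single CSV for brevity.)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["batch", "train_loss", "val_loss", "accuracy"])
        writer.writerow([batch_idx, train_loss, val_loss, accuracy])
```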
> In summary: we ran our code on the school GPU for 20 epochs. The original model was too big, so we cut down its layers and reduced the batch size from 64 to 32. Training still kept running out of memory, so we saved the model whenever it improved and exported the results to CSV.
# Results
# Conclusion
What's our consensus... What we could have improved.