# Project 3: Fighting COVID-19

##### Team members: Artem Vysogorets (amv458), Aishwarya Kamath (ask762), Uladzislau Sobal (us441), Muhammad Shujaat Mirza (msm622)

##### Person responsible for uploading submissions: Uladzislau Sobal (us441)

The goal of our project is to classify chest x-ray images to detect patients infected with COVID-19. We will be using the following datasets:

- [COVID-19 chestxray dataset](https://github.com/ieee8023/covid-chestxray-dataset) - a dataset containing 250 chest x-rays of patients infected with MERS, SARS and COVID-19.
- [Chest Xray Pneumonia dataset](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) - a dataset of 5,000 x-ray images, including images of pneumonia patients.
- [PadChest dataset](https://arxiv.org/pdf/1901.07441.pdf) - a large dataset of 160,000 x-ray images, not limited to pneumonia or COVID-19.
- [Chest-Xray8 dataset](https://www.kaggle.com/nih-chest-xrays/data) - a dataset of 100,000 x-ray images annotated with diagnosis type.

## Proposed approach

The goal is to achieve the best possible classification performance on the COVID-19 chest x-ray dataset. We will train our models to classify:

- no pathology vs. pneumonia,
- bacterial vs. viral vs. COVID-19 pneumonia,
- patient survival.

We want to start with the simplest possible approach and iterate from there, using the [DenseNet](https://arxiv.org/pdf/1608.06993v3.pdf) architecture. We can also use the CheXNet implementation at [https://github.com/arnoweng/CheXNet](https://github.com/arnoweng/CheXNet) or the one at [https://github.com/jrzech/reproduce-chexnet](https://github.com/jrzech/reproduce-chexnet). The latter includes training code, while the former provides only a pretrained model. There is also a repo with tools for working with x-rays in PyTorch: [https://github.com/mlmed/torchxrayvision](https://github.com/mlmed/torchxrayvision).
The datasets are described in this paper: [https://arxiv.org/pdf/2002.02497.pdf](https://arxiv.org/pdf/2002.02497.pdf)

### Experiment 1

Take a DenseNet model pre-trained on chest x-rays and fine-tune it on the COVID-19 dataset.

### Experiment 2

Leverage the existing chest x-ray data to pre-train our model, then fine-tune it on the COVID-19 dataset.

### Experiments with different models

The idea is to explore different feature extractors, extract the features, and apply to them the different models we learned in this class: SVM, perceptron, logistic regression, and all the other fun stuff.

# Overall story

Contrast feature engineering + classical classifiers with features learned by neural networks. Since neural networks need lots of data to train, we try transfer learning: first pre-train on PadChest, which has the most training signal with lots of different labels, then fine-tune on NIH (Chest-Xray8), which is more specific to pneumonia classification, and finally fine-tune on our tiny COVID-19 dataset.

## Full list of experiments/action items:

- [x] Write a cross-validation loop for the final dataset so we can compare all the classifiers
- [ ] Build the final evaluation dataset by adding some 'No finding' images to the COVID-19 dataset (don't use the training images) [Vlad]
- [ ] Extract and store features
  - [ ] Neural network outputs for all the images in the dataset [Vlad]
  - [x] Fourier [Artem]
  - [x] Local Binary Pattern (LBP) [Artem]
  - [x] Histogram of Oriented Gradients (HOG) [Artem]
  - [ ] Elongated [Artem]
  - [ ] Other features from the [paper](https://arxiv.org/pdf/2004.05835.pdf)
- [ ] Implement feature augmentations
- [ ] Build and analyze models
  - [ ] SVM
  - [ ] Logistic regression
  - [ ] Perceptron
  - [ ] Decision trees
  - [ ] KNN
  - [ ] Forests
  - [ ] AdaBoost (or just boosting)
- [ ] Write the report
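The Fourier and LBP items in the checklist are hand-crafted image descriptors. To illustrate what such features compute, here is a minimal NumPy sketch. It is not the implementation used in this project: `fourier_features` (a radially averaged log-magnitude spectrum) and `lbp_histogram` (the basic 8-neighbour LBP code histogram) are hypothetical helpers, and the descriptor details (16 frequency rings, 256 LBP bins) are our assumptions. scikit-image also ships ready-made `skimage.feature.local_binary_pattern` and `skimage.feature.hog` functions that could be used instead.

```python
import numpy as np


def fourier_features(image: np.ndarray, n_bins: int = 16) -> np.ndarray:
    """Hypothetical Fourier descriptor: radially averaged log-magnitude spectrum."""
    img = np.asarray(image, dtype=np.float64)
    # Shift the zero-frequency component to the centre, then take log(1 + |F|).
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))
    h, w = spectrum.shape
    yy, xx = np.indices((h, w))
    # Distance of every frequency bin from the (shifted) zero frequency.
    radius = np.hypot(yy - h // 2, xx - w // 2)
    # Assign each bin to one of n_bins concentric rings of equal width.
    edges = np.linspace(0.0, radius.max() + 1e-9, n_bins + 1)
    ring = np.digitize(radius, edges) - 1
    sums = np.bincount(ring.ravel(), weights=spectrum.ravel(), minlength=n_bins)
    counts = np.bincount(ring.ravel(), minlength=n_bins)
    # Mean log-magnitude per ring (an empty ring, possible for tiny images, gives 0).
    return sums / np.maximum(counts, 1)


def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Hypothetical LBP descriptor: normalised histogram of 8-neighbour codes."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Offsets of the 8 neighbours, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


# Usage on a random stand-in for a grayscale chest x-ray.
rng = np.random.default_rng(0)
xray = rng.random((64, 64))
features = np.concatenate([fourier_features(xray), lbp_histogram(xray)])
print(features.shape)  # (272,)
```

Concatenating descriptors like this gives one fixed-length vector per image, which is the form the classical classifiers in the checklist expect.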
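The cross-validation loop from the first action item is what makes the classifiers in the checklist comparable. Below is a minimal scikit-learn sketch, assuming the extracted features are already stacked into a matrix `X` with labels `y`. The helper `compare_classifiers`, the hyperparameters, and the balanced-accuracy metric (our assumption, since the COVID-19 classes are likely imbalanced) are illustrative, not the project's actual setup.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, n_splits=5, seed=0):
    """Hypothetical helper: score each classifier on the same folds -> {name: (mean, std)}."""
    classifiers = {
        "svm": SVC(kernel="rbf"),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "perceptron": Perceptron(),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "adaboost": AdaBoostClassifier(random_state=seed),
    }
    # A fixed random_state makes every model see exactly the same stratified folds.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, clf in classifiers.items():
        # The scaler is refitted on each training fold, so test folds never leak into it.
        model = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
        results[name] = (scores.mean(), scores.std())
    return results


# Usage on synthetic stand-in features (not real x-ray data).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)
for name, (mean, std) in compare_classifiers(X, y).items():
    print(f"{name:>20}: {mean:.3f} +/- {std:.3f}")
```

Sharing one splitter across all models keeps the comparison fair, and reporting the standard deviation across folds shows how much of a difference between two classifiers might just be fold-to-fold noise on a small dataset.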