# U-Net for Classification of Colorectal Cancer Images
## Introduction
The purpose of this study is to investigate how colorectal cancer images may be classified using deep learning techniques. Globally, colorectal cancer is a major health concern that requires early identification and precise categorization for successful treatment. Traditional diagnostic techniques rely largely on manual analysis, which can be labour-intensive and inconsistent.
The field of medical imaging has seen a significant transformation with the introduction of deep learning, with Convolutional Neural Networks (CNNs) enabling more accurate and automated image processing. The U-Net architecture, which was first created for medical image segmentation tasks, has shown its potential for semantic segmentation and other visual tasks [8]. However, U-Net has seldom been used for image classification.
The first purpose of this work is to alter the standard U-Net architecture for classification and compare its performance on classification of colorectal cancer images to that of conventional CNN architectures. We believe that its ability to localise precisely with little training data makes it well suited for medical image classification, and may give it an edge over conventional architectures that often require large datasets and long training times. Another purpose is to investigate how utilising augmented datasets (such as ones built from existing segmentation masks) affects classification performance.
We hope to enhance the precision and efficacy of colorectal cancer detection through these two approaches. If successful, this can ultimately lead to improved patient outcomes and more economical use of medical resources.
## Background
In this section we give a brief background on colorectal cancer, and then discuss previous work done in the deep learning research community on image classification.
### Colorectal Cancer
Colorectal cancer is a major global health concern, ranking as the third most common cancer and the second leading cause of cancer-related deaths worldwide, according to the World Health Organization (WHO). There has been an increase of this type of cancer in western countries, probably due to the aging of the population, poor eating habits, and risk factors such as smoking, low physical exercise, and obesity [1]. It originates in the colon or rectum, and early detection and accurate classification of its stages are important for effective treatment and improved patient outcomes. Traditional diagnostic methods rely heavily on histopathological analysis [2], but advancements in deep learning offer new ways to automate diagnosis and increase its accuracy.
### Deep Learning Techniques
#### Image Classification
Deep learning has revolutionized many fields, including medical imaging. Convolutional Neural Networks (CNNs), a class of deep learning models, have shown exceptional performance in image classification tasks due to their ability to exploit the spatial structure of inputs through convolutional layers [3]. Different types of layers serve different purposes: convolutional and pooling layers extract semantic features, and fully connected layers then classify those features. Refinements of these basic concepts lead to state-of-the-art architectures; some notable architectures and achievements are:
**VGG16 and VGG19:** Simonyan and Zisserman (2014) presented the VGG network, which used deep stacks of convolutional layers with small receptive fields to achieve superior image classification accuracy. These models show how well deep architectures work for image classification, setting a strong performance baseline on the ImageNet dataset [8].
**ResNet:** Residual Networks (ResNet) were introduced by He et al. (2015) as a solution to the vanishing gradient problem through the use of skip connections. This breakthrough made it possible to train extremely deep networks, leading to groundbreaking results on a variety of image classification tasks [9].
**Breast Cancer Classification:** Araújo et al. (2017) used deep convolutional neural networks to classify images of breast cancer histology. Their research showed significant improvements in the automated classification of malignant tissues, emphasising the potential of deep learning techniques to enhance diagnostic accuracy in medical imaging [3].
#### U-Net architecture
The paper by Ronneberger et al. (2015) [9] introduced the U-Net architecture, which was primarily developed for accurate biomedical image segmentation that can be trained with relatively little data. This is possible because of U-Net's encoder-decoder structure with skip connections, which allow information to flow from the encoding blocks to the corresponding decoding convolutional blocks. This way the previously learned contextual information is not lost and high localisation accuracy is maintained [10]. Figure 1 below shows a visualisation; specifically, each convolution block in U-Net consists of two consecutive (3x3) convolution layers, which are then downsampled using a (2x2) max-pool or upsampled using a (2x2) deconvolution layer.
Figure 1: U-Net architecture for segmentation (example uses a 32x32 input image). Source: [9].

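For concreteness, below is a minimal Keras sketch of the building blocks just described: two consecutive (3x3) convolutions, a (2x2) max-pool on the way down, and a (2x2) deconvolution followed by a skip-connection concatenation on the way up. The function names, filter counts, and `same` padding are our own illustrative choices, not taken from [9].

```python
from tensorflow.keras import layers

def unet_encoder_block(x, filters):
    # Two consecutive 3x3 convolutions; the result is kept for the
    # skip connection before being downsampled with a 2x2 max-pool.
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    skip = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    down = layers.MaxPooling2D((2, 2))(skip)
    return skip, down

def unet_decoder_block(x, skip, filters):
    # A 2x2 deconvolution upsamples the features; the matching encoder
    # features arrive via the skip connection and are concatenated in.
    x = layers.Conv2DTranspose(filters, (2, 2), strides=(2, 2), padding="same")(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (3, 3), padding="same", activation="relu")(x)
    return x
```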
#### U-Net for classification
To the best of our knowledge, there is little to no prior work on using U-Net for image classification. Because its architecture enables it to capture small features, we believe U-Net is a good candidate for classification problems, especially in medical imaging where the differences between classes can be subtle [4]. U-Net has, however, been extended for semantic segmentation: in the work of [11], U-Net was combined with ResNet by replacing the convolutional layers in U-Net with the residual units of ResNet, and by using linear interpolation instead of deconvolution. They found that their proposed Res-UNet model performed best, followed by standalone U-Net, then ResNet.
## Methodology
In this section we discuss the methodology we applied to meet the two project goals highlighted in the Introduction. We explain the dataset in more detail, the associated data augmentation steps, the models we tested and related implementation details, and finally the experiments we carried out.
### Dataset
We use the EBHI-Seg dataset [2], which contains 5,170 images spanning six colorectal tumor differentiation stages, along with the corresponding segmentation masks. As we are attempting to use U-Net for image classification, we treat these differentiation stages as six unique classes.
Upon initial data exploration, we observed that the data is highly class-imbalanced. Furthermore, we ran into memory issues during early tests when the input images were kept at their original size of 224x224 pixels. We therefore performed the following basic data augmentation, inspired by augmentation strategies in prior work:
1. Resize images to 160x160 pixels
2. To achieve an equal class distribution, for the classes that occur less frequently than the most frequent class, we use existing images to produce additional augmented versions. Specifically, we use ImageDataGenerator in TensorFlow with: a rotation range of 90 degrees, width and height shifts of 0.2, a shear range of 0.2, a zoom range between 0.8 and 1.2, a constant fill mode, and horizontal flips enabled (a configuration sketch follows this list).
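The settings in step 2 map directly onto TensorFlow's ImageDataGenerator; a minimal configuration sketch is shown below (the resize from step 1 is assumed to happen at load time, e.g. via the target_size argument of flow_from_directory):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings as listed in step 2 above; pixels exposed by
# rotations and shifts are filled with a constant value (fill_mode).
augmenter = ImageDataGenerator(
    rotation_range=90,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=(0.8, 1.2),
    fill_mode="constant",
    horizontal_flip=True,
)
```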
We use this dataset for the first purpose of our research, and denote it as **Without Mask**.
For the second purpose of our research, we need a way to use information from the existing segmentation masks to potentially aid classification. To this end, we create a separate dataset: each pixel retains its original value if the corresponding segmentation-mask pixel is 1; otherwise it is replaced with random noise. In this way, we hope that the important pixels in the image are further emphasized, helping the network learn to identify classes based on them. We then apply the same two augmentation steps mentioned above; Figure 2 below shows some resulting examples. We denote this dataset as **With Mask**.
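A minimal NumPy sketch of this construction, assuming uint8 RGB images and binary (0/1) masks; the helper name is ours, not part of the original pipeline:

```python
import numpy as np

def emphasize_with_mask(image, mask, rng=None):
    # Keep pixels where the segmentation mask is 1 and replace the
    # rest with uniform random noise, as described above.
    # image: (H, W, 3) uint8 array; mask: (H, W) array of 0s and 1s.
    rng = rng or np.random.default_rng()
    noise = rng.integers(0, 256, size=image.shape, dtype=np.uint8)
    keep = mask.astype(bool)[..., None]  # broadcast over the channels
    return np.where(keep, image, noise)
```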
Figure 2: Examples of augmented random noise images based on segmentation masks.

### Model(s)
We built two models to answer our research questions. The first is a U-Net architecture adapted for classification. We implemented the U-Net described in the Background section, with an encoder-decoder depth of four. However, instead of outputting an image at the final layer, we appended three linear layers of sizes 500, 100, and six respectively. To these six values we apply a softmax activation to obtain a probability distribution over the class labels. In total there are about 20 million trainable parameters in this model, with a size of ~70 MB. There are also more complex models, such as [7], that could have served as a starting point, but those are especially useful for organ-like datasets.
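As a sketch of how the segmentation output is replaced, assuming a Keras functional-style implementation: `unet_features` stands in for the final U-Net feature map, and the hidden-layer activations are our assumption.

```python
from tensorflow.keras import layers

def classification_head(unet_features, num_classes=6):
    # Flatten the U-Net features, then apply the three linear layers
    # (500, 100, 6) described above; the final softmax yields a
    # probability distribution over the six classes.
    x = layers.Flatten()(unet_features)
    x = layers.Dense(500, activation="relu")(x)
    x = layers.Dense(100, activation="relu")(x)
    return layers.Dense(num_classes, activation="softmax")(x)
```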
Our other architecture is a far simpler CNN that we designed, inspired by the popular VGG16 model. It has four (3x3) Conv2D layers, each followed by a ReLU activation and a max-pool operation (a 2x2 window). Finally, we add a single linear layer with a softmax activation.
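A hedged Keras sketch of this baseline; the filter counts per stage are our own assumption, as only the layer types and kernel sizes are fixed by the description above.

```python
from tensorflow.keras import layers, models

def build_simple_cnn(input_shape=(160, 160, 3), num_classes=6):
    # Four (3x3) Conv2D layers, each followed by ReLU and a (2x2)
    # max-pool, then a single dense layer with a softmax activation.
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))
    for filters in (32, 64, 128, 256):  # assumed filter counts
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```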
### Experiment setup
We conducted a total of four experiments. In two of them we used the U-Net model, with the With Mask and the Without Mask datasets respectively. In the other two experiments, we used the far simpler CNN architecture and varied the dataset in the same way. Testing these four combinations lets us see whether the U-Net architecture provides better performance than a simpler CNN, and whether a segmentation-mask-augmented dataset aids classification.
We split both datasets into an 80%/20% training and test split. Furthermore, for both models we used the Adam optimizer with cross-entropy loss. Our learning rate was set to 0.001, and we trained for 25 epochs. As our results will show, training for more epochs brings no further benefit.
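Putting the shared setup together, a minimal sketch of the per-experiment training loop; `model`, `X`, and `y` are placeholders for whichever architecture and dataset (Without Mask or With Mask) a given experiment uses.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# 80%/20% training/test split of the images X and one-hot labels y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Adam optimizer, learning rate 0.001, cross-entropy loss, 25 epochs.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    X_train, y_train,
    validation_data=(X_test, y_test),
    epochs=25,
)
```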
## Results
### U-Net + Without Mask dataset
For our first experiment, we trained and tested the U-Net model on the Without Mask dataset. On the test set, we obtained an accuracy of about 70%. As seen in Figure 3, accuracy initially increases steadily; however, in the later epochs there is a noticeable decline in validation accuracy, indicating potential overfitting. This suggests that while U-Net can learn from the data, it starts to memorize the training set, leading to poorer performance on the validation set. Looking at the confusion matrix, the model's misclassifications occur most often between the adenocarcinoma, high-grade IN, and low-grade IN classes. Visually, these images look very similar, and overfitting on the training set hurts the model's ability to generalise between them.
Figure 3: Learning curves, confusion matrix and example image predictions of U-Net model trained on Without Mask dataset
<img src="https://hackmd.io/_uploads/SyV98ovBC.jpg" alt="U-Net No Mask" width="400">
<img src="https://hackmd.io/_uploads/HyRZDswSR.jpg" alt="U-Net Without Mask" width="400">

### U-Net + With Mask dataset
For the next experiment, we used the U-Net model again, but now with the With Mask dataset, in which the images are augmented to incorporate mask knowledge. In Figure 4 we see that this augmentation has led to far more stable learning.
Figure 4: Learning curves, confusion matrix of U-Net model trained on With Mask dataset
<img src="https://hackmd.io/_uploads/SyFzviwB0.jpg" alt="U-Net Noise" width="400">
<img src="https://hackmd.io/_uploads/rJzIwswB0.jpg" alt="U-net Black Mask" width="400">
Furthermore, applying the mask improved the performance of the U-Net model, which achieved an accuracy of around 75%. Specifically, comparing with the confusion matrix in Figure 3, we see far fewer misclassifications for the three classes it struggled with most before; for the adenocarcinoma class in particular, we see nearly double the correct classifications. High-grade IN has also improved considerably.
This indicates that the segmentation masks help the model focus on the relevant yet subtle features of the images. Unfortunately, we still see overfitting. However, if this can be avoided by other means, this augmentation may provide more than a 5% increase.
### Simpler CNN + Without Mask dataset
For the third experiment, we used the simple CNN model with the Without Mask dataset. This simpler model has around 3.4 million parameters (about 7x fewer than our U-Net). This reduces overfitting, an effect that can also be seen in the training curve in Figure 5. The number of epochs was set to 200 in order to better visualize the evolution of the model.
Figure 5: Learning curves, confusion matrix of Simple CNN model trained on Without Mask dataset
<div style="display: flex; flex-wrap: wrap;">
<div style="flex: 1;">
<img src="https://hackmd.io/_uploads/r1HlLD2S0.jpg" alt="simplermodelWoMask" width="400">
</div>
<div style="flex: 1;">
<img src="https://hackmd.io/_uploads/Sk-gIw2r0.jpg" alt="simplermodelWoMaskMatrix" width="400">
</div>
</div>
### Simpler CNN + With Mask dataset
For the final experiment, we used the simpler CNN with the With Mask dataset. The simpler model's accuracy increased to about 75%, matching the performance of the U-Net on this dataset. This demonstrates that even a simpler architecture can benefit significantly from segmentation-mask-augmented data. The masks help the model by emphasizing the important features and reducing noise from irrelevant parts of the images. In Figure 6, the confusion matrix for this model also shows improved performance in distinguishing between classes such as adenocarcinoma and serrated adenoma, indicating that the masks help the model focus on the relevant features.
Figure 6: Learning curves, confusion matrix of Simple CNN model trained on With Mask dataset
<div style="display: flex; flex-wrap: wrap;">
<div style="flex: 1;">
<img src="https://hackmd.io/_uploads/rkPtUsPr0.jpg" alt="Smaller Model" width="400">
</div>
<div style="flex: 1;">
<img src="https://hackmd.io/_uploads/rynuIovBA.jpg" alt="Confusion Smaller Module" width="400">
</div>
</div>
### Discussion
Looking at Table 1, we see that the use of segmentation masks consistently improved the accuracy of both U-Net and the simpler model. The confusion matrices showed that it helped reduce misclassifications between visually similar classes such as polyp and normal, as well as adenocarcinoma, high-grade IN, and low-grade IN.
The U-Net model without masks showed signs of overfitting, while the simpler model, with fewer parameters, maintained more consistent performance, outperforming U-Net by 2% on the Without Mask dataset.
Table 1: Summary of experiments
| Model | Accuracy without Mask Applied | Accuracy with Mask Applied |
|----------------------------|-------------------------------|-------------------------------|
| U-Net | 0.70 | 0.75 |
| Simpler Model | 0.72 | 0.75 |
## Conclusion
### U-Net vs Simpler CNN Comparison
While the simpler model performed slightly better without masks, the U-Net model showed comparable performance when masks were applied. The accuracy values and the learning curves showed that U-Net was overfitting to the dataset: despite the simpler model having significantly fewer parameters than the U-Net, their classification accuracies are comparable. Increasing the dataset size or introducing regularisation techniques may allow U-Net to outperform simpler models, so there is potential for U-Net in complex medical image classification tasks.
### Effectiveness of segmentation mask augmentation
Both the U-Net model and the simpler CNN showed an improvement in classification accuracy when trained on data augmented using the segmentation masks. The U-Net model saw an increase of 5%, and the simpler CNN of 3%. Analysing the confusion matrices showed that the models improved at classifying similar-looking classes. This result suggests that utilizing the masks can improve a model's ability to focus on distinct class features, improving classification performance.
## Future Work
Multiple models could be combined in an ensemble on this dataset to see whether this increases performance.
Other techniques, such as adversarial training, could be used to generate additional augmented data.
## References
[1] Kuipers, Ernst J., et al. "Colorectal cancer." Nature reviews. Disease primers 1 (2015)
[2] Shi, Liyu, et al. "EBHI-Seg: A novel enteroscope biopsy histopathological hematoxylin and eosin image dataset for image segmentation tasks." Frontiers in Medicine 10 (2023): 1114673.
[3] Wang, Dayong, et al. "Deep learning for identifying metastatic breast cancer." arXiv preprint arXiv:1606.05718 (2016).
[4] Falk, Thorsten, et al. "U-Net: deep learning for cell counting, detection, and morphometry." Nature methods 16.1 (2019): 67-70.
[5] Rathore, Saima, Mutawarra Hussain, and Asifullah Khan. "A novel approach for colon biopsy image segmentation." 2013 ICME International Conference on Complex Medical Engineering. IEEE, 2013.
[6] Babu, Tina, et al. "Colon cancer prediction on different magnified colon biopsy images." 2018 Tenth International Conference on Advanced Computing (ICoAC). IEEE, 2018.
[7] Huang, Huimin, et al. "Unet 3+: A full-scale connected unet for medical image segmentation." ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2020.
[8] Alharbi, Amal H et al. “Segmentation and Classification of White Blood Cells Using the UNet.” Contrast media & molecular imaging vol. 2022 5913905. 11 Jul. 2022, doi:10.1155/2022/5913905
[9] Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation." Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, 2015. 234-241.
[10] Bousias Alexakis, E., and C. Armenakis. "Evaluation of UNet and UNet++ architectures in high resolution image change detection applications." Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLIII-B3-2020 (2020): 1507-1514. https://doi.org/10.5194/isprs-archives-XLIII-B3-2020-1507-2020.
[11] Cao, K.; Zhang, X. An Improved Res-UNet Model for Tree Species Classification Using Airborne High-Resolution Images. Remote Sens. 2020, 12, 1128. https://doi.org/10.3390/rs12071128
## Team Contributions
Kanish - Implemented the U-Net model and integrated it into the experiment pipeline that Mihai made. Worked on the methodology section of the report and on the project storyline. Also helped refine the introduction, related work, and results sections.
Cosmin - Implemented the training and testing pipeline in TensorFlow (with the data augmentations). Worked on the results and conclusion sections of the report, and on the storyline as well.
Shreya - Wrote the Introduction section and Related works section of the report.