Exploring Strategies for Training U-Net with Limited Data: Ablation Study and Fine-tuning
===
_Group 26:_
Jeroen Hagenus, 5099528, j.hagenus@student.tudelft.nl\
Daan Scherrenburg, 5175151, d.j.scherrenburg@student.tudelft.nl
## Table of Contents
[TOC]
## <a id="sources"></a>Sources
<div style="text-align: left">
- U-Net paper: https://arxiv.org/abs/1505.04597
- Pytorch implementation of U-Net (Miltos-90): https://github.com/Miltos-90/UNet_Biomedical_Image_Segmentation
- Datasets:
  - ISBI 2012: https://imagej.net/events/isbi-2012-segmentation-challenge
  - Drosophila Kc167 cells: https://bbbc.broadinstitute.org/BBBC007
- This blog post: https://hackmd.io/@wQ-A3P0dSumN2Slt7E_hvQ/computer_vision_project_group_26
</div>
# <a id="introduction"></a>Introduction
Medical image segmentation makes it possible to accurately identify and delineate specific structures or regions of interest within medical images. This identification allows for better understanding, diagnosis, and treatment of many different conditions. Originally, it relied on manually segmenting the parts of interest, which required a lot of expert knowledge and time. Nowadays, neural networks such as FCN and U-Net are trained to identify these structures or regions of interest automatically. To train their parameters, these networks require the medical input images together with labeled images containing the desired segmentation. However, a major problem is that in many cases such labeled medical images are hard to acquire, which makes it very difficult to train these networks to a high level of performance. And because precision and accuracy are crucial in medical applications, such a network cannot be used if its performance is not high enough.
In this blog post, we describe and discuss two experiments that can contribute to training a U-Net as well as possible with little data. This U-Net was first presented in the paper “U-Net: Convolutional Networks for Biomedical Image Segmentation” [[1](#ref-1)]. In that paper, a large part of the performance is attributed to data augmentation. The authors use several different augmentation techniques, so the first experiment tests which of these techniques contributes most to the performance. This may help in selecting effective augmentation techniques in the future. The second experiment is concerned with fine-tuning a neural network: the weights of a pre-trained network are slightly adjusted for a new dataset. Because the model has already learned a large part of the task, it does not have to start all over again, which could result in improved performance.
We will start by explaining the architecture of the U-Net together with the data augmentation that was applied. In the experiments section, the experiments are presented together with the datasets that were used. Additionally, we describe the training procedure and explain the performance metrics. Afterward, the quantitative and qualitative results are presented and described. We conclude with a discussion of the results.
# <a id="u-net"></a> U-Net
The neural network used in this blog post is U-Net as presented in the paper "U-Net: Convolutional Networks for Biomedical Image Segmentation"[[1](#ref-1)]. In this blog post, we will use the PyTorch implementation of this architecture by Miltos-90 [[2](#ref-2)].
### <a id="architecture"></a> Architecture
The authors of the paper introduced a novel convolutional network called U-Net, specifically designed for image segmentation tasks, on which it demonstrates high performance. The model architecture is visualized in [Figure 1](#figure_1). The input to the U-Net consists of 512x512-pixel images with a single channel, which are converted to 572x572-pixel images through padding that mirrors the image at its borders. After going through the encoding and decoding steps, the input is transformed into segmentation images of 388x388 pixels with 2 channels. The output is significantly smaller than the input due to the use of unpadded convolutions.
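For intuition, this mirror padding can be sketched with NumPy's reflect mode (a minimal sketch, not necessarily the exact padding code of the implementation):

```python
import numpy as np

image = np.random.rand(512, 512)      # stand-in for one single-channel input

# (572 - 512) / 2 = 30 pixels of mirrored padding on every side
padded = np.pad(image, pad_width=30, mode="reflect")
print(padded.shape)                   # (572, 572)
```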
The U-Net architecture comprises an encoder and a decoder. The encoder incorporates two convolutional layers with a 3x3 kernel size and the Rectified Linear Unit (ReLU) activation function, followed by a max-pooling layer with a size of 2x2 and a stride of 2. This encoding step is repeated four times, after which two final convolutional layers are applied.
The decoder employs an up-convolutional layer with a 2x2 kernel size. The upsampled feature map is concatenated with a cropped feature map that was generated before the corresponding max-pooling layer. This concatenation is followed by two convolutional layers with a 3x3 kernel size and the ReLU activation function. Lastly, a 1x1 convolutional layer reduces the 64 channels to only 2 channels.
 <a id="figure_1"></a> *Figure 1: U-Net model architecture.* [[1](#ref-1)]
### <a id="data-augmentation"></a>Data augmentation
The U-Net implementation by Miltos-90 [[2](#ref-2)] uses a combination of several data augmentation methods to prevent the model from overfitting. The first applied augmentation method is a random geometric transformation, which picks one transformation at random from the following: horizontal flip, vertical flip, transpose, 90-degree rotation, and a combined shift, scale, and rotate transformation. Afterward, the model's robustness to variations in gray values is increased using Gaussian noise, multiplicative noise, and random brightness and contrast adjustments. Finally, elastic deformations are applied to stretch the medical images and thereby generate new data. Together, these augmentation techniques improve the model's generalization capability and prevent it from overfitting.
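The names of these transformations suggest the albumentations library; a minimal sketch of such a pipeline could look as follows. The probabilities and magnitudes are illustrative assumptions, not necessarily the values used by Miltos-90.

```python
import albumentations as A
import numpy as np

# Placeholder image/mask pair; in practice these are the training images
image = np.zeros((512, 512), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)

transform = A.Compose([
    A.OneOf([  # one random geometric transformation per sample
        A.HorizontalFlip(p=1.0),
        A.VerticalFlip(p=1.0),
        A.Transpose(p=1.0),
        A.RandomRotate90(p=1.0),
        A.ShiftScaleRotate(p=1.0),
    ], p=0.8),
    A.GaussNoise(p=0.3),                # gray-value variations for robustness
    A.MultiplicativeNoise(p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.ElasticTransform(p=0.3),          # elastic deformations
])

augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```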
# <a id="experiments"></a> Experiments
Two experiments will be performed that contribute to training a U-Net with little labeled medical data.
The first experiment is an ablation study on the used data augmentation techniques of the U-Net implementation of Miltos-90. This experiment will show which of these techniques contributed most to the performance of the implemented model. In future implementations of a U-Net, this can contribute to the selection of suitable data augmentation techniques to obtain the best possible performance.
The second experiment will focus on the effect of fine-tuning a U-Net model on medical data. Fine-tuning is the process of adapting a pre-trained model to a new task by optimizing the model's weights and parameters for the new dataset. This could result in better performance because the network does not have to start training all over again.
### <a id="datasets"></a>Datasets
The experiments presented in this blog post require data to train the models. Two different datasets have been selected for training and validation: ISBI-2012 and Kc167.
**ISBI-2012**
In the original paper that presents the U-Net model, the authors used the dataset of the ISBI challenge of 2012 to train the model. In this blog post, this dataset will be used to train the U-Net for the ablation study and the fine-tuning model. The images in this challenge depict the ventral nerve cord (VNC) of a Drosophila first instar larva, representing nervous cells from a fruit fly. The objective of the challenge is to segment the cell boundaries in these images. One of the images from the dataset is, together with the ground truth, visualized in [Figure 2](#figure_2). [[3](#ref-3)]
The dataset comprises 30 training and 30 test images. These images naturally have the correct dimensions for the U-Net model of 512x512 pixels. For training, a validation set of 3 images will be extracted from the training set to determine the validation loss and adjust the model accordingly.
<figure id="figure_2">
<img src="https://i.imgur.com/Xttv3fL.png" alt="ISBI data" width="90%">
<figcaption> Figure 2: Data ISBI challenge 2012: input image (left), ground truth (right). </figcaption>
</figure>
**Drosophila Kc167 cells**
For the experiment on fine-tuning, a different dataset, obtained from the Broad Bioimage Benchmark Collection, will be used. This dataset consists of Kc167 cells, also from fruit flies. These images look different from those in the ISBI 2012 challenge, but the ground truth images are labeled in the same way. Therefore, we selected this dataset to examine whether images related to the same topic but with distinct visuals can be successfully segmented using the fine-tuned model. One of the images from the dataset is, together with the ground truth, visualized in [Figure 3](#figure_3). [[4](#ref-4)]
This dataset contains a total of 32 images of different sizes, namely 400x400, 450x450, and 512x512 pixels. To get valid results from the U-Net, all input images must have the same size of 512x512 pixels, so we resized the images using bicubic interpolation, as sketched below. For training, a validation set of 3 images is extracted from the dataset and another 3 images are held out for testing.
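This resize step can be sketched with Pillow (the filename is hypothetical; the actual preprocessing code may differ):

```python
from PIL import Image

img = Image.open("kc167_cell.png")               # hypothetical filename
img_512 = img.resize((512, 512), Image.BICUBIC)  # bicubic interpolation
img_512.save("kc167_cell_512.png")
```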
<figure id="figure_3">
<img src="https://i.imgur.com/BDZXqwN.png" alt="DNA data" width="90%">
<figcaption> Figure 3: Data for generalization: input image (left), ground truth (right). </figcaption>
</figure>
### <a id="experiment-a"></a> Experiment A: ablation study
Experiment A focuses on the ablation study of the data augmentation methods. For this ablation study, several models will be compared with different data augmentation techniques. Below is a list of the different models.
- _Miltos-90 U-Net reproduction_: This model will use the combination of all data augmentation techniques as presented in the U-Net implementation by Miltos-90.
- _No augmentation_: The U-Net will be trained without any data augmentation techniques to validate the contribution of the augmentation techniques to the performance.
- _Random geometric transformation_: This model will just make use of the random geometric transformation augmentation technique.
- _Gaussian noise_: This model will only add Gaussian noise to the input images as data augmentation.
- _Multiplicative noise_: This model will only add multiplicative noise to the input images as data augmentation.
- _Random brightness contrast_: This model will randomly make brightness and contrast adjustments to the input images to modify the data and make the model more robust.
- _Elastic deformations_: This model will only modify the data by randomly stretching and deforming the images elastically.
All of the models will be trained and tested on the ISBI-2012 dataset. The performance will be compared to draw conclusions about the effectiveness of the different data augmentation techniques.
### <a id="experiment-b"></a> Experiment B: fine-tuning
Experiment B focuses on evaluating the effect of fine-tuning. As mentioned earlier, fine-tuning is concerned with adapting a pre-trained model to a new dataset. To properly compare this effect, several models are trained differently but evaluated in the same way on the Kc167 dataset. The models for this experiment are the following.
- _Base model_: This model has been trained on the dataset that is also used for testing, namely Kc167. The purpose of this model is to serve as a baseline for comparison. This is how one would originally train the model.
- _Generalized model_: This model will make use of the pre-trained Miltos-90 U-Net reproduction model from [Experiment A](#experiment-a), which has been trained just on the ISBI-2012 dataset. This model aims to see if it is possible to use a model, trained on another dataset with similar content, for a completely unknown dataset.
- _Fine-tuned model_: This model will also make use of the pre-trained Miltos-90 U-Net reproduction model and will be fine-tuned to specialize on the Kc167 dataset. The purpose of this model is to validate whether the use of a pre-trained model in combination with fine-tuning contributes to performance.
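As a rough illustration of the fine-tuned model's setup: the pre-trained weights are loaded into the same architecture and training simply continues on the new dataset. A minimal sketch with hypothetical names (the checkpoint format of the actual code may differ):

```python
import torch

model = UNet()  # same architecture as in Experiment A; hypothetical class name

# Start from the weights of the Miltos-90 reproduction trained on ISBI-2012 ...
state_dict = torch.load("unet_isbi2012.pt", map_location="cpu")
model.load_state_dict(state_dict)

# ... then continue training (fine-tuning) on the Kc167 dataset; all layers
# remain trainable here.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.99)
```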
### <a id="training-procedure"></a>Training Procedure
For the training of the different models, we utilized the online platform Kaggle, which offers 30 hours of free GPU time per week and unlimited CPU time. In PyTorch, we had the option to choose between the CPU and CUDA cores. We leveraged CUDA cores when available to accelerate the training process, and when our GPU time was used up, we fell back to the free CPU cores.
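In PyTorch, this device choice is a one-liner (sketch):

```python
import torch

# Prefer the GPU when Kaggle grants CUDA time, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # `model` is the U-Net instance, defined elsewhere
```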
During training, several parameters played a crucial role, including the batch size, optimizer, learning rate, momentum, total number of epochs, and dataset size. All models discussed in the two experiments were trained with the same parameters.
We set the batch size to 1, even though the provided CUDA cores would have allowed a larger value, because we wanted to keep the properties of the original U-Net model the same. We used stochastic gradient descent as the optimizer, with a momentum value of 0.99, consistent with the paper's approach. The Miltos-90 implementation includes a script to determine the best possible learning rate; for all models this learning rate was located at $1e^{-2}$, so this value is used for training. We initialized the total number of epochs to a high value of 1000 and employed early stopping to obtain the model with the best loss without overfitting.
For the loss function, we employed a weighted binary cross-entropy (BCE) loss. This loss function quantifies the dissimilarity between the predicted probabilities and the true binary labels. It is a commonly used loss function for binary classification tasks, where each input belongs to one of two classes.
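Putting these pieces together, a minimal sketch of the training setup could look as follows. The patience value, the loss weighting, and the helper names (`model`, `train_loader`, `val_loader`, `evaluate`) are illustrative assumptions; the original U-Net paper uses a per-pixel weight map, and the Miltos-90 code may weight the loss differently.

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.99)
# Weighted BCE on the raw logits; pos_weight rebalances the two classes
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([1.5]))

best_val_loss, patience, epochs_without_improvement = float("inf"), 20, 0
for epoch in range(1000):
    model.train()
    for images, masks in train_loader:        # batch size 1
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()

    val_loss = evaluate(model, val_loader)    # hypothetical validation helper
    if val_loss < best_val_loss:              # keep the best weights so far
        best_val_loss, epochs_without_improvement = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:  # early stopping
            break
```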
### <a id="performance-metrics"></a>Performance metrics
To measure the performance of the models, we use three different error metrics, namely warping error, Rand error, and pixel error. [[5](#ref-5)]
*Warping error* measures the dissimilarity between two segmentations by quantifying the minimum mean square error between their pixels. It represents the discrepancy between a target segmentation and the best topology-preserving transformation of a reference segmentation. The goal is to alter the geometry of objects in an image while maintaining their topological properties.
*Rand error* is a measure used to assess the dissimilarity between two segmentations of an image. It quantifies the level of disagreement or inconsistency between the two segmentations. The Rand index, which is the basis for calculating the Rand error, measures the agreement between pairs of pixels in terms of being in the same or different objects in the two segmentations.
*Pixel error* is a metric used to evaluate image segmentation algorithms by measuring the disagreement between the original ground truth labels and the segmented labels at the pixel level. However, it is sensitive to minor displacements in object boundaries, which can result in large quantitative differences in the pixel error.
For the calculation of these metrics, a Beanshell script provided by the ISBI-2012 challenge is used. [[3](#ref-3)] The script requires the predicted segmentation as well as the ground truth labels in TIF format. For all three metrics, lower values are better.
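While the reported numbers come from the official script, pixel error, the simplest of the three metrics, can be sketched as follows for intuition (a simplified version; the official script may differ in details such as threshold handling):

```python
import numpy as np

def pixel_error(pred, truth, threshold=0.5):
    """Fraction of pixels where the binarized prediction disagrees with the
    ground truth. Simplified sketch of the metric."""
    return np.mean((pred >= threshold) != (truth >= threshold))

pred = np.random.rand(512, 512)                         # predicted probabilities
truth = (np.random.rand(512, 512) > 0.5).astype(float)  # binary ground truth
print(pixel_error(pred, truth))
```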
# <a id="results"></a>Results
In this section, we present the results of both experiments. To obtain the test results, the models were first trained one by one. As explained in the [training procedure](#training-procedure), after a period without improvement in the validation loss, the weights with which the lowest validation loss was achieved are stored. The course of the training and validation loss over the epochs is shown for each model in [Appendix A](#appendix-a), together with the epoch that achieved the lowest validation loss and is therefore used for the test results.
### <a id="reproduction"/> Reproduction of results from prior work
Before doing the experiments, it is important to compare the reproduced U-Net with the state-of-the-art results, so that the results of the experiments can later be related to the state of the art. To establish this relationship, we compare the Miltos-90 reproduction model with the results from the original U-Net paper [[1](#ref-1)]. The values of the three performance metrics for these two models are listed in [Table 1](#table-1).
From this table, it can be observed that the state-of-the-art U-Net achieved significantly lower error values, about twice as low, for the warping and Rand errors. The pixel error, however, is nearly the same. It can therefore be concluded that the state of the art performs significantly better on the first two metrics.
<a id="table-1"></a><caption><span style="color:grey">Table 1: quantitative results for the models of the reproduction study.</span></caption>
| <div style="width:280px">Model name</div> | Warping error <br /> (e-4) | Rand error <br /> (e-2) | Pixel error <br /> (e-2) |
|---|:---:|:---:|:---:|
| Miltos-90 U-Net reproduction |7.50|6.30|6.68|
| State-of-the-art U-Net |3.53|3.82|6.11|
### <a id="results-experiment-a"/> Experiment A: ablation study
To compare the performance of the different models of the ablation study, the three performance metrics are determined. The metric scores for all of the models are presented in [Table 2](#table-2).
The Miltos-90 reproduction of the U-Net, which uses all of the augmentation methods in combination, achieved a warping error of $7.50e^{-4}$, a Rand error of $6.30e^{-2}$, and a pixel error of $6.68e^{-2}$. Comparing these results with the model without any augmentation shows that the augmentation methods mainly improved the Rand error. The warping and pixel errors also improved, but by a smaller margin.
Among the individual augmentation methods, the random geometric transformation exhibited the most promising results. This model achieved the lowest warping, Rand, and pixel errors of all ablation models. Notably, its warping error was even lower than that of the U-Net reproduction with all augmentation methods applied. Similarly, the models incorporating multiplicative noise and random brightness and contrast adjustments also achieved lower warping errors. The worst-performing augmentation methods are Gaussian noise and elastic deformation. Interestingly, the model that relies solely on Gaussian noise for data augmentation performed even worse than the model without any data augmentation.
Observing the qualitative results visualized in [Table 3](#table-3), it is evident that the Miltos-90 U-Net reproduction model outperforms the other models in segmenting the free space between cell boundaries. All models, including the Miltos-90 reproduction, exhibited proficient boundary segmentation. However, there were slight discrepancies among the ablation models, with some models introducing additional lines in their predictions that do not exist in the ground truth and others missing certain boundaries.
<a id="table-2"></a><caption><span style="color:grey">Table 2: quantitative results for the models of the ablation study.</span></caption>
| <div style="width:280px">Model name</div> | Warping error <br /> (e-4) | Rand error <br /> (e-2) | Pixel error <br /> (e-2) |
|---|:---:|:---:|:---:|
| Miltos-90 U-Net reproduction |7.50|6.30|6.68|
| No augmentation |7.81|12.89|7.72|
| | | | |
| Random geometric transformation |**5.15**|**6.51**|**6.87**|
| Gaussian noise |8.79|13.37|7.62|
| Multiplicative noise |6.85|10.56|7.31|
| Random brightness contrast |6.58|9.47|7.40|
| Elastic deformation |7.99|10.90|7.55|
<table id="table-3">
<caption>Table 3: qualitative results for the models of the ablation study.</caption>
<tr>
<th>Ground truth</th>
<th>Miltos-90 U-Net reproduction</th>
<th>No augmentation</th>
<th>Random geometric transformation</th>
</tr>
<tr>
<td><img src="https://i.imgur.com/kWl3RNq.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/tQoS1BE.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/02R9gzL.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/GnTMF79.png" alt="image" width="100%" /></td>
</tr>
<tr>
<th>Gaussian noise</th>
<th>Multiplicative noise</th>
<th>Random brightness contrast</th>
<th>Elastic deformation</th>
</tr>
<tr>
<td><img src="https://i.imgur.com/iMfFGue.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/FxLafMJ.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/WFx0Qgj.png" alt="image" width="100%" /></td>
<td><img src="https://i.imgur.com/kINowjf.png" alt="image" width="100%" /></td>
</tr>
</table>
### <a id="results-experiment-b"/> Experiment B: fine-tuning
The three models of the fine-tuning experiment have been tested and for every model the performance metrics are determined. These quantitative results are presented in [Table 4](#table-4). From these results, it can be observed that the base model scored best on both the warping and pixel error metrics. The best model in terms of Rand error is the fine-tuned model. Comparing the base model with the fine-tuned model shows that their performance is similar when considering the Rand and pixel errors; in terms of the warping error, however, the base model has a significantly lower value. The generalized model performed significantly worse than the other two models on all three performance metrics.
Observing the qualitative results visualized in [Table 5](#table-5), the first major difference is the prediction of the generalized model. Comparing this prediction with the ground truth shows that the model did not predict the cell boundaries correctly, as was also concluded from the quantitative results. It detects the inside of the cells fairly well, but it fails to predict where the cell boundaries end. The base model and fine-tuned model, on the other hand, did a great job of predicting these boundaries. The base model hardly made any mistakes. The fine-tuned model also did well, but its prediction contains more observable mistakes: several cells can be seen whose boundary is not completely closed.
<a id="table-4"></a><caption><span style="color:grey">Table 4: quantitative results for the models of the fine-tuning experiment.</span></caption>
| <div style="width:230px">Model name</div> | Warping error <br /> (e-4) | Rand error <br /> (e-2) | Pixel error <br /> (e-2)|
|---|:---:|:---:|:---:|
| Base model |**6.14**|3.59| **2.14**|
| Generalized model |325.84|31.53|27.02|
| Fine-tuned model |51.63|**3.14**|2.31|
<table id="table-5">
<caption>Table 5: qualitative results for the models of the fine-tuning experiment.</caption>
<tr>
<th>Ground truth</th>
<th>Base model</th>
<th>Generalized model</th>
<th>Fine-tuned model</th>
</tr>
<tr>
<td><img src="https://i.imgur.com/IVDWZMh.png" alt="ground_truth" width="100%" /></td>
<td><img src="https://i.imgur.com/l5CbP46.png" alt="base_model" width="100%" /></td>
<td><img src="https://i.imgur.com/JqwdDks.png" alt="generalized_model" width="100%" /></td>
<td><img src="https://i.imgur.com/UkLlEjL.png" alt="fine-tuned_model" width="100%" /></td>
</tr>
</table>
# <a id="discussion-conclusion"></a> Discussion & Conclusion
Our goal in this blog post was to perform experiments regarding the training of a U-Net for medical applications without access to a lot of training data. We conducted two main experiments for this purpose. First, we performed an ablation study to validate the effectiveness of the different data augmentation methods used by the U-Net reproduction by Miltos-90. Second, we conducted an experiment on the effectiveness of fine-tuning U-Net models in the medical domain.
To relate the results of the ablation study to the performance of the original U-Net model, we first compared the results of the reproduction to those of the original paper. This comparison shows that the reproduction by Miltos-90 has significantly higher error values than those reported in the original paper. The pixel error is fairly similar, but the warping and Rand errors are about a factor of two higher. One reason for this could be that the reproduction used a different data augmentation pipeline with different parameters; the paper is not very specific about the augmentation methods used, so it may not have been clear exactly which ones to reproduce. In addition, the paper mentions using dropout layers as an implicit augmentation method, which is not applied in the reproduction. Because these results do not match closely enough, the experiments cannot be linked back to the original model with complete certainty.
The results of experiment A show that the augmentation methods used by the Miltos-90 reproduction have a positive effect on performance. This is an expected outcome, since augmentation ensures that the model receives more varied input data and can therefore train for more epochs without overfitting.
Looking at the ablation models, i.e., the models that each use a single augmentation method, we see that almost all of them had a positive impact on performance. Only the model using Gaussian noise resulted in poorer warping and Rand errors than using no augmentation at all. The best-performing ablation model is the one using random geometric transformations: it has the lowest error on all three metrics among the ablation models, and for the warping and Rand errors its margin over the other ablation models is substantial. This suggests that much of the performance of the Miltos-90 reproduction is due to this augmentation method. Besides the Gaussian-noise model, the model using elastic deformation also contributes little to performance. This is striking, because the official U-Net paper states that "especially random elastic deformations of the training samples seem to be the key concept to train a segmentation network with very few annotated images" [[1](#ref-1)], which our results contradict.
Experiment B yields useful insights into using pre-trained models on a different but comparable dataset.
The first striking result is that the generalized model clearly does not reach the desired performance. It underperforms on all three metrics, and the qualitative results also show that this model is not useful. It can thus be concluded that using a pre-trained model on a different but similar dataset, without any further training, does not work. The base model results show that it is much more useful to train the model on the small dataset that is available.
The main purpose of this experiment was to validate whether fine-tuning is useful for training a U-Net model with little data. The quantitative results show that the base model and the fine-tuned model have similar Rand and pixel errors. However, the fine-tuned model performs significantly worse on the warping error. The qualitative results confirm that the models perform fairly similarly, but in the fine-tuned model's prediction the boundaries of some cells are not closed. In this case, it was therefore more useful to train the model only on the target data; using a pre-trained model did not contribute to better performance.
All in all, experiment A shows that, at least in this case, the random geometric transformations made the largest positive contribution to the performance of the network. In addition, even though the results of the reproduction do not fully match the original U-Net paper, this experiment suggests that the elastic deformations are not a major asset to performance after all. For follow-up research, we would recommend training the models with cross-validation, which would make the results more reliable and comparable. The reason is that, due to the lack of data, only 3 images are used for validation; an augmentation method may happen to work better on these specific images, so the results may vary.
From experiment B, we can conclude that using a pre-trained model without further fine-tuning is certainly not desirable: its results were significantly worse and not useful in practice. In addition, from this experiment we can tentatively conclude that fine-tuning, despite the hopeful theory, does not always produce better results. Again, cross-validation would provide more certainty about this outcome. For a follow-up study, we would suggest trying this concept on several more datasets to obtain the most accurate and generalizable result possible.
## <a id="head7"></a>References
<div style="text-align: left">
<a id="ref-1"></a>[[1]] O. Ronneberger, P. Fischer and T. Brox. *U-Net: Convolutional Networks for Biomedical Image Segmentation*. arxiv, 2015.
<a id="ref-2"></a>[[2]] Miltos-90. *U-Net: Convolutional Networks for Biomedical Image Segmentation Reproduction*. GitHub, 2022.
<a id="ref-3"></a>[[3]] I. Arganda-Carreras, S Seung, A. Cardona and J. Schindelin. *USegmentation of neuronal structures in EM stacks challenge - ISBI 2012*. ImageJ, 2012.
<a id="ref-4"></a>[[4]] A. Carpenter. *Drosophila Kc167 cells*. Broad Bioimage Benchmark Collection, 2012.
<a id="ref-5"></a>[[5]] V. Jain, B. Bollmann, M. Richardson. *Boundary Learning by Optimization with Topological Constraints*. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
</div>
# <a id="appendices"></a> Appendices
### <a id="appendix-a"></a> Appendix A - Training and validation loss graphs
| <div style="width:100px"></div> Model (epoch of early stopping)| <div style="width:260px"></div> Loss graph |
| :--- | :---: |
|Experiment A: <br /> Miltos-90 U-Net reproduction <br /> (epoch 269)| <img src="https://i.imgur.com/CMtMIgA.png" alt="image" width="100%" /> |
|Experiment A: <br /> No augmentation <br /> (epoch 66) | <img src="https://i.imgur.com/FfgngWf.png" alt="image" width="100%" /> |
|Experiment A: <br /> Random geometric transformation <br /> (epoch 177) | <img src="https://i.imgur.com/WXnZqYu.png" alt="image" width="100%" /> |
|Experiment A: <br /> Gaussian noise <br /> (epoch 61)| <img src="https://i.imgur.com/mdp23d3.png" alt="image" width="100%" /> |
|Experiment A: <br /> Multiplicative noise <br /> (epoch 60) | <img src="https://i.imgur.com/lbBm8Oz.png" alt="image" width="100%" /> |
|Experiment A: <br /> Random brightness contrast <br /> (epoch 56)| <img src="https://i.imgur.com/onHQfe7.png" alt="image" width="100%" /> |
|Experiment A: <br /> Elastic deformation <br /> (epoch 79) | <img src="https://i.imgur.com/hT4zVgY.png" alt="image" width="100%" /> |
|Experiment B: <br /> Base model <br /> (epoch 632) | <img src="https://i.imgur.com/HuUhEQg.png" alt="image" width="100%" /> |
|Experiment B: <br /> Fine-tuned model <br /> (epoch 399)| <img src="https://i.imgur.com/J6cnyTM.png" alt="image" width="100%" /> |
[1]: https://arxiv.org/abs/1505.04597
[2]: https://github.com/Miltos-90/UNet_Biomedical_Image_Segmentation
[3]: https://imagej.net/events/isbi-2012-segmentation-challenge
[4]: https://bbbc.broadinstitute.org/BBBC007
[5]: https://ashm8206.github.io/2018/04/08/Segmentation-Metrics.html
<style>
body {text-align: justify}
table {
width: 100%;
}
th, td {
width: 25%;
}
</style>