# Collagen Segmentation in Scattered Light Imaging
*Gijs van de Linde, Laurens van Schreven & Mark Vermeulen*
To select the right treatment for cancer patients, it is important to predict the chance that a tumour will spread to a different part of the body (metastasis). There is increasing evidence that the orientation of collagen fibres near the tumour boundary can be used for this prediction.<sup>1,2</sup>
This has led to research on techniques that can image this orientation. One such technique is Computational Scattered Light Imaging (ComSLI), which is investigated at the Menzellab at TU Delft<sup>3</sup>.
Tumours contain a mixture of collagen and non-collagen tissue, and current fibre orientation imaging methods do not directly show the presence of collagen. Instead, it is necessary to perform time-consuming collagen staining (~1 hr) and to image the stained samples separately. These images must then be registered onto the orientation maps, which can itself be a difficult and time-consuming process.
In this project, we will investigate the possibility of determining the collagen-rich tissue locations directly from the ComSLI data. To this end, we will create a machine learning dataset from ComSLI measurements and stained images of mouse hearts. We will train a deep learning image segmentation model on this dataset, and investigate its performance. Finally, we will give suggestions for further research.
## Computational Scattered Light Imaging (ComSLI)
<img src="https://hackmd.io/_uploads/HJOuJrb0kx.png" width="350em"><img src="https://hackmd.io/_uploads/ByW_krWCJg.png" width="350em">
Computational scattered light imaging (ComSLI) is a technique to image fibre networks in biological tissues. The fibres can be collagen, but also bundles of nerve, muscle or other types of fibres. ComSLI exploits the anisotropic scattering of light on fibre bundles: light tends to be scattered perpendicularly to the fibre orientation. This can be seen in figure A above, adapted from Georgiadis et al.<sup>4</sup> The current measurement setup, shown in figure B, works in reverse, illuminating the sample from different angles and observing the light that scatters into a camera positioned above.
A typical ComSLI measurement consists of 24 images, each with a different illumination direction. One way of representing this data is a *line profile*, a plot of the intensity versus the illumination angle, shown at the bottom of figure A. Each location in the sample has its own line profile with 24 values.
Each pixel has its own fibre orientation, and this can be visualized as a colourful *fibre orientation map (FOM)* (see figure C, adapted from <sup>3</sup>). A major advantage of ComSLI is the ability to determine the direction of two crossing fibres within a single pixel: this is shown with the checkerboard pattern highlighted in this image.
## Our Dataset
<img src="https://hackmd.io/_uploads/HygLeSWCke.png" width="60%"><img src="https://hackmd.io/_uploads/rkBplHbRyx.jpg" width="39%">
Left: This flowchart shows an overview of the steps in the data preprocessing.
Right: An example mouse heart image obtained after staining.
### Measurement
Our dataset consists of 36 sections of healthy mouse hearts. The sections were 4 $\mu\textrm{m}$ thick and did not contain tumours. Instead, we investigated the naturally present collagen fibres. Each section was measured with ComSLI, which resulted in a stack of 24 images with different illumination directions. Since the field-of-view of the setup was limited, multiple tiles were imaged separately. These were later stitched together to create the full measurement stack.
To get a ground truth labelling of collagen in these samples, they were stained using Picrosirius Red (PSR). PSR is a stain that turns red in the presence of collagen fibres, and yellow in other tissues.
The individual hearts were then cropped out of the measurement stacks and stained images, and these were used for further preprocessing.
### Thresholding
To create ground truth labels, the stained images need to be converted into binary masks in which the red, collagen-containing regions are white, and the yellow tissue and the background are black.
To distinguish red from yellow more easily, the image is converted from RGB to HSV colour space, because the hue component separates the two colours more clearly than the RGB channels do. Next, we define threshold ranges for red based on HSV values. Since red appears at both ends of the hue spectrum, we use two separate ranges to capture both the low and the high reds. The boundaries of these ranges were initially determined with a colour picker. The resulting mask is then converted into a binary image in which collagen is white and everything else is black.
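As a rough sketch of this step, the thresholding can be done with OpenCV's `inRange`; the file name and HSV bounds below are illustrative placeholders, not the values used in the project:
```python
import cv2

# Hypothetical file name; OpenCV loads images as BGR and uses hue in [0, 179].
stained = cv2.imread("stained_heart.png")
hsv = cv2.cvtColor(stained, cv2.COLOR_BGR2HSV)

# Red appears at both ends of the hue axis, so two ranges are needed.
low_reds = cv2.inRange(hsv, (0, 60, 60), (10, 255, 255))
high_reds = cv2.inRange(hsv, (170, 60, 60), (179, 255, 255))

# Combine both ranges: collagen (red) becomes white, everything else black.
mask = cv2.bitwise_or(low_reds, high_reds)
cv2.imwrite("collagen_mask.png", mask)
```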
### Image Registration
The created binary masks act as the ground truth for our ComSLI images; they highlight the regions of the heart that contain collagen. However, due to differences in imaging setups—such as camera lenses, focal lengths, or tripod angles—the ComSLI measurements and the corresponding stained microscopy images do not align pixel-wise. As a result, the pixel locations in the ComSLI image do not correspond to those in the binary mask. This misalignment poses a problem for training a neural network, which requires accurately aligned input (ComSLI) and label (mask) data.
To correct this misalignment, image registration is applied. Image registration geometrically aligns points in one image to corresponding points in another, by designating one image as the reference and applying a geometric transformation to the other so that it matches. For improved accuracy, both stained and ComSLI images are first converted to grayscale and normalized. Only the first slice of the 24-layer ComSLI measurement is used for this procedure.
Intensity-based registration, specifically phase correlation, was chosen because it is known to be robust to illumination and modality differences. This is particularly important, as the same anatomical region may appear markedly different across modalities like ComSLI and histological staining. Unlike feature-based methods, phase correlation does not rely on local features, which may not be consistent between imaging types. Instead, it leverages the Fourier shift property to estimate translation, producing a sharp and distinct correlation peak.<sup>7</sup>
Several software environments offer image registration functionality. In this study, MATLAB is used, with the `imregcorr` function, which estimates a geometric transformation based on phase correlation of image intensities.<sup>5</sup> The results of this registration procedure are shown in the figure below. The top row shows the ComSLI measurement and stained images before registration, and the bottom row shows the same pair after registration.
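We used MATLAB for the actual registration, but the same idea can be sketched in Python with scikit-image. Note that `phase_cross_correlation` only estimates a translation, whereas `imregcorr` can also recover rotation and scale; the synthetic images below merely stand in for the normalized grayscale pair described above:
```python
import numpy as np
from scipy.ndimage import shift as nd_shift
from skimage.registration import phase_cross_correlation

# Stand-ins for the normalized grayscale ComSLI slice and stained image.
comsli_slice = np.random.rand(512, 512)
stained_gray = nd_shift(comsli_slice, shift=(12, -7))  # synthetically shifted copy

# Estimate the translation that registers the stained image onto the ComSLI slice.
detected_shift, error, _ = phase_cross_correlation(comsli_slice, stained_gray)

# Apply the estimated shift so both images share the same pixel grid.
registered = nd_shift(stained_gray, shift=detected_shift)
```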

Apart from visual inspection, we also quantified the registration accuracy using a Dice similarity score. To calculate this, we first apply Gaussian smoothing, then detect edges using the Canny method. After performing morphological closing to connect fragmented edges, we fill enclosed regions and remove small noise components. The Dice score is then computed as:
$$
\text{Dice} = \frac{2 \cdot |A \cap B|}{|A| + |B|}
$$
where $A$ and $B$ are the filled binary masks extracted from the measurement and stained images. The results are shown below.
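A sketch of this mask-extraction and scoring pipeline using scikit-image and SciPy; the smoothing, structuring-element and minimum-size parameters shown here are illustrative defaults, not the project's tuned values:
```python
import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.feature import canny
from skimage.filters import gaussian
from skimage.morphology import binary_closing, disk, remove_small_objects

def tissue_mask(image, sigma=2.0, min_size=64):
    """Extract a filled tissue outline from a grayscale image."""
    smoothed = gaussian(image, sigma=sigma)      # suppress noise
    edges = canny(smoothed)                      # detect edges
    closed = binary_closing(edges, disk(3))      # connect fragmented edges
    filled = binary_fill_holes(closed)           # fill enclosed regions
    return remove_small_objects(filled, min_size=min_size)  # drop noise specks

def dice(a, b):
    """Dice similarity between two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```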

Sometimes the Dice score was low, not due to poor alignment, but because of how we compute the binary masks. We use edge detection followed by morphological operations (to connect the edges) and region filling, which can be sensitive to noise or intensity variations between images. In such cases, even though the computed masks may differ, the actual alignment can still be correct. For samples with a Dice score below 0.7, we performed a visual inspection. An example is shown in the second figure: although the Dice score is low due to mismatched binary masks, the measurement and registered stained image on the bottom row visually align very well.

## Data Augmentation
Because the size of our dataset is limited (only 36 samples), data augmentation becomes essential to enhance model generalization and to prevent overfitting. Data augmentation has been widely used in biomedical image segmentation, including by Ronneberger et al. in the original U-Net paper <sup>6</sup>. By applying various transformations, we can synthetically expand the dataset, providing the model with diverse examples to learn from.
> **Note:** To prevent data leakage, we first split the dataset into training and validation sets *before* applying augmentation. This ensures that augmented training samples don’t appear in the validation set in any form.
Our augmentation strategy generates, for each image, 10 variants using combinations of the following geometric transformations:
+ **Random Horizontal and Vertical Flips**
These flips help the model recognize structures irrespective of mirroring.
+ **Rotations up to ±20°**
Introducing slight rotations allows the model to become invariant to object orientation.
+ **Affine Transformations with 10% Translation and 90–110% Scaling**
Affine transformations, which include translation and scaling, adjust the position and size of the images.
All transformations are applied synchronously to both the input images (measurements) and their corresponding registered masks to preserve spatial consistency. Furthermore, since our input consists of 24-channel stacks, transformations are applied identically across all channels to maintain alignment.
The augmentation pipeline was implemented using the PyTorch library [`torchvision.transforms.v2`](https://pytorch.org/vision/main/transforms.html)<sup>10</sup>, as shown below:
```python
from torchvision.transforms import v2

transform = v2.Compose([
    v2.RandomHorizontalFlip(p=0.5),           # flip left-right with 50% probability
    v2.RandomVerticalFlip(p=0.5),             # flip top-bottom with 50% probability
    v2.RandomRotation(degrees=20),            # rotate by up to ±20°
    v2.RandomAffine(degrees=0,                # no extra rotation here
                    translate=(0.1, 0.1),     # shift up to 10% in x and y
                    scale=(0.9, 1.1)),        # scale between 90% and 110%
])
```
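The v2 transforms apply the same random parameters to every input passed in a single call, which is how the image/mask synchronization described above can be achieved. A minimal sketch, assuming the stack and mask are wrapped as `tv_tensors` (the shapes are illustrative):
```python
import torch
from torchvision import tv_tensors

# A 24-channel ComSLI stack and its binary collagen mask (illustrative shapes).
stack = tv_tensors.Image(torch.rand(24, 512, 512))
mask = tv_tensors.Mask(torch.zeros(512, 512, dtype=torch.uint8))

# `transform` is the v2.Compose pipeline defined above. One call transforms
# both tensors with identical random parameters; masks are automatically
# resampled with nearest-neighbour interpolation.
aug_stack, aug_mask = transform(stack, mask)
```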
Some examples of how these augmentations may look are shown below:

## Models
The image segmentation was performed using a standard U-Net model. To test the influence of the spatial structure of the images, we also performed pixel-by-pixel modelling.
### Logistic Regression (Single Pixel)
The pixel-by-pixel model was a logistic regression (implemented in [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)). The images were flattened into lists of measurement series (24 values per pixel). As the dataset is very imbalanced (images tend to contain ~99.5% non-collagen pixels), separate lists were made for collagen and non-collagen pixels, and various proportions of the two were used in training. To investigate convergence and the effect of limited training data, the Dice score was recalculated after training on each additional image, using a separate validation image (sample 1B).
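A minimal sketch of this per-pixel setup; the synthetic arrays, the 10% collagen fraction, and the random seed are illustrative:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-ins for a (24, H, W) measurement stack and its binary label mask.
rng = np.random.default_rng(0)
stack = rng.random((24, 128, 128))
mask = rng.random((128, 128)) < 0.005   # ~0.5% collagen, as in our data

# One 24-value line profile per pixel, plus its binary collagen label.
X = stack.reshape(24, -1).T
y = mask.reshape(-1).astype(int)

# Rebalance: keep all collagen pixels, subsample non-collagen pixels
# so that collagen makes up 10% of the training set.
pos = np.flatnonzero(y == 1)
neg = rng.choice(np.flatnonzero(y == 0), size=9 * len(pos), replace=False)
idx = np.concatenate([pos, neg])

clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
pred = clf.predict(X).reshape(mask.shape)  # per-pixel prediction for the full image
```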
### U-Net
Due to the small size of our dataset and the biomedical nature of our problem, we were inspired by the original U-Net paper<sup>6</sup> to try a U-Net architecture. A U-Net performs well even with limited training data, and its flexibility and strong performance have made it a standard choice in medical imaging. This is why we wanted to apply it to our collagen segmentation problem.
We used the U-Net implementation by milesial (https://github.com/milesial/Pytorch-UNet/), originally designed for Kaggle's Carvana Image Masking Challenge. We adapted this model to our dataset while retaining its architecture and training infrastructure, which use an RMSprop optimizer and a cross-entropy loss. For logging, we used Weights & Biases (https://wandb.ai/). Adapting the model brought considerable organizational overhead: we had to design a class for loading our data in the correct way, an error-prone process that required a lot of care, and the data augmentation also turned out to be a substantial amount of work. This is a clear downside of using a new dataset.
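A sketch of the kind of data-loading class this required, assuming the stacks and masks are stored as paired NumPy files (the file layout and class name are illustrative, not the project's exact code):
```python
import numpy as np
import torch
from torch.utils.data import Dataset

class ComSLIDataset(Dataset):
    """Pairs each 24-channel ComSLI stack with its registered collagen mask."""

    def __init__(self, stack_paths, mask_paths, transform=None):
        self.stack_paths = stack_paths
        self.mask_paths = mask_paths
        self.transform = transform

    def __len__(self):
        return len(self.stack_paths)

    def __getitem__(self, i):
        stack = torch.from_numpy(np.load(self.stack_paths[i])).float()  # (24, H, W)
        mask = torch.from_numpy(np.load(self.mask_paths[i])).long()     # (H, W)
        if self.transform is not None:
            stack, mask = self.transform(stack, mask)
        return stack, mask
```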
## Results
### Logistic Regression
Logistic regression on the full dataset of single pixels leads to a very low Dice score: the final score was 0.024, and a maximum of 0.097 was reached during training, where perfect agreement would score 1.
One possible cause of this low score could be dataset imbalance. Our dataset is very imbalanced: only around 0.5% of the pixels correspond to collagen-rich tissue, while 99.5% correspond to non-collagen tissue or background. To investigate this, we repeated the training on datasets where we enforced a certain fraction of collagen pixels, which required discarding the surplus non-collagen pixels.
<img src="https://hackmd.io/_uploads/H1BzgdWCyg.png" alt="cv_single_pixel_num_samples_vs_collagen_fraction" width="50%">
As expected, the number of training samples (i.e. pixels in the dataset) grows rapidly as the collagen fraction decreases. In general, the model should perform worse with less training data. The goal of this analysis is therefore to find out whether, given our imbalanced dataset, there is a collagen fraction at which the gain from better balance outweighs the overall loss of data.
<img src="https://hackmd.io/_uploads/SkrzedZCke.png" alt="cv_single_pixel_DICE_vs_collagen_fraction" width="50%">
<img src="https://hackmd.io/_uploads/HyHMgdbAyg.png" alt="cv_single_pixel_DICE_vs_images" width="50%">
Looking at the final Dice score (left), there does appear to be such a collagen fraction, but it is much smaller than expected: the maximum lies at 1% collagen.
The right plot shows that the training behaviour is very chaotic. Though the training score eventually increases, the Dice score on a test image (sample 1B) does not increase smoothly but changes rapidly. This also affects our result: if training had terminated earlier, 25% or 10% collagen would have had the optimal Dice score instead.
These results would make sense if the model simply tries to keep the ratio of collagen pixels the same as in the training data. The single-pixel model predicts each pixel separately, and so cannot directly control the overall ratio in an image. Nevertheless, it could in theory learn the overall intensity distribution and threshold it so as to reproduce the training proportion.
To figure out if this is the case, we looked at the labelled collagen percentage versus the collagen fraction in the training data.
<img src="https://hackmd.io/_uploads/rkHMl_ZAyl.png" alt="cv_single_pixel_collagen_label_vs_collagen_fraction" width="50%">
<img src="https://hackmd.io/_uploads/SJSzgOW0kx.png" alt="cv_single_pixel_collagen_label_vs_collagen_fraction_zoomed" width="50%">
In the plots above, the percentage of the test image labelled as collagen is plotted against the collagen percentage in the training data. The blue line represents the collagen fraction after the entire training, while the shaded area represents the spread during training. The grey dotted line is the ground truth collagen fraction. The black dot-dashed line is the fraction of all tissue in the image.
These plots show that indeed more collagen in the training data leads to more pixels being labelled as collagen. This is not simply a linear relationship, however. Initial tests seemed to show that some models simply segmented all tissue as collagen, but this cannot be seen in this chart.
### U-Net
*Validation Dice score and training loss without data augmentation (each epoch is 36 steps)*
We initially trained our U-Net model without data augmentation. We noticed that the validation Dice score plummeted during our first epoch of training, as can be seen in the figure above. We investigated this by inspecting the predicted masks, which are displayed in the figure below.

This figure shows that the model initially finds some patterns in our dataset (t=6), but the predicted masks become sparser and sparser until the model only predicts empty, all-zero masks (t=34). After timestep 34, all predicted masks were empty.
After adding data augmentation, the loss data looks very similar, as can be seen from the image below.
*Validation Dice score and training loss using data augmentation (randomly transformed for each epoch, each epoch is 36 steps)*
From these results we conclude that our loss function drives the model towards predicting empty masks, an effect that is likely worsened by the heavy imbalance of our dataset. Together, these factors may cause the optimization to get stuck in a local minimum. In the next section, we make several suggestions to prevent this.
## Discussion and Recommendations
It is difficult to compare the U-Net and the single-pixel model, as neither gave correct results. Interestingly, the U-Net tends to converge towards labelling nothing as collagen, while the single-pixel model did not. This suggests that the U-Net is more sensitive to data balancing.
The U-Net showed signs of overfitting, as it quickly converged to a low training loss without any improvement in the testing score (Dice on the validation image). A more varied and/or more balanced dataset should be able to reduce this overfitting.
The effect of data augmentation could not be seen in our evaluation of the U-Net. We suspect that the dataset is so small and imbalanced that augmentation cannot resolve the overfitting. Additionally, augmentation is applied between epochs, and since the model already converges to an empty prediction within the first epoch, any effect of the augmentation would not be visible anyway.
Even though our models did not achieve satisfactory performance, we did deliver a new and interesting dataset, preprocessed to make it suitable for machine learning. The following subsection describes steps for future machine learning on this dataset.
### Next Steps
Beyond the solutions investigated in this project, there are other possibilities to improve the model performance:
The training behaviour of our methods was chaotic and unexpected. The *mixup* augmentation method<sup>8</sup> claims to improve training regularity by also training on linear combinations of the training images, as sketched below.
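A minimal mixup sketch following Zhang et al.<sup>8</sup>; the function and variable names are illustrative, and for segmentation the same convex combination would also be applied to the (one-hot or soft) masks:
```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convex combination of two training pairs (mixup, Zhang et al. 2017)."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2   # labels become soft
    return x, y
```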
Data imbalance could be compensated for by adding class weights to the loss function, as sketched below. Another term could be added to the loss function that heavily penalizes a (mostly) uniform non-collagen prediction. Finally, the dataset size could be increased by measuring new samples.
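For the first suggestion, a sketch with PyTorch's cross-entropy loss; the weights below merely illustrate weighting inversely to the ~0.5% collagen frequency:
```python
import torch
import torch.nn as nn

# Weight classes inversely to their frequency (~0.5% collagen pixels),
# so mistakes on collagen cost far more than mistakes on background.
class_weights = torch.tensor([1.0, 199.0])   # [non-collagen, collagen]
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Usage: logits of shape (N, 2, H, W), integer target masks of shape (N, H, W):
# loss = criterion(logits, target)
```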
An approach that we started but were not able to finish analysing was patching: each sample is cut into patches of a fixed size, and all empty patches (with no collagen labelled) are discarded, as sketched below. This restricts the model to local information and yields many more data samples, which could improve model performance.
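A sketch of this patching step; the patch size is an illustrative choice:
```python
import numpy as np

def make_patches(stack, mask, size=64):
    """Cut a (24, H, W) stack and its (H, W) mask into non-overlapping tiles,
    keeping only tiles that contain at least one collagen pixel."""
    patches = []
    _, height, width = stack.shape
    for i in range(0, height - size + 1, size):
        for j in range(0, width - size + 1, size):
            tile_mask = mask[i:i + size, j:j + size]
            if tile_mask.any():  # drop empty (all-background) patches
                patches.append((stack[:, i:i + size, j:j + size], tile_mask))
    return patches
```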
Finally, in this project we chose to use high-resolution labels (indicating pixel by pixel whether collagen is present) and to use the raw measurement series as input. There are alternatives here as well: for example, it is possible to instead label the overall areas containing collagen, creating connected regions instead of disjoint spots of collagen. This would also allow further downsampling and smoothing of the measurement stacks.
The measurement stacks themselves could be replaced by parameter maps: instead of the raw measurement series, the model could consider, among others, the number, width and position of the peaks in the line profile. Particularly interesting is the average over all measurement images, which appears darker near collagen deposits. Additionally, background pixels tend to have very low intensities, which should allow us to remove them from the dataset of the single-pixel model so that it can focus on collagen versus non-collagen tissue instead of tissue versus background.
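A sketch of that background filter, assuming a (24, H, W) stack with normalized intensities; the threshold value is illustrative:
```python
import numpy as np

# Stand-in for a normalized (24, H, W) measurement stack.
stack = np.random.rand(24, 128, 128)

mean_image = stack.mean(axis=0)   # average over the 24 illumination directions
tissue = mean_image > 0.05        # background pixels have very low intensity

# Keep only tissue pixels for the single-pixel model.
X_tissue = stack.reshape(24, -1).T[tissue.reshape(-1)]
```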
In this project, we have introduced the problem of collagen segmentation in ComSLI data, we have considered several approaches and we have suggested improvements for further research.
The code for this project can be found on [GitHub](https://github.com/gvandelinde/collagen-mouse-tissue-segmentation/)<sup>9</sup>.
## Sources
1. Esbona K.; Yi Y.; Saha S. et al.
*The Presence of Cyclooxygenase 2, Tumor-Associated Macrophages, and Collagen Alignment as Prognostic Markers for Invasive Breast Carcinoma Patients.* The American Journal of Pathology, Volume 188, Issue 3, 2018, Pages 559-573, ISSN 0002-9440, https://doi.org/10.1016/j.ajpath.2017.10.025.
2. Ouellette, J.N.; Drifka, C.R.; Pointer, K.B. et al. *Navigating the Collagen Jungle: The Biomedical Potential of Fiber Organization in Cancer.* Bioengineering 2021, 8, 17. https://doi.org/10.3390/bioengineering8020017.
3. https://menzellab.gitlab.io/research/
4. Georgiadis M.; Auf der Heiden F.; Abbasi H. et al. *Micron-resolution fiber mapping in histology independent of sample preparation.* Biorxiv : the Preprint Server for Biology. 2024 Aug:2024.03.26.586745. DOI: https://doi.org/10.1101/2024.03.26.586745. PMID: 38585744; PMCID: PMC10996646.
5. MathWorks. Use Phase Correlation as a Preprocessing Step in Registration. MATLAB Image Processing Toolbox Documentation. Accessed March 2025. Available at: https://nl.mathworks.com/help/images/use-phase-correlation-as-preprocessing-step-in-registration.html.
6. Ronneberger, O.; Fischer, P.; Brox, T. (2015). *U-Net: Convolutional Networks for Biomedical Image Segmentation.* In Lecture Notes in Computer Science (pp. 234–241). https://doi.org/10.1007/978-3-319-24574-4_28
7. Foroosh, H.; Zerubia, J. B.; Berthod, M. Extension of Phase Correlation to Subpixel Registration. IEEE Transactions on Image Processing, Volume 11, Issue 3, 2002, Pages 188–200, ISSN 1057-7149, https://doi.org/10.1109/83.988953.
8. Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. https://doi.org/10.48550/arXiv.1710.09412
9. Computer Vision Group 5, The GitHub repo of our project: Collagen Mouse Tissue Segmentation, https://github.com/gvandelinde/collagen-mouse-tissue-segmentation/
10. PyTorch Contributors.
Torchvision: Computer Vision Utilities for PyTorch. Online documentation. https://pytorch.org/vision/main/transforms.html. Accessed April 8, 2025.