# Storyline: Bayesian Loss for Crowd Count Estimation with Point Supervision
## Crowd Counting: Motivation and Current Approaches
- Crowd counting is crucial for various applications, such as monitoring sports events, traffic monitoring[^CountCars], and counting cells and bacteria in microscopic images[^CountObjects].
- Most state-of-the-art methods use density map estimation, converting point annotations (center of heads) to "ground-truth" density maps using Gaussian kernels [^MultiCount].
- These methods face limitations due to occlusions, varying object sizes, and shapes, leading to imperfect density maps.
- Another approach, seen in earlier works, is "detection-then-counting", i.e. segmenting individual scene objects.
- This is computationally expensive and mostly suitable for lower-density crowds.
- Also, image segmentation requires bounding box annotations, which are cumbersome to produce.
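
The density-map construction mentioned above (Gaussian kernels placed on point annotations) can be sketched as follows. This is a minimal illustration in plain Python, not the preprocessing used by any particular paper: the function name `gaussian_density_map`, the fixed `sigma`, the example points, and the choice to normalize each kernel on the grid are all assumptions made for this sketch.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.0):
    """Place one normalized 2D Gaussian per annotated point on a height x width grid.

    points: list of (x, y) head-centre annotations (hypothetical example data).
    sigma: kernel width; a fixed value is an assumption of this sketch.
    """
    density = [[0.0] * width for _ in range(height)]
    for px, py in points:
        # Unnormalized Gaussian kernel value at every pixel for this point.
        kernel = [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        # Normalize on the grid so each annotation contributes exactly one count.
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


points = [(3.0, 4.0), (10.0, 4.5), (10.5, 5.0)]  # hypothetical annotations
density = gaussian_density_map(points, height=12, width=16, sigma=1.5)
estimated_count = sum(map(sum, density))  # integrates to the number of points
```

Summing the map recovers the number of annotations, which is why a network trained to regress such a map can be used for counting. The two nearby points in the example overlap heavily, which hints at why such "ground-truth" maps are imperfect in dense or occluded regions.
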
## Proposed Bayesian Loss
- Ma et al.[^Bayesian] introduce a novel loss function called Bayesian loss, which constructs a density contribution probability model from point annotations.
- Instead of pixel-wise supervision, it supervises the count expectation at each annotated point, enhancing reliability.
- This approach focuses on the count expectation rather than constraining every pixel's value in the density map.
- It does not rely on external detectors or multi-scale architectures, simplifying the model.
- In addition to their default *"Bayesian"* loss, they also present *"Bayesian+"*, which incorporates background pixel modelling.
- They evaluate their proposed loss on *VGG-19* by comparing against SOTA methods and a baseline[^baseline] on the following crowd counting datasets: [UCF-QNRF](https://www.crcv.ucf.edu/data/ucf-qnrf/), [ShanghaiTechA](https://www.v7labs.com/open-datasets/shanghaitech), [ShanghaiTechB](https://www.v7labs.com/open-datasets/shanghaitech) and [UCF_CC_50](https://www.crcv.ucf.edu/data/ucf-cc-50/).
- Both *Bayesian* and *Bayesian+* outperform the baseline on all four benchmarks.
- On the hardest dataset, UCF-QNRF, *Bayesian+* reaches an MAE of 88.7 and an MSE of 154.8, reducing the MAE and MSE of the previous best method (CL-CNN) by 43.3 and 36.2 respectively.
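
The idea of supervising the count expectation at each annotated point, rather than every pixel, can be sketched as follows. This is a simplified plain-Python illustration of the default *Bayesian* loss only (the background modelling of *Bayesian+* is left out), not the authors' implementation. Each pixel is assigned to the annotated points through a posterior computed from Gaussian likelihoods, the predicted density is weighted by that posterior and summed per point, and the loss is the sum over points of `|1 - expected count|`. The function name `bayesian_loss`, the value of `sigma`, and the toy inputs are assumptions made for this sketch.

```python
import math


def bayesian_loss(pred_density, points, sigma=2.0):
    """Return (loss, expected_counts) for a predicted density map.

    pred_density: 2D list indexed as [y][x].
    points: non-empty list of (x, y) annotations.
    sigma: hypothetical kernel width chosen for this sketch.
    """
    if not points:
        raise ValueError("this sketch assumes at least one annotated point")
    expected = [0.0] * len(points)
    for y, row in enumerate(pred_density):
        for x, value in enumerate(row):
            # Log-likelihood of this pixel under each annotation's Gaussian
            # (the shared normalizing constant cancels in the posterior).
            logits = [-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2)
                      for px, py in points]
            top = max(logits)
            weights = [math.exp(l - top) for l in logits]  # stable softmax
            norm = sum(weights)
            for n, w in enumerate(weights):
                # Posterior that the pixel belongs to point n, times the density.
                expected[n] += (w / norm) * value
    loss = sum(abs(1.0 - e) for e in expected)
    return loss, expected


points = [(1.5, 3.5), (5.5, 3.5)]              # hypothetical annotations
uniform = [[2.0 / 64] * 8 for _ in range(8)]   # total count 2, spread evenly
empty = [[0.0] * 8 for _ in range(8)]          # predicts no objects at all

loss_uniform, expected_uniform = bayesian_loss(uniform, points)
loss_empty, _ = bayesian_loss(empty, points)
```

In this toy setup the uniform map gets a loss of (numerically) zero because each point's expected count is one, even though the map looks nothing like a Gaussian "ground truth". That illustrates how the loss constrains count expectations instead of individual pixel values. The empty map gets a loss equal to the number of annotated points.
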
## Our Ablation Study: Motivation
- To our knowledge, no ablation or replication studies have been conducted on this work so far.
- It is worthwhile to find out whether the Bayesian loss proposed by Ma et al. generalizes to:
    - **Architectures other than the backbone proposed in their work** - to demonstrate that their loss is model-agnostic. The paper mentions that their "proposed loss functions can be readily applied to any network structure to improve its performance on the crowd counting task", but the authors only present experimental results for VGG-19.
    - **Datasets with non-human crowds (e.g. cells, bacteria, vehicles in traffic)** - to expand the usability of their proposed method. The authors only evaluate their method on human crowds, so there is no guarantee that their proposed loss function is also suitable for other applications.
<!-- - Why is it good to count something other than humans?
- Why is it important to reproduce this paper? -->
## Experimental Questions
1. **Can we *reproduce* the reported results using their proposed loss and architecture (VGG-19)?**
_We aim to reproduce the results on the *UCF-QNRF* and *ShanghaiTechA* datasets as presented in Table 1 of Ma et al. (for both *Bayesian* and *Bayesian+*)._
2. **Can we *validate* that the loss function presented by Ma et al. is model-agnostic?**
_We aim to test this by training a previously unexplored architecture (e.g. [DenseNet](https://arxiv.org/abs/1608.06993)) while keeping the proposed Bayesian loss and the train/test datasets fixed._
3. **Does applying this method to non-human datasets yield results similar to those on human datasets?**
_We aim to extend their experiments to a non-human dataset (e.g. [IOCfish](https://arxiv.org/abs/2304.11677)) to test whether their loss indeed generalizes to non-human crowds._
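
To compare our results with the reported numbers in all three questions, we need the same evaluation metrics. A minimal sketch of computing them from per-image counts is shown below. It assumes the convention common in the crowd counting literature, where "MSE" denotes the root of the mean squared count error. The function name `count_errors` and the example counts are hypothetical.

```python
import math


def count_errors(true_counts, pred_counts):
    """Return (MAE, MSE) over a set of images, given per-image counts.

    MSE here is the root of the mean squared error, which is an assumed
    convention for this sketch.
    """
    if not true_counts or len(true_counts) != len(pred_counts):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [pred - true for true, pred in zip(true_counts, pred_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


true_counts = [100, 250, 40]   # hypothetical ground-truth counts per image
pred_counts = [110, 240, 40]   # hypothetical predicted counts per image
mae, mse = count_errors(true_counts, pred_counts)
```

The predicted count of an image is obtained by summing its predicted density map, so the same function can be reused for any backbone or dataset.
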
## Task Division / Planning
| Task | Assignee |
| -------- | -------- |
| RQ1 | Dani, Vanessa |
| RQ2 | Dani |
| RQ3 | Nadine, Vanessa |
| Storyline writing | Nadine |
| Blog writing | All |
[^CountCars]: A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning. https://arxiv.org/abs/1609.04453
[^CountObjects]: Learning to Count Objects in Images. https://proceedings.neurips.cc/paper_files/paper/2010/file/fe73f687e5bc5280214e0486b273a5f9-Paper.pdf
[^MultiCount]: Multi-source multi-scale counting in extremely dense crowd images. https://ieeexplore.ieee.org/document/6619173
[^Bayesian]: Bayesian Loss for Crowd Count Estimation with Point Supervision. https://arxiv.org/abs/1908.03684
[^baseline]: The baseline shares the same network (VGG-19) and training process, but is trained without the Bayesian loss.