# Causality matters in medical imaging
*By Aaditya Panchal*
P( Y | X )
**Predictive modeling** is based on the premise: "Given an image X, train a model to predict some annotation Y." This rests on two assumptions: (A) sufficient training data is available, and (B) training and test data come from the same distribution.
**Causal reasoning** looks more deeply at the relationship between X and Y. Prediction tasks fall into two categories:
* Causal: predicting the effect from the cause
* Anticausal: predicting the cause from the effect
## Challenges
Predictive modeling faces many challenges. One is **data scarcity**: labelled data is scarce compared with unlabelled data. Data scarcity can also compound the effect of **dataset shifts**.
Dataset shifts are, in short, differences between the training and test data caused by external factors, and they can directly affect a deep learning model's performance. The table below lists the prevalent shifts in medical imaging. They can become a major hurdle in this field, where inaccurate predictions carry serious consequences.
| Type                | Causal/Anticausal | Examples                                                  |
| ------------------- | ----------------- | --------------------------------------------------------- |
| Population shift    | Causal            | Ages, sexes, diets, habits, ethnicities, genetics         |
| Annotation shift    | Causal            | Annotation policy, annotator experience                   |
| Prevalence shift    | Anticausal        | Target selection                                          |
| Manifestation shift | Anticausal        | Anatomical manifestation of the target disease or trait   |
| Acquisition shift   | Either            | Scanner, resolution, contrast, modality, protocol         |
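To make one row of the table concrete, here is a minimal sketch of what a prevalence shift does to a model's output. It assumes a pure prevalence shift, meaning only the disease rate P(Y) changes while the way the disease appears in images, P(X | Y), stays the same. Under that assumption Bayes' rule gives a simple re-weighting of the predicted probability. The function name `adjust_for_prevalence` and the numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def adjust_for_prevalence(p_train: float, prev_train: float, prev_test: float) -> float:
    """Re-weight a predicted P(Y=1 | X) for a population with a different prevalence.

    Assumes a pure prevalence shift: P(X | Y) is unchanged, only P(Y) differs.
    """
    # Scale the evidence for each class by the ratio of new to old class prior.
    pos = p_train * prev_test / prev_train
    neg = (1.0 - p_train) * (1.0 - prev_test) / (1.0 - prev_train)
    # Renormalize so the two class probabilities sum to one.
    return pos / (pos + neg)


# Hypothetical numbers: a model trained where half of the cases are diseased
# outputs 0.80 for an image. In a population with 5% prevalence, the same
# image is much weaker evidence of disease.
adjusted = adjust_for_prevalence(0.80, prev_train=0.50, prev_test=0.05)
print(round(adjusted, 3))  # 0.174
```

The point of the sketch is that the model itself has not changed, yet its raw output is no longer calibrated once the prevalence in the test population differs from training.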
Models are often trained on data from small study samples, which may lead to **selection bias**. This bias arises from the way data is collected, which creates systematic differences between the selected sample and the target population. As a result, the conclusions or predictions made by the model or study may be inaccurate. It is often represented by two variables (X and Y) that both influence a selection variable (S); looking only at the selected data makes the relationship between X and Y seem stronger than it is.
Furthermore, the dataset shifts above can combine to produce the same effect. Selection bias is represented by the causal diagram:
X -> S <- Y
Here, **X represents the image** and **Y represents the outcome or target variable** that the model is trying to predict. **S is the selection mechanism, which is influenced by both X and Y** and decides which cases end up in the dataset.
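The effect of the structure X -> S <- Y can be seen in a small simulation. The sketch below is only an illustration under made-up assumptions: X and Y are drawn as independent random numbers (so there is no real link between them), and the selection rule `x + y > 1.0` is a hypothetical stand-in for a data collection process that depends on both. The helper `pearson` is written out by hand so the example needs only the standard library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, using only the standard library."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# X and Y are generated independently: there is no real link between them.
population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20000)]

# S depends on both X and Y (X -> S <- Y): a case enters the dataset
# only when the sum of the two values is large (hypothetical rule).
selected = [(x, y) for x, y in population if x + y > 1.0]

r_population = pearson(*zip(*population))
r_selected = pearson(*zip(*selected))

print(f"correlation in full population: {r_population:+.3f}")
print(f"correlation in selected sample: {r_selected:+.3f}")
```

In the full population the correlation is close to zero, while in the selected sample a clear (here negative) correlation appears. A model trained only on the selected sample could learn this association even though it does not exist in the target population.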
## Solutions
Although data scarcity and dataset shifts are prevalent, we can combat them using solutions like **semi-supervised learning** and **data augmentation**. Semi-supervised learning makes use of unlabelled images alongside the labelled ones, although this may only be plausible for anticausal problems, because only there do the distributions of X and Y given X depend on each other. Data augmentation allows us to generate additional (X, Y) pairs by adding random, controlled perturbations (or changes) to the data.
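Generating extra (X, Y) pairs through random, controlled perturbations can be sketched as follows. This is a toy illustration, not a recommended pipeline: the image is a small list of pixel rows, the function name `augment` and the parameter `noise_scale` are hypothetical, and the two perturbations (horizontal flip and slight intensity noise) are assumed not to change the label. That assumption does not hold for every medical task, for example when the side of the body matters.

```python
import random


def augment(image, label, rng, noise_scale=0.02):
    """Return a perturbed copy of (image, label); the label is left unchanged.

    `image` is a list of rows of floats in [0, 1].
    """
    rows = [list(row) for row in image]  # copy so the original is untouched
    if rng.random() < 0.5:
        rows = [row[::-1] for row in rows]  # random horizontal flip
    # Add small Gaussian intensity noise and clip back into [0, 1].
    noisy = [
        [min(1.0, max(0.0, px + rng.gauss(0, noise_scale))) for px in row]
        for row in rows
    ]
    return noisy, label


rng = random.Random(42)  # fixed seed so the run is repeatable
image = [[0.0, 0.2, 0.9],
         [0.1, 0.5, 1.0]]
label = "lesion"  # hypothetical annotation

# One labelled image becomes several slightly different (X, Y) pairs.
augmented_pairs = [augment(image, label, rng) for _ in range(4)]
for new_image, new_label in augmented_pairs:
    print(new_label, [[round(px, 2) for px in row] for row in new_image])
```

Each new pair keeps the original label, so the model sees more examples of the same annotation under slightly different imaging conditions.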
---
:::success
From the Causal paper, I understood this:
* Causal inference and causal diagrams can clearly express assumptions about the data. Organizing these assumptions into a precise framework for assessing problems can be vital in predictive modeling. Although deep learning faces many challenges, such as selection bias, data scarcity, and dataset shifts, they can be addressed using these methodologies.
* When using deep learning, we must be aware of all of these external factors. Even the slightest change can vastly alter the annotation linked to the image. To guard against this, we use causal inference and causal diagrams.
* We need a large amount of training data covering all the types of classifications we want to make.
:::