# Controlling Selection Bias in Causal Inference Notes
## Intro
Selection bias can be caused by excluding data. Often, cases are reported only when outcomes are unusual, while non-cases go unreported. This skews the data and produces an overrepresentation of unusual outcomes.
If X is the treatment (the intervention the researchers study or apply) and Y is an outcome (the result of the treatment), and both X and Y influence a selection variable S, then restricting attention to the selected units makes X and Y look more related than they actually are. This spurious association is **selection bias**.
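A minimal simulation sketch (my own toy example, with made-up variable names and effect sizes) of how conditioning on a selection variable S that depends on both X and Y induces a spurious association:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Y are generated independently, so the true association is zero.
x = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 0.5, n)

# S is a collider: inclusion in the study depends on both X and Y.
p_select = 0.1 + 0.4 * x + 0.4 * y
s = rng.binomial(1, p_select)

def odds_ratio(x, y):
    """Sample odds ratio between two binary variables."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    return (a * d) / (b * c)

print("OR in full population:", odds_ratio(x, y))                   # ~1, no association
print("OR in selected sample:", odds_ratio(x[s == 1], y[s == 1]))   # < 1, spurious association
```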
**Confounding** occurs when X and Y are both influenced by an omitted variable, which also makes the relationship between X and Y seem stronger than it really is.
Confounding Bias: the results are distorted because we did not account for all the factors that affect the study. This bias arises when part of the association between X (treatment) and Y (outcome) is due to how treatments were assigned. It can be removed by randomizing the treatment.
Selection Bias: the data are collected in a way that does not represent the whole population. This bias arises when part of the association between X and Y is due to how subjects were selected into the study. Unlike confounding bias, it cannot be removed by randomization.
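A toy simulation (my own example, not from the paper) illustrating the contrast: randomizing the treatment removes confounding by Z, but selecting subjects through S distorts the estimate even when treatment is randomized:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def risk_difference(x, y):
    """P(Y=1 | X=1) - P(Y=1 | X=0), a simple measure of association."""
    return y[x == 1].mean() - y[x == 0].mean()

# Z is an unobserved common cause; the true causal effect of X on Y is zero.
z = rng.binomial(1, 0.5, n)
y = rng.binomial(1, 0.2 + 0.3 * z)            # outcome depends on Z only

# Confounding: treatment assignment depends on Z.
x_obs = rng.binomial(1, 0.2 + 0.6 * z)
print("confounded estimate:  ", risk_difference(x_obs, y))   # clearly > 0, biased

# Randomization breaks the Z -> X link and removes the confounding.
x_rct = rng.binomial(1, 0.5, n)
print("randomized estimate:  ", risk_difference(x_rct, y))   # approximately 0

# Selection: inclusion S depends on both X and Y. Restricting to S = 1
# re-introduces bias even though the treatment was randomized.
s = rng.binomial(1, 0.1 + 0.4 * x_rct + 0.4 * y)
keep = s == 1
print("randomized + selected:", risk_difference(x_rct[keep], y[keep]))  # biased again
```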
## Selection Bias in Chain Structure
X -> Y -> S
When S = 1, it means someone is included in the study, and when it's 0, they're not. We call the data selected by this mechanism "s-biased."
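In other words (my restatement of what "s-biased" means here), what we observe is the conditional distribution rather than the population distribution:

$$
P(x, y \mid S = 1) \;\neq\; P(x, y) \quad \text{in general.}
$$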
The **odds ratio (OR)** is a statistical measure of the strength of association between two variables X and Y, possibly within strata of a third variable Z (the conditional odds ratio OR(X, Y | Z)). It tells us how the odds of the outcome Y change with the exposure X once Z is held fixed.
If the odds ratio is greater than 1, X is associated with increased odds of Y when Z is held constant; if it is less than 1, X is associated with decreased odds of Y; if it equals 1, there is no association between X and Y given Z.
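For binary X and Y, the conditional odds ratio can be written as (standard textbook definition, not quoted from the paper):

$$
OR(X, Y \mid Z=z) \;=\; \frac{P(Y=1 \mid X=1, z)\; P(Y=0 \mid X=0, z)}{P(Y=0 \mid X=1, z)\; P(Y=1 \mid X=0, z)}
$$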
The odds ratio is useful here because, for certain selection mechanisms (for example, when S depends only on the outcome Y, as in case-control designs and in the chain above), the OR computed from the s-biased data equals the OR in the population. It therefore lets us assess the strength of the X-Y association even in the presence of selection bias.
==G-recoverability==
OR(X, Y | Z, W): the odds ratio measuring the association between X and Y while conditioning on both Z and W. Collapsibility (over W) means that, once we have conditioned on the variables in Z, additionally conditioning on the variables in W does not change the odds ratio between X and Y.
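In symbols (my paraphrase of the usual definition), the OR is collapsible over W given Z when

$$
OR(X, Y \mid Z=z, W=w) \;=\; OR(X, Y \mid Z=z) \quad \text{for every value } w.
$$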
:::success
From the GLACIAL paper, I understood this:
* The Granger framework is a popular tool for finding cause-and-effect relationships in data collected over time. It is typically applied to data sampled frequently. In fields like population health, however, studies often follow many individuals but observe each of them at only a few time points. These studies track many variables, the relationships among the variables can be complex and individual-specific, and some data are usually missing, all of which makes cause-and-effect relationships hard to determine accurately.
* GLACIAL is a method that combines Granger causality (GC) with machine learning to examine causal connections among the variables of a longitudinal study (a plain GC test is sketched after this block).
    * Instead of treating the entire study as one big dataset, it treats each person's data as a separate sample. By doing this, it can use a standard training and testing approach, with some individuals held out, to assess causal relationships.
* GLACIAL can accurately identify relationships, even in challenging scenarios with limited data, numerous variables, direct causes, and missing information.
:::
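As background for the GC part (a generic example, not GLACIAL itself), here is a minimal sketch of a pairwise Granger-causality test using `statsmodels`; the simulated series and coefficients are made up for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(42)
T = 500

# Simulate two series where x Granger-causes y with a one-step lag.
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.5)

# statsmodels tests whether the SECOND column helps predict the FIRST column.
data = pd.DataFrame({"y": y, "x": x})
results = grangercausalitytests(data[["y", "x"]], maxlag=2)

# Small p-values for the F-tests indicate that past values of x improve
# the prediction of y beyond y's own past, i.e. "x Granger-causes y".
```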
#### Next Steps
Peruse:
https://www.sciencedirect.com/topics/neuroscience/granger-causality