# Partial Out Everything - from FWL theory to CUPED, Control Variates, and Double ML

![-partial-out-everything](https://hackmd.io/_uploads/BktsN6nWlg.jpg)

In recent research, I found that multiple methods in A/B testing and causal inference rely on the same result: the Frisch-Waugh-Lovell (FWL) theorem. Before digging into this topic, I want to briefly explain the relation between the t-test and linear regression.

## T-test vs. Linear Regression

> They are the same if the regressor in the linear regression is a dummy variable ({0, 1}).

In a t-test, we compute the t-statistic after the experiment from the treatment assignment $X$ and the outcome $Y$. The same comparison can be written as a linear regression:

$$
Y = \tau X + \varepsilon
$$

where $X$ indicates whether the outcome $Y$ comes from the control group ($X=0$) or the treatment group ($X=1$). The estimated coefficient $\tau$ is exactly the difference in group means, and the t-statistic for testing $\tau = 0$ is the same as the two-sample (equal-variance) t-statistic.

## Frisch-Waugh-Lovell Theorem

In linear regression, when we want to know the relation between $X$ and $Y$ while controlling for (partialling out) $Z$, we can simply run

$$
Y \sim X + Z
$$

The FWL theorem says there is an equivalent way to obtain the coefficient on $X$:

1. Regress $Y \sim Z$, and keep the residual $Y^*$.
2. Regress $X \sim Z$, and keep the residual $X^*$.
3. Regress $Y^* \sim X^*$; the coefficient on $X^*$ is identical to the coefficient on $X$ in the full regression.

![double-machine-fighting](https://hackmd.io/_uploads/rJIXUp2ble.jpg)

## Double Machine Learning

Double Machine Learning (DML) is a causal inference method whose goal is to estimate the relation between $X$ and $Y$ with the effect of the other covariates removed. The concept is simple. In the FWL theorem, the partialling out is done with linear regression $f(\cdot)$:

1. $Y = f(Z) + \varepsilon$, and we keep the $Y$ residual $Y^*$.
2. $X = f(Z) + \delta$, and we keep the $X$ residual $X^*$.
3. $Y^* \sim X^*$, and we get the same relation.

In Double Machine Learning, we replace the linear regression $f(\cdot)$ with a machine learning model $D(\cdot)$:

1. $Y = D(Z) + \varepsilon$, and we keep the $Y$ residual $Y^*$.
2. $X = D(Z) + \delta$, and we keep the $X$ residual $X^*$.
3. Regress $Y^*$ on $X^*$ (this final stage is still a plain linear regression), and we get the relation.

In practice, DML also fits $D(\cdot)$ with sample splitting (cross-fitting) so that overfitting in the nuisance models does not bias the final estimate.
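The dummy-variable claim above is easy to check numerically: the OLS slope of $Y$ on a {0, 1} treatment indicator equals the difference in group means. A minimal sketch with NumPy, using simulated data (the lift of 0.5 is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)          # treatment dummy: 0 = control, 1 = treatment
y = 2.0 + 0.5 * x + rng.normal(size=1000)  # outcome with a true lift of 0.5

# OLS fit of y ~ 1 + x
X = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

diff_in_means = y[x == 1].mean() - y[x == 0].mean()
print(slope, diff_in_means)  # the two estimates coincide
assert np.isclose(slope, diff_in_means)
```

Running the t-test on $Y$ by group and testing $\tau = 0$ in this regression would likewise give the same t-statistic.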
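The three-step recipe above can also be verified in a few lines: partialling $Z$ out of both $Y$ and $X$ and regressing residual on residual recovers the same coefficient as the full regression $Y \sim X + Z$. A sketch with simulated data (NumPy only; the coefficients are illustrative assumptions). Replacing the two least-squares fits on $Z$ with any ML model turns this into the Double Machine Learning recipe:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=(n, 2))                        # confounders
x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)
y = 0.7 * x + z @ np.array([2.0, 1.0]) + rng.normal(size=n)

def ols(design, target):
    """Least-squares coefficients for target ~ 1 + design."""
    design = np.column_stack([np.ones(len(design)), design])
    return np.linalg.lstsq(design, target, rcond=None)[0]

# Full regression Y ~ X + Z; the coefficient on X is beta_full[1]
beta_full = ols(np.column_stack([x, z]), y)

# FWL: residualize Y and X on Z, then regress residual on residual
y_res = y - np.column_stack([np.ones(n), z]) @ ols(z, y)
x_res = x - np.column_stack([np.ones(n), z]) @ ols(z, x)
beta_fwl = ols(x_res, y_res)[1]

print(beta_full[1], beta_fwl)  # identical up to floating-point error
assert np.isclose(beta_full[1], beta_fwl)
```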
## Control Variates

Control variates, also called regression adjustment, is a way to reduce variance. To use it, we first need a covariate $Z$ other than $X$ and $Y$ that satisfies two conditions:

1. $Z$ is correlated with $Y$; the stronger the correlation, the better.
2. $Z$ is independent of $X$.

Because $Z$ is independent of $X$, the second step of the FWL procedure is unnecessary. In regression form, after finding $Z$, the steps are:

1. Regress $Y \sim Z$, and keep the residual $Y^*$.
2. Regress $Y^* \sim X$, and we get the relation.

This view is called **regression adjustment**. If we switch to a hypothesis-testing view, the process instead focuses on transforming $Y$ into an adjusted outcome $Y_{adj}$:

$$
Y_{adj} = Y - f(Z) = Y - \theta Z
$$

However, since we want $Y_{adj}$ to be an unbiased estimator, i.e. $E(Y_{adj}) = E(Y)$, we need to add the mean back:

$$
Y_{adj} = Y - \theta Z + \theta \bar{Z}
$$

where $\bar{Z}$ is the average of $Z$. Since our goal is to minimize variance, a short calculation shows that the optimal choice is

$$
\theta = \cfrac{cov(Y, Z)}{var(Z)}
$$

which is exactly the slope from regressing $Y \sim Z$. Therefore, once we find a suitable $Z$, the variance drops from $var(Y)$ to $var(Y_{adj}) = (1 - \rho^2)\,var(Y)$, where $\rho$ is the correlation between $Y$ and $Z$. This view is called **control variates**.

![the-shadow-of-the-shadow](https://hackmd.io/_uploads/Sk1rL6hZel.jpg)

## CUPED (Controlled-experiment Using Pre-Experiment Data)

How do we find a good $Z$? The most direct choice is:

> The pre-experiment value of the outcome metric is a near-perfect $Z$.

The pre-experiment outcome is correlated with $Y$, since it is the same metric measured earlier. It is also independent of $X$, since at that time the treatment had not yet touched the outcome. Moreover, this $Z$ is usually already available: we often run an A/A test before the A/B test, so we can use that data directly. The method of using pre-experiment data this way is called **CUPED**.
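The adjustment above is easy to simulate: take the pre-experiment metric as $Z$, compute $\theta = cov(Y, Z)/var(Z)$, and check that $Y_{adj}$ keeps the same mean but has a smaller variance. A sketch with simulated data (the means, scales, and the 0.8 slope are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.normal(loc=10.0, scale=2.0, size=n)        # pre-experiment metric (the control variate)
y = 5.0 + 0.8 * z + rng.normal(scale=1.0, size=n)  # in-experiment metric, correlated with z

# CUPED / control-variates adjustment: Y_adj = Y - theta * (Z - Zbar)
theta = np.cov(y, z, ddof=0)[0, 1] / np.var(z)
y_adj = y - theta * (z - z.mean())

print(y.mean(), y_adj.mean())  # same mean: the adjustment is unbiased
print(y.var(), y_adj.var())    # adjusted variance is smaller
assert np.isclose(y.mean(), y_adj.mean())
assert y_adj.var() < y.var()
```

In a real experiment the adjustment is applied per unit before running the usual t-test on $Y_{adj}$; the achieved reduction matches the $1 - \rho^2$ factor above.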
## References

- https://zhuanlan.zhihu.com/p/604335170
- https://ai.stanford.edu/~ronnyk/2013-02CUPEDImprovingSensitivityOfControlledExperiments.pdf
- https://www.evanmiller.org/you-cant-spell-cuped-without-frisch-waugh-lovell.html
- https://en.wikipedia.org/wiki/Frisch%E2%80%93Waugh%E2%80%93Lovell_theorem