# Causal inference Group meeting notes

## Mediation

#### Definition (CDE)

For any three variables $X$, $Y$ and $Z$, where $Z$ is a mediator between $X$ and $Y$, the *controlled direct effect* (CDE) on $Y$ of changing the value of $X$ from $x$ to $x'$ is defined as:

$$CDE = P(Y=y|do(X=x), do(Z=z)) - P(Y=y|do(X=x'), do(Z=z)).$$

So the CDE is the difference between the probabilities of $Y=y$ under two distinct interventions on $X$, while also intervening on the mediator $Z$ to hold it fixed at the same value $z$ in both cases.

**Question:** When is the CDE identifiable? (Recall that a causal effect is identifiable if it can be uniquely determined from the causal structure on the basis of the observations only.)

In general, the CDE of $X$ on $Y$, mediated by $Z$, is identifiable if the following two properties hold:

1. There exists a set $S_1$ of variables that blocks all backdoor paths from $Z$ to $Y$;
2. There exists a set $S_2$ of variables that blocks all backdoor paths from $X$ to $Y$, after deleting all arrows entering $Z$.

**Question:** What is the total effect and what is the CDE in Figure 3.1.2?

**Note:** In linear systems, IDE = TE - CDE, but this does not hold in general in non-linear systems.

## 3.8 Causal Inference in Linear Systems

* The causal methods introduced in the book work regardless of the type of equations that make up the model in question. **d-separation** and the backdoor criterion (BDC) make no assumptions whatsoever about the form of the relationship between two variables, only that the relationship exists. However, things become simpler in the linear setting, and that is the setting we work in from here on.

See here for a refresher on multivariate normal distributions: https://almostsuremath.com/2021/02/24/multivariate-normal-distributions/

### What is the difference between structural and regression coefficients?
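
A small simulation may help make this question concrete. The sketch below is only an illustration under stated assumptions: a hypothetical linear model $X \to Z \to Y$ with an additional direct edge $X \to Y$, independent standard normal disturbances, and made-up coefficient values. The names `alpha` and `beta` mirror the structural equation for $Y$ in these notes; `gamma` (the $X \to Z$ coefficient) and all function names are introduced here for illustration. In this unconfounded setup, regressing $Y$ on both $X$ and $Z$ should give slopes close to the structural coefficients, while regressing $Y$ on $X$ alone should give a slope close to `alpha + beta * gamma`, the sum over the two directed paths from $X$ to $Y$.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=50_000, alpha=0.5, beta=2.0, gamma=1.5, seed=0):
    """Draw samples from the assumed linear model (all values hypothetical).

    X = U_X
    Z = gamma * X + U_Z
    Y = alpha * X + beta * Z + U_Y
    with independent standard normal disturbances U_X, U_Z, U_Y.
    """
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = gamma * x + rng.gauss(0.0, 1.0)
        y = alpha * x + beta * z + rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys


def slope_simple(x, y):
    """Least-squares slope of y regressed on x alone (with intercept)."""
    return cov(x, y) / cov(x, x)


def slopes_two(x, z, y):
    """Least-squares slopes (r1, r2) of y regressed on x and z (with intercept)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    r1 = (sxy * szz - szy * sxz) / det
    r2 = (szy * sxx - sxy * sxz) / det
    return r1, r2


if __name__ == "__main__":
    xs, zs, ys = simulate()
    r1, r2 = slopes_two(xs, zs, ys)
    print("y ~ x + z slopes:", r1, r2)  # expected near alpha and beta
    print("y ~ x slope:", slope_simple(xs, ys))  # expected near alpha + beta * gamma
```

The point of the sketch is that a regression slope depends on which variables are included in the regression, whereas the structural coefficients are fixed by the model. The agreement between the two-variable regression and the structural coefficients relies on the assumed absence of unobserved confounding.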
#### Regression equations

$$y=r_1x + r_2z + \epsilon$$

* Descriptive; they make no assumption about causation.
* The error term $\epsilon$ denotes the residual error in observation that remains after fitting the equation $y=r_1x + r_2z$ to the data. These errors are human-made and arise from imperfect fitting.

#### Structural equation

$$Y=\alpha X + \beta Z + U$$

* Makes an assumption about causation.
* The "error term" $U$ represents latent factors (aka disturbances or omitted variables) that influence $Y$ and are not themselves affected by $X$. These are nature-made.

**Question:** Is it a matter of interpretation then? A different way of looking at things? A different perspective?

* In a linear system, the direct effect of one variable on another corresponds to the structural coefficient.
* In a linear system, the total effect of $X$ on $Y$ is simply the sum of the products of the coefficients of the edges on every non-backdoor path from $X$ to $Y$.

Once we know the form of the linear system, which we can deduce from our graph, we can easily calculate the total effect (TE) and the direct effect (DE).

## Meeting Notes 22/11/22