# Causal Inference Workshop 2021
timetable: https://events.hifis.net/event/98/timetable/
## Housekeeping
### Code of Conduct
We facilitate this meeting under the [Dresden Code of Conduct](https://dresden-code-of-conduct.org/de/) (for lack of any other one). **Please be mindful of your peers and supportive in your communication!**
### Connection Details
Peter Steinbach is inviting you to a scheduled Zoom meeting.
Topic: An Introduction to Causality
Time: May 6, 2021 10:00 AM Amsterdam, Berlin, Rome, Stockholm, Vienna
Join Zoom Meeting
https://zoom.us/j/96747132671?pwd=ejNNY2ZIVUJnd1FSQWVoTVQ4UFdBdz09
Meeting ID: 967 4713 2671
Passcode: 284941
One tap mobile
+496938079883,,96747132671#,,,,*284941# Germany
+496950502596,,96747132671#,,,,*284941# Germany
Dial by your location
+49 69 3807 9883 Germany
+49 695 050 2596 Germany
+49 69 7104 9922 Germany
+49 30 5679 5800 Germany
+1 301 715 8592 US (Washington DC)
+1 312 626 6799 US (Chicago)
+1 346 248 7799 US (Houston)
+1 646 558 8656 US (New York)
+1 669 900 9128 US (San Jose)
+1 253 215 8782 US (Tacoma)
Meeting ID: 967 4713 2671
Passcode: 284941
Find your local number: https://zoom.us/u/acAUveM2gI
## Course
### Your Challenges: Bring Your Own Data
Please let us know if you would like to present your data (either orally or with 1-2 slides). Spontaneous contributions are welcome as well.
- beamline control: avoid drilling holes in walls; stochastically direct & shape beams using "mirrors"/magnets and undulators. Q: which change in the mirrors/actuators caused the beam to go where? Data: interaction points with values, beam location as outcome. Confounders: temperature, voltage, ...
- good aspect: this experiment can be controlled, i.e. you can run experiments where you alter all motors/mirrors/magnets and see what happens
- problem: depending on the size of the beamline, these experiments take a lot of time
- this way, the Markov equivalence class problem can be mitigated because I can construct a dataset that will provide me with the causal graph for my quantity/outcome of interest
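The point about interventions breaking Markov equivalence can be sketched in a toy numpy simulation (all variable names and numbers are made up, not from any real beamline): observationally, X → Y and Y → X fit equally well, but forcing X to a value (an intervention) only moves Y if X is truly the cause.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground truth: mirror setting X causes beam position Y  (X -> Y).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Observationally, X -> Y and Y -> X are Markov equivalent:
# both regression directions give a perfectly good linear fit.
b_xy = np.polyfit(x, y, 1)[0]   # slope of y ~ x, ~2.0
b_yx = np.polyfit(y, x, 1)[0]   # slope of x ~ y, ~0.4

# Intervention: clamp X to a fixed value, do(X = 3), and observe Y.
x_do = np.full(n, 3.0)
y_do = 2.0 * x_do + rng.normal(size=n)   # Y responds: mean(Y) ~ 6

# Intervening on Y instead would leave X untouched (mean(X) ~ 0),
# which is exactly what distinguishes the two graphs.
```

This mirrors the beamline advantage noted above: being able to move the actuators yourself is what lets you tell the equivalent graphs apart.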
- dataset on blood loss after birth (natural delivery vs. "surgical" delivery):
- confounders known (age of mother, age and weight of fetus, BMI, ...)
- isolated confounders with experts
- model didn't provide good predictions, randomized trial hard if not impossible
- Q: did we check for the right confounders?
- controls need to be validated with experts
- ideally look at data with randomization properties
- use instrumental variables e.g. availability of surgical doctors
- MR: Mendelian randomization
- Compare the results from trans and cis eQTL summary based data (SMR) using the cis eQTLs as Instrumental Variables
- See which genes could be robustly instrumented with multiple independent markers in a cohort with 46 different tissues
- Measure the instrument strength either with TSLS or the first-stage F-statistic (F_{stat})
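As a rough illustration of the instrumental-variable idea discussed above, here is a manual two-stage least squares on simulated data, including the first-stage F-statistic as an instrument-strength check. All variable names and coefficients are invented, not taken from any of the datasets presented:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Toy model (hypothetical): unobserved confounder u affects both the
# treatment t (e.g. surgical delivery) and the outcome y (blood loss);
# z is the instrument (e.g. surgeon availability): it shifts t but has
# no direct path to y.
u = rng.normal(size=n)
z = rng.normal(size=n)
t = 0.8 * z + 1.0 * u + rng.normal(size=n)
y = 1.5 * t + 2.0 * u + rng.normal(size=n)   # true causal effect: 1.5

# Naive OLS of y on t is biased upward by the confounder u.
naive = np.polyfit(t, y, 1)[0]

# Stage 1: predict t from z. Stage 2: regress y on the prediction.
stage1 = np.polyfit(z, t, 1)
t_hat = np.polyval(stage1, z)
iv_est = np.polyfit(t_hat, y, 1)[0]   # ~1.5, the causal effect

# First-stage F-statistic for a single instrument: F = r^2 (n-2) / (1-r^2).
# Rule of thumb: F > 10 suggests a strong instrument.
r2 = np.corrcoef(z, t)[0, 1] ** 2
f_stat = r2 * (n - 2) / (1 - r2)
```

The same two-stage recipe underlies the SMR/eQTL setup: the cis eQTL plays the role of `z`, expression the role of `t`.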
- effect of physical activity on molecules in blood. Data: longitudinal observational study (t=1..3), binary outcome (molecule present).
- Q: adjust for age? (has effect on both variables of interest)
- first stab: control for age, as it might affect the level of molecules in the blood when starting the study
- Q: data has selection bias
- would worry about it, e.g. marathon runners (consistent exercise, but also nutrition effects) vs. occasional runners
- BMI circular effect on physical activity?
- what is the effect you try to estimate? (average effect or specific effect)
- for average effect, monitoring/mitigating selection bias is important; for specific effect perhaps not
- maybe phrase the question as the effect on BMI at a given point in time (BMI after the study, BMI at a fixed time `t`)
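A minimal sketch of the age-adjustment question from this discussion, with entirely made-up numbers and a continuous outcome for simplicity (the study's actual outcome is binary): age confounds the activity → molecule estimate, and including it as a covariate recovers the simulated effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical toy model: age affects both physical activity and the
# molecule level, so it confounds the activity -> molecule estimate.
age = rng.uniform(20, 70, size=n)
activity = 10 - 0.1 * age + rng.normal(size=n)
molecule = 0.5 * activity + 0.05 * age + rng.normal(size=n)  # true effect: 0.5

# Unadjusted slope mixes the causal effect with the backdoor path via age.
unadjusted = np.polyfit(activity, molecule, 1)[0]

# Adjusted: include age as a covariate in a joint linear fit.
X = np.column_stack([np.ones(n), activity, age])
coef, *_ = np.linalg.lstsq(X, molecule, rcond=None)
adjusted = coef[1]   # ~0.5, close to the simulated causal effect
```

In this simulation the unadjusted slope lands far from 0.5, which is the kind of bias the "control for age" suggestion is meant to remove.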
### Feedback before Lunch break
#### Please share something with us that you liked about the course or something that you learned :+1:
- I like the interactiveness of the presentation :+5:
- the real-world examples helped a lot to make the more theoretical ideas concrete
- Small setting that favors interaction, extremely interesting, thanks a lot!
- Very good introduction into the concept of Causal Inference, good examples, all questions answered. (PS: never forget COLLIDER BIAS)
#### Please share something with us that you didn't like or something that you want us to improve :-1:
- Niki's audio problems
- The methods part/ overview was very dense. Looking forward to a few step-by-step applications in the afternoon ;-)
- Please spend a bit more time introducing terminology typically used in causal inference, such as T=treatment, C=control(?). Maybe also mention that these are rooted in clinical studies (if true).
- Please spend a bit more time making sure that most people have digested the concepts of confounder vs. collider before moving on to topics that build on them, such as instrumental variables and the auxiliary-model method eventually implemented in the notebook for the afternoon.
- On various slides: it would help to use the symbols from the equations (such as L, T, \hat{t}, X, Y, W, ...) in the graph sketches as well.
## Exercises
### how-to
- Unzip the distributed file
- Go to https://colab.research.google.com/ and create a new notebook, choosing the "upload" option
- Upload the ipynb file
- In the tabs on the left, go to the "files" section. Drag & drop the remaining 3 files from the zip into this field (1 parquet file, 2 joblib files)
- After the parquet has finished uploading, you can start - go through the cells, running them using shift-enter and roughly following along.
- Your exercise is at the bottom. We hope that most can finish the main exercise - if you're very fast, there is extra credit. HINT: no need to follow every detail above the exercise. You can just quickly run everything and then follow the instructions in the exercise-section.
### running the notebooks locally
- the zip file is located here: https://hmgubox2.helmholtz-muenchen.de/index.php/s/fKkxtzd3d3gzMa7
- note that you may want to install `pyarrow` or `fastparquet` before loading the `.parquet` file (in my local pip `20.2.2` installation with python `3.9`, this was not pulled in as a required dependency by default)
- `econml` can be found at https://github.com/microsoft/EconML
### Feedback on the afternoon part
#### Please share something with us that you liked about the course or something that you learned :+1:
- the exercises were a nice fit to really start bringing the more abstract topics from the morning into a domain where the abstract context needed to become concrete
- Niki's talk was great and made many things more clear! Esp. the IV
explanation using the smoker/treatment assignment with the assignment being
the instrument was super helpful.:+2:
- I like the style of Niki's talk as it went from almost atomic observations between entities to a causal inference :+2:
#### Please share something with us that you didn't like or something that you want us to improve :-1:
- Maybe a different code example to understand the morning material better. :+2:
- I was still chewing on the morning material, so I spent most of the coding hour actually understanding the data set.
- The concept of "elasticity", which comes up at the beginning of the notebook, could use a more rigorous definition. If that was defined in the morning talk, then I might have missed it :)
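Assuming "elasticity" in the notebook means the usual price elasticity of demand, E = d log Q / d log P, a short simulated example (all numbers invented, not from the course data) shows how the slope of a log-log regression recovers it:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Constant-elasticity demand: Q = A * P^E  =>  log Q = log A + E * log P.
true_elasticity = -1.8
price = rng.uniform(1.0, 20.0, size=n)
quantity = 50.0 * price ** true_elasticity * np.exp(rng.normal(0, 0.1, size=n))

# The slope of the log-log regression estimates the elasticity:
# a 1% price increase changes demand by roughly E percent.
elasticity = np.polyfit(np.log(price), np.log(quantity), 1)[0]
```

The causal question in the notebook is then whether this slope reflects the effect of *changing* the price, or merely the observational association.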
- The notebook mentioned another notebook called `_prep`(?), which was not present. Probably a leftover text snippet from another course?
- The coding done by Niki in the afternoon session was very fast. One could
only follow if all material from before was completely clear. But it showed
that he really knows his stuff!
- The xlabel of the plot
```py
sns.distplot(df.groupby('StockCode').UnitPrice.std().dropna().clip(0, 15),
kde=False)
```
says "UnitPrice", while it should say for instance "std(UnitPrice)".
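One way the label could be set explicitly (a sketch with invented stand-in data, since the course dataset isn't included here; it uses a plain matplotlib histogram because `sns.distplot` is deprecated in recent seaborn releases):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend for scripted use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy stand-in for the course data; StockCode and UnitPrice are the only
# columns the original snippet touches.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "StockCode": rng.integers(0, 50, size=2000).astype(str),
    "UnitPrice": rng.exponential(5.0, size=2000),
})

# Per-product price spread, as in the notebook, with an explicit xlabel.
spread = df.groupby("StockCode").UnitPrice.std().dropna().clip(0, 15)
fig, ax = plt.subplots()
ax.hist(spread, bins=20)
ax.set_xlabel("std(UnitPrice)")   # instead of the default "UnitPrice"
```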
- In case the first part of Niki's talk (2SLS) is equal to the auxiliary model
method implemented in the notebook, then the talk (at least this part) should
be given in the morning session. If not then I'm confused and have to recap
the material even more.
- the notebook blew my mind a bit: I wasn't familiar with some of the terms used, and in addition, training was not performed the way I am used to (train-test split). I think that imposed quite some cognitive load and made me struggle to appreciate the causal-inference part of the notebook :+4:
- The python implementation was hard to follow for non-python people.
- The last talk from Niki was so interesting that I wish it would have been presented completely - a little bit more time would be awesome! Please keep that series, really nice group of people!
### Why Causal Inference?
https://pbs.twimg.com/media/E0PkD69XoAo4Kql?format=jpg&name=medium
The image above is based on a recent paper by Gelman identifying the most important statistical ideas of the last 50 years. This course brought you onto square 1!