# Causal Inference with [DoWhy](https://github.com/Microsoft/dowhy)
<br>
<br>
<small>Neil John D. Ortega</small><br>
<small>
<small>ML Engineer @ LINE Fukuoka</small><br>
<small>2021/02/26</small>
</small>
---
## Agenda
- Why Causal Inference (CI)?
- Why [DoWhy](https://github.com/Microsoft/dowhy)?
- 4-Step CI Workflow
- Case Study: What Causes Hotel Booking Cancellations?
- Recap
---
## Why Causal Inference (CI)?
- Increased usage of ML algorithms in decision-making across verticals, often with huge consequences<!-- .element: class="fragment" -->
- Increased need to understand:<!-- .element: class="fragment" -->
  - why a model suggested a particular decision<!-- .element: class="fragment" -->
  - what the effects of that decision are (often with ethical/social consequences)<!-- .element: class="fragment" -->
- Causal inference can help, but it has its own set of challenges:<!-- .element: class="fragment" -->
  - different frameworks can be used<!-- .element: class="fragment" -->
  - comparing assumptions is not straightforward<!-- .element: class="fragment" -->
  - comparing the robustness of results is not straightforward<!-- .element: class="fragment" -->
---
## Why [DoWhy](https://github.com/Microsoft/dowhy)?
- Allows modeling problems as a causal graph, making the assumptions more explicit<!-- .element: class="fragment" -->
- Provides a single interface for different CI frameworks (structural causal model vs. potential outcome framework)<!-- .element: class="fragment" -->
- Allows automatic validation of assumptions and assessment of robustness of estimates under "what-if" scenarios<!-- .element: class="fragment" -->
---
## 4-Step CI Workflow
1. **Model** the problem as a causal graph
2. **Identify** a target estimand under the causal model
3. **Estimate** the causal effect based on the identified estimand
4. **Refute** the obtained estimate
----
### (1) Model problem as a causal graph
- Causal graph: a DAG that encodes domain knowledge as a set of causal assumptions
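
A minimal sketch of this idea, assuming Python 3.9+ for the standard-library `graphlib` module. The node names `W`, `T`, `Y` and the helpers `is_dag` and `causal_order` are hypothetical illustrations, not part of DoWhy. The graph is stored as a map from each node to its assumed direct causes.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical causal assumptions: node -> set of its direct causes (parents).
# W is a confounder that affects both the treatment T and the outcome Y.
causal_graph = {
    "W": set(),
    "T": {"W"},
    "Y": {"W", "T"},
}


def is_dag(parents):
    """Return True if the parent map contains no directed cycle."""
    try:
        # static_order() raises CycleError when it encounters a cycle.
        list(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


def causal_order(parents):
    """Return the nodes ordered so that every cause precedes its effects."""
    return list(TopologicalSorter(parents).static_order())


print(is_dag(causal_graph))
print(causal_order(causal_graph))
```

Writing the assumptions down as data makes them easy to review: every edge is a claim that one variable directly causes another, and every missing edge is a claim that it does not.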

----
### (1) Model problem as a causal graph
- Intervention graph: keeping everything else the same, we select a treatment whose causal effect on the outcome we want to estimate
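
One common way to picture the intervention graph is "graph surgery": cut every edge pointing into the treatment so that nothing else influences it. The sketch below assumes the same hypothetical parent-map representation (`W`, `T`, `Y`); the helper `intervene` is illustrative only and is not a DoWhy function.

```python
def intervene(parents, treatment):
    """Return a copy of the parent map with all edges into `treatment` removed."""
    return {
        node: (set() if node == treatment else set(causes))
        for node, causes in parents.items()
    }


# Hypothetical causal graph: node -> set of direct causes.
causal_graph = {"W": set(), "T": {"W"}, "Y": {"W", "T"}}

# After intervening on T, the confounder W no longer drives the treatment,
# but both W and T still affect the outcome Y.
intervention_graph = intervene(causal_graph, "T")
print(intervention_graph)
```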

----
### (2) Identify target estimand
- Express the desired quantities from the **intervention graph** using statistical observations from data generated by the **causal graph**
- Can we estimate these from the given data?
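
As one worked instance, assume a set of observed variables $W$ (a symbol introduced here for illustration) that satisfies the back-door criterion relative to treatment $T$ and outcome $Y$. The standard back-door adjustment formula then reads:

$$
P(Y \mid do(T=t)) = \sum_{w} P(Y \mid T=t, W=w)\,P(W=w)
$$

The left-hand side is defined on the intervention graph, while every term on the right-hand side can be computed from observational data. When such a rewriting exists, the effect is identifiable; other identification strategies lead to different expressions.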
----
### (3) Estimate causal effect
- Holding all confounders $W$ constant, estimate the conditional probability $P(Y|T=t, W=w)$ of the outcome given the treatment
- Statistical/ML methods are employed here
- Faces the same challenges as non-causal estimation (e.g. bias-variance tradeoff)
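
To make "keeping confounders constant" concrete, here is a small sketch that compares a naive difference in means with a stratified (back-door adjusted) estimate. The toy data, the single binary confounder `w`, and the function names are all assumptions made for illustration; this is plain Python, not DoWhy's estimator.

```python
from statistics import fmean

# Hypothetical toy data: rows of (w, t, y) with y = t + 2*w,
# so the true effect of T on Y is 1 within every stratum of W.
# Treatment is more common when w == 1, which confounds the naive comparison.
rows = (
    [(0, 0, 0)] * 8 + [(0, 1, 1)] * 2
    + [(1, 0, 2)] * 2 + [(1, 1, 3)] * 8
)


def mean_outcome(data, t, w=None):
    """Mean of y among rows with treatment t (and confounder value w, if given)."""
    return fmean(y for (wi, ti, y) in data if ti == t and (w is None or wi == w))


def naive_effect(data):
    """Difference in mean outcome between treated and untreated, ignoring W."""
    return mean_outcome(data, 1) - mean_outcome(data, 0)


def adjusted_effect(data):
    """Average the within-stratum differences, weighted by the share of each W value."""
    total = len(data)
    effect = 0.0
    for w in sorted({wi for (wi, _, _) in data}):
        weight = sum(1 for (wi, _, _) in data if wi == w) / total
        effect += weight * (mean_outcome(data, 1, w) - mean_outcome(data, 0, w))
    return effect


print(naive_effect(rows))
print(adjusted_effect(rows))
```

On this data the naive estimate overstates the effect because treated units are concentrated in the high-outcome stratum, while the stratified estimate recovers the effect built into the data.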
----
### (4) Refute the obtained estimate
- Test how the estimates behave under "what-if" scenarios
- Can be done with
  - any one of the previous steps (unit test-like), or
  - the entire pipeline (integration test-like)
- The more tests you can do, the better, i.e. :arrow_up: confidence in the model
- :warning: **DOES NOT PROVE CORRECTNESS**
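
One example of such a "what-if" test is a placebo check: replace the real treatment with a randomly shuffled one and re-run the estimator. If the pipeline is sound, the estimated effect should shrink towards zero. The sketch below is a hand-rolled illustration on made-up data; the function names, the number of simulations, and the seed are assumptions, and it does not reproduce DoWhy's own refuters.

```python
import random
from statistics import fmean


def diff_in_means(treatment, outcome):
    """Simple estimator: mean outcome of treated minus mean outcome of untreated."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return fmean(treated) - fmean(control)


def placebo_refute(treatment, outcome, estimator, n_simulations=20, seed=0):
    """Average the estimate obtained when the treatment is replaced by a shuffled copy."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_simulations):
        placebo = list(treatment)
        rng.shuffle(placebo)  # breaks any real link between treatment and outcome
        effects.append(estimator(placebo, outcome))
    return fmean(effects)


# Hypothetical data in which treatment raises the outcome by exactly 1.
treatment = [1] * 100 + [0] * 100
outcome = [1.0] * 100 + [0.0] * 100

original_effect = diff_in_means(treatment, outcome)
placebo_effect = placebo_refute(treatment, outcome, diff_in_means)
print(original_effect, placebo_effect)
```

A placebo effect close to the original estimate would be a red flag. A placebo effect near zero only means this particular test did not find a problem; it does not prove the estimate is correct.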
---
## Case Study: What Causes Hotel Booking Cancellations?
---
## Recap
<style>
.reveal ul {font-size: 32px !important;}
</style>
- Increased need to peek inside the black box of automated decision-making, and to determine the effects of a decision should we implement it<!-- .element: class="fragment" -->
- Causal inference lends itself to the above problem, but with its own set of challenges<!-- .element: class="fragment" -->
- [DoWhy](https://github.com/Microsoft/dowhy) is a good starting point for integrating causal techniques into the DS/ML pipeline<!-- .element: class="fragment" -->
  - Good level of abstraction over the steps of the CI workflow<!-- .element: class="fragment" -->
  - Out-of-the-box implementations of techniques for each of the steps (mix and match, but with due diligence)<!-- .element: class="fragment" -->
  - Automated robustness checks for higher confidence in the model and its estimates<!-- .element: class="fragment" -->
---
# Thank you! :nerd_face:
---
## References
<!-- .slide: data-id="references" -->
<style>
.reveal p {font-size: 20px !important;}
.reveal ul, .reveal ol {
display: block !important;
font-size: 32px !important;
}
section[data-id="references"] p {
text-align: center !important;
}
</style>
[1] Sharma, A. and E. Kiciman. “DoWhy: An End-to-End Library for Causal Inference.” ArXiv abs/2011.04216 (2020): n. pag.