Causal Inference with DoWhy

# Causal Inference with [DoWhy](https://github.com/Microsoft/dowhy)

<br>
<br>
<small>Neil John D. Ortega</small><br>
<small>
<small>ML Engineer @ LINE Fukuoka</small><br>
<small>2021/02/26</small>
</small>

---

## Agenda

- Why Causal Inference (CI)?
- Why [DoWhy](https://github.com/Microsoft/dowhy)?
- 4-Step CI Workflow
- Case Study: What Causes Hotel Booking Cancellations?
- Recap

---

## Why Causal Inference (CI)?

- Increased usage of ML algorithms in decision-making across verticals, often with huge consequences
- Increased need to understand
  - why a model suggested a particular decision
  - what the effects of that decision are (often with ethical/social consequences)
- Causal inference could help, but it has its own set of challenges:
  - different frameworks can be used
  - comparing assumptions is not straightforward
  - comparing the robustness of results is not straightforward

---

## Why [DoWhy](https://github.com/Microsoft/dowhy)?

- Allows modeling problems as a causal graph, making the assumptions more explicit
- Provides a single interface for different CI frameworks (structural causal model vs. potential outcome framework)
- Allows automatic validation of assumptions and assessment of the robustness of estimates under "what-if" scenarios

---

## 4-Step CI Workflow

1. **Model** the problem as a causal graph
2. **Identify** a target variable under the causal model
3. **Estimate** the causal effect based on the target variable
4. **Refute** the obtained estimate

----

### (1) Model problem as a causal graph

- Causal graph
  - DAG that encodes domain knowledge as a set of causal assumptions

![](https://i.imgur.com/MJotPHh.png)

----

### (1) Model problem as a causal graph

- Intervention graph
  - keeping everything else the same, we select a treatment whose causal effect on the outcome we want to estimate

![](https://i.imgur.com/Z4mOwQn.png)

----

### (2) Identify target variable

- How to represent the desired quantities from the **intervention graph** using statistical observations from data generated by the **causal graph**
- Can we estimate these from the given data?

----

### (3) Estimate causal effect

- Keeping all confounders constant, estimate the conditional probability $P(Y|T=t)$ of the outcome given the treatment
- Statistical/ML methods are employed here
- Faces the same challenges as non-causal estimation (e.g. the bias-variance tradeoff)

----

### (4) Refute the obtained estimate

- Test how the estimates behave under "what-if" scenarios
- Can be done with
  - any one of the previous steps (unit test-like), or
  - the entire pipeline (integration test-like)
- The more tests you can do, the better, i.e. :arrow_up: confidence in the model
- :warning: **DOES NOT PROVE CORRECTNESS**

---

## Case Study: What Causes Hotel Booking Cancellations?

---

## Recap

<style>
.reveal ul {font-size: 32px !important;}
</style>

- Increased need to peek inside the black box of automated decision-making. We also need to determine the effects of a decision, should we implement it
- Causal inference is well suited to both problems, but comes with its own set of challenges
- [DoWhy](https://github.com/Microsoft/dowhy) is a good starting point for integrating causal techniques into the DS/ML pipeline
  - Good level of abstraction over the steps of the CI workflow
  - Out-of-the-box implementations of techniques for each step (mix and match, but with due diligence)
  - Automated robustness checks for higher confidence in the model and its estimates

---

# Thank you! :nerd_face:

---

## References

<style>
.reveal p {font-size: 20px !important;}
.reveal ul, .reveal ol {
  display: block !important;
  font-size: 32px !important;
}
section[data-id="references"] p {
  text-align: center !important;
}
</style>

[1] Sharma, A. and Kiciman, E. "DoWhy: An End-to-End Library for Causal Inference." arXiv:2011.04216 (2020).
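
---

## Appendix: Estimate and Refute, Sketched

The Estimate and Refute steps of the workflow can be illustrated with a small simulation. This is a minimal sketch in plain Python, not DoWhy's API: the function names (`simulate`, `naive_effect`, `adjusted_effect`, `placebo_effect`), the single binary confounder `W`, and all numeric parameters are assumptions made for illustration. It compares a naive difference in means with an estimate that holds the confounder constant, then permutes the treatment as a placebo-style "what-if" check.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate (w, t, y) rows where W confounds both T and Y (toy data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0        # confounder
        p_treat = 0.8 if w == 1 else 0.2          # W influences the treatment
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 3.0 * w + rng.gauss(0, 1)  # W influences the outcome
        rows.append((w, t, y))
    return rows


def naive_effect(rows):
    """Difference in mean outcome between treated and control, ignoring W."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Compare treated vs. control within each stratum of W, then weight by P(W=w)."""
    total = 0.0
    for w_val in (0, 1):
        stratum = [(t, y) for w, t, y in rows if w == w_val]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        total += (len(stratum) / len(rows)) * (mean(treated) - mean(control))
    return total


def placebo_effect(rows, seed=1):
    """Refutation check: shuffle the treatment so it cannot cause the outcome."""
    rng = random.Random(seed)
    shuffled_t = [t for _, t, _ in rows]
    rng.shuffle(shuffled_t)
    placebo_rows = [(w, t, y) for (w, _, y), t in zip(rows, shuffled_t)]
    return adjusted_effect(placebo_rows)


rows = simulate()
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
placebo = placebo_effect(rows)
print(f"naive={naive:.2f} adjusted={adjusted:.2f} placebo={placebo:.2f}")
```

In this toy setup the naive estimate is biased upward (in expectation 2 + 3 × (0.8 − 0.2) = 3.8), the adjusted estimate should land near the simulated effect of 2, and the placebo estimate should land near 0. A placebo estimate far from 0 would be a reason to distrust the estimator, while one near 0 raises confidence without proving correctness.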