# PCS Papers Summaries
## Veridical Data Science [(Yu and Kumbier, 2020)](https://www.pnas.org/doi/epdf/10.1073/pnas.1901326117)
#### What does PCS do for science?
PCS provides guidelines for properly evaluating how well a model performs. It defines the properties a "good" model should satisfy: some are quantifiable (e.g. a measurable loss function), while others take the form of a preliminary checklist (e.g. constraints the analyst should think about).
#### What do researchers gain by using PCS?
Researchers who use PCS gain a sound framework for evaluating models, methods, and processes for their specific application.
## Stable Discovery of Interpretable Subgroups... [(Dwivedi et al., 2020)](https://arxiv.org/pdf/2008.10109.pdf)
#### Setting: Causal Inference with heterogeneous treatment effects
#### Goal: Find the best CATE (function measuring effect sizes) estimator for analysis on a chosen dataset
Predictability
- predictive reality check based on calibration
Stability
- run their method multiple times using different perturbations of the data to check whether the same cell cover was found.
- For each cell they define a stability score and rank cells according to their stability scores.
P & S
- run their method on a new (but similar) dataset to show that conclusions obtained by their method on the VIGOR study can generalize to the APPROVe study
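The perturbation-and-ranking idea in the Stability bullets above can be sketched as follows. This is a toy illustration, not the paper's pipeline: `discover_cells` is a hypothetical stand-in for the CATE-based subgroup search, and the threshold rules and cell names inside it are made-up assumptions.

```python
import random

def discover_cells(data, seed):
    """Hypothetical stand-in for a CATE-based subgroup search; returns
    the set of cells (subgroup descriptions) found on one bootstrap
    perturbation of the data. A real pipeline would refit the CATE
    estimator here."""
    rng = random.Random(seed)
    sample = rng.choices(data, k=len(data))  # bootstrap perturbation
    cells = set()
    # Toy discovery rules (assumptions for illustration only):
    if sum(x for x, _ in sample) / len(sample) > 0.4:
        cells.add("age > 60")
    if sum(y for _, y in sample) / len(sample) > 0.4:
        cells.add("dose > 10mg")
    return cells

def stability_scores(data, n_perturbations=100):
    """Stability score of a cell = fraction of perturbed datasets on
    which it is rediscovered; cells are then ranked by this score."""
    counts = {}
    for seed in range(n_perturbations):
        for cell in discover_cells(data, seed):
            counts[cell] = counts.get(cell, 0) + 1
    scores = {c: n / n_perturbations for c, n in counts.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

data = [(0.6, 0.2), (0.5, 0.3), (0.7, 0.1)] * 20
ranking = stability_scores(data)  # cells ranked by stability score
```

Cells that survive most perturbations get scores near 1 and top the ranking; fragile cells drop out.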
#### What the Setting Gains By Using PCS:
- Predictability gives trust in the selected CATE estimator because it passes an out-of-sample predictive accuracy check
- Stability checks show that the selected CATE estimator is robust to data perturbations.
#### What are the Unique Challenges of PCS In This Setting:
- CATE is a function and is challenging to estimate
- The fundamental problem of causal inference (missing potential outcomes) means CATE estimation cannot be solved directly with conventional supervised learning techniques
- There is no test set for CATE models because the missing potential outcomes mean there is no plug-in estimate for any risk function
## Veridical Causal Inference Using Propensity Score Methods... [(Ross et al., 2021)](https://pubmed.ncbi.nlm.nih.gov/34040495/)
Note: This paper is not written very clearly. It mentions PCS once in the introduction and then never again, so connections to PCS must be made by the reader.
#### Setting: Causal Inference with propensity score methods for various outcomes (e.g. binary, count, etc.) for medical insurance claims data
#### Goal: A transparent, reproducible method for answering scientific inquiries using medical insurance claims datasets.
Predictability
- estimate a propensity score (the probability of treatment given covariates)
Stability
- keep weights away from extreme values (trimming) for inverse probability of treatment weighting (IPTW).
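A minimal sketch of the trimming step, assuming propensity scores have already been estimated. The clip bounds of 0.05/0.95 and the function name are illustrative assumptions, not values from the paper.

```python
def iptw_weights(treated, propensity, clip=(0.05, 0.95)):
    """Inverse-probability-of-treatment weights with propensity scores
    trimmed away from 0 and 1 so no single observation dominates.
    The clip bounds are illustrative, not the paper's values."""
    lo, hi = clip
    weights = []
    for t, p in zip(treated, propensity):
        p = min(max(p, lo), hi)  # trim extreme propensity scores
        weights.append(1.0 / p if t else 1.0 / (1.0 - p))
    return weights

# A near-zero propensity of 0.001 would give a weight of 1000 untrimmed;
# trimming caps it at 1 / 0.05 = 20, stabilizing the weighted estimate.
w = iptw_weights(treated=[1, 1, 0], propensity=[0.5, 0.001, 0.8])
```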
#### What the Setting Gains By Using PCS:
- the paper mentions PCS but, in my opinion, never explains how it is actually used, nor does it offer anything novel
#### What are the Unique Challenges of PCS In This Setting:
- healthcare datasets have selection bias, heterogeneity, missing values, duplicate records, and misclassification of diseases.
## Next Waves in Veridical Network Embedding [(Ward et al., 2021)](https://onlinelibrary.wiley.com/doi/epdf/10.1002/sam.11486)
#### Setting: Evaluating network embedding algorithms
#### Goal: Study network embedding algorithms systematically and point to new directions for future research.
Predictability
- construct a metric to evaluate how well a model represents relationships in the original data.
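One way to make the bullet above concrete: a reconstruction metric that asks whether the embedding places connected vertex pairs closer together than unconnected pairs. This AUC-style score is my own illustrative assumption, not the metric from the paper.

```python
import math
from itertools import combinations

def reconstruction_auc(edges, embedding):
    """Probability that a randomly chosen connected pair is embedded
    closer together than a randomly chosen unconnected pair: 1.0 means
    adjacency is perfectly preserved, 0.5 is chance level."""
    edge_set = {frozenset(e) for e in edges}
    pos, neg = [], []  # distances for edge pairs vs. non-edge pairs
    for a, b in combinations(sorted(embedding), 2):
        d = math.dist(embedding[a], embedding[b])
        (pos if frozenset((a, b)) in edge_set else neg).append(d)
    wins = sum(1 for dp in pos for dn in neg if dp < dn)
    ties = sum(0.5 for dp in pos for dn in neg if dp == dn)
    return (wins + ties) / (len(pos) * len(neg))

# Toy check: vertices 0 and 1 are linked and embedded nearby; vertex 2
# has no edges and is embedded far away, so the score is perfect.
auc = reconstruction_auc([(0, 1)],
                         {0: (0.0, 0.0), 1: (0.0, 1.0), 2: (5.0, 5.0)})
```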
Computability
- approximations or use of simpler low dimensional representations of large networks (e.g. limiting the number of vertices in the network)
Stability
- how stable are the results when the data or model is perturbed?
- examples
- choice of representation space
- choice of embedding dimension
- preserving features of the network
- data changes: removing small number of edges from the network
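The last perturbation above (removing a small number of edges) can be sketched with a toy drift metric. The `embed` function here is a deterministic stand-in for a real embedding algorithm (node2vec, spectral methods, etc.), and the drift measure is my own simple choice.

```python
import math

def embed(edges):
    """Toy deterministic "embedding": vertex v maps to
    (degree(v), sum of neighbor ids). A stand-in for a real
    network-embedding algorithm."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    return {v: [len(ns), sum(ns)] for v, ns in nbrs.items()}

def embedding_drift(emb_a, emb_b):
    """Average displacement of shared vertices between two embeddings:
    one simple way to quantify (in)stability under perturbation."""
    shared = emb_a.keys() & emb_b.keys()
    return sum(math.dist(emb_a[v], emb_b[v]) for v in shared) / len(shared)

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
base = embed(edges)
perturbed = embed([e for e in edges if e != (0, 2)])  # remove one edge
drift = embedding_drift(base, perturbed)  # > 0: the embedding moved
```

A stable embedding method would show small drift when only a few edges are removed; large drift flags sensitivity to the data perturbation.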
#### What the Setting Gains By Using PCS:
- gives their method a template and validity
- gives them a way to comprehensively evaluate embedding algorithms that are very different.
#### What are the Unique Challenges of PCS In This Setting:
- There are many network embedding methods, and they often depend on downstream tasks
- For predictability, we cannot do cross-validation or data splitting the way we do for other supervised ML problems (i.e. how do you sample from a network?)
## A New Method to Compare the Interpretability of Rule-Based Algorithms [(Margot et al., 2021)](https://www.mdpi.com/2673-2688/2/4/37/htm)
#### Setting: evaluating interpretable, rule-based algorithms
#### Goal: Define a score that is a weighted sum of Predictability, Stability, and Simplicity.
Predictability
- accuracy of predictive algorithms
Simplicity
- sum of the lengths of the rules derived from the model
Stability
- Dice-Sorensen index for comparing two rule sets generated by an algorithm using two independent samples
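The Dice-Sorensen stability measure and the weighted composite score from the goal above can be sketched as follows. The rule strings, the weights, and the sign convention on simplicity are my illustrative assumptions, not the paper's exact formula.

```python
def dice_sorensen(rules_a, rules_b):
    """Dice-Sorensen index between two rule sets: 2|A & B| / (|A| + |B|).
    A value of 1.0 means the algorithm rediscovered exactly the same
    rules on an independent sample."""
    if not rules_a and not rules_b:
        return 1.0
    return 2 * len(rules_a & rules_b) / (len(rules_a) + len(rules_b))

def interpretability_score(accuracy, stability, total_rule_length,
                           weights=(1.0, 1.0, 0.1)):
    """Illustrative weighted combination of the three components
    (weights are assumptions). Longer rule lists lower the score,
    so simpler models are rewarded."""
    w_p, w_s, w_len = weights
    return w_p * accuracy + w_s * stability - w_len * total_rule_length

# Two rule sets fit on independent samples share 2 of their rules.
rules_run1 = {"x1 > 0.5", "x2 <= 3", "x4 == 'a'"}
rules_run2 = {"x1 > 0.5", "x2 <= 3"}
stab = dice_sorensen(rules_run1, rules_run2)  # 2*2 / (3+2) = 0.8
score = interpretability_score(accuracy=0.9, stability=stab,
                               total_rule_length=5)
```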
#### What the Setting Gains By Using PCS:
- able to create a well-defined scoring system for evaluating interpretable, rule-based algorithms
- gives a structured framework for quantitatively evaluating interpretability
#### What are the Unique Challenges of PCS In This Setting:
- generating a stable set of rules is challenging
- they need to consider how concise / simple a produced rule is