Hi, I would love to write a FAQ entry on interpreting (primarily linear) model results using predictions, as questions about interpretation crop up quite frequently. After my recent misstep with the [attempt at guidance for model selection/hypothesis testing](https://discourse.mc-stan.org/t/hypothesis-testing-model-selection-model-comparison-some-thoughts/19163), I would like to post a proposal for comments first and only then turn it into some sort of semi-official guidance. I am putting a copy of the text below, and you can definitely comment here on Discourse. But to make collaboration easier (I remember @avehtari disliked the experience of collaborative writing on Discourse), I want to give HackMD.io a try - it seems pretty capable and works directly with markdown, including code/math support, so the result can be posted to Discourse as-is. [Open the note to edit it](https://hackmd.io/gUnES5UIS5-v3C_AGsheHQ?both) (it should be open to anyone without any registration).

The overall idea is that instead of considering posteriors for individual parameters, a more general approach is to think of the various prediction tasks a model allows and how those map to underlying questions; posteriors for individual parameters turn out to be a special case of such a prediction task. And I know I do a lot of self-citation in those posts - this is simply because it is easy for me to find my own writing on the topic, so please do share references to other sources. It is also possible that this material is better suited for a blog post than for a FAQ... Thanks for any feedback.

# Interpreting models via predictions

So you fitted a model, but what do the results mean? Here we will focus on interpreting hierarchical linear models as fitted in `rstanarm` or `brms`, but the general ideas should apply to any Stan model.

In most of the literature, people just fit a linear model and then interpret the relevant coefficient - and that's often enough for simple settings like `outcome ~ treatment`. But what to do with more complex models, e.g. `outcome ~ treatment * age + (1 + treatment | subgroup)`? What to do about the interaction? Should we include the varying effect per subgroup? One way (but not the only one) to think about inferences is to imagine hypothetical new experiments/measurements and see what the model predicts for those situations.

## Definitions

To avoid confusion, we will separate three possible quantities.

## Model coefficients as predictions

Looking at individual coefficients can be seen as a special case of such a prediction. If I regress `outcome ~ treatment` (where treatment has levels `A` and `B`), the coefficient for `treatmentB` represents how big a difference between the averages of `outcome` for the two groups I would see in a hypothetical new set of measurements with no noise/unlimited data.

However, for more complex models the individual coefficients can correspond to difficult-to-grasp hypothetical quantities: for the `outcome ~ treatment * age + (1 + treatment | subgroup)` model, the `treatmentB:age` coefficient can be seen as the difference between two differences: a) the change in outcome per unit of age under treatment B and b) the change in outcome per unit of age under treatment A - all of this averaged over all subgroups and without measurement noise. Things get yet more complicated if the outcome is non-Gaussian and/or there is a transformation/link function between the model scale and the outcome scale.
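As a quick illustration of this correspondence, here is a minimal `brms` sketch, assuming a hypothetical data frame `df` with a numeric `outcome` and a two-level `treatment` factor (all names are illustrative):

```r
library(brms)

fit <- brm(outcome ~ treatment, data = df, family = gaussian())

# Noise-free average outcome for each treatment group
epred <- posterior_epred(fit,
                         newdata = data.frame(treatment = c("A", "B")))

# The per-draw difference of the two group averages...
diff_epred <- epred[, 2] - epred[, 1]

# ...matches the posterior draws of the `treatmentB` coefficient
# exactly (for this Gaussian model with an identity link)
coef_draws <- as.data.frame(fit)$b_treatmentB
summary(diff_epred - coef_draws)  # all (numerically) zero
```

For a non-identity link or a non-Gaussian family, the two would no longer coincide, which is exactly why thinking in terms of predictions is the more general approach.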
## Predictions unleashed

So what we can do instead is to think about which prediction task is relevant to our particular real-world question. Keeping with the same example, I can:

- Predict how the average outcome changes with treatment, averaging over all subgroups.
- Analyze whether a randomly selected subject in the study would see their outcome change under treatment. Then including the varying intercept and effect is IMHO necessary, but I would use the fitted clusters.
- Analyze whether a random person in a hypothetical new realization of the experiment would see their outcome change under treatment. Then I would use the varying intercept and effect and simulate a new cluster.

So the big questions are: do I care about the average effect or do I care about individuals? And do I want to analyze what happened or do I want to generalize? Note also that there are other options, like simulating a hypothetical replication with a limited number of new subjects (with simulated new cluster values) and taking the mean, which would get you somewhere in between and might be useful if you want to understand how likely a replication is to see a similar result, or when predicting what the next few customers will do.

In practice, we can easily compute predictions with the `posterior_predict`, `posterior_epred` and `posterior_linpred` functions in those packages. This is where it gets interesting - there is no general rule and you need to be explicit about which predictions you care about. Using the original data would mean you are interested in something like exact replications of the same experiment (with the same subjects etc.). But there are other sensible approaches - e.g. assume a hypothetical previously unseen subject (see the `allow_new_levels` argument for `posterior_predict`) at some pre-specified covariate values that interest you. There really are a lot of options, and I think the advantage of this approach is that it forces you to be explicit about choosing one of those options, instead of just taking the default. The answer to your question is IMHO a big "it depends", and it depends on the real-world problem you are trying to attack. Code sketches for some of these options are at the end of this post.

Some examples:

# Further reading

- A simple example where using predictions makes a difference compared to investigating marginal posteriors: https://discourse.mc-stan.org/t/multivariate-model-interpreting-residual-correlation-and-group-correlation/15461/6?u=martinmodrak
- A discussion of using predictions to draw inferences from a complex time-series model: https://discourse.mc-stan.org/t/marginalization-of-autocorrelation-model-using-posterior-predictive-distributions/19940
- Bob Carpenter wrote a [case study](https://aws1.discourse-cdn.com/standard14/uploads/mc_stan/original/2X/2/2820d4f0c6810acdae1b8f2d0e9fb27bd78a40de.pdf) showing that using predictions is better.
- https://betanalpha.github.io/assets/case_studies/factor_modeling.html discusses the interpretation of parameters vs. combinations of parameters.
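# Appendix: code sketches

A minimal sketch of the three prediction tasks discussed above, assuming a hypothetical data frame `df` with columns `outcome`, `treatment`, `age` and `subgroup` (the level names and the fixed age value are illustrative, not from a real dataset):

```r
library(brms)

fit <- brm(outcome ~ treatment * age + (1 + treatment | subgroup),
           data = df)

# Hypothetical new measurements: both treatments at a fixed age
newdata <- data.frame(treatment = c("A", "B"), age = 40)

# 1) The average effect: exclude the varying effects entirely
epred_avg <- posterior_epred(fit, newdata = newdata, re_formula = NA)

# 2) A subject already in the study: keep the fitted varying
#    effects by predicting for an observed subgroup level
newdata_fitted <- cbind(newdata, subgroup = "s1")  # an existing level
epred_fitted <- posterior_epred(fit, newdata = newdata_fitted)

# 3) A person in a hypothetical new realization of the experiment:
#    simulate varying effects for a previously unseen subgroup
newdata_new <- cbind(newdata, subgroup = "hypothetical_subgroup")
pred_new <- posterior_predict(fit, newdata = newdata_new,
                              allow_new_levels = TRUE,
                              sample_new_levels = "gaussian")

# In each case, column 2 minus column 1 gives the per-draw treatment
# difference at age 40 for that particular prediction task
hist(pred_new[, 2] - pred_new[, 1])
```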
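And a sketch of the "in between" option mentioned above - a hypothetical replication with a small number of new subgroups, averaging their predicted outcomes within each posterior draw (reusing `fit` from the previous sketch):

```r
n_new <- 5  # size of the hypothetical replication
newdata_rep <- expand.grid(
  treatment = c("A", "B"), age = 40,
  subgroup = paste0("hypothetical_", seq_len(n_new)))

pred_rep <- posterior_predict(fit, newdata = newdata_rep,
                              allow_new_levels = TRUE,
                              sample_new_levels = "gaussian")

# Average over the new subgroups within each draw, per treatment
mean_A <- rowMeans(pred_rep[, newdata_rep$treatment == "A"])
mean_B <- rowMeans(pred_rep[, newdata_rep$treatment == "B"])
hist(mean_B - mean_A)  # treatment difference in a small replication
```

The spread of `mean_B - mean_A` then reflects both posterior uncertainty and the between-subgroup variation you would expect in a replication of that size.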