# Spring 2024 Bayesian Learning Group
# Quick links
[Course website](https://cct-datascience.github.io/bayesian-learning-group)
### Jump to session notes
[Session 7 - April 12, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-7)
[Session 6 - March 29, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-6)
[Session 5 - March 15, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-5)
[Session 4 - March 1, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-4)
[Session 3 - February 16, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-3-Posterior-distributions-and-geocentric-linear-models)
[Session 2 - February 2, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-2-Digging-in-to-the-garden-of-forking-data)
[Session 1 - January 19, 2024](https://hackmd.io/z9Or6dTqRuKKkNHDeDM1cw?both=#Session-1-Why-are-we-here-and-how-should-we-do-it)
# Show-and-tell signups
What does this entail?
* Prepare a short (5-10 minute) overview of your Bayesian model:
* What is the research question?
* What is the data?
* What are the components of your model? (e.g., DAG, stat notation)
* How does it run? (e.g. which tools are you using? Screen share and/or demos welcome if practical!)
* What do you find interesting about it?
* Tell us about it and we'll ask friendly questions! (10-15 minutes)
Signups:
* March 1st: Jessica (how plants regulate water flows)
* March 15th: Val (Do women really (not) talk more than men?)
* March 29th: ~~David (Specifying informative priors from knowledge and / or data)~~
* April 12th:
* April 26th: Henry Scharf, Stan demo of a logistic regression
# Session notes
## Session 8
* Housekeeping
* We'd like to have an opportunity to talk about feedback on how the learning group has gone
* What has been most useful?
* Suggestions for changes?
* Alternatively: please fill in the (anonymous) end-of-semester survey! https://forms.gle/eTKLCnbMD32CBpih6
* Guest lecture from Henry Scharf at 1pm!
* Discussion: MCMC
* Stochastic sampling vs. approximation
* How does Hamiltonian MC differ from Gibbs sampling and Metropolis algorithms?
* Decisions to make when running HMC
* Number of chains
* Specifying priors and the nature of the posterior (correlated parameters!)
* Key parameters and attributes of HMC to watch (see the sketch after this list)
* Rhat (the Gelman-Rubin convergence diagnostic; should be close to 1.00)
* n_eff (effective number of independent samples)
* Trace plots
* Trank (trace rank) plots
* Common problems
* Divergent chains
* Non-identifiable parameters
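A minimal sketch of what checking these diagnostics can look like with the `rethinking` package; the dataset, model, and priors below are illustrative stand-ins, not something we fit in session:

```r
# Illustrative only: data, model, and priors are stand-ins for any HMC fit.
library(rethinking)
data(rugged)
d <- rugged[!is.na(rugged$rgdppc_2000), ]
dat <- list(
  log_gdp    = log(d$rgdppc_2000),
  rugged_std = d$rugged / max(d$rugged)
)

m <- ulam(
  alist(
    log_gdp ~ dnorm(mu, sigma),
    mu <- a + b * rugged_std,
    a ~ dnorm(8, 2),
    b ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = dat, chains = 4, cores = 4
)

precis(m)      # posterior summaries plus n_eff and Rhat for each parameter
traceplot(m)   # healthy chains look like overlapping "fuzzy caterpillars"
trankplot(m)   # trace rank plots: rank histograms should be roughly uniform across chains
```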
## Session 7
* Housekeeping
* Should we continue into the summer? The fall?
* Discussion
* Model selection vs. model comparison in (this) Bayesian framework vs. statistical significance testing (see the comparison sketch at the end of this list)
* McElreath makes a big deal about how the "best" model by an information criterion or cross-validation approach may not be causally diagnostic. Is there more to this point than that a GOF/IC metric will tell you about how well a model describes data, but cannot compensate for user error at the model specification/experimental design stage?
* How do we interpret a model comparison approach in a framework often dominated by the litmus test of statistical significance? Maybe: significance is no longer the holy grail; we end up thinking much more about including parameters *because they make scientific sense* and then seeing how different they are from 0? How does this end up working out in practice? How do you communicate such an approach to reviewers, etc. who may be more comfortable in a significance-testing framework?
* "Note that model comparison here is not about selecting a model. Scientific considerations already select the relevant model. Instead it is about measuring the impact of model differences while accounting for overfitting risk."
* (Possibly more advanced conversation) Using regularizing priors to "keep the model from getting too excited about the data" (overfitting)
* Practice
* Either
* 7H5, 8H1, and 8H2
* Or
* 8H4 (start-to-finish from a question to model to model comparison to interpretation)
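A minimal sketch of the comparison workflow discussed above, assuming two already-fitted `rethinking` models `m1` and `m2` of the same outcome (the model names are placeholders):

```r
# m1 and m2 are placeholders for two already-fitted rethinking models of the same outcome.
library(rethinking)

compare(m1, m2)               # WAIC by default: dWAIC, SE, and Akaike-style weights
compare(m1, m2, func = PSIS)  # PSIS-LOO cross-validation; warns about high Pareto k values

# Per the quote above: use these scores to measure the impact of model differences
# (and overfitting risk), not to let an information criterion pick "the" causal model.
```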
## Session 6
### Agenda
* Discussion
* Confounds and causal inference (a `dagitty` sketch follows after this list)
* Working example
* ![image](https://hackmd.io/_uploads/ryh9Bj-y0.png)
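As a side reference, a minimal `dagitty` sketch of checking what a confound implies; the DAG below is a generic stand-in, not a transcription of the working example in the image:

```r
# Generic confound structure Z -> X, Z -> Y (not the DAG from the image above).
library(dagitty)

dag <- dagitty("dag{ X -> Y ; Z -> X ; Z -> Y }")
adjustmentSets(dag, exposure = "X", outcome = "Y")  # returns { Z }: condition on Z to estimate X -> Y
impliedConditionalIndependencies(dag)               # testable implications of this DAG
```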
### Attendance
_Put your name here (voluntary, but it helps us keep track!)_
Laura Dozal
Irina Stefan
Connor Wilson
Jessica Guo
Val Pfeifer
Renata Diaz
### Taxonomy of sneaky data
![6B76E83B-DB85-439B-9E6E-6A1B4DB2485D](https://hackmd.io/_uploads/Sy1WzqNJR.png)
## Session 5:
- Discussion
- Hypothesis testing in a Bayesian framework
- Problems
- Workshopping building a linear model
- Val show-and-tell
### Attendance
_Put your name here (voluntary, but it helps us keep track!)_
Antonio Rubio
Connor Wilson
Val
Julia Fisher
Anna Dornhaus
### Discussion
Under what conditions do we reach for a Bayesian model?
What does hypothesis testing mean in a Bayesian framework? (or, does every scientific question require a null hypothesis?)
### Practice 1 - Priors
* Begin with exercise 4M4 from Rethinking (a model sketch and prior predictive check follow after this list):
* 4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
* Time permitting, continue on:
* 4M5. Now suppose I tell you that the average height in the first year was 120 cm and that every student got taller each year. Does this information lead you to change your choice of priors? How?
* 4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
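One possible sketch of 4M4 with a prior predictive simulation; the priors are illustrative judgment calls, not the official answer:

```r
# One possible 4M4 model (h_ij = height of student i in year j), with illustrative priors:
#   h_ij  ~ Normal(mu_ij, sigma)
#   mu_ij = a + b * (year_j - 2)   # centered on the middle year
#   a     ~ Normal(150, 20)        # plausible average height in year 2
#   b     ~ Normal(7, 5)           # cm of growth per year (4M5 motivates a positive-only prior, e.g. Log-Normal)
#   sigma ~ Uniform(0, 50)         # 4M6's "variance never more than 64 cm" would cap this at 8

# Prior predictive simulation: do these priors imply plausible height trajectories?
set.seed(4)
n <- 100
a <- rnorm(n, 150, 20)
b <- rnorm(n, 7, 5)
plot(NULL, xlim = c(1, 3), ylim = c(50, 250), xlab = "year", ylab = "height (cm)")
for (i in 1:n) abline(a = a[i] - 2 * b[i], b = b[i], col = rgb(0, 0, 0, 0.2))
```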
### Practice 2 - Fitting and playing with a model
* 4H2. Select out all the rows in the Howell1 data with ages below 18 years of age. If you do it right, you should end up with a new data frame with 192 rows in it.
* Fit a linear regression to these data, using `quap`. Present and interpret the estimates. For every 10 units of increase in weight, how much taller does the model predict a child gets? (See the sketch after this list.)
* Plot the raw data, with height on the vertical axis and weight on the horizontal axis. Superimpose the MAP regression line and 89% interval for the mean. Also superimpose the 89% interval for predicted heights.
* What aspects of the model fit concern you? Describe the kinds of assumptions you would change, if any, to improve the model. You don’t have to write any new code. Just explain what the model appears to be doing a bad job of, and what you hypothesize would be a better model.
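A minimal sketch of 4H2 using `quap` from the `rethinking` package; the priors are one reasonable choice among many, not the book's official solution:

```r
# Sketch of 4H2; priors are one reasonable choice, not "the" answer.
library(rethinking)
data(Howell1)
d <- Howell1[Howell1$age < 18, ]   # should be 192 rows

m <- quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * weight,
    a ~ dnorm(100, 30),
    b ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)
  ),
  data = d
)
precis(m)   # 10 * (posterior mean of b) = predicted cm gained per 10 units of weight

# Raw data, MAP regression line, 89% interval for the mean, 89% prediction interval
weight.seq <- seq(from = 4, to = 45, length.out = 50)
mu    <- link(m, data = list(weight = weight.seq))
sim.h <- sim(m, data = list(weight = weight.seq))
plot(height ~ weight, data = d, col = col.alpha(rangi2, 0.6))
lines(weight.seq, apply(mu, 2, mean))
shade(apply(mu, 2, PI, prob = 0.89), weight.seq)      # interval for the mean
shade(apply(sim.h, 2, PI, prob = 0.89), weight.seq)   # interval for predicted heights
```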
## Session 4:
### Attendance
_Put your name here (voluntary, but it helps us keep track!)_
Anna Dornhaus
Irina Stefan
Val
Kara Haberstock Tanoue
David LeBauer
Connor Wilson
Laura Dozal
### Housekeeping
* Show-and-tell signups
* Interest in collaborative data workshopping
* Anyone want to lead with their own data?
* Potentially meet up at CCT drop-in hours (Tuesdays 9-10 am)
* Any interest in inviting outside speakers?
* Henry Scharf at UA - MCMC, Stan model demo
* Kiona Ogle at NAU - parameterizing random effects
* Plug for WIDS Data Blitz: https://widstucson.org/datablitz2024.html
* Feedback survey (also will be emailed and on Slack): https://docs.google.com/forms/d/e/1FAIpQLSfmefXS2w4wkFejQR87MiUdTUv21Qk0HJXIhV4mOtdy9FRSkQ/viewform
### Demo: Jessica - How plants regulate
### Discussion
* Review of chapter 4 and splines
## Session 3: Posterior distributions and geocentric (linear) models
### Attendance
_Put your name here (voluntary, but it helps us keep track!)_
Antonio Rubio
Ellen Bledsoe
Kara Haberstock Tanoue
Anna Dornhaus (only half my brain attending since trying to also follow seminar talk...)
David LeBauer
Connor Wilson
### Housekeeping
* Show-and-tell signups
* Interest in collaborative data workshopping
* Anyone want to lead with their own data?
* Potentially meet up at CCT drop-in hours (Tuesdays 9-10 am)
* PSA: The permalink to the 2020 text was sending you to the 2018 text! This is now corrected, but you might want to update your bookmarks.
### Discussion 1 - Ch 3
* Refresher: What are the components of a model?
1) Likelihood function
2) Parameters
3) Prior distributions
* Why do we use density functions (e.g. `dbinom`) to calculate the likelihood?
In R, the `d` prefix gives the probability mass (discrete) or density (continuous) for a named distribution; here, `dbinom` evaluates the likelihood of the observed data at each candidate value of $p$ (see the grid-approximation sketch after this list).
* How do we get the posterior from the likelihood and the priors?
* What do samples from the posterior represent?
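A minimal grid-approximation sketch (globe tossing, 6 waters in 9 tosses, flat prior) showing how the posterior is built from likelihood × prior:

```r
# Globe tossing: 6 waters in 9 tosses, flat prior, grid approximation.
p_grid <- seq(from = 0, to = 1, length.out = 1000)   # candidate values of p
prior <- rep(1, 1000)                                # flat prior over the grid
likelihood <- dbinom(6, size = 9, prob = p_grid)     # P(data | each candidate p)
unstd.posterior <- likelihood * prior                # prior x likelihood
posterior <- unstd.posterior / sum(unstd.posterior)  # normalize so it sums to 1
plot(p_grid, posterior, type = "l")                  # samples drawn from this are samples of plausible p values
```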
### Practice 1 - Ch 3
* Session 1: Exercises from Ch. 3 (a worked sketch follows after this list):
* ![image](https://hackmd.io/_uploads/Bk2BB6mjp.png)
* 3E2. How much posterior probability lies above p = 0.8? Did everyone get 11.16%?
* 3E5. 20% of the posterior probability lies above which value of p?
* 3E6. Which values of p contain the narrowest interval equal to 66% of the posterior probability?
* 3M1. Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.
* 3M2. Draw 10,000 samples from the grid approximation from above. Then use the samples to calculate the 90% HPDI for p.
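A minimal sketch for 3M1/3M2; the same sampling pattern answers 3E2/3E5/3E6 when applied to the original 6-in-9 posterior:

```r
# 3M1: grid-approximate posterior for 8 water in 15 tosses, flat prior.
library(rethinking)   # for HPDI()
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)
likelihood <- dbinom(8, size = 15, prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)

# 3M2: draw 10,000 samples from the grid posterior and summarize.
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = TRUE)
sum(samples > 0.8) / 1e4       # 3E2-style: posterior probability above p = 0.8
quantile(samples, 0.8)         # 3E5-style: 20% of posterior probability lies above this value
HPDI(samples, prob = 0.66)     # 3E6-style: narrowest 66% interval
HPDI(samples, prob = 0.90)     # 3M2: 90% HPDI
```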
### Discussion 2 - Ch 4
* >"Any process that adds together random values from the same distribution converges to a normal."
* Does this make intuitive sense following McElreath's explanation?
* Ontological justification - the world is full of normal distributions
* Epistemological justification - most natural expression of our state of ignorance, consistent with lack of assumptions
* Linear models
* What does a linear model capture?
* Pattern, process, and "geocentrism"
* Components of a model (model specification, likelihood, priors)
* Have you seen models written out in the literature in your field?
* Why might authors choose this form?
* Approaches for summarizing and reporting model results
* Covariance and centering: Why is covariance between parameters important? Why does centering help with this?
* Plotting posterior prediction intervals: What does a plot of posterior samples for a **parameter** tell us? What does a plot of posterior samples for the **predicted data values** tell us? How do we obtain each?
### Practice 2 - Ch 4
* Begin with exercise 4M4 from Rethinking:
* 4M4. A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
* Time permitting, continue on:
* 4M5. Now suppose I tell you that the average height in the first year was 120 cm and that every student got taller each year. Does this information lead you to change your choice of priors? How?
* 4M6. Now suppose I tell you that the variance among heights for students of the same age is never more than 64cm. How does this lead you to revise your priors?
## Session 2: Digging in (to the garden of forking data)
### Attendance
_Put your name here (voluntary, but it helps us keep track!)_
* Renata Diaz
* Jessica Guo
* Kunal Palawat
* Val
* Cynthia Porter
* Sarah Britton
* Anna Dornhaus
* Connor Wilson
* Theresa LeGros
* Kara Haberstock Tanoue
* Julia Fisher
* Kamila Murawska-Wlodarczyk
* Priyanka Kushwaha
* Tomasz Wlodarczyk
* Irina Stefan
* Laura Dozal
* Stacey Tecot
* Evan MacLean
### Temperature check (10 minutes)
* How do you feel about the amount of reading/watching we attempted this week? Put a + next to your answer:
* Too much:
* Just right: ++ +++++++
* More please:
* Commentary:
* The videos feel like less of a lift than reading the chapters
* ^Co-signed, but also I have been doing both (but only doing the book homework)
* My capacity is video-only, and it has been great to get that amount of information without feeling overwhelmed
* Did you have the chance to attempt problems on your own time? How did that go?
* Yes, and I did them all! +
* Did a few +++
* Yes, and I have sticking points to work through:
* Haven't had the chance to look at them: +++++++
* Commentary:
* So many possible homeworks! +++
* I only did the book homework (recommended problems)
* No, no plans to do that since that will be for our data analyst to do. I am more interested in the conceptual side.
* Takeaways
* JG and RD share an agreed-upon set of problems early in the 2 week cycle and share in Slack +
### Discussion questions (30m)
* How has Bayesian philosophy shown up in your life in the past 2 weeks?
* Is probability real?
* Ask a quantum physics researcher...
* Is there even such thing as 'truth'? [This is a common position among scientists I don't really understand. Don't we all subscribe to the idea that there is a single common reality? So 'truth' maybe inaccessible or hard, but surely exists?] - I was thinking in the context of experimental research, where we have to go through all sorts of abstraction, operationalizations and compromises. - [Sure, I get that, and I certainly agree with you that we never claim to have discovered 'actual truth', scientists are rightly wary of that. But it does exist, right? And we feel that everything we do in science helps us get closer to it? All arguments towards doing science in the first place are arguments towards the idea that science is a better way to get towards truth than other approaches.] - I agree that is it the best approach we currently have and it moves us towards truth, but not sure if it goes all the way. very interesting philosophical discussion! [Yes! See my lecture suggestion below :-) PS: Scientists often talk about 'no truth' in public. And other people do claim to have truth. What are the public to think when scientists say 'we don't really know' and other authority figures say 'we do know'? I think it is problematic to confuse accurate communication of uncertainty with too much self-deprecation on the part of scientists about not even caring about truth. So that's just why that worries me. ]
* All the 'probabilities' we talk about are about our information status, they are never about quantum indeterminism. E.g. rolling dice, the outcome varies because small differences in air movement etc. have macroscopic effects, not because of anything quantum. [[PS: Just found a relevant McElreath quote, on page 50 of the book PDF: "In any event, randomness is always a property of information, never of the real world."]]
* If anyone is interested in this, I REALLY REALLY recommend the excellent public lecture by Joanna Masel, unfortunately no longer on the COS website but available here: https://drive.google.com/file/d/13LIi9qRBj2IOYrwTyC2EB8jgsY9XhVgJ/view?usp=drive_link - lecture title 'There is no certainty' (it has a lot about probability, science, the history of randomized controlled trials, and other things we are discussing)
* I second this- great lecture!
* Thank you! +
* What about math: discovered, or invented (like shovels)?
* ![Screenshot 2024-01-26 at 09.50.19](https://hackmd.io/_uploads/SJ_INcUqp.jpg)
* Both for me - invented as a tool, but a discovery tool that can represent reality to the point that predictions can be made (I second that!)
* Are probability distributions discovered or invented?
### Break
### Worked examples (40m total)
15 minutes in breakout rooms, 5 minutes report back to group:
* Problems:
* 2E3
* ![image](https://hackmd.io/_uploads/HyNdn2qq6.png)
* 2E4
* ![image](https://hackmd.io/_uploads/rks9235qa.png)
* 2M1
* ![image](https://hackmd.io/_uploads/HyV3n3c56.png)
* 2M3
* ![image](https://hackmd.io/_uploads/r1kanhccp.png)
* Reflection questions:
* What is the purpose of the "grid" (`p_grid`)? Why does it go from 0 to 1?
* Why are we using `dbinom`? What is this function doing? (See the 2M1 sketch below.)
Worked calculation for 2M3 (E = Earth, M = Mars):
P(E | land) = P(land | E) × P(E) / P(land)
= P(land | E) × P(E) / [P(land | E) × P(E) + P(land | M) × P(M)]
= (0.3 × 0.5) / (0.3 × 0.5 + 1.0 × 0.5) = 0.15 / 0.65 ≈ 0.23
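A minimal sketch for 2M1 that also speaks to the reflection questions above: `p_grid` is the set of candidate values for the proportion of water $p$ (a proportion, hence 0 to 1), and `dbinom` evaluates how likely the observed tosses are at each candidate value:

```r
# 2M1: grid approximation for three observed sequences, flat prior each time.
p_grid <- seq(from = 0, to = 1, length.out = 100)   # candidate values of p (0 to 1, since p is a proportion)
prior <- rep(1, 100)                                # flat prior
grid_posterior <- function(w, n) {
  likelihood <- dbinom(w, size = n, prob = p_grid)  # likelihood of w waters in n tosses at each candidate p
  posterior <- likelihood * prior
  posterior / sum(posterior)
}
plot(p_grid, grid_posterior(3, 3), type = "l")      # W, W, W
lines(p_grid, grid_posterior(3, 4), lty = 2)        # W, W, W, L
lines(p_grid, grid_posterior(5, 7), lty = 3)        # L, W, W, L, W, W, W
```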
15 minutes in breakout rooms, 5 minutes report back to group:
* Problems:
* For E's: ![image](https://hackmd.io/_uploads/rkAEU6596.png)
* 3E2, 3E5, 3E6
* ![image](https://hackmd.io/_uploads/rkjIUac96.png)
* 3M1, 3M2
* ![image](https://hackmd.io/_uploads/rkCPUTc56.png)
* Reflection questions:
* Draw a picture of what the HPDI captures
* Annotate your code (or draw a picture) explaining how the following components relate to each other:
* data
* likelihood
* prior
* posterior
* samples from the posterior
### Between now and next meeting (10m)
* Read Ch. 4 Linear Models and/or
* Watch [Geocentric Models](https://www.youtube.com/watch?v=tNOu-SEacNU) and [Lines and Curves](https://www.youtube.com/watch?v=F0N4b7K_iYQ&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus&index=4)
* Sign up for Bayesian statistics show-and-tell!
## Session 1: Why are we here and how should we do it?
You can type here and it will show up on the side!
### Attendance
*Put your name here to practice writing in the HackMD*
* Stacey Tecot
* Anna Dornhaus
* Jung Mee Park
* Soham Pal
* Godsgift Chuwkuonye
* Joel Parker
* Sarah Britton
* Linh Tran
* Walter Betancourt
* Kunal Palawat
* Savannah Fuqua
* Evan MacLean
* Alexa Brown
* Praneetha Yannam
* Val
* Henri Combrink
* Irina Stefan
* Dong Chen
* Kara Haberstock Tanoue
* Cynthia Porter
* Ulises Hernández
* Connor Wilson
* Andrew Antaya
* Grace Aroz-Moscoso
* Mahek Nirav Shah
* Antonio Rubio
* Laura Dozal
* Ellen Bledsoe
### Learning objectives (so far)
- absorb philosophy and conceptual underpinnings of Bayesian modeling
- build and interpret Bayesian models using modern software packages
- What else?
- Gain confidence when using Bayesian methods.
- Gain a deep understanding of Bayesian priors, how to set them in practice
- Bayesian approaches to hypothesis testing
Informal room poll: perhaps 60/40 learning for general use vs. having a specific task at hand
#### From the room:
Accountability/company
Topics and philosophy of SR (*Statistical Rethinking*)
Facilitating uptake of Bayesian methods
Learning the _concepts_ underlying Bayesian methods
Staying fresh
What is the purpose of statistics???
### Structure (so far)
Prior to session:
- Read or watch McElreath's *Statistical Rethinking*
- Attempt practice problems from the end of each chapter
- What else?
Between sessions:
- Post questions/resources on Slack
- Attend CCT DS's office hours
- What else?
- Zoom co-working sessions for accountability? ++
- Group project meeting times?+
During session:
- Pose questions, discuss points of confusion
- Work through practice problems in break out rooms
- by software package +++++
- groups of mixed expertise? +++
- What else?
- Talk through an example model (your own or someone else's)? +++++++++
- Read papers/blog posts?+ - - -
- Nominate speakers for certain thorny topics? +++
- Form group projects? +++++
- Present a topic / problem from the book / lecture ++++++
- Discuss historical perspectives around the topics.+
- Discuss how we feel about the conceptual approach compared to NHST or other methods? +++
- Discuss examples from popular science - The Theory that Would Not Die ++
### Discussion questions
General:
1) What do you hope to accomplish from participating in this group?
I hope to learn how to translate the particular Bayesian tools I know across more modern software packages, and also improve my ability to communicate about the philosophical differences between Bayesian and frequentist frameworks
I want to expand my statistical toolbox and learn how to apply them to my future research
> Understand the mechanics behind Bayesian models better and gain more experience using R packages for Bayesian analyses (like brms) +
I want to learn how to best relate research questions, statistical methods, and experimental design. As hard as I try, it usually does not work out to have the stats methods fully fleshed out before the sampling design.
I want to learn how to communicate Bayesian methods/results/interpretations to an audience with little to no confidence in statistics and heavy skepticism and mistrust (often deservedly so).
2) What are some points of confusion about Bayesian (theory, methods, or software) that you'd like to clarify from peer-to-peer learning?
How to choose priors? How to test and report the effect of different priors. +++
How to integrate Bayesian learning with other statistical / machine learning approaches? +
How do we assess real-world impact (i.e. Effect size) in Bayesian Framework? ++
How to determine when Bayesian methods are most appropriate? +
How to represent models in R model syntax vs. how we write them out in standard statistical notation (many R packages now use familiar notation)
3) What tips, tricks, or other expertise can you share with other Bayesian learners?
I've gotten pretty good at translating between stat notation for distributions and programming notation, especially standard deviations vs. precisions. I can also share what I know about reparameterization and why we sometimes need it to provide meaningful priors/initial values.
INLA is a very flexible Bayesian approach implemented in R that does not use MCMC chains. It makes for very fast Bayesian analysis with much lower computational overhead. It is also great for including both spatial and temporal effects. https://www.r-inla.org/
Python package that is somewhat similar (not an INLA expert): https://github.com/dflemin3/approxposterior
For problems that require more computation power, you can use the high performance computing resources at UA. https://docs.hpc.arizona.edu
From Chapter 1:
4) What does McElreath mean by a golem?
Powerful tools (e.g. statistical models, AI, software, etc.) that need to be used responsibly.
A process that can be used wisely or not.
I think he means that a model in itself is not knowledge, or smart, or a solution.
I think his point is also that the concepts of a statistical test or analysis, a model, and a hypothesis are all distinct things and it's important to keep in mind what you are talking about and what you want it to do.
(The Golem and The Djinni is a very nice novel.)
5) How is a statistical hypothesis different from a scientific one, and why does it matter?
I don’t have a good direct answer to this but it reminds me of data/results that are "statistically significant but biologically irrelevant/meaningless"
Statistical hypotheses often determine the choice of statistical model but don't necessarily cleanly map onto the scientific question that was intended.
I think he means more than that (the statement above is usually used in the context of effect size): he means that 'statistical hypotheses' are really more like 'predictions' in the Scientific Method sense, i.e. they reflect a specific concrete quantitative outcome or relationship, which doesn't (usually) map 1:1 onto a real hypothesis (which describes some process happening in the world).
6) McElreath writes about the "probabilistic nature of evidence". Does this comport with your field and understanding of scientific evidence? Are there certain fields that gravitate toward Bayesian inference because of this?
> First, observations are prone to error, especially at the boundaries of scientific knowledge. --> I think it is important to recognize that in the complex sciences (everything not physics), all observations and measurements have *incredible* levels of 'noise', which is not strictly measurement error but just the millions of unknown and unknowable processes and factors that affect outcomes (that are unrelated to the question being asked). (This is not about being at the boundaries of knowledge) I *really* recommend Joanna Masel's public lecture on this point: https://drive.google.com/file/d/13LIi9qRBj2IOYrwTyC2EB8jgsY9XhVgJ/view?usp=drive_link
> McElreath's discussion of measurement error is a major issue in my field (evaluation/social sciences broadly) because so many of our measures are drawn from survey tools that have all sorts of potential sources of error (from sampling error to the psychometrics of specific survey tools, etc). We're often building models to analyze datasets with many of these measurement problems
(Anna:) Trying to briefly summarize my point about cutoffs vs not: One approach to evidence is to say 'we just want the best estimate for how much factor A affects X'. This is the typical Bayesian approach and what McElreath favors, and it is certainly relevant in cases where you have a million factors affecting X and a yes-or-no answer isn't necessarily that meaningful (because the answer is usually yes, but not meaningfully so). Another approach to evidence is to say 'we set up an artificial, controlled situation to see whether or not factor A affects X'. In that case effect size is really meaningless, because it won't be the same as 'in nature' (whatever that means). Instead, what we really want there is a big hurdle that prevents people from claiming things like 'I found a cure for X' or 'I found this new process A-X'. This is a situation in which IMHO null-hypothesis-testing is really important, because it gives a non-negotiable result that may contradict what the researcher would like to find and that prevents a 'spin' being put on the results. I put a paper on Slack that argues for example that a dichotomous decision-rule is important.