# Bayesian Learning Group
## Session 2
### Attendance
### Discussion Q's
### Problems
## Session 1: Why are we here and how should we do it?
You can type here and it will show up on the side!
### Attendance
*Put your name here to practice writing in the HackMD*
* Stacey Tecot
* Anna Dornhaus
* Jung Mee Park
* Soham Pal
* Godsgift Chuwkuonye
* Joel Parker
* Sarah Britton
* Linh Tran
* Walter Betancourt
* Kunal Palawat
* Savannah Fuqua
* Evan MacLean
* Alexa Brown
* Praneetha Yannam
* Val
* Henri Combrink
* Irina Stefan
* Dong Chen
* Kara Haberstock Tanoue
* Cynthia Porter
* Ulises Hernández
* Connor Wilson
* Andrew Antaya
* Grace Aroz-Moscoso
* Mahek Nirav Shah
* Antonio Rubio
* Laura Dozal
### Learning objectives (so far)
- absorb philosophy and conceptual underpinnings of Bayesian modeling
- build and interpret Bayesian models using modern software packages
- What else?
- Gain confidence when using Bayesian methods
- Gain a deep understanding of Bayesian priors and how to set them in practice
- Bayesian approaches to hypothesis testing
Informal room poll: roughly a 60/40 split between learning for general use and having a specific task at hand
#### From the room:
- Accountability/company
- Topics and philosophy of SR (*Statistical Rethinking*)
- Facilitating uptake of Bayesian methods
- Learning the _concepts_ underlying Bayesian methods
- Staying fresh
- What is the purpose of statistics???
### Structure (so far)
Prior to session:
- Read or watch McElreath's *Statistical Rethinking*
- Attempt practice problems from the end of each chapter
- What else?
Between sessions:
- Post questions/resources on Slack
- Attend CCT DS's office hours
- What else?
- Zoom co-working sessions for accountability? ++
- Group project meeting times?+
During session:
- Pose questions, discuss points of confusion
- Work through practice problems in break out rooms
- by software package +++++
- groups of mixed expertise? +++
- What else?
- Talk through an example model (your own or someone else's)? +++++++++
- Read papers/blog posts?+ - - -
- Nominate speakers for certain thorny topics? +++
- Form group projects? +++++
- Present a topic / problem from the book / lecture ++++++
- Discuss historical perspectives around the topics.+
- Discuss how we feel about the conceptual approach compared to NHST or other methods? +++
- Discuss examples from popular science - *The Theory That Would Not Die* ++
### Discussion questions
General:
1) What do you hope to accomplish from participating in this group?
I hope to learn how to translate the particular Bayesian tools I know across more modern software packages, and also improve my ability to communicate about the philosophical differences between Bayesian and frequentist frameworks
I want to expand my statistical toolbox and learn how to apply these tools to my future research
> Understand the mechanics behind Bayesian models better and gain more experience using R packages for Bayesian analyses (like brms) +
I want to learn how to best relate research questions, statistical methods, and experimental design. As hard as I try, it usually does not work out to have the stats methods fully fleshed out before the sampling design.
I want to learn how to communicate Bayesian methods/results/interpretations to an audience with little to no confidence in statistics and heavy skepticism and mistrust (often deservedly so).
2) What are some points of confusion about Bayesian (theory, methods, or software) that you'd like to clarify from peer-to-peer learning?
How to choose priors? How to test and report the effect of different priors? (A short brms sketch follows this list of questions.) +++
How to integrate Bayesian learning with other statistical / machine learning approaches? +
How do we assess real-world impact (i.e., effect size) in a Bayesian framework? ++
How to determine when Bayesian methods are most appropriate? +
How to relate the way we specify models in R to how we write models out in standard statistical notation? (Many R packages now use a familiar formula syntax; see the sketch below.)
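A hedged sketch tying the last two questions together: how a brms model formula maps onto the model written in standard statistical notation, and how to set priors explicitly instead of relying on the defaults. The data frame `my_data` and the variables `y`, `x`, and `group` are hypothetical, and the priors are only placeholders.

```r
# The brms formula  y ~ x + (1 | group)  corresponds (roughly) to:
#   y_i    ~ Normal(mu_i, sigma)
#   mu_i   = a + a_group[group_i] + b * x_i
#   a_group ~ Normal(0, sigma_group)
library(brms)

fit <- brm(
  y ~ x + (1 | group),            # formula notation
  data   = my_data,               # hypothetical data frame
  family = gaussian(),
  prior  = c(
    set_prior("normal(0, 1)", class = "b"),       # prior on the slope(s)
    set_prior("exponential(1)", class = "sigma")  # prior on the residual sd
  )
)

prior_summary(fit)  # report which priors were actually used
summary(fit)        # posterior means and credible intervals (one way to read effect sizes)
```

Refitting with a different `prior` argument and comparing the resulting posteriors is one simple way to check and report prior sensitivity.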
3) What tips, tricks, or other expertise can you share with other Bayesian learners?
I've gotten pretty good at translating between stat notation for distributions and programming notation, especially standard deviations vs. precisions. I can also share what I know about reparameterization and why we sometimes need it to provide meaningful priors/initial values.
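A small illustration of the sd-vs-precision point above (the numbers are arbitrary): the same Normal distribution written with a standard deviation, as in standard stat notation, Stan, and brms, versus a precision, as in BUGS/JAGS.

```r
sigma <- 2            # standard deviation, as in Normal(mu, sigma)
tau   <- 1 / sigma^2  # precision, as used by BUGS/JAGS dnorm(mu, tau)

dnorm(0.5, mean = 0, sd = sigma)  # R (like Stan) parameterizes the normal by its sd
# The equivalent prior in JAGS/BUGS model code would be written: theta ~ dnorm(0, 0.25)
```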
INLA is a very flexible Bayesian approach implemented in R that does not use MCMC chains. This makes for very fast Bayesian analysis with much lower computational overhead. It is also great for including both spatial and temporal effects. https://www.r-inla.org/
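A minimal sketch of what an INLA call looks like, assuming the INLA package from r-inla.org is installed; the data here are simulated purely for illustration.

```r
library(INLA)

set.seed(1)
df <- data.frame(x = rnorm(100))
df$y <- 1 + 2 * df$x + rnorm(100)

# Gaussian linear model fit by approximate inference (no MCMC chains are run)
fit <- inla(y ~ x, family = "gaussian", data = df)
summary(fit)  # approximate posterior summaries, typically returned in seconds
```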
A Python package that takes a somewhat similar approach (though I'm not an INLA expert): https://github.com/dflemin3/approxposterior
For problems that require more computation power, you can use the high performance computing resources at UA. https://docs.hpc.arizona.edu
From Chapter 1:
4) What does McElreath mean by a golem?
Powerful tools (e.g. statistical models, AI, software, etc.) that need to be used responsibly.
A process that can be used wisely or not.
I think he means that a model in itself is not knowledge, or smart, or a solution.
I think his point is also that the concepts of a statistical test or analysis, a model, and a hypothesis are all distinct things and it's important to keep in mind what you are talking about and what you want it to do.
(The Golem and The Djinni is a very nice novel.)
5) How is a statistical hypothesis different from a scientific one, and why does it matter?
I don’t have a good direct answer to this but it reminds me of data/results that are "statistically significant but biologically irrelevant/meaningless"
Statistical hypotheses often determine the choice of statistical model but don't necessarily cleanly map onto the scientific question that was intended.
I think he means more than that (the statement above is usually used in the context of effect size): he means that 'statistical hypotheses' are really more like 'predictions' in the Scientific Method sense, i.e. they reflect a specific concrete quantitative outcome or relationship, which doesn't (usually) map 1:1 onto a real hypothesis (which describes some process happening in the world).
6) McElreath writes about the "probabilistic nature of evidence". Does this comport with your field and understanding of scientific evidence? Are there certain fields that gravitate toward Bayesian inference because of this?
> First, observations are prone to error, especially at the boundaries of scientific knowledge. --> I think it is important to recognize that in the complex sciences (everything not physics), all observations and measurements have *incredible* levels of 'noise', which is not strictly measurement error but just the millions of unknown and unknowable processes and factors that affect outcomes (that are unrelated to the question being asked). (This is not about being at the boundaries of knowledge) I *really* recommend Joanna Masel's public lecture on this point: https://drive.google.com/file/d/13LIi9qRBj2IOYrwTyC2EB8jgsY9XhVgJ/view?usp=drive_link
> McElreath's discussion of measurement error is a major issue in my field (evaluation/social sciences broadly) because so many of our measures are drawn from survey tools that have all sorts of potential sources of error (from sampling error to the psychometrics of specific survey tools, etc). We're often building models to analyze datasets with many of these measurement problems
(Anna:) Trying to briefly summarize my point about cutoffs vs. not: One approach to evidence is to say 'we just want the best estimate for how much factor A affects X'. This is the typical Bayesian approach and what McElreath favors, and it is certainly relevant in cases where you have a million factors affecting X and a yes-or-no answer isn't necessarily that meaningful (because the answer is usually yes, but not meaningfully so). Another approach to evidence is to say 'we set up an artificial, controlled situation to see whether or not factor A affects X'. In that case effect size is really meaningless, because it won't be the same as 'in nature' (whatever that means). Instead, what we really want there is a big hurdle that prevents people from claiming things like 'I found a cure for X' or 'I found this new process A-X'. This is a situation in which IMHO null-hypothesis testing is really important, because it gives a non-negotiable result that may contradict what the researcher would like to find and prevents a 'spin' being put on the results. I put a paper on Slack that argues, for example, that a dichotomous decision rule is important.