---
tags: stat340, learning-targets
---

# Stat 340 Learning Target Quiz 1 Study Guide

Learning Target Quiz #1 will focus on univariate models. It will include questions on learning targets 2-6.

#### 2. Given the prior distribution and data, derive the posterior distribution for a univariate model.

- Understand how the three steps of inference (prior, likelihood, and posterior) fit together
- Be able to derive the likelihood function
- Identify the likelihood through the story of a distribution (i.e., choose a logical distribution to model the data)
- "The posterior is proportional to prior times likelihood."
- Derive the posterior distribution using Bayes' rule up to the normalizing constant
- Given a sampling distribution (likelihood), show that a prior is conjugate
- For conjugate priors, identify and fully parameterize the posterior distribution
- For discrete priors, calculate the posterior table

<ins>Example question:</ins> The Weibull distribution is often used as a model for survival times in biomedical, demographic, and engineering analyses. A random variable $Y$ has a Weibull distribution if its pdf is

\begin{align*}
f(y \mid \alpha, \lambda) = \lambda \alpha y^{\alpha -1} \exp(-\lambda y^\alpha) \qquad \text{for } y > 0.
\end{align*}

Here, $\alpha>0$ and $\lambda>0$ are parameters of the distribution. For this problem, assume that $\alpha = \alpha_0$ is known, but $\lambda$ is not known.

1. Assuming the improper prior distribution $\pi(\lambda \mid \alpha = \alpha_0) \propto 1$ and that $Y_1, \ldots, Y_n$ are i.i.d. Weibull random variables, derive the unnormalized posterior distribution for $\lambda$.
2. Write the name of the posterior distribution you derived in part 1 and expressions for its parameter values.

---

#### 3. Given the posterior distribution and a research question, estimate the parameter (or function of parameters) of interest and interpret the results in context.
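The summaries in this target can be computed from the theoretical posterior or from draws. Below is a minimal sketch that does both, using only Python's standard library and the ${\rm Gamma}(195, 10.5)$ posterior from this target's aircraft-crash example. It assumes the second Gamma parameter is a rate, which is consistent with the conjugate update $0.5 + n = 10.5$. The threshold `K = 18`, the seed, and the number of draws are illustrative choices, not part of the example.

```python
import random
import statistics

# Aircraft-crash example: lambda | y ~ Gamma(shape=195, rate=10.5).
# Assumption: the second Gamma parameter is a rate (0.5 + n = 10.5).
SHAPE, RATE = 195, 10.5
K = 18  # hypothetical threshold for P(lambda > K); not from the example


def posterior_draws(n_draws=50_000, seed=340):
    """Simulate from the Gamma posterior.

    random.gammavariate(alpha, beta) uses a *scale* beta, so pass 1 / RATE.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(SHAPE, 1 / RATE) for _ in range(n_draws)]


def equal_tailed_interval(draws, level=0.90):
    """Percentile interval: drop (1 - level) / 2 of the draws from each tail."""
    tail = (1 - level) / 2
    s = sorted(draws)
    # Index-based cut points; close to the exact percentiles for many draws.
    return s[int(tail * (len(s) - 1))], s[int((1 - tail) * (len(s) - 1))]


draws = posterior_draws()

# Closed-form summaries for Gamma(a, b) with rate b.
exact_mean = SHAPE / RATE       # a / b
exact_map = (SHAPE - 1) / RATE  # mode (a - 1) / b, valid for a >= 1
exact_var = SHAPE / RATE**2     # a / b^2

# Simulation-based summaries from the same posterior.
print("mean (exact, draws):", exact_mean, statistics.fmean(draws))
print("median (draws):", statistics.median(draws))
print("MAP (exact):", exact_map)
print("variance (exact, draws):", exact_var, statistics.variance(draws))
print("sd (draws):", statistics.stdev(draws))
print("90% equal-tailed interval (draws):", equal_tailed_interval(draws))
print(f"P(lambda > {K}) (draws):", sum(d > K for d in draws) / len(draws))
```

With many draws, the simulated mean and variance should sit close to their closed-form values. The median, interval, and tail probability come from the draws, which is the natural route when no closed form is handy.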
- Given a posterior distribution, calculate the posterior mean, median, MAP estimate, variance, standard deviation, equal-tailed credible interval (i.e., percentile interval), and $P(\theta > k)$
- Interpret a credible interval for a parameter
- Perform and interpret a Bayesian hypothesis test
- Be able to use either the theoretical posterior or draws (simulations) from the posterior distribution

<ins>Example question:</ins> Accidents and other incidents involving commercial aircraft are recorded by the Bureau of Aircraft Accidents Archive (B3A), an international organization located in Geneva, Switzerland. Let $Y$ denote the number of crashes observed per year, and let $n = 10$ denote the total number of observations (years). You observe $\sum Y_i = 185$. The following model is used to analyze these data:

\begin{align*}
Y_1,\ldots,Y_{10} \mid \lambda & \overset{\rm iid}{\sim} {\rm Poisson}(\lambda)\\
\lambda &\sim {\rm Gamma}(10, 0.5)\\
\lambda \mid Y_1,\ldots,Y_{10} &\sim {\rm Gamma}(195, 10.5)
\end{align*}

where ${\rm Gamma}(a, b)$ denotes the gamma distribution with shape $a$ and rate $b$.

1. Explain how you would compute a 90% equal-tailed credible interval for $\lambda$.
2. The 90% credible interval for $\lambda$ is 16.54 to 20.73. Interpret this interval in the context of the problem.

---

#### 4. Given the posterior distribution and a research question, conduct a Bayesian hypothesis test and interpret the results in context.

- Given a research question, set up hypotheses of interest
- Given a posterior distribution, calculate the probability of the hypotheses
- Interpret this posterior probability and use it to draw a logical conclusion for the hypothesis test in context

<ins>Example question:</ins> Denote the probability that a part is defective as $\theta$. The industry standard is that no more than 0.1% of parts can be defective, i.e., $\theta \le 0.001$. Your company has purchased a new machine, generated 10,000 parts, and tested each to determine if it is defective.
You are now tasked with testing the null hypothesis that $\theta \le 0.001$ versus the alternative hypothesis that $\theta > 0.001$. You decide to use the following model for $Y$, the number of defective parts in the sample of size $n = 10{,}000$:

\begin{align*}
Y \mid \theta &\sim {\rm Binomial}(n, \theta)\\
\theta &\sim {\rm Beta}(0.5, 0.5)\\
\theta \mid Y &\sim {\rm Beta}(Y+0.5, n-Y+0.5)
\end{align*}

1. Explain how you would compute $P(\theta \le 0.001 \mid Y)$.
2. Provide an interpretation of $P(\theta \le 0.001 \mid Y)$ in the context of the problem.

---

#### 5. Given a univariate Bayesian model, derive the posterior predictive distribution and use it to make predictions about future observations.

- Write down the integral expression for the posterior predictive distribution
- Describe how you would generate samples from the posterior predictive distribution
- Once you have the posterior predictive distribution, use it to calculate predictions and prediction intervals

<ins>Example question:</ins> You just bought stock in FancyTech. Let $\mu$ denote the average dollar amount that your FancyTech stock goes up or down in a one-day period. Suppose that it is reasonable to assume that the daily changes in FancyTech stock value are Normally distributed with a known standard deviation of $\sigma = 2$ dollars. On a random sample of 4 days, you observe changes in stock value of $-0.7$, $1.2$, $4.5$, and $-4$ dollars. You decide to use the following model for the daily changes, where the second argument of $\mathcal{N}$ is the standard deviation:

\begin{align*}
Y_1, Y_2, Y_3, Y_4 \mid \mu &\overset{\rm iid}{\sim} \mathcal{N}(\mu, 2)\\
\mu &\sim \mathcal{N}(7.2, 2.6)\\
\mu \mid Y_1, Y_2, Y_3, Y_4 &\sim \mathcal{N}(1.15, 0.93)
\end{align*}

Let $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(n)}$ be $n$ draws from the posterior distribution of $\mu$. Outline the steps required to calculate an 89% prediction interval for tomorrow's change in the FancyTech stock price. (Please give a numbered list. No code is required.)

---

#### 6. Assess the adequacy of a univariate Bayesian model.
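Both the prediction interval in target 5's FancyTech example and the model checks in this target start from draws of the posterior predictive distribution: draw $\mu^{(s)}$ from the posterior, then draw $\tilde{y}^{(s)} \sim \mathcal{N}(\mu^{(s)}, \sigma)$. Below is a minimal standard-library sketch. It assumes that $\sigma = 2$ and the prior's $2.6$ are standard deviations (the example calls $\sigma$ a standard deviation), and it computes the posterior from the prior and data with the conjugate Normal-Normal update rather than reading it off the model statement. Under that reading the posterior standard deviation is about 0.93 (variance about 0.87). The seed and number of draws are arbitrary.

```python
import math
import random
import statistics

# FancyTech example from target 5. Assumption: SIGMA and PRIOR_SD are
# standard deviations, not variances.
SIGMA = 2.0
PRIOR_MEAN, PRIOR_SD = 7.2, 2.6
DATA = [-0.7, 1.2, 4.5, -4.0]


def normal_posterior(data, sigma, prior_mean, prior_sd):
    """Conjugate Normal-Normal update for mu with sigma known.

    Returns the posterior mean and posterior standard deviation.
    """
    prior_prec = 1 / prior_sd**2
    data_prec = len(data) / sigma**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * statistics.fmean(data))
    return post_mean, math.sqrt(post_var)


def posterior_predictive(n_draws, seed=340):
    """Step 1: draw mu from its posterior. Step 2: draw a new y given each mu."""
    rng = random.Random(seed)
    m, s = normal_posterior(DATA, SIGMA, PRIOR_MEAN, PRIOR_SD)
    mus = [rng.gauss(m, s) for _ in range(n_draws)]
    return [rng.gauss(mu, SIGMA) for mu in mus]


def percentile_interval(draws, level):
    """Equal-tailed interval from sorted draws (index-based cut points)."""
    tail = (1 - level) / 2
    s = sorted(draws)
    return s[int(tail * (len(s) - 1))], s[int((1 - tail) * (len(s) - 1))]


y_new = posterior_predictive(50_000)
print("posterior (mean, sd):", normal_posterior(DATA, SIGMA, PRIOR_MEAN, PRIOR_SD))
print("89% prediction interval:", percentile_interval(y_new, 0.89))
```

The prediction interval is wider than a credible interval for $\mu$ because each predictive draw adds the day-to-day noise $\sigma$ on top of the posterior uncertainty about $\mu$.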
- Describe how you would generate the posterior predictive distribution for model checking
- Be able to critique/check a model using the posterior predictive distribution

<ins>Example question:</ins> Beary (my golden retriever) likes to wake me up early, after which I stumble downstairs and make coffee. Suppose that I collected a representative sample of $n=20$ measurements of the time it took me to make coffee in the morning (the time from waking up to pressing the "on" button). After updating my prior belief, I found that the posterior of $\mu$, the average time to make coffee, is $\mathcal{N}(10, 1)$. Looking at my observed data, I see an unusually large value, $y = 22$. Below is a histogram of the maximum time to make coffee in a sample of 20 days, for each of 1,000 posterior predictive samples. The observed maximum is displayed as a vertical line.

![Histogram of the maximum coffee time across 1,000 posterior predictive samples of 20 days, with the observed maximum marked](https://i.imgur.com/ufhkCqB.png)

What does this plot reveal about the model's adequacy?
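A histogram like the one above comes from a posterior predictive check with the test statistic $T(y) = \max_i y_i$. Below is a minimal standard-library sketch of how such replicates could be generated, using the example's posterior $\mu \mid y \sim \mathcal{N}(10, 1)$, $n = 20$, and observed maximum 22. The example does not give the sampling standard deviation of a single coffee time, so `SIGMA = 2.0` is a **hypothetical** value chosen only for illustration. The code also assumes a Normal sampling model, and the seed is arbitrary.

```python
import random

# From the coffee example: posterior mu | y ~ N(10, 1), n = 20, observed max 22.
POST_MEAN, POST_SD = 10.0, 1.0
N_OBS = 20
OBSERVED_MAX = 22.0
# HYPOTHETICAL: the example does not state the sampling SD of one coffee time.
SIGMA = 2.0


def replicated_maxima(n_reps=1000, seed=340):
    """For each replicate: draw mu from the posterior, simulate N_OBS new
    coffee times given mu (Normal sampling model assumed), and record max."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_reps):
        mu = rng.gauss(POST_MEAN, POST_SD)
        y_rep = [rng.gauss(mu, SIGMA) for _ in range(N_OBS)]
        maxima.append(max(y_rep))
    return maxima


maxima = replicated_maxima()
# Posterior predictive p-value: share of replicated maxima at least as large
# as the observed maximum.
p_value = sum(m >= OBSERVED_MAX for m in maxima) / len(maxima)
print(f"P(T(y_rep) >= {OBSERVED_MAX}) ~= {p_value:.3f}")
```

A posterior predictive p-value near 0 (or 1) means the fitted model rarely produces data with the observed feature. The histogram in the example makes the same comparison visually by marking where the observed maximum falls among the replicated maxima.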