# 3章章末問題
## 3.10.1
### Question
Binomial and multinomial models: suppose data $(y_1, \cdots, y_J)$ follow a multinomial distribution with parameters $(\theta_1, \cdots, \theta_J)$. Also suppose that $\mathbf{\theta} = (\theta_1, \cdots, \theta_J)$ has a Dirichlet prior distribution. Let $\alpha = \frac{\theta_1}{\theta_1 + \theta_2}$. Write the marginal posterior distribution for $\alpha$.
### Solution
Let the prior distribution of $\mathbf{\theta}$ be $\mathrm{Dirichlet}(a_1, a_2, \cdots, a_J)$. By the properties of the Dirichlet distribution (p. 583), the joint marginal posterior of $\theta_1, \theta_2$ is
\begin{align}
p(\theta_1, \theta_2|y) \propto \theta^{y_1 + a_1 - 1}_1 \theta^{y_2 + a_2 - 1}_2 (1 - \theta_1 - \theta_2)^{y_3 + \cdots + y_J + a_3 + \cdots + a_J - 1}
\end{align}
Now consider the change of variables $(\alpha, \beta) = (\frac{\theta_1}{\theta_1 + \theta_2}, \theta_1 + \theta_2)$. Since $\theta_1 = \alpha\beta$ and $\theta_2 = (1 - \alpha)\beta$, the Jacobian is
\begin{align}
J &= \begin{vmatrix}
\frac{\partial \alpha\beta}{\partial \alpha} & \frac{\partial (1 - \alpha)\beta}{\partial \alpha} \\
\frac{\partial \alpha\beta}{\partial \beta} & \frac{\partial (1 - \alpha)\beta}{\partial \beta} \\
\end{vmatrix} \\
&= \begin{vmatrix}
\beta & -\beta \\
\alpha & 1 - \alpha \\
\end{vmatrix} \\
&= (1 - \alpha)\beta + \alpha\beta \\
&= \beta
\end{align}
so that
\begin{align}
p(\alpha, \beta|y) &\propto (\alpha\beta)^{y_1 + a_1 - 1} ((1 - \alpha)\beta)^{y_2 + a_2 - 1} (1 - \beta)^{y_3 + \cdots + y_J + a_3 + \cdots + a_J - 1}\beta \\
&= \alpha^{y_1 + a_1 - 1} (1 - \alpha)^{y_2 + a_2 - 1} \beta^{y_1 + y_2 + a_1 + a_2 - 1} (1 - \beta)^{y_3 + \cdots + y_J + a_3 + \cdots + a_J - 1} \\
&\propto \mathrm{Beta}(\alpha|y_1 + a_1, y_2 + a_2) \mathrm{Beta}(\beta|y_1 + y_2 + a_1 + a_2, y_3 + \cdots + y_J + a_3 + \cdots + a_J).
\end{align}
Since this factorizes, $\alpha$ and $\beta$ are independent a posteriori, and the marginal posterior of $\alpha$ is
\begin{align}
\alpha|y \sim \mathrm{Beta}(y_1 + a_1, y_2 + a_2).
\end{align}
By equation (2.3) (p. 30), this is the posterior obtained under the prior $\mathrm{Beta}(a_1, a_2)$ after observing $y_1$ successes in $y_1 + y_2$ trials.
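As a quick simulation check (the counts $y$ and prior parameters $a$ below are arbitrary, not from the book), $\alpha$ computed from Dirichlet posterior draws should match the $\mathrm{Beta}(y_1 + a_1, y_2 + a_2)$ result above:

```r
# check the marginal posterior of alpha by simulation
set.seed(1)
y <- c(5, 3, 2, 7)   # arbitrary multinomial counts
a <- c(1, 1, 1, 1)   # arbitrary Dirichlet prior parameters
post <- y + a
n_draw <- 1e5
# Dirichlet draws via normalized independent gamma variables
g <- matrix(rgamma(n_draw * length(post), shape = rep(post, each = n_draw)),
            ncol = length(post))
theta <- g / rowSums(g)
alpha <- theta[, 1] / (theta[, 1] + theta[, 2])
# theoretical marginal: Beta(y1 + a1, y2 + a2) = Beta(6, 4), mean 0.6
c(simulated = mean(alpha), theoretical = 6 / (6 + 4))
```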
## 3.10.2
### Question
Comparison of two multinomial observations: on September 25, 1988, the evening of a presidential campaign debate, ABC News conducted a survey of registered voters in the United States; 639 persons were polled before the debate, and 639 different persons were polled after. The results are displayed in Table 3.2. Assume the surveys are independent simple random samples from the population of registered voters. Model the data with two different multinomial distributions. For $j = 1, 2$, let $\alpha_j$ be the proportion of voters who preferred Bush, out of those who had a preference for either Bush or Dukakis at the time of survey $j$. Plot a histogram of the posterior density for $\alpha_2 - \alpha_1$. What is the posterior probability that there was a shift toward Bush?
### Solution
Treat a stated preference for Bush as a success and a preference for any other candidate as a failure, and model each survey as binomial with a $\mathrm{Beta}(1, 1)$ prior. From Table 3.2, the posterior distributions of the preference for Bush before and after the debate are
\begin{align}
\alpha_{1}|y &\sim \mathrm{Beta}(295, 346) \\
\alpha_{2}|y &\sim \mathrm{Beta}(289, 352)
\end{align}
となる。$\alpha_2 - \alpha_1$の分布を解析的に求めるのは面倒なので、シミュレーションを用いる。
```{r}
# seed for random number generation
set.seed(1234)
# simulate alpha
alpha1 <- rbeta(1000, 295, 346)
alpha2 <- rbeta(1000, 289, 352)
# take the difference
alpha_diff <- alpha2 - alpha1
# summarize
summary(alpha_diff)
hist(alpha_diff)
# posterior probability of a shift toward Bush
mean(alpha_diff > 0)
```
## 3.10.3
### Question
Estimation from two independent experiments: an experiment was performed on the effects of magnetic fields on the flow of calcium out of chicken brains. Two groups of chickens were involved: a control group of 32 chickens and an exposed group of 36 chickens. One measurement was taken on each chicken, and the purpose of the experiment was to measure the average flow $\mu_c$ in untreated (control) chickens and the average flow $\mu_t$ in treated chickens. The 32 measurements on the control group had a sample mean of 1.013 and a sample standard deviation of 0.24. The 36 measurements on the treatment group had a sample mean of 1.173 and a sample standard deviation of 0.20.
#### (a)
Assuming the control measurements were taken at random from a normal distribution with mean $\mu_c$ and variance $\sigma^2_c$, what is the posterior distribution of $\mu_c$? Similarly, use the treatment group measurements to determine the marginal posterior distribution of $\mu_t$. Assume a uniform prior distribution on $(\mu_c, \mu_t, \log \sigma_c^2, \log \sigma_t^2)$.
#### (b)
What is the posterior distribution for the difference, $\mu_t - \mu_c$? To get this, you may sample from the independent t distributions you obtained in part (a) above. Plot a histogram of your samples and give an approximate 95% posterior interval for $\mu_t - \mu_c$. The problem of estimating two normal means with unknown ratio of variances is called the Behrens–Fisher problem.
### Solution
#### (a)
First, the posterior is
\begin{align}
p(\mu_c, \mu_t, \mathrm{log}\sigma_c,\mathrm{log}\sigma_t|y) &\propto p(\mu_c, \mu_t, \mathrm{log}\sigma_c,\mathrm{log}\sigma_t)p(y|\mu_c, \mu_t, \mathrm{log}\sigma_c,\mathrm{log}\sigma_t)\\
&\propto \prod_{i=1}^{32}{\mathrm{N}(y_{ci}|\mu_c, \sigma_c^2)} \prod_{i=1}^{36}{\mathrm{N}(y_{ti}|\mu_t, \sigma_t^2)}
\end{align}
By the argument of Section 3.2 (around p. 66 of the text), each marginal posterior mean follows a $t$ distribution with $n-1$ degrees of freedom, $t_{n-1}(\bar{y}, \frac{s^2}{n})$, so
\begin{align}
\mu_c |y \sim t_{31}(1.013, \frac{0.24^2}{32})\\
\mu_t |y \sim t_{35}(1.173, \frac{0.20^2}{36})\\
\end{align}
#### (b)
```{r}
set.seed(1234)
mu.c <- 1.013 + (0.24 / sqrt(32)) * rt(1000, 31)
mu.t <- 1.173 + (0.20 / sqrt(36)) * rt(1000, 35)
dif <- mu.t - mu.c
hist(dif, xlab = "mu_t - mu_c", yaxt = "n",
     breaks = seq(-.1, .4, .02), cex = 2)
# approximate 95% posterior interval
print(sort(dif)[c(25, 976)])
```
## 3.10.4
### Question
Inference for a 2 × 2 table: an experiment was performed to estimate the effect of beta-blockers on mortality of cardiac patients. A group of patients were randomly assigned to treatment and control groups: out of 674 patients receiving the control, 39 died, and out of 680 receiving the treatment, 22 died. Assume that the outcomes are independent and binomially distributed, with probabilities of death of $p_0$ and $p_1$ under the control and treatment, respectively. (We return to this example in Section 5.6.)
#### (a)
Set up a noninformative prior distribution on $(p_0, p_1)$ and obtain posterior simulations.
#### (b)
Summarize the posterior distribution for the odds ratio, $(p_1/(1-p_1))/(p_0/(1-p_0))$.
#### (c)
Discuss the sensitivity of your inference to your choice of noninformative prior density.
### Solution
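The solution section was left empty here. As a sketch for parts (a) and (b), with independent $\mathrm{Beta}(1, 1)$ priors (one noninformative choice), the posteriors are $p_0|y \sim \mathrm{Beta}(40, 636)$ and $p_1|y \sim \mathrm{Beta}(23, 659)$, and the odds ratio can be summarized by simulation:

```r
# posterior simulation for the beta-blocker 2x2 table, Beta(1, 1) priors
set.seed(1234)
n_draw <- 10000
p0 <- rbeta(n_draw, 1 + 39, 1 + 674 - 39)  # control: 39 deaths in 674
p1 <- rbeta(n_draw, 1 + 22, 1 + 680 - 22)  # treatment: 22 deaths in 680
odds_ratio <- (p1 / (1 - p1)) / (p0 / (1 - p0))
summary(odds_ratio)
quantile(odds_ratio, c(0.025, 0.975))      # approximate 95% posterior interval
```

For part (c), rerunning this with other noninformative choices (e.g. $\mathrm{Beta}(0.5, 0.5)$) changes the interval only slightly because the counts are large.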
## 3.10.5
### Question
Rounded data: it is a common problem for measurements to be observed in rounded form (for a review, see Heitjan, 1989). For a simple example, suppose we weigh an object five times and measure weights, rounded to the nearest pound, of 10, 10, 12, 11, 9. Assume the unrounded measurements are normally distributed with a noninformative prior distribution on the mean $\mu$ and variance $\sigma^2$.
#### (a)
Give the posterior distribution for $(\mu, \sigma^2)$ obtained by pretending that the observations are exact unrounded measurements.
#### (b)
Give the correct posterior distribution for $(\mu, \sigma^2)$ treating the measurements as rounded.
#### (c)
How do the incorrect and correct posterior distributions differ? Compare means, variances, and contour plots.
#### (d)
Let $z = (z_1, \ldots, z_5)$ be the original, unrounded measurements corresponding to the five observations above. Draw simulations from the posterior distribution of $z$. Compute the posterior mean of $(z_1 - z_2)^2$.
### Solution
#### (a)
If each observation is treated as exact, the argument of Section 3.2 applies directly, so $\sigma^2|y \sim \mathrm{Inv-}\chi^2(n-1, s^2)$ and $\mu|\sigma^2, y \sim \mathrm{N}(\bar{y}, \sigma^2/n)$, where $n = 5$, $\bar{y} = (10 + 10 + 12 + 11 + 9)/5 = 10.4$, and $s^2 = \frac{1}{5-1}\{(10 - 10.4)^2 + (10 - 10.4)^2 + (12 - 10.4)^2 + (11 - 10.4)^2 + (9 - 10.4)^2\} = 1.3$.
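These statistics and the corresponding posterior draws can be checked in R (a sketch of part (a)):

```r
# sufficient statistics and posterior draws, pretending the data are exact
y <- c(10, 10, 12, 11, 9)
n <- length(y)
ybar <- mean(y)   # 10.4
s2 <- var(y)      # 1.3
set.seed(1)
n_draw <- 10000
sigma2 <- (n - 1) * s2 / rchisq(n_draw, n - 1)  # scaled Inv-chi^2(n-1, s^2)
mu <- rnorm(n_draw, ybar, sqrt(sigma2 / n))     # mu | sigma^2, y ~ N(ybar, sigma^2/n)
c(ybar = ybar, s2 = s2, post_mean_mu = mean(mu))
```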
#### (b)
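This part was left blank. A sketch of the standard answer (not from the original notes): treating the data as rounded, each observation contributes the probability that the unrounded value falls within $\pm 0.5$ of the recorded one, i.e. $p(\mu, \sigma^2|y) \propto \sigma^{-2}\prod_{i=1}^{5}\big[\Phi\big(\frac{y_i + 0.5 - \mu}{\sigma}\big) - \Phi\big(\frac{y_i - 0.5 - \mu}{\sigma}\big)\big]$, which can be evaluated on a grid:

```r
# unnormalized log posterior treating the measurements as rounded,
# with the noninformative prior p(mu, sigma^2) proportional to 1/sigma^2
y <- c(10, 10, 12, 11, 9)
log_post_rounded <- function(mu, sigma2) {
  sigma <- sqrt(sigma2)
  sum(log(pnorm((y + 0.5 - mu) / sigma) - pnorm((y - 0.5 - mu) / sigma))) -
    log(sigma2)
}
log_post_rounded(10.4, 1.3)  # evaluate at the sample mean and variance
```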
## 3.10.6
### Question
Binomial with unknown probability and sample size: some of the difficulties with setting prior distributions in multiparameter models can be illustrated with the simple binomial distribution. Consider data $y_1, \ldots, y_n$ modeled as independent $\mathrm{Bin}(N, \theta)$, with both $N$ and $\theta$ unknown. Defining a convenient family of prior distributions on $(N, \theta)$ is difficult, partly because of the discreteness of $N$.
Raftery (1988) considers a hierarchical approach based on assigning the parameter $N$ a Poisson distribution with unknown mean $\mu$. To define a prior distribution on $(\theta, N)$, Raftery defines $\lambda = \mu\theta$ and specifies a prior distribution on $(\lambda, \theta)$. The prior distribution is specified in terms of $\lambda$ rather than $\mu$ because 'it would seem easier to formulate prior information about $\lambda$, the unconditional expectation of the observations, than about $\mu$, the mean of the unobserved quantity $N$.'
### Solution
With the prior $p(\lambda, \theta) \propto 1/\lambda$ used in this setup, transforming to $(\mu, \theta)$ gives
\begin{align}
p(\mu, \theta)
&\propto
p(\lambda, \theta)
\left|\det\frac{\partial (\lambda, \theta)}{\partial(\mu, \theta)}\right| \\
&= \frac{1}{\lambda} \theta \\
&= \frac{1}{\mu}
\end{align}
Since this factorizes in $\mu$ and $\theta$, $\mu$ is independent of $\theta$, and it therefore follows that $p(\mu) \propto \mu^{-1}$. Then
\begin{align}
p(N, \mu) &= p(N \mid \mu) p(\mu)\\
&\propto \frac{\mu^N e^{-\mu}}{N!}\frac{1}{\mu}\\
&= \frac{\mu^{N-1} e^{-\mu}}{N!}
\end{align}
Marginalizing out $\mu$ using the definition of the gamma function,
\begin{align}
p(N)
&\propto
\int_0^\infty \frac{\mu^{N-1} e^{-\mu}}{N!} d\mu
\\
&=
\frac{1}{N!} \Gamma (N)
\\
&=
\frac{1}{N!} (N - 1)!
\\
&=
\frac{1}{N}
\end{align}
The joint posterior is
\begin{align}
p(N, \theta \mid y)
\propto
\frac{1}{N}\cdot \prod_{i = 1}^n \binom{N}{y_i} \cdot \theta^{\sum_{i = 1}^n y_i} (1 - \theta)^{nN - \sum_{i = 1}^n y_i}
\end{align}
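To make this posterior concrete, here is a grid-evaluation sketch. The $p(N) \propto 1/N$ prior and binomial likelihood are from the derivation above; the counts $y$ and grid bounds are hypothetical, since the exercise text gives no data:

```r
# grid evaluation of the unnormalized joint posterior p(N, theta | y)
y <- c(53, 57, 66, 67, 72)      # hypothetical counts for illustration
N_grid <- max(y):500            # N must be at least max(y)
theta_grid <- seq(0.01, 0.99, by = 0.01)
log_post <- function(N, theta) {
  -log(N) + sum(dbinom(y, N, theta, log = TRUE))
}
lp <- outer(N_grid, theta_grid, Vectorize(log_post))
post <- exp(lp - max(lp))       # rescale so the maximum is 1
# marginal posterior of N on the grid (up to normalization)
post_N <- rowSums(post)
N_grid[which.max(post_N)]       # posterior mode of N on this grid
```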
## 3.10.7
### Question
Poisson and binomial distributions: a student sits on a street corner for an hour and records the number of bicycles $b$ and the number of other vehicles $v$ that go by. Two models are considered.
- The outcomes $b$ and $v$ have independent Poisson distributions, with unknown means $\theta_{b}$ and $\theta_{v}$.
- The outcome $b$ has a binomial distribution, with unknown probability $p$ and sample size $b+v$.
Show that the two models have the same likelihood if we define $p = \frac{\theta_{b}}{\theta_{b}+\theta_{v}}$.
### Solution
\begin{align}
X &: b|\theta_{b} \sim \mathrm{Poisson}(\theta_{b})\\
Y &: v|\theta_{v} \sim \mathrm{Poisson}(\theta_{v})\\
Z &: b|n, p \sim \mathrm{Binomial}(n, p)
\end{align}
With this notation, the probability of observing $b$ bicycles and $v$ other vehicles can be written as follows.
First, under the first model,
\begin{align}
P(X = b)P(Y= v) &= \frac{\theta_{b}^{b} e^{-\theta_{b}}}{b!}\frac{\theta_{v}^{v} e^{-\theta_{v}}}{v!}\\
&= \frac{\theta_{b}^{b} \theta_{v}^{v} e^{-\theta_{b}-\theta_{v}}}{b!v!}
\end{align}
Under the second model, setting $n = b + v$ and $p = \frac{\theta_{b}}{\theta_{b}+\theta_{v}}$,
\begin{align}
P(Z=b) &= {n \choose b}p^{b}(1-p)^{n-b}\\
&= {b+v \choose b}(\frac{\theta_{b}}{\theta_{b}+\theta_{v}})^{b}(\frac{\theta_{v}}{\theta_{b}+\theta_{v}})^{v}\\
&= \frac{(b+v)!}{b!v!}\frac{\theta_{b}^{b} \theta_{v}^{v}}{(\theta_{b}+\theta_{v})^{b+v}}
\end{align}
Dividing the first expression by the second and canceling the common factor $\frac{\theta_{b}^{b}\theta_{v}^{v}}{b!v!}$ gives
\begin{align}
\frac{P(X = b)P(Y = v)}{P(Z = b)} = \frac{(\theta_{b}+\theta_{v})^{b+v}e^{-\theta_{b}-\theta_{v}}}{(b+v)!},
\end{align}
which is exactly the Poisson probability of the total count $n = b + v$ with mean $\theta_{b}+\theta_{v}$, and does not involve $p$. Hence, conditional on the total count, the two models share the same likelihood factor $p^{b}(1-p)^{v}$, which is what was to be shown.
An approach suggested by Saito.
Holding $np = (b+v)\frac{\theta_{b}}{\theta_{b}+\theta_{v}}$ fixed,
\begin{align}
\lim_{n \to \infty} {n \choose b}p^{b}(1-p)^{n-b} &= \lim_{n \to \infty} \frac{n(n-1)(n-2)...(n-b+1)}{b!}p^{b}(1-p)^{n-b}\\
&= \frac{1}{b!}\lim_{n \to \infty}n(n-1)(n-2)...(n-b+1)p^{b}(1-p)^{n-b}\\
&= \frac{1}{b!}\lim_{n \to \infty}\frac{n}{n}(\frac{n}{n}-\frac{1}{n})(\frac{n}{n}-\frac{2}{n})...(\frac{n}{n} - \frac{b-1}{n})(np)^{b}(1-p)^{n-b}\\
& = \frac{1}{b!} (np)^{b} \lim_{n \to \infty}(1-p)^{n-b}\\
&= \frac{1}{b!} (np)^{b} \lim_{n \to \infty} \frac{(1-\frac{pn}{n})^{n}}{(1-\frac{pn}{n})^{b}}\\
&= \frac{1}{b!} (np)^{b}e^{-pn}
\end{align}
(c.f. $\lim_{n \to \infty}(1+\frac{x}{n})^{n} = e^{x}$)
This corresponds to $\mathrm{Poisson}((b+v)\frac{\theta_{b}}{\theta_{b}+\theta_{v}})$ evaluated at $b$, but does not appear to correspond to $X : b|\theta_{b} \sim \mathrm{Poisson}(\theta_{b})$.
(Saito's second proposal)
First, under the first model,
\begin{align}
p(b, v|\theta_b, \theta_v) &= p(b|\theta_b)P(v|\theta_v) \\
&= \frac{\theta_{b}^{b} e^{-\theta_{b}}}{b!}\frac{\theta_{v}^{v} e^{-\theta_{v}}}{v!}\\
&\propto \theta_{b}^{b} \theta_{v}^{v} e^{-\theta_{b}-\theta_{v}} &(3.10.7.1)
\end{align}
and under the second model,
\begin{align}
P(b, v|p) &\propto p^b (1-p)^v &(3.10.7.2)
\end{align}
Now, following 3.10.1, set $q = \theta_b + \theta_v$, so that
\begin{align}
\begin{cases}
\theta_b = pq \\
\theta_v = (1-p)q
\end{cases}
\end{align}
We then apply the change of variables $(\theta_b, \theta_v) \rightarrow (p, q)$ and integrate out $q$, aiming to derive (3.10.7.2) from (3.10.7.1). The Jacobian is
\begin{align}
|J| &=
\begin{vmatrix}
\frac{\partial \theta_b}{\partial p} & \frac{\partial \theta_b}{\partial q} \\
\frac{\partial \theta_v}{\partial p} & \frac{\partial \theta_v}{\partial q}
\end{vmatrix} \\
&=
\begin{vmatrix}
q & p\\
-q & 1-p
\end{vmatrix} \\
&= |q(1-p)-p(-q)| \\
&= q
\end{align}
Applying this change of variables to (3.10.7.1),
\begin{align}
p(b,v|\theta_b(p, q), \theta_v(p, q))
&\propto (pq)^b \{(1-p)q\}^v \exp(-q) \\
&= p^b (1-p)^v q^{b+v} \exp(-q) &(3.10.7.3)
\end{align}
Then, integrating $q$ out of (3.10.7.3),
\begin{align}
p(b, v|p) &= \int p(b, v|p, q)|J| dq \\
&\propto \int p^b (1-p)^v q^{b+v+1} \exp(-q) dq \\
&= p^b (1-p)^v \int q^{b+v+1} \exp(-q) dq \\
&\propto p^b(1-p)^v &(\because \int q^{b+v+1} \exp(-q) dq = \Gamma(b+v+2))
\end{align}
Thus (3.10.7.2) is obtained, and the claim follows.
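The identity underlying this exercise can also be verified numerically: the product of the two Poisson probabilities factors exactly into the binomial probability of $b$ given the total, times the Poisson probability of the total (the rates and counts below are arbitrary):

```r
# check: Poisson(b) * Poisson(v) = Binomial(b | b+v, p) * Poisson(b+v | theta_b + theta_v)
theta_b <- 3.2   # arbitrary bicycle rate
theta_v <- 17.5  # arbitrary other-vehicle rate
b <- 4; v <- 20  # arbitrary observed counts
p <- theta_b / (theta_b + theta_v)
lhs <- dpois(b, theta_b) * dpois(v, theta_v)
rhs <- dbinom(b, b + v, p) * dpois(b + v, theta_b + theta_v)
all.equal(lhs, rhs)  # TRUE
```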
## 3.10.8
### Question
Analysis of proportions: a survey was done of bicycle and other vehicular traffic in the neighborhood of the campus of the University of California, Berkeley, in the spring of 1993. Sixty city blocks were selected at random; each block was observed for one hour, and the numbers of bicycles and other vehicles traveling along that block were recorded. The sampling was stratified into six types of city blocks: busy, fairly busy, and residential streets, with and without bike routes, with ten blocks measured in each stratum. Table 3.3 displays the number of bicycles and other vehicles recorded in the study. For this problem, restrict your attention to the first four rows of the table: the data on residential streets.
#### (a)
Let $y_1, \ldots, y_{10}$ and $z_1, \ldots, z_8$ be the observed proportion of traffic that was on bicycles in the residential streets with bike lanes and with no bike lanes, respectively (so $y_1 = 16/(16 + 58)$ and $z_1 = 12/(12 + 113)$, for example). Set up a model so that the $y_{i}$'s are independent and identically distributed given parameters $\theta_y$ and the $z_i$'s are independent and identically distributed given parameters $\theta_z$.
### Solution
- One idea was to use a Poisson offset term, but since an offset represents exposure time and no further information about observation time is available here, we did not adopt it.
- Since $y_{i}$ and $z_{i}$ can be interpreted as the proportion of traffic on bicycles, another idea was to use a beta distribution. Writing the bicycle counts as $\theta_{y}^{1}, \theta_{z}^{1}$ and the counts of other vehicles as $\theta_{y}^{2}, \theta_{z}^{2}$, the beta model is
\begin{align}
y_{i}|\theta_{y}^{1}, \theta_{y}^{2} \sim Beta(\theta_{y}^{1}+1, \theta_{y}^{2}+1)\\
z_{i}|\theta_{z}^{1}, \theta_{z}^{2} \sim Beta(\theta_{z}^{1}+1, \theta_{z}^{2}+1)
\end{align}
This models the observed proportions directly.
According to Robert and Casella, *Introducing Monte Carlo Methods with R* (Springer, 2010), p. 72, for an observation $x \sim \mathrm{Beta}(\alpha, \beta)$ with density $\frac{x^{\alpha - 1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, the conjugate prior for the beta distribution is
\begin{align}
\biggl\{ \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \biggr\}^{\lambda}x_{0}^{\alpha}y_{0}^{\beta}
\end{align}
(where $\lambda, x_{0}, y_{0}$ are hyperparameters). The corresponding posterior is
\begin{align}
\biggl\{ \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} \biggr\}^{\lambda+1}[xx_{0}]^{\alpha}[(1-x)y_{0}]^{\beta}
\end{align}
However, this posterior apparently cannot be sampled from directly and requires a grid approximation. (Unclear beyond this point.)
(On grid approximation, Professor Yanai has useful material here: http://yukiyanai.github.io/jp/classes/rm1/contents/R/linear-regression-2.html)
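As a rough sketch of the grid approximation mentioned above, here the conjugate prior is replaced by a flat prior on $(\alpha, \beta)$ for simplicity, and the proportions are placeholders: only $y_1 = 16/(16+58)$ is quoted in the problem text, and the remaining values are hypothetical.

```r
# grid approximation for (alpha, beta) in y_i ~ Beta(alpha, beta), flat prior;
# y[1] is from the problem text, the other proportions are placeholders
y <- c(16 / (16 + 58), 0.10, 0.17, 0.19, 0.15)
alpha_grid <- seq(0.5, 20, length.out = 100)
beta_grid <- seq(0.5, 100, length.out = 100)
log_post <- function(a, b) sum(dbeta(y, a, b, log = TRUE))
lp <- outer(alpha_grid, beta_grid, Vectorize(log_post))
post <- exp(lp - max(lp))   # rescale so the maximum is 1
# posterior mode on the grid
idx <- which(post == max(post), arr.ind = TRUE)
c(alpha = alpha_grid[idx[1]], beta = beta_grid[idx[2]])
```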
## 3.10.9
### Question
Conjugate normal model: suppose $y$ is an independent and identically distributed sample of size $n$ from the distribution $N(\mu, \sigma^2)$, where the prior distribution is $\sigma^2 \sim \mathrm{Inv-}\chi^2(\nu_0, \sigma^2_0)$ and $\mu|\sigma^2 \sim N(\mu_0, \sigma^2/\kappa_0)$. Derive the posterior distribution $p(\mu, \sigma^2|y)$ explicitly in terms of the prior parameters and the sufficient statistics of the data.
### Solution
\begin{align}
p(\mu, \sigma^2|y) \propto& \ \sigma^{-1} (\sigma^2)^{-(\nu_0/2 + 1)} \exp\Big(-\frac{1}{2\sigma^2}[\nu_0 \sigma^2_0 + \kappa_0 (\mu - \mu_0)^2]\Big) \times \\
& \ (\sigma^2)^{-n/2} \exp\Big(-\frac{1}{2\sigma^2}[(n - 1)s^2 + n(\bar{y} - \mu)^2]\Big) \\
=& \ \sigma^{-1} \exp\Big(-\frac{1}{2\sigma^2}[(\kappa_0 + n)\mu^2 - 2(\mu_0\kappa_0 + n\bar{y})\mu]\Big) \times \\
& \ (\sigma^2)^{-(\frac{\nu_0 + n}{2} + 1)}\exp\Big(-\frac{1}{2\sigma^2}[\nu_0 \sigma^2_0 + (n - 1)s^2 + \mu^2_0\kappa_0 + n\bar{y}^2]\Big) \\
=& \ (\sigma^2)^{-1/2} \exp\Big(-\frac{\kappa_0 + n}{2\sigma^2}\big(\mu - \frac{\mu_0\kappa_0 + n\bar{y}}{\kappa_0 + n}\big)^2\Big) \times \\
& \ (\sigma^2)^{-(\frac{\nu_0 + n}{2} + 1)} \exp\Big(-\frac{1}{2\sigma^2}[\nu_0 \sigma^2_0 + (n - 1)s^2 + \mu^2_0\kappa_0 + n\bar{y}^2 - \frac{(\mu_0\kappa_0 + n\bar{y})^2}{\kappa_0 + n}]\Big) \\
=& \ (\sigma^2)^{-1/2} \exp\Big(-\frac{\kappa_0 + n}{2\sigma^2}\big(\mu - \frac{\mu_0\kappa_0 + n\bar{y}}{\kappa_0 + n}\big)^2\Big) \times \\
& \ (\sigma^2)^{-(\frac{\nu_0 + n}{2} + 1)} \exp\Big(-\frac{1}{2\sigma^2}[\nu_0 \sigma^2_0 + (n - 1)s^2 + \frac{n\mu^2_0\kappa_0 - 2n\bar{y}\mu_0\kappa_0 + n\bar{y}^2\kappa_0}{\kappa_0 + n}]\Big) \\
=& \ (\sigma^2)^{-1/2} \exp\Big(-\frac{\kappa_0 + n}{2\sigma^2}\big(\mu - \frac{\mu_0\kappa_0 + n\bar{y}}{\kappa_0 + n}\big)^2\Big) \times \\
& \ (\sigma^2)^{-(\frac{\nu_0 + n}{2} + 1)} \exp\Big(-\frac{1}{2\sigma^2}[\nu_0 \sigma^2_0 + (n - 1)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}(\bar{y} - \mu_0)^2]\Big)
\end{align}
Therefore,
\begin{align}
\mu|\sigma^2, y &\sim N\big(\frac{\mu_0\kappa_0 + n\bar{y}}{\kappa_0 + n}, \frac{\sigma^2}{\kappa_0 + n}\big), \\
\sigma^2|y &\sim \mathrm{Inv-}\chi^2\big(\nu_0 + n, \frac{\nu_0 \sigma^2_0 + (n - 1)s^2 + \frac{\kappa_0 n (\bar{y} - \mu_0)^2}{\kappa_0 + n}}{\nu_0 + n}\big)
\end{align}
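These update formulas can be wrapped in a small helper for reuse (a sketch; the function name and the test data below are ours, not from the book):

```r
# posterior hyperparameters for the normal / scaled-inverse-chi^2 conjugate model
conj_update <- function(y, mu0, kappa0, nu0, sigma0_sq) {
  n <- length(y); ybar <- mean(y); s2 <- var(y)
  kappa_n <- kappa0 + n
  mu_n <- (kappa0 * mu0 + n * ybar) / kappa_n
  nu_n <- nu0 + n
  sigma_n_sq <- (nu0 * sigma0_sq + (n - 1) * s2 +
                 kappa0 * n / kappa_n * (ybar - mu0)^2) / nu_n
  list(mu_n = mu_n, kappa_n = kappa_n, nu_n = nu_n, sigma_n_sq = sigma_n_sq)
}
conj_update(y = c(10, 10, 12, 11, 9), mu0 = 0, kappa0 = 1, nu0 = 1, sigma0_sq = 1)
```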
## 3.10.10
### Question
Comparison of normal variances: for j = 1, 2, suppose that
\begin{align}
y_{j1},...,y_{jn_j}|\mu_j, \sigma_j^2 \sim iid N(\mu_j, \sigma_j^2)
\end{align}
and $(\mu_1, \sigma_1^2)$ are independent of $(\mu_2, \sigma_2^2)$ in the prior distribution. Show that the posterior distribution of $(s_1^2/s_2^2)/(\sigma_1^2/\sigma_2^2)$ is $F$ with $(n_1−1)$ and $(n_2−1)$ degrees of freedom. (Hint: to show the required form of the posterior density, you do not need to carry along all the normalizing constants.)
### Solution
It suffices to show that $(n_1-1)s_1^2/\sigma_1^2$ and $(n_2-1)s_2^2/\sigma_2^2$ have independent posterior $\chi^2$ distributions with $(n_1-1)$ and $(n_2-1)$ degrees of freedom, respectively.
From equation (3.4), for each $j$ (writing $n$ and $s^2$ for $n_j$ and $s_j^2$),
\begin{align}
p(\sigma_j^2|y) \propto& (\sigma_j^2)^{-n/2-1/2}exp(-(n-1)s^2/2\sigma_j^2)
\end{align}
Therefore,
\begin{align}
p(1/\sigma_j^2|y) &\propto (\sigma_j^2)^2(1/\sigma_j^2)^{n/2+1/2}exp(-(n-1)s^2/2\sigma_j^2) \\
&\propto ((n-1)s^2/\sigma_j^2)^{n/2-3/2}exp(-(n-1)s^2/2\sigma_j^2) \\
&= ((n-1)s^2/\sigma_j^2)^{\frac{n-1}{2}-1}exp(-(n-1)s^2/2\sigma_j^2) \\
\end{align}
This shows that $(n-1)s^2/\sigma_j^2$ follows a $\chi^2$ distribution with $n-1$ degrees of freedom.
By the prior independence of $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$, the quantities $(n_1-1)s_1^2/\sigma_1^2$ and $(n_2-1)s_2^2/\sigma_2^2$ independently follow $\chi^2$ distributions with $n_1-1$ and $n_2-1$ degrees of freedom, respectively.
Hence $(s_1^2/s_2^2)/(\sigma_1^2/\sigma_2^2)$ follows the $F$ distribution with $(n_1-1, n_2-1)$ degrees of freedom.
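A quick simulation check of this result (the sample sizes and $s_j^2$ values below are arbitrary): drawing each $\sigma_j^2$ from its scaled inverse-$\chi^2$ posterior, the quantiles of the ratio statistic should match the $F(n_1-1, n_2-1)$ quantiles.

```r
# simulate posterior draws of sigma_j^2 and compare the ratio with F quantiles
set.seed(1)
n1 <- 11; n2 <- 16            # arbitrary sample sizes
s1_sq <- 2.0; s2_sq <- 3.5    # arbitrary sample variances
n_draw <- 1e5
sigma1_sq <- (n1 - 1) * s1_sq / rchisq(n_draw, n1 - 1)
sigma2_sq <- (n2 - 1) * s2_sq / rchisq(n_draw, n2 - 1)
ratio <- (s1_sq / s2_sq) / (sigma1_sq / sigma2_sq)
round(cbind(simulated = quantile(ratio, c(0.25, 0.5, 0.75)),
            theoretical = qf(c(0.25, 0.5, 0.75), n1 - 1, n2 - 1)), 3)
```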
## 3.10.11
### Question
Computation: in the bioassay example, replace the uniform prior density by a joint normal prior distribution on $(\alpha, \beta)$, with $\alpha \sim \mathrm{N}(0, 2^2)$, $\beta \sim \mathrm{N}(10, 10^2)$, and $\mathrm{corr}(\alpha, \beta) = 0.5$.
### Solution
#### a
Repeat all the computations and plots of Section 3.7 with this new prior distribution.
First, the model used here is as follows. (Note that the question specifies $\mathrm{corr}(\alpha, \beta) = 0.5$, so the prior is a joint normal, not two independent normals.)
\begin{align}
y_{i}\mid \theta_{i} &\sim \mathrm{Bin}(n_{i},\theta_{i}) \\
\mathrm{logit}(\theta_{i}) &= \alpha + \beta x_{i} \\
\alpha &\sim \mathrm{N}(0, 2^{2}) \\
\beta &\sim \mathrm{N}(10, 10^{2}) \\
\mathrm{corr}(\alpha, \beta) &= 0.5
\end{align}
As in Section 3.7, we use a grid approach.
This write-up follows [a demo page by one of the authors](https://avehtari.github.io/BDA_R_demos/demos_ch3/demo3_6.html).
First, we construct the posterior.
```r=
library(ggplot2)
theme_set(theme_minimal())
library(gridExtra)
library(tidyr)
library(dplyr)
library(purrr)
# the bioassay data
df1 <- data.frame(
x = c(-0.86, -0.30, -0.05, 0.73),
n = c(5, 5, 5, 5),
y = c(0, 1, 3, 5)
)
# construct the grid
A = seq(-4, 8, length.out = 50) #alpha
B = seq(-10, 40, length.out = 50) #beta
# expand the grid into all (alpha, beta) combinations
cA <- rep(A, each = length(B))
cB <- rep(B, length(A))
# binomial log likelihood for one dose group (up to a constant)
logl <- function(df, a, b)
df['y']*(a + b*df['x']) - df['n']*log1p(exp(a + b*df['x']))
# evaluate the log likelihood over the grid
ll <- apply(df1, 1, logl, cA, cB) %>%
  # sum the log likelihoods of the four dose groups
  # (exponentiation happens when forming the posterior below)
  rowSums()
# posterior = likelihood x joint normal prior with corr(alpha, beta) = 0.5,
# evaluated via the bivariate normal log density (up to a constant)
za <- (cA - 0) / 2
zb <- (cB - 10) / 10
lp_prior <- -(za^2 - 2 * 0.5 * za * zb + zb^2) / (2 * (1 - 0.5^2))
p <- exp(ll + lp_prior)
```
Next, we sample from the posterior just computed.
```r=
nsamp <- 1000 # number of draws
# sample grid cells with probability proportional to the posterior
samp_indices <- sample(length(p), size = nsamp,
replace = T, prob = p/sum(p))
samp_A <- cA[samp_indices[1:nsamp]]
samp_B <- cB[samp_indices[1:nsamp]]
# jitter within grid cells (p. 76)
samp_A <- samp_A + runif(nsamp, (A[1] - A[2])/2, (A[2] - A[1])/2)
samp_B <- samp_B + runif(nsamp, (B[1] - B[2])/2, (B[2] - B[1])/2)
# assemble the draws (data_frame() is deprecated, so use data.frame())
samps <- data.frame(ind = 1:nsamp, alpha = samp_A, beta = samp_B) %>%
  mutate(ld50 = - alpha/beta) # by the definition of LD50
```
Finally, we draw the contour plot, the scatterplot of the draws, and the histogram of LD50.
```r=
# plot ranges
xl <- c(-4, 8)
yl <- c(-10, 40)
# contour plot of the grid posterior
pos_1 <- ggplot(data = data.frame(cA ,cB, p), aes(cA, cB)) +
geom_raster(aes(fill = p, alpha = p), interpolate = T) +
geom_contour(aes(z = p), colour = 'black', size = 0.2) +
coord_cartesian(xlim = xl, ylim = yl) +
labs(title = 'Posterior density evaluated in grid', x = 'alpha', y = 'beta') +
scale_fill_gradient(low = 'yellow', high = 'red', guide = F) +
scale_alpha(range = c(0, 1), guide = F)
# scatterplot of the posterior draws
sam_1 <- ggplot(data = samps) +
geom_point(aes(alpha, beta), color = 'blue') +
coord_cartesian(xlim = xl, ylim = yl) +
labs(title = 'Posterior draws', x = 'alpha', y = 'beta')
grid.arrange(pos_1, sam_1, nrow=2)
# histogram of LD50
his <- ggplot(data = samps) +
geom_histogram(aes(ld50), binwidth = 0.02,
fill = 'steelblue', color = 'black') +
coord_cartesian(xlim = c(-0.5, 0.5)) +
geom_vline(xintercept = 0) +
labs(x = 'LD50 = -alpha/beta')
plot(his)
```


#### b
#### c
Comparing the plots under the two priors, the posterior means are similar, but the posterior variance is visibly smaller than under the noninformative prior.


## 3.10.12
Poisson regression model: expand the model of Exercise 2.13(a) by assuming that the number of fatal accidents in year t follows a Poisson distribution with mean α + βt. You will estimate α and β, following the example of the analysis in Section 3.7.
### a
Discuss various choices for a ‘noninformative’ prior for (α, β). Choose one.
### b
### c
### d
### e
### f
### g
### h
## 3.10.13
### Question
Multivariate normal model: derive equations (3.12) by completing the square in vector-matrix notation.
### Solution
\begin{align}
& \ (\mathbf{\mu} - \mathbf{\mu}_0)^T \mathbf{\Lambda}^{-1}_0 (\mathbf{\mu} - \mathbf{\mu}_0) + \sum^n_{i = 1}(\mathbf{y}_i - \mathbf{\mu})^T \mathbf{\Sigma}^{-1} (\mathbf{y}_i - \mathbf{\mu}) \\
=& \ \mathbf{\mu}^T \mathbf{\Lambda}^{-1}_0 \mathbf{\mu} - 2\mathbf{\mu}^T \mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 + \mathbf{\mu}^T_0 \mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 + \sum^n_{i = 1} \mathbf{y}^T_i \mathbf{\Sigma}^{-1} \mathbf{y}_i - 2n\mathbf{\mu}^T \mathbf{\Sigma}^{-1} \bar{\mathbf{y}} + n\mathbf{\mu}^T \mathbf{\Sigma}^{-1} \mathbf{\mu} \\
\propto& \ \mathbf{\mu}^T \mathbf{\Lambda}^{-1}_0 \mathbf{\mu} - 2\mathbf{\mu}^T \mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 - 2n\mathbf{\mu}^T \mathbf{\Sigma}^{-1} \bar{\mathbf{y}} + n\mathbf{\mu}^T \mathbf{\Sigma}^{-1} \mathbf{\mu} \\
=& \ \mathbf{\mu}^T (\mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1}) \mathbf{\mu} - 2\mathbf{\mu}^T(\mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 + n\mathbf{\Sigma}^{-1} \bar{\mathbf{y}}) \\
=& \ \mathbf{\mu}^T (\mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1}) \mathbf{\mu} - 2\mathbf{\mu}^T(\mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1}) (\mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1})^{-1} (\mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 + n\mathbf{\Sigma}^{-1} \bar{\mathbf{y}})
\end{align}
Here, setting
\begin{align}
\mathbf{\mu}_n &= (\mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1})^{-1} (\mathbf{\Lambda}^{-1}_0 \mathbf{\mu}_0 + n\mathbf{\Sigma}^{-1} \bar{\mathbf{y}}) \\
\mathbf{\Lambda}^{-1}_n &= \mathbf{\Lambda}^{-1}_0 + n\mathbf{\Sigma}^{-1}
\end{align}
the expression above becomes
\begin{align}
& \ \mathbf{\mu}^T \mathbf{\Lambda}^{-1}_n \mathbf{\mu} - 2 \mathbf{\mu}^T \mathbf{\Lambda}^{-1}_n \mathbf{\mu}_n \\
\propto& \ (\mathbf{\mu} - \mathbf{\mu}_n)^T \mathbf{\Lambda}^{-1}_n (\mathbf{\mu} - \mathbf{\mu}_n)
\end{align}
which completes the square and yields equations (3.12).
## 3.10.14
### Question
Improper prior and proper posterior distributions: prove that the posterior density (3.15) for the bioassay example has a finite integral over the range $(\alpha, \beta) \in (−\infty,\infty) \times(−\infty,\infty)$.
### Proof
Equation (3.15) is
\begin{align}
p(\alpha, \beta|y, n, x) &\propto p(\alpha, \beta|n, x)p(y|\alpha, \beta, n, x) \\
&\propto p(\alpha, \beta)\prod_{i=1}^{k}p(y_{i}|\alpha, \beta, n_{i}, x_{i})
\end{align}
Here we assume $p(\alpha, \beta)\propto 1$, and since
\begin{align}
p(y_{i}|\alpha, \beta, n_{i}, x_{i})\propto [logit^{-1}(\alpha+\beta x_{i})]^{y_{i}}[1-logit^{-1}(\alpha+\beta x_{i})]^{n_{i}-y_{i}}
\end{align}
it follows that
\begin{align}
p(\alpha, \beta|y, n, x) &\propto p(\alpha, \beta|n, x)p(y|\alpha, \beta, n, x) \\
&\propto p(\alpha, \beta)\prod_{i=1}^{k}p(y_{i}|\alpha, \beta, n_{i}, x_{i}) \\
&\propto \prod_{i=1}^{k}p(y_{i}|\alpha, \beta, n_{i}, x_{i})\\
&\propto \prod_{i=1}^{k}[logit^{-1}(\alpha+\beta x_{i})]^{y_{i}}[1-logit^{-1}(\alpha+\beta x_{i})]^{n_{i}-y_{i}}
\end{align}
(Since $x_{i}$ varies with $i$, the product cannot be collapsed into a single pair of powers.)
It therefore suffices to show that
\begin{align}
\iint_{D} p(\alpha, \beta|y, n, x)\,d\alpha\, d\beta = \iint_{D}\prod_{i=1}^{k}[logit^{-1}(\alpha+\beta x_{i})]^{y_{i}}[1-logit^{-1}(\alpha+\beta x_{i})]^{n_{i}-y_{i}}\,d\alpha\, d\beta
\end{align}
is finite, where $D = (-\infty,\infty) \times (-\infty,\infty)$.
(Stuck beyond this point.)
(One possible direction: every factor is bounded by 1, and the bioassay data contain dose levels with $0 < y_i < n_i$ at two distinct $x_i$, so along every direction of the $(\alpha, \beta)$ plane at least one factor decays exponentially, which should bound the improper integral.)
## 3.10.15
### Question
Joint distributions: The autoregressive time-series model y1, y2, . . . with mean level 0, autocorrelation 0.8, residual standard deviation 1, and normal errors can be written as $(y_t|y_{t−1}, y_{t−2}, . . .) \sim N(0.8y_{t−1}, 1)$ for all $t$. (a) Prove that the distribution of $y_t$, given the observations at all other integer time points $t$, depends only on $y_{t−1}$ and $y_{t+1}$. (b) What is the distribution of $y_t$ given $y_{t−1}$ and $y_{t+1}$?
### Solution
#### a
By assumption,
\begin{align}
p(y_t|y_{t-1}, y_{t-2}...) &= \frac{1}{\sqrt{2\pi}}\exp(-\frac{1}{2}(y_t - 0.8 y_{t-1})^2 ) &(15.1)\\
&= p(y_t|y_{t-1})
\end{align}
That is, $p(y_t|y_{t-1}, y_{t-2}, \ldots)$ is a function of $y_t$ and $y_{t-1}$ only.
Now write $y_{-t}$ for the set of all observations except $y_{t}$; the task is to find $p(y_t|y_{-t})$. Keeping only the factors that involve $y_t$,
\begin{align}
p(y_t|y_{-t}) &= \frac{p(y_t, y_{-t})}{p(y_{-t})} \\
&\propto p(y_t, y_{-t}) \\
&= p(y_1)p(y_2|y_1)p(y_3|y_2, y_1)p(y_4|y_3, y_2, y_1)...\\
&= p(y_1)p(y_2|y_1)p(y_3|y_2)p(y_4|y_3)...&(\because (15.1))\\
&= p(y_1) \prod_{s=2}^T p(y_s|y_{s-1}) \\
&\propto p(y_t|y_{t-1})p(y_{t+1}|y_t) &(15.2)
\end{align}
Hence $p(y_t|y_{-t})$ depends only on $y_{t-1}$ and $y_{t+1}$.
#### b
Take the logarithm of (15.2), substitute (15.1), and complete the square in $y_t$ to obtain a normal distribution.
\begin{align}
\log p(y_t|y_{-t}) &= const. + \log p(y_t|y_{t-1})p(y_{t+1}|y_t) \\
&= const. -\frac{1}{2}\{(y_t - 0.8 y_{t-1})^2 + (y_{t+1} - 0.8 y_{t})^2\} \\
&= const. - \frac{1}{2}(1.64 y_t^2 - 1.6(y_{t-1}+y_{t+1})y_t) \\
&= const. - \frac{1}{2}\cdot 1.64 \Big(y_t - \frac{0.8}{1.64}(y_{t-1}+y_{t+1})\Big)^2 &(15.3)
\end{align}
Exponentiating (15.3),
\begin{align}
p(y_t|y_{-t}) &\propto \exp(- \frac{1}{2}\cdot 1.64 (y_t - \frac{0.8}{1.64}(y_{t-1}+y_{t+1}))^2) \\
&\propto N(\frac{0.8}{1.64}(y_{t-1}+y_{t+1}), \frac{1}{1.64} )
\end{align}
Hence $y_t|y_{-t} \sim \mathrm{N}(\frac{0.8}{1.64}(y_{t-1}+y_{t+1}), \frac{1}{1.64})$.
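This conditional distribution can be checked by simulation: regressing $y_t$ on its two neighbors in a long simulated AR(1) series should recover coefficients near $0.8/1.64 \approx 0.488$ and a residual variance near $1/1.64 \approx 0.61$ (the series length below is arbitrary).

```r
# simulation check of the conditional distribution of y_t given its neighbors
set.seed(1)
T_len <- 1e5
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = T_len))  # sd(innovations) = 1
yt  <- y[2:(T_len - 1)]
ym1 <- y[1:(T_len - 2)]
yp1 <- y[3:T_len]
fit <- lm(yt ~ ym1 + yp1)
coef(fit)[c("ym1", "yp1")]  # both should be near 0.8 / 1.64
var(resid(fit))             # should be near 1 / 1.64
```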