<style>
.reveal {
font-size: 30px;
}
</style>
### Prior networks and ensemble distribution distillation
---
# :thinking_face:
- Model uncertainty
- Data uncertainty
- Distribution shift
<!-- -
%
%
%#### In practice, training could be long, fast inference is critical
-->
---
$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right)=\int \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}$
$\mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) \approx \mathrm{q}(\boldsymbol{\theta})$
$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right) \approx \frac{1}{M} \sum_{i=1}^{M} \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}^{(i)}\right), \boldsymbol{\theta}^{(i)} \sim \mathrm{q}(\boldsymbol{\theta})$
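A minimal sketch of the Monte Carlo average above, assuming each of the $M$ sampled models has already produced a vector of class probabilities for $\boldsymbol{x}^{*}$. The function name and the numbers are hypothetical and only illustrate the sum.

```python
def ensemble_predictive(member_probs):
    """Approximate P(w_c | x*, D) by (1/M) * sum_i P(w_c | x*, theta_i)."""
    num_members = len(member_probs)
    num_classes = len(member_probs[0])
    return [
        sum(member[c] for member in member_probs) / num_members
        for c in range(num_classes)
    ]


# Hypothetical outputs of M = 3 models theta_i ~ q(theta), K = 3 classes.
member_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
predictive = ensemble_predictive(member_probs)
print(predictive)
```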
---
$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right)=\iint \underbrace{\mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right)}_{\text {Data }} \underbrace{\mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right)}_{\text {Distributional }} \underbrace{\mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D})}_{\text {Model }} d \boldsymbol{\mu} d \boldsymbol{\theta}$
$\int\left[\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) d \boldsymbol{\mu}\right] \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}=\int \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}$
$\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right)\left[\int \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}\right] d \boldsymbol{\mu}=\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \mathcal{D}\right) d \boldsymbol{\mu}$
---
$\operatorname{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha})=\frac{\Gamma\left(\alpha_{0}\right)}{\prod_{c=1}^{K} \Gamma\left(\alpha_{c}\right)} \prod_{c=1}^{K} \mu_{c}^{\alpha_{c}-1}, \quad \alpha_{c}>0, \alpha_{0}=\sum_{c=1}^{K} \alpha_{c}$
$\mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\operatorname{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}), \quad \boldsymbol{\alpha}=\boldsymbol{f}\left(\boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)$
$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right) d \boldsymbol{\mu}=\frac{\alpha_{c}}{\alpha_{0}}$
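Apply an exponential to the network outputs, $\alpha_{c}=e^{z_{c}\left(\boldsymbol{x}^{*}\right)}$, so the expected posterior is the usual softmax: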
$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\frac{e^{z_{c}\left(\boldsymbol{x}^{*}\right)}}{\sum_{k=1}^{K} e^{z_{k}\left(\boldsymbol{x}^{*}\right)}}$
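A small numerical check of this identity, as a sketch: exponentiating the logits gives the concentrations $\alpha_c$, and the Dirichlet mean $\alpha_c / \alpha_0$ should match the softmax of the same logits. The helper names and the logit values are assumptions made for illustration.

```python
import math


def dirichlet_from_logits(logits):
    """Map logits z_c to alpha_c = exp(z_c); return (alphas, alpha_0, mean)."""
    alphas = [math.exp(z) for z in logits]
    alpha0 = sum(alphas)
    expected = [a / alpha0 for a in alphas]  # Dirichlet mean alpha_c / alpha_0
    return alphas, alpha0, expected


def softmax(logits):
    """Numerically stable softmax, used here only as a reference."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]  # hypothetical network outputs z(x*)
alphas, alpha0, expected = dirichlet_from_logits(logits)
print(expected)
print(softmax(logits))
```

The mean hides $\alpha_0$: scaling all $\alpha_c$ by the same factor leaves the softmax unchanged but changes how peaked the Dirichlet is.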
---
## Goldfish questions
#### Is AUROC valid here?
#### Is a point estimate enough?
---
## Ensemble distillation
$\mathcal{L}\left(\phi, \mathcal{D}_{\text {ens }}\right)=\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{x})}\left[\operatorname{KL}\left[\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{\theta} \mid \mathcal{D})}[\mathrm{P}(y \mid \boldsymbol{x} ; \boldsymbol{\theta})] \| \mathrm{P}(y \mid \boldsymbol{x} ; \boldsymbol{\phi})\right]\right]$
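A rough sketch of this loss on a finite batch, assuming the ensemble and student probabilities are already computed. The expectation over $\boldsymbol{\theta}$ becomes a mean over members and the expectation over $\boldsymbol{x}$ a mean over inputs. Function names and data are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL[p || q] for two discrete distributions over the same classes."""
    return sum(pc * math.log(pc / qc) for pc, qc in zip(p, q) if pc > 0.0)


def end_loss(ensemble_probs, student_probs):
    """Mean over inputs of KL[ensemble mean || student prediction]."""
    total = 0.0
    for members, student in zip(ensemble_probs, student_probs):
        num_members = len(members)
        mean = [
            sum(m[c] for m in members) / num_members
            for c in range(len(student))
        ]
        total += kl_divergence(mean, student)
    return total / len(student_probs)


# One input, M = 2 ensemble members, K = 3 classes (made-up numbers).
ensemble_probs = [[[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]]
loss_matched = end_loss(ensemble_probs, [[0.6, 0.3, 0.1]])
loss_uniform = end_loss(ensemble_probs, [[1 / 3, 1 / 3, 1 / 3]])
print(loss_matched, loss_uniform)
```

Only the ensemble mean enters the loss, so the spread between members is lost.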
---
## Ensemble distribution distillation
$\mathcal{D}_{\text {ens }}=\left\{\boldsymbol{x}^{(i)}, \boldsymbol{\pi}^{(i, 1: M)}\right\}_{i=1}^{N} \sim \hat{\mathrm{p}}(\boldsymbol{x}, \boldsymbol{\pi})$
$\begin{aligned}
\mathcal{L}\left(\phi, \mathcal{D}_{\text {ens }}\right) &=-\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{x})}\left[\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{\pi} \mid \boldsymbol{x})}[\ln \mathrm{p}(\boldsymbol{\pi} \mid \boldsymbol{x} ; \boldsymbol{\phi})]\right] \\
&=-\frac{1}{N} \sum_{i=1}^{N}\left[\ln \Gamma\left(\hat{\alpha}_{0}^{(i)}\right)-\sum_{c=1}^{K} \ln \Gamma\left(\hat{\alpha}_{c}^{(i)}\right)+\frac{1}{M} \sum_{m=1}^{M} \sum_{c=1}^{K}\left(\hat{\alpha}_{c}^{(i)}-1\right) \ln \pi_{c}^{(i m)}\right]
\end{aligned}$
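The second line of the loss can be evaluated directly, as in the sketch below. It assumes the student's concentrations $\hat{\alpha}^{(i)}$ and the ensemble outputs $\pi^{(i, 1:M)}$ are given as plain lists; the function names and numbers are illustrative, not taken from any reference implementation.

```python
import math


def dirichlet_nll(alphas, member_probs):
    """Negative mean log-density of the ensemble outputs under Dir(alphas)."""
    alpha0 = sum(alphas)
    # ln Gamma(alpha_0) - sum_c ln Gamma(alpha_c)
    log_norm = math.lgamma(alpha0) - sum(math.lgamma(a) for a in alphas)
    # (1/M) sum_m sum_c (alpha_c - 1) ln pi_c^(m)
    mean_log_kernel = sum(
        sum((a - 1.0) * math.log(p) for a, p in zip(alphas, probs))
        for probs in member_probs
    ) / len(member_probs)
    return -(log_norm + mean_log_kernel)


def end2_loss(student_alphas, ensemble_probs):
    """Average the per-input Dirichlet NLL over the N inputs."""
    per_input = [
        dirichlet_nll(alphas, members)
        for alphas, members in zip(student_alphas, ensemble_probs)
    ]
    return sum(per_input) / len(per_input)


# One input, M = 2 ensemble members, K = 3 classes (made-up numbers).
ensemble_probs = [[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]]
loss_good = end2_loss([[7.0, 2.0, 1.0]], ensemble_probs)
loss_bad = end2_loss([[1.0, 2.0, 7.0]], ensemble_probs)
print(loss_good, loss_bad)
```

Unlike the loss on the ensemble mean, this one sees every member's output, so disagreement between members shapes the fitted Dirichlet.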
---

{"metaMigratedAt":"2023-06-15T19:30:28.327Z","metaMigratedFrom":"YAML","title":"Prior networks and ensemble distribution distillation","breaks":true,"description":"Seminar presentation","slideOptions":"{\"theme\":\"white\"}","contributors":"[{\"id\":\"173eb66f-920d-45e9-8ef5-4b365abfa9d8\",\"add\":4763,\"del\":2444}]"}