<style> .reveal { font-size: 30px; } </style>

### Prior networks and ensemble distribution distillation

---

![](https://i.imgur.com/npXhAE0.png)

---

# :thinking_face:

- Model uncertainty
- Data uncertainty
- Distribution shift

<!-- In practice, training could be long; fast inference is critical -->

---

$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right)=\int \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}$

$\mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) \approx \mathrm{q}(\boldsymbol{\theta})$

$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right) \approx \frac{1}{M} \sum_{i=1}^{M} \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}^{(i)}\right), \quad \boldsymbol{\theta}^{(i)} \sim \mathrm{q}(\boldsymbol{\theta})$

---

![](https://i.imgur.com/0vgNuLB.png)

![](https://i.imgur.com/5JhBWsG.png)

---

$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \mathcal{D}\right)=\iint \underbrace{\mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right)}_{\text {Data }} \underbrace{\mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right)}_{\text {Distributional }} \underbrace{\mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D})}_{\text {Model }} d \boldsymbol{\mu} d \boldsymbol{\theta}$

$\int\left[\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) d \boldsymbol{\mu}\right] \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}=\int \mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}$

$\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right)\left[\int \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \boldsymbol{\theta}\right) \mathrm{p}(\boldsymbol{\theta} \mid \mathcal{D}) d \boldsymbol{\theta}\right] d \boldsymbol{\mu}=\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*}, \mathcal{D}\right) d \boldsymbol{\mu}$

---

$\operatorname{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha})=\frac{\Gamma\left(\alpha_{0}\right)}{\prod_{c=1}^{K} \Gamma\left(\alpha_{c}\right)} \prod_{c=1}^{K} \mu_{c}^{\alpha_{c}-1}, \quad \alpha_{c}>0, \quad \alpha_{0}=\sum_{c=1}^{K} \alpha_{c}$

$\mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\operatorname{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}), \quad \boldsymbol{\alpha}=\boldsymbol{f}\left(\boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)$

$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\int \mathrm{p}\left(\omega_{c} \mid \boldsymbol{\mu}\right) \mathrm{p}\left(\boldsymbol{\mu} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right) d \boldsymbol{\mu}=\frac{\alpha_{c}}{\alpha_{0}}$

Applying the exponential to the network outputs, $\alpha_{c}=e^{z_{c}\left(\boldsymbol{x}^{*}\right)}$, the expected categorical reduces to the familiar softmax:

$\mathrm{P}\left(\omega_{c} \mid \boldsymbol{x}^{*} ; \hat{\boldsymbol{\theta}}\right)=\frac{e^{z_{c}\left(\boldsymbol{x}^{*}\right)}}{\sum_{k=1}^{K} e^{z_{k}\left(\boldsymbol{x}^{*}\right)}}$

---

![](https://i.imgur.com/3jZCi8y.png)

---

## goldfish questions

#### Is AUROC valid here?

#### Is a point estimate enough?
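---

A minimal NumPy sketch of the two predictives above (function names are illustrative, not from the papers): the Monte Carlo ensemble averages member softmax outputs, while a Prior Network parameterises a Dirichlet with $\alpha_{c}=e^{z_{c}}$, so its expected categorical is $\alpha_{c}/\alpha_{0}$, i.e. the softmax.

```python
import numpy as np

def ensemble_predictive(member_probs):
    """P(omega_c | x*, D) ~= (1/M) sum_i P(omega_c | x*, theta^(i)).

    member_probs: (M, K) softmax outputs of M sampled/ensembled models.
    """
    return member_probs.mean(axis=0)

def prior_network_predictive(logits):
    """Prior Network predictive: alpha_c = exp(z_c), P(omega_c | x*) = alpha_c / alpha_0."""
    alphas = np.exp(logits)           # Dirichlet concentration parameters, alpha_c > 0
    return alphas / alphas.sum()      # equals softmax(logits)
```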
---

## Ensemble distillation

$\mathcal{L}\left(\phi, \mathcal{D}_{\text {ens }}\right)=\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{x})}\left[\operatorname{KL}\left[\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{\theta} \mid \mathcal{D})}[\mathrm{P}(y \mid \boldsymbol{x} ; \boldsymbol{\theta})] \| \mathrm{P}(y \mid \boldsymbol{x} ; \boldsymbol{\phi})\right]\right]$

---

## Ensemble distribution distillation

$\mathcal{D}_{\text {ens }}=\left\{\boldsymbol{x}^{(i)}, \boldsymbol{\pi}^{(i, 1: M)}\right\}_{i=1}^{N} \sim \hat{\mathrm{p}}(\boldsymbol{x}, \boldsymbol{\pi})$

$\begin{aligned} \mathcal{L}\left(\phi, \mathcal{D}_{\text {ens }}\right) &=-\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{x})}\left[\mathbb{E}_{\hat{\mathrm{p}}(\boldsymbol{\pi} \mid \boldsymbol{x})}[\ln \mathrm{p}(\boldsymbol{\pi} \mid \boldsymbol{x} ; \boldsymbol{\phi})]\right] \\ &=-\frac{1}{N} \sum_{i=1}^{N}\left[\ln \Gamma\left(\hat{\alpha}_{0}^{(i)}\right)-\sum_{c=1}^{K} \ln \Gamma\left(\hat{\alpha}_{c}^{(i)}\right)+\frac{1}{M} \sum_{m=1}^{M} \sum_{c=1}^{K}\left(\hat{\alpha}_{c}^{(i)}-1\right) \ln \pi_{c}^{(i m)}\right] \end{aligned}$

---

![](https://i.imgur.com/s6NcPGp.png)
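---

A sketch of the Ensemble Distribution Distillation loss above (NumPy/SciPy; names are illustrative, assuming the student's $\hat{\boldsymbol{\alpha}} = e^{\boldsymbol{z}}$): the student Dirichlet is fit by maximising the log-likelihood of the ensemble members' categoricals $\boldsymbol{\pi}^{(i, 1:M)}$.

```python
import numpy as np
from scipy.special import gammaln

def endd_nll(student_logits, ensemble_probs, eps=1e-8):
    """Negative log-likelihood of ensemble categoricals under the student's Dirichlet.

    student_logits: (N, K) student outputs; alpha = exp(logits).
    ensemble_probs: (N, M, K) softmax outputs of the M ensemble members.
    """
    alphas = np.exp(student_logits)                                      # (N, K), alpha_c > 0
    alpha0 = alphas.sum(axis=-1)                                         # (N,)
    log_norm = gammaln(alpha0) - gammaln(alphas).sum(axis=-1)            # ln Gamma(a0) - sum_c ln Gamma(a_c)
    log_pi = np.log(ensemble_probs + eps)                                # (N, M, K)
    data_term = ((alphas[:, None, :] - 1.0) * log_pi).sum(-1).mean(-1)   # (1/M) sum_m sum_c (a_c - 1) ln pi_c
    return -(log_norm + data_term).mean()                                # average over the N inputs
```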
{"metaMigratedAt":"2023-06-15T19:30:28.327Z","metaMigratedFrom":"YAML","title":"Prior networks and ensemble distribution distillation","breaks":true,"description":"Seminar presentation","slideOptions":"{\"theme\":\"white\"}","contributors":"[{\"id\":\"173eb66f-920d-45e9-8ef5-4b365abfa9d8\",\"add\":4763,\"del\":2444}]"}
    241 views