## 05.06

### Decode $\mathbf{y}$ given $\mathbf{x}$ and $\mathbf{z}$

#### Default ('soft NLL')

\begin{align}
E(\hat{\mathbf{y}}_1,\ldots,\hat{\mathbf{y}}_T)&=-\sum_{t=1}^T \text{softmax}\big(f_\theta(\hat{\mathbf{y}}_{<t}, \mathbf{x})\big)\cdot\log\text{softmax}(\hat{\mathbf{y}}_t)
\end{align}

#### Constrained decoding

\begin{align}
E_{\text{constrain}}(\hat{\mathbf{y}}_{1:T})&=E(\hat{\mathbf{y}}_{1:T})+\sum_{\mathbf{y}\in \mathcal{Y}}\min_{t\in\{1,\ldots,T\}}\text{KL}(\mathbf{y}\,\|\,\hat{\mathbf{y}}_t)
\end{align}

- where $\mathcal{Y}$ is the set of tokens that we want to appear in the decoded sequence, and each $\mathbf{y}$ is the one-hot representation of a token in $\mathcal{Y}$.

#### Counterfactual decoding

\begin{align}
E_{\text{counterfactual}}(\hat{\mathbf{y}}_{1:T})&=E(\hat{\mathbf{y}}_{1:T})+\sum_{t=1}^{T}\text{KL}(\mathbf{z}_{t}\,\|\,\hat{\mathbf{y}}_t)
\end{align}

- where $\mathcal{Z}$ is the original story ending, $\mathbf{z}_t$ is the one-hot representation of the $t$-th token in $\mathcal{Z}$, and $T$ is the length of $\mathcal{Z}$.

#### Combination decoding

\begin{align}
E_{\text{combination}}(\hat{\mathbf{y}}_{1:T})&=E(\hat{\mathbf{y}}_{1:T})+\sum_{\mathbf{y}\in \mathcal{Y}}\min_{t\in\{1,\ldots,T\}}\text{KL}(\mathbf{y}\,\|\,\hat{\mathbf{y}}_t) +\sum_{t=1}^{T}\text{KL}(\mathbf{z}_{t}\,\|\,\hat{\mathbf{y}}_t)
\end{align}

- where $\mathcal{Y}$ is a set of key tokens drawn from $\mathcal{Z}$.
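All four energies simplify nicely because every KL target above is one-hot: $\text{KL}(\mathbf{y}\,\|\,\hat{\mathbf{y}}_t)=-\log \hat{\mathbf{y}}_t[y]$ when $\mathbf{y}$ is one-hot. The sketch below is a minimal PyTorch rendering under that observation, not any particular implementation; all names (`soft_nll_energy`, `lm_logits_fn`, `keyword_ids`, `z_ids`) are hypothetical, and it assumes $\hat{\mathbf{y}}$ is stored as a $(T, V)$ matrix of already-normalized soft token distributions.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (hypothetical names throughout). `y_hat` is a (T, V) matrix
# of soft tokens: each row a probability distribution over the vocabulary,
# optimized directly. `lm_logits_fn` stands in for f_theta: it is assumed to
# return, for each position t, the logits predicted from the soft prefix
# y_hat_{<t} and the context x (the autoregressive shift handled inside).

def soft_nll_energy(y_hat: torch.Tensor, lm_logits_fn) -> torch.Tensor:
    """Default 'soft NLL': -sum_t softmax(f_theta(y_<t, x)) . log y_hat_t."""
    probs = F.softmax(lm_logits_fn(y_hat), dim=-1)  # LM's prediction per t
    log_y = torch.log(y_hat.clamp_min(1e-12))       # log of current soft tokens
    return -(probs * log_y).sum()

def constraint_penalty(y_hat: torch.Tensor, keyword_ids) -> torch.Tensor:
    """sum_{y in Y} min_t KL(y || y_hat_t). For one-hot y the KL collapses to
    -log y_hat_t[y], so each keyword is pulled toward its best position."""
    log_y_hat = torch.log(y_hat.clamp_min(1e-12))   # (T, V)
    return sum((-log_y_hat[:, tok]).min() for tok in keyword_ids)

def counterfactual_penalty(y_hat: torch.Tensor, z_ids) -> torch.Tensor:
    """sum_t KL(z_t || y_hat_t) against the one-hot tokens of the original
    ending Z; each term again collapses to -log y_hat_t[z_t]."""
    z = torch.as_tensor(z_ids)                      # token ids of Z, length T
    t = torch.arange(z.numel())
    log_y_hat = torch.log(y_hat.clamp_min(1e-12))
    return (-log_y_hat[t, z]).sum()

def combination_energy(y_hat, lm_logits_fn, keyword_ids, z_ids) -> torch.Tensor:
    """E_combination = E + constraint term + counterfactual term."""
    return (soft_nll_energy(y_hat, lm_logits_fn)
            + constraint_penalty(y_hat, keyword_ids)
            + counterfactual_penalty(y_hat, z_ids))
```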
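One design point visible directly in the formulas: the $\min$ over positions in the constraint term only penalizes each keyword's single best position, so its gradient touches one timestep per keyword, whereas the counterfactual term penalizes every position, pulling $\hat{\mathbf{y}}$ toward $\mathcal{Z}$ token by token. The combination energy therefore trades off fluency (soft NLL), keyword coverage, and closeness to the original ending in one objective.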