# 7/22~29 Progress
## Thesis
3.3 Supervision in Disentanglement

## Experiment

### Analysis
- Adding the continuous classifier raises accuracy but lowers the ROUGE-L and BLEU scores.
- The choice of contrastive method is crucial to the final performance.
- Classifier accuracy is not the decisive factor for the quality of the final response.
## Human evaluation
- Ongoing
## Formulation
Objective:
$$
p(y,x_e|x) = p(y|x,x_e) \cdot p(x_{e}|x)
$$
To find $p(y \mid x)$, marginalize over all possible values of $x_e$:
$$
p(y \mid x)=\int p(y, x_e \mid x) d x_e =\int p(y \mid x_e, x) p(x_e \mid x) d x_e
$$
Assume $S$ is the soft prompt produced by the transformer encoder $f_\theta$ and the MLP layers $f_\psi$:
$$
S = f_\psi(f_\theta(x))
$$
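The composition $S = f_\psi(f_\theta(x))$ can be sketched in PyTorch as follows; the module name, dimensions, and prompt length are illustrative assumptions, not values from the report:

```python
import torch
import torch.nn as nn

class SoftPromptEncoder(nn.Module):
    """Sketch of S = f_psi(f_theta(x)): a transformer encoder followed by an MLP.
    All hyperparameters here are illustrative assumptions."""
    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_prompt=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.f_theta = nn.TransformerEncoder(layer, num_layers=n_layers)  # f_theta
        self.f_psi = nn.Sequential(  # f_psi: MLP head producing the soft prompt
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.n_prompt = n_prompt

    def forward(self, x_emb):                      # x_emb: (batch, seq_len, d_model)
        h = self.f_theta(x_emb)                    # contextualized representations
        return self.f_psi(h[:, : self.n_prompt])   # (batch, n_prompt, d_model)
```

Taking the first `n_prompt` positions as the prompt is one possible pooling choice; any fixed-length reduction of the encoder output would fit the formulation.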
Therefore, we rewrite the conditional probability:
$$
\begin{aligned}
& p(y \mid x)=\int p(y \mid S, x) p(S \mid x) d S \\
& =\int p(y \mid S, x) \delta\left(S-f_\psi\left(f_\theta(x)\right)\right) d S \\
& =p\left(y \mid f_\psi\left(f_\theta(x)\right), x\right) \\
& =\prod_{t=1}^T p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right) \\
&=\exp \left(\sum_{t=1}^T \log p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right)\right) \\
\end{aligned}
$$
$$
\mathcal{L}_{g}(x,y;\theta,\psi)=-\sum_{t=1}^T \log p_{\theta,\psi}(y_t \mid y_{<t},x,S)
$$
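$\mathcal{L}_g$ is the usual token-level negative log-likelihood. A minimal sketch, assuming a padding id for masking (the shapes and `pad_id` are illustrative):

```python
import torch
import torch.nn.functional as F

def generation_loss(logits, targets, pad_id=0):
    """L_g = -sum_t log p(y_t | y_<t, x, S), masking padded positions.
    logits: (batch, T, vocab); targets: (batch, T); pad_id is an assumption."""
    logp = F.log_softmax(logits, dim=-1)
    nll = -logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # (batch, T)
    mask = (targets != pad_id).float()                          # ignore padding
    return (nll * mask).sum(dim=-1).mean()                      # mean over batch
```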
- Discrete Classifier
$$
\mathcal{L} = \mathbb{E}_{p(x,x_e)}\left[-\log p_{\theta,\phi}(x_e \mid x)\right]
$$
$$
\mathcal{L}_e(x,x_e;\theta,\phi)=-\sum_{i=1}^N \log p_{\theta,\phi}(x_{e_i} \mid x_i)
$$
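$\mathcal{L}_e$ is standard cross-entropy over the attribute labels. A sketch of the NLL form, which matches PyTorch's built-in `F.cross_entropy` (mean reduction) up to numerical precision:

```python
import torch
import torch.nn.functional as F

def discrete_classifier_loss(logits, labels):
    """L_e = -(1/N) * sum_i log p(x_e_i | x_i), i.e. mean cross-entropy.
    logits: (N, num_classes); labels: (N,)."""
    logp = F.log_softmax(logits, dim=-1)
    return -logp[torch.arange(labels.size(0)), labels].mean()
```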
- Continuous Classifier
$$
\begin{aligned}
& p\left(\hat{x}_e \mid x_e\right)=\frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left(-\frac{\left(\hat{x}_e-x_e\right)^2}{2 \sigma^2}\right) \\
& \text{accuracy}=\frac{1}{N} \sum_{i=1}^N \mathbb{I}\left(\left|\hat{x}_{e_i}-x_{e_i}\right|<\epsilon\right)
\end{aligned}
$$
$\mathbb{I}$ denotes the indicator function and $\epsilon$ the tolerance threshold.
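Under this Gaussian model, the negative log-likelihood reduces to a squared-error term plus a constant, and the $\epsilon$-tolerance accuracy can be computed directly. A minimal sketch; the values of $\epsilon$ and $\sigma$ are illustrative assumptions:

```python
import math
import torch

def continuous_accuracy(preds, targets, eps=0.1):
    """accuracy = (1/N) * sum_i I(|x_hat_i - x_i| < eps)."""
    return (torch.abs(preds - targets) < eps).float().mean().item()

def gaussian_nll(preds, targets, sigma=1.0):
    """-log p(x_hat | x): squared error scaled by 2*sigma^2, plus a constant."""
    return ((preds - targets) ** 2 / (2 * sigma ** 2)
            + 0.5 * math.log(2 * math.pi * sigma ** 2)).mean()
```

Note that minimizing `gaussian_nll` with fixed $\sigma$ is equivalent to minimizing mean squared error.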
## Visualization
- ceclg_c_v2
- ceclg_con_v1
- discrete