# Contrastive Disentanglement Learning for Empathetic Dialogue Generation
[Draft](https://www.overleaf.com/3934372545kxdzdqtxwpxp#73217f)
[Thesis_edit](https://www.overleaf.com/4338721613ggrxgcpwkgwf#1e1d93)
[Thesis view](https://www.overleaf.com/read/kgshfwthgsqd#193d34)
## Ongoing
**some baseline papers**:
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)
[CTSM](https://arxiv.org/pdf/2403.15516v1)
[CEM](https://github.com/Sahandfer/CEM)
[CASE](https://github.com/jfzhouyoo/CASE)
[Supervised CL](https://arxiv.org/pdf/2004.11362)
[Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis](https://aclanthology.org/2024.lrec-main.518.pdf)
[Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models](https://aclanthology.org/2024.lrec-main.53/)
[Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements](https://aclanthology.org/2023.findings-emnlp.433.pdf)
https://aclanthology.org/2023.findings-acl.498.pdf
**Note**
- Formulation
- Evidence
- Google link
- GitHub (code only, no parameters)
- All figures in draw.io
- Store the weight parameters separately and link to them
---
## 8/22 Suggestions from examiners
<!-- - LLama2+prompt eos -->
- **continuous classifier probability**
<!-- - check emotion label -->
- VAD analysis and VA analysis
- show the result of the emotion prediction
<!-- - sample scenario case and more NYCUKA case -->
- **Data explanation and configuration**
- paper draft
## 8/25 Formulation
$$
\min_{\theta, \phi, \psi} \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ -\log p(y \mid x, x_\text{e}; \theta, \psi) - \log p(x_\text{e} \mid x; \theta, \phi) \right]
$$
Since $\{h_c, h_e\} = f_\theta(x)$ and $z_e = c_\phi(h_e)$, where $z_e = [V_e, A_e, D_e]^\top$:
$$
\begin{aligned}
& -\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\log p_{\theta, \phi}(x_e \mid x)\right] = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[-\log p_{\theta, \phi}(z_e \mid x)\right] \\
& = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\left\|\mathbf{z}_e-\mu_{\theta, \phi}(x)\right\|_2^2\right]-\log Z
\end{aligned}
$$
$$
\begin{aligned}
\mathcal{L}_e(x,x_e;\theta,\phi) &= -\sum_{i=1}^N \log p_{\theta,\phi}(x_{e_i}\mid x_i)\\
&= \sum_{i=1}^N\left\|\mathbf{z}_{e_i}-\mu_{\theta, \phi}\left(x_i\right)\right\|_2^2-N \cdot \log Z
\end{aligned}
$$
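Under the Gaussian assumption above, $\mathcal{L}_e$ reduces to a squared error between the predicted and gold VAD vectors (up to the constant $\log Z$). A minimal PyTorch sketch; `EmotionHead` is only a placeholder for $c_\phi$, not the actual implementation:
```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """c_phi: maps the pooled emotion representation h_e to z_e = [V, A, D]."""
    def __init__(self, hidden_dim: int, vad_dim: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, vad_dim),
        )

    def forward(self, h_e: torch.Tensor) -> torch.Tensor:
        return self.mlp(h_e)                      # mu_{theta,phi}(x)

def emotion_loss(pred_vad: torch.Tensor, gold_vad: torch.Tensor) -> torch.Tensor:
    """L_e: Gaussian NLL up to the constant log Z == squared error, averaged over the batch."""
    return ((pred_vad - gold_vad) ** 2).sum(dim=-1).mean()

# usage (shapes only): h_e would come from the encoder f_theta, gold_vad from the VAD labels
h_e = torch.randn(8, 768)
gold_vad = torch.rand(8, 3)
loss_e = emotion_loss(EmotionHead(768)(h_e), gold_vad)
```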
-------
Marginalizing over $x_\text e$
$$
p_{\theta, \psi}(y \mid x) = \int p_{\theta, \psi}(y \mid x_\text e, x) p_{\theta, \psi}(x_\text e \mid x) \, dx_\text e
$$
Rewriting with the soft prompt $S$ and a Dirac delta (since $S = f_\psi(f_\theta(x))$ is deterministic):
$$
\begin{aligned}
p_{\theta, \psi}(y \mid x) & = \int p_{\theta, \psi}(y \mid S, x) p_{\theta, \psi}(S \mid x) \, dS \\
& = \int p_{\theta, \psi}(y \mid S, x) \delta(S - f_\psi(f_\theta(x))) \, dS \\
& = p_{\theta, \psi}(y \mid f_\psi(f_\theta(x)), x)
\end{aligned}
$$
$$
\mathcal{L}_{g}(x,y;\theta,\psi)=-\sum_{t=0}^T \log p_{\theta,\psi}(y_t|y_{<t},x,S)
$$
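A minimal sketch of how the soft prompt $S = f_\psi(f_\theta(x))$ could be prepended to the decoder inputs and $\mathcal{L}_g$ computed. `gpt2` is only a runnable stand-in for the LLaMA-2 decoder, and the class and variable names are assumptions, not the thesis code:
```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                        # stand-in decoder; the thesis uses LLaMA-2
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
prompt_len, hidden = 10, lm.config.hidden_size

class SoftPromptMLP(nn.Module):
    """f_psi: maps the encoder state h (e.g. [h_c; h_e]) to `prompt_len` prompt vectors S."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, prompt_len * hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.proj(h).view(h.size(0), prompt_len, hidden)

soft_prompt = SoftPromptMLP(in_dim=2 * hidden)

def generation_loss(h, input_ids, labels):
    """L_g: token-level NLL with the soft prompt S prepended to the decoder input embeddings."""
    S = soft_prompt(h)                                    # (B, prompt_len, hidden)
    tok_emb = lm.get_input_embeddings()(input_ids)        # (B, T, hidden)
    inputs_embeds = torch.cat([S, tok_emb], dim=1)
    # prompt positions carry no target tokens -> ignored by the LM loss
    ignore = torch.full((labels.size(0), prompt_len), -100, dtype=labels.dtype)
    out = lm(inputs_embeds=inputs_embeds, labels=torch.cat([ignore, labels], dim=1))
    return out.loss

batch = tok(["I hope you can find closure soon."], return_tensors="pt")
loss_g = generation_loss(torch.randn(1, 2 * hidden), batch["input_ids"], batch["input_ids"].clone())
```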
---
$$
\mathcal{L}_{\text {c }}=1-\frac{h_p \cdot h_c}{\left\|h_p\right\|\left\|h_c\right\|}
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
## 8/1

- The human evaluation is in preparation
- 7 methods' responses correspond to the same input.
- A/B testing
### Side outline
- Introduction
    - Natural Language Generation
    - Empathetic Dialogue System
    - Motivation
- Methods in Dialogue Systems
    - Knowledge Integration
    - Disentanglement Learning
    - Large Language Model Integration
- Contrastive Disentanglement for Coherent Empathetic Dialogue
    - Data Augmentation
    - Disentangled Representation through Contrastive Learning
    - Soft Prompt Integration
- Experiment
    - Experimental Setup
    - Experimental Results
    - Analysis
- Conclusion and Future Work
## 7/22
[Progress](https://hackmd.io/@Stream/7_22)
## 7/17 Comparison table


### Contributions of the Proposed Method
1. **Disentanglement of Semantic and Emotional Content:**
- Separates semantic and emotional content for nuanced and effective communication.
2. **Contrastive Learning for Emotional Context:**
- By distinguishing between different emotional expressions through the pairing of augmented sentences with their corresponding negative examples, the model can better understand and generate empathetic responses.
3. **Integration of Disentangled Information as Soft Prompts:**
- The proposed method integrates the disentangled emotional and content information as soft prompts, guiding the generation process to ensure that the model's responses align more closely with the emotional requirements of the consultation system.
4. **VAD-Based Emotion Analysis:**
- Maps emotion labels to a 3-dimensional VAD (Valence, Arousal, Dominance) space, providing a more detailed and continuous analysis compared to traditional discrete methods.
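A toy illustration of such a label-to-VAD mapping; the numeric values below are made-up placeholders, not the lexicon or annotations actually used:
```python
# Illustrative only: map discrete emotion labels to continuous [valence, arousal, dominance]
# vectors. The values are placeholders; in practice they would come from a VAD lexicon
# (e.g., NRC-VAD) or from annotated data.
EMOTION_TO_VAD = {
    "joyful":    (0.95, 0.70, 0.65),
    "afraid":    (0.15, 0.85, 0.20),
    "confident": (0.80, 0.55, 0.90),
}

def label_to_vad(label: str) -> tuple:
    """Fall back to a neutral point for labels not in the toy table."""
    return EMOTION_TO_VAD.get(label, (0.5, 0.5, 0.5))

print(label_to_vad("joyful"))   # (0.95, 0.70, 0.65)
```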
### Experiment for continuous classifier




### Experiment for evaluation

---
## 7/1 Related work
## Comparison of others' methods and mine
|Methods|Papers|Contribution|Loss|Advantages|
|--|--|--|--|--|
|Contrastive learning|[CTSM: Combining Trait and State Emotions for Empathetic Response Model](https://arxiv.org/pdf/2403.15516v1) (SOTA)|emotion guidance module, cross-contrastive learning decoder|$\mathcal{L}=\gamma_{1} \mathcal{L}_{e}+\gamma_{2} \mathcal{L}_{g}+\gamma_{3} \mathcal{L}_{c c l}+\gamma_{4} \mathcal{L}_{d i v}$|1. Integrates both trait and state emotions.<br>2. Utilizes a cross-contrastive learning approach for better guidance.|
|Disentanglement|[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)|the representation of the history context is disentangled onto the content space and the emotion space with two auxiliary constraints based on the emotion label.|$l_{d i s \_c}=-\sum_{i=1}^{k} y_{c}^{i} \log y_{c}^{i}$$l_{d i s \_e}=-\sum_{i=1}^{k} e m o_{i} \log y_{e}^{i}$|1. Clear separation of content and emotion.<br>2. Auxiliary constraints improve the disentanglement process.|
|Graphical Network|[CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation](https://arxiv.org/pdf/2208.08845)|It first builds upon a commonsense cognition graph and an emotional concept graph and then aligns the user's cognition and affection at both the coarse-grained and fine-grained levels|$\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_{align}+\gamma_{2}\mathcal {L}_{emo}+ \gamma_{3}\mathcal {L}_{gen}+\gamma_{4}\mathcal {L}_{div}$|1. Multi-level alignment of cognition and emotion.<br>2. Incorporates commonsense and emotional concept graphs|
|My proposed method|-|-|$\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_c+ \gamma_{2}\mathcal {L}_{e}+\gamma_{3}\mathcal {L}_{cl}+ \gamma_{4}\mathcal {L}_{g}$|1. Separates semantic and emotional content for nuanced and effective communication<br>2. Uses contrastive learning and a classifier for emotional context.<br>3. Enhances coherent and empathetic responses.<br>4. Uses VAD to analyze the context's emotion.|
### Response table
|Emotion|joyful|RougeL|
|--|--|--|
|Context|I could have. But I didn't. He cheated on me actually.|-|
|Golden|Oh I see, why are you still with him then?|-|
|LLaMA2 only|I could have. But I didn't. He cheated on me actually. and f\*cked it up for.Key details for cheating in a text message. A simple message saying cheated did not detail why or circumstances beyond.|7.5|
|LLaMA2 with disentanglement|That's terrible. I'm sorry you're going through this. Do you have any idea why he did this? I'm sure you're heartbroken. I hope you can find closure soon. I'll be thinking of you. Is there anything I can do for you?|8.5|
|ChatGPT-4|it sounds like you might be feeling disappointed and betrayed. Such emotions are entirely natural in your situation. Could you share more about how you're feeling right now?|10.3|
|Emotion|confident|RougeL|
|--|--|--|
|Context|I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly.|-|
|Golden|I certainly hope so, Why do you want to move?|-|
|LLaMA2 only|I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly.I've done up the house as best as I could, and I'm confident it will sell quickly. I have improved the house with some decor|6.7|
|LLaMA2 with disentanglement|I hope it goes well. I'm sure it will. You've got a lot of heart. Do you have a lot of experience? I'm just starting out. I've never done anything like this before. I hope it works out. Do they have a lot in common with you? I hope so. |11.5|
|ChatGPT-4|It's great to hear that you've put so much effort into fixing up your house. Your confidence in its quick sale is a positive sign. How are you feeling about the whole process? Are there any aspects that you're particularly excited or nervous about?|7.01|
### Experiment
Discrete labels:

---
## 6/24 Formulation & Framework
### Motivation
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)


- Attempts to understand the dialogue context and generate the empathetic response from both the content view and the emotion view via disentanglement.
[Towards a Unified Framework of Contrastive Learning for Disentangled Representations, NIPS](https://papers.nips.cc/paper_files/paper/2023/hash/d5470483dd38f71f7bd9e68ce1b94145-Abstract-Conference.html)

- This paper extends the theoretical guarantees for disentanglement to a broader family of contrastive methods, while also relaxing the assumptions about the data distribution
### Framework Refinement

### Formulation
Objective:
$$
p(y,x_e|x) = p(y|x,x_e) \cdot p(x_{e}|x)
$$
To find $p(y∣x)$, marginalize over all possible values of $x_e$:
$$
p(y \mid x)=\int p(y, x_e \mid x) d x_e =\int p(y \mid x_e, x) p(x_e \mid x) d x_e
$$
Assume $S$ is the soft prompt produced by the transformer encoder $f_\theta$ and the MLP $f_\psi$:
$$
S = f_\psi(f_\theta(x))
$$
Therefore, we rewrite the conditional probability:
$$
\begin{aligned}
& p(y \mid x)=\int p(y \mid S, x) p(S \mid x) d S \\
& =\int p(y \mid S, x) \delta\left(S-f_\psi\left(f_\theta(x)\right)\right) d S \\
& =p\left(y \mid f_\psi\left(f_\theta(x)\right), x\right) \\
& =\prod_{t=1}^T p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right) \\
&=\exp \left(\sum_{t=1}^T \log p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right)\right) \\
\end{aligned}
$$
$$
\mathcal{L}_{g}(x,y;\theta,\psi)=-\sum_{t=0}^T \log p_{\theta,\psi}(y_t|y_{<t},x,S)
$$
----
$$
\{h_c,h_{e}\} = f_\theta(x)
$$
$$
\{h'_c,h'_{e}\} = f_\theta(x')
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
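A sketch of $\mathcal{L}_{cl}$ using in-batch negatives (a common implementation choice; the formula above writes the negatives $\bar{h}_j^-$ explicitly). Names and shapes are assumptions:
```python
import torch
import torch.nn.functional as F

def contrastive_loss(h: torch.Tensor, h_pos: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style L_cl: h[i] and h_pos[i] (from the augmented text x') form a positive
    pair; every other h_pos[j] in the batch is treated as a negative for h[i]."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    logits = h @ h_pos.t() / tau               # (N, N) pairwise similarities
    targets = torch.arange(h.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# usage: h_e from f_theta(x), h_e' from f_theta(x')
loss_cl = contrastive_loss(torch.randn(16, 768), torch.randn(16, 768))
```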


- Discrete Classifier
$$
-\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\log p_{\theta, \phi}(x_e \mid x)\right]
$$
$$
\mathcal{L}_e(x,x_e;\theta,\phi)=-\sum_{i=0}^N \log p_{\theta,\phi}(x_{e_i}|x_i)
$$
- Continuous Classifier
First, assume that the model's prediction error (the difference between the predicted value $\hat{x}_e$ and the true value $x_e$) follows a normal distribution. For a given input $x$, the output $x_e$ can then be modeled by the following Gaussian:
log-likelihood of a Gaussian distribution
$$
p\left(\hat{x}_e \mid x_e\right)=\frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left(-\frac{\left(\hat{x}_e-x_e\right)^2}{2 \sigma^2}\right)
$$
Evaluation metric for the continuous classifier:
$$
\text{accuracy}=\frac{1}{N} \sum_{i=1}^N \mathbb{I}\left(\left|\hat{x}_{e_i}-x_{e_i}\right|<\epsilon\right)
$$
$\mathbb{I}$ : indicator function, $\epsilon$ : error tolerance
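A small sketch of this tolerance-based accuracy; whether the indicator is applied per VAD dimension or per vector is a design choice, and here it is averaged over all dimensions:
```python
import torch

def vad_accuracy(pred: torch.Tensor, gold: torch.Tensor, eps: float = 0.1) -> float:
    """(1/N) * sum_i I(|x_hat_{e_i} - x_{e_i}| < eps), averaged over the VAD dimensions."""
    hits = (pred - gold).abs() < eps          # element-wise indicator
    return hits.float().mean().item()

# usage: pred and gold are (N, 3) VAD vectors; eps is the chosen tolerance
acc = vad_accuracy(torch.rand(100, 3), torch.rand(100, 3), eps=0.1)
```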
- Notation
**$D = \{x,x_e,x_p,y \}$**
$x$ : Input text from the dataset
$x^{\prime}$ : Augmented text from $x$
$h_{c}$ : Contextual representation obtained from the encoder
$h_e$ : Emotional representation obtained from the encoder
$\hat{x}_e$ : Predicted emotion
$S$ : Soft prompt generated by the MLP
$\hat{y}$ : Predicted output
$f_\theta$ : Encoder
$f_\psi$ : MLP
$c_\phi$ : Classifier
- Function
$$
\{h_c,h_{e}\} = f_\theta(x)
$$
$$
\{h'_c,h'_{e}\} = f_\theta(x')
$$
$$
\hat{x}_{e} = c_\phi (\mathcal{P}(h_{e}))
$$
$$
S = f_\psi(h)
$$
$$
\hat y_t = p(y_t|y_{<t},x,S)
$$
- Training loss:
$$
\mathcal{L}_{\text {c }}=1-\frac{h_p \cdot h_c}{\left\|h_p\right\|\left\|h_c\right\|}
$$
$$
\mathcal{L}_e=-\sum_{i=0}^N \log p(x_{e_i}|x_i,h_{e_i})
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
$$
\mathcal{L}_{g}=-\sum_{t=0}^T \log p(y_t|y_{<t},x,S)
$$
$$
\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_c+ \gamma_{2}\mathcal {L}_{e}+\gamma_{3}\mathcal {L}_{cl}+ \gamma_{4}\mathcal {L}_{g}
$$
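A sketch of combining the four training losses into $\mathcal{L}_{total}$; the weights $\gamma_i$ are hyperparameters, and the interpretation of $h_p$ follows the $\mathcal{L}_c$ formula above:
```python
import torch
import torch.nn.functional as F

def content_loss(h_p: torch.Tensor, h_c: torch.Tensor) -> torch.Tensor:
    """L_c = 1 - cos(h_p, h_c), averaged over the batch."""
    return (1 - F.cosine_similarity(h_p, h_c, dim=-1)).mean()

def total_loss(l_c, l_e, l_cl, l_g, gammas=(1.0, 1.0, 1.0, 1.0)):
    """L_total = gamma1*L_c + gamma2*L_e + gamma3*L_cl + gamma4*L_g."""
    g1, g2, g3, g4 = gammas
    return g1 * l_c + g2 * l_e + g3 * l_cl + g4 * l_g

# usage with dummy per-loss values (placeholders, not measured numbers)
l_total = total_loss(torch.tensor(0.3), torch.tensor(0.8), torch.tensor(1.2), torch.tensor(2.5))
```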
### Experiment
- ablation study
- [dist](https://aclanthology.org/N16-1014.pdf)
#### Ablation study using the EMPATHETICDIALOGUES dataset
##### Using discrete classifier
Only 1/10 of the data is used to measure **coherence**
|Methods|BLEU|RougeL|dist-1|dist-2|PPL|
|--|--|--|--|--|--|
|w/o disentanglement|0.70|6.8|19.43|67.26||
|w/ disentanglement|0.75|7.8|9.40|48.95||
The whole dataset is used to measure the **disentanglement** part
|Methods|Accuracy|
|--|--|
|w/o contrastive learning|0.05|
|w/ contrastive learning|0.375|
##### Using continuous classifier
Only 1/10 of the data is used to measure **coherence**
|Methods|BLEU|RougeL|dist-1|dist-2|PPL|
|--|--|--|--|--|--|
|w/o disentanglement|0.74|7.4|18.78|69.48||
|w/ disentanglement|0.89|9.3|9.7|42.23|34.2|
The whole dataset is used to measure the **disentanglement** part
|Methods|Accuracy|
|--|--|
|w/o contrastive learning|0.13|
|w/ contrastive learning|0.44|
- Latent Space Analysis:
- Evaluate the representations in the latent space to ensure that the emotional and content aspects are effectively disentangled and that the emotional part is well-represented.
- Empathy and Coherence Metrics:
- Dist-n (distinct) scores are not good
----
## 6/18
### Flow chart

### Contribution
- **Objective**: To enhance the generation of coherent and empathetic responses in NLG tasks by leveraging contrastive learning and disentanglement representation.
- **Enhanced Disentanglement through Contrastive Learning**:
- Unlike previous methods that rely solely on labels to disentangle context and emotion, this approach uses the data itself for contrastive learning.
- By creating positive and negative pairs, the model learns to differentiate emotional expressions more effectively.
- **Soft Prompt Integration for Coherent and Empathetic Responses**:
- The disentangled semantic and emotional information is integrated as soft prompts.
- These soft prompts enhance the coherence and empathy of the generated responses, ensuring that they are contextually appropriate and emotionally aligned.
### Current measuring
- Disentanglement
- Classifier
- Coherence
- [Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset](https://aclanthology.org/P19-1534/)
- [A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support](https://aclanthology.org/2020.emnlp-main.425/)
---
## 6/12 Framework Refinement
### Scenario

### Refinement

### Current measuring
Disentanglement
- [t-sne](https://www.mropengate.com/2019/06/t-sne.html)
- [MEASURING DISENTANGLEMENT: A REVIEW OF METRICS](https://www.arxiv.org/pdf/2012.09276)

Classifier
- paper: [valence arousal](https://www.nature.com/articles/s42256-020-00280-0)
- github: [valence arousal](https://github.com/face-analysis/emonet)

---
## 6/5 New framework

---
## 5/29
[knowledge](https://aclanthology.org/2023.findings-acl.498.pdf)
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495) (SOTA)
- It is essential to model the content-emotion duality of a dialogue, which is composed of the content view and the emotion view.
- two different fully-connected networks are adopted to project the contextual representation H into two different spaces,
[Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models](https://arxiv.org/pdf/2402.11801)
- Previous work lacks the ability to deeply understand emotional and cognitive nuances, particularly in pinpointing fine-grained emotions and their triggers.
My proposed method:
X = E + S
X : Job interviews always make me sweat bullets, makes me uncomfortable in general to be looked at under a microscope like that
E :"sweat bullets", "uncomfortable", "looked at under a microscope"
S :"Job interviews always make me"
Y:Don't be nervous. Just be prepared.

---
## 5/22 New framework
:::success
**Suggestions**
More figures and more consistency
More detail about disentanglement and coherence
Parameters for each network
:::

[Data Augmentation for Emotion Detection in Small Imbalanced Text Data](https://arxiv.org/pdf/2310.17015.pdf)

Inspired by [Towards a Unified Framework of Contrastive Learning for Disentangled Representations](https://papers.nips.cc/paper_files/paper/2023/hash/d5470483dd38f71f7bd9e68ce1b94145-Abstract-Conference.html)


- papers
[Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning](https://arxiv.org/pdf/1805.08651)
[ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA](https://arxiv.org/pdf/2002.11537)
[Nonlinear ICA of Temporally Dependent Stationary Sources](https://proceedings.mlr.press/v54/hyvarinen17a/hyvarinen17a.pdf)
---
## 5/15 Address some points
**Data Augmentation for Emotional Enhancement**:
- To enable the model to better understand and generate empathetic responses.
**Contrastive Learning for Emotion Understanding**:
- To distinguish between different emotional expressions by pairing augmented sentences with their corresponding negative examples.
**Disentanglement of Semantic and Emotional Content**:
- This separation allows the model to focus independently on understanding the context and the underlying emotions, leading to more coherent and empathetic responses.
**Integration of Disentangled Information as Soft Prompts**:
- This approach guides the generation process, ensuring that the model's responses align more closely with the emotional requirements of the consultation system.
Current work:
Using [empathetic_dialogues](https://huggingface.co/datasets/empathetic_dialogues?row=7) to train this model.
Future work:
Using [AI project](https://docs.google.com/spreadsheets/d/1JyQPHHspKPhFLt-u0g3EqC7_rt8NU8al/edit?usp=sharing&ouid=101061530049693764160&rtpof=true&sd=true) to train this model.
### Survey papers
[E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation](https://aclanthology.org/2023.emnlp-main.653.pdf)(study)
Issue:
- Current approaches for empathetic dialogue generation mainly perceive an emotional label to generate an empathetic response conditioned on it, which simply treat emotions independently, but **ignore the intrinsic emotion correlation** in dialogues, resulting in inaccurate emotion perception and unsuitable response generation.
[Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements](https://arxiv.org/pdf/2310.05140)
[Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation](https://paperswithcode.com/paper/emotion-aware-transformer-encoder-for-1)
[A survey on empathetic dialogue systems](https://www.sciencedirect.com/science/article/pii/S1566253520303092?casa_token=EI6dcydBgr8AAAAA:Z5q9Vvoexd4HrsA-zlTPjyFZ5j3UjPhTtpN5p7V5mI5SuvydGIcljtN8OLqth2evBymmy0hbwXo)
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)

- Attempts to understand the dialogue context and generate the empathetic response from both the content view and the emotion view via disentanglement.
:::spoiler Empathetic dialogue's 32 emotions:
(['sentimental', 'afraid', 'proud', 'faithful', 'terrified', 'joyful', 'angry', 'sad', 'jealous', 'grateful', 'prepared', 'embarrassed', 'excited', 'annoyed', 'lonely', 'ashamed', 'guilty', 'surprised', 'nostalgic', 'confident', 'furious', 'disappointed', 'caring', 'trusting', 'disgusted', 'anticipating', 'anxious', 'hopeful', 'content', 'impressed', 'apprehensive', 'devastated'])
:::
---
## 5/8
### Disentanglement table
|Methods|Papers|Contribution|Tasks or datasets|
|--|--|--|--|
|Contrastive VAE-based model|[Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation](https://openreview.net/attachment?id=t1ZPGMHyWL&name=pdf)|contrastive estimation with no external signals; sampling strategy for semantically similar and dissimilar views of the data.|video, audio and time series|
|Contrastive|[Self-Supervised Learning Disentangled Group Representation as Feature](https://proceedings.neurips.cc/paper_files/paper/2021/file/97416ac0f58056947e2eb5d5d253d4f2-Paper.pdf)|They ground the abstract semantics and the group acting on them into concrete contrastive learning.|images |
|Challenge|[NeurIPS 2019 Disentanglement Challenge](https://arxiv.org/pdf/2002.12356.pdf)|Two-stage challenge: sim-to-real transfer learning, and advancing disentangled representation learning to complicated physical objects|Simulated and real images of physical objects|
|Text style transfer VAE-based model|[Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders](https://aclanthology.org/2021.findings-emnlp.301.pdf)|We propose a Variational Autoencoder based method which models language features as discrete variables and encourages independence between variables for learning disentangled representations.|Yelp|
|Text style transfer VAE-based model|[An Evaluation of Disentangled Representation Learning for Texts](https://aclanthology.org/2021.findings-acl.170.pdf)|They proposes evaluation metrics tailored to the specific use-cases of disentangled representations in text generation; They describes empirical evaluations conducted on multiple datasets |PersonageNLG, GYAFC, Bible Datasets|
|Content-Emotion Duality|[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)|-|EmpatheticDialogues|
---
## 4/30 Data Augmentation for emotional sentences
:::warning
**Evidence**
- Disentanglement
- Coherence
:::
[Data Augmentation for Emotion Detection in Small Imbalanced Text Data](https://arxiv.org/pdf/2310.17015.pdf)
[AugEmotionDetection_github](https://github.com/A-Koufakou/AugEmotionDetection/tree/main)
- Easy Data Augmentation (EDA)
- Embeddings
- BART Paraphraser ProtAugment
- ChatGPT API
[EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/pdf/1901.11196)
[eda_nlp_github](https://github.com/jasonwei20/eda_nlp)

|Utterance|Emotion|EDA|
|-|-|-|
|Was this a friend you were in love with_comma_ or just a best friend?|sentimental|This a was champion you in love precisely with_comma_?|
|This was a best friend. I miss her|sentimental|This be a. admirer Unity.|
|Where has she gone?|sentimental|Ha? gone|
|Wait what are sweatings|afraid|Sudation what|
|it's quite strange that you didnt imagine it|proud|quite strange that didnt it suppose|
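A minimal sketch of two of the four EDA operations (random deletion and random swap); the full method also includes synonym replacement and random insertion via WordNet, and the parameters here are illustrative:
```python
import random

def eda_random_deletion(sentence: str, p: float = 0.2) -> str:
    """One of the four EDA operations: drop each word with probability p."""
    words = sentence.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else random.choice(words)

def eda_random_swap(sentence: str, n_swaps: int = 1) -> str:
    """Another EDA operation: swap two randomly chosen word positions n_swaps times."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(eda_random_deletion("This was a best friend. I miss her"))
print(eda_random_swap("Where has she gone?"))
```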
---
## 4/23 Relation
### Contrastive Disentanglement for Coherent Empathetic Dialogue

**Disentanglement in Empathetic Dialogues:**
- Improved Understanding:
- By separating emotional and contextual factors, disentanglement helps models better comprehend underlying emotions and situations.
- Enhanced Generative Capabilities:
- To generate emotionally appropriate and contextually relevant responses, fostering more coherent and empathetic dialogues.
**Novelty of Contrastive Learning Combined with Soft Prompt in Empathetic Dialogues:**
- Enhanced Contextual Understanding:
- To focus on specific contextual aspects while learning disentangled representations, deepening the understanding of dialogue context and promoting more contextually relevant responses.
- Improved Emotional Representation:
- Contrastive learning encourages models to discern features for different emotional states, complemented by soft prompts for refined emotional representations.
- Adaptability to Variations:
- Efficient Learning
- Generalizability:
- This can generalize across different domains and languages, capturing emotional and contextual dynamics in empathetic dialogues more effectively.
**The challenge involved two stages:**
- sim-to-real transfer learning
- advancing disentangled representation learning to complicated physical objects
[NeurIPS 2019 Disentanglement Challenge](https://arxiv.org/pdf/2002.12356.pdf)
[Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders](https://aclanthology.org/2021.findings-emnlp.301.pdf)
[Awesome Disentanglement in NLP](https://github.com/zhjohnchan/awesome-disentanglement-in-nlp)
## 4/16
### Papers
[Language Model Detoxification in Dialogue with Contextualized Stance Control](https://arxiv.org/pdf/2301.10368.pdf)
EMNLP 2022
Issue:
- Previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without consideration of the context.


$$
\begin{aligned}
& \mathcal{L}_{L M}=-\sum_{t=1}^{T} \log p\left(r_{t} \mid r_{<t}, c, t_{r}, m_{r}\right) \\
& \mathcal{L}_{s}=\mathbb{1}_{t_{c}=1} \max \left(m-d_{s}, 0\right)^{2} \\
& \mathcal{L}_{c}=\max \left(m-d_{c}, 0\right)^{2}
\end{aligned}
$$
- We propose a novel control framework that combines context-dependent and context-independent control utilizing hierarchical prefixes.
- We introduce novel contrastive training objectives to guide the meta prefixes to learn the control strategy implicitly.
[“Don’t Take This Out of Context!” On the Need for Contextual Models and Evaluations for Stylistic Rewriting](https://aclanthology.org/2023.emnlp-main.701.pdf) EMNLP 2023
[GreenLLaMA A Framework for Detoxification with Explanations](https://arxiv.org/pdf/2402.15951.pdf)
[InfoNCE Loss and Mutual Information Estimation](https://arxiv.org/pdf/1807.03748.pdf)
### Data preprocessing
|Toxic context|Clean context|Response|
|-|-|-|
|I can't afford to buy my fucking dog food.|I can not afford to buy my dogs any dog food.|Dogs should not be eating boogers. Feed them normal dog food, as some people may think you are mishandling your dogs if they see you do this.|
|Toxicity: 0.999|Toxicity: 0|-|
- Transforming the toxic sentence into the clean one may enhance coherence in the dialogue.
- It uses more polite language and expresses concern for all dogs rather than focusing solely on one.
- It can encourage understanding and lead to more helpful or supportive replies.
---
## 4/9 Framework


### Training model
Detoxifier:
- How to define $Z^-$ and $Z^+$
- Idea:
You are fucking handsome
You look nice
You are fucking fucking handsome
You are handsome
$$
\mathcal {L}_{InfoNCE}=-\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{q_i \cdot k_{i^{+}}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{q_i \cdot k_{j^{-}}}{\tau}\right)}\right)
$$
## 3/27 Measuring toxicity
[roberta_toxicity_classifier](https://huggingface.co/s-nlp/roberta_toxicity_classifier?text=We%27re+about+to+die+of+hunger%3B+we%27re+lost%3B+we+can%27t+leave+this+tree+without+being+swallowed+alive+by+that+monster.)
[RealToxicityPrompts](https://arxiv.org/pdf/2009.11462.pdf)
- [PERSPECTIVE API](https://support.perspectiveapi.com/s/about-the-api-score?language=en_US)

[Classification of social media Toxic comments using Machine learning models](https://arxiv.org/ftp/arxiv/papers/2304/2304.06934.pdf)

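A hedged sketch of scoring toxicity with the `s-nlp/roberta_toxicity_classifier` checkpoint linked above, using the standard Hugging Face `pipeline` API (this is a generic usage pattern, not code from the classifier's own repository):
```python
from transformers import pipeline

# Off-the-shelf toxicity classifier; returns a label ("toxic"/"neutral") and a score.
toxicity = pipeline("text-classification", model="s-nlp/roberta_toxicity_classifier")

for text in [
    "I can not afford to buy my dogs any dog food.",
    "Somebody drop you on your head?",
]:
    result = toxicity(text)[0]
    print(f"{result['label']:>8}  {result['score']:.3f}  {text}")
```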
### Idea
$P(toxic∣word)= \frac{1}{1+e^{-z}}$
- $z = \beta_0 + \beta_1x_1 +...+ \beta_nx_n$
- $\beta$ : parameter of model learning
Score the word
- $f(W) \geq \theta$
- The threshold $\theta$ can be determined based on empirical observations, domain expertise, or community standards.
**Keep thinking, more detail.......**
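A toy sketch of the word-level logistic-regression idea above; the corpus and labels are made up, and a real model would be trained on a labelled toxicity dataset such as Jigsaw:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative corpus (placeholder data, not a real training set).
texts  = ["you are an idiot", "have a nice day", "shut up you fool", "thanks for your help"]
labels = [1, 0, 1, 0]                                   # 1 = toxic, 0 = clean

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)   # learns beta_0 ... beta_n

def word_toxicity(word: str) -> float:
    """P(toxic | word): sigmoid score for a single word treated as its own input."""
    return float(clf.predict_proba(vec.transform([word]))[0, 1])

theta = 0.5                                             # threshold from the note above
for w in ["idiot", "day"]:
    print(w, round(word_toxicity(w), 3), "flagged" if word_toxicity(w) >= theta else "ok")
```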
### How to evaluate the detoxifier is good?
[A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification](https://aclanthology.org/2022.humeval-1.8.pdf)
[Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/file/e8c20cafe841cba3e31a17488dc9c3f1-Paper-Conference.pdf)
## 3/20 Example
:::warning
Example
Contrastive learning table
:::
### Example 1
Toxic Sentence:
- Original Sentence (S): “Somebody drop you on your fucking head?”
- Toxicity Score: High (above threshold)
Clean Sentence:
- Cleaned Sentence (S*): “Someone dropped you on your head?”
- Toxicity Score: Low (below threshold)
Related Response:
- Response :
1. "No, I don't recall anyone dropping me on my head. Is there a reason you're asking?"
2. "That's not a very kind thing to say. No, nobody has dropped me on my head. Is there something you'd like to discuss?"
### Example 2
Noised Sentence
- I tred a bit of shadowboxing today. Got one guy called Jaal on the chin and anther called Tyson betwen the eyes
Clean Sentence:
- I tried a bit of shadowboxing today. Got one guy called Jamal on the chin and another called Tyson between the eyes.
Related Response:
- This isn't a joke about black people, right? If it is, it isn't funny.
### Problem
No seq2seq datasets
Consistency
### Contrastive learning table
|Papers|Contribution|Novelty|
|--|--|--|
|[Contrastive Decoding: Open-ended Text Generation as Optimization](https://aclanthology.org/2023.acl-long.687.pdf)|-|-|
|[PiCO: Contrastive Label Disambiguation for Partial Label Learning](https://openreview.net/pdf?id=EhYjZy6e1gJ)|-|-|
|[Controlled Text Generation with Hidden Representation Transformations](https://arxiv.org/abs/2305.19230)|It steers large language models to generate text pertaining to certain attributes;It modifies the hidden representation of the base model through learned transformations.|-|
|[Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation](https://arxiv.org/pdf/2401.03468.pdf)|Pre-training for downstream tasks; a single model improves multimodal representation through contrastive inter- and intra-modal learning.|-|
|[Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning](https://github.com/chujiezheng/Click)|-|-|
|[Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models](https://aclanthology.org/2023.emnlp-main.124.pdf)|(1)Denoising Autoencoder (DAE), (2) we adjust the representation space of the output to the corresponding classes by clustering through a Contrastive Learning (CL) method and data augmentation, (3) we apply fine-tuning to delimit the predefined categories.|3-Phase fine-tuning
|[Parameter-Efficient Detoxification with Contrastive Decoding](https://arxiv.org/pdf/2401.06947.pdf)|-|-|
|[CONTRASTIVE LEARNING FOR LOW-LIGHT RAW DENOISING](https://arxiv.org/pdf/2305.03352.pdf)|-|Loss|
|[CONT: Contrastive Neural Text Generation](https://arxiv.org/pdf/2205.14690.pdf) NIPS 2022|The construction of contrastive examples, the choice of the contrastive loss, and the strategy in decoding.|N-pairs loss|
### How to measure toxicity?
### Clean to toxic
https://huggingface.co/datasets/s-nlp/paranmt_for_detox (corpus)
[TOXIGEN](https://aclanthology.org/2022.acl-long.234.pdf)
---------
## 3/13
:::warning
Figure more prob.. ---running
Define what is toxic ---running
Use AI project datasets
- [Emotional-Support-Conversation](https://github.com/thu-coai/Emotional-Support-Conversation)
Algo on contrastive Detoxifier ---running
:::

### Toxic define
- $S_{clean}$ = $\{x_1,x_2,....\}$
$S_{toxic}$ = $\{x_1,\tilde{x}_2,....\}$
- $P(\tilde{x}_t \neq x_t| x_{<t})$
$P(\tilde{x}_t | x_{<t},c)$
Contrastive learning:
- $Z^{+}$ = $\{(z_1,z_2,...z_n)\}$
$Z^{-}$ = $\{(z_1',z_2',...z_n')\}$
$$
L=-\frac{1}{N} \sum_{i=1}^N \log \frac{\exp \left(\operatorname{sim}\left(f\left(Z_i^{+}\right), f\left(Z_i^{-}\right)\right) / \tau\right)}{\sum_{j=1}^N \exp \left(\operatorname{sim}\left(f\left(Z_i^{+}\right), f\left(Z_j^{-}\right)\right) / \tau\right)}
$$
- Assume ground truth : "I wanted to prank others like that"
1. $Z^{+}$:
$$
Z^{+}={\text{('wanted', 'prank', 'others', 'like')}}
$$
2. $Z^{-}$:
$$
Z^{-}={\text{('wated', 'prnk', 'othrs', 'lik')}}
$$
Think:
To reduce toxicity, the positive model is fine-tuned on a non-toxic corpus while the negative model is fine-tuned on a toxic corpus.
$$
\mathrm{BCE}\left(\hat{y}_i, y_i\right)=-\left(y_i \cdot \log \left(\hat{y}_i\right)+\left(1-y_i\right) \cdot \log \left(1-\hat{y}_i\right)\right)
$$
### Metric
BLEU
[Evaluating Coherence in Dialogue Systems using Entailment](https://aclanthology.org/N19-1381.pdf)
### Some consideration
Word-based contrastive:
Efficiency: Focusing solely on toxic and clean words makes it easier to capture subtle differences in these key features.
Loss of context:
Disregarding the entire sentence may result in the loss of certain contextual information, potentially affecting the model's understanding of toxicity in specific contexts.
-------
[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)

- The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.
### TODO
Find some contrastive learning method
---------
## 3/4 Datasets
1. Datasets
2. Hypothesis
3. Baseline?
4. Method
- Prepare dialogue datasets.
- [prosocial-dialog](https://huggingface.co/datasets/allenai/prosocial-dialog)
- Similarity Score: 0.965
- Similarity Score: 0.920 (textattack)
- [daily_dialog](https://huggingface.co/datasets/daily_dialog)
- [paradetox](https://huggingface.co/datasets/s-nlp/paradetox?row=0) (toxic to clean)
- Attack the input
- https://github.com/QData/TextAttack
- Style Transfer
- Contextual Anomalies
- How to measure the coherence
- [Automatic Evaluation of Text Coherence: Models and Representations](https://www.ijcai.org/Proceedings/05/Papers/0505.pdf)
Word-based similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\frac{2\left|\operatorname{words}\left(S_{1}\right) \cap \operatorname{words}\left(S_{2}\right)\right|}{\left(\left|\operatorname{words}\left(S_{1}\right)\right|+\left|\operatorname{words}\left(S_{2}\right)\right|\right)}
\end{aligned}
$$
Distributional similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\cos \left(\mu\left(\vec{S}_{1}\right), \mu\left(\vec{S}_{2}\right)\right) \\
& =\frac{\sum_{j=1}^{n} \mu_{j}\left(\vec{S}_{1}\right) \mu_{j}\left(\vec{S}_{2}\right)}{\sqrt{\sum_{j=1}^{n}\left(\mu_{j}\left(\vec{S}_{1}\right)\right)^{2}} \sqrt{\sum_{j=1}^{n}\left(\mu_{j}\left(\vec{S}_{2}\right)\right)^{2}}}
\end{aligned}
$$
Taxonomy-based similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\frac{\sum_{\substack{w_{1} \in S_{1} \\
w_{2} \in S_{2}}} \underset{\substack{c_{1} \in \operatorname{senses}\left(w_{1}\right) \\
c_{2} \in \operatorname{senses}\left(w_{2}\right)}}{\operatorname{argmax}} \operatorname{sim}\left(c_{1}, c_{2}\right)}{\left|S_{1}\right|\left|S_{2}\right|}
\end{aligned}
$$
co-occurrence statistics in a Wordnet corpus.
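A small sketch of the first two similarity measures (word overlap and cosine similarity); here the vectors are simple bags of words, whereas the cited paper uses averaged word embeddings:
```python
import math
from collections import Counter

def word_overlap_sim(s1: str, s2: str) -> float:
    """Word-based similarity: 2*|words(S1) ∩ words(S2)| / (|words(S1)| + |words(S2)|)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return 2 * len(w1 & w2) / (len(w1) + len(w2))

def cosine_sim(v1: dict, v2: dict) -> float:
    """Distributional similarity: cosine between sentence vectors (bag-of-words counts here)."""
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

s1, s2 = "the cat sat on the mat", "a cat sat on a mat"
print(word_overlap_sim(s1, s2))
print(cosine_sim(Counter(s1.split()), Counter(s2.split())))
```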
- [Evaluating Coherence in Dialogue Systems using Entailment](https://aclanthology.org/N19-1381.pdf)
- [Coherent **Long** Text Generation by Contrastive Soft Prompt](https://aclanthology.org/2022.gem-1.42.pdf)
- Some model:
- [keyphrase-extraction-kbir-inspec](https://huggingface.co/ml6team/keyphrase-extraction-kbir-inspec)
- **Hyp**: Whether a sentence is toxic or not may or may not affect the output sentence; feed (toxic $S$, non-toxic $S^*$) pairs into the decoder and compare the results.
- Datasets: Jigsaw, [paradetox](https://huggingface.co/datasets/s-nlp/paradetox); Detoxifier: [bart-base-detox](https://huggingface.co/s-nlp/bart-base-detox)
- compare perplexity -- **done**
- Average(1000) Perplexity for Detox Sentences: **11.123**
- Average(1000) Perplexity for Toxic Sentences: 15.875
- Similarity -- **done**
- 
- Compare BERTScore? Coherence?
- **Detoxifier** baseline
- [bart-base-detox ](https://huggingface.co/s-nlp/bart-base-detox)
- BERT
- contrastive
- prompt
- How to measure whether the detoxification is good?
Feed the data into the detoxifier and inspect the outputs?
[COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification](https://aclanthology.org/2023.findings-emnlp.579.pdf)
[Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise](https://arxiv.org/pdf/2212.11685.pdf)
### CDCG: Contrastive Denoiser for Coherent text Generation
-----------------------
### Objective function
$S = \{X_1,X_2\}$
$$
Z=\operatorname{argmax}_i\; p(w_i \mid S)
$$
$Z^+$ = ?
$L_{\text{toxic}} - \lambda \cdot L_{\text{adv}}$
$Z^-$ = ?
Contrastive learning??
Final extraction:
$L_{\text{detox}} + \alpha \cdot L_{\text{keyphrase}}$
After Coherent:
$\mathcal{L}_{coherent}=\mathbb{E}_{\left(x, y\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(Y \mid S^*,K\right)\right)\right]$
$\mathcal{L}_{\text{detox}} - \beta \cdot \mathcal{L}_{\text{Coherent}}$
## 2/17 Draft
### Objective function
$L_{\text{toxic}} - \lambda \cdot L_{\text{adv}}$
$L_{\text{detox}} + \alpha \cdot L_{\text{keyphrase}}$
$\mathcal{L}_{coherence}=\mathbb{E}_{\left(x, y\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(Y \mid S^*,K\right)\right)\right]$
### Algorithm
**Input:**
- Toxic sentence: $S$
- Language Model: $LM$
**Procedure:**
1. **Initialization:**
- Initialize parameters: ${\Theta}=\{\theta, \phi_+, \phi_-\}$.
- Set iteration counter: $i = 1$.
2. **Adversarial Learning:**
- **While** stopping criterion not met:
- Generate positive example: $S_+ \leftarrow LM_+(S)$
- Generate negative example: $S_- \leftarrow LM_-(S)$
- Update parameters: ${\Theta} \leftarrow \text{Fine-tune}({\Theta}, S, S_+, S_-)$
- $i \leftarrow i + 1$
3. **Generate Non-toxic Sentence and Keyphrase:**
- $S^*, K \leftarrow \text{DetoxifyAndExtractKeyphrase}(S, LM, {\Theta})$
4. **Decode using llama-2:**
- Final Answer: $\text{Answer} \leftarrow \text{Decoder}(S^*, K)$
5. **Enhance Coherence:**
- $S^* \leftarrow \text{EnhanceCoherence}(S^*, \text{Y})$
**Output:**
- Non-toxic sentence: $S^*$
- Most important keyphrase: $K$
- Final Answer: $\text{Y}$
### Exp
Hypothesis:
- Whether a sentence is toxic or not may or may not affect the output sentence.
- Feed (toxic $S$, non-toxic $S^*$) pairs into the decoder and compare the results; compare BERTScore? Coherence?
<!-- Test feeding the toxic sentence directly into the pretrained model -->
### Draft
[reference](https://arxiv.org/pdf/2304.06359.pdf)
[ICASSP](https://www.overleaf.com/6562376881wgsngwjtdbsh#fe3c55)
<!-- No ideas for coherence yet; the experiment part is not finished -->
:::info
Strengths:
- Can be effective for learning representations that capture semantic similarities and differences.
- Useful for tasks where understanding the relationships between data points is crucial.
:::
## 1/30 CDSC: Contrastive Detoxification and Semantic Coherence
### Objective function
Preserve the original input sentence meaning (coherence):
$$
\mathcal{L}_{L M}=\mathbb{E}_{\left(x, y^*\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(y=y^* \mid x\right)\right)\right]
$$
(Detox)
$$
\begin{array}{c}
\text { Paraphraser } \\
P_{L M}\left(y_t \mid y_{<t}, x\right)
\end{array}
$$
$$
\begin{array}{c}
\text { Toxic } \\
P_{L M}\left(y_t \mid y_{<t}, toxic\right)
\end{array}
$$
$$
\begin{array}{c}
\text { Normal } \\
P_{L M}\left(y_t \mid y_{<t}, safe\right)
\end{array}
$$
Datasets:
[ParaDetox](https://aclanthology.org/2022.acl-long.469.pdf)
[real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
### Coherent table
|Method|Papers|Contribution|Datasets|
|--|--|--|--|
|**Knowledge-driven**|[Learning to Copy Coherent Knowledge for Response Generation (AAAI-21)](https://ojs.aaai.org/index.php/AAAI/article/view/17486)|(1)Knowledge Discernment, (2)dialog goal and the dialog context, (3)Context Manager $L(\theta)=L_{N L L}(\theta)+L_{B O W}(\theta)+L_{K L}(\theta)$|DuConv and DuRecDial|
|-|[Knowledge-based Review Generation by Coherence Enhanced Text Planning](https://dl.acm.org/doi/pdf/10.1145/3404835.3462865?casa_token=kXswZr_T7wQAAAAA:6aG4XVLqlA7o85naOHzPVJJ8onSqhWrdj6bGxGm9i-ktVmU1F2Wq30acJ2zHC5aa3WVkaztvadDfzA)|(1) the document plan is modeled as a sequence of sentence plans in order, (2) the sentence plan is modeled as an entity-based subgraph from KG.|Amazon Electronic, Book, and IMDb Movie |
|Hybrid Latent Variables|[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)|The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.|DailyDialog and Opensubtitles|
|Diffusion|[Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models](https://openreview.net/attachment?id=17YbAlc1tW&name=pdf)|without introducing mismatches, Bayesian framework to jointly modify both revealed and unrevealed regions|CelebA-HQ and ImageNet-1K|
|Discourse (High-Level Language Representation)|[Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence](https://arxiv.org/pdf/2105.08963.pdf)|It can represent the prefix sentences at **sentence level and discourse level** in the decoding process; They propose two pretraining objectives to learn the representations by predicting inter-sentence semantic similarity and distinguishing between normal and shuffled sentence orders.|WritingPrompts and ROC|
|Discourse-level|[DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence](https://arxiv.org/pdf/2201.11176.pdf)|DiscoScore (a kind of metrics) strongly correlates with human rated coherence.|RC and LC and Lexical Chain|
|GANs|[TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation](https://aclanthology.org/2021.findings-acl.428.pdf)|They improve local and global coherence, we explicitly introduce a **multi-scale discriminator** to capture the semantic information at varying scales among the sequence of hidden representations encoded by Transformer.|MSCOCO, WMTNEWS and ROC-STORY|
|**Contrastive learning**|[Coherent **Long** Text Generation by Contrastive Soft Prompt](https://aclanthology.org/2022.gem-1.42.pdf)|It learns text representations in the hidden space for better planning long text generation; (**Similar to my idea**); Better than HINT|ROCStories and WritingPrompts|
|-|[CONT: Contrastive Neural Text Generation](https://proceedings.neurips.cc/paper_files/paper/2022/file/0f5fcf4bff73a3537e0813a38f0d3f76-Paper-Conference.pdf)|(1)Contrastive Examples from Predictions (2) N-Pairs Contrastive Loss (3) Inference with Learned Similarity Function |MT, XSum, Code Comment Generation, Data-to-text Generation, Commonsense Generation|
|-|[Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework](https://ojs.aaai.org/index.php/AAAI/article/view/26509)| We propose a contrastive framework to learn the state representations in a discrete space, and insert additional attention layers into the decoder to better exploit these states.|Wikiplots and CNN News|
## TODO
Contrastive learning table
Run some inference
## 1/23 CDCE: Contrastive Detoxification and Coherent Enhancement
### Detox table
|Method|papers|contribution|datasets|
|--|--|--|--|
|Diffusion|[DiffuDetox: A Mixed Diffusion Model for Text Detoxification](https://arxiv.org/pdf/2306.08505.pdf)|(1)conditional model reduces its toxicity (2)unconditional model guide the sampling process|
|Denoise|[Towards a Better Understanding of Noise in Natural Language Processing](https://aclanthology.org/2021.ranlp-1.7.pdf)|-|
|BERT|[Text Detoxification using Large Pre-trained Neural Models](https://arxiv.org/pdf/2109.08914.pdf)|(1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer (**similar to my idea**); conditional BERT|Jigsaw|
|--|[Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings](https://arxiv.org/pdf/2112.08346.pdf)|(1)We propose a method to generalize toxic directions in the latent space.(2) We also provide a methodology for constructing parallel datasets using a context based word masking system.|
|--|[A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification](https://aclanthology.org/2022.humeval-1.8.pdf)|We conducted an evaluation of detoxification models for Russian using both automatic and manual metrics.
|**Prompt**|[Prompt Tuning for Text Detoxification](https://www.dialog-21.ru/media/5735/konodyukn120.pdf)|We conduct experiments to determine the optimal length of trainable prompt for the task.|
|--|[You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content](https://arxiv.org/pdf/2308.05596.pdf)|(1) Toxicity Classification (2) Toxic Span Detection (3) Detoxification|
|**Constrastive learning**|[COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification](https://aclanthology.org/2023.findings-emnlp.579.pdf)|They contrast the gold standard rephrasing with the identity input-tooutput mapping to effectively isolate and focus learning on non-toxic style transfer|ParaDetox、APPDIA|
|--|[Parameter-Efficient Detoxification with Contrastive Decoding](https://arxiv.org/pdf/2401.06947.pdf)|They leverages the frozen weights of the language model itself and only introduces a tiny portion of new model parameters to detoxify generation.|
|Context-aware|[CMD: a framework for Context-aware Model self- etoxification](https://arxiv.org/pdf/2308.08295.pdf)|
|-|[Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts](https://aclanthology.org/2023.acl-short.21.pdf)|MARCO uses likelihoods under a non-toxic LM (expert) and a toxic LM (anti-expert) to find candidate words to mask and replace.|Social Bias Frames|
|GreenLLaMA|[GreenLLaMA A Framework for Detoxification with Explanations](https://arxiv.org/pdf/2402.15951.pdf)|Black magic|ParaDetox|
### Todo
Finish the table above.
How to show the objective function.
$$
\mathcal{L}_{pos}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)=-\log \left(\frac{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)}{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)+\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}\right)
$$
$$
\mathcal{L}_{neg}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)=-\log \left(\frac{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)+\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}\right)
$$
## 1/16 "CLDetox: Contrastive Learning for Detoxification and Coherence Enhancement"
### Survey
[DiffuDetox: A Mixed Diffusion Model for Text Detoxification](https://arxiv.org/pdf/2306.08505.pdf)

Contribution:
- The conditional model takes toxic text as the condition and **reduces its toxicity**, yielding a diverse set of detoxified sentences. (detoxify)
- The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus **ensures text fluency**. (guide the sampling process)
Limitation:
- Sampling requires both a conditional and an unconditional model, which results in **slower inference**.
- progressive distillation
- The **diversity** of generative models is **degraded** as $w$ increases.
- Ideally we would be able to have a model that improves upon the fluency as well as the model diversity
### Architecture

### Datasets
[real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
:::info
Feedback
1. The overall figures need to be consistent
2. Parameter learning: clarify which parameters are updated and by which component
3. Work out the execution flow and algorithm first, then update the architecture figure (rework the encoder part and wrap it as a contrastive module)
4. Organize the detox and coherence tables
5. Objective functions for detox and coherence
6. How to do contrastive learning
7. How to apply detoxification in the AI project
:::
## 1/10 Enhancing consistency in text generation through contrastive learning
### Coherence and paraphrasing.
#### Coherence:
[Learning to Copy Coherent Knowledge for Response Generation (AAAI-21)](https://ojs.aaai.org/index.php/AAAI/article/view/17486)

[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)

- The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.
#### paraphrasing
[Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition (EMNLP-21)](https://aclanthology.org/2021.emnlp-main.430.pdf)
- We convert Conditional Random Field (CRF) into a **multi-label classification module** and encourage consistency on the entity appearance between the original and paraphrased sequences.
#### Problem
1. Incorporate transfer learning or other forms of learning into dialogue systems to enhance the quality of generated responses
2. Incorporating external knowledge sources.
3. It is not clear how the model's latent variables correspond to different aspects of the generated responses.
### Others' previous tasks
story generation
- stories using abstract as outline
- [Consistency and Coherency Enhanced Story Generation](https://arxiv.org/pdf/2010.08822.pdf)
### My preliminary idea
I want to maintain consistency in output even with poor input.
- I want to train a model to generate coherent responses based on input sentences with similar meanings but expressed differently.
Objective:
$$
sim(f(x_1), f(x_2))
$$
Loss:
$$
\mathcal{L}(x_1, x_2) = \max(0, m - \text{Similarity}(f(x_1), f(x_2)) + \text{Similarity}(f(x_1'), f(x_2)))
$$
$$
\mathcal{L}(x_1, x_2) + \alpha \cdot C(x_1, y_2) + \beta \cdot C(x_1, y_2)
$$
m is a margin, a hyperparameter that controls the minimum acceptable difference in similarity.
$C$ is Consistency Metric.
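A sketch of this margin loss in its standard triplet form (the similarity of the paraphrase pair $(x_1, x_2)$ pushed above the similarity of the negative pair by at least $m$); tensor shapes and names are assumptions:
```python
import torch
import torch.nn.functional as F

def margin_consistency_loss(f_x1, f_x2, f_x1_neg, m: float = 0.5):
    """max(0, m - sim(f(x1), f(x2)) + sim(f(x1'), f(x2))), averaged over the batch."""
    pos = F.cosine_similarity(f_x1, f_x2, dim=-1)
    neg = F.cosine_similarity(f_x1_neg, f_x2, dim=-1)
    return torch.clamp(m - pos + neg, min=0).mean()

# usage with random placeholder representations
loss = margin_consistency_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 256))
```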
Because of the lack of correct answers in this task:
- Contrastive Learning
- Self-Supervised Learning
### Todo
Semantic similarity in NLG.
Key Information Extraction.
Contrastive learning.
- [Controlled Text Generation with Hidden Representation Transformations](https://arxiv.org/pdf/2305.19230.pdf)
Datasets
:::warning
Feedback
Ask gpt to generate good prompt and bad prompt to train the model.
Push the distance between the good output and the victim output further apart.
Address coherence.
Address what toxicity means.
objective function
big picture
:::
---------
## 12/26 Enhancing NLG Consistency
### Title
"Enhancing NLG Consistency Across Diverse Inputs Using Data Augmentation and Keyword-Driven Prompts"
"CID: **C**onsistent NLG with **I**nput **D**iversity using Data Augmentation and Keyword-Driven Prompts"
### Problem definition

Data Augmentation

Inference Example
Input: I'm currently immerse in deep research of nature language generation task.
ANS If you have any specific questions or if there's a particular aspect of your research you'd like to discuss, feel free to share. I'm here to assist you in your endeavors related to natural language generation.
Input :I concentrating to address the various challenges brings by natural language generation.
**The output should be consistent even when the input varies**
#### why this task is an issue
**Real-world Application Scenarios:**
- NLG systems often encounter diverse inputs from different users or contexts.
- Effectively handling this diversity and generating consistent outputs can better meet user requirements, enhancing the practicality of the system.
**Robustness and Generalization:**
- Considering the diversity of inputs in the real world, making NLG models more robust and capable of generalization is crucial.
- Introducing diverse inputs during training and emphasizing consistency can assist the model in adapting better to a variety of situations.
**Reduced Bias:**
- Denoising can help reduce biases present in the input, promoting fairness and equity in the generated content.
### Previous tasks
[Semantic Accuracy in Natural Language Generation: A Thesis Proposal](https://aclanthology.org/2023.acl-srw.48.pdf)
- They proposed a unified benchmark for NLG metrics focusing on semantic accuracy
Prompt?
[AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts](https://arxiv.org/pdf/2010.15980.pdf)

[Towards a Better Understanding of Noise in Natural Language Processing](https://aclanthology.org/2021.ranlp-1.7.pdf)

Self-supervised-learning
- SimCLR
Disentangled representation learning for text and emotion or keywords?
- This aims to capture the different dimensions of variation of a text in separate vector embeddings.
### Idea
Disentanglement-based models offer two main advantages:
1. Sampling from the latent space of the style embeddings allows for more diverse and controlled stylistic generation.
2. Similarity of documents can now be calculated for each aspect of variation, allowing for finer-grained retrieval.
Objective $$p(y \mid x_1)=p(y \mid x_2)$$
Problem $$\prod_{t=1}^{T} p(y_t \mid y_{<t},x,c)$$
$c$ can be the keyword condition
### Challenge
Not enough datasets:
- Using autoencoder to generate the similar sentences.
How to extract the keywords
How to know the inputs are the same
:::danger
**Feedback**:
Title, novelty, method
- Can't just combine prompting and keyword extraction
Previous work
Fix the equation
:::
-----------------------