# Contrastive Disentanglement Learning for Empathetic Dialogue Generation
[Draft](https://www.overleaf.com/3934372545kxdzdqtxwpxp#73217f)
[Thesis_edit](https://www.overleaf.com/4338721613ggrxgcpwkgwf#1e1d93)
[Thesis view](https://www.overleaf.com/read/kgshfwthgsqd#193d34)
## Ongoing
**some baseline papers**:
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)
[CTSM](https://arxiv.org/pdf/2403.15516v1)
[CEM](https://github.com/Sahandfer/CEM)
[CASE](https://github.com/jfzhouyoo/CASE)
[Supervised CL](https://arxiv.org/pdf/2004.11362)
[Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis](https://aclanthology.org/2024.lrec-main.518.pdf)
[Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models](https://aclanthology.org/2024.lrec-main.53/)
[Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements](https://aclanthology.org/2023.findings-emnlp.433.pdf)
https://aclanthology.org/2023.findings-acl.498.pdf
**Note**
- Formulation
- Evidence
- Google link
- GitHub (code only, no parameters)
- All figures in draw.io
- Store the weight parameters separately and link to them
---
## 8/22 Suggestions from examiners
<!-- - LLama2+prompt eos -->
- **continuous classifier probability**
<!-- - check emotion label -->
- VAD analysis and VA analysis
- show the result of the emotion prediction
<!-- - sample scenario case and more NYCUKA case -->
- **Data explanation and configuration**
- paper draft
## 8/25 Formulation
$$
\min_{\theta, \phi, \psi} \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ -\log p(y \mid x, x_\text{e}; \theta, \psi) - \log p(x_\text{e} \mid x; \theta, \phi) \right]
$$
Since $\{h_c, h_e\} = f_\theta(x)$ and $z_e = c_\phi(h_e)$, where $z_e = [V_e, A_e, D_e]^\top$:
$$
\begin{aligned}
& -\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\log p_{\theta, \phi}(x_e \mid x)\right] = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[-\log p_{\theta, \phi}(z_e \mid x)\right] \\
& = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\left\|\mathbf{z}_e-\mu_{\theta, \phi}(x)\right\|_2^2\right]-\log Z
\end{aligned}
$$
$$
\begin{aligned}
\mathcal{L}_e(x,x_e;\theta,\phi) &= -\sum_{i=1}^N \log p_{\theta,\phi}(x_{e_i}\mid x_i)\\
&= \sum_{i=1}^N\left\|\mathbf{z}_{e_i}-\mu_{\theta, \phi}\left(x_i\right)\right\|_2^2-N \cdot \log Z
\end{aligned}
$$
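Under the Gaussian assumption above, $\mathcal{L}_e$ reduces to a squared error between the predicted and gold VAD vectors (up to the constant $\log Z$). A minimal PyTorch sketch; `EmotionHead` is only a placeholder for $c_\phi$, not the actual implementation:
```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """c_phi: maps the pooled emotion representation h_e to z_e = [V, A, D]."""
    def __init__(self, hidden_dim: int, vad_dim: int = 3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, vad_dim),
        )

    def forward(self, h_e: torch.Tensor) -> torch.Tensor:
        return self.mlp(h_e)                      # mu_{theta,phi}(x)

def emotion_loss(pred_vad: torch.Tensor, gold_vad: torch.Tensor) -> torch.Tensor:
    """L_e: Gaussian NLL up to the constant log Z == squared error, averaged over the batch."""
    return ((pred_vad - gold_vad) ** 2).sum(dim=-1).mean()

# usage (shapes only): h_e would come from the encoder f_theta, gold_vad from the VAD labels
h_e = torch.randn(8, 768)
gold_vad = torch.rand(8, 3)
loss_e = emotion_loss(EmotionHead(768)(h_e), gold_vad)
```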
-------
Marginalizing over $x_\text e$
$$
p_{\theta, \psi}(y \mid x) = \int p_{\theta, \psi}(y \mid x_\text e, x) p_{\theta, \psi}(x_\text e \mid x) \, dx_\text e
$$
Rewriting with the soft prompt $S$ and a Dirac delta (since $S = f_\psi(f_\theta(x))$ is deterministic):
$$
\begin{aligned}
p_{\theta, \psi}(y \mid x) & = \int p_{\theta, \psi}(y \mid S, x) p_{\theta, \psi}(S \mid x) \, dS \\
& = \int p_{\theta, \psi}(y \mid S, x) \delta(S - f_\psi(f_\theta(x))) \, dS \\
& = p_{\theta, \psi}(y \mid f_\psi(f_\theta(x)), x)
\end{aligned}
$$
$$
\mathcal{L}_{g}(x,y;\theta,\psi)=-\sum_{t=0}^T \log p_{\theta,\psi}(y_t|y_{<t},x,S)
$$
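A minimal sketch of how the soft prompt $S = f_\psi(f_\theta(x))$ could be prepended to the decoder inputs and $\mathcal{L}_g$ computed. `gpt2` is only a runnable stand-in for the LLaMA-2 decoder, and the class and variable names are assumptions, not the thesis code:
```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"                        # stand-in decoder; the thesis uses LLaMA-2
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name)
prompt_len, hidden = 10, lm.config.hidden_size

class SoftPromptMLP(nn.Module):
    """f_psi: maps the encoder state h (e.g. [h_c; h_e]) to `prompt_len` prompt vectors S."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, prompt_len * hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.proj(h).view(h.size(0), prompt_len, hidden)

soft_prompt = SoftPromptMLP(in_dim=2 * hidden)

def generation_loss(h, input_ids, labels):
    """L_g: token-level NLL with the soft prompt S prepended to the decoder input embeddings."""
    S = soft_prompt(h)                                    # (B, prompt_len, hidden)
    tok_emb = lm.get_input_embeddings()(input_ids)        # (B, T, hidden)
    inputs_embeds = torch.cat([S, tok_emb], dim=1)
    # prompt positions carry no target tokens -> ignored by the LM loss
    ignore = torch.full((labels.size(0), prompt_len), -100, dtype=labels.dtype)
    out = lm(inputs_embeds=inputs_embeds, labels=torch.cat([ignore, labels], dim=1))
    return out.loss

batch = tok(["I hope you can find closure soon."], return_tensors="pt")
loss_g = generation_loss(torch.randn(1, 2 * hidden), batch["input_ids"], batch["input_ids"].clone())
```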
---
$$
\mathcal{L}_{\text {c }}=1-\frac{h_p \cdot h_c}{\left\|h_p\right\|\left\|h_c\right\|}
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
## 8/1

- The human evaluation is in preparation
- 7 methods' responses correspond to the same input.
- A/B testing
### Side outline
- Introduction
    - Natural Language Generation
    - Empathetic Dialogue System
    - Motivation
- Methods in Dialogue Systems
    - Knowledge Integration
    - Disentanglement Learning
    - Large Language Model Integration
- Contrastive Disentanglement for Coherent Empathetic Dialogue
    - Data Augmentation
    - Disentangled Representation through Contrastive Learning
    - Soft Prompt Integration
- Experiment
    - Experimental Setup
    - Experimental Results
    - Analysis
- Conclusion and Future Work
## 7/22
[Progress](https://hackmd.io/@Stream/7_22)
## 7/17 Comparison table


### Contributions of the Proposed Method
1. **Disentanglement of Semantic and Emotional Content:**
- Separates semantic and emotional content for nuanced and effective communication.
2. **Contrastive Learning for Emotional Context:**
- By distinguishing between different emotional expressions through the pairing of augmented sentences with their corresponding negative examples, the model can better understand and generate empathetic responses.
3. **Integration of Disentangled Information as Soft Prompts:**
- The proposed method integrates the disentangled emotional and content information as soft prompts, guiding the generation process to ensure that the model's responses align more closely with the emotional requirements of the consultation system.
4. **VAD-Based Emotion Analysis:**
- Maps emotion labels to a 3-dimensional VAD (Valence, Arousal, Dominance) space, providing a more detailed and continuous analysis compared to traditional discrete methods.
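A toy illustration of such a label-to-VAD mapping; the numeric values below are made-up placeholders, not the lexicon or annotations actually used:
```python
# Illustrative only: map discrete emotion labels to continuous [valence, arousal, dominance]
# vectors. The values are placeholders; in practice they would come from a VAD lexicon
# (e.g., NRC-VAD) or from annotated data.
EMOTION_TO_VAD = {
    "joyful":    (0.95, 0.70, 0.65),
    "afraid":    (0.15, 0.85, 0.20),
    "confident": (0.80, 0.55, 0.90),
}

def label_to_vad(label: str) -> tuple:
    """Fall back to a neutral point for labels not in the toy table."""
    return EMOTION_TO_VAD.get(label, (0.5, 0.5, 0.5))

print(label_to_vad("joyful"))   # (0.95, 0.70, 0.65)
```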
### Experiment for continuous classifier




### Experiment for evaluation

---
## 7/1 Related work
## Comparison of others' methods and mine
|Methods|Papers|Contribution|Loss|Advantages|
|--|--|--|--|--|
|Contrastive learning|[CTSM: Combining Trait and State Emotions for Empathetic Response Model](https://arxiv.org/pdf/2403.15516v1) (SOTA)|emotion guidance module, cross-contrastive learning decoder|$\mathcal{L}=\gamma_{1} \mathcal{L}_{e}+\gamma_{2} \mathcal{L}_{g}+\gamma_{3} \mathcal{L}_{c c l}+\gamma_{4} \mathcal{L}_{d i v}$|1. Integrates both trait and state emotions.<br>2. Utilizes a cross-contrastive learning approach for better guidance.|
|Disentanglement|[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)|the representation of the history context is disentangled onto the content space and the emotion space with two auxiliary constraints based on the emotion label.|$l_{d i s \_c}=-\sum_{i=1}^{k} y_{c}^{i} \log y_{c}^{i}$$l_{d i s \_e}=-\sum_{i=1}^{k} e m o_{i} \log y_{e}^{i}$|1. Clear separation of content and emotion.<br>2. Auxiliary constraints improve the disentanglement process.|
|Graphical Network|[CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation](https://arxiv.org/pdf/2208.08845)|It first builds upon a commonsense cognition graph and an emotional concept graph and then aligns the user's cognition and affection at both the coarse-grained and fine-grained levels|$\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_{align}+\gamma_{2}\mathcal {L}_{emo}+ \gamma_{3}\mathcal {L}_{gen}+\gamma_{4}\mathcal {L}_{div}$|1. Multi-level alignment of cognition and emotion.<br>2. Incorporates commonsense and emotional concept graphs|
|My proposed method|-|-|$\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_c+ \gamma_{2}\mathcal {L}_{e}+\gamma_{3}\mathcal {L}_{cl}+ \gamma_{4}\mathcal {L}_{g}$|1. Separates semantic and emotional content for nuanced and effective communication<br>2. Uses contrastive learning and a classifier for emotional context.<br>3. Enhances coherent and empathetic responses.<br>4. Uses VAD to analyze the context's emotion.|
### Response table
|Emotion|joyful|RougeL|
|--|--|--|
|Context|I could have. But I didn't. He cheated on me actually.|-|
|Golden|Oh I see, why are you still with him then?|-|
|LLaMA2 only|I could have. But I didn't. He cheated on me actually. and f\*cked it up for.Key details for cheating in a text message. A simple message saying cheated did not detail why or circumstances beyond.|7.5|
|LLaMA2 with disentanglement|That's terrible. I'm sorry you're going through this. Do you have any idea why he did this? I'm sure you're heartbroken. I hope you can find closure soon. I'll be thinking of you. Is there anything I can do for you?|8.5|
|ChatGPT-4|it sounds like you might be feeling disappointed and betrayed. Such emotions are entirely natural in your situation. Could you share more about how you're feeling right now?|10.3|
|Emotion|confident|RougeL|
|--|--|--|
|Context|I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly.|-|
|Golden|I certainly hope so, Why do you want to move?|-|
|LLaMA2 only|I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly.I've done up the house as best as I could, and I'm confident it will sell quickly. I have improved the house with some decor|6.7|
|LLaMA2 with disentanglement|I hope it goes well. I'm sure it will. You've got a lot of heart. Do you have a lot of experience? I'm just starting out. I've never done anything like this before. I hope it works out. Do they have a lot in common with you? I hope so. |11.5|
|ChatGPT-4|It's great to hear that you've put so much effort into fixing up your house. Your confidence in its quick sale is a positive sign. How are you feeling about the whole process? Are there any aspects that you're particularly excited or nervous about?|7.01|
### Experiment
Discrete labels:

---
## 6/24 Formulation & Framework
### Motivation
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)


- Attempts to understand the dialogue context and generate the empathetic response from both the content view and the emotion view via disentanglement.
[Towards a Unified Framework of Contrastive Learning for Disentangled Representations, NIPS](https://papers.nips.cc/paper_files/paper/2023/hash/d5470483dd38f71f7bd9e68ce1b94145-Abstract-Conference.html)

- This paper extends the theoretical guarantees for disentanglement to a broader family of contrastive methods, while also relaxing the assumptions about the data distribution
### Framework Refinement

### Formulation
Objective:
$$
p(y,x_e|x) = p(y|x,x_e) \cdot p(x_{e}|x)
$$
To find $p(y∣x)$, marginalize over all possible values of $x_e$:
$$
p(y \mid x)=\int p(y, x_e \mid x) d x_e =\int p(y \mid x_e, x) p(x_e \mid x) d x_e
$$
Assume $S$ is the soft prompt produced by the transformer encoder $f_\theta$ and the MLP $f_\psi$:
$$
S = f_\psi(f_\theta(x))
$$
Therefore, we rewrite the conditional probability:
$$
\begin{aligned}
& p(y \mid x)=\int p(y \mid S, x) p(S \mid x) d S \\
& =\int p(y \mid S, x) \delta\left(S-f_\psi\left(f_\theta(x)\right)\right) d S \\
& =p\left(y \mid f_\psi\left(f_\theta(x)\right), x\right) \\
& =\prod_{t=1}^T p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right) \\
&=\exp \left(\sum_{t=1}^T \log p\left(y_t \mid y_{<t}, x, f_\psi\left(f_\theta(x)\right)\right)\right) \\
\end{aligned}
$$
$$
\mathcal{L}_{g}(x,y;\theta,\psi)=-\sum_{t=0}^T \log p_{\theta,\psi}(y_t|y_{<t},x,S)
$$
----
$$
\{h_c,h_{e}\} = f_\theta(x)
$$
$$
\{h'_c,h'_{e}\} = f_\theta(x')
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
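A sketch of $\mathcal{L}_{cl}$ using in-batch negatives (a common implementation choice; the formula above writes the negatives $\bar{h}_j^-$ explicitly). Names and shapes are assumptions:
```python
import torch
import torch.nn.functional as F

def contrastive_loss(h: torch.Tensor, h_pos: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style L_cl: h[i] and h_pos[i] (from the augmented text x') form a positive
    pair; every other h_pos[j] in the batch is treated as a negative for h[i]."""
    h = F.normalize(h, dim=-1)
    h_pos = F.normalize(h_pos, dim=-1)
    logits = h @ h_pos.t() / tau               # (N, N) pairwise similarities
    targets = torch.arange(h.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# usage: h_e from f_theta(x), h_e' from f_theta(x')
loss_cl = contrastive_loss(torch.randn(16, 768), torch.randn(16, 768))
```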


- Discrete Classifier
$$
-\mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\log p_{\theta, \phi}(x_e \mid x)\right]
$$
$$
\mathcal{L}_e(x,x_e;\theta,\phi)=-\sum_{i=0}^N \log p_{\theta,\phi}(x_{e_i}|x_i)
$$
- Continuous Classifier
First, assume that the model's prediction error (the difference between the predicted value $\hat{x}_e$ and the true value $x_e$) follows a normal distribution. For a given input $x$, the output $x_e$ can then be modeled by the following Gaussian:
log-likelihood of a Gaussian distribution
$$
p\left(\hat{x}_e \mid x_e\right)=\frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left(-\frac{\left(\hat{x}_e-x_e\right)^2}{2 \sigma^2}\right)
$$
Evaluation metric for the continuous classifier:
$$
\text{accuracy}=\frac{1}{N} \sum_{i=1}^N \mathbb{I}\left(\left|\hat{x}_{e_i}-x_{e_i}\right|<\epsilon\right)
$$
$\mathbb{I}$ : indicator function, $\epsilon$ : error tolerance
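A small sketch of this tolerance-based accuracy; whether the indicator is applied per VAD dimension or per vector is a design choice, and here it is averaged over all dimensions:
```python
import torch

def vad_accuracy(pred: torch.Tensor, gold: torch.Tensor, eps: float = 0.1) -> float:
    """(1/N) * sum_i I(|x_hat_{e_i} - x_{e_i}| < eps), averaged over the VAD dimensions."""
    hits = (pred - gold).abs() < eps          # element-wise indicator
    return hits.float().mean().item()

# usage: pred and gold are (N, 3) VAD vectors; eps is the chosen tolerance
acc = vad_accuracy(torch.rand(100, 3), torch.rand(100, 3), eps=0.1)
```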
- Notation
**$D = \{x,x_e,x_p,y \}$**
$x$ : Input text from the dataset
$x^{\prime}$ : Augmented text from $x$
$h_{c}$ : Contextual representation obtained from the encoder
$h_e$ : Emotional representation obtained from the encoder
$\hat{x}_e$ : Predicted emotion
$S$ : Soft prompt generated by the MLP
$\hat{y}$ : Predicted output
$f_\theta$ : Encoder
$f_\psi$ : MLP
$c_\phi$ : Classifier
- Function
$$
\{h_c,h_{e}\} = f_\theta(x)
$$
$$
\{h'_c,h'_{e}\} = f_\theta(x')
$$
$$
\hat{x}_{e} = c_\phi (\mathcal{P}(h_{e}))
$$
$$
S = f_\psi(h)
$$
$$
\hat y_t = p(y_t|y_{<t},x,S)
$$
- Training loss:
$$
\mathcal{L}_{\text {c }}=1-\frac{h_p \cdot h_c}{\left\|h_p\right\|\left\|h_c\right\|}
$$
$$
\mathcal{L}_e=-\sum_{i=0}^N \log p(x_{e_i}|x_i,h_{e_i})
$$
$$
\mathcal{L}_{cl} = -\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{\bar{h}_i \cdot \bar{h}_{i}^{+}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{\bar{h}_i \cdot \bar{h}_{j}^{-}}{\tau}\right)}\right)
$$
$$
\mathcal{L}_{g}=-\sum_{t=0}^T \log p(y_t|y_{<t},x,S)
$$
$$
\mathcal {L}_{total}=\gamma_{1}\mathcal {L}_c+ \gamma_{2}\mathcal {L}_{e}+\gamma_{3}\mathcal {L}_{cl}+ \gamma_{4}\mathcal {L}_{g}
$$
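A sketch of combining the four training losses into $\mathcal{L}_{total}$; the weights $\gamma_i$ are hyperparameters, and the interpretation of $h_p$ follows the $\mathcal{L}_c$ formula above:
```python
import torch
import torch.nn.functional as F

def content_loss(h_p: torch.Tensor, h_c: torch.Tensor) -> torch.Tensor:
    """L_c = 1 - cos(h_p, h_c), averaged over the batch."""
    return (1 - F.cosine_similarity(h_p, h_c, dim=-1)).mean()

def total_loss(l_c, l_e, l_cl, l_g, gammas=(1.0, 1.0, 1.0, 1.0)):
    """L_total = gamma1*L_c + gamma2*L_e + gamma3*L_cl + gamma4*L_g."""
    g1, g2, g3, g4 = gammas
    return g1 * l_c + g2 * l_e + g3 * l_cl + g4 * l_g

# usage with dummy per-loss values (placeholders, not measured numbers)
l_total = total_loss(torch.tensor(0.3), torch.tensor(0.8), torch.tensor(1.2), torch.tensor(2.5))
```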
### Experiment
- ablation study
- [dist](https://aclanthology.org/N16-1014.pdf)
#### Ablation study using the EMPATHETICDIALOGUES dataset
##### Using discrete classifier
Only 1/10 of the data is used to measure **coherence**
|Methods|BLEU|RougeL|dist-1|dist-2|PPL|
|--|--|--|--|--|--|
|w/o disentanglement|0.70|6.8|19.43|67.26||
|w/ disentanglement|0.75|7.8|9.40|48.95||
The whole dataset is used to measure the **disentanglement** part
|Methods|Accuracy|
|--|--|
|w/o contrastive learning|0.05|
|w/ contrastive learning|0.375|
##### Using continuous classifier
Only 1/10 of the data is used to measure **coherence**
|Methods|BLEU|RougeL|dist-1|dist-2|PPL|
|--|--|--|--|--|--|
|w/o disentanglement|0.74|7.4|18.78|69.48||
|w/ disentanglement|0.89|9.3|9.7|42.23|34.2|
The whole dataset is used to measure the **disentanglement** part
|Methods|Accuracy|
|--|--|
|w/o contrastive learning|0.13|
|w/ contrastive learning|0.44|
- Latent Space Analysis:
- Evaluate the representations in the latent space to ensure that the emotional and content aspects are effectively disentangled and that the emotional part is well-represented.
- Empathy and Coherence Metrics:
- Dist-n (distinct) scores are not good
----
## 6/18
### Flow chart

### Contribution
- **Objective**: To enhance the generation of coherent and empathetic responses in NLG tasks by leveraging contrastive learning and disentanglement representation.
- **Enhanced Disentanglement through Contrastive Learning**:
- Unlike previous methods that rely solely on labels to disentangle context and emotion, this approach uses the data itself for contrastive learning.
- By creating positive and negative pairs, the model learns to differentiate emotional expressions more effectively.
- **Soft Prompt Integration for Coherent and Empathetic Responses**:
- The disentangled semantic and emotional information is integrated as soft prompts.
- These soft prompts enhance the coherence and empathy of the generated responses, ensuring that they are contextually appropriate and emotionally aligned.
### Current measuring
- Disentanglement
- Classifier
- Coherence
- [Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset](https://aclanthology.org/P19-1534/)
- [A Computational Approach to Understanding Empathy Expressed in Text-Based Mental Health Support](https://aclanthology.org/2020.emnlp-main.425/)
---
## 6/12 Framework Refinement
### Scenario

### Refinement

### Current measuring
Disentanglement
- [t-sne](https://www.mropengate.com/2019/06/t-sne.html)
- [MEASURING DISENTANGLEMENT: A REVIEW OF METRICS](https://www.arxiv.org/pdf/2012.09276)

Classifier
- paper: [valence arousal](https://www.nature.com/articles/s42256-020-00280-0)
- github: [valence arousal](https://github.com/face-analysis/emonet)

---
## 6/5 New framework

---
## 5/29
[knowledge](https://aclanthology.org/2023.findings-acl.498.pdf)
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495) (SOTA)
- It is essential to model the content-emotion duality of a dialogue, which is composed of the content view and the emotion view.
- two different fully-connected networks are adopted to project the contextual representation H into two different spaces,
[Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models](https://arxiv.org/pdf/2402.11801)
- Previous work lacks the ability to deeply understand emotional and cognitive nuances, particularly in pinpointing fine-grained emotions and their triggers.
My proposed method:
X = E + S
X : Job interviews always make me sweat bullets, makes me uncomfortable in general to be looked at under a microscope like that
E :"sweat bullets", "uncomfortable", "looked at under a microscope"
S :"Job interviews always make me"
Y:Don't be nervous. Just be prepared.

---
## 5/22 New framework
:::success
**Suggestions**
More figures and more consistency
More detail about disentanglement and coherence
Parameters for each network
:::

[Data Augmentation for Emotion Detection in Small Imbalanced Text Data](https://arxiv.org/pdf/2310.17015.pdf)

Inspired by [Towards a Unified Framework of Contrastive Learning for Disentangled Representations](https://papers.nips.cc/paper_files/paper/2023/hash/d5470483dd38f71f7bd9e68ce1b94145-Abstract-Conference.html)


- papers
[Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning](https://arxiv.org/pdf/1805.08651)
[ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA](https://arxiv.org/pdf/2002.11537)
[Nonlinear ICA of Temporally Dependent Stationary Sources](https://proceedings.mlr.press/v54/hyvarinen17a/hyvarinen17a.pdf)
---
## 5/15 Address some points
**Data Augmentation for Emotional Enhancement**:
- To enable the model to better understand and generate empathetic responses.
**Contrastive Learning for Emotion Understanding**:
- To distinguish between different emotional expressions by pairing augmented sentences with their corresponding negative examples.
**Disentanglement of Semantic and Emotional Content**:
- This separation allows the model to focus independently on understanding the context and the underlying emotions, leading to more coherent and empathetic responses.
**Integration of Disentangled Information as Soft Prompts**:
- This approach guides the generation process, ensuring that the model's responses align more closely with the emotional requirements of the consultation system.
Current work:
Using [empathetic_dialogues](https://huggingface.co/datasets/empathetic_dialogues?row=7) to train this model.
Future work:
Using [AI project](https://docs.google.com/spreadsheets/d/1JyQPHHspKPhFLt-u0g3EqC7_rt8NU8al/edit?usp=sharing&ouid=101061530049693764160&rtpof=true&sd=true) to train this model.
### Survey papers
[E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation](https://aclanthology.org/2023.emnlp-main.653.pdf)(study)
Issue:
- Current approaches for empathetic dialogue generation mainly perceive an emotional label to generate an empathetic response conditioned on it, which simply treat emotions independently, but **ignore the intrinsic emotion correlation** in dialogues, resulting in inaccurate emotion perception and unsuitable response generation.
[Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements](https://arxiv.org/pdf/2310.05140)
[Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation](https://paperswithcode.com/paper/emotion-aware-transformer-encoder-for-1)
[A survey on empathetic dialogue systems](https://www.sciencedirect.com/science/article/pii/S1566253520303092?casa_token=EI6dcydBgr8AAAAA:Z5q9Vvoexd4HrsA-zlTPjyFZ5j3UjPhTtpN5p7V5mI5SuvydGIcljtN8OLqth2evBymmy0hbwXo)
[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)

- Attempts to understand the dialogue context and generate the empathetic response from both the content view and the emotion view via disentanglement.
:::spoiler Empathetic dialogue's 32 emotions:
(['sentimental', 'afraid', 'proud', 'faithful', 'terrified', 'joyful', 'angry', 'sad', 'jealous', 'grateful', 'prepared', 'embarrassed', 'excited', 'annoyed', 'lonely', 'ashamed', 'guilty', 'surprised', 'nostalgic', 'confident', 'furious', 'disappointed', 'caring', 'trusting', 'disgusted', 'anticipating', 'anxious', 'hopeful', 'content', 'impressed', 'apprehensive', 'devastated'])
:::
---
## 5/8
### Disentanglement table
|Methods|Papers|Contribution|Tasks or datasets|
|--|--|--|--|
|Contrastive VAE-based model|[Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation](https://openreview.net/attachment?id=t1ZPGMHyWL&name=pdf)|contrastive estimation with no external signals; sampling strategy for semantically similar and dissimilar views of the data.|video, audio and time series|
|Contrastive|[Self-Supervised Learning Disentangled Group Representation as Feature](https://proceedings.neurips.cc/paper_files/paper/2021/file/97416ac0f58056947e2eb5d5d253d4f2-Paper.pdf)|They ground the abstract semantics and the group acting on them into concrete contrastive learning.|images |
|Challenge|[NeurIPS 2019 Disentanglement Challenge](https://arxiv.org/pdf/2002.12356.pdf)|Two-stage challenge: sim-to-real transfer learning, and advancing disentangled representation learning to complicated physical objects|Simulated and real images of physical objects|
|Text style transfer VAE-based model|[Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders](https://aclanthology.org/2021.findings-emnlp.301.pdf)|We propose a Variational Autoencoder based method which models language features as discrete variables and encourages independence between variables for learning disentangled representations.|Yelp|
|Text style transfer VAE-based model|[An Evaluation of Disentangled Representation Learning for Texts](https://aclanthology.org/2021.findings-acl.170.pdf)|They proposes evaluation metrics tailored to the specific use-cases of disentangled representations in text generation; They describes empirical evaluations conducted on multiple datasets |PersonageNLG, GYAFC, Bible Datasets|
|Content-Emotion Duality|[Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation](https://arxiv.org/abs/2209.12495)|-|EmpatheticDialogues|
---
## 4/30 Data Augmentation for emotional sentences
:::warning
**Evidence**
- Disentanglement
- Coherence
:::
[Data Augmentation for Emotion Detection in Small Imbalanced Text Data](https://arxiv.org/pdf/2310.17015.pdf)
[AugEmotionDetection_github](https://github.com/A-Koufakou/AugEmotionDetection/tree/main)
- Easy Data Augmentation (EDA)
- Embeddings
- BART Paraphraser ProtAugment
- ChatGPT API
[EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks](https://arxiv.org/pdf/1901.11196)
[eda_nlp_github](https://github.com/jasonwei20/eda_nlp)

|Utterance|Emotion|EDA|
|-|-|-|
|Was this a friend you were in love with_comma_ or just a best friend?|sentimental|This a was champion you in love precisely with_comma_?|
|This was a best friend. I miss her|sentimental|This be a. admirer Unity.|
|Where has she gone?|sentimental|Ha? gone|
|Wait what are sweatings|afraid|Sudation what|
|it's quite strange that you didnt imagine it|proud|quite strange that didnt it suppose|
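A minimal sketch of two of the four EDA operations (random deletion and random swap); the full method also includes synonym replacement and random insertion via WordNet, and the parameters here are illustrative:
```python
import random

def eda_random_deletion(sentence: str, p: float = 0.2) -> str:
    """One of the four EDA operations: drop each word with probability p."""
    words = sentence.split()
    kept = [w for w in words if random.random() > p]
    return " ".join(kept) if kept else random.choice(words)

def eda_random_swap(sentence: str, n_swaps: int = 1) -> str:
    """Another EDA operation: swap two randomly chosen word positions n_swaps times."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    for _ in range(n_swaps):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(eda_random_deletion("This was a best friend. I miss her"))
print(eda_random_swap("Where has she gone?"))
```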
---
## 4/23 Relation
### Contrastive Disentanglement for Coherent Empathetic Dialogue

**Disentanglement in Empathetic Dialogues:**
- Improved Understanding:
- By separating emotional and contextual factors, disentanglement helps models better comprehend underlying emotions and situations.
- Enhanced Generative Capabilities:
- To generate emotionally appropriate and contextually relevant responses, fostering more coherent and empathetic dialogues.
**Novelty of Contrastive Learning Combined with Soft Prompt in Empathetic Dialogues:**
- Enhanced Contextual Understanding:
- To focus on specific contextual aspects while learning disentangled representations, deepening the understanding of dialogue context and promoting more contextually relevant responses.
- Improved Emotional Representation:
- Contrastive learning encourages models to discern features for different emotional states, complemented by soft prompts for refined emotional representations.
- Adaptability to Variations:
- Efficient Learning
- Generalizability:
- This can generalize across different domains and languages, capturing emotional and contextual dynamics in empathetic dialogues more effectively.
**The challenge involved two stages:**
- sim-to-real transfer learning
- advancing disentangled representation learning to complicated physical objects
[NeurIPS 2019 Disentanglement Challenge](https://arxiv.org/pdf/2002.12356.pdf)
[Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders](https://aclanthology.org/2021.findings-emnlp.301.pdf)
[Awesome Disentanglement in NLP](https://github.com/zhjohnchan/awesome-disentanglement-in-nlp)
## 4/16
### Papers
[Language Model Detoxification in Dialogue with Contextualized Stance Control](https://arxiv.org/pdf/2301.10368.pdf)
EMNLP 2022
Issue:
- Previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without consideration of the context.


$$
\begin{aligned}
& \mathcal{L}_{L M}=-\sum_{t=1}^{T} \log p\left(r_{t} \mid r_{<t}, c, t_{r}, m_{r}\right) \\
& \mathcal{L}_{s}=\mathbb{1}_{t_{c}=1} \max \left(m-d_{s}, 0\right)^{2} \\
& \mathcal{L}_{c}=\max \left(m-d_{c}, 0\right)^{2}
\end{aligned}
$$
- We propose a novel control framework that combines context-dependent and context-independent control utilizing hierarchical prefixes.
- We introduce novel contrastive training objectives to guide the meta prefixes to learn the control strategy implicitly.
[“Don’t Take This Out of Context!” On the Need for Contextual Models and Evaluations for Stylistic Rewriting](https://aclanthology.org/2023.emnlp-main.701.pdf) EMNLP 2023
[GreenLLaMA A Framework for Detoxification with Explanations](https://arxiv.org/pdf/2402.15951.pdf)
[InfoNCE Loss and Mutual Information Estimation](https://arxiv.org/pdf/1807.03748.pdf)
### Data preprocessing
|Toxic context|Clean context|Response|
|-|-|-|
|I can't afford to buy my fucking dog food.|I can not afford to buy my dogs any dog food.|Dogs should not be eating boogers. Feed them normal dog food, as some people may think you are mishandling your dogs if they see you do this.|
|Toxicity: 0.999|Toxicity: 0|-|
- Transforming the toxic sentence into the clean one may enhance coherence in the dialogue.
- It uses more polite language and expresses concern for all dogs rather than focusing solely on one.
- It can encourage understanding and lead to more helpful or supportive replies.
---
## 4/9 Framework


### Training model
Detoxifier:
- How to define $Z^-$ and $Z^+$
- Idea:
You are fucking handsome
You look nice
You are fucking fucking handsome
You are handsome
$$
\mathcal {L}_{InfoNCE}=-\frac{1}{N} \sum_{i=1}^N \log \left(\frac{\exp \left(\frac{q_i \cdot k_{i^{+}}}{\tau}\right)}{\sum_{j=1}^N \exp \left(\frac{q_i \cdot k_{j^{-}}}{\tau}\right)}\right)
$$
## 3/27 Measuring toxicity
[roberta_toxicity_classifier](https://huggingface.co/s-nlp/roberta_toxicity_classifier?text=We%27re+about+to+die+of+hunger%3B+we%27re+lost%3B+we+can%27t+leave+this+tree+without+being+swallowed+alive+by+that+monster.)
[RealToxicityPrompts](https://arxiv.org/pdf/2009.11462.pdf)
- [PERSPECTIVE API](https://support.perspectiveapi.com/s/about-the-api-score?language=en_US)

[Classification of social media Toxic comments using Machine learning models](https://arxiv.org/ftp/arxiv/papers/2304/2304.06934.pdf)

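A hedged sketch of scoring toxicity with the `s-nlp/roberta_toxicity_classifier` checkpoint linked above, using the standard Hugging Face `pipeline` API (this is a generic usage pattern, not code from the classifier's own repository):
```python
from transformers import pipeline

# Off-the-shelf toxicity classifier; returns a label ("toxic"/"neutral") and a score.
toxicity = pipeline("text-classification", model="s-nlp/roberta_toxicity_classifier")

for text in [
    "I can not afford to buy my dogs any dog food.",
    "Somebody drop you on your head?",
]:
    result = toxicity(text)[0]
    print(f"{result['label']:>8}  {result['score']:.3f}  {text}")
```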
### Idea
$P(toxic∣word)= \frac{1}{1+e^{-z}}$
- $z = \beta_0 + \beta_1x_1 +...+ \beta_nx_n$
- $\beta$ : parameter of model learning
Score the word
- $f(W) \geq \theta$
- The threshold $\theta$ can be determined based on empirical observations, domain expertise, or community standards.
**Keep thinking, more detail.......**
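A toy sketch of the word-level logistic-regression idea above; the corpus and labels are made up, and a real model would be trained on a labelled toxicity dataset such as Jigsaw:
```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny illustrative corpus (placeholder data, not a real training set).
texts  = ["you are an idiot", "have a nice day", "shut up you fool", "thanks for your help"]
labels = [1, 0, 1, 0]                                   # 1 = toxic, 0 = clean

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)   # learns beta_0 ... beta_n

def word_toxicity(word: str) -> float:
    """P(toxic | word): sigmoid score for a single word treated as its own input."""
    return float(clf.predict_proba(vec.transform([word]))[0, 1])

theta = 0.5                                             # threshold from the note above
for w in ["idiot", "day"]:
    print(w, round(word_toxicity(w), 3), "flagged" if word_toxicity(w) >= theta else "ok")
```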
### How to evaluate the detoxifier is good?
[A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification](https://aclanthology.org/2022.humeval-1.8.pdf)
[Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models](https://proceedings.neurips.cc/paper_files/paper/2022/file/e8c20cafe841cba3e31a17488dc9c3f1-Paper-Conference.pdf)
## 3/20 Example
:::warning
Example
Contrastive learning table
:::
### Example 1
Toxic Sentence:
- Original Sentence (S): “Somebody drop you on your fucking head?”
- Toxicity Score: High (above threshold)
Clean Sentence:
- Cleaned Sentence (S*): “Someone dropped you on your head?”
- Toxicity Score: Low (below threshold)
Related Response:
- Response :
1. "No, I don't recall anyone dropping me on my head. Is there a reason you're asking?"
2. "That's not a very kind thing to say. No, nobody has dropped me on my head. Is there something you'd like to discuss?"
### Example 2
Noised Sentence
- I tred a bit of shadowboxing today. Got one guy called Jaal on the chin and anther called Tyson betwen the eyes
Clean Sentence:
- I tried a bit of shadowboxing today. Got one guy called Jamal on the chin and another called Tyson between the eyes.
Related Response:
- This isn't a joke about black people, right? If it is, it isn't funny.
### Problem
No seq2seq datasets
Consistency
### Contrastive learning table
|Papers|Contribution|Novelty|
|--|--|--|
|[Contrastive Decoding: Open-ended Text Generation as Optimization](https://aclanthology.org/2023.acl-long.687.pdf)|-|-|
|[PiCO: Contrastive Label Disambiguation for Partial Label Learning](https://openreview.net/pdf?id=EhYjZy6e1gJ)|-|-|
|[Controlled Text Generation with Hidden Representation Transformations](https://arxiv.org/abs/2305.19230)|It steers large language models to generate text pertaining to certain attributes;It modifies the hidden representation of the base model through learned transformations.|-|
|[Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation](https://arxiv.org/pdf/2401.03468.pdf)|Pre-training for downstream tasks; a single model improves multimodal representation through contrastive inter- and intra-modal learning.|-|
|[Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning](https://github.com/chujiezheng/Click)|-|-|
|[Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models](https://aclanthology.org/2023.emnlp-main.124.pdf)|(1)Denoising Autoencoder (DAE), (2) we adjust the representation space of the output to the corresponding classes by clustering through a Contrastive Learning (CL) method and data augmentation, (3) we apply fine-tuning to delimit the predefined categories.|3-Phase fine-tuning
|[Parameter-Efficient Detoxification with Contrastive Decoding](https://arxiv.org/pdf/2401.06947.pdf)|-|-|
|[CONTRASTIVE LEARNING FOR LOW-LIGHT RAW DENOISING](https://arxiv.org/pdf/2305.03352.pdf)|-|Loss|
|[CONT: Contrastive Neural Text Generation](https://arxiv.org/pdf/2205.14690.pdf) NIPS 2022|The construction of contrastive examples, the choice of the contrastive loss, and the strategy in decoding.|N-pairs loss|
### How to measure toxicity?
### Clean to toxic
https://huggingface.co/datasets/s-nlp/paranmt_for_detox (corpus)
[TOXIGEN](https://aclanthology.org/2022.acl-long.234.pdf)
---------
## 3/13
:::warning
Figure more prob.. ---running
Define what is toxic ---running
Use AI project datasets
- [Emotional-Support-Conversation](https://github.com/thu-coai/Emotional-Support-Conversation)
Algo on contrastive Detoxifier ---running
:::

### Toxic define
- $S_{clean}$ = $\{x_1,x_2,....\}$
$S_{toxic}$ = $\{x_1,\tilde{x}_2,....\}$
- $P(\tilde{x}_t \neq x_t| x_{<t})$
$P(\tilde{x}_t | x_{<t},c)$
Contrastive learning:
- $Z^{+}$ = $\{(z_1,z_2,...z_n)\}$
$Z^{-}$ = $\{(z_1',z_2',...z_n')\}$
$$
L=-\frac{1}{N} \sum_{i=1}^N \log \frac{\exp \left(\operatorname{sim}\left(f\left(Z_i^{+}\right), f\left(Z_i^{-}\right)\right) / \tau\right)}{\sum_{j=1}^N \exp \left(\operatorname{sim}\left(f\left(Z_i^{+}\right), f\left(Z_j^{-}\right)\right) / \tau\right)}
$$
- Assume ground truth : "I wanted to prank others like that"
1. $Z^{+}$:
$$
Z^{+}={\text{('wanted', 'prank', 'others', 'like')}}
$$
2. $Z^{-}$:
$$
Z^{-}={\text{('wated', 'prnk', 'othrs', 'lik')}}
$$
Think:
To reduce toxicity, the positive model is fine-tuned on a non-toxic corpus while the negative model is fine-tuned on a toxic corpus.
$$
\mathrm{BCE}\left(\hat{y}_i, y_i\right)=-\left(y_i \cdot \log \left(\hat{y}_i\right)+\left(1-y_i\right) \cdot \log \left(1-\hat{y}_i\right)\right)
$$
### Metric
BLEU
[Evaluating Coherence in Dialogue Systems using Entailment](https://aclanthology.org/N19-1381.pdf)
### Some consideration
Word-based contrastive:
Efficiency: Focusing solely on toxic and clean words makes it easier to capture subtle differences in these key features.
Loss of context:
Disregarding the entire sentence may result in the loss of certain contextual information, potentially affecting the model's understanding of toxicity in specific contexts.
-------
[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)

- The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.
### TODO
Find some contrastive learning method
---------
## 3/4 Datasets
1. Datasets
2. Hypothesis
3. Baseline?
4. Method
- Prepare dialogue datasets.
- [prosocial-dialog](https://huggingface.co/datasets/allenai/prosocial-dialog)
- Similarity Score: 0.965
- Similarity Score: 0.920 (textattack)
- [daily_dialog](https://huggingface.co/datasets/daily_dialog)
- [paradetox](https://huggingface.co/datasets/s-nlp/paradetox?row=0) (toxic to clean)
- Attack the input
- https://github.com/QData/TextAttack
- Style Transfer
- Contextual Anomalies
- How to measure the coherence
- [Automatic Evaluation of Text Coherence: Models and Representations](https://www.ijcai.org/Proceedings/05/Papers/0505.pdf)
Word-based similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\frac{2\left|\operatorname{words}\left(S_{1}\right) \cap \operatorname{words}\left(S_{2}\right)\right|}{\left(\left|\operatorname{words}\left(S_{1}\right)\right|+\left|\operatorname{words}\left(S_{2}\right)\right|\right)}
\end{aligned}
$$
Distributional similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\cos \left(\mu\left(\vec{S}_{1}\right), \mu\left(\vec{S}_{2}\right)\right) \\
& =\frac{\sum_{j=1}^{n} \mu_{j}\left(\vec{S}_{1}\right) \mu_{j}\left(\vec{S}_{2}\right)}{\sqrt{\sum_{j=1}^{n}\left(\mu_{j}\left(\vec{S}_{1}\right)\right)^{2}} \sqrt{\sum_{j=1}^{n}\left(\mu_{j}\left(\vec{S}_{2}\right)\right)^{2}}}
\end{aligned}
$$
Taxonomy-based similarity:
$$
\begin{aligned}
& \operatorname{sim}\left(S_{1}, S_{2}\right)=\frac{\sum_{\substack{w_{1} \in S_{1} \\
w_{2} \in S_{2}}} \underset{\substack{c_{1} \in \operatorname{senses}\left(w_{1}\right) \\
c_{2} \in \operatorname{senses}\left(w_{2}\right)}}{\operatorname{argmax}} \operatorname{sim}\left(c_{1}, c_{2}\right)}{\left|S_{1}\right|\left|S_{2}\right|}
\end{aligned}
$$
co-occurrence statistics in a Wordnet corpus.
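A small sketch of the first two similarity measures (word overlap and cosine similarity); here the vectors are simple bags of words, whereas the cited paper uses averaged word embeddings:
```python
import math
from collections import Counter

def word_overlap_sim(s1: str, s2: str) -> float:
    """Word-based similarity: 2*|words(S1) ∩ words(S2)| / (|words(S1)| + |words(S2)|)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return 2 * len(w1 & w2) / (len(w1) + len(w2))

def cosine_sim(v1: dict, v2: dict) -> float:
    """Distributional similarity: cosine between sentence vectors (bag-of-words counts here)."""
    dot = sum(v1[w] * v2.get(w, 0.0) for w in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

s1, s2 = "the cat sat on the mat", "a cat sat on a mat"
print(word_overlap_sim(s1, s2))
print(cosine_sim(Counter(s1.split()), Counter(s2.split())))
```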
- [Evaluating Coherence in Dialogue Systems using Entailment](https://aclanthology.org/N19-1381.pdf)
- [Coherent **Long** Text Generation by Contrastive Soft Prompt](https://aclanthology.org/2022.gem-1.42.pdf)
- Some model:
- [keyphrase-extraction-kbir-inspec](https://huggingface.co/ml6team/keyphrase-extraction-kbir-inspec)
- **Hyp**: Whether a sentence is toxic or not may or may not affect the output sentence; feed (toxic $S$, non-toxic $S^*$) pairs into the decoder and compare the results.
- Datasets: Jigsaw, [paradetox](https://huggingface.co/datasets/s-nlp/paradetox); Detoxifier: [bart-base-detox](https://huggingface.co/s-nlp/bart-base-detox)
- compare perplexity -- **done**
- Average(1000) Perplexity for Detox Sentences: **11.123**
- Average(1000) Perplexity for Toxic Sentences: 15.875
- Similarity -- **done**
- 
- Compare BERTScore? Coherence?
- **Detoxifier** baseline
- [bart-base-detox ](https://huggingface.co/s-nlp/bart-base-detox)
- BERT
- contrastive
- prompt
- How to measure whether the detoxification is good?
Feed the data into the detoxifier and inspect the outputs?
[COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification](https://aclanthology.org/2023.findings-emnlp.579.pdf)
[Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise](https://arxiv.org/pdf/2212.11685.pdf)
### CDCG: Contrastive Denoiser for Coherent text Generation
-----------------------
### Objective function
$S = \{X_1,X_2\}$
$$
Z=\operatorname{argmax}_i\; p(w_i \mid S)
$$
$Z^+$ = ?
$L_{\text{toxic}} - \lambda \cdot L_{\text{adv}}$
$Z^-$ = ?
Contrastive learning??
Final extraction:
$L_{\text{detox}} + \alpha \cdot L_{\text{keyphrase}}$
After Coherent:
$\mathcal{L}_{coherent}=\mathbb{E}_{\left(x, y\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(Y \mid S^*,K\right)\right)\right]$
$\mathcal{L}_{\text{detox}} - \beta \cdot \mathcal{L}_{\text{Coherent}}$
## 2/17 Draft
### Objective function
$L_{\text{toxic}} - \lambda \cdot L_{\text{adv}}$
$L_{\text{detox}} + \alpha \cdot L_{\text{keyphrase}}$
$\mathcal{L}_{coherence}=\mathbb{E}_{\left(x, y\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(Y \mid S^*,K\right)\right)\right]$
### Algorithm
**Input:**
- Toxic sentence: $S$
- Language Model: $LM$
**Procedure:**
1. **Initialization:**
- Initialize parameters: ${\Theta}=\{\theta, \phi_+, \phi_-\}$.
- Set iteration counter: $i = 1$.
2. **Adversarial Learning:**
- **While** stopping criterion not met:
- Generate positive example: $S_+ \leftarrow LM_+(S)$
- Generate negative example: $S_- \leftarrow LM_-(S)$
- Update parameters: ${\Theta} \leftarrow \text{Fine-tune}({\Theta}, S, S_+, S_-)$
- $i \leftarrow i + 1$
3. **Generate Non-toxic Sentence and Keyphrase:**
- $S^*, K \leftarrow \text{DetoxifyAndExtractKeyphrase}(S, LM, {\Theta})$
4. **Decode using llama-2:**
- Final Answer: $\text{Answer} \leftarrow \text{Decoder}(S^*, K)$
5. **Enhance Coherence:**
- $S^* \leftarrow \text{EnhanceCoherence}(S^*, \text{Y})$
**Output:**
- Non-toxic sentence: $S^*$
- Most important keyphrase: $K$
- Final Answer: $\text{Y}$
### Exp
Hypothesis:
- Whether a sentence is toxic or not may or may not affect the output sentence.
- Feed (toxic $S$, non-toxic $S^*$) pairs into the decoder and compare the results; compare BERTScore? Coherence?
<!-- Test feeding the toxic sentence directly into the pretrained model -->
### Draft
[reference](https://arxiv.org/pdf/2304.06359.pdf)
[ICASSP](https://www.overleaf.com/6562376881wgsngwjtdbsh#fe3c55)
<!-- No ideas for coherence yet; the experiment part is not finished -->
:::info
Strengths:
- Can be effective for learning representations that capture semantic similarities and differences.
- Useful for tasks where understanding the relationships between data points is crucial.
:::
## 1/30 CDSC: Contrastive Detoxification and Semantic Coherence
### Objective function
Preserve the original input sentence meaning (coherence):
$$
\mathcal{L}_{L M}=\mathbb{E}_{\left(x, y^*\right) \sim \mathcal{D}}\left[-\log \left(p_\theta\left(y=y^* \mid x\right)\right)\right]
$$
(Detox)
$$
\begin{array}{c}
\text { Paraphraser } \\
P_{L M}\left(y_t \mid y_{<t}, x\right)
\end{array}
$$
$$
\begin{array}{c}
\text { Toxic } \\
P_{L M}\left(y_t \mid y_{<t}, toxic\right)
\end{array}
$$
$$
\begin{array}{c}
\text { Normal } \\
P_{L M}\left(y_t \mid y_{<t}, safe\right)
\end{array}
$$
Datasets:
[ParaDetox](https://aclanthology.org/2022.acl-long.469.pdf)
[real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
### Coherent table
|Method|Papers|Contribution|Datasets|
|--|--|--|--|
|**Knowledge-driven**|[Learning to Copy Coherent Knowledge for Response Generation (AAAI-21)](https://ojs.aaai.org/index.php/AAAI/article/view/17486)|(1)Knowledge Discernment, (2)dialog goal and the dialog context, (3)Context Manager $L(\theta)=L_{N L L}(\theta)+L_{B O W}(\theta)+L_{K L}(\theta)$|DuConv and DuRecDial|
|-|[Knowledge-based Review Generation by Coherence Enhanced Text Planning](https://dl.acm.org/doi/pdf/10.1145/3404835.3462865?casa_token=kXswZr_T7wQAAAAA:6aG4XVLqlA7o85naOHzPVJJ8onSqhWrdj6bGxGm9i-ktVmU1F2Wq30acJ2zHC5aa3WVkaztvadDfzA)|(1) the document plan is modeled as a sequence of sentence plans in order, (2) the sentence plan is modeled as an entity-based subgraph from KG.|Amazon Electronic, Book, and IMDb Movie |
|Hybrid Latent Variables|[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)|The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.|DailyDialog and Opensubtitles|
|Diffusion|[Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models](https://openreview.net/attachment?id=17YbAlc1tW&name=pdf)|without introducing mismatches, Bayesian framework to jointly modify both revealed and unrevealed regions|CelebA-HQ and ImageNet-1K|
|Discourse (High-Level Language Representation)|[Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence](https://arxiv.org/pdf/2105.08963.pdf)|It can represent the prefix sentences at **sentence level and discourse level** in the decoding process; They propose two pretraining objectives to learn the representations by predicting inter-sentence semantic similarity and distinguishing between normal and shuffled sentence orders.|WritingPrompts and ROC|
|Discourse-level|[DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence](https://arxiv.org/pdf/2201.11176.pdf)|DiscoScore (a kind of metrics) strongly correlates with human rated coherence.|RC and LC and Lexical Chain|
|GANs|[TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation](https://aclanthology.org/2021.findings-acl.428.pdf)|They improve local and global coherence, we explicitly introduce a **multi-scale discriminator** to capture the semantic information at varying scales among the sequence of hidden representations encoded by Transformer.|MSCOCO, WMTNEWS and ROC-STORY|
|**Contrastive learning**|[Coherent **Long** Text Generation by Contrastive Soft Prompt](https://aclanthology.org/2022.gem-1.42.pdf)|It learns text representations in the hidden space for better planning long text generation; (**Similar to my idea**); Better than HINT|ROCStories and WritingPrompts|
|-|[CONT: Contrastive Neural Text Generation](https://proceedings.neurips.cc/paper_files/paper/2022/file/0f5fcf4bff73a3537e0813a38f0d3f76-Paper-Conference.pdf)|(1)Contrastive Examples from Predictions (2) N-Pairs Contrastive Loss (3) Inference with Learned Similarity Function |MT, XSum, Code Comment Generation, Data-to-text Generation, Commonsense Generation|
|-|[Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework](https://ojs.aaai.org/index.php/AAAI/article/view/26509)| We propose a contrastive framework to learn the state representations in a discrete space, and insert additional attention layers into the decoder to better exploit these states.|Wikiplots and CNN News|
## TODO
Contrastive learning table
Run some inference
## 1/23 CDCE: Contrastive Detoxification and Coherent Enhancement
### Detox table
|Method|papers|contribution|datasets|
|--|--|--|--|
|Diffusion|[DiffuDetox: A Mixed Diffusion Model for Text Detoxification](https://arxiv.org/pdf/2306.08505.pdf)|(1)conditional model reduces its toxicity (2)unconditional model guide the sampling process|
|Denoise|[Towards a Better Understanding of Noise in Natural Language Processing](https://aclanthology.org/2021.ranlp-1.7.pdf)|-|
|BERT|[Text Detoxification using Large Pre-trained Neural Models](https://arxiv.org/pdf/2109.08914.pdf)|(1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer (**similar to my idea**); conditional BERT|Jigsaw|
|--|[Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings](https://arxiv.org/pdf/2112.08346.pdf)|(1)We propose a method to generalize toxic directions in the latent space.(2) We also provide a methodology for constructing parallel datasets using a context based word masking system.|
|--|[A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification](https://aclanthology.org/2022.humeval-1.8.pdf)|We conducted an evaluation of detoxification models for Russian using both automatic and manual metrics.
|**Prompt**|[Prompt Tuning for Text Detoxification](https://www.dialog-21.ru/media/5735/konodyukn120.pdf)|We conduct experiments to determine the optimal length of trainable prompt for the task.|
|--|[You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content](https://arxiv.org/pdf/2308.05596.pdf)|(1) Toxicity Classification (2) Toxic Span Detection (3) Detoxification|
|**Constrastive learning**|[COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification](https://aclanthology.org/2023.findings-emnlp.579.pdf)|They contrast the gold standard rephrasing with the identity input-tooutput mapping to effectively isolate and focus learning on non-toxic style transfer|ParaDetox、APPDIA|
|--|[Parameter-Efficient Detoxification with Contrastive Decoding](https://arxiv.org/pdf/2401.06947.pdf)|They leverages the frozen weights of the language model itself and only introduces a tiny portion of new model parameters to detoxify generation.|
|Context-aware|[CMD: a framework for Context-aware Model self- etoxification](https://arxiv.org/pdf/2308.08295.pdf)|
|-|[Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts](https://aclanthology.org/2023.acl-short.21.pdf)|MARCO uses likelihoods under a non-toxic LM (expert) and a toxic LM (anti-expert) to find candidate words to mask and replace.|Social Bias Frames|
|GreenLLaMA|[GreenLLaMA A Framework for Detoxification with Explanations](https://arxiv.org/pdf/2402.15951.pdf)|Black magic|ParaDetox|
### Todo
Finish the table above.
How to show the objective function.
$$
\mathcal{L}_{pos}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)=-\log \left(\frac{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)}{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)+\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}\right)
$$
$$
\mathcal{L}_{neg}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)=-\log \left(\frac{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}{\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{+}\right)\right)+\exp \left(\operatorname{cos\_sim}\left(\mathbf{Z}, \mathbf{Z}^{-}\right)\right)}\right)
$$
## 1/16 "CLDetox: Contrastive Learning for Detoxification and Coherence Enhancement"
### Survey
[DiffuDetox: A Mixed Diffusion Model for Text Detoxification](https://arxiv.org/pdf/2306.08505.pdf)

Contribution:
- The conditional model takes toxic text as the condition and **reduces its toxicity**, yielding a diverse set of detoxified sentences. (detoxify)
- The unconditional model is trained to recover the input text, which allows the introduction of additional fluent text for training and thus **ensures text fluency**. (guide the sampling process)
Limitation:
- Sampling requires both a conditional and an unconditional model, which results in **slower inference**.
- progressive distillation
- The **diversity** of generative models is **degraded** as $w$ increases.
- Ideally we would be able to have a model that improves upon the fluency as well as the model diversity
### Architecture

### Datasets
[real-toxicity-prompts](https://huggingface.co/datasets/allenai/real-toxicity-prompts)
:::info
Feedback
1. The overall figures need to be consistent
2. Parameter learning: clarify which parameters are updated and by which component
3. Work out the execution flow and algorithm first, then update the architecture figure (rework the encoder part and wrap it as a contrastive module)
4. Organize the detox and coherence tables
5. Objective functions for detox and coherence
6. How to do contrastive learning
7. How to apply detoxification in the AI project
:::
## 1/10 Enhancing consistency in text generation through contrastive learning
### Coherence and paraphrasing.
#### Coherence:
[Learning to Copy Coherent Knowledge for Response Generation (AAAI-21)](https://ojs.aaai.org/index.php/AAAI/article/view/17486)

[Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23)](https://ojs.aaai.org/index.php/AAAI/article/view/26594)

- The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses.
#### paraphrasing
[Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition (EMNLP-21)](https://aclanthology.org/2021.emnlp-main.430.pdf)
- We convert Conditional Random Field (CRF) into a **multi-label classification module** and encourage consistency on the entity appearance between the original and paraphrased sequences.
#### Problem
1. Incorporate transfer learning or other forms of learning into dialogue systems to enhance the quality of generated responses
2. Incorporating external knowledge sources.
3. It is not clear how the model's latent variables correspond to different aspects of the generated responses.
### Others' previous tasks
story generation
- stories using abstract as outline
- [Consistency and Coherency Enhanced Story Generation](https://arxiv.org/pdf/2010.08822.pdf)
### My preliminary idea
I want to maintain consistency in output even with poor input.
- I want to train a model to generate coherent responses based on input sentences with similar meanings but expressed differently.
Objective:
$$
sim(f(x_1), f(x_2))
$$
Loss:
$$
\mathcal{L}(x_1, x_2) = \max(0, m - \text{Similarity}(f(x_1), f(x_2)) + \text{Similarity}(f(x_1'), f(x_2)))
$$
$$
\mathcal{L}(x_1, x_2) + \alpha \cdot C(x_1, y_2) + \beta \cdot C(x_1, y_2)
$$
m is a margin, a hyperparameter that controls the minimum acceptable difference in similarity.
$C$ is Consistency Metric.
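A sketch of this margin loss in its standard triplet form (the similarity of the paraphrase pair $(x_1, x_2)$ pushed above the similarity of the negative pair by at least $m$); tensor shapes and names are assumptions:
```python
import torch
import torch.nn.functional as F

def margin_consistency_loss(f_x1, f_x2, f_x1_neg, m: float = 0.5):
    """max(0, m - sim(f(x1), f(x2)) + sim(f(x1'), f(x2))), averaged over the batch."""
    pos = F.cosine_similarity(f_x1, f_x2, dim=-1)
    neg = F.cosine_similarity(f_x1_neg, f_x2, dim=-1)
    return torch.clamp(m - pos + neg, min=0).mean()

# usage with random placeholder representations
loss = margin_consistency_loss(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 256))
```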
Because of the lack of correct answers in this task:
- Contrastive Learning
- Self-Supervised Learning
### Todo
Semantic similarity in NLG.
Key Information Extraction.
Contrastive learning.
- [Controlled Text Generation with Hidden Representation Transformations](https://arxiv.org/pdf/2305.19230.pdf)
Datasets
:::warning
Feedback
Ask gpt to generate good prompt and bad prompt to train the model.
Push the distance between the good output and the victim output further apart.
Address coherence.
Address what toxicity means.
objective function
big picture
:::
---------
## 12/26 Enhancing NLG Consistency
### Title
"Enhancing NLG Consistency Across Diverse Inputs Using Data Augmentation and Keyword-Driven Prompts"
"CID: **C**onsistent NLG with **I**nput **D**iversity using Data Augmentation and Keyword-Driven Prompts"
### Problem definition

Data Augmentation

Inference Example
Input: I'm currently immerse in deep research of nature language generation task.
ANS If you have any specific questions or if there's a particular aspect of your research you'd like to discuss, feel free to share. I'm here to assist you in your endeavors related to natural language generation.
Input :I concentrating to address the various challenges brings by natural language generation.
**The output should be consistent even when the input varies**
#### why this task is an issue
**Real-world Application Scenarios:**
- NLG systems often encounter diverse inputs from different users or contexts.
- Effectively handling this diversity and generating consistent outputs can better meet user requirements, enhancing the practicality of the system.
**Robustness and Generalization:**
- Considering the diversity of inputs in the real world, making NLG models more robust and capable of generalization is crucial.
- Introducing diverse inputs during training and emphasizing consistency can assist the model in adapting better to a variety of situations.
**Reduced Bias:**
- Denoising can help reduce biases present in the input, promoting fairness and equity in the generated content.
### Previous tasks
[Semantic Accuracy in Natural Language Generation: A Thesis Proposal](https://aclanthology.org/2023.acl-srw.48.pdf)
- They proposed a unified benchmark for NLG metrics focusing on semantic accuracy
Prompt?
[AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts](https://arxiv.org/pdf/2010.15980.pdf)

[Towards a Better Understanding of Noise in Natural Language Processing](https://aclanthology.org/2021.ranlp-1.7.pdf)

Self-supervised-learning
- SimCLR
Disentangled representation learning for text and emotion or keywords?
- This aims to capture the different dimensions of variation of a text in separate vector embeddings.
### Idea
Disentanglement-based models offer two main advantages:
1. Sampling from the latent space of the style embeddings allows for more diverse and controlled stylistic generation.
2. Similarity of documents can now be calculated for each aspect of variation, allowing for finer-grained retrieval.
Objective $$p(y \mid x_1)=p(y \mid x_2)$$
Problem $$\prod_{t=1}^{T} p(y_t \mid y_{<t},x,c)$$
$c$ can be the keyword condition
### Challenge
Not enough datasets:
- Using autoencoder to generate the similar sentences.
How to extract the keywords
How to know the inputs are the same
:::danger
**Feedback**:
Title, novelty, method
- Can't just combine prompting and keyword extraction
Previous work
Fix the equation
:::
-----------------------