Some baseline papers:
Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation
CTSM
CEM
CASE
Supervised CL
Enhanced Coherence-Aware Network with Hierarchical Disentanglement for Aspect-Category Sentiment Analysis
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Harnessing the Power of Large Language Models for Empathetic Response Generation: Empirical Investigations and Improvements
Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation
https://aclanthology.org/2023.findings-acl.498.pdf
Note
Formulation
Evidence
Google link
GitHub (code only, no parameters)
draw.io: all figure files
Weight parameters stored separately; access them via a link
Since , through , where
Marginalizing over
Rewriting with soft prompt () and incorporating delta function
Disentanglement of Semantic and Emotional Content:
Contrastive Learning for Emotional Context: (see the loss sketch after this list)
Integration of Disentangled Information as Soft Prompts:
VAD-Based Emotion Analysis:
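For the contrastive-learning item above, here is a minimal InfoNCE-style sketch of a contrastive loss over emotional representations, assuming one (utterance, augmented utterance) positive pair per batch item. The function and tensor names (`emotion_info_nce`, `emo_repr`) are illustrative, not the actual training code.

```python
import torch
import torch.nn.functional as F

def emotion_info_nce(emo_repr, emo_repr_aug, temperature=0.07):
    """Minimal InfoNCE-style loss: the emotional representations of an utterance and
    its augmented view are pulled together; other items in the batch act as negatives.
    emo_repr, emo_repr_aug: (batch, dim) emotional representations from the encoder."""
    z1 = F.normalize(emo_repr, dim=-1)
    z2 = F.normalize(emo_repr_aug, dim=-1)
    logits = z1 @ z2.t() / temperature                    # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
loss = emotion_info_nce(torch.randn(8, 256), torch.randn(8, 256))
```

In practice the temperature and the choice of negatives (in-batch vs. emotion-label-aware) are the main design knobs.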
Methods | Papers | Contribution | Advantage |
---|---|---|---|
Contrastive learning | CTSM: Combining Trait and State Emotions for Empathetic Response Model (SOTA) | Emotion guidance module, cross-contrastive learning decoder | 1. Integrates both trait and state emotions. 2. Utilizes a cross-contrastive learning approach for better guidance. |
Disentanglement | Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation | The representation of the history context is disentangled onto the content space and the emotion space with two auxiliary constraints based on the emotion label. | 1. Clear separation of content and emotion. 2. Auxiliary constraints improve the disentanglement process. |
Graphical network | CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation | It first builds upon a commonsense cognition graph and an emotional concept graph, and then aligns the user's cognition and affection at both the coarse-grained and fine-grained levels. | 1. Multi-level alignment of cognition and emotion. 2. Incorporates commonsense and emotional concept graphs. |
My proposed method | - | - | 1. Separates semantic and emotional content for nuanced and effective communication. 2. Uses contrastive learning and a classifier for emotional context. 3. Enhances coherent and empathetic responses. 4. Uses VAD to analyze the context's emotion. |
Emotion | joyful | ROUGE-L |
---|---|---|
Context | I could have. But I didn't. He cheated on me actually. | - |
Golden | Oh I see, why are you still with him then? | - |
LLaMA-2 only | I could have. But I didn't. He cheated on me actually. and f*cked it up for.Key details for cheating in a text message. A simple message saying cheated did not detail why or circumstances beyond. | 7.5 |
LLaMA-2 with disentanglement | That's terrible. I'm sorry you're going through this. Do you have any idea why he did this? I'm sure you're heartbroken. I hope you can find closure soon. I'll be thinking of you. Is there anything I can do for you? | 8.5 |
ChatGPT-4 | It sounds like you might be feeling disappointed and betrayed. Such emotions are entirely natural in your situation. Could you share more about how you're feeling right now? | 10.3 |
Emotion | confident | ROUGE-L |
---|---|---|
Context | I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly. | - |
Golden | I certainly hope so, Why do you want to move? | - |
LLaMA-2 only | I have fixed up house to the best of my ability. I feel very certain that it will sell very quickly.I've done up the house as best as I could, and I'm confident it will sell quickly. I have improved the house with some decor | 6.7 |
LLaMA-2 with disentanglement | I hope it goes well. I'm sure it will. You've got a lot of heart. Do you have a lot of experience? I'm just starting out. I've never done anything like this before. I hope it works out. Do they have a lot in common with you? I hope so. | 11.5 |
ChatGPT-4 | It's great to hear that you've put so much effort into fixing up your house. Your confidence in its quick sale is a positive sign. How are you feeling about the whole process? Are there any aspects that you're particularly excited or nervous about? | 7.01 |
Discrete labels:
Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation
Towards a Unified Framework of Contrastive Learning for Disentangled Representations, NeurIPS
Objective:
To find , marginalize over all possible values of :
Assume S is the soft prompt produced by the transformer encoder and the MLP layers:
Therefore, we rewrite the conditional probability:
log-likelihood of a Gaussian distribution
: indicator function
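Since the inline equations above did not survive extraction, here is a hedged reconstruction of the intended derivation, with assumed symbols: $X$ the input context, $Y$ the response, $S$ the soft prompt, and $f_\theta$ the encoder plus MLP.

```latex
% Sketch only; the symbols are assumptions, not the original notation.
\begin{aligned}
  p_\theta(Y \mid X) &= \int p_\theta(Y \mid X, S)\, p_\theta(S \mid X)\, dS
      && \text{(marginalize over the soft prompt } S\text{)} \\
  p_\theta(S \mid X) &= \delta\bigl(S - f_\theta(X)\bigr)
      && \text{(} S \text{ is produced deterministically)} \\
  \Rightarrow\quad p_\theta(Y \mid X) &= p_\theta\bigl(Y \mid X,\, S = f_\theta(X)\bigr)
\end{aligned}
```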
Notation
: Input text from the dataset
: Augmented text from
: Contextual representation obtained from the encoder
: Emotional representation obtained from the encoder
: Predicted emotion
: Soft prompt generated by the MLP
: Predicted output
: Encoder
: MLP
: Classifier
Function
Only 1/10 of the data is used to measure coherence.
Methods | BLEU | ROUGE-L | dist-1 | dist-2 | PPL |
---|---|---|---|---|---|
w/o disentanglement | 0.70 | 6.8 | 19.43 | 67.26 | - |
w/ disentanglement | 0.75 | 7.8 | 9.40 | 48.95 | - |
The whole dataset is used to measure the disentanglement part.
Methods | Accuracy |
---|---|
w/o contrastive learning | 0.05 |
w/ contrastive learning | 0.375 |
Only 1/10 of the data is used to measure coherence.
Methods | BLEU | ROUGE-L | dist-1 | dist-2 | PPL |
---|---|---|---|---|---|
w/o disentanglement | 0.74 | 7.4 | 18.78 | 69.48 | - |
w/ disentanglement | 0.89 | 9.3 | 9.7 | 42.23 | 34.2 |
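For reference, the BLEU, ROUGE-L and dist-n numbers in these tables could in principle be reproduced with standard tooling; a minimal sketch using nltk and rouge_score (the example strings and any score scaling, e.g. reporting ROUGE-L times 100, are assumptions):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def distinct_n(tokens, n):
    """dist-n: ratio of unique n-grams to total n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

reference = "Oh I see, why are you still with him then?"
hypothesis = "That's terrible. I'm sorry you're going through this."

bleu = sentence_bleu([reference.split()], hypothesis.split(),
                     smoothing_function=SmoothingFunction().method1)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, hypothesis)["rougeL"].fmeasure
print(bleu, rouge_l, distinct_n(hypothesis.split(), 1), distinct_n(hypothesis.split(), 2))
```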
The whole dataset is used to measure the disentanglement part.
Methods | Accuracy |
---|---|
w/o contrastive learning | 0.13 |
w/ contrastive learning | 0.44 |
Objective: To enhance the generation of coherent and empathetic responses in NLG tasks by leveraging contrastive learning and disentangled representations.
Enhanced Disentanglement through Contrastive Learning:
Soft Prompt Integration for Coherent and Empathetic Responses:
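A minimal sketch of the soft-prompt integration idea, assuming the disentangled content vector s and emotion vector e are mapped by an MLP to a fixed number of prompt embeddings that are prepended to the decoder input; all module names and dimensions are illustrative, not the actual implementation.

```python
import torch
import torch.nn as nn

class SoftPromptInjector(nn.Module):
    """Hypothetical sketch: map the disentangled vectors (s, e) to prompt_len soft-prompt
    embeddings and prepend them to the decoder's token embeddings."""
    def __init__(self, repr_dim=256, embed_dim=768, prompt_len=8):
        super().__init__()
        self.prompt_len = prompt_len
        self.mlp = nn.Sequential(
            nn.Linear(2 * repr_dim, embed_dim * prompt_len),
            nn.Tanh(),
        )

    def forward(self, s, e, token_embeds):
        # s, e: (batch, repr_dim); token_embeds: (batch, seq_len, embed_dim)
        prompt = self.mlp(torch.cat([s, e], dim=-1))
        prompt = prompt.view(s.size(0), self.prompt_len, -1)
        return torch.cat([prompt, token_embeds], dim=1)  # (batch, prompt_len + seq_len, embed_dim)
```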
Disentanglement
Classifier
Coherence
Disentanglement
Classifier
Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation (SOTA)
Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models
My proposed method:
X = E + S
X: Job interviews always make me sweat bullets, makes me uncomfortable in general to be looked at under a microscope like that
E: "sweat bullets", "uncomfortable", "looked at under a microscope"
S: "Job interviews always make me"
Y: Don't be nervous. Just be prepared.
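A hypothetical sketch of how the X = E + S split could be realized at the representation level: one shared encoder state projected by two heads into a content vector and an emotion vector, with an auxiliary classifier over the 32 emotion labels. Names and dimensions are assumptions, not the actual implementation.

```python
import torch
import torch.nn as nn

class DualHeadEncoder(nn.Module):
    """Hypothetical sketch: a shared encoder state is projected into a content
    (semantic) vector S and an emotion vector E, mirroring the X = E + S split."""
    def __init__(self, hidden_dim=768, repr_dim=256, num_emotions=32):
        super().__init__()
        self.content_head = nn.Linear(hidden_dim, repr_dim)   # -> S
        self.emotion_head = nn.Linear(hidden_dim, repr_dim)   # -> E
        self.emotion_clf = nn.Linear(repr_dim, num_emotions)  # auxiliary emotion classifier

    def forward(self, encoder_hidden):            # (batch, hidden_dim), e.g. pooled output
        s = self.content_head(encoder_hidden)
        e = self.emotion_head(encoder_hidden)
        emotion_logits = self.emotion_clf(e)      # supervised by the 32 emotion labels
        return s, e, emotion_logits
```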
Suggestions
More figures and consistency
More detail about disentanglement and coherence
Parameters for each network
Data Augmentation for Emotion Detection in Small Imbalanced Text Data
Inspired by Towards a Unified Framework of Contrastive Learning for Disentangled Representations
Data Augmentation for Emotional Enhancement:
Contrastive Learning for Emotion Understanding:
Disentanglement of Semantic and Emotional Content:
Integration of Disentangled Information as Soft Prompts:
Current work:
Using empathetic_dialogues to train this model.
Future work:
Using the AI project datasets to train this model.
E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation(study)
Issue:
Emotion-Aware Transformer Encoder for Empathetic Dialogue Generation
A survey on empathetic dialogue systems
Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation
(['sentimental', 'afraid', 'proud', 'faithful', 'terrified', 'joyful', 'angry', 'sad', 'jealous', 'grateful', 'prepared', 'embarrassed', 'excited', 'annoyed', 'lonely', 'ashamed', 'guilty', 'surprised', 'nostalgic', 'confident', 'furious', 'disappointed', 'caring', 'trusting', 'disgusted', 'anticipating', 'anxious', 'hopeful', 'content', 'impressed', 'apprehensive', 'devastated'])
Methods | Papers | Contribution | Tasks or datasets |
---|---|---|---|
Contrastive VAE-based model | Sample and Predict Your Latent: Modality-free Sequential Disentanglement via Contrastive Estimation | Contrastive estimation with no external signals; sampling strategy for semantically similar and dissimilar views of the data. | Video, audio and time series |
Contrastive | Self-Supervised Learning Disentangled Group Representation as Feature | They ground the abstract semantics and the group acting on them into concrete contrastive learning. | Images |
- | NeurIPS 2019 Disentanglement Challenge | Challenge with two stages (see notes below) | - |
Text style transfer VAE-based model | Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders | They propose a Variational Autoencoder based method which models language features as discrete variables and encourages independence between variables for learning disentangled representations. | Yelp |
Text style transfer VAE-based model | An Evaluation of Disentangled Representation Learning for Texts | They propose evaluation metrics tailored to the specific use cases of disentangled representations in text generation, and describe empirical evaluations conducted on multiple datasets. | PersonageNLG, GYAFC, Bible datasets |
Content-Emotion Duality | Modeling Content-Emotion Duality via Disentanglement for Empathetic Conversation | - | Empathetic dialogue |
Evidence
Data Augmentation for Emotion Detection in Small Imbalanced Text Data
AugEmotionDetection_github
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
eda_nlp_github
Utterance | Emotion | EDA |
---|---|---|
Was this a friend you were in love with_comma_ or just a best friend? | sentimental | This a was champion you in love precisely with_comma_? |
This was a best friend. I miss her | sentimental | This be a. admirer Unity. |
Where has she gone? | sentimental | Ha? gone |
Wait what are sweatings | afraid | Sudation what |
it's quite strange that you didnt imagine it | proud | quite strange that didnt it suppose |
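A minimal sketch of two of the EDA operations (random swap and random deletion) that could produce augmentations like the rows above; the eda_nlp repo linked above is the reference implementation of the full method (synonym replacement, random insertion, random swap, random deletion).

```python
import random

def random_swap(words, n=1):
    """Swap two random word positions n times (one of the four EDA operations)."""
    words = words[:]
    for _ in range(n):
        if len(words) < 2:
            break
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word with probability p; keep at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

utterance = "Was this a friend you were in love with, or just a best friend?"
print(" ".join(random_deletion(random_swap(utterance.split(), n=2), p=0.2)))
```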
Disentanglement in Empathetic Dialogues:
Novelty of Contrastive Learning Combined with Soft Prompt in Empathetic Dialogues:
Enhanced Contextual Understanding:
Improved Emotional Representation:
Adaptability to Variations:
Efficient Learning:
Generalizability:
The challenge involved two stages:
NeurIPS 2019 Disentanglement Challenge
Disentangling Generative Factors in Natural Language with Discrete Variational Autoencoders
Awesome Disentanglement in NLP
Language Model Detoxification in Dialogue with Contextualized Stance Control
EMNLP 2022
Issue:
They propose a novel control framework that combines context-dependent and context-independent control utilizing hierarchical prefixes.
They introduce novel contrastive training objectives to guide the meta prefixes to learn the control strategy implicitly.
GreenLLaMA: A Framework for Detoxification with Explanations
InfoNCE Loss and Mutual Information Estimation
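For reference, the standard InfoNCE objective over a batch of N positive pairs with critic f, and its usual mutual-information lower bound:

```latex
% InfoNCE over a batch of N pairs (x_i, y_i) with critic f:
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\frac{1}{N}\sum_{i=1}^{N}
    \log \frac{\exp\!\bigl(f(x_i, y_i)\bigr)}
              {\sum_{j=1}^{N} \exp\!\bigl(f(x_i, y_j)\bigr)}
% Minimizing it maximizes a lower bound on mutual information:
\qquad I(X;Y) \;\ge\; \log N - \mathcal{L}_{\mathrm{InfoNCE}}
```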
Toxic context | Clean context | Response |
---|---|---|
I can't afford to buy my fucking dog food. | I can not afford to buy my dogs any dog food. | Dogs should not be eating boogers. Feed them normal dog food, as some people may think you are mishandling your dogs if they see you do this. |
0.999 | 0 | - |
Detoxifier:
Classification of social media Toxic comments using Machine learning models
Score the word
Keep thinking, more detail…
A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models
Example
Contrastive learning table
Toxic Sentence:
Clean Sentence:
Related Response:
Noised Sentence:
Clean Sentence:
Related Response:
No seq2seq datasets
Consistency
Papers | Contribution | Novelty |
---|---|---|
Contrastive Decoding: Open-ended Text Generation as Optimization | - | - |
PiCO: Contrastive Label Disambiguation for Partial Label Learning | - | - |
Controlled Text Generation with Hidden Representation Transformations | It steers large language models to generate text pertaining to certain attributes; it modifies the hidden representation of the base model through learned transformations. | - |
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Pre-training benefits downstream tasks; a single model improves multimodal representation; contrastive inter- and intra-learning. | - |
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning | - | - |
Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models | (1) Denoising Autoencoder (DAE); (2) they adjust the representation space of the output to the corresponding classes by clustering through a Contrastive Learning (CL) method and data augmentation; (3) they apply fine-tuning to delimit the predefined categories. | 3-phase fine-tuning |
Parameter-Efficient Detoxification with Contrastive Decoding | - | - |
Contrastive Learning for Low-Light Raw Denoising | - | Loss |
CONT: Contrastive Neural Text Generation (NeurIPS 2022) | The construction of contrastive examples; the choice of the contrastive loss; the strategy in decoding. | N-pair loss |
https://huggingface.co/datasets/s-nlp/paranmt_for_detox (corpus)
TOXIGEN
Figure: more prob. (running)
Define what is toxic (running)
Use AI project datasets
Algorithm for the contrastive Detoxifier (running)
=
=
Contrastive learning:
=
=
Assume the ground truth: "I wanted to prank others like that"
Think:
To reduce toxicity, the positive model is fine-tuned on a non-toxic corpus while the negative model is fine-tuned on a toxic corpus.
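A minimal sketch of that expert/anti-expert idea at decoding time (in the spirit of DExperts-style contrastive decoding); the mixing weight alpha and the three logit sources are assumptions, not a specific paper's exact setup.

```python
import torch

def detox_next_token_logits(base_logits, expert_logits, antiexpert_logits, alpha=2.0):
    """Combine logits so that tokens preferred by the non-toxic (expert) model and
    dispreferred by the toxic (anti-expert) model are boosted."""
    return base_logits + alpha * (expert_logits - antiexpert_logits)

# Toy usage: vocabulary of size 50k, one decoding step with random logits.
vocab = 50_000
combined = detox_next_token_logits(torch.randn(vocab), torch.randn(vocab), torch.randn(vocab))
next_token = torch.argmax(combined)   # pick the highest-scoring token after mixing
```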
BLEU
Evaluating Coherence in Dialogue Systems using Entailment
Word-based contrastive:
Efficiency: Focusing solely on toxic and clean words makes it easier to capture subtle differences in these key features.
Loss of context:
Disregarding the entire sentence may result in the loss of certain contextual information, potentially affecting the model's understanding of toxicity in specific contexts.
Find some contrastive learning methods
Prepare a dialogue dataset.
Attack the input
How to measure the coherence
Some model:
Hypothesis: Whether a sentence is toxic or not may or may not affect the output sentence. Feed (toxic S, non-toxic S*) into the decoder and observe the effect.
Datasets: Jigsaw, ParaDetox; Detoxifier: bart-base-detox
Compare perplexity (done)
Similarity (done)
Compare BERTScore? Coherence?
Detoxifier baseline
How to measure whether the detoxification is good?
Feed it into the detoxifier and see what happens?
COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification
Z+ =?
Z- =?
contrastive learning??
final extraction:
After Coherent:
Input:
Procedure:
Initialization:
Adversarial Learning:
Generate Non-toxic Sentence and Keyphrase:
Decode using llama-2:
Enhance Coherence:
Output:
hypothesis:
Strengths:
original input sentence meaning (Coherent)
(Detox)
Datasets:
ParaDetox
real-toxicity-prompts
Method | Papers | Contribution | Datasets |
---|---|---|---|
Knowledge-driven | Learning to Copy Coherent Knowledge for Response Generation (AAAI-21) | (1) Knowledge Discernment, (2) dialog goal and the dialog context, (3) Context Manager | DuConv and DuRecDial |
- | Knowledge-based Review Generation by Coherence Enhanced Text Planning | (1) the document plan is modeled as a sequence of sentence plans in order, (2) the sentence plan is modeled as an entity-based subgraph from KG. | Amazon Electronic, Book, and IMDb Movie |
Hybrid Latent Variables | Towards Diverse, Relevant and Coherent Open-Domain Dialogue Generation via Hybrid Latent Variables (AAAI-23) | The HLV method combines the strengths of both continuous and discrete latent variables to generate diverse, relevant, and coherent dialogue responses. | DailyDialog and Opensubtitles |
Diffusion | Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models | without introducing mismatches, Bayesian framework to jointly modify both revealed and unrevealed regions | CelebA-HQ and ImageNet-1K |
Discourse (High-Level Language Representation) | Long Text Generation by Modeling Sentence-Level and Discourse-Level Coherence | It can represent the prefix sentences at sentence level and discourse level in the decoding process; They propose two pretraining objectives to learn the representations by predicting inter-sentence semantic similarity and distinguishing between normal and shuffled sentence orders. | WritingPrompts and ROC |
Discourse-level | DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence | DiscoScore (a metric) strongly correlates with human-rated coherence. | RC and LC and Lexical Chain |
GANs | TILGAN: Transformer-based Implicit Latent GAN for Diverse and Coherent Text Generation | To improve local and global coherence, they explicitly introduce a multi-scale discriminator to capture semantic information at varying scales among the sequence of hidden representations encoded by the Transformer. | MSCOCO, WMTNEWS and ROC-STORY |
Contrastive learning | Coherent Long Text Generation by Contrastive Soft Prompt | It learns text representations in the hidden space for better planning long text generation; (Similar to my idea); Better than HINT | ROCStories and WritingPrompts |
- | CONT: Contrastive Neural Text Generation | (1)Contrastive Examples from Predictions (2) N-Pairs Contrastive Loss (3) Inference with Learned Similarity Function | MT, XSum, Code Comment Generation, Data-to-text Generation, Commonsense Generation |
- | Generating Coherent Narratives by Learning Dynamic and Discrete Entity States with a Contrastive Framework | They propose a contrastive framework to learn the state representations in a discrete space, and insert additional attention layers into the decoder to better exploit these states. | Wikiplots and CNN News |
Contrastive learning table
Run some inference
Method | Papers | Contribution | Datasets |
---|---|---|---|
Diffusion | DiffuDetox: A Mixed Diffusion Model for Text Detoxification | (1) a conditional model reduces toxicity; (2) an unconditional model guides the sampling process | - |
Denoise | Towards a Better Understanding of Noise in Natural Language Processing | - | |
BERT | Text Detoxification using Large Pre-trained Neural Models | (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. (Similar to my idea.) Conditional BERT. | Jigsaw |
– | Simple Text Detoxification by Identifying a Linear Toxic Subspace in Language Model Embeddings | (1)We propose a method to generalize toxic directions in the latent space.(2) We also provide a methodology for constructing parallel datasets using a context based word masking system. | |
– | A Study on Manual and Automatic Evaluation for Text Style Transfer: The Case of Detoxification | We conducted an evaluation of detoxification models for Russian using both automatic and manual metrics. | |
Prompt | Prompt Tuning for Text Detoxification | We conduct experiments to determine the optimal length of trainable prompt for the task. | |
– | You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content | (1) Toxicity Classification (2) Toxic Span Detection (3) Detoxification | |
Contrastive learning | COUNT: COntrastive UNlikelihood Text Style Transfer for Text Detoxification | They contrast the gold standard rephrasing with the identity input-to-output mapping to effectively isolate and focus learning on non-toxic style transfer | ParaDetox, APPDIA |
– | Parameter-Efficient Detoxification with Contrastive Decoding | They leverages the frozen weights of the language model itself and only introduces a tiny portion of new model parameters to detoxify generation. | |
Context-aware | CMD: A Framework for Context-aware Model Self-Detoxification | - | - |
- | Detoxifying Text with MARCO: Controllable Revision with Experts and Anti-Experts | MARCO uses likelihoods under a non-toxic LM (expert) and a toxic LM (anti-expert) to find candidate words to mask and replace. | Social Bias Frames |
GreenLLaMA | GreenLLaMA: A Framework for Detoxification with Explanations | Black magic | ParaDetox |
Finish the table above.
How to show the objective function.
DiffuDetox: A Mixed Diffusion Model for Text Detoxification
Contribution:
Limitation:
Feedback
Learning to Copy Coherent Knowledge for Response Generation (AAAI-21)
Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition (EMNLP-21)
story generation
I want to maintain consistency in output even with poor input.
Objective:
Loss:
m is a margin, a hyperparameter that controls the minimum acceptable difference in similarity.
is Consistency Metric.
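Since the loss equation itself did not survive in these notes, here is one plausible margin-based form, assuming sim(.,.) is the consistency/similarity metric mentioned above, y the output for the original input, y+ the output for a paraphrased input, and y- the output for an unrelated input:

```latex
% Sketch only: the exact loss is not recoverable from the notes.
\mathcal{L} = \max\!\Bigl(0,\; m - \bigl(\mathrm{sim}(y, y^{+}) - \mathrm{sim}(y, y^{-})\bigr)\Bigr)
```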
Because of the lack of correct answers in this task:
Semantic similarity in NLG.
Key Information Extraction.
Contrastive learning.
Datasets
Feedback
Ask GPT to generate good prompts and bad prompts to train the model.
Push the distance between the good output and the victim output further apart.
Address coherence
Address what toxicity is
objective function
big picture
"Enhancing NLG Consistency Across Diverse Inputs Using Data Augmentation and Keyword-Driven Prompts"
"CID: Consistent NLG with Input Diversity using Data Augmentation and Keyword-Driven Prompts"
Data Augmentation
Inference Example
Input: I'm currently immerse in deep research of nature language generation task.
ANS: If you have any specific questions or if there's a particular aspect of your research you'd like to discuss, feel free to share. I'm here to assist you in your endeavors related to natural language generation.
Input :I concentrating to address the various challenges brings by natural language generation.
The output should be consistent even when the input varies.
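One way to quantify "consistent outputs under varied inputs" is embedding similarity between the two outputs; a minimal sketch with sentence-transformers (the model name is just an example choice, and the strings are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

out_a = "Feel free to share questions about your NLG research; I'm happy to help."
out_b = "If you have specific NLG questions, I'm here to assist."

emb = model.encode([out_a, out_b], convert_to_tensor=True)
consistency = util.cos_sim(emb[0], emb[1]).item()   # closer to 1.0 = more consistent
print(consistency)
```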
Real-world Application Scenarios:
Robustness and Generalization:
Reduced Bias:
Semantic Accuracy in Natural Language Generation: A Thesis Proposal
Prompt?
AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts
Towards a Better Understanding of Noise in Natural Language Processing
Self-supervised learning
Disentangled representation learning for texts and emotion, or keywords?
Disentanglement-based models offer two main advantages:
Objective
Problem
c can be the keyword condition
Not enough datasets:
How to extract the keywords
How to know the inputs are the same
Feedback:
Title, novelty, method
Previous work
Fix the equation