NeurIPS 2022
Outline
Introduction
Method
Experiment
Conclusion
Introduction
Previous methods applying contrastive learning to neural text generation usually lead to inferior performance:
- They simply use from-batch positive and negative samples following SimCLR and adopt the InfoNCE loss, which ignores the quality differences between negative samples (Naive CL); see the sketch after this list.
- Later work attempts to build better contrastive samples by perturbing the ground truth in the discrete token space or the continuous embedding space.
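A minimal sketch of the Naive CL objective described above, written in PyTorch; the representation shapes, the temperature value, and the function name are illustrative assumptions rather than any paper's implementation.

```python
import torch
import torch.nn.functional as F

def naive_infonce(src_repr, tgt_repr, temperature=0.1):
    """Naive CL: InfoNCE with from-batch negatives (SimCLR-style).

    src_repr, tgt_repr: (batch, dim) sequence-level representations of the
    sources and their ground-truth targets.  For example i, target i is the
    positive and every other target in the batch is a negative.  All
    negatives are weighted identically, which is the weakness noted above.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature                    # (batch, batch) cosine similarities
    labels = torch.arange(src.size(0), device=src.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```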
Contrastive Neural Text Generation (CONT) addresses the bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects:
- the construction of contrastive examples (via diverse beam search over the model's own predictions)
- the choice of the contrastive loss (an N-pairs loss)
- the decoding strategy (incorporating the learned sequence similarity score)
We validate CONT on five generation tasks with ten benchmarks:
- machine translation
- summarization
- code comment generation
- data-to-text generation
- commonsense generation.
Method
Architecture
Contrastive Examples from Predictions
We use the diverse beam search algorithm to draw contrastive examples from the top-K list of the model's latest predictions, and append them to the from-batch samples to form the full set of contrastive examples (see the sketch after this paragraph).
A warm-up stage in which the model is supervised only by the standard maximum-likelihood (cross-entropy) loss is recommended, as it guarantees the quality of the examples drawn from the model's predictions.
These self-generated contrastive examples alleviate the model’s exposure bias.
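A minimal sketch of this construction step, assuming a Hugging Face-style seq2seq model and tokenizer; the generation settings (`num_beam_groups`, `diversity_penalty`, `max_new_tokens`) and the helper name are illustrative assumptions, not the paper's code.

```python
import torch

def build_contrastive_examples(model, tokenizer, src_texts, tgt_texts, k=4):
    """Draw K candidates per source with diverse beam search and append the
    from-batch ground truths, forming the pool of contrastive examples."""
    batch = tokenizer(src_texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        outs = model.generate(
            **batch,
            num_beams=k,
            num_beam_groups=k,        # diverse beam search: one beam per group
            diversity_penalty=1.0,
            num_return_sequences=k,
            max_new_tokens=64,
        )
    candidates = tokenizer.batch_decode(outs, skip_special_tokens=True)
    # Group the flat output into K self-generated candidates per source.
    per_source = [candidates[i * k:(i + 1) * k] for i in range(len(src_texts))]
    # From-batch samples: the ground-truth targets of the other batch items.
    return [cands + [t for j, t in enumerate(tgt_texts) if j != i]
            for i, cands in enumerate(per_source)]
```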
N-Pairs Contrastive Loss
We first rank all the contrastive examples with an oracle function (e.g., BLEU) that computes a sequence-level score against the ground truth.
The margins in the N-pairs loss grow with the rank gap of each pair, so they reflect the quality difference between these pairs (see the sketch below).
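A minimal sketch of such a pairwise margin loss over the ranked candidates, assuming cosine similarity between the source and candidate representations and a margin proportional to the rank gap; the margin scale `gamma` is a hypothetical hyperparameter.

```python
import torch
import torch.nn.functional as F

def n_pairs_loss(src_repr, cand_reprs, gamma=0.01):
    """Pairwise margin loss over candidates ranked by the oracle score.

    src_repr:   (dim,) representation of the source sequence.
    cand_reprs: (n, dim) candidate representations, sorted from the highest
                to the lowest oracle score (e.g., BLEU).
    The margin of pair (i, j) grows with the rank gap j - i, so candidates
    with a larger quality difference are pushed further apart.
    """
    sims = F.cosine_similarity(src_repr.unsqueeze(0), cand_reprs, dim=-1)  # (n,)
    n = sims.size(0)
    loss = sims.new_zeros(())
    for i in range(n):
        for j in range(i + 1, n):
            margin = gamma * (j - i)
            loss = loss + F.relu(margin - (sims[i] - sims[j]))
    return loss / (n * (n - 1) / 2)
```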
Inference with Learned Similarity Function
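A minimal sketch of this decoding strategy, assuming the final score of each beam candidate combines its length-normalized log-likelihood with the learned source-candidate similarity; the weighting coefficient `alpha`, the candidate fields, and the function name are illustrative assumptions.

```python
import torch.nn.functional as F

def rerank_with_similarity(candidates, alpha=0.5):
    """Pick the beam candidate with the best combined score.

    candidates: list of dicts with
        'logprob':   summed token log-probability of the candidate,
        'length':    number of generated tokens,
        'src_repr':  (dim,) source representation,
        'cand_repr': (dim,) candidate representation.
    The similarity term is the cosine score learned during contrastive training.
    """
    def score(c):
        lm = c["logprob"] / max(c["length"], 1)                       # length-normalized likelihood
        sim = F.cosine_similarity(c["src_repr"], c["cand_repr"], dim=0)
        return lm + alpha * sim.item()
    return max(candidates, key=score)
```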
Experiment
Conclusion
CONT adds a contrastive learning objective that provides sequence-level supervision for auto-regressive neural text generation models.
We identify and address three shortcomings that limit contrastive learning on text generation tasks.
Speeding up the training stage without losing accuracy is the next important step to improve CONT.
Appendix
batch samples
In this paper, batch samples refer to the group of text sequences (source-target pairs) fed into the neural network together in one mini-batch; the targets of the other examples in the batch serve as the from-batch contrastive samples mentioned above.
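A tiny worked example of this notion with made-up sentences: for each source, its own target is the positive, and the other targets in the same mini-batch are the from-batch samples.

```python
# A mini-batch of two (source, target) pairs; the sentences are hypothetical.
batch = [
    ("Der Hund schläft.", "The dog is sleeping."),
    ("Es regnet heute.",  "It is raining today."),
]

for i, (src, _) in enumerate(batch):
    positive = batch[i][1]                                            # its own target
    from_batch = [tgt for j, (_, tgt) in enumerate(batch) if j != i]  # other targets
    print(f"source={src!r}  positive={positive!r}  from_batch={from_batch}")
```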
N-Pairs Contrastive Loss Algorithm
Background