
CONT: Contrastive Neural Text Generation

NeurIPS 2022


Outline

Introduction

Method

Experiment

Conclusion


Introduction

Previous applications of contrastive learning to neural text generation usually lead to inferior performance:

  • Simply using from-batch positive-negative samples following SimCLR, with the InfoNCE loss, ignores the quality differences between negative samples ("Naive CL").
  • Prior work attempts to build better contrastive samples by perturbing the ground truth in the discrete token space or the continuous embedding space.

Contrastive Neural Text Generation (CONT) addresses the bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects:

  • the construction of contrastive examples (via the diverse beam search algorithm);
  • the choice of the contrastive loss (the N-pairs loss);
  • the strategy in decoding (the learned sequence similarity score).

We validate CONT on five generation tasks with ten benchmarks:

  • machine translation
  • summarization
  • code comment generation
  • data-to-text generation
  • commonsense generation.



Method

Architecture

[Figure: overview of the CONT architecture]

Contrastive Examples from Predictions

We use the diverse beam search algorithm to create contrastive examples from the top-K list of the model's latest predictions, and then append them to the from-batch samples to form the full set of contrastive examples; a minimal sketch follows below.

A warm-up stage, where the model is supervised only by $\mathcal{L}_{\text{NLL}}$, is recommended, as it guarantees the quality of the examples drawn from the model's predictions.

These self-generated contrastive examples alleviate the model’s exposure bias.
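
As a minimal sketch of this construction step (assuming a HuggingFace seq2seq setup rather than the authors' actual code; the checkpoint name and `top_k_predictions` are placeholders), diverse beam search can produce the top-K list:

```python
# Minimal sketch: building self-generated contrastive examples with diverse
# beam search via HuggingFace transformers. The checkpoint is a placeholder,
# not the paper's actual model.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")      # placeholder
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # placeholder

def top_k_predictions(src_text: str, k: int = 8) -> list[str]:
    """Return K diverse candidates for one source sequence."""
    inputs = tokenizer(src_text, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            num_beams=k,             # total number of beams
            num_beam_groups=k // 2,  # groups for diverse beam search
            diversity_penalty=1.0,   # push groups toward different tokens
            num_return_sequences=k,  # keep the whole top-K list
            max_new_tokens=64,
        )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# These K candidates are then appended to the from-batch samples to form
# the full set of contrastive examples.
```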

N-Pairs Contrastive Loss

We first rank all the contrastive examples based on an oracle function $o(\cdot, y)$, which computes a sequence-level score against the ground truth $y$.
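
For illustration only, here is one possible oracle: sentence-level BLEU via sacrebleu (the paper's oracle can be any sequence-level metric; `rank_by_oracle` is our own name):

```python
# Minimal sketch: rank candidates by a sequence-level oracle score against
# the ground truth. BLEU is one possible choice of the oracle o(., y).
import sacrebleu

def rank_by_oracle(candidates: list[str], reference: str) -> list[str]:
    """Sort candidates from best (rank 0) to worst by oracle score."""
    return sorted(
        candidates,
        key=lambda c: sacrebleu.sentence_bleu(c, [reference]).score,
        reverse=True,
    )
```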

$$\mathcal{L}_{\text{N-Pairs}} = \sum_{(y^+, y^-) \in P} \mathcal{L}(y^+, y^-) = \sum_{(y^+, y^-) \in P} \max\left\{0,\ \cos(z_x, z_{y^-}) - \cos(z_x, z_{y^+}) + \xi\right\}$$

$P$ contains $C_K^2$ pairs constructed from $B$, the ground truth $y$, and the from-batch examples.

$\xi = \gamma \,(\mathrm{rank}(y^-) - \mathrm{rank}(y^+))$ reflects the quality difference within these pairs.
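
A direct PyTorch reading of this loss (an illustrative sketch, not the released implementation), where `z_x` is the source representation and `z_ys` holds the candidate representations already sorted by oracle rank (index 0 = best):

```python
# Minimal sketch of the N-pairs margin loss defined above.
import torch
import torch.nn.functional as F

def n_pairs_loss(z_x: torch.Tensor, z_ys: torch.Tensor, gamma: float = 0.01) -> torch.Tensor:
    """z_x: (d,) source representation; z_ys: (K, d) ranked candidates."""
    sims = F.cosine_similarity(z_x.unsqueeze(0), z_ys, dim=-1)  # (K,)
    K = sims.size(0)
    loss = sims.new_zeros(())
    for i in range(K):              # y+ : the better-ranked candidate
        for j in range(i + 1, K):   # y- : the worse-ranked candidate
            xi = gamma * (j - i)    # margin grows with the rank gap
            loss = loss + torch.clamp(sims[j] - sims[i] + xi, min=0.0)
    return loss
```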

Inference with Learned Similarity Function

$$y = \arg\max_{\hat{y}} \left\{ \alpha \cos(z_x, z_{\hat{y}}) + (1 - \alpha) \sum_{t=0}^{n} p(\hat{y}_t \mid x, \hat{y}_{<t}) \right\}$$
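
A minimal sketch of this decoding rule as reranking over beam candidates; `encode` (mapping token ids to a sequence representation) and the candidate format are illustrative assumptions:

```python
# Minimal sketch: pick the beam candidate maximizing a weighted mix of the
# learned similarity and the model's own sequence score.
import torch.nn.functional as F

def rerank(z_x, candidates, encode, alpha: float = 0.5):
    """candidates: list of (token_ids, model_score) pairs from beam search."""
    best, best_score = None, float("-inf")
    for tokens, model_score in candidates:
        z_y = encode(tokens)  # sequence representation of the candidate
        sim = F.cosine_similarity(z_x, z_y, dim=-1).item()
        score = alpha * sim + (1 - alpha) * model_score
        if score > best_score:
            best, best_score = tokens, score
    return best
```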



Experiment

[Figures: experimental results on the ten benchmarks]


Conclusion

CONT models an additional contrastive learning objective that provides sequence-level supervision for auto-regressive neural text generation models.

We explore three shortcomings that limit the use of contrastive learning in text generation tasks.

Speeding up the training stage without losing accuracy is the next important step to improve CONT.


Appendix

batch samples

In this paper, batch samples are the group of text sequences (data points) fed into the neural network simultaneously; the other targets in the same batch serve as the from-batch contrastive examples.
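
As a small illustration (our own sketch, with `from_batch_negatives` a hypothetical helper), given the target representations of one batch, every other target in the batch can serve as a from-batch negative for a given source:

```python
# Minimal sketch: from-batch negatives for source i are simply the target
# representations of all other examples in the same batch.
import torch

def from_batch_negatives(z_tgts: torch.Tensor, i: int) -> torch.Tensor:
    """z_tgts: (B, d) batch of target representations."""
    mask = torch.arange(z_tgts.size(0)) != i
    return z_tgts[mask]  # (B - 1, d)
```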

N-Pairs Contrastive Loss Algorithm

[Figure: the N-pairs contrastive loss algorithm]

Background

$$\mathcal{L}_{\text{NCE}} = -\log \frac{\exp(\cos(z_x, z_y)/\tau)}{\sum_{y' \in B} \exp(\cos(z_x, z_{y'})/\tau)}$$

$$\mathcal{L}_{\text{NLL}} = -\sum_{t=1}^{N} \log p_\theta(y_t \mid x, y_{<t})$$
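
A minimal PyTorch sketch of these two standard objectives (our own illustration, not the paper's code), assuming `z_x`/`z_batch` are pooled sequence representations and `logits` are raw decoder outputs:

```python
# Minimal sketch of the background objectives: InfoNCE over one batch and
# token-level negative log-likelihood.
import torch
import torch.nn.functional as F

def info_nce(z_x, z_batch, pos_idx: int, tau: float = 0.1):
    """z_batch[pos_idx] is the representation of the ground-truth target."""
    sims = F.cosine_similarity(z_x.unsqueeze(0), z_batch, dim=-1) / tau  # (B,)
    return F.cross_entropy(sims.unsqueeze(0), torch.tensor([pos_idx]))

def nll(logits, targets):
    """logits: (T, V) decoder outputs; targets: (T,) gold token ids."""
    return F.cross_entropy(logits, targets)
```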