# [CONT: Contrastive Neural Text Generation](https://arxiv.org/pdf/2205.14690.pdf)
NeurIPS 2022
--------
# Outline
Introduction
Method
Experiment
Conclusion
----
# Introduction
**Previous methods** that apply contrastive learning to neural text generation usually lead to inferior performance:
- Naive CL: simply using from-batch positive and negative samples following SimCLR with the InfoNCE loss, which **ignores the difference between negative samples**.
- Previous work attempts to build better contrastive samples by perturbing the ground truth in the discrete token space or the continuous embedding space.
**Contrastive Neural Text Generation (CONT)** addresses the bottlenecks that prevent contrastive learning from being widely adopted in generation tasks from three aspects:
- the construction of contrastive examples (via the diverse beam search algorithm)
- the choice of the contrastive loss (an N-pairs loss)
- the strategy used in decoding (the learned sequence similarity score)
We validate CONT on five generation tasks with ten benchmarks:
- machine translation
- summarization
- code comment generation
- data-to-text generation
- commonsense generation

---
# Method
## Architecture

### Contrastive Examples from Predictions
We use the **diverse beam search** algorithm to create contrastive examples from the **top-K** list of the model's latest predictions, then append them to the from-batch samples to form the contrastive examples.
A warm-up stage in which the model is supervised only by $\mathcal{L}_\text{NLL}$ is recommended, as it guarantees the quality of the examples drawn from the model's predictions.
These self-generated contrastive examples alleviate the model’s exposure bias.
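As an illustration, here is a minimal sketch of drawing the top-K candidates with diverse beam search, assuming a Hugging Face seq2seq model; `model`, `batch`, and the parameter values are placeholders, not the authors' implementation:

```python
import torch

# Assumption: `model` is a Hugging Face encoder-decoder (e.g., loaded via
# AutoModelForSeq2SeqLM) and `batch` holds the tokenized source sequences.
# Diverse beam search spreads the beams over `num_beam_groups` groups and
# penalizes tokens already chosen by other groups, so the K returned
# hypotheses differ from each other.
K = 8
with torch.no_grad():
    candidate_ids = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        num_beams=K,
        num_beam_groups=K,          # one beam per group -> maximally diverse
        diversity_penalty=1.0,      # strength of the inter-group penalty
        num_return_sequences=K,     # keep the full top-K list
        max_new_tokens=128,
    )
# candidate_ids: (batch_size * K, seq_len); these self-generated sequences
# are appended to the from-batch samples to form the contrastive examples.
```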
### N-Pairs Contrastive Loss
We first rank all contrastive examples with an oracle function $o(\cdot, y)$, which computes a sequence-level score against the ground truth $y$.
$$
\mathcal{L}_{\text {N-Pairs }}=\sum_{\left(y^{+}, \boldsymbol{y}^{-}\right) \in \mathcal{P}} \mathcal{L}\left(\boldsymbol{y}^{+}, \boldsymbol{y}^{-}\right)=\sum_{\left(\boldsymbol{y}^{+}, \boldsymbol{y}^{-}\right) \in \mathcal{P}} \max \left\{0, \cos \left(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\boldsymbol{y}^{-}}\right)-\cos \left(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\boldsymbol{y}^{+}}\right)+\xi\right\}
$$
$\mathcal{P}$ contains $C_K^2$ pairs constructed from $\mathcal{B}$, the ground truth $y$, and the from-batch examples.
$\xi=\gamma \cdot\left(\operatorname{rank}\left(\boldsymbol{y}^{-}\right)-\operatorname{rank}\left(\boldsymbol{y}^{+}\right)\right)$ reflects the quality difference within each pair.
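A minimal PyTorch sketch of this loss, assuming `sim` holds the cosine similarities $\cos(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\boldsymbol{y}_i})$ of the contrastive examples already sorted by oracle score (index = rank, 0 = best); the names and the explicit double loop are illustrative, not the official implementation:

```python
import torch

def n_pairs_loss(sim: torch.Tensor, gamma: float = 0.01) -> torch.Tensor:
    """Pairwise margin loss over the C(K, 2) pairs in P.

    sim:   (K,) cosine similarities cos(z_x, z_{y_i}); examples are assumed
           to be sorted by the oracle score, so index i is rank(y_i).
    gamma: scale of the rank-dependent margin xi.
    """
    K = sim.size(0)
    losses = []
    for i in range(K):                  # y+ : the better-ranked example
        for j in range(i + 1, K):       # y- : the worse-ranked example
            xi = gamma * (j - i)        # xi grows with the rank gap
            losses.append(torch.relu(sim[j] - sim[i] + xi))
    return torch.stack(losses).sum()
```

For example, with `sim = torch.tensor([0.9, 0.7, 0.8])` and `gamma = 0.05`, only the (rank 1, rank 2) pair violates its margin, so the loss is `0.8 - 0.7 + 0.05 = 0.15`.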
### Inference with Learned Similarity Function
$$
\boldsymbol{y}^{*}=\arg \max _{\hat{\boldsymbol{y}}}\left\{\alpha \cdot \cos \left(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\hat{\boldsymbol{y}}}\right)+(1-\alpha) \prod_{t=0}^{n} p\left(\hat{y}_{t} \mid \boldsymbol{x}, \hat{\boldsymbol{y}}_{<t}\right)\right\}
$$
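A hedged sketch of this re-ranking step, assuming each of the K beam candidates comes with its cosine similarity to the source representation and its summed token log-probabilities (whose exponent is the product in the equation); names are illustrative:

```python
import torch

def rerank(cand_sim: torch.Tensor,
           cand_logprob: torch.Tensor,
           alpha: float = 0.5) -> int:
    """Pick the candidate maximizing the combined score above.

    cand_sim:     (K,) learned similarity cos(z_x, z_y_hat) per candidate.
    cand_logprob: (K,) sum of token log-probabilities per candidate, so its
                  exp() equals the product prod_t p(y_t | x, y_<t).
    alpha:        interpolation weight between similarity and likelihood.
    """
    seq_prob = cand_logprob.exp()                  # product of token probs
    score = alpha * cand_sim + (1.0 - alpha) * seq_prob
    return int(torch.argmax(score).item())         # index of y*
```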


---
# Experiment



---
# Conclusion
CONT adds a contrastive learning objective that provides sequence-level supervision for auto-regressive neural text generation models.
It identifies and addresses three shortcomings that have limited contrastive learning on text generation tasks.
Speeding up the training stage without losing accuracy is the next important step for improving CONT.
---
# Appendix
## Batch samples
In this paper, batch samples are the group of text sequences (data points) fed into the neural network simultaneously; the from-batch contrastive examples mentioned in the Method section are drawn from this group.
## N-Pairs Contrastive Loss Algorithm
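The algorithm box itself is not reproduced here; below is a hedged reconstruction of one training step from the definitions above. `generate_candidates`, `encode`, `oracle`, and `model.nll` are hypothetical stand-ins for diverse beam search, the representation $\mathbf{z}$, the sequence-level oracle $o(\cdot, y)$, and $\mathcal{L}_\text{NLL}$; `n_pairs_loss` is the sketch from the Method section. The actual training loop may differ.

```python
import torch
import torch.nn.functional as F

def training_step(model, batch, gamma=0.01, warmup_done=True):
    """One CONT-style step: L_NLL plus the N-pairs contrastive loss.

    All helpers used below (generate_candidates, encode, oracle, model.nll,
    n_pairs_loss) are hypothetical stand-ins; only the overall structure
    follows the Method section.
    """
    nll = model.nll(batch["src"], batch["tgt"])     # token-level L_NLL
    if not warmup_done:                             # warm-up: NLL only
        return nll

    # 1. Self-generated candidates (top-K diverse beam search),
    #    plus the ground truth and the from-batch examples.
    examples = (generate_candidates(model, batch["src"])
                + [batch["tgt"]] + batch["from_batch"])

    # 2. Rank every contrastive example by the oracle score o(., y).
    examples.sort(key=lambda y: oracle(y, batch["tgt"]), reverse=True)

    # 3. Cosine similarity between the source and each ranked example,
    #    then the pairwise margin loss over all pairs in P.
    z_x = encode(model, batch["src"])
    sim = torch.stack([F.cosine_similarity(z_x, encode(model, y), dim=-1)
                       for y in examples])
    return nll + n_pairs_loss(sim, gamma=gamma)
```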

## Background
$$
\begin{gathered}
\mathcal{L}_{\mathrm{NCE}}=-\log \frac{\exp \left(\cos \left(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\boldsymbol{y}}\right) / \tau\right)}{\sum_{\boldsymbol{y}^{\prime} \in \mathcal{B}} \exp \left(\cos \left(\mathbf{z}_{\boldsymbol{x}}, \mathbf{z}_{\boldsymbol{y}^{\prime}}\right) / \tau\right)} \\
\mathcal{L}_{\mathrm{NLL}}=-\sum_{t=1}^{N} \log p_{\theta}\left(y_{t} \mid \boldsymbol{x}, \boldsymbol{y}_{<t}\right), \\
\end{gathered}
$$
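For concreteness, here is a minimal PyTorch sketch of these two background losses, assuming `z_x` and `z_batch` are the source representation and the stacked representations of all in-batch targets, and `logits`/`target_ids` cover one target sequence (shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(z_x: torch.Tensor, z_batch: torch.Tensor,
             pos_idx: int, tau: float = 1.0) -> torch.Tensor:
    """Naive CL: the paired target is the positive, every other in-batch
    target is a negative, and all negatives enter the denominator equally."""
    sims = F.cosine_similarity(z_x.unsqueeze(0), z_batch, dim=-1) / tau  # (B,)
    target = torch.tensor([pos_idx], device=sims.device)
    return F.cross_entropy(sims.unsqueeze(0), target)   # -log softmax at pos

def nll(logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Token-level negative log-likelihood over one target sequence."""
    log_probs = F.log_softmax(logits, dim=-1)            # (N, vocab)
    return -log_probs.gather(1, target_ids.unsqueeze(1)).sum()
```

Because every in-batch negative is weighted identically in the denominator, this is exactly the "ignores the difference between negative samples" issue that the N-pairs loss addresses.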