# PLUG AND PLAY LANGUAGE MODELS: A SIMPLE APPROACH TO CONTROLLED TEXT GENERATION
###### tags: `RL Group meeting` 112/3/28
## Outline
- Abstract
- Introduction
- Related Work
- Plug and Play Language Models
- Experiments, Results, and Evaluation
- Conclusion
## Abstract
- Controlling attributes of generated text is difficult without modifying the model architecture or fine-tuning on attribute-specific data, which entails the significant cost of retraining.
- PPLM combines a **pretrained LM** with one or more simple **attribute classifiers** that guide text generation without any further training of the LM.
- Instead of retraining the LM, **PPLM corrects the output of the pretrained LM with an additional attribute model** so that it satisfies the desired attribute.
- The attribute model can be either a bag of words representing a topic or a small trained model that scores an attribute.
- The gradient from this attribute model is passed back to the hidden states of the pretrained LM, so that the hidden states can be corrected.
- The LM output is then skewed toward the topic we want.
## Introduction
- We demonstrate the PPLM approach using a GPT-2 345M model as the general-purpose LM $p(x)$.
- The method applies in any representation space from any transformer-based text generator and allows combination with any attribute model $p(a|x)$.

- We introduce the Plug and Play LM for controlled language generation, discuss its relation to existing work, and how sampling from a PPLM works.
- We quantify effectiveness using both automated evaluation as well as human evaluation.
- We show that the PPLM approach can be used to **detoxify** instances where generation of toxic content is likely, by following the negative gradient of a model trained to detect toxicity.
## Related Work
- Controlled generation
- Current models need to be separately fine-tuned for each specific attribute.
- Our method does not require retraining any conditional generative model, and both the language model and the conditional model can be flexibly assembled.
- Noisy Channel Modeling
- Their approach translates a source language sentence $y$ into a target language sentence $x$ by first sampling from a forward model proposal distribution $p_{forward}(x|y)$ and then reranking samples based on probabilities given by $p_{backward}(x|y) \propto p(x)p(y|x)$.
- PPLM scores samples using the same basic equation, but we have no forward or proposal model $p_{forward}(x|a)$; we rely on the latent-space updates instead.
- Weighted decoding
- Control with weighted decoding (WD) is difficult and often leads to sacrificing fluency and coherence.
- Sophisticated sampling methods can be used to constrain the model generation to certain keywords and topics.
- Text Style Transfer
- A key difference between the above and our approach is that we use an offline discriminator and perform optimization based on this discriminator.
## Plug and Play Language Models
**1. Language Modeling with Transformers**
- Given a sequence of tokens $X = \{x_0, \ldots, x_n\}$
- $p(X)=\prod_{i=1}^n p\left(x_i \mid x_0, \cdots, x_{i-1}\right)$
- $H_t=\left[\left(K_t^{(1)}, V_t^{(1)}\right), \cdots,\left(K_t^{(l)}, V_t^{(l)}\right)\right]$ ,where $\left(K_t^{(i)}, V_t^{(i)}\right)$ corresponds to the key-value pairs from the $i$-th layer generated at all time-steps from 0 to $t$.
- $o_{t+1}, H_{t+1}=\operatorname{LM}\left(x_t, H_t\right)$
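A minimal sketch of the recurrence $o_{t+1}, H_{t+1}=\operatorname{LM}\left(x_t, H_t\right)$, assuming the Hugging Face `transformers` GPT-2 interface (an assumption of this note, not the paper's code); `gpt2-medium` stands in for the 345M model used in the paper:

```python
# Recurrent decoding with a cached history H_t (the per-layer key-value pairs),
# sketched with the Hugging Face transformers GPT-2 API.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

generated = tokenizer("The potato", return_tensors="pt").input_ids
past = None  # H_t: key-value pairs from all layers for time steps 0..t

with torch.no_grad():
    for _ in range(20):
        # o_{t+1}, H_{t+1} = LM(x_t, H_t): only the newest token is fed once a cache exists
        out = model(input_ids=generated if past is None else generated[:, -1:],
                    past_key_values=past, use_cache=True)
        past = out.past_key_values                       # updated history H_{t+1}
        probs = torch.softmax(out.logits[:, -1, :], -1)  # p(x_{t+1} | x_0, ..., x_t)
        next_token = torch.multinomial(probs, num_samples=1)
        generated = torch.cat([generated, next_token], dim=-1)

print(tokenizer.decode(generated[0]))
```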
**2. Steering Generation: Ascending log $p(a|x)$**
- We shift the history $H_t$ in the direction of the sum of two gradients:
- one toward higher log-likelihood (LL) of the attribute $a$ under the conditional attribute model $p(a|x)$.
- one toward higher LL of the unmodified language model $p(x)$.
- Combining these factors with a variable multiplier provides us with a controllable “knob” to guide generation in a given direction with a specified strength.
- $∆H_t$ is initialized at zero and updated with gradients from an attribute model that measures the extent to which the generated text possesses the desired attribute.
- We rewrite the attribute model $p(a|x)$ as $p(a|H_t + ∆H_t)$.

- Ascending $\log p(a|x)$ alone quickly results in unrealistic adversarial or fooling examples, as the text moves into low-probability regions.
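A schematic sketch of the $∆H_t$ update above (normalized gradient ascent on $\log p(a|H_t + ∆H_t)$, repeated a few times per token); `attribute_log_prob` is a hypothetical placeholder for running the LM one step from the perturbed history and scoring the result with the attribute model, and the key-value structure of $H_t$ is flattened to a plain list of tensors for brevity:

```python
# Schematic ΔH_t update: gradient ascent on log p(a | H_t + ΔH_t) with a normalized step.
# attribute_log_prob(perturbed_history, x_t) is a placeholder, not an API from the paper's code.
import torch

def perturb_history(H_t, x_t, attribute_log_prob, num_steps=3, step_size=0.03, gamma=1.5):
    # ΔH_t is initialized at zero, one tensor per entry of the history H_t
    delta = [torch.zeros_like(h, requires_grad=True) for h in H_t]
    for _ in range(num_steps):
        perturbed = [h + d for h, d in zip(H_t, delta)]
        loss = -attribute_log_prob(perturbed, x_t)   # -log p(a | H_t + ΔH_t)
        loss.backward()
        with torch.no_grad():
            for d in delta:
                grad = -d.grad                        # ascent direction on log p(a|x)
                d += step_size * grad / (grad.norm() ** gamma + 1e-10)
                d.grad.zero_()
    # The perturbed history H_t + ΔH_t is then used for the forward pass that generates the next token
    return [h + d.detach() for h, d in zip(H_t, delta)]
```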
**3. Ensuring Fluency: Ascending log $p(x)$**
- Kullback–Leibler (KL) Divergence
- In addition to ascending $\log p(a|x)$, the update on $∆H_t$ also minimizes the KL divergence between the output distributions of the modified and unmodified LMs, which discourages the perturbed hidden states from drifting too far from the original model.

- Post-norm Geometric Mean Fusion
- It serves to constantly tie the generated text to the unconditional $p(x)$ LM distribution.
- $x_{t+1} \sim \frac{1}{\beta}\left(\widetilde{p}_{t+1}^{\,\gamma_{gm}} \, p_{t+1}^{1-\gamma_{gm}}\right)$, where $p_{t+1}$ and $\widetilde{p}_{t+1}$ are the unmodified and modified output distributions, $\beta$ is a normalizing factor, and $\gamma_{gm}$ interpolates between them (a short sampling sketch follows).
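A short sketch of the post-norm geometric mean fusion above: the next token is drawn from a normalized geometric interpolation of $\widetilde{p}_{t+1}$ and $p_{t+1}$ (as $\gamma_{gm} \to 1$ this recovers the updated distribution, and as $\gamma_{gm} \to 0$ the unconditional LM):

```python
# Post-norm geometric mean fusion: sample x_{t+1} from (p̃^γ · p^(1-γ)) / β,
# where β is simply the normalizing constant of the fused distribution.
import torch

def fuse_and_sample(p_modified, p_unmodified, gamma_gm=0.95):
    fused = p_modified.pow(gamma_gm) * p_unmodified.pow(1.0 - gamma_gm)
    fused = fused / fused.sum(dim=-1, keepdim=True)   # divide by β
    return torch.multinomial(fused, num_samples=1)
```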
**4. PPLM provides two functionalities**
- A score that can be used to rank samples based on the LL of the desired attribute.
- A gradient ascent direction to perform an update in the latent space.

## Experiments, Results, and Evaluation
- We conduct an ablation study with four variants:
- **B**: the baseline, an unchanged GPT-2 LM, sampled once.
- **BR**: B, but sampled $r$ times, with the best sample chosen based on the attribute log-likelihood (LL) ranking and filtering based on the Dist (diversity) score.
- **BC**: update the latent representations ($\tilde{H}_t$) and then sample once.
- **BCR**: update the latent representations ($\tilde{H}_t$), generate $r$ samples, and choose the best sample based on the LL score (see the ranking sketch after this list).
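A rough sketch of the "R" (re-ranking) step shared by BR and BCR; `attribute_log_likelihood` and `dist_score` are hypothetical placeholders for the attribute model's LL and the Dist (distinct n-gram) diversity score:

```python
# Draw r samples elsewhere, then filter out repetitive ones and keep the sample
# with the highest attribute log-likelihood. Placeholder scoring functions are assumed.
def rank_and_select(samples, attribute_log_likelihood, dist_score, dist_threshold=0.9):
    kept = [s for s in samples if dist_score(s) >= dist_threshold] or samples
    return max(kept, key=attribute_log_likelihood)
```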
- As baseline approaches we consider:
- **CTRL**: a recent conditional language model trained with control codes.
- **GPT2-FT-RL**: a GPT-2 LM fine-tuned for human-evaluated positivity with RL.
- **WD**: a weighted decoding baseline in which the B LM’s outputs are weighted directly toward maximizing $p(a|x)$.
- BoW Attribute Models
- The simplest attribute model we use gives the log of the sum of output probabilities of the words in a predefined Bag of Words (BoW) $\{w_1, \ldots, w_k\}$: $\log p(a|x) = \log\left(\sum_{i} p_{t+1}[w_i]\right)$ (a minimal sketch follows below).
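A minimal sketch of the BoW attribute score; `bow_ids` is assumed to hold the vocabulary indices of the bag-of-words keywords:

```python
# BoW attribute model: log of the summed output probabilities of the keyword tokens.
import torch

def bow_log_prob(next_token_logits, bow_ids):
    probs = torch.softmax(next_token_logits, dim=-1)      # p_{t+1} over the vocabulary
    return torch.log(probs[..., bow_ids].sum(dim=-1))     # log p(a|x)
```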
- Discriminator Attribute Models
- We optimize for a higher probability of the sequence having a specific attribute by considering changes only to the next token to be generated (a sketch of such a discriminator follows below).
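A sketch of a discriminator attribute model in the spirit of the paper: a single linear layer that predicts the attribute from the mean of the LM's output representations across time (layer sizes and names here are illustrative assumptions):

```python
# Single-layer discriminator on top of the (frozen) LM's representations, averaged over time.
import torch
import torch.nn as nn

class AttributeDiscriminator(nn.Module):
    def __init__(self, embed_dim=1024, num_classes=2):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, hidden_states):               # (batch, seq_len, embed_dim)
        mean_repr = hidden_states.mean(dim=1)        # average representation over time steps
        return torch.log_softmax(self.classifier(mean_repr), dim=-1)  # log p(a|x)
```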
## Conclusion
- PPLM flexibly combines a large, pre-trained LM and a BoW or a small, easy-to-train discriminator.
- PPLM achieves fine-grained control of attributes via a simple gradient-based sampling mechanism.
## Appendix

[Reference](https://arxiv.org/pdf/1912.02164.pdf)