# PLUG AND PLAY LANGUAGE MODELS: A SIMPLE APPROACH TO CONTROLLED TEXT GENERATION

###### tags: `RL Group meeting` 2023/3/28

## Outline

- Abstract
- Introduction
- Related Work
- Plug and Play Language Models
- Experiments, Results, and Evaluation
- Conclusion

## Abstract

- Controlling attributes of generated language is difficult without modifying the model architecture or fine-tuning on attribute-specific data, which entails the significant cost of retraining.
- PPLM combines a **pretrained LM** with one or more simple **attribute classifiers** that guide text generation without any further training of the LM.
- Instead of retraining the LM, **PPLM corrects the LM's output with an additional attribute model** so that it satisfies the desired attribute.
- The attribute model can be either a bag of words representing a topic, or a small trained model that scores an attribute.
- The attribute model's gradient is propagated back to the LM's hidden states, which are shifted accordingly.
- The LM's output is thereby skewed toward the desired topic.

## Introduction

- We demonstrate the PPLM approach using a GPT-2 345M model as the general-purpose LM $p(x)$.
- The method applies in any representation space from any transformer-based text generator and allows combination with any attribute model $p(a|x)$.

![](https://i.imgur.com/IMRXBLO.png)

- We introduce the Plug and Play LM for controlled language generation, discuss its relation to existing work, and describe how sampling from a PPLM works.
- We quantify effectiveness using both automated and human evaluation.
- We show that the PPLM approach can be used to **detoxify** instances where generation of toxic content is likely, by following the negative gradient of a model trained to detect toxicity.

## Related Work

- Controlled generation
  - Current models must be fine-tuned separately for each specific attribute.
  - Our method requires no retraining of any conditional generative model; the language model and the attribute model can be flexibly assembled.
- Noisy Channel Modeling
  - Their approach translates a source-language sentence $y$ into a target-language sentence $x$ by first sampling from a forward model proposal distribution $p_{forward}(x|y)$ and then reranking samples based on probabilities given by $p_{backward}(x|y) \propto p(x)\,p(y|x)$.
  - PPLM scores samples using the same basic equation, but we have no forward or proposal model $p_{forward}(x|a)$; we rely on latent-space updates instead.
- Weighted decoding
  - Control with weighted decoding (WD) is difficult and often sacrifices fluency and coherence.
  - Sophisticated sampling methods can be used to constrain the model's generation to certain keywords and topics.
- Text Style Transfer
  - A key difference between the above and our approach is that we use an offline discriminator and perform optimization based on this discriminator.

## Plug and Play Language Models

**1. Language Modeling with Transformers**

- Given a sequence of tokens $X = \{x_0, \ldots, x_n\}$, the LM factorizes its probability autoregressively:
  - $p(X)=\prod_{i=1}^n p\left(x_i \mid x_0, \cdots, x_{i-1}\right)$
- $H_t=\left[\left(K_t^{(1)}, V_t^{(1)}\right), \cdots,\left(K_t^{(l)}, V_t^{(l)}\right)\right]$, where $\left(K_t^{(i)}, V_t^{(i)}\right)$ corresponds to the key-value pairs of the $i$-th layer generated at all time-steps from 0 to $t$.
- Generation is then the recurrence $o_{t+1}, H_{t+1}=\operatorname{LM}\left(x_t, H_t\right)$, illustrated by the sketch below.
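To make the recurrence concrete, here is a minimal sampling loop. Using the HuggingFace `transformers` API is my own assumption for illustration; the paper does not prescribe tooling, but `gpt2-medium` does correspond to the 345M checkpoint it reports using.

```python
# Minimal sketch of o_{t+1}, H_{t+1} = LM(x_t, H_t):
# H_t is the per-layer key-value cache, o_{t+1} the next-token logits.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")  # GPT-2 345M
model = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()

tokens = tokenizer.encode("The potato", return_tensors="pt")
with torch.no_grad():
    out = model(tokens, use_cache=True)   # prime the cache with the prompt
    past = out.past_key_values            # H_t: (K, V) pairs for every layer
    logits = out.logits[:, -1, :]         # o_{t+1}
    for _ in range(20):
        probs = torch.softmax(logits, dim=-1)
        x_t = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, x_t], dim=1)
        # Feed only the newest token; the cached H_t carries the history.
        out = model(x_t, past_key_values=past, use_cache=True)
        past, logits = out.past_key_values, out.logits[:, -1, :]

print(tokenizer.decode(tokens[0]))
```

PPLM exploits exactly this interface: since the entire history is summarized in $H_t$, perturbing $H_t$ is enough to steer every subsequent generation step.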
**2. Steering Generation: Ascending log $p(a|x)$**

- We shift the history $H_t$ in the direction of the sum of two gradients:
  - one toward higher log-likelihood (LL) of the attribute $a$ under the conditional attribute model $p(a|x)$;
  - one toward higher LL of the unmodified language model $p(x)$.
- Combining these factors with a variable multiplier provides a controllable "knob" to guide generation in a given direction with a specified strength.
- $\Delta H_t$ is initialized at zero and updated with gradients from an attribute model that measures the extent to which the generated text possesses the desired attribute.
- We rewrite the attribute model $p(a|x)$ as $p(a|H_t + \Delta H_t)$.

![](https://i.imgur.com/5PFkcH0.png)

- Ascending $\log p(a|x)$ alone quickly results in unrealistic adversarial or fooling examples, as the text moves into low-probability regions.

**3. Ensuring Fluency: Ascending log $p(x)$**

- Kullback–Leibler (KL) Divergence: the update also minimizes the KL divergence between the output distributions of the modified and unmodified LMs.

![](https://i.imgur.com/jYqjNxJ.png)

- Post-norm Geometric Mean Fusion: it serves to constantly tie the generated text to the unconditional $p(x)$ LM distribution.
  - $x_{t+1} \sim \frac{1}{\beta}\left(\tilde{p}_{t+1}^{\,\gamma_{gm}} \cdot p_{t+1}^{\,1-\gamma_{gm}}\right)$, where $\beta$ is a normalizing coefficient; as $\gamma_{gm} \to 1$ this converges to the modified distribution, and as $\gamma_{gm} \to 0$ to the unmodified one.
- A toy end-to-end sketch of the update and fusion steps appears at the end of these notes.

**4. PPLM provides two functionalities**

- A score that can be used to rank samples based on the LL of the desired attribute.
- A gradient ascent direction to perform an update in the latent space.

![](https://i.imgur.com/KfYvFqr.png)

## Experiments, Results, and Evaluation

- We conduct an ablation study with four variants:
  - **B**: the baseline, an unchanged GPT-2 LM, sampled once.
  - **BR**: B, but sampled $r$ times, with the best sample chosen based on the LL ranking and filtered based on the Dist score.
  - **BC**: update the latent representations ($\tilde{H}_t$), then sample once.
  - **BCR**: update the latent representations ($\tilde{H}_t$), generate $r$ samples, and choose the best one based on the LL score.
- As baseline approaches we consider:
  - **CTRL**: a recent conditional language model trained with control codes.
  - **GPT2-FT-RL**: a GPT-2 LM fine-tuned with RL for human-evaluated positivity.
  - **WD**: a weighted decoding baseline in which the B LM's outputs are weighted directly toward maximizing $p(a|x)$.
- BoW Attribute Models
  - The simplest attribute model we use gives the log of the sum of likelihoods of each word in a predefined Bag of Words (BoW): $\log p(a|x) = \log\big(\sum_i p_{t+1}[w_i]\big)$.

![](https://i.imgur.com/rmKbM8d.png)
![](https://i.imgur.com/BV4VYAH.png)
![](https://i.imgur.com/tv01dXF.png)

- Discriminator Attribute Models
  - We optimize for a higher probability of the sequence having a specific attribute by considering changes only to the next token to be generated.

![](https://i.imgur.com/w9LjW9j.png)
![](https://i.imgur.com/kXd3JWL.png)
![](https://i.imgur.com/EO9ENzn.png)

## Conclusion

- PPLM flexibly combines a large, pre-trained LM with a BoW or a small, easy-to-train discriminator.
- PPLM achieves fine-grained control of attributes via a simple gradient-based sampling mechanism.

## Appendix

![](https://i.imgur.com/ZCXTzC5.png)

[Reference](https://arxiv.org/pdf/1912.02164.pdf)
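As promised above, here is a toy, self-contained sketch of one PPLM step: ascend $\log p(a|x)$ with a BoW attribute model (step 2), then tie sampling back to the unmodified LM with post-norm geometric mean fusion (step 3). Everything is simplified for illustration: the latent $H_t$ is collapsed to a single hidden vector with a linear LM head, `bow_ids` is a hypothetical topic word list, and the paper's KL penalty and gradient normalization are omitted.

```python
# Toy sketch of one PPLM perturbation + fusion step (not the official code;
# see uber-research/PPLM for the real implementation, which perturbs the
# per-layer key-value cache and adds a KL penalty and gradient normalization).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, hidden_dim = 50, 16
lm_head = torch.nn.Linear(hidden_dim, vocab_size)  # stand-in for the LM
bow_ids = torch.tensor([3, 7, 21])                 # hypothetical topic BoW

H = torch.randn(1, hidden_dim)                     # unperturbed latent H_t
delta = torch.zeros_like(H, requires_grad=True)    # Delta H_t, initialized at zero

alpha, num_steps = 0.05, 3
for _ in range(num_steps):
    # p_{t+1} from the shifted latents H_t + Delta H_t
    probs = F.softmax(lm_head(H + delta), dim=-1)
    # BoW attribute model: log p(a|x) = log(sum of topic-word probabilities)
    loss = -torch.log(probs[0, bow_ids].sum())
    loss.backward()
    with torch.no_grad():
        delta -= alpha * delta.grad                # gradient ascent on log p(a|x)
        delta.grad.zero_()

gamma_gm = 0.9
with torch.no_grad():
    p_mod = F.softmax(lm_head(H + delta), dim=-1)  # modified distribution
    p_orig = F.softmax(lm_head(H), dim=-1)         # unmodified p(x) distribution
    # Post-norm geometric mean fusion; renormalization plays the role of 1/beta.
    fused = p_mod.pow(gamma_gm) * p_orig.pow(1.0 - gamma_gm)
    fused = fused / fused.sum(dim=-1, keepdim=True)
    x_next = torch.multinomial(fused, num_samples=1)
```

With $\gamma_{gm}$ close to 1, `fused` stays near the attribute-steered distribution, while the $p(x)$ factor keeps sampling anchored to fluent continuations.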