Backbone model

LLaMA

LLaMA
lit-llama
LLaMA-Adapter
LLaMA: Open and Efficient Foundation Language Models (paper)
LLaMA 2

ChatGLM

ChatGLM-6B code
ChatGLM-6B_v2_huggingface
GLM: General Language Model Pretraining with Autoregressive Blank Infilling (paper)


ChatGLM-LoRA-RLHF-PyTorch code

T5 model

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Github

Parameter-efficient Adapters

Towards a Unified View of Parameter-Efficient Transfer Learning
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

Progress direction V1

Understanding of Noise \ HyperPrompt

Some methods

Towards a Better Understanding of Noise in Natural Language Processing

HyperPrompt: Prompt-based Task-Conditioning of Transformers

CTG

HMMs

Prediction-Constrained Hidden Markov Models for Semi-Supervised Classification

Tractable Control for Autoregressive Language Generation

  • Diversity and fluency: in an HMM, different hidden states can represent multiple possible generation outcomes, which yields diverse generated text. This can be achieved by introducing more variability into the HMM's state transitions.
  • Semantic and referential consistency: the HMM's state transitions and emission probabilities can be designed so that semantic and referential consistency are preserved during generation. This requires model parameters that capture the meaning of the context.
  • Restricting the generation range: constraining the state transitions and emission probabilities limits what the HMM can generate and avoids meaningless or absurd text (see the sketch after this list).
  • The only paper (so far) stating that CTG can be realized by combining it with PCs (specifically HMMs).
  • Datasets: CommonGen
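A minimal sketch of the HMM idea in the bullets above, under toy assumptions: emission probabilities are masked by an allowed-word set so the generation range is restricted. The vocabulary, state count, and all probabilities are invented for illustration; this is not the GeLaTo/PC implementation.

```python
import numpy as np

# Toy HMM: 2 hidden states over a 4-word vocabulary (all values are made up).
vocab = ["dog", "runs", "sleeps", "banana"]
pi = np.array([0.7, 0.3])                   # initial state distribution
A = np.array([[0.2, 0.8],                   # state-transition probabilities
              [0.6, 0.4]])
B = np.array([[0.6, 0.1, 0.1, 0.2],         # emission probabilities per state
              [0.1, 0.5, 0.3, 0.1]])

# Constraint: only allow words from this set (restricts the generation range).
allowed = {"dog", "runs", "sleeps"}
mask = np.array([1.0 if w in allowed else 0.0 for w in vocab])

def constrained_sample(length, seed=0):
    """Sample a word sequence from the HMM with emissions masked by the constraint."""
    rng = np.random.default_rng(seed)
    words, state_dist = [], pi
    for _ in range(length):
        state = rng.choice(len(state_dist), p=state_dist)   # pick a hidden state
        probs = B[state] * mask                              # zero out disallowed words
        probs = probs / probs.sum()                          # renormalize emissions
        words.append(vocab[rng.choice(len(vocab), p=probs)])
        state_dist = A[state]                                 # next-state distribution
    return " ".join(words)

print(constrained_sample(5))   # never contains "banana"
```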

CTG

Why is constrained neural language generation particularly challenging?

COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics

  • The sequence produced by Langevin Dynamics is top-masked against the sequence produced by an ordinary LM (a sketch of this step follows below).
  • Handles conflicts or contradictions between constraints better, e.g., requiring the generated text to satisfy both a style constraint and a semantic constraint at the same time.
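A rough sketch of the top-mask step mentioned above, under simplifying assumptions: at one position, the relaxed logits coming from the Langevin updates are only allowed to pick tokens that the base LM itself ranks in its top-k. Tensor shapes and values are invented; this is not the official COLD code.

```python
import torch

vocab_size, k = 10, 3
soft_logits = torch.randn(vocab_size)   # relaxed logits at one position (from Langevin updates)
lm_logits = torch.randn(vocab_size)     # base LM's logits at the same position

topk_ids = lm_logits.topk(k).indices    # tokens the plain LM finds plausible
mask = torch.full_like(soft_logits, float("-inf"))
mask[topk_ids] = 0.0                    # allow only the LM's top-k tokens

token_id = (soft_logits + mask).argmax().item()   # discretize to a fluent token
print(token_id)
```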

Effective Unsupervised Constrained Text Generation based on Perturbed Masking

COLLIE: Systematic Construction of Constrained Text Generation Tasks

Parallel Refinements for Lexically Constrained Text Generation with BART

Constrained Beam Search

Controllable Text Generation with Language Constraints

  • Balance the specified constraints and fluency

MultiControl_github

Inference GeLaTo

# presumably adds a local Julia install to PATH for the GeLaTo inference setup (path is machine-specific)
export PATH=$PATH:/home/Work/julia-1.9.3/bin/
Current idea:
Problem / challenge: constrained text generation struggles with polysemy and ambiguity, so the generated result can be uncertain or fail to match expectations.
Ambiguity modeling: use probabilistic circuits to model the ambiguity in the generation process efficiently.
Constraint integration: integrate the constraint information into the probabilistic circuit so that the generated text satisfies the given constraints. These can be additional conditions such as grammar rules or contextual requirements (a sketch follows below).
Uncertainty quantification: represent the multiple possible choices at a given generation step explicitly as a probability distribution, so the generated text carries probabilistic information.
Dynamic adjustment: dynamically adjust the model's attention or weights during generation to better fit a specific constraint or context.
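A small sketch of the constraint-integration and uncertainty-quantification points, assuming a toy sum-product circuit: one sum node over two product nodes, each factorizing two token positions X1 and X2. The vocabulary, weights, and the "X2 must be a verb" constraint are all invented; a real system would use a PC library and an actual LM.

```python
import numpy as np

# Tiny circuit: mixture (sum node) over two product nodes, each factorizing X1 and X2.
vocab = ["cat", "dog", "runs", "eats"]
w = np.array([0.6, 0.4])                        # sum-node (mixture) weights
p_x1 = np.array([[0.5, 0.3, 0.1, 0.1],          # P(X1 | component)
                 [0.1, 0.2, 0.4, 0.3]])
p_x2 = np.array([[0.1, 0.1, 0.5, 0.3],          # P(X2 | component)
                 [0.3, 0.3, 0.2, 0.2]])

# Constraint integration: X2 must be a verb, encoded as an evidence mask over the vocabulary.
constraint = np.array([0.0, 0.0, 1.0, 1.0])

# Marginalize X2 under the constraint inside each component, then combine at the sum node.
comp_evidence = p_x2 @ constraint               # P(constraint | component)
joint_x1 = (w * comp_evidence) @ p_x1           # unnormalized P(X1, constraint)
posterior_x1 = joint_x1 / joint_x1.sum()        # P(X1 | constraint): uncertainty over choices

for word, p in zip(vocab, posterior_x1):
    print(f"P(X1={word!r} | X2 is a verb) = {p:.3f}")
```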

Comparison to traditional task-specific approaches

  • Flexibility:
    • Traditional Task-Specific Approaches: Typically involve designing a task-specific model or system from scratch. This can be resource-intensive and less flexible when adapting to new tasks or changing requirements.
    • Constraint-Based Text Generation: Offers more flexibility as the same underlying model can be adapted to various tasks by adjusting the constraints. This versatility can be particularly advantageous in dynamic environments or when dealing with a range of tasks.
  • Resource Efficiency:
    • Traditional Task-Specific Approaches: Require collecting and annotating large amounts of task-specific data. Building and training models for each task can be resource-intensive.
    • Constraint-Based Text Generation: May leverage pre-trained language models, which have been trained on vast amounts of general data. This can significantly reduce the need for extensive task-specific datasets and training resources.
  • Scalability:
    • Traditional Task-Specific Approaches: Building and maintaining models for multiple tasks can be challenging to scale, especially when dealing with diverse tasks or domains.
    • Constraint-Based Text Generation: Offers scalability because the same model architecture can be reused across different tasks with adjustments made to the constraints. This can simplify the process of scaling to new tasks.
  • Rapid Prototyping:
    • Traditional Task-Specific Approaches: Developing a new task-specific model can be time-consuming, especially in cases where the task is not well-defined or evolving.
    • Constraint-Based Text Generation: Enables rapid prototyping and experimentation. Since the base model is pre-trained, adapting it to a new task involves defining the constraints, which can expedite the development process.
  • Transfer Learning:
    • Traditional Task-Specific Approaches: May not easily transfer knowledge learned from one task to another.
    • Constraint-Based Text Generation: Capitalizes on transfer learning as the pre-trained model brings general language understanding. This knowledge can be fine-tuned for specific tasks with the introduction of task-specific constraints.
  • Consistency and Compliance:
    • Traditional Task-Specific Approaches: Ensuring consistency and compliance with specific constraints or guidelines may require extensive manual effort.
    • Constraint-Based Text Generation: Offers a systematic way to enforce constraints, ensuring generated content aligns with predefined rules, standards, or domain-specific requirements.
Probabilistic circuits

Probabilistic circuits slide from UCLA
Probabilistic Circuits: Representation and Inference (paper)
Tractable Control for Autoregressive Language Generation

10/24 \ 11/27

10/24

Why is constrained neural language generation particularly challenging?

11/27

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

  • We propose a probabilistic dialogue system for KGD that can be learned by approximate MLE with sequential posterior inference (SPI).

Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Both emphasize constraining the model so that the robot's navigation stays safe.

Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue


Others

REPLUG: Retrieval-Augmented Black-Box Language Models

The contents above are previous ideas.