# Very Large Language Model (VLLM)

Note: this is just a quick draft responding to Prof. Yuh-Jie Lee's call for a VLLM. It was initially drafted by Chang-Chi Meng, a student of I-Chen Wu. More updates from everyone are certainly welcome.

## Related Works

### GPT

- [GPT: Improving Language Understanding by Generative Pre-Training](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf) - task-specific supervised fine-tuning
- [GPT-2: Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) - purely unsupervised learning (weak performance)
- [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf) - prompt learning (few-shot)
- [ChatGPT](https://openai.com/blog/chatgpt/): most likely based on [InstructGPT (Training language models to follow instructions with human feedback)](https://arxiv.org/pdf/2203.02155.pdf)
    - [Deep Reinforcement Learning from Human Preferences](https://arxiv.org/pdf/1706.03741.pdf)
    - [Learning to summarize from human feedback](https://arxiv.org/pdf/2009.01325.pdf)

![](https://i.imgur.com/9KIZDpZ.png)

### BLOOM [[Website]](https://www.narrativa.com/bloom-is-here-heres-what-makes-it-different-from-gpt-3/)

> ... The training started on March 11, 2022, but in fact the preparation of the corpus and the datasets started much earlier. A model with these characteristics is not achieved overnight. Four months later, here we have it. And it hasn't been easy:
> - 384 graphics cards with 80 gigabytes of memory each on the **Jean Zay supercomputer** in France.
> - BLOOM has **176 billion parameters**, one billion more than GPT-3.
> - 70 layers, 112 attention heads per layer, hidden dimensionality of 14336, sequence length of 2048 tokens.
> - **ALiBi positional embeddings**, GeLU activation function.
> ...

Comments:
- How much computing resource would it take to fine-tune it with RLHF?

### GPT-J [[Website]](https://6b.eleuther.ai/) [[Github]](https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b)

An open-source GPT-3-style model with **6B parameters**.

Comments:
- Smaller, so it is easier to train/fine-tune (feasible even from scratch), but the quality may not be as good, and it still needs RLHF fine-tuning.

### Reinforcement Learning with human feedback [[Website]](https://huggingface.co/blog/rlhf) [[Github]](https://github.com/lvwerra/trl)

Papers are listed above.

![](https://i.imgur.com/yl6V34S.jpg)
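To make the RLHF recipe above more concrete, below is a minimal, self-contained PyTorch sketch of its two learned ingredients: a reward model trained on pairwise human preferences (as in the InstructGPT and summarization papers linked above), and the KL penalty that keeps the fine-tuned policy close to the original language model. This is only an illustrative sketch with toy shapes and placeholder names; it is not the actual training code of any of the papers or of the trl repository linked above.

```python
# Illustrative sketch only: a toy reward model with the InstructGPT-style
# pairwise preference loss, plus the KL-penalized reward used in the RL step.
# All shapes, names, and hyperparameters here are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps the final hidden state of a sequence to a scalar reward."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim); score the last position.
        return self.score(hidden_states[:, -1, :]).squeeze(-1)


def preference_loss(reward_model, h_chosen, h_rejected):
    """Pairwise loss on human comparisons: -log sigmoid(r(chosen) - r(rejected))."""
    return -F.logsigmoid(reward_model(h_chosen) - reward_model(h_rejected)).mean()


def kl_penalized_reward(reward, logp_policy, logp_ref, beta: float = 0.1):
    """Reward optimized during RL: preference reward minus a KL penalty that
    keeps the fine-tuned policy close to the original (frozen) language model."""
    kl = (logp_policy - logp_ref).sum(dim=-1)  # summed over generated tokens
    return reward - beta * kl


# Toy usage with random activations: batch=2, seq_len=8, hidden_dim=64.
rm = RewardModel()
loss = preference_loss(rm, torch.randn(2, 8, 64), torch.randn(2, 8, 64))
shaped = kl_penalized_reward(
    reward=torch.tensor([1.0, 0.5]),   # reward-model scores for two samples
    logp_policy=torch.randn(2, 8),     # per-token log-probs under the policy
    logp_ref=torch.randn(2, 8),        # per-token log-probs under the reference LM
)
print(loss.item(), shaped)
```

In a real pipeline the reward model would be a full pretrained transformer with a scalar head, and the KL-penalized reward would drive a PPO update over sampled completions, which is the role played by the trl library in the Hugging Face blog post linked above.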
## Suggestions from Prof. Yuh-Jie Lee

1. *Determine the use case: The first step is to determine the use case for the VLLM. What kind of language tasks do you want the VLLM to perform and what are the specific requirements for the model in terms of accuracy, speed, and scalability?*
    - Check the list of tasks in the [ChatGPT Cheat Sheet](https://www.kdnuggets.com/publications/sheets/ChatGPT_Cheatsheet_Costa.pdf?fbclid=IwAR30J2c82I_dRGDu80Fbrb7IU7onotq5t8wIVJFx0nKgz32C9je1oBEhwCk), summarized below:
        - NLP tasks
            a. Text Generation
            b. Summarization
            c. Open Domain Question Answering
            d. Paraphrasing
            e. Sentiment Analysis (few-shot or zero-shot)
            f. Table to Text
            g. Text to Table
            h. Token Classification (few-shot or zero-shot)
            i. Dataset Generation (few-shot or zero-shot)
            j. Machine Translation
        - Code
        - Structured Output Styles
        - Unstructured Output Styles
        - Media Types
        - Meta ChatGPT
        - Expert Prompting
    - Any more tasks? Any fewer? Fake news? ...
2. *Gather data: A crucial step in developing a VLLM is to gather a large and diverse dataset. This includes text in different languages and from various sources such as books, websites, and social media.*
    - GPT-3 includes the following:
        - BooksCorpus (800M words)
        - English Wikipedia (2500M words)
        - Reddit (800M documents / 40GB)
        - WebText2 (19B tokens)
    - Gais? PTT? We need something clean like Wikipedia!
3. *Choose an architecture: Based on the use case and data, you will need to choose a suitable architecture for the VLLM. You could consider existing architectures such as Transformer, GPT, or RoBERTa, or develop a custom architecture based on your requirements.*
    - InstructGPT. We also need to know how to specify the rewards (professional annotators?).
4. *Train the model: Once the architecture is chosen, the model needs to be trained on the data. This is a computationally intensive process that requires a lot of computing resources. You could consider using cloud computing services or setting up a high-performance computing cluster to train the model.*
    - TWCC II, not TWCC I/III/IV. (Any other choices?)
5. *Evaluate the model: Once the model is trained, it needs to be evaluated to determine its performance. This includes evaluating the model on various language tasks such as language translation, text classification, and question answering.*
    - To be discussed. (A minimal perplexity-evaluation sketch is appended at the end of this note.)
6. *Deploy the model: After the model is evaluated and its performance is satisfactory, it can be deployed for use. This includes integrating the model into applications or services that need language processing capabilities.*
    - To be discussed.
7. *Continuously improve the model: Finally, the model needs to be continuously improved over time by incorporating new data, fine-tuning the model, and updating the architecture as needed.*
    - To be discussed.

## Side Notes

- The name VLLM: not sure whether VLLM is a good name, since it is hard to make it **bigger** than GPT-3 (with all the computing resources we can access), and with the help of InstructGPT-style training we probably do not need one bigger than GPT-3 for now. How about LLM-T (LLM from Taiwan)?
- New subscription plan from OpenAI: ChatGPT Plus.
- GPT-4: not yet released.
- Passing courses (shared by 王宏恩):
    - Prof. Christian Terwiesch of the business school at UPenn had ChatGPT answer the final exam of his required MBA course on Operations Management. He first fed his course materials to ChatGPT and then had it answer the seven essay and calculation questions on the final; it scored A+, A+, C, B-, F, B, and A+, for an overall grade of B, enough to earn credit for this required MBA course.
    - ChatGPT answered the case-based essay questions perfectly, but got stuck on questions involving (roughly primary/secondary-school-level) math calculations.
    - Terwiesch believes this will impact business-school education and the purpose of the MBA. The article was posted on his personal website four days ago.
- Some interesting examples:
    - Take over the world. ![](https://i.imgur.com/DRqf5ic.png)
    - Four kids. ![](https://i.imgur.com/MypZ9Xh.png)
    - Wife is right. ![](https://i.imgur.com/51dcZJt.jpg)
    - Gender bias. ![](https://i.imgur.com/Xd4RC3O.jpg)
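Appendix for item 5 above (evaluating the model): held-out perplexity is a cheap intrinsic sanity check that can run throughout training, before any task-specific benchmarks. The sketch below is an illustrative example using Hugging Face transformers; the `gpt2` checkpoint and the one-sentence "validation set" are stand-in assumptions, not a proposed evaluation protocol.

```python
# Minimal perplexity-evaluation sketch (illustrative only; "gpt2" is a small
# stand-in for whichever model we eventually train).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM on the Hugging Face Hub works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Hypothetical held-out text; in practice this would be a proper validation set.
text = "Taiwan is an island in East Asia, known for its semiconductor industry."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # When labels are provided, the model returns the mean cross-entropy loss
    # over predicted tokens; perplexity is exp(loss).
    out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])

print(f"cross-entropy: {out.loss.item():.3f}, perplexity: {math.exp(out.loss.item()):.1f}")
```

Task-level evaluation (translation, classification, question answering, as listed in item 5) would still require proper benchmark datasets and harnesses; this is only a first sanity check.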