reference: https://www.youtube.com/watch?v=AVIKFXLCPY8&list=PLJV_el3uVTsPz6CTopeRp2L2t4aL_KgiI&pp=iAQB

# LLM (Large Language Model)

[TOC]

Large Language Models (LLMs) are a type of deep learning model. The more parameters a model has, the more complex and powerful it becomes.

![image](https://hackmd.io/_uploads/ryA-2HY30.png)

Paper source: arXiv

## Generative AI

Definition: Generative AI refers to systems that can generate complex, structured objects.

![image](https://hackmd.io/_uploads/H1uMHwF3R.png)

To train a model to discover the underlying pattern or function, vast amounts of data are required!

## Language Model (LM)

![image](https://hackmd.io/_uploads/BJjbNPYn0.png)

Examples of language models:
- ChatGPT
- Gemini

## Prompt Engineering

Prompt engineering is the process of crafting prompts to guide LLMs toward generating the desired response.

![image](https://hackmd.io/_uploads/HyYQae2CR.png)

### Magic Prompt

![image](https://hackmd.io/_uploads/r1t_pgnCA.png)

Key techniques for creating effective prompts:
1. "Let's think step by step" | breaks down the thought process.
2. Ask the model to explain its answer (Chain of Thought).
3. "This is important to my life" | emotional appeals as a prompt strategy.
4. It is more effective to tell LLMs what to do than what not to do.
5. Rewards and penalties can help optimize responses.

https://arxiv.org/abs/2312.16171

### Premise & Providing Examples (In-context Learning)

In-context learning means giving examples within the prompt to improve the accuracy of LLM responses.

![image](https://hackmd.io/_uploads/HJa00enRA.png)

#### In-context Learning

Providing examples helps improve the correctness of responses. For example, reminding GPT-4 to "read the example" can lead to more accurate answers.

![image](https://hackmd.io/_uploads/BJgbcuYnR.png)

Gemini 1.5 has strong in-context learning capabilities.

![image](https://hackmd.io/_uploads/S1lKidY3A.png)

### Breaking Down a Task

Ref: https://arxiv.org/abs/2210.06774 (Recursive Reprompting and Revision)

Complex tasks can be broken down through techniques like recursive reprompting and revision.

![image](https://hackmd.io/_uploads/rkhLLO-6R.png)

This is similar to the chain-of-thought technique.

![image](https://hackmd.io/_uploads/S1ToO_WaA.png)

GPT-3.5 applies the chain-of-thought technique by default.

## Self-Reflection by the LLM

Self-reflection is valuable because it allows the model to identify its own errors: the LLM has a chance to discover that an answer is wrong. The reason is that generating a correct answer is harder than verifying one (just like for humans).

![image](https://hackmd.io/_uploads/SyONt3oa0.png)

### Constitutional AI (Harmlessness from AI Feedback)

This refers to the AI ensuring its responses remain harmless by following predefined safety policies. -> policy protection

![image](https://hackmd.io/_uploads/SJkNWoK60.png)

![image](https://hackmd.io/_uploads/HkXvHiKTA.png)

## Self-Consistency

Since LLM responses vary due to randomness, the same question can be asked multiple times and the best (most common) response kept, improving the accuracy rate.

![image](https://hackmd.io/_uploads/Hyo-OjFTC.png)

https://arxiv.org/abs/2203.11171

### Tree of Thoughts

Combines self-reflection and self-consistency to refine the reasoning process.

![image](https://hackmd.io/_uploads/S1oYf1s60.png)
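A minimal sketch of the self-consistency idea above, assuming any chat-style LLM client: sample the same question several times with a chain-of-thought trigger and keep the majority-vote answer. The helper names (`self_consistency`, `extract_answer`) and the last-line answer extraction are illustrative assumptions, not part of any specific API.

```python
from collections import Counter
from typing import Callable

def extract_answer(reply: str) -> str:
    """Naive answer extraction: take the last non-empty line of the reply."""
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    return lines[-1] if lines else ""

def self_consistency(question: str,
                     call_llm: Callable[[str], str],
                     n_samples: int = 5) -> str:
    """Sample several reasoning paths and return the majority-vote answer.

    `call_llm` is any function that sends a prompt to an LLM (at a non-zero
    temperature, so the samples differ) and returns the raw text reply.
    """
    prompt = f"{question}\nLet's think step by step."  # chain-of-thought trigger
    answers = [extract_answer(call_llm(prompt)) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The vote only helps when the sampled reasoning paths actually differ (temperature > 0), and in practice the answer-extraction step and the number of samples are the main knobs.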
## Letting GPT Use Tools (DALL-E, etc.)

An LLM can use different tools to answer complex questions.

![image](https://hackmd.io/_uploads/H1QQ3aoTC.png)

A single question may require multiple tools to answer.

![image](https://hackmd.io/_uploads/SJq-aToa0.png)

https://youtu.be/ZID220t_MpI?feature=shared

### Retrieval Augmented Generation (RAG)

![image](https://hackmd.io/_uploads/rJ2Tp2s6C.png)

In this method, GPT-4 decides when to use external tools, such as the internet, to provide accurate and relevant answers. For instance, you could append "please use the internet to answer this question" to a prompt.

### Program of Thought

Let GPT write code, execute the program, and return the result.

![image](https://hackmd.io/_uploads/By83zTjp0.png)

### GPT Plug-ins

GPT plug-ins extend functionality and enable tool use.

## Model Cooperation

Combining models leads to better results (1 + 1 > 2): models can complement each other's strengths.

![image](https://hackmd.io/_uploads/Hy6EG0o6A.png)

e.g. FrugalGPT

### Self-Reflection Across Models

Different models can reflect on each other's output and collaborate to improve their results.

![image](https://hackmd.io/_uploads/ByLdyu80C.png)

![image](https://hackmd.io/_uploads/H1oYydI0A.png)

### Multiple Agents & Multi-Agent Debate

Using multiple AI agents to hold debates improves problem-solving. Each agent offers insights, and the debate continues until the best conclusion is reached.

![image](https://hackmd.io/_uploads/SJAPnOL0C.png)

### Debate Methods

Avoid ending a debate too quickly.

![image](https://hackmd.io/_uploads/HyZZpOIA0.png)

How to determine whether the debate has finished:

![image](https://hackmd.io/_uploads/By8ERO8AR.png)

### Debate Prompt

An example prompt that keeps agents from ending the debate too quickly:

![image](https://hackmd.io/_uploads/rJ2eJYIRR.png)

# LLM Training

![image](https://hackmd.io/_uploads/Hy-YNpNl1x.png)

## Step 1 (Self-Supervised Learning)

![image](https://hackmd.io/_uploads/ByhpQ2Exyg.png)

- Language knowledge
- World knowledge (expert knowledge, domain knowledge)

### Training Method

Crawl data from the internet -> self-supervised learning.

![image](https://hackmd.io/_uploads/B14eU3VeJx.png)

### Cleaning Data

- Content filtering -> filter out harmful data
- Text extraction -> remove HTML tags
- Quality filtering -> remove low-quality data
- Repetition removal -> remove duplicate data
- Test-set filtering -> remove data that overlaps with the test set

### LLM Capability

- Parameters -> innate aptitude (先天資質)
- Data amount -> acquired learning (後天學習)

| Model | Parameters | Training data |
| ----- | ---------- | ------------- |
| GPT-1 | 117 M      | 1 GB          |
| GPT-2 | 1542 M     | 40 GB         |
| GPT-3 | 175 B      | 580 GB        |

## Step 2: Instruction Fine-Tuning (Supervised Learning)

-> Requires labels (USER: / AI:), and the labeled answer is produced one token at a time, e.g. (侯智晟 is a name):

- USER: 最帥的人是誰? (Who is the most handsome person?)  AI: 侯
- USER: 最帥的人是誰?侯  AI: 智
- USER: 最帥的人是誰?侯智  AI: 晟

Conclusion: we can use the parameters obtained from self-supervised learning as the initial parameters, and then use human-labeled data to fine-tune the LLM.

### Adapter

An adapter is a technique that does not adjust the original parameters; instead, it appends new parameters that modify the original function.

- Reduces training time
- Keeps the new parameters from drifting far away from the original parameters

![image](https://hackmd.io/_uploads/r1V_p6Ngkx.png)

- Pre-training -> learns the complex rules
- Adapter (e.g. LoRA) -> F(x) + G(x) (see the sketch at the end of this section)
- Key point -> with a strong pre-trained model, fine-tuning gives strong results

![image](https://hackmd.io/_uploads/HJcTfRElJg.png)

![image](https://hackmd.io/_uploads/H1AVB0Vg1g.png)

General fine-tuning:

![image](https://hackmd.io/_uploads/Hk3pH0Ngyx.png)

Review of methods: https://arxiv.org/abs/1909.03329v2
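A minimal sketch of the LoRA-style adapter in the F(x) + G(x) form above, assuming PyTorch. The class name, the rank, and the initialization are illustrative choices rather than the implementation of any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA adapter: output = F(x) + G(x), where F is the frozen
    pre-trained linear layer and G is a low-rank trainable update."""

    def __init__(self, pretrained: nn.Linear, rank: int = 8):
        super().__init__()
        self.pretrained = pretrained
        for p in self.pretrained.parameters():  # original parameters stay fixed
            p.requires_grad = False
        in_f, out_f = pretrained.in_features, pretrained.out_features
        self.lora_a = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # A: down-projection
        self.lora_b = nn.Parameter(torch.zeros(out_f, rank))        # B: up-projection, starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F(x): frozen pre-trained behaviour; G(x): the small trainable correction.
        return self.pretrained(x) + x @ self.lora_a.T @ self.lora_b.T
```

Initializing B to zero makes G(x) = 0 at the start, so fine-tuning begins exactly at the pre-trained function and the new parameters can only drift as far as the low-rank update allows.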
### InstructGPT

![image](https://hackmd.io/_uploads/SyEmOA4gJg.png)

https://arxiv.org/abs/2203.02155

Instruction fine-tuning is the finishing touch.

![image](https://hackmd.io/_uploads/SJSNKA4xkg.png)

-> In instruction fine-tuning, the quality of the data is what matters most.

### Reversing ChatGPT to Obtain Instruction Fine-Tuning Data

1. Ask ChatGPT what kinds of tasks it can do.
2. For each task, have it generate a plausible user input (question).
3. For each user input, have it generate the answer.

We end up with '{Question} and {Answer}' pairs, all generated by ChatGPT.

### Initial Parameters

LLaMA -> an open-source pre-trained model.

### Fine-Tuning Your Own Model

- Pre-training: LLaMA
- Instruction fine-tuning: data obtained by reversing ChatGPT (as above)

## Step 3 (RLHF)

Reinforcement learning from human feedback! (A sketch of the reward-model loss used in RLHF appears at the end of this note.)

------------------------------------------

# [Paper - Research](/OWtbE0z5Qk-pe7UE_2HuCQ)
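Returning to Step 3 above: in the InstructGPT recipe linked earlier, RLHF first trains a reward model on human preference pairs and then tunes the LLM against it. Below is a minimal sketch of just the reward-model ranking loss, assuming PyTorch; the function name and the tensor values are illustrative.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen: torch.Tensor,
                        reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss for a reward model: for two responses to the same
    prompt, the human-preferred ("chosen") response should receive a higher
    scalar reward than the "rejected" one.
    loss = -log(sigmoid(r_chosen - r_rejected))."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage: rewards a model might assign to a batch of three preference pairs.
chosen = torch.tensor([1.2, 0.3, 0.8])
rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_ranking_loss(chosen, rejected))  # smaller when chosen > rejected
```

The trained reward model then stands in for the human: the LLM is further tuned (for example with a policy-gradient method) to produce responses that score highly under it.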