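# How to Build a Personalized Small LLM: Choosing and Applying Fine-Tuning vs. RAG - eric lam

{%hackmd @mopcon/rkdTi5NTR %}

> Start here

## Speaker Info

https://eric-lam.com/

## Slides

https://1drv.ms/p/s!ApgFIQVzhoU9gh58dze0FD338x0O

## Key Takeaways

* Major vendors
    * openai-chatgpt
    * anthropic-claude
    * meta-llama
    * google-gemini
    * x-grok
* [chatbot arena](https://lmarena.ai/)
* Why you might want a personalized LLM
    * Better understanding of how the model works
    * Privacy and security
    * Customization needs
    * Higher efficiency
    * Domain-specific applications
* [localllama](https://www.reddit.com/r/LocalLLaMA/)

### You already have one engineer: how much hardware do you still need to run an LLM?

* [Local llama 3.1 405b setup](https://www.reddit.com/r/LocalLLaMA/comments/1ej9uzh/local_llama_31_405b_setup/)
* Recommended setup (the image was not visible from the audience, QQ)

![image](https://hackmd.io/_uploads/ryndzVsxkl.png)

### FINE-TUNING vs RAG

* Before implementing either one
    * Input directly
        * System Prompt: sets the style in which the model interacts with the user
            * [leaked-system-prompts](https://github.com/jujumilk3/leaked-system-prompts)
        * Prompting
            * Improves answer accuracy
            * Steers toward a specific style and point of view
            * Jailbreak - DeepInception
            * CoT
            * Prompting example
              ![image](https://hackmd.io/_uploads/HyfnVVoeJe.png)
        * In Context Learning/Few-Shot prompting
            * Give the model several examples so it understands your intent
            * Use those examples to generate more examples
            * Very effective for data generation (Alpaca 7B was fine-tuned on generated data)
* RAG (lower difficulty; recommended to try before Fine-tuning)
    * Retrieval sources
        * internal
            * Data the model already saw during training
        * external
            * Data the model did not see during training
            * [unstructured - GitHub repo](https://github.com/Unstructured-IO/unstructured)
    * Data retrieval
        * Steps: search, then rank
        * Two main schools (they can be combined)
            * Sparse - BM25
                * ranking
                    * Term Frequency
                    * IDF
                    * Length
                * Works better when matching exact keywords, fixed terms, and phrases
            * Dense - dense retrieval
                * Performs better when the match is semantic rather than literal
            * Comparison
              ![image](https://hackmd.io/_uploads/r1qXv4sxyl.png)
              ![image](https://hackmd.io/_uploads/H1FdD4olJg.png)
              ![image](https://hackmd.io/_uploads/HkXtvVsxkl.png)
              ![image](https://hackmd.io/_uploads/BJdCPNseyl.png)
              ![image](https://hackmd.io/_uploads/S1hyONogJe.png)
* Fine-tuning (higher difficulty; it can break the model)
    * Three training stages
        * Pre-training
            * Prepare a large amount of text for the model to learn from
            * LLAMA3 - 15T tokens
            * [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb)
        * Fine-tuning
            * instruction tuning
            * [Unsloth](https://github.com/unslothai/unsloth)
                * Uses less memory
                * Can handle longer context lengths
        * Alignment
            * RLHF
                * Tells the model which Sampling result is the desired one
    * Differences in output are related to the decoding strategy
        * Greedy
        * Beam-search
        * Sampling
            * Top-P
            * Top-K

### Conclusion

* Choosing how to personalize an LLM
    * Fine-tuning: adapts a pre-trained model to a specific task through additional training
    * RAG: combines retrieved external knowledge with the model's generation ability
* Steps for improving an LLM
    * Use the system prompt to control behavior and style
    * Guide the model through context, with no retraining required
    * Use Fine-tuning or RAG for more specialized or knowledge-intensive tasks
* Future trends
    * Human-generated data is running out
    * AI-generated data contamination
    * Multimodality

## Q&A

* How much data does fine-tuning need before it has an effect on the model?
    * pre-training: all of the Traditional Chinese Wikipedia data
    * fine-tuning: 1/10 of the Traditional Chinese Wikipedia data
    * alignment: 1/100 of the Traditional Chinese Wikipedia data
* Can Unsloth be used with other models?
    * Basically any model available on HuggingFace can be used.

## Supplementary Material

- Retrieval that encodes the document directly instead of running OCR: https://arxiv.org/abs/2406.11251
- coding leaderboard: https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard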
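
The In Context Learning/Few-Shot prompting item in the notes describes giving the model several examples before the real input. As a rough illustration only, the sketch below assembles such a prompt as plain text. The function name `build_few_shot_prompt`, the `Input:`/`Output:` layout, and the sentiment examples are all hypothetical and do not come from the talk.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    parts = [instruction.strip(), ""]
    for text, label in examples:
        # Each worked example shows the model the expected input/output pattern.
        parts.append(f"Input: {text}")
        parts.append(f"Output: {label}")
        parts.append("")
    # The final block is left open so the model completes the answer.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical examples, for illustration only.
examples = [
    ("The talk was fantastic.", "positive"),
    ("The demo kept crashing.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input.",
    examples,
    "The slides were clear and useful.",
)
print(prompt)
```

The same string can be sent to any chat or completion endpoint; swapping the examples is all it takes to change the task.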
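
The Sparse - BM25 item lists three ranking ingredients: Term Frequency, IDF, and Length. The sketch below shows how they fit together in one score, using the common Okapi BM25 form. The parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, whitespace tokenization, and the toy documents are assumptions chosen for illustration, not details given in the talk.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF: rarer terms contribute more (smoothed so it stays positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length: documents longer than average are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            # Term frequency: saturates as tf grows, controlled by k1.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, tokenized by whitespace (an assumption; real text needs a tokenizer).
docs = [
    "fine tuning a small llama model".split(),
    "retrieval augmented generation with bm25 ranking".split(),
    "bm25 ranking ranks documents by term frequency".split(),
]
query = "bm25 ranking".split()
scores = bm25_scores(query, docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked, scores)
```

Both matching documents contain each query term once, so the shorter one ranks first; that is the Length factor at work. A document with no query term scores zero, which is why sparse retrieval misses paraphrases that dense retrieval can catch.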
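
The decoding strategy item lists Greedy, Beam-search, and Sampling with Top-P and Top-K. The sketch below illustrates Greedy, Top-K, and Top-P on a single next-token distribution; Beam-search is left out because it needs a full model loop. The five-word vocabulary, the logits, and all helper names are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out every index not in keep and rescale the rest to sum to 1."""
    keep_set = set(keep)
    total = sum(probs[i] for i in keep_set)
    return [probs[i] / total if i in keep_set else 0.0 for i in range(len(probs))]


def greedy(probs):
    """Greedy decoding: always pick the single most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)


def top_k_filter(probs, k):
    """Top-K: keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Top-P: keep the smallest set of top tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


def sample(probs, rng):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical vocabulary and logits for one decoding step.
vocab = ["the", "a", "cat", "dog", "zebra"]
probs = softmax([2.0, 1.5, 1.0, 0.5, -1.0])
rng = random.Random(0)

greedy_token = vocab[greedy(probs)]
top_k_probs = top_k_filter(probs, k=2)
top_p_probs = top_p_filter(probs, p=0.9)
top_k_samples = [vocab[sample(top_k_probs, rng)] for _ in range(20)]
print(greedy_token, top_k_samples)
```

Greedy gives the same token on every run, while the sampled runs vary but never leave the filtered set. This is one way to see why the same prompt can produce different outputs under different decoding settings.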