# Methodology: A Survey of Generative AI Techniques

## Prompt Engineering

:::success
[Improve your prompts in the developer console](https://www.anthropic.com/news/prompt-improver) Updated: 2024-11-15
:::

## Agent

:::success
AWS Labs: [Multi-Agent Orchestrator framework](https://awslabs.github.io/multi-agent-orchestrator/)
AWS Bedrock samples: [GitHub](https://github.com/aws-samples/amazon-bedrock-samples/tree/main)
AWS Bedrock samples: [Amazon Bedrock Recipes](https://aws-samples.github.io/amazon-bedrock-samples/)
AWS Bedrock samples: [Amazon Bedrock Agent Samples](https://github.com/awslabs/amazon-bedrock-agent-samples/tree/main)
:::

:::info
Anthropic: [Building effective agents](https://www.anthropic.com/research/building-effective-agents) Updated: 2024-12-20
Paper: [A Survey on Large Language Model based Autonomous Agents](https://arxiv.org/abs/2308.11432) Updated: 2023-08-22
NVIDIA blog: [Multi-agent](https://developer.nvidia.com/blog/search-posts/?q=multi+agent)
:::

:::spoiler Anthropic: agentic system overview
![Screenshot 2024-12-22 at 4.49.17 PM](https://hackmd.io/_uploads/HknizwHr1x.png)
![Screenshot 2024-12-22 at 4.43.47 PM](https://hackmd.io/_uploads/Hyp9MvSr1e.png)
![Screenshot 2024-12-22 at 4.45.37 PM](https://hackmd.io/_uploads/H1EoGwrH1x.png)
![Screenshot 2024-12-22 at 4.50.09 PM](https://hackmd.io/_uploads/r17hGDrByx.png)
![Screenshot 2024-12-22 at 4.53.43 PM](https://hackmd.io/_uploads/Hy3nzDrrJl.png)
![Screenshot 2024-12-22 at 4.55.07 PM](https://hackmd.io/_uploads/BJF6GPrrJx.png)
![Screenshot 2024-12-22 at 4.58.23 PM](https://hackmd.io/_uploads/SyKAGDSS1l.png)
![Screenshot 2024-12-22 at 4.59.56 PM](https://hackmd.io/_uploads/SJcz7PBBkl.png)
:::

**Basic terminology**

* Planning
* Tool use
* Reflection
* Multi-agent communication
* Memory

**The ReAct method**

ReAct aims to combine reasoning (for example, chain-of-thought prompting) with acting (for example, action plan generation).

**Persistence and streaming**

Persistence lets an agent's state be saved at a given point in time, so that later interactions can resume seamlessly. This matters most for long-running applications that need continuity. Streaming gives real-time visibility into what the agent is doing by emitting a sequence of signals that represent its current actions and thought process.

---

### Agent Design Pattern

:::success
LangChain: [Plan-and-Execute
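Agents](https://blog.langchain.dev/planning-agents/) Updated: 2024-02-13
LangChain: [How to Build the Ultimate AI Automation with Multi-Agent Collaboration](https://blog.langchain.dev/how-to-build-the-ultimate-ai-automation-with-multi-agent-collaboration/) Updated: 2024-05-09
:::

When Plan and Execute fits:

* Task steps are clear and relatively fixed
* Many parallel tool calls are needed
* Execution efficiency and cost matter

When ReAct fits:

* The task needs highly dynamic interaction
* Task logic is complex and hard to plan in advance
* Context understanding and reasoning need emphasis

***ReAct***

TBA

***Plan and Execute***

[Resource](https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/plan-and-execute/plan-and-execute.ipynb)

The plan-and-execute paradigm has an explicit planning phase and an explicit execution phase. A plan (a set of steps) is generated first, and sub-agents then execute the steps in order. After each step the plan may be updated or revised based on the result. This continues until the plan is complete or replanning is needed.

Key characteristics:

* An LLM first generates a complete multi-step plan
* The execution phase can use a smaller or weaker model
* Each subtask can run without consulting the large LLM
* Replanning is possible once execution finishes

Advantages:

* Better efficiency and lower cost
* Better long-horizon planning
* Faster execution

- A single agent handles one task at a time

![Untitled](https://hackmd.io/_uploads/HyZmcOWGJg.png)

***Plan and Solve***

An advanced plan-and-execute architecture. It allows variable assignment during the planning phase, so later tasks can reference the results of earlier ones. This reduces repeated LLM calls and improves efficiency. The design described here matches the architecture that the LangChain planning-agents post linked above calls ReWOO.

Key characteristics:

* Uses a variable-reference syntax (such as #E1) to refer to earlier outputs
* Each task receives only the context it needs
* Consists of three components: Planner, Worker, and Solver

Advantages:

* More efficient task execution
* More precise context management
* Support for parallel task execution

![image](https://hackmd.io/_uploads/rkn6W5gXJl.png)

***Self-Ask***

The model asks itself questions.

Key characteristics:

* The LLM proactively generates questions to collect the information a decision needs
* Extra self-questioning steps are added to the demonstrations
* Used together with an Actor-Critic architecture

Advantages:

* Better decision quality
* More strategic actions
* Better information gathering

***Critique and Revise***

Self-reflection and improvement.

Key characteristics:

* One LLM generates the output
* Another LLM evaluates it and gives feedback
* Critique and rewriting run in a loop

Advantages:

* Output quality keeps improving
* Higher accuracy
* Works across varied task types

***LLMCompiler***

![image](https://hackmd.io/_uploads/SJ0gQql71g.png)

***Language Agent Tree Search***

[Resource](https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/lats/lats.ipynb)

This method runs a tree search over possible action states: it generates actions, reflects on them, and backpropagates the information to update parent nodes. It can jump back to earlier states in the tree, so the agent can explore different action paths while reusing earlier reflections and updated information.

- A single agent handles multiple tasks

![Untitled](https://hackmd.io/_uploads/Hy24q_-Gyg.png)

***Flow Engineering***

[Reference](https://arxiv.org/abs/2401.08500)

Flow engineering, as described in the AlphaCodium paper, means designing an architecture with directed information flow and iterative loops at key points. The approach combines a linear pipeline with cyclic iteration and is tuned to a specific problem domain (such as coding). The goal is to design the best information flow and decision process for a given task.

![lab6_4](https://hackmd.io/_uploads/HydBytZfkg.png)

***Multi-Agent Collaboration***

[Resource](https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/multi_agent/multi-agent-collaboration.ipynb)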
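In a multi-agent architecture, several agents work together on shared state. Each agent can be a combination of a prompt, a language model, and tools, and contributes its own capabilities. The key point is that they all operate on, and pass along, the same shared state, so each can build iteratively on the others' work.

* Routers must be designed around the situation or conditions
* The workflow must be designed by hand

![Untitled](https://hackmd.io/_uploads/SJ4BKuZMkl.png)

***Supervisor Agent***

[Resource](https://github.com/langchain-ai/langgraph/blob/main/docs/docs/tutorials/multi_agent/agent_supervisor.ipynb)

The supervisor architecture uses a central supervisor agent that coordinates and manages the inputs and outputs of the sub-agents. The supervisor decides the specific input and task for each sub-agent, and each sub-agent can have its own internal state and process. Unlike the multi-agent collaboration approach, there is no single shared state. The supervisor coordinates the information flow between sub-agents instead.

![Untitled](https://hackmd.io/_uploads/Hkuxc_WGyg.png)

### Agentic Workflow Scenarios

:::success
DeepLearning.AI: [Agentic Design Patterns](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?utm_campaign=The+Batch&utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz--9ARMthd09q0ABUi-abo6BH62BLbcwPo13LrXs9hUezs-L050Ay7b_rHdWuRIqBVOD6k_S) Updated: 2024-05-20
:::

#### Reflection

TBA

#### Tool Use

TBA

#### Planning

:::danger
See [DeepLearning.AI Part 4](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/?ref=dl-staging-website.ghost.io). Compared with chained workflows such as Reflection, Planning is a relatively immature capability. Updated: 2024-04-10
:::

#### Multi-Agent Collaboration

:::danger
See [DeepLearning.AI Part 5](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/?ref=dl-staging-website.ghost.io). Compared with chained workflows such as Reflection, Multi-Agent Collaboration is a relatively immature capability. Updated: 2024-04-17
:::

---

## RAG

:::success
⭐ AWS blog: [From RAG to fabric: Lessons learned from building real-world RAGs - Part 1](https://aws.amazon.com/blogs/machine-learning/from-rag-to-fabric-lessons-learned-from-building-real-world-rags-at-genaiic-part-1/) Updated: 2024-10-24
:::

![ML-16452-ingestion](https://hackmd.io/_uploads/SkPnXHse1e.jpg)

### RAG in Practice

Two main reasons RAG systems fail:

* The retrieved documents lack the relevant information, so the model may make up an answer
* The relevant information is buried under a large amount of irrelevant data, which can cause confusion and wrong answers

Four metrics for evaluating retriever quality:

* Top-k accuracy: whether any relevant content appears in the top k documents
* Mean Reciprocal Rank (MRR): how high the relevant document is ranked
* Recall: how many of the relevant documents are found
* Precision: how well irrelevant documents are kept out of the results

Two ways to evaluate generated answers:

* Subject-matter expert evaluation: the most reliable, but slow
* Foundation model evaluation: allows fast iteration, but needs final confirmation by a human

Recommended automated evaluation frameworks: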
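* [Ragas](https://docs.ragas.io/en/stable/howtos/customizations/customize_models/#aws-bedrock): evaluates context and answer quality
* [LlamaIndex](https://docs.llamaindex.ai/en/latest/module_guides/evaluating/): can evaluate retrieval and generation independently
* [RefChecker](https://github.com/amazon-science/RefChecker): focuses on detecting hallucinated content

#### RAG Problem Diagnosis and Solutions

| Problem type | Diagnostic signal | Problem detail | Solution | Solution in detail |
|---------|---------|----------|----------|-------------|
| **Relevant chunk not retrieved** | • Low top-k accuracy<br>• Low recall<br>• Found in human evaluation | The retriever cannot find the relevant content | 1. Increase the number of documents returned by nearest-neighbor search<br>2. Use hybrid search<br>3. Rewrite the query | • Rerank afterwards to reduce the final number of chunks<br>• Combine keyword and semantic search; especially useful for domain-specific terms<br>• Example: rewrite "What information does the knowledge base have on China's economic outlook?" as "China economic outlook" |
| **Too many chunks retrieved** | • Low precision<br>• Found in human evaluation | The results contain too much irrelevant content | 1. Restrict by keyword match<br>2. Filter on metadata<br>3. Rewrite the query<br>4. Rerank | • Retrieve only documents that explicitly mention the target entity<br>• Filter on metadata such as dates<br>• Use an FM to rewrite the user query as a structured query<br>• Reduce the number of chunks passed to the FM |
| **Missing context** | • Can only be judged by human evaluation | The retrieved chunk lacks its full context | 1. Adjust the chunking strategy\*1<br>2. Increase chunk size and overlap<br>3. Use section-based chunking<br>4. Use small-to-large retrieval | • Choose a chunk size that suits the question type<br>• Small chunks for precise questions, large chunks for broad questions<br>• Use the document structure as the basis for chunking<br>• Retrieve adjacent chunks as well to preserve context |
| **FM generation problems** | • Human evaluation<br>• LLM evaluation | Retrieval is correct but the generated answer is wrong | 1. Prompt engineering<br>2. Use quotes<br>3. Evaluate with another FM | • Reduce hallucination through prompt engineering<br>• Require quotes in the answer so they can be checked<br>• Use an additional FM to evaluate or correct the answer |
| **General** | - | None of the above works | • Train custom embeddings\*2 | • Develop a custom embedding model for the specific domain or use case |

\*1. Chunking strategy references: for HTML or Markdown, see the Langchain [Markdown Splitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/markdown_header_metadata/) and [HTML Splitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/HTML_header_metadata/); for PDF, see the [Textractor library](https://aws-samples.github.io/amazon-textract-textractor/installation.html)

\*2.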
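Embedding model reference: [FlagEmbedding](https://huggingface.co/BAAI/bge-large-en#frequently-asked-questions)

---

### RAG Playbook (using OpenSearch)

#### RAG Optimization Techniques in Detail

| Technique | When to use | Implementation | Benefits | Caveats |
|---------|---------|---------|------|----------|
| Hybrid search | Product specification lookup systems with domain-specific terms and product names | • Combine vector search and keyword search<br>• Semantic and keyword weights are adjustable | • Handles domain-specific terms better<br>• More accurate product name matching | • The weight ratio needs tuning |
| Adding metadata | Long documents that must keep their context after chunking | • Prepend the document title to each chunk<br>• Add document-related metadata | • Better retrieval accuracy<br>• Document context is preserved | • Metadata must be added at indexing time |
| Small-to-large retrieval | Scenarios that need full context (such as troubleshooting guides) | • Retrieve small chunks first<br>• Then fetch the adjacent chunks<br>• Finally merge them and pass them to the FM | • Keeps the detail while obtaining full context<br>• Avoids truncated content | • Chunk numbers must be tracked<br>• An extra retrieval step is needed |
| Section-based chunking | • Financial news analysis<br>• Operating guides<br>• Scenarios that need full context | • Split along the document structure (HTML/Markdown)<br>• Use semantic clustering | • Produces more coherent chunks<br>• Preserves the document structure | • Chunk sizes may be uneven<br>• Watch the model's token limit |
| Query rewriting | Complex queries that need optimizing | • Use an FM to rewrite the query<br>• Extract keywords<br>• Extract metadata such as product names | • Removes irrelevant information<br>• Better retrieval accuracy<br>• Usable for filtering | • Adds latency<br>• A small model is recommended |
| Custom embedding training | When every other method fails | • Collect positive and negative sample pairs<br>• Fine-tune a pretrained model<br>• Deploy and evaluate | • Can improve performance substantially<br>• Handles the specific domain better | • High cost<br>• Needs a lot of data<br>• Takes a long time |

#### FM Response Optimization Strategies

| Strategy | Implementation | Purpose | Caveats |
|-----|----------|------|----------|
| **Prompt engineering** | • Restrict the model to information in the documents<br>• Allow it to answer "I don't know" | Reduce hallucination | Needs explicit instructions |
| **Citation generation** | • Output citations inside tags<br>• Embed citations in the answer | • Better verifiability<br>• More credibility | • May lengthen the output<br>• Needs extra verification |
| **Citation verification** | • Check programmatically that each citation exists<br>• Show the verification result in the UI | • Ensures answer accuracy<br>• Makes sources transparent | • False negatives are possible<br>• Citation formats must be handled flexibly |

---

### Advanced RAG

#### Contextual Retrieval via Data Preprocessing

:::success
⭐ Anthropic blog: [Introducing Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval) Updated: 2024-09-20
:::

```html
<document>
{{WHOLE_DOCUMENT}}
</document>
Here is the chunk we want to situate within the whole document
<chunk>
{{CHUNK_CONTENT}}
</chunk>
Please give a short succinct context to situate this chunk within the overall document for the purposes of improving search retrieval of the chunk. Answer only with the succinct context and nothing else.
```

Prompt caching can reduce the cost of the method above. Thanks to its prompt caching feature, Claude makes contextual retrieval possible at low cost. With prompt caching, you do not need to pass in the reference document for every chunk. You load the document into the cache once and then reference the cached content. Assuming 800-token chunks, 8k-token documents, 50-token context instructions, and 100 tokens of context per chunk, the one-time cost of generating contextualized chunks is $1.02 per million document tokens.

[Prompt cache](https://dev.to/m_sea_bass/comparing-prompt-caching-openai-anthropic-and-gemini-2mfh)

#### Further Boosting Performance with Reranking

#### External References

[RAG for Legal Documents](https://ipchimp.co.uk/2024/02/16/rag-for-legal-documents/)
⭐ Medium: [Legal Document RAG: Multi-Graph Multi-Agent Recursive Retrieval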
through Legal Clauses](https://medium.com/enterprise-rag/legal-document-rag-multi-graph-multi-agent-recursive-retrieval-through-legal-clauses-c90e073e0052) Updated: 2024-09-08

---

## Cache

:::success
AWS Community: [Bridging the Efficiency Gap: Mastering LLM Caching for Next-Generation AI (Part 2)](https://community.aws/content/2juMSXyaSX2qelT4YSdHBrW2D6s/bridging-the-efficiency-gap-mastering-llm-caching-for-next-generation-ai-part-2) Updated: 2024-08-07
AWS Community: [Bridging the Efficiency Gap: Mastering LLM Caching for Next-Generation AI (Part 1)](https://community.aws/content/2k3vKGhjWVbvtjZHf0eHc3QsATI/bridging-the-efficiency-gap-mastering-llm-caching-for-next-generation-ai-part-1) Updated: 2024-08-04
:::

---

## Embedding Dimensionality Reduction

Apply a second round of dimensionality reduction to the embeddings and use the result as the basis for semantic classification. Dimensionality reduction techniques include:

* principal component analysis (PCA)
* linear discriminant analysis (LDA)
* t-distributed stochastic neighbor embedding (t-SNE)

:::success
AWS blog: [Visualize vector embeddings stored in Amazon Aurora PostgreSQL and explore semantic similarities](https://aws.amazon.com/blogs/database/visualize-vector-embeddings-stored-in-amazon-aurora-postgresql-and-explore-semantic-similarities/) Updated: 2024-10-24
:::

:::spoiler Implementation steps, services, and tools
The simple example in the blog post shows how to run PCA dimensionality reduction with Amazon Bedrock and Aurora:

* Prepare your dataset for generating vector embeddings. The post uses a sample dataset that contains product categories.
* Generate vector embeddings of the product descriptions with the Amazon Bedrock FM titan-embed-text-v1.
* Store the product data and vector embeddings in an Aurora PostgreSQL database with the pgvector extension installed.
* Import the libraries needed to run PCA.
* Use PCA to convert the high-dimensional vector embeddings into three-dimensional embeddings.
* Generate a scatter plot of the three-dimensional embeddings and visualize the semantic similarities in the data.

To complete this solution, you need the following prerequisites:

* An Aurora PostgreSQL-compatible cluster created in your AWS account.
* Aurora PostgreSQL-compatible credentials stored in AWS Secrets Manager.
* Access to the Amazon Titan Embeddings G1 – Text model enabled in Amazon Bedrock.
* A Jupyter notebook instance that uses Amazon SageMaker to run Python scripts in the cloud. For more details, see "Create a Jupyter notebook in the SageMaker notebook instance".
* Basic knowledge of pandas, NumPy, plotly, and scikit-learn. These are basic Python libraries for data analysis and machine learning (ML).
:::

---

## Embedding Fine-Tuning

:::success
AWS blog: [Fine-tune a BGE embedding model using synthetic data from Amazon
Bedrock](https://aws.amazon.com/blogs/machine-learning/fine-tune-a-bge-embedding-model-using-synthetic-data-from-amazon-bedrock/) Updated: 2024-10-23
:::

[Beijing Academy of Artificial Intelligence, BAAI](https://huggingface.co/BAAI)
[LlamaIndex fine-tune embeddings example](https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/)
[FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding)
[LM-Cocktail](https://github.com/FlagOpen/FlagEmbedding/tree/master/LM_Cocktail) | [Paper](https://arxiv.org/pdf/2311.13534)
[InformationRetrievalEvaluator](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/evaluation/InformationRetrievalEvaluator.py), used with [sentence-transformers](https://github.com/UKPLab/sentence-transformers/tree/master)

---

## Training

:::success
NVIDIA blog: [Mastering LLM Techniques: Training](https://developer.nvidia.com/blog/mastering-llm-techniques-training/) Updated: 2023-11-16
:::

---

## Inference

:::success
NVIDIA blog: [Mastering LLM Techniques: Inference Optimization](https://developer.nvidia.com/blog/mastering-llm-techniques-inference-optimization/) Updated: 2023-11-17
NVIDIA blog: [Accelerated Inference for Large Transformer Models Using NVIDIA Triton Inference Server](https://developer.nvidia.com/blog/accelerated-inference-for-large-transformer-models-using-nvidia-fastertransformer-and-nvidia-triton-inference-server/) Updated: 2022-08-03
:::

---

## Evaluation and Guardrails

### Alignment Faking

:::success
Anthropic blog: [Alignment faking in large language models](https://www.anthropic.com/research/alignment-faking) Updated: 2024-12-18
:::

[Note](https://hackmd.io/@yuhsintsao/Bkg30Dilyl)
[Evaluate prompts in the developer console](https://www.anthropic.com/news/evaluate-prompts)
[Introducing the analysis tool in Claude.ai](https://www.anthropic.com/news/analysis-tool)
[Not All LLMs Are Created Equal: Key Factors to Consider When Selecting an LLM
](https://broadcast.amazon.com/videos/1071750?query=Choosing%20LLM)
Paper: [Benchmarking LLM Guardrails in Handling Multilingual Toxicity](https://arxiv.org/html/2410.22153v1)
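
On the evaluation side, the four retriever metrics listed in the RAG section (top-k accuracy, MRR, recall, precision) can be computed in a few lines. The following is a minimal sketch, not taken from any of the linked frameworks. It assumes each query comes with a ranked list of retrieved document IDs and a hand-labelled set of relevant IDs. The function names and the sample IDs are hypothetical, and recall and precision are computed at the same cutoff k.

```python
from typing import Dict, List, Sequence, Set, Tuple


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Top-k accuracy for one query: 1.0 if any relevant ID is in the top k."""
    return 1.0 if any(doc_id in relevant for doc_id in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant IDs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    top_k = ranked[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


def evaluate_retriever(
    runs: List[Tuple[Sequence[str], Set[str]]], k: int
) -> Dict[str, float]:
    """Average the four metrics over all (ranked_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("runs must contain at least one query")
    n = len(runs)
    return {
        "top_k_accuracy": sum(hit_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        "recall": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "precision": sum(precision_at_k(r, rel, k) for r, rel in runs) / n,
    }


if __name__ == "__main__":
    # Hypothetical labelled queries: (retrieved IDs in rank order, relevant IDs).
    sample_runs = [
        (["d3", "d1", "d7"], {"d1", "d9"}),
        (["d2", "d4", "d5"], {"d2"}),
    ]
    print(evaluate_retriever(sample_runs, k=3))
```

Low top-k accuracy or recall points at the "relevant chunk not retrieved" row of the diagnosis table, and low precision points at the "too many chunks retrieved" row, so tracking these numbers per change gives a quick check on whether a retrieval tweak helped.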