LLM-based agent system

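# AI agent

An LLM-based agent can serve as the brain, or controller, of a larger architecture for solving complex problems, for example by turning a user's request into a sequence of actions. For some questions the LLM alone is not enough; it has to be combined with RAG or given the ability to use external tools.

![image](https://hackmd.io/_uploads/Sk4lxjt80.png)

## Design pattern

![image](https://hackmd.io/_uploads/ryRNcgP80.png)

LLMs already perform reasonably well zero-shot, and combining them with the strategies below can improve results further.

### Reflection

* self-reflection: have the LLM review and improve its own output over multiple steps with feedback. For example, first ask the LLM to generate a piece of code, then ask it to improve that code for correctness, efficiency, and so on.
* w/ external tools: the same code-improvement example, but with external tools supplying the feedback, such as running unit tests on the code to see whether they pass, or searching the web for information.
* multi-agent framework: one agent produces content and another gives constructive suggestions on it.

### Tool Use

Pre-training data alone is not enough; tools such as web search and code execution help the LLM gather information and take actions. Typically the LLM is made to produce output in a specific format, such as `{tool: web-search, query: "coffee maker reviews"}` or `{tool: python-interpreter, code: "100 * (1+0.07)**12"}`; the corresponding action is then executed and its result is fed back to the LLM. The resources and functions the LLM can call may include searching different websites, using different tools, and so on, so every function needs a detailed description of what it does and which parameters it takes. When there are too many functions to feed in all at once, the relevant subset has to be selected first; one way to do this is RAG. As an example, letting the LLM call object-detection functions effectively gives the model computer-vision capability through tool use.

### Planning

Let the LLM design and execute a multi-step plan on its own to reach a goal. The difference from tool use is that the LLM breaks the task into subtasks more flexibly, but the results are harder to predict than with reflection and tool use.

### Multi-agent collaboration

Multiple agents cooperate, either by splitting the task across different roles or by discussing with one another to spark ideas, in order to outperform a single agent. As with planning, the results are harder to predict.

## Ref

* [Agentic Design Patterns Part 1](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io)
* [Agentic Design Patterns Part 2, Reflection](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-2-reflection/?ref=dl-staging-website.ghost.io)
* [Agentic Design Patterns Part 3, Tool Use](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-3-tool-use/?ref=dl-staging-website.ghost.io)
* [Agentic Design Patterns Part 4, Planning](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/?ref=dl-staging-website.ghost.io)
* [Agentic Design Patterns Part 5, Multi-Agent Collaboration](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/?ref=dl-staging-website.ghost.io)

# Techniques

## Retrieval augmented generation (RAG)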
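
As a rough end-to-end picture of what this section covers, here is a minimal sketch of the split, store, retrieve, and prompt-building flow. Everything in it is an assumption made for illustration: `ToyVectorStore`, `embed`, `split_into_chunks`, and `build_prompt` are hypothetical names, the "embedding" is a plain bag-of-words count rather than a real embedding model, the sample documents are made up, and the generate step stops at building the prompt instead of calling an actual LLM.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 8) -> list[str]:
    """Split: cut a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts stand in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Store: keeps (embedding, chunk) pairs in memory (hypothetical helper)."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str]] = []

    def add(self, chunk: str) -> None:
        self.entries.append((embed(chunk), chunk))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Retrieve: return the k chunks most similar to the query."""
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Generate (first half): put the retrieved chunks in front of the question."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Made-up sample documents, for illustration only.
documents = [
    "The office coffee machine is descaled every Friday.",
    "The build server restarts at midnight.",
]

store = ToyVectorStore()
for doc in documents:
    for chunk in split_into_chunks(doc):
        store.add(chunk)

query = "When is the coffee machine descaled?"
prompt = build_prompt(query, store.retrieve(query, k=1))
print(prompt)  # this prompt would then be sent to the LLM of your choice
```

In a real setup the count vectors would be replaced by model-produced embeddings and the in-memory list by a proper vector store; the control flow stays the same.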

Normally an LLM is used by feeding it a prompt. RAG additionally looks up relevant content in external documents and feeds it in alongside the prompt. This helps with an outdated training set, missing information for a specific use case, and hallucination, and it improves answer quality; instead of retraining, it lets the LLM access up-to-date data.

![image](https://hackmd.io/_uploads/B1706dFIA.png)

More concretely, documents are split into chunks, converted into embeddings, and stored in a vector store; when needed, the content relevant to the input is retrieved.

### Steps

1. Indexing - Load: read source documents of various types
2. Indexing - Split: split them into chunks of a given size
3. Indexing - Store: store the chunks in a vector store
4. Retrieval and Generation - Retrieve: select the content in the database closest to the query
5. Retrieval and Generation - Generate: generate the answer

## Prompt engineering

## Fine-tuning

## Ref

* [Retrieval Augmented Generation (RAG) for LLMs](https://www.promptingguide.ai/research/rag)
* [Build a Retrieval Augmented Generation (RAG) App](https://python.langchain.com/v0.2/docs/tutorials/rag/)

# Tools

## OLLAMA

https://medium.com/@simon3458/ollama-llm-model-as-a-service-introduction-d849fb6d9ced

## Model

## Framework