An Introduction to Function Calls and Tool Calls in LLMs

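# Basics

**Q. Why let an LLM use an API?**

A. An LLM sometimes gives answers that are not accurate enough, or it lacks up-to-date or domain-specific information, so its response is stale or out of context. In those cases we want the model to rely on an API and fetch live, reliable data so that it can answer precisely.

**Example question: What will the weather be like in Hsinchu at 5 pm today?**

In this example we do not want ChatGPT to answer the weather question from its old training data. We want it to look up the latest weather information, for example from the weather bureau's website. To make this possible, ChatGPT with GPT-4 already has a built-in search capability: it can query the web directly and answer with current information, which avoids outdated answers (see the figure below).

![GPT example](https://hackmd.io/_uploads/SJ_yZUvZ1g.png)

### The difficulty

In the past, everyone wrote LLM function-call code in their own way. Models and settings differ (token handling, model type, and so on), and open-source packages often could not be shared across models because of those differences. To make life easier, the Hugging Face team kindly put together tool-use documentation and an API that work with "nearly" all models. With the built-in features of the transformers library, API calls are now easy to implement, and many compatibility problems go away.

# Usage

In the examples below I use Hugging Face's sample code to show how to call an API.

## Background

Before using these features, let's look at the design background. Different language models may use different "start tokens" and "end tokens". To solve this, the team unified the formats with chat templates, which are written in the Jinja template language (each one is a tiny Jinja template).

- [Jinja official website](https://jinja.palletsprojects.com/en/stable/)

Even with chat templates, the format of a function passed in can still differ between programming languages (JavaScript, Rust, and so on), which causes inconsistency. For that reason, every tool passed to the chat template is automatically converted to JSON. Even when we pass a Python function, the system converts it to a standardized JSON schema, which avoids compatibility problems between languages.

## Example

### 1. Define the function

First define the function to be called. Its docstring (wrapped in `"""`) must follow the expected format. The official example below is the function `get_current_temperature`, and it must contain:

1. A description of the function (ideally stating clearly what it does, its input and output, and when to use it). It is later passed along as the `description` field.

```
Gets the temperature at a given location.
```

2. The required arguments (Args), with each argument's name and description

```
Args:
    location: The location to get the temperature for, in the format "city, country"
```

3. The return value

```
return 22.0
```

Put together, the complete function looks like this (the snippets above and below are both official example code):

```
def get_current_temperature(location: str):
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "city, country"
    """
    return 22.0
```

### 2. Inspect the function's JSON schema

To see what our function looks like after it is passed to the chat template (as a JSON schema), we can use the built-in helper `get_json_schema`:

```
from transformers.utils import get_json_schema

# Pass the function defined in step 1
get_json_schema(get_current_temperature)
```

The result of `get_json_schema` clearly shows the function name, description, parameters, and the rest of the function's information.

<details>
<summary>Output</summary>

```
{
  "type": "function",
  "function": {
    "name": "get_current_temperature",
    "description": "Gets the temperature at a given location.",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "The location to get the temperature for"
        }
      },
      "required": [
        "location"
      ]
    }
  }
}
```

</details>

### 3. Check the prompt

With the function in place, we create `tools` (a list) and put every function the model may use into it, so several tool functions can be offered to the model. We also create a chat history that holds the question for the model.

```
tools = [get_current_temperature]

chat = [
    {"role": "user", "content": "Hey, what's the weather like in Paris right now?"}
]
```

### 4. Add the tools to the chat template

Once everything above is initialized, `tokenizer.apply_chat_template(tools=tools)` passes the list of tool functions in. `return_dict=True` makes the return value a dict containing `input_ids` and `attention_mask`, so the model receives both the input tokens and the mask (one value per token position, where 1 marks a real token and 0 marks padding).

```
# Assumes `tokenizer` was loaded beforehand for a model that supports tool use
tool_prompt = tokenizer.apply_chat_template(
    chat,
    tools=tools,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,
)
```

Printing `tool_prompt` shows how the tools are added: they are inserted directly into the system prompt. Note that not every model works this way. You need to check how the LLM you are using referenced these parameters during training, because some models do not take the tools in this form.

Ex: llama-8b. The official documentation says it supports function calling (and it does), but following the steps above does not produce the result shown below, so you have to put the tools into the system prompt yourself. The model's response format also differs; see the references at the bottom of this post.

<details>
<summary>Tokens and how to use them</summary>

Here is a small trick. Every model has its own tokenizer (usually saved together with the model at training time), and `tokenizer.decode` can decode the input. One thing to note: the chat template is what gets fed to the model, and the model does not read text, only tokens. So first take out `input_ids`, use `.squeeze(0)` to turn it into a one-dimensional vector, and then decode it. That shows what the chat template looks like after being converted to tokens and decoded back.

</details>

![The chat_template that is passed in](https://hackmd.io/_uploads/Byt-bGsWJx.png)

### 5. Run

Pass `tool_prompt` to `model.generate` to get the answer:

```
# Assumes `model` was loaded beforehand
out = model.generate(**tool_prompt, max_new_tokens=128)
# Keep only the newly generated tokens (drop the prompt tokens)
generated_text = out[0, tool_prompt['input_ids'].shape[1]:]
print(tokenizer.decode(generated_text))
```

<details>
<summary>Why the code above looks like this</summary>

The code uses `out[0, tool_prompt['input_ids'].shape[1]:]` because `out` is the generated result, and the model is autoregressive: it takes our input tokens and continues after them with the response we need. Taking everything after the input positions therefore gives the answer. (If we use the pipeline that Hugging Face already wraps for us, none of this is needed: it returns a list of responses, with the text under the `['generated_text']` key.)

</details>

### 6. Result

With this model we get the result below, wrapped in `<tool_call>` XML tags.

```
<tool_call>
{"arguments": {"location": "Paris, France"}, "name": "get_current_temperature"}
</tool_call><|im_end|>
```

Note: this works because the chat template this model was trained with includes this system prompt. It does not apply to every model, as the figure below shows. (You can inspect `tokenizer.chat_template` to see how the template wraps things.)

Reference: [Hugging Face guide to using chat_template](https://huggingface.co/docs/transformers/main/en/chat_templating#automated-function-conversion-for-tool-use)

![image](https://hackmd.io/_uploads/r1GPfriZJe.png)

### 7. Add the result to the chat history

**Note!!!** None of the current function-call mechanisms has the model execute the function for you. The model produces a "candidate function call"; we execute it ourselves and add the result to the chat history, and then the model answers the user.

Earlier, the model's response told us to call `get_current_temperature` with `"location": "Paris, France"`, so we first add that response to the chat history.

<details>
<summary>Add the tool call to the chat history</summary>

```
message = {
    "role": "assistant",
    "tool_calls": [
        {
            "type": "function",
            "function": {
                "name": "get_current_temperature",
                "arguments": {"location": "Paris, France"}
            }
        }
    ]
}
chat.append(message)
```

</details>

Next, execute the function yourself and add its result to the chat history:

![image](https://hackmd.io/_uploads/H1Tjnrs-Jg.png)

The final chat history then looks like this:

![image](https://hackmd.io/_uploads/H1Hx6Ss-ye.png)

Finally, wrap this chat history with the tokenizer and pass it to the model to get the response:

![image](https://hackmd.io/_uploads/ryA4aSoWke.png)

### 8. Summary

The current function-call mechanism does not, as one might intuitively expect, let the model call an API directly to get the answer. Every model returns the function that may be needed, and the actual execution is still left to the developer. The flow is:

- The model returns a function suggestion: the model does not call the API itself. It generates a suggestion that names the function likely to help with the answer.
- The developer executes the function: following the suggestion, the developer writes code that calls the function or API and obtains the result.
- The result is added to the conversation history: after the function runs, the returned result (the answer) is added to the history manually, so that the model can use it in its next response.

In essence, this mechanism "outsources" the tool-execution step to the developer. The model only decides which function should be called and does not execute it.

# Other reference notes

### Function call vs. tool call

The two are the same thing, but tool call is the evolved version of function call. The main difference is that tool call adds parallel calls: a tool call can produce answers for several functions at once, while a function call can answer only one at a time.

[OpenAI's official answer on the difference](https://community.openai.com/t/tools-v-functions-performance-difference/787997)

#### Results of a small experiment

Q: What are the temperatures in Taipei, Boston, and New York, USA, respectively? (Three locations should come back.)

A: Tool call result. The returned `tool_calls` does contain three functions, one for each location.

![tool call result](https://hackmd.io/_uploads/rJ73lLs-kg.png)

A: Function call result. The returned `FunctionCall` contains only "Taipei".

![function call result](https://hackmd.io/_uploads/S1tzW8jbJg.png)

# Other reading notes

### OpenAI's official function calling documentation

[OpenAI document](https://platform.openai.com/docs/guides/function-calling)

#### Getting the returned JSON to match the schema

1. Set `strict: true` so that the returned JSON is guaranteed to match the schema
    - However, as I read the official guidance, this method is not recommended: it strictly guarantees conformance but can backfire.
2. Write a more detailed system prompt (when to use the function and how to use it)
3. Write the function description and the Args descriptions (the more detailed, the better)
4. Set an enum on the function arguments (to narrow the range of valid values)

Note that, according to the official guidance, a tool call should offer at most about 10 to 20 functions. All the functions are converted to tokens and placed in the system prompt, then passed to the model together with the chat history. Model size and the token budget therefore both matter, and functions cannot be added without limit.

### Llama 3.1

Official reference: [Llama documentation](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/#-tool-calling-(8b/70b/405b)-)

There are currently three ways to use tool calls:

- Built-in Tools (Brave, Wolfram)
    - Brave Search: Tool call to perform web searches.
    - Wolfram Alpha: Tool call to perform complex mathematical calculations.
- JSON-based Tool Calling (the officially provided format)
- User-defined Custom Tool Calling (a custom format)

In my tests on taiwan-8B it more or less works, but it breaks down once another adapter is applied on top. The official documentation currently recommends using tool calls with the 70B and 405B models. The 8B model can partly handle zero-shot tool calls, but it is not recommended for sustained conversations or complex tool calls.

#### Test results

Running locally, 8B plus a system prompt can handle a simple function call, but the form of the answer varies a lot. Because the model does not "execute the API for the user", the downstream processing needs a safeguard for the case where the format drifts.

# Notes

- Some parts may not be explained clearly; my apologies.
- If you have any thoughts, please share them in the comments so that we can all improve together!
- [Official tool-use documentation](https://huggingface.co/blog/unified-tool-use)
- [Taiwan open weather data platform](https://opendata.cwa.gov.tw/devManual/datalist)
- [Llama official site](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/#-tool-calling-(8b/70b/405b)-)
- [OpenAI official site](https://platform.openai.com/docs/guides/function-calling#name-functions-intuitively-with-detailed-descriptions)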
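
# Appendix: a sketch of the manual tool loop

Steps 6 and 7 leave the parsing and execution of the tool call to the developer, and the Llama test notes point out that the output format can drift. The block below is a minimal sketch of that loop, not the official implementation. It assumes the model wraps its call in `<tool_call>` tags as in step 6, and that the tool result is appended as a `{"role": "tool", ...}` message, which is how I understand the Hugging Face chat-template guide. `TOOL_REGISTRY`, `parse_tool_calls`, and `run_tool_calls` are hypothetical names introduced only for illustration.

```python
import json
import re


def get_current_temperature(location: str) -> float:
    """
    Gets the temperature at a given location.

    Args:
        location: The location to get the temperature for, in the format "city, country"
    """
    return 22.0  # placeholder value, as in the example function from step 1


# Hypothetical registry: maps the tool names offered to the model to real callables.
TOOL_REGISTRY = {"get_current_temperature": get_current_temperature}

# Assumes the <tool_call>...</tool_call> wrapper shown in step 6.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(generated_text: str) -> list:
    """Return every well-formed tool call; silently skip malformed payloads."""
    calls = []
    for payload in TOOL_CALL_PATTERN.findall(generated_text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # the format drifted: not valid JSON
        if (
            isinstance(call, dict)
            and "name" in call
            and isinstance(call.get("arguments", {}), dict)
        ):
            calls.append(call)
    return calls


def run_tool_calls(chat: list, generated_text: str) -> list:
    """Execute each requested tool and append the call and its result to the chat."""
    for call in parse_tool_calls(generated_text):
        name = call["name"]
        arguments = call.get("arguments", {})
        func = TOOL_REGISTRY.get(name)
        if func is None:
            continue  # the model asked for a tool that was never offered
        # Record what the model asked for (same shape as the message in step 7).
        chat.append({
            "role": "assistant",
            "tool_calls": [
                {"type": "function", "function": {"name": name, "arguments": arguments}}
            ],
        })
        # Run the function ourselves and record its result as a tool message.
        result = func(**arguments)
        chat.append({"role": "tool", "name": name, "content": str(result)})
    return chat


chat = [{"role": "user", "content": "Hey, what's the weather like in Paris right now?"}]
generated_text = (
    '<tool_call>\n'
    '{"arguments": {"location": "Paris, France"}, "name": "get_current_temperature"}\n'
    '</tool_call><|im_end|>'
)
chat = run_tool_calls(chat, generated_text)
print(chat[-1])
# {'role': 'tool', 'name': 'get_current_temperature', 'content': '22.0'}
```

After this, the extended `chat` would go back through `tokenizer.apply_chat_template` and `model.generate` as in steps 4 and 5. Malformed or unknown calls are skipped here for brevity. A real application would probably want to log them or ask the model to retry.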