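###### tags: `AFS`

# How to Use Tool Call of FFM Conversation API with Llama3 Series

Meta released its new-generation large language model, Llama 3, on April 19, in 8-billion and 70-billion parameter versions. After tuning by the ASUS AI core team, the model can work with functions, and the LLM itself decides whether to call one. If a request includes one or more functions, the model decides from the context of the prompt whether a function call is needed. When it determines that a function should be used, it outputs structured data (**JSON**) that matches the function's parameters.

Based on the functions provided, the model parses the user's intent and then outputs the matching API and structured data. Note that **the model only selects a suitable function and does not execute it; the actual function call is controlled by business logic implemented on the application side**.

Using functions takes three steps:

1. Call the FFM Conversation API with the functions and the user's question to obtain the function call information.
2. Use the function information output by the model to call the corresponding API or function and obtain the execution result.
3. Call the FFM Conversation API again, passing in the execution result from step 2, so that the model inference service can produce a summary.

::: info
:bulb: **Tip:** Parallel function calling (not yet supported)
Parallel function calls let the model output multiple function calls so that they can be executed, and their results retrieved, in parallel. This reduces the number of API calls and improves overall performance.
:::

## API Support

With the release of the Llama3-FFM models, the FFM Conversation API provides a more complete format for Function Calling. In the [**old format**](https://hackmd.io/D6hChs1HRxyOaLul_HgirQ#Conversation-API), the `functions` field of the request body and the `function_call` field of the response will be deprecated in the future. The following sections describe how to use the new format.

::: info
:bulb: **Tip:** LLMBackend is currently backward compatible. With either FFM-Llama2 or FFM-Llama3 models, a request in the old function call format returns an old-style function call response, and a request in the new tool call format returns a new-style tool call response.
:::

1.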
   **Request body by calling the model with tools**

Use the LLM to select the appropriate function and parse the corresponding arguments.

* The **`tools`** parameter is an array. Each entry describes a function using [**JSON Schema**](https://json-schema.org/understanding-json-schema) and has two required fields:
    * type
    * function

| Field | Type | Required | Description |
| -------- | -------- | -------- | -------- |
| **tools** | array | Optional | List of functions in JSON format |

:::spoiler **Properties**
<br>
<table>
<tbody>
<tr>
<td><b>type</b>&emsp;<font color="#808080">string</font>&emsp;<font color="#FF0000">Required</font>
<p>Tool type. Currently only <code>function</code> is supported.</p>
</td>
</tr>
<tr>
<td><p><b>function</b>&emsp;<font color="#808080">object</font>&emsp;<font color="#FF0000">Required</font></p>
<blockquote><table>
<tbody>
<tr>
<td><b>description</b>&emsp;<font color="#808080">string</font>&emsp;<font color="#FF0000">Optional</font><br>Description of what the function does. The model uses it to decide when to call the function.
</td>
</tr>
<tr>
<td><b>name</b>&emsp;<font color="#808080">string</font>&emsp;<font color="#FF0000">Required</font><br>Function name. It may contain only a-z, A-Z, 0-9, underscores (_), and hyphens (-).
</td>
</tr>
<tr>
<td><b>parameters</b>&emsp;<font color="#808080">object</font>&emsp;<font color="#FF0000">Optional</font><br>The function's input parameters, described with JSON Schema. See the <a href="https://json-schema.org/understanding-json-schema">JSON Schema reference</a> for usage.
</td>
</tr>
</tbody>
</table></blockquote>
</td>
</tr>
</tbody>
</table>
:::

* The optional **`tool_choice`** parameter is a string or an object. **It is supported only on Llama3-FFM models** and specifies how function calling is applied. When functions are provided, it defaults to **`"auto"`**; when no functions are provided, it defaults to **`"none"`**.
    * **`"none"`**: No function call is made; the model generates text.
    * **`"auto"`**: The model decides whether to output a function call or to generate text.
        - In this mode, use the **`finish_reason`** field of the response to tell the two apart: **`"finish_reason": "tool_calls"`** indicates a function call, and any value other than **`"tool_calls"`** indicates text generation.
    * **`{"type": "function", "function": {"name": "my_function"}}`**: Forces a call to the named function.
        - In this mode the function call has been explicitly requested, so **`finish_reason`** is an ordinary value such as eos_token and is **not** **`"tool_calls"`**. The application must parse the response content itself to identify the call.

| Field | Type | Required | Description |
| -------- | -------- | -------- | -------- |
| **tool_choice** | string or object | Optional | Specifies the function calling mode |

:::spoiler Possible Types
<br>
<table>
<tbody>
<tr>
<td><font color="#808080">string</font>
<p>- <code>none</code> No function call is made; the output is ordinary text generation.
<br>- <code>auto</code> The model decides whether to output a function call or to generate text.</p>
</td>
</tr>
<tr>
<td><font color="#808080">object</font>
<p>- <code>{"type": "function", "function": {"name": "my_function"}}</code> Names a function and forces the model to output a call to it.
</p>
<blockquote><p>properties</p>
<table>
<tbody>
<tr>
<td>
<b>type</b>&emsp;<font color="#808080">string</font>&emsp;<font color="#FF0000">Required</font>
<p>Tool type. Currently only <code>function</code> is supported.
</p>
</td>
</tr>
<tr>
<td>
<b>function</b>&emsp;<font color="#808080">object</font>&emsp;<font color="#FF0000">Required</font>
<blockquote><p>Function attributes</p>
<table>
<tbody>
<tr>
<td>
<b>name</b>&emsp;<font color="#808080">string</font>&emsp;<font color="#FF0000">Required</font>
<p>Function name</p>
</td>
</tr>
</tbody>
</table></blockquote>
</td>
</tr>
</tbody>
</table></blockquote>
</td>
</tr>
</tbody>
</table>
:::

#### **Request Examples**

:::spoiler Non-Streaming
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.01,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    },
    "stream": false
  }'
```
:::

:::spoiler Streaming
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.01,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    },
    "stream": true
  }'
```
:::

#### **Request Examples Using tool_choice**

:::spoiler Use auto
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.01,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    },
    "tool_choice": "auto",
    "stream": false
  }'
```
:::

:::spoiler Use none
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.01,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    },
    "tool_choice": "none",
    "stream": false
  }'
```
:::

:::spoiler Specifies a function
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.01,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    },
    "tool_choice": {
      "type": "function",
      "function": {
        "name": "get_current_weather"
      }
    },
    "stream": false
  }'
```
:::

2. **Response by calling the model with functions**

The LLM returns the function call result.

| Field | Type |
| -------- | -------- |
| **tool_calls** | array |

:::spoiler Possible Types
<br>
<table>
<tbody>
<tr>
<td><b>id</b>&emsp;<font color="#808080">string</font>
<p>Function call identifier</p>
</td>
</tr>
</tbody>
<tbody>
<tr>
<td><b>type</b>&emsp;<font color="#808080">string</font>
<p>Tool type. Currently only <code>function</code> is supported.</p>
</td>
</tr>
</tbody>
<tbody>
<tr>
<td><b>function</b>&emsp;<font color="#808080">object</font>
<p>The function call content, containing the function name and argument values.</p>
</td>
</tr>
</tbody>
</table>
:::

#### **Examples**

::: spoiler Response with Non-Streaming
```JSON=
{
  "tool_calls": [
    {
      "index": 0,
      "type": "function",
      "id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
      "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
      }
    }
  ],
  "details": null,
  "total_time_taken": "1.17 sec",
  "prompt_tokens": 141,
  "generated_tokens": 43,
  "total_tokens": 184,
  "finish_reason": "tool_calls"
}
```
:::

::: spoiler Response with Streaming
```JSON=
data: {"tool_calls": [{"index": 0, "type": "function", "id": "call_afc9227158e6458798d789ab1f84c920", "function": {"name": "get_current_weather", "arguments": ""}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "{\""}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "location"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "\":"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "Boston, MA"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "\","}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "unit"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "\":"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "c"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "elsius"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": "\"}"}}], "details": null, "finish_reason": null}
data: {"tool_calls": [{"index": 0, "function": {"arguments": ""}}], "details": null, "total_time_taken": "1.17 sec", "prompt_tokens": 141, "generated_tokens": 43, "total_tokens": 184, "finish_reason": "tool_calls"}
```
:::

::: spoiler Response with tool_choice by auto
```JSON=
{
  "tool_calls": [
    {
      "type": "function",
      "id": "call_fe97cf6c20ae4b00b88b660b853d93d9",
      "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
      }
    }
  ],
  "details": null,
  "total_time_taken": "1.16 sec",
  "prompt_tokens": 135,
  "generated_tokens": 43,
  "total_tokens": 178,
  "finish_reason": "tool_calls"
}
```
:::

::: spoiler Response with tool_choice by none
```JSON=
{
  "generated_text": "As of my last update, the weather in Boston was quite chilly with temperatures around 40°F (4°C) and some light rain. However, it's always a good idea to check the latest weather forecast before heading out, as conditions can change quickly.",
  "details": null,
  "total_time_taken": "1.41 sec",
  "prompt_tokens": 18,
  "generated_tokens": 53,
  "total_tokens": 71,
  "finish_reason": "stop_sequence"
}
```
:::

::: spoiler Response with tool_choice by specifies a function
```JSON=
{
  "tool_calls": [
    {
      "type": "function",
      "id": "call_7JK8LIPTho7DffbvceTV5Oey",
      "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
      }
    }
  ],
  "details": null,
  "total_time_taken": "0.82 sec",
  "prompt_tokens": 159,
  "generated_tokens": 18,
  "total_tokens": 177,
  "finish_reason": "eos_token"
}
```
:::

3. **Request body by sending the response back to the model to summarize**

The LLM presents the result of the function execution in an easy-to-understand form. This step is a multi-turn conversation: in addition to the earlier conversation history, you must put the result of executing the function in the **content** field of a message whose role is **`tool`**.

| Field | Value |
| -------- | -------- |
| **role** | tool |
| **tool_call_id** | The function call identifier taken from tool_calls |
| **content** | The execution result of the function call |

#### **Examples**

::: spoiler Request
```JSON=
export API_KEY={API_KEY}
export API_URL={API_URL}

curl -X POST "${API_URL}/models/conversation" \
  -H "accept: application/json" \
  -H "X-API-KEY:${API_KEY}" \
  -H "X-API-HOST: afs-inference" \
  -H "content-type: application/json" \
  -d '{
    "model": "Llama-3-8b",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      },
      {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "arguments": "{\"location\":\"Boston, MA\", \"unit\": \"celsius\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
        "content": "{\"location\": \"Boston, MA\", \"temperature\": \"22\", \"unit\": \"celsius\"}"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "parameters": {
      "show_probabilities": false,
      "max_new_tokens": 350,
      "frequence_penalty": 1,
      "temperature": 0.5,
      "top_k": 100,
      "top_p": 0.93,
      "seed": 42
    }
  }'
```
:::

::: spoiler Response
```JSON=
{
  "generated_text": "The current temperature in Boston, MA is 22 degrees Celsius.",
  "details": null,
  "total_time_taken": "0.43 sec",
  "prompt_tokens": 250,
  "generated_tokens": 14,
  "total_tokens": 264,
  "finish_reason": "stop_sequence"
}
```
:::
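
The request bodies in this guide can also be assembled programmatically. The following Python sketch builds a body with `tools` and an optional `tool_choice`, and applies the field rules from the tables (tool type `function`, allowed characters in the function name) as application-side checks. `build_payload` and `force_function` are hypothetical helper names, not part of the API, and the generation `parameters` object is omitted for brevity.

```python
import json
import re


def build_payload(model, messages, tools=None, tool_choice=None, stream=False):
    """Assemble a request body; tool_choice is sent only when given."""
    payload = {"model": model, "messages": messages, "stream": stream}
    for tool in tools or []:
        name = tool["function"]["name"]
        # Application-side checks that mirror the documented field rules.
        if tool["type"] != "function":
            raise ValueError(f"unsupported tool type: {tool['type']!r}")
        if not re.fullmatch(r"[A-Za-z0-9_-]+", name):
            raise ValueError(f"invalid function name: {name!r}")
    if tools:
        payload["tools"] = tools
    if tool_choice is not None:
        # Leaving tool_choice out lets the documented defaults apply.
        payload["tool_choice"] = tool_choice
    return payload


def force_function(name):
    """Object form of tool_choice that names a single function."""
    return {"type": "function", "function": {"name": name}}


weather_tool = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                },
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}

payload = build_payload(
    "Llama-3-8b",
    [{"role": "user", "content": "What is the weather like in Boston?"}],
    tools=[weather_tool],
    tool_choice=force_function("get_current_weather"),
)
print(json.dumps(payload, indent=2))
```

The printed body can be sent with any HTTP client, using the same headers as the curl examples.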
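
Steps 2 and 3 of the flow run on the application side, so a short sketch may help. The Python below takes a response shaped like the non-streaming example, executes the selected function locally, and builds the `messages` array for the summarizing request. The local `get_current_weather` returns canned data, and `REGISTRY`, `run_tool_calls`, and `build_followup_messages` are hypothetical names chosen for illustration. Checking for the presence of `tool_calls` rather than `finish_reason` is a design choice here, since the guide notes that `finish_reason` is not `"tool_calls"` when a function is forced through `tool_choice`.

```python
import json


def get_current_weather(location, unit="celsius"):
    # Hypothetical stand-in for a real weather lookup; returns canned data.
    return {"location": location, "temperature": "22", "unit": unit}


# Application-side registry that maps tool names to local callables.
REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(response, registry):
    """Step 2: execute each tool call and wrap the result in a `tool` message."""
    tool_messages = []
    for call in response.get("tool_calls") or []:
        func = registry[call["function"]["name"]]
        # `arguments` is a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        tool_messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return tool_messages


def build_followup_messages(history, response, tool_messages):
    """Step 3: history, then the assistant's tool calls, then the tool results."""
    assistant = {
        "role": "assistant",
        "content": "",
        # Keep the fields used in the request example (id, type, function).
        "tool_calls": [
            {"id": c["id"], "type": c["type"], "function": c["function"]}
            for c in response["tool_calls"]
        ],
    }
    return history + [assistant] + tool_messages


# Response copied from the non-streaming example in this guide.
response = {
    "tool_calls": [{
        "index": 0,
        "type": "function",
        "id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
        "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}",
        },
    }],
    "finish_reason": "tool_calls",
}
history = [{"role": "user", "content": "What is the weather like in Boston?"}]

tool_messages = run_tool_calls(response, REGISTRY)
messages = build_followup_messages(history, response, tool_messages)
print(json.dumps(messages, indent=2))
```

The resulting `messages` list has the same shape as the one in the step 3 request example and can be sent together with the original `tools` array.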
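
In streaming mode the `arguments` string arrives in fragments, so the application has to concatenate them before parsing. The Python sketch below merges `data:` lines like those in the streaming response example into complete calls. It assumes every chunk carries an `index`, that the first chunk for an index carries `id`, `type`, and `name`, and that fragments arrive in order; these assumptions are read off the example rather than a formal specification. `collect_stream` and `sse` are hypothetical helpers, and the sample stream is rebuilt from the fragments shown in the example.

```python
import json


def collect_stream(lines):
    """Merge streamed tool-call chunks into complete calls, keyed by `index`."""
    calls = {}
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        chunk = json.loads(line[len("data:"):])
        for delta in chunk.get("tool_calls") or []:
            call = calls.setdefault(
                delta["index"],
                {"id": None, "type": None, "name": None, "arguments": ""},
            )
            call["id"] = delta.get("id", call["id"])
            call["type"] = delta.get("type", call["type"])
            fn = delta.get("function", {})
            call["name"] = fn.get("name", call["name"])
            call["arguments"] += fn.get("arguments", "")
        # Only the last chunk carries a non-null finish_reason.
        finish_reason = chunk.get("finish_reason") or finish_reason
    return [calls[i] for i in sorted(calls)], finish_reason


def sse(chunk):
    """Format one chunk the way the streaming example shows it."""
    return "data: " + json.dumps(chunk)


# Argument fragments taken from the streaming response example.
fragments = ['{"', "location", '":', ' "', "Boston, MA", '",',
             ' "', "unit", '":', ' "', "c", "elsius", '"}']

lines = [sse({
    "tool_calls": [{
        "index": 0,
        "type": "function",
        "id": "call_afc9227158e6458798d789ab1f84c920",
        "function": {"name": "get_current_weather", "arguments": ""},
    }],
    "details": None,
    "finish_reason": None,
})]
lines += [
    sse({"tool_calls": [{"index": 0, "function": {"arguments": f}}],
         "details": None, "finish_reason": None})
    for f in fragments
]
lines.append(sse({
    "tool_calls": [{"index": 0, "function": {"arguments": ""}}],
    "details": None,
    "finish_reason": "tool_calls",
}))

calls, reason = collect_stream(lines)
args = json.loads(calls[0]["arguments"])
print(calls[0]["name"], args, reason)
```

Keying by `index` keeps the sketch usable if more than one call ever appears in a stream, although parallel function calling is not yet supported.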