tags: AFS

How to Use Tool Call of FFM Conversation API with Llama3 Series

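Meta released its new-generation large language model, Llama 3, on April 19 in 8-billion and 70-billion parameter versions. After tuning by the ASUS AI core team, the model can be used together with functions, with the large language model deciding whether to call one. If a request includes one or more functions, the model decides from the context of the prompt whether a function needs to be called. When the model determines that a function should be used, it outputs formatted data (JSON) matching that function's parameters.

Working from the functions provided, the model parses the user's intent and outputs the matching API and structured data. Note that the model only selects the applicable function; it does not execute it. The actual function call is controlled by business logic implemented on the application side.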

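Using functions involves three steps:

  1. Call the FFM Conversation API with the functions and the user's question to obtain the function call information.
  2. Use the function information output by the model to call the corresponding API or function, and obtain the execution result.
  3. Call the FFM Conversation API again, passing the execution result from step 2 to the model inference service so that it can produce a summary.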

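Note: Parallel function calling (not yet supported)
Parallel function calls let the model output multiple function calls at once so that they can be executed, and their results retrieved, in parallel. This reduces the number of API calls and improves overall performance.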

API Support

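With the release of the Llama3-FFM models, the FFM Conversation API provides a more complete format for function calling. The functions field in the old-format request body and the function_call field in the response will be deprecated in the future. The following sections describe how to use the new format.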

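Note: LLMBackend is currently backward compatible. With either an FFM-Llama2 or an FFM-Llama3 model, a request in the old function call format receives an old-format function call response, and a request in the new tool call format receives a new-format tool call response.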

  1. Request body by calling the model with tools
    Use the large language model to select the appropriate function and parse the corresponding arguments.

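    • The tools parameter is an array. Each entry is mainly the JSON Schema description of a function and has two required fields:

      • type
      • function

      Field | Type | Required | Description
      tools | array | Optional | List of functions, in JSON format

      Properties of each tools entry:

      • type (string, Required): the tool type. Only function is currently supported.
      • function (object, Required):
        • description (string, Optional): what the function does. The model uses this description to decide when to call the function.
        • name (string, Required): the function name. It may contain only a-z, A-Z, 0-9, underscores (_), and hyphens (-).
        • parameters (object, Optional): the function's input parameters, described with JSON Schema. See the JSON Schema reference for usage.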
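As an illustration of the tools entry described above, the following minimal Python sketch builds one entry and checks the function name against the allowed characters. The helper name make_tool is hypothetical and not part of the FFM API; the validation simply mirrors the naming rule stated in this section.

```python
import re

# Allowed characters per the rule above: a-z, A-Z, 0-9, "_" and "-".
_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def make_tool(name, description=None, parameters=None):
    """Build one entry of the `tools` array (hypothetical helper)."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"invalid function name: {name!r}")
    function = {"name": name}
    if description is not None:
        function["description"] = description
    if parameters is not None:
        function["parameters"] = parameters  # a JSON Schema object
    return {"type": "function", "function": function}


weather_tool = make_tool(
    "get_current_weather",
    description="Get the current weather in a given location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
)
```

The resulting weather_tool dictionary can be placed directly in the tools array of a request body.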

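    • The tool_choice parameter is a string or an object and is optional. It is supported only by Llama3-FFM models and specifies when functions are called. It defaults to "auto" when functions are provided and to "none" when they are not.

      • "none": no function call is made; the model generates text.
      • "auto": the model decides whether to output a function call or generated text.
        • In this mode, the returned finish_reason field identifies the kind of output: "finish_reason": "tool_calls" means a function call, and any other value means generated text.
      • {"type": "function", "function": {"name": "my_function"}}: forces a call to the named function.
        • In this mode, a function call has been requested explicitly, so finish_reason is an ordinary indicator such as eos_token rather than "tool_calls". The application must parse the content itself to identify the function call.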
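      Field | Type | Required | Description
      tool_choice | string or object | Optional | Specifies when functions are called

      Possible types:

      • string
        • none: no function call; the output is ordinary text generation.
        • auto: the model decides whether to output a function call or generated text.
      • object
        • {"type": "function", "function": {"name": "my_function"}}: names a function and forces the model to output a call to it.

      Properties of the object form:

      • type (string, Required): the tool type. Only function is currently supported.
      • function (object, Required): the function attributes.
        • name (string, Required): the function name.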
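Because finish_reason behaves differently depending on tool_choice, application code may want a single place that decides whether a response is a function call. The sketch below is one possible approach. The function name classify_output is hypothetical, and it assumes that a forced function call still returns a tool_calls field, as in the sample responses in this document.

```python
def classify_output(response, tool_choice="auto"):
    """Return "tool_call" or "text" for a non-streaming response dict."""
    if isinstance(tool_choice, dict):
        # A specific function was forced: finish_reason is not "tool_calls"
        # here, so inspect the payload itself (assumption: `tool_calls` is set).
        return "tool_call" if response.get("tool_calls") else "text"
    if tool_choice == "none":
        # Function calling is disabled; the output is always generated text.
        return "text"
    # "auto": rely on finish_reason, as described above.
    if response.get("finish_reason") == "tool_calls":
        return "tool_call"
    return "text"
```

Keeping this check in one function means the rest of the application does not need to know which tool_choice mode produced the response.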

    Request examples

    Non-Streaming
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.01,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        },
        "stream": false
      }'
    Streaming
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.01,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        },
        "stream": true
      }'

    Request examples using tool_choice

    Use auto
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.01,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        },
        "tool_choice": "auto",
        "stream": false
      }'
    Use none
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.01,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        },
        "tool_choice": "none",
        "stream": false
      }'
    Specify a function
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.01,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        },
        "tool_choice": {
          "type": "function",
          "function": {
            "name": "get_current_weather"
          }
        },
        "stream": false
      }'
  2. Response by calling the model with functions
    The large language model returns the function call result.

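    Field | Type
    tool_calls | array

    Fields of each tool_calls entry:

    • id (string): the function call identifier.
    • type (string): the tool type. Only function is currently supported.
    • function (object): the function call itself, containing the function name and argument values.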
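Since the model only selects the function and the application performs the call, step 2 of the workflow can be sketched as a small dispatcher. This is a minimal sketch under stated assumptions: get_current_weather here is a stand-in that returns fixed sample data, and REGISTRY and execute_tool_call are hypothetical names rather than part of the FFM API. Note that arguments arrives as a JSON-encoded string and has to be decoded first.

```python
import json


def get_current_weather(location, unit="celsius"):
    """Stand-in implementation; a real application would query a weather service."""
    return {"location": location, "temperature": "22", "unit": unit}


# Maps the function names offered in `tools` to local callables.
REGISTRY = {"get_current_weather": get_current_weather}


def execute_tool_call(tool_call, registry=REGISTRY):
    """Run one entry of `tool_calls` and return its result as a JSON string."""
    name = tool_call["function"]["name"]
    if name not in registry:
        raise KeyError(f"model requested unknown function: {name}")
    # `arguments` is a JSON-encoded string, not an object.
    arguments = json.loads(tool_call["function"]["arguments"])
    result = registry[name](**arguments)
    return json.dumps(result)
```

Looking the name up in an explicit registry, rather than evaluating it dynamically, keeps the model from triggering functions the application did not intend to expose.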

    Examples

    Response with Non-Streaming
    {
      "tool_calls": [
        {
          "index": 0,
          "type": "function",
          "id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
          }
        }
      ],
      "details": null,
      "total_time_taken": "1.17 sec",
      "prompt_tokens": 141,
      "generated_tokens": 43,
      "total_tokens": 184,
      "finish_reason": "tool_calls"
    }
    Response with Streaming
    data: {"tool_calls": [{"index": 0, "type": "function", "id": "call_afc9227158e6458798d789ab1f84c920", "function": {"name": "get_current_weather", "arguments": ""}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "{\""}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "location"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "\":"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "Boston, MA"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "\","}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "unit"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "\":"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": " \""}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "c"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "elsius"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": "\"}"}}], "details": null, "finish_reason": null}
    data: {"tool_calls": [{"index": 0, "function": {"arguments": ""}}], "details": null, "total_time_taken": "1.17 sec", "prompt_tokens": 141, "generated_tokens": 43, "total_tokens": 184, "finish_reason": "tool_calls"}
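In the streaming response above, the first chunk carries the id, type, and function name, and later chunks carry fragments of the arguments string. A client therefore has to concatenate the fragments per index before parsing them. The following is a minimal sketch of that accumulation. The function name merge_stream is hypothetical, and it assumes every line of interest starts with "data:" followed by a JSON object, as in the sample.

```python
import json


def merge_stream(lines):
    """Merge streamed `data: {...}` lines into complete tool calls."""
    calls = {}  # index -> accumulated tool call
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        chunk = json.loads(line[len("data:"):])
        for delta in chunk.get("tool_calls") or []:
            call = calls.setdefault(
                delta["index"],
                {"id": None, "type": None,
                 "function": {"name": None, "arguments": ""}},
            )
            call["id"] = delta.get("id") or call["id"]
            call["type"] = delta.get("type") or call["type"]
            fn = delta.get("function") or {}
            call["function"]["name"] = fn.get("name") or call["function"]["name"]
            # Argument fragments are appended in arrival order.
            call["function"]["arguments"] += fn.get("arguments") or ""
        finish_reason = chunk.get("finish_reason") or finish_reason
    return [calls[i] for i in sorted(calls)], finish_reason
```

Applied to the sample chunks above, the accumulated arguments string is {"location": "Boston, MA", "unit": "celsius"}, which can then be decoded with json.loads.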
    Response with tool_choice set to auto
    {
      "tool_calls": [
        {
          "type": "function",
          "id": "call_fe97cf6c20ae4b00b88b660b853d93d9",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
          }
        }
      ],
      "details": null,
      "total_time_taken": "1.16 sec",
      "prompt_tokens": 135,
      "generated_tokens": 43,
      "total_tokens": 178,
      "finish_reason": "tool_calls"
    }
    Response with tool_choice set to none
    {
      "generated_text": "As of my last update, the weather in Boston was quite chilly with temperatures around 40°F (4°C) and some light rain. However, it's always a good idea to check the latest weather forecast before heading out, as conditions can change quickly.",
      "details": null,
      "total_time_taken": "1.41 sec",
      "prompt_tokens": 18,
      "generated_tokens": 53,
      "total_tokens": 71,
      "finish_reason": "stop_sequence"
    }
    Response with tool_choice specifying a function
    {
      "tool_calls": [
        {
          "type": "function",
          "id": "call_7JK8LIPTho7DffbvceTV5Oey",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\": \"Boston, MA\", \"unit\": \"celsius\"}"
          }
        }
      ],
      "details": null,
      "total_time_taken": "0.82 sec",
      "prompt_tokens": 159,
      "generated_tokens": 18,
      "total_tokens": 177,
      "finish_reason": "eos_token"
    }
  3. Request body by sending the response back to the model to summarize
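    The large language model presents the result of the function execution in an easy-to-understand form. This step is a multi-turn conversation: in addition to the earlier conversation history, the result of the function execution must be placed in the content field of a message whose role is tool.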

    Field | Value
    role | tool
    tool_call_id | The function call identifier taken from tool_calls
    content | The execution result of the function call
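The message list for this second request can be assembled from the earlier history, the tool_calls the model returned, and the function results. The sketch below shows one way to do that. The helper name append_tool_result is hypothetical, and copying only id, type, and function into the assistant message is an assumption that mirrors the request example in this section rather than a documented requirement.

```python
def append_tool_result(messages, tool_calls, results):
    """Return a new message list extended with the tool calls and their results.

    `results` maps each tool call id to the JSON string the function returned.
    """
    # Keep only the fields that appear in the request example.
    calls = [
        {"id": c["id"], "type": c["type"], "function": c["function"]}
        for c in tool_calls
    ]
    extended = list(messages)  # leave the caller's history untouched
    extended.append({"role": "assistant", "content": "", "tool_calls": calls})
    for call in calls:
        extended.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": results[call["id"]],
            }
        )
    return extended
```

The returned list is intended to be sent as the messages field of the follow-up request, together with the same tools array as before.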

    Examples

    Request
    export API_KEY={API_KEY}
    export API_URL={API_URL}
    curl -X POST "${API_URL}/models/conversation" \
      -H "accept: application/json" \
      -H "X-API-KEY:${API_KEY}" \
      -H "X-API-HOST: afs-inference" \
      -H "content-type: application/json" \
      -d '{
        "model": "Llama-3-8b",
        "messages": [
          {
            "role": "user",
            "content": "What is the weather like in Boston?"
          },
          {
            "role": "assistant",
            "content": "",
            "tool_calls": [
              {
                "id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
                "type": "function",
                "function": {
                  "name": "get_current_weather",
                  "arguments": "{\"location\":\"Boston, MA\", \"unit\": \"celsius\"}"
                }
              }
            ]
          },
          {
            "role": "tool",
            "tool_call_id": "call_8a53fdf7e96c418aaaff76d2e1bb9964",
            "content": "{\"location\": \"Boston, MA\", \"temperature\": \"22\", \"unit\": \"celsius\"}"
          }
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "get_current_weather",
              "description": "Get the current weather in a given location",
              "parameters": {
                "type": "object",
                "properties": {
                  "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                  },
                  "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"]
                  }
                },
                "required": ["location"]
              }
            }
          }
        ],
        "parameters": {
          "show_probabilities": false,
          "max_new_tokens": 350,
          "frequence_penalty": 1,
          "temperature": 0.5,
          "top_k": 100,
          "top_p": 0.93,
          "seed": 42
        }
      }'
    Response
    {
      "generated_text": "The current temperature in Boston, MA is 22 degrees Celsius.",
      "details": null,
      "total_time_taken": "0.43 sec",
      "prompt_tokens": 250,
      "generated_tokens": 14,
      "total_tokens": 264,
      "finish_reason": "stop_sequence"
    }