# Building an AI Discord Chatbot? - Tools

[TOC]

:::success
**[English Version](https://hackmd.io/@stanley2058/r1FbvLK3lg)**
{%preview https://hackmd.io/@stanley2058/r1FbvLK3lg %}

This is the second article in the series. If you haven't read the first one yet, you can find it below.
{%preview https://hackmd.io/@stanley2058/Hkzbhiv2ee %}
:::

## 0. What Is a Tool?

Hammers, screwdrivers... oh, not that kind of tool?

Tools are what let an LLM interact with things beyond text. For example, the model itself cannot fetch the content of a link, so how does ChatGPT's website read a link you give it? Behind the scenes, the model calls a tool that fetches the page content, and the tool's response is then added back into the model's context. Through this back-and-forth, the model can generate a response based on the actual content!

## 1. So How Do You Build a Tool?

The AI SDK provides a convenient and stable tool interface:

```typescript
import { tool } from 'ai';
import { z } from 'zod';

export const weatherTool = tool({
  description: 'Get the weather in a location',
  inputSchema: z.object({
    location: z.string().describe('The location to get the weather for'),
  }),
  // location below is inferred to be a string:
  execute: async ({ location }) => ({
    location,
    temperature: 72 + Math.floor(Math.random() * 21) - 10,
  }),
});
```

This official example shows the basic building blocks of a tool:

- `description`: the main description of the tool.
- `inputSchema`: the input format, defined with `zod`; these are the arguments the model must pass to the tool. If the model passes invalid arguments or the wrong format, an input validation error is sent back to the model.
  - Each field can end with `.describe("explanation")` to tell the model how, when, and under what constraints to use it.
- `execute`: the code that actually runs when the model calls the tool. Its return value is sent straight back to the model as the result.

## 2. What Is MCP? Can We Use It Too?

MCP stands for Model Context Protocol, an open standard introduced by Anthropic for exposing tools (and other context) to models. It supports three transports: `stdio`, `http` (Streamable HTTP), and `sse` (the older HTTP+SSE transport, which the spec has since replaced with Streamable HTTP).

MCP support in the AI SDK is still experimental (hence `experimental_createMCPClient`), so the API may change. Still, since it is supported, let's treat it as usable and add MCP!

```typescript
import { experimental_createMCPClient as createMCPClient } from "ai";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// local MCP server (stdio)
const fetchClient = await createMCPClient({
  transport: new StdioClientTransport({
    command: "uvx",
    args: ["mcp-server-fetch"],
    // env: { YOUR_ENV_VAR: "hello!" },
  }),
});

// remote MCP server (Streamable HTTP)
const context7Client = await createMCPClient({
  transport: new StreamableHTTPClientTransport(
    new URL("https://mcp.context7.com/mcp"),
    {
      requestInit: {
        headers: {
          Authorization: "Bearer <CONTEXT7_API_KEY>",
        },
      },
    },
  ),
});

// remote MCP server (SSE)
const client = await createMCPClient({
  transport: {
    type: "sse",
    url: "<url using sse>",
    headers: {},
  },
});

// Convert to AI SDK tools. Each client returns its own tool set,
// so merge them into one object before passing them to `streamText`.
const toolSets = await Promise.all(
  [fetchClient, context7Client, client].map((c) => c.tools()),
);
const tools = Object.assign({}, ...toolSets);
```

## 3. How Do You Use Tools?

Having a few tools is only useful once the model is allowed to call them. Otherwise you end up with this:

![image](https://hackmd.io/_uploads/SkRliSFhll.png)

### Basic Usage

The AI SDK makes this very simple. Let's borrow the code we wrote earlier:

```typescript
import { stepCountIs } from "ai";

// ...previous code

const { textStream, response } = streamText({
  model: provider('gpt-5'),
  messages,
  // Just add this and pass in the `tools` from above!
  tools,
  // Maximum number of steps; every tool call adds at least one extra round trip.
  // The default is `stepCountIs(1)`, so without a value greater than 1 the run
  // stops right after the tool call with `finishReason` set to `tool-calls`.
  stopWhen: stepCountIs(10),
});
```

There is a small catch, though: this only works for models that natively support tool calling. `gpt-5-chat`, for example, does not. If you include the `tools` parameter in a request to it, OpenAI's API responds with a 400 error saying the model does not support tool calls.

### Supporting Any Model!

So what can we do? We can define our own tool-call format, explain it to the model in the system prompt, and check `textStream` for text that matches the format while we consume it. When it matches, we run the tool and cut the call syntax out of the output. After the tool finishes, we append its result to `messages` and call `streamText` again, repeating until the model finishes (`finishReason` = `stop`) or the maximum number of rounds is exceeded.

Let's assume the following tool-call format: `<tool-call tool="{name}">{payload}</tool-call>`

Two simple functions check whether the current string could be the start or the end of a tool call:

```typescript
// Could `text` be the beginning of "<tool-call"? (compares the overlapping prefix)
function maybeToolCallStart(text: string) {
  const start = "<tool-call";
  for (let i = 0; i < Math.min(text.length, start.length); i++) {
    if (text[i] !== start[i]) return false;
  }
  return true;
}

// Could `text` end with "</tool-call>"? (compares the overlapping suffix)
function maybeToolCallEnd(text: string) {
  const end = "</tool-call>";
  for (let i = 0; i < Math.min(text.length, end.length); i++) {
    if (text[text.length - i - 1] !== end[end.length - i - 1]) return false;
  }
  return true;
}
```

Previously the AI SDK inserted the tool descriptions for us, but now we have to do it by hand, so we first build the tool descriptions:

```typescript
import { asSchema, type ModelMessage } from "ai";

// Turn each tool into { name, description, jsonSchema } for the system prompt
const toolDesc = Object.entries(tools ?? {}).map(([name, tool]) => {
  return {
    name,
    description: tool.description,
    jsonSchema: asSchema(tool.inputSchema).jsonSchema,
  };
});

const toolSystemPrompt: ModelMessage = {
  role: "system",
  content:
    "Important rule to call tools:\n" +
    '- If you want to call a tool, you MUST ONLY output the tool call syntax: <tool-call tool="{name}">{payload}</tool-call>\n' +
    "- Examples:\n" +
    '  - <tool-call tool="fetch">{"url":"https://example.com","max_length":10000,"raw":false}</tool-call>\n' +
    '  - <tool-call tool="eval">{"code":"print(\'Hello World\')"}</tool-call>\n' +
    "\nAvailable tools:\n" +
    JSON.stringify(toolDesc, null, 2),
};
```

The next part gets a bit more complicated: we wrap `streamText` and expose a similar interface. The flow is:

1. Call `streamText`.
2. Watch the accumulated stream content. If it looks like a tool call, hold it back; otherwise emit it.
3. When the stream ends, run the tool call if there is one; if not, we are done.
4. After the tool call, append the result to `messages` and go back to step 1.

:::spoiler Preceding code (setup)
```typescript
// `StreamTextParams` is the parameter type of `streamText` (defined elsewhere)
export function streamTextWithCompatibleTools({
  tools,
  messages,
  ...rest
}: StreamTextParams) {
  messages = [...(messages || [])];

  // Build the tool descriptions and the tool-calling system prompt
  const toolDesc = Object.entries((tools = tools || {})).map(([name, tool]) => {
    return {
      name,
      description: tool.description,
      jsonSchema: asSchema(tool.inputSchema).jsonSchema,
    };
  });
  const toolsSystemPrompt: ModelMessage = {
    role: "system",
    content:
      "Important rule to call tools:\n" +
      '- If you want to call a tool, you MUST ONLY output the tool call syntax: <tool-call tool="{name}">{payload}</tool-call>\n' +
      "- Examples:\n" +
      '  - <tool-call tool="fetch">{"url":"https://example.com","max_length":10000,"raw":false}</tool-call>\n' +
      '  - <tool-call tool="eval">{"code":"print(\'Hello World\')"}</tool-call>\n' +
      "\nAvailable tools:\n" +
      JSON.stringify(toolDesc, null, 2),
  };

  // `toolCallIdPrefix` is a module-level string constant (not shown here)
  let callSequence = 0;
  const generateCallId = () => `${toolCallIdPrefix}-${++callSequence}`;
```
:::

```typescript
  const { promise: finishReason, resolve: resolveFinishReason } =
    Promise.withResolvers<FinishReason>();
  const finalResponsesAccu: ResponseMessage[] = [];
  const { promise: finalResponses, resolve: resolveFinalResponses } =
    Promise.withResolvers<{ messages: ResponseMessage[] }>();

  const TOOL_CALL_SINGLE =
    /<tool-call\s+tool="([^"]+)">([\s\S]*?)<\/tool-call>/;

  // An async generator that stands in for the original `textStream`
  const textStreamOut = async function* () {
    // Loops until a round ends without a tool call (no round cap in this version)
    while (true) {
      const { textStream, finishReason, response } = streamText({
        ...rest,
        messages: [toolsSystemPrompt, ...messages],
        prompt: undefined, // keeps the types correct
        tools: undefined, // make sure no `tools` are passed through
      });

      let buffer = "";
      let toolMatch: RegExpExecArray | null = null;
      let inToolCall = false;
      let carryOver = "";

      for await (const chunk of textStream) {
        if (inToolCall) {
          // Already inside a possible tool call: keep accumulating
          buffer += chunk;
        } else if (maybeToolCallStart(chunk) && !toolMatch) {
          // This chunk may start a tool call: begin accumulating
          inToolCall = true;
          buffer = chunk;
        } else {
          // Not a tool call: emit it and move on
          yield chunk;
          continue;
        }

        // Once the buffer ends with the closing tag, keep it if it is a valid call
        if (inToolCall && maybeToolCallEnd(buffer)) {
          const match = buffer.match(TOOL_CALL_SINGLE);
          if (match) {
            const full = match[0];
            const idx = buffer.indexOf(full);
            const endIdx = idx + full.length;
            carryOver = buffer.slice(endIdx);
            toolMatch = [
              full,
              match[1],
              match[2],
            ] as unknown as RegExpExecArray;
          } else {
            yield buffer;
          }
          buffer = "";
          inToolCall = false;
        }
      }

      // Anything left in the buffer when the stream ends is probably malformed
      // tool-call syntax, so emit it as regular content
      if (!toolMatch && buffer) {
        if (inToolCall) yield buffer;
        buffer = "";
        inToolCall = false;
      }

      const [, toolName, payload] = toolMatch ?? [];
      const tool = toolName && tools?.[toolName];

      // No tool call: finish the stream
      if (!toolName || !tool || !tool.execute) {
        resolveFinishReason(await finishReason);
        if (carryOver) {
          yield carryOver;
          carryOver = "";
        }
        // Include this last round's messages in the final response too
        const { messages: lastMessages } = await response;
        finalResponsesAccu.push(...lastMessages);
        resolveFinalResponses({ messages: finalResponsesAccu });
        break;
      }

      console.log(`Calling tool in compatible mode: ${toolName}`);

      // Add this round's output to `messages` first
      const callId = generateCallId();
      const { messages: respMessages } = await response;
      messages.push(...respMessages);
      finalResponsesAccu.push(...respMessages);

      try {
        // Run the tool
        const toolResult: unknown = await tool.execute(tryParseJson(payload), {
          toolCallId: callId,
          messages: respMessages,
        });

        // Success: add the result to `messages` as a system message.
        // A normal tool result would use `role: "tool"`, but some API providers
        // check that the `toolCallId` appeared in an earlier tool call, so a
        // system message is the safest way to avoid breaking the request.
        messages.push({
          role: "system",
          content: JSON.stringify([
            {
              type: "tool-result",
              toolCallId: callId,
              toolName,
              output: toToolResultOutput(toolResult),
            },
          ]),
        });
      } catch (err) {
        // Failure: tell the model why the call failed
        messages.push({
          role: "system",
          content: JSON.stringify([
            {
              type: "tool-result",
              toolCallId: callId,
              toolName,
              output: {
                type: "error-text",
                value: `Tool execution failed: ${String(err)}`,
              },
            },
          ]),
        });
      }

      if (carryOver) {
        yield carryOver;
        carryOver = "";
      }
    }
  };

  return {
    textStream: textStreamOut(),
    finishReason,
    response: finalResponses,
  };
}
```

:::spoiler Following code (helpers)
```typescript
// Convert a tool's return value into the tool-result `output` format
function toToolResultOutput(output: unknown): ToolResultPart["output"] {
  if (typeof output === "string") return { type: "text", value: output };
  // treat undefined/null as empty text
  if (output === undefined || output === null)
    return { type: "text", value: "" };
  try {
    JSON.stringify(output);
    return { type: "json", value: output as JSONValue };
  } catch {
    return { type: "error-text", value: "Non-serializable tool output" };
  }
}

// Parse the payload as JSON, falling back to the trimmed raw string
function tryParseJson(raw: string | undefined): unknown {
  if (!raw) return undefined;
  const trimmed = raw.trim();
  if (!trimmed) return "";
  try {
    return JSON.parse(trimmed);
  } catch {
    return trimmed;
  }
}
```
:::

## 4. Done?

Yes, this time it's really finished :tada: :tada: :tada:

If writing all of this sounds like too much work, you can just use [js-llmcord](https://github.com/stanley2058/js-llmcord), which I wrote (~~shameless plug~~). It started as a port of [llmcord](https://github.com/jakobdylanc/llmcord) with tools crammed in, but after adding RAG and `gpt-5-chat` support it kept growing until it was almost a complete rewrite :sweat_smile:.

(Showing off my cute bot (?))

![image](https://hackmd.io/_uploads/H1kjSUY2xl.png)
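The tag detection in the "Supporting Any Model" section is woven into a streaming loop, which makes it hard to try on its own. Below is a minimal, self-contained sketch of the same idea that runs on a plain array of text chunks instead of a live `textStream`. It is hypothetical and not part of the AI SDK or js-llmcord: `scanToolCall`, `couldStartToolCall`, and `parsePayload` are names made up for illustration, and it assumes only the first `<tool-call tool="...">...</tool-call>` in a round matters. One deliberate difference from the wrapper in the article: it releases held-back text as soon as the buffer can no longer turn into `<tool-call`, so a stray `<` chunk does not withhold the rest of the reply until the stream ends.

```typescript
// Hypothetical helper (not an AI SDK API): finds the first
// <tool-call tool="...">...</tool-call> in a sequence of streamed chunks.
const TOOL_CALL_RE = /<tool-call\s+tool="([^"]+)">([\s\S]*?)<\/tool-call>/;
const OPEN_TAG = "<tool-call";

export interface ParsedToolCall {
  name: string;
  payload: unknown;
}

export interface ToolCallScan {
  text: string; // text that should be shown to the user
  call: ParsedToolCall | null; // first tool call found, if any
}

// True if `s` agrees with OPEN_TAG on their overlapping prefix.
function couldStartToolCall(s: string): boolean {
  const n = Math.min(s.length, OPEN_TAG.length);
  return s.slice(0, n) === OPEN_TAG.slice(0, n);
}

// Parse the payload as JSON, falling back to the trimmed raw string.
function parsePayload(raw: string): unknown {
  const trimmed = raw.trim();
  try {
    return JSON.parse(trimmed);
  } catch {
    return trimmed;
  }
}

export function scanToolCall(chunks: Iterable<string>): ToolCallScan {
  let text = "";
  let buffer = "";
  let call: ParsedToolCall | null = null;

  for (const chunk of chunks) {
    // Only the first call is handled; later text passes through untouched.
    if (call) {
      text += chunk;
      continue;
    }
    // Start holding text back only when a chunk might open a tool call.
    if (!buffer && !couldStartToolCall(chunk)) {
      text += chunk;
      continue;
    }
    buffer += chunk;
    if (!couldStartToolCall(buffer)) {
      // It turned out not to be a tool call: release the held-back text.
      text += buffer;
      buffer = "";
      continue;
    }
    const m = TOOL_CALL_RE.exec(buffer);
    if (m) {
      call = { name: m[1], payload: parsePayload(m[2]) };
      // Keep any text before the tag and anything that followed the closing tag.
      text += buffer.slice(0, m.index) + buffer.slice(m.index + m[0].length);
      buffer = "";
    }
  }
  // A tag that was never closed is treated as ordinary text.
  return { text: text + buffer, call };
}
```

For example, feeding it `["Let me check. ", "<tool-", "call tool=\"fetch\">{\"url\":", "\"https://example.com\"}</tool-call>", " Done."]` yields the text `"Let me check.  Done."` and a `fetch` call whose payload is `{ url: "https://example.com" }`. In the real wrapper, the `text` side would be yielded incrementally and the `call` side handed to `tool.execute`.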