# 114-1 Metaverse Course Notes: Advanced

---

# 2025/12/4 3DGS & 4DGS

## Useful Links

**3DGS**
- https://github.com/MrNeRF/LichtFeld-Studio
- https://github.com/MrNeRF/LichtFeld-Studio/wiki/
- https://superspl.at/editor

**Unity**
- https://github.com/aras-p/UnityGaussianSplatting
- https://github.com/ninjamode/Unity-VR-Gaussian-Splatting

**4DGS**
- https://github.com/hustvl/4DGaussians

**Mobile app**
- https://lumalabs.ai/interactive-scenes

## Related Videos / Websites

- [Lichtfeld Studio: The Best Open-Source 3D Gaussian Splatting Software? (Full Tutorial)](https://www.youtube.com/watch?v=aX8MTlr9Ypc)
- [LichtFeld Studio Beginner Tutorial - Using Colmap to create a dataset for LichtFeld Studio](https://www.youtube.com/watch?v=-3TBbukYN00)
- [LichtFeld Studio Beginner Tutorial - Using Reality Scan to create a dataset for LichtFeld Studio](https://www.youtube.com/watch?v=JWmkhTlbDvg)
- [NeRF vs. 3D Gaussian Splatting, Explained Simply (in Chinese)](https://www.aiposthub.com/e3-80-90-e8-bc-95-e9-ac-86-e6-87-82-e3-80-91nerf-v-s-3d-gaussian-splatting-e4-b8-80-e6-ac-a1-e7-9c-8b/)
- [4DGS (CVPR 2024): How It Works and a Reproduction Log on Windows (in Chinese)](https://zhuanlan.zhihu.com/p/18900492638)
- [Animated Gaussian Splatting in Unreal Engine 5](https://80.lv/articles/animated-gaussian-splatting-in-unreal-engine-5)

## 0. 3DGS vs NeRF

![image](https://hackmd.io/_uploads/Sy6-6E0W-x.png)
![image](https://hackmd.io/_uploads/BkVh2EAZ-e.png)

> (Source: [NeRF vs. 3D Gaussian Splatting, Explained Simply (in Chinese)](https://www.aiposthub.com/e3-80-90-e8-bc-95-e9-ac-86-e6-87-82-e3-80-91nerf-v-s-3d-gaussian-splatting-e4-b8-80-e6-ac-a1-e7-9c-8b/))

## 1. Reconstruct Database (Colmap)

* **Download** [Colmap](https://github.com/colmap/colmap/releases)
* **New project**: create a new database and select the folder that contains your photos
  ![image](https://hackmd.io/_uploads/SyJ0yRAbbg.png)
* **Feature Extraction**: detect feature-point locations and descriptors (SIFT)
  ![image](https://hackmd.io/_uploads/SJtQgAR-We.png)
* **Feature Matching**: pair images by matching their features
  ![image](https://hackmd.io/_uploads/S1aHgR0Wbx.png)
* **Start Reconstruction**
  ![image](https://hackmd.io/_uploads/SJg_l0CWbl.png)
  ![image](https://hackmd.io/_uploads/rJeG4mC--e.png)
* **Undistortion** (for the output path you can create a new `dense` folder)
  ![image](https://hackmd.io/_uploads/rkOogRR-Zl.png)
* **Check** that the output contains the `images` and `sparse` folders
  ![image](https://hackmd.io/_uploads/r1Nf-A0-bg.png)

## 2. Training (LichtFeld Studio)

* **Prepare your most powerful device!**
  ![image](https://hackmd.io/_uploads/rJJobRRb-x.png)
* Follow the linked instructions to **download and build** [LichtFeld Studio](https://github.com/MrNeRF/LichtFeld-Studio/wiki/Build-Instructions-%E2%80%90-Windows)
* The executable appears in the `build` folder
  ![image](https://hackmd.io/_uploads/rJgIMR0b-l.png)
* Download the [Example Dataset](https://github.com/MrNeRF/LichtFeld-Studio/wiki/Example-Dataset)

**For Windows**

1. Download the zip. Run this in Command Prompt. In Windows PowerShell, `curl` is an alias for `Invoke-WebRequest`, so call `curl.exe` instead.
   ```shell
   curl -L -o tandt_db.zip https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/datasets/input/tandt_db.zip
   ```
2. Unzip:
   ```powershell
   mkdir data
   tar -xf tandt_db.zip -C data
   ```
3. Check the result:
   ![image](https://hackmd.io/_uploads/SJwK70RWZl.png)

* Drag the folder into LichtFeld Studio (for your own data, you can drop in the whole `dense` folder)
  ![image](https://hackmd.io/_uploads/rksaXCAbWg.png)
  ![image](https://hackmd.io/_uploads/rJfbNC0bZe.png)
* **Start Training!** (you can set your own checkpoint steps)
  ![image](https://hackmd.io/_uploads/rJcXVRRWWe.png)
* Controls
  ![image](https://hackmd.io/_uploads/B1MOER0bbe.png)
* **Result**
  ![Screenshot 2025-12-04 045913](https://hackmd.io/_uploads/HySQc4RbZx.png)
* **Export**
  ![image](https://hackmd.io/_uploads/SJkoVRR-be.png)
  ![image](https://hackmd.io/_uploads/BJfTNCCZ-e.png)

## 2.5 Cropping and Cleanup (SuperSplat)

* **Cropping in LichtFeld Studio**
  ![image](https://hackmd.io/_uploads/ryqmS0CZWx.png)
  1. Show Crop Box: display the crop region
  2. Use Crop Box: show only the splats inside the crop region
  3. Adjust the Rotation and Bounds values until the box covers the right area
  4. Crop Active PLY: apply the crop
  5. **Export**
     ![image](https://hackmd.io/_uploads/HyK6r0A--e.png)
* **In [SuperSplat](https://superspl.at/editor)**
  1. Open Splat mode, Hide Splats
     ![image](https://hackmd.io/_uploads/B19lI0AW-e.png)
  2. Select the stray splats (floaters) with the Select Tools
     ![image](https://hackmd.io/_uploads/ByW58ACWZe.png)
  3. Delete Selection
     ![image](https://hackmd.io/_uploads/Hyr2IRAb-e.png)
  4. Correct the Transform
     ![image](https://hackmd.io/_uploads/rkHyP0RWbx.png)
  5. **Result**
     ![Screenshot 2025-12-04 063110](https://hackmd.io/_uploads/HJzU5ECWZe.png)

## 3. Rendering (Unity)

* `git clone https://github.com/ninjamode/Unity-VR-Gaussian-Splatting.git`
* Create a GaussianSplatAsset
  ![image](https://hackmd.io/_uploads/rkhDD0AW-x.png)
  ![image](https://hackmd.io/_uploads/ByBuDACbZx.png)
* Create a new GameObject, add a Gaussian Splat Renderer component, and drag the asset into it
  ![image](https://hackmd.io/_uploads/r1tADAR-We.png)
* Enable Optimize for Quest
  ![image](https://hackmd.io/_uploads/BJ3EORRWWg.png)
* **Test result!**
  ![Screenshot 2025-12-04 063208](https://hackmd.io/_uploads/H1Tt5ERZbl.png)

## 4. 4DGS?

![image](https://hackmd.io/_uploads/Byxcj4Cbbg.png)

**Pipeline**

![image](https://hackmd.io/_uploads/rycy2VCbWx.png)

> (Source: [4DGS (CVPR 2024): How It Works and a Reproduction Log on Windows (in Chinese)](https://zhuanlan.zhihu.com/p/18900492638))

In Unreal Engine: [Animated Gaussian Splatting in Unreal Engine 5](https://80.lv/articles/animated-gaussian-splatting-in-unreal-engine-5)

## 5. Luma AI

* (Phone) Download [Luma AI: 3D Capture](https://play.google.com/store/apps/details?id=ai.lumalabs.polar)
* (Laptop) Go to [Luma AI](https://lumalabs.ai/interactive-scenes)

![Screenshot 2025-12-04 050353](https://hackmd.io/_uploads/SyiJoNRZZl.png)
![image](https://hackmd.io/_uploads/Hkq9HmRW-g.png)

---

# 2025/12/4 AI Conversation

## Contents

- Chapter 1. Introduction to Conversational AI
- Chapter 2. Unity and Environment Setup
- Chapter 3. Calling the OpenAI API
- Chapter 4. Responses API (including Structured Output)
- Chapter 5. Realtime API (real-time voice / multimodal)
- Chapter 6. Prompt Engineering and Retrieval Techniques
- Chapter 7. Voice Conversation Pipeline (organized by Conversational AI module)

---

## Chapter 1. Introduction to Conversational AI

### 1.1 What Is Conversational AI?

- Conversational AI covers AI systems that can interact with people naturally.
- Traditional Conversational AI architecture:
  ASR (speech recognition) → NLU (natural language understanding) → DM (dialogue management) → NLG (natural language generation) → TTS (speech synthesis)
  ![image alt](https://www.simplesolve.com/hs-fs/hubfs/Conversational%20AI%20Architecture.jpg?width=825&name=Conversational%20AI%20Architecture.jpg)
- A large language model (LLM) can cover NLU, DM, and NLG at once, which reduces system complexity.

### 1.2 Unity's Role in Conversational AI

- Well suited to VR / MR / AR / 3D interactive applications
- Builds for PC, Android, iOS, and Quest 3
- Use cases: AI NPCs, virtual teaching assistants, smart tour guides, immersive teaching

---

## Chapter 2. Unity and Environment Setup

### 2.1 Recommended Unity Version

- Unity 2022 or later
- Quest: requires IL2CPP and the Android build target

### 2.2 Project Settings

- API Compatibility Level → `.NET Standard 2.1`
- Scripting Backend → IL2CPP

### 2.3 Recommended Packages

- TextMeshPro
- Newtonsoft JSON (`com.unity.nuget.newtonsoft-json`, for parsing JSON)
- UnityWebRequest (needed for REST calls). It ships with Unity in `UnityEngine.Networking`, so nothing extra needs to be installed.

### 2.4 API Key Handling

- Never upload your API key anywhere online (git, cloud) in any form, including values serialized in the Inspector. If you need to publish the project, delete every API key before uploading.
- You can store the key in a `.env` file and make sure that file is never uploaded (for example, list it in `.gitignore`). Unity does not read `.env` files on its own, so you need a small loader script.
- For a public release, route requests through a proxy server so that the key never ships inside the app.

---

## Chapter 3. Calling the OpenAI API

### REST API vs OpenAI .NET SDK

**REST API (with UnityWebRequest)**

Pros:
- The most stable option, with the widest platform support (including Quest).
- Talks directly to the official HTTP endpoints, so the latest Responses / Realtime features are available.

Cons:
- You must build the JSON and parse the response yourself.

**OpenAI .NET SDK**

Pros:
- Clear class structure and more concise code.

Cons:
- May have compatibility problems with IL2CPP / Android / Quest.
- New features sometimes arrive later than in the REST API.

---

### 3.1 Calling Responses via REST (UnityWebRequest example)

```csharp
using System.Collections;
using Newtonsoft.Json;
using UnityEngine;
using UnityEngine.Networking;

public class ResponsesExample : MonoBehaviour
{
    private string API_KEY; // load at runtime (e.g. from a .env file); never hard-code or serialize it

    IEnumerator RequestResponsesAPI(string prompt)
    {
        string url = "https://api.openai.com/v1/responses";

        // Serialize with Newtonsoft so quotes and newlines in the prompt are escaped correctly.
        // Plain text is the default output format, so no format field is needed.
        string json = JsonConvert.SerializeObject(new { model = "gpt-5-nano", input = prompt });

        using (var req = new UnityWebRequest(url, "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(System.Text.Encoding.UTF8.GetBytes(json));
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            req.SetRequestHeader("Authorization", "Bearer " + API_KEY);

            yield return req.SendWebRequest();

            if (req.result != UnityWebRequest.Result.Success)
                Debug.LogError($"{req.responseCode} {req.error}\n{req.downloadHandler.text}");
            else
                Debug.Log(req.downloadHandler.text); // raw JSON response
        }
    }
}
```

---

### 3.2 Calling Responses with the OpenAI .NET SDK

```csharp
#pragma warning disable OPENAI001 // the Responses client is marked experimental in the 2.x package
using OpenAI.Responses;

// Official "OpenAI" NuGet package; type names may differ between package versions.
var client = new OpenAIResponseClient("gpt-5-nano", API_KEY);
OpenAIResponse response = await client.CreateResponseAsync("Introduce quantum superposition in one sentence.");

// GetOutputText() returns the combined output text of the response.
Debug.Log(response.GetOutputText());
```

> [!Note] When to choose REST vs SDK?
>
> **Better suited to the REST API**
>
> - Quest 3 / Quest 2
> - Android / iOS
> - VR / MR projects
> - You need SSE streaming
> - You want the newest API features right away
>
> **Better suited to the SDK**
>
> - PC / Mac only (non-IL2CPP)
> - Unity Editor demo tools
> - Quick prototyping during development

---

### 3.3 Roles of the Responses API and the Realtime API

- **Responses API**:
  - Uses HTTP REST. Suited to one-off requests, tool calls, and structured output.
  - Suited to NPC behavior control, quest generation, game events, and ordinary text chat.
- **Realtime API**:
  - Uses WebSocket / WebRTC. Supports streaming voice and real-time multimodal conversation.
  - Suited to voice assistants, VR voice NPCs, and real-time conversations in which the player can interrupt.

Details and advanced usage are covered in Chapter 4 and Chapter 5 respectively.

---

### 3.4 Installing the OpenAI .NET SDK (only if you call the API through the SDK)

#### Download and install NuGetForUnity
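Link: [NuGetForUnity](https://openupm.com/packages/com.github-glitchenzo.nugetforunity/#modal-manualinstallation)

>![image](https://hackmd.io/_uploads/ByAKnk1z-g.png)

#### Install the OpenAI .NET SDK with NuGetForUnity

>![image](https://hackmd.io/_uploads/SJ94p11fbg.png)
>![image](https://hackmd.io/_uploads/HyeL6JyMbe.png)

> [!Note] Reference
> - **What is NuGet?**
>   NuGet is the package manager for the .NET platform. Unity has the Package Manager and Node.js has npm; in the same way, .NET programs (including C#) use NuGet to download external libraries.

---

## Chapter 4. Responses API (including Structured Output)

### 4.1 What Is the Responses API?

- A new general-purpose API that can be viewed as a merger of Chat Completions and Assistants.
- Supports text output, tool calls, files, code execution, search, and more.
- Well suited to building "chat agents that get things done", which also makes it a good choice for an NPC's brain in Unity.

### 4.2 Response Forms (Modalities)

- **Text**: ordinary text answers.
- **JSON / Structured Output**: strictly controls the output format with a JSON Schema (`text.format` of type `json_schema`).
- **Streaming (SSE)**: outputs small text deltas as they are generated, for a typewriter effect.
- **Tool calls**: the model can call tool functions that you define.

### 4.3 Responses API Standard (Non-Streaming) Response

The most basic way to use the Responses API is to get the complete result back in one response. Typical uses:

- Ordinary NPC dialogue
- Quest data generation
- Triggering game events

#### 4.3.1 Basic request

Plain text is the default output format. `text.format` is written out here only for illustration.

```json
{
  "model": "gpt-5-nano",
  "input": "Introduce quantum superposition in one sentence.",
  "text": { "format": { "type": "text" } }
}
```

#### 4.3.2 Basic response (abridged)

The text is stored in an output item of type `message`. Reasoning models such as gpt-5-nano may emit a `reasoning` item first, so find the `message` item by its type rather than assuming it is `output[0]`.

```json
{
  "id": "resp_123",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "Quantum superposition is the phenomenon in which a quantum system can be in multiple states at once." }
      ]
    }
  ]
}
```

#### 4.3.3 NPC dialogue: a prompt with a task description (example)

The following shows a common prompt design for Unity NPC dialogue with the Responses API. It includes a **task description**:

```json
{
  "model": "gpt-5-nano",
  "input": [
    {"role": "system", "content": "You are a friendly NPC tour guide who loves helping players, with a relaxed and warm tone. Respond to what the player says naturally, immersively, and in character."},
    {"role": "user", "content": "Hi, it's my first time in this village. Can you show me around?"}
  ]
}
```

This format is suited to:

- Keeping the NPC's personality consistent
- Multi-turn dialogue in Unity
- Letting the API understand that its "task" is to reply in the NPC's voice
- Combining with Structured Output or action control later

---

### 4.4 Responses API Streaming (SSE)

With `"stream": true`, the response arrives piece by piece as Server-Sent Events (SSE).

- Each `delta` is a short text fragment (often one or a few tokens). Its size is not guaranteed.
- Well suited to a typewriter effect.

#### 4.4.1 Streaming request

```json
{
  "model": "gpt-5-nano",
  "input": "Introduce quantum superposition.",
  "stream": true
}
```

#### 4.4.2 SSE event types

Common events in Responses API streaming are listed below. The examples are abridged; real events carry extra fields such as `item_id` and `sequence_number`.

> **response.created**
>
> ```json
> {"type":"response.created","response":{"id":"resp_123"}}
> ```
>
> ---
> **response.output\_text.delta**
>
> ```json
> {"type":"response.output_text.delta","delta":"Quantum superposition"}
> {"type":"response.output_text.delta","delta":" is when a system can exist"}
> {"type":"response.output_text.delta","delta":" in multiple states at once."}
> ```
>
> ---
> **response.output\_text.done**
>
> ```json
> {"type":"response.output_text.done","text":"Quantum superposition is when a system can exist in multiple states at once."}
> ```
>
> ---
> **response.completed**
>
> ```json
> {"type":"response.completed","response":{"id":"resp_123","status":"completed"}}
> ```
>
> ---
> **error**
>
> ```json
> {"type":"error","message":"Invalid request"}
> ```

### 4.5 Structured Output (JSON Schema)

Structured Output forces the model to produce output that conforms to the JSON Schema you define. In the Responses API the schema goes under `text.format`. (`response_format` is the Chat Completions parameter.) With `"strict": true`, every property must be listed in `required`, and `additionalProperties` must be `false`.

#### 4.5.1 Request example (REST)

```json
{
  "model": "gpt-5-nano",
  "input": "The player waves and says hello",
  "text": {
    "format": {
      "type": "json_schema",
      "name": "npc_action",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "action": { "type": "string" },
          "emotion": { "type": "string" },
          "reply": { "type": "string" }
        },
        "required": ["action", "emotion", "reply"],
        "additionalProperties": false
      }
    }
  }
}
```

#### 4.5.2 Model reply example

The JSON below arrives as the `output_text` of the `message` item. You still have to parse it yourself, for example with `JsonConvert.DeserializeObject`.

```json
{
  "action": "WaveBack",
  "emotion": "Happy",
  "reply": "Hello, nice to meet you!"
}
```

#### 4.5.3 Use cases

- NPC behavior control (actions, expressions, spoken lines).
- Quest / level data generation.
- Dialogue options and branching stories (e.g. `choices: [...]`).
- Describing AI behavior-tree or state-machine nodes.

> [!Warning] The Realtime API does not appear to support `json_schema` Structured Output.
> When you need a reliably stable JSON structure, prefer the Responses API.

> [!Note] Reference
> For finer control over Structured Output, see the [official Structured Outputs guide](https://platform.openai.com/docs/guides/structured-outputs/examples)

### 4.6 When to Use the Responses API?

- The focus is on "content" or "control data" rather than real-time voice.
- You need JSON / structured results that your code can parse easily.
- You need to connect databases, tools, RAG, or external services.

> [!Note] Reference
> For full details on the Responses API, see the [official Responses API reference](https://platform.openai.com/docs/api-reference/responses)

---

## Chapter 5. Realtime API (real-time voice / multimodal)

### 5.1 What Is the Realtime API?

- A real-time multimodal API over WebSocket / WebRTC.
- Supports:
  - Voice input (audio in)
  - Voice output (audio out)
  - Live transcription
  - Text input / replies
- Suited to low-latency voice agents, VR NPCs, and voice assistants.

### 5.2 Responses vs Realtime

| Capability | Responses API | Realtime API |
| --- | --- | --- |
| Transport | REST (HTTP) | WebSocket / WebRTC |
| Voice input | Requires a separate Whisper call | Native streaming voice input |
| Voice output | Requires a separate Audio (TTS) call | Native streaming audio chunks |
| Structured Output | Supports `json_schema` | Not supported |
| Interrupting the AI | A request cannot be interrupted midway | Can be interrupted / restarted mid-speech |
| Modalities | Images, tools, JSON | Voice, text, function calling |

### 5.3 Realtime API Event Types

The main events you send or receive over the Realtime WebSocket are listed below, with their purpose and an abridged JSON example to help with parsing in Unity. Client events are the ones you send; server events are the ones you receive. The names follow the GA Realtime API. The older beta used some different names (for example `response.audio.delta`), so check which version you connect to.

---

### 5.3.1 System events

> ### **session.created** (server)
> Returned by the server once the Realtime connection is established, with the session info.
> ```json
> {
>   "type": "session.created",
>   "session": {
>     "id": "sess_123",
>     "model": "gpt-4o-realtime-preview"
>   }
> }
> ```
> Purpose:
> - Confirm that the connection succeeded
> - Read the default model and parameters

### 5.3.2 Audio input events

> ### **input\_audio\_buffer.append** (client)
> Uploads (appends) one chunk of audio.
>
> ```json
> {
>   "type": "input_audio_buffer.append",
>   "audio": "<base64 audio chunk>"
> }
> ```
>
> Purpose:
>
> - Send a new audio segment to the model
>
> ---
> ### **input\_audio\_buffer.commit** (client)
>
> Signals that you have stopped providing audio, so the buffered audio becomes a user message the model can process. With server VAD enabled, the server commits automatically and sends `input_audio_buffer.committed`.
>
> ```json
> {
>   "type": "input_audio_buffer.commit"
> }
> ```
>
> Purpose:
>
> - Tell the model "I'm done talking"
> - Without server VAD, follow it with `response.create` to get a reply
>
> ---
> ### **conversation.item.input\_audio\_transcription.completed** (server)
>
> Indicates that transcription of the user's audio has finished. It is sent only when input transcription is enabled in the session configuration.
>
> ```json
> {
>   "type": "conversation.item.input_audio_transcription.completed",
>   "item_id": "item_123",
>   "content_index": 0,
>   "transcript": "Hi, I'd like to know what time it is."
> }
> ```
>
> Purpose:
>
> - Speech-to-text result (replaces a separate Whisper call)
> - Can be shown in a UI text box

### 5.3.3 Text output events

> ### **response.output\_text.delta** (server)
>
> The model outputs text almost character by character.
>
> #### Example:
>
> ```json
> { "type": "response.output_text.delta", "delta": "Hi" }
> { "type": "response.output_text.delta", "delta": "," }
> { "type": "response.output_text.delta", "delta": " I" }
> { "type": "response.output_text.delta", "delta": "'m" }
> { "type": "response.output_text.delta", "delta": " your" }
> { "type": "response.output_text.delta", "delta": " assist" }
> { "type": "response.output_text.delta", "delta": "ant" }
> { "type": "response.output_text.delta", "delta": "." }
> ```
>
> The Realtime API aims for very low latency, so:
>
> - It sends tokens while the model is still generating
> - Each fragment is very short, sometimes not even a complete word
> - Unity / VR can display text or play audio in sync
> - You can interrupt the AI at any time
>
> When the response is spoken, its text arrives as `response.output_audio_transcript.delta` instead.
>
> ---
> ### **response.output\_text.done** (server)
>
> The model has finished generating the text.
>
> ```json
> {
>   "type": "response.output_text.done",
>   "text": "Hi, I'm your assistant."
> }
> ```
>
> Purpose:
>
> - The complete text message (non-streamed)

### 5.3.4 Audio output events

> ### **response.output\_audio.delta** (server)
>
> One chunk of the AI's spoken audio. The base64 data is in the `delta` field.
>
> ```json
> {
>   "type": "response.output_audio.delta",
>   "response_id": "resp_123",
>   "delta": "<base64 audio chunk>"
> }
> ```
>
> Purpose:
>
> - Play the TTS audio segment by segment (low latency)
> - In Unity, decode each chunk and append it to an AudioClip / playback queue
>
> ---
>
> ### **response.output\_audio.done** (server)
>
> Audio output has finished.
>
> ```json
> {
>   "type": "response.output_audio.done"
> }
> ```
>
> Purpose:
>
> - Lets you stop the AudioSource queue

### 5.3.5 Response lifecycle events

> ### **response.created** (server)
>
> The model has started processing the input.
>
> ```json
> {
>   "type": "response.created",
>   "response": { "id": "resp_123" }
> }
> ```
>
> ---
>
> ### **response.done** (server)
>
> The model has finished this response (text and/or audio). Its `status` tells you whether it `completed` or was `cancelled`.
>
> ```json
> {
>   "type": "response.done",
>   "response": { "id": "resp_123", "status": "completed" }
> }
> ```
>
> Purpose:
>
> - Determine whether this conversation turn has fully ended

### 5.3.6 Error events

> ### **error** (server)
>
> An error occurred in the model or the WebSocket.
>
> ```json
> {
>   "type": "error",
>   "error": {
>     "message": "Invalid audio format",
>     "code": "invalid_audio"
>   }
> }
> ```
>
> Purpose:
>
> - Show the error message
> - Reconnect or resend the audio

### 5.3.7 Interruption events

> There is no dedicated "interrupted" event. With server VAD, when the player starts talking while the AI is speaking, the server sends:
>
> ### **input\_audio\_buffer.speech\_started** (server)
>
> ```json
> {
>   "type": "input_audio_buffer.speech_started",
>   "audio_start_ms": 1000,
>   "item_id": "item_456"
> }
> ```
>
> Purpose:
>
> - Stop local audio playback immediately
> - The server cancels the in-progress response, and its `response.done` has `status: "cancelled"`. Without server VAD, send `{"type": "response.cancel"}` yourself.
> - Optionally send `conversation.item.truncate` so that the history keeps only the audio the player actually heard

> [!Note] Reference
> For more details on Realtime API events, see the official [client events](https://platform.openai.com/docs/api-reference/realtime-client-events) and [server events](https://platform.openai.com/docs/api-reference/realtime-server-events) references.

### 5.4 When to Use the Realtime API?

- You need to "talk", not just "type"
- The player must be able to interrupt the AI while it is speaking

---

## Chapter 6. Prompt Engineering and Retrieval Techniques

### 6.1 Prompt Engineering: Conversation History

**Managing conversation history with a C# List**

```csharp
// Message: a small class of your own with `role` and `content` fields
// and a (role, content) constructor, matching the Responses API input items.
List<Message> history = new();
history.Add(new("user", userText));
history.Add(new("assistant", aiResp));
```

- Pass `history` as the Responses API `input` so the model understands the earlier context.

#### Sending history to Responses (JSON)

```json
{
  "model": "gpt-5-nano",
  "input": [
    {"role": "system", "content": "You are a friendly NPC tour guide."},
    {"role": "user", "content": "Hi, I just arrived in this village."},
    {"role": "assistant", "content": "Welcome! Where would you like me to take you today?"},
    {"role": "user", "content": "Take me to the market."}
  ]
}
```

#### Sending history from Unity C# (REST)

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

var history = new List<object>
{
    new { role = "system", content = "You are a friendly NPC tour guide." },
    new { role = "user", content = "Hi, I just arrived in this village." },
    new { role = "assistant", content = "Welcome! Where would you like me to take you today?" },
    new { role = "user", content = "Take me to the market." }
};

var request = new
{
    model = "gpt-5-nano",
    input = history
};

string json = JsonConvert.SerializeObject(request);
```

This lets the Responses API see the full context and produce natural, coherent NPC replies.

---

### 6.2 Prompt Engineering: System Prompt (Character Setup)

```json
{"role":"system", "content":"You are a friendly NPC tour guide."}
```

- Defines the NPC's personality, tone, and the topics it will talk about.
- Adding **style**, **constraints**, and a **skill list** makes the character more consistent.

---

### 6.3 Prompt Engineering: Common Techniques

#### **Clear Tasking**

Tell the model what you want it to do, for example that NPC replies should be concise, immersive, and in character.

```json
{"role":"system", "content":"Task: reply to the player as a village NPC, in a friendly and slightly mysterious tone, and avoid long answers."}
```

---

#### **Style Constraints**

```json
{"role":"system", "content":"Reply constraints: no more than 20 words, avoid jargon, include one emoji."}
```

---

#### **Few-Shot Examples**

```json
{"role":"assistant","content":"Welcome to Green Maple Village! Where would you like to go? ✨"}
{"role":"user","content":"I'd like to see what's special around here."}
{"role":"assistant","content":"The market is the liveliest spot. Shall I take you there? 🧺"}
```

---

#### **Knowledge / Canon Rules**

Keeps the NPC from making up content outside the story.

```json
{"role":"system","content":"Use only this village's background information; do not invent historical events or nations."}
```

---

#### **Chain-of-Thought / Guided Reasoning**

The NPC reasons first, then turns that reasoning into an answer a human can follow.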
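```json
{"role":"system","content":"Reason silently first (do not show it), then give a short NPC reply."}
```

---

### 6.4 RAG: Retrieval Augmented Generation

> #### **Does OpenAI provide built-in RAG?**
> - The Assistants API offered built-in retrieval (vector stores / file search).
> - The Responses API also offers a built-in `file_search` tool backed by vector stores. Its data lives on OpenAI's servers, though, so many game projects implement retrieval themselves with:
>   - A local database (ScriptableObject / JSON)
>   - A third-party vector database (Pinecone / Weaviate / Chroma)
>   - Their own embeddings + cosine similarity

### **6.4.1 Basic RAG architecture**

Below is a RAG architecture commonly used in games / Unity. It is lightweight, and the lookup runs entirely on the device. The embedding and generation steps still call the API.

```
Player input → (1) Embedding → (2) Search for the closest data → (3) Insert the data into the prompt → GPT reply
```

- **Embeddings API**: converts the player's question into a vector.
- **Local vector database** (List / ScriptableObject): finds similar entries.
- **Responses API**: generates the final NPC reply.

---

### **6.4.2 Building a local database (ScriptableObject)**

```csharp
using System.Collections.Generic;
using UnityEngine;

[System.Serializable]
public class LoreEntry
{
    public string id;
    public string content;
    public float[] embedding; // filled in offline (see 6.4.3)
}

[CreateAssetMenu]
public class LoreDatabase : ScriptableObject
{
    public List<LoreEntry> entries;
}
```

---

### **6.4.3 Generating embeddings for the database (.NET SDK)**

> Generate the embeddings offline in the Unity Editor so that you do not pay for them at runtime. Player queries must be embedded with the same model.

```csharp
using System.Threading.Tasks;
using OpenAI.Embeddings;

public async Task<float[]> CreateEmbedding(string text)
{
    // Official OpenAI .NET SDK: an EmbeddingClient is bound to a single model.
    var client = new EmbeddingClient("text-embedding-3-large", API_KEY);
    OpenAIEmbedding embedding = await client.GenerateEmbeddingAsync(text);
    return embedding.ToFloats().ToArray(); // ReadOnlyMemory<float> → float[]
}
```

---

### **6.4.4 Finding the most similar entry (Cosine Similarity)**

```csharp
// Cosine similarity: dot(a, b) / (|a| * |b|). Both vectors must have the same length.
float Cosine(float[] a, float[] b)
{
    float dot = 0f, magA = 0f, magB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    if (magA == 0f || magB == 0f) return 0f; // avoid dividing by zero
    return dot / (Mathf.Sqrt(magA) * Mathf.Sqrt(magB));
}

// Linear scan over all entries; returns null if the database is empty.
LoreEntry SearchTop1(float[] userEmbedding, LoreDatabase db)
{
    float bestScore = -1f;
    LoreEntry best = null;
    foreach (var e in db.entries)
    {
        float score = Cosine(userEmbedding, e.embedding);
        if (score > bestScore)
        {
            bestScore = score;
            best = e;
        }
    }
    return best;
}
```

---

### **6.4.5 Inserting the retrieved result into the prompt → Responses API**

```csharp
string InjectContext(string retrieved)
{
    return $"Here is the data closest to the player's question: {retrieved}\nAnswer the player in the NPC's voice based on this information.";
}

// Final request
var req = new
{
    model = "gpt-5-nano",
    input = new object[]
    {
        new { role = "system", content = "You are an NPC tour guide." },
        new { role = "system", content = InjectContext(bestLore.content) },
        new { role = "user", content = userInput }
    }
};
```

---

### **6.4.6 Complete RAG flow (putting it together in Unity)**

```csharp
async Task<string> NPC_RAG_Reply(string userInput)
{
    // 1. Embed the player's input (same model as the database)
    float[] userEmbedding = await CreateEmbedding(userInput);

    // 2. Find the closest entry
    var bestLore = SearchTop1(userEmbedding, loreDB);

    // 3. Build the prompt context (fall back to no context if nothing was found)
    string promptContext = InjectContext(bestLore != null ? bestLore.content : "(none)");

    // 4. Call the Responses API
    var req = new
    {
        model = "gpt-5-nano",
        input = new object[]
        {
            new { role = "system", content = "You are an NPC tour guide." },
            new { role = "system", content = promptContext },
            new { role = "user", content = userInput }
        }
    };

    string json = JsonConvert.SerializeObject(req);
    // PostJSON / ExtractText are your own helpers (e.g. a UnityWebRequest POST, and reading
    // the output_text of the "message" output item); they are not SDK functions.
    string res = await PostJSON("https://api.openai.com/v1/responses", json);
    return ExtractText(res);
}
```

---

### 6.5 Prompt Scenario Examples (NPC Dialogue)

#### **Basic NPC dialogue template**

```json
{"role":"system","content":"You are an NPC with a friendly tone. Answer the player's questions in an immersive way."}
```

#### **Quest trigger template**

```json
{"role":"system","content":"If the player mentions keywords such as 'market', 'goods', or 'buying things', steer the reply toward market-related quests."}
```

#### **Emotion-driven template**

```json
{"role":"system","content":"If the player sounds down, reply in a comforting way; if the player is excited, match their energy."}
```

#### **Prohibitions and constraints template**

```json
{"role":"system","content":"Do not mention out-of-game information, do not break the fourth wall, and do not answer questions about lore unrelated to the village setting."}
```

---

### 6.6 Small Database Lookup (Local JSON / ScriptableObject)

Even without a real RAG service, you can get a similar effect with local data in Unity.

#### **Local JSON**

```json
{
  "locations": [
    {"name":"Market", "desc":"The liveliest place in the village."},
    {"name":"Sacred Tree", "desc":"A thousand-year-old tree that the villagers regard as their guardian."}
  ]
}
```

#### **ScriptableObject database**

```csharp
[System.Serializable]
public class LocationInfo { public string name; public string desc; }

[CreateAssetMenu]
public class LocationDB : ScriptableObject
{
    public List<LocationInfo> locations;
}
```

#### **Inserting the lookup result into the prompt in Unity**

```
Data you looked up: <desc>. Reply to the player in the NPC's voice.
```

---

## Chapter 7. Voice Conversation Pipeline

This chapter restructures the voice conversation flow around the **five modules of traditional Conversational AI** (ASR → NLU → DM → NLG → TTS). For each module it compares the traditional approach with the integrated Realtime API flow.

---

### **7.1 ASR (Automatic Speech Recognition)**

The ASR module converts the player's speech into text.

**Whisper (non-Realtime)**

1. Record in Unity → produce WAV / FLAC
2. Call the Whisper API (`/v1/audio/transcriptions`)
3. Return plain text to Unity

**Realtime API (built-in ASR)**

The Realtime API automatically:

- Receives audio chunks
- Detects where speech starts and stops (VAD)
- Transcribes the speech and sends the `conversation.item.input_audio_transcription.completed` event (when input transcription is enabled in the session)

---

### **7.2 NLU (Natural Language Understanding)**

The model has to understand the player's goal, intent, and context.

**Traditional flow (non-Realtime)**

- Whisper → returns text
- Put the text into a Responses API request (e.g. gpt-5-nano)
- The model handles the language understanding

**Realtime API**

- The model performs NLU as soon as it hears the speech
- Typical event: `response.output_text.delta` (the model starts replying as soon as it understands)

---

### **7.3 DM (Dialogue Management)**

The DM module is responsible for:

- Deciding the NPC's next action
- Reply style
- Keeping the context
- Branching according to quest logic

**Traditional DM (based on the Responses API)**

- Define the NPC's role with a system prompt
- Maintain context with the history
- Use Structured Output to produce behavior control, for example:

```json
{
  "action":"Wave",
  "emotion":"Happy",
  "reply":"Hi! Welcome here."
}
```

- The DM logic can be decided jointly by the game code and the prompt

**Realtime DM**

- The Realtime API has no Structured Output
- DM has to move into your code, which reacts to the event stream (e.g. interruptions, pauses in speech)
- You can send the transcript to a Responses API request with Structured Output to make decisions

---

### **7.4 NLG (Natural Language Generation)**

**Responses API (traditional)**

- Returns the complete text at once after generation (or as streaming chunks)
- To control the format → use Structured Output (Responses API only)

**Realtime API (real-time)**

- Outputs at token granularity: `response.output_text.delta`
- Enables synchronized subtitles and NPC lip sync

---

### **7.5 TTS (Text-To-Speech)**

**Traditional TTS flow**

- Responses (text) → Audio TTS API (`/v1/audio/speech`) → wav/mp3
- Unity creates an AudioClip → plays it with an AudioSource

**Realtime TTS (streamed audio chunks)**

- Realtime returns `response.output_audio.delta` directly
- Unity can play the audio while still receiving it, with much lower latency than traditional TTS

---

### **7.6 VAD (Voice Activity Detection)**

VAD determines when the player starts speaking and when they stop.

**Simple VAD in Unity (your own code)**

Decide by volume:

```csharp
// RMS (root-mean-square) volume of one frame of microphone samples in [-1, 1]
float ComputeVolume(float[] samples)
{
    if (samples.Length == 0) return 0f;
    float sum = 0f;
    foreach (var s in samples) sum += s * s;
    return Mathf.Sqrt(sum / samples.Length);
}
```

- Consecutive frames above the threshold → treat as speaking
- Consecutive frames below the threshold → treat as finished

**VAD in the Realtime API**

- Detects where speech starts and stops automatically
- No extra VAD is needed. You can still disable server turn detection and use your own VAD to decide when to send `input_audio_buffer.commit`
- Example of enabling server VAD (GA session shape; the beta API used a flat `session.turn_detection` field instead):

```json
{
  "type": "session.update",
  "session": {
    "type": "realtime",
    "audio": {
      "input": {
        "turn_detection": { "type": "server_vad" }
      },
      "output": {
        "voice": "alloy"
      }
    }
  }
}
```

> [!Note] Reference
> For more VAD parameter settings, see the [official documentation](https://platform.openai.com/docs/guides/realtime-vad)

---

### **7.7 Traditional Voice Pipeline vs Realtime Integration (Overview)**

| Module | Traditional pipeline (Responses + Whisper + TTS) | Realtime API |
| --- | --- | --- |
| ASR | Whisper API | Built-in real-time ASR |
| NLU | Responses API | Built-in NLU |
| DM | Structured Output, system prompt | Event stream, no structured output |
| NLG | Responses / SSE | Token-level real-time output |
| TTS | Audio API (higher latency) | Real-time audio chunks (low latency) |
| VAD | Implemented in your own code | Built-in |

---

### **7.8 When to Use the Traditional Pipeline vs Realtime?**

#### **Choose the traditional pipeline (Whisper + Responses + TTS)** if:

- You need Structured Output to control NPC actions
- You need predictable, stable JSON responses
- You want to fine-tune each AI module (ASR / NLU / DM / NLG / TTS)
- Ultra-low voice latency is not a priority

#### **Choose the Realtime API** if:

- You are building a voice assistant, a VR NPC, or real-time dialogue that the player can interrupt
- You need to handle voice input and voice output together
- You want the lowest latency (token-level output + real-time TTS)
- You want the API to handle VAD / ASR / TTS automatically

---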
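Section 7.6 says "consecutive frames above the threshold → speaking, consecutive frames below → finished" but shows only the RMS function. Below is a minimal sketch of that rule with hysteresis: a separate frame count to start speech and to end it, so short dips in volume do not end the turn. `SimpleVad`, its thresholds and its frame counts are hypothetical illustration choices, not part of Unity or the OpenAI API. It uses plain .NET (`System.Math`) only, so it can be tested outside Unity. The "speech just ended" signal marks the point where you might send `input_audio_buffer.commit`, assuming server turn detection is disabled.

```csharp
using System;

// Hypothetical helper: frame-based voice activity detection with hysteresis.
public sealed class SimpleVad
{
    private readonly float _threshold;  // RMS level treated as "voice"
    private readonly int _startFrames;  // consecutive loud frames needed to start speech
    private readonly int _stopFrames;   // consecutive quiet frames needed to end speech
    private int _loudCount;
    private int _quietCount;

    public bool IsSpeaking { get; private set; }

    public SimpleVad(float threshold, int startFrames, int stopFrames)
    {
        _threshold = threshold;
        _startFrames = startFrames;
        _stopFrames = stopFrames;
    }

    // Root-mean-square volume of one frame of samples in [-1, 1].
    public static float ComputeVolume(float[] samples)
    {
        if (samples.Length == 0) return 0f;
        double sum = 0.0;
        foreach (var s in samples) sum += s * s;
        return (float)Math.Sqrt(sum / samples.Length);
    }

    // Feeds one frame. Returns true exactly once per utterance, on the frame
    // where the speaker is judged to have stopped talking.
    public bool ProcessFrame(float[] samples)
    {
        bool loud = ComputeVolume(samples) >= _threshold;
        if (loud) { _loudCount++; _quietCount = 0; }
        else { _quietCount++; _loudCount = 0; }

        if (!IsSpeaking && _loudCount >= _startFrames)
        {
            IsSpeaking = true;           // speech started
        }
        else if (IsSpeaking && _quietCount >= _stopFrames)
        {
            IsSpeaking = false;          // speech ended: time to commit the audio buffer
            return true;
        }
        return false;
    }
}
```

In Unity you would feed it successive chunks read from the microphone clip. The two frame counts trade responsiveness against false triggers: a larger `stopFrames` tolerates natural pauses but delays the reply.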

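The `input_audio_buffer.append` event in 5.3.2 carries a base64 audio chunk. Unity, however, gives you microphone data as floats in [-1, 1] (e.g. via `AudioClip.GetData`). Below is a minimal sketch of the conversion. It assumes the session uses 16-bit little-endian mono PCM, which we understand to be the Realtime API's default input format at 24 kHz; check the current docs. It also assumes you record at that rate or resample first. `Pcm16` is a hypothetical helper name, and only plain .NET is used.

```csharp
using System;

// Hypothetical helper: float samples in [-1, 1] → 16-bit little-endian PCM → Base64.
public static class Pcm16
{
    public static byte[] FromFloats(float[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so out-of-range samples do not overflow the Int16 range.
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);
            bytes[2 * i] = (byte)(value & 0xFF);            // low byte first (little-endian)
            bytes[2 * i + 1] = (byte)((value >> 8) & 0xFF); // high byte second
        }
        return bytes;
    }

    // Payload for the "audio" field of input_audio_buffer.append.
    public static string ToBase64(float[] samples) => Convert.ToBase64String(FromFloats(samples));
}
```

Sending many small chunks (for example 50 to 100 ms each) rather than one large clip keeps latency low. Playback of `response.output_audio.delta` needs the reverse conversion: decode the Base64, then divide each Int16 by 32768.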