BASHCAT
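# LM Studio 0.4.1 Complete Guide: From the Anthropic API to Running Claude Code Locally

![lmstudio-041-cover](https://hackmd.io/_uploads/SygcS4hvbg.jpg)

[TOC]

That evening I was debugging a nasty race condition, with Claude Code humming along nicely, when a billing notice arrived: this month's API charges had passed 100 US dollars again. I stared at the screen and wondered why, with an M4 Max and 128GB of memory on my desk, every inference still had to go to the cloud.

Then I found out that LM Studio 0.4.1 had quietly added a killer feature: an Anthropic API-compatible endpoint. In practice, four short commands connect Claude Code directly to a model running on your own machine, with no API bill, full privacy, and no code changes.

This article is my complete set of notes after a weekend of tinkering.

---

## From 0.4.0 to 0.4.2: what the three releases did

LM Studio shipped three releases in quick succession in early 2026, fast enough that I nearly lost track. Here is the timeline:

**0.4.0 (2026-01-28)** was a major overhaul. It introduced the llmster daemon, so you can run models without opening the GUI; added parallel inference, so concurrent requests no longer queue; shipped a new REST API v1 with state management; and redesigned the UI.

**0.4.1 (2026-01-30)** arrived only two days later, and the change looks small: a single Anthropic API-compatible endpoint, `POST /v1/messages`. Its impact is far larger than it looks. Tools such as Claude Code, Cursor, and Cline talk to models through Anthropic's Messages API, so with this endpoint all of them can be pointed at a local model.

**0.4.2 (2026-02-06)** added parallel request support to the MLX backend. Anyone running models on Apple Silicon can finally feed several requests at once instead of waiting for them one by one.

Taken together, the three releases make LM Studio's ambition clear: it does not want to be just a GUI for running models, it wants to be the infrastructure for local AI.

---

## The three-layer API architecture at a glance

![lmstudio-041-api-architecture](https://hackmd.io/_uploads/BJO9BV3w-e.jpg)

From 0.4.1 onward, LM Studio exposes three APIs at once, all on the same port:

:::info
**The three API layers**

- **Native REST API v1** (`/api/v1/`): stateful chat plus MCP integration, the most complete feature set
- **OpenAI-compatible** (`/v1/chat/completions`, `/v1/responses`, `/v1/embeddings`): lets tools written for OpenAI switch over seamlessly
- **Anthropic-compatible** (`/v1/messages`): new in 0.4.1, lets Claude Code and similar tools connect directly
:::

Why three? Because the AI tooling ecosystem has split into two camps: OpenAI's Chat Completions format on one side and Anthropic's Messages format on the other. LM Studio supports both, which effectively opens up every toolchain to you.

The native API is LM Studio's own. Some exclusive features, such as stateful chat management and MCP integration, are available only through `/api/v1/`.

All three APIs share one port and are distinguished by URL path, so starting the server once makes all of them available.

---

## Claude Code integration in practice: four commands

![lmstudio-041-claude-code-workflow](https://hackmd.io/_uploads/SJGiSEnwWx.jpg)

This is the most important part of the article. First, check a few prerequisites:

- LM Studio 0.4.1 or later is installed
- At least one model is downloaded (I recommend `openai/gpt-oss-20b` or `qwen3-coder`)
- The Claude Code CLI is installed

Then it is this simple:

```bash
# Start the LM Studio server
lms server start --port 1234

# Point Claude Code at the local server (no trailing slash)
export ANTHROPIC_BASE_URL=http://localhost:1234
export ANTHROPIC_AUTH_TOKEN=lmstudio

# Start Claude Code with a specific model
claude --model openai/gpt-oss-20b
```

That's it. Claude Code believes it is talking to Anthropic's servers, but every request is actually routed to LM Studio on your machine.

A few points are worth noting:

`ANTHROPIC_AUTH_TOKEN` is set to `lmstudio` only because Claude Code requires this environment variable to be non-empty. LM Studio does not check authentication by default, so any value works, but it cannot be left unset.

The value after `--model` is the identifier of the model you have loaded in LM Studio. Use `lms ls` to list the models you have downloaded.

:::warning
**Performance note**: local inference speed depends directly on your hardware. On an M4 Max, a 20B-parameter model responds at roughly 30-50 tokens per second, which feels fine. If your machine is short on memory, the model is forced to offload to disk and becomes painfully slow.
:::

---

## API usage examples

Beyond the Claude Code integration, you may also want to call the LM Studio API directly from your own code. Here are examples using cURL, Python, and TypeScript.

### cURL: Anthropic Messages format

```bash
curl http://localhost:1234/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: lmstudio" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain what a WebSocket is in one sentence"}
    ]
  }'
```

### cURL: OpenAI Chat Completions format

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer lmstudio" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "You are a senior software engineer"},
      {"role": "user", "content": "Explain the event loop"}
    ],
    "temperature": 0.7,
    "stream": true
  }'
```

### Python: using the Anthropic SDK

```python
import anthropic

# The SDK appends /v1/messages to base_url itself.
client = anthropic.Anthropic(
    base_url="http://localhost:1234",
    api_key="lmstudio",  # any value; LM Studio does not check it by default
)

message = client.messages.create(
    model="openai/gpt-oss-20b",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a Python decorator that implements a retry mechanism"}
    ],
)

print(message.content[0].text)
```

### TypeScript: using the OpenAI SDK

```typescript
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "http://localhost:1234/v1",
  apiKey: "lmstudio",
})

const response = await client.chat.completions.create({
  model: "openai/gpt-oss-20b",
  messages: [
    { role: "user", content: "Explain TypeScript conditional types" },
  ],
  stream: true,
})

// Print each streamed chunk as it arrives
for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "")
}
```

The key point: code you already wrote against the Anthropic SDK or the OpenAI SDK can be switched to local inference by changing only the base URL (`base_url` in Python, `baseURL` in TypeScript). Nothing else needs to change.

---

## Headless deployment with llmster

If you want to run LM Studio on a server without the GUI (on a Linux workstation, for example), llmster is what you need. It is the headless daemon introduced in 0.4.0.

### Installation

```bash
# macOS / Linux
curl -fsSL https://lmstudio.ai/install-lmstudio-cli | bash

# Or install the CLI from the LM Studio GUI:
# left-hand menu > Developer > Install CLI
```

### Basic usage

```bash
# Start the server (foreground)
lms server start --port 1234

# Start the server (background)
lms server start --port 1234 &

# Load a model
lms load openai/gpt-oss-20b

# List loaded models
lms ps

# Stop the server
lms server stop
```

### systemd service configuration

In production you will want llmster to start with the system. Create `/etc/systemd/system/lmstudio.service`:

```ini
[Unit]
Description=LM Studio Server (llmster)
After=network.target

[Service]
Type=simple
User=your-username
# The binary path is illustrative; use the location reported by `which lms`
ExecStart=/usr/local/bin/lms server start --port 1234
Restart=on-failure
RestartSec=10
Environment=HOME=/home/your-username

[Install]
WantedBy=multi-user.target
```

Then:

```bash
sudo systemctl daemon-reload
sudo systemctl enable lmstudio
sudo systemctl start lmstudio

# Check the status
sudo systemctl status lmstudio
```

You now have an always-on local inference server. Point `ANTHROPIC_BASE_URL` at this machine's IP address and everyone in the office can use it.

---

## CLI command cheat sheet

`lms` lets you control everything in LM Studio from the terminal:

| Command | Description |
|------|------|
| `lms status` | Show server status |
| `lms server start --port 1234` | Start the API server |
| `lms server stop` | Stop the server |
| `lms load <model>` | Load a model into memory |
| `lms unload <model>` | Unload a model |
| `lms ps` | List loaded models |
| `lms ls` | List downloaded models |
| `lms get <model>` | Download a model |
| `lms log stream` | Stream the server log in real time |
| `lms create` | Create a custom model configuration |
| `lms version` | Show version information |

A few combinations I use often:

```bash
# One line: start the server and load a model
lms server start --port 1234 && lms load openai/gpt-oss-20b

# Monitor inference activity
lms log stream | grep -i "inference"

# Quickly switch models
lms unload openai/gpt-oss-20b && lms load qwen3-coder
```

---

## Recommended models

After trying a range of models, these are the ones I found work best in LM Studio:

| Model | Parameters | Best for | Memory required |
|------|--------|----------|------------|
| `openai/gpt-oss-20b` | 20B | General code generation, chat | ~14GB |
| `qwen3-coder` | 14B/32B | Code completion, refactoring | ~10GB / ~22GB |
| `ibm/granite-4-micro` | 8B | Lightweight inference, fast responses | ~6GB |

**openai/gpt-oss-20b** is my current daily driver. This open-weight model from OpenAI is surprisingly good at understanding code and fast enough on an M4 Max, which makes it the best value of the three.

**qwen3-coder**: if your machine has enough memory, the 32B version comes very close to cloud models on coding tasks. The 14B version is a good choice when memory is tight.

**ibm/granite-4-micro** suits scenarios that need very fast responses. At 8B parameters it runs smoothly on almost any Apple Silicon Mac and works well as a first pass for code review.

:::success
**Which one to pick**: if you want to download just one model to try, choose `openai/gpt-oss-20b`. It balances generality and performance well, and it has been the most stable with Claude Code in my experience.
:::

---

## Authentication

LM Studio requires no authentication by default. You are running it on your own machine, so unauthorized access is usually not a concern.

If you expose the server to your local network, or have other security requirements, you can enable an API token in the Developer settings:

1. Open LM Studio
2. Go to the **Developer** page
3. Find the **API Security** section
4. Enable **Require API Token**
5. Copy the generated token

Once enabled, every API request must carry the credential:

```bash
# OpenAI format
curl http://localhost:1234/v1/chat/completions \
  -H "Authorization: Bearer YOUR_TOKEN" \
  ...

# Anthropic format
curl http://localhost:1234/v1/messages \
  -H "x-api-key: YOUR_TOKEN" \
  ...
```

The matching Claude Code setting:

```bash
export ANTHROPIC_AUTH_TOKEN=YOUR_TOKEN
```

One caveat: even with token authentication enabled, LM Studio does not currently support TLS. If you need to expose the API over an untrusted network, put nginx or Caddy in front of it as a reverse proxy and terminate HTTPS there.

---

## MCP via API: bringing tool use to local models

The native REST API introduced in 0.4.0 has a very nice feature: the `integrations` field attaches an ephemeral MCP server to an API request, so your local model can use external tools too.

```bash
curl http://localhost:1234/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "What is the temperature in Taipei right now?"}
    ],
    "integrations": [
      {
        "type": "mcp",
        "server": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-weather"]
        }
      }
    ]
  }'
```

Note that this feature is available only on the native REST API (`/api/v1/`). The OpenAI- and Anthropic-compatible endpoints do not support it yet.

The strength of this MCP integration is that it is ephemeral: each request starts a fresh MCP server and discards it afterward. Nothing has to be configured in advance; you only specify it in the request.

Picture this scenario: your local model receives the request "look up the latest issues on this GitHub repo", starts a GitHub tool through MCP, fetches the data, and answers you. Inference stays on your machine, and the model never sends your prompt to a cloud provider.

---

## Pitfalls I ran into

After a bit more than a week of use, these are the problems worth warning you about:

**Model fails to load.** Sometimes `lms load` hangs, usually because memory is short. Run `lms ps` to see whether another model is still holding memory, clear it with `lms unload`, and try again.

**Claude Code reports connection refused.** Confirm that the server is actually running (`lms status`), then confirm that nothing else is occupying the port (`lsof -i :1234`). Another common cause is an accidental trailing slash at the end of `ANTHROPIC_BASE_URL`.

**Response quality is lower than expected.** A local model is not Claude Opus, and the gap is obvious on complex multi-step reasoning. I treat the local model as an everyday first pass: simple code review, format conversion, and documentation go to it, while genuinely hard architecture design and debugging go back to the cloud.

**Streaming stutters.** If you are on the MLX backend (Apple Silicon), make sure you have updated to 0.4.2, the release that added parallel request support.

---

## How much money this actually saves

I did the math. I average about 200 Claude Code requests a day at roughly 2000 tokens each, which is around 12 million tokens over a 30-day month. Sending all of that through the Anthropic API costs about 80-120 US dollars a month.

After switching to local inference, that bill drops to zero.

"Free" has a price, of course: you need a sufficiently powerful machine, and both response speed and quality take a hit. For me, though, 80% of everyday development tasks do not need a top-tier model. When I really need deep reasoning, I can switch back to the cloud at any time with `claude --model claude-opus-4-6`.

Mixing the two is the smartest strategy.

---

## References

- [LM Studio official website](https://lmstudio.ai)
- [LM Studio 0.4.1 Release Notes](https://lmstudio.ai/blog/lmstudio-0.4.1)
- [LM Studio 0.4.0 Release Notes](https://lmstudio.ai/blog/lmstudio-0.4.0)
- [LM Studio 0.4.2 Release Notes](https://lmstudio.ai/blog/lmstudio-0.4.2)
- [LM Studio API documentation](https://lmstudio.ai/docs/api)
- [LM Studio CLI documentation](https://lmstudio.ai/docs/cli)
- [Claude Code official documentation](https://docs.anthropic.com/en/docs/claude-code)
- [Anthropic Messages API specification](https://docs.anthropic.com/en/api/messages)
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io)
- [llmster headless deployment guide](https://lmstudio.ai/docs/advanced/headless)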
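
The trailing-slash pitfall with `ANTHROPIC_BASE_URL` mentioned in the troubleshooting notes can be guarded against with a small preflight helper. The following is a minimal sketch, not part of LM Studio or Claude Code: the function names `normalize_base_url`, `messages_endpoint`, and `claude_code_env` are hypothetical, and the only facts it takes from the article are the `/v1/messages` path, the default `http://localhost:1234` address, and the requirement that `ANTHROPIC_AUTH_TOKEN` be non-empty.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(raw: str) -> str:
    """Return the base URL without trailing slashes, query, or fragment."""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {raw!r}")
    # Dropping trailing slashes avoids '//' when a client appends a path.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def messages_endpoint(base_url: str) -> str:
    """Build the Anthropic-compatible endpoint URL described in the article."""
    return normalize_base_url(base_url) + "/v1/messages"


def claude_code_env(base_url: str, token: str = "lmstudio") -> dict[str, str]:
    """Return the two environment variables the article sets for Claude Code."""
    if not token:
        # The article notes the token may be any value but must not be empty.
        raise ValueError("ANTHROPIC_AUTH_TOKEN must not be empty")
    return {
        "ANTHROPIC_BASE_URL": normalize_base_url(base_url),
        "ANTHROPIC_AUTH_TOKEN": token,
    }


if __name__ == "__main__":
    # Hypothetical usage: clean up whatever is currently exported.
    current = os.environ.get("ANTHROPIC_BASE_URL", "http://localhost:1234/")
    for name, value in claude_code_env(current).items():
        print(f"export {name}={value}")
    print("Requests will go to:", messages_endpoint(current))
```

Running the script prints `export` lines that can be pasted into a shell. It only cleans up the URL and does not check whether the server is reachable, so `lms status` remains the way to confirm that.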
