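Why this note exists

This note collects my experience installing the Docker version of Kuwa GenAI OS, first on an Apple MacBook Pro (2020) and then on a base-model Apple Mac mini M4, and shows how to combine it with mlx-whisper to squeeze the most performance out of Apple silicon.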

Installation

Download the full source code

Open a terminal and run the following commands:

git clone https://github.com/kuwaai/genai-os.git
cd genai-os/docker

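Adjust the settings

Model configuration file

In the compose directory, add or adjust the configuration files for the models you want to use. For example, copy ollama.yaml, rename the copy to ollama-taide.yaml, and change its contents as follows: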

services:
  ollama-taide-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: ollama
      EXECUTOR_ACCESS_CODE: ollama-taide
      EXECUTOR_NAME: TAIDE
      EXECUTOR_IMAGE: TAIDE.png # Refer to src/multi-chat/public/images
    depends_on:
      - executor-builder
      - kernel
      - multi-chat
    command: [
      "--ollama_host", "host.docker.internal:11434", # The ollama server on host
      "--model", "Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit:latest",
      "--system_prompt","你是一個來自台灣的AI助理，你的名字是TAIDE，樂於以台灣人的立場幫助使用者，會用繁體中文回答問題。"
    ]
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
    networks: ["backend"]

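After that, Kuwa can talk to the Ollama instance installed directly on the Mac (for Ollama itself, see my other note, "Kuwa GenAI OS Windows portable version tuning notes: Ollama").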
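
Before starting the executor, it can help to confirm that the Ollama server on the host really has the model named in `--model`. The sketch below is not part of Kuwa. It assumes Ollama's `/api/tags` endpoint on the default port 11434, queried from the host itself, and the helper names `model_names` and `fetch_tags` are made up for illustration.

```python
import json
import urllib.request

# Model tag copied from the compose file above.
MODEL = "Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit:latest"


def model_names(tags: dict) -> list:
    """Extract the model names from a parsed /api/tags response."""
    return [m.get("name", "") for m in tags.get("models", [])]


def fetch_tags(host: str = "127.0.0.1:11434") -> dict:
    """Ask a running Ollama server which models it has (hypothetical helper)."""
    with urllib.request.urlopen(f"http://{host}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
    except (OSError, ValueError) as exc:
        # Connection errors are OSError subclasses; bad JSON raises ValueError.
        print(f"Could not query Ollama: {exc}")
    else:
        if MODEL in names:
            print("model found")
        else:
            print(f"{MODEL} not found; available: {names}")
```

If the model is missing, the name in the compose file probably does not match the tag shown by `ollama list`.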

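Add the compose files to start

Copy run.sh.sample in the docker directory, rename the copy to run.sh, and change the contents of confs=() as follows. Entries you do not need can be commented out with #.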

#!/bin/bash

# Define the configuration array
confs=(
  "base"
  # "dev" # Increase the verbosity for debug
  "pgsql"
  "copycat"
  "sysinfo"
  "pipe"
  "uploader"
  "token_counter"
  "gemini"
  #"chatgpt"
  #"dall-e"
  "llama-3.3-70b-groq"
  "docqa"
  "searchqa"
  "ollama-taide"
)

# Append "-f" before each element
for i in "${confs[@]}"; do
  new_confs+=("-f" "compose/${i}.yaml" )
done

# Join the elements with white space
joined_confs=$(echo "${new_confs[@]}" | tr ' ' '\n' | paste -sd' ' -)

subcommand="${@:-up --remove-orphans}"
command="docker compose --env-file ./.env ${joined_confs} ${subcommand}"

echo "Command: ${command}"
bash -c "${command}"
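
To make it clearer what the script hands to Docker, here is a rough Python equivalent of the loop above. It is only an illustration of the expansion, not something Kuwa ships; the function names `compose_args` and `build_command` are hypothetical.

```python
def compose_args(confs):
    """Turn each configuration name into a '-f compose/<name>.yaml' pair."""
    args = []
    for name in confs:
        args += ["-f", f"compose/{name}.yaml"]
    return args


def build_command(confs, subcommand="up --remove-orphans"):
    """Assemble the command line that run.sh prints and then executes."""
    parts = ["docker", "compose", "--env-file", "./.env", *compose_args(confs)]
    return " ".join(parts) + " " + subcommand


if __name__ == "__main__":
    # Default subcommand, as when run.sh is called without arguments.
    print(build_command(["base", "pgsql", "ollama-taide"]))
    # Same list with an explicit subcommand, as in "./run.sh build".
    print(build_command(["base", "pgsql", "ollama-taide"], "build"))
```

Each name in confs must therefore match a file compose/<name>.yaml, which is why the new ollama-taide.yaml is listed as "ollama-taide".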

Change the default language

Edit the app.env configuration file in the multi-chat directory. Change:


locale=${DEFAULT_LOCALE:-en_us}
fallback_locale=${FALLBACK_LOCALE:-en_us}

to:


locale="zh_tw"
fallback_locale="zh_tw"

This prevents the multi-chat build from failing because DEFAULT_LOCALE cannot be read during the build.
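
The original lines use the shell-style `${VAR:-default}` form, which falls back to the default when the variable is unset or empty. The sketch below mimics that rule in Python with a hypothetical helper, `expand_default`. How app.env is actually expanded during the Kuwa build is not modeled here; the sketch only shows what the notation means.

```python
import os


def expand_default(name, default, env=None):
    """Mimic ${NAME:-default}: use the default when NAME is unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    return value if value else default


if __name__ == "__main__":
    # Variable not set at all: the default is used.
    print(expand_default("DEFAULT_LOCALE", "en_us", env={}))
    # Variable set: its value wins.
    print(expand_default("DEFAULT_LOCALE", "en_us", env={"DEFAULT_LOCALE": "zh_tw"}))
```

Hard-coding "zh_tw" in app.env removes the dependency on the variable altogether.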

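Adjust the timeout (default: 120 seconds)

For how to adjust it, see my other note, "Kuwa GenAI OS Windows portable version tuning notes: Timeout adjustment (120 to 1200)".

Set the administrator credentials, database password, and related environment variables

See the Docker installation guide on GitHub, section "Change the configuration files".

Build the Docker images from source

Open a terminal and run the following commands: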

cd genai-os/docker
./run.sh build

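Then wait a while (possibly a few hours).
Once you see output similar to the figure below, open a browser and go to 127.0.0.1, or to the Mac's IP address on the local network, and log in.

To stop, go back to the terminal and press Control+C.
To start it again from the terminal later, run the following commands: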

cd genai-os/docker
./run.sh

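Build an mlx-whisper API on the Mac mini M4

Kuwa GenAI OS uses faster-whisper, which can only run on the CPU inside Docker on macOS, so the next problem you run into is timeouts.
After repeatedly asking ChatGPT and Perplexity how to let Whisper use the Mac's GPU, I ended up simply searching Google and found MLX (ml-explore), an AI framework built specifically for Apple silicon.

I used ChatGPT to generate the mlx-whisper API code and then tweaked it (minor adjustments…). I will skip the pitfalls along the way, though they were all good experience. In the final version, a recording is uploaded through the Kuwa GenAI OS chat room, its URL is passed to the mlx-whisper API for speech-to-text, and the transcript is then returned to the Kuwa GenAI OS chat room for output.

mlx-whisper

In the terminal, switch to the mlx virtual environment created in Anaconda.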

source activate mlx

Install the packages required to run mlx-whisper (only needs to be done once).

conda install -c conda-forge mlx
conda install -c conda-forge ffmpeg
pip install mlx-whisper
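
After installing, a quick sanity check of the environment can save a confusing error later. The sketch below only checks that the modules can be found and that an `ffmpeg` executable is on the PATH (as far as I know, mlx-whisper shells out to ffmpeg to decode audio, but treat that as an assumption). The helper name `missing_modules` is hypothetical.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # An empty list means both packages are importable from the active environment.
    print("missing modules:", missing_modules(["mlx", "mlx_whisper"]))
    print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```

Run it inside the activated mlx environment; running it from another environment will report the packages as missing even if they are installed.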

mlx-whisper API

Install the packages required to run the API (only needs to be done once).

pip install fastapi uvicorn

Save the following mlx-whisper API source code as a Python file (mlx-whisper-url.py).

#!/usr/bin/env python
# coding: utf-8

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from fastapi.responses import PlainTextResponse
import requests
import mlx_whisper
import os
import shutil

app = FastAPI()

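# Load the Whisper model
# (Not needed here: mlx_whisper.transcribe() loads the model named by path_or_hf_repo.)
# MODEL_SIZE = "base"
# model = whisper.load_model(MODEL_SIZE)


class AudioRequest(BaseModel):
    url: str  # URL of the input audio file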


@app.post("/transcribe-url/")
async def transcribe_audio_url(request: AudioRequest):
    try:
        # Download the audio file
        audio_url = request.url
        response = requests.get(audio_url, stream=True)
        if response.status_code != 200:
            raise HTTPException(status_code=400, detail="Failed to download the audio file from the URL.")

        # Determine the file format from the Content-Type header and save the file
        temp_dir = "temp_audio"
        os.makedirs(temp_dir, exist_ok=True)
        file_path = os.path.join(temp_dir, "temp_audio_file")
        content_type = response.headers.get("Content-Type", "")
        if "audio/mpeg" in content_type:
            file_path += ".mp3"
        elif "audio/wav" in content_type:
            file_path += ".wav"
        elif "audio/mp4" in content_type or "audio/x-m4a" in content_type:
            file_path += ".m4a"
        elif "audio/aac" in content_type:
            file_path += ".aac"
        else:
            raise HTTPException(status_code=400, detail="Unsupported audio format from the URL.")

        with open(file_path, "wb") as f:
            shutil.copyfileobj(response.raw, f)

        # Transcribe with the mlx_whisper model
        result = mlx_whisper.transcribe(
            file_path,
            path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
            # word_timestamps=True,
            task="transcribe",
            language="zh",
            initial_prompt='加入標點符號。',  # Chinese for "Add punctuation."
        )
        os.remove(file_path)

        # Format the transcript as "[start - end] text" lines
        transcript = result["segments"]
        formatted_transcript = []
        for segment in transcript:
            start = format_timestamp(segment["start"])
            end = format_timestamp(segment["end"])
            text = segment["text"].strip()
            formatted_transcript.append(f"[{start} - {end}] {text}")

        # Return the formatted text
        return PlainTextResponse("\n".join(formatted_transcript))

    except HTTPException:
        # Let the 400 errors raised above pass through instead of becoming 500s
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


def format_timestamp(seconds):
    """
    Format a timestamp given in seconds as HH:MM:SS.
    """
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"
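
The chain of `if`/`elif` checks on `Content-Type` could also be written as a lookup table. This is only a sketch of an alternative, not the code the API above uses; the name `extension_for` is hypothetical, and `format_timestamp` is repeated here so the block runs on its own.

```python
# Substring of the Content-Type header -> file extension (same pairs as the API above).
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/mp4": ".m4a",
    "audio/x-m4a": ".m4a",
    "audio/aac": ".aac",
}


def extension_for(content_type):
    """Return the extension for a Content-Type header, or None if unsupported."""
    for marker, ext in EXTENSIONS.items():
        if marker in content_type:
            return ext
    return None


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (same logic as the API above)."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    secs = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{secs:02}"


if __name__ == "__main__":
    print(extension_for("audio/mpeg; charset=binary"))
    print(extension_for("video/mp4"))
    print(format_timestamp(3725.4))
```

With a table, supporting another format is a one-line change, and an unsupported type is signalled by `None` so the caller decides which HTTP error to raise.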

Run the following command in the terminal to start the API.

uvicorn mlx-whisper-url:app --host 127.0.0.1 --port 8000
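
FastAPI publishes an OpenAPI document at /openapi.json by default, so one quick way to confirm that the server started is to check that the route is listed there. The sketch below assumes the host and port from the command above; `has_route` and `fetch_spec` are hypothetical helper names.

```python
import json
import urllib.request


def has_route(spec, path="/transcribe-url/"):
    """Check whether a parsed OpenAPI document lists the given path."""
    return path in spec.get("paths", {})


def fetch_spec(base_url="http://127.0.0.1:8000"):
    """Download the OpenAPI document from a running FastAPI server."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print("route registered:", has_route(fetch_spec()))
    except (OSError, ValueError) as exc:
        # Connection failures are OSError subclasses; bad JSON raises ValueError.
        print(f"API not reachable: {exc}")
```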

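Verify with Python on Kuwa GenAI OS

I used ChatGPT to generate the client code and verified it with Python on Kuwa GenAI OS. If it produces a transcript as expected, the approach works on Kuwa GenAI OS and you can move on to the next step.

Create a Kuwa GenAI OS Tool

Upload the Python file (.py) with Upload Tool to create the Tool and its corresponding Bot.
The Python code for the Kuwa OS client is as follows: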

#!/usr/bin/env python
# coding: utf-8

import requests
import fileinput
from urllib.parse import urlparse  # needed by is_url() below

def transcribe_url(audio_url):
    url = "http://localhost:8000/transcribe-url/"
    # audio_url = "http://localhost/storage/pdfs/1/%E7%BE%85%E6%96%AF%E7%A6%8F%E8%B7%AF%E4%B8%89%E6%AE%B5242%E8%99%9F%2037-.m4a"

    payload = {"url": audio_url}
    response = requests.post(url, json=payload)

    if response.status_code == 200:
        print("逐字稿結果：")
        print(response.text)
    else:
        print(f"失敗：{response.status_code}")
        print(response.json())


def is_url(input_line):
    """
    Checks if the input line is a valid URL.
    
    Args:
        input_line: The input line to be checked.
    
    Returns:
        True if the input line is a valid URL, False otherwise.
    """
    try:
        result = urlparse(input_line)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False

if __name__ == "__main__":
    for line in fileinput.input():
        line = line.strip()
        print(str(line))
        # Optional guard: skip lines that are not URLs.
        # if not is_url(line):
        #     print(line)
        #     continue
        transcribe_url(line)

Adjust the Bot settings

Change the model configuration file as follows (the .py file name here must match the one just uploaded with Upload Tool):


PARAMETER pipe_program python
PARAMETER pipe_argv 'mlx-whisper-Kuwa1.py'
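
The client script reads its input with `fileinput`, so it can be tried outside Kuwa by piping a line into it. The sketch below assumes, based only on that `fileinput` usage, that the chat message reaches the script on standard input. `run_tool` is a hypothetical helper, and the URL in the demo is a placeholder.

```python
import subprocess
import sys


def run_tool(argv, message, timeout=1200.0):
    """Run a tool script, feed it one message on stdin, and return its stdout."""
    result = subprocess.run(
        argv,
        input=message + "\n",
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in tool that echoes stdin. Swap in [sys.executable, "mlx-whisper-Kuwa1.py"]
    # and a real audio URL once the API is running.
    echo = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip())"]
    print(run_tool(echo, "http://localhost/storage/example.m4a"))
```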

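Demo

Connect a phone or laptop to the wired or wireless local network and enter the IP address of the Apple Mac mini M4. With no display attached, the machine draws only about 1 W with the services idle; with HDMI connected, still only about 3 W; while running generative AI, roughly 30 W.
Upload a recording from the phone and let mlx-whisper on the Apple Mac mini M4 convert speech to text.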
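
To put those wattages in perspective, energy per day is watts × hours ÷ 1000. The power figures are the ones quoted above; running at each level for a full 24 hours is an assumption made only for illustration, and `kwh_per_day` is a hypothetical helper.

```python
def kwh_per_day(watts, hours=24.0):
    """Convert a constant power draw in watts to kilowatt-hours over the given hours."""
    return watts * hours / 1000.0


if __name__ == "__main__":
    readings = [
        ("idle, no display", 1),
        ("idle, HDMI connected", 3),
        ("generative AI running", 30),
    ]
    for label, watts in readings:
        print(f"{label}: {kwh_per_day(watts):.3f} kWh/day")
```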

Resource usage while mlx-whisper is running:

The transcript produced by mlx-whisper, as output in Kuwa GenAI OS.

References

陳伶志, "Kuwa AI installation notes" (Kuwa AI 安裝過程筆記) https://hackmd.io/@cclljj/r1mIc3tNR
Kuwa AI website https://kuwaai.tw/zh-Hant/os/intro
Kuwa GenAI OS https://github.com/kuwaai/genai-os
mlx (ml-explore) https://github.com/ml-explore/mlx
mlx package description https://pypi.org/project/mlx/
mlx-whisper package description https://pypi.org/project/mlx-whisper/