
Why this note exists

This note collects my experience installing the Docker version of Kuwa GenAI OS, on machines ranging from an Apple MacBook Pro (2020) to a base-model Apple Mac mini M4, and how to combine it with mlx-whisper to squeeze performance out of Apple silicon.

Installation

Download the source code

Open Terminal and enter the following commands:

git clone https://github.com/kuwaai/genai-os.git
cd genai-os/docker

Adjust the settings

Model configuration files


In the compose directory, add or adjust the configuration files for the models you want to use. For example, copy ollama.yaml, rename it to ollama-taide.yaml, and change its contents as follows:

services:
  ollama-taide-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: ollama
      EXECUTOR_ACCESS_CODE: ollama-taide
      EXECUTOR_NAME: TAIDE
      EXECUTOR_IMAGE: TAIDE.png # Refer to src/multi-chat/public/images
    depends_on:
      - executor-builder
      - kernel
      - multi-chat
    command: [
      "--ollama_host", "host.docker.internal:11434", # The ollama server on host
      "--model", "Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit:latest",
      "--system_prompt","你是一個來自台灣的AI助理,你的名字是TAIDE,樂於以台灣人的立場幫助使用者,會用繁體中文回答問題。"
    ]
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: unless-stopped
    networks: ["backend"]

Kuwa can then talk to the ollama instance installed directly on the Mac (for more on Ollama, see my other note, Kuwa GenAI OS Windows portable edition tuning notes - Ollama).
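Before bringing the stack up, it helps to confirm that the host's ollama server is reachable and that the TAIDE model has already been pulled. A minimal sketch (assuming ollama listens on its default port 11434 and the model tag matches the compose file above):

# Minimal check: list the models known to the local ollama server via its
# /api/tags endpoint and confirm the TAIDE model from ollama-taide.yaml exists.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print(models)
assert "Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit:latest" in models, "model not pulled yet"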

Add the compose files to launch

Copy run.sh.sample in the docker directory, rename it to run.sh, and change the contents of confs=() as follows; entries you don't need can be commented out with #. An example of the command the script assembles follows the listing.

#!/bin/bash

# Define the configuration array
confs=(
  "base"
  # "dev" # Increase the verbosity for debug
  "pgsql"
  "copycat"
  "sysinfo"
  "pipe"
  "uploader"
  "token_counter"
  "gemini"
  #"chatgpt"
  #"dall-e"
  "llama-3.3-70b-groq"
  "docqa"
  "searchqa"
  "ollama-taide"
)

# Append "-f" before each element
for i in "${confs[@]}"; do
  new_confs+=("-f" "compose/${i}.yaml" )
done

# Join the elements with white space
joined_confs=$(echo "${new_confs[@]}" | tr ' ' '\n' | paste -sd' ' -)

subcommand="${@:-up --remove-orphans}"
command="docker compose --env-file ./.env ${joined_confs} ${subcommand}"

echo "Command: ${command}"
bash -c "${command}"
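For reference, with the confs array above and no arguments (so the subcommand defaults to up --remove-orphans), the script echoes and then runs a command of the form:

Command: docker compose --env-file ./.env -f compose/base.yaml -f compose/pgsql.yaml -f compose/copycat.yaml -f compose/sysinfo.yaml -f compose/pipe.yaml -f compose/uploader.yaml -f compose/token_counter.yaml -f compose/gemini.yaml -f compose/llama-3.3-70b-groq.yaml -f compose/docqa.yaml -f compose/searchqa.yaml -f compose/ollama-taide.yaml up --remove-orphans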

Change the default language


Edit the app.env configuration file in the multi-chat directory:

locale=${DEFAULT_LOCALE:-en_us}
fallback_locale=${FALLBACK_LOCALE:-en_us}

and change it to:

locale="zh_tw" fallback_locale="zh_tw"

This prevents the multi-chat build from failing when DEFAULT_LOCALE cannot be read during the build.

Adjust the timeout (default 120 seconds)

See my other note for how to adjust it: Kuwa GenAI OS Windows portable edition tuning notes - increasing the timeout from 120 to 1200 seconds.

Set the administrator credentials, database password, and related environment variables


Refer to the Docker installation guide on GitHub, section on changing the configuration files.

Build the Docker images from source

Open Terminal and enter the following commands:

cd genai-os/docker
./run.sh build

Then wait a while (possibly several hours).
Once you see output similar to the screenshot below, open a browser and enter 127.0.0.1, or the Mac's IP address on the local network, to log in.


To stop the services, return to Terminal and press control+C.
To start them again later from Terminal, enter:

cd genai-os/docker
./run.sh

Building the mlx-whisper API on the Mac mini M4

Kuwa GenAI OS uses faster-whisper, which can only run on the CPU under Docker on macOS, and then you run into the timeout.
After repeatedly asking ChatGPT and Perplexity how to get Whisper to use the Mac's GPU, a Google search finally turned up MLX (ml-explore), Apple's machine-learning framework built for Apple silicon.

I had ChatGPT generate the mlx-whisper API code and fine-tuned it slightly. The final version (I'll spare you the pitfalls along the way, though they were all valuable experience) works like this: upload a recording in the Kuwa GenAI OS chat room, take the recording's URL, hand it to the mlx-whisper API for speech-to-text, then feed the transcript back to the Kuwa GenAI OS chat room as output.

mlx-whisper

In Terminal, switch to the mlx virtual environment created with Anaconda.

source activate mlx

Install the packages required to run mlx-whisper (only needed once).

conda install -c conda-forge mlx
conda install -c conda-forge ffmpeg
pip install mlx-whisper
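Before building the API, you can confirm the installation works with a minimal sketch (sample.m4a is a placeholder for any local audio file; the model repo is the same one the API below uses):

# Minimal sketch: transcribe a local audio file with mlx-whisper.
import mlx_whisper

result = mlx_whisper.transcribe(
    "sample.m4a",  # placeholder: any local audio file
    path_or_hf_repo="mlx-community/whisper-large-v3-mlx",
)
print(result["text"])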

mlx-whisper API

Install the packages required to run the API (only needed once).

pip install fastapi uvicorn

Save the following mlx-whisper API source code as a Python file (mlx-whisper-url.py).

#!/usr/bin/env python
# coding: utf-8

# pip install fastapi uvicorn

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from fastapi.responses import PlainTextResponse
import requests
import mlx_whisper
import os
import shutil

app = FastAPI()

# Load the Whisper model
# MODEL_SIZE = "base"
# model = whisper.load_model(MODEL_SIZE)

class AudioRequest(BaseModel):
    url: str  # URL of the input audio file

@app.post("/transcribe-url/")
async def transcribe_audio_url(request: AudioRequest):
    try:
        # Download the audio file
        audio_url = request.url
        response = requests.get(audio_url, stream=True)
        if response.status_code != 200:
            raise HTTPException(status_code=400, detail="Failed to download the audio file from the URL.")

        # Determine the file format and save it
        temp_dir = "temp_audio"
        os.makedirs(temp_dir, exist_ok=True)
        file_path = os.path.join(temp_dir, "temp_audio_file")
        content_type = response.headers.get("Content-Type", "")
        if "audio/mpeg" in content_type:
            file_path += ".mp3"
        elif "audio/wav" in content_type:
            file_path += ".wav"
        elif "audio/mp4" in content_type or "audio/x-m4a" in content_type:
            file_path += ".m4a"
        elif "audio/aac" in content_type:
            file_path += ".aac"
        else:
            raise HTTPException(status_code=400, detail="Unsupported audio format from the URL.")
        with open(file_path, "wb") as f:
            shutil.copyfileobj(response.raw, f)

        # Call the mlx_whisper model to transcribe
        result = mlx_whisper.transcribe(
            file_path
            , path_or_hf_repo="mlx-community/whisper-large-v3-mlx"
            # , word_timestamps=True
            , task="transcribe"
            , language="zh"
            , initial_prompt='加入標點符號。'  # "Add punctuation."
        )
        os.remove(file_path)

        # Format the transcript
        transcript = result["segments"]
        formatted_transcript = []
        for segment in transcript:
            start = format_timestamp(segment["start"])
            end = format_timestamp(segment["end"])
            text = segment["text"].strip()
            formatted_transcript.append(f"[{start} - {end}] {text}")

        # Return the formatted text
        return PlainTextResponse("\n".join(formatted_transcript))

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

def format_timestamp(seconds):
    """Format a timestamp in seconds as HH:MM:SS."""
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"

Start the API by entering the following command in Terminal.

uvicorn mlx-whisper-url:app --host 127.0.0.1 --port 8000
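Before wiring it into Kuwa, you can sanity-check the endpoint with a short request (the audio URL below is a placeholder; the full client used inside Kuwa follows in the next section):

# Minimal sketch: POST an audio file URL to the transcription endpoint.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/transcribe-url/",
    json={"url": "http://localhost/storage/pdfs/1/recording.m4a"},  # placeholder URL
)
print(resp.status_code)
print(resp.text)  # the timestamped transcript on success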

Verify with Python on Kuwa GenAI OS

I had ChatGPT generate the client-side code and verified it with Python on Kuwa GenAI OS. If it produces the expected transcript, it works on Kuwa GenAI OS and you can move on to the next step.

Create the Kuwa GenAI OS Tool

Upload the Python file (.py) with the Upload Tool to create the Tool and its corresponding Bot.
The Kuwa OS client Python code is as follows:

#!/usr/bin/env python
# coding: utf-8

import requests
import fileinput
from urllib.parse import urlparse  # needed by is_url() below

def transcribe_url(audio_url):
    url = "http://localhost:8000/transcribe-url/"
    # audio_url = "http://localhost/storage/pdfs/1/%E7%BE%85%E6%96%AF%E7%A6%8F%E8%B7%AF%E4%B8%89%E6%AE%B5242%E8%99%9F%2037-.m4a"

    payload = {"url": audio_url}
    response = requests.post(url, json=payload)

    if response.status_code == 200:
        print("逐字稿結果:")
        print(response.text)
    else:
        print(f"失敗:{response.status_code}")
        print(response.json())


def is_url(input_line):
    """
    Checks if the input line is a valid URL.
    
    Args:
        input_line: The input line to be checked.
    
    Returns:
        True if the input line is a valid URL, False otherwise.
    """
    try:
        result = urlparse(input_line)
        return all([result.scheme, result.netloc])
    except ValueError:
        return False

if __name__ == "__main__":
    for line in fileinput.input():
        line = line.strip()
        print(str(line))
        # if not is_url(line):
        #     print(line)
        #     continue  # skip lines that are not URLs ("return" is invalid at module level)
        transcribe_url(line)
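To see what Kuwa's pipe mechanism will do, you can mimic it locally: the chat input is delivered to the tool script on stdin, one line at a time. A hedged sketch (the script name and audio URL are placeholders):

# Hypothetical local check: feed one URL per line to the script's stdin,
# the same way Kuwa's pipe mechanism delivers the chat input.
import subprocess

completed = subprocess.run(
    ["python", "mlx-whisper-Kuwa1.py"],  # the script uploaded as the Tool
    input="http://localhost/storage/pdfs/1/recording.m4a\n",  # placeholder URL
    capture_output=True,
    text=True,
)
print(completed.stdout)  # should contain the timestamped transcript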

Adjust the Bot settings

Change the Bot's model configuration file as follows (the .py filename here must match the file just uploaded with the Upload Tool):

PARAMETER pipe_program python
PARAMETER pipe_argv 'mlx-whisper-Kuwa1.py'

Demo

Take a phone or laptop, connect to the wired or wireless LAN, and enter the IP address of the Apple Mac mini M4 (with no display attached, the machine draws only about 1 W idle with the services running; with HDMI connected, still only about 3 W; while running generative AI, about 30 W).
Upload a recording from the phone and let mlx-whisper on the Apple Mac mini M4 do the speech-to-text.

Resource usage while mlx-whisper is running:

The transcript produced by mlx-whisper, output in the Kuwa GenAI OS chat room.

References