BASHCAT
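# From Gibberish to Perfect: My Journey Optimizing Whisper Speech Recognition

![Where the frustration began](https://hackmd.io/_uploads/rJEP2aijle.jpg)

That afternoon I was sitting in a coffee shop, opening my laptop with high hopes to try OpenAI Whisper, which I had just heard about. As someone who regularly has to write up meeting notes, I was very curious about this supposedly "revolutionary" speech recognition tool.

I recorded a 30-minute meeting in Chinese and confidently fed it to Whisper. The moment the result came out, I nearly spat out my coffee.

The screen showed:

"Thank you for watching. Please subscribe and like. The video is about to end. Thank you for watching..."

What on earth? I was clearly speaking Chinese! Worse, that "Thank you for watching" was repeated dozens of times, like a broken tape recorder. My expression at that moment was probably as confused as if an alien had just said hello to me.

Honestly, I almost lost my mind that afternoon. I spent hours trying different audio files, and the results were all equally bad: either nonsense, or the same sentence repeated over and over, or Chinese heard as English. I started to wonder whether my Chinese pronunciation was the problem, and even asked a friend to confirm that I speak normally.

At that point I really wanted to give up. All those articles online praising how great Whisper is felt like lies. But you know how it is: an engineer's stubbornness would not let me admit defeat just like that.

## An unexpected turning point: one number changed everything

![The magic of the temperature parameter](https://hackmd.io/_uploads/H1mOnpjogl.jpg)

The turning point came just as I was about to give up. That evening I saw a developer on Reddit sharing his experience, and he mentioned a parameter called `temperature`.

To be honest, I had no idea what this parameter meant. I assumed it had something to do with actual temperature. Could Whisper overheat and crash? But that developer said that setting `temperature` to 0 fixes the hallucination problem.

With nothing left to lose, I gave it a try:

```python
import openai  # legacy SDK (openai<1.0), which provides openai.Audio

# audio_file is an audio file opened in binary mode, e.g. open(path, "rb")
transcript = openai.Audio.transcribe(
    model="whisper-1",
    file=audio_file,
    temperature=0  # the parameter that saved me!
)
```

You cannot imagine how shocked I was when I saw the result. With the same audio file, Whisper now transcribed the Chinese perfectly! No gibberish, and even the punctuation was right. I checked several times to make sure I was not dreaming.

Only later did I learn that `temperature` controls the randomness of the output. Like a piano tuner's hand, the smaller the number, the more precise the result; the larger the number, the more "creative" it gets (and the more likely it is to go wrong). With `temperature` set to 0, Whisper picks the result it is most confident in instead of guessing.

That was when it clicked for me why those repeated "Thank you for watching" lines had appeared. As I understand it, without clear guidance Whisper fills the gaps with common English sentences it has seen many times, much like a nervous student who scribbles something down in an exam when they do not know the answer.

After discovering this secret I was so excited that I stayed up all night rerunning every audio file that had failed before. The results were so good that I wondered whether I had switched to a different tool.

## Digging deeper: in the world of models, bigger is not always better

Armed with `temperature`, I started digging into Whisper's other mysteries. The discovery that surprised me most: a larger model is not necessarily better than a smaller one. (Model size is a choice you have when running the open-source models yourself; the hosted API only offers `whisper-1`.)

At first I took it for granted that since there is a large model, of course I should use the biggest one, just like buying the highest-spec phone. But in actual testing I found that the medium model was more accurate than large in many situations.

It reminded me of a joke: not every problem needs a cannon; sometimes a precise sniper rifle works better.

Later I read an explanation on a tech blog: the large model is multilingual and tries to recognize around a hundred languages, so with a little noise or an accent it may misidentify the language and start producing nonsense, while the medium model is more focused and therefore more stable. I have not verified this explanation (the multilingual checkpoints of every size cover the same languages), but it matches what I observed.

This taught me an important principle: **choose the tool for the scenario instead of blindly chasing the newest and biggest**.

My current selection strategy looks like this:

- Chinese meetings with good audio quality: medium model
- Noisy audio or multi-speaker conversations: small model combined with good preprocessing
- English content: large is worth considering, but set `language="en"`

![The magic of audio processing](https://hackmd.io/_uploads/rJxKhaojxg.jpg)

Speaking of preprocessing, this was another discovery that opened my eyes. Audio quality affects the result far more than I expected!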
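Before converting anything, it can help to check what a recording actually looks like. The following is only a minimal sketch: it assumes the input is a WAV file, uses Python's standard `wave` module, and treats 16 kHz mono as the target. The function name `needs_preprocessing` and its default values are illustrative assumptions, not part of Whisper or of any API.

```python
import wave


def needs_preprocessing(path, target_rate=16000, target_channels=1):
    """Return True if a WAV file differs from the target sample rate or channel count.

    The 16 kHz mono defaults are assumptions for illustration; adjust them
    to whatever your own pipeline expects.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()      # samples per second
        channels = wav.getnchannels()  # 1 = mono, 2 = stereo
    return rate != target_rate or channels != target_channels
```

A 48 kHz stereo recording would be flagged for conversion, while a file that is already 16 kHz mono could be sent as it is.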
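I remember processing a client's meeting recording once. The original file was a 48 kHz WAV, and it was frighteningly large. Whisper handled it slowly and the results were mediocre. Then I used ffmpeg to bring the sample rate down to 16 kHz and convert it to MP3. Not only did it run about twice as fast, the accuracy actually improved!

```bash
# My standard preprocessing step now: 16 kHz, mono, 16 kbps MP3
ffmpeg -i input.wav -ar 16000 -ac 1 -b:a 16k output.mp3
```

This experience taught me that sometimes a "downgrade" is actually an "upgrade". Whisper works on 16 kHz audio internally, so giving it a higher sample rate is simply wasted, like serving an imperial banquet to someone used to home cooking: they will not necessarily be happier.

Another small tip: always set the `language` parameter. I used to assume Whisper would detect the language automatically, but it often mistook Chinese for other languages. Now I always specify it explicitly:

```python
transcript = openai.Audio.transcribe(
    model="whisper-1",
    file=audio_file,
    language="zh",  # tell it explicitly that this is Chinese
    # Prompt: "Please transcribe into Traditional Chinese with appropriate punctuation."
    prompt="請將結果轉錄為繁體中文,包含適當的標點符號。"
)
```

The `prompt` parameter is also remarkable. It is like giving Whisper a clear work order that tells it what kind of output you expect. I found that longer prompts work better than short ones, possibly because they provide more context.

## Advanced tricks: optimization strategies that save money and still work

![The joy of success](https://hackmd.io/_uploads/H16H2ajoee.jpg)

Once I had the basics down, I started paying attention to cost control. Honestly, the API is not very expensive, but it adds up.

The official price is USD 0.006 per minute, but in my actual usage it worked out to roughly USD 0.01 per minute. If you regularly process long meeting recordings like I do, that is a noticeable amount over a month.

My main money-saving tricks are these:

**1. Preprocess audio to cut costs**

Lowering the audio quality moderately reduces upload time and the overall overhead of each job. In my tests, a 16 kbps MP3 gave about the same results as a 128 kbps one, while the file was nearly 8 times smaller.

**2. Smart segmentation**

For long audio files I first use VAD (voice activity detection) to remove the silent parts before sending them for processing. Once I handled a 2-hour meeting, and after removing the breaks in the middle only 45 minutes actually needed to be processed.

**3. Batch processing**

I wrote a small tool that processes multiple files and estimates the cost first:

```python
from pydub import AudioSegment  # third-party: pip install pydub


def estimate_cost(audio_path):
    audio = AudioSegment.from_file(audio_path)
    duration_minutes = len(audio) / (1000 * 60)  # len() is in milliseconds
    estimated_cost = duration_minutes * 0.01  # my observed rate, USD per minute
    return estimated_cost


def batch_process_with_budget(files, max_budget=10.0):
    total_cost = 0
    results = []

    for file_path in files:
        cost = estimate_cost(file_path)
        if total_cost + cost > max_budget:
            print(f"Over budget, skipping {file_path}")
            continue

        result = process_audio(file_path)  # my own transcription helper (not shown)
        results.append(result)
        total_cost += cost

        print(f"Processed {file_path}, running total: ${total_cost:.2f}")

    return results
```

**Configuration notes for different scenarios**

After a lot of real-world use, I settled on the following configurations for different scenarios:

**Meeting notes mode:**

```python
config = {
    "model": "whisper-1",
    "temperature": 0,
    "language": "zh",
    # Prompt: "This is a recording of a business meeting with several people
    # talking. Transcribe each speaker accurately and keep professional wording."
    "prompt": "這是一場商務會議的錄音,包含多人討論。請準確轉錄每個發言,保持專業用詞。"
}
```

**Podcast transcription mode:**

```python
config = {
    "model": "whisper-1",
    "temperature": 0.1,  # slightly higher to keep it natural
    "language": "zh",
    # Prompt: "This is a podcast with a relaxed, conversational tone. Keep the
    # natural feel of the speech, including appropriate filler words."
    "prompt": "這是一個播客節目,語調輕鬆口語化。請保持說話的自然感,包含適當的語氣詞。",
    "response_format": "srt"  # when timestamps are needed
}
```

**Customer service call mode:**

```python
config = {
    "model": "whisper-1",
    "temperature": 0,
    "language": "zh",
    # Prompt: "Customer service call recording that may contain technical terms
    # and customer information. Pay special attention to numbers and proper nouns."
    "prompt": "客服電話錄音,可能包含專業術語和客戶資訊。請特別注意數字和專有名詞的準確性。"
}
```

I also noticed something interesting. If the audio has background music or noise, adding a specific prompt can work wonders:

```python
# Prompt: "The following recording contains background music. Transcribe only
# the spoken dialogue and ignore the music and noise."
prompt = "以下是包含背景音樂的錄音,請只轉錄人聲對話部分,忽略音樂和雜音。"
```

With this setting, Whisper really does seem to focus more on the voices and is less distracted by background sound.

## Practical guide: what I would do if I started over

Looking back, if someone had told me these tricks at the beginning, I could have avoided a lot of detours. So I want to turn this experience into a practical checklist, and I hope it helps you.

**5 settings every beginner should apply:**

1. **Always set temperature=0**
   This is the most important one. In my experience it resolves about 90% of the gibberish problems.

2. **Specify the language explicitly**
   Do not make Whisper guess. Tell it directly which language you are speaking.

3. **Use a descriptive prompt**
   Tell Whisper what kind of recording this is and what kind of output you expect.

4. **Preprocess the audio**
   Downsample to 16 kHz, convert to mono, and compress moderately.

5. **Pick the right model size**
   Do not blindly go for large; medium is often more stable.

**Advanced optimization checklist:**

- [ ] Estimate the cost and set a budget cap
- [ ] Use VAD to remove silent segments
- [ ] Tailor the prompt to the scenario
- [ ] Set a suitable response_format
- [ ] Implement a retry mechanism for API failures
- [ ] Process in batches for efficiency
- [ ] Check output quality regularly

**Quick troubleshooting reference:**

- Repeated content → check that temperature is set to 0
- Wrong language detected → set the language parameter explicitly
- Transcript too short → the audio file may be faulty, or the VAD may be too aggressive
- Proper nouns wrong → provide context in the prompt
- Processing too slow → check the audio file size and format

One last thing: mastering Whisper is a lot like learning to ride a bicycle. You may fall hard at first, but once you find your balance it feels easy.

Now, whenever a friend asks me for Whisper tips, I tell them: "Don't rush for perfection. Get it working reliably first, then optimize step by step."

I hope my experience helps you avoid some detours. If you have similar optimization tips, feel free to share them with me, and let's make speech recognition easier to use together!

Remember, behind every seemingly complex technology there is a simple principle. As long as you are willing to spend time understanding and experimenting, you will find a solution that suits you.

---

*This article is based on the author's hands-on experience. All code and configurations have been verified in real projects. Questions and discussion are welcome.*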
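The checklist item about retrying failed API calls could be sketched roughly as follows. This is not the author's implementation: `with_retries` and its parameters are hypothetical names chosen for illustration, and the exponential backoff schedule (1 s, 2 s, 4 s, ...) is just one common choice.

```python
import time


def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry with exponential backoff if it raises.

    Hypothetical helper for illustration. The wait before retry number n
    (counting from 0) is base_delay * 2**n seconds. The last failure is
    re-raised so the caller can still see what went wrong.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

It would wrap any call that may fail transiently, for example `with_retries(lambda: process_audio(path))`, where `process_audio` stands for whatever transcription helper you use. In a real project you would probably catch only the specific network or rate-limit exceptions of your client library rather than every `Exception`.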
