Let's Start Playing
This time the Engineering Department's Service Section asked for speech-to-text: they want to try producing meeting minutes quickly. I have tried the Speech-To-Text services of the big cloud vendors before (Azure, IBM, Google…), and I'll keep my impressions of those to myself.
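This time I went straight to Whisper Desktop, an existing tool on GitHub that runs offline, to try out the pre-trained Whisper models that OpenAI has released and that are all the rage right now.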
Whisper: Overview and Evaluation
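OpenAI Whisper comes in five model sizes. The larger models are noticeably more accurate but use more resources and run more slowly. Every size except large also has an English-only variant, which gives better results on English audio.
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of audio collected from the web, covering many languages and accents.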
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
| --- | --- | --- | --- | --- | --- |
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
Table source: OpenAI Whisper GitHub
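To make the table concrete, here is a minimal Python sketch that picks the largest model whose listed VRAM requirement fits a given card. `MODELS` and `pick_model` are hypothetical names; the numbers are copied from the table. Treat the VRAM column as a rough guide only: it describes OpenAI's own implementation, and in the test runs in this post the ggml large model ran on a GTX 1050, a card with far less than 10 GB of VRAM.

```python
# (name, parameters in millions, approx. required VRAM in GB, speed relative to large)
# Values copied from OpenAI's table; actual needs depend on the implementation.
MODELS = [
    ("tiny", 39, 1, 32),
    ("base", 74, 1, 16),
    ("small", 244, 2, 6),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
]


def pick_model(vram_gb: float, english_only: bool = False) -> str:
    """Return the largest listed model whose approximate VRAM need fits in vram_gb."""
    best = None
    for name, _params, need_gb, _speed in MODELS:  # ordered smallest to largest
        if need_gb <= vram_gb:
            best = name
    if best is None:
        raise ValueError(f"no listed Whisper model fits in {vram_gb} GB of VRAM")
    # The table lists English-only variants for every size except large.
    if english_only and best != "large":
        return best + ".en"
    return best


print(pick_model(6))                     # medium
print(pick_model(4, english_only=True))  # small.en
```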
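Whisper's recognition accuracy varies widely from language to language. OpenAI's per-language breakdown of WER (word error rate; lower is better) shows excellent results for English, while Chinese has an error rate of about 14.7%.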
    
    
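WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the recognized text into the reference, divided by the number of reference words N: WER = (S + D + I) / N. The Python sketch below computes it with a word-level edit distance; `error_rate` is a hypothetical helper, not part of Whisper. Languages written without spaces, such as Chinese, are usually scored per character instead (character error rate, CER), which the `by_char` flag approximates. OpenAI's published numbers also run a text normalizer before scoring, which this sketch skips.

```python
def error_rate(reference: str, hypothesis: str, by_char: bool = False) -> float:
    """Edit-distance error rate: WER by default, CER-style when by_char=True."""
    if by_char:
        # Compare character by character, ignoring spaces (common for Chinese).
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    else:
        ref = reference.split()
        hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")

    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


print(round(error_rate("the cat sat on the mat", "the cat sat on a mat"), 3))  # 0.167
print(error_rate("使用泛型", "使用泛行", by_char=True))  # 0.25
```

Because insertions are counted, the rate can exceed 1 when the output is much longer than the reference.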
     
   
Installation & Running
Here are my notes from installing and running Whisper on Windows:
- Go to the GitHub page for "Whisper Desktop", find the download link for the latest version under "Releases" on the right, unzip the download, and run "WhisperDesktop" from inside it.
- Download a Whisper model file; I recommend ggml-medium.bin.
- Run WhisperDesktop and select the model file.
- On the transcription screen, click Transcribe to start.
Test Runs
Environment 1: integrated graphics only (Intel® UHD Graphics 630)
- Model: large, ggml-large.bin
- Time taken: 3 hours
Environment 2: NVIDIA GTX 1050
- Model: large, ggml-large.bin
- Time taken: 20 min 16 s
Environment 3: NVIDIA GTX 1050
- Model: medium, ggml-medium.bin
- Time taken: 4 min 54 s
Results:
Preliminary tests
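- With the large model, integrated graphics (Environment 1) vs. the GTX 1050 (Environment 2) differ in speed by about 9x.
- On the GTX 1050, the large and medium models showed no obvious difference in recognition quality, but medium finished about 4x faster (20 min 16 s vs. 4 min 54 s).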
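The speed ratios can be checked against the raw timings with a few lines of Python. The dictionary keys and the `speedup` helper are illustrative names; the 3-hour figure for Environment 1 was reported as a round number, so that ratio is approximate.

```python
# Wall-clock times from the three test environments, in seconds.
TIMINGS_S = {
    "env1_uhd630_large": 3 * 3600,       # reported as "3 hours" (a round figure)
    "env2_gtx1050_large": 20 * 60 + 16,  # 20 min 16 s
    "env3_gtx1050_medium": 4 * 60 + 54,  # 4 min 54 s
}


def speedup(slow: str, fast: str) -> float:
    """How many times faster the `fast` run finished than the `slow` run."""
    return TIMINGS_S[slow] / TIMINGS_S[fast]


print(f"{speedup('env1_uhd630_large', 'env2_gtx1050_large'):.1f}x")    # 8.9x
print(f"{speedup('env2_gtx1050_large', 'env3_gtx1050_medium'):.1f}x")  # 4.1x
```

The large/medium gap measured here (about 4.1x) is wider than the ~2x in OpenAI's table; that table describes OpenAI's own implementation, so Whisper Desktop's numbers need not match.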
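Recognition accuracy is very high (around 90%) with few typos, though technical terms and proper nouns are more error-prone.
For example: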
- 泛型 (Generic) → 泛行 (a homophone)
 
- Clone → Cleon
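Since the errors cluster on a small set of recurring terms, one possible workaround (not something tested in this post) is a find-and-replace pass over the transcript. The sketch below assumes a hand-maintained glossary; `GLOSSARY` and `apply_glossary` are hypothetical names. Blind substring replacement can also hit words that were transcribed correctly (Cleon, for instance, is a real name), so the output still needs a human read.

```python
import re

# Hypothetical glossary: known mis-transcription -> intended term.
GLOSSARY: dict[str, str] = {
    "泛行": "泛型",
    "Cleon": "Clone",
}


def apply_glossary(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Replace known mis-transcriptions in a single regex pass."""
    if not glossary:
        return text
    # Try longer keys first so a longer entry wins over a shorter key it contains.
    keys = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Single pass: replaced text is never re-scanned, so entries cannot chain.
    return pattern.sub(lambda m: glossary[m.group(0)], text)


print(apply_glossary("用泛行寫一個 Cleon 方法"))  # 用泛型寫一個 Clone 方法
```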
 
Other References