哈密蛤
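# Voice Cloning

## 2025/2/19

1. Found that YourTTS only supports older versions of Python (before 3.11), and on the Ubuntu virtual machine it likewise only supports older releases (before 20.10).
2. Next, I plan to try SV2TTS.

## 2025/2/21

1. Trying SV2TTS
    * [Chinese voice cloning and synthesis based on SV2TTS, in practice (CSDN)](https://blog.csdn.net/weixin_43811043/article/details/102600940)
    * [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558)
    * [Real-Time Voice Cloning](https://github.com/CorentinJ/Real-Time-Voice-Cloning)
2. SV2TTS consists of three modules:
    ![image](https://hackmd.io/_uploads/By2RYKBqkx.png)
    They are the Encoder, the Synthesizer, and the Vocoder. Today's research focuses on the Encoder.
3. Encoder research
    * [GENERALIZED END-TO-END LOSS FOR SPEAKER VERIFICATION](https://arxiv.org/pdf/1710.10467v1)
    * This paper traces how the encoder evolved. The most common approach used to be TE2E (tuple-based end-to-end), which simulates the two stages of speaker verification: enrollment and verification. Inputs are fed to an LSTM network as tuples. Each tuple holds 1 verification utterance and M enrollment utterances, and every utterance is converted into a feature representation (e.g. log-mel filterbank energies).
        > [name=哈密蛤 ] A tuple is a group of data items. Think of it as "a packet of data".
        > [name=哈密蛤 ] Tuples are split into positive and negative samples, depending on whether the verification and enrollment utterances come from the same speaker.
        > [name=哈密蛤 ] If the verification utterance and the enrollment utterances come from the same person, we want the similarity 𝑠 to be as large as possible.
        > [time=Fri, Feb 21, 2025 1:48 PM]
    * The paper then introduces a newer encoder, GE2E (generalized end-to-end), which is the one SV2TTS uses. The main difference from TE2E is that training moves from tuples to batches. A batch no longer holds just 1 verification utterance and M enrollment utterances. Instead it contains N speakers with M utterances each (N × M utterances in total), so a single step processes a large amount of audio at once.
    * To achieve the same effect as one GE2E update, TE2E needs at least 2(N−1) tuples for comparison (N is the number of speakers).
        > [name=哈密蛤 ] One GE2E update is equivalent to at least 2(N−1) TE2E updates.
        > [time=Fri, Feb 21, 2025 2:25 PM]
    * GE2E vs. TE2E summary:
        * In each training step, TE2E compares one utterance with one centroid and produces a single similarity value.
        * GE2E processes many utterances per step and produces a similarity matrix. Each utterance is compared against the centroids of all speakers at once, which is more thorough and more efficient.
    ```
    Text-dependent (TD-SV)
    During both enrollment and verification, the user says a fixed phrase, e.g. the same passphrase ("OK Google").

    Text-independent (TI-SV)
    There is no restriction on what the user says. Whatever the content, the system should recognize whether the voice belongs to the claimed person.
    ```
    * Paper experiment 1: text-dependent speaker verification (TD-SV) results
        * Compares TE2E, GE2E, TE2E + MultiReader, and GE2E + MultiReader by EER.
        * The two training sets differ enormously in size:
            * Dataset 1 ("OK Google"): 150 million utterances, 630,000 speakers.
            * Dataset 2 ("OK/Hey Google"): 1.2 million utterances, 18,000 speakers.
        * Evaluation uses the Equal Error Rate (EER). Lower is better.
        ![image](https://hackmd.io/_uploads/S1NoAcr91l.png)
        > [name=哈密蛤 ] MultiReader helps avoid overfitting caused by insufficient data. It does this by adding a second, larger dataset that is related but not identical, and training on both together.
        * The figure shows that both TE2E and GE2E reach a lower EER with MultiReader.
        * GE2E's average EER (2.38%) is about 10% lower than TE2E's (2.67%), and GE2E cuts training time by about 60% compared with TE2E.
    * Paper experiment 2: text-independent speaker verification (TI-SV) results
        * Setup:
            * Training data: about 36 million utterances from 18,000 speakers.
            * Evaluation data: an additional 1,000 speakers. Each has about 6.3 enrollment utterances and 7.2 verification utterances on average.
        ![image](https://hackmd.io/_uploads/B1id-sr9ye.png)
        * Results:
            1. Softmax (traditional method): EER 4.06%.
            2. TE2E (previous method): EER 3.64%, already better than Softmax.
            3. GE2E (new method): EER 3.30%, a further relative improvement of about 9% over TE2E.
            4. GE2E's training time is only about one third of the other methods'.
        * Conclusions:
            1. GE2E clearly outperforms TE2E in both text-dependent and text-independent speaker verification.
            2. MultiReader is very effective, especially with multiple keywords or highly imbalanced data.
            3. Overall, GE2E not only performs better but also trains much faster (roughly 60% or more).

## 2025/2/25

Today's research focuses on the Synthesizer.

* Synthesizer research
    * [NATURAL TTS SYNTHESIS BY CONDITIONING WAVENET ON MEL SPECTROGRAM PREDICTIONS](https://arxiv.org/pdf/1712.05884)
    * This paper studies Google's Tacotron 2 model.
    * It is divided into four main parts: Encoder, Concat, Attention, and Decoder.
    * Here the Encoder converts the input text (a character sequence) into a hidden feature representation. The Attention mechanism and the Decoder then use these features to generate the mel spectrogram.
    * Concat lets the Decoder combine the past speech state with the current text information, which improves speech quality.
    * Attention mainly performs alignment and order correction:
        * Alignment maps the acoustic features currently being generated to the relevant part of the input text.
        * Correction reduces skipped or repeated words.
        > [name=哈密蛤 ] Put more simply, it is the key module that tracks which piece of text should be pronounced (generated) at this moment.

## 2025/03/25

* Synthesizer research
    * [NATURAL TTS SYNTHESIS BY CONDITIONING WAVENET ON MEL SPECTROGRAM PREDICTIONS](https://arxiv.org/pdf/1712.05884)
    * Today's goal is to finish the remaining part of the Synthesizer: the Decoder.
    * The Decoder has three roles: it generates acoustic features, works together with Attention, and handles speaking rate and prosody. The process runs as follows:
        * It starts from the Encoder's high-level feature representation.
        * Attention adjusts which part of that representation drives the output.
        * The Decoder then generates acoustic features autoregressively, frame by frame, until the full mel spectrogram is synthesized.
        > [name=哈密蛤 ] A key point: the Decoder's output is not audio but a spectrogram. Producing the audio still requires the Vocoder that follows.
    * The complete TTS pipeline can be viewed as follows:
    ```
    [Text]
       ↓
    [Encoder + Attention + Decoder (Synthesizer)]
       ↓   ← generates mel autoregressively
    [Mel Spectrogram]
       ↓
    [Vocoder (WaveNet)]
       ↓   ← generates audio autoregressively
    [Waveform (audio)]
    ```

## 2025/03/26

* Topic: Vocoder
    * [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558)
    * The Vocoder converts the mel spectrogram output by the Synthesizer's Decoder into an audio waveform.
    * The Vocoder studied here is the autoregressive WaveNet.
    * Model architecture:
        * A WaveNet with 30 dilated convolution layers. This captures both short-term and long-term speech features while preserving high audio quality.
        * The paper explains why 30 layers are used: the authors tried a shallower model (12 layers), and audio quality dropped somewhat (lower MOS).
            > [name=哈密蛤 ] Dilated convolution layers can be thought of as "skipping points". Instead of looking at adjacent samples 1, 2, 3, the taps are spaced apart by the dilation factor. In short, this enlarges the receptive field without adding parameters.
            > [name=哈密蛤 ] MOS stands for Mean Opinion Score, a subjective rating based on how the audio sounds to human listeners.
        * The Vocoder does not receive the speaker embedding directly. It relies entirely on the synthesized mel spectrogram to generate speech.
            > [name=哈密蛤 ] A speaker embedding tells the model who is speaking. This vocoder, however, is designed without speaker-specific conditioning, which has the advantage of "train once, works for many speakers".
        * It is autoregressive, predicting the audio waveform (a time series) one sample at a time.
            > [name=哈密蛤 ] The Vocoder's autoregression differs from the Decoder's autoregression described earlier, because the two perform different conversions.

## 2025/04/05

* Topic: SV2TTS implementation
    * {%preview https://github.com/lsh950919/sv2tts %}
    * {%preview https://github.com/fatchord/WaveRNN %}
1. First, install the required packages following the repo's setup instructions:
    ```
    # Python itself cannot be installed with pip; create a Python 3.6 environment instead (e.g. with conda)
    conda create -n sv2tts python=3.6
    conda activate sv2tts
    pip install torch
    # `pip install ffmpeg` does not provide the ffmpeg binary; install it as a system package (Ubuntu)
    sudo apt install ffmpeg
    # plus the repo's big bundle of requirements
    pip install -r requirements.txt
    ```
2. Download the pretrained files the models require.
    ![image](https://hackmd.io/_uploads/S1NzbYAa1l.png)
3. After downloading, move on to the hellish demo_cli.py check.
    * I had to download each missing piece from GitHub one by one.
    * On success it looks like this:
        ![image](https://hackmd.io/_uploads/BkrSmKCaJg.png)
    * Problems encountered:
        * GPU not detected ==Not yet solved==
            ![image](https://hackmd.io/_uploads/Skx_XtRa1x.png)
        * The files on GitHub did not include the required resources ==Solved==
            * The models needed for WaveRNN in the vocoder were missing, so I had to hunt for them myself.
            * In the end I used fatchord_version.py.
    * In short, nearly everything has been downloaded.

At this point I thought everything was working. Then I realized the GitHub repo I was following is someone else's reimplementation of the original, so a lot is missing. In summary, incomplete contents plus pieces patched together from different sources caused a pile of problems. I found the original GitHub repo and will redo the implementation next time.
{%preview https://github.com/CorentinJ/Real-Time-Voice-Cloning %}

## 2025/04/07

* Topic: another attempt at running
  {%preview https://github.com/CorentinJ/Real-Time-Voice-Cloning %}
    * After a lot of exploration, demo_cli.py finally ran successfully.
    * These are the files that need to be downloaded:
        ![image](https://hackmd.io/_uploads/H1WvYB-Rkx.png)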
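Both the GE2E encoder (log-mel filterbank energies) and the Tacotron 2 synthesizer (mel spectrogram) work on the mel scale. The sketch below makes that concrete by placing filter centers evenly in mel and converting them back to Hz. It uses the HTK-style formula mel = 2595·log10(1 + f/700), which is only one of several conventions; librosa, for example, defaults to the Slaney variant. The 40 bands and the 0-8000 Hz range in the usage line are illustrative assumptions, not settings taken from these repos.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly in mel.

    n_mels + 2 evenly spaced mel points define the filter edges; the inner
    n_mels points are the centers.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    centers = mel_band_centers(40, 0.0, 8000.0)  # illustrative settings
    # Low bands are narrow and high bands are wide, mimicking human pitch perception.
    print(round(centers[1] - centers[0], 1), round(centers[-1] - centers[-2], 1))
```

Taking the log of the energy that falls into each of these bands, frame by frame, gives the log-mel filterbank energies mentioned in the GE2E notes.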
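The 2025/2/21 GE2E notes describe a similarity matrix that compares every utterance with every speaker centroid. Below is a minimal pure-Python sketch of the softmax form of the GE2E loss, assuming a batch of N speakers × M utterances (M ≥ 2). It computes S[j,i,k] = w·cos(e_ji, c_k) + b. When k is the utterance's own speaker, the centroid leaves that utterance out. The fixed values w = 10 and b = −5 and the toy 2-D embeddings are illustrative assumptions; in the paper w and b are learned, and a real encoder would use tensor operations instead of loops.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors (a speaker centroid)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def ge2e_softmax_loss(batch: List[List[Vector]], w: float = 10.0, b: float = -5.0) -> float:
    """GE2E loss (softmax variant); batch[j][i] is utterance i of speaker j."""
    if any(len(utts) < 2 for utts in batch):
        raise ValueError("each speaker needs at least 2 utterances")
    centroids = [mean(utts) for utts in batch]
    total = 0.0
    for j, utts in enumerate(batch):
        for i, e in enumerate(utts):
            # One row of the similarity matrix: utterance (j, i) vs. every centroid.
            row = []
            for k, c_k in enumerate(centroids):
                if k == j:
                    # Own speaker: exclude this utterance from its centroid.
                    c_k = mean(utts[:i] + utts[i + 1:])
                row.append(w * cosine(e, c_k) + b)
            # Softmax cross-entropy: raise S[j,i,j], lower S[j,i,k] for k != j.
            total += math.log(sum(math.exp(s) for s in row)) - row[j]
    return total


if __name__ == "__main__":
    separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
    mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
    print(ge2e_softmax_loss(separated), ge2e_softmax_loss(mixed))
```

When each speaker's utterances cluster together, the loss is close to zero. When they are mixed up, it is much larger. This is the signal that trains the encoder to produce speaker-discriminative embeddings.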
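The MultiReader note in the TD-SV experiment says a second, related dataset is trained together with the main one. The GE2E paper writes this as a weighted sum of the loss on each dataset, roughly as follows. Here D_1 is the target dataset, D_2 the additional larger related dataset, \mathbf{w} the model parameters, and α a weighting hyperparameter.

$$
L_{\text{MultiReader}}(\mathbf{w}) = L(D_1;\mathbf{w}) + \alpha\, L(D_2;\mathbf{w})
$$

The second term acts much like a regularizer: it keeps \mathbf{w} from fitting only the smaller dataset, which is why MultiReader helps against overfitting.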
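Every result in the GE2E notes is reported as EER. EER is the error rate at the decision threshold where the false acceptance rate (impostor accepted) equals the false rejection rate (true speaker rejected). The sketch below is a simple threshold sweep under that definition. The scores and labels are made-up inputs. Production tools usually interpolate along the ROC curve instead of picking the closest observed threshold.

```python
from typing import List, Tuple


def far_frr(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """False acceptance / false rejection rates at one threshold.

    labels: 1 = same-speaker (target) trial, 0 = different-speaker (impostor) trial.
    A trial is accepted when its score >= threshold.
    """
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    target = [s for s, y in zip(scores, labels) if y == 1]
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in target) / len(target)
    return far, frr


def equal_error_rate(scores: List[float], labels: List[int]) -> float:
    """Approximate EER: sweep thresholds over the observed scores and return the
    average of FAR and FRR where the two are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(scores)) + [float("inf")]:
        far, frr = far_frr(scores, labels, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Cosine similarities from hypothetical verification trials.
    print(equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

A lower EER means the similarity scores separate same-speaker trials from impostor trials more cleanly. That is why the notes say lower is better.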
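The relative improvements quoted in the two GE2E experiments follow from the EERs listed in the notes, computed as (baseline − new) / baseline:

$$
\text{TD-SV: } \frac{2.67 - 2.38}{2.67} \approx 0.109 \;(\approx 10.9\%), \qquad
\text{TI-SV: } \frac{3.64 - 3.30}{3.64} \approx 0.093 \;(\approx 9.3\%)
$$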
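The 2025/2/25 notes describe Attention as deciding "which text to pronounce right now". The sketch below shows the core mechanic with plain dot-product attention, which is a deliberate simplification: Tacotron 2 itself uses location-sensitive attention, which also looks at previous alignments. A decoder query is scored against every encoder output. The scores are softmax-normalized into alignment weights, and the context vector is the weighted sum of encoder outputs. The vectors in the usage line are toy values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, encoder_outputs: List[Vector]) -> Tuple[List[float], Vector]:
    """Plain dot-product attention (a simplification of Tacotron 2's attention).

    Returns (alignment weights over input positions, context vector).
    """
    # Score each encoded text position against the decoder's current query.
    scores = [sum(q * h for q, h in zip(query, enc)) for enc in encoder_outputs]
    weights = softmax(scores)  # alignment: how much each position matters now
    dim = len(encoder_outputs[0])
    context = [sum(w * enc[d] for w, enc in zip(weights, encoder_outputs)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    weights, context = attend([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]])
    print([round(w, 3) for w in weights])  # most weight on input position 0
```

Stacking the weight vectors from successive decoder steps gives the alignment plot. A clean, roughly diagonal alignment corresponds to no skipped or repeated words.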
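The 2025/03/25 notes say the Decoder generates the mel spectrogram autoregressively. The loop below is a minimal sketch of what that means: each predicted frame is fed back as the input for the next step until a stop condition fires or a step limit is reached. In Tacotron 2 the step itself involves attention and recurrent layers, and a predicted stop token ends decoding. Here `step` and `should_stop` are hypothetical toy functions standing in for both.

```python
from typing import Callable, List

Frame = List[float]


def autoregressive_decode(
    step: Callable[[Frame], Frame],
    go_frame: Frame,
    should_stop: Callable[[Frame], bool],
    max_steps: int = 1000,
) -> List[Frame]:
    """Generate frames one at a time, feeding each output back in as the next input."""
    frames: List[Frame] = []
    prev = go_frame  # initial "go" frame (e.g. all zeros) starts decoding
    for _ in range(max_steps):
        frame = step(prev)  # predict frame t from frame t-1
        frames.append(frame)
        if should_stop(frame):  # stands in for a stop-token prediction
            break
        prev = frame
    return frames


if __name__ == "__main__":
    # Hypothetical toy "decoder" with 2 mel bins that approach 2.0 step by step.
    frames = autoregressive_decode(
        step=lambda prev: [0.5 * x + 1.0 for x in prev],
        go_frame=[0.0, 0.0],
        should_stop=lambda frame: frame[0] > 1.9,
    )
    print(len(frames), frames[-1])
```

The `max_steps` cap matters in practice. If the stop condition never fires, which can happen when attention fails to align, the decoder would otherwise keep generating frames indefinitely.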
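The 2025/03/26 comment says dilated convolutions enlarge the receptive field without adding parameters. The sketch below checks that claim with a causal 1-D dilated convolution: the kernel always has the same number of weights, and only the spacing between taps changes. It also includes a receptive-field calculator. The dilation schedule used for the 30-layer example (three cycles of 1, 2, 4, ..., 512) and the kernel size of 2 are assumptions borrowed from the common WaveNet setup, not values confirmed in these notes.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """y[t] = sum_i weights[i] * x[t - i * dilation]; samples before t = 0 count as 0.

    The kernel always has len(weights) parameters, whatever the dilation.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation  # causal: only current and past samples
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples (current one included) a stack of dilated layers sees."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Assumed schedule: 30 layers, dilation 2 ** (k mod 10), kernel size 2.
    wavenet_30 = [2 ** (k % 10) for k in range(30)]
    print(receptive_field(2, wavenet_30))
    print(receptive_field(2, [1] * 30))  # same 30 layers without dilation
```

Under these assumptions, 30 dilated layers see 3070 samples, while 30 undilated layers see only 31, with the same number of weights. This shows how depth plus dilation lets WaveNet capture both short-term and long-term structure.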
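For the unsolved "GPU not detected" problem from 2025/04/05, a quick diagnostic is to ask PyTorch directly what it sees. The function below is a sketch built on standard `torch` calls (`torch.cuda.is_available`, `torch.version.cuda`, `torch.cuda.device_count`, `torch.cuda.get_device_name`). It reports gracefully if torch is not installed. Common causes, none of them confirmed for this setup, include a CPU-only PyTorch build (where `torch.version.cuda` is `None`) and a virtual machine without GPU passthrough.

```python
import importlib.util


def gpu_report() -> dict:
    """Summarize whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "torch_version": None,
                "cuda_build": None, "cuda_available": False,
                "device_count": 0, "device_name": None}
    import torch

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only PyTorch build, which can never use the GPU.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Running this inside the same environment that runs demo_cli.py narrows the problem down. If `cuda_build` is `None`, reinstalling a CUDA-enabled PyTorch build is the likely fix. If `cuda_build` is set but `cuda_available` is `False`, the driver or VM configuration is the more likely culprit.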
