# Self-supervised Learning - BERT & GPT

## Self-supervised vs. supervised learning

Below is the introduction and definition Copilot gives for the two:

> Supervised Learning:
> 1. Supervised learning is a learning method based on labeled data.
> 2. Examples: classification (e.g. image classification), regression (e.g. house-price prediction), and sequence generation (e.g. language models).
>
> Self-Supervised Learning:
> 1. Self-supervised learning is a learning method that needs no human labeling.
> 2. Example: in vision models, cutting an image into patches and then predicting the relationships between them.

![image](https://hackmd.io/_uploads/SyOvcCJ80.png)

## BERT Series

### Masking input

BERT is the Encoder of a Transformer. During BERT's training, some of the inputs are randomly covered up; this is called masking input. A masked position can be covered either by a special token or by a replacement with some other random character. The masked sequence is fed into BERT to obtain an output vector sequence; the output vectors at the masked positions go through a linear transform and then a Softmax, producing another vector (a probability distribution), and we want its loss against the ground truth (the correct answer) to be as small as possible.

Since BERT has to recover the character behind each mask, you can think of this as BERT learning a fill-in-the-blank task: shown the four characters 機 X 學習, BERT is expected to learn to fill in 「器」, completing 機器學習 ("machine learning").

![image](https://hackmd.io/_uploads/H1uQiRk8A.png)
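
To see this fill-in-the-blank behaviour in action, here is a minimal sketch using the Hugging Face `transformers` library; the `bert-base-chinese` checkpoint is my choice for illustration, not something the original text specifies:

```python
# Minimal fill-mask sketch with a pre-trained Chinese BERT.
# Assumes: pip install torch transformers
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-chinese")

# BERT should recover the masked character 器 and complete
# 機器學習 ("machine learning").
for pred in unmasker("機[MASK]學習"):
    print(pred["token_str"], round(pred["score"], 3))
```
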
### Next Sentence Prediction

Besides masking input, BERT is also trained with Next Sentence Prediction: two sentences are fed into BERT, where the leading [CLS] and the separating [SEP] are both special tokens. After BERT, only the output vector at [CLS] is used; a linear transform turns it into a Yes/No answer for whether the two sentences are adjacent, making this a binary classification task.

![image](https://hackmd.io/_uploads/r1tsBMbIA.png)

However, BERT does not learn much from Next Sentence Prediction, because the task is too easy: pick two sentences at random from a corpus and it is very easy to tell whether they belong together, so the task does not help BERT much. This motivated Sentence Order Prediction (SOP), used in ALBERT (an improved BERT); interested readers can look into it further.

:::info
Training a BERT model is still very difficult today. The model Google released needed as much as 3 billion words of training data, the equivalent of reading the complete Harry Potter series 3,000 times. Because BERT uses self-supervised learning and needs no labeled data, that much data can be crawled from the web; the harder part is the training itself, which took several days to finish even on TPUs.
:::
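
The NSP head is also exposed in `transformers`. A minimal sketch (the `bert-base-uncased` checkpoint and the example sentences are mine; by the library's documented convention, logit index 0 means "B follows A"):

```python
# Next Sentence Prediction sketch: does sentence B follow sentence A?
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# The tokenizer builds the [CLS] A [SEP] B [SEP] layout for us.
inputs = tokenizer("He went to the store.",
                   "He bought a gallon of milk.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
print(f"P(B follows A) = {probs[0, 0]:.3f}")
```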

:::info
Pre-training a Seq2Seq model

The above only covers how to train BERT, i.e. the Encoder. How do we train a Decoder? Here the Encoder's input is corrupted; the Encoder passes its output to the Decoder through cross attention, and the Decoder must try to reconstruct the original input characters.

There are many ways to corrupt the input, such as masking, deletion, permutation, rotation, and text infilling.

![image](https://hackmd.io/_uploads/HyZ0BEbIR.png)
![image](https://hackmd.io/_uploads/BkXgvkmv0.png)
:::
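
To make the corruption strategies concrete, here is a toy sketch in plain Python; these token-level operations are deliberate simplifications of what the actual pre-training methods do:

```python
# Toy input-corruption strategies for Seq2Seq pre-training.
import random

def mask_tokens(tokens, p=0.15, mask="[MASK]"):
    """Masking: replace random tokens with a special token."""
    return [mask if random.random() < p else t for t in tokens]

def delete_tokens(tokens, p=0.15):
    """Deletion: drop random tokens entirely."""
    return [t for t in tokens if random.random() >= p]

def permute_tokens(tokens):
    """Permutation: shuffle the token order."""
    shuffled = tokens[:]
    random.shuffle(shuffled)
    return shuffled

def rotate_tokens(tokens):
    """Rotation: start the sequence at a random position."""
    k = random.randrange(len(tokens))
    return tokens[k:] + tokens[:k]

tokens = list("深度學習真有趣")
for corrupt in (mask_tokens, delete_tokens, permute_tokens, rotate_tokens):
    print(corrupt.__name__, corrupt(tokens))
```

The Decoder is then trained to map each corrupted sequence back to the original one.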

### Applications of BERT

Training BERT with self-supervised learning in this way is called pre-training; after masking input and Next Sentence Prediction, BERT has learned how to solve fill-in-the-blank tasks. Here is where the magic happens: give the model a small amount of labeled data, i.e. fine-tune it, and BERT can be applied to downstream tasks.

A simple way to picture it: BERT is like a stem cell, with the potential to develop into many different cell types such as liver cells, muscle cells, and blood cells, much as BERT after fine-tuning can serve many different downstream tasks.

This is also how a self-supervised model such as BERT is judged good or bad. The best-known task suite is GLUE (General Language Understanding Evaluation): BERT is fine-tuned on nine different tasks, and the average of its per-task performance indicates the model's quality.

![image](https://hackmd.io/_uploads/BJMCKM-IC.png)

#### Case 1: Sentiment analysis

The task is to judge whether a review is positive or negative. Here BERT is the model that has already learned to fill in blanks: the pre-trained parameters serve as the initialization, while the parameters of the Linear layer are randomly initialized; both are then updated with gradient descent.

Feed [CLS] plus a sentence into BERT; only the output vector at [CLS] goes through the Linear transform and then a Softmax to output the class (positive/negative).

![image](https://hackmd.io/_uploads/S1P6kmZUC.png)
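
A minimal fine-tuning sketch with `transformers` (checkpoint, batch, and learning rate are placeholders): the classification head on top of [CLS] is created fresh, i.e. randomly initialized, while the BERT body loads the pre-trained weights, exactly the initialization scheme described above.

```python
# Sentiment classification: pre-trained BERT body + randomly
# initialized linear head, fine-tuned with gradient descent.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # positive / negative
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative training step on a tiny made-up batch.
texts = ["This movie is great!", "Terrible, I fell asleep."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

loss = model(**batch, labels=labels).loss  # cross-entropy on the [CLS] head
loss.backward()
optimizer.step()
print(f"loss = {loss.item():.3f}")
```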

#### Case 2: POS tagging

The task is part-of-speech tagging. The setup is the same as Case 1, except that the output is a part-of-speech class for every token in the sentence.

![image](https://hackmd.io/_uploads/HJu_7Q-UC.png)
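
Only the head changes in the per-token variant; a sketch (the five-tag label set is a made-up toy example):

```python
# POS tagging: one classification head applied to every output vector,
# not just [CLS].
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-uncased", num_labels=5  # e.g. NOUN/VERB/ADJ/DET/OTHER
)

inputs = tokenizer("The cat sat", return_tensors="pt")
logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)
print(logits.argmax(-1))         # one predicted tag per token
```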

#### Case 3: Natural Language Inference (NLI)

The task is stance analysis: the two inputs are a premise and a hypothesis. They are fed into the fine-tuned model; only the output vector at [CLS] goes through the Linear layer, yielding one of the classes (contradiction/entailment/neutral).

![image](https://hackmd.io/_uploads/HJpT67WIC.png =50%x)![image](https://hackmd.io/_uploads/rysiRm-L0.png =49%x)
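
The pair is packed into a single sequence using the special tokens mentioned earlier; a quick look at the layout (tokenizer choice and sentences are illustrative):

```python
# Premise/hypothesis pairs are packed as [CLS] premise [SEP] hypothesis [SEP].
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("A man is sleeping.", "A man is awake.")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'a', 'man', 'is', 'sleeping', '.', '[SEP]',
#  'a', 'man', 'is', 'awake', '.', '[SEP]']
```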

#### Case 4: Extraction-based Question Answering (QA)

The task is question answering where the answer can be found in the passage. In short, two inputs, a document (D) and a question (Q), are fed into the model, which outputs two positive integers s and e, meaning the answer spans from the s-th to the e-th token of the document.

![image](https://hackmd.io/_uploads/SkE6k4Z8C.png)

More concretely, Q and D are fed into the fine-tuned BERT, which outputs a vector sequence. Two randomly initialized vectors are then used: the first (orange) vector takes an inner product with each output vector, and the position with the highest score gives s; in the same way the second (blue) vector gives e. The answer is then the s-th through e-th token of the document.

![image](https://hackmd.io/_uploads/B1w0g4ZIA.png =49%x)![image](https://hackmd.io/_uploads/rJOkbNZUR.png =49%x)
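
A from-scratch sketch of the start/end mechanism in PyTorch; the BERT encoder is stubbed with random hidden states, which in a real model would come from the fine-tuned BERT:

```python
# Extractive QA heads: two learned vectors dotted against every output
# vector; the argmax of each score gives the start (s) and end (e).
import torch

hidden_size, seq_len = 768, 20
# Stand-in for BERT's output vector sequence over the document tokens.
outputs = torch.randn(seq_len, hidden_size)

# The "orange" and "blue" vectors: randomly initialized, then learned.
start_vec = torch.randn(hidden_size)
end_vec = torch.randn(hidden_size)

start_scores = outputs @ start_vec          # inner product per position
end_scores = outputs @ end_vec

s = int(start_scores.softmax(-1).argmax())  # softmax over positions
e = int(end_scores.softmax(-1).argmax())
print(f"answer = document tokens {s} .. {e}")
```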

:::info
Although BERT is a Transformer Encoder, its input length still has an upper bound: with unrestricted input, the attention computation inside BERT would become too expensive, so the input length is limited.
:::

### How BERT works

Feed text containing masks into BERT and it outputs vectors that represent the masked characters. Why can BERT recover them so accurately? The main reason is that it takes the surrounding context into account and understands the relationships, or distances, between vectors, from which it infers the most suitable character.

Earlier there was the CBOW model, which also solved a fill-in-the-blank-style task, but it is much simpler than BERT; BERT is a deeper model that considers context, which is why its outputs are called contextualized word embeddings.

![image](https://hackmd.io/_uploads/BysuVHHI0.png)
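
A quick sketch of what "contextualized" means: the same surface word gets different vectors in different contexts (the checkpoint and sentences are illustrative):

```python
# The same word "bank" gets context-dependent vectors from BERT.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

def word_vec(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = tokens.index(word)
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, idx]

v1 = word_vec("I sat by the river bank.", "bank")
v2 = word_vec("I deposited cash at the bank.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0
```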

#### Multi-lingual BERT

One BERT variant saw 104 languages during pre-training, and fine-tuning reveals something interesting: fine-tuned on a small amount of Chinese QA data, it reaches up to 88.7% accuracy on an actual Chinese QA test, while fine-tuned on a small amount of English QA data instead, it still reaches up to 78.7% on the same Chinese QA test. Even though BERT had only seen English QA training data, it automatically learned to answer Chinese questions as well.

This means that after seeing 104 languages, BERT places characters and words with similar meanings close together in vector space, whether they are Chinese, English, or any other language.

![image](https://hackmd.io/_uploads/rJJtorBIC.png)

However, before fine-tuning, BERT does fill-in-the-blank: given a Chinese blank it outputs a Chinese character, given an English blank it outputs an English word, and it does not answer a Chinese blank with an English word. So although words with similar meanings in different languages have nearby vectors, BERT still knows the languages apart; the language information is not erased.

Take an English sentence, feed it through Multi-BERT to get a vector sequence, then add a blue vector representing the gap between Chinese and English — and surprisingly, the output becomes the Chinese sentence.

![image](https://hackmd.io/_uploads/HJnOJLBL0.png)
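
A heavily simplified sketch of that gap-vector arithmetic; everything here (the random stand-in embeddings and the mean-difference construction) is an assumption about the idea, not the original experiment's code:

```python
# Idea sketch: the "language gap" as a difference of mean embeddings.
import torch

# Stand-ins for Multi-BERT embeddings of English and Chinese tokens;
# a real run would collect these from the model over two corpora.
en_embeddings = torch.randn(1000, 768)
zh_embeddings = torch.randn(1000, 768)

# The "blue vector": average Chinese embedding minus average English one.
gap = zh_embeddings.mean(0) - en_embeddings.mean(0)

# Shift an English sentence's vector sequence toward "Chinese space";
# decoding these shifted vectors is what yields Chinese-like output.
english_sentence = torch.randn(12, 768)
shifted = english_sentence + gap
```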

## GPT Series

GPT predicts the next token. Feed <BOS> into the model and it outputs an embedding, which goes through a linear transform and then a softmax to produce a probability distribution, finally outputting the character 「台」; feed 「台」 back into the model, and repeat until the sentence is complete. Note in particular that GPT never sees future inputs, only the current input and past information.

![image](https://hackmd.io/_uploads/rJhTfLSLC.png)
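
The autoregressive loop as a minimal sketch with GPT-2 (checkpoint and prompt are illustrative, and greedy argmax stands in for fancier sampling):

```python
# Greedy next-token generation: each step sees only the tokens so far.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

ids = tokenizer("Taiwan is", return_tensors="pt").input_ids
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits                # (1, seq_len, vocab)
    next_id = logits[0, -1].softmax(-1).argmax()  # linear + softmax head
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```
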
At inference time, GPT is given a task description plus a few examples, and is expected to complete the sentence just from this preceding input (the prompt). This is called few-shot learning; there are also one-shot and zero-shot variants. Throughout, GPT's parameters are never adjusted, i.e. no gradient descent optimization is performed, and this setup is called in-context learning.

![image](https://hackmd.io/_uploads/rkYU48SUR.png)
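
What a few-shot prompt looks like (the translation task and examples below echo the GPT-3 paper's illustration and are not from the original post; no parameters are updated, the model only reads the prompt):

```python
# Few-shot in-context learning: the "training examples" live in the
# prompt itself; the model's weights are never touched.
prompt = """Translate English to French:

sea otter => loutre de mer
cheese => fromage
plush giraffe =>"""
# Feed this prompt to a GPT-style model (e.g. with the generation loop
# above) and it should continue with a French translation.
```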

## Summary: self-supervised learning

This article walked through BERT's training process, including masking input and next sentence prediction; after pre-training, BERT is like a stem cell that fine-tuning can adapt to different downstream tasks. It also covered how BERT works — keeping language information while placing characters with similar meanings close together in vector space — and closed with a brief look at the ideas behind GPT.

---

:::info
That is everything in this article, "Self-supervised Learning - BERT & GPT". If this is your first read, it is perfectly normal to need more time to digest it. If you have any questions, feel free to reach out and discuss below. More related articles are on the way — stay tuned.
:::