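# Introduction to the Transformer Architecture
> Paper:
> Attention Is All You Need
> https://arxiv.org/pdf/1706.03762

![Screenshot 2024-10-14 182815](https://hackmd.io/_uploads/HkFoGu91ye.png)

The Transformer evolved from the traditional sequence-to-sequence model.
![image](https://hackmd.io/_uploads/rkrOGP5kye.png)
![image](https://hackmd.io/_uploads/B13tzw5k1g.png)

Let's start with the Encoder.
### Encoder
> Takes a sequence as input and outputs another sequence

![image](https://hackmd.io/_uploads/rJi2fwc11e.png)
- The Encoder is made up of multiple blocks, and each block mainly contains two sub-layers: Multi-head self-attention and FC (Feed-forward Neural Network)

Q: Why is a block not called a layer?
> Because what one block does is the work of several layers

![0B71EAE2-1F00-4F7C-A362-DAD8D959471C](https://hackmd.io/_uploads/ryyP7v9yJg.jpg)
> 1. The input is a row of vectors
> 2. A row of vectors comes out of self-attention
> 3. These then pass through a Fully Connected layer (the Feed-forward Neural Network), and the result is the block's output

- Each block also contains Add & Norm (that is, Residual Connection & Layer Normalization)

![DA4328D2-8006-4821-8BAC-450528354811](https://hackmd.io/_uploads/S11LBDckJl.jpg)
> 1. The input first goes through Self-attention
> 2. Then through Add & Norm (Residual Connection & Layer Normalization)
> 3. Then through the Fully Connected layer
> 4. Then through Add & Norm again (Residual Connection & Layer Normalization)

#### Full Encoder architecture
![image](https://hackmd.io/_uploads/rJrcBD9J1g.png)

### Decoder
![image](https://hackmd.io/_uploads/S1nWww5y1x.png)
![image](https://hackmd.io/_uploads/B1Azwvc1Je.png)
> In the Transformer architecture, the Decoder usually works in an Autoregressive (AT) manner. The steps are:
> 1. Generate one new token at a time
> 2. Append that token to the sequence generated so far
> 3. Use the updated sequence to predict the next token

#### About Autoregressive decoding
> The next token is predicted recursively from the earlier values of the sequence, which means every step depends on the previous outputs.
> Commonly used for machine translation and text generation

![image](https://hackmd.io/_uploads/Hyn0OD5JJg.png)
> Encoder: takes a speech signal as input and outputs a sequence of vectors
> Decoder:
> 1. The Encoder's output is used as input to the Decoder
> 2. After Softmax, the entry with the highest probability in the output vector is picked as the output token

- The length of this vector equals the number of possible output tokens (so the set of possible outputs has to be defined in advance)

![image](https://hackmd.io/_uploads/Bk8k5vqk1g.png)
> Each generated token becomes the Decoder's input at the next step

#### Full Decoder architecture
![image](https://hackmd.io/_uploads/r134qv9kJg.png)

### Encoder vs. Decoder
![image](https://hackmd.io/_uploads/Syhdcw9kkg.png)
> If you cover up the middle part of the Decoder, then apart from the different Self-attention (it is masked) and the extra Softmax on the output, the two architectures are largely the same

### Difference between Self-attention and Masked Self-attention
#### Self-attention
> The mechanism the Transformer uses to capture dependencies between different positions in a sequence. For every position in the sequence, the model computes its similarity to the other positions (the attention weights) and uses them to decide how to combine the input information.

Characteristics:
- The element at each position can interact with every other position in the sequence.
- The model uses Query, Key, and Value to compute the attention weights and take a weighted sum, producing a new representation.
- There is no restriction, so all elements in the sequence can attend to each other.

This lets the model capture long-range dependencies in the sequence, which suits tasks such as machine translation, where a word may be related to context far away in the sentence.
![image](https://hackmd.io/_uploads/HJFRCPc11x.png)
> The computation takes the whole sequence into account

![image](https://hackmd.io/_uploads/HkA1ovqJ1g.png)

#### Masked Self-attention
> Differs from Self-attention by adding a Mask, which restricts the model to the part of the sequence that has already been generated. It cannot see future elements, so the model never peeks at content that has not been generated yet.

Characteristics:
- When computing the attention weights, the model masks future time steps (tokens) so that the current time step cannot attend to them.
- During decoding, when generating the next token, the model can only see the previously generated tokens, not the ones that come later.

This preserves the sequential dependency of the task. It is commonly used for text generation to keep the output semantically coherent from start to end.
![image](https://hackmd.io/_uploads/rkQb1uqk1l.png)
> The computation only considers positions up to and including the current one

![image](https://hackmd.io/_uploads/BkDMsw9yke.png)

### Summary of differences
| **Feature** | **Self-Attention** | **Masked Self-Attention** |
|---|---|---|
| **Attention range** | The element at each position can interact with all other elements in the sequence, with **no restriction**. | The model can only attend to elements **at or before the current position**; future elements are masked and cannot be seen. |
| **Where it is used** | Mainly in the **Encoder**, for tasks that need the context of the whole sequence, such as classification and machine translation. | Mainly in the **Decoder**, especially for generation tasks (language modeling, text generation), so that generation follows the order and never refers to future content. |
| **How it is computed** | Every token can interact with all other tokens and capture global information. | With the mask, the model attends only to what has already been generated, so generation proceeds step by step and does not depend on content not yet generated. |
| **Example** | Encoder Self-Attention: all words in the sentence are considered when producing the representation of each word. | Decoder Masked Self-Attention: when generating a new word, only the previously generated words are visible, which keeps generation in order. |

### Combining the Encoder and the Decoder
![image](https://hackmd.io/_uploads/S1Aw1O5ykg.png)

### Cross attention
> Connects the Encoder's output (the information representing the input sequence) with the Decoder's input (the sequence being generated)
> Two inputs, k and v, come from the Encoder; one input, q, comes from the Decoder itself

![image](https://hackmd.io/_uploads/ByyUk_qyke.png)
![image](https://hackmd.io/_uploads/HyDbldqk1l.png)

This figure shows how **Cross-Attention** and **Masked Self-Attention** work in the **Transformer Decoder**. Each part of the figure is explained below:

### 1. **Encoder**
The left side of the figure is the **Encoder**. It processes the input sequence (for example a speech waveform or a text sequence) and produces key-value pairs for the input, namely `k` and `v`. These represent the Encoder's understanding of each input position.
- `k^1`, `k^2`, `k^3`: the **Keys** the Encoder produces for each token of the input sequence.
- `v^1`, `v^2`, `v^3`: the corresponding **Values**.
- `a^1`, `a^2`, `a^3`: the Encoder's representations of the input positions, which are passed on to the Decoder for its use.

### 2. **Self-Attention (Masked)**
The right side of the figure shows the **Masked Self-Attention** in the Decoder. This part handles the Decoder's attention over its own tokens during generation.
- Each Decoder token passes through the self-attention mechanism to produce `q'`, based on the current position and what has been generated so far, and the generated sequence is extended step by step.
- Masking ensures the model cannot "peek" at tokens that have not been generated yet and can only refer to previous outputs.

### 3. **Cross-Attention**
The **green lines** in the figure show Cross-Attention at work: the `k` and `v` from the Encoder are combined with the `q'` from the Decoder.
- `q'` is the query vector produced by the Decoder; it is scored against the key-value pairs (`k`, `v`) from the Encoder.
- `α'_1`, `α'_2`, `α'_3`: the attention weights, which express how much attention the Decoder assigns to each Encoder input position. They determine which parts of the Encoder output the Decoder focuses on most.
- In this way the Decoder can refer to the Encoder's understanding of the input sequence and keep its output relevant to the input during generation.

### 4. **Weighted Sum**
The parts marked `×` in the figure multiply each attention weight by its corresponding `v`, and the weighted sum gives the final `v'`. This weighted `v'` is part of the Decoder's output and is passed to the fully connected layer (FC) for further processing.

### Summary
The figure illustrates the **two main attention mechanisms in the Transformer Decoder**:
- **Masked Self-Attention**: the Decoder attends to its own generated sequence, and masking ensures generation proceeds in order.
- **Cross-Attention**: the Decoder attends to the Encoder's output. This is the bridge between Decoder and Encoder, letting the Decoder use information from the input sequence to generate a matching output.

---

## Also applicable to object detection
![image](https://hackmd.io/_uploads/rJInWO5yye.png)
> Paper:
> End-to-End Object Detection with Transformers
> https://arxiv.org/pdf/2005.12872

### Masked-attention Mask Transformer for Universal Image Segmentation
[Paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Cheng_Masked-Attention_Mask_Transformer_for_Universal_Image_Segmentation_CVPR_2022_paper.pdf)
> [Github](https://github.com/facebookresearch/Mask2Former?tab=readme-ov-file)

#### Abstract
> Image segmentation groups pixels with different semantics, for example category or instance membership. Each choice of semantics defines a task. Although these tasks differ only in their semantics, current research focuses on designing a specialized architecture for each task.
>
> The paper proposes Masked-attention Mask Transformer (Mask2Former), a new architecture that can handle any image segmentation task (panoptic, instance, or semantic). Its key component is masked attention, which extracts localized features by constraining the cross-attention of a standard Transformer to within the predicted mask regions.
>
> Besides cutting the research effort by at least three times, it also significantly outperforms the best specialized architectures on four popular datasets. Most notably, Mask2Former sets a new state of the art for panoptic segmentation (57.8 PQ on COCO), instance segmentation (50.1 AP on COCO), and semantic segmentation (57.7 mIoU on ADE20K).

![image](https://hackmd.io/_uploads/H1IQY9ckke.png)
> Figure 1. State-of-the-art segmentation architectures are usually designed specifically for each image segmentation task. Recent work has proposed universal architectures that attempt all tasks (MaskFormer, released by Meta in 2021), and these do well on semantic and panoptic segmentation but perform poorly on instance segmentation. In 2022 Meta proposed the improved framework Mask2Former, which for the first time outperforms the best specialized architectures on three segmentation tasks across multiple datasets.

### 1. Introduction
Image segmentation is mainly divided into three tasks: panoptic, instance, and semantic segmentation. Although they differ only in their semantics, a specialized architecture has still been developed for each task.
For example, FCNs specifically target semantic segmentation, while mask classification architectures specifically target instance segmentation.
However, none of them generalizes well to the other tasks.

To address this fragmentation, recent work has designed universal architectures that can handle all segmentation tasks (that is, universal image segmentation). These architectures are usually based on an end-to-end set prediction objective (such as DETR) and can handle multiple tasks without modifying the architecture, the loss function, or the training procedure.

Note that universal architectures still have to be trained separately for different tasks and datasets, even though the architecture is the same. Beyond flexibility, universal architectures have recently shown state-of-the-art results on semantic and panoptic segmentation. Yet recent work still focuses on advancing specialized architectures, which raises the question: why haven't universal architectures replaced specialized ones?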
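Although existing universal architectures are flexible enough to handle any segmentation task, in practice their performance still lags behind the best specialized architectures. For example, the best reported performance of universal architectures on instance segmentation is lower than that of specialized architectures (by more than 9 AP). The drawbacks can be summarized as follows:
1. They can handle every task, but they still perform worse than the specialized architectures
2. They are harder to train: the hardware requirements are high and training takes longer. For example, training MaskFormer (the predecessor of Mask2Former, the subject of this note) takes 300 epochs to reach 40.1 AP, and only a single image fits on a 32GB GPU. The specialized Swin-HTC++ gets better performance in only 72 epochs.

These performance and training-efficiency problems hold back the wide deployment of universal architectures.

This work proposes a **universal image segmentation** architecture named
## **Masked-attention Mask Transformer (Mask2Former)**
It outperforms specialized architectures on the different segmentation tasks and is still easy to train on each task. It is built on a simple meta architecture consisting of
![image](https://hackmd.io/_uploads/H1qtg0qyyl.png)
1. a backbone feature extractor
2. a pixel decoder
3. a Transformer decoder

The paper proposes several key improvements that give better results and more efficient training (compared with the earlier MaskFormer):
1. The Transformer decoder uses masked attention, which restricts attention to the localized features centered around the predicted segments. These segments can be objects or regions, depending on the semantics used for grouping pixels.
2. Compared with the cross-attention of a standard Transformer decoder, masked attention converges faster and performs better.
3. Multi-scale high-resolution features are used, which help the model segment small objects or regions.
4. Some optimization improvements are proposed, such as switching the order of self-attention and cross-attention in the Transformer, making the query features learnable, and removing dropout.

All of these improvements raise performance without adding computation. In addition, computing the mask loss on a small number of randomly sampled points saves 3x training memory without affecting performance.

The authors evaluate Mask2Former on three image segmentation tasks (panoptic, instance, and semantic segmentation) using four popular datasets (COCO, Cityscapes, ADE20K, and Mapillary Vistas). For the first time, a single architecture performs on par with or better than the specialized architectures on all of these benchmarks. Mask2Former sets a new state of the art: 57.8 PQ on COCO panoptic segmentation, 50.1 AP on COCO instance segmentation, and 57.7 mIoU on ADE20K semantic segmentation.

### Related Work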
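
To make the masked attention from the Introduction more concrete before going further, here is a minimal pure-Python sketch of attention with a boolean mask. It is an illustration under assumptions, not the Mask2Former implementation: the function name `masked_attention`, the list-of-lists interface, the toy numbers, and the fallback for an empty mask are all made up for this note, and scaled dot-product scoring is assumed. With a mask taken from a predicted segment, it mimics the idea that a query only looks at pixels inside its predicted mask. With a lower-triangular mask, it gives the Masked Self-attention of the Decoder described earlier.

```python
import math

def masked_attention(queries, keys, values, allowed):
    """Scaled dot-product attention with a boolean mask.

    allowed[i][j] is True when query i may attend to position j.
    All arguments are plain lists of lists (hypothetical toy interface).
    """
    dim = len(keys[0])
    outputs = []
    for query, allow_row in zip(queries, allowed):
        # Assumption for this sketch: an empty mask falls back to attending
        # everywhere, so the softmax never runs over an empty set.
        if not any(allow_row):
            allow_row = [True] * len(keys)
        # Masked positions get a score of -inf, i.e. weight 0 after softmax.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) if ok else -math.inf
            for key, ok in zip(keys, allow_row)
        ]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Case 1: one query over three "pixels"; the predicted mask covers pixels 0 and 1.
pixel_keys = [[1.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
pixel_values = [[1.0, 0.0], [3.0, 0.0], [100.0, 0.0]]
segment_out = masked_attention([[1.0, 0.0]], pixel_keys, pixel_values,
                               [[True, True, False]])
print(segment_out)  # pixel 2 is ignored even though its raw score is the largest

# Case 2: causal mask, token i may only attend to tokens 0..i.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n = len(tokens)
causal = [[j <= i for j in range(n)] for i in range(n)]
causal_out = masked_attention(tokens, tokens, tokens, causal)
print(causal_out[0])  # the first token can only see itself
```

Only the mask changes between the two cases: the attention computation itself (scores, softmax, weighted sum of `v`) is the same one used in the Cross attention section.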
