楊志璿
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # SOO :::warning 前情提要 [Facebook](https://www.facebook.com/groups/system.software2022/posts/2081326045382128/) [微中子的貼文](https://tigercosmos.xyz/post/2022/06/c++/smallvec/) ::: SSO and SOO 我相信大家都不陌生,詳細可以看 Mes 鄭教授的[這篇介紹](https://hackmd.io/@Mes/Miner_Magic_SSO)。不過對於[微中子的貼文鏡像](http://archive.today/Aw9tT)中的測試方式第一眼我是有疑慮的 這邊有幾點需要考慮: 1. [物件可不可以被優化,而搬到迴圈之外?](#物件可不可以被優化,而搬到迴圈之外?) 2. [如何在 stack 上生成 array ?](#如何在stack上生成array?) 3. [更好的 SSO 評估方式](#更好的SSO評估方式) > 有點抱歉中英文沒有加空白鍵,讀起來有一點點痛苦,這是為了要實現 href 讓你方便直接跳到那個 tag ## 物件可不可以被優化,而搬到迴圈之外? 假定你已經熟悉這優化,也知道 RAII,我們就直接來看組語吧! [Godbolt](https://godbolt.org/z/q8xs49Tsx) 先看 C++ ```cpp= template <typename T, size_t N = 10> class SmallVec { public: SmallVec() = default; void push_back(T const & value) { if(cur < N) m_data[cur++] = value; else m_v.push_back(value); } private: std::vector<T> m_v; std::array<T, N> m_data; size_t cur = 0; }; ``` 對照抽取的部份 asm ```asm= .L19: mov QWORD PTR [rsp+80], 0 xor eax, eax xor r14d, r14d xor ebp, ebp xor r15d, r15d xor ebx, ebx jmp .L15 .L39: lea rdx, [rax+1] mov QWORD PTR [rsp+80], rdx mov DWORD PTR [rsp+40+rax*4], ebx .L3: add ebx, 1 cmp ebx, 10 je .L14 .L40: mov rax, QWORD PTR [rsp+80] .L15: mov r12, r14 sub r12, rbp cmp rax, 9 jbe .L39 cmp r14, r15 je .L4 mov DWORD PTR [r15], ebx add ebx, 1 add r15, 4 cmp ebx, 10 jne .L40 .L14: test rbp, rbp je .L16 mov rsi, r12 mov rdi, rbp call operator delete(void*, unsigned long) sub r13, 1 jne .L19 ``` 注意到 .L14 這邊就是 dtor,可以明顯看到我們確實讓他在 .L15 的 Line 26 mov DWORD PTR [r15], ebx 把資料般到 r15 之後,如過超出迴圈範圍,會跳到 .L14 呼叫 dtor 我們把他加上一個 std::vector 作為 member,這樣當 sv 需要被解構的時候必須要去解構 std::vector,這樣就無法被 compiler 化簡 dtor。 所以我們得到 std::vector 跟 SmallVec 效能相似的結果 ![](https://i.imgur.com/EVGRDO3.png) 而效能比 std::vector 好,我認為是分支比較少的關係,因為我們可以觀察到: sv 的 push_back 相當簡單,只有一個分支而後就可以搬動資料 ```asm= .L19: mov QWORD PTR [rsp+80], 0 xor eax, eax xor r14d, r14d xor ebp, ebp xor r15d, r15d xor ebx, ebx jmp .L15 .L71: lea rdx, [rax+1] mov QWORD PTR [rsp+80], rdx mov DWORD PTR [rsp+40+rax*4], ebx ``` 而,std::vector 的 push_back 分支處理就比較多次: ```asm= .L31: mov r12, r14 sub r12, r15 cmp rbx, r14 jne .L74 movabs rcx, 2305843009213693951 mov rax, r12 sar rax, 2 cmp rax, rcx je .L75 cmp rbx, r15 mov edx, 1 cmovne rdx, rax add rax, rdx jc .L25 test rax, rax jne .L76 xor r9d, r9d xor r14d, r14d xor r13d, r13d .L27: mov DWORD PTR [r13+0+r12], ebp lea rbx, [r13+4+r12] test r12, r12 jg .L77 test r15, r15 jne .L29 .L30: # Loopping add ebp, 1 mov r12, r9 mov r15, r13 cmp ebp, 10 jne .L31 ``` 觀察到 .L31 的邊界檢查以及移動邊界,分支失敗多次對效能是很有傷害的。 所以,即便 vector 的仍快取在 page table 上,也會有相較於純 array 1.5 倍的邊界負擔。 ## 如何在stack上生成array? [Godbolt](https://godbolt.org/z/Y8aj4Wcqs) 首先我們看到 Line 10: ```asm= main: push r15 mov eax, 3735928495 push r14 push r13 push r12 movabs r12, 2305843009213693950 push rbp push rbx sub rsp, 104 mov QWORD PTR [rsp+24], rax lea r13, [rsp+56] mov rbp, r13 ``` 對應的 Cpp 是: ```cpp= constexpr int K = 10; namespace { template <typename T, size_t N = 10> class SmallVec { public: explicit SmallVec(size_t size); void push_back(T const & value); T operator[](size_t index); private: T *m_head = nullptr; size_t m_size = 0; size_t m_capacity = N; std::array<T, N> m_data; }; }; ``` 104 是被 alignment 優化的,優化 stack 往上增長以 8 的倍數做考量(因為是 64-bit)。我們可以明顯看到 sub rsp, 104 就是推移 stack pointer 104 byte 預留給這個 array 跟 member data。 ### Array 的運作方式 如上是直接推移 stack pointer 沒有對 memory 做額外的 allocation,基本上跟 [alloca](https://man7.org/linux/man-pages/man3/alloca.3.html) 等價。 而 std::vector 每次都會去 heap allocation 所以會有微中子這篇的結果: ![](https://user-images.githubusercontent.com/18013815/174789476-7c2693bd-55b1-47cd-a588-2be5c36d8dbc.png) ### 迴圈與函式呼叫 會有這議題是因為 sv 物件生命週期在迴圈的 scpoe 當中,所以我們期待要將 sv 物件在迴圈離開(或重入)的時候解構。但是這個 sv 物件的 stack 上成員其實是可以重複利用的,所以在 -fcombine-stack-adjustments (-O1 就有) 開啟的情況下,是可以將個一個 sv 物件的 array reference 到 stack 上 array。 而且迴圈不會推動 stack 的指標,所以這邊在建構及解構 sv 物件的時後,頂多是洗掉 array 中的值,也不會將 array re-allocation 去推動 stack pointer。 所以這觀念有點像 tail-recursion,用跳躍的方式去執行重複步驟,而不用額外推 stack 保存狀態。 ### 有一點奇怪的測試結果 考慮到上面是把 array 放在 stack 上,所以比反覆 malloc 在 heap 效能好,那麼我們把 array 放在 heap 上,同時也把 vector 的 reserve 打開(減少 space doubling 的誤差)。 這樣兩個效能應該要近似: ![](https://i.imgur.com/R6gQLx3.png) 確實是有點相似,但是為什麼 std::unique_ptr<std::array<T, N>> 會比 std::vector 還要差呢? 我還沒有好的解釋,std::unique_ptr 除了搬到 heap 上,應該是沒有其他 overhead 才對,歡迎告訴我。(其實我還沒去看 asm,但是我想可以從 asm 看出 std::unique_ptr 的問題) ## 更好的SSO評估方式 因為重點是小物件的分配可以放在 stack 上,然而 stack 是 fixed length 的,即便你跨越這個 function stack 預期的空間使用,你還是有可能會遇到超出 stack page 的 stack overflow,所以我們設計一個包裝 (wrapper) 用來包裝 array 跟 vector,然後跟指標指向 vector 比較,以減少 RAII dtor 對效能的衝擊。 > 其實我覺得 RAII 是好東西,但是對於最極度的優化(最佳化),是需要調整的。 上面用 std::unique_ptr 把 array 搬到 heap 是一個方式,這樣可以「控制變因」,讓兩個比較都有經過經過 heap 上相同的解構次數。 但是,更好的測定方式是將造成干擾的變因移除,不是加入干擾。 所以我們把 vector 搬到迴圈外,然後在迴圈內使用 clear 而不是 dtor,這樣可以有效的移除 dror 在迴圈結束的後,造成的衝擊。 ### 保留 std::unique_ptr 與 std::vector 的比較 ![](https://i.imgur.com/p5hy1n0.png) ```cpp= static void SmallVector(benchmark::State& state) { for (auto _ : state) { SmallVec<int> sv; // 使用 inline memory for(int i = 0 ; i < K; i++) { sv.push_back(i); } } } BENCHMARK(SmallVector); static void StdVector(benchmark::State& state) { std::vector<int> v; v.reserve(K); // 分配新的 Heap 記憶體 for (auto _ : state) { v.clear(); for(int i = 0 ; i < K; i++) { v.push_back(i); } } } BENCHMARK(StdVector); ``` 我們可以看到這樣 vector 就顯著的比原本好多了,可見如我們預期主要影響因子是 dtor。 ### 把 dtor 都消除,比較都在 heap 上的結果 ![](https://i.imgur.com/BYLkPnq.png) 可以看到 std::vector<> 的處理似乎比較好,但是原因尚未清楚。 看起來 mov 0xc(%rsp),%eax 是把 i 複製到 array 當中,花上了 50% 的效能 ![](https://i.imgur.com/EDKVgEg.png) #### 把 O3 改為 O2 ![](https://i.imgur.com/HroFjcb.png) 我對 O3 的魔法不清楚, [GNU 文件](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) 上面提到: > -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-partial-pre -funswitch-loops -fvect-cost-model=dynamic -fversion-loops-for-strides 但是究竟是什麼激進的提示,讓 compiler 可以作到上面化簡,不清楚。 ## 定長治百病 既然上面結果這樣,我們就可以嘗試把 std::unique_ptr 移除,讓 SOO 保留在 stack 上。也不要有動態長度。 ![](https://i.imgur.com/iArbvYm.png) 我們得到這樣的結果。太棒了,SOO 讓物件搬到 stack,這樣完全消除了容器的 overhead。 :::success 看到這精美的效能優勢, Yes, 有極致的效能就是身心舒暢。 ::: ![](https://i.imgur.com/N6lmJnF.png)

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully