yuoffthewall
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    2022q1 Homework2 (quiz2) ======================== contributed by < [tommy2234](https://github.com/tommy2234) > 題目連結 [quiz2](https://hackmd.io/@sysprog/linux2022-quiz2#2022q1-%E7%AC%AC-2-%E9%80%B1%E6%B8%AC%E9%A9%97%E9%A1%8C) ## 測驗 1 : Average - 方法一 ```cpp #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a >> 1) + (b >> 1) + (a & b & 1); } ``` `(a >> 1) + (b >> 1)` 將 a 和 b 分別除以 2 再相加,這樣就不會有 overflow 的問題,但是由於兩邊的計算都各自取了 floor,當 a 和 b 都是奇數,將會導致答案少 1。 因此需要在最後加上 `a & b & 1`,當 a, b 都是奇數時,補償少掉的 1。 - 方法二 ```cpp #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a & b) + ((a ^ b) >> 1); } ``` 用 `a & B` 得到 carry 的部份,再用 `(a ^ b) >> 1` 得到 sum 的部份並除以二,最後將 carry 和 sum 相加就是平均值。 ### 比較上述實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 `$ gcc -S -O3 average.c` 1. `(a + b) / 2` ``` average: endbr64 leal (%rdi,%rsi), %eax shrl %eax ret ``` 將 rdi + rsi 放在 eax 然後用 shrl 右移一位(除以二)之後回傳。 2. `low + (high - low) / 2` ``` average: endbr64 movl %esi, %eax subl %edi, %eax shrl %eax addl %edi, %eax ret ``` 將 esi (high) 和 edi (low) 相減後放在 eax ,然後除以二,最後加上 edi (low)。 3. `(a >> 1) + (b >> 1) + (a & b & 1)` ```= average: endbr64 movl %edi, %eax movl %esi, %edx andl %esi, %edi shrl %eax shrl %edx andl $1, %edi addl %edx, %eax addl %edi, %eax ret ``` 3~5 行做 a & b , 6~7 行做 shift ,最後把三個括號內的值相加後回傳。由於步驟比較多所以指令數量也比較多。 4. `(a & b) + ((a ^ b) >> 1)` ``` average: endbr64 movl %edi, %eax andl %esi, %edi xorl %esi, %eax shrl %eax addl %edi, %eax ret ``` 先做 and 再做 xor 和 shift ,最後相加。 - 從以上幾個實做的 assembly code 可以看到在 O3 優化的情況下幾乎都只剩下最精簡的指令,沒有其它多餘的步驟。 ::: warning TODO 研讀 Linux 核心原始程式碼 [include/linux/average.h](https://github.com/torvalds/linux/blob/master/include/linux/average.h),探討其 [Exponentially weighted moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) (EWMA) 實作,以及在 Linux 核心的應用 ::: --- ## 測驗 2:實做 max ```cpp #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } ``` `a ^ 0` 會得到 a ,而 `a ^ (a ^ b)` 會得到 b , 因此我們必須設法將`((EXP4) & -(EXP5))`設計成: 1. a > b 時 `((EXP4) & -(EXP5))` = 0 2. a < b 時 `((EXP4) & -(EXP5))` = (a ^ b) 答案為:`((a ^ b) & -(a < b))` 這是一個不需要分支就能求得 max 的作法,因為原本需要的條件判斷被藏在`-(a < b)` 之中。當 a < b 時`-(a < b)`會得到 32 個 bits 的 1,否則就是全 0 。 再和前面的 a ^ b 做 and 就能決定 `((a ^ b) & -(a < b))` 究竟是 `a ^ b` 還是 0。 ### 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作 ```cpp #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((a ^ b) & -(a < b)); } ``` 以上實作不管針對無號數或是有號數,最後的結果只會是 a 或 b,並且每個位元包含 sign bit 都會被保留,因此無號數和有號數都可以共用此實做。 ### 在 Linux 核心原始程式碼中,找出更多類似的實作手法 - [int2float](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=747f49ba67b8895a5831ab539de551b916f3738c) - 原本尋找 MSB 的 方式是利用迴圈不斷 left shift 直到找到 1 為止,這個 commit 改成使用 __fls(x) ,可避免原本的迴圈。 - 根據註解可得知使用 rotation 可以使我們不用判斷數字是否大於 2^24 ,這是 int 轉 float 支援的範圍。 > Use a rotate instead of a shift because that works both leftwards and rightwards due to the mod(32) behaviour. This means we don't need to check to see if we are above 2^24 or not. ```cpp uint32_t int2float(uint32_t x) { uint32_t msb, exponent, fraction; /* Zero is special */ if (!x) return 0; /* Get location of the most significant bit */ msb = __fls(x); /* * Use a rotate instead of a shift * because that works both leftwards * and rightwards due to the mod(32) * behaviour. This means we don't * need to check to see if we are above * 2^24 or not. */ fraction = ror32(x, (msb - I2F_FRAC_BITS) & 0x1f) & I2F_MASK; exponent = (127 + msb) << I2F_FRAC_BITS; return fraction + exponent; } ``` - [ror32](https://elixir.bootlin.com/linux/v4.8/source/include/linux/bitops.h#L118) ```cpp static inline __u32 ror32(__u32 word, unsigned int shift) { return (word >> shift) | (word << (32 - shift)); } ``` 利用 `(word >> shift) | (word << (32 - shift))` 可以達成頭尾相接的效果,也就是 rotation。 --- ## 測驗 3:GCD - GCD 演算法的步驟如下 : 1. If both x and y are 0, gcd is zero gcd(0,0)=0. 2. gcd(x, 0)=x and gcd(0, y)=y because everything divides 0. 3. If x and y are both even, gcd(x, y)=2∗gcd(x/2 ,y/2), because 2 is a common divisor. Multiplication with 2 can be done with bitwise shift operator. 4. If x is even and y is odd, gcd(x, y)=gcd(x/2, y). 5. On similar lines, if x is odd and y is even, then gcd(x,y)=gcd(x, y/2). It is because 2 is not a common divisor. - 在程式碼中對應以上流程的區塊加入註解 ```cpp #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { /* * gcd(x,0)=x and gcd(0,y)=y because * everything divides 0. */ if (!u || !v) return u | v; /* * If x and y are both even, * gcd(x,y)=2∗gcd(x/2 ,y/2) because 2 is * a common divisor. * Multiplication with 2 can be * done with bitwise shift operator. */ int shift; // while both u and v are even for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } /* * 1. If x is even and y is odd, * gcd(x, y)=gcd(x/2, y). * * 2. On similar lines, if x is odd and y * is even, then gcd(x, y)=gcd(x, y/2). * It is because 2 is not a common divisor. */ while (!(u & 1)) // while u is even u /= 2; do { while (!(v & 1)) v /= 2; // while v is even /* * uint64_t t = v; * v = u % v; * u = t; */ if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); // GCD ends when v == 0 return RET; // return u when v == 0 } ``` - GCD 演算法在 v 變成 0 的時候中止。所以 `COND` 應該填入 `v` - 一開始在 x 和 y 都是偶數時我們不斷讓他們對 2 做約分,並且將約分次數存在 `shift` ,也就是步驟三提到的 `If x and y are both even, gcd(x, y)=2∗gcd(x/2 ,y/2)` 因此最後 return u 時要把除掉的那些 2 乘回來才是正確的 gcd , `RET` 應該填入 `u << shift` ### 透過 `__builtin_ctz` 改寫 GCD,分析對效能的提升 `__builtin_ctzll` 可以直接找到從 LSB 開始連續的 0 的個數,因此我們可以避免掉使用 while 迴圈不斷右移並檢查 LSB 的那些部份,全都換成 `__builtin_ctzll` 。 ```cpp #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int u_shift = __builtin_ctzll(u); int v_shift = __builtin_ctzll(v); u = u >> u_shift; v = v >> v_shift; while (!(u & 1)) u /= 2; do { v >>= __builtin_ctzll(v); if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << (u_shift < v_shift ? u_shift : v_shift); } ``` - 使用 perf 進行效能測試,分別執行 1000000 次。 `$ perf stat ./a.out 1000000` 1. 未使用 `__builtin_ctz` 的版本 ```shell Performance counter stats for './a.out 1000000': 177.98 msec task-clock 0.178229090 seconds time elapsed 0.178248000 seconds user 0.000000000 seconds sys ``` 2. 使用 `__builtin_ctz` 的版本 ```shell Performance counter stats for './a.out 1000000': 323.33 msec task-clock 0.323722946 seconds time elapsed 0.319772000 seconds user 0.003997000 seconds sys ``` - 從結果可以看到使用 `__builtin_ctz` 的版本的 performance 好了不少,大約是 1.8 倍,果然硬體加速的效益還是最直接的。 ### Linux 核心中也內建 GCD,請解釋其實作手法和探討在核心內的應用場景。 [lib/math/gcd.c](https://github.com/torvalds/linux/blob/master/lib/math/gcd.c) 提供了兩種 GCD 函式,原因是當硬體支援 __ffs 指令時,第一種方法會比較快。 > If __ffs is available, the even/odd algorithm benchmarks slower. - 使用 `__ffs` 的實作 - `__ffs` 可以找到 first set bit 的位置,和 `__builtin_ctz` 的作用是一樣的。 - 過程中用到 `r & -r` 的技巧,和測驗四的技巧相同,可以保留最低位的 1 ,其他都清成 0。 ```cpp unsigned long gcd(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __ffs(b); if (b == 1) return r & -r; for (;;) { a >>= __ffs(a); if (a == 1) return r & -r; if (a == b) return a << __ffs(r); if (a < b) swap(a, b); a -= b; } } ``` --- ## 測驗 4 : bitset 考慮 GNU extension 的 [`__builtin_ctzll`](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 的行為是回傳由低位往高位遇上連續多少個 `0` 才碰到 `1`。 用以改寫的程式碼如下: ```cpp #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = EXP6; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } ``` - 其中第 9 行的作用是找出目前最低位元的 `1`,並紀錄到 `t` 變數。舉例來說,若目前 `bitset` 為 000101b000101b,那 `t` 就會變成 000001b000001b,接著就可以透過 XOR 把該位的 `1` 清掉,其他保留 (此為 XOR 的特性)。 - 2's complement 的結果為 1's complement + 1。在所有 bits 反轉後加入的 1 會形成一個 carry bit,再次將 target bit (最低位的 1) 之前的 bit 全部反轉,此時 target 的位置會和原本的 bit 同為 1 ,再和原本的數字做 and 就能保留下 target 位置的 1 ,其他位置全為 0。 因此 EXP6 = `bitset & -bitset` - 本題使用的方法是以 `bitset & -bitset` 保留下最低位的 1 ,再套用 `_builtin_ctzll` 取出其位置,如此一來就可以避免逐個 bit 做檢查,因此若 bitmap 越鬆散 (即 `1` 越少), improved 的效益就更高。 ### 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?考慮到不同的 [bitmap density](http://www.vki.com/2013/Support/docs/vgltools-3.html) - 依照不同的 density 產出 bitmap。單純考慮 1 的數量。 ```cpp uint64_t *gen_bitmap(size_t size, float density){ uint64_t *bitmap = malloc(size * sizeof(uint64_t)); int shift = 63 * (1 - density); uint64_t value = ~0ul << shift; for(int i = 0; i < size; i++) bitmap[i] = value; return bitmap; } ``` - 試著調整 bitmapsize, density , 搭配 perf 測試 naive 和 improve 的效能落差。 - bitmapsize = argv[1] - density = argv[2] - mode = argv[3] == 1 ? naive : improve ```cpp int main(int argc, char *argv[]){ size_t bitmapsize = atol(argv[1]); float density = atof(argv[2]); int mode = atoi(argv[3]); uint32_t *out = (uint32_t *)malloc(bitmapsize * 64 * sizeof(uint32_t)); uint64_t *bitmap = gen_bitmap(bitmapsize, density); if(mode == 1) naive(bitmap, bitmapsize, out); else improved(bitmap, bitmapsize, out); } ``` - 用 perf 測量 bitmapsize = 1000000 的 execution time 比較 (msec) | density | naive | improve | | ------- | ------ | ------- | | 0 | 131.92 | 8.43 | | 0.25 | 150.51 | 58.86 | | 0.5 | 171.38 | 120.17 | | 1 | 190.47 | 221.54 | ![](https://i.imgur.com/Mfhc45y.png) - 從結果可以看出 bitmap 的 density 越低,使用硬體加速的優勢越明顯,因為原本需要迭代 64 次的迴圈 ,迭代次數可減少為 1 的個數。 但是當密度為 100% 時,improve 的版本卻輸給 naive ,可能是因為此時兩者都要迭代 64 次,但是 improve 版本在迴圈內的指令複雜的多。 ### 閱讀 [Data Structures in the Linux Kernel](https://0xax.gitbooks.io/linux-insides/content/DataStructures/linux-datastructures-3.html) 並舉出 Linux 核心使用 bitmap 的案例 - Bitmap 最常見的利用就是節省記憶體,尤其當我們只需要紀錄一個二元的『 狀態 』,例如有 vs 無,存在 vs 不存在 ,就可以用 bitmap 來表示,就可以用 bitmap 來表示。 相較於使用布林陣列中一個 byte 只能紀錄一個物件,使用 bitmap 可以在一個 byte 中就紀錄 8 個物件。

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully