許雅雯
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    # 2017q3 Homework1 (clz) contributed by <`tina0405`> ## 閱讀 [重新理解數值](https://hackmd.io/s/BkRKhQGae) * count leading zero([clz](https://en.wikipedia.org/wiki/Find_first_set#CLZ)) * wiki 裡面提到: ~~~ Count Leading Zeros (clz) counts the number of zero bits preceding the most significant one bit. ~~~ * 例子: 32-bits 0x00000F00 的 clz 會是多少呢? 0000_0000_0000_0000_0000_1111_0000_0000 在有效位數前有 20 個 0 所以 clz 就是 20 了 * wiki 中的算法,以下是 non-optimized : ~~~c= int clz1( uint32_t x ) { int n; if (x == 0) return 32; for (n = 0; ((x & 0x80000000) == 0); n++, x <<= 1); return n; } ~~~ * 因為和 32-bits 1000_0000_0000_0000_0000_0000_0000_0000 ,只要 input 一直往左 shift 直到不為 0 就是最高有效位前的 0 數目。 * x <<= 1 意思是 x = x<<1 * 第 2 種的算法是利用查表和 n += 4 和 x <<= 4 來減少 for 迴圈的次數 ~~~ c= static const uint8_t clz_table_4bit[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 }; int clz2( uint32_t x ) { int n; if (x == 0) return 32; for (n = 0; ((x & 0xF0000000) == 0); n += 4, x <<= 4); n += (int)clz_table_4bit[x >> (32-4)]; return n; } ~~~ * 這邊我想特別解釋 table 數字的意義 { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 } * 因為這個迴圈是以 4-bit 為單位,所以總共可以表達 16 個數,因此: ~~~ 0000 -> 在有效位前有 4 個 0 0001 -> 在有效位前有 3 個 0 0010 -> 在有效位前有 2 個 0 0011 -> 在有效位前有 2 個 0 0100 -> 在有效位前有 1 個 0 0101 -> 在有效位前有 1 個 0 0110 -> 在有效位前有 1 個 0 0111 -> 在有效位前有 1 個 0 1000 -> 在有效位前有 0 個 0 1001 -> 在有效位前有 0 個 0 1010 -> 在有效位前有 0 個 0 1011 -> 在有效位前有 0 個 0 1100 -> 在有效位前有 0 個 0 1101 -> 在有效位前有 0 個 0 1110 -> 在有效位前有 0 個 0 1111 -> 在有效位前有 0 個 0 ~~~ * 在 $<<重新理解數值>>$ 的本文中說明 clz 應用在 log 以 2 為底的計算 ~~~ 當我們計算 log2 (以2為底的對數) 時, 其實只要算高位有幾個 0’s bits. 再用 31 減掉即可。 ~~~ * 在以前的理解方面,只將數字換算出是 2 的 X 次方而得到答案,現在只需利用 shift 和邏輯運算就可快速得到 * 如果以 10 為底,不能 shift 則可利用查表法來避免除法 ## 比較 32-bit 數值對以下實做的效能差異 ### recursive version ~~~ c= static const int mask[] = {0, 8, 12, 14}; static const int magic[] = {2, 1, 0, 0}; unsigned clz2(uint32_t x,int c) { // *INDENT-OFF* if (!x && !c) return 32; uint32_t upper = (x >> (16 >> c)); uint32_t lower = (x & (0xFFFF>>mask[c])); if (c == 3) return upper ? magic[upper] : 2 + magic[lower]; return upper ? clz2(upper, c + 1) : (16 >> (c)) + clz2(lower, c + 1); // *INDENT-ON* } ~~~ * 這邊一直分 upper 和 low ,如果 upper =0 (沒有有效位),就去追求 lower 的 recursive ,而 lower 又會再分成 upper 和 lower , 如果 upper =1,就去找 upper , 因為目的是看最高有效位前有幾個0。 * upper = (x >> (16 >> c)) 這邊的每一輪多 shift 1位是因為切一半 , 也可以想乘除以 2 * magic[] = {2, 1, 0, 0} 也跟上面解釋的 wiki 的寫法類似 , 最後一輪寫死 ### iteration version ~~~c= static inline __attribute((always_inline)) unsigned clz(uint32_t x) { // *INDENT-OFF* int n = 32, c = 16; do { uint32_t y = x >> c; if (y) { n -= c; x = y; } c >>= 1; } while (c); return (n - x); // *INDENT-ON* } ~~~ * iteration 是從一半一直往右移如果為 0 ,則再往右移現在的一半 , n -= c 是因為用扣的如果在有效位數後的值愈多,有效位數前就愈少,全部加起來固定 32-bits ### binary search technique ~~~c= static inline __attribute((always_inline)) unsigned clz(uint32_t x) { // *INDENT-OFF* if (x == 0) return 32; int n = 0; if (x <= 0x0000FFFF) { n += 16; x <<= 16; } if (x <= 0x00FFFFFF) { n += 8; x <<= 8; } if (x <= 0x0FFFFFFF) { n += 4; x <<= 4; } if (x <= 0x3FFFFFFF) { n += 2; x <<= 2; } if (x <= 0x7FFFFFFF) { n += 1; x <<= 1; } return n; // *INDENT-ON* } ~~~ * 比完的大小後,就可得知前面的 0 數目,然後一直累加 ### byte-shift version ~~~c= static inline __attribute((always_inline)) unsigned clz(uint32_t x) { // *INDENT-OFF* if (x == 0) return 32; int n = 1; if ((x >> 16) == 0) { n += 16; x <<= 16; } if ((x >> 24) == 0) { n += 8; x <<= 8; } if ((x >> 28) == 0) { n += 4; x <<= 4; } if ((x >> 30) == 0) { n += 2; x <<= 2; } n = n - (x >> 31); return n; // *INDENT-ON* } ~~~ ### Harley’s algorithm ~~~c= static inline __attribute((always_inline)) unsigned clz(uint32_t x) { #ifdef CTZ // *INDENT-OFF* static uint8_t const Table[] = { 0xFF, 0, 0xFF, 15, 0xFF, 1, 28, 0xFF, 16, 0xFF, 0xFF, 0xFF, 2, 21, 29, 0xFF, 0xFF, 0xFF, 19, 17, 10, 0xFF, 12, 0xFF, 0xFF, 3, 0xFF, 6, 0xFF, 22, 30, 0xFF, 14, 0xFF, 27, 0xFF, 0xFF, 0xFF, 20, 0xFF, 18, 9, 11, 0xFF, 5, 0xFF, 0xFF, 13, 26, 0xFF, 0xFF, 8, 0xFF, 4, 0xFF, 25, 0xFF, 7, 24, 0xFF, 23, 0xFF, 31, 0xFF, }; // *INDENT-ON* #else // *INDENT-OFF* static uint8_t const Table[] = { 32, 31, 0, 16, 0, 30, 3, 0, 15, 0, 0, 0, 29, 10, 2, 0, 0, 0, 12, 14, 21, 0, 19, 0, 0, 28, 0, 25, 0, 9, 1, 0, 17, 0, 4, 0, 0, 0, 11, 0, 13, 22, 20, 0, 26, 0, 0, 18, 5, 0, 0, 23, 0, 27, 0, 6, 0, 24, 7, 0, 8, 0, 0, 0 }; // *INDENT-ON* #endif /* Propagate leftmost 1-bit to the right */ x = x | (x >> 1); x = x | (x >> 2); x = x | (x >> 4); x = x | (x >> 8); x = x | (x >> 16); /* x = x * 0x6EB14F9 */ x = (x << 3) - x; /* Multiply by 7. */ x = (x << 8) - x; /* Multiply by 255. */ x = (x << 8) - x; /* Again. */ x = (x << 8) - x; /* Again. */ return Table[(x >> 26)]; } ~~~ * 找 TABLE 意義 ## 觀察原圖表 ![](https://i.imgur.com/c9GRkEM.png) * 這邊探討從 67100000 到 67118000,是因為 ~~~c= unsigned int min = atoi(argv[1]); unsigned int max = atoi(argv[2]); ~~~ * 而 MAKEFILE 也有給定參數( run裡 ) ~~~ ./$$method 67100000 67116384; ~~~ >> 因為看圖的時候,一直覺得 recurcive 版只做 4 (c=0~c=3)次 if ,binary 做了5次,但 cycle 還是比較少 ,因此不太明白 cycle 是如何產升的,和什麼會影響 cycle * 嘗試理解 get_cycles ~~~c= void get_cycles(unsigned *high, unsigned *low) { asm volatile ("CPUID\n\t" "RDTSC\n\t" "mov %%edx, %0\n\t" "movl %%eax, %1\n\t": "=r" (*high), "=r" (*low)::"%rax","%rbx","%rcx","%rdx" ); } ~~~ * [參考](http://www.ethernut.de/en/documents/arm-inline-asm.html) * 固定格式 asm(code : output operand list : input operand list : clobber list); * 沒放資料也要 ":" 但可以空著,像這邊就沒有 input operand list * %0 %1 是 output operand list 的第 0 筆和第 1 筆 (*high) (*low) * clobber list: 直接存取的 rigister 會放在 clobber list * 這邊可以計算 cycle 數,是因為 RDTSC * 參考[x86 Instruction Set Reference_RDTSC](http://x86.renejeschke.de/html/file_module_x86_id_278.html) * 原來是因為這樣程式才要選擇 edx 和 eda | Opcode | Mnemonic | Description | | -------- | -------- | -------- | |0F 31 | RDTSC | Read time-stamp counter into EDX:EAX | * 時間戳隨著 clock cycle持續增加 ~~~ The processor monotonically increments the time-stamp counter MSR every clock cycle and resets it to 0 whenever the processor is reset. ~~~ * MSR 的解釋:model specific register (MSR) that accurately counts the cycles that occur on the processor ## 作業要求 * 閱讀 [重新理解數值](https://hackmd.io/s/BkRKhQGae) 裡頭關於 count leading zero ([clz](https://en.wikipedia.org/wiki/Find_first_set#CLZ)) 的描述,設計實驗,比較 32-bit 數值對以下實做的效能差異: * recursive version * iteration version * binary search technique * byte-shift version * Harley's algorithm * 除了在 [重新理解數值](https://hackmd.io/s/BkRKhQGae) 列出的程式,詳閱[劉亮谷的實驗紀錄](/s/BJBZn6Q6),予以重現並解釋個別實作的運作原理 * 解說影片: [Count Leading Zero](https://www.youtube.com/watch?v=DRkXFjLfaVg) [必看!] * 走訪全部的 32-bit 數值,用上述演算法帶入計算 clz 值,先驗證正確性,如果演算法不正確,試圖改正 * 在 GitHub 上 fork [clz-tests](https://github.com/sysprog21/clz-tests),以此為基礎進行以下調整 (如在 9 月 23 日就 fork 過,那請重新 fork) * 用 C99 或 C11 改寫程式碼,其中 C11 的 [_Generic](http://www.robertgamble.net/2012/01/c11-generic-selections.html) 非常好用,詳情可見 [你所不知道的 C 語言:前置處理器應用篇](https://hackmd.io/s/S1maxCXMl) * 比照 [phonebook](/s/rJYD4UPKe) 和 [compute-pi](/s/HkiJpDPtx),設計一套 benchmark suite,得以針對所有的 32-bit 數值進行個別 clz 實做效能的分析,並透過 gnuplot 製圖 * 要附上個別數值實際耗費時間,不能只列出平均值 * 提供落點分析圖,類似 [tcp-anaysis](https://upload.wikimedia.org/wikipedia/commons/6/65/Gnuplot_tcp_analysis.png) ([with-code](https://commons.wikimedia.org/wiki/File:Gnuplot_tcp_analysis.png)) * 為了避免編譯器最佳化的影響,務必指定編譯參數 `-O0` 來抑制最佳化 * 找至少 3 個案例,說明 clz 的應用場合 * 示範: [A Fast Hi Precision Fixed Point Divide](http://me.henri.net/fp-div.html) * 提示:透過 [Google Books](https://books.google.com/) 可以找到一些專門探討 optimization 的書籍,裡頭會有 clz 的應用 * 將你的觀察、上述要求的解說、應用場合探討,以及各式效能改善過程,善用 gnuplot 製圖,紀錄於「[作業區](https://hackmd.io/s/HyxQTaZj-)」

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully