N96104234曾晧峖
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # 2022q1 Homework2 (quiz2) contributed by < `tseng0201` > ## 作業要求 * 重新回答[第 2 周測驗題](https://hackmd.io/@sysprog/linux2022-quiz2)^從測驗一到測驗七^,附帶的「==延伸問題==」也需要完成 * 解釋程式運作原理時,應提供對應的 [Graphviz](https://graphviz.org/) 圖例,可參照 [Linked List 題目 1 + 分析](https://hackmd.io/@sysprog/linked-list-quiz) * 比照 [課前測驗參考解答: Q1](https://hackmd.io/s/ByzoiggIb), [Linked list 題目分析](https://hackmd.io/s/HyELy5bTz) 和 [參考題解](https://hackmd.io/@RinHizakura/BysgszHNw) 的模式來撰寫共筆,需要詳細分析自己的思路、參閱的材料 (以第一手材料為主,包含 C 語言規格書的章節),以及==進行相關實驗==。 * 將你的共筆加到 [2022q1 Homework2 (作業區)](https://hackmd.io/@sysprog/linux2022-homework2) ## 作業進度 - [x] 測驗1 - [ ] 測驗2 - [ ] 測驗3 - [ ] 測驗4 - [ ] 測驗5 - [ ] 測驗6 - [ ] 測驗7 ### 測驗一 :::info - [x] 解釋下方程式碼運作的原理 - [X] 比較下方實作在編譯器最佳化開啟的狀況,對應的組合語言輸出,並嘗試解讀 (可研讀 CS:APP 第 3 章) - [x] 研讀 Linux 核心原始程式碼 include/linux/average.h,探討其 Exponentially weighted moving average (EWMA) 實作,以及在 Linux 核心的應用 >移動平均(Moving average),又稱滾動平均值、滑動平均,在統計學中是種藉由建立整個資料集合中不同子集的一系列平均數,來分析資料點的計算方法。 移動平均通常與時間序列資料一起使用,以消除短期波動,突出長期趨勢或周期。短期和長期之間的閾值取決於應用,移動平均的參數將相應地設置。例如,它通常用於對財務數據進行技術分析,如股票價格、收益率或交易量。它也用於經濟學中研究國內生產總值、就業或其他宏觀經濟時間序列。 ::: 考慮以下對二個無號整數取平均值的程式碼: - [ ] 版本一 ```c #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return (a + b) / 2; } ``` 這個直覺的解法會有 overflow 的問題,若我們已知 a, b 數值的大小,可用下方程式避免 overflow: - [ ] 版本二 ```c #include <stdint.h> uint32_t average(uint32_t low, uint32_t high) { return low + (high - low) / 2; } ``` 或是說我們可以看成 $a/2+b/2$, 但是由於 int 會進行無條件捨去所以我們還要額外處理最後一位的部份, 由真值表可以看出當只有 a 和 b 2進位表示時最後一位都為1時才會發生數值正確的情形在程式中以 `a & b & 1` 進行處理 | a | b | `a + b` 溢位 | |:---:|:---:|:-----------:| | 0 | 0 | 0 | | 0 | 1 | 0 | | 1 | 0 | 0 | | 1 | 1 | 1 | 版本三 #include <stdint.h> uint32_t average(uint32_t a, uint32_t b) { return a>>2 + b>>2 + (a & b & 1) } 接者可以再把加法可以看成 不進位的加法加上其進位 不進位的加法部分為 a ^ b 進位部分為 a & b << 1 再將兩者除以2改成以下等價的實作 版本四 uint32_t average(uint32_t a, uint32_t b) { return (a & b) + ((a ^ b) >> 1); } 組合語言 設定: GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0" 參數: gcc -Og -s 常使用到的暫存器 : %rdi : 除存第一個傳入參數 %rsi : 除存第二個傳入參數 %eax : 除存回傳值 版本一 解釋 透過 leal 指令直接將 rdi 與 rsi 相加後存入 eax 等於進行 a + b 後 直接存入esi 對 eax 進行 右移1 回傳 ``` .LFB0: .cfi_startproc endbr64 leal (%rdi,%rsi), %eax shrl %eax ret .cfi_endproc .LFE0: ``` 版本二 解釋 1.將a移入%eax 2.計算 a-b 並存入%eax 3.計算(a-b)>>1 4.計算 a + (a-b)>>1 ``` .LFB1: .cfi_startproc endbr64 movl %edi, %eax subl %esi, %eax shrl %eax addl %edi, %eax ret .cfi_endproc .LFE1: ``` 版本三 解釋: 將 a 移入 %eax 右移 %eax 計算 a>>1 將 b 移入 %edx 右移 %edx 計算 b>>1 計算 a>>1 + b>>1 存在 %eax 將 b 移入 %edi 計算 a & b 存在%edi 計算 edi & 1 存在%edi 將 %eax(a>>1 + b>>1) 與 %edi(a & b & 1) 相加後存入 %eax 回傳 ``` .LFB2: .cfi_startproc endbr64 movl %edi, %eax shrl %eax movl %esi, %edx shrl %edx addl %edx, %eax andl %esi, %edi andl $1, %edi addl %edi, %eax ret .cfi_endproc .LFE2: ``` 版本四 解釋: 將 a 移入 %eax 計算 a + b 並存在 %eax 計算 a ^ b 存在 edi 右移 %edi 得 (a ^ b) >> 1 將 %edi 與 %eax 相加 存入 %eax 回傳 ``` .LFB3: .cfi_startproc endbr64 movl %edi, %eax andl %esi, %eax xorl %esi, %edi shrl %edi addl %edi, %eax ret .cfi_endproc .LFE3: ``` 研讀linux average_ 常見函式 參考 : [https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) [linux/build_bug.h](https://github.com/torvalds/linux/blob/master/include/linux/build_bug.h) __builtin_constant_p() 功能 :檢查是否為常數 (constant at compile time), 當該值為常數回傳1,如果不是回傳0 BUILD_BUG_ON() 功能:當條件為真時打斷 complier 並顯示錯誤的條件式 ``` #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) #define BUILD_BUG_ON(condition) \ BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) ``` BUILD_BUG_ON_NOT_POWER_OF_2() 功能:檢查該值是否為 2, 的 n 次方 (e.g.,1、2、4.....) ``` #define BUILD_BUG_ON_NOT_POWER_OF_2(n) \ BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0)) ``` WRITE_ONCE 確定寫入時不會被編譯器優化 待完成 :發生場景 ```c #define WRITE_ONCE(var, val) \ (*((volatile typeof(val) *)(&(var))) = (val)) ``` 探討 linux average.h 其實作 首先查閱 [Wikipedia](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) 理解什麼是 Exponentially weighted moving average 可以想成每次都將之前的值以一定比例衰退, 再加上現在的值 其公式為 $$ S_t = \begin{cases} Y_0, & \text{$t$ = 0} \\ \alpha Y_t+(1-\alpha)S_{t-1}, & \text{$t$ > 0} \end{cases} $$ 再來觀察 linux/average.h, 依其註解所述該 marco 會建立一個初始的結構與若干個輔助函式 (helper functions), 其參數為 : 名稱、精準度(小數部分的位元長度)及 $\alpha$ 的倒數, 由於其參數是固定的, 在調用 Macro 時便會固定其值, 所以會看到許多 BUILD_BUG_ON() 用於檢查其參數使否為常數 更新函式為同上述為$\alpha Y_t+(1-\alpha)S_{t-1}$, 其中$\alpha = 1 / \text{_weight_rcp}$ 同時為了速度的要求 _weight_rcp 只能為 2 的次方(能夠直接位移) ```c #define DECLARE_EWMA(name, _precision, _weight_rcp) \ struct ewma_##name { \ unsigned long internal; \ }; \ static inline void ewma_##name##_init(struct ewma_##name *e) \ { \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ /* \ * Even if you want to feed it just 0/1 you should have \ * some bits for the non-fractional part... \ */ \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ e->internal = 0; \ } \ static inline unsigned long \ ewma_##name##_read(struct ewma_##name *e) \ { \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ return e->internal >> (_precision); \ } \ static inline void ewma_##name##_add(struct ewma_##name *e, \ unsigned long val) \ { \ unsigned long internal = READ_ONCE(e->internal); \ unsigned long weight_rcp = ilog2(_weight_rcp); \ unsigned long precision = _precision; \ \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ \ WRITE_ONCE(e->internal, internal ? \ (((internal << weight_rcp) - internal) + \ (val << precision)) >> weight_rcp : \ (val << precision)); \ } ``` 首先先看看其結構 僅僅只有一個值用於紀錄 EWMA ```c struct ewma_##name { unsigned long internal; }; ``` 接下來是初始化的部分,利用 BUILD_BUG_ON 檢查參數是否正確並要求 _precision 必須保留整數的部分, 最後將 EWMA 初始值設為 0 ```c static inline void ewma_##name##_init(struct ewma_##name *e) \ { \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ /* \ * Even if you want to feed it just 0/1 you should have \ * some bits for the non-fractional part... \ */ \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ e->internal = 0; \ } ``` 最後是 add 用於計算下一輪的 EWMA 需要傳入 val作為新的值, 使用函式 iloc() 計算 weight_rcp 等同於多少次 向右位移, 並將最終結果寫入 e->internal READ_ONCE 和 WRITE_ONCE 巨集, 用於避免編譯器優化時忽略或調換指令的順序 註 : 其實現$(1-\alpha)$方法為 $(s_{t-1}* rcp - s_{t-1}) / rcp$ 避免分數計算及可使用位移與減法實現 ```c static inline void ewma_##name##_add(struct ewma_##name *e, \ unsigned long val) \ { \ unsigned long internal = READ_ONCE(e->internal); \ unsigned long weight_rcp = ilog2(_weight_rcp); \ unsigned long precision = _precision; \ \ BUILD_BUG_ON(!__builtin_constant_p(_precision)); \ BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp)); \ BUILD_BUG_ON((_precision) > 30); \ BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp); \ \ WRITE_ONCE(e->internal, internal ? \ (((internal << weight_rcp) - internal) + \ (val << precision)) >> weight_rcp : \ (val << precision)); \ } ``` 應用 --- ### 測驗二 :::info 延伸問題: - [x] 解釋下方碼運作的原理 - [ ] 針對 32 位元無號/有號整數,撰寫同樣 branchless 的實作 - [ ] Linux 核心也有若干 branchless / branch-free 的實作,例如 lib/sort.c: ``` * Logically, we're doing "if (i & lsbit) i -= size;", but since the * branch is unpredictable, it's done with a bit of clever branch-free * code instead. */ __attribute_const__ __always_inline static size_t parent(size_t i, unsigned int lsbit, size_t size) { i -= size; i -= size & -(i & lsbit); return i / 2; } ``` 請在 Linux 核心原始程式碼中,找出更多類似的實作手法。請善用 git log 檢索。 ::: 改寫〈[解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation)〉一文的「不需要分支的設計」一節提供的程式碼 min,我們得到以下實作 (max): #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((EXP4) & -(EXP5)); } 我們從 a ^ ((EXP4) & -(EXP5)) 開始分析,有兩種情型: 第一種情型 a 是比較大的,我們會希望回傳 a ,此時 ((EXP4) & -(EXP5)) = 0 第二種情型 a 是比較大的,我們會希望回傳 b ,此時我們需要將 a 消去並且產生 b 此時可藉由 a ^ a 會抵消的概念推得((EXP4) & -(EXP5)) = a ^ b, 此時可以推出 EXP4 或 EXP5 其中一個會是 a ^ b 接下來再考慮 & 任何數與0xffffffff做&可以保留原本的數值,反之與0做&只能得到0,故需要一個判斷式決定結果應該是保留 a ^ b 或是得到0, 得 EXP4 或 EXP5 其中一項為 a > b 或是 a < b,再觀察 > 只會產生0或1,0是我們想要的但1不是我們想要的是0xfffffff也就是-1,此時觀察 EXP5 有個負號正是我們所需要的,最後可得下列答案 ``` #include <stdint.h> uint32_t max(uint32_t a, uint32_t b) { return a ^ ((a ^ b) & -(a < b)); } ``` 針對有號整數進行相同 branchless 的實作 ``` int32_t max(int32_t a, int32_t b) { return a - ((a-b) & (a-b)>>31); } ``` 在查詢資料的過程中發現有人提到寫成以下方法 並使用gcc -O2 優化時, 會使用 cmovge 指令, 經查詢其為條件移動指令, 不會因為分支預測失敗而不執行, 他是透過 EFLAG 或 RFLAG 的值決定是否進行 move 但是延遲較高, 不一定會比使用分支預測的程式好。 ``` int32_t max(int32_t a, int32_t b) { return (a > b) ? a : b; } gcc 4.7.3 -O2 -m32 max(int, int): mov eax, DWORD PTR [esp+4] mov edx, DWORD PTR [esp+8] cmp edx, eax cmovge eax, edx ret ``` 預計實驗 比較上述 branchless 程式與使用 cmove 指令何者較快 --- ### 測驗三 :::info 延伸問題: - [x] 解釋下方程式運作原理; - [x] 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升; - [ ] Linux 核心中也內建 GCD (而且還不只一種實作),例如 lib/math/gcd.c,請解釋其實作手法和探討在核心內的應用場景。 ::: 考慮以下 64 位元 GCD ([greatest common divisor](https://en.wikipedia.org/wiki/Greatest_common_divisor), 最大公因數) 求值函式: #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; while (v) { uint64_t t = v; v = u % v; u = t; } return u; } 註: GCD 演算法 1. If both x and y are 0, gcd is zero $gcd(0,0)=0$. 2. $gcd(x,0)=x$ and $gcd(0,y)=y$ because everything divides 0. 3. If x and y are both even, $gcd(x,y)=2∗gcd(x2,y2)$ because 2 is a common divisor. Multiplication with 2 can be done with bitwise shift operator. 4. If x is even and y is odd, $gcd(x,y)=gcd(\dfrac{x}{2},y)$. 5. On similar lines, if x is odd and y is even, then $gcd(x,y)=gcd(x,\dfrac{y}{2})$.It is because 2 is not a common divisor. 改寫為以下等價實作: #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift; for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } while (!(u & 1)) u /= 2; do { while (!(v & 1)) v /= 2; if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (COND); return RET; } 我們可以從GCD演算法第3點 $gcd(x,y)=2∗gcd(\dfrac{x}{2},\dfrac{y}{2})$ 以及下列式子得出 shift 作用為記錄執行第3點幾次最後回傳時到時候要乘回去 (RET << shift) ``` for (shift = 0; !((u | v) & 1); shift++) { u /= 2, v /= 2; } ``` 接下來可以觀察到下列式子當 u < v 時,將 v 不斷減去 u 就像是再進行 v % u 而另一部分就像是在做swap ``` if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } ``` swap後 u = v, v = u - v 我們知道當其中一方為0後 gcd 便可以算完 當 u = v 時才會發生相減為0,而 u - v 由 v 儲存, 至此便可推出 COND 為 v 而 RET 便是 U << shift 在 x86_64 上透過 __builtin_ctz 改寫 GCD,分析對效能的提升(使用linux time 指令測量) 參考來源 : https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html __builtin_ctz(): 算出從右邊數來的第一個1前有幾個0,當輸入為0為 undefined 由於函式輸入是uint64_t 故使用__builtin_ctzll ``` #include <stdint.h> uint64_t gcd64(uint64_t u, uint64_t v) { if (!u || !v) return u | v; int shift = __builtin_ctzll(u | v); u = u >> __builtin_ctzll(u); while (!(u & 1)) u /= 2; do { v = v >> __builtin_ctzll(v); if (u < v) { v -= u; } else { uint64_t t = u - v; u = v; v = t; } } while (v); return u << shift; } ``` 效能比較: 使用迴圈進行 gcd64 10000次進行比較 使用 -Og 優化 使用time 進行執行時間測量 ![](https://i.imgur.com/wwJGRjl.png) 原始版 ![](https://i.imgur.com/afzCr2M.png) 改良版 ![](https://i.imgur.com/I62g3QA.png) ### 測驗四 :::info - [X] 解釋下方程式運作原理,並舉出這樣的程式碼用在哪些真實案例中; - [ ] 設計實驗,檢驗 ctz/clz 改寫的程式碼相較原本的實作有多少改進?應考慮到不同的 bitmap density; - [ ] 思考進一步的改進空間; - [ ] 閱讀 Data Structures in the Linux Kernel 並舉出 Linux 核心使用 bitmap 的案例; ::: 下方程式碼的目地為找出在 bitset 中每個1所在的位置並紀錄在out 並回傳總共有幾個1 #include <stddef.h> size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out) { size_t pos = 0; uint64_t bitset; for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; while (bitset != 0) { uint64_t t = bitset & -bitset; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } } return pos; } 首先看向第一個迴圈,便是把 bitset 每次取 64bit 進行處理 for (size_t k = 0; k < bitmapsize; ++k) { bitset = bitmap[k]; } 再看向第二個迴圈 當 bitset 不為0時繼續尋找該 bitset 中的 1, 首先 t 紀錄目前 bitset 中最右邊的1位置, 透過 bitset & -bitset 取得 原理是負值為相反再加1, bitset相反時0都變成1,所以原本從右邊開始連續的0都會變成連續的1, 再+1後會不斷進位至 bitset 最右邊1的位置此時做&就能剛好保留原本 bitset 中最右邊的1 接下來透過__builtin_ctzll(bitset) 取得最右邊的1是第幾位 透過 out[pos++] = k * 64 + r 紀錄1的位置 最後 再透過 xor 0是保留,1是相反的特性消除最右邊的1,不斷循環直到該 bitset 值為0就取下一組 bitset while (bitset != 0) { uint64_t t = bitset & -bitset; int r = __builtin_ctzll(bitset); out[pos++] = k * 64 + r; bitset ^= t; } 應用查詢中 ### 測驗五 ### 測驗六 :::info - [x] 解釋下方程式運作原理 - [ ] 在 Linux 核心原始程式碼中找出 __alignof__ 的使用案例 2 則,並針對其場景進行解說 - [ ] 在 Linux 核心源使程式碼找出 ALIGN, ALIGN_DOWN, ALIGN_UP 等巨集,探討其實作機制和用途,並舉例探討 (可和上述第二點的案例重複)。思索能否對 Linux 核心提交貢獻,儘量共用相同的巨集 ::: __ alignof __ 是 GNU extension,以下是其可能的實作方式: 再記憶體中 char c 與 struct _h 可能會以下圖方法排列 (參考 [KHLee529](https://hackmd.io/xMiGthU3RCuMuaDpk0yxPg?both) ) ```graphviz digraph{ node[shape = plaintext]; struct[label = "{<f0> char c} | padding | {<f1> t _h}"; shape = record]; aa[label="addr + alignment"]; al[label="alignment"]; addr->struct:f0:nw; aa->struct:f1:nw; struct:f1:sw->0[style="invis"]; 0->struct:f0:sw; struct:f3:sw->al[style="invis"]; al->struct:f1:sw; } ``` 透過取得 _h 減去 0 的距離,便可取得 alignment 的大小 ``` /* * ALIGNOF - get the alignment of a type * @t: the type to test * * This returns a safe alignment for the given type. */ #define ALIGNOF(t) \ ((char *)(&((struct { char c; t _h; } *)0)->_h) - (char *)0) ``` ### 測驗七 :::info - [x] 解釋下方程式運作原理並評估 naive.c 和 bitwise.c 效能落差 避免 stream I/O 帶來的影響,可將 printf 更換為 sprintf - [ ] 分析 Faster remainders when the divisor is a constant: beating compilers and libdivide 一文的想法 (可參照同一篇網誌下方的評論),並設計另一種 bitmask,如「可被 3 整除則末位設為 1」「可被 5 整除則倒數第二位設定為 1」,然後再改寫 bitwise.c 程式碼,試圖運用更少的指令來實作出 branchless; - [ ] 參照 fastmod: A header file for fast 32-bit division remainders on 64-bit hardware 研讀 The Fastest FizzBuzz Implementation 及其 Hacker News 討論,提出 throughput (吞吐量) 更高的 Fizzbuzz 實作 - [ ] 解析 Linux 核心原始程式碼 kernel/time/timekeeping.c 裡頭涉及到除法運算的機制,探討其快速除法的實作 (注意: 你可能要對照研讀 kernel/time/ 目錄的標頭檔和程式碼) 過程中,你可能會發現可貢獻到 Linux 核心的空間,請充分討論 ::: 考慮貌似簡單卻蘊含實作深度的 FizzBuzz 題目: 從 1 數到 n,如果是 3的倍數,印出 “Fizz” 如果是 5 的倍數,印出 “Buzz” 如果是 15 的倍數,印出 “FizzBuzz” 如果都不是,就印出數字本身 直覺的實作程式碼如下: (naive.c) ``` #include <stdio.h> int main() { for (unsigned int i = 1; i < 100; i++) { if (i % 3 == 0) printf("Fizz"); if (i % 5 == 0) printf("Buzz"); if (i % 15 == 0) printf("FizzBuzz"); if ((i % 3) && (i % 5)) printf("%u", i); printf("\n"); } return 0; } ``` 以下是利用 bitwise 和上述技巧實作的 FizzBuzz 程式碼: (bitwise.c) 1.透過Faster remainders when the divisor is a constant: beating compilers and libdivide快速計算是否可被 3、5 整除 2.透過計算是否可 整除3 整除5 決定 取得字串 "FizzBuzz%u"的那些部分 ``` static inline bool is_divisible(uint32_t n, uint64_t M) { return n * M <= M - 1; } static uint64_t M3 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 3 + 1; static uint64_t M5 = UINT64_C(0xFFFFFFFFFFFFFFFF) / 5 + 1; int main(int argc, char **argv) { for (size_t i = 1; i <= 100; i++) { uint8_t div3 = is_divisible(i, M3); uint8_t div5 = is_divisible(i, M5); unsigned int length = (2 << div5) << div3; char fmt[9]; strncpy(fmt, &"FizzBuzz%u"[(8 >> div5) >> (div3 << 2)], length); fmt[length] = '\0'; printf(fmt, i); printf("\n"); } return 0; } ``` 我認為兩者比較大的要能落差 1. 在 naive.c 中會不斷重複計算 e.g. 可以被15整除 及等於可以被 3 跟 5 整除而 bitwise.c 只計算是否可以被 3 跟 5 整除並記錄其結果 2. bitwise.c並沒有使用 == 進行比較而是使用 bitwise 操作減少分支 3. 根據文章 [Faster remainders when the divisor is a constant: beating compilers and libdivide](https://lemire.me/blog/2019/02/08/faster-remainders-when-the-divisor-is-a-constant-beating-compilers-and-libdivide/) 所述 is_divisible 的計算速度較快原因是其除法只用了一次(初始化 M3、M5)之後都是用乘法計算是否可被整除而一般的 % 操作每次都是除法, 而除法比乘法需要耗更多時間

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully