freshLiver
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- tags: linux2022, quiz --- # 2022q1 Homework3 (quiz3) contributed by < `freshLiver` > ## 第一題 - `GENMASK` (LEFT, RIGHT) ```c // 產生一個第 `l` 到第 `h` 個位元皆為 1、而其他位元則為 0 的數值 #define GENMASK(h, l) \ (((~0UL) >> (LEFT)) & ((~0UL) >> (l) << (RIGHT))) ``` 這段程式碼是利用對 `~0UL`(64 個 1)進行位移來將重疊部份(第 `l` 到 `h` 位元)以外的部份皆設為 0,而需要設為 0 的區段有: - 前 `63 - h` 個位元:因此 ==LEFT== 應為 `(63 - h)`。 - 後 `l` 個位元:由於已經先右移 `l` 位元了,所以要再左移 `l` 位元才能使最低 `l` 位元皆為 0,因此 ==RIGHT== 應為 `l`。 ### 比較 Linux 核心 `GENMASK` 巨集的實作,闡述其額外的考量 #### 先備知識 在看 Linux 核心中 `GENMASK` 的實作之前,要先來了解相關巨集以及 [GNU C Extensions](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 的用途以及原理: ##### 巨集 - `UL` ```c // include/uapi/linux/const.h #define __AC(X,Y) (X##Y) #define _AC(X,Y) __AC(X,Y) #define _UL(x) (_AC(x, UL)) // include/vdso/const.h #define UL(x) (_UL(x)) ``` `UL` 是一個定義在 [include/vdso/const.h](https://github.com/torvalds/linux/blob/master/include/vdso/const.h) 中的巨集,展開後會變成 `(((x##Y)))`,而其中的 `##` [token concatenation](https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html#Concatenation),可以在前處理階段時將 `x`、`y` 兩個 token 合併,因此這個巨集展該後相當於在 `x` 後加上一個 `UL` 後綴。 接著根據 [n1570 的 6.4.4.1 Integer constants](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#%5B%7B%22num%22%3A162%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D) 可以知道 `U` 以及 `L` 分別代表了 **unsigned-suffix** 以及 **long-suffix**。而在 6.4.4.1 的第 2 項則說明了這個後綴是用來指定一整數常數的型別: > 2. An integer constant begins with a digit, but has no period or exponent part. It may have a prefix that specifies its base and a **suffix that specifies its type**. 因此可以知道這個巨集是用來表示整數常數 `x` 為 64 位元無號整數([LP64](https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models))。 ##### 巨集 - `BUILD_BUG_ON_ZERO` ```c // include/linux/build_bug.h #ifdef __CHECKER__ #define BUILD_BUG_ON_ZERO(e) (0) #else #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); }))) #endif ``` :::warning TODO : `__CHECKER__` 在哪被定義 ::: 這是一個定義在 [include/linux/build_bug.h](https://github.com/torvalds/linux/blob/master/include/linux/build_bug.h) 的巨集,可以看到當 `__CHECKER__` 有被定義時,這個巨集會展開成: ```c ((int)(sizeof(struct { int:(-!!(e)); }))) ``` 這邊使用了 Bit-Fields 的技巧, > 4. The expression that specifies the width of a bit-field shall be an integer constant expression with a **nonnegative value** that does not exceed the width of an object of the type that would be specified were the colon and expression omitted. $^{122)}$ If the value is zero, the declaration shall have no declarator. 而根據 [6.5.3.3 Unary arithmetic operators](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#%5B%7B%22num%22%3A219%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C-27%2C816%2Cnull%5D) 的第 5 項中則有明確規範一元運算子 `!` 的結果只會是 `0` 或 `1`: > 5. The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, **1 if the value of its operand compares equal to 0**. The result has type int. The expression !E is equivalent to (0==E). 因此當 `e` 為非 0 時,這個 bit-field 的 width 會是 `-1`,但由於 bit-field 的 width 必須為非負整數,所以會產生錯誤造成編譯失敗;相對的,當 `__CHECKER__` 沒定義或是 `e` 為 0 時則不會出現錯誤,而是會展開成常數 0 或是一個回傳 0 的表示式。 :::success 在 [bit-field](https://hackmd.io/@sysprog/c-bitfield?type=view) 一文中則有更詳細的說明。 ::: ##### Extension - `__builtin_choose_expr(const_exp, expr1, expr2)` `__builtin_choose_expr` 是 [GCC 的內建函式](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html),根據文件中對這個函式的說明: > You can use the built-in function \__builtin_choose_expr to evaluate code depending on the value of a constant expression. This built-in function returns exp1 if const_exp, which is an integer constant expression, is nonzero. Otherwise it returns exp2. > > This built-in function is analogous to the ‘? :’ operator in C, except that the expression returned has its type unaltered by promotion rules. Also, the built-in function does not evaluate the expression that is not chosen. For example, if const_exp evaluates to true, exp2 is not evaluated even if it has side effects. > > This built-in function can return an lvalue if the chosen argument is an lvalue. > > If exp1 is returned, the return type is the same as exp1’s type. Similarly, if exp2 is returned, its return type is the same as exp2. 可以知道,這個內建函式跟三元運算子 `?:;` 相似,都是根據條件是否成立來決定要進行的操作,但 `__builtin_choose_expr` 這個函式要求條件 `const_expr` 必須為常數,否則會造成編譯失敗。 :::warning TODO : 與 `?:;` 的用途、優缺點比較 ::: ##### 巨集 - `__is_constexpr` ```c #define __is_constexpr(x) \ (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8))) ``` 這個巨集被定義在 [include/linux/const.h](https://github.com/torvalds/linux/blob/master/include/linux/const.h) 中,可以用來檢查 `x` 是否能夠在**編譯時期**被視為常數。 :::warning TODO : 原理 ::: 若是 `x` 能夠在編譯階段被視為常數的話,這個巨集就會回傳 `1`,否則則會回傳 0。 #### GENMASK 實作 在 [include/linux/bits.h](https://github.com/torvalds/linux/blob/master/include/linux/bits.h) 中有定義 `GENMASK` 相關的巨集: ```c // include/linux/bits.h #define __GENMASK(h, l) \ (((~UL(0)) - (UL(1) << (l)) + 1) & \ (~UL(0) >> (BITS_PER_LONG - 1 - (h)))) #define GENMASK(h, l) \ (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) ``` 而 `GENMASK` 又會根據 `__ASSEMBLY__` 的定義與否展開成不同的內容: ```c #if !defined(__ASSEMBLY__) #include <linux/build_bug.h> #define GENMASK_INPUT_CHECK(h, l) \ (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \ __is_constexpr((l) > (h)), (l) > (h), 0))) #else /* * BUILD_BUG_ON_ZERO is not available in h files included from asm files, * disable the input check if that is the case. */ #define GENMASK_INPUT_CHECK(h, l) 0 #endif ``` 可以看到若是 `__ASSEMBLY__` 有被定義的話,就會展開成 ```c #define GENMASK_INPUT_CHECK(h, l) \ (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \ __is_constexpr((l) > (h)), (l) > (h), 0))) ``` 而從前面的說明可以知道這段可以應該 2 種情況討論: - 若 `(l) > (h)` 能夠在編譯時期被確認是成立: 1. `__is_constexpr` 在編譯時期被相當於 1 2. `__builtin_choose_expr` 會選擇 `(l) > (h)` 的表示式 3. 由於 `(l) > (h)` 成立,所以 `BUILD_BUG_ON_ZERO` 會產生錯誤、造成編譯失敗 - 若 `(l) > (h)` 在編譯時期被確認是**不成立**,或**無法確認是否成立**: 1. `__is_constexpr` 在編譯時期被相當於 0 2. `__builtin_choose_expr` 會選擇 `0` 的表示式 3. 由於 `0` 代表不成立,所以 `BUILD_BUG_ON_ZERO` 不會產生錯誤 因此可以知道這段複雜的巨集用意是在避免使用**常數**作為 `GENMASK` 巨集的參數時誤將傳入數值比 `h` 大的 `l`。 :::warning 在 GCC 內建函式中有類似的函式 [`__builtin_constant_p(exp)`](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) 能夠作到與 `__is_constexpr` 相似的檢查,而原先的 `GENMASK` 巨集也是依賴這個內建函式進行檢查,但在 [2018 年 3 月時被反應](https://lore.kernel.org/lkml/42b4342b-aefc-a16a-0d43-9f9c0d63ba7a@rasmusvillemoes.dk/) `__builtin_constant_p` 在編譯時可能無法在正確的時間點檢查 `exp`,因此會造成 `__builtin_choose_expr` 的 `const_expr` 部份被認為是非常數而導致編譯失敗。直到在 [2021 年 3 月時才改用 `__is_constexpr`](https://github.com/torvalds/linux/commit/f747e6667ebb2ffb8133486c9cd19800d72b0d98) 來取代 `__builtin_constant_p` 進行判斷。 ::: 而在檢查完 `h`、`l` 之後(或因 `__ASSEMBLY__` 未定義而不進行檢查),就會接著產生 mask 的部份: ```c #define __GENMASK(H, L) \ (((~UL(0)) - (UL(1) << (L)) + 1) & \ (~UL(0) >> (BITS_PER_LONG - 1 - (H)))) ``` :::info 因為 `1` 與 `l` 的字型非常相似,所以為了愛護眼睛,我將巨集的 `h` 以及 `l` 改用大寫字母 `H` 以及 `L` 替代。 ::: 可以看到這個巨集一樣是對 64 位元的無號整數 0 取一補數來產生 64 個 1 進行位移,而這邊將這個巨集分成兩部份討論: - `((~UL(0)) - (UL(1) << (L)) + 1)` 的部份: - `(~UL(0))`:產生 64 個 1 - `- (UL(1) << (L))`:將第 `L` 個位元值設成 0 - `+ 1`:使用 `+ 1` **進位的特性**,將最低位的 0(也就是第 `L` 個位元)設成 1,而其它更低的位元(即第 `0` 至 `L-1` 位元)則會因進位被設成 0 - 到這邊可以產生==第 `L-1` ~ 0 位元皆為 0 的 mask== - `(~UL(0) >> (BITS_PER_LONG - 1 - (H)))` 的部份: - `(~UL(0)`:產生 64 個 1 - `(BITS_PER_LONG - 1 - (H))`:由於位元的編號是從 0 開始,所以最高位應為 63(即 `BITS_PER_LONG - 1`),而 `- (H)` 則是找出應要右移的位數 - `>> (BITS_PER_LONG - 1 - (H))`:右移讓最高的 `63 - (H)` 個位元為 0 - 到這邊可以產生==第 `63` ~ `H+1` 位元皆為 0 的 mask== 最後再將兩個 mask 進行 AND 運算即可產生第 `H` ~ `L` 位元皆為 1 的 mask。 ### Linux 核心原始程式碼中二處 GENMASK 巨集和 include/linux/bitfield.h 的應用案例 --- ## 第二題 - `ALIGN_DOWN` (EXP1) ```c struct foo; struct fd { struct foo *foo; unsigned int flags; }; enum { FOO_DEFAULT = 0, FOO_ACTION, FOO_UNLOCK, } FOO_FLAGS; static inline struct fd to_fd(unsigned long v) { return (struct fd){EXP1, v & 3}; } ``` 這段程式碼是要將一地址 `v` 看作是結構體 `foo` 的起始位置,若地址 `v` 沒有以 4 位元組進行對齊的話,則需要將地址向下對齊,因此 EXP1 要先捨去地址的後 2 個位元,然後再轉型成結構體 `foo` 的指標,即 ==EXP1== 應為 `(struct foo *)(v & ~3)`。 ### 在 Linux 核心原始程式碼找出類似技巧的實作 (例如檔案系統) 並解說 在[第九題的延伸問題](#%E6%89%BE%E5%87%BA%E4%B8%A6%E8%AA%AA%E6%98%8E-Linux-%E6%A0%B8%E5%BF%83%E4%B8%AD-round-upround-down-%E7%9A%84%E6%87%89%E7%94%A8%E5%A0%B4%E5%90%88)中一併討論。 --- ## 第三題 - [LeetCode 190. Reverse Bits](https://leetcode.com/problems/reverse-bits/) (EXP2, EXP3) ```c= #include <stdint.h> uint8_t rev8(uint8_t x) { x = (x >> 4) | (x << 4); // 以 4 位元為單位,兩段互換 x = ((x & 0xCC) >> 2) | (EXP2); // 以 2 位元為單位,0,1 2,3 段兩兩互換 x = ((x & 0xAA) >> 1) | (EXP3); // 以 1 位元為單位,0,1 2,3 4,5 6,7 段兩兩互換 return x; } ``` 將 $n$ 個位元以不同單位大小分割成偶數段,每次將奇數部份以及偶數部份透過位移以及 bitwise-or 互換,而因為 `0xCC` 與 `0xAA` 分別代表 `0b11001100` 與 `0x10101010`,即分別為單位大小為 2 位元及 1 位元的偶數段部份,所以 EXP2 與 EXP3 應為對應單位大小的奇數段部份。因此 ==EXP2== 應為 `0x33` (`0b00110011`),而 ==EXP3== 則應為 `0x55` (`0b01010101`) ### 在 Linux 核心原始程式碼找出類似技巧的實作並解說其應用場景 在 [include/linux/bitrev.h](https://github.com/torvalds/linux/blob/master/include/linux/bitrev.h) 中包含了 1, 2, 4 位元組反轉的巨集實作,可以看到大致上分成 3 種實作方式: 1. bitwise 反轉(常數) 當要反轉的值屬於常數時,則會直接使用與測驗題相似的方式,應該是為了讓編譯器能夠將常數運算直接最佳化掉。 2. 查表(非常數且無硬體指令) 在 [lib/bitrev.c](https://github.com/torvalds/linux/blob/master/lib/bitrev.c) 中有透過陣列 `byte_rev_table` 列舉了 8 位元對應的反轉結果,因此當 `CONFIG_HAVE_ARCH_BITREVERSE` 沒有被定義時,可以不用以 4, 2, 1 位元作為單位大小進行反轉,直接透過查表找到該位元組對應的反轉結果即可。 3. 反轉指令(非常數但有硬體指令) 在某些架構的指令集中可能有能夠反轉位元的指令,像是 [ARM 的 `RBIT` 指令](https://developer.arm.com/documentation/dui0489/h/arm-and-thumb-instructions/rev--rev16--revsh--and-rbit) 就能夠直接反轉一個 32 位元的資料。因此在 `CONFIG_HAVE_ARCH_BITREVERSE` 有被定義時就能夠直接使用指令進行更有效率的反轉操作。 --- ## 第四題 - foreach (EXP4, EXP5) ```c #include <assert.h> #define _foreach_no_nullval(i, p, arr) \ assert((i) >= sizeof(arr) / sizeof(arr[0]) || (p)) #define foreach_int(i, ...) \ for (unsigned _foreach_i = (((i) = ((int[]){__VA_ARGS__})[0]), 0); \ _foreach_i < sizeof((int[]){__VA_ARGS__}) / sizeof(int); \ (i) = ((int[]){__VA_ARGS__, 0})[EXP4]) #define foreach_ptr(i, ...) \ for (unsigned _foreach_i = \ (((i) = (void *) ((typeof(i)[]){__VA_ARGS__})[0]), 0); \ (i); (i) = (void *) ((typeof(i)[]){__VA_ARGS__, NULL})[EXP5], \ _foreach_no_nullval(_foreach_i, i, \ ((const void *[]){__VA_ARGS__}))) ``` 兩個 `for each` 巨集的實作概念都差不多,使用了 [Variadic Macro](https://gcc.gnu.org/onlinedocs/cpp/Variadic-Macros.html) 進行陣列宣告,並透過 `for` 迴圈以及 `,` 運算子: > 在規格書中的 [6.5.17 Comma operator](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf#%5B%7B%22num%22%3A270%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2C0%2C792%2C0%5D) 中關於 `,` 這個運算子的說明: > 1. Syntax : > ``` > expression: > assignment-expression > expression , assignment-expression > ``` > 2. The left operand of a comma operator is evaluated as a void expression; there is a sequence point between its evaluation and that of the right operand. Then the right operand is evaluated; the result has its type and value. > 3. **EXAMPLE** As indicated by the syntax, the comma operator (as described in this subclause) cannot appear in contexts where a comma is used to separate items in a list (such as arguments to functions or lists of initializers). On the other hand, it can be used within a parenthesized expression or within the second expression of a conditional operator in such contexts. In the function call `f(a, (t=3, t+2), c)` the function has three arguments, the second of which has the value 5. 因此在 `for` 迴圈的初始化部份會先執行 `,` 左側,也就是先將陣列的第一個元素賦值給變數 `i`,最後才會將 `,` 右側的 `0`,也就是陣列的初始 index 賦值給 `_foreach_i`。 除了初始化用來儲存陣列的變數 `i` 以及陣列 index 的變數 `_foreach_i` 之外,每次迴圈結束還需要增加 `_foreach_i` 並將 `i` 移動到下一個元素,因此 `_foreach_i` 可以直接透過 `++_foreach_i` 或 `_foreach_i++` 增加,但 `i` 由於並不是指標,所以必須重新初始化陣列,並將下一個元素透過 `_foreach_i` 賦值給 `i`,因此 ==EXP4== 與 ==EXP5== 皆應為 `++_foreach_i` 才能在每次迭代結束時將正確的元素重新賦值給 `i`。 ### 在 Linux 核心原始程式碼找出類似技巧的實作並解說其應用場景 透過 `$ grep -rP --include="*.[ch]" "\\[\\].*__VA_ARGS__.*"` 尋找相關用法大多是用來初始化陣列或是檢查巨集參數的數量,其中與這題比較相近的是: ```c // drivers/gpu/drm/i915/i915_reg.h /* * Given the arbitrary numbers in varargs, pick the 0-based __index'th number. * * Always prefer _PICK_EVEN() over this if the numbers are evenly spaced. */ #define _PICK(__index, ...) (((const u32 []){ __VA_ARGS__ })[__index]) ``` 除了透過單純宣告成一個陣列之外,還能透過指定 index 來取得目標參數。 --- ## 第五題 - [LeetCode 29. Divide Two Integers](https://leetcode.com/problems/divide-two-integers/) ```c= #include <limits.h> int divide(int dividend, int divisor) { int signal = 1; unsigned int dvd = dividend; if (dividend < 0) { signal *= -1; dvd = ~dvd + 1; } unsigned int dvs = divisor; if (divisor < 0) { signal *= -1; dvs = ~dvs + 1; } int shift = 0; while (dvd > (EXP6)) shift++; unsigned int res = 0; while (dvd >= dvs) { while (dvd < (dvs << shift)) shift--; res |= (unsigned int) 1 << shift; EXP7; } if (signal == 1 && res >= INT_MAX) return INT_MAX; return res * signal; } ``` ### 指出可改進部份並實作 無乘法、無迴圈以外的 branch ```c int divide(int dividend, int divisor) { uint32_t sign = 0; int dvd_mask = dividend >> 31; sign ^= dvd_mask & 0xFFFFFFFF; // explicitly cast to unsigned to avoid overflow uint32_t dvd = (uint32_t) (dividend ^ dvd_mask) - dvd_mask; int dvs_mask = divisor >> 31; sign ^= dvs_mask & 0xFFFFFFFF; // explicitly cast to unsigned to avoid overflow uint32_t dvs = (uint32_t) (divisor ^ dvs_mask) - dvs_mask; uint8_t shift = 0; while (dvd > (dvs << shift)) ++shift; uint32_t res = 0; while (dvd >= dvs) { while (dvd < (dvs << shift)) --shift; // 1 should be unsigned to avoid overflow (1 << 31) res |= 1U << shift; dvd -= dvs << shift; } res |= -!!(res > 0x7FFFFFFFU && !sign) & 0x7FFFFFFFU; // explicitly cast to unsigned to avoid overflow res ^= (uint32_t)(res > 0x7FFFFFFFU && !sign) << 31; return (res ^ sign) - sign; } ``` ### 找出並討論 Linux 核心原始程式碼中找出針對整數/浮點數/定點數除法特別撰寫的程式碼 --- ## 第六題 - [LeetCode 149. Max Points on a Line](https://leetcode.com/problems/max-points-on-a-line/) ```c= #include <stdbool.h> #include "list.h" struct Point { int x, y; }; struct point_node { int p1, p2; struct list_head link; }; static bool can_insert(struct list_head *head, int p1, int p2) { struct point_node *pn; list_for_each_entry (pn, head, link) return EXP8; return true; } static int gcd(int x, int y) { while (y) { int tmp = y; y = x % y; x = tmp; } return x; } static int maxPoints(struct Point *points, int pointsSize) { if (pointsSize <= 2) return pointsSize; int i, j, slope_size = pointsSize * pointsSize / 2 + 133; int *dup_cnts = malloc(pointsSize * sizeof(int)); int *hori_cnts = malloc(pointsSize * sizeof(int)); int *vert_cnts = malloc(pointsSize * sizeof(int)); int *slope_cnts = malloc(slope_size * sizeof(int)); memset(hori_cnts, 0, pointsSize * sizeof(int)); memset(vert_cnts, 0, pointsSize * sizeof(int)); memset(slope_cnts, 0, slope_size * sizeof(int)); for (i = 0; i < pointsSize; i++) dup_cnts[i] = 1; struct list_head *heads = malloc(slope_size * sizeof(*heads)); for (i = 0; i < slope_size; i++) INIT_LIST_HEAD(&heads[i]); for (i = 0; i < pointsSize; i++) { for (j = i + 1; j < pointsSize; j++) { if (points[i].x == points[j].x) hori_cnts[i]++, hori_cnts[j]++; if (points[i].y == points[j].y) vert_cnts[i]++, vert_cnts[j]++; if (points[i].x == points[j].x && points[i].y == points[j].y) dup_cnts[i]++, dup_cnts[j]++; if (points[i].x != points[j].x && points[i].y != points[j].y) { int dx = points[j].x - points[i].x; int dy = points[j].y - points[i].y; int tmp = gcd(dx, dy); dx /= tmp; dy /= tmp; int hash = dx * dy - 1333 * (dx + dy); if (hash < 0) hash = -hash; hash %= slope_size; if (can_insert(&heads[hash], i, j)) { struct point_node *pn = malloc(sizeof(*pn)); pn->p1 = i; pn->p2 = j; EXP9; } } } } for (i = 0; i < slope_size; i++) { int index = -1; struct point_node *pn; list_for_each_entry (pn, &heads[i], link) { index = pn->p1; slope_cnts[i]++; } if (index >= 0) slope_cnts[i] += dup_cnts[index]; } int max_num = 0; for (i = 0; i < pointsSize; i++) { if (hori_cnts[i] + 1 > max_num) max_num = hori_cnts[i] + 1; if (vert_cnts[i] + 1 > max_num) max_num = vert_cnts[i] + 1; } for (i = 0; i < slope_size; i++) { if (slope_cnts[i] > max_num) max_num = slope_cnts[i]; } return max_num; } ``` ### 指出可改進部份並實作 ### 擴充 LeetCode 題目,考慮更多座標點的輸入 (例如超過 10 萬個) 時,設計有效的資料結構以降低記憶體開銷,並確保快速的執行 --- ## 第七題 - `ilog32` ```c /** * @brief 對於一 32 位元無號整數 @v 最少需要幾位元才能儲存 * * 儲存最少需要的位元數 = 最高位元 1 的位置 (32-clz) * 因此可以可以透過 BITMASK 將資料分成高位元以及低位元兩部份,並檢查最高位元的 1 * 在哪部份: * - 若是不在高位元部份就縮小範圍再檢查一次。 * - 若是在高位元部份的話就紀錄並移除低位元部份,然後縮小範圍再重新檢查一次。 */ int ilog32(uint32_t v) { int ret = v > 0; int m = (v > 0xFFFFU) << 4; v >>= m; ret |= m; m = (v > 0xFFU) << 3; v >>= m; ret |= m; m = (v > 0xFU) << 2; v >>= m; ret |= m; m = EXP10; v >>= m; ret |= m; EXP11; return ret; } ``` 以二元搜尋的概念找出最高位元的 1 的位置,從中間開始,每次都將檢查範圍縮小一半,當最高位 1 位於高位元部份時,則可以右移掉低位元部份,讓兩種狀況能以相同的形式繼續進行檢查。若是以這個邏輯實作的話,只要重複三行程式碼: ```c= // 檢查最高位 1 是否在 16 - 31 位元 int m = (v > 0xFFFFU) << 4; v >>= m; // 若是的話 m 會變成 16,否則為 0 (代表能夠捨棄的位元數) ret |= m; // 相當於 ret += m,即用來紀錄最高位 1 的位置 // 檢查最高位 1 是否在 或 8 - 31 位元 m = (v > 0xFFU) << 3; v >>= m; // 若是的話 m 為 8,否則為 0 (代表能夠捨棄的位元數) ret |= m; // 檢查最高位 1 是否在 或 4 - 31 位元 m = (v > 0xFU) << 2; v >>= m; // 若是的話 m 為 4,否則為 0 (代表能夠捨棄的位元數) ret |= m; // 檢查最高位 1 是否在 或 2 - 31 位元 m = (v > 0x3U) << 1; /* EXP10 */ v >>= m; // 若是的話 m 為 2,否則為 0 (代表能夠捨棄的位元數) ret |= m; // 檢查最高位 1 是否在 或 1 - 31 位元 m = (v > 0x1U) << 0; v >>= m; // 若是的話 m 為 1,否則為 0 (代表能夠捨棄的位元數) ret |= m; ``` 可以簡單的知道 ==EXP10== 應為 `(v > 0x3U) << 1`,即用來檢查最高位元 1 是否在第 2 至 31 位元。 但這樣會因為每次都分成兩長度不為 0 的部份,所以最多只能檢查 31 個位元,造成當 `v == 1` 時會產生錯誤結果,因此最後還需要檢查 `v` 是否在第 0 個位元,即 `v` 是否為 0: ```c=26 ret += (v > 0); ``` 這邊要注意的是,由於前面都是透過 `|` 設定 `ret` 的第 0 至第 5 位元來紀錄所需位元數,因此最後需要改用 `+`,才不會在某些情況(第 23 行條件成立時)產生錯誤結果。 在了解檢查機制之後,需要注意到在原程式碼的一開始宣告 `int ret = v > 0;` 時就已經先檢查 `v > 0` 的情況了,所以需要改用 `+` 的是第 23 至 25 行的部份: ```c=22 // 檢查最高位 1 是否在 或 1 - 31 位元 m = (v > 0x1U) << 0; v >>= m; // 若是的話 m 為 1,否則為 0 (代表能夠捨棄的位元數) ret += m; ``` 而這邊又可以在進行簡化(左移 `0` 可以忽略、`v` 不須再右移、`m` 與 `ret` 的操作可以合併): ```c=23 ret += (v > 0x1); ``` 應此 ==EXP11== 最精簡的表示法應為 `ret += v > 1`。 ### 在 Linux 核心原始程式碼找出類似實作並解說其應用場景 [quiz4 的 ilog2 說明](https://hackmd.io/@freshLiver/linux2022q1-hw4-quiz#ilog2-巨集)。 ### 缺乏硬體 ctz/clz 指令的微處理器上,如何實作 branchless 的 ilog 研讀論文[《Using de Bruijn Sequences to Index a 1 in a Computer Word》](http://supertech.csail.mit.edu/papers/debruijn.pdf) ### 實作編譯時期 (compile-time) 的 ilog2 實作 運用〈[你所不知道的 C 語言:前置處理器應用篇](https://hackmd.io/@sysprog/c-preprocessor)〉和〈[你所不知道的 C 語言:數值系統篇](https://hackmd.io/@sysprog/c-numerics)〉提及的技巧 --- ## 第八題 - 二元搜尋樹 (C++) ```cpp= typedef struct tnode *tree; struct tnode { int data; tnode *left; tnode *right; tnode(int d) { data = d; left = right = 0; } }; ``` ```cpp= // native.cpp void remove_data(tree &t, int d) { tnode *p = t; tnode *q; while (p != 0 && p->data != d) { if (d < p->data) { q = p; p = p->left; } else { q = p; p = p->right; } } if (p != 0) { if (p->left == 0) if (p == t) t = p->right; else if (p == q->left) q->left = p->right; else q->right = p->right; else if (p->right == 0) if (p == t) t = p->left; else if (p == q->left) q->left = p->left; else q->right = p->left; else { tnode *r = p->right; while (r->left) { q = r; r = r->left; } p->data = r->data; if (r == p->right) p->right = r->right; else q->left = r->right; p = r; } delete p; } } ``` 使用 indirect pointer 改寫簡化: ```cpp= // improve.cpp void remove_data(tree &t, int d) { tnode **p = &t; while (*p != 0 && (*p)->data != d) { if (d < (*p)->data) EXP12; else EXP13; } tnode *q = *p; if (!q) return; if (!q->left) *p = q->right; else if (!q->right) *p = q->left; else { tnode **r = &q->right; while ((*r)->left) r = EXP14; q->data = (*r)->data; q = *r; *r = q->right; } delete q; } ``` ### 以更精簡、高效且符合 C99 標準進行實作探討效率的提升 ### 探討針對記憶體佈局 (memory layout) 的改進策略 研讀 [Static B-Trees](https://en.algorithmica.org/hpc/data-structures/s-tree/) --- ## 第九題 - ALIGN_UP ```c /* maximum alignment needed for any type on this platform, rounded up to a power of two */ #define MAX_ALIGNMENT 16 /* Given a size, round up to the next multiple of sizeof(void *) */ #define ROUND_UP_TO_ALIGNMENT_SIZE(x) \ (((x) + MAX_ALIGNMENT - MMM) & ~(NNN)) ``` 與實作四捨五入的概念(加上 `0.5` 再捨去小數點部份)有點相似,直接對地址加上對齊大小(這段程式碼為 16 位元組對齊),接著再捨去地址的最低 4 位元,因此 `~(NNN)` 應為除最低 4 位元皆為 1 的數,即 ==NNN== 應為 `MAX_ALIGNMENT - 1` (15)。 但是直接加上對齊大小會讓已對齊的地址也被向上對齊,因此要先讓原地址 `x` 減 1 才能確保已對齊的地址向上對齊後保持不變,所以 ==MMM== 應為 1。 ### 撰寫出對應 ROUND_DOWN 巨集 ```c #define MAX_ALIGNMENT 16 #define ROUND_DOWN_TO_ALIGNMENT_SIZE(x) ((x) & ~(MAX_ALIGNMENT - 1)) ``` 跟第二題的概念相同,直接捨棄地址最後的 $log_2(MAX\_ALIGNMENT)-1$ 個位元。 ### 找出並說明 Linux 核心中 round-up/round-down 的應用場合 md raid metadata ```c // driver/md/raid1.h /* * each barrier unit size is 64MB fow now * note: it must be larger than RESYNC_DEPTH */ #define BARRIER_UNIT_SECTOR_BITS 17 #define BARRIER_UNIT_SECTOR_SIZE (1<<17) // driver/md/raid1.c static sector_t align_to_barrier_unit_end(sector_t start_sector, sector_t sectors) { sector_t len; WARN_ON(sectors == 0); /* * len is the number of sectors from start_sector to end of the * barrier unit which start_sector belongs to. */ len = round_up(start_sector + 1, BARRIER_UNIT_SECTOR_SIZE) - start_sector; if (len > sectors) len = sectors; return len; } ``` --- ## 第十題 - DIVIDE_ROUND_CLOSEST ```c #define DIVIDE_ROUND_CLOSEST(x, divisor) \ ({ \ /* evaluate once */; \ typeof(x) __x = x; \ typeof(divisor) __d = divisor; \ /* check unsigned or not */; \ (((typeof(x)) -1) > 0 || ((typeof(divisor)) -1) > 0 || \ (((__x) > 0) == ((__d) > 0))) \ ? ((7777 /* RRR */) / (__d)) \ : ((8888 /* SSS */) / (__d)); \ }) /** * True : SSS 皆可以用 "被除數 + 0.5 * 除數" 表示 * - TTT 兩數皆為無號 * 1. 皆為正數 : SSS = 被除數 + 0.5 * 除數 * 2. 皆非 0 : divide by zero,不處理 * * - FTT 被除數為有號,除數為無號 * 1. 皆為正數 : SSS = 被除數 + 0.5 * 除數 * 2. 除數必為 0 : divide by zero,不處理 * * - TFT 被除數為無號,除數為有號 * 1. 皆為正數 : SSS = 被除數 + 0.5 * 除數 * 2. 被除數為 0,除數為負數 : 0 >= SSS > 除數 * 3. 被除數為 0,除數為 0 : divide by zero,不處理 * * - FFT 兩數皆為有號數 * 1. 皆為正數 : SSS = 被除數 + 0.5 * 除數 * 2. 皆為負數 : SSS = 被除數 - 0.5 * |除數| * 3. 被除數為 0,除數為負數 : 0 >= SSS > 除數 * 4. 被除數為負數,除數為 0 : divide by zero,不處理 * 5. 被除數為 0,除數為 0 : divide by zero,不處理 * * - FTF 被除數為無號,除數為有號,恰一數為非正數 * 1. 被除數為 0,除數為正 : -除數 < SSS < 除數 * 2. 被除數為正,除數為負 : SSS = 被除數 + 0.5 * |除數| * 3. 被除數為正,除數為 0 : divide by zero,不處理 * * - TFF 被除數為有號,除數為無號,恰一數為非正數 * 1. 被除數為 0,除數為正 : -除數 < SSS < 除數 * 1. 被除數為負,除數為正 : * 1. 被除數為正,除數為 0 : divide by zero,不處理 * * - TTF 兩數皆為無號數,但恰有一數為非正數 (0) * 1. 被除數為 0,除數為正數 : -除數 < SSS < 除數 * 2. 被除數為正數,除數為 0 : divide by zero,不處理 * * * False : RRR 皆可以用 "被除數 - 0.5 * 除數" 表示 * - FFF 兩數皆為有號型別,且恰有一數為非正數 : * 1. 被除數為正,除數為負 : RRR = 被除數 + 0.5 * |除數| * 2. 被除數為負,除數為正 : RRR = 被除數 - 0.5 * 除數 * 3. 被除數為 0,除數為正 : -除數 < RRR < 除數 * 4. 被除數為正,除數為 0 : divide by zero,不處理 * */ ``` ### 找出並說明 Linux 核心中 div round-up/closest 的實作與使用情景 在 `insmod` 時,會使用到定義在 `include/linux/module_decompress.c` 中的 [`module_decompress`](https://hackmd.io/rJ7YiwC_TSa6C5l2i-p0kg?both#%E5%82%B3%E9%81%9E%E6%A0%B8%E5%BF%83%E6%A8%A1%E7%B5%84%E5%8F%83%E6%95%B8),而其中又會使用到 `DIV_ROUND_UP` 來確保核心模組解壓縮後仍有足夠空間能儲存資訊。 ## 第十一題 - Integer Square Root ```c= static inline unsigned long fls(unsigned long word) { int num = 64 - 1; if (!(word & (~0ul << 32))) { num -= 32; word <<= 32; } if (!(word & (~0ul << (64 - 16)))) { num -= 16; word <<= 16; } if (!(word & (~0ul << (64 - 8)))) { num -= 8; word <<= 8; } if (!(word & (~0ul << (64 - 4)))) { num -= 4; word <<= 4; } if (!(word & (~0ul << (64 - 2)))) { num -= 2; word <<= 2; } if (!(word & (~0ul << (64 - 1)))) num -= 1; return num; } ``` ```c= unsigned long i_sqrt(unsigned long x) { unsigned long b, m, y = 0; if (x <= 1) return x; m = 1UL << (fls(x) & ~1UL); while (m) { b = y + m; XXX; if (x >= b) { YYY; y += m; } ZZZ; } return y; } ``` ### 嘗試利用硬體的 clz/ctz 指令改寫 ### 在 Linux 核心找出類似的巨集和程式碼,說明其應用場景

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully