2019q3 Homework1 (review)

# 2019q3 Homework1 (review) contributed by < `flawless0714` > ## 實驗環境 ``` OS: Ubuntu 18.04.3 LTS (kernel version: v5.0.0) GCC: 7.4.0 CPU arch: amd64 ``` 以下有關 Linux kernel 的程式碼其版本號均為 `v5.0.10`。 --- ## 題組 1 - [課程簡介和注意須知 Page 15](https://docs.google.com/presentation/d/1Wo22s5aYuuyry97-z2MMmUmdxhjD0wKTdhxIW5oqjvo/edit#slide=id.g613ce7d15f_1_15) ```cpp= #include <stdint.h> int32_t abs(int32_t x) { int32_t mask = (x >> 31); return (x + mask) ^ mask; } ``` ### 運作原理上述程式碼第三行的 `mask` 拿到的是 `x` 進行算術位移([arithmetic shift](https://en.wikipedia.org/wiki/Arithmetic_shift))後得到的結果，而 C11 ([GNU GCC 7.4.0 預設 std 為 gnu11](https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gcc/Standards.html#Standards)) [規格書](http://www.iso-9899.info/n1570.html)第 6.5.7 段指出： :::info The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2 E2 . If E1 has a signed type and a negative value, the resulting value is implementation-defined. ::: 由此可知，負數在進行算術右移時是否將 MSB 填補到右移後的空缺位元(此舉為 sign extension 的一種)是 **implementation-defined** 的。接著看看 GCC 如何實做這個 implementation-defined 的實做，[GCC 官方指出](https://gcc.gnu.org/onlinedocs/gcc-7.4.0/gcc/Integers-implementation.html#Integers-implementation)： :::info Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension. ::: 如此一來就可以確保我們使用的 GCC (7.4.0) 在遇到負數進行算術右移時的行為。現在可以確定一件事，`x` 如果是正整數的話 `mask` 會得到 0，而負整數會得到 -1。往下看到第四行，首先我們知道二補數正數要轉負數是將正數反相後加 1，反之如果要將負數轉回正數就是減 1 後反相。對，第四行在做的就是這件事，`(x + mask)` 在 `x` 為負數時會進行 `x - 1`，反之則是 `x - 0`，計算完後我們使用 `^ mask` 對前面的和做反相(若 x 為正整數則不動作，因為任意整數對 0 做 XOR 為自己本身)。 ### `abs(-2147483648)` 會得到什麼? 在 32 位元有號整數下，將 `-2147483648` 減 1 會得到 `2147483647` (0x7fffffff)，接著將剛剛得到的結果反相會得到 `0x80000000`，也就是 `-2147483648`。至此我們可以得知為什麼 `-2147483648` 用 `abs()` 取絕對值會得到非預期的結果。 ## 題組 2 - [第一週測驗題第一題](https://hackmd.io/@sysprog/HksKVpUIr#%E6%B8%AC%E9%A9%97-1) ### 題目考慮以下 C 程式的 `align4` 巨集的作用是，針對 4-byte alignment，找到給定地址的 round up alignment address。 ```cpp #include <stdio.h> #define align4(x) (((x) + K) & (-4)) int main(void) { int p = 0x1997; printf("align4(p) is %08x\n", align4(p)); return 0; } ``` 預期程式輸出 `align4(p) is 00001998` ==作答區== `K` = ? * `(a)` `(-3)` * `(b)` `(-2)` * `(c)` `(-1)` * `(d)` `(0)` * `(e)` `(1)` * `(f)` `(2)` * `(g)` `(3)` :::success 延伸題目: 在 Linux 核心原始程式碼找出類似的 alignment 巨集並解釋其作用 ::: ### 作答答案為 `(g)` `(3)`，本題使用 round up 手法將給定的記憶體位置轉換為 4-byte alignment 的記憶體位置。round up 原用於輸出指定數值之倍數的值，並且該輸出會大於等於輸入，其實在 Linux kernel 中有類似的[巨集](https://elixir.bootlin.com/linux/v5.0.10/source/include/linux/kernel.h#L96): ```c /* * This looks more complex than it should be. But we need to * get the type for the ~ right in round_down (it needs to be * as wide as the result!), and we want to evaluate the macro * arguments just once each. */ #define __round_mask(x, y) ((__typeof__(x))((y)-1)) /** * round_up - round up to next specified power of 2 * @x: the value to round * @y: multiple to round up to (must be a power of 2) * * Rounds @x up to next multiple of @y (which must be a power of 2). * To perform arbitrary rounding up, use roundup() below. */ #define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1) #define round_down(x, y) ((x) & ~__round_mask(x, y)) ``` 先說明題目中 `align4()` 這個巨集的工作原理，首先遇到的 `((x) + 3)` 意在將輸入增加至以下範圍： ``` 0x1994...0x1998...0x199c |----| | --- | <- 0x1998~0x199b (after addition of 3) || ∟--0x1994~0x1997 (where `x` might start) ``` 相加後將結果對 `-4 (0xfffffffc)` 做 AND 運算，此運算的目的是將小於 4 的倍數的位元(bit 0, bit 1)清掉，至此即可得到**大於等於**(若輸入本身已為 4 之倍數，則輸出等於輸入)輸入的 4-byte alignment address。值得注意的是這種手法僅可用於 2^n^-byte alignment，因為其中的 `& (-4)` 使用了二進制的特性(小於 2^n^ 的位元都會是 0)。如果用此方法取得非 2^n^ 數值的 alignment address，結果會是非預期的。例如：以同樣方式實做了 `align5()`，假設輸入為 7，運算結果為： ``` (((7) + 4) & (-5)) => 0xB & 0xFFFFFFFB => 0xB != 0xA ``` 接著嘗試解析 Linux kernel 中實做的 round up 巨集(此巨集也僅用於 2^n^ 的 round up，若需任意數的 round up，[Linux kernel 也有提供實做](https://elixir.bootlin.com/linux/v5.0.10/source/include/linux/kernel.h#L139))，有關 `typeof` 這個 extension 的用途可以參考[你所不知道的C語言：技巧篇 - typeof](/@sysprog/HyIdoLnjl?type=view#善用-GNU-extension-的-typeof)，簡單來說就是要避免 double evaluation 的發生(p.s. 在這邊還有另個用途，`round_down()` 時 `y` 的資料型態需與 `x` 一致，如此一來在做 AND 運算時才可以確保所有位元都有 AND 到反相後的 `__round_mask()`)。首先，將輸入(`x`)減 1，目的為將稍後的運算結果控制在 round up 預期的輸出(最接近 `x` 且大於等於 `x` 的 2^y^ 數值)。接著將上述得到的差與 `y - 1` 做 `OR` 運算，減 1 的目的為了使最後加 1 的和為**大於等於**輸入的 `y` 的倍數。而 OR 運算用於取得與 `y` 的倍數差 1 的數。最後，將上述結果加 1，即可得到 round up 後的 2^n^-byte alignment address。 :::info 剛剛在 @kksweet8845 同學的共筆中看到同份 kernel head file 中還有個巨集，`ALIGN()`，這個巨集對應到另個名為 `__ALIGN_KERNEL()` 的巨集，查看內部實做後發現其實在做的就是 round up，這邊不太理解為什麼 `ALIGN()` 不直接用 `round_up()` 這個巨集包起來，而是再實做一個功能同樣為 2^n^ round up 的巨集。 ::: :::danger 何不試著修改 Linux 核心原始程式碼，然後做實驗呢？原始程式碼是給你改進用的，不是拿來「舉燭」 :notes: jserv ::: ## 題組 3 - [第一週測驗題第二題](https://hackmd.io/@sysprog/HksKVpUIr#%E6%B8%AC%E9%A9%97-2) 題目程式碼： ```c #include <stdint.h> #define INT_SIZE_OF(n) \ ((sizeof(n) + sizeof(int) + X) & ~(sizeof(int) + Y)) ``` ==作答區== `X` = ? * `(a)` `(-3)` * `(b)` `(-2)` * `(c)` `(-1)` * `(d)` `(0)` * `(e)` `(1)` * `(f)` `(2)` * `(g)` `(3)` `Y` = ? * `(a)` `(-3)` * `(b)` `(-2)` * `(c)` `(-1)` * `(d)` `(0)` * `(e)` `(1)` * `(f)` `(2)` * `(g)` `(3)` ### 作答 `X` = `(c)` `(-1)`, `Y` = `(c)` `(-1)` 手法與前一題類似(round up)，首先先取得輸入 `n` 的大小(`sizeof(n)`)，接著將陣列大小加上 `int` 所佔空間後，將和減 1，這邊加上 `int` 所佔空間是為了稍後做 AND 運算後可以得到 round up 的效果，而減 1 是為了避免 `n` 本來就是 int 的倍數，而造成加完後得到下一個 int 的倍數，使得最後得到非預期的結果。將上述運算結果與 `sizeof(int) - 1` 的反相做 AND 運算，這麼做是為了要以 mask 的方式將前一項(`sizeof(n) + sizeof(int) + (-1)`)小於 int 所佔空間的位元 mask 掉，以使得最終得到 int-aligned 的大小。以下以輸入一名為 `arr` 且大小為 14 bytes 的 array of character 做為舉例(假設 data model 為 LP64)： ``` #define INT_SIZE_OF(n) ((sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1)) char arr[14]; INT_SIZE_OF(arr) -------------------------------- // after preprocess ((14 + 4 - 1) & ~(4 - 1)) => ((17) & ~(3)) => 0b10001 & 0b11100 // bits lesser than 4 (sizeof(int)) are being // masked by AND-ing with `~(sizeof(int) - 1)` => 0b10000 => 16 ``` ### [Memory access and alignment](https://lwn.net/Articles/260832/) - 即便提示 GCC 不插入 padding 做 alignment，GCC 還是會加入額外的 instruction 以防止 unaligned access，以避免在特定架構上觸發 processor exception，但是會以效能做 tradeoff - ARM32, Alpha 架構之 CPU 對於 unaligned access 會觸發 processor exception，並且此 exception 會提供足夠的資訊讓 kernel 去修正該次 access (開銷很大)，而其他除了 x86_64 & i386 可以做 unaligned access 之外，有些架構只會觸發 exception 但不提供足夠資訊讓 kernel 做修正，又有些架構完全不會觸發 exception，他們照常存取，但**拿到的資料是錯的** - byte to byte 的存取不算 unaligned access，`memcpy()` 在這種情境是你的好夥伴 - 有鑑於 GNU/Linux 是個支援多種處理器架構的作業系統，所以其在編寫時就需遵守 natural alignment (假設 N = 2^k^，我們要存取 N 個 byte，那麼 `addr % N == 0` 需成立) 這個限制，也就是說不管 target architecture 是 x86_64 還是 ARM32，他們上面運行的 Linux 對記憶體的存取都是 alignment 的 ### `INT_SIZE_OF()` on GitHub 在 GitHub 上找到關於這個巨集的使用十之八九都是給 variadic argument 的巨集用的，大部分都是這樣： ```c #define _INTSIZEOF(n) ( (sizeof(n) + sizeof(int) - 1) & ~(sizeof(int) - 1) ) // `va_list` might be a pointer to either char or void (implementation-dependent) #define va_start(ap,v) ( ap = (va_list)&v + _INTSIZEOF(v) ) #define va_arg(ap,t) ( *(t *)((ap += _INTSIZEOF(t)) - _INTSIZEOF(t)) ) #define va_end(ap) ( ap = (va_list)0 ) ``` 之前一直對 variadic argument 怎麼運作的很好奇，剛好趁這次來了解一下 XD 一般 variadic argument 的 function 至少會有一個已知參數，因為 `va_start()` 會需要已知參數來定位後面的 variadic argument。[這位大大](http://213style.blogspot.com/2012/11/c-cdecl-part2.html)說明了如果 function 沒有已知參數的話要怎麼拿 variadic argument 的第一個參數，原本想試試看，結果編譯時出現： ``` error: ISO C requires a named argument before ‘...’ ``` 看來是 C 語言的規格有限定 variadic argument 的 function 至少要有一個已知參數，至於這位大大怎麼成功的我想是因為他編譯器用 MSVC (從 inline assembly 確定) 的關係。總之如果沒已知參數還要拿 variadic argument 的首個參數的話其實就是自己用 RBP (暫存器) 減掉 offset (這個 offset 跨越了上一層 function 的 RBP 以及 function 的 return address，也就是說會有 16 bytes 的 offset)去拿 RDI ([System V AMD64 ABI calling convention](http://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf) p.21, first arg)丟進 stack 的資料(首個 va_arg 的位置) `_INTSIZEOF()` 在這邊用途是將每次更新後的位置向 `sizeof(int)` 對齊，例如：遇到一個資料型態為 `char` 的可變數量參數，當使用 `va_arg()` 拿取這個參數的資料後，我們不會只將 `ap` 內儲存的記憶體位置加 1 (`char` 的大小)，而是將記憶體位置加至下一個 int-alignment 的位置，以此方式走訪可變數量參數，可保證我們存取的記憶體位置永遠都是 4-byte alignment 的。 `va_start()` 用於初始化 `va_list`，它使用可變數量參數的**前一個**固定參數的位置來取得第一個可變數量參數的位置。 `va_arg()` 用於取得目前對應的可變數量數量參數並且更新 `va_list` 至下一個可變數量參數的位置。剛開始因為沒注意其中的 `+=` 運算子而一直搞不懂這個巨集到底怎麼工作的...，其實它就是先把 `va_list` 更新到下個位置，再用更新完的位置扣去同樣大小的 offset 以得到原先要存取的可變數量參數的位置，然後再轉型成可變數量參數的資料型態的指標(以免對 `void*` 做 dereference)，最後對其做 dereference 以取得可變數量參數的資料。 `va_end()`，這個巨集可用可不用，它只是將指標清空，以表示可變數量參數的走訪已經結束了，實做時注意一下就好。值得注意的是，有些實做會在 `va_start()` 的開頭跟 `va_end()` 的結尾分別包上 `{` 以及 `}`，這時候就一定要用到 `va_end()` 了。 ## 筆記 - `({})` 是 GCC 的 [extension](https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html)，所以會看到 `do {} while (0)` 及 `({})` 都有人使用 ## 參考資料 - [C99 規格書](http://www.iso-9899.info/n1570.html) - [C11 規格書](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf) ###### tags: `sysprog2019`