2024q1 Homework5 (assessment)

# 2024q1 Homework5 (assessment) contributed by < [tintinjian12999](https://github.com/tintinjian12999) > ## 從前 4 週的測驗題選出 3 題改進 ### quiz3 測驗一 > [題目](https://hackmd.io/@sysprog/linux2024-quiz3#%E6%B8%AC%E9%A9%97-1) #### 程式碼運作原理 [Digit-by-digit calculation](https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Digit-by-digit_calculation) 的作法是將數值視為基底的相加，假如想知道的平方根為 N，則可以將 N 寫為 $$N = \sum_{i = 0}^{n}a_i$$其中 $a_i$ 視基底而定，若為二進位則 $a_i = 2^i\ or\ 0$，目標是要找出每個位元是 $2^i$ 還是 $0$。將 N 平方後可以得到 $$N^2 = (\sum_{i = 0}^{n}a_i)^2 = a_1^2 + 2a_n\sum_{i = 1}^{n - 1}a_i + a_n^2$$ 此時假設從 $2^n$ 開始已經有 $m$ 項被決定數值，則猜測解可以被寫為 $$P_m = a_n + a_{n-1}+...+a_m = \sum_{i = m}^{n}a_i $$ 而 $P_m$ 和 $a_m$ 的決定可以先假設 $$P_m = P_{m + 1} + 2^m$$ 若 $$P_m^2 \le N^2$$ 則 $a_m = 2^m$ 反之 $a_m = 0$ 且將 $P_m$ 設為 $P_{m + 1}$。上述比較式中需要平方的部份可以透過儲存兩者間的差避免令 $$X_m = N^2 - P_m^2$$此時 $$X_m = X_{m + 1} - Y_m$$ 其中$$Y_m = P_m^2 - P_{m + 1}^2 = 2P_{m + 1}a_m + a_m^2$$ 初始值設為$$a_n = P_n = 2^n$$ 用 $X_m < 0$ 作為判斷 $a_m$ 的基礎。 $Y_m$ 可以被拆解為 $c_m + d_m$，$c_m = 2P_{m + 1}a_m$，$d_m = (a_m)^2$，由此 $Y_m$ 可以被寫為 $$ Y_m = \left\{ \begin{array}{c} c_m + d_m, if\ a_m = 2^m \\ 0, if\ a_m = 0 \end{array} \right.$$ 在每次迴圈更新數值時 $$c_{m-1} = (P_{m + 1} + a_m)2^m = \left\{ \begin{array}{c} c_m/2 + d_m \ \ \ \ if\ a_m = 2^m\\ c_m/2 \ \ \ \ if\ a_m = 0 \end{array} \right.$$ $$d_{m-1} = \frac{d_m}{4}$$ 對應到程式碼中 $b = Y_m$，$m = d_m$，$z = c_m$，$x = X_{m+1}$ #### 使用 `ffs` 取代 `__builtin_clz(x)` 根據 linux manual 的敘述 >The ffs() function returns the position of the first (least significant) bit set in the word i. The least significant bit is position 1 and the most significant position is, for example, 32 or 64. ffs 會返回 least significant bit set的位置，如 ffs(2) 會返回 2，ffs(4) 會返回 3。因此若要知道 most significant 的 1 在哪，可以透過以下程式 ```c int get_msb(int x) { int temp = x, msb = 0; while (temp) { int i = ffs(temp); msb += i; temp >>= i; } msb--; return msb; } ``` ```c int ffs_i_sqrt(int x) { if (x <= 1) /* Assume x is always positive */ return x; int z = 0; int msb = get_msb(x); for (int m = 1UL << (msb & ~1UL); m; m >>= 2) { int b = z + m; z >>= 1; if (x >= b) x -= b, z += m; } return z; } ``` #### 無分支的版本參考[解讀計算機編碼](https://hackmd.io/@sysprog/binary-representation#%E4%B8%8D%E9%9C%80%E8%A6%81%E5%88%86%E6%94%AF%E7%9A%84%E8%A8%AD%E8%A8%88)可以改寫為以下程式碼 ```c for (int m = 1UL << (msb & ~1UL); m; m >>= 2) { int b = z + m; z >>= 1; z += (~(x - b) >> 31) & m; x -= (~(x - b) >> 31) & b; } ``` 使用 `perf` 可以得知 branch-misses 的情況進而比對無分支版本的差異 ```c int main() { for (int i = 0; i < 100000; i++) { int input = rand(); int result = ffs_i_sqrt(input); } return 0; } ``` 使用以上程式碼，用亂數產生隨機正整數重複 100000 次並使用 100 次測量的平均值原始版本 ``` Performance counter stats for './test' (100 runs): 61,090,311 instructions # 0.72 insn per cycle ( +- 0.01% ) 84,543,881 cycles ( +- 0.24% ) 879,974 branch-misses # 7.38% of all branches ( +- 0.02% ) 11,917,337 branches ( +- 0.02% ) 0.026520 +- 0.000441 seconds time elapsed ( +- 1.66% ) ``` 無分之版本 ``` Performance counter stats for './test' (100 runs): 71,990,721 instructions # 1.01 insn per cycle ( +- 0.01% ) 71,186,457 cycles ( +- 0.14% ) 169,479 branch-misses # 1.62% of all branches ( +- 0.06% ) 10,496,528 branches ( +- 0.01% ) 0.022861 +- 0.000418 seconds time elapsed ( +- 1.83% ) ``` 可以看到 branch-misses 有明顯的提昇。 #### 在 Linux 核心原始程式碼找出對整數進行平方根運算的程式碼，並解說其應用案例，至少該包含 block 目錄的程式碼。 #### 閱讀〈Analysis of the lookup table size for square-rooting〉和〈Timing Square Root〉，嘗試撰寫更快的整數開平分根運算程式 ### quiz4測驗三 > [題目](https://hackmd.io/@sysprog/linux2024-quiz4) ## 閱讀教材：CSAPP > 不准看盜版 ### 第二章 #### C語言支援的資料型態 C 語言支援的基礎資料型態與其大小如下表 ![image](https://hackmd.io/_uploads/rytBpnLgA.png) 需要注意的是 `long` 根據機器的不同是有差異的。因此其能表達的數值範圍在 32 位元機器為 ![image](https://hackmd.io/_uploads/HJJ1R2Il0.png) 在 64 位元機器為 ![image](https://hackmd.io/_uploads/ByUe0nUl0.png) #### 整數表示式 ##### 無號整數考慮一個由 $w$ 個位元組成的整數型態，將其寫成 $[x_{w-1}, x_{w_2}, ..., x_0]$，其中 $x_{wn}$ 都為 0 或 1，若要將以上的二進位數值寫成無號整數，可以透過以下表達式達成$$B2U_w = \sum_{i = 0} ^ {w - 1}x_i2^i$$ 如 $$B2U_4([1111]) = 15$$ 由此可已得知無號整數的最大值為 $$ UMax_w = 2^w - 1$$ 因此由 4 個位元組與 8 個位元組組成的無號整數數值範圍$$2 ^ {32} - 1 = 4,294,967,295$$ 和 $$2 ^ {64} - 1 = 18,446,744,073,709,551,615$$ ##### 有號整數這裡使用的是[二補數](https://en.wikipedia.org/wiki/Two%27s_complement)，其轉換函式為$$B2T_w = -x_{w-1} + \sum_{i = 0} ^ {w - 2}x_i2^i$$ 如$$B2T_4([1111]) = -1$$ $$B2T_4([0101])= 5$$ 相似地，有號整數的數值範圍為$$TMin_w = -2^w - 1$$和$$TMax_w = 2^{w-1} - 1$$ ##### 有號整數與無號整數的互換可以簡單的看出兩者的換算為$$T2U_w(x) = \begin{cases} x + 2^w,\quad x<0\\ x,\quad x\ge0 \end{cases}$$ 另一方面$$U2T_w(u) = \begin{cases} u ,\quad u \le TMax_w\\ u - 2^w,\quad u\gt TMax_w \end{cases}$$ #### 浮點數表示式 ### 第十章 ### 第十二章看起來並行多核心可以加速 for 迴圈的運算 ![image](https://hackmd.io/_uploads/r1oYesbXC.png) 不太清楚為什麼 > 解釋個別的做法，才能討論 ## 閱讀教材:Operating Systems: Three Easy Pieces ## 簡述想投入的專案 > 參照 [2023 年期末專題](https://hackmd.io/@sysprog/linux2023-projects)，簡述你想投入的專案 (亦可建立新專案)，至少選出 (或訂出) 二個。 ### 第六次作業:Integration 希望能夠投入[作業六](https://hackmd.io/@sysprog/linux2024-integration/%2F%40sysprog%2Flinux2024-integration-a)的開發。 ### Linux 亂數產生器研究實驗室其他人有在做物理亂數產生相關的研究，正好可以以此做比較。 --- IEEE 754. Assume float is 32-bit ```c float float_mul10(float x) { // Using bitwise operations. No mul/div int exp = (int)x & (0x7f800000); exp <<= 3; (int)x |= exp & (0x7fffffff); x += 2; return x; } ``` // 10 = (2 << 3) + (2 << 0) // https://godbolt.org/ TODO: 繼續撰寫程式碼、測試，並附上分析將浮點數轉型成整數並將 exp 的部分加 3 可以得到乘以 8 的結果，加 1 可以得到乘以 2 的結果，再將兩者相加即可完成乘以 10 。 ```c float float_mul10(float x) { // Using bitwise operations. No mul/div int i_x = *(int*) &x; int sign = i_x & 0x80000000; int exp = i_x & 0x7f800000; int frac = i_x & 0x007fffff; int exp1 = exp + (3 << 23); int exp2 = exp + (1 << 23); int result1 = sign | exp1 | frac; int result2 = sign | exp2 | frac; float result1_f = *(float*) &result1; float result2_f = *(float*) &result2; return result1_f + result2_f; } ``` 注意不能使用 `int i_x = (int) x`，因為這實際上會以整數的表示式來表達浮點數數值，而非將浮點數轉為整數，從下面範例可以觀察 ```c int main () { float x = 4.0; int i_x = (int) x; //Expected printf("%b \n", x); printf("%b \n", i_x); return 0; } ``` 輸出結果 ```c 10100010101110011011000001011000 100 ``` 兩者的二進制表示法明顯不一致。 TODO: ksort + extra