2023q1 Homework2 (quiz2)

--- title: 2023q1 Homework2 (quiz2) tags: linux 2023 --- # 2023q1 Homework2 (quiz2) contributed by < `randyuncle` > ## 測驗一 ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 8; x |= x >> 16; x |= x >> 32; return ++x; } ``` * 此程式碼主要的目標是找出變數 `x` 最接近且大於等於 2 的冪次。例如說代入 `x = 42`，回傳的值會是 64，也就是 $2^{6}$。 * 達成以上目標的原理：將輸入的變數 `x` 在做 right shift 後產生的值和原本的 `x` 做 63 次的 OR 運算，及可將第一個 1 以後全部的 0 皆設定成 1。如 `x = 42` 代入並經過前述之運算後，會變成 63 (如果以十六進位的觀點來看的話，就是從 `x = 0x002A` 變成 `0x003F`)。最後，於函數的尾端將 `x` 的值加一，使 `x` 的值進位，從而得到答案。不過，如果要再精簡此程式碼的話，可以將其簡化為以下： ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x |= x >> 32; return ++x; } ``` ### 運用 `__builtin_clzl` 改寫程式碼首先，先看 `__builtin_clzl` 的定義 > int __builtin_clz (unsigned int x) Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined. int __builtin_clzl (unsigned long) Similar to __builtin_clz, except the argument type is unsigned long. 從上面的定義可知道，`__builtin_clzl` 是用來計算並回傳整數中的最高位元位之前的 0 位元的數量。所以，我們也可以用`__builtin_clzl` 去做變數 `x` 的下一個 2 的冪次方的計算。要特別注意的是 `__builtin_clzl` 和 `__builtin_clz` 一樣，在輸入為`x = 0` 時是未定義操作，所以在實作此函數時要做相應的排除。除此之外，有一點要注意的事，`__builtin_clzl` 只能支援 `unsigned long` 的資料型態，也就是說它的輸入最大為 32 位元的值。所以在實作此改寫方案時要將變數 `x` 切成兩個 32 位元的數值來進行。因此，程式可被改寫為以下： ```c uint64_t next_pow2(uint64_t x) { if(!x) return 1; uint32_t front = (uint32_t)(x >> 32), back = (uint32_t)x; int n = 0; if(front) n = 64 - __builtin_clzl(front); else n = 32 - __builtin_clzl(back); return 1 << n; } ``` ### `clz` 內建函式已運用時，編譯器能否產生對應的 x86 指令？執行 `cc -O2 -std=c99 -S next_pow2.c` 後，可以在產生的 `next_pow2.s` 檔案中，發現到裡面誕生了 `bsrq` 指令。因此，可說明上面實作的 `clz` 內建函式在運用後，編譯器能產生對應的 x86 指令。 ``` ... next_pow2: .LFB0: .cfi_startproc endbr64 movl $1, %eax testq %rdi, %rdi je .L1 movq %rdi, %rax shrq $32, %rax jne .L8 bsrq %rdi, %rdi movl $32, %ecx xorq $63, %rdi subl %edi, %ecx ... ``` ## 測驗二