# 2023q1 Homework2 (quiz2) contributed by < `yutingshih` > ## 開發環境 > 目前尚未在實體機器執行 GNU/Linux,因此暫時在 Apple M1 Mac 上使用 Lima VM,參考 [Lima VM with a foreign architecture (slow mode)](https://github.com/lima-vm/lima/blob/master/docs/multi-arch.md) ```shell $ uname -a Linux lima-linux2023 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 40 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Vendor ID: AuthenticAMD Model name: QEMU Virtual CPU version 2.5+ CPU family: 15 Model: 107 Thread(s) per core: 1 Core(s) per socket: 4 Socket(s): 1 Stepping: 1 BogoMIPS: 1999.99 Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_ good nopl cpuid extd_apicid pni cx16 hypervisor lahf_lm cmp_legacy svm 3dnowprefetch vmmcall Virtualization features: Virtualization: AMD-V Caches (sum of all): L1d: 256 KiB (4 instances) L1i: 256 KiB (4 instances) L2: 2 MiB (4 instances) L3: 64 MiB (4 instances) NUMA: NUMA node(s): 1 NUMA node0 CPU(s): 0-3 ``` ## 測驗 1 根據題目描述: > `next_pow2` 可針對給定無號 64 位元數值 x,找出最接近且大於等於 2 的冪的值,例如: > - `next_pow2(7)` = 8 > - `next_pow2(13)` = 16 > - `next_pow2(42)` = 64 其中「找出最接近且大於等於 2 的冪的值」指的應該是從所有大於等於 `x` 而且屬於 2 的非負整數次冪的數值當中選出最小值,即 $$ next\_pow2(x) = 2^n,\ where\ \ 2^{n-1} \lt x \le 2^n,\ n \in \mathbb{N} \cup \{0\} $$ 因此當 `x` 剛好等於 2 的非負整數次冪時,`next_pow(x)` 應該要剛好等於 `x`,例如: - `next_pow2(1)` = 1 - `next_pow2(2)` = 2 - `next_pow2(4)` = 4 這個結果剛好與題目給的範例程式輸出吻合 ```c #include <stdint.h> static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; } uint64_t next_pow2(uint64_t x) { uint8_t lo = 0, hi = 63; while (lo < hi) { uint8_t test = (lo + hi)/2; if (x < pow2(test)) { hi = test; } else if (pow2(test) < x) { lo = test+1; } else { return pow2(test); } } return pow2(lo); } ``` ### 解釋程式碼原理 依照題目的指示,完成以下 `next_pow2` 的改寫 ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 1; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` 其想法為將 `x` 最高位的 set bit (值為 `1` 的位元) 以下的所有位元都設為 `1`,再將結果 +1,即獲得一個最接近的 2 的冪,等同於 ```c uint64_t next_pow2(uint64_t x) { x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` 但對於 `x` 原本就是 2 的冪的狀況,以上實作顯然無法給出正確答案,例如當 `x` 等於 16 時: | | `x` | | ---------------- | --------------- | | initial | `00...001000` (16) | | after left shift | `00...001111` (31) | | after add 1 | `00...010000` (32) | 因此我們需要對以上實作做一些修改,我們觀察一下給定不同 `x` 值時 `next_pow2` 計算結果的規律 | `x` | `next_pow2(x)` 計算結果 | `next_pow2(x)` 正確答案 | | --------------- | -------------------- | -------------------- | | $2^n$ | $2^{n+1}$ | $2^n$ | | $2^n-1$ | $2^n$ | $2^n$ | | $2^n-2$ | $2^n$ | $2^n$ | | ... | $2^n$ | $2^n$ | | $2^n-2^{n-1}+1$ | $2^n$ | $2^n$ | | $2^{n-1}$ | $2^n$ | $2^{n-1}$ | 我們可以發現當 `x` 為 2 的冪時,輸出會是正確答案的 2 倍,同時我們發現如果把 `x` 減 1 後再做位元運算即可滿足正確答案 | `x` | `next_pow2(x-1)` 計算結果 | `next_pow2(x)` 正確答案 | | --------------- | -------------------- | -------------------- | | $2^n$ | $2^n$ | $2^n$ | | $2^n-1$ | $2^n$ | $2^n$ | | $2^n-2$ | $2^n$ | $2^n$ | | ... | $2^n$ | $2^n$ | | $2^n-2^{n-1}+1$ | $2^n$ | $2^n$ | | $2^{n-1}$ | $2^{n-1}$ | $2^{n-1}$ | 但這個做法 (左移運算之前先減 1) 在 `x` 為 0 時仍會出錯,根據題目要求 `next_pow2(0)` 應該要得到 1,但當我們對 0 進行減法卻會導致溢位發生,最後算出來結果是 0,如同以下過程 | | `x` | | ---------------- | --------------- | | initial | `00...000000` (0) | | after left shift | `11...111111` (UINT64_MAX) | | after add 1 | `00...000000` (0) | 因此我們需要特別處理 `x` 為 0 的狀況,當 `x > 0` 時,先減 1 再做位元運算,當 `x == 0` 時,直接做位元運算 (也可以說減 0 後再做位元運算),因此我們只需要添加下面這一行即可 ```diff uint64_t next_pow2(uint64_t x) { + x -= !!x; x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; x |= x >> 32; return x + 1; } ``` ### 用 __builtin_clzl 改寫 ```c uint64_t next_pow2(uint64_t x) { int nlz = __builtin_clzl(x); /* number of leading zeros */ return ((uint64_t)-1 >> nlz) + 1; } ``` ### 在 Linux 核心原始程式碼找出類似的使用案例並解釋 ### 當上述 clz 內建函式已運用時,編譯器能否產生對應的 x86 指令? --- ## 測驗 2 ### 解釋程式碼運作原理 ```c int concatenatedBinary(int n) { const int M = 1e9 + 7; int len = 0; /* the bit length to be shifted */ /* use long here as it potentially could overflow for int */ long ans = 0; for (int i = 1; i <= n; i++) { /* removing the rightmost set bit * e.g. 100100 -> 100000 * 000001 -> 000000 * 000000 -> 000000 * after removal, if it is 0, then it means it is power of 2 * as all power of 2 only contains 1 set bit * if it is power of 2, we increase the bit length */ if (!(i & (i - 1))) len++; ans = (i | (ans << len)) % M; } return ans; } ``` ### 嘗試使用 __builtin_{clz,ctz,ffs} 改寫,並改進 $\mod 10^9 + 7$ 的運算 將 `x % (1e9 + 7)` 運算改用 bitwise operator,得先將 $10^9 + 7$ 使用二進位來表示 ## 測驗 3 ## 測驗 4