# 2023q1 Homework2 (quiz2)
contributed by < `yutingshih` >
## 開發環境
> 目前尚未在實體機器執行 GNU/Linux,因此暫時在 Apple M1 Mac 上使用 Lima VM,參考 [Lima VM with a foreign architecture (slow mode)](https://github.com/lima-vm/lima/blob/master/docs/multi-arch.md)
```shell
$ uname -a
Linux lima-linux2023 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: AuthenticAMD
Model name: QEMU Virtual CPU version 2.5+
CPU family: 15
Model: 107
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 1
BogoMIPS: 1999.99
Flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_
good nopl cpuid extd_apicid pni cx16 hypervisor lahf_lm cmp_legacy svm 3dnowprefetch vmmcall
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (4 instances)
L1i: 256 KiB (4 instances)
L2: 2 MiB (4 instances)
L3: 64 MiB (4 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
```
## 測驗 1
根據題目描述:
> `next_pow2` 可針對給定無號 64 位元數值 x,找出最接近且大於等於 2 的冪的值,例如:
> - `next_pow2(7)` = 8
> - `next_pow2(13)` = 16
> - `next_pow2(42)` = 64
其中「找出最接近且大於等於 2 的冪的值」指的應該是從所有大於等於 `x` 而且屬於 2 的非負整數次冪的數值當中選出最小值,即
$$
next\_pow2(x) = 2^n,\ where\ \ 2^{n-1} \lt x \le 2^n,\ n \in \mathbb{N} \cup \{0\}
$$
因此當 `x` 剛好等於 2 的非負整數次冪時,`next_pow(x)` 應該要剛好等於 `x`,例如:
- `next_pow2(1)` = 1
- `next_pow2(2)` = 2
- `next_pow2(4)` = 4
這個結果剛好與題目給的範例程式輸出吻合
```c
#include <stdint.h>
static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; }
uint64_t next_pow2(uint64_t x)
{
uint8_t lo = 0, hi = 63;
while (lo < hi) {
uint8_t test = (lo + hi)/2;
if (x < pow2(test)) { hi = test; }
else if (pow2(test) < x) { lo = test+1; }
else { return pow2(test); }
}
return pow2(lo);
}
```
### 解釋程式碼原理
依照題目的指示,完成以下 `next_pow2` 的改寫
```c
uint64_t next_pow2(uint64_t x)
{
x |= x >> 1;
x |= x >> 1;
x |= x >> 1;
x |= x >> 1;
x |= x >> 1;
x |= x >> 1;
x |= x >> 1;
x |= x >> 8;
x |= x >> 16;
x |= x >> 32;
return x + 1;
}
```
其想法為將 `x` 最高位的 set bit (值為 `1` 的位元) 以下的所有位元都設為 `1`,再將結果 +1,即獲得一個最接近的 2 的冪,等同於
```c
uint64_t next_pow2(uint64_t x)
{
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x |= x >> 32;
return x + 1;
}
```
但對於 `x` 原本就是 2 的冪的狀況,以上實作顯然無法給出正確答案,例如當 `x` 等於 16 時:
| | `x` |
| ---------------- | --------------- |
| initial | `00...001000` (16) |
| after left shift | `00...001111` (31) |
| after add 1 | `00...010000` (32) |
因此我們需要對以上實作做一些修改,我們觀察一下給定不同 `x` 值時 `next_pow2` 計算結果的規律
| `x` | `next_pow2(x)` 計算結果 | `next_pow2(x)` 正確答案 |
| --------------- | -------------------- | -------------------- |
| $2^n$ | $2^{n+1}$ | $2^n$ |
| $2^n-1$ | $2^n$ | $2^n$ |
| $2^n-2$ | $2^n$ | $2^n$ |
| ... | $2^n$ | $2^n$ |
| $2^n-2^{n-1}+1$ | $2^n$ | $2^n$ |
| $2^{n-1}$ | $2^n$ | $2^{n-1}$ |
我們可以發現當 `x` 為 2 的冪時,輸出會是正確答案的 2 倍,同時我們發現如果把 `x` 減 1 後再做位元運算即可滿足正確答案
| `x` | `next_pow2(x-1)` 計算結果 | `next_pow2(x)` 正確答案 |
| --------------- | -------------------- | -------------------- |
| $2^n$ | $2^n$ | $2^n$ |
| $2^n-1$ | $2^n$ | $2^n$ |
| $2^n-2$ | $2^n$ | $2^n$ |
| ... | $2^n$ | $2^n$ |
| $2^n-2^{n-1}+1$ | $2^n$ | $2^n$ |
| $2^{n-1}$ | $2^{n-1}$ | $2^{n-1}$ |
但這個做法 (左移運算之前先減 1) 在 `x` 為 0 時仍會出錯,根據題目要求 `next_pow2(0)` 應該要得到 1,但當我們對 0 進行減法卻會導致溢位發生,最後算出來結果是 0,如同以下過程
| | `x` |
| ---------------- | --------------- |
| initial | `00...000000` (0) |
| after left shift | `11...111111` (UINT64_MAX) |
| after add 1 | `00...000000` (0) |
因此我們需要特別處理 `x` 為 0 的狀況,當 `x > 0` 時,先減 1 再做位元運算,當 `x == 0` 時,直接做位元運算 (也可以說減 0 後再做位元運算),因此我們只需要添加下面這一行即可
```diff
uint64_t next_pow2(uint64_t x)
{
+ x -= !!x;
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x |= x >> 32;
return x + 1;
}
```
### 用 __builtin_clzl 改寫
```c
uint64_t next_pow2(uint64_t x)
{
int nlz = __builtin_clzl(x); /* number of leading zeros */
return ((uint64_t)-1 >> nlz) + 1;
}
```
### 在 Linux 核心原始程式碼找出類似的使用案例並解釋
### 當上述 clz 內建函式已運用時,編譯器能否產生對應的 x86 指令?
---
## 測驗 2
### 解釋程式碼運作原理
```c
int concatenatedBinary(int n)
{
const int M = 1e9 + 7;
int len = 0; /* the bit length to be shifted */
/* use long here as it potentially could overflow for int */
long ans = 0;
for (int i = 1; i <= n; i++) {
/* removing the rightmost set bit
* e.g. 100100 -> 100000
* 000001 -> 000000
* 000000 -> 000000
* after removal, if it is 0, then it means it is power of 2
* as all power of 2 only contains 1 set bit
* if it is power of 2, we increase the bit length
*/
if (!(i & (i - 1)))
len++;
ans = (i | (ans << len)) % M;
}
return ans;
}
```
### 嘗試使用 __builtin_{clz,ctz,ffs} 改寫,並改進 $\mod 10^9 + 7$ 的運算
將 `x % (1e9 + 7)` 運算改用 bitwise operator,得先將 $10^9 + 7$ 使用二進位來表示
## 測驗 3
## 測驗 4