2023q1 Homework2 (quiz2)

contributed by < yutingshih >

開發環境

目前尚未在實體機器執行 GNU/Linux，因此暫時在 Apple M1 Mac 上使用 Lima VM，參考 Lima VM with a foreign architecture (slow mode)

$ uname -a
Linux lima-linux2023 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               AuthenticAMD
  Model name:            QEMU Virtual CPU version 2.5+
    CPU family:          15
    Model:               107
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            1
    BogoMIPS:            1999.99
    Flags:               fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_
                         good nopl cpuid extd_apicid pni cx16 hypervisor lahf_lm cmp_legacy svm 3dnowprefetch vmmcall
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (4 instances)
  L1i:                   256 KiB (4 instances)
  L2:                    2 MiB (4 instances)
  L3:                    64 MiB (4 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3

測驗 1

根據題目描述：

next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值，例如：

next_pow2(7) = 8

next_pow2(13) = 16

next_pow2(42) = 64

其中「找出最接近且大於等於 2 的冪的值」指的應該是從所有大於等於 x 而且屬於 2 的非負整數次冪的數值當中選出最小值，即

n e x t_p o w 2 (x) = 2^{n}, w h e r e 2^{n - 1} < x \leq 2^{n}, n \in N \cup {0}

因此當 x 剛好等於 2 的非負整數次冪時，next_pow(x) 應該要剛好等於 x，例如：

next_pow2(1) = 1
next_pow2(2) = 2
next_pow2(4) = 4

這個結果剛好與題目給的範例程式輸出吻合

#include <stdint.h>
static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; }

uint64_t next_pow2(uint64_t x)
{
    uint8_t lo = 0, hi = 63;
    while (lo < hi) {
        uint8_t test = (lo + hi)/2;
        if (x < pow2(test)) { hi = test; }
        else if (pow2(test) < x) { lo = test+1; }
        else { return pow2(test); }
    }
    return pow2(lo);
}

解釋程式碼原理

依照題目的指示，完成以下 next_pow2 的改寫

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

其想法為將 x 最高位的 set bit (值為 1 的位元) 以下的所有位元都設為 1，再將結果 +1，即獲得一個最接近的 2 的冪，等同於

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

但對於 x 原本就是 2 的冪的狀況，以上實作顯然無法給出正確答案，例如當 x 等於 16 時：

	`x`
initial	`00...001000` (16)
after left shift	`00...001111` (31)
after add 1	`00...010000` (32)

因此我們需要對以上實作做一些修改，我們觀察一下給定不同 x 值時 next_pow2 計算結果的規律

`x`	`next_pow2(x)` 計算結果	`next_pow2(x)` 正確答案
$2^{n}$	$2^{n + 1}$	$2^{n}$
$2^{n} - 1$	$2^{n}$	$2^{n}$
$2^{n} - 2$	$2^{n}$	$2^{n}$
…	$2^{n}$	$2^{n}$
$2^{n} - 2^{n - 1} + 1$	$2^{n}$	$2^{n}$
$2^{n - 1}$	$2^{n}$	$2^{n - 1}$

我們可以發現當 x 為 2 的冪時，輸出會是正確答案的 2 倍，同時我們發現如果把 x 減 1 後再做位元運算即可滿足正確答案

`x`	`next_pow2(x-1)` 計算結果	`next_pow2(x)` 正確答案
$2^{n}$	$2^{n}$	$2^{n}$
$2^{n} - 1$	$2^{n}$	$2^{n}$
$2^{n} - 2$	$2^{n}$	$2^{n}$
…	$2^{n}$	$2^{n}$
$2^{n} - 2^{n - 1} + 1$	$2^{n}$	$2^{n}$
$2^{n - 1}$	$2^{n - 1}$	$2^{n - 1}$

但這個做法 (左移運算之前先減 1) 在 x 為 0 時仍會出錯，根據題目要求 next_pow2(0) 應該要得到 1，但當我們對 0 進行減法卻會導致溢位發生，最後算出來結果是 0，如同以下過程

	`x`
initial	`00...000000` (0)
after left shift	`11...111111` (UINT64_MAX)
after add 1	`00...000000` (0)

因此我們需要特別處理 x 為 0 的狀況，當 x > 0 時，先減 1 再做位元運算，當 x == 0 時，直接做位元運算 (也可以說減 0 後再做位元運算)，因此我們只需要添加下面這一行即可

uint64_t next_pow2(uint64_t x)
{
+   x -= !!x;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

用 __builtin_clzl 改寫

uint64_t next_pow2(uint64_t x)
{
    int nlz = __builtin_clzl(x);  /* number of leading zeros */
    return ((uint64_t)-1 >> nlz) + 1;
}

在 Linux 核心原始程式碼找出類似的使用案例並解釋

當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？

測驗 2

解釋程式碼運作原理

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & (i - 1)))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

嘗試使用 _builtin{clz,ctz,ffs} 改寫，並改進
$\mod 10^{9} + 7$ 的運算

將 x % (1e9 + 7) 運算改用 bitwise operator，得先將

10^{9} + 7

使用二進位來表示

2023q1 Homework2 (quiz2)

開發環境

測驗 1

解釋程式碼原理

用 __builtin_clzl 改寫

在 Linux 核心原始程式碼找出類似的使用案例並解釋

當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？

測驗 2

解釋程式碼運作原理

嘗試使用 _builtin{clz,ctz,ffs} 改寫，並改進
$\mod 10^{9} + 7$ 的運算

測驗 3

測驗 4

2023q1 Homework2 (quiz2)

開發環境

測驗 1

解釋程式碼原理

用 __builtin_clzl 改寫

在 Linux 核心原始程式碼找出類似的使用案例並解釋

當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？

測驗 2

解釋程式碼運作原理

嘗試使用 _builtin{clz,ctz,ffs} 改寫，並改進 mod109+7 的運算

測驗 3

測驗 4

嘗試使用 _builtin{clz,ctz,ffs} 改寫，並改進
$\mod 10^{9} + 7$ 的運算