2023q1 Homework2 (quiz2)

contributed by < EdwardCKC >

測驗 `1`

題目: 針對給定無號 64 位元數值 x 找出最接近且大於等於 2 的冪的值,例如:

next_pow2(6) = 8
next_pow2(15) = 16
next_pow2(64) = 64

x |= x >> 1 因為 >> precedence 比 | 高，所以執行順序是 x = x | (x >> 1)

因為每做一次 x |= x >> 1 會向 LSB 多擴展 1 個 1

AAAA 之前會有 8 個 1, 而 AAAA 之後需要 16 個 1, 所以 AAAA = 8

同理 BBBB = 32

最後使 x 中 MSB 的 1 擴展到 LSB
然後 CCCC = x + 1 剛好進位到下一個 2 的冪的值。
但如果

x = 2^{n}

x = 2^{64}

答案會 +1 或 integer overflow, 所以要 x - 1

配合 `__builtin_clzl` 改寫

uint64_t next_pow2(uint64_t x)
{
    /* If x is 0, the result is undefined. */
    /* avoid x - 1 is 0 or -1 if x is 1 or 0 */
    if (x <= 1)
        return 1;
    
    uint64_t carry = 64 - __builtin_clzl(x - 1);
    return 1UL << (carry);
}

在 Linux 核心原始程式碼找出類似的使用案例並解釋

從 C 語言：數值系統的roundup_pow_of_two() 得知類似操作,

unsigned long __roundup_pow_of_two(unsigned long n)

Description

round the given value up to the nearest power of two - the result is undefined when n == 0 - this can be used to initialise global variables from constant data

從grep -r "rounddown_pow_of_two\|roundup_pow_of_two" linux-6.2.2/kernel/ 找到 hashtab.c 的 htab_map_alloc有類似的使用案例

htab->n_buckets = roundup_pow_of_two(htab->map.max_entries);

在 line 493 利用 roundup_pow_of_two 確保 n_buckets 的大小是2的冪的值

編譯器能否產生對應的 x86 指令

在Compiler Explorer x86-64 gcc 11.3 執行, 結果有 bsrq 指令

測驗 `2`

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & i-1))
            len++;
        ans = (i | ans << len) % M;
    }
    return ans;
}

len 是 int 在記錄 bit 的有效長度,只有
len 的長度可以用 bitwise 操作 !(i & i-1) 判定 i 的bit長度是否需要 len++

最後的 ans 是先把 ans 向左移動(ans << len), 之後用 bitwise or 去把兩個值的 bit 接在一起再 mod M

_builtin{clz,ctz,ffs} 改寫

for (int i = 1; i <= n; i++)
        ans = (i | (ans << (32 - __builtin_clz(i)))) % M;

return ans;

利用 32 - __builtin_clz(i) 算出位移量,可以減少決定 len的值的 if statement 和 len 參數

2023q1 Homework2 (quiz2)

測驗 `1`

配合 `__builtin_clzl` 改寫

在 Linux 核心原始程式碼找出類似的使用案例並解釋

編譯器能否產生對應的 x86 指令

測驗 `2`

_builtin{clz,ctz,ffs} 改寫

測驗 `3`

測驗 `4`

2023q1 Homework2 (quiz2)

測驗 1

配合 __builtin_clzl 改寫

在 Linux 核心原始程式碼找出類似的使用案例並解釋

編譯器能否產生對應的 x86 指令

測驗 2

_builtin{clz,ctz,ffs} 改寫

測驗 3

測驗 4

Read more

2023q1 Homework1 (lab0)

2023q1 Homework1 (quiz1)

測驗 `1`

配合 `__builtin_clzl` 改寫

測驗 `2`

測驗 `3`

測驗 `4`