2023q1 Homework2 (quiz2)

contributed by < kata1219 >

測驗 `1`

參照你所不知道的 C 語言: bitwise 操作，考慮 next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> AAAA;
    x |= x >> 16;
    x |= x >> BBBB;
    return CCCC;
}

AAAA = 8
BBBB = 32
CCCC = ++x

程式碼也可以改寫為

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return ++x;
}

在 x 為 2 的冪時答案會錯誤，需要提前檢查 x 是否為 2 的冪

延伸問題

解釋上述程式碼原理，並用 __builtin_clzl 改寫

int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
int __builtin_clzl (unsigned long)
Similar to __builtin_clz, except the argument type is unsigned long.

我們需要把最左側為 1 的位元及其右側的位元都設為 1，再加上 1 後即可得到最接近且大於等於 2 的冪。給定的範例程式碼中出現了 7 次 x |= x >> 1;，代表 x 中為 1 的位元及其後面 7 位元共 8 位元設為 1。下一行的功能是將這 8 位元的 1 拓展一倍，以此類推，直到將 1 後面的 64 位元都設為 1。

uint64_t next_pow2(uint64_t x)
{
    return (1 << 64 - __builtin_clzl(x));
}

在 Linux 核心原始程式碼找出類似的使用案例並解釋

當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？

提示: 可執行 cc -O2 -std=c99 -S next_pow2.c 並觀察產生的 next_pow2.s 檔案，確認 bsrq 指令 (表示 “Bit Scan Reverse”)

可以，以下為節錄的程式碼。

next_pow2:
.LFB13:
	.cfi_startproc
	endbr64
	bsrq	%rdi, %rdi
	movl	$64, %ecx
	movl	$1, %eax
	xorq	$63, %rdi
	subl	%edi, %ecx
	sall	%cl, %eax
	cltq
	ret
	.cfi_endproc

測驗 `2`

給定一整數

n

，回傳將 1 到

n

的二進位表示法依序串接在一起所得到的二進位字串，其所代表的十進位數字 mod

10^{9} + 7

之值。

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(DDDD))
            len++;
        ans = (i | (EEEE)) % M;
    }
    return ans;
}

DDDD = i ^ (i & ~(i - 1))
EEEE = ans << len

延伸問題

解釋上述程式碼運作原理

在程式碼中，每回合 ans 需要空出 i 所佔的位元數後進行合併並取餘數。len 代表 i 所占用的位元數，當 i 是 2 的冪時長度需要加 1。

嘗試使用 __builtin_{clz,ctz,ffs} 改寫，並改進 mod
$10^{9} + 7$ 的運算

int concatenatedBinary_built_in(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++)
    {
        if (!(i ^ 1 << __builtin_ctz(i)))
        // 或 if(__builtin_clz(i) + __builtin_ctz(i) == 31)
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

目前沒想到怎麼用 bitwise operation 實作 mod
$10^{9} + 7$

參照 Barrett reduction
jserv

測驗 `3`

SWAR 實作

size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + len >> 3;

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = (1 << AAAA) * (len / 8)  - count;
    count += (len & BBBB) ? count_utf8((const char *) end, len & CCCC) : DDDD;

    return count;
}

AAAA = 3
BBBB = 7
CCCC = 7
DDDD = 0

延伸問題

解釋上述程式碼運作原理，比較 SWAR 和原本的實作效能落差

在 Linux 核心原始程式碼找出 UTF-8 和 Unicode 相關字串處理的程式碼，探討其原理，並指出可能的改進空間

測驗 `4`

以下程式碼可判定 16 位元無號整數是否符合特定樣式(在 leftmost set bit 左側的位元皆為 1 且 x 不為 0)，將程式碼改寫為使用 bitwise operation。

#include <stdint.h>
#include <stdbool.h>

bool is_pattern(uint16_t x)
{
    if (!x)
        return 0;

    for (; x > 0; x <<= 1) {
        if (!(x & 0x8000))
            return false;
    }

    return true;
}

bool is_pattern(uint16_t x)
{
    const uint16_t n = EEEE;
    return (n ^ x) < FFFF;
}

EEEE = 0xFFFF
FFFF = (x & ~(x - 1))

延伸問題

解釋上述程式碼運作原理

嘗試將 n 設為幾個常見的數，如 0x0000、0xFFFF 等等，代入 n ^ x 發現只要 x 符合特定樣式，在 0xFFFF ^ x 後，皆會變成 2 的冪 - 1，因此我們找出 x 的 rightmost set bit，只要 XOR 後大於等於 rightmost set bit 的都不符合特定樣式。

在 Linux 核心原始程式碼找出上述 bitmask 及產生器，探討應用範疇

參見 Data Structures in the Linux Kernel

2023q1 Homework2 (quiz2)

測驗 1

延伸問題

測驗 2

延伸問題

測驗 3

延伸問題

測驗 4

延伸問題

Read more

2023q1 Homework3 (fibdrv)

2023q1 Homework1 (lab0)

2023q1 Homework1 (quiz1)

2023q1 Homework3 (quiz3)

測驗 `1`

測驗 `2`

測驗 `3`

測驗 `4`