2023q1 Homework2 (quiz2)

contributed by < Ahsen-lab >

測驗 `1`

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

程式碼原理

對輸入的 64-bits 無號整數 x ，將其最高位元的 1 到最低位元中的所有位元都設定為 1，最後回傳 x + 1 ，即可得到大於且最接近 x 的 2 的冪的值。

舉例來說

x = 001 \overset{61}{\overset{⏞}{00. . .000}}

最高位元到最低位元中的第一個 1 是位在第 62 位元，首先重複 7 次 x |= x >> 1 的操作，將 x 中最高位元的 1 右側的 7 個位元都設定為 1

x = 0011111111 \overset{54}{\overset{⏞}{000. . .000}}

接著執行 x |= x >> 8 的操作，將第 55 位元往右的 8 個位元都設定為 1

x = 00 \overset{16}{\overset{⏞}{1111111111111111}} \overset{46}{\overset{⏞}{000. . .000}}

接著執行 x |= x >> 16 的操作，將第 47 位元往右的 16 個位元都設定為 1

x = 00 \overset{32}{\overset{⏞}{111. . .111}} \overset{30}{\overset{⏞}{000. . .000}}

接著執行 x |= x >> 32 的操作，將第 31 位元往右的 32 個位元都設定為 1

x = 00 \overset{62}{\overset{⏞}{111. . .111}}

最後回傳 x + 1

x + 1 = 01 \overset{62}{\overset{⏞}{000. . .000}}

如此便可得到大於且最接近 x 的 2 的冪的值。

當輸入的 x 是 2 的冪時 ( 例如 2 、 8 、 16 等等 ) ，上述程式會回傳錯誤的答案，因為題目要求的是找出最接近且 大於等於 2 的冪的值，而上述程式只會回傳最接近且大於 2 的冪的值。

換句話說，如果輸入的 x 是 2 的冪時，只需回傳 x 本身，不需要再尋找比它更大的 2 的冪。因此，上述程式在 x 是 2 的冪時會回傳錯誤的答案。

使用 `__builtin_clzl` 改寫

uint64_t next_pow2(uint64_t x)
{
    if (!x)
        return 0;

    int lead_0 = __builtin_clzl(x);
    uint64_t test = (uint64_t) 1 << (63 - lead_0);

    if (!lead_0)
        return test;

    if (x > test)
        return test << 1;
    else
        return x;
}

測驗 `2`

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & i - 1))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

程式碼原理

給定一個整數

n

，將 1 到

n

的二進位表示法依序串接在一起得到一個二進位字串，回傳其所代表的十進位數字 mod

10^{9} + 7

之值。

使用一個迴圈從 1 遍歷到

n

，如果當前遍歷到的數字是 2 的冪，增加 len 的長度， len 為當前數字其二進位表示的長度，然後將其與上一個結果串接起來，最終得到新的結果。

在函式中使用 !(i & i - 1) 來判斷 i 是否為 2 的冪，如果一個數字 i 是 2 的冪， i - 1 會剛好等於 ~i ，而 i & ~i 會等於 0 ，因此可以透過 !(i & i - 1) 來判斷 i 是否為 2 的冪。

在遍歷 1 到

n

的過程中，使用 i | (ans << len) ， ans 代表上一次計算所得的結果，使用了 long 型別的變數來存儲它，以避免整數溢出問題，len 為當前數字其二進位表示的長度，也是 ans 需向左位移的距離， ans 會向左位移 len 個 bits ，以便將 i 串接在後面。

最終，函式返回一個整數，即串接後的二進位表示對

10^{9} + 7

取模的結果。

使用 `__builtin_clz` 改寫

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    int int_bits = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        int_bits = sizeof(int) * 8;
        if (!(i & ~(1 << int_bits - __builtin_clz(i) - 1)))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

__builtin_clz 函式會返回一個整數的二進制表示中，左起第一個 1 之前的 0 的個數，可以利用此功能來改寫程式中判斷一個整數是不是 2 的冪的部分。

首先我給定一個變數 int_bits 表示 int 型別的位元數，計算方式為 sizeof(int) * 8 ， sizeof(int) 回回傳 int 型別的位元組 (byte) 數，每個位元組由 8 個位元組成，因此 sizeof(int) * 8 可得到 int 型別的位元數。

接著用你所不知道的 C 語言: bitwise 操作所提到的 clear a bit 的操作來將整數 i 二進制表示中最左側的 1 設為 0 ，透過 ~(1 << int_bits - __builtin_clz(i) - 1) 可以找到要將其設為 0 的那個位元，然後把它與 i 做 and 運算，如此便可將整數 i 二進制表示中最左側的 1 變為 0。

若 i 為 2 的冪，其二進制表示中應該只包含一個 1，將其設為 0 會使得整數 i 變成 0 ，因此若是在經過上述的操作後 i 變成 0 ，就表示 i 是 2 的冪。

使用 `__builtin_ctz` 改寫

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & ~(1 << __builtin_ctz(i))))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

__builtin_ctz 函式會返回一個整數的二進制表示中，右起第一個 1 右側的 0 的個數，可以利用此功能來改寫程式中判斷一個整數是不是 2 的冪的部分。

使用 ~(1 << __builtin_ctz(i)) 可以找到要將其設為 0 的那個位元，然後把它與 i 做 and 運算，如此便可將整數 i 二進制表示中最右側的 1 變為 0。

使用 `__builtin_ffs` 改寫

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(i & ~(1 << __builtin_ffs(i) - 1)))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

__builtin_ffs 函式會返回一個整數的二進制表示中，右起第一個 1 的位置，假設 i = 00001000 ，則 __builtin_ffs(i) 會返回 4 ，可以利用此功能來改寫程式中判斷一個整數是不是 2 的冪的部分。

使用 ~(1 << __builtin_ffs(i) - 1) 可以找到要將其設為 0 的那個位元，然後把它與 i 做 and 運算，如此便可將整數 i 二進制表示中最右側的 1 變為 0。

測驗 `3`

函式 `swar_count_utf8` 運作原理

size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + (len >> 3);

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = (1 << 3) * (len / 8)  - count;
    count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;

    return count;
}

swar_count_utf8 函式使用 SWAR 的技巧來計算一個給定的 UTF-8 字串中的 Unicode 字元數量。

const uint64_t *qword = (const uint64_t *) buf;
const uint64_t *end = qword + (len >> 3);

首先，把 UTF-8 字串 buf 轉換成一個指向 uint64_t 整數類型的指標 qword ，這樣就可以按照 uint64_t 的位數（64 位）進行處理，便於後續計算。接著計算 qword 的終止條件 end 。

end 的計算方式為 qword + (len >> 3) ， len >> 3 相當於 len / 8 ，( 因為 uint64_t 的大小是 8 個位元組），然後加上 qword 的初始值，得到 end 的位置，也就是將 buf 所占的總位元組每 8 個為一組分組後，剩餘的位元組開頭的位置 ( UTF-8 字串 buf 所占的總位元組數 len 不一定是 8 的倍數，因此可能會有剩餘 )。

size_t count = 0;
for (; qword != end; qword++) {
    const uint64_t t0 = *qword;
    const uint64_t t1 = ~t0;
    const uint64_t t2 = t1 & 0x04040404040404040llu;
    const uint64_t t3 = t2 + t2;
    const uint64_t t4 = t0 & t3;
    count += __builtin_popcountll(t4);
}

然後，對於每個 64-bit 無號整數，使用以下步驟操作來計算其中後續位元組 ( continuation byte(s) ) 的數量：

參考 2023q1 第 2 週測驗題 - 測驗 3

輸入的位元組

t0 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx|00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx]
                          ^^^^^^^^^                               ^^^^^^^^^
                          continuation byte                       continuation byte

反向運算 (not bit6)

t1 = ~t0
t1 = [11xx.xxxx|10xx.xxxx|01xx.xxxx|00xx.xxxx|11xx.xxxx|10xx.xxxx|01xx.xxxx|00xx.xxxx]

隔離反向的第 6 個位元

t2 = t1 & 0x4040404040404040
t2 = [0100.0000|0000.0000|0100.0000|0000.0000|0100.0000|0000.0000|0100.0000|0000.0000]

對第 6 個位元，向左位移 1 個位元 (移至第 7 個位元的位置)

t3 = t2 + t2
t3 = [1000.0000|0000.0000|1000.0000|0000.0000|1000.0000|0000.0000|1000.0000|0000.0000]

進行 not bit6 and bit7

t4 = t0 & t3
t4 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx|00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx] &
     [1000.0000|0000.0000|1000.0000|0000.0000|1000.0000|0000.0000|1000.0000|0000.0000]
   = [0000.0000|0000.0000|1000.0000|0000.0000|0000.0000|0000.0000|1000.0000|0000.0000]
                          ^^^^^^^^^                               ^^^^^^^^^
                          the non-zero byte                       the non-zero byte

以 __builtin_popcountll 計算，即 __builtin_popcountll(t4)，並將結過加到 count 裡。
重複上述步驟，最後得到的 count 即為後續位元組 ( continuation byte(s) ) 的數量。

count = (1 << 3) * (len / 8)  - count;
count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;
return count;

(1 << 3) * (len / 8) 將 len 除以 8，然後乘以 8（即左移 3 位），可使得位元組數量變成 8 的倍數 ( 為了配合 uint64_t 的位數（8 bytes）進行處理 )，接著減去後續位元組 ( continuation byte(s) ) 的數量，就可得到 UTF-8 字元的數量。

因為 UTF-8 字串 buf 所占的總位元組數 len 不一定是 8 的倍數，所以要計算 len % 8 來判斷 end 是否有指向 buf 尾端剩餘的位元組，當除數為

2^{n}

時，可以 len & n - 1 來表示對

n

做模運算，因此 len % 8 可改寫為 len & 7，假設所得的餘數大於 0 ，則使用 count_utf8 函式來處理剩餘的位元組，計算剩餘的位元組中所包含的 UTF-8 字元的數量，並將結果加到 count ，若餘數等於 0 則將 0 加到 count，最後回傳 count 。

測驗 `4`

bool is_pattern(uint16_t x)
{
    const uint16_t n = ~x + 1;
    return (n ^ x) < x;
}

程式碼原理

pattern: 8000 (1000 0000 0000 0000)
pattern: c000 (1100 0000 0000 0000)
pattern: e000 (1110 0000 0000 0000)
pattern: f000 (1111 0000 0000 0000)
pattern: f800 (1111 1000 0000 0000)
pattern: fc00 (1111 1100 0000 0000)
pattern: fe00 (1111 1110 0000 0000)
pattern: ff00 (1111 1111 0000 0000)
pattern: ff80 (1111 1111 1000 0000)
pattern: ffc0 (1111 1111 1100 0000)
pattern: ffe0 (1111 1111 1110 0000)
pattern: fff0 (1111 1111 1111 0000)
pattern: fff8 (1111 1111 1111 1000)
pattern: fffc (1111 1111 1111 1100)
pattern: fffe (1111 1111 1111 1110)
pattern: ffff (1111 1111 1111 1111)

從上面的 pattern 可看出，函式 is_pattern 會篩選出 16 位無號整數 x 的二進制表示中，有連續的 1 集中在左側而連續的 0 集中在右側的數。

精簡版的程式碼使用了二補數的概念，對輸入的無號整數 x 取反加 1 得到 n，然後將 n 與 x 做 XOR 運算，如果 x 符合上述的 pattern，n 會是 2 的冪，其值為 x 的二進制表示中，最右側那個 1 的位元所代表的數，所以 n ^ x 計算的結果相當於消除 x 的二進制表示中最右側那個 1 ，因此，如果 (n ^ x) < x ，則表示 x 符合上述 pattern，這就是 is_pattern 函式要檢查的模式，因此返回 true，否則返回 false。

2023q1 Homework2 (quiz2)

測驗 1

程式碼原理

使用 __builtin_clzl 改寫

測驗 2

程式碼原理

使用 __builtin_clz 改寫

使用 __builtin_ctz 改寫

使用 __builtin_ffs 改寫

測驗 3

函式 swar_count_utf8 運作原理

測驗 4

程式碼原理

Read more

2023q1 Homework1 (lab0)

2023q1 Homework3 (fibdrv)

2022q1 Homework3 (fibdrv)

2022q1 Homework2 (quiz2)

測驗 `1`

使用 `__builtin_clzl` 改寫

測驗 `2`

使用 `__builtin_clz` 改寫

使用 `__builtin_ctz` 改寫

使用 `__builtin_ffs` 改寫

測驗 `3`

函式 `swar_count_utf8` 運作原理

測驗 `4`