2023q1 Homework2 (quiz2)

測驗一

next_pow2 可針對給定無號 64 位元數值 x，找出最接近且大於等於 2 的冪的值，例如:

next_pow2(7) = 8
next_pow2(13) = 16
next_pow2(42) = 64

以下是可能的實作方式:

#include <stdint.h>
static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; }

uint64_t next_pow2(uint64_t x)
{
    uint8_t lo = 0, hi = 63;
    while (lo < hi) {
        uint8_t test = (lo + hi)/2;
        if (x < pow2(test)) { hi = test; }
        else if (pow2(test) < x) { lo = test+1; }
        else { return pow2(test); }
    }
    return pow2(lo);
}

然而上述程式碼存在分支，於是我們可考慮建構以下「填補」位元表示中的 1:

x = 0010000000000000
x = 0011000000000000
x = 0011110000000000
x = 0011111111000000
x = 0011111111111111

最初 x 是 0010000000000000，經過一系列操作後，成為 0011111111111111，亦即設定 (set，即指派為 1) 自原本最高位元到最低位元中，所有的位元。

我們嘗試改寫為以下:

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> AAAA;
    x |= x >> 16;
    x |= x >> BBBB;
    return CCCC;
}

請補完程式碼，使其符合預期。作答規範:

AAAA 和 BBBB 皆為數值，且 AAAA 小於 BBBB
CCCC 為表示式

延伸問題:

解釋上述程式碼原理，並用 __builtin_clzl 改寫

int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
int __builtin_clzl (unsigned long)
Similar to __builtin_clz, except the argument type is unsigned long.
在 Linux 核心原始程式碼找出類似的使用案例並解釋
當上述 clz 內建函式已運用時，編譯器能否產生對應的 x86 指令？

提示: 可執行 cc -O2 -std=c99 -S next_pow2.c 並觀察產生的 next_pow2.s 檔案，確認 bsrq 指令 (表示 "Bit Scan Reverse")

程式碼原理

程式碼輸入範圍為

0

2^{64} - 1

，將數值帶入前七個式子裡時x |= x >> 1;，相當於右移7次，並在前面填入7個1，跳過x |= x >> AAAA;，並執行x |= x >> 16;時，以

2^{60}

為例，可看出中間多了8個0，因為本函式的目的為填補位元的1，於是可以推得AAAA的值為8，到執行x |= x >> BBBB;前，共填補了1+7+8+16 ＝ 32個1，輸入值最大有64個位元，於是還須填補32個1才能將所有位元補滿，於是可推得BBBB為32，因為最大只會有64個1，所以超過

2^{63}

的值，其最接近的2的冪

2^{64}

無法表示出來。最後將x + 1即為最接近且大於等於 2 的冪的值，但是當輸入為2的冪時，輸出會是大於輸入的值，例如輸入64，輸出為128。

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

文字訊息不要用圖片展現。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

jserv

答案

AAAA = 8
BBBB = 32
CCCC = x + 1

程式碼

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;
    return x + 1;
}

改寫

int __builtin_clz (unsigned int x) 會回傳自 MSB 為 1 ，其左邊 0 的數量

uint64_t x = 127
Binary format = 00...001111111
                       ^
                       7
Output：57

64 - 57 = 7，可以藉由這個函式得知 MSB 為 1 的位置，只須往左位移 1 bit 就是最接近且大於等於 2 的冪的值，所以 7 可以看作是位移量，將 1 向左移 7 bit 就是 127 最接近且大於等於 2 的冪的值 (128) ，可以改寫成：

uint64_t next_pow2_2(uint64_t x)
{
    return 1 << (64 - __builtin_clzl(x));;
}

測驗二

LeetCode 1680. Concatenation of Consecutive Binary Numbers 給定一整數

n

，回傳將 1 到

n

的二進位表示法依序串接在一起所得到的二進位字串，其所代表的十進位數字 mod

10^{9} + 7

之值。

測試資料 1:
輸入： n = 1
輸出： 1

"1" 於二進位中對應著十進位中 1 的值

測試資料 2:
輸入： n = 3
輸出： 27

二進位中，1 、 2 、 3 對應著 1, 10 和 11。
將它們串接在一起，我們得到 11011，其對應著十進位數值 27

測試資料 3:
輸入： n = 12
輸出： 505379714

串接結果為 1101110010111011110001001101010111100
該十進位值為 118505380540
mod
$10^{9} + 7$ 後，結果為 505379714

以下是可能的實作:

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        // removing the rightmost set bit
        // e.g. 100100 -> 100000
        //      000001 -> 000000
        //      000000 -> 000000
        // after removal, if it is 0, then it means it is power of 2
        // as all power of 2 only contains 1 set bit
        // if it is power of 2, we increase the bit length
        if (!(DDDD))
            len++;
        ans = (i | (EEEE)) % M;
    }
    return ans;
}

請補完程式碼，使其符合預期。作答規範:

DDDD 和 EEEE 皆為表示式

延伸問題:

解釋上述程式碼運作原理
嘗試使用 __builtin_{clz,ctz,ffs} 改寫，並改進 mod
$10^{9} + 7$ 的運算

程式碼原理

DDDD是為了判斷 i 是否為 2 的冪，將 2 的冪轉成二進位，可以發現除了最高位元是 1 其餘都是 0，又可以發現，2 的冪減 1 後，剛好等於原數 NOT 後的值。在數位邏輯中 N & !N = 0，我們可以透過這個方法來判斷 N 是否為 2 的冪，所以是否為 2 的冪的判斷規則為N & (N-1) == 0，也就是!(i & (i - 1))。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

ans每次須左移，才能將後續的二進位數值放入，位移量就是 len 的值，len 的值要增加，需要上述判斷式為 2 的冪才會增加，當 i = 1 時，len = 1，當 i = 2 時，len = 2，所以 len 恰為所需的位移量，於是為ans << len。

答案

DDDD = i & (i - 1)
EEEE = ans << len

程式碼

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        // removing the rightmost set bit
        // e.g. 100100 -> 100000
        //      000001 -> 000000
        //      000000 -> 000000
        // after removal, if it is 0, then it means it is power of 2
        // as all power of 2 only contains 1 set bit
        // if it is power of 2, we increase the bit length
        if (!(i & (i - 1)))
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

改寫

int __builtin_clz (unsigned int x) 會回傳自 MSB 為 1 ，其左邊 0 的數量
int __builtin_ctz (unsigned int x) 會回傳自 LSB 為 1 ，其右邊 0 的數量
int __builtin_ffs (unsigned int x) 會回傳 1 + LSB 為 1 的 index

int x = 128;
binary format = 10000000
__builtin_clz = 24
__builtin_ctz = 7
__builtin_ffs = 8

可以觀察到，當輸入為 2 的冪時，其__builtin_clz(x) + __builtin_ffs(x) 會等於 bits 數，所以判斷式可以改寫成：

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; 
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        if (__builtin_ffs(i) + __builtin_clz(i) == 32)
            len++;
        ans = (i | (ans << len)) % M;
    }
    return ans;
}

測驗三

SWAR 通常不易實作，因此，我們觀察它們的位元模式：(從最低位元起算) 第 7 位元為 1，且第 6 位元為 0。於是我們可採用以下表示式:

not bit6 and bit7

再計算 (使用 population count 指令) 有多少位元組裡頭的 1 (即具有此屬性)。__builtin_popcount

輸入的位元組

t0 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx]
                          ^^^^^^^^^
                          continuation byte

反向運算 (not bit6)

t1 = ~t0
t1 = [11xx.xxxx|10xx.xxxx|01xx.xxxx|00xx.xxxx]

隔離反向的第 6 個位元

t2 = t1 & 0x40404040
t2 = [0100.0000|0000.0000|0100.0000|0000.0000]

對第 6 個位元，向左位移 1 個位元 (移至第 7 個位元的位置)

t3 = t2 + t2
t3 = [1000.0000|0000.0000|1000.0000|0000.0000]

進行 not bit6 and bit7

t4 = t0 & t3
t4 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx] 
   & [1000.0000|0000.0000|1000.0000|0000.0000]
   = [0000.0000|0000.0000|1000.0000|0000.0000]
                          ^^^^^^^^^
                          the only non-zero byte

以 population count 指令計算，即 popcount(t4)

SWAR 實作如下:

size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + len >> 3;

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = (1 << AAAA) * (len / 8)  - count;
    count += (len & BBBB) ? count_utf8((const char *) end, len & CCCC) : DDDD;

    return count;
}

程式碼原理

const uint64_t *qword = (const uint64_t *) buf;
const uint64_t *end = qword + len >> 3;

為了提升效能，將1 byte 的 char * 強制轉形成8 bytes 的 qword ，因為每次讀取8個bytes，所以字串總長度要除8，也就是 len >> 3 。

for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

迴圈會去計算 continuation bytes 數量。

count = (1 << 3) * (len / 8)  - count;

是為了計算出能被8 bytes整除中的 UTF-8 字元數量，(1 << 3) * (len / 8) 能計算出總共有多少能被8 bytes 整除的 bytes 數量，再去減掉 continuation bytes 數量，即為 UTF-8 字元數量。

count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;

這部份是用來計算那些無法被8 bytes 整除的部份，使用 count_utf8 來計算剩餘的 UTF-8 字元數量，再加上之前計算的 UTF-8 字元數量，即為總字元數量。

答案

AAAA = 3
BBBB = 7
CCCC = 7
DDDD = 0

程式碼

size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + len >> 3;

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = (1 << 3) * (len / 8)  - count;
    count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;

    return count;
}

測驗四

以下程式碼可判定 16 位元無號整數是否符合特定樣式 (pattern):

#include <stdint.h>
#include <stdbool.h>

bool is_pattern(uint16_t x)
{
    if (!x)
        return 0;

    for (; x > 0; x <<= 1) {
        if (!(x & 0x8000))
            return false;
    }

    return true;
}

符合上述程式碼的樣式，如下:

pattern: 8000 (32768)
pattern: c000 (49152)
pattern: e000 (57344)
pattern: f000 (61440)
pattern: f800 (63488)
pattern: fc00 (64512)
pattern: fe00 (65024)
pattern: ff00 (65280)
pattern: ff80 (65408)
pattern: ffc0 (65472)
pattern: ffe0 (65504)
pattern: fff0 (65520)
pattern: fff8 (65528)
pattern: fffc (65532)
pattern: fffe (65534)
pattern: ffff (65535)

改寫上述程式碼，使其達到等價行為，但更精簡有效。

bool is_pattern(uint16_t x)
{
    const uint16_t n = EEEE;
    return (n ^ x) < FFFF;
}

程式碼原理

若為符合 pattern 的數值，在做完二補數後，其值會是最左邊的一個 bit 為 1，其餘是 0 ，再對自己做 XOR ，則會使原本值的最右邊的 1 變成 0 ，故結果會小於原本的值。若為不符合 pattern 的數值，再做完 XOR ，其值會是除了最右邊一個 bit 為 0 ，其餘為 1 ，故結果會大於原本的值。

32565
 x     = 0111111100110101
-x     = 1000000011001011
-x ^ x = 1111111111111110

63488
x      = 1111100000000000
-x     = 0000100000000000
-x ^ x = 1111000000000000

答案

EEEE = -x
FFFF = x

程式碼

bool is_pattern(uint16_t x)
{
    const uint16_t n = -x;
    return (n ^ x) < x;
}

2023q1 Homework2 (quiz2)

測驗一

程式碼原理

答案

改寫

測驗二

程式碼原理

答案

改寫

測驗三

程式碼原理

答案

測驗四

程式碼原理

答案

Read more

Linux Linked-List Queue

Linux Fibonacci Driver

2023q1 Linux核心設計-問題紀錄

2023q1 Homework6 (quiz5)