2023q1 Homework2 (quiz2)

contributed by < willwillhi1 >

測驗一

uint64_t next_pow2(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> AAAA;
    x |= x >> 16;
    x |= x >> BBBB;
    return CCCC;
}

此實作的原理是將最高位的一 bit 依照 1, 2, 4, 8, 16, 32 的順序依序向右移，且每次都要用 or 把 1 寫到原本的 x，最後再對 x 加一找到大於且最接近 x 的冪的數。

舉例來說如果 x 為 (0010 0100)₂，做到最後 x 會變成 (0011 1111)₂，所以再對它加一就可以得到 (0100 0000)₂。

用 __builtin_clzl 改寫，先去看其定義

Similar to __builtin_clz, except the argument type is unsigned long.
__builtin_clz 的定義為

Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

__builtin_clzl 簡單來說就是回傳 leading zero 的個數
因此可以把上面的程式改寫為:

uint64_t next_pow2(uint64_t x)
{
    int count = __builtin_clzl(x);
    int num = 64-count;
    int ans = 1;
    while (num) {
        ans <<= 1;
        num--;
    }
    return ans;
}

測驗二

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
         // if it is power of 2, we increase the bit length
        if (!(DDDD))
            len++;
        ans = (i | (EEEE)) % M;
    }
    return ans;
}

if it is power of 2, we increase the bit length 所以 DDDD 要做的事情就是判斷 i 是否為 2 的冪，是就把 len 加一。
由上述可知 DDDD 為 i & (i-1)，因為如果 i 為 2 的冪， i & (i-1) 的結果就會是 0。
要把 len 加一的情況是因為位移需要，舉例來說如果 i = 1, len = 1，則當 i = 2 時因為 2 = 0b10，由最低位數到最高位的 1 的總長度為 2 個 bits，所以位移共需要 2 個 bit，len 要加一。
由上可知 EEEE 就是 ans << len。

測驗三

參考 Ahsen-lab 同學的講解
此題為運算出 utf-8 字元數量，原理為只要判斷出首位元組和後續位元組就可以得出 utf-8 字元數量。而後續位元組都有一個共通點：最高的兩個位元為 10。




















size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + (len >> 3);

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = (1 << 3) * (len / 8)  - count;
    count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;

    return count;
}

首先，以每 64 bits 為單位，所以先把 buf 轉換成 uint64_t，之後 end 表示為 buf 的最後一個滿 64 bits 的位址。


const uint64_t *qword = (const uint64_t *) buf;
const uint64_t *end = qword + (len >> 3);

然後，對於每個 64 bits 無號整數，使用以下步驟操作來計算其中後續位元組 ( continuation byte(s) ) 的數量：









size_t count = 0;
for (; qword != end; qword++) {
    const uint64_t t0 = *qword;
    const uint64_t t1 = ~t0;
    const uint64_t t2 = t1 & 0x04040404040404040llu;
    const uint64_t t3 = t2 + t2;
    const uint64_t t4 = t0 & t3;
    count += __builtin_popcountll(t4);
}

輸入的位元組



t0 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx]
                          ^^^^^^^^^
                          continuation byte

反向運算 (not bit6)


t1 = ~t0
t1 = [11xx.xxxx|10xx.xxxx|01xx.xxxx|00xx.xxxx]

隔離反向的第 6 個位元


t2 = t1 & 0x40404040
t2 = [0100.0000|0000.0000|0100.0000|0000.0000]

對第 6 個位元，向左位移 1 個位元 (移至第 7 個位元的位置)


t3 = t2 + t2 = t2 << 1
t3 = [1000.0000|0000.0000|1000.0000|0000.0000]

進行 not bit6 and bit7






t4 = t0 & t3
t4 = [00xx.xxxx|01xx.xxxx|10xx.xxxx|11xx.xxxx] &
     [1000.0000|0000.0000|1000.0000|0000.0000]
   = [0000.0000|0000.0000|1000.0000|0000.0000]
                          ^^^^^^^^^
                          the only non-zero byte

以 population count 指令計算，即 popcount(t4)
最後用總長(以 64 bits 為單位)扣掉 count 得到 UTF-8 字元的數量後，再用 count_utf8 處理剩下不足 64 bits 的部分(也就是長度為 len & 7 = len % 8 的部分)



count = (1 << 3) * (len / 8)  - count;
count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;
return count;

測驗四

要判斷 x 是否符合以下 pattern:

pattern: 0x8000 = 0b1000 0000 0000 0000
pattern: 0xc000 = 0b1100 0000 0000 0000
pattern: 0xe000 = 0b1110 0000 0000 0000
pattern: 0xf000 = 0b1111 0000 0000 0000
pattern: 0xf800 = 0b1111 1000 0000 0000
pattern: 0xfc00 = 0b1111 1100 0000 0000
pattern: 0xfe00 = 0b1111 1110 0000 0000
pattern: 0xff00 = 0b1111 1111 0000 0000
pattern: 0xff80 = 0b1111 1111 1000 0000
pattern: 0xffc0 = 0b1111 1111 1100 0000
pattern: 0xffe0 = 0b1111 1111 1110 0000
pattern: 0xfff0 = 0b1111 1111 1111 0000
pattern: 0xfff8 = 0b1111 1111 1111 1000
pattern: 0xfffc = 0b1111 1111 1111 1100
pattern: 0xfffe = 0b1111 1111 1111 1110
pattern: 0xffff = 0b1111 1111 1111 1111

bool is_pattern(uint16_t x)
{
    const uint16_t n = ~x+1;
    return (n ^ x) < x;
}

EEEE = ~x+1，這樣做的效果可以讓 n^x 的結果保證小於 x。

-x 亦可 jserv
以 (1111 1100)₂ 和 (1111 0100)₂ 為例
- (1111 1100)₂ 的 2 的補數是 (0000 0100)₂，與原本的數做 XOR 為 (1111 1000)₂，效果是把原本的數的最低位的 １ 改成 0，所以最後的結果一定會小於原本的數。
- 但是 (1111 0100)₂ 的 2 的補數是 (0000 1100)₂，與原本的數做 XOR 為 (1111 1000)₂，可以發現除了把最低位的 1 改成 0 之外，還會把原本的數的 1 中間的 0 的位元會變成 1 (粗體的部分)，導致結果大於原本的數。

尊重視障朋友 (含色弱者) 閱讀的權益，解說程式碼時，避免談及特別的顏色。
jserv

2023q1 Homework2 (quiz2)

測驗一

測驗二

測驗三

測驗四

Read more

2023q1 Homework3 (fibdrv)

2023q1 Homework1 (lab0)

指標篇 心得

save_ie

指標篇心得