2023q1 Homework2 (quiz2)

contributed by < Korin777 >

測驗三

size_t swar_count_utf8(const char *buf, size_t len)
{
    const uint64_t *qword = (const uint64_t *) buf;
    const uint64_t *end = qword + len >> 3;

    size_t count = 0;
    for (; qword != end; qword++) {
        const uint64_t t0 = *qword;
        const uint64_t t1 = ~t0;
        const uint64_t t2 = t1 & 0x04040404040404040llu;
        const uint64_t t3 = t2 + t2;
        const uint64_t t4 = t0 & t3;
        count += __builtin_popcountll(t4);
    }

    count = ((len >> 3) << 3) - count;
    count += (len & 7) ? count_utf8((const char *) end, len & 7) : 0;

    return count;
}

與原本的 count_utf8 相比， swar_count_utf8 充分利用處理器暫存器一次檢查 8 個 byte
先計算出 continuation byte 的數量，再與字串所佔的 bytes 相減即為 utf-8 字元數
當字串所佔 bytes 不是 8 的倍數，在透過 count_utf8 補齊剩餘的 utf-8 字元

效能比較

github

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

測試資料為隨機生成的 utf-8 字元
綠色為原本的實作，紫色則為加入 SWAR 技巧的實作，直覺上執行時間應該會快 8 倍左右，實際上卻快了12倍以上

測驗四

從輸出可以觀察到符合的樣式為左邊皆為 1 右邊皆為 0 的 bit pattern e.g. 1100

bool is_pattern(uint16_t x)
{
    const uint16_t n = -x;
    return (n ^ x) < x;
}

考慮以下三種情況：

當 x 符合，-x ^ x 一定會將最低位的 1 改為 0，使他比 x 小
e.g. 若 x =
$110_{2}$ ，則
$010_{2}$ ^
$110_{2}$ =
$100_{2}$
當 x 不符合， -x ^ x 一定會將最高位 1 及次高位 1 中間的 0 改為 1，使他比 x 大
e.g. 若 x =
$101_{2}$ ，則
$011_{2}$ ^
$101$ =
$110_{2}$
當 x 為 0，-x ^ x 仍為 0

題目連結

2023q1 Homework2 (quiz2)

測驗三

效能比較

測驗四

Read more

2023q1 Homework7 (ktcp)

2023q1 Homework5 (quiz5)

2023q1 Homework4 (quiz4)

2023q1 Homework3 (quiz3)