2022q1 Homework 5 (quiz8)

測驗題目

測驗1

利用上述 SIMD within a register (SWAR) 的技巧，我們可改寫為以下 memchr_opt 函式:































































void *memchr_opt(const void *src_void, int c, size_t length)
{
    const unsigned char *src = (const unsigned char *) src_void;
    unsigned char d = c;

    /* 判斷是否與 long 對齊 */
    while (UNALIGNED(src)) {
        if (!length--)
            return NULL;
        if (*src == d)
            return (void *) src;
        src++;
    }

    if (!TOO_SMALL(length)) {
        /* If we get this far, we know that length is large and
         * src is word-aligned.
         */

        /* The fast code reads the source one word at a time and only performs
         * the bytewise search on word-sized segments if they contain the search
         * character, which is detected by XORing the word-sized segment with a
         * word-sized block of the search character and then detecting for the
         * presence of NULL in the result.
         */
        unsigned long *asrc = (unsigned long *) src;
        /*for ascii the value range will be 0 ~ 127
         *so need 8 bits to express the target char
         *first make double char mask so  shift 8 bits
         */
        unsigned long mask = d << 8 | d; /* '.' = 46 = 101110 */
        /*mask = 00010 1110 0010 1110*/

        /*make 32 bits mask*/
        mask = mask << 16 | mask;
        /*make 64 bits mask*/
        for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1) /* LBLOCKSIZE = 8 or 4*/
            mask = (mask << i) | mask;
        /*用於 8 */
        while (length >= LBLOCKSIZE) {
            /* XXXXX: Your implementation should appear here */
            /* 如果每 8 個 bytes 進入函式有找到目標字元,則會回傳非0值,則可跳出迴圈*/
            if(DETECT_CHAR(*asrc, mask)) 
                break;

            asrc++; /* 每 long 長度個 byte 進行換下一 long 的長度*/
            length -= LBLOCKSIZE; /* 每 long 長度遞減*/
        }

        /* If there are fewer than LBLOCKSIZE characters left, then we resort to
         * the bytewise loop.
         */
        src = (unsigned char *) asrc; /* 大範圍找過後,擷取找到第一個出現的字串片段*/
    }
    /*針對該段字串,進行每個字元的查找*/
    while (length--) {
        if (*src == d)
            return (void *) src;
        src++;
    }
    return NULL;
}

測試程式碼:
目標為找尋字串中的第一個 '.' 字元，印出第一個 '.' 之後的字串

int main()
{
    const char str[] = "http://wiki.csie.ncku.edu.tw";
    const char ch = '.';
    
    char *ret = memchr_opt(str, ch, strlen(str));
    printf("String after |%c| is - |%s|\n", ch, ret);
    return 0;
}

輸出結果:

String after |.| is - |.csie.ncku.edu.tw|

首先看到 7~13 行，程式主要是透過一次檢查 long 長度的字串長度進行快速的查找
所以在一開始進行的時候先確認字串是否與 long 的長度對齊，利用 UNALIGNED 進行判斷

/* Nonzero if either X or Y is not aligned on a "long" boundary */
#define UNALIGNED(X) ((long) X & (sizeof(long) - 1))

再來看到 15 行的判斷式，判斷字串的長度是否有小於一次找尋的長度，若字串長度超過才需要一次大範圍的找，以 long 的長度進行，而根據不同的電腦架構 long 的長度也會不一樣(佔 8 byte 或 4 byte)
若小於 long 長度則直接對輸入字串查找即可

/* How many bytes are loaded each iteration of the word copy loop */
#define LBLOCKSIZE (sizeof(long)) /* 8 byte or 4 byte */
/* Threshhold for punting to the bytewise iterator */
#define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE)

再來看到如何實現快速查找，首先看到 26 行中，將字串強制轉型為 long 指標，所以將字串的的每 long 長度個字元組合成的待檢查數值，每個字元皆由 8 bits 組成，所以若一次查到 8 個字元，會形成 64 bits 的待檢查數值
製作 bitmask ，看到 31 行、 34 行、 37~38 行中，製作 32 bits 或 64 bits 的目標字元 mask，以本題為例是找尋 '.'

'.' ASCII 為 46 以 8 bits 顯示為 000101110
做成 16 bits，先 shift 8 位再重複
-> 000101110 000101110
同樣的手法做成 32 bits 先 shift 16 位再重複
-> 000101110 000101110 000101110 000101110

也可以做成 64 bits 但取決於電腦架構的 long 型態決定
若 long 為 32 bits 則不會進入迴圈
若 long 為 64 bits 則進入迴圈將 mask 用同樣的手法製作成 64 bits

製作 bitmask 後，開始進行字元的查找，透過 DETECT_CHAR 巨集， * 解參考與 mask ( '.' mask)進行查找，若該片段有找出 '.' 在 DETECT_CHAR 會回傳非 0 值則可跳出迴圈，對該片段進行逐字元的查找
asrc++; 若沒找到字元則需要往下一個字串片段遞增，而因為宣告型態為 long 型態在遞增(++)時為每 long 長度進行
length -= LBLOCKSIZE; 字串的長度也隨著每個迴圈進行遞減，也是根據 long 長度，若迴圈都沒找到則 length 會得到負值，在後續的程式碼會回傳 NULL 表示位找到目標字元

XXXXX:

while (length >= LBLOCKSIZE) {
    /* 如果每 long 長度個 bytes 進入函式有找到目標字元,則會回傳非0值,則可跳出迴圈*/
    if(DETECT_CHAR(*asrc, mask))
        break;

    asrc++; /* 每 long 長度個 byte 進行換下一 long 的長度*/
    length -= LBLOCKSIZE; /* 每 long 長度遞減*/
}

此兩個巨集就是透過 ^ 的方式進行查找，若有相符的字元則在 ^ 會得到 0 的結果(歸零律: p ^ p = 0)
透過 DETECT_NULL 將 0 的值轉成 80 ，將非 0 值轉成 0

/* Nonzero if X (a long int) contains a NULL byte. */
#define DETECT_NULL(X) (((X) -0x0101010101010101) & ~(X) & 0x8080808080808080)
/* @return nonzero if (long)X contains the byte used to fill MASK. */
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))

測試:

const char str[] = "...12345";
unsigned long *test = (unsigned long *)str;
printf("result : %lx\n", DETECT_CHAR(*test, mask));
/* result : 80808000000 */

跳出迴圈後，在 53 行中得到含有目標字元該片段的起始位置
最後再以該片段進行逐字元的查找( 56~60 行)，此時的 src++; 是以 char 的長度進行移動，每次移動一個 byte 不同於 long 的移動方式

while (length--) {
    if (*src == d)
        return (void *) src;
    src++;
}

2022q1 Homework 5 (quiz8)

測驗1

測驗2

Read more

2022q1 Homework6 (ktcp)

2022q1 Homework3 (quiz3)

2022q1 Homework2 (quiz2)

2022q1 Homework1 (lab0)