# 2022q1 Homework8 (quiz8) ###### tags: `2022-04-04` 、 `StevenChou499` ## 測驗 `1` > 利用上述 [SIMD within a register](https://en.wikipedia.org/wiki/SWAR) (SWAR) 的技巧,我們可改寫為以下 `memchr_opt` 函式: > ```c= > #include <stddef.h> > #include <stdint.h> > #include <limits.h> > #include <string.h> > > /* Nonzero if either X or Y is not aligned on a "long" boundary */ > #define UNALIGNED(X) ((long) X & (sizeof(long) - 1)) > > /* How many bytes are loaded each iteration of the word copy loop */ > #define LBLOCKSIZE (sizeof(long)) > > /* Threshhold for punting to the bytewise iterator */ > #define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE) > > #if LONG_MAX == 2147483647L > #define DETECT_NULL(X) (((X) -0x01010101) & ~(X) & 0x80808080) > #else > #if LONG_MAX == 9223372036854775807L > /* Nonzero if X (a long int) contains a NULL byte. */ > #define DETECT_NULL(X) (((X) -0x0101010101010101) & ~(X) & 0x8080808080808080) > #else > #error long int is not a 32bit or 64bit type. > #endif > #endif > > /* @return nonzero if (long)X contains the byte used to fill MASK. */ > #define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK)) > > void *memchr_opt(const void *src_void, int c, size_t length) > { > const unsigned char *src = (const unsigned char *) src_void; > unsigned char d = c; > > while (UNALIGNED(src)) { > if (!length--) > return NULL; > if (*src == d) > return (void *) src; > src++; > } > > if (!TOO_SMALL(length)) { > /* If we get this far, we know that length is large and > * src is word-aligned. > */ > > /* The fast code reads the source one word at a time and only performs > * the bytewise search on word-sized segments if they contain the search > * character, which is detected by XORing the word-sized segment with a > * word-sized block of the search character and then detecting for the > * presence of NULL in the result. > */ > unsigned long *asrc = (unsigned long *) src; > unsigned long mask = d << 8 | d; > mask = mask << 16 | mask; > for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1) > mask = (mask << i) | mask; > > while (length >= LBLOCKSIZE) { > /* XXXXX: Your implementation should appear here */ > } > > /* If there are fewer than LBLOCKSIZE characters left, then we resort to > * the bytewise loop. > */ > src = (unsigned char *) asrc; > } > > while (length--) { > if (*src == d) > return (void *) src; > src++; > } > > return NULL; > } > ``` > 請補完程式碼,使上述 memchr_opt 的實作符合 [memchr](https://man7.org/linux/man-pages/man3/memchr.3.html) 行為,作答規範: > 1. 列出 `memchr_opt` 函式完整程式碼,儘量撰寫程式註解 > 2. `XXXX` 所在的 scope 應該利用 `DETECT_CHAR` 巨集 > 3. 儘量以最精簡的程式碼撰寫 ## Trace code 為了知道 `XXXXX` 需要填什麼,我們需要先了解程式碼的內容與用意。 ```c=7 #define UNALIGNED(X) ((long) X & (sizeof(long) - 1)) ``` `UNALIGNED()` 的作用是確認該字串的地址是否有以 `long` 的長度來做對齊,若有則會回傳 0 ,反之則會回傳其他整數。 ```c=10 #define LBLOCKSIZE (sizeof(long)) ``` 根據不同的 [Data Model](https://ryan0988.pixnet.net/blog/post/194111613-%E8%B3%87%E6%96%99%E6%A8%A1%E5%9E%8B(data-model-lp32--ilp32--lp64--ilp64--llp64)) , long 所需的字節數也不同,因此需要以巨集來定義。 ```c=13 #define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE) ``` 比較其大小是否超過 long 的長度。 ```c=15 #if LONG_MAX == 2147483647L #define DETECT_NULL(X) (((X) -0x01010101) & ~(X) & 0x80808080) #else #if LONG_MAX == 9223372036854775807L /* Nonzero if X (a long int) contains a NULL byte. */ #define DETECT_NULL(X) (((X) -0x0101010101010101) & ~(X) & 0x8080808080808080) #else #error long int is not a 32bit or 64bit type. #endif #endif ``` 此處的用意為區分 32-bit 與 64-bit 在數字上表現的極限差異,並且其中需要的 `DETECT_NULL()` 也會因此需要調整。 ```c=27 #define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK)) ``` 透過 `DETECT_NULL()` 去檢查是否擁有該字節,若沒有則會回傳 0 ,反之則不會。 ```c=31 const unsigned char *src = (const unsigned char *) src_void; unsigned char d = c; while (UNALIGNED(src)) { if (!length--) return NULL; if (*src == d) return (void *) src; src++; } ``` 此處為判斷 `src` 是否有對齊成功,若有則不會進入,若無則會使用例題之方法檢查並回傳地址或是回傳  `NULL` 。 ```c=42 if (!TOO_SMALL(length)) { /* If we get this far, we know that length is large and * src is word-aligned. */ /* The fast code reads the source one word at a time and only performs * the bytewise search on word-sized segments if they contain the search * character, which is detected by XORing the word-sized segment with a * word-sized block of the search character and then detecting for the * presence of NULL in the result. */ unsigned long *asrc = (unsigned long *) src; unsigned long mask = d << 8 | d; mask = mask << 16 | mask; for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1) mask = (mask << i) | mask; while (length >= LBLOCKSIZE) { /* XXXXX: Your implementation should appear here */ } /* If there are fewer than LBLOCKSIZE characters left, then we resort to * the bytewise loop. */ src = (unsigned char *) asrc; } ``` 此處是先確認字傳長度夠長,接著定義一個指向 `unsigned long` 的 `asrc` ,並且使用想要找到的 `d` 製作一個 `mask` ,後面的 `for` 迴圈是因為會有不同的 [Data Model](https://ryan0988.pixnet.net/blog/post/194111613-%E8%B3%87%E6%96%99%E6%A8%A1%E5%9E%8B(data-model-lp32--ilp32--lp64--ilp64--llp64)) 導致 `long` 所佔的 bytes 不一樣。 ### `XXXXX` 填答: 在 `while` 迴圈中就是正式尋找字元 `c` 的範圍了。因為是需要依照 `LBLOCKSIZE` 的長度去做尋找,所以我們需要透過 `DETECT_CHAR()` ,再加上 `mask` 與 `asrc` 來尋找,但是由於 `asrc` 為一指標,所以需要加上 `*` 。若 `DETECT_CHAR(mask, *asrc)` 回傳非 0 值,代表在這 8 個 byte 中有找到所需字元 `c` ,這時必須停止尋找,並跳出迴圈,因此要加上 `break` 。反之若回傳 0 則代表還沒有找到,則需要跳往下 8 個 byte 繼續尋找,此時必須將 `asrc++` 便會一次跳躍 8 個 byte ,並且 `length` 要減去 `LBLOCKSIZE` ,這樣才不會已經檢查完了但確還沒有跳出迴圈。 ### 完整程式碼註解 ```c #include <stddef.h> #include <stdint.h> #include <limits.h> #include <string.h> #include <stdio.h> /* Nonzero if either X or Y is not aligned on a "long" boundary */ #define UNALIGNED(X) ((long) X & (sizeof(long) - 1)) /* How many bytes are loaded each iteration of the word copy loop */ #define LBLOCKSIZE (sizeof(long)) /* Threshhold for punting to the bytewise iterator */ #define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE) #if LONG_MAX == 2147483647L // computer is 32-bit #define DETECT_NULL(X) (((X) -0x01010101) & ~(X) & 0x80808080) #else #if LONG_MAX == 9223372036854775807L // computer is 64-bit /* Nonzero if X (a long int) contains a NULL byte. */ #define DETECT_NULL(X) (((X) -0x0101010101010101) & ~(X) & 0x8080808080808080) #else #error long int is not a 32bit or 64bit type. #endif #endif /* @return nonzero if (long)X contains the byte used to fill MASK. */ #define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK)) void *memchr_opt(const void *src_void, int c, size_t length) { const unsigned char *src = (const unsigned char *) src_void; unsigned char d = c; while (UNALIGNED(src)) { // use normal way to find character if (!length--) return NULL; if (*src == d) return (void *) src; src++; } if (!TOO_SMALL(length)) { // the length si long enough /* If we get this far, we know that length is large and * src is word-aligned. */ /* The fast code reads the source one word at a time and only performs * the bytewise search on word-sized segments if they contain the search * character, which is detected by XORing the word-sized segment with a * word-sized block of the search character and then detecting for the * presence of NULL in the result. */ unsigned long *asrc = (unsigned long *) src; unsigned long mask = d << 8 | d; // create a mask full of character d mask = mask << 16 | mask; for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1) mask = (mask << i) | mask; // the mask is whether 4 or 8 bytes long while (length >= LBLOCKSIZE) { if(DETECT_CHAR(mask, *asrc)){ // return non-zero if detect d in *asrc break; // if detected, exit the while loop } asrc++; // if not detected, jump to the next 8 bytes length -= LBLOCKSIZE; // subtract length by LBLOCKSIZE } /* If there are fewer than LBLOCKSIZE characters left, then we resort to * the bytewise loop. */ src = (unsigned char *) asrc; } while (length--) { // continue searching for character d if (*src == d) return (void *) src; // if found, return the address src++; // if not found, jump to the next character } return NULL; // if all not found, return NULL } ``` :::success 解答: ```c while (length >= LBLOCKSIZE) { /* XXXXX: Your implementation should appear here */ if(DETECT_CHAR(mask, *asrc)){ break; } asrc++; length -= LBLOCKSIZE; } ``` ::: ## 測驗 `2`