# [2022q1](http://wiki.csie.ncku.edu.tw/sysprog/schedule) Week 8 Quiz
###### tags: `linux2022`

### Quiz `1`

[SIMD within a register](https://en.wikipedia.org/wiki/SWAR) (SWAR) is a software optimization technique. Below, SWAR is applied on a 64-bit processor architecture. To test whether two 32-bit integers are both odd, one would normally write:

```cpp
#include <stdbool.h>
#include <stdint.h>

bool both_odd(uint32_t x, uint32_t y)
{
    return (x & 1) && (y & 1);
}
```

Instead, we can first combine (compound) the two 32-bit integers into a single 64-bit integer and then apply a purpose-built bitmask. This reduces the amount of computation: one AND and one comparison test both halves at once.

```cpp
/* Bit 32 tests the low bit of the high half, bit 0 that of the low half.
 * UINT64_C avoids shifting a 32-bit long on ILP32/LLP64 targets. */
static const uint64_t SWAR_ODD_MASK = (UINT64_C(1) << 32) + 1;

bool both_odd_swar(uint64_t xy)
{
    return (xy & SWAR_ODD_MASK) == SWAR_ODD_MASK;
}
```

Test program:

```cpp
#include <stdio.h>

static inline uint64_t bit_compound(uint32_t x, uint32_t y)
{
    /* Widen x before shifting; y is zero-extended into the low 32 bits. */
    return ((uint64_t) x << 32) | y;
}

int main()
{
    int x = 12345678;
    int y = 9012345;
    uint64_t xy = bit_compound(x, y);
    printf("%d, %d\n", both_odd(x, y), both_odd_swar(xy));
}
```

Further reading: [SIMD and SWAR Techniques](https://www.chessprogramming.org/SIMD_and_SWAR_Techniques)

In the Linux kernel source, [lib/string.c](https://github.com/torvalds/linux/blob/master/lib/string.c) provides an implementation of [memchr](https://man7.org/linux/man-pages/man3/memchr.3.html):

```cpp
/**
 * memchr - Find a character in an area of memory.
 * @s: The memory area
 * @c: The byte to search for
 * @n: The size of the area.
 *
 * returns the address of the first occurrence of @c, or %NULL
 * if @c is not found
 */
void *memchr(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    while (n-- != 0) {
        if ((unsigned char)c == *p++) {
            return (void *)(p - 1);
        }
    }
    return NULL;
}
```

Test program:

```cpp
#include <stdio.h>
#include <string.h>

void *memchr_opt(const void *src_void, int c, size_t length);

int main()
{
    const char str[] = "http://wiki.csie.ncku.edu.tw";
    const char ch = '.';
    char *ret = memchr_opt(str, ch, strlen(str));
    printf("String after |%c| is - |%s|\n", ch, ret);
    return 0;
}
```

Expected output:

```
String after |.| is - |.csie.ncku.edu.tw|
```

Using the [SIMD within a register](https://en.wikipedia.org/wiki/SWAR) (SWAR) technique above, we can rewrite it as the following `memchr_opt` function:

```cpp
#include <stddef.h>
#include <stdint.h>
#include <limits.h>
#include <string.h>

/* Nonzero if X is not aligned on a "long" boundary */
#define UNALIGNED(X) ((uintptr_t) (X) & (sizeof(long) - 1))

/* How many bytes are loaded each iteration of the word copy loop */
#define LBLOCKSIZE (sizeof(long))

/* Threshold for punting to the bytewise iterator */
#define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE)

#if LONG_MAX == 2147483647L
#define DETECT_NULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
#else
#if LONG_MAX == 9223372036854775807L
/* Nonzero if X (a long int) contains a NUL (zero) byte. */
#define DETECT_NULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
#else
#error long int is not a 32bit or 64bit type.
#endif
#endif

/* @return nonzero if (long)X contains the byte used to fill MASK. */
#define DETECT_CHAR(X, MASK) (DETECT_NULL((X) ^ (MASK)))

void *memchr_opt(const void *src_void, int c, size_t length)
{
    const unsigned char *src = (const unsigned char *) src_void;
    unsigned char d = c;

    while (UNALIGNED(src)) {
        if (!length--)
            return NULL;
        if (*src == d)
            return (void *) src;
        src++;
    }

    if (!TOO_SMALL(length)) {
        /* If we get this far, we know that length is large and
         * src is word-aligned.
         */

        /* The fast code reads the source one word at a time and only performs
         * the bytewise search on word-sized segments if they contain the search
         * character, which is detected by XORing the word-sized segment with a
         * word-sized block of the search character and then detecting the
         * presence of a zero (NUL) byte in the result.
         */
        const unsigned long *asrc = (const unsigned long *) src;
        unsigned long mask = d << 8 | d;
        mask = mask << 16 | mask;
        for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1)
            mask = (mask << i) | mask;

        while (length >= LBLOCKSIZE) {
            /* XXXXX: Your implementation should appear here */
        }

        /* If there are fewer than LBLOCKSIZE characters left, then we resort to
         * the bytewise loop.
         */
        src = (const unsigned char *) asrc;
    }

    while (length--) {
        if (*src == d)
            return (void *) src;
        src++;
    }

    return NULL;
}
```

Complete the code so that the `memchr_opt` implementation above behaves like [memchr](https://man7.org/linux/man-pages/man3/memchr.3.html). Answering rules:

1. List the complete code of `memchr_opt`, with as many comments as practical
2. The scope containing `XXXXX` should make use of the `DETECT_CHAR` macro
3. Keep the code as concise as possible

:::success
Extension questions:
1. Explain how the code above works
2. Compare the Linux kernel's original (platform-independent) implementation with `memchr_opt`, and design experiments to observe how performance changes with string length and with specific byte patterns
3. Find the corresponding optimized implementation for x86_64 or arm64 in the Linux kernel source, compare it with the code above, and try to identify and analyze the strategies it uses
:::
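The SWAR packing behind `both_odd_swar` can be checked in isolation. The sketch below is a self-contained C99 version that uses `UINT64_C` so the shift by 32 does not depend on the width of `long`. Writing `SWAR_ODD_MASK` as a macro and the cross-check helper `check_both_odd` are illustrative choices (hypothetical, not part of the quiz). For the values in the test program, `bit_compound(12345678, 9012345)` is `0x00BC614E00898479`. Masking it with `SWAR_ODD_MASK` keeps only bit 0 (from the odd `y`), so the comparison fails. This agrees with `both_odd`, which returns 0 because `x` is even.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 32 tests the low bit of x (high half), bit 0 the low bit of y (low half). */
#define SWAR_ODD_MASK ((UINT64_C(1) << 32) | UINT64_C(1))

/* Scalar reference: two separate tests. */
static bool both_odd(uint32_t x, uint32_t y)
{
    return (x & 1) && (y & 1);
}

/* Pack x into the high 32 bits and y into the low 32 bits. */
static inline uint64_t bit_compound(uint32_t x, uint32_t y)
{
    return ((uint64_t) x << 32) | y;
}

/* One AND plus one compare checks both halves at once. */
static bool both_odd_swar(uint64_t xy)
{
    return (xy & SWAR_ODD_MASK) == SWAR_ODD_MASK;
}

/* Hypothetical helper: compare the SWAR version with the scalar one on every
 * pair drawn from a small sample set (including the test program's values). */
static void check_both_odd(void)
{
    static const uint32_t samples[] = {0, 1, 2, 3, 9012345, 12345678, UINT32_MAX};
    const size_t n = sizeof(samples) / sizeof(samples[0]);

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            uint32_t x = samples[i], y = samples[j];
            assert(both_odd(x, y) == both_odd_swar(bit_compound(x, y)));
        }
}
```

Calling `check_both_odd()` from a test `main` runs every sample pair through both versions; the `assert` fires on the first disagreement.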
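To see what `DETECT_NULL` and `DETECT_CHAR` compute, here is a standalone sketch on a single 64-bit word. It assumes a target with `uint64_t`. `DETECT_NULL64`, `DETECT_CHAR64`, `broadcast_byte`, `load_word` and `detect_examples` are hypothetical names introduced for illustration. `load_word` uses `memcpy` rather than a pointer cast so that it is valid at any address. `broadcast_byte` builds the repeated-byte mask with the same doubling steps `memchr_opt` uses. The second assertion shows a known property of this trick. The result is zero exactly when no byte is zero. However, bytes above the first zero byte can also be flagged because the borrow propagates, so a nonzero result only says that *some* byte in the word matched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit forms of DETECT_NULL / DETECT_CHAR, written with uint64_t so the
 * sketch does not depend on sizeof(long). */
#define DETECT_NULL64(X) \
    (((X) - UINT64_C(0x0101010101010101)) & ~(X) & UINT64_C(0x8080808080808080))
#define DETECT_CHAR64(X, MASK) (DETECT_NULL64((X) ^ (MASK)))

/* Copy byte d into all 8 byte lanes by repeated doubling. */
static uint64_t broadcast_byte(unsigned char d)
{
    uint64_t mask = (uint64_t) d << 8 | d; /* 2 copies */
    mask = mask << 16 | mask;              /* 4 copies */
    mask = mask << 32 | mask;              /* 8 copies */
    return mask;
}

/* Read 8 bytes starting at p; memcpy is valid for any alignment. */
static uint64_t load_word(const char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Worked cases; each assert states the expected result. */
static void detect_examples(void)
{
    /* No zero byte anywhere: the result is exactly 0. */
    assert(DETECT_NULL64(UINT64_C(0x1122334455667788)) == 0);

    /* Lowest byte 0x00, next byte 0x01: the borrow out of the zero byte
     * makes the 0x01 byte look like a zero too (false positive). */
    assert(DETECT_NULL64(UINT64_C(0x1111111111110100)) == UINT64_C(0x8080));

    /* XOR with the broadcast mask turns every '.' (0x2e) into a zero byte. */
    assert(broadcast_byte('.') == UINT64_C(0x2e2e2e2e2e2e2e2e));
    assert(DETECT_CHAR64(load_word("http://w"), broadcast_byte('.')) == 0);
    assert(DETECT_CHAR64(load_word("wiki.csi"), broadcast_byte('.')) != 0);
}
```

The test strings are slices of the URL used in the `memchr` test program: the first 8 bytes contain no `.`, while `"wiki.csi"` does.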
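One way to check that a finished `memchr_opt` really matches `memchr` is a brute-force comparison against the C library's `memchr` over many start offsets, lengths and match positions. This is a sketch. `memchr_fn`, `memchr_bytewise` (the kernel's byte-at-a-time loop, renamed to avoid clashing with the libc symbol) and `check_memchr` are hypothetical names. The ranges (16 start offsets, lengths up to 64) are arbitrary choices meant to exercise the unaligned head, the word loop and the byte tail. A target byte is also placed just past the end of each range, so a match reported beyond `length` is caught. The search value is passed both as `'.'` and as `'.' + 256`, because `memchr` compares against `c` converted to `unsigned char`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by memchr, the kernel version and memchr_opt. */
typedef void *(*memchr_fn)(const void *, int, size_t);

/* Same loop as lib/string.c's memchr, renamed so it does not clash with the
 * C library symbol. */
static void *memchr_bytewise(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    while (n-- != 0) {
        if ((unsigned char) c == *p++)
            return (void *) (p - 1);
    }
    return NULL;
}

/* Compare fn with the C library memchr on many (offset, length, position)
 * cases; assert fires at the first disagreement. */
static void check_memchr(memchr_fn fn)
{
    unsigned char buf[128];
    /* memchr compares against (unsigned char) c, so '.' + 256 must find '.'. */
    const int targets[] = {'.', '.' + 256};

    for (size_t t = 0; t < 2; t++)
        for (size_t off = 0; off < 16; off++)          /* unaligned starts */
            for (size_t len = 0; len <= 64; len++)     /* head, words, tail */
                for (size_t pos = 0; pos <= len; pos++) {
                    memset(buf, 'a', sizeof buf);
                    buf[off + len] = '.'; /* just past the end: must not match */
                    if (pos < len) {
                        buf[off + pos] = '.';     /* first occurrence */
                        buf[off + len - 1] = '.'; /* a later one, if any */
                    }
                    /* pos == len means no match inside the range */
                    assert(fn(buf + off, targets[t], len) ==
                           memchr(buf + off, targets[t], len));
                }
}
```

With `memchr_opt` completed and linked in, the corresponding call is `check_memchr(memchr_opt)`. Because the checks rely on `assert`, they are skipped when `NDEBUG` is defined.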