# [2022q1](http://wiki.csie.ncku.edu.tw/linux/schedule) Week 8 Quiz
> contributed by < [Nomad1230](https://github.com/Nomad1230) >
> 2020-04-04
## Quiz `1`
Test code:
```c
#include <stddef.h>
#include <stdint.h>
#include <limits.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

/* Nonzero if X is not aligned on a "long" boundary */
#define UNALIGNED(X) ((long) X & (sizeof(long) - 1))

/* How many bytes are loaded each iteration of the word copy loop */
#define LBLOCKSIZE (sizeof(long))

/* Threshold for punting to the bytewise iterator */
#define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE)

#if LONG_MAX == 2147483647L
#define DETECT_NULL(X) (((X) - 0x01010101) & ~(X) & 0x80808080)
#else
#if LONG_MAX == 9223372036854775807L
/* Nonzero if X (a long int) contains a NULL byte. */
#define DETECT_NULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
#else
#error long int is not a 32bit or 64bit type.
#endif
#endif

/* @return nonzero if (long)X contains the byte used to fill MASK. */
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))

void *memchr_opt(const void *src_void, int c, size_t length)
{
    const unsigned char *src = (const unsigned char *) src_void;
    unsigned char d = c;

    /* Bytewise prologue: advance until src is long-aligned */
    while (UNALIGNED(src)) {
        if (!length--)
            return NULL;
        if (*src == d)
            return (void *) src;
        src++;
    }

    if (!TOO_SMALL(length)) {
        /* If we get this far, we know that length is large and
         * src is word-aligned.
         */

        /* The fast code reads the source one word at a time and only performs
         * the bytewise search on word-sized segments if they contain the search
         * character, which is detected by XORing the word-sized segment with a
         * word-sized block of the search character and then detecting for the
         * presence of NULL in the result.
         */
        unsigned long *asrc = (unsigned long *) src;
        unsigned long mask = d << 8 | d;
        mask = mask << 16 | mask;
        for (unsigned long i = 32; i < LBLOCKSIZE * 8; i <<= 1)
            mask = (mask << i) | mask;

        while (length >= LBLOCKSIZE) {
            /* XXXXX: Your implementation should appear here */
            if (DETECT_CHAR(*asrc, mask))
                break;
            else {
                length -= LBLOCKSIZE;
                asrc++;
            }
        }

        /* If there are fewer than LBLOCKSIZE characters left, then we resort to
         * the bytewise loop.
         */
        src = (unsigned char *) asrc;
    }

    /* Bytewise epilogue: also pinpoints the match inside a flagged word */
    while (length--) {
        if (*src == d)
            return (void *) src;
        src++;
    }

    return NULL;
}

int main()
{
    const char str[] = "http://wiki.csie.ncku.edu.tw";
    const char ch = '.';

    char *ret = memchr_opt(str, ch, strlen(str));
    printf("String after |%c| is - |%s|\n", ch, ret);
    return 0;
}
```
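The goal of this code is to speed up `memchr` with SWAR (SIMD within a register). A plain `memchr` reads and compares one byte at a time. The improved `memchr_opt` first checks whether the address is aligned to a `long` boundary. If it is not, a `while` loop scans byte by byte, as before, until the pointer is aligned to `long`, returning early if it finds a match or runs out of `length`.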
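The alignment test in the prologue can be sketched in isolation. The helper below is hypothetical (`bytes_until_aligned` does not appear in the original code); it applies the same low-bits test as `UNALIGNED()` and assumes, as that macro does, that `sizeof(long)` is a power of two and that converting a pointer to an integer preserves its low address bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the quiz code: how many bytes the
 * bytewise prologue has to step over before p is long-aligned. */
static inline size_t bytes_until_aligned(const void *p)
{
    uintptr_t addr = (uintptr_t) p;
    /* Same test as UNALIGNED(): keep only the low address bits */
    uintptr_t misalign = addr & (sizeof(long) - 1);
    return misalign ? sizeof(long) - misalign : 0;
}
```

For a pointer one byte past a `long` boundary, the helper returns `sizeof(long) - 1`, which is the maximum number of iterations the prologue loop can take.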
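Once the pointer is aligned, SWAR improves throughput by reading and testing one `long` of data at a time.
Because each read covers a whole `long` (8 bytes on an LP64 platform such as 64-bit Linux), the data cannot be compared against a single character with the `==` operator. Instead, the program first builds a bit mask: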
```c
unsigned long mask = d << 8 | d;
mask = mask << 16 | mask;
for (unsigned long i = 32; i < LBLOCKSIZE * 8; i <<= 1)
    mask = (mask << i) | mask;
```
As the code shows, `mask` is simply the byte `d` replicated across all 8 bytes of the word, as illustrated below:
```graphviz
digraph struct {
label = "d"
node [shape = record]
num1 [label = "."]
}
```
```graphviz
digraph struct {
label = "mask"
node [shape = record]
num1 [label = ".|.|.|.|.|.|.|."]
}
```
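The replication of `d` into every byte can be sketched as a standalone function. Both helpers below are hypothetical names introduced for illustration. The first follows the same shift-and-OR doubling idea as the original code, and the second is an alternative that multiplies by `0x0101...01`, written here as `ULONG_MAX / 0xFF` so that it adapts to the width of `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: copy byte d into every byte of an unsigned long.
 * Each step doubles the filled width: 8 -> 16 -> 32 -> 64 bits. */
static inline unsigned long broadcast_byte(unsigned char d)
{
    unsigned long mask = d;
    for (size_t shift = 8; shift < sizeof(unsigned long) * CHAR_BIT; shift <<= 1)
        mask |= mask << shift;
    return mask;
}

/* Alternative sketch: ULONG_MAX / 0xFF is 0x0101...01, so multiplying it
 * by d places d in every byte without any carry between bytes. */
static inline unsigned long broadcast_byte_mul(unsigned char d)
{
    return (ULONG_MAX / 0xFF) * d;
}
```

With `d = '.'` (`0x2e` in ASCII) and a 64-bit `unsigned long`, both functions yield `0x2e2e2e2e2e2e2e2e`.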
With this mask, the data can be compared using XOR:
```c
DETECT_CHAR(*asrc, mask)
```
The macro expands according to these definitions:
```c
/* Definitions selected when LONG_MAX == 9223372036854775807L (64-bit long) */
#define DETECT_NULL(X) (((X) - 0x0101010101010101) & ~(X) & 0x8080808080808080)
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))
```
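First, how `DETECT_NULL(X)` works. The macro subtracts `0x0101010101010101` from `X`, that is, it subtracts 1 from every byte of `X`, and ANDs the result with `~(X)`. For a nonzero byte that receives no borrow from the byte below it, the most significant bit of the result is always 0: if the byte is in `0x01`..`0x80`, subtracting 1 leaves its top bit clear, and if it is in `0x81`..`0xff`, the top bit of `~(X)` is clear.
If the byte is zero, subtracting 1 borrows and yields `0xff`; ANDing with the matching byte of `~(X)`, which is also `0xff`, gives `0xff`.
A final AND with `0x8080808080808080` keeps only the top bit of each byte. Since a borrow can only start at a zero byte, the result is nonzero exactly when `X` contains a zero byte.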
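The per-byte reasoning can be checked with a small sketch. The names `ONES`, `HIGHS`, `byte_flag` and `detect_null` are hypothetical; `detect_null` uses the same expression as `DETECT_NULL`, with the constants derived from `ULONG_MAX` (an assumption made here for portability across 32-bit and 64-bit `long`).

```c
#include <assert.h>
#include <limits.h>

#define ONES  (ULONG_MAX / 0xFF) /* 0x0101...01 */
#define HIGHS (ONES << 7)        /* 0x8080...80 */

/* One byte in isolation: 0x80 if b is zero, otherwise 0 */
static inline unsigned byte_flag(unsigned char b)
{
    return (unsigned char) (b - 1) & (unsigned char) ~b & 0x80u;
}

/* Whole word: nonzero if and only if some byte of x is zero */
static inline unsigned long detect_null(unsigned long x)
{
    return (x - ONES) & ~x & HIGHS;
}
```

One caveat worth noting: the flag bits above the lowest zero byte are not reliable, because the borrow ripples upward. For example, `detect_null(ONES ^ 0x01UL)` has a single zero byte (the lowest one) yet returns `HIGHS`, with every byte flagged. That is harmless in `memchr_opt`, which only asks whether the result is nonzero and then rescans the word bytewise.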
Finally, `DETECT_CHAR`:
```c
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))
```
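It first XORs `X` with `mask`. Any byte of `X` equal to the corresponding byte of `mask` becomes 0, and any byte that differs becomes nonzero, so applying `DETECT_NULL` to the XOR result tells whether `X` contains the search character.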
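As a minimal sketch of the XOR-then-detect step, the hypothetical helper `word_has_byte` below loads one word from a string and reports whether it contains a given character. It loads with `memcpy` rather than a pointer cast, which is a substitution made here so the example works at any address; `ONES` is likewise an illustrative name.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define ONES (ULONG_MAX / 0xFF) /* 0x0101...01 */

/* Hypothetical helper: does the word starting at p contain byte c?
 * The caller must guarantee sizeof(unsigned long) readable bytes at p. */
static inline int word_has_byte(const char *p, unsigned char c)
{
    unsigned long word;
    memcpy(&word, p, sizeof word);      /* load without alignment concerns */
    unsigned long x = word ^ (ONES * c); /* matching bytes become 0x00 */
    return ((x - ONES) & ~x & (ONES << 7)) != 0;
}
```

For the test string `"http://wiki.csie.ncku.edu.tw"`, the first word holds no `'.'`, while the word starting at offset 8 (`"iki."` followed by more characters) does, so only the latter is flagged.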
The code for the `XXXXX` part is implemented as follows:
```c
while (length >= LBLOCKSIZE) {
    /* XXXXX: Your implementation should appear here */
    if (DETECT_CHAR(*asrc, mask))
        break;
    else {
        length -= LBLOCKSIZE;
        asrc++;
    }
}
```
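Each iteration tests one word (`LBLOCKSIZE` bytes) of data. If the word does not contain the character, `length` is reduced by `LBLOCKSIZE` (8 here) and the pointer advances. Note that `asrc` is a pointer to `unsigned long`, so adding 1 moves it forward by 8 bytes at once. When a word does contain the character, the loop breaks and the trailing bytewise loop locates the exact byte.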
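One way to gain confidence in the word loop is a differential check against the standard `memchr`. The sketch below is not the quiz's `memchr_opt`: `memchr_swar` is a hypothetical variant that loads each word with `memcpy`, so it needs no alignment prologue and avoids the `unsigned char *` to `unsigned long *` cast, which may run afoul of strict aliasing rules. `count_mismatches` is also an illustrative name.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define ONES  (ULONG_MAX / 0xFF) /* 0x0101...01 */
#define HIGHS (ONES << 7)        /* 0x8080...80 */

/* Hypothetical variant of memchr_opt using memcpy loads */
static inline void *memchr_swar(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    unsigned char d = (unsigned char) c;
    unsigned long mask = ONES * d; /* d replicated into every byte */

    while (n >= sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, p, sizeof w);
        w ^= mask;
        if ((w - ONES) & ~w & HIGHS)
            break; /* a match lies somewhere in this word */
        p += sizeof w;
        n -= sizeof w;
    }
    /* Bytewise loop: handles the tail and pinpoints the match */
    while (n--) {
        if (*p == d)
            return (void *) p;
        p++;
    }
    return NULL;
}

/* Compare against memchr over many offsets, lengths and characters;
 * returns the number of disagreements (0 is expected). */
static inline size_t count_mismatches(void)
{
    unsigned char buf[64];
    size_t bad = 0;
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char) ('a' + i % 7);
    for (size_t off = 0; off < 9; off++)
        for (size_t len = 0; off + len <= sizeof buf; len++)
            for (int c = 'a'; c <= 'h'; c++)
                if (memchr_swar(buf + off, c, len) != memchr(buf + off, c, len))
                    bad++;
    return bad;
}
```

The same harness can be pointed at `memchr_opt` by swapping the function name; varying the starting offset exercises the alignment prologue as well as the word loop.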
## Quiz `2`
## Quiz `3`