# 2022-04-04 [rickywu0421](https://github.com/rickywu0421?tab=repositories)
## Quiz 1
### Problem
```c
#include <stddef.h>
#include <stdint.h>
#include <limits.h>
#include <string.h>
/* Nonzero if either X or Y is not aligned on a "long" boundary */
#define UNALIGNED(X) ((long) X & (sizeof(long) - 1))
/* How many bytes are loaded each iteration of the word copy loop */
#define LBLOCKSIZE (sizeof(long))
/* Threshold for punting to the bytewise iterator */
#define TOO_SMALL(LEN) ((LEN) < LBLOCKSIZE)
#if LONG_MAX == 2147483647L
#define DETECT_NULL(X) (((X) -0x01010101) & ~(X) & 0x80808080)
#else
#if LONG_MAX == 9223372036854775807L
/* Nonzero if X (a long int) contains a NULL byte. */
#define DETECT_NULL(X) (((X) -0x0101010101010101) & ~(X) & 0x8080808080808080)
#else
#error long int is not a 32bit or 64bit type.
#endif
#endif
/* @return nonzero if (long)X contains the byte used to fill MASK. */
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))
void *memchr_opt(const void *src_void, int c, size_t length)
{
    const unsigned char *src = (const unsigned char *) src_void;
    unsigned char d = c;
    while (UNALIGNED(src)) {
        if (!length--)
            return NULL;
        if (*src == d)
            return (void *) src;
        src++;
    }
    if (!TOO_SMALL(length)) {
        /* If we get this far, we know that length is large and
         * src is word-aligned.
         */
        /* The fast code reads the source one word at a time and only performs
         * the bytewise search on word-sized segments if they contain the search
         * character, which is detected by XORing the word-sized segment with a
         * word-sized block of the search character and then detecting for the
         * presence of NULL in the result.
         */
        unsigned long *asrc = (unsigned long *) src;
        unsigned long mask = d << 8 | d;
        mask = mask << 16 | mask;
        for (unsigned int i = 32; i < LBLOCKSIZE * 8; i <<= 1)
            mask = (mask << i) | mask;
        while (length >= LBLOCKSIZE) {
            /* XXXXX: Your implementation should appear here */
        }
        /* If there are fewer than LBLOCKSIZE characters left, then we resort to
         * the bytewise loop.
         */
        src = (unsigned char *) asrc;
    }
    while (length--) {
        if (*src == d)
            return (void *) src;
        src++;
    }
    return NULL;
}
```
### Answer
```c
/* The fast code reads the source one word at a time and only performs
* the bytewise search on word-sized segments if they contain the search
* character, which is detected by XORing the word-sized segment with a
* word-sized block of the search character and then detecting for the
* presence of NULL in the result.
*/
```
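The code comment above suggests that the task is to use bitwise operations to examine `sizeof(long)` bytes (one word) of the buffer at a time. This is more efficient than the naive approach, which examines one byte at a time, because the CPU can load a whole word into a register with a single instruction: for the same input, the word-at-a-time loop performs roughly `sizeof(long)` times fewer loads from memory/cache, comparisons, and loop iterations.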
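To decide whether a word contains the target character `d`:
build a mask by replicating `d` into every byte. For example, if `d = '.'` (0x2e), then on a 64-bit `long` the mask is `"........"` (0x2e2e2e2e2e2e2e2e). Then XOR each word with the mask. A byte of the result is 0 exactly where the corresponding byte of the word equals `d`, so if any byte of the result is 0 (checked with the `DETECT_NULL` macro), the word contains a byte equal to `d`.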
The problem helpfully provides a `DETECT_CHAR` macro:
```c
#define DETECT_CHAR(X, MASK) (DETECT_NULL(X ^ MASK))
```
It performs exactly the operation described above.
Here is my answer:
```c
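while (length >= LBLOCKSIZE) {
    /* Some byte of this word equals d: stop here and let the bytewise
     * loop that follows locate the exact position. */
    if (DETECT_CHAR(*asrc, mask))
        break;
    /* No match in this word: move on to the next one. */
    length -= LBLOCKSIZE;
    asrc++;
}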
```
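Note that the pointer must be advanced with `asrc++`, not `asrc += LBLOCKSIZE`. Because `asrc` has type `unsigned long *`, incrementing it adds `sizeof(unsigned long)` bytes to the address, not 1 byte. `asrc += LBLOCKSIZE` would therefore skip `LBLOCKSIZE * sizeof(unsigned long)` bytes (64 bytes on a 64-bit system), jumping over words that were never checked.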