2020q3 Homework2 (quiz2)

# 2020q3 Homework2 (quiz2) contributed by < `johnnycck` > ## 測驗 1 ### 題目 :::info 參考以下程式碼： ```c= #include <stddef.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; for (int i = 0; i < size; i++) if (str[i] & 0x80) /* i.e. (unsigned) str[i] >= 128 */ return false; return true; } ``` 其中 str 是開頭的記憶體地址，size 則是要檢驗的範圍。逐一字元比對的效率低，在 64 位元的架構下，我們嘗試一次比對 64 位元的資料 (即 word size)，程式碼改寫如下: ```c= #include <stdbool.h> #include <stddef.h> #include <stdint.h> #include <string.h> bool is_ascii(const char str[], size_t size) { if (size == 0) return false; int i = 0; while ((i + 8) <= size) { uint64_t payload; memcpy(&payload, str + i, 8); if (payload & MMM) return false; i += 8; } while (i < size) { if (str[i] & 0x80) return false; i++; } return true; } ``` MMM = ? (a) 0x0080008000800080 (b) 0x8000800080008000 (c) 0x0808080808080808 (d) 0x8080808080808080 (answer) (e) 0x8888888888888888 (f) 0xFFFFFFFFFFFFFFFF ::: ### 題目分析 1. 若是擴展字元(128~255)，那第8個 bit 勢必是 1，因此簡易版方法是以 &0x80 的方式，若非 0 則代表大於 127。 2. 若是要擴展成一次比對 64 位元，也就是 8 byte，推論下來就是一次比對 &0x8080808080808080 (選項 d) 3. 若是比對的資料小於 8 byte，則將剩下的 1 byte 以 &0x80 方式一一比完 ### 延伸問題 :::success * 解釋上述程式的運作原理，特別是為何用到 memcpy 呢？提示: 與 data alignment 有關，詳閱 [C 語言：記憶體管理、對齊及硬體特性](https://hackmd.io/@sysprog/c-memory) * 將上述技巧應用於「給予一個已知長度的字串，檢測裡頭是否包含有效的英文大小寫字母」 * 承 (2)，考慮常見的英文標點符號，判斷輸入字串是否為有效的字元集，避免逐一字元比對 * 提示: Intel SSE 4.2 加入 STTNI (String and Text New Instructions) 指令，可加速字串比對，詳見: [Schema Validation with Intel® Streaming SIMD Extensions 4](https://software.intel.com/content/www/us/en/develop/articles/schema-validation-with-intel-streaming-simd-extensions-4-intel-sse4.html) ::: ### 延伸問題分析 1. memcpy.c 如下，source code 來源 -- [glibc-2.14](http://ftp.gnu.org/gnu/glibc/glibc-2.14.tar.bz2 )： ```c= #include <string.h> #include <memcopy.h> #include <pagecopy.h> #undef memcpy void * memcpy (dstpp, srcpp, len) void *dstpp; const void *srcpp; size_t len; { unsigned long int dstp = (long int) dstpp; unsigned long int srcp = (long int) srcpp; /* Copy from the beginning to the end. */ /* If there not too few bytes to copy, use word copy. */ if (len >= OP_T_THRES) { /* Copy just a few bytes to make DSTP aligned. */ len -= (-dstp) % OPSIZ; BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ); /* Copy whole pages from SRCP to DSTP by virtual address manipulation, as much as possible. */ PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len); /* Copy from SRCP to DSTP taking advantage of the known alignment of DSTP. Number of bytes remaining is put in the third argument, i.e. in LEN. This number may vary from machine to machine. */ WORD_COPY_FWD (dstp, srcp, len, len); /* Fall out and copy the tail. */ } /* There are just a few bytes to copy. Use byte memory operations. */ BYTE_COPY_FWD (dstp, srcp, len); return dstpp; } libc_hidden_builtin_def (memcpy) ``` * 可以看到，若是要複製的 `len` 小於 `OP_T_THRES` (根據系統可能為 8 或 16)，就直接進行 byte 的複製 * OPSIZ 代表 sizeof(unsigned long int)，在例子中假設是8 * 若是大於 `OP_T_THRES`，則先把多於的 byte 做複製後，先以 PAGE 的形式複製，再來以 WORD 的形式複製，以此達到 data alignment 的效果 * 以下舉例傳入值為 memcpy(0xd,0x5ff1,16350)，也就是要從 0xd 這個地址複製 16350 bytes 到 0x5ff1 的地址，一 page 為預設的 4KB ![](https://i.imgur.com/3xkuK6k.jpg) * 假設 `OP_T_THRES` 為 8，毫無疑問會進入 (len >= OP_T_THRES) 的 block * `(-dstp) % OPSIZ` 可以得到需要做 alignment 的 byte 數，如在這個例子，會得到 7， `0x5ff1` 跟 8 取餘數會得到 7，代表若是要一個 PAGE 複製，甚至一個 WORD 複製，都需要先把 7 個 bytes 複製完 * 因此程式碼會執行 `BYTE_COPY_FWD (dstp, srcp, (-dstp) % OPSIZ)` ，先將 7 bytes 複製過去 * 接下來會盡可能複製越多 PAGE 越好，程式碼會執行 PAGE_COPY_FWD_MAYBE (dstp, srcp, len, len)，這是一個定義在 pagecopy.h 的 MACRO，類似於前面的操作，只不過這次換成先把 WORD copy 完，再來 copy PAGE，以此加快速度，以下擷取 `PAGE_COPY_FWD_MAYBE` 部分程式碼 ```c= if (nbytes_before != 0) \ { \ /* First copy the words before the first page boundary. */ \ WORD_COPY_FWD (dstp, srcp, nbytes_left, nbytes_before); \ assert (nbytes_left == 0); \ nbytes -= nbytes_before; \ } \ PAGE_COPY_FWD (dstp, srcp, nbytes_left, nbytes); ``` * 最後會執行 `WORD_COPY_FWD (dstp, srcp, len, len);`，把剩餘的 WORD 複製完，也就完成 memcpy 的操作 ## 測驗 2 ## 測驗 3 ## 測驗 4 ## 測驗 5 ## 測驗 6