[2019q3](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 第 11 週測驗題

# [2019q3](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 第 11 週測驗題 :::info 目的: 1. 檢驗學員對 C 語言程式設計的認知 2. 檢驗學員對 mmap 系統呼叫及對應記憶體管理的認知 ::: --- ### 測驗 `1` 在 [PLDI'19](https://pldi19.sigplan.org/) 論文 [Computing Summaries of String Loops in C for Better Testing and Refactoring](https://srg.doc.ic.ac.uk/files/papers/loops-pldi-19.pdf) 提到開發者會希望對 loop summarisation 進行替換，例如 [GNU patch](http://www.gnu.org/software/patch/) 這支程式原本程式碼是: ```cpp while (*s != '\n') s++; ``` 替換為 GNU extension: ```cpp s = rawmemchr(s, '\n'); ``` 其中 [rawmemchr](https://linux.die.net/man/3/rawmemchr) 是自從 glibc-2.1 即提供的擴充函式: > The rawmemchr() function is similar to memchr(): it assumes (i.e., the programmer knows for certain) that an instance of c lies somewhere in the memory area starting at the location pointed to by s, and so performs an optimized search for c (i.e., no use of a count argument to limit the range of the search). If an instance of c is not found, the results are unpredictable. The following call is a fast means of locating a string's terminating null byte: > ```cpp > char *p = rawmemchr(s, '\0'); > ``` 以下程式碼嘗試運用 [你所不知道的C語言：數值系統篇](https://hackmd.io/@sysprog/c-numerics) 提及的 Detect NULL 技巧，來加速 `rawmemchr` 的實作，適用於 32 位元和 64 位元的 little-endian 架構： ```cpp #include <limits.h> #define UNALIGNED(X) ((long) X & (sizeof(long) - 1)) /* How many bytes are loaded each iteration of the word copy loop. */ #define LBLOCKSIZE (sizeof(long)) #if LONG_MAX == 2147483647L #define DETECTNULL(X) (((X) -0x01010101) & ~(X) &0x80808080) #else #if LONG_MAX == 9223372036854775807L #define DETECTNULL(X) (((X) -0x0101010101010101) & ~(X) &0x8080808080808080) #else #error long int is not a 32bit or 64bit type. #endif #endif #ifndef DETECTNULL #error long int is not a 32bit or 64bit byte #endif /* DETECTCHAR returns nonzero if (long)X contains the byte used to fill (long)MASK. */ #define DETECTCHAR(X, MASK) (DETECTNULL(X ^ MASK)) void *rawmemchr(const void *src_void, int c) { const unsigned char *src = (const unsigned char *) src_void; unsigned char d = c; while (UNALIGNED(src)) { if (*src == d) return (void *) src; src++; } /* If we get this far, src is word-aligned. */ unsigned long *asrc = (unsigned long *) src; unsigned long mask = d << 8 | d; mask = mask << AAA | mask; for (unsigned long i = BBB; i < LBLOCKSIZE * 8; i <<= 1) mask = (mask << i) | mask; while (1) { if (DETECTCHAR(*asrc, mask)) break; asrc++; } /* resort to a bytewise loop. */ for (src = (unsigned char *) asrc;; src++) { if (*src == d) return (void *) src; } } ``` 請補完程式碼，應要能在 x86_64 架構獲得效能提升。 ==作答區== AAA = ? * `(a)` 4 * `(b)` 8 * `(c)` 16 * `(d)` 32 * `(e)` 64 BBB = ? * `(a)` 4 * `(b)` 8 * `(c)` 16 * `(d)` 32 * `(e)` 64 :::success 延伸問題: 1. 解釋上述程式碼運作原理，並探討是否適用於 Big-endian 硬體; 2. 比照 [CS:APP Assign5.18](https://hackmd.io/@aben20807/rkdzvWJTX) 在 GNU/Linux 上透過 perf 一類的效能工具，分析上述兩種 loop summarisation 實作的效能落差，並運用 [CS:APP](https://hackmd.io/@sysprog/CSAPP) 第 5 章的 CPE 分析手法 3. 研讀論文 [Computing Summaries of String Loops in C for Better Testing and Refactoring](https://srg.doc.ic.ac.uk/files/papers/loops-pldi-19.pdf)，嘗試探討裡頭 [Symbolic Execution](https://en.wikipedia.org/wiki/Symbolic_execution) 的手法，還有為何能做到自動分析程式碼並給予 refactor 建議 * 延伸閱讀: [Binary 自動分析的那些事](https://hitcon.org/2016/CMT/slide/day1-r1-a-1.pdf), [MIT 6.858 Computer Systems Security: Symbolic Execution](https://youtu.be/yRVZPvHYHzw) ::: --- ### 測驗 `2` 考慮到 [cirbuf](https://github.com/sysprog21/cirbuf) 這個 [circular buffer](https://en.wikipedia.org/wiki/Circular_buffer) 實作，嘗試透過 [mmap 系統呼叫](http://man7.org/linux/man-pages/man2/mmap.2.html)以化簡緩衝區邊界處理的議題。為何 circular buffer 初始化時要呼叫三次 mmap 呢？ * 第一次呼叫 * 向作業系統請求配置一塊足以容納兩倍 circular buffer 空間的記憶體 * 第二次呼叫: * 將稍早用 mkstemp 建立的暫存檔映射到第一次呼叫 mmap 所取得的記憶體中，映射大小為 buffer 的大小 * 第三次呼叫: * 以第二次要求映射的記憶體位置 (即第一次得到的位置) 加上 buffer 大小的記憶體位置作為要求配置的位置，映射大小同樣為 buffer 的大小，並也是映射到同樣的 fd 上(注意，兩次呼叫傳入的 offset 均為0) * 如此一來，第二次與第三次映射的記憶體位置即指向同一塊記憶體有了這個特性後，當我們使用 memcpy 在寫入/讀取 buffer 時即可不用考慮到邊界問題，進而改善存取效率，否則，我們必須考慮到目前 index 是否會超出邊界等等，這將對效能造成衝擊，在 [Using black magic to make a fast circular buffer](https://lo.calho.st/posts/black-magic-buffer/) 一文指出，這兩種方法的效率差距可達三倍之譜。對照 [cirbuf.h](https://github.com/sysprog21/cirbuf/blob/master/cirbuf.h) 和 [test-cirbuf.c](https://github.com/sysprog21/cirbuf/blob/master/tests/test-cirbuf.c) 以得知具體用法，請補完以下程式碼: ```cpp static inline int cirbuf_offer(cirbuf_t *cb, const unsigned char *data, const int size) { /* prevent buffer from getting completely full or over commited */ if (cirbuf_unusedspace(cb) <= size) return 0; int written = cirbuf_unusedspace(cb); written = size < written ? size : written; memcpy(cb->data + cb->tail, data, written); cb->tail += written; MM1 return written; } static inline unsigned char *cirbuf_peek(const cirbuf_t *cb) { if (cirbuf_is_empty(cb)) return NULL; MM2 } ``` ==作答區== MM1 = ? * `(a)` if (cb->size > cb->tail) cb->tail = 0; * `(b)` if (cb->size < cb->tail) cb->tail = 0; * `(c)` if (cb->size < cb->tail) cb->tail -= cb->size; * `(d)` if (cb->size < cb->tail) cb->tail %= cb->size; * `(e)` if (cb->size < cb->tail) cb->tail++; * `(f)` if (cb->size < cb->tail) cb->size = 0; MM2 = ? * `(a)` return cb->data; * `(b)` returb cb->head; * `(c)` returb cb->data + cb->head; * `(d)` return NULL; :::success 延伸問題: 1. 解析 test suite 運作原理，嘗試強化現有 cirbuf 的例外處理機制，並且實作於 unit test 內部; 2. 儘管已有 12 個 test case，但涵蓋層面仍不夠廣泛，請指出現有實作的缺陷並著手改善; （提示: 數值範圍及多個 PAGE_SIZE 的空間) 3. 學習 [Using black magic to make a fast circular buffer](https://lo.calho.st/posts/black-magic-buffer/)，指出 fast circular buffer 實作的技巧，並分析 cirbuf 的效能，並逐步量化及改善效率; :::