# 2019q1 Homework1 (lab0) ###### tags: `sysprog` `2019` `2019spring` contributed by < `czvkcui` > :::danger 認真看作業要求,除了提及如何逐步達到要求,還需要進行: 1. 改善程式效能 2. 解釋自動評分系統運作的原理 3. 提及 qtest 的行為和裡頭的技巧 還未達成符合預期目標,請繼續付出! :notes: jserv ::: # queue implement `q_new`,`q_free`,`q_insert_head`,`q_insert_tail`,`q_remove_head`,`q_size`,`q_reverse` implement ## q_new ```clike= typedef struct { list_ele_t *head, *tail; /* Linked list of elements */ /* You will need to add more fields to this structure to efficiently implement q_size and q_insert_tail */ size_t size; } queue_t; queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q) { q->head = NULL; q->tail = NULL; q->size = 0; } return q; } ``` size保存linked list 的長度 q_new 初始化`queue_t`結構 ## q_free ```clike= void q_free(queue_t *q) { /* Free queue structure */ if (!q) return; list_ele_t* t; while (q->head) { t = q->head->next; free(q->head->value); free(q->head); q->head = t; } free(q); } ``` ## q_insert_head ```clike= bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh = malloc(sizeof(list_ele_t)); int len = strlen(s); if (!q) goto error; if (!newh) goto error; newh->value = malloc((len + 1) * sizeof(char)); if (!newh->value) goto error; strncpy(newh->value, s, len); newh->value[len] = '\0'; newh->next = q->head; if (!q->head) { q->tail = newh; } q->head = newh; q->size++; return true; error: if (newh) free(newh); return false; } ``` 利用`goto`來統一處理例外狀況 ## q_insert_tail ```clike= bool q_insert_tail(queue_t *q, char *s) { list_ele_t *newh = malloc(sizeof(list_ele_t)); int len = strlen(s); if (!q) goto error; if (!newh) goto error; newh->value = malloc((len + 1) * sizeof(char)); if (!newh->value) goto error; strncpy(newh->value, s, len); newh->value[len] = '\0'; newh->next = NULL; if (!q->tail) { q->head = newh; q->tail = newh; } q->tail->next = newh; q->tail = newh; q->size++; return true; error: if (newh) free(newh); return false; } ``` ## q_remove_head ```clike= bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (!q || !q->head) goto error; char* str = q->head->value; if (sp) { strncpy(sp, str, bufsize - 1); sp[bufsize - 1] = '\0'; } list_ele_t* t = q->head; q->head = q->head->next; free(t->value); free(t); q->size--; if (q->size == 0) q->tail = NULL; return true; error: return false; } ``` ## q_size ```clike= int q_size(queue_t *q) { if (!q || !q->head) return 0; return q->size; } ``` ## q_reverse ```clike= void q_reverse(queue_t *q) { if (!q || !q->head || !q->head->next) return; if (q->head == q->tail) return; q->tail = q->head; q->head = q->head->next; list_ele_t* back = q->tail, *curr = q->head->next; while (curr != NULL) { q->head->next = back; back = q->head; q->head = curr; curr = curr->next; } q->head->next = back; q->tail->next = NULL; } ``` # 解釋自動評分系統運作的原理 ## scripts ### git hook we have 2 hooks inside our scripts: 1. pre-commit.hook 2. commit-msg.hook ### pre-commit.hook * trigger on `git commit` command * check format is suitable with clang format * use cppcheck checking ### commit-msg.hook 1. check commit message with regular expression ## qtest command interpret ### 初始化階段: ```c queue_init(); init_cmd(); console_init(); set_verblevel(level); ``` init_cmd和console_init會向console註冊所使用的command ### command parse `main`->`run_console`->`cmd_select`->`interpret_cmd`->`interpret_cmda` ### interpret_cmda執行 ```c cmd_ptr next_cmd = cmd_list; bool ok = true; while (next_cmd && strcmp(argv[0], next_cmd->name) != 0) next_cmd = next_cmd->next; if (next_cmd) { ok = next_cmd->operation(argc, argv); if (!ok) record_error(); } else { ... // error handling } ``` `cmd_list`是一個link list,它保存了`queue_init`和`init_cmd`所註冊的command。透過循序搜尋找到符合的cmd,然後執行對應的函式(next_cmd->operation)。 ## report.[hc] ```c static void check_exceed(size_t new_bytes) { size_t limit_bytes = (size_t) mblimit << 20; size_t request_bytes = new_bytes + current_bytes; if (mblimit > 0 && request_bytes > limit_bytes) { report_event(MSG_FATAL, "Exceeded memory limit of %u megabytes with %lu bytes", mblimit, request_bytes); } } ``` Q:Why limit_bytes = mblimit<<20 ? A:mblimit 作為判斷malloc有無超出上限,`mblimit<<20`代表最高可以使用的大小(KB),在本例中`0`<<20還是為0(`mblimit`預設為`0`表示無上限) ```c void init_time(double *timep) { (void) delta_time(timep); } ``` Q:Why we need `(void)` before delta_time(timep) ? A:某些編譯器可能會警告使用者沒有處理返回值,加上`(void)`來處理返回值(轉形成void),但是像是`printf`等常用的function可以在宣告時加上`__attribute__(noreturn)`來避免。 > 在gcc中 `-Wall` 預設會開啟`Wunused-result`,不過在我的平台上(`gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0`),刪除`(void)`和加上`-Wunused-result`都沒有效果。此外也可以在function宣告加上`__attribute__((warn_unused_result))`強迫警告使用者處理返回值。 > ### getrusage: 獲取資源的使用情形 ```c /* Number of bytes resident in physical memory */ size_t resident_bytes() { struct rusage r; size_t mem = 0; int code = getrusage(RUSAGE_SELF, &r); if (code < 0) { report_event(MSG_ERROR, "Call to getrusage failed"); } else { mem = r.ru_maxrss * 1024; } return mem; } ``` declaration: `int getrusage(int who, struct rusage *usage);` usage接受一個參數負責返回結果,而who則有以下參數 * RUSAGE_SELF: 只統計本身process的使用資訊(包含thread) * RUSAGE_CHILDREN: 會統計所有children和calling的process的資訊 * RUSAGE_THREAD(linux 2.6.26以後): 統計calling thread的資訊 rusage structure: ```c struct rusage { struct timeval ru_utime; /* user CPU time used */ struct timeval ru_stime; /* system CPU time used */ long ru_maxrss; /* maximum resident set size */ long ru_ixrss; /* integral shared memory size */ long ru_idrss; /* integral unshared data size */ long ru_isrss; /* integral unshared stack size */ long ru_minflt; /* page reclaims (soft page faults) */ long ru_majflt; /* page faults (hard page faults) */ long ru_nswap; /* swaps */ long ru_inblock; /* block input operations */ long ru_oublock; /* block output operations */ long ru_msgsnd; /* IPC messages sent */ long ru_msgrcv; /* IPC messages received */ long ru_nsignals; /* signals received */ long ru_nvcsw; /* voluntary context switches */ long ru_nivcsw; /* involuntary context switches */ }; ``` ## console.[hc] ### cmd_select ,run_console: ```c bool run_console(char *infile_name) { if (!push_file(infile_name)) { report(1, "ERROR: Could not open source file '%s'", infile_name); return false; } while (!cmd_done()) { cmd_select(0, NULL, NULL, NULL, NULL); } return err_cnt == 0; } ``` ## harness.[hc] ### test_malloc and test_free `queue.c`因為有include `harness.h` 因此呼叫的`malloc`和`free`為`test_malloc`和`test_free`。 struct of block_ele_t ```c typedef struct BELE { struct BELE *next; struct BELE *prev; size_t payload_size; size_t magic_header; /* Marker to see if block seems legitimate */ unsigned char payload[0]; /* Also place magic number at tail of every block */ } block_ele_t; ``` payload[0]是C99後引入的語法,可以允許使用者在runtime時才決定該array大小。 ```c block_ele_t *new_block = malloc(size + sizeof(block_ele_t) + sizeof(size_t)); ``` size是是真正要給使用者使用的空間大小(bytes),加上block_ele_t的大小,sizeof(size_t)則是儲存footer的空間。在`test_malloc`中,block_ele_t會依序填上`MAGICHEADER`、`size`、 `FOOTER`、`payload`,接著將next指向link list的開頭(allcated則指向新的new block,舊的prev指標指向新的block),再由`allocated_count++`後完成記憶體分配。`test_free`會檢查要被釋放的block的`MAGICHEADER`和`MAGICFOOTER`。當payload存取的資料超過payload的大小時會覆蓋自己的`MAGICFOOTER`,`MAGICHEADER`則可以檢查自己的block是否被別人所覆蓋[^1]。最後`test_free`將block移出double link list,`allocated_count--`釋放block。 ### trigger_exception,exception_setup,exception_cancel `jmp_read`的type是`volatile`,在參考[from C 語言 setjmp 與 longjmp 函數用法教學](https://blog.gtwang.org/programming/c-setjmp-longjmp-function-tutorial/)後得知: * 未開啟最佳化options(`-O`)只有 `register` 宣告的變數會改變(not change:`volatile`.`global`,`static`,`local`) * with `-O` option: `register`,`local`宣告的變數值均會改變 * `sigsetjmp`,`siglongjmp`: * 和`setjmp`和`longjmp`一樣都是nonlocal jmp * 但是`setjmp` 和`longjmp`並不會在jump儲存當前的signal mask ```c void trigger_exception(char *msg) { error_occurred = true; error_message = msg; if (jmp_ready) siglongjmp(env, 1); else exit(1); } ``` 當exception觸發(SIGSEGV,SIGALRM),`jmp_rady`會被trigger成true,而進行`setlongjmp`至上一個`exception_setup`位置(`if (sigsetjmp(env, 1))`),印出錯誤訊息後返回。 `exception_cancel`:清除'jmp_ready'和error_message,若是超時則會觸發alarm。 ## qtest.c ```c bool do_new(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } bool ok = true; if (q != NULL) { report(3, "Freeing old queue"); ok = do_free(argc, argv); } error_check(); if (exception_setup(true)) q = q_new(); exception_cancel(); qcnt = 0; show_queue(3); return ok && !error_check(); } ``` 根據`exception_setup`和`exception_cancel`可以得知 ```c static void queue_init() { fail_count = 0; q = NULL; signal(SIGSEGV, sigsegvhandler); signal(SIGALRM, sigalrmhandler); } ``` `queue_init`會使qtest在初始化時註冊`SIGSEGV`和`SIGALRM`,當中斷發生時OS會通知signal handler做對應的處理,在本例中`SIGSEGV`發生時會交由`sigsegvhandler`,而`SIGALARM`由`sigalrmhandler`處理,兩者都是印出錯誤訊息後,呼叫`trigger_exception`。 # 效能改善 ## Test in parallel 修改`driver.py` 以 process pool來執行: Performance counter stats for 'python scripts/driver.py'(with process pool): 1276.391265 task-clock (msec) # 1.327 CPUs utilized 842 context-switches # 0.660 K/sec 58 cpu-migrations # 0.045 K/sec 135,961 page-faults # 0.107 M/sec 3,941,655,638 cycles # 3.088 GHz 6,034,081,563 instructions # 1.53 insn per cycle 1,167,962,182 branches # 915.050 M/sec 2,832,393 branch-misses # 0.24% of all branches 0.961677493 seconds time elapsed Performance counter stats for 'python scripts/driver.py': 1007.687420 task-clock (msec) # 0.797 CPUs utilized 803 context-switches # 0.797 K/sec 12 cpu-migrations # 0.012 K/sec 131,010 page-faults # 0.130 M/sec 3,107,377,907 cycles # 3.084 GHz 5,928,785,192 instructions # 1.91 insn per cycle 1,146,581,106 branches # 1137.834 M/sec 2,129,417 branch-misses # 0.19% of all branches 1.264532373 seconds time elapsed In [2]: (1.264532373-0.961677493)/1.264532373 * 100 Out[2]: 23.949950706402355 ## compiler opt 在用perf record分析時,發現有些部份可能可以交由編譯器最佳化,例如每次malloc都會被呼叫但是回傳值都是一樣的`fail_allocation` * orig ``` 21,424,519 cache-misses # 91.576 % of all cache refs ( +- 0.05% ) 23,395,281 cache-references ( +- 0.17% ) 1,580,167,387 instructions # 1.51 insn per cycle ( +- 0.05% ) 1,049,546,666 cycles ( +- 1.31% ) 246,893 branch-misses ( +- 2.07% ) 0.343512614 seconds time elapsed ( +- 1.37% ) ``` * gcc -O2 flag ``` 21,198,677 cache-misses # 92.116 % of all cache refs ( +- 0.06% ) 23,013,085 cache-references ( +- 0.14% ) 1,242,986,343 instructions # 1.39 insn per cycle ( +- 0.05% ) 893,054,339 cycles ( +- 1.19% ) 232,820 branch-misses ( +- 1.79% ) 0.291908910 seconds time elapsed ``` * gcc -O3 flag ``` 21,257,036 cache-misses # 92.071 % of all cache refs ( +- 0.06% ) 23,087,689 cache-references ( +- 0.15% ) 1,245,395,797 instructions # 1.36 insn per cycle ( +- 0.06% ) 916,791,967 cycles ( +- 1.35% ) 243,474 branch-misses ( +- 1.96% ) 0.299784791 seconds time elapsed ( +- 1.45% ) ``` * clang -O2 flag ``` 21,258,614 cache-misses # 91.996 % of all cache refs ( +- 0.04% ) 23,108,203 cache-references ( +- 0.16% ) 1,242,551,526 instructions # 1.34 insn per cycle ( +- 0.06% ) 924,695,143 cycles ( +- 1.43% ) 261,872 branch-misses ( +- 8.10% ) 0.302822045 seconds time elapsed ( +- 1.53% ) ``` * clang -O3 flag ``` 21,243,947 cache-misses # 92.033 % of all cache refs ( +- 0.03% ) 23,082,922 cache-references ( +- 0.11% ) 1,241,602,442 instructions # 1.35 insn per cycle ( +- 0.05% ) 917,623,562 cycles ( +- 1.30% ) 249,315 branch-misses ( +- 5.23% ) 0.299945914 seconds time elapsed ( +- 1.28% ) ``` # 驗證常數時間複雜度[^3] Reference --- [^1]:[0xff07筆記](https://hackmd.io/s/SJnc7dtSN?fbclid=IwAR0oJhQQ2GTM4NNXGHi8HHw2D0tRHAtIYxFE60PgceKMUXlqOS4Bl6c2EHg) [^2]:[cjwind筆記](https://hackmd.io/s/r1vWC9tS4?fbclid=IwAR1ijirhutApLtwOivWqGMh3LBNzYCeXkwdkHYpIPcgUkUYo5Jw-87ZImjs#) [^3]:[afcidk筆記](https://hackmd.io/s/ry4VZS9SN?fbclid=IwAR0cG3zIIgovml4IUqqPqgOGtdpdJfGAOkzFPXE8-xyJk4n4qm4KoY8_8TU#%E9%A9%97%E8%AD%89%E5%B8%B8%E6%95%B8%E6%99%82%E9%96%93%E8%A4%87%E9%9B%9C%E5%BA%A6)