2023q1 Homework1 (lab0)

# 2023q1 Homework1 (lab0) contributed by < `mickey30606` > > 目前 `make test` 分數 95/100 :::danger HackMD 筆記不是讓你張貼完整程式碼的地方，此處著重於「開發紀錄」，亦即你的觀察、描述遭遇到的問題，還有如何克服和改進。 :notes: jserv ::: ## 研讀 Linux 核心原始程式碼的 lib/list_sort.c ### `merge()` - 將兩個 linked list 以 compare function 為基準合併成一個 linked list 。 - 使用到 pointer to pointer 的技巧。 - 只有做到 linked list 的單向串接。 - 如果節點相等，則取 `a` 來維持排序的 stability 。 :::danger 改進漢語表達！ :notes: jserv ::: ### `merge_final()` - 做的事情與 `merge()` 相同，不過有將 merge 完的 list 變成 doubly linked list 。 - 比較有趣的是，在這個 `merge_final()` 的後半段，是將剩餘的 linked list 變成雙向的，因為擔心那個迴圈跑太久，所以設定了一個 8-bit 的 `count` ，若是 overflow ，則會呼叫一次 `cmp()` ，看註解是說，可以在 `cmp()` 中定期呼叫 `cond_resched()` (目前唯一想到的解法是維護一個 global variable )，這樣 cpu 就不會一直卡在這個迴圈裡面。 ```c do { /* * If the merge is highly unbalanced (e.g. the input is * already sorted), this loop may run many iterations. * Continue callbacks to the client even though no * element comparison is needed, so the client's cmp() * routine can invoke cond_resched() periodically. */ if (unlikely(!++count)) cmp(priv, b, b); b->prev = tail; tail = b; b = b->next; } while (b); ``` ## 閱讀論文〈Dude, is my code constant time?〉在論文中有提到 > a) Pre-processing: We pre-process measurements prior to statistical processing. We discard measurements that are larger than the p percentile, for various values of p. 並在 `dudect/src/dudect.h` 中有看到程式碼的實作 ```clike if (first_time) { // throw away the first batch of measurements. // this helps warming things up. prepare_percentiles(ctx); } else { update_statistics(ctx); ret = report(ctx); } ``` ```c static int64_t percentile(int64_t *a, double which, size_t size) { qsort(a, size, sizeof(int64_t), (int (*)(const void *, const void *))cmp); size_t array_position = (size_t)((double)size * (double)which); assert(array_position >= 0); assert(array_position < size); return a[array_position]; } /* set different thresholds for cropping measurements. the exponential tendency is meant to approximately match the measurements distribution, but there's not more science than that. */ static void prepare_percentiles(dudect_ctx_t *ctx) { for (size_t i = 0; i < DUDECT_NUMBER_PERCENTILES; i++) { ctx->percentiles[i] = percentile( ctx->exec_times, 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)), ctx->config->number_measurements); } } ``` 在第一次的 measurement 做完之後，程式會將紀錄下的結果進行排序，然後依照百分比，取得在該百分比下能接受的最大值。 :::warning - 不過在 `percentile()` 當中每次的 `a` 和 `size` 都一樣，那位什麼不先排序好之後在乎叫到這個函式？ - 註解中提到，程式碼第 18 行的 `1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES))` 是沒有什麼理論解釋的，只是大概去抓符合 distribution 的數值，那有什麼比較好的解法嗎？ > 讀論文！ :notes: jserv ::: 並在之後的 measurement 當中只要大於 `ctx->percentiles[crop_index]` 的測量值，都不會加入計算。 ```clike // t-test on cropped execution times, for several cropping thresholds. for (size_t crop_index = 0; crop_index < DUDECT_NUMBER_PERCENTILES; crop_index++) { if (difference < ctx->percentiles[crop_index]) { t_push(ctx->ttest_ctxs[crop_index + 1], difference, ctx->classes[i]); } } ``` :::warning 不過之後取得最終值 `t` 的程式碼中一樣是取得最大值作為最終的值，那有沒有進行 crop 目前看來沒有達成原本的目的？ ```clike static ttest_ctx_t *max_test(dudect_ctx_t *ctx) { size_t ret = 0; double max = 0; for (size_t i = 0; i < DUDECT_TESTS; i++) { if (ctx->ttest_ctxs[i]->n[0] > DUDECT_ENOUGH_MEASUREMENTS) { double x = fabs(t_compute(ctx->ttest_ctxs[i])); if (max < x) { max = x; ret = i; } } } return ctx->ttest_ctxs[ret]; } ``` :::