Try   HackMD

2023q1 Homework1 (lab0)

contributed by < mickey30606 >

目前 make test 分數 95/100

HackMD 筆記不是讓你張貼完整程式碼的地方,此處著重於「開發紀錄」,亦即你的觀察、描述遭遇到的問題,還有如何克服和改進。

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

研讀 Linux 核心原始程式碼的 lib/list_sort.c

merge()

  • 將兩個 linked list 以 compare function 為基準合併成一個 linked list 。
  • 使用到 pointer to pointer 的技巧。
  • 只有做到 linked list 的單向串接。
  • 如果節點相等,則取 a 來維持排序的 stability 。

改進漢語表達!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

merge_final()

  • 做的事情與 merge() 相同,不過有將 merge 完的 list 變成 doubly linked list 。
  • 比較有趣的是,在這個 merge_final() 的後半段,是將剩餘的 linked list 變成雙向的,因為擔心那個迴圈跑太久,所以設定了一個 8-bit 的 count ,若是 overflow ,則會呼叫一次 cmp() ,看註解是說,可以在 cmp() 中定期呼叫 cond_resched() (目前唯一想到的解法是維護一個 global variable ),這樣 cpu 就不會一直卡在這個迴圈裡面。
	do {
		/*
		 * If the merge is highly unbalanced (e.g. the input is
		 * already sorted), this loop may run many iterations.
		 * Continue callbacks to the client even though no
		 * element comparison is needed, so the client's cmp()
		 * routine can invoke cond_resched() periodically.
		 */
		if (unlikely(!++count))
			cmp(priv, b, b);
		b->prev = tail;
		tail = b;
		b = b->next;
	} while (b);

閱讀論文 〈Dude, is my code constant time?〉

在論文中有提到

a) Pre-processing: We pre-process measurements prior to statistical processing. We discard measurements that are larger than the p percentile, for various values of p.

並在 dudect/src/dudect.h 中有看到程式碼的實作

  if (first_time) {
    // throw away the first batch of measurements.
    // this helps warming things up.
    prepare_percentiles(ctx);
  } else {
    update_statistics(ctx);
    ret = report(ctx);
  }
static int64_t percentile(int64_t *a, double which, size_t size) {
  qsort(a, size, sizeof(int64_t), (int (*)(const void *, const void *))cmp);
  size_t array_position = (size_t)((double)size * (double)which);
  assert(array_position >= 0);
  assert(array_position < size);
  return a[array_position];
}

/*
 set different thresholds for cropping measurements.
 the exponential tendency is meant to approximately match
 the measurements distribution, but there's not more science
 than that.
*/
static void prepare_percentiles(dudect_ctx_t *ctx) {
  for (size_t i = 0; i < DUDECT_NUMBER_PERCENTILES; i++) {
    ctx->percentiles[i] = percentile(
        ctx->exec_times, 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)),
        ctx->config->number_measurements);
  }
}

在第一次的 measurement 做完之後,程式會將紀錄下的結果進行排序,然後依照百分比,取得在該百分比下能接受的最大值。

  • 不過在 percentile() 當中每次的 asize 都一樣,那位什麼不先排序好之後在乎叫到這個函式?
  • 註解中提到,程式碼第 18 行的 1 - (pow(0.5, 10 * (double)(i + 1) / DUDECT_NUMBER_PERCENTILES)) 是沒有什麼理論解釋的,只是大概去抓符合 distribution 的數值,那有什麼比較好的解法嗎?

讀論文!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

並在之後的 measurement 當中只要大於 ctx->percentiles[crop_index] 的測量值,都不會加入計算。

    // t-test on cropped execution times, for several cropping thresholds.
    for (size_t crop_index = 0; crop_index < DUDECT_NUMBER_PERCENTILES; crop_index++) {
      if (difference < ctx->percentiles[crop_index]) {
        t_push(ctx->ttest_ctxs[crop_index + 1], difference, ctx->classes[i]);
      }
    }

不過之後取得最終值 t 的程式碼中一樣是取得最大值作為最終的值,那有沒有進行 crop 目前看來沒有達成原本的目的?

static ttest_ctx_t *max_test(dudect_ctx_t *ctx) {
  size_t ret = 0;
  double max = 0;
  for (size_t i = 0; i < DUDECT_TESTS; i++) {
    if (ctx->ttest_ctxs[i]->n[0] > DUDECT_ENOUGH_MEASUREMENTS) {
      double x = fabs(t_compute(ctx->ttest_ctxs[i]));
      if (max < x) {
        max = x;
        ret = i;
      }
    }
  }
  return ctx->ttest_ctxs[ret];
}