contributed by < TSAICHUNGHAN >
jimmy01240397
- bool flag = false;
+ bool now_del = false;
element_t *entry, *safe;
list_for_each_entry_safe (entry, safe, head, list) {
+ now_del = entry->list.next != head &&
+ strcmp(entry->value, safe->value) == 0;
+ if (now_del || flag) {
- if (entry->list.next != head &&
- strcmp(entry->value, safe->value) == 0) {
list_del(&entry->list);
q_release_element(entry);
+ flag = now_del;
- flag = true;
- } else if (flag) {
- list_del(&entry->list);
- q_release_element(entry);
- flag = false;
}
}
- 已修改沒必要的 NULL ,我只需將我的迴圈改成判斷是否回到
head
即可,因此也不須將環狀鏈結串列斷開。
- while (fast != NULL && fast->next != NULL)
+ while (fast != head && fast->next != head)
- 在
q_delete_dup
裡我確實寫的過於冗長,我已改進我的程式碼,下次 coding 會避免沒必要的分支以精簡程式碼。
修改後 commit:579a2ef
謝謝jimmy01240397
同學抽空 Review 。
jeremy
jujuegg
你的洞見呢?與其談「心法」,不如設想「如果我是你」,我會怎麼做,示範一次給同學看,而且檢討自己哪裡沒做好,從而呼應原本學員的不足處。清楚標註學員的不足處 (git commits 和描述)。
軟體工程師要學會說故事,本作業要求學員從良性詳盡的批評開始。繼續檢討!
jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
$ lscpu | less
架構: x86_64
CPU 作業模式: 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
供應商識別號: GenuineIntel
Model name: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
CPU 家族: 6
型號: 158
每核心執行緒數: 1
每通訊端核心數: 8
Socket(s): 1
製程: 12
CPU max MHz: 4900.0000
CPU min MHz: 800.0000
BogoMIPS: 7200.00
對照 Dictionary.com 對 implement 一詞的解釋:
「實作」會是符合上述 "implement" 語意的譯詞,取「起」以「創作」的意思,而「製造」本質上是固定輸入並通常產生固定輸出的行為。
clang-format的使用範例
$ clang-format -i queue.c
建立一個空佇列,初始化 struct list_head
,須先將 next
與 prev
都指向自身。
避免非必要的項目縮排 (即 *
和 -
),以清晰、明確,且流暢的漢語書寫。
設計流程:
用 malloc 分配記憶體空間給 head
若 head 分配空間無效,則回傳 NULL
使用 list.h
中的 INIT_LIST_HEAD()
來初始化 head
程式碼
struct list_head *q_new()
{
struct list_head *new =
(struct list_head *) malloc(sizeof(struct list_head));
if (!new)
return NULL;
INIT_LIST_HEAD(new);
return new;
}
使用作業規範的程式碼縮排風格。
避免非必要的項目縮排 (即 *
和 -
),以清晰、明確,且流暢的漢語書寫。
若 head 為無效,回傳
新增兩個 element 指標
使用 list_for_each_entry_safe
可以訪問每個鏈節,且儲存下個連結的鏈節 ???
使用 list_del
移除鏈節後,此時還未釋放節點的記憶體空間
使用 free() 使放鏈節記憶體空間
/* Free all storage used by queue */
void q_free(struct list_head *head)
{
if (!head)
return;
element_t *node, *temp;
list_for_each_entry_safe (node, temp, head, list) {
list_del(&node->list);
q_release_element(node);
}
free(head);
}
q_insert_head
, q_insert_tail
避免非必要的項目列表 (即 *
),使用清晰明確的漢語書寫。
false
false
strdup(s)
將 char *s
裡的字串複製給 node->value 指定的位置strdup
返回的值是否無效,若無效則釋放記憶體空間,並回傳 falselist.h
中的 list_add
插到鏈結頭部中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
q_insert_head
程式碼
bool q_insert_head(struct list_head *head, char *s)
{
if (!head)
return false;
element_t *node = malloc(sizeof(element_t));
if (!node)
return false;
node->value = strdup(s);
if (!node->value) {
free(node);
return false;
}
list_add(&node->list, head);
return true;
}
q_insert_tail
程式碼
/* Insert an element at tail of queue */
bool q_insert_tail(struct list_head *head, char *s)
{
if (!head)
return false;
element_t *node = malloc(sizeof(element_t));
if (!node)
return false;
node->value = strdup(s);
if (!node->value) {
+ free(node);
return false;
}
list_add_tail(&node->list, head);
return true;
}
q_remove_head
,q_remove_tail
NULL
list_first_entry
來取得 list 的第一個 elementsp==NULL && bufsize <= 0
則回傳 NULLstrncpy()
將 node->value
所指向的字串符複製到 sp
,最多複製bufsize個list_del
移除list中的節點q_remove_head
程式碼
/* Remove an element from head of queue */
element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
if (!head || list_empty(head))
return NULL;
element_t *node = list_first_entry(head, element_t, list);
if (!sp && bufsize <= 0)
return NULL;
- strncpy(sp, node->value, bufsize);
+ strncpy(sp, node->value, bufsize - 1);
+ sp[bufsize - 1] = 0;
list_del(&node->list);
return node;
}
q_remove_tail
程式碼
/* Remove an element from tail of queue */
element_t *q_remove_tail(struct list_head *head, char *sp, size_t bufsize)
{
if (!head || list_empty(head))
return NULL;
element_t *node = list_last_entry(head, element_t, list);
if (!sp && bufsize <= 0)
return NULL;
- strncpy(sp, node->value, bufsize);
+ strncpy(sp, node->value, bufsize - 1);
+ sp[bufsize - 1] = 0;
list_del(&node->list);
return node;
}
更動:
在字串的結尾補上 \0
表示結束
list_for_each_entry
從 head 開始訪問每個鏈節,每經過一個總數加一程式碼
/* Return number of elements in queue */
int q_size(struct list_head *head)
{
int sum = 0;
- if (!head)
+ if (!head || list_empty(head));
return 0;
- element_t *node = malloc(sizeof(element_t));
+ element_t *node;
list_for_each_entry (node, head, list) {
sum++;
}
return sum;
}
更動:
此部份更動已寫在後續章節「使用 Valgrind 分析記憶體問題」中
false
list_entry
由慢指標得知結構體,再用 q_release_element
進行記憶體的釋放程式碼
/* Delete the middle node in queue */
if (!head || list_empty(head))
return false;
- struct list_head *prev = head->prev;
- head->prev->next = NULL;
struct list_head *fast = head->next;
struct list_head *slow = head->next;
- while (fast != NULL && fast->next != NULL) {
+ while (fast != head && fast->next != head) {
fast = fast->next->next;
slow = slow->next;
}
list_del(slow); // prev->next = slow->next;
q_release_element(list_entry(slow, element_t, list));
- head->prev = prev;
- prev->next = head;
return true;
}
更動 1:
當初在設計時忘記是個循環的雙向鏈結串列,因此將鏈結串列的循環給斷開,再最後才把循環接起來。
更動 2:
修改自 jimmy01240397
的建議,只須將迴圈條件設置成當 fast
指標指向 head
即可。
list_for_each_entry_safe
訪問每個節點,並紀錄下個節點跟當下的節點的字串進行比較flag
作記號來判斷是否刪除程式碼
/* Delete all nodes that have duplicate string */
bool q_delete_dup(struct list_head *head)
{
// https://leetcode.com/problems/remove-duplicates-from-sorted-list-ii/
if (!head || list_empty(head))
return false;
bool flag = false;
+ bool now_del = false;
element_t *entry, *safe;
list_for_each_entry_safe (entry, safe, head, list) {
- if (entry->list.next != head &&
- strcmp(entry->value, safe->value) == 0) {
+ now_del = entry->list.next != head && strcmp(entry->value, safe->value) == 0;
+ if(now_del || flag)
+ {
list_del(&entry->list);
q_release_element(entry);
- flag = true;
- } else if (flag) {
- list_del(&entry->list);
- q_release_element(entry);
- flag = false;
+ flag = now_del;
}
}
return true;
}
修改自 jimmy01240397
的建議,減少沒必要的分支以精簡程式碼。
list_move
將 cur
的下個節點移至 newhead
的下個節點來進行 swap ,再將 newhead
設為 cur
來進行下組的 swap程式碼
/* Swap every two adjacent nodes */
void q_swap(struct list_head *head)
{
// https://leetcode.com/problems/swap-nodes-in-pairs/
- if (head == NULL)
+ if (!head || list_empty(head))
return;
- struct list_head *cur = head->next;
- struct list_head *newhead = cur;
+ struct list_head *newhead = head, *cur = newhead->next;
- for (; cur != head && cur->next != head;
- cur = cur->next, newhead = newhead->next->next) {
- list_move(cur, newhead->next);
- }
+ for (; cur != head && cur->next != head; newhead = cur, cur = cur->next) {
+ list_move(cur->next, newhead);
+ }
}
更動:
參考 chiangkd,更改swap實做使程式更直觀。
list_for_each_safe
可以訪問每個節點的同時保留下個節點並進行 reverselist_for_each_safe
定義上的關係 node != head
,因此必須另外定義head的 next
及 prev
程式碼
void q_reverse(struct list_head *head)
{
if (!head || head->next == NULL)
return;
struct list_head *node, *safe;
list_for_each_safe (node, safe, head) {
node->next = node->prev;
node->prev = safe;
};
safe = head->next;
head->next = head->prev;
head->prev;
}
待改進:
使用 list_move
使程式碼更簡潔易讀
中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
與 q_swap
實作相似,利用 list_move
可以用更簡潔的方式表示。以 k 個節點為一組進行 reverse ,在將 newhead
變更為前一組的最後一個節點。
list_head
的指標 curr
及 newhead
指向 head
curr->next
每次會被移除到newhead
的下個節點,經過(k-1)次的循環後curr
會間接的變成尾巴的節點。curr
變為新的頭節點程式碼:
/* Reverse the nodes of the list k at a time */
void q_reverseK(struct list_head *head, int k)
{
// https://leetcode.com/problems/reverse-nodes-in-k-group/
if (!head || list_empty(head))
return;
struct list_head *newhead = head, *curr = newhead->next;
int times = q_size(head) / k;
for (int i = 0; i < times; i++) {
for (int j = 1; j < k; j++) {
list_move(curr->next, newhead);
}
newhead = curr;
+ curr = newhead->next;
}
}
參考你所不知道的 C 語言: linked list 和非連續記憶體和學長 wanghanchi 中合併兩個鏈節串列並作排序的方法。
避免非必要的項目縮排 (即 *
和 -
),以清晰、明確,且流暢的漢語書寫。
mergesort
list_head
q_merge_two
q_merge_two
list1
與list2
字串大小list1
或 list2
走到盡頭,剩餘的一方將會接續排序在尾巴q_sort
head->next
的值帶入 mergesort
函式prev
已亂掉,因此重新定義,並把環狀結構接回struct list_head *mergesort(struct list_head *l)
{
if (!l || l->next == NULL)
return l;
struct list_head *fast = l, *slow = l;
while (fast->next != NULL && fast->next->next != NULL) {
fast = fast->next->next;
slow = slow->next;
}
struct list_head *mid = slow->next;
slow->next = NULL;
return (q_merge_two(mergesort(l), mergesort(mid)));
}
/*Merge the two lists in a one sorted list*/
struct list_head *q_merge_two(struct list_head *list1, struct list_head *list2)
{
struct list_head *new_head = NULL;
struct list_head **ptr = &new_head;
- element_t *list1_entry = list_entry(list1, element_t, list);
- element_t *list2_entry = list_entry(list2, element_t, list);
for (struct list_head **node = NULL; list1 && list2;
+ *node = (*node)->next) {
+ node = strcmp(list_entry(list1, element_t, list)->value,
+ list_entry(list2, element_t, list)->value) >= 0
+ ? &list2
+ : &list1;
*ptr = *node;
ptr = &(*ptr)->next;
}
*ptr = (struct list_head *) ((uint64_t) list1 | (uint64_t) list2);
return new_head;
}
更動:
在我執行trace-04-ops
時 sort 總是無法順利進行,輸出如下:
$ ./qtest -v 3 -f traces/trace-04-ops.cmd
# Test of insert_head, insert_tail, size, swap, and sort
l = []
l = [gerbil]
l = [bear gerbil]
l = [dolphin bear gerbil]
l = [bear dolphin gerbil]
Removed bear from queue
l = [dolphin gerbil]
l = [bear dolphin gerbil]
Queue size = 3
l = [bear dolphin gerbil]
l = [bear dolphin gerbil meerkat]
l = [bear dolphin gerbil meerkat bear]
l = [bear dolphin gerbil meerkat bear gerbil]
Queue size = 6
l = [bear dolphin gerbil meerkat bear gerbil]
ERROR: Not sorted in ascending order
l = [bear meerkat gerbil bear dolphin gerbil]
Removed bear from queue
l = [meerkat gerbil bear dolphin gerbil]
Removed meerkat from queue
l = [gerbil bear dolphin gerbil]
Removed gerbil from queue
l = [bear dolphin gerbil]
Removed bear from queue
l = [dolphin gerbil]
Queue size = 2
l = [dolphin gerbil]
Freeing queue
此時我才發現佇列並不會跟著鏈結串列一起訪問節點,因此把 list_entry
帶入迴圈跟著存取每個節點。
access 的翻譯是「存取」,而非「訪問」(visit)
/* Sort elements of queue in ascending/descending order */
void q_sort(struct list_head *head, bool descend)
{
if (head == NULL || list_empty(head))
return;
head->prev->next = NULL;
head->next = mergesort(head->next);
struct list_head *cur = head;
struct list_head *next = head->next;
while (next) {
next->prev = cur;
cur = next;
next = next->next;
}
cur->next = head;
head->prev = cur;
}
題目說明 : 2487. Remove Nodes From Linked List
參考 wanghanchi 的解法,在單向鏈節串列中可以使用reverse
來更加簡潔的表達,在環狀雙向鏈節串列中使用prev
就能更輕易的解決。
head
為無效或為空的鏈節串列,回傳零list_head
的指標 curr
指向 head->prev
及 prev
指向 curr->prev
list_entry
得到佇列裡的資訊strcmp()
進行比較,若 curr_entry
的字串大於 prev_entry
,則刪除 prev
節點,反之使 curr
移動到 prev
所在節點程式碼
int q_descend(struct list_head *head)
{
// https://leetcode.com/problems/remove-nodes-from-linked-list/
if (!head || head->next == NULL)
return 0;
- struct list_head *curr = head->prev, *prev = curr->prev;
+ struct list_head *curr = head->prev;
- element_t *curr_entry = list_entry(curr, element_t, list);
- element_t *prev_entry = list_entry(prev, element_t, list);
- while (prev != head) {
+ while (curr != head && curr->prev != head) {
+ element_t *curr_entry = list_entry(curr, element_t, list);
+ element_t *prev_entry = list_entry(prev, element_t, list);
if (strcmp(curr_entry->value, prev_entry->value) > 0) {
- list_del(prev);
+ list_del(&prev_entry->list);
q_release_element(prev_entry);
} else {
- curr = prev;
+ curr = curr->prev;
}
}
return q_size(head);
}
更動:
list_head
的指標使用,使程式碼更精簡list_entry
的位置,使每次迴圈都能訪問到佇列位置參考 chiacyu
避免非必要的項目縮排 (即 *
和 -
),以清晰、明確,且流暢的漢語書寫。
檢查 head 是否為空或無效
新增一個 list_head
名為 temp
使用 list_for_each_entry_safe
訪問每個佇列,
用 list_splice_init
將每個節點移動到 temp
的鏈結串列,並將佇列 curr->q
的頭指標初始化為空佇列
用 q_sort
進行排序
使用 list_splice_init
將排序後的 temp
鏈結串列合併到 head
鏈結串列的第一個隊列中
回傳合併後的節點數量
/* Merge all the queues into one sorted queue, which is in ascending/descending
* order */
int q_merge(struct list_head *head, bool descend)
{
// https://leetcode.com/problems/merge-k-sorted-lists/
if(!head || list_empty(head)){
return 0;
}
struct list_head temp;
INIT_LIST_HEAD(&temp);
queue_contex_t *curr, *safe;
list_for_each_entry_safe(curr, safe, head, chain){
list_splice_init(curr->q, &temp);
}
q_sort(&temp,false);
list_splice_init(&temp, list_first_entry(head, queue_contex_t, chain)->q);
return q_size(head);
}
:::info
和 :::spoiler
,讓行文清晰且流暢。$ make test
scripts/driver.py -c
...
+++ TESTING trace trace-17-complexity:
# Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
ERROR: Probably not constant time or wrong implementation
--- trace-17-complexity 0/5
--- TOTAL 95/100
作業要求:論文〈Dude, is my code constant time?〉重點提示
論文相關筆記紀錄 : 筆記
LAB0-C/dudect
dudect/constant.c
measure
主要測試 insert
和 remove
功能是否正常。
其測試的方法為用佇列的大小判斷,下方程式碼從 case DUT(insert_head)
中擷取。
int before_size = q_size(l);
before_ticks[i] = cpucycles();
dut_insert_head(s, 1);
after_ticks[i] = cpucycles();
int after_size = q_size(l);
dut_free();
if (before_size != after_size - 1)
return false;
dudect/ttest.c
先看結構體
typedef struct {
double mean[2];
double m2[2];
double n[2];
} t_context_t;
t_init
主要用作初始化 t_context_t
這個結構體
void t_init(t_context_t *ctx)
{
for (int class = 0; class < 2; class ++) {
ctx->mean[class] = 0.0;
ctx->m2[class] = 0.0;
ctx->n[class] = 0.0;
}
return;
}
t_push
看似用來確認穩定狀態
void t_push(t_context_t *ctx, double x, uint8_t class)
{
assert(class == 0 || class == 1);
ctx->n[class]++;
/* Welford method for computing online variance
* in a numerically stable way.
*/
double delta = x - ctx->mean[class];
ctx->mean[class] = ctx->mean[class] + delta / ctx->n[class];
ctx->m2[class] = ctx->m2[class] + delta * (x - ctx->mean[class]);
}
t_compute
下方程式碼為 Welch's test 公式 :
double t_compute(t_context_t *ctx)
{
double var[2] = {0.0, 0.0};
var[0] = ctx->m2[0] / (ctx->n[0] - 1);
var[1] = ctx->m2[1] / (ctx->n[1] - 1);
double num = (ctx->mean[0] - ctx->mean[1]);
double den = sqrt(var[0] / ctx->n[0] + var[1] / ctx->n[1]);
double t_value = num / den;
return t_value;
}
dudect/fixture.c
/* Interface to test if function is constant */
#define _(x) bool is_##x##_const(void);
DUT_FUNCS
#undef _
is_##x##_const
函式由前置處理器 concatenation 處理,當中 DUT_FUNCS
包含 insert_head
、 insert_tail
、remove_head
、remove_tail
相關知識可以參考 你所不知道的 C 語言:前置處理器應用篇
differentiate()
主要用作計算執行時間
for (size_t i = 0; i < N_MEASURES; i++)
exec_times[i] = after_ticks[i] - before_ticks[i];
report()
用作判斷是否為 constant time
/* Definitely not constant time */
if (max_t > t_threshold_bananas)
return false;
/* Probably not constant time. */
if (max_t > t_threshold_moderate)
return false;
/* For the moment, maybe constant time. */
return true;
doit()
ret
接收 measure
測試 insert
、 remove
是否都正常的布林值。
update_statistics
接收來自 differentiate
的執行時間結果和 classes。
最後 ret
檢查是否為 constant time 。
bool ret = measure(before_ticks, after_ticks, input_data, mode);
differentiate(exec_times, before_ticks, after_ticks);
update_statistics(exec_times, classes);
ret &= report();
...
return ret;
在 traces/trace-17-complexity.cmd
中看到第一行為 option simulation 1
,可見 simulation 為檢驗時間複雜度的開關,因此先來了解 simulation 與 dudect 的關聯。
在 qtest.c
中看到這段
if (simulation) {
if (argc != 1) {
report(1, "%s does not need arguments in simulation mode", argv[0]);
return false;
}
bool ok =
pos == POS_TAIL ? is_insert_tail_const() : is_insert_head_const();
if (!ok) {
report(1,
"ERROR: Probably not constant time or wrong implementation");
return false;
}
report(1, "Probably constant time");
return ok;
程式碼更新於 commit : bdcfae3
用 pos == POS_TAIL ? is_insert_tail_const()
可以用 pos
選擇要插入的位置。
從上述程式碼中看到似乎是用 is_insert_head(tail)_const
去回傳是否為 constant time 。
static bool test_const(char *text, int mode)
{
bool result = false;
t = malloc(sizeof(t_context_t));
for (int cnt = 0; cnt < TEST_TRIES; ++cnt) {
printf("Testing %s...(%d/%d)\n\n", text, cnt, TEST_TRIES);
init_once();
for (int i = 0; i < ENOUGH_MEASURE / (N_MEASURES - DROP_SIZE * 2) + 1;
++i)
result = doit(mode);
printf("\033[A\033[2K\033[A\033[2K");
if (result)
break;
}
free(t);
return result;
}
預設測試 TEST_TRIES
次,每次測試時先呼叫 init_once()
對 t_context_t
結構體進行初始化,接著進入 doit
,由 prepare_inputs
輸出提供的值 ,經由 measure
測試功能是否正常及 differentiate
得到執行時間,再將計算得到的執行時間、 classes 輸入給 update_statistics
,若函式都成功執行且為 constant time ( report()
回傳 true
) 則 ret
回傳 true
。
LAB0-C/dudect
達到時間常數在 LAB0-C/dudect
中相較原程式碼 dudect.h 少了 percentile()
功能,即少了裁切測量值。
因此將原始碼的 percentile 加入 LAB0-C/dudect
中並修改,最後終於讓我看到了卡比之星。
修改內容 commit d04d313
讀了一遍這篇論文,對於完全沒有統計基礎的人來說十分挫折,我並不能完全理解裡面分析及理論想表達的內容,反而是透過看同學們的筆記一點一點紀錄才大致了解實作原理,後續有時間再回來複習。
詳細內容可以參考我的 commit:d0493b2 和 commit:c091ea3
設計邏輯遵循 Fisher–Yates shuffle 裡的 The modern algorithm
下方表格為實作概念:
Times | Range | Roll | Scratch | Result |
---|---|---|---|---|
0 | 1 2 3 4 5 6 7 | |||
1 | 0~6 | 2 | 1 2 7 4 5 6 | 3 |
2 | 0~5 | 5 | 1 2 7 4 5 | 6 3 |
3 | 0~4 | 3 | 1 2 7 5 | 4 6 3 |
4 | 0~3 | 1 | 1 5 7 | 2 4 6 3 |
5 | 0~2 | 1 | 1 7 | 5 2 4 6 3 |
6 | 0~1 | 1 | 1 | 7 5 2 4 6 3 |
7 | 0~0 | 0 | 1 7 5 2 4 6 3 |
在 qtest
中輸出結果
cmd> new
l = []
cmd> it 1
l = [1]
cmd> it 2
l = [1 2]
cmd> it 3
l = [1 2 3]
cmd> it 4
l = [1 2 3 4]
cmd> it 5
l = [1 2 3 4 5]
cmd> it 6
l = [1 2 3 4 5 6]
cmd> it 7
l = [1 2 3 4 5 6 7]
cmd> shuffle
l = [5 6 3 2 1 7 4]
接著導入 M01: lab0 所提供的驗證程式輸出
Expectation: 41666
Observation: {'1234': 41792, '1243': 41748, '1324': 41661, '1342': 41497, '1423': 41519, '1432': 41770, '2134': 41175, '2143': 41992, '2314': 41550, '2341': 41643, '2413': 41633, '2431': 41949, '3124': 41780, '3142': 41597, '3214': 41703, '3241': 41754, '3412': 41483, '3421': 41258, '4123': 41648, '4132': 41651, '4213': 41983, '4231': 41864, '4312': 41753, '4321': 41597}
chi square sum: 21.73297172754764
shuffle 的頻率是大致符合 Uniform distribution
由於我在 $make check
總是出現有 ERROR: Freed queue, but 2 blocks are still allocated
。
$ make check
./qtest -v 3 -f traces/trace-eg.cmd
# Demonstration of queue testing framework
# Use help command to see list of commands and options
# Initial queue is NULL.
l = NULL
# Create empty queue
l = []
# See how long it is
Queue size = 0
l = []
# Fill it with some values. First at the head
l = [dolphin]
l = [bear dolphin]
l = [gerbil bear dolphin]
# Now at the tail
l = [gerbil bear dolphin meerkat]
l = [gerbil bear dolphin meerkat bear]
# Reverse it
l = [bear meerkat dolphin bear gerbil]
# See how long it is
Queue size = 5
l = [bear meerkat dolphin bear gerbil]
# Delete queue. Goes back to a NULL queue.
l = NULL
ERROR: There is no queue, but 2 blocks are still allocated
# Exit program
Freeing queue
ERROR: Freed queue, but 2 blocks are still allocated
make: *** [Makefile:57:check] 錯誤 1
一開始看完以為問題出在 q_free
上,因此花費了大量時間反覆修改測試,直到我看了 以 Valgrind 分析記憶體問題 才開始使用到工具分析計體資訊,以下為輸出結果。
$ valgrind ./qtest < traces/trace-eg.cmd
l = NULL
l = []
Queue size = 0
l = []
==26769== Conditional jump or move depends on uninitialised value(s)
==26769== at 0x484ED28: strlen (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769== by 0x10DC6B: strsave_or_fail (report.c:254)
==26769== by 0x10E551: parse_args (console.c:152)
==26769== by 0x10E551: interpret_cmd (console.c:200)
==26769== by 0x10F1F0: run_console (console.c:691)
==26769== by 0x10D3C3: main (qtest.c:1258)
==26769==
==26769== Conditional jump or move depends on uninitialised value(s)
==26769== at 0x484F02A: strncpy (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769== by 0x10DCE5: strncpy (string_fortified.h:95)
==26769== by 0x10DCE5: strsave_or_fail (report.c:266)
==26769== by 0x10E551: parse_args (console.c:152)
==26769== by 0x10E551: interpret_cmd (console.c:200)
==26769== by 0x10F1F0: run_console (console.c:691)
==26769== by 0x10D3C3: main (qtest.c:1258)
==26769==
l = [dolphin]
l = [bear dolphin]
l = [gerbil bear dolphin]
l = [gerbil bear dolphin meerkat]
l = [gerbil bear dolphin meerkat bear]
l = [bear meerkat dolphin bear gerbil]
Queue size = 5
l = [bear meerkat dolphin bear gerbil]
l = NULL
ERROR: There is no queue, but 2 blocks are still allocated
Freeing queue
ERROR: Freed queue, but 2 blocks are still allocated
==26769== 128 bytes in 2 blocks are still reachable in loss record 1 of 1
==26769== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==26769== by 0x10F2E7: test_malloc (harness.c:133)
==26769== by 0x10F90F: q_size (queue.c:104)
==26769== by 0x10B60F: do_size (qtest.c:560)
==26769== by 0x10DFD1: interpret_cmda (console.c:181)
==26769== by 0x10E586: interpret_cmd (console.c:201)
==26769== by 0x10F1F0: run_console (console.c:691)
==26769== by 0x10D3C3: main (qtest.c:1258)
==26769==
這時我才發現可能是我的 q_size
造成的問題,以下是我當時的 q_size
及更改後:
/* Return number of elements in queue */
int q_size(struct list_head *head)
{
int sum = 0;
- if (!head)
+ if (!head || list_empty(head));
return 0;
- element_t *node = malloc(sizeof(element_t));
+ element_t *node;
list_for_each_entry (node, head, list) {
sum++;
}
return sum;
}
原來我這時在設計 q_size
時自作聰明的多幫 node
設置 malloc ,下次在設計時更應該思考配置記憶體的必要。
這問題現在看似很簡單,但在當時 $ make check
的情況看來,我只會認為一定是我的 q_free
在某個地方出錯,我更應該善用工具來分析問題。
感謝 RinHizakura, chiangkd, Risheng
不該只有致謝,你的洞見呢?
__attribute
可以設置函式屬性(Function Attribute)、變量屬性(Variable Attribute)和類型屬性(Type Attribute)。
其中函式屬性可以幫助開發者把一些特性添加到函數聲明中,從而可以使編譯器在錯誤檢查方面的功能更強大。
__attribute__
機制也很容易同非GNU應用程序> 做到兼容之功效。
為何不查閱 GCC 手冊呢?
__attribute__((nonnull(2,3)))
其中 2、3 代表地二、三個引數不能為空
參考 __attribute__((nonnull)) function attribute
This function attribute specifies function parameters that are not supposed to be null pointers. This enables the compiler to generate a warning on encountering such a parameter
__attribute__((nonnull(2,3,4)))
static struct list_head *merge(void *priv, list_cmp_func_t cmp,
struct list_head *a, struct list_head *b)
{
struct list_head *head, **tail = &head;
for (;;) {
/* if equal, take 'a' -- important for sort stability */
if (cmp(priv, a, b) <= 0) {
*tail = a;
tail = &a->next;
a = a->next;
if (!a) {
*tail = b;
break;
}
} else {
*tail = b;
tail = &b->next;
b = b->next;
if (!b) {
*tail = a;
break;
}
}
}
return head;
}
先定義函式第 2、3、4 不能為 NULL
在 linux/include/linux/list_sort.h 中找到 cmp 的 typedef
typedef int __attribute__((nonnull(2,3))) (*list_cmp_func_t)(void *,
const struct list_head *, const struct list_head *);
可以確定 cmp
是函數指標,且回傳 int
,參數分別為 void *
、兩個 struct list_head *
cmp
> 0 ,即為 a
在 b
之後cmp
<= 0 ,即為 a
在 a
之前因 list_sort 為 stable sort ,故不需特別探討小於和等於 0 的區別。
stable sort : Stable sorting algorithms maintain the relative order of records with equal keys (i.e. values). That is, a sorting algorithm is stable if whenever there are two records R and S with the same key and with R appearing before S in the original list, R will appear before S in the sorted list. See here for a more complete description.
ex :
排序前 : 3,5,19,1,3*,10
排序後 : 1,3,3*,5,10,19
兩個3、3*排序前後順序相同
* Combine final list merge with restoration of standard doubly-linked
* list structure. This approach duplicates code from merge(), but
* runs faster than the tidier alternatives of either a separate final
* prev-link restoration pass, or maintaining the prev links
* throughout.
與 merge
作法相似,差別在於最終連結 prev
回上一個節點,形成雙向環狀鏈結串列,在速度上比起每次 merge 後再接 prev
來的更快。
__attribute__((nonnull(2,3,4,5)))
static void merge_final(void *priv, list_cmp_func_t cmp, struct list_head *head,
struct list_head *a, struct list_head *b)
{
struct list_head *tail = head;
u8 count = 0;
for (;;) {
/* if equal, take 'a' -- important for sort stability */
if (cmp(priv, a, b) <= 0) {
tail->next = a;
a->prev = tail;
tail = a;
a = a->next;
if (!a)
break;
} else {
tail->next = b;
b->prev = tail;
tail = b;
b = b->next;
if (!b) {
b = a;
break;
}
}
}
/* Finish linking remainder of list b on to tail */
tail->next = b;
do {
/*
* If the merge is highly unbalanced (e.g. the input is
* already sorted), this loop may run many iterations.
* Continue callbacks to the client even though no
* element comparison is needed, so the client's cmp()
* routine can invoke cond_resched() periodically.
*/
if (unlikely(!++count))
cmp(priv, b, b);
b->prev = tail;
tail = b;
b = b->next;
} while (b);
/* And the final links to make a circular doubly-linked list */
tail->next = head;
head->prev = tail;
}
程式碼的註釋提到若合併是極度不平衡(像是輸入已經排序),則會經過多次的迭代,儘管這些迭代不需要。因此 cmp
可以週期性的呼叫 cond_resched()
。
查閱 Linux 核心原始程式碼裡頭的文件和程式碼註解,上述描述不精確。
接著討論 likely
與 unlikely
,在 linux/include/linux/compiler.h 可以找到
#define likely_notrace(x) __builtin_expect(!!(x), 1)
#define unlikely_notrace(x) __builtin_expect(!!(x), 0)
__built_expect()
是 gcc 的內建 function ,用來將 branch 的相關資訊提供給 compiler ,告訴 compiler 設計者期望的比較結果,以便對程式碼改變分支順序來進行優化。
!!(x)
用來確保值一定是 0 或 1!!(x)
就無法確保值一定是0或1if (likely(x)) {
/* execute when x is true */
}
else {
/* execute when x is false */
}
x==ture
的對應執行程式碼接在判斷後面x==false
的對應執行程式碼接在判斷後面參考自 chiangkd 提供的參考資料 likely / unlikely macro introduction
在看程式碼之前先來看註解的說明 :
* The comparison function @cmp must return > 0 if @a should sort after
* @b ("@a > @b" if you want an ascending sort), and <= 0 if @a should
* sort before @b *or* their original order should be preserved. It is
* always called with the element that came first in the input in @a,
* and list_sort is a stable sort, so it is not necessary to distinguish
* the @a < @b and @a == @b cases.
*
* This is compatible with two styles of @cmp function:
* - The traditional style which returns <0 / =0 / >0, or
* - Returning a boolean 0/1.
* The latter offers a chance to save a few cycles in the comparison
* (which is used by e.g. plug_ctx_cmp() in block/blk-mq.c).
*
* A good way to write a multi-word comparison is::
*
* if (a->high != b->high)
* return a->high > b->high;
* if (a->middle != b->middle)
* return a->middle > b->middle;
* return a->low > b->low;
若 a
需被排序在 b
之後,則 @cmp > 0
(@a > @b
),反之若 a
需被排序在 b
之前,則 @cmp <= 0
。list_sort 是 stable sort 因此在這裡我們不需特別探討小於和等於 0 的區別。
* This mergesort is as eager as possible while always performing at least
* 2:1 balanced merges. Given two pending sublists of size 2^k, they are
* merged to a size-2^(k+1) list as soon as we have 2^k following elements.
從這裡開始說明 list_sort 的實作方法,兩個要被 merge 的 list 始終保持 2 : 1 的大小,這可以降低比較次數。
相較 fully-eager bottom-up mergesort 只要出現兩個
避免過多的中英文混用,已有明確翻譯詞彙者,例如「鏈結串列」(linked list) 和「佇列」(queue),就使用該中文詞彙,英文則留給變數名稱、人名,或者缺乏通用翻譯詞彙的場景。
* Thus, it will avoid cache thrashing as long as 3*2^k elements can
* fit into the cache. Not quite as good as a fully-eager bottom-up
* mergesort, but it does use 0.2*n fewer comparisons, so is faster in
* the common case that everything fits into L1.
只要這 2 : 1 的 list (即
* The merging is controlled by "count", the number of elements in the
* pending lists. This is beautifully simple code, but rather subtle.
count
用來計算在 pending lists 的 element
數量 * Each time we increment "count", we set one bit (bit k) and clear
* bits k-1 .. 0. Each time this happens (except the very first time
* for each bit, when count increments to 2^k), we merge two lists of
* size 2^k into one list of size 2^(k+1).
count
增加時,第 k 位的 bit 設為 1 ,並清除 k-1 位的 bit 為 0不懂就說不懂,沒有「不太明白」這回事。
後續的註解說明看不懂,因此直接參考 kdnvt 的確保 2 : 1 的實作方法中有非常清楚的說明
引述其的表格 :
count decimal | count binary | merge | before merge | after merge |
---|---|---|---|---|
0 | 0000 | No | NULL | X |
1 | 0001 | No | 1 | X |
2 | 0010 | Yes | 1-1 | 2 |
3 | 0011 | No | 1-2 | X |
4 | 0100 | Yes | 1-1-2 | 2-2 |
5 | 0101 | Yes | 1-2-2 | 1-4 |
6 | 0110 | Yes | 1-1-4 | 2-4 |
7 | 0111 | No | 1-2-4 | X |
8 | 1000 | Yes | 1-1-2-4 | 2-2-4 |
9 | 1001 | Yes | 1-2-2-4 | 1-4-4 |
10 | 1010 | Yes | 1-1-4-4 | 2-4-4 |
11 | 1011 | Yes | 1-2-4-4 | 1-2-8 |
12 | 1100 | Yes | 1-1-2-8 | 2-2-8 |
13 | 1101 | Yes | 1-2-2-8 | 1-4-8 |
14 | 1110 | Yes | 1-1-4-8 | 2-4-8 |
15 | 1111 | No | 1-2-4-8 | X |
16 | 10000 | Yes | 1-1-2-4-8 | 2-2-4-8 |
list_sort 實作 :
do {
size_t bits;
struct list_head **tail = &pending;
/* Find the least-significant clear bit in count */
for (bits = count; bits & 1; bits >>= 1)
tail = &(*tail)->prev;
/* Do the indicated merge */
if (likely(bits)) {
struct list_head *a = *tail, *b = a->prev;
a = merge(priv, cmp, b, a);
/* Install the merged result in place of the inputs */
a->prev = b->prev;
*tail = a;
}
/* Move one element from input list to pending */
list->prev = pending;
pending = list;
list = list->next;
pending->next = NULL;
count++;
} while (list);
count = 0
初始狀態如下圖所示,這邊假設一共有 5 個節點,其中 node1
為 head
此時 bit
為 0 因此不進入 for()
迴圈也不進行 merge
count = 1
此時 bits
為 1 經過一次 for
迴圈 tail
指向 node2
的 prev
,不進行 merge
。
count = 2
此時 bits
為 for
迴圈,但進行 merge
此時 a(node3)
跟 b(node2)
已經合併完成,合併後用 node2,3
來表示
count = 3
count = 3 = for
迴圈,此時 tail
指向 node2,3
的 prev
,因 bit
為 0 故此階段不進行合併。
count = 4
count
為 bits
不為 0 進行合併
此時 a(node5)
跟 b(node4)
已經合併完成,合併後用 node4,5 來表示
上述節點都已被加到 pending list ,接著將所有的 pending list 合併在一起
/* End of input; merge together all the pending lists. */
list = pending;
pending = pending->prev;
for (;;) {
struct list_head *next = pending->prev;
if (!next)
break;
list = merge(priv, cmp, pending, list);
pending = next;
}
最後將 prev 重新接回變成雙向鏈結串列
/* The final merge, rebuilding prev links */
merge_final(priv, cmp, head, pending, list);
list_sort
加入到專案中LAB0_C
專案中list_sort.c
中用到的巨集加入到 list_sort.h
中likely
與 unlikely
,在 list_sort.h
中加入以下程式碼,可以在linux/include/linux/compiler.h中找到#define likely_notrace(x) __builtin_expect(!!(x), 1)
#define unlikely_notrace(x) __builtin_expect(!!(x), 0)
list_sort.c
中的 u8
型別#include <stdint.h>
typedef uint8_t u8;
Makefile
中的 OBJS 中新增 list_sort.o
OBJS := qtest.o report.o console.o harness.o queue.o \
random.o dudect/constant.o dudect/fixture.o dudect/ttest.o \
shannon_entropy.o \
linenoise.o web.o \
+ list_sort.o
qtest.c
include "list_sort.h"
list_sort
static int use_list_sort = 0;
cmp
static int cmp(void *priv, const struct list_head *a, const struct list_head *b)
{
return strcmp(list_entry(a, element_t, list)->value,
list_entry(b, element_t, list)->value);
}
此時參考到 Linux 核心的比較函式撰寫
/* `priv` is the test pointer so check() can fail the test if the list is invalid. */
static int cmp(void *priv, const struct list_head *a, const struct list_head *b)
{
struct debug_el *ela, *elb;
ela = container_of(a, struct debug_el, list);
elb = container_of(b, struct debug_el, list);
check(priv, ela, elb);
return ela->value - elb->value;
}
得知 priv
是個測試用的指標
list_sort
加入到 help 中,可以透過 option list_sort [true/false]
來啟用 sort
或 list_sort
add_param("list_sort", &use_list_sort,
"The list_sort function used within the Linux kernel.", NULL);
Options:
descend 0 | Sort and merge queue in ascending/descending order
echo 1 | Do/don't echo commands
entropy 0 | Show/Hide Shannon entropy
error 5 | Number of errors until exit
fail 30 | Number of times allow queue operations to return false
length 1024 | Maximum length of displayed string
list_sort 0 | The list_sort function used within the Linux kernel.
malloc 0 | Malloc failure probability percent
simulation 0 | Start/Stop simulation mode
verbose 4 | Verbosity level
cmd>
trace-sort.cmd
將其放入 trace 目錄中# Testing the efficiency of list_sort and sort.
# You can modify the option listsort and the RAND to meet your need
option list_sort 1
new
it RAND 1000000
time
sort
time
輸入
$ ./qtest -f traces/trace-sort.cmd
即可得到 list_sort
或 sort
的處理時間
參考 perf 安裝
輸入:
$ cat /proc/sys/kernel/perf_event_paranoid
kernel.perf_event_paranoid 的會是 4
輸入:
$ sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"
-1
可以將權限全開
list_sort
與 q_sort
效率分析使用 Metric prefix,而非「萬」
資料數 | q_sort | list_sort |
---|---|---|
100k | 0.066 | 0.039 |
200k | 0.167 | 0.128 |
300k | 0.290 | 0.202 |
400k | 0.414 | 0.290 |
500k | 0.547 | 0.387 |
600k | 0.687 | 0.479 |
700k | 0.816 | 0.579 |
800k | 0.951 | 0.682 |
900k | 1.102 | 0.753 |
1000k | 1.255 | 0.856 |
使用 gnuplot 製圖工具畫成折線圖方便觀察
從圖看出當資料量越大 list_sort
的處理效率明顯更好
接著使用 perf
工具分析更詳細的內容
輸入
$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./qtest -f traces/trace-sort.cmd
不要急著比較程式效能,你該思考:
謝謝老師的指教,我會在更進一步的分析
jeremy
重複 5 次該程序 ,並顯示每個 event 的變化區間,接著分析對一百萬的資料做 sort ,並分析 cache-misses
、 cache-references
、instructions
、 cycles
。
得到以下結果
q_sort
Performance counter stats for './qtest -f traces/trace-sort.cmd' (5 runs):
60,203,429 cache-misses # 65.85% of all cache refs ( +- 1.23% )
91,423,384 cache-references ( +- 0.20% )
4,732,222,486 instructions # 0.64 insn per cycle ( +- 0.01% )
7,353,673,626 cycles ( +- 0.71% )
1.7575 +- 0.0174 seconds time elapsed ( +- 0.99% )
list_sort
Performance counter stats for './qtest -f traces/trace-sort.cmd' (5 runs):
47,561,816 cache-misses # 66.22% of all cache refs ( +- 0.17% )
71,827,278 cache-references ( +- 0.16% )
4,693,241,118 instructions # 0.84 insn per cycle ( +- 0.03% )
5,615,783,740 cycles ( +- 0.42% )
1.32896 +- 0.00571 seconds time elapsed ( +- 0.43% )
測試內容 | q_sort | list_sort |
---|---|---|
cache-misses | 60,203,429 | 47,561,816 |
cache-references | 91,423,384 | 71,827,278 |
instructions | 4,732,222,486 | 4,693,241,118 |
cycles | 7,353,673,626 | 5,615,783,740 |
由以上資料分析得知:
list_sort
相比 q_sort
少了 21% 的 cache-misses
q_sort
快了約 31%中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
留意各式細節,唯有重視小處並步步為營,方可挑戰原始程式碼規模超越三千七百萬行的 Linux 核心
收到
jeremy
目前只是原封不動的加入到 lab0
後續再將其修改與解讀
在 console.c
中加入
ADD_COMMAND(ttt, "Play Tic Tac Toe", "");
和 do_ttt
函式
static bool do_ttt(int argc, char *argv[])
{
/* .... */
}
更多合併過程可以參考我的
在合併的過程中遇到的問題:Cppcheck: Null pointer dereference
zobrist.c:63:21: error: Null pointer dereference: (struct zobrist_entry_t*)0 [nullPointer]
entry = hlist_entry(hash_table[i].first, zobrist_entry_t, ht_list);
^
Fail to pass static analysis.
探討:
hlist_entry
hlist_entry
為 container_of
的包裝container_of
中有這段__typeof__(((type *) 0)->member)
(type *)0
將 0 轉型成 null pointer
,是為了取得 member 的 type 。
因此修改 pre-commit.hook
,在 CPPCHECK_suppresses
中加入這段
+ --suppress=nullPointer:zobrist.c \
在 jasperlin1996 的筆記中也有遇到相同問題,且探討的更深入。
新增電腦對決電腦的功能至 qtest 中
./qtest
cmd> option ai_vs_ai 1
cmd> ttt
1 | × ×
2 | × ○
3 | ○ × ○
4 | ○
---+-------------
A B C D
O won!
Moves: B2 -> C2 -> C3 -> D3 -> B1 -> B3 -> D1 -> A4
問題:
每次執行的結果都相同,應增加更多隨機性使其結果有更多變化
TODO:
在對弈的過程中,要在螢幕顯示當下的時間 (含秒數),並持續更新。
參考:Monte Carlo Tree Search Wiki / Monte Carlo Tree Search 解說
四步驟:
從根節點開始,由 UCB 判斷該往那的子節點進行移動,其中 UCB 將在最佳策略與探索新路徑做出平衡。
: 該 node 贏得的次數
: 該 node 總共進行次數
: Parent 的模擬次數
: 權重 (可以自行調整 c: )
總結:
此方程式不完全只走目前已知勝率最高的路線,而是在探索與最佳策略之間切換來達到平衡。
為何不建議在 Linux 核心裡使用浮點數運算 :
拖慢執行速度,還需要手動儲存/取回相關的 FPU 暫存器。
其中 FPU 造成之問題:Lazy FP state restore
來自 Linux 核心原始程式碼的 Load Average 處理機制 => linux/kernel/sched/loadavg.c
理解這定點數的乘冪花了我些時間了解,先從其註釋看起
* By exploiting the relation between the definition of the natural power
* function: x^n := x*x*...*x (x multiplied by itself for n times), and
* the binary encoding of numbers used by computers: n := \Sum n_i * 2^i,
* (where: n_i \elem {0, 1}, the binary vector representing n),
* we find: x^n := x^(\Sum n_i * 2^i) := \Prod x^(n_i * 2^i), which is
* of course trivially computable in O(log_2 n), the length of our binary
* vector.
寫成數學式:
例如 (
下面程式碼的目的為決定是否要將 n & 1
為 false
,故
if (n & 1) {
result *= x;
result += 1UL << (frac_bits - 1);
result >>= frac_bits;
}
n >>= 1;
到進行下次迴圈時
x *= x;
x += 1UL << (frac_bits - 1); //carry
x >>= frac_bits;
較大的縮放係數,可以使其數值範圍較大,但反之精度會較差,在這裡縮放係數表示為
在老師提供的 mcts
中 score
使用的是 double
,因此要將其改成定點數運算,使用到 stdint.h 中的 uint64_t
來取代 double
。
在 UCB 的計算中用到 log2 與 sqrt 的運算,需先將其改成使用定點數的運算。
log
參考< jujuegg >的實作使用泰勒級數展開來取得近似
sqrt
接個引入第三週測驗中的開方根函式
static uint64_t fixed_sqrt(uint64_t N)
{
uint64_t msb = 0;
uint64_t n = N;
while (n > 1) {
n >>= 1;
msb++;
}
uint64_t a = 1 << msb;
uint64_t result = 0;
while (a != 0) {
if ((result + a) * (result + a) <= N)
result += a;
a >>= 1;
}
return result;
}