Try   HackMD

2025q1 Homework1 (lab0)

contributed by < chensheep >

Reviewed by Andrushika

撰寫文件時的用詞正確性、語句通順度

根據老師撰寫的資訊科技詞彙翻譯

「實作」會是符合上述 "implement" 語意的譯詞,取「起」以「創作」的意思,而「製造」本質上是固定輸入並通常產生固定輸出的行為。

目前全篇都寫作「實做」,建議修改。

「閱讀 lib/list_sort.c」小節有錯字:

目標是 merge list a 與 list b,因為 merge 中不「維」 prev 指標

此外,在撰寫文件時,應該注意標點符號與斷句位置,方便閱讀。
q_merge 小節可以修正如下(使用 Claude 協助修正):

目前實現方式是依序從第 2 個 queue 開始,將所有的 node 都接到第 1 個 queue 的後方,最後再針對第 1 個 queue 做排序。
這個作法很直覺,但因為最後需要針對第 1 個 queue 做排序,目前使用的是 Merge sort,所以其時間複雜度為

O(nlogn)

新增 shuffle 命令處的數學統計

原文的句子:

從卡方分佈表找合適的 P value,

X2 為 24.762842054728754,因為 14.848 < 24.762842054728754 < 32.007,所以 P value 介於 0.9 和 0.1 之間。

此處使用的是右尾檢定,但推導過程不精準,若顯著水準為 alpha = 0.05,則應該直接使用

χ²< (即
24.763<35.172
)進行推導說明,而非找 0.9, 0.1 去進行比較。

作業書寫規範:

  • 無論標題和內文中,中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
  • 文字訊息 (尤其是程式執行結果) 請避免用圖片來表示,否則不好搜尋和分類
  • 共筆書寫請考慮到日後協作,避免過多的個人色彩,用詞儘量中性
  • 不要在筆記內加入 [TOC] : 筆記左上方已有 Table of Contents (TOC) 功能,不需要畫蛇添足
  • 不要變更預設的 CSS 也不要加入任何佈景主題: 這是「開發紀錄」,用於評分和接受同儕的檢閱
  • 在筆記中貼入程式碼時,避免非必要的行號,也就是該手動將 c=cpp= 變更為 ccpp。行號只在後續討論明確需要行號時,才要出現,否則維持精簡的展現。可留意「你所不知道的 C 語言: linked list 和非連續記憶體」裡頭程式碼展現的方式
  • HackMD 不是讓你張貼完整程式碼的地方,GitHub 才是!因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示),可附上對應 GitHub commit 的超連結,列出程式碼是為了「檢討」和「便於他人參與討論」
  • 留意科技詞彙的使用,請參見「資訊科技詞彙翻譯」及「詞彙對照表
  • 不要濫用 :::info, :::success, :::warning 等標示,儘量用清晰的文字書寫。:::danger 則僅限授課教師作為批注使用
  • 避免過多的中英文混用,已有明確翻譯詞彙者,例如「鏈結串列」(linked list) 和「佇列」(queue),就使用該中文詞彙,英文則留給變數名稱、人名,或者缺乏通用翻譯詞彙的場景
  • 在中文敘述中,使用全形標點符號,例如該用「,」,而非 ","。注意書名號的使用,即 ,非「小於」和「大於」符號
  • 避免使用不必要的 emoji 字元

開發環境

$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   6
  On-line CPU(s) list:    0-5
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i5-9500 CPU @ 3.00GHz
    CPU family:           6
    Model:                158
    Thread(s) per core:   1
    Core(s) per socket:   6
    Socket(s):            1
    Stepping:             10
    CPU(s) scaling MHz:   18%
    CPU max MHz:          4400.0000
    CPU min MHz:          800.0000
    BogoMIPS:             6000.00
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi
                           mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfm
                          on pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 mo
                          nitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic mov
                          be popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
                           epb pti ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjus
                          t bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsav
                          ec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_
                          clear flush_l1d arch_capabilities
Virtualization features:  
  Virtualization:         VT-x
Caches (sum of all):      
  L1d:                    192 KiB (6 instances)
  L1i:                    192 KiB (6 instances)
  L2:                     1.5 MiB (6 instances)
  L3:                     9 MiB (1 instance)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-5
Vulnerabilities:          
  Gather data sampling:   Mitigation; Microcode
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
  Mds:                    Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT disabled
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affecte
                          d; BHI Not affected
  Srbds:                  Mitigation; Microcode
  Tsx async abort:        Mitigation; TSX disabled

實作 queue.[ch]

q_new

參考以下實做

struct list_head *q_new()
{
    struct list_head *lh = malloc(sizeof(struct list_head));
    INIT_LIST_HEAD(lh);

    return lh;
}

malloc(3) — Linux manual page 中RETURN VALUE 提到

  • The malloc(), calloc(), realloc(), and reallocarray() functions
    return a pointer to the allocated memory, which is suitably
    aligned for any type that fits into the requested size or less.
    On error, these functions return NULL and set errno. Attempting
    to allocate more than PTRDIFF_MAX bytes is considered an error, as
    an object that large could cause later pointer subtraction to
    overflow.

加入檢查 malloc 函式回傳結果,當 malloc 發生錯誤時 q_new 回傳 NULL

struct list_head *q_new() { struct list_head *lh = malloc(sizeof(struct list_head)); if (!lh) { return NULL; } INIT_LIST_HEAD(lh); return lh; }

q_free

實做方式

void q_free(struct list_head *head) { element_t *e = NULL, *is = NULL; if (!head) return; list_for_each_entry_safe (e, is, head, list) { list_del(&e->list); q_release_element(e); } free(head); }

透過 qtest 程式測試,產生一個空的 queue 後釋放沒有錯誤訊息

$ ./qtest
cmd> new
l = []
cmd> free
l = NULL
cmd> 

但在實做完 q_insert_head,出現以下錯誤訊息

$ ./qtest 
cmd> new
l = []
cmd> ih 1
l = [1]
cmd> free
ERROR: Corruption detected in block with address 0x614c90fac170 when attempting to free it
l = NULL
cmd> 

首先 harness.c 程式碼找到打印相關訊息的部份
是由於 footer 與 MAGICFOOTER 不一致所導致

if (footer != MAGICFOOTER) { report_event(MSG_ERROR, "Corruption detected in block with address %p when " "attempting to free it", p); error_occurred = true; }

首先想到的作法 test_freealloc 打印出一些訊息,也確實觀察到 footer 被改動
但無法明確的發現在何時被更改,後來想到之前老師提到可以透過 gdb 工具除錯,步驟如下

  1. harness.c 檔案中的函式 alloc 加入訊息,印出 footer 記憶體位置,後續會用 gdb 追蹤這個位址
diff --git a/harness.c b/harness.c
index 4e02891..f2720db 100644
--- a/harness.c
+++ b/harness.c
@@ -161,6 +161,9 @@ static void *alloc(alloc_t alloc_type, size_t size)
     // cppcheck-suppress nullPointerRedundantCheck
     new_block->prev = NULL;
 
+    printf("new block p %p magic footer %zx %p\n",
+        p, *find_footer(new_block), &(*find_footer(new_block)));
+
     if (allocated)
         allocated->prev = new_block;
     allocated = new_block;
  1. 執行 qtest 程式
$./qtest
  1. 開啟另外一個 terminal 查詢執行中 qtest 的 PID 並 attach 到 gdb
$ ps -aux |grep qtest
$ sudo gdb -p <PID>
  1. 於 gdb 中設定 breakpoint,這邊將 breakpoint 設定在 harness.caloc 的最後一行
(gdb) break harness.c:173
Breakpoint 1 at 0x6068a22dba01: file harness.c, line 173.
(gdb) continue 
Continuing.
  1. qtest 運行的 terminal 執行
cmd> new
new block p 0x6096d0f6f2c0 magic footer beefdead 0x5dc240b922d0
l = []
cmd> it 1
new block p 0x6096d0f6f710 magic footer beefdead 0x5dc240b92720
new block p 0x6096d0f6f750 magic footer beefdead 0x5dc240b92752
  1. gdb terminal 設定觀察 footer 記憶位置
(gdb) awatch *(0x5dc240b92720)
(gdb) continue
  1. qtest 運行的 terminal 執行
cmd> free
  1. gdb 可以觀察到錯誤訊息,可以透過 bt 打印出 stack
Hardware access (read/write) watchpoint 2: *(0x5dc240b92720)

Old value = -1091576147
New value = 1085874880
list_add (head=0x5dc240b922c0, node=0x5dc240b92718) at /home/chen/workspace/linux2025/lab0-c/list.h:107
107	    node->prev = head;

(gdb) bt
#0  list_add (head=0x5dc240b922c0, node=0x5dc240b92718) at /home/chen/workspace/linux2025/lab0-c/list.h:107
#1  q_insert_head (head=0x5dc240b922c0, s=s@entry=0x5dc22380f138 "123") at queue.c:49
#2  0x00005dc223807f8b in queue_insert (pos=pos@entry=POS_HEAD, argc=<optimized out>, argv=<optimized out>) at qtest.c:234
#3  0x00005dc22380815c in do_ih (argc=<optimized out>, argv=<optimized out>) at qtest.c:282
#4  0x00005dc22380976b in interpret_cmda (argc=argc@entry=2, argv=argv@entry=0x5dc240b93250) at console.c:183
#5  0x00005dc223809d32 in interpret_cmd (cmdline=cmdline@entry=0x5dc240b931e0 "ih 1") at console.c:203
#6  0x00005dc22380a828 in run_console (infile_name=infile_name@entry=0x0) at console.c:661
#7  0x00005dc223808b5a in main (argc=1, argv=<optimized out>) at qtest.c:1451

可以觀察到在呼叫 q_insert_head 最後執行 list_add 時就已經修改到 footer
在審視一下目前的實做方式

bool q_insert_head(struct list_head *head, char *s) { element_t *e = malloc(sizeof(struct list_head)); int sl = strlen(s); e->value = malloc((sl + 1) * sizeof(char)); strncpy(e->value, s, sl + 1); list_add(&e->list, head); return true; }

發現在第 3 行透過 malloc 分配記憶體給 e 時,應取 element_t 的結構大小而非 list_head

@@ -36,7 +36,7 @@ void q_free(struct list_head *head)
 /* Insert an element at head of queue */
 bool q_insert_head(struct list_head *head, char *s)
 {
-    element_t *e = malloc(sizeof(struct list_head));
+    element_t *e = malloc(sizeof(element_t));
     // printf("malloc e %p\n", e);
     int sl = strlen(s);
     e->value = malloc((sl + 1) * sizeof(char));

q_insert_head

閱讀 queue.h 中關於 q_insert_head 註解提到

  • Argument s points to the string to be stored.
    The function must explicitly allocate space and copy the string into it.

觀察到 q_insert_head 的 parameter 包含 head 與 s ,並未輸入字串 s 的長度,由此判斷 s 應最後一個字元應該為 '\0'

實做如下

bool q_insert_head(struct list_head *head, char *s) { element_t *e = malloc(sizeof(struct list_head)); int sl = strlen(s); e->value = malloc((sl + 1) * sizeof(char)); strncpy(e->value, s, sl + 1); list_add(&e->list, head); return true; }

此實做分配錯記憶體大小的問題,已更新,詳細的查測過程

@@ -36,7 +36,7 @@ void q_free(struct list_head *head)
 /* Insert an element at head of queue */
 bool q_insert_head(struct list_head *head, char *s)
 {
-    element_t *e = malloc(sizeof(struct list_head));
+    element_t *e = malloc(sizeof(element_t));
     // printf("malloc e %p\n", e);
     int sl = strlen(s);
     e->value = malloc((sl + 1) * sizeof(char));

處理針對 trace-11-malloc 與 trace-12-malloc 錯誤訊息

+++ TESTING trace trace-11-malloc:
# Test of malloc failure on 'q_insert_head': 'q_new', and 'q_insert_head'
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer---	trace-11-malloc	0/6
+++ TESTING trace trace-12-malloc:
# Test of malloc failure on 'q_insert_tail': 'q_new', 'q_insert_head', and 'q_insert_tail'
Segmentation fault occurred.  You dereferenced a NULL or invalid pointer---	trace-12-malloc	0/6

待處理

q_insert_tail

參考 q_insert_head 中透過 list_add 函式將新的節點加入到 list 的頭,參考 list.h 中相對有個函式 list_add_tail 將節點加入到 list 的尾,調整使用 list_add_tail 實做如下

bool q_insert_tail(struct list_head *head, char *s) { element_t *e = malloc(sizeof(struct list_head)); int sl = strlen(s); e->value = malloc((sl + 1) * sizeof(char)); strncpy(e->value, s, sl + 1); list_add_tail(&e->list, head); return true; }

使用 qtest 測試結果符合預期

$ ./qtest 
cmd> new
l = []
cmd> ih 1
l = [1]
cmd> ih 2
l = [2 1]
cmd> it 3
l = [2 1 3]
cmd>

q_merge

目前實做方式是依序從第 2 個 queue 開始將所有的 node 都接到第一個 queue 的後方,最後在針對第 1 個 queue 做排序。

這個作法很直覺但因為最後需針對第 1 個 queue 做排序,目前排序時做為 Merge sort 所以其時間複查度為

O(nlogn)

實做方式透過函式 list_splice_tail 移動所有的 nodes,注意到 list_splice_tail 的註解說明中有提到

The @list head is not modified and has to be initialized to be used as a valid list head/node again.

所以需在呼叫 INIT_LIST_HEAD 才可以再當成可用的 head,參考下方。

diff --git a/queue.c b/queue.c
index 5f398d3..d3c76b5 100644
--- a/queue.c
+++ b/queue.c
@@ -478,6 +478,7 @@ int q_merge(struct list_head *head, bool descend)
     list_for_each_entry_safe(e, is, head, chain) {
         if (e != first) {
             list_splice_tail(e->q, first->q);
+           INIT_LIST_HEAD(e->q);
         }
     }

配合 q_merge 註解中提到

There is no need to free the 'queue_contex_t' and its member 'q' since they will be released externally.

加入 INIT_LIST_HEAD(e->q); 後續做 q_free 才不會發生問題。

在 qtest 提供新的命令 shuffle

已更新 Commit dc074bd

透過 scripys/driver-shuffle.py 輸出 shuffle 結果如下

Expectation:  4166
Observation:  {'1234': 24965, '1243': 0, '1324': 0, '1342': 0, '1423': 0, '1432': 0, '2134': 0, '2143': 0, '2314': 0, '2341': 24894, '2413': 0, '2431': 0, '3124': 0, '3142': 0, '3214': 0, '3241': 0, '3412': 24888, '3421': 0, '4123': 25253, '4132': 0, '4213': 0, '4231': 0, '4312': 0, '4321': 0}
chi square sum:  500101.38214114256

shuffle_statistic

從圖表來看 shuffle 的頻率明顯不符合 Uniform distribution

Commit 3773ea2 已修正此問題,執行 ./scripts/driver-shuffle.py 的結果

$ ./scripts/driver-shuffle.py Expectation: 4166 Observation: {'1234': 4241, '1243': 4165, '1324': 4115, '1342': 4269, '1423': 4210, '1432': 4143, '2134': 4171, '2143': 4191, '2314': 4130, '2341': 4006, '2413': 4196, '2431': 4185, '3124': 4108, '3142': 4229, '3214': 4232, '3241': 4180, '3412': 4091, '3421': 4109, '4123': 4202, '4132': 4256, '4213': 4157, '4231': 4027, '4312': 4225, '4321': 4162} chi square sum: 24.762842054728754
  • 決定自由度,由於 shffle 4 有 24 種結果,因此自由度為 23。
  • alpha 設定為常見的 0.05
  • P value
    卡方分佈表找合適的 P value,
    X2
    為 24.762842054728754,因為 14.848 < 24.762842054728754 < 32.007,所以 P value 介於 0.9 和 0.1 之間。
    image

因為 P value(0.9~0.1)> alpha(0.05),統計檢定的結果不拒絕虛無假說(

H0)。

也就是樣本資料不拒絕 shuffle 的結果遵循 Uniform distribution,因為沒有足夠證據推翻。

shuffle_statistic

從圖表來看 shuffle 的頻率是大致符合 Uniform distribution 的。

閱讀 lib/list_sort.c

include/linux/types.h 中知到 struct list_head 定義如下

...
struct list_head {
        struct list_head *next, *prev;
};
...

list_sort_.c 中 merge 的運作參考以下流程,目標是 merge list a 與 list b,因為 merge 中不維 prev 指標,所以下圖省略







DoublyLinkedList



ahead


a



anode1



1


next



ahead->anode1





anode2



2


next



anode1:next->anode2





anode4



4


next



anode2:next->anode4





NULL1

NULL



anode4:next->NULL1





bhead


b



bnode3



3


next



bhead->bnode3





NULL2

NULL



bnode3:next->NULL2





head


head



tail


tail



tail->head





after loop 1







DoublyLinkedList



ahead


a



anode2



2


next



ahead->anode2





anode1



1


next



anode1:next->anode2





anode4



4


next



anode2:next->anode4





NULL1

NULL



anode4:next->NULL1





bhead


b



bnode3



3


next



bhead->bnode3





NULL2

NULL



bnode3:next->NULL2





head


head



head->anode1





tail


tail



tail->anode1:next





after loop 2







DoublyLinkedList



ahead


a



anode4



4


next



ahead->anode4





anode1



1


next



anode2



2


next



anode1:next->anode2





anode2:next->anode4





NULL1

NULL



anode4:next->NULL1





bhead


b



bnode3



3


next



bhead->bnode3





NULL2

NULL



bnode3:next->NULL2





head


head



head->anode1





tail


tail



tail->anode2:next





after loop 3







DoublyLinkedList



ahead


a



anode4



4


next



ahead->anode4





anode1



1


next



anode2



2


next



anode1:next->anode2





bnode3



3


next



anode2:next->bnode3





NULL1

NULL



anode4:next->NULL1





bhead


b



NULL2

NULL



bhead->NULL2





bnode3:next->NULL2





head


head



head->anode1





tail


tail



tail->bnode3:next





因為 b 為 NULL 所以進一步執行

if (!b) {
    *tail = a;
    break;
}






DoublyLinkedList



ahead


a



anode4



4


next



ahead->anode4





anode1



1


next



anode2



2


next



anode1:next->anode2





bnode3



3


next



anode2:next->bnode3





NULL1

NULL



anode4:next->NULL1





bhead


b



NULL2

NULL



bhead->NULL2





bnode3:next->anode4





head


head



head->anode1





tail


tail



tail->bnode3:next





最後回傳 head。

接著說明 list_sort 函式,部份程式碼如下

... do { size_t bits; struct list_head **tail = &pending; /* Find the least-significant clear bit in count */ for (bits = count; bits & 1; bits >>= 1) tail = &(*tail)->prev; /* Do the indicated merge */ if (likely(bits)) { struct list_head *a = *tail, *b = a->prev; a = merge(priv, cmp, b, a); /* Install the merged result in place of the inputs */ a->prev = b->prev; *tail = a; } /* Move one element from input list to pending */ list->prev = pending; pending = list; list = list->next; pending->next = NULL; count++; } while (list); ...

count 變數從 0 開始每回和加 1 。
程式碼第 6 行,會找尋最小為 0 的 bit,可以參考下方表格,用黃色與紅色表示。
程式碼第 10 行,會判斷剩餘的的 bits 是不是 0,非 0 則進行整並,參考下方紅色的 0 即需整並的情形,對應到 merge 欄位。

count (binary) merge initial pending after merge after insert a new node
0 (00000000) NO NULL [1]
1 (00000001) NO [1] [1, 1]
2 (00000010) YES [1, 1] [2] [2, 1]
3 (00000011) NO [2, 1] [2, 1, 1]
4 (00000100) YES [2, 1, 1] [2, 2] [2, 2, 1]
5 (00000101) YES [2, 2, 1] [4, 1] [4, 1, 1]
6 (00000110) YES [4, 1, 1] [4, 2] [4, 2, 1]
7 (00000111) NO [4, 2, 1] [4, 2, 1, 1]
8 (00001000) YES [4, 2, 1, 1] [4, 2, 2] [4, 2, 2, 1]
9 (00001001) YES [4, 2, 2, 1] [4, 4, 1] [4, 4, 1, 1]
10 (00001010) YES [4, 4, 1, 1] [4, 4, 2] [4, 4 ,2, 1]
11 (00001011) YES [4, 4, 2, 1] [8, 2, 1] [8, 2, 1, 1]
12 (00001100) YES [8, 2, 1, 1] [8, 2, 2] [8, 2, 2, 1]
13 (00001101) YES [8, 2, 2, 1] [8, 4, 1] [8, 4, 1, 1]
14 (00001110) YES [8, 4, 1, 1] [8, 4, 2] [8, 4, 2, 1]
15 (00001111) NO [8, 4, 2, 1] [8, 4, 2, 1, 1]
16 (00010000) YES [8, 4, 2, 1, 1] [8, 4, 2, 2] [8, 4, 2, 2, 1]

取當 count 為 9 來舉例說明,初始 pending 列表如下







DoublyLinkedList



node1


1



node2


2



node1->node2





node3


3



node2->node3





node4


4



node3->node4





pending


pending



pending->node1





tail


tail



tail->pending





9 的 binary 為 00001001,因為最小的 0 出現前有一個 1 因此下方會執行 1 次

tail = &(*tail)->prev;






DoublyLinkedList



node1


1


prev



node2


2


prev



node1:prev->node2





node3


2


prev



node2:prev->node3





node4


4


prev



node3:prev->node4





pending


pending



pending->node1





tail


tail



tail->node1:prev





接著

struct list_head *a = *tail, *b = a->prev;






DoublyLinkedList



node1


1


prev



node2


2


prev



node1:prev->node2





node3


2


prev



node2:prev->node3





node4


4


prev



node3:prev->node4





pending


pending



pending->node1





tail


tail



tail->node1:prev





a


a



a->node2





b


b



b->node3





然後

a = merge(priv, cmp, b, a);
/* Install the merged result in place of the inputs */
a->prev = b->prev;
*tail = a;






DoublyLinkedList



node1


1


prev



node2


4


prev



node1:prev->node2





node4


4


prev



node2:prev->node4





pending


pending



pending->node1





tail


tail



tail->node1:prev





a


a



a->node2





最後這段程式碼會在放如一個節點

/* Move one element from input list to pending */
                list->prev = pending;
                pending = list;
                list = list->next;
                pending->next = NULL;
                count++;






DoublyLinkedList



node1


1


prev



node2


4


prev



node1:prev->node2





node4


4


prev



node2:prev->node4





pending


pending



newnode1


1


prev



pending->newnode1





newnode1:prev->node1





接著看 merge_final 函式中這段程式碼

...
do {
                /*
                 * If the merge is highly unbalanced (e.g. the input is
                 * already sorted), this loop may run many iterations.
                 * Continue callbacks to the client even though no
                 * element comparison is needed, so the client's cmp()
                 * routine can invoke cond_resched() periodically.
                 */
                if (unlikely(!++count))
                        cmp(priv, b, b);
                b->prev = tail;
                tail = b;
                b = b->next;
        } while (b);

...

待處理

關於 likely 與 unlikely 函式
於 linux-6.14-rc7 的檔案 include/linux/compiler.h 可以找到其定義,在 CONFIG_TRACE_BRANCH_PROFILING 未被開啟的狀態下被定義如下

//有 defined CONFIG_TRACE_BRANCH_PROFILING
# define likely(x)     (__branch_check__(x, 1, __builtin_constant_p(x)))
# define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))

//沒有 defined CONFIG_TRACE_BRANCH_PROFILING
# define likely(x)      __builtin_expect(!!(x), 1)
# define unlikely(x)    __builtin_expect(!!(x), 0)

關於 CONFIG_TRACE_BRANCH_PROFILING 可以參考 kernel/trace/Kconfig 中說明
在 linux-6.14-rc7 的資料夾底下

make menuconfig

可以在 Tracers -> Branch Profiling -> Trace likely/unlikely profile 找到說明
image
image
這個 config 是用來追蹤 likely 與 unlikely 的情形,打開此選項對效能有很大的影響
因此可以推斷平常執行時 likely 與 unlikely 定義是使用

# define likely(x)      __builtin_expect(!!(x), 1)
# define unlikely(x)    __builtin_expect(!!(x), 0)

解釋 __builtin_expect,參考 6.64 Other Built-in Functions Provided by GCC

Built-in Function: long __builtin_expect (long exp, long c)
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
可以使用這個函式來告知編譯器分支預測的資訊。
當第 c 帶入 0 代表這個分支不被預期發生,帶入 0 則代表會被預期發生,編譯器會針對這個對於產的的 assembly code 做調整。
參考下方實驗

#include <stdlib.h>
#include <stdio.h>
#include "time.h"


int main()
{
    int x = !time(NULL);

    if(__builtin_expect(x, 0)) {
        printf("not expected\n");
    } else {
        printf("expected\n");
    }

    return 0;
}
.LC0:
	.string	"not expected"
.LC1:
	.string	"expected"
	.text
	.globl	main
	.type	main, @function
main:
.LFB39:
	.cfi_startproc
	endbr64
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	$0, %edi
	call	time@PLT
	testq	%rax, %rax
	je	.L5
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
.L3:
	movl	$0, %eax
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
.L5:
	.cfi_restore_state
	leaq	.LC0(%rip), %rdi
	call	puts@PLT
	jmp	.L3
	.cfi_endproc

#include <stdlib.h>
#include <stdio.h>
#include "time.h"


int main()
{
    int x = !time(NULL);

    if(__builtin_expect(x, 1)) {
        printf("expected\n");
    } else {
        printf("not expected\n");
    }

    return 0;
}
.LC0:
	.string	"expected"
.LC1:
	.string	"not expected"
	.text
	.globl	main
	.type	main, @function
main:
.LFB39:
	.cfi_startproc
	endbr64
	subq	$8, %rsp
	.cfi_def_cfa_offset 16
	movl	$0, %edi
	call	time@PLT
	testq	%rax, %rax
	jne	.L2
	leaq	.LC0(%rip), %rdi
	call	puts@PLT
.L3:
	movl	$0, %eax
	addq	$8, %rsp
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret
.L2:
	.cfi_restore_state
	leaq	.LC1(%rip), %rdi
	call	puts@PLT
	jmp	.L3
	.cfi_endproc

可以觀察到 expected 的訊息都被安排到 je 或 jne 的下方。

另外文件中提到可以在編譯時加入 -fprofile-arcs 可以追蹤真實運作中每個分支實際被運作的情形,進一步查閱 3.13 Program Instrumentation Options

-fprofile-arcs
Add code so that program flow arcs are instrumented. During execution the program records how many times each branch and call is executed and how many times it is taken or returns. On targets that support constructors with priority support, profiling properly handles constructors, destructors and C++ constructors (and destructors) of classes which are used as a type of a global variable.

When the compiled program exits it saves this data to a file called auxname.gcda for each source file. The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage). Each object file’s auxname is generated from the name of the output file, if explicitly specified and it is not the final executable, otherwise it is the basename of the source file. In both cases any suffix is removed (e.g. foo.gcda for input file dir/foo.c, or dir/foo.gcda for output file specified as -o dir/foo.o).

Note that if a command line directly links source files, the corresponding .gcda files will be prefixed with the unsuffixed name of the output file. E.g. gcc a.c b.c -o binary would generate binary-a.gcda and binary-b.gcda files.

在編譯時加入這個選項後,運作程式時會自動產生 .gcda 檔案。

The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage)

(待處理)

另外也提到可以透過 gcov 工作來分析。(待處理)

harness

(gdb) set $b=(block_element_t*) ((size_t)p-sizeof(block_element_t))
(gdb) set $t=(size_t*)((size_t)$b+sizeof(block_element_t)+$b->payload_size)
(gdb) p/x *$t

研讀 select(2)

select() can monitor only file descriptors numbers that are less than FD_SETSIZE (1024)

文件中提及 select 只能監視最多 1024 個 file descriptors,超過則可以用 poll(2) 或 epoll(7)。

Thus, if using select() within a loop, the sets must be
reinitialized before each call.

Reference

  1. How do the likely/unlikely macros in the Linux kernel work and what is their benefit?