2025q1 Homework1 (lab0)

contributed by < vicLin8712 >

作業書寫規範:

  • 無論標題和內文中,中文和英文字元之間要有空白字元 (對排版和文字搜尋有利)
  • 文字訊息 (尤其是程式執行結果) 請避免用圖片來表示,否則不好搜尋和分類
  • 共筆書寫請考慮到日後協作,避免過多的個人色彩,用詞儘量中性
  • 不要在筆記內加入 [TOC] : 筆記左上方已有 Table of Contents (TOC) 功能,不需要畫蛇添足
  • 不要變更預設的 CSS 也不要加入任何佈景主題: 這是「開發紀錄」,用於評分和接受同儕的檢閱
  • 在筆記中貼入程式碼時,避免非必要的行號,也就是該手動將 c=cpp= 變更為 ccpp。行號只在後續討論明確需要行號時,才要出現,否則維持精簡的展現。可留意「你所不知道的 C 語言: linked list 和非連續記憶體」裡頭程式碼展現的方式
  • HackMD 不是讓你張貼完整程式碼的地方,GitHub 才是!因此你在開發紀錄只該列出關鍵程式碼 (善用 diff 標示),可附上對應 GitHub commit 的超連結,列出程式碼是為了「檢討」和「便於他人參與討論」
  • 留意科技詞彙的使用,請參見「資訊科技詞彙翻譯」及「詞彙對照表
  • 不要濫用 :::info, :::success, :::warning 等標示,儘量用清晰的文字書寫。:::danger 則僅限授課教師作為批注使用
  • 避免過多的中英文混用,已有明確翻譯詞彙者,例如「鏈結串列」(linked list) 和「佇列」(queue),就使用該中文詞彙,英文則留給變數名稱、人名,或者缺乏通用翻譯詞彙的場景
  • 在中文敘述中,使用全形標點符號,例如該用「,」,而非 ","。注意書名號的使用,即 ,非「小於」和「大於」符號
  • 避免使用不必要的 emoji 字元

Testbed Information

$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

$ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          39 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                GenuineIntel
  Model name:             Intel(R) Core(TM) i5-6400 CPU @ 2.70GHz
    CPU family:           6
    Model:                94
    Thread(s) per core:   1
    Core(s) per socket:   4
    Socket(s):            1
    Stepping:             3
    CPU(s) scaling MHz:   94%
    CPU max MHz:          3300.0000
    CPU min MHz:          800.0000
    BogoMIPS:             5399.81

Progression Tracing

以下複製自「作業要求

  • 在 GitHub 上 fork lab0-c
  • 依據上述指示著手修改 queue.[ch] 和連帶的檔案,測試後用 Git 管理各項修改,要能滿足 $ make test 自動評分系統的所有項目。
  • Linux 核心原始程式碼的 lib/list_sort.c 並嘗試引入到 lab0-c 專案,比較你自己實作的 merge sort 和 Linux 核心程式碼之間效能落差
  • qtest 提供新的命令 shuffle,允許藉由 Fisher–Yates shuffle 演算法,對佇列中所有節點進行洗牌 (shuffle) 操作,需要以統計的原理來分析你的實作,探究洗牌的「亂度」
  • qtest 中執行 option entropy 1 並搭配 ih RAND 10 一類的命令,觀察亂數字串的分布,並運用「資訊與熵互補,資訊就是負熵」的觀念,設計一組新的命令或選項,得以在 qtest 切換不同的 PRNG 的實作 (可選定 Xorshift),搭配統計分析工具,比較亂數產生器 (如 Xorshift vs. 內建) 的品質
  • 研讀論文〈Dude, is my code constant time?〉,解釋本程式的 "simulation" 模式是如何透過以實驗而非理論分析,達到驗證時間複雜度,需要解釋 Student's t-distribution 及程式實作的原理。
    • 注意:現有實作存在若干致命缺陷,請討論並提出解決方案
    • 解釋 Student's t-distribution
    • oreparaz/dudect 的程式碼具備 percentile 的處理,但在 lab0-c 則無。當你改進後,可提交 pull request
  • 指出現有程式的缺陷或者可改進之處 (例如更全面地使用 Linux 核心風格的鏈結串列 API),嘗試實作並提交 pull request
  • 除了修改程式,也要編輯「作業區」,增添開發紀錄和 GitHub 連結,除了提及你如何逐步達到自動評分程式的要求外,共筆也要涵蓋以下:
    • 詳閱第 1 週所有教材及作業描述 (一), (二), (三), (四), (五),記錄你的啟發和疑惑
    • 開啟 Address Sanitizer,修正 qtest 執行過程中的錯誤
    • 運用 Valgrind 排除 qtest 實作的記憶體錯誤,並透過 Massif 視覺化 "simulation" 過程中的記憶體使用量,需要設計對應的實驗
    • 解釋 select 系統呼叫在本程式的使用方式,並分析 console.c 的實作,說明其中運用 CS:APP RIO 套件 的原理和考量點。可對照參考 CS:APP 第 10 章重點提示

Record of queue.[ch] Development

q_free()

Confusion

I tried to introducelist_for_each(pos, head) in q_free(). I don't understand the parameter pos used for. So I check The Linux Kernel API. Obviously, to operate test_free() funtion on the node, list_for_each_safe() is a better option due to it can remove current node safely. Next, utilizeq_release_element(node) to release element with list embeded in.

Why using test_free but not free function?

Refer to N01: lab0, for the better memory management, harness.h define a new structure called __block_element, which containing externel member magic_header used for memory accessing check. Different from pure free function, it can return alarm message when accessing error memory location.

q_insert_head()/q_insert_tail()

Unsafe function

Error message as below shown. CERN indicates that the functionstrcpy() is an unsafe function。

Dangerous function detected in /home/q36131045/lab0-c/queue.c
41:    strcpy(new->value, s);

cppcheck warning: style

Both messages shown below indicate that qualifierconstshould be added brfore char *s in the case that s isn't manipulated in the function.

$ git commit -a
Following files need to be cleaned up:
queue.c
queue.c:51:50: style: Parameter 's' can be declared as pointer to const [constParameterPointer]
bool q_insert_head(struct list_head *head, char *s)
                                                 ^
queue.c:57:50: style: Parameter 's' can be declared as pointer to const [constParameterPointer]
bool q_insert_tail(struct list_head *head, char *s)
                                                 ^
nofile:0:0: information: Active checkers: 106/592 (use --checkers-report=<filename> to see details) [checkersReport]


Fail to pass static analysis.

Refer to Inconsistent Cppcheck Version Leads to Differing Default Enabled Checking Flags.
Edit ./script/pre-commit.hook to suppress this warning message.

@@ -21,7 +21,6 @@ CPPCHECK_suppresses="--inline-suppr harness.c \
- --suppress=constParameterPointer:queue.c\

Utilize marco and remove redundant code

q_insert_head() & q_insert_tail() have extremely similarity. With the application of marco, both function usage can be simplified. Also, add a new parameter, "pos," to determine which position should be inserted.

@@ -21,7 +21,8 @@
         return false;                                                  \
     }                                                                  \
     strncpy(new->value, s, strlen(s) + 1);                             \
-    list_add(&new->list, to);                                          \
+    !strcmp(pos, "head") ? list_add(&new->list, head)                  \
+                         : list_add_tail(&new->list, head);            \
     return true;
@@ -66,14 +67,14 @@ void q_free(struct list_head *head)
 /* Insert an element at head of queue */
 bool q_insert_head(struct list_head *head, char *s)
 {
-    q_insert_new(head, s);
+    q_insert_new(head, s, "head");
 }

Fix bufsize issue in the marco

Add condiction branch to deal with the overflow problem. Also, the previous version had extreme logical error. That is, strlen() only return the length without last ending char, '\0'.

     list_del(&cur_elem->list);                                            \
     if (cur_elem && sp) {                                                 \
         size_t temp = strlen(cur_elem->value);                            \
-        strncpy(sp, cur_elem->value, temp + 1);                           \
-        sp[temp + 1] = '\0';                                              \
+        if (temp > bufsize - 1)                                           \
+            temp = bufsize - 1;                                           \
+        strncpy(sp, cur_elem->value, temp);                               \
+        sp[temp] = '\0';                                                  \
     }                                                                     \
     return cur_elem;

q_delete_mid()

Fix mid node position

In previous version, a pointer to a pointer is not necessary. Furthermore, the final node to delete should be the next one node because for loop starts at head.

     if (!head || list_is_singular(head))
         return NULL;
-    struct list_head **indir = &head, *fast = head;
+    struct list_head *indir = head, *fast = indir->next;
 
     for (; fast->next != head && fast->next->next != head;
          fast = fast->next->next)
-        indir = &(*indir)->next;
-    struct list_head *del = *indir;
+        indir = indir->next;
+    struct list_head *del = indir->next;
     list_del(del);
     q_release_element(list_entry(del, element_t, list));

q_sort()

Complexity issues

In first verion, I use insertion sort to implement. However, this method can't pass complexity test. Use merge sort algorithm instead can reduce time complexity.

Duplicate strings order issue in merge sort

+++ TESTING trace trace-15-perf:
ERROR: Not stable sort. The duplicate strings "akjks" are not in the same order.
ERROR: Not stable sort. The duplicate strings "akjks" are not in the same order.
---     trace-15-perf   0/6

Fix: using greater-than-or-equal-to operator instead.

-        if (((descend * 2) - 1) * strcmp(l->value, r->value) > 0) 
+        if (((descend * 2) - 1) * strcmp(l->value, r->value) >= 0) 

q_merge()

Revise merge function

To apply the merge function, which can merge two lists in expected order into a new list. However, in the previous merge function, the merging API function didn't consider the original list state. As below showed.

@@ -206,9 +208,9 @@ void merge(struct list_head *head,
         }
     }
     if (!list_empty(left))
-        list_splice_tail(left, head);
+        list_splice_tail_init(left, head);
     else
-        list_splice_tail(right, head);
+        list_splice_tail_init(right, head);
 }

Using list_splice_tail_init instead ensure the initialization of the head of the list can satisfy the requirement of NULL-queue after merge.

Studying lib/list_sort.c

list_sort problems record

merge function

  • How this function work?
    Merge two "null-terminated" linked lists into one "signly" linked list.

  • Why using for(;;) instead while(1)?
    Linux Kernel style. Also for cleaner implementation.

merge_final function

  • What is u8 count = 0; stand for?
    1 bit, equivalent to uint8_t count = 0;, defined in types.h.

  • Unbalanced issue
    In the final merge process, marco unlikely defined in compiler.h, which call cmp function when count overflow and turn out to be zero.

Unsolved:
a. How cmp works so that it is called here?
b. What is the statement "client's cmp() routine can invoke cond_resched() periodically" meaning for?

/* * If the merge is highly unbalanced (e.g. the input is * already sorted), this loop may run many iterations. * Continue callbacks to the client even though no * element comparison is needed, so the client's cmp() * routine can invoke cond_resched() periodically. */ if (unlikely(!++count)) cmp(priv, b, b);

list_sort function

  • Initial state before do loop
    list_sort
  • First loop count=0

Shuffle Command

In order to add new command, shuffle, I checked how qtest.c executed.

Progran startup

int main(int argc, char *argv[])

Refer to C11 5.1.2.2.1 program startup, it constrains that argc should be a nonnegative value and argv[0] through argv[argc-1] are used for program parameters where argv[0] is the program name.

while statement and getopt function

Next, quite interesting, how does the function getopt work, and how does the switch statement be used? Why these function used here?

while ((c = getopt(argc, argv, "hv:f:l:")) != -1) {
    switch (c) {
    ...
    }

Refer to The GNU C Library, which describes the usage of function getopt, indicating that this function is used for getting next option arguments from argument list specified by argc and argv arguments and will return ASCII value.

Refer to C11 6.8.4.2, The controlling expresion of a switch shall have intger type. As return ASCII value of function getopt, now we can use switch with the same option argument.

Now analyze cases one by one.

case 'h':
    usage(argv[0]);
    break;

Where function usage is:

static void usage(char *cmd)
{
    printf("Usage: %s [-h] [-f IFILE][-v VLEVEL][-l LFILE]\n", cmd);
    printf("\t-h         Print this information\n");
    printf("\t-f IFILE   Read commands from IFILE\n");
    printf("\t-v VLEVEL  Set verbosity level\n");
    printf("\t-l LFILE   Echo results to LFILE\n");
    exit(0);
}

Above codes show the usages of each flag.

Next case is the usage of flag f, which copy the parameter optarg which is the argument value provided by function getopt. As the same functionality of case l.

case 'f':
    strncpy(buf, optarg, BUFSIZE);
    buf[BUFSIZE - 1] = '\0';
    infile_name = buf;
    break;
case 'l':
    strncpy(lbuf, optarg, BUFSIZE);
    buf[BUFSIZE - 1] = '\0';
    logfile_name = lbuf;
    break;

q_init

Now the program call q_init function, which is trying to initialize initial stae of queue. As the below code shown.

static void q_init()
{
    fail_count = 0;
    INIT_LIST_HEAD(&chain.head);
    signal(SIGSEGV, sigsegv_handler);
    signal(SIGALRM, sigalrm_handler);
}

Briefly,refer to lab0-c, signal function is used for controlling signal.

TODO:
More detailed explanations of signal

init_cmd

The function init_cmd is in file console.c. In this function, a linked list will be created which is used for storing command declare in init_cmd.

Implement shuffle

First, I added the q_shuffle function, which is used for shuffling elements with a given linked list with the algorithm. Then, I added a do_shuffle function to check the program's status and relative variables.

Commit a8524fa

test_result

image

Expectation:  41666
Observation:  {'1234': 41943, '1243': 41628, '1324': 41683, '1342': 41631,
'1423': 41862, '1432': 41907, '2134': 41342, '2143': 41575, '2314': 41840, 
'2341': 41733, '2413': 41479, '2431': 41790, '3124': 41502, '3142': 41631,
'3214': 41713, '3241': 41587, '3412': 41457, '3421': 41733, '4123': 41826,
'4132': 41656, '4213': 41688, '4231': 41685, '4312': 41640, '4321': 41469}
chi square sum:  12.60792972687563

TODO: Enhance Accuracy of Entropy Measurement

Reading "Dude, is my code constant time?"

Reading "Teach Yourself Programming in Ten Years"

In my past point of view, programming language was just a language used to communicate with computers. I didn't even know the design of programming language or the principle of how it works, and most importantly, I was not interested in digging into it.
This article makes me review my attitude toward this course. More specifically, what attitude does the professor want me to learn in this course?

  • Accomplish tasks carefully not just finish tasks in the time
    At the beginning of this course, my primary goal was to submit assignments on time. However, it became clear that I couldn’t guarantee the quality of my code under such tight deadlines. What made things worse was that I didn’t even know how to measure code quality or properly utilize the tools provided by the professor.
    I realized that it wasn’t feasible to handle such a large workload so quickly. Therefore, I decided to revisit each piece of material provided by the teacher and study it carefully. As the teacher often says: “缺甚麼補甚麼” (fill in what you’re missing). Moving forward, I plan to rebuild my foundation so I’ll be ready for future challenges in the information technology industry.

  • Interact with other programmers
    In class, Jserv consistently emphasizes active interaction between the students and himself.

  • Work on projects after other programmers
    In lab0, the most valuable project and materials provided by the teacher.

  • Work on projects with other programmers
    We reviewed and commented on Lab0 together. This exercise taught me how to inspect other people’s code and ideas.

  • The importance of standardization of programming language
    This is one of the most valuable lessons from Jserv. It encourages me to consult official standards whenever I encounter technical issues, rather than relying solely on AI tools or casual Internet searches.

Select a repo