2024q1 Homework1 (lab0)

contributed by < ChiTingCheng >

Reviewed by `chloexyw`

若從兩個方向進行走訪，能避免計算佇列長度所花費的時間，能否以更短時間找到中間節點

因為是雙向鏈結串列，所以如果想從兩個方向走訪尋找中點時需要分別考慮奇數和偶數的兩種情形

奇數節點：可以找到中點
偶數節點：可以找到第
$\frac{n}{2}$ 個節點

struct list_head *left = head->next;
struct list_head *right = head->prev;
while ((left != right) && (left->next != right)) {
    left = left->next;
    right = right->prev;
}

Reviewed by david965154

計算佇列有效長度

在實作時有想過目前程式是計算迭代次數求得佇列中的節點數，由於佇列是雙向環狀鏈結串列，若以 head 為起點，從 next 和 prev 兩個方向分別走訪，當指標地址相同或出現交錯時離開迴圈，是否能以更短的時間計算佇列長度？考量到增加迴圈中條件式的成本。

先查看 list_for_each_entry 巨集的定義：

#define list_for_each_entry(entry, head, member)                       \
    for (entry = list_entry((head)->next, __typeof__(*entry), member); \
         &entry->member != (head);                                     \
         entry = list_entry(entry->member.next, __typeof__(*entry), member))

判斷式只有一個，不過若改成頭尾同時向雙向環狀鏈結串列中間移動計算節點數，判斷式可能不只一個，需考慮奇數及偶數節點所造成的不同情況

奇數:同時移動並交會，if(front->next == behind->prev)
偶數:同時移動並交錯，if(front == behind->prev)

這樣看下來，如果沒辦法使用一個判斷式就表示兩種情況，那麼會造成雖然兩邊同時移動，但判斷式數量仍相同，而不會降低時間計算佇列長度。

快慢指標的運用

q_sort

由於佇列是雙向環狀鏈結串列，使用快慢指標會無法找到中間點

事實上，若以 fast 存取快指標， slow 存取慢指標，則可以寫成以下

for (fast = fast->next; fast != head && fast->next != head;
         fast = fast->next->next, slow = slow->next)
        ;

結束時 slow 即指向中間節點。

函式宣告

參考 jimmylu890303
將 q_test.c 寫入 extern 宣告此函式會在他處被定義，亦即：

extern void q_shuffle(struct list_head *head);

Reviewed by `jserv`

commit 0f5034d 僅提到 Fisher–Yates shuffle，但沒說明考量因素，例如程式碼的時間複雜度。另一 commit d6405bb 的標題也是 "Add q_shuffle into queue"，修改紀錄也雷同，但顯然變更的程式碼不同，這就會造成他人 (包含一個月以後的「你」) 理解及後續維護的問題。使用 git rebase -i 來修訂 git commit message。

commit 8d02fec 提及重寫 q_swap 是想重用 q_reverseK 的程式碼，但在 q_swap 卻沒看到對應的函式呼叫。

commit 272680d "Fix function problem in q_merge" 標題不足以反映出程式碼的變更，到底是什麼「問題」？開發紀錄也看不出原本的程式碼有何過錯，資訊量不足。前一項修改 commit aa59c39 不是宣稱 "Complete" 嗎？後續還做修改為了什麼？查詢英語辭典，以理解 "complete" 的用法，並對照作業需求，看語境是否存在衝突。

開發環境

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              2
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           61
Model name:                      Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Stepping:                        4
CPU max MHz:                     2700.0000
CPU min MHz:                     500.0000
BogoMIPS:                        3199.66
Virtualization:                  VT-x
L1d cache:                       64 KiB (2 instances)
L1i cache:                       64 KiB (2 instances)
L2 cache:                        512 KiB (2 instances)
L3 cache:                        3 MiB (1 instances)
NUMA node0 CPU(s):               0-3

針對佇列操作的程式碼實作

在開始實作前先閱讀課堂教材 linked list 和非連續記憶體，在 lab0 當中，要通過對雙向環狀鏈結串列 (cicular double-linked list) 的操作來建立自定義結構的佇列 (queue) ，示意圖如下：

Image Not Showing Possible Reasons
The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted
Learn More →

參照Linux 核心的 hash table 實作，使用 Graphviz 重新製圖，並嵌入到 HackMD 筆記中。

自 Head 開始將我們定義的結構體 list_head 嵌入個別節點 element_t 結構體中，並藉由 container_of 巨集來找尋對應的節點，並善用 list.h 中既有的函式來完成實作。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

無法藉由 container_of 巨集找到對應的節點， head 的存在目的是照到對應的佇列。

改進你的漢語表達，注意嵌入到結構體者實際是什麼？

q_new

在實作前可以參閱 queue.h 文件中對各函式的詳細介紹。
q_new 會建立一個空的佇列，其 struct list_head 結構體中的 next 及 prev 指標會指向其本身，使用 malloc 運算子配置一個 struct list_head 需要的空間，當配置失敗時回傳 NULL。若配置成功，使用 INIT_LIST_HEAD 初始化 struct list_head，讓 next 與 prev 指向自身。

q_free

釋放 head 對應佇列使用的空間，在一開始先使用 if 判斷佇列是否存在，若不存在則返回。若存在，使用 list_for_each_safe 走訪佇列中的每一個節點，藉由額外的 safe 紀錄迭代操作節點的下一個節點，因此允許移除目前的操作節點。

使用 list_del 可將節點從所屬佇列中移走，但此處我們還需要將 element_t 結構體所配置的記憶體釋放，使用 list_entry 取得結構體的記憶體位址，並透過 q_release_element 釋放，在釋放完所有節點後，最後再將 head 所指的 struct list_head 結構體空間釋放。

`q_insert_tail` / `q_insert_tail`

將新的節點加入到佇列的開頭或尾端中，當成功時回傳 true，當佇列不存在或配置記憶體空間失敗時回傳 false。

除了配置記憶體空間給 element_t 結構體外，另外需要配置足夠空間給字串 s，定將其複製到空間中。配置時需注意額外配置 1 個 byte，用於存放結束符 \0。

使用 list_add 或 list_add_tail 將 element_t中的 list 加入佇列中。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

若配置 element_t 空間成功，但配置字串空間失敗時，仍需釋放原本配置的結構體空間。

`q_remove_head` / `q_remove_tail`

將節點從佇列的開頭或尾端中移走，移走的節點空間不需釋放，並回傳該 element_t 結構體位址。當佇列不存在或為空時，回傳 NULL。

使用 list_del 將節點移走，另外需要將 element_t 中存放的字串複製進 sp 中，需考慮 bufsize 大小，先計算字串佔用空間再與 bufsize 比較。

q_size

計算佇列長度，使用 list_for_each 可走訪佇列中所有節點，使用一個計數器計算迭代次數後回傳結果。

在實作時有想過目前程式是計算迭代次數求得佇列中的節點數，由於佇列是雙向環狀鏈結串列，若以 head 為起點，從 next 和 prev 兩個方向分別走訪，當指標地址相同或出現交錯時離開迴圈，是否能以更短的時間計算佇列長度？考量到增加迴圈中條件式的成本。

`q_delete_mid`

刪除佇列中間的節點，可以使用 q_size 先取得佇列長度 n，再走訪至該節點並將其移除，這邊我加入 math.h 中的上取整函數 (ceiling function) ，協助我處理佇列長度有奇或偶數的情況。與 q_aize 相同，若
從兩個方向進行走訪，能避免計算佇列長度所花費的時間，能否以更短時間找到中間節點，可以在後續與 q_size 一同進行比較。

`q_delete_dup`

將佇列中重複內容的節點刪除，在 queue.h 中並沒有提到佇列是經過排序，但在附上的 leetcode 參考問題中有說明佇列是經過排序，因此這邊先假設佇列已經過排序為前提進行實作，後續可再延伸思考未經排序的實作方式。

`q_swap`

commit 8d02fec

將佇列中的節點兩個一組順序調換，節點總數若為奇數，則最後一個節點不更動，這個實作第一個想到的作法是修改 list_for_each_safe 的迴圈條件，再針對奇偶分開處理，但明顯函式內容過於冗長，後續在實作 q_reverseK 時發現可以將 q_swap 視為 q_reverseK 當 K 為 2 時的情況，以下是修正後的函式。

-    for (node = (head)->next, safe = node->next;
-        node != (head) && node->next != (head);
-        node = safe, safe = node->next) {
-        tmp->next = safe;
-        node->next = safe->next;
-        safe->prev = tmp;
-        node->prev = safe;
-        safe->next = node;
-        tmp = node;
+    int num = q_size(head) / 2;
+    for (; num != 0; num--) {
+        node = tmp->next;
         safe = node->next;
-    }
-    if (q_size(head) % 2) {
-        node->prev = tmp;
-    } else {
-        head->prev = tmp;
+        list_del(safe);
+        list_add(safe, tmp);
+        tmp = node;

q_reverse

將佇列內部節點順序反轉，使用 list_for_each_safe 走訪整個佇列，在每次迭代時調整操作節點內指標所指的位址，最後在調整 head 內的指標所指位址。有和同學討論出能利用 list_del 和 list_add 對單一節點操作，重新調整佇列順序，此方法也能達成要求。

遇到的問題：

command 是「命令」，而非「指令」(instruction)

在完成 q_reverse 後有先用 ./queue 作測試，結果是能正常執行的，但在終端機下指令命令

$ make test

trace-14-perf 一直無法通過測試，並出現以下錯誤訊息：

ERROR: Time limit exceeded.  Either you are in an infinite loop, or your code is too inefficient
ERROR: Freed queue, but 4000000 blocks are still allocated

但與 q_reverse 相關的 trace-03 與 trace-05 是能通過測試，有嘗試過在同學的電腦上執行相同程式碼，是能正常通過測試，並且在 GitHub 上執行也沒有問題，這部份目前還沒有找到解決辦法，不確定是否跟硬體本身有關。

更新：後續用 Valgrind 分析時，也能正常通過測試。

q_reverseK

將佇列中每 K 個節點為一組進行順序反轉，這個實作可以利用 q_reverse 協助完成，將每個分段視為一個佇列來處理，再將分段間重新相連即可完成實作。

q_ascend / q_descend

若目標節點右方存在數值比本身大 (或小) 的節點時，將目標節點移走，一開始的想法是從第一個節點開始與佇列中後方節點逐個比較，但此方法其時間複雜度為

O (n^{2})

，應該其他更好的作法，如果觀察執行完函式後的佇列，從尾端往前看會是一個嚴格遞增 (或遞減) 的佇列，對於佇列處理會更簡單，於是在實作時決定改從最後方開始走訪，僅需一次走訪即可達成要求。

以嚴格遞減為例，從 haed 開始往 prev 方向走訪，目標節點為 node ，比較節點為 cmp 對佇列作走訪，當 cmp 所含的數值比 node 所含的數值小時，將 cmp 移走， cmp 所含的數值比 node 所含的數值大時， node 改為指向 cmp 所在地址，cmp 繼續移動直到走訪完所有節點。

問題：在 queue.h 的函式描述中，是用 remove node ，但如果不釋放記憶體空間，在後續 qtest.c 執行時會回報記憶體未釋放的錯誤，因此最後實作時還是刪除節點，在 queue.h 中的敘述需要做修正。

q_sort

這邊選用 merge sort 的方式去實作 q_sort，一開始的想法是先將佇列切割至只剩單個節點的佇列，再將佇列兩個一組執行 merge ，但在實作時遇到進入無限迴圈的問題，在檢查完程式碼後發現，在佇列切割的執行期間，由於佇列是雙向環狀鏈結串列，使用快慢指標會無法找到中間點，和 YaRu056 同學討論後決定改用 q_size 取得佇列長度以找尋中間點，再對切割後的佇列進行 merge。
merge sort 只執遞增順序的排序，最後再依據輸入的布林值 descend 決定是否對佇列作反轉。

q_merge

將已排序過的多個佇列，合併為一個排序過的佇列，目前提出的實作方式是先將多個佇列合併為一個未排序的佇列，最後再執行排序，第一次完成的程式碼無法通過測試，在重新檢查後發現是取得佇列地址時引數設錯，在 queue.h 中定義一個 context_t 結構體來管理佇列，在 context_t 中 q 儲存佇列地址，chain 能指向其他 context_t ，能使用 list_for_each_entry 來走訪所有佇列，函式在重新改寫後能順利通過測試。

-    struct list_head *node, *safe;
-    list_for_each_safe (node, safe, head) {
-        list_splice_tail(node, head);
+    if (!head || list_empty(head))
+        return 0;
+    queue_contex_t *tmp = list_first_entry(head, queue_contex_t, chain);
+    queue_contex_t *cont = NULL;
+    list_for_each_entry (cont, head, chain) {
+        if (tmp != cont) {
+            list_splice_init(cont->q, tmp->q);
+            tmp->size += cont->size;
+            cont->size = 0;
         }
     }
-    q_sort(head, descend);
-    return q_size(head);
+    q_sort(tmp->q, descend);
+    return tmp->size;

在完成 queue.c 後以 make teat 測試，得分為 95 分，剩下 trace-17-complexity 的複雜度測試未通過，根據作業說明，現有實作存在若干致命缺陷仍需修正，目前還沒找到解決方法。

目前實作仍有眾多可以改進的地方，後續若有改寫會再更新。

以 Valgrind 分析記憶體問題

依據作業說明，可以使用 Valgrind 對記憶體分析，找出可能的記憶體問題如，memory leak, buffer overflow, Dangling pointer 等。在 lab0-c 中已整合 Valgrind
只要終端機下以下指令即可得到測試結果。

$ make valgrind

但輸出結果並沒有出現作業說明中提到的錯誤訊息。

```shell --- Trace Points +++ TESTING trace trace-01-ops: # Test of insert_head and remove_head --- trace-01-ops 5/5 +++ TESTING trace trace-02-ops: # Test of insert_head, insert_tail, remove_head, remove_tail, and delete_mid --- trace-02-ops 6/6 +++ TESTING trace trace-03-ops: # Test of insert_head, insert_tail, remove_head, reverse and merge --- trace-03-ops 6/6 +++ TESTING trace trace-04-ops: # Test of insert_head, insert_tail, size, swap, and sort --- trace-04-ops 6/6 +++ TESTING trace trace-05-ops: # Test of insert_head, insert_tail, remove_head, reverse, size, swap, and sort --- trace-05-ops 6/6 +++ TESTING trace trace-06-ops: # Test of insert_head, insert_tail, delete duplicate, sort, descend and reverseK --- trace-06-ops 6/6 +++ TESTING trace trace-07-string: # Test of truncated strings --- trace-07-string 6/6 +++ TESTING trace trace-08-robust: # Test operations on empty queue --- trace-08-robust 6/6 +++ TESTING trace trace-09-robust: # Test remove_head with NULL argument --- trace-09-robust 6/6 +++ TESTING trace trace-10-robust: # Test operations on NULL queue --- trace-10-robust 6/6 +++ TESTING trace trace-11-malloc: # Test of malloc failure on insert_head --- trace-11-malloc 6/6 +++ TESTING trace trace-12-malloc: # Test of malloc failure on insert_tail --- trace-12-malloc 6/6 +++ TESTING trace trace-13-malloc: # Test of malloc failure on new --- trace-13-malloc 6/6 +++ TESTING trace trace-14-perf: # Test performance of insert_tail, reverse, and sort --- trace-14-perf 6/6 +++ TESTING trace trace-15-perf: # Test performance of sort with random and descending orders # 10000: all correct sorting algorithms are expected pass # 50000: sorting algorithms with O(n^2) time complexity are expected failed # 100000: sorting algorithms with O(nlogn) time complexity are expected pass --- trace-15-perf 6/6 +++ TESTING trace trace-16-perf: # Test performance of insert_tail --- trace-16-perf 6/6 +++ TESTING trace trace-17-complexity: # Test if time complexity of q_insert_tail, q_insert_head, q_remove_tail, and q_remove_head is constant Probably constant time Probably constant time ERROR: Probably not constant time or wrong implementation ERROR: Probably not constant time or wrong implementation --- trace-17-complexity 0/5 --- TOTAL 95/100 ```

列出關鍵訊息，並予以討論。

於是為了測試 Valgrind 是否正常運作，我刻意將 q_del_mid 中的 q_release_element() 註解掉，並重新下命令測試。

    while (num) {
        tmp = tmp->next;
        num--;
    }
    element_t *node = list_entry(tmp, element_t, list);
    list_del(&node->list);
    // q_release_element(node);
    return true;

測試結果

+++ TESTING trace trace-02-ops:
# Test of insert_head, insert_tail, remove_head, remove_tail, and delete_mid
ERROR: Freed queue, but 4 blocks are still allocated
==3597== 47 bytes in 1 blocks are still reachable in loss record 1 of 4
==3597==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3597==    by 0x10F2E7: test_malloc (harness.c:133)
==3597==    by 0x10F55F: test_strdup (harness.c:212)
==3597==    by 0x10F7A7: q_insert_head (queue.c:44)
==3597==    by 0x10CB32: queue_insert (qtest.c:232)
==3597==    by 0x10CD00: do_ih (qtest.c:280)
==3597==    by 0x10DFD1: interpret_cmda (console.c:181)
==3597==    by 0x10E586: interpret_cmd (console.c:201)
==3597==    by 0x10E987: cmd_select (console.c:610)
==3597==    by 0x10F273: run_console (console.c:705)
==3597==    by 0x10D3C3: main (qtest.c:1258)
==3597== 
==3597== 48 bytes in 1 blocks are still reachable in loss record 2 of 4
==3597==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3597==    by 0x10F2E7: test_malloc (harness.c:133)

此時終端機有回報錯誤訊息，這代表 Valgrind 是有在運作，第一次沒有回報錯誤代表測試過程沒有出現記憶體問題，或是我的測試方式有遺漏的部份，~~若同學有想法歡迎提出。~~

不用特別說「若同學有想法歡迎提出」，課程的作業之所以公開並允許所有登入者編輯，就是為了相互指教、相互學習。

更新：後續有再重複測試， Valgrind 皆正常執行，沒有回報錯誤。

用效能分析工具比較 list_sort.c 及自己實作的排序

用 valgrind cachegrind 分析 cache miss：

在 qtest 提供新的命令 `shuffle`

利用 Fisher–Yates shuffle 演算法來實作洗牌（shuffle），並在 qtest 中加入新的命令 shuffle。

首先在 queue.c 中加入新函式 q_shuffle，函式宣告如下所示

/**
 * q_shuffle() - Implement Fisher–Yates shuffle in queue
 * @head: header of queue
 * 
 * No effect if queue is NULL or empty. 
 */
void q_shuffle(struct list_head *head);

再來是需要在 q_test.c 中加入新的命令，命令的結構可以參考其他命令的架構設計，我是利用 reverse 命令的結構去改，另外還需要到 q_ueue.h 中宣告。

第一次沒宣告終端機回傳以下錯誤訊息：

qtest.c:1015:13: error: ‘do_shuffle’ defined but not used [-Werror=unused-function]
 1015 | static bool do_shuffle(int argc, char *argv[])
      |             ^~~~~~~~~~
cc1: all warnings being treated as errors

在宣告後執行成功，使用 ./qtest 測試命令，以下為終端機輸出結果：

l = [3 8 3 8 4 7 4 2 1]
cmd> shuffle
l = [8 1 3 4 3 7 2 4 8]
cmd> quit
Freeing queue

成功執行 shuffle 的命令，但在我準備將程式碼 push 到 GitHub 時，終端機回傳以下錯誤：

[!] You are not allowed to change the header file queue.h or list.h

我無法修改 queue.h 與 list.h，於是我尚未 push 目前版本，在 qtest.c 與 queue.c 中添加的程式碼紀錄於筆記上。

static bool do_shuffle(int argc, char *argv[]) { if (argc != 1) { report(1, "%s takes no arguments", argv[0]); return false; } if (!current || !current->q) report(3, "Warning: Calling shuffle on null queue"); error_check(); set_noallocate_mode(true); if (current && exception_setup(true)) q_shuffle(current->q); exception_cancel(); set_noallocate_mode(false); q_show(3); return !error_check(); }

標示對應的 GitHub commit 超連結。

:::spoiler queue.c 中增加的 q_shuffle() ```c /*Implement Fisher–Yates shuffle in queue*/ void q_shuffle(struct list_head *head) { if (!head || list_empty(head)) return; int len = q_size(head) - 1; struct list_head *new = head->prev, *old = NULL; for (; len != 0; len--) { for (int num = rand() % len; num == 0; num--) old = head->next; if (new == old) continue; struct list_head *tmp = old->prev; list_move(old, new); list_move(new, tmp); new = old; } } ``` :::

列出程式碼的目的是為了討論和配合闡述你的洞見 (而非宣稱自己有投入)，反之，若沒有相關討論，則沒必要列出程式碼。

統計方法驗證 `shuffle`

參考 lab0 的作業說明，創建 scripts/shuffle_test.py, 將範例程式碼放入後執行，終端機回傳以下結果：

Expectation:  41666
Observation:  {'1234': 41771, '1243': 41660, '1324': 41841, '1342': 41559, '1423': 41681, '1432': 41734, '2134': 41656, '2143': 41932, '2314': 41570, '2341': 41774, '2413': 41754, '2431': 41674, '3124': 41936, '3142': 41441, '3214': 41361, '3241': 41650, '3412': 41792, '3421': 41267, '4123': 41775, '4132': 41682, '4213': 41655, '4231': 41668, '4312': 41671, '4321': 41496}
chi square sum:  14.174578793260693

Figure_1

2024q1 Homework1 (lab0)

Reviewed by chloexyw

Reviewed by david965154

Reviewed by jserv

開發環境

針對佇列操作的程式碼實作

q_new

q_free

q_insert_tail / q_insert_tail

q_remove_head / q_remove_tail

q_size

q_delete_mid

q_delete_dup

q_swap

q_reverse

遇到的問題：

q_reverseK

q_ascend / q_descend

q_sort

q_merge

以 Valgrind 分析記憶體問題

用效能分析工具比較 list_sort.c 及自己實作的排序

用 valgrind cachegrind 分析 cache miss：

在 qtest 提供新的命令 shuffle

統計方法驗證 shuffle

參考資料

Read more

Assignment2: RISC-V Toolchain

Assignment3: Single-cycle RISC-V CPU

Assignment1: RISC-V Assembly and Instruction Pipeline

2024q1 Homework2 (quiz1+2)