# 2025q1 Homework1 (lab0)
contributed by <`DrXiao`>
{%hackmd NrmQUGbRQWemgwPfhzXj6g %}
## Development Environment
```shell
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i7-13700F
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 23%
CPU max MHz: 5200.0000
CPU min MHz: 800.0000
BogoMIPS: 4224.00
L1d: 640 KiB (16 instances)
L1i: 768 KiB (16 instances)
L2: 24 MiB (10 instances)
L3: 30 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-23
```
<s>
* GitHub repository: [DrXiao/lab0-c](https://github.com/DrXiao/lab0-c)
</s>
:::danger
There is no need to mention repository information here; focus on the details.
Mind the markdown heading levels, i.e., use `##` and `###` in the body and avoid overly deep indentation.
:::
## Preparation Before Development
While developing lab0-c, quite a few pull requests have kept being merged into sysprog21/lab0-c itself, so whenever new changes land upstream, I synchronize them into my own repository before continuing development.
## Providing an Initial Implementation for All Queue Operations
* Corresponds to commit [aea1a33](https://github.com/DrXiao/lab0-c/commit/aea1a332dde9a694cac43bbbe9bb9b3e591b8af5)
After generating the lab0-c repository on GitHub, my first step was to write working code for all of the queue operations, to understand what each operation is supposed to accomplish before looking for opportunities to improve it. The following sections explain how each operation is currently implemented.
### `q_new()`
The queue interface of lab0-c expects developers to maintain the whole queue through a pointer to a Linux-kernel-style `list_head` structure. The purpose of `q_new()` is therefore to properly create an empty queue: it dynamically allocates memory and returns the pointer to the queue itself as the function's return value, after which developers operate on the queue through the other interfaces.
The steps here are simple:
* Dynamically allocate a `list_head` object, initialize its pointers, and return the object's memory address.
* If the allocation fails, return `NULL`. (Developers using the queue must handle this case themselves.)
The current `queue.c` implements exactly these steps.
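The steps above can be sketched as follows. This is a minimal sketch: the `list_head` definition below is a simplified stand-in for the one in lab0-c's `list.h`, and the self-referencing initialization is what `INIT_LIST_HEAD()` does.

```c
#include <assert.h>
#include <stdlib.h>

/* simplified stand-in for the list_head defined in list.h */
struct list_head {
    struct list_head *prev, *next;
};

/* Create an empty queue: allocate the head and point both links
 * back at the head itself (what INIT_LIST_HEAD() does). */
struct list_head *q_new(void)
{
    struct list_head *head = malloc(sizeof(*head));
    if (!head)
        return NULL; /* allocation failed: the caller must handle NULL */
    head->prev = head->next = head;
    return head;
}
```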
### `q_free()`
Since queues are handed to developers via dynamically allocated memory, a corresponding memory-releasing operation is needed so that developers can free queues they no longer use.
What this function does is **walk through every node still present in the queue, free each node's memory, and finally free the `list_head` object obtained from `q_new()`** (i.e., the queue's head pointer).
The implementation steps are:
* Use the `list_for_each_safe()` macro with two `list_head` pointers, named `curr` and `safe`, to traverse all nodes.
* Use an `element_t` pointer, the `curr` pointer, and the `list_entry()` macro to obtain each node's starting address; then:
    * Unlink each node, connecting its previous and next nodes to each other.
    * First free the data the node holds, i.e., the memory of the `element_t`'s `value`, then free the node itself.
* After all nodes have been visited, finally free `head`.
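A minimal sketch of these steps, with the traversal macro expanded by hand; the real code uses `list_for_each_safe()` and `list_entry()` from `list.h`, and `element_t` is simplified here. The `node_to_element` macro name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct list_head {
    struct list_head *prev, *next;
};

typedef struct {
    char *value;
    struct list_head list;
} element_t;

/* what list_entry(node, element_t, list) expands to */
#define node_to_element(node) \
    ((element_t *) ((char *) (node) - offsetof(element_t, list)))

void q_free(struct list_head *head)
{
    if (!head)
        return;
    /* the hand-expanded form of list_for_each_safe(curr, safe, head) */
    struct list_head *curr = head->next, *safe;
    for (; curr != head; curr = safe) {
        safe = curr->next;          /* remember the next node first */
        element_t *e = node_to_element(curr);
        free(e->value);             /* free the stored string */
        free(e);                    /* then the node itself */
    }
    free(head);                     /* finally the head from q_new() */
}
```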
<s>warning</s>
:::danger
Do not abuse the `warning` label.
:::
The initial implementation actually used `list_for_each_entry_safe` here, but at `git commit` time, the Git hook (or rather `cppcheck`) produced the following message:
```
Following files need to be cleaned up:
queue.c
Running static analysis...
queue.c:35:5: style: Label 'int' is not used. [unusedLabel]
list_for_each_entry_safe (curr_elem, safe_elem, head, list) {
^
queue.c:272:5: style: Label 'int' is not used. [unusedLabel]
list_for_each_entry (curr_elem, head, list) {
^
queue.c:303:5: style: Label 'int' is not used. [unusedLabel]
list_for_each_entry (curr_elem, head, list) {
^
Fail to pass static analysis.
```
It seems `cppcheck` thinks I defined a jump label named `int` (usable as a `goto` target) that is never used, hence the message; the same thing happens with `list_for_each_entry`.
At the moment I am not sure whether this was a genuine misuse on my part or a latent flaw in the Git hook's rules; I may clarify this later.
### `q_insert_head()` and `q_insert_tail()`
Both functions add a node to the queue. They share the same inputs, the queue's head pointer and the new node's data; the only difference is that the former inserts the new node at the first position of the queue, while the latter inserts it at the very end.
The steps are:
1. Dynamically allocate an `element_t` object.
2. Allocate memory for the `element_t`'s `value` pointer.
    * Since each node's data is a string, `strdup()` handles both the dynamic allocation and the string copy in one call.
3. Insert the new node at the appropriate position, then return `true`.
    * For `q_insert_head()`, `list_add()` inserts the node at the first position.
    * Correspondingly, `q_insert_tail()` uses `list_add_tail()` to place the node at the last position.
4. If an allocation fails, free whatever has already been allocated and return `false`.
### `q_remove_head()` and `q_remove_tail()`
Similar to the insertion functions discussed above, these two functions both remove a node; the only difference is whether they operate on the first or the last node of the queue.
By design, if the caller provides a valid character array and its size, the removed node's data must be copied into that array. On success, the function returns the memory address of the removed node to the caller.
Steps:
1. If the queue's head pointer is `NULL`, or the queue is empty, return `NULL` immediately.
2. If there is a node to remove, use an `element_t` pointer, the queue's head pointer, and `list_entry()` to obtain the node's starting address.
    * `q_remove_head` uses `head->next` with the macro to get the first node's starting address.
    * `q_remove_tail` uses `head->prev` with the macro to get the last node's starting address.
3. If a valid character pointer is given (`sp` is not `NULL`), use `strncpy()` with `bufsize` to copy the removed node's data into `sp`.
4. Finally, use `list_del()` to unlink the removed node and return its starting address.
    * `q_remove_head` passes `head->next` to `list_del()`.
    * `q_remove_tail` passes `head->prev`.
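A sketch of `q_remove_head()` following these steps, with the macros expanded by hand under simplified `list_head`/`element_t` definitions. Step 3 is shown in its initial form; the missing-`'\0'` issue with `strncpy()` is addressed later in this writeup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct list_head {
    struct list_head *prev, *next;
};

typedef struct {
    char *value;
    struct list_head list;
} element_t;

element_t *q_remove_head(struct list_head *head, char *sp, size_t bufsize)
{
    if (!head || head->next == head)    /* step 1: NULL head or empty queue */
        return NULL;
    /* step 2: list_entry(head->next, element_t, list), expanded by hand */
    element_t *e =
        (element_t *) ((char *) head->next - offsetof(element_t, list));
    if (sp)
        /* step 3: the initial version; see the later fix for termination */
        strncpy(sp, e->value, bufsize - 1);
    /* step 4: what list_del(head->next) does */
    head->next = e->list.next;
    e->list.next->prev = head;
    return e;
}
```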
However, **the current implementation has some defects that cause the `trace-07-string` test item to fail**; this will be investigated and fixed later.
### `q_size()`
This function counts and returns the number of nodes in the queue. The implementation only needs a `list_head` pointer and a counter: traverse every node with the `list_for_each()` macro and increment the counter by 1 for each node.
Naturally, if the given head pointer is invalid or the queue is empty, simply return 0.
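As a sketch, with the `list_for_each()` macro expanded by hand and a simplified `list_head` definition:

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

int q_size(struct list_head *head)
{
    /* invalid head or empty queue: report 0 nodes */
    if (!head || head->next == head)
        return 0;
    int len = 0;
    /* the loop that list_for_each(node, head) expands to */
    for (struct list_head *node = head->next; node != head; node = node->next)
        len++;
    return len;
}
```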
### `q_delete_mid()`
This function deletes the node right in the middle of the queue. Two examples illustrate it:
* **A queue with an odd number of nodes**
Given a queue with an odd number of nodes (say $n$), the node to delete is the $\frac{n+1}{2}$-th one.
For $n = 5$, the node to delete is the 3rd one (the exact middle).

* **A queue with an even number of nodes**
Given a queue with an even number of nodes (say $n$), the node to delete is the $\frac{n}{2}$-th one.
For $n = 4$, the node to delete is the 2nd one. Because $n$ is even, both the 2nd and 3rd nodes could be called the "middle", but here the middle node closest to `head` is defined to be the one deleted.

---
Since `q_new()` only keeps a `list_head` pointer to maintain the queue and stores no other information, the most direct approach is to traverse the whole queue once to learn the number of nodes, then walk to the middle node and remove it.
In terms of pointer manipulation, however, the **fast-slow pointer** technique can be used: prepare two `list_head` pointers named `slow` and `fast`, both initially pointing at the first node, then move them in a loop while traversing the queue:
* `slow`: after visiting a node, advance to the next node, i.e., `slow->next`
* `fast`: after visiting a node, advance two nodes ahead, i.e., `fast->next->next`
Using walking steps as an analogy: on every move, `fast` goes one step further than `slow`, so when `slow` has walked $n$ steps, `fast` has walked $2n$ steps. This property implies that **when `fast` reaches the tail node of the queue, `slow` has walked exactly half of the queue, landing on the middle node**. Thus, with fast-slow pointers, only one loop is needed to move both pointers: the moment `fast` finishes traversing the queue, the node `slow` points to is the middle node.
The `q_delete_mid()` implementation therefore uses the two pointers above plus an `element_t` pointer (`del_elem`); after `fast` has finished the traversal, `del_elem` uses `list_entry()` with `slow` to obtain the middle node itself, which is then deleted.
:::danger
Given that the List API targets a "circular" and "doubly-linked" list, rewrite the fast-slow pointer approach above and compare.
:::
A few final details:
* For a queue containing $n$ nodes:
    * If $n$ is even, the `fast` pointer ends up back at the queue's head pointer (`head`) cleanly, and `slow` finds the correct middle node.
    * If $n$ is odd, care is needed: because `fast` moves two steps at a time, at some point it lands exactly on the $n$-th node (the last one); moving two more steps would reach the first node again and keep the loop going, so `slow` would end up at an unexpected node.
    * Therefore, the loop must check that neither `fast->next` nor `fast->next->next` has returned to `head`.
* This function explicitly performs a **delete**, not a remove, so after unlinking the middle node, the memory allocated for that node must also be freed.
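The fast-slow traversal can be sketched as below. `find_mid` is a hypothetical helper name; the real `q_delete_mid()` additionally converts `slow` into an `element_t` with `list_entry()` and frees it, and the `list_head` definition here is a simplified stand-in.

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

/* Locate the middle node with the fast-slow pointer technique.
 * The loop checks both fast->next and fast->next->next against head,
 * which handles odd-length queues as discussed above. */
struct list_head *find_mid(struct list_head *head)
{
    struct list_head *slow = head->next, *fast = head->next;
    while (fast->next != head && fast->next->next != head) {
        slow = slow->next;       /* one step at a time */
        fast = fast->next->next; /* two steps at a time */
    }
    return slow;
}
```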
### `q_delete_dup()`
Given a **sorted queue**, this function must delete every node whose data appears more than once, i.e., all nodes carrying a duplicated string. For example:

The figure above shows a queue in which three nodes all hold the string `"alpha"`. Per `q_delete_dup()`'s behavior, all three of these nodes should ultimately be deleted, leaving only the nodes whose data appears exactly once:

In the end, the queue keeps two nodes (containing `"beta"` and `"gamma"` respectively).
---
Because the function assumes the queue is already sorted, nodes with duplicated strings must be adjacent, so two pointers (called `curr` and `iter` below) can traverse the whole queue. Concretely:
1. Use `list_for_each()` with `curr` to traverse the nodes; at the start of each loop iteration, `iter` points to the node right after `curr`.
2. Using `element_t` pointers and `list_entry()`, compare the strings of the nodes `curr` and `iter` point to.
    * If they are equal, set a boolean flag to `true` to mark that a duplicated string exists, advance `iter` to the next node, and repeat step 2.
    * If they differ, proceed to step 3.
3. Check the boolean flag for duplication:
    * If a duplicated string was found, start moving `curr` and delete every node from the one `curr` points to up to the node before `iter`.
    * If no duplicate was found, nothing further needs to be done.
Continuing the earlier example, the figures below illustrate the steps above:

As shown above, `iter` keeps advancing as described, because each node's data equals `curr`'s, until it stops at the node holding `"beta"`. Clearly, every node from `curr` up to `iter->prev` carries the string `"alpha"`, so all of them must be deleted, after which the situation is as in the next figure:

Because of the duplicated nodes in front, once `curr` has walked through and deleted them, it arrives at the node `iter` points to. Then `iter` points to the node after `curr` again, and the previous steps repeat.
* When `curr` points to the `"beta"` node, `iter` points to the `"gamma"` node; no duplicated string is found, so no node needs deleting. `curr` simply advances to the next node (where `iter` is) via `list_for_each()`.
* When `curr` points to the `"gamma"` node, `iter` has already returned to the queue's head pointer, and still no duplicated data has been found, so no node needs deleting.
Finally, `curr` takes one more step and likewise returns to the queue's head pointer, completing the function's operation.
### `q_swap()`
This function pairs up every two adjacent nodes in the queue and swaps the positions of the nodes within each pair. (With an odd number of nodes, the last node is left untouched.)
The operation only requires the following:
* Visit the nodes at positions 1, 3, 5, ... in order.
* Through the `next` member of `list_head`, access the node following each odd-position node, i.e., the nodes at positions 2, 4, 6, ...
* Swap each pair of nodes directly with `list_move()`.
The implementation accordingly uses `list_for_each()` and a `list_head` pointer (named `curr`) to traverse the nodes, and `list_move()` to move `curr->next` in front of `curr`, so that each pair swaps positions.
Two final details:
1. `curr` initially points to an odd-position node; after the swap, the node `curr` points to naturally ends up at an even position index, so the implicit `curr = curr->next` in `list_for_each()` already advances to the next odd-position node. In other words, **after `list_move()` completes, no extra handling of `curr`'s movement is needed.**
2. The first argument of `list_move()` is indeed `curr->next`, the even-position node to move, but note that the second argument must be `curr->prev`, **meaning the node pointed to by `curr->next` is moved to right after the node pointed to by `curr->prev`**.
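A sketch of this pairing swap, with `list_move()` expanded by hand (the `list_move_` helper name is hypothetical; the `list_head` definition is a simplified stand-in for `list.h`'s):

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

/* what list_move(node, pos) does: unlink node, re-insert right after pos */
static void list_move_(struct list_head *node, struct list_head *pos)
{
    node->prev->next = node->next; /* list_del(node) */
    node->next->prev = node->prev;
    node->prev = pos;              /* list_add(node, pos) */
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}

void q_swap(struct list_head *head)
{
    /* visit the 1st, 3rd, 5th ... nodes; an odd tail node is skipped */
    for (struct list_head *curr = head->next;
         curr != head && curr->next != head; curr = curr->next)
        list_move_(curr->next, curr->prev); /* move the even node in front */
}
```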
### `q_reverse()`
This operation reverses the entire queue, meaning:
* The first node becomes the last node, and the last node becomes the first.
* The second node becomes the second-to-last node, and vice versa.
* ...
The handling only requires visiting every node in order and swapping the memory addresses stored in the `prev` and `next` members of each node's `list_head`. Note, however, that the queue's head pointer (`head`) also refers to a `list_head` object, whose `next` stores the first node's address and whose `prev` stores the last node's address, so the "swap the contents of `next` and `prev`" action must equally be applied to the `list_head` object that `head` points to.
Put more simply: visit every `list_head` object in the queue in order and swap the contents of its two pointers, and the queue reversal is done.
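That summary can be sketched directly: one pass over every `list_head` object, including the one `head` points to, swapping its two links (a sketch with a simplified `list_head` definition).

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

/* Swap prev/next in every list_head object, head included. */
void q_reverse(struct list_head *head)
{
    struct list_head *curr = head;
    do {
        struct list_head *next = curr->next;
        curr->next = curr->prev; /* swap the two links of this object */
        curr->prev = next;
        curr = next;             /* advance along the original order */
    } while (curr != head);
}
```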
### `q_reverseK()`
This function is similar to `q_reverse()`: its purpose is to reverse the queue, except that it performs local reversals, a fixed number of nodes at a time.
Through the second parameter `k`, the developer specifies that every `k` nodes are reversed as a group. For example:

The figure above shows a queue with 6 nodes. With `k` set to 3, after running `q_reverseK()`:
* The first 3 nodes `"alpha" <--> "gamma" <--> "beta"` are treated as one group; reversing them yields `"beta" <--> "gamma" <--> "alpha"`.
* The same applies to the last 3 nodes: as shown in the figure, they end up in the order `"bbb" <--> "ttt" <--> "ppp"`.
---
In the implementation, `list_for_each()` and a `list_head` pointer (named `curr`) traverse the queue; another `list_head` pointer (named `iter`) first points at the node `curr` points to, then keeps advancing while counting, and once it has counted k nodes, the nodes between `curr` and `iter` are reversed.
Continuing the earlier example (a 6-node queue, running `q_reverseK()` with `k` set to 3), several figures walk through the processing:
1. At the very beginning, `curr` and `iter` point to the same node.

2. `iter` advances node by node while counting; once the count reaches 3, `iter` arrives at the 4th node.

At this point, all nodes from `curr` up to `iter->prev` (including the nodes `curr` and `iter->prev` point to) are treated as one group, on which the reversal is performed.
3. Before the reversal starts, three additional pointers `prev`, `new_head`, and `new_tail` (their purposes become clear shortly) are set to point to `curr->prev`, `iter->prev`, and `curr`, respectively.

Then, much as in `q_reverse()`, `curr` moves step by step until it reaches the node `iter` points to, swapping the memory addresses stored in `next` and `prev` of each node's `list_head` within the group.
4. Once `curr` has moved along, adjusting both pointers of each `list_head`, and has reached the node `iter` points to, the various pointers momentarily look like the figure below (pay extra attention to where the `next` and `prev` pointers of the first 3 nodes point):

As the figure shows, some pointers now point at the wrong nodes, so the following extra fixes are needed:
* For the `list_head` object `prev` points to (which here just happens to be the queue's head pointer `head`), `prev->next` should be changed to point to the node `new_head` points to.
* Thinking of each "group" to be reversed as a "sub"-queue (or "sub" linked list), `new_head` means the true first node of the sub-queue after its reversal; clearly, `new_head->prev` should point to the node `prev` points to.
* `new_tail` is the last node of the sub-queue after the reversal, and `new_tail->next` should point to the node `curr` points to.
* `curr` is now the node right after what was the sub-queue's last node before the reversal (the 3rd node), so `curr->prev` must also be fixed to point to the node `new_tail` points to.
5. Once the 4 pointer fixes described at the end of step 4 are done, the reversal of one group of nodes is complete:

The upper half of that figure shows the state after the pointer fixes; the lower half merely redraws the first 3 nodes in their adjusted relative positions.
As can be seen, the first 3 nodes have been properly reversed, and every pointer that needed fixing has been fixed.
Finally, by repeatedly finding the interval of nodes to reverse as described, reversing all nodes of the sub-queue, and fixing the 4 pointers, group after group, the local reversal of the entire queue is completed.
One last caveat: if the final group ends up with fewer than `k` remaining nodes, those remaining nodes are left unprocessed.
### `q_sort()`
This function's behavior is to sort all elements of the queue (ascending or descending). The original test requirement is that the queue must be sorted with an $O(n\log{n})$ sorting algorithm in order to pass the `trace-15-perf` test item.
For now, however, selection sort is used, even though it is an $O(n^2)$ algorithm. This is mainly to first understand how `qtest` and the related test items use the queue operations, so a simple algorithm stands in temporarily.
It is later rewritten as merge sort to improve this function's performance.
### `q_ascend()`
The goal of `q_ascend()` is to turn the whole queue into an ascending sequence, not by sorting, but by deletion: for each node, if some node after it holds a smaller string, that later node is deleted from memory.
In other words, by deleting nodes, only a subset of nodes remains, and their data is in ascending order. The current implementation takes the straightforward approach: visit a node, then search onward from that node's successor, and delete any nodes whose string is smaller than the current one.
### `q_descend()`
This function is analogous to `q_ascend()`, but it must turn the queue into a descending sequence; this operation may likewise delete some nodes.
The implementation starts from the queue's last node, walking backwards node by node until it returns to the queue's head. At each node, it scans the nodes in front of it for any whose string is smaller than the current node's, and deletes all such nodes.
### `q_merge()`
`q_merge()` takes multiple sorted queues and merges them into a single queue that remains sorted.
To merge two given queues, the implementation only needs two `list_head` pointers, each traversing one queue:
* Once the second queue's pointer (call it `q2_curr`) points at a node, it stays put while the first queue's pointer (call it `q1_curr`) advances step by step.
* `q2_curr` is the node about to be merged into the first queue; to find the right insertion position, `q1_curr` walks through the first queue step by step until it reaches a suitable node, at which point it helps insert `q2_curr` at the proper position.
* Whenever `q1_curr` and `q2_curr` each point at a node, `list_entry()` plus `element_t` pointers are used to fetch the strings and compare them.
    * If `q1_curr` should come after `q2_curr`, remove `q2_curr` from the second queue and insert it before `q1_curr`. Then `q2_curr` points at the second queue's new first node, i.e., the node that was `q2_curr->next` before the move.
    * Otherwise, `q1_curr` simply advances to the next node.
* Two terminating situations may occur:
    * `q1_curr` has walked back to the first queue's head pointer `head`; if the second queue still has remaining nodes, move them all to the tail of the first queue, and the merge is finished.
    * The second queue has become empty, meaning all of its nodes have been merged into the first queue; nothing more needs doing and the merge can end.
With this, `q1_curr` and `q2_curr` each traverse one queue, and the method above completes the merge of the two queues.
Since `q_merge()` is given multiple queues to merge, the current approach takes the 2nd, 3rd, 4th, ... queues one by one and merges each into the first queue using the method above.
### The additional `cmp_ascend()` and `cmp_descend()` functions
Because the second parameter of `q_sort()` decides whether the queue is sorted in **ascending** or **descending** order, two global static comparison functions were prepared; each takes 2 strings, compares them, and returns an appropriate boolean.
Both static functions name their parameters `str1` and `str2`. The sorting algorithm passes two distinct elements into them; if the return value is `true`, the sorting algorithm must swap the two elements' positions. Note that when passing the two elements, their positions are necessarily relative, one before the other: the element in front goes into `str1` and the later one into `str2`.
Both comparison functions essentially call `strcmp()` to compare the strings:
* `cmp_ascend()` returns whether "`strcmp()`'s return value is greater than 0".
    * The meaning is that if `str1` is greater than `str2`, then in ascending order the two strings should be swapped.
    * Conversely, if `str1` is less than or equal to `str2`, the two elements' relative order is already correct (ascending) and no swap is needed.
* `cmp_descend()` returns whether "`strcmp()`'s return value is less than 0".
    * The reasoning is the same as for `cmp_ascend()`, except it checks whether two elements are in descending order.
With this, `q_sort()` keeps an array of two function pointers pointing at the two comparison functions above, so the `descend` parameter alone selects and calls the corresponding comparison function directly, with no branch needed to decide between the ascending and descending comparison.
These two comparison functions are also reused in the implementation of `q_merge()`.
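The two comparators and the branch-free dispatch via a function-pointer array can be sketched as follows; `need_swap` is a hypothetical wrapper name standing in for the call site inside `q_sort()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* return true when str1/str2 are out of order for ascending sort,
 * i.e. the sorting algorithm should swap them */
static bool cmp_ascend(const char *str1, const char *str2)
{
    return strcmp(str1, str2) > 0;
}

/* return true when str1/str2 are out of order for descending sort */
static bool cmp_descend(const char *str1, const char *str2)
{
    return strcmp(str1, str2) < 0;
}

/* branch-free selection: descend (0 or 1) indexes the pointer array */
bool need_swap(const char *str1, const char *str2, bool descend)
{
    bool (*cmp[2])(const char *, const char *) = {cmp_ascend, cmp_descend};
    return cmp[descend](str1, str2);
}
```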
### The additional `q_is_empty()` macro
Many of the functions above need to check whether the queue is empty, which naturally means taking the queue's head pointer `head` and checking whether the `next` of the `list_head` object it maintains points back to `head` itself.
`list.h` already provides the `list_empty()` function containing exactly this check, but since the intent is specifically to ask "is the queue empty", a `q_is_empty()` macro was still defined: given `head`, it substitutes in a call to `list_empty()`, ultimately still returning `head->next == head`.
## Improving the Initial Implementation
### Fixing the bug in `q_remove_head()` and `q_remove_tail()`
When the node-removal functions are called with a valid `sp` pointer, they copy the removed node's string into the character array `sp` points to (whose size is given by `bufsize`).
The original implementation used `strncpy()`, copying the appropriate number of characters into `sp` according to `bufsize`; to leave room for the `'\0'` character, `bufsize - 1` was passed as `strncpy()`'s last argument before performing the copy.
It later turned out that `strncpy()` may copy exactly `bufsize - 1` characters without appending a `'\0'`, so the string ultimately stored in `sp` can misbehave when used later (e.g., `printf()` printing a string longer than `bufsize - 1` characters).
For both removal functions, the defect is fixed simply by explicitly appending a `'\0'` to `sp` after calling `strncpy()`:
* The fix corresponds to commit [4809388](https://github.com/DrXiao/lab0-c/commit/4809388b88946ac549181e1cbe40048c435e37e2)
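The fix boils down to one extra line after `strncpy()`. Below is a minimal sketch; `copy_value` is a hypothetical helper name standing in for the copy logic shared by both removal functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the removed node's value into sp, mirroring the fix:
 * strncpy() may stop after bufsize - 1 characters without writing
 * a '\0', so terminate explicitly afterwards. */
void copy_value(char *sp, const char *value, size_t bufsize)
{
    strncpy(sp, value, bufsize - 1);
    sp[bufsize - 1] = '\0'; /* the line the fix adds */
}
```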
### Rewriting `q_sort()` to use merge sort
`q_sort()` originally used selection sort to sort the queue. After testing with `qtest`, it became clear that sorting the queue requires not only getting the comparisons right, but also **paying special attention to whether every pointer ends up pointing at the right place**; the latter deserves particular care.
To satisfy the test requirements, the function has since been rewritten to sort with merge sort; the change corresponds to commit [ac55cdf](https://github.com/DrXiao/lab0-c/commit/ac55cdf2ab4d6600e701535d4d6e0283fcba1021).
---
Since the lab0-c queue is essentially implemented as a linked list, the following explanation of how `q_sort()`'s merge sort proceeds uses "linked list" as the primary term.
Merge sort on a linked list follows the same idea as on an array: treat every element as an individual unit, first merge them pairwise into small sorted sub-lists/sub-arrays, then keep merging and sorting adjacent sub-lists/sub-arrays pairwise until only one list/array remains, at which point sorting is complete.
Setting aside lab0-c's design for the moment, consider a linked list storing several integers, illustrated with a few figures:

The figure above shows a linked list containing several integers (1–8). To run merge sort, the first step, as the next figure illustrates, is to split the whole linked list into 8 sub-lists (each containing a single node):

The lower half of the figure uses dashed lines to show the list split into 8 sub-lists, which completes the first step. Next, all sub-lists are merged pairwise and sorted, over and over, until the entire linked list is sorted. The following figure illustrates this description:

Reading the figure from top to bottom:
* After step 1 splits off the sub-lists, the 8 sub-lists are first merged pairwise; the list then looks as shown in step 2.
* Step 2 leaves 4 sorted sub-lists; since several sub-lists remain, another round of pairwise merging leads to step 3.
* Step 3 leaves 2 sorted sub-lists; merging once more yields the state shown in step 4.
* At step 4 only 1 sorted sub-list remains, which is in fact the fully sorted linked list, so sorting is complete.
That is the principle of merge sort on a linked list. The only difference from arrays is that the nodes are scattered in memory, so pointers must be used to traverse them.
---
However, lab0-c mimics the Linux kernel by building a "doubly-linked" and "circular" linked list, and even maintains the list the same way the Linux kernel does. The following explains the details to watch out for, from an implementation point of view (the list nodes are still assumed to store integers).
Continuing the earlier linked-list example, the figure now draws the `list_head` objects inside each node:

The details to note are listed and explained as follows:
1. **How is the list split into multiple sub-lists?**
Because linked-list nodes are scattered in memory, the list cannot be split quickly by index (or a known array size) the way an array can. The corresponding approach is to use a `list_head` pointer to locate the middle node:

With this pointer to the middle node (`mid_ptr` in the figure), the first split can be made. In other words, sorting the whole linked list amounts to first sorting the sub-list `head->next` ~ `mid_ptr` and the sub-list `mid_ptr->next` ~ `head->prev`, then merging the two sub-lists.
Each sub-list is of course sorted the same way: find the sub-list's middle node with a pointer and recurse, until no further split is possible, at which point the ever-smaller sub-lists are sorted and merged back into the original list.
The keyword "recursion" just came up, so the implementation first defines a static function named `q_sort_interval()`; only the code performing the key behavior is listed here:
```c
static int q_sort_interval(struct list_head *head,
                           struct list_head *tail,
                           bool (*cmp)(const char *, const char *))
{
    /* Handle the simplest cases */
    if (head->next == tail) {
        ...
        return 1;
    }
    ...
    ...
    /* Split the lists into two sub-lists and sort them. */
    q_sort_interval(head, mid_ptr, cmp);
    ...
    q_sort_interval(mid_ptr->next, tail, cmp);
    /* Merge two sub-lists. */
    ...
    ...
}

static void q_sort(struct list_head *head, bool descend)
{
    bool (*cmp[2])(const char *, const char *) = {cmp_ascend, cmp_descend};

    if (!head || q_is_empty(head))
        return;
    q_sort_interval(head->next, head->prev, cmp[descend]);
}
```
Given the pointer to the list's first node, the pointer to its last node, and the comparison method (a pointer to a comparison function), `q_sort_interval()` performs the sort. Note that `q_sort_interval()` must not be given an empty list.
* For the simplest sorting cases, the list is sorted directly and execution ends.
    * The simplest cases are when `q_sort_interval()` finds the list has only one node or only two nodes; in either of those cases, it either swaps the two elements as needed or simply returns.
* Otherwise, if the list has enough nodes, it is split into two lists and the function recursively calls itself twice to sort the two sub-lists. Once both are sorted, the two sorted sub-lists are merged.
Thus `q_sort()` itself only needs to call `q_sort_interval()` with the memory addresses of the true first and last nodes (i.e., `head->next` and `head->prev`) to sort the entire list.
2. **Handling the two pointers of the `list_head` objects**
Point 1 above covered combining recursion with pointers, but there is a second detail: in the implementation, plain **pointers** are not enough; the technique of **pointers to pointers** (a.k.a. **indirect pointers**) is needed. The concrete reason is explained further below:

The expectation was that `q_sort_interval()`, given pointers to the first and last nodes, would complete the sort, but nothing above said how the `next` and `prev` pointers of the `list_head` object `head` points to should be properly adjusted. If mishandled, even after the sort finishes, the `list_head` maintained by `head` will point at the wrong nodes (as shown in the lower half of the figure above).
Here is another example. Because the list keeps being split into smaller lists and sorted, cases like this also arise:

From the earlier example it is easy to observe that after the original list is fully split, the nodes `4` and `1` are merged and sorted first, then the nodes `7` and `3`. The figure assumes the nodes `4` and `1` have already been sorted into `1` then `4`, and the nodes `7` and `3` are processed next.
When handling the sub-list holding `3` and `7`, a problem similar to the previous one occurs: since `q_sort_interval()` is only given the first and last nodes, the nodes `7` and `3` themselves do get sorted, **but the pointers of the nodes immediately before and after this sub-list are not adjusted** (i.e., in the lower half of the figure, the `next` pointer of node `4` and the `prev` pointer of node `5`), so traversing the list still yields the wrong order.
---
This can be summarized as: "If `q_sort_interval()`'s parameters are plain pointers, it cannot fix up the pointers of the `list_head` objects immediately before and after a (sub-)list."
Therefore, the indirect-pointer technique can be applied:
```c
static int q_sort_interval(struct list_head **head,
                           struct list_head **tail,
                           bool (*cmp)(const char *, const char *))
{
    /* Handle the simplest cases */
    if ((*head)->next == *tail) {
        ...
        return 1;
    }
    ...
    ...
    /* Split the lists into two sub-lists and sort them. */
    q_sort_interval(head, &mid_ptr, cmp);
    ...
    q_sort_interval(&mid_ptr->next, tail, cmp);
    /* Merge two sub-lists. */
    ...
    ...
}

static void q_sort(struct list_head *head, bool descend)
{
    bool (*cmp[2])(const char *, const char *) = {cmp_ascend, cmp_descend};

    if (!head || q_is_empty(head))
        return;
    q_sort_interval(&head->next, &head->prev, cmp[descend]);
}
```
After the parameters are rewritten to the `struct list_head **` type, the problem described above is solved: once a sub-list is sorted, the pointers of the `list_head` objects before and after the sub-list are adjusted as well.
Admittedly, the `list_head` objects around a sub-list could also be reached and modified using only `struct list_head *`, but that would require adding logic like "if the first/last node changed after sorting the sub-list, adjust the pointers of the `list_head` objects before and after it" to the code.
The course material [你所不知道的 C 語言: linked list 和非連續記憶體](https://hackmd.io/@sysprog/c-linked-list#%E5%BE%9E-Linux-%E6%A0%B8%E5%BF%83%E7%9A%84%E8%97%9D%E8%A1%93%E8%AB%87%E8%B5%B7) contains a "removing a node from a linked list" example: using plain pointers demands extra special-case handling, while using a pointer to a pointer eliminates the special case and keeps the code concise. Changing `q_sort_interval()`'s parameter types to pointers to pointers follows the same rationale.
With the approach above, `q_sort_interval()` sorts the list properly, completing the implementation of `q_sort` and satisfying the time-complexity requirement.
## Modifying the approach of `q_delete_mid()` and comparing
`q_delete_mid()` originally used the fast-slow pointer technique to find and delete the middle node, but considering that the List API builds a doubly-linked, circular list, there is another way to think about the implementation:
Picture a circular running track: if two people start from the same point and walk around the track in opposite directions at the same speed, then at the moment they meet, each has walked exactly half of the track's circumference. Likewise, for a circular, doubly-linked list, two pointers can be used: one moves from the first node towards the last node, the other starts from the last node and moves towards the first. The two pointers keep moving in a loop (one step per iteration), and the moment they meet, the middle node has been found.
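The meeting-in-the-middle traversal can be sketched as below. `find_mid_bidir` is a hypothetical helper name and the `list_head` definition is a simplified stand-in; the real code wraps this in the delete logic.

```c
#include <assert.h>

struct list_head {
    struct list_head *prev, *next;
};

/* Walk one pointer forward from the first node and one backward from
 * the last node; where they meet is the middle.  With an even node
 * count, front stops on the middle node closest to head, matching the
 * choice made by the fast-slow version. */
struct list_head *find_mid_bidir(struct list_head *head)
{
    struct list_head *front = head->next, *back = head->prev;
    while (front != back && front->next != back) {
        front = front->next;
        back = back->prev;
    }
    return front;
}
```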
The change following this approach corresponds to commit [c837f44](https://github.com/DrXiao/lab0-c/commit/c837f4495199f75a29df5ccf1a6eb28c74d3da4f) on the `improve-q_delete_mid` branch.
To compare the execution time before and after the change and find out which method is truly better, some additional code was added; after some preparation, the following techniques were used to produce and compare measurement data.
### Adding a `measure` command
Among `qtest`'s built-in commands, `time` can already measure a command's execution time, but its output precision only reaches the millisecond level, and it can only run a command once at a time.
To be able to "issue one command, run the action many times", I added a `measure` command; after launching `qtest`, the `help` command shows it as follows:
```
cmd> help
..
..
measure n cmd arg .. | Using clock_gettime to measure the time for each execution
..
..
```
In other words, my new `measure` command is similar to the `time` command, except that it uses `clock_gettime()` to measure execution time (reaching nanosecond precision), and its first parameter `n` makes the given command (`cmd arg`) run n times, printing each run's time to standard output (that is, via the `report()` function) on the command interface.
For example, it can be used like this:
```
cmd> new
l = []
cmd> measure 5 it RAND
l = [jceids]
elapsed time: 22777
l = [jceids gebwxleul]
elapsed time: 16983
l = [jceids gebwxleul pynqvwtc]
elapsed time: 17329
l = [jceids gebwxleul pynqvwtc vfvdaby]
elapsed time: 14588
l = [jceids gebwxleul pynqvwtc vfvdaby iwllwfjk]
elapsed time: 18163
```
The `qtest` session above first creates an empty queue, then runs `measure 5 it RAND`, meaning the operation `it RAND` is executed 5 times, with an `elapsed time: ...` message printed after each run. With the `measure` command, a developer who wants to time some queue operation can run it many times and print each run's time; the next thing to do is then to collect these times and analyze them to obtain the operation's average execution time (note: the average on one's own environment).
The change adding the `measure` command corresponds to commit [ef57f26](https://github.com/DrXiao/lab0-c/commit/ef57f26a48b2c7209b3e9feb72d3a0da34b838ca)
### Designing the `q_delete_mid()` experiment
1. Designing an initial test command sequence
With my own `measure` command available, I first designed the following command sequence (saved in a file `testcase.cmd`):
* `testcase.cmd`
```
new
it RAND 10000
log result-mytest
measure 30 dm
free
```
This tests the execution time of removing the middle node 30 times, starting from 10000 nodes. Since 30 nodes are removed in a row, 30 `elapsed time: ...` messages are printed, and the output messages are also saved to the `result-mytest` file.
Note:
* The first `elapsed time: ...` message is the time to remove the middle node out of 10000 nodes.
* The second `elapsed time: ...` message is the time to remove the middle node out of **9999** nodes.
* The third `elapsed time: ...` message is the time to remove the middle node out of **9998** nodes.
* ...
That is, each of the 30 middle-node removals faces a different total number of nodes, a distinction to keep in mind.
Afterwards, the following shell command runs `testcase.cmd` through `qtest` in a terminal:
```
cat testcase.cmd | ./qtest > /dev/null
```
This lets `cat` send all the operations in `testcase.cmd` into `qtest` at once through a shell pipe.
Supposing one wants to run `qtest` with the designed test-command combination 100 times and obtain the average time: the shell command above must be executed 100 times, producing 100 `result-mytest` files, whose contents are then extracted to compute the average execution time. (The concrete computation is described in point 3.)
To automate the process above, compute the average execution time, and produce test charts, a series of test tools was designed next.
---
2. Building an automated test script
```shell=
#!/bin/sh

if [ "$1" = "" ] || [ "$2" = "" ] || [ "$3" = "" ]; then
    echo "Usage: ./test_script.sh <n> <init_node> <rm_node>"
    echo "run a testcase n times and produce analysis figure."
    exit 1
fi

testcase="
new
it RAND $2
log result-mytest
measure $3 dm
free
"

mkdir -p result-mytest-dir

git checkout master
make
for i in $(seq 1 $1); do
    echo "$testcase" | ./qtest > /dev/null
    mv result-mytest result-mytest-dir/result-mytest$i
done
mv result-mytest-dir/result-mytest* measurement/mytest-master || exit 1

git checkout improve-q_delete_mid
make
for i in $(seq 1 $1); do
    echo "$testcase" | ./qtest > /dev/null
    mv result-mytest result-mytest-dir/result-mytest$i
done
mv result-mytest-dir/result-mytest* measurement/mytest-improve || exit 1

cd measurement
python3 measure.py $1 $2
```
This shell script performs several operations, described here against the script's line numbers:
* L1 - L7: print the usage, `./test_script.sh <n> <init_node> <rm_node>`
    * Since the earlier `testcase.cmd` was fixed at 10000 initial nodes and 30 removals, `<init_node>` specifies the initial node count and `<rm_node>` the number of removals, making the test more flexible.
    * `<n>` specifies running the command sequence through `qtest` $n$ times, producing $n$ `result-mytest` result files.
* L9 - L14: a variable named `testcase` stores the command sequence to run later, which is essentially the commands from the earlier `testcase.cmd`, except that `10000` is replaced by `$2` and 30 by `$3`, so the initial node count and the number of middle-node removals can be adjusted flexibly from the shell command line.
* L17: since many `result-mytest` files will be produced, `mkdir` creates a directory; all `result-mytest` files are later stored under the `result-mytest-dir` directory.
* L19 - L33:
    * For the `q_delete_mid()` implementation, the `master` branch uses the fast-slow pointer approach, while `improve-q_delete_mid` uses the new approach (two pointers moving towards the middle). The script must switch to each of the two branches in turn, build and run `qtest` with the `testcase` commands, and produce `result-mytest` files for later analysis.
    * So the script runs `git checkout` automatically:
    * It first switches to the `master` branch, builds `qtest` with `make`, runs `qtest` $n$ times in a `for` loop, and moves each run's `result-mytest` into the `result-mytest-dir` directory with a number appended to the filename. Afterwards, the many `result-mytest` files under `result-mytest-dir` are stored under the `measurement/mytest-master` directory.
    * Then it runs `git checkout` again to switch to `improve-q_delete_mid` and does the same, except all files under `result-mytest-dir` are finally moved under the `measurement/mytest-improve` directory.
* L35 - L36:
    * By my own design, running `test_script.sh` ultimately yields directories and files like the following:
```
.
├── console.c
...
├── measurement
│ ├── measure.py
│ ├── mytest-master
│ │ ├── result-mytest1
│ │ ├── result-mytest2
│ │ ...
│ └── mytest-improve
│ ├── result-mytest1
│ ├── result-mytest2
│ ...
│
├── qtest
...
```
That is, the test results of the `master` and `improve-q_delete_mid` branches are stored under `measurement/mytest-master` and `measurement/mytest-improve`, respectively.
(The `measurement` directory was created beforehand by hand; the purpose of `measure.py` is explained later.)
* Finally, `cd` switches the working directory into `measurement` and launches `measure.py`, performing the analysis with a Python program.
---
3. Analyzing, averaging, and charting with Python
After the shell script above, many `result-mytest<n>` files are obtained (`<n>` being the number the shell script finally assigned), i.e., the results of many runs. Looking at a `result-mytest<n>`, it is easy to see that the results of these runs can be mentally laid out as a table.
Suppose `./test_script.sh 50 1000 100` is run; then the `mytest-master` and `mytest-improve` directories should each end up with their own `result-mytest1` ~ `result-mytest50`, which can be laid out as the two tables below:
```
mytest-master
+--------------------+--------+--------+ +---------+
| \ sample | | | | |
| total nodes \ | 1 | 2 | . . . . . . . | 50 |
+--------------------+--------+--------+ +---------+
| 1000 | 170012 | 160020 | | 1611116 |
+--------------------+--------+--------+ +---------+
| 999 | 166611 | 165432 | .
+--------------------+--------+--------+ .
| 998 | 165555 | 164353 | . . . . . . . .
+--------------------+--------+--------+
. .
. . . . . . . . . . . .. . . . . . . . . .
. .
+--------------------+ +---------+
| 901 | . . . . . . . . . . . . . . . . | 1443232 |
+--------------------+ +---------+
```
```
mytest-improve
+--------------------+--------+--------+ +---------+
| \ sample | | | | |
| total nodes \ | 1 | 2 | . . . . . . . | 50 |
+--------------------+--------+--------+ +---------+
| 1000 | 171012 | 164420 | | 1613216 |
+--------------------+--------+--------+ +---------+
| 999 | 164611 | 165432 | .
+--------------------+--------+--------+ .
| 998 | 165555 | 157353 | . . . . . . . .
+--------------------+--------+--------+
. .
. . . . . . . . . . . .. . . . . . . . . .
. .
+--------------------+ +---------+
| 901 | . . . . . . . . . . . . . . . . | 1333232 |
+--------------------+ +---------+
```
* As mentioned in point 1, since the total number of nodes keeps decreasing while multiple nodes are removed, the many collected execution times must be grouped by total node count; hence the tables' `total nodes` axis, giving the node-removal execution time at each corresponding total node count.
* And since `test_script` ran `qtest` **50** times, there is the `sample` axis, making it clear that every `total nodes` entry has 50 execution-time samples.
With the explanation above understood, the next processing step follows:
```
mytest-master
+--------------------+--------+
| \ average| |
| total nodes \ | |
+--------------------+--------+
| 1000 | 160612 |
+--------------------+--------+
| 999 | 160011 |
+--------------------+--------+
| 998 | 163355 |
+--------------------+--------+
. . .
. . .
. . .
+--------------------+--------+
| 901 | 141212 |
+--------------------+--------+
```
```
mytest-improve
+--------------------+--------+
| \ average| |
| total nodes \ | |
+--------------------+--------+
| 1000 | 157312 |
+--------------------+--------+
| 999 | 156611 |
+--------------------+--------+
| 998 | 154555 |
+--------------------+--------+
. . .
. . .
. . .
+--------------------+--------+
| 901 | 142002 |
+--------------------+--------+
```
Since each row of the original tables holds 50 samples, taking the mean gives the average time to remove the middle node at a given total node count. The fully averaged tables can then be plotted as curves, so the `master` and `improve-q_delete_mid` implementations can be compared, with the visualized chart revealing the difference in execution time and identifying the better implementation.
Therefore, to convert all the `result-mytest<n>` files into tables, compute the average of every entry, and draw the curves, the strategy here is a Python script that does all of the above; its implementation is as follows:
```python
import matplotlib.pyplot as plt
import numpy as np
import sys

base_name = "result-mytest"


def outlier_filter(data, threshold=2):
    data = np.array([y for y in data])
    z = np.abs((data - data.mean()) / data.std())
    return data[z < threshold]


def parse_data(path, cases, init_node):
    all_data = None
    for i in range(cases):
        j = 0
        data = []
        with open(path + "/" + base_name + "%d" % (i + 1), "r") as f:
            for line in f:
                if line.startswith("elapsed time:"):
                    exec_time = int(line.split(":")[1].strip())
                    data.append([init_node - j, exec_time])
                    j += 1
        if all_data:
            for k in range(len(all_data)):
                all_data[k].append(data[k][1])
        else:
            all_data = data
    for i in range(len(all_data)):
        filtered_data = outlier_filter(all_data[i][1:]).mean()
        all_data[i] = [all_data[i][0], filtered_data]
    return all_data


def main():
    if len(sys.argv) < 3:
        print("Usage: python3 measure.py <num_of_data> <init_node>")
        exit(1)
    num = int(sys.argv[1])
    init_node = int(sys.argv[2])

    plt.figure(figsize=(8, 6))
    master_data = parse_data("mytest-master", num, init_node)
    improve_data = parse_data("mytest-improve", num, init_node)

    x_values, y_values = zip(*master_data)
    plt.plot(x_values, y_values, marker='o', linestyle='-',
             color='b', label="fast-slow pointer")
    x_values, y_values = zip(*improve_data)
    plt.plot(x_values, y_values, marker='o', linestyle='-',
             color='g', label="Two pointers move toward to each other")
    plt.legend()
    plt.show()


if __name__ == "__main__":
    main()
```
Since the code above was not pushed to GitHub, it is presented and explained here:
* Usage: `python3 measure.py <n> <init_node>`
    * The shell arguments tell the script the maximum `<n>` among the `result-mytest<n>` files.
    * Because the contents of a `result-mytest<n>` carry no information about the initial total node count, `<init_node>` tells the script the initial total node count of each test.
* Given the string `mytest-master` or `mytest-improve`, `parse_data()` reads `result-mytest1` ~ `result-mytest<n>` from the specified directory and converts them into Python `list` objects (i.e., turns the collected data into the table), then takes the mean of every row of data, producing the averaged table.
* Since the data may contain outliers, following the notes in 2022's [K04: fibdrv](https://hackmd.io/@sysprog/linux2022-fibdrv#%E7%94%A8%E7%B5%B1%E8%A8%88%E6%89%8B%E6%B3%95%E5%8E%BB%E9%99%A4%E6%A5%B5%E7%AB%AF%E5%80%BC), extreme values can be removed with statistical techniques; the `outlier_filter()` function was therefore designed so that `parse_data()` removes outliers first before computing the mean.
* The `main` function simply calls `parse_data()` to obtain the two tables, corresponding to the execution-time data of `master` and `improve-q_delete_mid`, then invokes `matplotlib`'s plotting features to draw the two curves.
---
Putting the explanations above together: after running `test_script.sh` with the sample count, the initial total node count, and the number of middle-node removals, the script tests on both branches and produces the measurement data, and finally the Python script helps produce a chart like the one below:

The horizontal axis of the chart is the total node count, and the vertical axis is the execution time (in nanoseconds), so **every point in the chart represents the average time to remove the middle node at the corresponding total node count**.
### Summary of the `q_delete_mid()` change
As laid out in the experiment design above, my original expectation was that by collecting many execution-time samples, obtaining `q_delete_mid()`'s average execution time, and plotting the chart, the before/after difference of this function's change could be compared.
However, after running `test_script` a few more times, the following phenomenon showed up:
* Running `./test_script.sh 50 10000 100` twice
    * The first run shows the new method being faster

    * But the second run shows the old method being faster

Changing the initial node count led to similar situations:
* Running `./test_script.sh 50 30000 100` twice


* Running `./test_script.sh 50 50000 100` twice


I originally expected that no matter how `test_script.sh` was run, it would consistently show one particular method being faster, but as described above, `test_script.sh` sometimes reports the new method as faster and sometimes the old one, making it impossible to judge which method is the best implementation.
Perhaps my experimental design has influencing factors I have not accounted for, causing the winner to flip between the new and old methods; the experiment and the test environment likely need to be considered more thoroughly before the analysis can correctly identify the best implementation.