sysprog
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Write
        • Owners
        • Signed-in users
        • Everyone
        Owners Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Help
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Write
Owners
  • Owners
  • Signed-in users
  • Everyone
Owners Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    ## Linux 核心專題: 重做 lab0 > 執行人: ChenFuhuangKye [解說影片連結](https://www.youtube.com/watch?v=DKD8cJdd8eU&ab_channel=chenfu) ### Reviewed by `LindaTing0106` > 透過 Valgrind,可以觀察程式運作時的 cache miss 狀況,分析其原因,並改善程式碼。 有針對使 cache miss 降低的議題對程式碼進行改善了嗎? ### Reviewed by `kevinzxc1217` > 而 list_del 需要多次讀取 node 前後的節點位置,因此容易發生 cache miss 這裡為什麼讀取 node 前後節點位置會容易發生 cache miss 呢? ### Reviewed by `chloe0919` 1. 可以不需要貼上完整程式碼,貼上特別要說明的段落即可。 2. hackmd 撰寫請符合中文和英文字元之間要有空白字元 3. > 由於每次檢驗彼此獨立,因此 perf 計算出的 cache missing 數據,除非兩種演算法的結果差異達到一定程度,否則無法用於評估演算法的好壞。 可以再清楚說明為何每次檢驗彼此獨立,會影響 perf 計算出的 chche missing 在評估演算法上的效能。 ### Reviewed by `Lccgth` >我們可以發現,merge_sort 發生了 13,331,089 次 cache miss,這是因為遞迴呼叫容易多次出現 cache miss。 可以說明為什麼遞迴呼叫容易多次出現 cache miss。 遞迴呼叫中,每次的調用會存取不同的記憶體位置,可能會導致 cache miss。 ### Reviewed by `mesohandsome` > 重複執行 5 次,分別得到 執行 5 次是否太少? ### Reviewed by `dockyu` >如果 $p$ 小於 0.05 ,拒絕虛無假設,硬幣為不公平硬幣 >1. $\chi^2 = 36$ ,$p < 0.05$ , 1不符合虛無假設,硬幣為不公平硬幣 >2. $\chi^2 = 1$ , $0.50 < p < 0.25$ , $p > 0.05$ , 符合虛無假設,硬幣為公平硬幣 >3. $\chi^2 = 0$ , $p = 0.99 > 0.05$ , 符合虛無假設,硬幣為公平硬幣 可以說明 $p$ 是如何得出的。 > 首先確認自由度,由於硬幣只有正反面,假設正面出現機率為 x ,反面出現的機率為 1 - x ,由此可以知道拋硬幣的自由度為 1 。接著利用卡方值查表,舉第 1 組為例,此時的卡方值為 36 、自由度為 1 ,而當自由度為 1 且 p = 0.05 時,卡方值為 3.84 ,可以發現第一組的卡方值大於 p = 0.05 時,因此硬幣為不公平硬幣。 ### Reviewed by `hugo0406` 能補上觀察圖示「以數學分析來觀察 shuffle 1M 次的結果」的文字敘述嗎? ### Reviewed by `lintin528` 可以嘗試設計幾個資料集來測試不同的排序演算法之表現,又或是他們出現 `worst case` 或 `best case` 的情形並比較。 ### Reviewed by `randyuncle` 最後的用 valgrind 觀察 cache missing 的部份不需用把整個包含警告訊息的結果貼上來,可以只貼出關鍵的數據就好,方便閱讀。 ## 任務簡介 重做 lab0,並強化統計相關議題。 ## TODO: 依據第一次作業規範,重作 lab0 > 善用 git rebase 操作,在最新的 [sysprog21/lab0-c](https://github.com/sysprog21/lab0-c) 開發。 > 可適度彙整其他學員的成果,但務必指明出處並確認其內容有效 ### 以數學分析來觀察 shuffle 1M 次的結果 ![Figure_1](https://hackmd.io/_uploads/HkyhBKqL0.png) :::danger 改進上圖的 x 軸標示 > 已更新 ::: ``` Expectation: 41666 Observation: {'1234': 41392, '1243': 41907, '1324': 41598, '1342': 41563, '1423': 42164, '1432': 41460, '2134': 41834, '2143': 41677, '2314': 41740, '2341': 41727, '2413': 41207, '2431': 41311, '3124': 41722, '3142': 41696, '3214': 41826, '3241': 41624, '3412': 41788, '3421': 41507, '4123': 41514, '4132': 41613, '4213': 41921, '4231': 41857, '4312': 41867, '4321': 41485} chi square sum: 26.045792732683726 ``` :::danger 說好的數學呢? 列出 Lemma 和 Theorem,並詳盡討論。翻出久違的教科書。 ::: 可以透過卡方檢驗比較期望出現頻率分佈是否符合預期的分佈。 ### 卡方檢驗方法 計算卡方值 $\chi^2$ ,檢驗卡方值的大小。 卡方值的公式如下 $\chi^2=\sum^k_{i=1}\cfrac{(O_i-E_i)^2}{E_i}$ * $O_i$ 為類別觀察的次數 * $E_i$ 為類別期望的次數 * k 為類別的數量 * 自由度為 df = k-1 舉拋 100 次公平硬幣為例,我們期望會有 50 次正面、 50 次反面。 舉三個例子 1. 如果觀察到有 80 次正面、 20 次反面 * $\chi^2=\cfrac{(80-50)^2}{50} + \cfrac{(20-50)^2}{50} = 36$ 2. 如果觀察到有 55 次正面、45 次反面 * $\chi^2=\cfrac{(55-50)^2}{50} + \cfrac{(45-50)^2}{50} = 1$ 3. 如果觀察到有 50 次正面、 50 次反面 * $\chi^2=\cfrac{(50-50)^2}{50} + \cfrac{(50-50)^2}{50} = 0$ 可以發現當 $\chi^2$ 值越小,觀察次數與期望次數越接近,反之當 $\chi^2$ 值越大,觀察次數與期望次數差異越大,越不符合期望。 由於拋 100 次公平硬幣中,只有正面跟反面兩個類別,因此自由度為 1 。 現在有了卡方值以及自由度,可以透過查表來判斷是否為公平硬幣。 以上面三個例子為例, * 虛無假設 H0: 硬幣為公平硬幣 * 對立假設 H1: 硬幣為不公平硬幣 如果 $p$ 小於 0.05 ,拒絕虛無假設,硬幣為不公平硬幣 1. $\chi^2 = 36$ ,$p < 0.05$ , 1不符合虛無假設,硬幣為不公平硬幣 2. $\chi^2 = 1$ , $0.50 < p < 0.25$ , $p > 0.05$ , 符合虛無假設,硬幣為公平硬幣 2. $\chi^2 = 0$ , $p = 0.99 > 0.05$ , 符合虛無假設,硬幣為公平硬幣 ![image](https://hackmd.io/_uploads/HySvD7bDR.png) :::danger 修正圖片的存取權限 > 已修正 ::: 如果 shuffle 是公平的代表發牌結果出現頻率要一樣,以發了一百萬次牌為例,會有 24 種狀況,每種狀況出現的期望次數為 41666 次。 可以透過卡方檢驗比較期望出现頻率分佈是否符合預期的分佈。如果洗牌是公平的,代表發牌结果出現頻率應當相同。以發了一百萬次牌為例,會有24種情况,每種情況出現的期望次數為 41666 次。 配適度檢定建立的假設如下: * 虛無假設 H0: 數據符合出現的頻率分佈 * 對立假設 H1: 數據不符合出現的頻率分佈 計算的 p-value 為 0.299 大於 0.005 ,無法拒絕虛無假設,因此結論 shuffle 的結果符合期望的結果。 ```shell $ python test_shuffle.py Power_divergenceResult(statistic=np.float64(26.045119999999997), pvalue=np.float64(0.29874079973512085)) ``` ```python import numpy as np from scipy import stats f_obs = [41392, 41907, 41598, 41563, 42164, 41460, 41834, 41677, 41740, 41727, 41207, 41311, 41722, 41696, 41826, 41624, 41788, 41507, 41514, 41613, 41921, 41857, 41867, 41485] total_obs = sum(f_obs) f_exp = [total_obs / len(f_obs)] * len(f_obs) output = stats.chisquare(f_obs=f_obs, f_exp=f_exp) print(output) ``` ### 研讀 Linux 核心原始程式碼的 [lib/list_sort.c](https://github.com/torvalds/linux/blob/master/lib/list_sort.c) list_sort 參數的意義。 * `priv` : `list_sort` 函式不會去使用,而是傳遞給 `cmp` 比較函式,可以用於傳遞需要在比較函式中使用的額外訊息。 * `head` : 為鏈結串列的開頭。 * `cmp` : 為一個比較函式,用於比較兩個元素之間的排列順序,並回傳一個整數來表示兩個元素的比較結果。 由於 `head` 跟 `cmp` 不能為 NULL ,加上 `__attribute__((nonnull(2, 3)))` 讓編譯器可以發出警告。 ```c __attribute__((nonnull(2, 3))) void list_sort(void *priv, struct list_head *head, list_cmp_func_t cmp) ``` list_sort 初始值 * `list` : 這是一個指向鏈結串列中第一個節點的指標,並確保此鏈結串列數量不是 0 或是 1 。 * `pending` : 這是一個指向待處理節點的指標,其初始值為 NULL 。 * `count` : 這是一個整數值,用來紀錄待處理節點的數量。 :::danger 注意用語,務必詳閱 https://hackmd.io/@sysprog/it-vocabulary ::: list_sort 有分成三個區塊。 1. 逐一走訪每個節點,並把節點逐一加入 `pending` , 當 `pending` 的數量 + 1 後為 2 的冪就不合併,反之合併。 2. 當 `list` 中沒有節點時,將 `pending` 中所有節點合併。 3. 最後整理成環狀串列。 #### 執行步驟 ##### `在 do while` 之前 ```graphviz digraph ele_list { rankdir=LR node[shape=record] head [label="head", style="bold"] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] head -> node6 node6 -> node5 node5 -> node4 node4 -> node3 node3 -> node2 node2 -> node1 list -> node6:n } ``` ##### count = 1 加入 6 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node4 node4 -> node3 node3 -> node2 node2 -> node1 list -> node5:w pending -> node6:w } ``` ##### count = 2 加入 5 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node6 node4 -> node3 node3 -> node2 node2 -> node1 list -> node4:w pending -> node5:w } ``` ##### count = 3 加入 4 到 `pending` ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5 -> node6 node4 -> node5 node3 -> node2 node2 -> node1 list -> node3:w pending -> node4:w } ``` ##### count = 4 加入 3 到 `pending` ,5 節點的分支與 6 節點的分支合併 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node4 -> node5 node3 -> node4 node2 -> node1 list -> node2:w pending -> node3:w } ``` ##### count = 5 加入 2 到 `pending` , 3 節點的分支與 4 節點的分支合併。 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node3 -> node5 node3:s -> node4:n node2 -> node3 list -> node1:w pending -> node2:w } ``` ##### count = 6 加入 1 到 `pending` , 3 節點的分支與 5 節點的分支合併。 ```graphviz digraph ele_list { rankdir=LR node[shape=record] node6 [label="6", style="bold"] node5 [label="5", style="bold"] node4 [label="4", style="bold"] node3 [label="3", style="bold"] node2 [label="2", style="bold"] node1 [label="1", style="bold"] list [label="list", style="normal", color="white"] pending [label="pending", style="normal", color="white"] node5:s -> node6:n node4:s -> node5:n node3:s -> node4:n node2 -> node3 node1 -> node2 list pending -> node1:w } ``` ##### 當 `list` 沒有節點 將 1 節點的分支、 2 節點的分支與 3 節點的分支合併。 #### 最後將頭尾接上 ## 強化統計的基礎 > 例如針對排序和其分析,需要多大的樣本空間? ## TODO: 改進鏈結串列的排序 > 拉近與 Linux 核心鏈結串列排序的距離,甚至在特定狀況得以超越其性能。 ## TODO: 善用工具進行分析 > 如 perf, valgrind, massif 除了比較執行時間外,判斷排序演算法的好壞還可以觀察其發生 cache missing 的情況。由於 CPU 與 cache 之間的溝通速度遠快於與記憶體的溝通速度,如果能夠降低 cache missing 的次數,則代表排序演算法較優。然而,每次計算 cache missing 的結果都可能有所不同,因此找到適當的樣本大小來代表演算法的 cache missing 是非常重要的。 ### merge_sort 實作 merge_sort 可以利用 `Divide and Conquer` 使用遞迴方法,每次遞迴先將 list 分成兩個左右兩個區塊,直到 list 不能分割,接著合併左右 list 直到全部合併。 ```c void merge_sort(struct list_head *head, bool descend) { if (!head || list_empty(head) || list_is_singular(head)) return; struct list_head *mid = head; for (struct list_head *fast = head->next; fast != head && fast->next != head; fast = fast->next->next) mid = mid->next; LIST_HEAD(left); LIST_HEAD(right); list_cut_position(&left, head, mid); list_splice_init(head, &right); merge_sort(&left, descend); merge_sort(&right, descend); merge_list(head, &left, &right, descend); } void merge_list(struct list_head *head, struct list_head *left, struct list_head *right, bool descend) { LIST_HEAD(tmp); while (!list_empty(left) && !list_empty(right)) { element_t *left_entry = list_entry(left->next, element_t, list); element_t *right_entry = list_entry(right->next, element_t, list); if (descend ? strcmp(left_entry->value, right_entry->value) > 0 : strcmp(left_entry->value, right_entry->value) < 0) { list_move_tail(left->next, &tmp); } else { list_move_tail(right->next, &tmp); } } list_splice_tail_init(list_empty(left) ? right : left, &tmp); list_splice_init(&tmp, head); } ``` ### 使用 perf 計算資料排序程式的 cache missing 首先,利用 `lscpu` 得知測試環境的 L3 cache 大小為 12MB,`list_head` 大小為 16B 。可以將樣本大小設為 12 * 1024 * 64。我設計了一個測試程式,並測試。 ```c #include <stdio.h> #include <stdlib.h> #include "queue.h" #define N 12 * 1024 * 64 int main() { struct list_head *head = q_new(); for(int i = 0; i < N; i++) q_insert_head(head, "world"); q_sort(head, 0); return 0; } ``` ```shell $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz CPU family: 6 Model: 165 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 2 CPU max MHz: 5000.0000 CPU min MHz: 800.0000 BogoMIPS: 5199.98 ``` ```shell $ sudo perf stat -e cache-references,cache-misses ./merge_sort Performance counter stats for './merge_sort': 59,127,873 cache-references 23,874,713 cache-misses # 40.38% of all cache refs 0.377605633 seconds time elapsed 0.351518000 seconds user 0.023967000 seconds sys ``` 重複執行 5 次,分別得到 :::danger 為何重複 5 次呢?而不是其他次數?你的統計學素養在哪? ::: | | 第一次取樣 | 第二次取樣 | 第三次取樣 | 第四次取樣 | 第五次取樣 | |:----------------:|:----------:|:----------:|:----------:|:----------:|:----------:| | cache-references | 59127873 | 57615096 | 57638423 | 57674422 | 59593397 | | cache-misses | 23874713 | 24929428 | 24034141 | 24130806 | 24184553 | 如何確保這些分佈是否都相同可以利用卡方檢定 (Chi-Squared Test) 中的同質性檢定 ([Test of Homogeneity](https://courses.lumenlearning.com/wm-concepts-statistics/chapter/test-of-homogeneity/)) ,檢驗對於同一個類別變數,資料內不同群體出現的頻率分佈是否相同。 同質性檢定建立的假設如下: * 虛無假設 H0: 對於 cache missing 類別,不同的子群體之間存在**相同**的分佈。 * 對立假設 H1: 對於 cache missing 類別,不同的子群體之間存在**不同**的分佈。 卡方檢定中需要找出觀察與預期,我們需要定義出預期,這邊我們的虛無假設是『對於 cache missing 類別,不同的子群體之間存在**相同**的分佈。』,因此我們的同質性預期可以用 cache 狀況代表。 * cache-references ,總共有 291,649,211 * cache-misses ,總共有 121,153,641 * 預期的 cache missing 比例為 121,153,641 / 291,649,211 = 41.51% 利用 python 程式計算同質性檢定 ```python import numpy as np from scipy import stats output = stats.chi2_contingency( observed = np.array([ [35253160 ,32685668 ,33604282 ,33543616 ,35408844], [23874713 ,24929428 ,24034141 ,24130806 ,24184553] ]) ) print(output) ``` ``` $ python main.py Chi2ContingencyResult(statistic=np.float64(129007.8621825518), pvalue=np.float64(0.0), dof=4, expected_freq=array([[34565635.80424913, 33681279.64222307, 33694916.40176633, 33715961.10133328, 34837777.0504282 ], [24562237.19575087, 23933816.35777693, 23943506.59823367, 23958460.89866673, 24755619.94957181]])) ``` 可以得到 * Statistic ($x^2$) : 129007.8621825518 * 值越大代表觀察頻率與期望頻率差異顯著 * p-value: 0.0 * 值為 0小於0.05 ,在同質性檢定的結論,不是每次的 cache missing 都一樣 因此, 本次的同質性檢定得出各組數據關係不相同,而是互相獨立。雖然說可以大致推測出 cache missing 比例為 41.51% ,然而經過卡方檢驗後,可以得知每次取樣的結果彼此互相獨立。有可能與測試程式有關,因為計算 cache missing 也有包含 list 在增加 node 的過程。 #### 增加採樣次數 我寫了一個腳本,進行了 10000 次採樣,其 p-value 也接近於 0 。我好奇這些資料的分佈狀況,於是以 1% 為單位累加出現的次數,並畫出了cache miss機率分佈圖。結果顯示,cache missing 大致呈現常態分佈,但誤差範圍很大,這與同質性檢驗的結果一致,且本次的執行結果與之前 cache missing 結果不一樣。由於每次檢驗彼此獨立,因此 perf 計算出的 cache missing 數據,除非兩種演算法的結果差異達到一定程度,否則無法用於評估演算法的好壞。 ![Figure_1](https://hackmd.io/_uploads/Hy_e5DtUC.png) ```bash #!/bin/bash # Initialize result arrays cache_references_results=() cache_misses_results=() cache_correct_results=() # Repeat 10000 times for i in {1..10000} do # Run perf command and store the output in a variable output=$(perf stat -e cache-references,cache-misses ./merge_sort 2>&1) # Extract cache-references and cache-misses values from the output cache_references=$(echo "$output" | grep 'cache-references' | awk '{print $1}' | tr -d ',') cache_misses=$(echo "$output" | grep 'cache-misses' | awk '{print $1}' | tr -d ',') # Calculate cache_correct cache_correct=$((cache_references - cache_misses)) # Store the results in arrays cache_references_results+=("$cache_references") cache_misses_results+=("$cache_misses") cache_correct_results+=("$cache_correct") cache_miss_rate=$(awk "BEGIN {print ($cache_misses/$cache_references)*100}") # Print the extracted values for the current iteration echo "Run $i:" echo "Cache References: $cache_references" echo "Cache Misses: $cache_misses" echo "Cache Correct: $cache_correct" echo "Cache missing: $cache_miss_rate%" echo "---------------------------------" done # Write all cache references results to a file echo "${cache_references_results[@]}" > cache_references.txt # Write all cache misses results to a file echo "${cache_misses_results[@]}" > cache_misses.txt # Write all cache correct results to a file echo "${cache_correct_results[@]}" > cache_correct.txt # Print all results echo "All Cache References Results: ${cache_references_results[@]}" echo "All Cache Misses Results: ${cache_misses_results[@]}" echo "All Cache Correct Results: ${cache_correct_results[@]}" ``` ### 使用 valgrind 觀察 cache misssing valgrind 可以觀察到程式運作時 cache missing 發生的狀況,可以找出是那一行程式發生的多次的 cache missing 。 ``` $valgrind --tool=cachegrind ./merge_sort ==558249== ./.valgrindrc was not read as it is either not a regular file, ==558249== or is world writeable, or is not owned by the current user. ==558249== Cachegrind, a cache and branch-prediction profiler ==558249== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al. ==558249== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info ==558249== Command: ./merge_sort ==558249== --558249-- warning: L3 cache found, using its data for the LL simulation. --558249-- warning: specified LL cache: line_size 64 assoc 16 total_size 12,582,912 --558249-- warning: simulated LL cache: line_size 64 assoc 24 total_size 12,582,912 ==558249== brk segment overflow in thread #1: can't grow to 0x485b000 ==558249== (see section Limitations in user manual) ==558249== NOTE: further instances of this message will not be shown ==558249== ==558249== I refs: 2,264,282,423 ==558249== I1 misses: 1,334 ==558249== LLi misses: 1,326 ==558249== I1 miss rate: 0.00% ==558249== LLi miss rate: 0.00% ==558249== ==558249== D refs: 1,348,111,526 (871,215,763 rd + 476,895,763 wr) ==558249== D1 misses: 24,598,017 ( 19,308,724 rd + 5,289,293 wr) ==558249== LLd misses: 7,576,318 ( 4,952,807 rd + 2,623,511 wr) ==558249== D1 miss rate: 1.8% ( 2.2% + 1.1% ) ==558249== LLd miss rate: 0.6% ( 0.6% + 0.6% ) ==558249== ==558249== LL refs: 24,599,351 ( 19,310,058 rd + 5,289,293 wr) ==558249== LL misses: 7,577,644 ( 4,954,133 rd + 2,623,511 wr) ==558249== LL miss rate: 0.2% ( 0.2% + 0.6% ) $ sudo cg_annotate cachegrind.out.558249 -------------------------------------------------------------------------------- I1 cache: 32768 B, 64 B, 8-way associative D1 cache: 32768 B, 64 B, 8-way associative LL cache: 12582912 B, 64 B, 24-way associative Command: ./merge_sort Data file: cachegrind.out.558249 Events recorded: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Events shown: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw Thresholds: 0.1 100 100 100 100 100 100 100 100 Include dirs: User annotated: Auto-annotation: on -------------------------------------------------------------------------------- Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw -------------------------------------------------------------------------------- 2,264,282,423 (100.0%) 1,334 (100.0%) 1,326 (100.0%) 871,215,763 (100.0%) 19,308,724 (100.0%) 4,952,807 (100.0%) 476,895,763 (100.0%) 5,289,293 (100.0%) 2,623,511 (100.0%) PROGRAM TOTALS -------------------------------------------------------------------------------- Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw file:function -------------------------------------------------------------------------------- 352,321,488 (15.56%) 6 ( 0.45%) 6 ( 0.45%) 125,829,110 (14.44%) 4,121 ( 0.02%) 14 ( 0.00%) 73,138,163 (15.34%) 0 0 ???:merge_list 243,793,840 (10.77%) 1 ( 0.07%) 1 ( 0.08%) 121,896,920 (13.99%) 1,365 ( 0.01%) 5 ( 0.00%) 48,758,768 (10.22%) 10,090 ( 0.19%) 74 ( 0.00%) ???:list_empty 221,769,976 ( 9.79%) 29 ( 2.17%) 29 ( 2.19%) 37,750,909 ( 4.33%) 0 0 37,746,111 ( 7.91%) 1,572,303 (29.73%) 1,571,760 (59.91%) ./malloc/./malloc/malloc.c:_int_malloc 186,122,168 ( 8.22%) 5 ( 0.37%) 5 ( 0.38%) 98,828,268 (11.34%) 13,331,089 (69.04%) 3,119,860 (62.99%) 33,292,271 ( 6.98%) 1,365 ( 0.03%) 9 ( 0.00%) ???:merge_sort 173,958,096 ( 7.68%) 4 ( 0.30%) 4 ( 0.30%) 39,321,600 ( 4.51%) 4,919,860 (25.48%) 1,622,063 (32.75%) 0 0 0 ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S:__strcmp_avx2 173,015,040 ( 7.64%) 0 0 94,371,840 (10.83%) 0 0 62,914,560 (13.19%) 0 0 ???:list_add_tail 141,557,760 ( 6.25%) 1 ( 0.07%) 1 ( 0.08%) 78,643,200 ( 9.03%) 1,034,147 ( 5.36%) 209,220 ( 4.22%) 47,185,920 ( 9.89%) 3,277,277 (61.96%) 655,694 (24.99%) ???:list_del 125,829,120 ( 5.56%) 1 ( 0.07%) 1 ( 0.08%) 39,321,600 ( 4.51%) 0 0 39,321,600 ( 8.25%) 0 0 ???:list_move_tail 89,653,302 ( 3.96%) 6 ( 0.45%) 6 ( 0.45%) 33,030,163 ( 3.79%) 1 ( 0.00%) 0 25,165,839 ( 5.28%) 196,159 ( 3.71%) 196,156 ( 7.48%) ???:test_malloc 69,205,812 ( 3.06%) 7 ( 0.52%) 7 ( 0.53%) 17,301,518 ( 1.99%) 1 ( 0.00%) 1 ( 0.00%) 6,291,973 ( 1.32%) 0 0 ./malloc/./malloc/malloc.c:malloc 53,477,308 ( 2.36%) 2 ( 0.15%) 2 ( 0.15%) 26,738,654 ( 3.07%) 4,095 ( 0.02%) 15 ( 0.00%) 17,301,482 ( 3.63%) 10,770 ( 0.20%) 73 ( 0.00%) ???:list_splice 50,128,732 ( 2.21%) 3 ( 0.22%) 3 ( 0.23%) 12,582,920 ( 1.44%) 4 ( 0.00%) 2 ( 0.00%) 4,718,595 ( 0.99%) 0 0 ./stdlib/./stdlib/random_r.c:random_r 33,030,165 ( 1.46%) 3 ( 0.22%) 3 ( 0.23%) 12,582,920 ( 1.44%) 1 ( 0.00%) 1 ( 0.00%) 3,145,730 ( 0.66%) 0 0 ./stdlib/./stdlib/random.c:random 32,243,671 ( 1.42%) 4 ( 0.30%) 4 ( 0.30%) 17,301,482 ( 1.99%) 3,699 ( 0.02%) 17 ( 0.00%) 9,437,172 ( 1.98%) 212,607 ( 4.02%) 197,017 ( 7.51%) ???:list_cut_position 29,884,435 ( 1.32%) 3 ( 0.22%) 3 ( 0.23%) 9,437,190 ( 1.08%) 1 ( 0.00%) 1 ( 0.00%) 4,718,595 ( 0.99%) 0 0 ???:fail_allocation 29,097,966 ( 1.29%) 1 ( 0.07%) 1 ( 0.08%) 11,010,041 ( 1.26%) 0 0 4,718,589 ( 0.99%) 0 0 ???:list_is_singular 28,311,744 ( 1.25%) 31 ( 2.32%) 30 ( 2.26%) 14,155,838 ( 1.62%) 1,049 ( 0.01%) 13 ( 0.00%) 20 ( 0.00%) 1 ( 0.00%) 1 ( 0.00%) ???:??? 28,311,528 ( 1.25%) 2 ( 0.15%) 2 ( 0.15%) 14,155,764 ( 1.62%) 0 0 9,437,176 ( 1.98%) 0 0 ???:INIT_LIST_HEAD 26,738,654 ( 1.18%) 1 ( 0.07%) 1 ( 0.08%) 13,369,327 ( 1.53%) 0 0 8,650,741 ( 1.81%) 4,049 ( 0.08%) 12 ( 0.00%) ???:list_splice_tail 25,952,256 ( 1.15%) 3 ( 0.22%) 3 ( 0.23%) 8,650,752 ( 0.99%) 0 0 7,077,888 ( 1.48%) 0 0 ???:q_insert_head 25,165,792 ( 1.11%) 1 ( 0.07%) 1 ( 0.08%) 7,864,310 ( 0.90%) 0 0 7,864,310 ( 1.65%) 0 0 ???:list_splice_init 22,806,540 ( 1.01%) 3 ( 0.22%) 3 ( 0.23%) 1,572,865 ( 0.18%) 0 0 3,145,730 ( 0.66%) 0 0 ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms 20,447,245 ( 0.90%) 0 0 9,437,190 ( 1.08%) 0 0 4,718,595 ( 0.99%) 0 0 ???:find_footer 18,874,368 ( 0.83%) 2 ( 0.15%) 2 ( 0.15%) 6,291,456 ( 0.72%) 0 0 5,505,024 ( 1.15%) 0 0 ???:test_strdup 17,301,504 ( 0.76%) 1 ( 0.07%) 1 ( 0.08%) 9,437,184 ( 1.08%) 0 0 6,291,456 ( 1.32%) 0 0 ???:list_add 12,582,896 ( 0.56%) 0 0 3,932,155 ( 0.45%) 0 0 3,932,155 ( 0.82%) 0 0 ???:list_splice_tail_init 11,796,480 ( 0.52%) 2 ( 0.15%) 2 ( 0.15%) 2,359,296 ( 0.27%) 0 0 1,572,864 ( 0.33%) 0 0 ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms 11,010,048 ( 0.49%) 1 ( 0.07%) 1 ( 0.08%) 1,572,864 ( 0.18%) 1 ( 0.00%) 1 ( 0.00%) 0 0 0 ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S:__strlen_avx2 6,291,474 ( 0.28%) 0 0 2,359,300 ( 0.27%) 0 0 786,437 ( 0.16%) 0 0 ???:main 3,145,219 ( 0.14%) 2 ( 0.15%) 2 ( 0.15%) 0 0 0 1 ( 0.00%) 0 0 ./malloc/./malloc/arena.c:malloc -------------------------------------------------------------------------------- The following files chosen for auto-annotation could not be found: -------------------------------------------------------------------------------- ./malloc/./malloc/arena.c ./malloc/./malloc/malloc.c ./stdlib/./stdlib/random.c ./stdlib/./stdlib/random_r.c ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S ./string/../sysdeps/x86_64/multiarch/strcmp-avx2.S ./string/../sysdeps/x86_64/multiarch/strlen-avx2.S ``` 我們可以發現,merge_sort 發生了 13,331,089 次 cache miss,這是因為遞迴呼叫容易多次出現 cache miss。同樣地,list_del 出現了 1,034,147 次 cache miss,這是由於 list_move_tail 會呼叫 list_del,而 list_del 需要多次讀取 node 前後的節點位置,因此容易發生 cache miss。透過 Valgrind,可以觀察程式運作時的 cache miss 狀況,分析其原因,並改善程式碼。

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully