# 2024q1 Homework5 (assessment)

contributed by < [youjiaw](https://github.com/youjiaw) >

## Reflections on reading〈因為自動飲料機而延畢的那一年〉and on the course

### Thoughts on the article

When putting his team together, the author recruited people from other departments, which touches on the gap between what schools teach and what real work demands: "computer science students cannot write programs, and mechanical engineering students cannot build machines." Even though universities arrange plenty of solid courses, students' practical ability remains weak. I have had the same doubts myself, for example: will taking these courses actually help? When will I ever use this material?

Looking at it from another angle, if that is simply how the environment is, shouldn't students put even more weight on self-directed learning and do something different from everyone else? During the project, the author drew on an embedded-systems course and on his experience building websites in college to write the drink machine's software, and the other members had accumulated different experience from visiting factories and grinding through countless exercises, which is what let them contribute to the project. All of these payoffs were bought with earlier sacrifices.

Beyond putting what one has already learned to good use, however, what may matter even more is how to face problem after problem in an unfamiliar field:

> 「你不能現在就放棄,要是現在就放棄的話,你這輩子日後遇到這種等級的困難,就只會想逃避而已。」
> ("You can't give up now. If you give up now, then every time you run into a problem of this level later in life, your only instinct will be to run away.")

This is the sentence that struck me the most. The author had never built a machine, yet he handled every detail of the drink machine. Even after repeated rounds of self-doubt, he never gave up when problems appeared; he kept learning, kept experimenting, and kept improving, and I deeply admire that spirit. At the same time, by constantly re-confirming what he wanted and what he did not, he made the trade-offs that let the project reach its goal within his own bottom line, which echoes the earlier point: to reach a goal, sometimes trade-offs and sacrifices are necessary.

### Thoughts on the course

I am exactly the kind of student the instructor described: I forget a good deal of every subject as soon as the exam is over, because in the past my motivation for studying was to score well on exams as a means to other ends, not to truly understand the material. This habit also made me realize that "recording my doubts" is genuinely difficult for me, because I am used to passively absorbing whatever will be tested rather than actively thinking and questioning whether an answer is actually true.

This course therefore made me re-examine my attitude toward learning. My weak fundamentals mean that reading the materials and doing the assignments take much longer than I expected, but as the instructor says, "be honest with yourself; whatever you lack, go make it up." Running away does not make a problem disappear; working hard to solve it is the right approach.

## Studying the course materials and CS:APP 3/e

### Questions

* The [video](https://www.youtube.com/watch?v=bVJ-mWWL7cE) mentions that with compiler optimization enabled, branching code is sometimes turned into branchless assembly, and its performance can even beat the branchless code we write by hand, so I would like to ask the instructor:
    1. How should we evaluate the branchless code we write ourselves? Is dumping and comparing the generated assembly the only way?
    2. If, after compiler optimization, the branching code and the branchless code produce the same assembly, which one runs faster?
* In the 4/18 lecture the instructor mentioned that the `fork_count` limit in [quiz 9, problem 1](https://hackmd.io/@sysprog/linux2024-quiz9#%E6%B8%AC%E9%A9%97-1) should be set according to the number of CPU cores, and that setting it too high leads to wrong results. Why is the upper bound of `fork_count` tied to the number of CPUs rather than to the system's `pid_max`?

Keywords noted in class to follow up on:

* branchless, https://hackmd.io/@sysprog/binary-representation, security
* CPU pipeline: hazard, stall, out-of-order execution
* ILP: instruction-level parallelism, instruction latency
* system (junior-high physics), load balance => CPU scheduling, book

## Brief description of the project I want to work on

TODO: read https://hackmd.io/@sysprog/concurrency and record questions
TODO: quiz9, quiz10, quiz12 (or others related to concurrency)

### Questions

1. [〈並行程式設計: Atomics 操作〉](https://hackmd.io/@sysprog/concurrency-atomics) mentions that `alignas` can avoid false sharing and improve memory-access efficiency, but when I searched the Linux kernel source I could not find this construct; the kernel instead aligns with the form `__attribute__((aligned(sizeof(long))))`. (A minimal alignment sketch follows this list.)
2. [Quiz 9, problem 1](https://hackmd.io/@sysprog/linux2024-quiz9#%E6%B8%AC%E9%A9%97-1) limits the number of forks to at most 5, but `fork_count` is not shared between processes. As the diagram below shows, after the first fork, `Task 1` and `Task 2` each record `fork_count = 1` in their own address spaces, so after the next round of forks the recorded `fork_count` is 2 even though three forks have actually happened. Does `fork_count` therefore really record the fork *depth*?

```graphviz
digraph g_fork {
    node [shape="record"];
    Main_task [label="{<l>Main_task|<r>fork_count = 0}", color=red]
    Main_task -> Fork_1;
    Task_1 [label="{<l>Task 1|<r>fork_count = 1}", color="#2b932b"]
    Task_2 [label="{<l>Task 2|<r>fork_count = 1}", color=blue]
    Fork_1 -> Task_1;
    Fork_1 -> Task_2;
    Task_1_1 [label="{<l>Task 1_1|<r>fork_count = 2}", color="#2b932b"]
    Task_1_2 [label="{<l>Task 1_2|<r>fork_count = 2}", color="#2b932b"]
    Task_1 -> Fork_2;
    Fork_2 -> Task_1_1;
    Fork_2 -> Task_1_2;
    Task_2_1 [label="{<l>Task 2_1|<r>fork_count = 2}", color=blue]
    Task_2_2 [label="{<l>Task 2_2|<r>fork_count = 2}", color=blue]
    Task_2 -> Fork_3;
    Fork_3 -> Task_2_1;
    Fork_3 -> Task_2_2;
}
```
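To make question 1 above concrete, here is a minimal sketch (mine, not taken from the course materials or the kernel) showing the two spellings of the same idea: padding each field out to its own cache line so that two threads updating different fields do not falsely share one. The struct names, field names, and the 64-byte line size are assumptions for illustration.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache-line size; real code should query the target CPU. */
#define CACHE_LINE 64

/* C11 spelling: alignas from <stdalign.h>. */
struct counters_c11 {
    alignas(CACHE_LINE) long produced; /* starts on its own cache line */
    alignas(CACHE_LINE) long consumed; /* so does this one */
};

/* GCC attribute spelling, the style Linux kernel code uses (the kernel
 * usually hides it behind macros such as ____cacheline_aligned). */
struct counters_gcc {
    long produced __attribute__((aligned(CACHE_LINE)));
    long consumed __attribute__((aligned(CACHE_LINE)));
};

/* Both layouts put the two fields at least one cache line apart. */
_Static_assert(offsetof(struct counters_c11, consumed) >= CACHE_LINE,
               "fields should land on different cache lines");
```

Note that `__attribute__((aligned(sizeof(long))))`, the form found in the kernel source, only aligns to word size; it is alignment to the cache-line size, as sketched here, that actually separates fields enough to avoid false sharing, which may be worth raising as part of the question.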
Wording corrections for the course materials:

1. The first sentence of [futex_wait](https://hackmd.io/@sysprog/concurrency-mutex#futex_wait) says that the "option" part of `FUTEX_WAIT_PRIVATE` is `FUTEX_WAIT`; should "option" be "operation"?
2. [futex_requeue](https://hackmd.io/@sysprog/concurrency-mutex#futex_requeue) says that if the waiters on `futex` exceed `limit`, then "all of the" waiters are woken up and moved to the wait queue of the futex word `other`; should "all of the" be "the remaining"?
3. In the instructor's〈Concurrency Primer〉, in the second line of the last paragraph of Chapter 2, `my_v = v` does not match the sample code `bv_v = v` (`my_v = v` is the sample code from Chapter 1).
4. In [〈案例: Thread Pool 實作和改進〉](https://hackmd.io/@sysprog/concurrency-thread-pool#%E5%AF%A6%E4%BD%9C), should the `bpp`-related functions and variables introduced at the start of the implementation section, and the comment `BPP` on line 324 of [tpool.c](https://github.com/sysprog21/concurrent-programs/blob/0d5c7a40381430f396aa099cb209c9bcb8369713/tpool/tpool.c#L324C9-L324C18), be corrected to `bbp`?

## Linux kernel term project: improving the concurrent (fork) merge sort

> Contributor: youjiaw
> [Project walkthrough video](https://youtu.be/7kZxFtMPFd8)

### Task overview

Starting from the concurrent (fork) merge sort given in [week 9, problem 1](https://hackmd.io/@sysprog/linux2024-quiz9#%E6%B8%AC%E9%A9%97-1), try to implement the possible improvements explained in class and recorded in the [2024-04-16 Q&A notes](https://hackmd.io/VyLWa5VNQG-EpDoi2WqTMw#david965154).

The original `merge_sort` function:

```c
void merge_sort(int *arr, const int len)
{
    if (len == 1)
        return;

    const int mid = len / 2;
    const int left_len = len - mid;
    const int right_len = mid;

    /* If forked too often, it gets way too slow. */
    if (fork_count < 5) {
        pid_t pid = fork();
        fork_count++;
        // printf("fork_count: %d\n", fork_count);
        if (pid == 0) {
            /* Child process */
            merge_sort(arr, left_len);
            exit(0);
        }
        // printf("pid: %d\n", pid);
        /* Parent process */
        merge_sort(arr + left_len, right_len);
        waitpid(pid, NULL, 0);
    } else {
        merge_sort(arr, left_len);
        merge_sort(arr + left_len, right_len);
    }
    memcpy(arr, merge(arr, left_len, arr + left_len, right_len),
           len * sizeof(int));
}
```

Improvement directions:

1. Avoid recursion to lower the cost of forking.
2. Pre-forking.
3. Implement a workqueue (see〈案例: Thread Pool 實作和改進〉).

### Implementing the workqueue and pre-forking

Following [tpool.c](https://github.com/sysprog21/concurrent-programs/blob/master/tpool/tpool.c) from [〈案例: Thread Pool 實作和改進〉](https://hackmd.io/@sysprog/concurrency/%2F%40sysprog%2Fconcurrency-thread-pool#%E4%B8%A6%E8%A1%8C%E7%A8%8B%E5%BC%8F%E8%A8%AD%E8%A8%88-Thread-Pool-%E5%AF%A6%E4%BD%9C%E5%92%8C%E6%94%B9%E9%80%B2), I added a thread pool.

First, `main` is changed to create the specified number of threads in the thread pool before `merge_sort()` is called, and to release those resources after sorting completes, reducing the overhead of dynamically creating and destroying threads.

```diff
+ pool = tpool_create(4);
  merge_sort(arr, N_ITEMS);
+ tpool_join(pool);
```

`merge_sort()` first splits the work at the midpoint into a left half and a right half, records each task's arguments in an `arg` structure, and then uses `tpool_apply()` to push each task's function and arguments onto the workqueue. Finally, it waits for the tasks to finish, releases the associated resources, and calls `merge()`.

```c=
static void merge_sort(int *arr, const int len)
{
    if (len == 1)
        return;

    const int mid = len / 2;
    const int left_len = len - mid;
    const int right_len = mid;

    arg left_task = {arr, left_len};
    arg right_task = {arr + left_len, right_len};

    tpool_future_t left_future = tpool_apply(pool, merge_sort_task, &left_task);
    tpool_future_t right_future = tpool_apply(pool, merge_sort_task, &right_task);

    tpool_future_get(left_future, 0);
    tpool_future_destroy(left_future);
    tpool_future_get(right_future, 0);
    tpool_future_destroy(right_future);

    memcpy(arr, merge(arr, left_len, arr + left_len, right_len),
           len * sizeof(int));
}
```

The new `arg` structure and the `merge_sort_task()` wrapper are shown below.

```c
typedef struct __arg {
    int *data;
    const int len;
} arg;

void *merge_sort_task(void *args)
{
    arg *task = (arg *) args;
    merge_sort(task->data, task->len);
    return NULL;
}
```

With this change, however, the program never runs to completion. Testing shows that every thread gets stuck in the `tpool_future_get()` call on line 16 of `merge_sort()`, where `pthread_cond_wait(&future->cond_finished, &future->mutex)` waits for a sub-task to finish but is never woken up.

Further analysis shows why: a thread fetches and executes a task via `jobqueue_fetch()` (line 1 below), and the `pthread_cond_broadcast()` that would end the wait above is only called after `merge_sort()` has returned (line 11 below), yet `merge_sort()` is itself waiting for its sub-tasks. Every task running on every thread is therefore waiting for sub-tasks that can never run, a wait that never ends.

```c=
void *ret_value = task->func(task->arg);
pthread_mutex_lock(&task->future->mutex);
if (task->future->flag & __FUTURE_DESTROYED) {
    pthread_mutex_unlock(&task->future->mutex);
    pthread_mutex_destroy(&task->future->mutex);
    pthread_cond_destroy(&task->future->cond_finished);
    free(task->future);
} else {
    task->future->flag |= __FUTURE_FINISHED;
    task->future->result = ret_value;
    pthread_cond_broadcast(&task->future->cond_finished);
    pthread_mutex_unlock(&task->future->mutex);
}
```

Since I have not found another way around this yet, I simply rewrote `merge_sort()` in a non-recursive form to avoid tasks waiting on their own sub-tasks.
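Every version above ends by calling a `merge()` helper that the note never shows. Purely for reference, here is a minimal sketch consistent with how the fork-based and thread-pool versions call it, `merge(arr, left_len, arr + left_len, right_len)` followed by a `memcpy()` of the returned buffer; the malloc-backed output buffer is my assumption, and the actual quiz code may manage this memory differently (as written, the returned buffer is never freed by the callers above).

```c
#include <stdlib.h>

/* Hypothetical reconstruction: merge two adjacent sorted runs into a newly
 * allocated buffer and return it, so the caller can memcpy() it back. */
int *merge(int *left, int left_len, int *right, int right_len)
{
    int *out = malloc((size_t) (left_len + right_len) * sizeof(int));
    int i = 0, j = 0, k = 0;

    while (i < left_len && j < right_len)
        out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    while (i < left_len)
        out[k++] = left[i++];
    while (j < right_len)
        out[k++] = right[j++];
    return out;
}
```

The non-recursive rewrite in the next section calls `merge(arr, left, mid, right)` with indices instead, so its helper follows a different signature; the sketch above only matches the recursive versions.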
### Switching to a non-recursive merge sort

In the non-recursive version, the tasks handed to the threads no longer contain `merge_sort()` itself, only `merge()`. `merge_sort()` partitions all of the `merge()` tasks up front and pushes them onto the workqueue. The updated `arg` structure and the `merge_task()` function are shown below.

```c
typedef struct __arg {
    int *arr;
    int left;
    int mid;
    int right;
} arg;

void *merge_task(void *args)
{
    arg *task = (arg *) args;
    // merge(task->arr, task->left, task->mid, task->right);
    memcpy(task->arr + task->left,
           merge(task->arr, task->left, task->mid, task->right),
           (task->right - task->left + 1) * sizeof(int));
    return NULL;
}
```

In `merge_sort()`, the calls that wait for a `future` to finish and release its resources can go either between the two loops or inside the innermost loop. If they are placed outside both loops, as below, not only is the size of the `futures` array hard to determine, but submitting every task at once also makes the execution order unpredictable, which can produce wrong results.

```c
void merge_sort(int *arr, int left, int right)
{
    int current_size;
    int left_start;
    tpool_future_t futures[N_ITEMS];
    int i = 0;

    for (current_size = 1; current_size <= right - left; current_size = 2 * current_size) {
        for (left_start = left; left_start < right; left_start += 2 * current_size) {
            int right_end = MIN(left_start + 2 * current_size - 1, N_ITEMS - 1);
            int mid = MIN(left_start + current_size - 1, N_ITEMS - 1);

            arg *merge_arg = malloc(sizeof(arg));
            merge_arg->arr = arr;
            merge_arg->left = left_start;
            merge_arg->mid = mid;
            merge_arg->right = right_end;

            futures[i++] = tpool_apply(pool, merge_task, merge_arg);
        }
    }
    for (int j = 0; j < i; j++) {
        tpool_future_get(futures[j], 0);
        tpool_future_destroy(futures[j]);
    }
}
```

Take `N_ITEMS = 4` as an example: the workqueue receives the two first-level tasks and the single second-level task. With 3 threads each grabbing one task, if the second-level task finishes before either first-level task, the sorted result can be wrong.

```
- Correct case
(4) (3) (2) (1)
  \ /     \ /
 (3, 4) (1, 2)
     \   /
  (1, 2, 3, 4)

- Incorrect case (first merge the first pair, then the first pair with the last pair, and finally the last pair)
(4) (3) (2) (1)
1st merge:
  \ /
 (3, 4) (2, 1)   -> unchanged
2nd merge:
      \   /
 (2, 1, 3, 4)    -> (2, 1) is assumed to be already sorted, so it is copied in order
3rd merge:
           \ /   -> sorts arr[2], arr[3]
 (2, 1, 3, 4)
```

If the waiting calls are placed between the two loops instead, each iteration submits only one level of tasks and runs all of them to completion before the next level is submitted, which rules out the nondeterminism described above.

```c
void merge_sort(int *arr, int left, int right)
{
    int current_size;
    int left_start;

    for (current_size = 1; current_size <= right - left; current_size = 2 * current_size) {
        int len = N_ITEMS / (2 * current_size) + 1;
        tpool_future_t futures[len];
        int i = 0;

        for (left_start = left; left_start < right; left_start += 2 * current_size) {
            int right_end = MIN(left_start + 2 * current_size - 1, N_ITEMS - 1);
            int mid = MIN(left_start + current_size - 1, N_ITEMS - 1);

            arg *merge_arg = malloc(sizeof(arg));
            merge_arg->arr = arr;
            merge_arg->left = left_start;
            merge_arg->mid = mid;
            merge_arg->right = right_end;

            futures[i++] = tpool_apply(pool, merge_task, merge_arg);
        }
        for (int j = 0; j < i; j++) {
            tpool_future_get(futures[j], 0);
            tpool_future_destroy(futures[j]);
        }
    }
}
```

If the waiting calls are placed inside the innermost loop instead, each submitted task is waited on before the next one is submitted, which gives up parallelism entirely. My guess, however, is that merging a task right after it is submitted, while its data is still in the upper levels of the memory hierarchy, should be friendlier to the cache, and it also needs no extra memory for a `futures` array.

```c
for (current_size = 1; current_size <= right - left; current_size = 2 * current_size) {
    for (left_start = left; left_start < right; left_start += 2 * current_size) {
        int right_end = MIN(left_start + 2 * current_size - 1, N_ITEMS - 1);
        int mid = MIN(left_start + current_size - 1, N_ITEMS - 1);

        arg *merge_arg = malloc(sizeof(arg));
        merge_arg->arr = arr;
        merge_arg->left = left_start;
        merge_arg->mid = mid;
        merge_arg->right = right_end;

        tpool_future_t future = tpool_apply(pool, merge_task, merge_arg);
        tpool_future_get(future, 0);
        tpool_future_destroy(future);
    }
}
```

### Sorting tests and performance evaluation

Two kinds of measurements are used:

1. Execution time.
   I originally planned to follow the paper [〈Dude, is my code constant time?〉](https://eprint.iacr.org/2016/1123.pdf) and use the CPU's cycle counter (the TSC register) to measure execution time, but the [related documentation](https://learn.microsoft.com/en-us/windows/win32/dxtecharts/game-timing-and-multicore-processors) notes that multiprocessor and multi-core systems do not guarantee that the counters are synchronized between cores, so using `rdtsc` requires pinning the program under test to a single CPU (setting its CPU affinity) to reduce the error. I therefore decided to measure with `clock_gettime()` instead, which avoids both CPU pinning and the synchronization problem (a minimal measurement sketch follows this list).
   > This use of RDTSC for timing suffers from these fundamental issues: Discontinuous values. Using RDTSC directly assumes that the thread is always running on the same processor. Multiprocessor and dual-core systems do not guarantee synchronization of their cycle counters between cores. ...
2. Performance analysis with the `perf` tool.

<!-- 3. Memory usage: measured with the massif tool from valgrind. -->
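A minimal sketch of the measurement harness implied by item 1: fill the array, time one `merge_sort()` run with `CLOCK_MONOTONIC`, and sanity-check that the output really is sorted. The helper names, the random input, and the single-run structure are illustrative assumptions; the actual benchmark repeats each size five times and averages, and it assumes the thread pool has already been created in `main()` as shown earlier.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* merge_sort() and N_ITEMS are the ones defined earlier in this note. */
void merge_sort(int *arr, int left, int right);

static bool is_sorted(const int *arr, int n)
{
    for (int i = 1; i < n; i++)
        if (arr[i - 1] > arr[i])
            return false;
    return true;
}

/* Time one sort of N_ITEMS pseudo-random integers and return seconds. */
static double measure_once(int *arr)
{
    for (int i = 0; i < N_ITEMS; i++)
        arr[i] = rand();

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    merge_sort(arr, 0, N_ITEMS - 1);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (!is_sorted(arr, N_ITEMS))
        fprintf(stderr, "sort result is incorrect\n");

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```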
All of the following tests create a fixed pool of 10 threads.

First, compare the two placements described above: waiting for completion and releasing resources between the two loops (hereafter method A) versus inside the innermost loop (hereafter method B).

**Execution time**

Using `clock_gettime()`, I measured 10 array sizes, starting at 100,000 elements and growing by 100,000 up to 1,000,000; each size was run 5 times and the execution times were averaged. Method A's average execution time is roughly 3x faster than method B's at every array size.

![compare_gt](https://hackmd.io/_uploads/SJaB_-i8A.png)

**Perf**

`perf stat` was used to analyze cache-misses, cache-references, instructions, and cycles.

```
$ perf stat --repeat 5 -e cache-misses,cache-references,instructions,cycles ./main
```

The results below match the earlier guess: method B runs slower and spends more instructions and cycles, but its cache-miss rate drops from 3.66% to 0.21%. What puzzles me here is why its instructions per cycle is higher.

Method A:

```
 Performance counter stats for './main' (5 runs):

        4139,2926      cache-misses              #  3.66% of all cache refs    ( +- 0.54% )
     11,3224,7852      cache-references                                        ( +- 1.03% )
    950,1385,8505      instructions              #  0.61 insn per cycle        ( +- 0.41% )
   1566,6598,5158      cycles                                                  ( +- 0.81% )

         12.7242 +- 0.0775 seconds time elapsed  ( +- 0.61% )
```

Method B:

```
 Performance counter stats for './main' (5 runs):

         860,1043      cache-misses              #  0.21% of all cache refs    ( +- 1.72% )
     41,6341,6998      cache-references                                        ( +- 0.87% )
   4812,2255,9812      instructions              #  0.75 insn per cycle        ( +- 0.08% )
   6422,3224,9672      cycles                                                  ( +- 0.50% )

         59.1235 +- 0.0966 seconds time elapsed  ( +- 0.16% )
```

After looking it up, cache-misses and cache-references count the last-level cache, so I measured the L1 data cache as well and found that method B makes almost 5x as many accesses.

Method A:

```
 Performance counter stats for './main' (5 runs):

    258,8940,6382      L1-dcache-loads                                               ( +- 1.19% )
     16,7917,5049      L1-dcache-loads-misses    #  6.49% of all L1-dcache accesses  ( +- 2.80% )
    153,9005,4198      L1-dcache-store                                               ( +- 0.36% )

          13.774 +- 0.459 seconds time elapsed   ( +- 3.33% )
```

Method B:

```
 Performance counter stats for './main' (5 runs):

   1154,0040,5239      L1-dcache-loads                                               ( +- 0.26% )
     73,3492,6426      L1-dcache-loads-misses    #  6.36% of all L1-dcache accesses  ( +- 0.91% )
    741,8025,0221      L1-dcache-store                                               ( +- 0.27% )

          55.836 +- 0.126 seconds time elapsed   ( +- 0.23% )
```

My original expectation was that method B, because it runs each task as soon as it is submitted, would have more L1 cache accesses with fewer misses, and fewer last-level-cache accesses, but the measurements do not match this, and I have not figured out why yet.

As for why method A, which parallelizes better, has a lower instructions-per-cycle figure, my first guess was that switching between threads causes more context switches, but the measurements are shown below.

Method A:

```
 Performance counter stats for './main' (5 runs):

        1286,8241      context-switches          ( +- 0.80% )
      3,1163,8462      branch-misses             ( +- 0.68% )
           1,5208      migrations                ( +- 6.61% )

         14.2010 +- 0.0878 seconds time elapsed  ( +- 0.62% )
```

Method B:

```
 Performance counter stats for './main' (5 runs):

        6420,2873      context-switches          ( +- 0.13% )
     11,1517,4417      branch-misses             ( +- 0.20% )
           2,9538      migrations                ( +- 5.98% )

          53.297 +- 0.114 seconds time elapsed   ( +- 0.21% )
```

Overall, method A is clearly better than method B in both execution time and resource consumption; I will keep investigating why method B's instructions per cycle comes out slightly higher.
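For reference, the instructions-per-cycle figures perf prints are simply the ratio of the instruction and cycle counters above (the counters are printed with digits grouped in fours):

$$
\mathrm{IPC}_A = \frac{9.50\times10^{10}}{1.57\times10^{11}} \approx 0.61,
\qquad
\mathrm{IPC}_B = \frac{4.81\times10^{11}}{6.42\times10^{11}} \approx 0.75
$$

In other words, method B retires roughly 5 times as many instructions as method A in about 4.1 times as many cycles, so the open question can also be phrased as why B needs that many more instructions and cycles at all, not only why its IPC comes out higher.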
#### Comparing the original code with the method A version

Until `fork_count` reaches its limit, every fork in the original code increases the number of live processes, so at most 2^n processes run at the same time. I therefore created the same number of threads for the comparison.

Testing shows that the execution time and resource usage of the original code barely change as the number of forks grows, whereas after my changes every cost grows with the number of threads.

Original code:

```
fork_count capped at 2:
 Performance counter stats for './t' (5 runs):

         353,7818      cache-misses              # 42.28% of all cache refs    ( +- 1.37% )
         836,6862      cache-references                                        ( +- 2.80% )
     69,9744,1954      instructions              #  1.58 insn per cycle        ( +- 0.00% )
     44,2110,0134      cycles                                                  ( +- 0.05% )
               11      context-switches                                        ( +- 20.85% )
        5417,7829      branch-misses                                           ( +- 0.04% )
                4      migrations                                              ( +- 52.68% )
          13,5404      page-faults                                             ( +- 0.00% )

         1.01920 +- 0.00303 seconds time elapsed  ( +- 0.30% )

fork_count capped at 3:
 Performance counter stats for './merge_sort' (5 runs):

         361,0411      cache-misses              # 38.77% of all cache refs    ( +- 1.38% )
         931,2532      cache-references                                        ( +- 4.38% )
     70,0072,1959      instructions              #  1.59 insn per cycle        ( +- 0.01% )
     43,9481,5979      cycles                                                  ( +- 0.02% )
               19      context-switches                                        ( +- 13.56% )
        5398,8739      branch-misses                                           ( +- 0.03% )
               10      migrations                                              ( +- 21.59% )
          13,5530      page-faults                                             ( +- 0.00% )

         1.01197 +- 0.00249 seconds time elapsed  ( +- 0.25% )

fork_count capped at 4:
 Performance counter stats for './t' (5 runs):

         361,2487      cache-misses              # 40.36% of all cache refs    ( +- 2.49% )
         895,0482      cache-references                                        ( +- 5.96% )
     70,0375,8024      instructions              #  1.58 insn per cycle        ( +- 0.00% )
     44,3996,1195      cycles                                                  ( +- 0.10% )
               24      context-switches                                        ( +- 17.66% )
        5403,8126      branch-misses                                           ( +- 0.07% )
               10      migrations                                              ( +- 34.35% )
          13,5783      page-faults                                             ( +- 0.00% )

         1.01479 +- 0.00355 seconds time elapsed  ( +- 0.35% )
```

My code:

```
4 threads:
 Performance counter stats for './main' (5 runs):

        4634,6027      cache-misses              #  8.37% of all cache refs    ( +- 0.34% )
      5,5372,3054      cache-references                                        ( +- 1.35% )
    385,1146,9270      instructions              #  0.73 insn per cycle        ( +- 0.27% )
    528,7587,6798      cycles                                                  ( +- 0.44% )
         251,5514      context-switches                                        ( +- 0.45% )
      1,1819,8786      branch-misses                                           ( +- 0.24% )
             2262      migrations                                              ( +- 8.48% )
          19,6335      page-faults                                             ( +- 0.00% )

          6.1997 +- 0.0377 seconds time elapsed   ( +- 0.61% )

8 threads:
 Performance counter stats for './main' (5 runs):

        4758,8317      cache-misses              #  5.53% of all cache refs    ( +- 0.34% )
      8,6049,3112      cache-references                                        ( +- 0.50% )
    690,0228,5445      instructions              #  0.62 insn per cycle        ( +- 0.26% )
   1107,2844,8769      cycles                                                  ( +- 0.43% )
         644,9611      context-switches                                        ( +- 0.36% )
      2,0449,7558      branch-misses                                           ( +- 0.19% )
             9244      migrations                                              ( +- 4.60% )
          19,6417      page-faults                                             ( +- 0.00% )

         10.0268 +- 0.0538 seconds time elapsed   ( +- 0.54% )

16 threads:
 Performance counter stats for './main' (5 runs):

        5521,1937      cache-misses              #  3.68% of all cache refs    ( +- 0.49% )
     14,9976,0521      cache-references                                        ( +- 0.37% )
   1216,9900,6744      instructions              #  0.56 insn per cycle        ( +- 0.31% )
   2178,4866,1794      cycles                                                  ( +- 0.32% )
        1326,1207      context-switches                                        ( +- 0.47% )
      3,2319,3069      branch-misses                                           ( +- 0.49% )
          15,9792      migrations                                              ( +- 9.34% )
          19,6589      page-faults                                             ( +- 0.00% )

          18.132 +- 0.112 seconds time elapsed    ( +- 0.62% )
```

#### Analysis of the results

Possible reasons the original fork-based version performs better:

1. `fork` creates independent processes with their own address spaces, which reduces resource contention.
2. Each child process exits as soon as it finishes, avoiding the overhead of long-lived workers.

Possible reasons my version performs worse:

1. Threads share one address space, which can cause more resource contention and synchronization overhead.
2. The threads only exit when the whole program ends, which may add extra management overhead.

I will keep exploring these questions and trying further optimizations to improve the performance of this concurrent merge sort.
