B07: phonebook-concurrent

tags: `sysprog2017`

主講人: jserv / 課程討論區: 2017 年系統軟體課程

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

返回「作業系統設計與實作」課程進度表

預期目標

學習第 4 週提及的 concurrency 程式設計
學習 POSIX Thread
學習效能分析工具
code refactoring 練習
探索 clz 的應用

Code Refactoring

refactoring（重構）的定義：「在不改變軟體的外在行為之下，改善既有軟體的內部設計」
侯捷：「作為一個程式員，任誰都有看不順眼手上程式碼的經驗 —— 程式碼來自你鄰桌那個菜鳥，或三個月前的自己。面臨此境，有人選擇得過且過；然而根據我對『程式員』人格特質的了解，更多人盼望插手整頓。挽起袖子劍及履及，其勇可嘉其慮未縝。過去或許不得不暴虎憑河,忍受風險。現在，有了嚴謹的重構準則和嚴密的重構手法，『穩定中求發展』終於有了保障。」
Refactoring 的示範請見李昆憶共筆
延伸閱讀:
- 《重構》(原文書名: Refactoring - Improving the Design of Existing Code)
- 什麼是Refactoring？
- 對於重構的兩則常見誤解

clz 的應用: 加速記憶體管理

摘錄自 MIT 的論文〈SuperMalloc: A Super Fast Multithreaded Malloc for 64-bit Machine〉

Small objects are as small as 8 bytes, and increase in size by at most 25% to limit internal fragmentation. Small object sizes are regularly spaced: the sizes are 8, 10, 12, 14, 16, 20, 24, . . . , 224, and their sizes take the form k · 2ⁱ
for 4 ≤ k ≤ 7 and 1 ≤ i ≤ 5. In other words, a small object size, when written in binary is a 1, followed by two arbitrary digits, followed by zeros. Thus bin 0 contains 8-byte objects, bin 1 contains 10-byte objects, and so forth. To compute the bin number from a small size can be done with bit hacking in O(1) operations.

以下程式碼轉換 size 為 bin number，透過 clz() 可得到一個數值用二進位表示中開頭的 "0" 的數量:

int size_2_bin(size_t s)
{
    if (s <= 8) return 0;
    if (s <= 320) {
        // Number of leading zeros in s.
        int z = clz(s);
        // Round up to the relevant
        // power of 2.
        size_t r = s + (1ul << (61 - z)) -1;
        int y = clz(r);
        // y indicates which power of two.
        // r shifted and masked indicates
        // what the low-order bits are.
        return 4 * (60 - y)+ ((r >> (61 - y)) & 3);
    }
    if (s <= 448) return 22;
    if (s <= 512) return 23;
    ...
    if (size <= 1044480) return 45;
    return 45 + ceil(size-1044480, 4096);
}

SuperMalloc (我們課程修改過的版本)，編譯和測試方式:

$ git clone https://github.com/sysprog21/SuperMalloc
$ cd SuperMalloc/release
$ make
$ LD_PRELOAD=lib/libsupermalloc_pthread.so ./test-malloc_test

執行最後一行需要等待，請抱持耐心，預期會得到類似 8996967 的數值輸出

之後可在執行時期 (runtime)，透過下方命令指定用 SuperMalloc (不需要重新編譯程式碼)

$ LD_PRELOAD=libsupermalloc_pthread.so所在的路徑 接著想執行的程式路徑

透過 POSIX Thread 搭配 mmap 的初步實驗

由吳彥寬貢獻的實驗
- 程式解說 video (必看!)
透過 mmap 避免 blocking I/O
透過 POSIX Thread 建立各自獨立的 linked list，最終再合併為單一 list

thread num = 1:
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
thread num = 2
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
thread num = 4
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
thread num = 8:
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
thread num = 16:
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →

這邊很清楚可見，太多 thread 會導致更慢的執行時間，而執行緒數量為 1, 2, 4 時，沒有顯著差異，問題出在前述實做使用 sync 的方式，所以若有一個thread跑比較慢，那總體時間也就會比較慢。

作業要求

在 GitHub 上 fork phonebook-concurrent，然後適度修改 phonebook_opt.c 和相關的檔案
除了修改程式，也要編輯「作業區」，增添開發紀錄和 GitHub 連結
- 至少涵蓋研讀 concurrency 教材的認知、程式正確性驗證、效能分析實驗 (必須要有圖表)，以及充份說明你如何改善效能
延續 B01: phonebook 的開發方向，本作業著重於透過 POSIX Thread 來縮減 alloc() 的時間成本
- 詳細閱讀吳彥寬的實驗，指出他的實做缺失，並提出改進縮減 append() 時間的可行提案，接著開發程式來驗證
- 提示：可透過建立 thread pool 來管理 worker thread
- 需要一併實作出刪除 (remove) 特定資料的功能，並且分析對效能的影響。要留意 race condition 和正確性議題。
- 第一週 phonebook 未完成和待改進項目也一併在 phonebook-concurrent 的基礎下進行
學習 concurrent-ll (concurrent linked-list 實作) 的 scalability 分析方式，透過 gnuplot 製圖比較 list 操作的效能
一併嘗試重構 (refactor) 給定的程式碼，使得程式更容易閱讀和維護。延續 B05: introspect，不只是在共筆上用文字提出良性詳盡的批評，也該反映在程式碼的變革
截止日期：
- Mar 27, 2017 (含) 之前
- 越早在 GitHub 上有動態、越早接受 code review，評分越高

挑戰題

分析 SuperMalloc 對於多執行緒程式的效能影響
引入 Cache-friendly binary search
- 相關討論: Hacker News

共筆選讀

LanKuDot (李昆憶): 共筆

用 valgrind 檢查有無 memory leak 的問題。他發現原本吳彥寬同學撰寫的程式碼中，malloc 佔 28 次，卻只有 13 次 free。
如何修正 memory leak 呢？做了以下分析:
- 既然 THREAD_NUM 是在 compilation 時期指定，則隨 THREAD_NUM 變動的 array 長度可以不用使用 malloc
  thread_args 的元素也是透過 malloc 取得，要 free 掉
- 在 findName() 中會給找到的 entry 的 lastname 和 detail 配給記憶體，但是沒有處理已經被搜尋過的 entry 不需要在配置記憶體的情況，像在主程式中搜尋 zyxel 就有三次，造成前兩次配置的記憶體遺失。

c14006078 (吳彥寬): 共筆

思考到一個很基本的問題：malloc 和 assert 往往要針對不同型態，撰寫不同版本的定義，能否透過類似 C++ template 來實做呢？C11 有 _Generic 關鍵字，可做出類似的效果。
使用可以很精巧，可見共筆的 "Template in C" 一節。

TempoJiJi (邱靖吉): 共筆
tundergod (林文盛): 共筆

實作lock-free thread pool

CheHsuan (林哲亘): 共筆

整理 mmap 和 msync 的資訊。
有了基本概念後，我們可嘗試變更 mmap() 的參數，從 MAP_SHARED 換成 MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS 再對照執行時間的變化。

kevinbird61 (瞿旭民): 共筆

透過 setjmp / longjmp 實作 coroutine 的方式，並且實際撰寫程式來測試，他建立兩個 coroutine: A, B。
在他的測試中，setjmp 把資訊儲存於 bufferA 後執行 B routine後，再藉由 longjmp 回到當初於 A routine 中呼叫 B 的位置；藉此可以達到再兩個函式中切換；把 routine A, B 換成兩條thread，便是 thread 間維持 sequential consistency 的實作。