2017q1 Homework4 (phonebook-concurrent)

contributed by <illusion030>

開發環境

illusion030@illusion030-X550LD:~/Desktop/2017sysprog/phonebook-concurrent$ lscpu
Architecture:          x86_64
CPU 作業模式：    32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
每核心執行緒數：2
每通訊端核心數：2
Socket(s):             1
NUMA 節點：         1
供應商識別號：  GenuineIntel
CPU 家族：          6
型號：              69
Model name:            Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
製程：              1
CPU MHz：             1649.475
CPU max MHz:           2600.0000
CPU min MHz:           800.0000
BogoMIPS:              4588.95
虛擬：              VT-x
L1d 快取：          32K
L1i 快取：          32K
L2 快取：           256K
L3 快取：           3072K
NUMA node0 CPU(s):     0-3

開發過程

重現實驗

fork phonebook-concurrent 後執行看看
append() 的時間甚至比 findName() 還低

Refactoring

參考 csielee 的共筆及 async.h 和 async.c
將 phonebook 的 function 包成 API，使得 main.c 中只需要呼叫 API
orig 的 API

struct __PHONEBOOK_API__ pb = { 
    .findLastName = orig_findName,
    .append = orig_append,
    .write = orig_write,
    .free = orig_free,
};

opt 的 API

struct __PHONEBOOK_API__ pb = { 
    .findLastName = opt_findLastName,
    .append = opt_append,
    .write = opt_write,
    .free = opt_free,
};

實際跑看看，發現因為把 text_align() 放到 append() 裡執行，所以導致 append() 的時間大幅提昇

Thread pool

參考 concurrency 教材中的參考文件 threadpool-mbrossard 的作法
將 phonebook 用 thread pool 的方式改寫
Thread pool 實作原理
建一個 queue 並且每個 thread 會等待 task 進入 queue，當我們把 task 放進 queue 中，threads 會去競爭這個 task，搶到 task 的 thread 就開始執行，執行完後回來等待 queue 的下一個 task。
當 task 很多時，可以達到平均 thread 執行時間的效果，因為當一個 thread 執行完不用等到其他 thread 結束，就可以繼續執行下一個 task 了

pthread_mutex_t lock;
threadpool_t *pool;

pthread_mutex_init(&lock, NULL);

assert((pool = threadpool_create(THREAD_NUM, QUEUE_SIZE)) != NULL);

for (i = 0; i < TASK_NUM; i++)
    threadpool_add(pool, (void *)&threads_append, (void *)thread_args[i]);
                           
while(TASK_NUM != done) ;

assert(threadpool_destroy(pool, 0) == 0);

實際執行後發現不同的 QUEUE_SIZE 和 TASK_NUM 會影響效率，於是固定 QUEUE_SIZE 為 15000，TASK_NUM 從 1 跑到 9901 每次加 100，THREAD_NUM 為 4
發現執行時間非常的浮動，間距在 0.072556 到 0.144134 之間

max = 0.144134 in 5301
min = 0.072556 in 1301

執行時間最小 ( TASK_NUM = 1301 )
執行時間最大 ( TASK_NUM = 5301 )
認為是 TASK_NUM 放進 queue 跟 thread 拿出去之間關聯性影響了時間，要讓每個 thread 一直都有事做才會達到預期的平衡每個 thread 執行時間的效果
還要再思考看看 TASK_NUM 要怎麼取到最佳值

要注意是否有 lock contention，可用 mutrace 追蹤 –jserv