As noted in http://forumloverz.blog18.net, `intptr_t` can be used to store a pointer value as an integer:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main()
{
    int i = 10;
    int *p = &i;
    intptr_t pp = (intptr_t) p; /* a data pointer round-trips through intptr_t */
    printf("%" PRIxPTR "\n", pp);
    return 0;
}
```

So in the struct here, the linked-list node, the `data` field is stored with exactly this kind of type conversion.

After `node` is defined, we can go on to define `llist_t`:

```c=
typedef struct llist {
    node_t *head;
    uint32_t size;
} llist_t;
```

With the types defined, next come the operations:

```c=
// create a new list object
llist_t *list_new();
// insert the given data into the list
int list_add(llist_t *the_list, val_t val);
// print the list contents one by one
void list_print(llist_t *the_list);
// create a new node
node_t *new_node(val_t val, node_t *next);
// get the node at a given index
node_t *list_get(llist_t *the_list, uint32_t index);
```

That is the whole content of `list.h`.

### [Thread pool](http://swind.code-life.info/posts/c-thread-pool.html) (cf. [Toward Concurrency](https://hackmd.io/s/H10MXXoT))

Here I also read [C 的 Thread Pool 筆記](http://swind.code-life.info/posts/c-thread-pool.html); the figure below illustrates the structure.

![](https://i.imgur.com/T4EBNbc.png)

#### Thread pool data structures

First of all, a thread pool needs jobs, or tasks, so that the threads know what work they have to do.

Define `task_t` to encapsulate a task:

```c=
typedef struct _task {
    void (*func)(void *);      /* the function that is eventually executed */
    void *arg;                 /* the argument passed to it */
    struct _task *next, *last; /* the queue is a doubly-linked list,
                                  so we keep next and last pointers */
} task_t;
```

Here the task queue is built as a doubly-linked list.

Define the free operation for a task:

```c=
int task_free(task_t *the_task);
```

So all we need is a data structure that records the function pointer to execute and the argument to pass to it.

Next comes the thread pool itself, which must hold all the threads and the job queue. First, the queue structure the thread pool needs:

```c=
typedef struct {
    task_t *head, *tail;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t size;
} tqueue_t;
```

Here `task_t` pointers are used to keep track of the task queue inside the thread pool: since the tasks themselves form a doubly-linked list, `head` and `tail` simply point to the two ends of the queue.

It also uses a mutex (see: [[轉]Linux 中程式同步處理概念 - Mutex](http://huenlil.pixnet.net/blog/post/23950021-%5B%E8%BD%89%5Dlinux-%E4%B8%AD%E7%A8%8B%E5%BC%8F%E5%90%8C%E6%AD%A5%E8%99%95%E7%90%86%E6%A6%82%E5%BF%B5---mutex)) together with a [pthread_cond_t](http://fanli7.net/a/bianchengyuyan/C__/20130421/343974.html). Why are the two so often used together? [Here is a good example that makes it clear](http://blog.sina.com.cn/s/blog_6ffd3b5c0100mc3n.html).

I also studied [Mathias Brossard's thread pool](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c) and [Johan Hanssen Seferidis's thread pool](https://github.com/Pithikos/C-Thread-Pool/blob/master/thpool.c) to see whether there is room to improve the program.

```c=
int tqueue_init(tqueue_t *the_queue);
task_t *tqueue_pop(tqueue_t *the_queue);
uint32_t tqueue_size(tqueue_t *the_queue);
int tqueue_push(tqueue_t *the_queue, task_t *task);
int tqueue_free(tqueue_t *the_queue);
```

Next, wrap the queue and the threads into a thread pool:

```c=
typedef struct {
    pthread_t *threads;
    uint32_t count;
    tqueue_t *queue;
} tpool_t;

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *));
int tpool_free(tpool_t *the_pool);
```

To understand how this wrapping actually works you have to read `tpool_init` in detail; the simple diagram below expresses my understanding of this code.

![](https://i.imgur.com/iwftacQ.png)

:::info
`tqueue_t *queue` is the declaration of the task queue, and `pthread_t *threads` is the declaration of the pool of threads.
:::

At this point the overall structure of the program should be clear. Next, `task_run` is implemented as the main loop of the flow chart mentioned earlier:

```c=
void *task_run(void *data)
{
    task_t *cur_task = NULL;
    while (1) {
        cur_task = tqueue_pop(pool->queue); /* pool is the global thread pool */
        if (cur_task) {
            if (!cur_task->func) { /* I don't quite understand this part QQ */
                tqueue_push(pool->queue, cur_task);
                break;
            } else {
                cur_task->func(cur_task->arg);
                task_free(cur_task);
            }
        }
    }
    pthread_exit(NULL);
}
```

(It looks like a task whose `func` is `NULL` serves as a termination marker: instead of being executed, it is pushed back into the queue so that every other worker thread will also pop it and break out of its loop.)

With this infrastructure in place, merge sort can easily be wrapped up as tasks and submitted to the thread pool.
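To make the mutex plus condition variable pairing concrete, here is a minimal sketch of `tqueue_push` and `tqueue_pop`. It is my own reconstruction against the declarations above, not necessarily what the homework code does; for instance, the real `tqueue_pop` may simply return `NULL` on an empty queue, which would explain the `if (cur_task)` check in `task_run`, while the blocking variant below is closer to the linked thread-pool articles.

```c
/* Sketch only: assumes the task_t / tqueue_t definitions above. */
int tqueue_push(tqueue_t *the_queue, task_t *task)
{
    pthread_mutex_lock(&the_queue->mutex);
    task->next = NULL;                      /* append at the tail */
    task->last = the_queue->tail;
    if (the_queue->tail)
        the_queue->tail->next = task;
    the_queue->tail = task;
    if (!the_queue->head)
        the_queue->head = task;
    the_queue->size++;
    pthread_cond_signal(&the_queue->cond);  /* wake one sleeping consumer */
    pthread_mutex_unlock(&the_queue->mutex);
    return 0;
}

task_t *tqueue_pop(tqueue_t *the_queue)
{
    pthread_mutex_lock(&the_queue->mutex);
    while (!the_queue->head)                /* sleep until a producer signals */
        pthread_cond_wait(&the_queue->cond, &the_queue->mutex);
    task_t *task = the_queue->head;         /* detach the head task */
    the_queue->head = task->next;
    if (the_queue->head)
        the_queue->head->last = NULL;
    else
        the_queue->tail = NULL;
    the_queue->size--;
    pthread_mutex_unlock(&the_queue->mutex);
    return task;
}
```

This also answers the question raised above: `pthread_cond_wait` atomically releases the mutex while the thread sleeps and re-acquires it before returning, so the condition variable is useless without the mutex that protects the condition it waits on.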
## Analyzing mutex contention

Here are the mutrace results:

> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 1 8

```
mutrace: 0.2 sucessfully initialized for process sort (pid 27914).
input unsorted data line-by-line
sorted results:
[4110] [7204] [10821] [13027] [14267] [19687] [23790] [29726]

mutrace: Showing statistics for process sort (pid 27914).
mutrace: 3 mutexes used.

mutrace: No mutex contended according to filtering parameters.

mutrace: Total runtime is 0.654 ms.

mutrace: Results for SMP with 4 processors.
```

> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 2 8

```
mutrace: 0.2 sucessfully initialized for process sort (pid 27926).
input unsorted data line-by-line
sorted results:
[56] [4097] [8133] [14146] [17152] [20812] [23018] [30057]

mutrace: Showing statistics for process sort (pid 27926).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1e799d0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       90        8        5        0.017        0.000        0.001 Mx.--.
       0        5        3        0        0.001        0.000        0.000 M-.--.
           ||||||
           /|||||
           Object:          M = Mutex, W = RWLock
            /||||
            State:          x = dead, ! = inconsistent
             /|||
             Use:           R = used in realtime thread
              /||
              Mutex Type:   r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE
               /|
               Mutex Protocol: i = INHERIT, p = PROTECT
                /
                RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.670 ms.

mutrace: Results for SMP with 4 processors.
```

> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 4 8

```
mutrace: 0.2 sucessfully initialized for process sort (pid 27938).
input unsorted data line-by-line
sorted results:
[677] [6066] [6705] [6776] [15791] [22869] [24944] [30130]

mutrace: Showing statistics for process sort (pid 27938).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1cd39b0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #2 (0x0x7f235a8ef860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f235d2e96b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f235a6ebfec]
	[(nil)]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1      244       45       37        0.045        0.000        0.001 Mx.--.
       0       13        9        2        0.004        0.000        0.001 M-.--.
       2       20        7        2        0.006        0.000        0.001 M-.--.
           ||||||
           /|||||
           Object:          M = Mutex, W = RWLock
            /||||
            State:          x = dead, ! = inconsistent
             /|||
             Use:           R = used in realtime thread
              /||
              Mutex Type:   r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE
               /|
               Mutex Protocol: i = INHERIT, p = PROTECT
                /
                RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.853 ms.

mutrace: Results for SMP with 4 processors.
```

In every run the most contended lock is Mutex #1, whose backtrace shows it is the task-queue mutex created in `tqueue_init`.
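All of the timings below are taken from mutrace's "Total runtime" line, in ms. An alternative, recommended in the Stack Overflow reading cited further down, is to instrument the program itself with `clock_gettime(CLOCK_MONOTONIC, ...)`. Here is a minimal sketch of that pattern; `diff_in_ms()` is a hypothetical helper, not part of the homework code, and the sort call is elided:

```c
#include <stdio.h>
#include <time.h>

/* hypothetical helper: elapsed time between two timespecs, in ms */
static double diff_in_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    /* ... run the merge sort here ... */
    clock_gettime(CLOCK_MONOTONIC, &end);
    printf("%.3f ms\n", diff_in_ms(start, end));
    return 0;
}
```

`CLOCK_MONOTONIC` is unaffected by wall-clock adjustments, which makes it the usual choice for measuring intervals.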
I first want to observe how the thread count affects the execution time, so I modify the Makefile:

```shell
SHELL := /bin/bash   # {1..8} and $RANDOM need bash, not plain sh

mutrace_test: sort
	for number in 1 2 4 8 16 32 64 128 256 512; do \
		(for i in {1..8}; do echo $$RANDOM; done) | mutrace ./sort $$number 8; \
	done
```

Extract the times (mutrace prints its report to stderr, so `2>&1 >/dev/null` keeps the report and discards the program's own stdout):

```shell=
make mutrace_test 2>&1 >/dev/null | grep -i 'Total runtime' | awk '{print $5}' > time
```

The results, repeated with 80000 and then 160000 input values (total runtime in ms):

| Threads | N = 8 | N = 80000 | N = 160000 |
|--------:|--------:|---------:|----------:|
| 1 | 0.613 | 136.403 | 238.642 |
| 2 | 0.596 | 80.437 | 210.835 |
| 4 | 0.666 | 76.989 | 161.025 |
| 8 | 0.766 | 77.935 | 159.942 |
| 16 | 1.570 | 75.384 | 184.871 |
| 32 | 1.540 | 77.527 | 213.917 |
| 64 | 5.124 | 81.285 | 228.325 |
| 128 | 9.999 | 92.251 | 205.106 |
| 256 | 10.884 | 232.526 | 613.932 |
| 512 | 307.797 | 802.332 | 2466.450 |

After reading my classmate `TempoJiJi`'s [notes](https://hackmd.io/s/B1-Kytv0) I decided to rework my Makefile; mine was honestly awful and I was tempted to just delete it. Here is his version:

```shell=
bench:
	for i in `seq 100000 10000 500000`; do \
		./input_generator $$i; \
		for j in 1 2 4 8 16 32 64; do \
			./sort $$j $$i; \
		done \
	done

plot: bench
	gnuplot runtime.gp
```

I adapted it a little; here is my own version first (the outer loop is the input size N, the inner loop the thread count). I am honestly hopeless: I was stuck a whole night on how to use grep inside a Makefile.

```shell=
mutrace_test: sort
	for j in $$(seq 100000 10000 500000); do \
		for number in 1 2 4 8 16 32 64 128; do \
			(for i in $$(seq 1 $$j); do echo $$RANDOM; done) | mutrace ./sort $$number $$j; \
		done; \
	done
```

Right after the mutrace call I should be able to append a `grep 'Total runtime'` to pull the time out, but I could not get it to work.

In short, my change replaces `TempoJiJi`'s `input_generator` with a command placed directly in the Makefile, so no extra program needs to be run; but I still have to think about how to capture the times. I will probably ask `TempoJiJi` about it.

>> You need to follow the statistical model from the earlier clz homework. Because the input data actually sits in non-contiguous memory, there are many variables involved, so it is best to narrow the scope first. [name=jserv]
>> OK, I will compare them carefully. [name=SwimGlass]

I read [Measure time in Linux - time vs clock vs getrusage vs clock_gettime vs gettimeofday vs timespec_get?](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof).

Referring to and adapting [王紹華同學's compute pi](https://hackmd.io/s/BJdYsOPa) to implement confidence intervals for the timing statistics, the automated test version with confidence intervals is below:

```shell=
for i in `seq 100 5000 1000000`; do \
	printf "%d " $i; \
	for thread in 2 4 8 16 32 64; do \
		(for n in $(seq 1 $i); do echo $RANDOM; done) | ./sort $thread $i; \
	done; \
	printf "\n"; \
done > result_clock_gettime.csv
```

The final result is here:

![](https://i.imgur.com/H8Nid1z.png)

Here we can see that the time grows as the number of threads increases. I originally wanted to investigate how the thread count affects mutex usage, but so far the runs use at most three mutexes.

## Directions for improvement

- [ ] Read [concurrent-ll](https://github.com/jserv/concurrent-ll)
- [ ] Tune merge sort
> After asking around: work out the space complexity of merge sort and build a memory pool for it to use, allocating the maximum space once the analysis is done. Since two merges may be running at the same time, the memory has to be cut into several segments to guard against overlap (see the sketch below). Also look at the final data movement (study memory-pool implementations), reclaiming what malloc allocated, and the fact that the data being sorted is non-contiguous in memory (study linked-list cache behaviour).
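As a starting point for the memory-pool item above, here is a minimal sketch. It is my own illustration with hypothetical names (`merge_pool`, `pool_init`) for an array-based merge sort; the homework sorts a linked list, so the idea would need adapting. The point matches the note: allocate the maximum scratch space (n elements) once, and let a merge of the range [l, r) use the segment starting at offset l, so that concurrent merges, which always work on disjoint ranges, can never overlap in the pool.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int32_t *merge_pool; /* shared scratch space, sized n, allocated once */

int pool_init(size_t n)     /* call once before sorting */
{
    merge_pool = malloc(n * sizeof(int32_t));
    return merge_pool ? 0 : -1;
}

/* merge a[l..m) and a[m..r) using this range's private pool segment */
void merge(int32_t *a, size_t l, size_t m, size_t r)
{
    int32_t *buf = merge_pool + l; /* disjoint ranges get disjoint segments */
    size_t i = l, j = m, k = 0;
    while (i < m && j < r)
        buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < m)
        buf[k++] = a[i++];
    while (j < r)
        buf[k++] = a[j++];
    memcpy(a + l, buf, k * sizeof(int32_t));
}
```

This avoids a malloc/free pair per merge; a single `free(merge_pool)` at the end is the only reclamation needed.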