To store a pointer value as an integer, intptr_t from <stdint.h> can be used:

#include <stdio.h>
#include <stdint.h>

int main()
{
    int i = 10;
    int *p = &i;
    intptr_t pp = (intptr_t) p;  /* pointer value stored as an integer */
    printf("%p\n", (void *) pp); /* round-trip back to a pointer */
    return 0;
}

So in the struct used here, the data field inside the linked-list node is stored through exactly this kind of cast.
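The node definition itself is not shown in these notes; here is a plausible sketch, assuming val_t is a typedef around intptr_t (both names appear in the API below, but the exact field names are assumptions):

/* Plausible sketch of the node type referred to above; the actual
   definition is not shown, so treat the field names as assumptions. */
typedef intptr_t val_t;   /* wide enough to hold a pointer, per the cast above */

typedef struct node {
    val_t data;           /* the stored value, possibly a cast pointer */
    struct node *next;
} node_t;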

With node_t defined, llist_t comes next:

typedef struct llist {
    node_t *head;
    uint32_t size;
} llist_t;

With the types defined, next come the operations:

/* create a new list object */
llist_t *list_new();
/* insert the given data into the list */
int list_add(llist_t *the_list, val_t val);
/* print the list contents one by one */
void list_print(llist_t *the_list);
/* create a new node */
node_t *new_node(val_t val, node_t *next);
/* get the node at a given index */
node_t *list_get(llist_t *the_list, uint32_t index);

That covers the contents of list.h.
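As a quick illustration, here is a hypothetical usage sketch of this API; the exact return conventions and the data field are assumptions, not part of the header above:

#include <stdio.h>
#include "list.h"   /* the API declared above */

int main(void)
{
    llist_t *list = list_new();
    for (val_t v = 0; v < 4; v++)
        list_add(list, v);          /* insert 0, 1, 2, 3 */
    list_print(list);
    node_t *n = list_get(list, 2);  /* node at index 2 */
    if (n)
        printf("index 2 holds %ld\n", (long) n->data);
    return 0;
}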

Thread pool (cf. Toward Concurrency)

Here I also read the notes "C 的 Thread Pool 筆記", which illustrate this design with a diagram.

Thread Pool data structures

First, a thread pool needs jobs, or tasks, so that the threads know what work to do.
Define task_t to encapsulate a task:

typedef struct _task {
    void (*func)(void *);       /* the function that eventually runs */
    void *arg;                  /* the argument passed to it */
    struct _task *next, *last;  /* the queue is a doubly-linked list,
                                   so each task stores next and last */
} task_t;

The task queue here is built as a doubly-linked list.
Define the free operation for a task:

int task_free(task_t *the_task);

So a single data structure recording the function pointer to execute and the argument to pass is all that is needed. Next comes the thread pool itself, which must hold all the threads and the job queue.
First, the queue structure the thread pool needs:

typedef struct {
    task_t *head, *tail;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t size;
} tqueue_t;

Here task_t pointers keep track of the thread pool's task queue: since tasks form a doubly-linked list, head and tail simply point to the two ends of the queue.
It also contains a mutex (reference: [轉]Linux 中程式同步處理概念 - Mutex) and a pthread_cond_t. Why are the two so often used together? There is a good example that makes this clear; a minimal sketch of the usual pattern follows.
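This sketch illustrates the general mutex + condition variable pattern, not code from this project: the mutex protects shared state, and the condition variable lets a thread sleep until that state changes instead of busy-waiting.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

void *waiter(void *arg)
{
    pthread_mutex_lock(&lock);
    while (!ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock); /* atomically unlock and sleep */
    /* ... use the shared state here ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

void signal_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cond);  /* wake one waiting thread */
    pthread_mutex_unlock(&lock);
}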
Here I also looked at Mathias Brossard's thread pool implementation and Johan Hanssen Seferidis's thread pool implementation, to see whether there is room to improve the program.

int tqueue_init(tqueue_t *the_queue);
task_t *tqueue_pop(tqueue_t *the_queue);
uint32_t tqueue_size(tqueue_t *the_queue);
int tqueue_push(tqueue_t *the_queue, task_t *task);
int tqueue_free(tqueue_t *the_queue);
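The bodies of these functions are not shown in these notes. As a hedged sketch of how tqueue_pop presumably uses the mutex (assuming head is the pop end of the list):

/* Hedged sketch only; the project's real implementation is not shown. */
task_t *tqueue_pop(tqueue_t *the_queue)
{
    pthread_mutex_lock(&the_queue->mutex);
    task_t *ret = the_queue->head;
    if (ret) {
        the_queue->head = ret->next;
        if (the_queue->head)
            the_queue->head->last = NULL;
        else
            the_queue->tail = NULL;   /* queue became empty */
        the_queue->size--;
    }
    pthread_mutex_unlock(&the_queue->mutex);
    return ret;  /* NULL when the queue is empty */
}

Every such lock/unlock pair is a potential contention point, which is exactly what the mutrace runs below measure.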

Then wrap the queue and threads into a thread pool:

typedef struct {
    pthread_t *threads;
    uint32_t count;
    tqueue_t *queue;
} tpool_t;

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *));
int tpool_free(tpool_t *the_pool);

To see how this wrapping is done, it is best to read tpool_init in detail; my understanding of that code is sketched below.

tqueue_t *queue is the declaration of the task queue, while pthread_t *threads is the declaration of the pool's worker threads.
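Based on the declared signature and the fields above, tpool_init most likely allocates the queue and the thread array, then starts count identical workers. A hedged sketch, with error handling omitted and the details assumed rather than taken from the actual code:

#include <stdlib.h>

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *))
{
    the_pool->count = count;
    the_pool->queue = malloc(sizeof(tqueue_t));
    tqueue_init(the_pool->queue);            /* set up the empty task queue */
    the_pool->threads = malloc(count * sizeof(pthread_t));
    for (uint32_t i = 0; i < count; i++)     /* every worker runs the same func */
        pthread_create(&the_pool->threads[i], NULL, func, NULL);
    return 0;
}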

By this point the program's overall architecture should be roughly clear.

Next, task_run is implemented as the main loop of the flowchart mentioned earlier:

void *task_run(void *data)
{
    task_t *cur_task = NULL;
    while (1) {
        cur_task = tqueue_pop(pool->queue);  /* pool is the global thread pool */
        if (cur_task) {
            if (!cur_task->func) {
                /* a task with a NULL func is the shutdown signal: push it
                   back for the other workers to see, then leave the loop */
                tqueue_push(pool->queue, cur_task);
                break;
            } else {
                cur_task->func(cur_task->arg);
                task_free(cur_task);
            }
        }
    }
    pthread_exit(NULL);
}
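The part I initially found confusing is the !cur_task->func branch. As I read it, a task whose func is NULL serves as a termination marker: the worker that pops it pushes it straight back, so every remaining worker will eventually pop the same marker and exit as well.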

With this infrastructure in place, merge sort can easily be wrapped as tasks and submitted to the thread pool.
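For illustration, a hypothetical sketch of the submission step, using only the task_t and tqueue API above; cut_func and submit_sort are made-up names, not the author's actual code:

#include <stdlib.h>

/* hypothetical worker body: sorts (part of) the list */
void cut_func(void *arg)
{
    llist_t *list = (llist_t *) arg;
    /* ... split the list, merge sorted halves, and enqueue
       follow-up tasks with tqueue_push(pool->queue, ...) ... */
}

/* wrap one sort step as a task and hand it to the pool */
void submit_sort(tpool_t *pool, llist_t *the_list)
{
    task_t *task = malloc(sizeof(task_t));
    task->func = cut_func;             /* what to run */
    task->arg = the_list;              /* what to run it on */
    task->next = task->last = NULL;    /* tqueue_push links it in */
    tqueue_push(pool->queue, task);
}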

Analyzing mutex contention

Here are the mutrace results:

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 1 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27914).
input unsorted data line-by-line

sorted results:
[4110] [7204] [10821] [13027] [14267] [19687] [23790] [29726] 

mutrace: Showing statistics for process sort (pid 27914).
mutrace: 3 mutexes used.

mutrace: No mutex contended according to filtering parameters.

mutrace: Total runtime is 0.654 ms.

mutrace: Results for SMP with 4 processors.

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 2 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27926).
input unsorted data line-by-line

sorted results:
[56] [4097] [8133] [14146] [17152] [20812] [23018] [30057] 

mutrace: Showing statistics for process sort (pid 27926).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1e799d0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       90        8        5        0.017        0.000        0.001 Mx.--.
       0        5        3        0        0.001        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.670 ms.

mutrace: Results for SMP with 4 processors.

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 4 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27938).
input unsorted data line-by-line

sorted results:
[677] [6066] [6705] [6776] [15791] [22869] [24944] [30130] 

mutrace: Showing statistics for process sort (pid 27938).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1cd39b0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #2 (0x0x7f235a8ef860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f235d2e96b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f235a6ebfec]
	[(nil)]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1      244       45       37        0.045        0.000        0.001 Mx.--.
       0       13        9        2        0.004        0.000        0.001 M-.--.
       2       20        7        2        0.006        0.000        0.001 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.853 ms.

mutrace: Results for SMP with 4 processors.

I want to start by observing how the thread count affects execution time.
First, modify the Makefile:

# note: $RANDOM must be escaped as $$RANDOM inside a Makefile,
# and {1..8} is a bash-ism (set SHELL := /bin/bash if /bin/sh is not bash)
mutrace_test: sort
        for number in 1 2 4 8 16 32 64 128 256 512; do \
                (for i in {1..8}; do echo $$RANDOM; done) | mutrace ./sort $$number 8; \
        done

Then pull the times out:

make mutrace_test 2>&1 >/dev/null | grep -i 'Total runtime' | awk '{print $5}' > time
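Note the redirection order here: 2>&1 first points stderr at the pipe, and >/dev/null then discards stdout, so grep sees only mutrace's messages, which go to stderr.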

Final results:

Thread_num   Time (ms)
1            0.613
2            0.596
4            0.666
8            0.766
16           1.570
32           1.540
64           5.124
128          9.999
256          10.884
512          307.797

Switching the input to 80000 entries (same thread counts, times in ms):

Thread_num   Time (ms)
1            136.403
2            80.437
4            76.989
8            77.935
16           75.384
32           77.527
64           81.285
128          92.251
256          232.526
512          802.332

Switching the input to 160000 entries:

Thread_num   Time (ms)
1            238.642
2            210.835
4            161.025
8            159.942
16           184.871
32           213.917
64           228.325
128          205.106
256          613.932
512          2466.450

After reading classmate TempoJiJi's notes, I decided to revise my Makefile; mine was honestly quite bad. Here is his version:

bench:
        for i in `seq 100000 10000 500000`; do \
                ./input_generator $$i; \
                for j in 1 2 4 8 16 32 64; do \
                        ./sort $$j $$i; \
                done; \
        done

plot: bench
        gnuplot runtime.gp

I adapted it a bit. Here is my own version first; I was stuck a whole night on how to use grep inside a Makefile.

# outer loop: input size N; inner loop: thread count
# (appending "| grep 'Total runtime'" to the mutrace line should pull the
#  times out, but I could not get it to work)
mutrace_test: sort
        for j in $$(seq 100000 10000 500000); do \
                for number in 1 2 4 8 16 32 64 128; do \
                        (for i in $$(seq 1 $$j); do echo $$RANDOM; done) | mutrace ./sort $$number $$j; \
                done; \
        done

In short, my change replaces TempoJiJi's input_generator with a command inlined in the Makefile, so no extra program has to be run. But I still have to figure out how to capture the times; I may ask TempoJiJi about it.

You need to follow the statistical model from the earlier clz homework: since the input data actually sits in non-contiguous memory, there are many variables, so it is best to narrow the scope first. jserv
OK, I will compare them carefully. SwimGlass

Read: Measure time in Linux - time vs clock vs getrusage vs clock_gettime vs gettimeofday vs timespec_get?

Referring to and adapting classmate 王紹華's compute pi work, I implemented confidence intervals to gather execution-time statistics. Below is the automated test version with confidence intervals:

for i in `seq 100 5000 1000000`; do \
        printf "%d " $i; \
        for thread in 2 4 8 16 32 64; do \
                (for n in `seq 1 $i`; do echo $RANDOM; done) | ./sort $thread $i; \
        done; \
        printf "\n"; \
done > result_clock_gettime.csv

The final results are shown here:


From this we can see that the execution time rises as the thread count grows. I originally wanted to examine the effect of thread count on mutex usage, but so far my runs use at most three mutexes.

Directions for improvement

After asking around:

- Work out the space complexity of merge sort and build a memory pool for it: analyze first, then allocate the maximum space up front. Since two merges may run at the same time, guard against overlap by splitting the memory into several segments; also handle the final data movement (look into memory pool implementations) and reclaiming what malloc allocated. A sketch of this idea follows below.
- The data being sorted is non-contiguous in memory (look into linked-list caching).
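A minimal sketch of the segment idea, assuming the total input size is known up front; every name here is illustrative, not an implementation:

#include <stdlib.h>

/* Illustrative only: one O(n) scratch allocation shared by all merges.
   Each merge works on a disjoint segment [offset, offset + len),
   so two concurrent merges can never overlap. */
typedef struct {
    val_t *buf;   /* single allocation covering the whole input */
    size_t size;
} mpool_t;

mpool_t *mpool_new(size_t n)
{
    mpool_t *pool = malloc(sizeof(mpool_t));
    pool->buf = malloc(n * sizeof(val_t));  /* the one-time maximum allocation */
    pool->size = n;
    return pool;
}

static inline val_t *mpool_segment(mpool_t *pool, size_t offset)
{
    return pool->buf + offset;  /* caller guarantees disjoint offsets */
}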