To store a pointer value as an integer, intptr_t from <stdint.h> can be used:

#include <stdio.h>
#include <stdint.h>

int main()
{
    int i = 10;
    int *p = &i;
    intptr_t pp = (intptr_t) p;  /* pointer value stored as an integer */
    printf("%p\n", (void *) pp); /* round-trip back to a pointer */
    return 0;
}

So in the struct used here, the data field inside the linked-list node is stored through exactly this kind of cast.
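The node definition itself is not shown in these notes; here is a plausible sketch, assuming val_t is a typedef around intptr_t (both names appear in the API below, but the exact field names are assumptions):

/* Plausible sketch of the node type referred to above; the actual
   definition is not shown, so treat the field names as assumptions. */
typedef intptr_t val_t;   /* wide enough to hold a pointer, per the cast above */

typedef struct node {
    val_t data;           /* the stored value, possibly a cast pointer */
    struct node *next;
} node_t;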

With node_t defined, llist_t comes next:

typedef struct llist {
    node_t *head;
    uint32_t size;
} llist_t;

With the types defined, next come the operations:

/* create a new list object */
llist_t *list_new();
/* insert the given data into the list */
int list_add(llist_t *the_list, val_t val);
/* print the list contents one by one */
void list_print(llist_t *the_list);
/* create a new node */
node_t *new_node(val_t val, node_t *next);
/* get the node at a given index */
node_t *list_get(llist_t *the_list, uint32_t index);

That covers the contents of list.h.
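As a quick illustration, here is a hypothetical usage sketch of this API; the exact return conventions and the data field are assumptions, not part of the header above:

#include <stdio.h>
#include "list.h"   /* the API declared above */

int main(void)
{
    llist_t *list = list_new();
    for (val_t v = 0; v < 4; v++)
        list_add(list, v);          /* insert 0, 1, 2, 3 */
    list_print(list);
    node_t *n = list_get(list, 2);  /* node at index 2 */
    if (n)
        printf("index 2 holds %ld\n", (long) n->data);
    return 0;
}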

Thread pool (cf. Toward Concurrency)

Here I also read the notes "C 的 Thread Pool 筆記", which illustrate this design with a diagram.

Thread Pool data structures

First, a thread pool needs jobs, or tasks, so that the threads know what work to do.
Define task_t to encapsulate a task:

typedef struct _task {
    void (*func)(void *);       /* the function that eventually runs */
    void *arg;                  /* the argument passed to it */
    struct _task *next, *last;  /* the queue is a doubly-linked list,
                                   so each task stores next and last */
} task_t;

The task queue here is built as a doubly-linked list.
Define the free operation for a task:

int task_free(task_t *the_task);

So a single data structure recording the function pointer to execute and the argument to pass is all that is needed. Next comes the thread pool itself, which must hold all the threads and the job queue.
First, the queue structure the thread pool needs:

typedef struct {
    task_t *head, *tail;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t size;
} tqueue_t;

Here task_t pointers keep track of the thread pool's task queue: since tasks form a doubly-linked list, head and tail simply point to the two ends of the queue.
It also contains a mutex (reference: [轉]Linux 中程式同步處理概念 - Mutex) and a pthread_cond_t. Why are the two so often used together? There is a good example that makes this clear; a minimal sketch of the usual pattern follows.
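This sketch illustrates the general mutex + condition variable pattern, not code from this project: the mutex protects shared state, and the condition variable lets a thread sleep until that state changes instead of busy-waiting.

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool ready = false;

void *waiter(void *arg)
{
    pthread_mutex_lock(&lock);
    while (!ready)                       /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock); /* atomically unlock and sleep */
    /* ... use the shared state here ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

void signal_ready(void)
{
    pthread_mutex_lock(&lock);
    ready = true;
    pthread_cond_signal(&cond);  /* wake one waiting thread */
    pthread_mutex_unlock(&lock);
}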
Here I also looked at Mathias Brossard's thread pool implementation and Johan Hanssen Seferidis's thread pool implementation, to see whether there is room to improve the program.

int tqueue_init(tqueue_t *the_queue);
task_t *tqueue_pop(tqueue_t *the_queue);
uint32_t tqueue_size(tqueue_t *the_queue);
int tqueue_push(tqueue_t *the_queue, task_t *task);
int tqueue_free(tqueue_t *the_queue);
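The bodies of these functions are not shown in these notes. As a hedged sketch of how tqueue_pop presumably uses the mutex (assuming head is the pop end of the list):

/* Hedged sketch only; the project's real implementation is not shown. */
task_t *tqueue_pop(tqueue_t *the_queue)
{
    pthread_mutex_lock(&the_queue->mutex);
    task_t *ret = the_queue->head;
    if (ret) {
        the_queue->head = ret->next;
        if (the_queue->head)
            the_queue->head->last = NULL;
        else
            the_queue->tail = NULL;   /* queue became empty */
        the_queue->size--;
    }
    pthread_mutex_unlock(&the_queue->mutex);
    return ret;  /* NULL when the queue is empty */
}

Every such lock/unlock pair is a potential contention point, which is exactly what the mutrace runs below measure.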

Then wrap the queue and threads into a thread pool:

typedef struct {
    pthread_t *threads;
    uint32_t count;
    tqueue_t *queue;
} tpool_t;

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *));
int tpool_free(tpool_t *the_pool);

To see how this wrapping is done, it is best to read tpool_init in detail; my understanding of that code is sketched below.

tqueue_t *queue is the declaration of the task queue, while pthread_t *threads is the declaration of the pool's worker threads.
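Based on the declared signature and the fields above, tpool_init most likely allocates the queue and the thread array, then starts count identical workers. A hedged sketch, with error handling omitted and the details assumed rather than taken from the actual code:

#include <stdlib.h>

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *))
{
    the_pool->count = count;
    the_pool->queue = malloc(sizeof(tqueue_t));
    tqueue_init(the_pool->queue);            /* set up the empty task queue */
    the_pool->threads = malloc(count * sizeof(pthread_t));
    for (uint32_t i = 0; i < count; i++)     /* every worker runs the same func */
        pthread_create(&the_pool->threads[i], NULL, func, NULL);
    return 0;
}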

By this point the program's overall architecture should be roughly clear.

Next, task_run is implemented as the main loop of the flowchart mentioned earlier:

void *task_run(void *data)
{
    task_t *cur_task = NULL;
    while (1) {
        cur_task = tqueue_pop(pool->queue);  /* pool is the global thread pool */
        if (cur_task) {
            if (!cur_task->func) {
                /* a task with a NULL func is the shutdown signal: push it
                   back for the other workers to see, then leave the loop */
                tqueue_push(pool->queue, cur_task);
                break;
            } else {
                cur_task->func(cur_task->arg);
                task_free(cur_task);
            }
        }
    }
    pthread_exit(NULL);
}
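The part I initially found confusing is the !cur_task->func branch. As I read it, a task whose func is NULL serves as a termination marker: the worker that pops it pushes it straight back, so every remaining worker will eventually pop the same marker and exit as well.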

With this infrastructure in place, merge sort can easily be wrapped as tasks and submitted to the thread pool.
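For illustration, a hypothetical sketch of the submission step, using only the task_t and tqueue API above; cut_func and submit_sort are made-up names, not the author's actual code:

#include <stdlib.h>

/* hypothetical worker body: sorts (part of) the list */
void cut_func(void *arg)
{
    llist_t *list = (llist_t *) arg;
    /* ... split the list, merge sorted halves, and enqueue
       follow-up tasks with tqueue_push(pool->queue, ...) ... */
}

/* wrap one sort step as a task and hand it to the pool */
void submit_sort(tpool_t *pool, llist_t *the_list)
{
    task_t *task = malloc(sizeof(task_t));
    task->func = cut_func;             /* what to run */
    task->arg = the_list;              /* what to run it on */
    task->next = task->last = NULL;    /* tqueue_push links it in */
    tqueue_push(pool->queue, task);
}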

Analyzing mutex contention

Here are the mutrace results:

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 1 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27914).
input unsorted data line-by-line

sorted results:
[4110] [7204] [10821] [13027] [14267] [19687] [23790] [29726] 

mutrace: Showing statistics for process sort (pid 27914).
mutrace: 3 mutexes used.

mutrace: No mutex contended according to filtering parameters.

mutrace: Total runtime is 0.654 ms.

mutrace: Results for SMP with 4 processors.

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 2 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27926).
input unsorted data line-by-line

sorted results:
[56] [4097] [8133] [14146] [17152] [20812] [23018] [30057] 

mutrace: Showing statistics for process sort (pid 27926).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1e799d0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       90        8        5        0.017        0.000        0.001 Mx.--.
       0        5        3        0        0.001        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.670 ms.

mutrace: Results for SMP with 4 processors.

(for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 4 8

mutrace: 0.2 sucessfully initialized for process sort (pid 27938).
input unsorted data line-by-line

sorted results:
[677] [6066] [6705] [6776] [15791] [22869] [24944] [30130] 

mutrace: Showing statistics for process sort (pid 27938).
mutrace: 3 mutexes used.

Mutex #1 (0x0x1cd39b0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(tqueue_init+0x38) [0x401277]
	./sort(tpool_init+0x6a) [0x4014cc]
	./sort(main+0x161) [0x401c74]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #0 (0x0x603120) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
	./sort(main+0x11d) [0x401c30]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]

Mutex #2 (0x0x7f235a8ef860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f235d2e96b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f235a6ebfec]
	[(nil)]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1      244       45       37        0.045        0.000        0.001 Mx.--.
       0       13        9        2        0.004        0.000        0.001 M-.--.
       2       20        7        2        0.006        0.000        0.001 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 0.853 ms.

mutrace: Results for SMP with 4 processors.

I want to start by observing how the thread count affects execution time.
First, modify the Makefile:

# note: $RANDOM must be escaped as $$RANDOM inside a Makefile,
# and {1..8} is a bash-ism (set SHELL := /bin/bash if /bin/sh is not bash)
mutrace_test: sort
        for number in 1 2 4 8 16 32 64 128 256 512; do \
                (for i in {1..8}; do echo $$RANDOM; done) | mutrace ./sort $$number 8; \
        done

Then pull the times out:

make mutrace_test 2>&1 >/dev/null | grep -i 'Total runtime' | awk '{print $5}' > time
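Note the redirection order here: 2>&1 first points stderr at the pipe, and >/dev/null then discards stdout, so grep sees only mutrace's messages, which go to stderr.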

Final results:

Thread_num   Time (ms)
1            0.613
2            0.596
4            0.666
8            0.766
16           1.570
32           1.540
64           5.124
128          9.999
256          10.884
512          307.797

Switching the input to 80000 entries (same thread counts, times in ms):

Thread_num   Time (ms)
1            136.403
2            80.437
4            76.989
8            77.935
16           75.384
32           77.527
64           81.285
128          92.251
256          232.526
512          802.332

Switching the input to 160000 entries:

Thread_num   Time (ms)
1            238.642
2            210.835
4            161.025
8            159.942
16           184.871
32           213.917
64           228.325
128          205.106
256          613.932
512          2466.450

After reading classmate TempoJiJi's notes, I decided to revise my Makefile; mine was honestly quite bad. Here is his version:

bench:
        for i in `seq 100000 10000 500000`; do \
                ./input_generator $$i; \
                for j in 1 2 4 8 16 32 64; do \
                        ./sort $$j $$i; \
                done; \
        done

plot: bench
        gnuplot runtime.gp

I adapted it a bit. Here is my own version first; I was stuck a whole night on how to use grep inside a Makefile.

# outer loop: input size N; inner loop: thread count
# (appending "| grep 'Total runtime'" to the mutrace line should pull the
#  times out, but I could not get it to work)
mutrace_test: sort
        for j in $$(seq 100000 10000 500000); do \
                for number in 1 2 4 8 16 32 64 128; do \
                        (for i in $$(seq 1 $$j); do echo $$RANDOM; done) | mutrace ./sort $$number $$j; \
                done; \
        done

In short, my change replaces TempoJiJi's input_generator with a command inlined in the Makefile, so no extra program has to be run. But I still have to figure out how to capture the times; I may ask TempoJiJi about it.

You need to follow the statistical model from the earlier clz homework: since the input data actually sits in non-contiguous memory, there are many variables, so it is best to narrow the scope first. jserv
OK, I will compare them carefully. SwimGlass

Read: Measure time in Linux - time vs clock vs getrusage vs clock_gettime vs gettimeofday vs timespec_get?

Referring to and adapting classmate 王紹華's compute pi work, I implemented confidence intervals to gather execution-time statistics. Below is the automated test version with confidence intervals:

for i in `seq 100 5000 1000000`; do \
        printf "%d " $i; \
        for thread in 2 4 8 16 32 64; do \
                (for n in `seq 1 $i`; do echo $RANDOM; done) | ./sort $thread $i; \
        done; \
        printf "\n"; \
done > result_clock_gettime.csv

The final results are shown here:


From this we can see that the execution time rises as the thread count grows. I originally wanted to examine the effect of thread count on mutex usage, but so far my runs use at most three mutexes.

Directions for improvement

After asking around:

- Work out the space complexity of merge sort and build a memory pool for it: analyze first, then allocate the maximum space up front. Since two merges may run at the same time, guard against overlap by splitting the memory into several segments; also handle the final data movement (look into memory pool implementations) and reclaiming what malloc allocated. A sketch of this idea follows below.
- The data being sorted is non-contiguous in memory (look into linked-list caching).
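A minimal sketch of the segment idea, assuming the total input size is known up front; every name here is illustrative, not an implementation:

#include <stdlib.h>

/* Illustrative only: one O(n) scratch allocation shared by all merges.
   Each merge works on a disjoint segment [offset, offset + len),
   so two concurrent merges can never overlap. */
typedef struct {
    val_t *buf;   /* single allocation covering the whole input */
    size_t size;
} mpool_t;

mpool_t *mpool_new(size_t n)
{
    mpool_t *pool = malloc(sizeof(mpool_t));
    pool->buf = malloc(n * sizeof(val_t));  /* the one-time maximum allocation */
    pool->size = n;
    return pool;
}

static inline val_t *mpool_segment(mpool_t *pool, size_t offset)
{
    return pool->buf + offset;  /* caller guarantees disjoint offsets */
}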