When we need to store a pointer value in an integer, we can use `intptr_t`:
```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int i = 10;
    int *p = &i;
    /* intptr_t is an integer type wide enough to hold a pointer value */
    intptr_t pp = (intptr_t) p;
    printf("%" PRIdPTR "\n", pp);
    return 0;
}
```
So the struct here is the linked-list node, whose data field is stored through this kind of type conversion.
With the node defined, we can go on to define `llist_t`:
```c=
typedef struct llist {
    node_t *head;
    uint32_t size;
} llist_t;
```
With the types defined, next come the operations:
```c=
// create a new list object
llist_t *list_new();
// insert the given value into the list
int list_add(llist_t *the_list, val_t val);
// print the list contents one by one
void list_print(llist_t *the_list);
// create a new node
node_t *new_node(val_t val, node_t *next);
// get the node at a given index
node_t *list_get(llist_t *the_list, uint32_t index);
```
This is the content of `list.h`.
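For reference, here is a minimal sketch of how `new_node` and `list_add` could be implemented against this interface. It assumes `node_t` carries a `data` field and a `next` pointer as described above, and that `list_add` inserts at the head; the actual `list.c` may differ:

```c
#include <stdlib.h>

node_t *new_node(val_t val, node_t *next)
{
    node_t *node = malloc(sizeof(node_t));
    if (!node)
        return NULL;
    node->data = val; /* assumed field name, per the note above */
    node->next = next;
    return node;
}

int list_add(llist_t *the_list, val_t val)
{
    /* head insertion keeps the operation O(1) */
    node_t *e = new_node(val, the_list->head);
    if (!e)
        return -1;
    the_list->head = e;
    the_list->size++;
    return 0;
}
```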
### [Thread pool](http://swind.code-life.info/posts/c-thread-pool.html) (cf. [Toward Concurrency](https://hackmd.io/s/H10MXXoT))
While working through this I also read [C 的 Thread Pool 筆記](http://swind.code-life.info/posts/c-thread-pool.html); the diagram below illustrates the structure.

#### Thread Pool data structures
First of all, a thread pool needs jobs (tasks) so that the threads know what work to do.
Define `task_t` to encapsulate a task:
```c=
typedef struct _task {
    void (*func)(void *); /* the function that will eventually run */
    void *arg;            /* the argument passed to it */
    struct _task *next, *last;
    /* the queue is a doubly-linked list,
       so we store both next and last */
} task_t;
```
Here the task queue is built as a doubly-linked list.
Define the free operation for a task:
```c=
int task_free(task_t *the_task);
```
So a single data structure that records the function pointer to execute and the argument to pass is all we need. Next comes the thread pool itself, which must hold all the threads and the job queue.
Here is the queue structure the thread pool needs:
```c=
typedef struct {
    task_t *head, *tail;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint32_t size;
} tqueue_t;
```
Here `task_t` pointers track the task queue inside the thread pool: since the queue is a doubly-linked list, `head` and `tail` point to its first and last nodes. (The referenced post instead keeps a plain `task_t` array, with `head` and `tail` as offsets into it.)
The structure also contains a mutex (see [[轉]Linux 中程式同步處理概念 - Mutex](http://huenlil.pixnet.net/blog/post/23950021-%5B%E8%BD%89%5Dlinux-%E4%B8%AD%E7%A8%8B%E5%BC%8F%E5%90%8C%E6%AD%A5%E8%99%95%E7%90%86%E6%A6%82%E5%BF%B5---mutex)) and a [pthread_cond_t](http://fanli7.net/a/bianchengyuyan/C__/20130421/343974.html). Why are they so often used together? [Here is a good example that makes it clear.](http://blog.sina.com.cn/s/blog_6ffd3b5c0100mc3n.html)
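In short: the mutex protects the shared state, and the condition variable lets a thread sleep until that state changes, without busy-waiting. A minimal sketch of the canonical pattern (hypothetical names, not the project's actual code):

```c
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0; /* shared state, protected by the mutex */

static void *consumer(void *arg)
{
    pthread_mutex_lock(&mutex);
    while (!ready)                        /* loop guards against spurious wakeups */
        pthread_cond_wait(&cond, &mutex); /* atomically unlock, sleep, relock */
    /* ... safe to consume the shared state here ... */
    pthread_mutex_unlock(&mutex);
    return NULL;
}

static void producer(void)
{
    pthread_mutex_lock(&mutex);
    ready = 1;                  /* change the state while holding the lock */
    pthread_cond_signal(&cond); /* wake one waiting thread */
    pthread_mutex_unlock(&mutex);
}
```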
I also studied the approaches in [Mathias Brossard's thread pool](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c) and [Johan Hanssen Seferidis's thread pool](https://github.com/Pithikos/C-Thread-Pool/blob/master/thpool.c), looking for opportunities to improve the program.
```c=
int tqueue_init(tqueue_t *the_queue);               // initialize an empty queue
task_t *tqueue_pop(tqueue_t *the_queue);            // take a task off the queue
uint32_t tqueue_size(tqueue_t *the_queue);          // current number of tasks
int tqueue_push(tqueue_t *the_queue, task_t *task); // append a task
int tqueue_free(tqueue_t *the_queue);               // release the queue and remaining tasks
```
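As an illustration of how these operations plausibly combine the mutex and condition variable, here is a sketch of `tqueue_push` (an assumption based on the struct above, not necessarily the project's actual implementation):

```c
int tqueue_push(tqueue_t *the_queue, task_t *task)
{
    pthread_mutex_lock(&the_queue->mutex); /* all queue updates happen under the lock */
    task->next = NULL;
    task->last = the_queue->tail;
    if (the_queue->tail)
        the_queue->tail->next = task; /* append after the current tail */
    the_queue->tail = task;
    if (!the_queue->head)
        the_queue->head = task; /* the queue was empty */
    the_queue->size++;
    pthread_cond_signal(&the_queue->cond); /* wake a worker waiting for work */
    pthread_mutex_unlock(&the_queue->mutex);
    return 0;
}
```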
Next, wrap the queue and the threads into a thread pool:
```c=
typedef struct {
    pthread_t *threads;
    uint32_t count;
    tqueue_t *queue;
} tpool_t;

int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *));
int tpool_free(tpool_t *the_pool);
```
To understand how this wrapping works, it helps to read `tpool_init` in detail; the simple diagram below expresses my understanding of this code.

:::info
`tqueue_t *queue` declares the task queue, while `pthread_t *threads` declares the pool's worker threads.
:::
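Based on its signature, here is a rough sketch of what `tpool_init` presumably does: allocate and initialize the queue, then spawn `count` threads that all run `func` (the worker main loop). The details are assumptions, not the actual code; note that `task_run` below reaches the queue through a global `pool`, so no per-thread argument is needed:

```c
int tpool_init(tpool_t *the_pool, uint32_t count, void *(*func)(void *))
{
    the_pool->count = count;
    the_pool->queue = malloc(sizeof(tqueue_t));
    if (!the_pool->queue || tqueue_init(the_pool->queue) < 0)
        return -1;
    the_pool->threads = malloc(sizeof(pthread_t) * count);
    if (!the_pool->threads)
        return -1;
    for (uint32_t i = 0; i < count; i++)
        /* every worker runs the same main loop */
        pthread_create(&the_pool->threads[i], NULL, func, NULL);
    return 0;
}
```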
By this point the overall structure of the program is clear.
Next, implement `task_run` as the main loop of the flow chart mentioned earlier:
```c=
void *task_run(void *data)
{
    task_t *cur_task = NULL;
    while (1) {
        cur_task = tqueue_pop(pool->queue);
        if (cur_task) {
            if (!cur_task->func) {
                /* a task with a NULL func is the termination signal:
                   push it back so the other workers see it too,
                   then leave the loop */
                tqueue_push(pool->queue, cur_task);
                break;
            } else {
                cur_task->func(cur_task->arg);
                task_free(cur_task);
            }
        }
    }
    pthread_exit(NULL);
}
```
With this infrastructure in place, merge sort is easily wrapped as tasks and submitted to the thread pool.
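For example, one step of the sort could be packaged and submitted roughly like this (`cut_func` and `sort_arg_t` are hypothetical names, purely to show the wrapping pattern):

```c
#include <stdlib.h>

/* hypothetical argument bundle for one sorting step */
typedef struct {
    llist_t *list;
} sort_arg_t;

void cut_func(void *data); /* the actual sorting step, defined elsewhere */

/* wrap one step of the merge sort as a task and push it to the pool */
void submit_sort(tpool_t *pool, llist_t *list)
{
    sort_arg_t *arg = malloc(sizeof(sort_arg_t));
    arg->list = list;

    task_t *task = malloc(sizeof(task_t));
    task->func = cut_func;
    task->arg = arg;
    tqueue_push(pool->queue, task);
}
```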
## Analyzing mutex contention
Here are the mutrace results:
> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 1 8
```
mutrace: 0.2 sucessfully initialized for process sort (pid 27914).
input unsorted data line-by-line
sorted results:
[4110] [7204] [10821] [13027] [14267] [19687] [23790] [29726]
mutrace: Showing statistics for process sort (pid 27914).
mutrace: 3 mutexes used.
mutrace: No mutex contended according to filtering parameters.
mutrace: Total runtime is 0.654 ms.
mutrace: Results for SMP with 4 processors.
```
> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 2 8
```
mutrace: 0.2 sucessfully initialized for process sort (pid 27926).
input unsorted data line-by-line
sorted results:
[56] [4097] [8133] [14146] [17152] [20812] [23018] [30057]
mutrace: Showing statistics for process sort (pid 27926).
mutrace: 3 mutexes used.
Mutex #1 (0x0x1e799d0) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
./sort(tqueue_init+0x38) [0x401277]
./sort(tpool_init+0x6a) [0x4014cc]
./sort(main+0x161) [0x401c74]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]
Mutex #0 (0x0x603120) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fad23cb04b2]
./sort(main+0x11d) [0x401c30]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fad236e7830]
mutrace: Showing 2 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags
1 90 8 5 0.017 0.000 0.001 Mx.--.
0 5 3 0 0.001 0.000 0.000 M-.--.
||||||
/|||||
Object: M = Mutex, W = RWLock /||||
State: x = dead, ! = inconsistent /|||
Use: R = used in realtime thread /||
Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
Mutex Protocol: i = INHERIT, p = PROTECT /
RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC
mutrace: Note that the flags column R is only valid in --track-rt mode!
mutrace: Total runtime is 0.670 ms.
mutrace: Results for SMP with 4 processors.
```
> (for i in {1..8}; do echo $RANDOM; done) | mutrace ./sort 4 8
```
mutrace: 0.2 sucessfully initialized for process sort (pid 27938).
input unsorted data line-by-line
sorted results:
[677] [6066] [6705] [6776] [15791] [22869] [24944] [30130]
mutrace: Showing statistics for process sort (pid 27938).
mutrace: 3 mutexes used.
Mutex #1 (0x0x1cd39b0) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
./sort(tqueue_init+0x38) [0x401277]
./sort(tpool_init+0x6a) [0x4014cc]
./sort(main+0x161) [0x401c74]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]
Mutex #0 (0x0x603120) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f235d2e94b2]
./sort(main+0x11d) [0x401c30]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f235cd20830]
Mutex #2 (0x0x7f235a8ef860) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f235d2e96b9]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f235a6ebfec]
[(nil)]
mutrace: Showing 3 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags
1 244 45 37 0.045 0.000 0.001 Mx.--.
0 13 9 2 0.004 0.000 0.001 M-.--.
2 20 7 2 0.006 0.000 0.001 M-.--.
||||||
/|||||
Object: M = Mutex, W = RWLock /||||
State: x = dead, ! = inconsistent /|||
Use: R = used in realtime thread /||
Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
Mutex Protocol: i = INHERIT, p = PROTECT /
RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC
mutrace: Note that the flags column R is only valid in --track-rt mode!
mutrace: Total runtime is 0.853 ms.
mutrace: Results for SMP with 4 processors.
```
First I want to observe how the thread count affects execution time.
Start by modifying the Makefile:
```shell
# note: {1..8} brace expansion requires bash (e.g. SHELL := /bin/bash)
mutrace_test: sort
	for number in 1 2 4 8 16 32 64 128 256 512; do \
		(for i in {1..8}; do echo $$RANDOM; done) | mutrace ./sort $$number 8; \
	done
```
Extract the runtimes:
```shell=
make mutrace_test 2>&1 >/dev/null | grep -i 'Total runtime' | awk '{print $5}' > time
```
Final results:
| Thread count | Time (ms) |
|-------------:|----------:|
| 1 | 0.613 |
| 2 | 0.596 |
| 4 | 0.666 |
| 8 | 0.766 |
| 16 | 1.570 |
| 32 | 1.540 |
| 64 | 5.124 |
| 128 | 9.999 |
| 256 | 10.884 |
| 512 | 307.797 |
With the input changed to 80000 entries (same thread counts as above):
| Thread count | Time (ms) |
|-------------:|----------:|
| 1 | 136.403 |
| 2 | 80.437 |
| 4 | 76.989 |
| 8 | 77.935 |
| 16 | 75.384 |
| 32 | 77.527 |
| 64 | 81.285 |
| 128 | 92.251 |
| 256 | 232.526 |
| 512 | 802.332 |
With the input changed to 160000 entries:
| Thread count | Time (ms) |
|-------------:|----------:|
| 1 | 238.642 |
| 2 | 210.835 |
| 4 | 161.025 |
| 8 | 159.942 |
| 16 | 184.871 |
| 32 | 213.917 |
| 64 | 228.325 |
| 128 | 205.106 |
| 256 | 613.932 |
| 512 | 2466.450 |
After reading classmate `TempoJiJi`'s [notes](https://hackmd.io/s/B1-Kytv0) I decided to revise my Makefile; mine was embarrassingly bad and I was tempted to just delete it. Here is his version:
```shell=
bench:
	for i in `seq 100000 10000 500000`; do \
		./input_generator $$i; \
		for j in 1 2 4 8 16 32 64; do \
			./sort $$j $$i; \
		done; \
	done

plot: bench
	gnuplot runtime.gp
```
I tweaked it a bit; here is my own version first. I am hopeless, though: I was stuck a whole night on how to use grep inside a Makefile.
```shell=
mutrace_test: sort
	# outer loop: input size N; inner loop: thread count
	for j in $$(seq 100000 10000 500000); do \
		for number in 1 2 4 8 16 32 64 128; do \
			(for i in $$(seq 1 $$j); do echo $$RANDOM; done) | \
				mutrace ./sort $$number $$j; \
		done; \
	done
	# TODO: pipe the output through grep 'Total runtime' to pull out the
	# time; this is the part I could not get working
```
In short, my change replaces `TempoJiJi`'s input_generator with a command run directly inside the Makefile, so no extra program is needed. But I still have to figure out how to capture the timing; maybe I will ask `TempoJiJi`.
>> You need to follow the statistical model from the earlier clz homework. Because the input data actually sits in non-contiguous memory, there are many variables involved, so it is best to narrow the scope first. [name=jserv]
>> OK, I will compare them carefully. [name=SwimGlass]
Read [Measure time in Linux - time vs clock vs getrusage vs clock_gettime vs gettimeofday vs timespec_get?](http://stackoverflow.com/questions/12392278/measure-time-in-linux-time-vs-clock-vs-getrusage-vs-clock-gettime-vs-gettimeof).
I adapted [王紹華's compute pi](https://hackmd.io/s/BJdYsOPa) to implement confidence intervals for the timing statistics. Below is the automated test version with confidence intervals:
```shell=
for i in `seq 100 5000 1000000`; do \
    printf "%d " $i; \
    for thread in 2 4 8 16 32 64; do \
        (for n in $(seq 1 $i); do echo $RANDOM; done) | ./sort $thread $i; \
    done; \
    printf "\n"; \
done > result_clock_gettime.csv
```
The final results are shown here.

Here we can see that the runtime grows as the thread count increases. I originally wanted to investigate how the number of threads affects mutex usage, but so far at most three mutexes show up in the measurements.
## Directions for improvement
- [ ] Read [concurrent-ll](https://github.com/jserv/concurrent-ll)
- [ ] Tune the merge sort
> After asking around: compute the space complexity of the merge sort, build a memory pool for it to use, and allocate the maximum space once the analysis is done. Since two merges may run at the same time, the memory must be split into segments to guard against overlap; also look at how memory pool implementations handle the final data movement, and at reclaiming malloc'd memory. The data being sorted is non-contiguous in memory (look into linked-list cache behavior). A minimal pool sketch follows below.
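As a starting point for the memory-pool idea, here is a minimal fixed-capacity pool (one big allocation handed out by a bump pointer; purely illustrative, and concurrent merges would still need per-segment offsets or a lock to prevent overlap):

```c
#include <stdlib.h>

/* minimal fixed-capacity pool: one allocation, bump-pointer handout */
typedef struct {
    char *base;
    size_t offset, capacity;
} mem_pool_t;

int pool_init(mem_pool_t *p, size_t capacity)
{
    p->base = malloc(capacity);
    p->offset = 0;
    p->capacity = capacity;
    return p->base ? 0 : -1;
}

void *pool_alloc(mem_pool_t *p, size_t size)
{
    if (p->offset + size > p->capacity)
        return NULL; /* pool exhausted */
    void *ptr = p->base + p->offset;
    p->offset += size;
    return ptr;
}

void pool_free_all(mem_pool_t *p)
{
    /* everything is returned at once: one free for the whole pool */
    free(p->base);
    p->base = NULL;
    p->offset = p->capacity = 0;
}
```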