owned this note
owned this note
Published
Linked with GitHub
# 2016q3 Homework3 (mergesort-concurrent)
contribute by <`heathcliffYang`>
[Github](https://github.com/heathcliffYang/mergesort-concurrent)
## 目標
* 作為 [concurrency](/s/H10MXXoT) 的展示案例
* 學習 POSIX Thread Programming,特別是 [synchronization object](https://docs.oracle.com/cd/E19683-01/806-6867/6jfpgdcnd/index.html)
* 為日後效能分析和 scalability 研究建構基礎建設
* 學習程式品質分析和相關的開發工具
## 理解Code
### list.h
llist_t 有head指向第一個node_t,而非llist_t本身的指標指向第一個node & size (list大小)
**node_t :** datatype為intptr_t的data
> 若在不同的平台其長度也不相同,那就不懂使用intptr_t的用意
>> 其長度都會恰好與使用平台的指標長度相同,因此該變數可以拿來儲存數值,也可以拿來儲存指標 [name=Hua]
>>
### threadpool.h
**task_t :** 單一工作,有待執行function的pointer,傳入的變數,指向下一個task_t的next,指向上一個工作的last
**tqueue_t :** 是工作佇列,紀錄頭尾的pointer,==mutex (thread搶奪到時方能執行task_t) 與 cond (工作通知)==
**tpool_t :** 放置threads & queue
### threadpool.c
**task_free :** 釋放單一工作 ==記得用完要釋放!!==
**tqueue_pop & tqueue_push :** 從tail取工作,回傳工作指標;從head把工作塞入 ==在更動queue裡的東西必要注意mutex==
**tqueue_free :** 用迴圈把一整串queue free掉
==實驗只free head 與此的差別==
### main.c
==--- global ---==
**tmp_list** , **the_list** , thread_count , data_count , max_cut & pool
**data_context :** mutex & cut_thread_count (紀錄thread切分list的次數)
==--- global ---==
**merge_list() :** merge兩個a list & b list,並回傳排序好的_lsit
```c
llist_t *small = (llist_t*) ((intptr_t) a * (a->head->data <= b->head->data) +
(intptr_t) b * (a->head->data > b->head->data));
```
small為a或b裡data較小者的list
> 為什麼要intptr_t?
> ==a & b 是 pointer,用(intptr_t)強制轉換datatype,使其能與判斷式的結果(int)匹配(不會出現addr與int相乘的error也不會出現warning:cast from ptr to int)而可以相乘== --- from Hua 提供的資訊與試驗
>> 程式碼是寫給你改的,不是給你背誦用,覺得有疑惑,就大膽去實驗,比方說改為 `(void *)` 再觀察看看 [name=jserv]
>> 好的,謝謝老師
**merge_sort() :** 將傳入的list分兩半(用`list_nth()`找出要的mid的list位置),並且處理前半段left list的最後一個node_t的指標與size
**merge() :** `_list->size < (uint32) data_count` 成立,代表list還沒全部merge_sort完,則用_list與_t(也就是已有東西的tmp_list)做新工作排入task queue;若不成立,則將_list放到the_list
==tmp_list的意義 : 待被`merge_list`的list==
**cut_func() :** ==成立條件 : list->size > 1 且 cut_count < max_cut==,將傳入的list切半之後分別輸入task queue
**task_run() :** 從queue取出新工作並執行
**main() :**
1. argv[]-->輸入分別為thread_count跟data_count,max_cut取min(thread_count, data_count)
2. 初始化threadpool, task ---> create thread_count個thread,並傳入task_run
3. 製作`cut->func`的task並傳入queue,由某個thread搶到這個工作後,開始切分原始的list,並製造left & right的新的切分工作傳入queue,直到達到max_cut
4. 開始執行`merge`裡的`merge_sort`把list成兩半直到left&right都只剩1個node,再開始`merge_list`再回傳,直到merge完所有的data
5. ==處理資料輸入 :==
> 與hugikun999討論到,atoi()不能轉換整個字串,而我做了一個小實驗,發現對於字串的指標,shift一個byte的話,他照樣會記錄後面的字串,而不是其中某個字元;所以還在思考到底要用甚麼方法,才能一個一個地將字元轉換;之後也考慮到mmap()時,對於一個pointer來說,他怎麼知道他是要取這樣一個長度?是取決於一開始對該pointer宣告的datatype?(實驗中)
> ```c
> char word[16]="apple";
> printf("%s",word+1); -->會印出pple
> ```
> 其中也找到了一個函式:`getc( ) //從某一個檔案讀取一個字元`
> with hugikun999's help, I use mmap() successfully!!
### Makefile (not yet)
`$ (for i in {1..8}; do echo $RANDOM; done) | ./sort 4 8`
[資料](http://linux.vbird.org/linux_basic/0320bash.php)
## Start My Try!!
### deal with my data set
After I get the mmap pointer of the data set, how can I use it to turn the character one by one? Try to print what the map pointer point to.
```c
char *map = mmap(NULL, file_size, PROT_READ, MAP_SHARED, fp_orig, 0);
printf("%s\n", map);
```
It print all words... with "enter!"
my first word is kurupi
```c
char *map = mmap(NULL, file_size, PROT_READ, MAP_SHARED, fp_orig, 0);
char getline = *(map+7);
printf("%s\n", &getline);
-----------------------------------------------------------------
output is :
s // with a strange symbol but I don't know why
```
With this, I know data in virtual memory is continuouns and ignore the "enter." I know I need align my data (more desirable) or separate them..
`read(fp_orig, read_line, 1)` won't ignore the "enter." ---> method 1 : read one by one
fopen the same file is OK. ---> method 2 : read line by line ==****use this==
### analysis
1. each word will be a `node_t`, with a little array consisting of characters. ---> no need to be merge
2. the llist_t consists of words!
### orignal method's time
set structure time; sort time:0.354496,1.233266
## Pthread
* [Synchronization的函式](https://docs.oracle.com/cd/E19683-01/806-6867/6jfpgdcnd/index.html)
1. `int pthread_join(thread_t tid, void **status);`
When status is not NULL, it points to a location that is set to the exit status of the terminated thread when pthread_join() returns successfully.
If multiple threads wait for the same thread to terminate, they all wait until the target thread terminates, than one thread returns successfully and the others fail with an error of ESRCH.
> 不太懂status的部分的作用,以及先前是從哪裡設定的
2. The pthread_detach() function is used to indicate to the implementation that storage for the thread tid can be reclaimed when the thread terminates.
> detach的實際作用?
3. Thread-specific data is maintained on a per-thread basis. TSD is the only way to define and refer to data that is private to a thread. Each thread-specific data item is associated with a key that is global to all threads in the process. Using the key, a thread can access a pointer (void *) that is maintained per-thread.
4. The per-thread binding is deallocated when a thread terminates if the key was created with a destructor function.
5. When a key has a non-NULL destructor function and the thread has a non-NULL value associated with that key, the destructor function is called with the current associated value when the thread exits. The order in which the destructor functions are called is unspecified.
6. Signal Mask ??
7. When threads in two or more processes can use a single synchronization object jointly. Note that the synchronization object should be initialized by only one of the cooperating processes, because reinitializing a synchronization object sets it to the unlocked state.
8. Block on a condition variable ---> `pthread_cond_wait(3THR)`
> thread waiting的同時也把mutex的lock解除???被誰??