A07: mergesort-concurrent

contributed by < snoopy831002 >

作業要求

將 merge sort 的實做改為可接受 phonebook-concurrent 的 35 萬筆資料輸入的資料檔
- 字典檔資料需要事先用 sort -R 處理過
- 思考如何得到均勻分佈的亂數排列，並且設計自動測試的機制
研究 thread pool 管理 worker thread 的實做，提出實做層面的不足，並且參照 concurrent-ll，提出 lock-free 的實做
學習 concurrent-ll (concurrent linked-list 實作) 的 scalability 分析方式，透過 gnuplot 製圖比較 merge sort 在不同執行緒數量操作的效能
- 注意到 linked list 每個節點配置的記憶體往往是不連續，思考這對效能分析的影響
一併嘗試重構 (refactor) 給定的程式碼，使得程式更容易閱讀和維護。延續 A05: introspect，不只是在共筆上用文字提出良性詳盡的批評，也該反映在程式碼的變革
共筆的內容儘量用 GraphViz 製作
截止日期：
- 08:00AM Oct 7, 2016 (含) 之前
- 越早在 GitHub 上有動態、越早接受 code review，評分越高

code review

ThreadPool:
程式碼內有兩用到threadpool

第一個地方：在main呼叫cut_func()的時候





 task_t *_task = (task_t *)
 malloc(sizeof(task_t));
 _task->func = cut_func;
 _task->arg = the_list;
 tqueue_push(pool->queue, _task);

cut_func()將會被視為一個工作丟入queue

第二個地方在cut_func()內遞迴呼叫cut_func()











    /* create new task: left */
    task_t *_task = (task_t *) malloc(sizeof(task_t));
    _task->func = cut_func;
    _task->arg = _list;
    tqueue_push(pool->queue, _task);

    /* create new task: right */
    _task = (task_t *) malloc(sizeof(task_t));
    _task->func = cut_func;
    _task->arg = list;
    tqueue_push(pool->queue, _task);

所以基本上切割時都是用threadpool的方式

觀察main函式
流程: cut_func() → merge_sort()→ merge_list() → merge()
cut_func() : 切割輸入的linked list


max_cut = thread_count * (thread_count <= data_count) +
data_count * (thread_count > data_count) - 1;

這裡有比較特別，他有做把資料平均分給每個thread的動作
如果data比thread多就把data平均切割給thread;如果data比thread少那就按照data的數量去切(例如：4個data，切三次就好)

merge_sort() : 也是切割輸入的linked list，所以我覺得它跟cut_func應該可以合併
merge_list() ：真正做排序的地方。我個人認為名稱不是很好，因為會讓我誤以為要做linked list的合併，但是其實它是在做sort
- 做比較的方法：
```
 llist_t *small = (llist_t *)
 ((intptr_t) a * (a->head->data <= b->head->data) +
 (intptr_t) b * (a->head->data > b->head->data));
```
如果a的第一個數值比b小，那比較小的數值就是a。反之，比較小的數值就是b。這樣的寫法很簡潔，不會出現一大堆if else(雖然要看一下才看得懂…)
merge()：合併剛剛切割的linked list

code refactoring

因為在上面code review中，發現其實 cut_func 與 merge_sort 有重複的地方，覺得這兩個地方可以合併ㄛ

使用GDB對segmentation fault(core dump)進行偵錯

做這份作業的時候常常會遇到segmentation fault(core dump)的問題。為了解決它，我嘗試使用GDB對它進行偵錯。

使用方法：

丟入待測執行檔(以sort程式作為範例) gdb sort
run and see what happen (gdb) run 4 6 ←run後面可以加入參數
如果出現SIGSEGV代表我們嘗試在使用一個不能使用的記憶體區塊，下面的錯誤訊息我們大慨可以知道是在cut_func的地方出現問題。

[Thread debugging using libthread_db enabled]
Using host libthread_db library 
"/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77f0700 (LWP 5738)]
[New Thread 0x7ffff6fef700 (LWP 5739)]
[New Thread 0x7ffff67ee700 (LWP 5740)]
[New Thread 0x7ffff5fed700 (LWP 5741)]

Thread 2 "sort" received signal SIGSEGV,     
Segmentation fault.
[Switching to Thread 0x7ffff77f0700 (LWP 
5738)]
0x00000000004019d4 in cut_func             
(data=0x604010) at main.c:87
87	    
list_nth(list, mid - 1)->next = NULL;

案例分析: mergesort-concurrent

(1)測試

make

./sort [num of thread] [num of input]

結果

input unsorted data line-by-line
32
45
6
23
56
89
76
56
sorted results:
[6] [23] [32] [45] [56] [56] [76] [89]

(2)修正資料的輸入方式，改由外部引入

CODE:








FILE *fp=fopen("testSort.txt","r");
if(fp) {
   long int data;
   while((fscanf(fp,"%ld\n",&data))!=EOF){
       list_add(the_list, data);
   }
}vim 
fclose(fp);

修正程式碼內資料型態，使其可接受文字輸入

val_t 改為 char


typedef char val_t[16];

list_add 函式中輸入的data資料型態也應改成char




char data[16];
while((fscanf(fp,"%s\n",data))!=EOF) {
    list_add(the_list, data);
}

(3)資料輸入

對35萬筆資料(word.txt)做sort 資料輸入
- 指令：uniq words.txt | sort -R > input.txt
  uniq 可以去掉資料中重複的部份，而sort則可以-R(random)做隨機排序。sort -R 也可用 shuf 代替

參考資料

使用gdb偵錯segmentation fault

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.