2016q3 Homework3 (mergesort-concurrence)

# 2016q3 Homework3 (mergesort-concurrence) contributed by <`abba123`> ## 開發環境 OS : ubuntu 16.04.1 LTS ``` lscpu ``` ``` Architecture: x86_64 CPU 作業模式： 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 每核心執行緒數： 2 每通訊端核心數： 2 Socket(s): 1 NUMA 節點： 1 供應商識別號： GenuineIntel CPU 家族： 6 型號： 58 Model name: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz 製程： 9 CPU MHz： 1924.316 CPU max MHz: 3100.0000 CPU min MHz: 1200.0000 BogoMIPS: 4988.73 虛擬： VT-x L1d 快取： 32K L1i 快取： 32K L2 快取： 256K L3 快取： 3072K NUMA node0 CPU(s): 0-3 ``` ## 案例分析: [mergesort-concurrent](https://github.com/sysprog21/mergesort-concurrent) (對單向 Linked List) ### Thread pool ref: http://openhome.cc/Gossip/DesignPattern/ThreadPool.htm http://swind.code-life.info/posts/c-thread-pool.html 在一般使用 thread 的時候，當一個請求到來，就會建立一個新的 thread，用完之後，便不再使用。但 thread 的建立需要系統資源，對於一個接受許多請求的情況，就會不斷的建立新執行緒，造成系統效能的降低。 thread pool 的概念就是重複使用建好的 thread。 ![](https://i.imgur.com/MqlKMva.png) task_t 裡面包含了要做的 function 和要傳的參數 ```c= typedef struct _task { void (*func)(void *); void *arg; struct _task *next, *last; } task_t; ``` task_queue 的資料結構，包含了 lock 去控制 task 的執行 ```c= typedef struct { task_t *head, *tail; pthread_mutex_t mutex; pthread_cond_t cond; uint32_t size; } tqueue_t; ``` thread_pool 裡面包含了所有的 thread 以及 task_queue ```c= typedef struct { pthread_t *threads; uint32_t count; tqueue_t *queue; } tpool_t; ``` ## 作業要求 * 將 merge sort 的實做改為可接受 [phonebook-concurrent](https://github.com/sysprog21/phonebook-concurrent) 的 35 萬筆資料輸入的資料檔 * 字典檔資料需要事先用 `sort -R` 處理過 * 思考如何得到均勻分佈的亂數排列，並且設計自動測試的機制原本檔案接收的資料是數字，但我們要改成可以接受英文的排序，所以必須要做一點小更動首先把 node 裡面改成可接受 string ```c= typedef struct node { char str[20]; struct node *next; } node_t; ``` 這邊我們改一下判斷式 strcmp 這邊的比較是依照字典排序剛好符合我們的需求 ```c= while (a->size && b->size) { llist_t *small = (llist_t *) ((intptr_t) a * (strcmp(a->head->str,b->head->str)<= 0) + (intptr_t) b * (strcmp(a->head->str,b->head->str) > 0)); ``` 我們先把原本的資料打散 ``` $ uniq words.txt | sort -R > words_input.txt ``` Makefile 的部份寫了一套自動比較的script 這邊我們用 diff 來比較正確性 ``` plot: sort ./sort 1 diff words.txt output.txt ./sort 2 diff words.txt output.txt ./sort 4 diff words.txt output.txt ./sort 8 diff words.txt output.txt ./sort 16 diff words.txt output.txt ./sort 32 diff words.txt output.txt gnuplot plot.gp ``` ![](https://i.imgur.com/JGOs9qr.png)