# 2017q1 Homework4 (mergesort-concurrent) contributed by <`twzjwang`> 作業說明: [B08: mergesort-concurrent](https://hackmd.io/s/B1xV_p_jl) github: [twzjwang/mergesort-concurrent](https://github.com/twzjwang/mergesort-concurrent) # 開發環境 - Ubuntu Ubuntu 16.04.2 LTS Linux 4.8.0-36-generic - cpu version: Intel(R) Core(TM) i5-3337U CPU @ 1.80GHz Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K - memory size: 4GiB # 開發前 - 依照作業說明執行 ``` $ (for i in {1..8}; do echo $RANDOM; done) | ./sort 4 8 Segmentation fault (core dumped) ``` - `argv[2]` 為 `build_list_from_file` 的檔名,而不是先的 `input_count` - fopen 失敗 - 執行 `$ uniq test_data/words.txt | sort -R > test_data/word_sorted.txt` 取得亂數排列的檔案 - 根據 Manual - NAME : `sort - sort lines of text files ` - DESCRIPTION : `--sort=WORD sort according to WORD: random -R` - 執行 `$ ./sort 4 test_data/word_sorted.txt` - 結果 ``` #Total_tasks_consumed: 12 #Elapsed_time: 51.829 ms #Throughput: 231 (per sec) ``` - list_print ``` 0 ... 0 ``` - 字串經 [atol](http://www.cplusplus.com/reference/cstdlib/atol/) 轉換回傳 long int - 轉換結果皆為0 # 修改-使用字串排列 - 定義 datatype 為大小 16 的char array - `typedef char val_t[16];` - 使用 strcmp 比大小`llist_t *small = (strcmp(a->head->data, b->head->data) <= 0) ? a : b;` - 自動測試 - `make check_words` ``` check_words: sort genData ./sort $(THREADS) test_data/input.txt output.txt diff output.txt test_data/expected.txt ``` - diff 沒有輸出 : 內容完全相同 - 結果 ``` $ make check uniq test_data/words.txt | sort -R > test_data/input.txt uniq test_data/words.txt | sort > test_data/expected.txt ./sort 4 test_data/input.txt output.txt #Total_tasks_consumed: 12 #Elapsed_time: 150.177 ms #Throughput: 79 (per sec) diff output.txt test_data/expected.txt ``` - mutrace - Lock Contention - lock contention can have a big impact on the runtime behaviour of applications - 項目 - Locked : 整個 runtime 中 mutex locked 的次數 - Changed : 擁有 mutex 的 thread 改變的次數 - 數量高 => the risk of contention 高 - Cont : contention 發生次數 - 想要取得 lock 但已經先被其他 thread 取得,必須等待 - tot.Time[ms] : 等待的總時間 - avg.Time[ms] : 平均等待時間 - max.Time[ms] : 最大等待時間 - 結果 - Mutex #1 發生大量 contention 並造成等待 - 使用 `addr2line` 找到對應的位置,推測 - Mutex #0 - Mutex #1 : the_queue->mutex - Mutex #2 : data_context.mutex ``` $ mutrace ./sort 4 test_data/input.txt output.txt mutrace: 0.2 sucessfully initialized for process sort (pid 3788). #Total_tasks_consumed: 12 #Elapsed_time: 129.242 ms #Throughput: 92 (per sec) mutrace: Showing statistics for process sort (pid 3788). mutrace: 3 mutexes used. Mutex #1 (0x0x1fba670) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2] ./sort(tqueue_init+0x46) [0x4014eb] ./sort(tpool_init+0x6a) [0x40174f] ./sort(main+0xe5) [0x401ee4] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830] Mutex #0 (0x0x7fbca513b860) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7fbca7b356b9] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7fbca4f37fec] [(nil)] Mutex #2 (0x0x603140) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2] ./sort(main+0xab) [0x401eaa] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830] mutrace: Showing 3 most contended mutexes: Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags 1 83875 31938 25850 11.116 0.000 0.019 Mx.--. 0 20 15 8 0.005 0.000 0.001 M-.--. 2 6 4 0 0.002 0.000 0.001 M-.--. |||||| /||||| Object: M = Mutex, W = RWLock /|||| State: x = dead, ! = inconsistent /||| Use: R = used in realtime thread /|| Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /| Mutex Protocol: i = INHERIT, p = PROTECT / RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC mutrace: Note that the flags column R is only valid in --track-rt mode! mutrace: Total runtime is 200.374 ms. mutrace: Results for SMP with 4 processors. ``` :::danger 特別留意 lock contention 對效能的影響,目前整合的 thread pool 實作不是很有效率 (之後我們應該要重寫一套更好的) --jserv ::: # 修改-[lock free](https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom) - lock free - individual threads 可能發生 starve ,但保證 system-wide throughput - All wait-free algorithms are lock-free - lock-free 演算法的四個 phases - 完成自己的 operation - 協助阻塞的 operation - 中止阻塞的 operation - 等待 - 針對 lock contention 次數較多的 Mutex #1 : the_queue->mutex - crtitical section - pthread_mutex_lock() 和 pthread_mutex_unlock() 之間 - ```C= task_t *tqueue_pop(tqueue_t * const the_queue) { task_t *ret; pthread_mutex_lock(&(the_queue->mutex)); ret = the_queue->head; if (ret) { the_queue->head = ret->next; ret->next = NULL; if (!(the_queue->head)) { the_queue->tail = NULL; } the_queue->size--; the_queue->num_of_consumed++; } pthread_mutex_unlock(&(the_queue->mutex)); return ret; } uint32_t tqueue_size(tqueue_t * const the_queue) { uint32_t ret; pthread_mutex_lock(&(the_queue->mutex)); ret = the_queue->size; pthread_mutex_unlock(&(the_queue->mutex)); return ret; } int tqueue_push(tqueue_t * const the_queue, task_t *task) { pthread_mutex_lock(&(the_queue->mutex)); task->next = NULL; if (the_queue->tail) the_queue->tail->next = task; the_queue->tail = task; if (the_queue->size++ == 0) the_queue->head = task; pthread_mutex_unlock(&(the_queue->mutex)); return 0; } ``` # 參考資料 - [LanKuDot 學長 筆記](https://hackmd.io/s/S1ezGhIA) - [Measuring Lock Contention](http://0pointer.de/blog/projects/mutrace.html) ###### tags: `twzjwang`