# 2017q1 Team 12 (mergesort-concurrent) contributed by <`baimao8437`> , <`xdennisx`> - Task: 比照 [B08: mergesort-concurrent](https://hackmd.io/s/B1xV_p_jl) 要求,分析 mutex contention,引入更多工具 (如 ThreadSanitizer) 以理解原有程式的問題 接序上一份作業,將30萬筆 phonebook data 進行 merge sort ## Mutex 分析 - Mutrace `$ mutrace ./sort 4 test_data/words.txt` ``` orginal file size = 3206080 #Total_tasks_consumed: 12 #Elapsed_time: 60.443 ms #Throughput: 198 (per sec) ... mutrace: Showing statistics for process sort (pid 12819). mutrace: 3 mutexes used. Mutex #1 (0x0x55b33ca1d220) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c] ./sort(tqueue_init+0x46) [0x55b33acdf9d5] ./sort(tpool_init+0x6a) [0x55b33acdfc39] ./sort(main+0x103) [0x55b33ace0787] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1] Mutex #0 (0x0x7f2248dee380) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f224bd3e827] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f2248be9fac] [(nil)] Mutex #2 (0x0x55b33aee20c0) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c] ./sort(main+0xc7) [0x55b33ace074b] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1] mutrace: Showing 3 most contended mutexes: Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags 1 53419 555 519 8.680 0.000 1.174 Mx.--. 0 20 9 5 0.004 0.000 0.001 M-.--. 2 6 4 0 0.003 0.001 0.001 M-.--. |||||| /||||| Object: M = Mutex, W = RWLock /|||| State: x = dead, ! = inconsistent /||| Use: R = used in realtime thread /|| Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /| Mutex Protocol: i = INHERIT, p = PROTECT / RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC mutrace: Note that the flags column R is only valid in --track-rt mode! mutrace: Total runtime is 1582.833 ms. mutrace: Results for SMP with 4 processors. ``` 結果顯示此程式中有 3 個 mutex 其中 mutex 1 所佔時間比例最大 若將 thread 數量增加至 8 個 則執行結果: ``` orginal file size = 3206080 #Total_tasks_consumed: 24 #Elapsed_time: 171.773 ms #Throughput: 139 (per sec) ... mutrace: 0.2 sucessfully initialized for process sort (pid 1626). mutrace: Showing statistics for process sort (pid 1626). mutrace: 3 mutexes used. Mutex #1 (0x0x55a1c5d3ae70) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c] ./sort(tqueue_init+0x46) [0x55a1c4ee49d5] ./sort(tpool_init+0x6a) [0x55a1c4ee4c39] ./sort(main+0x103) [0x55a1c4ee5787] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1] Mutex #2 (0x0x7f6cabfe5380) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f6cb0f39827] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f6cabde0fac] [(nil)] Mutex #0 (0x0x55a1c50e70c0) first referenced by: /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c] ./sort(main+0xc7) [0x55a1c4ee574b] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1] mutrace: Showing 3 most contended mutexes: Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags 1 252553 1755 1650 37.412 0.000 2.291 Mx.--. 2 40 12 2 0.007 0.000 0.001 M-.--. 0 14 11 0 0.005 0.000 0.001 M-.--. |||||| /||||| Object: M = Mutex, W = RWLock /|||| State: x = dead, ! = inconsistent /||| Use: R = used in realtime thread /|| Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /| Mutex Protocol: i = INHERIT, p = PROTECT / RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC mutrace: Note that the flags column R is only valid in --track-rt mode! mutrace: Total runtime is 560.546 ms. mutrace: Results for SMP with 4 processors. ``` 可以看到 mutex 1 所佔用的時間居然為 4 threads 時的 4 倍之多 這邊想嘗試用addr2line去觀察程式碼 `$ addr2line -e sort 0x46` 出來結果永遠是 `??:0` 不知道為何不能正確找到程式碼 ## ThreadSanitizer `ThreadSanitizer` 是一個在 C/C++ 中可以偵測 data race 的 detector,data race 是指在沒有 lock 的情況下有多個 thread 在同一個 proceess 在無法確定 thread 之間的**讀寫順序**同時存取到相同的 memory location 時,就會產生 data race 的情況 ### Data Race Example ```c= #include <pthread.h> int Global; void *Thread1(void *x) { Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, NULL, Thread1, NULL); Global = 43; pthread_join(t, NULL); return Global; } ``` 上面兩個 thread 就在沒有做 synchronization 的情況下對同一個 build-in type variable 作了修改的動作,會產生以下的結果 ```shell= dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c dennis@dennis-X550CC:~$ ./a.out ================== WARNING: ThreadSanitizer: data race (pid=7815) Write of size 4 at 0x0000014ace38 by main thread: #0 main /home/dennis/test.c:10 (a.out+0x0000004a1f7e) Previous write of size 4 at 0x0000014ace38 by thread T1: #0 Thread1 /home/dennis/test.c:4 (a.out+0x0000004a1f37) Location is global '<null>' at 0x000000000000 (a.out+0x0000014ace38) Thread T1 (tid=7817, finished) created by main thread at: #0 pthread_create <null> (a.out+0x000000422236) #1 main /home/dennis/test.c:9 (a.out+0x0000004a1f74) SUMMARY: ThreadSanitizer: data race /home/dennis/test.c:10 in main ================== ThreadSanitizer: reported 1 warnings ``` 所以如果將 `pthread_join()` 往上移一行,讓 main thread 等待 `Thread1()` 執行完之後再對 `Global` 做修改就不會出錯 ```c= #include <pthread.h> int Global; void *Thread1(void *x) { Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, NULL, Thread1, NULL); pthread_join(t, NULL); Global = 43; return Global; } ``` ```shell= dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c dennis@dennis-X550CC:~$ ./a.out dennis@dennis-X550CC:~$ ``` ### 使用方法 在編譯的時候加入 `-fsanitize=thread -no-pie` 即可 ### 分析結果 因為原本程式用 mutex 避免了 data race,所以一開始跑 TSan 沒有任何輸出結果,後來把 `pthread_mutex_lock` 跟 `pthread_mutex_unlock` 拿掉後就有結果輸出 `$ ./sort 1 test_data/words.txt` ```shell WARNING: ThreadSanitizer: data race (pid=7685) Read of size 8 at 0x0000006051e8 by thread T1: #0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 (sort+0x000000402d16) #1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a) #2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180) #3 <null> <null> (libtsan.so.0+0x0000000245a9) Previous write of size 8 at 0x0000006051e8 by thread T2: #0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:32 (sort+0x000000402d34) #1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a) #2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180) #3 <null> <null> (libtsan.so.0+0x0000000245a9) Location is global 'tmp_list' of size 8 at 0x0000006051e8 (sort+0x0000006051e8) Thread T1 (tid=7687, running) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x000000028510) #1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241) #2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b) Thread T2 (tid=7688, running) created by main thread at: #0 pthread_create <null> (libtsan.so.0+0x000000028510) #1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241) #2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b) SUMMARY: ThreadSanitizer: data race /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 in merge_thread_lists ... ``` 對應 `main.c:30` ```c=28 if (_list->size < (uint32_t) data_count) { // pthread_mutex_lock(&(data_context.mutex)); llist_t *_t = tmp_list; if (!_t) { ``` ## 檢討 - 1:18 mutrace 使用原理 - addr2line原理:分析ELF執行檔,用addr回推程式碼所在位置 - lock contention:**持有**資源的過程中是有重疊的,在意時間順序, - mutrace 本質:LD_PRELOAD 取代mutex 實作時紀錄時間 - 查mutrace 後面攔位在幹嘛 - 4.12 data race: write 後去 read 時間上的順序 才會發生 - 4.33 Tsan 運作原理 查論文 - 5.09 PIE: position independent execution ## Reference - [ThreadSanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual) - [Data Race](https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces) - [Data Race and Race Condition](http://blog.regehr.org/archives/490) - [ThreadSanitizer – data race detection in practice](http://www.cs.columbia.edu/~junfeng/11fa-e6121/papers/thread-sanitizer.pdf) - [Happens-Before](http://opass.logdown.com/posts/797113-concurrency-series-3-happens-before) - [Mutex vs RwLock](http://wiki.jikexueyuan.com/project/rust-primer/rcarc/mutex.html) - [Mutrace source code](https://github.com/dbpercona/mutrace/blob/master/mutrace.c)