# 2017q1 Team 12 (mergesort-concurrent)
contributed by <`baimao8437`> , <`xdennisx`>
- Task: 比照 [B08: mergesort-concurrent](https://hackmd.io/s/B1xV_p_jl) 要求,分析 mutex contention,引入更多工具 (如 ThreadSanitizer) 以理解原有程式的問題
接序上一份作業,將30萬筆 phonebook data 進行 merge sort
## Mutex 分析
- Mutrace
`$ mutrace ./sort 4 test_data/words.txt`
```
orginal file size = 3206080
#Total_tasks_consumed: 12
#Elapsed_time: 60.443 ms
#Throughput: 198 (per sec)
...
mutrace: Showing statistics for process sort (pid 12819).
mutrace: 3 mutexes used.
Mutex #1 (0x0x55b33ca1d220) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c]
./sort(tqueue_init+0x46) [0x55b33acdf9d5]
./sort(tpool_init+0x6a) [0x55b33acdfc39]
./sort(main+0x103) [0x55b33ace0787]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1]
Mutex #0 (0x0x7f2248dee380) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f224bd3e827]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f2248be9fac]
[(nil)]
Mutex #2 (0x0x55b33aee20c0) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c]
./sort(main+0xc7) [0x55b33ace074b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1]
mutrace: Showing 3 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags
1 53419 555 519 8.680 0.000 1.174 Mx.--.
0 20 9 5 0.004 0.000 0.001 M-.--.
2 6 4 0 0.003 0.001 0.001 M-.--.
||||||
/|||||
Object: M = Mutex, W = RWLock /||||
State: x = dead, ! = inconsistent /|||
Use: R = used in realtime thread /||
Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
Mutex Protocol: i = INHERIT, p = PROTECT /
RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC
mutrace: Note that the flags column R is only valid in --track-rt mode!
mutrace: Total runtime is 1582.833 ms.
mutrace: Results for SMP with 4 processors.
```
結果顯示此程式中有 3 個 mutex 其中 mutex 1 所佔時間比例最大
若將 thread 數量增加至 8 個 則執行結果:
```
orginal file size = 3206080
#Total_tasks_consumed: 24
#Elapsed_time: 171.773 ms
#Throughput: 139 (per sec)
...
mutrace: 0.2 sucessfully initialized for process sort (pid 1626).
mutrace: Showing statistics for process sort (pid 1626).
mutrace: 3 mutexes used.
Mutex #1 (0x0x55a1c5d3ae70) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c]
./sort(tqueue_init+0x46) [0x55a1c4ee49d5]
./sort(tpool_init+0x6a) [0x55a1c4ee4c39]
./sort(main+0x103) [0x55a1c4ee5787]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1]
Mutex #2 (0x0x7f6cabfe5380) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f6cb0f39827]
/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f6cabde0fac]
[(nil)]
Mutex #0 (0x0x55a1c50e70c0) first referenced by:
/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c]
./sort(main+0xc7) [0x55a1c4ee574b]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1]
mutrace: Showing 3 most contended mutexes:
Mutex # Locked Changed Cont. tot.Time[ms] avg.Time[ms] max.Time[ms] Flags
1 252553 1755 1650 37.412 0.000 2.291 Mx.--.
2 40 12 2 0.007 0.000 0.001 M-.--.
0 14 11 0 0.005 0.000 0.001 M-.--.
||||||
/|||||
Object: M = Mutex, W = RWLock /||||
State: x = dead, ! = inconsistent /|||
Use: R = used in realtime thread /||
Mutex Type: r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
Mutex Protocol: i = INHERIT, p = PROTECT /
RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC
mutrace: Note that the flags column R is only valid in --track-rt mode!
mutrace: Total runtime is 560.546 ms.
mutrace: Results for SMP with 4 processors.
```
可以看到 mutex 1 所佔用的時間居然為 4 threads 時的 4 倍之多
這邊想嘗試用addr2line去觀察程式碼
`$ addr2line -e sort 0x46`
出來結果永遠是 `??:0` 不知道為何不能正確找到程式碼
## ThreadSanitizer
`ThreadSanitizer` 是一個在 C/C++ 中可以偵測 data race 的 detector,data race 是指在沒有 lock 的情況下有多個 thread 在同一個 proceess 在無法確定 thread 之間的**讀寫順序**同時存取到相同的 memory location 時,就會產生 data race 的情況
### Data Race Example
```c=
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
Global = 43;
pthread_join(t, NULL);
return Global;
}
```
上面兩個 thread 就在沒有做 synchronization 的情況下對同一個 build-in type variable 作了修改的動作,會產生以下的結果
```shell=
dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c
dennis@dennis-X550CC:~$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=7815)
Write of size 4 at 0x0000014ace38 by main thread:
#0 main /home/dennis/test.c:10 (a.out+0x0000004a1f7e)
Previous write of size 4 at 0x0000014ace38 by thread T1:
#0 Thread1 /home/dennis/test.c:4 (a.out+0x0000004a1f37)
Location is global '<null>' at 0x000000000000 (a.out+0x0000014ace38)
Thread T1 (tid=7817, finished) created by main thread at:
#0 pthread_create <null> (a.out+0x000000422236)
#1 main /home/dennis/test.c:9 (a.out+0x0000004a1f74)
SUMMARY: ThreadSanitizer: data race /home/dennis/test.c:10 in main
==================
ThreadSanitizer: reported 1 warnings
```
所以如果將 `pthread_join()` 往上移一行,讓 main thread 等待 `Thread1()` 執行完之後再對 `Global` 做修改就不會出錯
```c=
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
pthread_join(t, NULL);
Global = 43;
return Global;
}
```
```shell=
dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c
dennis@dennis-X550CC:~$ ./a.out
dennis@dennis-X550CC:~$
```
### 使用方法
在編譯的時候加入 `-fsanitize=thread -no-pie` 即可
### 分析結果
因為原本程式用 mutex 避免了 data race,所以一開始跑 TSan 沒有任何輸出結果,後來把 `pthread_mutex_lock` 跟 `pthread_mutex_unlock` 拿掉後就有結果輸出
`$ ./sort 1 test_data/words.txt`
```shell
WARNING: ThreadSanitizer: data race (pid=7685)
Read of size 8 at 0x0000006051e8 by thread T1:
#0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 (sort+0x000000402d16)
#1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a)
#2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180)
#3 <null> <null> (libtsan.so.0+0x0000000245a9)
Previous write of size 8 at 0x0000006051e8 by thread T2:
#0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:32 (sort+0x000000402d34)
#1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a)
#2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180)
#3 <null> <null> (libtsan.so.0+0x0000000245a9)
Location is global 'tmp_list' of size 8 at 0x0000006051e8 (sort+0x0000006051e8)
Thread T1 (tid=7687, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000028510)
#1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241)
#2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b)
Thread T2 (tid=7688, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x000000028510)
#1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241)
#2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b)
SUMMARY: ThreadSanitizer: data race /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 in merge_thread_lists
...
```
對應 `main.c:30`
```c=28
if (_list->size < (uint32_t) data_count) {
// pthread_mutex_lock(&(data_context.mutex));
llist_t *_t = tmp_list;
if (!_t) {
```
## 檢討
- 1:18 mutrace 使用原理
- addr2line原理:分析ELF執行檔,用addr回推程式碼所在位置
- lock contention:**持有**資源的過程中是有重疊的,在意時間順序,
- mutrace 本質:LD_PRELOAD 取代mutex 實作時紀錄時間
- 查mutrace 後面攔位在幹嘛
- 4.12 data race: write 後去 read 時間上的順序 才會發生
- 4.33 Tsan 運作原理 查論文
- 5.09 PIE: position independent execution
## Reference
- [ThreadSanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual)
- [Data Race](https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces)
- [Data Race and Race Condition](http://blog.regehr.org/archives/490)
- [ThreadSanitizer – data race detection in practice](http://www.cs.columbia.edu/~junfeng/11fa-e6121/papers/thread-sanitizer.pdf)
- [Happens-Before](http://opass.logdown.com/posts/797113-concurrency-series-3-happens-before)
- [Mutex vs RwLock](http://wiki.jikexueyuan.com/project/rust-primer/rcarc/mutex.html)
- [Mutrace source code](https://github.com/dbpercona/mutrace/blob/master/mutrace.c)