Try   HackMD

2017q1 Team 12 (mergesort-concurrent)

contributed by <baimao8437> , <xdennisx>

  • Task: 比照 B08: mergesort-concurrent 要求,分析 mutex contention,引入更多工具 (如 ThreadSanitizer) 以理解原有程式的問題

接序上一份作業,將30萬筆 phonebook data 進行 merge sort

Mutex 分析

  • Mutrace
    $ mutrace ./sort 4 test_data/words.txt
orginal file size = 3206080
#Total_tasks_consumed: 12
#Elapsed_time: 60.443 ms
#Throughput: 198 (per sec)

...

mutrace: Showing statistics for process sort (pid 12819).
mutrace: 3 mutexes used.

Mutex #1 (0x0x55b33ca1d220) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c]
	./sort(tqueue_init+0x46) [0x55b33acdf9d5]
	./sort(tpool_init+0x6a) [0x55b33acdfc39]
	./sort(main+0x103) [0x55b33ace0787]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1]

Mutex #0 (0x0x7f2248dee380) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f224bd3e827]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f2248be9fac]
	[(nil)]

Mutex #2 (0x0x55b33aee20c0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f224bd3e62c]
	./sort(main+0xc7) [0x55b33ace074b]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f224b7763f1]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1    53419      555      519        8.680        0.000        1.174 Mx.--.
       0       20        9        5        0.004        0.000        0.001 M-.--.
       2        6        4        0        0.003        0.001        0.001 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1582.833 ms.

mutrace: Results for SMP with 4 processors.

結果顯示此程式中有 3 個 mutex 其中 mutex 1 所佔時間比例最大

若將 thread 數量增加至 8 個 則執行結果:

orginal file size = 3206080
#Total_tasks_consumed: 24
#Elapsed_time: 171.773 ms
#Throughput: 139 (per sec)

...

mutrace: 0.2 sucessfully initialized for process sort (pid 1626).

mutrace: Showing statistics for process sort (pid 1626).
mutrace: 3 mutexes used.

Mutex #1 (0x0x55a1c5d3ae70) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c]
	./sort(tqueue_init+0x46) [0x55a1c4ee49d5]
	./sort(tpool_init+0x6a) [0x55a1c4ee4c39]
	./sort(main+0x103) [0x55a1c4ee5787]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1]

Mutex #2 (0x0x7f6cabfe5380) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x47) [0x7f6cb0f39827]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f6cabde0fac]
	[(nil)]

Mutex #0 (0x0x55a1c50e70c0) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xec) [0x7f6cb0f3962c]
	./sort(main+0xc7) [0x55a1c4ee574b]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7f6cb09713f1]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1   252553     1755     1650       37.412        0.000        2.291 Mx.--.
       2       40       12        2        0.007        0.000        0.001 M-.--.
       0       14       11        0        0.005        0.000        0.001 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 560.546 ms.

mutrace: Results for SMP with 4 processors.

可以看到 mutex 1 所佔用的時間居然為 4 threads 時的 4 倍之多

這邊想嘗試用addr2line去觀察程式碼
$ addr2line -e sort 0x46
出來結果永遠是 ??:0 不知道為何不能正確找到程式碼

ThreadSanitizer

ThreadSanitizer 是一個在 C/C++ 中可以偵測 data race 的 detector,data race 是指在沒有 lock 的情況下有多個 thread 在同一個 proceess 在無法確定 thread 之間的讀寫順序同時存取到相同的 memory location 時,就會產生 data race 的情況

Data Race Example

#include <pthread.h> int Global; void *Thread1(void *x) { Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, NULL, Thread1, NULL); Global = 43; pthread_join(t, NULL); return Global; }

上面兩個 thread 就在沒有做 synchronization 的情況下對同一個 build-in type variable 作了修改的動作,會產生以下的結果

dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c dennis@dennis-X550CC:~$ ./a.out ================== WARNING: ThreadSanitizer: data race (pid=7815) Write of size 4 at 0x0000014ace38 by main thread: #0 main /home/dennis/test.c:10 (a.out+0x0000004a1f7e) Previous write of size 4 at 0x0000014ace38 by thread T1: #0 Thread1 /home/dennis/test.c:4 (a.out+0x0000004a1f37) Location is global '<null>' at 0x000000000000 (a.out+0x0000014ace38) Thread T1 (tid=7817, finished) created by main thread at: #0 pthread_create <null> (a.out+0x000000422236) #1 main /home/dennis/test.c:9 (a.out+0x0000004a1f74) SUMMARY: ThreadSanitizer: data race /home/dennis/test.c:10 in main ================== ThreadSanitizer: reported 1 warnings

所以如果將 pthread_join() 往上移一行,讓 main thread 等待 Thread1() 執行完之後再對 Global 做修改就不會出錯

#include <pthread.h> int Global; void *Thread1(void *x) { Global = 42; return x; } int main() { pthread_t t; pthread_create(&t, NULL, Thread1, NULL); pthread_join(t, NULL); Global = 43; return Global; }
dennis@dennis-X550CC:~$ clang -fsanitize=thread -g -O1 test.c dennis@dennis-X550CC:~$ ./a.out dennis@dennis-X550CC:~$

使用方法

在編譯的時候加入 -fsanitize=thread -no-pie 即可

分析結果

因為原本程式用 mutex 避免了 data race,所以一開始跑 TSan 沒有任何輸出結果,後來把 pthread_mutex_lockpthread_mutex_unlock 拿掉後就有結果輸出
$ ./sort 1 test_data/words.txt

WARNING: ThreadSanitizer: data race (pid=7685)
  Read of size 8 at 0x0000006051e8 by thread T1:
    #0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 (sort+0x000000402d16)
    #1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a)
    #2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180)
    #3 <null> <null> (libtsan.so.0+0x0000000245a9)

  Previous write of size 8 at 0x0000006051e8 by thread T2:
    #0 merge_thread_lists /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:32 (sort+0x000000402d34)
    #1 sort_local_list /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:56 (sort+0x000000402e4a)
    #2 task_run /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:94 (sort+0x000000403180)
    #3 <null> <null> (libtsan.so.0+0x0000000245a9)

  Location is global 'tmp_list' of size 8 at 0x0000006051e8 (sort+0x0000006051e8)

  Thread T1 (tid=7687, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x000000028510)
    #1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241)
    #2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b)

  Thread T2 (tid=7688, running) created by main thread at:
    #0 pthread_create <null> (libtsan.so.0+0x000000028510)
    #1 tpool_init /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/threadpool.c:140 (sort+0x000000402241)
    #2 main /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:130 (sort+0x00000040340b)

SUMMARY: ThreadSanitizer: data race /home/baimao/文件/2017嵌入式作業/mergesort-concurrent/main.c:30 in merge_thread_lists
...

對應 main.c:30

if (_list->size < (uint32_t) data_count) { // pthread_mutex_lock(&(data_context.mutex)); llist_t *_t = tmp_list; if (!_t) {

檢討

  • 1:18 mutrace 使用原理
  • addr2line原理:分析ELF執行檔,用addr回推程式碼所在位置
  • lock contention:持有資源的過程中是有重疊的,在意時間順序,
  • mutrace 本質:LD_PRELOAD 取代mutex 實作時紀錄時間
  • 查mutrace 後面攔位在幹嘛
  • 4.12 data race: write 後去 read 時間上的順序 才會發生
  • 4.33 Tsan 運作原理 查論文
  • 5.09 PIE: position independent execution

Reference