Try   HackMD

2017q1 Homework4 (mergesort-concurrent)

contributed by <twzjwang>
作業說明: B08: mergesort-concurrent
github: twzjwang/mergesort-concurrent

開發環境

  • Ubuntu Ubuntu 16.04.2 LTS
    Linux 4.8.0-36-generic

  • cpu
    version: Intel® Core i5-3337U CPU @ 1.80GHz
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 4
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 256K
    L3 cache: 3072K

  • memory
    size: 4GiB

開發前

  • 依照作業說明執行

    ​​​​$ (for i in {1..8}; do echo $RANDOM; done) | ./sort 4 8
    ​​​​Segmentation fault (core dumped)
    
  • argv[2]build_list_from_file 的檔名,而不是先的 input_count

    • fopen 失敗
  • 執行 $ uniq test_data/words.txt | sort -R > test_data/word_sorted.txt 取得亂數排列的檔案

  • 根據 Manual

    • NAME : sort - sort lines of text files
    • DESCRIPTION :
      --sort=WORD sort according to WORD: random -R
  • 執行 $ ./sort 4 test_data/word_sorted.txt

    • 結果
      ​​​​​​​​#Total_tasks_consumed: 12
      ​​​​​​​​#Elapsed_time: 51.829 ms
      ​​​​​​​​#Throughput: 231 (per sec)
      
    • list_print
      ​​​​​​​​0
      ​​​​​​​​...
      ​​​​​​​​0
      
  • 字串經 atol 轉換回傳 long int

    • 轉換結果皆為0

修改-使用字串排列

  • 定義 datatype 為大小 16 的char array

    • typedef char val_t[16];
  • 使用 strcmp 比大小llist_t *small = (strcmp(a->head->data, b->head->data) <= 0) ? a : b;

  • 自動測試

    • make check_words
      ​​​​​​​​check_words: sort genData
      ​​​​​​​​./sort $(THREADS) test_data/input.txt output.txt
      ​​​​​​​​diff output.txt test_data/expected.txt
      
      • diff 沒有輸出 : 內容完全相同
    • 結果
      ​​​​​​​​$ make check
      ​​​​​​​​uniq test_data/words.txt | sort -R > test_data/input.txt
      ​​​​​​​​uniq test_data/words.txt | sort > test_data/expected.txt
      ​​​​​​​​./sort 4 test_data/input.txt output.txt
      ​​​​​​​​#Total_tasks_consumed: 12
      ​​​​​​​​#Elapsed_time: 150.177 ms
      ​​​​​​​​#Throughput: 79 (per sec)
      ​​​​​​​​diff output.txt test_data/expected.txt
      
  • mutrace

    • Lock Contention
      • lock contention can have a big impact on the runtime behaviour of applications
    • 項目
      • Locked : 整個 runtime 中 mutex locked 的次數
      • Changed : 擁有 mutex 的 thread 改變的次數
        • 數量高 => the risk of contention 高
      • Cont : contention 發生次數
        • 想要取得 lock 但已經先被其他 thread 取得,必須等待
      • tot.Time[ms] : 等待的總時間
      • avg.Time[ms] : 平均等待時間
      • max.Time[ms] : 最大等待時間
    • 結果
      • Mutex #1 發生大量 contention 並造成等待
      • 使用 addr2line 找到對應的位置,推測
        • Mutex #0
        • Mutex #1 : the_queue->mutex
        • Mutex #2 : data_context.mutex
    ​​​​$ mutrace ./sort 4 test_data/input.txt output.txt
    ​​​​mutrace: 0.2 sucessfully initialized for process sort (pid 3788).
    ​​​​#Total_tasks_consumed: 12
    ​​​​#Elapsed_time: 129.242 ms
    ​​​​#Throughput: 92 (per sec)
    
    ​​​​mutrace: Showing statistics for process sort (pid 3788).
    ​​​​mutrace: 3 mutexes used.
    
    ​​​​Mutex #1 (0x0x1fba670) first referenced by:
    ​​​​    /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2]
    ​​​​    ./sort(tqueue_init+0x46) [0x4014eb]
    ​​​​    ./sort(tpool_init+0x6a) [0x40174f]
    ​​​​    ./sort(main+0xe5) [0x401ee4]
    ​​​​    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830]
    
    ​​​​Mutex #0 (0x0x7fbca513b860) first referenced by:
    ​​​​    /usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7fbca7b356b9]
    ​​​​    /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7fbca4f37fec]
    ​​​​    [(nil)]
    
    ​​​​Mutex #2 (0x0x603140) first referenced by:
    ​​​​    /usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fbca7b354b2]
    ​​​​    ./sort(main+0xab) [0x401eaa]
    ​​​​    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fbca756c830]
    
    ​​​​mutrace: Showing 3 most contended mutexes:
    
    ​​​​ Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
    ​​​​       1    83875    31938    25850       11.116        0.000        0.019 Mx.--.
    ​​​​       0       20       15        8        0.005        0.000        0.001 M-.--.
    ​​​​       2        6        4        0        0.002        0.000        0.001 M-.--.
    ​​​​                                                                           ||||||
    ​​​​                                                                           /|||||
    ​​​​          Object:                                     M = Mutex, W = RWLock /||||
    ​​​​           State:                                 x = dead, ! = inconsistent /|||
    ​​​​             Use:                                 R = used in realtime thread /||
    ​​​​      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
    ​​​​  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
    ​​​​     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 
    
    ​​​​mutrace: Note that the flags column R is only valid in --track-rt mode!
    
    ​​​​mutrace: Total runtime is 200.374 ms.
    
    ​​​​mutrace: Results for SMP with 4 processors.
    

特別留意 lock contention 對效能的影響,目前整合的 thread pool 實作不是很有效率 (之後我們應該要重寫一套更好的) jserv

修改-lock free

  • lock free

    • individual threads 可能發生 starve ,但保證 system-wide throughput
    • All wait-free algorithms are lock-free
    • lock-free 演算法的四個 phases
      • 完成自己的 operation
      • 協助阻塞的 operation
      • 中止阻塞的 operation
      • 等待
  • 針對 lock contention 次數較多的 Mutex #1 : the_queue->mutex

  • crtitical section

    • pthread_mutex_lock() 和 pthread_mutex_unlock() 之間
    ​​​​task_t *tqueue_pop(tqueue_t * const the_queue) ​​​​{ ​​​​ task_t *ret; ​​​​ pthread_mutex_lock(&(the_queue->mutex)); ​​​​ ret = the_queue->head; ​​​​ if (ret) { ​​​​ the_queue->head = ret->next; ​​​​ ret->next = NULL; ​​​​ if (!(the_queue->head)) { ​​​​ the_queue->tail = NULL; ​​​​ } ​​​​ the_queue->size--; ​​​​ the_queue->num_of_consumed++; ​​​​ } ​​​​ pthread_mutex_unlock(&(the_queue->mutex)); ​​​​ return ret; ​​​​} ​​​​uint32_t tqueue_size(tqueue_t * const the_queue) ​​​​{ ​​​​ uint32_t ret; ​​​​ pthread_mutex_lock(&(the_queue->mutex)); ​​​​ ret = the_queue->size; ​​​​ pthread_mutex_unlock(&(the_queue->mutex)); ​​​​ return ret; ​​​​} ​​​​int tqueue_push(tqueue_t * const the_queue, task_t *task) ​​​​{ ​​​​ pthread_mutex_lock(&(the_queue->mutex)); ​​​​ task->next = NULL; ​​​​ if (the_queue->tail) ​​​​ the_queue->tail->next = task; ​​​​ the_queue->tail = task; ​​​​ if (the_queue->size++ == 0) ​​​​ the_queue->head = task; ​​​​ pthread_mutex_unlock(&(the_queue->mutex)); ​​​​ return 0; ​​​​}

參考資料

tags: twzjwang