2017q1 team7 mergesort-concurrent

contributed by <zmke、petermouse、0xff07>

使程式能接受字串輸入

參考 0xff07 的筆記，修改成可以接受字串輸入

建立字串輸入的驗證機制

check-words: sort
	@uniq test_data/words.txt | sort -R | tail -n $(NUM_OF_DATA) > $(TEST_DATA_FILE)
	@sort -d $(TEST_DATA_FILE) > $(SORTED_DATA_FILE)
	@./sort $(THREADS) $(TEST_DATA_FILE) | tail -n +4 > $(SORTED_RESULT)
	@bash scripts/compare.sh $(SORTED_DATA_FILE) $(SORTED_RESULT)

mutrace 分析

mutrace: Showing statistics for process sort (pid 4217).
mutrace: 3 mutexes used.

Mutex #2 (0x0x2407670) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fc25fb344b2]
	./sort(tqueue_init+0x46) [0x4014ca]
	./sort(tpool_init+0x6a) [0x40172e]
	./sort(main+0xe5) [0x401ee4]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc25f56b830]

Mutex #0 (0x0x7fc25d13a860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7fc25fb346b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7fc25cf36fec]
	[(nil)]

Mutex #1 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fc25fb344b2]
	./sort(main+0xab) [0x401eaa]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc25f56b830]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       2   282836    93324    81938       29.010        0.000        0.016 Mx.--.
       0       20       16        5        0.003        0.000        0.001 M-.--.
       1        6        4        0        0.001        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1119.388 ms.

mutrace: Results for SMP with 4 processors.

Locked : mutex 鎖了幾次
Changed : mutex 在不同 thread 之間切換的次數
Cont. : thread 嘗試要求已鎖住的 mutex 的次數
tot.Time[ms] : mutex 鎖住的總時間
avg.Time[ms] : 平均鎖住 mutex 的時間
max.Time[ms] : 持有 mutex 的最大時間

可以發現到 Mutex #2 在使用上非常的頻繁！thread 之間常需要搶奪 mutex，也常因此等待

追蹤原程式碼的 Mutex #2 出現的位置 ($ addr2line -e sort 0x4014ca)
第一次存取出現在 /mergesort-concurrent/threadpool.c:45




    pthread_mutex_init(&(the_queue->mutex), NULL);
    pthread_cond_init(&(the_queue->cond), NULL);
    the_queue->size = 0;                                                        
    the_queue->num_of_consumed = 0;

(每個 mutex 實際對應到程式碼會差一行，不知道為什麼)

可以發現是 threadpool 中負責保護 queue pop 與 push 的 Mutex

lock-free threadpool 實作

讀完 A Pragmatic Implementation of Non-Blocking Linked-Lists 的演算法之後想先動手實作看看，關於演算法的解釋 0xff07 的筆記有詳細的說明

Compare And Swap

bool __sync_bool_compare_and_swap (type *ptr, type oldval type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval type newval, ...)
當 ptr 和 oldval 相同的時候，會把 ptr 的內容改寫成 newval ，並回傳 oldval

Built-in functions for atomic memory access 的解釋如下
These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.
The “bool” version returns true if the comparison is successful and newval was written. The “val” version returns the contents of *ptr before the operation.

tqueue_pop()

首先從 tqueue_pop() 開始著手修改，目標是把 tqueue_pop() 的 mutex 拿掉，參考 hungry-birds 的實作，並引用 concurrent-ll 內的 atomic_ops_if.h
先判斷 queue 裡面是否只有一個 node (head 和 tail 相同)，如果只有一個 node 就嘗試把 tail 和 head 設成 NULL ，如果不只一個 node 嘗試把 head 設成 head->next





























task_t *tqueue_pop(tqueue_t * const the_queue)
{
    task_t *ret;
    ret = the_queue->head;
    if (ret) {
        if(CAS_PTR(&(the_queue->tail), ret, NULL) == ret) {
            if(CAS_PTR(&(the_queue->head), ret, NULL) == ret) {
                FAD_U32(&(the_queue->size));
                FAI_U32(&(the_queue->num_of_consumed));
                return ret;
            } else {
                return NULL;
            }
        } else {
            if(ret->next) {
                if(CAS_PTR(&(the_queue->head), ret, ret->next) == ret) {
                    FAD_U32(&(the_queue->size));
                    FAI_U32(&(the_queue->num_of_consumed));
                    return ret;
                } else {
                    return NULL;
                }
            } else {
                return NULL;
            }
        }  
    }
    return ret;
}

$ make check-words 結果為 Passed!
再用 mutrace 測試一次
$ mutrace ./sort 4 random.txt

mutrace: Showing statistics for process sort (pid 15836).
mutrace: 3 mutexes used.

Mutex #2 (0x0x7f720a85b860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f720d2556b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f720a657fec]
	[(nil)]

Mutex #0 (0x0xa43670) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f720d2554b2]
	./sort(tqueue_init+0x46) [0x4014ca]
	./sort(tpool_init+0x6a) [0x40173d]
	./sort(main+0xe5) [0x401ef3]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f720cc8c830]

Mutex #1 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f720d2554b2]
	./sort(main+0xab) [0x401eb9]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f720cc8c830]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       2       20       17        6        0.003        0.000        0.000 M-.--.
       0       13        9        1        0.004        0.000        0.001 Mx.--.
       1        6        5        0        0.002        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1057.670 ms.

mutrace: Results for SMP with 4 processors.

發現只有更改 tqueue_pop() 就讓 Mutex #0 (從 address 可以看出就是前面的 Mutex #2 )的使用率大幅下降

tqueue_push()

tqueue_push() 一開始不知道要怎麼改變 tqueue_tail->next 並改變 tqueue_tail 之值，最後也是參考到 hungry-birds 的寫法，才有些想法

其實兩個的操作並不是不行切割，而是可以先選擇改變 tail 之值，確定成功之後，取得舊的 tail ，再將舊的 tail node 指向新的 tail 。



























int tqueue_push(tqueue_t * const the_queue, task_t *task)
{
    task_t *tail = NULL, *head = NULL, *tail_next = NULL;
    task->next = NULL;

    do {
        tail = the_queue->tail;
        if (CAS_PTR(&(the_queue->tail), tail, task) == tail) 
            break;
    } while(1);

    if (tail) {
        do {
            tail_next = tail->next;
            if (CAS_PTR(&(tail->next), tail_next, task) == tail_next)
                break;
        } while(1);
    } else {
        do {
            head = the_queue->head;
            if (CAS_PTR(&(the_queue->head), head, task) == head)
                break;
        } while(1);
    }
    FAI_U32(&(the_queue->size));
    return 0;
}

整合後發生的問題

我們發現程式碼大多數時候可以成功執行，但有時候 thread 沒有接收到中止的 task 使 thread 中止運作，導致進入無窮迴圈

之後讓程式重複執行500次，發現中間出現 core dumped，用$ addr2line -e ./sort 找到
threadpool.c 的 tqueue_free
free(cur);
可能發生重複 free 同一塊記憶體，或是 free 到沒有被 malloc 的記憶體

*** Error in `./sort': double free or corruption (fasttop): 0x00007f5f000009c0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f5f091987e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f5f091a0e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f5f091a498c]
./sort(tqueue_free+0x41)[0x4016cb]
./sort(tpool_free+0x65)[0x401841]
./sort(main+0x132)[0x401f5f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5f09141830]
./sort(_start+0x29)[0x4011a9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00602000-00603000 r--p 00002000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00603000-00604000 rw-p 00003000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00ce9000-0224f000 rw-p 00000000 00:00 0                                  [heap]
7f5ef0000000-7f5ef0021000 rw-p 00000000 00:00 0 
7f5ef0021000-7f5ef4000000 ---p 00000000 00:00 0 
7f5ef8000000-7f5ef8021000 rw-p 00000000 00:00 0 
7f5ef8021000-7f5efc000000 ---p 00000000 00:00 0 
7f5efc000000-7f5efc021000 rw-p 00000000 00:00 0 
7f5efc021000-7f5f00000000 ---p 00000000 00:00 0 
7f5f00000000-7f5f00021000 rw-p 00000000 00:00 0 
7f5f00021000-7f5f04000000 ---p 00000000 00:00 0 
7f5f06f07000-7f5f06f1d000 r-xp 00000000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f06f1d000-7f5f0711c000 ---p 00016000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f0711c000-7f5f0711d000 rw-p 00015000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f0711d000-7f5f0711e000 ---p 00000000 00:00 0 
7f5f0711e000-7f5f0791e000 rw-p 00000000 00:00 0 
7f5f0791e000-7f5f0791f000 ---p 00000000 00:00 0 
7f5f0791f000-7f5f0811f000 rw-p 00000000 00:00 0 
7f5f0811f000-7f5f08120000 ---p 00000000 00:00 0 
7f5f08120000-7f5f08920000 rw-p 00000000 00:00 0 
7f5f08920000-7f5f08921000 ---p 00000000 00:00 0 
7f5f08921000-7f5f09121000 rw-p 00000000 00:00 0 
7f5f09121000-7f5f092e0000 r-xp 00000000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f092e0000-7f5f094e0000 ---p 001bf000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e0000-7f5f094e4000 r--p 001bf000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e4000-7f5f094e6000 rw-p 001c3000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e6000-7f5f094ea000 rw-p 00000000 00:00 0 
7f5f094ea000-7f5f09502000 r-xp 00000000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09502000-7f5f09701000 ---p 00018000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09701000-7f5f09702000 r--p 00017000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09702000-7f5f09703000 rw-p 00018000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09703000-7f5f09707000 rw-p 00000000 00:00 0 
7f5f09707000-7f5f0972d000 r-xp 00000000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f09908000-7f5f0990b000 rw-p 00000000 00:00 0 
7f5f09929000-7f5f0992c000 rw-p 00000000 00:00 0 
7f5f0992c000-7f5f0992d000 r--p 00025000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f0992d000-7f5f0992e000 rw-p 00026000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f0992e000-7f5f0992f000 rw-p 00000000 00:00 0 
7fff9c1a7000-7fff9c1c8000 rw-p 00000000 00:00 0                          [stack]
7fff9c1e0000-7fff9c1e2000 r--p 00000000 00:00 0                          [vvar]
7fff9c1e2000-7fff9c1e4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
已經終止 (core dumped)

week 10 改進方向

之前的程式還有很多有待改進的空間

程式的正確性
- 之前的程式是錯的!!!
- 也許我們也忘記了 ABA 問題
驗證 lock-free 的方式有誤
- 我們一開始只驗證 lock-free 的 push 或 pop 其中一邊 , 另一邊維持著 mutex , 這樣反而造成了 race condition
memory leak
現在不建議使用 __sync_val_compare_and_swap 了
- 應改用 C11 _Atomic

老師提到的改善方法

可以參考 LockLess 與 ring 的作法
提供 benchmark 程式

參考 lock-free queue 實作方法

這邊我們參考了 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms 論文，看到很多 lock-free queue 程式碼都建立在這篇論文的基礎上實作，像是 Boost 也參考了這篇實作出 lock-free queue

structure pointer_t { ptr: pointer to node_t, count: unsigned integer }
structure node_t { value: data type, next: pointer_t }
structure queue_t { Head: pointer t, Tail: pointer_t }

原文中有個中介的資料結構叫做 pointer_t ，這個資料結構去指向真正的 node (node_t) 以及存放一個變數名稱 count 代表 reference counting 的次數，可以用來避免 ABA 問題

initialize(Q: pointer to queue_t)
    node = new_node()		// Allocate a free node
    node->next.ptr = NULL	// Make it the only node in the linked list
    Q->Head.ptr = Q->Tail.ptr = node	// Both Head and Tail point to it

比較特別的是，一開始初始化 lock-free queue 就會新建一個 node ，這個 node 並沒有任何的作用， head 與 tail 會間接指向這個 node ， lock-free queue 至少有一個 node 的存在。

  enqueue(Q: pointer to queue_t, value: data type)
   E1:   node = new_node()	// Allocate a new node from the free list
   E2:   node->value = value	// Copy enqueued value into node
   E3:   node->next.ptr = NULL	// Set next pointer of node to NULL
   E4:   loop			// Keep trying until Enqueue is done
   E5:      tail = Q->Tail	// Read Tail.ptr and Tail.count together
   E6:      next = tail.ptr->next	// Read next ptr and count fields together
   E7:      if tail == Q->Tail	// Are tail and next consistent?
               // Was Tail pointing to the last node?
   E8:         if next.ptr == NULL
                  // Try to link node at the end of the linked list
   E9:            if CAS(&tail.ptr->next, next, <node, next.count+1>)
  E10:               break	// Enqueue is done.  Exit loop
  E11:            endif
  E12:         else		// Tail was not pointing to the last node
                  // Try to swing Tail to the next node
  E13:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  E14:         endif
  E15:      endif
  E16:   endloop
         // Enqueue is done.  Try to swing Tail to the inserted node
  E17:   CAS(&Q->Tail, tail, <node, tail.count+1>)

push 時去確保
[E7] Q->tail 還是 Q->tail
[E8] Q->tail 的 next.ptr 為 NULL
再去嘗試做 [E9] CAS ，失敗重來
特別的是 [E8] 失敗時，也嘗試去更新 Q->tail 為 Q->tail->next.ptr，失敗就算了沒關係(好隨便啊！)
[E17] 試著改變 tail 之值，失敗也沒關係，代表已經被其他人更新

  dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
   D1:   loop			     // Keep trying until Dequeue is done
   D2:      head = Q->Head	     // Read Head
   D3:      tail = Q->Tail	     // Read Tail
   D4:      next = head.ptr->next    // Read Head.ptr->next
   D5:      if head == Q->Head	     // Are head, tail, and next consistent?
   D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
   D7:            if next.ptr == NULL  // Is queue empty?
   D8:               return FALSE      // Queue is empty, couldn't dequeue
   D9:            endif
                  // Tail is falling behind.  Try to advance it
  D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  D11:         else		     // No need to deal with Tail
                  // Read value before CAS
                  // Otherwise, another dequeue might free the next node
  D12:            *pvalue = next.ptr->value
                  // Try to swing Head to the next node
  D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
  D14:               break             // Dequeue is done.  Exit loop
  D15:            endif
  D16:         endif
  D17:      endif
  D18:   endloop
  D19:   free(head.ptr)		     // It is safe now to free the old node
  D20:   return TRUE                   // Queue was not empty, dequeue succeeded

先前提到 queue 會有一個 node ， head 會指向這個 dummy node ，所以 pop 會先去抓取 dummy node 的下一個 node，釋放指向 dummy node 的 pointer (ptr_t) ，再去將取得的 node 作為 dummy node ，交給下一次 pop 的釋放 ptr_t

確保 [D5] Q->head 還是原本的 Q->head
[D6] Q->head.ptr 若等於先前抓到的 Q->tail.ptr [D7]，代表 queue 可能是空的，這時進一步去檢查 tail->next.ptr 有值也嘗試更新 Q->tail ，失敗也是沒關係算了
然而如果 [D6] 是錯的 [D11] ，把值取出，並改變 head， head 可能改變失敗代表其他 thread 已經改變 head ，一切重新來過

實作 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms

這邊我們使用 C11 atomic_compare_exchange_weak 來替換 __sync_val_compare_and_swap ，與論文不同的是我們並沒有使用中介結構來操作

在 tqueue_init() 進行初始化的時候先新增一個 dummy_node ，並讓 head 和 tail 指向這個 node




    task_t *dummy_node = task_new(dummy_function, NULL);
    dummy_node->next = NULL;
    the_queue->head = dummy_node;
    the_queue->tail = dummy_node;

tqueue_pop()

























task_t *tqueue_pop(tqueue_t * const the_queue)
{
    task_t *head = NULL, *tail = NULL, *next = NULL, *ret = NULL;
    while(1) {
        head = the_queue->head;
        tail = the_queue->tail;
        next = head->next;
        if(head == the_queue->head) {
            if(head == tail) {
                if(next == NULL) {
                    return NULL;
                }
                atomic_compare_exchange_weak(&(the_queue->tail), &tail, next);
            } else {
                ret = next;
                if(atomic_compare_exchange_weak(&(the_queue->head), &head, next)) {
                    break;
                }
            }
        }
    }
    atomic_fetch_sub(&(the_queue->size), 1);
    atomic_fetch_add(&(the_queue->num_of_consumed), 1);
    return ret;
}

tqueue_push()























int tqueue_push(tqueue_t * const the_queue, task_t *task)
{
    task_t *tail = NULL, *tail_next = NULL;
    task->next = NULL;
    while(1) {
        tail = the_queue->tail;
        tail_next = tail->next;
        if(tail == the_queue->tail) {
            if(tail_next == NULL) {
                if(atomic_compare_exchange_weak(&(tail->next), &tail_next, task)) {
                    if(atomic_compare_exchange_weak(&(the_queue->tail), &tail, task)) {
                    }
                    break;
                }
            } else {
                if(atomic_compare_exchange_weak(&(the_queue->tail), &tail, tail_next)) {
                }
            }
        }
    }
    atomic_fetch_add(&(the_queue->size), 1);
    return 0;
}

整合結果

整合後發現最後一個 task 無法正確被 pop ，導致程式無法終止，把原本 task_run() 內的 tqueue_push(pool->queue, _task); 改成 tqueue_push(pool->queue, task_new(NULL, NULL)); 就能正確執行了，不知道原因為何

$ mutrace ./sort 4 random.txt

mutrace: Showing statistics for process sort (pid 5998).
mutrace: 3 mutexes used.

Mutex #1 (0x0x7f71cf274860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f71d1c6e6b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f71cf070fec]
	[(nil)]

Mutex #2 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f71d1c6e4b2]
	./sort(main+0xab) [0x4020bb]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f71d16a5830]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       20       14        7        0.003        0.000        0.001 M-.--.
       2        6        5        0        0.003        0.000        0.002 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1110.494 ms.

mutrace: Results for SMP with 4 processors.

原本的 mutex 已經消失了

存在的問題

memory leak
根據原本的寫法，我們會 pop 每一個 task 並在 task 做完後 free task，但是最一開始的 dummy node 並沒有 free 掉且再也無法取得
如果 threadpool 被設計成可以中途停止的話，剩下在 task queue 中的 task 也無法被釋放記憶體
ABA 問題
這個問題我們目前還沒有遇到，但是有可能發生，還不曉得要怎麼解決，可能的解決方式有:
- hazard pointer
- reference counting
效率比原本的還差
pop 與 push 做的事情相當單純，反而 lock-free queue 要不斷地去嘗試 pop 與 push ，過度競爭的狀況使得效益上不減反增，我們對於 lock-free 的結構想得太簡化了，也許要回到每個 thread 有自己的 workqueue 的形式並搭配 ondition variable 等來實現有效率的 lock-free thread pool

  52.08%  sort     sort                [.] tqueue_pop
  16.94%  sort     sort                [.] task_run
   4.71%  sort     sort                [.] split_n_merge
   4.39%  sort     sort                [.] sort_n_merge
   4.08%  sort     [kernel.kallsyms]   [k] queue_work_on
   3.45%  sort     libc-2.23.so        [.] __strcmp_sse2_unaligned
   3.45%  sort     [kernel.kallsyms]   [k] tty_ldisc_deref
   3.14%  sort     [kernel.kallsyms]   [.] native_irq_return_iret
   2.82%  sort     sort                [.] list_get
   2.51%  sort     [kernel.kallsyms]   [k] update_blocked_averages
   2.20%  sort     libc-2.23.so        [.] vfprintf

2017q1 team7 mergesort-concurrent

使程式能接受字串輸入

建立字串輸入的驗證機制

mutrace 分析

lock-free threadpool 實作

Compare And Swap

tqueue_pop()

tqueue_push()

整合後發生的問題

week 10 改進方向

參考 lock-free queue 實作方法

實作 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms

tqueue_pop()

tqueue_push()

整合結果

存在的問題

參考資料