Try   HackMD

2017q1 team7 mergesort-concurrent

contributed by <zmkepetermouse0xff07>

使程式能接受字串輸入

參考 0xff07 的筆記,修改成可以接受字串輸入

建立字串輸入的驗證機制

check-words: sort
	@uniq test_data/words.txt | sort -R | tail -n $(NUM_OF_DATA) > $(TEST_DATA_FILE)
	@sort -d $(TEST_DATA_FILE) > $(SORTED_DATA_FILE)
	@./sort $(THREADS) $(TEST_DATA_FILE) | tail -n +4 > $(SORTED_RESULT)
	@bash scripts/compare.sh $(SORTED_DATA_FILE) $(SORTED_RESULT)

mutrace 分析

mutrace: Showing statistics for process sort (pid 4217).
mutrace: 3 mutexes used.

Mutex #2 (0x0x2407670) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fc25fb344b2]
	./sort(tqueue_init+0x46) [0x4014ca]
	./sort(tpool_init+0x6a) [0x40172e]
	./sort(main+0xe5) [0x401ee4]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc25f56b830]

Mutex #0 (0x0x7fc25d13a860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7fc25fb346b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7fc25cf36fec]
	[(nil)]

Mutex #1 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7fc25fb344b2]
	./sort(main+0xab) [0x401eaa]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc25f56b830]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       2   282836    93324    81938       29.010        0.000        0.016 Mx.--.
       0       20       16        5        0.003        0.000        0.001 M-.--.
       1        6        4        0        0.001        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1119.388 ms.

mutrace: Results for SMP with 4 processors.

Locked : mutex 鎖了幾次
Changed : mutex 在不同 thread 之間切換的次數
Cont. : thread 嘗試要求已鎖住的 mutex 的次數
tot.Time[ms] : mutex 鎖住的總時間
avg.Time[ms] : 平均鎖住 mutex 的時間
max.Time[ms] : 持有 mutex 的最大時間

可以發現到 Mutex #2 在使用上非常的頻繁!thread 之間常需要搶奪 mutex,也常因此等待

追蹤原程式碼的 Mutex #2 出現的位置 ($ addr2line -e sort 0x4014ca)
第一次存取出現在 /mergesort-concurrent/threadpool.c:45

pthread_mutex_init(&(the_queue->mutex), NULL); pthread_cond_init(&(the_queue->cond), NULL); the_queue->size = 0; the_queue->num_of_consumed = 0;

(每個 mutex 實際對應到程式碼會差一行,不知道為什麼)

可以發現是 threadpool 中負責保護 queue pop 與 push 的 Mutex

lock-free threadpool 實作

讀完 A Pragmatic Implementation of Non-Blocking Linked-Lists 的演算法之後想先動手實作看看,關於演算法的解釋 0xff07 的筆記有詳細的說明

Compare And Swap

bool __sync_bool_compare_and_swap (type *ptr, type oldval type newval, ...)
type __sync_val_compare_and_swap (type *ptr, type oldval type newval, ...)
ptroldval 相同的時候,會把 ptr 的內容改寫成 newval ,並回傳 oldval

Built-in functions for atomic memory access 的解釋如下
These builtins perform an atomic compare and swap. That is, if the current value of *ptr is oldval, then write newval into *ptr.
The “bool” version returns true if the comparison is successful and newval was written. The “val” version returns the contents of *ptr before the operation.

tqueue_pop()

首先從 tqueue_pop() 開始著手修改,目標是把 tqueue_pop() 的 mutex 拿掉,參考 hungry-birds 的實作,並引用 concurrent-ll 內的 atomic_ops_if.h
先判斷 queue 裡面是否只有一個 node (head 和 tail 相同),如果只有一個 node 就嘗試把 tail 和 head 設成 NULL ,如果不只一個 node 嘗試把 head 設成 head->next

task_t *tqueue_pop(tqueue_t * const the_queue) { task_t *ret; ret = the_queue->head; if (ret) { if(CAS_PTR(&(the_queue->tail), ret, NULL) == ret) { if(CAS_PTR(&(the_queue->head), ret, NULL) == ret) { FAD_U32(&(the_queue->size)); FAI_U32(&(the_queue->num_of_consumed)); return ret; } else { return NULL; } } else { if(ret->next) { if(CAS_PTR(&(the_queue->head), ret, ret->next) == ret) { FAD_U32(&(the_queue->size)); FAI_U32(&(the_queue->num_of_consumed)); return ret; } else { return NULL; } } else { return NULL; } } } return ret; }

$ make check-words 結果為 Passed!
再用 mutrace 測試一次
$ mutrace ./sort 4 random.txt

mutrace: Showing statistics for process sort (pid 15836).
mutrace: 3 mutexes used.

Mutex #2 (0x0x7f720a85b860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f720d2556b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f720a657fec]
	[(nil)]

Mutex #0 (0x0xa43670) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f720d2554b2]
	./sort(tqueue_init+0x46) [0x4014ca]
	./sort(tpool_init+0x6a) [0x40173d]
	./sort(main+0xe5) [0x401ef3]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f720cc8c830]

Mutex #1 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f720d2554b2]
	./sort(main+0xab) [0x401eb9]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f720cc8c830]

mutrace: Showing 3 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       2       20       17        6        0.003        0.000        0.000 M-.--.
       0       13        9        1        0.004        0.000        0.001 Mx.--.
       1        6        5        0        0.002        0.000        0.000 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1057.670 ms.

mutrace: Results for SMP with 4 processors.

發現只有更改 tqueue_pop() 就讓 Mutex #0 (從 address 可以看出就是前面的 Mutex #2 )的使用率大幅下降

tqueue_push()

tqueue_push() 一開始不知道要怎麼改變 tqueue_tail->next 並改變 tqueue_tail 之值,最後也是參考到 hungry-birds 的寫法,才有些想法

其實兩個的操作並不是不行切割,而是可以先選擇改變 tail 之值,確定成功之後,取得舊的 tail ,再將舊的 tail node 指向新的 tail

int tqueue_push(tqueue_t * const the_queue, task_t *task) { task_t *tail = NULL, *head = NULL, *tail_next = NULL; task->next = NULL; do { tail = the_queue->tail; if (CAS_PTR(&(the_queue->tail), tail, task) == tail) break; } while(1); if (tail) { do { tail_next = tail->next; if (CAS_PTR(&(tail->next), tail_next, task) == tail_next) break; } while(1); } else { do { head = the_queue->head; if (CAS_PTR(&(the_queue->head), head, task) == head) break; } while(1); } FAI_U32(&(the_queue->size)); return 0; }

整合後發生的問題

我們發現程式碼大多數時候可以成功執行,但有時候 thread 沒有接收到中止的 task 使 thread 中止運作,導致進入無窮迴圈

之後讓程式重複執行500次,發現中間出現 core dumped,用$ addr2line -e ./sort 找到
threadpool.ctqueue_free
free(cur);
可能發生重複 free 同一塊記憶體,或是 free 到沒有被 malloc 的記憶體

*** Error in `./sort': double free or corruption (fasttop): 0x00007f5f000009c0 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f5f091987e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x7fe0a)[0x7f5f091a0e0a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f5f091a498c]
./sort(tqueue_free+0x41)[0x4016cb]
./sort(tpool_free+0x65)[0x401841]
./sort(main+0x132)[0x401f5f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5f09141830]
./sort(_start+0x29)[0x4011a9]
======= Memory map: ========
00400000-00403000 r-xp 00000000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00602000-00603000 r--p 00002000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00603000-00604000 rw-p 00003000 08:0b 525385                             /home/ke/mergesort-concurrent/sort
00ce9000-0224f000 rw-p 00000000 00:00 0                                  [heap]
7f5ef0000000-7f5ef0021000 rw-p 00000000 00:00 0 
7f5ef0021000-7f5ef4000000 ---p 00000000 00:00 0 
7f5ef8000000-7f5ef8021000 rw-p 00000000 00:00 0 
7f5ef8021000-7f5efc000000 ---p 00000000 00:00 0 
7f5efc000000-7f5efc021000 rw-p 00000000 00:00 0 
7f5efc021000-7f5f00000000 ---p 00000000 00:00 0 
7f5f00000000-7f5f00021000 rw-p 00000000 00:00 0 
7f5f00021000-7f5f04000000 ---p 00000000 00:00 0 
7f5f06f07000-7f5f06f1d000 r-xp 00000000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f06f1d000-7f5f0711c000 ---p 00016000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f0711c000-7f5f0711d000 rw-p 00015000 08:0a 136909                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f5f0711d000-7f5f0711e000 ---p 00000000 00:00 0 
7f5f0711e000-7f5f0791e000 rw-p 00000000 00:00 0 
7f5f0791e000-7f5f0791f000 ---p 00000000 00:00 0 
7f5f0791f000-7f5f0811f000 rw-p 00000000 00:00 0 
7f5f0811f000-7f5f08120000 ---p 00000000 00:00 0 
7f5f08120000-7f5f08920000 rw-p 00000000 00:00 0 
7f5f08920000-7f5f08921000 ---p 00000000 00:00 0 
7f5f08921000-7f5f09121000 rw-p 00000000 00:00 0 
7f5f09121000-7f5f092e0000 r-xp 00000000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f092e0000-7f5f094e0000 ---p 001bf000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e0000-7f5f094e4000 r--p 001bf000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e4000-7f5f094e6000 rw-p 001c3000 08:0a 148362                     /lib/x86_64-linux-gnu/libc-2.23.so
7f5f094e6000-7f5f094ea000 rw-p 00000000 00:00 0 
7f5f094ea000-7f5f09502000 r-xp 00000000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09502000-7f5f09701000 ---p 00018000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09701000-7f5f09702000 r--p 00017000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09702000-7f5f09703000 rw-p 00018000 08:0a 149114                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7f5f09703000-7f5f09707000 rw-p 00000000 00:00 0 
7f5f09707000-7f5f0972d000 r-xp 00000000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f09908000-7f5f0990b000 rw-p 00000000 00:00 0 
7f5f09929000-7f5f0992c000 rw-p 00000000 00:00 0 
7f5f0992c000-7f5f0992d000 r--p 00025000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f0992d000-7f5f0992e000 rw-p 00026000 08:0a 148365                     /lib/x86_64-linux-gnu/ld-2.23.so
7f5f0992e000-7f5f0992f000 rw-p 00000000 00:00 0 
7fff9c1a7000-7fff9c1c8000 rw-p 00000000 00:00 0                          [stack]
7fff9c1e0000-7fff9c1e2000 r--p 00000000 00:00 0                          [vvar]
7fff9c1e2000-7fff9c1e4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
已經終止 (core dumped)

week 10 改進方向

之前的程式還有很多有待改進的空間

  • 程式的正確性
    • 之前的程式是錯的!!!
    • 也許我們也忘記了 ABA 問題
  • 驗證 lock-free 的方式有誤
    • 我們一開始只驗證 lock-free 的 push 或 pop 其中一邊 , 另一邊維持著 mutex , 這樣反而造成了 race condition
  • memory leak
  • 現在不建議使用 __sync_val_compare_and_swap

老師提到的改善方法

  • 可以參考 LockLessring 的作法
  • 提供 benchmark 程式

參考 lock-free queue 實作方法

這邊我們參考了 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms 論文,看到很多 lock-free queue 程式碼都建立在這篇論文的基礎上實作,像是 Boost 也參考了這篇實作出 lock-free queue

structure pointer_t { ptr: pointer to node_t, count: unsigned integer }
structure node_t { value: data type, next: pointer_t }
structure queue_t { Head: pointer t, Tail: pointer_t }

原文中有個中介的資料結構叫做 pointer_t ,這個資料結構去指向真正的 node (node_t) 以及存放一個變數名稱 count 代表 reference counting 的次數,可以用來避免 ABA 問題

initialize(Q: pointer to queue_t)
    node = new_node()		// Allocate a free node
    node->next.ptr = NULL	// Make it the only node in the linked list
    Q->Head.ptr = Q->Tail.ptr = node	// Both Head and Tail point to it

比較特別的是, 一開始初始化 lock-free queue 就會新建一個 node ,這個 node 並沒有任何的作用, head 與 tail 會間接指向這個 node , lock-free queue 至少有一個 node 的存在。

  enqueue(Q: pointer to queue_t, value: data type)
   E1:   node = new_node()	// Allocate a new node from the free list
   E2:   node->value = value	// Copy enqueued value into node
   E3:   node->next.ptr = NULL	// Set next pointer of node to NULL
   E4:   loop			// Keep trying until Enqueue is done
   E5:      tail = Q->Tail	// Read Tail.ptr and Tail.count together
   E6:      next = tail.ptr->next	// Read next ptr and count fields together
   E7:      if tail == Q->Tail	// Are tail and next consistent?
               // Was Tail pointing to the last node?
   E8:         if next.ptr == NULL
                  // Try to link node at the end of the linked list
   E9:            if CAS(&tail.ptr->next, next, <node, next.count+1>)
  E10:               break	// Enqueue is done.  Exit loop
  E11:            endif
  E12:         else		// Tail was not pointing to the last node
                  // Try to swing Tail to the next node
  E13:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  E14:         endif
  E15:      endif
  E16:   endloop
         // Enqueue is done.  Try to swing Tail to the inserted node
  E17:   CAS(&Q->Tail, tail, <node, tail.count+1>)

push 時去確保
[E7] Q->tail 還是 Q->tail
[E8] Q->tailnext.ptrNULL
再去嘗試做 [E9] CAS , 失敗重來
特別的是 [E8] 失敗時,也嘗試去更新 Q->tailQ->tail->next.ptr,失敗就算了沒關係(好隨便啊!)
[E17] 試著改變 tail 之值,失敗也沒關係,代表已經被其他人更新

  dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
   D1:   loop			     // Keep trying until Dequeue is done
   D2:      head = Q->Head	     // Read Head
   D3:      tail = Q->Tail	     // Read Tail
   D4:      next = head.ptr->next    // Read Head.ptr->next
   D5:      if head == Q->Head	     // Are head, tail, and next consistent?
   D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
   D7:            if next.ptr == NULL  // Is queue empty?
   D8:               return FALSE      // Queue is empty, couldn't dequeue
   D9:            endif
                  // Tail is falling behind.  Try to advance it
  D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
  D11:         else		     // No need to deal with Tail
                  // Read value before CAS
                  // Otherwise, another dequeue might free the next node
  D12:            *pvalue = next.ptr->value
                  // Try to swing Head to the next node
  D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
  D14:               break             // Dequeue is done.  Exit loop
  D15:            endif
  D16:         endif
  D17:      endif
  D18:   endloop
  D19:   free(head.ptr)		     // It is safe now to free the old node
  D20:   return TRUE                   // Queue was not empty, dequeue succeeded

先前提到 queue 會有一個 node , head 會指向這個 dummy node ,所以 pop 會先去抓取 dummy node 的下一個 node, 釋放指向 dummy node 的 pointer (ptr_t) , 再去將取得的 node 作為 dummy node ,交給下一次 pop 的釋放 ptr_t

確保 [D5] Q->head 還是原本的 Q->head
[D6] Q->head.ptr 若等於先前抓到的 Q->tail.ptr [D7],代表 queue 可能是空的, 這時進一步去檢查 tail->next.ptr 有值也嘗試更新 Q->tail ,失敗也是沒關係算了
然而如果 [D6] 是錯的 [D11] ,把值取出,並改變 head, head 可能改變失敗代表其他 thread 已經改變 head ,一切重新來過

實作 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms

這邊我們使用 C11 atomic_compare_exchange_weak 來替換 __sync_val_compare_and_swap ,與論文不同的是我們並沒有使用中介結構來操作

tqueue_init() 進行初始化的時候先新增一個 dummy_node ,並讓 headtail 指向這個 node

task_t *dummy_node = task_new(dummy_function, NULL); dummy_node->next = NULL; the_queue->head = dummy_node; the_queue->tail = dummy_node;

tqueue_pop()

task_t *tqueue_pop(tqueue_t * const the_queue) { task_t *head = NULL, *tail = NULL, *next = NULL, *ret = NULL; while(1) { head = the_queue->head; tail = the_queue->tail; next = head->next; if(head == the_queue->head) { if(head == tail) { if(next == NULL) { return NULL; } atomic_compare_exchange_weak(&(the_queue->tail), &tail, next); } else { ret = next; if(atomic_compare_exchange_weak(&(the_queue->head), &head, next)) { break; } } } } atomic_fetch_sub(&(the_queue->size), 1); atomic_fetch_add(&(the_queue->num_of_consumed), 1); return ret; }

tqueue_push()

int tqueue_push(tqueue_t * const the_queue, task_t *task) { task_t *tail = NULL, *tail_next = NULL; task->next = NULL; while(1) { tail = the_queue->tail; tail_next = tail->next; if(tail == the_queue->tail) { if(tail_next == NULL) { if(atomic_compare_exchange_weak(&(tail->next), &tail_next, task)) { if(atomic_compare_exchange_weak(&(the_queue->tail), &tail, task)) { } break; } } else { if(atomic_compare_exchange_weak(&(the_queue->tail), &tail, tail_next)) { } } } } atomic_fetch_add(&(the_queue->size), 1); return 0; }

整合結果

整合後發現最後一個 task 無法正確被 pop ,導致程式無法終止,把原本 task_run() 內的 tqueue_push(pool->queue, _task); 改成 tqueue_push(pool->queue, task_new(NULL, NULL)); 就能正確執行了,不知道原因為何

$ mutrace ./sort 4 random.txt

mutrace: Showing statistics for process sort (pid 5998).
mutrace: 3 mutexes used.

Mutex #1 (0x0x7f71cf274860) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_lock+0x49) [0x7f71d1c6e6b9]
	/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Find_FDE+0x2c) [0x7f71cf070fec]
	[(nil)]

Mutex #2 (0x0x603140) first referenced by:
	/usr/lib/mutrace/libmutrace.so(pthread_mutex_init+0xf2) [0x7f71d1c6e4b2]
	./sort(main+0xab) [0x4020bb]
	/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f71d16a5830]

mutrace: Showing 2 most contended mutexes:

 Mutex #   Locked  Changed    Cont. tot.Time[ms] avg.Time[ms] max.Time[ms]  Flags
       1       20       14        7        0.003        0.000        0.001 M-.--.
       2        6        5        0        0.003        0.000        0.002 M-.--.
                                                                           ||||||
                                                                           /|||||
          Object:                                     M = Mutex, W = RWLock /||||
           State:                                 x = dead, ! = inconsistent /|||
             Use:                                 R = used in realtime thread /||
      Mutex Type:                 r = RECURSIVE, e = ERRRORCHECK, a = ADAPTIVE /|
  Mutex Protocol:                                      i = INHERIT, p = PROTECT /
     RWLock Kind: r = PREFER_READER, w = PREFER_WRITER, W = PREFER_WRITER_NONREC 

mutrace: Note that the flags column R is only valid in --track-rt mode!

mutrace: Total runtime is 1110.494 ms.

mutrace: Results for SMP with 4 processors.

原本的 mutex 已經消失了

存在的問題

  • memory leak
    根據原本的寫法,我們會 pop 每一個 task 並在 task 做完後 free task,但是最一開始的 dummy node 並沒有 free 掉且再也無法取得
    如果 threadpool 被設計成可以中途停止的話,剩下在 task queue 中的 task 也無法被釋放記憶體

  • ABA 問題
    這個問題我們目前還沒有遇到,但是有可能發生,還不曉得要怎麼解決,可能的解決方式有:

  • 效率比原本的還差
    pop 與 push 做的事情相當單純,反而 lock-free queue 要不斷地去嘗試 pop 與 push ,過度競爭的狀況使得效益上不減反增,我們對於 lock-free 的結構想得太簡化了,也許要回到每個 thread 有自己的 workqueue 的形式並搭配 ondition variable 等來實現有效率的 lock-free thread pool

  52.08%  sort     sort                [.] tqueue_pop
  16.94%  sort     sort                [.] task_run
   4.71%  sort     sort                [.] split_n_merge
   4.39%  sort     sort                [.] sort_n_merge
   4.08%  sort     [kernel.kallsyms]   [k] queue_work_on
   3.45%  sort     libc-2.23.so        [.] __strcmp_sse2_unaligned
   3.45%  sort     [kernel.kallsyms]   [k] tty_ldisc_deref
   3.14%  sort     [kernel.kallsyms]   [.] native_irq_return_iret
   2.82%  sort     sort                [.] list_get
   2.51%  sort     [kernel.kallsyms]   [k] update_blocked_averages
   2.20%  sort     libc-2.23.so        [.] vfprintf

參考資料

  1. A Pragmatic Implementation of Non-Blocking Linked-Lists
  2. 0xff07 筆記
  3. hungry-birds
  4. Built-in functions for atomic memory access
  5. Compare-and-swap
  6. Acquire and Release Semantics
  7. Toward Concurrency
  8. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
  9. Simple, Fast, and Practical Non-Blocking and Blocking
    Concurrent Queue Algorithms ppt
  10. intro-to-lock-free-wait-free-and-aba
  11. Boost lockfree queue
  12. Lock Free Concurrent Data Structures,
    CAS and the ABA Problem
  13. lfqueue