contributed by <kevinbird61>
hyperthreading
Multicore
on-die cache
A synchronizes-with B
Thread: Create&Join
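- The process is similar to fork() and exec(): memory as it existed before the fork is visible to every thread.
- The article mentions that Java handles these inter-thread issues itself, whereas C++ leaves them to the user. This suggests an idea: if the data to be processed is largely independent, there is no need for a mutex to wait on. It is enough to check at the end of the program that every thread has finished (the thread count has been reached) and then arrange the data in thread order, which saves the time spent on the mutex.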
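A minimal sketch of the idea in the last bullet, written in C with POSIX threads. The names `NTHREADS`, `CHUNK`, `struct job`, and `sum_in_parallel` are hypothetical and chosen only for illustration. Instead of counting finished threads by hand, the sketch uses `pthread_join` as the completion check; each worker writes only to its own slot, so no mutex is needed, and the results are collected in thread order afterwards.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4    /* hypothetical worker count */
#define CHUNK    1000 /* hypothetical number of items per worker */

/* Each worker owns exactly one slot; no other thread touches it. */
struct job {
    int  id;     /* worker index, also the position of its result */
    long result; /* written only by the owning worker */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    long sum = 0;

    /* Independent work: sum the integers in this worker's own range. */
    for (long i = (long)j->id * CHUNK; i < (long)(j->id + 1) * CHUNK; i++)
        sum += i;
    j->result = sum;
    return NULL;
}

/* Runs all workers and returns the combined total, or -1 on failure. */
long sum_in_parallel(void)
{
    pthread_t tid[NTHREADS];
    struct job jobs[NTHREADS];
    int started = 0;
    long total = 0;

    for (int i = 0; i < NTHREADS; i++) {
        jobs[i].id = i;
        jobs[i].result = 0;
        if (pthread_create(&tid[i], NULL, worker, &jobs[i]) != 0)
            break;
        started++;
    }

    /* Joining is the only synchronization point: after it returns,
     * the worker's writes to its slot are visible here. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    if (started != NTHREADS)
        return -1;

    /* Combine the results in thread order, without any mutex. */
    for (int i = 0; i < NTHREADS; i++)
        total += jobs[i].result;
    return total;
}
```

This only works because the slots never overlap; as soon as two workers need to update the same location, a lock or an atomic operation is required again.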
Detailed analysis:
Memory Consistency Model
Sequential Consistency
setjmp & longjmp
The setjmp() function dynamically establishes the target to which control will later be transferred; longjmp() performs the transfer of execution. (TODO: add my own explanation after implementing this.)
The setjmp() function saves various information about the calling environment (typically the stack pointer, the instruction pointer, possibly the values of other registers, and the signal mask) in the buffer env for later use by longjmp(). In this case, setjmp() returns 0.
The longjmp() function uses the information saved in env to transfer control back to the point where setjmp() was called and to restore ("rewind") the stack to its state at the time of the setjmp() call.
$ ./a.out
A(1)
B(1)
A(2) r = 10001
B(2) r=20001
A(3) r = 10002
B(3) r=20002
A(4) r = 10003
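Separately from the program whose output is shown above, the basic setjmp()/longjmp() contract can be sketched in a few lines of C. The function names `checked_double` and `safe_double` are hypothetical. The sketch shows the two return paths described earlier: setjmp() returns 0 when called directly and a nonzero value when control comes back through longjmp().

```c
#include <setjmp.h>

static jmp_buf env; /* saved calling environment */

/* Hypothetical helper: bails out with longjmp() on invalid input. */
static int checked_double(int v)
{
    if (v < 0)
        longjmp(env, 1); /* control resumes at setjmp(), which returns 1 */
    return v * 2;
}

/* Returns 0 and stores the result in *out on success, -1 on failure. */
int safe_double(int v, int *out)
{
    if (setjmp(env) == 0) {
        /* Direct call: setjmp() returned 0. */
        *out = checked_double(v);
        return 0;
    }
    /* Reached only via longjmp(); *out was never written on this path. */
    return -1;
}
```

Calling `safe_double(21, &x)` takes the direct path and stores 42; calling `safe_double(-1, &x)` takes the longjmp() path and leaves `x` unchanged.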
Provides a complete partial-ordering interface for stores, loads, and atomic operations. Memory ordering exceptions are expressed with a strict fence interface. (partial order set)
The memory model is kept separate from the instruction set. For ease of use, porters transcribe a handful of operations from architecture manuals and rarely have to concern themselves with memory model minutiae (fine details) and additional boilerplate.
All concurrent accesses to actively mutable state (roughly: memory that is still being modified) are expressed through ck_pr operations, which use the necessary hardware instructions and can also be regarded as compiler barriers.
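To make the ordering idea concrete, here is a small sketch that uses C11 `<stdatomic.h>` as a stand-in; it does not use ck_pr itself, and the names `payload`, `ready`, `publish`, and `try_consume` are hypothetical. It shows the pattern the notes describe: every access to the shared flag goes through an explicit atomic operation, and the release/acquire pair establishes the "synchronizes-with" relation between writer and reader.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;               /* plain data, published via `ready` */
static atomic_bool ready = false; /* shared flag, only accessed atomically */

/* Producer side: write the data, then publish it with a release store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: an acquire load that observes `true` synchronizes with
 * the release store above, so the write to `payload` is visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

The atomic operations also prevent the compiler from reordering the payload access across the flag access, which is the compiler-barrier role mentioned above.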
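Implementations are provided with a range of different properties and scalability guarantees.
Two different spinlocks are implemented for comparison.
The article notes that starvation-freedom and fairness are very important on NUMA (Non-Uniform Memory Access) architectures.
Because of the complexity of the bus, the system is split into several nodes, each with its own processors and memory. When a processor accesses memory on its own node, access is fast; only when it has to reach memory on another node does the access take longer. The spinlocks built from these algorithms optimize the busy-wait stage and provide stronger fairness guarantees.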
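As one illustration of a fair, starvation-free spinlock, the following is a minimal ticket-lock sketch in C11 with POSIX threads. It is not the implementation discussed in the article; `struct ticket_lock`, `ticket_lock_acquire`, `ticket_lock_release`, and `run_ticket_demo` are hypothetical names, and the `sched_yield()` in the wait loop is an assumption made to keep the demo friendly on machines with few cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Ticket lock: threads enter strictly in the order they took a ticket. */
struct ticket_lock {
    atomic_uint next;    /* next ticket to hand out */
    atomic_uint serving; /* ticket currently allowed in */
};

static void ticket_lock_acquire(struct ticket_lock *l)
{
    unsigned me = atomic_fetch_add_explicit(&l->next, 1,
                                            memory_order_relaxed);
    /* Busy-wait stage: spin until our ticket is being served. */
    while (atomic_load_explicit(&l->serving, memory_order_acquire) != me)
        sched_yield();
}

static void ticket_lock_release(struct ticket_lock *l)
{
    /* Hand the lock to the next ticket holder. */
    atomic_fetch_add_explicit(&l->serving, 1, memory_order_release);
}

static struct ticket_lock lock; /* zero-initialized: ticket 0 goes first */
static long counter;            /* protected by `lock` */

static void *bump(void *arg)
{
    int iters = *(int *)arg;

    for (int i = 0; i < iters; i++) {
        ticket_lock_acquire(&lock);
        counter++;
        ticket_lock_release(&lock);
    }
    return NULL;
}

/* Starts up to 16 threads that each increment the counter `iters` times
 * and returns the final counter value. */
long run_ticket_demo(int nthreads, int iters)
{
    pthread_t tid[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, bump, &iters) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

Because tickets are served in FIFO order, no waiter can be overtaken indefinitely, which is the fairness property the notes emphasize. All waiters still spin on the same `serving` word, so this simple form does not address the cross-node traffic that NUMA-aware locks try to reduce.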
Ex: Elision (Intel x86 and POWER8 processors support restricted transactional memory, RTM) => the typical use case on those processors is lock elision.
from =>
to =>
@ Unscalable lock => degrades performance under contention
Blocking Asymmetric Synchronization
With blocking synchronization, if the liveness and reachability of an object are decoupled, techniques like reference counting must be used (reference counting => expensive!) => this points out the extra cost that comes with using blocking synchronization.
Ring Buffer
ck_ring_t: a lock-free ring buffer in Concurrency Kit (CK). It is wait-free for single-producer/single-consumer and lock-free for single-producer/multi-consumer.
Hash Table