--- tags: lock, talk-notes --- # [Let's Talk Locks!] Table of contents This series is my notes for the talk https://www.infoq.com/presentations/go-locks/. ## Summary The talk talks about how locks are implemented in modern languages, taking Go as an example, since it has good concurrency design - its goroutines, which is user-space threads. ## Hardware lock primitives On Intel machines, `CMPXCHG` (compare-and-swap) used together with the `LOCK` prefix can be used to implement simple spin locks. `Lock` prefix will lock the system bus, so the the associated instruction can't be interrupted to guarantee atomicity. More details refer to [[Let's Talk Locks!] #1: x86 LOCK prefix](/ubSrsjTHQwatD10E-s9vUw). TSX or HLE also has this: Intel's backward-compatible HLE is based on two new instruction prefixes (rather than new instructions): XACQUIRE (F2) and XRELEASE (F3). You basically put the XACQUIRE prefix on the instruction that starts your critical section, and XRELEASE on the instruction that ends it. ## From spin lock to futex to hybrid lock ### Spin lock Just using a atomic compare-and-swap implements a spin lock, which means while a thead is holding the system bus, other threads will keep retrying in a loop - its compare-and-swap test will return 1 and thus it keeps looping until CAS returns 0. This wastes CPU cycles. On x86 such code works, though with a lot of room for improvement: ``` ; Intel syntax locked: ; The lock variable. 1 = locked, 0 = unlocked. dd 0 spin_lock: mov eax, 1 ; Set the EAX register to 1. xchg eax, [locked] ; Atomically swap the EAX register with ; the lock variable. ; This will always store 1 to the lock, leaving ; the previous value in the EAX register. test eax, eax ; Test EAX with itself. Among other things, this will ; set the processor's Zero Flag if EAX is 0. ; If EAX is 0, then the lock was unlocked and ; we just locked it. ; Otherwise, EAX is 1 and we didn't acquire the lock. jnz spin_lock ; Jump back to the MOV instruction if the Zero Flag is ; not set; the lock was previously locked, and so ; we need to spin until it becomes unlocked. ret ; The lock has been acquired, return to the calling ; function. spin_unlock: xor eax, eax ; Set the EAX register to 0. xchg eax, [locked] ; Atomically swap the EAX register with ; the lock variable. ret ; The lock has been released. ``` Note that some memory barriers may be required to correctly implement a spin lock, to avoid compiler reordering - like maybe some other `mov eax, 0` from other part of the program can be reordered before `mov eax 1`, which would cause the code to not work correctly. However, x86's total store ordering already guarantees this, so no memory barrier is needed for this to work on x86. ### Futex Instead of just letting the waiting thread burning CPU cycles, an alternative approach is to put it to sleep and only awake it when the ongoing thread gives up the lock. Linux implements this as `futex`. The pros is that the CPU cycles are saved and can run other tasks, but context switch can be expensive too. An example of using futex to implement mutex: ![](https://i.imgur.com/9tjKlnh.png) Because the kernel needs to keep track of what thread is waiting on what futex (aka, `uaddr` - the address of the lock integer on which atomic CAS is performed). This is implemented by hashing the address `uaddr` to a hash table and store a bucket for each hash key. When different threads are operating the same bucket, a lock is needed. So that is - to implement a futex lock, we first need a lock. In this case, spin lock is used, which makes sense, because we know the bucket operation won't be too long but we are not sure how long the futex owner will keep the lock. On linux `pthread_mutex_lock` is implemented using futex. More details: https://lwn.net/Articles/360699/ ### Hybrid lock If we know the waiting thread won't wait for too long, it might be cheaper to just spinning rather than being scheduled out by the OS. A hybrid lock first tries to spin for sometime, then request the OS to put it to sleep. ## Go locks Instead of requesting the OS to do thread scheduling, which has high context switch cost, Go tries to schedule the threads at runtime system level. So a Go thread may be scheduled but the OS thread could still be the same OS thread. A go routine is essentially a coroutine or a continuation. More advanced questions involve something like how to avoid the situation that some threads keep being put to sleep and waked up and always can't get a lock - since it takes some time to be able to run after it's added to the ready queue. ## "Inception" in locks ![](https://i.imgur.com/2EnG1By.png) Probably the coolest part of the talk! ## More locks - software locks - double-checked locks Both are related to memory barriers. <!-- questions? - Why is thread context switch expensive? - How does OS put a thread to sleep for spin lock? - what system call is called for a thread to request the OS to put it to sleep? -->