[Let's Talk Locks!] Table of contents

This series is my notes for the talk https://www.infoq.com/presentations/go-locks/.

Summary

The talk talks about how locks are implemented in modern languages, taking Go as an example, since it has good concurrency design - its goroutines, which is user-space threads.

Hardware lock primitives

On Intel machines, CMPXCHG (compare-and-swap) used together with the LOCK prefix can be used to implement simple spin locks. Lock prefix will lock the system bus, so the the associated instruction can't be interrupted to guarantee atomicity. More details refer to [Let's Talk Locks!] #1: x86 LOCK prefix.

TSX or HLE also has this:
Intel's backward-compatible HLE is based on two new instruction prefixes (rather than new instructions): XACQUIRE (F2) and XRELEASE (F3). You basically put the XACQUIRE prefix on the instruction that starts your critical section, and XRELEASE on the instruction that ends it.

From spin lock to futex to hybrid lock

Spin lock

Just using a atomic compare-and-swap implements a spin lock, which means while a thead is holding the system bus, other threads will keep retrying in a loop - its compare-and-swap test will return 1 and thus it keeps looping until CAS returns 0. This wastes CPU cycles.

On x86 such code works, though with a lot of room for improvement:

; Intel syntax

locked:                      ; The lock variable. 1 = locked, 0 = unlocked.
     dd      0

spin_lock:
     mov     eax, 1          ; Set the EAX register to 1.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
                             ; This will always store 1 to the lock, leaving
                             ;  the previous value in the EAX register.
     test    eax, eax        ; Test EAX with itself. Among other things, this will
                             ;  set the processor's Zero Flag if EAX is 0.
                             ; If EAX is 0, then the lock was unlocked and
                             ;  we just locked it.
                             ; Otherwise, EAX is 1 and we didn't acquire the lock.
     jnz     spin_lock       ; Jump back to the MOV instruction if the Zero Flag is
                             ;  not set; the lock was previously locked, and so
                             ; we need to spin until it becomes unlocked.
     ret                     ; The lock has been acquired, return to the calling
                             ;  function.

spin_unlock:
     xor     eax, eax        ; Set the EAX register to 0.
     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
     ret                     ; The lock has been released.

Note that some memory barriers may be required to correctly implement a spin lock, to avoid compiler reordering - like maybe some other mov eax, 0 from other part of the program can be reordered before mov eax 1, which would cause the code to not work correctly. However, x86's total store ordering already guarantees this, so no memory barrier is needed for this to work on x86.

Futex

Instead of just letting the waiting thread burning CPU cycles, an alternative approach is to put it to sleep and only awake it when the ongoing thread gives up the lock. Linux implements this as futex. The pros is that the CPU cycles are saved and can run other tasks, but context switch can be expensive too.

An example of using futex to implement mutex:

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Because the kernel needs to keep track of what thread is waiting on what futex (aka, uaddr - the address of the lock integer on which atomic CAS is performed). This is implemented by hashing the address uaddr to a hash table and store a bucket for each hash key. When different threads are operating the same bucket, a lock is needed. So that is - to implement a futex lock, we first need a lock. In this case, spin lock is used, which makes sense, because we know the bucket operation won't be too long but we are not sure how long the futex owner will keep the lock.

On linux pthread_mutex_lock is implemented using futex.

More details:
https://lwn.net/Articles/360699/

Hybrid lock

If we know the waiting thread won't wait for too long, it might be cheaper to just spinning rather than being scheduled out by the OS. A hybrid lock first tries to spin for sometime, then request the OS to put it to sleep.

Go locks

Instead of requesting the OS to do thread scheduling, which has high context switch cost, Go tries to schedule the threads at runtime system level. So a Go thread may be scheduled but the OS thread could still be the same OS thread. A go routine is essentially a coroutine or a continuation.

More advanced questions involve something like how to avoid the situation that some threads keep being put to sleep and waked up and always can't get a lock - since it takes some time to be able to run after it's added to the ready queue.

"Inception" in locks