[toc]
## Prerequisites
1. **Basic Program Execution**: 程式碼是怎麼在電腦裡跑起來的?變數存在哪裡? (Search keywords: `program execution flow`, `variables and memory`)
2. **Threads and Concurrency**: 為什麼程式可以『同時』做好幾件事? (Search keywords: `what is a thread`, `concurrency`, `multi-threading basics`)
3. **Shared Resources**: 什麼是程式裡面大家可能搶著用的『東西』? (Search keywords: `shared resources multi-threading`)
## What is the race condition?
:::info
:information_source: **Definition**
They arise when two or more threads (or processes) access and manipulate the same data or resource simultaneously (or concurrently) without proper coordination.
:::
:::danger
:fire: **Hazard**
The outcome depends on **the unpredictable order of execution** of multiple concurrent threads (or processes), which can lead to several serious consequences.
These include ***data inconsistencies***, system instability, and security vulnerabilities.
:::
### Simple explained
:::warning
:teacher: Consider that we have shared data which may be accessed concurrently by two threads.
:::
- Thread A's job
- Read the x
- do x = x + 1
- Write the x back
- Thread B's job
- Read the x
- do x = x + 3
- Write the x back
```mermaid
sequenceDiagram
participant a as Thread A
participant b as Thread B
participant s as Shared data
Note over s: x = 3
par
s->>a: read
Note over a: do x = x + 1
and
s->>b: read
Note over b: do x = x + 3
end
Note over a, b: A: [x == 4], B: [x == 6]
alt A then B
a->>s: write
b->>s: write
Note over s: x==?
else B then A
b->>s: write
a->>s: write
Note over s: x==?
end
```
:::warning
What's the expected result of `x`?
It's going to depend on **the order of execution** of two threads. :thumbsdown: BAD!
So, do we have any solution to solve this problem? :thinking_face:
:::
## What if we introduce the mutex (mutual exclusion) lock?
:::info
**Definition**
A [mutex lock](https://en.wikipedia.org/wiki/Mutual_exclusion), or mutual exclusion lock, is a synchronization primitive used in concurrent programming to protect shared resources from simultaneous access by multiple threads.
***It ensures that only one thread can hold the lock (or "own" the mutex) at any given time***, preventing race conditions and data inconsistencies.
:::
:::success
:teacher: Okay, now we can leverage the mutex lock so that any thread wanting to access the shared data ***must acquire the lock first***. Then release the lock after accessing.
:::
```mermaid
sequenceDiagram
participant a as Thread A
participant b as Thread B
participant l as Mutex lock
participant s as Shared data
activate l
Note over s: x = 3
par
l->>a: try to aquire
deactivate l
activate a
s->>a: read
Note over a: hold the lock, then do x = x + 1
and
l--xb: try to aquire
loop exponential retry with jitter
l-->b: try to aquire
end
end
a->>s: write
Note over s: x == 4
a->>l: release
deactivate a
activate l
l->>b: try to aquire
deactivate l
activate b
s->>b: read
Note over b: hold the lock, then do x = x + 3
b->>s: write
Note over s: x == 7
b->>l: release
deactivate b
activate l
deactivate l
```
:::success
:tada: Congrats! We can now guarantee the shared data will be updated correctly.
:::
## What if the data is owned by anothor service?
:::warning
Extend the same scenario, there are 2 threads, mutex, and the shared data, ***but if the shared data is owned by the database server*** and ***only one application (i.e., process)*** can access this data, should we worry about race conditions in this architecture? :thinking_face:
:::
```mermaid
sequenceDiagram
box rgba(233, 243, 16, 0.2) Application Server
participant a as Thread A
participant b as Thread B
participant l as Mutex lock
end
box rgba(92, 243, 16, 0.2) Database Server
participant s as Shared data
end
activate l
Note over s: x = 3
par
l->>a: try to aquire
deactivate l
activate a
s->>a: read
Note over a: hold the lock, then do x = x + 1
and
l--xb: try to aquire
loop exponential retry with jitter
l-->b: try to aquire
end
end
a->>s: write
Note over s: x == 4
a->>l: release
deactivate a
activate l
l->>b: try to aquire
deactivate l
activate b
s->>b: read
Note over b: hold the lock, then do x = x + 3
b->>s: write
Note over s: x == 7
b->>l: release
deactivate b
activate l
deactivate l
```
:::success
:tada: No, we don't need to worry if the shared data is accessed by a single application using an in-memory mutex lock, which is designed to protect that.
:::
:::info
:information_source: In some scenarios, you may utilize database ***transactions***. This built-in feature helps you avoid the need to implement your own lock management.
For more details, e.g.
- [SQL database's transaction](https://en.wikipedia.org/wiki/Database_transaction)
- [Redis's transaction, Lua script, and `WATCH` command](https://stackoverflow.com/a/38760441)
:::
## What if we have multiple applications accessing the shared data?
:::danger
:bomb: The ***in-memory*** mutex within application won't help in a distributed system!
:::
```mermaid
sequenceDiagram
box rgba(233, 243, 16, 0.2) Application Server
participant a as Threads ...
participant la as Mutex lock
end
box rgba(233, 243, 16, 0.2) Application Server
participant b as Threads ...
participant lb as Mutex lock
end
box rgba(92, 243, 16, 0.2) Database Server
participant s as Shared data
end
par
la->>a: aquire...?!
s->>a: read
a->>s: write
and
lb->>b: aquire...?!
s->>b: read
b->>s: write
end
Note over s: ...?!
```
## How to avoid race condition on a distributed system?
:::success
2. We could introduce an external server to hold the mutex lock for specific data or resources. This server would then guarantee ***sequential (single-threaded) processing*** of concurrent client requests.
:::
### Using external server to hold the mutex
```mermaid
sequenceDiagram
box rgba(233, 243, 16, 0.2) Application Server
participant a as Thread A
end
box rgba(233, 243, 16, 0.2) Application Server
participant b as Thread B
end
box rgba(243, 82, 16, 0.2) External server
participant l as Mutex lock
end
box rgba(92, 243, 16, 0.2) Database Server
participant s as Shared data
end
activate l
Note over s: x = 3
par
l->>a: try to aquire
deactivate l
activate a
s->>a: read
Note over a: hold the lock, then do x = x + 1
and
l--xb: try to aquire
loop exponential retry with jitter
l-->b: try to aquire
end
end
a->>s: write
Note over s: x == 4
a->>l: release
deactivate a
activate l
l->>b: try to aquire
deactivate l
activate b
s->>b: read
Note over b: hold the lock, then do x = x + 3
b->>s: write
Note over s: x == 7
b->>l: release
deactivate b
activate l
deactivate l
```
:::success
:tada: By having the external server process concurrent lock acquisition requests sequentially, we can ***guarantee*** that the data won't be manipulated ***simultaneously***.
:::
## "Talk is cheap. Show me the code."
:::info
在 2000 年 8 月 25 日,在討論關於 kernel 執行緒優化的問題時,一個人提出了[一個他認為非常高效的方案](https://zh.moegirl.org.cn/zh-hk/Talk_is_cheap._Show_me_the_code.#cite_note-1)。Linus Torvalds 認為這個方案不好,便在回信中寫下了這句[至理名言](https://lkml.org/lkml/2000/8/25/132):
“Talk is cheap. Show me the code.”
Linus
:::
### The race condition (C#)
Go to: [race-condition-example/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/race-condition-example/Program.cs)
### Use mutex (C#)
:::info
{%preview https://learn.microsoft.com/zh-tw/dotnet/api/system.threading.mutex?view=net-8.0 %}
:::
Go to: [no-race-condition-with-mutex/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/no-race-condition-with-mutex/Program.cs)
### Use Redis as a DLM (Distributed Lock Manager) (C#)
Go to: [no-race-condition-with-dlm/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/no-race-condition-with-dlm/Program.cs)
:::warning
:thinking_face: Thinking...
Q: How to avoid the resource being locked forever if the application doesn't release the lock? (e.g. unexpectedly crash)
- Ans: Use **Lock Lease**
Q: How to ensure the lock you are going to release is the one you acquired before? (e.g. the lock is being automatically released before you release it after your heavy job)
- Ans: Use **Fencing Token** or **Owner Identification**
:point_right: [Cache and Redis #Distributed locks](https://hackmd.io/@maxcian/101-caching-and-redis#Distributed-locks)
:::
## Further reading
- [Distributed locks and the ***Redlock Algorithm*** - A distributed lock pattern with Redis](https://redis.io/docs/latest/develop/clients/patterns/distributed-locks/)
- [How to do distributed locking (*posted by the author of 『Designing Data-Intensive Applications』*)](https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)