[toc] ## Prerequisites 1. **Basic Program Execution**: 程式碼是怎麼在電腦裡跑起來的?變數存在哪裡? (Search keywords: `program execution flow`, `variables and memory`) 2. **Threads and Concurrency**: 為什麼程式可以『同時』做好幾件事? (Search keywords: `what is a thread`, `concurrency`, `multi-threading basics`) 3. **Shared Resources**: 什麼是程式裡面大家可能搶著用的『東西』? (Search keywords: `shared resources multi-threading`) ## What is the race condition? :::info :information_source: **Definition** They arise when two or more threads (or processes) access and manipulate the same data or resource simultaneously (or concurrently) without proper coordination. ::: :::danger :fire: **Hazard** The outcome depends on **the unpredictable order of execution** of multiple concurrent threads (or processes), which can lead to several serious consequences. These include ***data inconsistencies***, system instability, and security vulnerabilities. ::: ### Simple explained :::warning :teacher: Consider that we have shared data which may be accessed concurrently by two threads. ::: - Thread A's job - Read the x - do x = x + 1 - Write the x back - Thread B's job - Read the x - do x = x + 3 - Write the x back ```mermaid sequenceDiagram participant a as Thread A participant b as Thread B participant s as Shared data Note over s: x = 3 par s->>a: read Note over a: do x = x + 1 and s->>b: read Note over b: do x = x + 3 end Note over a, b: A: [x == 4], B: [x == 6] alt A then B a->>s: write b->>s: write Note over s: x==? else B then A b->>s: write a->>s: write Note over s: x==? end ``` :::warning What's the expected result of `x`? It's going to depend on **the order of execution** of two threads. :thumbsdown: BAD! So, do we have any solution to solve this problem? :thinking_face: ::: ## What if we introduce the mutex (mutual exclusion) lock? :::info **Definition** A [mutex lock](https://en.wikipedia.org/wiki/Mutual_exclusion), or mutual exclusion lock, is a synchronization primitive used in concurrent programming to protect shared resources from simultaneous access by multiple threads. ***It ensures that only one thread can hold the lock (or "own" the mutex) at any given time***, preventing race conditions and data inconsistencies. ::: :::success :teacher: Okay, now we can leverage the mutex lock so that any thread wanting to access the shared data ***must acquire the lock first***. Then release the lock after accessing. ::: ```mermaid sequenceDiagram participant a as Thread A participant b as Thread B participant l as Mutex lock participant s as Shared data activate l Note over s: x = 3 par l->>a: try to aquire deactivate l activate a s->>a: read Note over a: hold the lock, then do x = x + 1 and l--xb: try to aquire loop exponential retry with jitter l-->b: try to aquire end end a->>s: write Note over s: x == 4 a->>l: release deactivate a activate l l->>b: try to aquire deactivate l activate b s->>b: read Note over b: hold the lock, then do x = x + 3 b->>s: write Note over s: x == 7 b->>l: release deactivate b activate l deactivate l ``` :::success :tada: Congrats! We can now guarantee the shared data will be updated correctly. ::: ## What if the data is owned by anothor service? :::warning Extend the same scenario, there are 2 threads, mutex, and the shared data, ***but if the shared data is owned by the database server*** and ***only one application (i.e., process)*** can access this data, should we worry about race conditions in this architecture? :thinking_face: ::: ```mermaid sequenceDiagram box rgba(233, 243, 16, 0.2) Application Server participant a as Thread A participant b as Thread B participant l as Mutex lock end box rgba(92, 243, 16, 0.2) Database Server participant s as Shared data end activate l Note over s: x = 3 par l->>a: try to aquire deactivate l activate a s->>a: read Note over a: hold the lock, then do x = x + 1 and l--xb: try to aquire loop exponential retry with jitter l-->b: try to aquire end end a->>s: write Note over s: x == 4 a->>l: release deactivate a activate l l->>b: try to aquire deactivate l activate b s->>b: read Note over b: hold the lock, then do x = x + 3 b->>s: write Note over s: x == 7 b->>l: release deactivate b activate l deactivate l ``` :::success :tada: No, we don't need to worry if the shared data is accessed by a single application using an in-memory mutex lock, which is designed to protect that. ::: :::info :information_source: In some scenarios, you may utilize database ***transactions***. This built-in feature helps you avoid the need to implement your own lock management. For more details, e.g. - [SQL database's transaction](https://en.wikipedia.org/wiki/Database_transaction) - [Redis's transaction, Lua script, and `WATCH` command](https://stackoverflow.com/a/38760441) ::: ## What if we have multiple applications accessing the shared data? :::danger :bomb: The ***in-memory*** mutex within application won't help in a distributed system! ::: ```mermaid sequenceDiagram box rgba(233, 243, 16, 0.2) Application Server participant a as Threads ... participant la as Mutex lock end box rgba(233, 243, 16, 0.2) Application Server participant b as Threads ... participant lb as Mutex lock end box rgba(92, 243, 16, 0.2) Database Server participant s as Shared data end par la->>a: aquire...?! s->>a: read a->>s: write and lb->>b: aquire...?! s->>b: read b->>s: write end Note over s: ...?! ``` ## How to avoid race condition on a distributed system? :::success 2. We could introduce an external server to hold the mutex lock for specific data or resources. This server would then guarantee ***sequential (single-threaded) processing*** of concurrent client requests. ::: ### Using external server to hold the mutex ```mermaid sequenceDiagram box rgba(233, 243, 16, 0.2) Application Server participant a as Thread A end box rgba(233, 243, 16, 0.2) Application Server participant b as Thread B end box rgba(243, 82, 16, 0.2) External server participant l as Mutex lock end box rgba(92, 243, 16, 0.2) Database Server participant s as Shared data end activate l Note over s: x = 3 par l->>a: try to aquire deactivate l activate a s->>a: read Note over a: hold the lock, then do x = x + 1 and l--xb: try to aquire loop exponential retry with jitter l-->b: try to aquire end end a->>s: write Note over s: x == 4 a->>l: release deactivate a activate l l->>b: try to aquire deactivate l activate b s->>b: read Note over b: hold the lock, then do x = x + 3 b->>s: write Note over s: x == 7 b->>l: release deactivate b activate l deactivate l ``` :::success :tada: By having the external server process concurrent lock acquisition requests sequentially, we can ***guarantee*** that the data won't be manipulated ***simultaneously***. ::: ## "Talk is cheap. Show me the code." :::info 在 2000 年 8 月 25 日,在討論關於 kernel 執行緒優化的問題時,一個人提出了[一個他認為非常高效的方案](https://zh.moegirl.org.cn/zh-hk/Talk_is_cheap._Show_me_the_code.#cite_note-1)。Linus Torvalds 認為這個方案不好,便在回信中寫下了這句[至理名言](https://lkml.org/lkml/2000/8/25/132): “Talk is cheap. Show me the code.” Linus ::: ### The race condition (C#) Go to: [race-condition-example/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/race-condition-example/Program.cs) ### Use mutex (C#) :::info {%preview https://learn.microsoft.com/zh-tw/dotnet/api/system.threading.mutex?view=net-8.0 %} ::: Go to: [no-race-condition-with-mutex/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/no-race-condition-with-mutex/Program.cs) ### Use Redis as a DLM (Distributed Lock Manager) (C#) Go to: [no-race-condition-with-dlm/Program.cs](https://github.com/hjcian/distributed-lock-tutorial/blob/main/no-race-condition-with-dlm/Program.cs) :::warning :thinking_face: Thinking... Q: How to avoid the resource being locked forever if the application doesn't release the lock? (e.g. unexpectedly crash) - Ans: Use **Lock Lease** Q: How to ensure the lock you are going to release is the one you acquired before? (e.g. the lock is being automatically released before you release it after your heavy job) - Ans: Use **Fencing Token** or **Owner Identification** :point_right: [Cache and Redis #Distributed locks](https://hackmd.io/@maxcian/101-caching-and-redis#Distributed-locks) ::: ## Further reading - [Distributed locks and the ***Redlock Algorithm*** - A distributed lock pattern with Redis](https://redis.io/docs/latest/develop/clients/patterns/distributed-locks/) - [How to do distributed locking (*posted by the author of 『Designing Data-Intensive Applications』*)](https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html)