# Kernel Synchronization

###### tags: `LINUX KERNEL` 實務必看

https://devopedia.org/race-condition-software

# Atomic operation

Atomic operations protect a single variable.

The ideal case
![](https://i.imgur.com/uASsqEN.png)
What can go wrong
![](https://i.imgur.com/bQDy2Xz.png)
The same case protected by an atomic operation
![](https://i.imgur.com/Ov0YHwV.png)
![](https://i.imgur.com/KBuvnBI.png)
![](https://i.imgur.com/4Ry3muq.png)
The most common use is implementing counters
![](https://i.imgur.com/EoFFZYr.png)
You can also set a value and then test whether it is what you expect
![](https://i.imgur.com/XzrF0ZO.png)
On most ISAs a word-sized read is always atomic, so atomic_read() just returns the value directly; the macro does nothing else
![](https://i.imgur.com/ZppJ2I7.png)
The Linux 2.6.9 atomic operations live in /linux-2.6.9/include/asm-i386/atomic.h

# Atomic Bitwise Operations

architecture-specific
![](https://i.imgur.com/MIgq25T.png)

# Locking

When the region to protect is a larger data structure (for example, insertions and deletions on a linked list all touch the same structure), atomic operations are no longer enough; most OSes use a locking mechanism. If one thread reads the structure while another thread writes it, the structure becomes inconsistent.

> What is needed is a way of making sure that only one thread manipulates the data structure at a time

![](https://i.imgur.com/lnZwGD8.png)
(locks are themselves built out of atomic operations)

**atomic test-and-set:** an instruction that tests the value of an integer and sets it to a new value only if it is zero. A value of zero means unlocked.

# Causes of Concurrency

For user-space programs, synchronization problems supposedly arise only when the scheduler preempts the program (but what about user threads executing in parallel?). Examples:

* two different user-space programs modify the same shared memory
* or write to the same file descriptor
* multiple single-threaded processes sharing files
* or within a single program with signals, because signals can occur asynchronously

Two processes that do not actually execute at the same time but interleave with each other such that they might as well is called
> **pseudo-concurrency**

Two processes executing the same critical region on different CPUs at the same time is called
> **true concurrency**

The kernel runs into concurrency in the following situations:

* Interrupts
    * An interrupt can occur asynchronously at almost any time, interrupting the currently executing code.
* Softirqs and tasklets
    * The kernel can raise or schedule a softirq or tasklet at almost any time, interrupting the currently executing code.
* Kernel preemption
    * Because the kernel is preemptive, one task in the kernel can preempt another.
* Sleeping and synchronization with user-space
    * A task in the kernel can sleep and thus invoke the scheduler, resulting in the running of a new process.
* Symmetrical multiprocessing
    * Two or more processors can execute kernel code at exactly the same time.

# The hard part is deciding what needs protection

* interrupt-safe
    * Code that is safe from concurrent access from an interrupt handler
* SMP-safe
    * Code that is safe from concurrency on symmetrical multiprocessing machines
* preempt-safe
    * Code that is safe from concurrency with kernel preemption

# Knowing what to protect

* Data that belongs to only one thread needs no protection
    * local automatic variables
    * data on the thread's stack
    * data used only by a single process
* Most global kernel data structures do need protection
* remember to lock data, not code.

> The kernel can be configured at compile time (via CONFIG options) for an SMP or a UP environment. In a UP build, many locking issues that exist on SMP simply disappear.
> e.g. in a UP environment, spinlocks degrade to nothing more than disabling preemption

When writing kernel code, ask yourself the following questions
![](https://i.imgur.com/6dK6ftg.png)

# Deadlock

![](https://i.imgur.com/NckuDbS.png)
How to prevent deadlock
![](https://i.imgur.com/7ON997w.png)

# Some history of the Linux 2.6 scheduler

Before 2.4, the scheduler had a single run queue. In 2.6, the O(1) scheduler moved to one runqueue per CPU, each protected by its own unique lock. Going from a single global lock to per-CPU locks greatly improved performance, because with a global lock, one CPU holding it blocked every other CPU. Later 2.6 kernels improved performance further still with the CFS algorithm.

As a rule, adding locking only helps overall performance when a data structure actually sees frequent contention. On a many-core machine you usually want finer-grained locking, but the same design on a machine with few cores can drag performance down. Tuning lock granularity to the target environment is where the engineering judgment comes in.

# spinlock

The spinlock is the most common lock in the Linux kernel. Because a waiter keeps "spinning" until the lock is released, a spinlock should be held for

> less than the duration of two context switches (swapping the process out and swapping it back in)

Otherwise the time spent busy-waiting would have been better spent giving the CPU to someone else. Because of this property, **spinlocks are especially suitable for interrupt context**.

# spinlock methods

Spin locks are architecture-dependent and implemented in assembly
![](https://i.imgur.com/CkRIB3S.png)
First disable kernel preemption, then set the lock to 1 and enter the critical section
![](https://i.imgur.com/sq2Krdf.png)
![](https://i.imgur.com/xmHVbVX.png)
![](https://i.imgur.com/z09mH4U.png)
![](https://i.imgur.com/rRUR0kU.png)

* **When using a spinlock in interrupt context, you must disable local interrupts first! This is critical!**

Otherwise an interrupt can come in and its handler preempts you while you hold the spinlock. The handler then spins waiting for the lock in your hand, but you will never get the CPU back, because the interrupt handler has higher priority than you. Note that you only need to disable interrupts on the current CPU: if the interrupt lands on another CPU, its handler also spins waiting for your spinlock, but that handler *can* wait until you release it, because you keep running on your own CPU.

To use the interrupt-disabling version of the spinlock, use
![](https://i.imgur.com/sh3tAOL.png)
![](https://i.imgur.com/BUKWPuv.png)

irqrestore returns interrupts to their previous state. You do not know whether that state was enabled or disabled, so you must not blindly re-enable interrupts that were already off.
![](https://i.imgur.com/7BLCta6.png)

If you already know interrupts were enabled to begin with, you can use this version
![](https://i.imgur.com/kIZ5WhE.png)
![](https://i.imgur.com/QF8wO9A.png)

The author's take: Linux is now so complex that you can never *always* know whether interrupts are enabled at a given point, so this method is unsafe. If you use it, you had better be 100% sure you are right...

CONFIG_DEBUG_LOCK_ALLOC **can help you debug your spin_lock usage**

Other spinlock methods
![](https://i.imgur.com/L2CBxbf.png)

# Spin Locks and Bottom Halves

> spin_lock_bh()
> spin_unlock_bh()

* A bottom half can preempt process-context code, so if data is shared between a bottom half and process context, the process-context side needs a lock
* An interrupt handler can preempt a bottom half, so if data is shared between an interrupt handler and a bottom half, both sides must take the lock and disable interrupts.
* Data local to a single tasklet needs no lock, because two tasklets of the same type never run on two CPUs at the same time
* shared data between two different tasklets --> spinlock
* A tasklet never preempts another running tasklet on the same CPU, thus no need to disable BH.
* Softirqs always use locks for shared data, but a softirq never preempts another softirq running on the same processor, so there is no need to disable BH.
![](https://i.imgur.com/8gppRaP.png)
![](https://i.imgur.com/mgkJnPa.png)

An explanation of preempt_count() (it records the nesting depth of hardirqs and softirqs, and is used to determine whether we are currently in interrupt context)
https://zhuanlan.zhihu.com/p/88883239
(grep for it; it is commonly used in the net subsystem)

# Reader-Writer Spin Locks

When a data structure is neatly split into reader/writer or consumer/producer usage patterns, use **reader-writer spin locks**
ex: Process Management
![](https://i.imgur.com/ak7fGQx.png)

* W/W and W/R exclude each other: when the list is updated (written to), it is important that no other threads of execution concurrently write to or read from the list
* R/W exclude each other: when the list is searched (read from), it is only important that nothing else writes to the list
* R/R do not exclude each other

![](https://i.imgur.com/QngMP4E.png)
You cannot take a write_lock while holding a read_lock; that deadlocks
![](https://i.imgur.com/YnWHACa.png)
Readers have priority over writers, so a writer can spin on its CPU forever while readers keep streaming in

* If the read lock is held and a writer is waiting for exclusive access, readers that attempt to acquire the lock continue to succeed
* The spinning writer does not acquire the lock until all readers release the lock
* Therefore, a sufficient number of readers can starve pending writers.

# What spinlocks are for on a single CPU

On a uniprocessor, suppose you occupy the processor and try to take a lock that happens to be held by another context (a process or an interrupt). The only way forward is a context switch: execution must pass to the other context so it can finish its work and release the lock, and only then can you acquire it; actually spinning would never succeed. That is why the uniprocessor build implements the various spinlock variants differently instead of really spinning: on UP, a spinlock is just disable-interrupts plus disable-preemption ("finish your own work quickly"), which avoids the context switch that a semaphore would cost.

# Semaphores

A lock that sleeps.
When a task tries to acquire a semaphore that is already held by someone else, the semaphore places the task on a wait queue and puts it to sleep.

Properties of semaphores

* Because a task that cannot get the semaphore goes to sleep, semaphores are well suited to critical sections that are held a long time (far longer than two context-switch times)
* Conversely, using a semaphore to lock a very short critical section carries heavy overhead: compared to a spinlock, you additionally pay for maintaining the wait queue, waking tasks back up, and so on
* Because a thread may sleep while acquiring a semaphore, you can acquire one only in process context; interrupt context is not schedulable
* Interrupt handlers do have priorities, and a high-priority hardirq can preempt a low-priority one, but that is not scheduling, because no process switch is involved
* You can sleep while holding a semaphore. Avoid it where possible, but in the worst case you still wake up and release the semaphore
* You cannot acquire a semaphore while holding a spinlock, because you might sleep while waiting for the semaphore, and you must not sleep at that point
* A semaphore does not disable kernel preemption
    * so holding a semaphore does not hurt scheduling latency.

**Common usage**

* when synchronizing with user-space, semaphores are the sole solution
* when you genuinely have a choice, the deciding factor between a spinlock and a semaphore should be lock hold time.

# Counting and Binary Semaphores

One advantage of semaphores is that they can allow multiple lock holders at the same time; a spinlock allows at most one. The count can be initialized at declaration time. There are two kinds:

* mutex (binary semaphore)
    * usage count = 1
    * used almost everywhere
* counting semaphore
    * usage count = count
    * rarely used in the Linux kernel

How semaphores work

* down()
    * atomically decrements the count by one
    * if the result is zero or greater, the lock is acquired
    * if the result is negative, the task is placed on the wait queue and the CPU goes off to do something else
    * ![](https://i.imgur.com/E00KmOg.png)
* up()
    * atomically increments the count by one
    * if the semaphore's wait queue is not empty, a task in the queue is awakened and allowed to acquire the semaphore.

![](https://i.imgur.com/VK9qd9s.png)
Init
![](https://i.imgur.com/wFcyPSM.png)

# Using Semaphores

down_interruptible()

* if the semaphore is unavailable, it places the calling process to sleep in the TASK_INTERRUPTIBLE state.
* If the task receives a signal while waiting for the semaphore, it is awakened and down_interruptible() returns -EINTR

down()

* puts the task into TASK_UNINTERRUPTIBLE, where signals cannot wake it; you usually do not want this one

down_trylock()

* tries to acquire the given semaphore without blocking
* If the semaphore is already held, the function immediately returns nonzero
* ![](https://i.imgur.com/cYrI7we.png)

# Reader-Writer Semaphores

These exist for much the same purpose as R/W spinlocks

* All reader-writer semaphores are mutexes, that is, their usage count is one
* although they enforce mutual exclusion only for writers, not readers
* only an uninterruptible-sleep version exists, so you can only use down()
* ![](https://i.imgur.com/Ujww00B.png)
* use an R/W lock only when you clearly know that the data structure you are protecting has distinctly separated write paths and read paths

# Mutexes

A simplified semaphore with usage count = 1
![](https://i.imgur.com/mwxAuNQ.png)
![](https://i.imgur.com/NPaAx2G.png)
![](https://i.imgur.com/vFGT3fZ.png)

* Compared to a semaphore, a mutex has a stricter use case
* Only one task can hold the mutex at a time
* Whoever locked a mutex must unlock it
    * you cannot unlock a mutex from a different context
    * so it cannot be used for synchronization between kernel and user-space
* recursive locks and unlocks are not allowed
* a process cannot exit while holding a mutex (this seems a bit off: with a semaphore you may sleep while holding the lock, but with a mutex you can't???)
    * ![](https://i.imgur.com/hqUNFaa.png)

![](https://i.imgur.com/I1t0GMb.png)
Looks like it does sleep = =

* You cannot acquire a mutex in interrupt context
* Mutexes must be managed only through the official API; do not roll your own

# Semaphores vs Mutexes

* Unless one of the mutex's restrictions gets in your way, prefer a mutex over a semaphore.
* When writing new code, only specific, often low-level, uses need a semaphore
* Start with a mutex; switch to a semaphore only when you hit one of its restrictions.

# Spin Locks vs Mutexes

![](https://i.imgur.com/LluwM7E.png)

# Completion Variables

Used when two tasks in the kernel need one task to signal to the other that an event has occurred. One task waits on the completion variable while another task performs some work. When the other task has completed the work, it uses the completion variable to wake up any waiting tasks.
This sounds a lot like a semaphore, and indeed it is
ex: the vfork() system call
![](https://i.imgur.com/dqwgvHJ.png)

# BKL: The Big Kernel Lock

A lock used by early kernels. You must not use it in new code, but you can still find it in older kernels
![](https://i.imgur.com/xmLupqu.png)

# Sequential Locks

A lock that prefers writers over readers. Use it in situations like the following:

* Your data has a lot of readers
* Your data has few writers.
* Although few in number, you want to favor writers over readers and never allow readers to starve writers.
* Your data is simple, such as a simple structure or even a single integer that, for whatever reason, cannot be made atomic.

User: jiffies
![](https://i.imgur.com/HEM9BLn.png)

# Preemption Disabling

Because the Linux kernel is preemptive, a process can be preempted at any time by another, higher-priority process, which means a critical region can be re-entered through preemption.

* The kernel uses spinlocks as markers of nonpreemptive regions.
* if a spinlock is held, the kernel is not preemptive.

![](https://i.imgur.com/rqUk6Uq.png)

In some situations we only want to disable kernel preemption without taking a spinlock; the most common case is per-processor data.

* If a piece of data is unique to each processor, there may be no need to protect it with a lock, because only that one processor can access the data.

A possible failure scenario:
![](https://i.imgur.com/V0YwiLB.png)
You can call preempt_disable()
![](https://i.imgur.com/FHVaGMG.png)

The preemption count stores the number of held locks and preempt_disable() calls

* If the number is zero, the kernel is preemptive
* If the value is one or greater, the kernel is not preemptive
* This count is incredibly useful: it is a great way to do atomicity and sleep debugging. The function preempt_count() returns this value

ex: get_cpu()
This function disables kernel preemption prior to returning the current processor number

# Ordering and Barriers

The compiler often reorders instructions as an optimization. When you do not want it to be clever, use barriers.
![](https://i.imgur.com/FIv8zaA.png)
a and b have no dependency, so the compiler may reorder them
![](https://i.imgur.com/Ms5Tkzm.png)
a and b have a dependency, so the compiler will not reorder this.

* rmb(): read memory barrier
    * ensures no loads are reordered across the rmb() call.
    * no loads prior to the call will be reordered to after the call
    * no loads after the call will be reordered to before the call.
    * like a city wall: nothing inside can move outside and nothing outside can move inside, hence the name "barrier"
* wmb(): write barrier
    * the store version of the above
* mb(): both a read barrier and a write barrier.