# Avoiding Scheduler Subversion using Scheduler-Cooperative Locks

###### tags: `scheduling`
###### paper origin: EuroSys '20
###### [paper](https://research.cs.wisc.edu/adsl/Publications/eurosys20-scl.pdf)

## Problem

* Lock usage patterns determine which thread runs, thereby subverting CPU scheduling

## SCLs

Scheduler-Cooperative Locks (SCLs) work by:

* tracking lock usage
* penalizing threads that use locks excessively
* guaranteeing exclusive lock access with lock slices

## Scheduler Subversion

### Example

* The priority of each thread is set to the default
* The graph shows that **insert threads are allocated significantly more CPU than the find threads** (nearly six times more), subverting the scheduling goals

![](https://i.imgur.com/DjJJE7b.png)

### Causes of Scheduler Subversion

* Critical sections are of significantly different lengths
    * When different threads acquire a lock, they may hold it for varied amounts of time
    * We call this property *different critical-section lengths*
* Time spent in critical sections is high and there is significant contention for locks
    * We call this *majority locked run time*

## Lock Opportunity

A new metric to measure lock usage

### Inability to control CPU allocation

1. T0's critical section is very long (10 seconds)
2. T1's critical section is very short (1 second)

Goal: **give equal shares of the CPU** to each thread
Result: **the thread with the long critical section (T0) obtains much more CPU**

This simple example shows the inability of existing locks to control CPU allocation

![](https://i.imgur.com/umPjTpr.png)

## Scheduler-Cooperative Locks

SCL design is guided by four high-level goals:

1. Controlled lock usage allocation
    * Guarantee a specified amount of lock opportunity to competing entities irrespective of their lock usage patterns
2. High lock-acquisition fairness
    * Provide lock-acquisition fairness
3. Minimum overhead and scalable performance
    * Minimize locking overhead to provide high performance
4. Easy to port to existing applications
    * So that SCLs can be widely used

SCL is comprised of three components:

1. Lock usage tracking
    * Each lock must track its usage across each entity
2. Penalizing threads depending on lock usage
    * SCLs force threads that have exhausted their lock usage quota to sleep when they prematurely try to reacquire a lock
3. Dedicated lock opportunity using lock slices
    * Avoids excessive locking overhead
    * Mitigates the cost of frequent lock-owner transfers and the related accounting

### u-SCL Implementation

If the lock slice has not expired, the slice owner can acquire the lock as many times as needed (Step 6)

![](https://i.imgur.com/kDOeME4.png)
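To make the lock-slice fast path concrete, here is a minimal C sketch of the behavior described above. This is not the paper's code: `scl_t`, `scl_lock`/`scl_unlock`, and the 2 ms slice length are illustrative assumptions, and the quota accounting and penalty (banning) logic are only indicated in comments.

```c
/*
 * Minimal sketch of the u-SCL fast path, NOT the paper's implementation.
 * Assumptions: scl_t, scl_lock/scl_unlock, and SLICE_NS are illustrative
 * names; the real u-SCL also parks non-owners until the slice expires and
 * charges hold time against each thread's lock usage quota.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define SLICE_NS 2000000ULL                 /* assumed 2 ms lock slice       */

typedef struct {
    pthread_mutex_t  inner;                 /* protects the critical section */
    _Atomic uint64_t slice_owner;           /* thread id owning the slice    */
    _Atomic uint64_t slice_expiry;          /* absolute time (ns) slice ends */
} scl_t;

#define SCL_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

void scl_lock(scl_t *l) {
    uint64_t me = (uint64_t)pthread_self();

    /* Fast path (Step 6 in the figure): the slice owner may re-acquire the
     * lock as often as it likes until its slice expires.                    */
    if (atomic_load(&l->slice_owner) == me &&
        now_ns() < atomic_load(&l->slice_expiry)) {
        pthread_mutex_lock(&l->inner);
        return;
    }

    /* Slow path: in the real design, a thread that has exceeded its lock
     * usage quota would first sleep out its penalty before queueing here.   */
    pthread_mutex_lock(&l->inner);
    atomic_store(&l->slice_owner, me);
    atomic_store(&l->slice_expiry, now_ns() + SLICE_NS);
}

void scl_unlock(scl_t *l) {
    /* Hold-time accounting (charged against the owner's quota) goes here.   */
    pthread_mutex_unlock(&l->inner);
}
```

The sketch only shows the owner/expiry bookkeeping; the exclusivity of the slice and the proportional penalties come from the queueing and banning logic that is elided above.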
#### Optimization

The waiting thread that will next acquire the lock is allowed to start spinning; this mechanism improves performance by enabling a fast switch of lock ownership

#### Limitations

* Threads that sporadically acquire a lock continue to be counted in the total weight of threads for that lock

### k-SCL Implementation

* k-SCL is the kernel implementation of u-SCL
* The **lock slice size is set to zero** while accessing kernel services

### RW-SCL

![](https://i.imgur.com/rVCFo38.png)

## Evaluation

### Fairness and Performance

#### u-SCL

* Figure 5a shows the amount of time each of the two threads holds the lock (dark for the thread with a 1 μs critical section, light for 3 μs) along with the throughput
* Figure 5b summarizes these results with Jain's fairness metric for lock hold time
* In Figure 5c, the critical section is 1 μs for half of the threads and 3 μs for the other half (8 threads each)

![](https://i.imgur.com/20c6eQJ.png)

### Proportional Allocation

#### u-SCL

* (3:1) indicates that threads with shorter critical sections (darker color) should receive three times the CPU of threads with longer critical sections (lighter)
* (1:3) shows the inverse ratio

The number on top of each bar shows the lock usage fairness

![](https://i.imgur.com/kltFSZN.png)

### Lock Overhead

#### u-SCL

* Left (a): the number of threads and CPU cores is increased together, starting from 2
    * The mutex and u-SCL block while waiting to acquire a lock and hence perform better
* Right (b): the number of CPUs is fixed at two, but the number of threads is increased
    * With u-SCL and mutex, only two threads or one thread, respectively, are running, which significantly reduces scheduling overhead

![](https://i.imgur.com/hcqLYPj.png)

### Lock Slice Sizes vs. Performance

#### u-SCL

Throughput and latency:

![](https://i.imgur.com/ok712Z5.png)

##### Interactive jobs

The graph shows that for u-SCL, the lock slice size has a large impact on wait time

![](https://i.imgur.com/QPvsm1P.png)

In summary, to deliver low latency, the slice size in u-SCL should always be less than or equal to the smallest critical section size

### Real-world Workloads

In contrast, with mutexes, UpScaleDB allocated approximately only 2 CPU seconds to the find threads and 24 CPU seconds to the insert thread

![](https://i.imgur.com/VD98D72.png)

RW-SCL ensures that the writer thread obtains 10% of the lock opportunity compared to 90% for the readers

![](https://i.imgur.com/ZkDrGCY.png)

Varying the number of reader and writer threads:

![](https://i.imgur.com/IVAvMbf.png)

### k-SCL

A bully process that holds a lock for long periods can starve other processes

![](https://i.imgur.com/xi5pAXI.png)

For the mutex, the bully process dominates lock usage

![](https://i.imgur.com/e1ST17F.png)

## Conclusion

Evaluation shows that SCLs ensure lock usage fairness even with extreme lock usage patterns and scale well
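As a closing reference for the fairness numbers mentioned in the evaluation, Figure 5b uses Jain's fairness index over per-thread lock hold times. The formula is standard, J = (Σxᵢ)² / (n·Σxᵢ²), and the sketch below computes it; the function name and the example hold times are my own, not values from the paper.

```c
/*
 * Jain's fairness index over per-thread lock hold times:
 *   J = (sum x_i)^2 / (n * sum x_i^2)
 * J = 1 means perfectly fair; J = 1/n means one thread got everything.
 * The function name and example values are illustrative, not from the paper.
 */
#include <stddef.h>
#include <stdio.h>

double jain_fairness(const double *hold_time, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum    += hold_time[i];
        sum_sq += hold_time[i] * hold_time[i];
    }
    if (n == 0 || sum_sq == 0.0)
        return 1.0;                      /* no usage: treat as fair */
    return (sum * sum) / ((double)n * sum_sq);
}

int main(void)
{
    /* Hypothetical hold times (seconds): a fair and an unfair allocation. */
    double fair[]   = { 5.0, 5.0, 5.0, 5.0 };
    double unfair[] = { 24.0, 2.0, 2.0, 2.0 };
    printf("fair   = %.3f\n", jain_fairness(fair, 4));    /* -> 1.000  */
    printf("unfair = %.3f\n", jain_fairness(unfair, 4));  /* -> ~0.383 */
    return 0;
}
```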