# Avoiding Scheduler Subversion using Scheduler-Cooperative Locks
###### tags: ` scheduling `
###### paper origin: EuroSys '20
###### [paper](https://research.cs.wisc.edu/adsl/Publications/eurosys20-scl.pdf)
## Problem
* Lock usage patterns determine which thread runs, thereby subverting CPU scheduling
## SCLs
Scheduler-Cooperative Locks (SCLs) keep lock usage aligned with scheduling goals by:
* tracking lock usage
* penalizing threads that use locks excessively
* guaranteeing exclusive lock access with lock slices
## Scheduler Subversion
### Example
* The priority of each thread is set to the default
* The graph shows that **insert threads are allocated significantly more CPU than the find threads** (nearly six times more), leading to subversion of the scheduling goals

### Causes of Scheduler Subversion
* Critical sections are of significantly different lengths
    * When different threads acquire a lock, they may hold it for varied amounts of time
    * We call this property **different critical-section lengths**
* Time spent in critical sections is high and there is significant contention for locks
    * We call this **majority locked run time**
## Lock Opportunity
A new metric to measure lock usage: **lock opportunity** is the amount of time a thread holds a lock plus the time the lock is free and could therefore be acquired
### Inability to control CPU allocation
1. T0's critical section is very long (10 seconds)
2. T1's critical section is very short (1 second)

Goal: **give equal shares of the CPU to each thread**
Result: **the thread with the long critical section (T0) obtains much more CPU**
This simple example shows the inability of existing locks to control CPU allocation.
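A quick sanity check of the arithmetic (mine, not taken from the paper): even if a perfectly fair FIFO lock alternates acquisitions, each round takes $10 + 1 = 11$ seconds, so

$$
\text{CPU}_{T_0} \approx \frac{10}{11} \approx 91\%, \qquad \text{CPU}_{T_1} \approx \frac{1}{11} \approx 9\%
$$

far from the 50/50 goal, no matter how the CPU scheduler prioritizes $T_1$.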

## Scheduler-Cooperative Locks
SCL design is guided by four high-level goals:
1. Controlled lock usage allocation
* Guarantee a specified amount of lock opportunity to competing entities irrespective of their lock usage patterns
2. High lock-acquisition fairness
* Waiting entities should acquire the lock in a fair order
3. Minimum overhead and scalable performance
* Minimize the overhead of usage tracking and accounting so performance remains high and scalable
4. Easy to port to existing applications
* Keep the interface close to widely used locks (e.g., POSIX pthread mutexes) so existing applications can adopt SCLs with little change
SCLs comprise three components:
1. Lock usage tracking
    * Each lock tracks how much it is used by each entity
2. Penalizing threads depending on lock usage
* SCLs force threads that have used their lock usage quota to sleep when they prematurely try to reacquire a lock
3. Dedicated lock opportunity using lock slices
    * A lock slice gives one entity exclusive lock access for a window of time, avoiding excessive locking overhead
    * Lock slices mitigate the cost of frequent lock-owner transfers and the related accounting
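A minimal sketch of the state such a lock might keep, covering the three components above (my own illustration in C, assuming a fixed per-thread table; not the authors' code):

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64  /* sketch: fixed-size per-thread table */

/* Component 1: per-entity usage accounting, plus the penalty state
 * (component 2) derived from it. */
struct scl_account {
    uint64_t usage_ns;      /* accumulated critical-section time       */
    uint64_t banned_until;  /* earliest time this thread may reacquire */
};

struct scl_lock {
    atomic_flag held;                     /* the underlying lock          */
    int slice_owner;                      /* component 3: slice ownership */
    uint64_t slice_end;                   /* when the lock slice expires  */
    struct scl_account acct[MAX_THREADS];
};
```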
### u-SCL Implementation
If the lock slice has not expired, the slice owner can acquire and release the lock as many times as needed (Step 6 of the paper's acquire path)
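Continuing the sketch above, the acquire path might look roughly like this; `now_ns()` and `SLICE_NS` are my own illustrative helpers, and real u-SCL parks waiters on a queue instead of spinning:

```c
#include <time.h>

#define SLICE_NS (2 * 1000 * 1000)  /* illustrative 2 ms lock slice */

/* Monotonic clock helper for the sketch. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void scl_acquire(struct scl_lock *l, int tid) {
    /* Fast path (cf. Step 6): within an unexpired slice, the slice
     * owner may reacquire the lock as often as it likes. */
    if (l->slice_owner == tid && now_ns() < l->slice_end)
        goto take;

    /* Penalty: a thread that has exhausted its usage quota must wait
     * out its ban (set by release-side accounting, not shown). */
    while (now_ns() < l->acct[tid].banned_until)
        ;  /* sketch: real code sleeps rather than spins */

take:
    while (atomic_flag_test_and_set(&l->held))
        ;  /* sketch: real code parks on a wait queue */
    if (l->slice_owner != tid) {  /* a new slice begins */
        l->slice_owner = tid;
        l->slice_end = now_ns() + SLICE_NS;
    }
}
```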

#### Optimization
The waiting thread that will next acquire the lock is allowed to start spinning; this enables a fast ownership switch when the current slice ends
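A standalone sketch of that handoff (names are mine):

```c
#include <stdatomic.h>

/* The releasing owner publishes the next owner's id; only that one
 * thread busy-waits, so ownership transfers without paying a full
 * sleep/wakeup latency on the critical path. */
static _Atomic int next_owner = -1;

static void handoff(int next_tid) {
    atomic_store(&next_owner, next_tid);  /* wake-by-spin target */
}

static void wait_turn(int my_tid) {
    while (atomic_load(&next_owner) != my_tid)
        ;  /* only the designated next owner spins; others sleep */
}
```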
#### Limitations
* Threads that sporadically acquire a lock continue to be counted in the total weight of threads for that lock
### k-SCL Implementation
* k-SCL is the kernel implementation of u-SCL
* Sets the **lock slice size to zero** while a kernel service is being accessed, so lock ownership can transfer at every release
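With a zero-length slice, every unlock becomes a potential ownership transfer, so no single thread can camp on a kernel lock. A hypothetical setter (the name is mine, not an API from the paper):

```c
#include <stdint.h>

struct scl_lock;  /* as sketched earlier */

/* Hypothetical setter (name is mine, not the paper's API). */
void scl_set_slice_ns(struct scl_lock *lock, uint64_t slice_ns);

/* Disable slice batching while a kernel service is accessed, so
 * ownership can change hands at every release. */
void enter_kernel_service(struct scl_lock *lock) {
    scl_set_slice_ns(lock, 0);  /* zero-length slice = no batching */
}
```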
### RW-SCL
* RW-SCL is the reader-writer variant of SCL: lock opportunity is divided between the reader class and the writer class according to a configured ratio (e.g., 90% readers / 10% writer, as in the evaluation below)
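A minimal sketch of the idea (mine, assuming the reader:writer split is fixed at init; not the actual RW-SCL code):

```c
#include <pthread.h>
#include <stdint.h>

/* Lock opportunity is split between the reader class and the writer
 * class, e.g. reader_ns:writer_ns = 9:1 gives readers 90%. */
struct rwscl {
    pthread_rwlock_t rw;  /* underlying reader-writer lock     */
    uint64_t reader_ns;   /* reader-class share of each period */
    uint64_t writer_ns;   /* writer-class share of each period */
};

/* Nonzero while the reader class's portion of the current period is
 * active; readers wait out the writer portion and vice versa, which
 * bounds each class to its configured share of lock opportunity. */
static int reader_slice_active(const struct rwscl *l, uint64_t now_ns) {
    uint64_t period = l->reader_ns + l->writer_ns;
    return (now_ns % period) < l->reader_ns;
}
```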

## Evaluation
### Fairness and Performance
#### u-SCL
* Figure 5a shows the amount of time each of the two threads holds the lock (dark for the thread with a 1 μs critical section, light for the thread with 3 μs) and the throughput
* Figure 5b summarizes these results with Jain’s fairness metric for lock hold time
* In Figure 5c, half of the 16 threads have a 1 μs critical section and the other half have a 3 μs critical section

### Proportional Allocation
#### u-SCL
* (3:1) indicates that shorter critical section threads (darker color) should receive three times the CPU of longer critical section threads (lighter)
* (1:3) shows the inverse ratio
The number on top of each bar shows the lock usage fairness
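These ratios suggest per-thread weights; a hypothetical call pattern (names are mine, not u-SCL's published API):

```c
struct scl_lock;  /* as sketched earlier */

/* Hypothetical weight API (names are mine). */
void scl_set_weight(struct scl_lock *lock, int tid, int weight);

/* A 3:1 split: thread 0 should get three times the lock opportunity
 * of thread 1, so penalties scale inversely with weight. */
void configure_ratio(struct scl_lock *lock) {
    scl_set_weight(lock, /* tid = */ 0, /* weight = */ 3);
    scl_set_weight(lock, /* tid = */ 1, /* weight = */ 1);
}
```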

### Lock Overhead
#### u-SCL
* In the left graph (a), the number of threads and CPU cores is increased together, starting from 2
* The mutex and u-SCL block while waiting to acquire a lock and hence perform better
* In the right graph (b), the number of CPUs is fixed at two while the number of threads is increased, starting from 2
* With u-SCL and the mutex, only two threads or one thread, respectively, are runnable at a time, which significantly reduces CPU scheduling overhead

### Lock Slice Sizes vs. Performance
#### u-SCL
Throughput and latency

##### Interactive jobs
The graph shows that for u-SCL, the lock slice length has a large impact on wait-time

In summary, to deliver low latency, the slice size in u-SCL should always be less than or equal to the smallest critical section size
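A back-of-the-envelope bound (mine, not from the paper): with $n$ threads each holding a slice of length $s$ in turn, a newly arriving thread can wait up to roughly

$$
W_{\max} \approx (n - 1)\,s
$$

so keeping $s$ at or below the smallest critical-section size keeps interactive wait-times low, consistent with the summary above.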
### Real-world Workloads
In contrast, with mutexes, UpScaleDB allocated only about 2 CPU seconds to the find threads and 24 CPU seconds to the insert threads

RW-SCL ensures that the writer thread obtains 10% of the lock opportunity compared to 90% for the readers

Varying the number of reader and writer threads preserves this proportion

### k-SCL
A bully process that holds a lock for long periods may starve other processes

For the mutex, the bully process dominates lock usage, starving the victim; with k-SCL, lock usage is divided fairly between the two

## Conclusion
Evaluation shows that SCLs ensure lock usage fairness even with extreme lock usage patterns and scale well.