Compare Linux CFS with FreeBSD ULE #48

tags: `linux2021 jserv 1-on-1`

Paper
ULE 介紹

TODO

讀完 ULE 介紹跟看完影片，我發現論文的細節跟 ULE 介紹的有一點出入

差異性

從論文中感受到兩者實作之 starvation 問題。

CFS 強調 fairness, 而 ULE 把 thread 分類後，如果你不幸不是在 interactive queue 的話，就有機會真的被 starve 死，這就是根本的設計精神的差別了。

LOC
- CFS: ~17900 lines
- ULE: ~2950 lines
load vs interactivity penalty metric
- CFS: load == average CPU utilization of a thread, weighted by thread's priority
- ULE: interactivity penalty metric
  - $[0, 100]$
  - kept for the last 5 seconds
  - $< 30$ is considered interactive
  - Negative nice value -> easier for the thread to be considered interactive
  - r: time spent running
  - s: time spent sleeping voluntarily
  - $p e n a l t y (r, s) = {\begin{cases} \frac{m}{\frac{s}{r}} & s > r \\ \frac{m}{\frac{r}{s} + m} & otherwise \end{cases}$
Load balancing
- both will run when a core is idle
- interval
  - CFS: 4ms
  - ULE: varies between 0.5s and 1.5s
- topology
  - CFS: respect
  - ULE: disregard
- group
  - CFS: cgroup
  - ULE: N/A
- stealing
  - CFS: max 32 threads per core per run
  - ULE: 1 thread per core per run
- the core in charge
  - CFS: …
  - ULE: always core 0
- strategy
  - CFS: even out average amount of work (load)
    - load of the core = total of runnable threads on the core
  - ULE: even out number of threads between cores
- result (from the experience in pg. 92)
  - CFS: balances the loads faster, never achieve perfect load balance
    - only balance loads between NUMA nodes when the diff in load is greater than 25%
  - ULE: better result in the long run
- fork
  - CFS: ??
  - ULE: placed on the core with lowest number of threads
runqueue
- Concept
  - CFS / ULE: per-core runqueue
- Data structure:
  - CFS: RB-tree with vruntime
    - pick next: the one with lowest vruntime
  - ULE: FIFO
    - pick next: pick from interactive first. If empty, then pick from batch.
    - 3 runqueues
      - interactive
        
        1 FIFO per priority
      - batch
        
        1 FIFO per priority
        
        the more the thread run, the lower the priority is
      - idle (only used when a core is idle)
    - the main idea is the give priority to interactive threads
starvation
- CFS: avoids it by scheduling all threads within a time period
  - $\leq 8$ threads -> default to 48 ms
  - $> 8$ threads -> 6 * number of threads ms
    - 6: avoid preemption
- ULE: might starve threads that it deems to be non-interactive for unbounded amount of time
  - In the batch runqueue, starvation was attempted to be avoided by keep the difference between threads low
timeslice
- CFS: same timeslice across all
- ULE: timeslice is dynamically adjusted
  - 1 core 1 thread: 10 ticks (78 ms)
  - lowerbound of 1 tick (1/127s = 7 ms)
preemption
- CFS: enabled
- ULE: disabled
  - only kernel thread can preempt others
wake-up of thread
- CFS: uses heuristics to improve cache usage
  - core to place the thread
    - factor: frequency of the thread the initiated the wakeup
      - 1-to-many producer-consumer pattern: spread out
      - 1-to-1 communication pattern: put on cores that shares caches
  - checks the differnece between vruntime of itself and of the currently running thread
    - if
      $< 1$ ms: no preemption of the currently running thread
      - sacriface latency to avoid frequent thread preemption
    - else: preemption
- ULE: affinity heuristic
  - if the thread is cache affine on the last core it ran on -> don't migrate
  - if failed, find the highest level in the topology that is considered affine, or the entire machine if none is available. From there, ULE first tries to find a core on which the minimum priority is higher than that of this thread.
  - if failed, pick the core with the lowest number of running threads on the machine
Miscellaneous
- CFS
  - fairness between thread -> fairness between cgroup
    - cgroup
      - supports nesting
      - vruntime = sum of all threads within it
      - when a cgroup is chosen for execution, the thread with the lowest vruntime is picked
  - ensures that vruntime of the min and max is less than 6ms
    - thread creation: vruntime = max vruntime
    - thread wakeup:
      $\geq$ min vruntime waiting to be scheduled
      - minimize latency for interactive applications

Compare Linux CFS with FreeBSD ULE #48

tags: linux2021 jserv 1-on-1

TODO

差異性

Read more

LKML 提交 patch

Infer

Data

Build Linux Kernel Image for Raspberry Pi from Source [Ubuntu Server]

tags: `linux2021 jserv 1-on-1`