
Scheduler BPF

BPF (Berkeley Packet Filter)

eBPF (extended BPF) is traditionally used to safely and securely accelerate the network path in Linux with custom logic specified by userspace.

Notable changes from cBPF (classic BPF) to eBPF:

  • 32-bit registers -> 64-bit registers
  • 2 general-purpose registers -> 10 general-purpose registers + 1 frame pointer register
  • introduction of a JIT compiler
  • upgraded instruction set that remains backward compatible with cBPF
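To make the register and instruction-set changes concrete, here is a minimal sketch that mirrors the two instruction encodings in plain C. The struct layouts follow `struct sock_filter` (cBPF) and `struct bpf_insn` (eBPF) from the kernel UAPI headers; the local type names and the two helper functions are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the classic BPF instruction (struct sock_filter). */
struct cbpf_insn {
    uint16_t code; /* opcode */
    uint8_t  jt;   /* jump offset if the condition is true */
    uint8_t  jf;   /* jump offset if the condition is false */
    uint32_t k;    /* generic 32-bit operand */
};

/* Local mirror of the eBPF instruction (struct bpf_insn). */
struct ebpf_insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register, R0..R10 */
    uint8_t src_reg : 4; /* source register, R0..R10 */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed 32-bit immediate */
};

/* Both encodings are 8 bytes wide on common compilers. */
_Static_assert(sizeof(struct cbpf_insn) == 8, "cBPF instruction is 8 bytes");
_Static_assert(sizeof(struct ebpf_insn) == 8, "eBPF instruction is 8 bytes");

enum { EBPF_GP_REGS = 10, EBPF_FP_REG = 10 };

/* Hypothetical helper: R0..R9 are general purpose, R10 is the frame pointer. */
int ebpf_reg_valid(unsigned reg)
{
    return reg <= EBPF_FP_REG;
}

/* Hypothetical helper: the frame pointer R10 is read-only. */
int ebpf_reg_writable(unsigned reg)
{
    return reg < EBPF_GP_REGS;
}
```

The 4-bit register fields are what leave room for the larger register file while keeping the instruction at 8 bytes.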

Scheduler BPF

Without scheduler BPF, anyone who wants to tune the scheduler without modifying the kernel typically has only one interface: the sched debugfs directory, /sys/kernel/debug/sched/*.

On top of that, these knobs only allow coarse-grained tweaking.
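For reference, tuning through debugfs boils down to reading and writing single-value files. The sketch below shows that pattern. The function names are made up for this note, and which knobs exist under /sys/kernel/debug/sched/ depends on the kernel version (for example, some kernels expose `min_granularity_ns`), so treat any concrete file name as an assumption. Accessing the real files needs root and a mounted debugfs.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one unsigned integer from a single-value file such as a sched
 * debugfs knob. Returns 0 on success, -1 if the file cannot be opened
 * or does not start with a number. */
int read_knob(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%llu", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Writes one unsigned integer to a single-value file.
 * Returns 0 on success, -1 on any I/O error. */
int write_knob(const char *path, unsigned long long value)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%llu\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Each knob is one global number, which is why this interface cannot express per-workload decisions.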

Now, with the introduction of scheduler BPF, we can:

  • do fine-grained tweaking of the scheduling policy, e.g. benefit a specific workload by letting it preempt other, lower-priority workloads.
  • quickly (and safely) experiment with different policies in production, without having to shut down applications or reboot systems, to determine what the policies for different workloads should be.
  • limit the consequence of a faulty policy to temporarily degraded system throughput, instead of a crash of the entire machine.

…to name a few.
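The idea of a pluggable policy can be sketched in plain userspace C. This is a simulation, not a real BPF program: the hook type, the verdict convention (positive forces preemption, negative forbids it, zero defers to the default), the task fields and the favoured PID are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct task {
    int pid;
    int prio; /* lower value = more important (sketch convention) */
};

/* Hypothetical hook verdicts. */
enum verdict { HOOK_NO_PREEMPT = -1, HOOK_DEFAULT = 0, HOOK_PREEMPT = 1 };

typedef int (*preempt_hook_t)(const struct task *curr,
                              const struct task *woken);

/* Stand-in for the scheduler's built-in decision. */
int default_should_preempt(const struct task *curr, const struct task *woken)
{
    return woken->prio < curr->prio;
}

/* Asks the hook first; falls back to the default when no hook is
 * attached or the hook has no opinion. */
int should_preempt(preempt_hook_t hook, const struct task *curr,
                   const struct task *woken)
{
    assert(curr != NULL && woken != NULL);
    if (hook) {
        int v = hook(curr, woken);
        if (v > 0)
            return 1;
        if (v < 0)
            return 0;
    }
    return default_should_preempt(curr, woken);
}

enum { FAVOURED_PID = 42 }; /* hypothetical workload to favour */

/* Example policy: the favoured task preempts on wakeup and is not
 * preempted by other wakeups; everything else keeps default behaviour. */
int favour_one_workload(const struct task *curr, const struct task *woken)
{
    if (woken->pid == FAVOURED_PID)
        return HOOK_PREEMPT;
    if (curr->pid == FAVOURED_PID)
        return HOOK_NO_PREEMPT;
    return HOOK_DEFAULT;
}
```

Detaching the hook (passing `NULL` here) restores the default decision immediately, which is the property that makes experimenting in production tolerable.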

As a side note:

Our very first experiments with using BPF in CFS look very promising. We're at a very early stage, however already have seen a nice latency and ~1% RPS wins for our (Facebook's) main web workload.

ghOSt (GitHub: userspace, kernelspace)

A more aggressive scheduling policy customization project made by Google.

Adding a scheduling class: ghOSt

The priority of the following scheduling classes is in ascending order from left to right.

ghOSt is inserted next to the CFS sched class, with a lower priority than CFS.

[Image: scheduling classes in priority order]

(from LWN: Fixing SCHED_IDLE)

You might ask: what if ghOSt crashes? That is fine. Since the ghOSt scheduling class has a lower priority than CFS, the kernel simply falls back to CFS to do the scheduling job.

However, the in-kernel threads of ghOSt run with a priority similar to that of real-time scheduling, which means that if something goes wrong with them, the system can still crash or suffer starvation.

Note that the scheduling classes are no longer chained in a linked list. Instead, they are now laid out as an array. Related discussion.
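The array layout and the fallback to CFS can be illustrated with a small userspace model. Everything here is a sketch under assumptions: the struct, the index order (index 0 is the highest priority, with ghOSt just below CFS as described above) and the function names are made up for this note and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct sched_class_sketch {
    const char *name;
    int nr_runnable; /* runnable tasks currently owned by this class */
};

/* Array order encodes priority: a lower index means a higher priority. */
enum class_idx {
    CLASS_STOP, CLASS_DEADLINE, CLASS_RT,
    CLASS_CFS, CLASS_GHOST, CLASS_IDLE,
    NR_CLASSES
};

/* With an array, comparing priorities is a pointer comparison.
 * Both pointers must point into the same array. */
int class_above(const struct sched_class_sketch *a,
                const struct sched_class_sketch *b)
{
    return a < b;
}

/* Walks the classes from highest to lowest priority and returns the
 * first one that has something to run, or NULL if none does. */
const struct sched_class_sketch *
pick_class(const struct sched_class_sketch *classes, size_t n)
{
    assert(classes != NULL);
    for (size_t i = 0; i < n; i++)
        if (classes[i].nr_runnable > 0)
            return &classes[i];
    return NULL;
}

/* Models the failure path: tasks managed by ghOSt are handed to CFS. */
void ghost_fall_back_to_cfs(struct sched_class_sketch *classes)
{
    assert(classes != NULL);
    classes[CLASS_CFS].nr_runnable += classes[CLASS_GHOST].nr_runnable;
    classes[CLASS_GHOST].nr_runnable = 0;
}
```

In this model a linked list would need a walk to answer "is class A above class B?", while the array answers it with a single comparison.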

Architecture

[Image: ghOSt architecture]

(from the paper)

The userspace agent also uses ghOSt events (TIMER_TICK, THREAD_WAKEUP, THREAD_PREEMPTED, etc.) sent from the kernel to drive its policy decisions.
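A rough picture of how an agent might consume such events is sketched below. This is not the ghOSt API: the enum, the message struct, the fixed-size FIFO run queue and the function names are all hypothetical and only model the event-driven shape described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local names for three of the event kinds. */
enum ghost_event { EV_TIMER_TICK, EV_THREAD_WAKEUP, EV_THREAD_PREEMPT };

struct message {
    enum ghost_event type;
    int tid; /* thread the event refers to (unused for ticks) */
};

#define RQ_CAP 16

struct agent {
    int rq[RQ_CAP];  /* circular FIFO of runnable thread ids */
    size_t head, len;
    unsigned ticks;  /* number of timer ticks seen so far */
};

static int rq_push(struct agent *a, int tid)
{
    if (a->len == RQ_CAP)
        return -1; /* queue full */
    a->rq[(a->head + a->len) % RQ_CAP] = tid;
    a->len++;
    return 0;
}

/* Updates the agent's view of the system from one kernel message.
 * Returns 0 on success, -1 if the message could not be handled. */
int agent_handle(struct agent *a, struct message m)
{
    assert(a != NULL);
    switch (m.type) {
    case EV_THREAD_WAKEUP:  /* thread became runnable */
    case EV_THREAD_PREEMPT: /* thread lost its CPU but is still runnable */
        return rq_push(a, m.tid);
    case EV_TIMER_TICK:
        a->ticks++;
        return 0;
    }
    return -1;
}

/* FIFO policy: returns the next thread id to run, or -1 if none. */
int agent_pick_next(struct agent *a)
{
    assert(a != NULL);
    if (a->len == 0)
        return -1;
    int tid = a->rq[a->head];
    a->head = (a->head + 1) % RQ_CAP;
    a->len--;
    return tid;
}
```

Swapping `agent_pick_next` for a different selection rule changes the policy without touching the event handling.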

As a side note:

  • BPF is also employed by ghOSt to speed up some of the hot paths inside the kernel.
  • we implement a ghOSt policy for machines running our production, massive-scale database, which serves billions of requests per day. For that workload, we show for both throughput and latency that ghOSt matches and often outperforms by 40-50% the kernel scheduling policy we use today.

DEMO

Refs