eBPF (extended BPF) is traditionally used to safely and securely accelerate the network path in Linux with custom logic specified by userspace.
Notable changes from cBPF (classic BPF) to eBPF:

- registers widened from 32-bit to 64-bit, and grown from two (A, X) to ten general-purpose registers plus a read-only frame pointer
- an instruction set that maps closely to modern 64-bit ISAs, making JIT compilation efficient
- maps: key-value stores shared between BPF programs and userspace
- calls into kernel helper functions
- attach points far beyond sockets: kprobes, tracepoints, XDP, cgroups, and more
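For a quick taste of what the modern interface looks like (a minimal example of my own, not taken from any of the work discussed here), here is a do-nothing XDP program in libbpf style that hooks the network path and passes every packet through:

```c
// SPDX-License-Identifier: GPL-2.0
/* Minimal XDP program: runs at the earliest point of the RX path and
 * lets every packet through. Loadable with, e.g.,
 * `ip link set dev eth0 xdpgeneric obj xdp_pass.o sec xdp`
 * (the device name is illustrative). */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
	return XDP_PASS;  /* keep processing the packet normally */
}

char _license[] SEC("license") = "GPL";
```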
Without scheduler BPF, anyone wanting to do scheduler-related tuning without modifying the kernel typically had only one interface: the sched knobs under debugfs, /sys/kernel/debug/sched/*. Not to mention that these knobs allow only coarse-grained, system-wide tweaking.
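To make the coarseness concrete, here is a minimal sketch of that pre-BPF workflow: writing one global CFS knob through debugfs. The knob name assumes a kernel (>= 5.13) where these sysctls moved under /sys/kernel/debug/sched/, and the value is purely illustrative.

```c
/* Sketch: tune CFS by writing a debugfs knob. One global value for
 * every task on the system -- this is as fine-grained as the
 * interface gets. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* min_granularity_ns: minimum preemption granularity for CFS */
	int fd = open("/sys/kernel/debug/sched/min_granularity_ns", O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "3000000", 7) != 7)  /* 3 ms, illustrative value */
		perror("write");
	close(fd);
	return 0;
}
```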
Now, with the introduction of scheduler BPF, we can:

- decide on each scheduler tick whether the current task should be preempted (cfs_check_preempt_tick)
- decide whether a task waking up should preempt the currently running task (cfs_check_preempt_wakeup)
- decide whether one scheduling entity should preempt another at wakeup (cfs_wakeup_preempt_entity)

…to name a few.
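As a sketch of what such a hook could look like, modeled on the RFC patchset: the section name, arguments, and return-value convention below follow my reading of the RFC and may differ in later revisions.

```c
/* Sketch of a scheduler BPF hook, modeled on the RFC ("sched:
 * prototype of BPF hooks for CFS"); names and the return convention
 * are my reading of that patchset and may have changed since. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("sched/cfs_check_preempt_wakeup")
int BPF_PROG(cfs_check_preempt_wakeup,
	     struct task_struct *curr, struct task_struct *p)
{
	/* Example policy: let a waking task preempt only if it has a
	 * strictly higher priority (lower ->prio value) than current. */
	if (p->prio >= curr->prio)
		return -1;  /* negative: don't preempt */
	return 0;           /* zero: fall back to the default decision */
}

char _license[] SEC("license") = "GPL";
```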
As a side note, from the RFC's cover letter:
Our very first experiments with using BPF in CFS look very promising. We're at a very early stage, however already have seen a nice latency and ~1% RPS wins for our (Facebook's) main web workload.
ghOSt is a more aggressive scheduling-policy customization project from Google.
The priority of the following scheduling classes is in ascending order from left to right. ghOSt is inserted next to the CFS sched class, at lower priority.
What happens if ghOSt crashes, you might ask? That is totally fine: since the ghOSt scheduling class has lower priority than CFS, the kernel simply falls back to CFS to do the scheduling job.
However, ghOSt's in-kernel threads run at a real-time-like priority, which means that if something goes wrong with them, the system can still crash, or starvation can occur.
Note that the scheduling classes are no longer chained together by a linked list; instead, they are now laid out as an array. Related discussion.
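A simplified sketch of the idea, condensed from kernel/sched/sched.h (the boundary symbol names vary across kernel versions):

```c
/* Simplified sketch of the array layout. The linker places every
 * sched_class instance contiguously, ordered by priority, so
 * iteration is plain pointer arithmetic. */
struct rq;
struct task_struct;

struct sched_class {
	/* per-class callbacks (heavily abridged) */
	struct task_struct *(*pick_next_task)(struct rq *rq);
};

/* Section boundaries emitted by the linker script. */
extern struct sched_class __sched_class_highest[];
extern struct sched_class __sched_class_lowest[];

/* Walk the classes from highest to lowest priority; no more
 * chasing a ->next pointer as with the old linked list. */
#define for_each_class(class) \
	for (class = __sched_class_highest; \
	     class < __sched_class_lowest;  \
	     class++)
```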
The userspace agent also leverages ghOSt events (TIMER_TICK, THREAD_WAKEUP, THREAD_PREEMPT, etc.) sent from within the kernel to drive its policy decisions.
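To show how an agent might consume these events, here is a hypothetical sketch of its main loop. The event names mirror the ones above, but every type and helper below is invented for illustration; the real ghOSt userspace library is a C++ API and looks different.

```c
/* Hypothetical sketch of a ghOSt agent's event loop; all types and
 * helpers here are invented for illustration. */
#include <stdio.h>

enum ghost_msg_type { TIMER_TICK, THREAD_WAKEUP, THREAD_PREEMPT };

struct ghost_msg {
	enum ghost_msg_type type;
	int tid;                    /* thread the event refers to */
};

/* Stubs standing in for the kernel-filled message queue and for
 * decisions committed back to the kernel. */
static void dequeue_message(struct ghost_msg *m) { m->type = TIMER_TICK; m->tid = 0; }
static void mark_runnable(int tid) { printf("tid %d runnable\n", tid); }
static void commit_decision(void)  { printf("commit next thread\n"); }

int main(void)
{
	struct ghost_msg msg;

	for (int i = 0; i < 3; i++) {       /* bounded loop for the sketch */
		dequeue_message(&msg);
		switch (msg.type) {
		case THREAD_WAKEUP:
			mark_runnable(msg.tid); /* update policy state */
			break;
		case THREAD_PREEMPT:
			mark_runnable(msg.tid); /* the task lost its CPU */
			break;
		case TIMER_TICK:
			commit_decision();      /* pick and commit next thread */
			break;
		}
	}
	return 0;
}
```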
As a side note, from the ghOSt paper (SOSP '21):
we implement a ghOSt policy for machines running our production, massive-scale database, which serves billions of requests per day. For that workload, we show for both throughput and latency that ghOSt matches and often outperforms by 40-50% the kernel scheduling policy we use today.