# Hybrid Round-Robin(RR)/FIFO scheduler
contributed by < [`charliechiou`](https://github.com/charliechiou) > < [`EricccTaiwan`](https://github.com/EricccTaiwan) >
## Linux Environment
```shell
OS: Ubuntu 25.04 x86_64
Kernel: 6.14.0-16-generic
```
## Implementation
> [GitHub branch](https://github.com/cce-underdogs/scx/tree/otteryc_exp)
We are implementing a **hybrid Round-Robin(RR)/FIFO** scheduler based on `scx_rlfifo`, but we are encountering issues related to `slice_ns`.
Our scheduling policy is defined as follows:
* $NICE < 0 \to$ FIFO; these tasks are pinned to a single CPU
* $NICE \ge 0 \to$ RR; these tasks are allowed to switch CPUs
> $NICE = 0$ $\Leftrightarrow$ `task.weight` $= 100$
:::warning
Both the RR and FIFO policies use `dispatched_task.slice_ns = 10_000_000` (10 ms); see the `slice_ns` assignment in the code.
:::
:::spoiler Code
```rust=
fn dispatch_tasks(&mut self) {
    while let Ok(Some(task)) = self.bpf.dequeue_task() {
        let mut dispatched_task = DispatchedTask::new(&task);
        let t_weight = task.weight;
        if t_weight > 100 {
            // Nice < 0 => treat as FIFO
            // Keep the task on its current CPU (no migration)
            dispatched_task.cpu = task.cpu;
        } else {
            // Nice >= 0 => treat as RR
            let cpu = self.bpf.select_cpu(task.pid, task.cpu, task.flags);
            dispatched_task.cpu = if cpu >= 0 { cpu } else { task.cpu };
        }
        // Assign a 10 ms time slice to both policies
        dispatched_task.slice_ns = 10_000_000;
        self.bpf.dispatch_task(&dispatched_task).unwrap();
    }
    // Put the scheduler to sleep until another task needs to run.
    self.bpf.notify_complete(0);
}
```
:::
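
The branch inside `dispatch_tasks` can be isolated as pure functions, which makes the policy easy to check without loading the scheduler. This is only a minimal sketch: `Policy`, `policy_for_weight`, `target_cpu`, `NICE0_WEIGHT`, and `SLICE_NS` are hypothetical names introduced here for illustration and are not part of `scx_rlfifo`.

```rust
/// Scheduling class chosen for a task (illustrative, not an scx type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Policy {
    Fifo,
    RoundRobin,
}

/// Weight assumed to correspond to nice 0; same threshold as the dispatch loop.
const NICE0_WEIGHT: u64 = 100;
/// Time slice assigned to both policies: 10 ms, in nanoseconds.
const SLICE_NS: u64 = 10_000_000;

/// A weight above the nice-0 weight means nice < 0, which we treat as FIFO.
fn policy_for_weight(weight: u64) -> Policy {
    if weight > NICE0_WEIGHT {
        Policy::Fifo
    } else {
        Policy::RoundRobin
    }
}

/// FIFO tasks stay on `prev_cpu`. RR tasks take `selected_cpu` when it is
/// valid (>= 0) and fall back to `prev_cpu` otherwise.
fn target_cpu(policy: Policy, prev_cpu: i32, selected_cpu: i32) -> i32 {
    match policy {
        Policy::Fifo => prev_cpu,
        Policy::RoundRobin if selected_cpu >= 0 => selected_cpu,
        Policy::RoundRobin => prev_cpu,
    }
}

fn main() {
    for weight in [50u64, 100, 300] {
        let policy = policy_for_weight(weight);
        let cpu = target_cpu(policy, 2, 5);
        println!("weight={weight} policy={policy:?} cpu={cpu} slice_ns={SLICE_NS}");
    }
}
```

Keeping the decision separate from the BPF calls means the threshold and the CPU fallback can be unit-tested on their own.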
## Simulation
Test with `stress-ng`:
```shell
$ # terminal 1
$ sudo ./bin/scx_rlfifo
$ # terminal 2
$ sudo scxtop trace --trace-ms 20000 --output-file test_10ms
$ # terminal 3
$ sudo watch nice -n -5 stress-ng --cpu 1 --timeout 1s
```
### Exp 1
We can see that the Wall duration is 1 second, indicating that the command is functioning correctly. However, the **Average Wall duration is reported as 55.55 ms**, which is unexpected.

### Exp 2
The average Wall duration is 17 ms.

:::danger
Since we are only running a single `stress-ng` task on a single CPU, context switches rarely occur, allowing `stress-ng` to run continuously for a long time.
$\to$ To observe time slices under Round-Robin scheduling, we should run multiple tasks on the same CPU.
:::
## Question
- Is the issue caused by the testing procedure?
    - There is no context switch between the `stress-ng` tasks, so Perfetto may show consecutive runs as a single block, which makes the observed time slice larger than the one we assigned.
    - Perfetto may lose short-lived tasks.
    - We may have to run more tasks to see the behavior.
- We are wondering whether the task slice is working properly.
## Clarification
- :::warning
In Perfetto, the Avg Wall duration refers to the average time a process runs before being interrupted, rather than the average time slice.
:::
> got a :+1: from [name=Daniel Hodges]
- If a process runs for 10 consecutive time slices, each 10 ms long, without being interrupted, Perfetto will show a single Wall duration of 100 ms, instead of ten separate 10 ms durations.
> Perfetto traces `sched_switch()` events, but if the same task keeps running on the same CPU, you can't see the time that you assigned in the trace, because no `sched_switch()` event happened [name=Andrea Righi]
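
The merging effect described above can be reproduced with a small model. This is a sketch under our own assumptions, not Perfetto's actual implementation: `Slice`, `Run`, and `merge_slices` are hypothetical names, and the model only considers a single CPU.

```rust
/// One time slice granted by the scheduler on a single CPU.
#[derive(Debug, Clone, Copy)]
struct Slice {
    pid: u32,
    dur_ns: u64,
}

/// A contiguous run as seen by a tracer that only observes task switches.
#[derive(Debug, PartialEq, Eq)]
struct Run {
    pid: u32,
    wall_ns: u64,
}

/// Merge back-to-back slices of the same task into one run, because no
/// switch event separates them in the trace.
fn merge_slices(slices: &[Slice]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for s in slices {
        if let Some(last) = runs.last_mut() {
            if last.pid == s.pid {
                last.wall_ns += s.dur_ns;
                continue;
            }
        }
        runs.push(Run { pid: s.pid, wall_ns: s.dur_ns });
    }
    runs
}

fn main() {
    // Ten consecutive 10 ms slices of the same task.
    let slices = vec![Slice { pid: 1, dur_ns: 10_000_000 }; 10];
    let runs = merge_slices(&slices);
    println!("{runs:?}");
}
```

When two tasks alternate on the same CPU, every slice ends with a switch, so the merged runs match the assigned slice length again.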


## What's next
1. > try this to track the actual time slice used by each task: https://github.com/sched-ext/scx/tree/rustland-core-track-time-slice, start the scheduler as normal and look at the trace in `$ cat /sys/kernel/debug/tracing/trace_pipe` [name=Andrea Righi]
2. When using RR scheduling on a single CPU with two `stress-ng` tasks running concurrently, and assigning a 10 ms time slice, I expect each task's runtime observed between sched_switch events to be less than or equal to 10 ms.
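
The expectation in item 2 can be stated as a toy simulation. This is an idealized sketch that ignores scheduler overhead, preemption by other classes, and tasks that sleep; `simulate_rr` is a hypothetical helper, not part of `scx` or `stress-ng`.

```rust
use std::collections::VecDeque;

/// Simulate round-robin on one CPU. `remaining[i]` is the CPU time task `i`
/// still needs. Returns one `(task, run_ns)` entry per granted slice. While
/// at least two tasks are runnable, consecutive entries belong to different
/// tasks, so each entry corresponds to one interval between switches.
fn simulate_rr(mut remaining: Vec<u64>, slice_ns: u64) -> Vec<(usize, u64)> {
    assert!(slice_ns > 0, "slice must be positive");
    let mut queue: VecDeque<usize> = (0..remaining.len())
        .filter(|&i| remaining[i] > 0)
        .collect();
    let mut runs = Vec::new();
    while let Some(task) = queue.pop_front() {
        // A task runs until its slice expires or it finishes, whichever is first.
        let ran = remaining[task].min(slice_ns);
        remaining[task] -= ran;
        runs.push((task, ran));
        if remaining[task] > 0 {
            queue.push_back(task); // back of the queue: round-robin
        }
    }
    runs
}

fn main() {
    let slice_ns = 10_000_000; // 10 ms
    // Two CPU-bound tasks that each need 25 ms of CPU time.
    let runs = simulate_rr(vec![25_000_000, 25_000_000], slice_ns);
    for (task, ran) in &runs {
        println!("task {task} ran {ran} ns");
    }
    assert!(runs.iter().all(|&(_, ran)| ran <= slice_ns));
}
```

If a real trace of two competing tasks shows runs longer than the slice, that would point at the scheduler rather than at the measurement.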
## PR (maybe)
1. I think it would be more intuitive if the duration were divided by time slices; however, the Wall duration calculation still stays the same.
2. `scxtop` can’t trace very short-lived tasks (as someone mentioned earlier, though I forgot who 😅).
3. Add $NICE$ to the task struct?
## 0513
So, we have tested with the default scheduler `scx_simple -f` and `scx_rlfifo` several times, with three `stress-ng` tasks running on the same CPU.
* In `scx_simple -f`, the time slice behaves as expected.

* In `scx_rlfifo`, however, it varies.

We have not been able to identify the cause yet.
## Test with bpf_printk
Remember to enable SSH to start automatically at boot first; otherwise the server will be unreachable after a reboot.
```shell
$ sudo systemctl enable ssh # start automatically at boot
$ sudo systemctl status ssh # check the service status
$ ssh localhost # verify that we can connect to ourselves
```
As an example, isolate **CPU#13** to make testing and observation easier:
```diff
$ sudo vim /etc/default/grub
$ # add the line from the diff below
+ GRUB_CMDLINE_LINUX_DEFAULT="isolcpus=13 nohz_full=13 rcu_nocbs=13"
$ sudo update-grub
$ sudo reboot
```
~~Crude remarks~~
Use [cpusets](https://docs.kernel.org/admin-guide/cgroup-v1/cpusets.html) instead.