---
# System prepended metadata

title: linux2026-introspect
tags: [linux2026]

---

# Linux SLUB 配置器的 Per-CPU Freelist 設計與多核效能分析

contributed by <`davidzwei`>


> 課程 quiz3 的兩道題目提供了切入點。calculate_order() 裡的 roundup_pow_of_two() 是 SLUB 決定 slab 頁面數的核心計算；

> 想要研究：
SLUB 如何在 allocation 速度、記憶體碎片率、以及多核競爭之間取得平衡？


## 先備知識
https://hackmd.io/@sysprog/linux-memory
https://hackmd.io/@sysprog/linux-slab 
https://fosdem.org/2026/schedule/event/DPXVYJ-slub-sheaves-update/

[mm/slub.c](https://github.com/torvalds/linux/blob/master/mm/slub.c)


<!-- ## slub 歷史演化

- SLUB（2007）
- ...
- SLAB 移除（6.8）→ 只剩 SLUB
- 2026 SLUB percpu sheaves（6.18） -->


## 相關研究整理

### cmpxchg 路徑的原始碼追蹤
- `mm/slub.c → slab_alloc_node() → __slab_alloc() → ___slab_alloc()`
- `this_cpu_cmpxchg_double` 在哪裡，保護什麼


### SLUB Fast Path 與 Slow Path 機制分析

Linux Kernel v6.17

`void *__slab_alloc_node` 會決定 slowpath, fastpath

- [ ] fastpath:
- fastpath 發生條件
- this_cpu_cmpxchg_double
- tid 的用途

- [ ] slowpath:
當 fastpath 條件不滿足，會走向 slowpath

```c
if (!USE_LOCKLESS_FAST_PATH() ||
    unlikely(!object || !slab || !node_match(slab, node))) {
    object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
}
```

- 發生 slow path 條件:
- 與數學模型的對應:
    - 透過實驗檢查是否符合


### lab, quiz 延伸問題

- `is_power_of_2` 在 SLUB order 計算的角色
   `mm/slub.c` → `calculate_sizes()` → `get_order()`
   - 連結 quiz3 Problem F
   - 1. 原始碼追蹤：`calculate_sizes() → get_order() → roundup_pow_of_two()`
   - 2. 觀察：/sys/kernel/slab/*/order 的實際數值，印證碎片率分析

- N 核 coherence traffic 數學模型
   - 直接引用 warmup spinlock 教材的公式框架，套用到 freelist

### SLUB maintainer Vlastimil Babka 在 FOSDEM 2026 提出了「SLUB percpu sheaves」的新設計
動機正是現有 SLUB 的 per-CPU freelist 在某些情境下不夠有效率，目標是改善 bulk allocation 和 cmpxchg 操作的效能。
- 概念
- 原始碼追蹤
- trylock 用途


## 開發環境
```sh
$ gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ lscpu
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             39 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      12
  On-line CPU(s) list:       0-11
Vendor ID:                   GenuineIntel
  Model name:                12th Gen Intel(R) Core(TM) i5-12400
    CPU family:              6
    Model:                   151
    Thread(s) per core:      2
    Core(s) per socket:      6
    Socket(s):               1
    Stepping:                2
    CPU(s) scaling MHz:      11%
    CPU max MHz:             5600.0000
    CPU min MHz:             800.0000
    BogoMIPS:                4992.00


$ uname -r
6.17.0-20-generic

```


tools

```sh
$ sudo bpftool prog list 2>&1 | head -3
58: cgroup_device  name s_snapd_desktop  tag 03b4eaae2f14641a  gpl
	loaded_at 2026-04-18T20:16:18+0800  uid 1000
	xlated 296B  jited 164B  memlock 4096B  map_ids 1

$ sudo perf stat ls 2>&1 | tail -5

       0.000000000 seconds user
       0.002412000 seconds sys
```


## 實驗
<!-- - freelist 滿溢觸發 partial list 遷移時的 cmpxchg 路徑，與 spinlock 的對比
- 用 perf 量化 kmalloc 在不同 allocation size 下的 L1/L2 miss rate
- quiz3 Problem F（is_power_of_2、SLUB order 計算）、warmup 的 cache coherence spinlock -->
- 改變三個變數，觀察 SLUB fast/slow path 的比例如何變化：
    * 變數一：object size → 決定走哪個 size class
    * 變數二：lifetime → 決定 freelist 的回收頻率
    * 變數三：thread 數 → 決定 per-CPU 的競爭程度
    * 方法
        * eBPF kprobe `___slab_alloc` 紀錄 slow path 比例變化

- 升級到 kernel 6.18，啟用 sheaves 用同樣的 workload 重跑
    - 比較差異 vs kernel 6.17


## 研究問題

本研究要回答三個問題：

- Q1. per-CPU freelist 在幾個執行緒之後 slow path 比例開始
    非線性上升？
    - object size × lifetime × thread 數
    - 觀察與記錄問題 
<!--     （object size × thread 數的退化矩陣） -->
<!-- - Q2. object size 和 lifetime 如何影響 freelist 的有效性？ -->
    
- Q2. kernel 6.18 的 sheaves 機制在相同 workload 下，
    slow path 比例改善了多少？
    - 觀察與紀錄問題


## References
- [2026 fosdem](https://fosdem.org/2026/schedule/event/DPXVYJ-slub-sheaves-update/)
- [mm/slub.c](https://github.com/torvalds/linux/blob/master/mm/slub.c)