linux2026-introspect

# Linux SLUB 配置器的 Per-CPU Freelist 設計與多核效能分析 contributed by <`davidzwei`> > 課程 quiz3 的兩道題目提供了切入點。calculate_order() 裡的 roundup_pow_of_two() 是 SLUB 決定 slab 頁面數的核心計算； > 想要研究： SLUB 如何在 allocation 速度、記憶體碎片率、以及多核競爭之間取得平衡？ ## 先備知識 https://hackmd.io/@sysprog/linux-memory https://hackmd.io/@sysprog/linux-slab https://fosdem.org/2026/schedule/event/DPXVYJ-slub-sheaves-update/ [mm/slub.c](https://github.com/torvalds/linux/blob/master/mm/slub.c)  ## 相關研究整理 ### cmpxchg 路徑的原始碼追蹤 - `mm/slub.c → slab_alloc_node() → __slab_alloc() → ___slab_alloc()` - `this_cpu_cmpxchg_double` 在哪裡，保護什麼 ### SLUB Fast Path 與 Slow Path 機制分析 Linux Kernel v6.17 `void *__slab_alloc_node` 會決定 slowpath, fastpath - [ ] fastpath: - fastpath 發生條件 - this_cpu_cmpxchg_double - tid 的用途 - [ ] slowpath: 當 fastpath 條件不滿足，會走向 slowpath ```c if (!USE_LOCKLESS_FAST_PATH() || unlikely(!object || !slab || !node_match(slab, node))) { object = __slab_alloc(s, gfpflags, node, addr, c, orig_size); } ``` - 發生 slow path 條件: - 與數學模型的對應: - 透過實驗檢查是否符合 ### lab, quiz 延伸問題 - `is_power_of_2` 在 SLUB order 計算的角色 `mm/slub.c` → `calculate_sizes()` → `get_order()` - 連結 quiz3 Problem F - 1. 原始碼追蹤：`calculate_sizes() → get_order() → roundup_pow_of_two()` - 2. 觀察：/sys/kernel/slab/*/order 的實際數值，印證碎片率分析 - N 核 coherence traffic 數學模型 - 直接引用 warmup spinlock 教材的公式框架，套用到 freelist ### SLUB maintainer Vlastimil Babka 在 FOSDEM 2026 提出了「SLUB percpu sheaves」的新設計動機正是現有 SLUB 的 per-CPU freelist 在某些情境下不夠有效率，目標是改善 bulk allocation 和 cmpxchg 操作的效能。 - 概念 - 原始碼追蹤 - trylock 用途 ## 開發環境 ```sh $ gcc --version gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: 12th Gen Intel(R) Core(TM) i5-12400 CPU family: 6 Model: 151 Thread(s) per core: 2 Core(s) per socket: 6 Socket(s): 1 Stepping: 2 CPU(s) scaling MHz: 11% CPU max MHz: 5600.0000 CPU min MHz: 800.0000 BogoMIPS: 4992.00 $ uname -r 6.17.0-20-generic ``` tools ```sh $ sudo bpftool prog list 2>&1 | head -3 58: cgroup_device name s_snapd_desktop tag 03b4eaae2f14641a gpl loaded_at 2026-04-18T20:16:18+0800 uid 1000 xlated 296B jited 164B memlock 4096B map_ids 1 $ sudo perf stat ls 2>&1 | tail -5 0.000000000 seconds user 0.002412000 seconds sys ``` ## 實驗  - 改變三個變數，觀察 SLUB fast/slow path 的比例如何變化： * 變數一：object size → 決定走哪個 size class * 變數二：lifetime → 決定 freelist 的回收頻率 * 變數三：thread 數 → 決定 per-CPU 的競爭程度 * 方法 * eBPF kprobe `___slab_alloc` 紀錄 slow path 比例變化 - 升級到 kernel 6.18，啟用 sheaves 用同樣的 workload 重跑 - 比較差異 vs kernel 6.17 ## 研究問題本研究要回答三個問題： - Q1. per-CPU freelist 在幾個執行緒之後 slow path 比例開始非線性上升？ - object size × lifetime × thread 數 - 觀察與記錄問題   - Q2. kernel 6.18 的 sheaves 機制在相同 workload 下， slow path 比例改善了多少？ - 觀察與紀錄問題 ## References - [2026 fosdem](https://fosdem.org/2026/schedule/event/DPXVYJ-slub-sheaves-update/) - [mm/slub.c](https://github.com/torvalds/linux/blob/master/mm/slub.c)