---
tags: LINUX KERNEL, LKI
---
# [Linux Kernel Design](https://beta.hackfoldr.org/linux/): `PREEMPT_RT` as a Mechanism Toward a Hard Real-Time Operating System
Copyright (**慣C**) 2016, 2019 [宅色夫](http://wiki.csie.ncku.edu.tw/User/jserv)
==[Live stream recording (part 1)](https://youtu.be/15-ZVimSHTs)==
==[Live stream recording (part 2)](https://youtu.be/xUU0vo9PHh0)==
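## Introduction
"Industry 4.0" reflects the demand for real-time systems. Major industrial automation vendors such as Siemens and KUKA have invested R&D resources in GNU/Linux-based robot designs, yet no single solution fully satisfies current needs, because GNU/Linux is highly complex and has to accommodate the diverse requirements of its development community.
As a result, amid rapid development, the goal of becoming a hard real-time operating system carries a heavy maintenance cost, and the prevailing solution in the mainline Linux kernel is PREEMPT_RT. This online lecture examines the design and implementation of PREEMPT_RT from the perspectives of scheduling, interrupt handling, high resolution timers (HRT), and designs specific to symmetric multiprocessing (SMP) architectures.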
## Digital Motors in One Minute
[Digital motors](https://www.youtube.com/watch?v=AyQ1YeUxXeg)
![](https://i.imgur.com/ZnnBEg8.png)
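Is a robotic arm expensive at 1.2 million? Actually, it is cheap:
* A robotic arm assembles 3 times faster than a human
* A human works 8 hours a day; a robotic arm can work 24 hours
* A human rests 2 days a week; a robotic arm works 7 days
* A human takes roughly 130 days off per year; a robotic arm works 365 days
Taken together, a robotic arm delivers about 12 times the output of a human
* Taking a typical worker's annual salary at a minimum of 300,000, the arm costs only 1/3 of the equivalent labor (12 workers × 300,000 = 3.6 million per year)
* 10 arms can stand in for over a hundred people and remove the management burden (a company with over a hundred employees also needs a labor union, a welfare committee, a nursery room, and so on)
So building an army of robots is the right call. Do not compete with robots: times are changing, and the way we work has to change with them.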
[[source](https://www.facebook.com/photo.php?fbid=10154122588853898&set=a.116227113897.99543.634428897)]
* Common robotic arm structures (P: prismatic joint; R: revolute joint):
    * Cartesian: PPP
    * Cylindrical: RPP
    * Spherical: RRP
    * Articulated: RRR
    * SCARA (Selective Compliant Articulated Robot for Assembly): RRP
* The hand (end effector) in turn has three axes: normal, sliding, and approach.
## Real-time Linux
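Real-time Linux is not limited to motor control; its range of applications is quite broad. As early as 1997, when Linux was first turned into a hard real-time system, one of the goals was to build a rocket control system (see [RTLinux](https://en.wikipedia.org/wiki/RTLinux); the researchers came from the New Mexico Institute of Mining and Technology [NMT]). Over the nearly 20 years since then, many different combinations have emerged.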
## POSIX
* [POSIX – 25 Years of Open Standard APIs](http://www.rtcmagazine.com/posix-25-years-of-open-standard-apis/)
![](https://i.imgur.com/cdkiqmb.jpg)
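* The chart illustrates PSE51, PSE52, PSE53, and PSE54, along with the IEEE 1003.13-2003 (POSIX.13) profiles for real-time applications
* In 1986, the IEEE appointed a committee to define a standard for open operating systems, named POSIX (Portable Operating System Interface); the trailing "X" reflects that it is essentially a standard for UNIX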
[API standards for Open Systems](http://www.opengroup.org/austin/papers/wp-apis.txt)
* IEEE POSIX Standards for Real-time
* The PASC Real-time System Services Working Group (SSWG-RT) has developed a series of standards that amend IEEE Std 1003.1-1990 and a profile standard (IEEE Std 1003.13-1998).
* The Real-time amendments to IEEE Std 1003.1-1990 are as follows:
* IEEE Std 1003.1b-1993 Realtime Extension
* IEEE Std 1003.1c-1995 Threads
* IEEE Std 1003.1d-1999 Additional Realtime Extensions
* IEEE Std 1003.1j-2000 Advanced Realtime Extensions
* IEEE Std 1003.1q-2000 Tracing
* Note that RTLinux from FSMLabs turns the 1003.13 hierarchy upside down, with the smaller PSE51/52 realtime threads in control, and the full-figured Linux system (similar in functionality to PSE54) as just another thread under control of the realtime threads.
* This is the opposite of what the PASC SSWG-RT had imagined when 1003.13-1998 was written, but it nonetheless works.
* [POSIX module on top of Xenomai](https://gitlab.denx.de/Xenomai/xenomai/blob/eol/v2.6.x/doc/txt/pse51-skin.txt)
## Preemptible Kernel
* [Linux kernel preemption](https://hackmd.io/@sysprog/linux-preempt)
* [Linux 5.3 has merged part of the PREEMPT_RT changes](https://lwn.net/ml/linux-kernel/20190715150402.798499167@linutronix.de/)
* [Making Linux do Hard Real-time](https://www.slideshare.net/jserv/realtime-linux)
* [SMP/Linux Real-time Analysis](http://wiki.csie.ncku.edu.tw/embedded/rt-linux-smp.pdf)
* Before explaining resolution, first make sure you understand spatial frequency
(Green: preemptible; red: non-preemptible)
**Non-Preemptive**
![](https://i.imgur.com/gNXyQoz.png)
[ CONFIG_PREEMPT_NONE ]
* Preemption is not allowed in Kernel Mode
* Preemption could happen upon returning to user space
**Preemption Points in Linux Kernel**
![](https://i.imgur.com/7591V5x.png)
[ CONFIG_PREEMPT ]
* Implicit preemption in Kernel
* preempt_count
* Member of thread_info
* Preemption could happen when preempt_count == 0
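The `preempt_count` rule above can be pictured with a small user-space model. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, and the code is not the kernel's `preempt_disable()` / `preempt_enable()` implementation. It only mimics the nesting counter and the condition that preemption may happen once the count is back to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; the real counter lives in the kernel. */
static int model_preempt_count;   /* nesting depth of non-preemptible regions */
static bool model_need_resched;   /* a higher-priority task became runnable */

/* Enter a non-preemptible region; calls may nest. */
static void model_preempt_disable(void)
{
    model_preempt_count++;
}

/* Preemption is allowed only while the counter is zero. */
static bool model_preemptible(void)
{
    return model_preempt_count == 0;
}

/* Record that a reschedule is wanted (for example, after a wakeup). */
static void model_set_need_resched(void)
{
    model_need_resched = true;
}

/*
 * Leave a non-preemptible region. Returns true when this call is the point
 * where a pending reschedule would take place, that is, when the counter
 * drops back to zero while a reschedule is pending.
 */
static bool model_preempt_enable(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
    if (model_preempt_count == 0 && model_need_resched) {
        model_need_resched = false;
        return true;
    }
    return false;
}
```

With two nested `model_preempt_disable()` calls, only the second `model_preempt_enable()` reports a reschedule point, which is why long non-preemptible regions translate directly into scheduling delay.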
**Fully Preemptive**
![](https://i.imgur.com/rotZoIu.png)
[ CONFIG_PREEMPT_RT_BASE ] / [ CONFIG_PREEMPT_RT_FULL ]
* Difference appears in the interrupt context
* Goal: Preempt Everywhere except
* Preempt disable
* Interrupt disable
* Reduce non-preemptible cases in kernel
* spin_lock
* Interrupt
[Interrupt handling and modern architecture considerations](https://hackmd.io/s/S1WKTCFM4)
![](https://i.imgur.com/oLbYnVd.png)
> Timeline of merged real-time features in the mainline Linux kernel, most of them coming from the PREEMPT_RT patch
[The real-time linux kernel: A survey on Preempt_RT](https://www.researchgate.net/publication/331290349_The_real-time_linux_kernel_A_survey_on_Preempt_RT)
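## The Original Problem
In the stock Linux kernel, the delay between the moment a real-time process is scheduled and the moment it actually starts running (called latency) is unpredictable. The kernel therefore cannot give any guarantee to tasks with hard real-time requirements, and so cannot meet real-time demands.
Scheduling a real-time process only means that it is runnable, not that it actually runs. Whether it runs depends on whether the currently executing task allows preemption; if it does not, the pending real-time process can still be delayed severely.
Consider the scenario in the figure below: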
![](https://i.imgur.com/zmg3vz4.png)
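Process A (a non-real-time process) enters a critical section protected by a spinlock. This means that no process switch can take place on that CPU until the running task leaves the critical section, although interrupts are not blocked. Suppose an interrupt fires at this point and its handler wakes up real-time process B. Even though B is a real-time process, it still cannot run, because the critical section forbids a process switch. After the interrupt handler finishes, B stays ready and has to wait until process A leaves the critical section before it can actually run. If the critical section is long, real-time process B is delayed severely.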
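The latency described above can be observed from user space. The following is a minimal sketch, not a benchmark: it sleeps until an absolute deadline and reports how late the wakeup was. The function name `measure_wakeup_latency_ns` and the helper `ts_to_ns` are assumptions made for this illustration; only `clock_gettime()` and `clock_nanosleep()` are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until an absolute deadline `delay_ns` from now (0 <= delay_ns < 1 s),
 * then return how late the wakeup was, in nanoseconds.
 * Returns -1 if the argument is out of range or a clock call fails.
 */
static int64_t measure_wakeup_latency_ns(long delay_ns)
{
    struct timespec target, now;
    int rc;

    if (delay_ns < 0 || delay_ns >= NSEC_PER_SEC)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &target) != 0)
        return -1;

    /* Build the absolute wakeup time and normalize the nanosecond field. */
    target.tv_nsec += delay_ns;
    if (target.tv_nsec >= NSEC_PER_SEC) {
        target.tv_nsec -= NSEC_PER_SEC;
        target.tv_sec++;
    }

    /* Absolute sleep; retry if a signal interrupts it. */
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &target, NULL);
    } while (rc == EINTR);
    if (rc != 0)
        return -1;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;
    return ts_to_ns(&now) - ts_to_ns(&target);
}
```

Calling `measure_wakeup_latency_ns(1000000L)` in a loop and recording the maximum gives a rough picture of worst-case wakeup latency. The maximum matters more than the average here, because a hard real-time task cares about the bound.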
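## The PREEMPT_RT Approach
How can the Linux kernel's hard real-time capability be strengthened?
PREEMPT_RT first converts interrupt handlers into kernel threads, which guarantees that a handler no longer borrows an arbitrary context: even if the handler is preempted by a real-time process, it does not drag down the innocent process whose kernel stack it would otherwise have occupied. Second, the spinlocks used by ordinary processes are replaced with sleepable mutexes (the so-called [sleeping spinlock](https://lwn.net/Articles/271817/)), so that a real-time process can run immediately after it is woken up.
Next, consider this question: does a spinlock implementation need to disable interrupts?
If an interrupt handler needs to take the spinlock, only local interrupts have to be disabled (local meaning the processor, in a multicore system, that is executing the code). Once local interrupts are disabled, an interrupt handler on the local processor can no longer touch the spinlock's flag, while other processors can still spin on the lock flag. This does not cause a deadlock, because the local processor always releases the lock eventually.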
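The reasoning above can be tried out in user space with a loose analogy, in which a signal handler plays the role of the interrupt handler and blocking signals plays the role of disabling local interrupts. This is a sketch under those assumptions, not kernel code: the `model_*` names, `on_sigusr1`, and `signal_masking_demo` are hypothetical, and the kernel counterpart would be a primitive such as `spin_lock_irqsave()`. Without the masking step, a handler that interrupts the lock holder would spin forever on a lock that its own thread holds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_flag model_lock = ATOMIC_FLAG_INIT; /* the flag other CPUs would spin on */
static volatile sig_atomic_t handled;             /* how many times the handler ran */

/* Block all signals first ("local interrupts off"), then spin for the lock. */
static void model_lock_irqsave(sigset_t *saved)
{
    sigset_t all;

    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, saved);
    while (atomic_flag_test_and_set_explicit(&model_lock, memory_order_acquire))
        ; /* spin */
}

/* Release the lock, then restore the saved signal mask. */
static void model_unlock_irqrestore(const sigset_t *saved)
{
    atomic_flag_clear_explicit(&model_lock, memory_order_release);
    sigprocmask(SIG_SETMASK, saved, NULL);
}

/* The "interrupt handler": it needs the same lock as the code it interrupts. */
static void on_sigusr1(int signo)
{
    sigset_t saved;

    (void)signo;
    model_lock_irqsave(&saved);
    handled++;
    model_unlock_irqrestore(&saved);
}

/*
 * Raise SIGUSR1 while the lock is held. The signal stays pending until the
 * mask is restored, so the handler never runs inside the critical section.
 * Stores the handler count seen inside the critical section in *seen_inside
 * and returns the count after unlocking, or -1 on failure.
 * Intended for a single-threaded program (sigprocmask is used).
 */
static int signal_masking_demo(int *seen_inside)
{
    struct sigaction sa;
    sigset_t saved;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    model_lock_irqsave(&saved);
    raise(SIGUSR1);              /* arrives "during" the critical section */
    *seen_inside = handled;      /* the handler has not run yet */
    model_unlock_irqrestore(&saved);
    return handled;              /* the handler ran once the mask was restored */
}
```

Only the calling thread's signals are blocked, just as only local interrupts are disabled: another thread (or processor) contending for `model_lock` would simply spin until the holder releases it.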
* Further reading: [宋寶華: Principles and Practice of the Linux Real-Time Patch](https://blog.csdn.net/juS3Ve/article/details/79788554)
## Reviewing Real-Time Sessions at Linux Foundation Conferences
- [ ] [Maintaining a Real Time Stable Kernel](https://elinux.org/images/1/14/Elc-rt-stable-2018.pdf) / [recording](https://youtu.be/pIJ3Zv_uxn0)
- [ ] [Not Really, But Kind of Real Time Linux](https://elinux.org/images/d/df/Kind_of_real_time_linux4.pdf) / [recording](https://youtu.be/S7vE9NpOTns)
- [ ] [Exploit the Advantages and Resolve the Challenges of Multicore Technology with Linux](http://wiki.csie.ncku.edu.tw/embedded/02-linux-mc.pdf)
- [ ] [Xenomai 3: An Overview of the Real-Time Framework for Linux - Jan Kiszka, Siemens AG](http://events.linuxfoundation.org/sites/events/files/slides/ELC-2016-Xenomai_0.pdf)
applications
- PLC, machine control system
- printing machines (manroland)
- network switches (Ruggedcom)
- magnetic resonance tomograph (Siemens Healthcare)
symbol wrapping
- `pthread_mutex_lock` => `__wrap_pthread_mutex_lock`
preserve Linux service for Cobalt service
- system call
- trap
RTDM
- character device
* UDD (analogous to UIO), memory
- protocol device
tool
- ipipe latency tracer
valgrind / Helgrind
- not supported because of unknown system call
SIGDEBUG (SIGXCPU): enable when an RT thread enters its time-critical phases
functional limit
changes to critical subsystems regularly cause [...]
"Dovetail"
- co-kernel extension
- sharing of CPU traps
- [ ] [SCHED_DEADLINE: A Status Update - Juri Lelli, ARM Ltd](http://events.linuxfoundation.org/sites/events/files/slides/SCHED_DEADLINE-20160404.pdf)
not only about deadlines
in mainline since Linux 3.14
real-time scheduling policy: higher priority than NORMAL and FIFO/RR
enables predictable task scheduling
- allows explicit per-task latency constraints
- avoids starvation (tasks cannot eat all available CPU time)
- enriches the scheduler's knowledge about QoS requirements
policies
- EDF (Earliest Deadline First)
- CBS (Constant Bandwidth Server)
resource (CPU) reservation mechanism
- Q time units (runtime) in every interval of length P (period)
EDF + CBS provides temporal isolation
load balancing + inheritance
works with PREEMPT_RT? orthogonal
QoS
under discussion
- bandwidth reclaiming
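The reservation idea in the SCHED_DEADLINE notes above (Q time units of runtime in every period P) can be made concrete with a small calculation. The sketch below is a simplified, single-CPU textbook check: a set of reservations fits when every Q is at most its P and the bandwidths Q/P sum to at most 1. The `struct reservation` type and both function names are hypothetical, and the kernel's own admission control is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of one reservation: Q (runtime) every P (period). */
struct reservation {
    uint64_t runtime_ns; /* Q */
    uint64_t period_ns;  /* P */
};

/* Fraction of one CPU that the reservation claims: Q / P. */
static double reservation_bandwidth(const struct reservation *r)
{
    return (double)r->runtime_ns / (double)r->period_ns;
}

/*
 * Simplified single-CPU admission test: accept the set only if every
 * reservation is well formed (0 < Q <= P) and the total bandwidth is <= 1.
 */
static bool admit_on_one_cpu(const struct reservation *set, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (set[i].runtime_ns == 0 || set[i].runtime_ns > set[i].period_ns)
            return false;
        total += reservation_bandwidth(&set[i]);
    }
    return total <= 1.0;
}
```

For example, reservations of 10 ms every 100 ms and 20 ms every 50 ms claim 0.1 + 0.4 = 0.5 of a CPU and are accepted; adding 60 ms every 100 ms raises the total to 1.1, so the set is rejected. Rejecting over-committed sets up front is what lets each task's budget be enforced without starving the others.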
- [ ] [Understanding a Real-Time System - Steven Rostedt, Red Hat](http://events.linuxfoundation.org/sites/events/files/slides/elc-understanding-rt-system-2016.pdf)
real fast vs. real time
- hot cache: look ahead features
- paging: TLB
- least interruptions
- optimize the most likely case: transactional memory
branch prediction
- deadline_test
NUMA
- memory speeds dependent on CPU
- need to organize the tasks
Hyper-threading
- recommended to disable on RT
system management interrupt (SMI)
- put processor into system management mode (SMM)
RT kernel
- threaded interrupt
- system management threads
- high resolution timer
threaded interrupt
- don't poll network task (higher prio) on network interrupt
software interrupt
- network irq will run network softirqs
except for softirqs raised by real hard interrupt
- RCU
- timer
- run in ksoftirqd
timer
- setitimer(): requires ksoftirqd to run (on PREEMPT_RT)
- timer_create() / timer_settime()
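As an illustration of the `timer_create()` / `timer_settime()` interface noted above, here is a minimal sketch that arms a one-shot POSIX timer and waits for its expiry signal synchronously. The function name `wait_for_one_shot_timer` and the choice of `SIGRTMIN` are assumptions made for this example. On older glibc versions the program may need to be linked with `-lrt`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/*
 * Arm a one-shot CLOCK_MONOTONIC timer that expires after `delay_ns`
 * (0 < delay_ns < 1 s) and wait synchronously for its signal.
 * Returns 0 when the expiry signal was received, -1 on any failure.
 */
static int wait_for_one_shot_timer(long delay_ns)
{
    const int signo = SIGRTMIN;
    sigset_t set, old;
    struct sigevent sev;
    struct itimerspec its;
    timer_t timer;
    siginfo_t info;
    int result = -1;

    if (delay_ns <= 0 || delay_ns >= 1000000000L)
        return -1;

    /* Block the signal so that it can be collected with sigwaitinfo(). */
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
        return -1;

    memset(&sev, 0, sizeof sev);
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = signo;

    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) == 0) {
        memset(&its, 0, sizeof its);      /* it_interval stays 0: one-shot */
        its.it_value.tv_nsec = delay_ns;  /* relative expiry time */
        if (timer_settime(timer, 0, &its, NULL) == 0 &&
            sigwaitinfo(&set, &info) == signo &&
            info.si_code == SI_TIMER)
            result = 0;
        timer_delete(timer);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

Setting `it_interval` to a nonzero value would turn the same timer into a periodic one, which is the usual shape of a cyclic real-time task.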
NO_HZ
- when CPU is idle, turn off timers
- let CPU go into deep sleep
- great for power saving
NO_HZ_FULL
- works only if a single task runs on the CPU
priority inheritance locking
- prevent unbounded latency
- `pthread_mutexattr_setprotocol()`
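For the priority inheritance note above, a user-space mutex can request the inheritance protocol through its attribute object. The following is a minimal sketch; the helper name `init_pi_mutex` is an assumption made for this example, while the `pthread_mutexattr_*` calls and `PTHREAD_PRIO_INHERIT` are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/*
 * Initialize `m` as a mutex that uses the priority inheritance protocol.
 * Returns 0 on success or a pthread error number on failure.
 */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;

    /* Ask for priority inheritance instead of the default protocol. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex, a low-priority holder is temporarily boosted to the priority of the highest-priority waiter, which keeps a medium-priority task from extending the wait without bound.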
real-time vs. multi-process
- migration clears caches (memory and TLB)
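One common way to avoid the migration cost noted above is to pin a time-critical thread to a single CPU. The sketch below assumes Linux with glibc and uses the GNU extensions `sched_getcpu()` and `sched_setaffinity()`; the helper name `pin_to_current_cpu` is hypothetical, and pinning is offered here as an illustration, not as something the talk prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to the CPU it is currently running on, so that the
 * scheduler no longer migrates it. Returns that CPU number, or -1 on failure.
 */
static int pin_to_current_cpu(void)
{
    cpu_set_t set;
    int cpu = sched_getcpu();

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        return -1;
    return cpu;
}
```

After a successful call the thread keeps its working set warm in one core's caches and TLB, at the price of giving up load balancing for that thread.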