# [Linux Kernel Design](https://beta.hackfoldr.org/linux/): `PREEMPT_RT` as a Mechanism Toward a Hard Real-Time Operating System

Copyright (**慣C**) 2016, 2019 [宅色夫](http://wiki.csie.ncku.edu.tw/User/jserv)

==[Live recording (part 1)](https://youtu.be/15-ZVimSHTs)== ==[Live recording (part 2)](https://youtu.be/xUU0vo9PHh0)==

## Introduction

"Industry 4.0" reflects the demand for real-time systems. Major industrial automation vendors such as Siemens and KUKA have invested R&D resources in GNU/Linux-based robot designs. Even so, no single solution yet fully satisfies current needs, because GNU/Linux is highly complex and has to accommodate the diverse requirements of its development community.

Linux has therefore been developing rapidly toward the goal of becoming a hard real-time operating system, and that goal carries a huge maintenance cost. The mainstream solution in the Linux kernel is PREEMPT_RT. This online lecture examines the design and implementation of PREEMPT_RT from several angles: scheduling, interrupt handling, high-resolution timers (HRT), and design aspects specific to symmetric multiprocessing (SMP) architectures.

## Digital Motors in One Minute

[Digital motors](https://www.youtube.com/watch?v=AyQ1YeUxXeg)

![](https://i.imgur.com/ZnnBEg8.png)

Is a robotic arm that costs 1.2 million expensive? It is actually cheap:
* A robotic arm assembles 3 times as fast as a human
* A human works 8 hours a day; a robotic arm can work 24
* A human rests 2 days a week; a robotic arm works all 7
* A human takes about 130 days off a year; a robotic arm works all 365

Altogether, a robotic arm delivers 12 times the output of a human:
* Even if a worker's minimum annual salary is taken as just 300,000, the arm costs only 1/3 of the equivalent labor (12 × 300,000 = 3.6 million)
* 10 arms can replace more than a hundred people and also eliminate management overhead (a company with a hundred-plus employees needs a union, a welfare committee, a nursing room, and so on)

So building a robot army is definitely the right move. Never try to compete with robots. Times are changing, and the way we work has to change too. [[source](https://www.facebook.com/photo.php?fbid=10154122588853898&set=a.116227113897.99543.634428897)]

* Robotic arm structures broadly include the following (P: prismatic; R: revolute):
    * Cartesian: PPP
    * Cylindrical: RPP
    * Spherical: RRP
    * Articulated: RRR
    * SCARA (Selective Compliant Articulated Robot for Assembly): RRP
* The end effector (gripper) has three axes: Normal, Sliding, and Approach.

## Real-time Linux

Real-time Linux is not only for controlling motors. Its range of applications is actually quite broad. Linux was first modified into a hard real-time system as early as 1997, and one of the goals was to build rocket control systems (see [RTLinux](https://en.wikipedia.org/wiki/RTLinux); the researchers came from the New Mexico Institute of Mining and Technology [NMT] in the US). Nearly 20 years of evolution since then have produced many different combinations.

## POSIX

* [POSIX – 25 Years of Open Standard APIs](http://www.rtcmagazine.com/posix-25-years-of-open-standard-apis/)

![](https://i.imgur.com/cdkiqmb.jpg)

* The figure illustrates PSE51, PSE52, PSE53, and PSE54, the real-time application profiles defined by IEEE 1003.13-2003 (POSIX.13)
* In 1986, IEEE appointed a committee to define a standard for open operating systems. The standard is called POSIX (Portable Operating System Interface), and the trailing "X" was added because it is essentially a UNIX standard. See [API standards for Open Systems](http://www.opengroup.org/austin/papers/wp-apis.txt)
* IEEE POSIX Standards for Real-time
    * The PASC Real-time System Services Working Group (SSWG-RT) has developed a series of standards that amend IEEE Std 1003.1-1990, as well as a profile standard (IEEE Std 1003.13-1998).
    * The real-time amendments to IEEE Std 1003.1-1990 are as follows:
        * IEEE Std 1003.1b-1993 Realtime Extension
        * IEEE Std 1003.1c-1995 Threads
        * IEEE Std 1003.1d-1999 Additional Realtime Extensions
        * IEEE Std 1003.1j-2000 Advanced Realtime Extensions
        * IEEE Std 1003.1q-2000 Tracing
    * Note that RTLinux from FSMLabs turns the 1003.13 hierarchy upside down. The smaller PSE51/52 real-time threads are in control, and the full-featured Linux system (similar in functionality to PSE54) runs as just another thread under the control of the real-time threads.
    * This is the opposite of what the PASC SSWG-RT had imagined when 1003.13-1998 was written, but it nonetheless works.
* [POSIX module on top of Xenomai](https://gitlab.denx.de/Xenomai/xenomai/blob/eol/v2.6.x/doc/txt/pse51-skin.txt)

## Preemptible Kernel

* [Linux 5.3 merged PREEMPT_RT-related changes](https://lwn.net/ml/linux-kernel/20190715150402.798499167@linutronix.de/) (the `CONFIG_PREEMPT_RT` Kconfig option)
* [Making Linux do Hard Real-time](https://www.slideshare.net/jserv/realtime-linux)
* [SMP/Linux Real-time Analysis](http://wiki.csie.ncku.edu.tw/embedded/rt-linux-smp.pdf)
* Before explaining resolution, you first need to understand spatial frequency

(Green: preemptible; red: non-preemptible)

**Non-Preemptive**

![](https://i.imgur.com/gNXyQoz.png)

[ CONFIG_PREEMPT_NONE ]

* Preemption is not allowed in kernel mode
* Preemption can happen upon returning to user space

**Preemption Points in Linux Kernel**

![](https://i.imgur.com/7591V5x.png)

[ CONFIG_PREEMPT ]

* Implicit preemption in the kernel
* `preempt_count`
    * Member of `thread_info`
    * Preemption can happen when `preempt_count == 0`

**Fully Preemptive**

![](https://i.imgur.com/rotZoIu.png)

[ CONFIG_PREEMPT_RT_BASE ] / [ CONFIG_PREEMPT_RT_FULL ]

* The difference appears in interrupt context
* Goal: preempt everywhere except where
    * preemption is disabled
    * interrupts are disabled
* Reduce non-preemptible cases in the kernel
    * spin_lock
    * Interrupt
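
[Interrupt handling and modern architecture considerations](https://hackmd.io/s/S1WKTCFM4)

![](https://i.imgur.com/oLbYnVd.png)

> Timeline of merged real-time features in the mainline Linux kernel, most of them coming from the PREEMPT_RT patch (from [The real-time linux kernel: A survey on Preempt_RT](https://www.researchgate.net/publication/331290349_The_real-time_linux_kernel_A_survey_on_Preempt_RT))

## The Original Problem

In the original Linux kernel, there is a delay (called latency) between scheduling a real-time process and that process actually starting to run. This delay is unpredictable, so the kernel cannot give any guarantee to tasks with hard real-time requirements. Naturally, it cannot satisfy those requirements.

Scheduling a real-time process only means that it is runnable. It does not mean that it will actually run. Whether it runs depends on whether the currently running task allows preemption. If it does not, the waiting real-time process may still be delayed severely.

Consider the scenario below:

![](https://i.imgur.com/zmg3vz4.png)

Process A (a non-real-time process) enters a critical section protected by a spinlock. From then on, no context switch can occur on that CPU until the current task leaves the critical section, although interrupts can still occur. Suppose an interrupt fires at this point, and the interrupt handler wakes up real-time process B. B is a real-time process, yet it still cannot run, because the critical section forbids context switches. When the interrupt returns, B is ready and waiting, and it can only run once A leaves the critical section. If the critical section is long, B is delayed severely.

## The PREEMPT_RT Approach

How can the Linux kernel's hard real-time capability be strengthened?

First, PREEMPT_RT converts interrupt handlers into kernel threads. An interrupt then no longer runs in the borrowed context of whatever task happens to be executing. As a result, even if the interrupt handling is preempted by a real-time process, it does not drag down an innocent process whose kernel stack it would otherwise occupy. Second, PREEMPT_RT replaces the spinlocks used by ordinary processes with mutexes that can sleep (the so-called [sleeping spinlock](https://lwn.net/Articles/271817/)). A task holding such a lock can itself be preempted, so a real-time process can run as soon as it is woken up.

Next, consider this question: does the spinlock implementation need to disable interrupts?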
If an interrupt handler also needs to take the spinlock, it is sufficient to disable local interrupts. "Local" means the processor that is executing the code on a multi-core system. With local interrupts disabled, no interrupt handler on the local processor can try to take the same lock, which would otherwise deadlock against the holder it interrupted. Other processors can still spin on the lock flag. This cannot cause a deadlock, because the local processor will always release the lock.

* Further reading: [Song Baohua: Principles and Practice of the Linux Real-Time Patch](https://blog.csdn.net/juS3Ve/article/details/79788554)

## Review of Real-Time Sessions at Linux Foundation Conferences

- [ ] [Maintaining a Real Time Stable Kernel](https://elinux.org/images/1/14/Elc-rt-stable-2018.pdf) / [video](https://youtu.be/pIJ3Zv_uxn0)
- [ ] [Not Really, But Kind of Real Time Linux](https://elinux.org/images/d/df/Kind_of_real_time_linux4.pdf) / [video](https://youtu.be/S7vE9NpOTns)
- [ ] [Exploit the Advantages and Resolve the Challenges of Multicore Technology with Linux](http://wiki.csie.ncku.edu.tw/embedded/02-linux-mc.pdf)
- [ ] [Xenomai 3: An Overview of the Real-Time Framework for Linux - Jan Kiszka, Siemens AG](http://events.linuxfoundation.org/sites/events/files/slides/ELC-2016-Xenomai_0.pdf)
    - applications
        - PLC, machine control systems
        - printing machines (manroland)
        - network switches (Ruggedcom)
        - magnetic resonance tomography (Siemens Healthcare)
    - symbol wrapping
        - `pthread_mutex_lock` => `__wrap_pthread_mutex_lock`
    - preserve Linux service for Cobalt service
        - system call
        - trap
    - RTDM
        - character device
            - UDD (analogous to UIO), memory
        - protocol device
    - tools
        - ipipe latency tracer
        - valgrind / Helgrind: not supported because of unknown system calls
        - SIGDEBUG (SIGXCPU): enabled when an RT thread enters time-critical phases
    - functional limits
        - changes to critical subsystems regularly cause …
    - "Dovetail"
        - co-kernel extension
        - sharing of CPU traps
- [ ] [SCHED_DEADLINE: A Status Update - Juri Lelli, ARM Ltd](http://events.linuxfoundation.org/sites/events/files/slides/SCHED_DEADLINE-20160404.pdf)
    - not only about deadlines
    - available since Linux 3.14
    - real-time scheduling policy with higher priority than SCHED_NORMAL and SCHED_FIFO/SCHED_RR
    - enables predictable task scheduling
        - allows explicit per-task latency constraints
        - avoids starvation (tasks cannot eat all available CPU time)
        - enriches the scheduler's knowledge about QoS requirements
    - policies
        - EDF (Earliest Deadline First)
        - CBS (Constant Bandwidth Server)
    - resource (CPU) reservation mechanism
        - Q time units (runtime) in every interval of length P (period)
    - EDF + CBS provides temporal isolation
    - load balancing + inheritance
    - works with PREEMPT_RT? Orthogonal
    - QoS under discussion
        - bandwidth reclaiming
- [ ] [Understanding a Real-Time System - Steven Rostedt, Red Hat](http://events.linuxfoundation.org/sites/events/files/slides/elc-understanding-rt-system-2016.pdf)
    - real fast vs. real time
        - hot cache: look-ahead features
        - paging: TLB
        - least interruptions
        - optimize the most likely case: transactional memory, branch prediction
        - deadline_test
    - NUMA
        - memory speeds depend on the CPU
        - tasks need to be organized accordingly
    - Hyper-threading
        - recommended to be disabled on RT systems
    - system management interrupt (SMI)
        - puts the processor into system management mode (SMM)
    - RT kernel
        - threaded interrupts
        - system management threads
        - high-resolution timers
    - threaded interrupts
        - don't poll
        - network task (higher priority) on network interrupt
    - software interrupts
        - the network IRQ runs the network softirqs
        - except for softirqs raised by real hard interrupts
            - RCU
            - timer
            - these run in ksoftirqd
    - timers
        - `setitimer()`: requires ksoftirqd to run (on PREEMPT_RT)
        - `timer_create()` / `timer_settime()`
    - NO_HZ
        - turns off timers when the CPU is idle
        - lets the CPU go into deep sleep
        - great for power saving
    - NO_HZ_FULL
        - works only if a single task is running
    - priority inheritance locking
        - prevents unbounded latency
        - in user space, set the mutex protocol with `pthread_mutexattr_setprotocol()`
    - real-time vs. multi-process
        - migration clears caches (memory and TLB)
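
The CBS reservation noted in the SCHED_DEADLINE session can be made concrete with the classic utilization bound for EDF. Assume each task $i$ reserves runtime $Q_i$ in every period $P_i$, that relative deadlines equal periods, and that the tasks run on a single CPU. Under these assumptions, EDF meets every deadline if and only if the total reserved bandwidth does not exceed 1:

$$
U_i = \frac{Q_i}{P_i}, \qquad \sum_{i=1}^{n} U_i = \sum_{i=1}^{n} \frac{Q_i}{P_i} \le 1
$$

For example, take three hypothetical tasks with $(Q, P)$ = (2 ms, 10 ms), (5 ms, 20 ms), and (3 ms, 15 ms). Their bandwidths sum to $0.2 + 0.25 + 0.2 = 0.65 \le 1$, so they fit. Linux's admission control is stricter than this textbook bound. It caps the admitted bandwidth using the `sched_rt_runtime_us` / `sched_rt_period_us` ratio, which is 950000 / 1000000 (95%) by default.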