
Linux Kernel Design: PREEMPT_RT as a Mechanism Toward a Hard Real-Time Operating System

Copyright (慣C) 2016, 2019 宅色夫

Livestream recording (part 1)
Livestream recording (part 2)

Introduction

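"Industry 4.0" reflects the demand for real-time systems. Major industrial automation vendors such as Siemens and KUKA have invested R&D resources in GNU/Linux-based robot designs, yet no single solution fully satisfies current needs, because GNU/Linux is highly complex and has to accommodate the diverse requirements of its development community.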

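As a result, amid rapid development, the goal of becoming a hard real-time operating system carries a heavy maintenance cost, and the mainstream solution for the Linux kernel is PREEMPT_RT. This online lecture examines the design and implementation of PREEMPT_RT from the angles of scheduling, interrupt handling, high resolution timers (HRT), and designs specific to symmetric multiprocessing (SMP) architectures.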

Digital motors in one minute

Digital motor


Is 1.2 million for one robotic arm expensive? It is actually cheap.

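  • A robotic arm assembles 3 times as fast as a human
  • A human works 8 hours a day; a robotic arm can work 24 hours
  • A human rests 2 days a week; a robotic arm works all 7 days
  • A human takes about 130 days off a year; a robotic arm works 365 days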

Taken together, a robotic arm delivers about 12 times the output of a human.

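  • Put an ordinary worker's annual salary at 300,000 as a bare minimum; on that basis, the arm costs only 1/3 of the wages it replaces
  • 10 arms can stand in for over a hundred people, and spare the management overhead as well (a company of over a hundred people needs a union, a welfare committee, a nursery room, and so on)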
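The arithmetic behind the "12 times" and "1/3" figures can be checked with a short sketch. It assumes the factors are independent and simply multiply, and it uses the weekly figure (7 days against 5) instead of the yearly one, since the two bullets describe overlapping rest days. The function names are made up for illustration.

```python
# Factors taken from the bullets above.
SPEED_FACTOR = 3          # assembly speed relative to a human
HOURS_FACTOR = 24 / 8     # 24-hour operation vs. an 8-hour shift
DAYS_FACTOR = 7 / 5       # 7-day week vs. 5 working days

def capacity_ratio(speed=SPEED_FACTOR, hours=HOURS_FACTOR, days=DAYS_FACTOR):
    """Multiply the (assumed independent) factors into one output ratio."""
    return speed * hours * days

def price_to_wages(arm_price, annual_salary, workers_replaced):
    """Arm price as a fraction of one year of the wages it replaces."""
    return arm_price / (annual_salary * workers_replaced)

ratio = capacity_ratio()                              # 3 * 3 * 1.4
fraction = price_to_wages(1_200_000, 300_000, 12)     # 1.2M / (12 * 300k)
print(f"capacity ratio ~ {ratio:.1f}x, price/wages = {fraction:.2f}")
```

With the yearly figure (365 working days against 235) the product comes out closer to 14, so "12 times" is best read as a round number.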

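So building a robot army is absolutely the right move. Never compete with robots: times are changing, and the way we work has to change with them.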
[source]

  • Robotic arm structures broadly include:
    • Cartesian: PPP;
    • Cylindrical: RPP;
    • Spherical: RRP;
    • Articulated: RRR;
    • SCARA (Selective Compliant Articulated Robot for Assembly): RRP.
      (P: prismatic; R: revolute)
  • The hand itself has three further axes: Normal, Sliding, and Approach.

Real-time Linux

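Real-time Linux is not limited to motor control; its range of applications is quite broad. As early as 1997, when Linux was first reworked into a hard real-time system, one of the goals was to build a rocket control system (see RTLinux, whose researchers came from the New Mexico Institute of Mining and Technology [NMT]). Over the nearly 20 years since, many different combinations have emerged.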

POSIX

  • POSIX – 25 Years of Open Standard APIs


    • Uses diagrams to explain PSE51, PSE52, PSE53, PSE54, and the IEEE 1003.13-2003 (POSIX.13) profiles for real-time applications

    • In 1986, IEEE appointed a committee to define a standard for open operating systems, called POSIX (Portable Operating System Interface); the trailing "X" is there because it is in essence a standard for UNIX

API standards for Open Systems

  • IEEE POSIX Standards for Real-time

  • The PASC Real-time System Services Working Group (SSWG-RT) has developed a series of standards that amend IEEE Std 1003.1-1990 and a profile standard (IEEE Std 1003.13-1998).

  • The Real-time amendments to IEEE Std 1003.1-1990 are as follows:

    • IEEE Std 1003.1b-1993 Realtime Extension
    • IEEE Std 1003.1c-1995 Threads
    • IEEE Std 1003.1d-1999 Additional Realtime Extensions
    • IEEE Std 1003.1j-2000 Advanced Realtime Extensions
    • IEEE Std 1003.1q-2000 Tracing
  • Note that RTLinux from FSMLabs turns the 1003.13 hierarchy upside down, with the smaller PSE51/52 realtime threads in control, and the full-figured Linux system (similar in functionality to PSE54) as just another thread under control of the realtime threads.

    • This is the opposite of what the PASC SSWG-RT had imagined when 1003.13-1998 was written, but it nonetheless works.
  • POSIX module on top of Xenomai

Preemptible Kernel

(green: preemptible; red: non-preemptible)

Non-Preemptive


[ CONFIG_PREEMPT_NONE ]

  • Preemption is not allowed in Kernel Mode
  • Preemption could happen upon returning to user space

Preemption Points in Linux Kernel


[ CONFIG_PREEMPT ]

  • Implicit preemption in Kernel
  • preempt_count
    • Member of thread_info
    • Preemption could happen when preempt_count == 0
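The counter rule above can be illustrated with a toy model. This is only a sketch of the bookkeeping, not kernel code: the class is hypothetical, although the method names mirror the kernel's `preempt_disable()` and `preempt_enable()`, and `need_resched` stands in for the pending-reschedule flag.

```python
class PreemptCounter:
    """Toy model of the per-thread preempt_count nesting counter."""

    def __init__(self):
        self.count = 0             # 0 means preemption is allowed
        self.need_resched = False  # set when a higher-priority task is waiting

    def preemptible(self):
        return self.count == 0

    def preempt_disable(self):
        self.count += 1            # sections may nest, so count them

    def preempt_enable(self):
        if self.count == 0:
            raise RuntimeError("unbalanced preempt_enable()")
        self.count -= 1
        # True when a pending reschedule may take place right now.
        return self.preemptible() and self.need_resched


pc = PreemptCounter()
pc.preempt_disable()
pc.preempt_disable()               # nested non-preemptible section
pc.need_resched = True             # a wakeup arrives in the meantime
print(pc.preempt_enable())         # still nested: no reschedule yet
print(pc.preempt_enable())         # count back to 0: reschedule allowed
```

The point of the counter, as opposed to a boolean, is that nested non-preemptible sections only reopen preemption when the outermost one ends.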

Fully Preemptive


[ CONFIG_PREEMPT_RT_BASE ] / [ CONFIG_PREEMPT_RT_FULL ]

  • Difference appears in the interrupt context
  • Goal: Preempt Everywhere except
    • Preempt disable
    • Interrupt disable
  • Reduce non-preemptible cases in kernel
    • spin_lock
    • Interrupt

Interrupt handling and modern architecture considerations


Timeline of merged real-time features in the mainline Linux kernel, most of them coming from the PREEMPT_RT patch

The real-time linux kernel: A survey on Preempt_RT

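The original problem

In the original Linux kernel, the delay between scheduling a real-time process and that process actually starting to run (called latency) is unpredictable. The kernel therefore cannot give any guarantee to tasks with hard real-time requirements, and so cannot meet those requirements.

Scheduling a real-time process only means that it is runnable, not that it is actually running. Whether it runs depends on whether the currently executing task allows preemption; if not, the waiting real-time process can still be delayed badly.

Consider the following scenario: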


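Process A (a non-real-time process) enters a critical section protected by a spinlock. This means no process switch can happen on that CPU until the running task leaves the critical section, although interrupts are not blocked. Suppose an interrupt fires at this point and its handler wakes real-time process B. Even though B is a real-time process, it still cannot run, because the critical section forbids a process switch. When the interrupt returns, B stays ready and only actually runs after process A leaves the critical section. If the critical section is long, real-time process B is delayed badly.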

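The PREEMPT_RT approach

How can the Linux kernel's ability to handle hard real-time work be strengthened?

PREEMPT_RT first converts interrupt handlers into kernel threads, so that a handler no longer borrows whatever context happened to be running. Even if a real-time process preempts the handler, the innocent process whose kernel stack the handler would otherwise have occupied is not held up. Second, it turns the spinlocks used by ordinary processes into mutexes that can sleep (so-called sleeping spinlocks), which ensures that a real-time process runs immediately once it has been woken.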
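The effect on the process A / process B scenario can be put as a one-line latency model. This is an idealized sketch: the function and its parameters are invented for illustration, times are in arbitrary units, and context-switch and interrupt-handling costs are ignored.

```python
def wake_to_run_latency(section_end, wake_time, lock_sleeps):
    """Time between waking real-time process B and B actually running.

    section_end: time at which process A leaves its critical section
    wake_time:   time at which the interrupt handler wakes B
    lock_sleeps: True models a sleeping spinlock (A stays preemptible),
                 False models a classic spinlock (no process switch)
    """
    if lock_sleeps:
        return 0  # idealized: B preempts A at once, switch cost ignored
    # B must wait for whatever remains of A's critical section.
    return max(0, section_end - wake_time)


print(wake_to_run_latency(section_end=100, wake_time=30, lock_sleeps=False))
print(wake_to_run_latency(section_end=100, wake_time=30, lock_sleeps=True))
```

With a classic spinlock the latency grows with the length of A's critical section, which B cannot control; with a sleeping lock it no longer depends on A at all.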

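Next, consider this question: does a spinlock implementation need to disable interrupts?

If an interrupt handler needs to take the same spinlock, it is enough to disable local interrupts (local meaning the processor, within a multicore system, that is executing the code). Once local interrupts are off, the local processor is guaranteed not to touch the spinlock's flag again, while other processors can still spin on the lock flag. This does not deadlock, because the local processor always releases the lock eventually.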
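The cases in this argument can be enumerated with a small model. It is a sketch only: `Cpu`, `SpinLock`, and `interrupt_outcome` are hypothetical names, and `lock_irqsave` merely imitates the idea behind the kernel's `spin_lock_irqsave()` without any real spinning.

```python
class SpinLock:
    def __init__(self):
        self.owner = None              # id of the CPU holding the lock

class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.irqs_enabled = True

def lock_irqsave(cpu, lock):
    """Disable local interrupts, then take the lock (assumed to be free)."""
    cpu.irqs_enabled = False
    lock.owner = cpu.cpu_id

def interrupt_outcome(cpu, lock):
    """What happens if an IRQ whose handler takes `lock` is raised on `cpu`."""
    if not cpu.irqs_enabled:
        return "deferred"              # handler runs after the lock is released
    if lock.owner == cpu.cpu_id:
        return "deadlock"              # handler spins on its own CPU's lock
    if lock.owner is not None:
        return "spins"                 # another CPU holds it and will release it
    return "acquired"


cpu0, cpu1, lock = Cpu(0), Cpu(1), SpinLock()
lock_irqsave(cpu0, lock)
print(interrupt_outcome(cpu0, lock))   # local IRQs are off
print(interrupt_outcome(cpu1, lock))   # remote CPU just waits
```

The only dangerous case is a handler interrupting the lock holder on the same CPU, which is exactly the case that disabling local interrupts rules out.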

A review of real-time sessions at conferences hosted by the Linux Foundation

symbol wrapping

  • pthread_mutex_lock => __wrap_pthread_mutex_lock

preserve Linux service for Cobalt service

  • system call
  • trap

RTDM

  • character device
    • UDD (analogous to UIO), memory
  • protocol device

tool

  • ipipe latency tracer

valgrind / Helgrind

  • not supported because of unknown system call

SIGDEBUG (SIGXCPU): enabled when an RT thread enters time-critical phases
functional limit
changes to critical subsystems regularly cause [...]
"Dovetail"

  • co-kernel extension

  • sharing of CPU traps

SCHED_DEADLINE: A Status Update - Juri Lelli, ARM Ltd

  • not only about deadlines
  • in mainline since 3.14
  • real-time scheduling policy: higher priority than NORMAL and FIFO/RR
  • enables predictable task scheduling
  • allows explicit per-task latency constraints
  • avoids starvation (tasks cannot eat all available CPU time)
  • enriches the scheduler's knowledge about QoS requirements

policies

  • EDF (Earliest Deadline First)
  • CBS (Constant Bandwidth Server)

resource (CPU) reservation mechanism

  • Q time units (runtime) in every interval of length P (period)
  • EDF + CBS provides temporal isolation
  • load balancing + inheritance
  • works with PREEMPT_RT? orthogonal
  • QoS

under discussion
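The reservation idea (Q units of runtime in every period P) and the EDF rule can be sketched in a few lines. This is a single-CPU simplification with deadlines equal to periods, where EDF is schedulable exactly when the total bandwidth sum(Q/P) does not exceed 1; the kernel's actual admission control is more involved, and the function names here are hypothetical.

```python
from fractions import Fraction

def total_utilization(reservations):
    """Sum of Q/P over (runtime, period) pairs, kept exact with Fraction."""
    return sum((Fraction(q, p) for q, p in reservations), Fraction(0))

def admit(reservations):
    """Single-CPU admission test: reserved bandwidth must not exceed 1."""
    return total_utilization(reservations) <= 1

def edf_pick(ready):
    """EDF: among ready (name, absolute_deadline) pairs, run the earliest."""
    if not ready:
        return None
    return min(ready, key=lambda task: task[1])[0]


tasks = [(10, 100), (30, 100), (50, 100)]     # (Q, P) in the same time unit
print(total_utilization(tasks), admit(tasks))
print(edf_pick([("video", 50), ("control", 20), ("log", 90)]))
```

Temporal isolation comes from the CBS side: a task that overruns its Q is throttled until its next period, so it cannot eat into the bandwidth reserved for the others.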

real fast vs. real time

  • hot cache: look ahead features
  • paging: TLB
  • least interruptions
  • optimize the most likely case: transactional memory

branch prediction

  • deadline_test

NUMA

  • memory speeds dependent on CPU
  • need to organize the tasks

Hyper-threading

  • recommended to disable on RT

system management interrupt (SMI)

  • put processor into system management mode (SMM)

RT kernel

  • threaded interrupt
  • system management threads
  • high resolution timer

threaded interrupt

  • don't poll network task (higher prio) on network interrupt

software interrupt

  • network irq will run network softirqs

except for softirqs raised by real hard interrupt

  • RCU
  • timer
  • run in ksoftirqd

timer

  • setitimer(): requires ksoftirqd to run (on PREEMPT_RT)
  • timer_create() / timer_settime()
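Timer behaviour is usually judged by how late a periodic wakeup arrives. The sketch below measures that from user space; note that it does not use `setitimer()` or `timer_create()` / `timer_settime()` from the notes above but substitutes Python's `time.sleep()` and `time.monotonic_ns()` as a rough stand-in, so the numbers include interpreter overhead. The function name is hypothetical.

```python
import time

def sleep_overshoot_ns(interval_ns=1_000_000, samples=20):
    """Sleep `samples` times and record how late each wakeup was, in ns."""
    overshoots = []
    for _ in range(samples):
        start = time.monotonic_ns()
        time.sleep(interval_ns / 1e9)          # request a fixed sleep
        elapsed = time.monotonic_ns() - start
        overshoots.append(elapsed - interval_ns)
    return overshoots


late = sleep_overshoot_ns()
print(f"max overshoot: {max(late) / 1000:.1f} us over {len(late)} wakeups")
```

On a loaded system the maximum, not the average, is the figure that matters for real-time work.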

NO_HZ

  • when CPU is idle, turn off timers
  • let CPU go into deep sleep
  • great for power saving

NO_HZ_FULL

  • works only if a CPU runs a single task

priority inheritance locking

  • prevent unbounded latency
  • set with pthread_mutexattr_setprotocol() (PTHREAD_PRIO_INHERIT)
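How priority inheritance bounds the latency can be shown with a bookkeeping-only model. This is a sketch, not the pthread or kernel implementation: `Task` and `PiMutex` are hypothetical, nothing actually blocks, larger numbers mean more urgent, and a task is assumed to hold at most one such mutex.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority   # larger number = more urgent
        self.priority = priority        # effective (possibly boosted) priority

class PiMutex:
    """Toy priority-inheritance mutex: bookkeeping only, no real blocking."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def lock(self, task):
        if self.owner is None:
            self.owner = task
            return True
        self.waiters.append(task)
        # Boost the owner to the highest priority among its waiters.
        self.owner.priority = max(self.owner.priority, task.priority)
        return False

    def unlock(self):
        self.owner.priority = self.owner.base_priority   # drop the boost
        self.waiters.sort(key=lambda t: t.priority, reverse=True)
        self.owner = self.waiters.pop(0) if self.waiters else None
        return self.owner


low, high = Task("low", 1), Task("high", 90)
m = PiMutex()
m.lock(low)
m.lock(high)                  # high must wait, so low is boosted
print(low.priority)
print(m.unlock().name, low.priority)
```

While boosted, the low-priority owner cannot be preempted by medium-priority tasks, so the high-priority waiter is delayed only by the length of the critical section.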

real-time vs. multi-process

  • migration clears caches (memory and TLB)
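One common way to avoid the migration cost is to pin a task to a CPU. A minimal sketch using Python's `os.sched_setaffinity()`, which is available on Linux; the helper name and the choice of the lowest-numbered CPU are arbitrary assumptions for the example.

```python
import os

def pin_to_one_cpu(pid=0):
    """Pin `pid` (0 = calling process) to one CPU it is already allowed on.

    Returns the new affinity set, or None where pinning is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                      # platform without this call
    try:
        allowed = os.sched_getaffinity(pid)
        target = min(allowed)            # arbitrary: lowest-numbered CPU
        os.sched_setaffinity(pid, {target})
        return os.sched_getaffinity(pid)
    except OSError:
        return None                      # e.g. restricted by a sandbox


print(pin_to_one_cpu())
```

Pinning trades load balancing for cache and TLB warmth, so it suits a small number of latency-critical tasks and not a whole workload.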