Yiwei Lin
---
tags: NCKU Linux Kernel Internals, Operating Systems
---

# Linux Kernel Design: Interrupt

[Linux 核心設計: 中斷處理和現代架構考量](https://hackmd.io/@sysprog/linux-interrupt) (Linux Kernel Design: Interrupt Handling and Modern Architecture Considerations)

:::danger
The lecture covers interrupt-related topics from a very broad perspective, and I could not digest all of it at once. This note only supplements some basic topics and is not yet complete. For details, please refer to the lecture recording.
:::

## What is interrupt?

Some reference to [CSE 438/598 Embedded Systems Programming](http://rts.lab.asu.edu/web_438_Fall_2014/CSE438_Fall2014_Main_page.htm): [Linux Interrupt Processing and Kernel Thread](http://rts.lab.asu.edu/web_438/CSE438_598_slides_yhlee/438_7_Linux_ISR.pdf)

In short, an interrupt is a mechanism that notifies the CPU that an event has occurred and forces the CPU to respond to it, whether or not the CPU is busy.

When an interrupt occurs, what happens resembles a context switch (note that it is only similar, the two are fundamentally different!). The hardware saves the state of the current process (usually less information than a full context switch needs), switches from process context to interrupt context, determines the type of the interrupt, and then runs the corresponding interrupt handler.

### Preemptive Context Switching

![](https://i.imgur.com/j7tNtQQ.png)

Context switching / multitasking can be [cooperative](https://en.wikipedia.org/wiki/Cooperative_multitasking) or [preemptive](https://en.wikipedia.org/wiki/Preemption_(computing)). In the cooperative model, a thread decides for itself when to give up the CPU so that other threads can run (for example through [schedule()](https://elixir.bootlin.com/linux/latest/source/kernel/sched/core.c#L4375)). The preemptive model relies on interrupts: each time the kernel leaves interrupt context it can perform a context switch and hand the CPU to the thread that currently has the highest priority.

### Interrupt Handling

Interrupts come in several types, for example:

* I/O interrupt
* Timer interrupt
* Interprocessor interrupt

![](https://i.imgur.com/o76BTom.png)

When an external hardware device raises a signal, the [PIC](https://en.wikipedia.org/wiki/Programmable_interrupt_controller) receives the interrupt from that device. The signal is translated into a vector, which is used to look up the system's [IDT](https://en.wikipedia.org/wiki/Interrupt_descriptor_table) and find the start address of the corresponding [ISR / Interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler). Each PIC can handle only a limited number of interrupts. If one of its interrupt inputs is wired to receive the signal of another PIC, the total number of interrupts that can be handled is extended.

> Further reading: [PIC中斷控制器介紹](http://stenlyho.blogspot.com/2008/08/pic.html) (Introduction to the PIC interrupt controller)

In modern operating systems an ISR is split into a top half and a bottom half in order to reduce latency. When an interrupt occurs, nested interrupts would complicate the handling (ISR reentrancy, mutual exclusion on shared resources, and so on), so the simplest approach is to disable interrupts while in interrupt context. However, if interrupts stay disabled for too long, the system responds to I/O more slowly and may even miss an interrupt.

Splitting the handler into a top half and a bottom half lets the system defer most of the work. The top half disables interrupts, does the minimal but essential work (for example, marking which type of interrupt is pending), and then enables interrupts again. If no further interrupt arrives, the bottom half is processed. This reduces the latency introduced by interrupt handling.

Linux has three main mechanisms for deferring interrupt processing:

* softirqs
* tasklets
* workqueues

### Softirq

Softirqs are fixed at kernel compile time. Their handlers are registered with [open_softirq](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/softirq.c#L447).

```c
void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    softirq_vec[nr].action = action;
}
```

The function indexes `softirq_vec` and stores the function pointer that handles the given softirq.

```c
struct softirq_action {
    void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;

const char * const softirq_to_name[NR_SOFTIRQS] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
    "TASKLET", "SCHED", "HRTIMER", "RCU"
};
```

`softirq_vec` is an array of `softirq_action`, a structure that contains only a pointer to the action function. `softirq_vec` holds NR_SOFTIRQS (= 10) registered softirqs:

* two for tasklets (HI, TASKLET)
* two for networking (NET_TX, NET_RX)
* two for block devices (BLOCK, BLOCK_IOPOLL)
* two for timers (TIMER, HRTIMER)
* one for the scheduler (SCHED)
* one for read-copy-update (RCU)

The same information is available through `cat /proc/softirqs`.

```c
void raise_softirq(unsigned int nr)
{
    unsigned long flags;

    local_irq_save(flags);      /* save interrupt state, then disable */
    raise_softirq_irqoff(nr);
    local_irq_restore(flags);   /* return to the saved interrupt state */
}
```

`raise_softirq` marks a softirq for processing. `local_irq_save` first saves the current interrupt state (the [Interrupt flag](https://en.wikipedia.org/wiki/Interrupt_flag)) into `flags` and disables interrupts. `local_irq_restore` does the opposite: it writes the saved flag back, returning to the state before `local_irq_save` (interrupts may end up enabled or disabled, depending on what was saved).

Why disable interrupts? `raise_softirq_irqoff` sets a bit in a shared pending bitmask (an OR with 1 on one bit, see `or_softirq_pending`). If interrupts were left enabled, an interrupt arriving in the middle of that update could raise another softirq and modify the same variable, which is a race condition. Keeping interrupts off during the update prevents that race.

```c
inline void raise_softirq_irqoff(unsigned int nr)
{
    __raise_softirq_irqoff(nr);

    if (!in_interrupt())
        wakeup_softirqd();
}
```

`raise_softirq_irqoff` calls `__raise_softirq_irqoff(nr)` to set bit `nr` in the softirq pending bitmask `__softirq_pending`, marking which type of deferred work is waiting.

Before returning, `raise_softirq_irqoff` checks whether the CPU is in interrupt context or process context. In interrupt context nothing more is needed: once the caller restores the interrupt flag and returns, the softirq bottom half is processed on the way out of the interrupt. In process context, `wakeup_softirqd` must be called to wake the kernel thread daemon `ksoftirqd`.

```c
asmlinkage __visible void __softirq_entry __do_softirq(void)
{
    unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
    ...
restart:
    while ((softirq_bit = ffs(pending))) {
        ...
        h->action(h);
        ...
    }
    ...
    pending = local_softirq_pending();
    if (pending) {
        if (time_before(jiffies, end) && !need_resched() &&
            --max_restart)
            goto restart;
    }
    ...
}
```

`ksoftirqd` uses `run_ksoftirqd` to check whether any deferred work is pending and calls `__do_softirq` to process it. The contents of the `__softirq_pending` bitmask show which softirqs have been deferred.

While deferred work is being processed, new softirqs may keep arriving. If the kernel kept handling them without bound, userspace threads might never get scheduled, so the code above bounds the processing: it restarts only while the time limit (`MAX_SOFTIRQ_TIME`) has not passed, no reschedule is needed, and the restart counter (`max_restart`) has not run out.

Checks for pending softirqs are placed at several points in the kernel so that they run regularly. The main one is in [`do_IRQ`](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/irq.c#L218), which is where an actual interrupt is handled. Before `do_IRQ` finishes it calls `exiting_irq()`, which in turn calls `irq_exit()`.

```c
void irq_exit(void)
{
    ...
    if (!in_interrupt() && local_softirq_pending())
        invoke_softirq();
    ...
}
```

`irq_exit` checks whether any softirq is pending. The `invoke_softirq` it calls also ends up in `__do_softirq`, which processes the bottom half.

### Tasklet

Softirqs are designed for performance: the same softirq can run on several CPUs in parallel, so the handler must be reentrant, which makes it harder to write. Another drawback is that the handlers are fixed at compile time and cannot be registered or removed dynamically, which is clearly unfriendly to kernel modules. Tasklets are designed to solve these problems.

```c
void __init softirq_init(void)
{
    int cpu;

    for_each_possible_cpu(cpu) {
        per_cpu(tasklet_vec, cpu).tail =
            &per_cpu(tasklet_vec, cpu).head;
        per_cpu(tasklet_hi_vec, cpu).tail =
            &per_cpu(tasklet_hi_vec, cpu).head;
    }

    open_softirq(TASKLET_SOFTIRQ, tasklet_action);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}
```

During initialization the code walks over all possible processors (processors that support hot-plugging?) and initializes the [per_cpu](https://0xax.gitbooks.io/linux-insides/content/Concepts/linux-cpu-1.html) variables `tasklet_vec` and `tasklet_hi_vec`.

```c
struct tasklet_head {
    struct tasklet_struct *head;
    struct tasklet_struct **tail;
};

static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);
```

Each CPU maintains its own linked lists of tasklets. HI_SOFTIRQ serves high-priority tasklets and TASKLET_SOFTIRQ serves normal ones. At the end of `softirq_init`, the `open_softirq` function described earlier registers the two tasklet-related softirqs.

```c
void tasklet_init(struct tasklet_struct *t,
                  void (*func)(unsigned long), unsigned long data)
{
    t->next = NULL;
    t->state = 0;
    atomic_set(&t->count, 0);
    t->func = func;
    t->data = data;
}
```

Tasklets are then manipulated through the API that the Linux kernel provides. One example is `tasklet_init`, which initializes a `tasklet_struct` dynamically.

```c
DECLARE_TASKLET(name, func, data);
DECLARE_TASKLET_DISABLED(name, func, data);
```

The two macros above define a tasklet statically.

```c
void tasklet_schedule(struct tasklet_struct *t);
void tasklet_hi_schedule(struct tasklet_struct *t);
void tasklet_hi_schedule_first(struct tasklet_struct *t);

static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

void __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);
    t->next = NULL;
    *__this_cpu_read(tasklet_vec.tail) = t;           /* append to this CPU's list */
    __this_cpu_write(tasklet_vec.tail, &(t->next));
    raise_softirq_irqoff(TASKLET_SOFTIRQ);
    local_irq_restore(flags);
}
```

The API above marks a tasklet as ready to run (different functions are used for different priority requirements). Taking `tasklet_schedule` as an example: it sets the `TASKLET_STATE_SCHED` bit in the tasklet struct's state and, if the bit was not already set, calls `__tasklet_schedule`. `__tasklet_schedule` works much like the `raise_softirq` described earlier: it saves the interrupt flag and disables interrupts, updates `tasklet_vec`, and calls `raise_softirq_irqoff` to mark the softirq as pending. When the kernel later processes the bottom half, the softirq action registered earlier, `tasklet_action`, is called.

```c
static void tasklet_action(struct softirq_action *a)
{
    local_irq_disable();
    list = __this_cpu_read(tasklet_vec.head);
    __this_cpu_write(tasklet_vec.head, NULL);
    __this_cpu_write(tasklet_vec.tail, this_cpu_ptr(&tasklet_vec.head));
    local_irq_enable();

    while (list) {
        if (tasklet_trylock(t)) {
            t->func(t->data);
            tasklet_unlock(t);
        }
        ...
    }
}
```

In `tasklet_action`, local interrupts are disabled first. The local CPU's tasklet linked list is moved into a temporary variable and the per-CPU list is reset to empty (head set to NULL). Interrupts are then enabled again and the whole list is walked.

```c
static inline int tasklet_trylock(struct tasklet_struct *t)
{
    return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
}
```

`tasklet_trylock` tries to set the `TASKLET_STATE_RUN` bit in the state. If it succeeds, the function registered through `tasklet_init` is executed, and afterwards `tasklet_unlock` restores the state.

Note that softirqs and tasklets both run in interrupt context (software irq context), so they must not sleep, be preempted, or context switch, and they must not access userspace data. In addition, the same tasklet is never processed on several CPUs in parallel: each tasklet runs only on the CPU that scheduled it, which improves cache usage. This design can be less than ideal, because other, potentially idle CPUs cannot be used to run the tasklet.

> * [why tasklet cant sleep](https://lists.kernelnewbies.org/pipermail/kernelnewbies/2011-November/003812.html)
> * [Why kernel code/thread executing in interrupt context cannot sleep?](https://stackoverflow.com/questions/1053572/why-kernel-code-thread-executing-in-interrupt-context-cannot-sleep/1056710#1056710)

### Work queue

A workqueue is another way to process the bottom half. Its defining feature is that the work runs in process context, inside a kernel thread, rather than in interrupt context.

```c
struct work_struct {
    atomic_long_t data;
    struct list_head entry;
    work_func_t func;
#ifdef CONFIG_LOCKDEP
    struct lockdep_map lockdep_map;
#endif
};
```

The core idea of workqueues is to create per-CPU kernel threads that handle the deferred interrupt work. The basic unit of a workqueue is described by a [`work_struct`](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/workqueue.h#L100). `func` is the function that the scheduled work executes, `entry` links the item into a queue, and `data` carries the data associated with the work item (in this structure it is an `atomic_long_t` that the workqueue code itself initializes, as the `INIT_WORK` macro further down shows).

```c
#define DECLARE_WORK(n, f) \
    struct work_struct n = __WORK_INITIALIZER(n, f)
```

`DECLARE_WORK` creates a work item (`work_struct`) statically.

```c
#define INIT_WORK(_work, _func) \
    __INIT_WORK((_work), (_func), 0)

#define __INIT_WORK(_work, _func, _onstack)                     \
    do {                                                        \
        __init_work((_work), _onstack);                         \
        (_work)->data = (atomic_long_t) WORK_DATA_INIT();       \
        INIT_LIST_HEAD(&(_work)->entry);                        \
        (_work)->func = (_func);                                \
    } while (0)
```

Alternatively, `INIT_WORK` initializes one dynamically.

```c
static inline bool queue_work(struct workqueue_struct *wq,
                              struct work_struct *work)
{
    return queue_work_on(WORK_CPU_UNBOUND, wq, work);
}
```

Once a `work_struct` has been created, `queue_work` adds it to a workqueue. It does so by calling `queue_work_on`, where `WORK_CPU_UNBOUND` means that the work is not bound to any particular CPU.

### Reference

* [Introduction to deferred interrupts (Softirq, Tasklets and Workqueues)](https://0xax.gitbooks.io/linux-insides/content/Interrupts/linux-interrupts-9.html)
* [linux kernel的中断子系统之(八):softirq](http://www.wowotech.net/irq_subsystem/soft-irq.html) (The Linux kernel interrupt subsystem, part 8: softirq)
* [linux kernel的中断子系统之(九):tasklet](http://www.wowotech.net/irq_subsystem/tasklet.html) (The Linux kernel interrupt subsystem, part 9: tasklet)
* [softirq, tasklet和workqueue的区别](https://blog.csdn.net/jusang486/article/details/51155277) (Differences between softirq, tasklet and workqueue)

## TODO

- [ ] Read the softirq, tasklet, and work queue source code myself, and run experiments to fill in details that second-hand articles may have overlooked
- [ ] Study the additional considerations for interrupts on multi-core systems
- [ ] Study how virtualization frameworks are implemented (how do they work?) and how they affect interrupt handling in the operating system
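
As a supplement to the Softirq section, the raise-then-process pattern built around a pending bitmask can be tried out in userspace. The following is only a minimal sketch under stated assumptions, not kernel code: every `demo_*` and `DEMO_*` name is hypothetical, there is a single "CPU", and nothing here disables interrupts or limits processing time. It only illustrates that raising sets a bit, and that processing snapshots the mask, clears it, and runs each registered handler once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical softirq numbers for this demo only (not the kernel's enum). */
enum { DEMO_HI, DEMO_TIMER, DEMO_NET_RX, NR_DEMO_SOFTIRQS };

typedef void (*demo_action_t)(unsigned int nr);

static demo_action_t demo_vec[NR_DEMO_SOFTIRQS]; /* plays the role of softirq_vec */
static uint32_t demo_pending;                     /* plays the role of __softirq_pending */
unsigned int demo_run_count[NR_DEMO_SOFTIRQS];    /* how often each handler has run */

/* Sample handler: just counts its invocations. */
void demo_count_action(unsigned int nr)
{
    demo_run_count[nr]++;
}

/* Counterpart of open_softirq(): register the handler for one slot. */
void demo_open(unsigned int nr, demo_action_t action)
{
    assert(nr < NR_DEMO_SOFTIRQS);
    demo_vec[nr] = action;
}

/* Counterpart of __raise_softirq_irqoff(): only mark the bit as pending. */
void demo_raise(unsigned int nr)
{
    assert(nr < NR_DEMO_SOFTIRQS);
    demo_pending |= (uint32_t)1 << nr;
}

/*
 * Simplified counterpart of the loop in __do_softirq(): snapshot the mask,
 * clear it, then run the handler of every set bit, lowest bit first.
 * Returns the number of handlers that were invoked.
 */
unsigned int demo_do_pending(void)
{
    uint32_t pending = demo_pending;
    unsigned int handled = 0;

    demo_pending = 0;
    for (unsigned int nr = 0; nr < NR_DEMO_SOFTIRQS && pending != 0;
         nr++, pending >>= 1) {
        if ((pending & 1u) && demo_vec[nr] != NULL) {
            demo_vec[nr](nr);
            handled++;
        }
    }
    return handled;
}
```

For example, after registering `demo_count_action` for `DEMO_TIMER`, calling `demo_raise(DEMO_TIMER)` twice followed by one `demo_do_pending()` runs the handler only once, because a bit that is already set cannot record a second request. This is the reason a pending bitmask coalesces repeated raises.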
