epoll / file descriptor
contributed by < eric88525 >
The epoll API performs a similar task to poll(2): monitoring
multiple file descriptors to see if I/O is possible on any of
them. The epoll API can be used either as an edge-triggered or a
level-triggered interface and scales well to large numbers of
watched file descriptors.
Overview
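- The idea is like asking someone to monitor fd states on your behalf; when needed, you simply collect the fds that are ready
- epoll stands for event poll and is a Linux-specific facility
- It lets a process monitor multiple file descriptors and be notified when I/O becomes possible on any of them (edge-triggered or level-triggered)
- epoll itself is not a system call but a kernel data structure, operated on through the epoll_create, epoll_ctl, and epoll_wait system calls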
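epoll syntax
1. epoll_create
Creates an epoll instance; this system call returns a file descriptor that refers to the epoll instance.
- Parameters
- size: how many file descriptors the process expects to monitor. Since Linux 2.6.8 the value is ignored (the kernel sizes its structures dynamically), but it must still be greater than zero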
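There is also a variant, epoll_create1(int flags). flags can be 0 or EPOLL_CLOEXEC; with 0 it behaves like epoll_create, minus the obsolete size argument.
With flags = EPOLL_CLOEXEC, the close-on-exec flag is set on the epoll descriptor. A child created by fork still inherits the descriptor, but it is closed automatically when the process calls exec(), so the newly executed program cannot use the epoll instance.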
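As a minimal sketch of the two flag values, the helpers below create an epoll instance with EPOLL_CLOEXEC and read the descriptor flags back with fcntl. The names make_epoll_cloexec and has_cloexec are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>

/* Hypothetical helper: create an epoll instance with close-on-exec set.
 * Returns the epoll fd, or -1 on error (errno is set by epoll_create1). */
int make_epoll_cloexec(void)
{
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Hypothetical helper: 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);   /* read the per-descriptor flags */
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Calling has_cloexec on a descriptor from epoll_create1(0) should report that the flag is not set, while one from make_epoll_cloexec() should report that it is.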
2. epoll_ctl
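A process uses this function to add the file descriptors it wants to monitor (and to modify or remove them later).
The set of registered fds is called the epoll set or the interest list.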
The ready list is a subset of the interest list.
Practical usage
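The original illustration is not available here, so the following is only a rough sketch of registering and unregistering a descriptor with epoll_ctl. The helper names watch_for_read and unwatch are hypothetical, and storing the fd in ev.data.fd is a common convention rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical helper: add fd to the interest list of epfd and ask to be
 * told when it becomes readable. Returns 0 on success, -1 on error. */
int watch_for_read(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;   /* interested in "readable" */
    ev.data.fd = fd;       /* handed back to us by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: remove fd from the interest list of epfd. */
int unwatch(int epfd, int fd)
{
    assert(epfd >= 0 && fd >= 0);
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}
```

Registering the same fd twice fails with EEXIST, and removing an fd that is not registered fails with ENOENT, so both helpers simply pass the return value of epoll_ctl through.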
3. epoll_wait
Through the epoll_wait system call, a thread learns which events have been triggered in the epoll set / interest list.

| name | description |
| --- | --- |
| epfd | file descriptor created by epoll_create(), identifying the epoll instance |
| evlist | array of epoll_event; filled in when epoll_wait returns, showing which fds are in the ready list |
| maxevents | length of evlist |
| timeout | how long to block (ms) |
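timeout
- 0: return right after checking; never blocks
- -1: the process waits (sleeps) indefinitely until an event occurs or a signal interrupts the call
- n > 0: return when an event occurs or after n milliseconds, whichever comes first
return value
- -1: an error occurred; errno holds the error code
- 0: no fd is in the ready list (the timeout expired)
- n > 0: at least one fd is in the ready list; the call returns the number of file descriptors ready for the requested I/O, and evlist can then be checked to see which fds have events.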
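The parameters and return values above might be combined as in the sketch below. handle_ready and MAX_EVENTS are hypothetical names, and the code assumes each fd was registered with its own number stored in data.fd.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 8   /* assumed capacity of evlist for this sketch */

/* Hypothetical helper: wait up to timeout_ms, then read once from every
 * readable fd. Returns what epoll_wait returned: -1 on error, 0 when no fd
 * became ready, otherwise the number of ready fds. */
int handle_ready(int epfd, int timeout_ms)
{
    assert(epfd >= 0);
    struct epoll_event evlist[MAX_EVENTS];
    int n = epoll_wait(epfd, evlist, MAX_EVENTS, timeout_ms);

    for (int i = 0; i < n; i++) {          /* skipped when n is 0 or -1 */
        if (evlist[i].events & EPOLLIN) {
            char buf[256];
            ssize_t got = read(evlist[i].data.fd, buf, sizeof buf);
            (void)got;                     /* a real program would use the data */
        }
    }
    return n;
}
```

With timeout_ms set to 0 this acts as a non-blocking check, and with -1 it sleeps until something is ready.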
epoll pitfalls
File descriptors
epoll is closely tied to fds, so we need to understand fds first.
A process refers to I/O streams through file descriptors. Each process has its own fd table with two fields, flags and a pointer; the only flag is close-on-exec, which is covered later.
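A descriptor is created by system calls such as open, pipe, and socket, or inherited through fork.
A file descriptor is closed when the process exits or calls close. There is one more case: a descriptor marked close-on-exec is closed automatically when the process calls exec().
Suppose Process B is forked from A. B starts with its own copy of the descriptor and A keeps using its own; once B calls exec(), B's copy is closed and the newly executed program cannot use it.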
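To make the timing concrete, here is a small sketch showing that a close-on-exec descriptor is still open in a forked child that has not called exec(). set_cloexec and fd_usable_in_child are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: mark fd close-on-exec. Returns 0 or -1. */
int set_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: fork, and let the child report whether fd is still
 * open. Returns 1 if it is, 0 if not, -1 on error. */
int fd_usable_in_child(int fd)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: no exec() has happened, so the copied fd is still valid. */
        _exit(fcntl(fd, F_GETFD) != -1 ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The descriptor would only disappear in the child after a successful exec(), which this sketch does not perform.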
The kernel also maintains an open file table that records every open file (if a file is opened by two processes, it has two entries).
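When a process is forked, its descriptors are copied too and point to the same open file table entries; changing the offset through one of them affects all the copies (they are linked).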
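The shared offset can be observed with a short experiment: a child moves the file offset and the parent then reads it back. This is only a sketch; open_temp_file, offset_after_child_seek, and the /tmp path template are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: open an unnamed temporary file (path is assumed). */
int open_temp_file(void)
{
    char path[] = "/tmp/fd_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd != -1)
        unlink(path);   /* the file lives on while fd stays open */
    return fd;
}

/* Hypothetical helper: the child seeks fd to child_pos, then the parent
 * reports the offset it sees. Returns that offset, or (off_t)-1 on error. */
off_t offset_after_child_seek(int fd, off_t child_pos)
{
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid == -1)
        return (off_t)-1;
    if (pid == 0)
        _exit(lseek(fd, child_pos, SEEK_SET) == child_pos ? 0 : 1);
    if (waitpid(pid, NULL, 0) == -1)
        return (off_t)-1;
    return lseek(fd, 0, SEEK_CUR);   /* query the offset without moving it */
}
```

Because parent and child share one open file table entry, the parent should see the offset the child set.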
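inode table
An inode is a file system data structure that records information about a file system object.
The information includes:
- location: which block or disk holds the data
- attributes of the file or directory
- additional metadata such as access time, owner, permissions…
Every file in a file system has an inode entry, identified by an inode number, which refers to the file.
The inode table records the mapping from inode numbers to inode structures.
For example, Process A opens abc.txt and gets fd5, and Process B opens the same file and gets fd10; although they point to different entries in the open file table, both ultimately refer to the same file.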
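A rough way to see this layering is to open one file twice and compare the results. For simplicity the sketch uses a single process instead of Process A and Process B; open_twice, same_inode, and the /tmp path template are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file and open it twice, giving
 * two separate open file table entries. Returns 0 on success, -1 on error. */
int open_twice(int fds[2])
{
    assert(fds != NULL);
    char path[] = "/tmp/inode_demo_XXXXXX";
    fds[0] = mkstemp(path);
    if (fds[0] == -1)
        return -1;
    fds[1] = open(path, O_RDONLY);   /* second, independent open */
    unlink(path);
    if (fds[1] == -1) {
        close(fds[0]);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: 1 if both fds refer to the same inode, 0 if not,
 * -1 on error. */
int same_inode(int fd1, int fd2)
{
    struct stat a, b;
    if (fstat(fd1, &a) == -1 || fstat(fd2, &b) == -1)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

The two descriptors share an inode but not an offset: seeking on one leaves the other where it was, unlike the descriptors copied by fork.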
epoll internals
Process A opens two different files.
Process A calls epoll_create to create an epoll instance, with fd9 as its file descriptor.
epoll_ctl adds the fds to monitor; here fd0 is added to the interest list.
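If Process B is forked at this point, B inherits A's fd table, and even fd9 is shared.
Whatever Process A adds to the interest list, Process B is notified about it as well.

Even if a process closes a file descriptor that epoll is watching, it can still receive notifications for it: the registration is removed only after every descriptor that refers to the same open file table entry has been closed.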
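This pitfall can be reproduced in a single process, under the assumption that dup() stands in for the copy a forked process would hold. ready_after_close is a hypothetical name for this sketch.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: register the read end of a pipe, make it
 * readable, close the registered fd, then poll once without blocking.
 * If keep_dup is 1, a duplicate of the fd stays open across the close.
 * Returns the number of events epoll_wait reports, or -1 on error. */
int ready_after_close(int keep_dup)
{
    assert(keep_dup == 0 || keep_dup == 1);
    int p[2], dupfd = -1, n = -1;
    struct epoll_event ev = {0}, out[1];

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];

    if (epfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {
        if (keep_dup)
            dupfd = dup(p[0]);   /* second reference to the same open file */
        close(p[0]);             /* the registered fd number is now closed */
        p[0] = -1;
        n = epoll_wait(epfd, out, 1, 0);
    }

    if (p[0] != -1)
        close(p[0]);
    if (dupfd != -1)
        close(dupfd);
    close(p[1]);
    if (epfd != -1)
        close(epfd);
    return n;
}
```

With a duplicate still open, the event is expected to be reported for an fd number that the caller has already closed; without one, the registration disappears together with the descriptor. Removing the fd with EPOLL_CTL_DEL before closing it is one way to avoid the surprise.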
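Comparison with select / poll
select / poll have O(n) complexity: every check scans all the descriptors. For a web server, that means checking every client each time.
epoll only needs a call to epoll_wait, and everything it returns is an fd on which an event has occurred.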
As a result, the cost of epoll is O(number of events that have occurred) and not O(number of descriptors being monitored) as was the case with select/poll.
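level trigger / edge trigger
- level trigger (an I/O event is produced whenever the condition holds)
- edge trigger (an I/O event is produced once when the state changes)
By default epoll provides level-triggered notifications: each call to epoll_wait returns only the ready list. For example, if only fd2 and fd3 are ready, it returns just [fd2, fd3].

Sometimes, however, we only want to be told when an fd's state changes, regardless of whether it is still ready; that is, we want edge-triggered notifications. We can request them by ORing the EPOLLET flag into the event bitmask.
edge trigger vs level trigger:
- level trigger: focuses on the condition (ready); as long as the fd is ready, you keep being notified.
- edge trigger: you are notified only once, when the state changes.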
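The difference shows up when epoll_wait is called twice without consuming the data in between. The sketch below is one way to observe it; second_wait_count is a hypothetical name, and a pipe stands in for any watched fd.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical experiment: make a pipe readable, call epoll_wait twice
 * without reading, and return what the second call reports.
 * Pass 0 for level-triggered or EPOLLET for edge-triggered. */
int second_wait_count(uint32_t extra_flags)
{
    assert((extra_flags & ~(uint32_t)EPOLLET) == 0);
    int p[2], n = -1;
    struct epoll_event ev = {0}, out[1];

    if (pipe(p) == -1)
        return -1;
    int epfd = epoll_create1(0);
    ev.events = EPOLLIN | extra_flags;   /* OR the trigger mode into the mask */
    ev.data.fd = p[0];

    if (epfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1 &&
        epoll_wait(epfd, out, 1, 0) == 1) {   /* first notification */
        n = epoll_wait(epfd, out, 1, 0);      /* data is still unread */
    }

    close(p[0]);
    close(p[1]);
    if (epfd != -1)
        close(epfd);
    return n;
}
```

In level-triggered mode the second call should report the fd again because it is still readable; with EPOLLET it should report nothing until new data arrives. This is why edge-triggered code usually reads until EAGAIN on a non-blocking fd.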
References
The method to epoll’s madness
边缘触发(Edge Trigger)和条件触发(Level Trigger)
深入理解 Linux 的 epoll 機制