This article provides a good introduction to epoll. The man pages referred to are epoll_create, epoll_ctl and epoll_wait.
The design below is assumed to be non-blocking. Among all the epoll syscalls, only epoll_wait can potentially block, but it can be made non-blocking by passing 0 as the timeout. In the first iteration only socketpair and eventfd can be monitored with epoll, but this can be extended to other file descriptions afterwards.
- epoll_ctl: Add, update and delete fd from the interest list of an epoll instance, and throw an error when appropriate.
- events we support occurring on the file description (relevant to the events field of the epoll_event struct below):
  - EPOLLIN/EPOLLOUT: Check if input or output happened for eventfd and socketpair file descriptions.
  - EPOLLRDHUP: Check if the peer of a socketpair is dropped.
- epoll_wait (This should be achievable after the two steps above are completed)

General description
epoll_ctl modifies the interest list stored in the epoll instance referred to by epfd.
Function parameters for epoll_ctl
- epfd: The file descriptor of an epoll instance. An epoll instance can be created using epoll_create1.
- op: The operation to be performed. One of the three flags below is used here:
  - EPOLL_CTL_ADD: Add the file descriptor fd to the interest list. The set of events that we are interested in monitoring is pointed to by event.
  - EPOLL_CTL_MOD: Modify the event settings for fd, which means replacing the original epoll_event associated with fd with event.
  - EPOLL_CTL_DEL: Remove fd from the interest list.
- fd: The file descriptor in the interest list that should be modified.
- event: A pointer to an epoll_event.
  - If EPOLL_CTL_ADD is used, fd will be stored in the interest list together with this epoll_event.
  - If EPOLL_CTL_MOD is used, the old epoll_event associated with fd will be replaced with this one.
  - If EPOLL_CTL_DEL is used, event will be ignored.

Field description for epoll_event:
- events: A bit mask specifying the events that we are interested in monitoring for fd.
  - EPOLLIN: read is possible on the file description.
  - EPOLLOUT: write is possible on the file description.
  - EPOLLRDHUP: Stream socket peer closed connection, or shut down writing half of connection.
  - EPOLLET: Employ edge-triggered event notification. (Explained in this section)
  - EPOLLONESHOT, EPOLLERR, EPOLLHUP … (TODO: add the remaining unsupported flags)
- u64: The user can freely decide what to store in it, but it should only be a u64 (a pointer will be rejected).

There are two models of notification for epoll: level-triggered and edge-triggered.
By default, epoll employs level-triggered event notification, and edge-triggered event notification can be enabled using the EPOLLET flag. Since tokio uses EPOLLET, only edge-triggered event notification will be implemented in the first iteration.
This example is written to give a brief idea of how epoll is used and will be converted to a test after epoll_wait is completed.
If both EPOLLIN and EPOLLOUT are set, the file description is considered ready if either input or output happens.
If an event is added to the ready list but is not returned as ready by epoll_wait, the notification should be provided by the next epoll_wait call.
If an fd is dupped and the initial file descriptor value is closed, notification will still be provided as long as it is not removed from the interest list through epoll_ctl.
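The "ready if either input or output happens" rule can be read as a simple mask intersection. The following is a minimal sketch under that reading, not the actual implementation: ready_flags is a hypothetical helper name, while the flag values are the ones Linux defines in sys/epoll.h.

```rust
// Flag values as defined by Linux in <sys/epoll.h>.
const EPOLLIN: u32 = 0x001;
const EPOLLOUT: u32 = 0x004;
const EPOLLRDHUP: u32 = 0x2000;

/// Hypothetical helper: returns the subset of the registered `interest`
/// mask that is set in the `current` readiness mask, or `None` if nothing
/// the caller asked for is ready.
fn ready_flags(interest: u32, current: u32) -> Option<u32> {
    let ready = interest & current;
    if ready != 0 {
        Some(ready)
    } else {
        None
    }
}
```

With this helper, a registration for EPOLLIN | EPOLLOUT counts as ready as soon as either bit is set in the current state, and flags that were never registered are not reported.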
Edge case for EPOLLRDHUP:
The readiness of the file description will be checked during insertion or when any event happens.
Setting: register using EPOLLIN EPOLLOUT EPOLLRDHUP EPOLLET
Test: don't do anything on socketpair.
Test: write to socketpair
Test case: register -> write -> epoll_wait -> deregister -> epoll_wait
epoll_wait returns EPOLLIN and EPOLLOUT (without the deregister, nothing will be returned in the second epoll_wait)
Test case: write to one side, epoll_wait, then close the other side.
Every time a file descriptor is registered or an epoll_event is modified, we should return the flags representing the current state.
We return an epoll_return only if there has been an event since the last return.
epoll_wait returns EPOLLIN and EPOLLOUT (if an event occurred, the readiness of all flags is checked again).
In socketpair, only the peer fd will be notified for all events (read/write/close).
In socketpair, read will trigger a notification when the read call empties the buffer, although it is possible to have a notification when the buffer is not completely empty. https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/epoll.20notification.20on.20socketpair.20write.20unblock/near/459694798
In edge-triggered mode, the moment a file description is registered with epoll, it will trigger a notification. But if multiple epfds have registered this file description, it will only wake up the one that registered it.
If multiple threads (or processes, if child processes have inherited the epoll file descriptor across fork(2)) are blocked in epoll_wait(2) waiting on the same epoll file descriptor and a file descriptor in the interest list that is marked for edge-triggered (EPOLLET) notification becomes ready, just one of the threads (or processes) is awoken from epoll_wait(2). This provides a useful optimization for avoiding "thundering herd" wake-ups in some scenarios.
To be perhaps clearer, epoll_wait() won't return an fd unless something changed on that socket, but if something did change, it returns all the flags representing the current state.
EPOLLERR:
Weird case
Unsupported operation
Only edge-triggered notification is supported, so if the EPOLLET flag is not used, throw_unsup_format will be used.
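As a rough illustration of this check, the sketch below validates an events mask before registration. It is only a model: check_supported is a hypothetical name, it returns an Err string where the real code would use throw_unsup_format, and rejecting every flag outside EPOLLIN, EPOLLOUT, EPOLLRDHUP and EPOLLET is an assumption based on the list of unsupported flags. The flag values are the Linux ones.

```rust
// Flag values as defined by Linux in <sys/epoll.h>.
const EPOLLIN: u32 = 0x001;
const EPOLLOUT: u32 = 0x004;
const EPOLLRDHUP: u32 = 0x2000;
const EPOLLET: u32 = 1 << 31;

/// Hypothetical stand-in for the unsupported-operation check.
/// The real implementation would use throw_unsup_format instead of
/// returning an error string.
fn check_supported(events: u32) -> Result<(), String> {
    // Level-triggered registration (no EPOLLET) is not supported.
    if events & EPOLLET == 0 {
        return Err("epoll_ctl: level-triggered notification is not supported".to_string());
    }
    // Assumption: any flag outside the supported set is rejected as well.
    let supported = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLET;
    let unknown = events & !supported;
    if unknown != 0 {
        return Err(format!("epoll_ctl: unsupported flags {:#x}", unknown));
    }
    Ok(())
}
```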
Related structs
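epoll_ctl_add
- If fd is already in the interest list, return EEXIST.
- Add the epoll_event to the interest list.
- Add the epoll_event to the file description.
- Poll, and add an epoll_return to the ready list if applicable.
epoll_ctl_mod
- If fd is not in the interest list, return ENOENT.
- Poll, and add an epoll_return to the ready list if applicable.
epoll_ctl_del
- If fd is not in the interest list, return ENOENT.

The "poll" operation here means checking the "readiness" of the file description (not the poll syscall).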
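The error handling of the three operations can be modelled with a map keyed by fd. This is a minimal sketch, assuming the interest list is keyed by the file descriptor value; the type and method names (Epoll, EpollEvent, CtlError, ctl_add, ctl_mod, ctl_del) are hypothetical, and the "poll" step that may push an entry to the ready list is deliberately left out.

```rust
use std::collections::BTreeMap;

/// Errors returned by the model; they correspond to EEXIST and ENOENT.
#[derive(Debug, PartialEq)]
enum CtlError {
    Eexist,
    Enoent,
}

/// Hypothetical model of one interest-list entry.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EpollEvent {
    events: u32,
    data: u64,
}

/// Hypothetical model of an epoll instance: the interest list keyed by fd.
#[derive(Default)]
struct Epoll {
    interest_list: BTreeMap<i32, EpollEvent>,
}

impl Epoll {
    /// EPOLL_CTL_ADD: fails with EEXIST if fd is already registered.
    fn ctl_add(&mut self, fd: i32, event: EpollEvent) -> Result<(), CtlError> {
        if self.interest_list.contains_key(&fd) {
            return Err(CtlError::Eexist);
        }
        self.interest_list.insert(fd, event);
        Ok(())
    }

    /// EPOLL_CTL_MOD: replaces the stored event, ENOENT if fd is unknown.
    fn ctl_mod(&mut self, fd: i32, event: EpollEvent) -> Result<(), CtlError> {
        match self.interest_list.get_mut(&fd) {
            Some(slot) => {
                *slot = event;
                Ok(())
            }
            None => Err(CtlError::Enoent),
        }
    }

    /// EPOLL_CTL_DEL: removes the entry, ENOENT if fd is unknown.
    fn ctl_del(&mut self, fd: i32) -> Result<(), CtlError> {
        self.interest_list.remove(&fd).map(|_| ()).ok_or(CtlError::Enoent)
    }
}
```

Keying by fd makes the uniqueness of file descriptor values within one epoll instance fall out of the data structure, while two separate Epoll values can still hold the same fd.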
For edge-triggered notification, we need to add an EpollReturn to the ready list immediately after an event occurs on the file description. To achieve this, during read/write/close, iterate through the epoll_events in the file description, and add a new EpollReturn to the ready list if applicable. If the EpollReturn entry already exists, modify the event mask of that entry.
Notification should not be returned if there is no event between two epoll_wait calls on the same epoll instance. So an epoll_return entry will be removed after being returned by epoll_wait.
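The insert-or-update behaviour of the ready list might look roughly like the sketch below. It is a simplified model, not the real data structure: ReadyList, notify and take_all are hypothetical names, entries are keyed by fd, and an existing entry's mask is overwritten with the current readiness (one possible reading of "modify the event mask").

```rust
use std::collections::BTreeMap;

// Flag values as defined by Linux in <sys/epoll.h>.
const EPOLLIN: u32 = 0x001;
const EPOLLOUT: u32 = 0x004;

/// Hypothetical ready list: maps a registered fd to the event mask to report.
#[derive(Default)]
struct ReadyList {
    entries: BTreeMap<i32, u32>,
}

impl ReadyList {
    /// Called after a read/write/close on a file description.
    /// `interest` is the mask registered for `fd`; `current` is the
    /// readiness of the file description right now.
    fn notify(&mut self, fd: i32, interest: u32, current: u32) {
        let ready = interest & current;
        if ready == 0 {
            return; // nothing the caller registered for is ready
        }
        // Insert a new entry, or overwrite the mask of an existing one.
        self.entries.insert(fd, ready);
    }

    /// Hands out all pending entries and forgets them, so a second call
    /// returns nothing unless a new event happened in between.
    fn take_all(&mut self) -> Vec<(i32, u32)> {
        std::mem::take(&mut self.entries).into_iter().collect()
    }
}
```

Two events on the same fd before a wait collapse into a single entry carrying the latest mask, and draining the list means no notification is repeated without a new event.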
A list of details:
- For each EPOLL_CTL_ADD call, exactly one epoll_event will be inserted into the interest list of that epoll instance.
- Each epoll_event in the same epoll instance must have a unique file descriptor value. But it is valid for two epoll_events with the same file descriptor value to exist in two different epoll instances.
- … epoll_create1 call.
- EpollReturn and EpollEvent have a one-to-one relationship. An EpollEvent can only generate or update one and only one EpollReturn in the ready list.
- All EpollEvents of the same Epoll interest list share the same ready_list.
- ready_list contains an EpollReturn only if the epoll_event is considered "ready".

Function parameters:
- timeout: How long epoll_wait can block. Currently only 0 will be supported.

When epoll_wait is called, we just return the ready list, but the operations below need to be done too:
- If no file descriptor points to a file description in the interest list, that event should never be returned as ready. To achieve this, before returning from epoll_wait, we can attempt to upgrade the file description in each EpollReturn to check whether the file description is closed. If it is closed, that particular EpollReturn entry will be discarded.
- After an epoll_return is successfully returned, it will be cleared from the ready_list, so no notification will be provided by the next epoll_wait if no event happened between the two epoll_wait calls. (We do this instead of clearing the whole ready_list because some epoll_returns may not be returned due to the limit imposed by maxevents.)
- If the number of ready events > maxevents, we will only return the first maxevents of them.
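The three rules above can be sketched together in a small model. This is an illustration under stated assumptions rather than the real implementation: FileDescription is an empty stand-in type, the ready list is a VecDeque of hypothetical EpollReturn entries holding a Weak reference, and the function returns (events, data) pairs instead of filling a user buffer.

```rust
use std::collections::VecDeque;
use std::rc::Weak;

/// Hypothetical stand-in for an open file description.
struct FileDescription;

/// Hypothetical ready-list entry: a weak handle to the file description
/// plus the event mask and user data to report.
struct EpollReturn {
    file: Weak<FileDescription>,
    events: u32,
    data: u64,
}

/// Non-blocking model of epoll_wait (timeout 0): returns at most
/// `maxevents` (events, data) pairs from the front of the ready list.
fn epoll_wait(ready_list: &mut VecDeque<EpollReturn>, maxevents: usize) -> Vec<(u32, u64)> {
    let mut out = Vec::new();
    while out.len() < maxevents {
        match ready_list.pop_front() {
            None => break,
            Some(entry) => {
                // If the upgrade fails, the file description is closed:
                // the entry is discarded without being reported.
                if entry.file.upgrade().is_some() {
                    out.push((entry.events, entry.data));
                }
            }
        }
    }
    // Entries beyond maxevents stay queued for the next call.
    out
}
```

Only the entries that were popped are removed, so anything left over because of the maxevents limit is still reported by a later call.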
Enhancement: