# Time Sequence Constraint

## flush CLOSE requests when anon fd is closed

When an anonymous fd gets closed, all CLOSE requests associated with this anonymous fd are flushed from the xarray, to avoid the following race.

```
P1                                              P2
------------                                    -----------
umount
  enqueue CLOSE request with an object_id
                                                close anon fd
                                                  free the object_id
                                                this object_id is reused for another blob
                                                read one CLOSE request with outdated object_id
                                                close anon fd for other blob # oops
```

However, this mechanism cannot cover the race described by the following sequence:

```
P1                                              P2
------------                                    -----------
(daemon) read one CLOSE request,
and come back to the user space
                                                (daemon) close anon fd
                                                  flush CLOSE requests still inside the xarray
                                                    find no CLOSE requests in the xarray
                                                this object_id is reused for another blob
go on to process this CLOSE request
  close anon fd for other blob # oops
```

In this case, the user daemon is responsible for avoiding the above sequence.

## race between reading/flush requests

The user daemon reads "/dev/cachefiles" to fetch one request to handle, in which case it searches through the xarray to find a valid request. On the error path, the request is removed from the xarray directly, since the request has already been marked as non-CACHEFILES_REQ_NEW previously. Besides, CLOSE requests are removed from the xarray immediately once they are read by the user daemon, since CLOSE requests have no reply.

The procedure can be described as

```
# for CLOSE requests
xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;
copy this request to user buffer
xa_erase(..., id)
```

```
# for other request types
xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;
copy this request to user buffer
on error path:
  xa_erase(..., id)
```

The above operations on the xarray (searching the xarray, and xa_erase()) are not atomic as a whole, so they can race with the flushing of CLOSE requests in cachefiles_ondemand_fd_release(). Consider the following sequence:

```
P1                                              P2
------------                                    -----------
xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;
copy this request to user buffer
                                                close anon fd
                                                  xa_lock
                                                  flush CLOSE requests
                                                  xa_unlock

                                                another request may be enqueued into
                                                the xarray, reusing the previous id
xa_erase(..., id) # oops
```

This can be fixed by only flushing CLOSE requests marked with CACHEFILES_REQ_NEW in cachefiles_ondemand_fd_release().

For other request types, although the operations on the xarray (searching the xarray, and xa_erase() on the error path) are also not atomic as a whole, there is no race with cachefiles_ondemand_fd_release(), since cachefiles_ondemand_fd_release() only flushes CLOSE requests.

## constraint for failover

When the anon fd is closed prematurely, i.e. while there are still inflight READ requests, the cachefiles_object switches to *close* state, and the failover mechanism automatically resends an OPEN request to reallocate an anon fd (in which case the cachefiles_object switches to *opening* state). Once the OPEN request is completed (with a successful copen replied), the cachefiles_object switches to *open* state.

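The state transitions named in this paragraph can be sketched as a small C model. This is only an illustration under stated assumptions: `model_state`, `model_event` and `model_next_state()` are hypothetical names that do not exist in the kernel, and the model covers only the transitions described above, not locking or request handling.

```c
/* Hypothetical model of the failover state machine; not kernel code. */
enum model_state { MODEL_CLOSE, MODEL_OPENING, MODEL_OPEN };

enum model_event {
	MODEL_FD_CLOSED,	/* anon fd gets closed */
	MODEL_REOPEN,		/* failover resends an OPEN request */
	MODEL_COPEN_OK		/* a successful copen is replied */
};

/*
 * Return the next state, or -1 when the paragraph above does not
 * describe a transition for this (state, event) pair.
 */
static int model_next_state(enum model_state s, enum model_event e)
{
	switch (e) {
	case MODEL_FD_CLOSED:
		/* closing the anon fd always leads to the close state */
		return MODEL_CLOSE;
	case MODEL_REOPEN:
		/* failover only reopens an object in close state */
		return s == MODEL_CLOSE ? MODEL_OPENING : -1;
	case MODEL_COPEN_OK:
		/* a successful copen completes a pending OPEN request */
		return s == MODEL_OPENING ? MODEL_OPEN : -1;
	}
	return -1;
}
```

For example, `model_next_state(MODEL_OPEN, MODEL_FD_CLOSED)` yields `MODEL_CLOSE`, and feeding `MODEL_REOPEN` followed by `MODEL_COPEN_OK` brings the object back to `MODEL_OPEN`.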
The current implementation doesn't cover the potential race described by the following sequence:

```
P1
------------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state

reply (a successful) copen
  object switches to open state
  # oops object is in open state with invalid object_id (CACHEFILES_ONDEMAND_ID_CLOSED)
```

This cannot be fixed by the following attempt, which makes the object switch to open state only when the object is in opening state.

```
cachefiles_ondemand_copen
  cmpxchg(&req->object->state,
          CACHEFILES_OBJECT_STATE_opening,
          CACHEFILES_OBJECT_STATE_open)
```

This is because the object may be in opening state on the error path, as described by the following sequence:

```
P1                                              P2
------------                                    -----------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state
                                                since object is in close state now,
                                                reopen again, and enqueue a new OPEN
                                                request, and object switches to
                                                opening state
reply (a successful) copen
  object switches to open state
  # oops object is in open state with invalid object_id (CACHEFILES_ONDEMAND_ID_CLOSED)
```

Besides, there are other possible sequences that interfere with the object state machine.

```
P1                                              P2
------------                                    -----------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state
                                                since object is in close state now,
                                                reopen again, and enqueue a new OPEN
                                                request, and object switches to
                                                opening state

                                                (daemon) read OPEN request
reply (a successful) copen
  object switches to open state
                                                reply (a failed) copen
                                                  object switches to close state # oops
```

A potential fix is to also flush OPEN requests when the anon fd is closed.

1. If cachefiles_ondemand_fd_release() runs before cachefiles_ondemand_copen(), i.e. the user daemon closes the anon fd before replying copen, then inside cachefiles_ondemand_fd_release() the object switches to close state, but the OPEN request itself is not removed from the xarray (since no request type with a reply can be flushed, considering the following sequence).

   ```
   P1                                              P2
   -------------------------------                 -----------------------------
   cachefiles_ondemand_daemon_read                 cachefiles_ondemand_fd_release
     xa_lock
     read OPEN request, and its (msg) id
     xa_unlock
                                                     xa_lock
                                                     flush READ requests
                                                     xa_unlock

                                                   cachefiles_ondemand_send_req
                                                     enqueue another request, and
                                                     reuse the former (msg) id
     error encountered, and
     xa_erase(..., id)
   ```

   cachefiles_ondemand_copen(), in turn, makes no change to the object state when cachefiles_ondemand_fd_release() has been called before.

   ```
   cachefiles_ondemand_copen()
     if (req->error) # i.e. cachefiles_ondemand_fd_release() has been called before
       return
     else
       make the object switch to open/close state according to the copen
   ```

2. If the user daemon replies copen before closing the anon fd, then cachefiles_ondemand_copen() makes the object switch to open/close state according to the copen
   - if a successful copen is replied, the object switches to open state in cachefiles_ondemand_copen()
   - otherwise the object stays in opening state until the anon fd gets closed, at which point the object switches to close state in cachefiles_ondemand_fd_release(). If the object instead switched to close state inside cachefiles_ondemand_copen() when a bad copen is received, it would race with the setting of close state inside cachefiles_ondemand_fd_release().

However, the above fix seems quite complicated. Thus the current implementation doesn't cover the race described above, and the user daemon is responsible for avoiding the time sequence in which the anon fd gets closed before replying copen.
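To make the two orderings of the potential fix easier to compare, here is a minimal user-space C model. It is a sketch only, not kernel code: the `fix_*` names, the struct layouts, and the use of `-1` as the error value are assumptions made for illustration. It follows the list above in that a failed copen leaves the object in opening state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the object state and a pending OPEN request. */
enum fix_state { FIX_CLOSE, FIX_OPENING, FIX_OPEN };

struct fix_object {
	enum fix_state state;
};

struct fix_open_req {
	struct fix_object *object;
	int error;	/* non-zero once the anon fd has been closed */
};

/*
 * Models cachefiles_ondemand_fd_release(): the object switches to close
 * state, and a still pending OPEN request (if any) is flagged with an
 * error instead of being removed.
 */
static void fix_fd_release(struct fix_object *obj, struct fix_open_req *pending)
{
	obj->state = FIX_CLOSE;
	if (pending != NULL)
		pending->error = -1;	/* assumed error value */
}

/*
 * Models cachefiles_ondemand_copen(): do nothing if the anon fd was
 * closed first; otherwise only a successful copen changes the state.
 */
static void fix_copen(struct fix_open_req *req, bool success)
{
	if (req->error)
		return;
	if (success)
		req->object->state = FIX_OPEN;
	/* failed copen: stay in opening state until the anon fd is closed */
}
```

In this model, calling `fix_fd_release()` before `fix_copen()` leaves the object in `FIX_CLOSE` even when the copen is successful, which corresponds to case 1; calling `fix_copen()` first corresponds to case 2.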
