
Time Sequence Constraint

flush CLOSE requests when anon fd is closed

When an anonymous fd gets closed, all CLOSE requests associated with it are flushed from the xarray, to avoid the following race.

P1                                          P2
------------                                -----------
                                            umount
                                              enqueue CLOSE request with an object_id
close anon fd
  free the object_id

this object_id is reused for another blob

read one CLOSE request with the outdated object_id
  close anon fd for the other blob # oops

However, this mechanism cannot cover the race described by the following sequence:

P1                                          P2
------------                                -----------
                                            (daemon) read one CLOSE request,
                                            and come back to the user space
(daemon) close anon fd
  flush CLOSE requests still inside the xarray
    find no CLOSE requests in the xarray

this object_id is reused for another blob

                                            go on to process this CLOSE request
                                              close anon fd for other blob # oops

In this case, the user daemon is responsible for avoiding the above sequence.
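The flush-on-close behaviour described above can be sketched in user-space C. This is a simulation, not the kernel code: the request table, the ids, and the helper names are all invented stand-ins for the xarray and its helpers.

```c
#include <stddef.h>

enum req_type { REQ_NONE, REQ_OPEN, REQ_READ, REQ_CLOSE };

struct req {
    enum req_type type;
    int object_id;          /* the object this request targets */
};

#define NREQ 8
struct req table[NREQ];     /* stand-in for the kernel xarray */

/* Enqueue a request in the first free slot and return its id (index).
 * Freed ids get reused, which is exactly what makes stale requests
 * dangerous. */
int enqueue(enum req_type type, int object_id)
{
    for (int id = 0; id < NREQ; id++) {
        if (table[id].type == REQ_NONE) {
            table[id].type = type;
            table[id].object_id = object_id;
            return id;
        }
    }
    return -1;
}

/* On anon fd close: flush every CLOSE request targeting this object,
 * so a later reuse of the object_id cannot match a stale CLOSE
 * request.  Returns the number of requests flushed. */
int flush_close_requests(int object_id)
{
    int n = 0;
    for (int id = 0; id < NREQ; id++) {
        if (table[id].type == REQ_CLOSE &&
            table[id].object_id == object_id) {
            table[id].type = REQ_NONE;
            n++;
        }
    }
    return n;
}
```

With the stale CLOSE requests gone, a reused object_id can no longer be matched against a request that was enqueued for the old blob.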

race between reading and flushing requests

The user daemon reads "/dev/cachefiles" to fetch a request to handle, searching through the xarray for a valid request. On the error path, the request is removed from the xarray directly, since it has already been marked non-CACHEFILES_REQ_NEW previously. Besides, CLOSE requests are removed from the xarray immediately once they are read by the user daemon, since CLOSE requests have no reply. The procedure can be described as:

# for CLOSE requests

xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;

copy this request to user buffer

xa_erase(..., id)

# for other request types

xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;

copy this request to user buffer

on error path:
  xa_erase(..., id)
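The procedure above can be condensed into a user-space C sketch. Everything here (the table, the ids, the helper names) is an invented stand-in for the kernel code; the xa_lock/xa_unlock points are marked by comments.

```c
#include <stddef.h>

enum req_type { REQ_NONE, REQ_OPEN, REQ_READ, REQ_CLOSE };

#define NREQ 8
enum req_type table[NREQ];  /* stand-in for the kernel xarray */

/* Search for a valid request; in the kernel this runs under xa_lock,
 * but the returned id is used *after* the lock is dropped -- that
 * unlocked window is what the race below exploits. */
int read_request(void)
{
    /* xa_lock */
    for (int id = 0; id < NREQ; id++) {
        if (table[id] != REQ_NONE)
            return id;  /* xa_unlock; id held across the copy to user */
    }
    /* xa_unlock */
    return -1;
}

/* CLOSE requests are erased right after being read (they have no
 * reply); other request types reach this helper only on the error
 * path. */
void erase_request(int id)
{
    table[id] = REQ_NONE;   /* xa_erase(..., id) */
}
```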

The above operations on the xarray (searching the xarray, and xa_erase()) are not atomic as a whole, and thus race with flushing CLOSE requests in cachefiles_ondemand_fd_release(). Consider the following sequence:

P1                                          P2
------------                                -----------
xa_lock
search the xarray to find a valid request
xa_unlock

id = xas.xa_index;

copy this request to user buffer

                                            close anon fd
                                              xa_lock
                                              flush CLOSE requests
                                              xa_unlock

                                            another request may be enqueued into the xarray,
                                            reusing the previous id
xa_erase(..., id) # oops

This can be fixed by only flushing CLOSE requests marked with CACHEFILES_REQ_NEW in cachefiles_ondemand_fd_release().
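The fix can be sketched as follows. This is a user-space simulation: the `is_new` flag stands in for the CACHEFILES_REQ_NEW mark, and the table and helper names are invented.

```c
#include <stdbool.h>

enum req_type { REQ_NONE, REQ_OPEN, REQ_READ, REQ_CLOSE };

struct req {
    enum req_type type;
    int object_id;
    bool is_new;    /* stand-in for the CACHEFILES_REQ_NEW mark */
};

#define NREQ 8
struct req table[NREQ];

/* Daemon read path: pick the first NEW request and clear its mark,
 * as cachefiles_ondemand_daemon_read() does under xa_lock. */
int daemon_read(void)
{
    for (int id = 0; id < NREQ; id++) {
        if (table[id].type != REQ_NONE && table[id].is_new) {
            table[id].is_new = false;
            return id;  /* daemon now holds this id outside the lock */
        }
    }
    return -1;
}

/* The fix: on fd release, flush only CLOSE requests still marked NEW.
 * A CLOSE request already read by the daemon (mark cleared) is left
 * alone, so its slot cannot be reused while the daemon still holds
 * its id. */
int flush_new_close_requests(int object_id)
{
    int n = 0;
    for (int id = 0; id < NREQ; id++) {
        if (table[id].type == REQ_CLOSE && table[id].is_new &&
            table[id].object_id == object_id) {
            table[id].type = REQ_NONE;
            n++;
        }
    }
    return n;
}
```

Since the already-read CLOSE request survives the flush, the daemon's later xa_erase(..., id) can no longer hit an unrelated request that reused the slot.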

For other request types, although the operations on the xarray (searching, and xa_erase() on the error path) are likewise not atomic as a whole, there's no race with cachefiles_ondemand_fd_release(), since cachefiles_ondemand_fd_release() only flushes CLOSE requests.

constraint for failover

When the anon fd is closed prematurely (the cachefiles_object switches to close state), i.e. while there are still inflight READ requests, the failover mechanism automatically resends an OPEN request to reallocate an anon fd (the cachefiles_object switches to opening state). Once the OPEN request completes (with a successful copen replied), the cachefiles_object switches to open state.

The current implementation doesn't cover the potential race described by the following sequence:

P1
------------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state

reply (a successful) copen
  object switches to open state # oops object is in open state with invalid object_id (CACHEFILES_ONDEMAND_ID_CLOSED)

This cannot be fixed by the following attempt, which makes the object switch to open state only when the object is in opening state:

cachefiles_ondemand_copen
  cmpxchg(&req->object->state, CACHEFILES_OBJECT_STATE_opening, CACHEFILES_OBJECT_STATE_open)
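The attempt can be expressed with C11 atomics. This is a user-space sketch: the state names mirror the kernel's, but the type and helper here are invented.

```c
#include <stdatomic.h>

enum object_state {
    STATE_close,    /* anon fd closed */
    STATE_opening,  /* OPEN request in flight */
    STATE_open,     /* anon fd usable */
};

/* Switch opening -> open only if the object is still in opening
 * state; returns non-zero iff the transition happened. */
int copen_success(_Atomic int *state)
{
    int expected = STATE_opening;
    return atomic_compare_exchange_strong(state, &expected, STATE_open);
}
```

The cmpxchg rejects a copen that arrives after the fd is closed (close state), but it cannot tell a fresh opening state from a re-entered one, which is exactly the hole described next.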

This is because the object may be in opening state on the error path, as described by the following sequence:

P1                                          P2
------------                                -----------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state

                                            since object is in close state now,
                                            reopen again, and enqueue a new OPEN request,
                                            and object switches to opening state

reply (a successful) copen
  object switches to open state # oops object is in open state with invalid object_id (CACHEFILES_ONDEMAND_ID_CLOSED)

Besides, there's another possible sequence interfering with the object state machine.

P1                                          P2
------------                                -----------
when reopen is triggered,
object switches to opening state

(daemon) read OPEN request

close anon fd
  object switches to close state

                                            since object is in close state now,
                                            reopen again, and enqueue a new OPEN request,
                                            and object switches to opening state
                                            
                                            (daemon) read OPEN request
                                            reply (a successful) copen
                                              object switches to open state

reply (a fail) copen
  object switches to close state # oops

A potential fix is to also flush OPEN requests when the anon fd is closed.

  1. If cachefiles_ondemand_fd_release() runs before cachefiles_ondemand_copen(), i.e. the user daemon closes the anon fd before replying copen, then inside cachefiles_ondemand_fd_release() the object will switch to close state, but the OPEN request itself won't be removed from the xarray (since no request type that has a reply can be flushed, considering the following sequence).
P1                                          P2
------------                                -----------
cachefiles_ondemand_daemon_read             cachefiles_ondemand_fd_release
  xa_lock
  read OPEN request, and its (msg) id
  xa_unlock
                                              xa_lock
                                              flush OPEN requests
                                              xa_unlock

                                            cachefiles_ondemand_send_req
                                              enqueue another request, and
                                              reuse the former (msg) id
  error encountered, and
  xa_erase(..., id)

As for cachefiles_ondemand_copen(), it won't make any change to the object state when cachefiles_ondemand_fd_release() has been called before:

cachefiles_ondemand_copen()
    if (req->error) # i.e. cachefiles_ondemand_fd_release() has been called before
        return
    else
        make the object switch to open/close state according to the copen
  2. If the user daemon replies copen before closing the anon fd, then cachefiles_ondemand_copen() will make the object switch to open/close state according to the copen:
  • if a successful copen is replied, the object will switch to open state in cachefiles_ondemand_copen();
  • otherwise the object will stay in opening state until the anon fd gets closed, at which point it will switch to close state in cachefiles_ondemand_fd_release(). (If the object instead switched to close state inside cachefiles_ondemand_copen() when a bad copen is received, that would race with setting close state inside cachefiles_ondemand_fd_release().)

However, the above fix seems quite complicated. Thus the current implementation doesn't cover the above described race, and the user daemon is responsible for avoiding the time sequence in which the anon fd gets closed before copen is replied.