# ondemand concurrency timing

```
task                daemon
===========         ==========
send_req            close_devfd
                    close_anonfd
                    daemon_read
                    (for OPEN) copen (write on devfd)
                    (for READ) CACHEFILES_IOC_READ_COMPLETE ioctl (on anonfd)
```

## 1. task:send_req and daemon:close_devfd

When a process accessing an erofs file triggers ondemand, it adds the request to the xarray through cachefiles_ondemand_send_req(), i.e. enqueue request

```
cachefiles_ondemand_init_object
cachefiles_ondemand_clean_object
cachefiles_ondemand_read
    cachefiles_ondemand_send_req
```

When the daemon closes devfd, it has to flush the requests in the xarray, i.e. flush request

```
cachefiles_daemon_release
    cachefiles_flush_reqs
```

The enqueue request and flush request operations are synchronized through the CACHEFILES_DEAD bit of cache->flags, i.e.

```
# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        # flush request
```

```
# enqueue request
cachefiles_ondemand_send_req
    # if CACHEFILES_DEAD bit not set:
        # enqueue request
```

Two orderings need attention here.

1) On the enqueue side, a) test CACHEFILES_DEAD bit and b) enqueue request must be atomic as a whole, otherwise the following race can occur

```
/*
 * Stop enqueuing the request when daemon is dying. The
 * following two operations need to be atomic as a whole.
 *   1) check cache state, and
 *   2) enqueue request if cache is alive.
 * Otherwise the request may be enqueued after xarray has been
 * flushed, leaving the orphan request never being completed.
 *
 * CPU 1                        CPU 2
 * =====                        =====
 *                              test CACHEFILES_DEAD bit
 * set CACHEFILES_DEAD bit
 * flush requests in the xarray
 *                              enqueue the request
 */
```

A spinlock (xarray->xa_lock) is therefore used to guarantee this atomicity

```
# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if CACHEFILES_DEAD bit not set:
        # enqueue request
    xa_unlock
```

```
# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        xa_lock
        # flush request
        xa_unlock
```

2) Note that in the flush request path above, "set CACHEFILES_DEAD bit" is not placed inside the critical section of the lock (mainly to avoid modifying cachefiles_daemon_release()). Operations from the enqueue request path can therefore be interleaved between a) set CACHEFILES_DEAD bit and b) flush request. If these two operations are additionally reordered, the following sequence becomes possible

```
/*
 * Make sure the following two operations won't be reordered.
 *   1) set CACHEFILES_DEAD bit
 *   2) flush requests in the xarray
 * Otherwise the request may be enqueued after xarray has been
 * flushed, leaving the orphan request never being completed.
 *
 * CPU 1                        CPU 2
 * =====                        =====
 * flush requests in the xarray
 *                              test CACHEFILES_DEAD bit
 *                              enqueue the request
 * set CACHEFILES_DEAD bit
 */
```

To keep a) set CACHEFILES_DEAD bit and b) flush request in the flush request path from being reordered, a memory barrier is added between the two operations

```
# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        smp_mb();
        xa_lock
        # flush request
        xa_unlock
```

> Can the xa_lock in the flush request path serve as this memory barrier?
> The lock operation of a spinlock only implies read acquire semantics. Read acquire guarantees that the memory accesses issued after it also execute after it, which amounts to suppressing LoadLoad/LoadStore reordering
>
> ```
> read acquire
> ---------------------
> all memory operations
> stay below the line
> ```
>
> What is needed here, however, is to suppress Store[Load|Store] reordering (StoreLoad and StoreStore), so the read acquire semantics implied by xa_lock cannot solve this problem

## 2. task:send_req and daemon:close_anonfd

Similarly, when the daemon closes an anonfd, it has to flush the requests in the xarray that are related to that anonfd, mainly READ/CLOSE requests, i.e. flush request

Since the failover feature was introduced, closing an anonfd no longer needs to flush READ requests, but it still needs to flush CLOSE requests; for the reason CLOSE requests are flushed, see [flush CLOSE requests when anon fd is closed](https://hackmd.io/YNsTQqLcQYOZ4gAlFWrNcA#flush-CLOSE-requests-when-anon-fd-is-closed)

Here the enqueue request and flush request operations are synchronized through object->ondemand_id, i.e.

```
# flush request
cachefiles_ondemand_fd_release
    object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED
    # flush request
```

```
# enqueue request
cachefiles_ondemand_send_req
    # if object->ondemand_id valid (ondemand_id > 0):
        # enqueue request
```

Similarly, a) test ondemand_id and b) enqueue request in the enqueue request path must be atomic as a whole, so the spinlock (xarray->xa_lock) is used here as well to guarantee this atomicity

```
# flush request
cachefiles_ondemand_fd_release
    xa_lock
    object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED
    # flush request
    xa_unlock
```

```
# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if object->ondemand_id valid (ondemand_id > 0):
        # enqueue request
    xa_unlock
```

Later, once the failover feature introduced object state, this became

```
# flush request
cachefiles_ondemand_fd_release
    xa_lock
    set_object_close
    # flush request
    xa_unlock
```

```
# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if object is not in close state:
        # enqueue request
    xa_unlock
```

## 3. daemon: daemon_read and daemon: close_anonfd

cachefiles_ondemand_daemon_read() contains two operations: a) search xarray and b) erase xarray

```
# for other request types
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;
copy this request to user buffer

on error path:
    xa_erase(..., id)
```

For CLOSE requests, the erase xarray operation is executed as soon as a CLOSE request has been read

```
# for CLOSE requests
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;
copy this request to user buffer
xa_erase(..., id)
```

As shown above, the daemon_read path contains the two operations a) search xarray and b) erase xarray, and these two operations are not atomic as a whole. As described earlier, the daemon also flushes requests when it closes an anonfd, which can lead to the following sequence

```
P1                                      P2
------------                            -----------
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;
copy this request to user buffer
                                        close anon fd
                                            xa_lock
                                            flush related requests
                                            xa_unlock

        another request may be enqueued into the xarray,
        reusing the previous id

xa_erase(..., id) # oops
```

Wrapping a) search xarray and b) erase xarray of the daemon_read path in a single spinlock has two problems. First, it is awkward to implement, because the corresponding code in the daemon: close_anonfd path would have to be wrapped in the same spinlock. Second, the "copy this request to user buffer" step of the daemon_read path may involve further operations, for example cachefiles_ondemand_get_fd() is called for OPEN requests; these operations may block and must not be called while holding a spinlock.

The current fix is therefore that the daemon: close_anonfd path only flushes requests that carry the CACHEFILES_REQ_NEW mark

```
P1                                      P2
------------                            -----------
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;
copy this request to user buffer
                                        close anon fd
                                            xa_lock
                                            flush related requests
                                              with CACHEFILES_REQ_NEW marked
                                            xa_unlock

        the request processed by P1 is not flushed

xa_erase(..., id)
```

See [race between reading/flush requests](https://hackmd.io/YNsTQqLcQYOZ4gAlFWrNcA?view#race-between-readingflush-requests)

## 4. daemon: daemon_read and daemon: close_devfd

Similarly, the daemon also flushes all requests when it closes devfd, so can the race described above be triggered by daemon: close_devfd? The answer is no, because daemon: daemon_read and daemon: close_devfd never run in parallel. daemon: daemon_read is triggered by a read on devfd, and the release callback of a file only runs after the last reference to that file has been dropped; an in-flight read still holds such a reference, so as long as devfd is being read, it cannot have been closed yet.
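
The enqueue/flush protocol from section 1 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct model_cache`, `model_send_req()` and `model_daemon_release()` are hypothetical names, a pthread mutex stands in for xarray->xa_lock, a fixed-size array stands in for the xarray, and a sequentially consistent atomic store stands in for "set CACHEFILES_DEAD bit" followed by smp_mb(). The error codes are likewise chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 16

/* Hypothetical userspace stand-in for the cache and its request xarray. */
struct model_cache {
	pthread_mutex_t lock;	/* plays the role of xarray->xa_lock */
	atomic_bool dead;	/* plays the role of the CACHEFILES_DEAD bit */
	int reqs[MAX_REQS];	/* ids of pending requests */
	size_t nr_reqs;		/* number of pending requests */
	size_t nr_failed;	/* requests completed with an error by flush */
};

#define MODEL_CACHE_INIT { .lock = PTHREAD_MUTEX_INITIALIZER }

/*
 * Enqueue path: "test dead flag" and "enqueue request" happen under the
 * same lock, so a flush cannot slip in between the two steps.
 */
static int model_send_req(struct model_cache *cache, int id)
{
	int ret = 0;

	pthread_mutex_lock(&cache->lock);
	if (atomic_load(&cache->dead))
		ret = -EIO;		/* daemon is dying, refuse the request */
	else if (cache->nr_reqs == MAX_REQS)
		ret = -ENOSPC;		/* model-only capacity limit */
	else
		cache->reqs[cache->nr_reqs++] = id;
	pthread_mutex_unlock(&cache->lock);

	return ret;
}

/*
 * Flush path: mark the cache dead first, then flush under the lock.
 * The flag is set outside the critical section, as in the note above.
 */
static size_t model_daemon_release(struct model_cache *cache)
{
	size_t flushed;

	/* seq_cst store: models set_bit() followed by smp_mb() */
	atomic_store(&cache->dead, true);

	pthread_mutex_lock(&cache->lock);
	flushed = cache->nr_reqs;
	cache->nr_failed += flushed;	/* "complete" every pending request */
	cache->nr_reqs = 0;
	pthread_mutex_unlock(&cache->lock);

	return flushed;
}
```

In this model, a request that passes the flag test either is enqueued before the flush takes the lock (and is then flushed), or runs after the flush, in which case it observes the flag and is refused. Calling `model_send_req()` from several threads while another thread calls `model_daemon_release()` should therefore never leave an entry behind in `reqs`.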
