
ondemand concurrency and ordering

task                daemon
===========         ==========
send_req            close_devfd
                    close_anonfd
                    daemon_read
                    (for OPEN) copen (write on devfd)
                    (for READ) CACHEFILES_IOC_READ_COMPLETE ioctl (on anonfd)

1. task:send_req vs. daemon:close_devfd

When a process accessing an erofs file triggers ondemand, it adds the request to the xarray through cachefiles_ondemand_send_req(), i.e. enqueue request

cachefiles_ondemand_init_object
cachefiles_ondemand_clean_object
cachefiles_ondemand_read
    cachefiles_ondemand_send_req

When the daemon closes the devfd, it has to flush the requests in the xarray, i.e. flush request

cachefiles_daemon_release
    cachefiles_flush_reqs

The enqueue request and flush request operations are synchronized through the CACHEFILES_DEAD bit of cache->flags, i.e.

# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        # flush request
# enqueue request
cachefiles_ondemand_send_req
    # if CACHEFILES_DEAD bit not set:
        # enqueue request

However, two orderings need attention here

  1. On enqueue, a) test CACHEFILES_DEAD bit and b) enqueue request must be atomic as a whole; otherwise the following race can occur
		/*
		 * Stop enqueuing the request when daemon is dying. The
		 * following two operations need to be atomic as a whole.
		 *   1) check cache state, and
		 *   2) enqueue request if cache is alive.
		 * Otherwise the request may be enqueued after xarray has been
		 * flushed, leaving the orphan request never being completed.
		 *
		 * CPU 1			CPU 2
		 * =====			=====
		 *				test CACHEFILES_DEAD bit
		 * set CACHEFILES_DEAD bit
		 * flush requests in the xarray
		 *				enqueue the request
		 */

So a spinlock (xarray->xa_lock) is used to guarantee this atomicity

# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if CACHEFILES_DEAD bit not set:
        # enqueue request
    xa_unlock
# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        xa_lock
        # flush request
        xa_unlock
  2. Note that on the flush request path above, "set CACHEFILES_DEAD bit" is not placed inside the lock's critical section (mainly to avoid modifying cachefiles_daemon_release()), so operations of the enqueue request path can slip in between a) set CACHEFILES_DEAD bit and b) flush request; if a) and b) are additionally reordered, the following ordering becomes possible
	/*
	 * Make sure the following two operations won't be reordered.
	 *   1) set CACHEFILES_DEAD bit
	 *   2) flush requests in the xarray
	 * Otherwise the request may be enqueued after xarray has been
	 * flushed, leaving the orphan request never being completed.
	 *
	 * CPU 1			CPU 2
	 * =====			=====
	 * flush requests in the xarray
	 *				test CACHEFILES_DEAD bit
	 *				enqueue the request
	 * set CACHEFILES_DEAD bit
	 */

So, to keep a) set CACHEFILES_DEAD bit and b) flush request on the flush request path from being reordered, a memory barrier is placed between the two operations

# flush request
cachefiles_daemon_release
    set CACHEFILES_DEAD bit
    cachefiles_flush_reqs
        smp_mb();
        xa_lock
        # flush request
        xa_unlock

Can the xa_lock on the flush request path serve as that memory barrier?
The lock operation of a spinlock only implies read acquire semantics, which guarantee that every memory access after the read acquire executes after it, i.e. it suppresses LoadLoad/LoadStore reordering

   read acquire
---------------------
all memory operations 
stay below the line

But what is needed here is to suppress Store[Load|Store] reordering, so the read acquire semantics implied by xa_lock do not solve the problem
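The two paths of this section can be sketched in userspace C. This is only a model under stated assumptions: a pthread mutex stands in for xa_lock, a fixed array stands in for the xarray, and the names cache_model, cache_send_req and cache_daemon_release are hypothetical. The fence sits where the text places smp_mb(); the sketch makes no claim about what the kernel memory model strictly requires.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CACHE_MAX_REQS 8

/* Hypothetical userspace stand-in for the cache. */
struct cache_model {
    pthread_mutex_t lock;            /* plays the role of xa_lock */
    atomic_bool dead;                /* plays the role of CACHEFILES_DEAD */
    bool pending[CACHE_MAX_REQS];    /* pending[i]: request i is queued */
};

static struct cache_model cache = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Enqueue path: the DEAD test and the insertion share one critical section. */
static int cache_send_req(struct cache_model *c)
{
    int ret = -ENOSPC;

    pthread_mutex_lock(&c->lock);
    if (atomic_load_explicit(&c->dead, memory_order_relaxed)) {
        ret = -EIO;                  /* daemon is dying: refuse to enqueue */
    } else {
        for (int i = 0; i < CACHE_MAX_REQS; i++) {
            if (!c->pending[i]) {
                c->pending[i] = true;
                ret = i;             /* id of the enqueued request */
                break;
            }
        }
    }
    pthread_mutex_unlock(&c->lock);
    return ret;
}

/* Flush path: set DEAD outside the lock, then flush under the lock.
 * Returns the number of requests flushed. */
static int cache_daemon_release(struct cache_model *c)
{
    int flushed = 0;

    atomic_store_explicit(&c->dead, true, memory_order_relaxed);
    /* Mirrors the smp_mb() placement described in the text. */
    atomic_thread_fence(memory_order_seq_cst);

    pthread_mutex_lock(&c->lock);
    for (int i = 0; i < CACHE_MAX_REQS; i++) {
        if (c->pending[i]) {
            c->pending[i] = false;   /* "complete" the request */
            flushed++;
        }
    }
    pthread_mutex_unlock(&c->lock);
    return flushed;
}
```

In this model, requests enqueued before cache_daemon_release() are flushed by it, and any cache_send_req() call that takes the lock afterwards returns -EIO instead of leaving an orphan request behind.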

2. task:send_req vs. daemon:close_anonfd

Similarly, when the daemon closes an anonfd, it has to flush the requests in the xarray that are related to that anonfd, mainly READ/CLOSE requests, i.e. flush request

Since the failover feature was introduced, closing an anonfd no longer needs to flush READ requests, but CLOSE requests still have to be flushed; for the reason, see "flush CLOSE requests when anon fd is closed"

Here the enqueue request and flush request operations are synchronized through object->ondemand_id, i.e.

# flush request
cachefiles_ondemand_fd_release
    object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED
    # flush request
# enqueue request
cachefiles_ondemand_send_req
    # if object->ondemand_id valid (ondemand_id > 0):
        # enqueue request

Similarly, on the enqueue request path a) test ondemand_id and b) enqueue request must be atomic as a whole, so a spinlock (xarray->xa_lock) is again used to guarantee this

# flush request
cachefiles_ondemand_fd_release
    xa_lock
    object->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED
    # flush request
    xa_unlock
# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if object->ondemand_id valid (ondemand_id > 0):
        # enqueue request
    xa_unlock

Later, once the failover feature introduced object state, this became

# flush request
cachefiles_ondemand_fd_release
    xa_lock
    set_object_close
    # flush request
    xa_unlock
# enqueue request
cachefiles_ondemand_send_req
    xa_lock
    # if object is not in close state:
        # enqueue request
    xa_unlock
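The per-object variant can be sketched the same way. This is a simplified userspace model, not the kernel code: it covers only the READ/CLOSE style requests discussed here, a pthread mutex stands in for xa_lock, and obj_state, object_model, object_send_req and object_fd_release are hypothetical names. Unlike the devfd case, the state change sits inside the same critical section as the flush.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJ_MAX_REQS 8

/* Hypothetical two-state model of the object state. */
enum obj_state { OBJ_OPEN, OBJ_CLOSE };

struct object_model {
    enum obj_state state;
};

struct req_slot {
    bool used;                       /* slot holds a pending request */
    struct object_model *object;     /* object the request belongs to */
};

static pthread_mutex_t xa_lock_model = PTHREAD_MUTEX_INITIALIZER;
static struct req_slot reqs[OBJ_MAX_REQS];

/* Enqueue path: test the object state and insert under one lock. */
static int object_send_req(struct object_model *obj)
{
    int ret = -ENOSPC;

    pthread_mutex_lock(&xa_lock_model);
    if (obj->state == OBJ_CLOSE) {
        ret = -EIO;                  /* anon fd already closed */
    } else {
        for (int i = 0; i < OBJ_MAX_REQS; i++) {
            if (!reqs[i].used) {
                reqs[i].used = true;
                reqs[i].object = obj;
                ret = i;
                break;
            }
        }
    }
    pthread_mutex_unlock(&xa_lock_model);
    return ret;
}

/* Flush path: mark the object closed and flush only its own requests,
 * both inside the critical section. Returns the number flushed. */
static int object_fd_release(struct object_model *obj)
{
    int flushed = 0;

    pthread_mutex_lock(&xa_lock_model);
    obj->state = OBJ_CLOSE;
    for (int i = 0; i < OBJ_MAX_REQS; i++) {
        if (reqs[i].used && reqs[i].object == obj) {
            reqs[i].used = false;
            reqs[i].object = NULL;
            flushed++;
        }
    }
    pthread_mutex_unlock(&xa_lock_model);
    return flushed;
}
```

Because the state change and the flush share one critical section with the enqueue-side test, no extra barrier is involved in this model; requests that belong to other objects are left untouched.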

3. daemon: daemon_read vs. daemon: close_anonfd

cachefiles_ondemand_daemon_read() performs two operations: a) search xarray, b) erase xarray

# for other request types

xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;

copy this request to user buffer

on error path:
  xa_erase(..., id)

For a CLOSE request, the erase xarray operation is performed right after the read obtains the request

# for CLOSE requests

xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;

copy this request to user buffer

xa_erase(..., id)

As shown, the daemon_read path contains the two operations a) search xarray and b) erase xarray, and the two are not atomic as a whole

As described earlier, the daemon flushes requests when it closes an anonfd, which can lead to the following ordering

P1                                          P2
------------                                -----------
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;

copy this request to user buffer

                                            close anon fd
                                              xa_lock
                                              flush related requests
                                              xa_unlock

                                            another request may be enqueued into the xarray,
                                            reusing the previous id
xa_erase(..., id) # oops

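Wrapping a) search xarray and b) erase xarray on the daemon_read path in one spinlock has two problems. First, it is awkward to implement, since the corresponding code on the daemon: close_anonfd path would have to be wrapped in the same spinlock. Second, the "copy this request to user buffer" step on the daemon_read path may involve other operations, e.g. cachefiles_ondemand_get_fd() is called for OPEN requests; these operations may block and cannot be called while holding a spinlock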

So the current fix is that the daemon: close_anonfd path only flushes requests that are marked with CACHEFILES_REQ_NEW

P1                                          P2
------------                                -----------
xa_lock
search the xarray to find a valid request
clear CACHEFILES_REQ_NEW mark
xa_unlock

id = xas.xa_index;

copy this request to user buffer

                                            close anon fd
                                              xa_lock
                                              flush related requests with CACHEFILES_REQ_NEW marked
                                              xa_unlock

                                            the request processed by P1 is not flushed
xa_erase(..., id)

See "race between reading/flush requests"
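The two interleavings above can be replayed step by step in a small single-threaded C model. It is a sketch under assumptions: locking is left out because the steps are replayed in a fixed order, a fixed array stands in for the xarray, ids are handed out lowest-free-first so that a freed id is reused immediately, and marked_req, mark_enqueue, mark_read_begin, mark_erase and mark_flush are hypothetical names.

```c
#include <stdbool.h>

#define MARK_MAX_REQS 4

struct marked_req {
    bool used;      /* slot holds a request */
    bool is_new;    /* plays the role of the CACHEFILES_REQ_NEW mark */
    int anon_fd;    /* hypothetical tag: the anon fd the request belongs to */
};

static struct marked_req slots[MARK_MAX_REQS];

/* Enqueue: take the lowest free id; returns -1 when the table is full. */
static int mark_enqueue(int anon_fd)
{
    for (int i = 0; i < MARK_MAX_REQS; i++) {
        if (!slots[i].used) {
            slots[i] = (struct marked_req){
                .used = true, .is_new = true, .anon_fd = anon_fd,
            };
            return i;
        }
    }
    return -1;
}

/* daemon_read, step a): find a NEW request, clear its mark, return its id
 * (or -1 if there is none). */
static int mark_read_begin(void)
{
    for (int i = 0; i < MARK_MAX_REQS; i++) {
        if (slots[i].used && slots[i].is_new) {
            slots[i].is_new = false;
            return i;
        }
    }
    return -1;
}

/* daemon_read, step b): erase by the id remembered in step a). */
static void mark_erase(int id)
{
    slots[id].used = false;
}

/* close anon fd: flush the requests of that fd. With only_new set, requests
 * already picked up by daemon_read are skipped (the fix described above).
 * Returns the number of requests flushed. */
static int mark_flush(int anon_fd, bool only_new)
{
    int flushed = 0;

    for (int i = 0; i < MARK_MAX_REQS; i++) {
        if (slots[i].used && slots[i].anon_fd == anon_fd &&
            (!only_new || slots[i].is_new)) {
            slots[i].used = false;
            flushed++;
        }
    }
    return flushed;
}
```

Replaying P1/P2 with only_new set to false lets the flush free the in-flight id, a later mark_enqueue() reuses it, and the final mark_erase() removes the wrong request. With only_new set to true the in-flight request keeps its slot, so the id cannot be reused before P1 erases it.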

4. daemon: daemon_read vs. daemon: close_devfd

Similarly, the daemon also flushes all requests when it closes the devfd, so can the race described above be triggered on daemon: close_devfd?

No, because daemon: daemon_read and daemon: close_devfd never run in parallel: daemon_read is triggered by a read on the devfd, and as long as that read is still in progress the devfd cannot have been closed yet
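One way to picture this argument is with file reference counting: the release handler of a file only runs once the last reference to the open file is dropped, and an in-progress read holds a reference of its own. The sketch below is a toy model of that idea, assuming a plain counter; file_model, model_fget and model_fput are hypothetical names, not the kernel API.

```c
#include <stdbool.h>

/* Toy model of an open file with a reference count. */
struct file_model {
    int refcount;     /* references held by the fd table and in-flight syscalls */
    bool released;    /* set where the release handler (and the flush) would run */
};

/* Take a reference, as a syscall does while it operates on the file. */
static void model_fget(struct file_model *f)
{
    f->refcount++;
}

/* Drop a reference; the release handler runs only on the last drop. */
static void model_fput(struct file_model *f)
{
    if (--f->refcount == 0)
        f->released = true;
}
```

If a read takes a reference with model_fget() and a concurrent close drops the fd table's reference with model_fput(), released stays false until the read drops its own reference, so in this model the flush on release cannot overlap with daemon_read.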