Coroutine

參照 wiki 所述 :

Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed

程式可以透過 coroutine 實現 concurrency ( 非 parallelism )。
透過此可以在指定進入點以及離開點實現 non-preemptive multitasking ( 只能等待自己放棄資源其他爭奪的程式才能使用其資源 )，也就是說在只要進入所寫定紀錄點即可確保離開前不會被其他 generator 搶佔。但如果在 OS 中以multi-thread 運作 coroutine 還是要考慮 critical section 問題。

通常使用 Coroutine 會運作在目標為 hard-realtime context 之系統。

hard-realtime context
switching between coroutines need not involve any system calls or any blocking calls whatsoever

在呼叫 system call 時會從 user mode 切換成 kernel mode ，並在結束之後再切回 user mode ，而程式本身此種切換會造成大量成本，源自於其兩種不同段程式碼，要考慮 cache miss 以及 page fault 情況。

Job vs Deferred

以下參考 Kotlin 所定義做為參考

一個正在被執行之過程可封裝成 Job

                                                      wait children
    +-----+       start      +--------+   complete   +-------------+  finish  +-----------+
    | New | ---------------> | Active | -----------> | Completing  | -------> | Completed |
    +-----+                  +--------+              +-------------+          +-----------+
       |                         |                         |
       | cancel                  | cancel                  | cancel
       V                         V                         |
  +-----------+   finish   +------------+                  |
  | Cancelled | <--------- | Cancelling | <----------------+
  |(completed)|            +------------+
  +-----------+

而 Deferred 可享成一種 Job 的種類，在 Deferred 結束完成後提供返回值

   +-----+       start      +--------+   complete  +-------------+ finish +-----------+
   | New | ---------------> | Active | ----------> | Completing  | ---+-> | Resolved  |
   +-----+                  +--------+             +-------------+    |   |(completed)|
      |                         |                        |            |   +-----------+
      | cancel                  | cancel                 | cancel     |
      V                         V                        |            |   +-----------+
 +-----------+   finish   +------------+                 |            +-> |  Failed   |
 | Cancelled | <--------- | Cancelling | <---------------+                |(completed)|
 |(completed)|            +------------+                                  +-----------+
 +-----------+

Implementation method

傳統語言例如 C 並沒有 coroutine 的函式庫或者直接支援此類操作。( C++ 存在 library Boost Context ) 。

This is, in large part, due to the limitations of stack-based subroutine implementation.

畢竟 C 語言再呼叫副函式時式使用 stack 儲存並返還原先位置。
那該如何實作 coroutine ？有兩種方法 :

closure
用最簡的來說就是需要兩個條件，分別為回傳或者傳進函式引數可以使用函式(也就是 C 語言 function pointer 的概念) 以及 local stack 的資料可以備回傳或保存(遺憾的是 C 語言在 pop 回 callee 位置時，caller 的 local stack 內容已經被釋放掉了)
以 state machine 並以 goto 或者 context switch 做交互呼叫
但是維護困難，很容易變成 Spaghetti code ，這式沒有人樂意見到的。

Implementations for C

想利用 C 做出 Coroutine 之程式會需要利用到第二個 Stack 結構 ( 第一個就是存放區域變數、Program Counter… )，那為了做到這件事情最好的方式為利用少量 Inline assembler 直接以 stack pointer 對記憶體做操作。除了上述方法也可以利用 POSIX sigaltstack system call 例如 getcontext, setcontext, makecontext 以及 swapcontext。不過上述目前在 POSIX 1.2008 已經不再維護了。

在上述提到的 stack 建構完成後我們可以利用 setjmp.h 中的 setjmp 以及 longjmp 完成 "context switch"。

在 C 語言中的 goto 只能跳至該函數之 label 位置，若要跳到其他函數中則必須使用 setjmp 與 longjmp。

如何優雅的寫此段程式也是我感興趣的部分 :

Minimalist implementations, which do not piggyback off the setjmp and longjmp functions, may achieve the same result via a small block of inline assembly which swaps merely the stack pointer and program counter, and clobbers all other registers.

我們可以透過 inline assembly code 以取代 setjmp 進行實作，事實上這也本篇重點，利用此可以 push 更少的內容，達到加速的效果。

接下來以 lwan webserver 中的 coroutine 為例

為何要用 coroutine 來實作 webserver？收到一筆request 就直接返回一 response 就行了吧？

http protocol 存在 http1.0 以及 http1.1 這兩種版本(其實不只這兩種，只是這邊是為了處理這部分)。
考慮 keep-alive 的方式，我們需要解決 response 送出後不 close socket ，而繼續等待下一筆 request 直到 timeout 或者被要求 close 。

coro_new

所運行 Thread 首先會使用 coro_new() 對該筆請求(connection)之 coro 加入 switcher 進行排程，其中會先使用 mmap 進行映射至虛擬記憶體中，stack 為一個指標而之後會存在一連續記憶體空間其大小為 131072 * 4 之 stack

void *stack = mmap(NULL, CORO_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_STACK | MAP_ANON | MAP_PRIVATE, -1, 0);

MAP_ANONYMOUS 建立匿名映射。此時會忽略參數fd ( 此處fd 設定為 -1 )，不涉及文件，而且映射區域無法和其他進程共享 (不過之後並不會 fork 出其他 process 可以不用考慮這點)。

MAP_ANON
Synonym for MAP_ANONYMOUS; provided for compatibility with
other implementations.

MAP_ANONYMOUS
The mapping is not backed by any file; its contents are
initialized to zero. The fd argument is ignored; however,
some implementations require fd to be -1 if MAP_ANONYMOUS
(or MAP_ANON) is specified, and portable applications
should ensure this. The offset argument should be zero.
The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is
supported on Linux only since kernel 2.4.

MAP_STACK 參考mmap(2) — Linux manual page

MAP_STACK (since Linux 2.6.27)
Allocate the mapping at an address suitable for a process
or thread stack.
This flag is currently a no-op on Linux. However, by
employing this flag, applications can ensure that they
transparently obtain support if the flag is implemented in
the future. Thus, it is used in the glibc threading
implementation to allow for the fact that some
architectures may (later) require special treatment for
stack allocations. A further reason to employ this flag
is portability: MAP_STACK exists (and has an effect) on
some other systems (e.g., some of the BSDs).

建立好 coro 以及 stack 之後，會再建立 coro_defer_array 以及 switcher(其實也就是 unsigned long array[10])，建立好之後會呼叫 coro_reset(coro,function,data)

在 lwan 當中此處 function 為 process_request_coro()
，會在後面提到

coro_reset 會做 3 件事情，分別為 :

coro_defered_run(coro,0)
這函式裡第二個參數"0" 其實為當下 generation 所需次數，會根據當下之所需呼叫function(data)之數量(這裡的function 為 process_request_coro() 並透過 yeild 回傳至此"main process")
不過目前為 coro_new 故不會用到此 yield以及 resume 操作
coro_defer_array_reset
將coro->context 分別寫入 data,coro,func,coro_entry_point
















    /* coro_entry_point() for x86-64 has 3 arguments, but RDX isn't
     * stored.  Use R15 instead, and implement the trampoline
     * function in assembly in order to use this register when
     * calling the user function. */
    coro->context[5 /* R15 */] = (uintptr_t)data;
    coro->context[6 /* RDI */] = (uintptr_t)coro;
    coro->context[7 /* RSI */] = (uintptr_t)func;
    coro->context[8 /* RIP */] = (uintptr_t)coro_entry_point_x86_64;

    /* Ensure stack is properly aligned: it should be aligned to a
     * 16-bytes boundary so SSE will work properly, but should be
     * aligned on an 8-byte boundary right after calling a function. */
    uintptr_t rsp = (uintptr_t)stack + CORO_STACK_SIZE;

#define STACK_PTR 9
    coro->context[STACK_PTR] = (rsp & ~0xful) - 0x8ul;

不知道這行之用意何在?


#define STACK_PTR 9
    coro->context[STACK_PTR] = (rsp & ~0xful) - 0x8ul;

以上就完成每筆 connction 對應 coro 之初始化，並在之後透過 timequeue 判斷已完成多筆 connection 之concurrent。

以下關於 coroutine 為本處設計函式最主要兩大關鍵 yield 以及 resume

首先介紹 &coro->context 以及 &coro->switcher->caller

此處兩者皆為 coro_context structure (x86_64)裡面都用於存放 caller stack 之資料。


typedef uintptr_t coro_context[10]

coro_yield










inline int64_t coro_yield(struct coro *coro, int64_t value)
{
    assert(coro);

    coro->yield_value = value;
    coro_swapcontext(&coro->context, &coro->switcher->caller);

    return coro->yield_value;
}

coro_resume














ALWAYS_INLINE int64_t coro_resume(struct coro *coro)
{
    assert(coro);

#if defined(STACK_PTR)
    assert(coro->context[STACK_PTR] >= (uintptr_t)coro->stack &&
           coro->context[STACK_PTR] <=
               (uintptr_t)(coro->stack + CORO_STACK_SIZE));
#endif

    coro_swapcontext(&coro->switcher->caller, &coro->context);

    return coro->yield_value;
}

reference

Generator (computer programming)
Coroutine
Coroutine: 入門篇

Coroutine

相關名詞比較

Coroutine vs Thread

Coroutine vs Subroutine

Coroutine vs Generator

Job vs Deferred

Implementation method

Implementations for C

coro_new

coro_yield

coro_resume

reference

Coroutine

相關名詞比較

Coroutine vs Thread

Coroutine vs Subroutine

Coroutine vs Generator

Job vs Deferred

Implementation method

Implementations for C

coro_new

coro_yield

coro_resume

reference

Read more

eBPF

How io_uring and eBPF Will Revolutionize Programming in Linux

Optimed Strlen (cont.)

Optimed Strlen