This post follows up on Agda in WebAssembly & Experiments on Language Servers. I want to dig deeper into replicating the POSIX I/O mechanism.
tl;dr: `poll_oneoff` for WASI preview 1 is difficult to get right. Will making stdin non-blocking help?
Let's see the following sample code:
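A minimal sketch of such a probe, assuming a POSIX-style libc. The helper names `set_nonblocking` and `read_errno` are hypothetical and chosen for illustration; they are not necessarily what the original listing used. It turns on `O_NONBLOCK` for a descriptor and reports what a subsequent `read` does.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd while keeping its other status flags.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read one byte from fd.
 * Returns 0 if read() succeeded (data or EOF), otherwise the errno value. */
int read_errno(int fd) {
    char c;
    ssize_t n = read(fd, &c, 1);
    return n >= 0 ? 0 : errno;
}
```

Calling `set_nonblocking(0)` followed by `read_errno(0)` on an idle stdin is the experiment each runtime is put through here.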
In a WASI environment, we expect `fcntl` to call the `fd_fdstat_set_flags` system call and (hopefully) turn on non-blocking mode for stdin (fd=0). How does each environment handle this?
In a typical POSIX-compat system, the output will be:
When stdin is set to non-blocking mode, the read should immediately return with `EAGAIN`!
p.s. the return value of `fcntl(0, F_GETFL)` is a mix of the file status flags and the access mode, as explained in fcntl(3p). It might not be zero.
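To make the "mix" concrete: POSIX specifies that `F_GETFL` returns the file status flags together with the access mode, and `O_ACCMODE` is the mask that separates the two. The sketch below uses two hypothetical helpers, `access_mode` and `has_nonblock`, that are not part of the original code.

```c
#include <fcntl.h>

/* Access mode part of an F_GETFL result: O_RDONLY, O_WRONLY or O_RDWR. */
int access_mode(int getfl_result) {
    return getfl_result & O_ACCMODE;
}

/* 1 if the O_NONBLOCK status flag is set in an F_GETFL result, else 0. */
int has_nonblock(int getfl_result) {
    return (getfl_result & O_NONBLOCK) != 0;
}
```

For example, `has_nonblock(fcntl(0, F_GETFL))` tells whether the earlier `F_SETFL` call actually took effect, regardless of which access mode bits are present.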
The most hassle-free way to get a working WASM module:
But there is no direct CLI. You need to write some code to pull it off…
Build: Nov 10 2022 09:50:46, Apple LLVM 14.0.0 (clang-1400.0.29.202)
1.7.2
⚠️ means the read call blocks there, and you need to press Ctrl-D to escape.
wasmtime-cli 21.0.1 (cedf9aa0f 2024-05-22)
Trace (from my vague understanding of the Rust code): see `crates/wasi/src/preview1.rs` for the function `fd_fdstat_set_flags`. A file object is some member of `enum File`. Since `stdin` is defined as a different member, the operation to "get a `File` from fd" (`get_file_mut`) leads to a `BADF` error:
wasmer 4.3.1
Trace: "Permission denied" is more interesting. See the function `fd_fdstat_set_flags_internal`. `STDIN_DEFAULT_RIGHTS` is hardcoded in `lib/wasix/src/fs/mod.rs` and does not include `FD_FDSTAT_SET_FLAGS`. But adding that flag is not effective either:
Problems observed:
- `fcntl` succeeds, but the flag is not changed.
- Cannot proceed. EOT (Ctrl-D) support was added in 2024/10, but the engine on the extension marketplace has not yet been updated (last updated 2024/9).
Hopeless? The good news is that the runtime is under your control. It seems that you can provide a facade stdin that throws an error if it would block.
The error message would become "Resource busy". This is less than ideal because a program would normally abort in this case, whereas with `EAGAIN` the program should retry several times. Sadly, even duck typing `FileSystemError` is futile.
fcntl
For 1., in the same file, the `fdstat` function always returns flag `0`.
For 2., the write only saves the flag in memory and does not propagate it to the underlying system.
Let's patch them.
`poll_oneoff` accordingly

The above patch works well only for our example. It falls apart as soon as our code involves `poll_oneoff`.
The issue of blocking reads from stdin is described in Tokio's documentation, as per https://docs.rs/tokio/latest/tokio/io/struct.Stdin.html:

> This handle is best used for non-interactive uses, such as […] For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.
After some strace-ing, I can almost conclude that Wasmer (via tokio, and in turn mio) relies on stdin being blocking and does not use any I/O multiplexing technique (select/poll/epoll/…) for `poll_oneoff`, so it behaves unexpectedly in non-blocking mode. For instance,
It is surprising to see how Wasmer deviates from the spec…
Blocking? | Timeout | POSIX spec | Wasmer's behavior | Remark |
---|---|---|---|---|
Yes | `NULL` | Blocks | Blocks | |
Yes | `{0, 0}` | Returns 0 | Blocks (*1) | This is trivial to fix by skipping s/g buffers of zero length. |
No | `NULL` | Blocks | Returns 1 (*2) | This is hard to fix on Wasmer's side. |
No | `{0, 0}` | Returns 0 | Returns 1 (*2) | This corresponds to the demo code. The trace is provided below. |
(*1): Wasmtime's behavior is correct. I believe this is another bug in Wasmer. The only correct way to tell `poll_oneoff` to wait indefinitely is to not include a clock event.

(*2): it errs out with `EAGAIN` (!); the `readiness` is set to `EPOLLERR`.
Call `try_select` after setting `O_NONBLOCK` and compare the behavior on a UNIX system with that in a WASI runtime. UNIX returns `0`, denoting that the read is not ready. On the other hand, the debug trace from Wasmer shows that `poll_oneoff` returns `1` when the underlying FD is set to non-blocking mode. A program seeing this will do the read and fail with `EAGAIN`.
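For reference, a minimal sketch of what a `try_select` helper might look like, assuming it simply wraps `select(2)` with a `{0, 0}` timeout on a single descriptor (the body here is an illustration, not necessarily the original listing):

```c
#include <stddef.h>
#include <sys/select.h>

/* Ask select() whether fd is readable right now (timeout {0, 0}).
 * Returns the number of ready descriptors (0 or 1), or -1 on error. */
int try_select(int fd) {
    fd_set readfds;
    struct timeval timeout = {0, 0};

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &timeout);
}
```

On a POSIX host, `try_select(0)` on an idle stdin returns `0` whether or not `O_NONBLOCK` is set, because readiness does not depend on the blocking mode of the descriptor.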
The `strace -ff` output shows that the original polling implementation is actually a direct `read`! (See `Stdin`'s `poll_read_ready`.)
And the trace from the Rust side is:
We want polling without a timeout to block until stdin is ready, while a non-blocking `read` must not block. The takeaway is that you can never write a truly non-blocking, concurrent program without the essential APIs from the OS.
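On a POSIX host, those two requirements are met by keeping the readiness query separate from the read itself. Below is a minimal sketch using `poll(2)`. The helper name `wait_readable` is hypothetical, and this is not Wasmer's code, only an illustration of the OS-level primitive a runtime would need.

```c
#include <poll.h>

/* Wait until fd is readable, without consuming any data.
 * timeout_ms < 0 waits indefinitely; timeout_ms == 0 returns immediately.
 * Returns 1 if fd is readable (or at EOF/hang-up), 0 on timeout,
 * and -1 on error or when poll() reports an error condition on fd. */
int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0) return n;
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

With this split, `wait_readable(0, -1)` blocks regardless of `O_NONBLOCK`, while a later `read` on a non-blocking descriptor still returns immediately.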
After several failed attempts, I did not think the problem could be solved. But after a month of debugging, I finally found that the problem genuinely came from two instances of `Stdin` being constructed. The solution can be broken up into two parts:
`poll_read_ready`. Note that we do not need anything fancy in `poll_read`. It is the outer layer (i.e., `fd_read`) that is responsible for the blocking/non-blocking mode.

Here is my patch against Wasmer v4.2.5. Note that this patch only works on UNIX-like systems, and programs relying on a blocking stdin will break.
The patch lets `fd_fdstat_set_flags`, which a program can invoke through `fcntl`, enable non-blocking mode. It would be better to inherit that flag from the host. This way, the user can use a wrapper program (like stdbuf) to specify the mode.
This bug bothered me for nearly a month. I had to resort to talking with ChatGPT about my Rust-y code.
Thanks to osa1 and others for a discussion thread on the Rust users forum, and the code from project @osa1/tiny.
This section serves as a placeholder. I expect Wasmtime to be easier to fix.