This post is a follow-up to Agda in WebAssembly & Experiments on Language Servers. I aim to dig deeper into replicating the POSIX I/O mechanism.
tl;dr: poll_oneoff for WASI preview 1 is hard to get right. Will making stdin non-blocking help?
Let's see the following sample code:
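A minimal sketch of such a test in C, assuming a POSIX-like libc. The helper names set_nonblocking and try_read_byte are made up for illustration, and this is not necessarily the exact program behind the results discussed in this post:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd while keeping the other status flags.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Try to read one byte.
 * Returns 1 if a byte was read, 0 on EOF, -EAGAIN if the read would
 * block, or another negated errno value on a different failure. */
int try_read_byte(int fd, char *out) {
    ssize_t n = read(fd, out, 1);
    if (n >= 0) return (int)n;
    if (errno == EAGAIN || errno == EWOULDBLOCK) return -EAGAIN;
    return -errno;
}
```

Calling set_nonblocking(STDIN_FILENO) followed by try_read_byte(STDIN_FILENO, &c) from main, with nothing typed yet, is the case of interest.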
In a WASI environment, we expect fcntl to call the fd_fdstat_set_flags system call and (hopefully) turn on non-blocking mode for stdin (fd=0). How does each runtime handle this?
In a typical UNIX system, the output will be:
When stdin is set to non-blocking mode, the read should return immediately with EAGAIN!
p.s. the return value of fcntl(0, F_GETFL) is a mix of the file status flags and the file access mode, as explained in fcntl(3p). It might not be zero.
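The flag word returned by fcntl(fd, F_GETFL) can be split with the standard masks. The sketch below assumes a POSIX system; access_mode and is_nonblocking are illustrative helper names, not part of any library:

```c
#include <assert.h>
#include <fcntl.h>

/* Access-mode part of the F_GETFL word (O_RDONLY, O_WRONLY or O_RDWR),
 * or -1 if fcntl fails. */
int access_mode(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : (flags & O_ACCMODE);
}

/* 1 if O_NONBLOCK is set on fd, 0 if it is not, -1 if fcntl fails. */
int is_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    return flags == -1 ? -1 : ((flags & O_NONBLOCK) != 0);
}
```

Comparing the raw return value against zero mixes the two parts together, which is why a nonzero value alone says nothing about O_NONBLOCK.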
The most hassle-free way to get a working WASM module:
But there is no direct CLI. You need to write some code to pull it off…
Build: Nov 10 2022 09:50:46, Apple LLVM 14.0.0 (clang-1400.0.29.202)
1.7.2
⚠️ means the read call blocks there, and you need to press Ctrl-D to escape.
wasmtime-cli 21.0.1 (cedf9aa0f 2024-05-22)
Trace (from my vague understanding of the Rust code): see crates/wasi/src/preview1.rs for the function fd_fdstat_set_flags. A file object is a member of enum File. Since stdin is defined as a different member, the operation to "get a File from fd" (get_file_mut) leads to a BADF error:
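To make that dispatch concrete, here is a toy model in C. It is only an analogy under stated assumptions: the type and field names are hypothetical, the function name get_file_mut is borrowed from the text purely as a label, and none of this is Wasmtime's actual code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical descriptor table: stdio entries are their own variants. */
enum desc_kind { DESC_STDIN, DESC_STDOUT, DESC_STDERR, DESC_FILE };

struct descriptor {
    enum desc_kind kind;
    int host_fd;      /* host descriptor backing this entry */
    unsigned fdflags; /* guest-visible status flags */
};

/* Return the entry only if it is a regular file; stdio variants and
 * out-of-range indices yield NULL. */
struct descriptor *get_file_mut(struct descriptor *table, size_t len, size_t fd) {
    assert(table != NULL);
    if (fd >= len || table[fd].kind != DESC_FILE) return NULL;
    return &table[fd];
}

/* Set status flags. Returns 0, or EBADF when fd is not a regular file. */
int set_flags(struct descriptor *table, size_t len, size_t fd, unsigned flags) {
    struct descriptor *d = get_file_mut(table, len, fd);
    if (d == NULL) return EBADF;
    d->fdflags = flags;
    return 0;
}
```

In this model, set_flags on index 0 fails with EBADF even though the descriptor is open, simply because stdin is not the "file" variant.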
wasmer 4.3.1
Trace: "Permission denied" is more interesting. See the function fd_fdstat_set_flags_internal. STDIN_DEFAULT_RIGHTS is hardcoded in lib/wasix/src/fs/mod.rs and does not include FD_FDSTAT_SET_FLAGS. But adding that flag is not effective either:
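A rights check of this kind boils down to a bitmask test. The sketch below uses the bit positions of the WASI preview 1 rights flags and its errno value for acces; the macro names, the require_right helper, and the particular set in STDIN_RIGHTS_SKETCH are assumptions for illustration, not Wasmer's definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as in the WASI preview 1 `rights` flags. */
#define RIGHT_FD_DATASYNC         (UINT64_C(1) << 0)
#define RIGHT_FD_READ             (UINT64_C(1) << 1)
#define RIGHT_FD_FDSTAT_SET_FLAGS (UINT64_C(1) << 3)
#define RIGHT_POLL_FD_READWRITE   (UINT64_C(1) << 27)

#define WASI_ERRNO_SUCCESS 0
#define WASI_ERRNO_ACCES   2 /* reported as "Permission denied" */

/* Assumed, incomplete set of stdin rights; lacks FD_FDSTAT_SET_FLAGS. */
#define STDIN_RIGHTS_SKETCH \
    (RIGHT_FD_DATASYNC | RIGHT_FD_READ | RIGHT_POLL_FD_READWRITE)

/* Succeeds only if every bit in `needed` is present in `held`. */
int require_right(uint64_t held, uint64_t needed) {
    assert(needed != 0); /* asking for no right at all is a caller bug */
    return (held & needed) == needed ? WASI_ERRNO_SUCCESS : WASI_ERRNO_ACCES;
}
```

Passing this check is necessary but, as noted above, not sufficient: the flag still has to be stored and acted upon.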
Problems observed:
1. fcntl succeeds, but the flag is not changed.
2. fcntl
For 1., in the same file, the fdstat function always returns flag 0.
For 2., the write only saves the flag in memory and does not propagate it to the host system.
Let's patch them.
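The actual patch is against Wasmer's Rust code. As a language-neutral illustration of the two fixes (report the real flag, and push the flag down to the host), here is a C sketch. It assumes a POSIX host and maps only the NONBLOCK bit of the preview 1 fdflags; the function names are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* `nonblock` bit of the WASI preview 1 fdflags. */
#define WASI_FDFLAGS_NONBLOCK ((uint16_t)(1u << 2))

/* Fix for 2.: propagate the guest's NONBLOCK bit to the host descriptor.
 * Returns 0 on success, -1 on failure. */
int apply_guest_fdflags(int host_fd, uint16_t guest_flags) {
    assert(host_fd >= 0);
    int flags = fcntl(host_fd, F_GETFL);
    if (flags == -1) return -1;
    if (guest_flags & WASI_FDFLAGS_NONBLOCK) flags |= O_NONBLOCK;
    else flags &= ~O_NONBLOCK;
    return fcntl(host_fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Fix for 1.: derive the reported flags from the host's state instead
 * of returning a constant 0. Only NONBLOCK is mapped in this sketch. */
uint16_t query_guest_fdflags(int host_fd) {
    assert(host_fd >= 0);
    int flags = fcntl(host_fd, F_GETFL);
    if (flags == -1) return 0;
    return (flags & O_NONBLOCK) ? WASI_FDFLAGS_NONBLOCK : 0;
}
```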
poll_oneoff accordingly
The above patch works well only for our example. It breaks down as soon as our code involves poll_oneoff.
The blocking read from stdin issue is described in Tokio's documentation, as per https://docs.rs/tokio/latest/tokio/io/struct.Stdin.html:
This handle is best used for non-interactive uses, such as […] For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.
With some strace-ing, I can almost conclude that Wasmer (tokio, and in turn mio) relies on the fact that stdin is blocking and does not take advantage of any I/O multiplexing technique (select/poll/epoll/…) for poll_oneoff, and so behaves unexpectedly in non-blocking mode. For instance,
It is surprising to see how Wasmer deviates from the spec…
Blocking? | Timeout | POSIX spec | Wasmer's behavior | Remark |
---|---|---|---|---|
Yes | NULL | Blocks | Blocks | |
Yes | {0, 0} | Returns 0 | Blocks (*1) | This is trivial to fix by skipping scatter/gather (s/g) buffers of zero length. |
No | NULL | Blocks | Returns 1 (*2) | This is hard to fix on Wasmer's side. |
No | {0, 0} | Returns 0 | Returns 1 (*2) | This corresponds to the demo code. The trace is provided below. |
(*1): Wasmtime's behavior is correct. I believe this is another bug in Wasmer. The only correct way to tell poll_oneoff to wait indefinitely is to not include a clock event.
(*2): It errs out with EAGAIN (!); the readiness is set to EPOLLERR.
Call try_select after setting O_NONBLOCK and compare the behavior on a UNIX system vs. in a WASI runtime. UNIX returns 0, denoting that the read is not ready. On the other hand, the debug trace from Wasmer shows that poll_oneoff returns 1 when the underlying FD is set to non-blocking mode. A program seeing this will do the read and fail with EAGAIN.
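A select-based probe of this kind could look like the sketch below. It assumes a POSIX system; the signature of try_select is a guess for illustration and may differ from the one used for the traces, and stdin_ready_now is a hypothetical convenience wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Ask select() whether fd is readable.
 * timeout == NULL waits indefinitely; a zeroed timeval returns at once.
 * Returns select()'s result: the number of ready descriptors (0 or 1
 * here), or -1 on error. */
int try_select(int fd, struct timeval *timeout) {
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, timeout);
}

/* The {0, 0} row of the table: poll stdin without waiting. */
int stdin_ready_now(void) {
    struct timeval zero = {0, 0};
    return try_select(STDIN_FILENO, &zero);
}
```

On a POSIX host, the result does not depend on whether O_NONBLOCK is set: with no input pending, the zero-timeout call returns 0.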
The strace -ff shows that the original polling implementation is actually a direct read! (See Stdin's poll_read_ready)
And the trace from Rust's side is
We want polling without a timeout to block until stdin is ready, while a non-blocking read must never block. The takeaway is that you can never write a truly non-blocking, concurrent program without the essential APIs from the OS.
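The difference between asking the OS for readiness and probing by reading can be sketched in C. ready_by_poll uses poll(2); ready_by_read is a deliberately simplified model of the "poll by direct read" behavior described above, not Wasmer's code, and both names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Readiness from the OS: 1 if readable (or at EOF), 0 if not yet,
 * -1 on error. timeout_ms < 0 waits indefinitely, 0 returns at once. */
int ready_by_poll(int fd, int timeout_ms) {
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0) return n;
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}

/* Anti-pattern (simplified model): treat "the read returned" as one
 * event. On an empty non-blocking fd the read fails with EAGAIN, yet
 * an event is still reported. *err receives errno, or 0 on success. */
int ready_by_read(int fd, char *byte, int *err) {
    /* Only meaningful on a non-blocking fd; otherwise it would hang. */
    assert((fcntl(fd, F_GETFL) & O_NONBLOCK) != 0);
    ssize_t n = read(fd, byte, 1);
    *err = (n < 0) ? errno : 0;
    return 1;
}
```

With no data pending, ready_by_poll(fd, 0) reports 0, while the read-based probe reports one event that carries EAGAIN, which matches the "Returns 1" rows of the table.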
After several failed attempts, I did not think the problem could be solved. But after a month of debugging, I finally found that the problem genuinely came from two instances of Stdin being constructed. The solution can be broken up into two parts:
poll_read_ready. Note that we do not need anything fancy in poll_read. It is the outer layer (i.e., fd_read) that is responsible for the blocking/nonblocking mode.
Here is my patch against Wasmer v4.2.5. Note that this patch only works on UNIX-like systems, and programs relying on a blocking stdin will break.
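The idea that the outer layer owns the blocking policy can be illustrated in C. This sketch uses a different technique from the patch: it picks the poll(2) timeout from the guest's flag instead of touching the host descriptor's mode. The name outer_read and the guest_nonblock parameter are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* fd_read-like outer layer over a host fd.
 * guest_nonblock mirrors the flag the guest set via fd_fdstat_set_flags.
 * Returns bytes read (0 on EOF) or a negated errno such as -EAGAIN. */
ssize_t outer_read(int host_fd, void *buf, size_t len, bool guest_nonblock) {
    assert(host_fd >= 0 && buf != NULL);
    for (;;) {
        struct pollfd pfd = { .fd = host_fd, .events = POLLIN, .revents = 0 };
        /* Non-blocking guests poll with a zero timeout; blocking guests
         * wait until the descriptor becomes readable. */
        int n = poll(&pfd, 1, guest_nonblock ? 0 : -1);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -errno;
        }
        if (n == 0) return -EAGAIN; /* only reachable with the zero timeout */

        ssize_t r = read(host_fd, buf, len);
        if (r >= 0) return r;
        if (errno == EINTR) continue;
        return -errno;
    }
}
```

One caveat of this design: if another reader drains the descriptor between poll and read, the read can still block on a blocking host fd, so it is only a sketch for a single-reader stdin.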
The patch lets fd_fdstat_set_flags enable non-blocking mode, which a program can request through fcntl. It would be better to inherit that flag from the host. This way, the user can use a wrapper program (like stdbuf) to specify the mode.
This bug bothered me for nearly a month. I had to resort to talking with ChatGPT about my Rust-y code.
Thanks to osa1 and others for a discussion thread on the Rust users forum, and for the code from the project @osa1/tiny.
This section serves as a placeholder. I expect Wasmtime to be easier to fix.