This post is a follow-up to "Agda in WebAssembly & Experiments on Language Servers". I aim to dig deeper into replicating the POSIX I/O mechanism.
tl;dr, poll_oneoff for WASI preview 1 is hard to get right. Will making stdin non-blocking help?
Let's see the following sample code:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
int main() {
    printf("nonblock stdin before? %d\n", fcntl(0, F_GETFL) & O_NONBLOCK);
    if (fcntl(0, F_SETFL, O_NONBLOCK) < 0) {
        perror("fcntl");
    }
    printf("nonblock stdin after ? %d\n", fcntl(0, F_GETFL) & O_NONBLOCK);
    char buf[16];
    int n = read(0, buf, 1);
    printf("read ret=%d\n", n);
    if (n < 0)
        perror("read");
}
In a WASI environment, we expect fcntl to call the fd_fdstat_set_flags system call and (hopefully) turn on non-blocking mode for stdin (fd=0). How does each runtime handle this?
In a typical UNIX system, the output will be:
$ gcc nonblock.c -o nonblock
$ ./nonblock
nonblock stdin before? 0
nonblock stdin after ? 4
read ret=-1
read: Resource temporarily unavailable
When stdin is set to non-blocking mode, the read should return immediately with EAGAIN!
p.s. the return value of fcntl(0, F_GETFL) is a mix of fd status flags and access mode flags, as explained in fcntl(3p). It might not be zero even before O_NONBLOCK is set, which is why the sample masks it with O_NONBLOCK.
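Because the flag word carries more than O_NONBLOCK, a more careful variant of the sample's F_SETFL call reads the current flags first and changes only that one bit. The following is a minimal sketch for a POSIX host; the helper names set_nonblocking and is_nonblocking are made up for illustration and are not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: toggle O_NONBLOCK on fd while preserving the other
 * status flags. Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd, bool enable) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    flags = enable ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Hypothetical helper: 1 if O_NONBLOCK is set, 0 if not, -1 on error. */
int is_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}
```

Calling set_nonblocking(0, true) before the read in the sample should give the same EAGAIN result on a UNIX host, without clobbering flags such as O_APPEND.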
The most hassle-free way to get a working WASM module:
zig cc -target wasm32-wasi nonblock.c -o nonblock.wasm
Node.js has no direct CLI for running it, though. You need to write some code to pull it off…
import { readFile } from 'node:fs/promises'
import { WASI } from 'node:wasi'
const wasi = new WASI({
  version: 'preview1',
  args: ['nonblock.wasm'],
  env: {},
  returnOnExit: true,
})
const wasm = await WebAssembly.compile(
  await readFile('nonblock.wasm'),
)
const instance = await WebAssembly.instantiate(wasm, wasi.getImportObject())
process.exit(await wasi.start(instance))
Build: Nov 10 2022 09:50:46, Apple LLVM 14.0.0 (clang-1400.0.29.202)
1.7.2
$ wazero run nonblock.wasm
nonblock stdin before? 0
nonblock stdin after ? 4
read ret=-1
read: Resource temporarily unavailable
⚠️ means the read call blocks there, and you need to press Ctrl-D to escape.
wasmtime-cli 21.0.1 (cedf9aa0f 2024-05-22)
$ wasmtime run nonblock.wasm
nonblock stdin before? 0
fcntl: Bad file descriptor
nonblock stdin after ? 0
⚠️
read ret=0
Trace (from my vague understanding of the Rust code): see crates/wasi/src/preview1.rs for the function fd_fdstat_set_flags. A file object is the File variant of the descriptor enum. Because stdin is defined as a different variant, the operation to get a File from an fd (get_file_mut) fails with a BADF error:
match self.descriptors.get_mut(&fd) {
    Some(Descriptor::File(file)) => Ok(file),
    _ => Err(types::Errno::Badf.into()),
}
wasmer 4.3.1
$ wasmer run nonblock.wasm
nonblock stdin before? 0
fcntl: Permission denied
nonblock stdin after ? 0
⚠️
read ret=0
Trace: "Permission denied" is more interesting. See the function fd_fdstat_set_flags_internal. STDIN_DEFAULT_RIGHTS is hardcoded in lib/wasix/src/fs/mod.rs and does not include FD_FDSTAT_SET_FLAGS. But adding that right is not effective either:
$ target/release/wasmer nonblock.wasm
nonblock stdin before? 0
nonblock stdin after ? 0
⚠️
read ret=0
Problems observed:
1. fcntl succeeds, but the flag is not changed.
2. fcntl does not put the host's stdin into non-blocking mode, so the read still blocks.
For 1., in the same file, the fdstat function always returns flag 0.
pub fn fdstat(&self, fd: WasiFd) -> Result<Fdstat, Errno> {
    match fd {
        __WASI_STDIN_FILENO => {
            return Ok(Fdstat {
                fs_filetype: Filetype::CharacterDevice,
                fs_flags: Fdflags::empty(), // ⬅️
                fs_rights_base: STDIN_DEFAULT_RIGHTS,
                fs_rights_inheriting: Rights::empty(),
            })
        }
        ...
For 2., the write only saves the flag in memory and never passes it on to the host system.
Let's patch them.
// wasmer/lib/wasix/src/fs/mod.rs
pub fn fdstat(&self, fd: WasiFd) -> Result<Fdstat, Errno> {
    match fd {
        __WASI_STDIN_FILENO => {
            let fd = self.get_fd(fd)?; // patched: look up the fd entry
            return Ok(Fdstat {
                fs_filetype: Filetype::CharacterDevice,
                fs_flags: fd.flags, // patched: report the stored flags
                fs_rights_base: STDIN_DEFAULT_RIGHTS,
                fs_rights_inheriting: Rights::empty(),
            })
        }
        // ...
// wasmer/lib/wasix/src/syscalls/wasi/fd_fdstat_set_flags.rs
#[instrument(level = "debug", skip_all, fields(%fd), ret)]
pub fn fd_fdstat_set_flags(
    // ...
) -> Result<Errno, WasiError> {
    // ...
    let guard = fd_entry.inode.read();
    // Find the host fd behind this WASI fd, if there is one.
    let maybe_sys_fd = match guard.deref() {
        Kind::File { handle, .. } => {
            if let Some(handle) = handle {
                let handle = handle.clone();
                let handle = wasi_try_ok!(handle.read().map_err(|_| { Errno::Badf }));
                handle.get_special_fd()
            } else { None }
        },
        _ => None,
    };
    if let Some(sys_fd) = maybe_sys_fd {
        let sys_fd = wasi_try_ok!(i32::try_from(sys_fd).map_err(|_| { Errno::Badf }));
        // Read the host flags, then change only O_NONBLOCK.
        let fcntl_flags = unsafe { libc::fcntl(sys_fd, libc::F_GETFL) };
        wasi_try_ok!((fcntl_flags >= 0).then(|| ()).ok_or(Errno::Access));
        let fcntl_flags = if flags.contains(Fdflags::NONBLOCK) {
            fcntl_flags | libc::O_NONBLOCK
        } else {
            fcntl_flags & (!libc::O_NONBLOCK)
        };
        let ret = unsafe { libc::fcntl(sys_fd, libc::F_SETFL, fcntl_flags) };
        wasi_try_ok!((ret == 0).then(|| ()).ok_or(Errno::Access));
    };
    // ...
    Ok(Errno::Success)
}
poll_oneoff accordingly
The above patch works well only for our example. It falls apart as soon as our code involves poll_oneoff.
The blocking read from stdin issue is described in Tokio's documentation, as per https://docs.rs/tokio/latest/tokio/io/struct.Stdin.html:
This handle is best used for non-interactive uses, such as […] For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.
With some strace-ing, I can almost conclude that Wasmer (through tokio, and in turn mio) relies on stdin being blocking and does not use any I/O multiplexing technique (select/poll/epoll/…) for poll_oneoff, so it behaves unexpectedly in non-blocking mode. For instance,
// #include <sys/select.h>
int try_select() {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(0, &readfds);
    // NOTE: passing NULL or a negative number in timeout
    // means waiting indefinitely!
    struct timeval timeout = {
        .tv_sec = 0,
        .tv_usec = 0,
    };
    return select(1, &readfds, NULL, NULL, &timeout);
}
It is surprising to see how Wasmer deviates from the spec…
Blocking? | Timeout | POSIX spec | Wasmer's behavior | Remark |
---|---|---|---|---|
Yes | NULL | Blocks | Blocks | |
Yes | {0, 0} | Returns 0 | Blocks (*1) | This is trivial to fix by skipping scatter/gather (s/g) buffers of zero length. |
No | NULL | Blocks | Returns 1 (*2) | This is hard to fix on Wasmer's side. |
No | {0, 0} | Returns 0 | Returns 1 (*2) | This corresponds to the demo code. The trace is provided below. |
(*1): Wasmtime's behavior is correct. I believe this is another bug in Wasmer. The only correct way to tell poll_oneoff to wait indefinitely is to not include a clock event.
(*2): it errs out with EAGAIN (!); the readiness is set to EPOLLERR.
Call try_select after setting O_NONBLOCK and compare the behavior on a UNIX system with that of a WASI runtime. UNIX returns 0, denoting that the read is not ready. The debug trace from Wasmer, on the other hand, shows that poll_oneoff returns 1 when the underlying FD is set to non-blocking mode. A program seeing this will attempt the read and fail with EAGAIN.
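For comparison, the POSIX-side expectation can be sketched with poll(2) instead of select, which avoids the fd_set bookkeeping. On a POSIX host, readiness reported this way does not depend on O_NONBLOCK. The helper name readable_now is hypothetical, and this is an illustration of the expected semantics, not code from any of the runtimes discussed.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: ask whether fd is readable right now, without waiting.
 * Returns 1 if a read would not block (data available, or EOF/hangup),
 * 0 if nothing is ready, -1 on error or an invalid descriptor. */
int readable_now(int fd) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, 0); /* timeout 0: return immediately */
    if (n <= 0)
        return n;
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;
    return (pfd.revents & (POLLIN | POLLHUP)) != 0;
}
```

With an empty stdin, readable_now(0) is expected to return 0 on a UNIX host, matching the "Returns 0" rows in the table.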
The strace -ff output shows that the original polling implementation is actually a direct read! (See Stdin's poll_read_ready.)
[pid 95011] read(0, <unfinished ...>
[pid 94997] <... futex resumed>) = -1 ETIMEDOUT (Connection timed out)
[pid 94997] sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
[pid 94997] munmap(0x79dfaa456000, 12288) = 0
[pid 94997] rt_sigprocmask(SIG_BLOCK, ~[RT_1], NULL, 8) = 0
[pid 94997] madvise(0x79dfa9e00000, 2076672, MADV_DONTNEED) = 0
[pid 94997] exit(0) = ?
[pid 94997] +++ exited with 0 +++
123
[pid 95011] <... read resumed>"123\n", 8192) = 4
And the trace from Rust's side is
TRACE [...]::poll_oneoff: enter
TRACE [...]::poll_oneoff: triggered fd=0 readiness=EPOLLERR userdata=0 ty=1 peb=1 fd_guards="[guard-file(fd=0, peb=1)]"
TRACE [...]::poll_oneoff: return=Ok(Errno::success) fd_guards="[guard-file(fd=0, peb=1)]" seen="Event { userdata: 0, error: Errno::again, type: Eventtype::FdRead }"
TRACE [...]::poll_oneoff: close time.busy=1.86ms time.idle=417ns fd_guards="[guard-file(fd=0, peb=1)]" seen="Event { userdata: 0, error: Errno::again, type: Eventtype::FdRead }"
We want polling without a timeout to block until stdin is ready, while a non-blocking read must never block. The takeaway is that you can never build a truly non-blocking, concurrent program without the essential APIs from the OS.
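On a POSIX host, these two requirements are usually met by letting poll do the waiting and keeping the read itself non-blocking. The following is a minimal sketch of that pattern; read_when_ready is a hypothetical name, and reporting a timeout as ETIMEDOUT is a choice made for this example only.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: wait in poll() until fd is readable, then read once.
 * timeout_ms < 0 waits indefinitely; 0 does not wait at all.
 * Returns the number of bytes read (0 at EOF), or -1 with errno set.
 * A timeout is reported as -1 with errno = ETIMEDOUT. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (;;) {
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: wait again */
            return -1;
        }
        if (n == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        ssize_t r = read(fd, buf, len);
        if (r >= 0)
            return r;
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;
        /* Spurious wake-up on a non-blocking fd: go back to poll(). */
    }
}
```

The waiting happens only in poll, so the same function works whether or not O_NONBLOCK is set on fd. This is the behavior a WASI program would hope to get from poll_oneoff followed by fd_read.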
After several failed attempts, I did not think the problem could be solved. But after a month of debugging, I finally found that the problem genuinely came from two instances of Stdin being constructed. The solution can be broken up into two parts:
poll_read_ready. Note that we do not need anything fancy in poll_read. It is the outer layer (i.e., fd_read) that is responsible for the blocking/non-blocking mode.
Here is my patch against Wasmer v4.2.5. Note that this patch only works on UNIX-like systems, and programs relying on a blocking stdin will break.
The patch enables non-blocking mode through fd_fdstat_set_flags, which a program can invoke through fcntl. It would be better to inherit that flag from the host. This way, the user could use a wrapper program (like stdbuf) to specify the mode.
This bug bothered me for nearly a month. I had to resort to talking with ChatGPT about my Rust-y code.
Thanks to osa1 and others for a discussion thread on the Rust users forum, and for the code from the project @osa1/tiny.
This section serves as a placeholder. I expect Wasmtime to be easier to fix.