
Non-blocking stdin support in WASI runtimes 🤔

This post is a follow-up to Agda in WebAssembly & Experiments on Language Servers. I aim to study in more depth how to replicate the POSIX I/O mechanism.

tl;dr, poll_oneoff for WASI preview 1 is difficult to get right. Will making stdin non-blocking help?

Let's see the following sample code:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

int main() {
  printf("nonblock stdin before? %d\n", fcntl(0, F_GETFL) & O_NONBLOCK);
  if (fcntl(0, F_SETFL, O_NONBLOCK) < 0) {
    perror("fcntl");
  }
  printf("nonblock stdin after ? %d\n", fcntl(0, F_GETFL) & O_NONBLOCK);

  char buf[16];
  int n = read(0, buf, 1);
  printf("read ret=%d\n", n);
  if (n < 0) perror("read");
}

In a WASI environment, we expect fcntl to call the fd_fdstat_set_flags system call and (hopefully) turn on non-blocking mode for stdin (fd=0). How does each runtime handle this?

Expected output

In a typical POSIX-compatible system, the output will be:

$ gcc nonblock.c -o nonblock
$ ./nonblock
nonblock stdin before? 0
nonblock stdin after ? 4
read ret=-1
read: Resource temporarily unavailable

When stdin is set to non-blocking mode, the read should return immediately with EAGAIN!

p.s. the return value of fcntl(0, F_GETFL) is a mix of file status flags and access mode flags, as explained in fcntl(3p), so it might not be zero. That is why the sample masks it with O_NONBLOCK.
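The sample passes O_NONBLOCK alone to F_SETFL, which would also clear any other status flag that happened to be set. A slightly more careful sketch reads the current flags first and flips only that one bit. The helper names is_nonblocking and set_nonblocking are made up for this illustration and do not come from any runtime.

```c
#include <fcntl.h>
#include <stdbool.h>

// Hypothetical helper: returns 1 if O_NONBLOCK is set on fd, 0 if it is not,
// and -1 on error (errno is set by fcntl).
int is_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0) return -1;
  // F_GETFL mixes status flags with the access mode, so mask the bit we want.
  return (flags & O_NONBLOCK) != 0;
}

// Hypothetical helper: turns O_NONBLOCK on or off and leaves the other
// status flags untouched. Returns 0 on success, -1 on error.
int set_nonblocking(int fd, bool on) {
  int flags = fcntl(fd, F_GETFL);
  if (flags < 0) return -1;
  flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
  return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

Calling set_nonblocking(0, true) in place of the bare F_SETFL call in the sample should give the same "before/after" output on a POSIX host.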

The survey

The most hassle-free way to get a working WASM module:

zig cc -target wasm32-wasi nonblock.c -o nonblock.wasm
  • Node.js 20 🙆

But there is no direct CLI. You need to write some code to pull it off…

Grab this!
import { readFile } from 'node:fs/promises'
import { WASI } from 'node:wasi'
const wasi = new WASI({
  version: 'preview1',
  args: ['nonblock.wasm'],
  env: {},
  returnOnExit: true,
})
const wasm = await WebAssembly.compile(
  await readFile('nonblock.wasm'),
)
const instance = await WebAssembly.instantiate(wasm, wasi.getImportObject())
process.exit(await wasi.start(instance))
  • wasm3 🙆

Build: Nov 10 2022 09:50:46, Apple LLVM 14.0.0 (clang-1400.0.29.202)

  • wazero 🙆

1.7.2

$ wazero run nonblock.wasm
nonblock stdin before? 0
nonblock stdin after ? 4
read ret=-1
read: Resource temporarily unavailable

Runtimes that do not pass this…

⚠️ means the read call blocks there, and you need to press Ctrl-D to escape.

wasmtime 🙅

wasmtime-cli 21.0.1 (cedf9aa0f 2024-05-22)

$ wasmtime run nonblock.wasm
nonblock stdin before? 0
fcntl: Bad file descriptor
nonblock stdin after ? 0
⚠️
read ret=0

Trace: (From my vague understanding of the Rust code) See crates/wasi/src/preview1.rs for the function fd_fdstat_set_flags. A file object is the File variant of the Descriptor enum. Because stdin is defined as a different variant, the operation to "get a File from fd" (get_file_mut) leads to a BADF error:

match self.descriptors.get_mut(&fd) {
    Some(Descriptor::File(file)) => Ok(file),
    _ => Err(types::Errno::Badf.into()),
}

wasmer 🙅

wasmer 4.3.1

$ wasmer run nonblock.wasm
nonblock stdin before? 0
fcntl: Permission denied
nonblock stdin after ? 0
⚠️
read ret=0

Trace: "Permission denied" is more interesting. See the function fd_fdstat_set_flags_internal. STDIN_DEFAULT_RIGHTS is hardcoded in lib/wasix/src/fs/mod.rs and it does not include FD_FDSTAT_SET_FLAGS. But adding that right alone is not enough either:

$ target/release/wasmer nonblock.wasm
nonblock stdin before? 0
nonblock stdin after ? 0
⚠️
read ret=0

Problems observed:

  1. The first fcntl succeeds, but the flag is not changed.
  2. It is still blocking.

VS Code (WebAssembly Execution Engine) 🙅

nonblock stdin before? 0
fcntl: Function not implemented
nonblock stdin after ? 0
⚠️

Cannot proceed. EOT (Ctrl-D) support was added in 2024/10, but the engine on the extension marketplace has not yet been updated (last updated 2024/9).

Hopeless? The good news is that the runtime is under your control. It seems that you can provide a facade stdin that throws an error if the read would block.

const stdinPipe = wasm.createWritable()
;(stdinPipe as any).read = function(mode?: 'max', size?: number) {
  logger.appendLine(`STDIN READ mode=${mode} size=${size}`)
  if ((pty as any).lines.length <= 0) {
    // FIXME: Come up with a way to throw a WasiError, which is not exported
    throw FileSystemError.Unavailable('This read to stdin would block')
  }
  return pty.read(size ?? 0)
}

The error message would become "Resource busy". This is less than ideal because the program normally would abort in this case, while in EAGAIN's case the program should retry several times. Sadly even duck typing FileSystemError is futile.

The attempt to fix Wasmer

Step 1: Make it possible to call fcntl

For 1., in the same file, the fdstat function always returns flag 0.

    pub fn fdstat(&self, fd: WasiFd) -> Result<Fdstat, Errno> {
        match fd {
            __WASI_STDIN_FILENO => {
                return Ok(Fdstat {
                    fs_filetype: Filetype::CharacterDevice,
                    fs_flags: Fdflags::empty(),  // ⬅️
                    fs_rights_base: STDIN_DEFAULT_RIGHTS,
                    fs_rights_inheriting: Rights::empty(),
                })
            }
            ...

For 2., the write only saves the flag in memory and does not pass it on to the host system.

Let's patch them.

// wasmer/lib/wasix/src/fs/mod.rs
    pub fn fdstat(&self, fd: WasiFd) -> Result<Fdstat, Errno> {
        match fd {
            __WASI_STDIN_FILENO => {
                let fd = self.get_fd(fd)?;  // ✅
                return Ok(Fdstat {
                    fs_filetype: Filetype::CharacterDevice,
                    fs_flags: fd.flags,  // ✅
                    fs_rights_base: STDIN_DEFAULT_RIGHTS,
                    fs_rights_inheriting: Rights::empty(),
                })
            ...

// wasmer/lib/wasix/src/syscalls/wasi/fd_fdstat_set_flags.rs
#[instrument(level = "debug", skip_all, fields(%fd), ret)]
pub fn fd_fdstat_set_flags(
    // ...
) -> Result<Errno, WasiError> {
    // ...
    let guard = fd_entry.inode.read();
    let maybe_sys_fd = match guard.deref() {
        Kind::File { handle, .. } => {
            if let Some(handle) = handle {
                let handle = handle.clone();
                let handle = wasi_try_ok!(handle.read().map_err(|_| { Errno::Badf }));
                handle.get_special_fd()
            } else {
                None
            }
        },
        _ => None,
    };
    if let Some(sys_fd) = maybe_sys_fd {
        let sys_fd = wasi_try_ok!(i32::try_from(sys_fd).map_err(|_| { Errno::Badf }));
        let fcntl_flags = unsafe { libc::fcntl(sys_fd, libc::F_GETFL) };
        wasi_try_ok!((fcntl_flags >= 0).then(|| ()).ok_or(Errno::Access));
        let fcntl_flags = if flags.contains(Fdflags::NONBLOCK) {
            fcntl_flags | libc::O_NONBLOCK
        } else {
            fcntl_flags & (!libc::O_NONBLOCK)
        };
        let ret = unsafe { libc::fcntl(sys_fd, libc::F_SETFL, fcntl_flags) };
        wasi_try_ok!((ret == 0).then(|| ()).ok_or(Errno::Access));
    };
    // ...
    Ok(Errno::Success)
}

Step 2: Fix poll_oneoff accordingly

The above patch works only for our example. It falls apart as soon as the code involves poll_oneoff.

The blocking read from stdin issue is described in Tokio's documentation, as per https://docs.rs/tokio/latest/tokio/io/struct.Stdin.html:

This handle is best used for non-interactive uses, such as […] For interactive uses, it is recommended to spawn a thread dedicated to user input and use blocking IO directly in that thread.

With some strace-ing, I can almost conclude that Wasmer (through tokio, and in turn mio) relies on stdin being blocking and does not use any I/O multiplexing technique (select/poll/epoll/…) for poll_oneoff, so it behaves unexpectedly in non-blocking mode. For instance,

// #include <sys/select.h>

int try_select() {
  fd_set readfds;
  FD_ZERO(&readfds);
  FD_SET(0, &readfds);

  // NOTE: passing NULL or a negative number in timeout 
  //       means waiting indefinitely!
  struct timeval timeout = {
    .tv_sec = 0,
    .tv_usec = 0,
  };
  return select(1, &readfds, NULL, NULL, &timeout);
}

It is surprising to see how Wasmer deviates from the spec…

| Blocking? | Timeout | POSIX spec | Wasmer's behavior | Remark |
| --- | --- | --- | --- | --- |
| Yes | NULL | Blocks | Blocks | |
| Yes | {0, 0} | Returns 0 | Blocks (*1) | This is trivial to fix by skipping s/g buffers of zero length. |
| No | NULL | Blocks | Returns 1 (*2) | This is hard to fix on Wasmer's side. |
| No | {0, 0} | Returns 0 | Returns 1 (*2) | This corresponds to the demo code. The trace is provided below. |

(*1): Wasmtime's behavior is correct. I believe this is another bug in Wasmer. The only correct way to tell poll_oneoff to wait indefinitely is to not include a clock event.
(*2): it errs out with EAGAIN (!); the readiness is set to EPOLLERR.

Read this section only if you are curious about the inner workings...

Call try_select after setting O_NONBLOCK and compare the behavior on a UNIX system with that in a WASI runtime. UNIX returns 0, denoting that no data is ready to read. On the other hand, the debug trace from Wasmer shows that poll_oneoff returns 1 when the underlying FD is set to non-blocking mode. A program seeing this will do the read and fail with EAGAIN.

The strace -ff shows that the original polling implementation is actually a direct read! (See Stdin's poll_read_ready)

[pid 95011] read(0,  <unfinished ...>
[pid 94997] <... futex resumed>)        = -1 ETIMEDOUT (Connection timed out)
[pid 94997] sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
[pid 94997] munmap(0x79dfaa456000, 12288) = 0
[pid 94997] rt_sigprocmask(SIG_BLOCK, ~[RT_1], NULL, 8) = 0
[pid 94997] madvise(0x79dfa9e00000, 2076672, MADV_DONTNEED) = 0
[pid 94997] exit(0)                     = ?
[pid 94997] +++ exited with 0 +++
123
[pid 95011] <... read resumed>"123\n", 8192) = 4

And the trace from Rust's side is

TRACE [...]::poll_oneoff: enter
TRACE [...]::poll_oneoff: triggered fd=0 readiness=EPOLLERR userdata=0 ty=1 peb=1 fd_guards="[guard-file(fd=0, peb=1)]"
TRACE [...]::poll_oneoff: return=Ok(Errno::success) fd_guards="[guard-file(fd=0, peb=1)]" seen="Event { userdata: 0, error: Errno::again, type: Eventtype::FdRead }"
TRACE [...]::poll_oneoff: close time.busy=1.86ms time.idle=417ns fd_guards="[guard-file(fd=0, peb=1)]" seen="Event { userdata: 0, error: Errno::again, type: Eventtype::FdRead }"

We want polling without a timeout to block until stdin is ready, while a non-blocking read must never block. The takeaway is that you can never truly make a non-blocking, concurrent program without the essential APIs from the OS.

After several failed attempts, I did not think the problem could be solved. But after a month of debugging, I finally found that the problem really came from two instances of Stdin being constructed. The solution can be broken up into two parts:

  1. Carefully implement polling logic in poll_read_ready. Note that we do not need anything fancy in poll_read. It is the outer layer (i.e., fd_read) that is responsible for the blocking/non-blocking mode.
  2. Stick to that non-blocking IO from now on.
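The layering described in the two steps above can be modelled in plain C. This is only a sketch of the idea on a POSIX host, not the Wasmer patch itself: the struct guest_fd and the functions guest_fd_init, guest_poll_read_ready, and guest_fd_read are all hypothetical names. The host fd is always non-blocking, and only the outer read decides whether to wait.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

// Hypothetical model of a guest descriptor: the host fd is always kept
// non-blocking, and the guest-visible mode is only a flag in memory.
struct guest_fd {
  int host_fd;
  bool guest_nonblock;  // what the guest requested via fd_fdstat_set_flags
};

// Put the host fd into non-blocking mode once, at setup time.
int guest_fd_init(struct guest_fd *g, int host_fd) {
  int flags = fcntl(host_fd, F_GETFL);
  if (flags < 0 || fcntl(host_fd, F_SETFL, flags | O_NONBLOCK) < 0) return -1;
  g->host_fd = host_fd;
  g->guest_nonblock = false;
  return 0;
}

// Counterpart of poll_read_ready: wait up to timeout_ms (-1 means forever)
// using real I/O multiplexing instead of a blocking read.
int guest_poll_read_ready(const struct guest_fd *g, int timeout_ms) {
  struct pollfd pfd = { .fd = g->host_fd, .events = POLLIN };
  return poll(&pfd, 1, timeout_ms);
}

// Counterpart of fd_read: this outer layer decides whether to wait.
ssize_t guest_fd_read(const struct guest_fd *g, void *buf, size_t len) {
  for (;;) {
    ssize_t n = read(g->host_fd, buf, len);
    if (n >= 0) return n;  // data or EOF
    if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;  // real error
    if (g->guest_nonblock) return -1;  // the guest sees EAGAIN
    // Blocking guest: wait until readable, then try the read again.
    if (guest_poll_read_ready(g, -1) < 0 && errno != EINTR) return -1;
  }
}
```

The design choice here is that the blocking behaviour lives entirely in guest_fd_read, so a poll with no timeout and a non-blocking read can coexist on the same host descriptor.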

Here is my patch against Wasmer v4.2.5. Note that this patch only works on UNIX-like systems, and programs relying on a blocking stdin will break.

The patch enables the nonblocking mode for fd_fdstat_set_flags, which a program can invoke through fcntl. It would be better to inherit that flag from the host. This way, the user can use a wrapper program (like stdbuf) to specify the mode.

This bug bothered me for nearly a month. I had to resort to talking with ChatGPT about my Rust-y code.

Acknowledgements

Thanks to osa1 and others for a discussion thread on the Rust users forum, and for the code from the project @osa1/tiny.

TODO for Wasmtime.

This section serves as a placeholder. I expect Wasmtime to be easier to fix.