# WASM/WASI realpath issue

**Symptom**: In Haskell, the `canonicalizePath` function from the `directory` package is overly conservative when simplifying paths whose prefix components contain a hole that is not backed by any preopen.

> Function definition for reference: https://github.com/haskell/directory/blob/v1.3.9.0/System/Directory/OsPath.hs#L865

:::success
**tl;dr**: The behavior is intended, but you might have a hard time discovering it. If you cannot control your preopens, you had better patch the WASI runtime to emulate a root directory.
:::

For example, a WASM module with only one preopen `/foo/bar`, which contains a directory `xxx` and a file `yyy`, fails to canonicalize `/foo/bar/xxx/../yyy` to `/foo/bar/yyy` (the path is returned as-is). This contradicts [the description](https://hackage.haskell.org/package/directory-1.3.9.0/docs/System-Directory.html#v:canonicalizePath) (emphasis mine):

> Indirections include the two special directories `.` and `..`, as well as any symbolic links (and junction points on Windows). The input path need not point to an existing file or directory. Canonicalization is performed ==on the longest prefix of the path that points to an existing file or directory==. The remaining portion of the path that does not point to an existing file or directory will still be normalized, ==but case canonicalization and indirection removal are skipped== as they are impossible to do on a nonexistent path.

In this case the "longest prefix that points to an existing file or directory" is `/foo/bar/xxx`. It should not matter whether `/foo` exists or not.

Does the failure imply that the absence of `/` breaks path resolution? As seen below, this turned out to be the case: the function refuses to resolve any further when no preopen points to `/`.
# Analysis

Code path (TODO):

* `realpath` (libc-top-half/musl/src/misc/realpath.c)
* `readlink` (libc-bottom-half/sources/posix.c)
* `find_relpath`
* `find_relpath2`
* `__wasilibc_find_relpath` (or ~`_alloc`)
* `__wasilibc_find_abspath` (libc-bottom-half/headers/public/wasi/libc-find-relpath.h)
* ...
* if it exists -> `__wasilibc_nocwd_readlinkat`
* trap to `__wasi_path_readlink` in JS import object

TODO: This list comes from skimming the code. I cannot verify these branches because I do not know how to make `zig cc` use my instrumented wasi-libc.

# Working around wasi-libc's `realpath`

From experiments it appears that `realpath` from libc is the culprit. Given the directory structure:

```shell
$ mkdir -p /tmp/foo/bar/baz
$ mkdir -p /tmp/foo/bar/owo/qaq
```

And the testing code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main() {
    char buf[4096];
    /* Resolve a path that contains a ".." component. */
    char* ptr = realpath("/tmp/foo/bar/baz/../owo/qaq", buf);
    if (ptr == NULL)
        printf("failed: %s\n", strerror(errno));
    else
        printf("realpath %s\n", ptr);
    return 0;
}
```

Compile with: `zig cc -target wasm32-wasi main.c -o main.wasm`

Testing with Node.js or wasmtime gives the same result:

```shell!
$ wasmtime ./main.wasm # no preopen
failed: No such file or directory
$ wasmtime --dir / ./main.wasm # "/" is preopened explicitly
realpath /tmp/foo/bar/owo/qaq
$ wasmtime --dir /tmp ./main.wasm # "/" is implicitly created to contain "tmp"
realpath /tmp/foo/baz/owo/qaq
$ wasmtime --dir /tmp/foo ./main.wasm # why?
failed: No such file or directory
$ wasmtime --dir /tmp/foo/bar ./main.wasm # why?
failed: No such file or directory
```

When `/` does not exist in the VFS, all calls to `realpath` fail with `ENOENT`. This behavior is not obvious at first sight, since things only start breaking at the second level of nesting: mounting `/tmp` alone is okay but mounting `/tmp/foo` is not.
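
One way to read the five results above is sketched below. This is a toy model, not wasi-libc's code: it assumes that `realpath` checks every cumulative prefix of the path (`/tmp`, `/tmp/foo`, ...) and that each check fails with `ENOENT` unless some preopen covers that prefix. The names `covered_by_preopen` and `model_realpath_ok` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model (assumption): a preopen covers `path` when it is "/" or when it
   equals `path` or is a prefix of it ending on a '/' boundary. */
static bool covered_by_preopen(const char *path,
                               const char *const *preopens, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(preopens[i]);
        if (strcmp(preopens[i], "/") == 0)
            return true;
        if (strncmp(path, preopens[i], len) == 0 &&
            (path[len] == '\0' || path[len] == '/'))
            return true;
    }
    return false;
}

/* Walk every cumulative prefix of an absolute path and give up as soon as
   one prefix is not covered by any preopen. ".." is not resolved here; only
   the coverage of each literal prefix is checked. */
static bool model_realpath_ok(const char *path,
                              const char *const *preopens, size_t n) {
    char prefix[256];
    size_t plen = strlen(path);
    assert(path[0] == '/');
    if (plen >= sizeof prefix)
        return false;
    for (size_t i = 1; i <= plen; i++) {
        if (path[i] == '/' || path[i] == '\0') {
            memcpy(prefix, path, i);
            prefix[i] = '\0';
            if (!covered_by_preopen(prefix, preopens, n))
                return false;
        }
    }
    return true;
}
```

For the path used in the test program, this model succeeds with `/` or `/tmp` as the only preopen and fails with no preopen, `/tmp/foo`, or `/tmp/foo/bar`, which is the same pattern as the wasmtime runs above. With `/tmp/foo` preopened, the very first prefix `/tmp` already has no preopen behind it.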
### Traces on Node.js

I did not trace on wasmtime, but since both runtimes behave the same, an implementation bug in the runtime is unlikely.

The sample program yields the following if `/` or `/tmp` is mounted:

```
[WASI fd_prestat_get] 3,16777208 RET=0
[WASI fd_prestat_dir_name] fd=3 len=4 --> DONE ret=0 result=/tmp
[WASI fd_prestat_get] 4,16777208 RET=8
[WASI random_get] 16777212,4 RET=0
[WASI path_readlink] dirfd=3 path=[.] outbuflen=4073 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo] outbuflen=4077 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo/bar] outbuflen=4081 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo/bar/baz] outbuflen=4085 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo/bar/baz/..] outbuflen=4088 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo/bar/owo] outbuflen=4092 --> FAIL ret=28
[WASI path_readlink] dirfd=3 path=[foo/bar/owo/qaq] outbuflen=4096 --> FAIL ret=28
[WASI fd_fdstat_get] 1,16764584 RET=0
[WASI fd_write] 1,16764592,2,16764588
realpath /tmp/foo/bar/owo/qaq
RET=0
```

When things do not work, as when preopening `/tmp/foo`, it gives up rather early (not even stating `/`!):

```
[WASI fd_prestat_get] 3,16777208 RET=0
[WASI fd_prestat_dir_name] fd=3 len=8 --> DONE ret=0 result=/tmp/foo
[WASI fd_prestat_get] 4,16777208 RET=8
[WASI random_get] 16777212,4 RET=0
[WASI fd_fdstat_get] 1,16764584 RET=0
[WASI fd_write] 1,16764592,2,16764588
failed: No such file or directory
RET=0
```

> [!NOTE] **Side note**
> The `path_readlink` syscall returns `28` (`EINVAL`) if the target exists and is not a symlink. This is intended and conforms to the POSIX [`readlink` syscall](https://man7.org/linux/man-pages/man2/readlink.2.html). NOT to be confused with the [`readlink` command](https://man7.org/linux/man-pages/man1/readlink.1.html)!

## Suggested workaround

Simply put, avoid preopening a directory that is more than one level deep. For example, if you want to preopen `/tmp/foo`, preopen `/tmp` instead and make a `foo` directory in it.
To work around this in your WASI implementation:

1. At program startup, wasi-libc [enumerates](https://github.com/WebAssembly/wasi-libc/blob/wasi-sdk-27/libc-bottom-half/sources/preopens.c#L244) fds starting from 3 and probes `fd_prestat_get`/`fd_prestat_dir_name` until it sees `EBADF` (errno = 8).
    * When the underlying `fd_prestat_get` first errors and a root has not been discovered, allocate a fake root fd at this number `n` and report it instead.
    * You should write `\0\0\0\0\x01\0\0\0` to the output buffer, meaning it is a dir (byte 0, as uint8) with a path length of 1 (starting at byte 4, as size_t or uint32).
    * When `fd_prestat_dir_name` is called with `n`, report `/` (of course, only if the buffer is long enough).
2. In all other syscalls, pretend that `/` is backed by a dummy hierarchy of directories that "plugs" the holes.
    * In this case, wasi-libc attempts to call `path_readlink` on the fake root fd with path `.`, `tmp`, etc., depending on how deep your first preopen is. You should return `28` (`EINVAL`) for these. Return `44` (`ENOENT`) for all other queries.
    * The simplest way is to create a real directory structure and let the WASI runtime answer the queries for you.
    * It does not seem necessary to "emulate" complete subpath resolution in `/` (like `/tmp/foo/bar`). If wasi-libc matches a prefix of a preopened directory, it queries the corresponding preopen fd. What a relief.

> ps. In vscode-wasm a root file system is always attached. They know what they are doing :)
> https://github.com/microsoft/vscode-wasm/blob/release/wasm-wasi-lsp/0.1.0-pre.8/wasm-wasi-core/src/common/service.ts#L1094
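
The fake-root steps above can be sketched in C as three small handlers. This is a minimal sketch under assumptions, not code from any runtime: the function names are hypothetical, a single preopen is assumed, and the error returned for a too-short name buffer (`EINVAL` here) is my own choice since the steps above do not specify one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* WASI errno values used in this note. */
enum { WASI_ESUCCESS = 0, WASI_EBADF = 8, WASI_EINVAL = 28, WASI_ENOENT = 44 };

/* fd_prestat_get for the fake root: tag 0 (directory) at byte 0 and the
   name length 1 as a little-endian u32 starting at byte 4. */
static int fake_root_prestat_get(uint8_t out[8]) {
    memset(out, 0, 8);
    out[4] = 1;
    return WASI_ESUCCESS;
}

/* fd_prestat_dir_name for the fake root: report "/" if the buffer can hold
   it. The error code for a short buffer is an assumption. */
static int fake_root_prestat_dir_name(uint8_t *buf, uint32_t buf_len) {
    if (buf_len < 1)
        return WASI_EINVAL;
    buf[0] = '/';
    return WASI_ESUCCESS;
}

/* path_readlink on the fake root. `rel` is relative to "/", e.g. "." or
   "tmp"; `preopen` is the absolute path of the real preopen, e.g. "/tmp/foo".
   "." and every proper ancestor of the preopen "exist but are not symlinks"
   (EINVAL); everything else does not exist (ENOENT). */
static int fake_root_path_readlink(const char *rel, const char *preopen) {
    assert(preopen[0] == '/');
    if (strcmp(rel, ".") == 0)
        return WASI_EINVAL;
    size_t n = strlen(rel);
    const char *p = preopen + 1; /* skip the leading '/' */
    if (n > 0 && strncmp(p, rel, n) == 0 && p[n] == '/')
        return WASI_EINVAL;
    return WASI_ENOENT;
}
```

With `/tmp/foo` as the real preopen, `fake_root_path_readlink` answers `28` for `.` and `tmp` and `44` for anything else such as `etc`. The preopen path itself (`tmp/foo`) is deliberately not handled, on the assumption stated above that wasi-libc sends such queries to the real preopen fd.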