
My notes on WasmFS

From the official documentation of Emscripten as of v3.1.48:

WasmFS is a high-performance, fully-multithreaded, WebAssembly-based file system layer for Emscripten that will replace the existing JavaScript version.

The JavaScript-based file system was originally written before pthreads were supported and when it was more optimal to write code in JS. As a result it has overhead in pthreads builds because we must proxy to the main thread where all filesystem operations are done. WasmFS, instead, is compiled to Wasm and has full multithreading support. It also aims to be more modular and extensible.

Its public-facing API is at src/library_wasmfs.js. To have a project adopt WasmFS, link it with the flag -s WASMFS. WasmFS then takes the place of the traditional FS layer. Because all filesystem APIs are native in the first place, you probably want -s FORCE_FILESYSTEM as well to expose the JS APIs.

Unlike the traditional FS implementation, which wires almost all system calls through their counterparts in JS land, WasmFS is mostly written in C/C++ and compiled to Wasm alongside other libraries. This puts async operations directly at the consumer's disposal and naturally provides thread safety when interacting with file systems. In theory, it also helps reduce the runtime JS bundle size, though benchmarks are needed to confirm this.

One can still write JS backends, but writing one for WasmFS is inherently more complicated than before. Now that the data structures no longer live in JS objects, each backend requires a wrapper layer in C. There is a built-in backend primitive, JSImplBackend, for this. The corresponding concrete backend is JSFILEFS. Neither is fully featured; they do just enough to prove the concept.

Here is the list of implemented backends in the official repository:

  • MEMFS: an in-memory FS.
  • NODEFS: mapped to Node's synchronous FS API.
  • OPFS: backed by the origin private file system.
  • ICASEFS: a case-insensitive FS.
  • FETCHFS: an async FS that is proxied through pthread. - new in WasmFS
  • JSFILEFS: a FS that has JS-defined logic with a thin C++ wrapper. - new in WasmFS

Note that IDBFS is not (yet?) supported under WasmFS. If your existing project is linked against it, it will not compile with WasmFS.

TODO: explain wasmFS.addBackend

Now let's trace what a WasmFS mount operation goes through with a custom backend:

  • The operation starts by passing an object with a createBackend method as the type argument of FS.mount(type, opts, mountpoint). The job of createBackend is to create a backend in Wasm memory and return an opaque pointer to it. Built-in constructors follow the naming pattern wasmfs_create_*_backend. The pointer is passed to __wasmfs_mount.
  • The _wasmfs_* functions are at system/lib/wasmfs/js_api.cpp. The mount function calls wasmfs_create_directory, which leads to doMkdir.
  • It then gets the backend and calls its createDirectory method. In this file, we can see that when a syscall needs to operate on a file, directory, or symlink, it will either query its backend, or query the file object and delegate to it.
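
The flow traced above can be sketched in plain JavaScript. This is only an illustration of the control flow, not Emscripten's code: createFakeBackend, fakeWasmfsMount, backendsByPointer, mounts, and MYFS are hypothetical stand-ins for the Wasm side, and the FS object here is a simplified mock that only mirrors the FS.mount(type, opts, mountpoint) shape described in the list.

```javascript
// Hypothetical stand-ins for the Wasm side so the sketch runs in plain Node.js.
const backendsByPointer = new Map(); // pretend Wasm memory: pointer -> backend record
const mounts = new Map();            // mountpoint -> backend pointer
let nextPointer = 0x1000;

// Stand-in for a wasmfs_create_*_backend constructor: allocates a backend
// record and returns an opaque pointer (here just an integer).
function createFakeBackend(name) {
  const pointer = nextPointer;
  nextPointer += 8;
  backendsByPointer.set(pointer, { name, directories: [] });
  return pointer;
}

// Stand-in for the native mount entry point: looks up the backend by pointer
// and lets it create the directory for the mountpoint.
function fakeWasmfsMount(mountpoint, backendPointer) {
  const backend = backendsByPointer.get(backendPointer);
  if (!backend) return -1; // unknown pointer
  backend.directories.push(mountpoint);
  mounts.set(mountpoint, backendPointer);
  return 0;
}

// Simplified mock of FS.mount: ask the type object for a backend pointer,
// then hand that pointer to the (fake) native side.
const FS = {
  mount(type, opts, mountpoint) {
    const backendPointer = type.createBackend(opts);
    return fakeWasmfsMount(mountpoint, backendPointer);
  },
};

// A custom "type" object: the only member FS.mount needs is createBackend.
const MYFS = {
  createBackend(opts) {
    return createFakeBackend(opts.name);
  },
};

const result = FS.mount(MYFS, { name: 'demo' }, '/data');
console.log(result, mounts.get('/data'));
```

In this sketch the JS side never sees the backend's internals. It only holds the integer returned by createBackend, which matches the "opaque pointer" role described above.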

Let's see a particular backend implementation, JSImplBackend:

// To write a new backend in JS, you basically do the following:
//
//  1. Add a declaration of the C function to create the backend in the
//     "backend creation" section of emscripten/wasmfs.h. (One line.)
//  2. Add a cpp file for the new backend, and implement the C function from 1,
//     which should create it on both the C++ (using JSImplBackend) and JS
//     sides. (By convention, the C function should just call into C++ and JS
//     which do the interesting work; the C is just a thin wrapper.) (A few
//     lines.)
//  3. Write a new JS library, and add the implementation of the JS method just
//     mentioned, which should set up the mapping from the C++ backend object's
//     address to the JS code containing the hooks to read and write etc. (99%
//     of the work happens here.)
//
// For a simple example, see js_file_backend.cpp and library_wasmfs_js_file.js

js_impl_backend.h contains brief instructions on how to make one, but I have never compiled Emscripten itself, so that part is left for the future. Hopefully they can publish more guides soon.

JSImplBackend

Currently, JSImplBackend has only one implementation, JSFILEFS. The async version also has one, FETCHFS. Let's dig into JSFILEFS.

It is available as a library (-ljsfile.js) and is implemented in library_wasmfs_js_file.js:

  • src/library_jsfile.js:

    $JSFILEFS: {
      createBackend(opts) {
        return _wasmfs_create_js_file_backend();
      }
    }, /* ... */
    
  • src/library_wasmfs_js_file.js:

    addToLibrary({
      /* ... */
      _wasmfs_create_js_file_backend_js: (backend) => {
        // Register the JS hooks under the address of the C++ backend object.
        wasmFS$backends[backend] = {
          allocFile: (file) => { /* ... */ },
          /* ... */
        };
        /* ... */
      }
    });
    

In JSImplBackend, symlinks and directories are still kept in memory. Only files are specialized as JSImplFile. Files are handled through a minimal collection of JS functions, and file contents are exposed as Uint8Arrays. See system/lib/wasmfs/js_impl_backend.h#L115.
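
To make the "minimal collection of JS functions" idea concrete, here is a rough standalone sketch of such a hook table. It is loosely modeled on what library_wasmfs_js_file.js does, but it is not a copy: only allocFile appears in the excerpt above, so the other hook names and all signatures (freeFile, write, read, getSize) are assumptions for this sketch, as are fileContents and FILE. HEAPU8 is replaced by a local Uint8Array so the code runs without an Emscripten build.

```javascript
// Stand-in for the Wasm linear memory that Emscripten exposes as HEAPU8.
const HEAPU8 = new Uint8Array(1024);

// File contents, keyed by the address of the C++ file object.
const fileContents = {};

// Hook table in the spirit of what a JS backend registers.
// Hook names other than allocFile are assumptions for this sketch.
const hooks = {
  allocFile(file) {
    // Nothing to do: the typed array is allocated lazily on first write.
  },
  freeFile(file) {
    delete fileContents[file];
  },
  // Copy `length` bytes from Wasm memory at `buffer` into the file at `offset`.
  write(file, buffer, length, offset) {
    const needed = offset + length;
    const old = fileContents[file];
    if (!old || old.length < needed) {
      // Grow (or create) the backing array, preserving existing bytes.
      const grown = new Uint8Array(needed);
      if (old) grown.set(old);
      fileContents[file] = grown;
    }
    fileContents[file].set(HEAPU8.subarray(buffer, buffer + length), offset);
    return 0;
  },
  // Copy up to `length` bytes from the file at `offset` into Wasm memory at
  // `buffer`, returning how many bytes were actually copied.
  read(file, buffer, length, offset) {
    const data = fileContents[file];
    if (!data) return 0;
    const count = Math.max(0, Math.min(length, data.length - offset));
    HEAPU8.set(data.subarray(offset, offset + count), buffer);
    return count;
  },
  getSize(file) {
    return fileContents[file] ? fileContents[file].length : 0;
  },
};

// Usage: pretend the Wasm side placed "hello" at address 16 and asked the
// backend to store it, then read it back into address 64.
const FILE = 0x2000; // hypothetical file pointer
HEAPU8.set(new TextEncoder().encode('hello'), 16);
hooks.allocFile(FILE);
hooks.write(FILE, 16, 5, 0);
const bytesRead = hooks.read(FILE, 64, 5, 0);
const roundTrip = new TextDecoder().decode(HEAPU8.subarray(64, 64 + bytesRead));
console.log(hooks.getSize(FILE), roundTrip);
```

The point of the design is that the C++ side only passes integers (file address, buffer address, length, offset), while the actual bytes live in a Uint8Array owned by JS.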

TODO: describe what happens on JS-side and C-side respectively.

In jsimpl, a backend exposes a createBackend method that stores the JS methods into $wasmFS$backends[id]. I wonder whether different JS mounts share the same backend definition.

Sample code

I played around with WasmFS's JSFILEFS here:
https://gist.github.com/andy0130tw/76352f747f7a1d9b3210e9535db7e4a1.