# My notes on WasmFS

From [the official documentation of Emscripten as of v3.1.48](https://emscripten.org/docs/api_reference/Filesystem-API.html#new-file-system-wasmfs):

> WasmFS is a high-performance, fully-multithreaded, WebAssembly-based file system layer for Emscripten that will replace the existing JavaScript version.
>
> The JavaScript-based file system was originally written before pthreads were supported and when it was more optimal to write code in JS. As a result it has overhead in pthreads builds because we must proxy to the main thread where all filesystem operations are done. WasmFS, instead, is compiled to Wasm and has full multithreading support. It also aims to be more modular and extensible.

Its public-facing API is at [src/library_wasmfs.js](https://github.com/emscripten-core/emscripten/blob/3.1.48/src/library_wasmfs.js).

To let a project adopt WasmFS, link it with the flag `-s WASMFS`. WasmFS then takes over the role of the traditional FS layer. Since all of its filesystem APIs are native in the first place, you probably want `-s FORCE_FILESYSTEM` as well to expose the JS APIs.

Unlike the traditional FS implementation, which wires almost all system calls through their counterparts in JS land, WasmFS is mostly written in C/C++ and compiled to Wasm alongside the other libraries. This puts async operations directly at the consumer's disposal and gives thread safety naturally when interacting with file systems. In theory, it also helps reduce the runtime JS bundle size, though benchmarks are needed to confirm this.

One can still write JS backends, but writing one for WasmFS is inherently more complicated than before. Now that the data structures no longer live in JS objects, each of them requires a wrapper layer in C. There is a built-in backend primitive, `JSImplBackend`, for this. The corresponding concrete backend is JSFILEFS. Neither is fully featured; they are just enough to prove the concept.

Here is the list of backends implemented in the official repository:

* MEMFS: in-memory.
* NODEFS: mapped to Node's synchronous FS API.
* OPFS: supporting the [Origin private file system](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API/Origin_private_file_system).
* ICASEFS: a case-insensitive FS.
* FETCHFS: an async FS that is proxied through a pthread. (new in WasmFS)
* JSFILEFS: an FS that has JS-defined logic with a thin C++ wrapper. (new in WasmFS)

Note that **IDBFS** is *not* (yet?) supported under WasmFS. If your existing project is linked against it, it will not compile with WasmFS.

TODO: explain `wasmFS.addBackend`

Now let's trace what a WasmFS mount operation goes through when used with a custom backend:

* The operation starts by passing an object with a `createBackend` method as the `type` argument of [`FS.mount(type, opts, mountpoint)`](https://github.com/emscripten-core/emscripten/blob/3.1.48/src/library_wasmfs.js#L346). The job of `createBackend` is to instantiate a backend in Wasm memory and return an opaque pointer to it. Built-in constructors follow the naming pattern `wasmfs_create_*_backend`. The pointer is passed to `__wasmfs_mount`.
* `_wasmfs_*` is at [system/lib/wasmfs/js_api.cpp](https://github.com/emscripten-core/emscripten/blob/3.1.48/system/lib/wasmfs/js_api.cpp). It calls a system call `wasmfs_create_directory`, which points to `doMkdir`.
* It gets the backend and calls its `createDirectory` method.

In this file, we can see that when a syscall needs to operate on a file/directory/symlink, it will either query its backend, or query the file object and delegate to it.

Let's see a particular backend implementation, `JSImplBackend`:

```cpp
// To write a new backend in JS, you basically do the following:
//
//  1. Add a declaration of the C function to create the backend in the
//     "backend creation" section of emscripten/wasmfs.h. (One line.)
//  2. Add a cpp file for the new backend, and implement the C function from 1,
//     which should create it on both the C++ (using JSImplBackend) and JS
//     sides. (By convention, the C function should just call into C++ and JS
//     which do the interesting work; the C is just a thin wrapper.) (A few
//     lines.)
//  3. Write a new JS library, and add the implementation of the JS method just
//     mentioned, which should set up the mapping from the C++ backend object's
//     address to the JS code containing the hooks to read and write etc. (99%
//     of the work happens here.)
//
// For a simple example, see js_file_backend.cpp and library_wasmfs_js_file.js
```

`js_impl_backend.h` contains brief instructions on how to make one, but I have never compiled Emscripten itself, so that part is left for the future. Hopefully they will publish more guides soon.

# JSImplBackend

Currently `JSImplBackend` has only one implementation, `JSFILEFS`. The async version also has one, `FETCHFS`.

Let's dig into `JSFILEFS`. It is available as a library (`-ljsfile.js`) and is implemented in `library_wasmfs_js_file.js`:

* [src/library_jsfile.js](https://github.com/emscripten-core/emscripten/blob/3.1.48/src/library_jsfile.js):

  ```javascript
  $JSFILEFS: {
    createBackend(opts) {
      return _wasmfs_create_js_file_backend();
    }
  },
  /* ... */
  ```

* [src/library_wasmfs_js_file.js](https://github.com/emscripten-core/emscripten/blob/3.1.48/src/library_wasmfs_js_file.js):

  ```javascript
  addToLibrary({
    /* ... */
    _wasmfs_create_js_file_backend_js: (backend) => {
      // Register the JS hooks under the address of the C++ backend object.
      wasmFS$backends[backend] = {
        allocFile: (file) => { /* ... */ },
        /* ... */
      };
      /* ... */
    }
  });
  ```

In `JSImplBackend`, symlinks and directories are still in-memory. Only files are specialized, as `JSImplFile`. Files are handled through a minimal collection of JS functions, and file contents are exposed as `Uint8Array`s. See [system/lib/wasmfs/js_impl_backend.h#L115](https://github.com/emscripten-core/emscripten/blob/3.1.48/system/lib/wasmfs/js_impl_backend.h#L115).

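
To make the "minimal collection of JS functions" more concrete, here is a standalone sketch of what such a hook table could look like. It runs in plain Node and is not linked against Emscripten at all: `HEAPU8`, `backends`, `files` and `createJsFileBackendJs` are stand-ins I made up for Wasm linear memory, `wasmFS$backends` and `_wasmfs_create_js_file_backend_js`. Only `allocFile` appears in the excerpt above; the other hook names and all the signatures are my assumptions, so check `library_wasmfs_js_file.js` before relying on them.

```javascript
// Standalone simulation; none of these names come from Emscripten itself.
const HEAPU8 = new Uint8Array(1024); // stand-in for Wasm linear memory
const backends = {};                 // backend address -> hook table
const files = {};                    // file address -> Uint8Array contents

// Hypothetical counterpart of the JS-side "create backend" function:
// it registers the hooks under the address of the (imaginary) C++ object.
function createJsFileBackendJs(backend) {
  backends[backend] = {
    allocFile(file) {
      // Nothing to do: storage is allocated lazily on the first write.
    },
    freeFile(file) {
      delete files[file];
    },
    // Copy `length` bytes from memory at `buffer` into the file at `offset`.
    write(file, buffer, length, offset) {
      const old = files[file] || new Uint8Array(0);
      const end = offset + length;
      let data = old;
      if (end > old.length) {
        data = new Uint8Array(end); // grow, keeping the old contents
        data.set(old);
      }
      data.set(HEAPU8.subarray(buffer, buffer + length), offset);
      files[file] = data;
      return length;
    },
    // Copy up to `length` bytes from the file at `offset` into memory.
    read(file, buffer, length, offset) {
      const data = files[file];
      if (!data || offset >= data.length) return 0;
      const n = Math.min(length, data.length - offset);
      HEAPU8.set(data.subarray(offset, offset + n), buffer);
      return n;
    },
    getSize(file) {
      return files[file] ? files[file].length : 0;
    },
  };
}

// Usage: the addresses below are arbitrary numbers chosen for the demo.
const BACKEND = 0x100;
const FILE = 0x200;
createJsFileBackendJs(BACKEND);
const hooks = backends[BACKEND];
hooks.allocFile(FILE);

HEAPU8.set(new TextEncoder().encode('hello'), 16); // "C side" fills a buffer
const written = hooks.write(FILE, 16, 5, 0);
const bytesRead = hooks.read(FILE, 64, 5, 0);      // read back elsewhere
const text = new TextDecoder().decode(HEAPU8.subarray(64, 64 + bytesRead));
console.log(written, bytesRead, text, hooks.getSize(FILE));
```

The point of the sketch is the shape of the contract: the C++ side only ever passes addresses and lengths, and the JS side decides where the bytes actually live.
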
TODO: describe what happens on the JS side and the C side respectively.

In `jsimpl`, a backend exposes a `createBackend` method that stores the JS methods into `$wasmFS$backends[id]`. I wonder whether different JS mounts share the same backend definition.

## Sample code

I played around with WasmFS's `JSFILEFS` here: https://gist.github.com/andy0130tw/76352f747f7a1d9b3210e9535db7e4a1.