# A new `vm` Source Text Module API

This pad proposes a new `vm` Module API that replaces the existing `vm.SourceTextModule`.

## 1. Full customization

```js
// 1. Full customization example
import {
  SourceTextModule,
  SyntheticModule,
  SourceTextModuleLoader,
} from 'vm/modules';

// Names starting with `helper` are only added for this example; they
// are not part of the API.
class ExampleModuleLoader extends SourceTextModuleLoader {
  #sourceStore = new Map();
  #moduleCache = new Map();

  // Invoked by Node.js to perform dynamic import(); overrides the default.
  dynamicImport(request, context, parent) /* override */ {
    // XXX: is the batching a bit unnecessary? Should we go with always operating on a single request?
    const mod = this.helperGetSingleModule(request, context, parent);
    mod.evaluate();
    // It's not required for topLevelCapability to fulfill here.
    // That promise will just be forwarded to the code actually awaiting
    // on the dynamic import().
    return mod.namespace();
  }

  // Invoked by Node.js to initialize import.meta; overrides the default.
  importMeta(meta, context, parent) /* override */ {
    // If objects are attached, they should typically be created inside
    // `context`, e.g. with vm.runInContext() or similar.
    meta.identifier = parent.identifier;
  }

  // Should we add load/resolve hooks? How does that work for synthetic modules?
  // What if the dependencies should come from a different context?
  // Idea: provide defaultResolve/defaultLoad: https://github.com/nodejs/node/issues/61127

  // Invoked by Node.js to get the Modules for requests; overrides the default.
  // If the loading process is async, users can hoist
  // the fetching out and fetch proactively outside getModules().
  // TODO: Might need a better name
  getModules(requests, context, parent) /* override */ {
    const result = [];
    for (const { specifier } of requests) {
      // Consult the custom cache first.
      const cached = this.#moduleCache.get(specifier);
      if (cached) {
        result.push(cached);
        continue;
      }
      // Use the custom source store.
      const source = this.#sourceStore.get(specifier);
      if (source === undefined) {
        throw new Error(`module not found: ${specifier}`);
      }
      // A more full-fledged implementation might expand this into a full path.
      const identifier = specifier;
      // Compile.
      const mod = new SourceTextModule(source, { identifier, context });
      // Cache before recursing, so shared dependencies are compiled once
      // and import cycles do not recurse forever.
      this.#moduleCache.set(specifier, mod);
      // Get dependencies.
      // This is DFS; users can implement BFS too for better parallelization.
      const modules = this.getModules(mod.requests, context, mod);
      // This linking is again DFS in this implementation.
      mod.link(modules);
      result.push(mod);
    }
    return result;
  }

  // getModules() takes an array for batching;
  // this helper does it one-to-one.
  helperGetSingleModule(request, context, parent) {
    return this.getModules([request], context, parent)[0];
  }

  helperAddSource(identifier, source) {
    this.#sourceStore.set(identifier, source);
  }

  // Example helper that adds a builtin as a SyntheticModule to the
  // module cache.
  helperAddBuiltin(identifier, mapping) {
    const keys = Object.keys(mapping);
    const mod = new SyntheticModule(keys, () => {
      for (const key of keys) {
        mod.setExport(key, mapping[key]);
      }
    });
    this.#moduleCache.set(identifier, mod);
  }
}
```

Preparing the custom loader:

```js
const loader = new ExampleModuleLoader();

// Add a few module sources
loader.helperAddSource('async-root', `
  export { foo } from 'foo';
  export let bar;
  bar = (await import('builtin:bar')).default + ' ' + import.meta.identifier;
`);
loader.helperAddSource('sync-root', `
  export { foo } from 'foo';
  import { default as barDefault } from 'builtin:bar';
  export const bar = barDefault + ' ' + import.meta.identifier;
`);
loader.helperAddSource('foo', `
  export const foo = globalThis.foo;
`);
// Add a synthetic module with prebaked values
loader.helperAddBuiltin('builtin:bar', { default: 'bar' });
```

#### 1.a Loading a module that allows an async graph in a new context

```js
// Create a custom execution context
import { createContext } from 'node:vm';
const context = createContext({ foo: 'foo' });

{
  const { 0: mod } = loader.getModules([{ specifier: 'async-root' }], context);
  // Throws synchronously if loading errors.
  try {
    // TODO(joyee): what about deferred evaluate()?
    mod.evaluate();
  } catch (e) {
    if (e.code === 'ERR_VM_MODULE_STATUS') {
      // handle status issue.
    }
    throw e;
  }
  mod.hasTopLevelAwait(); // true
  mod.hasAsyncGraph(); // true
  await mod.topLevelCapability;
  mod.namespace(); // { foo: 'foo', bar: 'bar async-root' } from custom context
}
```

#### 1.b Mini synchronous require(esm)

```js
{
  const { 0: mod } = loader.getModules([{ specifier: 'sync-root' }], context);
  // Throws synchronously if loading errors.
  try {
    mod.evaluate();
  } catch (e) {
    if (e.code === 'ERR_VM_MODULE_STATUS') {
      // handle status issue.
    }
    throw e;
  }
  mod.hasTopLevelAwait(); // false
  mod.hasAsyncGraph(); // false
  // TODO: we can implement an API for users to unwrap evaluation
  // errors synchronously from the rejection.
  mod.topLevelCapability; // A promise fulfilled with undefined.
  mod.namespace(); // { foo: 'foo', bar: 'bar sync-root' } from custom context
}
```

## 2. Partial customization & registering it globally

```js
// 2. Customizing based on default loading
import { SourceTextModuleLoader, SyntheticModule } from 'vm/modules';
import { createContext, runInContext } from 'node:vm';

// Support the special import attribute { type: 'foo' }
class CustomLoader extends SourceTextModuleLoader {
  getModules(requests, context, parent) {
    return requests.map((req) => {
      if (req.attributes?.type === 'foo') {
        return new SyntheticModule(['foo'], function() {
          this.setExport('foo', 'foo');
        }, { context });
      }
      // This can be optimized by batching uncustomized requests too
      return super.getModules([req], context, parent)[0];
    });
  }
  // Other methods are left as the default.
}

const loader = new CustomLoader();
const context = createContext();

// Register it for the vm context.
// This is only allowed to be called once per context.
// TODO: Maybe allow more than once, using the Chain of Responsibility
// pattern to allow hoisting of asynchronicity.
// For the middleware pattern, use module.registerHooks(hooks, context) instead.
// Only one pattern can be applied per context.
loader.register(context);

// { foo: 'foo' } evaluated in the vm context
console.log(
  await runInContext(
    `import('foo', { with: { type: 'foo' } });`,
    context
  )
);
loader.deregister(context); // When done.

// To register it for the main context globally.
// This is only allowed to be called once per context.
loader.register();
// { foo: 'foo' } evaluated in the main context
console.log(await import('foo', { with: { type: 'foo' } }));
```

## Background

It's been a while since the `vm` module APIs were implemented behind `--experimental-vm-modules`, and they have stayed experimental since.

- There was a [tracking issue](https://github.com/nodejs/node/issues/37648) about their stabilization.
- Notable discussions:
  - https://github.com/nodejs/node/issues/43899
  - https://github.com/nodejs/node/issues/37648

Recently there has been renewed momentum to look into what it takes to bring them out of experimental status, and some new APIs have been added to address missing needs. Along the way, it has become clear that most of the API surface needs a redesign to be more powerful, flexible, efficient, and conceptually closer to the ES spec. Piling new methods onto the existing classes in order to do things differently without breaking the API has become a bit awkward.

The recent changes include:

- `link()`:
  - The original `link` method takes an asynchronous callback, which takes a specifier, returns the resolved module, and is invoked once per module request.
  - The new `linkRequests` method is invoked proactively by users and takes an array of resolved modules that correspond to the array of module requests.
  - The new API has a lower performance overhead (by avoiding multiple C++ -> JS round trips), and it lets implementers control how the loading is performed, so it's possible to implement synchronous `require(esm)` with it.
  - This method is now a no-op for `SyntheticModule`, because the concept of linking does not apply to them.
- `instantiate()`: A new API to accompany `linkRequests()`, which users call proactively. Essentially, the old `stm.link((specifier) => module)` was split into `stm.linkRequests(modules)` + `stm.instantiate()`. This allows users to do whatever they need before the actual instantiation.
- `moduleRequests`: Another new API to accompany `linkRequests()`. The order of the module requests is stable for the same source text module, and the modules passed to `linkRequests()` must follow the same order (when two module requests are identical except for their phases, the resolved modules must also be identical).
- `hasTopLevelAwait()` and `hasAsyncGraph()`: to aid the implementation of `require(esm)`.
- `evaluate()`: This previously always returned a pending promise and was thus always asynchronous. It now returns a fulfilled promise for modules without top-level await, which matches the ES spec more closely. Loader errors, however, are still mixed with module evaluation errors in the rejection for backward compatibility, which is a bit messy.

Another issue with the current design of the API is that the callbacks used to handle dynamic `import()` and `import.meta` initialization are passed as options to the module constructors. They require careful memory management to make sure that they neither leak when the callbacks capture the referrers, nor hit a use-after-free when some remote code calls a module's `import()` indirectly via a closure exported elsewhere. We've partially addressed this for modules with dedicated contexts by supporting per-context callbacks shared across all modules created in that context, but it remains unsolved for modules created in the main context.

At this point, it seems better to consolidate the changes into a new API to offer a more baggage-free experience, instead of mixing them into the existing interface.
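To make the `moduleRequests` ordering rule from the list of recent changes more concrete, here is a minimal sketch of how a user-land loader might build the array handed to `linkRequests()`. It assumes each request is a plain object with `specifier`, `attributes` and `phase` fields, roughly the shape of `moduleRequests` entries, and the phase strings are illustrative only. `buildLinkArray`, `requestKey` and the `resolve` callback are hypothetical names, not part of any Node.js API. The output keeps request order, and requests that differ only in phase receive the very same module.

```javascript
// Hypothetical helper (not a Node.js API): builds the module array for
// `linkRequests()` from a `moduleRequests`-style array. Output index i
// corresponds to requests[i], and requests that differ only in `phase`
// are given the very same module object.
function buildLinkArray(requests, resolve) {
  const resolved = new Map(); // request key -> module
  return requests.map(({ specifier, attributes = {} }) => {
    const key = requestKey(specifier, attributes);
    if (!resolved.has(key)) {
      // `resolve` is a user-supplied (assumed) function returning a module.
      resolved.set(key, resolve(specifier, attributes));
    }
    return resolved.get(key);
  });
}

// Keys on specifier + attributes, sorted so that attribute order does not
// matter, and deliberately ignores `phase`.
function requestKey(specifier, attributes) {
  const entries = Object.entries(attributes).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0,
  );
  return JSON.stringify([specifier, entries]);
}
```

A loader would then call something like `mod.linkRequests(buildLinkArray(mod.moduleRequests, resolve))`. Memoizing on a phase-independent key is just one simple way to satisfy the "same module for phase-only differences" constraint.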
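The comments in `ExampleModuleLoader.getModules()` note two things. Asynchronous fetching can be hoisted out of the synchronous `getModules()`, and a breadth-first walk parallelizes better than the depth-first one used there. The following is a minimal sketch of such a prefetch step. It assumes a hypothetical async `fetchModuleInfo(specifier)` that resolves to `{ source, deps }`. In practice, discovering `deps` would usually require parsing or compiling the source; here the fetcher supplies them so the sketch stays self-contained. `prefetchGraph` is also a hypothetical name.

```javascript
// Hypothetical sketch: breadth-first prefetch of a module graph, so that a
// synchronous getModules() only ever reads from an in-memory source store.
// `fetchModuleInfo(specifier)` is an assumed async function resolving to
// { source, deps }, where `deps` lists the specifiers the module imports.
async function prefetchGraph(rootSpecifiers, fetchModuleInfo) {
  const sources = new Map(); // specifier -> source text
  const seen = new Set(rootSpecifiers);
  let frontier = [...seen];
  while (frontier.length > 0) {
    // Fetch every module in the current BFS level concurrently.
    const infos = await Promise.all(frontier.map((s) => fetchModuleInfo(s)));
    const next = [];
    infos.forEach(({ source, deps }, i) => {
      sources.set(frontier[i], source);
      for (const dep of deps) {
        // Skip modules already queued, which also handles cycles.
        if (!seen.has(dep)) {
          seen.add(dep);
          next.push(dep);
        }
      }
    });
    frontier = next;
  }
  return sources;
}
```

Each entry of the returned map could then be fed to the example's `helperAddSource()` before calling `getModules()`, which keeps the actual compile-and-link step fully synchronous.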
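The change list describes the old `stm.link((specifier) => module)` being split into `stm.linkRequests(modules)` + `stm.instantiate()`. To illustrate that split, here is a rough sketch of how the old one-callback-per-request flow could be emulated on top of the new two-step flow. It is written against the method names used by the current experimental `vm.SourceTextModule` (`moduleRequests`, `linkRequests()`, `instantiate()`). The `linker(specifier, referrer, extra)` callback is only modeled on the legacy linker signature. `linkGraph` and `linkAndInstantiate` are hypothetical names, and treating any module with a `moduleRequests` array as something to recurse into is an assumption of this sketch.

```javascript
// Hypothetical sketch: emulate the legacy per-request `link(linker)` flow
// on top of the array-based `linkRequests()` + `instantiate()` split.
// `linker(specifier, referrer, extra)` is user code returning a module or a
// promise of one; its signature is only modeled on the legacy linker.
async function linkGraph(module, linker, visited = new Set()) {
  if (visited.has(module)) return; // Already handled (shared dep or cycle).
  visited.add(module);
  // Resolve every request of this module, keeping `moduleRequests` order.
  const deps = await Promise.all(
    module.moduleRequests.map((request) =>
      linker(request.specifier, module, { attributes: request.attributes }),
    ),
  );
  module.linkRequests(deps);
  // Recurse into dependencies that expose their own requests; anything
  // else (e.g. a synthetic module) is treated as a leaf.
  for (const dep of deps) {
    if (Array.isArray(dep.moduleRequests)) {
      await linkGraph(dep, linker, visited);
    }
  }
}

async function linkAndInstantiate(root, linker) {
  await linkGraph(root, linker);
  // Users can do extra work here, before the actual instantiation.
  root.instantiate();
}
```

This sketch does not enforce the "same module for phase-only differences" rule on its own; it relies on the linker returning consistent results for such requests.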