# FVM Wasm Memory Expansion

This document is a collection of notes I took during a preliminary investigation into the [issues](https://github.com/filecoin-project/ref-fvm/issues/851) around gas accounting for memory usage. The main goal was to understand how to configure [wasmtime](https://github.com/bytecodealliance/wasmtime), the Wasm library used by the FVM, so that we can achieve our goals for the FVM in an efficient manner. One concern was that we might end up over-allocating a lot of memory, or under-allocating and charging gas for copying memory around when the contract runs out of memory and needs to "double" it, which would be costly and slow. The other was how to enforce overall limits during contract execution.

## FVM Overview

It helps to get an idea of how the FVM works to understand what we want from Wasmtime. Each message (except transfers) included in a block is handled by creating a fresh Wasm instance, loading the Wasm module corresponding to the Actor the message is targeted at, injecting all the functions supported by the host (e.g. interacting with storage), then invoking the execution, which can return some value. The interesting part from our perspective is what happens when the Actor needs to talk to another Actor, i.e. when it needs to recurse.

### The Actor Model

Actors communicate through messages. If we look at the built-in Market Actor as an example, we see that it implements [invoke_method](https://github.com/filecoin-project/builtin-actors/blob/5934421954d63e7ec84b4ba6eaac1e30811e9a05/actors/market/src/lib.rs#L1455), which gets a `MethodNum` it uses for dispatching and some `RawBytes` as parameters, which are deserialized into the correct type depending on the method number. Then the code calls the corresponding method, such as [WithdrawBalance](https://github.com/filecoin-project/builtin-actors/blob/5934421954d63e7ec84b4ba6eaac1e30811e9a05/actors/market/src/lib.rs#L1473-L1476).
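The dispatch pattern can be sketched roughly as follows — a minimal standalone illustration, with simplified stand-ins for `MethodNum` and `RawBytes` (the real actor CBOR-decodes the parameters into per-method typed structs):

```rust
// Hypothetical stand-ins for the FVM shared types.
type MethodNum = u64;
type RawBytes = Vec<u8>;

// Illustrative method number; named after the WithdrawBalance example above.
const METHOD_WITHDRAW_BALANCE: MethodNum = 3;

/// Dispatch on the method number, as `invoke_method` does.
fn invoke_method(method: MethodNum, params: RawBytes) -> Result<RawBytes, String> {
    match method {
        METHOD_WITHDRAW_BALANCE => {
            // In the real actor, `params` would be CBOR-decoded into a
            // typed `WithdrawBalanceParams` struct before this call.
            withdraw_balance(params)
        }
        other => Err(format!("unhandled method number: {other}")),
    }
}

fn withdraw_balance(params: RawBytes) -> Result<RawBytes, String> {
    // Placeholder body: echo the parameters back as the return value.
    Ok(params)
}
```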
Each of these methods follows a pattern of executing the call [in a transaction](https://github.com/filecoin-project/builtin-actors/blob/5934421954d63e7ec84b4ba6eaac1e30811e9a05/actors/market/src/lib.rs#L158), which updates and persists the actor state, and _then_ optionally using the return value to [send a message](https://github.com/filecoin-project/builtin-actors/blob/5934421954d63e7ec84b4ba6eaac1e30811e9a05/actors/market/src/lib.rs#L191) to another Actor via `Runtime::send`. In fact, if we look at the `FvmRuntime` [implementation](https://github.com/filecoin-project/builtin-actors/blob/5934421954d63e7ec84b4ba6eaac1e30811e9a05/runtime/src/runtime/fvm.rs#L284-L286) of `send`, we see that sending is not allowed during transactions.

> Note: It would be nice if [Runtime::transaction](https://github.com/filecoin-project/builtin-actors/blob/fc3c24b27bb903b4bdba98627a98b9f029d18506/runtime/src/runtime/mod.rs#L113) passed a less powerful version of itself that doesn't even have a `send` method, but that's a different story.

We see the same [idea](https://docs.cosmwasm.com/docs/1.0/actor-model/idea/) in the CosmWasm [actor model](https://docs.cosmwasm.com/docs/1.0/architecture/actor/).

Based on this we might get the impression that the execution is like a trampoline: invoking an actor, collecting the messages that need to go out, adding them to a queue, shutting down the Wasm module serving this message, then invoking the next actor, and so on until the message queue becomes empty. This, however, is just a convention of the built-in actors, and is not enforced by the FVM. For example in the EVM we can call another smart contract and use the return value in the same contract, i.e. do a recursive call. What happens in that case is that the FVM maintains a stack of Wasm instances: it suspends the execution of the caller while it handles the recursive call in a new instance, then resumes with the return value.
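The transaction-then-send convention above can be sketched like this — a hypothetical, heavily simplified `Runtime` (the real trait in builtin-actors has a far richer interface, and its state is an IPLD structure, not an integer):

```rust
// Hypothetical simplified runtime: one integer of "state", one flag.
struct Runtime {
    state: u64,
    in_transaction: bool,
}

impl Runtime {
    /// Run `f` against the state; conceptually, the state is persisted
    /// when the closure returns.
    fn transaction<R>(&mut self, f: impl FnOnce(&mut u64) -> R) -> R {
        self.in_transaction = true;
        let result = f(&mut self.state);
        self.in_transaction = false;
        result
    }

    /// Mirrors `FvmRuntime::send` rejecting sends inside a transaction.
    fn send(&self, _to: &str, _value: u64) -> Result<(), String> {
        if self.in_transaction {
            return Err("send is not allowed during transactions".into());
        }
        Ok(())
    }
}

/// The WithdrawBalance-style pattern: update state in a transaction,
/// _then_ use the return value to send.
fn withdraw(rt: &mut Runtime, amount: u64) -> Result<(), String> {
    let withdrawn = rt.transaction(|state| {
        let w = amount.min(*state);
        *state -= w;
        w
    });
    rt.send("f099", withdrawn)
}
```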
### FVM Machinery

The lifecycle of a recursive call in the FVM looks roughly as follows:

1. [DefaultExecutor::execute_message](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/executor/default.rs#L53) is called. It [instantiates](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/executor/default.rs#L69-L75) a new `CallManager` and [calls](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/executor/default.rs#L88-L90) `CallManager::send` in a `CallManager::with_transaction`.
2. [DefaultCallManager::with_transaction](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L170-L181) wraps the closure passed to it in calls to [StateTree::begin_transaction and StateTree::end_transaction](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/state_tree.rs#L436-L448), which maintain a stack of state snapshots. This provides transaction isolation: if a recursive call fails, only the last layer needs to be discarded; if it succeeds, it's merged into the caller state. This is also similar to CosmWasm, in that the caller can decide how to handle failures.
3. [DefaultCallManager::send](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L99) has the main responsibility of [checking and maintaining](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L136-L145) the `call_stack_depth`, before delegating to `send_unchecked` to look up or create the recipient Actor, and finally calling `send_resolved` to do the actual message handling.
4. [DefaultCallManager::send_resolved](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L358) has the logic to [short circuit](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L384) transfers, [put the input parameters](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L392) into the IPLD `BlockRegistry` where Wasm can find them by ID later, get a [wasmtime::Engine wrapper](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L402), and create a [new wasmtime::Store instance](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L420) for this call, passing to it a [new Kernel instance](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L417) bound to this `CallManager`. The [Kernel](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/kernel/mod.rs#L46-L57) is an interface to all the host-provided operations available to the actors, including `SendOps`. The [Engine](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/machine/engine.rs#L23) wraps the `wasmtime::Engine` instance, of which there typically exists only one, and it's also the place to look for the [default wasmtime configuration](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/machine/engine.rs#L75) used by the FVM.
5. Then, we [create the wasmtime::Instance](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L425-L428) by calling [Engine::get_instance](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L425-L428), which looks up or creates a cached `wasmtime::Linker` instance and [binds all syscalls](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/syscalls/mod.rs#L97), including [send](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/syscalls/mod.rs#L184).
6. The [memory of this instance](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L430-L435) is made accessible to the `Store` so the syscall handlers can interact with it. Finally, we [look up and invoke](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L437-L450) the `ActorCode::invoke_method` we saw the Market Actor implement, passing in the ID of the IPLD block representing the inputs. The [result](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/call_manager/default.rs#L466-L479) is another (optional) IPLD block ID that can be retrieved from the `BlockRegistry`.
7. So where does recursion happen? If we look at the [send](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/syscalls/send.rs#L12) method injected into the `Linker`, we see it receives only the values Wasm is able to pass: numbers, which can represent pointers and lengths into the `Memory` we put into the `Store`, IDs in the `BlockRegistry`, or plain numeric values. After reading the recipient address from the `Memory`, it calls `Kernel::send`.
8. [DefaultKernel::send](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/kernel/default.rs#L396) first [gets the input](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/kernel/default.rs#L409) from the `BlockRegistry`, then [recursively calls](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/kernel/default.rs#L418-L420) `CallManager::send` in its own `CallManager::with_transaction`, and finally deals with the [return value](https://github.com/filecoin-project/ref-fvm/blob/f5729b57a3c34ddd8e176936a326aced9b19bc74/fvm/src/kernel/default.rs#L427-L434): because the `CallManager` can return an actual IPLD block, it puts it into the `BlockRegistry` in the next slot, and returns its ID to be sent back to Wasm.

So, steps 2) to 8) repeat for each recursive call, each time adding a new `wasmtime::Store` and `wasmtime::Instance`, implicitly building up a stack of environments, all held together by the same `CallManager`. The interesting question for us is what happens, memory-wise, when we create a new `wasmtime::Instance`: how expensive is this operation, how many recursions can we support, how can we limit memory use across the whole recursion, and what is the best configuration for us to use?

## Wasmtime

### Intro

Here are some links that give a good overview of Wasmtime:

* the [Architecture of Wasmtime](https://docs.wasmtime.dev/contributing-architecture.html)
* the top level [documentation](https://docs.rs/wasmtime/latest/wasmtime/) explaining the Core Concepts and the optional features
* a rundown of the [performance optimizations](https://bytecodealliance.org/articles/wasmtime-10-performance)

### Configuration Options

The following are some of the most relevant knobs for us to turn, but not an exhaustive list of all options that matter.
Important for us is that it has [static and dynamic](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#static-vs-dynamic-memory) memory, which offer different tradeoffs, and depending on configuration we end up using one or the other. In a nutshell: static memory is allocated up front, with lots of optimizations around it, but cannot grow beyond its maximum size, while dynamic memory can grow, at the cost of relocations when the maximum is reached, and using it prevents some of Wasmtime's clever optimizations from kicking in.

#### [Config::allocation_strategy](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.allocation_strategy)

This is the most crucial choice. We only have two built-in [InstanceAllocationStrategy](https://docs.rs/wasmtime/latest/wasmtime/enum.InstanceAllocationStrategy.html) variants to choose from, and they take effect in [Config::build_allocator](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/wasmtime/src/config.rs#L1445):

* `OnDemand`: basically uses dynamic memory for everything.
* `Pooling`: uses static memory exclusively, with many tricks to make instantiation faster.

##### OnDemand

`OnDemand` is the default. With it, we get the [OnDemandInstanceAllocator](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator.rs#L430), which [creates a new dynamic memory](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator.rs#L490) every time it [allocates an instance](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator.rs#L552).
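For orientation, the two strategies can be selected roughly like this — a configuration sketch against the wasmtime 1.0 API, where the concrete limits and sizes are illustrative placeholders, not values the FVM has settled on:

```rust
use wasmtime::{
    Config, Engine, InstanceAllocationStrategy, InstanceLimits,
    PoolingAllocationStrategy,
};

// Sketch: an Engine backed by the pooling allocator.
fn pooling_engine() -> anyhow::Result<Engine> {
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling {
        // Prefer reusing a slot that already ran the same module,
        // so copy-on-write memory images can be reused.
        strategy: PoolingAllocationStrategy::ReuseAffinity,
        instance_limits: InstanceLimits {
            count: 1000, // illustrative; matches our target recursion limit
            ..Default::default()
        },
    });
    // Illustrative: 4GiB static memory per instance, 2GiB guards.
    config.static_memory_maximum_size(4 << 30);
    config.static_memory_guard_size(2 << 30);
    Engine::new(&config)
}

// Sketch: the default on-demand allocator, shown explicitly for contrast.
fn on_demand_engine() -> anyhow::Result<Engine> {
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::OnDemand);
    Engine::new(&config)
}
```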
Unless we implement our own [MemoryCreator](https://docs.wasmtime.dev/api/wasmtime/trait.MemoryCreator.html) and specify it via [Config::with_host_memory](https://docs.wasmtime.dev/api/wasmtime/struct.Config.html#method.with_host_memory), it will use the [DefaultMemoryCreator](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L34-L52), which just creates a new [MmapMemory](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L160) when called.

###### `MmapMemory`

An `MmapMemory` can [handle both](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L205-L222) the static and dynamic [MemoryStyle](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/environ/src/module.rs#L15). The style is [determined](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/environ/src/module.rs#L30) based on what the Wasm module declares about its own memory requirements and our own configuration values. If the style is static, a maximum size will be enforced in [RuntimeLinearMemory::grow](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L72). If the style is dynamic, it can [copy the contents](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L278-L302) to a new, extended virtual address space when it has to. In both cases, only the minimum amount of memory is [made accessible](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L232) initially.

###### Guard Pages

`MmapMemory` also handles _guard pages_.
These are extra virtual address space reserved before (optionally) and after the memory required by the Wasm module, made [inaccessible](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/mmap.rs#L195) by `Mmap::accessible_reserved` by default, by virtue of passing the `PROT_NONE` flag to the underlying [mmap](https://man7.org/linux/man-pages/man2/mmap.2.html). The memory is [made](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/mmap.rs#L295) `READ|WRITE` as and when it [grows](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L306).

The benefit of these guard pages is best explained in the comments of [code_translator::prepare_addr](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/cranelift/wasm/src/code_translator.rs#L2167). The gist of it is that hitting a guard page results in a `SEGFAULT` because it is inaccessible, and if this area is sufficiently large, Wasmtime can elide out-of-bounds checks before certain memory accesses, because it can prove that even if a pointer is out of bounds, it will fall within the guard pages and not cause any safety violation. This elision is done by the Cranelift code generator at compile time. The smaller the guard region, the less aggressively it can remove bounds checks. It should be noted that the default guard page size for dynamic memory is much smaller than the static one; so small that it probably leaves all bounds checks in place.

##### Pooling

The `Pooling` strategy gives us the [PoolingInstanceAllocator](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L1020), which is about reusing `wasmtime::Instance`s if possible. With this strategy we have to specify further options:

* [PoolingAllocationStrategy](https://docs.wasmtime.dev/api/wasmtime/enum.PoolingAllocationStrategy.html): governs how instances are to be reused.
The default `ReuseAffinity` is the highlight here: it prompts Wasmtime to look for an instance in the pool which already has the module we want to instantiate loaded into memory.
* [InstanceLimits](https://docs.wasmtime.dev/api/wasmtime/struct.InstanceLimits.html) tells the pool how many instances to allocate and what limits it can put on modules.

The pooling strategy works best if we can define reasonable maximum values for how many Wasm instances we want to allow and the maximum amount of memory each can consume. The default limit on instances (`count`) is 1000.

There are two pools backing the allocator. The [InstancePool](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L202) reserves a single virtual address space for the basic data structures every module requires (~1MB per instance), and also creates a [MemoryPool](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L642) with enough memory for every instance. The `MemoryPool` [allocates](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L718-L728) `pre_guard_size + (max_memory + post_guard_size) * max_instances` up front as a single `mmap`; that is, it reserves enough virtual address space that it never has to do it again. [According to the docs](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L79), on 64-bit systems we have 128TiB of addressable memory. By default it reserves 6GB per instance, plus a 2GB guard page up front. This is another optimisation afforded to the pooling allocator: because it knows the memory layout and the limits, it can overlap the after/before guard pages of subsequent memory slots; after all, they are all equally inaccessible.
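To get a feel for the numbers, the reservation formula above can be evaluated with the default-looking values mentioned in this document (2GiB pre-guard, 4GiB static maximum plus 2GiB guard per slot, 1000 instances); these inputs are illustrative, not read out of wasmtime itself:

```rust
const GIB: u64 = 1 << 30;

/// The `MemoryPool` reservation formula quoted above:
/// `pre_guard_size + (max_memory + post_guard_size) * max_instances`.
fn memory_pool_reservation(
    pre_guard_size: u64,
    max_memory: u64,
    post_guard_size: u64,
    max_instances: u64,
) -> u64 {
    pre_guard_size + (max_memory + post_guard_size) * max_instances
}

// 2GiB pre-guard + (4GiB + 2GiB) * 1000 = 6002GiB of virtual address
// space: a lot, but comfortably within the ~128TiB a 64-bit process
// can address.
```

Note why the pre-guard appears only once: thanks to the overlap trick described above, the post-guard of slot _i_ doubles as the pre-guard of slot _i+1_.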
[InstancePool::allocate](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L301) is responsible for looking up a free instance index to use. [InstancePool::allocate_memories](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L362) is where we can see the pool get the memory slice for the selected index and initialize it with the module contents if it needs to, [wrapping it](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L421) in a [StaticMemory](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L346). Finally, [InstancePool::deallocate_memories](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L440) _resets_ or clears the memory after the instance is returned to the pool.

###### Copy-on-Write

The reset operation is key to performance. It is achieved by [MemoryImageSlot::clear_and_remain_ready](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/cow.rs#L470), which uses Linux's `madvise` to reset the virtual memory to its initial contents. This, together with the `memory-init-cow` [feature](https://docs.rs/wasmtime/latest/wasmtime/#crate-features) of Wasmtime, allows [MemoryImageSlot::instantiate](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/cow.rs#L345) to use copy-on-write semantics and reuse the Wasm module memory it initialized on a previous run, with new physical memory only allocated when something changes. Here's a [good overview](https://www.youtube.com/watch?v=8hVLcyBkSXY) of these features of `mmap`. According to benchmarks done by the developers, this technique resulted in a 400x speedup in Wasm instantiation.
Unfortunately it is not available with the `OnDemand` strategy, because `MmapMemory` will necessarily [create a new](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L240-L245) `MemoryImageSlot` every time. By contrast, the `MemoryPool` used by the `InstancePool` [keeps track](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/instance/allocator/pooling.rs#L766-L796) of `MemoryImageSlot`s so that they can be reused.

#### [Config::static_memory_maximum_size](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.static_memory_maximum_size)

This is how much memory the [Pooling](#Pooling) allocator will reserve for each instance up front. It is also the threshold beyond which the [OnDemand](#OnDemand) allocator supports moving and growing; if the Wasm module wants less memory than this, it all gets pre-allocated.

#### [Config::static_memory_guard_size](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.static_memory_guard_size)

Sets the [guard page](#Guard-Pages) size of the regions between static memory slots used by the [Pooling](#Pooling) allocator; by default 2GB.

#### [Config::dynamic_memory_guard_size](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.dynamic_memory_guard_size)

Sets the [guard page](#Guard-Pages) size of the dynamic memories used by the [OnDemand](#OnDemand) allocator; by default 64KB. The docs say this value isn't critical for performance, because if we want performance we need static memory anyway.

#### [Config::dynamic_memory_reserved_for_growth](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.dynamic_memory_reserved_for_growth)

This setting governs how much (initially inaccessible) extra space a dynamic memory reserves for future growth. It was introduced because initially dynamic memory was only extended by a small amount each time, which caused fuzzers to spend all their time doing expansions. The default value here is 2GB.
So growth is not exponential doubling; a constant buffer is added each time.

#### [Config::guard_before_linear_memory](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.guard_before_linear_memory)

This puts guard pages at the beginning of memory, not just after it; they will be equal in size. The motivation here isn't to guard against out-of-bounds programming errors but against bugs in Cranelift code generation. For the pooling allocator this is a negligible extra and is enabled by default. For dynamic memories it doubles the guard overhead.

#### [Config::memory_init_cow](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.memory_init_cow)

Can be used to disable [Copy-on-Write](#Copy-on-Write).

#### [Config::with_host_memory](https://docs.rs/wasmtime/latest/wasmtime/struct.Config.html#method.with_host_memory)

Can be used to specify a custom [MemoryCreator](https://docs.rs/wasmtime/latest/wasmtime/trait.MemoryCreator.html). However, this is a rather blunt instrument: we won't know which module we're instantiating the memory for, so we can't do caching similar to the [Pooling](#Pooling) allocator's. It is only used by the [OnDemand](#OnDemand) allocator.

#### [Store::limiter](https://docs.rs/wasmtime/latest/wasmtime/struct.Store.html#method.limiter)

This is the other crucial setting for us. It allows us to configure a [ResourceLimiter](https://docs.rs/wasmtime/latest/wasmtime/trait.ResourceLimiter.html), which we can use to set an overall limit on the amount of memory a full recursion can use. That is, if we allow 4GB of _maximum_ memory _per instance_, but obviously don't want all 1000 instances (our target recursion limit) to actually use that much, we can track the whole stack by adding a limiter in `DefaultExecutor::execute_message`, where the `CallManager` is created.
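As a sketch of the idea, here is a standalone limiter mirroring the shape of wasmtime 1.0's `ResourceLimiter::memory_growing(&mut self, current, desired, maximum) -> bool` callback; the overall cap and the per-byte gas price are hypothetical values, not anything the FVM currently defines:

```rust
// Hypothetical limiter shared by every Store in one recursion.
struct ExecutionLimiter {
    /// Memory currently in use across the whole call stack, in bytes.
    total_in_use: usize,
    /// Hard cap for the entire recursion, in bytes.
    total_limit: usize,
    /// Gas charged so far for memory growth.
    gas_charged: u64,
    /// Hypothetical price per byte of newly accessible memory.
    gas_per_byte: u64,
}

impl ExecutionLimiter {
    /// Called whenever any instance in the stack wants to grow its
    /// memory from `current` to `desired` bytes; returns whether the
    /// growth is allowed. We only account for the delta, so moving or
    /// relocating existing memory would not be charged here.
    fn memory_growing(
        &mut self,
        current: usize,
        desired: usize,
        _maximum: Option<usize>,
    ) -> bool {
        let delta = desired.saturating_sub(current);
        if self.total_in_use + delta > self.total_limit {
            return false; // the whole recursion is out of memory
        }
        self.total_in_use += delta;
        self.gas_charged += delta as u64 * self.gas_per_byte;
        true
    }
}
```

In a real integration the same limiter instance would have to be reachable from every `Store` in the recursion, so that the cap applies to the stack as a whole rather than per instance.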
It is also the **perfect place to charge for gas**, because it is only called when new memory is required by the module, but not, for example, when dynamic memory is moved; so we'd only be charging for what the Wasm module actually uses, not for artifacts of misconfiguration.

Notable places where `ResourceLimiter::memory_growing` is called:

* in [Memory::limit_new](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L723) to check if the initial minimum required memory can be allowed
* in [RuntimeLinearMemory::grow](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/runtime/src/memory.rs#L104) when the memory needs to expand beyond its current size (e.g. from the initial minimum)

## Next Steps

The results of this investigation look promising:

* We wanted to limit recursion to ~1000 (like Python), which is in line with the default size of the pooling allocator, so it should work, although AFAIU prior testing showed some problems at 4096 instances.
* We wanted to use memory efficiently and allocate a single slab, which is what the pooling allocator does.
* The open question is whether we can live with a limit like 4GB for a single Wasm instance.

As noted above, it would not be easy to replicate what the pooling allocator is doing, because Wasmtime doesn't allow us to plug in our own allocator. It could be interesting, though, to see whether making [Engine::allocator](https://github.com/bytecodealliance/wasmtime/blob/v1.0.1/crates/wasmtime/src/engine.rs#L84-L85) settable could open up avenues for us to combine the pooling allocator with the on-demand allocator. The idea would be to do what [MmapMemory](#MmapMemory) does and check the `style` of the memory `plan`, then get an instance from either the pooling or the on-demand allocator, so that modules which can't put a static maximum on their memory requirements can still be used, with lower performance.
With that in mind, the first immediate actions would be to:

* Create a `ResourceLimiter` to start tracking memory growth
* Use the `ResourceLimiter` to start charging for gas, initially at a price of 0 because we don't yet know the costs
* Reinstate the test vectors used for benchmarking and collect stats about the minimum/maximum memory usage, recursion limits, etc.
* Swap from the default to the pooling allocation strategy
* Test that the desired recursion depth can be supported by the virtual address space without problems
* Compare the benchmark results to get a rough estimate of the impact of using one allocator versus the other

Further down the line we must estimate the true cost of expanding memory in terms of gas (i.e. time) on standardized hardware. It's not immediately clear to me where the time will be spent: does it depend on the size of the memory requested from `mmap`, or just the portion that is made accessible, etc.