---
tags: wasm
---

# Different Stacks of Wasm

> What follows is a copy of a message trying to untangle all the different concepts that involve "stack overflow" in the context of wasm.

Wasm is a stack machine. Intel/ARM and pretty much every other architecture is register based. That means that for an instruction to perform its work, you need to provide the input values in registers; outputs are also conveyed via registers. In a stack machine, you do it by pushing values onto the value stack.

Whenever wasm is compiled into machine code, those stack values are mapped onto machine registers. The number of machine registers is limited: Intel has ≈16 registers. The stack, on the other hand, is not limited by physical constraints. So the goal is to map the unlimited stack values onto the limited set of registers. This is called register allocation. Not all of the values will fit, but not all of them are needed in registers at the same time. To keep around the values that do not fit until they are actually needed, they are "spilled" onto the (native) stack. Besides that, the native code has to store return addresses when it performs calls into other functions. That has to be tracked too.

The WebAssembly spec is a bit vague about the limits. It [says](https://webassembly.github.io/spec/core/appendix/implementation.html#execution) that there has to be a limit on the number of calls or values on the stack, but it is implementation defined. In practice, register allocation is full of heuristics and can change quite a lot. This is [not](https://github.com/bytecodealliance/wasmtime/issues/4214) theoretical. Never mind the difference between platforms (e.g. AArch64 has 32 registers). In order to tame this difference we use the [stack height instrumentation](https://github.com/paritytech/wasm-instrument/blob/d10bbdf5548d436228fb15a030467f0bcf718b4b/src/stack_limiter/mod.rs#L64-L113).
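To make the stack-machine model concrete, here is a minimal Rust sketch (not a real wasm engine; the `Op` enum and `eval` function are made up for illustration) of how instructions like `i32.const` and `i32.add` take their inputs from a value stack rather than from named registers:

```rust
// Minimal sketch of a wasm-style value stack; not an actual interpreter.
#[derive(Clone, Copy)]
enum Op {
    I32Const(i32), // push a constant onto the value stack
    I32Add,        // pop two operands, push their sum
}

fn eval(ops: &[Op]) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for op in ops {
        match *op {
            Op::I32Const(v) => stack.push(v),
            Op::I32Add => {
                // Inputs come off the stack, not out of named registers.
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
        }
    }
    stack.pop()
}

fn main() {
    // Equivalent to the wasm sequence: i32.const 2, i32.const 3, i32.add
    let result = eval(&[Op::I32Const(2), Op::I32Const(3), Op::I32Add]);
    assert_eq!(result, Some(5));
}
```

When this gets compiled to native code, the register allocator's job is to decide which of these stack slots live in real registers at any given moment and which get spilled.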
Roughly, it tries to estimate the maximum amount of native stack used by each function and instruments every function call to decrement a global counter by that maximum. If the counter overflows, a trap is generated. The estimate is based on counting the maximum number of values pushed onto the wasm value stack, the arguments used by the function, and the number of local variables defined (see the link above for details). This estimate is what we refer to as the **logical** stack cost. This makes the trapping deterministic, provided the condition is reached before the native stack is exhausted. In practice, that means the native stack just has to be bigger.

However, that instrumentation was developed with a single-pass (read: dumb) compiler in mind, and [does not work well](https://bytecodealliance.zulipchat.com/#narrow/stream/217126-wasmtime/topic/deterministic.20stack.20usage) with real register allocation. To mitigate this we overallocate the native stack to be 256× the theoretical maximum. There is no guarantee that an attacker could not find a piece of malicious code that produces a stack frame larger than 256× the estimate. I was meaning to write a fuzzer for wasmtime that would try to break that invariant, but this requires a custom-metric guided fuzzer and I am not sure if there are any. This requires attention.

The stack limit on the wasmtime side is a tiny bit weird. You can specify however big you want it to be, but it won't actually do anything: it will not prevent the executed wasm code from overflowing the native stack and aborting the process. See more details in [the wasmtime fix PR](https://github.com/bytecodealliance/wasmtime/pull/4295). The [PR](https://github.com/paritytech/polkadot/pull/5712) Basti linked is actually there to prevent the situation where a PVF manages to consume more than the native stack space (unlikely but possible), which leads to an abort of the process.
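Conceptually, the injected check works like the following Rust sketch (all names here are hypothetical; the real instrumentation injects equivalent wasm instructions around call sites, and the 65,536-value budget is just an example figure):

```rust
// Hypothetical sketch of the stack height instrumentation's injected check.
// The real code is wasm bytecode emitted by wasm-instrument, not Rust.
const STACK_LIMIT: u32 = 65_536; // total logical stack budget, in values

struct StackHeight {
    remaining: u32, // the "global counter" decremented on each call
}

impl StackHeight {
    fn new() -> Self {
        Self { remaining: STACK_LIMIT }
    }

    /// Before a call: charge the callee's precomputed logical stack cost.
    /// If the counter underflows, the instrumented wasm traps; here we Err.
    fn enter(&mut self, frame_cost: u32) -> Result<(), &'static str> {
        self.remaining = self
            .remaining
            .checked_sub(frame_cost)
            .ok_or("logical stack overflow")?;
        Ok(())
    }

    /// After the call returns: credit the cost back.
    fn exit(&mut self, frame_cost: u32) {
        self.remaining += frame_cost;
    }
}

fn main() {
    let mut h = StackHeight::new();
    assert!(h.enter(60_000).is_ok());
    assert!(h.enter(10_000).is_err()); // would exceed the budget: trap
    h.exit(60_000);
    assert_eq!(h.remaining, STACK_LIMIT);
}
```

The point is that `frame_cost` is a static, compiler-independent estimate per function, so the trap fires at the same call depth on every platform, regardless of how big the native frames actually turned out to be.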
The typical limit on Linux is 2-8 MiB and can be configured via ulimit. Thus, that's also a determinism/consensus concern.

Then, when you compile a high-level language such as Rust into wasm, you won't be able to get away with only the value stack. One of the reasons is that the value stack is opaque and not addressable: the only defined operations on it are push and pop, and that's it. However, you might have noticed that in Rust you are able to put a variable on the stack and then take its address as a pointer. For those cases, Rust/LLVM emulates a shadow stack in wasm memory. To work with it, Rust/LLVM allocates a global variable for the stack pointer. Roughly, to push a value onto the stack it decrements the stack pointer (the stack grows from top to bottom) and then writes the value at that location. To pop a value it just increments the stack pointer. To get the address of an item you just copy the value of the stack pointer.

LLVM/LLD allocates the shadow stack. There is a configuration value for its size, and it is (or was) controlled by wasm-builder. By default it was a generous 1 MiB. LLD puts it first in the wasm memory: literally, the range 0-1 MiB is occupied by the stack. This layout is useful because whenever the stack overflows (literally, when the stack pointer is 0 and another value is pushed onto it), the stack pointer wraps into the 0xFFFFFFxx range. When the write is then performed, assuming the wasm memory is less than 4 GiB, an out-of-bounds write trap is produced.

The runtime/PVF authors should keep those limits in mind. Specifically: the wasm value stack is limited to 65536 values, and the shadow stack is limited to whatever they configured (typically 1 MiB).
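The shadow-stack mechanics above can be sketched in Rust (a toy model, not LLVM's actual codegen; the struct and method names are made up, and `Err` stands in for the wasm out-of-bounds trap):

```rust
// Toy model of the LLVM shadow stack living at the bottom of linear memory.
struct ShadowStack {
    memory: Vec<u8>, // wasm linear memory
    sp: u32,         // the stack pointer global; the stack grows downwards
}

impl ShadowStack {
    fn new(stack_size: u32, memory_size: u32) -> Self {
        // LLD places the stack first: addresses 0..stack_size.
        Self { memory: vec![0; memory_size as usize], sp: stack_size }
    }

    /// Push an i32: decrement sp, then store at the new sp.
    /// Returns the value's address, i.e. just a copy of sp.
    fn push_i32(&mut self, v: i32) -> Result<u32, &'static str> {
        // On overflow past address 0 this wraps into the 0xFFFFFFxx range...
        self.sp = self.sp.wrapping_sub(4);
        let addr = self.sp as usize;
        // ...and the store then traps, as long as memory is < 4 GiB.
        let slot = self.memory.get_mut(addr..addr + 4).ok_or("oob write trap")?;
        slot.copy_from_slice(&v.to_le_bytes());
        Ok(self.sp)
    }

    /// Pop an i32: read at sp, then increment sp.
    fn pop_i32(&mut self) -> i32 {
        let addr = self.sp as usize;
        let v = i32::from_le_bytes(self.memory[addr..addr + 4].try_into().unwrap());
        self.sp = self.sp.wrapping_add(4);
        v
    }
}

fn main() {
    let mut s = ShadowStack::new(16, 64);
    let addr = s.push_i32(7).unwrap();
    assert_eq!(addr, 12); // addressable, unlike the wasm value stack
    assert_eq!(s.pop_i32(), 7);

    // Exhaust a tiny 4-byte stack, then overflow it.
    let mut t = ShadowStack::new(4, 64);
    t.push_i32(1).unwrap();              // sp: 4 -> 0, still in bounds
    assert!(t.push_i32(2).is_err());     // sp wraps to 0xFFFFFFFC: trap
}
```

This is why the 0-based layout is a feature: the overflow is caught by the memory bounds check for free, without any explicit comparison on the hot path.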