So I was thinking about shifting the complexity away from offline compilation, and whether we may benefit from doing the transformations "online". If we think about PolkaVM (or any other VM) holistically, as a process that goes from the source code all the way to performing the IO (host functions included) on the target machine, you can notice that there is a path through components with boundaries between them. It's actually a graph, because some components can have multiple producers. Right now, we have something like this:

source → {Rust, C++} compiler → LLVM → \[wasm-opt\] → wasmtime → substrate/PVF host

You could drill down deeper into each of the components: Rust has MIR, LLVM has its internal representations, and so does wasmtime. There is also a linker. Each boundary comes with some advantages and disadvantages:

1. A component may support several producers, with varying degrees of coupling. wasmtime can accept any wasm binary; LLVM has more coupling, and it may also be coupled to the target platform, since generally the frontend knows what platforms it compiles to.
2. A component requires covering it with tests to ensure the correctness of the transformation from input to output. Therefore, a component requires tooling of different categories. Essentials: objdump, assemblers and disassemblers. Then a long tail of nice-to-haves: at the beginning of that list is a test harness, and at the end, things like syntax highlighting for GitHub or other text editors.
3. A component may find itself on either side of the lifecycle: run offline by the program author, or online by the network. For example, we could imagine making Rust itself the input. As one can imagine, that wouldn't be a great solution in terms of maintainability, robustness, determinism, etc. This is obvious, I hope.

Then it's interesting to consider what happens if we move the frontier to just before wasm-opt. IOW, we would guarantee running wasm-opt on the user binaries ourselves (there is a rough sketch of what that could look like a bit further below). There are several effects. It would increase the binary sizes that the user/the network has to pay for. It marginally improves the DX. It fixes the version of wasm-opt used. It kind of embeds some of the performance characteristics, so other implementations wouldn't need wasm-opt but would be highly inclined to use it. Also, there is the parameter of blame: by embedding it, we are basically taking responsibility for wasm-opt's correctness.

Then there is the aspect of the number of eyes per SLOC, or community support. The assumption is that the more users a technology has, the better it tends to be, for some definition of "better". That doesn't mean it will be the best, but in general it will be better. "In general" is the key phrase here, because it is also a tradeoff: the more users there are, the worse each of them can be catered to, and the harder it becomes to change the technology for your specific needs. I think wasm is a good example of this. I think it's generally a net very good technology. That is, as a portable-ish, secure-ish, sandboxed-ish, deterministic-ish, one-size-fits-all-ish format designed to be compiled efficiently by an optimizing compiler (while sacrificing a bunch of stuff for that), it is very good. There are a bunch of ready tools, several execution engines, several producers, etc. At the same time, it's rather hard to modify. I can't imagine we could push anything through the standards process. Heck, those guys did not really want to pin down the standard, and now, as a result, some parts of the toolchain have moved ahead and no longer support MVP stuff.
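To make the wasm-opt option above a bit more concrete: guaranteeing it online would essentially mean the host runs it as a fixed preparation step on every submitted binary before handing the result to the execution engine. A rough sketch, assuming the stock `wasm-opt` CLI is installed on the host; the function, the paths, and the `-O2` level are made up for illustration and are not an existing host API:

```rust
// Hypothetical preparation step: run a pinned wasm-opt over the submitted
// binary. Failure is surfaced as an error, not a crash of the host.
use std::path::Path;
use std::process::Command;

fn optimize_user_binary(input: &Path, output: &Path) -> std::io::Result<()> {
    let status = Command::new("wasm-opt")
        .arg("-O2")
        .arg(input)
        .arg("-o")
        .arg(output)
        .status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("wasm-opt exited with {status}"),
        ));
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Placeholder paths standing in for wherever the host keeps PVF artifacts.
    optimize_user_binary(Path::new("submitted.wasm"), Path::new("prepared.wasm"))
}
```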
The same dynamics work on another level: wasmtime. My implementation of async metering got stuck on some refactorings. All of that makes sense and I don't want to blame them; they have to think about the maintenance burden and so on. That's OK.

So now, with this framework, let's see how PolkaVM fits in:

source → {Rust, C++} compiler → LLVM -(riscv+ELF)→ polkavm optimizer/linker -(.polkavm)→ polkavm runtime → substrate/PVF host

Here we can actually choose where we draw the boundary: before the offline optimizer or after it. I think it's actually a different situation than with wasm-opt, kind of, and we should seriously consider it.

The question of the container is not that important. I imagine we could come up with an isomorphic container format and just treat it as lossless compression. Then there is the format itself. Jan mentioned that he does some instruction fusing, etc. The tangible result is the binary size. As I brought up on the call, maybe an actual data compression mechanism could make those savings negligible (there is a sketch of that kind of measurement at the end of this post). Overall performance/efficiency is not affected, because those transformations will still be done, just as part of compilation. Overall complexity does not seem to be affected either, and I would argue it is actually lowered, because now you can ignore the serialization issues, don't have to produce as much tooling (you will still need some tooling for debugging and testing, though), etc.

The robustness requirements, on the other hand, are affected. If the optimizer is offline, it can panic; if it is online, it cannot. It will also be limited to linear algorithms (there is a sketch of what such a pass could look like at the end of this post). However, the input code will be essentially RISC-V, which would have big benefits.

I already mentioned some "tools". One such tool is provers. There is already risc0, which allows you to create SNARK validity proofs; there are others such as powdr, and nexus is building one on a completely different stack. RISC-V is just a popular choice for a general-purpose ISA. Moreover, the same is happening on the fault proofs side, as I wrote [here](https://pep.wtf/posts/wasm-fraud-proofs/): several fault proof engines already exist. With the surge in popularity of rollup solutions, which include either of those proving systems as part of their design (I wrote about [this](https://pep.wtf/posts/sca-and-light-clients/) as well), it may be very prudent to consider using RISC-V as what we expose to the users.
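On the robustness point: an online transformation has to treat its input as untrusted, so it should return errors rather than panic, and it should stay within a predictable (roughly linear) time budget. Below is a minimal Rust sketch of such a single-pass peephole step. The instruction enum, the `lui`+`addi` fusion rule, and the error type are made up for illustration; they are not PolkaVM's actual encoding or fusion rules.

```rust
// A single linear pass over already-decoded instructions. Two properties
// matter for the "online" case: no panics on malformed input (errors are
// returned instead), and one forward scan, so the cost stays proportional
// to the code size.

#[derive(Debug, Clone, Copy)]
enum Inst {
    Lui { rd: u8, imm: i32 },           // load upper immediate (20-bit imm)
    Addi { rd: u8, rs1: u8, imm: i32 }, // add immediate (12-bit signed imm)
    LoadImm { rd: u8, imm: i32 },       // synthetic fused macro-op
    Other(u32),                         // anything else, passed through
}

#[derive(Debug)]
enum TranslateError {
    // Reported instead of panicking when an operand is out of range.
    InvalidOperand { at: usize },
}

fn fuse_pass(input: &[Inst]) -> Result<Vec<Inst>, TranslateError> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        match (input[i], input.get(i + 1).copied()) {
            // Fuse `lui rd, hi` + `addi rd, rd, lo` into one `LoadImm`.
            (Inst::Lui { rd, imm: hi }, Some(Inst::Addi { rd: rd2, rs1, imm: lo }))
                if rd == rd2 && rs1 == rd =>
            {
                // Validate ranges instead of trusting the producer.
                if !(0..1 << 20).contains(&hi) || !(-2048..2048).contains(&lo) {
                    return Err(TranslateError::InvalidOperand { at: i });
                }
                // (hi << 12) + lo, with wrapping 32-bit semantics.
                let imm = (((hi as u32) << 12) as i32).wrapping_add(lo);
                out.push(Inst::LoadImm { rd, imm });
                i += 2;
            }
            // Everything else is passed through unchanged.
            (inst, _) => {
                out.push(inst);
                i += 1;
            }
        }
    }
    Ok(out)
}

fn main() {
    let code = [
        Inst::Lui { rd: 5, imm: 0x12345 },
        Inst::Addi { rd: 5, rs1: 5, imm: 0x678 },
        Inst::Other(0x0000_0013), // e.g. a nop we don't touch
    ];
    println!("{:?}", fuse_pass(&code));
}
```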
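And on the binary-size point: before designing around the savings from fusing and the custom container, it would be cheap to measure how much of the gap survives general-purpose compression. A minimal sketch of that measurement, assuming the `zstd` crate and placeholder file names:

```rust
// Compare raw and zstd-compressed sizes of a plain RISC-V ELF against the
// custom .polkavm encoding of the same guest program. If the compressed
// sizes end up close, the size argument for the offline format weakens.
// Assumes the `zstd` crate; the file names are placeholders.
use std::fs;

fn sizes(path: &str) -> std::io::Result<(usize, usize)> {
    let raw = fs::read(path)?;
    let compressed = zstd::encode_all(raw.as_slice(), 19)?;
    Ok((raw.len(), compressed.len()))
}

fn main() -> std::io::Result<()> {
    for path in ["guest.elf", "guest.polkavm"] {
        let (raw, zst) = sizes(path)?;
        println!("{path}: {raw} bytes raw, {zst} bytes zstd-compressed");
    }
    Ok(())
}
```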