# Status updates

## 27 January 2025

- Took sick leave, down with a cold together with the rest of my family

## 17 January 2025

**Completed:**

- Experimented with disabling RISC-V extensions we don't need to simplify the codebase
  - Quite difficult as the code that we need depends on those definitions
  - 30k lines of compilation errors, way too much work
- Reached a dead end trying to replace RISC-V registers with Zisk registers
  - A lot of assumptions in existing patterns about specific registers, need to delete/rewrite them
  - Crashes with cryptic error messages if some of the register conventions are not satisfied

**Next steps:**

- Investigating another approach that involves disabling the RISC-V register allocator and replacing it with another pass
  - The LLVM patterns will still be happy and use fixed registers when needed
  - But when the register is not fixed, we will have the opportunity to use memory slots
  - Then in the future we can start relaxing the patterns that require fixed registers

## 10 January 2025

**Completed:**

- Learned all about registers in LLVM
  - How to define Zisk Register type
  - Where the register class annotations are added to the LLVM IR Graph nodes
  - How virtual registers are translated to physical ones through register allocation

**Next steps:**

- Generate Zisk instructions working with memory from LLVM, the steps involved are:
  1. Define Zisk register class and concrete registers (done)
  2. Implement a mapping from LLVM DAG Node types to Zisk register class (done)
  3. Write a register allocator for RISC-V that converts virtual Zisk registers to memory slots
  4. Re-define all RISC-V instructions that use registers to now use memory slots
  5. Define the new encoding for instructions, as they don't fit into 32 bits anymore due to register address size

## 27 December 2024

**Completed:**

- Improved the iteration time from 10m+ to 10s for a change in LLVM backend
  - Switched to dynamic LLVM to rustc linking
  - Wrote a shell script to incrementally rebuild LLVM and update the shared library in Rustc artifacts
  - Makes a huge difference for experiments with a codebase
- [Implemented](https://github.com/0xPolygonHermez/zisk/commit/54d9820169f983b8b94fe123d0c9e4102fa45888) proof of concept for parsing of new instructions in Zisk emulator
  - This allows us to encode instructions longer than 64 bits in the RISC-V ELF from the LLVM side, convert them to Zisk and execute them correctly
  - Required changes to ziskemu to support instructions longer than 32 bits
  - Looks like a workable path, but proper support would require some refactoring in translator and emulator

**Next steps:**

- Turn the proof of concept into a more generic implementation to support further experiments
  - Extend `RiscvInstruction` with a native zisk opcode payload
  - Choose an encoding for `ZiskInst` in the binary generated by LLVM
  - Implement encoding in LLVM and decoding in ziskemu
- Try to generate a Zisk instruction from LLVM that works directly with memory
  - I'm not sure I will be able to do this incrementally, so might need to convert all instructions at once, taking more time to implement this
- I will be out next week for some travel and family time and will be back on 6th of January

## 23 December 2024

**Completed:**

- Implemented a [prototype](https://github.com/aborg-dev/llvm-project/commit/ecf61daf7f6ac944e432712bbb40e50fe3eb9234) to encode long Zisk instructions in LLVM RISC-V backend
  - The proper way to do this is to use [LLVM interface](https://groups.google.com/g/llvm-dev/c/uiWRmdpV0PQ/m/-KXCkr59AwAJ) for long instructions over 64 bits
    - This would also require defining a disassembler for new instructions
  - A simpler path that would work for a prototype is to store additional payload in Zisk instructions and emit it manually
  - The instructions will be 256 bits in size, with the first 32-bit component being a `custom-0` RISC-V opcode to simplify parsing in Ziskemu
- Read through WebAssembly register stackify code to borrow ideas
  - The main part of implementation is in https://github.com/aborg-dev/llvm-project/blob/rust-llvm/rustc/19.1-2024-12-03/llvm/lib/Target/WebAssembly/WebAssemblyRegNumbering.cpp and is fairly simple
  - We would also need to define pseudo-registers for Zisk https://github.com/aborg-dev/llvm-project/blob/rust-llvm/rustc/19.1-2024-12-03/llvm/lib/Target/WebAssembly/WebAssemblyRegisterInfo.td
  - The initial approach is a bit wasteful as it does not reuse memory locations that have been freed, but there are [optimizations for this](https://github.com/aborg-dev/llvm-project/blob/rust-llvm/rustc/19.1-2024-12-03/llvm/lib/Target/WebAssembly/WebAssemblyRegColoring.cpp)
- Next steps:
  - Exploring how to do the mapping from virtual registers to memory
    - Rough model is that before register selection, we have a sequential list of registers, each being defined only once (SSA) in a single code path
    - Q: Are there any constraints in Zisk for values in operands?
  - Speed up the build process - right now takes 10m for each change, need to find a better way

## 16 December 2024

- Finished end-to-end test with LLVM :tada:
  - Steps:
    - Fork LLVM RISC-V backend
    - Build Rust compiler with new LLVM
    - Register new Rust toolchain
    - Build Zisk program using this toolchain
    - Run the resulting program on ZiskEmu
  - Codebase: https://github.com/aborg-dev/llvm-project/tree/rust-llvm/rustc/19.1-2024-12-03/llvm/lib/Target/RISCV
  - Takes quite some time to build (~1 hour on a powerful machine), but then can be shipped to end users like today
- Next steps:
  - Implement optimization that bypasses registers for arithmetic operations

## 6 December 2024

**Completed:**

- Wrote a summary for Cranelift investigation:

  > The main intention of existing Wasmtime/Cranelift compilation pipeline is to generate bytecode that will run inside `wasmtime` VM. For ZisK project we will need to break that core assumption (as we agree that maintaining compatibility with a moving `wasmtime` target is a non-starter).
  >
  > The three main components that we will need to (re)write and maintain are:
  > 1. WASM frontend that generates ZisK-compatible CLIF - ~5k lines of Rust
  > 2. ZisK backend/codegen - ~10k lines of Rust + ISLE
  > 3. ZisK assembler/linker - ~2k lines of Rust; outputs a ZisK binary from a WASM module and ZisK bytecode
  >
  > These components are very unlikely to make it back to upstream due to specialized nature, so we will need to maintain a `wasmtime` fork and periodically sync it with the upstream.
- Started to go through tutorials [1, 2] for creating a new LLVM backend
  - Quite big, but comparable with Cranelift: 45k C++, 30k TableGen, total 80k
  - To use with Rust we will need to compile a new `rustc` version with new LLVM linked in
  - We will also need to compile Rust standard library for the new target

**Next steps:**

- Fork LLVM RISC-V backend and build a binary end-to-end from Rust to ZisK using it

[1] https://sourcecodeartisan.com/2020/09/13/llvm-backend-0.html

[2] https://brson.github.io/2023/03/12/move-on-llvm#writing-an-llvm-backend-with-rust

## 29 November 2024

**Completed:**

- Converted and launched Cranelift ELF files for SHA and Fibonacci on ZisK emulator
  - All sections can be parsed and opcodes can be converted
  - Actual execution crashes due to lack of WASM VM context support which is used for memory accesses
- Identified [necessary changes](https://hackmd.io/XQIIUqjRRM-FrkrQsNAQrg?view#Code-Changes) to WASM -> CLIF codegen
  - We need to rewrite the conversion to account for the fact that WASM code will be running on a bare-metal-like ZisK machine and not in the WASM VM
  - These changes are necessary regardless of the backend that we use (direct ZisK or through RISC-V), so start with this

**Next steps:**

- [Confirming](https://bytecodealliance.zulipchat.com/#narrow/channel/217117-cranelift/topic/Running.20.60wasmtime.20compile.60-ed.20WASM.20in.20the.20alternative.20VM/near/484976140) whether this is the right path with Wasmtime maintainers

## 22 November 2024

**Completed:**

- Studied [Cranelift ELF file generation](https://github.com/bytecodealliance/wasmtime/blob/5af89308dc0229ca404cd7000eec694201022e2d/crates/wasmtime/src/compile.rs#L63)
  - Goes from WASM to an ELF file with bytecode for a given backend (e.g. RISC-V or ZisK)
  - A very complex piece of machinery, but it looks like we will be able to reuse most of it
  - The format is a bit different from normal ELFs, see [a discussion](https://bytecodealliance.zulipchat.com/#narrow/channel/217117-cranelift/topic/.60wasmtime.20compile.60.20for.20.60riscv64-ima.60.20target/near/483642712)
- Got a lot more experience with debugging Cranelift codegen while investigating generated ELF files
  - It's possible to get detailed information about compiler decisions for individual functions
  - Can also get CLIF and bytecode for each function
- Submitted [code refactoring to ZisK](https://github.com/0xPolygonHermez/zisk/commit/726c98eb3936b11af17d4c59e1ead50efcdfa777)

## 15 November 2024

It's been a bit of a hectic week and I needed to take a few sick days to look after my son (who is feeling much better now).

**Completed:**

- Finished the document with a plan for the work ahead on Cranelift ZisK backend: https://hackmd.io/XQIIUqjRRM-FrkrQsNAQrg
- Started to work on stage 1 and will aim to share some results next week

## 7 November 2024

**Completed:**

- [ISA benchmarks](https://github.com/aborg-dev/isa_benchmarks/tree/main) are now complete, you can see performance across all relevant targets on the eth_block (and a few other) benchmarks. I also produced detailed instruction usage reports for RISC-V and WASM that show which instructions are the most prevalent in each of the benchmarks.
- [Published](/jAXva4muSgKyJV17QZ9zfw) and discussed with @Jordi next possible projects to improve Rust -> ZisK compilation. We agreed to proceed with option 3: WASM -> ZisK compiler using Cranelift. We expect to get a 20% - 50% improvement in instruction count over the existing RISC-V based backend based on the initial measurements.


**Next up:**

- I'm writing a design document for a project to outline:
  - What exactly needs to be built
  - How the new backend will interact with ZisK and ZiskOS
  - Time estimates for different stages
- I expect to finish and share this next week