SUREEL zkVM weekly sync-up notes
============

## 2024-12-10

### PSE

- debug tool
  - (WIP) remove hardcoded position 0 lk multiplicity check [PR 649](https://github.com/scroll-tech/ceno/pull/649)
- prover
  - (WIP) skip PCS commitment/opening for structural columns [issue 654](https://github.com/scroll-tech/ceno/issues/654)
  - building on the issue 654 work, we can add a new task to skip the commitment of the range table
  - optimize opcode circuits with a 32-bit range check [proposal in #702](https://github.com/scroll-tech/ceno/issues/702)
    - refactor circuits to use the 32-bit range check
    - Spartan SPARK is implemented [draft PR 713](https://github.com/scroll-tech/ceno/pull/713)
    - currently on hold due to an obstacle: we need to figure out a way to do the sparse fraction sum ([main obstacle](https://github.com/scroll-tech/ceno/issues/702#issuecomment-2527165584))
- sumcheck optimisation
  - ideas: optimize virtualpoly and extract the identity polynomial

### Miscs

- Scroll set a timeline to publish the Ceno research paper/blog on 2025/01/07, including timeline and benchmarks.
  - they set up a timeline for engineering work: benchmark and SDK to be ready before Dec 24.
- Scroll asked our opinion on the wording for our contributions. We suggested crediting from an individual perspective:
  > We would like to acknowledge and thank Ming (https://github.com/hero78119), Soham (https://github.com/zemse), Kimi (https://github.com/KimiWu123), and Han (https://github.com/han0110) from Privacy and Scaling Exploration (https://pse.dev/) (PSE) for their significant contribution (https://github.com/privacy-scaling-explorations/ceno/graphs/contributors) to the design and implementation of Ceno since its early inception.
- Have a meetup with Axiom, UTC 3pm Thursday
  - Axiom design docs: https://github.com/axiom-crypto/afs-prototype/blob/main/docs/specs/vm/axVM_STARK_Architecture_DRAFT.pdf

## 2024-12-03

### PSE Ceno

- circuit optimisation
  - (Reviewing) unify MUL opcode to field arithmetic [PR 660](https://github.com/scroll-tech/ceno/pull/660)
- debug tool
  - (WIP) remove hardcoded position 0 lk multiplicity check [PR 649](https://github.com/scroll-tech/ceno/pull/649)
  - (Reviewing) merge slt/sltu [PR 659](https://github.com/scroll-tech/ceno/pull/659)
- proving system
  - (WIP) enhancement: skip structural witin commitment & PCS [issue 654](https://github.com/scroll-tech/ceno/issues/654)
  - (WIP) MPCS opening optimisation: refactor to share the same sumcheck implementation & code cleanup [PR 653](https://github.com/scroll-tech/ceno/pull/653)
    - will extract it into smaller PRs for review
  - found another issue: main proof performance regressed after the recent [PR 671](https://github.com/scroll-tech/ceno/pull/671)

### Scroll & external contributors

- benchmark [super issue](https://github.com/scroll-tech/ceno/issues/641)
  - [x] sproll-evm: 14.6M cycles, 170 kHz.
  - [ ] revm => we need to follow up on this
  - ...
- precompile DSL approach: frontend & backend design [WIP document](https://hackmd.io/xy0tzQa4SqajFCtiXHp2Cw)
  - Ming will work on the opcode circuit approach (next priority after circuit optimisation)
- parallel proving design via sharding
- GPU proving

### Miscs

- Connecting with the cores team on tg as a follow-up of `Starks - aggregating hash based signatures`
  - asking what the hash-based signature candidates are
- Axiom is connecting with us and looking for potential collaboration.
- Precon retro happened at 0am UTC+8, Dec 4

## 2024-11-26

### takeaways, ideas & suggestions after pre-con

#### Ming

- updated the roadmap accordingly, focusing on EL & CL benchmarks
- a few new ideas, e.g. zkVM-based precompiles, and Lasso-style succinct structural table evaluation in more places.
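Evaluating structural tables succinctly (as in Lasso) works because some fixed tables have a closed-form multilinear extension (MLE). For example, the range table T[x] = x over [0, 2^k) has MLE T~(r) = Σ_i 2^i · r_i: this polynomial is multilinear and agrees with T on the Boolean hypercube, and the MLE is unique, so the verifier can evaluate it in O(k) field operations without any commitment to T. The minimal Rust sketch below checks that closed form against the brute-force definition. It assumes the Goldilocks field and little-endian bit ordering (bit i of x paired with r_i); the helper names are hypothetical and not Ceno APIs.

```rust
// Goldilocks prime p = 2^64 - 2^32 + 1 (the base field mentioned in these notes).
const P: u64 = 0xFFFF_FFFF_0000_0001;

// Field helpers; inputs are assumed to be reduced (< P).
fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}
fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}
fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Reference evaluation of the MLE of `table` (length 2^k) at `r`:
/// sum over x in {0,1}^k of eq(r, x) * table[x], pairing bit i of x with r[i].
/// Costs O(k * 2^k), which is exactly what a verifier wants to avoid.
fn mle_eval_naive(table: &[u64], r: &[u64]) -> u64 {
    assert_eq!(table.len(), 1usize << r.len());
    let mut acc = 0;
    for (x, &t) in table.iter().enumerate() {
        let mut eq = 1;
        for (i, &ri) in r.iter().enumerate() {
            let factor = if (x >> i) & 1 == 1 { ri } else { sub(1, ri) };
            eq = mul(eq, factor);
        }
        acc = add(acc, mul(eq, t));
    }
    acc
}

/// Succinct evaluation of the range table T[x] = x, x in [0, 2^k):
/// its MLE is sum_i 2^i * r_i, so it costs O(k) and needs no commitment to T.
/// Assumes k <= 63 so that 2^i is already a reduced field element.
fn range_table_mle_succinct(r: &[u64]) -> u64 {
    r.iter()
        .enumerate()
        .fold(0, |acc, (i, &ri)| add(acc, mul(ri, 1u64 << i)))
}

fn main() {
    let k = 4;
    let table: Vec<u64> = (0..1u64 << k).collect();
    // Arbitrary challenge point; in a real protocol it comes from the transcript.
    let r = [7u64, 123_456_789, P - 5, 42];
    let naive = mle_eval_naive(&table, &r);
    let succinct = range_table_mle_succinct(&r);
    assert_eq!(naive, succinct);
    println!("T~(r) = {succinct}");
}
```

The same idea is what the `logup` range-check item in the roadmap targets: the table side T becomes free for the verifier, while the multiplicity polynomial m(x) still needs a commitment.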
### Roadmap summary (from Ming)

We will probably focus on:

- zkVM (prover) performance
  > Justification: no zkVM vendor has yet caught up with the performance needed for Ethereum EL/CL usage. Ceno (SOTA) is at around 250 kHz; our target is > 10 MHz (zkVM-proved opcodes per second), around a 100x boost.
- L1-targeted benchmarks
  > focus on EL/CL tasks, and optimize around these two tasks
- precompile/co-processor for zkVM
  > as cryptographic primitive acceleration

#### Boosting zkVM (prover) performance

With proven ideas from other zkVMs, e.g.

- Lasso structured tables: skip the commitment, and let the verifier succinctly evaluate the structural table
  - [x] applied to the Ceno non-uniform memory scheme address
  - [ ] apply to the `logup` table T to support large-table range checks
    > challenge: need to convert the sparse m(x) into a dense polynomial commitment.
- Axiom/SP1: recursive proof in another VM with a custom, small ISA: recursive-vm.

### L1 benchmarks

- EL: revm e2e
- CL: hash-based signature scheme on zkVM
  > build a connection with Kev's team from CL, and learn the hash-based signature candidates

### precompiles

- build a precompile framework => preliminary idea: build precompiles from zkVM opcodes. Discussing the idea with Scroll.
- take EL precompiles and CL hash-based signatures as the first priority targets

### zkVM research & exploration

- sumcheck algorithm optimisation
- hardware optimisation: AVX/CUDA
- binary field domain know-how
- audit & FV

### Scroll & External contributors

- Scroll roadmap on Ceno
  - prepare for the announcement, cooking up more benchmarks
  - private I/O integration test
  - MPCS: benchmark WHIR as a replacement/enhancement of Basefold
  - recursive verifier SNARK

### Miscs

Ceno tasks WIP

- optimise circuits with fewer witnesses (Kimi)
- mock-prover support for padding check (Soham)
- discussed precompile and proving system optimisation (Ming)

With Scroll

- Community calls
  - Every Tuesday: strategy meeting
  - Every Thursday: weekly progress meeting including all developers (CET 10am)
- Scroll Slack: PSE members are invited as guests

## 2024-11-05/12/19: skipped due to PRECON/DEVCON

## 2024-10-29

### PSE

- (completed) opcode development: slt/slti/srai
- (WIP) sltiu
- (under review) modular memory/public I/O design [PR 457](https://github.com/scroll-tech/ceno/pull/457)
- experimenting with the optimised sumcheck protocol by Polygon Zero [publication](https://eprint.iacr.org/2024/108)

### Scroll & External contributors

- (completed) load/store word [load](https://github.com/scroll-tech/ceno/pull/455)/[store](https://github.com/scroll-tech/ceno/pull/449)
- (Doing) ELF program loading into memory & e2e test

## 2024-10-22

- pending opcode development tasks: srai/slti/sltiu, memory load (byte, half word)
- (from Soham) how sign extension works, and how efficient it is
- Kimi: refactor mock-program, move it into unittests, and clean it out of mock-prover
- Ming is working on public I/O. Discussion thread on Slack (Scroll): https://scrollco.slack.com/archives/C064WRCBMHU/p1729507904585529
- Hallow project is sunsetting sooner.
- For zkVM, we will probably take on more scope from [The Verge](https://x.com/VitalikButerin/status/1588669782471368704).
- High-level documentation: https://hackmd.io/@pse-zkevm/B1kPQYQe1e

## 2024-10-15

Milestone 1

- We might expand the milestone benchmark scope to cover more than Fibonacci
  - with pure Fibonacci, people might think it is a kind of cheat, or that we only outperform on this specific workload.
- RV32IM has around 38 opcodes in total; with those we can cover more benchmarks, like
  - rsa
  - regex
  - is-prime
  - ssz-withdrawal
  - tendermint light client

### PSE opcodes

- (Reviewing) logical I-type opcodes are under review
- (Done) mock-prover error dedup
- (Done) soundness: public input fix
- (Ongoing) a few more opcode tasks: SLLI, SRAI, SLTIU, SLTI, MULH

### Scroll & External contributors

- (Ongoing) public I/O data design and implementation planning
  - https://github.com/scroll-tech/ceno/issues/215
- memory gadget [PR 360](https://github.com/scroll-tech/ceno/issues/360)
  - blocking LH, LB, SH, SB, ...

Protocol

## 2024-10-08

Milestone 1

- opcodes: 15/24 done, 6 under review
- benchmark:
  - on CPU, 2^20 instances e2e in 2.10s, vs SP1 11s (as of 2024/07)
    > we can improve further after resolving the single-limb issue [Issue 285](https://github.com/scroll-tech/ceno/issues/285)
    >
    > many optimisations are planned as follow-ups to milestone 1

### PSE

- opcode development:
  - (Done) [SRL/SRR](https://github.com/scroll-tech/ceno/pull/304)
  - (Done) [MULHU](https://github.com/scroll-tech/ceno/pull/306)
  - (Done) [ValueAdd/ValueMul](https://github.com/scroll-tech/ceno/pull/323)
- unittest enhancement
  - (Done) assertion on register rd value [PR 301](https://github.com/scroll-tech/ceno/pull/301/)
  - (Done) customized debug expression probing [PR 306](https://github.com/scroll-tech/ceno/pull/306)
- (Doing) DIVU/SLI soundness fixing
  - divu [PR 335](https://github.com/scroll-tech/ceno/pull/335/)

### Scroll & External contributors

- (Done) ecall-halt [PR 258](https://github.com/scroll-tech/ceno/pull/258)
  - also supports public input ...
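Several of the opcodes tracked above (MULHU/MULH, the SRL/SRA shift family, SLTI/SLTIU) have short reference semantics in the RISC-V unprivileged spec, and the sign-extension question raised on 2024-10-22 shows up directly in SLTIU. A "golden model" like the sketch below is one way to produce expected values when unit-testing opcode circuits. The function names are hypothetical helpers written for illustration, not code from the Ceno repository; registers are modelled as raw `u32` bit patterns.

```rust
// Hypothetical golden-model helpers for a few RV32IM opcodes, following the
// RISC-V unprivileged spec. Registers are raw u32 bit patterns. Not Ceno code.

/// MULHU: upper 32 bits of the 64-bit product of two unsigned operands.
fn mulhu(rs1: u32, rs2: u32) -> u32 {
    ((rs1 as u64 * rs2 as u64) >> 32) as u32
}

/// MULH: upper 32 bits of the 64-bit product of two signed operands.
fn mulh(rs1: u32, rs2: u32) -> u32 {
    ((rs1 as i32 as i64 * rs2 as i32 as i64) >> 32) as u32
}

/// SRL / SRLI: logical right shift; RV32 uses only the low 5 bits of the amount.
fn srl(rs1: u32, shamt: u32) -> u32 {
    rs1 >> (shamt & 0x1f)
}

/// SRA / SRAI: arithmetic right shift, replicating the sign bit.
fn sra(rs1: u32, shamt: u32) -> u32 {
    ((rs1 as i32) >> (shamt & 0x1f)) as u32
}

/// Sign-extend a 12-bit I-type immediate to 32 bits.
fn sext12(imm12: u32) -> u32 {
    (((imm12 << 20) as i32) >> 20) as u32
}

/// SLTI: signed comparison against the sign-extended immediate.
fn slti(rs1: u32, imm12: u32) -> u32 {
    ((rs1 as i32) < (sext12(imm12) as i32)) as u32
}

/// SLTIU: the immediate is still sign-extended, but the comparison is unsigned.
fn sltiu(rs1: u32, imm12: u32) -> u32 {
    (rs1 < sext12(imm12)) as u32
}

fn main() {
    // (2^32 - 1)^2 = 0xFFFF_FFFE_0000_0001, so MULHU returns the high word.
    assert_eq!(mulhu(0xFFFF_FFFF, 0xFFFF_FFFF), 0xFFFF_FFFE);
    // As signed values this is (-1) * (-1) = 1, whose high word is 0.
    assert_eq!(mulh(0xFFFF_FFFF, 0xFFFF_FFFF), 0);
    // Logical vs arithmetic shift of a value with the top bit set.
    assert_eq!(srl(0x8000_0000, 31), 1);
    assert_eq!(sra(0x8000_0000, 31), 0xFFFF_FFFF);
    // imm 0xFFF sign-extends to -1 (0xFFFF_FFFF): 5 < -1 is false, 5 < 0xFFFF_FFFF is true.
    assert_eq!(slti(5, 0xFFF), 0);
    assert_eq!(sltiu(5, 0xFFF), 1);
    println!("all opcode checks passed");
}
```

The SLTI/SLTIU pair is a compact illustration of why sign extension matters for circuits: both sign-extend the same immediate, and only the comparison differs.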
### Misc

Shared benchmark results with the sumcheck-gpu team as a GPU optimisation anchor:

```
GPU
2^26 size mle with degree 4 over BN scalar field
- GPU sumcheck ~800ms, kernel execution time only, excluding data transfer and hashing (random oracle)

CPU
2^26 size mle with degree 3 over Goldilocks + Ext Field 2
on AMD 5800, 8 cores, 2 hyperthreads:
- schoolbook sumcheck: 3.5301 s
- devirgo sumcheck: 901.56 ms
on AMD EPYC 9R14, 16 cores, 2 hyperthreads:
- schoolbook sumcheck: 3.1585 s
- **devirgo sumcheck: 340.44 ms**
```

## 2024-10-01 (meeting skipped, combined with next week)

### PSE

- (Done) improve CI testing

## 2024-09-24

### PSE

- (Done) improve unittest testing iteration [PR 242](https://github.com/scroll-tech/ceno/pull/242)
- (Reviewing) register/memory table and optimized table sumcheck proof [PR 251](https://github.com/scroll-tech/ceno/pull/251)
- (Reviewing) i_inst and SRLI opcode [PR 229](https://github.com/scroll-tech/ceno/pull/229)
- (Reviewing) divu opcode https://github.com/scroll-tech/ceno/pull/266
  - TODO 1: add initial state_in/out to complete e2e rw soundness.
  - ...

### Scroll & External contributors

- (Reviewing) BEQ/BNE [PR 272](https://github.com/scroll-tech/ceno/pull/272)
- (Done) e2e opening on fixed polynomial [PR 253](https://github.com/scroll-tech/ceno/pull/253)
- (Reviewing) blt e2e test [PR 249](https://github.com/scroll-tech/ceno/pull/249)
  > a bug where verification fails when instance num > 2^10
- [sumcheck zero PIOP from Polygon Zero](https://eprint.iacr.org/2024/108)
  - analyzing the [note](https://www.overleaf.com/project/66cf8ab8ea23adb139a41e56) from Scroll

### Misc

- (WIP, low priority) refactor uint for better expression conversion [PR 264](https://github.com/scroll-tech/ceno/pull/264)
- performance: 2^20 add instances: 3.6s -> 2.6s
  > 1. sumcheck protocol improvement, e.g. the Polygon Zero PIOP paper
  > 2. CPU optimization: AVX2/AVX-512 on Goldilocks, e.g. > 4x improvement
  > 3. binary field arithmetic/PCS
  > 4. implementation xxx

## 2024-09-17

### PSE Ceno

- (Done) CI integration pipeline [PR 209](https://github.com/scroll-tech/ceno/pull/209)
- (Done) mock prover [PR 206](https://github.com/scroll-tech/ceno/pull/206)
- (Done) mul opcode generalization [generalized MUL OP](https://github.com/scroll-tech/ceno/pull/200)
- (Doing) SRLI [PR](https://github.com/scroll-tech/ceno/issues/122)
- (Doing) memory/CPU consistency check [Issue 126](https://github.com/scroll-tech/ceno/issues/126)
  > enhance sumcheck to align and run on different num_variables
- (Doing) mul opcode [PR 219](https://github.com/scroll-tech/ceno/pull/219)

### Scroll & External contributors

- MPCS: 2.1s (goal 1.5s)
- (Done) program table & opcode lookup
- Integrate MPCS into the proving flow

### Misc

- discuss: possible enhancements/tools for opcode development
  > Soham: with the macro !set_value(xx) it is hard to find the root cause => Ming: the debugging message in VS Code seems to show the root cause.
  >
  > Ming: we can't unittest the register write value if it is an expression type. Issue raised: https://github.com/scroll-tech/ceno/issues/220
  >
  > - proposal: categorize opcodes into r-type/i-type/b-branch for development
  > - `match` syntax on opcode type; r-type first version [example](https://github.com/scroll-tech/ceno/pull/230/files#diff-328a733332c3613e42e08e642da79434de4a8500d36a654ec055b82f47e18380)
  > - r-type is pending on https://github.com/scroll-tech/ceno/pull/231
- sumcheck on GPU is open source: https://github.com/pseXperiments/cuda-sumcheck
  > first version: schoolbook sumcheck algorithm
  >
  > MPI: a standardized interface for cluster computation programming

## 2024-09-10

### PSE Ceno

- (Done) Mock Prover error print [PR 182](https://github.com/scroll-tech/ceno/pull/182)
- (Done) add CI target as metrics [PR 195](https://github.com/scroll-tech/ceno/pull/195)
- (Done) witness assignment interface [PR 187](https://github.com/scroll-tech/ceno/pull/187)
- (Reviewing) Lt Util [PR 183](https://github.com/scroll-tech/ceno/pull/183)
- (Reviewing) lock-free, thread-safe logup multiplicity witness counting [PR 198](https://github.com/scroll-tech/ceno/pull/198)
- (Doing) [generalized MUL OP](https://github.com/scroll-tech/ceno/pull/200)
- (TODO) opcode implementation (mul, addi, srli)
- (TODO) MockProver: cache table data and load it once
  - this might be more urgent: each run currently loads > 5 tables, each of size 2^16, which slows down CI

### Scroll & External contributors

- (Reviewing) E2E opcodes and table prover https://github.com/scroll-tech/ceno/pull/188
- (Reviewing) Emulator Runtime (I/O, Allocator)
- (TODO) Byte code table [PR 104](https://github.com/scroll-tech/ceno/issues/104)
- (TODO) opcode development

### Miscs

- lagging behind on M1 progress https://hackmd.io/@ceno-zkvm/ryDWX5_5R
  - because we are still consolidating the overall proving system.
  - improve review speed & quality
  - speed up opcode development
- Ethereum-granted project for FV on zk(E)VM https://verified-zkevm.org/
  - Ali will reach out to them and seek collaboration
- GPU sumcheck collaboration: Sowoon + Dohoon + Scroll => tg group
- Benchmark result: 5x faster than SP1 on the Fibonacci task.
  - MPCS: 60 poly commitments => 2 s
  - create proof of Add opcode 2^20 => 1s
    - Add opcode: 16 poly commits
  - Cost: MPCS proof 10X, 8Mb
    - opt1: optimise the codebase/mechanism
    - opt2: with recursion/aggregation we can compress the proof to a smaller size

## 2024-09-03 Ceno

### PSE

- (Done) [PR](https://github.com/scroll-tech/ceno/pull/172) fix sumcheck degree & monomial form dedup issue => bug captured in [PR 169](https://github.com/scroll-tech/ceno/pull/169)
- [PR] overflow handling
  - [discussion thread](https://github.com/scroll-tech/ceno/pull/173#issuecomment-2325489932)
  - due to the use of wrapping_add/sub/mul/div in revm
  - in summary:
    1. disable the compiler's default overflow check on all instructions
    2. support overflow as an external assignment.
    3. rely on Rust `wrapping_XXX` and the respective assembly to deal with overflow checks
- (Done) UInt refactor
- (Done) Mock Prover
- (Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167

#### TODOs & Doing

- (Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
- [UInt](https://github.com/scroll-tech/ceno/issues/174) support M/C != 4 mul constraints
- witness assignment trait design
- bitwise operation opcode [SLL](https://github.com/scroll-tech/ceno/issues/123)

### Scroll & External contributors

- range table circuit https://github.com/scroll-tech/ceno/pull/154
- Emulator migrating from SP1
- PCS integration: Basefold + Plonky2-FRI optimisation
- porting Poseidon from Plonky2
- control flow opcode implementation

## 2024-08-27 Ceno

### PSE

- [PR](https://github.com/scroll-tech/ceno/pull/165) fix verifier failure when `lk_expression.len()` > `r/w_expression.len()`
  - will also eliminate a potential soundness issue by not trusting data from the prover's proof
- Change default to RV32 https://github.com/scroll-tech/ceno/pull/166
  > the RV32 toolchain is more mature
  >
  > aligns benchmarks with SP1 and other zkVMs
  >
  > experiment with RV64 later
- Mock Prover [PR](https://github.com/scroll-tech/ceno/pull/113)
  - Ming has done the review; WIP on adding degree > 1 `assert_zero_expression` multiplication.
  - follow-up tasks: modify the `addsub` opcode unittest to use MockProver
    > 1. replace the randomly generated witness with real data to pass the unittest
    > 2. keep the prove/verify flow in benchmark/example
- UInt refactor [PR](https://github.com/scroll-tech/ceno/pull/106)
  - review done by Ming
  - suggestion: cherry-pick commits and exclude commits already in the master branch

### Scroll & External Contributors

- range table circuit https://github.com/scroll-tech/ceno/pull/154
- Interpreter implementation
- PCS integration: Basefold + Plonky2-FRI optimisation
  > 2 explorations
  > 1. FRI-Binius => benchmark result not good, pending new research / implementation polishing
  > 2. Basefold + Plonky2
- From Snarkify: control flow opcode implementation

### Misc

- a one-on-one is scheduled around this Thu/Fri for sharing peer review/self-evaluation results :)

## 2024-08-20 Ceno

- Performance: removing an unnecessary [to_vec()](https://github.com/scroll-tech/ceno/commit/a35d642869b44e4dfed5b076205b0af99612e8b4) clone improved latency from 600ms -> 380ms => 2.7 MHz zkVM prover.
- Project Milestone [dashboard](https://github.com/orgs/scroll-tech/projects/15)
  - timeline on hackmd https://hackmd.io/@ceno-zkvm/SkMzxt_9A
- Util lib development
  - MockProver: lookup expression assertion checks are done, the others are WIP
  - UInt utility: under review
- Super issue and TODOs review
  - https://github.com/scroll-tech/ceno/issues/95

## 2024-08-13 Ceno

- Up-to-date result: 2^20 instances run went from 1.04s -> 600ms on 16 physical cores / 64GB, achieving a 1.5 MHz prover
- Ongoing tasks
  - framework
    - devirgo sumcheck commit [PR](https://github.com/scroll-tech/ceno/pull/91/commits/7b5ce9f034d6cac0f5c9a9d0ee5516c1bafd5dea)
    - degree > 1 zero expression sumcheck & verifier [PR](https://github.com/scroll-tech/ceno/pull/91/commits/9885767d074be36ad12394d867e4b557280493a1)
    - edge case handling and addressing review comments
  - mock prover: https://github.com/scroll-tech/ceno/issues/105
  - UInt expression: https://github.com/scroll-tech/ceno/issues/97

## 2024-08-06 Ceno

- zkVM v2 implementation
  - framework almost done
  - benchmark
    - Up-to-date result: 2^20 instances run in 1.04s on 16 physical cores / 64GB, achieving a 1 MHz prover (should be 10x faster than SP1, > 12x faster than Jolt).
  - raise a super issue to track sub-tasks https://github.com/scroll-tech/ceno/issues/95
  - high-priority sub-tasks
    - implement multi-opcode support => blocking other opcode implementations
    - benchmark: devirgo sumcheck optimization
    - (Kimi) Refactor the UInt gadget to use the expression system https://github.com/scroll-tech/ceno/issues/103
    - add MockProver: improve opcode debugging ability
- Research
  - [Scroll] plans to benchmark Binius PCS + GKR in the following 2 weeks.

## 2024-07-30 Ceno

- (Ming, ongoing): new zkVM PR draft https://github.com/scroll-tech/ceno/pull/91
  - design docs from Scroll https://hackmd.io/@P4deJs5uRSyvHnXF8yyQJQ/B1DpOQDOA
    - multivariate Plonkish + memory offline check via GKR
    - use halo2 expressions to construct constraints
  - TODOs: further task breakdown
    - GKR logup argument implementation
    - revamp UInt to work on the new constraint system.
    - one super-circuit with multiple opcodes + sumcheck batching
- riscv add [PR](https://github.com/scroll-tech/ceno/pull/85)
  - second round of review from Soham

Interpreter:

- ongoing, with pending tasks on running with mainblock and getting statistics results.

## 2024-07-23 Ceno

- (Ming, ongoing) Designing and implementing a GKR + Hyperplonk variant that specifically addresses the zkVM use case.
  - The PoC shows 262k Hz for riscv add, a promising value for a potentially fast zkVM prover.
  - review riscv opcodes and see which ones can NOT be expressed (or only at high cost) by the new design.
  - high-level implementation idea on the computation graph: a DAG with various operation nodes, where each node might involve a sumcheck or just a simple evaluation split/merge.
  - The goal is to keep the existing Ceno frontend design while changing only the underlying implementation, to achieve high code reuse.
  - layer -> vector, and no cellid.
  - target to finish the first version in the following weeks.
- (Kimi, ongoing) [PR](https://github.com/scroll-tech/ceno/pull/85) riscv add opcode review
  - unit test error, pending debugging

Interpreter

- opcode distribution on 2 workloads: (r)evm push/pop, and evm poseidon https://docs.google.com/spreadsheets/d/16gjv2VbmmOK51PFEqDzHpr_6JN-X8iZVsW9jrsrYB7c/edit?gid=1665867100#gid=1665867100
  - Code used to generate the data:
    - evm benchmarking https://github.com/zemse/sp1/tree/evm-benchmarking
    - https://github.com/zemse/sp1-revm-playground

## 2024-07-16 Ceno

- Engineering
  - (Done) [PR](https://github.com/scroll-tech/ceno/pull/83) more refactoring to apply devirgo sumcheck; around a 20x boost on the evm add benchmark
  - (Ongoing) [PR](https://github.com/scroll-tech/ceno/pull/89) optimize prover run time/memory
  - (Ongoing) [PR](https://github.com/scroll-tech/ceno/pull/85) riscv add opcode
- zkVM new design from Scroll
  - PoC benchmark shows around 262k Hz (2^20 add in 3.x sec) to generate the proof (without PCS)
    > Jolt is at 90k Hz, which means around 3x faster than Jolt.
    >
    > SP1 is 1.7x faster than Jolt

Interpreter

- (Ongoing) based on the SP1 emulator
  - repo link (TBD)
  - framework still a work in progress
  - currently bug fixing

Misc

- Ceno open source roadmap
  - Aligned with Scroll: prioritize the zkVM and build the framework based on the PoC to shift the project ASAP. Focus on `zkVM Keccak` instead of general `keccak`.
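These notes repeatedly compare "schoolbook" and "devirgo" sumcheck provers. As a reference point, a minimal schoolbook sumcheck for a single multilinear polynomial over Goldilocks might look like the sketch below. Assumptions, all illustrative: round polynomials are degree 1 (a product of d MLEs, as in the degree 3/4 benchmarks, needs d + 1 evaluations per round instead of 2); fixed challenges stand in for Fiat-Shamir; and a naive O(2^n) evaluation stands in for the PCS opening at the end. It does not show Devirgo's distributed variant, and the function names are hypothetical.

```rust
// Goldilocks prime p = 2^64 - 2^32 + 1; inputs to the helpers are assumed reduced.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 { ((a as u128 + b as u128) % P as u128) as u64 }
fn sub(a: u64, b: u64) -> u64 { ((a as u128 + P as u128 - b as u128) % P as u128) as u64 }
fn mul(a: u64, b: u64) -> u64 { ((a as u128 * b as u128) % P as u128) as u64 }

/// Naive MLE evaluation, standing in for a PCS opening: bit i of x pairs with r[i].
fn mle_eval(evals: &[u64], r: &[u64]) -> u64 {
    evals.iter().enumerate().fold(0, |acc, (x, &v)| {
        let eq = r.iter().enumerate().fold(1, |e, (i, &ri)| {
            mul(e, if (x >> i) & 1 == 1 { ri } else { sub(1, ri) })
        });
        add(acc, mul(eq, v))
    })
}

/// Bind the lowest variable (index bit 0) to r: f'(x) = f(0, x) + r * (f(1, x) - f(0, x)).
fn bind_lowest_var(table: &[u64], r: u64) -> Vec<u64> {
    table.chunks(2).map(|p| add(p[0], mul(r, sub(p[1], p[0])))).collect()
}

/// Schoolbook sumcheck for the claim sum_{x in {0,1}^n} f(x) = claimed_sum.
/// `challenges` plays the verifier's random coins (Fiat-Shamir in practice).
fn sumcheck(evals: &[u64], claimed_sum: u64, challenges: &[u64]) -> bool {
    assert_eq!(evals.len(), 1usize << challenges.len());
    let mut table = evals.to_vec();
    let mut claim = claimed_sum;
    for &r in challenges {
        // Prover: the round polynomial g(X) is linear, so g(0) and g(1) define it.
        let g0 = table.iter().step_by(2).fold(0, |acc, &v| add(acc, v));
        let g1 = table.iter().skip(1).step_by(2).fold(0, |acc, &v| add(acc, v));
        // Verifier: the round polynomial must be consistent with the running claim.
        if add(g0, g1) != claim {
            return false;
        }
        // Verifier: reduce to the new claim g(r) = g0 + r * (g1 - g0).
        claim = add(g0, mul(r, sub(g1, g0)));
        // Prover: fix this variable to r; the table halves every round.
        table = bind_lowest_var(&table, r);
    }
    // Final check: f(r_0, ..., r_{n-1}) must equal the last claim.
    mle_eval(evals, challenges) == claim
}

fn main() {
    let evals: Vec<u64> = (1..=8).collect(); // f on {0,1}^3; the true sum is 36
    let challenges = [11u64, 222, 3_333];
    assert!(sumcheck(&evals, 36, &challenges)); // honest claim is accepted
    assert!(!sumcheck(&evals, 37, &challenges)); // wrong claim fails in round 1
    println!("sumcheck checks passed");
}
```

The prover's work is dominated by the per-round pass over `table`, which halves each round; that linear-time folding loop is what the CPU (AVX) and GPU optimisations discussed in these notes aim to speed up.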