SUREEL zkVM weekly syncup notes
============
## 2024-12-10
### PSE
- debug tool
- (WIP) remove hardcoded position 0 lk multiplicity check [PR 649](https://github.com/scroll-tech/ceno/pull/649)
- prover
- (WIP) skip PCS commitment/opening for structural columns [issue 654](https://github.com/scroll-tech/ceno/issues/654)
- building on the issue 654 work, we can add a new task to skip commitment of the range table
- optimize opcode circuit with 32-bit range check [proposal in #702](https://github.com/scroll-tech/ceno/issues/702)
- refactor circuit to use 32 bit range check
- Spartan SPARK is implemented [draft PR 713](https://github.com/scroll-tech/ceno/pull/713)
- currently on hold due to an obstacle: we need to figure out a way to do a sparse fractional sum: [main obstacle](https://github.com/scroll-tech/ceno/issues/702#issuecomment-2527165584)
- sumcheck optimisation
- ideas: optimize virtualpoly and extract identity polynomial
### Miscs
- Scroll set a timeline to publish the Ceno research paper/blog on 2025/01/07, including timeline and benchmarks.
- they set up a timeline for engineering work: benchmark and SDK to be ready before Dec 24.
- Scroll asked for our opinion on the wording of our contributions. We suggested crediting from an individual perspective:
> We would like to acknowledge and thank Ming (https://github.com/hero78119), Soham (https://github.com/zemse), Kimi (https://github.com/KimiWu123), and Han (https://github.com/han0110) from Privacy and Scaling Exploration (https://pse.dev/) (PSE) for their significant contribution (https://github.com/privacy-scaling-explorations/ceno/graphs/contributors) to the design and implementation of Ceno since its early inception.
- Meetup with Axiom on Thursday, 3pm UTC
- Axiom design docs: https://github.com/axiom-crypto/afs-prototype/blob/main/docs/specs/vm/axVM_STARK_Architecture_DRAFT.pdf
## 2024-12-03
### PSE
Ceno
- circuit optimisation
- (Reviewing) unify MUL opcode to field arithmetics [PR 660](https://github.com/scroll-tech/ceno/pull/660)
- debug tool
- (WIP) remove hardcoded position 0 lk multiplicity check [PR 649](https://github.com/scroll-tech/ceno/pull/649)
- (Reviewing) merge slt/sltu [PR 659](https://github.com/scroll-tech/ceno/pull/659)
- proving system
- (WIP) enhancement: skip structural witin commitment & PCS [issue 654](https://github.com/scroll-tech/ceno/issues/654)
- (WIP) mpcs opening optimisation: refactor to share the same sumcheck implementation & code cleanup [PR 653](https://github.com/scroll-tech/ceno/pull/653)
- will extract to smaller PR for review
- found another issue: performance on main proof regressed after recent [PR 671](https://github.com/scroll-tech/ceno/pull/671)
### scroll & external contributors
- benchmark [super issue](https://github.com/scroll-tech/ceno/issues/641)
- [x] sproll-evm: 14.6M cycles, 170 kHz.
- [ ] revm => we need to follow up on this
- ...
- precompile DSL approach: frontend & backend design [WIP document](https://hackmd.io/xy0tzQa4SqajFCtiXHp2Cw)
- Ming will work on opcode circuit approach (next priority after circuit optimisation)
- parallel proving design via shardings
- GPU proving
### Miscs
- Connecting with the cores team on tg as a follow-up to `Starks - aggregating hash based signatures`
- asking what the hash-based signature candidates are
- Axiom is connecting with us to look for potential collaboration.
- Precon retro, happened at 0am UTC+8, Dec 4
## 2024-11-26
### Takeaways, ideas & suggestions after pre-con
#### Ming
- updated roadmap accordingly, focus on EL & CL benchmark
- a few new ideas, e.g. zkVM-based precompiles, and Lasso-style succinct structural table evaluation in more places.
### Roadmap summary (from Ming):
We will probably focus on:
- zkVM (prover) performance
> idea justification: no zkVM vendor has yet caught up with the performance needed for Ethereum EL/CL usage. Ceno (SOTA) is at around 250 kHz; our target is > 10 MHz (zkVM opcodes proved per second), around 100x boost
- L1 targeted benchmarks
> focus on EL/CL tasks, and optimise around these two tasks
- precompile/co-processor for zkVM
> as cryptographic primitive acceleration
#### Boosting zkVM (prover) performance
With proven ideas from other zkVMs, e.g.
- Lasso structured tables: skip commitment and let the verifier evaluate the structural table succinctly
- [x] applied in Ceno non-uniform memory scheme address
- [ ] apply in `logup` table T to support large table range check
> challenge: need to convert sparse m(x) into dense poly commitment.
- axiom/sp1: recursive proof in another VM with custom and small ISA: recursive-vm.
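
For reference, the `logup` range check mentioned above rests on the standard logarithmic-derivative identity. The symbols below (looked-up witness values $w_i$, table entries $t_j$, multiplicities $m_j$, verifier challenge $\alpha$) are notation assumed here for illustration and are not taken from the Ceno codebase:

$$
\sum_{i=1}^{n} \frac{1}{\alpha - w_i} \;=\; \sum_{j=1}^{N} \frac{m_j}{\alpha - t_j},
\qquad m_j = \#\{\, i : w_i = t_j \,\}
$$

For a large table (for example $N = 2^{32}$) almost all $m_j$ are zero, so the multiplicity polynomial $m(x)$ is sparse. That is the source of the challenge noted above: committing to it with a dense polynomial commitment would cost work proportional to $N$ rather than to $n$.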
### L1 benchmarks
- EL: revm e2e
- CL: hash based signature scheme on zkVM
> build a connection with Kev's team from CL, and learn the hash-based signature candidates
### precompiles
- build precompile framework
=> preliminary idea: build precompiles from zkVM opcodes. Discussing the idea with Scroll.
- take EL precompiles and CL hash based signature as first priority target
### zkVM researches & exploration
- sumcheck algorithm optimisation
- hardware optimisation: AVX/Cuda
- binary field domain knowhow
- audit & fv
### Scroll & External contributor
- Scroll roadmap on Ceno
- prepare for announcement, working on more benchmarks
- private I/O integration test
- MPCS: benchmark WHIR as a replacement/enhancement of basefold
- recursive verifier SNARK
### Miscs
Ceno task WIP
- optimise circuit with less witness (Kimi)
- mock-prover support padding check (Soham)
- discussed pre-compile and proving system optimisation (Ming)
With Scroll Community Calls
- Every Tuesday: Strategy meeting
- Every Thursday: Weekly progress meeting including all developers (CET 10am)
- Scroll Slack: PSE are invited as guest
## 2024-11-05/12/19 (skipped due to PRECON/DEVCON)
## 2024-10-29
### PSE
- (completed) opcode development: slt/slti/srai
- (wip) sltiu
- (under review) modular memory/public i/o design [PR 457](https://github.com/scroll-tech/ceno/pull/457)
- experimenting with the sumcheck protocol optimisation by Polygon Zero [paper](https://eprint.iacr.org/2024/108)
### Scroll & External contributor
- (completed) load/store word [load](https://github.com/scroll-tech/ceno/pull/455)/[store](https://github.com/scroll-tech/ceno/pull/449)
- (Doing) ELF program load into memory & e2e test
## 2024-10-22
- pending opcode development tasks: srai/slti/sltiu, mem load (byte, half word)
- (From Soham) how sign extension can be done efficiently
- Kimi is refactoring mock-program: moving it into unit tests and cleaning it out of mock-prover
- Ming is working on public I/O. Discussion thread on Slack (Scroll) https://scrollco.slack.com/archives/C064WRCBMHU/p1729507904585529
- Hallow project will sunset sooner.
- For zkVM, probably take more scope from [The Verge](https://x.com/VitalikButerin/status/1588669782471368704).
- High level documentation https://hackmd.io/@pse-zkevm/B1kPQYQe1e
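
On the sign-extension question raised above: a minimal sketch of what sign extension computes for byte and half-word loads, plus the same result written arithmetically (one most-significant-bit flag times a constant), which is closer to how a constraint system would express it. The function names are made up for illustration; this is not the Ceno gadget.

```rust
/// Sign-extend a byte to 32 bits (what `LB` does), using Rust casts.
pub fn sign_extend_byte(b: u8) -> u32 {
    b as i8 as i32 as u32
}

/// Sign-extend a half word to 32 bits (what `LH` does), using Rust casts.
pub fn sign_extend_half(h: u16) -> u32 {
    h as i16 as i32 as u32
}

/// Same result as `sign_extend_byte`, written as arithmetic on the
/// most significant bit: result = msb * 0xFFFF_FF00 + b.
/// The sum cannot overflow because the two terms occupy disjoint bits.
pub fn sign_extend_byte_arith(b: u8) -> u32 {
    let msb = (b >> 7) as u32; // 0 or 1
    msb * 0xFFFF_FF00 + b as u32
}
```

In the arithmetic form the only non-trivial witness is the `msb` bit, so the cost of sign extension is essentially the cost of proving that one bit is extracted correctly.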
## 2024-10-15
Milestone 1
- We might expand the milestone benchmark scope to cover more than Fibonacci
- because with pure Fibonacci, people might think it is a kind of cheat and that we only outperform on this specific workload.
- RV32IM has around 38 opcodes in total; with that we can cover more benchmarks like
- rsa
- regex
- is-prime
- ssz-withdrawal
- tendermint light client
### PSE
opcodes
- (Reviewing) logical i-type opcodes
- (Done) mock-prover error dedup
- (Done) soundness: public input fix
- (Ongoing) few more opcode tasks: SLLI, SRAI, SLTIU, SLTI, MULH
### Scroll & External contributor
- (Ongoing) public I/O data design and implementation planning
- https://github.com/scroll-tech/ceno/issues/215
- memory gadget [issue 360](https://github.com/scroll-tech/ceno/issues/360)
- blocking LH, LB, SH, SB,...
Protocol
## 2024-10-08
Milestone 1
- opcodes: 15/24 done, 6 under review
- benchmark:
- on CPU, 2^20 instances e2e in 2.10s, vs SP1 11s (as of 2024/07)
> we can further improve after resolving this issue via single limb [Issue 285](https://github.com/scroll-tech/ceno/issues/285)
> many optimisations are planned as follow-ups to milestone 1
### PSE
- opcodes development:
- (Done) [SRL/SRR](https://github.com/scroll-tech/ceno/pull/304)
- (Done) [MULHU](https://github.com/scroll-tech/ceno/pull/306)
- (Done) [ValueAdd/ValueMul](https://github.com/scroll-tech/ceno/pull/323)
- unittest enhancement
- (Done) assertion on register rd value [PR 301](https://github.com/scroll-tech/ceno/pull/301/)
- (Done) customized debug expression probing [PR 306](https://github.com/scroll-tech/ceno/pull/306)
- (Doing) Divu/SLI soundness fixes
- divu [PR 335](https://github.com/scroll-tech/ceno/pull/335/)
### Scroll & External contributor
- (Done) ecall-halt [PR 258](https://github.com/scroll-tech/ceno/pull/258)
- also support public input
...
### Misc
Shared benchmark results with the sumcheck-gpu team as a GPU optimisation anchor
```
GPU
2^26 size mle with degree 4 over BN scalar field
- GPU sumcheck ~800ms, only kernel execution time, excluding data transfer and hashing (random oracle)
CPU
2^26 size mle with degree 3 over Goldilocks + Ext Field 2
on AMD 5800 8 cores 2 hyperthread:
- schoolbook sumcheck: 3.5301 s
- devirgo sumcheck: 901.56 ms
on AMD EPYC 9R14 16 cores 2 hyperthread:
- schoolbook sumcheck: 3.1585 s
- **devirgo sumcheck: 340.44 ms**
```
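
For readers unfamiliar with what is being timed, here is a minimal sketch of the schoolbook sumcheck in its simplest case: a single multilinear polynomial (degree 1 per variable) over the Goldilocks base field. The benchmark above uses degree 3 and an extension field, so this shows only the round structure, not the benchmarked code. Prover and verifier run in lockstep in one function, and the challenges are passed in as a parameter (an assumption made here for determinism; a real protocol derives them from a transcript).

```rust
/// Goldilocks prime, p = 2^64 - 2^32 + 1. All inputs are assumed reduced mod P.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Fix the top variable of a multilinear polynomial (given by its
/// evaluations on the boolean hypercube) to `r`, halving the table.
fn fix_top_variable(evals: &[u64], r: u64) -> Vec<u64> {
    let half = evals.len() / 2;
    (0..half)
        .map(|j| add(evals[j], mul(r, sub(evals[j + half], evals[j]))))
        .collect()
}

/// Schoolbook sumcheck for one multilinear polynomial.
/// Returns true iff every round check and the final check pass.
pub fn sumcheck(mut evals: Vec<u64>, claimed_sum: u64, challenges: &[u64]) -> bool {
    assert_eq!(evals.len(), 1 << challenges.len());
    let mut claim = claimed_sum;
    for &r in challenges {
        let half = evals.len() / 2;
        // Prover: the round polynomial g is linear, so g(0), g(1) define it.
        let g0 = evals[..half].iter().fold(0, |acc, &x| add(acc, x));
        let g1 = evals[half..].iter().fold(0, |acc, &x| add(acc, x));
        // Verifier: g(0) + g(1) must equal the running claim.
        if add(g0, g1) != claim {
            return false;
        }
        // Verifier: the next claim is g(r) = g0 + r * (g1 - g0).
        claim = add(g0, mul(r, sub(g1, g0)));
        // Prover: bind the variable to r for the next round.
        evals = fix_top_variable(&evals, r);
    }
    // Final check. Here the evaluation comes from the prover's own table;
    // a real verifier would obtain it from a polynomial commitment opening.
    evals[0] == claim
}
```

Each round touches the whole remaining table once, so total prover work is linear in the table size (2^26 entries in the benchmark above).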
## 2024-10-01 (skip meeting, combining with next week)
### PSE
- (Done) improve CI testing
## 2024-09-24
### PSE
- (Done) improve unittest testing iteration [PR242](https://github.com/scroll-tech/ceno/pull/242)
- (Reviewing) register/memory table and optimize table sumcheck proof [PR 251](https://github.com/scroll-tech/ceno/pull/251)
- (Reviewing) i_inst and SRLI opcode [PR 229](https://github.com/scroll-tech/ceno/pull/229)
- (Reviewing) divu opcode https://github.com/scroll-tech/ceno/pull/266
- TODO 1: add initial state_in/out to complete e2e rw soundness.
- ...
### Scroll & External contributor
- (Reviewing) BEQ/BNE [PR 272](https://github.com/scroll-tech/ceno/pull/272)
- (Done) e2e opening on fixed polynomial [PR 253](https://github.com/scroll-tech/ceno/pull/253)
- (Reviewing) blt e2e test [PR 249](https://github.com/scroll-tech/ceno/pull/249)
> a bug where verification fails when the instance count is > 2^10
- [sumcheck zero PIOP from polygon-zero](https://eprint.iacr.org/2024/108)
- analyzing [note](https://www.overleaf.com/project/66cf8ab8ea23adb139a41e56) from Scroll
### Misc
- (WIP, low priority) refactor uint for better expression conversion [PR 264](https://github.com/scroll-tech/ceno/pull/264)
- performance: 2^20 add instance: 3.6s -> 2.6s
> 1. sumcheck protocol improvement, e.g polygon zero PIOP paper stuff
> 2. CPU optimisation: avx2/avx512 on Goldilocks, e.g. > 4x improvement
> 3. binary field arithmetics/PCS
> 4. implementation xxx
## 2024-09-17
### PSE
Ceno
- (Done) CI integration pipeline [PR 209](https://github.com/scroll-tech/ceno/pull/209)
- (Done) mock prover [PR 206](https://github.com/scroll-tech/ceno/pull/206)
- (Done) mul opcode generalization [generalized MUL OP](https://github.com/scroll-tech/ceno/pull/200)
- (Doing) SRLI [PR](https://github.com/scroll-tech/ceno/issues/122)
- (Doing) memory/cpu consistency check [Issue 126](https://github.com/scroll-tech/ceno/issues/126)
> enhance sumcheck to align and run on different num_variables
- (Doing) mul opcode [PR 219](https://github.com/scroll-tech/ceno/pull/219)
### Scroll & External contributor
- MPCS: 2.1s (goal 1.5s)
- (Done) program table & opcode lookup
- Integrate MPCS to proving flow
### Misc
- discuss: possible enhancement/tool in development of opcode
> Soham: with the macro !set_value(xx) it is hard to find the root cause => Ming: the debugging message in VS Code seems to show the root cause.
> Ming: we can't unit test the register write value if it is of expression type. Issue raised: https://github.com/scroll-tech/ceno/issues/220
- proposal: categorize opcodes into r-type/i-type/b-branch development
> `match` syntax on opcode type, example, r-type first version [example](https://github.com/scroll-tech/ceno/pull/230/files#diff-328a733332c3613e42e08e642da79434de4a8500d36a654ec055b82f47e18380)
> r-type pending on this https://github.com/scroll-tech/ceno/pull/231
- sumcheck on GPU open source https://github.com/pseXperiments/cuda-sumcheck
> first version schoolbook sumcheck algo
> MPI: a standardised interface for cluster computation programming
## 2024-09-10
### PSE
Ceno
- (Done) Mock Prover error print [PR 182](https://github.com/scroll-tech/ceno/pull/182)
- (Done) add CI target as metrics [PR 195](https://github.com/scroll-tech/ceno/pull/195)
- (Done) witness assignment interface [PR 187](https://github.com/scroll-tech/ceno/pull/187)
- (Reviewing) Lt Util [PR 183](https://github.com/scroll-tech/ceno/pull/183)
- (Reviewing) Lock-free thread-safe logup multiplicity witness counting [PR 198](https://github.com/scroll-tech/ceno/pull/198)
- (Doing) [generalized MUL OP](https://github.com/scroll-tech/ceno/pull/200)
- (TODO) opcode implementation (mul, addi, srli)
- (TODO) MockProver cache table data and load once
- this might be more urgent, as each run now loads > 5 tables, each of size 2^16, which slows down CI
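
To illustrate the "lock-free thread-safe logup multiplicity witness counting" item above: one common way to do this is a vector of atomic counters that worker threads bump without taking a lock. The sketch below assumes that approach; it is not a description of what PR 198 actually implements, and `Multiplicity` and `count_parallel` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical multiplicity table: one atomic counter per table row.
pub struct Multiplicity {
    counts: Vec<AtomicU64>,
}

impl Multiplicity {
    pub fn new(table_size: usize) -> Self {
        Self {
            counts: (0..table_size).map(|_| AtomicU64::new(0)).collect(),
        }
    }

    /// Record one lookup of table row `index`. Takes `&self`, so many
    /// threads can call it concurrently without a mutex.
    pub fn record(&self, index: usize) {
        self.counts[index].fetch_add(1, Ordering::Relaxed);
    }

    /// Consume the counters once all workers are done.
    pub fn into_vec(self) -> Vec<u64> {
        self.counts.into_iter().map(AtomicU64::into_inner).collect()
    }
}

/// Split `lookups` across threads and count how often each row is hit.
pub fn count_parallel(table_size: usize, lookups: &[usize], num_threads: usize) -> Vec<u64> {
    let m = Multiplicity::new(table_size);
    // Chunk size is at least 1 so `chunks` never panics.
    let chunk = lookups.len().div_ceil(num_threads.max(1)).max(1);
    thread::scope(|s| {
        for part in lookups.chunks(chunk) {
            let m = &m;
            s.spawn(move || part.iter().for_each(|&i| m.record(i)));
        }
    });
    // Leaving the scope joins all threads, so every increment is visible here.
    m.into_vec()
}
```

`Ordering::Relaxed` is enough in this sketch because the counters are only read after the scoped threads have been joined.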
### Scroll & External contributor
- (Reviewing) E2E opcodes and table prover https://github.com/scroll-tech/ceno/pull/188
- (Reviewing) Emulator Runtime (I/O, Allocator)
- (TODO) Byte code table [issue 104](https://github.com/scroll-tech/ceno/issues/104)
- (TODO) opcode development
Miscs
- lagging behind on M1 progress https://hackmd.io/@ceno-zkvm/ryDWX5_5R
- because we are still consolidating the overall proving system.
- improve reviewing speed & quality
- speed up opcode development
- Ethereum-granted project for FV on zk(E)VM https://verified-zkevm.org/
- Ali will reach out to them and seek collaboration
- GPU sumcheck collaboration: Sowoon + Dohoon + Scroll => tg group
- Benchmark result: 5x faster than SP1 on the Fibonacci task.
- MPCS: 60 poly commitments => 2 s
- create proof of Add opcode 2^20 => 1s
- Add opcode 16 poly commits
- Cost: MPCS proof 10X, 8Mb
- opt1: optimise codebase/ mechanism
- opt2: with recursion/aggregation we can compress the proof to a smaller size
## 2024-09-03
Ceno
### PSE
- (Done) [PR](https://github.com/scroll-tech/ceno/pull/172) fix sumcheck degree & monomial form dedup issue => bug captured [PR 169](https://github.com/scroll-tech/ceno/pull/169)
- [PR] overflow handling
- [discussion thread](https://github.com/scroll-tech/ceno/pull/173#issuecomment-2325489932)
- due to the usage of `wrapping_add/sub/mul/div` in revm
- in summary:
- 1. disable the compiler's default overflow checks on all instructions
- 2. support overflow as external assignment.
- 3. rely on the respective assembly of Rust's `wrapping_XXX` to deal with overflow checks
- (Done) UInt refactor
- (Done) Mock Prover
- (Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
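
As background for the overflow-handling discussion above, a short sketch of the three standard-library flavours of `u32` addition in Rust. The helper name `add_flavours` is made up for illustration; the methods it calls are the standard ones.

```rust
/// Returns the result of adding `a` and `b` three ways:
/// - `wrapping_add`: silently wraps modulo 2^32 (what revm relies on,
///   and what the RISC-V `add` instruction does),
/// - `checked_add`: `None` on overflow,
/// - `overflowing_add`: the wrapped value plus an overflow flag.
///
/// A plain `a + b` panics on overflow in debug builds and wraps in
/// release builds unless `overflow-checks` is enabled.
pub fn add_flavours(a: u32, b: u32) -> (u32, Option<u32>, (u32, bool)) {
    (a.wrapping_add(b), a.checked_add(b), a.overflowing_add(b))
}
```

For example, `add_flavours(u32::MAX, 1)` wraps to `0`, reports `None` for the checked form, and sets the overflow flag in the third component.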
#### TODOs & Doing
- (Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
- [UInt](https://github.com/scroll-tech/ceno/issues/174) support M/C != 4 mul constraints
- witness assignment trait design
- bitwise operation opcode [SLL](https://github.com/scroll-tech/ceno/issues/123)
### Scroll & External contributor
- range table circuit https://github.com/scroll-tech/ceno/pull/154
- Emulator migrating from sp1
- PCS integration: Basefold + Plonky2-FRI optimisation
- porting poseidon from plonky2
- control flow opcode implementation
## 2024-08-27
Ceno
### PSE
- [PR](https://github.com/scroll-tech/ceno/pull/165) Fix verifier failure when `lk_expression.len()` > `r/w_expression.len()`
- will also eliminate a potential soundness issue by not trusting the prover's proof
- Change default to RV32 https://github.com/scroll-tech/ceno/pull/166
> the RV32 toolchain is more mature
> aligns benchmarks with SP1 and other zkVMs
> experiment with RV64 later
- Mock Prover [PR](https://github.com/scroll-tech/ceno/pull/113)
- Ming has done the review, WIP for adding >1 degree `assert_zero_expression` multiplication.
- follow up tasks: modify `addsub` opcode unittest to use MockProver
> 1. replace random generated witness with real data to pass the unittest
> 2. keep prove/verify flow in benchmark/example
- UInt refactor [PR](https://github.com/scroll-tech/ceno/pull/106)
- review done from Ming
- suggestion: commits cherry-pick and exclude commits in master branch
### Scroll & External Contributor
- range table circuit https://github.com/scroll-tech/ceno/pull/154
- Interpreter implementation
- PCS integration: Basefold + Plonky2-FRI optimisation
> 2 explorations
> 1. FRI-Binius => benchmark result not good, pending new research / implementation polishing
> 2. Basefold + Plonky2
- From Snarkify: control flow opcode implementation
### Misc
- a one-on-one scheduled around this Thur/Fri for sharing peer review/self evaluation result :)
## 2024-08-20
Ceno
- Performance: removing an unnecessary [to_vec()](https://github.com/scroll-tech/ceno/commit/a35d642869b44e4dfed5b076205b0af99612e8b4) clone improved latency from 600ms -> 380ms => 2.7 MHz zkVM prover.
- Project Milestone [dashboard](https://github.com/orgs/scroll-tech/projects/15)
- timeline on hackmd https://hackmd.io/@ceno-zkvm/SkMzxt_9A
- Util lib development
- MockProver: lookup expression assertion checks are done, while others are WIP
- UInt utility: under review
- Super Issue and TODOs review
- https://github.com/scroll-tech/ceno/issues/95
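
The prover-speed figures quoted in these notes come from dividing the instance count by the proving latency. A minimal sketch, assuming the 380ms run used 2^20 instances as in the other benchmark entries (the function name `prover_hz` is made up here):

```rust
/// Prover throughput in Hz: proved instances (opcodes) per second.
pub fn prover_hz(num_instances: u64, latency_ms: f64) -> f64 {
    num_instances as f64 / (latency_ms / 1000.0)
}
```

With 2^20 instances, `prover_hz(1 << 20, 380.0)` is about 2.76 MHz, consistent with the roughly 2.7 MHz quoted above.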
## 2024-08-13
Ceno
- Up-to-date result: 2^20 instance run improved from 1.04s -> 600ms on 16 phy-cores 64GB, achieving a 1.5 MHz prover
- Ongoing tasks
- framework
- devirgo sumcheck commit [PR](https://github.com/scroll-tech/ceno/pull/91/commits/7b5ce9f034d6cac0f5c9a9d0ee5516c1bafd5dea)
- degree > 1 zero expression sumcheck & verifier [pr](https://github.com/scroll-tech/ceno/pull/91/commits/9885767d074be36ad12394d867e4b557280493a1)
- edge case handling and address review comment
- mock prover: https://github.com/scroll-tech/ceno/issues/105
- UInt expression: https://github.com/scroll-tech/ceno/issues/97
## 2024-08-06
Ceno
- zkVM v2 implementation
- framework almost done
- benchmark
- Up-to-date result: 2^20 instances run in 1.04s on 16 phy-cores 64GB, achieving a 1 MHz prover (should be 10x faster than SP1, > 12x faster than Jolt).
- raised a super issue to track sub-tasks https://github.com/scroll-tech/ceno/issues/95
- high-priority sub-tasks
- implement multi-opcode support => blocking other opcode implementation
- benchmark: devirgo sumcheck optimization
- (Kimi) Refactor UInt gadget and use expression system https://github.com/scroll-tech/ceno/issues/103
- add MockProver: improve opcode debugging ability
- Research
- [Scroll] plan to benchmark binius PCS + GKR in the following 2 weeks.
## 2024-07-30
Ceno
- (Ming Ongoing): new zkVM PR draft https://github.com/scroll-tech/ceno/pull/91
- design docs from Scroll https://hackmd.io/@P4deJs5uRSyvHnXF8yyQJQ/B1DpOQDOA
- multi-variate plonkish + memory offline check via GKR
- use halo2 expression to construct constraints
- TODOs further task breakdown
- GKR logup arguments implementation
- revamp UInt to work on the new constraint system.
- On one super-circuit with multiple opcode + sumcheck batch
- riscv add [PR](https://github.com/scroll-tech/ceno/pull/85)
- second round reviewing from Soham
Interpreter:
- ongoing, with pending tasks on running with mainblock and getting statistics result.
## 2024-07-23
Ceno
- (Ming Ongoing) Designing and implementing a GKR + HyperPlonk variant to specifically address the zkVM use case.
- PoC shows 262 kHz for riscv add: great value for a potentially fast zkVM prover.
- review riscv opcodes and see which ones can NOT be expressed (or only at high cost) by the new design.
- high-level implementation idea on the computation graph: a DAG with various operation nodes, where each node might involve a sumcheck or simply an evaluation split/merge.
- Goal is to keep the existing ceno frontend design while changing only the underlying implementation, to achieve high code reuse.
- layer -> vector, and no cellid.
- target to finish first version in the following weeks.
- (Kimi Ongoing) [PR](https://github.com/scroll-tech/ceno/pull/85) riscv add opcode reviewing
- unit test error, pending debugging
Interpreter
- opcode distribution on 2 workload: (r)evm push/pop, and evm poseidon https://docs.google.com/spreadsheets/d/16gjv2VbmmOK51PFEqDzHpr_6JN-X8iZVsW9jrsrYB7c/edit?gid=1665867100#gid=1665867100
- Code used to generate the data:
- evm benchmarking https://github.com/zemse/sp1/tree/evm-benchmarking
- https://github.com/zemse/sp1-revm-playground
## 2024-07-16
Ceno
- Engineering
- (Done) [PR](https://github.com/scroll-tech/ceno/pull/83) more refactoring to apply devirgo sumcheck. Around 20x boost on the evm add benchmark
- (Ongoing) [PR](https://github.com/scroll-tech/ceno/pull/89) optimize prover run time/memory
- (Ongoing) [PR](https://github.com/scroll-tech/ceno/pull/85) riscv add opcode
- zkVM new design from Scroll
- PoC benchmark shows around 262k Hz (2^20 add in 3.x sec) to generate proof (without PCS)
> Jolt is at 90 kHz, which means this is around 3x faster than Jolt.
> SP1 is 1.7x faster than Jolt
Interpreter
- (Ongoing) based on SP1 emulator
- repo link (TBD)
- framework still work in progress
- currently bug fixing
Misc
- Ceno open source roadmap
- Aligned with Scroll: prioritize zkVM and build the framework based on the PoC to shift the project asap. The focus would be on `zkVM Keccak` instead of general `keccak`.