SUREEL zkVM weekly syncup notes

2024-12-10

PSE

debug tool
- (WIP) remove hardcoded position-0 lk multiplicity check PR 649
prover
- (WIP) skip PCS commitment/opening for structural columns, issue 654
- once the issue 654 work lands, we can add a new task to skip the commitment of the range table
optimize opcode circuits with the 32-bit range check proposal in #702
- refactor circuits to use 32-bit range checks
- Spartan SPARK implemented in draft PR 713
- currently on hold; main obstacle: need to figure out a way to do a sparse fraction sum
sumcheck optimisation
- ideas: optimize virtualpoly and extract identity polynomial
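For the 32-bit range check refactor above, here is a minimal sketch of one common construction, assuming a 2^16-entry u16 range table: the witness splits a value into two 16-bit limbs, the circuit enforces the recomposition x = lo + 2^16 * hi, and each limb is looked up in the table. The helpers `assign_limbs` and `check_u32` are hypothetical, plain u64 arithmetic stands in for field arithmetic, and the actual design in #702 may differ.

```rust
/// Hypothetical witness assignment: split a value into two 16-bit limbs [lo, hi].
fn assign_limbs(x: u64) -> [u64; 2] {
    [x & 0xFFFF, x >> 16]
}

/// Hypothetical constraint check for "x is a 32-bit value":
/// one recomposition constraint plus one u16 table lookup per limb.
fn check_u32(x: u64, limbs: [u64; 2]) -> bool {
    // Stand-in for a logup lookup into a range table of size 2^16.
    let in_u16_table = |v: u64| v < (1 << 16);
    // x == lo + 2^16 * hi (checked arithmetic so a malicious witness cannot wrap).
    let recomposed = limbs[1]
        .checked_mul(1 << 16)
        .and_then(|hi| hi.checked_add(limbs[0]));
    recomposed == Some(x) && limbs.iter().all(|&l| in_u16_table(l))
}
```

Two lookups into a 2^16 table stand in for a single lookup into a 2^32 table, which would be far too large to commit to directly.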

Miscs

Scroll set a timeline to publish the Ceno research paper/blog on 2025/01/07, including timeline and benchmarks.
- they set up a timeline for engineering work: benchmarks and SDK to be ready before Dec 24.
- Scroll asked for our opinion on the wording of our contributions. We suggested acknowledging contributors individually:
  
  We would like to acknowledge and thank Ming (https://github.com/hero78119), Soham (https://github.com/zemse), Kimi (https://github.com/KimiWu123), and Han (https://github.com/han0110) from Privacy and Scaling Exploration (https://pse.dev/) (PSE) for their significant contribution (https://github.com/privacy-scaling-explorations/ceno/graphs/contributors) to the design and implementation of Ceno since its early inception.
Meetup with Axiom on Thursday, 3pm UTC
- Axiom design docs: https://github.com/axiom-crypto/afs-prototype/blob/main/docs/specs/vm/axVM_STARK_Architecture_DRAFT.pdf

2024-12-03

PSE

Ceno

circuit optimisation
- (Reviewing) unify MUL opcode to field arithmetics PR 660
debug tool
- (WIP) remove hardcoded position-0 lk multiplicity check PR 649
- (Reviewing) merge slt/sltu PR 659
proving system
- (WIP) enhancement: skip structural witin commitment & PCS PR 654
- (WIP) MPCS opening optimisation: refactor to share the same sumcheck implementation & code cleanup PR 653
  - will be split into smaller PRs for review
  - found another issue: main proof performance regressed after the recent PR 671

Scroll & External contributors

benchmark super issue
- sproll-evm: 14.6M cycles, 170 kHz.
- revm => we need to follow up on this
- …
precompile DSL approach: frontend & backend design WIP document
- Ming will work on opcode circuit approach (next priority after circuit optimisation)
parallel proving design via sharding
GPU proving

Miscs

Connected with the cores team on tg as a follow-up to Starks - aggregating hash based signatures
- asked what the hash-based signature candidates are
Axiom reached out to us to explore potential collaboration.
PreCon retro, held at 0:00 UTC+8, Dec 4

2024-11-26

takeaways, ideas & suggestions after PreCon

Ming

updated the roadmap accordingly to focus on EL & CL benchmarks
a few new ideas, e.g. zkVM-based precompiles, and Lasso-style succinct structural table evaluation in more places.

Roadmap summary (from Ming):

we will probably focus on

zkVM (prover) performance

justification: no zkVM vendor has yet caught up with the performance needed for Ethereum EL/CL usage. Ceno (SOTA) runs at around 250 kHz; our target is > 10 MHz (zkVM opcodes proved per second), around a 100x boost
L1 targeted benchmarks

focus on EL/CL tasks, and optimize around these two tasks
precompile/co-processor for zkVM

as cryptographic primitive acceleration

Boosting zkVM (prover) performance

With proven ideas from other zkVMs, e.g.

Lasso structured tables: skip the commitment, and let the verifier evaluate structural tables succinctly
- applied to addresses in Ceno's non-uniform memory scheme
- apply to the logup table T to support large-table range checks
  
  challenge: need to convert the sparse m(x) into a dense polynomial commitment.
Axiom/SP1: recursive proofs in another VM with a small custom ISA (recursive VM).
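On the Lasso structured-table idea above: for some tables the verifier never needs a commitment, because the table's multilinear extension has a closed form. A minimal sketch, assuming the range table T[i] = i over n bits (index bit k bound to variable r[k]) and the toy prime field 2^31 - 1 instead of Goldilocks: the closed form sum_k 2^k * r[k] costs O(n), while the naive evaluation sum_i T[i] * eq(i, r) costs O(2^n). Function names are hypothetical.

```rust
const P: u64 = (1 << 31) - 1; // toy prime field (Mersenne 31), not Goldilocks

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// eq(i, r) = prod_k (i_k * r_k + (1 - i_k) * (1 - r_k)), with i_k the k-th bit of i.
fn eq_eval(i: usize, r: &[u64]) -> u64 {
    r.iter().enumerate().fold(1, |acc, (k, &rk)| {
        let factor = if (i >> k) & 1 == 1 { rk } else { sub(1, rk) };
        mul(acc, factor)
    })
}

/// Naive MLE evaluation: O(2^n) work and needs the whole table (or a commitment to it).
fn naive_mle(table: &[u64], r: &[u64]) -> u64 {
    table
        .iter()
        .enumerate()
        .fold(0, |acc, (i, &t)| add(acc, mul(t % P, eq_eval(i, r))))
}

/// Succinct evaluation of the identity/range table MLE: O(n) work, no commitment.
fn identity_mle(r: &[u64]) -> u64 {
    let mut pow2 = 1;
    let mut acc = 0;
    for &rk in r {
        acc = add(acc, mul(pow2, rk));
        pow2 = mul(pow2, 2);
    }
    acc
}
```

Both functions compute the same multilinear polynomial, since a multilinear extension is uniquely determined by its values on the Boolean hypercube; only the succinct one is cheap enough for a verifier.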

L1 benchmarks

EL: revm e2e
CL: hash based signature scheme on zkVM

build a connection with Kev's team from CL, and learn the hash-based signature candidates

precompiles

build precompile framework
=> preliminary idea: build precompiles from zkVM opcodes. Discussing the idea with Scroll.
take EL precompiles and CL hash-based signatures as the first priority targets

zkVM research & exploration

sumcheck algorithm optimisation
hardware optimisation: AVX/Cuda
binary field domain know-how
audit & FV

Scroll & External contributor

Scroll roadmap on Ceno
prepare for the announcement, working on more benchmarks
private I/O integration test
MPCS: benchmark WHIR as a replacement/enhancement of basefold
recursive verifier SNARK

Miscs

Ceno task WIP

optimise circuits with fewer witnesses (Kimi)
mock-prover support padding check (Soham)
discussed precompile and proving system optimisation (Ming)

With Scroll Community Calls

Every Tuesday: Strategy meeting
Every Thursday: weekly progress meeting including all developers (CET 10am)
Scroll Slack: PSE are invited as guest

.. 2024-11-05/12/19 skipped due to PreCon/Devcon

2024-10-29

PSE

(completed) opcode development: slt/slti/srai
(wip) sltiu
(under reviewing) modular memory/public i/o design PR 457
experiment: optimise the sumcheck protocol based on the PolygonZero publication

Scroll & External contributor

(completed) load/store word load/store
(Doing) ELF program load into memory & e2e test

2024-10-22

pending opcode development tasks: srai/slti/sltiu, memory loads (byte, half word)
(From Soham) how to make sign extension efficient
Kimi refactored mock-program, moved it into unit tests, and cleaned it out of mock-prover
Ming is working on public I/O. Discussion thread on Slack (Scroll) https://scrollco.slack.com/archives/C064WRCBMHU/p1729507904585529
Hallow project to be sunset sooner.
For zkVM, we will probably take on more scope from The Verge.
High-level documentation https://hackmd.io/@pse-zkevm/B1kPQYQe1e
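On Soham's sign-extension question above (relevant to the pending byte/half-word loads), here is a minimal sketch, assuming the loaded value arrives as an unsigned limb below 2^bits. Only the sign bit needs to be extracted as an extra witness; the 32-bit result is then the affine expression value + msb * (2^32 - 2^bits), with no further range checks on the upper bits. The function names are hypothetical and this is not necessarily how Ceno implements LB/LH.

```rust
/// Reference semantics (RISC-V LB / LH): sign-extend to 32 bits via Rust casts.
fn sext_ref_byte(b: u8) -> u32 {
    b as i8 as i32 as u32
}

fn sext_ref_half(h: u16) -> u32 {
    h as i16 as i32 as u32
}

/// Circuit-style sign extension of a `bits`-wide value (bits = 8 or 16):
/// only the most significant bit is extracted, and the upper bits are an
/// affine function of it: value + msb * (2^32 - 2^bits).
fn sext_via_msb(value: u32, bits: u32) -> u32 {
    debug_assert!(value < (1 << bits));
    let msb = (value >> (bits - 1)) & 1; // hypothetical witness: the sign bit
    let upper_mask = u32::MAX << bits; // 0xFFFF_FF00 for bytes, 0xFFFF_0000 for half words
    value + msb * upper_mask // never overflows: at most 2^32 - 1
}
```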

2024-10-15

Milestone 1

We might expand the milestone benchmark scope to cover more than Fibonacci
- because with Fibonacci alone, people might consider it cherry-picking, outperforming only on this specific workload.
RV32IM has around 38 opcodes in total; with full coverage we can run more benchmarks, such as
- rsa
- regex
- is-prime
- ssz-withdrawal
- tendermint light client

PSE

opcodes

(Reviewing) logical I-type opcodes
(Done) mock-prover error dedup
(Done) soundness: public input fix
(Ongoing) few more opcode tasks: SLLI, SRAI, SLTIU, SLTI, MULH

Scroll & External contributor

(Ongoing) public I/O data design and implementation planning
- https://github.com/scroll-tech/ceno/issues/215
memory gadget PR 360
- blocking LH, LB, SH, SB,…

Protocol

2024-10-08

Milestone 1

opcodes: 15/24 done, 6 under review
benchmark:
- on CPU, 2^20 instances e2e in 2.10 s, vs SP1 11 s (as of 2024/07)
  
  we can improve further after resolving the single-limb issue (Issue 285)
  many optimisations are planned as follow-ups to milestone 1

PSE

opcodes development:
- (Done) SRL/SRR
- (Done) MULHU
- (Done) ValueAdd/ValueMul
unittest enhancement
- (Done) assertion on register rd value PR 301
- (Done) customized debug expression probing PR 306
- (Doing) Divu/SLI soundness fixes
  - divu PR 335

Scroll & External contributor

(Done) ecall-halt PR 258
- also support public input
  …

Misc

shared benchmark results with the sumcheck-gpu team as a baseline for GPU optimisation

GPU
MLE of size 2^26 with degree 4 over the BN scalar field
- GPU sumcheck ~800 ms, kernel execution time only, excluding data transfer and hashing (random oracle)

CPU
MLE of size 2^26 with degree 3 over Goldilocks + degree-2 extension field
on AMD 5800 (8 cores, 2 threads per core):
- schoolbook sumcheck: 3.5301 s
- devirgo sumcheck: 901.56 ms

on AMD EPYC 9R14 (16 cores, 2 threads per core):
- schoolbook sumcheck: 3.1585 s
- **devirgo sumcheck: 340.44 ms**
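For reference, the schoolbook baseline above can be sketched as a prover and verifier for the sum over x in {0,1}^n of f(x) * g(x), where f and g are multilinear and given by evaluation tables. This is a minimal sketch under stated assumptions, not Ceno's implementation: it uses a degree-2 product rather than the degree-3 benchmark, the toy prime field 2^31 - 1 instead of Goldilocks with an extension field, fixed challenges instead of Fiat-Shamir, and a direct MLE evaluation in place of PCS openings. All names are hypothetical.

```rust
const P: u64 = (1 << 31) - 1; // toy prime field (Mersenne 31), not Goldilocks

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Fix the top variable of a multilinear table to r: t'(x) = t(0,x) + r * (t(1,x) - t(0,x)).
fn fix_top_var(t: &[u64], r: u64) -> Vec<u64> {
    let half = t.len() / 2;
    (0..half).map(|i| add(t[i], mul(r, sub(t[i + half], t[i])))).collect()
}

/// Direct MLE evaluation (stands in for a PCS opening in this sketch).
fn mle_eval(t: &[u64], point: &[u64]) -> u64 {
    point.iter().fold(t.to_vec(), |acc, &r| fix_top_var(&acc, r))[0]
}

/// Prover: for each round, send the degree-2 round polynomial as [s(0), s(1), s(2)].
fn prove(f: &[u64], g: &[u64], challenges: &[u64]) -> Vec<[u64; 3]> {
    let (mut f, mut g) = (f.to_vec(), g.to_vec());
    let mut rounds = Vec::with_capacity(challenges.len());
    for &r in challenges {
        let half = f.len() / 2;
        let mut s = [0u64; 3];
        for i in 0..half {
            for (x, acc) in s.iter_mut().enumerate() {
                let x = x as u64;
                // Restrict f and g to the current variable set to X = x.
                let fx = add(f[i], mul(x, sub(f[i + half], f[i])));
                let gx = add(g[i], mul(x, sub(g[i + half], g[i])));
                *acc = add(*acc, mul(fx, gx));
            }
        }
        rounds.push(s);
        f = fix_top_var(&f, r);
        g = fix_top_var(&g, r);
    }
    rounds
}

/// Evaluate the degree-2 polynomial through (0, s0), (1, s1), (2, s2) at r (Lagrange form).
fn interp(s: &[u64; 3], r: u64) -> u64 {
    let inv2 = (P + 1) / 2; // 2^{-1} mod P
    let l0 = mul(mul(sub(r, 1), sub(r, 2)), inv2);
    let l1 = sub(0, mul(r, sub(r, 2)));
    let l2 = mul(mul(r, sub(r, 1)), inv2);
    add(add(mul(s[0], l0), mul(s[1], l1)), mul(s[2], l2))
}

/// Verifier: check s(0) + s(1) = claim each round, then the final evaluation.
fn verify(claim: u64, rounds: &[[u64; 3]], challenges: &[u64], f: &[u64], g: &[u64]) -> bool {
    if rounds.len() != challenges.len() {
        return false;
    }
    let mut claim = claim;
    for (s, &r) in rounds.iter().zip(challenges) {
        if add(s[0], s[1]) != claim {
            return false;
        }
        claim = interp(s, r);
    }
    claim == mul(mle_eval(f, challenges), mle_eval(g, challenges))
}
```

Each round the prover touches every remaining table entry, so total work is O(2^n) field operations; this per-round table pass is the part that benchmark numbers like those above measure.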

2024-10-01 (skip meeting, combining with next week)

PSE

(Done) improve CI testing

2024-09-24

PSE

(Done) improve unittest testing iteration PR 242
(Reviewing) register/memory table and optimize table sumcheck proof PR 251
(Reviewing) i_inst and SRLI opcode PR 98
(Reviewing) divu opcode https://github.com/scroll-tech/ceno/pull/266
TODO 1: add initial state_in/out to complete e2e rw soundness.
…

Scroll & External contributor

(Reviewing) BEQ/BNE PR 272
(Done) e2e opening on fixed polynomial PR 253
(Reviewing) blt e2e test PR 249

a bug where verification fails when the number of instances > 2^10
sumcheck zero PIOP from polygon-zero
- analyzing notes from Scroll

Misc

(WIP, low priority) refactor uint for better expression conversion PR 264
performance: 2^20 add instances: 3.6 s -> 2.6 s
1. sumcheck protocol improvement, e.g. techniques from the Polygon Zero PIOP paper
2. CPU optimization: AVX2/AVX-512 on Goldilocks, e.g. > 4x improvement
3. binary field arithmetic/PCS
4. implementation xxx

2024-09-17

PSE

Ceno

(Done) CI integration pipeline PR 209
(Done) mock prover PR 206
(Done) MUL opcode generalization (generalized MUL OP)
(Doing) SRLI PR
(Doing) memory/cpu consistent check Issue 126

enhance sumcheck to align and run on different num_variables
(Doing) mul opcode PR 98

Scroll & External contributor

MPCS: 2.1s (goal 1.5s)
(Done) program table & opcode lookup
Integrate MPCS to proving flow

Misc

discussion: possible enhancements/tools for opcode development

Soham: with the set_value!(xx) macro it is hard to find the root cause => Ming: the debugging message in VS Code seems to show the root cause.
Ming: we can't unit test the register write value if it is an expression type. Issue raised: https://github.com/scroll-tech/ceno/issues/220
proposal: categorize opcode development into R-type/I-type/B-type (branch)

match syntax on opcode type, e.g. a first-version R-type example
R-type is pending on https://github.com/scroll-tech/ceno/pull/231
sumcheck on GPU open source https://github.com/pseXperiments/cuda-sumcheck

first version: schoolbook sumcheck algorithm
MPI: a standardized interface for cluster computing
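For the proposal above to organize opcode development by instruction format, here is a minimal sketch of how R-type and I-type RV32 instructions share field positions (per the RISC-V base ISA encoding), which is what makes a per-format `match` natural. The `Decoded` enum and `decode` function are hypothetical, not Ceno's decoder, and B-type (branch) immediates are omitted.

```rust
/// Hypothetical decoded form for two RV32 instruction formats.
#[derive(Debug, PartialEq)]
enum Decoded {
    R { rd: u32, funct3: u32, rs1: u32, rs2: u32, funct7: u32 },
    I { rd: u32, funct3: u32, rs1: u32, imm: i32 },
    Other { opcode: u32 },
}

fn decode(insn: u32) -> Decoded {
    // Fields shared by R-type and I-type: opcode[6:0], rd[11:7], funct3[14:12], rs1[19:15].
    let opcode = insn & 0x7f;
    let rd = (insn >> 7) & 0x1f;
    let funct3 = (insn >> 12) & 0x7;
    let rs1 = (insn >> 15) & 0x1f;
    match opcode {
        // OP (ADD, SUB, MUL, ...): rs2[24:20], funct7[31:25].
        0b0110011 => Decoded::R { rd, funct3, rs1, rs2: (insn >> 20) & 0x1f, funct7: insn >> 25 },
        // OP-IMM and LOAD share the I-type layout: imm[11:0] = insn[31:20], sign-extended.
        0b0010011 | 0b0000011 => Decoded::I { rd, funct3, rs1, imm: (insn as i32) >> 20 },
        _ => Decoded::Other { opcode },
    }
}
```

For example, `add x3, x1, x2` encodes as 0x002081B3 and `addi x1, x0, -1` as 0xFFF00093.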

2024-09-10

PSE

Ceno

(Done) Mock Prover error print PR 182
(Done) add CI target as metrics PR 195
(Done) witness assignment interface PR 187
(Reviewing) Lt Util PR 183
(Reviewing) Lock-free thread-safe logup multiplicity witness counting PR 198
(Doing) generalized MUL OP
(TODO) opcode implementation (mul, addi, srli)
(TODO) MockProver cache table data and load once
- this may be more urgent: each run now loads > 5 tables, each of size 2^16, which slows down CI
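On the lock-free logup multiplicity counting (PR 198) above, here is a minimal sketch of the idea, assuming a range table t_j = j and only std atomics. Worker threads each take a chunk of the looked-up values and do relaxed `fetch_add`s on one shared array, so no locks are needed; the resulting multiplicities m_j then satisfy the logup identity sum_i 1/(alpha - f_i) = sum_j m_j/(alpha - t_j). The toy field 2^31 - 1, the function names, and the direct (non-GKR) evaluation of both sides are assumptions for illustration, not the PR's implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

const P: u64 = (1 << 31) - 1; // toy prime field (Mersenne 31)

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// Field inverse via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a % P, P - 2, 1);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Count lookups per table entry without locks: each thread increments shared atomics.
fn count_multiplicities(lookups: &[u64], table_size: usize, threads: usize) -> Vec<u64> {
    let counts: Vec<AtomicU64> = (0..table_size).map(|_| AtomicU64::new(0)).collect();
    let chunk = ((lookups.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for part in lookups.chunks(chunk) {
            let counts = &counts;
            s.spawn(move || {
                for &v in part {
                    counts[v as usize].fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counts.into_iter().map(AtomicU64::into_inner).collect()
}

/// LogUp identity for the range table t_j = j: sum_i 1/(alpha - f_i) == sum_j m_j/(alpha - j).
fn logup_holds(lookups: &[u64], mult: &[u64], alpha: u64) -> bool {
    let lhs = lookups.iter().fold(0, |acc, &f| add(acc, inv(sub(alpha, f))));
    let rhs = mult
        .iter()
        .enumerate()
        .fold(0, |acc, (j, &m)| add(acc, mul(m % P, inv(sub(alpha, j as u64)))));
    lhs == rhs
}
```

Relaxed ordering is enough here because only the final counts matter, and `thread::scope` joins every worker before the counts are read.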

Scroll & External contributor

(Reviewing) E2E opcodes and table prover https://github.com/scroll-tech/ceno/pull/188
(Reviewing) Emulator Runtime (I/O, Allocator)
(TODO) Byte code table PR 104
(TODO) opcode development

Miscs

lagging behind on M1 progress https://hackmd.io/@ceno-zkvm/ryDWX5_5R
- because we are still consolidating the overall proving system.
- improve reviewing speed & quality
- speed up opcode development
Ethereum-granted project for FV of zk(E)VMs https://verified-zkevm.org/
- Ali will reach out to them and seek collaboration
GPU sumcheck collaboration: Sowoon + Dohoon + Scroll => tg group
Benchmark result: 5x faster than SP1 on the Fibonacci task.
- MPCS: 60 poly commitments => 2 s
- create proof of Add opcode 2^20 => 1s
- Add opcode 16 poly commits
- Cost: MPCS proof 10X, 8Mb
  - opt1: optimise codebase/mechanism
  - opt2: with recursion/aggregation we can compress the proof to a smaller size

2024-09-03

Ceno

PSE

(Done) PR fix sumcheck degree & monomial form dedup issue => bug captured PR 169
[PR] overflow handling
- discussion thread
- due to the use of wrapping_add/sub/mul/div in revm
- in summary:
  - 1. disable the compiler's default overflow checks on all instructions
  - 2. support overflow as an external assignment
  - 3. rely on Rust's wrapping_XXX and the corresponding assembly to handle overflow
(Done) UInt refactor
(Done) Mock Prover
(Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
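For the overflow-handling discussion above: RISC-V arithmetic wraps modulo 2^32 and never traps, which matches Rust's explicit `wrapping_*` methods rather than the plain operators (plain `+` panics on overflow when Cargo's `overflow-checks` setting is on, the default for the dev profile but not for release). Below is a minimal sketch of the semantics a circuit has to reproduce; the function names are hypothetical, and `add_with_carry` shows the carry bit that an ADD circuit would typically take as an extra witness so that a + b = c + carry * 2^32 holds over the integers.

```rust
/// RV32 ADD/SUB/MUL: results wrap modulo 2^32 (MUL keeps the low 32 bits).
fn rv_add(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

fn rv_sub(a: u32, b: u32) -> u32 {
    a.wrapping_sub(b)
}

fn rv_mul(a: u32, b: u32) -> u32 {
    a.wrapping_mul(b)
}

/// RV32 MULHU: high 32 bits of the full 64-bit unsigned product.
fn rv_mulhu(a: u32, b: u32) -> u32 {
    ((a as u64 * b as u64) >> 32) as u32
}

/// Wrapped sum plus carry bit, so that a + b == c + carry * 2^32 over the integers.
fn add_with_carry(a: u32, b: u32) -> (u32, bool) {
    a.overflowing_add(b)
}
```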

TODOs & Doing

(Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
UInt support for M/C != 4 mul constraints
witness assignment trait design
bitwise operation opcode SLL

Scroll & External contributor

range table circuit https://github.com/scroll-tech/ceno/pull/154
Emulator migrating from sp1
PCS integration: Basefold + Plonky2-FRI optimisation
- porting poseidon from plonky2
control flow opcode implementation

2024-08-27

Ceno

PSE

PR: fix verifier failure when lk_expression.len() > r/w_expression.len()
- will also eliminate a potential soundness issue by trusting less from the prover's proof
Change default to RV32 https://github.com/scroll-tech/ceno/pull/166

the RV32 toolchain is more mature
align benchmarks with SP1 and other zkVMs
experiment with RV64 later
Mock Prover PR
- Ming has done the review; WIP on adding degree > 1 assert_zero_expression multiplication.
- follow up tasks: modify addsub opcode unittest to use MockProver
  1. replace random generated witness with real data to pass the unittest
  2. keep prove/verify flow in benchmark/example
UInt refactor PR
- review done from Ming
- suggestion: cherry-pick commits and exclude commits already in the master branch

Scroll & External Contributor

range table circuit https://github.com/scroll-tech/ceno/pull/154
Interpreter implementation
PCS integration: Basefold + Plonky2-FRI optimisation
2 explorations
1. FRI-Binius => benchmark results not good, pending new research / implementation polishing
2. Basefold + Plonky2
From Snarkify: control flow opcode implementation

Misc

one-on-ones scheduled around this Thu/Fri to share peer review/self-evaluation results :)

2024-08-20

Ceno

Performance: removing an unnecessary to_vec() clone improved latency from 600 ms -> 380 ms => 2.7 MHz zkVM prover.
Project Milestone dashboard (https://github.com/orgs/scroll-tech/projects/15)
- timeline on hackmd https://hackmd.io/@ceno-zkvm/SkMzxt_9A
Util lib development
- MockProver: lookup expression assertion checks are done, others are WIP
- UInt utility: under reviewing
Super Issue and TODOs review
- https://github.com/scroll-tech/ceno/issues/95
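The MHz figures in these notes count proved instances (opcodes) per second. Assuming the 380 ms latency above refers to the same 2^20-instance add benchmark as the neighbouring entries, the ~2.7 MHz figure follows from:

```latex
\text{throughput} = \frac{2^{20}\ \text{instances}}{0.38\ \text{s}}
  = \frac{1\,048\,576}{0.38}\ \text{s}^{-1}
  \approx 2.76 \times 10^{6}\ \text{Hz}
  \approx 2.76\ \text{MHz}
```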

2024-08-13

Ceno

Up-to-date result: 2^20 instances run in 600 ms (down from 1.04 s) on 16 physical cores / 64 GB, achieving a 1.5 MHz prover
Ongoing tasks
- framework
  - devirgo sumcheck commit PR
  - degree > 1 zero expression sumcheck & verifier PR
  - edge case handling and addressing review comments
- mock prover: https://github.com/scroll-tech/ceno/issues/105
- UInt expression: https://github.com/scroll-tech/ceno/issues/97

2024-08-06

Ceno

zkVM v2 implementation
- framework almost done
- benchmark
  - Up-to-date result: 2^20 instances run in 1.04 s on 16 physical cores / 64 GB, achieving a 1 MHz prover (should be 10x faster than SP1, > 12x faster than Jolt).
- raised a super issue to track sub-tasks https://github.com/scroll-tech/ceno/issues/95
- high priorities sub-tasks
  - implement multi-opcode support => blocking other opcode implementation
  - benchmark: devirgo sumcheck optimization
  - (Kimi) Refactor UInt gadget and use expression system https://github.com/scroll-tech/ceno/issues/103
  - add MockProver: improve opcode debugging ability
- Research
  - [Scroll] plan to benchmark binius PCS + GKR in the following 2 weeks.

2024-07-30

Ceno

(Ming Ongoing): new zkVM PR draft https://github.com/scroll-tech/ceno/pull/91
- design docs from Scroll https://hackmd.io/@P4deJs5uRSyvHnXF8yyQJQ/B1DpOQDOA
- multi-variate plonkish + memory offline check via GKR
- use halo2 expression to construct constraints
- TODOs further task breakdown
  - GKR logup arguments implementation
  - revamp UInt to work with the new constraint system.
  - one super-circuit with multiple opcodes + sumcheck batching
riscv add PR
- second round reviewing from Soham
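For the memory offline check in the design above, here is a minimal sketch of the underlying multiset argument, assuming each access reads the previous record (addr, value, timestamp) and writes a fresh one. Memory is consistent when Init ∪ Writes equals Reads ∪ Final as multisets, which is tested by comparing randomized fingerprints prod(gamma - (a + alpha*v + alpha^2*t)). In the Ceno design these products are computed via GKR; here they are computed directly over the toy field 2^31 - 1 with fixed alpha and gamma, and the timestamp-ordering check is omitted. All names are hypothetical.

```rust
const P: u64 = (1 << 31) - 1; // toy prime field (Mersenne 31)

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { a * b % P }

/// A memory record: (address, value, timestamp).
type Rec = (u64, u64, u64);

/// Multiset fingerprint: prod over records of (gamma - (a + alpha*v + alpha^2*t)).
fn fingerprint(set: &[Rec], alpha: u64, gamma: u64) -> u64 {
    set.iter().fold(1, |acc, &(a, v, t)| {
        let packed = add(add(a, mul(alpha, v)), mul(mul(alpha, alpha), t));
        mul(acc, sub(gamma, packed))
    })
}

/// Replay ops (addr, Some(new_value) for a write, None for a read) on `size` zeroed cells,
/// returning the (init, reads, writes, final) record sets.
fn run_trace(size: u64, ops: &[(u64, Option<u64>)]) -> (Vec<Rec>, Vec<Rec>, Vec<Rec>, Vec<Rec>) {
    let init: Vec<Rec> = (0..size).map(|a| (a, 0, 0)).collect();
    let mut mem = init.clone();
    let (mut reads, mut writes) = (Vec::new(), Vec::new());
    for (i, &(addr, write)) in ops.iter().enumerate() {
        let now = i as u64 + 1;
        let (a, v, t) = mem[addr as usize];
        reads.push((a, v, t)); // consume the previous record
        let new_v = write.unwrap_or(v); // a read writes back the same value
        writes.push((a, new_v, now));
        mem[addr as usize] = (a, new_v, now);
    }
    (init, reads, writes, mem)
}
```

For each address the records form a chain init -> write_1 -> ... -> final; every record except the last is read exactly once, which is why the two multisets coincide for an honest trace.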

Interpreter:

ongoing; pending tasks: running on mainnet blocks and collecting statistics.

2024-07-23

Ceno

(Ming Ongoing) Designing and implementing a GKR + Hyperplonk variant specifically for the zkVM use case.
- PoC shows 262 kHz for RISC-V add, which is promising for a fast zkVM prover.
- review RISC-V opcodes to see which cannot be expressed by the new design (or only at high cost).
- high-level implementation idea: a computation graph (DAG) with various operation nodes, where each node may involve a sumcheck or just an evaluation split/merge.
- the goal is to keep the existing Ceno frontend design and change only the underlying implementation, for high code reuse.
- layer -> vector, and no cellid.
- target to finish first version in the following weeks.
(Kimi Ongoing) PR riscv add opcode reviewing
- unit test error, pending debugging

Interpreter

opcode distribution on 2 workloads: revm push/pop, and evm poseidon https://docs.google.com/spreadsheets/d/16gjv2VbmmOK51PFEqDzHpr_6JN-X8iZVsW9jrsrYB7c/edit?gid=1665867100#gid=1665867100
- Code used to generate the data:
  - evm benchmarking https://github.com/zemse/sp1/tree/evm-benchmarking
  - https://github.com/zemse/sp1-revm-playground

2024-07-16

Ceno

Engineering
- (Done) PR: further refactoring to apply devirgo sumcheck; around 20x boost on the EVM add benchmark
- (Ongoing) PR optimize prover run time/memory
- (Ongoing) PR riscv add opcode
zkVM new design from Scroll
- PoC benchmark shows around 262 kHz (2^20 adds in 3.x s) for proof generation (without PCS)
  > Jolt: 90 kHz, i.e. around 3x faster than Jolt.
  > SP1: 1.7x faster than Jolt

Interpreter

(Ongoing) based on SP1 emulator
- repo link (TBD)
- framework still work in progress
- currently bug fixing

Misc

Ceno open source roadmap
- Aligned with Scroll: prioritize zkVM and build the framework based on the PoC to shift the project ASAP. Will focus on zkVM Keccak instead of general Keccak.
