
SUREEL zkVM weekly sync-up notes

2024-12-10

PSE

  • debug tool
    • (WIP) remove hardcoded position-0 lk multiplicity check PR 649
  • prover
    • (WIP) skip PCS commitment/opening for structural columns, issue 654
    • once the issue 654 work is done, we can add a new task to skip commitment of the range table
  • optimize opcode circuits with the 32-bit range check proposal in #702
    - refactor circuits to use 32-bit range checks
    - Spartan SPARK implemented in draft PR 713
    - currently on hold; main obstacle: need to figure out a way to do a sparse fractional sum
  • sumcheck optimisation
    - ideas: optimize the virtual polynomial and extract the identity polynomial
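The sparse fractional sum obstacle comes from LogUp-style lookups, where each table row j carries a multiplicity m_j. The sketch below is a minimal illustration, assuming a toy 8-bit range table t_j = j and the Goldilocks prime as the field (Ceno's real tables and extension field are not modelled): it checks the LogUp identity and shows that the table side only needs the rows with non-zero m_j. Proving such a sparse sum inside the protocol without a dense m(x) is the open problem noted above and is not addressed here.

```python
# Minimal LogUp sketch (assumption: toy 8-bit range table t_j = j, Goldilocks
# prime field, no extension field). It checks the identity
#   sum_i 1/(alpha - f_i) == sum_j m_j/(alpha - t_j)
# and shows the table side can be summed over non-zero m_j only.
import random
from collections import Counter

P = 2**64 - 2**32 + 1  # Goldilocks prime


def inv(x: int) -> int:
    """Field inverse via Fermat's little theorem (x must be non-zero mod P)."""
    return pow(x % P, P - 2, P)


def lookup_side(values, alpha):
    """sum_i 1/(alpha - f_i) over every looked-up value f_i."""
    return sum(inv(alpha - f) for f in values) % P


def dense_table_side(table_size, mult, alpha):
    """sum_j m_j/(alpha - t_j) over every row t_j = j, i.e. a dense m(x)."""
    return sum(mult.get(j, 0) * inv(alpha - j) for j in range(table_size)) % P


def sparse_table_side(mult, alpha):
    """The same sum restricted to rows with m_j != 0, i.e. a sparse m(x)."""
    return sum(m * inv(alpha - t) for t, m in mult.items() if m != 0) % P


if __name__ == "__main__":
    rng = random.Random(0)
    table_size = 1 << 8
    values = [rng.randrange(table_size) for _ in range(20)]
    mult = Counter(values)  # m_j = how many times row j is looked up
    alpha = rng.randrange(table_size, P)  # challenge outside the table range
    lhs = lookup_side(values, alpha)
    assert lhs == dense_table_side(table_size, mult, alpha)
    assert lhs == sparse_table_side(mult, alpha)
    print(f"{len(mult)} non-zero multiplicities out of {table_size} rows")
```

With at most 20 lookups, at most 20 of the 256 rows contribute, which is the saving a sparse m(x) would offer for large range tables.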

Miscs

2024-12-03

PSE

Ceno

  • circuit optimisation
    • (Reviewing) unify MUL opcode into field arithmetic PR 660
  • debug tool
    • (WIP) remove hardcoded position-0 lk multiplicity check PR 649
    • (Reviewing) merge slt/sltu PR 659
  • proving system
    • (WIP) enhancement: skip structural witin commitment & PCS PR 654
    • (WIP) MPCS opening optimisation: refactor to share the same sumcheck implementation & code cleanup PR 653
      • will extract into smaller PRs for review
      • found another issue: performance on main proof regressed after recent PR 671

scroll & external contributors

  • benchmark super issue
    • scroll-evm: 14.6M cycles, 170 kHz.
    • revm => we need to follow up on this
  • precompile DSL approach: frontend & backend design WIP document
    • Ming will work on opcode circuit approach (next priority after circuit optimisation)
  • parallel proving design via sharding
  • GPU proving

Miscs

  • Connecting with the core team on TG as a follow-up to "STARKs: aggregating hash-based signatures"
    • asking what the hash-based signature candidates are
  • Axiom connected with us, looking for potential collaboration.
  • Pre-con retro happened at 00:00 UTC+8, Dec 4

2024-11-26

takeaways, ideas & suggestions after pre-con

Ming

  • updated roadmap accordingly, focusing on EL & CL benchmarks
  • a few new ideas, e.g. zkVM-based precompiles, Lasso-style succinct structural table evaluation in more places.

Roadmap summary (from Ming):

we probably focus on

  • zkVM (prover) performance

    Justification: no zkVM vendor has yet reached the performance needed for Ethereum EL/CL usage. Ceno (SOTA) is at around 250 kHz; our target is > 10 MHz (zkVM-proved opcodes per second), i.e. a > 40x boost

  • L1 targeted benchmarks

    focus on EL/CL tasks and optimize around these two tasks

  • precompile/co-processor for zkVM

    as cryptographic primitive acceleration

Boosting zkVM (prover) performance

With proven ideas from other zkVMs, e.g.

  • Lasso structured tables: skip commitment and let the verifier succinctly evaluate structural tables

    • applied to addresses in Ceno's non-uniform memory scheme
    • apply to logup table T to support large-table range checks

      challenge: need to convert sparse m(x) into dense poly commitment.

  • Axiom/SP1: recursive proof in another VM with a small custom ISA (recursive-vm).
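To make the Lasso-style idea concrete: a range table T[i] = i needs no commitment, because its multilinear extension has a closed form the verifier can evaluate in O(n). A minimal sketch, assuming the Goldilocks field and little-endian variable ordering (variable j is bit j of the row index); Ceno's actual ordering and table layout may differ. The multiplicity polynomial m(x) is unaffected and still needs a commitment, which is the challenge noted above.

```python
# Sketch: a verifier evaluating a structural range table's MLE in O(n) instead
# of opening a commitment to it. Assumptions: Goldilocks prime field, table
# T[i] = i for i in [0, 2^n), and variable j corresponds to bit j of i.
import random

P = 2**64 - 2**32 + 1  # Goldilocks prime


def eq_weights(r):
    """eq(b, r) for all b in {0,1}^n, listed by index i = sum_j b_j * 2^j."""
    weights = [1]
    for rj in r:
        # Entries with bit j = 0 keep their index; bit j = 1 shifts it by 2^j.
        weights = [w * (1 - rj) % P for w in weights] + [w * rj % P for w in weights]
    return weights


def dense_range_table_eval(r):
    """O(2^n) work: sum_i T[i] * eq(i, r), what evaluating a dense table costs."""
    return sum(i * w for i, w in enumerate(eq_weights(r))) % P


def succinct_range_table_eval(r):
    """O(n) work: T is linear in the index bits, so its MLE is sum_j 2^j * r_j."""
    return sum(pow(2, j, P) * rj for j, rj in enumerate(r)) % P


if __name__ == "__main__":
    rng = random.Random(1)
    point = [rng.randrange(P) for _ in range(16)]  # 2^16-row range table
    assert dense_range_table_eval(point) == succinct_range_table_eval(point)
```

The closed form works because sum_j 2^j x_j is already multilinear and agrees with T on the Boolean hypercube, so it must be T's unique MLE.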

L1 benchmarks

  • EL: revm e2e
  • CL: hash based signature scheme on zkVM

    build a connection with Kev's team from CL, and learn the hash-based signature candidates

precompiles

  • build precompile framework
    => preliminary idea: build precompiles from zkVM opcodes; discussing the idea with Scroll.
  • take EL precompiles and CL hash based signature as first priority target

zkVM researches & exploration

  • sumcheck algorithm optimisation
  • hardware optimisation: AVX/CUDA
  • binary field domain know-how
  • audit & FV

Scroll & External contributor

  • Scroll roadmap on Ceno
  • preparing for the announcement, cooking more benchmarks
  • private I/O integration test
  • MPCS: benchmark WHIR as a replacement/enhancement of Basefold
  • recursive verifier SNARK

Miscs

Ceno task WIP

  • optimise circuit with less witness (Kimi)
  • mock-prover support padding check (Soham)
  • discussed pre-compile and proving system optimisation (Ming)

With Scroll Community Calls

  • Every Tuesday: Strategy meeting
  • Every Thursday: weekly progress meeting including all developers (CET 10am)
  • Scroll Slack: PSE members are invited as guests

2024-11-05/12/19 skipped due to pre-con/Devcon

2024-10-29

PSE

  • (completed) opcode development: slt/slti/srai
  • (wip) sltiu
  • (Reviewing) modular memory/public I/O design PR 457
  • experiment with the sumcheck protocol optimisation published by Polygon Zero

Scroll & External contributor

  • (completed) load/store word load/store
  • (Doing) ELF program load into memory & e2e test

2024-10-22

2024-10-15

Milestone 1

  • We might expand the milestone benchmark scope to cover more than Fibonacci
    • with Fibonacci alone, people might think it's a kind of cheat, or that we only outperform on this specific workload.
  • RV32IM has around 38 opcodes in total; with these we can cover more benchmarks, such as
    • rsa
    • regex
    • is-prime
    • ssz-withdrawal
    • tendermint light client

PSE

opcodes

  • (Reviewing) logical I-type opcodes
  • (Done) mock-prover error dedup
  • (Done) soundness: public input fix
  • (Ongoing) few more opcode tasks: SLLI, SRAI, SLTIU, SLTI, MULH

Scroll & External contributor

Protocol

2024-10-08

Milestone 1

  • opcodes: 15/24 done, 6 under review
  • benchmark:
    • on CPU, 2^20 instances e2e in 2.10 s vs SP1 11 s (SP1 as of 2024/07)

      we can improve further after resolving Issue 285 (single limb)
      many optimisations are planned as follow-ups to milestone 1

PSE

  • opcodes development:
  • unittest enhancement
    • (Done) assertion on register rd value PR 301
    • (Done) customized debug expression probing PR 306
    • (Doing) DIVU/SLI soundness fix

Scroll & External contributor

  • (Done) ecall-halt PR 258
    • also support public input

Misc

shared benchmark results with the sumcheck-gpu team as a GPU optimisation anchor

GPU
2^26-size MLE with degree 4 over the BN scalar field
- GPU sumcheck ~800 ms, kernel execution time only, excluding data transfer and hashing (random oracle)

CPU
2^26-size MLE with degree 3 over Goldilocks + degree-2 extension field
on AMD 5800, 8 cores, 2 threads per core:
- schoolbook sumcheck: 3.5301 s
- devirgo sumcheck: 901.56 ms

on AMD EPYC 9R14, 16 cores, 2 threads per core:
- schoolbook sumcheck: 3.1585 s
- **devirgo sumcheck: 340.44 ms**
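For readers unfamiliar with the baseline, "schoolbook" sumcheck here presumably means the straightforward round-by-round prover; the devirgo variant parallelises the same work and is not shown. A minimal sketch, assuming the summand is a product of d multilinear polynomials given as evaluation tables (d = 3 matches the degree-3 CPU benchmark), the Goldilocks base field without the degree-2 extension, verifier challenges drawn from a local RNG rather than a Fiat-Shamir transcript, and the final oracle check done by direct MLE evaluation instead of a PCS opening.

```python
# Schoolbook sumcheck sketch for  sum_{b in {0,1}^n} prod_k f_k(b),  where each
# f_k is multilinear and given by its 2^n evaluations. Assumptions: Goldilocks
# base field only, challenges from a local RNG (no Fiat-Shamir transcript), and
# the final oracle query answered by direct MLE evaluation (no PCS opening).
import random

P = 2**64 - 2**32 + 1  # Goldilocks prime


def interpolate_eval(ys, x):
    """Evaluate at x the polynomial of degree < len(ys) with s(i) = ys[i]."""
    total = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (x - j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def fold(table, r):
    """Bind the top variable to r: f'(b) = f(0, b) + r * (f(1, b) - f(0, b))."""
    half = len(table) // 2
    return [(table[i] + r * (table[i + half] - table[i])) % P for i in range(half)]


def mle_eval(table, point):
    """Evaluate the MLE of `table` at `point`, binding variables in fold order."""
    for r in point:
        table = fold(table, r)
    return table[0]


def prove_and_verify(tables, rng, claim=None):
    """Run prover and verifier together; return True iff the verifier accepts."""
    size = len(tables[0])  # must be a power of two
    degree = len(tables)  # a product of d multilinear polys has degree d per variable
    if claim is None:  # honest claimed sum
        claim = 0
        for b in range(size):
            prod = 1
            for f in tables:
                prod = prod * f[b] % P
            claim = (claim + prod) % P
    cur, challenges = [list(f) for f in tables], []
    for _ in range(size.bit_length() - 1):
        half = len(cur[0]) // 2
        # Prover: round polynomial s(t) evaluated at t = 0, 1, ..., degree.
        evals = []
        for t in range(degree + 1):
            s_t = 0
            for i in range(half):
                prod = 1
                for f in cur:
                    prod = prod * (f[i] + t * (f[i + half] - f[i])) % P
                s_t = (s_t + prod) % P
            evals.append(s_t)
        # Verifier: check s(0) + s(1) against the running claim, then sample r.
        if (evals[0] + evals[1]) % P != claim:
            return False
        r = rng.randrange(P)
        challenges.append(r)
        claim = interpolate_eval(evals, r)
        cur = [fold(f, r) for f in cur]  # prover binds the variable to r
    # Final oracle check: claim == prod_k f_k(r_1, ..., r_n).
    expected = 1
    for f in tables:
        expected = expected * mle_eval(f, challenges) % P
    return claim == expected


if __name__ == "__main__":
    rng = random.Random(7)
    tables = [[rng.randrange(P) for _ in range(1 << 4)] for _ in range(3)]
    print("accepts honest claim:", prove_and_verify(tables, random.Random(8)))
```

Each round costs O(d^2 · 2^(n-i)) field operations on the shrinking tables, which is the work a distributed prover such as devirgo splits across threads or machines.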

2024-10-01 (skip meeting, combining with next week)

PSE

  • (Done) improve CI testing

2024-09-24

PSE

  • (Done) improve unit-test iteration PR 242
  • (Reviewing) register/memory table and optimize table sumcheck proof PR 251
  • (Reviewing) i_inst and SRLI opcode PR 98
  • (Reviewing) divu opcode https://github.com/scroll-tech/ceno/pull/266
  • TODO 1: add initial state_in/out to complete e2e rw soundness.

Scroll & External contributor

Misc

  • (WIP, low priority) refactor uint for better expression conversion PR 264
  • performance: 2^20 add instances: 3.6 s -> 2.6 s
    1. sumcheck protocol improvements, e.g. techniques from the Polygon Zero PIOP paper
    2. CPU optimisation: AVX2/AVX-512 on Goldilocks, e.g. > 4x improvement
    3. binary field arithmetic/PCS
    4. implementation xxx
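The AVX2/AVX-512 gains on Goldilocks build on the field's special shape p = 2^64 - 2^32 + 1: a 128-bit product can be reduced with a few 64-bit adds/subs and one 32x32 multiply, using 2^64 ≡ 2^32 - 1 and 2^96 ≡ -1 (mod p). A scalar sketch of that reduction, similar in spirit to the one in Plonky2's Goldilocks field; SIMD vectorisation is not shown, and the > 4x figure above is the note's estimate, not something this code measures.

```python
# Scalar sketch of Goldilocks reduction (p = 2^64 - 2^32 + 1) for a 128-bit
# product, using 2^64 = 2^32 - 1 and 2^96 = -1 (mod p). Vectorisation not shown.
MASK64 = (1 << 64) - 1
EPSILON = (1 << 32) - 1  # 2^64 mod p
P = (1 << 64) - EPSILON  # 2^64 - 2^32 + 1


def reduce128(x: int) -> int:
    """Reduce 0 <= x < 2^128 to its canonical representative in [0, P)."""
    assert 0 <= x < 1 << 128
    x_lo = x & MASK64
    x_hi = x >> 64
    x_hi_lo = x_hi & EPSILON  # low 32 bits of x_hi, weight 2^64 = EPSILON
    x_hi_hi = x_hi >> 32      # high 32 bits of x_hi, weight 2^96 = -1
    t0 = x_lo - x_hi_hi
    if t0 < 0:  # u64 borrow: wrapping (+2^64) then subtracting EPSILON equals +P
        t0 += P
    t1 = x_hi_lo * EPSILON  # fits in 64 bits since both factors are < 2^32
    t2 = t0 + t1
    if t2 > MASK64:  # u64 carry: drop 2^64 and add back 2^64 mod p
        t2 = t2 - (1 << 64) + EPSILON
    return t2 - P if t2 >= P else t2


def mul(a: int, b: int) -> int:
    """Goldilocks multiplication: full 128-bit product, then reduce."""
    return reduce128(a * b)
```

Every step uses only 64-bit adds, subtracts, compares and one 32x32 multiply, which is what makes the reduction map well onto SIMD lanes.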

2024-09-17

PSE

Ceno

  • (Done) CI integration pipeline PR 209
  • (Done) mock prover PR 206
  • (Done) MUL opcode generalization (generalized MUL OP)
  • (Doing) SRLI PR
  • (Doing) memory/cpu consistent check Issue 126

    enhance sumcheck to align and run polynomials with different num_variables

  • (Doing) mul opcode PR 98

Scroll & External contributor

  • MPCS: 2.1s (goal 1.5s)
  • (Done) program table & opcode lookup
  • Integrate MPCS to proving flow

Misc

2024-09-10

PSE

Ceno

  • (Done) Mock Prover error print PR 182
  • (Done) add CI target as metrics PR 195
  • (Done) witness assignment interface PR 187
  • (Reviewing) Lt Util PR 183
  • (Reviewing) Lock-free thread-safe logup multiplicity witness counting PR 198
  • (Doing) generalized MUL OP
  • (TODO) opcode implementation (mul, addi, srli)
  • (TODO) MockProver: cache table data and load it once
    • this may be more urgent: each run currently loads > 5 tables of 2^16 rows each, which slows down CI
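Since each MockProver run rebuilds > 5 tables of 2^16 rows, memoising the table builders is one way to load each table once per process. A sketch in Python with `functools.lru_cache`; the table kinds, the AND table's packed row layout, and all function names are hypothetical. In Ceno's Rust code, a process-wide `std::sync::OnceLock` would be the analogous tool, though that is an assumption rather than what the codebase does.

```python
# Sketch of "load once" table caching for a mock prover (hypothetical names;
# not Ceno's API). Each (kind, bits) table is built on first use and then
# served from an in-process cache.
from functools import lru_cache

build_calls = []  # records each real build, to show the cache at work


def _range_table(bits):
    return tuple(range(1 << bits))


def _and_table(bits):
    # Hypothetical layout: row i packs operands a = low half, b = high half.
    half = bits // 2
    mask = (1 << half) - 1
    return tuple((i & mask) & (i >> half) for i in range(1 << bits))


_BUILDERS = {"range": _range_table, "and": _and_table}


@lru_cache(maxsize=None)
def load_table(kind: str, bits: int) -> tuple:
    """Return the table of `kind` with 2^bits rows, building it at most once."""
    build_calls.append((kind, bits))
    return _BUILDERS[kind](bits)


if __name__ == "__main__":
    for _ in range(3):  # e.g. three MockProver runs in one test process
        load_table("range", 16)
        load_table("and", 16)
    print(build_calls)  # each table is listed once
```

The cache only helps within one process; separate test binaries would each still build their tables once.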

Scroll & External contributor

Miscs

  • lagging behind on M1 progress https://hackmd.io/@ceno-zkvm/ryDWX5_5R
    • because we are still consolidating the overall proving system.
    • improve review speed & quality
    • speed up opcode development
  • Ethereum-granted project for FV of zk(E)VMs https://verified-zkevm.org/
    • Ali will reach out to them to seek collaboration
  • GPU sumcheck collaboration: Sowoon + Dohoon + Scroll => TG group
  • Benchmark result: 5x faster than SP1 on the Fibonacci task.
    • MPCS: 60 poly commitments => 2 s
    • create proof of Add opcode 2^20 => 1s
    • Add opcode 16 poly commits
    • Cost: MPCS proof 10X, 8Mb
      • opt1: optimise the codebase/mechanism
      • opt2: via recursion/aggregation, we can compress the proof to a smaller size

2024-09-03

Ceno

PSE

  • (Done) PR fix sumcheck degree & monomial form dedup issue => bug captured PR 169
  • [PR] overflow handling
    • discussion thread
    • due to usage of wrapping_add/sub/mul/div in revm
    • in summary:
        1. disable the compiler's default overflow checks on all instructions
        2. support overflow as an external assignment
        3. rely on Rust's wrapping_XXX and the respective assembly to handle overflow
  • (Done) UInt refactor
  • (Done) Mock Prover
  • (Doing) LT/GT Gadget https://github.com/scroll-tech/ceno/issues/167
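Rust's u32 wrapping_add/sub/mul produce the same results as RV32 ADD/SUB/MUL (arithmetic modulo 2^32), so the circuit has to accept overflowing results. One possible reading of supporting overflow as an external assignment, sketched below with hypothetical helper names (not Ceno's API): the carry is assigned as a witness outside the arithmetic, and the constraint only checks a + b = c + carry · 2^32 with a boolean carry and a 32-bit c. A real circuit would range-check c with a lookup rather than a comparison, and division (wrapping_div) is left out.

```python
# Sketch: RISC-V/Rust u32 wrapping semantics and a carry witness assigned
# outside the constraint (hypothetical helper names, not Ceno's API).
MASK32 = (1 << 32) - 1


def wrapping_add(a: int, b: int) -> int:
    """Same result as Rust's u32::wrapping_add and RISC-V ADD on RV32."""
    return (a + b) & MASK32


def wrapping_sub(a: int, b: int) -> int:
    """Same result as u32::wrapping_sub and RISC-V SUB."""
    return (a - b) & MASK32


def wrapping_mul(a: int, b: int) -> int:
    """Same result as u32::wrapping_mul and RISC-V MUL (low 32 bits)."""
    return (a * b) & MASK32


def assign_add(a: int, b: int):
    """Witness assignment: (c, carry) with a + b == c + carry * 2^32."""
    total = a + b
    return total & MASK32, total >> 32


def add_constraint_holds(a: int, b: int, c: int, carry: int) -> bool:
    """What a circuit would enforce: boolean carry, 32-bit c, and the sum relation."""
    return carry in (0, 1) and 0 <= c <= MASK32 and a + b == c + (carry << 32)
```

For example, `assign_add(0xFFFFFFFF, 1)` yields `(0, 1)`: the result wraps to zero and the overflow is carried by the witness instead of failing a check.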

TODOs & Doing

Scroll & External contributor

2024-08-27

Ceno

PSE

  • PR: fix verifier failure when lk_expression.len() > r/w_expression.len()
    • will also eliminate a potential soundness issue from trusting values in the prover's proof
  • Change default to RV32 https://github.com/scroll-tech/ceno/pull/166

    the RV32 toolchain is more mature
    align benchmarks with SP1 and other zkVMs
    experiment with RV64 later

  • Mock Prover PR
    • Ming has done the review; WIP on supporting assert_zero_expression multiplication of degree > 1.
    • follow up tasks: modify addsub opcode unittest to use MockProver
      1. replace random generated witness with real data to pass the unittest
      2. keep prove/verify flow in benchmark/example
  • UInt refactor PR
    • review done by Ming
    • suggestion: cherry-pick commits and exclude commits already in the master branch

Scroll & External Contributor

  • range table circuit https://github.com/scroll-tech/ceno/pull/154
  • Interpreter implementation
  • PCS integration: Basefold + Plonky2-FRI optimisation

    2 explorations

    1. FRI-Binius => benchmark result not good, pending new research / implementation polishing
    2. Basefold + Plonky2
  • From Snarkify: control flow opcode implementation

Misc

  • a one-on-one scheduled around this Thur/Fri for sharing peer review/self evaluation result :)

2024-08-20

Ceno

2024-08-13

Ceno

2024-08-06

Ceno

  • zkVM v2 implementation
    • framework almost done
    • benchmark
      • Up-to-date result: 2^20 instances run in 1.04 s on 16 physical cores, 64 GB, achieving a 1 MHz prover (should be ~10x faster than SP1 and > 12x faster than Jolt).
    • raised a super issue to track sub-tasks https://github.com/scroll-tech/ceno/issues/95
    • high priorities sub-tasks
      • implement multi-opcode support => blocking other opcode implementation
      • benchmark: devirgo sumcheck optimization
      • (Kimi) Refactor UInt gadget and use expression system https://github.com/scroll-tech/ceno/issues/103
      • add MockProver: improve opcode debugging ability
    • Research
      • [Scroll] plan to benchmark binius PCS + GKR in the following 2 weeks.

2024-07-30

Ceno

Interpreter:

  • ongoing; pending tasks: running on a mainnet block and collecting statistics.

2024-07-23

Ceno

  • (Ming, ongoing) Designing and implementing a GKR + HyperPlonk variant specifically for the zkVM use case.
    • PoC shows 262 kHz for RISC-V add, which is promising for a fast zkVM prover.
    • review RISC-V opcodes to see which ones cannot be expressed (or only at high cost) by the new design.
    • high-level implementation idea: a computation DAG with various operation nodes, where each node may involve a sumcheck or simply an evaluation split/merge.
    • The goal is to keep the existing Ceno frontend design and change only the underlying implementation, for high code reuse.
    • layers become vectors, with no cell IDs.
    • target: finish the first version in the following weeks.
  • (Kimi, ongoing) reviewing the RISC-V add opcode PR
    • unit test error, pending debug

Interpreter

2024-07-16

Ceno

  • Engineering
    • (Done) PR: more refactoring to apply devirgo sumcheck; around 20x boost on the EVM add benchmark
    • (Ongoing) PR optimize prover run time/memory
    • (Ongoing) PR riscv add opcode
  • zkVM new design from Scroll
    • PoC benchmark shows around 262 kHz (2^20 add in 3.x s) for proof generation (without PCS)
      vs. Jolt at 90 kHz, i.e. around 3x faster than Jolt
      (SP1 is 1.7x faster than Jolt)

Interpreter

  • (Ongoing) based on SP1 emulator
    • repo link (TBD)
    • framework still work in progress
    • currently bug fixing

Misc

  • Ceno open source roadmap
    • Aligned with Scroll: prioritize zkVM and build the framework on the PoC to ship the project ASAP. We will focus on zkVM Keccak instead of general Keccak.