
Folding Schemes: Why they matter and how they are not employed in the best way possible (Part 1)

In this multi-part series, we aim to make folding schemes less mysterious. In this first part, we focus on answering a simple question: what is the fundamental advantage of folding schemes? We also examine why some projects experience poor performance despite using folding. TL;DR: the culprit is sub-optimal choices in the components built around folding schemes (e.g., the circuits fed to them), rather than the folding schemes themselves. In future parts, we will provide a gentle introduction to NeutronNova's and Neo's folding schemes.

  • Among all folding schemes, NeutronNova has the best asymptotics and concrete efficiency. NeutronNova also shows how to efficiently handle the non-uniformity that arises when dealing with zkVMs. In particular, NeutronNova provides an approach to build folding schemes for complex statements, including VM executions.
  • After NeutronNova, we will dive into Neo, which combines the best of all existing schemes:
    • incurs tiny recursion overheads (like Nova)
    • leverages small fields (like STARKs)
    • incurs pay-per-bit commitment costs (like Binius)
    • leverages the sum-check protocol (like Spartan and Jolt)

Now, let's begin. We start with the most important question.

What is the fundamental advantage of folding schemes over traditional monolithic proving systems?

By traditional monolithic proof systems, we mean proof systems such as Spartan (which powers Jolt, the state-of-the-art zkVM) and STARKs (which underlie other zkVMs).

First, monolithic proving systems follow a relatively elaborate pipeline:

  1. Commit to the witness
  2. Run a Polynomial IOP (PIOP) – often introducing additional commitments like quotient polynomials
  3. Prove evaluations of the committed polynomials – in many cases more expensive than committing to the witness
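
To make this pipeline concrete, here is a minimal Rust sketch of its overall shape. Every name in it (`commit`, `run_piop`, `prove_evaluations`, and the types) is a hypothetical stand-in rather than a real library's API; the point is only where the work happens.

```rust
// Hypothetical skeleton of a monolithic prover. Illustrative stand-ins only.
struct Commitment;
struct EvalProof;
struct Proof { commitments: Vec<Commitment>, eval_proofs: Vec<EvalProof> }

fn commit(_poly: &[u64]) -> Commitment { Commitment }                         // step 1
fn run_piop(_w: &[u64]) -> Vec<Commitment> { vec![Commitment] }               // step 2: may add quotient commitments
fn prove_evaluations(_c: &[Commitment]) -> Vec<EvalProof> { vec![EvalProof] } // step 3: often the dominant cost

fn prove_monolithic(witness: &[u64]) -> Proof {
    let mut commitments = vec![commit(witness)];       // (1) commit to the (large) witness
    commitments.extend(run_piop(witness));             // (2) run the PIOP, adding more commitments
    let eval_proofs = prove_evaluations(&commitments); // (3) prove evaluations of everything committed
    Proof { commitments, eval_proofs }
}

fn main() { let _ = prove_monolithic(&[0u64; 8]); }
```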

Note that if we have a circuit with a few million constraints/rows in AIR, R1CS, or Plonkish (or their generalization CCS), the best proving system turns out to be Spartan. See this article for a deeper dive on Spartan. In the current article, we are concerned with very large computations (e.g., those that require many millions of CPU cycles to run).

Folding-based systems, especially modern ones like NeutronNova, transform this paradigm by:

  • Eliminating step (3) entirely: instead of proving an evaluation for every committed polynomial, folding schemes work with commitments to smaller witnesses and prove an evaluation only once, for a single small folded polynomial.
  • Drastically reducing the cost of step (2)
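
The folding-based counterpart looks roughly like this (again a hypothetical sketch: `Instance`, `fold`, and `prove_final` are illustrative stand-ins). Each step commits to a small witness and folds it into a running instance; the expensive step (3) runs exactly once, at the very end.

```rust
// Hypothetical skeleton of a folding-based prover. Illustrative stand-ins only.
#[derive(Clone)]
struct Instance;   // a committed claim (e.g., a relaxed/folded constraint-system instance)
struct Witness;    // a small per-step witness
struct EvalProof;  // a single evaluation proof, produced once at the end

fn commit_step(_w: &Witness) -> Instance { Instance }               // commit to a small witness
fn fold(_acc: &Instance, _step: &Instance) -> Instance { Instance } // cheap: a few group/field ops
fn prove_final(_acc: &Instance) -> EvalProof { EvalProof }          // one small proof at the very end

fn prove_folding(steps: &[Witness]) -> EvalProof {
    let mut acc = Instance; // running (accumulated) instance
    for w in steps {
        acc = fold(&acc, &commit_step(w)); // no per-step PIOP tail, no per-step evaluation proof
    }
    prove_final(&acc) // step (3) happens exactly once, on one small folded claim
}

fn main() { let _ = prove_folding(&[Witness, Witness, Witness]); }
```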

Second, monolithic proof systems cannot scale beyond a certain size, as they would run out of memory. To address this, they use recursion: produce proofs of smaller statements and recursively compose those proofs. This incurs additional prover work, since each recursive step must verify a full proof inside a circuit. Folding schemes are designed to make recursion efficient even without producing a proof in the first place (i.e., they simply commit to a witness and run a prefix of the PIOP).
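
For a rough sense of why folding-based recursion is cheaper, consider the per-step overhead: proof-based recursion embeds a full SNARK verifier in every step circuit, while folding-based recursion embeds only a folding verifier (a handful of group operations). The constants below are illustrative orders of magnitude, not measurements of any particular system.

```rust
// Back-of-the-envelope recursion overhead. The constants are illustrative
// orders of magnitude, not benchmarks of any real system.
const SNARK_VERIFIER_CONSTRAINTS: u64 = 1_000_000; // full in-circuit SNARK verifier (illustrative)
const FOLDING_VERIFIER_CONSTRAINTS: u64 = 10_000;  // Nova-style folding verifier (illustrative)

// Extra constraints the prover pays across `steps` recursive steps.
fn recursion_overhead(steps: u64, per_step_verifier: u64) -> u64 {
    steps * per_step_verifier
}

fn main() {
    let steps = 1_000;
    println!("proof-based recursion:   {} extra constraints", recursion_overhead(steps, SNARK_VERIFIER_CONSTRAINTS));
    println!("folding-based recursion: {} extra constraints", recursion_overhead(steps, FOLDING_VERIFIER_CONSTRAINTS));
}
```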

Let’s unpack what that means in practice.


🚀 Folding Schemes Slash Proof Generation Costs

In many traditional proof systems, opening polynomial evaluations (step 3) is surprisingly expensive, often more so than committing to the original witness.

Folding eliminates the need for repeated polynomial evaluation proofs. For example, NeutronNova:

  • Uses no quotient commitments
  • Implements single-round sum-checks (instead of logarithmically many rounds)

This means fewer rounds, fewer commitments, faster prover times, and lower recursion overheads.
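
To make "rounds" concrete, here is a minimal, self-contained sketch of the standard sum-check protocol over a toy 61-bit prime field: one round per variable of the multilinear polynomial, i.e., logarithmically many rounds in the size of the evaluation table. (The field, the variable ordering, and the hard-coded "challenges" are illustrative simplifications; a real deployment uses a cryptographic field and Fiat-Shamir randomness.) NeutronNova's folding scheme reduces each folding step to a single round of this shape.

```rust
// Toy sum-check over F_p with p = 2^61 - 1. Illustrative only: real systems
// use a cryptographic field and derive challenges via Fiat-Shamir.
const P: u128 = (1u128 << 61) - 1;

fn add(a: u128, b: u128) -> u128 { (a + b) % P }
fn sub(a: u128, b: u128) -> u128 { (a + P - b) % P }
fn mul(a: u128, b: u128) -> u128 { (a * b) % P }

/// Checks the claim `sum over x in {0,1}^n of f(x) == claim`, where `evals`
/// holds f's values on the hypercube (first variable = high bit of the index)
/// and `challenges` has one entry per variable: one round per variable.
fn sum_check(mut evals: Vec<u128>, mut claim: u128, challenges: &[u128]) -> bool {
    for &r in challenges {
        let half = evals.len() / 2;
        // Prover: the round polynomial g(X) is degree 1, so g(0), g(1) determine it.
        let g0 = evals[..half].iter().fold(0, |acc, &v| add(acc, v));
        let g1 = evals[half..].iter().fold(0, |acc, &v| add(acc, v));
        // Verifier: g(0) + g(1) must match the running claim.
        if add(g0, g1) != claim { return false; }
        // Verifier: sample r; the new claim is g(r) = g0 + r * (g1 - g0).
        claim = add(g0, mul(r, sub(g1, g0)));
        // Prover: fix the first variable to r, halving the evaluation table.
        let next: Vec<u128> = (0..half)
            .map(|i| add(evals[i], mul(r, sub(evals[half + i], evals[i]))))
            .collect();
        evals = next;
    }
    evals[0] == claim // final check: f(r_1, ..., r_n) equals the last claim
}

fn main() {
    let evals: Vec<u128> = vec![3, 1, 4, 1, 5, 9, 2, 6]; // f on {0,1}^3
    let claim = evals.iter().fold(0, |acc, &v| add(acc, v)); // honest sum = 31
    assert!(sum_check(evals, claim, &[7, 13, 42])); // 3 variables => 3 rounds
    println!("sum-check accepted after 3 rounds");
}
```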


🔄 Folding Enables Cheap and Scalable Recursion

Another massive benefit: folding enables efficient recursion.

Most proof systems can’t scale to large computations in a single shot. They rely on recursion, but naively combining proofs incurs massive overhead, especially when the base prover is expensive. Moreover, some proof systems like STARKs use so-called "brute-force" recursion, which is significantly more expensive. An alternative approach is the very recent streaming sum-check, but leveraging it requires adopting a host of other techniques and is concretely more expensive than a folding-based solution.

Folding-based systems like Nova shine here. Recursion is built in, cheap, and incremental. Proof composition is natural, and there’s no need to re-prove everything from scratch. That’s a game-changer for use cases like zkVMs.


🧠 Implementation Details Matter (A Lot)

Despite this promise, many projects that use folding exhibit poor performance compared to solutions built on monolithic proving systems. In most cases the fault lies in how they use folding, rather than in folding itself. A few critical mistakes include:

  • Using Merkle trees for memory checking
    → Requires committing to thousands of random witness values per memory op (even when using SNARK-friendly hash functions like Poseidon), which is totally impractical.
    → State-of-the-art approaches such as Nebula address this, but one must be very careful not to pay for arbitrary witness values. If the required optimizations described in the paper (which are not part of the core protocol) are not implemented, one will not reap the full benefits of Nebula. This is a common mistake that we see in the wild.
  • Using universal circuits that pay for every instruction even when it is not invoked.
    • There are several ways to handle non-uniformity; if you want to explore, the NeutronNova paper provides a detailed recipe. We will unpack this in a future post.
  • Using tiny step circuits. Folding works by repeatedly folding a step circuit, so the work done in each step circuit should be substantially larger than that of the recursive verifier circuit; otherwise, recursion overhead dominates.
  • Linear folding instead of tree folding. Many zkVMs based on monolithic provers employ multiple parallel provers and use brute-force recursion. The same strategy should be used when folding to get the best possible latency (see the sketch after this list).
  • Sub-optimal witness generation
  • Ignoring bit-width optimizations during multi-scalar multiplication (MSM), a core contributor to folding costs.
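
On the linear-versus-tree point above, here is a hypothetical sketch of both schedules over an abstract fold operation; `Instance` and `fold` are illustrative stand-ins, not any library's API. A linear schedule puts n-1 folds on the critical path, while a tree schedule needs only about log2(n) levels, and every fold within a level can run on a different prover.

```rust
// Linear vs. tree-shaped folding schedules. `Instance` and `fold` are
// hypothetical stand-ins for a real folding scheme's types and operations.
#[derive(Clone, Debug)]
struct Instance(u64); // toy payload; real instances carry commitments

fn fold(a: &Instance, b: &Instance) -> Instance {
    Instance(a.0 + b.0) // stand-in; a real fold combines two claims into one
}

/// Linear schedule: n - 1 folds, all on the critical path (inherently sequential).
fn fold_linear(steps: &[Instance]) -> Instance {
    steps[1..].iter().fold(steps[0].clone(), |acc, s| fold(&acc, s))
}

/// Tree schedule: fold pairwise, level by level. Folds within a level are
/// independent, so parallel provers cut latency to ~log2(n) levels.
fn fold_tree(mut layer: Vec<Instance>) -> Instance {
    while layer.len() > 1 {
        let next: Vec<Instance> = layer
            .chunks(2)
            .map(|p| if p.len() == 2 { fold(&p[0], &p[1]) } else { p[0].clone() })
            .collect(); // in a real system, this map runs across parallel provers
        layer = next;
    }
    layer.remove(0)
}

fn main() {
    let steps: Vec<Instance> = (1..=8).map(Instance).collect();
    println!("linear: {:?} (7 sequential folds)", fold_linear(&steps));
    println!("tree:   {:?} (3 levels of parallel folds)", fold_tree(steps));
}
```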

In fact, MSM performance can vary drastically based on implementation.


📊 Data Speaks: MSM Performance Benchmarks

Folding isn't just a theoretical win—it shines when backed by practical optimizations. Take multi-scalar multiplication (MSM), a core building block in ZK systems. Many libraries like halo2curves and arkworks don’t optimize for small values, but Nova’s commitment routines do.
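
To see where bit-width awareness enters, here is a self-contained sketch of the bucket (Pippenger-style) method over a toy additive group standing in for an elliptic-curve group. The point is that the number of windows the algorithm processes scales with the scalar bit-width, so b-bit scalars cost roughly b/c bucket passes instead of a full field-width pass, and 1-bit scalars degenerate to a subset sum. This illustrates the general technique only; it is not Nova's actual routine.

```rust
// Toy bucket-method (Pippenger-style) MSM over the additive group Z_p, used
// here as a stand-in for an elliptic-curve group. Illustrative only.
const P: u128 = (1u128 << 61) - 1;

fn g_add(a: u128, b: u128) -> u128 { (a + b) % P } // "point addition"
fn g_double(a: u128) -> u128 { g_add(a, a) }       // "point doubling"

/// MSM specialized to scalars of at most `bit_width` bits, with window size `c`:
/// only ceil(bit_width / c) windows are processed, so the work shrinks with the
/// scalar bit-width (1-bit scalars reduce to a subset sum of the points).
fn msm_small(points: &[u128], scalars: &[u64], bit_width: u32, c: u32) -> u128 {
    let windows = (bit_width + c - 1) / c;
    let mut acc = 0u128;
    for w in (0..windows).rev() {
        for _ in 0..c {
            acc = g_double(acc); // shift the higher windows up by c bits
        }
        // Bucket j accumulates the points whose current window value is j + 1.
        let mut buckets = vec![0u128; (1 << c) - 1];
        for (&pt, &s) in points.iter().zip(scalars) {
            let window_val = ((s >> (w * c)) & ((1 << c) - 1)) as usize;
            if window_val != 0 {
                buckets[window_val - 1] = g_add(buckets[window_val - 1], pt);
            }
        }
        // Running-sum trick: computes sum of (j + 1) * buckets[j] with additions only.
        let (mut running, mut window_sum) = (0u128, 0u128);
        for &b in buckets.iter().rev() {
            running = g_add(running, b);
            window_sum = g_add(window_sum, running);
        }
        acc = g_add(acc, window_sum);
    }
    acc
}

fn main() {
    let points = vec![11u128, 22, 33];
    let scalars = vec![1u64, 0, 3]; // 2-bit scalars: one window suffices
    assert_eq!(msm_small(&points, &scalars, 2, 2), 110); // 1*11 + 0*22 + 3*33
    println!("msm = {}", msm_small(&points, &scalars, 2, 2));
}
```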

Let’s compare mean times (in milliseconds) for committing to 2^20 scalars across different bit-widths. The benchmark code is available here.

| Input Bit-width | halo2curves Commit (ms) | Nova Specialized Commit (ms) |
|----------------:|------------------------:|-----------------------------:|
| 1 | 226.72 | 8.66 |
| 10 | 249.75 | 15.73 |
| 16 | 370.24 | 34.76 |
| 32 | 417.82 | 52.37 |
| 64 | 282.60 | 98.53 |

This comparison makes two things clear:

  1. Bit-width-aware MSMs matter. The Nova routines massively outperform unoptimized ones when committing to small values.
  2. Poor MSM implementations can hide folding's potential. Libraries not optimized for real-world workloads leave performance on the table. In particular, existing curve libraries are optimized for committing to random/arbitrary values; as the table shows, committing small values with halo2curves is sometimes more expensive than committing larger ones (compare the 16- and 32-bit rows with the 64-bit row).

📌 Key Takeaways

  • Folding reduces the prover's cost by eliminating entire stages of the proving pipeline, most notably repeated evaluation proofs.
  • Recursive proof composition becomes practical and efficient.
  • Implementation details—especially memory checking and MSM—make or break performance.
  • Popular elliptic curve libraries often miss optimization opportunities, leaving performance on the table.

👉 In Part 2, we’ll explore the NeutronNova folding scheme and how to handle non-uniformity.