By Luke Pearson and the Cysic team
TL;DR:
The significance of zero-knowledge proofs (ZKP) has grown exponentially in recent years, emerging as one of the most crucial innovations in computer science over the past half-century. This can be attributed to the fact that ZKPs have the potential to dramatically enhance the scalability of blockchain platforms such as Ethereum. A key aspect of ZKPs is their ability to significantly increase the transactions per second (tps) on various blockchain platforms, relying solely on mathematical principles rather than trust. By enabling validators to consolidate multiple transactions into a single, concise proof, ZKPs ensure both accuracy and integrity throughout the process. ZKPs offer many other features that make them essential components of various scaling and privacy solutions, including ZK aggregations like StarkNet, private ZK aggregations like Aztec, and Layer 1 chains like Mina, Filecoin, Manta and Aleo. Nonetheless, ZKP is not without its limitations, as the process of generating proofs can be extremely resource-intensive in terms of both time and energy. The creation of proofs is often slowed down by the need for numerous complex mathematical operations, such as exponentiations, inversions, and bilinear pairing computations. Consequently, it remains a challenge to optimize ZKP solutions in order to fully harness their potential. To overcome these issues for all proposed ZKP constructions, it is of utmost importance to develop hardware acceleration methods. Namely, they can be accelerated 10-1000 times through the use of specialized hardware such as Field Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs).
In this article, we will provide an overview of the computational challenges associated with ZKPs, followed by a discussion of potential improvements that could help address these issues and enhance the efficiency of these cryptographic techniques
zkSNARK (Zero-Knowledge Succinct Non-Interactive ARgument of Knowledge) scheme is a type of ZKP that allows a prover to convince a verifier that the prover knows a witness without revealing any information about that witness. The scheme consists of four algorithms: Setup, KeyGen, Prove and Verify. The setup algorithm produces some structured reference string, which will be used by the KeyGen algorithm to generate a proving key and a verification key for some specified circuit. The prover, with the proving key and the statement/witness pair, can generate a ZK proof with respect to the relation of the statement/witness pair through the specified circuit. The verifier can check the validness of ZK proofs using the verification key and the statement.
A zkSNARK scheme needs to satisfy the following features:
The other variant is called zkSTARK (Zero-Knowledge Scalable and Transparent ARgument of Knowledge). It can function with or without zero knowledge, and can be either interactive or non-interactive protocols. They can also substitute zkSNARK as an interactive protocol. Unlike zkSNARK, interactive proofs do not need a trusted setup phase, making STARK systems a better option. STARK systems have overcome this disadvantage by utilizing the Interactive Oracle Proofs (IOP) domain, which relies on Fast Reed-Solomon codes to improve scalability, leading to the development of zkSTARK proof systems. Furthermore, zkSTARK based systems traditionally work with logarithmic complexity for both the prover and verifier, making them efficient and preventing one party from performing a Denial of Service (DoS) attack since each side's computations take a similar amount of time. Another notable property of zkSTARK is the conjectured post-quantum security, while zkSNARK is not due to the Shor's quantum algorithm. The zkSTARK system offers post-quantum security, provided that the hash function employed within the framework is itself resistant to quantum computing attacks.
In the realm of blockchain technology, the abbreviations SNARK and STARK are often used in place of zkSNARK and zkSTARK, as either when privacy is not implemented or required for differing blockchain use cases. Consequently, throughout this article, we will also adopt the simplified term "zk" in our discussions and explanations. There are two important use cases of ZKPs:
In this context of blockchain, the above use cases of ZKP are presented as:
Consequently, as cryptocurrency adoption continues to expand, the demand for enhanced performance and privacy is also expected to rise, fueling the need for more sophisticated applications of ZKPs. As developers create new applications and protocols, ZKPs are poised to play a crucial role in facilitating secure and efficient decentralized systems. These systems will be capable of handling large-scale transaction processing while preserving user privacy, meeting the evolving requirements of the digital currency landscape.
To prove a computation using ZKPs, it is necessary to convert the computation from a classical program to a ZK-friendly format. There are two ways to do this: either by using a library written in some existent language, like Arkworks (in Rust), or by using a Domain Specific Language (DSL) such as Cairo or Circom that compiles down to the required primitives for generating the proof. The more complicated the operation, the longer it takes to generate the proof. In addition, some operations are not inherently ZK-friendly and require additional work to be made so. For instance, operations like bitwise functions utilized in SHA or Keccak may not be friendly with ZKPs, leading to extended proof generation times. This can occur even for operations that are relatively inexpensive when executed on a classical computer. After converting your computation into a ZK-friendly format, you select some inputs and submit them to a proof system. There are numerous proof systems, such as Groth16, PLONK, HyperPlonk, UltraPlonk, Sonic, Spartan, and STARK. All these systems share the same basic function of accepting a ZK-friendly computation with inputs and generating a proof as output. Depending on the proof system, the proof generation process may differ, but the bottleneck ends up being similar.
There are mainly two computational tasks that are often used in the context of ZKPs:
In proof systems where both NTTs and MSMs exist, about 60% of the time generating a proof is spent on MSMs, and the rest is dominated by NTTs. Both MSMs and NTTs have performance challenges that can be addressed in the following ways:
We believe that hardware acceleration is a vital component in enabling blockchain protocols to attain the necessary scalability for practical applications. Currently, blockchain protocols are constrained by their on-chain compute and storage capacities, as well as their networking bandwidth. By optimizing off-chain hardware and proof generation, we are confident that we can substantially boost the performance of blockchain networks, making them more effective and efficient.
ZPrize is a collective initiative within the blockchain industry, comprising over 32 partners and sponsors who contribute their time, resources, and efforts to ensure its success. The program provides a range of challenges and programs to inspire participants to develop inventive solutions that promote sustainability and reduce carbon emissions. They took pride in achieving both objectives, which marks a significant advancement in the field of zero-knowledge cryptography. The outcomes of the ZPrize surpassed their expectations, as evidenced by the chart provided below. For the majority of the prizes for which they received submissions, the average enhancement from the baseline was more than five times. Notably, some prizes involving FPGAs lacked a public benchmark, making these submissions the first public demonstration of the algorithm implementations in open-source form. Here are a few noteworthy achievements:
In the coming months, they plan to feature the work of all the teams who participated in this initiative. As the field of zero-knowledge cryptography evolves, novel methods and obstacles will arise. They aspire to regularly organize the ZPrize competition to facilitate the progress of this technology and make it available to the public through a collection of open-source materials.
The bottleneck in proof generation typically stems from the multiplication of large number vectors, including fields or group elements, multiscalar multiplications of variable or fixed cardinality, and NTT and inverse NTT. In proof systems utilizing both NTT and MSM, MSM contributes to approximately 60% of the proof generation time, while NTT dominates the remaining time. Although MSM is parallelizable, it remains slow, necessitating significant memory resources and often consuming most of the available memory on the device. On the other hand, NTT relies heavily on frequent data shuffling, which complicates acceleration by distributing the load across computing clusters. Additionally, NTT demands substantial bandwidth to run on hardware, as loading and unloading data over the network can notably decelerate operations, despite the hardware operations themselves being quite fast. However, there are methods to enhance the performance of both MSM and NTT in order to optimize proof generation processes.
MSM is commonly implemented using Pippenger's algorithm to minimize the number of group addition operations. This method has a straightforward hardware implementation, which includes a group addition unit and a bucket table as its basic components. From a system design perspective, MSM is highly accelerator-friendly for the following reasons:
In summary, accelerating MSM using specialized hardware is highly advantageous. However, the main issue is the lack of such hardware in the market. Cycis has designed an MSM accelerator using Xilinx FPGAs, which demonstrates the best system-wise performance to date (around 40ms per
The NTT is composed of layers of butterfly operators. While the textbook layer-by-layer NTT implementation can quickly become bottlenecked by memory bandwidth utilization (due to strided access), this problem can be resolved by executing a batch of layers simultaneously and properly reordering NTT data during computation. As a rule of thumb, memory access can be reorganized into block access with a granularity of approximately (size of on-chip memory capacity) / (
Hardware acceleration can bolster the performance of blockchain protocols by deploying optimized algorithms on diverse hardware technologies, such as GPUs, FPGAs, or ASICs. Enhancing software and algorithmic implementations to more effectively utilize existing resources like CPUs and GPUs, as well as developing custom hardware in conjunction with novel algorithms tailored for specific hardware configurations, can facilitate hardware acceleration. By doing so, the performance of blockchain networks, which are presently constrained by their on-chain computing and storage capacities and network bandwidth, can be significantly improved.
To determine the best hardware technology to implement the acceleration techniques discussed above, we need to consider that ZKPs are still in their early stages of development. As such, there is limited standardization on system parameters and choice of proof system, such as FFT width or bit-size of elements. Additionally, there are two more factors we need to consider:
Based on the above two metrics, does this mean that FPGAs cannot outperform GPU? The Answer is NO. Although a single FPGA chip cannot compete with a single GPU, a massively interconnected FPGA system can beat a massively interconnected GPU system. A top-end GPU system can have at most 10 GPU cards in total due to the constraints of PCIe slots. However, dozens of FPGA chips can be connected together with customized, programmable, high-bandwidth interconnect, therefore reaching a performance level that can beat the GPU system. Cysic's massively-connected FPGA system is a direct testimony of this insight.
ASIC is unanimously believed to be the ideal hardware for ZKP. However, there are three majors concerns over building ASIC for ZKP.
The first issue can be addressed using a framework in hardware, called Instruction-Set Architecture (ISA). An ISA refers to the abstract framework and design of a hardware system, such as CPU and hardware accelerator. It represents the set of basic instructions and operations that a processor can execute. These instructions include arithmetic operations, data movement, logical operations, control flow, and other low-level functions. Using ISA, the hardware accelerating ZKP can be divided into three layers:
This ISA structure provides a solution to support multiple ZK systems and the potential updates. An AZK (ASIC for ZK) can be programmed using the ZK-ISA to support the versatility of ZK systems.
For the other concerns about doing ASIC, it is determined by the market. Currently the valuation of ZK-related projects adds up to more than $10b, we believe it is worthwhile to spend around $10m to build an ASIC to improve the efficiency of ZK proof generation. This time-to-market issue is inherent in all ASIC designs. Fortunately, the supply chain issue for ASIC has become much better now than before. The only solution to address this issue is to start early and warmup the supply chain thoroughly. By the time ASICs come out, it can demolish any other forms of ZK hardware.
ZKPs possess the potential to facilitate scalable and private payment systems, as well as smart contract platforms, substantially enhancing the usability and security of blockchain technology. It is true that ZKPs introduce overheads that have historically impeded their adoption. Factors such as significant computation and verification costs, along with the complexity of implementing ZKP-based systems, can create barriers to entry for numerous developers. Nonetheless, ongoing research and development in the field of ZKPs are addressing these challenges, simplifying the implementation and application of these techniques in practice.
Considering the advantages of GPUs in terms of flexibility, ease of deployment, and expected performance, we believe that companies concentrating on GPU solutions are more likely to thrive in the coming months before ASIC technology emerges. If ASICs attain dominant scale and stability, they may become the preferred hardware for optimized algorithms. Consequently, ZKPs are expected to maintain a crucial role in enabling increasingly advanced and secure blockchain systems in the future.