# STARK Prover for a Fluent Integration with the AggLayer: Boosting Efficiency with Plonky3 and rWASM on Triton VM
---
# Introduction
Fluent currently supports Succinct Non-interactive Argument of Knowledge (SNARK)-based proving for its rWasm-based virtual machine, which is known for small proof sizes but high proving time. In contrast, Scalable Transparent Arguments of Knowledge (STARKs) offer larger proof sizes with lower proving time. STARKs are efficient argument systems that require no trusted setup and are capable of succinct proof generation and composition. They are also adaptable to a wide range of applications and purposes, highlighting their flexibility and utility across different use cases. Initially, the term also covered argument systems that used a trustless setup with elliptic curve commitment schemes. However, it now primarily refers to univariate systems that utilize the FRI low-degree test for Reed-Solomon codes. STARKs have gained popularity for their low embedding overhead during arithmetization (since they do not require cryptographically large fields) and for their straightforwardness compared to complex aggregation and folding schemes. In these systems, witness data is transformed into polynomials over a univariate domain, necessitating an arithmetization field compatible with Fast Fourier Transform (FFT) techniques, including both classical and additive FFTs.
In this piece, we explore the potential integration of a STARK-based prover with Fluent, focusing on enhancing the Triton VM implementation. Our goal is to improve its performance by integrating with Plonky3 and incorporating Fluent rWASM opcodes, hence offering the flexibility to utilize both STARK and SNARK proof systems. This hybrid approach could significantly improve the current landscape by providing a more efficient and scalable framework for verifying computational integrity without sacrificing security.
We will begin by reviewing the latest advancements in STARK-based technologies, specifically Plonky2, its successor Plonky3, and Circle STARK. Our investigation will cover the integration of the Triton VM platform with Plonky3, including the Mersenne31 field and its extension field for supporting 64-bit words, as well as enabling the Tip5 hash function already supported by Triton VM. Furthermore, we will focus on the implementation of rWASM opcodes and explore Polygon’s proof aggregation system, the [AggLayer](https://docs.polygon.technology/learn/agglayer/). These enhancements are aimed at improving the capabilities of Triton VM and creating an efficient environment for developing a STARK-based Wasm solution for Fluent. Through this exploration, we intend to open new pathways for the flexible use of STARK and SNARK proof systems, potentially enhancing the effectiveness of cryptographic proofs in the layer 2 space.
## Goldilocks, Baby Bear, and Mersenne Prime Fields
In the scaling landscape of blockchain technology, SNARKs and STARKs have become crucial as the backbone of zkEVMs (Zero-Knowledge Ethereum Virtual Machines). In particular, zkEVMs have emerged in the Ethereum ecosystem as layer 2 solutions for both privacy and scalability, enabling efficient and secure transactions in execution environments on top of Ethereum layer 1. SNARKs and STARKs allow for the compression of transaction data and the execution of computations off-chain (layer 2), while still ensuring that all transactions comply with the network's rules. They enable improved transaction speeds and reduced costs for layer 2 users, hence addressing some of the most critical challenges in blockchain scalability. The selection of the underlying prime fields, characterized by unique properties found in [Goldilocks](https://eprint.iacr.org/2015/625.pdf), [Baby Bear](https://eprint.iacr.org/2023/824.pdf), and [Mersenne Prime](https://en.wikipedia.org/wiki/Mersenne_prime#:~:text=Mersenne%20primes%20take%20their%20name,%2C%2067%2C%20127%2C%20257.) fields, plays a critical role in enhancing the performance and security of these advancements.
In the 2015 work by Mike Hamburg, a unique category of primes known as [Goldilocks](https://eprint.iacr.org/2015/625.pdf) primes was introduced, described by the formula $p = φ^2 - φ - 1$. Within these prime fields, $φ$ behaves like the golden ratio, satisfying the equation $φ^2 = φ + 1$. This alignment is not merely of theoretical interest; it has practical implications, particularly when $φ$ is a power of two. However, the formula's subtraction of 1 introduces a limitation: it diminishes the availability of power-of-two roots of unity. This reduction is suboptimal because a richer set of these roots would substantially improve our capacity to conduct large-scale number theoretic transforms, which are crucial for advanced mathematical computations. Despite this, the Goldilocks primes are also associated with an extension field of approximately 128 bits, indicating their utility in high-performance computing environments where both security and efficiency are paramount.
A Mersenne prime is defined as a prime number that is exactly one less than a power of two, represented as $M_n = 2^n - 1$, where $n$ is an integer. It is important to note that if $n$ is not prime itself, then $2^n - 1$ will also not be prime. Thus, Mersenne primes can be described as prime numbers of the form $M_p = 2^p - 1$, with $p$ being a prime number. Among these, $M_{31}$ (or Mersenne31) denotes the prime field generated by $2^{31} - 1$. This specific field is particularly valued for its computational efficiency and simplicity, attributes that are critically important in cryptographic applications. Further enhancing its utility, Mersenne31 features extension fields of approximately 128 bits that are optimized for vector architectures like AVX2 and NEON, making it highly suited for sophisticated computational tasks. The prime field $\mathbb{F} = \mathbb{F}_p$ with $p = 2^{31} - 2^{27} + 1$ is known as Baby Bear; it is similarly configured for approximately 128-bit extension fields, optimized for NEON.
Plonky2 utilizes the 64-bit Goldilocks prime field, characterized by a modulus of $p = 2^{64} - 2^{32} + 1$. This modulus size is well suited to efficient execution on conventional computer architectures. Furthermore, its multiplicative group possesses a degree of smoothness that facilitates the effective use of FFT for Reed-Solomon encoding, accommodating two-adic subgroups of up to size $2^{32}$. Plonky3 advances this trend by additionally employing the Mersenne31 field. All of these systems utilize FRI for their commitment scheme, prompting the question:
*Can small fields be beneficial beyond this context, for instance, in sum-check-based Polynomial Commitment Schemes (PCS)?*
From an efficiency standpoint concerning field implementation and arithmetic circuits, smaller fields are often more desirable. Currently, RISC Zero stands out as the one project we are aware of venturing in this direction, employing the 31-bit Baby Bear prime as the primary modulus for witnesses. Similar to the Goldilocks field, Baby Bear's multiplicative group is adequately smooth for a majority of applications, with $p - 1 = 2^{27} \cdot 3 \cdot 5$. In particular, the Mersenne31 field ($p = 2^{31} - 1$) facilitates highly efficient arithmetic on 32-bit architectures. In this context, the identity $2^{32} \equiv 2 \pmod p$ allows an expanded product, expressed as $2^{32} \cdot x_{hi} + x_{lo}$, to be easily reduced to the significantly smaller value $2 \cdot x_{hi} + x_{lo}$. However, the prime's factorization, $p - 1 = 2 \cdot 3^2 \cdot 7 \cdot 11 \cdot 31 \cdot 151 \cdot 331$, indicates a lack of large two-adic subgroups, which are instrumental for conducting efficient Cooley-Tukey FFTs.
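To make this reduction concrete, the following is a minimal Rust sketch of Mersenne31 multiplication that reduces the 64-bit product via $2^{32} \equiv 2 \pmod p$ and then folds with $2^{31} \equiv 1 \pmod p$; it is an illustration of these identities, not Plonky3's or RISC Zero's actual implementation.
```rust
/// Minimal sketch of Mersenne31 multiplication (p = 2^31 - 1), illustrating the
/// reduction identity 2^32 ≡ 2 (mod p) described above. Not taken from Plonky3.
const P: u64 = (1 << 31) - 1;

/// Reduce a 64-bit product x = 2^32 · x_hi + x_lo to 2 · x_hi + x_lo (mod p),
/// then finish with a fold using 2^31 ≡ 1 (mod p) and one conditional subtraction.
fn reduce64(x: u64) -> u32 {
    let x_hi = x >> 32;
    let x_lo = x & 0xffff_ffff;
    let mut t = 2 * x_hi + x_lo; // at most ~33 bits, thanks to 2^32 ≡ 2 (mod p)
    t = (t >> 31) + (t & P);     // fold: 2^31 ≡ 1 (mod p)
    if t >= P {
        t -= P;
    }
    t as u32
}

fn mul_m31(a: u32, b: u32) -> u32 {
    reduce64(a as u64 * b as u64)
}

fn main() {
    let minus_one = (P - 1) as u32;               // p − 1 represents −1 in the field
    assert_eq!(mul_m31(minus_one, minus_one), 1); // (−1) · (−1) = 1
    println!("Mersenne31 reduction ok");
}
```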
This discussion explores the trade-offs and advantages of utilizing “small” finite fields in zero-knowledge (ZK) proof frameworks, their practical applications, and potential future explorations. For instance, the RISC Zero codebase highlights the benefits of the Baby Bear field for its compatibility with 32-bit CPUs and its optimization for FFT, crucial in numerous ZK proof systems. The Goldilocks field's small size ensures every element fits within 64 bits, significantly boosting CPU performance, with reported improvements of up to 40x.
## The Complex Extension and Real FFTs
The Elliptic Curve FFT (ECFFT) employs smooth elliptic curve subgroups for FFT operations. Despite its reliance on elliptic curve group structures, the ECFFT is an algebraic transform akin to the additive FFT, built through a series of curves connected by 2-to-1 algebraic degree 2 mappings, reducing domain sizes at each step. This allows for effective interpolation of low-degree rational functions on the curves. The foundational elements of STARKs, such as interactive oracle proofs and low-degree tests, are adapted to these new rational function spaces and their algebraic geometry codes. ECFFT-enabled STARKs, or elliptic curve STARKs, match the efficiency of traditional STARKs both concretely and theoretically. By moving away from the finite field smoothness requirement, they open up new possibilities in choosing specific primes for arithmetization.
While the primary focus is on Mersenne fields, the findings are applicable to any prime field $\mathbb F=\mathbb F_p$ with a modulus $p$ satisfying $p \equiv 3 \pmod 4$. This condition implies that $(p-1)/2$ is odd, or equivalently, that $-1$ is not a quadratic residue within the field. Under these conditions, the polynomial $X^2+1$ remains irreducible over $\mathbb F$, allowing us to construct the complex extension field $\mathbb C(\mathbb F)=\mathbb F[X]/(X^2 + 1)$. This extension incorporates the formal root $i=\sqrt{-1}$, so that $\mathbb C(\mathbb F)=\{x+i\cdot y : x,y\in \mathbb F\}$, with field operations defined by the algebraic rule $i^2=-1$. The unit circle over $\mathbb F$, a smooth algebraic variety known as the circle curve, is identified as the algebraic set $S_1 = \{(x,y)\in \mathbb F^2: x^2 + y^2 = 1\}$, or in complex terms $S_1 = \{z \in \mathbb C(\mathbb F)^*: z\cdot \bar{z} = 1\}$, where $\bar{z}= x - i\cdot y$ denotes the conjugate of $z = x + i\cdot y$. Since conjugation acts as a field automorphism, $S_1$ is closed under complex multiplication, establishing it as a subgroup of the multiplicative group $\mathbb C(\mathbb F)^*$, also known as the circle group.
**Theorem (Circle Group).** Let $\mathbb F=\mathbb F_p$ be a prime field with $p \equiv 3 \pmod 4$. Then the circle group $S_1$ over $\mathbb F$ equals the group of $(p+1)$-th roots of unity in $\mathbb C(\mathbb F)^*$, has order $p+1$, and carries the group operation $(x_0,y_0)\cdot(x_1,y_1)=(x_0 x_1 - y_0 y_1,\ x_0 y_1 + y_0 x_1)$.
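The group law above is easy to check directly. Below is a minimal sketch over Mersenne31 ($p = 2^{31} - 1$) using naive modular helpers; the types and helpers are illustrative assumptions, not taken from any particular library.
```rust
/// A minimal sketch of the circle group law over Mersenne31 (p = 2^31 - 1),
/// following the theorem above: (x0, y0)·(x1, y1) = (x0·x1 − y0·y1, x0·y1 + y0·x1).
const P: u64 = (1 << 31) - 1;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

#[derive(Clone, Copy, Debug, PartialEq)]
struct CirclePoint { x: u64, y: u64 }

impl CirclePoint {
    /// Group law on the unit circle x^2 + y^2 = 1.
    fn combine(self, other: CirclePoint) -> CirclePoint {
        CirclePoint {
            x: sub(mul(self.x, other.x), mul(self.y, other.y)),
            y: add(mul(self.x, other.y), mul(self.y, other.x)),
        }
    }
}

fn main() {
    let identity = CirclePoint { x: 1, y: 0 };
    // (0, 1) plays the role of i and has order 4 in the circle group.
    let i = CirclePoint { x: 0, y: 1 };
    let i2 = i.combine(i);
    assert_eq!(i2, CirclePoint { x: P - 1, y: 0 }); // i·i = (−1, 0)
    assert_eq!(i2.combine(i2), identity);           // (−1, 0)·(−1, 0) = identity
    println!("circle group law ok");
}
```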
## Tip5 Hash Function
The Tip5 hash function is a newly specified arithmetization-oriented hash function that follows the SHARK design strategy, incorporating low-degree power maps and lookup tables. It is designed specifically for the field with $p=2^{64}-2^{32}+1$ elements, aiming to address the needs of recursive verification of STARKs. Tip5 employs a sponge construction, a technique used in cryptographic hashing to process input of any size and produce output of a desired length.
Tip5 operates with a state consisting of $m = 16$ field elements in a finite field $\mathbb F_p$, with $p$ a prime. The core of the sponge construction is a permutation function $f: \mathbb{F}_p^m \to \mathbb F_p^m$, a bijective mapping: $f$ takes the entire state as input and shuffles it in a deterministic way. In each absorbing iteration, $r=10$ field elements are read from the input message and overwrite the first $r$ elements of the state. After updating the state with input data, the permutation function $f$ is applied to the entire state. Once all input data has been absorbed, the squeezing phase begins to generate the output hash: in each squeezing iteration, the first $r=10$ elements of the state are read and appended to the output.
```mermaid
graph LR
r1 -->|r| B((f))
A -->|c| B((f))
r2 -->|r| C((f))
B -->|c| C((f))
r3 -->|r| D((f))
C -->|c| D((f))
r4 -->|r| E((f))
D -->|c| E((f))
r5 -->|r| F((f))
E -->|c| F((f))
F --> G[c]
F --> H[r]
E -->|r| X[r]
```
***Figure 1**: A sponge construction demonstrating 3 absorbing and 2 squeezing iterations, highlighting the overwriting of the state's rate part during absorption, in contrast to traditional addition-based absorption.*
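As a concrete companion to Figure 1, here is a minimal sketch of an overwrite-mode sponge with Tip5's parameters ($m=16$, $r=10$, $d=5$). The permutation is a toy placeholder and padding is omitted, so this is not the Tip5 permutation itself, only the absorb/squeeze flow described above.
```rust
/// A minimal sketch of the overwrite-mode sponge described above, with state
/// width m = 16, rate r = 10, and digest length d = 5. The permutation below is
/// a toy placeholder, NOT Tip5's actual round function; input padding is omitted.
const P: u64 = 0xFFFF_FFFF_0000_0001; // the Goldilocks prime 2^64 − 2^32 + 1
const M: usize = 16; // state width
const R: usize = 10; // rate
const D: usize = 5;  // digest length

/// Placeholder permutation standing in for Tip5's f: F_p^m → F_p^m.
fn permute(state: &mut [u64; M]) {
    for (i, s) in state.iter_mut().enumerate() {
        // Hypothetical toy mixing, for illustration only.
        let mixed = (*s as u128) * 6364136223846793005 + i as u128;
        *s = (mixed % P as u128) as u64;
    }
}

/// Absorb the input r elements at a time by overwriting the rate part of the
/// state, permuting after every block; then squeeze and truncate to d elements.
fn sponge_hash(input: &[u64]) -> [u64; D] {
    let mut state = [0u64; M];
    for chunk in input.chunks(R) {
        for (i, &x) in chunk.iter().enumerate() {
            state[i] = x % P; // overwrite, rather than add to, the rate part
        }
        permute(&mut state);
    }
    // Since d = 5 ≤ r = 10, a single squeeze suffices; Tip5 truncates to d.
    let mut digest = [0u64; D];
    digest.copy_from_slice(&state[..D]);
    digest
}

fn main() {
    let digest = sponge_hash(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
    println!("digest: {:?}", digest);
}
```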
The permutation function $f$ is applied to the state between each squeezing iteration. While the sponge construction theoretically allows for an arbitrarily long output, Tip5 truncates the output to $d=5$ field elements, defining the fixed output size of the hash function. More concretely, each round of the permutation is designed to incorporate both non-linear and linear transformations, as well as the injection of randomness, to ensure a high level of security and resistance against cryptographic attacks. Each round consists of the following three layers (sketched in code after the list):
* **S-box Layer**: Introduces non-linearity to the permutation process, a crucial aspect for cryptographic security. The first $s = 4$ elements of the state are transformed using an S-box function $S:\mathbb F_p→\mathbb F_p$, which is a permutation on $\mathbb F_p$. The remaining elements of the state are transformed using a different S-box function $T:\mathbb F_p→\mathbb F_p$, also a permutation on $\mathbb F_p$. Note that this layer ensures that the transformation is highly resistant to linear and differential cryptanalysis by disrupting potential algebraic structures within the state.
* **Linear Layer**: Achieves diffusion by spreading the influence of each state element across the entire state. The entire state vector is multiplied by an $m×m$ MDS (Maximum Distance Separable) matrix. The MDS property ensures optimal diffusion by guaranteeing that a change in any single input element will affect all output elements. Note that this layer is essential for ensuring that the effects of the S-box transformations are uniformly distributed across the state, making the cipher more secure against various forms of cryptanalysis.
* **Round Constants**: Injects additional randomness into the permutation to prevent attack vectors that exploit structural patterns. For every round and each state element, a unique round constant is added. These constants are sampled independently and uniformly at random. The addition of round constants ensures that even identical inputs processed through different rounds will yield different results, thereby enhancing the permutation's resistance to symmetric attacks and ensuring that each round contributes a unique transformation.
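The sketch below illustrates the three layers of one round under the structure just described: a split S-box layer, a linear layer, and round-constant injection. All concrete choices (the power-map S-boxes, the matrix, the constants) are placeholders rather than Tip5's actual parameters.
```rust
/// A minimal sketch of the three layers of a Tip5-style round described above.
/// All concrete values (S-boxes, matrix, constants) are placeholders.
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime, as u128 for easy math
const M: usize = 16;      // state width
const S_WIDTH: usize = 4; // elements handled by the first S-box S

fn pow_mod(base: u64, mut exp: u64) -> u64 {
    let mut acc: u128 = 1;
    let mut b = base as u128 % P;
    while exp > 0 {
        if exp & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        exp >>= 1;
    }
    acc as u64
}

/// S-box layer: the first s = 4 elements use one permutation, the rest another.
fn sbox_layer(state: &mut [u64; M]) {
    for x in state.iter_mut().take(S_WIDTH) {
        *x = pow_mod(*x, 3); // placeholder for Tip5's split-and-lookup S-box S
    }
    for x in state.iter_mut().skip(S_WIDTH) {
        *x = pow_mod(*x, 7); // placeholder for the power-map S-box T
    }
}

/// Linear layer: multiply the state vector by an m×m matrix (MDS in the real design).
fn linear_layer(state: &mut [u64; M], matrix: &[[u64; M]; M]) {
    let mut out = [0u64; M];
    for (i, row) in matrix.iter().enumerate() {
        let mut acc: u128 = 0;
        for (j, &coeff) in row.iter().enumerate() {
            acc = (acc + coeff as u128 * state[j] as u128 % P) % P;
        }
        out[i] = acc as u64;
    }
    *state = out;
}

/// Round constants: add a distinct constant to every state element.
fn add_round_constants(state: &mut [u64; M], constants: &[u64; M]) {
    for (x, c) in state.iter_mut().zip(constants) {
        *x = ((*x as u128 + *c as u128) % P) as u64;
    }
}

/// One round: S-box layer, then linear layer, then round-constant injection.
fn round(state: &mut [u64; M], matrix: &[[u64; M]; M], constants: &[u64; M]) {
    sbox_layer(state);
    linear_layer(state, matrix);
    add_round_constants(state, constants);
}

fn main() {
    // Toy parameters: identity matrix and constant round constants.
    let mut matrix = [[0u64; M]; M];
    for i in 0..M {
        matrix[i][i] = 1;
    }
    let constants = [42u64; M];
    let mut state = [1u64; M];
    round(&mut state, &matrix, &constants);
    println!("state after one round: {:?}", state);
}
```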
This iterative process is designed to secure the integrity of the permutation against a wide range of cryptographic attacks, making it an essential component of secure cryptographic systems. The choice of field and design strategy makes TIP5 particularly suitable for cryptographic contexts requiring efficient arithmetization, such as leveraging STARKs for enhanced security and scalability.
## Monolith Hash Function
Monolith is a state-of-the-art hash function designed explicitly for zero-knowledge proofs, aiming to optimize both processing efficiency and integration within zero-knowledge proof systems. These proofs enable one to affirm the possession of information without revealing the actual content, leveraging hash functions that convert variable-sized data into a fixed-length string. Monolith distinguishes itself by performing as fast as, or in certain scenarios faster than, conventional hash functions like SHA3-256 and Poseidon. It maintains consistent processing time regardless of the input, providing resistance to side-channel attacks, and operates over prime fields with parameters chosen for enhanced security.
Key features of Monolith include a lookup-based approach that accelerates computations for efficient software implementation while remaining circuit-friendly. This results in faster performance than other circuit-friendly hash functions and, in some cases, even surpasses the speed of SHA3-256. Although lookups speed up software implementations, Monolith's design does not depend on lookup tables, so constant-time implementations that fortify it against side-channel attacks are possible without significant performance detriments. Moreover, it maintains superior speed compared to its peers across different configurations and state sizes, notably in Monolith-64's comparison to Poseidon and similar functions. It stands out as a flexible, secure, and fast option in the realm of cryptographic hash functions, bridging the gap between traditional and arithmetization-friendly hash functions.
# Plonky3: Efficiency Gains over Plonky2
Plonky3 has emerged as a pioneering advancement in cryptographic systems, refining Plonky2's innovations to elevate computational efficiency, security, and flexibility. In other words, Plonky3 outperforms its predecessor by optimizing computational processes, particularly through support for smaller fields alongside Goldilocks and through improved algorithmic structures, leading to significant efficiency gains. Plonky2 relied on the larger 64-bit Goldilocks field, which, while effective for its time, presented limitations in computational efficiency and flexibility that Plonky3 addresses with its refined approach. The Plonky3 implementation already supports cryptographic fields including Baby Bear, Goldilocks, and Mersenne31.
Central to Plonky3 are the Goldilocks field, the extension to permutations like Anemoi (a family of ZK-friendly, arithmetization-oriented hash functions), and the use of hashes like Blake3, balancing speed and security efficiently. The inclusion of the Mersenne31 field in Polygon's Plonky3 codebase, well suited to 32-bit data types and thus to GPU/CPU performance, aligns with this ongoing drive for efficiency.
# Circle STARK
Conventional STARKs depend on a cyclic group with a smooth order for efficient point interpolation via the FFT algorithm and for expressing constraints concerning adjacent rows. ECFFT presented a novel method for optimizing STARKs within any finite field by utilizing a cyclic group on an elliptic curve.
Circle STARK proposes a more straightforward approach, using a simple construction on the circle curve $x^2 + y^2 = 1$. This method proves to be as efficient as traditional STARKs and ECFFT, particularly when $p+1$ is divisible by a substantial power of $2$. In the context of the Mersenne prime $p=2^{31}-1$, initial Circle STARK benchmarks show a 40% speed increase over traditional STARKs that use the Baby Bear prime $p=2^{31}-2^{27}+1$, indicating a significant step forward in the efficiency of cryptographic proofs. By comparison, ECFFT already enables scalable and transparent proofs over all large fields.
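A quick sanity check of the smoothness claims above: for Mersenne31, $p-1$ contains only a single factor of two, while $p+1 = 2^{31}$ is a full power of two, which is exactly the structure the circle FFT exploits.
```rust
/// Verify the two-adicity of p − 1 and p + 1 for the Mersenne31 prime p = 2^31 − 1.
fn two_adicity(mut n: u64) -> u32 {
    let mut k = 0;
    while n % 2 == 0 {
        n /= 2;
        k += 1;
    }
    k
}

fn main() {
    let p: u64 = (1 << 31) - 1;
    assert_eq!(two_adicity(p - 1), 1);  // p − 1 = 2 · 3^2 · 7 · 11 · 31 · 151 · 331
    assert_eq!(two_adicity(p + 1), 31); // p + 1 = 2^31, the circle group order
    println!("two-adicity of p-1: 1, of p+1: 31");
}
```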
Circle STARKs show similarity to their univariate equivalents, with the primary distinction being their foundation in bivariate polynomials structured around the circle curve, specifically $\mathbb F_p [x,y]/(x^2+y^2-1)$, as opposed to the linear scope of univariate polynomials. These STARKs are well suited to arithmetic circuits over prime fields $\mathbb F_p$ in which $(p-1)$ exhibits less than ideal smoothness, whereas $(p+1)$ is compatible with the Circle FFT (CFFT) at orders of significant magnitude. In this setup, witness information is transcribed into bivariate polynomials through the circle FFT, and the fulfillment of constraints transitions to algebraic equations applied to these polynomials. The interactive oracle proof maintains a core similarity to the univariate scenario, albeit with minor adjustments owing to the dimensional discrepancy between the circle FFT's output and the comprehensive polynomial space. For clarity, this discussion is narrowed down to Algebraic Intermediate Representations (AIR) that focus on constraints among adjacent rows; extensions to accommodate features like varying row offsets, non-repetitive constraints, and permutation and lookup arguments follow established protocols.
| FFT size | Threads | M31 CFFT (ms) | BabyBear FFT (ms) | Ratio |
|----------|---------|----------------|-------------------|-------|
| $2^{14}$ | 1 | 11.0 | 17.6 | 1.6 |
| $2^{16}$ | 1 | 86.8 | 121 | 1.4 |
| $2^{18}$ | 1 | 387 | 542 | 1.4 |
| $2^{20}$ | 1 | 1700 | 2384 | 1.4 |
| $2^{22}$ | 1 | 7400 | 10345 | 1.4 |
| $2^{14}$ | 8 | 5.57 | 7.04 | 1.3 |
| $2^{16}$ | 8 | 67.0 | 67.7 | 1.0 |
| $2^{18}$ | 8 | 321 | 319 | 1.0 |
| $2^{20}$ | 8 | 1410 | 1400 | 1.0 |
| $2^{22}$ | 8 | 6170 | 6110 | 1.0 |
***Table 1**: A performance evaluation of the FFT algorithm conducted by the authors, comparing Baby Bear and Mersenne31 for a batch FFT of 128 polynomials. The assessment was carried out over 100 iterations on an Intel i7-1165G7 processor with the frequency locked at 1.5 GHz, both in single-threaded mode utilizing AVX2 for vectorization and with 8 threads. In the multi-threaded setup, a memory bottleneck is encountered for domain sizes of $2^{16}$ or larger; the authors plan to address and optimize this limitation in future updates.*
# Triton VM
The Triton VM is designed as a stack-based machine equipped with RAM, following the Harvard architecture which separates the storage and signal pathways for instructions (read-only memory) and data. It operates over a base field $\mathbb{F}_p$, where $p$ is the prime $2^{64} - 2^{32} + 1$, known as the Oxfoi prime. This architecture allows for the handling of data and instruction operations within the field $\mathbb{F}_p$, ensuring that both registers and memory elements operate within this finite field. As a result, the VM's transition functions generate low-degree polynomials for verification, derived from the ring of multivariate polynomials over $\mathbb{F}_p$. The instruction set is flexible, supporting variable-width instructions that can manipulate stack elements, perform arithmetic and logical operations, and interact with memory. This flexibility is crucial in implementing complex algorithms efficiently on the VM. For instance, the “add” instruction, which combines the top two elements on the stack and leaves the result on the top, is an example of a single-word instruction. Conversely, a “push + arg” instruction, which places an argument onto the stack, demonstrates a double-word instruction.
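To make the single-word versus double-word distinction concrete, here is a minimal sketch of a Triton-style stack step over the Oxfoi prime; the instruction names and encoding are illustrative, not Triton VM's actual ISA.
```rust
/// A minimal sketch of variable-width instructions in a Triton-style stack VM,
/// contrasting a single-word "add" with a double-word "push + arg" instruction.
#[derive(Debug)]
enum Instr {
    Add,       // single word: pops two elements, pushes their field sum
    Push(u64), // double word: opcode plus one argument word
}

const P: u64 = 0xFFFF_FFFF_0000_0001; // the Oxfoi prime 2^64 − 2^32 + 1

fn step(stack: &mut Vec<u64>, instr: &Instr) {
    match instr {
        Instr::Add => {
            let a = stack.pop().expect("stack underflow");
            let b = stack.pop().expect("stack underflow");
            // Field addition modulo the Oxfoi prime (u128 avoids overflow).
            stack.push(((a as u128 + b as u128) % P as u128) as u64);
        }
        Instr::Push(arg) => stack.push(*arg % P),
    }
}

fn main() {
    let program = [Instr::Push(3), Instr::Push(5), Instr::Add];
    let mut stack = Vec::new();
    for instr in &program {
        step(&mut stack, instr);
    }
    assert_eq!(stack, vec![8]);
    println!("stack after execution: {:?}", stack);
}
```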
Triton VM introduces a modular approach to memory, distinguishing between program memory, operational stack memory, RAM, and jump stack memory. Each serves a distinct purpose, from storing the currently executed program to facilitating the VM's operational stack and random-access memory functionalities. This separation not only enhances the VM's efficiency but also its capability to handle complex computational tasks. The VM's design includes separate interfaces for public and private data inputs, ensuring data integrity and security. The design intricately differentiates the arithmetization processes between the public and private interfaces, tailoring the computational framework to enhance security and efficiency. This distinction is crucial in cryptographic contexts, where the handling of sensitive information must be accurately managed. The output interface, designed for public data, reinforces the VM's commitment to secure and transparent computations.
The core of Triton VM's innovation lies in its arithmetization strategy, enabling the efficient verification of computations. The VM defines a robust framework for creating algebraic execution tables and intermediate representation constraints, facilitating the generation of proofs for complex computations. This structure is crucial for integrating with Plonky3, as well as supporting Wasm because it allows for the generation of proofs that are both succinct and verifiable, leveraging the VM's capabilities to support advanced cryptographic protocols. This integration positions the VM as a state-of-the-art platform for executing cryptographic proofs and complex computations, validating its role in advancing cryptographic verification systems and supporting scalable, secure applications.
## Instruction Set Architecture (ISA) and Algebraic Execution Tables (AETs)
In the Triton VM architecture, the execution trace is distributed across several tables, each interconnected through various cryptographic proofs. This setup facilitates modularization, allowing distinct components to handle specific tasks independently. For instance, the Processor Table captures the activity of the main processor, which can offload complex operations, such as hashing or XOR, to specialized co-processors. The execution details of these operations are then documented in dedicated traces. For example, hashing tasks are managed by the hash co-processor and documented in the Hash Table, while bitwise operations are handled by the u32 co-processor, with its activities noted in the U32 Table. This modular approach extends to managing memory consistency, which is primarily the responsibility of the Operational Stack Table, RAM Table, and Jump Stack Table, illustrating a clear division of tasks within the system.

***Figure 2**: The 9 Algebraic Execution Tables (AETs) of Triton VM and their interconnections. In this figure, blue arrows represent Permutation Arguments, red arrows represent Evaluation Arguments, dotted arrows represent Lookup Arguments, and green arrows represent Contiguity Arguments.*
Elements such as “program digest,” “input,” and “output” are not considered AETs but are publicly accessible pieces of information. These components collectively form the basis of the claim that Triton VM substantiates through its proof generation. For further details on how these elements integrate into the broader proof system via Evaluation Arguments, refer to the sections titled “[Arguments Using Public Information](https://triton-vm.org/spec/arithmetization.html)” and “Program Attestation” of the Triton VM specification.
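To illustrate how a permutation argument ties two tables together (the blue arrows in Figure 2), the sketch below uses the standard running-product idea: each table compresses its rows with random weights and accumulates a product of (challenge − fingerprint) terms, and the two products agree exactly when the rows are permutations of each other. This is a simplified toy over the Goldilocks prime, not Triton VM's actual arithmetization.
```rust
/// A minimal sketch of linking two tables by a permutation argument via running
/// products. Field handling and fingerprinting are deliberately simplified.
const P: u128 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime

/// Random linear combination of a row's columns.
fn fingerprint(row: &[u128], weights: &[u128]) -> u128 {
    row.iter().zip(weights).fold(0u128, |acc, (x, w)| (acc + x * w) % P)
}

/// Accumulate the grand product of (challenge − fingerprint) over all rows.
fn running_product(rows: &[Vec<u128>], weights: &[u128], challenge: u128) -> u128 {
    rows.iter().fold(1u128, |acc, row| {
        acc * ((challenge + P - fingerprint(row, weights)) % P) % P
    })
}

fn main() {
    // Two toy tables containing the same rows in different order.
    let table_a = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    let table_b = vec![vec![5, 6], vec![1, 2], vec![3, 4]];
    let weights: [u128; 2] = [7, 11]; // verifier-supplied random weights (toy values)
    let challenge: u128 = 1_234_567;  // verifier-supplied random challenge (toy value)
    let prod_a = running_product(&table_a, &weights, challenge);
    let prod_b = running_product(&table_b, &weights, challenge);
    assert_eq!(prod_a, prod_b); // equal products ⇒ the rows are a permutation
    println!("permutation argument products match: {}", prod_a);
}
```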
## Processor, Operational Stack, and RAM Tables
The Triton VM employs a set of specialized tables and memories to manage its operations and state during program execution:
* The Processor Table tracks the execution states of the VM in chronological order, from the initial to the terminal state.
* The Operational Stack Table oversees the operational stack, which uses 16 registers for immediate access. Operands, function arguments, and pointers are stored here, while additional stack elements are kept in the "operational stack underflow memory"; any interaction with the underflow memory is recorded in the Op Stack Table to ensure its consistency.
* The RAM Table ensures consistent access to RAM, mandating explicit mutations through specific instructions and thus fostering deterministic execution. Unlike the Operational Stack Table, it does not use a stack pointer, and undetermined values arise from reading uninitialized addresses, providing a gateway for non-deterministic input. However, once read, these addresses will consistently return the same value.
* The Cascade Table assists in the arithmetization process for lookups in the split-and-lookup S-box of the Tip5 permutation by enabling the Hash Table to handle 16-bit wide limbs, despite the S-box being defined over 8-bit limbs. This is in conjunction with the Lookup Table, which performs the actual lookup of 8-bit limbs and is always fully populated to facilitate consistent lookups.
* The Jump Stack Memory contains the overflow from the Jump Stack, while the U32 Table arithmetizes the coprocessor for complex 32-bit unsigned integer operations. Each operation is broken down into multiple rows, known as a section, within the U32 Table, and these sections operate independently, with the current instruction guiding their arithmetization.
* Lastly, the Hash Table is part of the hash coprocessor's arithmetization, along with the Cascade Table and the Lookup Table. It can hash the top ten elements of the Op Stack in one cycle and supports the quick execution of sponge functions for hashing. This functionality not only expedites hashing instructions but also contributes to program attestation by hashing the program itself.
## Enhancing Triton VM with rWasm
WebAssembly (Wasm) is a stack-based virtual machine designed to support widely used programming languages like C/C++, JavaScript, Ruby, Rust, Swift, and TypeScript. It allows developers to compile these languages into Wasm bytecode, enabling the creation and execution of distributed applications on blockchain platforms capable of handling large transaction volumes and high transactions per second (TPS). Furthermore, Wasm bytecode is critical in compiling smart contracts, opening doors for widespread adoption by bridging traditional software languages with blockchain technology. The Fluent project focuses on addressing the scalability and expressivity limitations of Ethereum L1 by developing a layer 2 architecture utilizing zkWasm. Fluent's primary aim is to enable the seamless integration of libraries and applications, particularly those based on Wasm, within the Ethereum ecosystem.
To further support this integration, the initiative involves enhancing Triton VM to accommodate all ZK-friendly Wasm opcodes, replacing those that are not ZK-friendly with compatible alternatives, a representation known as rWasm. This entails setting specific limits for the execution of each instruction or opcode within Triton VM. rWasm's objective is to generate a simplified binary representation of Wasm that remains backward compatible with the original Wasm instruction set. This simplification aims to streamline the process of proving the correctness of Wasm programs, enhancing efficiency and making the development environment more conducive to zkSTARK applications while retaining compatibility with existing instructions. However, this approach introduces challenges in proving operations using ZK proofs, necessitating innovative solutions to overcome these obstacles.
## Enhancing Triton VM with Polygon’s AggLayer
Current blockchain infrastructure lacks trust-minimized cross-chain interoperability, causing fractured and insecure cross-chain movement of liquidity. An emerging solution for trust-minimized bridging between different chains and rollups is proof-aggregation-based systems, which aggregate and verify proofs representing the state of each connected chain. These solutions also aim to deliver atomicity, which ensures all parts of a transaction across multiple chains either succeed together or fail together, eliminating the risk of partial updates. This process, as it applies to Ethereum-based rollups, faces limitations in speed due to the need for transaction finalization on Ethereum and the verification of chain states, introducing a delay in the transaction process. Polygon's AggLayer protocol aims to address these challenges by creating a decentralized aggregation protocol for ZK proofs across chains, facilitating low-latency cross-chain transactions. The AggLayer protocol mainly operates as a decentralized system with two primary functions:
* It compiles ZK proofs from all interconnected blockchains.
* It guarantees secure and nearly instantaneous atomic cross-chain transactions.
This protocol ensures that transactions are secure and atomic, improving the interoperability and user experience across the Polygon ecosystem. The AggLayer operates through proof aggregation, batch confirmation, and a cross-chain interoperability mechanism. It aggregates proofs and state updates from various chains and settles them on Ethereum. It also enables chains to interoperate without waiting for Ethereum's state finalization by aggregating and batching proofs and state updates from multiple chains, allowing for a collective verification and settlement process on Ethereum. This batching process not only streamlines transactions but also significantly reduces latency by leveraging a pre-consensus mechanism that ensures all transactions within a batch are validated and agreed upon by the involved chains before being finalized on Ethereum. Security is a critical aspect, ensuring that no chain's state relies on invalid or non-finalized states from another chain. This mechanism prevents the finalization of any chain's state on Ethereum if it includes unsuccessful transactions from an atomic bundle across other chains.
```mermaid
%%{init: {"quadrantChart": {"chartWidth": 480, "chartHeight": 480}, "themeVariables": {"quadrant1Fill": "green", "quadrant2Fill": "grey","quadrant4Fill": "grey"} }}%%
quadrantChart
x-axis Low Scalability --> High Scalability
y-axis Low User Experience --> High User Experience
quadrant-1 AGGREGATED
quadrant-2 MONOLITHIC
quadrant-4 MODULAR
```
The protocol mitigates data withholding attacks on validium chains by instituting a challenge-response mechanism, where validators must prove data availability upon request, ensuring transparency and integrity. It leverages decentralized data layers and fraud proofs to detect and prevent data withholding, thereby securing cross-chain transactions. It also outlines a three-phase process for chain interaction within the ecosystem, balancing the trade-off between latency and liveness guarantees. By enabling low-latency cross-chain transactions, it aims to foster a unified liquidity experience across the Polygon ecosystem, enhancing user experience.
# Triton VM Enhancements: Extension with Plonky3
As noted earlier, Plonky3 supports larger and more complex circuits than Plonky2. This increased capacity makes it suitable for a broader range of applications.
* Integrating Plonky3 into Triton VM has the potential to enhance Triton VM's prover as well as reduce proof sizes.
* The Mersenne31 field, known for its efficiency in arithmetic operations (especially on GPUs and CPUs, which often operate on 32-bit data), will be integrated into Triton VM.
* Furthermore, we aim to use an extension field over Mersenne31 to support the 64-bit word size of Wasm opcodes (one possible word encoding is sketched below).
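The last bullet above targets an extension field over Mersenne31 for 64-bit words. As one simple illustration of the underlying representation problem (and not necessarily the encoding Fluent would adopt), the sketch below splits a 64-bit Wasm word into four 16-bit limbs, each of which fits comfortably in a Mersenne31 element.
```rust
/// One possible way to represent a 64-bit Wasm word with Mersenne31 elements:
/// split the word into 16-bit limbs so that limb-wise products stay well below
/// p = 2^31 − 1. The limb width and encoding are illustrative assumptions only.
const P: u64 = (1 << 31) - 1;

/// Decompose a 64-bit word into four 16-bit limbs, least significant first.
fn to_limbs(word: u64) -> [u64; 4] {
    [
        word & 0xffff,
        (word >> 16) & 0xffff,
        (word >> 32) & 0xffff,
        (word >> 48) & 0xffff,
    ]
}

/// Recompose the word; each limb is a valid Mersenne31 element since 2^16 < p.
fn from_limbs(limbs: [u64; 4]) -> u64 {
    limbs[0] | (limbs[1] << 16) | (limbs[2] << 32) | (limbs[3] << 48)
}

fn main() {
    let word = 0xDEAD_BEEF_CAFE_F00D_u64;
    let limbs = to_limbs(word);
    assert!(limbs.iter().all(|&l| l < P)); // every limb is a field element
    assert_eq!(from_limbs(limbs), word);   // the encoding is lossless
    println!("limbs: {:x?}", limbs);
}
```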
## Integration with Plonky3 and WASM
The integration with Plonky3, which utilizes the Mersenne31 field, offers optimizations leading to faster prover and verifier times, enhancing performance on both CPU and GPU platforms. Implementing rWASM opcodes within Triton VM further extends its capabilities, providing an efficient environment for executing STARK-based Wasm solutions, marking a significant step forward in cryptographic verification and execution systems.
## Triton VM with an Extension of Plonky3 and TIP5 Hash Function
Monolith is a promising new hash function with high CPU throughput and fewer arithmetic operations compared to Rescue and Poseidon. Utilizing a more efficient hash function is of utmost importance because hashing dominates the prover's work: committing to the low-degree extension (LDE), hashing long inputs, building Merkle trees, and running FRI. Plonky3 already supports the Monolith hash function, and we aim to integrate it with Triton VM.
* **rWASM Opcode Implementation:** rWasm opcodes of Fluent enable on-chain reconfiguration, enhancing the adaptability of decentralized applications. The implementation of rWasm opcodes will extend Triton VM's capabilities, allowing for more flexible and dynamic smart contract execution.
* **Integration to Polygon’s AggLayer:** With this integration, transactions across various blockchains (other layer 2s) are efficiently processed through Polygon's AggLayer, significantly reducing transaction costs. This approach will also enhance interoperability and communication between L2 chains, utilizing ZK proofs for security and Ethereum for ensuring transaction integrity and validation. In this context, all proofs generated by the STARK-based Fluent prover will be verifiable by the AggLayer.
## Potential Methodology of the Integration
* **Plonky3 Integration:** This involves adapting existing components of Triton VM, such as BFieldElement and XFieldElement, to accommodate Plonky3's cryptographic primitives, such as Mersenne31 and BinomialExtensionField, and ensuring smooth interoperability (a generic-field sketch follows this list).
* **Mersenne Field with 64-bit Word Extension:** Modify the Plonky3 implementation to support the Mersenne31 field with a 64-bit word extension. This may involve adjustments to the underlying arithmetic operations, data structures, and memory management within Triton VM (e.g., BFieldElement and XFieldElement).
* **Monolith Hash Function Support:** Integrate the Monolith hash function, along with the necessary algorithms and data structures, into Triton VM.
* **rWASM Opcode Implementation:** Develop and integrate rWasm opcodes into Triton VM's execution engine. This involves extending the VM's instruction set, updating the interpreter, and ensuring that the new opcodes are compatible with existing smart contracts.
* **Integration to Polygon’s AggLayer:** We first modify our Plonky3 proofs to satisfy AggLayer's specific SNARK circuit format and prover/verifier interface. Then, we aim to develop a mechanism for the Fluent L2 to efficiently submit state updates to the AggLayer periodically. Finally, we will implement the communication protocols for Fluent to interact with the AggLayer and other chains within the ecosystem.
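As a rough sketch of the refactoring the first bullet implies, the snippet below makes a trace-column check generic over a small field trait, so that a Goldilocks-style BFieldElement could in principle be swapped for a Plonky3 field such as Mersenne31. The trait, the toy M31 type, and the constraint are illustrative assumptions, not the actual APIs of Triton VM or Plonky3.
```rust
/// A minimal sketch of making a trace-column check generic over the field.
use core::ops::{Add, Mul};

/// Minimal field interface the VM's arithmetization could program against.
trait VmField: Copy + Eq + Add<Output = Self> + Mul<Output = Self> {
    const ONE: Self;
}

/// Toy Mersenne31 element; a real integration would use Plonky3's field type.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct M31(u64);
const P: u64 = (1 << 31) - 1;

impl Add for M31 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self { M31((self.0 + rhs.0) % P) }
}
impl Mul for M31 {
    type Output = Self;
    fn mul(self, rhs: Self) -> Self { M31(self.0 * rhs.0 % P) }
}
impl VmField for M31 {
    const ONE: Self = M31(1);
}

/// A toy transition constraint (next = current + 1), independent of the field.
fn check_increment<F: VmField>(column: &[F]) -> bool {
    column.windows(2).all(|w| w[1] == w[0] + F::ONE)
}

fn main() {
    let col = [M31(5), M31(6), M31(7)];
    assert!(check_increment(&col));
    println!("generic trace constraint holds over Mersenne31");
}
```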
# References
1. Hamburg, M. (2015). 'Ed448-Goldilocks, a new elliptic curve', IACR ePrint. Available at: https://eprint.iacr.org/2015/625.pdf (06/03/2024).
2. Ben-Sasson, E., Carmon, D., Kopparty, S., Levit, D. (2023). 'Scalable and transparent proofs over all large fields, via elliptic curves: (ECFFT Part II)'. In: Theory of Cryptography: 20th International Conference, TCC 2022, Chicago, IL, USA, November 7–10, 2022, Proceedings, Part I, pp. 467–496. Springer. Available at: https://eprint.iacr.org/2022/1542 (06/03/2024).
3. Ben-Sasson, E., Carmon, D., Kopparty, S., Levit, D. (2021). 'Elliptic curve fast fourier transform (ECFFT) part I: fast polynomial algorithms over all finite fields'. Electron. Colloquium Comput. Complex., p. 103. Available at: https://eccc.weizmann.ac.il/report/2021/103 (06/03/2024).
4. Haböck, U., Lubarov, D., Nabaglo, J. (2023). 'Reed-Solomon Codes over the Circle Group', IACR preprint archive. Available at: https://eprint.iacr.org/2023/824 (06/03/2024).
5. Polygon Zero Team. 'Plonky2: Fast recursive arguments with PLONK and FRI'. Available at: https://github.com/mir-protocol/plonky2/blob/main/plonky2/plonky2.pdf (06/03/2024).
6. Polygon Miden. 'A STARK-based virtual machine'. Available at: https://github.com/maticnetwork/miden (06/03/2024).
7. Polygon Hermez. Available at: https://github.com/orgs/0xPolygonHermez/repositories (06/03/2024).
8. 'Triton VM Specification'. Available at: https://triton-vm.org/spec/ (06/03/2024).
9. 'Introduction to Fluent VM'. Fluent Labs. Available at: https://docs.fluentlabs.xyz/learn/introduction/introduction-to-fluent-vm (06/03/2024).
10. 'Risc0 Documentation'. Available at: https://github.com/risc0/risc0 (06/03/2024).
11. Bertoni, G., Daemen, J., Peeters, M., Van Assche, G. (2012). 'Cryptographic sponge functions'. Available at: https://keccak.team/files/CSF-0.1.pdf (06/03/2024).
12. 'Plonky3 GitHub Repository'. Available at: https://github.com/Plonky3/Plonky3 (06/03/2024).
13. Bouvier, C., Briaud, P., Chaidos, P., Perrin, L., Salen, R., Velichkov, V., Willems, D. (2022). 'New Design Techniques for Efficient Arithmetization-Oriented Hash Functions: Anemoi Permutations and Jive Compression Mode', IACR Cryptol. ePrint Arch., p. 840. Available at: https://eprint.iacr.org/2022/840 (06/03/2024).
14. 'Anemoi: A Family of ZK-friendly AO Hash Functions'. Available at: https://anemoi-hash.github.io/ (06/03/2024).