Ingonyama - HackMD

BaseFold+-
Yuval Domb yuval@ingonyama.com In BaseFold, the authors make a key observation about the connection between multilinear and univariate polynomials, and the FRI protocol, leading to the construction of a new multilinear Polynomial Commitment Scheme (PCS). Consider a multilinear polynomial $f_M(\underline{x})$, where $\underline{x} \equiv (x_0, x_1, \dots, x_{n-1})$. Transforming it into a univariate polynomial $f_U(x)$ can be done by substituting $x_i \leftarrow x^{2^i}$ for all $i \in [n]$. The resulting univariate polynomial has degree $\deg(f_U) = 2^n - 1$, as expected. We refer to $f_M(\underline{x})$ and $f_U(x)$ as the multilinear-univariate twins, hereafter. Notably, executing $n$ rounds of the FRI protocol on the univariate polynomial $f_U(x)$ with the randomness sequence $\underline{r} \equiv (r_0, r_1, \dots, r_{n-1})$ results in a constant polynomial $f(x) = c_r$ where $c_r = f_M(\underline{r})$, the evaluation of $f_U(x)$'s multilinear twin at $\underline{r}$. This suggests that FRI can act as an instantiation of the random oracle for a Sum-Check Interactive Oracle Proof (IOP). Concretely, this entails executing the two protocols concurrently, round-by-round, using the same randomness and equating their outputs. To construct a PCS, BaseFold proposes the following approach. Suppose a Prover is given a multilinear polynomial $f_M(\underline{x})$ and aims to construct a transparent PCS for evaluating it at $\underline{z} \in \mathbb{F}^n$.The Prover begins by committing to $f_U(x)$, the univariate twin of $f_M(\underline{x})$. It then concurrently executes the FRI protocol on $f_U(x)$ and the Sum-Check protocol on the modified polynomial $\tilde{f}_M(\underline{x}) \equiv f_M(\underline{x}) \cdot eq(\underline{x}, \underline{z})$, where
Yuval Domb changed 3 months agoView mode Like 1 Bookmark
Hardware-Friendliness of HyperPlonk, Part 2
This article is a follow-up to our earlier post on the hardware-friendliness of HyperPlonk, expanding on claims made about the Sumcheck protocol. In that post, we argued that the performance bottleneck in Sumcheck stems not from raw computation, but from memory access patterns. This is a critical distinction: ideally, cryptographic protocols like Sumcheck should be compute-bound rather than memory-bound, since modern hardware—particularly GPUs and specialized accelerators—is optimized for high-throughput arithmetic but often struggles when constrained by memory bandwidth. This memory-bound behavior is especially evident in the use of Number Theoretic Transforms (NTTs), a fundamental building block for polynomial operations in Sumcheck. Although NTTs are highly parallelizable, they require frequent and structured access to large arrays of field elements, placing significant pressure on memory subsystems and cache hierarchies. As a result, even when the underlying arithmetic is relatively inexpensive, memory access overhead can dominate overall performance. Addressing this limitation calls for optimized memory layouts, hardware-aware scheduling strategies, and rethinking systems design to push hardware toward its computational limits, not its memory ceilings. This follow-up aims to empirically validate our earlier claim by profiling the Sumcheck implementations within ICICLE. Through this analysis, we offer concrete evidence that memory access is indeed the dominant bottleneck, and provide insights that can inform future optimization and hardware acceleration strategies for Sumcheck and similar protocols.
Elan Neiger changed 3 months agoView mode Like Bookmark
The Barrett-Montgomery duality
Yuval Domb yuval@ingonyama.com x (33) Multiplication in prime fields with large unstructured primes is a fundamental building block of modern cryptography. Modular multiplication of two field elements $a, b \in \mathbb{F}_p$ requires computing their full product $ab$ and reducing it modulo $p$. The two most widely used reduction algorithms are due to Barrett and Montgomery, with the latter operating in a permuted coset of the field (Montgomery form). Single-precision implementations of both algorithms typically require three full multiplications: one for $ab$ and two for the reduction. With schoolbook multiplication, each full multiplier costs $n^2$ digit multiplications, resulting in a total of $2n^2$ for the reduction. Faster methods like Karatsuba can decrease the number of digit multiplications but introduce significant overhead in additions and complex internal dependencies, making them less suitable for CPU-based implementations. Multi-precision approaches (operand scanning), particularly with Montgomery reduction, are the most widely used today. The best known algorithm (at least to us and ChatGPT) achieves reduction in $n^2 + n$ digit multiplications without increasing additions or introducing complex dependencies, making it highly efficient for CPU-based systems.
Elan Neiger changed 4 months agoView mode Like 3 Bookmark
AIR-ICICLE : Plonky3 on ICICLE, part 1
image (6) In this article, we present an integration of the ICICLE field library (rust wrappers) for trace generation and symbolic constraints for AIR arithmetization using the Plonky3 framework. This enables users to write AIR circuits in the Plonky3 AIR language and easily interface with ICICLE APIs, generate trace data in ICICLE field types in any device using ICICLE’s device agnostic API’s, generate symbolic constraints. The user can parse the symbolic constraints and show that the trace satisfies the constraints by taking advantage of ICICLE backend APIs such as the NTT, Polynomials, VecOps, Hash, MerkleTree and FRI (upcoming). In short, this enables users to write custom STARK provers that meet their application requirements.
Omer Shlomovits changed 5 months agoView mode Like 1 Bookmark
Solving Reproducibility Challenges in Deep Learning and LLMs: Our Journey
Reproducibility in deep learning has long been a critical concern. Whether it’s in autonomous vehicles, medical systems, or simply replicating the results of a scientific experiment, ensuring consistency is key. This challenge has become even more prominent with the rise of large language models (LLMs), particularly when we want our models to consistently follow a specific pattern, for example in safety-critical applications. This is a well-known problem (e.g., Determinism in Deep Learning, Reproducible Deep Learning Using PyTorch, and to the best of our knowledge, there is still no canonical solution for the general case — only best practices to mitigate its impact. Citing from the documentation of Pytorch on Reproducibility: “Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.” 51982534-14E4-4D5C-A80E-A09A89E9F7F8 (1) <center> “Create an image of a red apple in studio lighting”</center> Our Motivation
Omer Shlomovits changed 9 months agoView mode Like Bookmark
zkDL++
Introducing zkDL++, a novel framework designed for provable AI. Leveraging zkDL++, we address a key challenge in generative AI watermarking—maintaining privacy while ensuring provability. By enhancing the watermarking system developed by Meta, zkDL++ solves the problem of needing to keep watermark extractors private to avoid attacks, offering a more secure solution. Beyond watermarking, zkDL++ proves the integrity of any deep neural network (DNN) with high efficiency. In this post, we outline our approach, evaluate its performance, and propose avenues for further optimization. Introduction Stable Signature is a watermarking (WM) scheme introduced by Meta AI to allow for identification of images generated by their latent diffusion model (LDM). It uses a Convolutional Neural Network(CNN) to extract the WM, and has the advantage of being very robust to image tampering, allowing for detection even if the image was filtered, cropped, compressed, etc. However, Stable Signature suffers from the drawback of having to keep the extractor model private, as exposing it makes the WM susceptible to attacks. This drawback can be significant if at any point there is a controversy about ownership of a certain image. While Meta, or any other company using such a WM method, can run the extractor and see that a certain image was generated by their model, they cannot prove it to a third party such as social media users, unless they expose the weights of the extractor model. Such exposure may lead to obsolescence of the WM system. We propose using the emerging technology of Zero-Knowledge Proof (ZKP) as a solution to this problem. In a ZKP scheme, a prover wishes to convince a verifier that they have knowledge of a witness $W$, which obeys a certain functional relation $F(X,W)=0$, without revealing $W$. The relation $F$ is generally represented by an arithmetic circuit and is public. The variable $X$ contains public input and output to the circuit. In the case of Stable Signature, the relation $F$ is the architecture of the WM extractor, with its output compared to the key, and it is satisfied if-and-only-if the extracted WM from the input image matches the key. The public variable $X=(x_{in},x_{out})$ includes the image and the Boolean output, and the private witness $W=(w,k)$ includes the weights of the extractor and the key as is depicted in the diagram above. A verifier can then run a verification algorithm that tests the proof and returns True if the relation is indeed satisfied, and False otherwise. An additional challenge is ensuring that the weights weren’t deliberately chosen to produce a specific result for a given image. Fortunately, this issue can be easily resolved by the following:
Omer Shlomovits changed 10 months agoView mode Like 4 Bookmark