Dmitry Khovratovich – Ethereum Foundation and Dusk Network
JP Aumasson – Taurus and Inference
Porçu Quine – Lurk Lab and Protocol Labs
Bart Mennink - Radboud University
We define a unified Sponge API for Field Elements (SAFE), which provides ZK proof systems designers with a secure and efficient framework for hashing, encryption, and applications thereof (commitment schemes, Fiat-Shamir transforms, AEAD, and so on). We do not restrict the permutation algorithm nor the field type, thus SAFE can be instantiated with established constructions.
SAFE is implemented by Filecoin's Neptune, which is our reference implementation (in Rust).
Sponges are a technique to build cryptographic primitives from a single permutation: hash functions, MACs, authenticated encryption schemes, PRNGs, and others. Examples of sponges include
When operating in duplex mode, a sponge can be seen as stateful object that can ingest input ("absorb") and produce output ("squeeze") at any time and in arbitrary order. However, the original duplex specification sees the input and output as raw bytes, and leaves application-specific encoding to the users.
However, in ZK proof systems (ZKPs) hash functions often process field elements, rather than raw bytes. Performance being critical to reduce the proof generation and verification cost, dedicated "field-friendly", algebraic hash functions were designed, most of which are sponges. These include for example Poseidon, Rescue, MiMC, and Reinforced Concrete. But their integration in ZKPs can benefit from safety and performance optimization, owing to the following issues:
To address these, to reduce the workload of developers, and as a first step towards a unified cross-platform interface, we propose a generic API for sponge functions that can be instantiated with various permutations of different size, to create the functionalities required in ZKPs.
Our API can be used in the applications and protocols that need the following:
Limitation: The SAFE API does not support variable-length hashing when the length of data hashed is unknown in advance. We make this choice due to its little usage vs. the extra complexity and performance overhead. Indeed, restricting SAFE to pre-defined hashing patterns allows us to get rid of the padding. We nonetheless explain how to extend our API to support variable-length hashing.
Most sponge designs have a permutation that can be reused with our API, in particular the aforementionned ZK-friendly designs. The two main parameters to take into account, aside from the capacity, are:
We recommend choosing a permutation and implementation that have already been "battle-tested", that is, designed and reviewed by cryptanalysts, and already deployed.
The state of a sponge function is seen as an array of width \(n = r + c\) bits, where \(r\) and \(c\) are the rate and the capacity, respectively. The capacity can be thought of as twice the security level, meaning that a capacity \(c = 256\) bits gets roughly 128-bit security against collisions and preimages – as long as the output is at least 256-bit.
When working with field elements, it's more convenient to measure in field elements rather than in bits. This lead to the concept of arithmetic capacity (see 4.3 in the Rescue/Vision paper). In the following, we will adopt this convention and count capacity and rate in field elements, instead of bits. Thus, a rate \(r=2\) means that up to 2 field elements can be absorbed.
For example, the Poseidon hash family has a state consisting of field elements, and its main instances works with 255-bit representations of field elements, and an arithmetic capacity of one or two elements, for 128- and 256-bit security, respectively.
The security level to choose depends on the use case, but as a rule of thumb 128-bit security should be enough for most applications. Note that the security level of the hash function should be consistent with that of other cryptographic schemes in your system. For example, it's probably of little value to have a 256-bit-secure hash if all the other primitives have at most 128-bit security, for example.
The main security notions applicable to hash functions in our context are, from the easiest to achieve (and hardest to break):
usize
type, with potential enforcement of upper bounds on the input size (thus on value of the received usize
element).A SAFE sponge object should expose the following operations to protocol designers (details are described in 2.3 and 2.4):
OK
, or an error.The general workflow of a sponge is then as follows:
REMARK: Several IO patterns can belong to the same equivalence class, and thus leading to identical instances. This is because consecutive calls of a same type (\(\mathsf{ABSORB}\) or \(\mathsf{SQUEEZE}\)) are aggregated to define the initial state. An application that needs to absorb \(L>1\) elements in a row can thus do it one by one (with \(L\) calls to \(\mathsf{ABSORB}\)), or with a single call including the \(L\) elements. See 2.3 for details.
Important notes:
Let \(c<n\) be the number of capacity elements. The sponge state consists of the following elements:
A sponge instance is characterized by a tag, which is derived from an IO pattern, that is, its a sequence of absorb phases and squeeze phases and their respective number of field elements. However, different IO patterns may lead to the same tag:
For example, the following IO patterns lead to a distinct tag/instance:
However, the following pattern has the same
tag/instance as Pattern 1, as per our aggregation mechanism:
The tag of an instance is used as an initial value, to ensure that distinct instances behave like distinct functions. Using distinct tags for different, non-equivalent usage patterns avoids trivial collisions between input sequences of different length, where a "non-input" element is replaced by a zero element in the colliding message (this would lead to a collision because of the lack of padding).
Furthermore, for applications that need to distinguish equivalent IO patterns, a domain separator can be used.
A tag is calculated from an IO pattern and a domain separator as follows:
[0x80000003, 0x80000003, 0x00000003]
.[0x80000003, 0x80000003]
with a single 0x80000006
.0x4142
, then the example above would yield the string (if big-endian convention is used): 0x80000006000000034142
.Given its tag string, an instance admits an arbitrary number of executions, which are in addition characterized by an input \(\mathcal Y\in (\mathbb F^r)^\star\).
REMARK: The 32-bit encoding restricts the number of elements absorbed or squeezed to \(2^{31}-1\) per call. For applications that need to absorb or squeeze such a large number of elements, the operation must therefore be done via multiple calls, rather than a single one.
REMARK: If the hash function used to create the tag received field elements rather than byte strings, and can directly process calls 32-bit integers as field elements, then the serialization mechanism (incl. endianness aspects) is not needed.
Each call to \(\mathsf{ABSORB}\) or \(\mathsf{SQUEEZE}\) both:
When all calls are done, the \(\mathsf{FINISH}\) operation verifies that no call is left undone.
REMARK: Each call verifies "as it goes" that the correct sequence of calls is performed, as prescribed by the IO pattern initially fed to \(\mathsf{START}\). Executing a distinct IO pattern, even if equivalent after aggregation, would lead to an error.
\(\mathsf{START}(\mathtt{IOPattern}: IO[L],\mathtt{DomainSeparator}: D)\):
REMARK: Memory-constrainted applications may not store the IO pattern expected, but only the tag (if significantly shorter), and then recompute the tag from the IO pattern executed when finalizing the sponge. In that case, the continuous verification of the IO pattern does not need to be done as part of \(\mathsf{ABSORB}\) and \(\mathsf{SQUEEZE}\) calls.
\(\mathsf{ABSORB}(L, X[L])\):
\(\mathsf{SQUEEZE}(L) \to Y[L]\):
REMARK: We do not set \(\textsf{absorb_pos}\) to \((n-c)\) as in ABSORB as we may want the state to absorb at the same positions that have been squeezed. Example is authenticated encryption.
\(\mathsf{FINISH}()\):
The permutation state should not be directly manipulated as a mere vector of values. Instead, it should only expose such functions:
InitializeCapacity(Tag)
: Sets the first bits of the capacity elements to the tag.ReadRateElement(Offset) -> FieldElement
: Reads the state element at position Offset
AddRateElement(Offset, FieldElement)
: Add in the field an element to the state at position Offset
Permute()
: Permutes the stateThe simplest way to hash \(L\) elements \(X=X_1, X_2,\dots, X_{L}\) is to do a single call \(\mathsf{ABSORB}(L, X)\) followed by a single \(\mathsf{SQUEEZE}\), with the length of the desired output. These calls should be preceeded by a \(\mathsf{START}\) call and succeeded by a \(\mathsf{FINISH}\) call.
If the \(L\) elements are absorbed using more than one call – for example, via \(\mathsf{ABSORB}(1, X_1)\) followed by \(\mathsf{ABSORB}(L-1, (X_2,\dots,X_{L}))\) – then the resulting hash will not change.
Consider a binary tree whose leaves and nodes are field elements, for example from a 256-bit field \(\mathbb{F}\). Each inner node with children nodes \(X_1,X_2 \in \mathbb F\) is then obtained as follows:
[0x81, 0x81, 0x01]
) and \(D\) an arbitrary (possibly empty) domain separatorIf \(\mathsf{FINISH}\) succeeds, then \(Y\) is the node value above the two nodes hashed.
Alternatively, the two \(\mathsf{ABSORB}\) calls can be replaced by a single \(\mathsf{ABSORB}(2, (X_1,X_2))\), which will yield the same result. Note that the \(IO\) supplied to \(\mathsf{START}\) is then replaced by [0x82,0x01]
.
This construction generalizes to binary trees whose elements are tuples, in some \(\mathbb{F}^L, L>1\).
If computed with SHA3-256 with big-endian word-to-byte conversion, the 16-byte tag of our example would then be the hash of the serialized words [0x800000002,0x00000001]
(note that the two \(\mathsf{ABSORB}\)s are aggregated), that is:
hashlib.sha3_256(b'\x80\x00\x00\x02\x00\x00\x00\x01').hexdigest()[:32]
'3be11cba2e57c1d9e7ff6a72538baeef'
With a domain separator \x41\x42
, the tag would then be
hashlib.sha3_256(b'\x80\x00\x00\x02\x00\x00\x00\x01\x41\x42').hexdigest()[:32]
'09db848230d0b7d463bec1bf621b7844'
Consider a \(1\)-element commitment to three values \(X_1,X_2,X_3\in \mathbb F^2\), that is, each element \(X_i\) is a pair of field elements. The commitment is then obtained as follows:
[0x82, 0x82, 0x82, 0x01]
) and \(D\) an arbitrary (possibly empty) domain separatorIf \(\mathsf{FINISH}\) succeeds, then \(Y\) is the commitment value. This construction generalizes to other input and output lengths.
Equivalently, the three \(\mathsf{ABSORB}\) calls can be replaced by a single \(\mathsf{ABSORB}\) taking 6 field elements.
REMARK: The API does not distinguish the cases of (say)
- 2 elements A and B in \(\mathbb F\), and
- 1 element (A,B) in \(\mathbb F^2\).
In both cases, the sponge will absorb A and B, and produce the same result. If an application needs to distinguish those two cases, it must add some additional signalling, for example as a domain separator.
If computed with SHA3-256 with big-endian word-to-byte conversion, the 16-byte tag of our example would then be the hash of the serialized words [0x800000006,0x00000001]
(note that the three \(\mathsf{ABSORB}\)s are aggregated), that is:
hashlib.sha3_256(b'\x80\x00\x00\x06\x00\x00\x00\x01').hexdigest()[:32]
'c1dff57614db1d8e3ea1d60be1124497'
Consider the following example protocol:
The challenge generation using Fiat-Shamir would then look as follows:
Authenticated encryption with SAFE is a simplification of the SpongeWrap mode (from the Duplex paper): Encryption of \(b\) blocks of data with the key \(K\in\mathbb{F}^k\) and nonce \(N\in\mathbb{F}^m\), where block \(D_i\) consists of \(L_i\) field elements, is done as follows:
Upon success of \(\mathsf{FINISH}()\) and of previous calls, the string \((C_1+D_1)||(C_2+D_2)||\cdots ||(C_b+D_b)||S\) will be the ciphertext, where "\(+\)" denotes addition in \(\mathbb F\). Here \(t\) is the tag length, usually set to be at least 128 bits long.
REMARK: This construction is the most efficient when \(L_i \equiv r \mod n\), that is, all blocks fit the rate parameter of the sponge. This should be set by the protocol designer.
Decryption of \(b\) blocks of data (plus \(t\) extra tag elements \(E_t\in\mathbb F^t\)) on the key \(K\in\mathbb{F}^k\) and nonce \(N\in\mathbb{F}^m\), where block \(E_i\) consists of \(L_i\) field elements, is done as follows:
Upon success of \(\mathsf{FINISH}()\) and of previous calls, and if \(S\) matches \(E_t\), then the string \((E_1-C_1)||(E_2-C_2)||\cdots ||(E_b-C_b)\) is returned as plaintext.
This mode can be adapted to supported associated data (authenticated but not encrypted), in the same vein as the SpongeWrap mode.
A stream cipher generates a pseudo-random stream from a secret key and a not necessarily secret nonce, while a PRNG generates a pseudo-random stream from a seed. These can thus be instantiated with a similar sponge object to generate a pseudo-random stream \(C_1, \dots, C_b\), where \(C_i\) contains \(L_i\) field elements:
For the stream cipher case, the plaintext \(D_1,\dots,D_b\) with \(D_i\) consists of \(L_i\) field elements is then encrypted to \((C_1+D_1)||(C_2+D_2)||\cdots ||(C_b+D_b)\).
The authenticated encryption and stream cipher (with unique nonces), as well as PRNG mode (with a fixed-length seed) can be securely instantiated even if the IO pattern (and thus the tag) is not known in advance. In such a case, the state is initialized to elements \(1,2,3, .., n\), to distinguish it from the all-zero state used in the fixed-length mode.
In that case, the calls cannot, and thus do not enforce the IO pattern. Also, the sponge output can be used prior to the \(\mathsf{FINISH}\) call, which thus serves to "close" the instance and does not enforce an IO pattern either.
To avoid computing a costly cryptographic hash as the hasher \(\mathcal H\), one can use a universal hash function, such as the following:
Let \(X\) be a 128-bit constant, which is co-prime to \(2^{128}\). The input is parsed as a sequence of 128-bit integers \(A=(a_0,a_1,\ldots,a_n)\) and hashed to \[ \sum a_iX^i \bmod{2^{128}} \]
In that case, cross-protocol collision resistance property is reduced, as an attacker who controls the IO pattern can choose two patterns from distinct equivalent classes that yield the same tag.
Systems wherein the IO patterns and fixed by design may only need that weaker notion. However, if the IO patterns are fixed, then the tag can be precomputer with a costly hash.
If an application expects that some input data is relatively large but is known in advance, it may precompute the data by essentially hashing it. Concretely, we suggest replacing
A protocol or an application may consist of several submodules/subprotocols which call each other. A submodule is then initialized with the information accumulated in the caller, and later returns some output back to the caller. We suggest the following procedure for this:
The sponge state may be used to generate private data for one or many parties in a protocol. For this the party first extracts the state
which generates \(L\) field elements.
It is possible to implement the SAFE API for non-sponge hashes though with certain overhead. Our primary usecase is the multi-round interactive protocols. For Prover sending data \(D_1,D_2,D_l\) in rounds from 1 to \(l\), a typical Fiat-Shamir implementation of challenges \(c_1,c_2,\ldots,c_l\) is
\begin{align}
c_1\leftarrow H(D_1,X);\\
c_i \leftarrow H(D_i,c_i)
\end{align}
where \(X\) is the context and \(H\) is a non-sponge hash (say SHA-256).
This protocol can be implemented in SAFE as follows:
Here the state is 3 256-bit elements \((S_0,S_1,S_2)\), and each sponge permutation call is replaced with
\[
(S_0,S_1,S_2)\leftarrow (H(0||S_0,S_1,S_2),H(1||S_0,S_1,S_2),H(2||S_0,S_1,S_2))
\]
This results in a so called T-sponge.