## Introduction
[Block ciphers](https://en.wikipedia.org/wiki/Block_cipher) have been widely used in cryptography for many years. The idea is to separate a plaintext into fixed-sized blocks and perform transformations on these blocks in order to obtain a ciphertext. Block ciphers are also used in the construction of compression functions for the design of hash functions.
More recently, a new design paradigm for hash functions has been discovered a so-called *sponge construction* [[2](#BDP07)]. The cryptographic funtion `keccak`, which is widely used in the Ethereum protocol, was the first implementation of sponge constructions [[3](#BDP08)].
The goal of this article is to introduce a general algorithm for sponge constructions without presenting any practical use. The main focus will be in giving a presentation of the algorithm for any fixed-length function with some accompanying code snippets. The idea behind this approach is to present some concrete realizations of all parts of the sponge construction. Security will not be discussed here. For more details the reader is encouraged to refer to the original references at the end of this article. The code snippets that will appear will be written in Python for the sake of claritiy.
## The Sponge Construction
The figure below (taken from https://keccak.team/sponge_duplex.html) presents a general view of the sponge construction.

Let us start explaining each part of this figure. Here $M$ is the input message and $Z$ is the output. Both $M$ and $Z$ are strings of bits. We will denote by $|M|$ and $|Z|$ the number of bits in $M$ and $Z$, respectively.
A sponge function is divided in two stages: (1) absorbing and (2) squeezing.
### Absorbing stage
The idea for the absorbing stage is to divide $M$ is groups of bits each of length $r$ where $r$ is a given positive integer. If $|M|$ is not a multiple of $r$ then, for security reasons as argued in [[1]("BDP11"), Theorem 10], we use the following padding rule refered to as a *multi-rate padding*.
```python
#M is a string a bits and r is a positive integer
def pad(M,r):
if len(M)%r != 0:
paddingBits = len(M)//r*r+r-len(M)
if paddingBits >= 2:
return M + "1" + "0"*(paddingBits-2) + "1"
else:
return M + "1" + "0"*(paddingBits-2+r) + "1"
else:
return M
```
A multi-rate padding adds a string $10^*1$ to the end of $M$ such that the resulting string is a multiple of $r$. From now on, with no loss of generality, we will assume that $M$ if of length $n=kr$ where $k$ is the number of groups of bits of length $r$ in $M$.
In the figure above note that there is a function $f$ that is applied iteratively to $b=r+c$ bits. The number $b$ is called the *width*, $r$ is called the *rate*, and $c$ is called the *capacity*. Thus $f$ is a function of the form $f:\mathbb Z_2^b\to \mathbb Z_2^b$; for cryptographic applications $f$ must be a one-way permutation. Each group of $r$ bits of $M$ is used $k$ times. Let $M_i$ denote the $i$-th group of $k$ bits of $M$ with $1\leq i\leq k$.
Step 0 of the the absorbing stage is an initizialization of step before applying the function $f$. We define the state $s_0$ of the sponge at step 0 as
```python
s = M1 + 0*c
```
where `M1` is the block $M_1$ of $M$. An XOR operation is applied to the block $M_1$ with the string `0*r`, which is just equal to $M_1$, and then it is padedd with $c$ 0s. The total the number of bits of $s_0$ is $b$.
Let $s_{i-1}$ be the state just before applying the `xor`operation and $f$ at step $i$, where step $i$ is the $i$-th time $f$ is used. The code below implements step $i$.
```python
#M is the i-th block of bits
#s is the state at the (i-1)-th step
#r is the rate
def step(M,s,r):
xors = ""
for(i in range(r)):
xors += str(int(s[i])+int(M[i])-2*int(s[i])*int(M[i]))
newS = xors + s[r:]
return f(newS)
```
The `for` cycle computes de xor between $M_i$ and the first $r$ bits of $s_{i-1}$. Note that the last $b-r$ bits do not go through the xor operation and it is directly used in the call to $f$.
This last subrutine is applied $k$ times thus obtaining a state $s_k$. After computing $s_k$ we use it as input for the squeezing stage.
### Squeezing stage
Let $s_k$ be the state after the absobing stage. Let $\ell$ the the number of output bits, that is, the length of $Z$. During the squeezing stage, the output $Z$ is constructed by iteratively concatenating blocks of $r$ bits obtained by a sequential application of the function $f$ on the state. The code below executes one step of the squeeze stage.
```python
#s is the current state
#r is the rate
def squeeze_step(s,r):
sNew = f(s)
outterState = sNew[0:r]
return outterState, sNew
```
The function `squeeze_step` is called until the length of the output is at least $\ell$.
```python
#sk is the state at the end of the absorbe stage
#r is the rate
#l is the number of output bits
def squeeze(sk,r,l):
Z = sk[0:r]
s = sk
while len(Z) < l:
output_r, s = squeeze_step(s,r)
Z += output_r
return Z[0:l]
```
This concludes our brief explanation of the sponge construction.
## References
1. <a name="BDP11">Bertoni, G., Daemen, J., Peeters, M. & Van Assche, G. (2011). Cryptographic sponge functions. [[link](https://keccak.team/files/CSF-0.1.pdf)]</a>.
2. <a name="BDP07">Bertoni, G., Daemen, J., Peeters, M., & Van Assche, G. (2007). Sponge functions. In ECRYPT hash workshop (Vol. 2007, No. 9). [[link](https://keccak.team/files/SpongeFunctions.pdf)]</a>
3. <a name="BDP08">Bertoni, G., Daemen, J., Peeters, M., & Van Assche, G. (2008) Keccak specifications, NIST SHA-3 Submission. [[link](https://keccak.team/keccak.html)]</a>