# Poseidon Hash
Poseidon hash [1] maps strings over $F_p$ to fixed length strings over $F_p$:
$$\texttt{POSEIDON}: F_p^* \rightarrow F_p^o$$
where $o$ is the output length measured in number of $F_p$ elements (usually $o = 1$).
In this document, we describe a Poseidon hash specification based on [1].
## Poseidon Overview
At a high level, the Poseidon hash applies a round function (defined below) $R = 2 R_f + R_P$ times.
### Components of Round Function
Each round function contains the following three components. Suppose the input to each round is
$$ \vec{v} = [v_0, v_1, \ldots, v_{t-1}]$$
where $v_i \in F_p$ and $p$ is a prime of size $p \approx 2^n$. The three components are:
* **AddRoundConstants**, denoted as $\texttt{ARC}(\cdot)$. It adds pre-selected constants to the inputs.
$$ \texttt{ARC}(v_i) = v_i + b_i$$
* **S-box**, denoted as $\texttt{SB}(\cdot)$. There are two kinds of S-boxes we consider:
1. $\alpha$-power S-boxes, defined as:
$$\texttt{SB}(v_i) = v_i^\alpha $$
where $\alpha \geq 3$ is the smallest positive integer s.t. $\gcd(\alpha, p-1) = 1$. For example, if we set $\alpha = 3$, the permutation is called $x^3-\texttt{POSEIDON}^{\pi}$.
2. inverse S-boxes, defined as
$$\texttt{SB}(v_i) = v_i^{-1}$$
we assume $\texttt{SB}(0) = 0$ here.
It turns out that the S-box with $\alpha=5$ is suitable for two of the most popular prime fields in ZK applications, namely the scalar fields of the BLS12-381 and BN254 curves, so we mainly consider this S-box.
* **MixLayer**, also called linear layer, denoted by $\texttt{M}(\cdot)$. In this layer, we use a Maximum Distance Separable (MDS) matrix $\mathcal{M}$:
$$\mathcal{M} \in F_p^{t \times t}$$
Please refer to [Section 2.3 of the Poseidon paper](https://eprint.iacr.org/2019/458.pdf) for details on how to compute the MDS matrix and on its security criteria.
(TODO: add a bit more description of MDS).
Then:
$$\texttt{M}(\vec{v}) = \vec{v} \times \mathcal{M}$$
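
The three components above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the toy prime `P = 2**31 - 1`, the width `T = 3`, and the function names `arc`, `sbox` and `mix` are assumptions chosen for readability. The exponent $\alpha = 5$ is used because $\gcd(5, P-1) = 1$ for this toy prime, and the identity matrix only stands in for a real MDS matrix.

```python
# Toy parameters (assumptions for illustration, not a real Poseidon instance).
P = 2**31 - 1   # small prime standing in for the field modulus p
ALPHA = 5       # S-box exponent; gcd(5, P - 1) == 1 for this P
T = 3           # state width t


def arc(v, b):
    """AddRoundConstants: add one constant b_i to each state element v_i."""
    return [(x + c) % P for x, c in zip(v, b)]


def sbox(x):
    """alpha-power S-box applied to a single field element."""
    return pow(x, ALPHA, P)


def mix(v, m):
    """MixLayer: row vector times matrix (v x M) over the field."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P for j in range(len(m[0]))]


# The identity matrix is NOT an MDS matrix; it only keeps the example easy to follow.
identity = [[int(i == j) for j in range(T)] for i in range(T)]

state = arc([1, 2, 3], [10, 20, 30])   # add constants
state = [sbox(x) for x in state]       # S-box on every element
state = mix(state, identity)           # linear layer
```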
### Construction of Round Function
Each round consists of the same three steps, built from the components introduced above:
$$ ARC \rightarrow SB \rightarrow M$$
However, the S-box step is not the same in every round. More specifically, there are two kinds of S-box steps:
* **Full S-box step**: apply the S-box function to every field element $v_i$ of the input:
$$ f_{sb}(\vec{v}) = [\texttt{SB}(v_0), \texttt{SB}(v_1), \ldots, \texttt{SB}(v_{t-1})] $$
* **Partial S-box step**: apply the S-box function only to the first element:
$$ f_{sb}(\vec{v}) = [\texttt{SB}(v_0), v_1, \ldots, v_{t-1} ]$$
In total, POSEIDON has $R = 2 R_f + R_P$ rounds: the first $R_f$ rounds use a full S-box step, the following $R_P$ rounds use a partial S-box step, and the last $R_f$ rounds again use a full S-box step.

The intuition is that a full S-box layer provides stronger security than a partial S-box layer, but it also costs more computation because the S-box function is applied to all inputs. The design of Poseidon makes a trade-off between the two.
## Concrete Construction
The Poseidon hash of width $t$ can be understood as a sequence of operations performed on a vector $\vec v \in F^t$ for some finite field $F$ of prime order $p$. These (non-linear) operations require three pieces of data:
* A sequence $b_0, b_1, \ldots b_{m-1}$ of elements of $F$. Here, $m= t \times R$.
* An MDS matrix $\mathcal{M} \in F^{t \times t}$. (The MDS condition is equivalent to having no singular square submatrix. See note below on security.)
* An $\alpha$ parameter. This is either a positive integer relatively prime to $p-1$ or else is equal to $-1$.
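
The condition on $\alpha$ can be checked directly. The sketch below searches for the smallest exponent coprime to $p-1$ and confirms that the resulting S-box is invertible. The helper names are hypothetical, and starting the search at $3$ is an assumption ($\alpha = 1$ trivially satisfies the gcd condition but gives the identity map). The two moduli are the scalar field orders of BN254 and BLS12-381.

```python
from math import gcd

# Scalar field moduli of the two curves discussed in this document.
BN254_R = 21888242871839275222246405745257275088548364400416034343698204186575808495617
BLS12_381_R = 0x73EDA753299D7D483339D80809A1D80553BDA402FFFE5BFEFFFFFFFF00000001


def smallest_alpha(p, start=3):
    """Smallest alpha >= start with gcd(alpha, p - 1) == 1 (hypothetical helper)."""
    alpha = start
    while gcd(alpha, p - 1) != 1:
        alpha += 1
    return alpha


def inverse_exponent(alpha, p):
    """Exponent d with (x^alpha)^d == x for nonzero x, i.e. alpha^-1 mod (p - 1)."""
    return pow(alpha, -1, p - 1)   # needs Python 3.8+


for p in (BN254_R, BLS12_381_R):
    alpha = smallest_alpha(p)
    d = inverse_exponent(alpha, p)
    x = 123456789
    # x -> x^alpha is a permutation of the field, so it can be undone.
    assert pow(pow(x, alpha, p), d, p) == x
```

Both fields have $3 \mid p-1$, which rules out $\alpha = 3$, and $p-1$ is even, which rules out $\alpha = 4$.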
The vector $\vec v$ of initial data will now be acted on in successive rounds that are either *full* or *partial*.
Either sort of round starts with an "add constants" step: in round $k$ (counting from $0$) we take the next $t$ constants from the sequence $(b_i)$ and add these to $\vec v$. Denoting this step as $A$ ("add") we are performing $$ A \vec v := \vec v + (b_{k*t},\ldots, b_{(k+1)*t-1})$$
In a *full round* we take each element $v_0, \ldots, v_{t-1}$ of $\vec v$ and pass it into an "S-box": this is a function $S: F \to F$ of the form $x \mapsto x^\alpha$, where $\alpha$ is the above parameter (chosen to be relatively prime to $p-1$ in order to have this function invertible). In a *partial round* it is only the first entry $v_0$ that is passed into the S-box. That is, a full round consists of $$S_\text{full}: \vec v \mapsto (v_0^\alpha, v_1^\alpha, \ldots v_{t-1}^\alpha)$$ whereas a partial round consists of $$S_\text{partial}: \vec v \mapsto (v_0^\alpha, v_1, \ldots v_{t-1}).$$
Finally, either sort of round ends with a linear "mix layer" consisting of matrix multiplication: $$\vec v \mapsto \vec v \times M$$
Describing these as a composition of functions, a full round is $\vec v\mapsto M S_\text{full} A \vec v$ and a partial round is $\vec v \mapsto M S_\text{partial} A \vec v$.
We will first perform $R_f$ full rounds, then $R_P$ partial rounds, then $R_f$ full rounds. That is, the overall state change that $\vec v$ undergoes is
$$\vec v \mapsto \left(M S_\text{full} A\right)^{R_f} \left(M S_\text{partial} A\right)^{R_P} \left(M S_\text{full} A\right)^{R_f} \vec v $$
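
The full sequence of rounds can be written as a short loop. The following is a minimal sketch under stated assumptions: a toy prime `P`, small round counts, a Cauchy matrix as a stand-in MDS matrix, and made-up round constants (real instances derive their constants and matrix as described in [1]). It only covers the permutation on $\vec v$, not how a hash input is loaded into or read out of the state.

```python
# Toy parameters (assumptions, not a secure Poseidon instance).
P = 2**31 - 1              # field modulus; gcd(5, P - 1) == 1
ALPHA = 5                  # S-box exponent
T, R_F, R_P = 3, 2, 3      # width, full rounds per half, partial rounds
R = 2 * R_F + R_P          # total number of rounds


def cauchy_matrix(t):
    """Cauchy matrix 1 / (x_i + y_j) with x_i = i, y_j = t + j (every square submatrix is invertible)."""
    return [[pow(i + t + j, -1, P) for j in range(t)] for i in range(t)]


def vec_mat(v, m):
    """Row vector times matrix over the field."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P for j in range(len(m[0]))]


def permute(v, b, m):
    """Apply R_F full rounds, R_P partial rounds, then R_F full rounds."""
    for k in range(R):
        # A: add the next t constants b_{k*t} .. b_{(k+1)*t - 1}
        v = [(x + c) % P for x, c in zip(v, b[k * T:(k + 1) * T])]
        if R_F <= k < R_F + R_P:
            v = [pow(v[0], ALPHA, P)] + v[1:]      # S_partial
        else:
            v = [pow(x, ALPHA, P) for x in v]      # S_full
        v = vec_mat(v, m)                          # mix layer
    return v


M = cauchy_matrix(T)
B = [(7 * i * i + 3 * i + 1) % P for i in range(T * R)]   # made-up constants
out = permute([1, 2, 3], B, M)
```

Because every step is a bijection on $F^t$, two different inputs always give two different outputs.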
## Poseidon Hash Parameter Selection
In Poseidon hash, several hyper-parameters are fixed in advance, independent of the input to the hash function.
In particular, there are three sets of hyper-parameters:
* Round Constants
* $\alpha$ in the S-box
* Maximum Distance Separable (MDS) matrix
Please refer to [3] for examples.
## Optimized Poseidon Hash
Credit to Filecoin (https://github.com/filecoin-project/neptune).
Filecoin designed an optimized Poseidon hash [4] to reduce the number of constraints. Given the same input, this optimized Poseidon hash produces exactly the same output as the unoptimized Poseidon hash described in the Poseidon paper [1]. At a high level, the optimized Poseidon hash reduces the number of round constants and converts the dense MDS matrix into sparse matrices to reduce constraints for partial rounds.
We first describe the generation of optimized round constants and sparse MDS matrices. Then we present how to compute optimized Poseidon hash.
### Optimized Round Constants
**Input**:
* RoundConstants $\in Z^{[2tR_f + tR_P]}_p$.
These are the round constants of the unoptimized Poseidon hash. $p$ is the prime field modulus, $2R_f+R_P$ is the total number of rounds, and $t$ is the input width (*i.e.*, len(hash input) + 1).
**Output**:
* RoundConstants' $\in Z^{[2tR_f+R_P]}_p$
These are the round constants of the optimized Poseidon hash.
We note that each partial round uses only 1 round constant, which is significantly fewer than the $t$ constants per round in the unoptimized Poseidon hash.
**Intuitive Example for Full Round**:
In full rounds, we have:
$$ARC_1 \rightarrow SB \rightarrow M \rightarrow ARC_2 \rightarrow SB \rightarrow \cdots$$
Suppose the output of the $SB$ operation is $\vec v$ (a vector of length $t$). The output of the $M$ operation is then $\vec v \times M$, and the output of $ARC_2$ is
$$\vec v \times M + RoundConstant_2$$
Here, $RoundConstant_2$ is a vector of length $t$ and $+$ indicates element-wise addition. Since $M$ is invertible, this is equal to
$$(\vec v + RoundConstant'_2) \times M$$
where
$$RoundConstant'_2 = RoundConstant_2 \times M^{-1}$$
We note that $M$ and $RoundConstant_2$ are both public, so $RoundConstant'_2$ is also public and can be precomputed.
We define $ARC'_2$ as:
$$ARC'_2 (\vec v) = \vec v + RoundConstant'_2$$
Now we can rewrite the full round computation as
$$ARC_1 \rightarrow SB \rightarrow ARC'_2 \rightarrow M \rightarrow SB \rightarrow \cdots$$
We stress that the first $ARC_1$ is not changed here.
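
The identity used in this example, $\vec v \times M + RoundConstant_2 = (\vec v + RoundConstant_2 \times M^{-1}) \times M$, can be checked numerically. The sketch below assumes a toy prime and a $3 \times 3$ Cauchy matrix in place of a real MDS matrix; `mat_inv` is a plain Gauss-Jordan inversion written for this illustration.

```python
P = 2**31 - 1   # toy field modulus (assumption)
T = 3


def vec_mat(v, m):
    """Row vector times matrix over the field."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P for j in range(len(m[0]))]


def mat_inv(m):
    """Inverse of a square matrix over the field via Gauss-Jordan elimination."""
    n = len(m)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        scale = pow(a[col][col], -1, P)
        a[col] = [x * scale % P for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


M = [[pow(i + T + j, -1, P) for j in range(T)] for i in range(T)]   # stand-in MDS matrix
v = [5, 6, 7]          # output of the SB step
rc2 = [11, 22, 33]     # made-up RoundConstant_2

rc2_prime = vec_mat(rc2, mat_inv(M))                                  # RoundConstant'_2
before = [(x + c) % P for x, c in zip(vec_mat(v, M), rc2)]            # M, then ARC_2
after = vec_mat([(x + c) % P for x, c in zip(v, rc2_prime)], M)       # ARC'_2, then M
```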
**Intuitive Example for Partial Round**:
The key property of a partial round is that the S-box is applied only to the first element instead of all $t$ elements. By exploiting this property, we can fuse the round constants for the $i$-th elements ($1\le i \le t-1$) across multiple rounds into the constants of a single round.
Let's first describe an unoptimized three-round example, where the first round is a full round and the last two rounds are partial rounds.
$$ARC_1 \rightarrow SB_1 \rightarrow M_1 \rightarrow ARC_2 \rightarrow SB_2 \rightarrow M_2 \rightarrow ARC_3 \rightarrow SB_3 \rightarrow M_3$$
While $M_1$, $M_2$, and $M_3$ are the same matrix, we use different subscripts to denote the computation in different rounds. Denote the round constants in $ARC_i$ as $RoundConstants_i = [rc_{i,0}, rc_{i,1}, ..., rc_{i,t-1}]$ which are vectors of length $t$. Denote the hash input as $\vec v=[v_0, v_1, ..., v_{t-1}]$. We have
$$ARC_1: \vec v + RoundConstants_1= [v_0+rc_{1,0}, v_1+rc_{1,1}, \cdots, v_{t-1}+rc_{1,t-1}] $$
$$SB_1: [(v_0+rc_{1,0})^\alpha, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]$$
$$M_1: [(v_0+rc_{1,0})^\alpha, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha] \times M$$
Let us denote the output of M in first round as $\vec u = [u_0, u_1, ..., u_{t-1}]$.
$$ARC_2: \vec u + RoundConstants_2= [u_0+rc_{2,0}, u_1+rc_{2,1}, \cdots, u_{t-1}+rc_{2,t-1}] $$
$$SB_2: [(u_0+rc_{2,0})^\alpha, u_1+rc_{2,1}, \cdots, u_{t-1}+rc_{2,t-1}]$$
$$M_2: [(u_0+rc_{2,0})^\alpha, u_1+rc_{2,1}, \cdots, u_{t-1}+rc_{2,t-1}] \times M$$
Let us denote the output of M in the second round as $\vec w = [w_0, w_1, ..., w_{t-1}]$.
$$ARC_3: \vec w + RoundConstants_3 = [w_0+rc_{3,0}, w_1+rc_{3,1}, \cdots, w_{t-1}+rc_{3,t-1}] $$
$$SB_3: [(w_0+rc_{3,0})^\alpha, w_1+rc_{3,1}, \cdots, w_{t-1}+rc_{3,t-1}]$$
$$M_3: [(w_0+rc_{3,0})^\alpha, w_1+rc_{3,1}, \cdots, w_{t-1}+rc_{3,t-1}] \times M$$
Now, we will show the optimized poseidon hash for these three rounds. At a high-level, we will start from the round constants of the third round and reverse-propagate the round constants to the first round.
Let's denote $acc_3 = RoundConstants_3=[rc_{3,0}, rc_{3,1}, ..., rc_{3,t-1}]$. $ARC_3$ can be written as $\vec w + acc_3$.
Let's denote $\vec u = u_{[0]} + u_{[1:]}$, where $u_{[1:]} = [0, u_1, u_2, ..., u_{t-1}]$, $u_{[0]} = [u_0, 0,0,..., 0]$. Denote $RoundConstants_i = RC_{i, [0]} + RC_{i, [1:]}$ where $RC_{i,[1:]} = [0, rc_{i,1}, rc_{i,2}, ..., rc_{i,t-1}]$ and $RC_{i,[0]} = [rc_{i,0}, 0,...,0]$.
Then, we have
$$ARC_2: (u_{[0]} + RC_{2,[0]}) + (u_{[1:]} + RC_{2, [1:]})$$
$$SB_2: (u_{[0]}+RC_{2,[0]})^\alpha + (u_{[1:]} + RC_{2, [1:]})$$
$$M_2: \vec w = [(u_{[0]}+RC_{2,[0]})^\alpha + (u_{[1:]} + RC_{2, [1:]})] \times M$$
For notational simplicity, we use $(u_{[0]}+RC_{2,[0]})^\alpha$ to denote the power computation applied to each element.
Combining $ARC_2$, $SB_2$, $M_2$ with $ARC_3$, we have
$$\vec w + acc_3 = [(u_{[0]}+RC_{2,[0]})^\alpha + (u_{[1:]} + RC_{2, [1:]})] \times M + acc_3$$
Since $M$ is invertible, we can define $acc'_3=acc_3 \times M^{-1}$. So we have
$$\vec w + acc_3 = [(u_{[0]}+RC_{2,[0]})^\alpha + (u_{[1:]} + RC_{2, [1:]}) + acc'_3] \times M$$
Writing $acc'_3 = acc'_{3,[0]} + acc'_{3,[1:]}$, we have
$$\vec w + acc_3 = [((u_{[0]}+RC_{2,[0]})^\alpha+acc'_{3,[0]}) + (u_{[1:]} + RC_{2, [1:]} + acc'_{3,[1:]})] \times M$$
Let's denote $partial\_consts_3 = acc'_{3,[0]}$, $acc_2 = acc_{2,[0]} + acc_{2,[1:]}$ where $acc_{2,[0]} = RC_{2,[0]}$, and $acc_{2,[1:]} = RC_{2,[1:]}+acc'_{3,[1:]}$. We have
$$\vec w + acc_3 = [((u_{[0]} + acc_{2,[0]})^\alpha + partial\_consts_3) + (u_{[1:]} + acc_{2,[1:]})] \times M$$
Let us denote $\tilde{u} = \vec u+acc_2 = \tilde{u}_{[0]} + \tilde{u}_{[1:]} = (u_{[0]} + acc_{2,[0]}) + (u_{[1:]} + acc_{2,[1:]})$.
We have
$$\vec w + acc_3 = [((\tilde{u}_{[0]})^\alpha + partial\_consts_3) + \tilde{u}_{[1:]}] \times M$$
Now, we have rewritten $ARC_2 \rightarrow SB_2 \rightarrow M_2 \rightarrow ARC_3$ as $ARC'_2 \rightarrow SB_2 \rightarrow ARC'_3 \rightarrow M_2$ where
$$ARC'_2: \vec u+acc_2= (u_{[0]}+acc_{2,[0]})+(u_{[1:]} + acc_{2,[1:]})$$
$$SB_2: (u_{[0]}+acc_{2,[0]})^\alpha +(u_{[1:]} + acc_{2,[1:]})$$
$$ARC'_3:((u_{[0]}+acc_{2,[0]})^\alpha+partial\_consts_3) +(u_{[1:]} + acc_{2,[1:]})$$
$$M_2: [((u_{[0]}+acc_{2,[0]})^\alpha+partial\_consts_3) +(u_{[1:]} + acc_{2,[1:]})] \times M$$
We note that, in $ARC'_3$, we need only $1$ addition instead of $t$ additions since only $1$ element in $partial\_consts_3$ is non-zero.
We can observe that $ARC'_2$ is $\vec u+acc_2$, which has the same form as $ARC_3$ (that is, $\vec w + acc_3$). So we can similarly fuse $ARC'_2$ with $ARC_1$, $SB_1$ and $M_1$.
Before optimization:
$$ARC_1 \rightarrow SB_1 \rightarrow M_1 \rightarrow ARC_2 \rightarrow SB_2 \rightarrow M_2 \rightarrow ARC_3 \rightarrow SB_3 \rightarrow M_3$$
Step 1:
$$ARC_1 \rightarrow SB_1 \rightarrow M_1 \rightarrow ARC_2 \rightarrow SB_2 \rightarrow ARC_3' \rightarrow M_2 \rightarrow SB_3 \rightarrow M_3$$
Step 2:
$$ARC_1 \rightarrow SB_1 \rightarrow M_1 \rightarrow ARC'_2 \rightarrow SB_2 \rightarrow ARC_3'' \rightarrow M_2 \rightarrow SB_3 \rightarrow M_3$$
Step 3:
$$ARC_1 \rightarrow SB_1 \rightarrow ARC''_2 \rightarrow M_1 \rightarrow SB_2 \rightarrow ARC_3'' \rightarrow M_2 \rightarrow SB_3 \rightarrow M_3$$
**Algorithm**:
This is the algorithm to generate the optimized round constants.

Notes [4]:
* $\times$ denotes a row vector-matrix multiplication which outputs a row vector.
* **Line 2**. The first $t$ round constants are unchanged. Note that both $RoundConstants'_0$ and $RoundConstants'_1$ are used in the first optimized round $r=0$.
* **Lines 3-4**. For each first-half full round, transform the round constants into $RoundConstants_r \times M^{-1}$. For correctness, please refer to **Intuitive Example for Full Round**.
* **Line 5**. Create a variable to store the round constants for the partial rounds partial_consts (in reverse order).
* **Line 6**. Create and initialize a variable $acc$ that is transformed and added to $RoundConstants_r$ in each do loop iteration.
* **Line 7-11**. For each partial round $r$ (starting from the greatest partial round index $R_f + R_P-1$ and proceeding to the least $R_f$) transform $acc$ into $acc \times M^{-1}$, take its first element as a partial round constant, then perform element-wise addition with $RoundConstants_r$. For correctness, please refer to **Intuitive Example for Partial Round**. Following that example, the value of $acc$ at the end of the $i^{th}$ loop iteration is:
$$acc_i = RoundConstants_r + [0, (acc_{i-1} \times M^{-1})_{[1:]}]$$
* **Line 12**. Set the last first-half full round's constants using the final value of $acc$.
* **Line 13**. Set the partial round constants.
* **Line 14-15**. Set the remaining full round constants.
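
The notes above can be turned into a small end-to-end check. The sketch below is a hedged reconstruction written from the notes and the two intuitive examples, not a copy of the algorithm in [4]: the function names, the tuple layout of the optimized constants, the toy prime, the Cauchy stand-in for the MDS matrix, and the made-up round constants are all assumptions. It derives the optimized constants and confirms that the optimized round schedule gives the same output as the unoptimized one.

```python
P, ALPHA = 2**31 - 1, 5          # toy field and S-box exponent (assumptions)
T, R_F, R_P = 3, 2, 3            # width, full rounds per half, partial rounds
R = 2 * R_F + R_P


def vec_mat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P for j in range(len(m[0]))]


def vec_add(a, b):
    return [(x + y) % P for x, y in zip(a, b)]


def mat_inv(m):
    """Gauss-Jordan inverse over the field."""
    n = len(m)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        scale = pow(a[col][col], -1, P)
        a[col] = [x * scale % P for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def full_sbox(v):
    return [pow(x, ALPHA, P) for x in v]


def permute_reference(v, rc, m):
    """Unoptimized schedule: ARC -> SB -> M in every round."""
    for r in range(R):
        v = vec_add(v, rc[r])
        v = [pow(v[0], ALPHA, P)] + v[1:] if R_F <= r < R_F + R_P else full_sbox(v)
        v = vec_mat(v, m)
    return v


def optimize_round_constants(rc, m_inv):
    """Returns (initial, first_half, partial, second_half) optimized constants."""
    first_half = [vec_mat(rc[r], m_inv) for r in range(1, R_F)]
    acc = rc[R_F + R_P]                       # constants of the first second-half full round
    partial = []
    for r in range(R_F + R_P - 1, R_F - 1, -1):
        moved = vec_mat(acc, m_inv)           # push acc back through M
        partial.append(moved[0])              # first element stays in round r
        acc = vec_add(rc[r], [0] + moved[1:]) # the rest merges with round r's constants
    first_half.append(vec_mat(acc, m_inv))    # last first-half full round
    partial.reverse()
    second_half = [vec_mat(rc[r], m_inv) for r in range(R_F + R_P + 1, R)]
    return rc[0], first_half, partial, second_half


def permute_optimized(v, consts, m):
    """Optimized schedule: constants are added after the S-box, before M."""
    initial, first_half, partial, second_half = consts
    v = vec_add(v, initial)
    for c in first_half:
        v = vec_mat(vec_add(full_sbox(v), c), m)
    for c in partial:
        v = vec_mat([(pow(v[0], ALPHA, P) + c) % P] + v[1:], m)
    for c in second_half + [[0] * T]:         # the final round adds nothing
        v = vec_mat(vec_add(full_sbox(v), c), m)
    return v


M = [[pow(i + T + j, -1, P) for j in range(T)] for i in range(T)]
RC = [[(7 * (r * T + i) ** 2 + 1) % P for i in range(T)] for r in range(R)]
OPT = optimize_round_constants(RC, mat_inv(M))
```

With these toy parameters the optimized schedule stores $2tR_f + R_P = 15$ constants instead of $t(2R_f + R_P) = 21$. The dense matrix $M$ is still used in every round here; the sparse factorization is a separate step.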
### Optimized MDS Matrices
**Input**:
* MDS matrix $M \in Z_p^{[t\times t]}$
$t$ is the input width (*i.e.*, len(hash input)+1).
**Output**:
* Pre-sparse matrix (a non-sparse matrix) $P \in Z_p^{[t\times t]}$.
Pre-sparse matrix $P$ is used in MDS mixing for the last full round of the first half ($r=R_f-1$). Like the MDS matrix $M$, the pre-sparse matrix $P$ is symmetric.
* A sequence of sparse matrices $S \in Z_p^{[t\times t]^{[R_p]}}$.
The array of sparse matrices that $M$ is factored into, which are used for MDS mixing in the optimized partial rounds.
**Note:**
This optimization on MDS can only be applied if the optimization on round constants has been applied. Otherwise the computation results with and without optimized MDS are not the same.
**Intuitive Example**
Optimized MDS reduces computation for partial rounds and does not change computation in full rounds (except the last round in the first-half full rounds).
To provide an intuitive example, we will first show the computation of a two-round poseidon hash without optimized MDS. Then, we will show the computation of a two-round poseidon hash with optimized MDS. Finally, we will mathematically show that the computation results of these two hashes are the same.
**Without Optimized MDS**
Consider a two-round Poseidon hash with optimized round constants and without optimized MDS.
Suppose the first round is a full round and the second round is a partial round.
$$ARC_1 \rightarrow SB_1 \rightarrow ARC_2 \rightarrow M_1 \rightarrow SB_2 \rightarrow ARC_3 \rightarrow M_2$$
While $M_1$ and $M_2$ are the same matrix, we use different subscripts to denote the computation in different rounds.
Denote the round constants in $ARC_1$ as $RoundConstants_1 = [rc_{1,0}, rc_{1,1}, ..., rc_{1,t-1}]$. Denote the round constants in $ARC_2$ as $RoundConstants_2 = [rc_{2,0}, 0, ..., 0]$. The last $t-1$ elements are zero in $ARC_2$ since it is optimized for the partial round. This property is the key to proving the correctness of the optimized MDS.
Similarly, the round constants in $ARC_3$ are $RoundConstants_3 = [rc_{3,0}, 0,...,0]$. We include $ARC_3$ to prove the correctness of optimized MDS in the general case where more rounds follow the second (partial) round.
Denote the hash input as $\vec v=[v_0, v_1, ..., v_{t-1}]$. We have
$$ARC_1: [v_0+rc_{1,0}, v_1+rc_{1,1}, \cdots, v_{t-1}+rc_{1,t-1}] $$
$$SB_1: [(v_0+rc_{1,0})^\alpha, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]$$
$$ARC_2:[(v_0+rc_{1,0})^\alpha+rc_{2,0}, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha] $$
$$M_1: [(v_0+rc_{1,0})^\alpha+rc_{2,0}, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]M$$
Let us denote the output of $M_1$ as $\vec u = [u_0, u_1, ..., u_{t-1}]$.
$$SB_2: [u_0^\alpha, u_1, \cdots, u_{t-1}]$$
$$ARC_3: [u_0^\alpha + rc_{3,0}, u_1, \cdots, u_{t-1}]$$
$$M_2: [u_0^\alpha + rc_{3,0}, u_1, \cdots, u_{t-1}]M$$
**With Optimized MDS**
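Now, we will show the optimized Poseidon hash for these two rounds. Let's first split $M$ into $m'$ and $m''$ such that $M = m' \times m''$. Write $M$ in block form, where $m_{0,0}$ is a scalar, $m_{0,[1:]} = [m_{0,1}, ..., m_{0,t-1}]$ is a row vector, $w = [m_{1,0}, ..., m_{t-1,0}]^T$ is a column vector, and $\hat m$ is the lower-right $(t-1)\times(t-1)$ submatrix of $M$:
$$M = \left[
\begin{array}{c|c}
m_{0,0} & m_{0,[1:]} \\ \hline
w & \hat m
\end{array}\right], \quad
m' = \left[
\begin{array}{c|c}
1 & 0 \\ \hline
0 & \hat m
\end{array}\right], \quad
m'' = \left[
\begin{array}{c|c}
m_{0,0} & m_{0,[1:]} \\ \hline
\hat m^{-1} \times w & I_{t-1}
\end{array}\right]$$
Let $S=m''$, $P=M\times m'$, where $\times$ is matrix multiplication.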
The first full round is rewritten as
$$ARC_1: [v_0+rc_{1,0}, v_1+rc_{1,1}, \cdots, v_{t-1}+rc_{1,t-1}] $$
$$SB_1: [(v_0+rc_{1,0})^\alpha, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]$$
$$ARC_2:[(v_0+rc_{1,0})^\alpha+rc_{2,0}, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha] $$
$$M'_1: [(v_0+rc_{1,0})^\alpha+rc_{2,0}, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]P$$
Let's denote
$$\vec u = [(v_0+rc_{1,0})^\alpha+rc_{2,0}, (v_1+rc_{1,1})^\alpha, \cdots, (v_{t-1}+rc_{1,t-1})^\alpha]M$$
Then, the output of $M_1'$ is $\vec u' = \vec u \times m' = [u'_0, u'_1, ..., u'_{t-1}]$.
The second (partial) round is rewritten as
$$SB_2': [u_0'^\alpha, u_1', \cdots, u_{t-1}']$$
$$ARC_3': [u_0'^\alpha+rc_{3,0}, u_1', \cdots, u_{t-1}']$$
$$M_2': [u_0'^\alpha+rc_{3,0}, u_1', \cdots, u_{t-1}']S$$
**Proof of correctness**
We want to show that the output of $M_2'$ in optimized MDS (*i.e.*, $[u_0'^\alpha+rc_{3,0}, u_1', \cdots, u_{t-1}']S$) is the same as the output of $M_2$ in unoptimized MDS (*i.e.*, $[u_0^\alpha + rc_{3,0}, u_1, \cdots, u_{t-1}]M$).
Denote $\vec u = [u_0, u_{[1:]}]$ where $u_0$ is a scalar and $u_{[1:]} = [u_1, u_2, ..., u_{t-1}]$ is a vector.
Then, we have:
$$M_1': \vec u \times m' \\
= [u_0, u_{[1:]}] \times \left[
\begin{array}{c|c}
1 & 0 \\ \hline
0 & \hat m
\end{array}\right]\\
=[u_0, u_{[1:]}\hat m]$$
$$SB_2': [u_0^\alpha, u_{[1:]}\hat m]$$
$$ARC_3': [u_0^\alpha+rc_{3,0}, u_{[1:]}\hat m]$$
$$M_2' = [u_0^\alpha + rc_{3,0}, u_{[1:]}\hat m] \times S\\
= [u_0^\alpha + rc_{3,0}, u_{[1:]}\hat m] \times \left[
\begin{array}{c|c}
m_{0,0} & m_{0,[1:]} \\ \hline
\hat m^{-1} \times w & I_{t-1}
\end{array}\right]\\
= [m_{0,0}(u_0^\alpha + rc_{3,0})+u_{[1:]}\times \hat m \times \hat m^{-1} \times w, \\
(u_0^\alpha + rc_{3,0})m_{0,[1:]}+u_{[1:]}\hat m] \\
=[I, II]$$
$$I = m_{0,0}(u_0^\alpha + rc_{3,0})+u_{[1:]}\times[m_{1,0}, m_{2,0}, ..., m_{t-1,0}]^T \\
= m_{0,0}(u_0^\alpha + rc_{3,0})+\sum_{i=1}^{t-1}m_{i,0}u_i$$
$$II = (u_0^\alpha + rc_{3,0})[m_{0,1}, ..., m_{0,t-1}]+[u_1, ..., u_{t-1}] \times \left[\begin{array}{ccc}
m_{1,1} & \cdots & m_{1,t-1} \\
\cdots & \cdots & \cdots \\
m_{t-1,1} & \cdots & m_{t-1,t-1}
\end{array}\right] \\
=[m_{0,1}(u_0^\alpha + rc_{3,0})+\sum_{i=1}^{t-1}m_{i,1}u_i, ..., m_{0,t-1}(u_0^\alpha + rc_{3,0})+\sum_{i=1}^{t-1}m_{i,t-1}u_i]$$
Combining $I$ and $II$, we have
$$M_2' = [u_0^\alpha + rc_{3,0}, u_1, ..., u_{t-1}] \times M$$
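
The proof can be replayed numerically for a small case. The sketch below builds $m'$ and $m''$ from the block structure used in the proof (with $\hat m$ the lower-right submatrix of $M$ and $w$ the first column of $M$ below $m_{0,0}$), then compares the two-round computation with and without the optimized matrices. The toy prime, the Cauchy stand-in for $M$, the sample state and constant, and the names `factor`, `PRE` and `partial_step` are assumptions for illustration; `PRE` plays the role of the pre-sparse matrix $P$, renamed only to avoid clashing with the modulus.

```python
P, ALPHA, T = 2**31 - 1, 5, 3    # toy field, S-box exponent, width (assumptions)


def vec_mat(v, m):
    return [sum(v[i] * m[i][j] for i in range(len(v))) % P for j in range(len(m[0]))]


def mat_mul(a, b):
    return [vec_mat(row, b) for row in a]


def mat_inv(m):
    """Gauss-Jordan inverse over the field."""
    n = len(m)
    a = [list(row) + [int(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] % P)
        a[col], a[pivot] = a[pivot], a[col]
        scale = pow(a[col][col], -1, P)
        a[col] = [x * scale % P for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def factor(m):
    """Split M into m' = [[1, 0], [0, m_hat]] and m'' = [[m00, m0[1:]], [m_hat^-1 w, I]]."""
    t = len(m)
    m_hat = [row[1:] for row in m[1:]]
    w = [row[0] for row in m[1:]]
    m_hat_inv = mat_inv(m_hat)
    w_hat = [sum(m_hat_inv[i][j] * w[j] for j in range(t - 1)) % P for i in range(t - 1)]
    m_prime = [[1] + [0] * (t - 1)] + [[0] + row for row in m_hat]
    m_dprime = [list(m[0])] + [[w_hat[i]] + [int(i == j) for j in range(t - 1)]
                               for i in range(t - 1)]
    return m_prime, m_dprime


def partial_step(u, rc30, mat):
    """SB on the first element, add rc_{3,0} to it, then multiply by mat."""
    return vec_mat([(pow(u[0], ALPHA, P) + rc30) % P] + u[1:], mat)


M = [[pow(i + T + j, -1, P) for j in range(T)] for i in range(T)]
m_prime, S = factor(M)           # S = m''
PRE = mat_mul(M, m_prime)        # pre-sparse matrix P = M x m'

x = [4, 5, 6]                    # sample state entering M_1 (after ARC_2)
rc30 = 9                         # sample rc_{3,0}
unoptimized = partial_step(vec_mat(x, M), rc30, M)      # M_1, SB_2, ARC_3, M_2
optimized = partial_step(vec_mat(x, PRE), rc30, S)      # M_1', SB_2', ARC_3', M_2'
```

Only the first row and first column of $S$ differ from the identity, which is where the saving in the partial rounds comes from.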
**Algorithm**
This is the algorithm to generate $P$ and $S$.

## Future Optimizations
In this section, we list potential optimizations to reduce the number of constraints in Poseidon hash. Please feel free to add more.
* In MixLayer, we can reduce constraints for the MDS multiplication by combining a multiplication and an addition from 2 constraints into 1 constraint.
* In MixLayer, we can use the Strassen algorithm to reduce constraints for matrix multiplication.
* For the S-box function, we may use a customized gate to improve performance.

## Reference
[1] Poseidon: A New Hash Function for Zero-Knowledge Proof Systems. USENIX Security'21.
[2] Webb's implementation on Poseidon hash. https://github.com/webb-tools/arkworks-gadgets/blob/master/arkworks-plonk-circuits/src/poseidon/poseidon.rs
[3] Examples of hyperparameters in Poseidon Hash. https://github.com/webb-tools/arkworks-gadgets/tree/master/arkworks-utils/src/utils
[4] Filecoin's optimized poseidon. https://github.com/filecoin-project/neptune