# Summa 2.0 aka SNARKless Proof Of Solvency for CEX
## Preliminary Notions
Roots of unity in modular arithmetic (or within finite fields) play a significant role in cryptography in general and are crucial for the Summa V2. Let's break down the idea of $n$-th roots of unity in modular rings.
### Modular Arithmetic
An intuitive example of modular arithmetic from our daily life is the "clock arithmetic". When we see "19:00" boarding time on a boarding pass, we know that it corresponds to "7" on the clock face. Formally, in this case we perform the modular reduction by the modulus 12:
$19 \equiv 7 \pmod{12}$,
because the clock face is only marked from 1 to 12.
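The same reduction can be checked with Python's `%` operator. This is only a minimal illustration of the clock example above; the variable names are made up for the sketch.

```python
# Reduce the 24-hour boarding time modulo 12, as on a clock face.
hour_24 = 19
hour_12 = hour_24 % 12
print(hour_12)  # 7
```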
### Roots Of Unity in Modular Arithmetic
An integer $\omega$ is an $n$-th root of unity modulo $p$ if:
$\omega^n \equiv 1 \pmod{p}$.
It is a *primitive* $n$-th root of unity if, additionally, $\omega^k \not\equiv 1 \pmod{p}$ for any $0 < k < n$.
In other words, for a primitive root, $n$ is the smallest positive exponent such that $\omega^n$ "wraps around" due to modular reduction to yield exactly $1$. Positive exponents smaller than $n$ don't yield $1$ modulo $p$.
### Roots Of Unity Example
Let's observe a finite field of order $p = 7$: $\mathbb{F}_7 = \{0,1,...,6\}$. Let's see that $2$ and $4$ are the primitive 3rd roots of unity in such a field:
- $2^3 \equiv 8 \equiv 1 \pmod{7}$, so 2 is a 3rd root of unity modulo 7.
- $4^3 \equiv 64 \equiv 1 \pmod{7}$, so 4 is another 3rd root of unity modulo 7.
- $1$ itself is a trivial (non-primitive) 3rd root of unity, too, since $1^3 = 1$.
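The example can be reproduced by brute force. The sketch below assumes a small prime modulus so that trying every element is cheap; `multiplicative_order` is a helper name introduced here for illustration.

```python
def multiplicative_order(a: int, p: int) -> int:
    """Smallest k > 0 with a^k ≡ 1 (mod p); assumes p is prime and a is non-zero modulo p."""
    if a % p == 0:
        raise ValueError("a must be non-zero modulo p")
    k, power = 1, a % p
    while power != 1:
        power = (power * a) % p
        k += 1
    return k

p, n = 7, 3
# All 3rd roots of unity modulo 7: elements whose cube is 1.
roots = [a for a in range(1, p) if pow(a, n, p) == 1]
# Primitive ones: no smaller positive power equals 1.
primitive = [a for a in roots if multiplicative_order(a, p) == n]
print(roots)      # [1, 2, 4]
print(primitive)  # [2, 4]
```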
### Special Property of The Sum of Roots of Unity
Let's consider a finite field $\mathbb{F}_q$ that has $n$-th roots of unity. Let $\omega$ be a primitive $n$-th root of unity in $\mathbb{F}_q$, which means $\omega^n = 1$ and no smaller positive power of $\omega$ equals 1.
The $n$-th roots of unity in $\mathbb{F}_q$ are $1, \omega, \omega^2, \ldots, \omega^{n-1}$.
**Claim**: $1 + \omega + \omega^2 + \ldots + \omega^{n-1} = 0$ (**the sum of all the roots of unity in a finite field is equal to zero**).
**Proof**:
Consider the sum $S = 1 + \omega + \omega^2 + \ldots + \omega^{n-1}$. We can multiply $S$ by $\omega - 1$, noting that $\omega - 1 \neq 0$ because $\omega$ is primitive and $n > 1$ (so $\omega \neq 1$):
$(\omega - 1)S = (\omega - 1)(1 + \omega + \omega^2 + \ldots + \omega^{n-1})$
Expanding the right hand side, we get:
$\omega + \omega^2 + \omega^3 + \ldots + \omega^n - (1 + \omega + \omega^2 + \ldots + \omega^{n-1})$
Notice that if we were to expand further, every term except $\omega^n$ and $-1$ would cancel out:
$(\omega - 1)S = \omega^n - 1$
Since $\omega$ is a primitive $n$-th root of unity, $\omega^n = 1$. So, $\omega^n - 1 = 0$. Therefore:
$(\omega - 1)S = 0$
If the product of two factors is zero, at least one of them must be zero. We already established that $\omega - 1 \neq 0$, thus $S$ must be zero:
$S = 0$
Therefore, $\boxed{1 + \omega + \omega^2 + \ldots + \omega^{n-1} = 0}$.
Let's also check it on our previous toy example of $\mathbb{F}_7$ and $n = 3$:
$1 + 2 + 4 = 7 \equiv 0 \pmod{7}$.
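The claim is easy to check numerically. The sketch below repeats the toy example and adds a second one that is not from the text ($p = 17$, $n = 8$, $\omega = 2$, chosen here because $2$ has order $8$ modulo $17$). It also checks the sums of the $k$-th powers of the roots for $0 < k < n$, which vanish by the same argument.

```python
def roots_of_unity(omega: int, n: int, p: int) -> list[int]:
    """Return [omega^0, omega^1, ..., omega^(n-1)] modulo p."""
    return [pow(omega, i, p) for i in range(n)]

# Toy example from the text: p = 7, n = 3, omega = 2.
toy_sum = sum(roots_of_unity(2, 3, 7)) % 7

# Assumed second example: 2 is a primitive 8th root of unity modulo 17.
P, N, OMEGA = 17, 8, 2
bigger_sum = sum(roots_of_unity(OMEGA, N, P)) % P

# Sums of the k-th powers of all the roots, for every 0 < k < N.
power_sums = [
    sum(pow(r, k, P) for r in roots_of_unity(OMEGA, N, P)) % P
    for k in range(1, N)
]
print(toy_sum, bigger_sum, power_sums)
```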
## Summa V2
Let's see how we can take advantage of *the sum of all roots of unity being zero* when applied to the proof of solvency.
### Data Structure & Commitment Scheme
The desired commitment scheme for Summa should have the following properties:
* Committing to the total liabilities of the Custodian that is the sum of all user balances;
* Allowing to publicly reveal the total liabilities;
* Allowing to prove the individual user inclusion into the commitment;
* Preserving the user privacy and hiding the user data (namely the user cryptocurrency balances);
* Outperforming the Merkle sum tree in both the commitment phase and the proving phase[^1].
[^1]: ***TODO*** add a "Motivation" section that explains why Merkle sum tree (MST) in Summa V1 was not sufficient and we have been searching for a better solution. In brief, MST involves hashing and contains $n \log n$ entries, making it computationally demanding. In Summa V1 the MST inclusion proofs have to be wrapped inside a ZK-SNARK, so it is also computationally demanding to generate all of them at once for the entire user base of the Custodian that can be on the order of hundreds of millions of users.
We will demonstrate how a polynomial commitment can be used to achieve all of these properties.
Let's consider a polynomial $B(X)$ that evaluates to the $i$-th user's balance $b_i$ at the "user's point", some value $x_i$ that is designated for this specific user:
$B(x_i) = b_i$.
We can call it a user balance polynomial. It is quite easy to construct such a polynomial using the Lagrange interpolation. The formula for the polynomial that interpolates these data points is:
$B(X) = \sum_{i=1}^{n} b_i \cdot L_i(X)$
Where $L_i(X)$ is the Lagrange basis polynomial defined as:
$L_i(X) = \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{X - x_j}{x_i - x_j}$
A polynomial constructed using the Lagrange interpolation is known to have a degree of at most $d = n - 1$, where $n$ is the number of users (and therefore the number of balance evaluation points). The resulting polynomial should look like the following:
$\boxed{B(X) = a_0 + a_1X + a_2X^2 + ... + a_{d} X^{d}}$
Let's choose the $x_i$ values as the $i$-th powers of an $n$-th primitive root of unity (assuming that we are performing all the calculations in a prime field with a sufficiently large modulus):
$\boxed{B(\omega^i) = b_i}$, where $\omega$ is the $n$-th primitive root of unity.
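The construction can be sketched with a naive Lagrange interpolation over a toy prime field. The modulus $17$, the root $\omega = 4$ (which has order $4$ modulo $17$) and the balances are assumptions made up for this illustration, and users are indexed from $0$ to $n - 1$ for convenience.

```python
P = 17                   # assumed toy prime modulus
OMEGA = 4                # primitive 4th root of unity modulo 17
BALANCES = [3, 5, 2, 4]  # made-up user balances b_0..b_3

def poly_mul(a, b, p):
    """Multiply two polynomials given as ascending coefficient lists, modulo p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

def lagrange_interpolate(xs, ys, p):
    """Coefficients a_0..a_{n-1} of the polynomial with B(xs[i]) = ys[i] modulo p."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j != i:
                basis = poly_mul(basis, [(-xs[j]) % p, 1], p)  # times (X - x_j)
                denom = denom * (xs[i] - xs[j]) % p
        scale = ys[i] * pow(denom, -1, p) % p  # modular inverse, Python 3.8+
        for k in range(n):
            coeffs[k] = (coeffs[k] + scale * basis[k]) % p
    return coeffs

def evaluate(coeffs, x, p):
    """Horner evaluation of the polynomial at x, modulo p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

xs = [pow(OMEGA, i, P) for i in range(len(BALANCES))]
coeffs = lagrange_interpolate(xs, BALANCES, P)
# Every user's point evaluates back to that user's balance.
assert all(evaluate(coeffs, x, P) == b for x, b in zip(xs, BALANCES))
print(coeffs)
```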
### KZG Commitment Scheme
We choose the KZG commitment scheme to commit to this polynomial for compatibility with the Halo2 API (more on that later). In brief, a KZG commitment is a single elliptic curve group element $C$ that uniquely represents the polynomial $B$.
It is impossible to reconstruct the polynomial from the commitment, so our requirement of user privacy is satisfied because it is impossible to infer any evaluations of the polynomial from the single-value commitment $C$.
During the reveal (aka opening) phase, the committed value $C$ is used along with the claimed polynomial evaluation $B(x)$ to provide a succinct proof $\pi$, verifying that the value $B(x)$ is indeed an evaluation of a polynomial $B(X)$ at point $x$ and corresponds to the original commitment $C$. Therefore, the KZG commitment allows the Custodian to individually provide the opening proofs $\pi_i$ to each user to prove that the polynomial $B(X)$ indeed evaluates to the user balance $b_i$ at the point $x_i = \omega^i$:
$\{C, B(\omega^i),\pi_i\}: B(\omega^i) = b_i$
More broadly, the KZG commitment allows the prover to *open the polynomial at any point*, and we will later see how it benefits our case.
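A KZG opening at a point $z$ rests on the fact that $B(X) - B(z)$ is divisible by $(X - z)$; the proof $\pi$ is a commitment to the quotient polynomial. The sketch below only checks this divisibility over an assumed toy prime field. It involves no elliptic curves or pairings, so it is not a KZG implementation, and the polynomial and helper names are made up for illustration.

```python
P = 17  # assumed toy prime modulus

def evaluate(coeffs, x, p):
    """Horner evaluation of a polynomial (ascending coefficients) at x, modulo p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

def quotient_by_linear(coeffs, z, p):
    """Synthetic division: return (Q, r) with B(X) = (X - z) * Q(X) + r modulo p."""
    q = [0] * (len(coeffs) - 1)
    carry = 0
    for k in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[k] + carry * z) % p
        q[k - 1] = carry
    remainder = (coeffs[0] + carry * z) % p
    return q, remainder

B = [12, 12, 16, 14]  # illustrative coefficients a_0..a_3
z = 4                 # opening point
Q, value = quotient_by_linear(B, z, P)

# The remainder is exactly the claimed evaluation B(z).
assert value == evaluate(B, z, P)
# B(X) - B(z) = (X - z) * Q(X) holds at every point of the toy field.
assert all(
    evaluate(B, x, P) == ((x - z) * evaluate(Q, x, P) + value) % P
    for x in range(P)
)
```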
### Grand Total of the Polynomial Evaluations
To prove the solvency of the Custodian, we need to find its total liabilities by summing up all the user balances and to prove to the public that the sum is less than the assets owned by the Custodian. An individual $i$-th user balance is the evaluation of the polynomial at the $\omega^i$ value corresponding to the user:
$B(\omega^i) = b_i =a_0 + a_1(\omega^i)^1 + a_2(\omega^i)^2 + ... + a_{n-1} (\omega^i)^{n-1}$
Let's calculate the sum $S$ of all the user balances as the sum of the polynomial evaluations:
\begin{align*}
S = \sum\limits_{i} B(\omega^i)& = &a_0\quad& + &a_1\omega^{0\phantom{-1}}\quad & + & a_2(\omega^{0\phantom{-1}})^2\quad & + & \cdots\quad & + & a_{n-1} (\omega^{0\phantom{-1}})^{n-1} +\\
& + &a_0\quad& + &a_1\omega^{1\phantom{-1}}\quad & + & a_2(\omega^{1\phantom{-1}})^2\quad & + & \cdots\quad & + & a_{n-1} (\omega^{1\phantom{-1}})^{n-1} +\\
& + &a_0\quad& + &a_1\omega^{2\phantom{-1}}\quad & + & a_2(\omega^{2\phantom{-1}})^2\quad & + & \cdots\quad & + & a_{n-1} (\omega^{2\phantom{-1}})^{n-1} +\\
&&&&&\vdots \\
& + &a_0\quad& + &a_1\omega^{n-1}\quad & + & a_2(\omega^{n-1})^2\quad & + & \cdots\quad & + & a_{n-1} (\omega^{n-1})^{n-1} =\\
\\
\rlap{\text{(let's factor out each coefficient $a_k$)}}
\end{align*}
$\begin{align*}\phantom{S = \sum\limits_{i} B(\omega^i)}&=n a_0 + a_1(\underbrace{\omega^0 + \omega^1 + \omega^2 + \cdots +\omega^{n-1}}_{=0}) + \cdots + a_{n-1}(\underbrace{(\omega^0)^{n-1} + (\omega^1)^{n-1} + (\omega^2)^{n-1} +\cdots+ (\omega^{n-1})^{n-1} }_{=0}) =
\\
&\rlap{\text{(for each $0 < k < n$, the sum $\sum_i (\omega^k)^i$ in parentheses is zero by the same argument as for the roots of unity, because $\omega^k \neq 1$ and $(\omega^k)^n = 1$)}}
\\&= n a_0
\end{align*}$
Therefore, the grand sum of the user balances is simply the constant coefficient of the polynomial times the number of users:
$\boxed{S = \sum\limits_{i} B(\omega^i) = n a_0}$
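The identity holds for any polynomial of degree below $n$, which can be checked numerically. The sketch below uses an assumed toy field ($p = 17$, $n = 4$, $\omega = 4$) and arbitrary made-up coefficients; all arithmetic, including the sum, is performed modulo $p$.

```python
P, N, OMEGA = 17, 4, 4   # assumed toy field; 4 has order 4 modulo 17
coeffs = [12, 7, 0, 9]   # arbitrary coefficients a_0..a_3

def evaluate(coeffs, x, p):
    """Horner evaluation of a polynomial (ascending coefficients) at x, modulo p."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % p
    return result

# Evaluate the polynomial at every power of omega and add the results up.
evaluations = [evaluate(coeffs, pow(OMEGA, i, P), P) for i in range(N)]
grand_sum = sum(evaluations) % P

# Only the constant coefficient survives, multiplied by the number of points.
assert grand_sum == N * coeffs[0] % P
print(grand_sum)
```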
As it turns out, the Halo2 proving system is internally using the roots of unity as $X$ coordinates for the polynomial construction, and we will later see how we can take advantage of that.
### Proof of Solvency
Using the described polynomial construction technique and the KZG commitment, it is sufficient for the Custodian to "open" the KZG commitment at $x = 0$:
$\{ C, B(0),\pi_{x=0}\}: B(0) = a_0 + a_1 \cdot 0 + a_2 \cdot 0^2 + ... + a_{n-1} \cdot 0^{n-1} = a_0$
The total liabilities can then be calculated by multiplying the $a_0$ value by the number of users:
$S = n a_0$
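As a worked illustration (the balances here are made up for the example), take the toy field $\mathbb{F}_7$ with $n = 3$ users, $\omega = 2$, and balances $b_0 = 1$, $b_1 = 2$, $b_2 = 3$ at the points $\omega^0 = 1$, $\omega^1 = 2$, $\omega^2 = 4$. The interpolating polynomial is:
$B(X) = 2 + 5X + X^2$
because $B(1) = 8 \equiv 1$, $B(2) = 16 \equiv 2$ and $B(4) = 38 \equiv 3 \pmod{7}$. Opening it at zero gives $B(0) = a_0 = 2$, and therefore:
$S = n a_0 = 3 \cdot 2 = 6 = 1 + 2 + 3$.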
### Proof of Inclusion
As described in the KZG section, individual users would receive the KZG opening proofs $\{C, B(\omega^i),\pi_i\}$ at their specific point $\omega^i$ and they would be able to check that
- the opening evaluation is equal to their balance: $B(\omega^i) = b_i$;
- the opening proof $\pi_i$ corresponds to the public KZG commitment $C$.
The caveat is that if two or more users have the same cryptocurrency balance, a malicious Custodian could give them the same KZG proof because the user index $i$ is a value defined by the Custodian. We will use the following technique to mitigate that:
- the Custodian has to additionally commit to another polynomial that evaluates to the hashes of user IDs at the specific user points: $H(\omega^i) = h_i$;
- the user ID that is hashed should be known to the user (e.g., the email address used to register with the Custodian);
- the Custodian then gives two KZG commitments and two opening proofs to the user - one proving the balance inclusion into the balances polynomial and the other proving the user ID inclusion into the ID polynomial:
$\{C_B, B(\omega^i),\pi_B\}: B(\omega^i) = b_i$
$\{C_H, H(\omega^i),\pi_H\}: H(\omega^i) = h_i$
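The user-side comparison of the two opened values can be sketched as follows. Everything here is an assumption for illustration: the choice of SHA-256, the reduction of the digest modulo a toy prime, and the function names are not taken from Summa. Verifying the KZG proofs $\pi_B$ and $\pi_H$ themselves is out of scope for this sketch.

```python
import hashlib

P = 2**61 - 1  # assumed toy prime modulus (a Mersenne prime), not Summa's field

def user_id_to_field(user_id: str, p: int = P) -> int:
    """Hash a user ID with SHA-256 and reduce the digest modulo p (assumed encoding)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % p

def user_checks_opening(user_id: str, balance: int,
                        opened_hash: int, opened_balance: int) -> bool:
    """Compare the opened values H(w^i) and B(w^i) with what the user expects."""
    return opened_hash == user_id_to_field(user_id) and opened_balance == balance

# Two users with the same balance still expect different h_i values,
# so an opening prepared for one of them fails the other's check.
h_alice = user_id_to_field("alice@example.com")
print(user_checks_opening("alice@example.com", 5, h_alice, 5))
print(user_checks_opening("bob@example.com", 5, h_alice, 5))
```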
## Performance Estimate
To estimate the computational complexity of constructing the Lagrange interpolation polynomial and then making a KZG commitment to it, we need to consider both steps separately.
1. **Lagrange Interpolation Polynomial**:
- The Lagrange interpolation involves computing Lagrange basis polynomials and then summing them up, weighted by the data points.
- The computational complexity of calculating a single Lagrange basis polynomial is $O(n)$, where $n$ is the number of data points.
- Since we need to calculate this for each of the $n$ data points and then sum them up, the total complexity for Lagrange interpolation is $O(n^2)$.
2. **KZG Commitment**:
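- KZG commitments are based on bilinear pairings over elliptic curves, and the cost of committing depends on the size of the polynomial to which the commitment is made.
- Computing the commitment is a multi-scalar multiplication: each coefficient of the polynomial is multiplied by the corresponding elliptic curve point from the trusted setup, which is a constant-time operation per coefficient, and the results are added up.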
- Assuming the polynomial resulting from the Lagrange interpolation has $n$ coefficients, the computational complexity of making a KZG commitment to it is $O(n)$.
Combining both, the overall complexity for performing Lagrange interpolation and then making a KZG commitment can be estimated as $O(n^2) + O(n)$, which is dominated by the $O(n^2)$ term from the Lagrange interpolation for large $n$.

Lagrange interpolation can be optimized to sub-quadratic complexity. The key optimization involves recognizing and eliminating redundant computations: all the Lagrange basis polynomials are built from the same factors $(X - x_j)$, so the shared work can be computed once and reused. Here's a brief outline of the optimization:
1. **Compute the Product of All Terms**: First, compute the product of all terms $(X - x_i)$ for $i = 1$ to $n$. This can be done in $O(n \log n)$ time using techniques similar to those in the Fast Fourier Transform (FFT).
2. **Use Inverse Multiplication for Denominators**: Instead of computing each denominator separately, we can compute the inverses of the terms $x - x_i$ and then use these inverses to quickly compute each denominator. This step also utilizes $O(n \log n)$ time.
3. **Compute the Numerators Efficiently**: The numerators of each Lagrange basis polynomial can be computed directly in linear time.
So, by adopting these optimizations, the computational complexity of Lagrange interpolation can be reduced to $O(n \log n)$, which is significantly better than $O(n^2)$ for large $n$.
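When the evaluation points are the powers of a primitive $n$-th root of unity, as in the construction above, an $O(n \log n)$ interpolation can also be obtained with an inverse number-theoretic transform (the finite field analogue of the inverse FFT). Note that this is a different technique from the general outline above. The sketch below assumes $n$ is a power of two and uses a made-up toy field ($p = 17$, $\omega = 4$) with made-up balances.

```python
def ntt(values, omega, p):
    """Evaluate the polynomial with ascending coefficients `values` at
    omega^0..omega^(n-1) modulo p; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return values[:]
    omega_sq = omega * omega % p
    even = ntt(values[0::2], omega_sq, p)
    odd = ntt(values[1::2], omega_sq, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p  # uses omega^(n/2) = -1
        w = w * omega % p
    return out

def interpolate_at_roots_of_unity(evals, omega, p):
    """Inverse NTT: recover coefficients a_k from evals[i] = B(omega^i)."""
    n = len(evals)
    inv_n = pow(n, -1, p)          # modular inverses, Python 3.8+
    inv_omega = pow(omega, -1, p)
    return [c * inv_n % p for c in ntt(evals, inv_omega, p)]

P, OMEGA = 17, 4                   # assumed toy field; 4 has order 4 modulo 17
BALANCES = [3, 5, 2, 4]            # made-up balances b_0..b_3
ntt_coeffs = interpolate_at_roots_of_unity(BALANCES, OMEGA, P)

# The forward transform evaluates the polynomial back to the balances.
assert ntt(ntt_coeffs, OMEGA, P) == BALANCES
# The constant coefficient times n gives the total of the balances modulo P.
assert len(BALANCES) * ntt_coeffs[0] % P == sum(BALANCES) % P
```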