# Research for applying KZG commitment to Summa
Summa V1's goal was achieved with three proofs: `Proof of Inclusion`, `Proof of Solvency`, and `Proof of Ownership`, all combined. Two of these proofs, `Proof of Inclusion` and `Proof of Solvency`, are connected with the `root_hash`, which is a hashing of the root of the Merkle sum tree.
Switching to KZG commitment for `Proof of Inclusion` is easy to achieve. We create polynomials from every user's data (similar to the `leaf_hash` in `MST`), then evaluate and commit. The commit is a point on the elliptic curve, evaluated with KZG commitment, and can be used instead of the `root_hash` for verifiers.
## Approches
The initial idea of applying KZG commitment instead of Merkle Sum Tree was posted on Vitalik's blog.
https://vitalik.ca/general/2022/11/19/proof_of_solvency.html
Applying KZG commitment to Summa depends on how we decide vectors like Vitalik's simple idea.
### Single Vector
As mentioned in the blog, we can define the vector as one single polynomial like this below:
`[H(username_0, asset_0_balance, asset_1_balance, ..., asset_i_balance), asset_0_balance, ..., asset_i_balance, ... ,
H(username_n, ...), ..., asset_i_balance]`
Every `i`th element in the vector is located as a hash of the user data that has the username and each balance of the asset. We can add running sum elements between `i` and `i + N` th, but it makes increasing total degree of polynomials
In this structure, there is no constraint for running sum, unlike Vitalik's idea for KZG commitment in that blog.
We can generate a polynomial from the vector like this:
$$I(X) = \sum_{i=0}^{k-1} a_i \prod_{\frac{j=0}{j\neq i}}^{k-1} \frac{X - j}{i - j}$$
If the whole polynomial is $P(x)$ and the prover generate the proof for the `user_i`, then they will generate polynomial `I_i(X)` like this:
$$I_i(X) = \sum_{i}^{i + N_{asset}} a_i \prod_{\frac{j=0}{j\neq i}}^{i + N_{asset}}\frac{X - j}{i - j}$$
I guess this approach is limited by the degree of polynomials because we are handling multiple assets for committing.
For example, in Summa, we have tested cases with `2^15` entries and two assets. In this scenario, the total degree of the polynomial $P(x)$ can be expected to be `2^17`. However, if we apply the running sum elements into the vector, the degree will increase further, reacing `2^19`, entries and double of `N` assets in the same scenario.
### Comparing MST and KZG
Theoretically, KZG commitments are known to offer superior performance compared to Merkle trees when it comes to commitments.
Setting aside the circuit for generating proof, it's essential to measure how much faster the KZG commitment is in comparison to obtaining the root hash of the Merkle Sum Tree for Summa.
The [kzg-mst](https://github.com/sifnoc/kzg_test) repository offers an in-depth performance analysis between the Merkle Sum Tree (MST) and KZG Polynomial Commitments.
| Entries(2^k) | MST commit time(s) | Polynomial Commit time(s) | Performance Ratio (MST/Poly) |
|---:|------------------:|-------------------------:|----------:|
| 15 | 22.015 | 7.923 | x2.778 |
| 17 | 88.211 | 31.043 | x2.841 |
| 19 | 352.468 | 123.681 | x2.849 |
For a more detailed breakdown of the results, click [here](https://github.com/sifnoc/kzg_test#test-results).
### Using KZGCircuit
We embarked on a series of tests to evaluate commits on the KZG commitment polynomial with various entries, followed by proof generation using the `KZGCircuit`.
This particular circuit was a creation of Punwai, and for those interested, it's accessible [here](https://github.com/punwai/halo2-lib/tree/kzg).
For our tests, we selected only 16 entries. It's noteworthy to mention that the circuit requires a significant number of rows to successfully generate a proof.
| Task | Duration |
|------------------------|----------------:|
| Generating vkey | 141.388s |
| Generating pkey | 61.444s |
| Proving time | 241.209s |
You can find more details about this test in [here](https://github.com/sifnoc/kzg_test#testing-kzg-commitment-on-kzgcircuit)
### Multiple Vectors
Combining multiple vectors into a single polynomial and then committing to that polynomial using the KZG commitment scheme is a more complex task.
Suppose you have `m` users, and each has a length of `N` assets.
$$U_0 = [H(user_0, asset_0, ... asset_n), asset_0, ... asset_n]$$
$$U_1 = [H(user_1, asset_0, ... asset_n), asset_0, ... asset_n]$$
$$.$$
$$U_m = [H(user_m, asset_0, ... asset_n), asset_0, ... asset_n]$$
Each single vector, `U_i`, represents the data of a user at index `i`.
We can construct a polynomial that represents all these vectors. One way to do this is to use a two-variable polynomial:
$$I(x, y) = \sum_{i=0}^{m-1}\sum_{j=0}^{n -1}a_{ij} L_i(x) M_j(y) $$
where $L_i(x)$ and $M_j(y)$ are the Lagrange basis polynomials for the `x` and `y` variables, respectively.
They can be defined as:
$$L_i(x) = \prod_{{k=0}, {k\neq i}}^{m-1} \frac{x - x_k}{x_i - x_k}$$
$$M_j(x) = \prod_{{l=0}, {n\neq j}}^{n-1} \frac{y - y_l}{y_j - y_l}$$
While the multi-vector approach offers the capability to handle large entries, it's important to note that the total degree of the polynomial still remains high. This could pose challenges for scalability, especially when the data needs to be shared with users for verification.
## Conslusion
The research suggests that KZG commitments could offer performance advantages over the existing Merkle Sum Tree approach in Summa.
1. Polynomial Commitment: The current single-vector scheme is limited by the polynomial degree, which escalates with the number of users and assets. Exploring bi-variant polynomial schemes could be a promising avenue for future research to efficiently handle multiple vectors per user.
2. KZGCircuit Limitations: The circuit faces bottlenecks in the degree of the input polynomial. This degree is influenced by the number of users and assets, making it a critical factor for scalability. Optimizing the KZGCircuit to address these limitations is crucial.
KZG commitments serve as one of the alternative schemes to the existing Merkle Sum Tree approach in Summa. While promising, the transition to KZG commitments requires further research to address limitations in polynomial vector approaches and the KZGCircuit for a successful and scalable implementation.