# Adapting Pointproofs for Witness Compression
The proposed polynomial-commitments approach to compressing witnesses suffers from complications related to state updates. Here is a simple competing approach that adopts, adapts and extends the work *Pointproofs: Aggregating Proofs for Multiple Vector Commitments* by introducing a hierarchical tree of vector commitments.
## Brief background: Pointproofs paper
The Pointproofs paper focuses on replacing contract-storage Merkle Patricia tries with vector commitments (with subvector openings), and Merkle proofs with point proofs. It treats contracts as having storage of a fixed length (the evaluations in the paper assume 1000 storage locations per contract). The paper does not explicitly address witnesses or proofs for the world state trie.
The fundamental contribution of the paper is the construction of a single point proof for a block accessing various locations inside different contract storages, which gives a tiny constant-size witness. However, to validate, clients need to maintain commitments for every contract account. Another drawback concerns growing contract storage: when a contract's storage grows beyond N locations, the entire trusted setup needs to be redone.
The biggest advantage relative to polynomial commitments is that updating the state commitment does not require the whole state. However, this approach does not support exclusion (non-membership) proofs.
## Proposal
Here is an approach that extends the central contribution of the Pointproofs paper to support:
* variable and unbounded growth of contract storage,
* unbounded growth of the world state, and
* most importantly, validating clients that maintain only one commitment.
Recall that the world state is a key-value store, where keys are the 32-byte hashes of the account addresses, and values are 128-byte tuples of (nonce, balance, storageRoot, codeHash), each element of the tuple being 32 bytes. The storageRoot of a contract account is the Merkle root of a trie implementing another key-value store, in which keys are 32-byte indexes into the contract storage that represent state variables, and values hold 32 bytes of data.
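The layout just described can be sketched as follows; `sha256` stands in for the keccak256 hash Ethereum actually uses, and the names are illustrative only:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Account:
    """One world-state value: four 32-byte words, 128 bytes in total."""
    nonce: bytes
    balance: bytes
    storage_root: bytes
    code_hash: bytes

def account_key(address: bytes) -> bytes:
    # World-state key: the 32-byte hash of the account address.
    # (sha256 is a stand-in for keccak256 here.)
    return hashlib.sha256(address).digest()

def account_value(acct: Account) -> bytes:
    # The 128-byte tuple (nonce, balance, storageRoot, codeHash).
    return acct.nonce + acct.balance + acct.storage_root + acct.code_hash
```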
In this approach we will have two kinds of vector commitments trees (VCT): world state VCT and contract storage VCT.
### World State Vector Commitment Tree
We could group accounts in groups of size N, and build a hierarchy of vector commitments, as shown below.
![](https://i.imgur.com/QAl8Ywd.jpg)
At the very bottom, we have 32-byte hashes of the account information: address, nonce, balance, root storage commitment (discussed later) and codeHash. N such hashes are bound by a depth-2 vector commitment; these commitments are hashed, and N such hashes are bound by a depth-1 commitment; those are hashed in turn, and N such hashes are bound by the root vector commitment, shown as RootC. When N = 1000, RootC binds 10^9 accounts at depth 2. Note that this VCT does not store the account information itself, only the hashes.
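The layering can be sketched with a toy stand-in for the vector commitment (a plain hash over the N child hashes; the real construction uses Pointproofs commitments, but the tree shape is the same):

```python
import hashlib

N = 4                 # branching factor; the text uses N = 1000
ZERO = bytes(32)      # all-zero entry = empty account / empty slot

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(children: list[bytes]) -> bytes:
    # Toy stand-in for a Pointproofs vector commitment over N hashes.
    assert len(children) == N
    return H(b"".join(children))

def build_vct(account_hashes: list[bytes], depth: int) -> bytes:
    # Pad the leaf level to full capacity N**(depth+1) with empty accounts.
    level = account_hashes + [ZERO] * (N ** (depth + 1) - len(account_hashes))
    for d in range(depth, -1, -1):
        commits = [commit(level[i:i + N]) for i in range(0, len(level), N)]
        if d == 0:
            return commits[0]            # RootC itself; it is never hashed
        level = [H(c) for c in commits]  # hashes feed the next level up
```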
#### Single Point Proof
Here is an illustration that a single point proof suffices to capture accesses to different accounts. Suppose N = 1000, and a block reads 5 accounts, say those at indexes 1, 1001, 2001, 3001 and 4001 (counting from 1). Then a single cross-commitment revealing point proof can be constructed following the ideas of the paper. Let $\pi_1, \pi_{1001}, \pi_{2001}, \pi_{3001}, \pi_{4001}$ denote the corresponding revealing proofs with respect to the commitments at depth 2. Let $\pi_{d1}$ be the revealing proof corresponding to the commitment at depth 1, and let $\pi_r$ be the one corresponding to the root commitment. Then let $\pi$ denote the cross-commitment aggregation of $\pi_1, \pi_{1001}, \pi_{2001}, \pi_{3001}, \pi_{4001}, \pi_{d1}$ and $\pi_r$.
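For illustration, the position of each accessed account at every level of the tree can be read off as the base-N digits of its index; a hypothetical helper might look like this:

```python
N = 1000   # group size from the example

def vct_path(index_1based: int, depth: int = 2) -> list[int]:
    # Slot positions from the root down to the deepest commitment level,
    # obtained as the base-N digits of the zero-based account index.
    i = index_1based - 1
    path = []
    for _ in range(depth + 1):
        path.append(i % N)
        i //= N
    return path[::-1]   # [root slot, depth-1 slot, depth-2 slot]
```

With this mapping, accounts 1, 1001, 2001, 3001 and 4001 land in five different depth-2 commitments, all children of the single depth-1 commitment at root slot 0, matching the single $C'_1$ in the example.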
To verify, a stateless client needs to be provided with the read witness containing:
1. positions of the accessed accounts,
2. information of the accessed accounts,
3. involved commitments, and
4. the point proof $\pi$.
In our example, the block is passed with the following additional information.
* Positions : 1, 1001, 2001, 3001, 4001.
* Information of the account at these positions.
* Depth 2 commitments: $C_1, C_{1001}, C_{2001}, C_{3001}, C_{4001}$, and depth 1 commitment, say $C'_1$.
* The point proof $\pi$.
In our example, a validator first constructs the hashes of accounts in positions 1, 1001, 2001, 3001 and 4001 from the provided account information, and the hashes of commitments $C_1, C_{1001}, C_{2001}, C_{3001}, C_{4001}$ and $C’_1$.
A validator is required to store only the world state root commitment.
In order to verify we need to fix an order to traverse this VCT. Any fixed traversal order such as in-order traversal is fine. We can then verify the cross-commitment aggregated proof using the same equation provided in the paper.
![](https://i.imgur.com/Z3p0ZQm.png)
The essential difference is that the paper expects that the client has all the commitments used in the LHS of the above equation, but in our case, we have only one root commitment, and the rest of the commitments are provided as part of the read witness. Another difference is that the hashes of the provided depth 1 and depth 2 commitments are also used as values in the RHS of this equation.
The number of commitments required inside the read witness for a block reading $k$ accounts is in the order of $k~log~k$ (logarithm base N) in the worst case. By contrast, the size of Merkle proofs grows with the depth of the state trie, i.e. with the logarithm of the state size.
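As a sanity check on the commitment count, a small sketch (assuming the indexing scheme above) counts the distinct non-root commitments lying on the access paths; for the five example accounts it yields the six commitments listed earlier ($C_1, \dots, C_{4001}$ and $C'_1$):

```python
N = 1000

def witness_commitment_count(indices_1based: list[int], depth: int = 2) -> int:
    # Distinct non-root commitments on the paths of the accessed accounts;
    # ancestors shared between accounts are counted only once.
    needed = set()
    for idx in indices_1based:
        i = idx - 1
        for d in range(depth, 0, -1):
            needed.add((d, i // N ** (depth - d + 1)))
    return len(needed)
```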
Same-commitment and cross-commitment aggregation of proofs require random scalars, denoted $t'_j$ and $t_{j,i}$ in the above equation. These scalars must be constructible independently by both prover and verifier; the hashing approach of the paper can be used here as well.
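Deriving those shared scalars from a hash of the common transcript can be sketched as follows (sha256 and the field size are placeholders; the paper hashes onto the scalar field of its pairing group):

```python
import hashlib

FIELD = (1 << 255) - 19   # placeholder prime standing in for the real scalar field

def challenge_scalars(commitments: list[bytes],
                      positions: list[int],
                      values: list[bytes],
                      count: int) -> list[int]:
    # Prover and verifier hash the same transcript (commitments, positions,
    # opened values), so each derives identical scalars independently.
    transcript = hashlib.sha256()
    for c in commitments:
        transcript.update(c)
    for p, v in zip(positions, values):
        transcript.update(p.to_bytes(8, "big"))
        transcript.update(v)
    seed = transcript.digest()
    return [int.from_bytes(hashlib.sha256(seed + j.to_bytes(8, "big")).digest(),
                           "big") % FIELD
            for j in range(count)]
```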
#### Writing into accounts
In addition to the data described in points 1–4 of the read witness, we naturally need the updated account values and nothing more. The basic idea is to extend the notion of UpdateCommit from the paper. First, compute the new hashes from the updated account information. Then use the old and new hashes to compute the updated commitment at depth 2. Similarly, using the hashes of the old and new depth-2 commitments, compute the updated commitment at depth 1. Finally, compute the new root commitment from the old root commitment. The important point is that the root commitment is updated without blowing up the size of the witness.
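The delta-update idea behind UpdateCommit can be illustrated with a toy multiplicative group in place of the paper's pairing group (the modulus and bases here are placeholders, not real trusted-setup parameters):

```python
P = (1 << 127) - 1   # Mersenne prime; toy group Z_P^* stands in for the pairing group
ORDER = P - 1

def bases(n: int) -> list[int]:
    # Per-position bases; in the real scheme these come from the trusted setup.
    return [pow(3, i + 1, P) for i in range(n)]

def commit(values: list[int], gs: list[int]) -> int:
    # Commitment as a product of per-position exponentiations.
    c = 1
    for g, v in zip(gs, values):
        c = c * pow(g, v % ORDER, P) % P
    return c

def update_commit(c: int, gs: list[int], i: int, old: int, new: int) -> int:
    # Fold in only the delta at position i; no other vector entries
    # (and hence no other state) are needed.
    return c * pow(gs[i], (new - old) % ORDER, P) % P
```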
#### Adding / Deleting accounts
For ease of exposition, a fixed hierarchy of two depths was described above. In fact, we can allow the VCT to grow without bound. Consider the case where we have only a depth-zero tree, i.e. a single root vector commitment binding N hashes at the level below. Suppose we exhaust all N accounts and need to create the (N+1)-th account. Then we can dynamically increase the depth of the VCT by 1, as shown in the figure below.
![](https://i.imgur.com/JrhWXUc.jpg)
**Explanation**. As part of this write operation, the validator needs to be provided with the details of the newly created account. The validator already has the old RootC. The following sequence of operations creates the new VCT.
* Compute the hash of the new account
* Compute the new depth-1 commitment, $C_1$, from the hash computed above and N-1 zero entries.
* Construct the N-2 empty depth-1 commitments (all zeros) and their hashes.
* Construct the new RootC from the hashes of the old RootC, $C_1$, and the N-2 empty commitments (zeros).
Note that a validator does not require anything more for this operation of adding accounts and dynamically updating the VCT.
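Using the same toy hash-based stand-in for commitments as before, the depth-increase steps can be sketched as follows (helper names are illustrative; empty slots are kept as plain zeros):

```python
import hashlib

N = 4                  # branching factor; the text uses N = 1000
ZERO = bytes(32)       # all-zero hash = empty slot

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def commit(children: list[bytes]) -> bytes:
    # Toy stand-in for a Pointproofs vector commitment over N hashes.
    assert len(children) == N
    return H(b"".join(children))

def grow_vct(old_root: bytes, new_account_hash: bytes) -> bytes:
    # Step 2: new depth-1 commitment over the new account hash and N-1 zeros.
    c1 = commit([new_account_hash] + [ZERO] * (N - 1))
    # Steps 3-4: the new root binds hash(old RootC), hash(C1) and
    # N-2 empty commitments, treated as zeros.
    return commit([H(old_root), H(c1)] + [ZERO] * (N - 2))
```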
The deletion of contract accounts can be achieved by updating the corresponding account hash to zeros. Note that we treat elements that are all zeros as empty accounts or empty commitments.
*Implementation detail / optimisation*: defining Hash(0) = 0 saves exponentiations for both the prover and the verifier.
### Contract Storage Vector Commitments Tree
The above explanation discusses only the world state. The design of the contract storage VCT follows the same principles, except that we do not need separate hashes at the deepest level: there, the commitments are constructed directly from N storage locations. The account information holds the root of the contract storage VCT.
In addition to items 1–4 of the read witness for the world state, a validator needs the following information when contract accounts’ storages are read:
* positions of the accessed storage locations,
* read data values,
* involved commitments that are part of the contract storage VCT, and
* one single point proof for all the transactions in a block.
The provided proof is verified by a validator on the same lines as defined above for the world state point proof verification.
Note that validators do not store root storage commitments; these are provided in the witness. The integrity of the provided root storage commitments is verified first, as part of verifying the world state point proof, followed by verifying the integrity of the provided data values for the contract storage locations.
For writes, in addition to the above information, a validator needs to be provided with the values being written. Once the integrity of the read values is confirmed, the new commitment is constructed using the differences between the new and the old values, as described in the paper.
## Cache Friendly Implementation of World State and Contract Storage
Note that we do not need to store the full account information in the world state VCT; the hashes alone are sufficient. The hash pointers in the current modified Merkle Patricia Trie of the world state are no longer needed, because with this approach we no longer use Merkle proofs. This opens up scope for an efficient, cache-friendly implementation of the world state key-value store.
Similarly, it is possible for the contract storage VCTs to not store the actual data values, and store only commitments and their hashes.
## Advantages
1. **Witness compression.** The additional information required for an entire block is:
* the world state point proof,
* the contract storage point proof,
* the relevant commitments of the world state VCT, and
* the relevant commitments of the contract storage VCT.
It can be seen that the number of relevant commitments needed is in $O(k~log~k)$, where $k$ is the number of accessed accounts in the world state (correspondingly, the number of storage locations accessed in contract storage) and the logarithm is to base N. In contrast, Merkle-tree-based proofs grow with the state size (correspondingly, the contract storage size). So, if a block accesses $n$ accounts and $k$ contract storage locations, we need a witness whose size is in the order of $O(n~log~n + k~log~k)$.
2. **Validator stores only one world state root commitment.**
3. **One-time common trusted setup.** Addition and deletion of accounts and growth in contract storage do not require revisiting the trusted setup. The public parameters obtained from the trusted setup for the world state and for the contract storages are the same, and stay unchanged for the life of the blockchain.
4. **Enables cache-friendly efficient implementation of World State and Contract Storage.**
## Disadvantages
1. No exclusion proofs. Hence this approach is not applicable to data availability and accumulator problems.
2. Linear size public parameters.
3. Stepping up by a factor of N at every level of the tree causes a sparse structure. A single N that suits the world state as well as all the contract storages may not exist.
## TODO
* Experiments extracting Merkle witnesses from blocks on Mainnet and contrasting with the size of the witnesses from this approach.
* Analysing the computational requirements for generating a witness. This requires analysing the number of multi-scalar multiplications needed to generate same-commitment and cross-commitment proofs.
Thanks to Olivier Bégassat (PegaSys, ConsenSys) for reviews and inputs.