*This is a first draft*
## 3 issues facing current PoR solutions
1. **Dispute resolution:** In all existing PoL solutions there is the problem of how to resolve a dispute between user and exchange when a user claims that they are unjustly represented in the proof. If a user is misrepresented in the PoL, or not represented at all, then they will have a hard time convincing everyone. Generally it is the user's word against the exchange's, there is no way to tell which of the user or the exchange is being malicious; even if a few users come out with the same issue it's still not cut and dry because it might be that a bunch of people are trying to give the exchange a bad name. Ideally this problem should be solved for PoL.
1. **Knowledge of verifiers:** Another issue with existing PoLs is that, over time with many PoLs, if the exchange knows that only some users continuously do the verification then they know which users they can omit from the proof with high likelihood that they will not get caught. This issue only arises when the tree and inclusion proofs are controlled by the exchange, as apposed to having the whole tree public. Controlling the tree is important for a) privacy, because making the tree public can reveal some information about the exchange (total liabilities, number of users) and b) practicality, because the trees can be 10s or 100s of GB in size which means storing and downloading the full tree becomes expensive (especially when there are many PoLs).
1. **Friend attack:** In privacy preserving PoAs (like the one in Provisions) an exchange can cheat via the friend attack: the exchange asks a friend to send them some tokens while they do the PoA and then they send the tokens back (or their friend gives them their keys). Public PoAs (where the exchange publishes their addresses) are also susceptible to this attack but it is fairly easy to spot when this happens. Collusion between exchanges is another similar attack where exchanges share funds. Provisions does give a method to solve this issue but it requires exchanges to use the same protocol in a specific manner.
## Goals of a full PoR solution
With the above in mind we can make some goals:
1. _Diverse exchange requirements._ Exchanges have different requirements: from amount of time available to incorporate/run new software to privacy requirements on their and their user's data. Different requirements would mean different PoA/PoL protocols. Ideally we want to cater for all of these.
2. _Minimal effort for users._ Users are imperative to all existing PoL solutions, and so it's important that the effort involved on their side is minimal. The more users that are able to get involved with the PoL verification the better.
3. _Defend against the friend attack on PoA._ If multiple different PoAs are to be supported then there should be a defense against the friend attack (and exchange collusion).
4. _Solve the user V exchange dispute dilemma._ If a single user is misrepresented in the PoL, or not represented at all, then there should be a way for them to prove this fact. Most likely only a fraction of the users will be doing the verification, and so it is important that if even a single user finds an issue then they are able to prove it.
5. _Hiding continuous verification by the same users._ An exchange should not know which users verify the PoL because there may be only a few users that end up verifying repeatedly and then the exchange can tailor the PoL to exclude some users who do not verify.
6. _Auditor friendly._ The job of auditors should be made as easy as possible; they should simply just be able to pick a set of random accounts and then pass these onto the protocol which will handle all the heavy lifting.
## Possible solutions to the 3 issues
One solid defense against the friend attack (#3) is to ask an exchange to do a PoA in the past, where there is no possibility of having a friend send funds.
To achieve goal #4 one could ask the affected user to make their account information public, which would allow others to check that their claims about PoL incorrectness are valid. But this approach is susceptible to malicious users who just make up random data and claim an exchange is acting badly.
We need a mechanism that can give users the ability to prove cryptographically that they are misrepresented in the PoL. With this kind of mechanism malicious actors cannot pretend to be users claiming incorrect PoLs, so both users and exchanges benefit. One way this can be realized is if the exchange signs the account information that it is giving it's users and that will be used in the PoL. If a user can prove that the exchange said their balance was X at time T then they can prove if the PoL does not reflect this accurately. Note that this does not solve the problem entirely, because, for example, the exchange could just sign an incorrect balance and then it is again the user's word against the exchange's as to what the real balance is; but at least the locale of the problem shifts from PoL to the exchange's central management system.
Another possible solution would be to use TLSNotary but it's not clear if this is ready yet or if it would work for this use case.
To reach goal #5 an exchange could make their PoL public in some way, so that verification does not depend on an API that the exchange controls.
## Iteration 1
The above 3 solutions can be rolled into 1 by having a third party involved. This third party can be assumed to be trusted at first, but we will slowly move toward the trustless realm as we move through the various iterations of the protocol.
The design leans heavily on the following tasks that are out of our hands and may not be plausible:
- Exchanges will have to sign user account data (for which the pub key is publicly known)
- Exchanges will have to keep a full history of their user's balances
Basic outline of the protocol: exchanges sign user balances and share this with their users. Users asynchronously send these records to the 3rd party, who gathers and stores them over time. The 3rd party will request for the exchange to perform a PoR at some point in the past. Because the 3rd party has been gathering records over time, they have, in a sense, a commitment that the exchange has made, and can use this to verify the exchange's PoR.
Note that the PoRs are back in time, not current, which could be seen as a disadvantage but it is not unlike auditing in the financial world of today.
The first iteration of the protocol will not be concerned with hiding information from the 3rd party. This iteration is concerned with achieving goals #2 & #5.

[Iteration 1 full image](https://drive.google.com/file/d/1nCX09wgkbDOvKUdvDGyYuxLnEGSTxbQN)
Details:
1. Exchange sends users a signed record of their balance with a timestamp and account ID. These records can be sent out by email on a regular cadence (hourly, daily, weekly, monthly). If we want to achieve goal #5 then the exchange would have to send records out to all users at the same time.
2. Users send their records to the 3rd party; this can be done asynchronously to receiving the records. The 3rd party gathers the records in batches that have matching timestamps. When enough records with the same timestamp have been collected the 3rd party requests a PoR for that timestamp from the exchange (3rd party can wait a while before requesting, it's up to them).
3. 3rd Party chooses one of the available timestamps and requests for the exchange to produce a PoR for that timestamp.
4. Exchange produces PoA by composing a list of it's addresses and balances, along with signatures proving ownership of the addresses. Signature data should include the provided timestamp as a randomness check, but if this is not secure enough then the 3rd party can request for a random value instead.
5. Exchange produces PoL using an SMT (Maxwell+) (note that all balances and account IDs are exposed)
6. Exchange sends PoA & PoL to the 3rd party.
7. 3rd Party verifies:
- PoA by checking signatures and on-chain balances add up to expected value
- PoL by checking that it matches the published root and that the set of users for the specified timestamp are in the tree and their balances are correctly represented
- checks that `assets >= liabilities`
In this iteration everyone will have to trust the 3rd party to correctly verify the PoR. If this trust can be established then goal #4 has been achieved, but there is another way to achieve goal #4 using some crypto (next iteration).
Note that goal #3 is not achieved by this iteration because the exchange knows exactly which timestamps it is signing for it's users, so it can preemptively do the friend attack on those timestamps. The risk of this can be reduced by having the signing frequency increased to daily or even hourly; the reasoning behind this is that a friend is less likely to lend funds if the lending has to be done 24 times a day everyday. But there is still the collusion attack where exchanges share funds between one another, and this attack is easy no matter how high the frequency of signature production is, as long as the colluding exchanges sign at different timestamps.
## Iteration 2
The second iteration is concerned with achieving, on top of goals #2 & #5 from the first iteration, goal #4 (without trusting the 3rd party).

[Iteration 2 full image](https://drive.google.com/file/d/1M84zvDndk_kATJ1O8Fy4OXtz0kyfYexh)
Details:
1. Same as 1st iteration.
2. Same as 1st iteration.
3. Same as 1st iteration but the timestamp is published on-chain.
4. Same as 1st iteration.
5. Same as 1st iteration but the exchange publishes the tree's root hash on-chain.
6. Exchange sends PoA & PoL to the 3rd party.
7. Same as 1st iteration.
Goal #4 is achieved as follows:
- If the 3rd party finds that a user's balance is misrepresented then they can produce a zk-snark proof to show that this is the case, and then publish this proof for anyone to verify. Details of the snark are in the image.
- If the oracles find that a user is not in the tree at all then they can again produce a snark proof showing the user is not in the tree but should be.
The trust is not completely eliminated regarding goal #4 because the 3rd party is still trusted to do the verifications and produce the snark proofs. What iteration 2 does is remove the trust that the 3rd party is not being malicious toward the exchange. Before iteration 2 it would be possible for the 3rd party to throw the exchange under the bus by just saying that the verifications have failed; after iteration 2 they have to provide a proof for failed verifications. But after iteration 2 the 3rd party must still be trusted to not be colluding with the exchange.
### Notes on the snarks
Both the snarks require the root hash and timestamp to be public in order for anyone to verify the proofs.
The 2nd snark has to build the entire tree which can quickly become unwieldy given the number of users exchanges have. In the case of Coinbase there are ~10^8 users which implies that building a tree takes at least 10^7 hashes. With recent developments in snark technology (Testudo by Filecoin research) it may actually be feasible to have large numbers of hashes inside a snark. Related to this issue is the size of the leaf nodes, which have to be sent over the internet from the exchange to the 3rd party. If we assume the nodes to contain only an account ID and balance, both of 256 bits, then the size of all the leaf nodes is ~6GB which is not unmanageable.
It may be possible to circumvent the tree sizing issues by moving to a commitment scheme like Kate commitments for PoL, but it would need to be made non-interactive to keep goal #5.
Another possible circumvention would be to implore the Summa protocol, where by the exchange sends over zk-SNARK proofs instead of leaf nodes (one snark per user). Unfortunately the smallest zk-SNARK proof is 3 group elements (Groth16), which, if using an elliptic curve group, would be six 256-bit elements. Account IDs would have to be associated with each proof so the 3rd party knows which proofs to verify. This takes the total up to seven 256-bit elements, which is larger than a leaf node and so the total size of all the snark proofs would be larger than all the nodes.
Yet another workaround idea is to have the exchange sign the nodes that it sends to the 3rd party. The snark could then verify the signature as apposed to constructing the whole tree, which may be more efficient.
On the topic of snarks.. we can decrease the trust in the 3rd party a little more by having it produce a snark proof for when the tree provided by the exchange matches the published root hash. Again we run into the unfortunate problem of having to build the entire tree in the snark.
Conversely, if the tree does not match the published root hash then it's not possible to do a snark to prove this because how can the 3rd party prove that the nodes they inputted to the snark are the ones the exchange gave them? One way to solve this would be to have the exchange sign the nodes that it hands over to the 3rd party, which can then be verified in the snark.
There is 1 thing snarks definitely cannot help with at this stage and that's the PoA. This is because balances need to be checked against the chain. This means the final PoR equation `assets - liabilities` cannot be checked in a snark. Of course the PoA can be made public and then everyone could check it themselves.
When it comes to reducing trust in the 3rd party, if it's not possible or just impractical to use snarks then it's always an option to make the PoA/PoL fully public (we will see this in a later iteration) and anyone who does not want to trust the 3rd party can go check the data themselves. Of course this is not desirable from a privacy point of view, although there are ways of making proofs fully public while hiding information (we will see also this in a later iteration). Another option is to have the 3rd party be an oracle network. The network would have to be trusted as a whole to come to an honest consensus about a particular verdict. Unfortunately this means more hands have access to the data so there is a higher risk of data being leaked.
For now it seems that the signature method is the most effective so we will go with that in the next iteration.
## Iteration 3
In iteration 3 we add the PoL signature and additional snarks as well as adjusting the snark for missing user.

[Iteration 3 full image](https://drive.google.com/file/d/1mEO_5g4-ATavzpKJa3_0PRQutLv-MUCI)
## Iteration 4
The 3rd party still has to be trusted with the account information of the users. Let's solve this by having the account IDs hidden by hashes. We don't want to hash the ID directly because the 3rd party can still tell which records belong to the same account; we have to add a salt. To keep things simple by not adding another variable to the record, we hash the account ID concatenated with the timestamp. If the user wanted their own custom salt here then they would have to have a custom salt for every timestamp that the exchange signs data at (to make the hashes different so that accounts could not be tracked across different records); the exchange would also have to store all these salts so that it would know how to compute the correct hash for the leaf node given a particular timestamp.
Note that if the account IDs do not live in a large enough space then the 3rd party can brute-force the hash to figure out the account ID. For now let's assume that the account IDs live in a large enough space and are unique to each user (e.g. hash of user email/phone number).
Also note that it is not strictly necessary to have a hash of the account ID. A regular random number identifying each record uniquely would be just as good. Using hashes of existing data means no new information needs to be kept track of on the exchange's side.
This gives the 4th iteration.

[Iteration 4 full image](https://drive.google.com/file/d/1r7NDm4dyKIX00DjkPasqIjnbcvlXeHmD)
At this point we have something that looks very similar to existing PoR solutions out in the wild today. In fact, we need only to adjust the SMT to get the exact solution that BitMEX uses (albeit with signatures and back-in-time on-demand PoR requests).
## Iteration A5
So let's do that: we make the SMT a sparse tree, which will hide the number of users of the exchange. It will also hide users balances from the 3rd party unless the user explicitly shared their record with the 3rd party. We also require the exchange to publish the PoA & PoL (we don't actually need the full tree, just the leaf nodes). The snarks here are not used as ZKPK (since the data is public anyway) but rather as a compression algorithm to allow better performance for verification (so that it can be done autonomously in a smart contract, for example). If no snark is present then verifying that the liabilities list adds to some number would require downloading the whole file (which could be large), performing a signature check and then adding all the values together; with the snark one only needs to verify the snark and check that a) the pub key input of the snark belongs to the exchange and b) the asset sum input of the snark matches the published number.
As before we can't use snarks to show correctness of the PoA because at the end of the day one needs to check the balances on-chain.
One problem we have to think through regarding the tree is how the 3rd party knows which of the leaf nodes correspond to the records they have on hand. If we construct the hashes in a certain way then the 3rd party can brute force the indices until it finds the correct leaf nodes. It should take max `<num_splits> x <num_users>` hashes to guess all indices for a single record, which is very manageable.
We also have to adjust the snark proofs to account for the multiple paths up the tree. Note that trees do not need to be built anymore in the snarks but there are still a bunch of hashes that need to be computed for the missing-user snark (same as number of nodes). There is also a signature check on a larger piece of data, which could add a large amount of steps in the circuit.
At this point we have the BitMEX PoR but with a few added benefits:
- goal #2: users do not have to download code & nodes to verify the PoL, they only have to send their data to the [trusted] 3rd party (although they can still go the DIY route if they so choose)
- goal #4: the signatures allow snarks to be used to cryptographically prove when the exchange is misbehaving (both the users and the 3rd party can do this)
(goal #5 is already achieved with BitMEX's existing solution so it was not added above as an added benefit)
Note that since none of the snarks have to build the whole tree this iteration should be scalable up to Coinbase-sized user base, but one does actually need to do a detailed analysis on the snarks and their input sizes to be sure of this.
The 5th iteration (going in direction A) is the BitMEX version.

[Iteration A5 full image](https://drive.google.com/file/d/1_dYdQbOocAsMyU2P1nLiRsGHsJIgIe3G)
At this point we have a protocol where the following efforts need to be made:
- Exchanges need to run code once weekly/daily/etc that fetches user balances, signs them, and distributes them to the users (e.g. via email)
- Users need to do their due diligence and pass on their signed account info to the 3rd party
- Exchanges need to create public a PoA & PoL on-demand at a time in the past
The rest of the heavy lifting is done by the 3rd party, and if there is enough trust then one can be sure that all PoRs are verified to the best possible degree.
Advantages of the protocol:
- Goals #2, #4 & #5 are achieved
- The PoL does not give up any information about the exchange (number of users, distribution of liabilities) except for the total asset sum
Issues with this iteration:
- Users may not want to share their account balance with the 3rd party
- The total asset sum + address list of the exchange is public. Some exchanges may not want this.
- The 3rd party is trusted to check the validity of the PoA (but anyone not trusting the 3rd party can do this themselves since the PoA is public)
## Iteration B5
For the next iteration we are going to hide the account balance from the 3rd party, as well as the asset sum and liabilities of the exchange. Instead of building on iteration A5 we are going to build on top of the 4th iteration. We will use the PoA from the Provisions paper and the PoL from DAPOL+.
The PoA has the following advantage/disadvantages:
- [ad] the viewer of the PoA does not know which addresses the exchange controls or the asset sum.
- [ad] proof size & construction time as well as verification time are all reasonable (300MB, 15min, 3min respectively for anonymity set size of 500k)
- [dis] private keys have to be used directly in the PoA (exchanges may not be able to feed their private keys directly into foreign software)
The PoL from DAPOL+ was not envisioned to be sent anywhere in it's entirety (rather it was intended to be held within the exchange), as this divulges the number of users of the exchange to whomever receives it. Let's consider for the time being that this data leakage is not an issue for the exchange, as long as it's just to the trusted 3rd party. The PoL has the following advantages:
- viewer does not know user balances because they are hidden behind Pedersen commitments
- viewer does not know the exact account ID and cannot track users over different PoLs
- viewer does not know the asset sum
DAPOL+ offers various accumulators and there are benefits and trade-offs with each. There needs to be a more in depth analysis of which would be best to use here but for now let's use the simplest of them: NDM-SMT (non-deterministically mapping SMT), and set the height of the tree as low as possible to keep the total size low. This tree is then basically a regular SMT but with Pedersen commitments. We will assume that the tree is not going to be used by anyone else except the 3rd party; usage of the tree by anyone else would reveal number of users if the height is small.
The tree needs to be accompanied by range proofs. More range proofs are required in DAPOL+ than we actually need in our case because the tree is hidden and range proofs need to be given for each node in an inclusion proof for a user. In our case we just need range proofs for all the leaf nodes because as long as we know the input values are less than a certain amount we know their sum will pass the order of the group (since the order is much larger than we need as a range for individual liabilities). The total size of the range proofs is `672B x <num_leaf_nodes>` which is 67.2GB in the case of Coinbase (~10^8 users). Clearly this is not acceptable, but an exchange with an order of magnitude less users would result in a reasonable ~7GB. Having said that though, Bulletproofs (the range proofs used in DAPOL+) do offer collective range proofs that are logarithmic in size complexity in the number of included ranges-to-prove, so it's definitely plausible to have a Coinbase-sized user base produce a single proof of reasonable size.

[Iteration B5 full image](https://drive.google.com/file/d/1PTNXdXW-_HPKTLr7NeALSEIoQerggNbi)
Details:
1. Same as 4th iteration.
2. Same as 4th iteration.
3. Same as 4th iteration.
4. Uses Provisions as the PoA. The exchange is required to sign the Provisions proof values which comes in handy in the snarks. The entire proof is published. The asset sum is small enough to be put on-chain but the ZKPK values are ~300MB so would need to live on something like IPFS or S3.
5. DAPOL+ PoL with a signature. It's important here that the whole tree is not made public but the root hash and commitment to liabilities is.
6. Because the liabilities and assets are hidden behind Pedersen commitments the exchange has to produce a further proof that their difference is a commitment to 0. This can be done with using the Schnorr ZKPK which shows the exchange knows the discrete log to the group generator associated with the blinding factor. The size and proving time for this algorithm are both negligible compared to the Provisions PoA. It could actually be verified on-chain via a smart contract.
7. Same as 4th iteration.
8. 3rd party verifies the PoA & PoL.
There are 4 extra snarks:
- One each for proving that the PoA ZKPK is valid or not. The idea here is to compress the proof, not to hide anything. It is much cheaper to verify a snark and check 3 simple values (pub key, timestamp, asset commitment) than it is to download the whole PoA proof and verify it. Note however that one still needs to check the on-chain balances and this can either be entrusted to the 3rd party or anyone else who's willing to make the effort. Note the bad PoA snark will have to be different for every possible equation failure.
- One each for proving whether the range proof for the PoL is valid or not.
## Further iterations
We have only managed to achieve goals #1, #2, #4 & #5. Goals #3 & #6 are still out of reach.
Here are a list of ideas to develop further in another doc.
There are possible adjustments to the DAPOL+ tree that can do a better job at hiding the exchanges user numbers to whomever has the whole tree but these ideas are not part of the paper and so would need to be reviewed properly.
On-demand request of records by users or auditors does not work because if one of the on-demand records was given to the 3rd party and the 3rd party requested a PoR from the exchange for that timestamp then the exchange would know which user is the one which would be validated in the PoL, and so they can manipulate the tree.
To solve the friend-attack issue and allow on-demand requests of signed data we could adjust the method of singing data: instead of signing data at the same time the exchange could sign account balances every time an account balance changes; this would mean that all signatures happen at different times.
When a user requests records they can request records during an interval. And they can also request that the exchange regularly send them records at a cadence specified by them; the cadences need not be the same between users.
When the 3rd party request a PoR from the exchanges they will give an interval (as apposed to a timestamp) and the exchange will produce a PoL that includes all balances for each user within that interval.
Another issue occurs with the signing if we try to allow auditors to come in and request signed records for multiple users: the exchange could create fake data and sign that--the auditors wouldn't know if the data it contained was correct or not. The same issue applies to users requesting records in the past and they don't remember what their balances should have been at the time.
It may be possible to solve this if we allow the 3rd party to act as another 3rd party that controls in some way the keys used for signing. If the key can only be used for signing data during a certain time interval then exchanges would not be able to sign data for a given timestamp/interval if it was in the past, it would only be able to retrieve pre-signed data from it's database.
Another issue with auditors is that the exchange will know which records they are fetching and so could tailor the tree to be honest for those accounts but not for others. This is not such a big issue because auditors could just request account data at timestamps that already have PoR's. To keep the exchange from hiding accounts that it might've misrepresented in the PoL the auditor can go to the 3rd party and ask for a random set of leaf node hashes, then take this set to the exchange. As long as the 3rd party is not colluding with the exchange this would be full-proof.
Issue with the trust: the 3rd party is trusted to produce the snark proofs when PoL verification is wrong for a user. They could just be lazy and not look for any problems, or be colluding with the exchange. One way to solve this would be to add incentive for finding errors in the PoL, a monetary reward for example. Another way would be to have the 3rd party be on oracle network.
A good point to note is that if a trusted 3rd party could be setup then, without any of the signatures and snarks there would still be value in having this 3rd party because it could continuously verify the PoRs that exchanges have out today. (The trust would be coming from the users, not the exchanges, since users would have to come to the 3rd party and give their account data so the 3rd party could check the data against the PoL that the exchange produced.)
One cheap way to get goal #4 is to have different parts of the tree live in different S3 buckets. As long as the exchange pays for storage of the whole tree then users don't have to download the whole tree to verify it (only a part).