owned this note
owned this note
Published
Linked with GitHub
# (WIP!!!) Cheap Escape from Altitude
***This is a WIP document with ideas for how to break the 50byte per claim limit in shutdown - there is a major challenge in adding complexity and trading off security***
<!-- TODO: link main document -->
### Batching
The cost of making a claim will be dominated by the cost of posting the Merkle proof. The claim itself is consists of an address, a range of coins specified with two integers, and a block number, which is probably less than 50 bytes. This means that even if we replace the Merkle proof with an accumulator with very small proofs (like KZG on BLS12_381 at 48 bytes), half of the claim is the proof.
We can batch claims and simply provide a ZKP that all the claims are valid. This ZKP is not verified immediately, it is just accumulated to be recursively verified by the resolver later to save gas.
<!-- Maybe KZG proofs are just more practical and we can eat the extra 48 bytes? Is verification of KZG proofs in a SNARK efficient? Also, KZG proofs can be batched more efficiently!
It's not clear to me what ZKP and accumulator systems are best for concrete efficiency. Look into caulk perhaps?
-->
Claims with different block numbers can be batched together, but the batch's block number must be the earliest of all claims in the batch, or else users would be able to claim coins for later periods where they didn't own them and supersede the actual owner's claim. Therefore users should keep their proofs up to date so that they can batched with as many other claims as possible.
# Main Solutions
If an Altitude style validium shuts down, users will have to post at least 50 bytes to trustlessly claim each of their sets of coins. Consolidation and Escape Pods are two important techniques to minimise this cost. Consolidation lets users combine their loose change into a single region during normal operation, which lets them make a single claim during shutdown. Escape pods are a way for users within a contiguous region to claim their coins together costing only a constant amount of onchain data, while retaining control of their individual coins.
## Consolidation
Users will generally only want to have a single claim for all their coins to minimise the cost of fully trustless claiming, and simplify cooperative claiming (like batching and escape pods). There are 3 techniques that make consolidation cheaper and easier: shuffling lets users safely trade amongst themselves while keeping their claims valid; increasing the coinspace's dimensionality can give regions more neighbours to swap without increasing the number of bytes in a claim; tidying lets users force the operator to consolidate their coins.
### Shuffling
In the following linear coinspace `AABAA` (where Alice owns coins 1, 2, 4 and 5, and Bob owns coin 3), Alice needs two claims to retrieve her funds during shutdown. Alice and Bob can shuffle the coinspace to `AAAAB` or `BAAAA` to fix this. However, if they use standard transactions, they'll immediately invalidate their current claims, yet they'll have to get the claims for their new ownership. Also, Alice can't safely sign one of her coins to Bob until she sees that he has signed one of his to her and vice versa, leaving them at an impasse.
The solution is a new transaction type called a shuffle. A shuffle is just a collection of unsigned transactions in a Merkle tree. A shuffle can only be executed when everyone has signed it, meaning there is no risk in signing first. Users should only sign a shuffle when they have also seen and saved the Merkle proof for their incoming coins. That Merkle proof is used as an alternative form of claim during shutdown.
Unlike ordinary transactions, the shuffle must be posted onchain as an output of the state transition ZKP. This ensures that the chain can verify a claim using a shuffle during shutdown. This means shuffles may have slightly higher fees to amortise the extra gas cost, but this is likely negligible, as shuffles generally include many different users who share the costs.
Once a shuffle is built and signed, it can be forced just like a regular transaction. As the shuffle is being passed around for signing there is no guarantee that it will execute, so a user might sign many shuffles and hold many Merkle proofs as they wait for the rest of the users to sign the shuffle.
If a user has signed a shuffle, their coins might shift at any time, which is fine except for the difficulty in signing transactions with UTXOs that could become invalid at any point. To mitigate this, when a user signs a shuffle, they also sign the last valid block where this shuffle is acceptable. That way, their coins are only in limbo for a predetermined period.
<!-- Note how locking can make it easier to coordinate shuffling -->
### Dimensionality
Coins have 2 neighbours per dimension. In a linear coinspace, a coin can easily get stuck between users who are unwilling to trade. For example, if Alice won't trade, Bob is stuck with 2 claims: `ABAB`. Bob can trade his coins to someone else, but they'll be stuck with the same problem.
In this 2D coinspace, Bob is still stuck between Alice's coins on two sides, but he can trade with Charlie to reduce to combine his coins into a single claim.
```
ACAC ---> ABAC
ABAB ABAC
```
At the extreme we could use a boolean hypercube ([🧐](https://pbs.twimg.com/media/FveCMHdXwAIfnsN.png)), but claims would only be able to represent a restricted number of values. In particular, two points on the dimension $d$ hypercube form their own boolean hypercube of dimension $0 < n \leq d$, which encloses $2^n$ points. A resonable middleground could be using a 64bit number, with 8 dimensions of size $2^8$.
### Tidying
<!-- Note how tidying is probably mostly used in conjunction with tidying and how *we can quite succinctly represent a tidying request because the operator has full data knowledge.* Note how we can probably break the data transfer into parts in a state-channel-like game that allows the requestor to incrementally demonstrate that they're serious. -->
It's not necessarily possible to voluntarily trade with other users to tidy everyone's coins into contiguous claims as users can't be forced to trade their coins. However, the operator is in a unique position of being able to make sure that the validium doesn't shut down, so even if they have all the coins in small change, they can be sure that it can be taken out during graceful shutdown for minimal onchain cost. So it's reasonable force the operator to tidy up our coins.
Tidy requests can be made onchain like a regular transaction, except to tidy `n` coins the operator must first send `n` contiguous coins to the user, which is enforced in the main state transition ZKP. This is a potential attack vector against the operator, in that the operator could be forced to deposit large amounts of capital to give an attacker contiguous coinspace, while only getting change in return, which may not be cheaply withdrawn until shutdown, and may be hard to trade and tidy. However, malicious tidy requests can be ignored, forcing the user onchain. So the operator can probably follow an adaptive strategy that makes this form of attack capital inefficient.
Similarly, if non-malicious users accidentally ask for a tidy-up that will cost the operator significant capital, the operator can decline, and ask the user to extend their timeframe so that the operator can swap coins around a larger pool of other tidy-ups in the same period.
Tidy requests are dangerous in that the user temporarily loses custody of their coins.
## Escape Pods
If calldata costs 16 gas per non-zero byte, then a single batched claim costs at least 800 gas. A 15M gas block can process 18750 claims, and if blocks are every 10 seconds, which comes out to around 4B claims in 30 days.
With 1 billion users (less than Visa), at 4 claims per user (probably an underestimate), we occupy the entire Ethereum chain for a month. Even at much lower user numbers there would be effects like a spike in gas fees, and it not being worth claiming your coins.
### Pod Structure
Escape pods are an alternative form of batching that doesn't require posting the claims on chain. The idea is that owners of adjacent coins cooperate to form joint claim. Suppose Alice had coins 1-5 in block 10, and Bob had coins 6-10 in the same block. Instead of making two claims, Alice and Bob both sign messages stating that their coins will go to the smart contract `escape_pod_x`. Then they claim that "coins 1-10 were destined for `escape_pod_x` at block 10."
This claim can be proved with a single ZKP just like a standard batch, except that it must verify a signature from each claimant. The signatures are on the root of the claim tree, and the user should only give a signature for a tree when they already have the proof that their claim is in the tree, because generally that will be needed to get the tokens out of the escape pod.
Note that if Alice's claim was for block 10, and Bob's was for block 5, then we can only claim that "coins 1-10 were destined for `escape_pod_x` at block 5," since Bob didn't prove ownership of coins 6-10 after block 5. This is checked inside the ZKP.
<!-- Note how the block number of an escape pod is requires every subregion to have two proofs which enclose that block number. Note that this means we have to have non-repeating UTXO IDs (might already be a requirement), so that we can show that a coin was held continuously just by showing the start and end. Note this also means users should occassionally ask for new proofs so that they're likely to fit in the widest possible range of escape pods. -->
<!-- We should find some abstraction that clarifies the similarity between shuffling and escape pods - they both require signatures on the actual *root* of the new state. -->
### Pod Resolution
The most basic form of escape pod is just a smart contract that holds tokens, and lets users take the tokens out if they can prove they own the token using a signature and a Merkle proof. Most likely, escape pods will bridge to other L2s, so that users can take their funds out with low fees. Note, these will generally be run by someone who takes a fee.
It's possible for a user to sign multiple escape pods. In fact, it can be useful, as there's a risk that escape pod coordinators might choose not to deploy the pod you signed if they construct a better one. For example, if your claim makes the pod's shape more complex to describe onchain, that can induce a marginal cost higher than the coordinator's fee. Moreover, if users sign multiple escape pods, then it's more likely that a contiguous area will be able to agree on the same pod.
Pod coordinators might create escape pods during normal operation. This gives users some certainty that they will be able to get their coins out for low fees. However, escape pods gradually become outdated as more transactions are processed, making them less valuable to claim for the coordinator. Therefore, coordinators and users both want guarantees that their pods are high value.
The validium operator can tell the coordinator how valuable their escape pod currently is. An escape pod has a range and timestamp (e.g. coins 1-10 at block 60), and the operator can tell the coordinator how many of those coins have not moved since that timestamp. This gives an upper bound on the value of the escape hatch. However, other escape hatches might claim the same coins depleting the value. To avoid this, coordinators can "lock in" their escape pods by sending them to the operator. The operator stores escape pods in a list, where old escape pods supersede new olds. This list is part of the validium's state. Now, the operator can create ZKP showing how valuable a pod would be if operations stopped. Now, coordinators and users can be sure that a pod is valuable enough to be retrived. The pod will still degrade over time as money is moved, but this evaluation proof can be used to know when to build a new pod etc. Using forcing mechanisms described below, the operator is forced to lock in escape pods; return proofs of escape pod index, and evaluate escape pods. The proof of index is part of the ZKP posted when claiming an escape pod.
<!-- TODO: split up this paragraph and link the forcing parts -->
During resolution, certain escape pods will win certain regions due to their precedence in the list. The resolution proof calculates the winners, outputs a commitment to the lost regions for each pod, and calculates the amount of money for each pod. To get their data out of the escape pod, users should prove that their coins aren't from a lost region. Since the lost regions are deterministically determined by the escape pod indices, users can calculate a non-membership proof from onchain data.
## An Alternative Shuffling Proposal
### Forced Shuffle
TODO: write this part nicely
ACTUALLY! Can we have the following flow:
- user forces the operator to promise to tidy the coins by time x
- once time x is unalterable by outside parties, the operator commits to the shuffle *early* before it makes it into the state change!
- now the user has a time period to request their Merkle proof (and go through necessary forcing etc)
This means that we have don't have to worry about the unforced shuffling mechanic! The operator just chooses amongst the pool of shufflable coins.
There are several time periods that have to line up correctly, namely we have to order the following as `A < B < C < D < E`
```
A B C D E
|---------------------|-------------------|------------------|---------------------|
Inclusion Last moment Latest time Earliest time Point where users
in state to ask for batch can be batch can be can't effect the state
Merkle proof committed to committed to
```
One interesting question is how we can tune the relationships between these values so that we have the maximum flexibility in which claims we combine together. I think we can have a lot of flexibility if we alleviate the relationship between C and D by forcing users who want to force a swap to first lock their coins, so that the operator can be sure they can't spend them and mess up the shuffle. Then we make the gap between A and C quite long (like a day whereas AB is like 30 mins), so that the user can query daily and find out if their UTXO has been shuffled. That suggests that we need a separate method to force a request for whether the UTXO has been spent in any of the committed batches, so we should probably use an indexed Merkle tree for non-membership proofs. Note that E is essentially the lock-up point.
We also need a good word for the time when the batch is committed, because it's finalised but not processed, in that it will not be processed if the rollup goes down, but if it stays up it will be processed. This is different from the kind of instant finality we get from promises, which could simply not be processed, but that leaves a potential for slashing via interaction.
Is this actually safe? It seems like in the worst case, one could require *huge* capital lockup by the operator, which would be hyper expensive. If we can solve that worst case issue, then we can request contiguity of whole escape pod areas. If that's the case, then I think we're sorted! It seems like if this works at all, it will solve escape pods.
Perhaps we can require everyone to declare their escape pods anyway, such that we can exit nicely if everything is turning to shit without going to L1! That ends up being the "I can't do it sorry, I have to take the L" which happens in case of an attack where the captial lockup is so high it's worth losing the stake, or in the case where that happens naturally from an unreasonable consolidation request. We could cap the consolidation request I suppose. Such that you can only request a certain proportion of the existing capital. I do wonder what the attacks are.
OK, thinking out loud again. So what if the solution is that the operator can appropriate spare change into the mix set? OK but then you need to wait a *long* time in order to give them a chance to ask whether their coins have been appropriated. So that would be about a month, as that's the time period we're willing to force the users to check up on the chain (and now on their coins via the operator). OK, so to avoid a rejigging attack we have to make the shuffling lockup an entire month! Ah! I know, it's not strictly a lockup, it's more like the forced shuffle queue. Maybe, it takes a month for a shuffle request to become urgent, then it can be forced.
Alright, let's repeat that one more time.
A tidy request is a special transaction type where a user declares that they want to combine their claims into a single region. The request is added to a pool of unaddressed tidy requests inside the validium's state. The operator takes a set of tidy requests, along with their own coins, and figures out how to shuffle them such that all claims are tidied. The operator then commits to the shuffle and puts it in its own queue in the validium's state. Now the UTXOs in the shuffle are locked and can't be moved, and the operator will have to include it in a new block soon. The shuffle can't be included of a day which gives the users time to find out what their new claims will be. The shuffle's root is posted onchain.
There are forced APIs letting the user determine whether their tidy requests have been locked into a shuffle, and to get a Merkle proof/claim from a shuffle. Claims from a shuffle are equivalent to claims from a regular transaction, as both are just a Merkle proof relating a user, a region of coins, and a block number using an onchain Merkle root.
The operator is eventually forced to include a tidy request. However, this is an attack vector that could cause the operator to spend capital. Consider `AAAAaAAAA` where Alice owns `A` coins, and Alice maliciously moved `a` coins to another address. She can force the operator to tidy her `A` coins, but if she won't voluntarily let her `a` coins be shuffled, the operator needs at least 8 contiguous coins. Now the operator has two disjoint sets of 4 coins which are less useful for tidying. Now suppose Alice maliciously breaks up her coins again (`AAAaAAAA`) and forces another tidy request. The operator needs another 7 coins. With this simple technique, an attacker with `n` coins can force the operator to hold $O(n^2)$, which imposes a significant cost on the operator.
The solution is to make forced tidying very slow (about a month) and allow the operator to requisition specific coins into the shuffle pool after a month long period. This requires all users to check if their coins have been requisitioned every month, but they already have to check the chain every month to see if they need to make claims in the shutdown period, so this is not a major additional security assumption. Users can forcibly ask whether their coins have been requisitioned.
## Shuffling
Users will generally only want one claim for all their coins to minimise the cost of fully trustless claiming, and simplify semi-trusted claiming (like batching and escape pods).
Users request that their coins be tidied, which puts their coins in the shufflable pool. The operator commits to a shuffle on a subset of the pool. Users have time to get the claim on their new coins before the shuffle is finalised.
and can even use requisition uncooperative users' coins to guarantee consolidation. Cooperative users must check in ~daily to see if their coin has been shuffled so they can get their new claim. Uncooperative users must check in ~monthly to see if their coins have been requisitioned, which is similar to the requirement to check the chain monthly for shutdown. Increasing the coinspace's dimensionality give regions more neighbours to swap without increasing the number of bytes in a claim.
### Tidy Requests
ZKP requires all users be tidied.
Locking period.
In state queue.
### Requisitioning
<!--
TODO:
Discuss the differences in the assumptions:
- shutdown is unlikely for external reasons, requisition is likely
- requisition doesn't steal money, it just makes theft more likely during shutdown
- consider forced endpoints like "how many unacknowledged requisitions", or "list requisitions," which good actors can use
Consider the problems of transferring an requisitioned coin. Does it impose a real cost on the user? Do we have to worry about the operator imposing this cost on everyone?
Discuss the apparent tradeoff where cheap claiming requires vigilance. -->
# Bridges
Can we use these concepts around escape pods to implement efficient bridges? Do we even need those concepts?
<!-- TODO: L2 <-> L2 bridges. We have low bridges, where the recipients and amounts are posted onchain, and we have high bridges where a commitment to all those things are posted onchain. A high bridge needs an escape hatch to a low bridge. A high bridge is quite suitable in cases like upgrades, since we can put the sequencer of the recieving rollup on the hook for addressing the outputs of the previous rollup, since it's assumed to be the same party! So that's called an re-enforced high bridge.
Note, to make a reinforced high bridge, we have to make sure that the updates or both rollups are *atomic.* I.e., we can't update v1 without also including v2. -->