Trust and Upgradability

Special thanks to Cláudio Silva, Danilo Tuler and Milton Jonathan for reviewing this article!

Hello, my name is Guilherme. I've been part of the on-chain team behind Cartesi Rollups since late 2021. Since then, one topic that has always sparked my interest and that of fellow Cartesi contributors is upgradability. What are the trust assumptions at play? How can we minimize them, while also taking into consideration other aspects such as implementation complexity, UX, infrastructure costs, and L1 fees? Let's dig in.

Introduction

In the software industry, when a severe security vulnerability is reported and fixed, users are highly encouraged to upgrade to the patched version.

On web2, users aren't often given much choice besides agreeing with the new terms and conditions. Users are, therefore, often coerced to agree with the new terms, despite possible setbacks in privacy and transparency. This is even more severe with vendor lock-in practices, which are so common in this industry.

On web3, to employ this policy would be incompatible with the philosophy of zero trust. That is why, once a smart contract is deployed, its code cannot be modified, not even by the deployer. It can, indeed, self-destruct, but the conditions for that to happen are specified by the smart contract itself, much like how a paper contract would state the conditions of its conclusion. Furthermore, any upgrade/migration logic must be described by the smart contract from day 1.

You could argue that certain elaborate proxy contract schemes allow authorized parties to switch between implementations with relative ease, but this flexibility comes at the cost of users having to trust the proxy owner. Some users might not mind the backdoor, or even know of it, and some might even deposit real L1 assets, but more seasoned web3 users would avoid interacting with such applications, as they have seen enough hacks taking advantage of this design in the past.

So, even though Ethereum allows for pseudo-trustless applications, users tend to place more value on applications that are in line with the web3 ethos. Likewise, Cartesi Rollups inherits some of the principles set forth by Ethereum, by establishing the genesis state of the machine as an immutable on-chain value, and by allowing anyone to add inputs, validate/execute outputs, and deploy new applications with whatever code they choose.

We have to admit that, like many other scaling solutions, we still rely on so-called "training wheels" and have not yet reached the gold standard of trustlessness. However, we're making great strides with Dave, our fraud proof system, which will allow permissionless and trustless arbitration of disputes on-chain, with great properties in favor of honest actors with average hardware and modest staked funds.

No upgrades

What if our application did not implement any upgrade strategy whatsoever?

If our application were validated by Dave, our fraud proof system, then, as long as there is one honest validator, it will stay alive and safe.

Alternatively, if our application were validated by an authority, then they would either have to commit to the task of validating it forever, or eventually pull the plug. In the latter case, the authority can give a generous heads-up for users to withdraw their assets before the shutdown, but it would always be possible for some inattentive user to miss the announcement and only realize they've got their assets locked forever after the shutdown.

You could remedy this worst case scenario by allowing an authorized party to trigger a system-wide withdrawal request. This would harm trustlessness, but, then, if your application is being validated by an authority, users already need to trust that they will keep the application alive, right? And if you're going to pull the plug either way, you might as well guard users against locking their assets.

Having no upgrades in place is definitely the simplest option from the implementation perspective, but it can have some drawbacks. If a user wanted to migrate from one version to the next, then they would have to issue a withdrawal request, wait for the epoch to close, and execute as many vouchers as asset types they'd like to bridge from L2 to L1. Then, they will have to deposit these assets to the newest version of the application. This whole process can be very costly, and, honestly, is also not great from a UX point-of-view. ERC-4337 might improve on this, with bundling.

In terms of trust, though, it would be very much in line with the web3 ethos. The application user wouldn't have to trust the code of the next version. But what if they do, and want to migrate? Can we have some improvement over the current model, with similar trust assumptions?

Individual upgrades

So, imagine that a user wanted to migrate their assets from App v1 to App v2. How could the process be smoother? Well, App v1 could support a special type of withdrawal request that bridges the assets directly to another application. The back-end of App v1 could be totally unaware of the existence of App v2. If the App v1 front-end can be easily updated, it could suggest App v2 as a possible bridging destination.

In more technical terms, App v1 would accept a direct input from users requesting migration, and would emit one or two vouchers for each asset type in response. In the case of ERC-20 tokens, for example, it would first emit an approval voucher to the token contract, granting the ERC-20 portal a high enough allowance, and then a second voucher to the ERC-20 portal, depositing the user's L2 balance into the target application. A similar approach would be possible for other token standards.
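
As a rough sketch of the two ERC-20 vouchers described above, the Python snippet below hand-encodes the Solidity calls. The `Voucher` type, the `migration_vouchers` helper and the `deposit_selector` parameter are hypothetical names introduced for illustration. Only the `approve(address,uint256)` selector is a well-known constant. The deposit call is assumed to take `(token, app, amount, execLayerData)`, so its selector and argument order should be checked against the actual portal ABI.

```python
from dataclasses import dataclass

APPROVE_SELECTOR = bytes.fromhex("095ea7b3")  # approve(address,uint256)


@dataclass(frozen=True)
class Voucher:
    destination: bytes  # 20-byte L1 address the voucher calls
    payload: bytes      # ABI-encoded function call


def _word(value: int) -> bytes:
    return value.to_bytes(32, "big")


def _addr(address: bytes) -> bytes:
    if len(address) != 20:
        raise ValueError("addresses must be 20 bytes long")
    return address.rjust(32, b"\x00")


def _dyn_bytes(data: bytes) -> bytes:
    # Dynamic `bytes`: length word, then data right-padded to 32 bytes.
    padded_len = (len(data) + 31) // 32 * 32
    return _word(len(data)) + data.ljust(padded_len, b"\x00")


def migration_vouchers(token, portal, target_app, user, amount, deposit_selector):
    """Vouchers that move `amount` of `token` from this app to `target_app`."""
    approve = Voucher(token, APPROVE_SELECTOR + _addr(portal) + _word(amount))
    deposit = Voucher(
        portal,
        deposit_selector          # assumption: supplied by the caller
        + _addr(token)
        + _addr(target_app)
        + _word(amount)
        + _word(4 * 32)           # offset of the dynamic `bytes` argument
        + _dyn_bytes(user),       # execLayerData = user address (20 bytes)
    )
    return [approve, deposit]
```

The order matters: the approval voucher has to be executed before the deposit voucher, otherwise the portal would not be allowed to pull the tokens.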

Technical detail: Ether withdrawals

As of Cartesi Rollups SDK v1, it would be a bit convoluted for an application to deposit Ether through the EtherPortal contract, because the withdrawEther function doesn't take a payload parameter, which would be used to encode the Solidity function call. For Cartesi Rollups SDK v2, we are planning to add a value field to all vouchers, so Ether withdrawals and payable function calls will be much more straightforward.

This approach already saves the user potentially many L1 transactions. Instead of executing a withdraw voucher and depositing an asset manually, the user would execute a voucher that would deposit the asset directly into the new application. On the execLayerData field, the application would specify the user address, so that the receiving application would know in which account to deposit the tokens.

One problem with this approach is that every user that wants to migrate from v1 to v2 still has to execute as many vouchers as assets they would like to bridge from one application to the other.

Group upgrades

What if we could transfer the assets of a whole group of users at once? This can reduce the total amount of L1 fees spent on voucher execution during migrations. You could sum the balances of all users that want to migrate to the same application, and specify in the execLayerData field how much each user owns from that sum.

If the size of execLayerData grew with the size of the group (for example, if you encoded it as an array of address-balance tuples), then so would the cost of executing that voucher. Still, you would be saving some gas per group user, since you would be splitting a fixed cost among everyone. This fixed cost includes retrieving the claim from the consensus contract, checking a validity proof, and marking the voucher as executed.
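
A minimal sketch of the array encoding mentioned above, assuming each member is packed as a 20-byte address followed by a 32-byte big-endian balance. The `pack_group` and `unpack_group` helpers are hypothetical, and the layout is only one possible choice.

```python
TUPLE_SIZE = 20 + 32  # address + uint256 balance


def pack_group(balances: dict[bytes, int]) -> tuple[int, bytes]:
    """Return (total deposit amount, execLayerData) for a group migration."""
    if not balances:
        raise ValueError("a group needs at least one member")
    chunks = []
    for address, balance in sorted(balances.items()):
        if len(address) != 20:
            raise ValueError("addresses must be 20 bytes long")
        chunks.append(address + balance.to_bytes(32, "big"))
    return sum(balances.values()), b"".join(chunks)


def unpack_group(total: int, data: bytes) -> dict[bytes, int]:
    """Receiving side: split the deposited total among the group members."""
    if len(data) == 0 or len(data) % TUPLE_SIZE != 0:
        raise ValueError("execLayerData is not a whole number of tuples")
    balances: dict[bytes, int] = {}
    for offset in range(0, len(data), TUPLE_SIZE):
        address = data[offset:offset + 20]
        balance = int.from_bytes(data[offset + 20:offset + TUPLE_SIZE], "big")
        balances[address] = balances.get(address, 0) + balance
    # The shares must add up to what was actually deposited.
    if sum(balances.values()) != total:
        raise ValueError("balances do not add up to the deposited amount")
    return balances
```

The sum check on the receiving side keeps the target application from crediting more (or less) than the amount the portal actually transferred.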

For small groups, this approach might be good enough, and you would still be saving some gas per user. For larger groups, however, this solution might be infeasible. Enter Merkle trees.

Mass upgrades

If an application reaches a sizeable userbase, then per-user and per-group upgrades might not scale very well, due to the high cost of L1 calldata. With mass upgrades, however, the size of execLayerData would not grow with the size of the group. It would, in fact, have constant size.

When the application decides to trigger a mass upgrade, having a considerable amount of compliant users, it would, for each asset type, construct a Merkle tree out of address-balance pairs, and compute its root hash. This would be the execLayerData field of the deposit voucher.
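
The construction above could look roughly like the following sketch. It assumes leaves are hashes of the packed address-balance pair, pads the tree with zero leaves up to a power of two, and uses SHA-256 as a stand-in hash (an application targeting the EVM would more likely use keccak256, which is not in the Python standard library). All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash function; swap for keccak256 in a real deployment.
    return hashlib.sha256(data).digest()


def leaf(address: bytes, balance: int) -> bytes:
    return h(address + balance.to_bytes(32, "big"))


def merkle_levels(pairs: list[tuple[bytes, int]]) -> list[list[bytes]]:
    """All levels of the tree, from the leaves (first) to the root (last)."""
    if not pairs:
        raise ValueError("cannot commit to an empty group")
    level = [leaf(address, balance) for address, balance in pairs]
    size = 1
    while size < len(level):
        size *= 2
    level += [bytes(32)] * (size - len(level))  # pad with empty leaves
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(pairs: list[tuple[bytes, int]]) -> bytes:
    return merkle_levels(pairs)[-1][0]


def merkle_proof(pairs: list[tuple[bytes, int]], index: int) -> list[bytes]:
    """Sibling hashes from the leaf at `index` up to (excluding) the root."""
    siblings = []
    for level in merkle_levels(pairs)[:-1]:
        siblings.append(level[index ^ 1])
        index >>= 1
    return siblings
```

The 32-byte value returned by `merkle_root` is what would travel in `execLayerData`, while the pairs themselves (or the proofs) would be published off the voucher, for example in a report.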

We could also construct a sparse Merkle tree, in which the leaf index is the user address itself. It would contain $2^{160}$ leaves, but we know an efficient way to compute the root hash of sparse Merkle trees with complexity $O(n)$, where $n$ is the number of non-zero leaves.
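
One way to get this linear behaviour is to precompute the root of an all-empty subtree for every height, and then only hash the paths that contain a non-zero leaf. The sketch below assumes an empty leaf is 32 zero bytes and again uses SHA-256 as a stand-in for keccak256. The names are hypothetical. For a fixed depth of 160, it performs at most 160 hashes per non-zero leaf.

```python
import hashlib

DEPTH = 160             # one leaf per possible Ethereum address
EMPTY_LEAF = bytes(32)  # assumed encoding of "no balance"


def h(left: bytes, right: bytes) -> bytes:
    # Stand-in hash function; an EVM contract would more likely use keccak256.
    return hashlib.sha256(left + right).digest()


def leaf_hash(balance: int) -> bytes:
    return hashlib.sha256(balance.to_bytes(32, "big")).digest()


# ZERO[k] is the root of an all-empty subtree of height k.
ZERO = [EMPTY_LEAF]
for _ in range(DEPTH):
    ZERO.append(h(ZERO[-1], ZERO[-1]))


def sparse_root(balances: dict[int, int], depth: int = DEPTH) -> bytes:
    """Root of a sparse tree whose leaf index is the address (as an integer)."""
    level: dict[int, bytes] = {}
    for address, balance in balances.items():
        if not 0 <= address < (1 << depth):
            raise ValueError("address out of range")
        if balance != 0:
            level[address] = leaf_hash(balance)
    for height in range(depth):
        parents: dict[int, bytes] = {}
        for index in level:
            parent = index >> 1
            if parent not in parents:
                # Missing children are roots of empty subtrees.
                left = level.get(2 * parent, ZERO[height])
                right = level.get(2 * parent + 1, ZERO[height])
                parents[parent] = h(left, right)
        level = parents
    return level.get(0, ZERO[depth])
```

The `depth` parameter exists only to make the function easy to check on tiny trees. A real migration would always use the full address space.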

On the other side, App v2 would receive a deposit coming from App v1 with this Merkle root as execLayerData. With only the Merkle commitment, it would, of course, not be able to open it up and assign balances to the respective accounts. So, users would have to submit either Merkle proofs, or all the address-balance pairs. A nice property of this strategy is that it is App v2 that decides how the Merkle root commitment can be opened up. As we'll later see in section "Impact on L1 fees", there are many different ways this can be done, each with its own advantages and disadvantages.

Technical discussion: EIP-4844

If the application uses L1 calldata as the only source of data availability, then you could argue that users would still be competing for L1 blockspace when they have to wake up their dormant accounts by providing sizable proofs as inputs. Alternatively, in the future, EIP-4844 could allow for multiple proofs or a large array to be sent in one input, given that each blob has a fixed size of 128 KB.

Support for EIP-4844 as data availability for Cartesi Rollups SDK is still experimental, and will only be possible in the upcoming release of SDK v2. You can check out this repository for implementation details and resources of my Experiment Week project.

Impact on implementation complexity

Of all the strategies, the easiest one to implement is, by far, having no upgrades at all. Deposits and withdrawals are actions that Cartesi application developers already know how to implement.

With individual upgrades, we need the application to emit one voucher to the token contract approving a transfer, and another targeting the appropriate portal for the asset type being bridged. In the execLayerData field of the second voucher, the application must specify the address of the user ($20$ bytes).

With group upgrades, we need to have some group formation logic. A simple implementation is to add two actions: join group and close group. The first action would create a group headed to an application, if there isn't one already, or join an already existing one. The second action would trigger the migration of the group to the target application. Then, the application would have to sum the balances of all group members, and emit a single voucher to the appropriate portal with the addresses and balances of all members in the execLayerData field ($52n$ bytes, for $n \geq 1$).

With mass upgrades, you could use the same group formation logic. Once a group is closed, the application would construct a Merkle tree on top of all address-balance pairs, emit the leaves in a report, and calculate the root hash, which would be passed down to the target application via the execLayerData field ($32$ bytes). The report would be used for later commitment reveal in App v2, which may take many different forms. Since App v2 could receive multiple Merkle root commitments from App v1, they could be distinguished by a commitment index.

App v2 could accept a reveal request containing the address of the old application and the commitment index. The request could carry all address-balance pairs, from which the Merkle root would be recomputed and checked against the one sent by App v1, marking the commitment as "used". It could instead carry a proof for an individual user, marking that specific user as "woken up". Alternatively, using the mixed method, it could carry a proof of a subtree, marking that subgroup of users as "woken up".
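
To make the individual-proof variant more concrete, here is a minimal sketch of how App v2 might track commitments and wake up users. The `AppV2` class, its method names and the leaf encoding are all hypothetical, and SHA-256 stands in for whichever hash App v1 actually used to build the tree.

```python
import hashlib
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for keccak256


def leaf(address: bytes, balance: int) -> bytes:
    return h(address + balance.to_bytes(32, "big"))


def verify(root, address, balance, index, siblings) -> bool:
    """Recompute the root from a leaf, its index and its sibling hashes."""
    node = leaf(address, balance)
    for sibling in siblings:
        node = h(sibling + node) if index & 1 else h(node + sibling)
        index >>= 1
    return index == 0 and node == root


@dataclass
class Commitment:
    root: bytes
    woken_up: set = field(default_factory=set)


class AppV2:
    def __init__(self, trusted_old_apps):
        self.trusted = set(trusted_old_apps)
        self.commitments: dict[bytes, list[Commitment]] = {}
        self.balances: dict[bytes, int] = {}

    def on_mass_deposit(self, old_app: bytes, root: bytes) -> int:
        """Store a 32-byte commitment and return its commitment index."""
        if old_app not in self.trusted:
            raise PermissionError("deposit from an unknown application")
        commitments = self.commitments.setdefault(old_app, [])
        commitments.append(Commitment(root))
        return len(commitments) - 1

    def reveal(self, old_app, commitment_index, address, balance, index, siblings):
        commitment = self.commitments[old_app][commitment_index]
        if address in commitment.woken_up:
            raise ValueError("user already woken up")
        if not verify(commitment.root, address, balance, index, siblings):
            raise ValueError("invalid Merkle proof")
        commitment.woken_up.add(address)
        self.balances[address] = self.balances.get(address, 0) + balance
```

Tracking the "woken up" set per commitment is what prevents the same proof from being replayed to credit a balance twice.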

The receiving application would already know the address of accepted "old" versions, and would be able to distinguish between upgrade types simply by the size of the execLayerData.

`execLayerData` size	Upgrade type
$20$	Individual
$32$	Mass
$52 n, n \geq 1$	Group
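
The size-based dispatch in the table above is unambiguous because neither 20 nor 32 is a multiple of 52. A minimal sketch, with a hypothetical `upgrade_type` helper:

```python
def upgrade_type(exec_layer_data: bytes) -> str:
    """Classify an incoming migration deposit by its execLayerData size."""
    size = len(exec_layer_data)
    if size == 20:
        return "individual"  # a single user address
    if size == 32:
        return "mass"        # a Merkle root commitment
    if size > 0 and size % 52 == 0:
        return "group"       # n address-balance tuples
    raise ValueError(f"unexpected execLayerData size: {size}")
```

Any other size would be rejected, which also covers ordinary deposits that do not come from a known older version.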

Impact on UX

Let's analyze how the strategies we've enumerated in the previous sections compare in relation to UX. By that, we mean that we'd like to minimize user interaction as much as possible, while still giving users control over their assets. All of them (except the "no upgrade" strategy) require at least one input from the user telling the application they agree to migrate.

Strategy	# Transactions	L1 fee	Notes
No upgrades	2-3	$O (1)$	User must sign 1-2 TXs
Individual upgrades	1-2	$O (1)$
Group upgrades	1-2 per group of $n$	$O (n)$
Mass upgrades	1-2 per group of $n$	(see note)	Requires later commit reveal

Note: Without EIP-4844, mass upgrades cost $O(n)$ calldata. With EIP-4844, they cost $O(1)$ calldata and $O(n)$ blobs. More on this in the section "Impact on L1 fees".

Impact on infrastructure cost

In a scenario in which Dave is being used, as long as there is one interested honest party, the application will stay alive and safe.

In a scenario in which an application is being validated, instead, by an authority, there would be two possible outcomes:

The authority will keep validating both versions, increasing infrastructure costs.
The authority stops validating the old version, and starts validating the new one.

In the latter case, users would need to withdraw their assets before the shutdown. As suggested before, the application may even have a shutdown feature, in which all assets are withdrawn at once. The application could also support an opt-in policy for upgrades. Past some date, those that have opted in would have their assets migrated to the latest version, while the rest would have their assets withdrawn.

Impact on L1 fees

How upgrades are carried out can directly affect the amount of L1 fees that users have to spend. Let's compare the aforementioned upgrade strategies, for bridging one asset type between applications.

Without upgrades, the user has to manually execute 1 withdrawal voucher and do 1 deposit. Also, depending on the asset type and allowance, the user may also need to do 1 approval to the portal. This is the default way to bridge assets between applications.

With individual upgrades, users would need to execute at least 1 deposit voucher. Depending on the asset type and allowance, though, they might need to execute 1 extra approval voucher. In practice, however, the application could emit a single maxed-out approval voucher that would cover multiple deposit vouchers.

Now, with group upgrades, we start to amortize the fixed cost of executing a deposit voucher among the group. So, even though the execLayerData field would grow linearly with the group size (52 bytes per group member), the total amount of L1 fees would still be smaller, when compared to individual upgrades.

Finally, with mass upgrades, we might get the cheapest L1 fee, because the execLayerData field would have a constant, smaller size of 32 bytes, regardless of the group size. If proofs are delivered through L1 calldata, then the total L1 cost might be worse than in the group strategy. If they are delivered through EIP-4844 "blobs", on the other hand, this strategy might have an advantage, assuming blobs are cheaper than calldata.

If the constructed Merkle tree is sparse on the Ethereum address space, the proof would have the following structure, occupying a bit over 5 KB. Each blob could support, therefore, 25 users.

struct Proof {
  address user;
  uint256 balance;
  bytes32[160] siblings;
}

Alternatively, the Merkle tree could have address-balance tuples as leaves. So, for a group of $\leq 2^{H}$ users, we could bring down the number of siblings from $160$ to $H$, which would, in turn, allow us to increase the number of proofs per blob. With this, an extra leaf index field is needed in the proof.

struct Proof {
  address user;
  uint256 balance;
  uint256 index;
  bytes32[H] siblings;
}

Finally, there is another option: to send the whole array of address-balance tuples ($20 + 32 = 52$ bytes per tuple). For small groups, this is totally affordable with calldata, but, for larger groups, EIP-4844 would really shine as a cheaper data availability alternative.

struct Leaf {
    address user;
    uint256 balance;
}

struct Proof {
    Leaf[N] leaves;
}

The following table shows how many EIP-4844 blobs would be needed to send all the proofs or the whole array of address-balance pairs, for different values of $H$. Also note that, with respect to the group size $n$, the proof size is $O(\log n)$, while the array size is $O(n)$. Therefore, to migrate the whole group, sending individual proofs takes size $O(n \log n)$, while sending the whole array takes $O(n)$.

Group size	$H$	Proof size	# blobs (proofs)	Array size	# blobs (array)
1	0	84	1	52	1
2	1	116	1	104	1
3-4	2	148	1	156-208	1
5-8	3	180	1	260-416	1
9-16	4	212	1	468-832	1
17-32	5	244	1	884-1664	1
33-64	6	276	1	1716-3328	1
65-128	7	308	1	3380-6656	1
129-256	8	340	1	6708-13312	1
257-512	9	372	1-2	13364-26624	1
513-1024	10	404	2-4	26676-53248	1
1025-2048	11	436	4-7	53300-106496	1
2049-4096	12	468	8-15	106548-212992	1-2
4097-8192	13	500	16-32	213044-425984	2-4
8193-16384	14	532	34-67	426036-851968	4-7
16385-32768	15	564	71-142	852020-1703936	7-13
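
The figures in the table can be reproduced with a few lines of arithmetic. This sketch assumes that all 128 KiB of a blob are usable, that a proof never straddles two blobs, and that arrays are packed contiguously across blobs. The function names are hypothetical.

```python
BLOB_SIZE = 128 * 1024  # bytes per blob, assumed fully usable
ADDRESS, WORD = 20, 32  # sizes of an address and of a uint256/bytes32


def tree_height(n: int) -> int:
    """Smallest H such that 2**H >= n."""
    return (n - 1).bit_length()


def proof_size(height: int) -> int:
    # address + balance + leaf index + one sibling per tree level
    return ADDRESS + WORD + WORD + WORD * height


def blobs_for_proofs(n: int) -> int:
    per_blob = BLOB_SIZE // proof_size(tree_height(n))
    return -(-n // per_blob)  # ceiling division


def blobs_for_array(n: int) -> int:
    return -(-(n * (ADDRESS + WORD)) // BLOB_SIZE)


# Proof over a sparse tree: no index, but 160 siblings.
SPARSE_PROOF_SIZE = ADDRESS + WORD + 160 * WORD

print(SPARSE_PROOF_SIZE, BLOB_SIZE // SPARSE_PROOF_SIZE)  # 5172 25
print(blobs_for_proofs(4096), blobs_for_array(4096))      # 15 2
```

The same constants give the users/blob ratios quoted in the text: `BLOB_SIZE // 52` for arrays and `BLOB_SIZE // proof_size(9)` for proofs at $H = 9$.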

From the table above, we can see that arrays make more efficient use of blobspace than individual proofs (2520 vs. 352 users/blob, respectively, the latter at $H = 9$). This is expected, given that each proof also carries a leaf index and a siblings array. Arrays may be more space efficient, but they need to be complete to compute the Merkle root. Meanwhile, proofs allow progressive upgrades, in batches of up to 352 users. Proofs would only make sense for groups larger than 2520 users, in which case 1 blob would not be enough to fit the entire array. Still, in this case, it would take at least 9 blobs to fit all proofs.

There is also a mixed approach, in which you provide the leaves of a subtree, and the inclusion of that subtree in the bigger tree. It seems that the maximum subgroup size is $2^{11} = 2048$ users/blob. The proof then contains $H - 11$ siblings, for $11 \leq H \leq 160$, which all still fit in a blob. This is a considerable improvement, compared to the $352$ users/blob ratio we had previously. This mixed approach allows for a users/blob ratio comparable to the pure array strategy (2048 vs. 2520 users/blob), while also allowing progressive upgrades, and requiring, in practice, $O(n)$ blobs.

struct Leaf {
    address user;
    uint256 balance;
}

struct Proof {
    uint256 index;
    Leaf[1 << 11] leaves;
    bytes32[H - 11] siblings;
}
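
The sketch below checks the $2^{11}$ bound and shows one way such a subtree proof could be verified. It assumes the same 128 KiB blob size as before, interprets `index` as the position of the subtree among subtrees of the same size, and uses SHA-256 as a stand-in hash. All names are hypothetical.

```python
import hashlib

BLOB_SIZE = 128 * 1024
WORD, LEAF = 32, 20 + 32


def mixed_proof_size(height: int, subtree_height: int) -> int:
    # index + all leaves of the subtree + siblings above the subtree
    return WORD + (1 << subtree_height) * LEAF + WORD * (height - subtree_height)


def max_subtree_height(height: int) -> int:
    """Tallest subtree whose mixed proof still fits in a single blob."""
    k = 0
    while k < height and mixed_proof_size(height, k + 1) <= BLOB_SIZE:
        k += 1
    return k


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for keccak256


def verify_subtree(root, leaves, index, siblings) -> bool:
    """Check that `leaves` form the subtree at `index` under `root`."""
    nodes = [h(address + balance.to_bytes(32, "big")) for address, balance in leaves]
    if not nodes or len(nodes) & (len(nodes) - 1):
        raise ValueError("number of leaves must be a power of two")
    while len(nodes) > 1:  # fold the leaves into the subtree root
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    node = nodes[0]
    for sibling in siblings:  # climb from the subtree root to the tree root
        node = h(sibling + node) if index & 1 else h(node + sibling)
        index >>= 1
    return index == 0 and node == root


print(max_subtree_height(160))  # 11
```

A successful check would let App v2 wake up every user in the subtree at once, which is what brings the users/blob ratio close to that of the pure array strategy.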

Conclusion

Upgrades are a vital part of traditional software development. We cannot escape from them, even in the web3 era. However, extra caution must be taken when translating traditional web2 procedures to a paradigm of zero trust.

The blockchain also imposes several technical limitations on the amount of data that can be transported between applications that use L1 for data availability. These should also be accounted for, as they impact experience and migration costs for users.

Furthermore, we should also consider designs that allow for gradual or progressive upgrades, especially with large userbases. We should also look forward to EIP-4844 making large inputs cheaper to send on L1.