
EVM speedups speed up MEV

Peter's note: OCC with deterministic aborts is so that all nodes behave the same when the fee market charges fees for aborts; plus, classical OCC was not designed for distributed systems. There is a primary market and a secondary market (MEV) for transaction preferences. L2 sequencers/block building/latency.

  • Parallelization is basically nodes doing a first round of block building (kicking out irrelevant nodes in the conflict graph), and then grinding the multidimensional knapsack on the remaining transactions.
    - EIP-1559 parallelizability market: 1559 is better if we assume there are few edge cases to the n-dimensional knapsack problem, i.e., if bundles are independent then 1559 is like a naive auction algorithm which actually works.
    - Parallelization as one dimension of the fee market is like merging all parallelization-related resources together, which is sub-optimal and less refined? Yes, if we consider the local view. But if we consider the global view then the conflicts are not present in this design; if so, then we are essentially encouraging users to predict where the MEV is and avoid it, but isn't this already done by per-account fees? So essentially the parallelization fee market is the same as per-account fees!
    - Parallelization is like binary block building.
    - Parallelized auctions = OFA separation of transaction batches = internalized by a block builder.
    Fair ordering is for those that do not benefit from the positive-sum searching anyway; they need to be rewarded with lower latency, but they cannot really be, because MEV is not a priori, MEV is a posteriori. So unless they know beforehand (opt-in), they can → Arbitrum is really like enshrining latency/privacy as their designated social choice function that is better than any other social choice function.

Through the lens of MEV, how do the optimization choices on the EVM impact, and how are they impacted by, protocol design? We discuss the tradeoffs of several parallelism proposals to increase EVM execution speed (for block building, private information bundle merging, and zk-rollups) and show why they are subject to “the conjecture of MEV conservation.”

*Parallel block building before Ethereum (colorized)*

TLDR:

The Ethereum Virtual Machine (EVM)'s performance is vital for several reasons. First, if we have a faster VM, Ethereum clients will be able to process and verify transactions faster, making it easier for everyone to run a full node and strengthening the decentralization and security of the network. Second, and more importantly, as MEV extraction on Ethereum becomes more prominent, we need to make MEV extraction easier so that its profits can be more evenly distributed, preventing too much economic centralization of the network. A more performant EVM achieves this by helping searchers (and block builders, once Ethereum introduces Proposer-Builder Separation) produce more profitable full blocks and helping relays achieve lower latency. This means the builder market becomes more efficient, attracting more searchers and making the market even more competitive, which in turn makes it harder for proprietary MEV extraction to dominate the profit and endanger network stability. The reason the Ethereum Foundation hasn't pushed on the parallelization/performant-VM front is that scalability research can increase throughput by 100-1000x, while a more performant VM can on average deliver less than a 10x increase. But now that PBS creates the need for more efficient block building, EVM speedups are becoming much more relevant to the security and scalability of the network again, and to improve EVM performance with backward compatibility and minimal changes to consensus rules or how storage is implemented, we need parallelism.[1]

In part 1 of the series, we argued the need for a parallel EVM and covered several approaches to achieve it such as EIP 648, EIP 2930, and speculative execution. In this post (part 2) we present how to approach parallelism (and more broadly, the problem of shared state analysis) methodologically. We loosely separate the tricks we use into categories like refinements and witness oracles. We then present two example algorithms that utilize formal methods for better scheduling. Finally, we zoom out and show the tradeoffs made in our parallel execution design by discussing their application to block building, private information bundle merging, and modular blockchain scaling use cases like data availability proofs and fraud proofs.

Recap: A breakdown of the parallelization problem

Reusing the formalizations in part 1 of the post, we describe two jobs needed to derive a good EVM parallelization algorithm $\delta_p$:

  1. Derive information on possible shared data conflicts, $\kappa$, for every transaction.
  2. Based on the preciseness of that information, design the parallelization algorithm $\delta_p(\bar{t}, s_{k-1}, \kappa)$, where $\bar{t}$ is the list of transactions and $s_{k-1}$ is the EVM state for block $k-1$ (see the sketch below).
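To make these two jobs concrete, here is a minimal sketch (illustrative Python, not from the original post; all names are made up) of a transaction-level scheduler $\delta_p$ that takes per-transaction access sets as its conflict information $\kappa$ and greedily groups non-conflicting transactions into batches that can each run in parallel:

```python
from typing import Dict, List, Set

# kappa: per-transaction sets of storage keys the transaction may read or write.
# A "storage key" here is just an illustrative (address, slot) string.

def schedule(txs: List[str], kappa: Dict[str, Set[str]], workers: int = 20) -> List[List[str]]:
    """Greedily pack transactions into batches of mutually non-conflicting txs.

    Two transactions conflict if their access sets intersect; conflicting
    transactions are pushed into later batches. Batches run one after another,
    but every transaction inside a batch can execute concurrently on up to
    `workers` threads.
    """
    batches: List[List[str]] = []
    batch_keys: List[Set[str]] = []  # union of access sets per batch
    for tx in txs:
        placed = False
        for i, keys in enumerate(batch_keys):
            if len(batches[i]) < workers and not (kappa[tx] & keys):
                batches[i].append(tx)
                keys |= kappa[tx]
                placed = True
                break
        if not placed:
            batches.append([tx])
            batch_keys.append(set(kappa[tx]))
    return batches

# Example: tx1 and tx2 both touch the same pool slot, tx3 is independent.
kappa = {
    "tx1": {"0xpool:slot0"},
    "tx2": {"0xpool:slot0", "0xuserB:balance"},
    "tx3": {"0xuserC:balance"},
}
print(schedule(["tx1", "tx2", "tx3"], kappa))  # [['tx1', 'tx3'], ['tx2']]
```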

As we mentioned in the previous post, there are multiple dimensions to the design of an optimal parallelization algorithm[2]: (i) the granularity of the shared data analysis information $\kappa$, (ii) the granularity of the level of parallelization[3], and (iii) the degree of parallelism resources (e.g., how many processors we can use). We already gave a glimpse into how the first job (shared data conflict analysis) is done by discussing the approaches of several related works, and we talked about how the second job (scheduling) might be approached. Continuing that discussion: suppose we already have perfect information $\kappa_{\text{perfect}}$, what is the optimal parallelization algorithm we can have under Amdahl's law[4]?

The dark fate

It turns out, not much: with transaction-level parallelism, perfect information, and 20 threads to allocate, the optimal EVM parallelization algorithm $\delta_p$ can only achieve an average 4x execution speedup (without any storage prefetching or Merkle root calculation tricks) compared with a sequential EVM. That is not a lot, given that naive speculative parallelism already achieves a 3x block execution speedup, not to mention more sophisticated versions that rely on caching. However, if we re-run the experiment while ignoring all shared data conflicts at the smart contract level, we get a 23.9x speedup. Upon closer inspection of the distribution of parallel speedup bounds (below), we notice that even though some blocks achieved good parallel speedups, a major portion of blocks gave us less than a 200% speedup (and this is backtesting with $\kappa_{\text{perfect}}$).[5] This means that for a large number of Ethereum blocks, application-inherent conflicts at the smart contract level are frequent.[6]

*Fig 1. Distribution of parallel speedup bounds, from Utilizing Parallelism in Smart Contracts on Decentralized Blockchains by Taming Application-Inherent Conflicts*
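As a quick, purely illustrative sanity check on those numbers (our own back-of-the-envelope, not the paper's methodology), Amdahl's law lets us translate speedup bounds into the fraction of block execution that must be conflict-free:

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Maximum speedup under Amdahl's law for a given parallelizable fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)

def parallel_fraction_for(speedup: float, threads: int) -> float:
    """Invert Amdahl's law: what fraction must be parallelizable to reach `speedup`?"""
    # speedup = 1 / ((1 - p) + p / threads)  =>  p = (1 - 1/speedup) / (1 - 1/threads)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)

print(parallel_fraction_for(4.0, 20))  # ~0.79: a 4x bound on 20 threads
print(1.0 - 1.0 / 23.9)                # ~0.96: thread-unlimited reading of a 23.9x bound
print(amdahl_speedup(0.79, 20))        # ~4.0x, consistency check
```

In other words, an average 4x bound on 20 threads corresponds to only about 79% of execution being parallelizable, which matches the observation that application-inherent conflicts are frequent.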

This inherent high conflict rate of shared data poses a serious problem. And the problem is becoming worse: data suggests that the conflict rate increases over time. This makes sense because when we are happily composing DeFi legos together we are also creating more interdependencies between smart contracts and therefore the Ethereum storage.

Bag of tricks

So are we ready to embrace the dark fate that a parallel EVM is capped at a 4x speedup? No, because the magician still has a few colorful strips up her sleeve.

Refinement

Recall that there are three dimensions to the design of our parallelization scheme: we cannot change the parallelization hardware resources we have, but we can change the granularity of our shared data analysis as well as our level of parallelism to increase the parallelization ratio. We discuss these two dimensions separately:

  1. Shared data ($\kappa$) refinement

Empirically, efforts to push for "free" state witness[7] oracles[8], like EIP-2930 and Solana's account model, are typically coarse-grained. This means the oracle acts as an easy heuristic for parallelization, but we can and need to do better: to push for decentralized, efficient block building and increase censorship resistance in a post-Proposer-Builder-Separation (PBS) Ethereum, we have to increase the granularity. And to stay competitive as a block builder in an open market, having only the most general, low-effort level of parallelism gains is not enough. However, finer granularity also means higher latency in the generation of $\kappa$, so if the state witnesses are not provided as oracles, there is a tradeoff to be made.

  2. Parallelization level ($\delta_p$) refinement

We could also provide a more refined parallelization level (and even automatically adjust the oracle-guided scheduling algorithm according to $\kappa$). For example, for smart contracts that have a high conflict rate and appear frequently in transactions, we can do offline static analysis to transpile the EVM bytecode (especially that of "hotspot" contracts) into a more performant language with better parallelization support, which according to existing experiments on geth can generate 15-50x speedups. For example, we can identify commutative operations like the addition of a uint256 (though identification gets harder as the programming language becomes more low-level) and transpile them into parallelizable instructions in a new language/virtual machine. This incurs no execution latency because we are not doing the static analysis and transpilation on the fly. And even if we were, empirically JIT compilation/profiling has been quicker than the naive approach; plus, we can iteratively update the oracles' granularity as execution progresses (or even use a caching mechanism to do this "for free").
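As a toy illustration of why commutativity helps (made-up numbers, not the transpilation pipeline described above): increments to the same uint256-style balance can be applied in any order, so partial sums can be computed in parallel and merged once at the end instead of being serialized as conflicting writes.

```python
from concurrent.futures import ThreadPoolExecutor

# Credits to one hot address (e.g., an airdrop contract's balance).
# Because addition is commutative and associative, the final balance is
# independent of the order in which these deltas are applied.
deltas = [5, 17, 3, 42, 8, 11]

chunks = [deltas[i::4] for i in range(4)]  # split the work across 4 workers
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum, chunks))  # conflict-free partial sums

balance = 1_000 + sum(partials)  # single merge at the end
assert balance == 1_000 + sum(deltas)
print(balance)  # 1086
```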

However, static-analysis-based approaches like this are usually challenging engineering-wise and are a potential DoS vector: back when The DAO incident happened in 2016, people proposed soft forks that relied on static analysis to filter transactions, but the idea was soon discarded because of static analysis's latency and intractability concerns. It was argued that malicious actors could intentionally craft transactions that drastically drag the analysis speed down and thus cause serious network congestion.

This is similar to MEV searchers spamming duplicate transactions on Solana and causing the network to crash. Just as Solana's crashing problem can be addressed by introducing a fee market and charging simulation fees (for reverted transactions), we can introduce an incentive system to prevent systems that use static analysis from getting DoS-ed. For example, analogous to EIP-2930, we can naively reward people inversely proportionally to how long their transactions take to be statically analyzed. And, just like all incentive systems, this fee market has to be enforced inside some kind of consensus protocol that produces deterministic results, which amounts to adding another "n-dimensional fee market" layer on top of the existing chain, one that has to take into account all the incentive pricing for block builders, smart contract developers, and transaction senders. However, even if we implemented this incentive system, our scheduler $\delta_p$ would still be subject to Amdahl's law, so there are diminishing returns to our level of refinement.

Oracles & Programming paradigm

Since Solana introduced parallelism early on, project developers write their contracts in a parallelism-aware manner. For example, Serum puts different trading pairs for different orders into separate accounts, which enables a higher parallelization ratio. Moreover, when users sign their transactions, they are automatically providing a relatively fine-grained state witness oracle to the virtual machine (compared with if we store data for different pairs in the same account), and this is all "for free" in the sense that it incurs little to no extra execution latency, though this design does come with some tradeoffs on gas.

This is because having more accounts means the smart contract takes up more storage space, which means the Ethereum state Merkle trie will be larger and transactions will potentially cost more gas (because they need to initialize/change more storage locations). More importantly, having more accounts means you also have to write extra smart contract logic to combine those accounts, incurring non-negligible gas costs (which might not be a problem on Solana but is on Ethereum). Additionally, the extra controller logic for different accounts might create an uneven distribution of gas fees across users (as in the case of a cache miss), thus creating new MEV opportunities. And we see this flaw in the real world too: in an early version of Tornado Cash, the claim function for AP (the LP token for anonymity mining) updated not only your Merkle leaf but also everybody else's. So there was a time when the gas cost of claiming LP rewards was so high that for more than 24 hours nobody claimed, and everyone waited for one "hero" to bear all the gas costs.

Another related design is Conflict-Aware Token Distribution, proposed from the observation that token distributions (airdrops, IDOs, etc.) are by far the most common source of shared data conflicts. This distribution design mandates that the initial sender address be split into several addresses so that claiming transactions can be processed in parallel, because they no longer access the same sender address balance.

Interestingly, having a parallel virtual machine helps developers make their protocol's UX better, because they now have an incentive to do parallelism-aware smart contract programming. For example, since Solana has a per-account fee market, a protocol with a parallelism-aware design, which has more accounts, will be able to provide fairer fees for its users: those who trade on less popular pairs won't have to pay for the transaction congestion caused by popular trading pairs in the same protocol. Moreover, the shift in developer programming habits can in turn generate better state witness oracles and an inherently higher parallelization ratio, achieving higher network throughput, so the network is not only fairer but also faster for everybody.

Now, Ethereum doesn't have a per-account fee design. But it can be argued that MEV on Ethereum acts like a per-account fee market: transactions that touch popular accounts/smart contracts usually emit more MEV than those that touch unpopular contracts, so they are included in the block first (as searchers/builders want to capture more of the outstanding MEV). So the MEV a transaction on Ethereum emits is inherently its fee (the effective gas price of the bundle), and MEV is even more refined than a per-account fee market: it is per-transaction.


But all those incentives come at the expense of (i) higher engineering effort and (ii) higher gas costs. So for many, the tradeoff might not be clear. A more radical approach is to directly design a metric (EIP) for incentivizing parallelism-friendly designs. For example, one can design a function that gauges how much a smart contract's transactions add to the system's parallelization ratio and rewards the deployer accordingly. Note that this is different from EIP-2930, which incentivizes users for providing refined state witness oracles.

Automatic Refinement of Oracles

Sloppy counters have been used in Linux kernel scaling to artificially increase the parallelization ratio, and Garamvölgyi et al. first proposed using them for smart contract scaling. One can think of a sloppy counter as a way to use algorithms to automatically expand your program into a parallelism-aware style (i.e., a way to automatically refine state witness oracles).

Specifically, sloppy counters replace one counter (e.g., a uint256 or uint256[]) with multiple sub-counters and a guard expression that classifies incoming transactions and assigns them to different sub-counters. For example, if two transactions (signed by 0xa and 0xb respectively) send tokens to address 0xc, then a sloppy counter on the token balance of 0xc will have two sub-counters, one recording the update from 0xa and the other recording the update from 0xb. We can easily see that this resolves the shared data conflict.

However, if we want to read from a sloppy counter, our program would have to perform the extra computation to merge sub-counter values. And if we have to initialize sub-counters, this process would incur significant runtime latency. Thus, they are only suitable for write-heavy variables.
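Here is a minimal sketch of the idea (illustrative Python, not the EVM-level implementation from the paper): writes go to per-writer sub-counters so they never touch the same slot, and reads pay the extra cost of merging the sub-counters.

```python
from collections import defaultdict

class SloppyCounter:
    """Toy sloppy counter: one sub-counter per writer key.

    Writes from different senders touch disjoint sub-counters, so they can be
    scheduled in parallel without a shared data conflict. Reads must merge all
    sub-counters, which is the extra cost that makes sloppy counters suitable
    mainly for write-heavy variables.
    """
    def __init__(self) -> None:
        self.sub = defaultdict(int)  # guard expression: route by sender address

    def add(self, sender: str, amount: int) -> None:
        self.sub[sender] += amount   # conflict-free write path

    def read(self) -> int:
        return sum(self.sub.values())  # merge on read (extra computation)

balance_of_0xc = SloppyCounter()
balance_of_0xc.add("0xa", 10)  # these two updates no longer conflict
balance_of_0xc.add("0xb", 5)
print(balance_of_0xc.read())   # 15
```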

Intuitively, sloppy counters are a more formalized version of the partitioned storage we see in many existing Solana smart contract designs. So naturally sloppy counters also have the previously discussed flaws of inherently higher gas cost and a potentially uneven gas distribution across the dimension of calls, even though they create a fairer, more even gas distribution across the dimension of accounts.

Optimistic concurrency controls

Optimistic concurrency control (OCC) has been studied and used extensively in traditional distributed database systems with a focus on maximizing throughput. Specifically, OCC optimistically assumes that each transaction gets the lock/resource, and if the transaction fails to commit, it is retried in the future.

OCC with a deterministic commit order only requires that the resulting state on all nodes correspond to some ordering of the transactions, not one particular ordering. OCC with deterministic aborts additionally guarantees that if a transaction is aborted on one node, it is aborted on all other nodes.
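Here is a highly simplified sketch of the optimistic execute-validate-retry loop (single-process, illustrative Python; not any production OCC protocol): transactions execute speculatively against a snapshot while recording read/write sets, and any transaction whose reads conflict with earlier commits is deterministically re-executed, so every node replaying the same input arrives at the same state.

```python
from typing import Callable, Dict, List, Set, Tuple

State = Dict[str, int]
Tx = Callable[["Recorder"], None]  # a transaction runs against a recording view of state

class Recorder:
    """Records the read and write sets of a speculatively executed transaction."""
    def __init__(self, state: State):
        self.state, self.reads, self.writes = state, set(), {}
    def get(self, key: str) -> int:
        self.reads.add(key)
        return self.writes.get(key, self.state.get(key, 0))
    def put(self, key: str, value: int) -> None:
        self.writes[key] = value

def _run(state: State, tx: Tx) -> Tuple[Set[str], Dict[str, int]]:
    rec = Recorder(state)
    tx(rec)
    return rec.reads, rec.writes

def occ_execute(state: State, txs: List[Tx]) -> State:
    """Phase 1: execute every tx against the same snapshot (parallelizable in principle).
    Phase 2: validate and commit in a fixed order; a tx that read a key written by an
    earlier commit is re-executed against the updated state (deterministic retry)."""
    speculative = [_run(state, tx) for tx in txs]
    committed_writes: Set[str] = set()
    for tx, (reads, writes) in zip(txs, speculative):
        if reads & committed_writes:          # conflict detected during validation
            reads, writes = _run(state, tx)   # deterministic re-execution
        state.update(writes)
        committed_writes |= set(writes)
    return state

# Two transfers spending from the same account conflict; the second is re-executed.
def pay(sender: str, to: str, amt: int) -> Tx:
    def tx(r: Recorder) -> None:
        r.put(sender, r.get(sender) - amt)
        r.put(to, r.get(to) + amt)
    return tx

final = occ_execute({"0xa": 100, "0xb": 100, "0xc": 0},
                    [pay("0xa", "0xc", 10), pay("0xa", "0xb", 20)])
print(final)  # {'0xa': 70, '0xb': 120, '0xc': 10}
```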

Different variations of OCC can parallelize Ethereum at the consensus protocol level by fundamentally changing the way blocks are produced: no specific transaction ordering is proposed as in PBS; instead, each node runs the same OCC protocol, which guarantees some level of determinism for the ordering. If we're willing to modify the consensus protocol, there doesn't even need to be a serial order: the miner could just include a dependency graph in the block. But then there needs to be some mechanism to punish dishonest miners who omit or add extra dependencies.

However, many OCC variants overlook the role of MEV in distributed systems design: the ordering produced by an OCC scheduler doesn't take into account how MEV is extracted, and can therefore potentially destabilize consensus or at least lead to serious centralization of actors. Having a more "free" OCC determinism is like "solving MEV with random transaction ordering," which inherently encourages more spam (because MEV extraction becomes probabilistic, so spamming more transactions gives you a greater chance of extraction). Eventually, the validators (miners) and searchers (builders) will likely derive two kinds of OCC schemes: searchers employ a more "free" one, since they don't have to make the resulting execution ordering correspond to a specific sequential ordering, while validators use a much stricter OCC which adheres to a deterministic order proposed by the builder.

Here we make a conjecture: no matter what level of determinism granularity an OCC variant chooses, it doesn't matter, because due to MEV the ordering of transactions will eventually converge. We could, however, explicitly enforce the compatibility of OCC with MEV by devising a special kind of bundle that allows searchers to state that they don't mind being scheduled differently under certain circumstances. This is isomorphic to having the parallelism-aware fee market baked in at the PBS (Flashbots) level.

To summarize, consider the following theory:

The Law of MEV Conservation: assume the total amount of MEV extractable in the system is $\gamma$, the amount of additional on-chain computation required to extract MEV is $\delta$ (in gas), the fee market dictates that the additional fee per gas is $\epsilon$, the latency of the system is $l$ (in seconds), and the cost of achieving that latency by having more off-chain computation is $\beta$; then eventually $\gamma = \epsilon\delta = \beta l$. This means that MEV is conserved through the fee market (more spam computation) and node centralization (latency) in the long-term equilibrium.

We predict that in reality, OCCs (just like all distributed systems) fall under the law of MEV conservation. Specifically, since the total amount of MEV $\gamma$ in the system is not changed, the latency gains (a smaller $l$) and MEV extraction friction gains (a higher $\delta$) from OCC will be conserved through (i) a higher $\epsilon$, which means more fees per transaction, and (ii) higher centralization within the network, because centralization is a monotonically increasing function with $\beta$ as its domain, and in this case $\beta$ is increased since $\gamma$ is fixed.
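To make the conjecture concrete, here is a plug-in with purely illustrative numbers (our own, not from any measurement), reading $\gamma$ as MEV per block:

$$
\gamma = 100\ \text{ETH},\quad \delta = 10^{7}\ \text{gas} \;\Rightarrow\; \epsilon = \frac{\gamma}{\delta} = 10^{-5}\ \text{ETH/gas};\qquad
l = 0.1\ \text{s} \;\Rightarrow\; \beta = \frac{\gamma}{l} = 10^{3}\ \text{ETH per second of latency}
$$

In this reading, the extractable MEV is fully dissipated at equilibrium, through on-chain spam fees and off-chain spending on latency.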

Wat do?

Ideally, we would live in a world where storage partitioning (sloppy counters) and parallelism-aware programming paradigms are incentivized and widely adopted. But to get to that world we still need much work on developer advocacy (e.g., building template libraries & organizing workshops) and potential protocol-level changes, after we thoroughly study the tradeoffs. As a result, we wrote a note which presents two specific example algorithms for the first trick (refinement), as that is the optimization we can execute immediately. It is also interesting that refinement is the only trick that gives searchers/builders a unique advantage in parallelism-related EVM speedups, so it is the one searchers would care about the most, while the other two are more public-good types of work that would draw more interest from protocols and applications. In an MEV endgame world where every block builder has the same edge from the mass adoption of parallelism-aware programming, the optimization they would focus on is refinement.

Zooming out

In this section, we highlight why the concept and analysis of parallelization (parallel execution) discussed in this post differs from that of previous proposals (parallel verification). Specifically, we argue that parallel execution (which mostly contributes to MEV searcher/block builder/sequencer/prover speedups) is important for blockchain finality, while parallel verification (which mostly contributes to fraud proofs, data availability proofs, light clients, and stateless Ethereum) is important for blockchain scalability.

Parallel verification

Previous attempts at "EVM parallelization" like statelessness & fraud proofs and data availability proofs all focus on the idea of parallel verification[9]: execution of transactions only needs to be done once by a few centralized parties, and then any decentralized node can jump in anytime (thus "parallel") to verify the integrity of the computation. This idea holds up strongly because of (i) cryptographic magic that guarantees much faster verification than computation (e.g., SNARKs, STARKs, erasure coding, etc.) and (ii) the fact that the extra information (state witness traces) needed for parallel verification trades disk IO for network IO, and bandwidth grows much faster than storage.

Here, the parallel part comes after precise state witness (i.e., $\kappa$) generation. Typically, a cryptographic accumulator (e.g., a Coded Merkle Trie, Verkle Trie, or multi-dimensional Reed-Solomon Encoded Merkle Trie) is generated after the sequencer first executes the transactions and the state traces are produced.
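As a minimal illustration of why verification parallelizes so well (a toy Merkle proof check in Python, not any particular client's accumulator): given a commitment to the state, each node can independently check a witness for just the slice of state it cares about, with work logarithmic in the state size.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    proof, level = [], [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append(level[index ^ 1])  # sibling at this level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

state = [b"0xa:100", b"0xb:42", b"0xc:7", b"0xd:0"]  # toy (account, balance) leaves
root = merkle_root(state)
proof = merkle_proof(state, 2)
print(verify(root, b"0xc:7", 2, proof))  # True: an O(log n) check per witness
```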

Parallel execution

If we want to parallelize the execution of a program rather than its verification, we have little to no state witness information about it (at least not precise information, due to Rice's theorem[10]). Thus, we rely on state witness oracles, which are generated by offloading work to others who pre-process transactions and transmit the processing results.

Therefore, through the use of oracles, the problem of parallel execution is reduced to (i) the integrity of the witness information and (ii) the tradeoff of local computation latency against the network latency of information propagation. For (i) there are numerous solutions using cryptography or economic incentives, and for (ii) the tradeoff is justified for exactly the reason discussed previously.

To make further optimizations on top of our use of oracles, we introduce refinements (through e.g. formal methods, the advocacy of new programming paradigms, or tricks like sloppy counters). Of course, refinements introduce new problems, like being a potential DoS vector, whose solution is generally devising a good incentive mechanism for relays, which in turn is the problem of devising an optimal fee market. There is no known "definitive" guide to this problem, but there are many examples we can look at: bloXroute's simulation fees, Tornado Cash's relayer rewards, and many cross-chain bridge relay designs.

However, those designs all fall under the transaction quality trilemma (excluding "free" refinements from parallelism-aware programming), and all we can do is explore and try to find a sweet spot within the trilemma. For example, the level of granularity of the state witness oracles roughly equals the amount of MEV you allow block builders/sequencers to generate, so in a PFOF-ish system, submitting a transaction with a fine-grained state witness makes your orderflow worth more than others', and searchers are thus willing to pay more for it. A potential future direction is to research an effective parallelization oracle market. One immediate thing that can be done is to replace the probabilistic access list in EIP-2930 with a more advanced data structure (e.g., symbolic state witness summaries like the ones we used in the Precise Analysis section of the refinement notes).

Case Study: Parallelizing zk-rollups

Zero-knowledge rollups encode off-chain computation as concise cryptographic proofs and publish them on-chain to be verified by a layer-1 smart contract. Using StarkNet as an example, a transaction on a zk-rollup undergoes three phases during its lifetime:

  1. Users send transactions off-chain to sequencers, who determine the ordering of the batch (the transactions they receive within some timeframe) and then send the ordering to provers. Obviously, in this step the sequencers have immense power, as the MEV that can be extracted from the batch (which will probably contain millions of transactions once zk-rollups gain adoption) is huge. So they need to be decentralized (or at least have the MEV profit democratized) using a PBS or Flashbots-like system.
  2. Provers execute the batch according to the received ordering, record the result (state witness information) of the computation, generate cryptographic proofs (in this case STARKs) which encode the computation results, and then publish the proof to a verifier.
  3. The verifier (usually a smart contract on a layer-one blockchain like Ethereum) verifies that the proof is correct, and from this point onward the finality of the batch is the same as the finality of the layer-one blockchain. This means that on a PoW blockchain the batch is still prone to reorgs, while on a PoS blockchain with finality guarantees (e.g., PoS Ethereum with single slot finality) the security guarantees are much stronger.

So, what can we do to use parallelism (or more generally, shared state conflict analysis) to help increase the throughput of zk-rollups? We propose two directions: (i) just as in the block builder case, we can use parallelism to help sequencers extract more MEV and produce more efficient batch orderings; (ii) using the shared state analysis information, we can do recursive proofs by sharding the transaction batch across different provers and then generating an "aggregator proof" stating that the results of all those separate proofs can be correctly integrated. This helps reduce the overall latency of the proving phase and, more importantly, helps prover decentralization, as the computation task for each prover is greatly reduced. Moreover, due to the way zk-rollups work, direction (ii) has a much more direct impact on UX than direction (i), because:

Finality

Due to the computation-intensive nature of zk proof generation, prover latency is many times worse than sequencer latency. And because provers and sequencers see roughly the same shared state conflict rate (since they process the same batch of transactions), they should theoretically achieve the same percentage of parallelism speedup. Thus parallelized provers reduce the overall latency much more than parallelized sequencers.

Mechanism Design

Since the time window for proof generation is lowered, we have an easier (tokenomics) mechanism design problem, which enables better decentralization (by lowering prover nodes' minimum stake requirements). To see this, consider an example: if it takes 1 minute for sequencers to determine the ordering and 1 hour for provers to generate the proofs, then after processing a batch the sequencer has to optimistically "trust" (for 59 minutes) that the proof of the batch will be correctly generated, published, and verified according to the sequencer's ordering.

Currently, this "trust" is justified by mechanism design: the verifier contract will verify that the ordering has not been tampered with and slash the prover if it has.

Alternatively, if the data availability of the transaction sequencing is on the sequencer side (meaning the sequencer is responsible for posting the transaction ordering onto either layer one or somewhere else), then there need to be incentives in place so that sequencers do not re-org. Typical layer 2s post sequencing data (or simply a hash of the end state, or state diffs) only about 2x faster than the proof, which adds security risk, and the frequency of state diff posting should be increased.

Interestingly, if we have a rational-majority assumption on the verifier contract's layer one, then we need to set the reward parameters right so that the mechanism design for our rollup guarantees liveness (given that we also have a rational-majority assumption for the provers). But if we have an honest-majority assumption on layer one, then we only need an honest-minority assumption for the prover nodes on our rollup, because as long as there is even one honest prover, liveness is guaranteed (although in reality latency will be seriously hurt, so the user experience will degrade a lot). This is an example of how MEV can affect the security guarantees of layer twos.

Therefore, the current mechanism makes sense only if the prover's reward from being honest is larger than the MEV that can be extracted. But since there is a 59-batch delay due to the time it takes for proof generation, the prover can essentially see 59 batches into the future, and thus there is more MEV to be extracted from tampering with the ordering. The reward paid to provers therefore has to be much higher (since the MEV that can be extracted is superlinear in the number of transactions) than if proof generation only took 10 minutes. Now you might ask: "didn't the sequencer already produce an (optimal) ordering, so that there is little room for more MEV to be extracted?" This is untrue, as the prover has more information than the sequencer and thus can make more profit through "probabilistic MEV" (e.g., a market maker can manage her inventory according to this additional information by adjusting the pricing function for each asset accordingly).

One might also notice that the argument that "parallelization enables a stronger mechanism design" is similar to the argument for "eliminating MEV by shortening blocktime": both reduce the total amount of MEV available but increase the centralization of MEV extraction activity. To be clear, though, in this design of a parallel StarkNet there are components like sequencer PBS and prover incentivization systems, so the negative externalities are much better mitigated than on those short-blocktime layer ones.

Scalability

Prover parallelism is much easier than sequencer parallelism because sequencers don't have the exact state witness information, while provers (just like in the case of parallel verification) can have perfect information on state witness traces. This can be achieved by having the sequencers record the state witnesses they get from their execution and transmit them to the provers as oracles. But of course, this again introduces the problem of information integrity, as there is currently no mechanism to ensure that the state witness oracles provided by the sequencers are correct (and if so, to what granularity?). So to solve that problem we need additional incentives/cryptography, which introduces extra complexity and latency. For example, we could have a "parallelism oracle layer" (akin to the idea of a separate data availability layer) between the rollups and layer 1s, where people deploy trustless commit/slash contracts to make the incentives work out in a modular and decentralized way.
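A minimal sketch of how such witness traces could be used to shard a batch across provers (illustrative Python; this is not StarkNet's actual recursive-proving pipeline): build the conflict graph from the traces, take its connected components with a union-find, and hand each independent component to a different prover, with an aggregator proof combining the sub-proofs afterwards.

```python
from collections import defaultdict
from typing import Dict, List, Set

def shard_batch(traces: Dict[str, Set[str]], provers: int) -> List[List[str]]:
    """Group transactions whose witness traces touch overlapping state into the
    same shard (connected component), then distribute components over provers."""
    parent: Dict[str, str] = {tx: tx for tx in traces}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    touched_by: Dict[str, str] = {}
    for tx, keys in traces.items():
        for key in keys:               # union transactions that share any state key
            if key in touched_by:
                union(tx, touched_by[key])
            touched_by[key] = tx

    components: Dict[str, List[str]] = defaultdict(list)
    for tx in traces:
        components[find(tx)].append(tx)

    shards: List[List[str]] = [[] for _ in range(provers)]
    for i, comp in enumerate(sorted(components.values(), key=len, reverse=True)):
        shards[i % provers].extend(comp)  # each component stays on one prover
    return shards

# Toy traces: tx1/tx2 touch the same AMM pool slot, tx3 is independent.
traces = {"tx1": {"pool:0"}, "tx2": {"pool:0", "0xb:balance"}, "tx3": {"0xc:balance"}}
print(shard_batch(traces, provers=2))  # [['tx1', 'tx2'], ['tx3']]
```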

An interesting observation is that the finality argument here doesn't apply to optimistic rollups at all, so the case for parallel execution is much more justified for zk-rollups than for optimistic rollups. This is because zk-rollups have heavy computation at the execution level and light computation at the verification level, while optimistic rollups have heavy computation at the verification level but light computation at the execution level. Specifically, fraud/DA proofs become valid only after enough time has passed or enough nodes have checked them, so the computation latency of any single node doesn't matter. However, the scalability argument does apply: we can decentralize fraud-proof-checking nodes in optimistic rollups just as we decentralize prover nodes in zk-rollups, since both can obtain $\kappa_{\text{perfect}}$.

Conclusion

Through the lens of MEV, we discussed how optimization choices on the EVM impact, and are impacted by, protocol design. We discussed the tradeoffs of several parallelism proposals to increase EVM execution speed (for block building, private information bundle merging, and zk-rollups) and showed why they are subject to “the conjecture of MEV conservation.”


  1. Of course, many other optimizations don't need base layer changes, such as batching state read & write operations, caching, flattening the storage database, or just using a more performant programming language or hardware acceleration. We see many of these optimizations implemented in clients like Optimism EVM, Nethermind, Akula and Silkworm. ↩︎

  2. We deliberately chose this formalization of the parallelization algorithm $\delta_p$ to evade complicated usage & explanation of fine-grained locks. ↩︎

  3. X-level granularity means X is the smallest unit whose code is executed sequentially. So transaction-level parallelism means we parallelize the execution of transactions, but the code inside each transaction is still executed sequentially. ↩︎

  4. Amdahl's law is a formula for predicting the maximum speedup ratio you can achieve using parallelism under limited resources. ↩︎

  5. For the exact experiment setup and more data insights refer to the ICSE'22 paper Utilizing Parallelism in Smart Contracts on Decentralized Blockchains by Taming Application-Inherent Conflicts by Garamvölgyi et al. ↩︎

  6. Interestingly, our observations from bundle clashing analysis show that bundles have a lower shared data conflict rate. This seems counter-intuitive, as there are only so many MEV-emitting transactions in one block and most MEV searchers go after the same opportunities, thus creating many bundles which contain the same transaction. We highlight this distinction as different shared data conflict rates could lead to very different analyses & scheduling strategies. ↩︎

  7. State witnesses are discussed extensively in the context of stateless Ethereum and data availability layers. They are usually passed in the form of Verkle Tries. ↩︎

  8. For simplicity in this post, one can interpret state witnesses as the encoding of shared data information $\kappa$ in a cryptographic setting (specifically, when the data is represented in the form of an accumulator). ↩︎

  9. And most works have focused on how to generate a better cryptographic accumulator intended for shorter fraud proofs and/or state witness representations. ↩︎

  10. Intuitively, this is because the state witness of a future transaction is dynamic. To see this, a simple example from Vitalik's Ethereum Sharding FAQ: "if the transaction execution has code of the form read(f(read(x))) where the address of some state read depends on the execution result of some other state read. In this case, the address that the transaction sender thinks the transaction will be reading at the time that they send the transaction may well differ from the address that is actually read when the transaction is included in a block, and so the Merkle proof may be insufficient." ↩︎