# CAMCOS 2022 Spring Projects Brainstorm

My current plan, in order of priority:

1. try to flesh out Aditya + Barnabe's question(s). It seems nice, open-ended, and potentially fruitful. However, it's a bit underspecified right now for my tastes.
2. as a backup, work on further EIP-1559 stuff, starting with multidimensional EIP-1559.

## (Aditya + Barnabe) The Ethereum User Experience

This is an attempt to summarize and clarify Aditya and Barnabe's ideas, trying to convert them into projects that would make sense either for a current CAMCOS project or for next semester's CAMCOS project. From my understanding, the main question here is:

> What does the right UI (the numbers and options shown to the user) look like?

My take on why this question is confusing is that it really has several layers, depending on which "layer" the user is on. So I'm going to try to separate the layers and see if I got the questions right.

### From the transaction to the secondary layer

Here, the "user" is some normal person. You make a transaction. Then your phone sends it into the mempool, and stuff happens to it. The question then is something like:

> What should my phone tell me about how to make my transaction / the status of my transaction?

This is itself two questions. I think the first question, what I'll call the **action space question**, is **very similar to that of a Geth oracle**: the phone should tell you

- the current gas prices;
- some recommendations like "if you want your thing included soon, you should tip X" or "if you want to be included within 100 blocks, you should tip Y".

Basically, what information will help me make decisions?

The second question, which I call the **status question**, is not about helping me make decisions, but about what is going on with my transaction. My guess is that in proof-of-work, when a user sends a transaction, they get to see something that moves between these states (sketched in code at the end of this subsection):

- transaction sent
- transaction sent, and it still hasn't been put in a block; it's probably hopeless at this point
- transaction put in a block
- transaction put in a block that is k blocks deep and is on the head of the chain
- transaction put in a block that is >= some number (like 12?) of blocks deep; we are now fairly sure your transaction is included!

(Is this picture correct?)

So first of all, I think the easiest mode of this question is already answered: you can kind of *still* do the above, and to the average user this is not going to make a difference. But I'm guessing you want something more sophisticated, like:

1. at the least, you can add "transaction in finalized block" as the ultimate guarantee;
2. maybe ultimately, something like "transaction in a block that is X% sure to be finalized", giving them a single number (notice we can theoretically also do this in proof-of-work, but the number is different);
3. maybe warnings like "we might be under a liveness attack".

So, **where is the math**? I think the action question is about game theory (including EIP-1559)! I think the status question is about how **proof-of-stake** differs from **proof-of-work**, so we need to use different heuristics / bounds to see what numbers we can show the user.

TODO: **What's the most basic unanswered question that we can try to work on?**
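To make the status question concrete, here is a minimal Python sketch of the state machine described above. The depth threshold of 12 and the status strings are illustrative placeholders, not values from any wallet or client.

```python
# Minimal sketch of the "status question" state machine above, assuming
# the proof-of-work picture plus the proof-of-stake addition (finality).
# CONFIRMATION_DEPTH is the "some number (like 12?)" heuristic, a placeholder.

CONFIRMATION_DEPTH = 12

def tx_status(sent: bool, in_block: bool, depth: int = 0,
              on_head_chain: bool = False, finalized: bool = False) -> str:
    """Map a transaction's observable state to a user-facing status line."""
    if not sent:
        return "not sent"
    if not in_block:
        return "sent, still not in a block: probably hopeless at this point"
    if finalized:
        return "in a finalized block: the ultimate guarantee"  # PoS-only state
    if depth >= CONFIRMATION_DEPTH:
        return f"in a block >= {CONFIRMATION_DEPTH} deep: fairly sure it's included"
    if on_head_chain:
        return f"in a block {depth} deep on the head of the chain"
    return "in a block"

print(tx_status(sent=True, in_block=True, depth=3, on_head_chain=True))
```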
### From the secondary layer to the block proposer

Here, the "user" is a **sequencer** (?): someone who has a list of transactions from the mempool, and who might operate a rollup, be Flashbots, or run some other layer-2 thing. The sequencer has a list of transactions, and what they want is a user experience for proposing things to the block proposer. Here, I don't see them looking at their phone; I see them looking (or not even looking, since it can be done automatically) at a computer terminal, running software that exposes a certain interface to the state of the blockchain, through which they can automatically or manually act. So instead of a phone, I see them asking:

> What should my (open-source) software tell me, derived from the publicly available information, that can help me navigate my action space?

I think the meat of the question here is to figure out what the action space is. I think it's something like this (see the sketch at the end of this subsection):

1. they propose parts of blocks to the block proposer (am I getting this right?);
2. so they can give a bundle of transactions to the block proposer and say something like "I want to put this sequence of transactions into the block; I'll pay you X (the payment is part of the proposal) to make sure this sequence is built on block Y";
3. extensions might include "... and also in the first half of the block".

This means I imagine the things the user wants to see are something like:

1. a list of blocks that they can build on;
2. some values on those blocks; maybe certain blocks are marked "safe" and certain blocks are marked "unsafe".

These are things I guess a normal user would not be interested in, but a sequencer would.

My take on the status question is that it is *basically identical* to the status question discussed previously. Basically, a status indicator like:

- you sent your bundle of transactions to the proposer;
- the proposer still hasn't proposed your block; it is probably lost at this point;
- the proposer proposed your block!
- the proposer proposed your block, and now it's X% likely to be finalized;
- the proposer proposed your block, and it is finalized.

So **where is the math**? I think for the action space question, the answer is *split* between the proof-of-stake stuff (since that matters in telling you which blocks are safe or not) and some understanding / potential game theory (?) of the actual action space in the interaction between the sequencer and the block proposer, which I hope I got right. For the status question, I think it's basically the same question as in the previous case.

TODO: **What's the most basic unanswered question that we can try to work on?**
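Here is a hypothetical encoding of that action space as data structures, just to pin down what the sequencer's software would need to display. Every field name here is invented for illustration; this mirrors no real rollup or Flashbots API.

```python
# Hypothetical sketch of the sequencer -> block proposer action space.
# All field names are assumptions made for illustration only.

from dataclasses import dataclass, field

@dataclass
class BundleProposal:
    txs: list[str]        # the ordered sequence of transactions
    parent_block: str     # "make sure this sequence is built on block Y"
    payment: int          # "I'll pay you X" (part of the proposal)
    placement: str = "anywhere"  # extension: e.g. "in the first half of the block"

@dataclass
class SequencerView:
    buildable_blocks: list[str]   # blocks the sequencer can build on
    safety: dict[str, str] = field(default_factory=dict)  # block -> "safe"/"unsafe"

view = SequencerView(buildable_blocks=["0xabc...", "0xdef..."],
                     safety={"0xabc...": "safe", "0xdef...": "unsafe"})
proposal = BundleProposal(txs=["tx1", "tx2"], parent_block="0xabc...", payment=10**17)
```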
### Barnabé's notes

This seems slightly misstated. From what I see there are two settings:

1. A user wants to include their transaction on the primary layer.
2. A user wants to include their transaction on the secondary layer.

Generally, what we want is to offer degrees of confidence to the user before inclusion, and guarantees of "non-uninclusion" (e.g., against reorgs) after inclusion. The first, before inclusion, has to do with the gas market. The second, after inclusion, has to do with consensus properties.

- Before inclusion, a user can observe the gas market to gauge the price they should pay to be included. You know well by now that 1559 made that operation a little easier :) A user who cares about maximum-confidence fast inclusion should bump up their priority fee to hedge against the case where the next block is full.
- After inclusion, the guarantees have more to do with chain stability. Under PoS (which is what we care about), we can offer guarantees that a transaction will not be reverted based on the voting weight accumulated by the block that includes the transaction. In Gasper, validators vote for their view of the head of the chain, and these votes are tallied such that the more weight a block accumulates, the harder it becomes to reorg it. See also [Dankrad's post](https://ethresear.ch/t/high-confidence-single-block-confirmations-in-casper-ffg/8909), which currently guides our implementation of "safe head", i.e., answering "when is my transaction safe?"

The two points above take the example of the primary layer, where a user directly interacts with the Proof-of-Stake chain. This user can be many different actors: someone who sends a Uniswap transaction or makes a deposit to a rollup; a rollup sequencer who publishes data to L1; Flashbots trying to interface directly with the block producer to have their bundles inserted.

On the secondary layer, we have the same problem but a different instantiation. At the moment, let's only consider a simple user who wants to make a transaction with an app deployed on the Layer 2, e.g., the user wants to use Uniswap on Optimism. In this case, the user must send their transaction on the Optimism network and wait until a *sequencer* adds their transaction to a batch. The sequencer listens to the "Optimism mempool" and, once they can produce a batch, they do so by grabbing a subset of transactions $T$ from the mempool. There is then a two-step process:

1. The sequencer publishes the batch to the L2 network. This informs all participants of the network that there is a new batch of transactions (<=> a new block).
2. The sequencer must at some point post the L2 batch to the primary layer. They take the batch of transactions $T$, compress it to a data chunk $\overline{T}$, and publish that data chunk to the L1.

Btw, this is where multidimensional EIP-1559 comes in. Currently, we price data chunks in multiples of execution, e.g., a non-zero byte costs 16 gas and a zero byte costs 4 gas. This is weird: why should data be priced like this? Multidimensional EIP-1559 allows us to separate the two markets. E.g., now we are selling oil for the engines to run, but also metal to build warehouses that store the data. The oil and metal markets are distinct, as they should be. We can set how much average data bandwidth we want to offer (how many data chunks can be written per second) by setting the target for that EIP-1559 data market.
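As a toy illustration of this separation, here is a sketch that runs the standard EIP-1559 basefee update rule independently in two markets, one for execution ("oil") and one for data ("metal"). The targets and starting basefees are made-up numbers, not values from the actual proposal.

```python
# Toy multidimensional EIP-1559: the standard basefee update rule,
# run independently per resource. Targets/basefees are illustrative only.

ADJUSTMENT_QUOTIENT = 8  # as in EIP-1559

def update_basefee(basefee: float, used: float, target: float) -> float:
    """One EIP-1559 step: basefee rises when usage is above target, falls when below."""
    return basefee * (1 + (used - target) / target / ADJUSTMENT_QUOTIENT)

# Two separate markets: "oil" (execution) and "metal" (data storage).
markets = {
    "execution": {"basefee": 100.0, "target": 15_000_000},  # gas per block
    "data":      {"basefee": 1.0,   "target": 250_000},     # bytes per block, say
}

def step(usage: dict) -> None:
    for name, market in markets.items():
        market["basefee"] = update_basefee(market["basefee"], usage[name], market["target"])

step({"execution": 20_000_000, "data": 100_000})
# execution ran over its target, so its basefee rises; data ran under, so it falls
print(markets["execution"]["basefee"], markets["data"]["basefee"])
```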
Rough notes:

- L2 proposer putting things on L1: they give a string
  - memory vs. storage: a non-zero byte costs 16 gas to put in
  - [idea] oil / metal:
    - the warehouse is the storage; the engine is the computation
    - oil for computation is gas
    - 2 base fees, one for usage (computation) and one for the storage cost
- verifiability vs. scaling
  - idealize: the worst computer on the network should be a retail laptop
  - do tests and set some stress threshold
- zk-rollup example:
  - a user has 1 ETH; they send the ETH to a bridge smart contract
  - now they have 1 ETH on layer 2, and they send a transaction
  - the zk-rollup publishes state differences, plus a proof that the transition happened correctly on layer 2
  - users can use the proof and give some message to the bridge to get the layer-1 asset back
  - for an optimistic rollup, there's a (challenge) period instead
  - at the least, you save the gas each time
- L2 landscape:
  - ignore sidechains (Polygon); "commit chains" might save roots on the L1 chain
  - rollups: https://l2beat.com/
  - Arbitrum: centralized sequencing
  - decentralized sequencing is the next step, and context for the last question
    - simple version: 10 sequencers who are bonded ("bonded" = they put up some deposit)
    - [wait for reading list]
    - still optimistic versus zk
  - right now, with centralized sequencing, you can do things like promise "I'll include your thing in the next block". But what if you don't?

So here is a first question: **How would L2 sequencers interact with the L1 (multidimensional) fee market?**

Sequencers must quote a price to the users they include in their batches. This price includes:

1. The future price of publishing the transaction data to the L1 as part of the batch data chunk.
2. *Maybe* a fixed fee for their sequencer services (kind of like the fixed "miner fee" in EIP-1559).
3. *Maybe* a congestion fee to charge users when the L2 becomes too congested.

We can think of the L2 gas market like we do the L1: there is some bandwidth that we must respect, e.g., the L2 sequencer can only include up to $x$ amount of gas per second. We could then have an EIP-1559 mechanism on the L2 to price the congestion charge, where we target $x$, allow sequencers to use up to $y$ amount of gas per second, and do the basefee updates depending on the usage.

For cost 1., the issue is that a sequencer must quote the price to the L2 user *before* paying the cost of publishing. Here is an example:

| Timestep | L1 basefee | L2 inclusion cost | Revenue | L1 publication cost |
|-|-|-|-|-|
| 1. User inclusion | 10 | 10 | 10 | 10 |
| 2. User inclusion | 15 | 15 | 25 | 30 |
| 3. L1 publication | 20 | x | 25 | 40 |

Here we assume there are two users included in the batch. The first user is included when the L1 basefee is 10, so the sequencer tells them "you should pay me 10". The second user is included when the L1 basefee is 15, so the sequencer tells them "you should pay me 15". At this point, the sequencer's revenue is 25. At timestep 3., the sequencer is expected to publish the batch to the L1. Doing so will cost them 40 (2 * 20) because the L1 basefee is now 20. This is bad: the sequencer didn't manage to recoup their costs.

So the sequencer can change their pricing model: why not quote 2x the current L1 basefee, to have higher confidence they will be able to pay the publication cost?

| Timestep | L1 basefee | L2 inclusion cost | Revenue | L1 publication cost |
|-|-|-|-|-|
| 1. User inclusion | 10 | 20 | 20 | 10 |
| 2. User inclusion | 15 | 30 | 50 | 30 |
| 3. L1 publication | 20 | x | 50 | 40 |

Now the sequencer's revenue is 50 and they must pay 40. All good? In a sense yes, but now the users are overpaying by a lot.
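The two tables can be reproduced with a small sketch, where the only modeling assumption is that each user's quote is a fixed multiple of the L1 basefee at their inclusion time:

```python
# Reproducing the tables above: each user is quoted a fixed multiple of
# the L1 basefee at inclusion time, and the batch is published at
# whatever the basefee happens to be at publication time.

def sequencer_pnl(inclusion_basefees, publication_basefee, multiplier=1.0):
    """Quoted revenue minus the cost of publishing every included tx."""
    revenue = sum(multiplier * b for b in inclusion_basefees)
    cost = publication_basefee * len(inclusion_basefees)
    return revenue - cost

print(sequencer_pnl([10, 15], 20, multiplier=1.0))  # -15: costs not recouped
print(sequencer_pnl([10, 15], 20, multiplier=2.0))  # +10: covered, but users overpay
```

The second question below is essentially about replacing this fixed multiplier with a rule that keeps the sequencer solvent without that overhead.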
Here is a second question: **Think about pricing models that ensure sequencers recoup their costs with minimal overhead.** My current idea is to use derivatives, e.g., futures contracts on the basefee. If you have finance-minded students who are keen to explore this side, it may be a good self-contained question.

To recap, these are the different archetypes we have:

| Interacts with L1 fee market | Interacts with L2 fee market |
|-|-|
| Simple user interacting with dapp on L1 | Simple user interacting with dapp on L2 |
| Sequencer publishing data to L1 | xxx |
| Flashbots including bundles on L1 | Flashbots including bundles on L2 |

A **third question** looks at the relationship between the L2 user and the sequencer. What guarantees can a sequencer give to the user that they will be included in the next batch? For instance, we can think of the _fast pre-confirmation game_. In this game, the sequencer bonds themselves / stakes some amount. The sequencer is allowed to sign messages saying "I promise that I will include transaction $t$ in batch $b$". Should $t$ not be included in batch $b$, the sender of $t$ (or anyone else) can submit a proof that **a)** the sequencer promised to include the transaction and **b)** the sequencer did not hold up their promise. In the event a proof is submitted, part of the sequencer's stake is slashed.

In all the preceding, we only considered a situation where there is one centralised sequencer, i.e., the rollup relies on a single entity to select transactions, produce batches, and publish the data on L1. Ideally though, the role of the sequencer would be decentralised, to avoid situations where the centralised sequencer "goes down" and stops processing transactions (this happened to [Arbitrum](https://offchain.medium.com/todays-arbitrum-sequencer-downtime-what-happened-6382a3066fbc)). There are multiple models for decentralised sequencing: an [MEV auction](https://ethresear.ch/t/mev-auction-auctioning-transaction-ordering-rights-as-a-solution-to-miner-extractable-value/6788), a simple round-robin with bonded sequencers, a [proof of burn](https://ethresear.ch/t/spam-resistant-block-creator-selection-via-burn-auction/5851), [some version of Proof-of-Stake](https://ethresear.ch/t/against-proof-of-stake-for-zk-op-rollup-leader-election/7698), a [token model based on fees](https://fuel-labs.ghost.io/token-model-layer-2-block-production/)... none has currently emerged as the gold standard (mostly, rollups are busy deploying other features first).

As a **fourth question**, students could explore the different models of selecting the next sequencer. Each model is a different "game", like a mini-protocol. Can the games be formalised? Who are the players, and what are their action spaces? What are the outcomes?

## (Barnabe + Vitalik) EIP-1559 Continuation

Two general subclasses here as well:

1. General progress on / response to Vitalik's multidimensional EIP-1559 proposal.
2. Continuing our own work. The directions that seem most interesting are:
   a. more time series analysis (Barnabe mentioned an EF researcher who might be interested in things like Fourier decomposition);
   b. better models of users, using something like the valuation framework and/or common sense, probably dividing them up into "groups" based on behavior. This is supported by the bimodal data we got from earlier EIP-1559 projects, which I thought was one of the most interesting outcomes.
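For direction (a), here is a minimal sketch of what a Fourier decomposition of a basefee series might look like. The series below is synthetic (a daily cycle plus noise) standing in for real per-block data, and the ~13-second block time is an assumption.

```python
# Sketch for direction (a): Fourier decomposition of a basefee time series.
# The data is synthetic; a real analysis would load per-block basefees.

import numpy as np

BLOCKS_PER_DAY = 24 * 60 * 60 // 13      # assuming ~13 s blocks
t = np.arange(30 * BLOCKS_PER_DAY)       # 30 days of blocks

rng = np.random.default_rng(0)
# synthetic basefee: daily cycle + noise, standing in for real data
basefee = 100 + 20 * np.sin(2 * np.pi * t / BLOCKS_PER_DAY) + rng.normal(size=t.size)

spectrum = np.abs(np.fft.rfft(basefee - basefee.mean()))
freqs = np.fft.rfftfreq(t.size, d=1.0)   # in cycles per block
dominant_period = 1 / freqs[spectrum.argmax()]
print(f"dominant period: {dominant_period:.0f} blocks (~{BLOCKS_PER_DAY} = 1 day)")
```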
## (Aditya + Barnabe + Vitalik) Single-slot finality

I'm a bit confused here, since I think different people see different things in this one. TODO

## (Barnabé + Brian) Out-there Dark Forest idea

Here is a different project that relies more on data science. Dark Forest has had a couple of rounds where users sent transactions on the xDai (now Gnosis) chain. The transactions are cheap, so users sent many transactions to make their game moves. Equipped with a data set of all the transactions, the project could try to infer different player patterns/archetypes. Are anon players (e.g., unlinked Twitter accounts) more aggressive? Do players make use of the real-time gameplay to time their attacks for when opponents are asleep? Do coalitions form? This is the descriptive approach.

As a normative approach, students could think through ways collusion would be enabled by smart contracts. E.g., could smart contracts enforce non-aggression between players? Could they enable a team of players to "vampire attack" the game and shift other players' incentives? This line of thought is inspired by [Virgil's "Ethereum is game-changing technology, literally"](https://medium.com/@virgilgr/ethereum-is-game-changing-technology-literally-d67e01a01cf8).

Here are a few outcomes of this project, perhaps in order of difficulty:

1. Data analysis of past plays
2. Formalisation of the "game" (in game theory terms) of Dark Forest
3. Shifting player incentives / enforcing coalitions with smart contracts
4. Actually writing these smart contracts and trying them out in test play

This is more exploratory, and less related to the core Ethereum protocol. However, extracting general lessons about the space of smart contracts that blockchains natively allow could be of interest. There was a line of thought, started by the piece linked above, on how far "game warping" could go; while some examples can be found in classic game theory, it would be interesting to see whether blockchain-native settings such as Dark Forest have such instances.
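A first pass at outcome 1 could look like the sketch below: cluster players into archetypes from simple per-player features. The file name and the columns (`timestamp`, `player`, `tx_hash`, `is_attack`), as well as the three-cluster choice, are all assumptions about a hypothetical export of the move logs, not a real Dark Forest schema.

```python
# Hypothetical first pass at the descriptive analysis: per-player features
# from the move-transaction log, clustered into archetypes. Column names
# and the data file are assumptions, not a real Dark Forest schema.

import pandas as pd
from sklearn.cluster import KMeans

txs = pd.read_csv("darkforest_round_txs.csv")  # hypothetical data set
txs["hour"] = pd.to_datetime(txs["timestamp"], unit="s").dt.hour

features = txs.groupby("player").agg(
    moves=("tx_hash", "count"),     # activity level
    attacks=("is_attack", "sum"),   # aggression
    night_share=("hour", lambda h: ((h < 6) | (h >= 22)).mean()),  # off-hours play
)

# feature scaling omitted for brevity; a real analysis should standardize first
features["archetype"] = KMeans(n_clusters=3, n_init=10).fit_predict(features)
print(features.groupby("archetype").mean())  # inspect the cluster profiles
```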