# Ethereum 15.0: Inactivity Leaks

(thanks to Vitalik Buterin for conversation and feedback)

## Motivation

In [A model for cumulative committee-based finality - Consensus - Ethereum Research](https://ethresear.ch/t/a-model-for-cumulative-committee-based-finality/10259), we see a long-term proposal for single-round finality schemes with the concept of **inactivity leaks**. This document attempts to map out the concept in detail, so we can start asking concrete questions.

I think of the setup as the following:

1. We have some proof-of-stake protocol based on rounds.
2. Each round, there is a committee (a subset of the validators) who participate in some sort of consensus algorithm `CONSENSUS`. One of 3 things happens:
   - we get a *finalized* (unlike e.g. Gasper) block
   - we fail to reach consensus
   - we break safety (e.g. finalize 2 conflicting blocks)
3. We assume that we have already decided to have the committees be random from one finalized block to the next (to distribute power). But this increases the probability that some committee is vulnerable to a 33% attack.
4. The elephant in the room is that in this situation, the attackers in the committee can break finality at the cost of some stake (guaranteed by the accountable safety guarantee of `CONSENSUS`). However, another attack that's not malicious and not punishable by the protocol is for the attackers to simply be inactive, which breaks liveness indefinitely.
   - To break out of this situation, we need to *change the committee if we fail to reach consensus*.
   - The first natural solution is to roll for a completely new random committee. However, this sets up the "we weren't home" situation where the (all honest) validators in committee A go offline, a new (honest) committee B finalizes, and then A returns with a finalized block, breaking finality.
   - To avoid the "we weren't home" situation, *we need the new committee to have significant overlap with the old committee*, while ensuring it's not exactly the same as the old committee.
5. The discussion in the previous point gives a natural solution: if we did not break safety but failed to reach consensus, we can punish the **inactive** participants in the committee. This way honest participants keep their stake while inactive participants leak stake.
6. In other words, we want inactivity to be a thing that's bad enough that we should punish it, but not punished too much, because it is totally reasonable to expect that honest validators will sometimes be inactive.
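To make the setup above concrete, here is a minimal Python sketch of one round under points 1-6. Everything here (`RoundOutcome`, `COMMITTEE_SIZE`, the 0.99 leak factor) is invented for illustration, not a concrete design:

```python
import random
from enum import Enum, auto

class RoundOutcome(Enum):
    FINALIZED = auto()      # CONSENSUS produced a finalized block
    NO_CONSENSUS = auto()   # the round ended with no finalized block
    SAFETY_BROKEN = auto()  # e.g. two conflicting blocks finalized

COMMITTEE_SIZE = 4  # hypothetical parameter, tiny for illustration

def reshuffle(validators):
    """Draw a fresh random committee (point 3 above)."""
    return set(random.sample(sorted(validators), COMMITTEE_SIZE))

def step(committee, validators, outcome, inactive, stake):
    """Advance one round: reshuffle on success, keep-but-leak on a
    liveness failure, reshuffle (after punishment) on a safety failure."""
    if outcome == RoundOutcome.FINALIZED:
        return reshuffle(validators)
    if outcome == RoundOutcome.NO_CONSENSUS:
        for v in committee & inactive:   # punish only the inactive members
            stake[v] *= 0.99             # leak a little stake (made-up rate)
        # a fuller design would also rotate a few seats, keeping overlap high
        return committee
    # SAFETY_BROKEN: accountable safety lets us provably punish >= 1/3
    # of the committee, which we leave abstract here.
    return reshuffle(validators)

validators = {f"v{i}" for i in range(10)}
stake = {v: 32.0 for v in validators}
committee = reshuffle(validators)
committee = step(committee, validators,
                 RoundOutcome.NO_CONSENSUS, {"v0", "v1"}, stake)
```

The interesting branch is `NO_CONSENSUS`: the committee survives, but only after its inactive members have paid for the failed round.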
## What is Inactivity?

Here are some properties we want from a definition of inactivity:

- it's something that an honest validator may "accidentally" do (which is why we differentiate it from e.g. double-voting), so we want to punish it, but not too much
- if a validator does absolutely nothing, it counts as being inactive
  - as a corollary, we can't prove someone was inactive
- depending on the system, we may or may not be able to prove someone was *active*. For example, if we believe in timestamps, then we can timestamp something that proves someone *did* do something and "redeem" her afterwards

First, it is tricky to define **inactivity**; the definition will depend on the consensus algorithm and our purposes. For example, is someone who consistently sends messages, but only at the last second when they aren't useful, inactive? Many attacks consist of the adversary doing coordinated things that are notably "abnormal" but not necessarily illegal. Do we want to capture these as "inactive"?

At a higher level, we effectively need some function `judgment()` which takes some data (including a committee) as input and then maps everyone in the committee to one of 3 types:

- L(egal): did not do anything provably illegal, and can be interpreted as having been "online" (example: attested)
- I(nactive): did not do anything, which is not illegal (example: didn't attest)
- P(unishable): did something punishable (example: attested to 2 blocks)
  - we assume only byzantine members of the committee perform P actions
  - note this concept of "levels of transgressions" can be generalized; outside of the main properties described earlier, there's nothing that special about Inactivity. Like our own legal system, we can theoretically have `judgment()` return things like "Felonies" for heinous crimes, "Misdemeanors" for small crimes, and/or "Suspicious" (which Inactivity is really a form of) for non-provable criminal activity.

Many nuances hide in the *domain* of `judgment()`. For example:

- Maybe it has type `judgment(block)`. This has the benefit that everyone with the block makes the same judgment. However, this function cannot then take into account the activity *during* a period of non-liveness, which is probably the most important information to use to decide inactivity!
- Maybe it has type `judgment(block, seen_messages_after_block)`. This would allow us to make a judgment on people before the next block, but might give different pictures depending on who is observing the system.

What all this tells me is that a priori, there seem to be many very different *implementations* of inactivity leaks (these are not well-used words AFAIK; I am kind of making them up, but I wouldn't be surprised if they were already in the literature) with vastly different properties. This means we probably want to tighten our scope, which I try to do in the next section.

## Formalization

We make the following (fairly strong) assumptions:

- the network is in consensus on what the last finalized block is. Thus, we do not consider attacks where someone is trying to fork an earlier block than the last finalized block; in other words, any participant "knows" (or will know, before they participate in the current round) (a) what the last finalized block was and (b) how many slots it has been since then, and attacks on this assumption are outside the scope of our discussion.
- the proposer of a block defines which participants of the committee were "inactive" last slot as part of the block he writes
  - as an intuitive example, the proposer of a block in a Gasper-like algorithm might include all (Gasper-like) attestations he saw, and we just define the people who weren't in the attesting set as **inactive**. We can then make them all lose some stake.
  - this is analogous to slashing, where someone is slashed only when the slashing proof is actually written into a block.
  - (**TODO**: this brings up weird game theory where people may want to block other people's attestations, or threaten to do so in return for money outside the chain. This game theory is already relevant to the current design even without inactivity leaks. I wonder if this is worth studying - maybe the block proposer should be incentivized to include every attestation he sees?)
- when we reach consensus on a block, the entire committee is **reshuffled** (this uses a hidden assumption that Ethereum wants to "spread out democracy" as another desideratum; otherwise it may *increase* safety to stick with a committee that finalized correctly, depending on implementation!)
- when we fail to reach consensus on a block (say in slot j), the committee stays static but **leaks**. Formally, this means we have a `get_committee(block, wait)` function that defines the committee from a block given that we have waited for `wait` slots (a toy sketch follows this list).
  - even if we don't reshuffle the committee, we probably need some randomness in the protocol (for example, if our consensus algorithm has a proposer who is corrupted, then leaking inactive participants doesn't do very much, since the proposer can just keep equivocating and never label himself as inactive when he proposes a block)
  - Tricky: with our design, this means that someone in the committee who is proposing blocks must then also declare which members of the committee were inactive (as opposed to active) during the period of non-liveness. Even setting aside the potential complexity of this, it might set up weird game theory - wouldn't rational actors just say "everyone was inactive except me" all the time?
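To illustrate, here is one toy construction of `get_committee(block, wait)` with the overlap property from the Motivation section ("significant overlap, but not exactly the same"). The SHA-256 seeding and the rotation rate are an invented strawman, not a concrete proposal:

```python
import hashlib
import random

def get_committee(block_hash: bytes, wait: int,
                  validators: list[str], size: int = 128) -> list[str]:
    """Toy committee derivation, deterministic in (block_hash, wait).

    The base committee is a pseudo-random sample seeded by the block
    hash. For each slot we wait without consensus, a small pseudo-random
    slice of seats is rotated out, so get_committee(h, w) and
    get_committee(h, w + 1) agree on all but a few seats.
    """
    seed = hashlib.sha256(block_hash).digest()
    rng = random.Random(seed)
    pool = sorted(validators)
    rng.shuffle(pool)
    committee, reserve = pool[:size], pool[size:]
    swap = max(1, size // 16)  # made-up rotation rate per waited slot
    for w in range(wait):
        step_rng = random.Random(seed + w.to_bytes(8, "big"))
        seats = step_rng.sample(range(len(committee)),
                                min(swap, len(reserve)))
        for i, s in enumerate(seats):
            committee[s], reserve[i] = reserve[i], committee[s]
    return committee

# committees one waited slot apart share almost all of their members:
c0 = get_committee(b"block", 0, [f"v{i}" for i in range(200)])
c1 = get_committee(b"block", 1, [f"v{i}" for i in range(200)])
assert len(set(c0) & set(c1)) >= len(c0) - 8
```

By construction, the committees at `wait` and `wait + 1` differ in at most `size // 16` seats, so an honest-but-offline committee that comes back online still overlaps heavily with the current one.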
We also want to black-box as much of the consensus algorithm itself as possible. We need, at a minimum:

- **safety guarantee**: if the consensus algorithm breaks safety (that is, finalizes more than one block in a round), then we can provably punish some proportion p (say 1/3) of the participants for Punishable behavior.
- **(probabilistic) liveness guarantee**: if at least p_H of the **honest** participants play L (we need this nuance since it is possible in our definition space for dishonest participants to not play provably P or I but still not contribute to consensus!), then we produce a block with some probability q_L (for liveness).

Otherwise it is very hard to prove anything!

Now, we need to quantify the things we want:

1. We want to make breaking safety *very* difficult. Sadly, we are already in the situation where we assume the safety assumption can be broken. This means that instead of simply assuming some bound on the proportion of Byzantine participants, we need to phrase it as a lower bound on the amount of resources it would take to break the consensus algorithm.
2. We want to make not finalizing a block costly. This means that for some 0 < a < 1 (a specific value to start; maybe later for all a), we want a lower bound (as a function of a) on the amount of resources it would take to keep (# of slots that reach consensus) / (# of slots total) below a.
3. We want to NOT punish inactivity too much (since honest nodes can be inactive).

## [Scratchwork, Ignore]

1. We have a `committee(block, slot)` function that tells us who the committee is from the point of view of slot `slot`. When `slot = block.slot`, this is just the assigned committee. Otherwise, this function is there to handle **inactivity leaks**.
   - in particular, if the chain is going *as planned*, then `committee(block_2, 4)` is NOT the actual committee at block 4, but what the committee would be from the point of view of someone who saw block 2 but did not see blocks 3 and 4.
2. To formalize **inactivity leaks**, we need something of the form `judgment(block, ...)` (a strawman sketch follows at the end of this section).
   - Important: from the **point of view** of an outsider, the judgment may not be unique. For example, the outsider may feel there was no consensus because there was a network partition. It is probably good for the protocol to make these the only 2 options: the outsider either sees the block (for which there is only one way to interpret the outcome) or not.
3. We have a `committee_transition(state, consensus_output) -> state` function that changes the committee based on the output. We make some (strong) assumptions:
   - if `consensus_output.block` is legitimate, we refresh the committee randomly. This just seems in general better than letting the committee keep power, and is better the more we are sure that in general people tend to be honest.
   - does the algorithm "not reaching consensus" mean that (a) a certain time has passed and there was no finalization (which means it could theoretically be completed into a finalized block later), or (b) the group has actually come to consensus that there was no block? (e.g. a successful run of the "leader change" part of a PBFT-esque consensus algorithm.) The main candidate definitions:
     - **strong inactivity**: all the honest validators can come to consensus on exactly who was inactive.
       - this is so strong it might actually be impossible (!). I haven't proved it yet.
     - **subjective inactivity**: different validators may conclude different people were inactive, and by the time the next block is committed the inactivity status becomes part of the block information.
   - **SOLVED**: why don't we just reshuffle the committee? Something like having `committee(block, wait)` be defined by some hash function and the block state? This increases the chances of rolling a bad committee, yes, but isn't this already a cost we were willing to eat under normal operation?
   - (very basic **SOLVED** question: don't leader-change algorithms already "solve" inactivity? What's bad about just plugging in an "off-the-shelf" consensus algorithm with leader change?) leader-change algorithms don't help in the situation where the 1/3 assumption was broken
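As a strawman of the subjective variant, the `judgment(block, seen_messages_after_block)` shape from the earlier section might look like the following; the attestation format and the classification rule are invented for illustration:

```python
from enum import Enum

class Verdict(Enum):
    L = "legal"       # interpretable as online (e.g. attested once)
    I = "inactive"    # did nothing observable; not provable, not illegal
    P = "punishable"  # provably illegal (e.g. attested to 2 blocks)

def judgment(committee, seen_messages):
    """Toy judgment(block, seen_messages_after_block): classify each
    committee member from ONE observer's view. Observers with different
    `seen_messages` can disagree on L vs I (the subjectivity problem in
    point 2 above); P comes with a proof, so it is objective."""
    verdicts = {}
    for v in committee:
        attestations = seen_messages.get(v, [])
        if len({a["target"] for a in attestations}) > 1:
            verdicts[v] = Verdict.P   # equivocation: two distinct targets
        elif attestations:
            verdicts[v] = Verdict.L
        else:
            verdicts[v] = Verdict.I   # silence: cannot be proven either way
    return verdicts

# two observers with different views of the same validator "b"
view_1 = {"a": [{"target": "X"}], "b": []}
view_2 = {"a": [{"target": "X"}], "b": [{"target": "X"}]}
print(judgment(["a", "b"], view_1))  # b judged I(nactive)
print(judgment(["a", "b"], view_2))  # b judged L(egal)
```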
## Comments

Can we assume individuals with more stake are more likely to be honest validators or proposers? If we can make this assumption, we could define the changing validator set so that more stake means more weight and a higher chance of being a proposer/validator. Can this maybe guarantee safety? (a long-stretched idea) - Ram
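For concreteness, the "more stake, more weight" idea in the comment might sample committees like the toy sketch below; `weighted_committee` and its parameters are hypothetical, and stake-weighting by itself does not yield a safety proof:

```python
import random

def weighted_committee(stake: dict[str, float], size: int,
                       seed: int = 0) -> list[str]:
    """Toy stake-weighted draw: each seat goes to validator v with
    probability proportional to stake[v]. Sampling is with replacement,
    so a large staker can hold several seats, which is itself a design
    question for the idea above."""
    rng = random.Random(seed)
    ids = list(stake)
    return rng.choices(ids, weights=[stake[v] for v in ids], k=size)

# e.g. a validator with 2/3 of the stake gets about 2/3 of the seats
print(weighted_committee({"whale": 64.0, "v1": 16.0, "v2": 16.0}, size=9))
```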