---
tags: Network, Ethereum2.0
---
# Towards an Ethereum 2.0 Threat Model
**Abstract**
Casper FFG security is based on the assumption that less than 1/3 of validators violate slashing conditions. The Serenity [security model](https://notes.ethereum.org/@vbuterin/rkhCgQteN?type=view#Security-models) complements this with an *uncoordinated rational majority* model and a network delay factor. We extend the analysis of risk factors which may threaten beacon chain protocol security with a further analysis of network-level faults. We formulate a threat model based on BFT, economic and probabilistic perspectives.
> **NB** Currently, we mostly ignore two important parts:
> - Implementation bugs
> - Privacy threats
>
> They are important, but to be considered later and/or in other documents.
## Safety, Liveness and Robustness
Casper FFG safety - and thus the beacon chain protocol's safety - is hard to break. However, its liveness properties can be affected by failures of the sub-protocols and assumptions on which the beacon chain protocol is based. One example is large network delays, which can lead to a situation where some validators appear inactive.
Another is clock disparity becoming too large, which is very similar to the network delay problem - i.e. the clock synchronization assumption is broken.
While liveness in the theoretical sense cannot be violated in a finite execution, in practical terms, if consensus cannot be reached then the system is not usable; in particular, validators cannot receive their rewards, which reduces the value of the system to both end-users and validators.
So, from an economic perspective, it is highly desirable that the protocol/system be robust to an unusually high rate of faults. We call this practical liveness property robustness.
Due to the inactivity leak, a validator which fails to participate in the protocol gradually loses its balance. This preserves liveness in the case of network partitions or high network delays. However, it introduces an incentive for an attacker to kick validators out by isolating them from the rest of the participants. At scale, this can be a cheap way to obtain more than one third of voting power, which can be used to censor transactions (or to reduce practical liveness in general).
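As a minimal sketch of how the leak works (the constant and function follow the shape of the Phase 0 spec, but treat them as illustrative assumptions), a non-participating validator's per-epoch penalty grows with the number of epochs since finality, so the total loss over a long non-finality period is roughly quadratic in its length:

```python
INACTIVITY_PENALTY_QUOTIENT = 2**25  # illustrative constant, shaped like the Phase 0 spec

def inactivity_penalty(effective_balance: int, finality_delay: int) -> int:
    """Per-epoch penalty (Gwei) for a validator not participating
    while finality has been delayed by `finality_delay` epochs."""
    return effective_balance * finality_delay // INACTIVITY_PENALTY_QUOTIENT

# e.g. a 32 ETH (32 * 10**9 Gwei) validator, finality delayed by 1024 epochs:
print(inactivity_penalty(32 * 10**9, 1024))  # 976562 Gwei lost in that single epoch
```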
So, while formally "accountable safety" cannot be broken, otherwise-honest validators losing their balances is an example of "something wrong happens" and constitutes a violation of the safety properties of the overall beacon chain system.
## Sources of faults
One can identify three main sources of faults:
- protocol flaws
- implementation bugs
- assumption violations
  - hardware faults
  - polynomially bounded adversary
  - clock synchronization
  - network model (delays, bandwidth, connectivity)
  - wealthy and crazy adversary
We concentrate on the last group. We can further distinguish (a code sketch of this taxonomy follows the list):
- hardware faults
- clock drift
- network faults
  - delays exceeding bounds
  - message drops (infinite delays)
  - DoS
  - reduced/low bandwidth
- attacks
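Purely as an illustration (the type names are our own, not from any spec), the taxonomy can be encoded as data:

```python
from enum import Enum, auto

class FaultSource(Enum):
    """Fault sources within the assumption-violation group (illustrative)."""
    HARDWARE = auto()
    CLOCK_DRIFT = auto()
    NETWORK = auto()
    ATTACK = auto()

class NetworkFault(Enum):
    """Sub-kinds of network faults."""
    DELAYS_EXCEED_BOUNDS = auto()
    MESSAGE_DROP = auto()  # infinite delay
    DOS = auto()
    LOW_BANDWIDTH = auto()
```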
## Beacon chain and subprotocols
The beacon chain protocol relies on sub-protocols and assumptions:
- libp2p to communicate
- clock synchronization
  - among validators
  - with a world time standard
- topic discovery
- node membership
In general, if a BFT protocol critically relies on some assumption or sub-protocol, then the sub-protocol should possess BFT properties too (and the assumption may need to be enforced with some BFT protocol). Otherwise, an attack can be performed against the sub-protocol, which may lead to a violation of the beacon chain security properties.
For example, if a malicious adversary can attack validator clocks, it can effectively prevent validators from participating in the beacon chain protocol.
Thus, overall security properties should be traced down to sub-protocol properties and assumptions. In the case of the clock synchronization assumption, a BFT Clock Synchronization protocol may be needed, if it cannot be assumed that validators can set up reliable independent time sources.
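To make the clock attack concrete, here is a minimal sketch (constant values and names are assumptions for illustration, loosely modeled on beacon chain networking parameters) of the timeliness check peers might apply; a validator whose clock is skewed beyond the tolerance emits messages that everyone else rejects:

```python
SECONDS_PER_SLOT = 12            # beacon chain slot duration
CLOCK_DISPARITY_TOLERANCE = 0.5  # hypothetical allowed disparity, in seconds

def is_timely(message_slot: int, genesis_time: float, local_time: float) -> bool:
    """Accept a message only if it arrives within its slot, up to the tolerance."""
    slot_start = genesis_time + message_slot * SECONDS_PER_SLOT
    return (slot_start - CLOCK_DISPARITY_TOLERANCE
            <= local_time
            <= slot_start + SECONDS_PER_SLOT + CLOCK_DISPARITY_TOLERANCE)
```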
## Concepts and terminology
Faults can be:
- independent
- correlated
  - coordinated (attack)
  - uncoordinated (random)
Another classification:
- random
- malicious
- induced
If faults are rare and independent, then the probability of many faults occurring within the same period of time - enough to violate security properties - is negligible.
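As a rough numeric illustration (all figures are hypothetical), with independent faults the chance of crossing a 1/3 threshold is a binomial tail:

```python
from math import comb

def prob_at_least(n: int, k: int, p: float) -> float:
    """P(at least k of n nodes fault in one period), faults i.i.d. with prob p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# e.g. 100 validators, 1% per-period fault probability, more than 1/3 faulty:
print(prob_at_least(100, 34, 0.01))  # on the order of 1e-40: negligible
```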
However, faults can be correlated. A notable example is an attack, where many nodes deviate from the protocol or induce faults in other nodes.
There can be correlated network faults or hardware faults, e.g. due to a natural accident or disaster.
Random, malicious and induced faults can add up: alone, each kind might not be enough, but together they can violate security. For example, an adversary can post several deposits and additionally be able to induce clock faults in many validators. It can thereby indirectly control (although in a limited way) much more voting power.
If long-lasting geography-wide network problems occur for some reason, they can also be exploited by an adversary which directly controls a certain amount of validators and indirectly controls a certain amount of others (e.g. by disconnecting them from the rest or by inducing clock faults).
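A toy arithmetic check (figures are hypothetical): 15% of stake under direct control plus 20% disrupted via induced clock faults crosses the 1/3 threshold, even though neither part does alone:

```python
DIRECT_STAKE = 0.15   # hypothetical: adversary's own deposits
INDUCED_STAKE = 0.20  # hypothetical: stake of validators with induced faults

assert DIRECT_STAKE < 1/3 and INDUCED_STAKE < 1/3  # neither alone suffices
assert DIRECT_STAKE + INDUCED_STAKE > 1/3          # together they exceed 1/3
```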
### Terminology
- **Correct** node/validator
- **Crash fault**
- **Non-crash fault**
- **Two-faced fault**
- **Delay/omission**
- **Generated message** (too-early messages belong here)
- **Forged message** (should not happen)
- **Corrupted message**
- **Benign/honest** node/validator: crash-faulty or correct.
- **Non-crash faulty**: not necessarily malicious.
- **Benevolent/malicious adversary**
## Byzantine and weak Byzantine models
BFT protocols should tolerate arbitrary faults. That can be expensive, since additional rounds and information must be added to guard against worst cases. Often, worst-case scenarios are possible in theory but extremely unlikely in practice. Thus, additional assumptions are often introduced, which can improve security bounds and/or worst-case behaviour, if the assumptions hold.
Examples are the synchronous model and the partially synchronous model with a Global Stabilization Time (GST). Another example is an authentication protocol/trusted setup/PKI. A polynomially bounded adversary is one more example.
Additional possibilities:
- anonymous access to a public service
  - the adversary cannot distinguish protocol participants from other public accesses
- randomness
  - the adversary cannot gather enough knowledge, or has no means to control validator node performance at a low level, e.g. cannot control natural clock drift
Restricting adversary power at the model level can bring worst-case and typical behaviour closer together.
For example, given f faulty clocks and 3f+1 nodes, there can be cases when (trimmed) clock averaging leads to very slow convergence. However, due to natural random clock drift, that should be extremely unlikely.
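A minimal sketch of the trimmed averaging idea (the function name and structure are our own illustration, not a specific protocol): with n >= 3f+1 readings, discarding the f lowest and f highest bounds the influence of up to f arbitrarily faulty clocks, since every remaining value lies within the range spanned by correct clocks:

```python
def trimmed_mean_offset(readings: list, f: int) -> float:
    """Estimate a clock offset from n >= 3f+1 peer readings, tolerating f faults."""
    n = len(readings)
    assert n >= 3 * f + 1, "need at least 3f+1 readings to tolerate f faults"
    # drop the f smallest and f largest readings, then average the rest
    trimmed = sorted(readings)[f : n - f]
    return sum(trimmed) / len(trimmed)

# e.g. 7 readings with f = 2: the two adversarial extremes are discarded
print(trimmed_mean_offset([-0.02, 0.01, 0.00, 0.03, -0.01, 5.0, -9.9], 2))  # 0.0
```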