Intro to Fault Tolerant Distributed Algorithms
===
## Motivation: Replication
* Server may fail
* Replicate
* need consensus
* despite faulty servers
* Goal: provide illusion of a single non-faulty server
## Models
### Communication Model
* Message passing
* Shared memory
### Timing Model
* Synchrony
* Asynchrony
* Somewhere in between (e.g., partial synchrony)
### Fault Model
* Crash
* **Byzantine / arbitrary / malicious**
* Omission
* ...
## Broadcast Problem
n parties, including 1 sender with an input x, up to f faulty
Desired properties:
* Safety: no two honest parties output different values
* Liveness: every honest party outputs a value
* Validity: if sender is honest, every honest party outputs x
We don't care what dishonest parties do, since we have no control over them
## Agreement Problem
n parties, each has an input x_i, up to f faulty
Desired properties:
* Safety: no two honest parties output different values
* Liveness: every honest party outputs a value
* Validity: if every honest party has the same input x_i = x, then every honest party outputs x
Alternative validity notions: e.g., output x if at least a gamma-fraction of honest parties input x; ...
These are deceptively simple problems
## Simplest Model
* Crash fault
* Synchrony, lock-step rounds
* Pair-wise connection, no message lost
## Dolev-Strong
### Algorithm (Crash version)
In each round r <= f+1: if you have received v (from anyone) and haven't multicast it yet, multicast v.
At the end of round f+1: output v if you received a value, output a default value otherwise.
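The crash version can be sketched as a lock-step simulation. Everything concrete here is an assumption of the sketch: party 0 is the sender, a crash is modeled as a pair `(round, k)` meaning that party's multicast in that round reaches only its first k recipients, and `"_|_"` stands for the default value.

```python
DEFAULT = "_|_"  # default output when no value was received

def dolev_strong_crash(n, f, x, crashes):
    """Lock-step simulation of the crash-fault protocol above.

    Party 0 is the sender and holds input x. `crashes` maps a party to
    (crash_round, k): in that round its multicast reaches only the first
    k recipients, then it crashes. Returns the outputs of the parties
    that never crash.
    """
    received = [x] + [None] * (n - 1)   # value each party holds, if any
    multicast_done = [False] * n
    for r in range(1, f + 2):           # rounds 1 .. f+1
        deliveries = []
        for p in range(n):
            if p in crashes and crashes[p][0] < r:
                continue                # crashed in an earlier round
            if received[p] is None or multicast_done[p]:
                continue                # nothing to send / already sent
            multicast_done[p] = True
            recipients = [q for q in range(n) if q != p]
            if p in crashes and crashes[p][0] == r:
                recipients = recipients[:crashes[p][1]]  # partial multicast, then crash
            deliveries += [(q, received[p]) for q in recipients]
        for q, v in deliveries:         # messages arrive at the end of the round
            if received[q] is None:
                received[q] = v
    return {p: (received[p] if received[p] is not None else DEFAULT)
            for p in range(n) if p not in crashes}

# f = 1: the sender reaches only party 1 before crashing in round 1;
# party 1 relays in round 2, so all surviving parties output "A".
print(dolev_strong_crash(n=4, f=1, x="A", crashes={0: (1, 1)}))
# → {1: 'A', 2: 'A', 3: 'A'}
```

The adversarial chain from the safety proof (one new party learns v per round and crashes mid-send) can be reproduced by giving each relaying party a partial multicast in its round; with at most f crashes the round-(f+1) multicast always completes.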
### Correctness
Liveness: everyone outputs at round f+1
Validity: Sender doesn't crash → everyone receives x after the 1st round
Safety: If party i outputs v, every party outputs v
proof sketch:
* case 1: i receives v in some round r <= f → i multicasts v in round r+1 <= f+1; since i outputs (so it never crashes), the multicast completes and everyone receives v
* case 2: i receives v only in round f+1
* why did v stay hidden until the very last round?
* it must be that in each of rounds 1..f, exactly one new party received v, forwarded it to just one other party, and crashed mid-multicast
* but that already uses up all f faults → the party multicasting in round f+1 does not crash, its multicast completes, and everyone receives v
### Complexity
Latency: f+1 rounds
Communication complexity: O(n^2) messages, each size |v|
* n multicasts, each cost O(n)
* note that each party only multicasts once
### Byzantine Fault
What can go wrong?
* sender sends multiple values
* nodes make up values
* nodes delay forwarding
* ...
### Signature
* secret key sk and public key pk
* Sign(sk, msg) → $\sigma$
* Verify(pk, msg, $\sigma$) → T / F
Notation: $\text{<m>}_i \equiv$ (m, $\sigma_i$)
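To make the Sign/Verify interface and the $\text{<m>}_i$ notation concrete, here is a toy sketch using Python's stdlib `hmac`. Note this is a *symmetric* MAC, not a real public-key signature (a real protocol would use e.g. Ed25519, where Verify needs only pk); it only illustrates the shape of the API.

```python
import hmac
import hashlib

# Toy stand-in for (Sign, Verify). HMAC requires the verifier to hold the
# same key, unlike a real asymmetric signature scheme; used here only to
# make the interface concrete.
def sign(sk: bytes, msg: bytes) -> bytes:
    return hmac.new(sk, msg, hashlib.sha256).digest()

def verify(key: bytes, msg: bytes, sigma: bytes) -> bool:
    return hmac.compare_digest(sign(key, msg), sigma)

# Notation: <m>_i is the pair (m, sigma_i)
sk_i = b"party i's key"
m = b"some value"
signed_m = (m, sign(sk_i, m))                     # <m>_i
assert verify(sk_i, *signed_m)                    # accepts
assert not verify(sk_i, b"forged", signed_m[1])   # rejects a different message
```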
### Algorithm
* In round r <= f
* if receive M = \<...\<v\>$_s$ ... \>$_j$
* a value v **signed by r distinct** nodes
* s (the inner most signature) is sender
* multicast \<M\>$_i$
* sign it yourself
* extract v
* let V = V $\cup$ {v}
* At the end of round f+1
* if V = {v}, output v
* if V = $\emptyset$, output default value
* if |V| >= 2, output default value
One more thing: each party forwards at most two distinct values; two are already enough to convince everyone else that the sender equivocated, and this caps the communication.
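The round-r acceptance check can be sketched as follows. Here a message is modeled as a value plus the list of its signer ids, innermost first; the ids stand in for signatures that have already been cryptographically verified (the function names and this representation are assumptions of the sketch).

```python
def accept(msg, sender, r):
    """Round-r acceptance condition: v signed by r distinct nodes,
    with the sender's signature innermost."""
    v, signers = msg                  # signers[0] is the innermost signer
    return (len(signers) == r
            and len(set(signers)) == r
            and signers[0] == sender)

def relay(msg, me):
    """Forward <M>_me: append your own signature before multicasting."""
    v, signers = msg
    return (v, signers + [me])

# Party 3 receives <<v>_0>_1 in round 2 (the sender is party 0):
m = ("v", [0, 1])
assert accept(m, sender=0, r=2)
assert not accept(m, sender=0, r=3)       # wrong number of signatures for round 3
assert not accept(("v", [1, 0]), 0, 2)    # sender's signature not innermost
assert relay(m, 3) == ("v", [0, 1, 3])    # party 3 signs before forwarding
```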
### Correctness
* Liveness: everyone outputs at round f+1
* Validity:
* sender honest → everyone extracts x after 1st round
* no one extracts any other value, since only values carrying the sender's (innermost) signature are accepted, and an honest sender signs only x
* Safety (sketch):
* If party i extracts v and only v, so does everyone
* similar argument as crash version
* i extracts v in r <= f → i signs and multicasts
* i extracts v in f+1 → i receives \<...\<v\>$_s$...\>$_{\_}$ with f+1 distinct signatures → at least 1 of those signers is honest and has already signed and multicast v
* If party i extracts >= 2 values, everyone extracts >= 2 values (i ensures this by forwarding them)
* If party i extracts no value, so does everyone
### Complexity
Latency: f+1 rounds
Communication complexity: at most 2n^2 messages, each of size up to |v| + (f+1)|$\sigma$| (a value plus up to f+1 signatures)
* each node multicasts at most twice
## Fault Bounds
* Broadcast (crash or Byzantine): f < n
* Byzantine Agreement: f < n/2
### Rigorous proof
Byzantine Agreement is not solvable if f >= n/2
* Should not assume anything about the algorithm...
* Divide all parties into two groups P and Q s.t. |P| <= f and |Q| <= f
* doable because f >= n/2
* Consider the following scenarios:
1. P are honest and receive input x; Q are Byzantine and behave as if they receive input x' != x
2. Q are honest and receive input x'; P are Byzantine and behave as if they receive input x
3. P receive x; Q receive x'; All are honest
* In scenario 1, due to validity, P should output x
* note: we assume nothing about how the algorithm works internally
* treat it as a black box; validity alone forces this output
* In scenario 2, due to validity, Q should output x'
* In scenario 3, all parties are honest. However
* P cannot distinguish 3 from 1 → outputs x
* Q cannot distinguish 3 from 2 → outputs x'
* Violates safety (P and Q are honest but they output different values)
## Problems...
* Broadcast (crash fault)
* Byzantine Broadcast
* Byzantine Agreement
* State Machine Replication
* agree on a sequence of values, rather than only 1 value
* ... All in synchrony timing model!
* In asynchrony, the FLP impossibility result states that no deterministic agreement protocol can tolerate even a single crash fault
## Then what?
* Randomization
* usually use a "common coin"
* outcome of coin flips is unpredictable
* but every party gets the same output
* related to some interesting crypto concepts
* secret sharing
* a secret s is divided into n "shares"
* any t shares reconstruct s
* any t-1 shares reveal no info about s
* threshold encryption
* n parties have secret shares of a private key sk
* any t can come together and decrypt
* ... but without learning each other's share!
* threshold signature
* any t signers can sign together
* without learning each other's shares
* ...
* Easier models
* Partially synchronous
* several variants
* alternating periods of async and sync
* ...
* Paxos
* Liveness is guaranteed during synchronous periods
* Byzantine versions of Paxos
* ...
* Easier problems
* Broadcast variants that relax the liveness requirement
* Reliable Broadcast
* Consistent Broadcast
* Graded Broadcast
* ... Bitcoin!
* with some cryptographic assumptions, computing power comes into play
* relaxed safety property
* allowing multiple chains to co-exist (not strict "consensus", so to speak)
* probabilistic "confidence" in which blocks are considered committed
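Aside: the t-of-n secret sharing described above is classically instantiated by Shamir's scheme: the secret is the constant term of a random degree-(t-1) polynomial over a finite field, and the shares are its evaluations. A minimal sketch (the prime P and the function names are this sketch's choices):

```python
import random

P = 2**61 - 1  # a prime; all arithmetic is in the field GF(P)

def share(secret, t, n):
    """Split `secret` into n shares such that any t reconstruct it:
    pick a random degree-(t-1) polynomial with constant term `secret`
    and hand out its evaluations at x = 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    poly = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return secret

s = 123456789
shares = share(s, t=3, n=5)
assert reconstruct(shares[:3]) == s   # any 3 shares recover the secret
assert reconstruct(shares[2:]) == s   # a different 3 shares work too
```

Any t-1 shares are consistent with every possible constant term, which is exactly the "reveal no info" property; threshold encryption and threshold signatures build on the same polynomial structure.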