Intro to Fault-Tolerant Distributed Algorithms
===

## Motivation: Replication

* A server may fail
* So we replicate it
* This requires consensus
  * despite faulty servers
* Goal: provide the illusion of a single non-faulty server

## Models

### Communication Model

* Message passing
* Shared memory

### Timing Model

* Synchrony
* Asynchrony
* Somewhere in between

### Fault Model

* Crash
* **Byzantine / arbitrary / malicious**
* Omission
* ...

## Broadcast Problem

n parties, including 1 sender with an input x; up to f of them are faulty.

Desired properties:

* Safety: no two honest parties output different values
* Liveness: every honest party outputs a value
* Validity: if the sender is honest, every honest party outputs x

We don't care what dishonest parties do, since we have no control over them.

## Agreement Problem

n parties, each with an input x_i; up to f of them are faulty.

Desired properties:

* Safety: no two honest parties output different values
* Liveness: every honest party outputs a value
* Validity: if every honest party has the same input x_i = x, then every honest party outputs x

Alternative validity notions: gamma-fraction validity, ...

These are deceptively simple problems.

## Simplest Model

* Crash faults
* Synchrony, lock-step rounds
* Pairwise connections, no messages lost

## Dolev-Strong

### Algorithm (Crash version)

* In each round r <= f, if you receive v from anyone, multicast v if you haven't already.
* At the end of round f+1, output v if you received one; output a default value otherwise.

### Correctness

* Liveness: everyone outputs at round f+1
* Validity: if the sender does not crash, everyone receives x after the 1st round
* Safety: if party i outputs v, every party outputs v

Proof sketch of safety:

* Case 1: i receives v in some round r <= f → i multicasts v
* Case 2: i receives v in round f+1
  * Why did i receive v only at the very last round?
  * It must be that in each of rounds 1..f, exactly one party received v, forwarded it to only one other party, and then crashed
  * There are at most f faults → the party multicasting in round f+1 does not crash, so its multicast reaches everyone

### Complexity

* Latency: f+1 rounds
* Communication complexity: O(n^2) messages, each of size |v|
  * n multicasts, each costing O(n) messages
  * note that each party multicasts at most once

### Byzantine Faults

What can go wrong?

* the sender sends multiple values
* nodes make up values
* nodes delay forwarding
* ...

### Signatures

* secret key sk and public key pk
* Sign(sk, msg) → $\sigma$
* Verify(pk, msg, $\sigma$) → T / F

Notation: $\langle m \rangle_i \equiv (m, \sigma_i)$

### Algorithm (Byzantine version)

* In each round r <= f:
  * if you receive M = \<...\<v\>$_s$...\>$_j$
    * i.e., a value v **signed by r distinct** nodes
    * where s (the innermost signer) is the sender
  * multicast \<M\>$_i$
    * i.e., sign it yourself and forward it
  * extract v
    * let V = V $\cup$ {v}
* At the end of round f+1:
  * if V = {v}, output v
  * if V = $\emptyset$, output the default value
  * if |V| >= 2, output the default value

One more thing: each party forwards at most two values.

### Correctness

* Liveness: everyone outputs at round f+1
* Validity:
  * sender honest → everyone extracts x after the 1st round
  * no one extracts any other value, since only values signed by the sender are accepted
* Safety (sketch):
  * If party i extracts v and only v, so does everyone
    * similar argument to the crash version
    * i extracts v in round r <= f → i signs and multicasts v
    * i extracts v in round f+1 → i received \<...\<v\>$_s$...\>$_{\_}$ with f+1 distinct signatures → at least 1 honest party has signed and multicast v
  * If party i extracts >= 2 values, everyone extracts >= 2 values (i ensures this by forwarding two of them)
  * If party i extracts no value, neither does anyone else

### Complexity

* Latency: f+1 rounds
* Communication complexity: at most 2n^2 messages, each of size up to (f+1)|$\sigma$|
  * each node multicasts at most twice

## Fault Bounds

* Broadcast (crash or Byzantine): f < n
* Byzantine Agreement: f < n/2

### Rigorous proof

Byzantine Agreement is not solvable if f >= n/2.
* We should not assume anything about the algorithm itself...
  * just say there's some magical black-box algorithm
* Divide all parties into two groups P and Q s.t. |P| <= f and |Q| <= f
  * doable because f >= n/2
* Consider the following scenarios:
  1. P are honest and receive input x; Q are Byzantine and behave as if they received input x' != x
  2. Q are honest and receive input x'; P are Byzantine and behave as if they received input x
  3. P receive x; Q receive x'; all are honest
* In scenario 1, the honest parties are exactly P and they all have input x, so by validity P must output x
* In scenario 2, by the same argument, Q must output x'
* In scenario 3, all parties are honest. However,
  * P cannot distinguish scenario 3 from scenario 1 → P outputs x
  * Q cannot distinguish scenario 3 from scenario 2 → Q outputs x'
  * This violates safety (P and Q are honest, yet they output different values)

## Problems...

* Broadcast (crash fault)
* Byzantine Broadcast
* Byzantine Agreement
* State Machine Replication
  * agree on a sequence of values, rather than only 1 value
* ...

All in the synchronous timing model!

* In asynchrony, the FLP impossibility result states that no deterministic agreement protocol can tolerate even a single crash fault

## Then what?

* Randomization
  * usually uses a "common coin"
    * the outcome of the coin flips is unpredictable
    * but every party gets the same output
  * related to some interesting crypto concepts
    * secret sharing
      * a secret s is divided into n "shares"
      * any t shares reconstruct s
      * any t-1 shares reveal no info about s
    * threshold encryption
      * n parties hold secret shares of a private key sk
      * any t can come together and decrypt
      * ... but without learning each other's shares!
    * threshold signatures
      * any t signers can sign together
      * without learning each other's shares
    * ...
* Easier models
  * Partial synchrony
    * several variants
    * e.g., alternating periods of asynchrony and synchrony
    * ...
  * Paxos
    * liveness is guaranteed during synchronous periods
  * Byzantine versions of Paxos
  * ...
* Easier problems
  * Broadcast variants that relax the liveness requirement
    * Reliable Broadcast
    * Consistent Broadcast
    * Graded Broadcast
    * ...
* Bitcoin!
  * with some cryptographic assumptions, computing power comes into play
  * relaxed safety property
    * multiple chains are allowed to coexist (not strict "consensus", so to speak)
    * probabilistic "confidence" in what is considered committed
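The crash-fault Dolev-Strong-style broadcast above can be sketched as a lock-step simulation. This is a minimal illustration, not the original protocol specification: the function name, the `crashes` schedule format, and `DEFAULT` are my own inventions for this sketch.

```python
# Lock-step simulation of the crash-fault broadcast. Party 0 is the sender.
# crashes[p] = (round, recipients) means p crashes in that round after
# delivering its multicast only to the listed recipients.

DEFAULT = None  # default output when no value was received

def dolev_strong_crash(n, f, x, crashes):
    received = {0: x}  # the sender starts out knowing its input x
    sent = set()       # parties that have already multicast (at most once each)
    for r in range(1, f + 2):          # rounds 1 .. f+1
        deliveries = []                # (recipient, value) pairs for this round
        for p in list(received):
            if p in sent:
                continue
            sent.add(p)
            if p in crashes:
                crash_round, recipients = crashes[p]
                if crash_round < r:
                    continue           # already crashed: sends nothing
                if crash_round == r:   # partial multicast, then crash
                    deliveries += [(q, received[p]) for q in recipients]
                    continue
            deliveries += [(q, received[p]) for q in range(n) if q != p]
        for q, v in deliveries:        # messages arrive at the end of the round
            received.setdefault(q, v)
    # outputs of the surviving (non-crashed) parties
    return {p: received.get(p, DEFAULT) for p in range(n) if p not in crashes}
```

With `n=4, f=2` and the worst-case chain schedule `{0: (1, [1]), 1: (2, [2])}` (each round, the only holder of v forwards it to one party and crashes), party 3 still learns v in round f+1, matching the safety argument.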
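The signed (Byzantine) variant of Dolev-Strong above can be sketched the same way. Two loud simplifying assumptions in this toy model, neither of which is from the notes: "signature chains" are plain lists of party ids rather than real cryptography, and Byzantine parties other than the (possibly equivocating) sender stay silent.

```python
# Toy simulation of the signed Dolev-Strong protocol. Party 0 is the sender.

def signed_dolev_strong(n, f, byzantine, initial_msgs):
    """`initial_msgs` maps recipient -> list of (value, [signer ids]) chains
    the sender delivers in round 1. Returns the honest parties' outputs."""
    honest = [p for p in range(n) if p not in byzantine]
    extracted = {p: set() for p in honest}
    inbox = {p: list(initial_msgs.get(p, [])) for p in honest}
    for r in range(1, f + 2):          # rounds 1 .. f+1
        outbox = {p: [] for p in honest}
        for p in honest:
            for v, signers in inbox[p]:
                # accept only a chain of r distinct signers starting at the sender
                if len(signers) != r or len(set(signers)) != r or signers[0] != 0:
                    continue
                # tracking at most two values suffices to decide the output
                if v in extracted[p] or len(extracted[p]) >= 2:
                    continue
                extracted[p].add(v)
                if r <= f and p not in signers:
                    outbox[p].append((v, signers + [p]))  # add own "signature"
        # everything multicast this round is delivered next round
        inbox = {p: [m for q in honest if q != p for m in outbox[q]]
                 for p in honest}
    return {p: next(iter(extracted[p])) if len(extracted[p]) == 1 else "DEFAULT"
            for p in honest}
```

An equivocating sender that signs "a" for party 1 and "b" for parties 2 and 3 causes every honest party to extract both values by round f+1, so all of them fall back to the default output together, preserving safety.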
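The t-of-n secret sharing mentioned under randomization can be illustrated with the textbook Shamir construction over a prime field. The field size and function names here are illustrative choices, not anything the notes specify.

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field size is an arbitrary choice here

def share(secret, t, n):
    """Split `secret` into n shares such that any t reconstruct it: the
    secret is the constant term of a random degree-(t-1) polynomial."""
    coeffs = [secret % P] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for x_i, y_i in shares:
        num, den = 1, 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat, P prime)
        secret = (secret + y_i * num * pow(den, P - 2, P)) % P
    return secret
```

Any t of the n shares recover the secret, while t-1 shares leave the constant term information-theoretically undetermined (every candidate secret is consistent with some degree-(t-1) polynomial through them).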