--- tags: consensus, notes, --- # On Bracha Reliable Broadcast To recap the problem [Bracha'87](https://core.ac.uk/download/pdf/82523202.pdf) is trying to solve: There are $n$ parties, out of whom at most $f<n/3$ are byzantine, with a **designated leader**/broadcaster for each protocol run. The protocol runs in an **asynchronous**[^asyn] and **permissioned**[^perm] network **without the help of Public Key Infrastructure (PKI)**/signatures to authenticate messages. The leader starts with some input value $v$ from a predefined value space $v\in\mathcal{V}$ (usually we consider $\mathcal{V}=\{0,1\}$) and a party $i$ that _terminates_ needs to output a value $v_i \in\mathcal{V}$. The two properties our broadcast protocol wants to achieve are: 1. **Validity**: If the leader is honest, then _eventually_ all honest parties will output leader's input. 2. **Agreement**: If some honest party outputs a value, then _eventually_ all honest parties will output the same value. Notice that: - These definitions are almost identical to that of a standard Broadcast Problem, except the addition of "eventual" condition since we are working under asynchrony. - We abandon the _termination_ property. Why? and why is a protocol that doesn't guarantee termination useful? see [discussion below](#To-terminate-or-not-to-terminate). - Bracha87 uses this reliable broadcast as a subprotocol to improve the fault tolerance of Asynchronous Byzantine Agreement by [Ben-Or'83](https://allquantor.at/blockchainbib/pdf/ben1983another.pdf) from $f<n/5$ to $f<n/3$. Now, let's recall the protocol (skip rephrasing the validity and agreement proof) and explore a few extension questions: ![protocol screenshot from Prateek's slide](https://i.imgur.com/1la7lzx.png) ## To terminate, or not to terminate Bracha has reminded us of the famous [FLP impossibility](https://groups.csail.mit.edu/tds/papers/Lynch/jacm85.pdf) result in Section 8: "byzantine general protocols are impossible in asynchronous system even in the presence of a single fault". Another way to put it: to achieve consensus in asynchronous model, even a single crash failure would result in _infinite execution_. Intutively, we can use the "indistinguishability argument" to quickly see why. In world 1, the leader is faulty and crashes without sending any messages. In world 2, all parties are honest, but the message from leader got delayed for aribrary amount of time. If we want termination, then there must be a priori time bound $T_0$ at which all parties should output a value. With a delay $>T_0$ in asynchrony, honest parties can't distinguish the two worlds (since they recieve no message in both worlds) and will output something different than leader's input, thus breaking validity. In order to avoid violating validity or agreement, honest parties have no choice but keep waiting indefinitely. To circumvent the impossibility, people usually weaken the properties required to be held. In Bracha's case, we use a weaker (but still good enough) notion of termination: - **Weak Termination**: if leader is honest, then all honest parties eventually terminates; if leader is faulty, then either all or no honest parties eventually terminates. > Excercise: Show an attack that cause the protocol to not terminate for some parties. Apparently, if a faulty leader doesn't send anything, then there will be never enough (at most $f$) `<Echo, v>` for all honest parties, thus never a single vote from any honest party due to 3A, and thus never more than $f$ `<Vote, v>` sent and 3B will never be satisfied, and no honest parties will terminate. $\blacksquare$ However, this is a correct but not super insightful attack, let's see a few more interesting ones, which help us internalize Bracha protocol later. ### Attack 1: not enough honest parties know For simpler presentation, I'd like to assume $n=3f+1$, and we refer quorum A~C from left to right. In the attack below, Quorum B will get `2f-1=n-f-1` echo, and Quorum C will only get `f` echo, since the faulty Quorum A only echo to Quorum B. Therefore, no honest parties will ever vote (since neither condition 3A nor 3B can be satisfied) --> can't terminate. $\blacksquare$ ```mermaid sequenceDiagram participant A as f faulty (incl. leader) participant B as f honest participant C as f+1 honest A->>B: <echo,1> B->>B: <echo,1> B->>C: <echo,1> ``` ### Attack 2: not enough honest parties know that "enough honest parties know" Here's an even more revealing attack: Quorum C knows that enough ($\geq f+1$) honest parties heard about `<echo, 1>` (since they receive $\geq n-f$ `<echo, 1>`, meeting condition 3A), therefore they vote for it. However, we don't have enough of a Quorum C to convince Quorum B to vote. Note that Quorum C didn't receive anything from the leader, thus they won't echo; Quorum B didn't receive enough echo to meet condition 3A and only counting on condition 3B to vote which unfortunately we fall short of. Another note: another extreme case is Quorum B has $2f-1$ honest parties, and Quorum C only has $1$ honest party in which case there will only be 1 vote from Quorum C and also a non-terminating situation. $\blacksquare$ ```mermaid sequenceDiagram participant L as faulty leader participant A as A: f-1 faulty participant B as B: f+2 honest participant C as C: f-1 honest L->>B: <echo, 1> B->>C: <echo, 1> B->>B: <echo, 1> A->>C: <echo, 1> C->>B: <vote, 1> ``` > Excercise: Modify Bracha's broadcast primitive to make it a terminating protocol in a **synchronous network**, and provide security proof. Given the lower bound from [FL82](https://apps.dtic.mil/dtic/tr/fulltext/u2/a105946.pdf), we can't acheive interactive consistency (a problem reducible from/to BB & BA) that tolerates $f$ byzantine parties with fewer than $f+1$ rounds. Therefore, we are inspired to modify Bracha to the following: ``` // Round 1: leader sends <v> to all parties // Round 2: if received <v> from leader: send <echo, v> to all parties // Round 3: if received <echo, v> from n-f distinct parties: send <vote, v> to all parties // Round 4 to 3+f: For i in range(1, f+1): if received <vote, v> from f+i distinct parties && have not voted before: send <vote, v> to all parties // Round 3+f+1: if received <vote, v> from n-f distinct parties: output v else output \bot ``` Proof: - _Termination_: it's a finite round protocol with predefined timeout (since it's synchronous), thus obviously will terminate. - _Validity_: honest leader will trigger all honest parties to echo in round 2 and all vote in round 3, doing nothing for the next $f$ round and finally all output leader's input since all got at least $n-f$ vote from round 3. - _Agreement_: almost identical to Dolev-Strong's proof with the only difference being: in round $i$, instead of checking $\#(vote)\geq i$ in [[DS82]](https://dl.acm.org/doi/abs/10.1137/0212045), we check $\#(vote)\geq f+i$ since we can't use any authenticators without PKI setup. But the idea and proof logic is the same for upholding agreement. $\blacksquare$ ## Internalizing Bracha's Design Here's how I personally internalize Bracha's design: Let $E$ denotes the event that "there are enough ($\geq f+1$) honest parties receive `<echo, v>`". - Step 3A: - avoids conflicting votes from honest parties using Quorum Intersection argument. - intuitively, honest party who observed that $E$ occurs, will signal others by voting on it. - ensuring enough honest parties know that event $E$ happens could prevent [attack 1 above](#Attack-1-not-enough-honest-parties-know). - to ensure Quorum Intersection, we need $n-f$ echo; equivalently, enough honest parties know also means $\geq f+f+1$ echo. - Step 3B: - avoids message-dropping faulty nodes from forcing only some honest parties to terminate. - :star: intuitively, this step gives honest parties who haven't or can't observe event $E$ themselves the chance to trust other honest parties (who have observed $E$) and relay the knowledge forward. - relaying this knowledge is basically signaling "I now know that 'enough honest parties receive `<echo,v>`'" which could prevent [attack 2 above](#Attack-2-not-enough-honest-parties-know-that-“enough-honest-parties-know”). - you only need $f+1$ to ensure at least 1 honest vote. - Step 4: - with $n-f$ votes, at least $f+1$ honest parties who know that "$E$ happened" which is sufficient to drive the protocol to turn "$E$ happened" a common knowledge among all honest parties. Thus feel safe to terminate without worrying about agreement violation. > Excercise: If we drop step 3B, which property fail to hold? Show an attack. From the validity proof of Bracha broadcast, we know Step 3A is enough to ensure validity, thus we are looking for an attack on agreement. Given that Step 3A ensures no-conflicting votes among honest parties, we won't be using "equivocation attack" but rather "dropping message" attack. Receiving $n-f$ votes in step 4 implies that at least $f+1$ honest parties voted. By removing step 3B, the only source of honest vote is step 3A, meaning at least $f+1$ honest parties received $n-f$ echo. Problem is what about the other $f$ honest parties, can they also reach terminating state by receiving enough votes? From this line of logic, here's the attack: ```mermaid sequenceDiagram participant L as faulty leader participant A as A: f-1 faulty participant B as B: f+2 honest participant C as C: f-1 honest L->>B: <echo, 1> A->>B: <echo, 1> B->>B: <echo, 1> B->>C: <echo, 1> L->>B: <vote, 1> A->>B: <vote, 1> B->>B: <vote, 1> B->>C: <vote, 1> ``` In the attack above, Quorum B all received enough echos in 3A to vote, and later all receive enough votes to terminate. However, Quorum C can never terminate since they only have $f+2$ votes and $f+2$ echoes and get stuck forever, thus violating agreement. $\blacksquare$ > Excercise: Is the following "optimized" Bracha protocol correct? Show seucrity proof or an attack. > ![wrong bracha variant](https://i.imgur.com/oFVExlc.png) From our summarized interalization above, it's easy to notice that modified step 3 only require $f+1$ echo, which doesn't guarantee non-conflicting votes from honest nodes. Therefore our attack idea is to break agreement by misleading honest parties to conflict each other using equivocating byzantine nodes. ```mermaid sequenceDiagram participant L as faulty leader participant A as A: f-1 faulty participant B as B: f+1 honest participant C as C: 1 honest participant D as D: f-1 honest rect rgb(191, 223, 255) L->>B: <echo, 1> B->>C: <echo, 1> A->>C: <echo, 1> end Note over C: C: output 1 rect rgb(255, 255, 222) L->>D: <echo, 0> D->>D: <echo, 0> A->>D: <echo, 0> Note over D: D: > f+1 <echo,0> D->>B: <echo, 0> A->>B: <echo, 0> Note over B: B: > f+1 <echo, 0> B->>D: <echo, 0> end Note over D: D: output 0 ``` [^asyn]: In an asynchronous network, messsages can be delayed for arbitrary amount of time and delievered out of order, but there will be eventual delivery. [^perm]: In a permissioned network, every nodes know all participants and their IP addresses (and will only accept messages from them).