Try   HackMD

On Bracha Reliable Broadcast

To recap the problem Bracha'87 is trying to solve:

There are

n parties, out of whom at most
f<n/3
are byzantine, with a designated leader/broadcaster for each protocol run. The protocol runs in an asynchronous[1] and permissioned[2] network without the help of Public Key Infrastructure (PKI)/signatures to authenticate messages.
The leader starts with some input value
v
from a predefined value space
vV
(usually we consider
V={0,1}
) and a party
i
that terminates needs to output a value
viV
.

The two properties our broadcast protocol wants to achieve are:

  1. Validity: If the leader is honest, then eventually all honest parties will output leader's input.
  2. Agreement: If some honest party outputs a value, then eventually all honest parties will output the same value.

Notice that:

  • These definitions are almost identical to that of a standard Broadcast Problem, except the addition of "eventual" condition since we are working under asynchrony.
  • We abandon the termination property. Why? and why is a protocol that doesn't guarantee termination useful? see discussion below.
  • Bracha87 uses this reliable broadcast as a subprotocol to improve the fault tolerance of Asynchronous Byzantine Agreement by Ben-Or'83 from
    f<n/5
    to
    f<n/3
    .

Now, let's recall the protocol (skip rephrasing the validity and agreement proof) and explore a few extension questions:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

To terminate, or not to terminate

Bracha has reminded us of the famous FLP impossibility result in Section 8: "byzantine general protocols are impossible in asynchronous system even in the presence of a single fault". Another way to put it: to achieve consensus in asynchronous model, even a single crash failure would result in infinite execution.

Intutively, we can use the "indistinguishability argument" to quickly see why. In world 1, the leader is faulty and crashes without sending any messages. In world 2, all parties are honest, but the message from leader got delayed for aribrary amount of time. If we want termination, then there must be a priori time bound

T0 at which all parties should output a value. With a delay
>T0
in asynchrony, honest parties can't distinguish the two worlds (since they recieve no message in both worlds) and will output something different than leader's input, thus breaking validity. In order to avoid violating validity or agreement, honest parties have no choice but keep waiting indefinitely.

To circumvent the impossibility, people usually weaken the properties required to be held. In Bracha's case, we use a weaker (but still good enough) notion of termination:

  • Weak Termination: if leader is honest, then all honest parties eventually terminates; if leader is faulty, then either all or no honest parties eventually terminates.

Excercise: Show an attack that cause the protocol to not terminate for some parties.

Apparently, if a faulty leader doesn't send anything, then there will be never enough (at most

f) <Echo, v> for all honest parties, thus never a single vote from any honest party due to 3A, and thus never more than
f
<Vote, v> sent and 3B will never be satisfied, and no honest parties will terminate.

However, this is a correct but not super insightful attack, let's see a few more interesting ones, which help us internalize Bracha protocol later.

Attack 1: not enough honest parties know

For simpler presentation, I'd like to assume

n=3f+1, and we refer quorum A~C from left to right.
In the attack below, Quorum B will get 2f-1=n-f-1 echo, and Quorum C will only get f echo, since the faulty Quorum A only echo to Quorum B. Therefore, no honest parties will ever vote (since neither condition 3A nor 3B can be satisfied) > can't terminate.

f+1 honestf honestf faulty (incl. leader)f+1 honestf honestf faulty (incl. leader)<echo,1><echo,1><echo,1>

Attack 2: not enough honest parties know that "enough honest parties know"

Here's an even more revealing attack: Quorum C knows that enough (

f+1) honest parties heard about <echo, 1> (since they receive
nf
<echo, 1>, meeting condition 3A), therefore they vote for it. However, we don't have enough of a Quorum C to convince Quorum B to vote. Note that Quorum C didn't receive anything from the leader, thus they won't echo; Quorum B didn't receive enough echo to meet condition 3A and only counting on condition 3B to vote which unfortunately we fall short of.
Another note: another extreme case is Quorum B has
2f1
honest parties, and Quorum C only has
1
honest party in which case there will only be 1 vote from Quorum C and also a non-terminating situation.

C: f-1 honestB: f+2 honestA: f-1 faultyfaulty leaderC: f-1 honestB: f+2 honestA: f-1 faultyfaulty leader<echo, 1><echo, 1><echo, 1><echo, 1><vote, 1>

Excercise: Modify Bracha's broadcast primitive to make it a terminating protocol in a synchronous network, and provide security proof.

Given the lower bound from FL82, we can't acheive interactive consistency (a problem reducible from/to BB & BA) that tolerates

f byzantine parties with fewer than
f+1
rounds. Therefore, we are inspired to modify Bracha to the following:

// Round 1: 
leader sends <v> to all parties

// Round 2: 
if received <v> from leader:
    send <echo, v> to all parties

// Round 3: 
if received <echo, v> from n-f distinct parties:
    send <vote, v> to all parties

// Round 4 to 3+f: 
For i in range(1, f+1):
    if received <vote, v> from f+i distinct parties && have not voted before:
        send <vote, v> to all parties

// Round 3+f+1: 
if received <vote, v> from n-f distinct parties:
    output v
else 
    output \bot

Proof:

  • Termination: it's a finite round protocol with predefined timeout (since it's synchronous), thus obviously will terminate.
  • Validity: honest leader will trigger all honest parties to echo in round 2 and all vote in round 3, doing nothing for the next
    f
    round and finally all output leader's input since all got at least
    nf
    vote from round 3.
  • Agreement: almost identical to Dolev-Strong's proof with the only difference being: in round
    i
    , instead of checking
    #(vote)i
    in [DS82], we check
    #(vote)f+i
    since we can't use any authenticators without PKI setup. But the idea and proof logic is the same for upholding agreement.

Internalizing Bracha's Design

Here's how I personally internalize Bracha's design:

Let

E denotes the event that "there are enough (
f+1
) honest parties receive <echo, v>".

  • Step 3A:
    • avoids conflicting votes from honest parties using Quorum Intersection argument.
    • intuitively, honest party who observed that
      E
      occurs, will signal others by voting on it.
      • ensuring enough honest parties know that event
        E
        happens could prevent attack 1 above.
    • to ensure Quorum Intersection, we need
      nf
      echo; equivalently, enough honest parties know also means
      f+f+1
      echo.
  • Step 3B:
    • avoids message-dropping faulty nodes from forcing only some honest parties to terminate.
    • Image Not Showing Possible Reasons
      • The image file may be corrupted
      • The server hosting the image is unavailable
      • The image path is incorrect
      • The image format is not supported
      Learn More →
      intuitively, this step gives honest parties who haven't or can't observe event
      E
      themselves the chance to trust other honest parties (who have observed
      E
      ) and relay the knowledge forward.
      • relaying this knowledge is basically signaling "I now know that 'enough honest parties receive <echo,v>'" which could prevent attack 2 above.
    • you only need
      f+1
      to ensure at least 1 honest vote.
  • Step 4:
    • with
      nf
      votes, at least
      f+1
      honest parties who know that "
      E
      happened" which is sufficient to drive the protocol to turn "
      E
      happened" a common knowledge among all honest parties. Thus feel safe to terminate without worrying about agreement violation.

Excercise: If we drop step 3B, which property fail to hold? Show an attack.

From the validity proof of Bracha broadcast, we know Step 3A is enough to ensure validity, thus we are looking for an attack on agreement. Given that Step 3A ensures no-conflicting votes among honest parties, we won't be using "equivocation attack" but rather "dropping message" attack.
Receiving

nf votes in step 4 implies that at least
f+1
honest parties voted.
By removing step 3B, the only source of honest vote is step 3A, meaning at least
f+1
honest parties received
nf
echo. Problem is what about the other
f
honest parties, can they also reach terminating state by receiving enough votes? From this line of logic, here's the attack:

C: f-1 honestB: f+2 honestA: f-1 faultyfaulty leaderC: f-1 honestB: f+2 honestA: f-1 faultyfaulty leader<echo, 1><echo, 1><echo, 1><echo, 1><vote, 1><vote, 1><vote, 1><vote, 1>

In the attack above, Quorum B all received enough echos in 3A to vote, and later all receive enough votes to terminate. However, Quorum C can never terminate since they only have

f+2 votes and
f+2
echoes and get stuck forever, thus violating agreement.

Excercise: Is the following "optimized" Bracha protocol correct? Show seucrity proof or an attack.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

From our summarized interalization above, it's easy to notice that modified step 3 only require

f+1 echo, which doesn't guarantee non-conflicting votes from honest nodes. Therefore our attack idea is to break agreement by misleading honest parties to conflict each other using equivocating byzantine nodes.

D: f-1 honestC: 1 honestB: f+1 honestA: f-1 faultyfaulty leaderD: f-1 honestC: 1 honestB: f+1 honestA: f-1 faultyfaulty leaderC: output 1D: > f+1 <echo,0>B: > f+1 <echo, 0>D: output 0<echo, 1><echo, 1><echo, 1><echo, 0><echo, 0><echo, 0><echo, 0><echo, 0><echo, 0>

  1. In an asynchronous network, messsages can be delayed for arbitrary amount of time and delievered out of order, but there will be eventual delivery. ↩︎

  2. In a permissioned network, every nodes know all participants and their IP addresses (and will only accept messages from them). ↩︎