# BEEFY: A Lightweight Finality Gadget for Cross-chain Bridges

This document presents a description of the BEEFY protocol. BEEFY's aim is to efficiently follow a chain that has GRANDPA finality, a finality gadget created for the Substrate/Polkadot ecosystem. This is useful for bridges (e.g., Polkadot->Ethereum), where one chain follows another, and for light clients suitable for low-storage devices such as mobile phones. Further, this document provides an in-depth security and economic incentive analysis of the random-sampling-based protocol. In the process, we address several concurrency issues in the original design and suggest possible fixes.

## Why not just use GRANDPA?

A client could just follow GRANDPA using GRANDPA justifications, which are sets of signatures from validators. This is used for the Substrate-to-Substrate bridge and in light clients such as the Substrate Connect browser extension. The main issue with this is that GRANDPA justifications are large and expensive to verify on other chains, like Ethereum, that do not use the same cryptography. It is not easy to modify GRANDPA to make it better for this: certain design decisions, such as validators voting for different blocks in a justification, make this hard, and we also do not want to use slower cryptography in GRANDPA. Thus we are adding an extra protocol, BEEFY, that will allow lighter bridges and light clients of Polkadot.

## Protocol Overview

BEEFY consists of two parts:

1. **Consensus extension** on GRANDPA finalisation that is a voting round. The consensus extension serves to produce a smaller consensus justification than GRANDPA's and to support alternative cryptography; this helps the second part of BEEFY, the light client protocol described below.
2. **Light client** protocol for convincing the other chain/device efficiently about this vote. In the BEEFY light client, a prover (a full node or bridge relayer) wants to convince a verifier (a light client that may be on-chain) of the outcome of a BEEFY vote.
The prover has access to all the voting data from the BEEFY voting round. In the light client part, the prover may generate a proof and send it to the verifier, or they may engage in an interactive protocol with several rounds of communication.

Since BEEFY runs on top of GRANDPA, similarly to how GRANDPA lags behind the best produced (non-finalised) block, BEEFY is going to lag behind the best GRANDPA (finalised) block. The following simplifications for its setup make sense:

- The BEEFY validator set is the same as GRANDPA's; however, validators might be identified by different keys from their session keys.
- From a single validator's perspective, BEEFY has at most one active voting round.
- Since GRANDPA validators are reaching finality, we assume they are online, well connected, and have a similar view of the state of the blockchain.

## Consensus Extension: Voting Round

In this part of BEEFY all validators need to produce and broadcast votes on a GRANDPA-finalised block. A vote is the validator's signature over a Commitment that consists of a payload and the block number from which the payload originates. The payload is a blob extracted from a block, or from the state at that block, and is expected to be a cryptographic accumulator, e.g., a Merkle tree root hash or a Merkle Mountain Range root hash. Additionally, the Commitment contains the BEEFY validator set id at that particular block. Note the block is finalised, so there is no ambiguity despite using a block number instead of a hash.

A collection of votes, or rather a Commitment and a collection of signatures, is called a Signed Commitment. A valid Signed Commitment is also called a BEEFY Justification or BEEFY Finality Proof. A round is an attempt by BEEFY validators to produce a BEEFY Justification. The round number is simply defined as the block number the validators are voting for, or to be more precise, the Commitment for that block number. A round ends when the next round is started, which may happen when one of the following events occurs:

1.
Either collect 2/3rds + 1 valid votes for that round.
2. Or receive a BEEFY Justification for a block greater than the current best BEEFY block.

The new round number is determined locally, based on what is believed to be the current round number. The choice is based on knowledge of:

- Best GRANDPA finalised block number
- Best BEEFY finalised block number
- Starting block of the current session

A session is a period of time, or rather a number of blocks, during which the validator set (keys) does not change.

## Implementation alternatives for the BEEFY Light Client

BEEFY will support multiple ways of doing the light client protocol, and in turn these need different signature schemes, which also applies to the voting round. There will be two versions of the light client protocol: a) one using random sampling and secp256k1 signatures (cite), and b) one using SNARKs and BLS signatures (cite). The voting round will use both secp256k1 and BLS signatures to support both types of light clients.

For efficiency, the verifier does not need to know the public keys of the validator set; instead they just have a commitment to the list of keys. The nature of this commitment will differ between the schemes. Since Polkadot's validator set changes over time, when it does, the current validator set will need to sign, in a BEEFY message, the commitment to the next validator set, and the verifier can update its commitment once it is convinced that the current validator set agreed on it.

## Interactive Random Subsampling Implementation

The aim is to have an interactive protocol between a prover, who has access to all the votes of the validators who voted for the outcome, and a verifier, who wants to know whether an outcome is true. The prover claims that a particular subset of validators signed the outcome and sends the signature of one of these validators. See [here](https://hackmd.io/wsVcL0tZQA-Ks3b5KJilFQ) for the restrictions on this one signature.
The verifier asks the prover for signatures from a randomly selected sample of validators who the prover claimed voted for this outcome. The prover then provides these signatures, and if they verify, the verifier accepts. Our aim is efficiency: the interaction should require the minimal number of signatures that the verifier has to inquire about to be convinced of the outcome with a certain probability.

#### Form of the votes and light client communication

A detailed description of the data structures and message format can be found in the [Beefy Section](https://github.com/w3f/polkadot-spec/blob/bnb/beefy-spec/docs/sect-finality.md) of the Polkadot Spec.

*Vote*: The validators sign messages with secp256k1 keys. The message is a Polkadot relay chain block number and a Merkle Mountain Range (MMR) (cite) root, where the leaves refer to Polkadot relay chain blocks. The leaves also include information about the bridge parachain blocks of that relay chain block.

*Commitment*: The commitment to the list of the validators' public keys is a Merkle tree. The prover indicates the subset of keys that signed a message by sending a *bitvector*, whose $k$th bit is 1 if the $k$th key in the list committed to by the Merkle tree signed the message. The prover also includes the signature of one of the validators; this is important to be able to slash someone if the claim is wrong, since provers often do not have much stake themselves to be slashed for a wrong claim. To prove that the $k$th key signed the message when asked, the prover provides the copath of the $k$th public key in the Merkle tree, as well as the signature on the outcome that verifies with this public key.

### Protocol Parameters and Analysis

We will be more formal about this than here in https://hackmd.io/c6STzrvfQGyN2P2rVmTmoA . There are $n$ validators. We assume that up to $f=\lfloor (n-1)/3 \rfloor$ of these may be dishonest, which gives $n > 3f$, consistent with GRANDPA's assumptions.
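As a minimal sketch of this section's arithmetic (the function names are illustrative, not from any BEEFY codebase), the fault threshold and the cheating-probability bound $(f/(n-f))^m$ derived in this section can be computed as:

```python
def fault_threshold(n: int) -> int:
    """Maximum number of dishonest validators tolerated: f = floor((n-1)/3)."""
    return (n - 1) // 3

def cheat_probability_bound(n: int, m: int) -> float:
    """Upper bound (f/(n-f))^m on a lying prover convincing the verifier
    after m uniformly sampled signature checks (see the analysis below)."""
    f = fault_threshold(n)
    return (f / (n - f)) ** m

# For n = 100 validators, f = 33 may be dishonest, and m = 10 samples
# already push the cheating probability below 1/1000.
print(fault_threshold(100))              # 33
print(cheat_probability_bound(100, 10) < 1 / 1000)  # True
```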
The verifier requires that the prover's bitvector contains at least $n-f$ 1s. The verifier then asks for $m$ signatures, sampling these uniformly at random from the bits which are 1. $m$ depends on the certainty the verifier wants to have of the outcome; for overwhelming certainty it needs $m=f+1$ samples.

Consider an honest prover with $n-f$ signatures on a message. If the validators the prover claims signed the outcome did sign it, then the prover can always provide the signatures the verifier asks for and convince the verifier.

Now suppose the prover is lying and the outcome did not happen. Then no honest validator would have signed the message. The prover needs to claim that $n-f$ validators signed the message, but they can only obtain signatures from $f$ validators, the dishonest ones. So at least $n-2f$ of the claimed validators are honest and did not sign. For each signature the verifier asks for, there is at least a $(n-2f)/(n-f) = 1-f/(n-f) > 1-f/2f = 1/2$ chance that the sampled validator is honest, in which case the prover cannot provide the signature. The verifier is only convinced when all samples are dishonest. Since each sample is independent, this happens with probability at most $(f/(n-f))^m < 2^{-m}$.

*Example*: assume we have 100 total voters that are validators of a chain. The chain has agreed to outcome X, where 67 honest voters have voted for it and 33 dishonest voters have abstained. The prover claims that these 67 voters signed X, and wants to convince the verifier with an error probability of at most 1/1000. If the verifier inquires about $m=34$ signatures, it can be sure with overwhelming certainty that at least one of these signatures is from an honest voter, since the dishonest voters number at most 33.
But to be sure with error probability at most 1/1000, the verifier only needs to inquire about $m=10$ signatures, because every successful inquiry reduces the probability that the claim that X has been agreed upon is false by a factor of at least 1/2, and $2^{-10} < 1/1000$.

#### How to do this on-chain?

Here the verifier is on another chain. The prover will post their messages on-chain, and chain logic or a smart contract will do the verification. This logic will need randomness for the samples, obtained from the consensus of the chain. We need this randomness to be random enough, even conditioned on what the prover knows when they make their claim. To ensure this, there will be a delay between the prover's message claiming that certain validators signed and the randomness for the challenge. For proof-of-work chains, we will be able to use the block hash as a source of randomness. Proof-of-stake chains generally have their own sources of randomness, as consensus often needs one. In Ethereum's case, this is called RANDAO randomness and is available to smart contracts.

## Pre-requisites

There are $n$ validators with public keys $pk_1,\dots,pk_n$. Of these, at least $n-f$ for $3f < n$ are honest and at most $f$ are dishonest. A prover wants to convince a verifier that at least one honest validator signed a message $msg$. The verifier starts with a commitment $C$ to the public keys $pk_1,\dots,pk_n$, e.g. their Merkle root. The prover and verifier do this by engaging in an interactive protocol. The properties we want for this protocol are:

**Completeness**: An honest prover who has correct signatures from $n-f$ different validators on $msg$ will always be able to convince the verifier. (A correct signature is one that will always verify against the message and the corresponding public key.)

**Soundness**: If no honest validator signed $msg$, then the prover can convince the verifier with probability at most $2^{-m}$ for a parameter $m$.

## The protocol

1. The prover sends $msg$.
They also send a bitvector $b$ of length $n$ with at least $n-f$ 1s. An honest prover sets $b_i=1$ if and only if they have a signature $\sigma_i$ that verifies against $pk_i$. They also send one of these signatures $\sigma_j$ and an opening $op_j$ of $C$ to $pk_j$ (e.g. a Merkle copath).
2. The verifier samples $i_1,\dots,i_m$ where each $i_k$ is chosen independently and uniformly at random from the positions of bits of $b$ that are $1$.
3. The prover sends the signatures $\sigma_{i_1},\dots,\sigma_{i_m}$, the public keys $pk_{i_1},\dots,pk_{i_m}$ and openings $op_{i_k}$ of $C$ to $pk_{i_k}$ for $k$ from $1$ to $m$.
4. The verifier checks that $b$ has at least $n-f$ 1s. They check the $m+1$ openings $op_j, op_{i_1},\dots,op_{i_m}$ against the corresponding public keys $pk_j, pk_{i_1},\dots,pk_{i_m}$ and $C$. They also check the $m+1$ signatures $\sigma_j, \sigma_{i_1},\dots,\sigma_{i_m}$ against the public keys $pk_j, pk_{i_1},\dots,pk_{i_m}$ and $msg$. If all checks pass, they are convinced that an honest validator signed $msg$.

## Security Proof

**Assumption:** We assume that the signature scheme is unforgeable and the commitment scheme is binding. We ignore here the negligible probability that the prover can find a signature that verifies against the public key of an honest validator that did not sign the message, or that they can find an opening of $C$ at position $i$ to a value other than $pk_i$ that verifies.

**Completeness**: completeness follows easily from the protocol description and our assumptions.

**Soundness**: assume that no honest validator signed $msg$. The prover must provide a bitfield with at least $n-f$ 1s. Consider $i_k$ for some $1 \leq k \leq m$. The public key $pk_{i_k}$ belongs to a dishonest validator only if $i_k$ is one of at most $f$ possible values out of the at least $n-f$ $1$s. Since $n > 3f$, with probability at least $1-f/(n-f) > 1-f/2f=1/2$, $pk_{i_k}$ belongs to an honest validator.
Since the $i_k$s are each chosen independently, the probability that no $pk_{i_k}$ for any $k$ belongs to an honest validator is at most $2^{-m}$. It remains to show that if some $pk_{i_k}$ belongs to an honest validator, then the prover cannot convince the verifier. Because the commitment scheme is binding, the prover cannot provide an opening of $C$ at position $i_k$ to a value other than $pk_{i_k}$. Because the signature scheme is unforgeable and honest validators did not sign $msg$, the prover cannot provide a signature $\sigma_{i_k}$ that verifies against $pk_{i_k}$. So the prover will not convince the verifier in this case.

### Concurrency Issue

For the random sampling version of the light client there is a concurrency issue, as follows. The prover (relayer) sends a validator signature to the verifier when they send the commitment to the set of validators who signed the BEEFY block. This is so that, if the claim is wrong, the validator whose signature was submitted first can be slashed; provers (relayers) themselves might not have enough stake to allow a significant punishment for wrong claims. However, the problem is that this slashing might take some time, and meanwhile many other provers (relayers), or even the same prover (relayer), might use the same signature for many other wrong claims that we can no longer slash for.

To mitigate this problem we have several options.

1. A straightforward solution would be to globally limit the number of times a certain validator's signature can be used for a claim in a certain time period. The problems with this solution are that:
    a. There might be many more claims than honest validators, and this restriction can lead to the same validator needing to be used more than once.
    b.
If we have light client logic that does not allow the same validator's signature to be used consecutively even by different relayers, then there is an attack on liveness where a malicious relayer can frontrun honest relayers with a claim using the same validator. We could look into mitigating this by using a commit-reveal scheme and yet another transaction.
2. We could forbid each relayer from using the same validator twice for a period of time. This would mean that we would have to limit the number of relayers and increase the number of signatures required by $\log_2 (\text{max no. of relayers})$.

We give details of the second option in the next section. The period of time involved here is (the time it takes for a slash to be reported) + (one session on Polkadot, i.e. the time before we change a validator set). If the bridge logic is aware of sessions, because a session boundary means a change of validator set, then we can use that; otherwise we use 4000 Polkadot blocks (one session should be at most 3,600 blocks).

## Solution-1: Limiting validator reuse per relayer

We do not allow the same relayer to use the same validator signature in the same period, i.e. 4000 Polkadot blocks. On Polkadot there are currently 300 validators, and each claim needs to assert that 200 of those validators signed something. As long as a relayer makes no more than 200 claims, which is one every 20 Polkadot blocks, or shorter than the gap between claim and response on Ethereum, it will never run out of validators to use.

The smart contract needs to store a bitvector per relayer and possibly some data indicating which period of 4000 blocks this bitvector refers to. If the smart contract is already storing claim data with relayer addresses, then this might not be a huge change to the code. The code would check whether the bitvector is still valid and then set a bit with each claim. Then we would need to increase the number of checked signatures by $\log_2 (\text{max no. of relayers})$.
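The bookkeeping described above can be sketched as follows, assuming the 4000-block period suggested in the text. The class and method names are hypothetical; a real bridge contract would implement this as a bitvector in EVM storage:

```python
PERIOD = 4_000  # Polkadot blocks per reuse period, as suggested above

class ValidatorReuseTracker:
    """Tracks, per relayer, which validator signatures were already used
    in the current 4000-block period (a sketch, not the bridge contract)."""

    def __init__(self):
        # relayer -> (period index, set of validator indices used)
        self.used = {}

    def try_use(self, relayer, validator, block):
        """Returns True and records the use if this relayer has not used
        this validator's signature in the current period; False otherwise."""
        period = block // PERIOD
        prev_period, bits = self.used.get(relayer, (period, set()))
        if prev_period != period:   # stale data: the bitvector resets each period
            bits = set()
        if validator in bits:       # same relayer reusing a validator: reject
            return False
        bits.add(validator)
        self.used[relayer] = (period, bits)
        return True
```

An on-chain version would pack the `set` into a fixed-size bitvector stored next to the period index, matching the contract changes described above.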
We can have the relayers put down a deposit, i.e. lock some Ether. We could also have Polkadot whitelist some relayers. Or do both: whitelist some relayers from Polkadot, but allow extra relayers to join with a large deposit, to deal with all existing relayers going down simultaneously. Then we can compute a maximum number. For this, relayers would need to make a deposit that can never be slashed, but they should be unable to withdraw it in the same session as they make a claim. That is not ideal, as if BEEFY stalls, the deposits are locked forever; so maybe they could also be unlocked after a large number of Ethereum blocks (think weeks or months, enough for us to fix BEEFY if that is possible).

It may also be ok to increase the number of signatures only by $\lceil \log_2 (\text{no. of relayers this session}) \rceil + \lceil \log_2 \log_2 (\text{max no. of relayers}) \rceil$ or $1+2\lceil \log_2 (\text{no. of relayers this session}) \rceil$, which optimistically is much smaller. An attacker adding relayers would face an increasing number of signature checks as the number of relayers grows, and so we should be able to bound the total probability of any relayer convincing the chain of something false per validator signing an incorrect claim. With this model, we need not have relayer deposits at all. An attacker can grief relayers by registering relayers who do nothing, but the cost to the attacker is exponential in the amount of extra work they cause a relayer, so this may be fine. For this, we should record $\text{no. of relayers this session}$ with a relayer's ticket at some point, because we do not want frontrunning of a relayer's response (with all the signatures) by registering another relayer to make the response invalid.

If the probability of the adversary winning without this factor is $p$, and we add $1+2\lceil \log_2 (\text{no.
of relayers this session}) \rceil$, then the total probability of the adversary winning from all claims that include one validator, if they keep on registering relayers, is bounded by
\begin{align*}
\sum_{r=1}^\infty 2^{-1-2\lceil \log_2 r \rceil} p & = \left(1/2 + \sum_{i=0}^{\infty} \sum_{r=2^i + 1}^{2^{i+1}} 2^{-1-2\lceil \log_2 r \rceil}\right) p \\
& = \left(1/2 + \sum_{i=0}^{\infty} 2^i \, 2^{-1-2(i+1)}\right) p \\
& = (3/4) p
\end{align*}

## Solution-2: Signature checks dynamically depend on the number of initial claims per session

We can also dynamically increase the number of signatures checked based on the total number of SubmitInitial (initial claim) transactions in a session. We keep a counter $t$ (total claims) that resets to 0 at the beginning of a session and increments by one for each SubmitInitial call (no matter which relayer or initial validator). Assuming $M$ is the minimum number of signature checks, the protocol now asks for $M+1+2\lceil \log_2 t\rceil$ checks, where $t$ is the value at the point in time when SubmitInitial is called by the relayer.

The issue with this approach is as follows: an adversarial relayer can increase the number of signature checks for good relayers by continuously spamming SubmitInitial claims (all backed by just one initial validator signature). With just 100 SubmitInitial claims, they can increase the signature checks for other good relayers by $1+2\lceil\log_2 100\rceil = 15$ signature checks for the whole session. The approach mentioned in Solution-3 handles this issue and makes such griefing attacks much more costly for the adversarial relayer.

## Solution-3: Increasing checked signatures with the number of times a validator's signature is used

Here is another approach for mitigating the validator reuse concurrency issue. We can dynamically change the number of signatures sampled based on the number of claims made in the current session using the same validator's signature.
We keep a counter $i_{S,V}$ for each validator signature $V$ used in session $S$, which increments by 1 whenever an initial claim backed by $V$ is made by a relayer. For any further claims made (by any relayer) using validator signature $V$, the number of signatures sampled during SubmitFinal needs to increase by $1 + 2\lceil \log_2 i_{S,V}\rceil$. This dynamic increase in the number of signature checks ensures that the probability of succeeding is proportionally lowered, and the advantage gained by using the same validator signature multiple times is neutralised.

Assume that an adversarial relayer $A$ (or a set $\{A_1, \dots, A_n\}$, which is the same for this analysis) wants to reuse the same malicious signature $V_m$ multiple times, to increase their chances of succeeding while not increasing the cost of the attack beyond $Slash$ (the maximum slashable value for a validator). $A$ initiates multiple claims with the same validator signature $V_m$. Let $M$ be the minimum number of signatures that have to be checked for each relayer. With the dynamic approach, the $k$th claim of $A$ has $M + 1 + 2\lceil \log_2 i_{S,V}\rceil$ signatures checked, instead of just $M$. The total success probability over $k$ such claims is then bounded by
$$
E(k)= \sum_{i=1}^{k} \frac{1}{2^{M + 1 + 2\lceil \log_2 i\rceil}}
$$
[Here](https://github.com/bhargavbh/BEEFY-Analysis/blob/main/DynamicChallengesAnlaysis.ipynb) is an ipynb which computes the factor of advantage gained by sending multiple claims (in comparison to an honest relayer). The ipynb also includes an analysis of how a griefing attack by a malicious relayer to increase the number of signature checks for the network is prohibitively expensive.

## Solution-4: A static increase in the number of checks

To account for the concurrency issue, we could also just increase the number of signature checks. For an adversarial relayer, we assume they fit in as many SubmitInitial transactions as possible within two Polkadot sessions (the time required to detect whether a BEEFY slash occurred).
Let $p$ be the probability of a successful attack (not considering the concurrency issue), $D$ the DOT market cap, $S$ the slash value, and $g$ the gas cost of one SubmitInitial transaction in USD. The expected incentive for the adversary trying $k$ malicious SubmitInitial claims is:
$$
\begin{aligned}
E(k) & = (k \cdot p)\cdot D + (1-k \cdot p)\cdot(-S) + k\cdot(-g)\\
& = k\cdot(p\cdot(D+S) -g) -S
\end{aligned}
$$
We notice that the expected economic incentive depends linearly on $k$ with a slope of $p\cdot(D+S)-g$. We need to set $p$ such that $p\cdot(D+S)-g\leq 0$ for the adversary not to exploit the concurrency issue. With current values of $D$, $S$, $g$, we derive $p\approx 3.6\cdot 10^{-10}$, hence the number of signatures is $\lceil \log_2(p^{-1})\rceil = 32$ (excluding the signatures required to negate Randao biasing). This total number of signature checks is significantly higher than in the other approaches. Assuming instead $M=\log_2 (D/S +1)$, i.e. $p \approx S/D$, the expectation boils down to:
$$
\begin{aligned}
E(k) & \approx k\cdot S + (1-k\cdot S/D)\cdot(-S) + k\cdot(-g) \\
& \approx k\cdot(S-g) - S
\end{aligned}
$$
where the expectation is linearly dependent on $k$ with a large slope $S-g\approx S$ since $g \ll S$. Hence, the adversary is always incentivised to perform the concurrency attack, as the increase in expected incentive outweighs the gas costs of performing the attack.

## Conclusion

Of the 4 possible solutions, Solution-3 is likely the best option. The advantages are:
1. No requirement of relayer deposits on the Ethereum side.
2. No bound on the number of relayers that can participate.
3. The total number of signature checks (globally in the protocol) is lower than with a fixed high initial threshold. The number of signature checks dynamically increases only in the case of an attack exploiting the concurrency issue.
4. An adversary needs to spend a large gas budget for griefing attacks.

To elaborate, the main advantage is that it is harder for adversaries to spam SubmitInitial claims to raise the number of signature checks (for other honest relayers).
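As a sanity check on Solution-3 (a sketch; the full computation lives in the linked notebook), the bound $E(k)$ can be summed directly: however often the same signature is reused, the total success probability stays below $(3/4)\cdot 2^{-M}$, matching the series bound derived earlier for the relayer-count factor.

```python
import math

def advantage_bound(k: int, M: int) -> float:
    """E(k) = sum_{i=1}^{k} 2^-(M + 1 + 2*ceil(log2 i)): total success
    probability over k claims reusing the same validator signature,
    under Solution-3's dynamic increase in checked signatures."""
    return sum(2.0 ** -(M + 1 + 2 * math.ceil(math.log2(i)))
               for i in range(1, k + 1))

# Even 10,000 reuses of one signature gain less than 3/4 * 2^-M in total,
# instead of growing linearly in k as with a fixed M checks per claim.
print(advantage_bound(10_000, 10) < 0.75 * 2 ** -10)  # True
```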
If the increase were based on total claims in a session (Solution-2), then the adversary could pick just one unique validator signature to support multiple SubmitInitial claims and increase signature checks for everyone (even honest relayers). With the proposed change in Solution-3, honest relayers can always choose a different validator signature (other than the one used by the adversary) to support their claims.

# Deriving Security Parameters from Economic Incentive Analysis

This document considers the number of random validators' signatures that must be included in the final step of the random sampling light client of BEEFY, the protocol described [here](https://hackmd.io/UsPqx0IATX6yFSxcBLIhHQ), for it to be secure. [This analysis](https://hackmd.io/c6STzrvfQGyN2P2rVmTmoA) shows that using $k$ random validators' signatures gives a probability of at most $2^{-k}$ of an adversary convincing the verifier of a false claim. However, this considers a one-shot protocol with unbiased randomness.

The goal is to make an attack on the BEEFY client of Polkadot more expensive than all DOT tokens. Note that this many tokens can be used for other attacks on Polkadot, such as attacking GRANDPA or BEEFY directly by choosing 1/3 of the validators, or attacking parachains. For the latter, we only obtain that attacks are expensive in expectation, and we will aim for something similar here. Then we can argue that any bridge using BEEFY is not a weak point. This also holds if BEEFY is used for many bridges, since an attack on Polkadot or a bridge hub parachain could be used to attack several bridges.

The first BEEFY transaction contains a signature by one validator. When slashing for BEEFY is appropriately implemented, it should be possible to report and slash any validator whose signature appears on another chain. Ideally relayers for the bridge would automatically make such a report in a timely fashion.
If the protocol were one-shot and unbiased, we could set
$$k = \lceil \log_2 \frac{\text{total DOT supply}}{\text{minimum amount slashed}} \rceil $$
However, there are a number of things that can bias the randomness, and we need to increase the number of checked signatures to stay safe:

1. The concurrency [issue](https://hackmd.io/wsVcL0tZQA-Ks3b5KJilFQ). The problem is that a single validator's signature may be used many times before they are slashed.
2. A Randao attack, which impacts which validators' signatures the verifier asks for. The Randao attack works as follows. The adversary waits until it has the possibility to create, or not create, many blocks at the end of an epoch. This gives it a chance to bias epoch+2 and consequently epoch+4. We want the required waiting time to be as long as possible to make the attack expensive; the number of signatures matters here since it impacts the cost of the Randao attack.

All these factors give a formula for a secure upper bound for $k$ that looks something like:
\begin{align*}
k = \lceil \log_2 & \frac{\text{total DOT supply}}{\text{minimum amount slashed}} \times \\
& \text{number of slots the RanDAO value could come from} \times \\
& \text{expected number of choices under a RanDAO attack per slot} \times \\
& \text{the number of times a validator's signature can be reused in claims} \rceil
\end{align*}
We will look at these parameters in the following sections. For the last number, we will aim for a variable probability and add $1+2\lceil \log_2 (\text{no. of initial claims this session signed by the same validator}) \rceil$.

Let's assume 1000 validators as on Kusama and 25% slashing. Then using the recommendations from below, we obtain:
\begin{align*}
k = \lceil \log_2 & 2.5 \times 1000 \times 4 \times \\
& 78 \times \\
& 172.8 \rceil\\
& + 1+2\lceil \log_2 (\text{no. of initial claims this session signed by the same validator}) \rceil \\
&= 29+2\lceil \log_2 (\text{no.
of initial claims this session signed by the same validator}) \rceil
\end{align*}

### Estimating the ratio of the total DOTs to what could be slashed

We need to estimate $\frac{\text{Total issuance}}{\text{DOTs that can be slashed for backing of the least backed validator}}$. At the current time, we have:

| Network | Total issuance | Stake backing least backed validator | Ratio | Number of validators | Ratio per validator |
| ------- | -------------- | ------------------------------------ | ----- | -------------------- | ------------------- |
| Kusama (block no. 19,422,555) | 13.9492 MKSM | 6,750 KSM | $2066.54$ | 1000 | 2.066 |
| Polkadot (block no. 17,033,647) | $1.3496\times 10^9$ DOT | $1.9567\times 10^6$ DOT | $689.73$ | 297 | 2.3198 |

Based on these numbers we would suggest including a factor of 2.5 times the number of validators in the formula above to be safe, assuming that we slash 100% for signing a wrong payload in BEEFY. We expect the number of validators in Kusama/Polkadot to change in the future, but expect the *ratio per validator* to remain about the same. Hence we suggest using *ratio per validator* $\times$ *validator set size*, keeping it parametric in the validator set size, which the light client is aware of. If we wanted to slash e.g. 3%, then we would need to increase this by $1/0.03$ to get the correct value for $\frac{\text{total DOT supply}}{\text{minimum amount slashed}}$. We also suggest keeping the number of checks parametric w.r.t. the slashing rate.

### The concurrency issue

In https://hackmd.io/wsVcL0tZQA-Ks3b5KJilFQ , we recommended adding $1+2\lceil \log_2 (\text{no. of relayers this session}) \rceil$ checks, assuming we can prevent each relayer from reusing the same validator signature. Alternatives include adding $1+2\lceil \log_2 (\text{no. of initial claims this session}) \rceil$, which requires only counting the number of initial claims.
Or we can keep track of the number of times each validator's signature has been used in an initial claim, and then increase by $1+2\lceil \log_2 (\text{no. of initial claims this session signed by the same validator}) \rceil$. Currently it appears that only having a static increase in the number of checked signatures, not tracking anything in the session, would require not only a sizeable increase but also new security assumptions.

#### Randao Biasability

We did some analysis in https://github.com/w3f/consensus/blob/master/random/randao_analysis.ipynb . Our analysis of Randao is similar to the one in https://eth2book.info/capella/part2/building_blocks/randomness/#randao-biasability but slightly more involved.

We assume that an attacker is unable to prevent an honest block producer's block from being included in the chain. They might be able to because of:

1. The attacks on LMD Casper considered in https://arxiv.org/pdf/2102.02247.pdf and https://arxiv.org/pdf/2110.10086.pdf . In particular, the one-block private fork attack allows a malicious block producer to skip an honest block producer in the next slot. This really complicates the analysis, making it much more combinatoric.
2. The fact that Ethereum block producers are publicly known well in advance makes them a target for DoS attacks, potentially causing them to miss their slot. If this is consistently possible, then Ethereum is censorable, the Randao randomness is arbitrarily biasable, and the sync committee is useless.

2 will be fixed by some form of mostly secret leader election. 1 would be solved by adopting the Goldfish or RLMD protocols. We ignore these attacks.

As in https://eth2book.info/capella/part2/building_blocks/randomness/#randao-biasability, the key quantity is the tail length: the number $m$ of slots with adversarial block producers in sequence before the Randao value is used, e.g. at the end of the epoch or at the challenge block for BEEFY.
The last honest block producer before this point produces a block that must be included, whose contribution to the randomness is random and unknown in advance. The adversarial contributions to the randomness are fixed by this point, so the adversary has the choice of publishing a block or not. This gives them $2^m$ choices of randomness. The Randao randomness at the end of epoch $n$ is used to determine the block producers of epoch $n+2$. We can imagine that the adversary wants to maximise the number of adversarial slots at the end of epoch $n+2$ to get control over the randomness in epoch $n+4$, etc. If they choose to do this, we can construct a Markov chain, where each state is the number of adversarial blocks at the end of this epoch. The next state is the number of adversarial blocks at the end of the epoch after next. (So odd and even numbered epochs are mostly independent.) If the adversary controls $1/3$ of the validator set and controls $m$ slots at the end of the current epoch, then under this attack the distribution of the number of slots they control at the end of the next epoch is the maximum of $2^m$ geometric distributions (the kind that start at 0) with parameter $2/3$. The stationary distribution of this Markov chain is the distribution of the number of adversarial slots at the end of the epoch under continuous attack, where the adversary tries to maximise this. We want to consider sampling randomness 4 epochs after some trigger happens. Now an attacker could wait until the current epoch has many adversarial validators at the end, before causing the trigger to happen. Then 4 epochs later, the current epoch may still be somewhat biasable, allowing the adversary a better than usual chance of getting many adversarial blocks before the trigger block. How many slots at the end of an epoch is it reasonable to wait for?
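The Markov chain just described is small enough to write down explicitly. Below is a minimal sketch, assuming 32-slot Ethereum epochs, an adversary controlling a $1/3$ fraction of slots, and the transition rule above (next tail length = maximum of $2^m$ geometric samples, capped at one epoch). It computes the stationary distribution numerically; the exact recurrence times come from the notebook linked above, but the characteristic factor-of-3 decay per extra tail slot can be reproduced here:

```python
import math

EPOCH = 32    # slots per Ethereum epoch
ADV = 1 / 3   # adversarial fraction of the validator set (assumption from the text)

def transition_row(m):
    """Distribution of the next tail length given current tail length m.

    The adversary has 2^m choices of randomness; each choice yields a tail
    length that is Geometric(2/3) starting at 0, they keep the maximum,
    and a tail cannot exceed one epoch of slots."""
    n_choices = 2 ** m
    def cdf(k):  # P[a single geometric sample <= k]
        return 1.0 if k >= EPOCH else 1.0 - ADV ** (k + 1)
    row, prev = [], 0.0
    for k in range(EPOCH + 1):
        cur = cdf(k) ** n_choices  # CDF of the maximum of n_choices samples
        row.append(cur - prev)
        prev = cur
    return row

P = [transition_row(m) for m in range(EPOCH + 1)]

# Stationary distribution by power iteration.
pi = [1.0 / (EPOCH + 1)] * (EPOCH + 1)
for _ in range(2000):
    pi = [sum(pi[m] * P[m][k] for m in range(EPOCH + 1)) for k in range(EPOCH + 1)]
    total = sum(pi)
    pi = [x / total for x in pi]

# Deep in the tail, each extra adversarial slot costs the adversary roughly
# a factor of 3 in probability, consistent with recurrence times that grow
# by about 3x per extra tail slot.
ratio = pi[16] / pi[15]
```

The stationary mass concentrates on very short tails, with `pi[k+1]/pi[k]` approaching $1/3$ for large $k$, which is why waiting for a long tail is so expensive for the adversary.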
Well, from https://github.com/w3f/consensus/blob/master/random/randao_analysis.ipynb , we obtained that, under the stationary distribution, i.e. under the continuous attack above, tail lengths of 15, 16 and 17 occur in expectation once every 8.03, 24.1 and 72.3 years respectively. It seems rather expensive in terms of missed block rewards to carry out the attack for that long. Without the continuous attack, tail lengths 15, 16 and 17 only occur in expectation once every 26.2, 78.6 and 235.8 years respectively. So we went with 16 as the maximum feasible tail length for the adversary. Hopefully this is conservative enough for issue 1 above not to be a problem. If there is a tail length of 16 in the current epoch $n$ and the adversary makes the initial transaction now, then the distribution of the number of adversarial slots before a slot in epoch $n+4$ is given by the distribution of the Markov chain two transitions from the state corresponding to tail length 16. The expectation of $2^X$, with $X$ sampled from this distribution, is the expected number of choices the adversary will have. We obtained an expected number of 172.8 choices from this. We therefore advise adding $\lceil \log_2 172.8 \rceil = 8$ samples to deal with the Randao biasability.

#### How many choices of slot number does the adversary have?

In the BEEFY protocol as implemented, when the block number equal to the initial block number + 128 is reached, the relayer needs to send a transaction that samples the previous Randao random number. They have a certain number of blocks $m$ to include this transaction. The adversary can then choose from $m$ block numbers. However, since the protocol works using block numbers, the number of choices of slot number can be much bigger. Block producers are assigned to slot numbers, not block numbers, in an epoch. Some slot numbers will have more adversarial slots before them and so more choices for the randomness.
By choosing whether or not to produce blocks in earlier slots, which do not affect the number of choices for the randomness, the adversary may be able to ensure that the sampling block occurs at their choice of slot number. This attack is not useful before the adversary knows the randomness for the epoch in question. However, there is not much point in them doing this until 2 epochs before, because they do not know which slot to aim for until they know the block producers. After that, they need at most 64 blocks until the Randao randomness can be sampled. Then they have an additional few blocks depending on a parameter `RandaoCommitExpiration`. Let's assume that this is 3, though we can tolerate much larger numbers. Just how many choices of slots do they have, starting at 67 blocks before? Again we assume that honest blocks always end up in the chain. Though now the LMD attacks in which they do not are less useful, as they end up with some adversarial block in the chain instead. If all block producers produce blocks, then there can be a minimum of 64 slots. The maximum number of slots before 67 blocks is 67 plus the number of adversarial slots before there are 67 honest slots. If the slot assignments were random, and of course they can be biased as above, this is a sample from a negative binomial distribution with parameters $r=67$, $p=2/3$. As calculated at the end of https://github.com/w3f/consensus/blob/master/random/randao_analysis.ipynb , there could be 74 adversarial slots before 67 honest ones, except with probability $1/(172.8 \times 3738.7)$. Thus the number of slots taken for 64 to 67 blocks could be in the range $64$ to $67+74$, giving $78$ choices. Using linearity of expectation, we can naively bound the expected number of choices of Randao as $172.8 \times 78$, with $\lceil \log_2 (172.8 \times 78) \rceil = 14$. But I suspect that we can do better.
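The tail bound and the sample counts above can be checked with a few lines of Python. This is a sketch of the arithmetic only: 172.8 and 3738.7 are the values quoted from the notebook, and 78 is the number of slot-number choices derived above.

```python
import math

def binom_cdf(k, n, p):
    """Exact P[Binomial(n, p) <= k], computed with math.comb."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# "More than 74 adversarial slots before the 67th honest one" is the same
# event as "at most 66 honest slots among the first 67 + 74 = 141 slots",
# for honest slot probability p = 2/3.
tail_prob = binom_cdf(66, 141, 2 / 3)   # on the order of 1e-6

# Extra samples for Randao biasability alone:
extra_bias = math.ceil(math.log2(172.8))          # 8
# Naive combined bound, also covering the 78 slot-number choices:
extra_total = math.ceil(math.log2(172.8 * 78))    # 14
```

The negative binomial tail is rewritten as a binomial CDF so it can be summed exactly without any statistics library.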
#### Tighter bounds to reduce signature checks

[This analysis](https://hackmd.io/c6STzrvfQGyN2P2rVmTmoA) shows that using $k$ random validators' signatures gives a probability of at most $2^{-k}$ of an adversary convincing the verifier of a false claim. However, if $k$ unique signatures are chosen (i.e., there are no duplicates), this bound can be improved. Snowbridge already implements this optimisation, as the subsample routine that randomly selects signatures also ensures that the chosen signatures are all unique. Assuming $n$ is the number of signatures sent by the relayer ($n \geq \frac{2}{3}|\text{ValidatorSet}|+1$), $f$ of them are malicious ($f \leq \frac{1}{3}|\text{ValidatorSet}|$), and $m'$ signatures are chosen uniformly at random without replacement, the adversary's winning probability is:

$$
\frac{f!}{(f-m')!} \cdot \frac{(n-m')!}{n!}
$$

Using the analysis above, for the case of Polkadot, assuming $n=201$ and $f=100$, choosing $m=30$ signatures randomly with replacement gives the same security guarantees as choosing $m'=27$ signatures randomly without replacement (this optimisation already exists in the implementation).

### B) Aggregatable Signatures and SNARK

The main difference of this implementation alternative is that BLS signatures are used, which make aggregating signatures possible. Note that the verification of these signatures is different too. Another difference is that the prover only needs to send a single aggregated signature and accompanying proofs once to the verifier, which means we no longer need to do random sampling and no further interaction is needed. The prover obtains a single aggregate signature and a commitment to all the public keys needed for verifying this signature. Normally, the verifier would need to compute an aggregate public key for verification from the signing public keys; we avoid this by having the prover provide a commitment and a SNARK proving that the aggregate public key is correct. Thus the verifier does not need the full list of public keys, just the commitment and a bitvector of who signed.
It uses the commitment, the bitvector and the aggregate public key as inputs to the SNARK. If the SNARK verifies, it knows that the aggregate public key is correct, which it then uses to verify the aggregate signature. Note that this alternative is not probabilistic. Indeed it is **accountable**, i.e. a full node, which has the real list of validator public keys, can determine which validators have misbehaved and can report this behaviour to the chain. The bitvector indicates who has signed, and the same aggregate public key can be computed directly from the public keys.

#### Cryptographic details

We use BLS signatures, probably on the BLS12-377 curve. The SNARK can be done most efficiently on the BW6-761 curve. This has subsecond proving time. We plan for the primitives necessary to verify this to be added as host calls to Substrate and be usable on a parachain: https://github.com/paritytech/substrate/pull/13031 . This will only be efficient in on-chain light clients of chains that support primitives for these curves. As an alternative, we could prove the aggregation and BLS verification inside a SNARK that uses a completely different curve, like BLS12-381 or BN254. This requires using "non-native arithmetic" inside the SNARK, resulting in proving times slower by a factor of several hundred, i.e. a few minutes of proving time, but it would be usable on many chains like Ethereum. These different curves require different commitments to the list of validators' public keys, so BEEFY will have to support multiple commitments.

## Comparison of the two light client alternatives

The subsampling protocol requires fewer cryptographic primitives. It is also ready to be used in Snowfork's implementation. The SNARK-based protocol does not rely on on-chain randomness and provides accountability. It also has no delay, because it has fewer communication rounds, and it is more efficient in terms of data, because we do not need to provide several Merkle proofs and signatures.
It is therefore superior when it can be used: when we have the SNARK code ready and when the verifier can easily support the cryptography required. Ultimately, we should aim to use the SNARK-based light client whenever possible because it's simpler and more secure. However, in the short term, using the random sampling light client makes sense.

## Slashing for BEEFY misbehaviour

It is necessary to slash validators who vote on a chain that is not finalised. It is not directly possible to prove this, because we don't know on-chain what is finalised. It should be slashable to vote in BEEFY for a block that is not in the current chain. This includes blocks with a smaller or equal block number to the head of the current chain but which are not in the chain, and blocks with a higher block number. As long as GRANDPA is safe, no validator will be slashed in a finalised chain if they vote only for finalised blocks. Except for the case of higher block numbers, the slashing report would need to show that the signed message, which was for a block of a certain height, wasn't what the message should be for the block at this height in the current chain. The message, oddly called a commitment in the code, is a tuple of (payload, block number, ValidatorSetID). We will ignore the ValidatorSetID here: if it were wrong, at worst the wrong validators would be signing the correct payload. So we want to slash for signing a message containing the wrong payload for a previous block number. The payload is an MMR root. We should store recent MMR roots on chain. If the MMR root is for a block number older than those stored on-chain, it should be possible for the slash reporter to generate an MMR prefix proof, which shows that an MMR with a particular root and block number was a prefix of an MMR with a later root and block number. The latter has to correspond to a recent MMR root stored on-chain.
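To make the prefix structure behind such proofs concrete, here is a toy Merkle Mountain Range in Python. It is a simplified sketch under assumed conventions (plain SHA-256, no leaf count mixed into the root), and the prefix relation is checked naively by recomputing both roots from the full leaf list, whereas a real prefix proof would only reveal logarithmically many inner nodes:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def mmr_peaks(leaves):
    """Peaks of a Merkle Mountain Range: one perfect binary tree per
    set bit in the leaf count, built by binary-carry merging."""
    stack = []  # list of (node_hash, tree_height)
    for leaf in leaves:
        node, height = h(leaf), 0
        while stack and stack[-1][1] == height:
            prev, _ = stack.pop()
            node = h(prev, node)  # merge two equal-height trees
            height += 1
        stack.append((node, height))
    return [n for n, _ in stack]

def mmr_root(leaves):
    """'Bag' the peaks right-to-left into a single root."""
    peaks = mmr_peaks(leaves)
    root = peaks[-1]
    for peak in reversed(peaks[:-1]):
        root = h(peak, root)
    return root

# Each finalised block contributes one leaf; the MMR over the first 5
# blocks is a prefix of the MMR over the first 8.
leaves = [b"block-%d" % i for i in range(8)]
old_root = mmr_root(leaves[:5])
new_root = mmr_root(leaves)
```

An on-chain light client would store recent roots like `new_root`; a slash reporter would then show that a signed `old_root` commits to a prefix of the same leaf sequence, or fails to.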
Sampling-based BEEFY light clients require that there is considerable slashing for the offense of signing the wrong chain (there are no other types of slashable offenses in BEEFY). This is unlike GRANDPA, where a single equivocation offense is not slashed much unless there are many simultaneous offenses, since a single equivocation in GRANDPA is less risky. Also note that in GRANDPA, it is easy to accidentally equivocate by running a second node with the same validator identity. However in BEEFY, as long as GRANDPA is safe, validators can only be slashed for voting for blocks they do not see as finalised by GRANDPA, which honest nodes will never do. So we can have significant slashing for one offense. If BEEFY only supported SNARK-based light clients, it could have small slashing for a first offense and increased slashing for more simultaneous offenses, like GRANDPA. This is because such light clients are accountable, i.e. if a light client is misled and the proof is made public, then we should be able to slash 1/3 of the validators. So a few misbehaving validators are not so much of a threat.