Zenotta’s RAFT BFT

# BFT compliant RAFT consensus

## Table of Contents:

[TOC]

## 1. Abstract & Motivation

This document explores concepts that could extend the fast, crash-tolerant RAFT consensus with BFT capabilities. RAFT, the underlying consensus algorithm in Zenotta's Network Protocol, assumes that the rings of Compute and Storage nodes run in secure environments, so that there is no malicious behavior in the system. Vanilla RAFT focuses on performance, often through optimistic execution that performs very well when there are no faults and the network is well-behaved. However, it cannot tolerate nodes that exhibit arbitrary behavior, including malicious behavior. This makes it unsuitable for the network once the RAFT rings are opened up to be run by the public.

## 2. Terminology

- **Quorum/Threshold** - the minimum number of participants or nodes that must agree on a decision before it can be considered valid and executed.
- **DKG** - Distributed Key Generation. A cryptographic protocol that allows a group of participants to jointly generate a public key and a corresponding private key, without revealing the private key to any individual participant. This is in contrast to traditional key generation, where a single entity generates the private key and then shares the public key with others.
- **Threshold cryptography** - a cryptographic technique that distributes shares of the secret key of a cryptosystem among multiple participants or nodes.
- **Threshold signatures** - a cryptographic scheme that allows a group of participants to collectively generate a valid signature on a message once a threshold number of them cooperate.
- **Threshold encryption** - a cryptographic scheme in which decryption requires the cooperation of a specified number of nodes.

## 3. Prior Work

### 3.1 Threshold Cryptography

Most of the concepts discussed in this document require a special family of cryptographic schemes called '**Threshold Cryptography**' to be in place in the network.
Threshold cryptography is a cryptographic technique that **distributes shares of a secret key** of a cryptosystem among multiple participants or nodes. This approach enhances the security and resilience of the cryptosystem by ensuring that **no single participant can compromise the secret key**. In the context of consensus protocols, threshold cryptography plays a crucial role in achieving Byzantine Fault Tolerance (BFT).

In a threshold cryptography-based consensus protocol, the secret key is divided into multiple shares, each held by a different participant. To perform a cryptographic operation, such as signing a message or decrypting data, a minimum number of participants, known as the **threshold**, must collaborate. This ensures that even if a subset of participants fails or behaves maliciously, the overall security of the system remains intact, provided the malicious subset is smaller than the threshold.

To implement threshold cryptography schemes in distributed environments without relying on a trusted dealer, a special type of key generation called **Distributed Key Generation** is required to distribute the secret key fragments, or shares, among several parties.

### 3.2 Distributed Key Generation

DKG allows for a distribution of power (here, in the form of secret key shares) among a group of independent nodes that do not trust each other. Only when a quorum has been reached can that power, i.e. the key, be used. No party outside the group learns the key (and, as noted in the terminology, no individual participant learns the full private key either), and the protocol requires no intervention from trusted third parties.

In essence, **a group of n parties** P₁, …, Pₙ interact with each other, through secure and authenticated communication channels, to **generate a key pair (sk, pk)**: a secret key and a public key, respectively. They create **n secret key shares** sk₁, …, skₙ, so that party Pᵢ owns the share skᵢ at the end of the protocol, for each i = 1, …, n. Strictly speaking, only the non-Byzantine parties are guaranteed to end up with a secret key share.

## 4. Potential concepts

### 4.1 Node Reputation:

Node reputation refers to the trustworthiness and reliability of a node within the network or a ring. It is an assessment of a node's behavior, performance, and adherence to the blockchain's rules. Node reputation plays a crucial role in maintaining the security, efficiency, and overall stability of the network or ring.

Node reputation serves several important purposes:

- **Enhancing Security**: By identifying and penalizing malicious or misbehaving nodes, node reputation helps to maintain a secure and trustworthy network environment.
- **Reputation-based Routing**: Node reputation can be used to guide network routing decisions, favoring nodes with higher reputations for bootstrapping. This can improve network performance and reduce the risk of congestion or attacks.
- **Incentivizing Good Behavior**: By rewarding nodes with good reputations, reputation systems can incentivize fair and honest participation within the network. This can promote a more collaborative and self-regulating ecosystem.
- **Identifying and Isolating Bad Actors**: Node reputation can help to identify and isolate nodes that consistently misbehave or violate network rules. This can prevent them from causing further harm to the system.

### 4.2 Continuous Auditing and Verification:

Establish mechanisms for continuous auditing and verification of the system state and transaction history. This could involve using cryptographic logs, tamper-proof audit trails, and independent verification procedures. Nodes within the same ring can probabilistically initiate audits of each other and verify that the audited (proving) node is able to maintain data integrity. Auditing within the ring could cover:

- **On-chain auditing**: On-chain auditing involves analyzing the data stored on the blockchain itself. This can be done using a variety of tools and techniques, such as transaction history analysis, smart contract auditing, and consensus protocol auditing.
- **Off-chain auditing**: Off-chain auditing involves collecting data from external sources, such as node logs, network traffic, and user activity. This data can then be analyzed to identify patterns and anomalies that may indicate fraud, security vulnerabilities, or operational problems.

### 4.3 Leader Validation:

Leader validation is a technique that aims to address the BFT limitations of RAFT by introducing a mechanism to verify the integrity of the leader's proposals before they are replicated to followers. This validation process helps to ensure that the leader is not maliciously modifying log entries or proposing invalid states.

The leader can hash each proposed log entry and publish the hash value to a DHT (Distributed Hash Table). Followers can then retrieve the hash value from the DHT and verify it against the hash of the proposed entry they received. Rolling hashes can additionally be used to ensure that the leader does not maliciously append log entries to the consensus system.

## 5. Resources:

- [David Wong's SSS explainer](https://www.cryptologie.net/article/486/difference-between-shamir-secret-sharing-sss-vs-multisig-vs-aggregated-signatures-bls-vs-distributed-key-generation-dkg-vs-threshold-signatures/)
- [BLS Deep Dive](https://medium.com/skale/bls-deep-dive-793a4e8a6f4e)
- [Dashpay's BLS-DKG RFC](https://github.com/dashpay/dips/blob/master/dip-0006/bls_m-of-n_threshold_scheme_and_dkg.mda)
- [Maidsafe's take on Node reputation](https://github.com/maidsafe/rfcs/blob/master/text/0045-node-ageing/0045-node-ageing.md)
- [The Honey Badger of BFT Protocols](https://eprint.iacr.org/2016/199.pdf)
- [Asynchronous Distributed Private-Key Generators for Identity-Based Cryptography](https://eprint.iacr.org/2009/355)

# Discussion points

What needs to be done with multi-node?

----

- Testing scenarios for Compute:
    - Botching leader election
    - Term starvation
    - Add malicious transactions to the block
- Testing scenarios for Storage (same as Compute):
    - Try forging/manipulating blocks as leader
    - DoS the leader?
- Chaos testing: Drop consensus messages randomly

Countermeasures for Sybil attack?

- Node identity

----
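
For the chaos-testing item in the list above ("Drop consensus messages randomly"), one possible starting point is a small fault-injection wrapper placed between nodes in a test harness. The sketch below is illustrative only and is not taken from Zenotta's codebase: `LossyChannel`, `run_demo`, and the placeholder message strings are hypothetical names, and the real harness would wrap whatever transport the Compute and Storage rings actually use. A fixed seed is assumed so that a failing run can be replayed.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LossyChannel:
    """Hypothetical test wrapper that randomly drops messages before delivery."""

    deliver: Callable[[str], None]  # downstream handler, e.g. a node's inbox
    drop_probability: float         # chance in [0, 1] that a message is dropped
    seed: int = 0                   # fixed seed so a failing run can be replayed
    dropped: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.drop_probability <= 1.0:
            raise ValueError("drop_probability must be within [0, 1]")
        # Private generator: does not disturb the global random state.
        self._rng = random.Random(self.seed)

    def send(self, message: str) -> bool:
        """Forward the message or drop it. Returns True if it was delivered."""
        if self._rng.random() < self.drop_probability:
            self.dropped.append(message)  # keep a record for later assertions
            return False
        self.deliver(message)
        return True


def run_demo(drop_probability: float, count: int = 100, seed: int = 42) -> List[str]:
    """Send `count` placeholder consensus messages; return the delivered ones."""
    inbox: List[str] = []
    channel = LossyChannel(inbox.append, drop_probability, seed)
    for i in range(count):
        # Placeholder payload; a real test would send actual consensus messages.
        channel.send(f"AppendEntries(term=1, index={i})")
    return inbox
```

Running the same scenario at several drop probabilities, and recording the seed alongside any failure, would make it possible to check whether the ring still elects a leader and commits entries under message loss, and to reproduce the exact loss pattern when it does not.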