Eth2 Phase0 client implementation - from scratch
Contribution must read
Approach: start tackling each topic one by one. Others can assign themselves to certain sub-topics and help write this.
Contribution rules:
LICENSE: by contributing here, you agree that your output is licensed as CC0 1.0 Universal. Like the eth2.0 specs itself. If you mix in work that is licensed differently, you either have to communicate it and make it very clear, or your contribution may be removed.
FORMAT:
Maintainers / Contributors
Add yourself here if you contributed in some way
Maintainers:
Contributors:
Table of Contents
Client architecture
An eth2.0 client generally consists of two main components:
Phase1 extends this with similar responsibilities for shards.
Phase2 is experimental, but then extends this with nodes capable of processing shards beyond basic data-storage functionality.
Beacon node responsibilities
Validator node responsibilities
Encoding and merkleization stack: SSZ
Simple Serialize (SSZ) is Eth2's type system. All Eth2 datastructures use SSZ-defined types that follow the SSZ standard for efficient serialization and consistent merkleization.
SSZ defines basic types
as well as composite types
Special packed encoding is given to vectors and lists of booleans, aliased as bitvector and bitlist.
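As an illustrative sketch (function names are ours, not any client's API), serialization of the fixed-size basic types is just little-endian encoding at a fixed byte width:

```python
# Minimal sketch of SSZ basic-type serialization; real clients use a full SSZ library.

def serialize_uint(value: int, byte_length: int) -> bytes:
    """SSZ uintN: fixed-size little-endian encoding."""
    return value.to_bytes(byte_length, "little")

def serialize_bool(value: bool) -> bytes:
    """SSZ boolean: a single byte, 0x00 or 0x01."""
    return b"\x01" if value else b"\x00"

# Example: a uint64 slot number
encoded_slot = serialize_uint(12345, 8)  # b"\x39\x30\x00\x00\x00\x00\x00\x00"
```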
hash-tree-root, with binary trees
One primary operation on SSZ datastructures is hashTreeRoot, the retrieval of the "root hash" (root) of the merkleized data. A root is used extensively throughout Eth2 as a compact stand-in for the data it summarizes.
All SSZ-defined data can be unambiguously represented as a binary merkle tree, using sha256 as the hash function. This unambiguous representation gives us a tool to gain consensus about potentially large datasets by simply agreeing on a root.
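A minimal sketch of that merkleization, assuming the number of 32-byte chunks is already a power of two (real SSZ merkleization additionally pads chunks and mixes the length into the root for lists):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(chunks: list) -> bytes:
    """Merkleize a power-of-two list of 32-byte chunks into a single 32-byte root."""
    nodes = list(chunks)
    while len(nodes) > 1:
        # Hash each pair of siblings into their parent node.
        nodes = [sha256(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

chunks = [i.to_bytes(32, "little") for i in range(4)]
root = merkle_root(chunks)  # peers agreeing on this root agree on all chunks
```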
(See Data-sharing and caching as first-class citizen on the advantages / tradeoffs of persisting the merkle tree as a representation of SSZ-defined data)
One key insight is that composite SSZ types may be viewed as merkle trees composed of merkle trees. Many Eth2 datastructures have elements that are roots of historical versions of structures, eg: links to previous blocks or states. A root of data is a stand-in for the full datastructure, as a root is simply the top node in a merkle tree representation. Large Eth2 SSZ types are generally heterogeneous structures which can be recursively expanded into deeper and deeper trees that traverse further into history and deeper into large, summarized datasets.
Light client protocols make extensive use of this ability to expand roots into trees.
(See Block-header and Block: Hash-tree-root transparency for more)
(de)serialization, with offsets
SSZ limit type data for encoding and consensus safety
Block-header and Block: Hash-tree-root transparency
Data-sharing and caching as first-class citizen
Layered approach, typing abstracting away backings
Tree-backing encodings
Light client functionality, SSZ Partials
Network stack
Discovery/bootstrapping (@leobago - General)
When a node tries to join the p2p network, it first tries to contact a bootnode, which is a node maintained (by the Ethereum Foundation, for example) with the specific purpose of serving as a discovery endpoint. For instance, Geth has a list of bootnodes hardcoded into its implementation. After reaching out to the bootnode, the node asks it for a list of nodes close to itself (through the FindNode procedure), and then iteratively asks those nodes for other nodes closer to it.
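The iterative lookup described above can be sketched as follows; `find_node(peer, target)` stands in for the real RPC and is a hypothetical stub here, with peer ids as byte strings and distances measured with Kademlia's XOR metric:

```python
def xor_distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def iterative_find_node(self_id, bootnodes, find_node, max_rounds=20):
    """Repeatedly ask the closest known peers for peers even closer to self_id."""
    known = set(bootnodes)
    for _ in range(max_rounds):
        # Query the peers currently closest to us for even closer ones.
        closest = sorted(known, key=lambda p: xor_distance(p, self_id))[:16]
        discovered = set()
        for peer in closest:
            discovered |= set(find_node(peer, self_id))
        if discovered <= known:
            break  # converged: no new peers learned this round
        known |= discovered
    return sorted(known, key=lambda p: xor_distance(p, self_id))
```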
Methods
Kademlia DHT (@leobago - General)
Kademlia is a peer-to-peer distributed hash table (DHT) intended to offer the ability to easily discover nodes in a network. Kademlia uses an XOR-based metric topology that simplifies some of the features, such as reducing the number of configuration messages in the system. The simple communication protocol involves remote procedure calls (RPC) such as Ping, Store, FindNode and FindValue. For Eth2, only the Ping and FindNode calls are essential, although this might change in the future. Kademlia uses SHA256(node id) as the Kademlia id, a 256-bit value. As a consequence of the XOR metric used in Kademlia, the routing table is a set of lists called k-buckets, in which each bucket holds a maximum of k endpoints.
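The k-bucket a peer belongs to follows directly from the XOR metric; a minimal sketch (illustrative names, equal-length ids assumed):

```python
K = 16  # bucket capacity "k" from the Kademlia paper; value illustrative

def bucket_index(node_id: bytes, peer_id: bytes) -> int:
    """Index of the k-bucket a peer falls into: the position of the
    highest differing bit of the XOR distance between the two ids."""
    distance = int.from_bytes(node_id, "big") ^ int.from_bytes(peer_id, "big")
    if distance == 0:
        raise ValueError("a node does not bucket itself")
    return distance.bit_length() - 1

# Peers sharing a longer id prefix with ours land in lower-index buckets,
# so the routing table keeps fine-grained knowledge of its own neighborhood.
```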
Discovery v5 (@leobago - General)
Discovery 5 is the evolution of the mechanism by which peers discover each other in the p2p network, previously called Discovery 4. The Discovery 5 protocol is inspired by the Kademlia DHT, with a few differences (e.g., signed and versioned node records are used instead of DHT stores). Discovery 5 has several improvements over its predecessor: for instance, it allows nodes to advertise specific topics, and it also allows them to store arbitrary node metadata.
Records (@leobago - General)
Records are useful in a P2P network to maintain information about the nodes in the mesh. This information can be used to identify nodes as well as to rate their participation in the network. Ethereum 2 implements its own node records, which are presented in the following section.
ENR to enhance peer info (@leobago - General)
Ethereum Node Records (ENR) are signed and versioned pieces of information about a node. This information usually relates to the node's network endpoints, such as its IP addresses and ports, but it can also provide a way to classify different types of nodes in the network. The information is stored in the record as a simple list of key-value pairs. The list of pairs is then signed cryptographically and placed in the signature component of the record. Every time the information changes (e.g., the IP address is updated), the content is signed again, the signature is updated, and the sequence number seq (a 64-bit unsigned integer) is incremented.
Peer connections (@leobago - General)
Upon first startup, clients MUST generate an RSA key pair in order to identify themselves on the network. The SHA-256 multihash of the public key is the client's Peer ID, which is used to look up the client in libp2p's peer book and allows a client's identity to remain constant across network changes.
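The multihash layout is simple enough to sketch: a one-byte hash-function code, a one-byte digest length, then the digest (in practice the result is then base58-encoded to produce the textual Peer ID; this sketch stops at the raw bytes):

```python
import hashlib

def peer_id_multihash(public_key: bytes) -> bytes:
    """Raw SHA-256 multihash: code 0x12 (sha2-256), length 0x20, then the
    32-byte digest. Sketch only; real Peer IDs are base58-encoded on top."""
    digest = hashlib.sha256(public_key).digest()
    return bytes([0x12, len(digest)]) + digest
```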
Introduce multi-addresses (@leobago - General)
Multi-addresses (multiaddr) are a new way to make addresses more future-proof by solving several of the limitations that current addresses have. The following are some of the features that multi-addresses have:
multiaddrs support addresses for several network protocols, increasing interoperability.
multiaddrs conform to a simple syntax that is self-describing.
multiaddrs are both human-readable and machine-readable.
multiaddrs can be easily wrapped and unwrapped in several encapsulation layers.
Multi-addresses Example (@leobago - General)
Eth2 clients are identified by multiaddr. For example, the human-readable multiaddr for a client located at example.com, available via TCP on port 8888, and with peer ID ehgyukGllbeWhyukyverG35T45GEg3G3GwfQWfewewefQU would look like this:
/dns4/example.com/tcp/8888/p2p/ehgyukGllbeWhyukyverG35T45GEg3G3GwfQWfewewefQU
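Since the format alternates protocol names and values, the example above can be decomposed with a few lines of Python (a simplified sketch that assumes every protocol component carries exactly one value, which holds for this address):

```python
def parse_multiaddr(addr: str):
    """Split a multiaddr into (protocol, value) pairs.
    Simplification: assumes each protocol component has exactly one value."""
    parts = addr.strip("/").split("/")
    return list(zip(parts[0::2], parts[1::2]))

pairs = parse_multiaddr(
    "/dns4/example.com/tcp/8888/p2p/ehgyukGllbeWhyukyverG35T45GEg3G3GwfQWfewewefQU")
# pairs[0] == ("dns4", "example.com"); pairs[1] == ("tcp", "8888")
```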
Peer scoring
Peer limits and prunes (@leobago - General)
In a P2P network nodes can have an arbitrary number of connections; however, there is a sweet spot: too few connections could lead to messages being lost, while too many generate unnecessary traffic. Moreover, not all peers perform equally, and some of them could deliver messages unreliably or even corrupted. To avoid a node getting stuck with bad peers, frequent recalibration is necessary. For this purpose nodes can Prune a mesh link and set a backoff timer to avoid re-grafting a link with the pruned peer.
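The prune-and-backoff bookkeeping can be sketched as follows (class and constant names are ours; the backoff duration is illustrative):

```python
import time

PRUNE_BACKOFF = 60.0  # seconds; illustrative value

class MeshManager:
    """Toy mesh-link manager: prune a peer and refuse re-grafting during backoff."""

    def __init__(self):
        self.mesh = set()
        self.backoff = {}  # peer -> timestamp until which grafting is blocked

    def prune(self, peer):
        self.mesh.discard(peer)
        self.backoff[peer] = time.monotonic() + PRUNE_BACKOFF

    def graft(self, peer) -> bool:
        if time.monotonic() < self.backoff.get(peer, 0.0):
            return False  # still backing off after a recent prune
        self.mesh.add(peer)
        return True
```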
Multi-plexing connections
Transports (@leobago - General)
There are several transport protocols that can be used to relay information on a P2P network. In this section we describe the different ones that have been considered to be used for Eth2.
TCP (@leobago - General)
Transmission Control Protocol (TCP) is the standard way to communicate between applications running on hosts communicating via the internet, in an ordered, reliable and error-free fashion. To guarantee this, TCP implements packet retransmission and error detection, as well as a three-way handshake to establish active open connections. As usual, reliability features come with a performance overhead in terms of both latency and bandwidth.
QUIC (@leobago - General)
QUIC is a transport layer network protocol defined by Google in 2012 and implemented in the Chrome browser to speed up connections, relative to TCP, for various services such as Maps or YouTube. To achieve better performance, QUIC establishes multiplexed connections between two endpoints over the User Datagram Protocol (UDP) instead of relying on TCP. In addition to its high performance, the protocol secures communications with encryption via Transport Layer Security (TLS).
Websockets and WebRTC. Circuit relay mechanisms, browser nodes, etc.
Security
SecIO
Noise
TLS 1.3
RPC
Multiselect libp2p protocols
Request/response
Protocols as message types
Encoding negotiation through multiselect
Response codes
Chunkification
Length prefixes, compressed payloads
Goodbye messages
PubSub, through libp2p GossipSub (@leobago - General)
P2P networks usually implement Publish/Subscribe systems to distribute messages in an asynchronous fashion. Subscribers declare their interest in a specific topic and publishers send messages to one of the existing topics. In this way, senders and receivers are not in direct communication, but rather interact through the pub/sub system. There are two types of p2p networks: structured and unstructured. The first type has Super Nodes which are assigned more responsibilities (e.g., relaying events, supporting routing, etc.) than normal Nodes. The second type (i.e., unstructured) does not have Super Nodes, implying that all nodes can be ephemeral (e.g., mobile devices), which makes it harder to guarantee reliable message delivery.
GossipSub vs floodsub (@leobago - General)
Due to the constraints of unstructured P2P overlays, different protocols have been proposed to relay information over the network. The simplest one is Floodsub, which consists of forwarding every message to all subscribed peers. This obviously creates a huge amount of unnecessary traffic over the network, as peers receive the same message through different sources. To avoid such a waste of bandwidth, GossipSub proposes that peers forward metadata of the messages they have "seen", instead of the entire content of the message. In addition, GossipSub implements lazy push, in which peers that are interested in a message for which they have received the metadata request that message explicitly. Given the difference in size between message data and metadata, this technique saves a non-negligible part of the network bandwidth, improving scalability.
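The IHAVE/IWANT exchange behind lazy push can be sketched with a toy peer (method and class names are ours; real GossipSub adds meshes, heartbeats and peer scoring on top of this):

```python
class GossipPeer:
    """Toy lazy-push peer: gossips message IDs (IHAVE) and serves or
    requests full payloads on demand (IWANT)."""

    def __init__(self):
        self.messages = {}   # msg_id -> full payload we hold
        self.requested = set()

    def on_ihave(self, msg_ids):
        """A peer advertised these IDs; return the ones we still want the body for."""
        want = [m for m in msg_ids
                if m not in self.messages and m not in self.requested]
        self.requested.update(want)
        return want

    def on_iwant(self, msg_ids):
        """Serve the full payloads we actually hold."""
        return {m: self.messages[m] for m in msg_ids if m in self.messages}
```

A receiver answers an IHAVE with an IWANT covering only the unseen IDs, so each full payload crosses a link at most once instead of once per gossip.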
Content-based message IDs
Attestation aggregation subnets
Techniques
(Naive) Local aggregation
On-the-fly aggregation
About the privacy problem
Handel
Advanced gossip based aggregation
Committee shuffling and subnet backbones
Global aggregates collection topic
Beacon blocks topic
Misc. consensus operations topics
State management
Storage
Hash-tree-root cache metadata
Finalized storage
Pruning
Flattening
Hot storage
Data-sharing
Batched persists
Handling re-orgs
Handling long gaps
In-memory hot state data
Data-sharing, caching
Lazy-load cold state
Block imports
Queuing, resolve ancestors first
Processing. From checkpoint or hot state.
Slashing detection
Listening for attestations
Surround and double vote detection
Efficient storage: cover full weak-subjectivity period
Efficient matching: find slashings quick
Fork choice
On Eth2, as in any blockchain, there is a function to decide which chain is canonical when multiple options are proposed. Given that different validators in the network can produce and/or receive different blocks at different times, it is common to observe competing chains existing for short periods of time, until one of them is finally selected by the fork choice. In Bitcoin, for instance, the algorithm selects the longest chain, which is the security strategy of a PoW blockchain. On Eth2 it is not the longest chain that is selected, but rather the chain with the most validator votes, following the PoS security strategy.
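A minimal sketch of that vote-driven head selection (LMD GHOST): starting from the root, greedily descend into the child whose subtree carries the most latest attestations. Names are illustrative, every vote has weight 1 here, and real clients weight votes by the validator's effective balance:

```python
def supports(vote_block, block, blocks):
    """True if `block` is `vote_block` itself or one of its ancestors."""
    b = vote_block
    while b is not None:
        if b == block:
            return True
        b = blocks.get(b)  # walk up toward the root
    return False

def lmd_ghost(blocks, root, latest_votes):
    """blocks: block -> parent (root maps to None);
    latest_votes: validator -> block of its latest attestation."""
    children = {}
    for block, parent in blocks.items():
        children.setdefault(parent, []).append(block)

    def weight(block):
        # A vote counts for a block and all of that block's ancestors.
        return sum(1 for v in latest_votes.values() if supports(v, block, blocks))

    head = root
    while children.get(head):
        # Greedily descend into the heaviest-voted subtree.
        head = max(children[head], key=weight)
    return head
```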
LMD GHOST
Fork versions
Attestation processing
Balance-weighting
Justification/finalization
Attack protection
Eth1
Eth1 data voting, voting periods
Eth1 deposit contract processing
BLS Signatures (@leobago - General)
Validators have several roles in Eth2; one of them is to attest that the work of other validators has been done correctly, and if not, those bad actors should be slashed. This involves verifying signatures from thousands of validators at every single step, which could be extremely time-consuming with conventional signature verification mechanisms. Boneh-Lynn-Shacham (BLS) signatures have an interesting property that allows for signature aggregation, speeding up the process of signature verification dramatically and allowing for much larger committee sizes.
Pubkey store
Lazy serialize/deserialize
IETF standard
Fast aggregate-verify
Validator client
Validators play an essential role in Eth2: they are the new miners of the network, in charge of maintaining the security of the system and verifying that all nodes follow the rules. In contrast with Eth1, and any PoW chain, validators do not need to compute millions of hashes per second in order to participate in the block creation process. The randomness comes from a completely different source, which makes the whole procedure much less energy-hungry and hence more sustainable. Validators have multiple roles, and they follow a strict life cycle that has been carefully tuned for security.
Proposing
RANDAO participation
Eth1 deposits
Attestation inclusion
Aggregate value optimization
Slashing inclusion
Exits, and future withdrawals, transfers, etc.
Attesting
Signing
Slashing protections
Selected as aggregator
Subnet switch after new shuffling
Key management
BLS key standard
Validator life cycle
Deposits
New validator
Top-up existing validator
Activation eligibility
Activation queue
Active
Exiting
Exit queue
Withdrawal
Sync
Status messages, sync peer selection
Initial sync
Blocks-By-Range
Catch-up sync
Blocks-By-Root
Sync responses
Consensus
Beacon-chain transition
Optimizations
Proposers pre-computation
Committee shuffling pre-computation
Active-indices and committee count pre-computation
Attester-status pre-computation
Alternatives with Memoization