--- title: 'Eth2.0 Light Clients: How light is light?' description: > Eth2 light client design and rationale slideOptions: allottedMinutes: 20 --- <!-- .slide: data-background="#283032" --> ## Eth2.0 Light Clients ### How Light is Light? note: light clients are an integral part of eth2 we'll see why they're important and how they will work this will be an overview of eth2 light clients with enough background to understand why things are the way they are --- ## Introduction --- ## About Me Cayman Nava - Eth2.0 developer @ ChainSafe Systems Twitter: [@caymannan](https://twitter.com/caymannan) Github: [wemeetagain](https://github.com/wemeetagain) <section> <img width="200" src="https://pbs.twimg.com/profile_images/1111758438990835713/aE4FTs5F_400x400.png" /> <img width="200" src="https://avatars1.githubusercontent.com/u/1348242?s=460&v=4" /> </section> --- ## About Lodestar ##### Typescript Eth2.0 Ecosystem Beacon Chain Light Client Developer Tooling https://github.com/ChainSafe/lodestar [![Discord](https://img.shields.io/discord/593655374469660673.svg?label=Discord&logo=discord)](https://discord.gg/aMxzVcr) <section> <img width="100" src="https://raw.githubusercontent.com/remojansen/logo.ts/master/ts.png" /> <img width="100" src="https://pbs.twimg.com/profile_images/1098120092280336384/4EUmYuFd_400x400.png" /> </section> --- ## About ChainSafe Toronto-based blockchain protocol development Twitter: [@chainsafeth](https://twitter.com/chainsafeth) Github: [ChainSafe](https://github.com/ChainSafe) ![](https://avatars2.githubusercontent.com/u/27474093?s=200&v=4) --- ## Overview - Motivation - What and why? - Background - PoW vs PoS Light Clients - Merkle Proofs, Multiproofs, and SSZ - Eth2 Light Client - Sync Protocol - Data/Proof Requests - Open questions note: this will be an overview of eth2 light clients with enough background to understand why things are the way they are we'll start with motivation for light clients, what they are, why we need them in eth2 Then, if we need to, we'll cover a little background, we'll cover what merkle trees and merkle proofs are (and how they're useful for light clients) And the difference between PoW and PoS light clients how to think about pos light clients Then we'll dive into eth2 light clients the meat of things starting with the sync protocol, followed by some specifics with data requests and end with some open questions --- ## Motivation --- ## What is a Light Client? > Software looking to **securely consume blockchain data** with requirements that scale **logarithmically** to total blockchain state. note: eg: squaring the number of transactions should only double a light client’s cost. (eg. going from 1,000 tx/day to 1,000,000 tx/day) ---- <iframe src="https://www.desmos.com/calculator/wtbuqpiia9?embed" width="500px" height="500px" style="border: 1px solid #ccc" frameborder=0></iframe> --- ## Why Light Clients in Eth2? Light clients are first class citizens ---- ### Resource-constrained Environments - mobile phones - embedded systems - websites - etc. note: raise your hand if you have a smartphone keep it raised if you haved a synced ethereum blockchain on your phone? ---- ### Decentralization - dreaded Infura single point of failure (:heart:) note: for all you who raised your hand a second ago, how many have metamask installed? ---- ### Blockchains as Light Clients eth2 can peer with other blockchains (**eth1**, cosmos, polkadot, etc.) note: if the requirements are light enough, we can that will be really important if any other blockchains want to verify anything from eth2, eth1 included ---- ### More Shards, More Data - 1024 shards, too much state - validators are light clients-ish of other shards note: light clients are baked into the design of eth2, in the sense that most regular folks, even validators, won't have all of the eth2 state. validators will need to sync recent shard state as part of their duties they'll be using some of the techniques we describe --- ## Background note: start with background, if we don't know the background, we're going to be lost with the actual light client protocol i'm going to cover a few seemingly disparate topics, they all connect --- ### Merkle Proofs Verify the authenticity of a chunk of data **logarithmic** to the number of chunks note: nin either case, we make extensive use of merkle proofs poll audience: who needs a refresher on merkle proofs the proof is succinct, it grows logarithmically to the total number of chunks ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))"->"h(h(a,b),h(b,c))" "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))"->"h(h(e,f),h(g,h))" "h(h(a,b),h(b,c))" "h(h(e,f),h(g,h))" "h(h(a,b),h(b,c))"->"h(a,b)" "h(h(a,b),h(b,c))"->"h(c,d)" "h(h(e,f),h(g,h))"->"h(e,f)" "h(h(e,f),h(g,h))"->"h(g,h)" "h(a,b)" "h(c,d)" "h(e,f)" "h(g,h)" "h(a,b)"->a "h(a,b)"->b "h(c,d)"->c "h(c,d)"->d "h(e,f)"->e "h(e,f)"->f "h(g,h)"->g "h(g,h)"->h c [style=filled,fillcolor=pink] {rank=same;a b c d e f g h} } ``` note: when we want some particular data, we assume its part of a merkle tree ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))"->"h(h(a,b),h(b,c))" "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))"->"h(h(e,f),h(g,h))" "h(h(a,b),h(b,c))" "h(h(e,f),h(g,h))" "h(h(a,b),h(b,c))"->"h(a,b)" "h(h(a,b),h(b,c))"->"h(c,d)" "h(h(e,f),h(g,h))"->"h(e,f)" "h(h(e,f),h(g,h))"->"h(g,h)" "h(a,b)" "h(c,d)" "h(e,f)" "h(g,h)" "h(a,b)"->a "h(a,b)"->b "h(c,d)"->c "h(c,d)"->d "h(e,f)"->e "h(e,f)"->f "h(g,h)"->g "h(g,h)"->h c [style=filled,fillcolor=pink] {rank=same;a b c d e f g h} } ``` note: very important, its a merkle tree that have the root of, and we "trust" it, the scheme only works if we have the root merkle roots are often stored in a blockchain ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="?"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="?",style=filled,fillcolor=pink] h10 [label="?"] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" "h0"->"h2" "h1"->"h3" "h1"->"h4" "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: when verifying a merkle proof, the root is the only thing known and trusted thats usually why you're requesting data in the first place ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="?"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="?"] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: when we're given a chunk of data, which we don't necessarily trust, we have to be able to link it back up the tree to the root, which we do trust. we need to be given the intermediate nodes required to recreate the root these intermediate nodes are the proof ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="?"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: we need one intermediate node per level in the tree, starting from the bottom ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="h()",style=filled,fillcolor=pink] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: with each intermediate node in the proof, we're able to create the immediate parent ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="?"] h2 [label="?"] h3 [label="h()",style=filled,fillcolor=lightblue] h4 [label="h()",style=filled,fillcolor=pink] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" [style=bold,color=blue] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="h()",style=filled,fillcolor=pink] h2 [label="?"] h3 [label="h()",style=filled,fillcolor=lightblue] h4 [label="h()",style=filled,fillcolor=pink] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" [style=bold,color=blue] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="h()",style=filled,fillcolor=pink] h2 [label="h()",style=filled,fillcolor=lightblue] h3 [label="h()",style=filled,fillcolor=lightblue] h4 [label="h()",style=filled,fillcolor=pink] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" [style=bold,color=blue] "h1"->"h3" [style=bold,color=blue] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="h()"] [style=filled, fillcolor=pink] h1 [label="h()",style=filled,fillcolor=pink] h2 [label="h()",style=filled,fillcolor=lightblue] h3 [label="h()",style=filled,fillcolor=lightblue] h4 [label="h()",style=filled,fillcolor=pink] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" [style=bold,color=blue] "h1"->"h3" [style=bold,color=blue] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: at that point, you can compare your trusted root against this newly computed root, and iff the roots match ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] "h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))" [style=filled,fillcolor="#90ee90"] h0 [label="h()"] [style=filled, fillcolor="#90ee90"] h1 [label="h()",style=filled,fillcolor="#90ee90"] h2 [label="h()",style=filled,fillcolor=lightblue] h3 [label="h()",style=filled,fillcolor=lightblue] h4 [label="h()",style=filled,fillcolor="#90ee90"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?"] h8 [label="?"] h9 [label="c",style=filled,fillcolor="pink"] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" [style=bold,color=blue] "h1"->"h3" [style=bold,color=blue] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: then you've verified that the data is correct you've "verified the proof" one thing to reiterate is that you only need these blue pieces, one per level, that number grows logarithmically with the total number of leaves --- ## Merkle Multiproofs note: merkle multiproofs are merkle proofs ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] h0 [label="?"] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="?"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="?",style=filled,fillcolor=pink] h8 [label="?"] h9 [label="?",style=filled,fillcolor=pink] h10 [label="?"] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" "h0"->"h2" "h1"->"h3" "h1"->"h4" "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: multiproofs are proofs for multiple leaves in the tree ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] h0 [label="?"] [style=filled, fillcolor=lightyellow] h1 [label="?"] h2 [label="?"] h3 [label="?"] h4 [label="?"] h5 [label="?"] h6 [label="?"] h7 [label="?"] h7 [label="a",style=filled,fillcolor=pink] h8 [label="?"] h9 [label="c",style=filled,fillcolor=pink] h10 [label="?"] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" "h1"->"h3" [style=bold,color=red] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" [style=bold,color=red] "h3"->"h8" "h4"->"h9" [style=bold,color=red] "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: we construct the multiproof in a similar way to how we construct the individual proofs. Identify the elements needed to recreate the roots from each leaf BUT the idea is that we can share the elements needed for each leaf ---- ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] h0 [label="h()"] [style=filled, fillcolor=lightyellow] h1 [label="h()",style=filled,fillcolor=lightyellow] h2 [label="h()",style=filled,fillcolor=lightblue] h3 [label="h()",style=filled,fillcolor=lightyellow] h4 [label="h()",style=filled,fillcolor=lightyellow] h5 [label="?"] h6 [label="?"] h7 [label="a",style=filled,fillcolor=pink] h8 [label="b",style=filled,fillcolor=lightblue] h9 [label="c",style=filled,fillcolor=pink] h10 [label="d",style=filled,fillcolor=lightblue] h11 [label="?"] h12 [label="?"] h13 [label="?"] h14 [label="?"] "h0"->"h1" [style=bold,color=red] "h0"->"h2" [style=bold,color=blue] "h1"->"h3" [style=bold,color=red] "h1"->"h4" [style=bold,color=red] "h2"->"h5" "h2"->"h6" "h3"->"h7" [style=bold,color=red] "h3"->"h8" [style=bold,color=blue] "h4"->"h9" [style=bold,color=red] "h4"->"h10" [style=bold,color=blue] "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: so a proof for just A here would need 3 nodes in the tree and a proof for just C would need 3 nodes in the tree but instead of needing 6 nodes for the multiproof, we only need 3 --- ## PoW vs. PoS Light Clients note: briefly look at some of the differences, how the eth2 light client will differ from an eth1 light client --- ## PoW Light Clients Easy because headers can be verified with only protocol rules note: the headers have everything we need ---- <div style="background: white"> <img src="https://blog.ethereum.org/wp-content/uploads/2015/01/pow_header.png" /> </div> 1. download headers 2. verify (block by block) 3. request merkle proofs note: we hash the header verify the proof of work verify the next header's previous hash once we get to the head, we have the relevant merkle roots, and we can request data and merkle proofs --- ## PoS Light Clients Headers alone aren't sufficient to verify proof of stake We need to track stake note: In the PoS world we're governed by some sort of (super)majority stake We must ensure we're on the chain with the most stake which means we need to track balances and votes votes are cryptographic signatures this is a different beast than PoW light clients, theres an opportunity to do things a little bit differently --- ## SSZ (Simple Serialize) note: this is an eth2 spec designed around consistent and easy merkleization merkleization ---- #### SSZ Example: Checkpoint ```python class Checkpoint(Container): epoch: Epoch root: Hash ``` ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] h0 [label="h()"] h1 [label="epoch",style=filled,fillcolor=lightblue] h2 [label="root",style=filled,fillcolor=lightblue] "h0"->"h1" "h0"->"h2" } ``` ---- #### SSZ Example: Crosslink ```python class Crosslink(Container): shard: Shard parent_root: Hash start_epoch: Epoch end_epoch: Epoch data_root: Hash ``` ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] h0 [label="h()"] h1 [label="h()"] h2 [label="h()"] h3 [label="h()"] h4 [label="h()"] h5 [label="h()"] h6 [label="h()"] h7 [label="shard",style=filled,fillcolor=lightblue] h8 [label="parent_root",style=filled,fillcolor=lightblue] h9 [label="start_epoch",style=filled,fillcolor=lightblue] h10 [label="end_epoch",style=filled,fillcolor=lightblue] h11 [label="data_root",style=filled,fillcolor=lightblue] h12 [label="0x0"] h13 [label="0x0"] h14 [label="0x0"] "h0"->"h1" "h0"->"h2" "h1"->"h3" "h1"->"h4" "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` ---- #### SSZ Example: AttestationData ```python class AttestationData(Container): # LMD GHOST vote beacon_block_root: Hash # FFG vote source: Checkpoint target: Checkpoint # Crosslink vote crosslink: Crosslink ``` ```graphviz digraph mygraph { nodesep=0.5 node [color=black,fontname=Courier,shape=box] edge [color=black, style=dashed, dir=none] hash [label="beacon_block_root",style=filled,fillcolor=lightblue] a0 [label="source",style=filled,fillcolor=lightblue] a1 [label="epoch"] a2 [label="root"] b0 [label="target",style=filled,fillcolor=lightblue] b1 [label="epoch"] b2 [label="root"] h0 [label="crosslink",style=filled,fillcolor=lightblue] h1 [label="h()"] h2 [label="h()"] h3 [label="h()"] h4 [label="h()"] h5 [label="h()"] h6 [label="h()"] h7 [label="shard"] h8 [label="parent_root"] h9 [label="start_epoch"] h10 [label="end_epoch"] h11 [label="data_root"] h12 [label="0x0"] h13 [label="0x0"] h14 [label="0x0"] j0 [label="h()"] j1 [label="h()"] j2 [label="h()"] j0->j1 j0->j2 j1->hash j1->a0 j2->b0 j2->h0 "h0"->"h1" "h0"->"h2" "a0"->"a1" "a0"->"a2" "b0"->"b1" "b0"->"b2" "h1"->"h3" "h1"->"h4" "h2"->"h5" "h2"->"h6" "h3"->"h7" "h3"->"h8" "h4"->"h9" "h4"->"h10" "h5"->"h11" "h5"->"h12" "h6"->"h13" "h6"->"h14" } ``` note: when you would create a merkle root of this "attestationdata", you would be including the root of the underlying crosslink and the crosslink includes the "data_root" which is the merkle root of some shard data eth2 datastructures include merkle roots in many places because its really useful and necessary to be able to create proofs beacon state -> beacon blocks beacon blocks -> beacon state shard blocks -> beacon blocks --- ## Eth2 Light Client --- ## Eth2 Sync Protocol note: explain by asking questions and figuring out what makes sense ---- #### Motivating Questions How do we get updated _trusted_ merkle roots? Can we do this _succinctly_? note: pow strategy not sufficient we need to get up-to-date "trusted" merkle roots but in Pos, that means we need stake in the name of light clients, lets make this as lightweight as possible ---- #### What is trusted? Roots attested by 2/3 of stake. Staked votes gives weight to the chain. note: PoS requirement ---- #### Key insight 1: Instead of syncing headers by hashing one by one Use staked vote balance, skip ahead to a current header* note: we don't need to sync headers one by one verifying each one by checking the parent hash we're in a PoS world, where we're governed by a 2/3 majority we can use votes as verification of recent headers instead of checking hashes for pow validity we have to track validator stake/votes ---- #### Key insight 2: Instead of tracking all validator balances + votes Track a subset of validators (committee) note: track a committee ---- ## Where do eth2 validators validate? ---- ### Crosslink Committees - fast, changing every epoch (~6 min) - attest to recent shard data (crosslink) - attest to beacon block root (but only as total validator set) - attest to recent checkpoints (FFG data) note: this doesn't really work for light clients because we still need the whole validator set to authenticate recent block checkpoints ---- ### Shard Committees - slow, period committies change every shard period (~27 hrs) - attest to shard block root note: shard block header contains a beacon block root better candidate, changes slowly, only need to update every ~27 hours --- ### Sync Protocol - Sync by shard block root - with items needed to verify weight* - Shard root gives us beacon block header (via merkle proof) - Every ~27 hrs, update period committee note: Track the period committees assigned to that shard shard committees change fully every ~27 hours who voted for that shard block how much of the total stake voted for the block we can jump 27 hrs at a time lets look briefly at the datastructures involved in syncing ---- ```python class LightClientMemory(object): # Randomly initialized and retained forever shard: Shard # Beacon header which is not expected to revert header: BeaconBlockHeader # period committees corresponding to the beacon header previous_committee: CompactCommittee current_committee: CompactCommittee next_committee: CompactCommittee ``` note: This is data we retain for syncing this is the minimum amount of data to store the shard tells us which shard we're tracking (this is random and we shouldn't actually care which one since we're just using the shard to get to the beacon block header) the header is the key trusted piece of data we use to verify merkle proofs against (just like in PoW light clients) from a beacon block, we can use merkle proofs to verify data about shards, beacon state, everything the committees are stored to keep track of pubkeys/balances of those who are voting on recent shard block roots a shard committee is a blend of two underlying period committees, which change every ~27 hrs ---- ```python class LightClientUpdate(container): # Shard block root (and authenticating signature data) shard_block_root: Hash fork_version: Version aggregation_bits: Bitlist signature: BLSSignature # Updated beacon header (and authenticating branch) header: BeaconBlockHeader header_branch: MerkleProof # Updated period committee (and authenticating branch) committee: CompactCommittee committee_branch: MerkleProof ``` note: This is the data we request to stay synced We need one of these every ~27 hours The top section is the shard block root and authenticating information lets start from the botttom of the section the signature is an aggregated signature that contains the signatures of all attesters in the committee who voted for the shard block root the aggregation bits tell us who in the committee signed the fork version lets us make sure the votes are for the fork we think we're on and the shard block root is the updated block root The next section is the new beacon block header, our new key to the castle. If all goes well, we'll update our light client memory with this header. The header branch is a merkle proof that we run against the shard block root. and the committee is the new period committee. If all goes well, we'll update our light client memory with this new committee. The committee branch is a merkle proof that we run against the header ---- ```python def update_memory( memory: LightClientMemory, update: LightClientUpdate ): # Verify the update does not skip a period # Verify shard attestations # - vote is for the shard root # Verify shard committee votes pass 2/3 threshold # - vote has sufficient weight # Verify update header against shard block root and header branch # - header is valid # Update period committees if entering a new period # - verify committee against header # Update header ``` note: can't skip a period - we need to track all committee changes so we keep track of all stake for that shard ensure that the vote is for the shard root we just got in the update and that it has sufficient weight at this point, we 'trust' the shard root now we can use the shard root and proof to prove the validity of the included beacon block header that way, we now trust the beacon block header and once the header is trusted, we can use it and a proof to prove the validity of the committee update ---- ### Update data size ---- - shard_block_root: 32 bytes - fork_version: 4 bytes - aggregation_bits: 16 bytes - signature: 96 bytes - header: 8 + 32 + 32 + 32 + 96 = 200 bytes - header_branch: 4 * 32 = 128 bytes - committee: 128 * (48 + 8) = 7,168 bytes - committee_branch: (5 + 10) * 32 = 480 bytes Total: **8,124 bytes** per **~27 hours** ---- Total: 8,124 bytes per ~27 hours or **~0.083 bytes per second** vs. Bitcoin SPV: 80 bytes per ~560 second **~0.143 bytes per second** note: Our light client sync requires approx 0.083 bytes per second For reference, bitcoin's light client protocol, requires approx 0.143 bytes per second So we're doing pretty good --- ## Data/Proof Requests Once we're synced, now what? A: Gimme proofs ---- ### Who has what data All valididators have recent beacon chain state 1/1024 validators have recent shard state Relayers/State providers have EE state note: this effects how we will request data ---- ### Proof sizes: Token EE Balance Example 1. light client update(w/o new committee)? 476 bytes 2. finalized block root? 224 bytes 3. state root? 128 bytes 4. crosslink data? 672 bytes 5. ee state root? ~736 bytes 6. token balance? ~1024 bytes note: lets think about what a light client would need to get their updated balance on some shard on some ee lets look at the path we would need, and roughly what the size is total ~3.2kb --- ## Open questions * Request structure * How do clients request which data they need? * Networking * How do clients pick servers to connect to? * How does a client quickly connect to a new shard? * Incentivization * How do clients pay servers? * Should shard nodes be light client servers by default? --- <!-- .slide: data-background="#283032" --> ## End github.com/chainsafe/lodestar ---- References - https://blog.ethereum.org/2015/01/10/light-clients-proof-stake/ - https://arxiv.org/pdf/1710.09437.pdf - https://blog.ethereum.org/2014/11/25/proof-stake-learned-love-weak-subjectivity/ - https://github.com/ethereum/eth2.0-specs/blob/dev/specs/light_client/sync_protocol.md - https://medium.com/@jgm.orinoco/understanding-sparse-merkle-multiproofs-9b9f049e8f08 - @ralexstokes and @dannyryan