Try   HackMD

Eth2.0 Light Clients

How Light is Light?

note:
light clients are an integral part of eth2
we'll see why they're important and how they will work

this will be an overview of eth2 light clients with enough background to understand why things are the way they are


Introduction


About Me

Cayman Nava - Eth2.0 developer @ ChainSafe Systems

Twitter: @caymannan
Github: wemeetagain

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

About Lodestar

Typescript Eth2.0 Ecosystem

Beacon Chain
Light Client
Developer Tooling

https://github.com/ChainSafe/lodestar

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

About ChainSafe

Toronto-based blockchain protocol development

Twitter: @chainsafeth
Github: ChainSafe

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →


Overview

  • Motivation
    • What and why?
  • Background
    • PoW vs PoS Light Clients
    • Merkle Proofs, Multiproofs, and SSZ
  • Eth2 Light Client
    • Sync Protocol
    • Data/Proof Requests
    • Open questions

note:

this will be an overview of eth2 light clients with enough background to understand why things are the way they are

we'll start with motivation for light clients,
what they are, why we need them in eth2

Then, if we need to, we'll cover a little background,
we'll cover what merkle trees and merkle proofs are
(and how they're useful for light clients)
And the difference between PoW and PoS light clients
how to think about pos light clients

Then we'll dive into eth2 light clients
the meat of things
starting with the sync protocol,
followed by some specifics with data requests
and end with some open questions


Motivation


What is a Light Client?

Software looking to securely consume blockchain data with requirements that scale logarithmically to total blockchain state.

note:
eg:
squaring the number of transactions
should only double a light client’s cost.
(eg. going from 1,000 tx/day to 1,000,000 tx/day)



Why Light Clients in Eth2?

Light clients are first class citizens


Resource-constrained Environments

  • mobile phones
  • embedded systems
  • websites
  • etc.

note:
raise your hand if you have a smartphone
keep it raised if you haved a synced ethereum blockchain on your phone?


Decentralization

  • dreaded Infura single point of failure (
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    )

note:
for all you who raised your hand a second ago, how many have metamask installed?


Blockchains as Light Clients

eth2 can peer with other blockchains
(eth1, cosmos, polkadot, etc.)

note:
if the requirements are light enough, we can
that will be really important if any other blockchains want to verify anything from eth2, eth1 included


More Shards, More Data

  • 1024 shards, too much state
  • validators are light clients-ish of other shards

note:
light clients are baked into the design of eth2,
in the sense that most regular folks, even validators, won't have all of the eth2 state.
validators will need to sync recent shard state as part of their duties
they'll be using some of the techniques we describe


Background

note:
start with background, if we don't know the background, we're going to be lost with the actual light client protocol

i'm going to cover a few seemingly disparate topics, they all connect


Merkle Proofs

Verify the authenticity of a chunk of data logarithmic to the number of chunks

note:
nin either case, we make extensive use of merkle proofs
poll audience: who needs a refresher on merkle proofs
the proof is succinct, it grows logarithmically to the total number of chunks








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h(h(a,b),h(b,c))

h(h(a,b),h(b,c))



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))->h(h(a,b),h(b,c))




h(h(e,f),h(g,h))

h(h(e,f),h(g,h))



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))->h(h(e,f),h(g,h))




h(a,b)

h(a,b)



h(h(a,b),h(b,c))->h(a,b)




h(c,d)

h(c,d)



h(h(a,b),h(b,c))->h(c,d)




h(e,f)

h(e,f)



h(h(e,f),h(g,h))->h(e,f)




h(g,h)

h(g,h)



h(h(e,f),h(g,h))->h(g,h)




a

a



h(a,b)->a




b

b



h(a,b)->b




c

c



h(c,d)->c




d

d



h(c,d)->d




e

e



h(e,f)->e




f

f



h(e,f)->f




g

g



h(g,h)->g




h

h



h(g,h)->h




note:
when we want some particular data, we assume its part of a merkle tree








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h(h(a,b),h(b,c))

h(h(a,b),h(b,c))



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))->h(h(a,b),h(b,c))




h(h(e,f),h(g,h))

h(h(e,f),h(g,h))



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))->h(h(e,f),h(g,h))




h(a,b)

h(a,b)



h(h(a,b),h(b,c))->h(a,b)




h(c,d)

h(c,d)



h(h(a,b),h(b,c))->h(c,d)




h(e,f)

h(e,f)



h(h(e,f),h(g,h))->h(e,f)




h(g,h)

h(g,h)



h(h(e,f),h(g,h))->h(g,h)




a

a



h(a,b)->a




b

b



h(a,b)->b




c

c



h(c,d)->c




d

d



h(c,d)->d




e

e



h(e,f)->e




f

f



h(e,f)->f




g

g



h(g,h)->g




h

h



h(g,h)->h




note:
very important, its a merkle tree that have the root of, and we "trust" it, the scheme only works if we have the root

merkle roots are often stored in a blockchain








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

?



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

?



h4->h9




h10

?



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
when verifying a merkle proof, the root is the only thing known and trusted
thats usually why you're requesting data in the first place








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

?



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

?



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
when we're given a chunk of data, which we don't necessarily trust,
we have to be able to link it back up the tree to the root, which we do trust.
we need to be given the intermediate nodes required to recreate the root
these intermediate nodes are the proof








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

?



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
we need one intermediate node per level in the tree, starting from the bottom








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
with each intermediate node in the proof, we're able to create the immediate parent








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14











mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

h()



h0->h1




h2

?



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14











mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

?



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14











mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

h()



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
at that point, you can compare your trusted root against this newly computed root, and iff the roots match








mygraph



h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))

h(h(h(a,b),h(b,c)),h(h(e,f),h(g,h))



h0

h()



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
then you've verified that the data is correct
you've "verified the proof"
one thing to reiterate is that you only need these blue pieces, one per level, that number grows logarithmically with the total number of leaves


Merkle Multiproofs

note:
merkle multiproofs are merkle proofs








mygraph



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

?



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

?



h3->h7




h8

?



h3->h8




h9

?



h4->h9




h10

?



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
multiproofs are proofs for multiple leaves in the tree








mygraph



h0

?



h1

?



h0->h1




h2

?



h0->h2




h3

?



h1->h3




h4

?



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

a



h3->h7




h8

?



h3->h8




h9

c



h4->h9




h10

?



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
we construct the multiproof in a similar way to how we construct the individual proofs.
Identify the elements needed to recreate the roots from each leaf
BUT the idea is that we can share the elements needed for each leaf








mygraph



h0

h()



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

?



h2->h5




h6

?



h2->h6




h7

a



h3->h7




h8

b



h3->h8




h9

c



h4->h9




h10

d



h4->h10




h11

?



h5->h11




h12

?



h5->h12




h13

?



h6->h13




h14

?



h6->h14




note:
so a proof for just A here would need 3 nodes in the tree
and a proof for just C would need 3 nodes in the tree
but instead of needing 6 nodes for the multiproof, we only need 3


PoW vs. PoS Light Clients

note:
briefly look at some of the differences, how the eth2 light client will differ from an eth1 light client


PoW Light Clients

Easy because headers can be verified with only protocol rules

note:
the headers have everything we need


Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
  1. download headers
  2. verify (block by block)
  3. request merkle proofs

note:
we hash the header
verify the proof of work
verify the next header's previous hash
once we get to the head, we have the relevant merkle roots, and we can request data and merkle proofs


PoS Light Clients

Headers alone aren't sufficient to verify proof of stake
We need to track stake

note:
In the PoS world we're governed by some sort of (super)majority stake
We must ensure we're on the chain with the most stake
which means we need to track balances and votes
votes are cryptographic signatures

this is a different beast than PoW light clients, theres an opportunity to do things a little bit differently


SSZ (Simple Serialize)

note:
this is an eth2 spec designed around consistent and easy merkleization
merkleization


SSZ Example: Checkpoint

class Checkpoint(Container):
    epoch: Epoch
    root: Hash






mygraph



h0

h()



h1

epoch



h0->h1




h2

root



h0->h2





class Crosslink(Container):
    shard: Shard
    parent_root: Hash
    start_epoch: Epoch
    end_epoch: Epoch
    data_root: Hash






mygraph



h0

h()



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

h()



h2->h5




h6

h()



h2->h6




h7

shard



h3->h7




h8

parent_root



h3->h8




h9

start_epoch



h4->h9




h10

end_epoch



h4->h10




h11

data_root



h5->h11




h12

0x0



h5->h12




h13

0x0



h6->h13




h14

0x0



h6->h14





SSZ Example: AttestationData

class AttestationData(Container):
    # LMD GHOST vote
    beacon_block_root: Hash
    # FFG vote
    source: Checkpoint
    target: Checkpoint
    # Crosslink vote
    crosslink: Crosslink






mygraph



hash

beacon_block_root



a0

source



a1

epoch



a0->a1




a2

root



a0->a2




b0

target



b1

epoch



b0->b1




b2

root



b0->b2




h0

crosslink



h1

h()



h0->h1




h2

h()



h0->h2




h3

h()



h1->h3




h4

h()



h1->h4




h5

h()



h2->h5




h6

h()



h2->h6




h7

shard



h3->h7




h8

parent_root



h3->h8




h9

start_epoch



h4->h9




h10

end_epoch



h4->h10




h11

data_root



h5->h11




h12

0x0



h5->h12




h13

0x0



h6->h13




h14

0x0



h6->h14




j0

h()



j1

h()



j0->j1




j2

h()



j0->j2




j1->hash




j1->a0




j2->b0




j2->h0




note:
when you would create a merkle root of this "attestationdata", you would be including the root of the underlying crosslink
and the crosslink includes the "data_root" which is the merkle root of some shard data
eth2 datastructures include merkle roots in many places because its really useful and necessary to be able to create proofs
beacon state -> beacon blocks
beacon blocks -> beacon state
shard blocks -> beacon blocks


Eth2 Light Client


Eth2 Sync Protocol

note:
explain by asking questions and figuring out what makes sense


Motivating Questions

How do we get updated trusted merkle roots?
Can we do this succinctly?

note:
pow strategy not sufficient
we need to get up-to-date "trusted" merkle roots
but in Pos, that means we need stake
in the name of light clients, lets make this as lightweight as possible


What is trusted?

Roots attested by 2/3 of stake.
Staked votes gives weight to the chain.

note:
PoS requirement


Key insight 1:

Instead of syncing headers by hashing one by one
Use staked vote balance, skip ahead to a current header*

note:
we don't need to sync headers one by one
verifying each one by checking the parent hash

we're in a PoS world, where we're governed by a 2/3 majority
we can use votes as verification of recent headers
instead of checking hashes for pow validity
we have to track validator stake/votes


Key insight 2:

Instead of tracking all validator balances + votes
Track a subset of validators (committee)

note:
track a committee


Where do eth2 validators validate?


  • fast, changing every epoch (~6 min)
  • attest to recent shard data (crosslink)
  • attest to beacon block root (but only as total validator set)
  • attest to recent checkpoints (FFG data)

note:
this doesn't really work for light clients because we still need the whole validator set to authenticate recent block checkpoints


Shard Committees

  • slow, period committies change every shard period (~27 hrs)
  • attest to shard block root

note:
shard block header contains a beacon block root
better candidate, changes slowly, only need to update every ~27 hours


Sync Protocol

  • Sync by shard block root
    • with items needed to verify weight*
  • Shard root gives us beacon block header (via merkle proof)
  • Every ~27 hrs, update period committee

note:
Track the period committees assigned to that shard
shard committees change fully every ~27 hours
who voted for that shard block
how much of the total stake voted for the block

we can jump 27 hrs at a time
lets look briefly at the datastructures involved in syncing


class LightClientMemory(object):
    # Randomly initialized and retained forever
    shard: Shard
    
    # Beacon header which is not expected to revert
    header: BeaconBlockHeader
    
    # period committees corresponding to the beacon header
    previous_committee: CompactCommittee
    current_committee: CompactCommittee
    next_committee: CompactCommittee

note:
This is data we retain for syncing
this is the minimum amount of data to store

the shard tells us which shard we're tracking
(this is random and we shouldn't actually care which one since we're just using the shard to get to the beacon block header)

the header is the key trusted piece of data we use to verify merkle proofs against (just like in PoW light clients)
from a beacon block, we can use merkle proofs to verify data about shards, beacon state, everything

the committees are stored to keep track of pubkeys/balances of those who are voting on recent shard block roots
a shard committee is a blend of two underlying period committees, which change every ~27 hrs


class LightClientUpdate(container):
    # Shard block root (and authenticating signature data)
    shard_block_root: Hash
    fork_version: Version
    aggregation_bits: Bitlist
    signature: BLSSignature
    
    # Updated beacon header (and authenticating branch)
    header: BeaconBlockHeader
    header_branch: MerkleProof
    
    # Updated period committee (and authenticating branch)
    committee: CompactCommittee
    committee_branch: MerkleProof

note:
This is the data we request to stay synced
We need one of these every ~27 hours

The top section is the shard block root and authenticating information
lets start from the botttom of the section
the signature is an aggregated signature that contains the signatures of all attesters in the committee who voted for the shard block root
the aggregation bits tell us who in the committee signed
the fork version lets us make sure the votes are for the fork we think we're on
and the shard block root is the updated block root

The next section is the new beacon block header, our new key to the castle. If all goes well, we'll update our light client memory with this header.
The header branch is a merkle proof that we run against the shard block root.

and the committee is the new period committee. If all goes well, we'll update our light client memory with this new committee.
The committee branch is a merkle proof that we run against the header


def update_memory(
    memory: LightClientMemory,
    update: LightClientUpdate
    ):
    # Verify the update does not skip a period
    # Verify shard attestations
    # - vote is for the shard root
    # Verify shard committee votes pass 2/3 threshold
    # - vote has sufficient weight
    # Verify update header against shard block root and header branch
    # - header is valid
    # Update period committees if entering a new period
    # - verify committee against header
    # Update header

note:
can't skip a period - we need to track all committee changes so we keep track of all stake for that shard

ensure that the vote is for the shard root we just got in the update
and that it has sufficient weight

at this point, we 'trust' the shard root

now we can use the shard root and proof to prove the validity of the included beacon block header

that way, we now trust the beacon block header

and once the header is trusted, we can use it and a proof to prove the validity of the committee update


Update data size


  • shard_block_root: 32 bytes
  • fork_version: 4 bytes
  • aggregation_bits: 16 bytes
  • signature: 96 bytes
  • header: 8 + 32 + 32 + 32 + 96 = 200 bytes
  • header_branch: 4 * 32 = 128 bytes
  • committee: 128 * (48 + 8) = 7,168 bytes
  • committee_branch: (5 + 10) * 32 = 480 bytes

Total: 8,124 bytes per ~27 hours


Total: 8,124 bytes per ~27 hours or
~0.083 bytes per second

vs.

Bitcoin SPV: 80 bytes per ~560 second
~0.143 bytes per second

note:
Our light client sync requires approx 0.083 bytes per second
For reference, bitcoin's light client protocol, requires
approx 0.143 bytes per second

So we're doing pretty good


Data/Proof Requests

Once we're synced, now what?

A: Gimme proofs


Who has what data

All valididators have recent beacon chain state

1/1024 validators have recent shard state

Relayers/State providers have EE state

note:
this effects how we will request data


Proof sizes: Token EE Balance Example

  1. light client update(w/o new committee)? 476 bytes
  2. finalized block root? 224 bytes
  3. state root? 128 bytes
  4. crosslink data? 672 bytes
  5. ee state root? ~736 bytes
  6. token balance? ~1024 bytes

note:

lets think about what a light client would need to get their updated balance on some shard on some ee
lets look at the path we would need, and roughly what the size is

total ~3.2kb


Open questions

  • Request structure
    • How do clients request which data they need?
  • Networking
    • How do clients pick servers to connect to?
    • How does a client quickly connect to a new shard?
  • Incentivization
    • How do clients pay servers?
  • Should shard nodes be light client servers by default?

End

github.com/chainsafe/lodestar


References