note:
light clients are an integral part of eth2
we'll see why they're important and how they will work
this will be an overview of eth2 light clients with enough background to understand why things are the way they are
Cayman Nava - Eth2.0 developer @ ChainSafe Systems
Twitter: @caymannan
Github: wemeetagain
Beacon Chain
Light Client
Developer Tooling
https://github.com/ChainSafe/lodestar
Learn More →
Toronto-based blockchain protocol development
Twitter: @chainsafeth
Github: ChainSafe
note:
we'll start with motivation for light clients,
what they are, why we need them in eth2
Then, if we need to, we'll cover a little background,
we'll cover what merkle trees and merkle proofs are
(and how they're useful for light clients)
And the difference between PoW and PoS light clients
how to think about pos light clients
Then we'll dive into eth2 light clients
the meat of things
starting with the sync protocol,
followed by some specifics with data requests
and end with some open questions
Software looking to securely consume blockchain data with requirements that scale logarithmically to total blockchain state.
note:
eg:
squaring the number of transactions
should only double a light client’s cost.
(eg. going from 1,000 tx/day to 1,000,000 tx/day)
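A quick way to see the logarithmic claim, as a minimal sketch that assumes a light client's cost is exactly proportional to log2 of the transaction count (a simplification for illustration; `relative_cost` is a made-up helper):

```python
import math

def relative_cost(tx_per_day: int) -> float:
    # Assumption: cost is proportional to log2 of the load.
    return math.log2(tx_per_day)

before = relative_cost(1_000)      # about 10 units
after = relative_cost(1_000_000)   # about 20 units
ratio = after / before             # squaring the load doubles the cost
print(before, after, ratio)
```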
Light clients are first class citizens
note:
raise your hand if you have a smartphone
keep it raised if you have a synced ethereum blockchain on your phone
note:
for all you who raised your hand a second ago, how many have metamask installed?
eth2 can peer with other blockchains
(eth1, cosmos, polkadot, etc.)
note:
if the requirements are light enough, other blockchains can run an eth2 light client
that will be really important if any other blockchains want to verify anything from eth2, eth1 included
note:
light clients are baked into the design of eth2,
in the sense that most regular folks, even validators, won't have all of the eth2 state.
validators will need to sync recent shard state as part of their duties
they'll be using some of the techniques we describe
note:
start with background, if we don't know the background, we're going to be lost with the actual light client protocol
i'm going to cover a few seemingly disparate topics, they all connect
Verify the authenticity of a chunk of data with a proof whose size is logarithmic in the number of chunks
note:
in either case, we make extensive use of merkle proofs
poll audience: who needs a refresher on merkle proofs
the proof is succinct, it grows logarithmically with the total number of chunks
note:
when we want some particular data, we assume it's part of a merkle tree
note:
very important: it's a merkle tree that we have the root of, and we "trust" that root. the scheme only works if we have the root
merkle roots are often stored in a blockchain
note:
when verifying a merkle proof, the root is the only thing known and trusted
that's usually why you're requesting data in the first place
note:
when we're given a chunk of data, which we don't necessarily trust,
we have to be able to link it back up the tree to the root, which we do trust.
we need to be given the intermediate nodes required to recreate the root
these intermediate nodes are the proof
note:
we need one intermediate node per level in the tree, starting from the bottom
note:
with each intermediate node in the proof, we're able to create the immediate parent
note:
at that point, you can compare your trusted root against this newly computed root, and iff the roots match
note:
then you've verified that the data is correct
you've "verified the proof"
one thing to reiterate is that you only need these blue pieces, one per level, that number grows logarithmically with the total number of leaves
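The walk from leaf to root described in these notes can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the hash and a tree with a power-of-two number of leaves; `hash_pair`, `build_tree`, `make_proof` and `verify_proof` are illustrative names, not functions from the eth2 spec.

```python
import hashlib

def hash_pair(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def build_tree(leaves):
    # Returns every layer of the tree, leaves first, root layer last.
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([hash_pair(prev[i], prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return layers

def make_proof(layers, index):
    # One sibling ("intermediate node") per level, starting from the bottom.
    proof = []
    for layer in layers[:-1]:
        proof.append(layer[index ^ 1])
        index //= 2
    return proof

def verify_proof(leaf, index, proof, root):
    # Rebuild each parent from the bottom up, then compare with the trusted root.
    node = leaf
    for sibling in proof:
        if index % 2 == 0:
            node = hash_pair(node, sibling)
        else:
            node = hash_pair(sibling, node)
        index //= 2
    return node == root

leaves = [hashlib.sha256(bytes([i])).digest() for i in range(8)]
layers = build_tree(leaves)
root = layers[-1][0]          # the only trusted value
proof = make_proof(layers, 5)
print(len(proof), verify_proof(leaves[5], 5, proof, root))
```

With 8 leaves the proof holds 3 nodes, and doubling the number of leaves adds only one more.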
note:
merkle multiproofs are merkle proofs
note:
multiproofs are proofs for multiple leaves in the tree
note:
we construct the multiproof in a similar way to how we construct the individual proofs.
Identify the elements needed to recreate the roots from each leaf
BUT the idea is that we can share the elements needed for each leaf
note:
so a proof for just A here would need 3 nodes in the tree
and a proof for just C would need 3 nodes in the tree
but instead of needing 6 nodes for the multiproof, we only need 3
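The node counts can be checked with a small sketch. It assumes A and C are the first and third leaves of an eight-leaf tree (the layout of the figure is an assumption here) and numbers nodes with generalized indices: the root is 1 and the children of node i are 2i and 2i+1. The function names are illustrative.

```python
def branch_indices(gindex: int):
    # Siblings along the path from a leaf up to the root.
    out = []
    while gindex > 1:
        out.append(gindex ^ 1)
        gindex //= 2
    return out

def path_indices(gindex: int):
    # The leaf itself and its ancestors, excluding the root.
    out = []
    while gindex > 1:
        out.append(gindex)
        gindex //= 2
    return out

def helper_indices(gindices):
    # Nodes the prover must supply: every sibling that the verifier
    # cannot compute from the leaves being proven.
    helpers, paths = set(), set()
    for g in gindices:
        helpers.update(branch_indices(g))
        paths.update(path_indices(g))
    return sorted(helpers - paths, reverse=True)

A, C = 8, 10  # first and third leaf at depth 3
print(branch_indices(A))       # proof for A alone
print(branch_indices(C))       # proof for C alone
print(helper_indices([A, C]))  # shared multiproof
```

The two single proofs list six nodes between them, but two of those lie on the other leaf's path and one is shared, so the multiproof needs only three.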
note:
briefly look at some of the differences, how the eth2 light client will differ from an eth1 light client
Easy because headers can be verified with only protocol rules
note:
the headers have everything we need
note:
we hash the header
verify the proof of work
verify the next header's previous hash
once we get to the head, we have the relevant merkle roots, and we can request data and merkle proofs
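The three checks in this note can be sketched with a toy header format. Everything here is hypothetical: `ToyHeader`, `meets_target`, `verify_chain` and `mine` are made up for illustration and do not match the real Bitcoin or eth1 header layouts or difficulty rules.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ToyHeader:
    prev_hash: bytes
    state_root: bytes
    nonce: int

    def hash(self) -> bytes:
        data = self.prev_hash + self.state_root + self.nonce.to_bytes(8, "little")
        return hashlib.sha256(data).digest()

def meets_target(block_hash: bytes, target: int) -> bool:
    # Toy proof of work: the hash, read as an integer, must be below the target.
    return int.from_bytes(block_hash, "big") < target

def verify_chain(headers, target) -> bool:
    for i, header in enumerate(headers):
        if not meets_target(header.hash(), target):
            return False  # proof of work is invalid
        if i > 0 and header.prev_hash != headers[i - 1].hash():
            return False  # header does not link to its parent
    return True

def mine(prev_hash: bytes, state_root: bytes, target: int) -> ToyHeader:
    nonce = 0
    while True:
        header = ToyHeader(prev_hash, state_root, nonce)
        if meets_target(header.hash(), target):
            return header
        nonce += 1

TARGET = 2 ** 248  # very easy difficulty so the example runs instantly
chain = [mine(b"\x00" * 32, b"a" * 32, TARGET)]
for tag in (b"b", b"c"):
    chain.append(mine(chain[-1].hash(), tag * 32, TARGET))
print(verify_chain(chain, TARGET))
```

Nothing outside the headers is consulted, which is why this style of sync does not carry over to proof of stake.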
Headers alone aren't sufficient to verify proof of stake
We need to track stake
note:
In the PoS world we're governed by some sort of (super)majority stake
We must ensure we're on the chain with the most stake
which means we need to track balances and votes
votes are cryptographic signatures
this is a different beast than PoW light clients, there's an opportunity to do things a little bit differently
note:
this is an eth2 spec designed around consistent and easy merkleization
merkleization
class Checkpoint(Container):
    epoch: Epoch
    root: Hash

class Crosslink(Container):
    shard: Shard
    parent_root: Hash
    start_epoch: Epoch
    end_epoch: Epoch
    data_root: Hash

class AttestationData(Container):
    # LMD GHOST vote
    beacon_block_root: Hash
    # FFG vote
    source: Checkpoint
    target: Checkpoint
    # Crosslink vote
    crosslink: Crosslink
note:
when you would create a merkle root of this "attestationdata", you would be including the root of the underlying crosslink
and the crosslink includes the "data_root" which is the merkle root of some shard data
eth2 datastructures include merkle roots in many places because it's really useful and necessary to be able to create proofs
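To make the nesting concrete, here is a simplified sketch of how a container root can commit to the roots of its fields. It is not the full SSZ merkleization: `merkleize`, `uint64_chunk` and the `*_root` helpers are illustrative names, and the sketch assumes SHA-256, 32-byte chunks, and zero-padding to a power of two.

```python
import hashlib

ZERO = b"\x00" * 32

def hash_pair(a: bytes, b: bytes) -> bytes:
    return hashlib.sha256(a + b).digest()

def merkleize(chunks):
    # Pad to a power of two with zero chunks, then hash pairwise up to the root.
    nodes = list(chunks)
    size = 1
    while size < len(nodes):
        size *= 2
    nodes += [ZERO] * (size - len(nodes))
    while len(nodes) > 1:
        nodes = [hash_pair(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def uint64_chunk(n: int) -> bytes:
    return n.to_bytes(8, "little").ljust(32, b"\x00")

def checkpoint_root(epoch: int, root: bytes) -> bytes:
    return merkleize([uint64_chunk(epoch), root])

def crosslink_root(shard, parent_root, start_epoch, end_epoch, data_root) -> bytes:
    return merkleize([uint64_chunk(shard), parent_root,
                      uint64_chunk(start_epoch), uint64_chunk(end_epoch), data_root])

def attestation_data_root(beacon_block_root, source, target, crosslink) -> bytes:
    # The container root is built from the roots of its nested containers.
    return merkleize([beacon_block_root, checkpoint_root(*source),
                      checkpoint_root(*target), crosslink_root(*crosslink)])

block_root = b"\x11" * 32
source, target = (1, b"\x22" * 32), (2, b"\x33" * 32)
root_a = attestation_data_root(block_root, source, target, (0, ZERO, 1, 2, b"\xaa" * 32))
root_b = attestation_data_root(block_root, source, target, (0, ZERO, 1, 2, b"\xbb" * 32))
print(root_a != root_b)
```

Changing only the shard `data_root` changes the `AttestationData` root, so a proof can link shard data all the way up through the containers that reference it.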
beacon state -> beacon blocks
beacon blocks -> beacon state
shard blocks -> beacon blocks
note:
explain by asking questions and figuring out what makes sense
How do we get updated trusted merkle roots?
Can we do this succinctly?
note:
pow strategy not sufficient
we need to get up-to-date "trusted" merkle roots
but in PoS, that means we need stake
in the name of light clients, let's make this as lightweight as possible
Roots attested by 2/3 of stake.
Staked votes give weight to the chain.
note:
PoS requirement
Instead of syncing headers by hashing one by one
Use staked vote balance, skip ahead to a current header*
note:
we don't need to sync headers one by one
verifying each one by checking the parent hash
we're in a PoS world, where we're governed by a 2/3 majority
we can use votes as verification of recent headers
instead of checking hashes for pow validity
we have to track validator stake/votes
Instead of tracking all validator balances + votes
Track a subset of validators (committee)
note:
track a committee
note:
this doesn't really work for light clients because we still need the whole validator set to authenticate recent block checkpoints
note:
shard block header contains a beacon block root
better candidate, changes slowly, only need to update every ~27 hours
note:
Track the period committees assigned to that shard
shard committees change fully every ~27 hours
who voted for that shard block
how much of the total stake voted for the block
we can jump 27 hrs at a time
let's look briefly at the datastructures involved in syncing
class LightClientMemory(object):
    # Randomly initialized and retained forever
    shard: Shard
    # Beacon header which is not expected to revert
    header: BeaconBlockHeader
    # period committees corresponding to the beacon header
    previous_committee: CompactCommittee
    current_committee: CompactCommittee
    next_committee: CompactCommittee
note:
This is data we retain for syncing
this is the minimum amount of data to store
the shard tells us which shard we're tracking
(this is random and we shouldn't actually care which one since we're just using the shard to get to the beacon block header)
the header is the key trusted piece of data we use to verify merkle proofs against (just like in PoW light clients)
from a beacon block, we can use merkle proofs to verify data about shards, beacon state, everything
the committees are stored to keep track of pubkeys/balances of those who are voting on recent shard block roots
a shard committee is a blend of two underlying period committees, which change every ~27 hrs
class LightClientUpdate(Container):
    # Shard block root (and authenticating signature data)
    shard_block_root: Hash
    fork_version: Version
    aggregation_bits: Bitlist
    signature: BLSSignature
    # Updated beacon header (and authenticating branch)
    header: BeaconBlockHeader
    header_branch: MerkleProof
    # Updated period committee (and authenticating branch)
    committee: CompactCommittee
    committee_branch: MerkleProof
note:
This is the data we request to stay synced
We need one of these every ~27 hours
The top section is the shard block root and authenticating information
let's start from the bottom of the section
the signature is an aggregated signature that contains the signatures of all attesters in the committee who voted for the shard block root
the aggregation bits tell us who in the committee signed
the fork version lets us make sure the votes are for the fork we think we're on
and the shard block root is the updated block root
The next section is the new beacon block header, our new key to the castle. If all goes well, we'll update our light client memory with this header.
The header branch is a merkle proof that we run against the shard block root.
and the committee is the new period committee. If all goes well, we'll update our light client memory with this new committee.
The committee branch is a merkle proof that we run against the header
def update_memory(
    memory: LightClientMemory,
    update: LightClientUpdate
):
    # Verify the update does not skip a period
    # Verify shard attestations
    # - vote is for the shard root
    # Verify shard committee votes pass 2/3 threshold
    # - vote has sufficient weight
    # Verify update header against shard block root and header branch
    # - header is valid
    # Update period committees if entering a new period
    # - verify committee against header
    # Update header
    ...  # outline only: the steps above are not implemented here
note:
can't skip a period - we need to track all committee changes so we keep track of all stake for that shard
ensure that the vote is for the shard root we just got in the update
and that it has sufficient weight
at this point, we 'trust' the shard root
now we can use the shard root and proof to prove the validity of the included beacon block header
that way, we now trust the beacon block header
and once the header is trusted, we can use it and a proof to prove the validity of the committee update
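The "sufficient weight" step can be sketched on its own. This is a simplified illustration, not the spec's code: `ToyCommittee` and `passes_two_thirds` are hypothetical names, signature verification is left out entirely, and whether the comparison should be strict is an assumption.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToyCommittee:
    # Balance of each committee member, in arbitrary units.
    balances: List[int]

def voted_balance(committee: ToyCommittee, aggregation_bits: List[bool]) -> int:
    # Sum the balances of the members whose bit is set.
    return sum(b for b, bit in zip(committee.balances, aggregation_bits) if bit)

def passes_two_thirds(committee: ToyCommittee, aggregation_bits: List[bool]) -> bool:
    if len(aggregation_bits) != len(committee.balances):
        raise ValueError("one aggregation bit is required per committee member")
    total = sum(committee.balances)
    # Integer arithmetic avoids rounding: voted / total >= 2 / 3.
    return 3 * voted_balance(committee, aggregation_bits) >= 2 * total

committee = ToyCommittee(balances=[32, 32, 32, 32])
print(passes_two_thirds(committee, [True, True, True, False]))   # 96 of 128 voted
print(passes_two_thirds(committee, [True, True, False, False]))  # 64 of 128 voted
```

Weighting by balance rather than by head count is what ties the check to stake: a few large validators can outweigh many small ones.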
Total: 8,124 bytes per ~27 hours or
~0.083 bytes per second
vs.
Bitcoin SPV: 80 bytes per ~560 seconds
~0.143 bytes per second
note:
Our light client sync requires approx 0.083 bytes per second
For reference, bitcoin's light client protocol, requires
approx 0.143 bytes per second
So we're doing pretty well
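The comparison is plain division of the figures on the slide. A small sketch, taking "~27 hours" as exactly 27 hours (an approximation, so the result only matches the slide to about two decimal places):

```python
ETH2_BYTES_PER_UPDATE = 8_124
ETH2_SECONDS_PER_UPDATE = 27 * 60 * 60  # "~27 hours", taken as exactly 27

BTC_BYTES_PER_HEADER = 80
BTC_SECONDS_PER_HEADER = 560

eth2_rate = ETH2_BYTES_PER_UPDATE / ETH2_SECONDS_PER_UPDATE
btc_rate = BTC_BYTES_PER_HEADER / BTC_SECONDS_PER_HEADER

print(f"eth2 light client: {eth2_rate:.4f} bytes/s")
print(f"bitcoin SPV:       {btc_rate:.4f} bytes/s")
print(f"ratio:             {btc_rate / eth2_rate:.2f}x")
```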
Once we're synced, now what?
A: Gimme proofs
All validators have recent beacon chain state
1/1024 validators have recent shard state
Relayers/State providers have EE state
note:
this affects how we will request data
note:
let's think about what a light client would need to get their updated balance on some shard on some EE
let's look at the path we would need, and roughly what the size is
total ~3.2kb
References