Eth2.0 Light Clients: How light is light?

Eth2.0 Light Clients

How Light is Light?

note:
light clients are an integral part of eth2
we'll see why they're important and how they will work

this will be an overview of eth2 light clients with enough background to understand why things are the way they are

Introduction

About Me

Cayman Nava - Eth2.0 developer @ ChainSafe Systems

Twitter: @caymannan
Github: wemeetagain

About Lodestar

Typescript Eth2.0 Ecosystem

Beacon Chain
Light Client
Developer Tooling

https://github.com/ChainSafe/lodestar

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

About ChainSafe

Toronto-based blockchain protocol development

Twitter: @chainsafeth
Github: ChainSafe

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Overview

Motivation
- What and why?
Background
- PoW vs PoS Light Clients
- Merkle Proofs, Multiproofs, and SSZ
Eth2 Light Client
- Sync Protocol
- Data/Proof Requests
- Open questions

note:

this will be an overview of eth2 light clients with enough background to understand why things are the way they are

we'll start with motivation for light clients,
what they are, why we need them in eth2

Then, if we need to, we'll cover a little background,
we'll cover what merkle trees and merkle proofs are
(and how they're useful for light clients)
And the difference between PoW and PoS light clients
how to think about pos light clients

Then we'll dive into eth2 light clients
the meat of things
starting with the sync protocol,
followed by some specifics with data requests
and end with some open questions

Motivation

What is a Light Client?

Software looking to securely consume blockchain data with requirements that scale logarithmically to total blockchain state.

note:
eg:
squaring the number of transactions
should only double a light client’s cost.
(eg. going from 1,000 tx/day to 1,000,000 tx/day)

Why Light Clients in Eth2?

Light clients are first class citizens

Resource-constrained Environments

mobile phones
embedded systems
websites
etc.

note:
raise your hand if you have a smartphone
keep it raised if you haved a synced ethereum blockchain on your phone?

Decentralization

dreaded Infura single point of failure (
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
)

note:
for all you who raised your hand a second ago, how many have metamask installed?

Blockchains as Light Clients

eth2 can peer with other blockchains
(eth1, cosmos, polkadot, etc.)

note:
if the requirements are light enough, we can
that will be really important if any other blockchains want to verify anything from eth2, eth1 included

More Shards, More Data

1024 shards, too much state
validators are light clients-ish of other shards

note:
light clients are baked into the design of eth2,
in the sense that most regular folks, even validators, won't have all of the eth2 state.
validators will need to sync recent shard state as part of their duties
they'll be using some of the techniques we describe

Background

note:
start with background, if we don't know the background, we're going to be lost with the actual light client protocol

i'm going to cover a few seemingly disparate topics, they all connect

Merkle Proofs

Verify the authenticity of a chunk of data logarithmic to the number of chunks

note:
nin either case, we make extensive use of merkle proofs
poll audience: who needs a refresher on merkle proofs
the proof is succinct, it grows logarithmically to the total number of chunks

note:
when we want some particular data, we assume its part of a merkle tree

note:
very important, its a merkle tree that have the root of, and we "trust" it, the scheme only works if we have the root

merkle roots are often stored in a blockchain

note:
when verifying a merkle proof, the root is the only thing known and trusted
thats usually why you're requesting data in the first place

note:
when we're given a chunk of data, which we don't necessarily trust,
we have to be able to link it back up the tree to the root, which we do trust.
we need to be given the intermediate nodes required to recreate the root
these intermediate nodes are the proof

note:
we need one intermediate node per level in the tree, starting from the bottom

note:
with each intermediate node in the proof, we're able to create the immediate parent

note:
at that point, you can compare your trusted root against this newly computed root, and iff the roots match

note:
then you've verified that the data is correct
you've "verified the proof"
one thing to reiterate is that you only need these blue pieces, one per level, that number grows logarithmically with the total number of leaves

Merkle Multiproofs

note:
merkle multiproofs are merkle proofs

note:
multiproofs are proofs for multiple leaves in the tree

note:
we construct the multiproof in a similar way to how we construct the individual proofs.
Identify the elements needed to recreate the roots from each leaf
BUT the idea is that we can share the elements needed for each leaf

note:
so a proof for just A here would need 3 nodes in the tree
and a proof for just C would need 3 nodes in the tree
but instead of needing 6 nodes for the multiproof, we only need 3

PoW vs. PoS Light Clients

note:
briefly look at some of the differences, how the eth2 light client will differ from an eth1 light client

PoW Light Clients

Easy because headers can be verified with only protocol rules

note:
the headers have everything we need

download headers
verify (block by block)
request merkle proofs

note:
we hash the header
verify the proof of work
verify the next header's previous hash
once we get to the head, we have the relevant merkle roots, and we can request data and merkle proofs

PoS Light Clients

Headers alone aren't sufficient to verify proof of stake
We need to track stake

note:
In the PoS world we're governed by some sort of (super)majority stake
We must ensure we're on the chain with the most stake
which means we need to track balances and votes
votes are cryptographic signatures

this is a different beast than PoW light clients, theres an opportunity to do things a little bit differently

SSZ (Simple Serialize)

note:
this is an eth2 spec designed around consistent and easy merkleization
merkleization

SSZ Example: Checkpoint

class Checkpoint(Container):
    epoch: Epoch
    root: Hash

SSZ Example: Crosslink

class Crosslink(Container):
    shard: Shard
    parent_root: Hash
    start_epoch: Epoch
    end_epoch: Epoch
    data_root: Hash

SSZ Example: AttestationData

class AttestationData(Container):
    # LMD GHOST vote
    beacon_block_root: Hash
    # FFG vote
    source: Checkpoint
    target: Checkpoint
    # Crosslink vote
    crosslink: Crosslink

note:
when you would create a merkle root of this "attestationdata", you would be including the root of the underlying crosslink
and the crosslink includes the "data_root" which is the merkle root of some shard data
eth2 datastructures include merkle roots in many places because its really useful and necessary to be able to create proofs
beacon state -> beacon blocks
beacon blocks -> beacon state
shard blocks -> beacon blocks

Eth2 Light Client

Eth2 Sync Protocol

note:
explain by asking questions and figuring out what makes sense

Motivating Questions

How do we get updated trusted merkle roots?
Can we do this succinctly?

note:
pow strategy not sufficient
we need to get up-to-date "trusted" merkle roots
but in Pos, that means we need stake
in the name of light clients, lets make this as lightweight as possible

What is trusted?

Roots attested by 2/3 of stake.
Staked votes gives weight to the chain.

note:
PoS requirement

Key insight 1:

Instead of syncing headers by hashing one by one
Use staked vote balance, skip ahead to a current header*

note:
we don't need to sync headers one by one
verifying each one by checking the parent hash

we're in a PoS world, where we're governed by a 2/3 majority
we can use votes as verification of recent headers
instead of checking hashes for pow validity
we have to track validator stake/votes

Key insight 2:

Instead of tracking all validator balances + votes
Track a subset of validators (committee)

note:
track a committee

Where do eth2 validators validate?

Crosslink Committees

fast, changing every epoch (~6 min)
attest to recent shard data (crosslink)
attest to beacon block root (but only as total validator set)
attest to recent checkpoints (FFG data)

note:
this doesn't really work for light clients because we still need the whole validator set to authenticate recent block checkpoints

Shard Committees

slow, period committies change every shard period (~27 hrs)
attest to shard block root

note:
shard block header contains a beacon block root
better candidate, changes slowly, only need to update every ~27 hours

Sync Protocol

Sync by shard block root
- with items needed to verify weight*
Shard root gives us beacon block header (via merkle proof)
Every ~27 hrs, update period committee

note:
Track the period committees assigned to that shard
shard committees change fully every ~27 hours
who voted for that shard block
how much of the total stake voted for the block

we can jump 27 hrs at a time
lets look briefly at the datastructures involved in syncing

class LightClientMemory(object):
    # Randomly initialized and retained forever
    shard: Shard
    
    # Beacon header which is not expected to revert
    header: BeaconBlockHeader
    
    # period committees corresponding to the beacon header
    previous_committee: CompactCommittee
    current_committee: CompactCommittee
    next_committee: CompactCommittee

note:
This is data we retain for syncing
this is the minimum amount of data to store

the shard tells us which shard we're tracking
(this is random and we shouldn't actually care which one since we're just using the shard to get to the beacon block header)

the header is the key trusted piece of data we use to verify merkle proofs against (just like in PoW light clients)
from a beacon block, we can use merkle proofs to verify data about shards, beacon state, everything

the committees are stored to keep track of pubkeys/balances of those who are voting on recent shard block roots
a shard committee is a blend of two underlying period committees, which change every ~27 hrs

class LightClientUpdate(container):
    # Shard block root (and authenticating signature data)
    shard_block_root: Hash
    fork_version: Version
    aggregation_bits: Bitlist
    signature: BLSSignature
    
    # Updated beacon header (and authenticating branch)
    header: BeaconBlockHeader
    header_branch: MerkleProof
    
    # Updated period committee (and authenticating branch)
    committee: CompactCommittee
    committee_branch: MerkleProof

note:
This is the data we request to stay synced
We need one of these every ~27 hours

The top section is the shard block root and authenticating information
lets start from the botttom of the section
the signature is an aggregated signature that contains the signatures of all attesters in the committee who voted for the shard block root
the aggregation bits tell us who in the committee signed
the fork version lets us make sure the votes are for the fork we think we're on
and the shard block root is the updated block root

The next section is the new beacon block header, our new key to the castle. If all goes well, we'll update our light client memory with this header.
The header branch is a merkle proof that we run against the shard block root.

and the committee is the new period committee. If all goes well, we'll update our light client memory with this new committee.
The committee branch is a merkle proof that we run against the header

def update_memory(
    memory: LightClientMemory,
    update: LightClientUpdate
    ):
    # Verify the update does not skip a period
    # Verify shard attestations
    # - vote is for the shard root
    # Verify shard committee votes pass 2/3 threshold
    # - vote has sufficient weight
    # Verify update header against shard block root and header branch
    # - header is valid
    # Update period committees if entering a new period
    # - verify committee against header
    # Update header

note:
can't skip a period - we need to track all committee changes so we keep track of all stake for that shard

ensure that the vote is for the shard root we just got in the update
and that it has sufficient weight

at this point, we 'trust' the shard root

now we can use the shard root and proof to prove the validity of the included beacon block header

that way, we now trust the beacon block header

and once the header is trusted, we can use it and a proof to prove the validity of the committee update

Update data size

shard_block_root: 32 bytes
fork_version: 4 bytes
aggregation_bits: 16 bytes
signature: 96 bytes
header: 8 + 32 + 32 + 32 + 96 = 200 bytes
header_branch: 4 * 32 = 128 bytes
committee: 128 * (48 + 8) = 7,168 bytes
committee_branch: (5 + 10) * 32 = 480 bytes

Total: 8,124 bytes per ~27 hours

Total: 8,124 bytes per ~27 hours or
~0.083 bytes per second

vs.

Bitcoin SPV: 80 bytes per ~560 second
~0.143 bytes per second

note:
Our light client sync requires approx 0.083 bytes per second
For reference, bitcoin's light client protocol, requires
approx 0.143 bytes per second

So we're doing pretty good

Data/Proof Requests

Once we're synced, now what?

A: Gimme proofs

Who has what data

All valididators have recent beacon chain state

1/1024 validators have recent shard state

Relayers/State providers have EE state

note:
this effects how we will request data

Proof sizes: Token EE Balance Example

light client update(w/o new committee)? 476 bytes
finalized block root? 224 bytes
state root? 128 bytes
crosslink data? 672 bytes
ee state root? ~736 bytes
token balance? ~1024 bytes