# Navigating Options for Implementing SSZ-QL on Prysm
> [!Note]
> This write-up covers some important background and presents rough ideas for implementing SSZ-QL. You can see the full project proposal [here](https://github.com/eth-protocol-fellows/cohort-six/blob/master/projects/ssz-ql-with-merkle-proofs.md).
## Introduction
I've been exploring ways to implement SSZ-QL in the Prysm codebase and found two viable options so far. Before getting into them, it helps to understand how `BeaconState` is fetched and handled in the context of the Beacon API. Personally, I lean toward the second option (`Byte-Parsing Engine`), but any feedback is welcome.
## Background: Fetching `BeaconState`
In Prysm, [beacon endpoints](https://ethereum.github.io/beacon-APIs/#/Beacon) that take a `state_id` usually fetch the corresponding `BeaconState` through the `Stater` interface. ([beacon-chain/rpc/lookup/stater.go](https://github.com/OffchainLabs/prysm/blob/cd6cc76d5826f4a4ff880a1b8a9fa3d58772928a/beacon-chain/rpc/lookup/stater.go#L75-L80))
```go
// Stater is responsible for retrieving states.
type Stater interface {
	State(ctx context.Context, id []byte) (state.BeaconState, error)
	StateRoot(ctx context.Context, id []byte) ([]byte, error)
	StateBySlot(ctx context.Context, slot primitives.Slot) (state.BeaconState, error)
}
```
`Stater` provides several methods, but it is enough to look at the `State` method, since the others share a similar structure. Here is the code:
```go
// State returns the BeaconState for a given identifier. The identifier can be one of:
// - "head" (canonical head in node's view)
// - "genesis"
// - "finalized"
// - "justified"
// - <slot>
// - <hex encoded state root with '0x' prefix>
// - <state root>
func (p *BeaconDbStater) State(ctx context.Context, stateId []byte) (state.BeaconState, error) {
	var (
		s   state.BeaconState
		err error
	)
	stateIdString := strings.ToLower(string(stateId))
	switch stateIdString {
	case "head":
		s, err = p.ChainInfoFetcher.HeadState(ctx)
		if err != nil {
			return nil, errors.Wrap(err, "could not get head state")
		}
	case "genesis":
		s, err = p.StateBySlot(ctx, params.BeaconConfig().GenesisSlot)
		if err != nil {
			return nil, errors.Wrap(err, "could not get genesis state")
		}
	case "finalized":
		checkpoint := p.ChainInfoFetcher.FinalizedCheckpt()
		targetSlot, err := slots.EpochStart(checkpoint.Epoch)
		if err != nil {
			return nil, errors.Wrap(err, "could not get start slot")
		}
		// We use the stategen replayer to fetch the finalized state and then
		// replay it to the start slot of our checkpoint's epoch. The replayer
		// only ever accesses our canonical history, so the state retrieved will
		// always be the finalized state at that epoch.
		s, err = p.ReplayerBuilder.ReplayerForSlot(targetSlot).ReplayToSlot(ctx, targetSlot)
		if err != nil {
			return nil, errors.Wrap(err, "could not get finalized state")
		}
	case "justified":
		checkpoint := p.ChainInfoFetcher.CurrentJustifiedCheckpt()
		targetSlot, err := slots.EpochStart(checkpoint.Epoch)
		if err != nil {
			return nil, errors.Wrap(err, "could not get start slot")
		}
		// We use the stategen replayer to fetch the justified state and then
		// replay it to the start slot of our checkpoint's epoch. The replayer
		// only ever accesses our canonical history, so the state retrieved will
		// always be the justified state at that epoch.
		s, err = p.ReplayerBuilder.ReplayerForSlot(targetSlot).ReplayToSlot(ctx, targetSlot)
		if err != nil {
			return nil, errors.Wrap(err, "could not get justified state")
		}
	default:
		if bytesutil.IsHex(stateId) {
			decoded, parseErr := hexutil.Decode(string(stateId))
			if parseErr != nil {
				e := NewStateIdParseError(parseErr)
				return nil, &e
			}
			s, err = p.stateByRoot(ctx, decoded)
		} else if len(stateId) == 32 {
			s, err = p.stateByRoot(ctx, stateId)
		} else {
			slotNumber, parseErr := strconv.ParseUint(stateIdString, 10, 64)
			if parseErr != nil {
				// ID format does not match any valid options.
				e := NewStateIdParseError(parseErr)
				return nil, &e
			}
			s, err = p.StateBySlot(ctx, primitives.Slot(slotNumber))
		}
	}
	return s, err
}
```
As you might notice, a desired `BeaconState` can be obtained in several ways, which can be classified by source:
1. For the head state, `blockchain.Service` most likely already holds it. ([By calling the `headState` method.](https://github.com/OffchainLabs/prysm/blob/cd6cc76d5826f4a4ff880a1b8a9fa3d58772928a/beacon-chain/blockchain/head.go#L275-L283))
2. If the desired state is already "hot" (e.g., loaded in a cache), it is served from that cache (e.g., through the cache's `get()` method).
   - There are two caches: `hotStateCache` and `epochBoundaryStateCache`.
3. If the desired state is saved in the database, it is read from the database.
Of course, **not every `BeaconState` can be retrieved from those sources directly**. Saving a `BeaconState` for every slot would be inefficient, so the node saves one **every 2048 slots**, the default value of `--slots-per-archive-point`. The caches may not hold the desired state either.
To save space, Prysm regenerates such states with a `Replayer`, which starts from a base `BeaconState` and replays the subsequent blocks and slots on top of it. [The `chainer` interface has a `chainForSlot` method](https://github.com/OffchainLabs/prysm/blob/cd6cc76d5826f4a4ff880a1b8a9fa3d58772928a/beacon-chain/state/stategen/replayer.go#L59-L63), which returns the base state (`s`) and the blocks to be processed (`descendants`).
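
To get a feel for how much replaying this can involve, here is a rough sketch under a deliberately simplified assumption: states are persisted *only* at archive points (in practice the hot part of the DB and the caches may hold more). `replayPlan` is a hypothetical helper, not a Prysm function.

```go
package main

import "fmt"

// slotsPerArchivePoint mirrors the default of Prysm's --slots-per-archive-point flag.
const slotsPerArchivePoint uint64 = 2048

// replayPlan is a hypothetical helper (not a Prysm function). Assuming states
// are persisted only at archive points, it returns the slot of the base state
// to load and how many slots must be replayed on top of it to reach target.
func replayPlan(target uint64) (base, slotsToReplay uint64) {
	base = target - target%slotsPerArchivePoint
	return base, target - base
}

func main() {
	for _, target := range []uint64{2048, 10000, 12287} {
		base, n := replayPlan(target)
		fmt.Printf("target=%d base=%d replay=%d slots\n", target, base, n)
	}
}
```

Under that assumption, serving an arbitrary slot can require replaying up to 2047 slots on top of the base state.
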
Either way, once `Stater` returns a `BeaconState`, we can use all the methods that the `BeaconState` interface provides.
---
## Option 1. Use In-memory Go structs. (= `Reflection Engine`)
[`beacon-chain/state/interfaces.go`](https://github.com/OffchainLabs/prysm/blob/develop/beacon-chain/state/interfaces.go) defines many interfaces, and this option proposes adding a new one called `Querier`. `Querier` is simple: a struct only needs to implement the `Query` method.
```go
// Querier is an interface for objects that can be navigated
// using an SSZ path to retrieve a nested value.
type Querier interface {
	// Query resolves a path and returns the final value.
	Query(path []PathElement) (any, error)
}
```
(NOTE: `PathElement` is not yet decided.)
Here's pseudo-code for implementing `Query` on `BeaconState`:
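```go
func (b *BeaconState) Query(path []PathElement) (any, error) {
	// An empty path refers to the state itself.
	if len(path) == 0 {
		return b, nil
	}
	element := path[0]
	remainingPath := path[1:]
	// resolveElement (still to be designed) returns the child value named by element.
	nextObject, err := b.resolveElement(element)
	if err != nil {
		return nil, err
	}
	if len(remainingPath) == 0 {
		return nextObject, nil
	}
	// Delegate the rest of the path to the child, which must itself be a Querier.
	querier, ok := nextObject.(Querier)
	if !ok {
		return nil, fmt.Errorf("cannot query into %T", nextObject)
	}
	return querier.Query(remainingPath)
}
```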
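
To make the recursion concrete, below is a self-contained toy. It assumes `PathElement` is just a field name (the real type is undecided) and that each type resolves its fields with a hand-written `switch`. That is exactly the verbose approach that does not scale to `BeaconState`, but it is enough to show how `Query` hands the remaining path to a nested `Querier`. `Checkpoint` and `AttestationData` here are simplified stand-ins, not the Prysm types.

```go
package main

import (
	"errors"
	"fmt"
)

// PathElement is assumed to be a plain field name here; the real type is undecided.
type PathElement = string

// Querier is the interface proposed above.
type Querier interface {
	Query(path []PathElement) (any, error)
}

// Toy containers standing in for real consensus types.
type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

type AttestationData struct {
	Slot   uint64
	Target *Checkpoint
}

var errNotQuerier = errors.New("path continues past a non-queryable value")

// query is the shared recursive step from the pseudo-code above: resolve the
// first element, then return it or hand the rest of the path to the next Querier.
func query(self any, resolve func(PathElement) (any, error), path []PathElement) (any, error) {
	if len(path) == 0 {
		return self, nil
	}
	next, err := resolve(path[0])
	if err != nil {
		return nil, err
	}
	if len(path) == 1 {
		return next, nil
	}
	q, ok := next.(Querier)
	if !ok {
		return nil, errNotQuerier
	}
	return q.Query(path[1:])
}

// Each type resolves its own fields with a hand-written switch.
func (c *Checkpoint) Query(path []PathElement) (any, error) {
	return query(c, func(e PathElement) (any, error) {
		switch e {
		case "epoch":
			return c.Epoch, nil
		case "root":
			return c.Root, nil
		}
		return nil, fmt.Errorf("no field %q in Checkpoint", e)
	}, path)
}

func (d *AttestationData) Query(path []PathElement) (any, error) {
	return query(d, func(e PathElement) (any, error) {
		switch e {
		case "slot":
			return d.Slot, nil
		case "target":
			return d.Target, nil
		}
		return nil, fmt.Errorf("no field %q in AttestationData", e)
	}, path)
}

func main() {
	data := &AttestationData{Slot: 7, Target: &Checkpoint{Epoch: 3}}
	v, err := data.Query([]PathElement{"target", "epoch"})
	fmt.Println(v, err) // 3 <nil>
}
```
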
So the key is implementing `resolveElement`. At the moment, I'm not sure how best to resolve an element in a specific data structure. We could write verbose `switch-case` statements that match a string against each field name, but that is not feasible for a structure as large as `BeaconState`. Alternatively, we can use [reflect](https://pkg.go.dev/reflect) to be more "generic":
```go
func Marshal(data interface{}) ([]byte, error) {
	val := reflect.ValueOf(data)
	m, err := makeMarshaler(val)
	if err != nil {
		return nil, err
	}
	totalBufferSize := determineSize(val)
	buffer := make([]byte, totalBufferSize)
	if _, err = m.MarshalType(val, buffer, 0 /* start index */); err != nil {
		return nil, fmt.Errorf("failed to marshal for type: %v", val.Type())
	}
	return buffer, nil
}
func makeMarshaler(val reflect.Value) (Marshaler, error) {
	typ := val.Type()
	kind := typ.Kind()
	switch {
	case kind == reflect.Bool:
		// Handle bool marshaling...
	case kind == reflect.Uint8:
		// Handle uint marshaling...
	case kind == reflect.Uint16:
		// Handle uint marshaling...
	case kind == reflect.Uint32:
		// Handle uint marshaling...
	case kind == reflect.Uint64:
		// Handle uint marshaling...
	case kind == reflect.Slice && typ.Elem().Kind() == reflect.Uint8:
		// Handle byte slice marshaling...
	case kind == reflect.Array && typ.Elem().Kind() == reflect.Uint8:
		// Handle byte array marshaling...
	case isBasicTypeSlice(typ) || isBasicTypeArray(typ):
		// Handle basic-type array/slice marshaling...
	case kind == reflect.Array && !isVariableSizeType(typ.Elem()):
		// Handle fixed-size element array marshaling...
	case kind == reflect.Slice && !isVariableSizeType(typ.Elem()):
		// Handle fixed-size element slice marshaling...
	case kind == reflect.Slice || kind == reflect.Array:
		// Handle variable-sized element slice/array marshaling...
	case kind == reflect.Struct:
		// Handle struct marshaling...
	case kind == reflect.Ptr:
		// Handle pointer marshaling...
	default:
		return nil, fmt.Errorf("type %v is not serializable", typ)
	}
	// (Each case above returns a type-specific Marshaler; the handlers are elided in this excerpt.)
}
// determineSize figures out the total size of the SSZ-encoding in order
// to preallocate a buffer before marshaling.
func determineSize(val reflect.Value) uint64 {
	...
}
```
The code above is an excerpt from [Raul Jordan](https://github.com/rauljordan)'s [post](https://rauljordan.com/go-lessons-from-writing-a-serialization-library-for-ethereum/). I think a similar approach is feasible here.
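
Applied to `resolveElement`, the reflection idea might look like the sketch below. `resolveByTag` is a hypothetical helper (not from Raul's post or Prysm). It matches each path element against the `json` tag names that the generated `*.pb.go` structs carry (e.g. `json:"beacon_block_root,omitempty"`) and skips unexported bookkeeping fields; the structs are toy stand-ins.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Toy structs shaped like the generated *.pb.go types.
type Checkpoint struct {
	sizeCache int32  // unexported bookkeeping, like protoimpl.SizeCache
	Epoch     uint64 `json:"epoch,omitempty"`
	Root      []byte `json:"root,omitempty" ssz-size:"32"`
}

type AttestationData struct {
	Slot   uint64      `json:"slot,omitempty"`
	Target *Checkpoint `json:"target,omitempty"`
}

// resolveByTag is a hypothetical, reflection-based resolveElement: it walks v
// one path element at a time, matching elements against json tag names.
func resolveByTag(v any, path []string) (any, error) {
	if v == nil {
		return nil, fmt.Errorf("cannot query a nil value")
	}
	cur := reflect.ValueOf(v)
	for _, elem := range path {
		// Follow pointers such as *Checkpoint.
		for cur.Kind() == reflect.Ptr {
			if cur.IsNil() {
				return nil, fmt.Errorf("nil pointer before %q", elem)
			}
			cur = cur.Elem()
		}
		if cur.Kind() != reflect.Struct {
			return nil, fmt.Errorf("cannot resolve %q on %s", elem, cur.Type())
		}
		typ := cur.Type()
		idx := -1
		for i := 0; i < typ.NumField(); i++ {
			f := typ.Field(i)
			if !f.IsExported() {
				continue // skip bookkeeping fields
			}
			name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
			if name == elem {
				idx = i
				break
			}
		}
		if idx < 0 {
			return nil, fmt.Errorf("%s has no field %q", typ, elem)
		}
		cur = cur.Field(idx)
	}
	return cur.Interface(), nil
}

func main() {
	d := &AttestationData{Slot: 9, Target: &Checkpoint{Epoch: 4, Root: make([]byte, 32)}}
	v, err := resolveByTag(d, []string{"target", "epoch"})
	fmt.Println(v, err) // 4 <nil>
}
```

Every lookup scans the struct's fields at runtime, which is part of the cost listed in the cons below; caching the field index per `(type, name)` pair is one possible mitigation.
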
### Pros & Cons
Pros:
- Personally, I like the recursive approach 😁 because it is more readable.
- We don't have to marshal `BeaconState` again after fetching it from `Stater`. Marshalling would be a significant overhead, as a state is over 250 MB in size nowadays.
Cons:
- The codebase can become bloated: every queryable struct must implement the `Querier` interface.
- It may become tedious to maintain as future upgrades (Glamsterdam and so on) add more fields to `BeaconState`.
- If we decide to use reflection, we must use it with care: it is [not performant](https://dev.to/leapcell/golang-reflection-is-it-slow-33ka) in most cases.
---
## Option 2. Marshal everything, and treat as string of bytes. (= `Byte-Parsing Engine`)
The SSZ encoding of a container can be divided into two parts: fixed-size fields are serialized [**before**](https://eth2book.info/capella/part2/building_blocks/ssz/#fixed-and-variable-size-types) variable-size data. For each variable-size field, the encoder writes only a 4-byte offset (`uint32`, little-endian) in the fixed-size part, pointing to where that field's data begins. Thanks to SSZ's unambiguous design, we can preprocess the data with its schema to efficiently calculate the offset and length needed to read any desired item.
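
A tiny illustration of this layout, using only the Go standard library: a hypothetical container `{a: uint64, b: List[uint64, N]}`. Its fixed part holds `a` (8 bytes) followed by a 4-byte little-endian offset for `b`, and `b`'s elements follow in the variable part. The `encode`/`decodeB` helpers are written for this sketch only; a real codec would derive them from the schema and validate offsets.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encode serializes the toy container {a: uint64, b: List[uint64, N]}.
// Fixed part: a (8 bytes) + offset of b (4 bytes) = 12 bytes;
// the variable part (b's elements) follows.
func encode(a uint64, b []uint64) []byte {
	const fixedSize = 8 + 4
	out := make([]byte, fixedSize, fixedSize+8*len(b))
	binary.LittleEndian.PutUint64(out[0:8], a)
	// b's data begins right after the fixed part.
	binary.LittleEndian.PutUint32(out[8:12], fixedSize)
	for _, x := range b {
		out = binary.LittleEndian.AppendUint64(out, x)
	}
	return out
}

// decodeB locates b through its offset. Since b is the only variable-size
// field, its data runs to the end of buf. No bounds checks: a sketch only.
func decodeB(buf []byte) []uint64 {
	off := binary.LittleEndian.Uint32(buf[8:12])
	data := buf[off:]
	out := make([]uint64, len(data)/8)
	for i := range out {
		out[i] = binary.LittleEndian.Uint64(data[8*i:])
	}
	return out
}

func main() {
	buf := encode(42, []uint64{1, 2, 3})
	fmt.Println(len(buf), decodeB(buf)) // 36 [1 2 3]
}
```
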
Let's take an example: tracing the `.data.target.root` query on `IndexedAttestation`. Here is all the information we need:
```python
class IndexedAttestation(Container):
    # [Modified in Electra:EIP7549]
    attesting_indices: List[ValidatorIndex, MAX_VALIDATORS_PER_COMMITTEE * MAX_COMMITTEES_PER_SLOT]
    data: AttestationData
    signature: BLSSignature  # Bytes96


class AttestationData(Container):
    slot: Slot  # uint64
    index: CommitteeIndex  # uint64
    # LMD GHOST vote
    beacon_block_root: Root  # Bytes32
    # FFG vote
    source: Checkpoint
    target: Checkpoint


class Checkpoint(Container):
    epoch: Epoch  # uint64
    root: Root  # Bytes32
```
0. Let `current` be `0`; it tracks the byte offset from the start of the encoded `IndexedAttestation`.
1. `attesting_indices` is a List, which is a variable-size type, so the fixed part holds only its 4-byte offset (`uint32`). Add `4`: `current == 4`. The `data` field therefore starts at offset 4.
2. `AttestationData` is a fixed-size type because all of its members are fixed-size, so it is serialized in place. To reach `.target`, add the sizes of the fields that precede `target`:
   ```
   Checkpoint container size = 8 (epoch) + 32 (root) = 40
   8 (slot) + 8 (index) + 32 (beacon_block_root) + 40 (source) = 88
   ```
   Now `current == 4 + 88 == 92`.
3. For the last element (`.root`), add `8` bytes for the preceding `epoch` field: `current == 100`.
4. As a result, reading **32 bytes starting at offset 100** yields `.data.target.root` of `IndexedAttestation`.
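
The walk above can be generalized by describing the schema as data. The `Field` type below is hypothetical (where the real schema would come from is the open question discussed next); `locate` repeats steps 1-3 mechanically, summing the fixed-part sizes of all preceding fields at each level. It only handles paths through fixed-size containers: reaching into a variable-size field would additionally require reading its offset from the encoded bytes.

```go
package main

import (
	"errors"
	"fmt"
)

// Field is a hypothetical schema entry: a fixed Size in bytes, a Variable-size
// field (only a 4-byte offset in the fixed part), or a fixed-size container
// described by Children. A container with any variable-size member should be
// marked Variable instead.
type Field struct {
	Name     string
	Size     uint64
	Variable bool
	Children []Field
}

// fixedPartSize is how many bytes f occupies in its parent's fixed part.
func fixedPartSize(f Field) uint64 {
	switch {
	case f.Variable:
		return 4
	case f.Children != nil:
		var total uint64
		for _, c := range f.Children {
			total += fixedPartSize(c)
		}
		return total
	default:
		return f.Size
	}
}

// locate returns the byte offset and size of the value at path by summing
// the sizes of all preceding fields at each level, as in steps 1-3 above.
func locate(fields []Field, path []string) (offset, size uint64, err error) {
	if len(path) == 0 {
		return 0, 0, errors.New("empty path")
	}
	for depth, name := range path {
		var f *Field
		for i := range fields {
			if fields[i].Name == name {
				f = &fields[i]
				break
			}
			offset += fixedPartSize(fields[i])
		}
		if f == nil {
			return 0, 0, fmt.Errorf("no field %q", name)
		}
		if f.Variable {
			return 0, 0, fmt.Errorf("%q is variable-size: its offset must be read from the data", name)
		}
		if depth == len(path)-1 {
			return offset, fixedPartSize(*f), nil
		}
		fields = f.Children
	}
	return 0, 0, errors.New("unreachable")
}

var checkpoint = []Field{{Name: "epoch", Size: 8}, {Name: "root", Size: 32}}

// indexedAttestation mirrors the spec container shown above.
var indexedAttestation = []Field{
	{Name: "attesting_indices", Variable: true},
	{Name: "data", Children: []Field{
		{Name: "slot", Size: 8},
		{Name: "index", Size: 8},
		{Name: "beacon_block_root", Size: 32},
		{Name: "source", Children: checkpoint},
		{Name: "target", Children: checkpoint},
	}},
	{Name: "signature", Size: 96},
}

func main() {
	off, size, err := locate(indexedAttestation, []string{"data", "target", "root"})
	fmt.Println(off, size, err) // 100 32 <nil>
}
```
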
This example makes me confident that the procedure can be generalized. But how can we provide the schema as Go code? I think the `.pb.go` files can be the key. Although the team seems to be planning to remove Protobuf (related issue: [#15323](https://github.com/OffchainLabs/prysm/issues/15323)), I believe we can still take advantage of the Go code generated by `hack/update-go-pbs.sh`. The corresponding data structures can be found in the `*.pb.go` files, as shown below:
```go
type IndexedAttestationElectra struct {
	state            protoimpl.MessageState
	sizeCache        protoimpl.SizeCache
	unknownFields    protoimpl.UnknownFields
	AttestingIndices []uint64         `protobuf:"varint,1,rep,packed,name=attesting_indices,json=attestingIndices,proto3" json:"attesting_indices,omitempty" ssz-max:"131072"`
	Data             *AttestationData `protobuf:"bytes,2,opt,name=data,proto3" json:"data,omitempty"`
	Signature        []byte           `protobuf:"bytes,3,opt,name=signature,proto3" json:"signature,omitempty" ssz-size:"96"`
}

type AttestationData struct {
	state           protoimpl.MessageState
	sizeCache       protoimpl.SizeCache
	unknownFields   protoimpl.UnknownFields
	Slot            github_com_OffchainLabs_prysm_v6_consensus_types_primitives.Slot `protobuf:"varint,1,opt,name=slot,proto3" json:"slot,omitempty" cast-type:"github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Slot"`
	CommitteeIndex  github_com_OffchainLabs_prysm_v6_consensus_types_primitives.CommitteeIndex `protobuf:"varint,2,opt,name=committee_index,json=committeeIndex,proto3" json:"committee_index,omitempty" cast-type:"github.com/OffchainLabs/prysm/v6/consensus-types/primitives.CommitteeIndex"`
	BeaconBlockRoot []byte      `protobuf:"bytes,3,opt,name=beacon_block_root,json=beaconBlockRoot,proto3" json:"beacon_block_root,omitempty" ssz-size:"32"`
	Source          *Checkpoint `protobuf:"bytes,4,opt,name=source,proto3" json:"source,omitempty"`
	Target          *Checkpoint `protobuf:"bytes,5,opt,name=target,proto3" json:"target,omitempty"`
}

type Checkpoint struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields
	Epoch         github_com_OffchainLabs_prysm_v6_consensus_types_primitives.Epoch `protobuf:"varint,1,opt,name=epoch,proto3" json:"epoch,omitempty" cast-type:"github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Epoch"`
	Root          []byte `protobuf:"bytes,2,opt,name=root,proto3" json:"root,omitempty" ssz-size:"32"`
}
```
These structs contain all the information we need: the name and type of each field, fixed sizes (`ssz-size`), and the limits of List types (`ssz-max`).
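
As a rough sketch of reading that schema with `reflect`, the snippet below uses a trimmed toy copy of the generated struct (not the real generated type) and collects each field's name from the `protobuf` tag's `name=` entry along with its raw `ssz-size` and `ssz-max` tags. `FieldInfo` and `schemaOf` are hypothetical; converting these tags into byte sizes (e.g. `uint64` to 8 bytes) and recursing into nested messages is left out.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// FieldInfo is a hypothetical, minimal schema entry read from struct tags.
type FieldInfo struct {
	Name    string       // from the protobuf tag's name= entry
	Kind    reflect.Kind // Go kind, e.g. slice or ptr
	SSZSize string       // raw ssz-size tag, e.g. "96"
	SSZMax  string       // raw ssz-max tag, e.g. "131072"
}

type AttestationData struct{} // fields omitted in this toy

// IndexedAttestationElectra is a trimmed toy copy of the generated struct.
type IndexedAttestationElectra struct {
	sizeCache        int32            // stands in for the unexported protoimpl fields
	AttestingIndices []uint64         `protobuf:"varint,1,rep,packed,name=attesting_indices,json=attestingIndices,proto3" ssz-max:"131072"`
	Data             *AttestationData `protobuf:"bytes,2,opt,name=data,proto3"`
	Signature        []byte           `protobuf:"bytes,3,opt,name=signature,proto3" ssz-size:"96"`
}

// protoName extracts e.g. "attesting_indices" from a protobuf struct tag.
func protoName(tag string) string {
	for _, part := range strings.Split(tag, ",") {
		if strings.HasPrefix(part, "name=") {
			return strings.TrimPrefix(part, "name=")
		}
	}
	return ""
}

// schemaOf lists the exported fields of v's struct type in declaration
// order, which this sketch assumes matches the SSZ field order.
func schemaOf(v any) []FieldInfo {
	t := reflect.TypeOf(v)
	if t == nil {
		return nil
	}
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		return nil
	}
	var out []FieldInfo
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue // skip protoimpl-style bookkeeping
		}
		out = append(out, FieldInfo{
			Name:    protoName(f.Tag.Get("protobuf")),
			Kind:    f.Type.Kind(),
			SSZSize: f.Tag.Get("ssz-size"),
			SSZMax:  f.Tag.Get("ssz-max"),
		})
	}
	return out
}

func main() {
	for _, f := range schemaOf(IndexedAttestationElectra{}) {
		fmt.Printf("%-18s kind=%-5s ssz-size=%q ssz-max=%q\n", f.Name, f.Kind, f.SSZSize, f.SSZMax)
	}
}
```
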
### Pros & Cons
Pros:
- A more **"generalized"** solution than Option 1. It also addresses our first possible challenge: _The implementation must not overfit to BeaconState. The code must be generic enough to work with any other arbitrary SSZ object._ It applies whenever we have SSZ-encoded `[]byte` and the corresponding SSZ schema.
Cons:
- Marshalling an in-memory `BeaconState` adds **overhead**. This is less of a problem for states saved in the DB, but as [the previous section](#Background-Fetching-BeaconState) indicates, most states would be the **result** of replaying.
- Protobuf may be removed in the future. Related issue in Prysm: [#15323](https://github.com/OffchainLabs/prysm/issues/15323)
---
## Resources
- [Writing an Ethereum serialization library using reflection by Raul Jordan](https://rauljordan.com/go-lessons-from-writing-a-serialization-library-for-ethereum/)
  - A great explanation from a Go engineer's perspective, though the technique is now quite dated.
- [SSZ: Simple Serialize from eth2book](https://eth2book.info/capella/part2/building_blocks/ssz/)
- eth2book is always the best annotated resource with real-world examples.