# Struct Tags used in `sszgen`
{%preview https://hackmd.io/@junsong/B1v3x4wwle %}
I'm implementing SSZ-QL on top of the Prysm codebase, and I chose to **preprocess the structs defined in the auto-generated `*.pb.go` files**. The `PreCalculateSSZInfo` function relies heavily on [Go struct tags](https://go.dev/ref/spec#Struct_types) such as `json`, `ssz-size`, and `ssz-max`. For example, the `historical_roots` field in `BeaconState` has the following tag:
```!
`protobuf:"bytes,2004,rep,name=historical_roots,json=historicalRoots,proto3" json:"historical_roots,omitempty" ssz-max:"16777216" ssz-size:"?,32"`
```
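As a quick illustration of how such a tag can be read at runtime, here is a minimal sketch using the standard `reflect` package. The `demoState` struct is a hypothetical stand-in for the generated `BeaconState` (a single field, with the `protobuf` key dropped for brevity), and `sszTags` is a helper name made up for this example; it is not necessarily how `PreCalculateSSZInfo` is written.

```go
package main

import (
    "fmt"
    "reflect"
)

// demoState is a hypothetical, trimmed-down stand-in for a generated struct.
type demoState struct {
    HistoricalRoots [][]byte `json:"historical_roots,omitempty" ssz-max:"16777216" ssz-size:"?,32"`
}

// sszTags returns the raw ssz-size and ssz-max tag values of a named field.
// v must be a struct value. An empty string means the tag is absent
// (or the field does not exist).
func sszTags(v any, fieldName string) (sszSize, sszMax string) {
    field, ok := reflect.TypeOf(v).FieldByName(fieldName)
    if !ok {
        return "", ""
    }
    return field.Tag.Get("ssz-size"), field.Tag.Get("ssz-max")
}

func main() {
    sszSize, sszMax := sszTags(demoState{}, "HistoricalRoots")
    fmt.Printf("ssz-size=%q ssz-max=%q\n", sszSize, sszMax)
}
```

`reflect.StructTag.Get` returns an empty string for a missing key; use `Lookup` instead when an absent tag must be distinguished from an empty one.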
I found that `ssz-size` and `ssz-max` are **de facto** standard struct tags in the Go SSZ world. I'm not sure where they originated, but my guess is that the first `sszgen` tool (which is [`fastssz`](https://github.com/ferranbt/fastssz)) introduced them. Therefore, the [first concern](https://hackmd.io/@junsong/B1v3x4wwle#Limitation-amp-Future-Works) seems to be resolved, unless a new sszgen library adopts a different convention.
This write-up explains how these tags should be interpreted when distinguishing and processing the `List` and `Vector` SSZ types, taking an inductive approach that works from parser code and concrete examples.
## Background: How are `*.pb.go` files generated? (And how does `sszgen` work?)
> [!Note]
> This part describes how scripts like `update-go-pbs.sh` produce the auto-generated files. You may skip this section and [go ahead](#Tags-ssz-size-and-ssz-max).
```protobuf!
message BeaconStateElectra {
  // Versioning [1001-2000]
  uint64 genesis_time = 1001;
  bytes genesis_validators_root = 1002 [ (ethereum.eth.ext.ssz_size) = "32" ];
  uint64 slot = 1003 [
    (ethereum.eth.ext.cast_type) =
        "github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Slot"
  ];
  Fork fork = 1004;
  // History [2001-3000]
  BeaconBlockHeader latest_block_header = 2001;
  repeated bytes block_roots = 2002
      [ (ethereum.eth.ext.ssz_size) = "block_roots.size" ];
  repeated bytes state_roots = 2003
      [ (ethereum.eth.ext.ssz_size) = "state_roots.size" ];
  repeated bytes historical_roots = 2004 [
    (ethereum.eth.ext.ssz_size) = "?,32",
    (ethereum.eth.ext.ssz_max) = "16777216"
  ];
  // and so on...
}
```
To add or modify a data type used in consensus, you first need to change the `*.proto` file. Here, a `*.proto` file is a **template**: the necessary parts are substituted with config-specific values at build time. You might notice the `ethereum.eth.ext.{cast_type,ssz_max,ssz_size}` custom options above.
The shell script [`update-go-pbs.sh`](https://github.com/OffchainLabs/prysm/blob/develop/hack/update-go-pbs.sh) is responsible for building `*.pb.go` files from `*.proto` files. You can find the custom Bazel rule in [`proto/ssz_proto_library.bzl`](https://github.com/OffchainLabs/prysm/blob/develop/proto/ssz_proto_library.bzl). Since mainnet and minimal have different configs, each has its own substitution map:
```python!
def _ssz_proto_files_impl(ctx):
    """
    ssz_proto_files implementation performs expand_template based on the value of "config".
    """
    outputs = []
    if (ctx.attr.config.lower() == "mainnet"):
        subs = mainnet
    elif (ctx.attr.config.lower() == "minimal"):
        subs = minimal
    else:
        fail("%s is an unknown configuration" % ctx.attr.config)
    for src in ctx.attr.srcs:
        output = ctx.actions.declare_file(src.files.to_list()[0].basename)
        outputs.append(output)
        ctx.actions.expand_template(
            template = src.files.to_list()[0],
            output = output,
            substitutions = subs,  # Here's where substitution takes place.
        )
    return [DefaultInfo(files = depset(outputs))]
```
For example, in the `mainnet` setting, `block_roots.size` is replaced with `"8192,32"`,
```python!
mainnet = {
    "block_roots.size": "8192,32",  # SLOTS_PER_HISTORICAL_ROOT, [32]byte
    # ...
}
```
which results in the following auto-generated struct:
```go!
type BeaconStateElectra struct {
    state protoimpl.MessageState
    sizeCache protoimpl.SizeCache
    unknownFields protoimpl.UnknownFields
    GenesisTime uint64 `protobuf:"varint,1001,opt,name=genesis_time,json=genesisTime,proto3" json:"genesis_time,omitempty"`
    GenesisValidatorsRoot []byte `protobuf:"bytes,1002,opt,name=genesis_validators_root,json=genesisValidatorsRoot,proto3" json:"genesis_validators_root,omitempty" ssz-size:"32"`
    Slot github_com_OffchainLabs_prysm_v6_consensus_types_primitives.Slot `protobuf:"varint,1003,opt,name=slot,proto3" json:"slot,omitempty" cast-type:"github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Slot"`
    Fork *Fork `protobuf:"bytes,1004,opt,name=fork,proto3" json:"fork,omitempty"`
    LatestBlockHeader *BeaconBlockHeader `protobuf:"bytes,2001,opt,name=latest_block_header,json=latestBlockHeader,proto3" json:"latest_block_header,omitempty"`
    BlockRoots [][]byte `protobuf:"bytes,2002,rep,name=block_roots,json=blockRoots,proto3" json:"block_roots,omitempty" ssz-size:"8192,32"`
    StateRoots [][]byte `protobuf:"bytes,2003,rep,name=state_roots,json=stateRoots,proto3" json:"state_roots,omitempty" ssz-size:"8192,32"`
    HistoricalRoots [][]byte `protobuf:"bytes,2004,rep,name=historical_roots,json=historicalRoots,proto3" json:"historical_roots,omitempty" ssz-max:"16777216" ssz-size:"?,32"`
    // ...
}
```
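To make the template step concrete, the sketch below mimics the substitution in plain Go. It assumes, as a simplification, that `expand_template` boils down to literal string replacement of each key. `expandTemplate` is a hypothetical helper that belongs to neither Prysm nor Bazel, and the map only contains the two `mainnet` values visible in the struct above.

```go
package main

import (
    "fmt"
    "sort"
    "strings"
)

// mainnet holds a small subset of the substitution map.
var mainnet = map[string]string{
    "block_roots.size": "8192,32",
    "state_roots.size": "8192,32",
}

// expandTemplate replaces every occurrence of each key with its value.
// Keys are sorted first so the result never depends on map iteration order.
func expandTemplate(template string, subs map[string]string) string {
    keys := make([]string, 0, len(subs))
    for k := range subs {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    pairs := make([]string, 0, 2*len(keys))
    for _, k := range keys {
        pairs = append(pairs, k, subs[k])
    }
    return strings.NewReplacer(pairs...).Replace(template)
}

func main() {
    line := `repeated bytes block_roots = 2002 [ (ethereum.eth.ext.ssz_size) = "block_roots.size" ];`
    fmt.Println(expandTemplate(line, mainnet))
}
```

Only the full key `block_roots.size` is replaced; the field name `block_roots` is left untouched because it is not a key of the map.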
Once all `*.pb.go` files are ready, we can run another script, `update-go-ssz.sh`, which generates `*.ssz.go` files from the `*.pb.go` files. The `*.ssz.go` files implement the `ssz.Marshaler` and `ssz.Unmarshaler` interfaces and rely on [ferranbt/fastssz](https://github.com/ferranbt/fastssz). (NOTE: The team will replace `fastssz` with `methodical-ssz`. See the related [issue](https://github.com/OffchainLabs/prysm/issues/15398) and [comment](https://github.com/OffchainLabs/prysm/pull/15453#issuecomment-3084667639).)
---
## Tags: `ssz-size` and `ssz-max`
These two tags mostly indicate whether a homogeneous collection type is a `List` or a `Vector`. A `List` has a variable length bounded by a limit, while a `Vector` has a fixed length. The two are (de)serialized and merkleized differently.
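To show why the distinction matters, here is a minimal sketch of merkleization for a `uint64` collection, assuming the rules from the SSZ spec: a `Vector` tree is sized by its own chunk count, while a `List` tree is sized by its limit and then has its length mixed in. All function names are made up for this example, and the zero-padding is naive (fine for tiny limits, unusable for limits like `16777216`).

```go
package main

import (
    "crypto/sha256"
    "encoding/binary"
    "fmt"
)

// pack serializes uint64 values (little-endian) into zero-padded 32-byte chunks.
func pack(values []uint64) [][32]byte {
    buf := make([]byte, 8*len(values))
    for i, v := range values {
        binary.LittleEndian.PutUint64(buf[8*i:], v)
    }
    chunks := make([][32]byte, (len(buf)+31)/32)
    for i := range chunks {
        copy(chunks[i][:], buf[32*i:])
    }
    return chunks
}

// hashPair returns sha256(a || b).
func hashPair(a, b [32]byte) [32]byte {
    var pair [64]byte
    copy(pair[:32], a[:])
    copy(pair[32:], b[:])
    return sha256.Sum256(pair[:])
}

// merkleize pads with zero chunks up to the next power of two of chunkLimit
// and hashes pairwise up to the root. Assumes len(chunks) <= chunkLimit.
func merkleize(chunks [][32]byte, chunkLimit int) [32]byte {
    width := 1
    for width < chunkLimit {
        width *= 2
    }
    layer := make([][32]byte, width)
    copy(layer, chunks)
    for len(layer) > 1 {
        next := make([][32]byte, len(layer)/2)
        for i := range next {
            next[i] = hashPair(layer[2*i], layer[2*i+1])
        }
        layer = next
    }
    return layer[0]
}

// vectorRoot treats values as Vector[uint64, len(values)].
func vectorRoot(values []uint64) [32]byte {
    chunks := pack(values)
    return merkleize(chunks, len(chunks))
}

// listRoot treats values as List[uint64, limit]: the tree is sized by the
// limit, and the actual length is mixed into the root.
func listRoot(values []uint64, limit int) [32]byte {
    chunkLimit := (limit*8 + 31) / 32
    root := merkleize(pack(values), chunkLimit)
    var length [32]byte
    binary.LittleEndian.PutUint64(length[:8], uint64(len(values)))
    return hashPair(root, length)
}

func main() {
    values := []uint64{1, 2, 3, 4}
    fmt.Printf("Vector[uint64, 4]: %x\n", vectorRoot(values))
    fmt.Printf("List[uint64, 4]:   %x\n", listRoot(values, 4))
    fmt.Printf("List[uint64, 8]:   %x\n", listRoot(values, 8))
}
```

The same four values produce three different roots, which is why a parser has to recover both the kind (`List` or `Vector`) and the exact limit from the tags.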
### Explanation with code
I checked the tag parser code of three projects (see [References](#References)), and [`extractSSZDimensions`](https://github.com/OffchainLabs/methodical-ssz/blob/acb1236eb24527b0cd15b6ca3d007a33461ddd63/sszgen/tagparse.go#L81-L122) from the `methodical-ssz` project seems to explain these tags best.
```go=
func extractSSZDimensions(tag string) ([]*SSZDimension, error) {
    tp := &TagParser{}
    tp.Init(tag)
    tags := tp.GetSSZTags()
    szStr, sizeDefined := tags["ssz-size"]
    sizes := strings.Split(szStr, ",")
    maxStr, maxDefined := tags["ssz-max"]
    dims := make([]*SSZDimension, 0)
    maxes := strings.Split(maxStr, ",")
    if !sizeDefined {
        if !maxDefined {
            return nil, fmt.Errorf("no ssz-size or ssz-max tags found for element")
        }
        for _, m := range maxes {
            max, err := strconv.Atoi(m)
            if err != nil {
                return nil, errors.Wrapf(err, "error parsing ssz-size=%s, ssz-max=%s", szStr, maxStr)
            }
            dims = append(dims, &SSZDimension{ListLength: &max})
        }
        return dims, nil
    }
    for i := 0; i < len(sizes); i++ {
        if sizes[i] == "?" {
            if len(maxes) <= i {
                return nil, fmt.Errorf("more than one wildcard in ssz-size, or ssz-max undefined in tag %s", tag)
            }
            max, err := strconv.Atoi(maxes[i])
            if err != nil {
                return nil, err
            }
            dims = append(dims, &SSZDimension{ListLength: &max})
        } else {
            vsize, err := strconv.Atoi(sizes[i])
            if err != nil {
                return nil, err
            }
            dims = append(dims, &SSZDimension{VectorLength: &vsize})
        }
    }
    return dims, nil
}
type SSZDimension struct {
    VectorLength *int
    ListLength *int
}
```
Since SSZ lists and vectors can be nested to any depth, `extractSSZDimensions` returns a slice of `*SSZDimension`. `SSZDimension` works like an enum that describes either a `List` or a `Vector`. As lines 6 and 9 show, each tag value is split on commas, with one entry per dimension. In principle a tag can carry any number of comma-separated entries, but in practice it has at most two dimensions.
Line 12 implies that at least one of `ssz-size` or `ssz-max` has to be present for a homogeneous collection type.
Lines 14~21 handle the case where only `ssz-max` is provided. For a `List` type, `ssz-size` can be omitted, but `ssz-max` must be provided because the limit of a `List` is critical for SSZ operations.
If `ssz-size` is provided, each entry is either a wildcard or an actual size. A wildcard (`?`) entry (lines 25~32) is treated as a `List` dimension whose limit is taken from the `ssz-max` entry at the same index. A non-wildcard entry yields a `Vector` dimension of that length.
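The rules above can be tried out with a self-contained sketch. This is a simplified re-implementation for illustration, not the `methodical-ssz` code: `dim` and `parseDims` are hypothetical names, the function takes the raw tag values directly (an empty string meaning "tag absent") instead of going through `TagParser`, and it mirrors the index-based lookup of `ssz-max` entries.

```go
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// dim is a hypothetical, simplified counterpart of SSZDimension.
type dim struct {
    isList bool // true: List (n is the limit); false: Vector (n is the length)
    n      int
}

func (d dim) String() string {
    if d.isList {
        return fmt.Sprintf("List(limit=%d)", d.n)
    }
    return fmt.Sprintf("Vector(len=%d)", d.n)
}

// parseDims converts raw ssz-size / ssz-max values into one dim per dimension.
func parseDims(sszSize, sszMax string) ([]dim, error) {
    if sszSize == "" && sszMax == "" {
        return nil, fmt.Errorf("no ssz-size or ssz-max tag")
    }
    var maxes []string
    if sszMax != "" {
        maxes = strings.Split(sszMax, ",")
    }
    sizes := strings.Split(sszSize, ",")
    if sszSize == "" {
        // Only ssz-max: every entry is a List dimension, as if the size were "?".
        sizes = make([]string, len(maxes))
        for i := range sizes {
            sizes[i] = "?"
        }
    }
    dims := make([]dim, 0, len(sizes))
    for i, s := range sizes {
        raw, isList := s, false
        if s == "?" {
            if i >= len(maxes) {
                return nil, fmt.Errorf("wildcard at dimension %d has no ssz-max entry", i)
            }
            raw, isList = maxes[i], true
        }
        n, err := strconv.Atoi(raw)
        if err != nil {
            return nil, fmt.Errorf("dimension %d: %w", i, err)
        }
        dims = append(dims, dim{isList: isList, n: n})
    }
    return dims, nil
}

func main() {
    examples := [][2]string{
        {"32", ""},                     // genesis_validators_root
        {"?,32", "16777216"},           // historical_roots
        {"?,?", "1048576,1073741824"}, // transactions
        {"", "16"},                     // only ssz-max
    }
    for _, e := range examples {
        dims, err := parseDims(e[0], e[1])
        fmt.Printf("ssz-size=%q ssz-max=%q -> %v (err: %v)\n", e[0], e[1], dims, err)
    }
}
```

For `historical_roots`, this yields a `List` dimension followed by a `Vector` dimension, that is, a list of 32-byte vectors.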
### Concrete Examples with `consensus-specs`
#### `ExecutionRequests`: `List` types.
```python
class ExecutionRequests(Container):
    deposits: List[DepositRequest, MAX_DEPOSIT_REQUESTS_PER_PAYLOAD]  # MAX_DEPOSIT_REQUESTS_PER_PAYLOAD = 8192
    withdrawals: List[WithdrawalRequest, MAX_WITHDRAWAL_REQUESTS_PER_PAYLOAD]  # MAX_WITHDRAWAL_REQUESTS_PER_PAYLOAD = 16
    consolidations: List[ConsolidationRequest, MAX_CONSOLIDATION_REQUESTS_PER_PAYLOAD]  # MAX_CONSOLIDATION_REQUESTS_PER_PAYLOAD = 2
```
This container is represented by the following protobuf-generated Go struct (some fields and tags are omitted for brevity):
```go!
type ExecutionRequests struct {
    Deposits       []*DepositRequest       `ssz-max:"8192"`
    Withdrawals    []*WithdrawalRequest    `ssz-max:"16"`
    Consolidations []*ConsolidationRequest `ssz-max:"2"`
}
```
Only `ssz-max` tags are present because all fields are `List` types.
#### `ExecutionPayload` (since Deneb)
```python
class ExecutionPayload(Container):
    parent_hash: Hash32  # Hash32 = Bytes32
    fee_recipient: ExecutionAddress  # ExecutionAddress = Bytes20
    state_root: Bytes32
    receipts_root: Bytes32
    logs_bloom: ByteVector[BYTES_PER_LOGS_BLOOM]  # ByteVector[N] = BytesN. BYTES_PER_LOGS_BLOOM = 256
    prev_randao: Bytes32
    block_number: uint64
    gas_limit: uint64
    gas_used: uint64
    timestamp: uint64
    extra_data: ByteList[MAX_EXTRA_DATA_BYTES]  # ByteList[N] = List[byte, N]. MAX_EXTRA_DATA_BYTES = 32
    base_fee_per_gas: uint256
    block_hash: Hash32  # Hash32 = Bytes32
    # Transaction = ByteList[MAX_BYTES_PER_TRANSACTION]. MAX_BYTES_PER_TRANSACTION = 1073741824
    transactions: List[Transaction, MAX_TRANSACTIONS_PER_PAYLOAD]  # MAX_TRANSACTIONS_PER_PAYLOAD = 1048576
    withdrawals: List[Withdrawal, MAX_WITHDRAWALS_PER_PAYLOAD]  # MAX_WITHDRAWALS_PER_PAYLOAD = 16
    blob_gas_used: uint64
    excess_blob_gas: uint64
```
This container is represented by the following protobuf-generated Go struct (some fields and tags are omitted for brevity):
```go!
type ExecutionPayloadDeneb struct {
    ParentHash    []byte        `ssz-size:"32"`
    FeeRecipient  []byte        `ssz-size:"20"`
    StateRoot     []byte        `ssz-size:"32"`
    ReceiptsRoot  []byte        `ssz-size:"32"`
    LogsBloom     []byte        `ssz-size:"256"`
    PrevRandao    []byte        `ssz-size:"32"`
    BlockNumber   uint64
    GasLimit      uint64
    GasUsed       uint64
    Timestamp     uint64
    ExtraData     []byte        `ssz-max:"32"`
    BaseFeePerGas []byte        `ssz-size:"32"`
    BlockHash     []byte        `ssz-size:"32"`
    Transactions  [][]byte      `ssz-max:"1048576,1073741824" ssz-size:"?,?"`
    Withdrawals   []*Withdrawal `ssz-max:"16"`
    BlobGasUsed   uint64
    ExcessBlobGas uint64
}
```
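There are a lot of tags here, but the tagged fields fall into three categories:
1. Only an `ssz-size` tag (e.g., `parent_hash`, `logs_bloom`) implies a `Vector` type. (`BytesN` is an alias of `Vector[byte, N]`.) Note that `base_fee_per_gas` is a `uint256` in the spec but is stored as a 32-byte slice here, so it carries the same tag.
2. Only an `ssz-max` tag (e.g., `extra_data`, `withdrawals`) implies a `List` type.
3. `transactions` has both tags, and its `ssz-size` contains only wildcards (`?`). This is because `transactions` is a `List` of `List`s: the outer limit (`1048576`) and the inner limit (`1073741824`) both come from `ssz-max`.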
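The three categories above can be checked mechanically by walking a struct's fields. The sketch below is only an illustration: `payload` is a hypothetical excerpt of `ExecutionPayloadDeneb`, `categorize` and `allWildcards` are made-up helpers, and the extra "mixed" label covers fields such as `historical_roots` (`ssz-size:"?,32"`) that combine a `List` dimension with a `Vector` dimension.

```go
package main

import (
    "fmt"
    "reflect"
    "strings"
)

// payload is a hypothetical excerpt of ExecutionPayloadDeneb.
type payload struct {
    ParentHash   []byte   `ssz-size:"32"`
    BlockNumber  uint64
    ExtraData    []byte   `ssz-max:"32"`
    Transactions [][]byte `ssz-max:"1048576,1073741824" ssz-size:"?,?"`
}

// allWildcards reports whether every comma-separated entry is "?".
func allWildcards(size string) bool {
    for _, part := range strings.Split(size, ",") {
        if part != "?" {
            return false
        }
    }
    return true
}

// categorize maps tag presence to a category label.
func categorize(tag reflect.StructTag) string {
    size, hasSize := tag.Lookup("ssz-size")
    _, hasMax := tag.Lookup("ssz-max")
    switch {
    case hasSize && !hasMax:
        return "Vector"
    case !hasSize && hasMax:
        return "List"
    case hasSize && hasMax && allWildcards(size):
        return "List in every dimension"
    case hasSize && hasMax:
        return "mixed List/Vector dimensions"
    default:
        return "no ssz-size or ssz-max tag"
    }
}

func main() {
    t := reflect.TypeOf(payload{})
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        fmt.Printf("%-12s %s\n", f.Name, categorize(f.Tag))
    }
}
```

Tag presence alone only identifies the category; the per-dimension sizes and limits still have to be parsed from the tag values.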
## What's next?
At the moment, my [PoC](https://hackmd.io/@junsong/B1v3x4wwle) doesn't support the `List` type. Inferring variable-size types is quite tricky, so I spent some hours deepening my understanding of this issue.
Now it's time to write some code that parses variable-size types such as `List`!
---
## References
I read the tag parser code of these projects:
- [ferranbt/fastssz](https://github.com/ferranbt/fastssz)
- [karalabe/ssz](https://github.com/karalabe/ssz)
- [OffchainLabs/methodical-ssz](https://github.com/OffchainLabs/methodical-ssz)