# Struct Tags used in `sszgen`

{%preview https://hackmd.io/@junsong/B1v3x4wwle %}

I'm implementing SSZ-QL on top of the Prysm codebase, and chose to **preprocess structs that come from automatically generated `*.pb.go` files**. The `PreCalculateSSZInfo` function relies heavily on [Go Struct Tags](https://go.dev/ref/spec#Struct_types) like `json`, `ssz-size` and `ssz-max`. For example, the `historical_roots` field in `BeaconState` has the following tag:

```!
`protobuf:"bytes,2004,rep,name=historical_roots,json=historicalRoots,proto3" json:"historical_roots,omitempty" ssz-max:"16777216" ssz-size:"?,32"`
```

I found that the `ssz-size` and `ssz-max` tags are **de facto** standard struct tags in the Go & SSZ world. I'm not sure where they originated, but I guess the first `sszgen` tool (which is [`fastssz`](https://github.com/ferranbt/fastssz)) introduced them. Therefore it seems the [first concern](https://hackmd.io/@junsong/B1v3x4wwle#Limitation-amp-Future-Works) is resolved, unless a new sszgen library introduces a different method.

This write-up explains how these tags should be interpreted when distinguishing and processing `List` and `Vector` SSZ types, using an inductive approach.

## Background: How are `*.pb.go` files generated? (And how does `sszgen` work?)

> [!Note]
> This part describes how scripts like `update-go-pbs.sh` synthesize the auto-generated files. You may skip this section and [go ahead](#Tags-ssz-size-and-ssz-max).

```protobuf!
message BeaconStateElectra {
  // Versioning [1001-2000]
  uint64 genesis_time = 1001;
  bytes genesis_validators_root = 1002 [ (ethereum.eth.ext.ssz_size) = "32" ];
  uint64 slot = 1003 [
    (ethereum.eth.ext.cast_type) =
        "github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Slot"
  ];
  Fork fork = 1004;

  // History [2001-3000]
  BeaconBlockHeader latest_block_header = 2001;
  repeated bytes block_roots = 2002
      [ (ethereum.eth.ext.ssz_size) = "block_roots.size" ];
  repeated bytes state_roots = 2003
      [ (ethereum.eth.ext.ssz_size) = "state_roots.size" ];
  repeated bytes historical_roots = 2004 [
    (ethereum.eth.ext.ssz_size) = "?,32",
    (ethereum.eth.ext.ssz_max) = "16777216"
  ];

  // and so on...
}
```

To add or modify a data type used in consensus, you need to change the `*.proto` file first. A `*.proto` file is a **template**: certain placeholders in it are substituted with config-specific values. You might notice the `ethereum.eth.ext.{cast_type,ssz_max,ssz_size}` directives above.

A shell script, [`update-go-pbs.sh`](https://github.com/OffchainLabs/prysm/blob/develop/hack/update-go-pbs.sh), is responsible for building `*.pb.go` files from `*.proto` files. You can find a custom Bazel rule in [`proto/ssz_proto_library.bzl`](https://github.com/OffchainLabs/prysm/blob/develop/proto/ssz_proto_library.bzl). As mainnet and minimal have different configs, their substitution maps differ as well:

```python!
def _ssz_proto_files_impl(ctx):
    """
    ssz_proto_files implementation performs expand_template based on the value of "config".
    """
    outputs = []
    if (ctx.attr.config.lower() == "mainnet"):
        subs = mainnet
    elif (ctx.attr.config.lower() == "minimal"):
        subs = minimal
    else:
        fail("%s is an unknown configuration" % ctx.attr.config)

    for src in ctx.attr.srcs:
        output = ctx.actions.declare_file(src.files.to_list()[0].basename)
        outputs.append(output)
        ctx.actions.expand_template(
            template = src.files.to_list()[0],
            output = output,
            substitutions = subs,  # Here's where substitution takes place.
        )

    return [DefaultInfo(files = depset(outputs))]
```

For example, under the `mainnet` setting, `block_roots.size` is replaced with `"8192,32"`,

```python!
mainnet = {
    "block_roots.size": "8192,32",  # SLOTS_PER_HISTORICAL_ROOT, [32]byte
    # ...
}
```

which results in the following auto-generated struct.

```go!
type BeaconStateElectra struct {
	state         protoimpl.MessageState
	sizeCache     protoimpl.SizeCache
	unknownFields protoimpl.UnknownFields

	GenesisTime           uint64                                                           `protobuf:"varint,1001,opt,name=genesis_time,json=genesisTime,proto3" json:"genesis_time,omitempty"`
	GenesisValidatorsRoot []byte                                                           `protobuf:"bytes,1002,opt,name=genesis_validators_root,json=genesisValidatorsRoot,proto3" json:"genesis_validators_root,omitempty" ssz-size:"32"`
	Slot                  github_com_OffchainLabs_prysm_v6_consensus_types_primitives.Slot `protobuf:"varint,1003,opt,name=slot,proto3" json:"slot,omitempty" cast-type:"github.com/OffchainLabs/prysm/v6/consensus-types/primitives.Slot"`
	Fork                  *Fork                                                            `protobuf:"bytes,1004,opt,name=fork,proto3" json:"fork,omitempty"`
	LatestBlockHeader     *BeaconBlockHeader                                               `protobuf:"bytes,2001,opt,name=latest_block_header,json=latestBlockHeader,proto3" json:"latest_block_header,omitempty"`
	BlockRoots            [][]byte                                                         `protobuf:"bytes,2002,rep,name=block_roots,json=blockRoots,proto3" json:"block_roots,omitempty" ssz-size:"8192,32"`
	StateRoots            [][]byte                                                         `protobuf:"bytes,2003,rep,name=state_roots,json=stateRoots,proto3" json:"state_roots,omitempty" ssz-size:"8192,32"`
	HistoricalRoots       [][]byte                                                         `protobuf:"bytes,2004,rep,name=historical_roots,json=historicalRoots,proto3" json:"historical_roots,omitempty" ssz-max:"16777216" ssz-size:"?,32"`
	// ...
}
```

Once all `*.pb.go` files are ready, we can run another script, `update-go-ssz.sh`, which generates `*.ssz.go` files from the `*.pb.go` files. The `*.ssz.go` files implement the `ssz.Marshaler` and `ssz.Unmarshaler` interfaces, and rely on [ferranbt/fastssz](https://github.com/ferranbt/fastssz).
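
Since the annotations on the generated struct above are ordinary Go struct tags, they can be read back at runtime with the standard `reflect` package. The following is only a minimal sketch of that idea: `sampleState` is a hypothetical, trimmed-down stand-in for the generated struct, and `sszTag` / `readSSZTags` are illustrative names that do not exist in Prysm or in any sszgen tool.

```go
package main

import (
	"fmt"
	"reflect"
)

// sampleState is a hypothetical stand-in for a generated *.pb.go struct.
// Only the SSZ-related tags are kept; field types mirror the original ones.
type sampleState struct {
	GenesisValidatorsRoot []byte   `ssz-size:"32"`
	BlockRoots            [][]byte `ssz-size:"8192,32"`
	HistoricalRoots       [][]byte `ssz-max:"16777216" ssz-size:"?,32"`
	Slot                  uint64
}

// sszTag holds the raw (unparsed) tag values of one field.
// The Has* flags distinguish "tag absent" from "tag present but empty".
type sszTag struct {
	Size    string
	HasSize bool
	Max     string
	HasMax  bool
}

// readSSZTags collects the raw ssz-size and ssz-max tag values of every
// field, keyed by Go field name. It assumes v is a struct value (not a
// pointer); reflect would panic otherwise.
func readSSZTags(v interface{}) map[string]sszTag {
	t := reflect.TypeOf(v)
	out := make(map[string]sszTag, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		var tag sszTag
		tag.Size, tag.HasSize = f.Tag.Lookup("ssz-size")
		tag.Max, tag.HasMax = f.Tag.Lookup("ssz-max")
		out[f.Name] = tag
	}
	return out
}

func main() {
	tags := readSSZTags(sampleState{})
	for _, name := range []string{"GenesisValidatorsRoot", "BlockRoots", "HistoricalRoots", "Slot"} {
		fmt.Printf("%-22s %+v\n", name, tags[name])
	}
}
```

The sketch deliberately stops at the raw strings; how the comma-separated values map to `List` and `Vector` is a separate question.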

(NOTE: The team will replace `fastssz` with `methodical-ssz`. See the related [issue](https://github.com/OffchainLabs/prysm/issues/15398) and [comment](https://github.com/OffchainLabs/prysm/pull/15453#issuecomment-3084667639).)

---

## Tags: `ssz-size` and `ssz-max`

These two tags mostly indicate whether a homogeneous collection type is a `List` or a `Vector`. The former is a variable-sized type and the latter is a fixed-sized type, and the two are (de)serialized and merkleized differently.

### Explain with code

I checked the tag parser code of three projects (see [References](#References)), and [`extractSSZDimensions`](https://github.com/OffchainLabs/methodical-ssz/blob/acb1236eb24527b0cd15b6ca3d007a33461ddd63/sszgen/tagparse.go#L81-L122) from the `methodical-ssz` project seems the best one for explaining those tags.

```go=
func extractSSZDimensions(tag string) ([]*SSZDimension, error) {
	tp := &TagParser{}
	tp.Init(tag)
	tags := tp.GetSSZTags()
	szStr, sizeDefined := tags["ssz-size"]
	sizes := strings.Split(szStr, ",")
	maxStr, maxDefined := tags["ssz-max"]
	dims := make([]*SSZDimension, 0)
	maxes := strings.Split(maxStr, ",")
	if !sizeDefined {
		if !maxDefined {
			return nil, fmt.Errorf("no ssz-size or ssz-max tags found for element")
		}
		for _, m := range maxes {
			max, err := strconv.Atoi(m)
			if err != nil {
				return nil, errors.Wrapf(err, "error parsing ssz-size=%s, ssz-max=%s", szStr, maxStr)
			}
			dims = append(dims, &SSZDimension{ListLength: &max})
		}
		return dims, nil
	}
	for i := 0; i < len(sizes); i++ {
		if sizes[i] == "?" {
			if len(maxes) <= i {
				return nil, fmt.Errorf("more than one wildcard in ssz-size, or ssz-max undefined in tag %s", tag)
			}
			max, err := strconv.Atoi(maxes[i])
			if err != nil {
				return nil, err
			}
			dims = append(dims, &SSZDimension{ListLength: &max})
		} else {
			vsize, err := strconv.Atoi(sizes[i])
			if err != nil {
				return nil, err
			}
			dims = append(dims, &SSZDimension{VectorLength: &vsize})
		}
	}
	return dims, nil
}

type SSZDimension struct {
	VectorLength *int
	ListLength   *int
}
```

As SSZ supports lists and vectors of any dimension, `extractSSZDimensions` returns a slice of `*SSZDimension`. `SSZDimension` works like an enum: each value describes either a `List` or a `Vector`.

As lines 6 and 9 show, each tag value is split on commas. In principle a tag can carry any number of comma-separated values, but in practice it has at most two dimensions.

Line 12 implies that either `ssz-size` or `ssz-max` has to be present for a homogeneous collection type.

Lines 14~21 handle the case where only `ssz-max` is provided. For a `List` type, `ssz-size` can be omitted, but `ssz-max` must be provided because the limit of a `List` is critical for SSZ operations.

If `ssz-size` is provided, there are two possible cases for each dimension: a wildcard or an actual size. A wildcard (`?`) dimension (lines 25~32) is treated the same as the `List` case, taking its limit from the corresponding `ssz-max` entry. A non-wildcard dimension is a `Vector`.

### Concrete Examples with `consensus-specs`

#### `ExecutionRequests`: `List` types.

```python
class ExecutionRequests(Container):
    deposits: List[DepositRequest, MAX_DEPOSIT_REQUESTS_PER_PAYLOAD]  # MAX_DEPOSIT_REQUESTS_PER_PAYLOAD = 8192
    withdrawals: List[WithdrawalRequest, MAX_WITHDRAWAL_REQUESTS_PER_PAYLOAD]  # MAX_WITHDRAWAL_REQUESTS_PER_PAYLOAD = 16
    consolidations: List[ConsolidationRequest, MAX_CONSOLIDATION_REQUESTS_PER_PAYLOAD]  # MAX_CONSOLIDATION_REQUESTS_PER_PAYLOAD = 2
```

is represented as the following protobuf struct (some fields and tags are omitted for brevity):

```go!
type ExecutionRequests struct {
	Deposits       []*DepositRequest       `ssz-max:"8192"`
	Withdrawals    []*WithdrawalRequest    `ssz-max:"16"`
	Consolidations []*ConsolidationRequest `ssz-max:"2"`
}
```

Only `ssz-max` tags are present, because all fields are `List` types.

#### `ExecutionPayload` (since Deneb)

```python
class ExecutionPayload(Container):
    parent_hash: Hash32  # Hash32 = Bytes32
    fee_recipient: ExecutionAddress  # ExecutionAddress = Bytes20
    state_root: Bytes32
    receipts_root: Bytes32
    logs_bloom: ByteVector[BYTES_PER_LOGS_BLOOM]  # ByteVector[N] = BytesN. BYTES_PER_LOGS_BLOOM = 256
    prev_randao: Bytes32
    block_number: uint64
    gas_limit: uint64
    gas_used: uint64
    timestamp: uint64
    extra_data: ByteList[MAX_EXTRA_DATA_BYTES]  # ByteList[N] = List[Byte, N]. MAX_EXTRA_DATA_BYTES = 32
    base_fee_per_gas: uint256
    block_hash: Hash32  # Hash32 = Bytes32
    # Transaction = ByteList[MAX_BYTES_PER_TRANSACTION]. MAX_BYTES_PER_TRANSACTION = 1073741824
    transactions: List[Transaction, MAX_TRANSACTIONS_PER_PAYLOAD]  # MAX_TRANSACTIONS_PER_PAYLOAD = 1048576
    withdrawals: List[Withdrawal, MAX_WITHDRAWALS_PER_PAYLOAD]  # MAX_WITHDRAWALS_PER_PAYLOAD = 16
    blob_gas_used: uint64
    excess_blob_gas: uint64
```

is represented as the following protobuf struct (some fields and tags are omitted for brevity):

```go!
type ExecutionPayloadDeneb struct {
	ParentHash    []byte        `ssz-size:"32"`
	FeeRecipient  []byte        `ssz-size:"20"`
	StateRoot     []byte        `ssz-size:"32"`
	ReceiptsRoot  []byte        `ssz-size:"32"`
	LogsBloom     []byte        `ssz-size:"256"`
	PrevRandao    []byte        `ssz-size:"32"`
	BlockNumber   uint64
	GasLimit      uint64
	GasUsed       uint64
	Timestamp     uint64
	ExtraData     []byte        `ssz-max:"32"`
	BaseFeePerGas []byte        `ssz-size:"32"`
	BlockHash     []byte        `ssz-size:"32"`
	Transactions  [][]byte      `ssz-max:"1048576,1073741824" ssz-size:"?,?"`
	Withdrawals   []*Withdrawal `ssz-max:"16"`
	BlobGasUsed   uint64
	ExcessBlobGas uint64
}
```

There are a lot of tags here, but the tagged fields fall into three categories:

1. An `ssz-size` tag alone (e.g., `parent_hash`, `logs_bloom`) implies a `Vector` type.
   (`BytesN` is an alias of `Vector[byte, N]`.)
2. An `ssz-max` tag alone (e.g., `extra_data`, `withdrawals`) implies a `List` type.
3. `transactions` has both tags, and its `ssz-size` contains only wildcards (`?`). This is because the type of `transactions` is a `List` of `List`s.

## What's next?

At this moment, my [PoC](https://hackmd.io/@junsong/B1v3x4wwle) doesn't support the `List` type. Inferring variable-sized types is quite tricky, so I spent some hours deepening my understanding of this issue. Now it's time to write some code that parses variable-sized types such as `List`!

---

## References

I read the tag parser code of these projects:

- [ferranbt/fastssz](https://github.com/ferranbt/fastssz)
- [karalabe/ssz](https://github.com/karalabe/ssz)
- [OffchainLabs/methodical-ssz](https://github.com/OffchainLabs/methodical-ssz)