Serialization Overview for Ethereum CL Devs

From a CL developer perspective, there are three serialization formats for consensus structures (e.g. BeaconBlock, ExecutionPayload):

| Format | Primary Use | Primary Goal | Endianness | Specs |
|---|---|---|---|---|
| SSZ | P2P comms & hashing | Compact on the wire, bijective | Little-endian | ethereum/consensus-specs |
| CL JSON | Beacon Node HTTP API | Human readable | N/A | ethereum/beacon-apis |
| EL JSON | CL <> EL comms | Human readable | Big-endian | ethereum/execution-apis |

Note: RLP has been omitted since it's not very relevant for CL devs. RLP is the EL equivalent of SSZ: a compact encoding designed for the P2P network.

A Brief Chronology

The "EL JSON" format is the oldest of the three; I believe it already existed when Ethereum launched in 2015. It's the encoding used in the classic "Ethereum JSON RPC" (e.g. what geth attach gives). The EL JSON format is designed to be human readable so users can type stuff into the JSON RPC console and read the answers they get back.

Next came SSZ in 2018/2019 during the specification of the Beacon Chain. The goal of SSZ is to be simple and compact. It should be simple so it's easy to grasp conceptually and implement correctly. It should be compact because it's used for transmitting blocks/attestations over the P2P network where there's lots of broadcast amplification. SSZ is not human readable; it's impractical to type out an SSZ message by hand or to read one back. It's just raw bytes without field names. (Fun fact: Danny and Vitalik are conversationally fluent in SSZ and use it to converse privately during ACD calls.)

After (or around the same time as) SSZ came "CL JSON". It was created as the method of comms for consensus layer APIs (e.g., the Beacon Node HTTP API). It aims to be human readable, so anyone (who reads English) can curl things from the BN API and read the response. CL JSON doesn't care much about compactness since it's intended for local comms that happen via a LAN rather than P2P comms across the Internet.

Why are EL and CL JSON different?

It seemed like a good idea at the time 😬. It wasn't clear that we'd end up with the EL+CL ecosystem back then.

CL JSON was an attempt to address some oddities in EL JSON and start fresh with an improved standard.

Differences

There are many different types one can communicate via these three serialization formats: structs, lists, integers, byte-arrays, etc. The EL and CL JSON formats are very close, whilst the SSZ encoding is very different.

I don't think we should describe all types in this document. Instead, I'll focus on the two I think are the most confusing: integers and byte-arrays.

First, I'll present a table of differences for each type and then I'll show some examples to demonstrate these differences.

Differences: Integers

| Format | Integer Representation | Representation of `Uint64(1337)` |
|---|---|---|
| SSZ | Little-endian bytes | `[57, 5, 0, 0, 0, 0, 0, 0]` (decimal bytes; `0x39, 0x05, ...` in hex) |
| CL JSON | Decimal string | `"1337"` |
| EL JSON | Hexadecimal, big-endian string with leading zeros stripped | `"0x539"` |
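As a rough illustration, the three integer encodings in the table can be reproduced with nothing but Python's standard library. This is a sketch, not a spec-compliant encoder, and the variable names are invented for the example.

```python
# Sketch: one uint64 value rendered in each of the three formats.
value = 1337

# SSZ: fixed width (8 bytes for a uint64), little-endian.
ssz_bytes = value.to_bytes(8, byteorder="little")

# CL JSON: base-10 digits wrapped in a JSON string.
cl_json_value = str(value)

# EL JSON (QUANTITY): 0x-prefixed big-endian hex with no leading zeros.
# Python's hex() already emits the most compact form.
el_json_value = hex(value)

print(list(ssz_bytes))  # [57, 5, 0, 0, 0, 0, 0, 0]
print(cl_json_value)    # 1337
print(el_json_value)    # 0x539
```

Note that the SSZ bytes are printed in decimal here: 57 is `0x39` and 5 is `0x05`, which is the big-endian hex `0x0539` read backwards.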

Differences: Byte-Arrays

| Format | Byte-Array Representation | Representation of `List([0, 42])` |
|---|---|---|
| SSZ | Simply an array of bytes | `[0, 42]` |
| CL JSON | 0x-prefixed hexadecimal string | `"0x002a"` |
| EL JSON | 0x-prefixed hexadecimal string | `"0x002a"` |
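The byte-array rows can be checked the same way. The snippet below is a small sketch using only built-in `bytes` methods; the variable names are illustrative.

```python
# Sketch: the two-byte array [0, 42] in each format.
raw = bytes([0, 42])

# SSZ: the bytes themselves, with nothing added.
ssz_bytes = raw

# CL JSON and EL JSON (DATA): 0x-prefixed hex, two hex digits per byte,
# so leading zero bytes are preserved (unlike integers in EL JSON).
json_value = "0x" + raw.hex()

# Decoding reverses the process: drop the prefix, then parse the hex.
decoded = bytes.fromhex(json_value[2:])

print(list(ssz_bytes))  # [0, 42]
print(json_value)       # 0x002a
print(decoded == raw)   # True
```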

Examples

Let's use this imaginary structure to demonstrate:

class SpecialMessage:
    # A sequential integer used to order messages.
    message_id: Uint64
    # Some special message of 8 bytes. Maybe a UTF-8 string, maybe not.
    body: List[Uint8, 8]

Let's instantiate it:


my_message = SpecialMessage(message_id=1337, body=[0,0,0,0,0,0,0,42])

Now we'll view it in the three different formats:

`SpecialMessage` as SSZ


ssz_encode(my_message) == [57, 5, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42]

For readability I'll take the message and split it across lines with comments (all bytes are shown in decimal):

[
    # This is the 64-bit integer 1337 (0x0539) encoded as little-endian.
    57, 5, 0, 0, 0, 0, 0, 0,
    # This is a 4-byte SSZ "offset", saying the variable-length `body` field starts at byte index 12.
    12, 0, 0, 0,
    # Starting at index 12, this is the value of the `body` field.
    0, 0, 0, 0, 0, 0, 0, 42
]

Note that this encoding is schemaless. The SSZ bytes tell you nothing about their own structure; it's assumed that you know what type you're decoding, and therefore where each field starts and ends. This sucks for readability, but clearly reduces the size of the message.

The SSZ message is 20 bytes in total.
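To make the layout concrete, here is a hand-rolled sketch that encodes and decodes `SpecialMessage` only. It is not a general SSZ library: the function names and constants are assumptions made up for this example, and the decoder works only because it already knows the schema.

```python
from typing import Tuple

# Hand-rolled sketch for the SpecialMessage layout; not a general SSZ library.
OFFSET_SIZE = 4               # SSZ offsets are 4-byte little-endian integers
FIXED_SIZE = 8 + OFFSET_SIZE  # uint64 message_id + the offset for body


def encode_special_message(message_id: int, body: bytes) -> bytes:
    """Fixed part (message_id, then the offset of body), followed by body."""
    if len(body) > 8:
        raise ValueError("body is limited to 8 bytes")
    fixed = message_id.to_bytes(8, "little")
    fixed += FIXED_SIZE.to_bytes(OFFSET_SIZE, "little")
    return fixed + body


def decode_special_message(data: bytes) -> Tuple[int, bytes]:
    """Decode using prior knowledge of the schema; the bytes carry no field names."""
    message_id = int.from_bytes(data[0:8], "little")
    offset = int.from_bytes(data[8:12], "little")
    return message_id, data[offset:]


encoded = encode_special_message(1337, bytes([0, 0, 0, 0, 0, 0, 0, 42]))
print(list(encoded))
print(len(encoded))  # 20
print(decode_special_message(encoded)[0])  # 1337
```

The offset is what lets `body` be shorter than 8 bytes: the decoder reads from index 12 to the end of the message, whatever length that turns out to be.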

`SpecialMessage` as CL JSON


cl_json_encode(my_message) == '{"message_id":"1337","body":"0x000000000000002a"}'

Once again, let's split the message across lines and comment it (comments are illegal in JSON, but YOLO):

{
  # An integer presented as a string because JavaScript doesn't
  # natively support 64-bit integers.
  "message_id": "1337",
  # A byte-array represented as `0x`-prefixed hex.
  "body": "0x000000000000002a"
}

The CL JSON message is 49 bytes in total.
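The CL JSON form can be sketched with Python's `json` module. The field handling below is written by hand for this one structure; real clients derive it from the type definitions.

```python
import json

message_id = 1337
body = bytes([0, 0, 0, 0, 0, 0, 0, 42])

cl_json = json.dumps(
    {
        "message_id": str(message_id),  # integers become decimal strings
        "body": "0x" + body.hex(),      # byte-arrays become 0x-prefixed hex
    },
    separators=(",", ":"),  # compact output with no spaces
)

print(cl_json)       # {"message_id":"1337","body":"0x000000000000002a"}
print(len(cl_json))  # 49
```

The 49-byte figure assumes the compact form with no whitespace; pretty-printing the same object makes it larger.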

`SpecialMessage` as EL JSON


el_json_encode(my_message) == '{"message_id":"0x539","body":"0x000000000000002a"}'

The message split across lines for readability (with YOLO comments):

{
  # The decimal integer 1337 represented as `0x`-prefixed big-endian hex,
  # with leading zeros stripped.
  "message_id": "0x539",
  # A byte-array represented as `0x`-prefixed hex.
  "body": "0x000000000000002a"
}

The QUANTITY and DATA encoding formats described in the Encoding section of the Engine API spec are critical to understanding this format. In our example, the QUANTITY encoding is used for message_id whilst the DATA encoding is used for the body. QUANTITY strips leading zeros (so 1337 is `0x539`, not `0x0539`), whereas DATA always uses two hex digits per byte and keeps them.

The EL JSON message is 50 bytes in total.
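The QUANTITY and DATA rules can be sketched as two small helpers. The function names here are hypothetical and chosen for this example; they are not taken from any client or from the Engine API spec itself.

```python
import json


def to_quantity(value: int) -> str:
    """QUANTITY: 0x-prefixed big-endian hex, most compact form (zero is "0x0")."""
    if value < 0:
        raise ValueError("QUANTITY must be non-negative")
    return hex(value)


def to_data(raw: bytes) -> str:
    """DATA: 0x-prefixed hex, exactly two hex digits per byte."""
    return "0x" + raw.hex()


def from_quantity(text: str) -> int:
    """Parse a QUANTITY string back into an integer."""
    return int(text, 16)


body = bytes([0, 0, 0, 0, 0, 0, 0, 42])
el_json = json.dumps(
    {"message_id": to_quantity(1337), "body": to_data(body)},
    separators=(",", ":"),  # compact output with no spaces
)

print(el_json)  # {"message_id":"0x539","body":"0x000000000000002a"}
print(from_quantity("0x539"))  # 1337
```

The only difference from the CL JSON sketch is how the integer is rendered; the byte-array field is identical in both formats.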
