
Serialization Overview for Ethereum CL Devs

From a CL developer's perspective, there are three serialization formats for consensus structures (e.g. BeaconBlock, ExecutionPayload):

| Format | Primary Use | Primary Goal | Endianness | Specs |
|---|---|---|---|---|
| SSZ | P2P comms & hashing | Compact on the wire, bijective | Little-endian | ethereum/consensus-specs |
| CL JSON | Beacon Node HTTP API | Human readable | N/A | ethereum/beacon-apis |
| EL JSON | CL <> EL comms | Human readable | Big-endian | ethereum/execution-apis |

Note: RLP has been omitted since it's not so relevant for CL devs. RLP is the EL equivalent of SSZ: a compact encoding designed for the P2P network.

A Brief Chronology

The "EL JSON" format is the oldest of the three; I believe it existed when Ethereum launched in 2015. It's the encoding used by the classic "Ethereum JSON RPC" (e.g. what geth attach gives you). The EL JSON format is designed to be human readable so users can type stuff into the JSON RPC console and read the answers they get back.

Next came SSZ in 2018/2019 during the specification of the Beacon Chain. The goal of SSZ is to be simple and compact. It should be simple so it's easy to grasp conceptually and implement correctly. It should be compact because it's used for transmitting blocks/attestations over the P2P network, where there's lots of broadcast amplification. SSZ is not human readable; it's impractical to type out an SSZ message and read what you get back. It's just raw bytes without field names. (Fun fact: Danny and Vitalik are conversationally fluent in SSZ and use it to converse privately during ACD calls.)

After (or around the same time as) SSZ came "CL JSON". It was created as the method of comms for consensus layer APIs (e.g., the Beacon Node HTTP API). It aims to be human readable, so anyone (who reads English) can curl things from the BN API and read the response. CL JSON doesn't care much about compactness since it's intended for local comms that happen via a LAN rather than P2P comms across the Internet.

Why are EL and CL JSON different?

It seemed like a good idea at the time 😬. It wasn't clear that we'd end up with the EL+CL ecosystem back then.

CL JSON was an attempt to address some oddities in EL JSON and start fresh with an improved standard.

Differences

There are many different types one can communicate via these three serialization formats: structs, lists, integers, byte-arrays, etc. The EL and CL JSON formats are very close, whilst the SSZ encoding is very different.

I don't think we should describe all types in this document; rather, we'll just focus on what I think are the most confusing: integers and byte-arrays.

First, I'll present a table of differences for each type and then I'll show some examples to demonstrate these differences.

Differences: Integers

| Format | Integer Representation | Representation of Uint64(1337) |
|---|---|---|
| SSZ | Little-endian bytes | [57, 5, 0, 0, 0, 0, 0, 0] |
| CL JSON | Decimal string | "1337" |
| EL JSON | Hexadecimal, big-endian string with leading zeros stripped | "0x539" |
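One way to sanity-check the integer rules is to reproduce them with Python built-ins. This is a minimal sketch: the function names are hypothetical, and it only handles non-negative integers that fit in a uint64. Note that 1337 is 0x539, so its little-endian SSZ bytes are 0x39 (57) and 0x05 (5), and the QUANTITY form has no leading zero.

```python
# Hypothetical helpers illustrating the three integer encodings of a uint64.

def ssz_uint64(n: int) -> bytes:
    # SSZ: fixed width (8 bytes for a uint64), least-significant byte first.
    # int.to_bytes raises OverflowError for negative n or n >= 2**64.
    return n.to_bytes(8, "little")


def cl_json_uint64(n: int) -> str:
    # CL JSON: the decimal digits, carried inside a JSON string.
    return str(n)


def el_json_quantity(n: int) -> str:
    # EL JSON QUANTITY: 0x-prefixed, big-endian hex, no leading zeros.
    # Python's hex() already produces this form, e.g. hex(0) == "0x0".
    if n < 0:
        raise ValueError("QUANTITY must be non-negative")
    return hex(n)


print(list(ssz_uint64(1337)))  # [57, 5, 0, 0, 0, 0, 0, 0]
print(cl_json_uint64(1337))    # 1337
print(el_json_quantity(1337))  # 0x539
```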

Differences: Byte-Arrays

| Format | Byte-Array Representation | Representation of List([0, 42]) |
|---|---|---|
| SSZ | Simply an array of bytes | [0, 42] |
| CL JSON | 0x-prefixed hexadecimal string | "0x002a" |
| EL JSON | 0x-prefixed hexadecimal string | "0x002a" |
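Byte-arrays are simpler, since both JSON flavours agree. A minimal sketch with hypothetical helper names: every byte always becomes exactly two hex digits, so, unlike the integer encoding in EL JSON, leading zero bytes are kept (the length of a byte-array is part of its value).

```python
# Hypothetical helpers illustrating byte-array encodings.

def ssz_bytes(data: bytes) -> bytes:
    # SSZ: the raw bytes, unchanged.
    return bytes(data)


def json_bytes_encode(data: bytes) -> str:
    # CL JSON and EL JSON (DATA): "0x" + two lower-case hex digits per byte.
    return "0x" + data.hex()


def json_bytes_decode(text: str) -> bytes:
    # Inverse of json_bytes_encode; bytes.fromhex rejects odd-length input.
    if not text.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    return bytes.fromhex(text[2:])


example = bytes([0, 42])
print(list(ssz_bytes(example)))    # [0, 42]
print(json_bytes_encode(example))  # 0x002a
```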

Examples

Let's use this imaginary structure to demonstrate:

class SpecialMessage:
    # A sequential integer used to order messages.
    message_id: Uint64
    # A message body of up to 8 bytes. Maybe a UTF-8 string, maybe not.
    body: List[Uint8, 8]

Let's instantiate it:

my_message = SpecialMessage(message_id=1337, body=[0,0,0,0,0,0,0,42])

Now we'll view it in the three different formats:

SpecialMessage as SSZ

ssz_encode(my_message) == [57, 5, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42]

For readability I'll take the message and split it across lines with comments:

[
  # The 64-bit integer 1337 (hex 0x539) as 8 little-endian bytes:
  # 0x39 = 57, then 0x05 = 5, then zero padding.
  57, 5, 0, 0, 0, 0, 0, 0,
  # An SSZ "offset" (a 4-byte little-endian integer) saying that the
  # value of the `body` list starts at byte index 12.
  12, 0, 0, 0,
  # Starting at index 12, this is the value of the `body` field.
  0, 0, 0, 0, 0, 0, 0, 42
]

Note that this message is schemaless. The SSZ encoding tells you nothing about the structure of the bytes; it's assumed that you know what you're decoding, so you know where each field starts and ends. This sucks for readability, but clearly reduces the size of the message.

The SSZ message is 20 bytes in total.
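To make the "you must already know the schema" point concrete, here's a hand-rolled sketch that encodes and decodes just this one container. The dataclass, `ssz_encode` and `ssz_decode` are hypothetical Python stand-ins (not a real SSZ library's API), and they hard-code `SpecialMessage`'s layout: an 8-byte uint64, a 4-byte offset, then the list bytes.

```python
from dataclasses import dataclass

BYTES_PER_OFFSET = 4  # SSZ offsets are 4-byte little-endian integers.
MAX_BODY_LEN = 8      # The limit from List[Uint8, 8].
FIXED_LEN = 8 + BYTES_PER_OFFSET  # uint64 field + offset for the list.


@dataclass
class SpecialMessage:
    message_id: int  # Uint64
    body: bytes      # List[Uint8, 8]


def ssz_encode(msg: SpecialMessage) -> bytes:
    if len(msg.body) > MAX_BODY_LEN:
        raise ValueError("body exceeds the List[Uint8, 8] limit")
    # Fixed-size part: the uint64, then an offset pointing past the fixed part.
    fixed = msg.message_id.to_bytes(8, "little")
    fixed += FIXED_LEN.to_bytes(BYTES_PER_OFFSET, "little")
    # Variable-size part: the list's bytes, starting at that offset.
    return fixed + bytes(msg.body)


def ssz_decode(data: bytes) -> SpecialMessage:
    # Nothing in `data` says where fields are; the layout comes from the schema.
    if len(data) < FIXED_LEN:
        raise ValueError("too short for SpecialMessage")
    message_id = int.from_bytes(data[0:8], "little")
    offset = int.from_bytes(data[8:FIXED_LEN], "little")
    if offset != FIXED_LEN:
        raise ValueError("first offset must point just past the fixed part")
    body = data[offset:]
    if len(body) > MAX_BODY_LEN:
        raise ValueError("body exceeds the List[Uint8, 8] limit")
    return SpecialMessage(message_id=message_id, body=body)


my_message = SpecialMessage(message_id=1337, body=bytes([0, 0, 0, 0, 0, 0, 0, 42]))
encoded = ssz_encode(my_message)
print(list(encoded))  # [57, 5, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42]
print(len(encoded))   # 20
```

A real client would use a general SSZ implementation that derives this layout from the type definition; the hand-written version just shows that field boundaries come from the schema rather than from the bytes.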

SpecialMessage as CL JSON

cl_json_encode(my_message) == '{"message_id":"1337","body":"0x000000000000002a"}'

Once again, let's split the message across lines and comment it (comments are illegal in JSON, but YOLO):

{
  # An integer presented as a string because JavaScript numbers are
  # 64-bit floats and can't exactly represent every 64-bit integer.
  "message_id": "1337",
  # A byte-array represented as `0x`-prefixed hex.
  "body": "0x000000000000002a"
}

The CL JSON message is 49 bytes in total.
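Here's a sketch of CL JSON encoding and decoding using only Python's standard `json` module. The helper names are hypothetical, and the 49-byte figure assumes compact JSON (no spaces after `:` or `,`), which is what `separators=(",", ":")` produces.

```python
import json
from typing import Tuple


def cl_json_encode(message_id: int, body: bytes) -> str:
    # uint64 -> decimal string; byte list -> 0x-prefixed hex (2 digits per byte).
    obj = {"message_id": str(message_id), "body": "0x" + body.hex()}
    # Compact separators drop the whitespace json.dumps adds by default.
    return json.dumps(obj, separators=(",", ":"))


def cl_json_decode(text: str) -> Tuple[int, bytes]:
    obj = json.loads(text)
    # The integer arrives as a string, so parse it explicitly as base-10.
    message_id = int(obj["message_id"], 10)
    body_hex = obj["body"]
    if not body_hex.startswith("0x"):
        raise ValueError("body must be 0x-prefixed hex")
    return message_id, bytes.fromhex(body_hex[2:])


encoded = cl_json_encode(1337, bytes([0, 0, 0, 0, 0, 0, 0, 42]))
print(encoded)       # {"message_id":"1337","body":"0x000000000000002a"}
print(len(encoded))  # 49
```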

SpecialMessage as EL JSON

el_json_encode(my_message) == '{"message_id":"0x539","body":"0x000000000000002a"}'

The message split across lines for readability (with YOLO comments):

{
  # The decimal integer 1337 represented as `0x`-prefixed big-endian hex,
  # with leading zeros stripped.
  "message_id": "0x539",
  # A byte-array represented as `0x`-prefixed hex.
  "body": "0x000000000000002a"
}

The QUANTITY and DATA encoding formats described in the Encoding section of the Engine API spec are critical to understanding this format. In our example, the QUANTITY encoding is used for message_id, whilst the DATA encoding is used for the body. The key difference: QUANTITY strips leading zeros (so 1337 is "0x539", not "0x0539"), whereas DATA keeps every byte, including leading zero bytes.

The EL JSON message is 50 bytes in total.
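A sketch of the EL JSON side, including a QUANTITY parser that rejects leading zeros. The helper names and the regex are my own illustration, not taken from any client; the pattern assumes lower-case hex digits. As with the CL JSON count, the 50-byte figure assumes compact JSON.

```python
import json
import re

# QUANTITY: "0x" followed by either "0" or hex digits with no leading zero.
# Assumption: lower-case hex digits only.
_QUANTITY_RE = re.compile(r"0x(0|[1-9a-f][0-9a-f]*)")


def encode_quantity(n: int) -> str:
    if n < 0:
        raise ValueError("QUANTITY must be non-negative")
    return hex(n)  # e.g. 1337 -> "0x539", 0 -> "0x0"


def decode_quantity(text: str) -> int:
    if not _QUANTITY_RE.fullmatch(text):
        raise ValueError(f"not a valid QUANTITY: {text!r}")
    return int(text, 16)  # int() accepts the 0x prefix when base is 16


def encode_data(data: bytes) -> str:
    return "0x" + data.hex()  # DATA keeps every byte, leading zeros included


def el_json_encode(message_id: int, body: bytes) -> str:
    obj = {"message_id": encode_quantity(message_id), "body": encode_data(body)}
    return json.dumps(obj, separators=(",", ":"))


encoded = el_json_encode(1337, bytes([0, 0, 0, 0, 0, 0, 0, 42]))
print(encoded)       # {"message_id":"0x539","body":"0x000000000000002a"}
print(len(encoded))  # 50
```

A strict decoder like this will refuse "0x0539" for message_id, which is a common source of interop bugs when CL code hand-rolls EL JSON.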