From a CL developer's perspective, there are three serialization formats for consensus structures (e.g. `BeaconBlock`, `ExecutionPayload`):
Format | Primary Use | Primary Goal | Endianness | Specs |
---|---|---|---|---|
SSZ | P2P comms & hashing | Compact on the wire, bijective | Little-endian | ethereum/consensus-specs |
CL JSON | Beacon Node HTTP API | Human Readable | N/A | ethereum/beacon-apis |
EL JSON | CL <> EL Comms | Human Readable | Big-endian | ethereum/execution-apis |
Note: RLP has been omitted since it's not particularly relevant for CL devs. RLP is the EL equivalent of SSZ: a compact encoding designed for the P2P network.
The "EL JSON" format is the oldest of the three; I believe it has existed since Ethereum launched in 2015. It's the encoding used in the classic "Ethereum JSON RPC" (e.g. what `geth attach` gives). The EL JSON format is designed to be human readable so users can type stuff into the JSON RPC console and read the answers they get back.
Next came SSZ in 2018/2019 during the specification of the Beacon Chain. The goal of SSZ is to be simple and compact. It should be simple so it's easy to grasp conceptually and implement correctly. It should be compact because it's used for transmitting blocks/attestations over the P2P network where there's lots of broadcast amplification. SSZ is not human readable; it's impractical to type out an SSZ message and read what you get back. It's just raw bytes without field names. (Fun fact: Danny and Vitalik are conversationally fluent in SSZ and use it to converse privately during ACD calls.)
After (or around the same time as) SSZ came "CL JSON". It was created as the method of comms for consensus layer APIs (e.g., the Beacon Node HTTP API). It aims to be human readable, so anyone (who reads English) can `curl` things from the BN API and read the response. CL JSON doesn't care much about compactness since it's intended for local comms that happen via a LAN rather than P2P comms across the Internet.
CL JSON was an attempt to address some oddities in EL JSON and start fresh with an improved standard. It seemed like a good idea at the time 😬. Back then it wasn't clear that we'd end up with the EL+CL ecosystem.
There are many different types one can communicate via these three serialization formats: structs, lists, integers, byte-arrays, etc. The EL and CL JSON formats are very close, whilst the SSZ encoding is very different.
I don't think we should describe all types in this document. Instead, I'll focus on the two I think are the most confusing: integers and byte-arrays.
First, I'll present a table of differences for each type and then I'll show some examples to demonstrate these differences.
Format | Integer Representation | Representation of Uint64(1337) |
---|---|---|
SSZ | Little-endian bytes (shown in decimal) | [57, 5, 0, 0, 0, 0, 0, 0] |
CL JSON | Decimal string | "1337" |
EL JSON | Hexadecimal, big-endian string with leading zeros stripped | "0x539" |
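To make the integer rules concrete, here is a minimal Python sketch of the three encodings. The function names (`ssz_uint64`, `cl_json_uint`, `el_json_quantity`) are made up for this example and don't come from any spec or client library. Bytes are printed in decimal, so the byte `0x39` shows up as 57, and the EL form strips every leading zero, giving `0x539`.

```python
def ssz_uint64(value: int) -> bytes:
    """SSZ: fixed-width (8 bytes), little-endian."""
    return value.to_bytes(8, byteorder="little")


def cl_json_uint(value: int) -> str:
    """CL JSON: the integer written as a decimal string."""
    return str(value)


def el_json_quantity(value: int) -> str:
    """EL JSON: 0x-prefixed big-endian hex with no leading zeros."""
    # Python's hex() already produces the most compact form,
    # including "0x0" for zero.
    return hex(value)


print(list(ssz_uint64(1337)))  # [57, 5, 0, 0, 0, 0, 0, 0]
print(cl_json_uint(1337))      # 1337
print(el_json_quantity(1337))  # 0x539
```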
Format | Byte-Array Representation | Representation of List([0, 42]) |
---|---|---|
SSZ | Simply an array of bytes | [0, 42] |
CL JSON | 0x-prefixed hexadecimal string | "0x002a" |
EL JSON | 0x-prefixed hexadecimal string | "0x002a" |
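Byte-arrays are simpler, since both JSON formats agree. A small Python sketch, with helper names (`ssz_byte_list`, `json_hex_bytes`, `parse_json_hex_bytes`) invented purely for illustration:

```python
def ssz_byte_list(data: bytes) -> bytes:
    """SSZ: the raw bytes, unchanged."""
    return bytes(data)


def json_hex_bytes(data: bytes) -> str:
    """CL JSON and EL JSON: 0x-prefixed hex, two hex digits per byte."""
    return "0x" + data.hex()


def parse_json_hex_bytes(text: str) -> bytes:
    """Inverse of json_hex_bytes; rejects strings without the 0x prefix."""
    if not text.startswith("0x"):
        raise ValueError("expected a 0x-prefixed hex string")
    return bytes.fromhex(text[2:])


payload = bytes([0, 42])
print(list(ssz_byte_list(payload)))  # [0, 42]
print(json_hex_bytes(payload))       # 0x002a
```

Unlike integers, the leading zero byte is kept: stripping it would change the length of the array.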
Let's use this imaginary structure to demonstrate:
class SpecialMessage:
    # A sequential integer used to order messages.
    message_id: Uint64
    # Some special message of up to 8 bytes. Maybe a UTF-8 string, maybe not.
    body: List[Uint8, 8]
Let's instantiate it:
my_message = SpecialMessage(message_id=1337, body=[0,0,0,0,0,0,0,42])
Now we'll view it in the three different formats:
SpecialMessage as SSZ
ssz_encode(my_message) == [57, 5, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42]
For readability I'll take the message and split it across lines with comments (all bytes are shown in decimal):
[
    # This is the 64-bit integer 1337 encoded as little-endian (57 is 0x39, 5 is 0x05).
    57, 5, 0, 0, 0, 0, 0, 0,
    # This is a 4-byte SSZ "offset", saying the variable-length `body` field starts at byte index 12.
    12, 0, 0, 0,
    # Starting at index 12, this is the value of the `body` field.
    0, 0, 0, 0, 0, 0, 0, 42
]
Note that this encoding is schemaless. The SSZ bytes tell you nothing about their own structure; it's assumed that you know what you're decoding, so you know where each field starts and ends. This sucks for readability, but clearly reduces the size of the message.
The SSZ message is 20 bytes in total.
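Because the bytes carry no schema, the layout has to live in the code. The sketch below hand-rolls the encoding for this one imaginary struct: an 8-byte little-endian `message_id`, a 4-byte little-endian offset, then the raw `body`. It is not a general SSZ implementation, and `encode_special_message`, `decode_special_message` and the two constants are names invented for this example.

```python
import struct
from typing import Tuple

BODY_LIMIT = 8          # the limit from List[Uint8, 8]
FIXED_PART_LEN = 8 + 4  # uint64 message_id + 4-byte offset pointing at body


def encode_special_message(message_id: int, body: bytes) -> bytes:
    if len(body) > BODY_LIMIT:
        raise ValueError("body exceeds the list limit")
    # "<QI" = little-endian uint64 followed by little-endian uint32, no padding.
    fixed = struct.pack("<QI", message_id, FIXED_PART_LEN)
    return fixed + body


def decode_special_message(data: bytes) -> Tuple[int, bytes]:
    # The decoder only works because it already knows the field layout.
    message_id, offset = struct.unpack_from("<QI", data, 0)
    if offset != FIXED_PART_LEN:
        raise ValueError("unexpected offset for body")
    body = data[offset:]
    if len(body) > BODY_LIMIT:
        raise ValueError("body exceeds the list limit")
    return message_id, body


encoded = encode_special_message(1337, bytes([0, 0, 0, 0, 0, 0, 0, 42]))
print(list(encoded))  # [57, 5, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42]
print(len(encoded))   # 20
```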
SpecialMessage as CL JSON
cl_json_encode(my_message) == '{"message_id":"1337","body":"0x000000000000002a"}'
Once again, let's split the message across lines and comment it (comments are illegal in JSON, but YOLO):
{
    # An integer presented as a string because JavaScript's number type
    # can't represent every 64-bit integer exactly.
    "message_id": "1337",
    # A byte-array represented as `0x`-prefixed hex.
    "body": "0x000000000000002a"
}
The CL JSON message is 49 bytes in total (without whitespace).
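A rough Python sketch of the CL JSON round trip for this struct, using only the standard `json` module. The function names are hypothetical, and the byte count assumes the JSON is written with no optional whitespace.

```python
import json
from typing import Tuple


def cl_json_encode_special_message(message_id: int, body: bytes) -> str:
    obj = {
        "message_id": str(message_id),  # integer as a decimal string
        "body": "0x" + body.hex(),      # bytes as 0x-prefixed hex
    }
    # separators=(",", ":") drops the optional spaces json.dumps adds by default.
    return json.dumps(obj, separators=(",", ":"))


def cl_json_decode_special_message(text: str) -> Tuple[int, bytes]:
    obj = json.loads(text)
    return int(obj["message_id"]), bytes.fromhex(obj["body"][2:])


encoded = cl_json_encode_special_message(1337, bytes([0, 0, 0, 0, 0, 0, 0, 42]))
print(encoded)       # {"message_id":"1337","body":"0x000000000000002a"}
print(len(encoded))  # 49
```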
SpecialMessage as EL JSON
el_json_encode(my_message) == '{"message_id":"0x539","body":"0x000000000000002a"}'
The message split across lines for readability (with YOLO comments):
{
    # The decimal integer 1337 represented as `0x`-prefixed big-endian hex,
    # with leading zeros stripped.
    "message_id": "0x539",
    # A byte-array represented as `0x`-prefixed hex.
    "body": "0x000000000000002a"
}
The `QUANTITY` and `DATA` encoding formats described in the Encoding section of the Engine API spec are critical to understanding this format. In our example, the `QUANTITY` encoding is used for `message_id` whilst the `DATA` encoding is used for the `body`.
The EL JSON message is 50 bytes in total (without whitespace).
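The difference between the two encodings is easiest to see side by side. Below is a minimal Python sketch of both; the helper names (`to_quantity`, `from_quantity`, `to_data`, `from_data`) are invented here, and the validation is only a simplified reading of the rules rather than a complete implementation of the spec. Note that the compact `QUANTITY` form of 1337 is `0x539`: a leading zero, as in `0x0539`, is rejected.

```python
import json


def to_quantity(value: int) -> str:
    """Integer -> 0x-prefixed hex with no leading zeros (zero is "0x0")."""
    if value < 0:
        raise ValueError("QUANTITY cannot be negative")
    return hex(value)


def from_quantity(text: str) -> int:
    if not text.startswith("0x") or len(text) < 3:
        raise ValueError("expected 0x followed by at least one hex digit")
    if text != "0x0" and text[2] == "0":
        raise ValueError("leading zeros are not allowed in a QUANTITY")
    return int(text[2:], 16)


def to_data(data: bytes) -> str:
    """Bytes -> 0x-prefixed hex, always two hex digits per byte."""
    return "0x" + data.hex()


def from_data(text: str) -> bytes:
    if not text.startswith("0x") or len(text) % 2 != 0:
        raise ValueError("expected 0x followed by an even number of hex digits")
    return bytes.fromhex(text[2:])


message = {
    "message_id": to_quantity(1337),
    "body": to_data(bytes([0, 0, 0, 0, 0, 0, 0, 42])),
}
encoded = json.dumps(message, separators=(",", ":"))
print(encoded)       # {"message_id":"0x539","body":"0x000000000000002a"}
print(len(encoded))  # 50
```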