# A cat named CESR – an extraordinary introduction to CESR
## Introduction
In the first post of the series, we introduce the [CESR (Composable Event Streaming Representation)](https://weboftrust.github.io/ietf-cesr/draft-ssmith-cesr.html) protocol, primarily invented by Dr. Sam Smith to enable effective data streaming in the [KERI](https://weboftrust.github.io/ietf-keri/draft-ssmith-keri.html) protocol. One of its unique features is the ability to switch between text and binary representations, which makes adoption and debugging easier in text and lets you move to binary once you are ready for production. The consumer thus gets the best of both worlds: the compactness and efficiency of a binary representation and the readability and ease of use of a text representation. For a higher-level overview of the protocol, we also recommend [this article](https://medium.com/happy-blockchains/cesr-one-of-sam-smiths-inventions-is-as-controversial-as-genius-d757f36b88f8).
In this post, we demystify CESR using some fanciful information to better understand the underlying protocol mechanics.
## CESR building blocks
CESR relies on existing encoding formats, such as JSON, CBOR, and MessagePack, to serialize data structures as field maps. We will call this building block a payload. Payloads are interleaved with CESR-specific elements.
The second building block, which is CESR-specific, is the attachment. It is a combination of codes, primitives, and groups, which we will discuss in more detail in the following sections. Thanks to attachments, we can enrich data with extra information and context while maintaining the payload's integrity.
Together, payloads and attachments form a CESR stream or datagram. Below is an example of the stream using the KERI protocol in CESR. Payloads are in green, and attachments are in yellow.

### Protocols (Code Tables)
Although CESR primarily emerged to support KERI events with efficient transmission of cryptographic material, it can support other protocols as well. CESR is all about the metadata that describes underlying data structures (payloads). Thus, any use case that benefits from CESR's properties can define its own protocol with its own code table.
### Primitives
The most basic CESR element is the primitive. It represents a single piece of data as a text code plus the raw bytes of its value. The code specifies the data type and length. For example, let us say we have a code, `M`, representing a two-byte binary number, and we want to encode the number `255` in CESR. The result is `MAD_` in the text domain or `[48, 0, 255]` in the binary domain (more on this below).
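The `M` example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the value is prefixed with zero lead bytes up to a multiple of three before base64url encoding and that the code then takes the place of the resulting lead characters. The function names `to_text` and `to_binary` are illustrative and not part of any CESR library.

```python
import base64


def to_text(code: str, raw: bytes) -> str:
    """Text domain: the code followed by the base64url-encoded value."""
    pad = (3 - len(raw) % 3) % 3  # lead bytes needed to reach a multiple of 3
    if len(code) % 4 != pad:
        raise ValueError("code length must match the pad size of the value")
    full = base64.urlsafe_b64encode(bytes(pad) + raw).decode("ascii")
    return code + full[pad:]  # the code takes the place of the pad characters


def to_binary(text: str) -> bytes:
    """Binary domain: the whole text form decoded from base64url."""
    return base64.urlsafe_b64decode(text)


text = to_text("M", (255).to_bytes(2, "big"))
print(text)                   # MAD_
print(list(to_binary(text)))  # [48, 0, 255]
```

Because the code and the value together always fill whole four-character groups, the text form converts to binary without any `=` padding.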
### Groups
Groups are the primary aggregators, enabling the creation of more complex data structures. They can contain not only primitives but also other groups, which can be nested to create a hierarchical structure. Each group is introduced by a code that indicates the type of the elements that follow in the stream and how many of them there are.
### Pipelining
CESR provides special codes for pipelining attachments. They are similar to group codes, but instead of a count of groups, they contain the total length of the attached data. As a result, the attachment length is known as soon as the code is parsed, so we can cut the attachment from the stream and process it independently. This is not possible with group codes, because they tell us the count of attached groups, not the exact attachment length.
### Representation domains
CESR elements can be represented in one of three domains: as bytes (B domain), as text (T domain), and as a tuple of text code and raw value bytes (R domain). CESR provides transformations between each pair of domains, denoted as `T(B)`, `T(R)`, `B(T)`, `B(R)`, `R(B)`, `R(T)`.

Here is how we can represent the primitive from the above example in each domain:
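As a rough illustration, the three domains and some of the transformations between them can be sketched in Python. The one-entry code table `RAW_SIZES` and the function names are assumptions made for this post's `M` example only; they are not taken from the CESR specification or any library.

```python
import base64

# Hypothetical code table for this example: code -> raw value size in bytes.
RAW_SIZES = {"M": 2}


def b_to_t(binary: bytes) -> str:
    """B -> T: base64url-encode the binary form."""
    return base64.urlsafe_b64encode(binary).decode("ascii")


def t_to_b(text: str) -> bytes:
    """T -> B: base64url-decode the text form."""
    return base64.urlsafe_b64decode(text)


def t_to_r(text: str):
    """T -> R: split the text form into (code, raw value bytes)."""
    code = text[0]  # only one-character codes are handled in this sketch
    size = RAW_SIZES[code]
    # Replace the code with 'A' (all zero bits), decode, keep the value bytes.
    decoded = base64.urlsafe_b64decode("A" * len(code) + text[len(code):])
    return code, decoded[-size:]


def r_to_t(code: str, raw: bytes) -> str:
    """R -> T: prepend zero lead bytes, encode, put the code in their place."""
    pad = (3 - len(raw) % 3) % 3
    full = base64.urlsafe_b64encode(bytes(pad) + raw).decode("ascii")
    return code + full[pad:]


print(r_to_t("M", b"\x00\xff"))      # MAD_
print(list(t_to_b("MAD_")))          # [48, 0, 255]
print(b_to_t(bytes([48, 0, 255])))   # MAD_
print(t_to_r("MAD_"))                # ('M', b'\x00\xff')
```

The remaining transformations, `B(R)` and `R(B)`, can be composed from the ones shown here by going through the text domain.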

## Example
Let us demonstrate CESR's fundamental features by introducing an arbitrary, custom-defined protocol. The protocol is about a `cat` and its sophisticated characteristics. We define it solely for teaching purposes to explain protocol mechanics.
### Cat's color
Let us introduce a CESR primitive for the cat's color. Color is represented in [RGB](https://en.wikipedia.org/wiki/RGB_color_model).
In the RGB color model, the primary colors (red, green, and blue) can be combined in various proportions to create a wide range of other colors. Each primary color is represented by a number ranging from 0 to 255, with 0 representing the absence of that color and 255 representing its maximum intensity. For example, `[255, 0, 0]` is red, `[0, 0, 0]` is black, and `[255, 255, 255]` is white.
We need three bytes to encode color. Let us define the code and value length for our primitive:
| code | description | value length (in bytes) | value length (in chars)|
|----------|:-------------:|------:|---:|
| 1COL | color in RGB | 3 | 4|
**Note** Codes are not random. Their length depends on the length of the data they encode.
The primitive here has a fixed size, so the code length equals the number of base64 padding characters that encoding the value would produce, or four if there is no padding. The first characters of the code are specified in the CESR documentation and let parsers choose the proper code table. For further explanation, see the [CESR specification draft](https://weboftrust.github.io/ietf-cesr/draft-ssmith-cesr.html#section-3.1).
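The rule above is simple arithmetic, and a short Python sketch makes it concrete. It covers only the fixed-size primitives used in this post, and the function names are illustrative.

```python
def pad_size(raw_len: int) -> int:
    """Number of '=' characters base64 would append to raw_len bytes."""
    return (3 - raw_len % 3) % 3


def code_length(raw_len: int) -> int:
    """Code length in characters: the pad size, or four if there is no padding."""
    return pad_size(raw_len) or 4


for raw_len in (1, 2, 3):
    print(raw_len, "byte(s) ->", code_length(raw_len), "character code")
# 1 byte(s) -> 2 character code
# 2 byte(s) -> 1 character code
# 3 byte(s) -> 4 character code
```

This matches the codes used in this post: the two-byte `M` primitive has a one-character code, and the three-byte color has the four-character code `1COL`.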
Examples of color primitives in each domain:
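A small Python sketch can generate such examples. It assumes the `1COL` code defined above is simply prepended to the base64url encoding of the three color bytes, since a three-byte value needs no padding. The function names are illustrative.

```python
import base64

CODE = "1COL"  # the color code from the table above


def color_to_text(rgb) -> str:
    """R -> T: the code followed by the base64url-encoded RGB bytes."""
    return CODE + base64.urlsafe_b64encode(bytes(rgb)).decode("ascii")


def text_to_binary(text: str) -> bytes:
    """T -> B: decode the whole text form, code included."""
    return base64.urlsafe_b64decode(text)


def text_to_color(text: str):
    """T -> R: check the code and decode the 4 value characters."""
    if not text.startswith(CODE):
        raise ValueError("not a color primitive")
    return tuple(base64.urlsafe_b64decode(text[4:8]))


for rgb in [(0, 0, 0), (255, 255, 255), (230, 120, 60)]:
    text = color_to_text(rgb)
    print(rgb, text, list(text_to_binary(text)))
# (0, 0, 0) 1COLAAAA [212, 35, 139, 0, 0, 0]
# (255, 255, 255) 1COL____ [212, 35, 139, 255, 255, 255]
# (230, 120, 60) 1COL5ng8 [212, 35, 139, 230, 120, 60]
```

In the binary domain the first three bytes are the decoded code `1COL`, followed by the three unchanged color bytes.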

### Multicolor cat
Cats often have more than one color, so how can we express that? Let's create an additional code that represents a group.
| code | description |
|----------|:-------------:|
| -C## | group of cat colors |
**Note** The hash symbols in this table represent the count of elements in the group. The count is encoded in [positional notation](https://en.wikipedia.org/wiki/Positional_notation) with 64 as the base and the letters of the URL-safe base64 alphabet as digits. Here are some examples:
`AA -> 0`, `AB -> 1`, `AC -> 2`, `A_ -> 63`, `BA -> 64`.
For example, `-CAC` means that two colors will appear next in the stream. Because the code for each primitive specifies its value size, CESR "knows" where each primitive ends. It doesn't require additional delimiting characters to separate primitives.
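The count encoding can be sketched in Python as a plain base-64 positional conversion. The helper names are illustrative; only the alphabet and the two-character width come from the text above.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def encode_count(n: int, width: int = 2) -> str:
    """Write n as a fixed-width base-64 number using the URL-safe alphabet."""
    if not 0 <= n < 64 ** width:
        raise ValueError("count does not fit into the given width")
    digits = ""
    for _ in range(width):
        n, digit = divmod(n, 64)
        digits = B64[digit] + digits  # least significant digit goes last
    return digits


def decode_count(chars: str) -> int:
    """Read a base-64 number written with the URL-safe alphabet."""
    n = 0
    for ch in chars:
        n = n * 64 + B64.index(ch)
    return n


print([encode_count(n) for n in (0, 1, 2, 63, 64)])
# ['AA', 'AB', 'AC', 'A_', 'BA']
print(decode_count("AC"))  # 2
```

With two characters a group code can count up to 4095 elements.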
Given the following stream, we can now interpret its contents:
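To make the interpretation concrete, here is a minimal Python sketch of a parser for a color group. It is applied to the fragment `-CAC1COLAAAA1COL____`, which also appears in the attachments later in this post. The function name `parse_color_group` is illustrative.

```python
import base64

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def parse_color_group(stream: str, pos: int = 0):
    """Parse '-C##' followed by ## color primitives; return (colors, next_pos)."""
    if stream[pos:pos + 2] != "-C":
        raise ValueError("expected a -C group code")
    count = B64.index(stream[pos + 2]) * 64 + B64.index(stream[pos + 3])
    pos += 4
    colors = []
    for _ in range(count):
        if stream[pos:pos + 4] != "1COL":
            raise ValueError("expected a 1COL primitive")
        # The code fixes the value size: 4 characters, which decode to 3 bytes.
        colors.append(tuple(base64.urlsafe_b64decode(stream[pos + 4:pos + 8])))
        pos += 8
    return colors, pos


print(parse_color_group("-CAC1COLAAAA1COL____"))
# ([(0, 0, 0), (255, 255, 255)], 20)
```

The parser never searches for a delimiter: every step consumes a number of characters that is known from the code it has just read.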

### Cat's life number
Since cats have nine lives, assume that they can have a different color in each of them. Can we attach this information using CESR?
Yes, we can! Let's add new codes for that: one for the order number of the cat's life, and a second for a group that gathers lives and colors.
Because one byte is enough to encode the order number of the cat's life, we can use a two-character code. (That's because a base64-encoded one-byte value has two padding characters.)
| code | description | value length (in bytes) | value length (in chars) |
|----------|:-------------:|------:|----:|
| 0L | order number of cat's life | 1 | 2 |
Here are examples of the cat's life number primitive in each domain:
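The following Python sketch produces such examples. It assumes the one-byte value is right-aligned behind two zero lead bytes before encoding, in the same way as the `M` example earlier. The function names are illustrative.

```python
import base64

CODE = "0L"  # the life number code from the table above


def life_to_text(life: int) -> str:
    """R -> T: two zero lead bytes plus the value, with the code in their place."""
    if not 0 <= life <= 255:
        raise ValueError("life number must fit into one byte")
    full = base64.urlsafe_b64encode(bytes([0, 0, life])).decode("ascii")
    return CODE + full[2:]


def text_to_binary(text: str) -> bytes:
    """T -> B: decode the whole text form, code included."""
    return base64.urlsafe_b64decode(text)


def text_to_life(text: str) -> int:
    """T -> R: replace the code with 'AA' (zero bits) and read the last byte."""
    if not text.startswith(CODE):
        raise ValueError("not a life number primitive")
    return base64.urlsafe_b64decode("AA" + text[2:4])[-1]


for life in (1, 4, 9):
    text = life_to_text(life)
    print(life, text, list(text_to_binary(text)))
# 1 0LAB [208, 176, 1]
# 4 0LAE [208, 176, 4]
# 9 0LAJ [208, 176, 9]
```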

And here is a group code for gathering life and color:
| code | description |
|----------|:-------------:|
| -G## | group of cat's life number and colors |
Now, let's join all of this together in an attachment that tells us which colors our cat has in each life.
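A nested parser for such an attachment can be sketched in Python as follows. It assumes that the `-G` count is the number of (life number, color group) pairs, which is consistent with the streams shown in this post. The function names are illustrative.

```python
import base64

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def read_count(stream: str, pos: int, code: str) -> int:
    """Check the two-character group code at pos and decode its count."""
    if stream[pos:pos + 2] != code:
        raise ValueError(f"expected {code} at position {pos}")
    return B64.index(stream[pos + 2]) * 64 + B64.index(stream[pos + 3])


def parse_lives(stream: str, pos: int = 0):
    """Parse '-G##' followed by ## (life, colors) pairs; return (lives, next_pos)."""
    pairs = read_count(stream, pos, "-G")
    pos += 4
    lives = []
    for _ in range(pairs):
        if stream[pos:pos + 2] != "0L":
            raise ValueError(f"expected 0L at position {pos}")
        life = base64.urlsafe_b64decode("AA" + stream[pos + 2:pos + 4])[-1]
        pos += 4
        n_colors = read_count(stream, pos, "-C")
        pos += 4
        colors = []
        for _ in range(n_colors):
            if stream[pos:pos + 4] != "1COL":
                raise ValueError(f"expected 1COL at position {pos}")
            colors.append(tuple(base64.urlsafe_b64decode(stream[pos + 4:pos + 8])))
            pos += 8
        lives.append((life, colors))
    return lives, pos


print(parse_lives("-GAC0LAE-CAC1COLAAAA1COL____0LAF-CAB1COL5ng8"))
# ([(4, [(0, 0, 0), (255, 255, 255)]), (5, [(230, 120, 60)])], 44)
```

Read this way, the attachment says that the cat is black and white in its fourth life and orange in its fifth.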

### Pipelining
In this example, we will use the `-V` code from the specification:
| code | description |
|----------|:-------------:|
| -V## | Count of total attached grouped material qualified Base64 4 char quadlets |
As mentioned before, this code lets us pack attachments in a block that can be cut from the stream and processed separately.
For example, let's look at the stream:
`-VAL-GAC0LAE-CAC1COLAAAA1COL____0LAF-CAB1COL5ng8`
The code `-VAL` means that the next 11 four-character quadlets form one block of attachments, so 44 characters have to be read to get it.
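Cutting the block out of the stream takes only a few lines. The Python sketch below reads the `-V` code, computes the block length, and slices the stream without looking at the block's contents. The function name `cut_attachment` is illustrative.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def cut_attachment(stream: str, pos: int = 0):
    """Return (attachment_block, next_pos) for a '-V##' code at pos."""
    if stream[pos:pos + 2] != "-V":
        raise ValueError("expected a -V framing code")
    quadlets = B64.index(stream[pos + 2]) * 64 + B64.index(stream[pos + 3])
    start = pos + 4
    end = start + 4 * quadlets  # each quadlet is four characters
    if end > len(stream):
        raise ValueError("stream ends before the attachment block does")
    return stream[start:end], end


stream = "-VAL-GAC0LAE-CAC1COLAAAA1COL____0LAF-CAB1COL5ng8" + '{"next": "payload"}'
block, end = cut_attachment(stream)
print(len(block))    # 44
print(block)         # -GAC0LAE-CAC1COLAAAA1COL____0LAF-CAB1COL5ng8
print(stream[end:])  # {"next": "payload"}
```

The trailing `{"next": "payload"}` is a made-up payload, added here only to show that everything after the block stays untouched.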
### Final protocol codes table
Codes specify the groups and primitives that appear in the stream. When we collect all our codes in one table, we get the CESR master code table for the cat use case:
| code | description | value length (in bytes) | value length (in chars) |
|----------|:-------------:|------:|----:|
| | **Basic Two Character Codes** | | |
| 0L | number of cat's life | 1 | 2 |
| | **Basic Four Character Codes** | | |
| 1COL | color in RGB | 3 | 4 |
| | **Group codes** | | |
| -C## | group of cat colors | | |
| -G## | group of cat's life number and colors | | |
| | **Framing codes** | | |
| -V## | Count of total attached grouped material qualified Base64 4 char quadlets | | |
### To sum it up
With all this in mind, we can interpret the stream from the beginning of this blog post:
```
{"message": "Let the cat out of the bag"}-VAL-GAC0LAE-
CAC1COLAAAA1COL____0LAF-CAB1COL5ng8{"message": "Curiosity killed the cat"}
-VAR-GAD0LAB-CAB1COLZGRk0LAC-CAD1COL5ng81COL____1COLZGRk0LAD-CAB1COLAAAA
{"message": "A cat always lands on its feet"}-VAc-GAD0LAB-CAB1COLZGRk0LAC
-CAD1COL5ng81COL____1COLZGRk0LAD-CAB1COLAAAA-GAC0LAE-CAC1COLAAAA1COL____0LAF
-CAB1COL5ng8
```
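As a rough sketch, the stream above can be split into payloads and attachment blocks in Python. Two assumptions are made here: the line breaks in the listing are only for display and are dropped, and the end of each JSON payload is found with the standard library's `json.JSONDecoder.raw_decode`, which is a shortcut chosen for this toy protocol and not how CESR itself delimits payloads. The function name `split_stream` is illustrative.

```python
import json

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

# The stream from the listing above, joined into a single string.
STREAM = (
    '{"message": "Let the cat out of the bag"}-VAL-GAC0LAE-'
    'CAC1COLAAAA1COL____0LAF-CAB1COL5ng8{"message": "Curiosity killed the cat"}'
    '-VAR-GAD0LAB-CAB1COLZGRk0LAC-CAD1COL5ng81COL____1COLZGRk0LAD-CAB1COLAAAA'
    '{"message": "A cat always lands on its feet"}-VAc-GAD0LAB-CAB1COLZGRk0LAC'
    '-CAD1COL5ng81COL____1COLZGRk0LAD-CAB1COLAAAA-GAC0LAE-CAC1COLAAAA1COL____0LAF'
    '-CAB1COL5ng8'
)


def split_stream(stream: str):
    """Return a list of (payload, attachment_block) pairs."""
    decoder = json.JSONDecoder()
    pos, pairs = 0, []
    while pos < len(stream):
        payload, pos = decoder.raw_decode(stream, pos)  # payload and its end index
        if stream[pos:pos + 2] != "-V":
            raise ValueError(f"expected -V at position {pos}")
        quadlets = B64.index(stream[pos + 2]) * 64 + B64.index(stream[pos + 3])
        start, end = pos + 4, pos + 4 + 4 * quadlets
        pairs.append((payload, stream[start:end]))
        pos = end
    return pairs


for payload, block in split_stream(STREAM):
    print(payload["message"], "|", len(block), "attachment characters")
# Let the cat out of the bag | 44 attachment characters
# Curiosity killed the cat | 68 attachment characters
# A cat always lands on its feet | 112 attachment characters
```

Each attachment block is cut out using only its `-V` code, so the blocks could be handed to separate workers and parsed independently.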
See the following picture for a detailed explanation of our custom protocol, defined using CESR's fundamental building blocks:

## Summary
We have introduced the CESR fundamentals and demonstrated an arbitrary, custom CESR protocol. It shows that CESR is about metadata, expressed through its own mechanics of groups and primitives.
Although the CESR protocol was primarily invented for streaming cryptographic material along with KERI messages, it can stream any metadata related to a payload. In other words, any metadata-rich use case is a candidate for CESR. If compactness and robustness also matter, CESR is an excellent fit.