---
robots: noindex, nofollow
---
# Why CBOR? Video Transcript
## Transcript
I'm Wolf McNally, Lead Researcher for Blockchain Commons.
In this video I'm going to explain in detail one of the key architectural decisions we've made, how it's impacting our projects, and how we expect our community of developers
and supporters to benefit from it.
Blockchain Commons is a "not-for-profit" social benefit corporation that advocates for the creation of open, interoperable, secure, and compassionate digital infrastructure.
Our goal is to enable people to control their own digital destiny and maintain their human dignity online.
The research and development mission of Blockchain Commons includes the development of open source
technical specifications, reference implementations, and tooling that helps developers solve common
problems with hardware and software that needs to be decentralized and secure, preserve privacy,
and enhance human independence.
Blockchain Commons is working to build a stack of technologies that can be easily adopted
by developers.
Part of our aim is to invent new solutions where needed, and integrate "best
of breed" solutions where they already exist.
Many of the solutions we're working on have a fundamental need to serialize structured
binary data.
You'd like to be able to create structured data then move it through the network, store
it electronically or even on hard physical media, and finally receive that data in a
different place, possibly by a different agent, running different software.
Because of varying requirements for serializing data, this basic task has been approached
in many ways, all of which involve various tradeoffs.
So in this video, I'm going to focus on our serialization format of choice, and how we
reached that decision.
As I already mentioned, we wanted a *structured, binary* format.
Many of the applications we're concerned with involve cryptographic keys, signatures, and
other forms of data best represented as binary.
Text formats like JSON require the use of additional encoding layers like Base64, adding
bulk and complexity, especially when you'd like to continue parsing down inside that
data!
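
As a rough illustration of that extra bulk, here is a minimal Python sketch comparing the same 32 bytes carried as a Base64 JSON string and as a CBOR byte string. The payload is a hypothetical stand-in for a key or hash, and the CBOR header is written out by hand rather than produced by any particular library.

```python
import base64
import json

# Hypothetical 32-byte payload standing in for a key or a hash.
key = bytes(range(32))

# JSON has no byte-string type, so the bytes must be wrapped in Base64 text.
as_json = json.dumps(base64.b64encode(key).decode("ascii"))

# CBOR carries the bytes directly: a byte string of 24..255 bytes is the
# header byte 0x58, a one-byte length, and then the raw bytes themselves.
as_cbor = bytes([0x58, len(key)]) + key

print(len(key), len(as_cbor), len(as_json))  # 32 34 46
```

The receiver of the JSON form also has to Base64-decode the string before it can parse anything inside it, while the CBOR form exposes the original bytes as-is.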
Second, we wanted the serialized structured data to be as *concise* as possible.
This means that small structures should result in messages of no more bytes than reasonably
necessary.
A format like BSON, for example, has a surprisingly large serialization footprint, as it trades
off conciseness for the ability to easily update it in place in a database.
Third, we wanted a format that is *self-describing*.
This means that the serialized data contains the associated metadata that describes its
semantics.
Self-describing formats can be *schemaless*, which makes a lot of sense in a world where
both ends of a communication relationship may be evolving at high speed.
Like JSON, which is fundamentally schemaless, we also wanted the option to support formal
schemas as the need arises, but didn't want to tie developers to specific schema processors
or toolchains.
Fourth, we wanted a format that works well in *constrained environments*, like special
purpose embedded systems and the Internet of Things.
This means the codec implementations should be straightforward and efficiently implementable
in a minimum number of lines of code.
Fifth, we wanted a format that is not closely tied to any particular hardware or software
platform, or any specific programming language.
And finally, we wanted a format that has had many experienced eyes on it, and that means a format
that has been through the standards process.
This also means that exemplary specifications exist, along with multiple reference implementations
and test vectors.
Being adopted as a standard also reduces resistance to adoption, therefore increasing the likelihood
that there is an active community of developers and projects relying on the code and tools
that support the standard.
This led us to selecting CBOR: the Concise Binary Object Representation.
Binary formats all have the primary drawback that you can't simply examine them in a text
editor.
But CBOR tooling is very good, and it is quite easy to see a dump that breaks down well-formed
CBOR into its constituents.
With a little more effort, known tags can automatically be displayed, making understanding
the semantics even easier.
But it gets even better.
The CBOR Diagnostic Notation moves above the byte level and uses a JSON-like text syntax,
including square brackets for arrays and curly braces for maps (analogous to JSON objects).
CBOR Diagnostic Notation is designed to round-trip with the CBOR binary encoding, but it is primarily
intended as a tool for development and debugging.
If you encounter some unfamiliar CBOR, you can always parse it into diagnostic notation
to start exploring it: no external schema is needed.
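
To show why no schema is needed, here is a minimal Python sketch that turns a subset of CBOR into diagnostic-style text using only the information in the bytes themselves. The function name `read_item` is illustrative, the input bytes are generic examples rather than the envelope shown in the video, and floats, simple values, and indefinite lengths are deliberately left out.

```python
def read_item(data: bytes, pos: int = 0):
    """Return (diagnostic_text, next_pos) for the item starting at pos."""
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:                      # argument is stored in the header byte
        arg = info
    elif info < 28:                    # argument follows in 1, 2, 4, or 8 bytes
        size = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("indefinite or reserved lengths are outside this sketch")
    if major == 0:                     # unsigned integer
        return str(arg), pos
    if major == 1:                     # negative integer
        return str(-1 - arg), pos
    if major == 2:                     # byte string, shown as hex
        return "h'" + data[pos:pos + arg].hex() + "'", pos + arg
    if major == 3:                     # text string (no escaping in this sketch)
        return '"' + data[pos:pos + arg].decode("utf-8") + '"', pos + arg
    if major == 4:                     # array of arg items
        parts = []
        for _ in range(arg):
            text, pos = read_item(data, pos)
            parts.append(text)
        return "[" + ", ".join(parts) + "]", pos
    if major == 5:                     # map of arg key/value pairs
        parts = []
        for _ in range(arg):
            key, pos = read_item(data, pos)
            value, pos = read_item(data, pos)
            parts.append(f"{key}: {value}")
        return "{" + ", ".join(parts) + "}", pos
    if major == 6:                     # tag number wrapping one item
        inner, pos = read_item(data, pos)
        return f"{arg}({inner})", pos
    raise ValueError("simple values and floats are outside this sketch")

print(read_item(bytes.fromhex("a26161016162820203"))[0])  # {"a": 1, "b": [2, 3]}
print(read_item(bytes.fromhex("c11a514b67b0"))[0])        # 1(1363896240)
```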
The structure you're seeing here is an instance of Gordian Envelope.
Blockchain Commons has tools that take examining the structure of envelopes to the next level.
This is the same structure in Envelope Notation.
You can now see it's just the subject "Alice" with a single assertion having the predicate
"knows" and the object "Bob".
The Blockchain Commons reference implementation tools even include output in the graphical
Mermaid format, making the structure of envelopes (especially complex ones) even easier to understand.
So while CBOR is a binary format, in our experience the gains far outweigh the costs.
CBOR's designers weren't kidding when they put "concise" in its name.
When encoding a small structure like this nested array of integers, BSON weighs in at
a hefty 34 bytes, Abstract Syntax Notation One (DER encoding) weighs in at 13, EXI4JSON
uses 11, RFC 713 uses 7, and CBOR only needs 5!
The only popular structured binary serialization format that matches it is the less-capable
and never-standardized MessagePack, upon which the design of CBOR was based.
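
To make the five-byte figure concrete, here is a minimal Python sketch of a CBOR encoder for integers and arrays. It assumes the nested array in question is `[1, [2, 3]]`, which the transcript does not spell out, and `encode_head` and `encode` are illustrative names rather than any library's API.

```python
def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus the shortest-form argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for a CBOR head")

def encode(item) -> bytes:
    """Encode integers and (nested) lists of integers, a tiny subset of CBOR."""
    if isinstance(item, bool):
        raise TypeError("booleans are outside this sketch")
    if isinstance(item, int):
        # Major type 0 holds n >= 0; major type 1 holds -1 - n.
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, list):
        # Major type 4: the head carries the element count, then the elements.
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    raise TypeError(f"unsupported type: {type(item).__name__}")

encoded = encode([1, [2, 3]])
print(encoded.hex(" "), len(encoded))  # 82 01 82 02 03 5
```

Each array costs one byte for its head and each small integer costs one byte, which is where the total of five comes from.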
CBOR's light weight is due to its consistent use of a single-byte header for every element.
In fact, that single-byte header can be used as an index into a jump table for super-fast CBOR decoding.
It defines whether the element is an unsigned integer, a negative integer, a byte string,
a UTF-8 text string, an array, a map, a tagged item or certain special "simple" values like
`true`, `false`, and `null`.
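
As a sketch of how that header byte is read, the following Python splits it into its top three bits (the major type) and low five bits (the additional information), then uses the major type as a table index. The names `MAJOR_TYPES`, `SIMPLE`, and `describe_head` are illustrative; a production decoder might instead index a 256-entry table of handlers by the whole byte.

```python
# Indexed by the top three bits of the header byte.
MAJOR_TYPES = (
    "unsigned integer", "negative integer", "byte string", "text string",
    "array", "map", "tag", "simple/float",
)
# Simple values that fit directly in the header byte of major type 7.
SIMPLE = {20: "false", 21: "true", 22: "null", 23: "undefined"}

def describe_head(byte: int) -> str:
    """Describe what a single CBOR header byte announces."""
    major, info = byte >> 5, byte & 0x1F
    name = MAJOR_TYPES[major]          # table lookup, no chain of comparisons
    if major == 7 and info in SIMPLE:
        return f"simple value {SIMPLE[info]}"
    if info < 24:                      # argument stored in the byte itself
        return f"{name}, argument {info}"
    if info < 28:                      # argument in the next 1, 2, 4, or 8 bytes
        return f"{name}, argument in next {1 << (info - 24)} byte(s)"
    if info == 31 and major in (2, 3, 4, 5):
        return f"{name}, indefinite length"
    if info == 31 and major == 7:
        return "break stop code"
    return f"{name}, reserved additional information {info}"

print(describe_head(0x82))  # array, argument 2
print(describe_head(0xF5))  # simple value true
```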
Tags allow developers to define extended and composite data types.
Devices in the Internet of Things and other embedded systems often operate under tight
constraints of processing power, storage, and bandwidth.
CBOR is designed to be simple to encode and decode, with one popular C++ codec consisting
of only about 900 lines of code, and one Python implementation having fewer than 500 lines!
Over 200 GitHub repositories are tagged CBOR.
Numerous CBOR codecs exist across many popular languages, and sites like CBOR.io exist as
hubs to point developers to documentation, implementations, and other tools.
Last but not least, CBOR is *standardized*.
In addition to RFC 8949, which is the core CBOR spec, the IETF Datatracker shows 23 RFCs,
as well as 16 active Internet Drafts that reference CBOR in their titles.
Another important RFC is 8610: The Concise Data Definition Language (CDDL), which is
a schema description notation for CBOR.
Blockchain Commons uses CDDL to describe all of our CBOR-based structures, notably those
of our Gordian Envelope Internet Draft.
Additionally, IANA, the Internet Assigned Numbers Authority, maintains a registry of CBOR tags,
which helps developers coordinate extending CBOR data types.
But there's one more thing that makes CBOR a particularly great choice for what we're doing at Blockchain Commons, and particularly our requirements for Gordian Envelope.
Envelopes are "smart documents," and one of several things that make them smart is that for any particular set of semantics, there is a single unique way of encoding it. This is particularly important for cryptographic constructs like hashing and signing. Not many other data serialization formats have a standard way to do this.
Some, like ASN.1, define a specific encoding method like DER to do this.
Before text formats like JSON can be used as smart documents, the JSON has to be "canonicalized," using a rather involved algorithm that transforms the document to enable repeatability at the expense of human readability. Furthermore, there aren't many implementations of the JSON Canonicalization Scheme, and in
fact there are several ongoing competing efforts.
One might ask: if you're going to sacrifice readability, why not just go with a binary format with a single, standard, canonical form? CBOR has one: RFC 8949 specifies rules for deterministic encoding, and Gordian envelopes require all CBOR they contain to be deterministically encoded.
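
To illustrate why a single canonical form matters for hashing, here is a minimal Python sketch of two deterministic-encoding rules from RFC 8949: integers use their shortest form, and map keys are sorted bytewise by their encoded form. It covers only integers, text strings, and maps, the names `head` and `encode` are illustrative, and it should not be read as the complete rule set that Gordian Envelope applies.

```python
import hashlib
import json

def head(major: int, value: int) -> bytes:
    """CBOR head with the shortest possible argument encoding."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for a CBOR head")

def encode(item) -> bytes:
    """Deterministically encode integers, text strings, and maps of them."""
    if isinstance(item, int) and not isinstance(item, bool):
        return head(0, item) if item >= 0 else head(1, -1 - item)
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(item, dict):
        # Sort entries by the bytes of the encoded key, not by insertion order.
        pairs = sorted(((encode(k), encode(v)) for k, v in item.items()),
                       key=lambda pair: pair[0])
        return head(5, len(pairs)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(item).__name__}")

# The same content, built in two different orders.
first = {"b": 2, "a": 1}
second = {"a": 1, "b": 2}

print(json.dumps(first) == json.dumps(second))  # False
print(encode(first) == encode(second))          # True
print(encode(first).hex())                      # a2616101616202
print(hashlib.sha256(encode(first)).hexdigest()
      == hashlib.sha256(encode(second)).hexdigest())  # True
```

Because both orderings produce identical bytes, a hash or signature computed by one party can be reproduced by another without any separate canonicalization pass.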
And that's why Blockchain Commons is excited about working with CBOR, and we're looking forward to hearing your questions and ideas.