
# JSON vs. CBOR: Optimizing Data for Web3

In the transition from Web2 to Web3, we often inherit dependencies and patterns that don't quite fit the constraints of a blockchain environment. We are moving away from centralized servers for user access control and toward verifiable, self-sovereign information. To achieve this, we need to rethink how we structure and serialize data.

JSON is the common format of the web, but is it the right tool for Web3 and beyond? Let's look at the alternatives.

## The Data Structure

Let's consider a typical e-commerce transaction structure. In a standard Web2 API, it might look like this JSON representation:

```json
{
  "shopId": "SHOP123",
  "terminalCount": 1,
  "entries": [
    {
      "saleId": "A9F2KD",
      "terminalId": "T1",
      "timestamp": 1735000000,
      "amount": "1200000000",
      "asset": "pUSD",
      "merchant": "5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY",
      "customer": "5FHneW46xGXgs5mUiveU4sbTyGBzmstUspZC92UhjJM694ty",
      "txHash": "0xafcc889f090139ce746....a7b61988ad45acddd6",
      "block": 1234567,
      "status": "finished",
      "refund": {
        "refunded": false,
        "refundSaleId": null
      }
    }
  ]
}
```

Minified, this JSON takes up **434 bytes**. That is acceptable for centralized servers with gigabytes or even terabytes of storage, but in Web3, where storage is limited, it is worth looking for tools that save a few more bytes.

## CBOR (Concise Binary Object Representation)

It is important that we value attributes beyond raw compression, specifically **maintainability** and **extensibility**. CBOR lets us write data that is **self-describing** (like JSON) but binary (like SCALE, the binary codec used by Substrate-based chains).

1. **Solution Design:** Because it supports all JSON types natively, it is developer-friendly and readable by generic decoders without needing a specific schema file.
2. **Reliability:** It allows for compatibility across versions. An older version of an application can often still decode data created by a newer version, even if fields have been added. This supports the "graceful upgrade" philosophy essential to modern blockchain development.
3. **Size:** The JSON example above, when encoded in CBOR, drops to **369 bytes**. That is a **~15% reduction** compared to minified JSON.

## The Problem: The Determinism Trap

CBOR seems like a great middle ground: smaller than JSON, more flexible than SCALE. But standard CBOR and JSON share a critical flaw when applied to cryptography: **non-determinism**.

In smart contracts and other decentralized systems, we **very often** hash data to verify it. This requires the serialized input to be byte-for-byte identical every single time.

Consider this simple data:

```json
{ "name": "Leonardo", "lastName": "Custodio" }
```

If we convert the minified JSON to hex and hash it, we get:

`0x0b1d2e3a256a212f57b772d8674b368e42d936d19901b9459ceb7d6be7a680b7`

However, JSON and standard CBOR do not enforce key order. Suppose a different library or language serializes the *same data* but orders the keys differently:

```json
{ "lastName": "Custodio", "name": "Leonardo" }
```

The hex representation changes, and consequently the **hash** becomes:

`0xc0c992cf52e41d06bea208e43b73d78cd573885a6a9f98763a63b84147c2f1f5`

This change in hash breaks everything. It makes it impossible to verify whether an input was saved before, impossible to deduplicate data efficiently, and expensive to validate signatures on-chain.

To use CBOR effectively in a blockchain context, we need the flexibility of JSON, the binary efficiency of CBOR, and the strict determinism of SCALE.

This is where **dCBOR** comes in. I cover it here: https://hackmd.io/@leonardocustodio/deterministic-data-intro-to-dcbor
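
To make the key-order problem above concrete, here is a minimal Python sketch using only the standard library. It rests on some assumptions: the article does not say which hash function produced its digests, so SHA-256 is used purely for illustration and the printed values are not expected to match the ones shown earlier. The `digest` helper is a hypothetical name, and sorting keys is just one simple way to canonicalize JSON, not the dCBOR rules themselves.

```python
import hashlib
import json


def digest(obj, *, sort_keys=False):
    """Hash the minified JSON encoding of obj (SHA-256 is an assumed choice)."""
    encoded = json.dumps(obj, separators=(",", ":"), sort_keys=sort_keys)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


# The same data, built with the keys in a different order.
# Python dicts preserve insertion order, so json.dumps emits them as written.
a = {"name": "Leonardo", "lastName": "Custodio"}
b = {"lastName": "Custodio", "name": "Leonardo"}

# Naive serialization: the bytes differ, so the hashes differ.
naive_a, naive_b = digest(a), digest(b)

# Canonical serialization: keys are sorted before encoding,
# so both objects produce identical bytes and identical hashes.
canon_a, canon_b = digest(a, sort_keys=True), digest(b, sort_keys=True)

print("equal as data:         ", a == b)              # True
print("naive hashes match:    ", naive_a == naive_b)  # False
print("canonical hashes match:", canon_a == canon_b)  # True
```

The point of the sketch is that determinism has to be a property of the encoding rules, not of any one library's default behavior: both sides must agree on a single canonical byte representation before hashing.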