tbdex message protocol

tbDEX Protocol
===

The tbDEX protocol consists of two elements: _resources_ and _messages_.

[ToC]

# Messages

## Example

The following example is an RFQ:

```json=
{
  "metadata": {
    "from": "did:ex:alice",
    "to": "did:ex:pfi",
    "kind": "rfq",
    "id": "abcd123",
    "threadId": <RFQ_ID>,
    "parentId": null,
    "dateCreated": "ISO_8601"
  },
  "data": {
    "offeringId": <OFFERING_ID>,
    "quoteAmountSubunits": "STR_VALUE",
    "credentials": <PRESENTATION_SUBMISSION_HASH>,
    "payinMethod": {
      "kind": "BTC_ADDRESS",
      "paymentDetails": <OBJ_HASH>
    },
    "payoutMethod": {
      "kind": "MOMO_MPESA",
      "paymentDetails": <OBJ_HASH>
    }
  },
  "signature": "COMPACT_JWS",
  "private": {
    "credentials": <PRESENTATION_SUBMISSION>,
    "payinMethod": {
      "paymentDetails": <OBJ>
    },
    "payoutMethod": {
      "paymentDetails": <OBJ>
    }
  }
}
```

## Fields

All tbDEX messages are JSON objects which can include the following top-level properties:

| Field       | Required (Y/N) | Description                                                             |
| ----------- | -------------- | ----------------------------------------------------------------------- |
| `metadata`  | Y              | An object containing fields _about_ the message                         |
| `data`      | Y              | The actual message content                                              |
| `signature` | Y              | A signature that verifies the authenticity and integrity of the message |
| `private`   | N              | An ephemeral JSON object used to transmit sensitive data (e.g. PII)     |

### `metadata`

The `metadata` object contains fields _about_ the message and is present in _every_ tbDEX message.

| Field         | Required (Y/N) | Description                                                                                   |
| ------------- | -------------- | --------------------------------------------------------------------------------------------- |
| `from`        | Y              | The sender's DID                                                                              |
| `to`          | Y              | The recipient's DID                                                                           |
| `kind`        | Y              | e.g. `rfq`, `quote` etc. This defines the `data` property's _type_                            |
| `id`          | Y              | The message's ID                                                                              |
| `threadId`    | Y              | ID for a "thread" of messages between Alice <-> PFI. Set by the first message in a thread     |
| `parentId`    | N              | The ID of the most recent message in a thread                                                 |
| `dateCreated` | Y              | ISO 8601 timestamp                                                                            |

> :information_source: **TODO**: decide on what to do about `parentId` when the message is an rfq. Set to `null`? Remove it?

### `data`

The actual message content. This will _always_ be a JSON object.

### `private`

Often, an RFQ will contain PII or PCI data either within the `credentials` being presented or within `paymentDetails` of `payinMethod` or `payoutMethod` (e.g. card details, phone numbers, full names etc). To avoid storing this sensitive data with the message itself, the value of a property containing sensitive data can be a [hash](#Hashing) of the sensitive data. The sensitive data itself is included in the `private` field.

The `private` field is ephemeral and **MUST** only be present when the message is initially sent to the intended recipient.

The value of `private` **MUST** be a JSON object that matches the structure of `data`. The only properties present within `private` **MUST** be those whose counterpart in `data` is a hash.

> :information_source: Rationale behind the `private` JSON object matching the structure of `data` is to simplify programmatic hash evaluation using JSONPath to pluck the respective hash from `data`.
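As a rough illustration of the hash evaluation that the rationale above has in mind, here is a minimal Python sketch. Everything in it is an assumption rather than part of the protocol: the function names (`cbor_encode`, `tbdex_hash`, `verify_private`) are hypothetical, the CBOR encoder is hand-rolled and covers only JSON-style values without floats, map keys are sorted by their encoded bytes (this document does not say which CBOR encoding rules apply), standard padded Base64 is used, and a plain recursive walk stands in for a JSONPath library. A string value in `data` is treated as the hash of the matching value in `private`.

```python
import base64
import hashlib
import struct


def cbor_encode(value) -> bytes:
    """Minimal CBOR encoder for JSON-style values (no floats).

    Assumption: map entries are ordered by their encoded key bytes.
    """
    def head(major: int, n: int) -> bytes:
        # Major type plus unsigned argument, using the shortest form.
        if n < 24:
            return bytes([(major << 5) | n])
        for extra, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                                  (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
            if n < limit:
                return bytes([(major << 5) | extra]) + struct.pack(fmt, n)
        raise ValueError("integer too large")

    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        return head(0, value) if value >= 0 else head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(value, list):
        return head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        items = sorted((cbor_encode(k), cbor_encode(v)) for k, v in value.items())
        return head(5, len(value)) + b"".join(k + v for k, v in items)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def tbdex_hash(value) -> str:
    """base64Encode(sha256(cbor(json))), as in the Hashing section."""
    digest = hashlib.sha256(cbor_encode(value)).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_private(data: dict, private: dict, path: str = "$") -> list:
    """Return the paths in `private` whose hash does not match `data`."""
    mismatches = []
    for key, secret in private.items():
        here = f"{path}.{key}"
        expected = data.get(key)
        if isinstance(expected, str):
            # Leaf: `data` holds the hash of the value carried in `private`.
            if tbdex_hash(secret) != expected:
                mismatches.append(here)
        elif isinstance(expected, dict) and isinstance(secret, dict):
            mismatches.extend(verify_private(expected, secret, here))
        else:
            mismatches.append(here)
    return mismatches


# Hypothetical payment details, for illustration only.
details = {"address": "bc1-example"}
data = {
    "offeringId": "offering_123",
    "payinMethod": {"kind": "BTC_ADDRESS", "paymentDetails": tbdex_hash(details)},
}
private = {"payinMethod": {"paymentDetails": details}}
print(verify_private(data, private))  # prints []
```

Because `private` mirrors `data`, each key visited in `private` leads to its hash in `data` at the same path. A tampered value would show up as a path such as `$.payinMethod.paymentDetails` in the returned list.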
> **NOTE**: we should try this to make sure it's actually "easy"

#### Example

```json=
{
  "data": {
    "offeringId": <OFFERING_ID>,
    "quoteAmountSubunits": "STR_VALUE",
    "credentials": <PRESENTATION_SUBMISSION_HASH>, <---- hash
    "payinMethod": {
      "kind": "BTC_ADDRESS",
      "paymentDetails": <OBJ_HASH> <---- hash
    },
    "payoutMethod": {
      "kind": "MOMO_MPESA",
      "paymentDetails": <OBJ_HASH> <---- hash
    }
  },
  "private": {
    "credentials": <PRESENTATION_SUBMISSION>, <---- actual
    "payinMethod": {
      "paymentDetails": <OBJ> <---- actual
    },
    "payoutMethod": {
      "paymentDetails": <OBJ> <---- actual
    }
  }
}
```

### `signature`

The `signature` property's value is a compact [JWS](https://datatracker.ietf.org/doc/html/rfc7515).

#### Header

The JWS header **MUST** contain the following properties:

| Field | Description                                                                          |
| ----- | ------------------------------------------------------------------------------------ |
| `alg` | [Reference](https://datatracker.ietf.org/doc/html/rfc7515#section-4.1.1)             |
| `kid` | The `id` of the DID Document `verificationMethod` that can be used to verify the JWS |

#### Payload

The payload is a JSON object and **MUST** contain the following:

| Field      | Description                    |
| ---------- | ------------------------------ |
| `metadata` | [Hash](#Hashing) of `metadata` |
| `data`     | [Hash](#Hashing) of `data`     |

## ID generation

Currently, tbDEX message IDs are [TypeIDs](https://github.com/jetpack-io/typeid) generated by the sender. The prefix for a given ID **MUST** be the same as the `metadata.kind` of the message.

> :information_source: TODO: Discuss using `prefix_$(sha256(cbor(message)))` as the ID as an alternative

## Hashing

TL;DR:

```
base64Encode(
  sha256(
    cbor(json)
  )
)
```

1. **CBOR Encode the JSON Object**: Take your JSON object and encode it using CBOR. This step produces a binary representation of your JSON object.
2. **SHA256 the Bytes**: Hash the CBOR-encoded byte sequence using SHA256. This produces a fixed-size (256-bit) hash.
3. **Base64 Encode the Hash**: Finally, to represent the hash in a text format (for easier sharing, storage, etc.), Base64 encode the SHA256 hash bytes.

### Rationale

#### Why CBOR?

Benefits:

* **Deterministic Serialization**: JSON serialization libraries can produce non-deterministic results, especially in the ordering of object keys, so the same logical object can have different serialized representations. CBOR, by contrast, defines deterministic encoding rules ([RFC 8949, Section 4.2](https://datatracker.ietf.org/doc/html/rfc8949#section-4.2)); an encoder that follows them always produces the same binary representation for the same logical object.
* **Uniform Data Representation**: Some data types, such as floating-point numbers, can have multiple valid representations in JSON (e.g., 1.0 vs. 1). CBOR can offer a more consistent representation of these types.

Trade-offs:

* **Complexity**: Encoding CBOR adds complexity and dependencies.
* **Performance**: While CBOR might be more space-efficient, converting JSON to CBOR introduces an additional computational step. For small objects or infrequent operations this might be negligible, but for high-frequency operations the conversion overhead could become noticeable.

#### Why SHA256?

* **Widely Recognized and Adopted**: SHA256, which is part of the SHA-2 (Secure Hash Algorithm 2) family, is widely adopted in cryptographic applications and protocols. It is standardized by the National Institute of Standards and Technology (NIST) in the U.S., which means it has undergone extensive review and evaluation by experts in the field.
* **Security**: As of today, SHA256 has no known vulnerability to collision attacks, preimage attacks, or second preimage attacks.
    * A collision attack is when two different inputs produce the same hash.
    * A preimage attack is when, given a hash, an attacker finds an input that hashes to it.
    * A second preimage attack is when, given an input and its hash, an attacker finds a different input that produces the same hash.
* **Output Size**: SHA256 provides a fixed hash output of 256 bits (32 bytes). This size strikes a balance between efficiency and security.

#### Why Base64?

When sending a SHA-256 hash (or any binary data) over the wire, it's common to use an encoding that translates the binary data into a set of characters that can be safely transmitted over systems that might not handle raw binary well. One of the most common encodings used for this purpose is Base64. Base64-encoded data is safe for transmission over most protocols and systems since it only uses printable ASCII characters. Base64 encoding and decoding is widely supported across programming languages.

> :information_source: It's worth noting that a raw SHA256 hash is 32 bytes. When Base64 encoded it becomes a 44-character string (including one `=` padding character).

## Encryption

> :information_source: TODO: Fill out

## Compatibility: tbDEX <> Web5/DWeb Messages

The tbDEX messaging format is designed to work with HTTP/REST implementations for PFIs, but can also work with Web5-DWeb messages and thus DWNs. This is important for self-custodial use cases and for transactional portability, so that end consumers own their financial data and transaction history.
This can be done by mechanically mapping fields as described below.

DWN representation, with comments indicating the equivalent tbDEX field:

```json=
{
  "descriptor": {
    "recipient": "did:ex:pfi", // same as to
    "schema": "rfq", // same as kind
    "protocol": "tbdex",
    "protocolPath": "tbdex/rfq",
    "dataCid": <CID_OF_DATA_PROP>,
    "contextId": "abcd123", // same as threadId
    "parentId": null,
    "dateCreated": "ISO_8601"
  },
  "data": {
    "offeringId": <OFFERING_ID>,
    "quoteAmountSubunits": "STR_VALUE",
    "credentials": <PRESENTATION_SUBMISSION_HASH>,
    "payinMethod": {
      "kind": "BTC_ADDRESS",
      "paymentDetails": <OBJ_HASH>
    },
    "payoutMethod": {
      "kind": "MOMO_MPESA",
      "paymentDetails": <OBJ_HASH>
    }
  },
  "authorization": ["COMPACT_JWS"],
  "private": { // needs work
    "credentials": <PRESENTATION_SUBMISSION>,
    "payinMethod": {
      "paymentDetails": <OBJ>
    },
    "payoutMethod": {
      "paymentDetails": <OBJ>
    }
  }
}
```

Current differences:

The differences between tbDEX messages and DWN messages are minimal and reconcilable given the appropriate amount of time. The primary differences between the two formats are the following:

* tbDEX messages contain an ephemeral `private` field that can be used to transport sensitive data without storing it.
* DWeb Message IDs are IPLD CIDs, specifically `dag-cbor`. Support for this is not readily available in Kotlin or Swift. This difference can be reconciled in a number of ways.
* DWeb Message descriptors contain `dataCid`, which is an IPLD CID, specifically `dag-pb`. Support for this is not readily available in Kotlin or Swift. This difference can be reconciled in a number of ways.
* The tbDEX message format's hash algorithm is similar to `dag-cbor` minus the `dag` aspect, to ensure that implementations in other languages are relatively straightforward.
* DWeb messages make use of General JWS whereas tbDEX messages use compact JWS.
* The DWeb message `descriptor` property is roughly analogous to the tbDEX message's `metadata` property.
* DWeb message `data` is returned base64url encoded whereas tbDEX message `data` is returned as is.

Potential changes for DWeb messages:

* Move away from IPLD CIDs to a simpler hash format that has broad language support.
* Support directly embedding data in some scenarios (or optionally support inline data for small values).

tbDEX will rename `metadata` to `descriptor`.

# Resources

> :information_source: TODO: describe tbDEX Resources (e.g. `Offering`)