# Verifiable credential data formats
[TOC]

## SNARK-friendly data formats
1. Pedersen vector commitment (pre-quantum), like in BBS
2. Merkle commitment (post-quantum), like SSZ (see the sketch below)
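A minimal sketch of the second option: a Merkle commitment over credential attributes, of the kind a post-quantum, SNARK-friendly format could use. Assumptions (not from any spec): SHA-256 leaves via Node's built-in `crypto`, attributes hashed as `name=value` strings, and duplicate-last-node padding; a real scheme such as SSZ's `hash_tree_root` pins these choices down precisely.

```typescript
// Minimal Merkle commitment over credential attributes (illustrative only).
import { createHash } from "node:crypto";

const sha256 = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Hash each attribute ("name=value") into a leaf, in a deterministic order.
function leaves(attributes: Record<string, string>): Buffer[] {
  return Object.entries(attributes)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => sha256(Buffer.from(`${k}=${v}`, "utf8")));
}

// Fold pairs of nodes upward until a single root remains.
function merkleRoot(nodes: Buffer[]): Buffer {
  if (nodes.length === 0) return sha256(Buffer.alloc(0));
  while (nodes.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < nodes.length; i += 2) {
      const right = nodes[i + 1] ?? nodes[i]; // duplicate last node if odd
      next.push(sha256(Buffer.concat([nodes[i], right])));
    }
    nodes = next;
  }
  return nodes[0];
}

// Example: commit to a credential's attributes; selective disclosure would
// reveal a leaf plus its Merkle path instead of the whole credential.
const root = merkleRoot(leaves({ givenName: "Alice", degree: "MSc" }));
console.log(root.toString("hex"));
```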
## Notes
- EUDI will adopt VCDM 2.0 as soon as it is ready
- VCDM 2.0 will be published as a standard in June/July
- VCDM 2.0 is JSON-LD-based, but can be processed as plain JSON; it can be secured with SD-JWT or JWT
- JSON is in fact compatible with data integrity proofs
- e.g. ecdsa-jcs
- JSON canonicalisation (JCS, RFC 8785) -- see the sketch after this list
- data integrity processing is independent of serialisation and canonicalisation
- sometimes data integrity depends on external context, e.g. ecdsa-rdfc
- "transform" step: take JSON or JSON-LD into an easier format
- a new cryptosuite for STARKs could define different processing
- "processing" is the "transform" step
- https://w3c.github.io/vc-di-ecdsa/#representation-ecdsa-rdfc-2019-with-curve-p-256
- "canonical form" is specific to JSON-LD, gets the document into a set of statements
- transformation/canonicalisation is completely up to your cryptosuite
- look at the BBS test vectors: https://w3c.github.io/vc-di-bbs/#test-vectors
- canonicalised document is signed?
- holder takes base document (which is NOT canonicalised document) and derives another document with removed fields
- get in touch with Greg Bernstein to figure out canonicalisation for STARKs
- data integrity WG are very interested in supporting STARKs
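A minimal sketch of the kind of canonicalisation ecdsa-jcs relies on (JCS, RFC 8785): keys sorted lexicographically, no insignificant whitespace. This simplified version ignores RFC 8785's exact number-serialisation and string-escaping rules, so treat it as illustrative only.

```typescript
// Simplified JCS-style canonicalisation: sorted keys, no whitespace.
// RFC 8785 additionally pins down number formatting and string escaping,
// which this sketch does not reproduce.
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  const members = Object.keys(value)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${canonicalize(value[k])}`);
  return `{${members.join(",")}}`;
}

// Two differently ordered documents canonicalise to the same bytes,
// so they hash (and therefore sign/verify) identically.
console.log(canonicalize({ b: 1, a: { d: true, c: "x" } }));
// => {"a":{"c":"x","d":true},"b":1}
```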
1. issuer takes raw "base" document without proof
2. issuer canonicalises document
3. issuer signs canonicalised document
4. get a credential with proof
5. further processing attaches the resulting signature (computed over the canonicalised document) to the credential as a proof -- see the sketch after this list
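A sketch of that issuer flow (transform/canonicalise, hash, sign, attach), loosely following https://w3c.github.io/vc-data-integrity/#add-proof. The real algorithms also canonicalise and hash the proof configuration before signing, which is skipped here; `signFn`, the cryptosuite name, and the `canonicalize` placeholder are illustrative assumptions, not spec-defined values.

```typescript
// Sketch of the Data Integrity "add proof" flow: canonicalise, hash, sign,
// then attach the proof to the *original* (un-canonicalised) document.
import { createHash } from "node:crypto";

// Placeholder: substitute the JCS-style canonicalize() from the sketch above.
const canonicalize = (v: unknown): string => JSON.stringify(v);

interface Proof {
  type: string;
  cryptosuite: string;
  created: string;
  verificationMethod: string;
  proofPurpose: string;
  proofValue: string;
}

function addProof(
  document: Record<string, unknown>,
  signFn: (hash: Buffer) => string, // e.g. ECDSA today, a SNARK/STARK later
  verificationMethod: string
): Record<string, unknown> {
  const canonical = canonicalize(document);                     // transform
  const hash = createHash("sha256").update(canonical).digest(); // hash
  const proof: Proof = {
    type: "DataIntegrityProof",
    cryptosuite: "example-suite-2024", // illustrative, not a registered suite
    created: new Date().toISOString(),
    verificationMethod,
    proofPurpose: "assertionMethod",
    proofValue: signFn(hash),                                   // sign
  };
  return { ...document, proof };                                // embed proof
}
```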
- "enveloping proofs" totally takes over the data model, makes it opaque to application layer, need special processors to extract it
- don't need to canonicalise or transform, just put the original data into the envelope and secure it
- "embedding proofs"
- verifier needs to be aware of many more transform algorithms
- but they will always need to anyway with selective disclosure?
- BBS also needs transform step
- why is it easier to merge data with JSON-LD?
- semantic data modelling
- JSON has no shared semantics; every field's meaning is single-use / application-specific
- in JSON, cannot differentiate between the same word used in two different systems
- JSON-LD context defines property in a global way, so no semantic ambiguity between different systems
- match each field to a globally resolvable long URL
- JSON is a local tree with no merge semantics -- identical field names from different documents will collide
- JSON-LD merging preserves global ids, so graph merging is well-defined (see the sketch after this list)
- verifiers who do JSON-LD processing need to resolve (and ideally cache) contexts
- for offline, contexts in production are meant to be permanently cached, verifier should not have to get it from the Internet
- "type-specific processing"
## Questions
- high-level question: how does W3C VCDM standard interact with EUDI attestation format?
- can we participate in W3C Credentials Community Group?
- are we trying to pick an abstract data model or specific data format?
- is CBOR a...canonicalisation? Serialisation?
- VCDM1.1 vs VCDM2.0 -- which is going to be adopted? Is it the case that VCDM2.0 is specifically for JSON-LD, and if so, is this backwards compatible with VCDM1.1?
- data integrity proofs vs. embedded signatures?
- which is detached and which is embedded?
- do you think zk-(S)NARKs can be supported as detached data integrity proofs? (Embedded vs enveloping proofs)
- do JWTs support data integrity proofs?
- "JWT makes it visible what is signed in contrast to LD-Proofs, e.g., LD Signatures, that are detached from the actual payload and contain links to external documents which makes it not obvious for a developer to figure out what is part of the signature."
- "Readers should be aware that Zero-Knowledge Proofs are currently proposed as a sub-type of LD-Proofs and thus fall into the final column below."
- "Using these graphs has a concrete effect when performing JSON-LD processing, as this properly separates statements expressed in one graph from those in another graph. Implementers that limit their processing to other media types, such as JSON, YAML, or CBOR, will need to keep this in mind if they merge data from one document with data from another, such as when an id value string is the same in both documents. It is important to not merge objects that seem to have similar properties, when those objects do not have an id property and/or use a global identifier type such as a URL, as without these, is not possible to tell whether two such objects are expressing information about the same entity. "
## Design considerations
- Optimising for zkSNARK compatibility
- to parse inside a SNARK: CBOR seems easier
- to contain a SNARK proof
- "If no JWS is present, a proof property MUST be provided. The proof property can be used to represent a more complex proof, as may be necessary if the creator is different from the issuer, or a proof not based on digital signatures, such as Proof of Work. The issuer MAY include both a JWS and a proof property. For backward compatibility reasons, the issuer MUST use JWS to represent proofs based on a digital signature."
- (JSON-LD) "The next example utilizes the verifiable credential above to generate a new derived verifiable credential with a privacy-preserving proof. The derived verifiable credential is then placed in a verifiable presentation, so that the verifiable credential discloses only the claims and additional credential metadata that the holder intended. To do this, all of the following requirements are expected to be met: " https://www.w3.org/TR/vc-data-model/#zero-knowledge-proofs
- Optimising for developer hours
- we may not have the time to write something from scratch; regex and field parsing already available for zkJSON
- CBOR circuits: https://github.com/noway/nzcp-circom/blob/main/circuits/cbortpl.circom
- we should find/construct an example of:
- a SNARK over a signature being used as a data integrity proof (see the illustrative proof object after this list)
- `proof`, `domain`, `challenge`, `proofValue`
- `proofPurpose`
- `authentication`, `assertionMethod`, `keyAgreement`, `capabilityDelegation`, `capabilityInvocation`
- https://w3c.github.io/vc-data-integrity/#add-proof
- a recursive SNARK over a signature being used as a data integrity proof
- proof chain, `previousProof`
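For reference when constructing those examples, an illustrative proof object using the property names from https://w3c.github.io/vc-data-integrity/#add-proof (`proofPurpose`, `domain`, `challenge`, `proofValue`, `previousProof`). The cryptosuite name, identifiers, and values are placeholders, not a real registered suite.

```typescript
// Illustrative Data Integrity proof attached to a credential; a ZK variant
// would carry the SNARK/STARK proof bytes in proofValue, and a recursive
// proof chain links back to an earlier proof via previousProof.
const proof = {
  type: "DataIntegrityProof",
  cryptosuite: "example-zk-suite-2024",           // placeholder cryptosuite name
  created: "2024-01-01T00:00:00Z",
  verificationMethod: "did:example:issuer#key-1", // resolves to the public key
  proofPurpose: "assertionMethod",                // or authentication, keyAgreement,
                                                  // capabilityDelegation, capabilityInvocation
  domain: "https://verifier.example",             // binds the proof to a verifier
  challenge: "d4f1c6b2",                          // replay protection supplied by the verifier
  previousProof: "urn:uuid:00000000-0000-0000-0000-000000000000", // optional: proof chains / recursion
  proofValue: "z...",                             // multibase-encoded signature or zk proof bytes
};
```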
## Formats to consider
### SD-JWT
- JSON is incompatible with Data Integrity Proof?
- JWTs can embed proof attributes for repudiable proofs such as Zero-Knowledge Proofs; in that case, the JWS will not have a signature element (SD-JWT's own selective-disclosure mechanism is sketched after this list)
- "The LD-Proof format is capable of modifying the algorithm that generates the hash or hashes that are cryptographically signed. This cryptographic agility enables digital signature systems, such as Zero-Knowledge Proofs, to be layered on top of LD-Proofs instead of an entirely new digital signature container format to be created. JWTs are designed such that an entirely new digital signature container format will be required to support Zero-Knowledge Proofs."
- Is this true? "A JWT can fully describe itself without the need to retrieve or verify any external documents"
- For example, one JSON document might use "title" (meaning "book title") in a way that is semantically incompatible with another JSON document using "title" (meaning "job title").
- should it be the verifier's job to understand schemas?
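A minimal sketch of SD-JWT's selective-disclosure mechanism as described in the IETF SD-JWT draft: each hidden claim becomes a salted disclosure array, and only its hash appears in the signed payload's `_sd` array; the holder reveals the disclosures they choose at presentation time. Details such as decoy digests and hash-algorithm negotiation are omitted, and the payload fields are illustrative.

```typescript
// SD-JWT-style disclosure: the issuer signs only a digest of each hidden claim;
// the holder reveals the matching disclosure strings at presentation time.
import { createHash, randomBytes } from "node:crypto";

const b64url = (data: Buffer): string => data.toString("base64url");

// Disclosure = base64url(JSON array [salt, claim name, claim value]).
function makeDisclosure(name: string, value: unknown): string {
  const salt = b64url(randomBytes(16));
  return b64url(Buffer.from(JSON.stringify([salt, name, value]), "utf8"));
}

// The digest of the disclosure is what goes into the signed payload's _sd array.
const digest = (disclosure: string): string =>
  b64url(createHash("sha256").update(disclosure, "ascii").digest());

const disclosure = makeDisclosure("given_name", "Alice");
const payload = {
  iss: "https://issuer.example",
  _sd_alg: "sha-256",
  _sd: [digest(disclosure)], // only hashes are signed; values stay out of the payload
};
// The holder sends the signed payload plus the disclosures they want to reveal;
// the verifier recomputes each digest and checks it is present in _sd.
```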
### JSON-LD

- canonicalisation -- find out what's involved
- is it actually backwards-compatible with JSON?
- abstract data model supports DAG
- "JSON-LD's abstract data model supports the expression of information as a directed graph of labeled nodes and edges, which enables an open world data model to be supported. JSON's abstract data model only supports the expression of information as a tree of unlabeled nodes and edges, which restricts the types of relationships and structures that can be natively expressed in the language."
- compatible with other encodings besides base-64
- "LD-Proofs enable developers to use common JSON tooling without having to convert the format into a different format or structure. JWTs base-64 encode payload information, resulting in complicated pre and post processing steps to convert the data into JSON data while not destroying the digital signature."
- has an @context field; can be parsed as plain JSON by ignoring this field (see the example after this list)
- In general, a JSON array is ordered, while a JSON-LD array is not ordered unless that array uses the @list keyword.
- flow: verifier caches the @context that it is planning to use
- "It is best practice that JSON data structures typically do not expect changing types of their internal attributes. JSON-LD has implicit support for compact form serialization which transforms arrays with a single element only to switch its data type. Developers writing parsers have to implement special handling of these data types, which results in more code, is more error-prone and sometimes does not allow parsers based on code generation, which rely on static types."
### CBOR
- binary serialisation format
- self-describing: every data item carries its major type and length up front (see the byte-level example below)
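A byte-level look at why CBOR is called self-describing (the example map is made up): each data item starts with an initial byte encoding its major type and length, following RFC 8949, so a parser, or a circuit, can walk the structure without an external schema.

```typescript
// Hand-encoded CBOR for the map {"a": 1, "msg": "hi"} (RFC 8949 major types):
//   0xA2                   map, 2 pairs        (major type 5, length 2)
//   0x61 0x61              text string "a"     (major type 3, length 1)
//   0x01                   unsigned int 1      (major type 0, value 1)
//   0x63 0x6D 0x73 0x67    text string "msg"   (major type 3, length 3)
//   0x62 0x68 0x69         text string "hi"    (major type 3, length 2)
const encoded = Buffer.from([
  0xa2,
  0x61, 0x61,
  0x01,
  0x63, 0x6d, 0x73, 0x67,
  0x62, 0x68, 0x69,
]);
// Every item announces its own type and length up front, which is what makes
// field extraction inside a circuit (cf. nzcp-circom's cbortpl.circom) tractable.
console.log(encoded.toString("hex")); // "a2616101636d7367626869"
```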
### SSZ
- deterministic serialisation format used by Ethereum's consensus layer; defines Merkleization (`hash_tree_root`) over the encoded data, which is why it appears above as a post-quantum Merkle commitment
- https://ethereum.org/en/developers/docs/data-structures-and-encoding/ssz/