# DID Doc Encoding: Abstract Data Model in JSON
This is a proposal to simplify DID-Docs by defining a simple abstract data model in JSON and then permitting other encodings such as JSON-LD and CBOR. This would eliminate an explicit dependency on the RDF data model.
## Universal Adoptability
For universal interoperability, DIDs and DID-Docs need to follow standard representations. One goal of the DID specification is to achieve universal adoption. Broad adoption is fostered by using familiar representations or encodings for the DID and DID Doc. The DID syntax itself is derived from the widely adopted and highly familiar URI/URL identifier syntax. This takes advantage not only of familiarity but also of the tooling built up around that syntax. Likewise, greater adoption is fostered to the degree that the DID Doc representation or encoding uses a familiar, widely adopted representation with extant tooling.
The only reason not to use a highly familiar representation is if the requirements for representation demand, or greatly benefit from, a less familiar representation. An appendix to this document provides some detail about the main purposes of a DID Doc. It shows that a complex representation is not required and may not be beneficial.
In addition, having only a single representation or encoding, albeit highly familiar and widely adopted, may be insufficient to achieve universal adoption. Universal adoption may require multiple representations or encodings.
Multiple encodings require a standard encoding in common; in other words, a least common denominator from which the other encodings may be derived.
One way to accomplish this is to use an abstract data model as the standard encoding and then allow for other encodings. This was proposed in the following issue:
https://github.com/w3c/did-core/issues/103#issuecomment-553532359
The problem with an abstract data model is that its syntax is expressed in some abstract modeling language, typically a kind of pseudo-code. Pseudo-code is usually less familiar than real code. This means that even in the primary case the spec is written in a *language* that is unfamiliar, which runs counter to fostering broader adoption. A solution to this problem is to pick a real language encoding for the abstract data model. That encoding then serves both as an abstracted standard encoding from which other encodings can more easily be derived and as the lowest common denominator standard encoding.
Given the web roots of the DID syntax itself as a derivation of URL syntax, JSON's web roots clearly make it the ideal candidate for an abstract data model language. Of the encodings available, JSON is the closest to universally adopted. JSON is simple but has sufficient expressive power to model the important data elements needed. It is therefore a ***sufficient*** encoding. Annotated JSON could be used to model additional data types, such as an ordered mapping, in the event that they are needed. Many of the related standards popular among implementors, such as the JWT standards, are based on JSON. Casual conversations with many others in the community suggest that a supermajority of implementors would support JSON as the standard encoding for the combined abstract data model and default encoding.
Given its familiarity, JSON would not pose a barrier to implementors of other optional encodings such as JSON-LD or CBOR. It would be just as easy, if not easier, to translate JSON to another encoding as it would be to translate pseudo-code.
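For concreteness, here is a minimal sketch of a DID Doc expressed as a plain JSON object (built and serialized in Python here so that all examples in this document share one language). The DID, field names, and values are illustrative, loosely echoing examples from the current spec, and are not a normative proposal:

```python
import json

# A minimal, illustrative DID Doc as a plain JSON object.
# Field names loosely follow the current spec; this is a sketch, not normative.
did_doc = {
    "id": "did:example:123456789abcdefghi",
    "publicKey": [{
        "id": "did:example:123456789abcdefghi#keys-1",
        "type": "Ed25519VerificationKey2018",
        "controller": "did:example:123456789abcdefghi",
        "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
    }],
    "service": [{
        "id": "did:example:123456789abcdefghi#agent",
        "type": "AgentService",
        "serviceEndpoint": "https://agent.example.com/1234",
    }],
}

# Round-trips with any standard JSON tool; no special context processing needed.
print(json.dumps(did_doc, indent=2))
```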
## The Elephant in the Room
The result of this proposal would be to make JSON the standard encoding for the DID Doc specification and to demote JSON-LD to an optional encoding. The current DID spec uses JSON-LD as the preferred encoding but does not prevent the use of naive JSON as an encoding. However, the DID spec mandates JSON-LD elements that show up as artifacts in plain JSON, which a JSON implementer must handle specially. Moreover, the semantics of JSON-LD are much more restrictive than those of JSON. This results in a lot of time being expended unproductively in community meetings discussing the often highly arcane and non-obvious details of JSON-LD syntax and semantics. The community is largely unfamiliar with JSON-LD. It is clear that JSON is sufficient to accomplish the main purposes of the DID Doc. Although JSON-LD may provide some advantages in some cases, its extra complexity runs counter to the goal of fostering more universal adoption. This proposal does not exclude JSON-LD but would encapsulate and isolate discussion of the esoteric syntax and semantics of JSON-LD to that subset of the community that really wants JSON-LD. Each optional encoding, including JSON-LD, would have a companion specification to the DID spec that defines how to implement that encoding. This structure would make it easier to implement other encodings in the future because JSON is much closer to a lowest common denominator data model than JSON-LD.
The relevant questions up for decision are:
- Is JSON a sufficient encoding for the purpose of DID Docs?
- Would JSON foster greater adoption than some other encoding such as JSON-LD?
- Do a majority of implementors prefer JSON as the default encoding over some other encoding such as JSON-LD?
The purpose of this proposal is not to debate the general merits of JSON-LD and RDF. There is much good in JSON-LD for many applications. What is relevant here is that JSON-LD is not as well aligned as JSON with the goal of fostering universal adoption. More specifically, the RDF model employed by JSON-LD complicates the implementation of other encodings that do not share the RDF data model and RDF semantics. JSON does not suffer from this complication. The complication has the deleterious effect of slowing adoption.
## Appendix
### Purpose of DID-Doc
The current DID specification includes a specification for a DID Document (DID-Doc). The main purpose of the DID-Doc is to provide information needed to use the associated DID in an authoritative way.
A distinguishing feature of a DID (Decentralized Identifier) is that the controller (entity) of the DID obtains and maintains its control authority over that DID using a decentralized root of trust. Typically this authority is self-derived from the entropy in a random number (whose strength is expressed as collision resistance) that is then used to create a cryptographic public/private key pair. When the identifier is universally uniquely derived from this entropy, the identifier has the property of self-certifiability. Another, somewhat less decentralized, root of trust for an identifier is a public ledger or registry with decentralized governance.
In any event, a more-or-less decentralized root of trust only has value if other entities recognize and respect that root of trust. Hence portable, interoperable decentralized identifiers must be based on an interoperable standard representation; hence the DID standard.
In contrast, "administrative" identifiers obtain and maintain their control authority from a centralized administrative entity. This control authority is not derived from the entropy in a random number. This statement may be confusing to some because administrative identifiers often use cryptographic public/private key pairs. To explain, PKI with public/private key pairs and cryptographic digital signatures enables the conveyance of control authority via signed non-repudiable attestations. But the source of that control authority may or may not be decentralized. Thus an administrative entity may convey trust via PKI (public/private keys pairs) but does not derive its control authority therein. Whereas a decentralized entity may derive its control authority over a DID solely from the entropy in the a private key in a PKI public/private key pair.
A key technology underpinning DIDs is the cryptographic signature, by which the control authority over the associated DID and affiliated resources may be verified by any user of the DID. In contrast, an administrative identifier always has a last-recourse appeal to the authority of the administrative entity, by whatever means that authority is established.
Indeed, given the foregoing explanation, the most important task facing a user of a DID is to cryptographically verify control authority over the DID so that the user may then further cryptographically verify any attestations of the controller (entity) about the DID itself and/or affiliated resources. The verifications must be cryptographic because, with a decentralized root of trust, the original control authority was established cryptographically and the conveyance of that control authority may only be verified cryptographically. With DIDs it's cryptographic verification all the way down.
From this insight we can recognize that a DID-Doc should support a primary purpose and a secondary purpose as follows:
- Primary: Aid the user in cryptographically verifying the current control authority over the DID.
- Secondary: Aid the user in discovering and verifying anything else affiliated with the DID based on the current control authority.
If the user cannot determine the current control authority over the DID then the information in the DID Doc cannot be authoritatively cryptographically verified. Consequently, absent verified control authority, any use of the DID Doc for any purpose whatsoever is at best problematic.
### Process Model for Establishing Cryptographic Control Authority
As mentioned above, a fully decentralized identifier is self-certifiable. Other, partially decentralized identifiers may be created on a ledger or registry with decentralized governance. The first case is the most important from a process model point of view. The second case is less informative.
The root of trust in a self-certifying identifier is the entropy used to create a universally unique *random number* or *seed*. Sufficient entropy ensures that the *random seed* is unpredictable (collision resistant) to a degree that exceeds the computational capability of any potential exploiter for some significant amount of time. Currently 128 bits of entropy is considered sufficient.
That random seed is then converted to a private key for a given cryptographic digital signature scheme. Through a one-way function, that private key is used to produce a public key. The simplest form of self-certifying identifier includes that public key in the identifier itself. Often the identifier syntax enables it to become a self-certifying name-space in which the public key is used as a prefix to a family of identifiers. Any attestation signed with the private key may be verified with the public key. Because of the identifier's universal collision resistance, no other identifier may be associated with an attestation verifiable by that public key. This makes the identifier ***self-certifying***.
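A minimal sketch of this derivation, assuming Ed25519 signatures via the pyca/cryptography library and a hypothetical `did:example:` method; the method name and the base64url encoding of the key are illustrative choices, not drawn from any spec:

```python
import base64
import secrets
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Random seed with sufficient entropy (Ed25519 private keys are 32 bytes,
# comfortably above the 128-bit minimum).
seed = secrets.token_bytes(32)
private_key = Ed25519PrivateKey.from_private_bytes(seed)

# One-way derivation of the public key from the private key.
public_key = private_key.public_key()
public_key_bytes = public_key.public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)

# Simplest self-certifying identifier: embed the public key itself.
did = "did:example:" + base64.urlsafe_b64encode(public_key_bytes).decode().rstrip("=")

# Any attestation signed with the private key verifies against the
# public key embedded in the identifier.
attestation = b"some attestation"
signature = private_key.sign(attestation)
public_key.verify(signature, attestation)  # raises InvalidSignature on failure
```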
Furthermore, instead of the public key itself, the identifier may include a fingerprint of the public key. In order to preserve the cryptographic strength of the root of trust in the random seed, the fingerprint must have collision resistance comparable to the original random seed. Further one-way functions may be applied successively to produce successively derived fingerprints. This is similar to how hierarchically deterministic key chains are generated. To restate, a one-way function may be applied to the public key to produce a derived fingerprint, then another to that fingerprint, and so on. The collision resistance must be maintained across each application of a one-way function.
Instead of merely deriving a simple fingerprint, one could take the public key and use it as a public *seed* that, when combined with some other data, may be transformed with a one-way function (such as a hash) to produce yet another fingerprint. As long as the creation of any derived fingerprint may be ascribed universally uniquely to the originating public/private key pair, the resultant derived identifier may be uniquely associated with attestations signed with the private key and verifiable with the public key. This makes the eventually derived identifier also self-certifiable.
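A sketch of successive fingerprint derivation, assuming SHA-256 as the one-way function; the stand-in public key bytes and the combined data are illustrative:

```python
import hashlib

def fingerprint(data: bytes) -> bytes:
    # One-way function; a 256-bit digest comfortably preserves the
    # 128 bits of collision resistance in the original random seed.
    return hashlib.sha256(data).digest()

public_key_bytes = bytes(32)  # stand-in for a real 32-byte Ed25519 public key

fp1 = fingerprint(public_key_bytes)                  # fingerprint of the public key
fp2 = fingerprint(fp1)                               # fingerprint of the fingerprint, and so on
fp3 = fingerprint(public_key_bytes + b"other data")  # public key used as a public seed
```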
#### Rotation
The problem is that, over time, any public/private key pair used to sign attestations becomes weakened due to exposure through that usage. In addition, a given digital signature scheme may become weak due to a combination of increased compute power and better exploit algorithms. Thus, to preserve cryptographic control of the identifier in the face of exposure, the originating public/private key pair may need to be *rotated* to a new key pair. In this case the identifier is not changed; only the public/private key pair that is authoritative for the identifier is changed. This provides continuity of the identifier under changes in control. It also poses a problem for verification, because there is no longer any apparent connection between the newly authoritative public/private key pair and the identifier. That connection must be established by a rotation operation that is signed by the previously authoritative private key. This signed rotation operation is a signed attestation that transfers authoritative control from one key pair to another. Each successive rotation operation performs a transfer of control.
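A sketch of such a rotation operation, again assuming Ed25519 via pyca/cryptography; the event structure and field names are illustrative, not drawn from any spec:

```python
import json
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def raw_hex(public_key) -> str:
    # Raw public key bytes as hex, for embedding in a JSON event.
    return public_key.public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    ).hex()

old_key = Ed25519PrivateKey.generate()  # currently authoritative key pair
new_key = Ed25519PrivateKey.generate()  # key pair being rotated in

# The rotation event names the newly authoritative public key;
# the identifier itself is unchanged.
event = {
    "type": "rotate",
    "id": "did:example:abc",
    "sequence": 1,
    "newKey": raw_hex(new_key.public_key()),
}
serialized = json.dumps(event, sort_keys=True).encode()

# Signing with the *previously* authoritative private key is what
# transfers control to the new key pair.
signature = old_key.sign(serialized)
old_key.public_key().verify(signature, serialized)  # raises InvalidSignature on failure
```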
#### State Machine Model of Control Authority
To summarize, control authority over a decentralized identifier is originally established through a self-certification process that uniquely associates an identifier with a public/private key pair. Successive signed rotation operations may then be used to transfer that control authority to a sequence of public/private key pairs. The current control authority at any time may be established by starting at the originating key pair and applying the successive rotation operations in order. Each operation is verified via its cryptographic signature.
The process and data model for this is a state machine. In a state machine there is a current state, an input event, and a resultant next state determined by state transition rules. Given an initial state and a set of state transition rules, replaying a sequence of events will always result in the same terminal or current state. This is a simple, unambiguous process model. The data model is also simple: it must describe the state and the input events. No other data is needed. The state is unambiguously and completely determined by the initial state, the transition rules, and the events. No other context or inference is needed. A simple representation will suffice.
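A sketch of the replay logic under these assumptions, with signature checking abstracted behind a caller-supplied `verify` function; all names here are hypothetical:

```python
def current_control_authority(inception_key, events, verify):
    """Replay signed rotation events to establish current control authority.

    inception_key: public key established by the self-certification process.
    events: rotation events in order, each naming a newly authoritative key.
    verify: callable that checks an event's signature against a public key.
    """
    state = inception_key
    for event in events:
        # Transition rule: an event is valid only if it was signed by the
        # key pair that was authoritative when the event was created.
        if not verify(event, state):
            raise ValueError("invalid rotation event; replay halted")
        state = event["newKey"]
    # Same initial state, same rules, same events -> same terminal state.
    return state
```

Note that the transition rule is deterministic and local: each event is checked only against the state it transitions from, so no external context or inference is needed.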
Once the current control authority for a DID has been established to be a given key pair (or key pairs), any other information affiliated with that DID may be cryptographically verified via a signed attestation using the current key pair(s). The information needed to establish the authoritative status of any additional information, such as encryption keys or service endpoints, is the current authoritative signing key pair(s) for the identifier and assurance that the version of the information in the DID Doc is sourced from the entity controlling the current key pair(s). This means the DID Doc may benefit from an internal identifier that corresponds to the latest rotation event establishing the current key pair(s), or some other identifier that associates the DID Doc with specific signing key pair(s). This process of first establishing the authoritative key pair(s) greatly simplifies the cryptographic establishment of all the other data.
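As a sketch of how this anchoring might work, assume the digest of the latest rotation event serves as the version identifier; the `versionId` field and the attestation structure are illustrative, and `current_private_key` is assumed to be a signing key object such as the Ed25519 keys above:

```python
import hashlib
import json

def attest_did_doc(did_doc: dict, latest_event: dict, current_private_key) -> dict:
    # The digest of the latest rotation event acts as a version identifier,
    # tying this DID Doc to the key pair(s) that event made authoritative.
    version_id = hashlib.sha256(
        json.dumps(latest_event, sort_keys=True).encode()
    ).hexdigest()
    versioned = dict(did_doc, versionId=version_id)
    # A signature by the current authoritative key makes the whole
    # versioned document a verifiable attestation.
    signature = current_private_key.sign(
        json.dumps(versioned, sort_keys=True).encode()
    )
    return {"doc": versioned, "signature": signature.hex()}
```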
There are various mechanisms that may be employed to maintain the state and associated event sequence. These could be as simple as a set of servers with immutable logs for the events/states that also run the code for the state transition logic. A more complex setup might rely on a distributed consensus ledger to maintain the state.
The DID Doc in and of itself, however, is insufficient to fully establish the current authoritative key pair(s); other infrastructure is required. Merely including a set of rotation events in a DID Doc only establishes control authority up to the latest included rotation event, but other rotation events may have happened since that version of the DID Doc was created. Consequently, a DID Doc's main role in this respect is to help a user discover the mechanisms used to establish current control authority. This must be done with some care because, in a sense, the DID Doc is bootstrapping discovery of the authority by which one may trust the discovery the DID Doc provides. Nonetheless, in order to be authoritative, the other information in the DID Doc that is not part of discovering authoritative control does not need an event history but merely a version identifier linking it to the authoritative key pair(s) and an attached signature from the current authoritative key pair(s).
In other words, the DID Doc is used to bootstrap discovery of the current authoritative controlling keys and then to provide authoritative, versioned discovery of affiliated information.
#### RDF Complications
The RDF model uses triples to canonicalize a directed graph, which may then be used to make inferences about the data. This model attaches a context to a given DID Doc that must itself be verified as authoritative. This expansion complicates the process of producing an authoritative versioned discovery document or an evented state machine. Clearly, a clever implementation of a cyclical directed graph could be used to implement versioned discovery documents or evented state machines. Many implementations of RDF, however, use directed acyclic graphs, making the implementation of evented state machines at best problematic and of versioned discovery documents more cumbersome. This forces a particular methodology onto the implementation of versioned discovery documents or evented state machines, rather than what might be easiest for the implementer.