An Implementor's Rant on the DID Spec

From an implementor's point of view, the current DID spec (v0.13 at the time of this writing) has many problems that make implementing DID-powered software unnecessarily difficult and unwieldy. This document captures the thoughts and criticisms of the current spec that came up during the design and implementation of a new DID-powered signing tool written in Rust and a DID document parser implemented as a Rust crate. What follows are specific issues, roughly in the order they were encountered.

Problem 1: Too Broad in Scope

Starting with the Abstract and Introduction, an implementor gets a really good mental picture of what a decentralized identifier (DID) is and how it fits into the bigger picture. A DID resolves to a DID document. The DID document contains a number of pieces of data attached to the subject that the DID refers to and enables trustful interactions with the controller of the subject/DID. This sounds to me like an introduction that should then link to the spec for DIDs, the spec for DID documents, a spec for "basic DID operations", and individual service endpoint specifications.

This spec covers too much ground and muddies the waters whenever it switches contexts from DIDs to DID documents to DID operations to other miscellaneous details. To clarify things for system implementors, this DID spec really should become at least four separate specifications: DID string, DID doc, DID operations, and DID service endpoint specs.

A good start to a specification on the basic DID operations can be found in the RWoT8 document Universal DID Operations. A good example of a service-endpoint specification is the RWoT6 document Introduction to DID Auth.

Problem 1.1: Sorting through the mess

As stated above, there's just too much in this document. In this problem I will walk through the current spec and assign each section a destination document.

1. Introduction – split into DID string, DID doc, and DID method specs.
- 1.1 A Simple Example – Example 1 to the DID string spec and Example 2 to the DID doc spec.
- 1.2 Design Goals – copy unchanged into all three specs.
- 1.3 Interoperability – split relevant pieces to each of the three specs.
2. Terminology – split relevant pieces to each of the three specs.
3. Data Model – DID doc spec.
4. Decentralized Identifiers (DIDs) – DID string spec.
5. DID Documents – DID doc spec.
6. DID Document Syntax – DID doc encoding addendum…DID docs should be encoding agnostic.
7. DID Methods – DID operations spec.
8. DID Resolvers – DID string spec??
9. Security Considerations – DID doc spec.
- 9.1 Requirements of DID Method Specifications – DID operations spec.
- 9.2 Choosing DID Resolvers – DID string spec??
- 9.3 Binding of Identity – DID operations spec.
- 9.4 Authentication Service Endpoints – Service endpoints meta spec.
- 9.5 Non-Repudiation – DID operations spec.
- 9.6 Notification of DID Document Changes – Somewhere, anywhere, else.
- 9.7 Key and Signature Expiration – DID doc spec.
- 9.8 Key Revocation and Recovery – DID operations spec.
- 9.9 The Role of Human-Friendly Identifiers – Intro of DID string spec?
- 9.10 Immutability – DID doc spec.
10. Privacy Considerations – DID doc spec.
- 10.1 Requirements of DID Method Specifications – DID operations spec.
- 10.2 Keep Personally-Identifiable Information (PII) Private – DID doc spec.
- 10.3 DID Correlation Risks and Pseudonymous DIDs – DID string spec.
- 10.4 DID Document Correlation Risks – DID doc spec.
- 10.5 Herd Privacy – Somewhere else?
11. Future Work – Somewhere else?
A. Registries – DID operations spec.
B. Real World Example – DID doc spec.

Problem 2: What is a DID?

The first introduction to a DID happens in section 1.1 with a simple example that says a DID consists of three parts: URL scheme, DID method, and DID method-specific identifier. Then the spec dives into DID documents and section after section of terminology, definitions, and explanations that really don't tell me what a DID is. The topic doesn't come up again until section 4. The simple example seems way out of place and should probably be split into two examples (a DID example and a DID document example), with the DID example moving to section 4 and the DID document example moving to section 5. But again, because of Problem 1 above, section 4 should be broken out into a stand-alone specification of its own.
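To make the three-part structure concrete, here is a minimal Rust sketch of splitting a bare DID string into scheme, method, and method-specific identifier. The `Did` struct and `parse_did` function are hypothetical names chosen for illustration; they do not come from the spec or from any existing crate. The sketch assumes a DID with no path, query, or fragment, and it does not enforce the character-set rules a real DID string spec would define.

```rust
/// The three parts of a DID named in section 1.1 of the spec.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Did<'a> {
    pub scheme: &'a str,
    pub method: &'a str,
    pub method_specific_id: &'a str,
}

/// Split a bare DID into its three colon-separated parts.
/// Returns `None` if a part is missing or empty, or if the scheme is not "did".
/// The method-specific identifier keeps any further colons it contains.
pub fn parse_did(input: &str) -> Option<Did<'_>> {
    let mut parts = input.splitn(3, ':');
    let scheme = parts.next()?;
    let method = parts.next()?;
    let method_specific_id = parts.next()?;
    if scheme != "did" || method.is_empty() || method_specific_id.is_empty() {
        return None;
    }
    Some(Did { scheme, method, method_specific_id })
}
```

Calling `parse_did("did:example:123456789abcdefghi")` yields the method `example` and the identifier `123456789abcdefghi`. A stand-alone DID string spec could state everything this function needs in a page or two.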

Problem 3: No strong case for generic DID parameters

This problem was originally called "did parameters are both too generic and too specific" but was generalized because the argument for DID parameters is not well formed. Parameters are really only useful in method-specific DIDs. The generic ones should be removed, and we should resist the urge to create new ones. Generic parameters are really just universal DID operations (see: Problem 1) masquerading as DID parameters.

Currently only two generic parameters are not also marked with "Note: This parameter may not be supported by all DID methods". The two parameters that do carry that note are effectively "optional", which makes them method-specific parameters, and they should be removed outright. The other two are addressed separately in their own sections below.

The conclusion is that DID parameters are useful for method-specific uses, and we should define a way to format/parse them, but we should not mandate any top-level generic parameters because they really should be part of the universal DID operations (see: Problem 1).

Problem 3.1: Hashlinks give you nothing

The "hl" generic parameter gives the user nothing of substance. It only proves that the DID resolves to a document that is the pre-image of the hash in the DID. It doesn't provide any real authenticity attestation other than the link between the DID and the document. It does nothing to bind a DID controller to the DID or the DID document. A separate challenge-response interaction would be necessary to establish the link between a DID controller, the DID document, and the DID that resolves to the document. Adding hashlinks as a parameter is just a waste of CPU cycles and bytes.

Problem 3.2: Service is too specific and too generic

The intent of the "service" parameter is to retrieve a service endpoint data unit from a DID document using the service endpoint's id string. First of all, why are only service endpoints enshrined as a generic parameter? What about publicKey data units or authentication data units? Service is too specific. Second, "service" is really just additional data for the universal DID operations. Instead of making this a parameter, it should probably be a query on the end of a DID string. The "service" parameter in the spec is too generic and needs to be attached to specific DID operations that make sense. As more and more method-specific parameters get popular, they will eventually migrate to being universal parameters, just like functions always creep towards a base class. Eliminating the possibility of generic parameters fixes this problem.

Also, since "service" shouldn't be any more special than "publicKey" or "authentication", it should be moved to be a query parameter and renamed to something more generic like "selector" and then we get to have fun defining a domain-specific language (DSL) for selecting specific data units from a DID document. Since DID documents are structures of dictionaries and lists, the DSL could be as simple as using '.' to mean address into a dictionary and ',' to mean index into a list. Assume we have the following DID document:

{
  "@context": "https://www.w3.org/2019/did/v1",
  "id": "did:example:123456789abcdefghi",
  "authentication": [{
    "id": "did:example:123456789abcdefghi#keys-1",
    "type": "RsaVerificationKey2018",
    "controller": "did:example:123456789abcdefghi",
    "publicKeyPem": "-----BEGIN PUBLIC KEY...END PUBLIC KEY-----\r\n"
  }],
  "service": [{
    "id":"did:example:123456789abcdefghi#vcs",
    "type": "VerifiableCredentialService",
    "serviceEndpoint": "https://example.com/vc/"
  }]
}

Now to address the "type" value in the first authentication data unit, the selector string would be: .authentication,0.type. The root is a dictionary so we start with a '.' followed by the key in that dictionary: "authentication". The value associated with "authentication" is a list so we then use a ',' followed by the index of the data unit we want to address: 0. Since the 0th data unit in the "authentication" list is a dictionary, we then use a '.' followed by the name of the key we want to address: "type". Putting this all together, a read DID operation to get the type of the 0th authentication data unit would use the following DID:

did:example:123456789abcdefghi?selector=.authentication,0.type
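A minimal Rust sketch of parsing this selector syntax follows. The `Step` enum and the `selector_from_did` and `parse_selector` functions are hypothetical names, not part of any spec or crate. The sketch assumes keys never contain '.' or ',' and that the query string is not percent-encoded; a real DSL definition would have to settle both points.

```rust
/// One step of a selector: a dictionary key ('.') or a list index (',').
#[derive(Debug, PartialEq)]
pub enum Step {
    Key(String),
    Index(usize),
}

/// Pull the value of the `selector` query parameter out of a DID, if present.
pub fn selector_from_did(did: &str) -> Option<&str> {
    let (_, query) = did.split_once('?')?;
    query.split('&').find_map(|pair| pair.strip_prefix("selector="))
}

/// Turn a selector such as ".authentication,0.type" into a list of steps.
pub fn parse_selector(selector: &str) -> Result<Vec<Step>, String> {
    let mut steps = Vec::new();
    let mut rest = selector;
    while !rest.is_empty() {
        // Every step starts with a sigil that says what kind of container we are in.
        let (is_key, body) = if let Some(body) = rest.strip_prefix('.') {
            (true, body)
        } else if let Some(body) = rest.strip_prefix(',') {
            (false, body)
        } else {
            return Err(format!("expected '.' or ',' at: {}", rest));
        };
        // The step's token runs up to the next sigil or the end of the string.
        let end = body.find(|c: char| c == '.' || c == ',').unwrap_or(body.len());
        let token = &body[..end];
        if token.is_empty() {
            return Err("empty selector step".to_string());
        }
        if is_key {
            steps.push(Step::Key(token.to_string()));
        } else {
            let index = token
                .parse::<usize>()
                .map_err(|e| format!("bad list index {:?}: {}", token, e))?;
            steps.push(Step::Index(index));
        }
        rest = &body[end..];
    }
    Ok(steps)
}
```

For the DID above, `selector_from_did` returns `.authentication,0.type`, which `parse_selector` turns into the steps key "authentication", index 0, key "type".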

A read DID operation to get all service endpoint data units from a DID document would look like this:

did:example:123456789abcdefghi?selector=.service

That would return a list of service endpoint data units. This could also support masking so that a client can request all service endpoint records of a certain type using the query portion of a DID. Here's an example:

did:example:123456789abcdefghi?selector=.service&match-key=.type&match-value=VerifiableCredentialService

This will return all service endpoint data units that have a type of "VerifiableCredentialService".
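The selection and masking described above can be sketched in Rust as follows. Everything here is hypothetical: the `Value` enum is a stand-in for a decoded DID document (dictionaries, lists, and strings only), and `select` and `select_matching` are illustrative names. The sketch assumes `match-key` is itself a selector evaluated relative to each list element, which is one plausible reading of the example and not something the spec defines.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a decoded DID document.
#[derive(Debug, PartialEq)]
pub enum Value {
    Str(String),
    List(Vec<Value>),
    Dict(BTreeMap<String, Value>),
}

/// Follow a selector ('.' = dictionary key, ',' = list index) from `root`.
/// Returns `None` if any step does not fit the shape of the document.
pub fn select<'a>(root: &'a Value, selector: &str) -> Option<&'a Value> {
    let mut current = root;
    let mut rest = selector;
    while !rest.is_empty() {
        let (is_key, body) = if let Some(body) = rest.strip_prefix('.') {
            (true, body)
        } else if let Some(body) = rest.strip_prefix(',') {
            (false, body)
        } else {
            return None;
        };
        let end = body.find(|c: char| c == '.' || c == ',').unwrap_or(body.len());
        let token = &body[..end];
        current = match (is_key, current) {
            (true, Value::Dict(map)) => map.get(token)?,
            (false, Value::List(items)) => items.get(token.parse::<usize>().ok()?)?,
            _ => return None,
        };
        rest = &body[end..];
    }
    Some(current)
}

/// Select a list, then keep only the elements whose `match_key` selector
/// resolves to a string equal to `match_value`.
pub fn select_matching<'a>(
    root: &'a Value,
    selector: &str,
    match_key: &str,
    match_value: &str,
) -> Vec<&'a Value> {
    match select(root, selector) {
        Some(Value::List(items)) => items
            .iter()
            .filter(|item| {
                matches!(select(item, match_key), Some(Value::Str(s)) if s.as_str() == match_value)
            })
            .collect(),
        _ => Vec::new(),
    }
}
```

With the example document loaded into a `Value`, `select_matching(&doc, ".service", ".type", "VerifiableCredentialService")` returns the single service endpoint data unit, and any other match value returns an empty list.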

This is all just shooting from the hip, but it demonstrates a way to normalize a method for addressing specific pieces of a DID document and further justifies removal of the "service" generic parameter in favor of leaving it up to specific universal DID operations to define and moving selectors to be query parameters.

Problem 4: The DID document needs its own spec

The current DID spec covers too many topics to not be confusing. DID strings need their own spec as does the DID document. The following are all problems with DID documents that cropped up when writing DID parsers and other DID document based software.

Problem 4.1: The DID document spec should be encoding agnostic

The existing DID specification just takes it on faith that DID documents must be encoded in JSON-LD. This is wrong, mostly because the DID specification would be clearer if it focused on the what, not the how. JSON-LD encoding is the how, and it is completely orthogonal to what must be present in a DID document for it to be useful in the SSI context. DID documents consist of a number of different types of "data units" that, when stored together, form the basis upon which all decentralized identity operations operate.

We already covered the idea that there should be a set of universal DID operations, designed for extensibility by allowing arbitrary service endpoint data units and additional method-specific operations. But what data must be in the DID document to make the universal DID operations possible? What allows for binding, key rotation, authentication, etc.? That's what this problem is about. To make things better for implementors, the DID document deserves its own specification that defines the data units, what is in them, and why each data unit exists. Most importantly, the DID document encoding must not be defined in the main specification.

To address DID document encoding, an addendum to the DID document specification could be created that handles defining the formatting of basic types (e.g. strings, integers, floats, URLs, dates, etc) as well as the names of keys in data units for when they are serialized into a text-based dictionary/list type encoding. The encoding addendum can also specify the minimum requirements for any binary encoding of the data.
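As a rough illustration of separating the what from the how, here is a minimal Rust sketch in which the data unit is defined once and each encoding lives behind a trait. All names (`ServiceEndpoint`, `Encoding`, `JsonText`, `SExprText`) are hypothetical, and the s-expression layout is invented for this example; it is not the format of any published encoding. The `quote` helper escapes only double quotes and backslashes, so it is not a complete JSON string encoder.

```rust
/// The "what": a service endpoint data unit, independent of any encoding.
#[derive(Debug, Clone, PartialEq)]
pub struct ServiceEndpoint {
    pub id: String,
    pub kind: String,
    pub endpoint: String,
}

/// The "how": an encoding addendum would define one of these per format.
pub trait Encoding {
    fn encode(&self, unit: &ServiceEndpoint) -> String;
}

/// Wrap a string in double quotes, escaping embedded quotes and backslashes.
fn quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}

/// A JSON-style text encoding using the key names from the current spec.
pub struct JsonText;

impl Encoding for JsonText {
    fn encode(&self, unit: &ServiceEndpoint) -> String {
        format!(
            r#"{{"id":{},"type":{},"serviceEndpoint":{}}}"#,
            quote(&unit.id),
            quote(&unit.kind),
            quote(&unit.endpoint)
        )
    }
}

/// An s-expression style text encoding (layout invented for this sketch).
pub struct SExprText;

impl Encoding for SExprText {
    fn encode(&self, unit: &ServiceEndpoint) -> String {
        format!(
            "(service (id {}) (kind {}) (endpoint {}))",
            quote(&unit.id),
            quote(&unit.kind),
            quote(&unit.endpoint)
        )
    }
}
```

The point of the split is that code working with `ServiceEndpoint` never needs to know which encoding produced it, and a new encoding can be added without touching the data unit definition.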

JSON-LD is core to the current design of the DID spec, but it shouldn't be. It seems like two things are conflated: extensibility and encoding. If we keep the "human readable" requirement, there are many different text-based encodings that are as extensible as JSON-LD. The greatest problem with JSON-LD, of course, is the fact that canonicalization is specified as an algorithm but left up to the implementor to accomplish. JSON-LD was never designed to operate in a world where digital signatures over JSON-LD encoded data were to be the primary means of ensuring authenticity.

To prove that it is possible to improve upon JSON-LD, I came up with a simpler way to do a fully extensible text-based encoding that enables permissionless innovation but also solves the problem of canonicalization as well as automatic parsing and verification. The format is called Serialize Data Expression Format and it is a hack of simple s-expressions to create an extensible data encoding scheme.

Problem 4.2: What must be in a key data unit?

One data unit already defined in the existing specification is the "publicKey" data unit. One observation is that DID documents aren't always public and there are reasons to also support secret key data. The DIDDir specification defines how DID documents stored in a Git repository and managed using the 'did:git' DID method can be used not only as a keyring of DID documents for a Git repo but also as a personal keyring of DID documents, including their own DID documents with their secret keys.

The DID document specification should change the "publicKey" data unit to just "key" because in a private context it may contain secrets and other data such as ratchet states. If I have a Noise Protocol session between one of my identities and a remote identity, my local copy of their DID document may contain the current ratchet state associated with my DID that represents my end of the session. In my local DID document I may also store my private half of all Noise Protocol "pre-keys" that I have published in my public DID document to enable asynchronous discovery and session initiation. There are many reasons why the key data unit needs to be a generic key record that can also contain secrets.

Also missing from the current DID spec is any support for key recovery mechanisms. There have been proposals but nothing has made it into the spec yet. Key recovery data should be optionally included in each key data unit in the keys list. The kinds of key recovery mechanisms should be an exhaustive list with associated data inside of an object. For something as simple as a pre-image proof to replace a key, the recovery information could look something like this:

{
  "key": [{
    "id": "did:example:123456789abcdefghi#keys-1",
    //...
    "recovery": {
      "kind": "PreimageProof2019",
      "data": "0a6d71e286d0a99d641e286d0a99a6d71e286d0a"
    }
  }]
}

The DID controller can use a universal DID operation to provide the pre-image to the proof along with new key data and new recovery data to replace the current data in the DID document. This is just speculation on how it would look. The point is that the current spec doesn't cover all of the data necessary for full CRUD operations on key data units.
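The replacement flow can be sketched in Rust as below, with heavy caveats. The spec defines none of this: `Recovery`, `KeyUnit`, and `rotate_key` are hypothetical names, and the sketch assumes "PreimageProof2019" means that "data" holds the lowercase hex of a hash over a secret pre-image. The hash function is passed in by the caller because the Rust standard library has no cryptographic hash; a real implementation would pin a specific algorithm.

```rust
/// Mirrors the "recovery" object in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct Recovery {
    pub kind: String,
    /// Hex-encoded hash commitment (assumed meaning of "data").
    pub data: String,
}

/// A trimmed-down key data unit: just enough fields to show rotation.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyUnit {
    pub id: String,
    pub public_key: String,
    pub recovery: Recovery,
}

/// Lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Replace the key material and recovery data if `preimage` hashes to the
/// published commitment. `hash` is whatever function the recovery kind
/// mandates; it is a parameter here because this sketch does not pick one.
pub fn rotate_key<H>(
    key: &mut KeyUnit,
    preimage: &[u8],
    new_public_key: &str,
    new_recovery: Recovery,
    hash: H,
) -> Result<(), String>
where
    H: Fn(&[u8]) -> Vec<u8>,
{
    if key.recovery.kind != "PreimageProof2019" {
        return Err(format!("unsupported recovery kind: {}", key.recovery.kind));
    }
    if to_hex(&hash(preimage)) != key.recovery.data.to_lowercase() {
        return Err("pre-image does not match the published commitment".to_string());
    }
    // The proof checked out: swap in the new key and a fresh commitment.
    key.public_key = new_public_key.to_string();
    key.recovery = new_recovery;
    Ok(())
}
```

Note that revealing the pre-image burns it, which is why the operation has to install new recovery data in the same step.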

Problem 4.2: The key-encoding-based key naming is garbage

Using key names that are based on the key data encoding (e.g. publicKeyPem, publicKeyJwk, etc.) is one of the two most glaring mistakes made in the current DID specification. From an implementor's standpoint this is a nightmare, and it opens the door to lots of corner-case and combinatorial errors.

First of all, the "NOTE" in the spec says that the list of possible key names is non-exhaustive. At the very least this means no DID parser/validator can be automatically generated. There is no ABNF in the world that can handle non-exhaustive lists without being so general as to have no validating power. Non-exhaustive actually means "any name will do…you pick." It also means that having a clean enumeration of all possible key encodings and writing invariant checks on public key data units is impossible. This goes against all software engineering best practices.

The correct way to implement this is to have a key named "encoding" with a value that is one of an exhaustive set of key encoding schemes. That list can be governed through a well defined update process and refreshed as needed in the future. In addition to the "encoding" there should be a key named "key" that contains the encoded key data. The encoding must match the kind of encoding specified by the "encoding" value. All of this can be parsed and validated automatically.
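A minimal Rust sketch of the "encoding" plus "key" approach follows. The `KeyEncoding` variants are an illustrative set drawn from the encodings mentioned in this document, not a proposed registry, and `EncodedKey` is a hypothetical name. The per-encoding checks are shallow shape checks (a PEM header prefix, an opening brace for JWK, the Bitcoin Base58 alphabet); they do not decode or verify the key material.

```rust
use std::str::FromStr;

/// The exhaustive set of encodings a parser accepts (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyEncoding {
    Pem,
    Jwk,
    Base58,
}

impl FromStr for KeyEncoding {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Pem" => Ok(KeyEncoding::Pem),
            "Jwk" => Ok(KeyEncoding::Jwk),
            "Base58" => Ok(KeyEncoding::Base58),
            other => Err(format!("unknown key encoding: {}", other)),
        }
    }
}

/// Bitcoin-style Base58 alphabet: no 0, O, I, or l.
const BASE58: &str = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// An "encoding" value paired with the key data it describes.
#[derive(Debug, PartialEq)]
pub struct EncodedKey {
    pub encoding: KeyEncoding,
    pub key: String,
}

impl EncodedKey {
    /// Build a key record, rejecting unknown encodings and key data whose
    /// shape does not match the declared encoding.
    pub fn new(encoding: &str, key: &str) -> Result<Self, String> {
        let encoding: KeyEncoding = encoding.parse()?;
        let shape_ok = match encoding {
            KeyEncoding::Pem => key.starts_with("-----BEGIN "),
            KeyEncoding::Jwk => key.trim_start().starts_with('{'),
            KeyEncoding::Base58 => !key.is_empty() && key.chars().all(|c| BASE58.contains(c)),
        };
        if shape_ok {
            Ok(EncodedKey { encoding, key: key.to_string() })
        } else {
            Err(format!("key data does not look like {:?}", encoding))
        }
    }
}
```

Because the `match` over `KeyEncoding` is exhaustive, adding a new encoding to the enum makes the compiler point at every place that needs a new invariant check, which is exactly what a non-exhaustive list of key names cannot offer.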

Another nitpick is the use of "type" as a key name. In nearly every programming language that matters, the word "type" is reserved so naming a key as "type" means that a parser/validator cannot be automatically generated without some hand written aliasing.

If the solution for problem 4.1 is accepted, then one normalized example for the key data unit could look like this:

{
  "key": [{
    "id": "did:example:123456789abcdefghi#keys-1",
    "kind": "RsaVerificationKey2018",
    "controller": "did:example:123456789abcdefghi",
    "public": {
      "encoding": "Pem",
      "key": "-----BEGIN PUBLIC KEY...END PUBLIC KEY-----\r\n"
    },
    "secret": {
      "encoding": "SealedBoxHex",
      "key": "dcd2097f79d53a0a6d71e286d0a99d64"
    }
  }]
}

Problem 4.3: The list of key data units being either strings or objects is garbage

This is another nightmare for implementors that requires more hand-written parsing code. Dealing with collections of disparate types is difficult in all programming languages that matter. Designing the DID document to have lists of strings or structs makes implementations harder than they have to be, increasing the opportunity for bugs to creep in. It appears that the publicKey list and the authentication list are both defined this way.

Based on the current spec, strings in either list must be treated as if they are URLs and dereferenced to a full DID document, and then the keys/authentication data units are iterated over looking for a matching id. Making these strings into objects with just an "id" would serve the same purpose and make life infinitely easier for implementors. So instead of a list of strings or objects, we get a list of just objects and we can all avoid having to hand-write parsing code. In the current spec, Example 9 shows an authentication list with a string and an object. It should actually be like the following:

{
  //...
  
  "authentication":[{
      "id": "did:example:123456789abcdefghi#keys-1",
      "ref": "did:sov:2399af938edde99fa23742#keys-1"
    },{
      "id": "did:example:123456789abcdefghi#keys-2",
      "type": "Ed25519VerificationKey2018",
      "controller": "did:example:123456789abcdefghi",
      "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
  }],
  
  //...
}

Another advantage of switching to all objects is that the references themselves can have their own "id" in the context of the current DID, thus creating an alias to the externally referenced DID document. This aggregation capability would allow DID controllers to have DID documents spread out in various contexts (e.g. btcr, sov, git, etc.) and have one be an aggregate DID document that references the other ones. Again, just throwing ideas out there. The important thing is that the lists need to be just objects to simplify the implementation of software that utilizes DID documents.
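To show what the all-objects list buys a parser, here is a minimal Rust sketch. It assumes each list element has already been decoded into a flat map of string fields, which is a simplification, and the names `AuthEntry`, `AuthBody`, and `parse_entry` are hypothetical. The field names ("id", "ref", "type", "controller", "publicKeyBase58") are taken from the example above.

```rust
use std::collections::BTreeMap;

/// What an authentication entry carries besides its own id.
#[derive(Debug, PartialEq)]
pub enum AuthBody {
    /// An alias pointing at a key in another DID document.
    Reference { target: String },
    /// A key embedded directly in this DID document.
    Embedded { kind: String, controller: String, key: String },
}

/// Every entry is an object with an "id"; only the body varies.
#[derive(Debug, PartialEq)]
pub struct AuthEntry {
    pub id: String,
    pub body: AuthBody,
}

/// Parse one list element. Because every element is a dictionary, there is
/// a single code path: read "id", then branch on whether "ref" is present.
pub fn parse_entry(fields: &BTreeMap<String, String>) -> Result<AuthEntry, String> {
    let get = |name: &str| {
        fields
            .get(name)
            .cloned()
            .ok_or_else(|| format!("missing field: {}", name))
    };
    let id = get("id")?;
    let body = match fields.get("ref") {
        Some(target) => AuthBody::Reference { target: target.clone() },
        None => AuthBody::Embedded {
            kind: get("type")?,
            controller: get("controller")?,
            key: get("publicKeyBase58")?,
        },
    };
    Ok(AuthEntry { id, body })
}
```

With a mixed list of strings and objects, the parser would first have to ask what type each element is before it could read a single field; here the element type is fixed and the only decision is which fields are present.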