---
tags: KERI, Zero Trust, Async, Data
email: sam@samuelsmith.org
version: 1.00
---
# Zero Trust Computing Architecture for Data Management: How to support Secure Async Data Flow Routing in KERI enabled Applications
See also the discussion of field normalization here:
https://hackmd.io/XfdKjT3ZQDi1M6Iv3iYhbg
## Data Management in KERI
KERI is about managing verifiable data structures. When data is part of a verifiable data structure we can make strong security guarantees about that data as derived from the verifiability guarantees of the data structure itself. The principal verifiable data structure in KERI is a KEL or KERL.
Data may be directly embedded in a KEL, or it may be anchored to a KEL using a cryptographic digest or SAID (Self-Addressing IDentifier). A SAID is a self-referential digest used as an identifier. Given that the cryptographic strength is sufficient, any digest-anchored data has the same verifiable security guarantees as if the data from which the digest was derived were embedded directly.
A SAD (Self-Addressed Data) item is a serialization of a data item that includes its SAID. A commitment to the SAID of a SAD is cryptographically equivalent to a commitment to the SAD itself.
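The relationship between a SAD and its SAID can be illustrated with a minimal sketch. It is not the KERI reference derivation: the digest algorithm (BLAKE2b-256 from the Python standard library), the `#` placeholder, the one-character code `F`, and the compact JSON serialization are all assumptions made for illustration. The idea it shows is that the SAID field is filled with a fixed-length placeholder, the serialization is digested, and the encoded digest replaces the placeholder.

```python
import base64
import hashlib
import json

DIGEST_CODE = "F"   # assumed one-character derivation code for this sketch
SAID_LENGTH = 44    # 1 code character + 43 Base64 characters for 32 bytes


def _serialize(sad: dict) -> bytes:
    # Compact JSON that preserves the insertion order of the fields.
    return json.dumps(sad, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def saidify(sad: dict, label: str = "d") -> dict:
    """Return a copy of `sad` whose `label` field holds its SAID."""
    dummy = dict(sad)
    dummy[label] = "#" * SAID_LENGTH  # placeholder with the final length
    digest = hashlib.blake2b(_serialize(dummy), digest_size=32).digest()
    # One leading zero byte makes 33 bytes, which encode to exactly 44
    # Base64 characters; the first character is then replaced by the code.
    b64 = base64.urlsafe_b64encode(b"\x00" + digest).decode("ascii")
    result = dict(sad)
    result[label] = DIGEST_CODE + b64[1:]
    return result


def verify_said(sad: dict, label: str = "d") -> bool:
    """Recompute the SAID from the content and compare it to the embedded one."""
    return label in sad and saidify(sad, label)[label] == sad[label]
```

Because the SAID is derived from the whole serialization, any change to the SAD changes the SAID, which is why a commitment to the SAID is equivalent to a commitment to the SAD.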
KERI Protocol employs several types of cryptographic commitments to serialized data. Typically a cryptographic commitment is a non-repudiable digital signature on that serialized data. These are labeled commitment Types 1-5 in the following list:
1. Commitment by an event in a KEL: This is either event data in a KEL or a data seal anchored in a KEL. This is the strongest type of commitment because it must be made by a controller at a given key state. Both the key state and the commitment (anchor) are verifiable as part of the KEL. The data and an anchor (seal) are equivalent from a security perspective. Anchored data has an additional availability requirement beyond that of the anchoring KEL, whereas data embedded in the KEL has the same availability as the KEL itself. Ordering of updates to Type-1 data is determined by the order of appearance of the data or its anchor in the KEL.
2. Commitment by an identifier that is itself committed to by a KEL, where the committed-to data is neither embedded nor anchored in the KEL. Type-2 data is essentially a form of authorized data that may be authenticated to the committed-to identifier. This is a weaker form of commitment because it only commits to an authorized identifier in the KEL. An example would be a signature on a message body, a SAID, a SAD, or some other type of message, where that signature is provided by some entity that is authorized by the KEL via a commitment in that KEL. Besides the controlling identifier prefix of a given KEL, other identifiers committed to in that KEL, such as a witness, backer, or registrar identifier, are examples of authorized entities that may make Type-2 commitments. Ordering of updates to Type-2 data is determined by a monotonic date-time of the updating entity.
3. Commitment to data by an identifier that is NOT committed to by the KEL, where the committed data is also not anchored in the KEL. This requires trust in the identifier or some other process to authorize or designate this identifier as trustworthy. A signature on a SAID, a SAD, or some other type of message, where that signature is provided by some entity that is not committed to (i.e. authorized) by a KEL, forms a commitment at this level. A commitment to data made by some entity in the wild is an example of a Type-3 commitment. Ordering of updates to Type-3 data is determined by a monotonic date-time of the updating entity.
4. Commitment to a generic data envelope via a signature on the envelope, not on the embedded data independently of the envelope. This type of commitment may be made by a signer that is authorized by a KEL. The unique characteristic of this type of commitment is that the data serialization is ephemeral. It is meant to be thrown away after its use. The data may be used or kept in another form, but the serialization is not meant to be kept. This is to support truly ephemeral data communication such as a query for other data. Signing the serialization is a way to authenticate the request, but once processed the request is obsolete. Likewise, responses that use generic envelopes indicate that the serialization of the envelope is meant to be discarded and that any data is either of temporary use or is combined or stored in another form besides the serialized envelope. Although each envelope may be uniquely identified by a digest of the whole envelope, the envelope itself is meant as a generic temporary carrier of data. Otherwise, a SAID, SAD, or Event should be defined to indicate data serializations that are not meant to be ephemeral, that is, not meant to be thrown away. A generic data envelope could convey a SAD (with SAID) as a payload of the envelope. In this case a commitment to the SAD or its SAID should be used instead of a commitment to the serialized envelope. Ordering of updates to Type-4 data is determined by a monotonic date-time of the updating entity.
5. Commitment to a generic data envelope via a signature on the envelope, not on the embedded data independently of the envelope. This type of commitment may be made by a signer that is NOT authorized by a KEL. This is identical to the previous classification except that the signer must either be trusted or be authorized by some process other than a KEL. Ordering of updates to Type-5 data is determined by a monotonic date-time of the updating entity.
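
The five commitment types above can be summarized as a small decision function. This is only an illustrative sketch: the `Commitment` fields and the function names are hypothetical and are not part of any KERI library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    in_kel: bool         # data is embedded in, or anchored by a seal in, a KEL
    signer_in_kel: bool  # signer identifier is committed to (authorized) by a KEL
    on_envelope: bool    # signature covers an ephemeral generic envelope


def commitment_type(c: Commitment) -> int:
    """Map a commitment to Types 1-5 as listed above."""
    if c.in_kel:
        return 1
    if c.on_envelope:
        return 4 if c.signer_in_kel else 5
    return 2 if c.signer_in_kel else 3


def ordering_basis(ctype: int) -> str:
    """Type 1 is ordered by the KEL; every other type by monotonic date-time."""
    return "kel-position" if ctype == 1 else "monotonic-datetime"
```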
## Current Facilities for communicating Data
### Non-Enveloped Data Specific Typed Messages
`icp`, `rot`, `ixn`, `ksn`, etc. are fixed-format (fixed field composition) messages of various types.
### Interactive Exchange `exn` Messages
`exn` messages are part of an interactive exchange. An `exn` may be used either to solicit action or to respond to a solicitation made by another `exn`. The `exn` was created to enable multi-step interactions.
Currently the `exn` message envelope uses `q` for its payload. Suggest changing this to `a` for attributes (data payload) so as not to confuse it with the `q` query modifiers block in a `qry` message. This would also allow an `exn` to have an optional `q` in addition to the `a`, where the `q` provides additional modifiers that are better adapted to a ReST endpoint model. This would better support a multi-protocol implementation that includes other protocols such as TCP and UDP in addition to HTTP ReST, where the ReST model is mimicked by the other protocols.
### Non-interactive Query Message Envelope
`qry` messages solicit action. Currently the only way to reply to a `qry` message is with a typed message. There has been no generic envelope for communicating data in reply to a `qry`; the only option was to create a new message type for each type of reply.
One option would be to allow an `exn` to be sent as a reply to a `qry` message. This seems out of place given the intent of an `exn` as part of a multi-step interaction. Having two types of workflows, one that uses both `qry` and `exn` and another that uses only `exn`, would be confusing.
## Suggested new components
### Non Interactive Reply Message Envelope
`rpy`, a reply message envelope, serves either as a solicited reply to a `qry` query message or as an unsolicited message to transfer data. This would enable any type of data to be exchanged without requiring a dedicated message type for each type of data. The reply `rpy` provides a generic envelope independent of the data payload. The `rpy` message may be used in both solicited and unsolicited mode. In the latter there is not a one-to-one correspondence between a `rpy` and some `qry`. In other words, a `rpy` may be triggered independently of a `qry`. This allows push communications or other asynchronous communication of enveloped data payloads.
An open question arises around the cases wherein a `rpy` message could be a valid mechanism in the context of an exchange using `exn` messages. An `exn` could trigger a `rpy` but should only do so as a side effect. The `rpy` so triggered should not be part of the interaction protocol. Any messages that are directly part of the exchange protocol should use `exn` messages, not `rpy`. In other words, the `rpy` should not be an explicit step in a defined exchange protocol.
#### Message Data Conveyance Lifecycle Issues
Any new data block, that is not part of an interactive protocol, starts life as a data payload of a `rpy` envelope. At some point if the data block is important enough or common enough that it would be more optimal to have a typed message for that data payload, then a new message type may be created that is dedicated to that data. Another reason to create a dedicated message type instead of using a `rpy` envelope is if the commitment to the data in the envelope is meant to persist and be forwarded to some other entity or reused outside of the context of the envelope.
### Route Field
In general, the `r` field of a `rpy`, `qry`, `exn` or `ksn` message acts to route the data payload of the associated envelope or message. This allows message-instance-specific handling of the enveloped data. The route field value is a namespace, so the routes are not practically limited. As a result, the route field is a more general and more extensible mechanism than data-specific message types. Moreover, although a `ksn` is a typed message, there are multiple reasons or actions that may trigger a `ksn` message, so a route `r` field in a `ksn` provides a way to differentiate the reason for a given `ksn` and thereby direct it to the correct handler.
### Reply Route Field
The `rr` reply route field (route of reply) allows solicited replies to be routed to a data flow destination on the initiator's (solicitor's) side. This is a new required field in a `qry` message, but it may be empty. When the `rr` field in a `qry` is not empty, the associated `rpy` will have its `r` field set to the `rr` of the triggering `qry`. This enables the querier to indicate how to route the resultant associated but asynchronous reply `rpy` message within its computing infrastructure.
In a `qry` message one could view the `r` as the destination handler of the query and the `rr` as the intended destination handler of the reply back to the source of the `qry`.
Likewise `exn` messages could have optional `rr` fields that may be useful in more complex interaction protocols where the logic may branch and an explicit next route for the next `exn` must be provided by the preceding `exn`.
An `r` field will be added to the `ksn` message so that a `ksn` may be the routed response to a `qry` message with a non-empty `rr` field.
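The interplay of `r` and `rr` can be sketched with a toy router on each side of the exchange. The `Router` class, the `reply_to` helper, and the handler bodies are hypothetical, and the messages omit the `v`, `d`, and `dt` fields for brevity.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class Router:
    """Maps route strings (the `r` field) to handler callables."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, route: str, handler: Handler) -> None:
        self._handlers[route] = handler

    def dispatch(self, route: str, body: dict) -> dict:
        return self._handlers[route](body)


def reply_to(qry: dict, router: Router) -> dict:
    """Querient side: handle a qry by its `r`, answer on the querier's `rr`."""
    payload = router.dispatch(qry["r"], qry["q"])
    # The reply's route is the query's reply route (may be the empty string).
    return {"t": "rpy", "r": qry["rr"], "a": payload}


# Querient (server) side: one handler for the "logs" route.
server = Router()
server.register("logs", lambda q: {"i": q["i"], "entries": []})

# Querier (client) side: one handler for the reply route it asked for.
received = []
client = Router()
client.register("log/processor", lambda a: received.append(a) or a)

qry = {"t": "qry", "r": "logs", "rr": "log/processor", "q": {"i": "EaU6", "sn": "5"}}
rpy = reply_to(qry, server)
client.dispatch(rpy["r"], rpy["a"])
```

Both sides use the same routing mechanism, which is what lets the reply arrive asynchronously and still reach the intended handler.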
### New Granular Commitments with New Attachments
The signed envelopes `exn` and `rpy` are meant to provide authentication with stale or replay attack protection for the envelope and its embedded data. When the data is ephemeral, the envelope and attached signature may be thrown away once the message is authenticated. When the data is persistent, its authentication must be able to be re-established or re-proven. When this proof must be conveyed forward to an external party, it may require embedding it in another envelope, thus resulting in nested envelopes. This may be verbose, cumbersome, or confusing.
One alternative would be to define a unique message type for each embedded configuration of data. This may be problematic because it may require too-frequent versioning of the protocol to add the new message types, or cause an explosion in message types.
In general, in order to comply with Zero Trust Computing principles, any data that is durable should be re-authenticatable, i.e. the information needed to authenticate it (such as signatures and signed serializations) should also be durable so that the authenticity may be re-established. Thus some more convenient mechanism besides re-enveloping or unique message types may be desirable. A granular construction of embedded data with correspondingly granular commitments would enable more granular data management without nested envelopes or a multiplicity of new message types.
The proposed solution provides granular commitments to an embedded SAID or SAD in an envelope (`exn` or `rpy`) using a new attachment type. There are two types of attachments based on the signer prefix type and each commitment may be to one of either a `SAID` or `SAD`.
- SAID Commitment. A SAID is a Self-Addressing IDentifier. Essentially a self-referential content addressable identifier.
- SAD Commitment. A SAD is Self-Addressed Data. A SAD must contain a SAID for that SAD. The SAID has as its target its associated SAD.
Given that the data payload of an envelope, `rpy` or `exn`, is a `SAD` with an embedded `SAID`, or is merely a `SAID`, then an attachment to the envelope could convey an independent granular commitment to that `SAD` or `SAID`.
#### Non-Transferable Identifier Granular Commitment
The attachment for a non-transferable identifier based commitment to a `SAD` or `SAID` has the following primitives in CESR format.
`SAID`, `Prefix`, `Signature`
The `SAID` is that of the associated `SAD`. The `SAD` with embedded `SAID` may be included in the envelope or the `SAID` may be included in the envelope and the `SAD` provided elsewhere (such as in an attachment).
The `Prefix` is the non-transferable identifier prefix of the committer (signer).
The `Signature` is the signature by the private key of the `Prefix` on the enveloped `SAD` or `SAID` as appropriate.
#### Transferable Identifier Granular Commitment
The attachment for a transferable identifier based commitment to a `SAD` or `SAID` has the following primitives in CESR format:
`SAID`, `Prefix`, `SN`, `Digest`, `Indexed Signature(s) Group`
The `SAID` is that of the associated `SAD`. The `SAD` with embedded `SAID` may be included in the envelope or the `SAID` may be included in the envelope and the `SAD` provided elsewhere (such as in an attachment).
The `Prefix` is the transferable identifier prefix of the committer (signer).
The `SN` and `Digest` are the sequence number and digest of the event, in the KEL of the `Prefix`, that establishes the authoritative signing keys used to create the signatures in the `Indexed Signature(s) Group`. These signatures are made with those authoritative keys.
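As a rough illustration, the two attachment shapes can be modeled as plain records. The class names, field names, and the `targets_payload` helper are hypothetical, and the sketch assumes that a SAD carries its SAID under the label `d`. Signature verification and the CESR encoding are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NonTransCommitment:
    said: str        # SAID of the committed SAD
    prefix: str      # non-transferable identifier prefix of the signer
    signature: str   # single signature by the key of the prefix


@dataclass(frozen=True)
class TransCommitment:
    said: str                    # SAID of the committed SAD
    prefix: str                  # transferable identifier prefix of the signer
    sn: int                      # sequence number of the establishment event
    digest: str                  # digest of that establishment event
    signatures: Tuple[str, ...]  # indexed signature(s) group


GranularCommitment = Union[NonTransCommitment, TransCommitment]


def targets_payload(att: GranularCommitment, payload: Union[str, dict]) -> bool:
    """True if the attachment commits to the enveloped SAID or SAD.

    `payload` is either a bare SAID string or a SAD dict whose SAID is
    assumed to live under the `d` label.
    """
    said = payload if isinstance(payload, str) else payload.get("d")
    return said == att.said
```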
## ReST API Convenience
ReST APIs assume a synchronous, connection-based request/response architecture. Each request has a corresponding response on a given connection. As described above, `qry` and `rpy` messages provide an asynchronous mechanism for requests (queries) and corresponding responses (replies) as generic data envelopes. They provide an envelope mechanism for conveying generic data. Furthermore, the asynchronous query/reply `qry` and `rpy` message formats are designed to support asynchronous communication over non-HTTP transports like TCP and UDP. But it would be advantageous if they could be mapped to the synchronous request/response model of HTTP ReST APIs. Therefore the packet design is informed by HTTP ReST APIs but adapted to non-HTTP packet formats.
When one looks at how web frameworks work, the heavy lifting of URL composition, encoding, and decoding is done by the framework. Any given endpoint just receives the end result of that parsing in the form of dicts that contain the parsed and decoded parameters. These typically include a path parameters dict, a query string dict, a headers dict, etc. Each dict has fields with labeled values. The values in each of these dicts have already been URL decoded. Therefore, instead of using URL strings with path, query, and fragment strings in the `qry` and `rpy` messages, the URL itself with its path and query strings may be exploded into labeled fields in a mapping block. To clarify, the resource path string, any path parameter elements, and the query string parameters that would otherwise compose the URL may be provided instead by labeled fields in a mapping block. The advantage of exploding the URL path and query string into a map or dict is that no URL parsing is needed to process a `qry` or `rpy`, which makes the envelope protocol agnostic. Nonetheless, a URL can be constructed from the exploded components when needed to use an HTTP ReST endpoint transport or equivalent.
This approach generalizes the ReST concept of a resource or endpoint into a data flow `route`. Unlike ReST, where only servers have resource endpoints and clients identify each resource endpoint using a URL (Uniform Resource Locator), the route and reply route fields enable both clients and servers to use data flow `routes`. These enable the identification of endpoints on both sides of any interaction, data exchange, or pub/sub. By having routes on both sides, generic peer-to-peer protocols like UDP and TCP may be supported, not merely client-server protocols like HTTP.
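A minimal sketch of the exploded form, using only `urllib.parse` from the Python standard library, might look as follows. The function names and the `example.com` base URL are hypothetical, and the sketch assumes each query parameter appears at most once (repeated keys would collapse in the dict).

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit


def explode_url(url: str) -> dict:
    """Split a URL into a route string `r` and a decoded query mapping `q`."""
    parts = urlsplit(url)
    return {
        "r": unquote(parts.path).strip("/"),
        "q": dict(parse_qsl(parts.query)),  # values are already URL decoded
    }


def compose_url(base: str, r: str, q: dict) -> str:
    """Rebuild a URL from exploded components for an HTTP ReST transport."""
    url = base.rstrip("/") + "/" + quote(r)
    return url + ("?" + urlencode(q) if q else "")


msg = explode_url(
    "https://example.com/logs?i=EaU6&sn=5&dt=2020-08-01T12%3A20%3A05%2B00%3A00"
)
```

Here `msg["q"]` holds the decoded values, so a handler that receives the exploded mapping never needs to parse or unescape a URL.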
Recall that the route field is denoted with the compact label `r`. The route field may map one-to-one to a URL path fragment so it can be conveniently mapped to an HTTP ReST API. In a flow based or data flow based programming architecture, the `route` maps to a data buffer. Some behavior is responsible for processing that buffer.
In addition the reply route `rr` field enables a solicitor or subscriber to specify a reply route or return route that the receiving party may publish to. The reply route field is denoted with the `rr` compact label. The details of each message, namely, `qry` and `rpy` are found below.
### qry Message
The `qry` message as described above provides a way to solicit a reply or subscribe to a push stream. The `r` field contains the route path string. The path elements are delimited with the `/` character. For example `route/path/to/a/resource`. It serves to namespace routes, resources, or endpoints. So instead of multiplying message types, one for each unique composition of data fields, a namespace identifies unique data field compositions via a route path tree. This allows the `qry` message to be generic. Each `r` field path string value may address or route to a unique data resource. The `qry` message also contains a reply route field denoted with the `rr` compact label. This allows the solicitor or subscriber or querier to specify the return route of the corresponding reply.
The `qry` message also contains a version string field as its first field. The version field is denoted with the compact label `v`. Because of the version string, any compatible serialization may be used, such as JSON, CBOR, MGPK, or CESR.
Another notable field in a `qry` message is the query field denoted by the compact label `q`. Its value is a map or dict whose labels and values are the path and query parameters that further define the query.
The major drawback of combining a route path string with an exploded path parameter and query string dict is that a JSON object is more verbose than a URL string that includes the full path and query string, because JSON block delimiters add a few characters over the URL `? = &` separators. But if the URL includes `%` escaped encodings in the query parameters, then the compactness advantage may flip to the JSON exploded query mapping. Moreover, if the serialization is not JSON but a binary equivalent such as CBOR, MGPK, or CESR, then the resulting binary query map may always be more compact than the encoded URL.
As per the ReST design documentation [Web API Design](https://cloud.google.com/files/apigee/apigee-web-api-design-the-missing-link-ebook.pdf) best practices for ReSTful interface design, each resource has two base URLs: a collection URL and a single element URL. The collection URL depends on query parameters for operations on the collection. A special case of any collection, however, is a single element. This makes the single element base URL somewhat redundant. One benefit of this redundancy is the clarity that single element URLs provide about intent. However, under the hood one can do everything with the collective base URL and a singular query string equivalent. In order to regain some of the simplicity of the non-collective (singular) query, we take advantage of the fact that in KERI we enforce ordering of the appearance of fields in a mapping (an ordered mapping). This means that the query `q` block could be interpreted as a non-collection request if all the fields in the query `q` block are singular (non-collections) and the ordering and presence or absence of fields mimics a singular (non-collection) traversal of the resource path.
#### Authentication Support
Solicitations benefit from some form of access restriction or access control. This requires both authentication and replay attack prevention. KERI event messages are cryptographic commitments or disclosures originally initiated by some controller and verified according to the protocol against signatures by the controller of the associated identifier. Receipt messages merely convey signatures and other cryptographic material used to verify signatures on events. A `qry` query, by contrast, is asking some host to disclose information. As a result it makes sense to include a bare-bones authentication mechanism in the `qry` to enable authorization of that disclosure by a layer above KERI. For security reasons it is best if the authentication happens at the ingestion of the `qry`. For better scalability and asynchronicity, a non-interactive mechanism is preferred. The heavy lifting needed to support authentication is already done by KERI. That heavy lifting is provided by the current key state for a given identifier. This means that a query may have attached to it the identifier of the querier and a signature on the query message. This essentially authenticates the querier.
As mentioned above, attached to the serialized query message body is an attachment with the prefix of the querier and signature(s). Authentication of signature(s) always assumes the latest key state for the prefix as well as the service endpoint address for the querier. Key state is not relevant for non-transferable prefixes but is relevant for transferable ones. The signatures are verified by the host server (querient) against the latest available key state known by that host server for the querier. This means that information about which establishment event was used for the signatures does not need to be attached to indicate which set of keys was used. It may be assumed to always be the latest key state. If not, then the query is not timely anyway and may be safely dropped. The latest querier service endpoint data may also be attached to the query or assumed available to the server (querient).
Signing a query message, however, does not protect against a replay attack of that signed query. Queries need to be timely. The simplest non-interactive mechanism to protect against replay attacks is a date-time stamp in the query message. The `qry` message must therefore have a `dt` field. The querient (server, recipient of the query) then refuses any signed queries whose datetime stamps, `dt`, are not within a narrow time window around the server's current datetime. An attacker has to replay any signed queries within that window. Thus stale queries outside that window may be refused. Consequently an attacker cannot request the information outside of that window unless the attacker is able to compromise private keys. Key compromise is hard. The server can increase the degree of protection by enforcing a policy that all queries must have monotonically increasing datetime stamps. This can be done by keeping a cache of the latest query. Monotonicity also makes any replay attacks detectable by the querier. The server will only respond to the first query at a given datetime, so a successful replay attack requires a man-in-the-middle. Unless that man-in-the-middle is part of the routing infrastructure, the attack may be detected because the reply must be specifically routed by the server to the querier's service endpoint, not the man-in-the-middle's service endpoint. Moreover, a signed query that indicates the return service endpoint address may not be changed by the man-in-the-middle without compromising keys. Mixed routing or multi-perspective routing infrastructure would foil a non-key-compromise man-in-the-middle replay attack on such a query.
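The time window and monotonicity policy described above can be sketched as follows. The `ReplayGuard` class and the 30 second window are assumptions chosen for illustration, and signature verification against the querier's latest key state is assumed to have already succeeded before `accept` is called.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=30)  # assumed policy value, not from the source


class ReplayGuard:
    """Accept a signed query only if its `dt` is timely and strictly newer."""

    def __init__(self, window: timedelta = WINDOW) -> None:
        self.window = window
        self._latest = {}  # querier prefix -> dt of the last accepted query

    def accept(self, querier: str, dt: str, now: datetime) -> bool:
        stamp = datetime.fromisoformat(dt)
        if abs(now - stamp) > self.window:
            return False  # stale, or too far in the future
        last = self._latest.get(querier)
        if last is not None and stamp <= last:
            return False  # replayed or out of order
        self._latest[querier] = stamp
        return True


guard = ReplayGuard()
now = datetime(2020, 8, 22, 17, 50, 20, tzinfo=timezone.utc)
first = guard.accept("EQuerier", "2020-08-22T17:50:12.988921+00:00", now)
replayed = guard.accept("EQuerier", "2020-08-22T17:50:12.988921+00:00", now)
```

In this run `first` is accepted and `replayed` is refused, because the second query does not carry a later `dt` than the cached one.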
#### Example qry
```json
{
  "v": "KERI10JSON00011c_",
  "t": "qry",
  "d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
  "dt": "2020-08-22T17:50:12.988921+00:00",
  "r": "logs",
  "rr": "log/processor",
  "q":
  {
    "i": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
    "sn": "5",
    "dt": "2020-08-01T12:20:05.123456+00:00"
  }
}
```
The `dt` field at the top level of the query body is the datetime of the request, used for replay attack prevention.
The `r` field is the query route.
The `rr` field is the reply route (return).
The `q` block separates the query body from the query envelope and avoids confusion without having to define unique field labels.
The `q` field value holds the equivalent query parameters exploded into a mapping.
Should a `dt` field be provided in the `q` block then that `dt` is for the query target log entry not for replay attack protection.
`qry` messages may be signed with an associated attachment that provides the signer (querier) as well as the signature. This serves to authenticate that `qry`. A given recipient (querient) could drop any `qry` messages that were not signed by identifiers it recognizes or accepts as authorized. If the querient does not have the current key state for the querier, the querient may escrow the query.
```json
{
  "v": "KERI10JSON00011c_",
  "t": "qry",
  "d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
  "dt": "2020-08-22T17:50:12.988921+00:00",
  "r": "logs",
  "rr": "log/processor",
  "q":
  {
    "d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
    "i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
    "sn": "5",
    "dt": "2020-08-01T12:20:05.123456+00:00"
  }
}
```
### Reply Message, `rpy`
The reply, `rpy`, message as described above provides a way to respond to a solicitation or publish to a push stream.
- The attribute data payload block for the `rpy` message is denoted with the compact label `a`. Its value is a map or dict whose labels and values constitute the body of data for the reply. The `a` block may be a SAD (Self-Addressed Data) item with an embedded SAID (Self-Addressing IDentifier). The attribute block `a` may contain nested SADs as appropriate.
- The `rpy` message also contains its own self-referential SAID field denoted with the compact label `d` (for content addressable digest). This makes the total reply message as an envelope a `SAD` (Self-Addressed Data) item in its own right. This enables the `rpy` message envelope to be persisted with a known content addressable identifier, i.e. the `d` field value as the database key. This provides support for reasoning about the `rpy` as the authentication mechanism (with attached signature) for re-establishing the authentication of its embedded data when that embedded data is not a SAD. The SAID in `d` is generated from the contents of the `rpy` using the SAID derivation algorithm.
- The `rpy` message uses the BADA (Best Available Data Acceptance) model for its security. This means that there must be an attributable originator of the `rpy`. The BADA security model provides a degree of replay attack protection. The attributable originator (issuer, author, source) is provided by an attached signature couple or quadruple. A single reply could have multiple originators. When used as an authorization, the reply attributes may include the identifier of the authorizer, and the logic for processing the associated route may require a matching attachment.
- The `r` field contains the route path string. The path elements are delimited with the `/` character. For example `route/path/to/a/resource`. It serves to namespace routes, resources, or endpoints. So instead of multiplying message types, one for each unique composition of data fields, the `r` field namespaces data field compositions via a route path tree. This allows the `rpy` message to be generic. Each `r` field path string value may address or route to a unique data resource. The value of the `r` field is the value of the `rr` field of the corresponding `qry` message that motivated a given `rpy` message. If the `rr` field in the `qry` is empty then the `r` field in the `rpy` may be an empty string.
- The `rpy` message also contains a version string field as its first field. The version field is denoted with the compact label `v`. Because of the version string, any compatible serialization may be used, such as JSON, CBOR, MGPK, or CESR. The version field is authoritative for any nested SADs in the reply. This enables the reply and any nested SADs to use any of the supported serializations without including a version field in each nested SAD. In other words, the nested SADs share the reply message envelope's version. The reply envelope's version field must be stored alongside any nested SAD storage. The security posture is that the nested SAD is signed, so the SAD itself is tamper evident, and the serializations are incompatible, so in the worst case, if an attacker corrupts the separately stored version field, deserialization will fail. Nonetheless, given a limited set of serializations it would be straightforward to rediscover the original serialization. This is viewed as a reasonable tradeoff, that is, an unsigned version field from the envelope that is inherited by any nested SADs versus a dedicated redundant but signed version field in each nested SAD. Should this not be secure enough, the original envelope with its signed version field and signature could be stored alongside any nested SADs, not merely the unsigned version field. This makes the version field tamper evident at the cost of redundant storage of the reply message, but still keeps the nested SADs clear of having to include a per-SAD version field. The nested SADs still share the reply message envelope's version, but more securely.
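
Since nested SADs inherit the envelope's serialization from the `v` field, a recipient needs to extract the serialization kind from the version string before storing it alongside them. The sketch below assumes the layout suggested by the example value `KERI10JSON00011c_` used in this document: a four-letter protocol, one hex digit each for major and minor version, a four-letter serialization kind, a six-digit hex size, and a `_` terminator. That layout, the hex interpretation of the size, and the names used are assumptions of this sketch.

```python
import re
from typing import NamedTuple

VERSION_RE = re.compile(
    r"^(?P<proto>[A-Z]{4})(?P<major>[0-9a-f])(?P<minor>[0-9a-f])"
    r"(?P<kind>[A-Z]{4})(?P<size>[0-9a-f]{6})_$"
)


class Version(NamedTuple):
    proto: str  # protocol, e.g. "KERI"
    major: int
    minor: int
    kind: str   # serialization kind, e.g. "JSON", "CBOR", "MGPK"
    size: int   # assumed to be the serialized size in bytes, hex encoded


def parse_version(v: str) -> Version:
    """Split a version string into its parts or raise ValueError."""
    m = VERSION_RE.match(v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return Version(
        m["proto"], int(m["major"], 16), int(m["minor"], 16),
        m["kind"], int(m["size"], 16),
    )
```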
#### Example rpy
```json
{
  "v": "KERI10JSON00011c_",
  "t": "rpy",
  "d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
  "dt": "2020-08-22T17:50:12.988921+00:00",
  "r": "logs/processor",
  "a":
  {
    "i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
    "name": "John Jones",
    "role": "Founder"
  }
}
```
#### Out-of-order or Stale Reply Data
When the `rpy` message is used as an envelope for Type-2 or higher data, the recipient needs some mechanism for detecting stale or out-of-order transmission of updates to that data via a reply message. Stale re-transmission may be innocuous or part of a replay or DDoS attack. When the data is Type-1, the ordering may be determined by the anchor location in the associated KEL. But for Type-2 or higher data there is no anchor, so some other mechanism is needed to order updates to the data. The simplest non-interactive mechanism for ordering data updates is a monotonic date-time. The date-time in the `rpy` is relative to the `rpy` sender's (replier's) clock. The monotonicity of the date-time enables the recipient (replient) of the `rpy` to detect out-of-order updates by first storing the date-time of the most recently received `rpy` for that data and then comparing it to the date-time of any newly received `rpy`. Therefore the `rpy` message includes a `dt` field. Any recipient of a `rpy` may refuse it if its date-time stamp field, `dt`, is not later than the date-time it already has on hand for that data item.
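A recipient-side store that applies this rule could look like the sketch below. The class name and the choice of `(route, signer)` as the key that identifies "that data item" are assumptions. A real implementation would choose the key according to the route's semantics and would verify the attached signature before calling `update`.

```python
from datetime import datetime


class LatestReplyStore:
    """Keep only the newest payload per (route, signer) key."""

    def __init__(self) -> None:
        self._items = {}  # (route, signer) -> (dt, payload)

    def update(self, rpy: dict, signer: str) -> bool:
        """Store the payload if its `dt` is later than the one on hand."""
        key = (rpy["r"], signer)
        stamp = datetime.fromisoformat(rpy["dt"])
        held = self._items.get(key)
        if held is not None and stamp <= held[0]:
            return False  # stale or out-of-order update, refuse it
        self._items[key] = (stamp, rpy["a"])
        return True

    def get(self, route: str, signer: str):
        held = self._items.get((route, signer))
        return None if held is None else held[1]
```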
#### rpy Authentication and Authorization
`rpy` messages may be signed with an associated attachment that provides the identifier of a signer as well as the signature. This authenticates the `rpy` with respect to its signer (sender). A given recipient could drop any `rpy` messages that were not signed by identifiers it recognizes or accepts as authorized. If the recipient does not have the current key state for the sender (signer), then it may escrow the `rpy` message and attachments until such time as it has the current key state.
#### Query/Reply Tracking and Matching
In the case where the same data item provided by a `rpy` may come from multiple sources, the recipient may choose to track sources independently of each other by matching each `rpy` to a prior originating `qry`. If there is no match then the `rpy` may be dropped. Tracking could be based on some combination of different items of information, each providing a tracking mechanism. One tracking mechanism is to match the source signer identifier in a signature attachment of the `rpy` to some destination identifier in the prior matching `qry`. For example, if a `qry` asks for information about a KEL from the controller or a witness to that KEL, both of whose identifiers are associated with that KEL, then a corresponding `rpy` signed by either the controller or a witness could be matched to the originating query for information about that KEL. Another tracking mechanism is to match the route, `r`, field of the `rpy` with the reply route, `rr`, field of the originating `qry`. In addition, a transaction identifier or cryptographic token could be included in the `rr` in the originating `qry` to more specifically match a given `rpy` to that `qry`.
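The two tracking mechanisms can be combined in a small sketch. The `pending` table and the function names are hypothetical; the only fields taken from the text are the `rr` field of the `qry` and the `r` field of the `rpy`.

```python
# Hypothetical table of outstanding queries, keyed by the reply route `rr`
# that each qry asked its replies to use.
pending = {}


def register_query(qry: dict, destinations: set) -> None:
    """Remember a sent qry and the identifiers allowed to answer it."""
    pending[qry["rr"]] = set(destinations)


def match_reply(rpy: dict, signer: str) -> bool:
    """Match a rpy to a prior qry by route and by signer identifier."""
    allowed = pending.get(rpy["r"])
    if allowed is None:
        return False  # no originating qry for this route: drop
    return signer in allowed
```

Appending a transaction identifier or token to the `rr` value makes the route key unique per query, so the same lookup then matches a reply to one specific `qry`.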
### Reply Security Posture
#### Ephemeral Reply Data
In this case the reply's payload data is meant to be ephemeral. A signature on the message establishes the authenticity of the message envelope, but the intent is that once processed the envelope is discarded. The data payload is not intended to be the target of a stored cryptographic commitment (signature), nor is it intended that the authenticity of the data ever needs to be re-proven externally or internally. Thus the signature on the envelope merely serves as an ephemeral authentication. In accord with Zero Trust Computing (ZTC) principles, ephemeral data is stored in memory and is not meant to be persisted to durable storage. In general, data stored in protected process memory is not accessible by other processes unless access is specifically granted. When that process exits, all data in memory is lost. Thus the authentication and authorization of the startup of the process itself provides a degree of protection to the data stored in its memory. When data is placed into durable storage, however, that protection is removed. The chain of custody or control over a given storage device (hard drive, flash drive, etc.) may have been broken during any time that the running process is stopped. Consequently, the next time the process runs and loads the data from durable storage there is no guarantee the data is still authentic. As a result, best practice in ZTC is to re-verify or re-establish the authenticity of any data in durable storage whenever there is any doubt as to the chain of custody of that durable data. Encrypting the data merely displaces that authentication chain of custody to the chain of custody of the decryption key. One may make a time-performance trade-off between re-verification of signatures at startup and continuous encryption/decryption while running.
#### Persistent Reply Data
When Type-2 data conveyed by a `rpy` is persisted to durable storage there must be a mechanism for re-establishing the authenticity of that data. Persistence indicates a need to re-prove the authenticity of the data either externally or internally. This usually means storing the latest signed version of that `rpy` with the attached signature and then re-verifying the signature on that `rpy` before refreshing the stored data. If the signature does not verify, or the verified `rpy` data does not match the persisted data, then the persisted data is no longer authenticatable and should not be trusted until it is re-authenticated (i.e. authenticity is re-established).
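The reload step might look like the following sketch. The signature check is deliberately left as a caller-supplied `verify(message, signature)` callable, since the actual verification depends on the signer's key state; that callable and the function name `reload_persisted` are assumptions of this example.

```python
import json


def reload_persisted(stored_rpy: bytes, signature: bytes, persisted: dict, verify) -> dict:
    """Re-establish the authenticity of persisted data from the stored signed rpy.

    `verify(message, signature) -> bool` is a caller-supplied signature check.
    """
    if not verify(stored_rpy, signature):
        raise ValueError("stored rpy signature does not verify")
    attrs = json.loads(stored_rpy)["a"]
    if attrs != persisted:
        raise ValueError("persisted data does not match the verified rpy")
    return attrs  # safe to use as the refreshed data
```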
#### Reply with SAD (Self-Addressed Data)
In order to better reason about the embedded attributed data payload, it may be desirable to make the `a`, attribute, block into a SAD (Self-Addressed Data) block with an embedded SAID (Self-Addressing IDentifier). The SAID is provided by the `d` field in the `a` block, i.e. the contents of the `a` block is a SAD item. A SAID is a specially derived self-referential cryptographic digest of the data block in which it resides. This makes it a self-referential content-addressable identifier, or self-addressing identifier for short.
```json
{
"v" : "KERI10JSON00011c_",
"t" : "rpy",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"r" : "logs/processor",
"a" :
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"name": "John Jones",
"role": "Founder"
}
}
```
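To make the self-referential derivation concrete, here is a simplified sketch of computing and checking a SAID for an attributes block. It is not the exact KERI derivation: the use of BLAKE2b from the standard library, the `"F"` code character, the compact JSON serialization, and the function names are all assumptions chosen so that the example runs without third-party packages.

```python
import base64
import hashlib
import json

DUMMY = "#" * 44  # placeholder with the same length as the final SAID


def saidify(sad: dict, label: str = "d") -> dict:
    """Return a copy of sad with its label field set to a self-referential digest."""
    filled = dict(sad)
    filled[label] = DUMMY  # digest is computed over the block with a dummy value
    raw = json.dumps(filled, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    digest = hashlib.blake2b(raw, digest_size=32).digest()
    # 32 digest bytes plus 1 zero pad byte give 33 bytes, i.e. 44 Base64 characters.
    # The first character (always "A" from the pad) is replaced by a code character.
    b64 = base64.urlsafe_b64encode(b"\x00" + digest).decode("ascii")
    filled[label] = "F" + b64[1:]
    return filled


def verify_said(sad: dict, label: str = "d") -> bool:
    """Recompute the digest and compare it with the embedded value."""
    return saidify(sad, label)[label] == sad[label]
```

Any change to a field of the block changes the recomputed digest, so `verify_said` fails for a tampered block. This is what allows a commitment to the SAID alone to stand in for a commitment to the whole SAD.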
#### Expose Message
Exposure (`exp`) messages disclose sealed data associated with anchored seals in a KEL. A reference to the anchoring seal is provided as an attachment to the exposure message.
An exposure `exp` message is a SAD item with an associated derived SAID in its `d` field.
```json
{
"v": "KERI10JSON00011c_",
"t": "exp",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"r": "sealed/processor",
"a":
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"name": "John Jones",
"role": "Founder"
}
}
```
### Independently Persisted SAD
An alternative to persisting the whole `rpy` with attached signature in order to provide a mechanism for re-establishing the authenticity of the included data is to use a SAD format for the `a`, attributes, block (which therefore includes an embedded SAID) and also attach a signature on just the serialized attributes block (SAD). This attached signature would be in addition to the attached signature on the whole `rpy` envelope. In this case the whole reply with attached signature does not need to be stored to re-establish the authenticity of the data attributes, merely the SAIDed (SAD) attributes block and the attached signature on that block. This approach makes the `rpy` envelope bigger but may make the persisted storage smaller. It allows the `rpy` to be used in an ephemeral manner to filter out stale `rpy` messages, to match the `rpy` to a `qry`, or to cue the embedded data via the route, `r`, field, while persisting only the embedded `a` block and its attached signatures.
#### Type-1 Data
When the SAD is Type-1 data then the order of appearance of an anchor in the corresponding KEL or TEL (given by the attached anchor seal) determines whether or not the SAD is stale. Consequently Type-1 SADs do not need an embedded date-time for stale detection. (They may need an embedded date-time for some other purpose.)
Note the `d` field is the SAID of the contents of the `a` block, i.e. the contents of the `a` block is a SAD item.
```json
{
"v" : "KERI10JSON00011c_",
"t" : "exp",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"r" : "logs/processor",
"a" :
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"name": "John Jones",
"role": "Founder"
}
}
```
```json
{
"v": "KERI10JSON00011c_",
"t": "exp",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"r": "sealed/processor",
"a":
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"name": "John Jones",
"role": "Founder"
}
}
```
#### Type-2 Data
When the data is Type-2 then a date-time field must be included in the attributes block to identify stale or out-of-order updates.
Note the `d` field is the SAID of the contents of the `a` block, i.e. the contents of the `a` block is a SAD item.
```json
{
"v" : "KERI10JSON00011c_",
"t" : "rpy",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"r" : "logs/processor",
"a" :
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"name": "John Jones",
"role": "Founder"
}
}
```
#### Compact Reply with SAID only
An alternative is to include only the SAID in the attributes block of the signed `rpy` to provide a cryptographic commitment to the associated SAD, and then provide the SAD itself in a cache or in an attachment. This allows for reduced bandwidth requirements when the same SAD may be transmitted or shared redundantly. The compact form can be used as a notice of a change or update that triggers a request for the actual data. When used as a notice, the `dt` field in the `rpy` is used to determine if it is a stale notice that may be ignored. Whether or not the SAD contains a date-time field depends on whether the SAD has an anchor in a corresponding KEL or TEL.
```
{
"v" : "KERI10JSON00011c_",
"t" : "rpy",
"d": "EZ-i0d8JZAoTNZH3ULaU6JR2nmwyvYAfSVPzhzS6b5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"r" : "logs/processor",
"a" :
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM"
}
}
```
The `d` field is the SAID of the contents of the externally provided `a` block, i.e. the contents of the `a` block is a SAD item.
The actual SAD for Type-2 data is as follows and is provided elsewhere:
```
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"dt": "2020-08-22T17:50:12.988921+00:00",
"name": "John Jones",
"role": "Founder"
}
```
When the SAD is Type-1 data then the order of appearance of the anchor in the corresponding KEL or TEL (given by the attached anchor seal) determines whether or not the SAD is stale. Consequently Type-1 SADs do not need an embedded date-time for stale detection. (They may need an embedded date-time for some other purpose.)
```
{
"d": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"i": "EAoTNZH3ULvYAfSVPzhzS6baU6JR2nmwyZ-i0d8JZ5CM",
"name": "John Jones",
"role": "Founder"
}
```
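On the receiving side, a compact `rpy` can be resolved against a local cache of SADs keyed by SAID. The sketch below assumes such a cache exists as a plain dictionary; the name `expand_compact` is hypothetical. It only checks that the cached SAD carries the committed SAID, so a complete implementation would also recompute the digest over the SAD.

```python
def expand_compact(rpy: dict, cache: dict):
    """Return the full SAD for a compact rpy, or None if it must be requested."""
    said = rpy["a"]["d"]  # the only field in the compact attributes block
    sad = cache.get(said)
    if sad is None:
        return None  # notice only: the caller should request the SAD by its SAID
    if sad.get("d") != said:
        raise ValueError("cached SAD does not carry the committed SAID")
    return sad
```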
### ksn Message (Key State Notice)
As described above, instead of using the generic `rpy` message envelope, some data is important enough and used enough to justify a dedicated message type. One such message with associated data is the `ksn` message type. Notable here is a revised `ksn` that adds a route, `r`, field. A `ksn` represents Type-1 data, i.e. data included in a KEL.
An example `ksn` is provided below:
#### Example ksn
```
{
"v": "KERI10JSON00011c_",
"i": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM",
"s": "2",
"t": "ksn",
"p": "EYAfSVPzhzZ-i0d8JZS6b5CMAoTNZH3ULvaU6JR2nmwy",
"d": "EAoTNZH3ULvaU6JR2nmwyYAfSVPzhzZ-i0d8JZS6b5CM",
"f": "3",
"dt": "2020-08-22T20:35:06.687702+00:00",
"et": "rot",
"kt": "1",
"k": ["DaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM"],
"n": "EZ-i0d8JZAoTNZH3ULvaU6JR2nmwyYAfSVPzhzS6b5CM",
"bt": "1",
"b": ["DnmwyYAfSVPzhzS6b5CMZ-i0d8JZAoTNZH3ULvaU6JR2"],
"c": ["eo"],
"ee":
{
"s": "1",
"d": "EAoTNZH3ULvaU6JR2nmwyYAfSVPzhzZ-i0d8JZS6b5CM",
"br": ["Dd8JZAoTNZH3ULvaU6JR2nmwyYAfSVPzhzS6b5CMZ-i0"],
"ba": ["DnmwyYAfSVPzhzS6b5CMZ-i0d8JZAoTNZH3ULvaU6JR2"]
},
"di": "EYAfSVPzhzS6b5CMaU6JR2nmwyZ-i0d8JZAoTNZH3ULv",
"r": "route/to/buffer"
}
```
Note that the `ksn` includes a route, `r`, field. This may be an empty string. The other fields of the `ksn` have been defined elsewhere.
### tsn Message (Transaction State Notice)
As described above, instead of using the generic `rpy` message envelope, some data is important enough and used enough to justify a dedicated message type. One such message with associated data is the `tsn` message type. Notable here is a revised `tsn` that adds a route, `r`, field. A `tsn` represents Type-1 data, i.e. data included in a KEL.
An example `tsn` is provided below:
### exn Message (exchange)
```
{
"v": "KERI10JSON00011c_",
"t": "exn",
"i": "EaU6JR2nmwyZ-i0d8JZAoTNZH3ULvYAfSVPzhzS6b5CM", // recipient
"dt": "2020-08-22T17:50:12.988921+00:00", // replay attack prevention
"r": "route/of/exchange", //route
"rr": "replyroute/of/subsequent/exchange", //reply route
"a":
{
"name": "John Jones"
} // data payload
}
```
## Best Available Data Acceptance (BADA) Policy
The BADA (Best Available Data Acceptance) model is applied to each reply message. It is a latest-seen-signed pairwise comparison of a new update reply against the old, already accepted reply from the same source for the same route (same data).
Accept the new reply (update) if the new reply is later than the old reply, where:
1) Later means that the sn (sequence number) of the last (if forked) establishment event, if any, that provides the keys for the signature(s) of the new reply is greater than or equal to the sn of the last establishment event that provides the keys for the signature(s) of the old reply.
2) If the key state is the same or the identifier is non-transferable, then later means that the date-time stamp of the new reply is greater than that of the old reply.

If nontrans and the last establishment event is not yet accepted then escrow.
If nontrans and partially signed then escrow.
Escrow process logic is route dependent and is dispatched by route, i.e. the route is the address of a buffer with a route-specific handler for the escrow.
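One possible reading of the pairwise comparison, sketched in Python. The `Accepted` record and the `bada_accept` name are hypothetical, escrow handling is omitted, and the interpretation that a strictly later key state wins outright while an equal key state falls back to the date-time stamp is an assumption of this sketch.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Accepted(NamedTuple):
    sn: Optional[int]  # sn of last establishment event, None if non-transferable
    dt: datetime       # date-time stamp of the reply


def bada_accept(new: Accepted, old: Optional[Accepted]) -> bool:
    """Return True if the new reply should replace the old accepted reply."""
    if old is None:
        return True  # nothing accepted yet for this source and route
    if new.sn is not None and old.sn is not None:
        if new.sn > old.sn:
            return True   # signed under a later key state
        if new.sn < old.sn:
            return False  # signed under an earlier key state
    # Same key state, or non-transferable signer: fall back to the date-time.
    return new.dt > old.dt
```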
## Read Update Nullify (RUN) Model
Relative to Client-Server or Peer-to-Peer interaction:
Create, Read, Update, Delete (CRUD)
Read, Update, Nullify (RUN)
Decentralized control means the server never creates, only the client does. The Client (Peer) always updates the server (other Peer) for data sourced by the Client (Peer). So there is no Create. Non-interactive monotonicity means we can never delete. So there is no Delete. We must Nullify instead. Nullify is a special type of Update.
#### Ways to Nullify:
- null value
- flag indicating nullified
#### Rules for Update :
(anchored to key state in KEL)
- Accept if no prior record.
- Accept if anchor is later than prior record.
#### Rules for Update:
(signed by keys given by key state in KEL, ephemeral identifiers have constant key state)
- Accept if no prior record.
- Accept if key state is later than prior record.
- Accept if key state is the same and date-time stamp is later than prior record.
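A small in-memory sketch of a RUN store follows. The class and method names are hypothetical. The `order` argument stands for whatever makes an update "later": an anchor position for anchored data, or a `(key state sn, date-time)` pair for signed data, which Python compares element by element.

```python
class RunStore:
    """Read/Update/Nullify store: there is no create and no delete operation."""

    def __init__(self):
        self._records = {}  # key -> (order, value)

    def read(self, key):
        record = self._records.get(key)
        return None if record is None else record[1]

    def update(self, key, order, value) -> bool:
        prior = self._records.get(key)
        if prior is not None and order <= prior[0]:
            return False  # not later than the prior record: reject
        self._records[key] = (order, value)
        return True

    def nullify(self, key, order) -> bool:
        # Nullify is an update to a null value; the record itself is kept.
        return self.update(key, order, None)
```

Keeping the nullified record, rather than deleting it, preserves its ordering information. A replay of an older update therefore cannot resurrect the nullified data.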
## Restful APIs
A useful set of design guidelines for ReSTful APIs may be found here:
[Web API Design](https://cloud.google.com/files/apigee/apigee-web-api-design-the-missing-link-ebook.pdf)
A related but more dated book.
[API Design](https://pages.apigee.com/rs/apigee/images/api-design-ebook-2012-03.pdf)
The basic design consists of two base URLs per resource: a URL for the collection and a URL for a specific element in the collection.
'/dogs?all=true' (collection with query parameters to operate on the collection)
'/dogs/1234' (specific element with path to specify element)
The base URLs are operated on with the HTTP verbs, POST, GET, PUT, PATCH, and DELETE corresponding to the CRUD (create, read, update, delete) methods on a database. Unlike PUT, PATCH allows updating only part of a resource.
| Resource | POST (create) | GET (read) | PUT/PATCH (update) | DELETE (delete) |
|:----------:|-------------|-----------|--------------------|----------------|
| /dogs | Create a new dog | List dogs | Bulk update dogs | Delete all dogs |
| /dogs/1234 | Error | Show dog 1234 (if exists) | Update dog 1234 (if exists) | Delete dog 1234 (if exists) |
Sweep complexity under the '?'
Use limit and offset for pagination.
`/dogs?limit=25&offset=50`
### Suggested Resources
#### Clone Replay of KELs in first seen order.
Logs are stored with key of identifier prefix plus monotonic date time.
`/logs`
`/logs/{pre}`
`/logs/{pre}/{datetime}`
`{pre}` is template for identifier prefix
`{datetime}` is template for URL-encoded ISO 8601 datetime. (Alternatively the datetime could be encoded as a Unix compatible datetime floating point number, but that is not a format that is universal to all operating systems.)
`/logs/EABDELSEKTH` replays first seen log for identifier prefix `EABDELSEKTH` (prefix clone)
`/logs?all=true` replays first seen log for all identifier prefixes in database (full database clone)
`/logs/EABDELSEKTH/%272020-08-22T17%3A50%3A09.988921%2B00%3A00%27` (get event of prefix at datetime)
`/logs/EABDELSEKTH?after={datetime}&limit=1`
Returns next event for 'EABDELSEKTH' after `{datetime}` where date time is ISO 8601 URL encoded.
`/logs/EABDELSEKTH?before={datetime}`
Returns all events for 'EABDELSEKTH' before `{datetime}` where date time is ISO 8601 URL encoded.
`/logs/EABDELSEKTH?after=%272020-08-22T17%3A50%3A09.988921%2B00%3A00%27`
Returns all events for 'EABDELSEKTH' after '2020-08-22T17:50:09.988921+00:00' (URL-encoded ISO 8601)
`/logs/EABDELSEKTH?before=%272020-08-22T17%3A50%3A09.988921%2B00%3A00%27`
Returns all events for 'EABDELSEKTH' before '2020-08-22T17:50:09.988921+00:00' (URL-encoded ISO 8601)
`/logs?pre=EABDELSEKTH&after=%272020-08-22T17%3A50%3A09.988921%2B00%3A00%27`
Equivalent query on the collective base URL
`/logs?pre=EABDELSEKTH,E123ABDELSE,EzyyABDELSE`
Get logs for three prefixes
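As a sketch of how a client might build such URLs, the helpers below percent-encode an ISO 8601 date-time with the Python standard library. The helper names are hypothetical and the paths use forward slashes, which is an assumption of this sketch. Unlike the examples above, the date-time is encoded without surrounding single quotes, so no `%27` appears.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def log_event_path(pre: str, dt: str) -> str:
    """Path of the event for prefix pre at date-time dt."""
    # safe='' forces ':' and '+' to be percent-encoded inside a path segment.
    return f"/logs/{quote(pre, safe='')}/{quote(dt, safe='')}"


def logs_after_url(pre: str, dt: str, limit: Optional[int] = None) -> str:
    """Query for events of prefix pre after date-time dt, optionally limited."""
    params = {"after": dt}
    if limit is not None:
        params["limit"] = limit
    return f"/logs/{quote(pre, safe='')}?{urlencode(params)}"
```

For example, `logs_after_url("EABDELSEKTH", "2020-08-22T17:50:09.988921+00:00", limit=1)` yields `/logs/EABDELSEKTH?after=2020-08-22T17%3A50%3A09.988921%2B00%3A00&limit=1`.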
#### Replay of KELs by Sequence Number
Given recovery forks, the KEL indexed by sn will not be the same as the first seen KEL. The key state will be the same but the exact sequence of events in a replay will not. So this is for verifying key state, not for cloning the append-only event log. For any verification, however, the KEL by sn is more appropriate because it lets a verifier query the key state at any sn and find a given authoritative event in the log by its location seal.
`/events`
`/events/{pre}`
`/events/{pre}/{sn}`
`{pre}` is template for identifier prefix
`{sn}` is template for sequence number
`/events?all=true` (All KELs in database in order by sn)
`/events/{pre}` (KEL for identifier prefix `{pre}`)
`/events/{pre}/{sn}` (event at `{sn}` where `{sn}` is template for sequence number)
`/events/{pre}?offset={sn}&limit=1000` (next 1000 events starting at sn = `{sn}`)
`/events?pre={pre}&sn={sn}`
Using collective to get event at sn of pre
#### Fetch Keys for Event at given sequence number
`/keys`
`/keys/{pre}`
`/keys/{pre}/{sn}`
#### Fetch latest Key State for KEL at prefix
`/states`
`/states/{pre}`