# Storage & Replication Proposal
This document outlines a proposed system for storing, modifying, and replicating documents ("objects") in a user's Identity Hub.
## Overview
In our proposed storage system, a user's Hub consists of a set of document revisions ("commits") which each represent a change (creation, update, or deletion) of a single Hub object. Each commit is signed, immutable, and content-addressable (stored and referenced by its hash). The set of commits representing an object is generally append-only, with certain exceptions made to allow garbage collection of older commits.
A single Hub object consists of a linear chain of commits which can be combined via a deterministic merge algorithm to arrive at a final object state.
Replication of state between Hubs is achieved by ensuring that all relevant commits have been distributed to each of a user's Hubs. The presence of all commits, combined with the merge algorithm, ensures that all Hubs eventually converge to an identical state.
#### Scope
The following are not in scope for this document:
* How Hub objects are encrypted
* The underlying transport protocol between Hubs during replication
## Commits
Each Hub object consists of a series of commits which each represent an immutable, atomic update of that object. The following diagram illustrates an object `3a9de008` which consists of 3 commits:
![](https://i.imgur.com/owH9ET1.png)
Each commit contains change metadata and an optional payload. For example, the following commit could represent the creation of object `3a9de008`:
```json
{
"rev": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15",
"operation": "create",
"committed_at": 1530308239,
"parents": [],
"object": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15",
"state": {
// ...
},
"patch": {
// ...
}
}
```
Followed by an update to the same object:
```json
{
"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593",
"operation": "update",
"committed_at": 1530309810,
"parents": ["3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15"],
"object": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15",
"state": {
// ...
},
"patch": {
// ...
}
}
```
Commit objects contain the following fields:
| Field | Type | Description |
| ------ | -------- | ----------- |
| rev | string | The revision ID (SHA3-256 hash of the commit object). |
| committed_at | number | Commit timestamp in [unix time](https://en.wikipedia.org/wiki/Unix_time); used for roughly ordering commits within the history of an individual object. |
| parents | array<string> | Array of hash(es) of the commit(s) upon which this one was based *(optional)*. |
| object | string | The ID of the object that is modified by this commit, or `new` when creating an object. |
| state | object | The full state of the object at this point *(optional)*. |
| patch | object | A [JSON Patch](http://jsonpatch.com/) or [JSON Merge Patch](https://tools.ietf.org/html/rfc7396) object (format TBD) representing the changes in this commit *(optional)*. |
| operation | string | The operation being performed on the object (`create`, `update`, or `delete`). |
Typically a commit should specify at least one of the `state` or `patch` fields, or else have `operation` set to `delete`.
:::warning
Open: Hashing a JSON object requires a canonicalization step. This could be avoided with an approach like JWT, where the hash is computed from a serialized (e.g. base64) form; however, Hubs would likely be required to save two copies of each commit (the original serialized base64 plus an expanded, index-able JSON format).
:::
> [name=Danny Strockis] D: Should we just use the CouchDB commit schema, or the parts of it that we need?
> A: we kinda wanted to be db agnostic too...
### Linear object history
The set of commits pertaining to an object is organized into a linear history based on a well-known deterministic merge algorithm. This ensures that all Hubs eventually converge on an identical object history, regardless of the order in which commits are received.
The merge algorithm logic is:
1. A commit with `operation: create` always precedes (comes before) any other commit.
2. A commit with `operation: delete` always supersedes (comes after) any other commit.
3. Commits with the same `operation` are ordered based on their `committed_at` fields, with higher timestamps superseding lower timestamps.
4. In the event of two commits with identical `operation` and `committed_at` fields, the commit with the lexicographically highest `rev` field supersedes the other.
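The four rules above can be sketched as a simple sort key over commit fields. This is an illustrative Python sketch, not a normative implementation; the function names are hypothetical:

```python
# Rank operations so that `create` sorts first and `delete` sorts last,
# per rules 1 and 2 of the merge algorithm.
_OP_RANK = {"create": 0, "update": 1, "delete": 2}

def merge_order_key(commit):
    """Sort key implementing the deterministic ordering: operation rank,
    then committed_at (rule 3), then lexicographic rev (rule 4)."""
    return (
        _OP_RANK[commit["operation"]],
        commit["committed_at"],
        commit["rev"],
    )

def linearize(commits):
    """Return the commits of a single object in canonical linear order.
    The result is identical regardless of the input order, which is what
    lets independent Hubs converge on the same history."""
    return sorted(commits, key=merge_order_key)
```

Because Python's sort compares the key tuples element by element, a `delete` commit supersedes every other commit even when its timestamp is lower, matching rule 2.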
Wall-clock timestamps are known to be unreliable in distributed systems. They were chosen over other approaches (like vector clocks) for the merge algorithm because they can be intuitively reasoned about by end users. The merge algorithm must only be consistent, not necessarily "correct" based on some concept of real-world time, so wall-clock timestamps are sufficient.
:::warning
Open: Should the `parents` field be considered? If `CommitB` lists `CommitA` in its `parents`, `CommitB` should clearly come after `CommitA` regardless of timestamps. However, this will complicate the implementation.
:::
> [name=Danny Strockis] D: Clarify that the above and below logic pertains to a single object's history, and that the ordering of all changes across a hub is up to the hub to decide/implement.
### Implicit graph-based object history
In addition to the canonical linear commit history of an object, the values in the `parents` field of each commit allow the construction of a parallel history graph which more richly represents the causality of changes.
The following diagram illustrates a series of commits of a single object, along with each commit's `parents` field (`p`) and `committed_at` time (`t`). The top diagram shows the commits in linear form, and the bottom shows the same commits expanded into the graph form:
![](https://i.imgur.com/gIZpg8I.png)
The implicit graph history allows apps and user agents to optionally support more intelligent merge semantics for conflicting updates. For example, if the illustration above represents edits to a shared grocery list, the conflicting commits `C` and `D` could be resolved by an intelligent user agent by creating the new merged commit `E`.
> A: If we make the assumption that REST-style simplicity of requests has gone out the window due to encryption requirements, and clients therefore need an SDK already, does it still make sense to keep the simpler timestamp based ordering? We could just have the SDK handle the parenting logic.
> [name=Danny Strockis] D: So it's basically last-write-wins (assuming clocks are aligned). Why would anyone specify a parent commit in a change, if it's not required?
>
> A: This was an attempt to offer both a simple PUT/GET interface as well as a more advanced ordering system for clients that care about conflicts.
> [name=Danny Strockis] Are you sure you want to base the order of changes to an object on the timestamp of the client that generated the change? I wonder if some unexpected behavior might occur, like an update not taking effect b/c the timestamp was lower than an existing commit. Or like a client specifying a change in the future that overrides any other changes made.
### Patch-based commits
In the simplest usage, each commit specifies a `state` field representing the complete state of the object at that point in time. The current state of the object is thus represented by the `state` field of the final commit in the object's history (as determined by the linear merge algorithm).
This approach is simple but not always ideal. For example, in the scenario of a shared grocery list that is simultaneously updated by two participants, one of the edits will "win" based on the merge algorithm and the other will be "lost" (although still available in the object's history).
To accommodate this scenario, we introduce the idea of *patch commits*. A commit can specify a `patch` field containing a [JSON Patch](http://jsonpatch.com/) or [JSON Merge Patch](https://tools.ietf.org/html/rfc7396) object (the exact format is still open).
If a collision is detected between multiple patch-based commits (i.e. where multiple commits reference the same parent), an intelligent app can resolve the conflict by combining the patch operations into a new merged commit.
A single commit can specify both `patch` and `state` fields to indicate both the difference from the prior commit as well as the current up-to-date state of the object.
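The exact patch format is still open, but if JSON Merge Patch (RFC 7396) were chosen, recomputing an object's current state from its linear history might look like the following sketch. The names `apply_merge_patch` and `materialize` are hypothetical:

```python
def apply_merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch to `target` and return the result."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target entirely
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # a null member removes the field
        else:
            result[key] = apply_merge_patch(result.get(key), value)
    return result

def materialize(history):
    """Fold a linear commit history into the current object state,
    taking full `state` snapshots as-is and applying `patch` commits
    on top of the accumulated state."""
    state = {}
    for commit in history:
        if "state" in commit:
            state = commit["state"]
        elif "patch" in commit:
            state = apply_merge_patch(state, commit["patch"])
    return state
```

In the shared-grocery-list example, two patch commits that add different items can both be applied in merge order without one overwriting the other, which is the advantage patch commits offer over `state`-only commits.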
> [name=Danny Strockis] D: I'm not sure I see the point of offering both the `state` and the `patch` methods of making a request. Can't we just pick one? It seems to me like the problem of conflict resolution between two commits is the same whether I use `state` or `patch`.
>
> A: Again, trying to give a simple and an advanced way to make changes. Advanced patch way makes it easier to resolve conflicting changes, like if two changes add different values.
>
> D: Worried about cases where apps try to use state & patch in combination/simultaneously. Also not sure the easier conflict resolution gained by using patch is worth the complexity of having two ways to do things.
> [name=Danny Strockis] How does a client become aware of a conflict? Do they need to scan previous revisions at all times?
>
> The hub could send a signal in responses like a flag if the most recent version references the same parent as another commit.
> [name=Danny Strockis] If a patch is submitted, does the hub do the auto-merge with previous commits to return the current object state on a GET request?
>
> Nope, because of encryption. This means if just one client uses patch to change an object, all clients should use patch to change/read the object. Unless we mandate that clients using patch must also submit state.
### Object deletion
An object is deleted by issuing a commit for that object with the `operation` field set to `delete`.
A deletion commit intentionally supersedes any other commit, which allows a Hub to be certain that the deletion is final and cannot be un-done by another commit. This allows a Hub to optionally reclaim space from deleted objects by saving only the final deletion commit and freeing any earlier commit data.
Apps which want to present an "Undo delete" option can accomplish this by re-creating an identical object which will be assigned a new ID.
> [name=Danny Strockis] Wouldn't we want to allow someone to un-delete an object with the same ID?
>
> Having a finalized delete state makes it easy to garbage collect, apps that care about undo can maybe do a soft delete.
### Garbage collection
Our proposal allows a Hub to intelligently discard old commit states as needed to accommodate storage limitations (e.g. in the case of a storage-limited free plan). Any such limitations should be clearly communicated to the user.
A commit is generally "safe" to delete as long as there exists an un-conflicted commit later in the linear history which contains the `state` field, since this `state`-based commit can be used as the base for further `patch`-only commits. In other words, a chain of `patch`-only commits should not be broken, as missing patches will make it impossible to recalculate the final state of the object.
Note that it is not possible to absolutely guarantee the safety of any deletion - there is always the possibility that a conflicting commit has been made in an offline Hub but has not been replicated yet. In practice, however, deletions may be achievable given a sufficient delay (e.g. 1-2 years) to ensure a high likelihood that no conflicts exist.
As an alternative to automatic garbage collection, a Hub could present the user with a list of potentially "safe" deletions based on this logic and request confirmation.
When garbage collecting old commits, the commit IDs should be retained ("tombstoned") in order to avoid re-adding the commit when replicating state from another Hub which may have a longer retention policy.
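The "safe to delete" heuristic and tombstoning described above might be sketched as follows. This is illustrative only (the names are hypothetical), and a real implementation must still account for unreplicated conflicting commits as noted:

```python
def discardable(history):
    """Return the indices of commits that are candidates for garbage
    collection: everything before the last commit carrying a full
    `state` snapshot, since that snapshot can serve as the base for
    any later `patch`-only commits."""
    last_snapshot = max(
        (i for i, c in enumerate(history) if "state" in c),
        default=None,
    )
    if last_snapshot is None:
        return set()  # a pure patch chain must not be broken
    return set(range(last_snapshot))

def tombstone(history, indices):
    """Replace discarded commits with rev-only tombstones so that
    replication from a peer with a longer retention policy does not
    re-add the deleted commit data."""
    return [
        {"rev": c["rev"], "tombstone": True} if i in indices else c
        for i, c in enumerate(history)
    ]
```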
### Identifiers
#### Commit identifiers
Each commit is identified by the hash of its canonical JSON form without the `rev` field.
Because equivalent JSON objects may differ in ordering and whitespace depending on the platform, a canonical representation is necessary to ensure a consistent hash. We propose using the rules outlined by the [json-stable-stringify](https://github.com/substack/json-stable-stringify) and [fast-json-stable-stringify](https://www.npmjs.com/package/fast-json-stable-stringify) libraries as they are widely used and simple to implement (~60 LOC).
#### Object identifiers
The choice of how to construct object identifiers in the system is an important one. Identifiers are ideally deterministic and verifiable.
We propose to use the revision hash (`rev`) of the initial commit of a new object as the object's ID. Since the revision identifier is based on the hash of the content, this ensures that the object ID is also deterministic and verifiable.
This introduces a slight chicken-and-egg problem, as commits are expected to contain an `object` field with the ID of the associated object, but the commit hash (and thus the object ID) is not known until the commit is fully constructed.
To solve this, we propose that, for a commit which creates a new object, the hash of the commit should be calculated without the value of the `object` field. Thus the steps to generate the ID for such a commit are:
1. Construct the commit without the `object` or `rev` fields
2. Hash the canonicalized commit to obtain the revision hash value
3. Add both the `rev` and `object` fields to the commit
Similarly, such a commit can be verified with the following steps:
1. Delete the `object` and `rev` fields from the commit object
2. Hash the canonicalized commit to obtain the revision hash value
3. Verify that both `object` and `rev` match the resulting hash
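The generation and verification steps above can be sketched in Python. Sorted-key JSON serialization stands in here for the full json-stable-stringify canonicalization rules, and the hash follows the SHA3-256 choice stated in the commit field table; the function names are hypothetical:

```python
import hashlib
import json

def canonicalize(commit):
    """Canonical JSON form: sorted keys, no insignificant whitespace
    (a stand-in for the json-stable-stringify rules)."""
    return json.dumps(commit, sort_keys=True, separators=(",", ":"))

def create_object_commit(commit):
    """Finalize a `create` commit: hash it without the `rev` and
    `object` fields, then set both fields to the resulting hash."""
    body = {k: v for k, v in commit.items() if k not in ("rev", "object")}
    rev = hashlib.sha3_256(canonicalize(body).encode()).hexdigest()
    return {**body, "rev": rev, "object": rev}

def verify_object_commit(commit):
    """Re-derive the hash of the commit without `rev`/`object` and
    check that it matches both fields."""
    body = {k: v for k, v in commit.items() if k not in ("rev", "object")}
    rev = hashlib.sha3_256(canonicalize(body).encode()).hexdigest()
    return commit.get("rev") == rev and commit.get("object") == rev
```

Any tampering with the commit body changes the derived hash, so verification fails for both the `rev` and `object` fields at once.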
An alternative approach would be to replace the `object` field in the initial commit with a special value (e.g. `new`) with the understanding that the object ID of such a commit is defined to be equal to the revision ID (`rev`) field.
> Vote for `new` or `0`
> [name=AF]
## Replication
Because our set of commits is generally append-only and each commit is immutable, the problem of replication is simplified.
Hubs need only ensure they collect the full set of available commits, which can be done by periodically contacting each peer Hub to identify any new commits which must be replicated. Once the full set of commits is obtained, the deterministic merge algorithm will ensure all Hubs converge to an identical state.
Our design proposes a simple log-based replication protocol for negotiating state between pairs, but also leaves open the possibility of expanding the replication system with more efficient protocols in the future.
### Log-based replication
In this approach, each Hub instance maintains an opaque state token string (a.k.a. [watermark](https://en.wikipedia.org/wiki/Watermark_(data_synchronization))) representing the current state of that Hub. A Hub should save an independent state token from each of its peers. During replication, if the Hub possesses a state token previously returned from a specific peer, the Hub can present the token back to that same peer in order to discover what commits have occurred on the peer since that time.
#### State tokens
We propose using opaque state tokens to offer the greatest flexibility to Hub implementers. External inspection of a state token is not guaranteed to reveal any information about the internal state of a Hub; the only supported action is to replay the token to the Hub which created it in order to identify changes since the original token was issued.
The simplest implementation of a state token on a single-machine Hub could be a monotonically increasing integer which is incremented each time a commit is seen.
For a complex cloud-based Hub provider spanning multiple regions and data centers, the coordination and synchronization requirements for a monotonic integer may be prohibitive. In this case, the opacity of the token allows the Hub implementer to choose a different mechanism (e.g. serializing a vector clock) which is more appropriate to their internal storage system.
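For the simple single-machine case, the opaque token might just encode the commit sequence number. A minimal sketch, assuming an in-memory append-only commit log (the names are hypothetical):

```python
import base64

def issue_token(sequence):
    """Encode the Hub's internal commit sequence number as an opaque
    token. Peers cannot (and should not) interpret its contents."""
    return base64.urlsafe_b64encode(str(sequence).encode()).decode()

def commits_since(token, log):
    """Replay a previously issued token against the append-only commit
    log, returning the commits first seen after the token was issued.
    A missing token means "from the beginning of time"."""
    start = int(base64.urlsafe_b64decode(token).decode()) if token else 0
    return log[start:]
```

A cloud provider could swap the integer payload for a serialized vector clock without any change visible to peers, which is the point of keeping the token opaque.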
An alternative approach would be to specify a well-known state token format to be shared by all Hub implementations. This has the advantage of reducing bandwidth and computation needs: if a peer presents a state token which is identical to a Hub's own token, the Hub can be assured that the peer's internal state is also identical. However, it is challenging to design a format that is both simple enough for a naive Hub implementation but also expressive enough to accommodate complex multi-region providers.
> [name=Danny Strockis] If I have this right, the order of the state token does not need to match the linear object history, right? So replication from a hub might return commits "out of order"?
>
> Correct.
> [name=Danny Strockis] Are state tokens specific to the filters used? For example, what if I'm using filters A & B, but then change my query and use filter A only. Should I start my sync over, or use my previous state token?
>
> No, for now let's say if you change your filters you should start the sync over from zero.
#### Filters
To support the case of Hubs which may only wish to replicate a subset of data, we also introduce the concept of "filters". A filter is a mechanism for a Hub to specify the objects or hierarchies of interest during replication.
The initial implementation of filters consists of a simple list of desired objects and/or areas:
```json
[
{
// All collections objects with the MusicPlaylist schema
"interface": "collections",
"schema": "schema.org/MusicPlaylist"
},
{
// A specific object
"object": "3a9de008f526d239d8990..."
},
// ...
]
```
All conditions within a given filter object must match, and at least one of the given filters must match for an object to be included.
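This AND-within-a-filter, OR-across-filters semantics can be sketched as (an illustrative sketch, assuming object metadata is available as a flat dict of fields):

```python
def matches(obj_meta, filters):
    """An object is included when every condition inside at least one
    filter object matches the object's metadata: conditions within a
    filter are ANDed, and the filters themselves are ORed."""
    return any(
        all(obj_meta.get(field) == expected for field, expected in f.items())
        for f in filters
    )
```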
:::warning
Open: Should the options for filter conditions match the permissions filters?
:::
It will also be possible to expand the supported filter types in the future. For example, a filter based on [Bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) could allow the space-efficient representation of a potentially large set of object IDs with an allowable false positive rate.
#### Replication flow
Replication occurs as a one-way process between two peers where new commits in the source are replicated to the target. A full two-way synchronization can be performed by initiating two one-way replication sessions.
The one-way nature of this protocol is desirable in certain situations, for example a Hub located on a mobile phone which needs to refresh state from a Hub in the cloud but does not yet want to upload its own updates due to bandwidth and battery constraints.
The protocol described here is a slightly simplified version of the [CouchDB replication protocol](http://docs.couchdb.org/en/stable/replication/protocol.html). It defines replication in terms of APIs available from each Hub which allows orchestration to be done by a third-party Replicator, assuming the third-party has sufficient permissions to access the necessary APIs in each Hub.
In practice the Replicator will likely reside alongside one of the Hubs participating in the replication flow. In this case, API calls that would be made to the local Hub may be replaced by internal calls that have the same effect.
The one-way replication flow from source to target follows several steps:
##### 1. Identify changed documents
A request for a list of commits is sent to the *source*. This request may optionally include:
* A state token previously received from the *source*, in which case the response should include only commits which *were first seen by the source* after the state token was generated.
* A list of `filters`, in which case the response should include only commits on objects which match the provided filter(s).
* A list of `fields` to return, to minimize the amount of metadata transferred.
An example request:
```json
{
"iss": "did:hub.id",
"aud": "did:user.id",
"@type": "Replication/Read",
"request": {
"fields": ["rev"],
"filters": [
{
"since": "aGM3ODlmcVokQDM3Yzk4bzNxNDdmaDk="
},
{
"interface": "collections",
"schema": "schema.org/MusicPlaylist"
}
]
}
}
```
The response should list commit IDs which match the specified request parameters, e.g.:
```json
{
"@type": "Replication/Response",
"response": {
"state": "ZHNmc2RmZGZzZGZzYWRmcedA2RzZmE=",
"commits": [
{"rev": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15"},
{"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593"},
{"rev": "7eb02e738ec6eb95542b6e0555c2231ad299ccda124e056ed850c88e"}
]
}
}
```
:::warning
TODO: Define pagination options (just update `since` field?) and possibly a continuous feed option.
:::
> [name=Danny Strockis] How does this Replication API adhere to authorization rules? Does it evaluate all permissions to determine which objects can/can't be included in this response?
>
> The thought was that this API/protocol was not meant for clients, only for user agents and hubs that have access to all your data. We could revisit this if we think clients need to partake in replication.
>
> How did the hub get permissions to read objects in the first place? Did it ask for consent to all objects somehow?
##### 2. Calculate revision difference
Any commit IDs already seen by the *target* should be subtracted from the list of commits returned by the *source*. The remaining difference represents the state in the *source* which must be replicated to the *target*.
This is done by sending a request to the target with the change feed details from the source:
```json
{
"iss": "did:hub.id",
"aud": "did:user.id",
"@type": "Replication/Diff",
"request": {
"commits": [
{"rev": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15"},
{"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593"},
{"rev": "7eb02e738ec6eb95542b6e0555c2231ad299ccda124e056ed850c88e"}
]
}
}
```
The target should respond with a list of missing commits which it would like to receive:
```json
{
"@type": "Replication/Response",
"response": {
"commits": [
{"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593"},
{"rev": "7eb02e738ec6eb95542b6e0555c2231ad299ccda124e056ed850c88e"}
]
}
}
```
##### 3. Replicate change data
After determining the set of missing commits, an additional request should be issued to retrieve the full details of the relevant commits:
```json
{
"iss": "did:hub.id",
"aud": "did:user.id",
"@type": "Replication/Read",
"request": {
"since": "aGM3ODlmcVokQDM3Yzk4bzNxNDdmaDk=",
"filters": [
{"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593"},
{"rev": "7eb02e738ec6eb95542b6e0555c2231ad299ccda124e056ed850c88e"}
]
}
}
```
With no `fields` parameter specified, the response should return the full details of each matching commit:
```json
{
"@type": "Replication/Response",
"response": {
"state": "ZHNmc2RmZGZzZGZzYWRmcedA2RzZmE=",
"commits": [
{
"rev": "fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593",
"committed_at": 1530309810,
"parents": ["3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15"],
"object": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15",
"state": {
// ...
},
"patch": {
// ...
}
},
{
"rev": "7eb02e738ec6eb95542b6e0555c2231ad299ccda124e056ed850c88e",
"committed_at": 1530309923,
"parents": ["fe4fd3240ff1c68a85730b3cf35742c8a241a0a70bd98e8869b4b593"],
"object": "3a9de008f526d239d89905e2203fa484f6e68dfc096a7c051eb80f15",
"state": {
// ...
},
"patch": {
// ...
}
}
]
}
}
```
Once all missing commits have been replicated to the *target*, the one-way replication process is complete.
:::warning
Open: Pagination is still an open question at the Hub protocol level. The pagination scheme chosen for the Hub overall should also apply to the `Replication` interface.
:::
### Sample replication flow
The following diagram illustrates a sample replication flow that might occur between a Hub on a user's mobile device (`Phone Hub`) and a Hub hosted by a cloud provider (`Cloud Hub`).
Initially, the user chooses to sync only music playlists to their `Phone Hub` :
```sequence
participant Phone Hub as PH
participant Cloud Hub as CH
PH -> CH: Request all changes for filter {"schema": "schema.org/MusicPlaylist"} since the beginning of time
CH -> PH: Return state token "abc123" and list of commit IDs
PH -> CH: Request full commit details for filter {"rev":"537ad1af"}
CH -> PH: Return full state for matching changes
```
The `Phone Hub` goes offline and then comes back online at a later date and wishes to discover any relevant changes since the last replication:
```sequence
participant Phone Hub as PH
participant Cloud Hub as CH
PH -> CH: Request all changes for filter {"schema": "schema.org/MusicPlaylist"} since state token "abc123"
CH -> PH: Return state token "def456" and list of commit IDs
PH -> CH: Request full commit details for filter {"rev":"7af33ed9"}
CH -> PH: Return full state for matching changes
```
Later, the user also chooses to synchronize their recipes collection to their phone:
```sequence
participant Phone Hub as PH
participant Cloud Hub as CH
PH -> CH: Request all changes for filter {"schema": "schema.org/Recipe"} since the beginning of time
CH -> PH: Return state token "ghi789" and list of commit IDs
PH -> CH: Request full commit details for filter {"rev":"8ae4e543"}
CH -> PH: Return full state for matching changes
```
> [name=Danny Strockis] Note that for client apps to partake in replication, we'll likely have to build SDKs.
> [name=Danny Strockis] So I understand that this replication protocol is meant to work as a two-way sync between mobile clients and servers, where neither is the "master". But I only see a pull-style replication flow here. How does the phone sync its changes to the hub? Right now it seems like the hub(s) is/are the master instance(s), and the phone pushes changes to the server hubs before other instances can become aware of them. Does this really achieve the censorship protection goals you had?
>
> Can add couchDB style push changes into spec.
> [name=Danny Strockis] On a related note, how do clients/other hubs figure out which hub to contact to sync/get data, or to write a change? Will they always just pick the first in the DDO? Doesn't that first hub then retain censorship power?
>
> Similarly, if one hub sends back a 404 not found for a document, how does the client know it should check a different hub for the same document? Do clients need to always partake in replication and sync changes from multiple hubs to protect against censorship? How many hubs?
### Future replication enhancements
#### Continuous log-based replication
A future enhancement could add support for continuous log-based replication based on keepalive connections in the `Replication/Read` request.
Rather than closing the connection after processing new commits, the connection would be kept open by periodic heartbeats, and further commits could be sent as soon as they are available.
This would require modification of the Hub request/response protocol to support streaming responses where each chunk of data is encrypted as a separate blob.
#### Additional replication protocols
The log-based replication system is a reasonably simple approach that has been used with success in other systems like CouchDB. However, if Hub usage grows significantly, it may make sense to investigate the possibility of adding more efficient but more complex replication options.
As an example, [Invertible Bloom Filters](https://www.ics.uci.edu/~eppstein/pubs/EppGooUye-SIGCOMM-11.pdf) containing commit IDs could be used to efficiently identify commits missing from the *target*, after which the standard replication protocol could be used to transfer the necessary commit data.