Note @vmx: Thinking about it again, the whole proposal isn't really different from just using two CIDs, one for the context, one for the content. There was another idea floating around about generalizing the idea of multicodec code + bytes; I'll try to find some time to write that down as well.
This proposal adds another layer on top of CIDs to describe application-specific context. This can range from semantic information about the data to auxiliary data that is needed for traversal. It can be considered a fat pointer.
The Application Context tries to solve several problems.
The Application Context is arbitrary structured data that describes some application-specific information. It could range from a simple identifier that attaches some semantic meaning to a full WASM module including its interface definitions. It is expected that common schemas for those contexts will arise.
The Application Context is then encoded as:
    <application-context-code><hash><cid>
application-context-code is a Multicodec code.
hash is the hash of the structured data, which is encoded as DAG-CBOR and hashed with SHA2-256.
cid is a CIDv1 that points to the content.

There are two slightly different cases, termed dynamic and static invocation.
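Both cases rely on the same building blocks: the SHA2-256 hash of the context and the CID of the content. The following is a minimal sketch of assembling the `<application-context-code><hash><cid>` bytes, not a normative encoding. It assumes that the code is written as an unsigned varint (as Multicodec codes usually are), that the hash is the raw 32-byte digest rather than a Multihash, and that the context has already been encoded as DAG-CBOR. The value of `APPLICATION_CONTEXT_CODE` and all function names are hypothetical; the proposal does not assign a code.

```python
import hashlib

# Hypothetical placeholder value: the proposal does not assign a Multicodec code.
APPLICATION_CONTEXT_CODE = 0x300000


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def context_hash(encoded_context: bytes) -> bytes:
    """SHA2-256 digest of a context that is assumed to be DAG-CBOR already."""
    return hashlib.sha256(encoded_context).digest()


def encode_application_context(encoded_context: bytes, cid: bytes) -> bytes:
    """Concatenate code, context hash and binary CIDv1 (assumed layout)."""
    return (
        encode_varint(APPLICATION_CONTEXT_CODE)
        + context_hash(encoded_context)
        + cid
    )


# Example inputs: the DAG-CBOR bytes of {"a": 1} and a CIDv1 (raw codec,
# SHA2-256 multihash) for the content b"hello".
example_context = bytes([0xA1, 0x61, 0x61, 0x01])
example_cid = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello").digest()
fat_pointer = encode_application_context(example_context, example_cid)
```

Under these assumptions the fixed-size hash makes it possible to split the bytes again without a length prefix: read the varint, take the next 32 bytes as the hash, and treat the remainder as the CID.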
The dynamic invocation is when the data is interpreted in different ways at run time. The UnixFS pathing use case from above is a good example. You have fixed data stored, but you want to be able to traverse it in different ways. The static invocation is used, for example, in the IPLDVM use case from above, where you want to store the executable code directly with the data.
A gateway implementation of the dynamic use case would introduce a new endpoint called context. That already makes it clear that we are requesting an application context, hence we don't need to supply the Application Context code. It takes two parameters that are represented like a path. The first one is a Multibase-encoded version of the SHA-256 hash of the context, the second one is a Multibase-encoded CIDv1. This means the endpoint would be /context/{multibase-encoded-sha-256-hash}/{multibase-encoded-cidv1}.
The reason to keep the context and the CID separate, as opposed to Multibase-encoding them together, is to make caching easier. If the application context is a commonly used one, then the gateway can easily cache the context and use the Multibase-encoded hash as an identifier. There could then be several requests with different data (CIDs).
It is represented as a path as the context has a single argument, which is the CID.
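The endpoint path described above can be sketched as follows. This is an illustration only: it assumes the base32 Multibase encoding (prefix `b`, lowercase, no padding), which the proposal does not mandate, and the function names are hypothetical.

```python
import base64
import hashlib


def multibase_base32(data: bytes) -> str:
    """Multibase base32: prefix 'b', lowercase RFC 4648 alphabet, no padding."""
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


def context_path(encoded_context: bytes, cid: bytes) -> str:
    """Build the gateway path from the DAG-CBOR context bytes and a binary CIDv1."""
    digest = hashlib.sha256(encoded_context).digest()
    return f"/context/{multibase_base32(digest)}/{multibase_base32(cid)}"


# Example inputs: the DAG-CBOR bytes of {"a": 1} and a CIDv1 (raw codec,
# SHA2-256 multihash) for the content b"hello".
example_context = bytes([0xA1, 0x61, 0x61, 0x01])
example_cid = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello").digest()
example_path = context_path(example_context, example_cid)
```

Because the first path segment depends only on the context, two requests that use the same context but different CIDs share that segment, which is what lets a gateway use it as a cache key.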
The static invocation is for storing the context directly in the DAG. Again, it should not be part of a link itself, but rather provide additional information. This could be things like decryption keys. The application can decide on how to store it within the IPLD data model. Likely it would be a two-element tuple, where the first element is the hash of the structured data describing the application context. The second element is then the CID.
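As a rough illustration of the two-element tuple, the sketch below models it as a Python `NamedTuple` and resolves the context from a local mapping keyed by hash. The names `ContextLink`, `make_context_link` and `resolve_context`, as well as the idea of a local mapping, are assumptions made for this example; the proposal leaves the storage layout to the application.

```python
import hashlib
from typing import Dict, NamedTuple


class ContextLink(NamedTuple):
    """Hypothetical two-element tuple stored in the DAG next to the data."""

    context_hash: bytes  # SHA2-256 of the DAG-CBOR encoded context
    cid: bytes  # binary CIDv1 of the content


def make_context_link(encoded_context: bytes, cid: bytes) -> ContextLink:
    """Pair the hash of an already encoded context with the content CID."""
    return ContextLink(hashlib.sha256(encoded_context).digest(), cid)


def resolve_context(link: ContextLink, known_contexts: Dict[bytes, bytes]) -> bytes:
    """Look up the encoded context by its hash and verify that it matches."""
    encoded = known_contexts[link.context_hash]
    if hashlib.sha256(encoded).digest() != link.context_hash:
        raise ValueError("stored context does not match its hash")
    return encoded


# Example inputs: the DAG-CBOR bytes of {"a": 1} and a CIDv1 (raw codec,
# SHA2-256 multihash) for the content b"hello".
example_context = bytes([0xA1, 0x61, 0x61, 0x01])
example_cid = bytes([0x01, 0x55, 0x12, 0x20]) + hashlib.sha256(b"hello").digest()
link = make_context_link(example_context, example_cid)
contexts = {link.context_hash: example_context}
```

Since the context is content-addressed, the lookup can verify what it finds before handing it to the application.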
In the Multiformats/IPLD world, three layers can be identified that serve different purposes:
The missing piece is the computation/semantics layer, which this proposal tries to fill.
Introducing a new concept can be costly. The decision to not extend current CIDs (creating a CIDv2) was made due to several reasons:
It will enable new use cases for the IPLD world. On the Multicodec repository, there are several issues where people wanted to bend CIDs for their needs. This proposal makes many of those cases possible.
It's an optional new feature that gateways are free to implement. It doesn't have any compatibility issues.
The application context itself is just content-addressed data; the security implications depend on the individual context. Not all gateways will implement all contexts.
The CIDv2 - Tagged Pointers IPIP is similar, but extends the CID itself.
One could do tricks with inline CIDs and put some data there. Inline CIDs are highly problematic as they lead to many special cases. Ideally they are not used at all, hence using them for new cases is not a good idea.
Copyright and related rights waived via CC0.
Most of those ideas were discussed at the LabWeek event in Lisbon 2022. I'd like to thank everyone who has contributed to all this, especially: