Parametrized multihashes
=========================
This proposal covers hash functions whose output depends on a set of parameters. Examples are BLAKE2, Skein and Poseidon. While it is possible to put all 244 variants of Skein into the multicodec table, that is not practical for e.g. Poseidon, which has a much larger parameter space. One option would be to add only the parameter sets that people actually use.
But even this is a problem, as adding entries to the multicodec table carries some overhead. The entries in the table are currently curated and people prefer short identifiers, so adding a new hash function takes some coordination effort. After that, implementations need to be updated.
Ideally you'd only need to go through this process once per hash function family, and could then choose any parameters you like. The important property of a multihash is that it is identifiable and that users can verify a hash based solely on the information the multihash provides.
This proposal keeps those properties and retrofits them into the current multihash system.
Hashing parameters
------------------
As the parameter space may be large, we define it in a structured way, hash that information and use that as an identifier. Those identifiers may even collide, as an application is not expected to support all possible hash functions, with all possible parameters. If it turns out that a collision is troublesome, some salt could be added.
For structured data, JSON is a natural choice. In this ecosystem that means DAG-JSON, which is valid JSON with some additional constraints that make it better suited for content-addressed data. To keep things simple, a subset is used which is trivially both valid JSON and valid DAG-JSON. All of JSON is allowed, with these restrictions:
- Keys must be sorted lexicographically. As they are UTF-8 strings, this means sorting by their bytes.
- Keys that contain only a slash (`/`) are forbidden.
- Whitespace outside strings must be removed.
- Numbers must be without exponent.
For Poseidon, such a parameter description could look like this:
```json
{"arity":2,"curve":"bls12-381","rounds_full":8,"rounds_partial":55,"sbox":5,"security_level":128}
```
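One way to produce such a string is sketched below in Python. The function name `canonical_params` is hypothetical, and rejecting all floating-point values is a simplifying assumption that is stricter than the rule above (it guarantees no exponent can appear). Python sorts strings by code point, which gives the same order as sorting their UTF-8 bytes.

```python
import json


def _check(value):
    """Reject values that violate the restrictions listed above."""
    if isinstance(value, float):
        # Assumption: floats are refused outright, since their textual
        # form may contain an exponent (e.g. 1e+16).
        raise ValueError("floating-point numbers are not supported in this sketch")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise ValueError("keys must be strings")
            if key == "/":
                raise ValueError('a key consisting only of "/" is forbidden')
            _check(item)
    elif isinstance(value, list):
        for item in value:
            _check(item)


def canonical_params(params: dict) -> bytes:
    """Serialize parameters with sorted keys and no extra whitespace."""
    _check(params)
    text = json.dumps(
        params,
        sort_keys=True,          # code point order == UTF-8 byte order
        separators=(",", ":"),   # no whitespace outside strings
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return text.encode("utf-8")


poseidon = {"sbox": 5, "curve": "bls12-381", "arity": 2,
            "rounds_partial": 55, "rounds_full": 8, "security_level": 128}
print(canonical_params(poseidon).decode("utf-8"))
```

The key order of the input dictionary does not matter; the output is always the sorted, whitespace-free form.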
Those parameters (the bytes of the string itself) are then hashed. Since collisions are acceptable, as outlined above, a non-cryptographic function like xxHash is sufficient. The xxHash32 of the parameters is `c2eba2dc`.
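For readers without an xxHash library at hand, the following is a minimal pure-Python sketch of xxHash32. The seed is assumed to be 0, as the text does not specify one, and the result has not been checked against the value quoted above. How the 32-bit result maps to four bytes (here: big-endian hex) is likewise an assumption.

```python
import struct

PRIME1 = 0x9E3779B1
PRIME2 = 0x85EBCA77
PRIME3 = 0xC2B2AE3D
PRIME4 = 0x27D4EB2F
PRIME5 = 0x165667B1
MASK = 0xFFFFFFFF


def _rotl(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc: int, lane: int) -> int:
    """Mix one 32-bit little-endian lane into an accumulator."""
    acc = (acc + lane * PRIME2) & MASK
    return (_rotl(acc, 13) * PRIME1) & MASK


def xxh32(data: bytes, seed: int = 0) -> int:
    n = len(data)
    pos = 0
    if n >= 16:
        # Four accumulators consume the input in 16-byte stripes.
        v1 = (seed + PRIME1 + PRIME2) & MASK
        v2 = (seed + PRIME2) & MASK
        v3 = seed & MASK
        v4 = (seed - PRIME1) & MASK
        while pos + 16 <= n:
            l1, l2, l3, l4 = struct.unpack_from("<4I", data, pos)
            v1 = _round(v1, l1)
            v2 = _round(v2, l2)
            v3 = _round(v3, l3)
            v4 = _round(v4, l4)
            pos += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + PRIME5) & MASK
    h = (h + n) & MASK
    # Remaining full 4-byte words.
    while pos + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, pos)
        h = (_rotl((h + lane * PRIME3) & MASK, 17) * PRIME4) & MASK
        pos += 4
    # Remaining single bytes.
    while pos < n:
        h = (_rotl((h + data[pos] * PRIME5) & MASK, 11) * PRIME1) & MASK
        pos += 1
    # Final avalanche.
    h ^= h >> 15
    h = (h * PRIME2) & MASK
    h ^= h >> 13
    h = (h * PRIME3) & MASK
    h ^= h >> 16
    return h
```

The four identifier bytes could then be obtained with `format(xxh32(param_bytes), "08x")`. As a sanity check, the empty input with seed 0 hashes to `02cc5d05`.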
### Global parameters
There are also parameters that can be applied to any hash. Those are:
- `salt`: an arbitrary number so that you can get a different hash if you need it
- `truncate`: number of bits a hash should be truncated to
Make it self-describable
------------------------
To make the hash self-describable, both the hash family and its parameters are needed. Keeping the hash family outside the parameters makes it easy for applications to tell early on whether they can verify the hash or not.
For every new hash family, the multicodec table will get only a single entry, e.g. `poseidon`, which would then also describe the possible parameters. Applications would then define their parameters and hash those. Common parameter sets will probably be published in some list for reference.
So the information that needs to be stored is:
```
<parametrized-hash-identifier><hash-family><hash-of-parameters><actual-hash>
```
Retrofit into Multihash
-----------------------
Current multihashes look like this:
```
<varint hash function code><varint digest size in bytes><hash function output>
```
To make it work with current multihash implementations, it will be retrofitted into the existing system. A parametrized hash looks like this:
```
<varint of parametrized-hash-identifier><varint of the rest of the bytes><varint hash family><32-bit xxhash of the parameters><hash function output>
```
The size of the actual hash output doesn't need to be specified, as it can be derived from the parameters.
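The layout above can be taken apart as in the following Python sketch. The names `ParametrizedMultihash` and `decode_parametrized_multihash` are hypothetical, and unsigned LEB128-style varints are assumed, as used elsewhere in multiformats. An existing multihash implementation would only read the first two varints and treat everything after them as an opaque digest.

```python
from typing import NamedTuple, Tuple


class ParametrizedMultihash(NamedTuple):
    identifier: int      # code marking this as a parametrized hash
    family: int          # code of the hash family
    params_hash: bytes   # 4-byte xxHash32 of the parameters
    digest: bytes        # hash function output


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Read an unsigned varint; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_parametrized_multihash(buf: bytes) -> ParametrizedMultihash:
    # Outer layer: identical to a regular multihash (code, size, payload).
    identifier, pos = decode_varint(buf)
    size, pos = decode_varint(buf, pos)
    rest = buf[pos:pos + size]
    if len(rest) != size:
        raise ValueError("truncated multihash")
    # Inner layer: hash family, hash of the parameters, actual digest.
    family, inner = decode_varint(rest)
    params_hash = rest[inner:inner + 4]
    if len(params_hash) != 4:
        raise ValueError("missing parameter hash")
    digest = rest[inner + 4:]
    return ParametrizedMultihash(identifier, family, params_hash, digest)
```

A real implementation would additionally look up the parameters by `params_hash` and check that the digest length matches what those parameters imply.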
### Example
In this example, we use 0x300003 for the parametrized hash identifier and 0x345678 for the Poseidon hash family.
So for Poseidon with the parameters as used in the example above, it will look like:
`<varint of 0x300003><varint of 40><varint of 0x345678><c2eba2dc><bytes of the Poseidon hash output>`
And as a concrete example in bytes (represented as hex values; the length 40 is `28` in hex):
`8380c001 28 f8acd101 c2eba2dc 766d7869736d796e616d657468617466616b65736173686132323536686173`
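The prefix of this example can be reproduced with a short Python sketch. The function names are hypothetical, unsigned LEB128-style varints are assumed, and a 32-byte all-zero placeholder stands in for the Poseidon output. The remaining length of 40 bytes (4 for the family varint, 4 for the parameter hash, 32 for the digest) is encoded as the single byte `0x28`.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_parametrized_multihash(identifier: int, family: int,
                                  params_hash: bytes, digest: bytes) -> bytes:
    """Assemble <identifier><length of rest><family><params hash><digest>."""
    if len(params_hash) != 4:
        raise ValueError("parameter hash must be 4 bytes")
    rest = encode_varint(family) + params_hash + digest
    return encode_varint(identifier) + encode_varint(len(rest)) + rest


placeholder_digest = bytes(32)  # assumption: stand-in for a real 32-byte digest
mh = encode_parametrized_multihash(
    0x300003, 0x345678, bytes.fromhex("c2eba2dc"), placeholder_digest
)
print(mh[:4].hex(), mh[4:5].hex(), mh[5:9].hex(), mh[9:13].hex())
# prints: 8380c001 28 f8acd101 c2eba2dc
```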