# Michelson combs in 008
In Edo (protocol 008), values of type `pair` are allowed to have more than two arguments.
This was done to make record handling more efficient: nested structures can be read and written at linear cost, and code size is reduced.
### Micheline syntax
Several Michelson domain types can have different representations: for example, `address` can be a `string` (readable) or `bytes` (optimised), and `timestamp` can be either an `int` (optimised) or a `string` (readable).
The same applies to the extended `pair`. In the readable form one should use:
```
Pair a b c d
```
The optimised form depends on the number of arguments:
```
Pair a b # if n == 2
Pair a (Pair b c) # if n == 3
{ a; b; c; d; } # if n >= 4
```
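The rule above can be sketched in Python over Micheline JSON nodes. This is a minimal illustration, not code from any Tezos library: the function name `optimised_pair` is hypothetical, and the arguments are assumed to be already-encoded Micheline JSON values.

```python
def optimised_pair(args):
    """Render a comb of n >= 2 Micheline JSON nodes in the optimised form.

    Hypothetical helper for illustration only.
    """
    n = len(args)
    if n < 2:
        raise ValueError("a pair needs at least two arguments")
    if n == 2:
        # Pair a b
        return {"prim": "Pair", "args": list(args)}
    if n == 3:
        # Pair a (Pair b c)
        a, b, c = args
        return {"prim": "Pair", "args": [a, {"prim": "Pair", "args": [b, c]}]}
    # { a; b; c; d; ... } is a plain JSON array in Micheline JSON
    return list(args)
```

Note that for four or more arguments the result is a bare JSON array, not an object with a `prim` field, which is exactly the layout difference that makes parsing harder for consumers.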
Separately, in order to produce backward-compatible Big_map key hashes one should use:
```
Pair a (Pair b (Pair c d))
```
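Folding a flat argument list into this fully nested form is a right fold. A minimal Python sketch over Micheline JSON nodes follows; the name `nested_pair` is hypothetical and only serves the illustration.

```python
def nested_pair(args):
    """Fold [a, b, c, d] into Pair a (Pair b (Pair c d)) as Micheline JSON.

    Hypothetical helper for illustration only.
    """
    if len(args) < 2:
        raise ValueError("a pair needs at least two arguments")
    node = args[-1]
    # Walk from the second-to-last argument back to the first,
    # wrapping the accumulated tail in a binary Pair each time.
    for item in reversed(args[:-1]):
        node = {"prim": "Pair", "args": [item, node]}
    return node
```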
Lastly, you can use any form when injecting operations (although you won't see that form in the RPC output), for example:
```
{a; b}
Pair a b c
```
### Binary representation
~~Binary representation remains the same, e.g. the following values have equal byte forms:~~
```
Pair a b c d
Pair a (Pair b (Pair c d))
```
However, when unpacking you have to choose one of the representations listed above, depending on the context and number of args.
## Rationale
Michelson combs are "supposed" to be a temporary solution, an intermediate step to native records.
This step is needed for several reasons:
1. To reduce gas and storage consumption right now
2. To get compilers ready for native records
3. To ensure safe transition from pairs to records in the next update
> <speculation>
>
> It's possible that pairs will be eliminated in favour of native records, which means that all existing contracts will be altered during a context migration (like it was in Babylon with spendable/delegatable contracts).
> </speculation>
### Overloading `pair` vs introducing new type
The only argument so far is that introducing a new primitive is undesirable because the total number of primitives is limited to 255, of which 140 are already in use. The limit can be raised, at the cost of altering all decoders and encoders.
## Consequences
Although combs are supposedly a temporary solution we will still have to live with them for some time and all the tooling has to be updated in accordance with the new spec.
Several comments regarding the design choice and consequences for tooling developers are listed below.
### Overloading `pair` type
1. **Overloading means altering existing behavior**
Unlike introducing a new type, where developers only have to add a new handler, overloading breaks existing logic and leads to ad-hoc patches, complexity, and eventually technical debt.
2. **Overloading comes with exceptions**
Attempting to fit new functionality into an existing primitive leads to limitations and exceptional cases that have to be handled.
In our case, we get two different rules for pairs of length 2 and of length 3+, different manipulation instructions, and implicit interoperability between old-style pairs and new combs (`pair a b c d` and `pair a (pair b (pair c d))` are interoperable).
3. **Overloading is implicit itself**
Lastly, the word "pair" gives no hint of a potentially arbitrary number of arguments, which can be very misleading to anyone new to Tezos who wants to build on top of Michelson.
### Multiple Micheline representations
The main problem with multiple data representations is that you always have to expect any of the possible options, which can lead to very complex code, especially in languages that are not as expressive as OCaml (which is mostly the case for client libraries, indexers, and other RPC feed consumers).
This is a major issue for JSON parsers in particular. For `timestamp`, for example, you can receive either `{"string": "2020-12-04T00:00:00Z"}` or `{"int": "123456789"}`, which is not good but workable. However, when the variants are `{"prim": "Pair", "args": [{}, {}, {}, {}]}` vs `[{}, {}, {}, {}]` (a completely different layout), it becomes very difficult to handle with standard parsers in commonly used languages.
Currently there is no consistency in which form (readable/optimised) the data appears in various places.
For instance, you can deploy a contract with an optimised initial storage, but it is not guaranteed that you will receive the same form via RPC when querying the `origination` operation, an external `transaction`, an internal `transaction` (inter-contract call), or the contract `storage`.
Separately, different clients can deploy contracts and inject calls with args in either readable or optimised form, so when you parse operations spawned by various clients you cannot expect a single data representation (heterogeneity).
This issue has also been described in https://gitlab.com/tezos/tezos/-/issues/843
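To illustrate what a consumer has to do today, here is a minimal Python sketch that accepts any of the comb layouts discussed above and returns a flat list of components. The name `unfold_comb` is hypothetical. The sketch assumes the caller already knows the expected number of components from the type, because a bare JSON array cannot be told apart from a Michelson list without type information.

```python
def unfold_comb(node, arity):
    """Return the `arity` components of a right comb, whichever form it arrives in.

    Accepts {"prim": "Pair", "args": [...]} with 2+ args, a bare JSON array,
    or any nesting of the two along the right spine.
    Hypothetical helper for illustration only.
    """
    if arity < 2:
        raise ValueError("a comb has at least two components")
    items = []
    while True:
        remaining = arity - len(items)
        if isinstance(node, list):
            args = node
        elif isinstance(node, dict) and node.get("prim") == "Pair":
            args = node.get("args", [])
        else:
            raise ValueError("expected a Pair or a sequence")
        if len(args) < 2 or len(args) > remaining:
            raise ValueError("unexpected number of arguments")
        if len(args) == remaining:
            items.extend(args)
            return items
        # Fewer args than still expected: the last arg is the tail of the comb.
        items.extend(args[:-1])
        node = args[-1]
```

Because the function stops as soon as `arity` components are collected, a component that is itself a pair (for example the left side of `Pair (Pair x y) z` with arity 2) is returned intact and not flattened.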
#### What if I don't use `pair a b c d +` in my contract?
As mentioned above, 008 combs and nested pairs with comb layout are treated as the same type internally (and also have the same binary representation). Thus even if your contract accepts, for example, `pair a (pair b (pair c d))` as a parameter, anyone (e.g. using an alternative client) can craft a transaction that uses the new combs and sends `Pair A B C D`, which in turn will appear as either `{"prim": "Pair", "args": [A, B, C, D]}` or `[A, B, C, D]` in the RPC response and will break your parsing logic.
Example:
1. Origination operation (note the storage type in the script) https://rpc.tzkt.io/edonet/chains/main/blocks/21551/operations/3
2. Big map value https://rpc.tzkt.io/edonet/chains/main/blocks/head/context/big_maps/63/expru4T4AdfAxaa4ZFJMWkMssihhZU3GtQKvd1e2DexDqR7CPtYT8K
## Possible solutions
### Unparsing mode in RPC
The cheapest solution that would resolve the main issue with multiple Micheline representations is to let the RPC consumer decide which form (readable or optimised) they want to receive. That would introduce consistency and shield RPC users from the protocol complexity.
Specifically it could be an optional query parameter in the url, e.g.:
```
GET /chains/main/blocks/187/operations/3?mode=optimised
GET /chains/main/blocks/head/context/contracts/KT1A4mDg25qvC3tcwyNFy87kcCpLNyfvop2M/storage?mode=readable
```
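From the client side, the proposal amounts to appending one optional query parameter. A minimal Python sketch follows; the `mode` parameter is the one proposed in this section and is not assumed to exist in the node, and the helper name `rpc_url` and the node address are illustrative.

```python
from urllib.parse import urlencode

def rpc_url(node, path, mode=None):
    """Build an RPC URL, optionally requesting a specific unparsing mode.

    `mode` is the query parameter proposed above (hypothetical).
    """
    if mode not in (None, "readable", "optimised"):
        raise ValueError("mode must be 'readable' or 'optimised'")
    url = node.rstrip("/") + "/" + path.lstrip("/")
    if mode is not None:
        url += "?" + urlencode({"mode": mode})
    return url

# Example with an illustrative local node address:
print(rpc_url("http://localhost:8732", "/chains/main/blocks/187/operations/3", "optimised"))
```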
Basically, it means resolving ambiguity at a single place (tezos-node) instead of having multiple implementations across the ecosystem libraries.
Linked issues:
- https://gitlab.com/tezos/tezos/-/issues/1016
- https://gitlab.com/tezos/tezos/-/issues/1020
### Fully-fledged records
A more radical path is to abandon Michelson combs in favour of fully-fledged records. Additionally, it seems that a gentler approach would be to deprecate the `pair` type instead of enforcing a context migration, because changing existing smart contracts should not be considered good practice.
This approach, however, cancels all the work done by core developers and by those who already support the change (smartpy, ligo, tzstats). In addition, the broad audience will likely value lower gas and storage costs more than the elimination of technical debt.
## Conclusions
An opinion that is luckily shared by the majority of the developer community is that the process of making design choices needs to change, at least for the Michelson language and the RPC interface on top of which tools and apps are built.
It could be an RFC-based approach (currently explored by the LIGO team), a TZIP-based one (similar to EIPs for the EVM), or something else. Ideally, it should take into account the opinions of representatives of the developers working on the core, compilers, indexers, client libraries, dapps, and other tools and applications.
## Appendix
### List of RPC endpoints requiring "unparsing mode" param
```
GET /chains/{}/mempool/pending_operations
GET /chains/{}/blocks/{}
GET /chains/{}/blocks/{}/operations
GET /chains/{}/blocks/{}/operations/3
GET /chains/{}/blocks/{}/operations/3/{}
GET /chains/{}/blocks/{}/context/contracts/{}
POST /chains/{}/blocks/{}/context/contracts/{}/big_map_get
GET /chains/{}/blocks/{}/context/contracts/{}/script
GET /chains/{}/blocks/{}/context/contracts/{}/storage
GET /chains/{}/blocks/{}/context/big_maps/{}/{}
POST /chains/{}/blocks/{}/helpers/parse/operations
POST /chains/{}/blocks/{}/helpers/preapply/operations
POST /chains/{}/blocks/{}/helpers/scripts/run_code
POST /chains/{}/blocks/{}/helpers/scripts/run_operation
POST /chains/{}/blocks/{}/helpers/scripts/trace_code
```