# ZIP-10: Combinatorial Markets
This bulletin details the theory and design behind Forecasting Technologies' implementation of [Referendum #502](https://polkadot.polkassembly.io/referenda/502).
Most of this research is based on Gnosis' [documentation](https://docs.gnosis.io/conditionaltokens/) and [implementation](https://github.com/gnosis/conditional-tokens-contracts). Our implementation will closely follow theirs to ensure testability. All files will therefore include a Gnosis copyright notice and LGPL license note.
## ZIP-10.1: Theory of Combinatorial Pools and Betting
### Payout Vectors
As of version `<0.6.0`, when a market resolves, it resolves to a particular outcome (in case of categorical markets) or a number (in case of scalar markets). For various reasons, it's easier to think of a resolved market having a _payout_ vector though.
The payout vector of a resolved market with outcomes $x_1, \ldots, x_k$ is a vector $p = p(x_1, \ldots, x_k)$ of $k$ non-negative numbers with $\sum_{i=1}^k p_i = 1$. Redeeming one unit of the $i^{\mathrm{th}}$ outcome token yields $p_i$ units of collateral.
It's also convenient to denote the payout of an outcome $x$ by $p(x)$.
### Combinatorial Markets
For prediction markets $M_1, \ldots, M_n$ with outcome sets $O(M_1), \ldots, O(M_n)$, and for outcomes $x_i \in O(M_i)$, we denote by $(x_1, \ldots, x_n \in O(M_1) \times \cdots \times O(M_n)$ or $x_1 \land \cdots \land x_n$ the outcome is _true_ if and only if $x_1, \ldots, x_n$ are all true, or, if scalar markets are included, the outcome which has the payoff $p(x_1 \land \cdots \land x_n) = \prod_i p(x_i)$ where $p(x_i)$ is the payoff of $x_i$.
The _combinatorial market_ $M_1 \times \cdots \times M_n$ derived from $M_1, \ldots, M_n$ can be thought of as a market whose outcome tokens are $x_1 \land \cdots \land x_n$ for $x_i \in O(M_i)$.
But there are other types of combinatorial tokens. We can also introduce conjunctions for outcomes of the same market: If $x^1, \ldots, x^k$ are outcomes from the same market, then the conjunction $x^1 \lor \cdots \lor x^k$ is _true_ if and only if at least one of the outcomes $x^1, \ldots, x^k$ is true, or, if scalar markets are included, the outcome which has the payoff $p(x^1 \lor \cdots \lor x^k) = \sum_i p(x_k)$.
Putting it all together, if $x_i^1, \ldots, x_1^{k_i}$ are outcomes of $M_i$ for all $i = 1, \ldots, n$, then the outcome
$$
\bigwedge_{i=1}^n \bigvee_{j=1}^{k_i} x_i^j = (x_1^1 \lor \cdots \lor x_1^{k_1}) \land \cdots \land (x_n^1 \lor \cdots \lor x_n^{k_n})
$$
has the payout
$$
\prod_{i=1}^n \sum_{j=1}^{k_i} p(x_i^j).
$$
However, in Zeitgeist's implementation, where the resolution of markets and the trading of markets is completely separate, it's better not to think of the markets, but rather the pools as having been combined. In other words, a _combinatorial pool_ for the markets $M_1, \ldots, M_n$ is just a pool that contains, for all $x_i \in O(M_i)$, the outcomes $x_1 \land \ldots \land x_n$. Such a pool facilitates trading of outcomes contingent on other outcomes, but allows Zeitgeist to retain the simple resolution and dispute logic of separate categorical or scalar markets.
### Combinatorial Betting
Buying individual outcomes of a combinatorial market allows the user to bet on more specialzied outcomes (A _and_ B will happen), but the true strength of combinatorial markets lies is making bets on contingencies (if A occurrs, then so will B).
The following example is taken from [H13]. Let $D$ denote the event that the Democratic Party wins the 2016 U.S. Presidential election and let $H$ denote the event that Hillary Clinton is nominated as the Dem's candidate. The pairs of securities $(D, \neg D)$ (i.e. "Pays 1\$ if $D$" and "Pays 1\$ if not $D$") and $(H, \neg H)$ are traded on separate markets, but can be combined into a single market by using _splits_.
If you know how buying complete sets works on Zeitgeist, you already know one example of a split: The collateral token is split into a set of tokens which, when the markets resolves, will have the same value as the collateral token. By the same reasoning fashion, we can split non-collateral tokens like "Pays 1\$ if $D$" into "Pays 1\$ if $D$ and $H$" and $Pays 1\$ if $D$ and not $H$". For the sake of simplicity, we denote these tokens by $D$, $D \land H$ and $D \land \bar H$, resp. By combining the two markets above, we create a new market with four assets: $D \land H$, $D \land \bar H$, $\bar D \land H$ and $\bar D \land \bar H$.
Using splits, agents can now bet on correlations between the events $D$ and $H$. For example, to bet that the Dems win if Hillary is nominated, the agent would split their collateral into $x$ units of $H$ and $\bar H$, then split these units into $x$ units of $H \land D$ and $H \land \bar D$, then sell their units of $H \land \bar D$ for more $H \land D$. There are three possible outcomes:
- Hillary is nominated and wins the election. Each unit of $H \land D$ pays 1\$. The agent makes a profit as they own more than $x$ units.
- Hillary is nominated and loses the election. All assets that the agent holds are worthless.
- Hillary is not nominated. Each unit of $\bar H$ pays 1\$. The agent receives their buy-in of $x$ back (the bet is essentially declared void).
The spot price of this swap is equal to the [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability) $P(D|H)$ predicted by the market. We denote the trade described above by $H \Rightarrow D$.
To prevent arbitrage opportunities, buying such a composite asset must not move the price of uninvolved outcomes. For example, betting that $D$ holds if $H$ holds should leave the price of $\bar H$ unchanged. After all, the agent is betting on what happens contingent on $H$; they're not making any claims as to how likely $H$ is to occur. This is summarized in the _modularity property_ defined by Hanson in [H03a]:
> Consider the case of an agent who bets with a market scoring rule, and bets only on the event $A$ given the event $B$. That is, this agent gains assets of the form “Pays \$1 if $A$ and $B$ hold”, in trade for assets of the form “Pays \$1 if $B$ holds and $A$ does not. [...] In general, depending on the particular market scoring rule, such a bet might change any probability estimate $p_i$, and thus change any event probability p(C) = \sum_{i \in C} p_i$. It seems preferable, however, for this bet to change as little as possible besides $p(A|B)$ (and of course $p(\bar A|B) = 1 − p(A|B)$).
(The market maker implemented in zrml-neo-swaps has that property.)
A fundamental idea is that trades made on prediction markets are not so much absolute bets ("I bet $10 that Hillary will win at 1:1 odds"), but rather that trades are made to "correct" the forecast of the market by changing the prices of the outcome. The agent executing the trade pays a price to do so, but is rewarded if the prediction is closer to the truth that the current one.
One could say that the agent buying $H \Rightarrow D$ claims that $H \land D$ is undervalued and $H \land \bar D$ is overvalued, and that $\bar H \land D$ and $\bar H \land \bar D$ are correctly priced or the agent doesn't know how to profitably change their price.
Viewed from this angle, a _combinatorial bet_ divides the assets in a combinatorial pool into three categories: _buy_ (the assets the agent thinks are undervalued), _sell_ (the assets the agent thinks are overvalued) and _keep_ (the assets whose prices the agent can't or won't change).
## ZIP 10.2: zrml-combinatorial-tokens
This module is responsible for handling the minting and burning of tokens representing combinatorial positions $x_1 \land \cdots \land x_n$ as described in ZIP 10.1.
### Collections and Positions
An important distinction which we've so far neglected to make is the distinction between an abstract _collection_ $x_1 \land \cdots \land x_n$ and a _position_, which is a collection together with a collateral token. Collections are purely abstract and used in the implementation. Positions are actual tokens on the chain.
Every collection and position has an ID, a 256 bit value, which is used to uniquely identify it. The ID need not be human-readable.
Gnosis use the notation `(A|B)&X` for the token $(A \lor B) \land X$ (that's a collection) and the notation `$:(A|B)&X` for the position obtained by taking that collection and using the collateral token `$`. We assume that the collateral token is always clear from the context and use the same notation for collections and positions.
### Splitting and Merging Tokens
Splitting is the process of creating collections and combinatorial positions from collateral or other outcome tokens. Merging is the process of putting them back together to simpler positions or collateral.
There are two types of splits and merges, _vertical_ and _horizontal_. Vertical splits increase the depth or complexity of the token, while horizontal splits create finer grained tokens of the same depth. We'll describe these types below, considering the special case where collateral is involved separately.
Splits and merges are specified using partitions of the index set of the markets outcomes. If the market has $n$ outcomes, the instructions for a split are encoded in a partition $\mathcal P$ of the index set $I_n = \{1, \ldots, n\}$. The exact meaning of the index set will become clear below.
#### Collateral
Collateral can only be split vertically. Let $M$ be a market with outcomes $x_1, \ldots, x_n$ and $\mathcal P$ be an partition of $I_n$. Then the vertical split $(M, \mathca P)$ splits collateral into the tokens $\bigvee_{i \in I} x_i$ for all $I \in \mathcal P$.
For example, if a market has three outcomes $A, B, C$ and $\mathcal P = \{\{0, 1\}, \{2\}\}$, then splitting collateral results in $A \lor B$ And $C$.
Merging into collateral is the opposite process. If, for some partition $\mathcal P$ of $I_n$, the user owns the tokens $\lor_{i \in I} x_i$ for all $I \in \mathcal P$, then they can merge these tokens into collateral.
It is easy to verify that the value under redeeming is invariant under these operations.
#### Proper Vertical Splits and Merges
Suppose the user owns a combinatorial token $t$ and let $M$ be a market with outcome tokens $x_1, \ldots, x_n$. Let $\mathcal P$ be a partition of $I_n$. Then the vertical split of $t$ using $(M, \mathcal P)$ is yields the tokens $t \land (\bigvee_{i \in I} x_i)$ for $I \in \mathcal P$.
For example, if $t = \mathrm{Yes}$, $M$ has three outcomes $A, B, C$ and $\mathcal P = \{\{0, 1\}, \{2\}\}$, then splitting $t$ with $(M, \mathcal P)$ yields the tokens $\mathrm{Yes} \land (A \lor B)$ and $\mathrm{Yes} \land C$.
Vertical merging is the opposite process. If, for some partition $\mathcal P$ of $I_n$, the user owns the tokens $t\land (\bigvee_{i \in I} x_i)$ for $I \in \mathcal P$, then they can merge these tokens into $t$.
It is easy to verify that the value under redeeming is invariant under these operations.
#### Proper Horizontal Splits and Merges
Suppose the user owns a combinatorial token $t$ which, for a market $M$ with outcomes $x_1, \ldots, x_n$, takes the form $t = t' \land (\lor_{i\in J} x_i)$ where $J \subseteq I_n$. Then for a partition $\mathcal P$ of $J$ (not $I_n$!), the token $t$ can be split into the tokens $t' \land (\bigvee_{i \in I} x_i)$ for all $I \in \mathcal P$.
For example, given two markets with outcomes $x_1 = A, x_2 = B, x_3 = C$ and $X, Y$, resp., consider the token $t = X \land (A \lor B)$. Then $t = t' \land (x_1 \lor x_2)$. Let $\mathcal P = \{\{1\}, \{2\}\}$. Then $t$ can be split into $t' \land x_1 = X \land A$ and $t' \land x_2 = X \land B$.
Horizontal merging is the opposite process. If, for some partition $\mathcal P$ of a set $J \subseteq I_n$, the user owns the tokens $t \land (\bigvee_{i \in I} x_i)$ for $I \in \mathcal P$, the user can merge these tokens into $t \land (\bigvee_{i \in J} x_i)$.
It is easy to verify that the value under redeeming is invariant under these operations.
### Redeeming Tokens
It is possible to partially redeem tokens. Suppose a combinatorial position contains multiple markets $M_1, \ldots, M_n$ and $M_n$ is resolved (recall that order doesn't matter, and it's convenient for us to assume that the last market is resolved). Then we can redeem the combinatorial position, but, unless $n = 1$, instead of receiving collateral, we will receive a position over the markets $M_1, \ldots, M_{n-1}$.
If $p(x_n)$ denotes the payout of $x_n$, then the token $x_1 \land \cdots \land x_n$ redeems for $p(x)$ units of $x_1 \land \cdots \land x_{n-1}$. In particular, if $n = 1$, then $x_1 = x_1 \land \cdots \land x_n$ redeems for $p(x_1)$ units of collateral.
More generally, the outcome token $\bigwedge_{i=1}^n \bigvee_{j=1}^{k_i} x_i^j$ redeems for
$$
p(\bigvee_{j=1}^{k_n} x_n^j) = \sum_{j=1}^{k_n} p(x_n^j)
$$
units of $\bigwedge_{i=1}^{n-1} \bigvee_{j=1}^{k_i} x_i^j$.
### Implementation
The implementation will happen as substrate pallet.
The pallet's `Config` will have an associated type `CombinatorialIdManager` which will serve as the equivalent of the `CTHelper` class from Gnosis' original implementation:
```rust
/// Handles calculations of combinatorial IDs.
pub trait CombinatorialIdManager {
type Asset;
type MarketId;
type CombinatorialId;
/// Calculate the collection ID obtained when splitting `parent_collection_id` over the market
/// given by `market_id` and the `index_set`.
///
/// If `force_max_work` parameter is set, the calculation will use up the maximum amount of work
/// necessary, independent of the other parameters. Should only be used for benchmarking
/// purposes.
fn get_collection_id(
parent_collection_id: Option<Self::CombinatorialId>,
market_id: Self::MarketId,
index_set: Vec<bool>,
force_max_work: bool,
) -> Option<Self::CombinatorialId>;
/// Calculate the position ID belonging to the `collection_id` combined with `collateral` as
/// collateral.
fn get_position_id(
collateral: Self::Asset,
collection_id: Self::CombinatorialId,
) -> Self::CombinatorialId;
}
```
The standard implementation of this trait is discussed in a separate section.
The pallet will have the following extrinsics:
```rust
/// Split `amount` units of the position specified by `parent_collection_id` over the market
/// with ID `market_id` according to the given `partition`.
///
/// The `partition` is specified as a vector whose elements are equal-length `Vec<bool>`. A
/// `true` entry at the `i`th index of a partition element means that the `i`th outcome
/// token of the market is contained in this element of the partition.
///
/// For each element `b` of the partition, the split mints a new outcome token which is made
/// up of the position to be split and the conjunction `(x|...|z)` where `x, ..., z` are the
/// items of `b`. The position to be split, in turn, is burned or transferred into the
/// pallet account, depending on whether or not it is a true combinatorial token or
/// collateral.
///
/// If the `parent_collection_id` is `None`, then the position split is the collateral of the
/// market given by `market_id`.
///
/// If the `parent_collection_id` is `Some(pid)`, then there are two cases: vertical and
/// horizontal split. If `partition` is complete (i.e. there is no index `i` so that `b[i]`
/// is `false` for all `b` in `partition`), the position split is the position obtained by
/// combining `pid` with the collateral of the market given by `market_id`. If `partition`
/// is not complete, the position split is the position made up of the
/// `parent_collection_id` and the conjunction `(x|...|z)` where `x, ..., z` are the items
/// covered by `partition`.
pub fn split_position(
origin: OriginFor<T>,
parent_collection_id: Option<CombinatorialIdOf<T>>,
market_id: MarketIdOf<T>,
partition: Vec<Vec<bool>>,
amount: BalanceOf<T>,
force_max_work: bool,
) -> DispatchResultWithPostInfo;
/// Merge `amount` units of the tokens obtained by splitting `parent_collection_id` using
/// `partition` into the position specified by `parent_collection_id` (vertical split) or
/// the position obtained by splitting `parent_collection_id` according to `partiton` over
/// the market with ID `market_id` (horizontal; see below for details).
///
/// The `partition` is specified as a vector whose elements are equal-length `Vec<bool>`. A
/// `true` entry at the `i`th index of a partition element means that the `i`th outcome
/// token of the market is contained in this element of the partition.
///
/// For each element `b` of the partition, the split burns the outcome tokens which are made
/// up of the position to be split and the conjunction `(x|...|z)` where `x, ..., z` are the
/// items of `b`. The position given by `parent_collection_id` is
///
/// If the `parent_collection_id` is `None`, then the position split is the collateral of the
/// market given by `market_id`.
///
/// If the `parent_collection_id` is `Some(pid)`, then there are two cases: vertical and
/// horizontal merge. If `partition` is complete (i.e. there is no index `i` so that `b[i]`
/// is `false` for all `b` in `partition`), the the result of the merge is the position
/// defined by `parent_collection_id`. If `partition` is not complete, the result of the
/// merge is the position made up of the `parent_collection_id` and the conjunction
/// `(x|...|z)` where `x, ..., z` are the items covered by `partition`.
pub fn merge_position(
origin: OriginFor<T>,
parent_collection_id: Option<CombinatorialIdOf<T>>,
market_id: MarketIdOf<T>,
partition: Vec<Vec<bool>>,
amount: BalanceOf<T>,
force_max_work: bool,
) -> DispatchResultWithPostInfo;
/// (Partially) redeems a position if part of it belongs to a resolved market given by
/// `market_id`.
///
/// The position to be redeemed is the position obtained by combining the position given by
/// `parent_collection_id` and `collateral` with the conjunction `(x|...|z)` where `x, ...
/// z` are the outcome tokens of the market `market_id` given by `index_set`.
///
/// The position to be redeemed is completely removed from the origin's wallet. According to
/// how much the conjunction `(x|...|z)` is valued, the user is paid in the position defined
/// by `parent_collection_id` and `collateral`.
///
/// The `force_max_work` parameter can be used to trigger the maximum amount of allowed work
/// for the combinatorial ID manager. Should only be used for benchmarking purposes.
pub fn redeem_position(
origin: OriginFor<T>,
parent_collection_id: Option<CombinatorialIdOf<T>>,
market_id: MarketIdOf<T>,
index_set: Vec<bool>,
force_max_work: bool,
) -> DispatchResultWithPostInfo;
```
#### The Pallet Account
One important distinction between splits and merges involving collateral and proper combinatorial tokens is that when collateral is split, the collateral, instead of being burned, is moved into the pallet account. This ensures that combinatorial-tokens never changes the total issuance of the collateral token. Conversely, if combinatorial tokens are merged into collateral, the collateral is moved out of the pallet account into the user's wallet.
On the other hand, if proper combinatorial tokens are split or merged, they are simply minted and burned accordingly.
Note that, in a departure from the original design used in prediction-markets for handling the collateral used to back outcome tokens, individual markets don't have their own separate account for storing collateral. This would be impossible, as the following example shows.
Let $M$ be a market with outcomes $A, B$ and $N$ be a market with outcomes $X, Y$. Suppose now the following happens:
- Alice splits one DOT into $A, B$
- Alice splits one unit of $A$ into $A \land X$ and $A \land Y$ and one unit of $B$ into $B \land X$ and $B \land Y$.
- Alice merges one unit of $A \land X$ and one unit of $B \land X$ into one unit of $X$ and one unit of $A \land Y$ and one unit of $B \land Y$ into one unit of $Y$.
- Alice merges one unit of $X$ and one unit of $Y$ into one DOT.
But if separate markets had separate accounts, the DOT would be deposited into the account of $M$ but the DOT in the last step would have to be withdrawn from the account of $N$.
This shows that using combinatorial tokens, Zeitgeist actually because one giant omni market of predictions.
### The Standard Implementation of `CombinatorialIdManager`
The standard implementation of `CombinatorialIdManager` using elliptic curves was pioneered by Gnosis.
The reason IDs are used in place of the entire data of the collection is that storing the entire collection data is heavy in terms of storage and comparing two collection IDs is heavy in terms of computation. The ID of a collection is defined as a cryptographic hash and designed to be sufficiently collision resistant. Beyond that, it satisfies the property that the order in which tokens are split does not matter. This is achieved by using elliptic curve operations.
#### `alt_bn128`
`alt_bn128` was defined in (Reitwiessner, 2017) as the elliptic curve $E$ defined by the equation
$$
y^2 = x^3 + 3
$$
over the Galois field $\mathbb F_p$ with modulus
$$
p = 21888242871839275222246405745257275088696311157297823662689037894645226208583.
$$
It was introduced on Ethereum to be used for zkSNARK verification. A Rust implementation may be found [here](https://github.com/zcash-hackworks/bn).
#### Primitive Collection IDs
A _collection ID_ is a 256-bit number with the following properties:
- The most significant bit is not set.
- The 254 less significant bits, interpreted as big endian unsigned integer, is the $x$-coordinate of `alt_bn128`.
We denote a collection ID by $(o, x)$, where $o$ is the second most significant bit and $x$ is the big endian unsigned integer defined by 254 less significant bits.
A primitive collection is mapped to its associated collection ID by first mapping it to a point on the elliptic curve `alt_bn128`, denoted as $E$, and then _compressing_ this curve point into a collection ID.
The compression algorithm is based on two simple observations:
- If $(x, y) \in E$, then $x^3 + 3 = y^2$, which means that up to its sign, $y$ is determined by $x$. Moreover, since $p$ is odd, $y$ and $-y = p - y$ have different parity.
- Since $p < 2^{254}$, we can store $x$ in memory using 254 bits.
<!-- > -->
Let $\mathbb Z_{2^256}$ denote the set of 256-bit unsigned integers. The compression $E \rightarrow \mathbb Z_{2^{256}}$ uses the second most significant bit to store whether $y$ is odd. In pseudo-code:
```python
def compress(x: int, y: int) -> int:
if y % 2 == 1:
x = x ^ (1 << 254)
return x
```
Unfortunately, not every $x \in \mathbb F_p$ is the $x$-coordinate of a point on $E$. Thus, decompression of a collection ID $x$ is a little more involved. We start using the following observation, which follows from [Euler's criterion](https://en.wikipedia.org/wiki/Euler%27s_criterion), which says that $x$ is a quadratic residue in $\mathbb F_p$ if and only if $x^{(p-1)/2} = 1$:
:::info
**Lemma.**
If $p \equiv 3 \mod 4$, then $x \in \mathbb F_p$ is a quadratic residue if and only if $y = x^{(p + 1)/2}$ satisfies $y^2 = x$.
:::
_Proof._ If $y = x^{(p + 1)/2}$ satisfies $y^2 = x$, then obviously $x$ is a quadratic residue. If, on the other hand, $x$ is a quadratic residue, then $x^{(p-1)/2} = 1$ by Euler's criterion. Thus, we have $y^2 = x^{(p+1)/2} = x^{(p-1)/2} x = x.$ QED.
Thus equipped, it's very easy for us to define a function which returns a root of its input $x$ if $x$ is a quadratic residue and `None` otherwise:
```python
def quadratic_residue(x: int) -> Optional[int]:
xx = (x * x) % p
xxx = (x * xx) % p
yy = (xxx + 3) % p
y = yy ** ((p + 1) / 4)
if (y * y) % p == yy:
return y
return None
```
The algorithm for decompressing a collection ID `(odd, x)` is the following:
```python
def decompress(odd: bool, x: int) -> (int, int):
y = quadratic_residue(x)
assert y is not None
if (odd and y % 2 == 0) or (not odd and y % 2 == 1):
y = p - y
return (x, y)
```
Mapping a primitive collection `(market_id, index_set)` consisting of a market ID and an index set to a point on $E$ is done using a variation of the `decompress` algorithm and makes use of a cryptographic `hasher` with 256 bit digest. The primitive collection is hashed, the result interpreted as 256-bit big endian unsigned integer $x$ and the repeatedly incremented until $x$ is the $x$-coordinate of a point on `alt_bn128`. The excess MSB is used to determine which of the two roots to take:
```python
def decompress_primitive_collection(market_id: int, index_set: list[bool]) -> (int, int):
odd = x >> 255 != 0
h = hasher((market_id, index_set))
while True:
x = (x + 1) % p
y = quadratic_residue(x)
if y is not None:
break
if (odd and y % 2 == 0) or (not odd and y % 2 == 1):
y = bn128.field_modulus - y
return (x, y)
```
Note that this ignores the second most significant bit of the hash result `h`. This is fine as this only slightly diminishes the collision resistance of the hasher. The fact that Gnosis have chosen to use what's essentially a do-while loop instead of a while loop strikes us as a little odd, but we'll keep this around so we can use Gnosis' implementation to generate test cases for ours.
There is no easily available bound on how many passes the while loop takes until a point of `alt_bn128` is found. Statistical testing with $250,000,000$ random samples of points in $\mathbb{F}_p^2$ gives us the following distribution:
| Iterations | Samples |
| ---------- | --------- |
| 0 | 124995762 |
| 1 | 62502095 |
| 2 | 31253303 |
| 3 | 15627739 |
| 4 | 7806601 |
| 5 | 3906045 |
| 6 | 1955579 |
| 7 | 976263 |
| 8 | 488150 |
| 9 | 244516 |
| 10 | 121976 |
| 11 | 61037 |
| 12 | 30362 |
| 13 | 15342 |
| 14 | 7595 |
| 15 | 3802 |
| 16 | 1890 |
| 17 | 974 |
| 18 | 458 |
| 19 | 236 |
| 20 | 130 |
| 21 | 79 |
| 22 | 33 |
| 23 | 19 |
| 24 | 6 |
| 25 | 2 |
| 26 | 3 |
| 30 | 1 |
| 31 | 1 |
| 32 | 1 |
The pattern that emerges is that increasing the number of iterations by 1 halves the number of points that require that many iterations. As exactly half of the points of $\mathbb{F}_p^2$ lie on alt_bn128, we expect no point to require more than $\log_2(p^2) \approx 500$ iterations.
The runtime implementation will use a `for` loop with a maximum number of passes to protect against an outlier to this statistic. The maximum will be a generous overestimation based on the statistic above, subject to the results of the benchmarks.
In particular, if finding the nearest point on-curve takes too many iterations, the function returns an error instead of taking up more resources than allocated. It's worth noting that agents have no control over the hash. Even if they find a particularly bad hash using brute force search off-chain, they won't be able to make any use of it. This leaves a very small likelihood that a user will hit an outlier when trying to split/merge a token.
But these algorithms require us to exponentiate $x$ by $(p + 1) / 4$, which is a giant integer. You'd think it's _game over_ at this point. Enter addition chains.
#### Addition Chains
Let $n$ be an integer. An _addition chain_ for $n$ is a strictly increasing sequence of integers $(n_1, \ldots, n_k)$ "starting with $1$ and ending with $n$, such that each number in the sequence is the sum of two previous numbers" (see [here](https://en.wikipedia.org/wiki/Addition_chain)).
Given an addition chain for $n$, there is an associated algorithm for calculating the powers $x^n$:
```python
def calculate(chain: list[int], x: int) -> int:
values = {1: x} # Stores previously calculates powers, maps `n` to `x^n`.
result = x
for prev, curr in zip(chain, chain[1:]):
d = curr - prev
v = values[d] # **Should** always work.
result = result * v
values[curr] = result
return result
```
The classical example of an addition chain is calculating the power $x^{16}$ by calculating $x^2 = x \cdot x$, $x^4 = x^2 \cdot x^2$, $x^8 = x^4 \cdot x^4$ and finally $x^{16} = x^8 \cdot x^8$ instead of stubbornly doing $x^2 = x \cdot x$, $x^3 = x^2 \cdot x$, $x^4 = x^3 \cdot x$, etc.
The gist is that addition sequences should bring the computational complexity of calculating $x^n$ down from $O(n)$ to $O(\log n)$.
Determining a sufficiently short addition sequence for a given integer (like $(p + 1) / 4$ bove) is non-trivial, but there's an algorithm (Cohen and Frey, 2005, p. 161-162), although their definition of $v \otimes w$ is flawed and should be
$$
v \otimes w = (v_0, \ldots, v_s, v_sw_1, \ldots, v_sw_t)
$$
and step 3 in the definition of `minchain(n)` should be `return chain(`$n, \gamma(n)$`)`. See [here](https://crypto.stackexchange.com/q/27179/71252) for a sample implementation (we have our own prototype implementation). The bets thing is that if $n$ is known in advance (which is true in our case), the developers can hardcode the algorithm.
In the case of $(p + 1) / 4$, the resulting addition chain has length 324, which is only slightly off $\log_2((p + 1) / 4) \approx 252$:
```
1, 2, 4, 8, 10, 11, 21, 42, 84, 126, 252, 504, 546, 557, 1114, 2228, 2249, 4498, 8996, 17992, 20241, 20798, 41596, 83192, 166384, 187182, 189431, 210229, 420458, 840916, 1030347, 2060694, 4121388, 8242776, 16485552, 18546246, 18756475, 19786822, 39573644, 79147288, 158294576, 177051051, 354102102, 708204204, 885255255, 1770510510, 3541021020, 3718072071, 3737858893, 7475717786, 14951435572, 29902871144, 37378588930, 41116447823, 82232895646, 164465791292, 328931582584, 657863165168, 665338882954, 665515934005, 1331031868010, 1996547802015, 2000285660908, 4000571321816, 6000856982724, 12001713965448, 24003427930896, 36005141896344, 72010283792688, 144020567585376, 288041135170752, 576082270341504, 1152164540683008, 1188169682579352, 1200171396544800, 1200836912478805, 1202837198139713, 2405674396279426, 3606511308758231, 4809348506897944, 8415859815656175, 13225208322554119, 21641068138210294, 43282136276420588, 64923204414630882, 129846408829261764, 143071617151815883, 286143234303631766, 572286468607263532, 593927536745473826, 736999153897289709, 1473998307794579418, 2210997461691869127, 2804924998437342953, 3541924152334632662, 6346849150771975615, 9888773303106608277, 19777546606213216554, 39555093212426433108, 49443866515533041385, 55790715666305017000, 111581431332610034000, 167372146998915051000, 177260920302021659277, 233051635968326676277, 410312556270348335554, 643364192238675011831, 1053676748509023347385, 2107353497018046694770, 3161030245527070042155, 6322060491054140084310, 6965424683292815096141, 8019101431801838443526, 16038202863603676887052, 24057304295405515330578, 48114608590811030661156, 55080033274103845757297, 110160066548207691514594, 220320133096415383029188, 440640266192830766058376, 550800332741038457572970, 605880366015142303330267, 1211760732030284606660534, 2423521464060569213321068, 4847042928121138426642136, 9694085856242276853284272, 9804245922790484544798866, 9812265024222286383242392, 9867345057496390228999689, 19679610081718676612242081, 29546955139215066841241770, 59093910278430133682483540, 88640865417645200523725310, 177281730835290401047450620, 354563461670580802094901240, 709126923341161604189802480, 738673878480376671031044250, 758353488562095347643286331, 1516706977124190695286572662, 2275060465686286042929858993, 4550120931372572085859717986, 4579667886511787152700959756, 9159335773023574305401919512, 13739003659535361458102879268, 14497357148097456805746165599, 19077025034609243958447125355, 38154050069218487916894250710, 76308100138436975833788501420, 90805457286534432639534667019, 109882482321143676597981792374, 219764964642287353195963584748, 310570421928821785835498251767, 420452904249965462433480044141, 731023326178787248268978295908, 1462046652357574496537956591816, 2193069978536361744806934887724, 2613522882786327207240414931865, 3344546208965114455509393227773, 6689092417930228911018786455546, 10033638626895343366528179683319, 20067277253790686733056359366638, 40134554507581373466112718733276, 42748077390367700673353133665141, 85496154780735401346706267330282, 170992309561470802693412534660564, 341984619122941605386825069321128, 384732696513309306060178202986269, 388077242722274420515687596214042, 430825320112642121189040729879183, 861650640225284242378081459758366, 1292475960337926363567122189637549, 2584951920675852727134244379275098, 5169903841351705454268488758550196, 6031554481576989696646570218308562, 6419631724299264117162257814522604, 12839263448598528234324515629045208, 13270088768711170355513556358924391, 19689720493010434472675814173446995, 32959809261721604828189370532371386, 52649529754732039300865184705818381, 85609339016453644129054555238189767, 138258868771185683429919739944008148, 276517737542371366859839479888016296, 414776606313557050289759219832024444, 553035475084742733719678959776032592, 1106070950169485467439357919552065184, 2212141900338970934878715839104130368, 2626918506652527985168475058936154812, 2712527845668981629297529614174344579, 2850786714440167312727449354118352727, 5563314560109148942024978968292697306, 8414101274549316254752428322411050033, 13977415834658465196777407290703747339, 27954831669316930393554814581407494678, 41932247503975395590332221872111242017, 50346348778524711845084650194522292050, 64323764613183177041862057485226039389, 128647529226366354083724114970452078778, 257295058452732708167448229940904157556, 514590116905465416334896459881808315112, 1029180233810930832669792919763616630224, 2058360467621861665339585839527233260448, 4116720935243723330679171679054466520896, 8233441870487446661358343358108933041792, 16466883740974893322716686716217866083584, 32933767481949786645433373432435732167168, 65867534963899573290866746864871464334336, 131735069927799146581733493729742928668672, 263470139855598293163466987459485857337344, 526940279711196586326933974918971714674688, 1053880559422393172653867949837943429349376, 2107761118844786345307735899675886858698752, 4215522237689572690615471799351773717397504, 8431044475379145381230943598703547434795008, 16862088950758290762461887197407094869590016, 33724177901516581524923774394814189739180032, 67448355803033163049847548789628379478360064, 134896711606066326099695097579256758956720128, 269793423212132652199390195158513517913440256, 539586846424265304398780390317027035826880512, 1079173692848530608797560780634054071653761024, 2158347385697061217595121561268108143307522048, 4316694771394122435190243122536216286615044096, 8633389542788244870380486245072432573230088192, 17266779085576489740760972490144865146460176384, 34533558171152979481521944980289730292920352768, 69067116342305958963043889960579460585840705536, 138134232684611917926087779921158921171681411072, 276268465369223835852175559842317842343362822144, 552536930738447671704351119684635684686725644288, 1105073861476895343408702239369271369373451288576, 2210147722953790686817404478738542738746902577152, 4420295445907581373634808957477085477493805154304, 8840590891815162747269617914954170954987610308608, 17681181783630325494539235829908341909975220617216, 35362363567260650989078471659816683819950441234432, 70724727134521301978156943319633367639900882468864, 141449454269042603956313886639266735279801764937728, 282898908538085207912627773278533470559603529875456, 565797817076170415825255546557066941119207059750912, 1131595634152340831650511093114133882238414119501824, 2263191268304681663301022186228267764476828239003648, 4526382536609363326602044372456535528953656478007296, 9052765073218726653204088744913071057907312956014592, 18105530146437453306408177489826142115814625912029184, 36211060292874906612816354979652284231629251824058368, 72422120585749813225632709959304568463258503648116736, 144844241171499626451265419918609136926517007296233472, 289688482342999252902530839837218273853034014592466944, 579376964685998505805061679674436547706068029184933888, 1158753929371997011610123359348873095412136058369867776, 2317507858743994023220246718697746190824272116739735552, 4635015717487988046440493437395492381648544233479471104, 9270031434975976092880986874790984763297088466958942208, 18540062869951952185761973749581969526594176933917884416, 37080125739903904371523947499163939053188353867835768832, 74160251479807808743047894998327878106376707735671537664, 148320502959615617486095789996655756212753415471343075328, 296641005919231234972191579993311512425506830942686150656, 593282011838462469944383159986623024851013661885372301312, 1186564023676924939888766319973246049702027323770744602624, 2373128047353849879777532639946492099404054647541489205248, 4746256094707699759555065279892984198808109295082978410496, 9492512189415399519110130559785968397616218590165956820992, 18985024378830799038220261119571936795232437180331913641984, 37970048757661598076440522239143873590464874360663827283968, 75940097515323196152881044478287747180929748721327654567936, 151880195030646392305762088956575494361859497442655309135872, 303760390061292784611524177913150988723718994885310618271744, 607520780122585569223048355826301977447437989770621236543488, 1215041560245171138446096711652603954894875979541242473086976, 2430083120490342276892193423305207909789751959082484946173952, 4860166240980684553784386846610415819579503918164969892347904, 9720332481961369107568773693220831639159007836329939784695808, 19440664963922738215137547386441663278318015672659879569391616, 38881329927845476430275094772883326556636031345319759138783232, 77762659855690952860550189545766653113272062690639518277566464, 155525319711381905721100379091533306226544125381279036555132928, 311050639422763811442200758183066612453088250762558073110265856, 622101278845527622884401516366133224906176501525116146220531712, 1244202557691055245768803032732266449812353003050232292441063424, 2488405115382110491537606065464532899624706006100464584882126848, 4976810230764220983075212130929065799249412012200929169764253696, 9953620461528441966150424261858131598498824024401858339528507392, 19907240923056883932300848523716263196997648048803716679057014784, 39814481846113767864601697047432526393995296097607433358114029568, 79628963692227535729203394094865052787990592195214866716228059136, 159257927384455071458406788189730105575981184390429733432456118272, 318515854768910142916813576379460211151962368780859466864912236544, 637031709537820285833627152758920422303924737561718933729824473088, 1274063419075640571667254305517840844607849475123437867459648946176, 2548126838151281143334508611035681689215698950246875734919297892352, 5096253676302562286669017222071363378431397900493751469838595784704, 10192507352605124573338034444142726756862795800987502939677191569408, 20385014705210249146676068888285453513725591601975005879354383138816, 40770029410420498293352137776570907027451183203950011758708766277632, 81540058820840996586704275553141814054902366407900023517417532555264, 163080117641681993173408551106283628109804732815800047034835065110528, 326160235283363986346817102212567256219609465631600094069670130221056, 652320470566727972693634204425134512439218931263200188139340260442112, 1304640941133455945387268408850269024878437862526400376278680520884224, 2609281882266911890774536817700538049756875725052800752557361041768448, 5218563764533823781549073635401076099513751450105601505114722083536896, 10437127529067647563098147270802152199027502900211203010229444167073792, 20874255058135295126196294541604304398055005800422406020458888334147584, 41748510116270590252392589083208608796110011600844812040917776668295168, 83497020232541180504785178166417217592220023201689624081835553336590336, 166994040465082361009570356332834435184440046403379248163671106673180672, 333988080930164722019140712665668870368880092806758496327342213346361344, 667976161860329444038281425331337740737760185613516992654684426692722688, 1335952323720658888076562850662675481475520371227033985309368853385445376, 2671904647441317776153125701325350962951040742454067970618737706770890752, 5343809294882635552306251402650701925902081484908135941237475413541781504, 10687618589765271104612502805301403851804162969816271882474950827083563008, 21375237179530542209225005610602807703608325939632543764949901654167126016, 42750474359061084418450011221205615407216651879265087529899803308334252032, 85500948718122168836900022442411230814433303758530175059799606616668504064, 171001897436244337673800044884822461628866607517060350119599213233337008128, 342003794872488675347600089769644923257733215034120700239198426466674016256, 684007589744977350695200179539289846515466430068241400478396852933348032512, 1368015179489954701390400359078579693030932860136482800956793705866696065024, 2736030358979909402780800718157159386061865720272965601913587411733392130048, 5472060717959818805561601436314318772123731440545931203827174823466784260096, 5472060717959818805561601436314318772174077789324455915672259473661306552146
```
The resulting algorithm can be summarized as follows:
```python
def pow_magic_number(x: int) -> int:
x_1 = x
x = x * x
x_2 = x
x = x * x
x = x * x
x = x * x_2
x_10 = x
x = x * x_1
x_11 = x
x = x * x_10
x_21 = x
x = x * x
x_42 = x
x = x * x
x = x * x_42
x = x * x
x = x * x
x = x * x_42
x = x * x_11
x_557 = x
x = x * x
x = x * x
x = x * x_21
x_2249 = x
x = x * x
x = x * x
x = x * x
x = x * x_2249
x = x * x_557
x_20798 = x
x = x * x
x = x * x
x = x * x
x = x * x_20798
x = x * x_2249
x_189431 = x
x = x * x_20798
x_210229 = x
x = x * x
x = x * x
x = x * x_189431
x_1030347 = x
x = x * x
x_2060694 = x
x = x * x
x = x * x
x = x * x
x = x * x_2060694
x = x * x_210229
x_18756475 = x
x = x * x_1030347
x_19786822 = x
x = x * x
x = x * x
x = x * x
x = x * x_18756475
x_177051051 = x
x = x * x
x = x * x
x = x * x_177051051
x = x * x
x = x * x
x = x * x_177051051
x = x * x_19786822
x_3737858893 = x
x = x * x
x_7475717786 = x
x = x * x
x = x * x
x = x * x_7475717786
x = x * x_3737858893
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x_7475717786
x = x * x_177051051
x_665515934005 = x
x = x * x
x = x * x_665515934005
x = x * x_3737858893
x_2000285660908 = x
x = x * x
x = x * x_2000285660908
x = x * x
x_12001713965448 = x
x = x * x
x = x * x_12001713965448
x_36005141896344 = x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x_36005141896344
x = x * x_12001713965448
x = x * x_665515934005
x_1200836912478805 = x
x = x * x_2000285660908
x_1202837198139713 = x
x = x * x
x = x * x_1200836912478805
x_3606511308758231 = x
x = x * x_1202837198139713
x_4809348506897944 = x
x = x * x_3606511308758231
x_8415859815656175 = x
x = x * x_4809348506897944
x_13225208322554119 = x
x = x * x_8415859815656175
x_21641068138210294 = x
x = x * x
x = x * x_21641068138210294
x = x * x
x = x * x_13225208322554119
x_143071617151815883 = x
x = x * x
x = x * x
x = x * x_21641068138210294
x_593927536745473826 = x
x = x * x_143071617151815883
x_736999153897289709 = x
x = x * x
x = x * x_736999153897289709
x = x * x_593927536745473826
x_2804924998437342953 = x
x = x * x_736999153897289709
x_3541924152334632662 = x
x = x * x_2804924998437342953
x_6346849150771975615 = x
x = x * x_3541924152334632662
x_9888773303106608277 = x
x = x * x
x = x * x
x = x * x_9888773303106608277
x = x * x_6346849150771975615
x_55790715666305017000 = x
x = x * x
x = x * x_55790715666305017000
x = x * x_9888773303106608277
x_177260920302021659277 = x
x = x * x_55790715666305017000
x_233051635968326676277 = x
x = x * x_177260920302021659277
x_410312556270348335554 = x
x = x * x_233051635968326676277
x_643364192238675011831 = x
x = x * x_410312556270348335554
x_1053676748509023347385 = x
x = x * x
x = x * x_1053676748509023347385
x = x * x
x = x * x_643364192238675011831
x_6965424683292815096141 = x
x = x * x_1053676748509023347385
x_8019101431801838443526 = x
x = x * x
x = x * x_8019101431801838443526
x = x * x
x = x * x_6965424683292815096141
x_55080033274103845757297 = x
x = x * x
x_110160066548207691514594 = x
x = x * x
x = x * x
x = x * x_110160066548207691514594
x = x * x_55080033274103845757297
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x_110160066548207691514594
x = x * x_8019101431801838443526
x_9812265024222286383242392 = x
x = x * x_55080033274103845757297
x_9867345057496390228999689 = x
x = x * x_9812265024222286383242392
x_19679610081718676612242081 = x
x = x * x_9867345057496390228999689
x_29546955139215066841241770 = x
x = x * x
x = x * x_29546955139215066841241770
x = x * x
x = x * x
x = x * x
x = x * x_29546955139215066841241770
x = x * x_19679610081718676612242081
x_758353488562095347643286331 = x
x = x * x
x = x * x_758353488562095347643286331
x = x * x
x = x * x_29546955139215066841241770
x_4579667886511787152700959756 = x
x = x * x
x = x * x_4579667886511787152700959756
x = x * x_758353488562095347643286331
x_14497357148097456805746165599 = x
x = x * x_4579667886511787152700959756
x_19077025034609243958447125355 = x
x = x * x
x = x * x
x = x * x_14497357148097456805746165599
x_90805457286534432639534667019 = x
x = x * x_19077025034609243958447125355
x_109882482321143676597981792374 = x
x = x * x
x = x * x_90805457286534432639534667019
x_310570421928821785835498251767 = x
x = x * x_109882482321143676597981792374
x_420452904249965462433480044141 = x
x = x * x_310570421928821785835498251767
x_731023326178787248268978295908 = x
x = x * x
x = x * x_731023326178787248268978295908
x = x * x_420452904249965462433480044141
x_2613522882786327207240414931865 = x
x = x * x_731023326178787248268978295908
x_3344546208965114455509393227773 = x
x = x * x
x = x * x_3344546208965114455509393227773
x = x * x
x = x * x
x = x * x_2613522882786327207240414931865
x_42748077390367700673353133665141 = x
x = x * x
x = x * x
x = x * x
x = x * x_42748077390367700673353133665141
x = x * x_3344546208965114455509393227773
x_388077242722274420515687596214042 = x
x = x * x_42748077390367700673353133665141
x_430825320112642121189040729879183 = x
x = x * x
x_861650640225284242378081459758366 = x
x = x * x_430825320112642121189040729879183
x = x * x
x = x * x
x = x * x_861650640225284242378081459758366
x = x * x_388077242722274420515687596214042
x_6419631724299264117162257814522604 = x
x = x * x
x = x * x_430825320112642121189040729879183
x_13270088768711170355513556358924391 = x
x = x * x_6419631724299264117162257814522604
x_19689720493010434472675814173446995 = x
x = x * x_13270088768711170355513556358924391
x_32959809261721604828189370532371386 = x
x = x * x_19689720493010434472675814173446995
x_52649529754732039300865184705818381 = x
x = x * x_32959809261721604828189370532371386
x_85609339016453644129054555238189767 = x
x = x * x_52649529754732039300865184705818381
x_138258868771185683429919739944008148 = x
x = x * x
x = x * x_138258868771185683429919739944008148
x_414776606313557050289759219832024444 = x
x = x * x_138258868771185683429919739944008148
x = x * x
x = x * x
x = x * x_414776606313557050289759219832024444
x = x * x_85609339016453644129054555238189767
x_2712527845668981629297529614174344579 = x
x = x * x_138258868771185683429919739944008148
x_2850786714440167312727449354118352727 = x
x = x * x_2712527845668981629297529614174344579
x_5563314560109148942024978968292697306 = x
x = x * x_2850786714440167312727449354118352727
x_8414101274549316254752428322411050033 = x
x = x * x_5563314560109148942024978968292697306
x_13977415834658465196777407290703747339 = x
x = x * x
x = x * x_13977415834658465196777407290703747339
x = x * x_8414101274549316254752428322411050033
x_50346348778524711845084650194522292050 = x
x = x * x_13977415834658465196777407290703747339
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x
x = x * x_50346348778524711845084650194522292050
return x
```
Of course, we will replace the `*` operation with a checked Galois field multiplication.
#### General Collection IDs
Until now, we've only established how to calculate the IDs of primitive collections. Recall that, as a requirement, the order in which primitive collections are added to the collection should not matter when calculating the ID. So, if $T_1, T_2$ are two primitive collections, then $T_1 \land T_2$ and $T_2 \land T_1$, although they are technically not the same, **must** have the same ID. This allows users to trade tokens which would technically be different but equivalent in payout.
This is where the elliptic curve $E$ comes in. On any elliptic curve, there is a natural commutative _pairing_ $+\colon E \times E \rightarrow E$, _commutative_ being the key word. To calculate the ID two collections, decompress both, take their sum and then compress them again:
```python
def calculate_collection_id(collection_id_1: (bool, int), collection_id_2: (bool, int)) -> int:
x1, y1 = decompress(*collection_id_1)
x2, y2 = decompress(*collection_id_2)
x3, y3 = alt_bn128_sum((x1, y1), (x2, y2))
return compress(x3, y3)
```
Thanks to the commutativity of $(E, +)$, the return value of `calculate_collection_id` is independent of the order of its arguments.
#### Position IDs
The ID of a combinatorial position is a cryptographic hash of the collateral ID and the ID of the collection. The _collateral ID_ is an abstraction of the `Asset` object used on Zeitgeist to ensure that changes to the `Asset` object doesn't change the expected hashes.
### Notes on Integration
#### Dealing with Collection IDs and Position IDs
Due to the structure of collection and position IDs, the indexer will have to do some heavy lifting and maintains a bi-directional mapping of collection and position IDs to human readable outcome tokens. This mapping could take the following form:
```
{
[207, 168, 160, 93, 238, 221, 197, 1, 171, 102, 28, 24, 18, 107, 205, 231, 227, 98, 220, 105, 211, 29, 181, 30, 53, 7, 200, 154, 134, 246, 38, 139]: {0: [1, 1, 0], 5: [0, 0, 0, 1]},
// etc.
}
```
This means that the position ID `[207, 168, 160, 93, 238, 221, 197, 1, 171, 102, 28, 24, 18, 107, 205, 231, 227, 98, 220, 105, 211, 29, 181, 30, 53, 7, 200, 154, 134, 246, 38, 139]` represents belongs to the markets with IDs `0` and `5` and represents the token $(x_1 \land x_2) \lor y_4$ where $x_1, x_2, x_3$ are the outcome tokens of market `0` and $y_1, \ldots, y_4$ are the outcome tokens of market `5`.
#### Migration Approach
Zeitgeist will use the old system for creating outcome tokens (`buy_complete_set`, `sell_complete_set`) and redeeming (`redeem_shares`) implemented in zrml-prediction-markets (let's call this Zeitgeist 1.0) alongside the new system. The two will not collide in any way (you can use the new system on old markets, etc.). Old and new outcome tokens will not be interchangeable though.
The long-term migration approach will be to officially announce the deprecation of the old system:
- The frontend will no longer allow access to the old system.
- Following a grace period, the old tokens, liquidity pools, etc. will be destroyed.
### Notes
- It's not necessary for markets to have a `base_asset` field - we can just let users split using any collateral they like, provided that it's allowed as base asset for markets. This is not a property of this particular implementation of the split/merge design: We could equally well just have added a `collateral` parameter to the `buy_complete_set` and `sell_complete_set` functions.
## ZIP 10.3: Combinatorial Betting using zrml-neo-swaps
We've already discussed what combinatorial betting is in ZIP 10.1. We will outline the details here.
### General Approach
Allowing users to make combinatorial bets means allowing them to create pools with combinatorial tokens in them, and to specify arbitrary _buy_, _keep_ and _sell_ sets (as defined above in ZIP 10.1) when buying and selling.
On the other hand, letting users create pools with arbitrary combinations of combinatorial tokens is bound to be too complex. One straightforward approach that allows capturing all possible combinatorial bets involving a finite set of markets $M_1, \ldots, M_n$ with the same `base_asset` field and outcome tokens $x_1^1, \ldots, x^{k_1}_1, \ldots, x_n^1, \ldots, x_n^{k_n}$ is to create a single pool containing the following combinatorial positions:
$$
x_1^{i_1} \land x_2^{i_2} \land \cdots \land x_n^{i_n}
$$
where $i_1 \in I_{k_1}, i_2 \in I_{k_2}, \ldots, i_n \in I_{k_n}$ and $I_j = \{1, \ldots, j\}$ as defined above. We call these _atoms_, as they allow the most fine-grained combinatorial bets on outcomes defined in these markets.
The existing `join` and `exit` functions for liquidity pools, and the functionality behind then, including liquidity trees, will remain the same for combinatorial pools. Below, we will provide formulas for letting users specify arbitrary buy, keep and sell sets. Thus, we have the two fundamental functions of combinatorial betting covered: Pools and liquidity, as well as opening and closing bets.
### Gory Details: Formulas
Let $M$ be a prediction market with outcomes $1, \ldots, n$. A _combinatorial bet_ on $M$ is a tuple $(B, K, S)$ consisting of three mutually disjoint subsets $B, K, S \subseteq \{1, \ldots, n\}$ with $B \neq \emptyset$, $S \neq \emptyset$ and $B \cup K \cup S = \{1, \ldots, n\}$. Note that $K$ is allowed to be empty.
The sets $B$, $K$ and $S$ are called the _buy_ (or _back_), _keep_ (or _void_) and _sell_ (or _lay_) of the bet. The informant making this combinatorial bet wagers that one of the outcomes in $B$ will occur if none of the outcomes in $K$ occur. If one of the outcomes of $K$ occur, their bet is voided.
Combinatorial bets are used to make bets on conditional probabilities. For example, consider two events $X$ and $Y$ which are used to define a market with four outcomes: $X \land Y$, $X \land \bar Y$, $\bar X \land Y$ and $\bar X \land \bar Y$, where overbar denoted negation ($\bar X$ means "$X$ does not occur") and $\land$ denotes conjunction. If an informant wants to bet that if $X$ occurs, then $Y$ occurs, they make the combinatorial bet $B = \{X \land Y\}$, $K = \{\bar X \land Y, \bar X \land \bar Y\}$, $S = \{X \land \bar Y\}$. If $X$ and $Y$ materialize, the informant will make a profit. If $X$ does not materialize, the bet is declared void. If $X$ materializes but $Y$ doesn't, they lose. We denote this bet by $X \Rightarrow Y$.
The execution of a combinatorial bet is implemented in DLMSR by generalizing the buy and sell operations described above to involve arbitrary sets of tokens. Specifically, opening a combinatorial bet for $x$ dollars is implemented by buying $x$ complete sets and then selling the tokens in $S$ for a particular amount $y(x)$ of each token in $B$ while simply retaining $x$ of the tokens in $K$. One can think of the informant's position in $K$ as an insurance.
Returning to the example, if $X$ and $Y$ materialize, the informant will make a profit of $y(x) - x$. If $X$ does not materialize, the market resolves to one of the outcomes contained in $K$, which means the informant holds $x$ winning tokens, which means they receive back their buy-in of $x$ dollars. If $x$ materializes but $Y$ doesn't, they lose their stake in the bet. With this in mind, the notation $\bar K \Rightarrow B$ seems intuitive for arbitrary combinatorial bets ($S$ can be reconstructed as $\{1, \ldots, n\} \setminus (B \cup K)$).
Cancelling a combinatorial bet is implemented by equalizing the informants positions in the tokens of $B$, $K$ and $S$ by swapping on the market so that they end up holding a certain number of full sets which they can then sell. This is done by performing up to two elementary _equalizations_, which are described below: We first equalize, say, $B$ and $K$, and then equalize $B \cup K$ and $S$. As DLMSR is path-independent, this particular order can be replaced with any of its permutations.
Considering the implementation described above, note that buying and selling as defined earlier in the document is just a combinatorial bet with $|B| = 1$ and $K = \emptyset$.
#### Notation
As in the previous sections, let $\varphi(b, r)$ denote the trading function. We will fix $b$ once and for all and just denote it by $\varphi(r)$ for the sake of convenience. For any $r$, we define
$$
\psi \colon \mathbb{R} \rightarrow \mathbb{R}, \qquad \psi(r) = \exp(r/b).
$$
and for every subset $I \subseteq \{1, \ldots, n\}$, we define
$$
\psi_I \colon \mathbb{R}^n \rightarrow \mathbb{R}, \qquad \psi_I(r) = \sum_{i \in I} \psi(r_i).
$$
Note that
$$
\varphi(r) = \psi_{\{1, \ldots, n\}}(-r) = \sum_{i=1}^n \psi(-r).
$$
The derivative of $\psi_I$ is
$$
\psi_I'(r) = \frac{1}{b} \psi_I(r).
$$
For the sake of brevity, we also let $\mathbb{R}$ operate on $\mathbb{R}^n$ by translation and write
$$
r + s = r + (s)^n = (r_1 + s, \ldots, r_n + s).
$$
Note that, using the properties of the exponential function, we have $\psi_I(r + s) = \psi(s) \psi_I(r)$.
#### Combinatorial Buys
Recall that opening (buying) a combinatorial bet $(B, K, S)$ for $x$ dollars is implemented by buying $x$ complete sets and then selling the tokens contained in $S$ for $y(x)$ more units of each token in $B$. This changes the reserve of the pool to
$$
r_i' = \begin{cases}
r_i - y(x), & i \in B, \\
r_i, & i \in K, \\
r_i + x, & i \in S.
\end{cases}
$$
Thus,
\begin{align*}
1 &= \varphi(r') \\
&= \psi_S(-r-x) + \psi_K(-r) + \psi_B(-r + y(x)) \\
&= \psi(-x) \psi_S(-r) + \psi_K(-r) + \psi(y(x))\psi_B(-r).
\end{align*}
Solving for $y(x)$ yields
$$
y(x) = b \ln \left( \frac{1 - \psi(-x) \psi_S(-r) - \psi_K(-r)}{\psi_B(-r)} \right).
$$
The argument of the $\ln$ can be rewritten to solve certain numerical issues. For example, the terms $\psi_K(-r)$ and $\psi_B(-r)$ can be replaced by their counterparts $\psi_B(-r) + \psi_S(-r)$ and $1 - \psi_K(-r) - \psi_S(-r)$, resp. For example,
$$
y(x) = b ( \ln ( 1 - \psi(-x) \psi_S(-r) - \psi_K(-r)) - \ln \psi_B(-r) )
$$
Note also that the entire $\ln$ expression can be split using $\ln(u/v) = \ln u - \ln v$.
As the combinatorial bet consists of different amounts of the tokens from $B$ and $K$, there is, strictly speaking, no clearly defined "total amount" received. However, it makes sense to consider the amount $z(x) = x + y(x)$ of tokens of $B$ that the informant receives for spot price calculations.
In fact, the spot price can then be calculated using l'Hopital's rule, as follows.
$$
p(B, K, S) = \lim_{x \rightarrow 0} \frac{x}{z(x)} = \frac{1}{z'(0)}.
$$
We have $z'(0) = 1 + y'(0)$, and, by virtue of a simple calculation,
$$
y'(x) = \frac{\psi(-x) \psi_S(-r)}{1 - \psi(-x)\psi_S(-r) - \psi_K(-r)}.
$$
Thus,
\begin{align*}
y'(0) &= \frac{\psi_S(-r)}{1 - \psi_S(-r) - \psi_K(-r)} \\
&= \frac{\psi_S(-r)}{\psi_B(-r)}.
\end{align*}
Using the properties of the trading functions, we get
\begin{align*}
1 + y'(0) &= \frac{\psi_B(-r) + \psi_S(-r)}{\psi_B(-r)} \\
&= \frac{1 - \psi_K(-r)}{\psi_B(-r)},
\end{align*}
and, thus, finally,
$$
p(B, K, S) = \frac{\psi_B(-r)}{1 - \psi_K(-r)}.
$$
Note that this is compatible with the formula $p_i = e^{-r_i/b}$ for the spot price of a single outcome token if $B = \{i\}$, $K = \emptyset$.
#### Equalizing Positions
Suppose we have two non-empty disjoint sets $B, S \subseteq \{1, \ldots, n\}$. Let $K = \{1, \ldots, n\} \setminus (B \cup S)$. Further assume that an informant has $t$ units of (each token in) $B$ and $s$ units of (each token in) $S$. Now suppose that $t \geq s$ and they want to sell units $t' \in [0, t-s]$ units of $B$ for more units of $S$ so that they end up with the same amount of $B$ and $S$. This process is called _equalization_.
Let $u(s, t)$ be the amount of $S$ received. This means that the informant holds $t + t'$ units of $B$ and $s + u(s, t)$ units of $S$ after the trade. The reserves of the pool, on the other hand, are
$$
r_i' = \begin{cases}
r_i + t', & i \in B, \\
r_i, & i \in K, \\
r_i - u(s, t), & i \in S.
\end{cases}
$$
post execution. Note that, as the informant now holds the same amount of $B$ and $S$, we have $s + u(s, t) = t + t'$ and thus
$$
u(s, t) = t - t' - s.
$$
Using the DLMSR trading function, we get
\begin{align*}
1 &= \varphi(r') \\
&= \psi_B(-r-t') + \psi_S(-r+u(s, t)) + \psi_K(-r) \\
&= \psi(-t') \psi_B(-r) + \psi(-t') \psi_S(-r + t - s) + \psi_K(-r).
\end{align*}
Solving for $t'$ yields
$$
t' = b \ln \left( \frac{1 - \psi_K(-r)}{\psi_B(-r) + \psi(t-s)\psi_S(-r)} \right).
$$
The amount of $B \cup S$ the informant now holds is (in analogy to earlier sections) $v(s, t) = t - t'$. Same comments as above apply regarding approaches to solving numerical issues.
#### Selling Combinatorial Bets
Cancelling (selling) a combinatorial bet $(B, K, S)$ is implemented by performing one or two equalizations and then selling whatever amount of complete sets this leaves the informant.
Specifically, if $K = \emptyset$, then the combinatorial bet is sold by equalizing $B$ and $S$, leaving the informant with an equal amount of both $B$ and $S$. But as $K = \emptyset$, these holds are just complete sets, which are then sold.
If $K \neq \emptyset$, then the sell is implemented by first equalizing $B$ and $K$, leaving the informant with the same amount of $B$ and $K$ (but in general different amounts of $B \cup K$ and $S$), and thereafter equalizing $B \cup K$ and $S$, leaving them with a certain amount of full sets.
The order of these two equalizations can be changed thanks to the path-independence of DLMSR. This could be helpful in creating situations which are numerically well-behaved.
In summary, the naive algorithm for selling a position in pseudocode is the following:
```
Procedure Equalize(buy, keep, amount_buy, amount_keep)
// Performs equalization as defined in the section above and returns the absolute value of the delta of the buy position.
End Procedure
Procedure ComboSell(buy, sell, amount_buy, amount_keep)
keep ← assets.difference(buy).difference(sell)
If keep
delta_buy ← Equalize(buy, keep, amount_buy, amount_keep)
amount_buy_keep ← amount_buy - delta_buy
Else
// `amount_keep` should zero here!
amount_buy_keep ← amount_buy
End If
buy_keep ← buy.union(keep)
delta_buy_keep ← Equalize(buy_keep, sell, amount_buy_keep, 0)
Return amount_buy_keep - delta_buy_keep
End Procedure
```
### Implementation
Combinatorial betting will be implemented as an addition to the neo-swaps pallet. The following significant changes are made outside of adding extrinsics.
- Pools are now indexed by a pool ID as opposed to a market ID. The type of the pool ID can be specified using an associated type in `Config`. Pools previously indexed by their market ID will just keep their index - since we'll just the same type for market IDs and pool IDs, we'll be fine.
- The `PoolStorage` interface implements a counted map which doesn't recycle pool IDs.
- The `MaxSplits` associated type in `Config` specifies how many splits are allowed when creating a combinatorial market.
- The neo-swaps pallet will have access to token splits and merges through an API implemented by combinatorial-tokens.
- Each pool will have a _pool type_. A pool can either be _standard_ or _combinatorial_. Included in the pool type is a list of markets associated with the pool. A standard pool has one market associated with it, a combinatorial pool an arbitrary number. The `PoolType` struct implements a method that allows iterating over the markets. External fees will be charged for every market associated with the pool.
- Each pool will have an `assets` object which contains all assets held in the pool for convenience.
The following extrinsics are added:
```rust
/// Make a combinatorial bet on the specified pool.
///
/// The `amount_in` is paid in collateral. The transaction fails if the amount of outcome
/// tokens received is smaller than `min_amount_out`. The user must correctly specify the
/// number of outcomes for benchmarking reasons.
///
/// The user's collateral is used to mint complete sets of the combinatorial tokens in the
/// pool. The parameters `buy` and `sell` are used to specify which of these tokens the user
/// wants and doesn't want: The assets in `sell` are sold to buy more of `buy` from the
/// pool. The assets not contained in either of these will remain in the users wallet
/// unchanged.
///
/// The function will error if certain numerical constraints are violated.
///
/// # Parameters
///
/// - `origin`: The origin account making the purchase.
/// - `pool_id`: Identifier for the pool used to trade on.
/// - `asset_count`: Number of assets in the pool.
/// - `buy`: The assets that the user want to have more of. Must not be empty.
/// - `sell`: The assets that the user doesn't want any of. Must not be empty.
/// - `amount_in`: Amount of collateral paid by the user.
/// - `min_amount_out`: Minimum number of outcome tokens the user expects to receive.
///
/// # Complexity
///
/// Depends on the implementation of `CombinatorialTokensUnsafeApi` and `ExternalFees`; when
/// using the canonical implementations, the runtime complexity is `O(asset_count)`.
pub fn combo_buy(
origin: OriginFor<T>,
#[pallet::compact] pool_id: T::PoolId,
asset_count: AssetIndexType,
buy: Vec<AssetOf<T>>,
sell: Vec<AssetOf<T>>,
#[pallet::compact] amount_in: BalanceOf<T>,
#[pallet::compact] min_amount_out: BalanceOf<T>,
) -> DispatchResult;
/// Cancel a combinatorial bet on the specified pool.
///
/// The `buy`, `keep` and `sell` parameters are used to specify the amounts of the bet the
/// user wishes to cancel. The user must hold `amount_buy` units of each asset in `buy` and
/// `amount_keep` of each asset in `keep` in their wallet. If `keep` is empty, then
/// `amount_keep` must be zero.
///
/// The transaction fails if the amount of outcome tokens received is smaller than
/// `min_amount_out`. The user must correctly specify the number of outcomes for
/// benchmarking reasons.
///
/// The function will error if certain numerical constraints are violated.
///
/// # Parameters
///
/// - `origin`: The origin account making the purchase.
/// - `pool_id`: Identifier for the pool used to trade on.
/// - `asset_count`: Number of assets in the pool.
/// - `buy`: The `buy` of the bet that the user wishes to cancel. Must not be empty.
/// - `keep`: The tokens not contained in `buy` or `sell` of the bet that the user wishes to
/// cancel. May be empty.
/// - `sell`: The `sell` of the bet that the user wishes to cancel. Must not be empty.
/// - `amount_buy`: Amount of tokens in `buy` the user wishes to let go.
/// - `amount_keep`: Amount of tokens in `keep` the user wishes to let go.
/// - `min_amount_out`: Minimum number of outcome tokens the user expects to receive.
///
/// # Complexity
///
/// Depends on the implementation of `CombinatorialTokensUnsafeApi` and `ExternalFees`; when
/// using the canonical implementations, the runtime complexity is `O(asset_count)`.
pub fn combo_sell(
origin: OriginFor<T>,
#[pallet::compact] pool_id: T::PoolId,
asset_count: AssetIndexType,
buy: Vec<AssetOf<T>>,
keep: Vec<AssetOf<T>>,
sell: Vec<AssetOf<T>>,
#[pallet::compact] amount_buy: BalanceOf<T>,
#[pallet::compact] amount_keep: BalanceOf<T>,
#[pallet::compact] min_amount_out: BalanceOf<T>,
) -> DispatchResult;
/// Deploy a combinatorial pool for the specified markets and provide liquidity.
///
/// The tokens of each of the markets specified by `market_ids` are split into atoms. For
/// each combination of outcome tokens `x, ..., z` from the markets, there is one
/// combinatorial token `x & ... & z` in the pool.
///
/// The pool's assets are ordered by lexicographical order, using the ordering of tokens of
/// each individual market provided by the `MarketCommonsApi`. For example, if three markets
/// with outcomes `x_1, x_2`, `y_1, y_2` and `z_1, z_2` are involved, the outcomes of the
/// pool are (in order):
///
/// x_1 & y_1 & z_1
/// x_1 & y_1 & z_2
/// x_1 & y_2 & z_1
/// x_1 & y_2 & z_2
/// x_2 & y_1 & z_1
/// x_2 & y_1 & z_2
/// x_2 & y_2 & z_1
/// x_2 & y_2 & z_2
///
/// The sender specifies a vector of `spot_prices` for the assets of the new pool, in the
/// order as described above.
///
/// Depending on the values in the `spot_prices`, the transaction will transfer different
/// amounts of each outcome to the pool. The sender specifies a maximum `amount` of outcome
/// tokens to spend.
///
/// Unlike in the `deploy_pool` extrinsic, the sender need not acquire the outcome tokens
/// themselves. Instead, all they need is `amount` units of collateral.
///
/// Deploying the pool will cost the signer an additional fee to the tune of the
/// collateral's existential deposit. This fee is placed in the pool account and ensures
/// that swap fees can be stored in the pool account without triggering dusting or failed
/// transfers.
///
/// The `force_max_work` parameter can be used to force the `CombinatorialTokensApi` to
/// spend the maximum amount of work, independently of the parameters that it is called
/// with. This is useful for benchmarking purposes and should not be used in production.
///
/// # Complexity
///
/// `O(n)` where `n` is the number of splits required to create the pool.
pub fn deploy_combinatorial_pool(
origin: OriginFor<T>,
asset_count: AssetIndexType,
market_ids: Vec<MarketIdOf<T>>,
amount: BalanceOf<T>,
spot_prices: Vec<BalanceOf<T>>,
swap_fee: BalanceOf<T>,
force_max_work: bool,
) -> DispatchResult;
```
The old extrinsics are only modified in that they will use the newly implemented combinatorial math for calculations.
#### Numerical Issues
One issue when calculating the formulas above is the exponential function and how easily it can over or underflow. We solve this issue simply by rejecting any trade that would result in such in any image of $\exp$ becoming too large or small. This essentially results in us rejecting any trade that pushes the price of any asset in the pool above or below a certain theshold.
This might sound like a problem upon first glimpse, as it looks like this could "deadlock" trading. If a particular asset's price drops very close to the lower limit, it can not effectively be used as the `sell` parameter of a buy operation.
But this is actually not an issue, both in terms of UX as well as in terms of effective information aggregation: If an asset's price drops _this_ low, that means the market deems it very unlikely. If an agent adds this asset to the `sell` parameter of the upcoming combinatorial bet just means that they wish to provide the information that they think that the outcome is even more unlikely. That's... not very helpful. The LPs, who are basically paying for this information, are right to reject that trade.
But what if the agent has some useful information on some other outcomes and their trade is rejected due to their stance on some very cheap asset? They can, instead of adding it to the `sell` parameter, just add it to the `keep` parameter. This won't change the conditions of their trade very much, but will allow the trade to go through without any numerical calculations throwing errors.
#### Merging Existing Pools
In addition to creating combinatorial pools, it should also be possible to merge two standard pools into a combinatorial pool. (The terminology is a bit odd here: The pools are merged, but this done by splitting tokens.) The backend is able to accomodate this process, but will not implement it directly. Instead, this is something to be handled on the frontend.
We must assume that an LP owns two liquidity pools and wants to merge them. Suppose the first pool has the outcome tokens $x_1, \ldots, x_n$ and the second pool has the outcome tokens $y_1, \ldots, y_m$. Denote their prices by $a_1, \ldots, a_n$ and $b_1, \ldots, b_m$, resp. The process of the merge is the following then:
- Withdraw all liquidity from both pools.
- Split the tokens of the first pool and second pool into atoms over the two markets: $z_{ij} = x_i \land y_j$ for $i = 1, \ldots, n$, $j = 1, \ldots, m$.
- Create a new combinatorial pool. The price of $x_i \land y_j$ in the new pool is $a_i \cdot b_j$. Deposit as much liquidity as is available to the user.
The choice of price is justified as follows: The event $x_i$ is represented in the pool as the set of events $z_{ij}$ for $j = 1, \ldots, m$. That means that the sum of the prices $c_{ij}$ of $z_{ij}$ for $j = 1, \ldots, m$ should be equal to the original price $a_i$ of $x_i$. And this is actually the case, as
$$
\sum_j c_{ij} = \sum_j a_ib_j = a_i \sum_j b_j = a_i
$$
thanks to the fact that DLMSR guarantees all prices to sum up to $1$. The same argument yields $\sum_i c_{ij} = b_j$. This means that the pool is created with the exact right prices. Note that this sort of construction is impossible with market makers that don't satisfy the no-arbitrage property.