# Hiding TxO selection for XAND-CT
While considering the selection algorithm, we refer to the following known vulnerability in the Monero selection algorithm.
https://eprint.iacr.org/2017/338.pdf
In summary, selecting ring members uniformly at random from all available TxOs is vulnerable to an attack that largely defeats confidentiality. An old TxO is simply more likely to have already been spent than a new TxO. For this reason, if an attacker guesses that the true input is the most recently created TxO in the ring, the guess is often correct. The paper suggests a different input-selection algorithm that takes into account the actual age distribution of the true TxOs spent in Monero.
However, for Xand, we do not have any data on the distribution of true input TxOs. Hence, we might choose a different algorithm that prevents any age-based analysis. The trick is to use TxOs that are all of nearly the same age. We use the following algorithm for a ring size of $m$:
- First choose $k \leftarrow \{1, \dots, m\}$
- Select the $k-1$ TxOs created just before the true TxO in the order of creation in the blockchain.
- Select the $m-k$ TxOs created just after the true TxO in the order of creation in the blockchain. If $m-k$ TxOs are not available after the true TxO, repeat from the beginning.
This selects TxOs that are as close in age as possible, which rules out age-based analysis. However, it leads to the following problems:
- It is possible that the TxOs created right before or after the true TxO also belong to the same transaction. This will break confidentiality to some extent.
- The following flood-attack is easier with this technique - https://eprint.iacr.org/2019/455.pdf
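
The contiguous selection described above can be sketched as follows. This is a minimal illustration, not the Xand implementation: TxOs are represented by their creation index `0 .. chain_length-1`, the function name and parameters are hypothetical, and retrying when the window runs off the start of the chain is an assumption (the text only mentions running off the end).

```python
import random


def contiguous_ring(true_index, chain_length, m, rng=random):
    """Return m consecutive TxO indices that contain true_index."""
    if not 0 <= true_index < chain_length:
        raise ValueError("true_index is not on the chain")
    if chain_length < m:
        raise ValueError("chain has fewer than m TxOs")
    while True:
        k = rng.randint(1, m)          # position of the true TxO in the ring
        start = true_index - (k - 1)   # k-1 TxOs created just before it
        end = true_index + (m - k)     # m-k TxOs created just after it
        # Assumption: retry with a fresh k if either side is unavailable.
        if start >= 0 and end < chain_length:
            return list(range(start, end + 1))


ring = contiguous_ring(true_index=50, chain_length=100, m=5)
print(ring)  # five consecutive indices, one of which is 50
```

Because every ring is a run of consecutive indices, the ring members differ in age by at most $m-1$ positions.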
### Flood attack
The basic principle of a flood attack is simply to flood the chain with transactions, creating a huge number of TxOs owned by the attacker. When any algorithm selects the hiding inputs, it is highly likely to select a considerable number of them from the set created by the attacker. The attacker can then simply eliminate these from the ring, leaving only the true TxO. In the case of the algorithm suggested above, a flood attack is easier in the sense that even intermittent flooding would contaminate all TxOs created at the same point in time.
The following modified algorithm may be considered instead to resolve the above problem. We will call it an $N$-age-bound random selection:
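
The elimination step can be illustrated with a toy example. The chain positions and the helper name below are hypothetical; the sketch only shows why a contiguous ring that falls inside a flooded stretch of the chain offers no protection.

```python
def surviving_candidates(ring, attacker_owned):
    """Ring members the attacker cannot rule out as the true input."""
    return [txo for txo in ring if txo not in attacker_owned]


# Hypothetical scenario: the attacker floods positions 40..60 of the chain,
# and a single honest TxO (position 50) is created in the middle of the flood.
attacker_owned = set(range(40, 61)) - {50}
ring = [48, 49, 50, 51, 52]  # contiguous ring around the true TxO

print(surviving_candidates(ring, attacker_owned))  # [50]
```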
- We first fix a system-wide integer $N$ that is the bound in the transaction age.
- First choose $k \leftarrow \{1, \dots, N\}$
- Select the $k-1$ TxOs created just before the true TxO in the order of creation in the blockchain. Let the set selected be $\mathcal{P}$
- Select the $N-k$ TxOs created just after the true TxO in the order of creation in the blockchain. If $N-k$ TxOs are not available after the true TxO, repeat from the beginning. Let the set selected be $\mathcal{S}$
- Select $m-1$ TxOs from the set $\mathcal{P} \cup \mathcal{S}$ with a uniform distribution. Repeat this step until there is no true TxO in this set.
From the Monero attack, it seems that $N=1000$ is a good enough choice. A higher $N$ in general resolves the problem of true TxO collision and reduces the problem with the flood attack, but also makes the system more vulnerable to age-based analysis.
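
A minimal sketch of the $N$-age-bound random selection, under the same assumptions as before: TxOs are identified by creation index, the function name and parameters are hypothetical, and a window that runs off either end of the chain triggers a retry. The sketch excludes the true TxO from $\mathcal{P} \cup \mathcal{S}$ by construction, which is one possible reading of the final step of the list above.

```python
import random


def age_bound_ring(true_index, chain_length, m, N, rng=random):
    """Pick m-1 decoys uniformly from a window of N consecutive TxOs."""
    if not 0 <= true_index < chain_length:
        raise ValueError("true_index is not on the chain")
    if not m <= N <= chain_length:
        raise ValueError("need m <= N <= chain_length")
    while True:
        k = rng.randint(1, N)          # position of the true TxO in the window
        start = true_index - (k - 1)   # first element of P
        end = true_index + (N - k)     # last element of S
        if start >= 0 and end < chain_length:
            break
    # P union S: the N-1 window members other than the true TxO.
    candidates = [i for i in range(start, end + 1) if i != true_index]
    decoys = rng.sample(candidates, m - 1)  # uniform, without replacement
    return sorted(decoys + [true_index])


ring = age_bound_ring(true_index=5000, chain_length=10_000, m=5, N=1000)
print(ring)  # five indices spanning at most 1000 consecutive positions
```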
### Closed Set Attack
A more recent attack, called the closed set attack, is described in https://fc19.ifca.ai/preproceedings/69-preproceedings.pdf
The closed set attack is derived from their statistical analysis on CryptoNote-style cryptocurrencies. The attack is proven optimal assuming that no additional information is given. In other words, in terms of the result, the closed set attack is equivalent to a brute-force attack, which exhausts all possible input choices and removes those that are impossible given the constraints imposed by the mixins of each transaction.
This attack is based on the fact that $n$ transaction inputs must use $n$ distinct public-keys as real-spends, since each public-key can only be redeemed once. A set of inputs is called a closed set if the number of inputs equals the number of distinct public-keys included. Hence, we can deduce that all public-keys included in a closed set must be mixins in any input outside of this closed set. In this way, searching for closed sets helps to trace the real-spend of other inputs. Unlike the cascade effect attack, which relies on “chain-reaction analysis” due to zero-mixin inputs, the closed set attack achieves further traceability without relying on any previously traceable inputs.
The closed set attack is an iterative process that finds all possible closed sets among the transaction inputs, removes the public-keys they include from the other inputs, and identifies the inputs that thereby become traceable. The closed set attack can render more inputs traceable. Such an attack can start from any anonymous input and trace back previous transactions. For example, consider four inputs included in transactions $\{\mathsf{tx}_i\}_{i \in [4]}$ and assume that four distinct public-keys $\{pk_j\}_{j\in[4]}$ appear in their input sets. Let
$$\begin{aligned}
\mathsf{input}_1 &= \{pk_1, pk_2, pk_3\} \\
\mathsf{input}_2 &= \{pk_2, pk_3\} \\
\mathsf{input}_3 &= \{pk_1, pk_3\} \\
\mathsf{input}_4 &= \{pk_1, pk_2, pk_3, pk_4\}
\end{aligned}$$
Note that there cannot be any other transaction input composed only of public-keys among $\{pk_j\}_{j\in[4]}$. Otherwise, the design principle of Monero that each output can be redeemed only once would be broken.
Although we cannot make all of the aforementioned inputs traceable, we can trace the real-spend of one of them. Specifically, consider the set $S = \{\mathsf{input}_i\}_{i\in[3]}$. The union of all distinct public-keys included in it is $\{pk_1, pk_2, pk_3\}$. Clearly, the size of $S$ equals the number of distinct public-keys included in it, so it is a closed set. Since each output can be spent only once, each output $pk_j$ ($j \in [3]$) must be the real-spend of one of $\mathsf{tx}_1, \mathsf{tx}_2, \mathsf{tx}_3$. In this way, we can deduce that the real-spend of transaction 4 must be $pk_4$.
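
The example above can be reproduced with a brute-force sketch of the closed set search. This is an illustration written for this note, not the algorithm from the paper: it enumerates every subset of inputs, so it is only usable on tiny examples, and the function and input names are hypothetical.

```python
from itertools import combinations


def closed_set_attack(inputs):
    """Return {input name: real-spend} for inputs traced via closed sets.

    `inputs` maps an input name to its ring of public-keys.
    """
    rings = {name: set(keys) for name, keys in inputs.items()}
    names = list(rings)
    changed = True
    while changed:
        changed = False
        for size in range(1, len(names) + 1):
            for subset in combinations(names, size):
                keys = set().union(*(rings[name] for name in subset))
                if len(keys) != size:
                    continue  # not a closed set
                # Every key of a closed set is spent inside it, so it can
                # only be a mixin in the inputs outside the set.
                for other in names:
                    if other not in subset and rings[other] & keys:
                        rings[other] -= keys
                        changed = True
    return {name: next(iter(keys))
            for name, keys in rings.items() if len(keys) == 1}


example = {
    "input_1": {"pk_1", "pk_2", "pk_3"},
    "input_2": {"pk_2", "pk_3"},
    "input_3": {"pk_1", "pk_3"},
    "input_4": {"pk_1", "pk_2", "pk_3", "pk_4"},
}
print(closed_set_attack(example))  # {'input_4': 'pk_4'}
```

The first three inputs form the closed set, their keys are stripped from `input_4`, and only `pk_4` remains there. The other three inputs stay ambiguous.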
#### Analysis of vulnerability
In the case of XAND, we choose the $m-1$ hiding TxOs from a window of $N$ consecutive TxOs. We try to estimate the probability of getting a closed set in our chain.
Let us consider a chain with a total of $L$ TxOs and $T$ transactions. The total number of ways $T$ true inputs can be chosen from $L$ TxOs is $\binom{L}{T}$. For each of these choices, the total number of ways to choose $m-1$ inputs is $\left((N-1)\binom{N-2}{m-2} + \binom{N-1}{m-1}\right) = m \binom{N-1}{m-1}$. Hence, the total number of possible chains in terms of links is $m^T \binom{N-1}{m-1}^T\binom{L}{T}$.
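
The simplification of the per-transaction count can be checked in two lines. Assuming $m \ge 2$ and using only the standard rule $\binom{a}{b} = \frac{a}{b}\binom{a-1}{b-1}$:

$$(N-1)\binom{N-2}{m-2} = (m-1)\cdot\frac{N-1}{m-1}\binom{N-2}{m-2} = (m-1)\binom{N-1}{m-1}$$

$$(N-1)\binom{N-2}{m-2} + \binom{N-1}{m-1} = (m-1)\binom{N-1}{m-1} + \binom{N-1}{m-1} = m\binom{N-1}{m-1}$$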
To uniquely count each possible closed set, we define a window of TxOs in order starting from a fixed TxO. Since the starting TxO is fixed and the size is fixed to $N$, this does not overlap with other windows. We now compute the number of possible closed sets of size $n$, i.e. with $n$ transactions, each having a ring of size $m$. Since we are looking at a closed set, exactly $n$ TxOs out of the total $N$ are used in the rings of these transactions. Since the first TxO must be used in every possible set, the number of ways to select the remaining $n-1$ TxOs is $\binom{N-1}{n-1}$. Given each of these selections, we can choose the $m-1$ remaining inputs for each of the $n$ transactions in $\binom{n-1}{m-1}^n$ ways. Hence, the total number of choices for closed sets of size $n$ is $\binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n$.
Now, since the ring size is always $m$, the minimum value of $n$ is $m$. And since the window size is $N$, the maximum value of $n$ is $N$. So, the total number of ways we can have any closed set in one window is $\sum_{n=m}^N \binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n$. In a chain of $L$ TxOs, there are $L-N+1$ such windows. So, the total number of possible ways to form closed sets is $(L-N+1)\sum_{n=m}^N \binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n$.
Now, once we have a closed set, the rest of the chain can be chosen in the regular manner. If we choose a closed set of size $n$, the remaining $T-n$ transactions can be chosen in $m^{T-n} \cdot \binom{N-1}{m-1}^{T-n} \cdot \binom{L-n}{T-n}$ ways. So, the total number of chains of length $L$ with at least one closed set is at most $(L-N+1)\sum_{n=m}^N \binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n \cdot m^{T-n} \cdot \binom{N-1}{m-1}^{T-n} \cdot \binom{L-n}{T-n}$.
Hence, the maximum probability of finding at least one closed set is
$$\begin{aligned}
&\frac{(L-N+1)\sum_{n=m}^N \binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n \cdot m^{T-n} \cdot \binom{N-1}{m-1}^{T-n} \cdot \binom{L-n}{T-n} }{m^T \cdot \binom{N-1}{m-1}^T \cdot \binom{L}{T}} \\
&= (L-N+1)\sum_{n=m}^N \binom{N-1}{n-1}\cdot \binom{n-1}{m-1}^n \cdot m^{-n} \cdot \binom{N-1}{m-1}^{-n} \cdot \frac{\binom{L-n}{T-n}}{\binom{L}{T}}
\end{aligned}$$
We next compare consecutive terms of the sum. Let $f(n)=\binom{N-1}{n-1} \cdot \binom{n-1}{m-1}^n \cdot m^{-n} \cdot \binom{N-1}{m-1}^{-n} \cdot \frac{\binom{L-n}{T-n}}{\binom{L}{T}}$. Using $\binom{n-1}{m-1}/\binom{n}{m-1} = \frac{n+1-m}{n}$ and $\binom{L-n}{T-n}/\binom{L-n-1}{T-n-1} = \frac{L-n}{T-n}$, we get $f(n)/f(n+1) = \frac{\binom{N-1}{n-1}}{\binom{N-1}{n}} \cdot m \cdot \left(\frac{n+1-m}{n}\right)^n \cdot \frac{\binom{N-1}{m-1}}{\binom{n}{m-1}} \cdot \frac{L-n}{T-n}$.
Now, we have $n \ge m > 1$. It is true that $\frac{\binom{N-1}{m-1}}{\binom{n}{m-1}} \ge 1$ since $N > n$ whenever the ratio $f(n)/f(n+1)$ can be calculated in our domain. Also, $\frac{L-n}{T-n} > 1$ since $L>T$. The remaining factors, $\frac{\binom{N-1}{n-1}}{\binom{N-1}{n}} = \frac{n}{N-n}$ and $m \cdot \left(\frac{n+1-m}{n}\right)^n$, can be smaller than $1$ (at $n=m$ the latter equals $m^{1-m}$), so the comparison depends on the parameters. For the values used below, $m=5$ and $N=1000$, the ratio at $n=m$ is $\frac{5}{995} \cdot 5^{-4} \cdot \frac{1}{5}\binom{999}{4} \cdot \frac{L-5}{T-5} > 6 \times 10^{4}$, because the factor $\frac{\binom{N-1}{m-1}}{\binom{n}{m-1}}$ dominates for small $n$.
The terms therefore fall off steeply as $n$ grows from $m$, and for these parameter values the largest term is the one at $n=m$. So, since we are looking for an upper bound, we bound each term under the sum by its maximum value, i.e. its value at $n=m$.
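When $n=m$, the term under the sum is less than $m^{-m}\binom{N-1}{m-1}^{-m+1}$, since $\binom{m-1}{m-1} = 1$ and $\binom{L-m}{T-m}/\binom{L}{T} < 1$. Assuming $m=5$ and $N=1000$, each term of the sum is therefore less than $5^{-5}\binom{999}{4}^{-4} \approx 1.1 \times 10^{-46}$. The sum has $N-m+1 = 996$ terms, so, assuming $L=10^{13}$, the probability of having at least one closed set is at most about $10^{13} \cdot 996 \cdot 1.1 \times 10^{-46} \approx 1.1 \times 10^{-30}$, i.e. on the order of $10^{-30}$.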
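
The arithmetic can be double-checked numerically. The sketch below works in logarithms to avoid underflow; the function names are hypothetical, and the value of $T$ is an assumption made only for the check (the text does not fix it). It evaluates the bound $(L-N+1)(N-m+1)\,m^{-m}\binom{N-1}{m-1}^{1-m}$ and also checks which term $f(n)$ of the sum is largest.

```python
from math import comb, log, log10


def log10_closed_set_bound(m, N, L):
    """log10 of (L-N+1) * (N-m+1) * m^-m * C(N-1, m-1)^(1-m)."""
    per_term = -m * log10(m) - (m - 1) * log10(comb(N - 1, m - 1))
    return log10(L - N + 1) + log10(N - m + 1) + per_term


def log_f(n, m, N, L, T):
    """Natural log of the n-th term f(n) of the sum."""
    # C(L-n, T-n) / C(L, T) is the product of (T-i)/(L-i) for i = 0..n-1.
    tail = sum(log((T - i) / (L - i)) for i in range(n))
    return (log(comb(N - 1, n - 1)) + n * log(comb(n - 1, m - 1))
            - n * log(m) - n * log(comb(N - 1, m - 1)) + tail)


m, N, L = 5, 1000, 10**13
T = L // 2  # assumption for illustration only: two TxOs per transaction

bound = log10_closed_set_bound(m, N, L)
largest = max(range(m, N + 1), key=lambda n: log_f(n, m, N, L, T))
print(f"log10(bound) = {bound:.2f}, largest term at n = {largest}")
```

With these parameters the logarithm of the bound comes out just above $-30$, and the largest term of the sum is the one at $n=m$.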