--- title: Cumulative Weight Calculation tags: iota --- # Cumulative Weight Calculation In the following we will discuss algorithms for calculating the cumulative weight of all transactions in a subgraph in the Tangle, comprised of all ancestors of some vertex $s$. This calculation is needed to perform the weighted random walk phase of the tip selection algorithm in IOTA. ## Definitions Let $G=(V, A)$ be a directed acyclic graph. For each $v \in V$, we define $F(v)$ to be the set of ancestors of $v$ (those $u \in V$ for which there is a path from $u$ to $v$). In the Tangle context, $F(v)$ contains all the *future* transactions of $v$ which approve it. We define $H(v)=1+|F(v)|$ to be the cumulative weight of $v$. In other words, the cumulative weight of a vertex is the number of ancestors it has plus one. ## IRI Algorithm This is the algorithm currently implemented in [IRI](https://github.com/iotaledger/iri/blob/master/src/main/java/com/iota/iri/service/tipselection/impl/CumulativeWeightCalculator.java). Input: A DAG $G=(V,A)$ and a starting vertex $s \in V$. This graph represents the main Tangle. 1. $\sigma =(v_1,\dots,v_n) \gets$ DFS-ordering of $G$ with source $s$ 1. For each $v$ in $\sigma$: 1. Compute $F(v)$ by DFS 1. $H(v) \gets 1+|F(v)|$ 1. (Optimization) Free the memory taken by $F(v)$, as it is no longer used ### Time complexity $O(n^2)$ for a linear Tangle and $O(log(n)\cdot n)$ for a balanced binary tree. ### Space complexity $O(n)$ for storing $F(s)$ as well as $H(v)$. ## DP Algorithm Input: A DAG $G=(V,A)$ and a starting vertex $s \in V$. This graph represents the main Tangle. 1. $\sigma =(v_1,\dots,v_n) \gets$ topological ordering of the [induced subgraph](https://en.wikipedia.org/wiki/Induced_subgraph) $G[F(s)]$ 1. For each $v$ in $\sigma$: 1. $F_v \gets \{ v \}$ 1. For each direct approver $u$ of $v$: 1. $F_v \gets F_v \cup F_u$ 1. $H(v) \gets |F_v|$ ### Time complexity Computing $\sigma$ can be done in $O(n)$ as in the Tangle every node has an outdegree of at most 2 and the loop runs $n$ times. Assuming that the set union has linear running time, this leads to $O(n^2)$ for a linear Tangle and $O(log(n)\cdot n)$ for a balanced binary tree. However, [significant](https://github.com/iotaledger/iri/issues/812) [improvements](https://github.com/iotaledger/iri/issues/813) can be introduced to reduce the set union runtime. ### Space complexity Computing the subgraph and sorting topologically both are $O(n)$ in memory for the Tangle case. However, the ancestor sets $F_v$ can get quite large. Let us define $S=\sum_{v\in \sigma} |F_v|$, the sum of the cardinalities of all ancestor sets. A naive worst case analysis implies an upper bound of $S \leq n^2$, since each vertex has at most $n$ ancestors. This leads to a space complexity of $O(n^2)$. However, we believe that actual bound is significantly lower, since we didn't take into account the tangle property that each vertex has at most 2 direct children. ### Further discussion 1. We are looking to show that this algorithm is not memory-prohibitive for practical use in the tangle. To achieve that, we need to estimate the size of $F(s)$ for a characteristic selected start vertex, which we have not yet done. This should be considered both theoretically and by looking at real tangle behavior. 1. Show a better bound for the space complexity than $O(|F(s)|^2)$. 1. For the case of selecting multiple starting points $s_1,s_2,\dots$ the algorithm can be modified in the following way: calculate the ancestors for each of the starting positions, and run the rest of the algorithm on the DAG made of $F(s_1) \cup F(s_2) \cup \dots$ Doing so will save a significant amount of redundant calculation. 1. Implementation in IRI would not be hard, but will require proper testing to make sure the memory consumption does not overflow. ## Cumulative Weight Propagation One of the computationally demanding processes when simulating the Tangle is the calculation of the cumulative weight. Since the cumulative of most transactions (txs) increases every time a tx gets revealed an efficient way to keep track of the cumulative weights is by storing the cumulative weight of all vertices in an array. The weight of a tx that gets revealed can then be added to the cumulative weight of all referenced txs by backwards propagation. Nevertheless, these algorithms may still be numerically demanding. An alternative and potentially numerically faster method is presented by the following method where the referenced transactions are stored in a bit vector to allow for efficient set union operations using boolean OR operations. More formally this may be expressed as follows: Let $N$ be the total number of considered txs in the tangle and let $x^i \in \{0,1\}^N$ denote the transactions directly or indirectly approved by tx $i$ with $$x^i_j := \begin{cases} 1,\, \text{for $i=j$,}\\ 1,\, \text{if $j$ is referenced by $i$,}\\ 0,\, \text{if $j$ is not referenced by $i$.} \end{cases}$$ Then, for any new tx $\ell$ approving two previous txs $\ell_1$ and $\ell_2$, the list of transactions referenced by $\ell$ is given by $$x^\ell = x^{\ell_1} \vee x^{\ell_2} \vee {\mathbf{e}_\ell}^T\,,$$ where $\vee$ is the elementwise boolean OR operation and $\mathbf{e}_\ell$ is the $\ell$-th vector of the canonical orthonormal basis. The cumulative weight $H(j)$ of a tx $j$ increases each time a tx $i$ is revealed that references it, i.e. $x^i_j = 1$. It therefore depends on the number of revealed txs and can be iteratively calculated by $$H_{i}(j)=H_{i-1}(j) + x^i_j\,.$$ ### Time complexity Updating the cumulative weights $H(v)$ for all $v \in V$ on each newly revealed tx involves two OR operations of bit vectors of size $N$ and then $O(N)$ operations to update the actual weights. ### Space complexity If no dynamic memory allocation is implemented the vector size is always $N$. Furthermore, since the probability of being left behind may be greater than zero, it should be considered to also remove reference lists of txs that are considered to be left behind.