--- tags: explainers description: Explaining the swap-or-not shuffle in Ethereum 2.0 image: https://benjaminion.xyz/f/favicon-96x96.png --- # Shuffling in Ethereum 2.0 ![My avatar](https://benjaminion.xyz/f/ms-icon-144x144.png =32x32) Ben Edgington (Protocol Engineering @[ConsenSys](https://consensys.net/) — but views expressed are all my own) _3 September 2020_ ## Introduction If you want to [learn to dance](https://www.youtube.com/watch?v=RECaepj8LkU), you're in the wrong place. But trust me, the Eth2 shuffle is _almost_ as exciting. Shuffling lists is a fundamental operation in Ethereum&nbsp;2.0. It is mainly used to pseudo-randomly assign validators to the committees that operate during every 12 second slot, but also to select the beacon chain block proposer at each slot. Shuffling seems fairly simple. Although there are [pitfalls](https://www.developer.com/tech/article.php/616221/How-We-Learned-to-Cheat-at-Online-Poker-A-Study-in-Software-Security.htm) to watch out for, it is a pretty well understood problem in computer science. The gold standard is probably the [Fisher--Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle). So why aren't we using that for Eth2? Well, I'll explain more at the [end](#Why-swap-or-not) of the article, but in short: light clients. Rather than Fisher--Yates, we're using something called a "swap-or-not" shuffle. This is based on the work in [this paper](https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf), intended originally for building encryption schemes. I recently [re-wrote](https://github.com/PegaSysEng/teku/blob/0.12.3/ethereum/datastructures/src/main/java/tech/pegasys/teku/datastructures/util/CommitteeUtil.java#L109) our implementation in the [Teku](https://pegasys.tech/teku/) Eth2 client, so thought I'd write it up while it's still fresh in my mind. ## The Swap-or-Not Shuffle ### A single round The shuffle proceeds in rounds. Each round is the same, and I'll describe just one round below. It's much simpler than it looks :slightly_smiling_face: #### 1. Choose a pivot and find the first mirror index First, we pick a pivot index `p`. This is pseudorandomly chosen, based on the round number and some other seed data. The pivot is fixed for the rest of the round. With this pivot, we then pick a mirror index `m1` halfway between `p` and `0`. That is, `m1 = p / 2`. (We are going to ignore pesky off-by-one rounding issues for the purposes of this explanation.) <p style="text-align:center"> <img width="776" height="115" alt="The pivot and the first mirror index." src="https://benjaminion.xyz/images/200821_shuffling_0.png"><br> <i>The pivot and the first mirror index.</i> </p> #### 2. Traverse first mirror to pivot, swapping or not For each index between the mirror index `m1` and the pivot index `p`, we randomly decide whether we are going to swap the element or not. Consider the element at index `i1`. If we choose not to swap it, we just move on to consider the next index. If do we decide to swap, then we exchange the list element at `i1` with that at `i1'`, its image in the mirror index. That is, `i1` is swapped with `i1' = m1 - (i1 - m1)`, so that `i1` and `i1'` are equidistant from `m1`. We make the same swap-or-not decision for each index between `m1` and `p`. <p style="text-align:center"> <img width="776" height="151" alt="Swapping or not from the first mirror up to the pivot." src="https://benjaminion.xyz/images/200821_shuffling_1.png"><br> <i>Swapping or not from the first mirror up to the pivot.</i> </p> #### 3. Calculate the second mirror index After doing all the indices from `m1` to `p`, mirroring in `m1`, we now find a second mirror index at `m2`, which is the point equidistant from `p` and the end of the list. In fact `m2 = m1 + n / 2`. <p style="text-align:center"> <img width="776" height="142" alt="The second mirror index." src="https://benjaminion.xyz/images/200821_shuffling_2.png"><br> <i>The second mirror index.</i> </p> #### 4. Traverse pivot to second mirror, swapping or not Finally, we repeat the swap-or-not process, considering all the points from the pivot, `p` to the second mirror `m2`. If we choose not to swap, we just move on. If we choose to swap then we exchange the element at `j1` with it's image at `j1'` in the mirror index `m2`. <p style="text-align:center"> <img width="776" height="160" alt="Swapping or not from the pivot to the second mirror." src="https://benjaminion.xyz/images/200821_shuffling_3.png"><br> <i>Swapping or not from the pivot to the second mirror.</i> </p> #### Putting it all together At the end of the round, we have considered all the indices between `m1` and `m2`, which is half of the total indices, and for each index either swapped it or not with a distinct index in the other half. So, all of the indices have been considered exactly once for swapping. The next round begins by incrementing (or decrementing) the round number, which gives us a new pivot index, and off we go again. <p style="text-align:center"> <img width="776" height="179" alt="The whole process running from one mirror to the other." src="https://benjaminion.xyz/images/200821_shuffling_4.png"><br> <i>The whole process running from one mirror to the other in a single round.</i> </p> ### Interesting things #### Clever stuff When deciding whether to swap or not for each index, the algorithm cleverly bases its decision on the higher of the candidate index or its image in the mirror. That is, `i_1` rather than `i_1'` (when below the pivot), and `i_k'` rather than `i_k` (when above the pivot). This means that we have flexibility when running through the indices of the list: we could do `0` to `m1` and `p` to `m2` as two separate loops, or do it with a single loop from `m1` to `m2` as I outlined above (and [implemented](https://github.com/PegaSysEng/teku/blob/0.12.3/ethereum/datastructures/src/main/java/tech/pegasys/teku/datastructures/util/CommitteeUtil.java#L109)). The result will be the same: it doesn't matter if we are considering `i_1` or its image `i_1'`; the decision as to whether to swap or not has the same outcome. #### The number of rounds In Eth2 we do 90 rounds of the above. The [original paper](https://link.springer.com/content/pdf/10.1007%2F978-3-642-32009-5_1.pdf) suggests that $6\lg{N}$ rounds is required "to start to see a good bound on CCA-security", where $N$ is the list length. In his [annotated spec](https://github.com/ethereum/annotated-spec/blob/master/phase0/beacon-chain.md) Vitalik says "Expert cryptographer advice told us ~$4\log_2{N}$ is sufficient for safety". The absolute maximum number of validators in Eth2, thus the maximum size of the list we need to shuffle, is about $2^{22}$ (4.2 million). On Vitalik's estimate that gives us 88 rounds, on the paper's estimate, 92 rounds (assuming that $\lg$ is the natural logarithm). So we are in the right ballpark, especially as we are very, very unlikely to end up with that many active validators. It might be interesting to make the number of rounds adaptive based on list length. But we don't do that; it's probably an optimisation too far. Fun fact: when Least Authority audited the beacon chain specification, they initially found bias in the shuffling used for selecting block proposers (see [Issue F](https://leastauthority.wpengine.com/static/publications/LeastAuthority-Ethereum-2.0-Specifications-Audit-Report.pdf)). This turned out to be due to mistakenly using a configuration that had only 10 rounds of shuffling. When they increased it to the 90 we use for mainnet, the bias no longer appeared. #### (Pseudo) randomness The algorithm requires that we select a pivot point randomly in each round, and randomly choose whether to swap each element or not in each round. In Eth2, we deterministically generate the "randomness" from a seed value, such that the same seed will always generate the same shuffling. The pivot index is generated from eight bytes of a SHA2 hash of the seed concatenated with the round number, so it usually changes each round. The decision bits used to determine whether or not to swap elements are bits drawn from SHA256 hashes of the seed, the round number, and the index of the element within the list. #### Efficiency This shuffling algorithm is much slower than Fisher--Yates. That algorithm requires $N$ swaps. Our algorithm will require $90N/4$ swaps on average. We should also consider the generation of pseudo-randomness, which is the most expensive part of the algorithm. Fisher--Yates needs something like $N\log_2{N}$ bits of randomness, and we need $90(\log_2{N} + N/2)$ bits, which, for the range of $N$ we need in Eth2, is many more bits (about twice as many when $N$ is a million). ## Why swap-or-not? Why use such an inefficient implementation? #### Shuffling single elements The brilliance is that, if we are interested in only a few indices, we _do not need to compute the shuffling of the whole list_. In fact, we can apply the algorithm to a single index to find out which index it will be swapped with. So, if we want to know where the element with index 217 gets shuffled to, we can run the algorithm with only that index; we do not need to shuffle the whole list. Moreover, if we want to know the converse, which element gets shuffled into index 217, we can just run the algorithm backwards for element 217 (backwards means running the round number from high to low rather than low to high). In summary, we can compute the destination of element `i` in $O(1)$, and the source of element `i` (the inverse operation) also in $O(1)$, not dependent on the length of the list. Shuffles like the Fisher--Yates shuffle do not have this property and cannot work with single indices, they always need to iterate the whole list. Being applied to a single index is how the algorithm is presented in the [Eth2 specifications](https://github.com/ethereum/eth2.0-specs/blob/v0.12.2/specs/phase0/beacon-chain.md#compute_shuffled_index). In fact, shuffling the whole list at once is just an optimisation! We could simply run the shuffle for each element of the list in turn if we wished: run the (inverse) shuffle to find out which element ends up at index 0; re-run the shuffle to find out which element ends up at index 1, and so on. We don't do this, but only because the swap-or-not decision uses a hash that generates 256 bits at a time and it is wasteful to throw away 255 bits. If we used a 1-bit hash or oracle it would be very nearly as efficient to shuffle the list one element at a time as to shuffle the whole list at once. #### Keeping light clients light The reason that this property is useful is all about light clients. Light clients are observers of the Eth2 beacon and shard chains that do not store the entire state, but do wish to be able to securely access data on the chains. As part of verifying that they have the correct data---that no-one has lied to them---it may be necessary to compute the committees that attested to that data. This means shuffling, and we don't want light clients to have to hold and shuffle the entire list of validators. By using the swap-or-not shuffle, they are able to compute only the small number of committee members that they are interested in, which should be vastly more efficient overall. #### History If you like GitHub archaeology (as I do), you can see the initial discussion about the search for a good shuffling algorithm for Eth2 [here](https://github.com/ethereum/eth2.0-specs/issues/323), and the announcement of the winner [here](https://github.com/ethereum/eth2.0-specs/issues/563). And for a different depiction of the swap-or-not shuffle, take a look at the estimable Protolambda's [more visual explanation](https://github.com/protolambda/eth2-docs#shuffling). ## And finally... Here's a picture of me at EthCC in 2019 implementing our first version of the swap-or-not shuffle in Teku (known as Artemis back then), while listening to Justin Drake describing the swap-or-not shuffle. Such meta :slightly_smiling_face: <p style="text-align:center"> <a href="https://twitter.com/benjaminion_xyz/status/1102867104020221953"><img width="384" height="512" alt="Implementing swap-or-not at EthCC." src="https://pbs.twimg.com/media/D04rx1yX4AA0c2R?format=jpg&name=large"></a> </p>