3 September 2020
If you want to learn to dance, you're in the wrong place. But trust me, the Eth2 shuffle is almost as exciting.
Shuffling lists is a fundamental operation in Ethereum 2.0. It is mainly used to pseudo-randomly assign validators to the committees that operate during every 12 second slot, but also to select the beacon chain block proposer at each slot.
Shuffling seems fairly simple. Although there are pitfalls to watch out for, it is a pretty well understood problem in computer science. The gold standard is probably the Fisher–Yates shuffle. So why aren't we using that for Eth2? Well, I'll explain more at the end of the article, but in short: light clients.
Rather than Fisher–Yates, we're using something called a "swap-or-not" shuffle. This is based on the work in this paper, intended originally for building encryption schemes. I recently re-wrote our implementation in the Teku Eth2 client, so thought I'd write it up while it's still fresh in my mind.
The shuffle proceeds in rounds. Each round is the same, and I'll describe just one round below. It's much simpler than it looks
First, we pick a pivot index p
. This is pseudorandomly chosen, based on the round number and some other seed data. The pivot is fixed for the rest of the round.
With this pivot, we then pick a mirror index m1
halfway between p
and 0
. That is, m1 = p / 2
. (We are going to ignore pesky off-by-one rounding issues for the purposes of this explanation.)
For each index between the mirror index m1
and the pivot index p
, we randomly decide whether we are going to swap the element or not.
Consider the element at index i1
. If we choose not to swap it, we just move on to consider the next index.
If do we decide to swap, then we exchange the list element at i1
with that at i1'
, its image in the mirror index. That is, i1
is swapped with i1' = m1 - (i1 - m1)
, so that i1
and i1'
are equidistant from m1
.
We make the same swap-or-not decision for each index between m1
and p
.
After doing all the indices from m1
to p
, mirroring in m1
, we now find a second mirror index at m2
, which is the point equidistant from p
and the end of the list. In fact m2 = m1 + n / 2
.
Finally, we repeat the swap-or-not process, considering all the points from the pivot, p
to the second mirror m2
. If we choose not to swap, we just move on. If we choose to swap then we exchange the element at j1
with it's image at j1'
in the mirror index m2
.
At the end of the round, we have considered all the indices between m1
and m2
, which is half of the total indices, and for each index either swapped it or not with a distinct index in the other half. So, all of the indices have been considered exactly once for swapping.
The next round begins by incrementing (or decrementing) the round number, which gives us a new pivot index, and off we go again.
When deciding whether to swap or not for each index, the algorithm cleverly bases its decision on the higher of the candidate index or its image in the mirror. That is, i_1
rather than i_1'
(when below the pivot), and i_k'
rather than i_k
(when above the pivot). This means that we have flexibility when running through the indices of the list: we could do 0
to m1
and p
to m2
as two separate loops, or do it with a single loop from m1
to m2
as I outlined above (and implemented). The result will be the same: it doesn't matter if we are considering i_1
or its image i_1'
; the decision as to whether to swap or not has the same outcome.
In Eth2 we do 90 rounds of the above. The original paper suggests that
It might be interesting to make the number of rounds adaptive based on list length. But we don't do that; it's probably an optimisation too far.
Fun fact: when Least Authority audited the beacon chain specification, they initially found bias in the shuffling used for selecting block proposers (see Issue F). This turned out to be due to mistakenly using a configuration that had only 10 rounds of shuffling. When they increased it to the 90 we use for mainnet, the bias no longer appeared.
The algorithm requires that we select a pivot point randomly in each round, and randomly choose whether to swap each element or not in each round.
In Eth2, we deterministically generate the "randomness" from a seed value, such that the same seed will always generate the same shuffling.
The pivot index is generated from eight bytes of a SHA2 hash of the seed concatenated with the round number, so it usually changes each round.
The decision bits used to determine whether or not to swap elements are bits drawn from SHA256 hashes of the seed, the round number, and the index of the element within the list.
This shuffling algorithm is much slower than Fisher–Yates. That algorithm requires
We should also consider the generation of pseudo-randomness, which is the most expensive part of the algorithm. Fisher–Yates needs something like
Why use such an inefficient implementation?
The brilliance is that, if we are interested in only a few indices, we do not need to compute the shuffling of the whole list. In fact, we can apply the algorithm to a single index to find out which index it will be swapped with.
So, if we want to know where the element with index 217 gets shuffled to, we can run the algorithm with only that index; we do not need to shuffle the whole list. Moreover, if we want to know the converse, which element gets shuffled into index 217, we can just run the algorithm backwards for element 217 (backwards means running the round number from high to low rather than low to high).
In summary, we can compute the destination of element i
in i
(the inverse operation) also in
Being applied to a single index is how the algorithm is presented in the Eth2 specifications. In fact, shuffling the whole list at once is just an optimisation! We could simply run the shuffle for each element of the list in turn if we wished: run the (inverse) shuffle to find out which element ends up at index 0; re-run the shuffle to find out which element ends up at index 1, and so on. We don't do this, but only because the swap-or-not decision uses a hash that generates 256 bits at a time and it is wasteful to throw away 255 bits. If we used a 1-bit hash or oracle it would be very nearly as efficient to shuffle the list one element at a time as to shuffle the whole list at once.
The reason that this property is useful is all about light clients. Light clients are observers of the Eth2 beacon and shard chains that do not store the entire state, but do wish to be able to securely access data on the chains. As part of verifying that they have the correct data–-that no-one has lied to them–-it may be necessary to compute the committees that attested to that data. This means shuffling, and we don't want light clients to have to hold and shuffle the entire list of validators. By using the swap-or-not shuffle, they are able to compute only the small number of committee members that they are interested in, which should be vastly more efficient overall.
If you like GitHub archaeology (as I do), you can see the initial discussion about the search for a good shuffling algorithm for Eth2 here, and the announcement of the winner here.
And for a different depiction of the swap-or-not shuffle, take a look at the estimable Protolambda's more visual explanation.
Here's a picture of me at EthCC in 2019 implementing our first version of the swap-or-not shuffle in Teku (known as Artemis back then), while listening to Justin Drake describing the swap-or-not shuffle. Such meta
Learn More →