Try   HackMD

Shuffling in Ethereum 2.0

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Ben Edgington (Protocol Engineering @ConsenSys — but views expressed are all my own)

3 September 2020

Introduction

If you want to learn to dance, you're in the wrong place. But trust me, the Eth2 shuffle is almost as exciting.

Shuffling lists is a fundamental operation in Ethereum 2.0. It is mainly used to pseudo-randomly assign validators to the committees that operate during every 12 second slot, but also to select the beacon chain block proposer at each slot.

Shuffling seems fairly simple. Although there are pitfalls to watch out for, it is a pretty well understood problem in computer science. The gold standard is probably the FisherYates shuffle. So why aren't we using that for Eth2? Well, I'll explain more at the end of the article, but in short: light clients.

Rather than FisherYates, we're using something called a "swap-or-not" shuffle. This is based on the work in this paper, intended originally for building encryption schemes. I recently re-wrote our implementation in the Teku Eth2 client, so thought I'd write it up while it's still fresh in my mind.

The Swap-or-Not Shuffle

A single round

The shuffle proceeds in rounds. Each round is the same, and I'll describe just one round below. It's much simpler than it looks

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

1. Choose a pivot and find the first mirror index

First, we pick a pivot index p. This is pseudorandomly chosen, based on the round number and some other seed data. The pivot is fixed for the rest of the round.

With this pivot, we then pick a mirror index m1 halfway between p and 0. That is, m1 = p / 2. (We are going to ignore pesky off-by-one rounding issues for the purposes of this explanation.)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

The pivot and the first mirror index.

2. Traverse first mirror to pivot, swapping or not

For each index between the mirror index m1 and the pivot index p, we randomly decide whether we are going to swap the element or not.

Consider the element at index i1. If we choose not to swap it, we just move on to consider the next index.

If do we decide to swap, then we exchange the list element at i1 with that at i1', its image in the mirror index. That is, i1 is swapped with i1' = m1 - (i1 - m1), so that i1 and i1' are equidistant from m1.

We make the same swap-or-not decision for each index between m1 and p.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Swapping or not from the first mirror up to the pivot.

3. Calculate the second mirror index

After doing all the indices from m1 to p, mirroring in m1, we now find a second mirror index at m2, which is the point equidistant from p and the end of the list. In fact m2 = m1 + n / 2.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

The second mirror index.

4. Traverse pivot to second mirror, swapping or not

Finally, we repeat the swap-or-not process, considering all the points from the pivot, p to the second mirror m2. If we choose not to swap, we just move on. If we choose to swap then we exchange the element at j1 with it's image at j1' in the mirror index m2.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Swapping or not from the pivot to the second mirror.

Putting it all together

At the end of the round, we have considered all the indices between m1 and m2, which is half of the total indices, and for each index either swapped it or not with a distinct index in the other half. So, all of the indices have been considered exactly once for swapping.

The next round begins by incrementing (or decrementing) the round number, which gives us a new pivot index, and off we go again.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

The whole process running from one mirror to the other in a single round.

Interesting things

Clever stuff

When deciding whether to swap or not for each index, the algorithm cleverly bases its decision on the higher of the candidate index or its image in the mirror. That is, i_1 rather than i_1' (when below the pivot), and i_k' rather than i_k (when above the pivot). This means that we have flexibility when running through the indices of the list: we could do 0 to m1 and p to m2 as two separate loops, or do it with a single loop from m1 to m2 as I outlined above (and implemented). The result will be the same: it doesn't matter if we are considering i_1 or its image i_1'; the decision as to whether to swap or not has the same outcome.

The number of rounds

In Eth2 we do 90 rounds of the above. The original paper suggests that 6lgN rounds is required "to start to see a good bound on CCA-security", where N is the list length. In his annotated spec Vitalik says "Expert cryptographer advice told us ~4log2N is sufficient for safety". The absolute maximum number of validators in Eth2, thus the maximum size of the list we need to shuffle, is about 222 (4.2 million). On Vitalik's estimate that gives us 88 rounds, on the paper's estimate, 92 rounds (assuming that lg is the natural logarithm). So we are in the right ballpark, especially as we are very, very unlikely to end up with that many active validators.

It might be interesting to make the number of rounds adaptive based on list length. But we don't do that; it's probably an optimisation too far.

Fun fact: when Least Authority audited the beacon chain specification, they initially found bias in the shuffling used for selecting block proposers (see Issue F). This turned out to be due to mistakenly using a configuration that had only 10 rounds of shuffling. When they increased it to the 90 we use for mainnet, the bias no longer appeared.

(Pseudo) randomness

The algorithm requires that we select a pivot point randomly in each round, and randomly choose whether to swap each element or not in each round.

In Eth2, we deterministically generate the "randomness" from a seed value, such that the same seed will always generate the same shuffling.

The pivot index is generated from eight bytes of a SHA2 hash of the seed concatenated with the round number, so it usually changes each round.

The decision bits used to determine whether or not to swap elements are bits drawn from SHA256 hashes of the seed, the round number, and the index of the element within the list.

Efficiency

This shuffling algorithm is much slower than FisherYates. That algorithm requires N swaps. Our algorithm will require 90N/4 swaps on average.

We should also consider the generation of pseudo-randomness, which is the most expensive part of the algorithm. FisherYates needs something like Nlog2N bits of randomness, and we need 90(log2N+N/2) bits, which, for the range of N we need in Eth2, is many more bits (about twice as many when N is a million).

Why swap-or-not?

Why use such an inefficient implementation?

Shuffling single elements

The brilliance is that, if we are interested in only a few indices, we do not need to compute the shuffling of the whole list. In fact, we can apply the algorithm to a single index to find out which index it will be swapped with.

So, if we want to know where the element with index 217 gets shuffled to, we can run the algorithm with only that index; we do not need to shuffle the whole list. Moreover, if we want to know the converse, which element gets shuffled into index 217, we can just run the algorithm backwards for element 217 (backwards means running the round number from high to low rather than low to high).

In summary, we can compute the destination of element i in O(1), and the source of element i (the inverse operation) also in O(1), not dependent on the length of the list. Shuffles like the FisherYates shuffle do not have this property and cannot work with single indices, they always need to iterate the whole list.

Being applied to a single index is how the algorithm is presented in the Eth2 specifications. In fact, shuffling the whole list at once is just an optimisation! We could simply run the shuffle for each element of the list in turn if we wished: run the (inverse) shuffle to find out which element ends up at index 0; re-run the shuffle to find out which element ends up at index 1, and so on. We don't do this, but only because the swap-or-not decision uses a hash that generates 256 bits at a time and it is wasteful to throw away 255 bits. If we used a 1-bit hash or oracle it would be very nearly as efficient to shuffle the list one element at a time as to shuffle the whole list at once.

Keeping light clients light

The reason that this property is useful is all about light clients. Light clients are observers of the Eth2 beacon and shard chains that do not store the entire state, but do wish to be able to securely access data on the chains. As part of verifying that they have the correct data-that no-one has lied to them-it may be necessary to compute the committees that attested to that data. This means shuffling, and we don't want light clients to have to hold and shuffle the entire list of validators. By using the swap-or-not shuffle, they are able to compute only the small number of committee members that they are interested in, which should be vastly more efficient overall.

History

If you like GitHub archaeology (as I do), you can see the initial discussion about the search for a good shuffling algorithm for Eth2 here, and the announcement of the winner here.

And for a different depiction of the swap-or-not shuffle, take a look at the estimable Protolambda's more visual explanation.

And finally

Here's a picture of me at EthCC in 2019 implementing our first version of the swap-or-not shuffle in Teku (known as Artemis back then), while listening to Justin Drake describing the swap-or-not shuffle. Such meta

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →