How to Store a Permutation Compactly

Bram Cohen and Dan Boneh

Feb. 2022

In this note we describe a compact way to store a permutation that still supports fast evaluation and inversion of the permutation. This question comes up naturally when optimizing the software underlying the Chia proof-of-space blockchain. Chia replaces Nakamoto's energy-hungry proof-of-work consensus with an eco-friendly proof-of-space.

We end the note with an open problem that we don't know how to solve.

How to store a permutation compactly

A permutation on the set $[n] := \{0, 1, 2, \dots, n - 1\}$ is a one-to-one function from $[n]$ to $[n]$. The group of all $n!$ permutations on $[n]$ is denoted by $S_{n}$. Here is an example permutation in $S_{16}$ that maps $0 \to 9$, $1 \to 14$, $2 \to 12$, $3 \to 2$, etc.:

π_{1} := \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 9 & 14 & 12 & 2 & 7 & 10 & 13 & 3 & 1 & 15 & 0 & 6 & 4 & 11 & 5 & 8 \end{pmatrix} \qquad (1)

We say that $D$ is a data structure for storing a permutation if $D$ supports the following interface:

$process(π) \to d_{π}$: process the given permutation $π \in S_{n}$ and output its compact representation $d_{π}$,
$eval(d_{π}, x) \to y$: on input $d_{π}$ and $x \in [n]$, output $y = π(x)$.

We require that $eval(\cdot, \cdot)$ be fast, meaning that its running time should be at most $O(\log n)$. Algorithm $process(\cdot)$ can run in time polynomial in $n$.

Our goal is to design a data structure for storing a permutation where the worst case size of $d_{π}$, measured in bits, is as small as possible. By worst case we mean worst case over all $π \in S_{n}$. For extra bonus points we also want the data structure to support fast inversion:

$invert(d_{π}, y) \to x$: on input $d_{π}$ and $y \in [n]$, output $x = π^{-1}(y)$.

This algorithm should similarly run in time at most $O(\log n)$.

The trivial solution

Suppose $n$ is a power of 2. The trivial data structure for storing a permutation $π \in S_{n}$ simply lists the elements of the permutation in order. That is,

$process(π)$ outputs $d_{π} := [π(0), π(1), \dots, π(n - 1)]$, and
$eval(d_{π}, x)$ outputs $d_{π}[x]$ for $x \in [n]$.

For example, for the permutation $π_{1}$ in (1) above, writing each entry as a 4-bit string, we have

d_{π_{1}} = [1001 1110 1100 0010 0111 1010 1101 0011 0001 1111 0000 0110 0100 1011 0101 1000] .

This trivial data structure has the following properties:

the length of $d_{π}$ is always $n \log_{2} n$ bits, and
$eval(\cdot, \cdot)$ runs in constant time.
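
The trivial data structure can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names `process` and `evaluate` mirror the interface above, the packed array is held in a single Python integer, and `PI_1` is the example permutation from (1). Shifts on Python's big integers are not truly constant time, so the sketch shows the layout rather than the running time.

```python
def process(perm):
    """Pack perm[0], perm[1], ... into one integer, k = log2(n) bits per entry."""
    n = len(perm)
    k = n.bit_length() - 1                 # assumes n is a power of two
    assert n == 1 << k and sorted(perm) == list(range(n))
    d = 0
    for x, y in enumerate(perm):
        d |= y << (k * x)                  # entry x occupies bits [k*x, k*x + k)
    return d, k

def evaluate(d_pi, x):
    """Return pi(x) by extracting the x-th k-bit field."""
    d, k = d_pi
    return (d >> (k * x)) & ((1 << k) - 1)

# The example permutation pi_1 from (1).
PI_1 = [9, 14, 12, 2, 7, 10, 13, 3, 1, 15, 0, 6, 4, 11, 5, 8]
d_pi1 = process(PI_1)
print([evaluate(d_pi1, x) for x in range(4)])   # first four values of pi_1
```

For $n = 16$ the packed integer never needs more than $16 \cdot 4 = 64$ bits, matching the $n \log_{2} n$ bound.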

This looks quite good. So, are we done?

Well, not quite. Since the number of permutations in $S_{n}$ is $n!$, the number of bits needed to represent a permutation in the worst case is at least

P (n) := ⌈ \log_{2} (n!) ⌉ .

By Stirling's approximation, and the fact that $\log_{2} e$ is about $1.443$, we know that

P (n) = ⌈ \log_{2} (n!) ⌉ = n \log_{2} n - n \log_{2} e + Θ (\log_{2} n) \approx n \log_{2} n - 1.443 n .

This means that we should be able to reduce the space needed to store a permutation by about 1.443 bits per entry over the trivial data structure.
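
To get a feel for the gap, the following sketch computes $P(n)$ exactly and compares it with the $n \log_{2} n$ bits of the trivial data structure. The helper names `P` and `trivial_bits` are ours, chosen only for illustration.

```python
import math

def P(n):
    """Exact value of ceil(log2(n!)), using integer arithmetic only."""
    # For an integer m >= 1, ceil(log2(m)) equals (m - 1).bit_length().
    return (math.factorial(n) - 1).bit_length()

def trivial_bits(n):
    """n * log2(n) bits, for n a power of two."""
    return n * (n.bit_length() - 1)

for n in (16, 256, 4096):
    saved = (trivial_bits(n) - P(n)) / n
    print(f"n={n:5d}  trivial={trivial_bits(n):6d}  P(n)={P(n):6d}  saved per entry={saved:.3f}")
```

The saving per entry stays below $\log_{2} e \approx 1.443$ and approaches it as $n$ grows, since the $Θ(\log_{2} n)$ term is spread over $n$ entries.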

The optimally compressed representation. One can represent a permutation $π \in S_{n}$ as an integer $d_{π} \in \{1, \dots, n!\}$, and this will take at most $P(n)$ bits. While this is more compact than the trivial data structure, we lose the ability to quickly evaluate and invert the permutation.
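
One standard way to obtain such an integer is the Lehmer code (a factorial-base ranking). The note does not say which encoding it has in mind, so treat the sketch below as one possible instance; the function names `rank` and `unrank` are ours, and the rank is 0-based, so it lies in $\{0, \dots, n! - 1\}$ rather than $\{1, \dots, n!\}$.

```python
def rank(perm):
    """Map a permutation of [n] to an integer in [0, n!) via its Lehmer code."""
    n = len(perm)
    remaining = sorted(perm)
    r = 0
    for i, y in enumerate(perm):
        idx = remaining.index(y)      # number of unused values smaller than y
        r = r * (n - i) + idx         # digit idx has radix n - i
        remaining.pop(idx)
    return r

def unrank(r, n):
    """Inverse of rank: rebuild the whole permutation from the integer r."""
    digits = []
    for base in range(1, n + 1):      # peel off digits, least significant first
        r, digit = divmod(r, base)
        digits.append(digit)
    remaining = list(range(n))
    return [remaining.pop(d) for d in reversed(digits)]

PI_1 = [9, 14, 12, 2, 7, 10, 13, 3, 1, 15, 0, 6, 4, 11, 5, 8]
r = rank(PI_1)
print(r.bit_length(), unrank(r, 16) == PI_1)
```

Note that `unrank` has to decode every entry before it can return a single value $π(x)$, which illustrates why this representation alone does not support fast evaluation.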

So, to summarize:

The challenge is to construct a data structure for storing a permutation that takes less space than the trivial data structure, specifically about $1.44 n$ fewer bits, while retaining the ability to quickly evaluate the permutation.

Who cares?

Is saving 1.44 bits per entry worth discussing?

Yes! There is an immediate application. In a Proof-of-Space based blockchain the monetary rewards for those providing proofs are proportional to the amount of space they have. Therefore, compressing the files used to create a Proof-of-Space will lead to increased rewards without the need to buy additional storage. It is a good day when a clever algorithm can increase revenues without increasing cost.

The Chia Proof-of-Space uses a domain of size $n = 2^{32}$. A reduction of $1.44 n$ bits in storage results in a

1.44 n / (n \log_{2} n) = 1.44 / 32 \approx 4.5 \%

increase in rewards, at no additional cost. However, due to the specifics of the Chia Proof-of-Space, one needs to compress a slightly more complicated object than a permutation. Adapting the compression technique presented here to Chia can, in principle, increase revenues by about 0.48% in the limit.

A compact data structure for storing a permutation

Now back to our question. From here on we will assume that $n$ is a power of two, so that $n = 2^{k}$ for some positive integer $k$. Recall that we define $P(n) := ⌈ \log_{2} (n!) ⌉$.

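Our starting point is the classic Beneš network (1965). A Beneš network with $2$ inputs is a simple switch that has two inputs and two outputs. The switch has two settings: in its "zero" setting the outputs are the same as the inputs; in its "one" setting the outputs are swapped. This is illustrated in the following figure:

A Beneš network with $n = 2^{k}$ inputs is defined recursively: a layer of $n/2$ input switches, followed by two Beneš networks with $n/2$ inputs each (an upper one and a lower one), followed by a layer of $n/2$ output switches. Each input switch sends one of its outputs to each sub-network, and each output switch takes one wire from each sub-network. This is shown in the following figure: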

It is a simple exercise to show that for every permutation $π \in S_{n}$ there is a setting of the switches in the Beneš network that implements the permutation $π$. For example, for $n = 16$, here are the switch settings for the permutation $π_{1}$ from (1). The dotted path shows how the input '4' is mapped to the output '12'. The purpose of the green switches will become clear in a minute.

Once all the switches are set, evaluating $π(x)$ for an input $x \in [n]$ takes $(2 \log_{2} n) - 1$ steps. For a given input $x \in [n]$, we follow the path from $x$ to an output by traversing one switch at a time. This path contains exactly $(2 \log_{2} n) - 1$ switches. Inverting the permutation is similarly done in $(2 \log_{2} n) - 1$ steps by processing the switches in the reverse order.

How much space do we need to store all the switch settings? An $n$-input Beneš network has $(2 \log_{2} n) - 1$ layers of $n/2$ switches each, so the number of switches is exactly

\# of switches = n \log_{2} n - \frac{n}{2} .

This lets us represent any permutation in $S_{n}$ using that many bits, which is a saving of 0.5 bits per entry over the trivial solution. A good start, but we want to do better.

Waksman (1968) observed that the top-left-most switch of a Beneš network can always be set to zero while still retaining the network's ability to express every permutation $π \in S_{n}$. This switch is shown in green in our recursive description of the Beneš network. We can apply this observation recursively to all the constituent Beneš networks and set all the green switches in our example network to zero. Since these switches are fixed, we do not need to store their settings, and as a result we save a total of $n / 2 - 1$ bits. Hence, this observation reduces the number of bits to exactly

\# of bits = n \log_{2} n - n + 1.

Good progress, but we are still not at $n \log_{2} n - 1.443 n$.
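
The Beneš-Waksman construction described so far can be sketched as follows. This is a minimal illustration, not the Chia implementation: the nested-tuple layout `(in_sw, upper, lower, out_sw)`, the function names, and the use of the standard "looping" routing procedure to pick switch settings are all our own assumptions. The routing always sends input 0 of each sub-network straight through, which is Waksman's fixed switch.

```python
def route(perm):
    """Compute switch settings realising perm (a list over [n], n a power of two >= 2)."""
    n = len(perm)
    if n == 2:
        return perm[0]                      # one switch: 0 = straight, 1 = swapped
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    side = [None] * n                       # 0: via upper sub-network, 1: via lower
    for start in range(0, n, 2):
        x = start
        while side[x] is None:
            side[x] = 0                     # x goes up, so its switch partner goes down
            side[x ^ 1] = 1
            # The output paired with perm[x ^ 1] must then come from the upper half.
            x = inv[perm[x ^ 1] ^ 1]
    half = n // 2
    in_sw = [side[2 * i] for i in range(half)]   # in_sw[0] is always 0 (Waksman)
    upper, lower, out_sw = [0] * half, [0] * half, [0] * half
    for x in range(n):
        if side[x] == 0:
            upper[x >> 1] = perm[x] >> 1
            out_sw[perm[x] >> 1] = perm[x] & 1
        else:
            lower[x >> 1] = perm[x] >> 1
    return (in_sw, route(upper), route(lower), out_sw)

def evaluate(net, x):
    """Follow input x through the network, one switch per layer."""
    if isinstance(net, int):
        return x ^ net
    in_sw, upper, lower, out_sw = net
    i = x >> 1
    go_lower = (x & 1) ^ in_sw[i]
    j = evaluate(lower if go_lower else upper, i)
    return 2 * j + (go_lower ^ out_sw[j])

def invert(net, y):
    """Same walk as evaluate, but from the output side."""
    if isinstance(net, int):
        return y ^ net
    in_sw, upper, lower, out_sw = net
    j = y >> 1
    from_lower = (y & 1) ^ out_sw[j]
    i = invert(lower if from_lower else upper, j)
    return 2 * i + (from_lower ^ in_sw[i])

def stored_bits(net):
    """Count switch bits that must be stored, skipping the fixed Waksman switches."""
    if isinstance(net, int):
        return 1
    in_sw, upper, lower, out_sw = net
    return (len(in_sw) - 1) + len(out_sw) + stored_bits(upper) + stored_bits(lower)

PI_1 = [9, 14, 12, 2, 7, 10, 13, 3, 1, 15, 0, 6, 4, 11, 5, 8]
net = route(PI_1)
print([evaluate(net, x) for x in range(16)] == PI_1, stored_bits(net))
```

For $n = 16$ the count of stored bits is $16 \cdot 4 - 16 + 1 = 49$, in line with the formula above. Both `evaluate` and `invert` visit one switch per layer, so each takes $(2 \log_{2} n) - 1$ steps.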

Munro, Raman, Raman, and Rao (2012) suggest a way to further compress a Beneš network. Let $q = 2^{ℓ}$ be a small power of two, say $ℓ \leq 8$. We prematurely terminate the recursive structure of the Beneš network at a permutation of size $q$, and then encode this permutation (on a domain of size $q$) using the optimally compressed representation. Here is the network for $π_{1}$ when we terminate the recursion at a permutation of size 4:

Here we replaced the three inner layers of the Beneš network with four permutations in $S_{4}$. Each permutation can be represented as an integer $d$ where $d \in [24]$. Evaluating one of these $S_{4}$ permutations takes constant time.
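
For a domain as small as $q = 4$, one simple way to get constant-time evaluation is a lookup table over all $24$ permutations. The sketch below is our own illustration (the names `TABLE`, `encode_small`, and `eval_small` are hypothetical, and the note does not prescribe this method). A full table is only practical for very small $q$, since it has $q!$ rows.

```python
from itertools import permutations

Q = 4
TABLE = list(permutations(range(Q)))          # all 24 permutations, lexicographic order
INDEX = {p: d for d, p in enumerate(TABLE)}   # permutation -> integer d in [24]

def encode_small(perm):
    """Return the integer d in [24] that represents a permutation of [4]."""
    return INDEX[tuple(perm)]

def eval_small(d, x):
    """Evaluate the permutation encoded by d at x with a single table lookup."""
    return TABLE[d][x]

d = encode_small([2, 0, 3, 1])
print(d, [eval_small(d, x) for x in range(Q)])
```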

More generally, suppose we terminate the recursion at a permutation of size $q = 2^{ℓ}$. This eliminates the innermost $2 ℓ - 1$ layers of the network (three layers for $q = 4$, as in the figure above). Then the number of bits needed to store the entire network is:

\begin{aligned} & \underbrace{n (\log_{2} n - \log_{2} q)}_{\substack{\text{\# switches excluding the} \\ 2 ℓ - 1 \text{ inner layers}}} - \underbrace{\bigl( (n / q) - 1 \bigr)}_{\substack{\text{\# Waksman} \\ \text{bits}}} + \underbrace{(n / q) \cdot P (q)}_{\substack{\text{size of the single} \\ q\text{-permutation layer}}} \\ & \qquad = n \log_{2} n - n \bigl( \log_{2} q + (1 / q) - ⌈ \log_{2} (q!) ⌉ / q \bigr) + 1. \qquad (2) \end{aligned}

Concretely, we get

for $q = 2$ the data structure uses about $n \log_{2} n - n$ bits (a Beneš-Waksman network),
for $q = 4$ the data structure uses about $n \log_{2} n - n$ bits (as in the figure above),
for $q = 8$ the data structure uses about $n \log_{2} n - 1.125 n$ bits,
for $q = 16$ the data structure uses about $n \log_{2} n - 1.25 n$ bits,
for $q = 32$ the data structure uses about $n \log_{2} n - 1.34 n$ bits,
for $q = 256$ the data structure uses about $n \log_{2} n - 1.43 n$ bits.
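
The coefficients in this list can be reproduced directly from (2). The sketch below evaluates the per-entry saving $\log_{2} q + 1/q - ⌈ \log_{2} (q!) ⌉ / q$ for several values of $q$; the function names are ours.

```python
import math

def P(q):
    """Exact ceil(log2(q!)) using integer arithmetic."""
    return (math.factorial(q) - 1).bit_length()

def saving_per_entry(q):
    """Coefficient c(q) such that formula (2) reads n*log2(n) - c(q)*n + 1."""
    return math.log2(q) + 1 / q - P(q) / q

for q in (2, 4, 8, 16, 32, 64, 128, 256):
    print(f"q={q:3d}  saving per entry ~ {saving_per_entry(q):.3f} bits")
```

The values for $q = 64$ and $q = 128$ are not listed in the text; they are included here only to show the trend toward $1.443$.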

As $q$ grows, this converges to $P(n) \approx n \log_{2} n - 1.443 n$ bits, which is exactly what we want. In practice using $q = 32$ is sufficient. We note that one can store a batch of $S_{q}$ permutations more compactly than storing them separately. This lets us remove the ceiling function in the $⌈ \log_{2} (q!) ⌉$ term in (2) and get some more space savings.

Evaluation time. We can speed up the evaluation procedure by processing multiple layers at a time. To do so, observe that the group of twelve blue switches in the figure below is sufficient to process the first three layers for the inputs $\{0, 1, \dots, 7\}$. The twelve yellow switches are sufficient to process the first three layers for the inputs $\{8, \dots, 15\}$. Thus, we can store the twelve bits for the blue switches in one block on disk, and the twelve bits for the yellow switches in another block on disk. Now, for an input $x \in [16]$, we can process the first three layers with a single disk access.

For permutations in $S_{n}$ where $n = 2^{32}$, the case of interest to us, the Beneš network contains $63$ layers. Using $q = 32$ we get to remove the $2 \cdot 5 - 1 = 9$ inner layers, and replace them with permutations in $S_{32}$. This leaves 55 layers to process: 27 switching layers, a permutation in $S_{32}$, and 27 more switching layers.

The plan is to process eight layers at a time. In this case, the group of "blue" switches contains $2^{7} \times 8 = 1024$ switches, which means storing 1024 bits, or 128 bytes, in one block on disk. Evaluating the permutation at a given input $x \in [2^{32}]$ can now be done by reading only seven disk blocks:

we process the first 24 layers, eight layers at a time, by reading three disk blocks;
we process the next three layers, the permutation in $S_{32}$, and the next three layers after that, using a single disk access;
finally, we process the remaining 24 layers, eight layers at a time, by reading three more disk blocks.

This is a total of seven disk blocks that need to be read. Interestingly, evaluating the inverse permutation at a given input $y \in [2^{32}]$ is done exactly the same way, just in the reverse order.
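
The block sizes quoted above follow from a simple count: the first $g$ layers, restricted to a run of $2^{g}$ consecutive inputs, touch $2^{g-1}$ switches per layer. The sketch below checks this arithmetic for the first group of layers only; the function names and the choice of `x >> g` as the block index are our own assumptions about one possible layout.

```python
def switches_in_block(g):
    """Switches used by the first g layers for one run of 2**g consecutive inputs."""
    return (1 << (g - 1)) * g          # 2**(g-1) switches per layer, g layers

def first_block(x, g):
    """Index of the block that holds the first g layers for input x."""
    return x >> g                      # inputs sharing the same high bits share a block

print(switches_in_block(3))            # the blue (or yellow) group for n = 16
print(switches_in_block(8) // 8)       # bytes per block when grouping eight layers
print(first_block(7, 3), first_block(8, 3))
```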

Conclusion

This completes our story. We explained how a compressed Beneš network lets us store a permutation $π \in S_{n}$ using an optimal number of bits. Evaluating $π(x)$ and $π^{-1}(y)$ can be done in about $2 \log_{2} n$ steps. The algorithm has some locality that lets us reduce the number of disk reads.

An open problem. Design a data structure with similar performance to the one presented here and better locality. In particular, evaluating a permutation in $S_{n}$ for $n = 2^{32}$ should require reading only a single disk block (4KB). This will match the performance of the trivial solution, but with less space.

Related work

Bernstein (2020) describes Waksman's observation and its history in Section 6. We thank Dan Bernstein for pointing us to this, and for other helpful comments.

Barbay and Navarro show how to compress specific classes of permutations in $S_{n}$, while retaining the ability to quickly evaluate the permutation and its inverse. For example, consider the class of local permutations where there is a known bound $b ≪ n$ such that $| π(x) - x | \leq b$ for all $x \in [n]$.
