Try   HackMD

How to Store a Permutation Compactly

Bram Cohen and Dan Boneh

Feb. 2022


In this note we describe a compact way to store a permutation, while supporting a fast way to evaluate and invert the permutation. This question comes up naturally when optimizing the software underlying the Chia proof-of-space blockchain. Chia replaces Nakamoto's energy hungry proof-of-work consensus with an eco-friendly proof-of-space.

We end the note with an open problem that we don't know how to solve.

How to store a permutation compactly

A permutation on the set

[n]:={0,1,2,,n1} is a one-to-one function from
[n]
to
[n]
. The group of all
n!
permutations on
[n]
is denoted by
Sn
. Here is an example permutation in
S16
that maps
09,  114,  212,  32
, etc.:

(1)π1:=(01234567891011121314159141227101331150641158)

We say that

D is a data structure for storing a permutation if
D
supports the following interface:

  • process(π)dπ
    :
    process the given permutation
    πSn
    and output its compact representation
    dπ
    ,
  • eval(dπ,x)y
    :
    on input
    dπ
    and
    x[n]
    , output
    y=π(x)
    .

We require that

eval(,) be fast, meaning that its running should be at most
O(logn)
. Algorithm
process()
can run in time polynomial in
n
.

Our goal is to design a data structure for storing a permutation where the worst case size of

dπ, measured in bits, is as small as possible. By worst case we mean worst case over all
πSn
. For extra bonus points we also want the data structure to support fast inversion:

  • invert(dπ,y)x
    :
    on input
    dπ
    and
    y[n]
    , output
    x=π1(y)
    .

This algorithm should similarly run in time at most

O(logn).

The trivial solution

Suppose

n is a power of 2. The trivial data structure for storing a permutation
πSn
simply lists the elements of the permutation in order. That is,

  • process(π)
    outputs
    dπ:=[π(0), π(1), , π(n1)]
    , and
  • eval(dπ,x)
    outputs
    dπ[x]
    for
    x[n]
    .

For example, for the permutation

π1 in
(1)
above we have
dπ1:=[1010110011011001011100101110001110111111000001100100000101011000].
This trivial data structure has the following properties:

  • the length of
    dπ
    is always
    nlog2n
    bits, and
  • eval(,)
    runs in constant time.

This looks quite good. So, are we done?

Well, not quite. Since the number of permutation in

Sn is
n!
, the number of bits needed to represent a permutation in the worst case is at least
P(n):=log2(n!).
By Stirling's approximation, and the fact that
log2e
is about
1.443
, we know that
P(n)=log2(n!)=nlog2nnlog2e+Θ(log2n)nlog2n1.443n.
This means that we should be able to reduce the space needed to store a premutation by about 1.443 bits per entry over the trivial data structure.

The optimally compressed representation. One can represent a permutation

π in
Sn
as an integer
dπ
in
{1,,n!}
, and this will take at most
P(n)
bits. While this is more compact than the trivial data structure, we lose the ability to quickly evaluate and invert the permutation.

So, to summarize:

The challenge is to construct a data structure for storing a permutation that takes less space than the trivial data stucture, specifically 1.44 n fewer bits, while retaining the ability to quickly evaluate the permutation.

Who cares?

Is saving 1.44 bits per entry worth discussing?

Yes! There is an immediate application. In a Proof-of-Space based blockchain the monetary rewards for those providing proofs is proportional to the amount of space they have. Therefore, compressing the files used to create a Proof-of-Space will lead to increased rewards without the need to buy additional storage. It is a good day when a clever algorithm can increase revenues without increasing cost.

The Chia Proof-of-Space uses a domain of size

n=232. A reduction of
1.44n
bits in storage results in a
1.44n/(nlog2n)=1.44/324.5%
increase in rewards, at no additional cost. However, due to the specifics of the Chia Proof-of-Space, one needs to compress a slightly more complicated object than a permutation. Adapting the compression technique presented here to Chia can, in principle, increase revenues by about 0.48% in the limit.

A compact data structure for storing a permutation

Now back to our question. From here on we will assume that

n is a power of two, so that
n=2k
for some positive integer
k
. Recall that we define
P(n):=log2(n!)
.

Our starting point is the classic Beneš network (1965). A Beneš network with

2 inputs is a simple switch that has two inputs and two outputs. The switch has two settings: in its "zero" setting the outputs are the same as the inputs; in its "one" setting the outputs are swapped. This is illustrated in the following figure:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

A Beneš network with

n=2k inputs is defined recursively using the following figure:
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

It is a simple exercise to show that for every permutation

π in
Sn
there is a setting of the switches in the Beneš network that implements the permutation
π
. For example, for
n=16
, here are the switch settings for the permutation
π1
from
(1)
. The dotted path shows how the input `4' is mapped to the output '12'. The purpose of the green switches will become clear in a minute.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Once all the switches are set, evaluating

π(x) for an input
x[n]
takes
(2log2n)1
steps. For a given input
x[n]
, we follow the path from
x
to an output by traversing one switch at a time. This path contains exactly
(2log2n)1
switches. Inverting the permutation is similarly done in
(2log2n)1
steps by processing the switches in the reverse order.

How much space do we need to store all the switch settings? The number of switches in an

n-input Beneš network is exactly
# of switches=nlog2nn2.
This lets us represent any permutation in
Sn
using that many bits, which is a savings of 0.5 bits per entry over the trivial solution. A good start, but we want to do better.

Waksman (1968) observed that the top-left-most switch of a Beneš network can always be set to zero while still retaining the network's ability to express every permutation

π in
Sn
. This switch is shown in green in our recursive description of the Beneš network. We can apply this observation recursively to all the constituent Beneš networks and set all the green switches in our example network to zero. Since these switches are fixed, we do not need to store their settings, and as a result we save a total of
n/21
bits. Hence, this observation reduces the number of bits to exactly
# of bits=nlog2nn+1.

Good progress, but we are still not at

nlog2n1.443n.

Munroa, Raman, Ramanc, and Rao (2012) suggest a way to further compress a Beneš network. Let

q=2 be a small power of two, say
8
. We prematurely terminate the recursive structure of the Beneš network at a permutation of size
q
, and then encode this permutation (on a domain of size
q
) using the optimally compressed representation. Here is the network for
π1
when we terminate the recursion at a permutation of size 4:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Here we replaced the three inner layers of the Beneš network with four permutations in

S4. Each permutation can be represented as an integer
d
where
d[24]
. Evaluating one of these
S4
permutations takes constant time.

More generally, suppose we terminate the recursion at a permutation of size

q=2. This eliminates the inner most
2+1
layers of the network. Then the number of bits needed to store the entire network is:
n(log2nlog2q)# switches excluding the2+1 inner layers(n/q)+1# Waksmanbits+(n/q)P(q)size of the singleq-permutation layer=(2)=nlog2nn(log2q+(1/q)log2(q!)/q)+1.

Concretely, we get

  • for
    q=2
    the data structure uses about
    [nlog2nn]
    bits.    (a Beneš-Waksman network)
  • for
    q=4
    the data structure uses about
    [nlog2nn]
    bits.    (as in the figure above)
  • for
    q=8
    the data structure uses about
    [nlog2n1.125n]
    bits.
  • for
    q=16
    the data structure uses about
    [nlog2n1.25n]
    bits.
  • for
    q=32
    the data structure uses about
    [nlog2n1.34n]
    bits.
  • For
    q=256
    the data structure uses about
    [nlog2n1.43n]
    bits.

This converges to

P(n)nlog2n1.443n bits, which is exactly what we want. In practice using
q=32
is sufficient. We note that one can store a batch of
Sq
permutations more compactly than storing them separately. This lets us remove the ceiling function in the
log2(q!)
term in
(2)
and get some more space savings.


Evaluation time. We can speed up the evaluation procedure by processing multiple layers at a time. To do so, observe that the group of twelve blue switches in the figure below are sufficient to process the first three layers for the inputs

{0,1,,7}.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

The twelve yellow switches are sufficient to process the first three layers for the inputs

{8,,15}. Thus, we can store the twelve bits for the blue switches in one block on disk, and the twelve bits for the yellow switches in another block on disk. Now, for an input
x[16]
, we can process the first three layers with a single disk access.

For permutations in

Sn where
n=232
- the case of interest to us - the Beneš network contains
63
layers. Using
q=32
we get to remove the 11 inner layers, and replace them with permutations in
S32
. This leaves 53 layers to process: 26 switching layers, a permutation in
S32
, and 26 more switching layers.

The plan is to process eight layers at a time. In this case, the group of "blue" switches contains

27×8=1024 switches, which means storing 1024 bits, or 128 bytes, in one block on disk. Evaluating the permutation at a given input
x[232]
can now be done by reading only seven disk blocks:

  • we process the first 24 layers, eight layers at a time, by reading three disk blocks;
  • we process the next two layers, the permutation in
    S32
    , and the next two layers after that, using a single disk access;
  • finally, we process the remaining 24 layers, eight layers at a time, by reading three more disk blocks.

This is a total of seven disk blocks that need to be read. Interestingly, evaluating the inverse premutation at a given input

y[232] is done exactly the same way, just in the reverse order.

Conclusion

This completes our story. We explained how a compressed Beneš network lets us store a permutation

π in
Sn
using an optimal number of bits. Evaluating
π(x)
and
π1(y)
can be done in about
2log2n
steps. The algorithm has some locality that lets us reduce the number of disk reads.

An open problem. Design a data structure with similar performance to the one presented here and better locality. In particular, evaluating a permutation in

Sn for
n=232
should require reading only a single disk block (4KB). This will match the performance of the trivial solution, but with less space.

Bernstein (2020) describes Waksman's observation and its history in Section 6. We thank Dan Bernstein for pointing us to this, and for other helpful comments.

Barbay and Navarro show how to compress specific classes of permutations in

Sn, while retaining the ability to quickly evalaute the premutation and its inverse. For example, consider the class of local permutations where there is a known bound
bn
such that
|π(x)x|b
for all
x[n]
.