Neural nets perform exceptionally well at a variety of classification tasks - tasks like determining if an image contains an airplane, or recognising handwritten digits. These models use millions or even billions of floating point parameters to compute their classifications using multiple layers of matrix multiplications and non-linearities. While these calculations can be carried out very efficiently, it remains challenging to efficiently prove that the calculations were carried out correctly. Overcoming this challenge would allow slow computers (e.g. blockchains, or edge devices such as smartphones) to delegate neural network inference tasks to untrusted parties, enabling applications such as trustless biometric identification and smart contracts that are truly smart.
The problem is that the primitives of zero-knowledge (ZK) proofs and machine learning (ML) are difficult to reconcile. ZK proof systems operate at a fundamental level with modular arithmetic (i.e. with discrete values over a finite field), whereas neural nets and most machine learning models perform "smooth" operations on floating point numbers, called "weights". Existing approaches have attempted to bridge this divide by quantizing the weights of a neural net, so that they can be represented as elements of the finite field. Care must be taken to avoid a "wrap-around" occurring in the (now, modular!) arithmetic of the quantized network, and weight quantization can only decrease model accuracy. But more than anything, it does feel a little like trying to force a square peg into a round hole.
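To make the wrap-around hazard concrete, here is a minimal sketch of fixed-point quantization into a prime field. The prime `P`, the scale `SCALE` and all helper names are assumptions chosen for readability; they are not taken from any particular proof system, where the field is typically far larger.

```python
# Toy prime field. Assumption: a small Mersenne prime, picked only so that
# wrap-around is easy to trigger; real proof systems use much larger primes.
P = 2**31 - 1
SCALE = 2**8  # assumption: fixed-point scale with 8 fractional bits


def quantize(x: float) -> int:
    """Map a real number to a field element via fixed-point rounding."""
    return round(x * SCALE) % P


def dequantize(a: int, scale: int) -> float:
    """Read a field element back as a signed fixed-point number."""
    signed = a if a <= P // 2 else a - P  # upper half of the field encodes negatives
    return signed / scale


def field_dot(ws, xs) -> int:
    """Dot product computed entirely in modular arithmetic."""
    acc = 0
    for w, x in zip(ws, xs):
        acc = (acc + quantize(w) * quantize(x)) % P
    return acc


# Small values survive the round trip: 0.5*2.0 + (-1.25)*4.0 = -4.0
small = field_dot([0.5, -1.25], [2.0, 4.0])
print(dequantize(small, SCALE * SCALE))

# Larger values overflow the field: the true answer is 40000.0,
# but the modular result decodes to something else entirely.
big = field_dot([100.0] * 4, [100.0] * 4)
print(dequantize(big, SCALE * SCALE))
```

Each product of two quantized values carries the scale twice, which is why the result is decoded with `SCALE * SCALE`. That doubling is also what makes the field fill up quickly.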
We propose a different approach: let's go back to a time before the NN paradigm was settled, to a time when a greater variety of neural nets roamed the earth, and let's find a machine learning model more amenable to ZKP. One such model is the "Weightless Neural Network". It's claimed to be the first ever neural net to be commercialized! Wow. But wow again, it is a very dusty dinosaur. Over the decades, it has received very little attention compared to familiar NNs. We set out to develop a system for proving the inferences of this weightless wonder … and we call it … Zero Gravity (The Weight is Over).
Weightless means no weights, no floating point arithmetic, and no expensive linear algebra, let alone non-linearities - so none of the challenges mentioned above. Will there be different challenges, and will they be worse? This is what we set out to discover at the ZKHack hackathon (Lisbon, 2023).
A Weightless Neural Network (WNN) is entirely combinatorial. Its input is a bitstring (e.g. encoding an image), and its output is one of several predefined classes, e.g. corresponding to the ten digits. It learns from a dataset of (input, output) pairs by remembering observed bitstring patterns in a bunch of devices called RAM cells, grouped into "discriminators", one for each output class. RAM cells are so called since they are really just big lookup tables, indexed by bitstring patterns, that store a 1 when that pattern has been observed in an input string labeled with the class of this discriminator.
(Figures from the BTHOWeN paper, see below)
Each RAM cell is connected to only a small number of inputs, i.e. bits of the input bitstring. This is necessary since the size of its "random-access" lookup table grows exponentially in the number of inputs (a problem addressed using Bloom filters, see below). The wiring from the bits of the input bitstring to the RAM cells is typically randomized using a fixed permutation.
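The description above can be sketched in a few lines of Python. This is an illustrative toy, not the BTHOWeN implementation: the class names are hypothetical, each RAM cell is modelled as a set of observed addresses (a sparse stand-in for a lookup table of 0s and 1s), and inference uses the usual rule for this family of models, picking the class whose discriminator has the most RAM cells answering 1.

```python
import random


class Discriminator:
    """One discriminator per class: a row of RAM cells."""

    def __init__(self, num_cells: int):
        # Each RAM cell is a set of seen addresses, i.e. the positions
        # of the lookup table that hold a 1.
        self.cells = [set() for _ in range(num_cells)]

    def train(self, addresses) -> None:
        for cell, addr in zip(self.cells, addresses):
            cell.add(addr)  # remember this bit pattern

    def response(self, addresses) -> int:
        # Number of RAM cells that have seen their pattern before.
        return sum(addr in cell for cell, addr in zip(self.cells, addresses))


class WNN:
    def __init__(self, input_bits: int, bits_per_cell: int, classes, seed: int = 0):
        assert input_bits % bits_per_cell == 0
        self.bits_per_cell = bits_per_cell
        # Fixed random wiring from input bits to RAM cells.
        self.perm = list(range(input_bits))
        random.Random(seed).shuffle(self.perm)
        num_cells = input_bits // bits_per_cell
        self.discriminators = {c: Discriminator(num_cells) for c in classes}

    def addresses(self, bits):
        # Permute the input, then cut it into one short pattern per RAM cell.
        permuted = [bits[i] for i in self.perm]
        n = self.bits_per_cell
        return [tuple(permuted[i:i + n]) for i in range(0, len(permuted), n)]

    def train(self, bits, label) -> None:
        self.discriminators[label].train(self.addresses(bits))

    def classify(self, bits):
        addrs = self.addresses(bits)
        return max(self.discriminators,
                   key=lambda c: self.discriminators[c].response(addrs))


net = WNN(input_bits=8, bits_per_cell=2, classes=["low", "high"])
net.train([0, 0, 0, 0, 0, 0, 0, 1], "low")
net.train([1, 1, 1, 1, 1, 1, 1, 0], "high")
print(net.classify([0, 0, 0, 0, 0, 0, 0, 0]))  # closest to the "low" example
```

Note that training only ever sets entries and never compares floating point values, which is what makes the model attractive for a proof system based on finite-field arithmetic.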
Another important thing to note is that there is only one layer of RAM cells. Consequently a WNN may excel in learning combinatorial, superficial patterns in the input bits, but can't hope to learn the composite, semantically rich features that can be learnt by a deep neural network. Why just one layer? Remember that WNNs come from the time before back-prop and deep nets won out, and there has been comparatively little work done on them since. To make WNNs deep, you'll need to invent an analog of back-prop (task for another hackathon?). Despite their simplicity, WNNs perform impressively on datasets such as MNIST - the BTHOWeN model, discussed below, achieves a test set accuracy exceeding 95%.
RAM cells are terribly space-inefficient, yet their contents are very sparse, since most bit patterns are never observed. Bloom filters offer a space-efficient method of representing the data of a RAM cell, allowing the RAM cells to receive many more inputs. What's a Bloom filter? Bloom filters are space-efficient data structures for probabilistically testing set membership. False positives are possible, false negatives are not - that is, a Bloom filter will (efficiently) answer with either "x is definitely not in the set" or "x is in the set with high probability".
Under the hood, a Bloom filter consists of a fixed-length bit array and a number of functions that map potential set elements to positions in the array. These "hash functions" are chosen so as to hit each index of the bit array with uniform probability - though they may not always have cryptographic properties.
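A minimal Bloom filter sketch follows. The class and method names are hypothetical, and the hash family (salted BLAKE2b from Python's standard library) is a convenient stand-in used only for illustration; it is not the hash function used in BTHOWeN or in Zero Gravity.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # must stay below 256 for the salt below
        self.bits = [0] * num_bits    # the fixed-length bit array

    def _indices(self, item: bytes):
        # Assumption: derive one hash function per k by salting BLAKE2b with k.
        for k in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([k])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for i in self._indices(item):
            self.bits[i] = 1

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely not in the set";
        # True means "in the set, with high probability".
        return all(self.bits[i] for i in self._indices(item))


bf = BloomFilter(num_bits=64, num_hashes=3)
bf.add(b"0110100")
print(bf.might_contain(b"0110100"))  # an added item is always reported as present
```

A query can only return a false positive when every index of an unseen item happens to have been set by other items, which is why a larger bit array lowers the error rate.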
Zero Gravity is a system for proving an inference run (i.e. a classification) for a pre-trained, public WNN and a private input. In Zero Gravity, the prover claims to know an input bitstring
Zero Gravity builds upon the recent BTHOWeN model by Susskind et al. (2022), in which the authors improve upon earlier WNN models in a number of interesting ways. Most importantly for this hackathon project, they helpfully provide an implementation complete with pre-trained models and reproducible benchmarks.
The interesting problem of proving that a WNN has been correctly trained or updated is left for another hackathon!
The hash functions in the WNN consume a short substring of the permuted input bitstring, outputting an index in a Bloom filter. The BTHOWeN authors chose their hash function to match their target domain: edge devices, and FPGAs in particular. Our application domain is entirely different and imposes different constraints. We want the hash function to be appropriate for a zero knowledge proof system. What sort of hash function should we use?
Cryptographic hash functions are not an appropriate choice: they are expensive to implement in a ZK proof system and, since our hash functions only consume a short bitstring (e.g. of length 49 for MNIST), they are brute-force invertible in any case. Hash functions involving bit decompositions are also too expensive. We want a hash function
How about the hash function
The problem with a linear
where
Suitable primes
We called our new, non-cryptographic hash function the "MishMash" after its discoverer, team member HaMish.
We wrote our proving system in Aleo with some metaprogramming in Python, and modified the BTHOWeN implementation to use our choice of hash function in order to re-train the models. Messy, hackathon-quality code has been shamelessly made available.
Team Zero Gravity, from left to right: Victor Sint Nicolaas, Benjamin Wilson, Hamish Ivey-Law, Artem Grigor, Cathy So, Georg Wiese.