# Lookup Arguments for Mad Scientists and Busy People ## The setting for lookups - **When proving a relation** (think in a SNARK), standard **arithmetic gates** can be an awkward way to express what you want to prove. For example, byte-to-byte XOR (ubiquitous, e.g., in hashing) requires several arithmetic gates to be expressed. - **A more compact way** to express this and other constraints is a **lookup** table: - precompute all possible values of the table and store them in a commitment - at proving time, just prove you know a row in that table. This is tantamount to of showing that you know, say, A,B,C such that A XOR B = C. - A building block for this setting can be composed with other SNARKs and get a significantly more efficient prover. ## Useful, but **how do we build this?** - If you ignore that a table is actually composed of "multiple columns"[^col], then this sounds like a vector commitment problem: just lookup, say, XOR outputs, by proving some values are inside a precommitted vector. This is what a **vector commitment** does! [^vc] - **Except**, that we actually have a few more problems to solve (explained below) This is why vector commitments are inadequate for the job. We need a better primitive! - ==**A lookup argument** is a building block that that we can use for the setting above. It can be thought as a vector commitment on steroids[^vc1].== ### The problems with vector commitments as lookup arguments 1. **The "rows" you are looking up should be compressed, not sent**. Vector commitments, in general, require the verifier to read the vector values being opened. 2. **Prover should not run in the size of the whole table**. Even if we solve the compressing problem, we want it to be compatible with a prover that does not need to work linearly in the whole table (even with preprocessing!) This is not necessarily the case with vector commitments. 3. **It should support "multisets"**. For technical reasons, we may need to open the same value multiple times. A vector commitment instead generally does not guarantee this. 4. Zero-knowledge. (usually optional) ## A final nuance Several of the efficient constructions for lookups out there are motivated by consistency with a specific type of vector commitment we use: Pedersen on the Lagrangian basis. This is because this is very common in several SNARK constructions as a building block (usually implicitly). Other approaches are possible, e.g. from [RSA](https://eprint.iacr.org/2021/1672). These other approaches achieve different tradeoffs, e.g. in size of parameters and running times. Nonetheless, for common practical cases, non-RSA approaches clearly have the most attractive efficienct balance. [^col]: It turns out that the many-column case reduces to the single-column one. Note, though, that this usually requires additional properties, e.g. homomorphic commitments. [^vc]: One should consider this as a point for clarity only. There are other constructions for lookups not based on vector commitments. [^vc1]: Not a perfect analogy, but true for many constructions. See footnote above and [this paragraph](https://hackmd.io/K_87kflkT-GLWBQ3NYPxHA#A-final-nuance).