Exploring ZK compression

If you've been active on Solana Twitter, you might have seen a lot of new exciting products being launched such as blinks and ZK compression. There is already a lot of developer content available for blinks and developers are already chewing glass and building blinks. Let's look at ZK compression - What the hell is it? What's the problem that ZK compression is trying to fix? How is it achieving it?

What is ZK compression?

ZK compression is a new primitive built on Solana, by Helius Labs and Light Protocol, which enables developers to create applications at scale at cheaper costs via compressing on-chain data (which is stored in Solana's state), thereby reducing the cost of creating a new account on-chain by a lot.

"compressing on-chain data" doesn't mean that the data is initially being compressed via some lossy or lossless compression algorithm and is then decompressed later on. It might reduce the cost but not by a factor of 160x and 5000x.

Creation Cost	Regular Account	Compressed Account
100-byte PDA Account	~ 0.0016 SOL	~ 0.00001 SOL (160x cheaper)
100 Token Accounts	~ 0.2 SOL	~ 0.00004 SOL (5000x cheaper)

ZK compression uses a similar approach as compressed NFTs, which leverages the idea of using state compression, which allows developers to store account data on less-pricey Solana's ledger while preserving the security, performance, and composability of the Solana L1.

In broad, ZK compression is mainly split into two parts - compression via concurrent merkle trees and zero-knowledge proofs to ensure the integrity of the compressed state.

Concurrent Merkle Trees

In computer science, a Tree is a type of data structure that is a collection of objects (which are referred to as "nodes") that are linked to each other to represent a hierarchy. The top-most node is known as the "root node" and the bottom-most node is known as the "leaf node".

There are many sub-types of a tree, one of which is a binary tree in which there are two sub-trees similar to how are there two units of representation in binary system - 0 and 1

Merkle tree is a type of binary tree, in which every node which isn't a leaf node (referred to as "branch") is a cryptographic hash. The nodes are hashes of their children and the leaf nodes contain the data which is being hashed. Each branch is then also hashed together sequentially until only a single hash remains (referred to as "root hash").

After computing the root hash, data stored within a leaf node can be verified by rehashing the leaf's data along with hashes of adjacent branches. This is known as the proof path. Comparing this rehash to the root hash verifies the leaf data. If they match, the data is accurate. If not, the leaf data has been changed.

Whenever data in one of the leaf nodes is changed, the root hash is recomputed and the previous root hash becomes invalid, which is a major drawback of using a traditional Merkle tree during concurrent writes.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Suppose there is a request to update

X_{4}

X_{4}^{'}

(request A) and another concurrent request to update

X_{6}

X_{6}^{″}

(request B). Assume that request A has been processed earlier with {

X_{4}^{'}

X_{5}

X_{3}

} as proof. The root hash changes from

X_{1}

X_{1}^{'}

after request A has been processed.

Due to this request B is invalidated, as the root hash has been changed, which can be mathematically expressed as:

X_{1}^{'} \neq H (H (X_{6}, X_{7}), X_{2})

(as

X_{2}

has been changed to

X_{2}^{'}

due to request A)

X_{2}

is replaced with

X_{2}^{'}

in request B then both the requests would be processed successfully. Here comes in concurrent Merkle tree, which keeps a record of past write updates performed on the Merkle tree. Light protocol has made their own implementation of concurrent Merkle tree, which stores change log and the current proof in an on-chain account for that corresponding Merkle tree which also stores final root hash.

In the above example,

Change log =
$[X_{4}^{'}, X_{2}^{'}]$ , it's the path of the tree which was changed by the last operation done on the tree
Current proof =
$[X_{7}, X_{2}]$

The implementation finds the intersection node between the change log and current proof and replaces

X_{2}

with

X_{2}^{'}

in the current proof. If you're curious to know what exactly happens under the hood, I'd recommend reading the "Concurrent Leaf Replacement" section in Compressing Digital Assets with Concurrent Merkle Trees whitepaper.

Zero-knowledge proof

At the core, zero-knowledge proofs (ZKPs) allow one party (the prover) to prove to another (the verifier) that a statement is true, without revealing any information beyond the validity of the statement itself.

Where's Waldo?

Consider the following example, there are two people (Alice and Bob) who are trying to find Waldo in the below picture, which is from a children's book named Where's Waldo

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Alice has found Waldo but she doesn't want to reveal the exact location of Waldo to Bob she also wants to prove to Bob that she has found Waldo Alice being the prover, cuts a hole of the same size as Waldo in a very large, opaque sheet of cardboard. She places the picture under the cardboard such that only Waldo can be seen through the hole. She asks Bob to see through the hole - Bob can see Waldo but he doesn't know the exact coordinates of Waldo concerning the rest of the image as the cardboard is much bigger than the image.

This is an example of zero-knowledge proof where Alice proves to Bob that she knows where Waldo is but Bob doesn't gain any more knowledge except for the fact that Alice has found Waldo in the picture.

Two Balls

Consider the following example, Alice is colorblind but Bob is not colorblind. Alice doesn't believe Bob and Bob sets up an experiment that includes a challenge and response mechanism, through which she can deduce whether Bob is lying or not.

Bob gives two balls of two different colors (one is red and another one is blue) to Alice to hold those two balls in her hands. Alice puts those balls behind her back and she can either exchange the balls between her hands or not. Afterwards, she will show those the balls to Bob and he'll respond with "yes" if the balls have been exchanged, or else, he'll respond with "no".

The experiment goes back and forth between Alice and Bob until Alice is convinced. In such an experiment, the probability of guessing the correct answer consecutively 20 times is 1 in 1,048,576. Such experiments, give a statistical guarantee that something could be true.

The "Where's Waldo?" example is an example of non-interactive zero-knowledge proof which didn't require any kind of challenge-response mechanism as in the "Two Balls" example, which is an example of interactive zero-knowledge proof.

ZK compression built by Helius labs and Light Protocol uses ZK-SNARK which stands for Zero Knowledge Succinct Non-interactive ARgument of Knowledge, which is known for its small proof size (128 bytes in case of ZK compression) and quick verification times. ZKPs is a vast and interesting topic in the field of mathematics and computer science which would be hard to summarize in this blog post, so here are a few resources related to ZKP, if you want to read more about ZK and ZK-SNARK:

The good news is that you don't have to know about ZKPs in-depth to start developing applications using ZK compression.

Developing apps using ZK compression

Enough of theory, let's jump into practicals and start building using ZK compression.

Using Helius ZK compression RPC node

Let's use Helius Labs' ZK compression RPC node, instead of going through the hassle of setting up our local node. Let's write a simple typescript script that creates a token and mints some tokens into our payer account.

import stateless from "@lightprotocol/stateless.js";

const connection = stateless.createRpc(
  "https://zk-testnet.helius.dev:8899", // rpc
  "https://zk-testnet.helius.dev:8784", // zk compression rpc
  "https://zk-testnet.helius.dev:3001" // prover
);

const main = async () => {
  const health = await connection.getIndexerHealth();
  console.log(health);
};

main();

Prover is the service responsible for processing Merkle tree updates. The code is almost similar to how it is done using @solana/web3.js, except for the fact that we've had to add ZK compression RPC and prover while creating a connection object.

import { Rpc, confirmTx, createRpc } from "@lightprotocol/stateless.js";
import { createMint, mintTo } from "@lightprotocol/compressed-token";
import { Keypair } from "@solana/web3.js";

const payer = Keypair.generate();

const connection: Rpc = createRpc(
  "https://zk-testnet.helius.dev:8899", // rpc
  "https://zk-testnet.helius.dev:8784", // zk compression rpc
  "https://zk-testnet.helius.dev:3001" // prover

);

const main = async () => {
  console.log(`payer - ${payer.publicKey.toString()}`);

  await confirmTx(
    connection,
    await connection.requestAirdrop(payer.publicKey, 10e9)
  );

  const { mint, transactionSignature } = await createMint(
    connection,
    payer,
    payer.publicKey,
    6
  );

  console.log(`create-mint  success! txId: ${transactionSignature}`);

  const mintToTxId = await mintTo(
    connection,
    payer,
    mint,
    payer.publicKey,
    payer,
    1e6
  );

  console.log(`mint-to success! txId: ${mintToTxId}`);
 
}; 

main();

In the above code snippet, we're creating a compressed token and minting 1 token into our payer wallet.

createMint function initializes a token mint account using Solana Token Program and creates a token pool PDA using Light Protocol's Compressed Token Program (create_token_pool source code).

Here's the signature of a transaction which is executed via createMint function - https://solana.fm/tx/49Pu29qgwJoEdLhE13rWUaRv5szFKHp966T58v41osP929Pa8rovn77hogeEQwrzW8FKxyiFFWbG9awA3pcj3gJX.

Make sure to use https://zk-testnet.helius.dev:8899 as the RPC URL in solana.fm

System Program allocates 86 bytes for 46ieAhhWgmLHjDbj5PjnWvK8knPxAhfVuYGH9y7Ud7A1 account
SPL Token Program initializes a mint account at the above address with mint authority as AfiNiZ1M1UYeQWjCYmxMfoyGefLiigKmKc11zooHGPU9
Light Protocol's Compressed Token Program creates token pool PDA (4bqJdrQkUeQv2Bvm4T2fLpza48xqK3L4z3VCPnrUtQ5v) owned by SPL Token Program, with token mint as 46ieAhhWgmLHjDbj5PjnWvK8knPxAhfVuYGH9y7Ud7A1

The mintTo function is responsible for minting the compressed tokens to the destination wallet (in this case, it's payer)

Here's the signature of a transaction that is executed via mintTo function - https://solana.fm/tx/283nNu7kSp7VVffUTqQT5LUQjw828K9R8dV3eAesch6unJEFaLZiYKTJfRYhDxL41iw1pSReb751txxT1jdJ644T

The mentioned amount of SPL token are minted to token pool PDA whose authority is the signer of the transaction (mint_spl_to_pool_pda source code)
The token data is serialized and the account data of every pubkey in the transaction is compressed (create_output_compressed_accounts source code)
CPI is executed with the serialized data of mintTo transaction which is later parsed and stored in the database via indexers like photon indexer (built by Helius Labs) (cpi_execute_compressed_transaction_mint_to source code)

This was a quick example to get an idea of how ZK compression programs work. Other examples can be built using ZK compression, all the examples are available in examples directory.

Using local ZK compression node

For spinning up a local ZK compression node, you've to install ZK compression CLI and photon-indexer as well. Change pnpm with the package manager of your wish.

pnpm add @lightprotocol/zk-compression-cli -g
cargo install photon-indexer --version 0.30.0 --locked

After the installation step is done, run the following command which starts a single-node Solana cluster along with Photon RPC for indexing compression programs and Prover for processing Merkle trees.

light test-validator

Limitations

ZK compression offers lower account rent costs but also has a few limitations, which must be kept in mind before opting to use ZK compression in your app:

Larger transaction size
High compute unit usage

Larger transaction size

The maximum transaction size on Solana is 1232 bytes and if you read from at least 1 compressed account then 128 bytes are used up for validity proof, which is constant per transaction.

If you have a use case where an account's data is updated multiple times then its uncompressed version might be cheaper than the compressed version as every write operation has a small additional cost in contrast to the fixed per-byte cost for the uncompressed version. Whenever a transaction writes to a compressed account, it nullifies the previous compressed account state and appends the new compressed account as a leaf to the state tree, both of these actions add up to the additional cost.

Appending a compressed state to a state tree typically costs around 100-200 lamports per new leaf and nullifying a leaf in a state tree costs 5000 lamports per nullified leaf.

High compute unit usage

A normal token transfer via the SPL Token program typically uses around 10k CU as compared to a compressed token transfer which uses around 292k CU.

Breakdown of usage of compute units in transactions that use ZK compression:

100k CU for validity proof, it's constant per transaction if you read data from at least 1 compressed account
100k CU used for Poseidon hashing
6k CU per compressed account read/write

Whenever Solana's global per-block CU limit (50 million CU) is reached, validator clients may prioritize transactions with higher per-CU priority fees, which would require your application's users to increase their priority fees.

Exploring ZK compression

What is ZK compression?

Concurrent Merkle Trees

Zero-knowledge proof

Where's Waldo?

Two Balls

Developing apps using ZK compression

Using Helius ZK compression RPC node

Using local ZK compression node

Limitations

Larger transaction size

High compute unit usage

Resources

ZKPs and zk-SNARKs

Developing using ZK compression

Exploring ZK compression

What is ZK compression?

Concurrent Merkle Trees

Zero-knowledge proof

Where's Waldo?

Two Balls

Developing apps using ZK compression

Using Helius ZK compression RPC node

Using local ZK compression node

Limitations

Larger transaction size

High compute unit usage

Resources

ZKPs and zk-SNARKs

Developing using ZK compression

Read more

Deep dive into NFT Compression