ZK Hack - Let's Hash it Out - WriteUp

# ZK Hack - Let's Hash it Out - WriteUp In my journey to master all aspects of ZKPs I decided to jump to ZK Hack CTFs, which offer some of the best challenges covering essential topics in ZKP. This series of writeups assumes that the reader is already familiar with ZKP and the underlying mechanisms. I will be solving the majority of challenges, with each challenge a detailed beginner-friendly solution that breaks down the problem into manageable steps. These write-ups will include all necessary resources, mathematical concepts, algorithms and notations required to understand and solve each challenge. [Puzzle link](https://zkhack.dev/events/puzzle1.html) ## Introduction Let's start first by looking at the description to gain more information about what we are working on. ```rust= pub const PUZZLE_DESCRIPTION: &str = r#"Alice designed an authentication system in which users gain access by presenting it a signature on a username, which Alice provided. One day, Alice discovered 256 of these signatures were leaked publicly, but the secret key wasn't. Phew. The next day, she found out someone accessed her system with a username she doesn't know! This shouldn't be possible due to existential unforgeability, as she never signed such a message. Can you find out how it happened and produce a signature on your username?"#; ``` We can already see from the description that we won't be solving any DLP or recovering private keys, instead we have to produce a signature of our GH username (**0xhashiman** in my case) using the available data leaked, which is 256 messages + 256 signatures [data link](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/data.rs) and validate the new signature against Alice's public key to solve the challenge. If you read the description carefully, an important keyword will give you the first hint to solve the challenge ***existential unforgeability***. This means it should be impossible for an attacker to produce a valid signature on any message that hasn't been explicitly signed by the legitimate signer. But this is not true in the case of this challenge because upon looking at the hash_to_curve function used, you can clearly see the [RNG (random number generator)](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/hash.rs#L18-L21) is seeded with a constant, fixed value (a vector of all 1s). This means that every time the hash function is called, it produces the same parameters for the Pedersen hash. As a result, the hash_to_curve doesn’t introduce any randomness that would typically make it hard to forge a signature on a new message, which is the first issue. ## BLS Curve & Signature ### BLS Curve This curve is named after Boneh Lynn and Shacham, with the basic equation being $y^2 = x^3 + 4$ defined over the field $\mathbb{F}_q$ with q being a large 381 bit prime, this curve is called $E(F_q)$. The BLS12-381 curve also has another curve defined over by extending $F_{q}$ to $F_{q^2}$, where the equation is slightly modified to $y^2 = x^3 + 4(1+i)$ and we'll call this curve $E'(F_{q^2})$. One important feature of the BLS curve is the bilinear pairing property (also called the mapping property), which is heavily used in the signature verification process. To explain this property the simplest way, let's take a point $P \in G_1 \subset E(F_q)$ and a point $Q \in G_2 \subset E'(F_{q^2})$. For any two scalars $a,b$, we say $e(P, Q)$ is a bilinear pairing/mapping when: - $e([a]P, [b]Q) = e(P, [b]Q)^a = e(P, Q) ^{ab} = e(P, [a]Q)^b = e([b]P, [a]Q)$. We can also define a public key on this curve using this format and choose a random number between 1 and r-1, where **r is the order of the elliptic curve subgroup: the number of points in the subgroup**. We'll call this random number $sk$, which represents the private key. The corresponding public key (if we're using $G1$ for public keys) is $$pk = [sk]g1$$ Where $g1$ is the generator of $G1$, which is a subgroup of $E$. The discrete logarithm problem means that it is impossible to recover $sk$ given the public key $pk$. ### BLS Signing & Verification #### Sign If we have a message $m$ that is represented as EC point either G1 or G2 (depends on the signature scheme used), this means $m \in G_1 or G_2$. We can sign this message by simply multiplying it with the private key $sk$: $$\Sigma = m \cdot sk$$ where $\Sigma$ is the signature that can be verified against the public key $pk$. #### Verify In order to verify the signature $\Sigma$, we will be using the $e$ bilinear mapping property explained above. This signature is valid if and only if: $$e(m \cdot pk) = e(\Sigma, g)$$ We will start with $e(m \cdot pk)$ and showcase why it will always be equal to $e(\Sigma, g)$: $$ \begin{align*} e(m \cdot pk) &= e(m, sk \cdot g) \quad \text{(because } pk = sk \cdot g\text{)} \\ &= e(m \cdot sk, g) \quad \text{(because of the bilinear mapping property)} \\ &= e(\Sigma, g) \quad \text{(because } \Sigma = m \cdot sk\text{)} \end{align*} $$ The problem now is if we want to sign a message it needs to be a curve point as we said earlier, and not a string message like **0xhashiman**, so we must understand how we transform any message to a curve point and prepare it for the signing process. In the case of this challenge, the answer is the Pedersan hash function. ### Key Resources - [Everything You Need to Know About BLS12-381 Curve](https://hackmd.io/@benjaminion/bls12-381) - [Magic Behind Pairings and Mappings in Cryptography](https://static1.squarespace.com/static/5fdbb09f31d71c1227082339/t/5ff394720493bd28278889c6/1609798774687/PairingsForBeginners.pdf) - [Defining Twists on Elliptic Curves](http://indigo.ie/~mscott/twists.pdf) - [Introduction to Elliptic Curve Cryptography](https://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/) ## Hash To Curve ### Pedersen Hash Definition The Pedersen hash is an algebraic hash function that is simply computed as a linear combination of points on an elliptic curve. They are therefore fairly efficient to compute in zkp since their computations rely on field arithmetic, contrary to the more traditional hash functions that have no convenient mathematical structure. Let $M$ be the message that we wish to hash and let $g1, g2, …, gk$ be EC points. These points must be sampled uniformly at random, such that no relationship between them is known. Split the message $M$ into $k$ chunks of bits each: $M = b1b2... bk$. The Pedersen hash is defined as the linear combination of the points with the encoding of the message chunks: $$ H(M) = H(b1b2 … bk) = ⟨b1⟩ ⋅ g1 + ⟨b2⟩ ⋅ g2 + … + ⟨bk⟩ ⋅gk $$ In mathematical terms, we can say given an $n$-bit message $m = (b_1, \dots, b_n)$, where each $b_i$ is a single bit, the value of the Pedersen hash of $m$ is $$ H(m) = \sum_{i=1}^{n} b_i \cdot g_i $$ ### Important Note You might have read articles about how this function is vulnerable to **length extension attacks**, however this issue won’t help us solve this challenge because the usage of this function is generally paired with a hash function like keccak256, sha256 or Blake2s to normalize the input size. In this CTF, the Pedersen hash is paired with the Blake2s hash function. This means that no matter the size of the original message, the input to the Pedersen hash function will be 256 bits. We can now define a more precise formula for signing any message. We will denote the Blake2s hash of a message $m$ using $bl(m), additionally we will denote the $i^{th}$ bit of $bl(m)$ using $bl_i(m)$. Therefore, the Pedersen hash $H$ of any message $m$ in case of our challenge is: $$ H(m) = \sum_{i=1}^{256} bl_i(m) \cdot g_i $$ ### Key Resources - [Pedersen Hash Function Overview](https://iden3-docs.readthedocs.io/en/latest/iden3_repos/research/publications/zkproof-standards-workshop-2/pedersen-hash/pedersen.html) - [Breaking Pedersen Hash in Practice](https://www.nccgroup.com/us/research-blog/breaking-pedersen-hashes-in-practice/) - [Sage Implementation of Pedersen Hash Breaking](https://github.com/ncc-pbottine/ToyPedersenHash/blob/main/pedersen.sage) ## Linear Algebra In order to solve this challenge, we have to use linear algebra to both represent the current data and solve it. We have 256 messages + signatures as leaked data . If we take 2 messages m1, m2 using the previous hash to curve equation, we can define their Pedersan hash using the below notation. $$ H_1 = \sum_{i=1}^{256} bl_i(m_1) \cdot g_i \quad \quad H_2 = \sum_{i=1}^{256} bl_i(m_2) \cdot g_i $$ And their signatures $s_1$, $s_2$ in the BLS curve are where $sk$ is the private key: $$ s_1 = sk \cdot H_1 \quad s_2 = sk \cdot H_2 $$ One of the properties of signatures in the BLS curve is aggregation, which means the signature for $h_1 + h_2$ is $s_1 + s_2$. This is because: $$ sk \cdot (h_1 + h_2) = sk \cdot h_1 + sk \cdot h_2 = s_1 + s_2 $$ We can now represent the whole 256 messages and their respective signatures using the below formula, where $i$ is the index of each message and $j$ is the $j^{th}$ bit of its Blake2s hash, $H_i$ is the Pedersan hash of that message, and $S_i$ is its signature. $$ \sum_{i=1}^{256}H_i = \sum_{j=1}^{256} bl_j(m_i) \cdot g_j \quad \quad \sum_{i=1}^{256}S_i = H_i \cdot sk $$ ### CTF Data Representation For a visual representation, the ctf data can be both structured as EC points so that we can sign the data and binary (blake) format to solve the challenge: #### EC Points Format The formula below will result in $\sum_{i=1}^{256}H_i$ being EC points, this will make it hard to solve the challenge. \begin{array}{c@{\quad}c} \begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_{256} \end{bmatrix} = & \begin{bmatrix} bl_1(m_1)\cdot g_1 & bl_2(m_1)\cdot g_2 & \cdots & bl_{256}(m_1)\cdot g_{256} \\ bl_1(m_2)\cdot g_1 & bl_2(m_2)\cdot g_2 & \cdots & bl_{256}(m_2) \cdot g_{256} \\ \vdots & \vdots & \ddots & \vdots \\ bl_1(m_{256})\cdot g_1 & bl_2(m_{256})\cdot g_2 & \cdots & bl_{256}(m_{256})\cdot g_{256} \end{bmatrix} \end{array} #### Binary Format/Blake Format The formula below will result in $\sum_{i=1}^{256}bl(m_i)$ binary format, which is simply the blake2s of each message where each row has 256 columns and each column has exactly 1-bit of the hash. \begin{array}{c@{\quad}c} \begin{bmatrix} bl(m_1) \\ bl(m_2) \\ \vdots \\ bl(m_{256}) \end{bmatrix} = & \begin{bmatrix} bl_1(m_1)\ & bl_2(m_1) & \cdots & bl_{256}(m_1) \\ bl_1(m_2)\ & bl_2(m_2) & \cdots & bl_{256}(m_2) \\ \vdots & \vdots & \ddots & \vdots \\ bl_1(m_{256})\ & bl_2(m_{256}) & \cdots & bl_{256}(m_{256}) \end{bmatrix} \end{array} #### Signatures We multiply each value of $H_i$ (EC format) by the private key. \begin{array}{c@{\quad}c} \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_{256} \end{bmatrix} = & \begin{bmatrix} H_1 \cdot sk \\ H_2 \cdot sk \\ \vdots \\ H_{256} \cdot sk \end{bmatrix} \end{array} ### Smaller Scale Example Let's work on a smaller scale example. The message we’re working with is deadbeef, which is a 4-byte hexadecimal value. We need to first hash the message **deadbeef** to follow the same rules as the challenge. Let’s assume we are using a hashing function like BLAKE2s. However, for simplicity we’ll use a hypothetical hash output for deadbeef that is 4 bytes (32 bits) long, rather than the usual 32 bytes (256 bits) that a real BLAKE2s hash would produce. Let's say hashing **deadbeef** gives us the output $0x12345678$, which is 4 bytes. Now, convert each byte of $0x12345678$ to binary: \begin{array}{l@{\quad}l} \text{Byte 1: } 0x12 \text{ is } 00010010 & \text{Byte 2: } 0x34 \text{ is } 00110100 \\ \text{Byte 3: } 0x56 \text{ is } 01010110 & \text{Byte 4: } 0x78 \text{ is } 01111000 \end{array} Combine these binary values to get the full 32-bit binary representation of the hash output: H(deadbeef)=$0x12345678$=$00010010001101000101011001111000$ From the previous part, we said the pedersan hash will give us an EC point at the end of the hash because we multiply each bit by $g$. For the sake of this example and challenge, we will be working with the binary format and not the EC format. The row of this message in the matrix would look like this: \begin{array}{c@{\quad}c} \begin{bmatrix} H_1 \end{bmatrix} & = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \end{bmatrix} \end{array} ## Mathematical Solution We have now explained all the essential parts. To produce a signature for our GH username, we must first compute its blake2s hash $bl(0xhashiman)$ (in my case). If our message can be represented as a linear combination of the 256 messages $\sum_{i=1}^{256}bl(m_i)$, then we can translate the same linear combination to its 256 signatures. In mathematical terms, if we are able to find $c_1, \dots, c_{256}$ such that: $$ bl(0xhashiman) = \sum_{i=1}^{256} c_i \cdot bl(m_i) $$ Then we can generate the signature $s_{0xhashiman}$ by following the linearity property. After all, the signature in BLS is simply the private key multiplied by a point, with no random number to obfuscate the process or make it non-linear. $$ s_{0xhashiman} = \sum_{i=1}^{256} c_i \cdot s_i $$ --- To find the values of $\sum_{i=1}^{256} c_i$ denoted as $g$ a vector of constants, we start with the Blake2s hash of our GitHub username denoted as $bl(0xhashiman)$. This hash can be represented as a vector $E$ (a 1x256 matrix), where each column in $E$ holds a single bit of the hash of our message. The new formula is: $$E = g \cdot bl(m_i)$$ From earlier recall, $\sum_{i=1}^{256} bl(m_i)$ forms a large 256x256 matrix, where each row corresponds to a message and each column represents one bit of the hash. We call this matrix F. The new formula is: $$E = g \cdot F$$ For the solution of this linear system that has g as a variable and E F as a known variable: $$ \mathbf{g} = F^{-1} E $$ This is also possible because the matrix F is invertible since it has full rank. In this case, F is a 256x256 matrix, where each row represents a different message and each column corresponds to a bit in the hash of those messages. Since each row in F represents the hashed message and we are using the Blake2 hash function, which is collision-resistant, each hashed message is unique. This property means that you can't find two different messages that produce the same hash, ensuring that no rows in F are duplicates or can be represented as linear combinations of others. This uniqueness contributes to F having full rank, making it invertible. ## Practical Code Solution ### Rust Code ***ALL RUST CODE MUST BE ADDED TO ```verify-bls-pedersen.r```*** #### First Part The first step is to add this import statement to help us convert from bytes to bits easily: ```rust use ark_crypto_primitives::crh::pedersen::bytes_to_bits; ``` The code below is responsible for saving both the matrix F which is 256x256 as well as the vector E which is 1x256: ```rust= // Open F.txt for writing the matrix let mut f_file = File::create("F.txt")?; // Write the beginning of the matrix declaration in Sage format writeln!(f_file, "F = matrix(R, [")?; for (i, m) in ms.iter().enumerate() { // Print the index of the current message being processed println!("Processing message {}", i); // Hash the message to get a curve point format and binary format let (hash, point) = hash_to_curve(&m); // Convert the hash into a bit vector let bits = bytes_to_bits(&hash); // Convert each bit to a u8 let bits: Vec<u8> = bits.into_iter().map(|x| x as u8).collect(); // Write the bit vector as a row in F.txt writeln!(f_file, "{:?}, ", bits)?; } // Close the matrix declaration in Sage format writeln!(f_file, "]);")?; // Open E.txt for writing the target vector let mut e_file = File::create("E.txt")?; // Hash the target name to get the binary and point on the curve let (name_hash, name_point) = hash_to_curve("0xhashiman".as_bytes()); // Convert the hash of the target name into a bit vector let bits = bytes_to_bits(&name_hash); let bits: Vec<u8> = bits.into_iter().map(|b| b as u8).collect(); // Write the target vector in Sage format to E.txt writeln!(e_file, "E = vector(R, {:?})", bits)?; println!("Matrix saved to F.txt and vector saved to E.txt."); ``` We can now jump to Sage before moving to the second part in order to solve our linear system the simplest way possible. #### Second Part At this stage we have everything needed to forge a signature for the message $0xhashiman$: ```rust= // Path to the file containing coefficients let path = "g.txt"; // Open the file containing the coefficients let file = File::open(path)?; let mut constants = Vec::new(); // Read each line, convert it to a BigInt, and add it to the `constants` vector for line in io::BufReader::new(file).lines() { let line = line?; // Read the line as a string let constant = BigInt::from_str(&line).expect("Failed to parse BigInt"); // Convert to BigInt constants.push(constant); // Store in the vector } // Initialize `forged_sig` as the zero point of G1Projective let mut forged_sig = G1Projective::zero(); // Iterate over each coefficient and corresponding signature for (constant, sig) in constants.into_iter().zip(sigs.iter()) { // Convert the BigInt coefficient to an element of the field `Fr` let constant_field: Fr = Fr::from_str(&constant.to_string()).expect("Failed to convert to Fr"); // Perform scalar multiplication with the signature and accumulate forged_sig += sig.mul(constant_field); } // Use the final accumulated `forged_sig` in the verification function verify(pk, "0xhashiman".as_bytes(), forged_sig.into()); // Prepare a buffer for serialization (48 bytes for G1Affine in BLS12-381) let mut hex_sig = Vec::new(); forged_sig.serialize_unchecked(&mut hex_sig).expect("Serialization failed"); // Serialize the signature let hex_string = hex::encode(hex_sig); // Using `hex` crate for hex encoding println!("forged signature = 0x{}", hex_string); // Print the final forged signature in hex ``` The output should be like this: ```rust forged signature = 0xe3cab6c089f2be2cb13b6a72f8ae55d074a1511c5e1be9e66c0cdf5b8fff832fbe276a1b8885b569cec7f06fa567ac18384a0380b9d4d4abea517e2cb745be49b20a184970faa8fdcbe2b615d2b2cd6d00f553bb2ca5d8c8f5022d36a03e4d10 ``` ### Sage Code ```sage= R = GF(0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001)//field initialization F = matrix(R, [[1, 1, 1, ....]])//copy and paste here the matrix saved in F.txt E = vector(R, [0, 0, 1, .... ])//copy and paste here the vector saved in E.txt F = F.transpose()//we need each message hash represented as a column to match the structure of the linear system hence we need to transpose the matrix F_inv = F.inverse()//the inverse of matrix F g = F_inv * E//g is our solution ``` You can now save the constant vector g in a txt file directly from sage using this script. ```sage= with open("g.txt", "w") as file: # Write the vector to the file in a readable format for element in g: file.write(f"{element}\n") print("Vector g saved to g.txt.") ``` ## Conclusion The takeaway from this challenge as an auditor is that whenever you see a project implementing BLS signatures, ensure that you verify the hash-to-curve function being used and check whether the RNG is truly random. This is important because it affects whether the elliptic curve points used in the operation remain the same for every message. Starting from the second challenge on ZK-Hack, we will see more and more of ZKP. See you in the next chapter.