# ZK-Hack Puzzle #1 Writeup
## Intro
A really cool event is ongoing now in the Zero-Knowledge community - the ZK-Hack (details: https://zkhack.dev). Every week a workshop about one of the ZK technologies being developed today is given ([details](https://www.zkhack.dev/#workshop)). The first one, given on 26/10/2021 was an introductory workshop about the field of Zero Knowledge Proofs with a great historical introduction.
The event is ongoing for the following seven weeks, with each week a new puzzle is published after the workshop of that week is given.
In this post I'll share a write-up of the first puzzle ([link](https://www.zkhack.dev/puzzle1.html)) I was solving with [Elichai](https://twitter.com/Elichai2) and [Shalev](https://twitter.com/shalev0s).
## Let's Hash it Out
So before the puzzle we are given introductory material to two topics in cryptography:
1. [BLS Signatures](https://hackmd.io/@benjaminion/bls12-381#BLS-digital-signatures)
1. [Pedersen Hashes](https://developer.aleo.org/developer/toy_examples/pedersen_hash/)
We are also given a referral to the documentation of [arkworks library](https://docs.rs/ark-algebra-intro/0.3.0/ark_algebra_intro/).
The challenge is available on github, [here](https://github.com/kobigurk/zkhack-bls-pedersen).
First let's talk briefly on those two cryptographic topics.
### BLS Signatures
BLS is a signature scheme, named after Boneh, Lynn and Shacham. It is based on **pairings** which is a unique algebraic construct based on **bilinear maps**. Before anyone panics I'll explain what that means.
> Definition: Let $G, G_T$ be two groups. A function $e:G\times G \rightarrow G_T$ is a **bilinear-map** if:
> 1. For any scalar $\lambda$ and for any $a,b\in G$ and for $c = e(a,b)$ we have:
> $$e( \lambda \cdot a,b) = e(a, \lambda \cdot b) = \lambda \cdot e(a,b) = \lambda \cdot c$$
> 2. For any $a_1,a_2,b\in G$ we have:
> $$e(a_1+ a_2,b) = e(a_1,b)+ e(a_2,b) $$
> 3. For any $b_1,b_2,a\in G$ we have:
> $$e(a,b_1+ b_2) = e(a,b_1)+ e(a,b_2) $$
In a sense, it just means a scalar can be moved freely between the any of the arguments or even out of the function evaluation to the output. If you're really interested how the magic happens I'd recommend reading about [Weil-Pairings](https://en.wikipedia.org/wiki/Weil_pairing).
So the BLS signatures are signature schemes that provide the following the functions of KeyGen, Sign and Verify. Let's see how those are performed, please notice that BLS Signatures have multiple variants, I'll be discussing a simplified one that is sufficient to understand the problem.
Setup - Let $g$ be a generator of group $G$ of prime order $r$, and $e:G \times G \rightarrow G_T$ a bilinear map.
1. KeyGen - Take a random scalar $sk$ between 0 and $r-1$. The private key will be $sk$, the public key, $pk=sk \cdot g$. Publish $pk$ and keep $sk$ secret.
2. Sign - Given a message $m\in G$ the signature of the message is $sk \cdot m$.
3. Verify, given a message $m$ and public key $pk$ and signature $\sigma$ verify that $e(m,pk)=e(\sigma,g)$. Notice that if $\sigma = sk \cdot m$ and since $pk=sk \cdot g$ then, following the bilinearity of $e$ we get:
$$ e(m,pk) = e(m,sk \cdot g) = e(sk \cdot m,g) = e(\sigma,g)$$
This is all nice but in real life scenarios our message to sign is an arbitrary stream of bits and not a group element. Can we find a way to map arbitrary bit-streams into group elements? The answer is yes, and is composed typically of two steps.
- First, employing a cryptographically secure hash function to the message, reducing it to a constant size (e.g. 256 bits). One example of such function is [blake2b](https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE2).
- Second, mapping the hash to a group element (typically an element on an elliptic-curve) using a "hash-to-curve" technique.
It is important that along the way we create as little bias as possible to retain the security of the signature scheme.
One such hash-to-curve technique *Pedersen Hashes*, which is our next subject.
### Pedersen Hashes
As mentioned, using *Pedersen Hashes* we can map the output of a hash function to a group element. Since those groups are typically elliptic curve points, we will use "group element" and "curve point" interchangeably in this section.
So, the Pedersen hash scheme setup is based on a set of $n$ group elements $g_1,...,g_n$, which we assume we don't know the discrete-log of each $g_i$ with respect to any other $g_j$. In other words, for each pair $i,j$ ($i\neq j$) we can't efficiently find a value $k$ such that $g_i^k = g_j$.
Given an $n$-bit output of of a hash function $h=(b_1,...,b_n)$ (each $b_i$ is a single bit), the value of the pedersen hash of $h$ is $\sum_{i=1}^n b_i \cdot g_i$. By that we get an element in the group.
## The Challenge
In the challenge we are given [256 messages](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/data.rs#L8) ($m_1,...,m_{256}$) and those [messages' signatures](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/data.rs#L267) $(s_1,...,s_{256})$ signed by some unknown private key ($sk$) which its corresponding [public-key is given](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/data.rs#L7) as well ($pk$).
Each message is signed using the [BLS signature scheme](https://github.com/kobigurk/zkhack-bls-pedersen/blob/main/src/bls.rs) where messages are mapped to group elements by first [employing a blake2s hash](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/hash.rs#L23) and then using [pedersen hash](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/hash.rs#L17) on its output.
It's important to mention that the output of blake2s is 256-bits wide.
The group elements $g_1,...,g_{256}$ are [picked arbitrarily](https://github.com/kobigurk/zkhack-bls-pedersen/blob/e72914cf67799306d6577332c61f162d48440f0f/src/hash.rs#L18).
Notice that the prime size of the group in our challenge is $r=0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001$.
Therefore the private key is a number in $\mathbb{Z}_r$ (between $0$ and $r-1$).
We are told that these signatures were published and someone managed to sign some previously unsigned message, and we're asked how can this be done? The solution is in the form of signing ourselves our username as a proof to show we know how can this be done, this basically means that [Existential-Unforgeability](https://en.wikipedia.org/wiki/Digital_signature_forgery#Existential_forgery_(existential_unforgeability,_EUF)) isn't a property of the signature scheme in this puzzle.
If you haven't tried yet tackling this challenge, I highly encourage you to do so, this is the best way to really understand what happens and to get a better grasp of the underlying concepts who take part in the challenge.
Either way, let's see how this can be solved. We begin with a few notations to make the solution simpler.
## Notation
As for notation, we will denote the blake2s of a message $m_j$ using $b(m_j)$. We will also denote the $i^{\text th}$ bit of $b(m_j)$ using $b_i(m_j)$. Therefore, the pedersen-hash of each message $m_j$ is:
$$ \sum_{i=1}^{256}b_i(m_j)\cdot g_i$$
Thus, we can view each blake2s output $b(m)$ as a column-vector of 1's and 0's:
$$b(m) = \begin{pmatrix}
b_1(m) \\
b_2(m) \\
\vdots \\
b_{256}(m) \\
\end{pmatrix}$$
So, we can also define basic arithmetic of blake2s hashes using vector arithmetics. For two message $m_i,m_j$ we have:
$$ b(m_i) + b(m_j) = \begin{pmatrix}
b_1(m_i) +\ b_1(m_j) \\
b_2(m_i) +\ b_2(m_j) \\
\vdots \\
b_{256}(m_i) +\ b_{256}(m_j) \\
\end{pmatrix}
$$
Where addition is over $\mathbb{Z}_r$ ($r$ is the size of the group $G$).
For a scalar $c\in \mathbb{Z}_r$ and for some message $m$ we define the scalar multiplication $c\cdot b(m)$ as:
$$c\cdot b(m) = \begin{pmatrix}
c\cdot b_1(m) \\
c\cdot b_2(m) \\
\vdots \\
c\cdot b_{256}(m) \\
\end{pmatrix}$$
## The solution
The solution is based on the fact that the signature itself is linear. In the end of the day, we are signing (i.e. multiplying our private key by) a group elemenet. We'll show that the signature of the sum of two group elememts is the sum of the signatures. Let's see what does it mean:
If we take two messages $m_1,m_2$ and blake2s them $b(m_1),b(m_2)$ their pedersen hashes are:
$$ \begin{aligned}
h_1 &= \sum_{i=1}^{256}b_i(m_1)\cdot g_i &
h_2 &= \sum_{i=1}^{256}b_i(m_2)\cdot g_i &
\end{aligned}
$$
Thus, their signatures $s_1,s_2$ are:
$$ \begin{aligned}
s_1 &= sk \cdot h_1 &
s_2 &= sk \cdot h_2
\end{aligned}
$$
It's very easy to tell that the signature of $h_1+h_2$ is: $$sk\cdot(h_1+h_2)=sk\cdot h_1 + sk\cdot h_2 = s_1 + s_2$$
Not only that, but the pedersen hashing is also linear!
Given the vector arithmetics defined previously for the blake2s outputs we can tell that first summing the blake2s outputs and then performing the pedersen hash transformations or first doing the pedersen transformation for each blake2s output and then adding the results yields the same output. In algebtraic terms it means that:
$$
\sum_{i=1}^{256}b_i(m_1)\cdot g_i + \sum_{i=1}^{256}b_i(m_2)\cdot g_i =
\sum_{i=1}^{256}(b_i(m_1) + b_i(m_2))\cdot g_i
$$
Ok, with these two linearity properties, we are ready to solve the puzzle!
We have message $m$ we want to sign, we compute its blake2s hash $b(m)$.
If we are able to find constants $c_1,...,c_{256}$ in $\mathbb{Z}_r$ such that:
$$\begin{aligned} (\triangle) & \ \ \ \ b(m) = \sum_{i=1}^{256} c_i \cdot b(m_i) \end{aligned}$$
Then we can generate signature $s_m$ for $m$ by following both linearity properties.
$$ \begin{aligned} (\square) &\ \ \ \ \ s_m = \sum_{i=1}^{256} c_i \cdot s_i \end{aligned}$$
Remember that $(\triangle)$ is a vector equation, holding for each entry in the vector $b(m)$. Which means that for all $j$:
$$b_j(m) = \sum_{i=1}^{256} c_i \cdot b_j(m_i)$$
Let's see why $(\square)$ holds:
$$ \begin{aligned}
s_m &= sk\cdot \sum_{j=1}^{256}b_j(m)\cdot g_j \\
&= sk\cdot \sum_{j=1}^{256}\left(\sum_{i=1}^{256} c_i b_j(m_i)\right)\cdot g_j \\
&= sk\cdot \sum_{i=1}^{256}\left(\sum_{j=1}^{256} c_i b_j(m_i)\cdot g_j\right) \\
&= \sum_{i=1}^{256}c_i\underbrace{\cdot s_k\left(\sum_{j=1}^{256} b_j(m_i)\cdot g_j\right)}_{s_i} \\
&= \sum_{i=1}^{256}c_is_i \\
\end{aligned}
$$
Great, so now the only question is - how can we find those constants $c_1,...,c_{256}$?
Well, let's look at $(\triangle)$ again, as we already said this is a vector-equation, so it's actually a linear equation system with 256 linear equations (since the vectors are of size 256) and with 256 variables ($c_1,...,c_{256}$).
We can express this system of linear equations in matrix nontation:
$$Ac = b$$
Such that the variable of the equation is:
$$ c = \begin{pmatrix}
c_1 \\
c_2 \\
\vdots \\
c_{256}
\end{pmatrix}
$$
The vector $b$ is the bits of the blake hash of our target message $m$:
$$ b = b(m) = \begin{pmatrix}
b_1(m) \\
b_2(m) \\
\vdots \\
b_{256}(m)
\end{pmatrix}
$$
And the matrix $A$ is the bits of the blake hash of all messages $m_1,...,m_{256}$. Each message has its own column:
$$ A = \begin{pmatrix}
b_1(m_1) & b_1(m_2) & \dots & b_1(m_{256}) \\
b_2(m_1) & b_2(m_2) & \dots & b_2(m_{256}) \\
\vdots & \vdots & \ddots & \vdots \\
b_{256}(m_1) & b_{256}(m_2) & \dots & b_{256}(m_{256}) \\
\end{pmatrix}
$$
The solution to the equation is:
$$c = A^{-1}b$$
Luckily enough, matrix $A$ is invertible so can easily sign the message $m$ following our previous equation:
$$s_m = \sum_{i=1}^{256}c_is_i $$
## Code
In this section we give the basic tools we used to solve the problem in code. While the input data and the whole rust program is written in rust, we have written our solution in the [sage programming language](https://www.sagemath.org/).
Sage is a mathematical-oriented programming language with many built-in tools for computation in the fields of algebra, statistics, combinatorics, graph theory and more. It may look similar to python to some readers.
We have also used some rust code to preprocess the input to the sage program and postprocess its output.
### Input preprocessing
First we took the messages blake2s hashes from the rust file using the following functions:
```rust=
fn bytes_to_bits_string(bytes: &[u8]) -> String {
let bits = bytes_to_bits(bytes);
let mut s = String::with_capacity(bits.len());
for bit in bits {
if bit {
s.push('1');
} else {
s.push('0');
}
}
return s;
}
fn write_msgs_to_file(msgs: Vec<Vec<u8>>) {
let mut file = File::create(format!(
"bits_vecs-{}",
(SystemTime::now().duration_since(UNIX_EPOCH))
.unwrap()
.as_millis()
))
.unwrap();
for msg in msgs {
let blake = hash_to_curve(&msg).0;
let string = bytes_to_bits_string(&blake);
file.write_all(string.as_ref()).unwrap();
file.write_all(b"\n").unwrap();
}
}
```
The `write_msgs_to_file` function takes a vector of blake2b hashes of the input messages and writes each hash as a string of 0s and 1s representing the binary form of the hash.
Each hash is written in a separate line to the output file.
This file will be passed to our sage code.
### Algebraic Processing in Sage
Our sage code is quite short and powerful.
First we read the input and parse it as a list of lists (therefore - a matrix) of 0s and 1s, each binary digit in its own cell of the matrix.
```sage=
A = list()
# bits_vecs-1635288647752 was computed by the rust program, it contains a list of all the messages in bits represantion (after the blake2s hash)
with open("bits_vecs-1635288647752", 'r') as f:
for line_index, line in enumerate(f):
A.append(list())
for bit_index in range(0, 256):
A[line_index].append(int(line[bit_index]))
```
Next, we define $P$ to be the order of the curve, so scalar that will later be multiplied by the generators in the Pedersen-has-to-curve scheme will be taken from the field $F=\mathbb{Z}_P$ defined right after.
```sage=
# Curve Order
P = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
F = FiniteField(P)
```
Next, we define the matrix `GA` to be simply the list-of-lists `A` we previously defined, where each entry (0/1) is considered as an element in field `F`. We transpose it because if you pay attention in the `A` matrix we defined in the previous section each *column* (and not row) should contain the bits of a hash of a specific message.
Finally, we compute `GAinv` which is the transpose of our matrix.
```sage
GA = Matrix(F, A)
GAT = GA.transpose()
GAinv = GAT.inverse()
```
Next, we computed the blake2s value of our message -- this is our vector $b$ from previous section where we will look at each bit as an element in $\mathbb{Z}_P$. Finally, we have to compute $A^{-1}\cdot b$.
```sage=
```sage
# blake2s of ou
bitstring = '0111011010010000000111101110100010110110010100111011011011111111100110001010110001110000111111111100111001100110001001111101110110111011101000100110010110011111001111011010110111110011111111110100111011111011000110000011100101101000001110011101011100010000'
bits = [int(el) for el in bitstring]
gbits = vector(F, bits)
gsolution = GAinv * gbits
# Print the generator multipliers.
",".join(['Fq::from_str("'+str(el)+'").unwrap()' for el in gsolution])
```
The output we get is a vector $c$ of elements from $\mathbb{Z}_P$ such that the signature for our message will be $\sum_{i=1}^{256}c_i\cdot s_i$ where $s_i$ is the signature of the $i^{\text th}$ message. This part will be done in rust.
### Final Postprocessing and Signature Generation
Our signature generation is also done using rust and is available here as the full code:
https://gist.github.com/elichai/7401f5423c2693960677ba4f8a9fab14#file-computing_the_sig-rs-L55
Here we'll give some explanation on snippets out of it.
First we define the `selectors` array, a very long array such that `selectors[i]` is the field element $c_i$ we have obtained using our sage program.
```rust=
let selectors = [
Fr::from_str(
"27645015623588109382996024038763530282647599513403648261518408122004451823795",
)
.unwrap(),
Fr::from_str(
"23018579491472099737921523253639007115479688088731410213980168199642094036630",
)
....
....
....
.unwrap(),
Fr::from_str(
"6769691326408305518047502379958157439957827386631887324632648911856770263560",
)
.unwrap(),
];
```
Next, we multiply each `selector[i]` by $s_i$ the signature of $i^{\text th}$ message (`sigs[i]`).
This is done using arkworks library.
```rust=
let mut sum = G1Projective::zero();
for (i, num) in selectors.iter().enumerate() {
let additive = sigs[i].into_projective().mul(num);
sum += additive;
}
```
Next we make a curve element out of the sum
```rust=
let affine = G1Affine::from(sum);
```
And we verify we actually got the correct signature.
```rust=
verify(pk, msg, affine.clone());
```
Assuming we do, we output the signature so it is ready for submission and we're done! :clap:
```rust=
let mut sig = Vec::new();
affine.serialize(&mut sig).unwrap();
let sig_hex = hex::encode(sig);
println!("sig: {}", sig_hex);
```