TSA Cyber Champion 2024

# TSA Cyber Champion 2024 This is a short writeup for the Cryptography challs in TSA Cyber Champion 2024. This time, most of the challenges revolves around ZKP and BLS, both of which I did not fully comprehend the minute little details. Thus, this event was quite fun for me as I got to have more experience with ZKP variants and the simple yet confounding BLS signature scheme. ## 101 - Cryptography This is a boilerplate RSA challenge, given `ct, e, n` where `n` has a small factor. Not much else to say. ### Solver ```python= c = 231722077914684998818993776518942509384465803531548983146869754932667754136315007943497593396644630089073196170276638447665765624960333289097324447779290700092664403080584161276778064977902852018557301618273474139777712464709585187730351308079009718870031364399745764326436147001877583703027251271265576350621173 e = 65537 n = 257208938346934642693512128888810986151634836498153528507638790770764504946719195736987613302526116425873247750032929224521429342437621496424825810959518932424007107126934957421561529561264636001476988808843995824395131838577901446930016348590793828420808295335603083382120208905347497068915850813369038886980997 p = 2647 q = n // p d = pow(e, -1, (p-1) * (q-1)) from libnum import n2s print(n2s(pow(c, d, n))) ``` ## cryptomata This challenge uses the BLS signature scheme, where essentially we are tasked with forging the signature for message `admin`. This was quite confusing at first as we are not given even the public key of the signature scheme, so I went digging on the `py_ecc` BLS library. After inspecting the implemented BLS scheme in the library for a couple hours, nothing seemed out of place. The only interesting bit that I noticed is that the library actually implements a more complicated hash function, contrary to the one implemented in the challenge. ```python= def hash_to_G1(message: bytes, DST: bytes, hash_function: HASH) -> G1Uncompressed: """ Convert a message to a point on G1 as defined here: https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-09#section-6.6.3 The idea is to first hash into FQ and then use SSWU to map the result into G1. Contents and inputs follow the ciphersuite ``BLS12381G1_XMD:SHA-256_SSWU_RO_`` defined here: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hash-to-curve-09#section-8.8.1 """ u0, u1 = hash_to_field_FQ(message, 2, DST, hash_function) q0 = map_to_curve_G1(u0) q1 = map_to_curve_G1(u1) r = add(q0, q1) p = clear_cofactor_G1(r) return p ``` ```python= # Helper function to hash a message to a point on G1 def hash_to_G1(message: bytes) -> tuple: hash_val = int.from_bytes(hashlib.sha256(message).digest(), 'big') % curve_order return multiply(G1, hash_val) # G1 is the generator point on the curve ``` As it turns out, this simple hashing is indeed the vulnerability. In order to attack this, let's take a look at the signing function. ```python= # Sign a message def sign_message(sk: int, message: bytes) -> tuple: H_m = hash_to_G1(message) # Hash message to a point on G1 sigma = multiply(H_m, sk) # Signature: sk * H(m) return sigma ``` Here, it is clear that signature `s` is just `G1 * H(m) * sk` where `H(m)` is the integer value of the message hash. Furthermore, we know the curve order `q` such that `G1 * q * k = O` where `O` is the infinity point for any arbitrary `k`. Thus, given a known message signature pair `(m, s)`, we can construct signature for target message `(s', m')` as follows: `s' = G1 * H(m') * sk = G1 * H(m) * inv(H(m), q) * H(m') * sk = s * H(m') * inv(H(m), q)` Note that we can calculate all of the values in the right hand side. Thus, we can easily forge signatures for any message this way. ### Solver ```python= from py_ecc.optimized_bls12_381 import G1, G2, add, multiply, pairing, is_on_curve, neg from py_ecc.optimized_bls12_381.optimized_curve import curve_order import hashlib import random from py_ecc.optimized_bls12_381 import normalize, FQ from azunyan.conn import remote, process # Helper function to hash a message to a point on G1 def hash_to_G1(message: bytes) -> tuple: hash_val = int.from_bytes(hashlib.sha256(message).digest(), 'big') % curve_order return multiply(G1, hash_val) # G1 is the generator point on the curve def hash_to_int(message: bytes) -> int: hash_val = int.from_bytes(hashlib.sha256(message).digest(), 'big') % curve_order return hash_val def serialize(point): xy = normalize(point) b = int.to_bytes(int(xy[0]), 48, 'big') b += int.to_bytes(int(xy[1]), 48, 'big') return b.hex() def deserialize(token): token = bytes.fromhex(token) assert len(token) == 96 x = int.from_bytes(token[:48], 'big') y = int.from_bytes(token[48:], 'big') return (FQ(x), FQ(y), FQ(1)) r = remote("0.cloud.chals.io", 32571) #r = process(['python3', 'challenge.py'], level='debug') msg = b'azuketto' hVal = hash_to_int(msg) tmsg = b'admin' thVal = hash_to_int(tmsg) f = pow(hVal, -1, curve_order) * thVal % curve_order # sign up r.sendlineafter(b'$> ', b'2') r.sendlineafter(b'Username:', msg) r.recvuntil(b"Here's your credential token:") tok = r.recvtype(str) sig = deserialize(tok) sigf = multiply(sig, f) # sign in r.sendlineafter(b'$> ', b'1') r.sendlineafter(b'Username:', tmsg) r.sendlineafter(b":", serialize(sigf).encode()) r.interactive() # TSA{proof_that_you_are_smarter_than_AI_3a78250f} ``` ## zkrsa This challenge is pretty basic for a zkp challenge, but the challenge is implemented in Go and uses a Go library, so part of the solver must use Go. Essentially, we must prove that for a given RSA public key `n` that we know two number `p, q` such that `p * q = n`. The challenge does validation for the numbers as follows. ```go= package main import ( "github.com/consensys/gnark/frontend" ) type Circuit struct { P frontend.Variable // p --> secret visibility (default) Q frontend.Variable `gnark:"q,secret"` // q --> secret visibility RSA frontend.Variable `gnark:",public"` // rsa --> public visibility } // taken from official example: https://play.gnark.io/?id=rsa // so it should be safe func (circuit *Circuit) Define(api frontend.API) error { // ensure we don't accept RSA * 1 == RSA api.AssertIsDifferent(circuit.P, 1) api.AssertIsDifferent(circuit.Q, 1) // compute P * Q and store it in the local variable res. rsa := api.Mul(circuit.P, circuit.Q) // assert that the statement P * Q == RSA is true. api.AssertIsEqual(circuit.RSA, rsa) return nil } ``` Note that the validation only checks that neither `p` nor `q` is equal to one. This is not good enough, as we can set `p = -1` and `q = -n` and we still achieve `p * q = n`. The difficulty comes in actually writing a solver, which took me nearly an hour. ### Solver Note: I am not highly proficient in Go and was unable to communicate with the remote server using Go, so I did the communication with Python and copy pasted values to the Go solver. ```python= from azunyan.conn import remote from hashlib import sha256 r = remote('0.cloud.chals.io', 13706, level='debug') r.recvuntil(b'Provide me hexstring x such that len(x) == 6 AND SHA256(x) ==') targetHash = r.recvtype(str) charset = 'abcdef0123456789' def gen(cur = ''): if len(cur) == 6: yield cur return for c in charset: yield from gen(cur + c) for g in gen(): if sha256(g.encode()).hexdigest() == targetHash: print(f"Found: {g}") r.sendlineafter(b'x: ', g.encode()) break r.interactive() ``` ```go= package main import ( "bytes" "encoding/hex" "fmt" "math/big" "github.com/consensys/gnark-crypto/ecc" "github.com/consensys/gnark/backend/groth16" "github.com/consensys/gnark/frontend" "github.com/consensys/gnark/frontend/cs/r1cs" "github.com/rs/zerolog" ) func init() { zerolog.SetGlobalLevel(zerolog.Disabled) } func main() { var bigMin = big.NewInt(-1) var str = "==INSERT MODULUS INT STR HERE==" var prove_str = "==INSERT PROVING KEY HEX HERE==" bigNum := new(big.Int) _, ok := bigNum.SetString(str, 10) // 10 is the base (decimal) if !ok { fmt.Println("Error: failed to parse string to big.Int") return } circuit := Circuit{ P: bigMin, // ignore private value Q: new(big.Int).Mul(bigMin, bigNum), // ignore private value RSA: bigNum, } witness, err := frontend.NewWitness(&circuit, ecc.BLS12_377.ScalarField()) if err != nil { panic(err) } r1cs, _ := frontend.Compile(ecc.BLS12_377.ScalarField(), r1cs.NewBuilder, &circuit) pkh, _ := hex.DecodeString(prove_str) var loadedPk groth16.ProvingKey loadedPk = groth16.NewProvingKey(ecc.BLS12_377) _, err = loadedPk.ReadFrom(bytes.NewReader(pkh)) if err != nil { fmt.Println("Invalid key!") return } var pkHex bytes.Buffer loadedPk.WriteTo(&pkHex) fmt.Println("ProvingKey:", fmt.Sprintf("%x", pkHex.Bytes())) fmt.Println("Running proof") proof, err := groth16.Prove(r1cs, loadedPk, witness) if err != nil { fmt.Println("Fail proof!") return } fmt.Println("Proof done") var proofHex bytes.Buffer proof.WriteTo(&proofHex) fmt.Println("Proof:", fmt.Sprintf("%x", proofHex.Bytes())) } // TSA{small_warmup_and_intro_to_zkp_in_the_wild} ``` ## zeronium Whew, this one was a doozy. I spotted the theoretical flaw of this protocol immediately, where the initial transcript is not actually committed, and can be generated after the proof. This should be a major flaw theoretically, but actually finding a practical attack was such a pain, and I will log every failed attempt I made. The challenge implements a ZKP that allows proving and verifying of knowledge of values in a certain range. In the challenge, the parameter is set to prove knowledge in the range `[0, 2**32)`. However, the flag is given only when we manage to create a `(proof, commitment)` pair, such that when we reveal our value, the value is actually outside of the proof range. ### Theoretical flaw First, let's see how the initial commitment of the value is done. ```python= def prove(): value = int(input("value: ")) % curve.P256.q v_blinding = get_random_int() print("Your secret blinding value is:", v_blinding) commitment = zk.commit(value, v_blinding) proof = zk.prove(value, v_blinding) print("Commitment:", point_to_bytes(commitment).hex()) print("Proof:", serialize_proof(*proof)) ``` ```python= def commit(self, v: int, v_blinding: int): commitment = pedersen_commitment(v, v_blinding, self.B, self.B_blinding) self.transcript += point_to_bytes(commitment) return commitment ``` Okay, it seems that the commitment is done before `prove` is called, and the commitment is indeed inserted into `transcript`. However, upon further inspection it appears that this `transcript` is actually reinitialized in the `prove` function. ```python= A = multiexp(self.G, a_l) + multiexp(self.H, a_r) + a_blinding * self.B_blinding S = multiexp(self.G, s_l) + multiexp(self.H, s_r) + s_blinding * self.B_blinding self.transcript = point_to_bytes(A) self.transcript += point_to_bytes(S) ``` Well, this is a major flaw. Theoretically, in order to provide Honest Verifier Zero Knowledge (HVZK) property in the protocol, there must exist a simulator such that if it sees the challenge before choosing committed values, it can efficiently compute a proof/transcript even without knowing the secret. Note that in this Fiat-Shamir variant, the challenge is indeed the transcript, which is computed from hashes of committed values. Note: Transcript in this implementation differs from the common definition of the transcript in ZKP and it is quite confusing. Altough it is true that all values inserted into the transcript variable are evetually emitted as part of the proof transcript, in this implementation the transcript variable is used more as a challenge, not unlike the ones used in the Fiat-Shamir transformation. Thus, from here onwards, I will refer to transcript as the transcript defined in this implementation (which is closer to the definition of challenge), and I will refer to the usual ZKP transcript as proof. ### My wandering thoughts *This section does not contribute anything to the solution. This is just a personal log of my failed ideas. Skip ahead for the solution.* #### Idea #1: Modifying initial values On the prove function, the `v` (value) variable is read bit by bit and stored on the `a` variable. ```python= a = [((v >> i) & 1) for i in range(self.n)] a_l = [] a_r = [] for ai in a: a_l += [ai] a_r += [(ai - 1) % p] ``` The `v` variable is only read here, but these three arrays are used extensively in other variables computed in the prove function. It is clear that these arrays are inteded to have binary values (other that `a_r`), but I tried to make non binary values work by modifying primarily values in `a`. However, it seems that there needs to be some property fulfilled by `a_l` and `a_r` so that the other equations work. ```python= exp_2 = 1 exp_y = 1 for i in range(self.n): l_0.append((a_l[i] - z) % p) l_1.append(s_l[i]) r_0.append((exp_y * (a_r[i] + z) + z*z * exp_2) % p) r_1.append(exp_y * s_r[i] % p) exp_y *= y exp_2 += exp_2 ``` I did a couple hours of trial and error here, but even at the time of writing this, I still do not have a complete understanding on how the equations work, and whether or not this idea could actually succeed with proper modifications. #### Idea #2: Directly adding commitment Let's take a look at the end of the verification function, and what values it calculates. ```python= def verify(self, A, S, T1, T2, t, t_blinding, e_blinding, a, b, L, R, commitment): # Lots of calculation redacted points = [ A, S, commitment, T1, T2, self.B, self.B_blinding, ] + self.G + self.H + L + R scalars = [ 1, x, c * z*z % p, c * x % p, c * x*x % p, (w*(t - a*b) + c*(delta(y, z, p, self.n) - t)) % p, (-e_blinding - c*t_blinding) % p, ] + scalar_mul_g + scalar_mul_h + challenges + challenges_inv check = multiexp(points, scalars) return check == point.Point.IDENTITY_ELEMENT ``` It seems that our commitment will be multiplied with `c * z*z % p` where `c` is a random integer generated on verification, and `z` is a value generated from the transcript. Noting that our commitment is just `commitment = B1 * value + B1_blinding * v_blinding`, we see that if we can add commitment by some `B1 * n`, we can also add value by `n`. First, I tried to set `n = 1`, so that the sum will increase by `B * c * z * z`. Now, take a look at the 6th value in the array. It does the multiplication of `B` and `(w*(t - a*b) + c*(delta(y, z, p, self.n) - t)) % p`. Now, we a lot of potential variables to play around with. Let's first eliminate some variables that are hard to manipulate. Looking at `delta(y, z, p, self.n)`, we see that all the parameters are either constants or values generated from the transcript. The delta function itself also seems solid enough. ```python= def delta(y,z,p,n): sum_pow_2_y = sum([pow(y, i, p) for i in range(n)]) % p z_pow_3 = pow(z, 3, p) sum_2 = sum([pow(2, i, p) for i in range(n)]) % p return (((z - pow(z, 2, p)) * sum_pow_2_y) - (z_pow_3 * sum_2)) % p ``` Thus, we cannot modify this variable. Lastly, we also note that `w` is a value calculated from the transcript, and is hard to modify. This leaves us with variables `t, a, b`. #### Idea #2.1: Modifying variable t The variable `t` is actually calculated pretty late, and is not used anywhere else after initialization other that inserted into the transcript. ```python= t = (t_poly[0] + t_poly[1]*x + t_poly[2]*x*x) % p t_blinding = (z*z * v_blinding + t1_blinding*x + t2_blinding*x*x) % p e_blinding = (a_blinding + s_blinding*x) % p self.transcript += int.to_bytes(t, 32, 'big') ``` Thus, adding `z * z` before it is added to the transcript might just work. Looking at `(w*(t - a*b) + c*(delta(y, z, p, self.n) - t)) % p` however, we see that even though is reduces the equation's value by `c * z * z`, whis also adds it by `w * z * z`. Well, can we somehow balance this by modifying `a, b`? #### Idea #2.2: Modifying variable t, a, and b Interstingly, variables `a` and `b` are actually not inserted into the transcript, even though it is part of the proof. It is calculated last on the `inner_product_proof` function. ```python= a,b,L,R,_= inner_product_prove(l_list, r_list, G, Hprime, Q) ``` ```python= from utils import * def inner_product(a, b): return sum(a * b for a, b in zip(a, b)) % curve.P256.q def split_half(data): mid_index = len(data) // 2 return data[:mid_index], data[mid_index:] def inner_product_prove(a: list, b: list, G: list, H: list, Q: point.Point): assert len(a) == len(b) == len(G) == len(H) transcript = b"" for g in G: transcript += point_to_bytes(g) for h in H: transcript += point_to_bytes(h) ab = inner_product(a, b) commitment = pedersen_commitment(a, b, G, H) + ab * Q L_list = [] R_list = [] u_list = [] n = len(a) while n != 1: n //= 2 a_low, a_hi = split_half(a) b_low, b_hi = split_half(b) G_low, G_hi = split_half(G) H_low, H_hi = split_half(H) L = multiexp(G_hi, a_low) + multiexp(H_low, b_hi) + inner_product(a_low, b_hi) * Q R = multiexp(G_low, a_hi) + multiexp(H_hi, b_low) + inner_product(a_hi, b_low) * Q L_list.append(L) R_list.append(R) transcript += point_to_bytes(L) transcript += point_to_bytes(R) u = challenge(transcript) u_inv = pow(u, -1, curve.P256.q) u_list.append(u) for i in range(n): a_low[i] = (a_low[i] * u + a_hi[i] * u_inv) % curve.P256.q b_low[i] = (b_low[i] * u_inv + b_hi[i] * u) % curve.P256.q G_low[i] = multiexp([G_low[i], G_hi[i]], [u_inv, u]) H_low[i] = multiexp([H_low[i], H_hi[i]], [u, u_inv]) a = a_low b = b_low G = G_low H = H_low a = a[0] b = b[0] return a, b, L_list, R_list, commitment ``` Unfortunately, as seen above, this inner product function is absurdly complicated. Even so, I tried to understand and modify it for a couple hours, and failed miserably. At the time of writing this, I still do not know if I could succeed given more time using this idea. ### Forging a proof Now, let's take a look at the verification function once more, and what values it calculates. ```python= def verify(self, A, S, T1, T2, t, t_blinding, e_blinding, a, b, L, R, commitment): # Lots of calculation redacted points = [ A, S, commitment, T1, T2, self.B, self.B_blinding, ] + self.G + self.H + L + R scalars = [ 1, x, c * z*z % p, c * x % p, c * x*x % p, (w*(t - a*b) + c*(delta(y, z, p, self.n) - t)) % p, (-e_blinding - c*t_blinding) % p, ] + scalar_mul_g + scalar_mul_h + challenges + challenges_inv check = multiexp(points, scalars) return check == point.Point.IDENTITY_ELEMENT ``` Recall that our commitment will be multiplied with `c * z*z % p` where `c` is a random integer generated on verification, and `z` is a value generated from the transcript. Noting that our commitment is just `commitment = B1 * value + B1_blinding * v_blinding`, we see that if we can add commitment by some `B1 * n` and balance this added value somewhere such that the verification still passes, we can efficiently forge a proof for value larger than 32 bits. After pondering for what feels like an eternity, turns out the solution is very simple. The variables `T1` and `T2` are the perfect candidates to do this balancing. In this solution, I will only use `T1` as it is sufficient. ```python= T1 = pedersen_commitment(t1 - z**2, t1_blinding, self.B, self.B_blinding) # PATCHED HERE T2 = pedersen_commitment(t2, t2_blinding, self.B, self.B_blinding) self.transcript += point_to_bytes(T1) self.transcript += point_to_bytes(T2) ``` Intially, `T1` is calculated as `t1 * B + t1_blinding * B_blinding`. Well, what if we just reduce it by `z**2 * B`? In the final verification calculation, `T1 * c * x` will decrease by `c * z**2 * x * B`. This is exactly what we needed, as now we can add the commitment by `B1 * x` such that in the final calculation, `commitment * c * z**2` will increase by `c * z**2 * x * B`, balancing out the entire calculation. This allows us to forge the proof easily. Note that we need to increase commitment depending on the variable `x` which is generated from the transcript after `T1`. This is not a problem, but keep in mind that we have to modify `T1` __before__ the calculation of the transcript, otherwise the transcript calculated in the verification will differ. ```python= def get_random_int(): #rand = random.SystemRandom() random.seed(23425) rand = random return rand.randint(1, curve.P256.q) ``` In order to write a solver easier, we can also fix the randomness such that `x` will always be the same on different runs. ### Solver ```python= from range_proof import RangeProof from utils import get_random_int, bytes_to_point, point_to_bytes, serialize_proof, deserialize_proof, curve from azunyan.conn import process, remote #r = process(['python3', 'challenge.py'], level='debug') r = remote("0.cloud.chals.io", 28347) zk = RangeProof(32) x = 51080844872852276557658802498560092047565239188604507211960285500720406450459 value = (2 ** 31 - 1) v_blinding = 23 proof = zk.prove(value, v_blinding) commitment = zk.commit(value + x, v_blinding) # verify r.sendlineafter(b'$> ', b'2') r.sendlineafter(b'Commitment:', point_to_bytes(commitment).hex().encode()) r.sendlineafter(b'Proof:', serialize_proof(*proof)) r.sendlineafter(b'(y/n)>', b'y') r.sendlineafter(b'value:', str(value + x).encode()) r.sendlineafter(b'blinding value:', str(v_blinding).encode()) r.interactive() #TSA{this_challenge_is_just_excuse_for_you_to_study_bulletproofs_xD} ``` Note: solver uses patched `RangeProof` and `utils`. See previous segment for details.