# Hash Attacks ## What is a Hash A hash is a function that is very easy to compute in one direction and hard to compute in the other. If $f(x)$ is a hash function, and we know $f$, then if $f(x) = y$ and BOTH are in bytes knowing $x$ its easy to get $y$. If we only know $y$, we should have to brute force to get $x$ Hashes are psuedorandom; close inputs will produce very different outputs. They are NOT encryption, they don't use an encryption key. We use them to quickly "summarize" the data in a file or password. You might want to know if I have a certain file without sending me the file itself. To check, we could just exchange hashes. Google might want to check if you have the password corresponding to your username without just storing your password. So they will store a hash of the password (this is typically done with a relatively hard-to-compute hash to make bruteforce really hard). Basic vocab: collision: any two values that hash to the same value; $a \neq b, f(a) = f(b)$ 2nd preimage Given a desired hash $h$ find $a$ where $f(a) = h$ Note; both of these are really bad. If you can generate either one of these (or a few more attacks) in less than brute-force for a given hash, then the hash is considered broken. ## Common Hashes ### BROKEN AND DO NOT USE Md4, Md5 Sha1 ### Fine :D :D Sha256, Sha384, Sha512, Sha3 ## How do hash tho?? Most common is the Merkle Damgard construction. Given $f(x) = y$ where $x$ is larger than $y$, and BOTH are in bytes, we call $f$ a compression function. If it is hard to find $a$ and $b$ such that $f(a) = f(b)$ $f$ is said to be collision resistant (property of a good hash)! Merkle Damgard says that if we can find a compression function with this property, we can get a strongish hash from it. But it has some weaknesses. Assume that $||$ is concatencation; if ``` python a = b"dog" b = b"cat" ``` Then, $$ c = a || b $$ is like ``` python c = a + b ``` Merkle Damgard: $IV$ is the intial state value (publicly known usually) and $f$ is the compression function. We pad the message until it can be divided into equally sized blocks $b_1,b_2...b_n$ We add a padding block with some info about the original length of the data $b_1,b_2...b_n,b_{padding}$ Then... $state = f(IV || b_1)$ $state = f(state || b_2)$ $state = f(state || b_3)$ ... $state = f(state || b_n)$ $state = f(state || b_{padding})$ finally, we return the state. This is vulnerable to attack. ### Length Extension Assume we create a Message Authentication Code (MAC) from Merkle-Damgard hash $H(x)$ in this way: $MAC = H(secret || msg)$ to confirm that we trust the message, we verify this relationship. And since 2nd preimage is hard, this should be secure, right? WRONG Given a $MAC$ we can forge valid messages Recall: $state = f(IV || b_1)$ $state = f(state || b_2)$ These blocks have $secret$ $state = f(state || b_3)$ ... $state = f(state || b_n)$ These blocks contain $msg$ $state = f(state || b_{padding})$ Now, $state = MAC$ So the last state is our MAC... We know the hash function state after $secret || msg || padding(secret || msg)$ is processed This is the MAC But because of how the hash is made, what if we wanted $H(secret || msg|| padding(secret || msg)||msg_{evil})$ $state = f(IV || b_1)$ $state = f(state || b_2)$ These blocks have $secret$ $state = f(state || b_3)$ ... $state = f(state || b_n)$ These blocks contain $msg$ $state = f(state || b_{padding(secret||msg)})$ POINT A Now $state = MAC$ $state = f(state || b_{n+1})$ These blocks contain $msg_{evil}$ $state = f(state || b_{n+2})$ $state = f(state || b_{actual-padding})$ We don't know secret so we can't recompute to POINT A. But we do know $state$ at that point-it's just $MAC$ so $state = f(MAC|| b_{n+1})$ These blocks contain $msg_{evil}$ $state = f(state || b_{n+2})$ $state = f(state || b_{actual-padding})$ Now $state$ is a valid MAC... we've forged a valid MAC for a new message, the message $secret||msg||padding(secret||msg)||msg_{evil}$