# Hash Attacks
## What is a Hash
A hash is a function that is very easy to compute in one direction and hard to compute in the other.
If $f(x)$ is a hash function, and we know $f$, then
if
$f(x) = y$ and BOTH are in bytes
knowing $x$ its easy to get $y$. If we only know $y$, we should have to brute force to get $x$
Hashes are psuedorandom; close inputs will produce very different outputs. They are NOT encryption, they don't use an encryption key.
We use them to quickly "summarize" the data in a file or password. You might want to know if I have a certain file without sending me the file itself. To check, we could just exchange hashes. Google might want to check if you have the password corresponding to your username without just storing your password. So they will store a hash of the password (this is typically done with a relatively hard-to-compute hash to make bruteforce really hard).
Basic vocab:
collision: any two values that hash to the same value;
$a \neq b, f(a) = f(b)$
2nd preimage
Given a desired hash $h$
find $a$ where
$f(a) = h$
Note; both of these are really bad.
If you can generate either one of these (or a few more attacks) in less than brute-force for a given hash, then the hash is considered broken.
## Common Hashes
### BROKEN AND DO NOT USE
Md4, Md5
Sha1
### Fine :D :D
Sha256, Sha384, Sha512, Sha3
## How do hash tho??
Most common is the Merkle Damgard construction.
Given $f(x) = y$ where $x$ is larger than $y$, and BOTH are in bytes, we call $f$ a compression function. If it is hard to find $a$ and $b$ such that $f(a) = f(b)$ $f$ is said to be collision resistant (property of a good hash)!
Merkle Damgard says that if we can find a compression function with this property, we can get a strongish hash from it. But it has some weaknesses.
Assume that $||$ is concatencation;
if
``` python
a = b"dog"
b = b"cat"
```
Then,
$$
c = a || b
$$
is like
``` python
c = a + b
```
Merkle Damgard:
$IV$ is the intial state value (publicly known usually) and $f$ is the compression function.
We pad the message until it can be divided into equally sized blocks
$b_1,b_2...b_n$
We add a padding block with some info about the original length of the data
$b_1,b_2...b_n,b_{padding}$
Then...
$state = f(IV || b_1)$
$state = f(state || b_2)$
$state = f(state || b_3)$
...
$state = f(state || b_n)$
$state = f(state || b_{padding})$
finally, we return the state.
This is vulnerable to attack.
### Length Extension
Assume we create a Message Authentication Code (MAC) from Merkle-Damgard hash $H(x)$ in this way:
$MAC = H(secret || msg)$
to confirm that we trust the message, we verify this relationship. And since 2nd preimage is hard, this should be secure, right?
WRONG
Given a $MAC$ we can forge valid messages
Recall:
$state = f(IV || b_1)$
$state = f(state || b_2)$ These blocks have $secret$
$state = f(state || b_3)$
...
$state = f(state || b_n)$ These blocks contain $msg$
$state = f(state || b_{padding})$
Now, $state = MAC$
So the last state is our MAC...
We know the hash function state after
$secret || msg || padding(secret || msg)$ is processed
This is the MAC
But because of how the hash is made, what if we wanted $H(secret || msg|| padding(secret || msg)||msg_{evil})$
$state = f(IV || b_1)$
$state = f(state || b_2)$ These blocks have $secret$
$state = f(state || b_3)$
...
$state = f(state || b_n)$ These blocks contain $msg$
$state = f(state || b_{padding(secret||msg)})$
POINT A
Now $state = MAC$
$state = f(state || b_{n+1})$ These blocks contain $msg_{evil}$
$state = f(state || b_{n+2})$
$state = f(state || b_{actual-padding})$
We don't know secret so we can't recompute to POINT A. But we do know $state$ at that point-it's just $MAC$
so
$state = f(MAC|| b_{n+1})$ These blocks contain $msg_{evil}$
$state = f(state || b_{n+2})$
$state = f(state || b_{actual-padding})$
Now $state$ is a valid MAC...
we've forged a valid MAC for a new message, the message
$secret||msg||padding(secret||msg)||msg_{evil}$