# lab 1 D0029E
## Table of contents
[Part 2](#Task-2)
[Task 2.1 ... Creating colliding files](#21)
[Task 2.2 ... Understanding MD5’s Property](#22)
[Task 2.3 ... Generating Two Executable Files with the Same MD5 Hash](#23)
[Task 2.4 ... Making the Two Programs Behave Differently](#24)
## Task 2
### 2.1
#### To create two files that collide
```bash
md5collgen -p prefix.txt -o out1.bin out2.bin
```
This is the sample command for generating two files that have the same MD5 sum. Both output files start with the contents of `prefix.txt`.
#### The prefix file is generated with
```bash
head -c 64 /dev/zero > prefix.txt
```
This command writes 64 bytes into our prefix file. Note that `/dev/zero` supplies zero bytes, not random ones; `/dev/urandom` would be the source to use for random content.
#### Viewing the files

> The generated files (size 128)
To view the generated binary files, SEED Labs encourages us to use the `bless` binary editor.
#### Generating the md5 sums

> The generated sums
Generating the sums is done as expected with `md5sum`, and as we can see, the two different files produce the same sum.
#### Questions
* If the length of the prefix file is not a multiple of 64, what is going to happen?
> The size of the output files depends on the size of the prefix file. The prefix is padded up to the next multiple of 64 bytes and the 128 bytes of collision data are appended after it, so the output size is the prefix length rounded up to a multiple of 64, plus the 128-byte base case.
* Create a prefix file with exactly 64 bytes, and run the collision tool again, and see what happens.
> The output files have a size of 192 bytes. The same principle applies as before: the 64-byte prefix is already a multiple of 64, so no rounding up is needed, and the 128 bytes of collision data follow it.
* Are the data (128 bytes) generated by md5collgen completely different for the two output files? Please identify all the bytes that are different
> The data generated by `md5collgen` is not completely different; the two outputs differ in only around 4 bytes.
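
The size rule and the byte comparison from the answers above can be checked with a short sketch. The helper names `expected_size` and `count_diff_bytes` are made up for illustration and are not part of `md5collgen`; the sketch assumes, as observed above, that the prefix is padded to a multiple of 64 bytes before the 128 bytes of collision data are appended.

```bash
# Hypothetical helpers, not part of md5collgen.

# Expected output size for a prefix of $1 bytes:
# prefix rounded up to a multiple of 64, plus 128 bytes of collision data.
expected_size() {
    local prefix_len=$1
    echo $(( (prefix_len + 63) / 64 * 64 + 128 ))
}

# Number of byte positions at which two equal-length files differ.
# `cmp -l` prints one line per differing byte (1-based offset, octal values).
count_diff_bytes() {
    { cmp -l "$1" "$2" || true; } | wc -l | tr -d ' '
}
```

For example, `expected_size 64` and `expected_size 10` both print `192`, and running `cmp -l out1.bin out2.bin` directly lists the position of every differing byte.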
### 2.2
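This task only requires appending the same suffix to `out1.bin` and `out2.bin` and verifying that `out1_longer.bin` and `out2_longer.bin` still have the same hash. They should, because MD5 processes its input block by block: once two inputs of equal length have produced the same internal state, any identical data appended to both keeps the hashes equal.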

> Demonstration of the experiment
The experiment to verify this property of MD5 consists of the following steps:
1. Take two files with the same verified MD5 sum.
2. Append the same suffix to each to create two new files.
3. Compute the MD5 sums of the new files.
4. Verify that the newly computed sums are equal.
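
The four steps can be sketched as a small helper. The function name `check_suffix_property` is illustrative, and the sketch assumes that `md5sum` is available and that the two input files already collide (for example `out1.bin` and `out2.bin` from 2.1).

```bash
# Hypothetical helper: append the same suffix to two files and compare MD5 sums.
# Usage: check_suffix_property out1.bin out2.bin suffix.txt
check_suffix_property() {
    local a=$1 b=$2 suffix=$3
    local a_long="${a%.bin}_longer.bin" b_long="${b%.bin}_longer.bin"
    # Step 2: append the common suffix to each file.
    cat "$a" "$suffix" > "$a_long"
    cat "$b" "$suffix" > "$b_long"
    # Step 3: compute the sums (first field of md5sum's output).
    local h1 h2
    h1=$(md5sum < "$a_long" | cut -d' ' -f1)
    h2=$(md5sum < "$b_long" | cut -d' ' -f1)
    # Step 4: report whether they match.
    if [ "$h1" = "$h2" ]; then
        echo "same: $h1"
    else
        echo "different"
    fi
}
```

With colliding inputs the helper should print `same:` followed by the shared sum; with unrelated inputs it prints `different`.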
### 2.3
Task 2.3 is to create two executable files with the same MD5 hash.
This can be accomplished through the following steps:
1. Generate C code:
For this task the following C code was given:
```c
#include <stdio.h>

unsigned char xyz[200] = {
    /* The actual contents of this array are up to you */
};

int main() {
    int i;
    for (i = 0; i < 200; i++) {
        printf("%x", xyz[i]);
    }
    printf("\n");
}
```
We filled the char array `xyz` with `A` (`0x41`) and inspected the compiled binary with `bless` to find an offset at which we could insert the collision data. Since MD5 processes its input in blocks of 64 bytes, we knew the offset had to be a multiple of 64; the closest multiple of 64 inside `xyz` for us was at byte index 4224 (`0x1080`).
2. From the provided diagram we knew that we had to split the compiled executable, called `good.out`, and build a new binary with the structure `good.out[0:4224] + collision_hash + good.out[4224+len(collision_hash):]`, where `collision_hash` stands for the 128 bytes of collision data that `md5collgen` appends. As we know from the earlier tasks, `good.out[0:4224]+collision_hash` can be generated with:
```bash
head -c 4224 good.out > prefix
md5collgen -p prefix -o out1.bin out2.bin
```
where `out1.bin` and `out2.bin` will each be `good.out[0:4224]+collision_hash`, each with its own collision data.
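3. After this we only need to generate `good.out[4224+len(collision_hash):]` (which will actually be `good.out[4352:]`, since 4224 + 128 = 4352).
Because `tail -c +N` starts its output at byte N counted from 1, skipping the first 4352 bytes requires `+4353`:
```bash
tail -c +4353 good.out > suffix
```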
4. With all our parts ready we just have to combine them:
```bash
cat out1.bin suffix > good.out
cat out2.bin suffix > bad.out
```
These files will then produce the same hash with `md5sum` but different outputs when run (after being made executable with `chmod`), as can be observed in the screenshot below.

> All steps
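
The offsets used in these steps can be double-checked with a sketch of the alignment and split logic. The helper names `align64` and `split_around` are hypothetical, and the sketch assumes a 128-byte collision block. Note that `tail -c +N` starts its output at byte N counted from 1, so skipping the first K bytes takes `+$((K + 1))`.

```bash
# Hypothetical helpers for the prefix/suffix split.
BLOCK=128   # bytes of collision data placed after the prefix (assumed)

# First multiple of 64 at or after file offset $1.
align64() {
    echo $(( ($1 + 63) / 64 * 64 ))
}

# Split file $1 around a collision block placed at offset $2.
# Writes <file>.prefix = bytes [0, offset)
# and    <file>.suffix = bytes [offset + BLOCK, end).
split_around() {
    local file=$1 offset=$2
    head -c "$offset" "$file" > "$file.prefix"
    tail -c "+$(( offset + BLOCK + 1 ))" "$file" > "$file.suffix"
}
```

For instance, `align64 4200` prints `4224`, and `split_around good.out 4224` would write `good.out.prefix` and `good.out.suffix`, whose sizes plus 128 add up to the size of `good.out`.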
### 2.4
Task 2.4 was to make the two binaries with the same hash sum behave differently. We deviated from the SEED Labs instructions here and decided to take our own approach to the mechanism that triggers the different code paths.
1. Write the C code to manipulate:
```c
#include <stdio.h>

/* 64 placeholder bytes; the remaining elements are zero-initialized. */
char key[200] = {
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
    0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41,
};
char arr[10] = {
    0x2,
};

int main() {
    int sum = 0;
    for (int i = 0; i < 200; i++) {
        sum = (sum + key[i]) % 9;
    }
    /* %p with a void pointer is the correct way to print an address. */
    printf("sum is: %d\t comparing with: %d\t p @ %p\n", sum, arr[0], (void *)&arr[0]);
    if (sum == arr[0]) {
        printf("hacked\n");
    } else {
        printf("Good code\n");
    }
}
```
This is the code we wrote for this example.
In a real attack, the `hacked` branch would run malicious code. Our approach places the collision data inside the `key` array.
To find out which collision block the binary contains (the one that triggers the malicious code or the one that triggers the benign code), we sum the entire array while continuously applying the modulo operator, which keeps a small value in the `sum` variable.
This lets the two binaries behave differently even though only the bytes inside `key` differ between them. We also store the comparison value in an array so that it is easy to locate in `bless`.
2. After compiling the program (`gcc program.c`) we can start working on injecting the collision data.
For this we use the same methodology as in the last task.
This means using the first 4224 bytes of the compiled binary as a prefix and using `md5collgen` to append the collision data, which gives two complete headers.
After this we can extract the tail in the same way to get the remainder of the compiled binary.
3. After this we need to modify the value contained in `arr` in our code, the element which we compare against.
Running the two binaries should print the sum for each collision block, which can then be noted.
To make the program work we then overwrite the placeholder value in `arr` (in our case `0x02`) with the hex value of one of the sums.
For us this was `-4` (`0xFC`). We then set this value in the `tail` file using `bless`.
4. Putting it all together.
To put it all together we concatenate each header with the tail using `cat`, writing to new files so that no input file is overwritten by the redirection (the names `prog1.out` and `prog2.out` are arbitrary): `cat out1.bin tail > prog1.out` and `cat out2.bin tail > prog2.out`.
After doing this there should be two executable files, `prog1.out` and `prog2.out`. One should print `hacked` and the other should print `Good code`, but they still share the same `md5sum`.
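
The value to place in `arr` can also be predicted without running the binaries, and the byte can be patched from the command line instead of with `bless`. This is a sketch of an alternative, not the method we used: the helper names `key_sum` and `patch_byte` are hypothetical, the file offsets of `key` and `arr` must be looked up separately (for example in a hex editor), and it assumes `char` is signed, which is consistent with the `-4` we observed.

```bash
# Hypothetical helper: reproduce the program's checksum over 200 bytes of a file.
# $1 = file, $2 = file offset of the key array (assumed to be known).
key_sum() {
    local file=$1 offset=$2 sum=0 b
    # od -td1 prints each byte as a signed decimal, matching a signed char.
    for b in $(od -An -v -td1 -j "$offset" -N 200 "$file"); do
        # Shell arithmetic truncates toward zero like C, so negative
        # intermediate sums behave the same as in the C program.
        sum=$(( (sum + b) % 9 ))
    done
    echo "$sum"
}

# Hypothetical helper: overwrite one byte of file $1 at offset $2
# with the signed value $3 (-128..127), leaving the rest of the file intact.
patch_byte() {
    local file=$1 offset=$2 value=$3
    printf "\\$(printf '%03o' $(( value & 255 )))" |
        dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}
```

Calling `key_sum` on each assembled binary with the offset of `key` gives the two candidate values, and `patch_byte` with the offset of `arr[0]` writes the chosen one into the file.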
