« Back to the main CSCI1660 website
Due: Thursday, February 13 by 11:59pm EST
Note: We're still setting up the Gradescope submission for this assignment, so it's not possible to submit your work yet.
We'll post an announcement on Ed when the Gradescope submission is ready, after which this message will be removed.
In this project, you will investigate the security of some small cryptographic systems as a way to gain practice with cryptography and security principles. Each system is a self-contained problem–you can work on the problems in any order.
Each problem describes a cryptographic scheme employed by some part of the fictional "Blue University", which has a history of terrible security practices. As you may expect, all of the schemes have fundamental weaknesses that make them possible to break–-and you get to break them!
Our support code provides some programs that emulate the weak cryptosystem described in each problem. You will carry out your attack by writing a script that performs the attack, the goal of which might be to leak a secret key or recover a plaintext.
This document has a section for each problem describes some background about the problem, the attack, the mechanics of how the support programs and stencil code work, and how you need to interact with them.
Requirements: CS1660 students must complete the three problems:
Students in CS1620/CS2660 must complete one extra problem:
For all students, each problem is worth an equal portion of the overall credit for this project. For CS1660, this means that each problem is worth one third of the available credit, and 25% each for CS1620/CS2660.
You can think of this document as four self-contained, mini-handouts, one for each problem, with some general setup instructions and resources at the beginning and end that apply to all problems. For each problem, there are two components:
If this is your first time reading this handout, you might want to skim all the background sections first to get a sense of the concepts you'll need, or perhaps you'll want to start looking at the implementation. Either way is fine, but once you do your first-pass read, be sure to come back for the other parts when you need them.
You should work on this assignment using our course container environment–-if you have not done so already, you can find setup instructions here.
If you indicated on HW0 that you are unable to run the container environment on your own system, we will contact you with some options. In the meantime, you can work on this assignment using any CS department system–-see this guide for instructions on how to access them remotely via SSH.
In the last year, we've made some changes to project's infrastructure and added one new problem (Passwords). We're always looking for ways to improve our documentation and resources–as the project continues, we may release more guidance and resources to help!
If you have questions or find anything in this document that you think is incorrect or unclear, please let us know by posting on Ed (your post can be anonymous, even to us). For each project, we will maintain a pinned FAQ post with a "readling list"–-please check this thread for updates and clarifications!
If we make any updates to this handout, we'll post a comment in the FAQ post or make an announcement to let you know. When this happens, please refresh the page to make sure you're on the latest version.
This assignment is made up of a several small parts. This section describes how to navigate how we've organized the support and stencil code you'll be using for the various problems, as well as some general mechanics about the project.
If you're reading this document for the first time and want to see the problems, skim over this section and then return here when you're ready to start writing code.
Each problem contains some support programs, as well as stencil code for developing your attack. You can download your starter code, and create a git repository you will use for development using this link.
Cloning your repository: To make development easier, we recommend cloning your stencil code for this project so you can access files in it from your development container. To do this, we recommend cloning the stencil in your DEV-ENVIRONMENT/home
directory, where DEV-ENVIRONMENT
is the name of the folder you used when you cloned the development container.
Here's an example of what your filesystem might look like (based on what you might have from project 1):
- ...
|--DEV-ENVIRONMENT
| |--docker/
| |--home/
| | |--.etc/
| | |--p01-cryptography-yourname/ # <------------- Clone your stencil here!
| |--run-container
| |-- ...
...
After your clone your repository, the layout will look like this:
p01-cryptography-yourname
|- grades/ # <--- Problem directory for ivy
| |- stencil/ # <--- Stencil code for grades
| |- go/
| | |- STENCIL.md # Guide for using this stencil
| | |- sol.go
| | |- ...
| |- python/
| | |- STENCIL.md
| | |- ...
| |- ...
|- ivy/ # <--- Problem directory for ivy
| |- stencil/ # <--- Stencil code for ivy
| |- ...
|- passwords/ # <--- Problem directory for passwords
| |- ...
|- ...
Each problem has its own directory (called the "problem directory"), which contains some support files and stencil code. Stencil code is provided in multiple languages (more on this in the next section).
Warning: Do not change this directory structure, as we expect your code to follow this layout when we clone your repository for grading. For more details on how to submit your code to work with our autograder, please read the "Assignment" and "Deliverables" sections for each problem.
When you start on a problem, you should do the following:
grades
problem and want to use the Go stencil, you could run (from the repository root):cs1660-user@container:~/repo$ cp -Trv grades/stencil/python grades
Warning: Do not skip this step! When grading, we will look only in the main problem directory for your code–-so be sure to put it there!
STENCIL.md
, which is included with each stencil code version. This file contains helpful notes about how to use this particular stencil for this problem.TODOs
that you can fill in with your implementation.If you wish to use a language that doesn't have a stencil, you may do so ONLY if the tools for it are already installed in our course container environment. Note that the course staff may be not be able to provide support for language-level questions. If you have questions on this, please reach out to us on Ed.
For all stencils, your program should be an executable called sol
, with different arguments depending on the problem.
If you are using a scripting language like Python, your stencil will include an executable script called sol
that you will modify. For compiled languages like Go, your stencil includes a Makefile
that builds your code and outputs an executable called sol
that we will run. See your STENCIL.md
for details.
Your stencil code is set up to conform to the necessary conventions to run your code when grading (ie, program names, arguments, etc.). Do not modify these, or you will encounter issues during grading!
You will submit your work via Gradescope–-you can find a link to Gradescope on the Assignments page of the course website. Each problem has its own entry on Gradescope where you can submit, which will run our autograder for that problem.
Note: We're still finalizing the Gradescope submission for this project and hope to have it ready soon, at latest over the weekend. You can generally test each problem on your own without needing an autograder.
In this problem, you'll explore how statistical correlation can be a powerful technique for cryptanalysis.
Blue University is deciding how it wants to encrypt its grades database using symmetric encryption. One of the decisions when encrypting the database is choosing a block cipher mode. While poking around the database, you learn that it uses Electronic Codebook Mode (ECB).
A block cipher takes fixed-size messages and produces fixed-size ciphertexts. Block ciphers are powerful cryptographic primitives, but alone they aren't enough to encrypt anything interesting, since they can only encrypt small, fixed-size messages. We can encrypt longer messages by combining multiple uses of a block cipher by using a particular block cipher mode.
Blue University's choice of block cipher mode–-Electronic Codebook (ECB) mode–-splits the plaintext into chunks (one block long), separately encrypts each plaintext block using the block cipher, and then finally concatenates the resulting blocks to create the cipher text.
ECB's simplicity comes with serious weaknesses–-most critically, the same plaintext block will always produce the same ciphertext block. This allows for a number of attacks (such as replay attacks), or, more relevant to this problem, statistical cryptanalysis. ECB completely leaks the statistical distribution of plaintext blocks (though it doesn't leak the original plaintexts), which can be damaging in cases where overall patterns are generally more important than local detail. For example:
Plaintext image | Same image encrypted with ECB mode |
---|---|
![]() |
![]() |
In the context of a grade database, ECB mode presents similar issues–-after all, with the University's list of 100,000 students, there must be statistical patterns you can attack...
You have learned that each plaintext database record consists of a five-digit student ID (randomly chosen from 00000
..99999
, with leading zeroes if necessary), and a course grade (one of A
, B
, C
, N
) using the following format:
id:01234, grd:A\n
Where \n
is a newline character (0x0A
). Helpfully, the ID part of the line ("id:01234
") is eight bytes long, and the grade part (from the comma to the newline: ", grd:A\n
") is also 8 bytes long.
You have also learned that the block cipher used to encrypt the database uses a block length of 8 bytes (what a coincidence!). This means that the ciphertext blocks of the encrypted database are laid out like this:
...[student X id][student X’s grade][student Y id][student Y’s grade]...
Thanks to statistics that the University must report annually, you also know:
Grade | % |
---|---|
A | 50% |
B | 30% |
C | 15% |
N | 5% |
Your goal is to exploit the weaknesses of ECB mode to learn some information about student grades.
Initial setup: The starter code contains a program that generates an encrypted grades database for you to attack. To start, enter the grades
directory of your repository and run the following to generate a database:
you@container:~/repo/grades$ ./generate-database > /tmp/database.enc
This will create an encrypted copy of the database called database.enc
.
Apple silicon users: Please use the version of generate-database
located
in the arm64
directory instead, which will run faster.
Task: Using your knowledge of the database layout and university statistics, write a program or script that reads the database and figures out the following:
We have provided stencil code in both Python and Go, located in the stencil
directory for this problem. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md
file for instructions related to the stencil you are using.
Regardless of which stencil you use, your program should have the name sol
and should take in the encrypted database (eg. /tmp/database.enc
) as an argument, as follows:
./sol <encrypted database>
Your deliverable will include your program as well as a short README
describing how you determined your answer for each question, as described in the next section.
For this problem, you should submit your code as well as a README
that briefly explains how your code works and how you arrived at an answer for each problem. Your README
is worth 40% of the credit for this problem, and the program is worth 60%.
Your code and README
must reside in the grades
directory of your repository. When we run your code, we will look for an executable file called sol
, which we will run using with arguments specified above.
Your program must print out your answers in exactly this following format:
Total blocks: <number>
Ciphertext for grade A: <hex string>
Ciphertext for grade B: <hex string>
Ciphertext for grade C: <hex string>
Ciphertext for grade N: <hex string>
Famous student As: <number>
Famous student Bs: <number>
Famous student Cs: <number>
Famous student Ns: <number>
where <hex string>
is a hex string like abcd0123
and <number>
is any decimal number. Each stencil contains an example for how to print this output.
In this problem, you'll carry out an attack on a cryptographic protocol–a procedure for sending encrypted messages between two parties.
Scenario: Blue University's CS department has a "secure" wireless network for its department systems that uses a custom wifi protocol called Ivy. All devices on the network share a key
You want to connect your own computer to the secure network, so you want to obtain the key
This problem is based on a classic attack for WEP, an early method of securing WiFi networks (circa 2000-2004).
Don't worry, though–you don't need to know anything about WEP, or any more about wireless encryption protocols or networking beyond what is described in this handout. Our emulated version abstracts all these details away from you so you can focus on the idea of attacking a weak protocol.
While you're welcome to read more about WEP online, we do not recommend doing so to find guidance on how to solve this problem. Since our version abstracts away many details, anything WEP-specific you read online will not translate well and thus isn't likely to help you. There are also several other, more popular attacks on WEP other than the one we use here that simply aren't possible on our toy version of the protocol.
Network traffic is split up into packets–chunks of data that are sent over the network independently. Devices connected to a wireless network are called clients, which send packets to the network via a router. All devices connected to the network share the same key. To send a packet, the client encrypts it with the key and sends the resulting ciphertext to the router, which decrypts the packet before sending it elsewhere (eg. the Internet).
However, if the same packet data is sent twice, it would always encrypt to the same ciphertext–this would allow an eavesdropper to learn information about packets they have seen before. To address this, each packet is appended with a random initialization vector (IV) so that, so long as two different IVs are used, the same plaintexts will produce different ciphertexts.
The IVy protocol has two phases:
The actual encryption and decryption use a stream cipher. Here's the gist (with details below):
The encryption function takes an 8-byte key
encrypt(k, m):
IV = R() // Generatre initialization vector
seed = IV || k // Concatenate IV and k
r = G(seed, len(m)) // Generate len(m) random bytes
c = m ^ r // xor m and r to produce ciphertext
return IV, c
where:
R()
is a source of randomness. Each call to R()
generates a 16-bit random number uniformly distributed from {G(seed, len)
is a pseudorandom stream generator. Given a seed, seed
and length l
, it generates a pseudorandom stream l
bytes. Given the same values for seed
and l
, G
will always produce the same output.After encrypting, the sender sends a packet containing both the ciphertext and IV, ie.
Thus, when the receiver gets the packet, it thus receives both the ciphertext and IV. The receiver also has the key, and it can reconstruct the same random stream the sender used to encrypt a message to do the decryption like this:
decrypt(k, iv, c):
seed = iv || k // Concatenate IV and k
r = G(seed, len(c)) // Generate len(c) random bytes
m = c ^ r // xor c and r to get plaintext m
return m
The reason that this works is that both sides know the same key,
c = m ^ r
Then, when we decrypt the ciphertext, we compute:
c ^ r = (m ^ r) ^ r
= m ^ (r ^ r)
= m ^ 0 // Any number xor'd with itself is 0
= m // Message recovered!
This allows the receiver to recover the original message!
Once again, your goal is to recover the key
In conceptual terms, this setup creates what is called an encryption oracle: you can pick any plaintext you want and observe the encrypted output (in this case, as an (IV, ciphertext) pair)–and you can do this as many times as you like. This type of attack is called a chosen-plaintext attack, and the Ivy protocol is vulnerable in such a way that it can allow you to recover
Sadly, we don't have an actual wireless network for you to attack . Instead, we give you a program called
client
, which simulates your interaction with the IVy protocol: you can "sniff" and send messages to it by reading and writing to the program's stdin and stdout.
Your goal is to write a script that interacts with the client
binary to recover the key
The ivy
directory of your project repository contains a binary called client
, which simulates the IVy protocol.
You'll communicate with the client
program by sending plaintexts on stdin
and reading them the encrypted result from its stdout
. For input, the client
program expects to receive each plaintext as a string in hexadecimal format (called "hex-encoding"), followed by a newline.
For examples of how to think about hex-encoding and how to send data, take a look at one of the following examples:
To send the human-readable plaintext string like "hello", we first need to convert it into an array of bytes (type bytes
), which will make it easy to send an manipulate. In Python, we can convert strings to/from bytes
objects using encode
and decode
:
# Convert plaintext string to a byte array
plaintext = "hello".encode("utf-8")
For bytes
objects, Python provides the method .hex()
to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes:
# Encode in hexadecimal
to_send = plaintext.hex()
# to_send === "68656c6c6f", based on ASCII values for "hello":
# h (0x68), e (0x65), l (0x6c), ...
# Send hex-encoded string to client program
ivy_stdin.write(to_send + "\n")
Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own bytes
objects:
# Make a byte slice from constant values
from_ints = bytes([0xaa, 0xbb, 0xcc, 0xdd])
from_hex_string = bytes.fromhex("aabbccdd")
assert from_ints == from_hex_string # These are equivalent!
# Make a byte array of 100 bytes, initialized to zero
empty = bytes(100)
In Python, you can manipulate bytes
objects much like you would ordinarily use lists–if you haven't used them before, we recommend trying out some examples in a Python shell to get a feel for it.
For more examples of how to work with bytes
objects, take a look at the stencil code, or feel free to look up examples online.
To send the human-readable plaintext string like "hello", we first need to convert it into a byte slice (type []byte
), which will make it easy to send an manipulate, which we can do like this:
// Convert plaintext string to a byte array
plaintext := []byte("hello")
Go provides a helper method to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes:
// Encode in hexadecimal
toSend = hex.EncodeToString(plaintext)
// toSend === "68656c6c6f", based on ASCII values for "hello":
// h (0x68), e (0x65), l (0x6c), ...
// Send hex-encoded string to client program
_, err := ivyStdin.Write([]byte(fmt.Sprintf("%s\n", toSend)))
if err != nil {
. . .
}
Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own byte slice:
// Make a byte slice from constant values
plaintext := []byte{0xaa, 0xbb, 0xcc, 0xdd}
// Make a byte slice of size 100, initialized to zero
plaintext := make([]byte, 100)
For more examples of how to work with byte arrays, take a look at the stencil code, or feel free to look up examples online.
For each plaintext message, the client
program prints corresponding ciphertexts as a line to stdout
in the format:
<iv> <ciphertext>
The first line of output corresponds to the ciphertext of the authentication packet that the client first sends to the router.
Apple silicon users: Please use the version of the client
binary located in the arm64
directory instead.
We have provided stencil code in both Go and Python, located in the in the ivy/stencil
directory. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md
file for instructions related to the stencil you are using.
Your main executable should be called sol
, which is run as follows:
$ ./sol <client binary> [test key]
where <client binary>
is the path to the client
binary.
The stencil code creates a subprocess that runs the client
program and allows you to send packets by writing to the client
's stdin
and observing the output by reading from its stdout
. For details on how this works, see the comments in the files for each stencil, and your STENCIL.md
file.
Note: When you first start the client
program, it prints an (IV, ciphertext) pair before any input is given. This is the packet from the setup phase, where where the ciphertext is an encrypted packet containing the key itself, ie. pkt =
To check your work, you can make the client
use a key of your choosing by specifying a "test key" as an optional argument. The key must be an 8-byte value, specified as a hex-string, like this:
$ ./sol ./client aabbccddeeff0101
In this example, your attack should recover the key aabbccddeeff0101
. If a test key is not specified, the key will be generated automatically.
Your code and README
must reside in the ivy
directory of your repository. When we run your code, we will look for an executable file called sol
, which we will run using with arguments specified above. For more details, see this section, as well as your STENCIL.md
file.
After recovering the key, your program should print out the recovered key to stdout
as an 8-byte hex string (eg. ababababcdcdcdcd
) as its last line of output, followed by a newline. You can example for how to do this in each stencil.
Your code is program is worth 60% of the credit for this problem, and the README
(and your analysis of how the attack worked) is worth 40%.
Your README
should cover the following:
Note: We'll learn more about passwords in lectures 4-6, though you're welcome to get started on this problem now!
In this problem, your goal is to think about how systems use and store passwords. Passwords are highly sensitive–if an attacker can leak information about a user's password for some application or system, they could impersonate the user, or potentially compromise other systems where the user reused the same password. Thus, in a secure system, passwords should never be stored in plaintext: instead, systems usually store a hashed or encrypted version of the password to make it harder for the attacker to recover the original value.
In this problem, you'll take turns doing some attack and defense on a hypothetical "database" of passwords:
Scenario: Blue University is looking for a way to build a single sign-on (SSO) system, and needs a way to represent passwords. Their IT department wants to use the following password policy (which conveniently makes this problem feasible for an assignment):
a-z
) and digits (0-9
)Context note: While this password policy is very simplistic, it's enough to demonstrate how to think about password storage and cracking without needing to expend significant time and computing resources to do it. Modern password policies and storage require more complex passwords and stronger hash functions, but in general build on the same techniques you're learning here.
You should work on this project from the passwords
directory of your repository. Our support code provides scripts to generate password databases secured with various methods that we'll explore in this problem.
To generate your databases, run the following command, which should produce output like this and write some database files into the db
directory of your repo:
user@container:~/repo/passwords$ make generate
Generating database db/demo.db.json
Generating database db/part1.db.json
Generating database db/part2.db.json
Generating database db/part3.db.json
Our password databases are just JSON files named something.db.json
, located in the db
directory. To get a sense of the format, open up db/demo.db.json
, which should look something like this (Note: your users and passwords will be different!):
{
"method": "plain",
"users": {
"user0399": {
"password": "7vxd"
},
"user0449": {
"password": "hb5s"
},
"user0857": {
"password": "r65g"
},
. . .
}
}
There are two important components here:
method
field tells us the format for storing passwords. In our system, this database uses the plain
method, which just stores the password in cleartext, which is bad!user0857
) has an object containing data about that user–right now, it just contains the password, but this may store different info depending on the methodOur databases support three methods total–you will implement two of them (more details in the following sections):
plain
: Store the password in plaintextsha1-nosalt
: Store the SHA1 hash of the passwordsha1-salt4
: Store the SHA1 hash of (the password + a 4-byte "salt" value)For the plain
method, we can just obtain the passwords from the database by inspection–but we won't be able to do this for the other methods (open up any other db.json
file for an example). In order to test your work, the plaintext passwords are always stored in a "secrets" file–you can't use this in your implementation, but you can look at it to check your work when testing.
For example, here's the secrets file for the demo database:
user@container:~/repo/passwords$ cat db/demo.secrets.txt
user0399 7vxd
user0449 hb5s
user0857 r65g
. . .
To simulate using the database, we've provided a program login
that attempts to authenticate as a user given their username and password. To try it out on the demo database, do the following:
# Usage: ./login <database> <username> <password>
user@container:~/repo/passwords$ ./login db/demo.db.json <user> <password>
where <user>
and <password>
are passwords from your demo database, which you can find in the secrets file. (Note that usernames and passwords are randomly generated, so yours will be different from the example.) On a correct password, you should see Success!
.
To check all passwords in a database, you can use the script check_login
, like this:
# Usage: ./check_login <database> <secrets file>
user@container:~/repo/passwords$ ./check_login db/demo.db.json db/demo.secrets.txt
Logging in as user0399 => OK
Logging in as user0449 => OK
Logging in as user0857 => OK
. . .
Success!
In the next few parts, you'll extend login
to implement the other password storage methods, and you can use check_login
to test your work
sha1-nosalt
method)A common strategy for password storage is to store a hash of the user's password, which avoids storing the password in cleartext. In our database, we'll do this using the SHA1 hash function. Concretely, the idea is to store the hash h = SHA1(password)
The database file db/part1.json.db
stores 1 hashed password using this method, which looks like this:
$ cat db/part1.json.db
{
"method": "sha1-nosalt",
"users": {
"user3234": {
"password": "1cc33637bdd3b586d89d259d719e8ad9a5e4f42e"
}
}
In this form, the password
field represents the hash of the password (written in hex) instead of the password itself.
Now that we've discussed the format, you'll extend login
to support this type of database, and then you'll crack the passwords.
Note: This problem has only one stencil, which is in Python. Unlike the other problems, there's no need to copy any stencil code from a stencil
directory–the code in the passwords
directory is already set up as a stencil for you.
If you want to use another language, you can just replace the stencil login
and pwfind
programs (more on these later) with your own versions. Our testing scripts will run and just examine their return values and output files, so you can replace them with your own code as long as you adhere to the specification. You are free to use any language that runs inside our container environment by default.
If you use want to use a compiled language (Go, C, …), please modify the provided Makefile
such that running make
builds your login
and pwfind
programs. For an example, see the Makefile
s for the other problems.
login
Task: Extend login
to check the user's password input against the hashed value in the database. To do this, open up login
and fill in the TODOs related to the sha1-nosalt
method. Some notes:
login
must return 0 (EXIT_SUCCESS
) when the user's password is correct, and 2 (EXIT_FAILURE
) when it's incorrect. You can see an example for the plain
methodcrypto.py
, which you can access with crypto.hash
.When you're done, check your work by trying to login as a user with ./login
and then ./check_login
.
pwfind
Now it's time to go on the offensive! Your job is to write a program to discover the password for a given hash value by brute force: since you know the maximum size and number of allowed characters per the password policy, you can enumerate all possible hash values the password could have.
To get you started, we've provided a stencil password cracking program called pwfind
. This program takes in a database as input and outputs a file containing all the passwords found in the database, which should use a similar format to the secrets file. You should be able to run pwfind
like this:
# Usage: ./check_login <database> <secrets file>
user@container:~/repo/passwords$ ./pwfind db/part1.db.json output.txt
. . .
where output.txt
is a file containing all of the user-password pairs, in a format similar to the secrets file. (To get a sense of how it works, you can try it out on demo.db.json
, which will quickly declare victory by "cracking" the plaintext passwords.)
Task: Extend pwfind
to crack passwords encrypted using the sha1-nosalt
method, and use it to discover the password of the single user in db/part1.db.json
. The process should take several seconds to complete.
We've provided a makefile to automate testing for this part. To check your work, you can run:
$ make check-part1
This should automatically check your login and password-cracking process.
Now that you have successfully cracked a single password, it's time to break more. The database file db/part2.db.json
contains password hashes using the sha1-nosalt
method for 1000 users.
While you could use your existing pwfind
implementation to crack 1000 passwords, this would take quite a while. Instead, consider what you know about the hashing process–is there a strategy you can use to save time?
Task: Update your pwfind
implementation for the sha1-nosalt
method such that the runtime is independent of the number of passwords you need to recover. Your updated version should be able to crack all 1000 passwords in roughly the same time as the 1 password from part 1.
What does this mean? We are asking you to look for a strategy for cracking N passwords, stored using the sha1-nosalt
method, where the runtime doesn't depend on N. In other words, cracking 1000 passwords shouldn't take 1000x longer than cracking one password. We're interested in your overall strategy here–you do NOT need to finely optimize your pwfind
to be as fast as possible.
If your implementation is for this part taking significantly longer than part 1, you should rethink your strategy–please come talk to us in hours or on Ed!
You can check your work for this part using make check-part2
.
A common method to avoid the problem with storing a simple hash is to concatenate the password with a random value before hashing, and then storing both the hashed password and salt. Our database setup implements this using the sha1-salt4
method, which prepends the password with a 4-byte salt and stores the hash as h = sha1(salt || password)
The database db/part3.db.json
contains 4 users with passwords stored using the sha1-salt4
method, and looks like this:
{
"method": "sha1-salt4",
"users": {
"user1117": {
"password": "5b830fdebacc6670ebd88b478f1ef911f548cac7",
"salt": "3a8c3022"
},
"user4164": {
"password": "d77e0b4568cc099e6e9eafd38ce082bb7dd95734",
"salt": "7011f215"
}
. . .
}
Task: As before, update your login
and pwfind
programs to check and crack passwords using the sha1-salt4
method when using db/part3.db.json
. You can check your work for this part using make check-part3
. Relative to part 2, how long does this take, and why?
Your submission for this part should include your completed login
and pwfind
programs. When you upload your code, we will test your programs on new, randomized databases–please do not upload your own.
Tip: If you want to check your work for all parts at once, you can run make check
.
In your passwords
directory, you should also include a README
that describes your password cracking strategy for each part. In particular, you should consider the following:
sha1-nosalt
method, why can you crack 1000 passwords in such a short time, relative to cracking just one?sha1-salt4
) method, would you be able to apply the same principle? How does using a salt change the difficulty of the cracking process?This problem is required for CS1620/CS2660 students only.
In this problem, you'll exploit a seemingly minor information leak to compromise an entire system.
Warning: This problem is longer than the others and requires some thinking, do not leave it until the last minute.
For more background on this problem, we highly recommend you attend the gearup for this project on Monday, February 5 from 5-7pm EST in CIT368 (or on Zoom). Notes and a recording will be posted afterward.
Scenario: At the same time you published your findings regarding the grades database, the University's administrators were developing a system for viewing student grades. Vowing to never make the ECB mistake again, the administrators set out to find an alternative block cipher mode for use in the grading system. They decided to implement Cipher Block Chaining (CBC) mode using a block cipher with 16-byte blocks.
Knowing Blue University's track record for implementing cryptographic protocols on their own, you decide to try to break in and read your grades. For this one, you don't have access to the database, but you can connect to the server that stores the data. The server accepts commands encrypted with a key
This problem is based on the POODLE attack, a real, major vulnerability discovered in 2014 on widely-used implementations of TLS, the protocol that secures all web traffic (ie, the "s" in "https").
Blue University's grading system is a toy system that contains a similar vulnerability. You don't need to know anything about TLS or anything more about the POODLE attack than what is presented in this handout, but you're welcome to read more about it if you want.
Please also note that the fictional grading system presented here isn't a good way to design a secure application–many important security guarantees (like user authentication, integrity, etc.) are missing, as they'd just get in the way of the attack we want you to do.
In CBC mode, each plaintext block is XOR'ed with the previous ciphertext block before it is encrypted using the block cipher. In this mode, any change in a plaintext block affects the ciphertext of all following blocks since the encryption of a given block depends on all preceding blocks, as shown in the figure below. This makes statistical attacks much more difficult.
When using any block cipher, the plaintext's length might not be a multiple of the block size. To fix this, we can add extra data to the end of the plaintext called padding until its length is a multiple of the block size. Different systems use a variety of schemes to apply padding to messages–this problem uses PKCS#7 a widely-used standard for storing encrypted data.
How it works: in PKCS#7, plaintexts are padded with bytes whose values encode the length of the padding. If one byte of padding is used, that byte's value is 1; if two bytes are used, those bytes' values are each 2. Here's an example of what some plaintext blocks would be padded (where xx
is the plaintext, which could have any value):
| ------------- 16 byte block ---------------- |
xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx 01 (15 bytes + 1 byte padding)
xx xx xx xx xx xx xx xx xx xx xx xx xx xx 02 02 (14 bytes + 2 bytes padding)
xx xx xx xx xx xx xx xx xx xx xx xx xx 03 03 03 (13 bytes + 3 bytes padding)
xx xx xx xx xx xx xx xx xx xx xx xx 04 04 04 04 (12 bytes + 4 bytes padding)
If a plaintext's length is already a multiple of the block size, we add an entire block of padding. Since the cipher uses 16-byte blocks, a plaintext whose length is a multiple of 16 bytes would be padded with 16 bytes, each with the value 0x10
.
Implementing any cryptographic protocol or library is very tricky, and even small details can lead to significant information leaks. This problem explores what can go wrong when cryptography is implemented incorrectly.
Let's say we write a decryption function that does the following:
While this sequence of steps may seem reasonable, it leaks a small piece of information: whether the padding at the end was correct or not. While the different errors seem relatively harmless, it's enough to completely break the security of the system!
Using this small piece of information, it turns out that we can forge an (IV, ciphertext) pair which decrypts to a plaintext of our choosing. We'll give you an outline of how to think about the attack–-your job is to fill in the pieces and make it happen.
First, you'll need the ability to recover intermediate state, or the decryption of a given ciphertext block in CBC. In other words, given two ciphertext blocks
Hint: Recall that you can send (IV, ciphertext) pairs of your choice to the server. If you can somehow get 0x01
as the final byte of the plaintext, then the padding will be valid (and the server will not raise an "incorrect padding" error). By doing that, you'll know the final byte of 0x01
) and the final byte of
Once we can recover intermediate state
After some digging, you've learned a bit about how the Blue University grading server works and how to connect to it[1]. Here's a summary of what you know:
\n
), like this: [16 byte IV][Ciphertext (some multiple of 16 bytes)]\n
incorrect padding
". This is the critical information leak!help
.In the padding
directory of your repository, we have provided a binary called server
that simulates a grading server that follows this protocol and has a padding information leak. You can run the server binary as follows:
./server [--debug] <address:port>
where –debug
flag can print some useful information about the server's operation. The argument <address:port>
specifies an address and port number for how the server should "listen" for new connections, which is how we will connect to the server with our attack program–-we'll discuss what this means later in the course, for now, it's sufficient to run the server like this:
./server localhost:9999
this sets up the server to wait for connections on your local system on port 9999. We'll use this port to connect to the server with our attack program.
Apple silicon users: Please use the version of the server
binary located in the arm64
directory instead.
Before starting your attack, you should start the server
in a container terminal as above. The server will run until you exit with Ctrl+C, waiting for you to connect to it and send commands. When you perform your attack, the server must be running. This means you'll need to have at least two terminals open inside the container–-one for the server, and one for the attack program.
Task: Your goal is to get the server to reveal the grades of the student whose ID is 12345. You should start by sending the "help
" command, which should give you more information about how to find the student's grades.
We have provided stencil code in both Go and Python, located in the padding/stencil
directory of your repository. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md
file for instructions related to the stencil you are using.
In all implementations, the stencil code implements command-line argument validation and sets up the necessary networking code to read and write arbitrary strings to/from padding
server, which you can leverage to create encrypted messages to perform your attack. You can modify the provided networking code if necessary, but it shouldn't be necessary.
The main executable for all stencils is called sol
, which is run as follows:
./sol <server address> <server port> "<command>"
For example, to send the command "help
" to the server using the address and port shown above, you would run:
./sol localhost 9999 "help"
Your code and README
should be located in the padding
directory of your repository. When we run your code, we will look for an executable file called sol
, which we will run using a pairs file as specified above. For more details, see this section, as well as your STENCIL.md
file.
After your program performs the attack, you should print the response from the server to stdout
–the response should be the last line(s) of your program's output.
Your program is worth 50% of the credit for this problem, and the README
is worth 50%.
help
command should only require a single ciphertext block, so this should allow you to test your solution to recovering a single ciphertext block's intermediate state even if you haven't yet figured out how to forge multi-block messages.\0xaa\0xbb
, you must actually send the byte encoding of the hex string 'aabb'
, which in byte form is actually 4 bytes long: \0x61\0x61\0x61\0x61
.Updates to this assignment after release will be posted here.