Project 1: Cryptography

Due: Thursday, February 13 by 11:59pm EST

Note: We're still setting up the Gradescope submission for this assignment, so it's not possible to submit your work yet.

We'll post an announcement on Ed when the Gradescope submission is ready, after which this message will be removed.

Introduction

In this project, you will investigate the security of some small cryptographic systems as a way to gain practice with cryptography and security principles. Each system is a self-contained problem–you can work on the problems in any order.

Each problem describes a cryptographic scheme employed by some part of the fictional "Blue University", which has a history of terrible security practices. As you may expect, all of the schemes have fundamental weaknesses that make them possible to break, and you get to break them!

Our support code provides some programs that emulate the weak cryptosystem described in each problem. You will carry out your attack by writing a script that performs the attack, the goal of which might be to leak a secret key or recover a plaintext.

This document has a section for each problem that describes some background about the problem, the attack, the mechanics of how the support programs and stencil code work, and how you need to interact with them.

Requirements: CS1660 students must complete the three problems:

Grades
Ivy
Passwords

Students in CS1620/CS2660 must complete one extra problem:

Padding

For all students, each problem is worth an equal portion of the overall credit for this project. For CS1660, this means that each problem is worth one third of the available credit; for CS1620/CS2660, each problem is worth 25%.

How to read this document

You can think of this document as four self-contained, mini-handouts, one for each problem, with some general setup instructions and resources at the beginning and end that apply to all problems. For each problem, there are two components:

Background on the system and the cryptographic techniques involved, and an overview of the attack you will be performing
Implementation and logistical details on how to navigate the support code, how to test your work, etc.

If this is your first time reading this handout, you might want to skim all the background sections first to get a sense of the concepts you'll need, or perhaps you'll want to start looking at the implementation. Either way is fine, but once you do your first-pass read, be sure to come back for the other parts when you need them.

Development environment

You should work on this assignment using our course container environment. If you have not done so already, you can find setup instructions here.

If you indicated on HW0 that you are unable to run the container environment on your own system, we will contact you with some options. In the meantime, you can work on this assignment using any CS department system; see this guide for instructions on how to access them remotely via SSH.

Updates and feedback

In the last year, we've made some changes to the project's infrastructure and added one new problem (Passwords). We're always looking for ways to improve our documentation and resources, so as the project continues, we may release more guidance and resources to help!

If you have questions or find anything in this document that you think is incorrect or unclear, please let us know by posting on Ed (your post can be anonymous, even to us). For each project, we will maintain a pinned FAQ post with a "reading list"; please check this thread for updates and clarifications!

If we make any updates to this handout, we'll post a comment in the FAQ post or make an announcement to let you know. When this happens, please refresh the page to make sure you're on the latest version.

How everything works

This assignment is made up of several small parts. This section describes how we've organized the support and stencil code you'll be using for the various problems, as well as some general mechanics of the project.

If you're reading this document for the first time and want to see the problems, skim over this section and then return here when you're ready to start writing code.

Repository setup

Each problem contains some support programs, as well as stencil code for developing your attack. You can download your starter code and create the git repository you will use for development using this link.

Cloning your repository: To make development easier, we recommend cloning your stencil code for this project so you can access files in it from your development container. To do this, we recommend cloning the stencil in your DEV-ENVIRONMENT/home directory, where DEV-ENVIRONMENT is the name of the folder you used when you cloned the development container.

Here's an example of what your filesystem might look like (based on what you might have from project 1):

 - ...
 |--DEV-ENVIRONMENT
 | |--docker/ 
 | |--home/
 | | |--.etc/
 | | |--p01-cryptography-yourname/  # <------------- Clone your stencil here!
 | |--run-container
 | |-- ...
 ...

After you clone your repository, the layout will look like this:

    p01-cryptography-yourname
     |- grades/        # <--- Problem directory for grades
     |  |- stencil/    # <--- Stencil code for grades
     |     |- go/
     |     |  |- STENCIL.md  # Guide for using this stencil
     |     |  |- sol.go
     |     |  |- ...
     |     |- python/
     |     |  |- STENCIL.md
     |     |  |- ...
     |     |- ...
     |- ivy/         # <--- Problem directory for ivy
     |  |- stencil/  # <--- Stencil code for ivy
     |     |- ...
     |- passwords/        # <--- Problem directory for passwords
     |  |- ... 
     |- ...

Each problem has its own directory (called the "problem directory"), which contains some support files and stencil code. Stencil code is provided in multiple languages (more on this in the next section).

Warning: Do not change this directory structure, as we expect your code to follow this layout when we clone your repository for grading. For more details on how to submit your code to work with our autograder, please read the "Assignment" and "Deliverables" sections for each problem.

Getting started

When you start on a problem, you should do the following:

Read over the problem's section in this document, which provides background and describes how to run the support code
Look over the support code in your repository and decide which stencil you want to use. Some problems have stencils in multiple languages (usually Python and Go). You can choose whichever stencil you feel most comfortable using for a particular problem.
Using the command below, copy the files for your stencil of choice into the problem directory. For example, if you are working on the grades problem and want to use the Python stencil, you could run (from the repository root):

cs1660-user@container:~/repo$ cp -Trv grades/stencil/python grades

Warning: Do not skip this step! When grading, we will look only in the main problem directory for your code, so be sure to put it there!

Read the file STENCIL.md, which is included with each stencil code version. This file contains helpful notes about how to use this particular stencil for this problem.
Read over the stencil code and get started! We've left TODOs that you can fill in with your implementation.

Want to use a different language?

If you wish to use a language that doesn't have a stencil, you may do so ONLY if the tools for it are already installed in our course container environment. Note that the course staff may not be able to provide support for language-level questions. If you have questions on this, please reach out to us on Ed.

How to run your code

For all stencils, your program should be an executable called sol, with different arguments depending on the problem.

If you are using a scripting language like Python, your stencil will include an executable script called sol that you will modify. For compiled languages like Go, your stencil includes a Makefile that builds your code and outputs an executable called sol that we will run. See your STENCIL.md for details.

Your stencil code is set up to conform to the necessary conventions to run your code when grading (ie, program names, arguments, etc.). Do not modify these, or you will encounter issues during grading!

Submitting your work

You will submit your work via Gradescope; you can find a link to Gradescope on the Assignments page of the course website. Each problem has its own entry on Gradescope where you can submit, which will run our autograder for that problem.

Note: We're still finalizing the Gradescope submission for this project and hope to have it ready soon, at latest over the weekend. You can generally test each problem on your own without needing an autograder.

Grades

In this problem, you'll explore how statistical correlation can be a powerful technique for cryptanalysis.

Blue University is deciding how it wants to encrypt its grades database using symmetric encryption. One of the decisions when encrypting the database is choosing a block cipher mode. While poking around the database, you learn that it uses Electronic Codebook Mode (ECB).

Background

Block Cipher Modes

A block cipher takes fixed-size messages and produces fixed-size ciphertexts. Block ciphers are powerful cryptographic primitives, but alone they aren't enough to encrypt anything interesting, since they can only encrypt small, fixed-size messages. We can encrypt longer messages by combining multiple uses of a block cipher by using a particular block cipher mode.

Blue University's choice of block cipher mode, Electronic Codebook (ECB) mode, splits the plaintext into chunks (one block long), separately encrypts each plaintext block using the block cipher, and then finally concatenates the resulting blocks to create the ciphertext.

Weaknesses of ECB

ECB's simplicity comes with serious weaknesses. Most critically, the same plaintext block will always produce the same ciphertext block. This allows for a number of attacks (such as replay attacks), or, more relevant to this problem, statistical cryptanalysis. ECB completely leaks the statistical distribution of plaintext blocks (though it doesn't leak the original plaintexts), which can be damaging in cases where overall patterns are generally more important than local detail. For example:

[Figure: a plaintext image shown side by side with the same image encrypted with ECB mode]

In the context of a grade database, ECB mode presents similar issues. After all, with the University's list of 100,000 students, there must be statistical patterns you can attack...
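To make the "same plaintext block, same ciphertext block" property concrete, here is a minimal sketch. It uses a toy keyed function built from SHA-256 as a stand-in for a block cipher. This is an assumption for illustration only: it is not the cipher used by the support code and cannot be decrypted. The names `toy_block_encrypt` and `ecb_encrypt` and the key are hypothetical.

```python
import hashlib

BLOCK_SIZE = 8  # bytes, matching the block length used in this problem


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy stand-in for a block cipher (NOT a real cipher, not invertible).

    It only mimics the property that matters here:
    same key + same input block -> same output block.
    """
    assert len(block) == BLOCK_SIZE
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """ECB: process each block independently, then concatenate the results."""
    assert len(plaintext) % BLOCK_SIZE == 0
    blocks = [plaintext[i:i + BLOCK_SIZE]
              for i in range(0, len(plaintext), BLOCK_SIZE)]
    return b"".join(toy_block_encrypt(key, b) for b in blocks)


key = b"hypothetical-key"
plaintext = b"AAAAAAAA" + b"BBBBBBBB" + b"AAAAAAAA"
ciphertext = ecb_encrypt(key, plaintext)

first = ciphertext[0:8]
second = ciphertext[8:16]
third = ciphertext[16:24]
print(first == third)   # True: the repeated plaintext block repeats in the ciphertext
print(first == second)  # False: different plaintext blocks give different outputs here
```

An observer who never learns the key can still tell that blocks one and three hold the same plaintext, which is exactly the kind of pattern leak described above.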

What you know about the grades database

You have learned that each plaintext database record consists of a five-digit student ID (randomly chosen from 00000..99999, with leading zeroes if necessary), and a course grade (one of A, B, C, N) using the following format:

id:01234, grd:A\n

Where \n is a newline character (0x0A). Helpfully, the ID part of the line ("id:01234") is eight bytes long, and the grade part (from the comma to the newline: ", grd:A\n") is also 8 bytes long.

You have also learned that the block cipher used to encrypt the database uses a block length of 8 bytes (what a coincidence!). This means that the ciphertext blocks of the encrypted database are laid out like this:

...[student X id][student X’s grade][student Y id][student Y’s grade]...
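Since every record occupies exactly two 8-byte blocks, a natural first step is to split the encrypted file into blocks and count how often each one appears. The following is a rough sketch on made-up bytes; `split_blocks`, `count_blocks`, and the sample data are hypothetical and are not part of the stencil.

```python
from collections import Counter

BLOCK_SIZE = 8  # bytes per ciphertext block


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_blocks(data: bytes) -> Counter:
    """Count how many times each distinct block occurs."""
    return Counter(split_blocks(data))


# Made-up "ciphertext": two records whose second (grade) block is identical.
# For a real file you could read the bytes with open(path, "rb").read().
sample = bytes.fromhex(
    "1111111111111111" "aaaaaaaaaaaaaaaa"
    "2222222222222222" "aaaaaaaaaaaaaaaa"
)
counts = count_blocks(sample)
for block, n in counts.most_common():
    print(block.hex(), n)
```

Blocks at even positions correspond to ID blocks and blocks at odd positions to grade blocks, so slicing the block list with `[0::2]` and `[1::2]` separates the two kinds.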

Helpful University Statistics

Thanks to statistics that the University must report annually, you also know:

The university has 100,000 students.
Each student has completed 30 courses (that is, there are 30 entries for each of the 100,000 students).
The university-wide distribution of grades is approximately:

Grade	%
A	50%
B	30%
C	15%
N	5%
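As a quick sanity check on these numbers, the sketch below works out how many records the database should contain and roughly how many of them carry each grade. Because the distribution is only approximate, treat the per-grade figures as estimates rather than exact counts.

```python
STUDENTS = 100_000
COURSES_PER_STUDENT = 30
GRADE_PERCENT = {"A": 50, "B": 30, "C": 15, "N": 5}  # approximate distribution

# One record per (student, course) pair
total_records = STUDENTS * COURSES_PER_STUDENT

# Integer arithmetic avoids floating-point rounding
expected = {grade: total_records * pct // 100
            for grade, pct in GRADE_PERCENT.items()}

print(total_records)  # 3000000
print(expected)       # {'A': 1500000, 'B': 900000, 'C': 450000, 'N': 150000}
```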

Your Assignment

Your goal is to exploit the weaknesses of ECB mode to learn some information about student grades.

Initial setup: The starter code contains a program that generates an encrypted grades database for you to attack. To start, enter the grades directory of your repository and run the following to generate a database:

you@container:~/repo/grades$ ./generate-database > /tmp/database.enc

This will create an encrypted copy of the database at /tmp/database.enc.

Apple silicon users: Please use the version of generate-database located
in the arm64 directory instead, which will run faster.

Task: Using your knowledge of the database layout and university statistics, write a program or script that reads the database and figures out the following:

Since you know the plaintext format of the database, how many possible unique ciphertext blocks exist? (Hint: you can answer this question based on only the information in this document, and then you can check your answer with a script!)
What ciphertext block corresponds to an A grade? B? C? N?
There's a student who's famous at the university for being the only student to ever get both As and Cs but no Bs. Exactly how many As, Cs, and Ns has this student received?

How to get started

We have provided stencil code in both Python and Go, located in the stencil directory for this problem. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md file for instructions related to the stencil you are using.

Regardless of which stencil you use, your program should have the name sol and should take in the encrypted database (eg. /tmp/database.enc) as an argument, as follows:

./sol <encrypted database>

Your deliverable will include your program as well as a short README describing how you determined your answer for each question, as described in the next section.

What to submit

For this problem, you should submit your code as well as a README that briefly explains how your code works and how you arrived at an answer for each problem. Your README is worth 40% of the credit for this problem, and the program is worth 60%.

Your code and README must reside in the grades directory of your repository. When we run your code, we will look for an executable file called sol, which we will run with the arguments specified above.

Your program must print out your answers in exactly the following format:

Total blocks:  <number>
Ciphertext for grade A: <hex string>
Ciphertext for grade B: <hex string>
Ciphertext for grade C: <hex string>
Ciphertext for grade N: <hex string>
Famous student As:  <number>
Famous student Bs:  <number>
Famous student Cs:  <number>
Famous student Ns:  <number>

where <hex string> is a hex string like abcd0123 and <number> is any decimal number. Each stencil contains an example for how to print this output.

Ivy Wireless

In this problem, you'll carry out an attack on a cryptographic protocol–a procedure for sending encrypted messages between two parties.

Scenario: Blue University's CS department has a "secure" wireless network for its department systems that uses a custom wifi protocol called Ivy. All devices on the network share a key $k$ that's used to encrypt traffic, just like a typical password on a standard, small-scale Wifi network.

You want to connect your own computer to the secure network, so you want to obtain the key $k$. As a student, you can use department systems that are already connected to the secure network, but as a normal (non-admin) user, you can't just poke around in the system settings and look up the key. Instead, you try to attack the protocol, and you can leverage your existing access to department systems to help you do it.

Historical context: where does this problem come from?

This problem is based on a classic attack for WEP, an early method of securing WiFi networks (circa 2000-2004).

Don't worry, though–you don't need to know anything about WEP, or any more about wireless encryption protocols or networking beyond what is described in this handout. Our emulated version abstracts all these details away from you so you can focus on the idea of attacking a weak protocol.

While you're welcome to read more about WEP online, we do not recommend doing so to find guidance on how to solve this problem. Since our version abstracts away many details, anything WEP-specific you read online will not translate well and thus isn't likely to help you. There are also several other, more popular attacks on WEP other than the one we use here that simply aren't possible on our toy version of the protocol.

Background: The Ivy Protocol

Network traffic is split up into packets–chunks of data that are sent over the network independently. Devices connected to a wireless network are called clients, which send packets to the network via a router. All devices connected to the network share the same key. To send a packet, the client encrypts it with the key and sends the resulting ciphertext to the router, which decrypts the packet before sending it elsewhere (eg. the Internet).

However, if the same packet data is sent twice, it would always encrypt to the same ciphertext–this would allow an eavesdropper to learn information about packets they have seen before. To address this, each packet is appended with a random initialization vector (IV) so that, so long as two different IVs are used, the same plaintexts will produce different ciphertexts.

The IVy protocol has two phases:

Setup phase: When each client connects, it sends an encrypted packet $pkt$ to the router containing the key itself, ie. $pkt = \{IV, E_k(k)\}$. This is used to set up the client's connection and authenticate it on the network (ie, prove that it knows the key, $k$). The key is 8 bytes long.
Normal operation: After the first message, the client sends out encrypted packets for each plaintext message $m$ it wants to send over the network, ie. $pkt = \{IV, E_k(m)\}$. (The details of the actual message aren't important: to the protocol, they're just bytes.)

The actual encryption and decryption use a stream cipher. Here's the gist (with details below):

The key and IV are used as the seed to a pseudorandom number generator $G$: a function that generates output that appears as a random sequence of bytes, but can be regenerated precisely given the same starting seed
To encrypt, a message is XORed with the random sequence of bytes to produce a ciphertext, which is sent over the network
If the receiver has the key, they can decrypt the message by using the pseudorandom number generator to get the same random sequence of bytes, XOR it with the ciphertext, and recover the message.

Full protocol details

The encryption function takes an 8-byte key $k$ and a message $m$, and returns both the randomly-generated IV and the ciphertext:

encrypt(k, m):
    IV   = R()          // Generate initialization vector
    seed = IV || k      // Concatenate IV and k
    r = G(seed, len(m)) // Generate len(m) random bytes
    c = m ^ r           // xor m and r to produce ciphertext
    return IV, c

where:

R() is a source of randomness. Each call to R() generates a 16-bit random number uniformly distributed over $\{0, \ldots, 2^{16} - 1\}$.
G(seed, len) is a pseudorandom stream generator. Given a seed seed and a length l, it generates a pseudorandom stream of l bytes. Given the same values for seed and l, G will always produce the same output.

After encrypting, the sender sends a packet containing both the ciphertext and IV, ie. $pkt = \{IV, c\}$.

Thus, when the receiver gets the packet, it receives both the ciphertext and the IV. The receiver also has the key, so it can reconstruct the same random stream the sender used to encrypt the message and decrypt it like this:

decrypt(k, iv, c):
    seed = iv || k      // Concatenate IV and k
    r = G(seed, len(c)) // Generate len(c) random bytes
    m = c ^ r           // xor c and r to get plaintext m
    return m

The reason that this works is that both sides know the same key, $k$, and they both know the same IV since the IV is sent along with the ciphertext. As a result, they are able to reconstruct the seed and the same random stream, $r$. Thus, generating the ciphertext means computing:

c = m ^ r

Then, when we decrypt the ciphertext, we compute:

c ^ r = (m ^ r) ^ r
      = m ^ (r ^ r)
      = m ^ 0       // Any number xor'd with itself is 0
      = m           // Message recovered!

This allows the receiver to recover the original message!
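The pseudocode above can be sketched in Python as follows. The handout does not specify the real generator `G`, so this sketch substitutes SHAKE-256 from `hashlib` purely as a stand-in. The 2-byte big-endian encoding of the IV and the helper `xor_bytes` are also assumptions, so outputs will not match the real client binary.

```python
import hashlib
import secrets


def G(seed: bytes, length: int) -> bytes:
    """Stand-in pseudorandom stream generator (assumption: SHAKE-256).

    Same seed and length always give the same bytes, as the protocol requires.
    """
    return hashlib.shake_256(seed).digest(length)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(k: bytes, m: bytes) -> tuple[bytes, bytes]:
    iv = secrets.randbits(16).to_bytes(2, "big")  # 16-bit IV (byte order assumed)
    r = G(iv + k, len(m))                          # seed = IV || k
    return iv, xor_bytes(m, r)                     # c = m ^ r


def decrypt(k: bytes, iv: bytes, c: bytes) -> bytes:
    r = G(iv + k, len(c))                          # same seed -> same stream
    return xor_bytes(c, r)                         # m = c ^ r


k = bytes.fromhex("aabbccddeeff0101")  # example 8-byte key
iv, c = encrypt(k, b"hello")
print(decrypt(k, iv, c))  # b'hello'
```

Note that decryption never needs to "undo" `G`: it only regenerates the same stream and XORs it out, which is why sender and receiver must agree on both the key and the IV.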

The attack

Once again, your goal is to recover the key $k$. In terms of the problem scenario, you have two tools available to help you do this:

You have an account on a department system already connected to the secure network, so you can use it to send any packets you want
Using your own computer, you can inspect the encrypted packets (where $pkt = \{IV, \text{ciphertext}\}$) that are sent over Wifi. This is called wifi sniffing, which we'll learn more about later in the course

In conceptual terms, this setup creates what is called an encryption oracle: you can pick any plaintext you want and observe the encrypted output (in this case, as an (IV, ciphertext) pair), and you can do this as many times as you like. This type of attack is called a chosen-plaintext attack, and the Ivy protocol is vulnerable in such a way that it can allow you to recover $k$.

Your task

Sadly, we don't have an actual wireless network for you to attack :cry:. Instead, we give you a program called client, which simulates your interaction with the IVy protocol: you can "sniff" and send messages to it by reading and writing to the program's stdin and stdout.

Your goal is to write a script that interacts with the client binary to recover the key $k$ by performing a chosen-plaintext attack.

Problem setup (and how to send data)

The ivy directory of your project repository contains a binary called client, which simulates the IVy protocol.

You'll communicate with the client program by sending plaintexts on stdin and reading the encrypted result from its stdout. For input, the client program expects to receive each plaintext as a string in hexadecimal format (called "hex-encoding"), followed by a newline.

Important background: sending data as hex strings

For examples of how to think about hex-encoding and how to send data, take a look at one of the following examples:

Example for Python

To send a human-readable plaintext string like "hello", we first need to convert it into an array of bytes (type bytes), which will make it easy to send and manipulate. In Python, we can convert strings to/from bytes objects using encode and decode:

# Convert plaintext string to a byte array
plaintext = "hello".encode("utf-8")

For bytes objects, Python provides the method .hex() to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes:

# Encode in hexadecimal
to_send = plaintext.hex()   
# to_send === "68656c6c6f", based on ASCII values for "hello":
# h (0x68), e (0x65), l (0x6c), ...

# Send hex-encoded string to client program
ivy_stdin.write(to_send + "\n")

Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own bytes objects:

# Make a byte slice from constant values
from_ints = bytes([0xaa, 0xbb, 0xcc, 0xdd]) 
from_hex_string = bytes.fromhex("aabbccdd")
assert from_ints == from_hex_string  # These are equivalent!


# Make a byte array of 100 bytes, initialized to zero
empty = bytes(100)

In Python, you can manipulate bytes objects much like you would ordinarily use lists–if you haven't used them before, we recommend trying out some examples in a Python shell to get a feel for it.

For more examples of how to work with bytes objects, take a look at the stencil code, or feel free to look up examples online.

Example for Go

To send a human-readable plaintext string like "hello", we first need to convert it into a byte slice (type []byte), which will make it easy to send and manipulate. We can do this as follows:

// Convert plaintext string to a byte array
plaintext := []byte("hello")

Go provides a helper method to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes:

// Encode in hexadecimal
toSend := hex.EncodeToString(plaintext)
// toSend === "68656c6c6f", based on ASCII values for "hello":
// h (0x68), e (0x65), l (0x6c), ...

// Send hex-encoded string to client program
_, err := ivyStdin.Write([]byte(fmt.Sprintf("%s\n", toSend)))
if err != nil {
    . . .  
}

Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own byte slice:

// Make a byte slice from constant values
plaintext := []byte{0xaa, 0xbb, 0xcc, 0xdd}
    
// Make a byte slice of size 100, initialized to zero
plaintext := make([]byte, 100)

For more examples of how to work with byte arrays, take a look at the stencil code, or feel free to look up examples online.

For each plaintext message, the client program prints corresponding ciphertexts as a line to stdout in the format:

<iv> <ciphertext>

The first line of output corresponds to the ciphertext of the authentication packet that the client first sends to the router.
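A small sketch of turning one output line back into bytes is shown below. It assumes that both fields are hex-encoded and separated by whitespace, mirroring the hex-encoded input format. Check your STENCIL.md for the exact encoding. The function name `parse_packet` and the sample line are made up for illustration.

```python
def parse_packet(line: str) -> tuple[bytes, bytes]:
    """Split one '<iv> <ciphertext>' line into raw IV and ciphertext bytes.

    Assumption: both fields are hex strings separated by whitespace.
    """
    iv_hex, ct_hex = line.split()
    return bytes.fromhex(iv_hex), bytes.fromhex(ct_hex)


# Made-up sample line, not real client output
iv, ciphertext = parse_packet("01ab 68656c6c6f\n")
print(iv.hex(), ciphertext.hex())  # 01ab 68656c6c6f
```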

Apple silicon users: Please use the version of the client binary located in the arm64 directory instead.

How to get started

We have provided stencil code in both Go and Python, located in the ivy/stencil directory. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md file for instructions related to the stencil you are using.

Your main executable should be called sol, which is run as follows:

$ ./sol <client binary> [test key]

where <client binary> is the path to the client binary.

The stencil code creates a subprocess that runs the client program and allows you to send packets by writing to the client's stdin and observing the output by reading from its stdout. For details on how this works, see the comments in the files for each stencil, and your STENCIL.md file.

Note: When you first start the client program, it prints an (IV, ciphertext) pair before any input is given. This is the packet from the setup phase, where the ciphertext is an encrypted packet containing the key itself, ie. $pkt = \{IV, E_k(k)\}$.

Checking your work: the "test key"

To check your work, you can make the client use a key of your choosing by specifying a "test key" as an optional argument. The key must be an 8-byte value, specified as a hex-string, like this:

$ ./sol ./client aabbccddeeff0101

In this example, your attack should recover the key aabbccddeeff0101. If a test key is not specified, the key will be generated automatically.

What to submit

Your code and README must reside in the ivy directory of your repository. When we run your code, we will look for an executable file called sol, which we will run with the arguments specified above. For more details, see this section, as well as your STENCIL.md file.

After recovering the key, your program should print out the recovered key to stdout as an 8-byte hex string (eg. ababababcdcdcdcd) as its last line of output, followed by a newline. You can find an example of how to do this in each stencil.

Your program is worth 60% of the credit for this problem, and the README (and your analysis of how the attack worked) is worth 40%.

Your README should cover the following:


Explain in detail what your attack does and why it works. Your explanation does not need to be long, but should have sufficient detail to describe it to someone who has only read this document.
Describe the vulnerability that made your attack possible, and discuss whether your attack (or a similar one) would have been possible without the vulnerability.
Discuss how the vulnerability could be fixed, by considering the following:
- How could the Ivy wireless protocol change to make your attack more difficult (or impossible)? While Ivy's setup phase (sending the encrypted key at startup) may seem problematic, wireless protocols usually require a setup step like this to make the protocol work. How could you make the attack more difficult without getting rid of the setup phase?
- What would an attacker need to do to defeat the new design? How is this more secure than the original design?
If you have any additional instructions or notes about your implementation or how to run it, or any questions/feedback for us, please include it as well.

Passwords

Note: We'll learn more about passwords in lectures 4-6, though you're welcome to get started on this problem now!

In this problem, your goal is to think about how systems use and store passwords. Passwords are highly sensitive–if an attacker can leak information about a user's password for some application or system, they could impersonate the user, or potentially compromise other systems where the user reused the same password. Thus, in a secure system, passwords should never be stored in plaintext: instead, systems usually store a hashed or encrypted version of the password to make it harder for the attacker to recover the original value.

In this problem, you'll take turns doing some attack and defense on a hypothetical "database" of passwords:

On the defensive side, you'll implement two methods of "securely" storing passwords, and implement a login procedure that uses them
On the offensive side, you'll conduct two attacks to recover users' passwords from the database using a type of brute-force attack (often called "cracking" the passwords)

Scenario: Blue University is looking for a way to build a single sign-on (SSO) system, and needs a way to represent passwords. Their IT department wants to use the following password policy (which conveniently makes this problem feasible for an assignment):

All passwords are 4 characters long
Passwords are composed of lower-case ASCII letters (a-z) and digits (0-9)

Context note: While this password policy is very simplistic, it's enough to demonstrate how to think about password storage and cracking without needing to expend significant time and computing resources to do it. Modern password policies and storage require more complex passwords and stronger hash functions, but in general build on the same techniques you're learning here.
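To get a feel for how small this password space is, the sketch below counts and enumerates every password the policy allows. The generator name `candidates` is hypothetical and not part of the support code.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits  # a-z and 0-9: 36 symbols
LENGTH = 4


def candidates():
    """Yield every password allowed by the policy, in lexicographic order."""
    for chars in itertools.product(ALPHABET, repeat=LENGTH):
        yield "".join(chars)


print(len(ALPHABET) ** LENGTH)  # 1679616 possible passwords

first_three = list(itertools.islice(candidates(), 3))
print(first_three)  # ['aaaa', 'aaab', 'aaac']
```

With only about 1.7 million candidates, trying every one is entirely feasible on a laptop, which is why this policy suits an assignment and not a real system.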

Getting started: the workflow

You should work on this project from the passwords directory of your repository. Our support code provides scripts to generate password databases secured with various methods that we'll explore in this problem.

To generate your databases, run the following command, which should produce output like this and write some database files into the db directory of your repo:

user@container:~/repo/passwords$ make generate
Generating database db/demo.db.json
Generating database db/part1.db.json
Generating database db/part2.db.json
Generating database db/part3.db.json

Database format

Our password databases are just JSON files named something.db.json, located in the db directory. To get a sense of the format, open up db/demo.db.json, which should look something like this (Note: your users and passwords will be different!):

{
 "method": "plain",
 "users": {
  "user0399": {
   "password": "7vxd"
  },
  "user0449": {
   "password": "hb5s"
  },
  "user0857": {
   "password": "r65g"
  },
  . . .
 }
}

There are two important components here:

The method field tells us the format for storing passwords. In our system, this database uses the plain method, which just stores the password in cleartext, which is bad!
Each user (eg user0857) has an object containing data about that user–right now, it just contains the password, but this may store different info depending on the method
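As a rough illustration of this structure, the following sketch parses a small inline copy of the example above using Python's `json` module. The inline sample is an abbreviated version of the demo database shown earlier. For a real file you would read it from the db directory instead.

```python
import json

# Abbreviated copy of the demo database shown above
sample = """
{
 "method": "plain",
 "users": {
  "user0399": {"password": "7vxd"},
  "user0449": {"password": "hb5s"}
 }
}
"""

db = json.loads(sample)
method = db["method"]  # how the passwords in this database are stored

# Each user maps to an object holding that user's stored data
for username, record in db["users"].items():
    print(method, username, record["password"])
```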

Our databases support three methods total–you will implement two of them (more details in the following sections):

plain: Store the password in plaintext
sha1-nosalt: Store the SHA1 hash of the password
sha1-salt4: Store the SHA1 hash of (the password + a 4-byte "salt" value)

For the plain method, we can just obtain the passwords from the database by inspection–but we won't be able to do this for the other methods (open up any other db.json file for an example). In order to test your work, the plaintext passwords are always stored in a "secrets" file–you can't use this in your implementation, but you can look at it to check your work when testing.

For example, here's the secrets file for the demo database:

user@container:~/repo/passwords$ cat db/demo.secrets.txt
user0399 7vxd
user0449 hb5s
user0857 r65g
. . .

Logging in

To simulate using the database, we've provided a program login that attempts to authenticate as a user given their username and password. To try it out on the demo database, do the following:

# Usage:  ./login <database> <username> <password>
user@container:~/repo/passwords$ ./login db/demo.db.json <user> <password>

where <user> and <password> are a username and password from your demo database, which you can find in the secrets file. (Note that usernames and passwords are randomly generated, so yours will be different from the example.) On a correct password, you should see Success!.

To check all passwords in a database, you can use the script check_login, like this:

# Usage:  ./check_login <database> <secrets file>
user@container:~/repo/passwords$ ./check_login db/demo.db.json db/demo.secrets.txt
Logging in as user0399 => OK
Logging in as user0449 => OK
Logging in as user0857 => OK
. . .
Success!

In the next few parts, you'll extend login to implement the other password storage methods, and you can use check_login to test your work.

Part 1: Storing a password hash (`sha1-nosalt` method)

A common strategy for password storage is to store a hash of the user's password, which avoids storing the password in cleartext. In our database, we'll do this using the SHA1 hash function. Concretely, the idea is to store the hash

h

such that h = SHA1(password)

The database file db/part1.db.json stores one hashed password using this method, and looks like this:

$ cat db/part1.db.json
{
 "method": "sha1-nosalt",
 "users": {
  "user3234": {
   "password": "1cc33637bdd3b586d89d259d719e8ad9a5e4f42e"
  }
 }
}

In this form, the password field represents the hash of the password (written in hex) instead of the password itself.

Now that we've discussed the format, you'll extend login to support this type of database, and then you'll crack the passwords.

Note: This problem has only one stencil, which is in Python. Unlike the other problems, there's no need to copy any stencil code from a stencil directory–the code in the passwords directory is already set up as a stencil for you.

Don't want to use Python?

If you want to use another language, you can just replace the stencil login and pwfind programs (more on these later) with your own versions. Our testing scripts will run and just examine their return values and output files, so you can replace them with your own code as long as you adhere to the specification. You are free to use any language that runs inside our container environment by default.

If you want to use a compiled language (Go, C, …), please modify the provided Makefile such that running make builds your login and pwfind programs. For an example, see the Makefiles for the other problems.

Extending `login`

Task: Extend login to check the user's password input against the hashed value in the database. To do this, open up login and fill in the TODOs related to the sha1-nosalt method. Some notes:

login must return 0 (EXIT_SUCCESS) when the user's password is correct, and 2 (EXIT_FAILURE) when it's incorrect. You can see an example for the plain method
We've provided an implementation of the SHA1 hash for you in the support file crypto.py, which you can access with crypto.hash.

When you're done, check your work by trying to login as a user with ./login and then ./check_login.

Cracking the passwords: `pwfind`

Now it's time to go on the offensive! Your job is to write a program to discover the password for a given hash value by brute force: since you know the maximum size and number of allowed characters per the password policy, you can enumerate all possible hash values the password could have.
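
A brute-force search of this kind might look like the sketch below. The alphabet and maximum length here are placeholders chosen for illustration, not the actual password policy; substitute the values from the policy described earlier. The function names are hypothetical.

```python
import hashlib
import itertools
import string

# Placeholder policy for illustration only: replace with the real alphabet and length.
ALPHABET = string.ascii_lowercase + string.digits
MAX_LEN = 4

def candidates(alphabet=ALPHABET, max_len=MAX_LEN):
    """Yield every string over `alphabet` with length 1..max_len."""
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars)

def crack_one(target_hex, alphabet=ALPHABET, max_len=MAX_LEN):
    """Return the first candidate whose SHA1 hex digest equals target_hex, or None."""
    for guess in candidates(alphabet, max_len):
        if hashlib.sha1(guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None

# Tiny demo with a 3-letter alphabet so it finishes instantly.
demo_hash = hashlib.sha1(b"cab").hexdigest()
print(crack_one(demo_hash, alphabet="abc", max_len=3))
```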

To get you started, we've provided a stencil password cracking program called pwfind. This program takes in a database as input and outputs a file containing all the passwords found in the database, which should use a similar format to the secrets file. You should be able to run pwfind like this:

# Usage:  ./pwfind <database> <output file>
user@container:~/repo/passwords$ ./pwfind db/part1.db.json output.txt
. . .

where output.txt is a file containing all of the user-password pairs, in a format similar to the secrets file. (To get a sense of how it works, you can try it out on demo.db.json, which will quickly declare victory by "cracking" the plaintext passwords.)

Task: Extend pwfind to crack passwords hashed using the sha1-nosalt method, and use it to discover the password of the single user in db/part1.db.json. The process should take several seconds to complete.

We've provided a makefile to automate testing for this part. To check your work, you can run:

$ make check-part1

This should automatically check your login and password-cracking process.

Part 2: Cracking more passwords

Now that you have successfully cracked a single password, it's time to break more. The database file db/part2.db.json contains password hashes using the sha1-nosalt method for 1000 users.

While you could use your existing pwfind implementation to crack 1000 passwords, this would take quite a while. Instead, consider what you know about the hashing process–is there a strategy you can use to save time?

Task: Update your pwfind implementation for the sha1-nosalt method such that the runtime is independent of the number of passwords you need to recover. Your updated version should be able to crack all 1000 passwords in roughly the same time as the 1 password from part 1.

What does this mean? We are asking you to look for a strategy for cracking N passwords, stored using the sha1-nosalt method, where the runtime doesn't depend on N. In other words, cracking 1000 passwords shouldn't take 1000x longer than cracking one password. We're interested in your overall strategy here–you do NOT need to finely optimize your pwfind to be as fast as possible.
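
One way to get this behavior, sketched here under assumptions, is to hash every candidate password once and then look up each stored hash in the resulting table. This is only one possible strategy. The names build_table and crack_all are hypothetical, and the tiny alphabet is for demonstration.

```python
import hashlib
import itertools

def build_table(alphabet, max_len):
    """Map SHA1 hex digest -> password for every candidate (built once)."""
    table = {}
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            guess = "".join(chars)
            table[hashlib.sha1(guess.encode("utf-8")).hexdigest()] = guess
    return table

def crack_all(users, table):
    """Look up each user's stored hash; each lookup is a dictionary access."""
    return {name: table.get(rec["password"]) for name, rec in users.items()}

# Demo with a made-up two-user database over a tiny alphabet.
demo_users = {
    "userA": {"password": hashlib.sha1(b"ab").hexdigest()},
    "userB": {"password": hashlib.sha1(b"ba").hexdigest()},
}
print(crack_all(demo_users, build_table("ab", 2)))
```

The cost of building the table depends only on the size of the password space, so adding more users adds only one lookup each.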

If your implementation for this part is taking significantly longer than part 1, you should rethink your strategy. Please come talk to us in hours or on Ed!

You can check your work for this part using make check-part2.

Part 3: Adding some salt

A common way to avoid the problem with storing a simple hash is to concatenate the password with a random value (a "salt") before hashing, and then store both the hashed password and the salt. Our database setup implements this using the sha1-salt4 method, which prepends a 4-byte salt to the password and stores the hash as h = SHA1(salt || password).

The database db/part3.db.json contains 4 users with passwords stored using the sha1-salt4 method, and looks like this:

{
 "method": "sha1-salt4",
 "users": {
  "user1117": {
   "password": "5b830fdebacc6670ebd88b478f1ef911f548cac7",
   "salt": "3a8c3022"
  },
  "user4164": {
   "password": "d77e0b4568cc099e6e9eafd38ce082bb7dd95734",
   "salt": "7011f215"
  }
  . . .
 }
}

Task: As before, update your login and pwfind programs to check and crack passwords using the sha1-salt4 method when using db/part3.db.json. You can check your work for this part using make check-part3. Relative to part 2, how long does this take, and why?

When you are done

Your submission for this part should include your completed login and pwfind programs. When you upload your code, we will test your programs on new, randomized databases–please do not upload your own.

Tip: If you want to check your work for all parts at once, you can run make check.

In your passwords directory, you should also include a README that describes your password cracking strategy for each part. In particular, you should consider the following:

For the sha1-nosalt method, why can you crack 1000 passwords in such a short time, relative to cracking just one?
If you needed to crack 1000 passwords using salted hashes (like the sha1-salt4 method), would you be able to apply the same principle? How does using a salt change the difficulty of the cracking process?

Padding (CS1620/CS2660 only)

This problem is required for CS1620/CS2660 students only.

In this problem, you'll exploit a seemingly minor information leak to compromise an entire system.

Warning: This problem is longer than the others and requires some thinking. Do not leave it until the last minute.

For more background on this problem, we highly recommend you attend the gearup for this project on Monday, February 5 from 5-7pm EST in CIT368 (or on Zoom). Notes and a recording will be posted afterward.

Scenario: At the same time you published your findings regarding the grades database, the University's administrators were developing a system for viewing student grades. Vowing to never make the ECB mistake again, the administrators set out to find an alternative block cipher mode for use in the grading system. They decided to implement Cipher Block Chaining (CBC) mode using a block cipher with 16-byte blocks.

Knowing Blue University's track record for implementing cryptographic protocols on their own, you decide to try to break in and read your grades. For this one, you don't have access to the database, but you can connect to the server that stores the data. The server accepts commands encrypted with a key $k$ using CBC mode. You don't have the key, but you still want your grades, and it turns out their implementation of CBC mode has a flaw that lets you do this.

Historical context: where did this problem come from?

This problem is based on the POODLE attack, a real, major vulnerability disclosed in 2014. It affected SSL 3.0 and some widely used implementations of TLS, the protocol that secures web traffic (i.e., the "s" in "https").

Blue University's grading system is a toy system that contains a similar vulnerability. You don't need to know anything about TLS or anything more about the POODLE attack than what is presented in this handout, but you're welcome to read more about it if you want.

Please also note that the fictional grading system presented here isn't a good way to design a secure application–many important security guarantees (like user authentication, integrity, etc.) are missing, as they'd just get in the way of the attack we want you to do.

Background

CBC Mode

In CBC mode, each plaintext block is XOR'ed with the previous ciphertext block before it is encrypted using the block cipher. In this mode, any change in a plaintext block affects the ciphertext of all following blocks since the encryption of a given block depends on all preceding blocks, as shown in the figure below. This makes statistical attacks much more difficult.
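
The chaining described above can be sketched in a few lines of Python. The "block cipher" here is a deliberately trivial stand-in (XOR with the key, then reverse the bytes) so that the example is self-contained. It is not secure and is not the cipher used by the server.

```python
BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    # NOT a real cipher: an invertible stand-in used only to show the chaining.
    return xor(block, key)[::-1]

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    return xor(block[::-1], key)

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    assert len(plaintext) % BLOCK == 0, "pad the plaintext first"
    prev, out = iv, b""
    for i in range(0, len(plaintext), BLOCK):
        # Each plaintext block is XORed with the previous ciphertext block (or the IV).
        prev = toy_encrypt_block(key, xor(plaintext[i:i + BLOCK], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decrypt the block, then XOR with the previous ciphertext block.
        out += xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out

key, iv = bytes(range(16)), bytes(16)
ct = cbc_encrypt(key, iv, b"A" * 32)
print(ct[:16] != ct[16:], cbc_decrypt(key, iv, ct) == b"A" * 32)
```

Note that the two identical plaintext blocks produce different ciphertext blocks, which is the property ECB lacks.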

What is padding?

When using any block cipher, the plaintext's length might not be a multiple of the block size. To fix this, we can add extra data to the end of the plaintext, called padding, until its length is a multiple of the block size. Different systems use a variety of padding schemes. This problem uses PKCS#7, a widely used standard for storing encrypted data.

How it works: in PKCS#7, plaintexts are padded with bytes whose values encode the length of the padding. If one byte of padding is used, that byte's value is 1; if two bytes are used, those bytes' values are each 2. Here's an example of how some plaintext blocks would be padded (where xx is the plaintext, which could have any value):

| ------------- 16 byte block  ---------------- |
 xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx 01  (15 bytes + 1 byte  padding)
 xx xx xx xx xx xx xx xx xx xx xx xx xx xx 02 02  (14 bytes + 2 bytes padding)
 xx xx xx xx xx xx xx xx xx xx xx xx xx 03 03 03  (13 bytes + 3 bytes padding)
 xx xx xx xx xx xx xx xx xx xx xx xx 04 04 04 04  (12 bytes + 4 bytes padding)

If a plaintext's length is already a multiple of the block size, we add an entire block of padding. Since the cipher uses 16-byte blocks, a plaintext whose length is a multiple of 16 bytes would be padded with 16 bytes, each with the value 0x10.
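
The padding rules above can be written as a short pair of helper functions. This is a minimal sketch with hypothetical names (pad, unpad), not the server's implementation.

```python
BLOCK = 16

def pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n bytes of value n, where n is 1..block (a full block if already aligned)."""
    n = block - (len(data) % block)
    return data + bytes([n]) * n

def unpad(padded: bytes, block: int = BLOCK) -> bytes:
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block != 0:
        raise ValueError("length is not a non-zero multiple of the block size")
    n = padded[-1]
    if n < 1 or n > block or padded[-n:] != bytes([n]) * n:
        raise ValueError("incorrect padding")
    return padded[:-n]

print(pad(b"help").hex())
print(len(pad(b"A" * 16)))
```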

What can go wrong?

Implementing any cryptographic protocol or library is very tricky, and even small details can lead to significant information leaks. This problem explores what can go wrong when cryptography is implemented incorrectly.

Let's say we write a decryption function that does the following:

Validate the IV and ciphertext (that is, the IV is 16 bytes long and the ciphertext length is a non-zero multiple of 16 bytes). If anything is malformed, report an error and stop processing.
Decrypt the ciphertext using CBC mode.
Check the padding on the plaintext. If the padding is valid, strip off the padding. If the padding is invalid, report an error and stop processing.
Interpret the resulting plaintext as a command, and send any output or error messages to the user.

While this sequence of steps may seem reasonable, it leaks a small piece of information: whether the padding at the end was correct or not. While the different error messages seem relatively harmless, this leak is enough to completely break the security of the system!

The Attack

Using this small piece of information, it turns out that we can forge an (IV, ciphertext) pair which decrypts to a plaintext of our choosing. We'll give you an outline of how to think about the attack. Your job is to fill in the pieces and make it happen.

Recovering Intermediate State

First, you'll need the ability to recover intermediate state, or the decryption of a given ciphertext block in CBC. In other words, given two ciphertext blocks $C_{1}$ and $C_{2}$ ($C_{1}$ here could also be the IV), we'd like to determine the decryption of $C_{2}$. This gives us the intermediate state at $C_{2}$, or $I_{2}$. Once we can recover intermediate state, forging a single plaintext block is straightforward: we simply pick $C_{1}$ such that $C_{1} \oplus I_{2} = P_{2}$. Here's how intermediate state plays into the CBC decryption process:

Hint: Recall that you can send (IV, ciphertext) pairs of your choice to the server. If you can somehow get 0x01 as the final byte of the plaintext, then the padding will be valid (and the server will not raise an "incorrect padding" error). By doing that, you'll know the final byte of $P_{2}$ (0x01) and the final byte of $C_{1}$ (since you chose $C_{1}$). From there, you can determine the final byte of $I_{2}$. Using this information, you should be able to recover more bytes of $I_{2}$ and, eventually, recover $I_{2}$ in its entirety.
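
The first step of the hint can be sketched against a locally simulated oracle. Everything here is hypothetical: the toy cipher (XOR with a key, then byte reversal), the KEY value, and the padding_ok function stand in for the real server, which you would query over the network instead. The sketch also adds a confirmation query, which goes slightly beyond the hint, to rule out the case where a guess happens to produce a longer valid padding such as 02 02.

```python
BLOCK = 16
KEY = bytes(range(1, 17))  # hypothetical secret, known only to the simulated server

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_decrypt_block(block: bytes) -> bytes:
    # Stand-in for the real block cipher's decryption; NOT secure.
    return xor(block[::-1], KEY)

def padding_ok(c1: bytes, c2: bytes) -> bool:
    """Simulated oracle: True if D(c2) XOR c1 ends in valid PKCS#7 padding."""
    p = xor(toy_decrypt_block(c2), c1)
    n = p[-1]
    return 1 <= n <= BLOCK and p[-n:] == bytes([n]) * n

def recover_last_byte(c2: bytes, oracle=padding_ok) -> int:
    """Recover the final byte of the intermediate state I2 = D(c2)."""
    for guess in range(256):
        c1 = bytes(BLOCK - 1) + bytes([guess])
        if not oracle(c1, c2):
            continue
        # Change the second-to-last byte: padding 0x01 stays valid, longer paddings break.
        c1_alt = bytes(BLOCK - 2) + b"\x01" + bytes([guess])
        if oracle(c1_alt, c2):
            return guess ^ 0x01  # plaintext byte is 0x01, so I2[-1] = guess XOR 0x01
    raise RuntimeError("no guess produced valid padding")

c2 = bytes(range(100, 116))
print(recover_last_byte(c2) == toy_decrypt_block(c2)[-1])
```

The attacker code only ever calls the oracle. The direct call to toy_decrypt_block at the end exists solely to check the result in this simulation.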

Forging Multiple Blocks

Once we can recover intermediate state $I_{1}$, we can easily forge a single plaintext block $P_{1}$ by picking an IV such that $IV \oplus I_{1} = P_{1}$. However, forging plaintexts of arbitrary length is more difficult. Nevertheless, you should be able to use your ability to recover intermediate state as a building block.
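
For the single-block case, the IV computation is just an XOR. The sketch below assumes the intermediate state of the chosen ciphertext block has already been recovered. The intermediate value used in the demo is made up, and forge_iv is a hypothetical helper name.

```python
BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(data: bytes) -> bytes:
    n = BLOCK - (len(data) % BLOCK)
    return data + bytes([n]) * n

def forge_iv(intermediate: bytes, plaintext: bytes) -> bytes:
    """Return the IV such that IV XOR intermediate equals the padded plaintext."""
    target = pad(plaintext)
    if len(target) != BLOCK or len(intermediate) != BLOCK:
        raise ValueError("this sketch only handles a single block")
    return xor(intermediate, target)

# Made-up intermediate state standing in for a value recovered via the oracle.
intermediate = bytes(range(16))
iv = forge_iv(intermediate, b"help")
print(xor(iv, intermediate))
```

Extending this to several blocks is left as described above.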

Background: the grading server

After some digging, you've learned a bit about how the Blue University grading server works and how to connect to it^[1]. Here's a summary of what you know:

Commands sent by the client to the server must be encrypted using CBC mode. Encrypted commands are sent as hex-encoded strings and are formatted by combining a 16-byte IV with a ciphertext of some non-zero multiple of 16 bytes, followed by a newline character (\n), like this:
```
    [16 byte IV][Ciphertext (some multiple of 16 bytes)]\n
```
If the padding of the ciphertext is incorrect, the server will respond with the message "incorrect padding". This is the critical information leak!
You don't know all of the available commands the server supports (ie, what all of the valid plaintext values should be), but you know that one command is help.
When the server responds to a command, its response message is not encrypted^[2]. This is true for both errors and successful responses.

Assignment

In the padding directory of your repository, we have provided a binary called server that simulates a grading server that follows this protocol and has a padding information leak. You can run the server binary as follows:

      ./server [--debug] <address:port>

where the --debug flag prints some useful information about the server's operation. The argument <address:port> specifies the address and port number on which the server should "listen" for new connections, which is how we will connect to the server with our attack program. We'll discuss what this means later in the course; for now, it's sufficient to run the server like this:

./server localhost:9999

This sets up the server to wait for connections on your local system on port 9999. We'll use this port to connect to the server with our attack program.

Apple silicon users: Please use the version of the server binary located in the arm64 directory instead.

Initial setup

Before starting your attack, you should start the server in a container terminal as above. The server will run until you exit with Ctrl+C, waiting for you to connect to it and send commands. When you perform your attack, the server must be running. This means you'll need to have at least two terminals open inside the container: one for the server, and one for the attack program.

Task: Your goal is to get the server to reveal the grades of the student whose ID is 12345. You should start by sending the "help" command, which should give you more information about how to find the student's grades.

How to get started

We have provided stencil code in both Go and Python, located in the padding/stencil directory of your repository. For a guide on how to work with the stencils, see this section, as well as the STENCIL.md file for instructions related to the stencil you are using.

In all implementations, the stencil code implements command-line argument validation and sets up the necessary networking code to read and write arbitrary strings to/from the padding server, which you can leverage to create encrypted messages to perform your attack. You can modify the provided networking code, but you shouldn't need to.

The main executable for all stencils is called sol, which is run as follows:

./sol <server address> <server port> "<command>"

For example, to send the command "help" to the server using the address and port shown above, you would run:

./sol localhost 9999 "help"

Deliverables

Your code and README should be located in the padding directory of your repository. When we run your code, we will look for an executable file called sol, which we will run with the server address, port, and command as specified above. For more details, see this section, as well as your STENCIL.md file.

After your program performs the attack, you should print the response from the server to stdout–the response should be the last line(s) of your program's output.

Your program is worth 50% of the credit for this problem, and the README is worth 50%.

Your readme should describe the following

(40%) Explain in detail what your attack does and why it works. Your explanation should be detailed enough to convince someone who has only read the specification of the protocol that your attack works.
(10%) Imagine that you've intercepted an (IV, ciphertext) pair sent from a legitimate client to the server, and you'd like to decrypt it. Describe in detail how you could modify your attack to allow you to decrypt the ciphertext.
If you have any additional instructions or notes about your implementation or how to run it, or any questions/feedback for us, please include it as well.

Hints and technical Details

Sending the help command should only require a single ciphertext block, so this should allow you to test your solution to recovering a single ciphertext block's intermediate state even if you haven't yet figured out how to forge multi-block messages.
After sending a message to the server, you may need to pause/sleep briefly (~0.001 seconds) before attempting to read the server's response to avoid any network inconsistencies.
The server expects hex-encoded data. This means that although you may want to send the 2 bytes 0xaa 0xbb, you must actually send the byte encoding of the hex string 'aabb', which is 4 bytes long: 0x61 0x61 0x62 0x62.
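
The hex-encoding step can be sketched as below. The helper name encode_message is hypothetical, and the use of lowercase hex digits is an assumption about what the server accepts. The message layout follows the protocol summary given earlier (IV, then ciphertext, then a newline).

```python
def encode_message(iv: bytes, ciphertext: bytes) -> bytes:
    """Hex-encode IV || ciphertext and append a newline, ready to send."""
    if len(iv) != 16:
        raise ValueError("IV must be 16 bytes")
    if len(ciphertext) == 0 or len(ciphertext) % 16 != 0:
        raise ValueError("ciphertext must be a non-zero multiple of 16 bytes")
    return (iv + ciphertext).hex().encode("ascii") + b"\n"

raw = bytes([0xAA, 0xBB])
wire = raw.hex().encode("ascii")
# Two raw bytes become four ASCII characters on the wire.
print(wire, [hex(b) for b in wire])

msg = encode_message(bytes(16), bytes(16))
print(len(msg))
```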

Extra resources

Version history

Updates to this assignment after release will be posted here.

[1] When we talk about networking later in the course, we'll learn how you could do this "digging" process!
[2] You find this very strange, but since it makes your job of attacking the server much easier, you try not to worry about it too much.
