<style> h3 { border-bottom: 1px solid #ccc; } section { margin-bottom: 2rem; padding: 0em 1em; padding-bottom: 1em; border-radius: 4px; background-color: #f7f7f7; border: 1px solid #ccc; } center { text-align: center; } summary { font-weight: bolder; } summary:hover { text-decoration: underline; } </style> **[&laquo; Back to the main CSCI1660 website](https://brown-csci1660.github.io)** # Project 1: Cryptography **Due: Friday, February 16 by 11:59pm EST** :::danger **Note**: We're still setting up the Gradescope submission for this assignment, so it's not possible to submit your work yet. We'll post an announcement on Ed when the Gradescope submission is ready, after which this message will be removed. ::: ## Introduction In this project, you will investigate the security of some small cryptographic systems as a way to gain practice with cryptography and security principles. Each system is a self-contained problem--you can work on the problems in any order. Each problem describes a cryptographic scheme employed by some part of the fictional "Blue University", which has a history of terrible security practices. As you may expect, all of the schemes have fundamental weaknesses that make them possible to break---and you get to break them! Our support code provides some programs that *emulate* the weak cryptosystem described in each problem. You will carry out your attack by writing a script that performs the attack, the goal of which might be to leak a secret key or recover a plaintext. This document has a section for each problem describes some background about the problem, the attack, the mechanics of how the support programs and stencil code work, and how you need to interact with them. **Requirements**: CS1660 students must complete the three problems: - [Grades](#Grades) - [Ivy](#IVy-Wireless) - [Passwords](#Passwords) Students in CS1620/CS2660 must complete one extra problem: - [Padding](#Padding) For all students, each problem is worth an equal portion of the overall credit for this project. For CS1660, this means that each problem is worth one third of the available credit, and 25% each for CS1620/CS2660. <!-- **Meta-context**: many of the problems presented here derive from real vulnerabilities on real-world systems. To make them feasible to complete as a course project, we've abstracted away many of the details by creating "toy" versions of the system so you can focus on the cryptography part. As a result, --> ### How to read this document You can think of this document as four self-contained, mini-handouts, one for each problem, with some general setup instructions and resources at the beginning and end that apply to all problems. For each problem, there are two components: - **Background on the system and the cryptographic techniques involved**, and an overview of the attack you will be performing - **Implementation and logistical details** on how to navigate the support code, how to test your work, etc. If this is your first time reading this handout, you might want to skim all the background sections first to get a sense of the concepts you'll need, or perhaps you'll want to start looking at the implementation. Either way is fine, but once you do your first-pass read, **be sure to come back for the other parts when you need them**. ### Development environment You should work on this assignment using our course container environment---if you have not done so already, you can find setup instructions [here](https://hackmd.io/@cs1660/container-setup). If you indicated on HW0 that you are unable to run the container environment on your own system, we will contact you with some options. In the meantime, you can work on this assignment using any CS department system---see [this guide](https://cs.brown.edu/about/system/connecting/ssh/) for instructions on how to access them remotely via SSH. :::info ### Updates and feedback In the last year, we've made some changes to project's infrastructure and added one new problem ([Passwords](#Passwords)). We're always looking for ways to improve our documentation and resources--as the project continues, we may release more guidance and resources to help! If you have questions or find anything in this document that you think is incorrect or unclear, please let us know by posting on Ed (your post can be anonymous, even to us). For each project, we will maintain a **pinned FAQ post** with a "readling list"---please check this thread for updates and clarifications! If we make any updates to this handout, we'll post a comment in the FAQ post or make an announcement to let you know. When this happens, please refresh the page to make sure you're on the latest version. ::: ## How everything works This assignment is made up of a several small parts. This section describes how to navigate how we've organized the support and stencil code you'll be using for the various problems, as well as some general mechanics about the project. If you're reading this document for the first time and want to see the problems, skim over this section and then return here when you're ready to start writing code. ### Repository setup Each problem contains some support programs, as well as stencil code for developing your attack. You can download your starter code, and create a git repository you will use for development using **[this link](https://classroom.github.com/a/pAm6Bg0y)**. **Cloning your repository**: To make development easier, we recommend cloning your stencil code for this project so you can access files in it from your development container. To do this, we recommend cloning the stencil in your `DEV-ENVIRONMENT/home` directory, where `DEV-ENVIRONMENT` is the name of the folder you used when you cloned the development container. Here's an example of what your filesystem might look like (based on what you might have from project 1): ``` - ... |--DEV-ENVIRONMENT | |--docker/ | |--home/ | | |--.etc/ | | |--p01-cryptography-yourname/ # <------------- Clone your stencil here! | |--run-container | |-- ... ... ``` After your clone your repository, the layout will look like this: ``` p01-cryptography-yourname |- grades/ # <--- Problem directory for ivy | |- stencil/ # <--- Stencil code for grades | |- go/ | | |- STENCIL.md # Guide for using this stencil | | |- sol.go | | |- ... | |- python/ | | |- STENCIL.md | | |- ... | |- ... |- ivy/ # <--- Problem directory for ivy | |- stencil/ # <--- Stencil code for ivy | |- ... |- passwords/ # <--- Problem directory for passwords | |- ... |- ... ``` Each problem has its own directory (called the *"problem directory"*), which contains some support files and stencil code. Stencil code is provided in multiple languages (more on this in the next section). :::danger :warning: **Warning: Do not change this directory structure**, as we expect your code to follow this layout when we clone your repository for grading. For more details on how to submit your code to work with our autograder, please read the "Assignment" and "Deliverables" sections for each problem. ::: ## Getting started When you start on a problem, you should do the following: 1. **Read over the problem's section in this document**, which provides background and describes how to run the support code 2. **Look over the support code in your repository and decide which stencil you want to use.** Some problems have stencils in multiple languages (usually Python and Go). You can choose whichever stencil you feel most comfortable using for a particular problem. 3. **Using the command below, copy the files for your stencil of choice into the problem directory.** For example, if you are working on the `grades` problem and want to use the Go stencil, you could run (from the repository root): ```shell cs1660-user@container:~/repo$ cp -Trv grades/stencil/python grades ``` :::warning **Warning**: **Do not skip this step!** When grading, we will look only in the main problem directory for your code---so be sure to put it there! ::: 4. **Read the file `STENCIL.md`**, which is included with each stencil code version. This file contains helpful notes about how to use this particular stencil for this problem. 5. Read over the stencil code and get started! We've left `TODOs` that you can fill in with your implementation. <details><summary>Want to use a different language?</summary> If you wish to use a language that doesn't have a stencil, you may do so **ONLY** if the tools for it are already installed in our course container environment. Note that the course staff may be not be able to provide support for language-level questions. If you have questions on this, please reach out to us on Ed. </details> ### How to run your code For all stencils, your program should be an executable called `sol`, with different arguments depending on the problem. If you are using a scripting language like Python, your stencil will include an executable script called `sol` that you will modify. For compiled languages like Go, your stencil includes a `Makefile` that builds your code and outputs an executable called `sol` that we will run. See your `STENCIL.md` for details. Your stencil code is set up to conform to the necessary conventions to run your code when grading (ie, program names, arguments, etc.). Do not modify these, or you will encounter issues during grading! ### Submitting your work You will submit your work via Gradescope---you can find a link to Gradescope on the Assignments page of the course website. Each problem has its own entry on Gradescope where you can submit, which will run our autograder for that problem. :::danger **Note**: We're still finalizing the Gradescope submission for this project and hope to have it ready soon, at latest over the weekend. You can generally test each problem on your own without needing an autograder. ::: # Grades In this problem, you'll explore how statistical correlation can be a powerful technique for cryptanalysis. Blue University is deciding how it wants to encrypt its grades database using symmetric encryption. One of the decisions when encrypting the database is choosing a *block cipher mode*. While poking around the database, you learn that it uses *Electronic Codebook Mode (ECB)*. ### Background #### Block Cipher Modes A *block cipher* takes fixed-size messages and produces fixed-size ciphertexts. Block ciphers are powerful cryptographic primitives, but alone they aren't enough to encrypt anything interesting, since they can only encrypt small, fixed-size messages. We can encrypt longer messages by combining multiple uses of a block cipher by using a particular *block cipher mode*. Blue University's choice of block cipher mode---*Electronic Codebook (ECB)* mode---splits the plaintext into chunks (one block long), separately encrypts each plaintext block using the block cipher, and then finally concatenates the resulting blocks to create the cipher text. #### Weaknesses of ECB ECB's simplicity comes with serious weaknesses---most critically, the same plaintext block will always produce the same ciphertext block. This allows for a number of attacks (such as replay attacks), or, more relevant to this problem, statistical cryptanalysis. ECB completely leaks the statistical distribution of plaintext blocks (though it doesn't leak the original plaintexts), which can be damaging in cases where overall patterns are generally more important than local detail. For example: | Plaintext image | Same image encrypted with ECB mode | |---|--| | ![Tux](https://hackmd.io/_uploads/SyTAQ9Lca.jpg =200x) | ![Tux_ecb](https://hackmd.io/_uploads/BkfyVcI5T.jpg =200x) | In the context of a grade database, ECB mode presents similar issues---after all, with the University's list of 100,000 students, there must be statistical patterns you can attack\... ### What you know about the grades database You have learned that each *plaintext* database record consists of a five-digit student ID (randomly chosen from `00000`..`99999`, with leading zeroes if necessary), and a course grade (one of `A`, `B`, `C`, `N`) using the following format: <center> ``` id:01234, grd:A\n ``` </center> Where `\n` is a newline character (`0x0A`). Helpfully, the ID part of the line ("`id:01234`") is eight bytes long, and the grade part (from the comma to the newline: "`, grd:A\n`") is also 8 bytes long. You have also learned that the block cipher used to encrypt the database uses a block length of 8 bytes (what a coincidence!). This means that the ciphertext blocks of the encrypted database are laid out like this: <center> ``` ...[student X id][student X’s grade][student Y id][student Y’s grade]... ``` </center> ### Helpful University Statistics Thanks to statistics that the University must report annually, you also know: - The university has 100,000 students. - Each student has completed 30 courses (that is, there are 30 entries for each of the 100,000 students). - The university-wide distribution of grades is approximately: |Grade |%| |------|-| |A |50%| |B |30%| |C| 15%| |N| 5%| ### Your Assignment Your goal is to exploit the weaknesses of ECB mode to learn some information about student grades. **Initial setup**: The starter code contains a program that generates an encrypted grades database for you to attack. To start, enter the `grades` directory of your repository and run the following to generate a database: ``` you@container:~/repo/grades$ ./generate-database > /tmp/database.enc ``` This will create an encrypted copy of the database called `database.enc`. :::warning **Apple silicon users**: Please use the version of `generate-database` located in the `arm64` directory instead, which will run faster. ::: **Task**: Using your knowledge of the database layout and university statistics, **write a program or script that reads the database and figures out the following**: 1. Since you know the plaintext format of the database, how many possible **unique** ciphertext blocks exist? (Hint: you can answer this question based on only the information in this document---then you can check your answer with a script!) 2. What ciphertext block corresponds to an A grade? B? C? N? 3. There's a student who's famous at the university for being the only student to ever get both As and Cs but no Bs. Exactly how many As, Cs, and Ns has this student received? #### How to get started We have provided stencil code in both Python and Go, located in the `stencil` directory for this problem. For a guide on how to work with the stencils, see [this section](#Getting-started), as well as the `STENCIL.md` file for instructions related to the stencil you are using. Regardless of which stencil you use, your program should have the name `sol` and should take in the encrypted database (eg. `/tmp/database.enc`) as an argument, as follows: ``` ./sol <encrypted database> ``` Your deliverable will include your program as well as a short `README` describing how you determined your answer for each question, as described in the next section. #### What to submit For this problem, you should submit your code as well as a `README` that briefly explains how your code works and how you arrived at an answer for each problem. Your `README` is worth 40% of the credit for this problem, and the program is worth 60%. Your code and `README` must reside in the `grades` directory of your repository. When we run your code, we will look for an executable file called `sol`, which we will run using with arguments specified above. Your program must print out your answers in exactly this following format: ``` Total blocks: <number> Ciphertext for grade A: <hex string> Ciphertext for grade B: <hex string> Ciphertext for grade C: <hex string> Ciphertext for grade N: <hex string> Famous student As: <number> Famous student Bs: <number> Famous student Cs: <number> Famous student Ns: <number> ``` where `<hex string>` is a hex string like `abcd0123` and `<number>` is any decimal number. Each stencil contains an example for how to print this output. <!-- To submit your work, upload *your entire repository* to the assignment "Project 1: Cryptography - Grades" on Gradescope using the **Github upload** method. If you encounter issues, please ensure that your submission follows the instructions listed here---otherwise, please let us know and we can help. --> # IVy Wireless In this problem, you'll carry out an attack on a cryptographic *protocol*--a procedure for sending encrypted messages between two parties. **Scenario**: Blue University's CS department has a "secure" wireless network for its department systems that uses a custom wifi protocol called Ivy. All devices on the network share a key $k$ that's used to encrypt traffic, just like a typical password on a standard, small-scale Wifi network. You want to connect your own computer to the secure network, so you want to obtain the key $k$. As a student, you can use department systems that are already connected to secure network, but as a normal (non-admin) user, you can't just poke around in the system settings and look up the key. Instead, you try to attack the protocol, and you can leverage your existing access to department systems to help you do it. :::info <details><summary>Historical context: where does this problem come from?</summary> This problem is based on a classic attack for [WEP](https://en.wikipedia.org/wiki/Wired_Equivalent_Privacy), an early method of securing WiFi networks (circa 2000-2004). Don't worry, though--you don't need to know anything about WEP, or any more about wireless encryption protocols or networking beyond what is described in this handout. Our emulated version abstracts all these details away from you so you can focus on the idea of attacking a weak protocol. While you're welcome to read more about WEP online, we **do not** recommend doing so to find guidance on how to solve this problem. Since our version abstracts away many details, anything WEP-specific you read online will not translate well and thus isn't likely to help you. There are also several other, more popular attacks on WEP other than the one we use here that simply aren't possible on our toy version of the protocol. </details> ::: <!-- **Scenario**: You live in a dorm at Blue University. To provide Internet access in the dorm, Blue University worked with a small startup that sold and set up the network hardware at a *really* good price. However, the system has some oddities: - Occupants can only connect to the network with a physical cable to a "router" that lives in their room - To save on cabling costs, each "router" connects wirelessly to the dorm's central hub, which has a link to the Internet - All wireless traffic is encrypted with a shared key, $k$, which is known to the central hub and to all room routers. The network setup is illustrated by this figure: <center> ![routerProblem](https://hackmd.io/_uploads/B1x86F8ca.png =450x) </center> A quick web search reveals some information about router---it turns out that startup decided it was easier to build a custom wireless encryption protocol called *Ivy*, instead of relying on well-known standards. Since you are taking a security class, this makes you uneasy---you decide to try and break the Ivy protocol by recovering the key $k$. --> ### Background: The Ivy Protocol Network traffic is split up into *packets*--chunks of data that are sent over the network independently. Devices connected to a wireless network are called *clients*, which send packets to the network via a *router*. All devices connected to the network share the same key. To send a packet, the client encrypts it with the key and sends the resulting ciphertext to the router, which decrypts the packet before sending it elsewhere (eg. the Internet). However, if the same packet data is sent twice, it would always encrypt to the same ciphertext--this would allow an eavesdropper to learn information about packets they have seen before. To address this, each packet is appended with a random *initialization vector* (IV) so that, so long as two different IVs are used, the same plaintexts will produce different ciphertexts. <!-- To send packets to the network, the router encrypts each packet with the key, $k$--this is similar to the password on a home wifi network. Devices encrypt each packet using $k$, then send the resulting ciphertext over the wireless link, and the hub performs the same operation to send encrypted packets to the router. --> The IVy protocol has two phases: 1. **Setup phase**: When each client connects, it sends an encrypted packet $pkt$ to the router containing the key itself: ie $pkt = \{IV, E_k(k)\}$. This is used to set up the client's connection and authenticate it on the network (ie, prove that it knows the key, $k$). The key is 8-bytes long. 2. **Normal operation**: After the first message, the client sends out encrypted packets for each plaintext message $m$ it wants to send over the network, ie. $pkt = \{IV, E_k(m)\}$. <br />(The details of the actual message aren't important--to the protocol, they're just bytes.) The actual encryption and decryption use a stream cipher. Here's the gist (with details below): - The key and IV are used as the seed to a pseudorandom number generator $G$: a function that generates output that *appears* as a random sequence of bytes, but can be regenerated precisely given the same starting seed - To encrypt, a message is XORed with the random sequence of bytes to produce a ciphertext, which is sent over the network - If the receiver has the key, they can decrypt the message by using the pseudorandom number generator to get the same random sequence of bytes, XOR it with the ciphertext, and recover the message. :::info <details><summary>Ready for the full protocol details? Click here!</summary> The encryption function takes an 8-byte key $k$, and a message $m$, and returns both the randomly-generated IV and the ciphertext: ``` encrypt(k, m): IV = R() // Generatre initialization vector seed = IV || k // Concatenate IV and k r = G(seed, len(m)) // Generate len(m) random bytes c = m ^ r // xor m and r to produce ciphertext return IV, c ``` where: - `R()` is a source of randomness. Each call to `R()` generates a 16-bit random number uniformly distributed from {$0, ..., 2^{16} -1$} - `G(seed, len)` is a *pseudorandom stream generator*. Given a seed, `seed` and length `l`, it generates a pseudorandom stream `l` bytes. Given the same values for `seed` and `l`, `G` will always produce the same output. After encrypting, the sender sends a packet containing both the ciphertext and IV, ie. $pkt = \{IV, c\}$. Thus, when the receiver gets the packet, it thus receives both the ciphertext and IV. The receiver also has the key, and it can reconstruct the same random stream the sender used to encrypt a message to do the decryption like this: ``` decrypt(k, iv, c): seed = iv || k // Concatenate IV and k r = G(seed, len(c)) // Generate len(c) random bytes m = c ^ r // xor c and r to get plaintext m return m ``` The reason that this works is that both sides know the same key, $k$, and they both know the same IV since the IV is sent along with the ciphertext. As a result, they are able to reconstruct the seed *and* the same random stream, $r$. Thus, generating the ciphertext means computing: ``` c = m ^ r ``` Then, when we decrypt the ciphertext, we compute: ``` c ^ r = (m ^ r) ^ r = m ^ (r ^ r) = m ^ 0 // Any number xor'd with itself is 0 = m // Message recovered! ``` <!-- $$\begin{aligned} c \oplus r &= (m \oplus r) \oplus r \\ &= m \oplus (r \oplus r) \\ &= m \oplus 0 \qquad\qquad \text{[a number XORed with itself = 0]} \\ &= m \end{aligned}$$ --> This allows the receiver to recover the original message! </details> ::: ### The attack Once again, your goal is to recover the key $k$. In terms of the problem scenario, you have two tools available to help you do this: - You have an account on a department system already connected to the secure network, so you can use it to send any packets you want - Using your own computer, you can inspect the encrypted packets (where $pkt = \{IV, ciphertext\}$) that are sent over Wifi--this is called *wifi sniffing* which we'll learn more about later in the course In conceptual terms, this setup creates what is called an *encryption oracle*: you can pick any plaintext you want and observe the encrypted output (in this case, as an (IV, ciphertext) pair)--and you can do this *as many times as you like*. This type of attack is called a **chosen-plaintext attack**, and the Ivy protocol is vulnerable in such a way that it can allow you to recover $k$. ### Your task Sadly, we don't have an actual wireless network for you to attack :cry:. Instead, we give you a program called `client`, which simulates your interaction with the IVy protocol: you can "sniff" and send messages to it by reading and writing to the program's stdin and stdout. Your goal is to write a script that interacts with the `client` binary to recover the key $k$ by performing a chosen plaintext attack. #### Problem setup (and how to send data) The `ivy` directory of your project repository contains a binary called `client`, which simulates the IVy protocol. You'll communicate with the `client` program by sending plaintexts on `stdin` and reading them the encrypted result from its `stdout`. For input, the `client` program expects to receive each plaintext as a string in hexadecimal format (called "hex-encoding"), followed by a newline. :::info #### Important background: sending data as hex strings For examples of how to think about hex-encoding and how to send data, take a look at one of the following examples: <details> <summary>Example for Python</summary> To send the human-readable plaintext string like "hello", we first need to convert it into an array of bytes (type `bytes`), which will make it easy to send an manipulate. In Python, we can convert strings to/from `bytes` objects using `encode` and `decode`: ```python # Convert plaintext string to a byte array plaintext = "hello".encode("utf-8") ``` For `bytes` objects, Python provides the method `.hex()` to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes: ```python # Encode in hexadecimal to_send = plaintext.hex() # to_send === "68656c6c6f", based on ASCII values for "hello": # h (0x68), e (0x65), l (0x6c), ... # Send hex-encoded string to client program ivy_stdin.write(to_send + "\n") ``` Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own `bytes` objects: ```python # Make a byte slice from constant values from_ints = bytes([0xaa, 0xbb, 0xcc, 0xdd]) from_hex_string = bytes.fromhex("aabbccdd") assert from_ints == from_hex_string # These are equivalent! # Make a byte array of 100 bytes, initialized to zero empty = bytes(100) ``` In Python, you can manipulate `bytes` objects much like you would ordinarily use lists--if you haven't used them before, we recommend trying out some examples in a Python shell to get a feel for it. For more examples of how to work with `bytes` objects, take a look at the stencil code, or feel free to look up examples online. </details> <details><summary>Example for Go</summary> To send the human-readable plaintext string like "hello", we first need to convert it into a byte slice (type `[]byte`), which will make it easy to send an manipulate, which we can do like this: ```go // Convert plaintext string to a byte array plaintext := []byte("hello") ``` Go provides a helper method to turn the byte array into a string representing its values in hex, which is much easier to type and print than raw bytes: ```go // Encode in hexadecimal toSend = hex.EncodeToString(plaintext) // toSend === "68656c6c6f", based on ASCII values for "hello": // h (0x68), e (0x65), l (0x6c), ... // Send hex-encoded string to client program _, err := ivyStdin.Write([]byte(fmt.Sprintf("%s\n", toSend))) if err != nil { . . . } ``` Note, however, that it's more likely you'll want to think about sending manually-crafted byte arrays rather than human-readable strings like "hello". Here's how you can make your own byte slice: ```go // Make a byte slice from constant values plaintext := []byte{0xaa, 0xbb, 0xcc, 0xdd} // Make a byte slice of size 100, initialized to zero plaintext := make([]byte, 100) ``` For more examples of how to work with byte arrays, take a look at the stencil code, or feel free to look up examples online. </details> ::: For each plaintext message, the `client` program prints corresponding ciphertexts as a line to `stdout` in the format: ``` <iv> <ciphertext> ``` The first line of output corresponds to the ciphertext of the authentication packet that the client first sends to the router. :::warning **Apple silicon users**: Please use the version of the `client` binary located in the `arm64` directory instead. ::: ### How to get started We have provided stencil code in both Go and Python, located in the in the `ivy/stencil` directory. For a guide on how to work with the stencils, see [this section](#sec:getting-started), as well as the `STENCIL.md` file for instructions related to the stencil you are using. Your main executable should be called `sol`, which is run as follows: ``` $ ./sol <client binary> [test key] ``` where `<client binary>` is the path to the `client` binary. The stencil code creates a subprocess that runs the `client` program and allows you to send packets by writing to the `client`'s `stdin` and observing the output by reading from its `stdout`. For details on how this works, see the comments in the files for each stencil, and your `STENCIL.md` file. :::info **Note**: When you first start the `client` program, it prints an (IV, ciphertext) pair before any input is given. This is the packet from the setup phase, where where the ciphertext is an encrypted packet containing the key itself, ie. pkt = $\{IV, E_k(k)\}$. ::: ### Checking your work: the "test key" To check your work, you can make the `client` use a key of your choosing by specifying a "test key" as an optional argument. The key must be an 8-byte value, specified as a hex-string, like this: ``` $ ./sol ./client aabbccddeeff0101 ``` In this example, your attack should recover the key `aabbccddeeff0101`. If a test key is not specified, the key will be generated automatically. ### What to submit Your code and `README` must reside in the `ivy` directory of your repository. When we run your code, we will look for an executable file called `sol`, which we will run using with arguments specified above. For more details, see [this section](#How-everything-works), as well as your `STENCIL.md` file. After recovering the key, your program should print out the recovered key to `stdout` as an 8-byte hex string (eg. `ababababcdcdcdcd`) as its last line of output, followed by a newline. You can example for how to do this in each stencil. Your code is program is worth 60% of the credit for this problem, and the `README` (and your analysis of how the attack worked) is worth 40%. Your `README` should cover the following: <details><summary>Click to expand</summary> - Explain in detail what your attack does and why it works. Your explanation does not need to be long, but should have sufficient detail to describe it to someone who has only read this document. - Describe the vulnerability that made your attack possible, and discuss whether your attack (or a similar one) would have been possible without the vulnerability. - Discuss how the vulnerability could be fixed, by considering the following: - How could the Ivy wireless protocol change to make your attack more difficult (or impossible)? While Ivy's setup phase (sending the encrypted key at startup) may seem problematic, wireless protocols usually requires a setup step like this to make the the protocol work. How could you make the attack more difficult *without* getting rid of the setup phase? - What would an attacker need to do to defeat the new design? How is this more secure than the original design? - If you have any additional instructions or notes about your implementation or how to run it, or any questions/feedback for us, please include it as well. </details> # Passwords :::info **Note**: We'll learn more about passwords in lectures 4-6, though you're welcome to get started on this problem now! ::: In this problem, your goal is to think about how systems use and store passwords. Passwords are highly sensitive--if an attacker can leak information about a user's password for some application or system, they could impersonate the user, or potentially compromise other systems where the user reused the same password. Thus, in a secure system, passwords should never be stored in plaintext: instead, systems usually store a hashed or encrypted version of the password to make it harder for the attacker to recover the original value. In this problem, you'll take turns doing some attack and defense on a hypothetical "database" of passwords: - On the defensive side, you'll implement two methods of "securely" storing passwords, and implement a login procedure that uses them - On the offensive side, you'll conduct two attacks to recover users' passwords from the database using a type of brute-force attack (often called "cracking" the passwords) **Scenario**: Blue University is looking for a way to build a single sign-on (SSO) system, and needs a way to represent passwords. Their IT department wants to use the following password policy (which conveniently makes this problem feasible for an assignment): - All passwords are **4** characters long - Passwords are composed of lower-case ASCII letters (`a-z`) and digits (`0-9`) :::warning **Context note**: While this password policy is *very* simplistic, it's enough to demonstrate how to think about password storage and cracking without needing to expend significant time and computing resources to do it. Modern password policies and storage require more complex passwords and stronger hash functions, but in general build on the same *techniques* you're learning here. ::: ### Getting started: the workflow You should work on this project from the `passwords` directory of your repository. Our support code provides scripts to generate password databases secured with various *methods* that we'll explore in this problem. To generate your databases, run the following command, which should produce output like this and write some database files into the `db` directory of your repo: ``` user@container:~/repo/passwords$ make generate Generating database db/demo.db.json Generating database db/part1.db.json Generating database db/part2.db.json Generating database db/part3.db.json ``` #### Database format Our password databases are just JSON files named `something.db.json`, located in the `db` directory. To get a sense of the format, open up `db/demo.db.json`, which should look something like this (**Note**: your users and passwords will be different!): ``` { "method": "plain", "users": { "user0399": { "password": "7vxd" }, "user0449": { "password": "hb5s" }, "user0857": { "password": "r65g" }, . . . } } ``` There are two important components here: - The `method` field tells us the format for storing passwords. In our system, this database uses the `plain` method, which just stores the password in cleartext, which is bad! - Each user (eg `user0857`) has an object containing data about that user--right now, it just contains the password, but this may store different info depending on the method Our databases support three methods total--you will implement two of them (more details in the following sections): - `plain`: Store the password in plaintext - `sha1-nosalt`: Store the SHA1 hash of the password - `sha1-salt4`: Store the SHA1 hash of (the password + a 4-byte "salt" value) For the `plain` method, we can just obtain the passwords from the database by inspection--but we won't be able to do this for the other methods (open up any other `db.json` file for an example). In order to test your work, the plaintext passwords are always stored in a "secrets" file--you can't use this in your implementation, but you can look at it to check your work when testing. For example, here's the secrets file for the demo database: ``` user@container:~/repo/passwords$ cat db/demo.secrets.txt user0399 7vxd user0449 hb5s user0857 r65g . . . ``` #### Logging in To simulate using the database, we've provided a program `login` that attempts to authenticate as a user given their username and password. To try it out on the demo database, do the following: ``` # Usage: ./login <database> <username> <password> user@container:~/repo/passwords$ ./login db/demo.db.json <user> <password> ``` where `<user>` and `<password>` are passwords from **your demo database**, which you can find in the secrets file. (Note that usernames and passwords are randomly generated, so yours will be different from the example.) On a correct password, you should see `Success!`. To check *all* passwords in a database, you can use the script `check_login`, like this: ``` # Usage: ./check_login <database> <secrets file> user@container:~/repo/passwords$ ./check_login db/demo.db.json db/demo.secrets.txt Logging in as user0399 => OK Logging in as user0449 => OK Logging in as user0857 => OK . . . Success! ``` In the next few parts, you'll extend `login` to implement the other password storage methods, and you can use `check_login` to test your work ### Part 1: Storing a password hash (`sha1-nosalt` method) A common strategy for password storage is to store a hash of the user's password, which avoids storing the password in cleartext. In our database, we'll do this using the SHA1 hash function. Concretely, the idea is to store the hash $h$ such that `h = SHA1(password)` The database file **`db/part1.json.db`** stores 1 hashed password using this method, which looks like this: ``` $ cat db/part1.json.db { "method": "sha1-nosalt", "users": { "user3234": { "password": "1cc33637bdd3b586d89d259d719e8ad9a5e4f42e" } } ``` In this form, the `password` field represents the hash of the password (written in hex) instead of the password itself. Now that we've discussed the format, you'll extend `login` to support this type of database, and then you'll crack the passwords. :::info **Note**: This problem has only one stencil, which is in Python. Unlike the other problems, there's no need to copy any stencil code from a `stencil` directory--the code in the `passwords` directory is already set up as a stencil for you. <details><summary>Don't want to use Python?</summary> If you want to use another language, you can just replace the stencil `login` and `pwfind` programs (more on these later) with your own versions. Our testing scripts will run and just examine their return values and output files, so you can replace them with your own code as long as you adhere to the specification. You are free to use any language that runs inside our container environment by default. If you use want to use a compiled language (Go, C, ...), please modify the provided `Makefile` such that running `make` builds your `login` and `pwfind` programs. For an example, see the `Makefile`s for the other problems. </details> ::: #### Extending `login` **Task**: Extend `login` to check the user's password input against the hashed value in the database. To do this, open up `login` and fill in the TODOs related to the `sha1-nosalt` method. Some notes: - `login` must return 0 (`EXIT_SUCCESS`) when the user's password is correct, and 2 (`EXIT_FAILURE`) when it's incorrect. You can see an example for the `plain` method - We've provided an implementation of the SHA1 hash for you in the support file `crypto.py`, which you can access with `crypto.hash`. When you're done, check your work by trying to login as a user with `./login` and then `./check_login`. #### Cracking the passwords: `pwfind` Now it's time to go on the offensive! Your job is to write a program to discover the password for a given hash value by brute force: since you know the maximum size and number of allowed characters per the password policy, you can enumerate all possible hash values the password could have. To get you started, we've provided a stencil password cracking program called `pwfind`. This program takes in a database as input and outputs a file containing all the passwords found in the database, which should use a similar format to the secrets file. You should be able to run `pwfind` like this: ``` # Usage: ./check_login <database> <secrets file> user@container:~/repo/passwords$ ./pwfind db/part1.db.json output.txt . . . ``` where `output.txt` is a file containing all of the user-password pairs, in a format similar to the secrets file. (To get a sense of how it works, you can try it out on `demo.db.json`, which will quickly declare victory by "cracking" the plaintext passwords.) **Task**: Extend `pwfind` to crack passwords encrypted using the `sha1-nosalt` method, and use it to discover the password of the single user in `db/part1.db.json`. The process should take several seconds to complete. We've provided a makefile to automate testing for this part. To check your work, you can run: ``` $ make check-part1 ``` This should automatically check your login and password-cracking process. ### Part 2: Cracking more passwords Now that you have successfully cracked a single password, it's time to break more. The database file **`db/part2.db.json`** contains password hashes using the `sha1-nosalt` method for **1000 users**. While you *could* use your existing `pwfind` implementation to crack 1000 passwords, this would take quite a while. Instead, consider what you know about the hashing process--is there a strategy you can use to save time? **Task**: Update your `pwfind` implementation for the `sha1-nosalt` method such that **the runtime is independent of the number of passwords you need to recover**. Your updated version should be able to crack all 1000 passwords in roughly the same time as the 1 password from part 1. :::info **What does this mean?** We are asking you to look for a strategy for cracking N passwords, stored using the `sha1-nosalt` method, where the runtime doesn't depend on N. In other words, cracking 1000 passwords shouldn't take 1000x longer than cracking one password. We're interested in your overall strategy here--you do NOT need to finely optimize your `pwfind` to be as fast as possible. If your implementation is for this part taking significantly longer than part 1, you should rethink your strategy--please come talk to us in hours or on Ed! ::: You can check your work for this part using `make check-part2`. ### Part 3: Adding some salt A common method to avoid the problem with storing a simple hash is to concatenate the password with a random value before hashing, and then storing both the hashed password and salt. Our database setup implements this using the `sha1-salt4` method, which prepends the password with a 4-byte salt and stores the hash as `h = sha1(salt || password)` The database **`db/part3.db.json`** contains 4 users with passwords stored using the `sha1-salt4` method, and looks like this: ``` { "method": "sha1-salt4", "users": { "user1117": { "password": "5b830fdebacc6670ebd88b478f1ef911f548cac7", "salt": "3a8c3022" }, "user4164": { "password": "d77e0b4568cc099e6e9eafd38ce082bb7dd95734", "salt": "7011f215" } . . . } ``` **Task**: As before, update your `login` and `pwfind` programs to check and crack passwords using the `sha1-salt4` method when using `db/part3.db.json`. You can check your work for this part using `make check-part3`. Relative to part 2, how long does this take, and why? ### When you are done Your submission for this part should include your completed `login` and `pwfind` programs. When you upload your code, we will test your programs on new, randomized databases--please do not upload your own. :::success **Tip**: If you want to check your work for all parts at once, you can run `make check`. ::: In your `passwords` directory, you should also include a `README` that describes your password cracking strategy for each part. In particular, you should consider the following: 1. For the `sha1-nosalt` method, why can you crack 1000 passwords in such a short time, relative to cracking just one? 2. If you needed to crack 1000 passwords using salted hashes (like the `sha1-salt4`) method, would you be able to apply the same principle? How does using a salt change the difficulty of the cracking process? # Padding (CS1620/CS2660 only) This problem is required for CS1620/CS2660 students only. In this problem, you'll exploit a seemingly minor information leak to compromise an entire system. :::danger :warning: **Warning**: This problem is longer than the others and requires some thinking, **do not leave it until the last minute**. For more background on this problem, we highly recommend you attend the gearup for this project on Monday, February 5 from 5-7pm EST in CIT368 (or on Zoom). Notes and a recording will be posted afterward. ::: **Scenario**: At the same time you published your findings regarding the grades database, the University's administrators were developing a system for viewing student grades. Vowing to never make the ECB mistake again, the administrators set out to find an alternative block cipher mode for use in the grading system. They decided to implement *Cipher Block Chaining (CBC)* mode using a block cipher with 16-byte blocks. Knowing Blue University's track record for implementing cryptographic protocols on their own, you decide to try to break in and read your grades. For this one, you don't have access to the database, but you can connect to the server that stores the data. The server accepts commands encrypted with a key $k$ using CBC mode. You don't have the key, but you still want your grades, and it turns out their implementation of CBC mode is a flaw that lets you do this. :::info <details><summary>Historical context: where did this problem come from?</summary> This problem is based on the [POODLE attack](https://en.wikipedia.org/wiki/POODLE), a real, major vulnerability discovered in 2014 on widely-used implementations of TLS, the protocol that secures all web traffic (ie, the "s" in "https"). Blue University's grading system is a toy system that contains a similar vulnerability. You don't need to know anything about TLS or anything more about the POODLE attack than what is presented in this handout, but you're welcome to read more about it if you want. Please also note that the fictional grading system presented here isn't a good way to design a secure application--many important security guarantees (like user authentication, integrity, etc.) are missing, as they'd just get in the way of the attack we want you to do. </details> ::: ### Background #### CBC Mode In CBC mode, each plaintext block is XOR'ed with the previous ciphertext block before it is encrypted using the block cipher. In this mode, any change in a plaintext block affects the ciphertext of *all* following blocks since the encryption of a given block depends on all preceding blocks, as shown in the figure below. This makes statistical attacks much more difficult. ![CBC_encryption](https://hackmd.io/_uploads/SJdfYIu5p.png) #### What is padding? When using any block cipher, the plaintext's length might not be a multiple of the block size. To fix this, we can add extra data to the end of the plaintext called *padding* until its length is a multiple of the block size. Different systems use a variety of schemes to apply padding to messages--this problem uses [PKCS#7](https://en.wikipedia.org/wiki/PKCS_7) a widely-used standard for storing encrypted data. **How it works**: in PKCS#7, plaintexts are padded with bytes whose values encode the length of the padding. If one byte of padding is used, that byte's value is 1; if two bytes are used, those bytes' values are each 2. Here's an example of what some plaintext blocks would be padded (where `xx` is the plaintext, which could have any value): ``` | ------------- 16 byte block ---------------- | xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx 01 (15 bytes + 1 byte padding) xx xx xx xx xx xx xx xx xx xx xx xx xx xx 02 02 (14 bytes + 2 bytes padding) xx xx xx xx xx xx xx xx xx xx xx xx xx 03 03 03 (13 bytes + 3 bytes padding) xx xx xx xx xx xx xx xx xx xx xx xx 04 04 04 04 (12 bytes + 4 bytes padding) ``` If a plaintext's length is already a multiple of the block size, we add an entire block of padding. Since the cipher uses 16-byte blocks, a plaintext whose length is a multiple of 16 bytes would be padded with 16 bytes, each with the value `0x10`. #### What can go wrong? Implementing any cryptographic protocol or library is very tricky, and even small details can lead to significant information leaks. This problem explores what can go wrong when cryptography is implemented incorrectly. Let's say we write a decryption function that does the following: 1. Validate the IV and ciphertext (that is, the IV is 16 bytes long and the ciphertext length is a non-zero a multiple of 16 bytes). If anything is malformed, report an error and stop processing. 2. Decrypt the ciphertext using CBC mode. 3. Check the padding on the plaintext. If the padding is valid, strip off the padding. If the padding is invalid, report an error and stop processing. 4. Interpret the resulting plaintext as a command, and send any output or error messages to the user. While this sequence of steps may seem reasonable, **it leaks a small piece of information: whether the padding at the end was correct or not.** While the different errors seem relatively harmless, it's enough to completely break the security of the system! ### The Attack Using this small piece of information, it turns out that we can forge an (IV, ciphertext) pair which decrypts to a plaintext of our choosing. We'll give you an outline of how to think about the attack---your job is to fill in the pieces and make it happen. <!-- **Setup**: This is a chosen-*ciphertext* attack: you can send as many (IV, ciphertext) pairs to the grading server as you want, and observe the output. You don't know the key the server is using--instead, your goal is to intelligently guess at (IV, ciphertext) pairs until f --> #### Recovering Intermediate State First, you'll need the ability to recover *intermediate state*, or the decryption of a given ciphertext block in CBC. In other words, given two ciphertext blocks $C_1$ and $C_2$ ($C_1$ here could also be the IV), we'd like to determine the decryption of $C_2$. This gives us the intermediate state at $C_2$, or $I_2$. Once we can recover intermediate state, forging a single plaintext block is straightforward: we simply pick $C_1$ such that $C_1 \oplus I_2 = P_2$. Here's how intermediate state plays into the CBC decryption process: <center> ![IntermediateState](https://hackmd.io/_uploads/SkR-yv_5p.png) </center> *Hint:* Recall that you can send (IV, ciphertext) pairs of your choice to the server. If you can somehow get `0x01` as the final byte of the plaintext, then the padding will be valid (and the server will not raise an "incorrect padding" error). By doing that, you'll know the final byte of $P_2$ (`0x01`) and the final byte of $C_1$ (since you chose $C_1$). From there, you can determine the final byte of $I_2$. Using this information, you should be able to recover more bytes of $I_2$ and, eventually, recover $I_2$ in its entirety. #### Forging Multiple Blocks Once we can recover intermediate state $I_1$, we can easily forge a single plaintext block $P_1$ by picking an IV such that $\text{IV} \oplus I_1 = P_1$. However, forging plaintexts of arbitrary length is more difficult. Nevertheless, you should be able to use your ability to recover intermediate state as a building block. ### Background: the grading server After some digging, you've learned a bit about how the Blue University grading server works and how to connect to it[^1]. Here's a summary of what you know: - Commands sent by the client to the server must be encrypted using CBC mode. Encrypted commands are sent as hex-encoded strings and are formatted by combining a 16-byte IV with a ciphertext of some non-zero multiple of 16 blocks, followed by a newline character (`\n`), like this: <figure> <pre><code> [16 byte IV][Ciphertext (some multiple of 16 bytes)]\n</code></pre> </figure> - If the padding of the ciphertext is incorrect, the server will respond with the message "`incorrect padding`". This is the critical information leak! - You don't know all of the available commands the server supports (ie, what all of the valid plaintext values should be), but you know that one command is `help`. - When the server responds to a command, its response message is **not** encrypted[^2]. This is true for both errors and successful responses. ### Assignment In the `padding` directory of your repository, we have provided a binary called `server` that simulates a grading server that follows this protocol and has a padding information leak. You can run the server binary as follows: ``` ./server [--debug] <address:port> ``` where `–debug` flag can print some useful information about the server's operation. The argument `<address:port>` specifies an address and port number for how the server should "listen" for new connections, which is how we will connect to the server with our attack program---we'll discuss what this means later in the course, for now, it's sufficient to run the server like this: ``` ./server localhost:9999 ``` this sets up the server to wait for connections on your local system on port 9999. We'll use this port to connect to the server with our attack program. :::warning **Apple silicon users**: Please use the version of the `server` binary located in the `arm64` directory instead. ::: #### Initial setup Before starting your attack, you should start the `server` in a container terminal as above. The server will run until you exit with Ctrl+C, waiting for you to connect to it and send commands. When you perform your attack, the server must be running. This means you'll need to have at least two terminals open inside the container---one for the server, and one for the attack program. **Task**: **Your goal is to get the server to reveal the grades of the student whose ID is 12345.** You should start by sending the "`help`" command, which should give you more information about how to find the student's grades. #### How to get started We have provided stencil code in both Go and Python, located in the `padding/stencil` directory of your repository. For a guide on how to work with the stencils, see [this section](#Getting-started), as well as the `STENCIL.md` file for instructions related to the stencil you are using. In all implementations, the stencil code implements command-line argument validation and sets up the necessary networking code to read and write arbitrary strings to/from `padding` server, which you can leverage to create encrypted messages to perform your attack. You can modify the provided networking code if necessary, but it shouldn't be necessary. The main executable for all stencils is called `sol`, which is run as follows: ```` ./sol <server address> <server port> "<command>" ```` For example, to send the command "`help`" to the server using the address and port shown above, you would run: ```` ./sol localhost 9999 "help" ```` #### Deliverables Your code and `README` should be located in the `padding` directory of your repository. When we run your code, we will look for an executable file called `sol`, which we will run using a pairs file as specified above. For more details, see [this section](#How-everything-works), as well as your `STENCIL.md` file. After your program performs the attack, you should print the response from the server to `stdout`--the response should be the last line(s) of your program's output. Your program is worth 50% of the credit for this problem, and the `README` is worth 50%. <details><summary>Your readme should describe the following</summary> - (40%) Explain in detail what your attack does and why it works. Your explanation should be detailed enough to convince someone who has only read the specification of the protocol that your attack works. - (10%) Imagine that you've intercepted an (IV, ciphertext) pair sent from a legitimate client to the server, and you'd like to decrypt it. Describe in detail how you could modify your attack to allow you to decrypt the ciphertext. - If you have any additional instructions or notes about your implementation or how to run it, or any questions/feedback for us, please include it as well. </details> #### Hints and technical Details - Sending the `help` command should only require a single ciphertext block, so this should allow you to test your solution to recovering a single ciphertext block's intermediate state even if you haven't yet figured out how to forge multi-block messages. - After sending a message to the server, you may need to pause/sleep briefly (~0.001 seconds) before attempting to read the server's response to avoid any network inconsistencies [^1]: When we talk about networking later in the course, we'll learn how you could do this "digging" process! [^2]: You find this very strange, but since it makes your job of attacking the server much easier, you try not to worry about it too much. :upside_down_face: # Extra resources <!-- ## Working with hex strings and bytes In this project, you'll be working with arbitrary sequences of bytes, which are the typical inputs and outputs of all modern cryptographic libraries. In Python, sequences of bytes are represented by the `bytes()` object. You can convert For example, the plaintext "Hello!" can be encoded as a hex string like this: ```python "Hello!".encode("utf-8").hex() ``` Since arbitrary bytes are challenging to type and print, we typically write out byte sequences for plaintexts, ciphertexts, and keys encoded as strings of hex characters (usually called "hex strings") to represent the bytes. In Python, the standard type --> ## Version history Updates to this assignment after release will be posted here.