IND-CPA Security & ECB Mode

CMSC 134 (Intro to Computer Security): Writeup #1 --- **What is IND-CPA Security?** IND-CPA security stands for "**Indistinguishability of ciphertexts under chosen-plaintext attack**". An encryption scheme is said to be IND-CPA secure if an attacker cannot distinguish between two ciphertexts encrypted under the same key. In simpler terms, we can break it down into two main components, Indistinguishability and CPA (Chosen-plaintext attack). 1. **Indistinguishability** - According to its Cambridge Dictionary definition, this term refers to the characteristic where it is impossible to judge something as being different when compared to another similar thing. 2. **Chosen-Plaintext Attack** - This is an attack in which we assume that the attacker can choose any arbitrary plaintext to be encrypted and the attacker then receives its ciphertext. In this type of attack, the attacker tries to acquire the secret key or alternatively creates an algorithm that decrypts ciphertext that was encrypted using this secret key, all without knowing the secret key itself. Although the success rate is low, the attacker would have the freedom to obtain more information about the secret key as well as the system being attacked, due to the ability to choose any plaintext to be encrypted by the cipher. This allows the attacker to understand and learn more about the system's behaviour as well as check for patterns in its output ciphertext, based on the given input data. If we define it in the context of a game, we have an attacker attempting to demonstrate that they can break the encryption scheme with a non-negligible advantage, all while adhering to the rules. The rules of the IND-CPA game are the following: For this game, we have Mynar (Attacker) and Oliver (Encryptor). 1. Mynar chooses two different messages, $M_0$ and $M_1$. 2. Oliver chooses a message randomly between $M_0$ and $M_1$, encrypts it, and then sends it to Mynar. 3. Mynar can ask Oliver for the encryption of any arbitrary plaintext message. 4. Mynar must then guess if whether the message he received from Step 2 belongs to either message $M_0$ or $M_1$. In this game, Mynar is crowned the winner if the advantage is considered non-negligible. This means that the attackers ability to distinguish between the two ciphertexts is significant enough to be practically exploitable which signifies a weakness in the encryption scheme. **When can we know that a system is secure in terms of indistinguishability?** >A system is considered secure if there are no adversary that can identify the message choice or win the game with significantly greater probability than an adversary who is simply guessing randomly. In the case of a two-element message, the adversary must not have a probability of identifying the message greater than $1/2$ or 50%. | IND-CPA Secure Ciphers | !IND-CPA Secure Ciphers | | -------- | -------- | | Counter (CTR) | Electronic Code Book (ECB) | | Cipher Block Chaining (CBC) One thing to note is that an IND-CPA secure algorithm **cannot be deterministic**. This means that given an encryption *$Enc(k,m)$* where ***$k$*** is the key and ***$m$*** is the message, calling *$Enc(k,m)$* twice with the same ***$k$*** and the same ***$m$*** must lead to different outputs. If not, then the scheme is not IND-CPA secure. --- **What is Electronic Code Book (ECB)?** Encryption algorithms are divided into categories depending on the input type, and those two categories are block ciphers and stream ciphers. Block cipher is a cryptographic algorithm that takes a plaintext block of a fixed size and key as input, and then outputs a ciphertext of the same size. Within block ciphers, there are two primary modes of operation which are Electronic Codebook (ECB) and Cipher Block Chaining (CBC). Among these two primary operations, Electronic Codebook is the easiest due to its direct encryption where each block of input plaintext is directlly encrypted into blocks of output ciphertext. If the message is larger than the predetermined bit size, the message can then be broken down into a bunch of blocks wherein each block is encrypted independently using the same key and then concatenated at the end to receive the final ciphertext output. An example of its operation is shown below. ![ECB Diagram](https://hackmd.io/_uploads/SJGF_uYTT.png) However, using this operation comes with a certain risk due to identical plaintext blocks resulting in identical ciphertext blocks. This happens because the encryption algorithm behaves **deterministically**. When we say an encryption algorithm behaves deterministically, this means that its output is completely determined by its input. For a particular input, the encryption would always give the same output. A figure for better understanding is shown below. Given a plaintext '**atla**', we have: | Character | ASCII Code | Binary | | :--------: | :--------: | :--------: | | a | 97 | 01100001 | | t | 116 | 01110100 | | l | 108 | 01101100 | | a | 97 | 01100001 | ![Initial Message (3)](https://hackmd.io/_uploads/S1GwnFK66.png) If you look at the figure shown above, the resulting ciphertext for the plaintext '**01100001**' which represents '**a**' in the first block and the last block are the same. This is a risk to the system as it can reveal patterns that attackers can exploit. --- **The Problem with Electronic Code Book (ECB)** ECB or Electronic Code Book is NOT IND-CPA Secure for the simple reason that it reveals a pattern in your data. In fact, most deterministic encryption algorithms are not IND-CPA secure due to its characteristic of mapping the same initial message to the same ciphertext. In ECB, each block of plaintext will always result to a defined corresponding ciphertext, while the ciphertext will also always result to a defined corresponding plaintext. In the IND-CPA game, the *attacker* can always ask for the encryption of *any* message. This means that the *attacker* can just simply ask for the encryption of the *attacker*'s chosen messages for the game. The *attacker* can then compare the choices of the IND-CPA game with the encryption of the message that the *attacker* asked to be encrypted. This way, the attacker will *always* win simply due to the fact that same messages output the same ciphertexts after encryption. What can we get from this information? Well, since the outputs in ECB are deterministic, patterns may start to show. Since it uses a simple substitution instead of chaining, two identical plaintext blocks will result in two correspondingly identical ciphertext blocks as well. Say, for example, we have the plaintext "*hehe*" and each character is given one block and encrypted. The encryption of the first '*h*' character is the same as the encryption of the second '*h*' character. The same is true for the '*e*' characters. This makes it easier to spot patterns which may partly (or completely) reveal the message. Let's have another example. Suppose we have the plaintext "*text*" wherein each character is divided into blocks and goes through a cipher function that just simply flips the value (e.g. the value 1100 will result to 0011). First, let's convert each character into its binary equivalent based on its ASCII code. We will have the following: <div align="center"> | Character | ASCII Code | Binary | | :--------: | :--------: | :--------: | | t | 116 | 01110100 | | e | 101 | 01100101 | | x | 120 | 01111000 | | t | 116 | 01110100 | </div> Now, each character has 8 bits. Assuming each block can only accept a maximum of 8 bits, then each block can accept one character from the plaintext. Let's have the diagram below to illustrate it better: ![Plaintext to Ciphertext](https://hackmd.io/_uploads/B1JIYOKTp.png) Since we assumed that each block can accomodate an 8-bit input and each character has an 8-bit binary equivalent, then each block contains one character. Each binary value will then go through a cipher function together with a key. Since we also assumed that the cipher function here will just flip the binary value, then the output will just be the reverse of the binary value of the plaintext. As seen on the diagram, the plaintext '*t*' has a binary equivalent of 01110100 which became the input for the first cipher block. The output of such cipher block is the reverse, which is 00101110. However, we can observe that the last cipher block also yielded the same output as the first cipher block: 00101110. Since ECB is deterministic, this can only mean one thing--this cipher block has the same input as the first cipher block. Oh, no! Due to the deterministic property of ECB, a pattern of the plaintext has been leaked. This will make the message less confidential since we are now aware that the first and last character of the plaintext are just the same characters. This, in a way, makes it easier to get to the plaintext without even originally knowing what the plaintext was. Supposedly, if we assume that certain information about the plaintext is already known (such as having only 4 letters of the alphabet), then there will still be $26^4$ or 456,976 possible combinations--which is fairly hard to guess for such a short plaintext. However, now that we know that the first and last characters are the same letters and that the 2nd and 3rd characters are different letters from each other and is not the same with the first and last character, we are now only left with $26*25*24$ or 15,600 combinations. Based on our example, originally, there were more than 450,000 possible letter combinations--which is hard to guess--but simply due to the fact that the pattern of the message was leaked, the possible letter combinations went down to a mere 15,600. The pattern leak made it significantly easier to obtain the plaintext based on the ciphertext. Such is the problem of ECB and all of the deterministic block cipher operations. Being deterministic prevents them from being IND-CPA secure since same inputs produce the same outputs which may compromise the confidentiality of the message by possibly leaking the pattern of the plaintext through the ciphertext. --- **Alternative Block Cipher Operations** Since we have already seen that Electronic Code Books are not IND-CPA secure, it is probably not a good idea to continue using them. There are alternative Block Cipher operations that are IND-CPA secure, which includes: 1. **CBC (Cipher Block Chaining)** One of the Block Cipher opetations that adheres to IND-CPA security is the Cipher Block Chaining (CBC).   ![Initial Message (3)](https://hackmd.io/_uploads/Sy8UfatpT.png)   Notice in the implementation above where two plaintext blocks are identical. They both undergo the same exact process but with a slight and critical difference, they are XORed with different values. This makes any two identical plaintext blocks produce unique ciphertext blocks which makes this system **IND-CPA secure**.   Essentially, to generate a cipher block text **C~n~**, the process requires an initial vector (IV) and key with block cipher encryption. Note that the key and block cipher encryption is the same for all cipher blocks. With this alone, it can make this encryption algorithm deterministic.   However, CBC utilizes an XOR operation with a certain value before any plaintext M~n~ is subjected to the main block cipher encryption. Furthermore, this value is not the same across all plaintext blocks which makes the whole encryption algorithm non-deterministic (*or does it?*). CBC executes this mechanism by not reusing the XOR value for C~1~ to the succeeding cipher blocks. Instead, CBC uses the preceeding cipher blocks as the value for the XOR operation in generating the succeeding cipher blocks.   With this, no two identical plaintext blocks can produce the same ciphertext block (*really?*) since although they use the same key and block cipher encryption, they do not utilize the same value for the XOR operation which completely changes the resulting ciphertext block further down the line.   However, it can be noticed that although the XOR values are generated randomly, there is still a tiny possibility where two identical plaintext blocks will be encrypted by, (with a very bad luck) identical XOR values, generated completely randomly. It is already obvious at this point that this is highly unlikely, almost a cosmic impossibility, and if it ever happens, the infliction will be minimal, but it is fun to point that out! 2. **CRT (Counter)** Similar to the concept of CBC, the CTR also utilizes functions such as IV, keys and block cipher encryption, and XOR operation. While CTR has some differences with CBC, it also offers IND-CPA security. The main feature that provide IND-CPA security to this encryption algorithm is the **ctr** value. The **ctr** value is generated incrementally meaning there is *absolutely* no way two identical plaintext blocks can generate identical ciphertext blocks. This is because the increment generation provides an injective function where each increment can only generate a unique ciphertext block.   It can be noticed that this feature solves the infinitesimally small and "*... a billion-to-one cosmic fluke.*" possibility to break CBC's indistinguishability. While this solves CBC's tiny vulnerability, it also opens up new ones which is some form of trade-off.   ![Initial Message (2)](https://hackmd.io/_uploads/ryGMfpYp6.png)   Firstly, the primary key in ensuring both CBC's and CTR's IND-CPA security is the **XOR value** for CBC and the **ctr** for CTR. While both of these alternative block cipher operations adhere to IND-CPA security, CTR is arguably less IND-CPA secure than CBC even though it solves CBC's tiny vulnerability. This is because CTR's ctr is astronomically more predictable (due to it being generated via incrementation pattern) than CBC's XOR value which are generated without a pattern. Rather than being generated in a completely random manner like in CBC, CTR's ctr is generated through a pattern which becomes a vulnerability.   While this is the case, CTR still offers indistinguishability but is arguably easier to break in comparison to CBC. **References:** Cambridge University Press. (n.d.). Indistinguishable. In Cambridge English Dictionary. https://dictionary.cambridge.org/dictionary/english/indistinguishable Crypto-IT. (2020). Chosen Plaintext Attack. Crypto-IT. https://www.crypto-it.net/eng/attacks/chosen-plaintext.html CrySyS. (2021). Deterministic, Stateless Encryption Is Not IND-CPA Secure. CrySyS Blog. https://crysec.dev/2021/02/17/deterministic-stateless-encryption-is-not-ind-cpa-secure.html GeeksforGeeks. (2023). Block Cipher Modes of Operation. GeeksforGeeks. https://www.geeksforgeeks.org/block-cipher-modes-of-operation/ GeeksforGeeks. (2023). Difference between Deterministic and Non-Deterministic Algorithms. GeeksforGeeks. https://www.geeksforgeeks.org/difference-between-deterministic-and-non-deterministic-algorithms/ George Mason University. (n.d.). Lecture 2 ISA 562: Information Security, Theory and Practice. Retrieved from https://cs.gmu.edu/~gordon/teaching/isa562/notes/lecture_2.pdf Heninger, N. (n.d.). Hash Functions [Slides]. CSE 107A course - UCSD. https://cseweb.ucsd.edu/classes/wi24/cse107-a/slides/6-hash.pdf Open Oregon State. (n.d.). Chapter 8: Security Against Chosen Plaintext Attacks. Open Oregon State. https://open.oregonstate.education/cryptographyOEfirst/chapter/chapter-8-security-against-chosen-plaintext-attacks Wikipedia. (2022). Ciphertext indistinguishability. In Wikipedia. https://en.wikipedia.org/wiki/Ciphertext_indistinguishability