--- title: CS2107 Notes tags: CS2107 breaks: false --- <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> <script type="text/x-mathjax-config"> MathJax.Hub.Config({ tex2jax: {inlineMath: [['$', '$']]}, messageStyle: "none" }); </script> # CS2107 Introduction to Information Security > [TOC] # Topic 0 Overview C-I-A triad |Confidentiality|Integrity|Availability| |---|---|---| |Anonymity, Privacy, Covert Channel|Non-Repudiation, Authenticity|People who need access to data can get it.| Other - Accountability - Traitor-Tracing - Plausible deniability # Topic 1 Encryption ## Attack Model |Attackere's Goal|Description| |---|---| |Total Break| The attacker wants to find the key.| |Partial Break| The attacker may want to decrypt a ciphertext but not interested in knowing the secret key, or the attacker may simply want to extract some information about the plaintext.| |Distinguishability|With some non-negligible probability more than $\frac{1}{2}$, the attacker can correctly distinguish the *ciphertexts* of a given *plaintext* from the *ciphertext* of another given *plaintext*.| :::info - If attacker can ahieve total break, the attacker also can achieve partial break and disringuishability. - We want to design a secure system that can prevent attacker from achieving the weakest goal (distinguishability). ::: |Attacker's Capability|Description| |---|---| |Ciphertext only attack|The attacker is given a collection of ciphertext. The attacker may know some properties of the plaintext. (The attacker can't choose the plaintext.)| |Known plaintext attack|The attacker is given a collection of plaintext and their corresponding ciphertext. (The attacker can't choose the plaintext.)| |Chosen plaintext attack (CPA)|The attacker has access to an **oracle**. The attacker can choose and feed any plaintext to the oracle and obtain the corresponding ciphertext (all encrypted with the same key). The attacker can access the oracle many times, as long as within the attacker's compute power. He can see the ciphertext and then choose the next input. We call this black-box an **encryption oracle**.| |Chosen ciphertext attack (CCA2)|Same as chosen plaintext attack, but here, the attacker chooses the ciphertext and the black-box outputs the plaintext. We call the black-box a **decryption oracle**.| :::info - There are practical scenarios where the attacker has access to a weaker form of decryption oracle. Example: [Padding Oracle](#Modern-Ciphers) - If a cipher can defend against decryption oracle, then the cipher can defend against all other weaker forms. ::: ## Classical Ciphers ### Substitution Cipher #### Encryption |Plaintext|Key|Ciphertext| |---|---|---| | $X=x_{1}x_{2}x_{3}\cdots x_{n}$ | $S$ (a substitution table, 1-1 onto function) | $E_{s}(X) = S(x_{1})S(x_{2})S(x_{3})\cdots S(x_{n})$ | #### Decryption: |Ciphertext|Key|Plaintext| |---|---|---| | $C=c_{1}c_{2}c_{3}\cdots c_{n}$ | $S$ | $D_{s}(C) = S^{-1}(c_{1})S^{-1}(c_{2})S^{-1}(c_{3})\cdots S^{-1}(c_{n})$ | #### Attack: - Exhaustive Search - Consider a substitution cipher with table size 27. - Then the size of key space is $27!$, which needs approximately $2^{94}$ loops. - Infeasible using current compute power. - Ciphertext only attack (đŸ’„) - Given the ciphertext and the attacker knows the plaintext is in English letter, the attacker can use exhaustive search. - But there are more efficient method: **frequency analysis** attack - The frequency of letters used in English is not uniform. - Given a sufficiently long ciphertext, the attacker may guess the plaintext by mapping frequent characters in the ciphertext to the frequent character in English. - Known plaintext attack (đŸ’„) - Given a plaintext and the corresponding ciphertext, the attacker can figure out the replacement of the characters. - With sufficiently long ciphertext, the full table can be found. - Example: X = hello_world, C = hnllqpoqilb, then we know |a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|\_| |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---| ||||b|n|||h||||l|||q|||i|||||o||||p| #### Remark - Performing substitution twice using two tables does not increase difficulty of attack. It simply **reduces to one table**. ### Permutation Cipher #### Encryption **Description:** The plaintext is grouped into blocks of $t$ characters, then apply a secret permutation to each block by shuffling the characters. **Key:** The key is the permutation $p$ and the block size $t$. We can write $p=(p_{1},p_{2},p_{3},\cdots,p_{t})$ which indicate the character at position $i$ is shifted to position $p_{i}$. #### Decryption #### Attack - Known plaintext attack (đŸ’„) - Given a plaintext and the corresponding ciphertext, the attacker may figure out the key. - Example: plaintext = aabbbbababaa, ciphertext = babaabbbabaa - ![image](https://hackmd.io/_uploads/BJkhv63sR.png) - So, the key will be $t = 4$, $p = (4,2,1,3)$ - Ciphertext only attack (đŸ’„) - > similar to Substitution Cipher #### Remark - Performing permutation twice using two permutations does not increase difficulty of attack. It simply **reduces to one permutation**. - By using both Substitution and Permutation Ciphers, the attacks become more difficult. ### One Time Pad #### Encryption |Plaintext|Key|Ciphertext| |---|---|---| |$x_{1},x_{2},\cdots ,x_{n}$|$k_{1},k_{2},\cdots ,k_{n}$|$(x_{1}\oplus k_{1}),(x_{2}\oplus k_{2}), \cdots, (x_{n}\oplus k_{n})$| #### Decryption |Ciphertext|Key|Plaintext| |---|---|---| |$c_{1},c_{2},\cdots ,c_{n}$|$k_{1},k_{2},\cdots ,k_{n}$|$(c_{1}\oplus k_{1}),(c_{2}\oplus k_{2}), \cdots, (c_{n}\oplus k_{n})$| #### Remark - The key cannot be re-used. That is, a key can only be used once. (For S and P, a same key is being used to encrypt multiple plaintexts.) - So, a 1 GB plaintext requires a 1 GB key to encrypt. - The long key renders one-time-pad **impractical**. - Attacker can derive the key if given the ciphertext and corresponding plaintext, but such key is useless, since it will not be used any more. - It is shown that one-time-pad **leaks no information** of the plaintext. (CS4236) - It is **unbreakable**. ## Modern Ciphers Designs of modern ciphers take into considerations of known-plaintext-attack, frequency analysis and other known attacks (except CC2). We can quantify the security of an encryption scheme by the length of the key. Example: - DES (Data Encryption Standard, 1977) - The key length is 56 bits. Hence, the exhaustive search needs to loop for $2^{56}$ times in the worst case. - Too short. - RC4 (Rivest's Cipher 4, 1987) - Broken in some adoptions - AES (Advanced Encryption Standard, 2001) - The block length is 128 bits. The key length can be 128, 192 or 256 bits. - Currently, no known attacks on AES. ### Symmetric vs Asymmetric Key Cryptography - Symmetric key cryptography - ![symmetric-encryption](https://hackmd.io/_uploads/rkj182W1kx.svg) - uses the same key for both encryption and decryption - used often in bulk encryption - need a secure channel to establish the secret key for any two entities - implemented as either [block ciphers](#Block-Cipher) or [stream ciphers](#Stream-Cipher) - Example: DES, RC4, AES - Asymmetric key cryptography - a.k.a Public key cryptography (PKC) - uses two different key for encryption and decryption - used often in [data authentication](#Topic-3-Autheticity-Data-Origin) - only need a secure broadcast channel to distribute the public key - Example: RSA, ElGamal, Paillier, Post-Quantum cryptography - More details: [Public Key Cryptography](#Public-Key-Cryptography) ### Block Cipher - A **block cipher** has fixed size input and output. - DES and AES are block cipher. - AES is designed for 128 bits (16 bytes) input and output. - For large plaintext, it will be divided into bloacks, then apply block cipher. - The method of extending encryption from a single block to multiple blocks is called **mode-of-operation**. - Mode of operation in this note: [ECB](#ECB-Electronic-Code-Book), [CBC](#CBC-Cipher-Block-Chaining), [CTR](#CTR), [GCM](#GCM) #### ECB (Electronic Code Book) - Plaintext is divided into blocks. - Block cipher is applied to each block, using same key. - ![image](https://hackmd.io/_uploads/Bk4jCxRi0.png) - But ECB leaks information. - Since blocks are encrypted using the same key, same blocks will be encrypted into same ciphertext. - ![image](https://hackmd.io/_uploads/B1z61bRiC.png) #### CBC (Cipher Block Chaining) - Plaintext is divided into blocks. - The first block will XOR with the IV. - Next block will XOR with the previous encrypted block. - Encryption: ![image](https://hackmd.io/_uploads/SJzubWRoR.png) - Initialization Vector (IV) is an arbitrary value chosen during encryption, and is different in different encryption. - IV will be included/sent with the ciphertext. - $y_{0}=\text{IV}$, $c_{1}=E_{k}(x_{1}\oplus y_{0})$, $c_{i}=E_{k}(x_{i}\oplus c_{i-1})$, $i > 1$ - CBC also leaks information, similar reason with ECB. - Decryption: ![photo_2024-09-04_23-54-14](https://hackmd.io/_uploads/S1gRf-8hC.jpg) #### CTR (Counter Mode) - Similar to CBC. - But, now IV is incrementing. - Encryption: ![image](https://hackmd.io/_uploads/H1RedZCiC.png) #### GCM - Details omitted in this course. - An "Authenticated-Encryption". - Ciphertext consists of an extra tag for authentication. - It is secure in the presence of decryption oracle. ### Stream Cipher ![image](https://hackmd.io/_uploads/BJbSrcxT0.png) - Encryption: 1. Given plaintext, $m$ and secret key, randomly generate IV. 2. From secret key and IV, generate a long pseudorandom sequence, $K$. 3. Ciphertext will made up of IV and $(K\oplus m)$. - Decryption: 1. Given the key and the ciphertext with the IV. 2. Extracts the IV from the ciphertext. 3. From the key and IV, generates the long sequence. 4. Performs xor to get the plaintext. ## Examples of Attacks ### Triple DES & Meet-in-the-middle #### 2DES - If we want to improve DES, we can encrypt the plaintext twice or more, using different keys. - Let's consider this double encryption under known plaintext attack. Then the attacker has a plaintext and the corresponding ciphertext, and wants to find the two secret keys. - Using exhaustive search, $2^{56+56}$ loops required, so the key-strength seems to increased to 112. - But it can be less with meet-in-the-middle attack. ```graphviz digraph G { rankdir=LR; // labelloc=t; label="Encryption"; node [shape=box, style=rounded, width=1.0, height=0.5]; C [label="c", shape=plaintext]; P [label="m", shape=plaintext]; D1 [label="DES"]; D2 [label="DES"]; P -> D1; D1 -> D2 [label="X"]; D2 -> C; K1 [label=<K<SUB>1</SUB>>, shape=plaintext]; K2 [label=<K<SUB>2</SUB>>, shape=plaintext]; K1 -> D1; K2 -> D2; { rank=same; K1; D1; } { rank=same; K2; D2; } } ``` ```graphviz digraph G { rankdir=LR; // labelloc=t; label="Decryption"; node [shape=box, style=rounded, width=1.0, height=0.5]; C [label="c", shape=plaintext]; P [label="m", shape=plaintext]; D1 [label="DES"]; D2 [label="DES"]; C -> D1; D1 -> D2 [label="X"]; D2 -> P; K1 [label=<K<SUB>1</SUB>>, shape=plaintext]; K2 [label=<K<SUB>2</SUB>>, shape=plaintext]; K2 -> D1; K1 -> D2; { rank=same; K1; D2; } { rank=same; K2; D1; } } ``` #### Meet-in-the-middle Attack - Known plaintext attack. - Assume attacker has a pair of $(m,c)$ plaintext and the corresponding ciphertext. Note: $c=\text{DES}_{k1}(\text{DES}_{k2}(m))$ - Process: - Take the plaintext $m$, encrypt it using all possible keys. - Let $C$ be the set of all possible ciphertexts. - Take the corresponding ciphertext $c$, decrypt it using all possible keys. - Let $M$ be the set of all possible plaintext. - Find common element in $C$ and $M$. - The key used to encrpyt $m$ that results in the common element $X$, will be $k_{1}$. - The key used to decrypt $c$ that results in the common element $X$, will be $k_{2}$. - For 2DES, the number of "guesses" are reduced to $2^{56} + 2^{56} = 2^{56+1}$. - In general, for $k$-bit keys, it reduces the number of crypto operations to $2^{k+1}$ using approx $2^{k+1}$ units of storage space. #### 3DES - a.k.a 3DES, TDES, TDEA, 2TDES, 3TDES - Since DES and 2DES are not secure enough, we use 3DES. - There are 2 versions: - a) $E_{k_{1}}(E_{k_{2}}(E_{k_{1}}(p)))$ - b) $E_{k_{1}}(D_{k_{2}}(E_{k_{1}}(p)))$ - Both versions are believed to have same level of security. - Version (b) can be more convenient. ### Padding Oracle Attack - The block size of AES is 128 bits (16 bytes). - Suppose the length og the plaintext is 25 bytes, it will be fitted into two blocks. - The remaining 7 bytes will be padded with some values. - There are many ways to pad, but the number of padded bits must be encoded together, so the receiver will know the length of the plaintext. #### PKCS#7 - PKCS#7 is a padding standard. - Suppose the block size is 8 bytes, and the last block has 5 bytes, which mean 3 extra bytes needed. ```graphviz digraph G { node [shape=plaintext]; a [label=< <TABLE BORDER="1" CELLBORDER="0" CELLSPACING="10"> <TR> <TD><FONT COLOR="black">DD DD DD DD DD DD DD DD</FONT></TD> </TR> </TABLE> >]; b [label=< <TABLE BORDER="1" CELLBORDER="0" CELLSPACING="10"> <TR> <TD><FONT COLOR="black">DD DD DD DD DD</FONT></TD> <TD><FONT COLOR="red">03 03 03</FONT></TD> </TR> </TABLE> >]; } ``` - If the last block has 8 bytes, an extra bloack is added where each byte is `08`. #### Padding Oracle Attack - Attacker has: ciphertext $(\text{iv, c})$, where $c$ is encrypted using key $k$. - Attacker's goal: obtain the plaintext. - A padding oracle will accept any cipher text, and output whether the plaintext is in correct padding format. ```graphviz digraph G { rankdir=LR; attacker [label="attacker"]; oracle [label="Padding\nOracle\nSecret key", shape=box]; attacker -> oracle [label="ciphertext"]; oracle -> attacker [label="YES / NO"]; } ``` - Assumption: - Attacker has access to the padding oracle. - Attacker knows how many bytes is padded. - AES CBC mode is not secure against padding oracle attack. - Process: - Assume the block size is 8 bytes, $c$ is 1 block. - ![image](https://hackmd.io/_uploads/B1GcPxLhA.png) - Attacker wants to find out $x_5$. 1. Calculate $\text{iv'}$ from $\text{iv}$ by xor with the last 4 bytes (see image), where $t$ is some value. - ![image](https://hackmd.io/_uploads/rkB4OlIhR.png) - Why choose $07$ for last 3 bits: - Originally, $(v_{8}\oplus d_{k}(c_{{8}})) = 03$ - Since we want $x_{5}$ and we know last 3 bits are $03$ - We want to somehow make it `x1 x2 x3 x4 04 04 04 04` - We notice $(07\oplus v_{8})\oplus d_{k}(c_{{8}}) = 07\oplus (v_{8}\oplus d_{k}(c_{{8}})) = 07\oplus 03 = 04$ - So choose $07$ 2. Feeds the padding oracle with $(\text{iv', c})$. 3. If output YES, then $(t\oplus x_{5})$ must be $04$, and we got the correct $t$. - $(t\oplus v_{5})\oplus d_{k}(c_{{5}}) = t\oplus (v_{5}\oplus d_{k}(c_{{5}})) = t\oplus x_{5}$ - Since we choose $07$, $(t\oplus x_{5})$ must be $04$ - Hence, $x_{5} = t\oplus 04$ 4. If output NO, repeat steps 1-4 with different $t$. #### Remark - We can easily extend the algorithm to find the full plaintext. - It is easy to determine the initial padding length. (see tutorial question) - CTR mode also vulnerable to padding oracle. - Schemes that are secure against decryption oracle are also known as IND-CCA2 secure. - GCM, or other “authenticated encryption”, is believed to be IND-CCA2 secure, and thus secure against padding oracle attack. ## Cryptography Pitfalls 1. Wrong choices of IV 2. Reusing one-time-pad 3. Predictable secret generation 4. Designing your own cipher 5. Reliance on Obscurity: Kerckhoffs’s Principle > A system should be secure even if everything about the system, except the secret key, is public knowledge. # Topic 2 Authentication Credential # Topic 3 Autheticity (Data Origin) ## Public Key Cryptography - Public key scheme (PKS) uses different keys for encryption and decryption. - ![asymmetric-encryption](https://hackmd.io/_uploads/ryuCwnWk1l.svg) - There is no need to clearly indicate which key is for encryption, which key is for decryption. - Both keys can be used to perform either encryption or decryption, but not both. - Public key is shared with everyone, private key is kept secret by user. - Public key can be used for encryption or decryption, depends on the context. (same goes for private key) - Without PKS, every individual would need to keep a symmetric secret key pair with everyone else. - Keys needed for symmetric-key setting: $\frac{n(n-1)}{2} = O(n^2)$ - Keys needed for asymmetric-key setting: $2n = O(n)$ - Example: RSA, ElGamal, Paillier ### Classroom RSA We call it “classroom” because the variant used in practice is different, with padding and special considerations in choosing the primes, etc. Also called “textbook” RSA. Setup: 1. Owner randomly chooses 2 large primes $p$, $q$ and computes $n = pq$ 2. Owner randomly chooses an encryption exponent $e$ s.t. $\text{gcd}(e, (p-1)(q-1)) = 1$ 3. Owner finds the decryption exponent $d$ where $$d\times e\text{ mod } (p-1)(q-1) = 1$$ 4. Owner publishes $(n, e)$ as public key, and safe-keeps $d$ as the private key Encryption: 1. Public key: $(n, e)$ 2. Given message $m$, ciphertext $c = m^e \text{ mod } n$ Decryption: 1. Private key: $d$, Decryption key: $(n, d)$ 2. Given ciphertext $c$, message $m = c^d \text{ mod } n$ ![rsa](https://hackmd.io/_uploads/HJehXvMyke.svg) Remarks: 1. Note that for any positive $m<n$, and any pair of public/private keys, $$\text{Decrypt(Encrypt(m)) = m}$$ $$(m^e)^d \text{ mod } n = m$$ 2. Can we compute $p,q,d,e$ and $\text{modulo}$ efficiently? - To find random prime $p,q$: - Randomly pick a number $p$ - If $p$ is a prime, output $p$ and halts. Otherwise, repeat step 1-2. - The probability that the randomly chosen $p$ is prime is non-negligible. This is by the well-known "Prime Number Theorem". Probability of a randomly chosen 1024-bit number being a prime is around ln(21024)~ (1/710). So likely takes about 710 trials to get a 1024-bit prime number. Fast but too slow for real-time application. - There is fast algorithm to determine whether a number is a prime. - Choose a small $e$ so that encryption can be done very fast. It turns out that the value of $e$ won’t affect the security. But it can’t be too small due to some attack. It is common to set $e = 65537$. - The value of $d$ can be efficiently computed from $e$ and $n$ using the extended Euclidean algorithm. - Using Chinese Reminder Theorem, one can speedup decryption about 2x faster. Public key doesn’t need to be large (but not too small). To achieve speedup in encryption, some scheme choose a fixed small public key such as $65537$, which is $2^{16} - 1$ 3. We can use the decryption key $d$ to encrypt, and encryption key $e$ to decrypt. In other words, we can swap the role of $d$ and $e$, so that anyone in the public can decypt, but only the owner can encrypt. This property is useful in desgining signature scheme. Issues: 1. Security: - Same as symmetric-key encryption, some forms of IV is required so that encryption of the same plaintext at different time would give different ciphertext. - Classroom RSA has an interesting “homomorphic” property. These properties are useful in applications, but they also lead to attacks. Such properties can be destroyed using padding. - The standard Public-Key Cryptography Standards (PKCS)#1, add “optimal padding”. [https://en.wikipedia.org/wiki/PKCS_1](https://en.wikipedia.org/wiki/PKCS_1) - RSA and "discrete log based" PKC will be broken with Quantum computer. 2. Performance: - RSA is signigicantly slower than AES. - A 128-bit AES has the same key strength as a 3072-bit RSA (NIST recommendation). - If we want to encrypt a large file $F$, it would be very slow to directly use RSA on $F$. Instead, it is typically carried out in this way: 1. Chooses a random AES 128-bit key $k$ 2. Encrypts $k$ using PKC to get $y$ 3. Encrypts $F$ using AES with $k$ as the key to get $C$. 4. The final ciphertext will be $(y,C)$ - ![rsa_big_file](https://hackmd.io/_uploads/HkpimnG1kl.svg) ## Cryptographic Hash A hash is a function that takes in an arbitrarily long message as input and outputs a fixed-size **digest**. ![hash.drawio](https://hackmd.io/_uploads/Hk8r4af1yl.svg) Security Requirements: 1. Preimage resistant - Given a digest $d$, it is difficult to find a $m$ s.t. $h(m) = d$ 2. Second-preimage resistant - Given $m_1$, it is difficult to find a second preimage $m_1 \ne m_2$ s.t. $h(m_1) = h(m_2)$ 3. Collision-resistant - It is difficult to find two different messages $m_1 \ne m_2$ that "hashes" into the same digest, i.e. $h(m_1) = h(m_2)$ Popular Hash: - Broken: SHA-0, SHA-1, MD5 - Partially broken: SHA-2 - No known attack: SHA-3 ### Keyed Hash A keyed-hash is a function that takes an arbitrarily long message and a secret key as input, and outputs a fixed-size **MAC**. ![mac](https://hackmd.io/_uploads/B1WZraGyJx.svg) Security Requirements: 1. Without knowing the key, it is difficult to forge the MAC Popular Keyed-Hash (MAC): - [CBC-MAC](#CBC-MAC) - [HMAC](#HMAC) #### CBC-MAC Based on AES operated under CBC mode. ![image](https://hackmd.io/_uploads/B1O5npMJkl.png) #### HMAC Baed on SHA. ![image](https://hackmd.io/_uploads/Bkot6TGyJg.png) > `||` means concatenation ### Application of Unkeyed Hash How do we know if a file we downloaded is authentic? **Unkeyed Hash for Integrity Protection** - Let $F$ be the original file. - We obtain the digest $h(F)$ from the secure channel. - We then obtain the file $F'$, whose origin claims that the file is $F$. - We can compute and compare the digests $h(F)$ and $h(F')$: - If $h(F) = h(F')$, then $F = F'$ (with high confidence) - If $h(F) \ne h(F')$, then the file integrity is compromised Assumptions: - There is a secure channel to send a short piece of information (digest). ![hash_integrity](https://hackmd.io/_uploads/BJMsi6GyJx.svg) ### Application of Keyed Hash In the previous example of using unkeyed hash, we assume that there is a secure channel to send the digest. What if we don't have a secure channel to deliver the digest? In such scenarios, we can protect the digest with the help of some secrets. - In the symmetric key setting, it is called the **mac**. - In the public key setting, it is called the [**digital signature**](#Signature). ![mac](https://hackmd.io/_uploads/Bk-4UCfkJg.svg) - Note that there is no issue on confidentiality. In fact, the data F can be sent in clear. - Typically, the mac is appended to F. Hence, mac is also called the authentication tag, or authentication code. #### Signature Asymmetric-key version of MAC. Here, the owner uses the private key to generate the signature. The public can use the public key to verify the signature. So, anyone can verify the authenticity of the data, but only the person who know the private key can generate the signature. ![signature](https://hackmd.io/_uploads/B1tIdRMyJx.svg) Procedure: 1. The original file $F$ is first hashed using a hash function. 2. The hash value is then passed through the signature function together with the private/signature key $k_{\text{pri}}$ to produce signature $s$. 3. $s$ is then appended to $F$ and sent to the recipient via an insecure channel. 4. The receipient then passes the received signature $s'$ and the public/verification key $k_{\text{pub}}$ into the verification function. 5. The verification function will produce a hash value. 6. The file $F'$ is then pass through a hash function to return a hash value. 7. The hash values are compared to see if the file has been modified, or if the digital signature is valid. ![mac_signature_hash](https://hackmd.io/_uploads/Byeu6CGkyx.svg) Notice that in this setting, the private key is used to encrypt, while the public key is used to decrypt. Signature scheme achieves **Non-repudiation.** Why hash the file before signing? - For greater efficiency. - Since hash value is a unique representation of the data, it is sufficient to sign the hash instead of original file. Popular Signature scheme: - RSA based: RSASSA-PSS, RSASSA-PKCS1 - Discrete log based: Digital Signature Algorithm (DSA) ## Example of Attacks ### Birthday Attack From Wikipedia: > The birthday paradox is a veridical paradox: it seems wrong at first glance but is, in fact, true. While it may seem surprising that only $23$ individuals are required to reach a $50\%$ probability of a shared birthday, this result is made more intuitive by considering that the birthday comparisons will be made between every possible pair of individuals. With $23$ individuals, there are ⁠$\frac{23 × 22}{2}⁠ = 253$ pairs to consider, far more than half the number of days in a year. This can be applied to the hash function. Suppose we have $M$ messages, and each message is tagged with a value randomly chosen from ${1,2,3,\dots,T}$. Then the probability that there is a pair of messages tagged with the same value is approximately: $$\boxed{\text{Prob(collision)} = 1-e^{(-\frac{M^2}{2T})}}$$ In particular, when $M \gt 1.17\sqrt{T}$, then $\text{Prob(collision)}\gt 1-e^{(\frac{1.17}{2})}\gt 0.5$. Suppose the hash gives 128-bit digest. Then $T=2^{128}$. By choosing $M=1.17(2^{64})$, with probability more than 0.5, birthday attack succeed. For the birthday attack variant, we are looking at calculating the probability of a random set of elements sharing at least one element with another set of distinct elements, with both sets being subsets of an even larger set. Below is an useful theorem: > Let $S$ be a set of $k$ distinct elements where each element is an $n$ bits binary string. Let us independently and randomly select $m$ $n$-bit binary strings and put them in the set $T$. The probability that $S$ has non-empty intersection with $T$ is more than $$\boxed{1-2.7^{-km2^{-n}}}$$ <table> <tr> <td> Straightforward method to find collision </td> <td> Birthday attack </td> </tr> <td> Suppose H() is a hash with k-bit digest. Find collision: 1. Randomly pick two messages m<sub>1</sub>, m<sub>2</sub>. 2. If H(m1) = H(m2), then output m<sub>1</sub>, m<sub>2</sub> and halt. 3. Repeat step 1 to 3. </td> <td> Suppose H() is a hash with k-bit digest. 1. Constructs a set S of M = ⎡1.17* 2<sup>k/2</sup>⎀ unique randomly chosen messages. 2. Compute the digest of each message m in S. 3. Check whether there are two messages in S having the same digest. If so, output m<sub>1</sub>, m<sub>2</sub>. Otherwise, output “Fail”. </td> </tr> <td> The expected number of rounds taken by the above algorithm is 2<sup>k</sup> and thus the expected number of hashes is more than 2<sup>k</sup>. If k = 128, as analyzed in tutorial one, this is computationally infeasible. </td> <td> When k = 128, one round would take ~2<sup>64</sup> hashes, significantly lower than before. And this has >50% chance will succeeds. </td> </tr> </table> ## Pitfalls **Using encryption for the purpose of authentication** Encryption is designed to provide confidentiality. It does not guarantee integrity and authenticity. For example, a company that communicates instructions via SMS does not provide integrity and authenticity if it simply encrypts its message. A sucure design could use MAC instead of encryption. Simply adding MAC to the instructions is not sufficiently secure. It is still vulnerable to "replay attack". For example, an adversary can intercept one of the messages, and simply repeatedly send the message to receiver, which result in some command or instruction being repeatedly executed. To prevent replay attack, a cryptographic "noice" is required, which is a random or preudo-random number issued that prevents old communication from being reused. # Topic 4 Part 1 Public Key infrastructure # Topic 4 Part 2 Channel Security # Topic 5 Network Security # Topic 6 Web Security # Topic 7 & 8 Secure Programming # Topic 9 Access Control ## Access Control Model 1. Access control setup security perimeter/boundary. Design of the boudary is guided by: - Principle of least privilege - access rights that are not required to complete the role will not be assigned - Compartmentalization - limits the access to information to persons or other entities on a need-to-know basis to perform certain tasks - Defense in depth / Swiss Cheese Model - Segregation of duties - breaks down processes so that no single person is responsible for every stage in a process 2. Separation between policy and mechanism: ```graphviz digraph G { rankdir=LR; // labelloc=t; node [shape=box] "Principle" -> "Do operation" -> "Reference monitor" -> "Object" } ``` - A *principal* (or *subject*) wants to access an *object* with some *operation*. The *reference monitor* either grants or denies the access. - Principals: humas users - Subjects: entity in the system the operate on behalf of the principals 3. Types of accesses: - Observe - reading a file - Alter: - writing a file - deleting a file - changing properties - Action: - executing a program 4. Access rights: - Discretionary access control - owner of the object decides the rights - Mandatory access control - system-wide policy decides the rights - strict rules that everyone must follow ## Specifying Access Rights > r: read, w: write, x: execute, s: execute as owner, o: owner #### Access Control Matrix Example: <table> <tr style="background: gray"> <td> </td> <td> my.c </td> <td> mysh.sh </td> <td> sudo </td> <td> a.txt </td> </tr> <tr> <td style="background: gray"> root </td> <td> {r,w} </td> <td> {r,x} </td> <td> {r,s,o} </td> <td> {r,w} </td> </tr> <tr> <td style="background: gray"> Alice </td> <td> {r,w} </td> <td> {r,x,o} </td> <td> {r,s} </td> <td> {r,w,o} </td> </tr> <tr> <td style="background: gray"> Bob </td> <td> {r,w,o} </td> <td> {} </td> <td> {r,s} </td> <td> {} </td> </tr> </table> - Pros: can specify the access right for all pairs of principals and objects - Cons: the table would be very large, and thus different to manage #### Access Control List The access control list stores the access rights to an object as a list. Example: <table> <tr> <td style="background: gray">my.c</td> <td style="border: none;border-collapse: collapse">-> (root, {r,w})</td> <td style="border: none;border-collapse: collapse">-> (Alice, {r,w})</td> <td style="border: none;border-collapse: collapse">-> (Bob, {r,w,o})</td> </tr> <tr> <td style="background: gray">mysh.sh</td> <td style="border: none;border-collapse: collapse">-> (root, {r,x})</td> <td style="border: none;border-collapse: collapse">-> (Alice, {r,x,o})</td> </tr> <tr> <td style="background: gray">sudo</td> <td style="border: none;border-collapse: collapse">-> (root, {r,s,o})</td> <td style="border: none;border-collapse: collapse">-> (Alice, {r,s})</td> <td style="border: none;border-collapse: collapse">-> (Bob, {r,s})</td> </tr> <tr> <td style="background: gray">a.txt</td> <td style="border: none;border-collapse: collapse">-> (root, {r,w})</td> <td style="border: none;border-collapse: collapse">-> (Alice, {r,w,o})</td> </tr> </table> - Pros: easy to find out all subjects that can access an object - Cons: difficult to find out all objects that a subject can access #### Capabilities A subject is given a list of capabilities, where each capability is the access rights to an object. Example: <table> <tr> <td style="background: gray">root</td> <td style="border: none;border-collapse: collapse">-> (my.c, {r,w})</td> <td style="border: none;border-collapse: collapse">-> (mysh.sh, {r,x})</td> <td style="border: none;border-collapse: collapse">-> (sudo, {r,s,o})</td> <td style="border: none;border-collapse: collapse">-> (a.txt, {r,w})</td> </tr> <tr> <td style="background: gray">Alice</td> <td style="border: none;border-collapse: collapse">-> (my.c, {r,w})</td> <td style="border: none;border-collapse: collapse">-> (mysh.sh, {r,x,o})</td> <td style="border: none;border-collapse: collapse">-> (sudo, {r,s})</td> <td style="border: none;border-collapse: collapse">-> (a.txt, {r,w,o})</td> </tr> <tr> <td style="background: gray">Bob</td> <td style="border: none;border-collapse: collapse">-> (my.c, {r,w,o})</td> <td style="border: none;border-collapse: collapse">-> (sudo, {r,s})</td> </tr> </table> - Pros: easy to find out all objects that a subject can access - Cons: difficult to find out all subjects that can assess an object #### Notes - UNIX file system employs ACL - The size of lists for ACL and Capabilities are still too large - Another approach is to group the subjects and objects and define access rights on the defined groups. So, we need **intermediate control** ## Intermediate Control It is not practical for an owner to specific each single entries in the access control matrix. So, we need some ways to simplify the representation. One method is to "group" the subjects/objects and define the access rights on the group. This is called **intermediate control**. Recalls that UNIX employs ACL, but instead of specifying all subjects, UNIX's ACL only specifies the rights for owner, group and world/others. #### Role-based access control The grouping can be determined by the "role" of the subject. A role associates with a collection of procedures. In order to carry out these procedures, access rights to certain objects are required. To design the access right of a role, we should follow the **least privilege principle**. #### Privileges The term privilege is sometime used to describe access right. Privilege can also be viewed as an intermediate control. It can be represented by a number, e.g. 1,2,3. if a subject can access an object, another subject with higher privilege can also access the object. #### Protection rings In OS, "privilege" is often called protection rings. They are the same but with different name. Each object (data) and subject (process) is assigned a number. Whether a subject can access an object is determined by their respective assigned number. Object with smaller number are more important. If a process is assigned a number $i$, we say that the process runs in ring $i$. We call processes with lower ring number as having "higher privilege". Unix has only 2 rings, *superuser* and *user*. ```mermaid flowchart LR subgraph &nbsp&nbsp l[Low privilege process] -->|R,W| lo(Low privilege Object) end subgraph &nbsp h[High privilege process] -->|R,W| ho(High privilege Object) end h -->|R,W| lo ``` Two well-known models: 1. **Bell-LaPadula**: Confidentiality ```mermaid flowchart LR subgraph &nbsp&nbsp l[Low privilege process] -->|R,W| lo(Low privilege Object) end subgraph &nbsp h[High privilege process] -->|R,W| ho(High privilege Object) end h -->|R| lo l -->|W| ho ``` - **no read up** - A subject does not have read access to object in higher level. This prevent a lower level from getting info in the higher level. - **no write down** - A subject does not have append-right to object in lower level. This prevents a malicious insider from passing information to lower levels. (e.g. a clerk working in the highly classified department is forbidden to gossip with other staff) 2. **Biba**: Integrity ```mermaid flowchart LR subgraph &nbsp&nbsp l[Low privilege process] -->|R,W| lo(Low privilege Object) end subgraph &nbsp h[High privilege process] -->|R,W| ho(High privilege Object) end h -->|W| lo l -->|R| ho ``` - **no write up** - A subject does not has “write” access to objects in higher level. This prevent a malicious subject from poisoning upper level data, and thus ensure that a process will not get compromised by lower level subjects. - **no read down** - A subject does not has read access to objects lower level. This prevents a subject from reading data poisoned by lower level subjects. # Tutorial Terminologies |Terminologies|Meaning| |---|---| |Cryptology|| |Cryptanalysis|| |Cryptography|| |NSA|| |NIST|| |cryptography backdoor|| |Key Escrow|| |Decryption order|| |Graphical Passwords|| |covert channel|| |end-to-end encryption|| |Single Sign-On|| |retinal vs iris scan|| |Forward Secrecy|| |Insider threat||