Cryptography Intro notes

# Cryptography Intro notes jotting down the concepts, definitions, and mental models from chapters 1 and 2 of *Serious Cryptography, 2nd Edition by Jean-Philippe Aumasson* and *Real-World Cryptography by David Wong*. goal is to capture the core ideas as i read ... extracting the principles that matter: how encryption actually works, why classical ciphers fail, what perfect secrecy means, how security models are defined, why randomness is critical, and how primitives like hash functions fit into real systems. this also serves as a quick reference later when revisiting these topics or connecting them to things like zk systems, or real cryptographic implementations:) --- # Encryption: The Fundamentals Encryption is the principal application of cryptography — it makes data incomprehensible to ensure **confidentiality**. An algorithm called a **cipher** uses a secret **key** to transform readable data (**plaintext**) into unreadable data (**ciphertext**). Without the key, you can't decrypt and you can't learn anything about the encrypted message. The core functions of any cipher are: Plaintext P → E(K, P) = C → Ciphertext C → D(K, C) = P → Plaintext P Ciphertexts can be the same size as or slightly longer than plaintexts, but **never shorter**. ## Symmetric vs. Asymmetric ### Symmetric Encryption Same key encrypts and decrypts. Both parties share a single secret. Fast, used for bulk data. Example: AES. ### Asymmetric Encryption Two different keys — a public key encrypts, a private key decrypts. The public key can be shared openly. Slower, solves key distribution. Example: RSA. Asymmetric crypto also gives us **key exchanges** (Diffie-Hellman: two parties derive a shared secret without ever transmitting it) and **digital signatures** (RSA, EdDSA: prove authorship of a message using a private key; anyone with the public key can verify). ### Kerckhoffs's Principle (1883) The security of a cipher must rely **only on the secrecy of the key**, not on the secrecy of the algorithm itself. Modern ciphers are publicly specified standards (like AES). Only keys are secret. Security through obscurity does not work. Cryptographic algorithms are called **primitives** — the smallest useful building blocks. Primitives are combined to build **protocols** (step-by-step procedures to achieve something like secure communication). Two fundamental goals of most cryptographic protocols: - **confidentiality** (hiding data from wrong eyes) - **authentication** (verifying who you're talking to and that data hasn't been tampered with) --- # Classical Ciphers & Why They Fail ## The Caesar Cipher Shifts each letter by 3 positions. ZOO → CRR No secret key is involved (shift is always 3). Trivially broken — just shift back. Even a version with a variable shift has only 25 possible keys to try. In 2006, Italian police broke a mafia boss's Caesar-variant encryption. ## The Vigenère Cipher Uses a keyword as a series of shift values. Key `DUH` = shifts of 3, 20, 7 applied cyclically. More secure than Caesar, but breakable via two steps: 1. **Find the key length** Repeated ciphertext patterns reveal the key length. 2. **Frequency analysis** In English, E is the most common letter. For each key position, find the most frequent ciphertext letter and deduce the shift. Frequency analysis requires a few sentences of ciphertext. Short messages might be safe enough for historical use. Kerckhoffs estimated most wartime messages needed confidentiality for only **3–4 hours**. ## Anatomy of a Cipher: Two Components ### The Permutation A function that transforms each item (letter, bit-group) such that each item has a unique inverse. Requirements: - determined by the key - different for different keys - random-looking ### The Mode of Operation An algorithm that uses permutations to process messages of arbitrary size. Must avoid producing duplicate ciphertext letters for duplicate plaintext letters. ## Why Classical Ciphers Are Doomed For the 26-letter alphabet, there are 26! ≈ 2^88 possible permutations. This is roughly the number of atoms in the human body. A secure permutation should be randomly chosen from this huge set. But classical ciphers are limited to operations doable by hand with short keys. Modern ciphers instead perform **hundreds of bit operations per letter using 128- or 256-bit keys**. --- # The One-Time Pad: Perfect Secrecy The most secure cipher — guarantees **perfect secrecy**. Even with unlimited computing power, an attacker learns nothing about the plaintext except its length. ### One-Time Pad Formula Encrypt C = P ⊕ K Decrypt P = C ⊕ K Where ⊕ is XOR 0 ⊕ 0 = 0 0 ⊕ 1 = 1 1 ⊕ 0 = 1 1 ⊕ 1 = 0 K must be: - truly random - same length as P - used **only once** Example P = 01101101 K = 10110100 C = 11011001 Decrypt: C ⊕ K = P ## Why It's Secure If K is random then C = P ⊕ K looks just as random as K itself. An attacker sees pure randomness. Claude Shannon proved the key must be **at least as long as the message**. If the key is shorter, attackers can brute force possible plaintexts. ## The Fatal Flaw: Key Reuse If the same key encrypts two messages: C1 ⊕ C2 = (P1 ⊕ K) ⊕ (P2 ⊕ K) Which simplifies to P1 ⊕ P2 An attacker learns the XOR difference of both plaintexts. If either message is known, the other is fully recovered. ## Practical Limitation OTP is impractical. Encrypting a 1TB disk requires a **1TB key**. Used historically by: - British SOE - Soviet spies - NSA But not viable for normal systems. ## Probability in Cryptography For an n-bit key: Probability of guessing correctly: 1 / 2^n For n = 128 this probability is essentially zero. --- # Encryption Security: Models, Goals & Notions A cipher is secure if observing plaintext-ciphertext pairs reveals nothing about other messages. To define this precisely we use: - **attack models** - **security goals** ## Attack Models ### Ciphertext-Only Attack (COA) Attacker only sees ciphertexts. ### Known-Plaintext Attack (KPA) Attacker knows some plaintext-ciphertext pairs. ### Chosen-Plaintext Attack (CPA) Attacker can choose plaintexts and obtain ciphertexts. ### Chosen-Ciphertext Attack (CCA) Attacker can decrypt chosen ciphertexts. Most powerful model. ## Gray-Box Models Attacker observes the **implementation**. Examples: - timing attacks - power analysis - electromagnetic leaks - fault injection ## Security Goals ### Indistinguishability (IND) Ciphertexts should look like random strings. Attacker cannot determine which plaintext was encrypted. ### Non-Malleability (NM) Attacker cannot transform a ciphertext into another meaningful ciphertext. Note: OTP is malleable. ## Security Notions Security notion = goal + model Examples: IND-CPA IND-CCA NM-CPA NM-CCA Relationships: - IND-CCA ⇒ IND-CPA - NM-CPA ⇒ IND-CPA - IND-CPA does not imply NM-CPA - IND-CCA ≡ NM-CCA ## Semantically Secure Construction Example construction: C = (DRBG(K || R) ⊕ P, R) Where R is random. ## Two Encryption Use Cases ### In-Transit Encryption Protects data sent between machines. Example: HTTPS. ### At-Rest Encryption Protects stored data. Example: disk encryption. --- # Beyond Basic Encryption ## Authenticated Encryption (AE / AEAD) Produces: - ciphertext C - authentication tag T The tag ensures integrity. AEAD additionally authenticates **associated data**. ### Replay Attacks AE does not prevent replay attacks. Solution: Use message counters. ## Other Encryption Types ### Format-Preserving Encryption Ciphertext keeps the same format. Example: credit card → encrypted credit card. ### Fully Homomorphic Encryption Allows computation on encrypted data. Still extremely slow. ### Searchable Encryption Allows searching encrypted databases. Mostly experimental. ### Tweakable Encryption Adds a tweak parameter. Used in disk encryption. ## How Things Go Wrong Weak cipher example: A5/1 used in 2G mobile networks. Broken via time-memory tradeoffs. Another example: Padding oracle attacks. Servers leak padding errors allowing attackers to decrypt ciphertexts. --- # Randomness: The Foundation of Everything Without randomness cryptography becomes predictable. Randomness is used for: - key generation - encryption schemes - attacks ## Random vs Random-Looking Randomness refers to the **process**, not the output string. Two mistakes: ### Mistaking Non-Random for Random Believing something is random because it looks random. ### Mistaking Random for Non-Random Believing random patterns must have meaning. ## Probability Distributions A randomized process is described by a **probability distribution**. Uniform distribution: All outcomes equally likely. Example: each 128-bit key has probability 1 / 2^128. ## Entropy Entropy measures uncertainty. Shannon entropy: H = −Σ pᵢ log₂(pᵢ) Examples: Fair coin → 1 bit entropy Biased coin → less than 1 bit entropy. Entropy is maximized when distribution is uniform. --- # Random & Pseudorandom Number Generators ## RNGs True random number generators harvest entropy from the physical world: - temperature - noise - keyboard activity - disk interrupts Quantum RNGs use radioactive decay or photon polarization. ## PRNGs Pseudorandom generators expand small randomness into long sequences. Pipeline: RNG → Entropy Pool → DRBG → Pseudorandom Bits ### Core Operations init() refresh(R) next(N) ### Security Requirements #### Backtracking Resistance Past outputs cannot be recovered. #### Prediction Resistance Future outputs cannot be predicted. ## Fortuna Designed by Ferguson and Schneier. Uses: - 32 entropy pools - AES encryption - counter mode generation ## Crypto vs Non-Crypto PRNGs Never use non-crypto PRNGs for security. Examples of insecure ones: - C rand() - PHP mt_rand() - Python random - Java Random ### Mersenne Twister Problem State = 624 × 32-bit words. State relationships are **linear**. Linear equations are easily solvable. Cryptographic PRNGs must use nonlinear operations. --- # Real-World PRNGs ## Linux /dev/urandom Entropy sources: - keyboard - mouse - disk - network activity Since kernel 5.17: - switched to BLAKE2 - entropy capped at 256 bits - /dev/random and /dev/urandom behave similarly ## Windows CryptGenRandom BCryptGenRandom ## Intel Secure Key RDRAND instruction. Uses thermal noise. Concern exists about possible backdoors. Recommendation: mix with other entropy sources. ## Statistical Tests Test suites like: - Diehard - TestU01 - NIST SP 800-22 check distribution quality but **not unpredictability**. Therefore they do not guarantee cryptographic security. --- # Hash Functions A hash function maps arbitrary input to a fixed-size output. Example: SHA-256 Input → digest (256 bits) Properties: - deterministic - fixed output length - one-way ## Security Properties ### Pre-image Resistance Given hash h, cannot find x such that H(x)=h. ### Second Pre-image Resistance Given x, cannot find y ≠ x where H(y)=H(x). ### Collision Resistance Cannot find any two inputs with same hash. ## Birthday Bound For N-bit hashes: Collision expected after ~2^(N/2) attempts. Therefore: 256-bit hash → 128-bit collision security. ## Uses of Hash Functions - file integrity - commitment schemes - subresource integrity - BitTorrent chunk verification - Tor onion addresses Hash alone does not guarantee integrity. The digest must be delivered through a trusted channel. --- # SHA-2 and SHA-3 MD5 and SHA-1 are broken. Do not use them. CRC32 is not cryptographic. ## SHA-2 Uses Merkle-Damgård construction. Common variant: SHA-256. Steps: 1. pad input 2. split into blocks 3. apply compression function iteratively. ### Length Extension Attack Given H(secret || message) Attacker can compute H(secret || message || extra) without knowing the secret. ## SHA-3 Uses sponge construction. State size = 1600 bits. Phases: 1. absorbing 2. squeezing Properties: - resistant to length extension - approximates random oracle. ## SHAKE / cSHAKE Extendable output functions. Can generate arbitrary output length. cSHAKE adds **domain separation**. ## TupleHash Prevents ambiguous hashing. Always serialize inputs before hashing. ## Password Hashing Use slow, memory-hard algorithms. Recommended: Argon2. Others: bcrypt scrypt PBKDF2 Always use a **salt**. --- # Key Takeaways ## Serious Cryptography Chapter 1 - cipher = permutation + mode - classical ciphers insecure - OTP perfect but impractical - IND-CPA baseline security - encryption must be randomized - authenticated encryption provides integrity - side channels break systems ## Serious Cryptography Chapter 2 - entropy measures uncertainty - RNG collects entropy - PRNG expands it - backtracking and prediction resistance required - never use non-crypto PRNGs - /dev/urandom standard interface - statistical tests do not measure security ## Real-World Cryptography Chapter 1 - primitives are building blocks - symmetric vs asymmetric crypto - Diffie-Hellman for key exchange - RSA / EdDSA for signatures - Kerckhoffs principle - lifecycle: research → standardization → deployment ## Real-World Cryptography Chapter 2 - hash properties: preimage, second preimage, collision resistance - birthday bound halves security - SHA-2 widely used but length-extension vulnerable - SHA-3 recommended - SHAKE/cSHAKE extendable output - TupleHash avoids ambiguous hashing - Argon2 recommended for password hashing - hash alone does not guarantee integrity