RSA - HackMD

# RSA Let's say, Sebastian, Daniel, and I are in a room together. I want to tell Sebastian a really funny joke about Daniel, but I don't want Daniel to find out that I'm making fun of him. Cryptography allows me to solve this problem. Using a traditional encryption algorithm, I would have to agree with Sebastian on a secret key that we could both use to decrypt and encrypt messages. However, with Daniel in the room, agreeing on a key without Daniel knowing what it was would be pretty hard (although there are ways that we will discuss later). But what if we didn't need to agree on a key at all? RSA allows us to do this. The RSA crypto system was invented by Ron Rivest, Adi Shamir, and Leonard Abelman. RSA is a major cryptographic tool that is used every day millions and millions of times for things from secure banking to email. Its extreme usefulness means that we will study it. What RSA allows you to do is to encrypt a message without being able to decrypt someone else's message. If you use a standard cipher, like AES, the key allows you to decrypt and encrypt. In RSA, there's a decryption key and a encryption key. The encryption key is typically made public, and referred to as the **public key**. The decryption key is typically kept private and referred to as the **private key**. Since anybody can encrypt a message, but only the private key owner can decrypt it, RSA allows people to communicate securely, even if they have not been able to agree on a secret beforehand. what follows is a description of the RSA algorithm, from the ground up, starting with the most basic building blocks. It's not entirely necessary that you read the section anout modular arithmetic with great attention to detail; however, in terms of understanding, how and why RSA works, it's very useful. if you understand the basics of modular arithmetic, you can skip straight to [the section describing the mathematics of RSA](#Mechanics-of-RSA-Encryption), or [the section about attacking RSA](#Attacks-on-RSA), although you may not get the same level of appreciation for why RSA actually works. ## Modular Arithmetic ### What is an integer? Any number that can be written without a decimal point. ### Mathematical basis Consider Euclid's Division Lemma: $$ a = q \times b + r $$ for any two integers $a$ and $b$. Put in a slightly different way: $$ \frac{a}{b} = q $$ and the remainder is $r$. Again, a, b, q, and r, are all integers. You may remember this from grade school: dividing two integers creates a quotient $q$ and a remainder $r$. $$\frac{5}{4}$$ $$ 5 = 1 \times 4 + 1 $$ $$\frac{11}{3}$$ $$ 11 = 3 \times 3 + 2 $$ We call the operation of finding q such that $$ a = q \times b + d $$ division. We define the process of finding d such that $$ a = q \times b + d $$ the modulo operation (mod for short). Some vocab: The thing we're modding *by* is called the modulus. The plural of modulus is moduli. $$\frac{5}{4}$$ $$ 5 = 1 \times 4 + 1 $$ $$5 \mod{4} = 1 $$ In this case, *4* is the modulus. and $$\frac{11}{3}$$ $$ 11 = 3 \times 3 + 2 $$ $$ 11 \mod{3} = 2 $$ but wait... theres something funny going on here... $$ 14 = 3 \times 4 + 2 $$ $$ 14 \mod{3} = 2 $$ but $$ 11 \mod{3} = 2 $$ so given that $$ 2 = 2 $$ and the two equations above so $$11 = 14$$ ??? Did we just break math?? Not quite... math involving the mod operator, "modular" math, potentially involves numbers that mod to the same value but are different. They act equivalently and you can't tell the difference, but its still incorrect to say they're equal. When we do math under a "mod" operator, we always must keep this in mind. Therefore we don't say two numbers are "equal", we say they are congruent under the modulus. $a$ equals $b$ is written as $$a = b$$ Saying $a$ is congruent to $b$ when working $\mod{n}$ is written as $$a \mod{n} \equiv b \mod{n}$$ But remember; this is only true for a certain modulus. $$ 14 \mod{3} \equiv 2 $$ and $$ 11 \mod{3} \equiv 2 $$ so $$ 14 \mod{3} \equiv 11 \mod{3}$$ But $$ 14 \mod{4} \not\equiv 11 \mod{4}$$ congruence is not equality. Try to compute $$x \equiv 25 \mod{4}$$ for $0 \le x < 4$ to check your understanding. Remember, if you get stuck, ask an officer! Okay, the answer is $1$: $$25 \equiv 1 \pmod{4}$$ One last note, before we move on. For our purposes, it only makes sense to talk about numbers that are congruent given the *same* modulus. For example, $$ 14 \mod{3} \equiv 11 \mod{3}$$ We won't write the $\mod{3}$ both times; we will just say $$ 14 \equiv 11 \pmod{3}$$ and trust that the reader understands that the $\pmod{3}$ on the right applies to *all* the components of the equation. To write the most general relationship between two integers $a$ and $b$ with a modulus of $c$: $$a \equiv b \pmod{c}$$ or $$ a = n \times c + b $$ where $n$ is some integer. Now we have this shiny new tool!!! ### Math with the modulo operator Lets do some math! Remember when we said that $$ a \equiv b \pmod{c}$$ This means that $$ a = n \times c + b $$ where $n$ is some integer. But if $n$ can be *any* integer, it can be a negative one, too right? $$ a = n \times c + b $$ $$ 25 = 7 \times 4 - 3 $$ so $$25 \equiv -3 \pmod{4}$$ It turns out, this is completely okay! But it is usually more convenient to ONLY work with numbers from $0$ to $n$ when we work with an equation $\pmod{n}$. This is a good place to note that SOME programming languages will return ONLY positive values for the mod operation, while SOME will return either positive or negative values. I, use Python or Sagemath, because they avoid this issue. $a \equiv b \pmod{c}$ becomes `b = a % c` in Python or Sagemath, and $0 \leq b < c$ in either language (if you don't know what Sagemath is, that's OK right now, it's basically Python for math people and offers more tools). You don't need Sagemath to do any of the math this time, but in the future you might want it, as it makes complex operations easier. #### Sagemath Installation In the terminal on a Codespace (or any Ubuntu/Debian based Linux distro): ```te= sudo apt-get update sudo apt-get sagemath ``` Okay so lets do some math!!! Addition is straightforward. $$(a + b) \pmod{c} \equiv ((a \mod{c}) + (b \mod{c})) \pmod{c}$$ but wait! We know that we can have negative numbers; so $b$ can be negative. So subtraction works as well (its just addition with a negative number). $$(a - b) \pmod{c} \equiv (a \mod{c}) - (b \mod{c}) \pmod{c}$$ And multiplication works the same as in normal integers $$(a \times b) \pmod{c} \equiv (a \pmod{c}) \times(b \pmod{c}) \pmod{c}$$ Lets see an example of all 3: $$15 + 3 \equiv 4 \pmod{7}$$ $$15 - 3 \equiv 5 \pmod{7}$$ $$15 \times 3 \equiv 3 \pmod{7}$$ Okay so what about division? $$\frac{5}{4} \equiv x \pmod{3}$$ But wait! We can't write $1\frac{1}{4}$ because thats not an integer! That's not a real thing because WE ONLY ALLOW INTEGERS IN THIS EQUATION! IT BREAKS THE MAGIC! ITS NOT ALLOWED IN MY INTEGER MINECRAFT SERVER! Let's go back to the basics. In real numbers, where we talk about $.5$ and $.2827683$ and $\pi$ and $18787$ and $\frac{4}{5}$, what does it mean to do division? If $\frac{1}{a} = b$, then $a \times b = 1$. We call $a$ the inverse of $b$ (technically it's the multiplicative inverse, but for brevity I'll just say "inverse"). Let's do an example: $$\frac{1}{7} * 7 = 1$$ Dividing by $7$ is the same as multiplying by $\frac{1}{7}$. It is important that this makes sense. #### Dividing by a number is the same as multiplying by its inverse So lets apply this to modular arithmetic. We start with the definition: $$a \times \frac{1}{a} = 1$$ In modular arithmetic, we generally prefer to write this as: $$a \times a^{-1} = 1$$ In words: $a$ times the inverse of $a$ equals to one. Let's make this a modular equation: $$a \times a^{-1} \equiv 1 \pmod{n}$$ and that's how we do division; we multiply by the inverse of a number: This works like this; $\frac{b}{a}$ would be written as $b * a^{-1} \pmod{n}$ Division has all the normal mathematical properties; With normal numbers: $\frac{a * b}{a} = b$ In modular arithmetic: $$(a \times b) \times a^{-1} \equiv b \pmod{n}$$ In code, $a^{-1} \pmod{n}$ can be found: Python: ```python a_inv = pow(a,-1,n) ``` Sage: ```python a_inv = a.inverse_mod(n) ``` Okay. Now do this math to check your understanding; Find $a^{-1}$ $$5 \times a^{-1} \equiv 1 \pmod{6}$$ $$4 \times a^{-1} \equiv 1 \pmod{9}$$ $$a = 3, \pmod{7}$$ $$a = 4, \pmod{11}$$ Okay, enough of that. So is this always true? Can we always divide? It turns out, no. $$a = 4, \pmod{6}$$ $$4 * a^{-1} \equiv 1 \pmod{6}$$ There exists no $a^{-1}$ for $a \equiv 4 \pmod{6}$. But why? The answer is that the modulus and the number we're trying to invert share factors that are greater than 1. If $\gcd(a,n) \not\equiv 1 \pmod{n}$, or, in other words, $a$ and $n$ share a factor that is greater than 1 (in our case, 4 and 6 share the factor of 2) there exists no inverse. If you want to know why, read the next section. Else, skip to [Modular Exponentiation](#Modular-Exponentiation). #### Why can't they share factors? Lets say $a = p * x$ and $n = p * y$, where $p$ is some prime number (its factors are 1 and $p$), and $x$ and $y$ are random integers. Therefore, $a$ and $n$ share a factor of $p$. By the definition of inverse, $$a * a^{-1} \equiv 1 \pmod{n}$$Let's get rid of the mod operation. From the definition of the modulus, we remember that $x \equiv y \pmod{n}$ means for some integer $b$, $x = y + n * b$: $$a * a^{-1} = 1 + n * b$$ for some integer $b$. If we expand out the expression using the definition of $a$ and $n$, we get that $$p * x * a^{-1} = 1 + p * y * b$$ But wait! $p * x * a^{-1}$ and $p * y * b$ are both multiples of $p$!! In other words, I have two multiples of $p$ that are spaced 1 apart; one multiple of $p$ and another multiple of $p$ has a distance of $1$ between them. Now consider; how far apart are the multiples of 2? I'm going to write a numberline and cross out the numbers that aren't multiples of $2$. $2, \not{3},4,\not{5}, 6, ...$ So they're spaced 2 apart. The multiples of 3? $3,\not{4}, \not{5}, 6,\not{7}, \not{8}, 9, ...12, ... 15$ So they're spaced 3 apart. The multiples of $p$? Lets write out the numberline; $$p, p+1, p+2,... 2p, ... 3p,... 4p$$ The multiples of $p$ are spaced $p$ apart; $p+1$ and $p+2$ can only be multiples of $p$ if $p = 1$ or $p \leq 2$ respectively. If I want to jump from one number that is a multiple of $p$ to another multiple of $p$, I must add *at least* $p$. Take a moment to make sure that makes sense, because it's trickier than it sounds... Now; $$p * x * a^{-1} = 1 + p * y * b$$ So in other words, 2 multiples of $p$ are $1$ apart. This can only be true if $p = 1$ because we must add AT LEAST $p$ to jump from one multiple of $p$ to another. Thus $a * a^{-1} \equiv 1 \pmod{n}$, can only be true if $p$ is 1, thus $a$ and $n$ cannot share any factors other than one. ## Modular Exponentiation Now we're getting to the good stuff; computing exponents under a modulus. And luckily, we know how multiplication works; exponentiation works just the same way, just more times. $$a*a*a*a*a...*a \equiv a^b \equiv c \pmod{n} $$ is a general modular exponentiation. In theory you could compute $a^b$ first, and then mod it by $n$. however, this is inefficient, if $a$ or $b$ is very large. Python and Sagemath have built-in modular exponentiation, algorithms are more efficient. In Python: $a^b \equiv c \pmod{n}$ is written as `c = pow(a, b, n)` and in Sage it is written as `c = (a^b)%n` Quick example: $$4^2 \equiv 16 \equiv 1 \pmod{3}$$ Let's list out the powers of $5 \pmod{6}$. $$5^1 \equiv 5 \pmod{6}$$ $$5^2 \equiv 25 \equiv 1 \pmod{6}$$ $$5^3 \equiv 25*5 \equiv 1*5 \equiv 5 \pmod{6}$$ ok...... what about powers of $2 \pmod{5}$?? $$2^1 \equiv 2 \pmod{5}$$ $$2^2 \equiv 4 \pmod{5}$$ $$2^3 \equiv 3 \pmod{5}$$ $$2^4 \equiv 1 \pmod{5}$$ $$2^5 \equiv 2 \pmod{5}$$ It turns out that the power where you get $a^e \equiv 1 \pmod{n}$ assuming $0<a<n$ and that $a$ and $n$ are co-prime, or they share no factors greater than one, is the same as the number of numbers between zero and $n$, that are co-prime to $n$. Co-prime simply means that two numbers share the factors greater than one. Now, why is this? The answer is complicated, and somewhat out of the scope of this lecture. The simplest and most intuitive answer is: $$a^b * a^{-b}{} \equiv 1 \pmod{n}$$ because that's how exponents work. But this also implies that if we want to be able to compute $c \equiv a^b \pmod{n}$ we should also be able to compute $a^{-b} \equiv c^{-1} \pmod{n}$, so $c$ must not share any factors with the modulus. This doesn't fully address this, to be honest. But it's a good way to understand *why* they're connected. We define a function, $\phi(n)$, pronounced "Euler's totient function", which counts how all integers between $1$ and $n$ that are coprime to $n$. Consider a prime number $p$. Since $p$ is prime, it is co-prime to *all* numbers between $1$ and $p$. Therefore we must count $1$, $2$, $3$, $4$, ... $p-1$. This is a total of $p-1$ numbers. Therefore, $\phi(p) = p-1$. Let's do this for the prime number $5$. So these numbers are co-prime to 5; $1, 2, 3, 4$. This is 4 numbers, so $\phi(5) = 4$. ## Totient Function If I compute a number $n$ such that $n = p_1 * p_2 * p_3 * ... p_m$, and $p_1, p_2, p_3, ... p_m$ are all different prime numbers, then: $$ \phi(n) = \phi(p_1) * \phi(p_2) * \phi(p_3) * ... \phi(p_m)$$ $$ \phi(n) = (p_1-1) * (p_2-1) * (p_3-1) * ... (p_m-1)$$ Let's do this for the number 22. $$ \phi(n) = \phi(11) * \phi(2)$$ $$ \phi(n) = (11 - 1) * (2-1) = 10$$ We previously wrote that $a^e \equiv 1 \pmod{n}$ when $e$ is the number of integers between $1$ and $n$ coprime to $n$. Or in other words, $$a^{\phi(n)} \equiv 1 \pmod{n}$$ This is important. Now, we can do some RSA encryption. <H2><A name="rsa_math">Mechanics of RSA Encryption</A></H2> We now have all the math we need to create RSA encryption. The only thing left is one simple observation; it's really hard to factor numbers. Don't think so?? What are the factors of $282057$? You needed a computer to try all possible factors, didn't you... Okay enough playing around; you think you could factor $1359527947600925907$ without just trying all the possible factors? You really can't factor large numbers efficiently. When a number gets into the hundreds of digits, especially when it has only a few (large) factors, it basically becomes so hard to factor that it's not worth even worrying about it (this is not always true, but usually is). Enough words! We define prime numbers $p$ and $q$. In Python: ```python from Crypto.Util.number import getPrime p = getPrime(B) ``` to get a prime $p$ such that $2^{B+1} > p > 2^B$. In Sagemath ```python p = random_prime(B, false, 2^(B-1)) ``` will return a prime $p$ such that $2^{B} > p > 2^{B-1}$. In Sagemath, you could also just write ```python p = random_prime(B) ``` to return a prime $p$ such that $2^{B} > p > 2$ (this will almost always be good enough). Alright!! We now have primes $p, q$ $$n = p*q$$ Now, quick understanding check; if I choose `B = 512`, is it easy or hard to find $p$ and $q$ given $n$? The answer is, (usually) *very hard*. Like, *might as well give up* level hard. Let's pick a number $e$ coprime to $p$ and $q$. Typically $e = 65537 = 2^{16} + 1$. Sometimes, $e = 3$. Regardless, we now call $e$ and $n$ the public key. To encrypt a message, $m$, I would write $$m^e \equiv c \pmod{n}$$ In Python, the message $m$ will often be in the form of bytes, and not an integer. Thus we would write: Python: ```python from Crypto.Util.number import long_to_bytes c = pow(long_to_bytes(m), e, n) ``` Where `c` is the ciphertext. Equivalent code in Sage is left as an exercise to the reader <3. Okay, but how to decrypt... Well, we know that $\phi(n) = (p-1)(q-1)$ Let's now define $$d \equiv e^{-1} \pmod{((p-1)(q-1))}$$ So we have a message $m$, and ciphertext $c$. Let's now compute $$c^d \pmod{n}$$ Well, let's substitute in $m^e \equiv c \pmod{n}$ $$(m^e)^d \equiv m^{(e*d)} \pmod{n}$$ Now recall that $e^{-1} \equiv d \pmod{((p-1)(q-1))}$ $$e*d \equiv 1 \pmod{((p-1)(q-1))}$$ So by the definition of the mod operator $$e*d = 1 + C*((p-1)*(q-1))$$ with some integer $C$ Now we know that: $$c^d \equiv m^{1 + C*((p-1)*(q-1)} \equiv m^{1 + C*\phi(n)} \pmod{n}$$ By the properties of exponents: $$m^{1+C*\phi(n)} \equiv m^{1}*m^{C*\phi(n)} \equiv m*(m^{\phi(n)}* m^{\phi(n)} * m^{\phi(n)}... * m^{\phi(n)})\pmod{n} $$ but recall that $a^{\phi(n)} \equiv 1 \pmod{n}$ so $$m*(m^{\phi(n)}* m^{\phi(n)} * m^{\phi(n)}... *m^{\phi(n)}) \equiv m * (1*1*1...*1) \equiv m \pmod{n}$$ thus, $d$, $p$, and $q$ are the private key, as they enable decryption. Lets put it all together: with random primes $p$ and $q$ $$n=p*q$$ pick $e$ coprime to $n$ publish $e$ and $n$ as the public key. compute $$m^e \equiv c \pmod{n}$$ to encrypt $m$ into $c$. To decrypt: Compute $$e^{-1} \equiv d \pmod{n}$$ $p$, $q$, and $d$ are the private key. To decrypt; $$c^d \equiv m \pmod{n}$$ ## CRT Let's learn about CRT. This is useful for some RSA-related problems. The Chinese remainder theorem is a way of reconstructing a number from a series of remainders under various moduli. This was first created by Sun Tzu. Let's say I have $n$ equations of the following form $$a \equiv r_1 \pmod{m_1}$$ $$a \equiv r_2 \pmod{m_2}$$ $$...$$ $$a \equiv r_n \pmod{m_n}$$ The CRT allows us to find $a$ even if we only know $r_1, r_2, ... r_n$ and $m_1, m_2, ...m_n$ There are some limits. Technically, there's an infinite amount of values of $a$ that satisfy this constraint. So we limit ourselves to values of $a < (m_1 * m_2 ... * m_n)$ (technically it's a bit more complex than this, and this description is only true if all $m$ are coprime to each other, but that is usually true). In Sagemath: ```python= a = crt([r1, r2, r3... rn], [m1, m2, m3... mn]) ``` Now... Was learning CRT so bad? CRT allows us to construct a slightly different version of RSA encryption, which is equivalent and equally secure but faster to compute. ### CRT RSA Given a typical RSA setup ($n$, $e$, message = $m$, ciphertext = $c$) the decryptor recieves $c$ and wants to decrypt it. They compute $$m_1 \equiv (c)^{(d \mod{p-1})} \pmod{p}$$ $$m_2 \equiv (c)^{(d \mod{q-1})} \pmod{q}$$ Then we combine with the CRT to get: $$m \equiv (c)^{(d \mod{(p-1)*(q-1)})} \pmod{p*q}$$ Which is the same as $$m \equiv c^{d} \pmod{n}$$ Which is what we wanted. It's just that we did a bit less work, since $\pmod{p}$ uses smaller numbers than $\pmod{n}$, so we do a little bit less arithmetic this way. ## Properties of RSA ### RSA Signatures Let's say I want to approve a piece of information, and mark it as being "approved" by me. This is called a "signature". A signature is the opposite of encryption; only the private-key holder can sign, but ANYONE can verify that the signature was made. To sign a piece of data $m$ with signature $s$ $$s \equiv m^d \pmod{n}$$ To verify, confirm that $$s^e \equiv m \pmod{n}$$ ### Malleability RSA is known to be malleable. This means that I can change your plaintext in a predictable way without actually decrypting your message. Lets take a look at how I can multiply your message $m$ with a fudge factor $f$. $$c \equiv m^e \pmod{n}$$ $$c_{fudge} \equiv f^e \pmod{n}$$ Now I, the attacker, compute $$ c_{attack} \equiv c*c_{fudge} \equiv m^e * f^e \equiv (m*f)^e \pmod{n}$$ So now when the private key holder decrypts $c_{attack}$ they get $$c_{attack}^d \equiv ((m*f)^e)^d \equiv (m*f) \pmod{n}$$ I, the attacker, didn't decrypt the message. I still fudged it though. ## Attacks on RSA ### Factoring Attacks $n$ is hard to factor. But not impossible. #### Fermat's Factoring Algorithm If $p$ and $q$ are similar (like their first 100 digits are the same, and their last 100 are different): Define $$p = a + b, q = a - b$$ So $$ n = (a+b)(a-b) = a^2 - b^2$$ Well you can start with $a = \sqrt{n}$ and try different (integer) values of $a$ and $b$ until the equation is satisfied. #### Smooth $p-1$ or $q-1$ Smooth is a mathematical property meaning "having many small factors". If $p-1$ or $q-1$ is smooth, Pollard's $p-1$ algorithm can factor. Note: "smooth" has no upper bound. You decide if "smooth" means "no factors larger than $1,000$" or "no factors greater than $1,000,000,000,000$". My personal definition is "no factors larger than $1,000,000$", because that's what my computer can easily factor using this algorithm. #### Wieners attack If $$q < p < 2 \times q$$ and $$d < \frac{n^{\frac{1}{4}}}{3}$$, Wiener says we can find $d$ easily. #### FactorDB Go to www.factordb.com and put your number in. This may or may not work. #### Schors' Algorithm Only works on a (big) quantum computer. ### RSACTFtool This is primarily aimed at CTF-ers who want to be able to quickly solve a problem under time pressure; this doesn't really help you understand the problem. I wouldn't recommend using this for club CTF unless you feel you understand the challenge well and want to learn RSACTFtool. [Github link](https://github.com/RsaCtfTool/RsaCtfTool). ### Coppersmith's Family of Attacks #### $e = 3$ Attack If $e = 3$ and $m$ is small enough, then $m^e < n$ So $$c = m^3$$ and the $\pmod{n}$ part is irrelevant. If this is true, you can just take the cube-root to solve (remember to only use integer solutions). #### Partial Information Don Coppersmith basically wrote a theorem saying; if you can express some partial knowledge of $p$, or $q$, or you know part of the message, etc, etc, you can find the entire message, or factor the modulus; <a href = "https://www.cryptologie.net/article/222/implementation-of-coppersmith-attack-rsa-attack-using-lattice-reductions/">more info here.</a> Meow!