Cryptography exists to facilitate secure communication without an outside attacker being able to read the data being sent. Many systems and ciphers exist for this purpose, but many of them have one major issue: they rely solely on a private key. Suppose Reginald wants to send Tyrone (names courtesy of Kara Kim) a message. To do this with a traditional system, they have to agree on a private key that they both use to encrypt and decrypt the message. Since they have not already agreed on the key, however, they cannot securely exchange it. To solve this, public key cryptography systems use a public key for encryption, so anyone can encrypt and send a message to Tyrone. Tyrone then uses the corresponding private key, which only he knows, to decrypt the message.
The RSA system is a public key crypto system created by Ron Rivest, Adi Shamir, and Leonard Abelman and is one of the most common methods of encryption. The system is based on the fact that large numbers are really hard to factor. The GNFS algorithm, which is the fastest known algorithm for factoring large numbers (that can run on non-quantum computers), can take several thousand years of processor time to factor a sufficiently large integer. Because of this, RSA (if implemented correctly) is essentially uncrackable with the current state of available computing.
RSA uses modular arithmetic for encryption and decryption, and while it is not strictly necessary to understand the underlying math of RSA in order to use it, it does make using it much easier. Because of this, this lecture builds the system from the ground up, but if you already understand modular arithmetic or just want to understand the basic process for RSA you can skip the next section.
An integer can be expressed as a multiple of some divisor plus a remainder, as shown by the following equation.
In modular arithmetic, we only care about this remainder. The basic idea of modular arithmetic is that if we operate under some modulus n, we can do any ordinary arithmetic, like addition, subtraction, multiplication, division (sort of), or exponentiation, but we only use integers, taking their remainder when divided by n.
For example, we would say that the number 11 is equal to 3 under the modulus 4. This is true because Using 4 as the modulus, this means that
However, you may have noticed a small issue with this. If we look at 15 under the modulus 4, it also is 3, so
Clearly, 3 = 3, so does that mean 15 = 11 under a modulus of 4?
Not exactly. In modular arithmetic we say that things are congruent, not equal. The proper way of writing the equations above would really be
and
A few final notes:
We write the modulus on the side in parentheses, which means that it applies to everything in the equation.
It is correct to say that 15 and 11 are congruent mod 4, but this can be annoying to work with, so we only write numbers between 0 and the modulus, throwing away any remainder. (This also applies to negative numbers, so -5 is also congruent to 3 mod 4)
To do modular arithmetic in python or sagemath, use the % operater for mod and the pow() function for modular exponentiation
Addition, subtraction, multiplication, and exponentiation all work the same under a modulus as they do in normal arithmetic. You do the exact same calculation, then just throw away the extra multiples of the modulus to make the result less than it. For example,
and
and
and finally,
However, when we get to division, we encounter a problem. Dividing by an integer that isnt a factor of your dividend doesn't leave you with an integer. For example, , but 1.75 is't a number we can work with in modular arithmetic.
Because of this problem, we don't divide under a modulus. Instead, we multiply by a number's inverse. In regular arithmetic this is the same as dividing
But what does this mean in modular arithmetic? Well the definition of the inverse is that it equals 1 when multiplied by the original number
So for modular arithmetic this just means that we have to find some number such that multiplying by it gives us 1. If we wanted to find the inverse of 5 under the modulus 7, we would need to find some number a such that
It turns out that a = 3 satisfies this equation
Therefore, if we wanted to divide 4 by 5 under modulus 7, we would multiply by 3
So 4 divided by 5 is 5 under modulus 7.
Keep in mind that a number will have different inverses depending on the modulus
but
So we solved division! We can divide any number by any other number under any modulus, right?
Sadly, no. If the number shares any factors with the modulus there is no inverse. For example if we wanted to find the inverse of 9 under modulus 12, we would find that there is no integer that satisfies the equation
Understanding why this is true is beyond the scope of this lecture, but you can find a great explanation in Kiran's RSA notes from last year. (In addition to another explanation of everything in this lecture, in case this doesn't make sense)
Like I mentioned earlier, exponentiation under a modulus works the same as it normally does - it's essentially just repeated multiplication, with the added caveat that you only keep the portion less than the modulus. One exponent on its own isn't very interesting, but if you look at sequences of exponents an interesting pattern emerges. For example, let's consider the powers of 4 under the modulus 9
What about powers of 5 mod 9
Or even powers of 2 mod 9
Eventually, you reach an exponent that yields , then the sequence begins to repeat. Additionally, all of them resulted in when raised to the power of . This is actually true for any base, so long as it is coprime to 9 (coprime means it shares no factors other than 1). Every modulus has a special exponent that will always result in 1 if the base is coprime to the modulus.
This property turns out to be very useful in number theory. It's important enough to have its own dedicate function, know as Euler's totient function, which is denoted by the greek letter . gives us the value of an exponent such that for coprime to . The value of this function happens to be equal to the number of integers between 0 and the modulus that are coprime to the modulus. It is not necessary to understand why this is true for RSA, but I invite any curious reader to ponder this themself.
Consider what the value of for some prime . Because is prime, it must be coprime to all integers less than it, so must be . Additionally, is multiplicitive for distinct prime factors, so for some number with distinct prime factors ,
The proof of this is again outside of the scope of this lecture and is left as an exercise to the reader.
Now that we've gone over all of the underlying math, we can finally introduce the mechanics of RSA encryption.
We start be defining 2 large prime numbers, and (in python this can be accomplished with the Crypto.Util.number.getPrime
function from the pycryptodome library)
We then calculate our modulus and choose an exponent coprime to and . In almost every situation, is the best option, but sometimes is used. The values of and make up our public key, which can be shared with anyone wishing to send us a message
To encrypt a message , we compute
Now how do we decrypt it?
Remember that
Using this we can define
, along with and are our private key
To decrypt our message we raise our ciphertext to the power of d
Now if we replace c for its original value, , we can see that
As we defined above,
By the definition of the modulus operator, this means that for some integer . Therefore,
Using basic properties of exponents,
Now recall that . This means that
So, finally, we have shown that
RSA is malleable, meaning the content of a message can be predictably changed without decrypting the message. To do this, the attacker calculates a fudge factor
The attacker then multiplies the ciphertext by
When the recipient decrypts this message they get the original message multiplied by .
If I want to prove that a message actually comes from me and isn't someone else trying to impersonate me, I can use RSA to sign the message. This works in almost the same way, but using instead of . For some message , I can calculate the signature,
The recipient of the message can then verify the signature by calculating
If this decrypted signature does not match the original message, the recipient knows that the message did not come from me.
If we can factor and find the values of and , then we can calculate and decrypt messages. For small values of this can be done reasonably quickly, even on a weak personal computer, but for large numbers this takes long enough that it is not useful. However, if and are chosen badly, some algorithms can factor them very quickly in comparison to a general factoring algorithm. Many of these are somewhat complex, so I won't explain them in great detail, but I will provide links for further reading.
FactorDB is a database containing the factorization of many large integers. If the modulus has factors recorded in the database, then you have your solution. Otherwise, you'll have to find another method.
If and are close together, it is relatively easy to factor as a difference of squares. If we can find 2 integers and such that , then we know that , so and are and . To find and , start with a guess for (usually ), then calculate and check if the result is a perfect square. If it is, you've found and . Otherwise, increase and try again.
If or has only small prime factors, it is known as smooth. In this case, Pollard's algorithm can be used to factor n. The algorithm uses the fact that for any integer , for an integer coprime to some prime and if a number is congruent to 1 modulo a factor of , then the greatest common factor of and is a multiple of that factor. So you can find some multiple of , you can find . If we multiply a crapload of small prime factors to use as our exponent and is sufficiently smooth, then we can find . Note that the limit for smooth depends on the computer running the algorithm. A large, powerful computer might use 1 billion as the limit for the largest prime factor of a smooth prime, but on a smaller personal computer, 1 million might be a more sensible limit.
If a small value is chosen for the private exponent (specifically if ), then and can be efficiently calculated using only the public key, and . The basic idea of the attack is that the attacker can use and to estimate the value of a fraction containing . Then, an algorithm based on continued fractions can be used to recover the value of .
In 1994, Peter Shor devised a quantum algorithm, which, given a powerful quantum computer, could factor any relatively quickly. This algorithm is obviously not useful for CTFs or any current cybersecurity application, but it is theoretically interesting. If you're interested in learning about this algorithm I've linked a great MinutePhysics video about it.
Coppersmith's attack is actually a family of attacks that can be used if the public exponent is low (most commonly for ) or if the attacker has some partial knowledge of or .
If the public exponent is low and the message is sufficiently short, then the value of will be less than and therefore unaffected by the modulus. Therefore, the attacker can easily perform the inverse, calculating (i.e. the cube root of the ciphertext if ), which is the original message. Note that the ciphertext will often be large enough that it cannot be converted to a float, so the default python cbrt() function will not work. Because of this, you will likely have to make your own cube root function, but this can be easily accomplished with a simple binary search.
If one sender wants to send the same message to several people, using the same low public exponent but different moduli, the Chinese Remainder Theorem can be used to recover the original message. The Chinese Remainder Theorem states that if you know the remainder of some integer under various moduli, then you can reconstruct the original value of the integer. More formally, the theorem states that if we know
and and are all coprime, then one can uniquely determine the value of . Therefore, if , we can find .
Using CRT, if we know
for different coprime moduli, then we can find the value of and since for all (which is necesarily true for RSA to work properly), and . Therefore, we can just calculate the root of and recover the original message.
Timing attacks are in my opinion one of the most interesting attacks in cryptography. Essentially, if an attacker knows enough about the target's hardware and knows the amount of time taken to decrypt several messages, then they may be able to recover some part of the private key. Timing attacks can be applied to many cryptosystems, such as Diffie-Hellman, ElGamal, and RSA. In the case of RSA, the attacker can recover the value of one bit at a time by analyzing the decryption time for certain messages.
More info on attacks on RSA here: https://crypto.stanford.edu/~dabo/papers/RSA-survey.pdf