RSA

Club Resources

Why RSA?

Cryptography exists to facilitate secure communication without an outside attacker being able to read the data being sent. Many systems and ciphers exist for this purpose, but many of them have one major issue: they rely solely on a private key. Suppose Reginald wants to send Tyrone (names courtesy of Kara Kim) a message. To do this with a traditional system, they have to agree on a private key that they both use to encrypt and decrypt the message. Since they have not already agreed on the key, however, they cannot securely exchange it. To solve this, public key cryptography systems use a public key for encryption, so anyone can encrypt and send a message to Tyrone. Tyrone then uses the corresponding private key, which only he knows, to decrypt the message.

The RSA system is a public key crypto system created by Ron Rivest, Adi Shamir, and Leonard Abelman and is one of the most common methods of encryption. The system is based on the fact that large numbers are really hard to factor. The GNFS algorithm, which is the fastest known algorithm for factoring large numbers (that can run on non-quantum computers), can take several thousand years of processor time to factor a sufficiently large integer. Because of this, RSA (if implemented correctly) is essentially uncrackable with the current state of available computing.

RSA uses modular arithmetic for encryption and decryption, and while it is not strictly necessary to understand the underlying math of RSA in order to use it, it does make using it much easier. Because of this, this lecture builds the system from the ground up, but if you already understand modular arithmetic or just want to understand the basic process for RSA you can skip the next section.

Modular arithmetic

An integer can be expressed as a multiple of some divisor plus a remainder, as shown by the following equation.

a=q×b+r
In modular arithmetic, we only care about this remainder. The basic idea of modular arithmetic is that if we operate under some modulus n, we can do any ordinary arithmetic, like addition, subtraction, multiplication, division (sort of), or exponentiation, but we only use integers, taking their remainder when divided by n.

For example, we would say that the number 11 is equal to 3 under the modulus 4. This is true because

11=4×2+3 Using 4 as the modulus, this means that
11=3(mod4)

However, you may have noticed a small issue with this. If we look at 15 under the modulus 4, it also is 3, so

15=3(mod4)

Clearly, 3 = 3, so does that mean 15 = 11 under a modulus of 4?

Not exactly. In modular arithmetic we say that things are congruent, not equal. The proper way of writing the equations above would really be

113(mod4)

and

153(mod4)

A few final notes:

We write the modulus on the side in parentheses, which means that it applies to everything in the equation.

It is correct to say that 15 and 11 are congruent mod 4, but this can be annoying to work with, so we only write numbers between 0 and the modulus, throwing away any remainder. (This also applies to negative numbers, so -5 is also congruent to 3 mod 4)

To do modular arithmetic in python or sagemath, use the % operater for mod and the pow() function for modular exponentiation

a = 5 % 2             # a = 1
b = pow(a, 3, 5)      # b = (a ** 3) % 5

Math with a modulus

Addition, subtraction, multiplication, and exponentiation all work the same under a modulus as they do in normal arithmetic. You do the exact same calculation, then just throw away the extra multiples of the modulus to make the result less than it. For example,

10+43(mod11)

and

793(mod5)

and

5×62(mod7)

and finally,

535(mod6)

However, when we get to division, we encounter a problem. Dividing by an integer that isnt a factor of your dividend doesn't leave you with an integer. For example,

74=1.75, but 1.75 is't a number we can work with in modular arithmetic.

Because of this problem, we don't divide under a modulus. Instead, we multiply by a number's inverse. In regular arithmetic this is the same as dividing

74=7×14=7×41

But what does this mean in modular arithmetic? Well the definition of the inverse is that it equals 1 when multiplied by the original number

a×a1=a×1a=1

So for modular arithmetic this just means that we have to find some number such that multiplying by it gives us 1. If we wanted to find the inverse of 5 under the modulus 7, we would need to find some number a such that

5×a1(mod7)

It turns out that a = 3 satisfies this equation

5×3=151(mod7)

Therefore, if we wanted to divide 4 by 5 under modulus 7, we would multiply by 3

454×514×35(mod7)

So 4 divided by 5 is 5 under modulus 7.

Keep in mind that a number will have different inverses depending on the modulus

513(mod7)

but

512(mod9)

So we solved division! We can divide any number by any other number under any modulus, right?

Sadly, no. If the number shares any factors with the modulus there is no inverse. For example if we wanted to find the inverse of 9 under modulus 12, we would find that there is no integer that satisfies the equation

9×a1(mod12)

Understanding why this is true is beyond the scope of this lecture, but you can find a great explanation in Kiran's RSA notes from last year. (In addition to another explanation of everything in this lecture, in case this doesn't make sense)

Back to exponentiation

Like I mentioned earlier, exponentiation under a modulus works the same as it normally does - it's essentially just repeated multiplication, with the added caveat that you only keep the portion less than the modulus. One exponent on its own isn't very interesting, but if you look at sequences of exponents an interesting pattern emerges. For example, let's consider the powers of 4 under the modulus 9

414(mod9)
427(mod9)

431(mod9)

444(mod9)

457(mod9)

461(mod9)

What about powers of 5 mod 9

515(mod9)
527(mod9)

538(mod9)

544(mod9)

552(mod9)

561(mod9)

Or even powers of 2 mod 9

212(mod9)
224(mod9)

238(mod9)

247(mod9)

255(mod9)

261(mod9)

Eventually, you reach an exponent that yields

1, then the sequence begins to repeat. Additionally, all of them resulted in
1
when raised to the power of
6
. This is actually true for any base, so long as it is coprime to 9 (coprime means it shares no factors other than 1). Every modulus has a special exponent that will always result in 1 if the base is coprime to the modulus.

This property turns out to be very useful in number theory. It's important enough to have its own dedicate function, know as Euler's totient function, which is denoted by the greek letter

ϕ.
ϕ(n)
gives us the value of an exponent
e
such that
ae1(modn)
for
a
coprime to
n
. The value of this function happens to be equal to the number of integers between 0 and the modulus that are coprime to the modulus. It is not necessary to understand why this is true for RSA, but I invite any curious reader to ponder this themself.

Consider what the value of

ϕ(p) for some prime
p
. Because
p
is prime, it must be coprime to all integers less than it, so
ϕ(p)
must be
p1
. Additionally,
ϕ
is multiplicitive for distinct prime factors, so for some number
n
with distinct prime factors
p1,p2,p3,...,pn
,
ϕ(n)=ϕ(p1)×ϕ(p2)×ϕ(p3)×...×ϕ(pn)=(p11)×(p21)×(p31)×...×(pn1)

The proof of this is again outside of the scope of this lecture and is left as an exercise to the reader.

How RSA Actually Works

Now that we've gone over all of the underlying math, we can finally introduce the mechanics of RSA encryption.

We start be defining 2 large prime numbers,

p and
q
(in python this can be accomplished with the Crypto.Util.number.getPrime function from the pycryptodome library)

We then calculate our modulus

n=p×q and choose an exponent
e
coprime to
p1
and
q1
. In almost every situation,
e=65537
is the best option, but sometimes
e=3
is used. The values of
n
and
e
make up our public key, which can be shared with anyone wishing to send us a message

To encrypt a message

m, we compute
cme(modn)

Now how do we decrypt it?

Remember that

ϕ(n)=(p1)×(q1)
Using this we can define
de1(mod(p1)(q1))

d
, along with
p
and
q
are our private key

To decrypt our message we raise our ciphertext to the power of d

mcd(modn)
Now if we replace c for its original value,
me
, we can see that
m(me)dmed(modn)

As we defined above,

de1(mod(p1)(q1))
ed1(mod(p1)(q1))

By the definition of the modulus operator, this means that
ed=1+C(p1)(q1)
for some integer
C
. Therefore,
medm1+C(p1)(q1)m1+Cϕ(n)(modn)

Using basic properties of exponents,
medm(mϕ(n))(mϕ(n))(mϕ(n))...(modn)

Now recall that
aϕ(n)1(modn)
. This means that
medm(1)(1)(1)...m(modn)

So, finally, we have shown that

cdm(modn)

Some other properties of RSA

Malleability

RSA is malleable, meaning the content of a message can be predictably changed without decrypting the message. To do this, the attacker calculates a fudge factor

cffe(modn)

The attacker then multiplies the ciphertext by

cf

cc×cf(modn)

When the recipient decrypts this message they get the original message multiplied by

f.

(c)d(me×fe)d(modn)
(c)dmed×fed(modn)

(c)dm×f(modn)

RSA signatures

If I want to prove that a message actually comes from me and isn't someone else trying to impersonate me, I can use RSA to sign the message. This works in almost the same way, but using

d instead of
e
. For some message
m
, I can calculate the signature,
smd(modn)

The recipient of the message can then verify the signature by calculating

se(md)emedm(modn)

If this decrypted signature does not match the original message, the recipient knows that the message did not come from me.

Attacks on RSA

Factoring

If we can factor

n and find the values of
p
and
q
, then we can calculate
d
and decrypt messages. For small values of
n
this can be done reasonably quickly, even on a weak personal computer, but for large numbers this takes long enough that it is not useful. However, if
p
and
q
are chosen badly, some algorithms can factor them very quickly in comparison to a general factoring algorithm. Many of these are somewhat complex, so I won't explain them in great detail, but I will provide links for further reading.

FactorDB

FactorDB is a database containing the factorization of many large integers. If the modulus has factors recorded in the database, then you have your solution. Otherwise, you'll have to find another method.

Fermat Factorization

If

p and
q
are close together, it is relatively easy to factor
n
as a difference of squares. If we can find 2 integers
a
and
b
such that
n=a2b2
, then we know that
n=(a+b)(ab)
, so
p
and
q
are
a+b
and
ab
. To find
a
and
b
, start with a guess for
a
(usually
n
), then calculate
a2n
and check if the result is a perfect square. If it is, you've found
a
and
b
. Otherwise, increase
a
and try again.

Pollard's
p1
algorithm

If

p1 or
q1
has only small prime factors, it is known as smooth. In this case, Pollard's
p1
algorithm can be used to factor n. The algorithm uses the fact that for any integer
k
,
akϕ(p)1(modp)
for an integer
a
coprime to some prime
p
and if a number
x
is congruent to 1 modulo a factor of
n
, then the greatest common factor of
x1
and
n
is a multiple of that factor. So you can find some multiple of
p1
, you can find
p
. If we multiply a crapload of small prime factors to use as our exponent and
p1
is sufficiently smooth, then we can find
p
. Note that the limit for smooth depends on the computer running the algorithm. A large, powerful computer might use 1 billion as the limit for the largest prime factor of a smooth prime, but on a smaller personal computer, 1 million might be a more sensible limit.

Wiener's Attack

If a small value is chosen for the private exponent

d (specifically if
d<n143
), then
p
and
q
can be efficiently calculated using only the public key,
n
and
e
. The basic idea of the attack is that the attacker can use
e
and
n
to estimate the value of a fraction containing
d
. Then, an algorithm based on continued fractions can be used to recover the value of
d
.

Shor's Algorithm

In 1994, Peter Shor devised a quantum algorithm, which, given a powerful quantum computer, could factor any

n relatively quickly. This algorithm is obviously not useful for CTFs or any current cybersecurity application, but it is theoretically interesting. If you're interested in learning about this algorithm I've linked a great MinutePhysics video about it.

Coppersmith's Attack

Coppersmith's attack is actually a family of attacks that can be used if the public exponent

e is low (most commonly for
e=3
) or if the attacker has some partial knowledge of
p
or
q
.

Low public exponent and short message

If the public exponent

e is low and the message is sufficiently short, then the value of
me
will be less than
n
and therefore unaffected by the modulus. Therefore, the attacker can easily perform the inverse, calculating
ce
(i.e. the cube root of the ciphertext if
e=3
), which is the original message. Note that the ciphertext will often be large enough that it cannot be converted to a float, so the default python cbrt() function will not work. Because of this, you will likely have to make your own cube root function, but this can be easily accomplished with a simple binary search.

Håstad's Broadcast Attack

If one sender wants to send the same message to several people, using the same low public exponent but different moduli, the Chinese Remainder Theorem can be used to recover the original message. The Chinese Remainder Theorem states that if you know the remainder of some integer under various moduli, then you can reconstruct the original value of the integer. More formally, the theorem states that if we know

xa1(modn1)

xa2(modn2)

...

xak(modnk)

and and

n1,n2,...,nk are all coprime, then one can uniquely determine the value of
xmod(n1×n2×...×nk)
. Therefore, if
x<n1×n2×n1×...×nk
, we can find
x
.

Using CRT, if we know

cme(modn1)

cme(modn2)

...

cme(modne)

for

e different coprime moduli, then we can find the value of
memod(n1×n2×...×ne)
and since
m<ni
for all
i
(which is necesarily true for RSA to work properly),
me<(n1×n2×...×ne)
and
memod(n1×n2×...×ne)=me
. Therefore, we can just calculate the
eth
root of
me
and recover the original message.

Timing Attack

Timing attacks are in my opinion one of the most interesting attacks in cryptography. Essentially, if an attacker knows enough about the target's hardware and knows the amount of time taken to decrypt several messages, then they may be able to recover some part of the private key. Timing attacks can be applied to many cryptosystems, such as Diffie-Hellman, ElGamal, and RSA. In the case of RSA, the attacker can recover the value of

d one bit at a time by analyzing the decryption time for certain messages.

More info on attacks on RSA here: https://crypto.stanford.edu/~dabo/papers/RSA-survey.pdf