In the following, I will introduce some basic concepts used in this project.
Information is a measure of the uncertainty of an outcome. It is related to the amount of data that is required to specify the outcome of an event. The more uncertain an outcome is, the more information is required to resolve uncertainty of the outcome.
The information is calculated using the Shannon information, which is defined as:

$$I(x_i) = -\log_2 p(x_i)$$

where $p(x_i)$ is the probability of the $i$-th symbol in the data.
Entropy is a fundamental concept in information theory that quantifies the uncertainty or randomness associated with a random variable. It measures the average amount of information required to describe or encode an event or a set of outcomes.
The entropy is calculated using the Shannon entropy, which is defined as:

$$H(X) = -\sum_i p(x_i) \log_2 p(x_i)$$

where $p(x_i)$ is the probability of the $i$-th symbol in the data.
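As a quick sketch, the Shannon entropy of a symbol sequence can be estimated from its empirical symbol frequencies (the function name here is illustrative, not this project's API):

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Average number of bits needed per symbol (base-2 entropy)."""
    counts = Counter(data)
    total = len(data)
    # H = -sum(p_i * log2(p_i)) over all observed symbols
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A balanced 0/1 sequence carries 1 bit of entropy per symbol:
print(shannon_entropy("0101"))  # 1.0
# A constant sequence carries no information:
print(shannon_entropy("0000"))  # 0.0
```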
Mutual information is a fundamental concept in information theory and statistics that measures the amount of information that two random variables share. It provides a quantitative measure of the dependence or association between the variables, revealing how much knowing the value of one variable can reduce uncertainty about the other.
The mutual information is calculated using the Kullback-Leibler divergence, which is defined as:

$$D_{\mathrm{KL}}(P \| Q) = \sum_i p(x_i) \log_2 \frac{p(x_i)}{q(x_i)}$$

where $p(x_i)$ is the probability of the $i$-th symbol in the data, and $q(x_i)$ is the probability of the $i$-th symbol in the encrypted data.
And the mutual information is defined as:

$$I(X;Y) = D_{\mathrm{KL}}\bigl(p(x,y) \,\|\, p(x)\,p(y)\bigr) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x,y)$ is the joint probability of $X$ and $Y$, and $p(x)\,p(y)$ is the product of the marginal probabilities of $X$ and $Y$.
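A minimal sketch of computing mutual information directly from a joint distribution; the helper name and the dict representation are my own choices, not this project's code:

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum of p(x,y) * log2(p(x,y) / (p(x) * p(y))).

    `joint` maps (x, y) pairs to probabilities p(x, y)."""
    px, py = {}, {}
    # Accumulate the marginal distributions p(x) and p(y):
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent variables share no information:
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(indep))  # 0.0

# Perfectly correlated bits share exactly 1 bit:
corr = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(corr))  # 1.0
```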
Conditional entropy is a measure of the amount of information needed to describe the outcome of a random variable given that the value of another random variable is known. It is also known as the equivocation of $Y$ given $X$.
The conditional entropy is defined as:

$$H(Y|X) = -\sum_{i,j} p(x_i)\, p(y_j|x_i) \log_2 p(y_j|x_i)$$

where $p(x_i)$ is the probability of the $i$-th symbol in the data, and $p(y_j|x_i)$ is the probability of the $j$-th symbol in the encrypted data given the $i$-th symbol in the data.
There is a relation between entropy, mutual information, and conditional entropy:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) \tag{1}$$
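We can verify this identity numerically on a small joint distribution; the sketch below uses the chain rule $H(X|Y) = H(X,Y) - H(Y)$ to obtain the conditional entropy:

```python
import math

def H(dist):
    """Entropy of a {outcome: probability} distribution, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A small joint distribution p(x, y) with correlated bits:
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px = {0: 0.5, 1: 0.5}  # marginal p(x)
py = {0: 0.5, 1: 0.5}  # marginal p(y)

H_X = H(px)
H_XY = H(joint)              # joint entropy H(X, Y)
H_X_given_Y = H_XY - H(py)   # chain rule: H(X|Y) = H(X,Y) - H(Y)
I_XY = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items())

# The identity I(X;Y) = H(X) - H(X|Y) holds:
print(abs(I_XY - (H_X - H_X_given_Y)) < 1e-12)  # True
```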
There are three elements in encryption: plaintext, ciphertext, and key. Plaintext is the original message. Ciphertext is the encrypted message. Key is the secret used to encrypt the plaintext. The encryption process is a function that maps the plaintext and key to the ciphertext. The decryption process is a function that maps the ciphertext and key to the plaintext.
Firstly, we consider single-bit encryption. To make sure that once we know the ciphertext and the key we can recover the plaintext, the encryption function must be a bijection. The possible encryption functions are:

$$c = p \oplus k \qquad \text{and} \qquad c = \overline{p \oplus k}$$

The former is XOR (exclusive-OR) encryption, and the latter is XNOR (exclusive-NOR) encryption. They both have the characteristic that a given ciphertext bit can correspond to both plaintext 0 and 1; only once we know the key can the plaintext be recovered. In most cases, we pick XOR encryption as the most basic bit-wise encryption operation.
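A small sketch showing that both single-bit ciphers are bijections for each fixed key, and that without the key a ciphertext bit reveals nothing about the plaintext bit (function names are illustrative):

```python
# Single-bit XOR and XNOR encryption: for a fixed key, each is a bijection
# on {0, 1}, so the plaintext is recoverable once the key is known.
def xor_encrypt(p, k):
    return p ^ k

def xnor_encrypt(p, k):
    return 1 - (p ^ k)

for k in (0, 1):
    # Each key maps {0, 1} onto {0, 1} with no collisions:
    assert {xor_encrypt(p, k) for p in (0, 1)} == {0, 1}
    assert {xnor_encrypt(p, k) for p in (0, 1)} == {0, 1}

# Without the key, a ciphertext bit of 1 could come from either plaintext:
print(xor_encrypt(0, 1), xor_encrypt(1, 0))  # 1 1
```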
Before we take multi-bit encryption into account, there are two important properties of the operation of a secure cipher identified by Shannon: confusion and diffusion. Confusion means that the relationship between the ciphertext and the key must be complex and involved. Diffusion means that the statistical structure of the plaintext must be dissipated into long-range statistics of the ciphertext.
There are two common ways to achieve confusion and diffusion: the Substitution-Permutation Network (SP-Network) and the Feistel Cipher.
A sketch of an SP-Network with 3 rounds. Cited from Wikipedia.
A sketch of a Feistel Cipher. Cited from Wikipedia.
The example code can also be found in the `src` folder.
All we have discussed above concerns encryption in a bit, or binary, system. Now, if we have a basis set for the data, we can still use the index to map the basis set to binary data, like Base64 encoding. For the N-base system, we have 11 methods introduced by Shannon in 1949 to encrypt the data.
To make it clear, we can construct a function mapping the plaintext with the key to the ciphertext, as shown in the following form:

$$c = f(p, k)$$

which needs to satisfy the following property: for every fixed key $k$, $f(\cdot, k)$ must be a bijection on the basis set, so that the plaintext can be uniquely recovered from the ciphertext once the key is known.
Given a 3-element basis set $\{0, 1, 2\}$, every key must act as a permutation of the basis set, so there are $3! = 6$ possible encryption functions per key; one of them is:

$$c = (p + k) \bmod 3$$
And a general form of the encryption function can be represented as:

$$c = (p + k) \bmod n$$

where $p$, $k$, and $c$ are integers, $n$ is the number of elements in the basis set, $p$ is the plaintext, $k$ is the key, and $c$ is the ciphertext. And the decryption function is:

$$p = (c - k) \bmod n$$
where we can see that the same key $k$ is used for both encryption and decryption; this is the so-called symmetric encryption.
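A minimal sketch of such a modular-addition symmetric cipher (function names are mine, not this project's):

```python
def encrypt(p, k, n):
    """c = (p + k) mod n over an n-element basis set."""
    return (p + k) % n

def decrypt(c, k, n):
    """Inverse: p = (c - k) mod n, using the same key k."""
    return (c - k) % n

n = 3  # 3-element basis set {0, 1, 2}
for p in range(n):
    for k in range(n):
        # Every (plaintext, key) pair round-trips back to the plaintext:
        assert decrypt(encrypt(p, k, n), k, n) == p
print("all round trips OK")
```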
To realize perfect secrecy in this way, we need a one-time pad (OTP), which has been proved mathematically to achieve perfect secrecy and quantum resistance. However, it still has restrictions in practical application, such as the pre-shared key, the key length, and key distribution. (P.S. Quantum Key Distribution (QKD) may be helpful for OTP.)
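A one-time pad over bytes is just XOR with a uniformly random key of the same length as the message, used only once; here is a minimal sketch using Python's `secrets` module:

```python
import secrets

def otp_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each message byte with a fresh random key byte."""
    assert len(key) == len(plaintext)  # the key must be as long as the message
    return bytes(p ^ k for p, k in zip(plaintext, key))

message = b"meet at dawn"
key = secrets.token_bytes(len(message))  # uniformly random, never reused
cipher = otp_encrypt(message, key)

# XOR is its own inverse, so the same function decrypts:
assert otp_encrypt(cipher, key) == message
```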
Nowadays, what we usually use in daily communication is asymmetric encryption (also known as public-key cryptography), such as RSA, Elliptic Curve Cryptography (ECC), and so on. In this case, we have two keys: a public key and a private key. The public key is used to encrypt the data, and the private key is used to decrypt the data. The public key is usually published, while the private key is known only by its owner.
The encryption and decryption functions will no longer be linear functions as in the symmetric case, but nonlinear ones. The encryption function is usually a trapdoor function, which is easy to compute in one direction but hard to compute in the opposite direction without the private key. Therefore, we can use asymmetric encryption to exchange the symmetric key, and then use symmetric encryption to encrypt the data.
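As an illustration of a trapdoor function, here is textbook RSA with toy numbers (utterly insecure, for exposition only): modular exponentiation is easy in both directions once you hold the private exponent, but inverting the public operation without it requires factoring $n$:

```python
# Textbook RSA with classic toy parameters (do NOT use in practice):
p, q = 61, 53
n = p * q                 # public modulus, 3233
phi = (p - 1) * (q - 1)   # Euler's totient, 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent: modular inverse (Python 3.8+)

msg = 65
cipher = pow(msg, e, n)          # encrypt with the public key (e, n)
plain = pow(cipher, d, n)        # decrypt with the private key (d, n)
assert plain == msg              # the trapdoor inverts the function
```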
However, it has been proved that RSA is not quantum resistant, which means that a quantum computer can break RSA in polynomial time with Shor's algorithm. We now have Post-quantum cryptography (PQC) to resist attacks from both classical and quantum computers.
We constructed a 4-bit XOR cipher as an example to show how to calculate the entropy of the encrypted data, the mutual information, and the conditional entropy, illustrating the relations among plaintext, key, and ciphertext.
In addition, to show the fractal structure of XOR encryption, we also constructed 8-bit and 10-bit XOR ciphers.
Note: the base of the logarithm we use is 2, so entropy is measured in bits.
To start with the 4-bit cipher, the following figure shows the ciphertext as the result of the XOR operation between plaintext and key. The color represents the ciphertext (it can also be read directly from the number on each cell). We can see that the ciphertext is equally distributed over every element of the basis set. The result is like Sudoku: a ciphertext value never appears twice in the same row or column.
4-Bit Ciphertext Heat Map
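The Sudoku-like observation can be checked directly: in the 4-bit XOR table, every row (fixed plaintext) and every column (fixed key) is a permutation of all 16 ciphertext values, i.e. the table is a Latin square. A sketch reproducing the table's data (not the plotting code):

```python
n = 16  # 4-bit basis set: values 0..15
table = [[p ^ k for k in range(n)] for p in range(n)]

# Each row (fixed plaintext) and each column (fixed key) contains every
# ciphertext value exactly once, so the table is a Latin square:
for i in range(n):
    row = table[i]
    col = [table[j][i] for j in range(n)]
    assert sorted(row) == list(range(n))
    assert sorted(col) == list(range(n))
print("XOR table is a Latin square")
```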
And to calculate the information-theoretic properties later, we first made a bar plot of the entropy over the 4-bit basis set to get a clearer view. In the figure, we can see that when the numbers of 0s and 1s are equal, the entropy reaches its maximum, and the entropy is at its minimum when all the bits are 0 or all are 1.
4-Bit Entropy Bar Plot
The entropy of the ciphertext is shown in the following figure. It exhibits a four-fold symmetry instead of a fractal structure.
4-Bit Entropy of Ciphertext Heat Map
Then the mutual information between C and P, and between C and K, are shown in the following figures. We can see that they are a quarter turn (90° rotation) of each other. Also, the mutual information is small in most regions, which means that the correlation between the ciphertext and the plaintext or key is weak.
4-Bit Mutual Information between C, P and C, K
With Eqn. (1), we can calculate the conditional entropy for the four kinds of relations. Let's see how much conditional entropy the ciphertext retains when the plaintext or the key is known. Consistent with the mutual information we obtained, the plots show cross-like patterns, rotated a quarter turn relative to each other.
Conditional Entropy of Ciphertext when Plaintext or Key is Known
And when the ciphertext is known, the conditional entropies of the plaintext and the key are shown in the following figures. We can see that they are the same, which means that the plaintext and the key are equally hard to recover.
Conditional Entropy of Plaintext and Key when Ciphertext is Known
And we can compare the average entropy of the 4-bit cipher and the 8-bit cipher. We assume that each state (bit combination) has the same probability, and we know that the entropy of a binary distribution lies in the interval between 0 and 1 bit. The result is shown in the table below.
| Bits | $\langle H \rangle$ of basis set | $\langle H(C) \rangle$ | $\langle H(P \mid C) \rangle = \langle H(K \mid C) \rangle$ |
|---|---|---|---|
| 4 | 0.7806 | 0.7806 | 0.5434 |
| 8 | 0.9024 | 0.9024 | 0.7788 |
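The average entropy of the basis set can be reproduced by averaging the per-state bit entropy over all equiprobable $n$-bit states. This is my reconstruction of the computation, assuming each state's entropy is that of its own 0/1 bit frequencies:

```python
import math

def bit_entropy(state: int, bits: int) -> float:
    """Entropy of the 0/1 distribution within one n-bit state."""
    ones = bin(state).count("1")
    p = ones / bits
    if p in (0.0, 1.0):
        return 0.0  # all-zeros or all-ones states carry no entropy
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def average_entropy(bits: int) -> float:
    """Average entropy over all equiprobable n-bit states."""
    states = 2 ** bits
    return sum(bit_entropy(s, bits) for s in range(states)) / states

print(round(average_entropy(4), 4))  # 0.7806
print(round(average_entropy(8), 4))  # 0.9024
```

These values match the first numeric column of the table above.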
We found that for equidistributed keys, the average entropy of the ciphertexts is the same as the average entropy of the basis set. However, once we know the ciphertext, the average (conditional) entropy of the plaintexts and keys is lowered.
That sounds weird: since we encrypt the data, it should become more random and the entropy should increase, yet the result shows that the entropy of the plaintext and the key decreases. The reason is that the plaintext should NOT be equidistributed! The plaintext we want to send must be meaningful and full of "information".
On the other hand, what we discussed here is single-character (but multi-bit) encryption. In a real case, we would send a sequence of characters instead of a single character. The average entropy we calculated here only represents the case of sending a random plaintext with a random key, all states being equally probable.
Therefore, if we apply a randomly distributed key to a "full-of-information" plaintext, it will make the ciphertext random instead of keeping the regularity of the plaintext, which increases the entropy of the ciphertext relative to the plaintext.
Based on the results, we can conclude that the properties of a good encryption are as follows:

- The ciphertext should be equidistributed over the basis set, so that its entropy is maximized.
- The mutual information between the ciphertext and the plaintext, and between the ciphertext and the key, should be as small as possible.
- Without the key, the conditional entropy of the plaintext given the ciphertext should stay high, so that the plaintext is hard to recover.
In the following, I will list the code I used in the Jupyter notebook of methods. However, it does NOT include some other handy code I have written that may be helpful in the research of information theory.
All the source codes can be found in my Github repo: Jim137/Entropy: Entropy of encrypted data
In the Jupyter notebook of methods, I presented a special characteristic of XOR, which is the bitwise fractal. However, I cannot give a good interpretation of it, and it would go beyond the topic, so I did not put it in the main text. But I still want to show how cool it is, so I put it in this appendix.
We can clearly see that the following figure, the 10-bit cipher, presents a fractal pattern.
The unit structure of the fractal is XOR itself; we prepared an XOR truth table below, in which the diagonal has higher values than the anti-diagonal. The figure above shows this XOR behavior at every power-of-two scale.
We can also see that they still satisfy the properties we concluded above.
The entropy of the cipher is four-fold symmetric instead of fractal.
The mutual information plots are a quarter turn of each other.
So are the conditional entropy plots when the plaintext or key is known.
And when the ciphertext is known, the conditional entropy plots for the plaintext and the key are the same.
All the properties above are the same for the 4-bit, 8-bit, and 10-bit cases, but for a higher number of bits, the fractal pattern is more significant. I think this is a cool phenomenon, which is why I wanted to introduce it.