In the following, I will introduce some basic concepts used in this project.
Information is a measure of the uncertainty of an outcome. It is related to the amount of data required to specify the outcome of an event: the more uncertain an outcome is, the more information is required to resolve that uncertainty.
The information is calculated using the Shannon information, which is defined as:

$$I(x) = -\log_2 p(x)$$

where $p(x)$ is the probability of the outcome $x$. With base 2, the information is measured in bits.
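For example, observing a fair coin toss, where each outcome has $p = \tfrac{1}{2}$, yields $I = -\log_2 \tfrac{1}{2} = 1$ bit, while an outcome with probability $\tfrac{1}{8}$ carries $-\log_2 \tfrac{1}{8} = 3$ bits.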
Entropy is a fundamental concept in information theory that quantifies the uncertainty or randomness associated with a random variable. It measures the average amount of information required to describe or encode an event or a set of outcomes.
The entropy is calculated using the Shannon entropy, which is defined as:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

where $p(x)$ is the probability of the outcome $x$ of the random variable $X$.
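For instance, a fair coin has $H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit, a biased coin with $p = 0.25$ has $H \approx 0.811$ bits, and a certain outcome ($p = 0$ or $1$) has $H = 0$.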
Mutual information is a fundamental concept in information theory and statistics that measures the amount of information that two random variables share. It provides a quantitative measure of the dependence or association between the variables, revealing how much knowing the value of one variable can reduce uncertainty about the other.
The mutual information is calculated using the Kullback-Leibler divergence, which is defined as:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}$$

where $P$ and $Q$ are two probability distributions over the same space. And the mutual information is defined as:

$$I(X;Y) = D_{\mathrm{KL}}\big(p(x,y) \,\|\, p(x)\,p(y)\big) = \sum_{x}\sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x,y)$ is the joint distribution of $X$ and $Y$, and $p(x)$, $p(y)$ are their marginal distributions.
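For example, if $Y$ is an exact copy of a uniformly random bit $X$, then $I(X;Y) = 1$ bit: knowing $Y$ removes all uncertainty about $X$. If $X$ and $Y$ are independent, then $p(x,y) = p(x)p(y)$ and $I(X;Y) = 0$.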
Conditional entropy is a measure of the amount of information needed to describe the outcome of a random variable $Y$ given that the value of another random variable $X$ is known. The conditional entropy is defined as:

$$H(Y|X) = -\sum_{x}\sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)}$$

where $p(x,y)$ is the joint distribution of $X$ and $Y$, and $p(x)$ is the marginal distribution of $X$.
There is a relation between entropy, mutual information, and conditional entropy:

$$H(X|Y) = H(X) - I(X;Y) \tag{1}$$
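As a quick numerical sanity check of Eqn. (1), the following sketch uses an arbitrary made-up joint distribution of two binary variables (not data from this project):

```python
# A minimal numerical check of Eqn. (1), H(X|Y) = H(X) - I(X;Y),
# on a made-up joint distribution of two binary variables.
import numpy as np

# Joint probability table p(x, y); rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

def h(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

H_x = h(p_xy.sum(axis=1))       # H(X) from the marginal p(x)
H_y = h(p_xy.sum(axis=0))       # H(Y) from the marginal p(y)
H_joint = h(p_xy.ravel())       # joint entropy H(X, Y)
H_x_given_y = H_joint - H_y     # H(X|Y) = H(X,Y) - H(Y)
I_xy = H_x + H_y - H_joint      # I(X;Y) = H(X) + H(Y) - H(X,Y)

assert np.isclose(H_x_given_y, H_x - I_xy)
print(H_x, H_x_given_y, I_xy)   # 1.0, ~0.722, ~0.278
```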
There are three elements in encryption: plaintext, ciphertext, and key. Plaintext is the original message. Ciphertext is the encrypted message. Key is the secret used to encrypt the plaintext. The encryption process is a function that maps the plaintext and key to the ciphertext. The decryption process is a function that maps the ciphertext and key to the plaintext.
Firstly, we consider single-bit encryption. To make sure that once we know the ciphertext and the key we can recover the plaintext, the encryption function must be a bijection in the plaintext for every fixed key. Since the only bijections on $\{0, 1\}$ are the identity and the negation, the possible encryption functions are XOR, $C = P \oplus K$, and XNOR, $C = P \oplus K \oplus 1$.
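A quick sketch enumerating these two functions and verifying that each is a bijection for every key:

```python
# The single-bit encryption functions: for each 1-bit key, encryption
# must be a bijection on {0, 1}, leaving only XOR (C = P ^ K) and
# XNOR (C = P ^ K ^ 1).
for name, enc in [("XOR", lambda p, k: p ^ k),
                  ("XNOR", lambda p, k: p ^ k ^ 1)]:
    for k in (0, 1):
        mapping = {p: enc(p, k) for p in (0, 1)}
        assert sorted(mapping.values()) == [0, 1]  # bijection on {0, 1}
        print(name, "key =", k, "->", mapping)
```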
Before we take multi-bit encryption into account, there are two important properties of the operation of a secure cipher identified by Shannon: confusion and diffusion. Confusion means that the relationship between the ciphertext and the key must be complex and involved. Diffusion means that the statistical structure of the plaintext must be dissipated into long-range statistics of the ciphertext.
There are two common ways to achieve confusion and diffusion: the Substitution-Permutation Network (SP-Network) and the Feistel cipher.
A sketch of an SP-Network with 3 rounds. Cited from Wikipedia.

A sketch of a Feistel cipher. Cited from Wikipedia.
The example code can also be found in the src folder.
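To make the Feistel structure concrete, here is a minimal sketch (this is NOT the example code in the src folder; the round function F is a toy choice of my own). The key point it demonstrates is that decryption just runs the rounds in reverse, and this works for any F:

```python
def F(half, key):
    # Toy round function; the Feistel structure is invertible no
    # matter what F is, so any function of (half, key) works here.
    return (half * 31 + key) % 256

def feistel_encrypt(left, right, round_keys):
    """Encrypt a pair of 8-bit halves, one round per key."""
    for k in round_keys:
        left, right = right, left ^ F(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys):
    """Undo the rounds in reverse order."""
    for k in reversed(round_keys):
        left, right = right ^ F(left, k), left
    return left, right

keys = [3, 141, 59]
c = feistel_encrypt(0x12, 0x34, keys)
assert feistel_decrypt(*c, keys) == (0x12, 0x34)
```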
All we have discussed above treats encryption in terms of bits, i.e. a binary system. Now suppose the data is built from a larger basis set, e.g. $\{a, b, c\}$.

To make it clear, we can construct a function mapping the plaintext with the key to the ciphertext in tabular form, in which each ciphertext symbol must appear exactly once in every row and column (a Latin square), so that the plaintext is recoverable from the ciphertext and the key.

Given a 3-element basis set $\{a, b, c\}$:

| $P \backslash K$ | $a$ | $b$ | $c$ |
|---|---|---|---|
| $a$ | $a$ | $b$ | $c$ |
| $b$ | $b$ | $c$ | $a$ |
| $c$ | $c$ | $a$ | $b$ |

And a general form of the encryption function can be represented as modular addition over the basis indices:

$$E(p_i, k_j) = c_{(i+j) \bmod n}$$

where $n$ is the size of the basis set and $i$, $j$ index the plaintext and key symbols, from which we can see that for $n = 2$ this reduces exactly to the single-bit XOR cipher.
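A minimal sketch of this modular-addition cipher on the 3-element basis set above (the symbols a, b, c are just placeholders); printing the table reproduces the Latin square:

```python
basis = ["a", "b", "c"]
n = len(basis)

def encrypt(p, k):
    # C = basis[(i + j) mod n], with i, j the indices of P and K.
    return basis[(basis.index(p) + basis.index(k)) % n]

def decrypt(c, k):
    # Inverse: P = basis[(i - j) mod n].
    return basis[(basis.index(c) - basis.index(k)) % n]

for p in basis:
    print(p, "->", [encrypt(p, k) for k in basis])
    for k in basis:
        assert decrypt(encrypt(p, k), k) == p
```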
To realize perfect secrecy in this way, we need a one-time pad (OTP), which has been proved mathematically to achieve perfect secrecy and quantum resistance. However, it still has restrictions in practical applications, such as the pre-shared key, the key length, and key distribution (P.S. Quantum Key Distribution (QKD) may be helpful for OTP).
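A minimal OTP sketch over bytes (assuming a truly random key, as long as the message, and never reused):

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length; encrypts and decrypts."""
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"hello"
key = secrets.token_bytes(len(message))     # fresh random pad
ciphertext = otp_xor(message, key)
assert otp_xor(ciphertext, key) == message  # XOR is its own inverse
```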
Nowadays, what we usually use in daily communication is asymmetric encryption (also known as public-key cryptography), such as RSA, Elliptic Curve Cryptography (ECC), and so on. In this case, we have two keys: a public key and a private key. The public key is used to encrypt the data, and the private key is used to decrypt the data. The public key is usually published, while the private key is only known to the owner.
The encryption and decryption functions will no longer be linear functions like the symmetric ones, but nonlinear functions. The encryption function is usually a trapdoor function, which is easy to compute in one direction but hard to compute in the opposite direction without the private key. Therefore, we can use asymmetric encryption to exchange the symmetric key, and then use symmetric encryption to encrypt the data.
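As an illustration of the trapdoor idea, here is a toy RSA sketch with tiny textbook primes (p = 61, q = 53); real RSA uses primes thousands of bits long plus padding schemes, so this is a conceptual sketch only:

```python
# Toy RSA: anyone can encrypt with (e, n), but decrypting requires d,
# which is easy to find only if you know the factorization of n.
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent, coprime to phi
d = pow(e, -1, phi)       # private exponent (the trapdoor); Python 3.8+

m = 42                    # a message, encoded as a number < n
c = pow(m, e, n)          # encrypt with the public key (e, n)
assert pow(c, d, n) == m  # decrypt with the private key (d, n)
```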
However, it has been proved that RSA is not quantum resistant, which means that a quantum computer can break RSA in polynomial time with Shor's algorithm. Nowadays, we have post-quantum cryptography (PQC) to resist attacks from both classical and quantum computers.
We constructed a 4-bit XOR cipher as an example to show how to calculate the entropy of the encrypted data, the mutual information, and the conditional entropy, illustrating the relation between plaintext, key, and ciphertext.

In addition, to show the fractal structure of XOR encryption, we also constructed 10-bit and 8-bit XOR ciphers.

Note: the base of the logarithm we used is 2, so entropy is measured in bits.
To start with the 4-bit cipher, the following figure shows the ciphertext as the result of the XOR operation between plaintext and key. The color represents the ciphertext (which can also be read directly from the number on each cell). We can see that the ciphertext is equally distributed over every element of the basis set. The result is like Sudoku (數獨): no ciphertext value appears twice in any row or column.
4-Bit Ciphertext Heat Map
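A sketch reproducing the table behind this figure: C = P XOR K over all 16 × 16 combinations, with a check of the Sudoku-like (Latin square) property:

```python
# C = P XOR K for all 4-bit plaintexts and keys; every row and every
# column is a permutation of 0..15, i.e. a Latin square.
import numpy as np

table = np.array([[p ^ k for k in range(16)] for p in range(16)])
for row in table:
    assert sorted(row) == list(range(16))  # each row is a permutation
for col in table.T:
    assert sorted(col) == list(range(16))  # each column too
print(table)
```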
To prepare for the information-theoretic calculations later, we first made a bar plot of the entropy over the 4-bit basis set to give a clearer view. In the figure, we can see that the entropy reaches its maximum when the numbers of 0s and 1s are equal, and it is at its minimum when all the bits are 0 or all are 1.
4-Bit Entropy Bar Plot
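A sketch of the quantity plotted here, namely the entropy of the 0/1 distribution within each 4-bit state:

```python
# Entropy (in bits) of the bit distribution inside each 4-bit state:
# e.g. "0110" gives H = 1 bit, while "0000" gives H = 0.
import numpy as np

def bit_entropy(state: int, bits: int = 4) -> float:
    p = bin(state).count("1") / bits   # fraction of 1-bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for s in range(16):
    print(f"{s:04b}", round(bit_entropy(s), 4))
```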
The entropy of the ciphertext is shown in the following figure. It turns into a four-fold symmetry instead of a fractal structure.
4-Bit Entropy of Ciphertext Heat Map
Then the mutual information between C and P and between C and K is shown below. We can see that the two plots are a quarter turn of each other. Also, the mutual information is small over most of the domain, which means that the correlation between the ciphertext and the plaintext or key is weak.
4-Bit Mutual Information between (C, P) and (C, K)
With Eqn. (1), we can calculate the conditional entropy for the four kinds of relations. Let us first look at the conditional entropy of the ciphertext when the plaintext or the key is known. Consistent with the mutual information we obtained, there is a cross-like pattern, and the two plots are a quarter turn of each other.
Conditional Entropy of Ciphertext when Plaintext or Key is Known
When the ciphertext is known, the conditional entropies of the plaintext and the key are shown below. We can see that they are the same, which means that the plaintext and the key are equally hard to recover.
Conditional Entropy of Plaintext and Key when Ciphertext is Known
We can also compare the average entropy of the 4-bit cipher and the 8-bit cipher. We assume that every state (bit combination) is equally probable, and we know that the entropy of a binary distribution lies between 0 and 1 bit. The result is shown in the table below.
| Bits | $\langle H(P) \rangle$ | $\langle H(C) \rangle$ | $\langle H(P \mid C) \rangle = \langle H(K \mid C) \rangle$ |
|---|---|---|---|
| 4 | 0.7806 | 0.7806 | 0.5434 |
| 8 | 0.9024 | 0.9024 | 0.7788 |

(All entropies are in bits.)
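The first two columns can be reproduced by averaging the bitwise entropy over all $2^n$ equally likely $n$-bit states; a sketch:

```python
# Average bitwise entropy over all 2^n equally likely n-bit states,
# grouped by the number k of 1-bits (C(n, k) states share the same H).
import numpy as np
from math import comb

def avg_entropy(bits: int) -> float:
    total = 0.0
    for k in range(bits + 1):
        p = k / bits
        h = 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        total += comb(bits, k) * h
    return total / 2**bits

print(round(avg_entropy(4), 4))  # 0.7806
print(round(avg_entropy(8), 4))  # 0.9024
```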
We found that, for equidistributed keys, the average entropy of the ciphertexts is the same as the average entropy of the basis set. However, once we know the ciphertext, the average (conditional) entropy of the plaintexts and keys is lowered.

That sounds weird: since we encrypt the data, it should become more random and the entropy should increase, yet the result shows that the entropy of the plaintext and key decreases. The reason is that the plaintext should NOT be equidistributed! The plaintext we want to send must be meaningful and full of "information".

On the other hand, what we discussed here is single-character (but multi-bit) encryption. In a real case, we would send a sequence of characters instead of a single character. The average entropy we calculated here only represents the case where we send a random plaintext with a random key, all with equal probability.

Therefore, if we apply a randomly distributed key to a "full-of-info" plaintext, it will make the ciphertext random instead of preserving the regularity of the plaintext, which results in an increase of entropy from plaintext to ciphertext.
Based on the results, we can conclude that the properties of a good encryption are as follows:

- The ciphertext should be equally distributed over the basis set, so that its entropy is as large as possible.
- The mutual information between the ciphertext and the plaintext, and between the ciphertext and the key, should be as small as possible, so that knowing the ciphertext barely reduces the uncertainty about either.
- The key should be random (equidistributed), so that the regularity of a meaningful plaintext does not leak into the ciphertext.
In the following, I will list the code I used in the Jupyter notebook of methods. However, it does NOT include some other handy code I have written that may be helpful in research on information theory.

All the source code can be found in my GitHub repo: Jim137/Entropy: Entropy of encrypted data
def basis_get(data) -> list:
    """Get the basis of the data, i.e. its unique elements in order of first appearance."""
    basis = []
    for char in data:
        if char not in basis:
            basis.append(char)
    return basis
import numpy as np
from typing import Union

from .utils.basis_get import basis_get


def entropy(
    data: Union[str, bytes, list, tuple],
    base=2,
    basis_dict: Union[set, list, tuple, None] = None,
) -> float:
    """
    Return the entropy of given data.

    Parameters
    ----------
    data: str, bytes, or iterable
    base: int
        The logarithmic base to use, default is 2.
    basis_dict: list
        The basis of the entropy, default is None. If None, the basis
        will be the unique elements in data.

    Returns
    -------
    entropy: float
        The entropy of given data.
    """
    # Normalize the input to a type that supports .count().
    if isinstance(data, bytes):
        data = list(data)
    elif not isinstance(data, (str, list, tuple)):
        data = str(data)
    if basis_dict is None:
        basis_dict = basis_get(data)
    # Empirical probability of each basis element.
    prob_list = [data.count(item) / len(data) for item in basis_dict]
    # Shannon entropy; zero-probability terms contribute nothing.
    result = 0.0
    for prob in prob_list:
        if prob > 0:
            result -= prob * np.log(prob) / np.log(base)
    return result
import numpy as np
from typing import Union

from sklearn.metrics import mutual_info_score


def mutual_info(
    dataX: Union[str, bytes, list, tuple],
    dataY: Union[str, bytes, list, tuple],
    base=2,
) -> float:
    """
    Return the mutual information of given dataX and dataY.

    Parameters
    ----------
    dataX: str, bytes, or iterable
    dataY: str, bytes, or iterable
    base: int
        The logarithmic base to use, default is 2.

    Returns
    -------
    mi: float
        The mutual information of given dataX and dataY.
    """
    # Normalize both inputs to lists of hashable labels.
    x = list(dataX)
    y = list(dataY)
    if len(x) != len(y):
        raise ValueError("The length of dataX and dataY should be equal.")
    # mutual_info_score returns nats; convert to the requested base.
    return mutual_info_score(x, y) / np.log(base)
from typing import Union

from .entropy import entropy
from .mutual_info import mutual_info


def con_entropy(
    dataX: Union[str, bytes, list, tuple],
    dataY: Union[str, bytes, list, tuple],
    base=2,
    basis_dict: Union[set, list, tuple, None] = None,
) -> float:
    """Return the conditional entropy H(X|Y) = H(X) - I(X;Y), cf. Eqn. (1)."""
    return entropy(dataX, base, basis_dict=basis_dict) - mutual_info(
        dataX, dataY, base
    )
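A hypothetical usage example (the import line is an assumption; adjust it to the actual package layout of the repo). For the full 4-bit XOR table with an equidistributed key, the mutual information I(C;P) vanishes, which is the perfect-secrecy situation discussed above:

```python
# Assumed import path; adjust to the repo's actual layout.
from src import entropy, mutual_info, con_entropy

# All 256 (plaintext, key) pairs of the 4-bit XOR cipher.
P = [p for p in range(16) for _ in range(16)]
K = [k for _ in range(16) for k in range(16)]
C = [p ^ k for p, k in zip(P, K)]

print(entropy(C))         # H(C) = 4 bits: C is uniform over the basis
print(mutual_info(C, P))  # I(C;P) = 0 for an equidistributed key
print(con_entropy(C, P))  # H(C|P) = H(C) - I(C;P) = 4 bits
```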
In the Jupyter notebook of methods, I have presented a special characteristic of XOR, which is a bitwise fractal. However, I cannot give a good interpretation of it, and it would exceed the scope of the topic, so I do not put it in the main text. Still, I want to show how cool it is; therefore, I put it in the appendix.
We can clearly see that the following figure, the 10-bit cipher, presents a fractal pattern.
The unit structure of the fractal is XOR, for which we prepared an XOR truth table below: the cells where P = K give 0, while the cells where P ≠ K give 1.

| $P \backslash K$ | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

The figure above shows this XOR behavior in the structure at every power-of-two scale.
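A sketch that renders the full 10-bit XOR table as a heat map (axis labels and colormap are my choices, so the appearance may differ from the original figure):

```python
# Render the 10-bit XOR ciphertext table; the nested two-power-scale
# repetition of the 2x2 XOR unit produces the fractal pattern.
import numpy as np
import matplotlib.pyplot as plt

n = 10
size = 2**n
table = np.bitwise_xor.outer(np.arange(size), np.arange(size))

plt.imshow(table, cmap="viridis")
plt.xlabel("key")
plt.ylabel("plaintext")
plt.title(f"{n}-bit XOR ciphertext")
plt.show()
```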
We can also see that they still satisfy the properties we concluded above:

- The entropy of the cipher turns into a four-fold symmetry instead of a fractal pattern.
- The mutual information plots are a quarter turn of each other.
- So are the conditional entropy plots when the plaintext or key is known.
- When the ciphertext is known, the conditional entropy plots of the plaintext and the key are the same.
All the properties above are the same for the 4-bit, 8-bit, and 10-bit cases, but for a higher number of bits, the fractal pattern is more significant. I think this is a cool phenomenon, and that is why I wanted to introduce it.