In the following, I will introduce some basic concepts used in this project.
Information is a measure of the uncertainty of an outcome. It is related to the amount of data required to specify the outcome of an event: the more uncertain an outcome is, the more information is required to resolve that uncertainty.
The information is calculated using the Shannon information, which is defined as:

$$I(x) = -\log_2 p(x)$$

where $p(x)$ is the probability of the outcome $x$. With base 2, the information is measured in bits.
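For example, observing a fair coin toss, where each outcome has $p = \tfrac{1}{2}$, yields $I = -\log_2 \tfrac{1}{2} = 1$ bit, while an outcome with probability $\tfrac{1}{8}$ carries $-\log_2 \tfrac{1}{8} = 3$ bits.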
Entropy is a fundamental concept in information theory that quantifies the uncertainty or randomness associated with a random variable. It measures the average amount of information required to describe or encode an event or a set of outcomes.
The entropy is calculated using the Shannon entropy, which is defined as:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

where $p(x)$ is the probability of the outcome $x$ of the random variable $X$.
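For instance, a fair coin has $H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$ bit, a biased coin with $p = 0.25$ has $H \approx 0.811$ bits, and a certain outcome ($p = 0$ or $1$) has $H = 0$.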
Mutual information is a fundamental concept in information theory and statistics that measures the amount of information that two random variables share. It provides a quantitative measure of the dependence or association between the variables, revealing how much knowing the value of one variable can reduce uncertainty about the other.
The mutual information is calculated using the Kullback-Leibler divergence, which is defined as:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}$$

where $P$ and $Q$ are two probability distributions over the same space. And the mutual information is defined as:

$$I(X;Y) = D_{\mathrm{KL}}\big(p(x,y) \,\|\, p(x)\,p(y)\big) = \sum_{x}\sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$$

where $p(x,y)$ is the joint distribution of $X$ and $Y$, and $p(x)$, $p(y)$ are their marginal distributions.
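For example, if $Y$ is an exact copy of a uniformly random bit $X$, then $I(X;Y) = 1$ bit: knowing $Y$ removes all uncertainty about $X$. If $X$ and $Y$ are independent, then $p(x,y) = p(x)p(y)$ and $I(X;Y) = 0$.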
Conditional entropy is a measure of the amount of information needed to describe the outcome of a random variable $Y$ given that the value of another random variable $X$ is known. The conditional entropy is defined as:

$$H(Y|X) = -\sum_{x}\sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)}$$

where $p(x,y)$ is the joint distribution of $X$ and $Y$, and $p(x)$ is the marginal distribution of $X$.
There is a relation between entropy, mutual information, and conditional entropy:

$$H(X|Y) = H(X) - I(X;Y) \tag{1}$$
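As a quick numerical sanity check of Eqn. (1), the following sketch uses an arbitrary made-up joint distribution of two binary variables (not data from this project):

```python
# A minimal numerical check of Eqn. (1), H(X|Y) = H(X) - I(X;Y),
# on a made-up joint distribution of two binary variables.
import numpy as np

# Joint probability table p(x, y); rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

def h(p):
    """Shannon entropy (in bits) of a probability vector."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

H_x = h(p_xy.sum(axis=1))       # H(X) from the marginal p(x)
H_y = h(p_xy.sum(axis=0))       # H(Y) from the marginal p(y)
H_joint = h(p_xy.ravel())       # joint entropy H(X, Y)
H_x_given_y = H_joint - H_y     # H(X|Y) = H(X,Y) - H(Y)
I_xy = H_x + H_y - H_joint      # I(X;Y) = H(X) + H(Y) - H(X,Y)

assert np.isclose(H_x_given_y, H_x - I_xy)
print(H_x, H_x_given_y, I_xy)   # 1.0, ~0.722, ~0.278
```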
There are three elements in encryption: plaintext, ciphertext, and key. Plaintext is the original message. Ciphertext is the encrypted message. Key is the secret used to encrypt the plaintext. The encryption process is a function that maps the plaintext and key to the ciphertext. The decryption process is a function that maps the ciphertext and key to the plaintext.
Firstly, we consider single-bit encryption. To make sure that once we know the ciphertext and the key we can recover the plaintext, the encryption function must be a bijection in the plaintext for every fixed key. Since the only bijections on $\{0, 1\}$ are the identity and the negation, the possible encryption functions are XOR, $C = P \oplus K$, and XNOR, $C = P \oplus K \oplus 1$.
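A quick sketch enumerating these two functions and verifying that each is a bijection for every key:

```python
# The single-bit encryption functions: for each 1-bit key, encryption
# must be a bijection on {0, 1}, leaving only XOR (C = P ^ K) and
# XNOR (C = P ^ K ^ 1).
for name, enc in [("XOR", lambda p, k: p ^ k),
                  ("XNOR", lambda p, k: p ^ k ^ 1)]:
    for k in (0, 1):
        mapping = {p: enc(p, k) for p in (0, 1)}
        assert sorted(mapping.values()) == [0, 1]  # bijection on {0, 1}
        print(name, "key =", k, "->", mapping)
```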
Before we take multi-bit encryption into account, there are two important properties of the operation of a secure cipher identified by Shannon: confusion and diffusion. Confusion means that the relationship between the ciphertext and the key must be complex and involved. Diffusion means that the statistical structure of the plaintext must be dissipated into long-range statistics of the ciphertext.
There are two common ways to achieve confusion and diffusion: the Substitution-Permutation Network (SP-Network) and the Feistel cipher.
A sketch of an SP-Network with 3 rounds. Cited from Wikipedia.

A sketch of a Feistel cipher. Cited from Wikipedia.
The example code can also be found in the src folder.
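To make the Feistel structure concrete, here is a minimal sketch (this is NOT the example code in the src folder; the round function F is a toy choice of my own). The key point it demonstrates is that decryption just runs the rounds in reverse, and this works for any F:

```python
def F(half, key):
    # Toy round function; the Feistel structure is invertible no
    # matter what F is, so any function of (half, key) works here.
    return (half * 31 + key) % 256

def feistel_encrypt(left, right, round_keys):
    """Encrypt a pair of 8-bit halves, one round per key."""
    for k in round_keys:
        left, right = right, left ^ F(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys):
    """Undo the rounds in reverse order."""
    for k in reversed(round_keys):
        left, right = right ^ F(left, k), left
    return left, right

keys = [3, 141, 59]
c = feistel_encrypt(0x12, 0x34, keys)
assert feistel_decrypt(*c, keys) == (0x12, 0x34)
```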
All we have discussed above treats encryption in terms of bits, i.e. a binary system. Now suppose the data is built from a larger basis set, e.g. $\{a, b, c\}$.

To make it clear, we can construct a function mapping the plaintext with the key to the ciphertext in tabular form, in which each ciphertext symbol must appear exactly once in every row and column (a Latin square), so that the plaintext is recoverable from the ciphertext and the key.

Given a 3-element basis set $\{a, b, c\}$:

| $P \backslash K$ | $a$ | $b$ | $c$ |
|---|---|---|---|
| $a$ | $a$ | $b$ | $c$ |
| $b$ | $b$ | $c$ | $a$ |
| $c$ | $c$ | $a$ | $b$ |

And a general form of the encryption function can be represented as modular addition over the basis indices:

$$E(p_i, k_j) = c_{(i+j) \bmod n}$$

where $n$ is the size of the basis set and $i$, $j$ index the plaintext and key symbols, from which we can see that for $n = 2$ this reduces exactly to the single-bit XOR cipher.
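A minimal sketch of this modular-addition cipher on the 3-element basis set above (the symbols a, b, c are just placeholders); printing the table reproduces the Latin square:

```python
basis = ["a", "b", "c"]
n = len(basis)

def encrypt(p, k):
    # C = basis[(i + j) mod n], with i, j the indices of P and K.
    return basis[(basis.index(p) + basis.index(k)) % n]

def decrypt(c, k):
    # Inverse: P = basis[(i - j) mod n].
    return basis[(basis.index(c) - basis.index(k)) % n]

for p in basis:
    print(p, "->", [encrypt(p, k) for k in basis])
    for k in basis:
        assert decrypt(encrypt(p, k), k) == p
```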
To realize perfect secrecy in this way, we need a one-time pad (OTP), which has been proved mathematically to achieve perfect secrecy and quantum resistance. However, it still has restrictions in practical applications, such as the pre-shared key, the key length, and key distribution (P.S. Quantum Key Distribution (QKD) may be helpful for OTP).
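A minimal OTP sketch over bytes (assuming a truly random key, as long as the message, and never reused):

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length; encrypts and decrypts."""
    assert len(key) == len(data)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"hello"
key = secrets.token_bytes(len(message))     # fresh random pad
ciphertext = otp_xor(message, key)
assert otp_xor(ciphertext, key) == message  # XOR is its own inverse
```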
Nowadays, what we usually use in daily communication is asymmetric encryption (also known as public-key cryptography), such as RSA, Elliptic Curve Cryptography (ECC), and so on. In this case, we have two keys: a public key and a private key. The public key is used to encrypt the data, and the private key is used to decrypt the data. The public key is usually published, while the private key is only known to the owner.
The encryption and decryption functions will no longer be linear functions like the symmetric ones, but nonlinear functions. The encryption function is usually a trapdoor function, which is easy to compute in one direction but hard to compute in the opposite direction without the private key. Therefore, we can use asymmetric encryption to exchange the symmetric key, and then use symmetric encryption to encrypt the data.
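As an illustration of the trapdoor idea, here is a toy RSA sketch with tiny textbook primes (p = 61, q = 53); real RSA uses primes thousands of bits long plus padding schemes, so this is a conceptual sketch only:

```python
# Toy RSA: anyone can encrypt with (e, n), but decrypting requires d,
# which is easy to find only if you know the factorization of n.
p, q = 61, 53
n = p * q                 # public modulus
phi = (p - 1) * (q - 1)
e = 17                    # public exponent, coprime to phi
d = pow(e, -1, phi)       # private exponent (the trapdoor); Python 3.8+

m = 42                    # a message, encoded as a number < n
c = pow(m, e, n)          # encrypt with the public key (e, n)
assert pow(c, d, n) == m  # decrypt with the private key (d, n)
```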
However, it has been proved that RSA is not quantum resistant, which means that a quantum computer can break RSA in polynomial time with Shor's algorithm. Nowadays, we have post-quantum cryptography (PQC) to resist attacks from both classical and quantum computers.
We constructed a 4-bit XOR cipher as an example to show how to calculate the entropy of the encrypted data, the mutual information, and the conditional entropy, illustrating the relation between plaintext, key, and ciphertext.

In addition, to show the fractal structure of XOR encryption, we also constructed 10-bit and 8-bit XOR ciphers.

Note: the base of the logarithm we used is 2, so entropy is measured in bits.
To start with the 4-bit cipher, the following figure shows the ciphertext as the result of the XOR operation between plaintext and key. The color represents the ciphertext (which can also be read directly from the number on each cell). We can see that the ciphertext is equally distributed over every element of the basis set. The result is like Sudoku (數獨): no ciphertext value appears twice in any row or column.
4-Bit Ciphertext Heat Map
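A sketch reproducing the table behind this figure: C = P XOR K over all 16 × 16 combinations, with a check of the Sudoku-like (Latin square) property:

```python
# C = P XOR K for all 4-bit plaintexts and keys; every row and every
# column is a permutation of 0..15, i.e. a Latin square.
import numpy as np

table = np.array([[p ^ k for k in range(16)] for p in range(16)])
for row in table:
    assert sorted(row) == list(range(16))  # each row is a permutation
for col in table.T:
    assert sorted(col) == list(range(16))  # each column too
print(table)
```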
To prepare for the information-theoretic calculations later, we first made a bar plot of the entropy over the 4-bit basis set to give a clearer view. In the figure, we can see that the entropy reaches its maximum when the numbers of 0s and 1s are equal, and it is at its minimum when all the bits are 0 or all are 1.
4-Bit Entropy Bar Plot
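A sketch of the quantity plotted here, namely the entropy of the 0/1 distribution within each 4-bit state:

```python
# Entropy (in bits) of the bit distribution inside each 4-bit state:
# e.g. "0110" gives H = 1 bit, while "0000" gives H = 0.
import numpy as np

def bit_entropy(state: int, bits: int = 4) -> float:
    p = bin(state).count("1") / bits   # fraction of 1-bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

for s in range(16):
    print(f"{s:04b}", round(bit_entropy(s), 4))
```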
The entropy of the ciphertext is shown in the following figure. It turns into a four-fold symmetry instead of a fractal structure.
4-Bit Entropy of Ciphertext Heat Map
Then the mutual information between C and P and between C and K is shown below. We can see that the two plots are a quarter turn of each other. Also, the mutual information is small over most of the domain, which means that the correlation between the ciphertext and the plaintext or key is weak.
4-Bit Mutual Information between (C, P) and (C, K)
With Eqn. (1), we can calculate the conditional entropy for the four kinds of relations. Let us first look at the conditional entropy of the ciphertext when the plaintext or the key is known. Consistent with the mutual information we obtained, there is a cross-like pattern, and the two plots are a quarter turn of each other.
Conditional Entropy of Ciphertext when Plaintext or Key is Known
When the ciphertext is known, the conditional entropies of the plaintext and the key are shown below. We can see that they are the same, which means that the plaintext and the key are equally hard to recover.
Conditional Entropy of Plaintext and Key when Ciphertext is Known
We can also compare the average entropy of the 4-bit cipher and the 8-bit cipher. We assume that every state (bit combination) is equally probable, and we know that the entropy of a binary distribution lies between 0 and 1 bit. The result is shown in the table below.
| Bits | $\langle H(P) \rangle$ | $\langle H(C) \rangle$ | $\langle H(P \mid C) \rangle = \langle H(K \mid C) \rangle$ |
|---|---|---|---|
| 4 | 0.7806 | 0.7806 | 0.5434 |
| 8 | 0.9024 | 0.9024 | 0.7788 |

(All entropies are in bits.)
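The first two columns can be reproduced by averaging the bitwise entropy over all $2^n$ equally likely $n$-bit states; a sketch:

```python
# Average bitwise entropy over all 2^n equally likely n-bit states,
# grouped by the number k of 1-bits (C(n, k) states share the same H).
import numpy as np
from math import comb

def avg_entropy(bits: int) -> float:
    total = 0.0
    for k in range(bits + 1):
        p = k / bits
        h = 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)
        total += comb(bits, k) * h
    return total / 2**bits

print(round(avg_entropy(4), 4))  # 0.7806
print(round(avg_entropy(8), 4))  # 0.9024
```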
We found that, for equidistributed keys, the average entropy of the ciphertexts is the same as the average entropy of the basis set. However, once we know the ciphertext, the average (conditional) entropy of the plaintexts and keys is lowered.

That sounds weird: since we encrypt the data, it should become more random and the entropy should increase, yet the result shows that the entropy of the plaintext and key decreases. The reason is that the plaintext should NOT be equidistributed! The plaintext we want to send must be meaningful and full of "information".

On the other hand, what we discussed here is single-character (but multi-bit) encryption. In a real case, we would send a sequence of characters instead of a single character. The average entropy we calculated here only represents the case where we send a random plaintext with a random key, all with equal probability.

Therefore, if we apply a randomly distributed key to a "full-of-info" plaintext, it will make the ciphertext random instead of preserving the regularity of the plaintext, which results in an increase of entropy from plaintext to ciphertext.
Based on the results, we can conclude that the properties of a good encryption are as follows:

- The ciphertext should be equally distributed over the basis set, so that its entropy is as large as possible.
- The mutual information between the ciphertext and the plaintext, and between the ciphertext and the key, should be as small as possible, so that knowing the ciphertext barely reduces the uncertainty about either.
- The key should be random (equidistributed), so that the regularity of a meaningful plaintext does not leak into the ciphertext.
In the following, I will list the code I used in the Jupyter notebook of methods. However, it does NOT include some other handy code I have written that may be helpful in research on information theory.

All the source code can be found in my GitHub repo: Jim137/Entropy: Entropy of encrypted data
def basis_get(data) -> list:
    """Get the basis of the data, i.e. its unique elements in order of first appearance."""
    basis = []
    for char in data:
        if char not in basis:
            basis.append(char)
    return basis
import numpy as np
from typing import Union

from .utils.basis_get import basis_get


def entropy(
    data: Union[str, bytes, list, tuple],
    base=2,
    basis_dict: Union[set, list, tuple, None] = None,
) -> float:
    """
    Return the entropy of given data.

    Parameters
    ----------
    data: str, bytes, or iterable
    base: int
        The logarithmic base to use, default is 2.
    basis_dict: list
        The basis of the entropy, default is None. If None, the basis
        will be the unique elements in data.

    Returns
    -------
    entropy: float
        The entropy of given data.
    """
    # Normalize the input to a type that supports .count().
    if isinstance(data, bytes):
        data = list(data)
    elif not isinstance(data, (str, list, tuple)):
        data = str(data)
    if basis_dict is None:
        basis_dict = basis_get(data)
    # Empirical probability of each basis element.
    prob_list = [data.count(item) / len(data) for item in basis_dict]
    # Shannon entropy; zero-probability terms contribute nothing.
    result = 0.0
    for prob in prob_list:
        if prob > 0:
            result -= prob * np.log(prob) / np.log(base)
    return result
import numpy as np
from typing import Union

from sklearn.metrics import mutual_info_score


def mutual_info(
    dataX: Union[str, bytes, list, tuple],
    dataY: Union[str, bytes, list, tuple],
    base=2,
) -> float:
    """
    Return the mutual information of given dataX and dataY.

    Parameters
    ----------
    dataX: str, bytes, or iterable
    dataY: str, bytes, or iterable
    base: int
        The logarithmic base to use, default is 2.

    Returns
    -------
    mi: float
        The mutual information of given dataX and dataY.
    """
    # Normalize both inputs to lists of hashable labels.
    x = list(dataX)
    y = list(dataY)
    if len(x) != len(y):
        raise ValueError("The length of dataX and dataY should be equal.")
    # mutual_info_score returns nats; convert to the requested base.
    return mutual_info_score(x, y) / np.log(base)
from typing import Union

from .entropy import entropy
from .mutual_info import mutual_info


def con_entropy(
    dataX: Union[str, bytes, list, tuple],
    dataY: Union[str, bytes, list, tuple],
    base=2,
    basis_dict: Union[set, list, tuple, None] = None,
) -> float:
    """Return the conditional entropy H(X|Y) = H(X) - I(X;Y), cf. Eqn. (1)."""
    return entropy(dataX, base, basis_dict=basis_dict) - mutual_info(
        dataX, dataY, base
    )
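A hypothetical usage example (the import line is an assumption; adjust it to the actual package layout of the repo). For the full 4-bit XOR table with an equidistributed key, the mutual information I(C;P) vanishes, which is the perfect-secrecy situation discussed above:

```python
# Assumed import path; adjust to the repo's actual layout.
from src import entropy, mutual_info, con_entropy

# All 256 (plaintext, key) pairs of the 4-bit XOR cipher.
P = [p for p in range(16) for _ in range(16)]
K = [k for _ in range(16) for k in range(16)]
C = [p ^ k for p, k in zip(P, K)]

print(entropy(C))         # H(C) = 4 bits: C is uniform over the basis
print(mutual_info(C, P))  # I(C;P) = 0 for an equidistributed key
print(con_entropy(C, P))  # H(C|P) = H(C) - I(C;P) = 4 bits
```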
In the Jupyter notebook of methods, I have presented a special characteristic of XOR, which is a bitwise fractal. However, I cannot give a good interpretation of it, and it would exceed the scope of the topic, so I do not put it in the main text. Still, I want to show how cool it is; therefore, I put it in the appendix.
We can clearly see that the following figure, the 10-bit cipher, presents a fractal pattern.
The unit structure of the fractal is XOR, for which we prepared an XOR truth table below: the cells where P = K give 0, while the cells where P ≠ K give 1.

| $P \backslash K$ | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 0 |

The figure above shows this XOR behavior in the structure at every power-of-two scale.
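A sketch that renders the full 10-bit XOR table as a heat map (axis labels and colormap are my choices, so the appearance may differ from the original figure):

```python
# Render the 10-bit XOR ciphertext table; the nested two-power-scale
# repetition of the 2x2 XOR unit produces the fractal pattern.
import numpy as np
import matplotlib.pyplot as plt

n = 10
size = 2**n
table = np.bitwise_xor.outer(np.arange(size), np.arange(size))

plt.imshow(table, cmap="viridis")
plt.xlabel("key")
plt.ylabel("plaintext")
plt.title(f"{n}-bit XOR ciphertext")
plt.show()
```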
We can also see that they still satisfy the properties we concluded above:

- The entropy of the cipher turns into a four-fold symmetry instead of a fractal pattern.
- The mutual information plots are a quarter turn of each other.
- So are the conditional entropy plots when the plaintext or key is known.
- When the ciphertext is known, the conditional entropy plots of the plaintext and the key are the same.
All the properties above are the same for the 4-bit, 8-bit, and 10-bit cases, but for a higher number of bits, the fractal pattern is more significant. I think this is a cool phenomenon, and that is why I wanted to introduce it.