# Assignment 0. Information theory
## Mustafin Timur
### Legend
* Black dashed line: average entropy
* Each bar: entropy of one file
### .doc files

Entropy for this file type differs widely from one file to another. This suggests that these files are uncompressed or only weakly compressed (the variance comes from their differing content).
### .exe files

Entropy for this file type also differs from one file to another, but the variance is lower than for .doc. Most probably, some .exe files have been compressed/optimized (entropy ~8) and others have not.
### .jpg files

Variance is low; entropy is high and nearly constant. This means that .jpg files are well compressed.
### .pdf files

Variance is low and entropy is high, except for a few files. This means that .pdf files are generally well compressed. The few low-entropy outliers are most likely files that were not compressed/optimized.
### .png files

Variance is low; entropy is high and nearly constant. This means that .png files are well compressed.
### Average entropy for different types

As we can see, `.doc` files have the worst compression among these formats. `.exe` files are compressed much better, but not consistently, so their average entropy is lower than that of `.jpg`, `.png` and `.pdf` files. Those three are approximately the same, with `.pdf` a bit better.
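The link between compression and high entropy can be checked with a small experiment. The following is only a sketch, not the code used to produce the plots: `byte_entropy` is a hypothetical helper, and the sample text is generated data standing in for an uncompressed file. It measures the byte-level entropy of the same content before and after `zlib` compression.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumption: a small vocabulary repeated many times imitates redundant,
# uncompressed text. The seed is fixed so the sample is reproducible.
words = ["entropy", "file", "byte", "compress", "image", "text", "data", "code"]
rng = random.Random(0)
raw = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

# The same content after DEFLATE compression at the highest level.
packed = zlib.compress(raw, 9)

print(f"raw:    {byte_entropy(raw):.2f} bits/byte, {len(raw)} bytes")
print(f"packed: {byte_entropy(packed):.2f} bits/byte, {len(packed)} bytes")
```

The compressed bytes should come out shorter and with a higher entropy per byte than the raw text, which matches the reading of the plots: formats that are already compressed sit near the top of the scale.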
### Special question
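Maximum theoretical entropy of a file (in bits per byte, with `log2` as the base-2 logarithm):
* `-1 * sigma from i=0 to i=n (Pi * log2(Pi))`, where `Pi` is the probability of byte value `i` and `n` = 255
* with `Pi` = 1/256 (uniform distribution) we will have max entropy
* `-1 * sigma from i=0 to i=255 ((1/256) * log2(1/256))`
* `-1 * 256 * (1/256) * log2(1/256)`
* `-1 * log2(1/256) = log2(256)`
* `8`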
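The result above can be confirmed numerically. This is a minimal sketch: `shannon_entropy` is a hypothetical helper name, and the skewed distribution is an invented example added only for contrast with the uniform case.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution given as probabilities."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Uniform distribution: every one of the 256 byte values is equally likely.
uniform = [1 / 256] * 256

# Invented contrast case: one byte value takes half of the probability mass,
# the other 255 values share the rest equally.
skewed = [0.5] + [0.5 / 255] * 255

print("uniform:", shannon_entropy(uniform))
print("skewed: ", shannon_entropy(skewed))
```

The uniform case gives 8 bits per byte, and any non-uniform distribution over the same 256 values gives less.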