# Assignment 0. Information theory

## Mustafin Timur

### Legend

Black dashed line — average entropy
Each bar stands for one file's entropy

### .doc files

![](https://i.imgur.com/wbXkj7x.png)

Entropy for this type of file differs a lot from one file to another. This means these files have no compression or poor compression (the variance comes from their different contents).

### .exe files

![](https://i.imgur.com/b1MZ2lz.png)

Entropy also differs from one file to another, but the variance is smaller than with `.doc`. Most probably, some `.exe` files have been compressed/optimized (entropy ~8) and others have not.

### .jpg files

![](https://i.imgur.com/kBCrxgu.png)

Variance is low; entropy is high and nearly constant. This means `.jpg` files are well compressed.

### .pdf files

![](https://i.imgur.com/oWoXduy.png)

Variance is low and entropy is high, except for a few files. This means most `.pdf` files are well compressed; the outliers are files that were not compressed/optimized.

### .png files

![](https://i.imgur.com/fFnHszr.png)

Variance is low; entropy is high and nearly constant. This means `.png` files are well compressed.

### Average entropy for different types

![](https://i.imgur.com/B8806Lh.png)

As we can see, `.doc` files have the worst compression among these formats. `.exe` files are compressed much better, but not consistently, so their average entropy is lower than that of `.jpg`, `.png` and `.pdf` files, which are approximately the same; `.pdf` is a bit better.

### Special question

Maximum theoretical entropy of a file:

* `H = -1 * sum from i=0 to i=255 (Pi * log2(Pi))`
* with `Pi = 1/256` (uniform distribution) we get the maximum entropy:
* `H = -1 * sum from i=0 to i=255 ((1/256) * log2(1/256))`
* `H = -1 * 256 * (1/256) * log2(1/256)`
* `H = -log2(1/256) = 8` bits per byte
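The per-file entropy values plotted above can be reproduced with a short script. Below is a minimal sketch of the byte-level Shannon entropy calculation; the function name `byte_entropy` is my own, not taken from the assignment code:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0..8)."""
    if not data:
        return 0.0
    counts = Counter(data)       # frequency of each byte value
    n = len(data)
    # H = -sum(p_i * log2(p_i)) over the byte values that actually occur
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A uniform distribution over all 256 byte values reaches the maximum of 8:
uniform = bytes(range(256))
print(byte_entropy(uniform))  # 8.0

# A file of one repeated byte has zero entropy:
print(byte_entropy(b"\x00" * 1024))  # 0.0
```

To score a real file, read it in binary mode (`byte_entropy(open(path, "rb").read())`) and plot one bar per file, as in the charts above.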