# Information Theory Discussion
Information Theory is pivotal to understanding many theoretical aspects of Machine Learning, yet sadly it isn't discussed nearly as often as it should be. The aim of this discussion is to fill that gap, enabling one to delve deeper into the theoretical side of the field.
We'll start by discussing the essence of Information Theory, then move on to entropy and divergence (with a focus on KL divergence).
Relevant applications of Information Theory to Machine Learning will also be covered as the discussion progresses.
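As a small taste of what we'll cover, here is a minimal NumPy sketch of Shannon entropy and KL divergence for discrete distributions (base-2 logarithms, so values are in bits). The function names and example distributions are purely illustrative, not part of any of the resources below.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    """KL divergence D_KL(p || q) = sum_i p_i log2(p_i / q_i), in bits."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# A fair coin carries 1 bit of entropy; a biased coin carries less.
print(entropy([0.5, 0.5]))            # 1.0
print(entropy([0.9, 0.1]))            # ~0.469

# KL divergence is asymmetric and zero only when the distributions match.
p, q = [0.5, 0.5], [0.9, 0.1]
print(kl_divergence(p, q))            # ~0.737
print(kl_divergence(q, p))            # ~0.531
print(kl_divergence(p, p))            # 0.0
```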
The resources I will be using are:
* [Paper on Divergence, Entropy and Information by Philip S. Chodrow](https://arxiv.org/pdf/1708.07459.pdf)
* [Book on Information Theory, Inference, and Learning Algorithms (Ch. 1 & 2) by MacKay](https://libgen.rocks/get.php?md5=d67881f5549615f6f507274d2d3eee84&key=ZS870H7GJYG79O7I)
* [Section on Information Theory on d2l.ai](https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html)
It is recommended to briefly go through the last resource before the discussion. One can then read the first two chapters of the book by MacKay, if time permits.
Some advanced reading material for those who want to further explore the field:
* [Course on Information Processing and Learning at CMU](https://www.cs.cmu.edu/~aarti/Class/10704/lecs.html)
* The book by MacKay explores the topic in great depth
The discussion will be largely mathematical.
Any suggestions regarding the discussion are welcome.