Introduction to Statistical NLP

Why?

Here are some links showing what statistical NLP can do today.

OpenAI Codex Live Demo

What is a Language Model?

https://en.wikipedia.org/wiki/Language_model
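
In short, a language model assigns probabilities to sequences of words. The following is a minimal sketch of that idea, not a definitive implementation: it assumes a toy three-sentence corpus, whitespace tokenization, and a bigram model with plain maximum-likelihood estimates and no smoothing. The function names, the corpus, and the `<s>` / `</s>` boundary markers are illustrative choices, not taken from any particular library.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(next_word | previous_word) by relative frequency."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled too.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1

    model = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        model[prev] = {word: c / total for word, c in next_counts.items()}
    return model


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence.

    Unseen bigrams get probability 0 because this sketch has no smoothing.
    """
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob


# Toy corpus, made up for illustration only.
toy_corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(toy_corpus)

print(model["the"])                                 # distribution over words following "the"
print(sentence_probability(model, "the cat sat"))   # probability of a seen sentence
print(sentence_probability(model, "the bird sat"))  # contains an unseen bigram
```

The zero probability for any sentence containing an unseen bigram is the main weakness of this unsmoothed estimate. Practical n-gram models add smoothing or backoff to deal with it.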

References

The book Supervised Machine Learning for Text Analysis in R has a lot of good material on topics we will not discuss, such as tokenization, stop words, stemming, and bias in data. It also gives a good account of word embeddings.

If you want to see the mathematical background of n-gram language models, have a look at Chapter 3.1 of Speech and Language Processing by Jurafsky and Martin.

For the deep learning side of things, listen to this podcast with Ilya Sutskever.