We previously learned about using co-occurrence counts and latent semantic analysis to create count-based word vectors. Here we learn a method based on neural networks that is conceptually similar but has many technical advantages, not least that it can be trained on much larger corpora. This method is prediction-based in the sense that it learns to predict a word from its context (continuous bag of words model) or a context from a word (skip gram model).
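To make the two prediction tasks concrete, here is a minimal sketch in plain Python, with a toy corpus and window size chosen only for illustration (this is not the original word2vec implementation), showing how training pairs are extracted from text for each objective.

```python
# Sketch of how training examples are formed for the two objectives.
# The corpus and window size below are toy values chosen for illustration.

corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2  # context words taken on each side of the center word

def cbow_pairs(tokens, window):
    """Yield (context words, center word) pairs: CBOW predicts the center word."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        yield context, center

def skipgram_pairs(tokens, window):
    """Yield (center word, context word) pairs: skip gram predicts each context word."""
    for i, center in enumerate(tokens):
        for context in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            yield center, context

print(next(cbow_pairs(corpus, window)))      # (['quick', 'brown'], 'the')
print(next(skipgram_pairs(corpus, window)))  # ('the', 'quick')
```

In the actual models these pairs are fed to a shallow neural network whose learned weights become the word vectors; the sketch only shows how the supervision signal is constructed from raw text.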
You should read the following introductory material after we have gone over the main ideas in class.
word2vec: The paper that introduced the continuous bag of words and skip gram models is Efficient Estimation of Word Representations in Vector Space, 2013. word2vec was introduced in Distributed Representations of Words and Phrases and their Compositionality, 2013, which refines the skip gram model from the previous paper. This was the first time that word embeddings were computed for millions of words by training on billions of words. The paper also shows that word2vec produces representations with a linear structure that makes precise analogical reasoning possible. See McCormick's Word2Vec Tutorial Part 2 for a summary of the novel techniques introduced in this paper.
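As an illustration of that linear structure, the classic king − man + woman ≈ queen analogy can be checked with pretrained vectors. The sketch below assumes the gensim library and its downloadable "word2vec-google-news-300" model; it is one possible way to run the query, not part of the readings.

```python
# Sketch: analogical reasoning with pretrained word2vec vectors via gensim.
# Assumes gensim is installed; the pretrained model is a large download on first use.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # returns KeyedVectors

# vec("king") - vec("man") + vec("woman") should be closest to vec("queen").
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.71...)]
```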