# ADL Lecture 3.1: Word Representations Notes
###### tags: `NLP`

{%youtube p2e_riORjuU %}

## :memo: Meaning Representations in Computers

### Type 1: Knowledge-Based Representation
- WordNet: a lexical database built by linguists that encodes relations between words.
![](https://i.imgur.com/NZ5G6um.png)
- Problems:
    - subjective: reflects the linguists' own judgments
    - newly coined words cannot be handled
    - requires manual annotation

### :rocket: Type 2: Corpus-Based Representation
- **Atomic symbols**: one-hot representation
![](https://i.imgur.com/cx1LUQe.png)
    - Problem: gives no information about the relatedness of words (see the one-hot sketch in the appendix below).
- **Neighbor-based representation**
    - Neighbor definition: full document
    - Neighbor definition: window
        - Window-based co-occurrence matrix (see the sketch in the appendix below)
        ![](https://i.imgur.com/epRHLeX.png)
        - Problem: the matrix grows with the vocabulary size

#### Improvement: Low-Dimensional Dense Word Vectors
* Method 1: dimension reduction on the matrix
    * Singular Value Decomposition (SVD) of the co-occurrence matrix X
    ![](https://i.imgur.com/dx0tscq.png)
    > SVD reduces the dimensionality by truncating the singular-value matrix S from rank r to rank k, keeping only the k largest singular values (see the sketch in the appendix below).

    * Problem: SVD is computationally expensive on large matrices
* Method 2: directly learn low-dimensional word vectors
    * Recent and most popular models: **word2vec** (Mikolov et al., 2013) and **GloVe** (Pennington et al., 2014)
    > Also known as "word embeddings"

---
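## :bulb: Appendix: Code Sketches

A minimal sketch of the one-hot (atomic symbol) representation, assuming a toy vocabulary; the `vocab` list and `one_hot` helper are illustrative, not from the lecture. It shows why atomic symbols carry no similarity information: distinct one-hot vectors are always orthogonal.

```python
import numpy as np

# Toy vocabulary (illustrative); in practice it is built from the corpus.
vocab = ["car", "motorcycle", "king", "queen"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

# Distinct one-hot vectors are orthogonal, so related words such as
# "car" and "motorcycle" look exactly as dissimilar as any other pair.
print(one_hot("car") @ one_hot("motorcycle"))  # 0.0
```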
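A sketch of a window-based co-occurrence matrix, assuming a three-sentence toy corpus and window size 1 (both illustrative choices). Each entry counts how often two words appear within the window of each other; the result is |V| × |V|, which is exactly the size problem noted above.

```python
import numpy as np

# Toy corpus and window size (both illustrative).
corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]
window = 1

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# counts[i, j] = how often word j occurs within `window` words of word i.
counts = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in corpus:
    for pos, word in enumerate(sent):
        lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                counts[idx[word], idx[sent[ctx]]] += 1

print(vocab)
print(counts)  # symmetric |V| x |V| matrix; it grows with the vocabulary
```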
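A sketch of Method 1 (SVD on the co-occurrence matrix), using a small hand-made matrix as a stand-in for X. Truncating to the k largest singular values yields the best rank-k approximation, and the rows of U_k Σ_k serve as dense k-dimensional word vectors.

```python
import numpy as np

# Small symmetric co-occurrence matrix (illustrative stand-in for X).
X = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]], dtype=float)

# Full SVD: X = U @ diag(S) @ Vt, with singular values in S sorted descending.
U, S, Vt = np.linalg.svd(X)

k = 2  # keep only the k largest singular values (rank r -> rank k)
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # best rank-k approximation of X
word_vectors = U[:, :k] * S[:k]              # one dense k-dim vector per word (row)

print(np.round(X_k, 2))
print(word_vectors.shape)  # (|V|, k)
```

On a real vocabulary-sized matrix this decomposition becomes very expensive, which is the drawback that motivates Method 2.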
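A sketch of Method 2, training word2vec with the gensim library; this assumes gensim >= 4 is installed, and the tiny corpus is illustrative, so the learned neighbors are not meaningful. Unlike SVD, the low-dimensional vectors are learned directly by predicting context words.

```python
from gensim.models import Word2Vec  # assumes gensim >= 4 is installed

corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]

# sg=1 selects the skip-gram variant; vector_size is the embedding dimension.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)

vec = model.wv["nlp"]                # a dense 50-dimensional word embedding
print(model.wv.most_similar("nlp"))  # nearest neighbors by cosine similarity
```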