# Multilingual and Cross-Lingual Information Final Exam

---

# Probabilistic CFGs (PCFG)

- A set of terminals, $\{w^k\}$, $k = 1, \ldots, V$
- A set of nonterminals, $\{N^i\}$, $i = 1, \ldots, n$
- A designated start symbol $N^1$
- A set of rules, $\{N^i \rightarrow \xi^j\}$
    - where $\xi^j$ is a sequence of terminals and nonterminals
- A corresponding set of probabilities on rules such that:
    - $\forall i \; \Sigma_j P(N^i \rightarrow \xi^j) = 1$
- The probability of a sentence (according to grammar $G$) is given by:
    - $P(w_{1m}) = \Sigma_t P(w_{1m}, t) = \Sigma_{t:\, \mathrm{yield}(t) = w_{1m}} P(t)$
    - where $t$ is a parse tree of the sentence
    - Dynamic programming is needed to make this efficient

:::info
- Assumptions of the PCFG model
    - Place invariance
        - The probability of a subtree does not depend on where in the string the words it dominates are (like time invariance in an HMM)
    - Context free
        - The probability of a subtree does not depend on words not dominated by that subtree
    - Ancestor free
        - The probability of a subtree does not depend on nodes in the derivation outside the subtree
:::

- **Outside ($\alpha_j$)** and **Inside ($\beta_j$)** probabilities are defined as
    - $\alpha_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)$
    - $\beta_j(p,q) = P(w_{pq} | N^j_{pq}, G)$
- Product of inside and outside probabilities
    - $\alpha_j(p,q) \beta_j(p,q) = P(w_{1m}, N^j_{pq} | G)$

# POS tagging

## Tagging HMM model

- $\arg\max_{t_{1,n}} P(t_{1,n} | w_{1,n}) = \arg\max_{t_{1,n}} P(t_{1,n}) \times P(w_{1,n} | t_{1,n})$
- $w_i$: word at position $i$
- $t_i$: tag assigned to the word at position $i$

# Noisy channel model

![](https://i.imgur.com/jduMga2.png)
![](https://i.imgur.com/hu4e90e.png)
![](https://i.imgur.com/cWT5uxI.png)

# Collocation

- Chi-square test

![](https://i.imgur.com/6jIapVM.png)

# Word sense disambiguation

- Methodology
    - Supervised disambiguation
        - based on a labeled training set
    - Dictionary-based disambiguation
        - based on lexical resources such as dictionaries and thesauri (see the Lesk-style sketch after this section)
    - Unsupervised disambiguation
        - based on unlabeled corpora
- Factors influencing the notion of sense
    - Co-occurrence (bag-of-words model): topical sense
    - Relational information (e.g., subject, object)
    - Other grammatical information (e.g., part of speech)
    - Collocations (one sense per collocation)
    - Discourse (one sense per discourse segment): how much context is needed to determine sense?
    - Combinations of the above
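As a concrete illustration of the dictionary-based approach above, here is a minimal sketch in the spirit of the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the context around the ambiguous word. The sense inventory and glosses for "bank" below are invented purely for illustration; a real system would take them from a resource such as WordNet.

```python
def simplified_lesk(context_words, sense_glosses):
    """Pick the sense whose gloss has the largest bag-of-words overlap with the context.

    context_words : list of tokens surrounding the ambiguous word
    sense_glosses : dict mapping a sense label to its dictionary gloss (a string)
    """
    context = set(w.lower() for w in context_words)
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        # Count shared words between the gloss and the context
        # (no stopword removal here, to keep the sketch short).
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory for "bank" (glosses are made up for this example).
bank_senses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river":   "the sloping land beside a body of water such as a river",
}

sentence = "he sat on the bank of the river and watched the water flow".split()
print(simplified_lesk(sentence, bank_senses))   # -> bank/river
```

The bag-of-words overlap used here corresponds to the co-occurrence factor listed above; collocational or discourse constraints would require additional features.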
# Machine translation

![](https://i.imgur.com/JjEPPLU.png)

## Sentence alignment

- Length based
- Lexical based
- Offset based

# Formulas the teacher said will be on the exam (memorize these)

## CFG parse tree

![](https://i.imgur.com/FEPhd1y.png)

## From senior students' theses

![](https://i.imgur.com/RalZb1m.png)
![](https://i.imgur.com/E0FruMx.jpg)
![](https://i.imgur.com/wIrlTtN.jpg)
![](https://i.imgur.com/nNRReYg.jpg)
![](https://i.imgur.com/MhV1kDE.jpg)

# Past exam questions

## Definitions of terms

- Corpus
    - A corpus is a large structured or unstructured collection of texts
    - In NLP tasks it serves as the source of background knowledge needed to build models
- Collocation
    - A combination of words that frequently occurs together in a language
    - A sequence of two or more consecutive words
    - Has the characteristics of a syntactic and semantic unit
- Parallel texts
    - Parallel corpora: texts placed side by side with their translations
    - There is a 1-to-1 mapping between the source text and its translation
- Sparse data problem
    - Events that do not appear in the training data but may appear at inference time
- Word sense disambiguation
    - Determining the intended meaning of an ambiguous word in a given context
    - Many words have different meanings or senses
    - There is ambiguity about how they are to be specifically interpreted
- CFG
    - Context-free grammar
    - A grammar that can describe languages with nested structure
    - LR parsing
    - Parse tree
    - $G = (V, \Sigma, R, S)$
- Synonym
    - A word with the same (or nearly the same) meaning as another word
- Polysemy
    - A single word with multiple related senses
- Pragmatics
    - The study of the relationship between context and meaning
- Part-of-speech tag
    - Labeling each word in a text with its part of speech
- Smoothing
    - Avoids infinite cross entropy
    - Addresses the problem of zero probabilities while retaining the relative likelihoods of the items that do occur

## Write out their formulas

### Chain rule

- $P(W) = P(w_1, w_2, w_3, \ldots, w_n)$
- $P(W) = P(w_1) P(w_2|w_1) P(w_3|w_1,w_2) \cdots P(w_n|w_1,w_2,\ldots,w_{n-1})$

### Entropy

- $H(p) = -\Sigma_x p(x) \log p(x) = \Sigma_x p(x) \log \frac{1}{p(x)}$

### N-gram language model

- $P(W) = \Pi_{i=1}^{n} P(w_i | w_{i-n+1}, w_{i-n+2}, \ldots, w_{i-1})$

### Hidden Markov model

- $P(O|\mu) = \Sigma_{x_1 \ldots x_T} \pi_{x_1} b_{x_1 o_1} \Pi^{T-1}_{t=1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}$

![](https://i.imgur.com/Nlwz5Iv.png)

### Kullback-Leibler divergence

- Also called relative entropy
- An asymmetric measure of the difference between two probability distributions $P$ and $Q$ (see the numeric sketch at the end of these notes)
- $D_{\mathrm{KL}}(P\|Q) = -\sum_i P(i) \ln \frac{Q(i)}{P(i)}$
- Equivalently
    - $D_{\mathrm{KL}}(P\|Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$

## NLP toolkits

- Jieba
- CKIP
- HanLP
- CMU NLP toolkit
- Moses open-source toolkit
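To make the entropy and KL-divergence formulas above concrete, here is a minimal pure-Python sketch (an illustration added to these notes, not course material). The toy distributions `p` and `q` are made-up values; entropy is computed in bits (log base 2), while the KL divergence uses the natural log as in the formula above.

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) ln(P(i) / Q(i)), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over a 3-symbol vocabulary (illustrative values only).
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

print(f"H(p)       = {entropy(p):.4f} bits")      # 1.5000 bits
print(f"D_KL(p||q) = {kl_divergence(p, q):.4f} nats")
print(f"D_KL(q||p) = {kl_divergence(q, p):.4f} nats")  # differs from D_KL(p||q): KL is asymmetric
```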