# Multilingual and Cross-Lingual Information Final Exam
---
# Probabilistic CFGs (PCFG)
- A set of terminals, {$w^k$}, k = 1, …, V
- A set of nonterminals, {$N^i$}, i = 1, …, n
- A designated start symbol $N^1$
- A set of rules, {$N^i \rightarrow \xi^j$}
- where $\xi^j$ is a sequence of terminals and nonterminals
- A corresponding set of probabilities on rules such that:
- $\forall i\ \sum_j P(N^i \rightarrow \xi^j) = 1$
- The probability of a sentence (according to grammar G) is given by:
- $P(w_{1m}) = \sum_t P(w_{1m}, t) = \sum_{t:\, \mathrm{yield}(t) = w_{1m}} P(t)$
- where $t$ is a parse tree of the sentence
- Dynamic programming (the inside algorithm) is needed to make this efficient; a minimal sketch appears at the end of this section
:::info
- Assumptions of PCFG model
- Place invariance
- The probability of a subtree does not depend on where in the string the words it dominates are (like time invariance in an HMM)
- Context free
- The probability of a subtree does not depend on words not dominated by that subtree
- Ancestor free
- The probability of a subtree does not depend on nodes in the derivation outside the subtree
:::
- **Outside ($\alpha_j$)** and **Inside ($\beta_j$)** Probabilities are defined as
- $\alpha_j (p,q) = P(w_{1(p-1)},N^j_{pq},w_{(q+1)m} | G)$
- $\beta_j (p,q) = P(w_{pq} | N^j_{pq},G)$
- Product of inside and outside probabilities
- $\alpha_j (p,q) \beta_j (p,q) = P(w_{1m},N^j_{pq} | G)$
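A minimal sketch of computing the sentence probability bottom-up from inside probabilities (CKY-style dynamic programming), assuming a toy grammar in Chomsky normal form; the grammar, probabilities, and sentence below are made up purely for illustration:

```python
from collections import defaultdict

# Toy CNF grammar (hypothetical, for illustration only): P(A -> B C) and P(A -> w).
binary = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "DT", "NN"): 1.0,
    ("VP", "VB", "NP"): 1.0,
}
lexical = {
    ("DT", "the"): 1.0,
    ("NN", "dog"): 0.5, ("NN", "cat"): 0.5,
    ("VB", "saw"): 1.0,
}

def sentence_probability(words, start="S"):
    """beta[(p, q, A)] = P(w_p..w_q | A); the sentence probability is beta over the full span."""
    n = len(words)
    beta = defaultdict(float)
    for p, w in enumerate(words):                   # length-1 spans: lexical rules
        for (A, word), prob in lexical.items():
            if word == w:
                beta[(p, p, A)] += prob
    for length in range(2, n + 1):                  # longer spans: binary rules
        for p in range(n - length + 1):
            q = p + length - 1
            for d in range(p, q):                   # split point: left = p..d, right = d+1..q
                for (A, B, C), prob in binary.items():
                    beta[(p, q, A)] += prob * beta[(p, d, B)] * beta[(d + 1, q, C)]
    return beta[(0, n - 1, start)]                  # P(w_1m) = beta_1(1, m)

print(sentence_probability("the dog saw the cat".split()))   # 0.25
```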
# POS tagging
## Tagging HMM model
- $\arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} P(t_{1,n}) \times P(w_{1,n} \mid t_{1,n})$ (a Viterbi decoding sketch follows this list)
- $w_i$: word at position i
- $t_i$: tag assigned to word at position i
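A minimal Viterbi decoding sketch for this tagging model; the tag set and all probability tables are made-up toy values, not from the course material:

```python
import math

# Hypothetical toy tables for illustration: P(t_1), P(t_i | t_{i-1}), P(w_i | t_i).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.6, "NN": 0.3, "VB": 0.1}
trans_p = {"DT": {"DT": 0.05, "NN": 0.90, "VB": 0.05},
           "NN": {"DT": 0.10, "NN": 0.30, "VB": 0.60},
           "VB": {"DT": 0.50, "NN": 0.40, "VB": 0.10}}
emit_p = {"DT": {"the": 0.9}, "NN": {"dog": 0.8, "barks": 0.1}, "VB": {"dog": 0.1, "barks": 0.8}}

def logp(x):
    return math.log(x) if x > 0 else float("-inf")   # guard against zero probabilities

def viterbi(words):
    # delta[i][t]: best log-probability of any tag sequence for w_1..w_i ending in tag t
    delta = [{t: logp(start_p[t]) + logp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        delta.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: delta[i - 1][s] + logp(trans_p[s][t]))
            delta[i][t] = (delta[i - 1][prev] + logp(trans_p[prev][t])
                           + logp(emit_p[t].get(words[i], 0)))
            back[i][t] = prev
    best = max(tags, key=lambda t: delta[-1][t])      # best final tag, then follow back-pointers
    path = [best]
    for i in range(len(words) - 1, 0, -1):
        path.insert(0, back[i][path[0]])
    return path

print(viterbi(["the", "dog", "barks"]))   # ['DT', 'NN', 'VB']
```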
# Noisy Channel model
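The standard decoding rule (generic textbook formulation; notation may differ from the course slides): recover the most likely channel input $i$ from the observed output $o$.
- $\hat{i} = \arg\max_i P(i \mid o) = \arg\max_i \frac{P(i)\, P(o \mid i)}{P(o)} = \arg\max_i P(i)\, P(o \mid i)$
- $P(i)$: source (language) model; $P(o \mid i)$: channel model
- The HMM tagging rule above and statistical MT below are instances of this decoding rule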



# Collocation
- Chi-square test: compare the observed co-occurrence counts of a word pair against the counts expected if the two words were independent (a minimal sketch follows)
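A minimal sketch of the chi-square computation for one candidate collocation from a 2x2 contingency table; the counts passed in the example call are illustrative only:

```python
# Chi-square for a candidate bigram (w1, w2) over a corpus of n bigrams.
def chi_square(c_w1w2, c_w1, c_w2, n):
    """2x2 table: rows = first word is / is not w1, columns = second word is / is not w2."""
    o = [[c_w1w2, c_w1 - c_w1w2],
         [c_w2 - c_w1w2, n - c_w1 - c_w2 + c_w1w2]]
    row = [sum(o[0]), sum(o[1])]
    col = [o[0][0] + o[1][0], o[0][1] + o[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n        # expected count under independence
            chi2 += (o[i][j] - expected) ** 2 / expected
    return chi2

# Illustrative counts for a bigram like "new companies":
# ~1.55, below the 3.84 critical value (df = 1, p = 0.05), so not a significant collocation here.
print(chi_square(c_w1w2=8, c_w1=15828, c_w2=4675, n=14307668))
```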

# Word sense disambiguation
- Methodology
- Supervised Disambiguation
- based on a labeled training set (e.g., Naive Bayes over a bag-of-words context; a minimal sketch appears at the end of this section)
- Dictionary-based Disambiguation
- based on lexical resources such as dictionaries and thesauri
- Unsupervised Disambiguation
- based on unlabeled corpora
- Factors Influencing the Notion of Sense
- Co-occurrence (bag-of-words model): topical sense
- Relational information (e.g., subject, object)
- Other grammatical information (e.g., part-of-speech)
- Collocations (one sense per collocation)
- Discourse (one sense per discourse segment): How much context is needed to determine sense?
- Combinations of the above
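A minimal supervised-disambiguation sketch (Naive Bayes over a bag-of-words context, with add-one smoothing); the tiny labeled "bank" training set below is invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples of the ambiguous word "bank".
training = [
    ("FINANCE", "the bank approved the loan and the interest rate"),
    ("FINANCE", "she deposited cash at the bank branch"),
    ("RIVER",   "they walked along the bank of the river"),
    ("RIVER",   "the river bank was muddy after the rain"),
]

sense_count = Counter(sense for sense, _ in training)
word_count = defaultdict(Counter)                     # C(word, sense)
for sense, context in training:
    word_count[sense].update(context.split())
vocab = {w for counts in word_count.values() for w in counts}

def disambiguate(context):
    """argmax_s P(s) * prod_w P(w | s), with add-one smoothing over the vocabulary."""
    best, best_score = None, float("-inf")
    for sense in sense_count:
        score = math.log(sense_count[sense] / len(training))
        total = sum(word_count[sense].values())
        for w in context.split():
            score += math.log((word_count[sense][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best

print(disambiguate("interest on the loan from the bank"))   # expected: FINANCE
```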
# Machine translation

## Sentence alignment
- Length-based (e.g., Gale & Church; a simplified sketch follows this list)
- Lexically based
- Offset-based
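A simplified length-based alignment sketch in the spirit of Gale & Church, but with a much cruder cost function; the skip penalty and sentence lengths below are made up:

```python
# Length-based sentence alignment by dynamic programming over 1-1, 1-0, and 0-1 beads.
def align(src_lens, tgt_lens):
    """src_lens / tgt_lens: character lengths of the source / target sentences.
    Returns the lowest-cost bead sequence."""
    INF = float("inf")
    SKIP = 50                                  # hypothetical penalty for an unaligned sentence
    n, m = len(src_lens), len(tgt_lens)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:                # 1-1 bead: cost grows with the length mismatch
                c = cost[i][j] + abs(src_lens[i] - tgt_lens[j])
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j, "1-1")
            if i < n and cost[i][j] + SKIP < cost[i + 1][j]:       # 1-0 bead (source only)
                cost[i + 1][j], back[i + 1][j] = cost[i][j] + SKIP, (i, j, "1-0")
            if j < m and cost[i][j] + SKIP < cost[i][j + 1]:       # 0-1 bead (target only)
                cost[i][j + 1], back[i][j + 1] = cost[i][j] + SKIP, (i, j, "0-1")
    beads, i, j = [], n, m                     # trace back the lowest-cost path
    while (i, j) != (0, 0):
        pi, pj, bead = back[i][j]
        beads.append(bead)
        i, j = pi, pj
    return list(reversed(beads))

# Three source sentences vs. four target sentences (lengths in characters, made up):
print(align([42, 110, 37], [45, 108, 35, 20]))   # ['1-1', '1-1', '1-1', '0-1']
```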
# Formulas the instructor said will be on the exam; memorize them
## CFG parse tree section

## Senior students' thesis section





# Past exam questions
## Term definitions
- Corpus
- A corpus is a large structured or unstructured collection of texts
- In NLP tasks, it serves as the foundation of background knowledge needed to build models
- Collocation
- A combination of words that frequently occurs together in a language
- a sequence of two or more consecutive words
- characteristics of a syntactic and semantic unit
- Parallel texts
- Texts placed side by side with their translations
- There is a 1-to-1 mapping between the original text and the translation
- Sparse data problem
- Events that do not appear in the training data but may appear at inference time
- Word sense disambiguation
- Determining the meaning of an ambiguous word in different contexts
- many words have different meanings or senses
- there is ambiguity about how they are to be specifically interpreted
- CFG
- Context free grammar
- A grammar that can describe languages with nested structure
- LR parsing
- Parse tree
- $G = (V, \Sigma, R, S)$
- Synonym
- Words that have the same or nearly the same meaning
- Polysemy
- A single word that has multiple related senses
- Pragmatics
- The study of the relationship between context and meaning
- Parts of Speech tag
- Labeling each word in a text with its part of speech
- Smoothing
- Avoids infinite cross entropy caused by zero-probability events
- Address the problem of zero probabilities while retaining the relative likelihoods for occurring items
## Write out their formulas
### Chain rule
- $p(W) = p(w_1,w_2,w_3,...,w_n)$
- $p(W) = p(w_1)p(w_2|w_1)p(w_3|w_1,w_2) ... p(w_n|w_1,w_2,...,w_{n-1})$
### Entropy
- $H(p) = -\sum_x p(x)\log p(x) = \sum_x p(x)\log\frac{1}{p(x)}$
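A quick worked example (assuming log base 2; the numbers are illustrative):
- Fair coin: $H = -(\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{2}\log_2\frac{1}{2}) = 1$ bit
- Biased coin with $p(\text{heads}) = \frac{1}{4}$: $H = -(\frac{1}{4}\log_2\frac{1}{4} + \frac{3}{4}\log_2\frac{3}{4}) \approx 0.811$ bits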
### N-gram language model
- $P(W) = \prod_{i=1}^{n} P(w_i \mid w_{i-n+1}, w_{i-n+2}, \ldots, w_{i-1})$
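As a concrete special case (written out for quick recall), a bigram model ($n = 2$) with maximum-likelihood estimates:
- $P(w_1 w_2 w_3) \approx P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2)$
- $P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})}$, which is where zero counts arise and smoothing (above) becomes necessary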
### Hidden Markov model
- $P(O \mid \mu) = \sum_{x_1 \ldots x_T} \pi_{x_1} b_{x_1 o_1} \prod_{t=1}^{T-1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}$

### Kullback-Leibler divergence
- Also called relative entropy
- An asymmetric measure of the difference between two probability distributions $P$ and $Q$
- $D_{\mathrm{KL}}(P\|Q) = -\sum_i P(i)\ln\frac{Q(i)}{P(i)}$
- Equivalent to
- $D_{\mathrm{KL}}(P\|Q) = \sum_i P(i)\ln\frac{P(i)}{Q(i)}$
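A quick worked example (log base 2, illustrative numbers) showing the asymmetry:
- Let $P = (\frac{1}{2}, \frac{1}{2})$ and $Q = (\frac{3}{4}, \frac{1}{4})$
- $D_{KL}(P\|Q) = \frac{1}{2}\log_2\frac{1/2}{3/4} + \frac{1}{2}\log_2\frac{1/2}{1/4} \approx 0.208$ bits
- $D_{KL}(Q\|P) = \frac{3}{4}\log_2\frac{3/4}{1/2} + \frac{1}{4}\log_2\frac{1/4}{1/2} \approx 0.189$ bits, so the two directions differ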
## NLP toolkits
- Jieba
- CKIP
- Hanlp
- CMU NLP toolkit
- Moses Open source toolkit