# Natural Language Processing - 李龍豪 (2024 Spring)
## Class info.
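[Course information](https://timetable.nycu.edu.tw/?r=main/crsoutline&Acy=112&Sem=2&CrsNo=539104&lang=zh-tw)
Tentative Grading Policy:
- Written report on a self-selected paper from ACL/EMNLP 2020-2023 (20%)
- Oral presentation of an assigned paper (30%)
- Final project implementation and report (50%)
Chinese Dimensional Sentiment Analysis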
- Training Set: Chinese EmoBank
- Test Set: 1000+ Mental Health Texts
## Date
### 2/22
:::info
Course Overview
:::
> Encoding
- ASCII
- Unicode
- UTF-8
- Big5 (Traditional Chinese)
- GB 2312 (Simplified Chinese)
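
As a small illustration of how the encodings above differ, the sketch below encodes the same string with three of them and compares byte lengths. The sample text and the helper name `encoded_sizes` are assumptions made for this example; `big5` and `gb2312` are codec names that ship with CPython.

```python
def encoded_sizes(text):
    """Return the number of bytes `text` occupies in each encoding."""
    return {name: len(text.encode(name)) for name in ("utf-8", "big5", "gb2312")}

sample = "中文 NLP"
print(encoded_sizes(sample))

# Round trip: bytes must be decoded with the codec that produced them.
# Decoding with a different codec either raises or yields garbled text.
big5_bytes = sample.encode("big5")
print(big5_bytes.decode("big5") == sample)
```

UTF-8 spends three bytes on each of these Chinese characters, while Big5 and GB 2312 spend two, which is why the byte counts differ for the same text.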
> Words
- Morphology
- Word Segmentation
- Part-of-Speech Tagging
> Syntax
- Constituency Grammars
- Syntactic Parsing
- Dependency Parsing
> Semantics
- Lexical Semantics
- Semantic Role Labeling
- Word Sense Disambiguation
> Pragmatics
- Coreference Resolution
- Discourse Analysis
- Simile vs Metaphor
> Approach
- Rule-Based Approach
- Corpus-Based Approach
- Statistical Language Models
- Neural Language Models
- Pre-Trained Language Models
- Large Language Models
> Applications
- Machine Translation
- Question Answering
- Summarization
- Dialog Systems and Chatbots
- Grammatical/Spelling Error Correction
- Sentiment Analysis
### 2/29
:::info
NLP Fundamental Tasks
:::
Regular Expression
Text Normalization
- Tokenizing (segmenting) words
- Normalizing word formats
- Segmenting sentences
**Tokenization**: the task of segmenting running text into words
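
A minimal sketch of regular-expression tokenization for English text, combining the two topics above. The pattern and the function name `tokenize` are assumptions chosen for illustration; real tokenizers handle many more cases (abbreviations, URLs, hyphenation), and Chinese needs word segmentation rather than a pattern like this.

```python
import re

# Alternatives are tried left to right at each position:
#   1. numbers with an optional decimal part   (3.50)
#   2. words with an optional clitic           (Don't)
#   3. any single non-space punctuation mark   (, .)
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split running text into word, number, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't panic, it costs 3.50 dollars."))
```

Putting the number alternative first matters: otherwise `\w+` would consume `3` and leave `.50` to be split into separate tokens.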
Named Entity Recognition (NER)
- BiLSTM with CRF Model
Evaluation of NER
- Recall
- Precision
- F-measure
$$
F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$
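
The F-measure above can be sketched for entity-level NER evaluation as follows. This assumes exact-match scoring, where a predicted entity counts only if both its span and its type match a gold entity; the `(start, end, type)` tuples are hypothetical data made up for the example.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level scores; an entity counts only on an exact span and type match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical entities as (start_token, end_token, type) tuples.
gold = [(0, 1, "PER"), (4, 5, "ORG"), (7, 7, "LOC"), (9, 10, "PER")]
predicted = [(0, 1, "PER"), (4, 5, "LOC"), (7, 7, "LOC")]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(precision, recall, f1)
```

Here two of the three predictions are correct (precision 2/3) and two of the four gold entities are found (recall 1/2), so F is 4/7.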
### 3/7
:::info
Statistical Language Models
:::
> Language Model
Models that assign probability to upcoming words, or sequences of words.
> N-gram Language Models
The n-gram model is the simplest kind of language model.
The probability of a word $w$ given some history $h$:
$$
P(w \mid h)
$$
The bigram model approximates the probability of a word given all the previous words by using only the conditional probability given the preceding word.
$$
P(w_n \mid w_{1:n-1}) \approx P(w_n \mid w_{n-1})
$$
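
A minimal sketch of estimating the bigram probabilities by maximum likelihood, that is, the count of the pair $w_{n-1} w_n$ divided by the count of $w_{n-1}$ as a context. The toy corpus, the `<s>`/`</s>` boundary markers, and the function name `train_bigram` are assumptions for this example, and no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Return a function prob(word, prev) with maximum-likelihood bigram estimates."""
    bigram_counts = Counter()
    context_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1

    def prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # unseen context; this sketch has no smoothing
        return bigram_counts[(prev, word)] / context_counts[prev]

    return prob

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
prob = train_bigram(corpus)
print(prob("I", "<s>"), prob("am", "I"), prob("Sam", "am"))
```

Unseen bigrams get probability zero here, which is the motivation for smoothing in practical n-gram models.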
> Chain Rule of Probability
Applying the chain rule of probability to a sequence of words:
\begin{align}
P(w_{1:n}) &= P(w_1)P(w_2 \mid w_1)P(w_3 \mid w_{1:2}) \dots P(w_n \mid w_{1:n-1}) \\
&= \prod_{k=1}^nP(w_k \mid w_{1:k-1})
\end{align}
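
The product above is usually computed as a sum of logs to avoid numerical underflow on long sequences. A minimal sketch, where the probability table and the names `sequence_log_prob` and `toy_cond_prob` are hypothetical:

```python
import math

def sequence_log_prob(words, cond_prob):
    """Sum log P(w_k | w_{1:k-1}) over the sequence (chain rule in log space)."""
    total = 0.0
    for k, word in enumerate(words):
        history = tuple(words[:k])
        total += math.log(cond_prob(word, history))
    return total

# Hypothetical conditional probabilities, keyed by (word, full history).
TABLE = {
    ("the", ()): 0.5,
    ("cat", ("the",)): 0.2,
    ("sat", ("the", "cat")): 0.1,
}

def toy_cond_prob(word, history):
    return TABLE[(word, history)]

log_p = sequence_log_prob(["the", "cat", "sat"], toy_cond_prob)
print(math.exp(log_p))  # equals the product of the three factors
```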
> [Hidden Markov Models](https://web.ntnu.edu.tw/~algo/HiddenMarkovModel.html)
**Markov models** are the class of probabilistic models that assume we can predict the probability of some future unit without looking too far into the past.
Three fundamental problems:
- Likelihood
- Decoding
- Learning
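
Two of the three problems above can be sketched compactly: the forward algorithm for likelihood and the Viterbi algorithm for decoding. The two-state weather model and all of its probabilities are illustrative values, not estimated from data. Learning (typically done with the Baum-Welch algorithm) is left out of this sketch.

```python
STATES = ["HOT", "COLD"]
# Illustrative parameters, not estimated from data.
START = {"HOT": 0.8, "COLD": 0.2}
TRANS = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
EMIT = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward_likelihood(observations):
    """Likelihood problem: P(observations), summing over all state paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * TRANS[prev][s] for prev in STATES) * EMIT[s][obs]
            for s in STATES
        }
    return sum(alpha.values())

def viterbi_decode(observations):
    """Decoding problem: the single most probable state path."""
    best = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda q: best[q] * TRANS[q][s])
            step[s] = best[prev] * TRANS[prev][s] * EMIT[s][obs]
            pointers[s] = prev
        best = step
        backpointers.append(pointers)
    # Follow the backpointers from the best final state to the start.
    state = max(STATES, key=lambda q: best[q])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

print(forward_likelihood([3, 1, 3]))
print(viterbi_decode([3, 1, 3]))
```

The two functions share the same recurrence; the only difference is that the forward algorithm sums over previous states while Viterbi takes the maximum and records which state achieved it.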
> Perplexity
$$
\text{perplexity}(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} \approx \left(\prod_{i=1}^N \frac{1}{P(w_i \mid w_{i-1})}\right)^{\frac{1}{N}}
$$
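
Perplexity is the inverse probability of the test sequence, normalized by the number of words. A minimal sketch that computes it in log space from the conditional probability the model assigned to each word; the function name `perplexity` and the input values are assumptions for illustration.

```python
import math

def perplexity(word_probs):
    """Perplexity from the conditional probability the model gave each word."""
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)

# A model that gives every word probability 1/4 is as uncertain as a
# uniform choice among 4 words, so its perplexity is 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model assigned higher probability to the test data.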
### 3/14
:::info
Embedding & Neural Language Models
:::
Lexical Semantics
> Connotation
- Valence
- Arousal
- Dominance
Word-Word Co-occurrence Matrix
> TF-IDF: Weighting terms in the vector
- TF (Term Frequency)
- IDF (Inverse Document Frequency)
$$
\mathrm{tf}_{t, d}= \begin{cases}1+\log _{10} \operatorname{count}(t, d) & \text { if } \operatorname{count}(t, d)>0 \\ 0 & \text { otherwise }\end{cases}
$$
$$
\mathrm{idf}_t = \log_{10} (\frac{N}{\mathrm{df}_t})
$$
$$
w_{t, d} = \mathrm{tf}_{t, d} \times \mathrm{idf}_t
$$
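
The three formulas above can be sketched directly in code. The toy documents, whitespace tokenization, and the function name `tf_idf` are assumptions for this example.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, using log10 tf and idf."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # each document counts once per term

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (1 + math.log10(count)) * math.log10(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the cat sat", "the dog sat", "the cat chased the dog", "the bird"]
for doc, w in zip(docs, tf_idf(docs)):
    print(doc, w)
```

A term that occurs in every document, such as "the" here, has an idf of zero and therefore a weight of zero, no matter how often it appears.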
Word2vec
Other Kinds of Static Embeddings
- GloVe
- fastText
> Visualizing Embeddings
Probably the most common visualization method is to project the 100 dimensions of a word down into 2 dimensions using a projection method called **t-SNE**.
### 3/21
:::info
Pre-trained Language Model
:::
- Without activation functions, an MLP reduces to a linear transformation
- Feed-forward network: non-linear transformation (via activation functions)
> Computational Graphs
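A computational graph is a structure that represents the computation performed by a neural network. It connects the network's layers, nodes, and operations into a directed graph. The graph describes the forward and backward passes and the flow of values between nodes.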
![image](https://hackmd.io/_uploads/BJXDD8tAp.png)
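
As a hand-worked sketch of a forward and backward pass on a small computational graph, the example below uses the function $L = c \cdot (a + 2b)$. The function, the intermediate node names `d` and `e`, and the input values are chosen for illustration and are not necessarily the graph in the figure.

```python
def forward_backward(a, b, c):
    """Forward pass, then backward pass via the chain rule, for L = c * (a + 2b)."""
    # Forward pass: evaluate each node in topological order.
    d = 2 * b
    e = a + d
    loss = c * e

    # Backward pass: propagate dL/d(node) from the output back to the inputs.
    grad_loss = 1.0
    grad_e = grad_loss * c      # L = c * e  ->  dL/de = c
    grad_c = grad_loss * e      #               dL/dc = e
    grad_a = grad_e * 1.0       # e = a + d  ->  de/da = 1
    grad_d = grad_e * 1.0       #               de/dd = 1
    grad_b = grad_d * 2.0       # d = 2 * b  ->  dd/db = 2
    return loss, {"a": grad_a, "b": grad_b, "c": grad_c}

loss, grads = forward_backward(a=3, b=1, c=-2)
print(loss, grads)
```

Each gradient is the product of the local derivatives along the path from the output to that input, which is what automatic differentiation libraries compute node by node.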
> Embedding Layer
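- An embedding layer converts high-dimensional data into a low-dimensional representation while preserving the properties of the original data.
- When handling large inputs such as sparse vectors, an embedding layer makes model training easier.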
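
A minimal sketch of what an embedding layer computes: a row lookup that is equivalent to multiplying a sparse one-hot vector by a dense matrix. The vocabulary and the matrix values are hypothetical; in a real model the matrix is learned during training.

```python
# Hypothetical 5-word vocabulary and a 5 x 3 embedding matrix (one row per word).
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "mat": 4}
EMBEDDINGS = [
    [0.0, 0.0, 0.0],
    [0.1, 0.3, -0.2],
    [0.7, -0.5, 0.4],
    [-0.3, 0.8, 0.1],
    [0.6, -0.4, 0.5],
]

def embed(word):
    """Embedding lookup: select the row for the word's index."""
    return EMBEDDINGS[VOCAB.get(word, VOCAB["<unk>"])]

def embed_via_one_hot(word):
    """Equivalent view: multiply a sparse one-hot vector by the matrix."""
    index = VOCAB.get(word, VOCAB["<unk>"])
    one_hot = [1.0 if i == index else 0.0 for i in range(len(VOCAB))]
    dim = len(EMBEDDINGS[0])
    return [sum(one_hot[i] * EMBEDDINGS[i][j] for i in range(len(VOCAB)))
            for j in range(dim)]

print(embed("cat"))
print(embed_via_one_hot("cat"))
```

The lookup avoids materializing the one-hot vector, so a vocabulary-sized sparse input becomes a short dense vector at the cost of one index operation.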
> RNN & LSTM
RNN: any network that contains a cycle within its network connections
Bi-RNN
[LSTM](https://zhuanlan.zhihu.com/p/32085405): forget/add/output gates
Bi-LSTM
[GRU](https://zhuanlan.zhihu.com/p/32481747)
[CNN for NLP](https://zhuanlan.zhihu.com/p/189527481)
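
The cycle in the RNN definition above is the hidden state feeding back into the next step. A minimal sketch of a simple (Elman) RNN forward pass, assuming a tanh activation; the weight names `W`, `U`, `b` and all values are hypothetical.

```python
import math

def rnn_forward(inputs, W, U, b):
    """Run a simple (Elman) RNN: h_t = tanh(W x_t + U h_{t-1} + b)."""
    hidden_size = len(b)
    h = [0.0] * hidden_size  # initial hidden state h_0
    states = []
    for x in inputs:
        # The right-hand side reads the previous h; the new h replaces it afterwards.
        h = [
            math.tanh(
                sum(W[i][j] * x[j] for j in range(len(x)))
                + sum(U[i][k] * h[k] for k in range(hidden_size))
                + b[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states

# One input feature, two hidden units, two time steps.
W = [[0.5], [-0.5]]
U = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
print(rnn_forward([[1.0], [0.0]], W, U, b))
```

A bidirectional RNN runs a second copy of this loop over the reversed input and concatenates the two hidden states at each position.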
### 3/28
:::info
Large Language Model
:::
Class cancelled
### 4/4
Tomb Sweeping Day (Skip)
### 4/11
:::info
Large Language Model
:::
<!-- :::info
NLP Applications & Sentiment Analysis
::: -->
> Transformer
- Backward looking
- Bidirectional
> How to compare words?
- Inner product
- Query / Key / Value
- Attention score divided by the square root of the dimensionality of the query and key vectors
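
The comparison described above can be sketched as scaled dot-product attention: each query is scored against the keys by inner product, the scores are divided by $\sqrt{d_k}$, and a softmax turns them into weights over the values. This is a plain-Python sketch for a single head; the `causal` flag is an assumption added here to show the backward-looking case and presumes queries and keys come from the same positions.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Backward-looking (causal) attention only sees positions <= i.
        limit = i + 1 if causal else len(keys)
        scores = [
            sum(qj * kj for qj, kj in zip(q, k)) / math.sqrt(d_k)
            for k in keys[:limit]
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ])
    return outputs

x = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(x, x, v, causal=True))
print(attention(x, x, v, causal=False))
```

Multi-head attention repeats this computation with separate learned projections of the queries, keys, and values and concatenates the results.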
Multihead attention
Transformer block
![image](https://hackmd.io/_uploads/HJiqAeHlR.png)
Positional Embedding
> Sampling
- Top-K
- Top-P
- Temperature
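
The three sampling controls above can be sketched as transformations of the next-token distribution before a token is drawn. All function names and the example distribution are assumptions for illustration; libraries differ in details such as tie handling and the order in which the filters are applied.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep the k most probable tokens, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, set(ranked[:k]))

def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)

def sample(probs, rng=random):
    """Draw one token index from the (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.5, 0.3, 0.15, 0.05]
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.75))
print(sample(top_k_filter(probs, 2)))
```

Top-K keeps a fixed number of candidates, while Top-P adapts the number of candidates to how concentrated the distribution is.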
[A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)
### 4/18
:::info
Paper presentation
:::
- [bert2BERT: Towards Reusable Pretrained Language Models](https://aclanthology.org/2022.acl-long.151/)
- [We’re Afraid Language Models Aren’t Modeling Ambiguity](https://aclanthology.org/2023.emnlp-main.51/)
- [Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning](https://aclanthology.org/2023.emnlp-main.609/)
- [WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings](https://aclanthology.org/2023.acl-long.677/)
- [LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models](https://aclanthology.org/2023.emnlp-main.319/)
### 4/25
:::info
Paper presentation
:::
- [DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models](https://aclanthology.org/2023.acl-long.248/)
- [Controllable Text Generation via Probability Density Estimation in the Latent Space](https://aclanthology.org/2023.acl-long.704/)
- [Translation-Enhanced Multilingual Text-to-Image Generation](https://aclanthology.org/2023.acl-long.510/)
- [Language Model is Suitable for Correction of Handwritten Mathematical Expressions Recognition](https://aclanthology.org/2023.emnlp-main.247/)
- [Self-Supervised Multimodal Opinion Summarization](https://aclanthology.org/2021.acl-long.33/)
### 5/2
:::info
Paper presentation
:::
- [Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems](https://aclanthology.org/2022.acl-long.24/)
- [E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation](https://aclanthology.org/2023.emnlp-main.653.pdf)
- [Synchronous Dual Network with Cross-Type Attention for Joint Entity and Relation Extraction](https://aclanthology.org/2021.emnlp-main.219/)
- [OD-RTE: A One-Stage Object Detection Framework for Relational Triple Extraction](https://aclanthology.org/2023.acl-long.623/)
- [Selectively Answering Ambiguous Questions](https://aclanthology.org/2023.emnlp-main.35/)
### 5/9
:::info
Project progress
:::
I use MoE BERT.
### 5/16
:::info
Project related paper presentation
:::
- [Non-Linear Text Regression with a Deep Convolutional Neural Network](https://aclanthology.org/P15-2030/)
- [Predicting Valence-Arousal Ratings of Words Using a Weighted Graph Method](https://aclanthology.org/P15-2129/)
- [Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model](https://aclanthology.org/P16-2037/)
- [Sentiment Composition of Words with Opposing Polarities](https://aclanthology.org/N16-1128/)
- [Knowledge-enriched Two-layered Attention Network for Sentiment Analysis](https://aclanthology.org/N18-2041/)
### 5/23
:::info
Project related paper presentation
:::
- [Community-Based Weighted Graph Model for Valence-Arousal Prediction of Affective Words](https://ieeexplore.ieee.org/abstract/document/7523246)
- [EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis](https://aclanthology.org/E17-2092/)
- [Tensor Fusion Network for Multimodal Sentiment Analysis](https://aclanthology.org/D17-1115/)
- [A Multilayer Perceptron based Ensemble Technique for Fine-grained Financial Sentiment Analysis](https://aclanthology.org/D17-1057/)
- [Volatility Prediction Using Financial Disclosures Sentiments with Word Embedding-based IR Model](https://aclanthology.org/P17-1157/)
### 5/30
:::info
Project related paper presentation
:::
- [Investigating Dynamic Routing in Tree-Structured LSTM for Sentiment Analysis](https://aclanthology.org/D19-1343/)
- [Adversarial Attention Modeling for Multi-dimensional Emotion Regression](https://aclanthology.org/P19-1045/)
- [Pipelined Neural Networks for Phrase-Level Sentiment Intensity Prediction](https://ieeexplore.ieee.org/document/8295270/)
- [Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis](https://ieeexplore.ieee.org/document/8930925)
- [From Polarity to Intensity: Mining Morality from Semantic Space](https://aclanthology.org/2022.coling-1.107/)
- [All-in-One: Emotion, Sentiment and Intensity Prediction Using a Multi-Task Ensemble Network](https://ieeexplore.ieee.org/document/8756111/)
### 6/6
:::info
Project presentation
:::
## Reference
- [Introduction to Natural Language Processing](https://cseweb.ucsd.edu/~nnakashole/teaching/eisenstein-nov18.pdf)
- [Speech and Language Processing 3rd edition](https://web.stanford.edu/~jurafsky/slp3/)