# Natural Language Processing - 李龍豪 (2024 Spring)

## Class info.

[Course information](https://timetable.nycu.edu.tw/?r=main/crsoutline&Acy=112&Sem=2&CrsNo=539104&lang=zh-tw)

Temporary Grading Policy:
- Written report on a self-selected paper: 2020~2023 ACL/EMNLP (20%)
- Oral presentation of an assigned paper (30%)
- Final project implementation and report (50%)

Chinese dimensional sentiment analysis
- Training Set: Chinese EmoBank
- Test Set: 1000+ Mental Health Texts

## Date

### 2/22
:::info
Course Overview
:::

> Encoding

- ASCII
- Unicode
- UTF-8
- BIG-5 (Traditional Chinese)
- GB 2312 (Simplified Chinese)

> Words

- Morphology
- Word Segmentation
- Part-of-Speech Tagging

> Syntax

- Constituency Grammars
- Syntactic Parsing
- Dependency Parsing

> Semantics

- Lexical Semantics
- Semantic Role Labeling
- Word Sense Disambiguation

> Pragmatics

- Coreference Resolution
- Discourse Analysis
- Simile vs Metaphor

> Approach

- Rule-Based Approach
- Corpus-Based Approach
- Statistical Language Models
- Neural Language Models
- Pre-Trained Language Models
- Large Language Models

> Applications

- Machine Translation
- Question Answering
- Summarization
- Dialog Systems and Chatbots
- Grammatical/Spelling Error Correction
- Sentiment Analysis

### 2/29
:::info
NLP Fundamental Tasks
:::

Regular Expression

Text Normalization
- Tokenizing (segmenting) words
- Normalizing word formats
- Segmenting sentences

**Tokenization**: the task of segmenting running text into words

Named Entity Recognition (NER)
- BiLSTM with CRF Model

Evaluation of NER
- Recall
- Precision
- F-measure

$$
F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

### 3/7
:::info
Statistical Language Models
:::

> Language Model

Models that assign a probability to upcoming words or to sequences of words.

> N-gram Language Models

The n-gram model is the simplest kind of language model. It estimates the probability of a word $w$ given some history $h$:

$$
P(w \mid h)
$$

The bi-gram model approximates the probability of a word given all the previous words by conditioning only on the preceding word:
$$
P(w_n \mid w_{1:n-1}) \approx P(w_n \mid w_{n-1})
$$

> Chain Rule of Probability

Applying the chain rule to words:

\begin{align}
P(w_{1:n}) &= P(w_1)P(w_2 \mid w_1)P(w_3 \mid w_{1:2}) \dots P(w_n \mid w_{1:n-1}) \\
&= \prod_{k=1}^n P(w_k \mid w_{1:k-1})
\end{align}

> [Hidden Markov Models](https://web.ntnu.edu.tw/~algo/HiddenMarkovModel.html)

**Markov models** are the class of probabilistic models that assume we can predict the probability of some future unit without looking too far into the past.

Three fundamental problems:
- Likelihood
- Decoding
- Learning

> Perplexity

Perplexity is the inverse probability of the test set, normalized by the number of words $N$:

$$
\text{perplexity}(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} = \left(\prod_{i=1}^N \frac{1}{P(w_i \mid w_{1:i-1})}\right)^{\frac{1}{N}}
$$

Under the bi-gram approximation, each factor $P(w_i \mid w_{1:i-1})$ is replaced by $P(w_i \mid w_{i-1})$:

$$
\text{perplexity}(W) \approx \left(\prod_{i=1}^N \frac{1}{P(w_i \mid w_{i-1})}\right)^{\frac{1}{N}}
$$

### 3/14
:::info
Embedding & Neural Language Models
:::

Lexical Semantics

> Connotation

- Valence
- Arousal
- Dominance

Word-Word Co-occurrence Matrix

> TF-IDF: Weighting terms in the vector

- TF (Term Frequency)
- IDF (Inverse Document Frequency)

$$
\mathrm{tf}_{t, d} = \begin{cases} 1 + \log_{10} \operatorname{count}(t, d) & \text{if } \operatorname{count}(t, d) > 0 \\ 0 & \text{otherwise} \end{cases}
$$

$$
\mathrm{idf}_t = \log_{10} \left(\frac{N}{\mathrm{df}_t}\right)
$$

$$
w_{t, d} = \mathrm{tf}_{t, d} \times \mathrm{idf}_t
$$

Here $N$ is the total number of documents and $\mathrm{df}_t$ is the number of documents that contain term $t$.

Word2vec

Other Kinds of Static Embeddings
- GloVe
- fastText

> Visualizing Embeddings

Probably the most common visualization method is to project the 100 dimensions of a word embedding down into 2 dimensions using a projection method called **t-SNE**.

### 3/21
:::info
Pre-trained Language Model
:::

- Linear layer: linear transformation only
- Feed-forward network (MLP): linear transformations followed by non-linear activations

> Computational Graphs

A computational graph is a structure that represents the computation performed by a neural network. It connects the network's layers, nodes, and operations into a directed graph. The graph describes the forward and backward passes and how values flow between nodes.

![image](https://hackmd.io/_uploads/BJXDD8tAp.png)

> Embedding Layer

- An embedding layer maps high-dimensional data to a low-dimensional representation while preserving the characteristics of the original data.
- With large inputs such as sparse vectors, an embedding layer makes the model easier to train.

> RNN & LSTM

RNN: any network that contains a cycle within its network connections

Bi-RNN
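
The cycle in the RNN definition above can be sketched as a hidden state that is fed back into the network at every time step. This is a minimal sketch, assuming a simple (Elman-style) cell with a tanh activation. The function names `rnn_step` and `rnn_forward`, the toy weights, and the inputs are hypothetical and were chosen only for illustration; they are not from the lecture.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        # The recurrent term: the previous hidden state re-enters the network.
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run the cell over a sequence and return the hidden state at each step."""
    h = [0.0] * len(b_h)  # initial hidden state (assumed to be all zeros)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Toy parameters and a 3-step input sequence (hypothetical values).
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = rnn_forward(inputs, W_xh, W_hh, b_h)
for t, h in enumerate(states):
    print(t, [round(v, 4) for v in h])
```

Because each state depends on the previous one, feeding the same inputs in a different order generally gives a different final state. A Bi-RNN would run a second pass of this kind over the reversed sequence and combine the two states at each position.
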
[LSTM](https://zhuanlan.zhihu.com/p/32085405): forget/add/output gates

Bi-LSTM

[GRU](https://zhuanlan.zhihu.com/p/32481747)

[CNN for NLP](https://zhuanlan.zhihu.com/p/189527481)

### 3/28
:::info
Large Language Model
:::

Class cancelled

### 4/4

Tomb Sweeping Day (Skip)

### 4/11
:::info
Large Language Model
:::

<!-- :::info
NLP Applications & Sentiment Analysis
::: -->

> Transformer

- Backward-looking
- Bidirectional

> How to compare words?

- Inner product
- Query / Key / Value
- Attention score: the query-key inner product divided by the square root of the dimensionality of the query and key vectors

Multihead attention

Transformer block

![image](https://hackmd.io/_uploads/HJiqAeHlR.png)

Positional Embedding

> Sampling

- Top-K
- Top-P
- Temperature

[A Survey of Large Language Models](https://arxiv.org/abs/2303.18223)

### 4/18
:::info
Paper presentation
:::

- [bert2BERT: Towards Reusable Pretrained Language Models](https://aclanthology.org/2022.acl-long.151/)
- [We’re Afraid Language Models Aren’t Modeling Ambiguity](https://aclanthology.org/2023.emnlp-main.51/)
- [Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning](https://aclanthology.org/2023.emnlp-main.609/)
- [WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings](https://aclanthology.org/2023.acl-long.677/)
- [LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models](https://aclanthology.org/2023.emnlp-main.319/)

### 4/25
:::info
Paper presentation
:::

- [DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models](https://aclanthology.org/2023.acl-long.248/)
- [Controllable Text Generation via Probability Density Estimation in the Latent Space](https://aclanthology.org/2023.acl-long.704/)
- [Translation-Enhanced Multilingual Text-to-Image Generation](https://aclanthology.org/2023.acl-long.510/)
- [Language Model is Suitable for Correction of Handwritten Mathematical Expressions Recognition](https://aclanthology.org/2023.emnlp-main.247/)
- [Self-Supervised Multimodal Opinion Summarization](https://aclanthology.org/2021.acl-long.33/)

### 5/2
:::info
Paper presentation
:::

- [Multi-Party Empathetic Dialogue Generation: A New Task for Dialog Systems](https://aclanthology.org/2022.acl-long.24/)
- [E-CORE: Emotion Correlation Enhanced Empathetic Dialogue Generation](https://aclanthology.org/2023.emnlp-main.653.pdf)
- [Synchronous Dual Network with Cross-Type Attention for Joint Entity and Relation Extraction](https://aclanthology.org/2021.emnlp-main.219/)
- [OD-RTE: A One-Stage Object Detection Framework for Relational Triple Extraction](https://aclanthology.org/2023.acl-long.623/)
- [Selectively Answering Ambiguous Questions](https://aclanthology.org/2023.emnlp-main.35/)

### 5/9
:::info
Project progress
:::

I use MoE BERT.

### 5/16
:::info
Project related paper presentation
:::

- [Non-Linear Text Regression with a Deep Convolutional Neural Network](https://aclanthology.org/P15-2030/)
- [Predicting Valence-Arousal Ratings of Words Using a Weighted Graph Method](https://aclanthology.org/P15-2129/)
- [Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model](https://aclanthology.org/P16-2037/)
- [Sentiment Composition of Words with Opposing Polarities](https://aclanthology.org/N16-1128/)
- [Knowledge-enriched Two-layered Attention Network for Sentiment Analysis](https://aclanthology.org/N18-2041/)

### 5/23
:::info
Project related paper presentation
:::

- [Community-Based Weighted Graph Model for Valence-Arousal Prediction of Affective Words](https://ieeexplore.ieee.org/abstract/document/7523246)
- [EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis](https://aclanthology.org/E17-2092/)
- [Tensor Fusion Network for Multimodal Sentiment Analysis](https://aclanthology.org/D17-1115/)
- [A Multilayer Perceptron based Ensemble Technique for Fine-grained Financial Sentiment Analysis](https://aclanthology.org/D17-1057/)
- [Volatility Prediction Using Financial Disclosures Sentiments with Word Embedding-based IR Model](https://aclanthology.org/P17-1157/)

### 5/30
:::info
Project related paper presentation
:::

- [Investigating Dynamic Routing in Tree-Structured LSTM for Sentiment Analysis](https://aclanthology.org/D19-1343/)
- [Adversarial Attention Modeling for Multi-dimensional Emotion Regression](https://aclanthology.org/P19-1045/)
- [Pipelined Neural Networks for Phrase-Level Sentiment Intensity Prediction](https://ieeexplore.ieee.org/document/8295270/)
- [Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis](https://ieeexplore.ieee.org/document/8930925)
- [From Polarity to Intensity: Mining Morality from Semantic Space](https://aclanthology.org/2022.coling-1.107/)
- [All-in-One: Emotion, Sentiment and Intensity Prediction Using a Multi-Task Ensemble Network](https://ieeexplore.ieee.org/document/8756111/)

### 6/6
:::info
Project presentation
:::

## Reference

- [Introduction to Natural Language Processing](https://cseweb.ucsd.edu/~nnakashole/teaching/eisenstein-nov18.pdf)
- [Speech and Language Processing 3rd edition](https://web.stanford.edu/~jurafsky/slp3/)