Natural Language Processing - 李龍豪 (2024 Spring)

Class info.

Course information

Tentative Grading Policy:

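Written report on a self-selected paper: 2020~2023 ACL/EMNLP (20%)
Oral presentation on an assigned paper (30%)
Final project implementation and report (50%)
Chinese dimensional sentiment analysis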
- Training Set: Chinese EmoBank
- Test Set: 1000+ Mental Health Texts

Date

2/22

Course Overview

Encoding

ASCII
Unicode
UTF-8
Big5 (Traditional Chinese)
GB 2312 (Simplified Chinese)
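
As a small illustration of how these encodings differ, the sketch below encodes the same text with Python's built-in codecs and counts the bytes. The codec names `big5` and `gb2312` are Python's names for the two Chinese encodings; the sample strings and the helper `byte_lengths` are my own choices, not from the lecture.

```python
# Compare how many bytes each encoding needs for the same text.
text_ascii = "NLP"
text_zh = "中"  # a character that exists in both Big5 and GB 2312


def byte_lengths(text, encodings):
    """Return {encoding name: number of bytes} for the given text."""
    return {enc: len(text.encode(enc)) for enc in encodings}


ascii_sizes = byte_lengths(text_ascii, ["ascii", "utf-8"])
zh_sizes = byte_lengths(text_zh, ["utf-8", "big5", "gb2312"])

print(ascii_sizes)  # {'ascii': 3, 'utf-8': 3}
print(zh_sizes)     # {'utf-8': 3, 'big5': 2, 'gb2312': 2}

# The same character maps to different bytes in each encoding,
# so text must be decoded with the codec it was encoded with.
big5_bytes = text_zh.encode("big5")
gb_bytes = text_zh.encode("gb2312")
```

UTF-8 is backward compatible with ASCII (one byte per ASCII character), while a Chinese character takes three bytes in UTF-8 and two in Big5 or GB 2312.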

Words

Morphology
Word Segmentation
Part-of-Speech Tagging

Syntax

Constituency Grammars
Syntactic Parsing
Dependency Parsing

Semantics

Lexical Semantics
Semantic Role Labeling
Word Sense Disambiguation

Pragmatics

Coreference Resolution
Discourse Analysis
Simile vs Metaphor

Approach

Rule-Based Approach
Corpus-Based Approach
Statistical Language Models
Neural Language Models
Pre-Trained Language Models
Large Language Models

Applications

Machine Translation
Question Answering
Summarization
Dialog Systems and Chatbots
Grammatical/Spelling Error Correction
Sentiment Analysis

2/29

NLP Fundamental Tasks

Regular Expression

Text Normalization

Tokenizing (segmenting) words
Normalizing word formats
Segmenting sentences

Tokenization: the task of segmenting running text into words
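
The three normalization steps can be sketched with regular expressions. This is a minimal example for English text only; the pattern and the function names `tokenize` and `split_sentences` are my own assumptions, and Chinese text would need a dedicated word segmenter instead.

```python
import re

# Assumed pattern: numbers (with optional decimals), words that may contain
# internal hyphens or apostrophes, or any single punctuation mark.
TOKEN_PATTERN = re.compile(r"\d+(?:\.\d+)?|\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Segment running text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by spaces."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


tokens = tokenize("Mr. O'Neill paid $3.50 for state-of-the-art NLP!")
normalized = [t.lower() for t in tokens]  # simple case folding
sentences = split_sentences("It works. Does it? Yes!")

print(tokens)
# ['Mr', '.', "O'Neill", 'paid', '$', '3.50', 'for', 'state-of-the-art', 'NLP', '!']
print(sentences)
# ['It works.', 'Does it?', 'Yes!']
```

The naive sentence splitter would also break after abbreviations such as "Mr.", which is one reason practical systems use richer rules or learned models.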

Named Entity Recognition (NER)

BiLSTM with CRF Model

Evaluation of NER

Recall
Precision
F-measure

F = 2 \cdot \frac{precision \cdot recall}{precision + recall}
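
A minimal sketch of entity-level scoring with the formula above, assuming each entity is represented as a `(start, end, type)` tuple and counts as correct only on an exact match. The representation and the toy data are my own illustration.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F-measure with exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical annotations: (start token, end token, entity type)
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 11, "ORG")}
pred = {(0, 2, "PER"), (5, 6, "ORG"), (9, 11, "ORG"), (13, 14, "LOC")}

p, r, f = precision_recall_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Two of the four predictions are correct (precision 2/4) and two of the three gold entities are found (recall 2/3), giving F = 4/7.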

3/7

Statistical Language Models

Language Model

Models that assign probability to upcoming words, or sequences of words.

N-gram Language Models

The n-gram model is the simplest kind of language model.

The probability of a word w given some history h:

P (w ∣ h)

The bigram model approximates the probability of a word given all the previous words by using only the conditional probability given the preceding word.

P (w_{n} ∣ w_{1 : n - 1}) \approx P (w_{n} ∣ w_{n - 1})

Chain Rule of Probability

Applying the chain rule to words:

\begin{aligned} P (w_{1 : n}) & = P (w_{1}) P (w_{2} ∣ w_{1}) P (w_{3} ∣ w_{1 : 2}) \dots P (w_{n} ∣ w_{1 : n - 1}) \\ & = \prod_{k = 1}^{n} P (w_{k} ∣ w_{1 : k - 1}) \end{aligned}
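
A minimal sketch of a bigram model estimated by relative frequency (maximum likelihood), combining the chain rule with the bigram approximation. The toy corpus, the `<s>` and `</s>` boundary markers, and the function names are my own assumptions; no smoothing is applied.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])  # every token that precedes another
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def bigram_prob(prev, word, context_counts, bigram_counts):
    """MLE estimate P(word | prev) = C(prev, word) / C(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(sentence, context_counts, bigram_counts):
    """Chain rule with the bigram approximation: product of P(w_k | w_{k-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, context_counts, bigram_counts)
    return prob


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
contexts, bigrams = train_bigram(corpus)

p_i_start = bigram_prob("<s>", "I", contexts, bigrams)    # 2/3
p_am_i = bigram_prob("I", "am", contexts, bigrams)        # 2/3
p_sentence = sentence_prob("I am Sam", contexts, bigrams) # 2/3 * 2/3 * 1/2 * 1/2
```

Any bigram unseen in training gets probability zero here, which is why practical n-gram models add smoothing.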

Hidden Markov Models

Markov models are the class of probabilistic models that assume we can predict the probability of some future unit without looking too far into the past.

Three fundamental problems:

Likelihood
Decoding
Learning
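
The first two problems can be sketched on a toy HMM: the forward algorithm for likelihood and the Viterbi algorithm for decoding. All states, observations, and probabilities below are made-up illustration values. Learning (for example with the Baum-Welch algorithm) is not shown.

```python
# Hypothetical toy HMM: hidden weather states emit a count of 1, 2, or 3.
states = ["HOT", "COLD"]
start = {"HOT": 0.8, "COLD": 0.2}
trans = {
    "HOT": {"HOT": 0.6, "COLD": 0.4},
    "COLD": {"HOT": 0.5, "COLD": 0.5},
}
emit = {
    "HOT": {1: 0.2, 2: 0.4, 3: 0.4},
    "COLD": {1: 0.5, 2: 0.4, 3: 0.1},
}


def forward_likelihood(obs):
    """Likelihood: P(obs), summed over all hidden state sequences."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs):
    """Decoding: probability and path of the most likely state sequence."""
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor state for s.
            prob, path = max(
                ((best[p][0] * trans[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit[s][o], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


observations = [3, 1, 3]
likelihood = forward_likelihood(observations)
best_prob, best_path = viterbi(observations)
```

Both functions run in time linear in the sequence length, whereas enumerating every state sequence would grow exponentially.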

Perplexity

perplexity (W) = P (w_{1} w_{2} \dots w_{N})^{- \frac{1}{N}} = (\prod_{i = 1}^{N} \frac{1}{P (w_{i} ∣ w_{i - 1})})^{\frac{1}{N}}

The second form expands the probability with the bigram approximation.
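
A minimal sketch of computing perplexity as the inverse probability of the test sequence, normalized by the number of words. It assumes the per-word conditional probabilities have already been obtained from some model; the computation is done in log space to avoid underflow on long sequences.

```python
import math


def perplexity(word_probs):
    """word_probs[i] is the model's P(w_i | w_{i-1}) for each of the N words."""
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-log_prob / n)


# A model that assigns probability 1/10 to every word has perplexity 10.
uniform_ppl = perplexity([0.1] * 5)

# Lower perplexity means the model found the sequence less surprising.
confident_ppl = perplexity([0.5, 0.25])  # (1 / 0.125) ** (1 / 2)
```

Perplexity can be read as a weighted average branching factor: a uniform choice among 10 words at every step gives perplexity 10.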

3/14

Embedding & Neural Language Models

Lexical Semantics

Connotation

Valence
Arousal
Dominance

Word-Word Co-occurrence Matrix
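
A minimal sketch of building word-word co-occurrence counts with a symmetric context window. The window size, the whitespace tokenization, and the toy sentences are my own assumptions.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """counts[target][context] = times context appears within the window."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


counts = cooccurrence_counts(["the cat sat", "the cat ran"], window=1)
cat_row = dict(counts["cat"])  # the row of the matrix for the word "cat"
print(cat_row)  # {'the': 2, 'sat': 1, 'ran': 1}
```

Each row is a sparse vector for one word; words that appear in similar contexts end up with similar rows.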

TF-IDF: Weighting terms in the vector

TF (Term Frequency)
IDF (Inverse Document Frequency)

{tf}_{t, d} = \begin{cases} 1 + \log_{10} count (t, d) & \text{if } count (t, d) > 0 \\ 0 & \text{otherwise} \end{cases}

{idf}_{t} = \log_{10} (\frac{N}{{df}_{t}})

w_{t, d} = {tf}_{t, d} \times {idf}_{t}
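
The three formulas above can be sketched directly in code. The toy documents are made up, and returning 0 for a term that appears in no document is my own convention, since the idf formula is undefined when the document frequency is zero.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Log-scaled term frequency: 1 + log10(count) if the term occurs, else 0."""
    count = Counter(doc_tokens)[term]
    return 1 + math.log10(count) if count > 0 else 0.0


def idf(term, docs):
    """Inverse document frequency: log10(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log10(len(docs) / df) if df > 0 else 0.0  # assumed convention


def tf_idf(term, doc_tokens, docs):
    """Weight of a term in a document: tf * idf."""
    return tf(term, doc_tokens) * idf(term, docs)


docs = [
    "sweet sorrow".split(),
    "sweet love love".split(),
    "battle fool".split(),
]

w_love = tf_idf("love", docs[1], docs)    # (1 + log10 2) * log10(3 / 1)
w_sweet = tf_idf("sweet", docs[1], docs)  # 1 * log10(3 / 2)
```

A term that occurs in every document gets idf = log10(1) = 0, so very common words contribute nothing to the vector.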

Word2vec

Other Kinds of Static Embeddings

GloVe
fastText

Visualizing Embeddings

Probably the most common visualization method is to project the 100 dimensions of a word down into 2 dimensions using a projection method called t-SNE.

3/21

Pre-trained Language Model

Linear layer: linear transformation (weights and bias)
Feedforward network (MLP): linear layers with a non-linear transformation (activation) between them

Computational Graphs

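A computational graph is a structure that represents the computation performed by a neural network. It connects the network's layers, nodes, and operations into a directed graph. The graph describes forward propagation and backpropagation, as well as how values flow between nodes.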
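
A minimal sketch of a computational graph with a forward pass and backpropagation, supporting only addition and multiplication. The `Node` class and the example function L = c * (a + 2b) are my own illustration, not a real framework's API.

```python
class Node:
    """A value in the graph plus the local derivatives with respect to its inputs."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this node was computed from
        self.local_grads = local_grads  # d(self) / d(parent), one per parent
        self.grad = 0.0                 # d(output) / d(self), filled by backward()

    def __add__(self, other):
        return Node(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Node(self.value * other.value, (self, other), (other.value, self.value))


def backward(output):
    """Backpropagation: apply the chain rule in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


# Forward pass: L = c * (a + 2 * b)
a, b, c = Node(3.0), Node(1.0), Node(-2.0)
d = Node(2.0) * b
e = a + d
L = c * e

backward(L)
print(L.value, a.grad, b.grad, c.grad)  # -10.0 -2.0 -4.0 5.0
```

The forward pass stores each intermediate value, and the backward pass reuses them: dL/dc = e = 5, dL/da = c = -2, and dL/db = 2c = -4.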


Embedding Layer

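An embedding layer maps high-dimensional data into a lower-dimensional representation while preserving the properties of the original data.
When the input is large, as with sparse vectors, using an embedding layer makes the model easier to train.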
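
A minimal sketch showing that an embedding layer is a table lookup, equivalent to multiplying a sparse one-hot vector by a weight matrix. The vocabulary, the dimension, and the random initial weights are made-up values; in a real model the matrix is learned during training.

```python
import random

vocab = {"<unk>": 0, "happy": 1, "sad": 2, "calm": 3}
embedding_dim = 3

random.seed(0)
# One row of weights per vocabulary entry.
embedding_matrix = [
    [random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab
]


def one_hot(index, size):
    """Sparse high-dimensional representation: all zeros except one position."""
    return [1 if i == index else 0 for i in range(size)]


def embed_by_lookup(word):
    """What an embedding layer does in practice: select one row."""
    return embedding_matrix[vocab.get(word, vocab["<unk>"])]


def embed_by_matmul(word):
    """Equivalent computation: one-hot vector times the embedding matrix."""
    x = one_hot(vocab.get(word, vocab["<unk>"]), len(vocab))
    return [
        sum(x[i] * embedding_matrix[i][j] for i in range(len(vocab)))
        for j in range(embedding_dim)
    ]


dense = embed_by_lookup("sad")  # a 3-dimensional vector instead of a 4-dimensional one-hot
```

With a realistic vocabulary of tens of thousands of words, the lookup avoids ever materializing the one-hot vectors.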

RNN & LSTM

RNN: any network that contains a cycle within its network connections
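
A minimal sketch of the cycle in a simple (Elman-style) RNN: the hidden state from the previous step feeds back into the current step. The weight names and the toy values are my own assumptions.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One time step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    hidden_size = len(b_h)
    h = []
    for i in range(hidden_size):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][j] * h_prev[j] for j in range(hidden_size))
        h.append(math.tanh(total))
    return h


def run_rnn(inputs, W_xh, W_hh, b_h):
    """Process a sequence left to right, carrying the hidden state forward."""
    h = [0.0] * len(b_h)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Hypothetical weights: 2 input features, 2 hidden units.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.0], [0.0, 0.2]]
b_h = [0.0, 0.0]

# The same input at two time steps yields different hidden states,
# because the second step also sees the state left by the first.
states = run_rnn([[1.0, 0.0], [1.0, 0.0]], W_xh, W_hh, b_h)
```

A Bi-RNN runs a second network of the same form over the reversed sequence and concatenates the two hidden states at each position.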

Bi-RNN

LSTM: forget/add/output gates

Bi-LSTM

GRU

CNN for NLP

3/28

Large Language Model

Class cancelled

4/4

Tomb Sweeping Day (no class)

4/11

Large Language Model

Transformer

Backward looking
Bidirectional

How to compare words?

Inner product
Query / Key / Value
Attention score: the query-key inner product divided by the square root of the dimensionality of the query and key vectors
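
A minimal sketch of scaled dot-product attention for a single query, written with plain Python lists. The function names and the toy vectors are my own; real implementations operate on whole matrices at once.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product, used to compare a query with a key."""
    return sum(a * b for a, b in zip(u, v))


def attention(query, keys, values):
    """Weighted sum of values, weighted by softmax(q . k / sqrt(d_k))."""
    d_k = len(query)
    scores = [dot(query, k) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

output, weights = attention(query, keys, values)
# The key most similar to the query (the first one) receives the largest weight.
```

Dividing by the square root of the dimensionality keeps the scores from growing with vector size, which would otherwise push the softmax toward a one-hot distribution.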

Multihead attention

Transformer block

Positional Embedding

Sampling

Top-K
Top-P
Temperature
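
The three sampling controls can be sketched as transformations of the next-token distribution before a token is drawn. The function names and the example probabilities are my own; library implementations differ in details such as tie handling.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Temperature below 1 sharpens the distribution; above 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def sample(probs, rng=random):
    """Draw one token index according to the (filtered) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
top2 = top_k_filter(probs, 2)       # [0.625, 0.375, 0.0, 0.0]
nucleus = top_p_filter(probs, 0.9)  # keeps the first three tokens
token = sample(top2, random.Random(0))
```

Top-K always keeps a fixed number of candidates, while Top-P adapts the number of candidates to how peaked the distribution is.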

A Survey of Large Language Models

4/18

Paper presentation

4/25

Paper presentation

5/2

Paper presentation

5/9

Project progress

I use MoE BERT.

5/16

Project related paper presentation

5/23

Project related paper presentation

5/30

Project related paper presentation

6/6

Project presentation
