# Review AI from LINE EC NLP - 林儀潤 (Vila Lin)

{%hackmd @HWDC/BJOE4qInR %}

> #### 》[Session introduction](https://hwdc.ithome.com.tw/2024/session-page/3312)
> #### 》[Fill in the session satisfaction survey | send feedback to the speaker](https://forms.gle/8srUw3fwZveXTvxH8)

## Brief Introduction

### Relationship

* Classic AI movie characters embody people's expectations of AI
* ML: solving problems with mathematics
    * Choosing features and doing feature engineering is not easy
* Deep Learning
    * Uses neural networks to find the important features on its own
* NLP
    * Word segmentation, keyword extraction
    * To make NLP mimic human behavior, it still needs to be combined with DL

### Two model concepts in AI

* Discriminative
    * Decision boundaries
    * $P(Y|X)$
* Generative
    * Probability distribution
    * Generative AI goes the other way around: feed noise and $Y$ in to generate the features

### Cases of the two model concepts

* Discriminative
    * Regression
    * SVM
    * CRF
    * Trees (XGBoost, LightGBM)
    * Neural networks (FM, BERT)
* Generative
    * RBMs
    * Bayesian networks
    * HMM
    * Diffusion models (corrupt with noise, then reconstruct)
    * GAN
    * GPT

### NLP Evolution

* Traditional methods
    * SVM
    * TF-IDF
    * LDA (evolved into GPT's summarization concept)
* Neural networks
    * Word2Vec
    * CNN
    * LSTM
* Pre-trained models
    * BERT (encoder)
    * GPT (decoder-only)
    * T5
* Prompt engineering
    * ChatGPT
    * Gemini
    * LLaMA
    * Claude

## NLP in E-commerce

* Segmentation & Embedding
    * Word segmentation
    * Applied to copywriters' articles and ad copy
* NER
    * Entity extraction, e.g. product specifications
* Classification
    * Distinguishing product categories
    * Manual classification gets harder as categories grow
    * Supervised learning for automatic labeling
* Query Understanding
    * Search engine
    * Satisfy customer needs by providing search suggestions
* NLG

### EC NLP

* Traditional approach: Trie
* Neural network
    * BiLSTM

### Trie

* Trie (dictionary tree)
* Dictionary
* Dynamic programming

### HMM

* HMM with states: BMSE (Begin, Middle, Single, End)
* Viterbi algorithm: choose the path with the maximum probability

### Problems

* Quality
    * Word coverage
    * Appearance frequency
    * Designing preprocessing
    * Collecting dictionaries
* HMM
    * Hard to capture complex/non-linear relationships
    * Computing cost increases with large datasets/complex states

### BiLSTM

* Architecture of BiLSTM
    * A pair of LSTMs (Long Short-Term Memory)
* Cross-BiLSTM-CNN

### Problems

* Not pre-trained on large corpora
* OOV (out-of-vocabulary) words
* Limited parallelism

### BERT

* Architecture of BERT
    * BERT Base
    * BERT Large
* Attention
    * Multi-head attention
    * Self-attention can handle word-order dependencies

### BERT Transfer Learning (the key point!)

* Pre-training
* Fine-tuning

### Problems

* Needs a significant amount of task-specific data
    * Data augmentation
    * Semi-supervised learning
    * Active learning
    * Knowledge distillation
    * External knowledge

## Past To Present

## Advance

### Discriminative vs Generative

* Combine the two kinds of AI
* Add value on top of open-source models

### BERT + GPT

* BERT
    * Word segmentation
    * Classification
    * NER
* GPT
    * One-shot/few-shot adaptation
    * Text generation
    * Summarization

### Generate Dataset by GPT

### Fine-Tune BERT

## Takeaway

* AI problems are like math problems: there is never just one solution
* Getting started is easy; sustaining is hard

== Chat ==
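The Trie-based dictionary segmentation from the "EC NLP" section can be sketched as follows. This is a minimal illustration, not LINE's actual implementation: the dictionary words and the forward-maximum-matching strategy (greedily take the longest dictionary word at each position) are assumptions for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    """Dictionary tree: each path from the root spells a prefix of a word."""

    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def prefixes(self, text, start):
        """Yield end indices of all dictionary words starting at `start`."""
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return
            if node.is_word:
                yield i + 1


def segment(text, trie):
    """Forward maximum matching: take the longest dictionary word each step."""
    out, i = [], 0
    while i < len(text):
        ends = list(trie.prefixes(text, i))
        j = max(ends) if ends else i + 1  # fall back to a single character
        out.append(text[i:j])
        i = j
    return out


# Toy dictionary for demonstration.
trie = Trie(["北京", "大學", "北京大學"])
print(segment("我在北京大學", trie))  # → ['我', '在', '北京大學']
```

Production segmenters typically replace the greedy matching with dynamic programming over all dictionary matches (as the notes mention), but the Trie lookup is the same.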
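The HMM segmentation from the notes tags each character with one of the BMSE states and decodes with the Viterbi algorithm (choosing the maximum-probability path). The sketch below uses made-up toy probabilities (`start_p`, `trans_p`, `emit_p`); a real system would estimate them from a corpus.

```python
import math

STATES = ["B", "M", "E", "S"]

# Toy parameters: a word can only start at B or S, B/M must continue, E/S end a word.
start_p = {"B": 0.6, "M": 0.0, "E": 0.0, "S": 0.4}
trans_p = {
    "B": {"B": 0.0, "M": 0.3, "E": 0.7, "S": 0.0},
    "M": {"B": 0.0, "M": 0.4, "E": 0.6, "S": 0.0},
    "E": {"B": 0.5, "M": 0.0, "E": 0.0, "S": 0.5},
    "S": {"B": 0.5, "M": 0.0, "E": 0.0, "S": 0.5},
}
emit_p = {  # illustrative emission probabilities for a four-character example
    "B": {"北": 0.9},
    "M": {},
    "E": {"京": 0.9},
    "S": {"我": 0.5, "愛": 0.5},
}


def viterbi(obs, emit_p):
    """Return the most probable BMSE tag path for the character sequence."""
    V = [{}]      # V[t][s] = best log-probability of any path ending in s at t
    back = [{}]   # back[t][s] = previous state on that best path
    for s in STATES:
        p = start_p[s] * emit_p[s].get(obs[0], 1e-8)
        V[0][s] = math.log(p) if p > 0 else float("-inf")
        back[0][s] = None
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in STATES:
            best_prev, best_lp = None, float("-inf")
            for prev in STATES:
                tp = trans_p[prev][s]
                if tp == 0:
                    continue
                lp = V[t - 1][prev] + math.log(tp)
                if lp > best_lp:
                    best_prev, best_lp = prev, lp
            V[t][s] = best_lp + math.log(emit_p[s].get(obs[t], 1e-8))
            back[t][s] = best_prev
    # A word can only end in E or S, so backtrack from the better of the two.
    last = max(("E", "S"), key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


def tags_to_words(chars, tags):
    """Cut the character sequence into words at E/S boundaries."""
    words, cur = [], ""
    for c, t in zip(chars, tags):
        cur += c
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    return words


tags = viterbi("我愛北京", emit_p)
print(tags_to_words("我愛北京", tags))  # → ['我', '愛', '北京']
```

This keeps only the best path into each state at each step, which is why Viterbi runs in time linear in the sentence length; the cost caveat in the notes (growing with dataset size and the number of states) comes from the quadratic factor in the state count.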