# Introduction to Artificial Intelligence Final Exam
[TOC]
## Past Exam Questions
- ==What is the main difference between CNN and RNN?==
- CNN is suitable for spatial data like images. RNN is used for temporal data, also called sequential data.
- What is overfitting?
- Overfitting refers to the condition where the model fits the training data completely but fails to generalize to unseen test data.
- How can we reduce overfitting? Describe at least two tips.
- k-fold cross-validation (see the sketch after this list)
- Regularization
- Pruning
- Dropout
- Ensembling
- Batch normalization
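A minimal sketch of the first tip, k-fold cross-validation, in Python with scikit-learn; the dataset and model here are arbitrary placeholders:

```python
# k-fold cross-validation: split the data into k folds, train on k-1 folds,
# validate on the held-out fold, and rotate k times.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3)  # limiting depth also helps against overfitting

scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV
print(scores.mean(), scores.std())           # average validation accuracy and its spread
```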
- What strategies can be applied to reduce overfitting when learning a decision tree?
- **decision tree pruning**
- build a full tree
- examine a test node that has only leaf nodes below it
- if the test is irrelevant (its data contain only noise) --> replace it with a leaf node
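A minimal sketch of the same recipe using scikit-learn's cost-complexity pruning. This is not the relevance test described above, but it follows the same idea: grow a full tree, then collapse subtrees that contribute little:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 1. Build a full tree.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 2. Compute candidate pruning strengths; larger ccp_alpha prunes more nodes.
path = full.cost_complexity_pruning_path(X_tr, y_tr)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a mid-range choice for illustration

# 3. Refit with pruning; ideally alpha is chosen by cross-validation.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
print(full.tree_.node_count, "->", pruned.tree_.node_count)  # far fewer nodes
print(full.score(X_te, y_te), pruned.score(X_te, y_te))      # similar or better test accuracy
```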
- What is the kernel trick in SVM?
- Use a kernel function (a computation done in the low-dimensional space) to efficiently find the optimal linear separator in a very high-dimensional space; when this separator is mapped back to the low-dimensional space, it can correspond to a warped, non-linear boundary.
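A minimal sketch of the effect: 2-D data that no linear separator can split becomes separable once an RBF kernel implicitly lifts it into a higher-dimensional space (scikit-learn's `SVC` applies the kernel trick internally):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(noise=0.1, factor=0.4, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # kernel trick: no explicit feature map is ever computed

print("linear accuracy:", linear.score(X, y))  # poor: no good linear boundary exists
print("rbf accuracy:", rbf.score(X, y))        # near 1.0: a non-linear boundary in 2-D
```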
- Describe the idea of ensemble learning.
- The idea of ensemble learning is to select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions.
- Describe the idea of random forest.
- The key idea is to randomly vary the *attribute choices*. At each split point in constructing the tree, select a random sampling of attributes, and then compute which of those gives the highest information gain.
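A minimal scikit-learn sketch, where `max_features` is the knob that randomizes the attribute choices at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=100,     # an ensemble of 100 trees trained on bootstrap samples
    max_features="sqrt",  # each split considers only a random subset of attributes
    random_state=0,
).fit(X, y)
print(forest.score(X, y))
```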
## Rough Notes
### CH19
- Supervised learning
- Learn a function from a collection of input-output pairs, and use it to predict the output for new inputs.
- Ockham's razor
- The learner should prefer the simplest hypothesis consistent with the data.
- Decision tree
- A function that takes a vector of attribute values as input and returns a decision.
- DECISION-TREE-LEARNING algorithm
- Always split on the most important attribute first, the one that yields the most information (the largest reduction in entropy).
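A minimal sketch of this attribute-selection step; the toy attributes (`patrons`, `rain`, in the spirit of the textbook's restaurant example) are illustrative placeholders:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy reduction from splitting the examples on one attribute."""
    groups = {}
    for ex, lab in zip(examples, labels):
        groups.setdefault(ex[attribute], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

examples = [{"patrons": "full", "rain": "no"}, {"patrons": "none", "rain": "yes"},
            {"patrons": "some", "rain": "no"}, {"patrons": "full", "rain": "yes"}]
labels = ["no", "no", "yes", "yes"]

# Split on the attribute with the highest gain first.
print(information_gain(examples, labels, "patrons"))  # 0.5 bit
print(information_gain(examples, labels, "rain"))     # 0.0 bit: irrelevant attribute
```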
- Pruning
- After the decision tree is built, check whether each node is relevant; if it is not, replace it with a leaf node. This makes the decision tree more general.
- Problems encountered when building a decision tree
- missing values, too many attributes, continuous-valued data
- independent and identically distributed (**iid**)
- The data points are mutually independent and all drawn from the same distribution.
- Regularization
- Model selection should pick the model with the lowest cost, where cost = loss + complexity; the mechanism that penalizes high-complexity hypotheses is called regularization.
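A minimal sketch using ridge regression, where the L2 penalty plays the role of the complexity term and `alpha` sets the trade-off between loss and complexity:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=50)

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2  (loss + complexity).
for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(model.coef_).sum())  # heavier penalty -> smaller weights
```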
- Stochastic gradient descent(SGD)
- Perform gradient descent using randomly selected training examples; this makes it less likely to get stuck in a local minimum.
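A minimal SGD sketch for linear regression; each update uses one randomly chosen example, so the gradient is noisy but cheap, and the noise helps escape shallow local minima:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.01
for step in range(5000):
    i = rng.integers(len(X))             # pick one random training example
    grad = 2 * (X[i] @ w - y[i]) * X[i]  # gradient of the squared error on that example
    w -= lr * grad                       # noisy but cheap descent step

print(w)  # close to true_w
```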
- Parametric model v.s. Nonparametric model
- A parametric model uses the data to fit a fixed, known set of parameters; a nonparametric model has no fixed set of parameters and instead uses the training data itself to predict the output.
- Nonparametric method
- memory-based learning, KNN, k-d trees, locality-sensitive hashing (LSH), SVM
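A minimal from-scratch KNN sketch: the prediction comes directly from the stored training examples, with no fitted parameters:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance from x to every stored example
    nearest = np.argsort(dists)[:k]              # indices of the k closest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # -> 1
```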
- Ensemble learning
- bagging, random forests, stacking, boosting, online learning
### CH20
- Statistical learning
- The key ingredients are the data (which random variables are involved) and the hypotheses.
- Key ingredients of Bayesian learning
- the hypothesis prior and the likelihood of the data under each hypothesis
- MAP hypothesis
- Approximate the full Bayesian prediction by using the single most probable hypothesis.
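In symbols (data $\mathbf{d}$, hypotheses $h_i$, query $X$), the full Bayesian prediction and its MAP approximation are:

$$P(X \mid \mathbf{d}) = \sum_i P(X \mid h_i)\,P(h_i \mid \mathbf{d}) \approx P(X \mid h_{\text{MAP}}),\qquad h_{\text{MAP}} = \arg\max_h P(\mathbf{d} \mid h)\,P(h)$$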
- density estimation
- Given the data, estimate the probability distribution that generated them.
- EM algorithm
- Pretend we know which component each data point belongs to, compute the means, covariances, and component weights, then update the parameters; repeat until convergence.
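A minimal EM sketch for a mixture of two 1-D Gaussians (so plain variances stand in for covariance matrices); the initial parameter values are arbitrary guesses:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu = np.array([-1.0, 1.0])   # initial means
var = np.array([1.0, 1.0])   # initial variances
w = np.array([0.5, 0.5])     # initial component weights

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: soft-assign each point to each component ("pretend we know").
    p = w * gauss(data[:, None], mu, var)   # shape (n, 2)
    r = p / p.sum(axis=1, keepdims=True)    # responsibilities
    # M-step: re-estimate means, variances, and weights from the soft assignments.
    n_k = r.sum(axis=0)
    mu = (r * data[:, None]).sum(axis=0) / n_k
    var = (r * (data[:, None] - mu) ** 2).sum(axis=0) / n_k
    w = n_k / len(data)

print(mu, var, w)  # means converge near 0 and 5
```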