# Introduction to Artificial Intelligence — Final Exam

[TOC]

## Past Exam Questions

- ==What is the main difference between CNN and RNN?==
    - CNN is suitable for spatial data such as images. RNN is used for temporal data, also called sequential data.
- What is overfitting?
    - Overfitting refers to the condition in which the model fits the training data (almost) perfectly but fails to generalize to unseen test data.
- How can we reduce overfitting? Describe at least two tips. (A cross-validation sketch appears at the end of these notes.)
    - k-fold cross-validation
    - Regularization
    - Pruning
    - Dropout
    - Ensembling
    - Batch normalization
- What strategies can be applied to reduce overfitting when learning a decision tree?
    - **Decision tree pruning** (see the pruning sketch at the end of these notes):
        - build a full tree
        - find a test node
        - if irrelevant (detects only noise in the data) --> replace it with a leaf node
- What is the kernel trick in SVM?
    - Use a kernel function (a computation in the low-dimensional input space) to efficiently find the optimal linear separator in a very high-dimensional feature space; mapped back to the low-dimensional space, this separator may correspond to a warped, nonlinear boundary. (A worked example appears at the end of these notes.)
- Describe the idea of ensemble learning.
    - The idea of ensemble learning is to select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions. (A bagging sketch appears at the end of these notes.)
- Describe the idea of random forest.
    - The key idea is to randomly vary the *attribute choices*. At each split point in constructing the tree, select a random sample of the attributes, and then compute which of those gives the highest information gain.

## Random Notes

### CH19

- Supervised learning
    - Learn a function from a collection of input-output pairs and use it to predict the output for new inputs.
- Ockham's razor
    - The learned function should prefer the simplest hypothesis consistent with the data.
- Decision tree
    - A function that takes a vector of attribute values as input and returns a decision.
- DECISION-TREE-LEARNING algorithm
    - Always split on the most important attribute first, so that each split yields the most information (the largest reduction in entropy). (An information-gain sketch appears at the end of these notes.)
- Pruning
    - After the decision tree is built, check whether each node is relevant; if not, replace it with a leaf node. This makes the decision tree more general.
- Problems encountered when building a decision tree
    - Missing values, too many attributes, continuous-valued data.
- Independent and identically distributed (**iid**)
    - The variables are mutually independent and drawn from the same distribution.
- Regularization
    - Model selection should pick the model with the smallest cost; the mechanism that penalizes high-complexity hypotheses is regularization, where cost = loss + complexity. (A ridge-penalty sketch appears at the end of these notes.)
- Stochastic gradient descent (SGD)
    - Randomly select training examples for each gradient-descent step; this makes it less likely to get stuck in a local minimum. (A minimal SGD loop appears at the end of these notes.)
- Parametric model vs. nonparametric model
    - A parametric model fits a fixed set of parameters to the data; a nonparametric model has no fixed set of parameters and instead uses the training data themselves to predict the output.
- Nonparametric methods
    - Memory-based learning, KNN (see the KNN sketch at the end of these notes), k-d trees, locality-sensitive hashing (LSH), SVM.
- Ensemble learning
    - Bagging, random forests, stacking, boosting, online learning.

### CH20

- Statistical learning
    - The key ingredients are the data (which random variables are involved) and the hypotheses.
- Keys to Bayesian learning
    - The hypothesis prior and the likelihood of the data under each hypothesis.
- MAP hypothesis
    - Approximate the full Bayesian prediction by using only the single most probable hypothesis. (A small numeric example appears at the end of these notes.)
- Density estimation
    - Given data, estimate the probability that the model generates them, i.e., learn the distribution the data came from.
- EM algorithm
    - Pretend we know which component each data point belongs to; compute the means, covariances, and component weights from those assignments; then update the parameters, repeating until convergence. (An EM sketch appears at the end of these notes.)
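## Example Sketches (referenced above)

The sketches below are my own illustrations of the bullet points that reference them, not course material: dataset values, seeds, and hyperparameters are invented, and the library calls assume standard numpy / scikit-learn APIs.

First, the k-fold cross-validation tip for reducing overfitting: train on k-1 folds, validate on the held-out fold, and pick the model complexity with the best average validation score. A minimal sketch, assuming scikit-learn is available; the depth values compared are illustrative.

```python
# Minimal k-fold cross-validation sketch (assumes scikit-learn).
# Compare decision trees of different depths and keep the one whose
# average validation score is best -- a standard guard against overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for depth in (1, 3, 10):  # candidate complexities (illustrative values)
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)  # 5 folds
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```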
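Next, the "largest reduction in entropy" criterion from the DECISION-TREE-LEARNING bullet. A minimal numpy sketch; the tiny Boolean dataset is invented.

```python
# Information gain = entropy(parent) - weighted entropy(children).
# DECISION-TREE-LEARNING splits on the attribute with the largest gain.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Entropy reduction obtained by splitting on one attribute."""
    total = entropy(labels)
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        total -= len(subset) / len(labels) * entropy(subset)
    return total

# Toy data: does 'raining' or 'weekend' better predict 'stay home'?
raining  = np.array([1, 1, 0, 0, 1, 0])
weekend  = np.array([1, 0, 1, 0, 0, 1])
stayhome = np.array([1, 1, 0, 0, 1, 0])
print("gain(raining) =", information_gain(raining, stayhome))  # 1.0 bit
print("gain(weekend) =", information_gain(weekend, stayhome))  # ~0.08 bits
```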
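For the pruning answer: the notes describe pruning a test node when it is irrelevant (only noise). The sketch below uses the simpler reduced-error variant, a related but different criterion: prune whenever validation accuracy does not drop. The `Node` representation is my own assumption for illustration.

```python
# Reduced-error pruning sketch: build a full tree, then replace any test
# node with a leaf whenever doing so does not hurt validation accuracy.
import numpy as np

class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr          # attribute index tested at this node
        self.children = children  # dict: attribute value -> subtree
        self.label = label        # majority class seen during training

    def predict(self, x):
        if self.children is None:           # leaf node
            return self.label
        child = self.children.get(x[self.attr])
        return child.predict(x) if child is not None else self.label

def accuracy(tree, X, y):
    return np.mean([tree.predict(x) == t for x, t in zip(X, y)])

def prune(node, root, X_val, y_val):
    """Post-order walk: prune children first, then try this node."""
    if node.children is None:
        return
    for child in node.children.values():
        prune(child, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = node.children
    node.children = None                    # tentatively make it a leaf
    if accuracy(root, X_val, y_val) < before:
        node.children = saved               # pruning hurt -- undo it
```

A full demonstration would also need a tree-construction routine; the point here is only the prune-and-check loop.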
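A numeric check of the kernel-trick answer: for the polynomial kernel k(x, z) = (x·z)² on 2-D inputs, the kernel value equals an inner product in an implicit 3-D feature space, so the separator can be found without ever constructing that space. The feature map φ used here is the standard one for this kernel.

```python
# The kernel trick: k(x, z) = (x . z)**2 equals the dot product
# phi(x) . phi(z) with phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2),
# computed WITHOUT ever building the 3-D feature vectors.
import numpy as np

def kernel(x, z):
    return np.dot(x, z) ** 2       # cheap: works in input space

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(kernel(x, z))                # 16.0
print(np.dot(phi(x), phi(z)))      # 16.0 (up to float rounding) -- same value
```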
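For the ensemble-learning and random-forest answers, a bagging sketch: each tree is trained on a bootstrap resample, predictions are combined by majority vote, and `max_features="sqrt"` adds the random-forest twist of sampling attributes at each split. Assumes scikit-learn for the base trees; the tree count is illustrative.

```python
# Bagging sketch: train each tree on a bootstrap resample of the data,
# then combine predictions by majority vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

ensemble = []
for _ in range(25):                         # 25 trees (illustrative)
    idx = rng.integers(0, len(X), len(X))   # bootstrap: sample with replacement
    # max_features='sqrt': a random subset of attributes is considered
    # at each split point, as in a random forest.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    ensemble.append(tree.fit(X[idx], y[idx]))

votes = np.array([t.predict(X) for t in ensemble])   # shape (25, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the vote:", np.mean(majority == y))
```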
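For the regularization bullet, cost = loss + complexity made concrete: an L2 penalty λ‖w‖² on linear-regression weights gives the closed-form ridge solution, and larger λ shrinks the weights. Data are invented.

```python
# Regularization sketch: minimize ||Xw - y||^2 + lam * ||w||^2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
true_w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=30)

for lam in (0.0, 1.0, 100.0):
    # closed-form ridge solution: (X^T X + lam I) w = X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(f"lambda={lam:>5}: ||w|| = {np.linalg.norm(w):.3f}")  # shrinks with lam
```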
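The SGD bullet, as a minimal loop: one randomly chosen training example per update, here for least-squares linear regression. The data, learning rate, and step count are invented.

```python
# Minimal stochastic gradient descent for linear regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.05
for step in range(2000):
    i = rng.integers(len(X))               # the "stochastic" part
    grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of one example's squared loss
    w -= lr * grad
print("learned w:", np.round(w, 2))        # close to [2.0, -1.0, 0.5]
```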
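For the nonparametric bullet, a KNN sketch: there are no fitted parameters, the training set itself is the model, and prediction is a majority vote among the k nearest points. Toy 1-D data.

```python
# Nonparametric prediction sketch: k-nearest neighbors.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.abs(X_train - x)                    # 1-D distance to every point
    nearest = np.argsort(dists)[:k]                # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()  # majority vote

X_train = np.array([0.0, 0.5, 1.0, 5.0, 5.5, 6.0])
y_train = np.array([0,   0,   0,   1,   1,   1])
print(knn_predict(X_train, y_train, 0.7))  # -> 0 (near the left cluster)
print(knn_predict(X_train, y_train, 5.2))  # -> 1 (near the right cluster)
```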
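For the MAP-hypothesis bullet, a small numeric example: two hypotheses about a coin's bias, a prior over them, and three observed heads. The Bayesian prediction averages over all hypotheses; the MAP approximation keeps only the most probable one. All numbers are invented.

```python
# MAP vs. full Bayesian prediction on a toy two-hypothesis problem.
import numpy as np

priors = np.array([0.7, 0.3])    # P(h1: fair coin), P(h2: 90%-heads coin)
p_heads = np.array([0.5, 0.9])   # P(heads | h) for each hypothesis
n_heads = 3                      # observed: three heads in a row

likelihood = p_heads ** n_heads
posterior = priors * likelihood
posterior /= posterior.sum()
print("posterior:", np.round(posterior, 3))   # ~[0.286, 0.714]

# Bayesian prediction: average over ALL hypotheses, weighted by posterior.
print("P(next=heads), Bayesian:", np.dot(posterior, p_heads))   # ~0.79
# MAP approximation: use only the single most probable hypothesis.
print("P(next=heads), MAP:     ", p_heads[posterior.argmax()])  # 0.9
```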
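Finally, the EM bullet as a numpy sketch for a two-component 1-D Gaussian mixture: the E-step "pretends to know" each point's component via soft responsibilities, and the M-step refits the means, variances, and component weights from them. The data and initial values are invented.

```python
# EM sketch for a two-component 1-D Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu, var, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility r[i, k] = P(component k | x_i)
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update parameters from the soft assignments
    Nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    w = Nk / len(x)
print("means:", np.round(mu, 2), "weights:", np.round(w, 2))  # ~[0, 5], ~[0.5, 0.5]
```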