# Introduction to Artificial Intelligence — Final Exam

[TOC]

## Past Exam Questions

- ==What is the main difference between CNN and RNN?==
    - CNN is suitable for spatial data such as images. RNN is used for temporal data, also called sequential data.
- What is overfitting?
    - Overfitting refers to the condition in which the model fits the training data (almost) perfectly but fails to generalize to unseen test data.
- How can we reduce overfitting? Describe at least two tips. (A cross-validation sketch appears at the end of these notes.)
    - k-fold cross-validation
    - Regularization
    - Pruning
    - Dropout
    - Ensembling
    - Batch normalization
- What strategies can be applied to reduce overfitting when learning a decision tree?
    - **Decision tree pruning** (see the pruning sketch at the end of these notes):
        - build a full tree
        - find a test node
        - if irrelevant (detects only noise in the data) --> replace it with a leaf node
- What is the kernel trick in SVM?
    - Use a kernel function (a computation in the low-dimensional input space) to efficiently find the optimal linear separator in a very high-dimensional feature space; mapped back to the low-dimensional space, this separator may correspond to a warped, nonlinear boundary. (A worked example appears at the end of these notes.)
- Describe the idea of ensemble learning.
    - The idea of ensemble learning is to select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions. (A bagging sketch appears at the end of these notes.)
- Describe the idea of random forest.
    - The key idea is to randomly vary the *attribute choices*. At each split point in constructing the tree, select a random sample of the attributes, and then compute which of those gives the highest information gain.

## Random Notes

### CH19

- Supervised learning
    - Learn a function from a collection of input-output pairs and use it to predict the output for new inputs.
- Ockham's razor
    - The learned function should prefer the simplest hypothesis consistent with the data.
- Decision tree
    - A function that takes a vector of attribute values as input and returns a decision.
- DECISION-TREE-LEARNING algorithm
    - Always split on the most important attribute first, so that each split yields the most information (the largest reduction in entropy). (An information-gain sketch appears at the end of these notes.)
- Pruning
    - After the decision tree is built, check whether each node is relevant; if not, replace it with a leaf node. This makes the decision tree more general.
- Problems encountered when building a decision tree
    - Missing values, too many attributes, continuous-valued data.
- Independent and identically distributed (**iid**)
    - The variables are mutually independent and drawn from the same distribution.
- Regularization
    - Model selection should pick the model with the smallest cost; the mechanism that penalizes high-complexity hypotheses is regularization, where cost = loss + complexity. (A ridge-penalty sketch appears at the end of these notes.)
- Stochastic gradient descent (SGD)
    - Randomly select training examples for each gradient-descent step; this makes it less likely to get stuck in a local minimum. (A minimal SGD loop appears at the end of these notes.)
- Parametric model vs. nonparametric model
    - A parametric model fits a fixed set of parameters to the data; a nonparametric model has no fixed set of parameters and instead uses the training data themselves to predict the output.
- Nonparametric methods
    - Memory-based learning, KNN (see the KNN sketch at the end of these notes), k-d trees, locality-sensitive hashing (LSH), SVM.
- Ensemble learning
    - Bagging, random forests, stacking, boosting, online learning.

### CH20

- Statistical learning
    - The key ingredients are the data (which random variables are involved) and the hypotheses.
- Keys to Bayesian learning
    - The hypothesis prior and the likelihood of the data under each hypothesis.
- MAP hypothesis
    - Approximate the full Bayesian prediction by using only the single most probable hypothesis. (A small numeric example appears at the end of these notes.)
- Density estimation
    - Given data, estimate the probability that the model generates them, i.e., learn the distribution the data came from.
- EM algorithm
    - Pretend we know which component each data point belongs to; compute the means, covariances, and component weights from those assignments; then update the parameters, repeating until convergence. (An EM sketch appears at the end of these notes.)
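## Example Sketches (referenced above)

The sketches below are my own illustrations of the bullet points that reference them, not course material: dataset values, seeds, and hyperparameters are invented, and the library calls assume standard numpy / scikit-learn APIs.

First, the k-fold cross-validation tip for reducing overfitting: train on k-1 folds, validate on the held-out fold, and pick the model complexity with the best average validation score. A minimal sketch, assuming scikit-learn is available; the depth values compared are illustrative.

```python
# Minimal k-fold cross-validation sketch (assumes scikit-learn).
# Compare decision trees of different depths and keep the one whose
# average validation score is best -- a standard guard against overfitting.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for depth in (1, 3, 10):  # candidate complexities (illustrative values)
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)  # 5 folds
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```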
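Next, the "largest reduction in entropy" criterion from the DECISION-TREE-LEARNING bullet. A minimal numpy sketch; the tiny Boolean dataset is invented.

```python
# Information gain = entropy(parent) - weighted entropy(children).
# DECISION-TREE-LEARNING splits on the attribute with the largest gain.
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Entropy reduction obtained by splitting on one attribute."""
    total = entropy(labels)
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        total -= len(subset) / len(labels) * entropy(subset)
    return total

# Toy data: does 'raining' or 'weekend' better predict 'stay home'?
raining  = np.array([1, 1, 0, 0, 1, 0])
weekend  = np.array([1, 0, 1, 0, 0, 1])
stayhome = np.array([1, 1, 0, 0, 1, 0])
print("gain(raining) =", information_gain(raining, stayhome))  # 1.0 bit
print("gain(weekend) =", information_gain(weekend, stayhome))  # ~0.08 bits
```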
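For the pruning answer: the notes describe pruning a test node when it is irrelevant (only noise). The sketch below uses the simpler reduced-error variant, a related but different criterion: prune whenever validation accuracy does not drop. The `Node` representation is my own assumption for illustration.

```python
# Reduced-error pruning sketch: build a full tree, then replace any test
# node with a leaf whenever doing so does not hurt validation accuracy.
import numpy as np

class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr          # attribute index tested at this node
        self.children = children  # dict: attribute value -> subtree
        self.label = label        # majority class seen during training

    def predict(self, x):
        if self.children is None:           # leaf node
            return self.label
        child = self.children.get(x[self.attr])
        return child.predict(x) if child is not None else self.label

def accuracy(tree, X, y):
    return np.mean([tree.predict(x) == t for x, t in zip(X, y)])

def prune(node, root, X_val, y_val):
    """Post-order walk: prune children first, then try this node."""
    if node.children is None:
        return
    for child in node.children.values():
        prune(child, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = node.children
    node.children = None                    # tentatively make it a leaf
    if accuracy(root, X_val, y_val) < before:
        node.children = saved               # pruning hurt -- undo it
```

A full demonstration would also need a tree-construction routine; the point here is only the prune-and-check loop.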
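A numeric check of the kernel-trick answer: for the polynomial kernel k(x, z) = (x·z)² on 2-D inputs, the kernel value equals an inner product in an implicit 3-D feature space, so the separator can be found without ever constructing that space. The feature map φ used here is the standard one for this kernel.

```python
# The kernel trick: k(x, z) = (x . z)**2 equals the dot product
# phi(x) . phi(z) with phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2),
# computed WITHOUT ever building the 3-D feature vectors.
import numpy as np

def kernel(x, z):
    return np.dot(x, z) ** 2       # cheap: works in input space

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(kernel(x, z))                # 16.0
print(np.dot(phi(x), phi(z)))      # 16.0 (up to float rounding) -- same value
```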
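For the ensemble-learning and random-forest answers, a bagging sketch: each tree is trained on a bootstrap resample, predictions are combined by majority vote, and `max_features="sqrt"` adds the random-forest twist of sampling attributes at each split. Assumes scikit-learn for the base trees; the tree count is illustrative.

```python
# Bagging sketch: train each tree on a bootstrap resample of the data,
# then combine predictions by majority vote.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

ensemble = []
for _ in range(25):                         # 25 trees (illustrative)
    idx = rng.integers(0, len(X), len(X))   # bootstrap: sample with replacement
    # max_features='sqrt': a random subset of attributes is considered
    # at each split point, as in a random forest.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    ensemble.append(tree.fit(X[idx], y[idx]))

votes = np.array([t.predict(X) for t in ensemble])   # shape (25, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the vote:", np.mean(majority == y))
```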
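For the regularization bullet, cost = loss + complexity made concrete: an L2 penalty λ‖w‖² on linear-regression weights gives the closed-form ridge solution, and larger λ shrinks the weights. Data are invented.

```python
# Regularization sketch: minimize ||Xw - y||^2 + lam * ||w||^2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
true_w = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=30)

for lam in (0.0, 1.0, 100.0):
    # closed-form ridge solution: (X^T X + lam I) w = X^T y
    w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(f"lambda={lam:>5}: ||w|| = {np.linalg.norm(w):.3f}")  # shrinks with lam
```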
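The SGD bullet, as a minimal loop: one randomly chosen training example per update, here for least-squares linear regression. The data, learning rate, and step count are invented.

```python
# Minimal stochastic gradient descent for linear regression.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

w = np.zeros(3)
lr = 0.05
for step in range(2000):
    i = rng.integers(len(X))               # the "stochastic" part
    grad = 2 * (X[i] @ w - y[i]) * X[i]    # gradient of one example's squared loss
    w -= lr * grad
print("learned w:", np.round(w, 2))        # close to [2.0, -1.0, 0.5]
```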
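For the nonparametric bullet, a KNN sketch: there are no fitted parameters, the training set itself is the model, and prediction is a majority vote among the k nearest points. Toy 1-D data.

```python
# Nonparametric prediction sketch: k-nearest neighbors.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    dists = np.abs(X_train - x)                    # 1-D distance to every point
    nearest = np.argsort(dists)[:k]                # indices of the k closest
    return np.bincount(y_train[nearest]).argmax()  # majority vote

X_train = np.array([0.0, 0.5, 1.0, 5.0, 5.5, 6.0])
y_train = np.array([0,   0,   0,   1,   1,   1])
print(knn_predict(X_train, y_train, 0.7))  # -> 0 (near the left cluster)
print(knn_predict(X_train, y_train, 5.2))  # -> 1 (near the right cluster)
```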
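For the MAP-hypothesis bullet, a small numeric example: two hypotheses about a coin's bias, a prior over them, and three observed heads. The Bayesian prediction averages over all hypotheses; the MAP approximation keeps only the most probable one. All numbers are invented.

```python
# MAP vs. full Bayesian prediction on a toy two-hypothesis problem.
import numpy as np

priors = np.array([0.7, 0.3])    # P(h1: fair coin), P(h2: 90%-heads coin)
p_heads = np.array([0.5, 0.9])   # P(heads | h) for each hypothesis
n_heads = 3                      # observed: three heads in a row

likelihood = p_heads ** n_heads
posterior = priors * likelihood
posterior /= posterior.sum()
print("posterior:", np.round(posterior, 3))   # ~[0.286, 0.714]

# Bayesian prediction: average over ALL hypotheses, weighted by posterior.
print("P(next=heads), Bayesian:", np.dot(posterior, p_heads))   # ~0.79
# MAP approximation: use only the single most probable hypothesis.
print("P(next=heads), MAP:     ", p_heads[posterior.argmax()])  # 0.9
```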
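Finally, the EM bullet as a numpy sketch for a two-component 1-D Gaussian mixture: the E-step "pretends to know" each point's component via soft responsibilities, and the M-step refits the means, variances, and component weights from them. The data and initial values are invented.

```python
# EM sketch for a two-component 1-D Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

mu, var, w = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibility r[i, k] = P(component k | x_i)
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update parameters from the soft assignments
    Nk = r.sum(axis=0)
    mu = (r * x[:, None]).sum(axis=0) / Nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    w = Nk / len(x)
print("means:", np.round(mu, 2), "weights:", np.round(w, 2))  # ~[0, 5], ~[0.5, 0.5]
```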