# question

###### tags: `Lecture2`

## Please ask some questions

1. Slide 43 shows three different images that all have the same L2 distance, which suggests KNN is not a good method. What method should we use to solve this problem?
2. When defining the distance for KNN, is there an objective (or subjective) way to decide which metric to use?
3. When trying different values of k, if several values give similar cross-validation accuracy, should we choose the larger k or the smaller one?
4. In the KNN example we can see that the background heavily influenced the model's decision; the model did not focus on the object we actually care about. Is there any way to improve this?
5. For high-dimensional data, can a linear classifier first apply dimensionality reduction and then classify?

My thoughts:

1. Actually, the L2 norm is not a good way to measure the distance between two images in pixel space. If you want to solve this problem, simply don't use KNN for image classification. You can see the picture below.
2. I think this is case by case; you can try different distance metrics in practice and pick the one that works best. The key is how the features are extracted; the distance between features matters much less. Because the features here are raw pixels, background noise easily sways the decision.
3. In practice you can use a held-out development set to decide which k is better: https://stats.stackexchange.com/questions/126051/choosing-optimal-k-for-knn For example, if your data are tightly clustered, a bigger k may work better; all in all it is case by case. A larger k is usually the safer choice, since a small k overfits easily.
4. I think we may find better methods in the coming chapters; I can't fully answer this question, sorry. How the image features are extracted matters most here; it is not a problem specific to KNN.
5. I think the answer is yes. In image classification we may call this "feature extraction", and later chapters can give you a better idea. If we have "numerical" data, we can use some elegant methods for dimensionality reduction, such as PCA or LLE. Both linear and nonlinear classifiers can work on the reduced dimensions; in high dimensions a linear model is actually less prone to overfitting, while a nonlinear one overfits more easily.

=======================

For question 2:

![](https://i.imgur.com/Fm9RrCC.png)
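To make the point from answer 1 concrete, here is a minimal sketch (with hypothetical 2x2 "images" made up for illustration) showing that two perceptually very different changes can produce exactly the same L2 distance in pixel space:

```python
import math

def l2_distance(x, y):
    """Euclidean (L2) distance between two flattened images."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical 2x2 grayscale images, flattened to length-4 vectors.
original  = [100, 100, 100, 100]
shifted   = [110, 110, 110, 110]  # every pixel brightened by 10 (looks almost the same)
corrupted = [120, 100, 100, 100]  # one pixel changed by 20 (a visible artifact)

print(l2_distance(original, shifted))    # 20.0
print(l2_distance(original, corrupted))  # 20.0
```

A uniform brightness shift and a single corrupted pixel get the identical distance of 20.0, so pixel-space L2 cannot tell "semantically similar" from "semantically different".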
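For answer 2 ("try different distances in practice"), a KNN classifier can take the metric as a parameter, which makes comparing L1 and L2 on your data trivial. A self-contained sketch on a toy 2-D dataset (all points and labels are made up):

```python
from collections import Counter

def l1(x, y):
    """Manhattan (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_sq(x, y):
    """Squared Euclidean distance (the square root does not change the ranking)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def knn_predict(train, query, k, distance):
    """Predict the label of `query` by majority vote among the
    k nearest training points under the given distance function."""
    ranked = sorted(train, key=lambda pt: distance(pt[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy dataset: (point, label) pairs forming two well-separated clusters.
train = [([0, 0], "A"), ([1, 0], "A"), ([0, 1], "A"),
         ([5, 5], "B"), ([6, 5], "B"), ([5, 6], "B")]

print(knn_predict(train, [0.5, 0.5], k=3, distance=l2_sq))  # A
print(knn_predict(train, [5.5, 5.5], k=3, distance=l1))     # B
```

Swapping `distance=l1` for `distance=l2_sq` is all it takes to compare metrics empirically; on real image data, though, the choice of features dominates the choice of metric, as noted above.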
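Answer 3's tie-breaking rule ("when several k are about equal, prefer the larger k") can be written down directly. The accuracy numbers below are hypothetical stand-ins for what cross-validation on a dev set might produce:

```python
# Hypothetical mean cross-validation accuracies for each candidate k.
cv_accuracy = {1: 0.88, 3: 0.91, 5: 0.92, 7: 0.92, 9: 0.90}

def pick_k(cv_accuracy, tolerance=0.01):
    """Among k values whose accuracy is within `tolerance` of the best,
    prefer the largest k (smoother decision boundary, less overfitting)."""
    best = max(cv_accuracy.values())
    candidates = [k for k, acc in cv_accuracy.items() if best - acc <= tolerance]
    return max(candidates)

print(pick_k(cv_accuracy))  # 7
```

Here k=3, 5, and 7 are all within 0.01 of the best score, and the rule resolves the tie toward k=7, the smoothest of the near-equal models.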
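For answer 5, a minimal PCA sketch shows what "dimensionality reduction before a linear classifier" looks like. This assumes numpy; the data are synthetic 3-D points that really live near a 1-D line, so one principal component captures almost everything:

```python
import numpy as np

def pca_project(X, n_components):
    """Project data onto its top principal components (a minimal PCA sketch)."""
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = np.cov(Xc, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    top = eigvecs[:, ::-1][:, :n_components]  # keep the leading components
    return Xc @ top

# Synthetic 3-D data lying near a 1-D line, plus a little noise.
rng = np.random.default_rng(0)
t = rng.normal(size=50)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(50, 3))

Z = pca_project(X, n_components=1)
print(Z.shape)  # (50, 1)
```

The reduced representation `Z` can then be fed to any linear classifier; as noted above, in high dimensions the linear model itself is the less overfitting-prone part, so the reduction mainly buys speed and noise suppression.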