# Classifying the Iris dataset with KNN in MATLAB

## KNN distance calculation

The k-nearest neighbors (KNN) algorithm is one of the simplest machine learning algorithms. Roughly speaking, KNN looks at the training samples closest to a test sample and assigns the test sample the class those neighbors belong to; the parameter k is the number of nearest neighbors consulted.

The example below uses the iris flower features. From left to right, the columns are features 1, 2, 3, 4 and the class label.

```
5.1000000e+000 3.5000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.0000000e+000 1.4000000e+000 2.0000000e-001 1
4.7000000e+000 3.2000000e+000 1.3000000e+000 2.0000000e-001 1
4.6000000e+000 3.1000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.6000000e+000 1.4000000e+000 2.0000000e-001 1
5.4000000e+000 3.9000000e+000 1.7000000e+000 4.0000000e-001 1
4.6000000e+000 3.4000000e+000 1.4000000e+000 3.0000000e-001 1
5.0000000e+000 3.4000000e+000 1.5000000e+000 2.0000000e-001 1
4.4000000e+000 2.9000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.4000000e+000 3.7000000e+000 1.5000000e+000 2.0000000e-001 1
4.8000000e+000 3.4000000e+000 1.6000000e+000 2.0000000e-001 1
4.8000000e+000 3.0000000e+000 1.4000000e+000 1.0000000e-001 1
4.3000000e+000 3.0000000e+000 1.1000000e+000 1.0000000e-001 1
5.8000000e+000 4.0000000e+000 1.2000000e+000 2.0000000e-001 1
5.7000000e+000 4.4000000e+000 1.5000000e+000 4.0000000e-001 1
5.4000000e+000 3.9000000e+000 1.3000000e+000 4.0000000e-001 1
5.1000000e+000 3.5000000e+000 1.4000000e+000 3.0000000e-001 1
5.7000000e+000 3.8000000e+000 1.7000000e+000 3.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.5000000e+000 3.0000000e-001 1
5.4000000e+000 3.4000000e+000 1.7000000e+000 2.0000000e-001 1
5.1000000e+000 3.7000000e+000 1.5000000e+000 4.0000000e-001 1
4.6000000e+000 3.6000000e+000 1.0000000e+000 2.0000000e-001 1
5.1000000e+000 3.3000000e+000 1.7000000e+000 5.0000000e-001 1
4.8000000e+000 3.4000000e+000 1.9000000e+000 2.0000000e-001 1
5.0000000e+000 3.0000000e+000 1.6000000e+000 2.0000000e-001 1
5.0000000e+000 3.4000000e+000 1.6000000e+000 4.0000000e-001 1
5.2000000e+000 3.5000000e+000 1.5000000e+000 2.0000000e-001 1
5.2000000e+000 3.4000000e+000 1.4000000e+000 2.0000000e-001 1
4.7000000e+000 3.2000000e+000 1.6000000e+000 2.0000000e-001 1
4.8000000e+000 3.1000000e+000 1.6000000e+000 2.0000000e-001 1
5.4000000e+000 3.4000000e+000 1.5000000e+000 4.0000000e-001 1
5.2000000e+000 4.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.5000000e+000 4.2000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.0000000e+000 3.2000000e+000 1.2000000e+000 2.0000000e-001 1
5.5000000e+000 3.5000000e+000 1.3000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
4.4000000e+000 3.0000000e+000 1.3000000e+000 2.0000000e-001 1
5.1000000e+000 3.4000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.5000000e+000 1.3000000e+000 3.0000000e-001 1
4.5000000e+000 2.3000000e+000 1.3000000e+000 3.0000000e-001 1
4.4000000e+000 3.2000000e+000 1.3000000e+000 2.0000000e-001 1
5.0000000e+000 3.5000000e+000 1.6000000e+000 6.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.9000000e+000 4.0000000e-001 1
4.8000000e+000 3.0000000e+000 1.4000000e+000 3.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.6000000e+000 2.0000000e-001 1
4.6000000e+000 3.2000000e+000 1.4000000e+000 2.0000000e-001 1
5.3000000e+000 3.7000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.3000000e+000 1.4000000e+000 2.0000000e-001 1
7.0000000e+000 3.2000000e+000 4.7000000e+000 1.4000000e+000 2
6.4000000e+000 3.2000000e+000 4.5000000e+000 1.5000000e+000 2
6.9000000e+000 3.1000000e+000 4.9000000e+000 1.5000000e+000 2
5.5000000e+000 2.3000000e+000 4.0000000e+000 1.3000000e+000 2
6.5000000e+000 2.8000000e+000 4.6000000e+000 1.5000000e+000 2
5.7000000e+000 2.8000000e+000 4.5000000e+000 1.3000000e+000 2
6.3000000e+000 3.3000000e+000 4.7000000e+000 1.6000000e+000 2
4.9000000e+000 2.4000000e+000 3.3000000e+000 1.0000000e+000 2
6.6000000e+000 2.9000000e+000 4.6000000e+000 1.3000000e+000 2
5.2000000e+000 2.7000000e+000 3.9000000e+000 1.4000000e+000 2
5.0000000e+000 2.0000000e+000 3.5000000e+000 1.0000000e+000 2
5.9000000e+000 3.0000000e+000 4.2000000e+000 1.5000000e+000 2
6.0000000e+000 2.2000000e+000 4.0000000e+000 1.0000000e+000 2
6.1000000e+000 2.9000000e+000 4.7000000e+000 1.4000000e+000 2
5.6000000e+000 2.9000000e+000 3.6000000e+000 1.3000000e+000 2
6.7000000e+000 3.1000000e+000 4.4000000e+000 1.4000000e+000 2
5.6000000e+000 3.0000000e+000 4.5000000e+000 1.5000000e+000 2
5.8000000e+000 2.7000000e+000 4.1000000e+000 1.0000000e+000 2
6.2000000e+000 2.2000000e+000 4.5000000e+000 1.5000000e+000 2
5.6000000e+000 2.5000000e+000 3.9000000e+000 1.1000000e+000 2
5.9000000e+000 3.2000000e+000 4.8000000e+000 1.8000000e+000 2
6.1000000e+000 2.8000000e+000 4.0000000e+000 1.3000000e+000 2
6.3000000e+000 2.5000000e+000 4.9000000e+000 1.5000000e+000 2
6.1000000e+000 2.8000000e+000 4.7000000e+000 1.2000000e+000 2
6.4000000e+000 2.9000000e+000 4.3000000e+000 1.3000000e+000 2
6.6000000e+000 3.0000000e+000 4.4000000e+000 1.4000000e+000 2
6.8000000e+000 2.8000000e+000 4.8000000e+000 1.4000000e+000 2
6.7000000e+000 3.0000000e+000 5.0000000e+000 1.7000000e+000 2
6.0000000e+000 2.9000000e+000 4.5000000e+000 1.5000000e+000 2
5.7000000e+000 2.6000000e+000 3.5000000e+000 1.0000000e+000 2
5.5000000e+000 2.4000000e+000 3.8000000e+000 1.1000000e+000 2
5.5000000e+000 2.4000000e+000 3.7000000e+000 1.0000000e+000 2
5.8000000e+000 2.7000000e+000 3.9000000e+000 1.2000000e+000 2
6.0000000e+000 2.7000000e+000 5.1000000e+000 1.6000000e+000 2
5.4000000e+000 3.0000000e+000 4.5000000e+000 1.5000000e+000 2
6.0000000e+000 3.4000000e+000 4.5000000e+000 1.6000000e+000 2
6.7000000e+000 3.1000000e+000 4.7000000e+000 1.5000000e+000 2
6.3000000e+000 2.3000000e+000 4.4000000e+000 1.3000000e+000 2
5.6000000e+000 3.0000000e+000 4.1000000e+000 1.3000000e+000 2
5.5000000e+000 2.5000000e+000 4.0000000e+000 1.3000000e+000 2
5.5000000e+000 2.6000000e+000 4.4000000e+000 1.2000000e+000 2
6.1000000e+000 3.0000000e+000 4.6000000e+000 1.4000000e+000 2
5.8000000e+000 2.6000000e+000 4.0000000e+000 1.2000000e+000 2
5.0000000e+000 2.3000000e+000 3.3000000e+000 1.0000000e+000 2
5.6000000e+000 2.7000000e+000 4.2000000e+000 1.3000000e+000 2
5.7000000e+000 3.0000000e+000 4.2000000e+000 1.2000000e+000 2
5.7000000e+000 2.9000000e+000 4.2000000e+000 1.3000000e+000 2
6.2000000e+000 2.9000000e+000 4.3000000e+000 1.3000000e+000 2
5.1000000e+000 2.5000000e+000 3.0000000e+000 1.1000000e+000 2
5.7000000e+000 2.8000000e+000 4.1000000e+000 1.3000000e+000 2
6.3000000e+000 3.3000000e+000 6.0000000e+000 2.5000000e+000 3
5.8000000e+000 2.7000000e+000 5.1000000e+000 1.9000000e+000 3
7.1000000e+000 3.0000000e+000 5.9000000e+000 2.1000000e+000 3
6.3000000e+000 2.9000000e+000 5.6000000e+000 1.8000000e+000 3
6.5000000e+000 3.0000000e+000 5.8000000e+000 2.2000000e+000 3
7.6000000e+000 3.0000000e+000 6.6000000e+000 2.1000000e+000 3
4.9000000e+000 2.5000000e+000 4.5000000e+000 1.7000000e+000 3
7.3000000e+000 2.9000000e+000 6.3000000e+000 1.8000000e+000 3
6.7000000e+000 2.5000000e+000 5.8000000e+000 1.8000000e+000 3
7.2000000e+000 3.6000000e+000 6.1000000e+000 2.5000000e+000 3
6.5000000e+000 3.2000000e+000 5.1000000e+000 2.0000000e+000 3
6.4000000e+000 2.7000000e+000 5.3000000e+000 1.9000000e+000 3
6.8000000e+000 3.0000000e+000 5.5000000e+000 2.1000000e+000 3
5.7000000e+000 2.5000000e+000 5.0000000e+000 2.0000000e+000 3
5.8000000e+000 2.8000000e+000 5.1000000e+000 2.4000000e+000 3
6.4000000e+000 3.2000000e+000 5.3000000e+000 2.3000000e+000 3
6.5000000e+000 3.0000000e+000 5.5000000e+000 1.8000000e+000 3
7.7000000e+000 3.8000000e+000 6.7000000e+000 2.2000000e+000 3
7.7000000e+000 2.6000000e+000 6.9000000e+000 2.3000000e+000 3
6.0000000e+000 2.2000000e+000 5.0000000e+000 1.5000000e+000 3
6.9000000e+000 3.2000000e+000 5.7000000e+000 2.3000000e+000 3
5.6000000e+000 2.8000000e+000 4.9000000e+000 2.0000000e+000 3
7.7000000e+000 2.8000000e+000 6.7000000e+000 2.0000000e+000 3
6.3000000e+000 2.7000000e+000 4.9000000e+000 1.8000000e+000 3
6.7000000e+000 3.3000000e+000 5.7000000e+000 2.1000000e+000 3
7.2000000e+000 3.2000000e+000 6.0000000e+000 1.8000000e+000 3
6.2000000e+000 2.8000000e+000 4.8000000e+000 1.8000000e+000 3
6.1000000e+000 3.0000000e+000 4.9000000e+000 1.8000000e+000 3
6.4000000e+000 2.8000000e+000 5.6000000e+000 2.1000000e+000 3
7.2000000e+000 3.0000000e+000 5.8000000e+000 1.6000000e+000 3
7.4000000e+000 2.8000000e+000 6.1000000e+000 1.9000000e+000 3
7.9000000e+000 3.8000000e+000 6.4000000e+000 2.0000000e+000 3
6.4000000e+000 2.8000000e+000 5.6000000e+000 2.2000000e+000 3
6.3000000e+000 2.8000000e+000 5.1000000e+000 1.5000000e+000 3
6.1000000e+000 2.6000000e+000 5.6000000e+000 1.4000000e+000 3
7.7000000e+000 3.0000000e+000 6.1000000e+000 2.3000000e+000 3
6.3000000e+000 3.4000000e+000 5.6000000e+000 2.4000000e+000 3
6.4000000e+000 3.1000000e+000 5.5000000e+000 1.8000000e+000 3
6.0000000e+000 3.0000000e+000 4.8000000e+000 1.8000000e+000 3
6.9000000e+000 3.1000000e+000 5.4000000e+000 2.1000000e+000 3
6.7000000e+000 3.1000000e+000 5.6000000e+000 2.4000000e+000 3
6.9000000e+000 3.1000000e+000 5.1000000e+000 2.3000000e+000 3
5.8000000e+000 2.7000000e+000 5.1000000e+000 1.9000000e+000 3
6.8000000e+000 3.2000000e+000 5.9000000e+000 2.3000000e+000 3
6.7000000e+000 3.3000000e+000 5.7000000e+000 2.5000000e+000 3
6.7000000e+000 3.0000000e+000 5.2000000e+000 2.3000000e+000 3
6.3000000e+000 2.5000000e+000 5.0000000e+000 1.9000000e+000 3
6.5000000e+000 3.0000000e+000 5.2000000e+000 2.0000000e+000 3
6.2000000e+000 3.4000000e+000 5.4000000e+000 2.3000000e+000 3
5.9000000e+000 3.0000000e+000 5.1000000e+000 1.8000000e+000 3
```

## Scatter plots of the three classes for each pair of features, in total ![](https://i.imgur.com/IZrh18m.png) (6 plots)

**Read the file**

```
dataSet = load('iris.txt');
rawData = dataSet(:,1:4);    % raw data, 150 samples x 4 features
label   = dataSet(:,5);      % class labels of the 150 samples
```

**Draw the plots**

```
% Scatter Plot
figure;                      % open a new figure window
row = nchoosek(1:4, 2);      % all 6 pairs of features
row_size = size(row);
for dd = 1:row_size(1)
    subplot(2, 3, dd)
    % Plot classes 1 to 3 against the two features of the current pair.
    plot(rawData(  1: 50, row(dd,1)), rawData(  1: 50, row(dd,2)), 'ro', ...
         rawData( 51:100, row(dd,1)), rawData( 51:100, row(dd,2)), 'go', ...
         rawData(101:150, row(dd,1)), rawData(101:150, row(dd,2)), 'bo');
    title('Scatter Plot');                      % plot title
    legend('class1', 'class2', 'class3');       % class legend
    xlabel(['Feature', int2str(row(dd,1))]);    % feature number on the x axis
    ylabel(['Feature', int2str(row(dd,2))]);    % feature number on the y axis
end
```

![](https://i.imgur.com/H748AM4.png)

## Testing the K-NN classifier on the Iris dataset and listing the classification rate of every possible feature combination (15 combinations in total)

In the first run, the first half of each class is used as training data and the remaining second half as testing data, which gives one classification rate. The training data and testing data are then swapped to obtain a second classification rate, and the two rates are averaged.

Here rate1 is the classification rate with the first half as training data, and rate2 is the rate with the second half as training data.

Neighbor counts of k = 1 and k = 3 are tested.

```
% feature subsets of size 1, 2, 3 and 4
for j = 1:4
    row = nchoosek(1:4, j);
    row_size = size(row);
    for dd = 1:row_size(1)
        class_ = num2str(row(dd,:));
        % first half of each class, merged into the training set
        trnSet = [rawData(  1: 25, row(dd,:)); ...
                  rawData( 51: 75, row(dd,:)); ...
                  rawData(101:125, row(dd,:))];
        % second half of each class, merged into the test set
        tstSet = [rawData( 26: 50, row(dd,:)); ...
                  rawData( 76:100, row(dd,:)); ...
                  rawData(126:150, row(dd,:))];
        trnClass = [label(  1: 25, 1); ...
                    label( 51: 75, 1); ...
                    label(101:125, 1)];
        tstClass = [label( 26: 50, 1); ...
                    label( 76:100, 1); ...
                    label(126:150, 1)];
        for k = [1 3]
            rate1 = rate_calculate(trnSet, tstSet, trnClass, tstClass, k);
            rate2 = rate_calculate(tstSet, trnSet, tstClass, trnClass, k);
            rate = (rate1 + rate2)/2;
            fprintf('K=%d, features %10s, classification rate: %.4f\n', k, class_, rate)
        end
    end
end
```

The output is:

```
K=1, features          1, classification rate: 0.5933
K=3, features          1, classification rate: 0.6000
K=1, features          2, classification rate: 0.4867
K=3, features          2, classification rate: 0.5200
K=1, features          3, classification rate: 0.9133
K=3, features          3, classification rate: 0.9267
K=1, features          4, classification rate: 0.9133
K=3, features          4, classification rate: 0.9600
K=1, features       1  2, classification rate: 0.7067
K=3, features       1  2, classification rate: 0.7533
K=1, features       1  3, classification rate: 0.9267
K=3, features       1  3, classification rate: 0.9267
K=1, features       1  4, classification rate: 0.8867
K=3, features       1  4, classification rate: 0.9400
K=1, features       2  3, classification rate: 0.9200
K=3, features       2  3, classification rate: 0.9200
K=1, features       2  4, classification rate: 0.9333
K=3, features       2  4, classification rate: 0.9533
K=1, features       3  4, classification rate: 0.9533
K=3, features       3  4, classification rate: 0.9533
K=1, features    1  2  3, classification rate: 0.9267
K=3, features    1  2  3, classification rate: 0.9267
K=1, features    1  2  4, classification rate: 0.9267
K=3, features    1  2  4, classification rate: 0.9067
K=1, features    1  3  4, classification rate: 0.9467
K=3, features    1  3  4, classification rate: 0.9533
K=1, features    2  3  4, classification rate: 0.9667
K=3, features    2  3  4, classification rate: 0.9733
K=1, features 1  2  3  4, classification rate: 0.9467
K=3, features 1  2  3  4, classification rate: 0.9400
```

In the knn function below, norm computes the distance in Euclidean space.
Taking A=[1,2,3]; B=[5,6,7] as an example:
norm(A-B)=6.9282
because A-B=[-4,-4,-4] and
( (-4)^2^ + (-4)^2^ + (-4)^2^ )^0.5^ = 6.9282

The sort function with the 'ascend' option sorts d_array from smallest to largest (value) and records the original positions (index).
Taking A=[5,1,4,3,2] as an example:
[value,index] = sort(A,'ascend')
value = [1,2,3,4,5] (sorted values)
index = [2,5,4,3,1] (original positions)

**function KNN**

```
function seat = knn(train_row, test_row, k)
% For each test row, return the indices of its k nearest training rows.
n_test  = size(test_row, 1);
n_train = size(train_row, 1);
seat    = zeros(n_test, k);
d_array = zeros(1, n_train);
for test_count = 1:n_test
    for train_count = 1:n_train
        distance = norm(test_row(test_count,:) - train_row(train_count,:));
        d_array(train_count) = distance;
    end
    [value, index] = sort(d_array, 'ascend');
    % positions of the closest (most likely) training points
    seat(test_count, 1:k) = index(1:k);
end
end
```

Computing the KNN classification rate

**function rate_calculate**

```
function rate = rate_calculate(trnSet, tstSet, trnClass, tstClass, k)
predict_seat = knn(trnSet, tstSet, k);   % indices into the training set
n_test  = size(tstSet, 1);
correct = 0;
for count = 1:n_test
    % labels of the k nearest training samples, closest first
    pd_array = trnClass(predict_seat(count, 1:k));
    % Handle votes such as [2, 1, 3]: mode returns the smallest tied value (1),
    % so fall back to the label of the nearest neighbor.
    % Caveat: this test also fires for votes such as [2, 1, 1], where class 1
    % holds the majority. The rule is left unchanged to match the results above.
    if mode(pd_array) == 1 && pd_array(1) ~= 1
        pd_value = pd_array(1);
    else
        pd_value = mode(pd_array);
    end
    % compare the prediction with the true label of the test sample
    if tstClass(count) == pd_value
        correct = correct + 1;
    end
end
rate = correct / n_test;
end
```
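
The distance, sort and vote steps of `knn` and `rate_calculate` can also be condensed into one function. The block below is a minimal sketch under stated assumptions, not the code that produced the results in this post: the function name `knn_predict_sketch` and the toy data are hypothetical, and the tie rule is a swapped-in generalization of the `[2, 1, 3]` handling (among the classes sharing the highest vote count, take the one whose sample is closest). It assumes MATLAB R2016b or later, for implicit expansion and for local functions in a script.

```matlab
% Toy data (made up for illustration): two well-separated 2-D clusters.
trnSet   = [1 1; 1 2; 2 1; 5 5; 6 5; 5 6];
trnClass = [1; 1; 1; 2; 2; 2];
tstSet   = [1.5 1.5; 5.5 5.5];

pred = knn_predict_sketch(trnSet, trnClass, tstSet, 3);
disp(pred');   % each test point sits inside one cluster, so pred is [1; 2]

function pred = knn_predict_sketch(trnSet, trnClass, tstSet, k)
% Hypothetical helper, not part of the original post.
% Predicts one label per test row from its k nearest training rows.
    nTest   = size(tstSet, 1);
    pred    = zeros(nTest, 1);
    classes = unique(trnClass);
    for t = 1:nTest
        % Euclidean distance from test row t to every training row.
        d = sqrt(sum((trnSet - tstSet(t, :)).^2, 2));
        [~, index] = sort(d, 'ascend');
        nearest = trnClass(index(1:k));      % neighbor labels, closest first
        % Count the votes per class and collect the top-voted classes.
        votes   = arrayfun(@(c) sum(nearest == c), classes);
        winners = classes(votes == max(votes));
        % Among the top-voted classes, take the one seen first in distance
        % order; with a unique winner this is simply the majority class.
        pred(t) = nearest(find(ismember(nearest, winners), 1));
    end
end
```

With this rule a vote of `[2, 1, 3]` still resolves to the nearest neighbor (class 2), while a vote of `[2, 1, 1]` resolves to the majority (class 1). The classification rate is then `mean(pred == tstClass)`.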