# Classifying the Iris Dataset with KNN in MATLAB
## KNN Distance Calculation
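The k-nearest neighbors (KNN) algorithm is one of the simplest machine learning algorithms.
Roughly speaking, KNN assigns a test sample to the class of the training samples closest to it; the parameter k is the number of nearest neighbors that are taken into account.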
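As an illustration of that rule, here is a minimal sketch on made-up 2-D points. `trainX`, `trainY` and `x` are hypothetical values, not rows of the iris file, and the broadcast subtraction assumes MATLAB R2016b or later (or GNU Octave).

```matlab
% Hypothetical toy data: six 2-D training points with labels 1 or 2
trainX = [1 1; 1 2; 2 1; 6 6; 6 7; 7 6];
trainY = [1; 1; 1; 2; 2; 2];
x = [2 2];   % hypothetical query point
k = 3;       % number of neighbors that vote

% Euclidean distance from x to every training row
% (x is broadcast across the rows of trainX)
d = sqrt(sum((trainX - x).^2, 2));
[~, idx] = sort(d, 'ascend');       % nearest rows first
nearestLabels = trainY(idx(1:k));   % labels of the k nearest neighbors
predicted = mode(nearestLabels);    % majority vote
fprintf('Predicted class: %d\n', predicted);
```

The three nearest points are all class 1, so the query is assigned to class 1.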
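The example below uses the features of iris flowers.
Each row lists, from left to right, features 1, 2, 3 and 4 (in the standard Iris dataset: sepal length, sepal width, petal length and petal width, in cm) followed by the class label. The rows are grouped by class: rows 1-50 are class 1, rows 51-100 are class 2 and rows 101-150 are class 3.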
```
5.1000000e+000 3.5000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.0000000e+000 1.4000000e+000 2.0000000e-001 1
4.7000000e+000 3.2000000e+000 1.3000000e+000 2.0000000e-001 1
4.6000000e+000 3.1000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.6000000e+000 1.4000000e+000 2.0000000e-001 1
5.4000000e+000 3.9000000e+000 1.7000000e+000 4.0000000e-001 1
4.6000000e+000 3.4000000e+000 1.4000000e+000 3.0000000e-001 1
5.0000000e+000 3.4000000e+000 1.5000000e+000 2.0000000e-001 1
4.4000000e+000 2.9000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.4000000e+000 3.7000000e+000 1.5000000e+000 2.0000000e-001 1
4.8000000e+000 3.4000000e+000 1.6000000e+000 2.0000000e-001 1
4.8000000e+000 3.0000000e+000 1.4000000e+000 1.0000000e-001 1
4.3000000e+000 3.0000000e+000 1.1000000e+000 1.0000000e-001 1
5.8000000e+000 4.0000000e+000 1.2000000e+000 2.0000000e-001 1
5.7000000e+000 4.4000000e+000 1.5000000e+000 4.0000000e-001 1
5.4000000e+000 3.9000000e+000 1.3000000e+000 4.0000000e-001 1
5.1000000e+000 3.5000000e+000 1.4000000e+000 3.0000000e-001 1
5.7000000e+000 3.8000000e+000 1.7000000e+000 3.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.5000000e+000 3.0000000e-001 1
5.4000000e+000 3.4000000e+000 1.7000000e+000 2.0000000e-001 1
5.1000000e+000 3.7000000e+000 1.5000000e+000 4.0000000e-001 1
4.6000000e+000 3.6000000e+000 1.0000000e+000 2.0000000e-001 1
5.1000000e+000 3.3000000e+000 1.7000000e+000 5.0000000e-001 1
4.8000000e+000 3.4000000e+000 1.9000000e+000 2.0000000e-001 1
5.0000000e+000 3.0000000e+000 1.6000000e+000 2.0000000e-001 1
5.0000000e+000 3.4000000e+000 1.6000000e+000 4.0000000e-001 1
5.2000000e+000 3.5000000e+000 1.5000000e+000 2.0000000e-001 1
5.2000000e+000 3.4000000e+000 1.4000000e+000 2.0000000e-001 1
4.7000000e+000 3.2000000e+000 1.6000000e+000 2.0000000e-001 1
4.8000000e+000 3.1000000e+000 1.6000000e+000 2.0000000e-001 1
5.4000000e+000 3.4000000e+000 1.5000000e+000 4.0000000e-001 1
5.2000000e+000 4.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.5000000e+000 4.2000000e+000 1.4000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
5.0000000e+000 3.2000000e+000 1.2000000e+000 2.0000000e-001 1
5.5000000e+000 3.5000000e+000 1.3000000e+000 2.0000000e-001 1
4.9000000e+000 3.1000000e+000 1.5000000e+000 1.0000000e-001 1
4.4000000e+000 3.0000000e+000 1.3000000e+000 2.0000000e-001 1
5.1000000e+000 3.4000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.5000000e+000 1.3000000e+000 3.0000000e-001 1
4.5000000e+000 2.3000000e+000 1.3000000e+000 3.0000000e-001 1
4.4000000e+000 3.2000000e+000 1.3000000e+000 2.0000000e-001 1
5.0000000e+000 3.5000000e+000 1.6000000e+000 6.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.9000000e+000 4.0000000e-001 1
4.8000000e+000 3.0000000e+000 1.4000000e+000 3.0000000e-001 1
5.1000000e+000 3.8000000e+000 1.6000000e+000 2.0000000e-001 1
4.6000000e+000 3.2000000e+000 1.4000000e+000 2.0000000e-001 1
5.3000000e+000 3.7000000e+000 1.5000000e+000 2.0000000e-001 1
5.0000000e+000 3.3000000e+000 1.4000000e+000 2.0000000e-001 1
7.0000000e+000 3.2000000e+000 4.7000000e+000 1.4000000e+000 2
6.4000000e+000 3.2000000e+000 4.5000000e+000 1.5000000e+000 2
6.9000000e+000 3.1000000e+000 4.9000000e+000 1.5000000e+000 2
5.5000000e+000 2.3000000e+000 4.0000000e+000 1.3000000e+000 2
6.5000000e+000 2.8000000e+000 4.6000000e+000 1.5000000e+000 2
5.7000000e+000 2.8000000e+000 4.5000000e+000 1.3000000e+000 2
6.3000000e+000 3.3000000e+000 4.7000000e+000 1.6000000e+000 2
4.9000000e+000 2.4000000e+000 3.3000000e+000 1.0000000e+000 2
6.6000000e+000 2.9000000e+000 4.6000000e+000 1.3000000e+000 2
5.2000000e+000 2.7000000e+000 3.9000000e+000 1.4000000e+000 2
5.0000000e+000 2.0000000e+000 3.5000000e+000 1.0000000e+000 2
5.9000000e+000 3.0000000e+000 4.2000000e+000 1.5000000e+000 2
6.0000000e+000 2.2000000e+000 4.0000000e+000 1.0000000e+000 2
6.1000000e+000 2.9000000e+000 4.7000000e+000 1.4000000e+000 2
5.6000000e+000 2.9000000e+000 3.6000000e+000 1.3000000e+000 2
6.7000000e+000 3.1000000e+000 4.4000000e+000 1.4000000e+000 2
5.6000000e+000 3.0000000e+000 4.5000000e+000 1.5000000e+000 2
5.8000000e+000 2.7000000e+000 4.1000000e+000 1.0000000e+000 2
6.2000000e+000 2.2000000e+000 4.5000000e+000 1.5000000e+000 2
5.6000000e+000 2.5000000e+000 3.9000000e+000 1.1000000e+000 2
5.9000000e+000 3.2000000e+000 4.8000000e+000 1.8000000e+000 2
6.1000000e+000 2.8000000e+000 4.0000000e+000 1.3000000e+000 2
6.3000000e+000 2.5000000e+000 4.9000000e+000 1.5000000e+000 2
6.1000000e+000 2.8000000e+000 4.7000000e+000 1.2000000e+000 2
6.4000000e+000 2.9000000e+000 4.3000000e+000 1.3000000e+000 2
6.6000000e+000 3.0000000e+000 4.4000000e+000 1.4000000e+000 2
6.8000000e+000 2.8000000e+000 4.8000000e+000 1.4000000e+000 2
6.7000000e+000 3.0000000e+000 5.0000000e+000 1.7000000e+000 2
6.0000000e+000 2.9000000e+000 4.5000000e+000 1.5000000e+000 2
5.7000000e+000 2.6000000e+000 3.5000000e+000 1.0000000e+000 2
5.5000000e+000 2.4000000e+000 3.8000000e+000 1.1000000e+000 2
5.5000000e+000 2.4000000e+000 3.7000000e+000 1.0000000e+000 2
5.8000000e+000 2.7000000e+000 3.9000000e+000 1.2000000e+000 2
6.0000000e+000 2.7000000e+000 5.1000000e+000 1.6000000e+000 2
5.4000000e+000 3.0000000e+000 4.5000000e+000 1.5000000e+000 2
6.0000000e+000 3.4000000e+000 4.5000000e+000 1.6000000e+000 2
6.7000000e+000 3.1000000e+000 4.7000000e+000 1.5000000e+000 2
6.3000000e+000 2.3000000e+000 4.4000000e+000 1.3000000e+000 2
5.6000000e+000 3.0000000e+000 4.1000000e+000 1.3000000e+000 2
5.5000000e+000 2.5000000e+000 4.0000000e+000 1.3000000e+000 2
5.5000000e+000 2.6000000e+000 4.4000000e+000 1.2000000e+000 2
6.1000000e+000 3.0000000e+000 4.6000000e+000 1.4000000e+000 2
5.8000000e+000 2.6000000e+000 4.0000000e+000 1.2000000e+000 2
5.0000000e+000 2.3000000e+000 3.3000000e+000 1.0000000e+000 2
5.6000000e+000 2.7000000e+000 4.2000000e+000 1.3000000e+000 2
5.7000000e+000 3.0000000e+000 4.2000000e+000 1.2000000e+000 2
5.7000000e+000 2.9000000e+000 4.2000000e+000 1.3000000e+000 2
6.2000000e+000 2.9000000e+000 4.3000000e+000 1.3000000e+000 2
5.1000000e+000 2.5000000e+000 3.0000000e+000 1.1000000e+000 2
5.7000000e+000 2.8000000e+000 4.1000000e+000 1.3000000e+000 2
6.3000000e+000 3.3000000e+000 6.0000000e+000 2.5000000e+000 3
5.8000000e+000 2.7000000e+000 5.1000000e+000 1.9000000e+000 3
7.1000000e+000 3.0000000e+000 5.9000000e+000 2.1000000e+000 3
6.3000000e+000 2.9000000e+000 5.6000000e+000 1.8000000e+000 3
6.5000000e+000 3.0000000e+000 5.8000000e+000 2.2000000e+000 3
7.6000000e+000 3.0000000e+000 6.6000000e+000 2.1000000e+000 3
4.9000000e+000 2.5000000e+000 4.5000000e+000 1.7000000e+000 3
7.3000000e+000 2.9000000e+000 6.3000000e+000 1.8000000e+000 3
6.7000000e+000 2.5000000e+000 5.8000000e+000 1.8000000e+000 3
7.2000000e+000 3.6000000e+000 6.1000000e+000 2.5000000e+000 3
6.5000000e+000 3.2000000e+000 5.1000000e+000 2.0000000e+000 3
6.4000000e+000 2.7000000e+000 5.3000000e+000 1.9000000e+000 3
6.8000000e+000 3.0000000e+000 5.5000000e+000 2.1000000e+000 3
5.7000000e+000 2.5000000e+000 5.0000000e+000 2.0000000e+000 3
5.8000000e+000 2.8000000e+000 5.1000000e+000 2.4000000e+000 3
6.4000000e+000 3.2000000e+000 5.3000000e+000 2.3000000e+000 3
6.5000000e+000 3.0000000e+000 5.5000000e+000 1.8000000e+000 3
7.7000000e+000 3.8000000e+000 6.7000000e+000 2.2000000e+000 3
7.7000000e+000 2.6000000e+000 6.9000000e+000 2.3000000e+000 3
6.0000000e+000 2.2000000e+000 5.0000000e+000 1.5000000e+000 3
6.9000000e+000 3.2000000e+000 5.7000000e+000 2.3000000e+000 3
5.6000000e+000 2.8000000e+000 4.9000000e+000 2.0000000e+000 3
7.7000000e+000 2.8000000e+000 6.7000000e+000 2.0000000e+000 3
6.3000000e+000 2.7000000e+000 4.9000000e+000 1.8000000e+000 3
6.7000000e+000 3.3000000e+000 5.7000000e+000 2.1000000e+000 3
7.2000000e+000 3.2000000e+000 6.0000000e+000 1.8000000e+000 3
6.2000000e+000 2.8000000e+000 4.8000000e+000 1.8000000e+000 3
6.1000000e+000 3.0000000e+000 4.9000000e+000 1.8000000e+000 3
6.4000000e+000 2.8000000e+000 5.6000000e+000 2.1000000e+000 3
7.2000000e+000 3.0000000e+000 5.8000000e+000 1.6000000e+000 3
7.4000000e+000 2.8000000e+000 6.1000000e+000 1.9000000e+000 3
7.9000000e+000 3.8000000e+000 6.4000000e+000 2.0000000e+000 3
6.4000000e+000 2.8000000e+000 5.6000000e+000 2.2000000e+000 3
6.3000000e+000 2.8000000e+000 5.1000000e+000 1.5000000e+000 3
6.1000000e+000 2.6000000e+000 5.6000000e+000 1.4000000e+000 3
7.7000000e+000 3.0000000e+000 6.1000000e+000 2.3000000e+000 3
6.3000000e+000 3.4000000e+000 5.6000000e+000 2.4000000e+000 3
6.4000000e+000 3.1000000e+000 5.5000000e+000 1.8000000e+000 3
6.0000000e+000 3.0000000e+000 4.8000000e+000 1.8000000e+000 3
6.9000000e+000 3.1000000e+000 5.4000000e+000 2.1000000e+000 3
6.7000000e+000 3.1000000e+000 5.6000000e+000 2.4000000e+000 3
6.9000000e+000 3.1000000e+000 5.1000000e+000 2.3000000e+000 3
5.8000000e+000 2.7000000e+000 5.1000000e+000 1.9000000e+000 3
6.8000000e+000 3.2000000e+000 5.9000000e+000 2.3000000e+000 3
6.7000000e+000 3.3000000e+000 5.7000000e+000 2.5000000e+000 3
6.7000000e+000 3.0000000e+000 5.2000000e+000 2.3000000e+000 3
6.3000000e+000 2.5000000e+000 5.0000000e+000 1.9000000e+000 3
6.5000000e+000 3.0000000e+000 5.2000000e+000 2.0000000e+000 3
6.2000000e+000 3.4000000e+000 5.4000000e+000 2.3000000e+000 3
5.9000000e+000 3.0000000e+000 5.1000000e+000 1.8000000e+000 3
```
## Scatter plots of the three classes for every pair of features (6 plots in total)
**Load the data file**
```
dataSet = load('iris.txt');  % 150 rows: 4 features + class label
rawData = dataSet(:, 1:4);   % raw data: 150 samples x 4 features
label   = dataSet(:, 5);     % class label (1, 2 or 3) of each of the 150 samples
```
**Plot the figures**
```
% Scatter plots
figure;                          % open a new figure window
pairs = nchoosek(1:4, 2);        % all 6 pairs of the 4 features
for dd = 1:size(pairs, 1)
    f1 = pairs(dd, 1);
    f2 = pairs(dd, 2);
    subplot(2, 3, dd)
    % feature f1 vs. f2 for class 1 (rows 1-50), class 2 (51-100) and class 3 (101-150)
    plot(rawData(  1: 50, f1), rawData(  1: 50, f2), 'ro', ...
         rawData( 51:100, f1), rawData( 51:100, f2), 'go', ...
         rawData(101:150, f1), rawData(101:150, f2), 'bo');
    title('Scatter Plot');                   % subplot title
    legend('class1', 'class2', 'class3');    % class legend
    xlabel(['Feature ', int2str(f1)]);       % x-axis: feature index
    ylabel(['Feature ', int2str(f2)]);       % y-axis: feature index
end
```
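The six panels come from `nchoosek(1:4, 2)`, which lists every unordered pair of the four features, one pair per row, in lexicographic order. This short check is not part of the original script; it only prints the pairs so the 2 x 3 subplot layout is easy to follow.

```matlab
% Every unordered pair of the four iris features, one pair per row
pairs = nchoosek(1:4, 2);
disp(pairs)
fprintf('%d pairs -> a 2 x 3 grid of subplots\n', size(pairs, 1));
```

The rows are [1 2], [1 3], [1 4], [2 3], [2 4] and [3 4], matching subplot positions 1 to 6.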

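## Testing the K-NN classifier on the Iris dataset: classification rates for all feature combinations (15 in total)
First, the first half of each class is used as training data and the second half as testing data, which gives one classification rate. The training and testing data are then swapped to obtain a second classification rate, and the two rates are averaged.
Here rate1 is the rate obtained when the first halves are used for training, and rate2 is the rate after the halves are swapped.
Neighbor counts k = 1 and k = 3 are tested.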
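The two numbers behind this experiment can be checked in isolation: how many feature subsets there are, and which rows land in each half. A minimal sketch, assuming (as in the data above) 50 consecutive rows per class; `firstHalf` and `secondHalf` are illustrative names, not variables from the original script.

```matlab
% Number of non-empty subsets of 4 features: 4 + 6 + 4 + 1
nCombos = 0;
for j = 1:4
    nCombos = nCombos + size(nchoosek(1:4, j), 1);
end
fprintf('feature combinations: %d\n', nCombos);

% Row indices of the two halves (50 consecutive rows per class assumed)
firstHalf  = [1:25, 51:75, 101:125];     % first 25 rows of each class
secondHalf = setdiff(1:150, firstHalf);  % the remaining 25 rows of each class
fprintf('%d rows per half\n', numel(secondHalf));
```

Because every class contributes 25 rows to each half, both folds are balanced, and averaging rate1 and rate2 uses every sample once for testing.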
```
% Try every combination of 1, 2, 3 and 4 features (4 + 6 + 4 + 1 = 15)
for j = 1:4
    row = nchoosek(1:4, j);
    for dd = 1:size(row, 1)
        feat = row(dd, :);
        featStr = strtrim(sprintf('%d ', feat));   % e.g. '1 2 3'
        % first half of each class -> training set
        trnSet = [rawData(  1: 25, feat);
                  rawData( 51: 75, feat);
                  rawData(101:125, feat)];
        % second half of each class -> test set
        tstSet = [rawData( 26: 50, feat);
                  rawData( 76:100, feat);
                  rawData(126:150, feat)];
        trnClass = [label(  1: 25); label( 51: 75); label(101:125)];
        tstClass = [label( 26: 50); label( 76:100); label(126:150)];
        for k = [1 3]
            rate1 = rate_calculate(trnSet, tstSet, trnClass, tstClass, k);  % train on first halves
            rate2 = rate_calculate(tstSet, trnSet, tstClass, trnClass, k);  % halves swapped
            rate = (rate1 + rate2) / 2;
            fprintf('K=%d, features %s: classification rate = %.4f\n', k, featStr, rate)
        end
    end
end
```
Output:
```
K=1, features 1: classification rate = 0.5933
K=3, features 1: classification rate = 0.6000
K=1, features 2: classification rate = 0.4867
K=3, features 2: classification rate = 0.5200
K=1, features 3: classification rate = 0.9133
K=3, features 3: classification rate = 0.9267
K=1, features 4: classification rate = 0.9133
K=3, features 4: classification rate = 0.9600
K=1, features 1 2: classification rate = 0.7067
K=3, features 1 2: classification rate = 0.7533
K=1, features 1 3: classification rate = 0.9267
K=3, features 1 3: classification rate = 0.9267
K=1, features 1 4: classification rate = 0.8867
K=3, features 1 4: classification rate = 0.9400
K=1, features 2 3: classification rate = 0.9200
K=3, features 2 3: classification rate = 0.9200
K=1, features 2 4: classification rate = 0.9333
K=3, features 2 4: classification rate = 0.9533
K=1, features 3 4: classification rate = 0.9533
K=3, features 3 4: classification rate = 0.9533
K=1, features 1 2 3: classification rate = 0.9267
K=3, features 1 2 3: classification rate = 0.9267
K=1, features 1 2 4: classification rate = 0.9267
K=3, features 1 2 4: classification rate = 0.9067
K=1, features 1 3 4: classification rate = 0.9467
K=3, features 1 3 4: classification rate = 0.9533
K=1, features 2 3 4: classification rate = 0.9667
K=3, features 2 3 4: classification rate = 0.9733
K=1, features 1 2 3 4: classification rate = 0.9467
K=3, features 1 2 3 4: classification rate = 0.9400
```
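Here the `norm` function computes the Euclidean distance between two points.
For example, with A = [1,2,3] and B = [5,6,7]:
`norm(A-B)` = 6.9282,
because A-B = [-4,-4,-4] and

$$\sqrt{(-4)^2 + (-4)^2 + (-4)^2} = \sqrt{48} \approx 6.9282$$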
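The same worked example can be reproduced directly; this snippet simply compares `norm` with the formula written out by hand.

```matlab
A = [1, 2, 3];
B = [5, 6, 7];
dNorm   = norm(A - B);              % built-in vector 2-norm
dManual = sqrt(sum((A - B).^2));    % the same formula written out
fprintf('%.4f %.4f\n', dNorm, dManual);
```

Both values print as 6.9282 (the square root of 48).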
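The `sort` function with the `'ascend'` option sorts d_array from smallest to largest (value) and records where each element originally was (index).
For example, with A = [5,1,4,3,2]:
`[value,index] = sort(A,'ascend')`
value = [1,2,3,4,5] (sorted values)
index = [2,5,4,3,1] (original positions)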
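In `knn` only the first k entries of `index` are kept: they are the positions of the k smallest distances, i.e. the k nearest training rows. A small sketch that treats the example vector as hypothetical distances from one test sample to five training samples:

```matlab
% Hypothetical distances from one test sample to five training samples
d_array = [5, 1, 4, 3, 2];
[value, index] = sort(d_array, 'ascend');
k = 3;
nearest = index(1:k);   % positions of the 3 smallest distances
fprintf('nearest training rows: %s\n', mat2str(nearest));
```

This prints `nearest training rows: [2 5 4]`.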
**function knn**
```
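function seat = knn(train_row, test_row, k)
% For each test row, return the indices of its k nearest training rows
nTest  = size(test_row, 1);
nTrain = size(train_row, 1);
seat = zeros(nTest, k);
d_array = zeros(1, nTrain);
for test_count = 1:nTest
    for train_count = 1:nTrain
        % Euclidean distance between this test row and one training row
        d_array(train_count) = norm(test_row(test_count, :) - train_row(train_count, :));
    end
    [~, index] = sort(d_array, 'ascend');   % nearest training rows first
    seat(test_count, 1:k) = index(1:k);     % positions of the k nearest
end
end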
```
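The inner loop over training rows in `knn` can also be expressed as one broadcast expression per test row. A minimal sketch on small made-up matrices (`trainSet`, `testSet` and their values are hypothetical), assuming implicit expansion (MATLAB R2016b or later, or GNU Octave):

```matlab
% Hypothetical sets: 4 training rows, 2 test rows, 2 features
trainSet = [0 0; 1 0; 0 2; 3 3];
testSet  = [0.1 0.1; 2.9 2.8];
k = 2;

nTest = size(testSet, 1);
D = zeros(nTest, size(trainSet, 1));      % D(t, i) = distance(test t, train i)
for t = 1:nTest
    % subtract the test row from every training row at once
    D(t, :) = sqrt(sum((trainSet - testSet(t, :)).^2, 2)).';
end
[~, order] = sort(D, 2, 'ascend');        % sort each row independently
seat = order(:, 1:k);                     % k nearest training indices per test row
disp(seat)
```

Because `sort(D, 2, 'ascend')` sorts each row on its own, `seat` has the same layout as the output of `knn`: one row per test sample and k training indices per row (here `[1 2; 4 3]`).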
Computing the KNN classification rate
**function rate_calculate**
```
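function rate = rate_calculate(trnSet, tstSet, trnClass, tstClass, k)
% Classify every row of tstSet with k-NN on trnSet and return the
% fraction of rows whose predicted label matches tstClass
predict_seat = knn(trnSet, tstSet, k);   % indices into trnSet
nTest = size(tstSet, 1);
correct = 0;
for count = 1:nTest
    % labels of the k nearest TRAINING samples (nearest first)
    pd_array = trnClass(predict_seat(count, :));
    % mode() returns the smallest label on a tie, so for an all-distinct
    % vote such as [2, 1, 3] it returns 1; in that case fall back to the
    % nearest neighbor's label. (This test also fires for votes such as
    % [2, 1, 1], where it overrides a class-1 majority.)
    if mode(pd_array) == 1 && pd_array(1) ~= 1
        pd_value = pd_array(1);
    else
        pd_value = mode(pd_array);
    end
    % compare with the true label of this TEST sample
    if tstClass(count) == pd_value
        correct = correct + 1;
    end
end
rate = correct / nTest;
end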
```
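The voting step in `rate_calculate` only treats ties specially when `mode()` returns class 1. A more general alternative, swapped in here and not the author's rule, is a plain majority vote in which a tie is broken in favor of the tied label that occurs nearest. The sketch applies it to hypothetical neighbor lists; because it handles votes such as [2, 1, 1] differently, plugging it into `rate_calculate` could change some of the K=3 rates listed above.

```matlab
% Hypothetical neighbor votes for k = 3, nearest neighbor first
votes = {[2 1 3], [2 1 1], [1 2 2], [3 3 1]};
predicted = zeros(1, numel(votes));
for v = 1:numel(votes)
    labels = votes{v};
    counts = accumarray(labels(:), 1);    % number of votes per class label
    tied = find(counts == max(counts));   % every label with the top count
    % among the tied labels, keep the one that appears nearest in the list
    firstPos = arrayfun(@(c) find(labels == c, 1), tied);
    [~, best] = min(firstPos);
    predicted(v) = tied(best);
end
disp(predicted)
```

With a single winner the result is the ordinary majority; only genuine ties, such as the all-distinct vote [2, 1, 3], fall back to the nearest neighbor. `accumarray` assumes the labels are positive integers, which holds for the class labels 1 to 3 used here.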