---
tags: Optimization Algorithms,K-NN
---

# k-Nearest Neighbors algorithm (K-NN)

- It is a supervised learning algorithm.
- K-NN is used to solve classification problems.
- K-NN is a classification algorithm that takes a set of labelled data points and uses them to learn how to label other points.
- Cases that are near each other are said to be "neighbours".
- K-NN falls under the lazy learner category: it stores the training data and defers the computation to prediction time.

## K-nearest Neighbors algorithm steps:

1. Pick a value of k.
2. Calculate the distance from the unknown case to every known case using the Euclidean distance:
   $$\text{Euclidean distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
3. Select the k nearest neighbours based on the calculated distances.
4. Predict the label of the unknown data point as the most voted value among the nearest neighbours.

## Advantages:

- Easy to implement
- More effective when the training data is large

## Disadvantages:

- Selecting the value of k is challenging
- Computation cost is high
- Sensitive to the scale of the data

## Application

- Pattern recognition

## Example

Let's assume this is the sample for analysis:

| $x_1$ | $x_2$ | classification |
| -------- | -------- | -------- |
| 7 | 7 | bad |
| 7 | 4 | bad |
| 3 | 4 | good |
| 1 | 4 | good |

Let's predict the class of a new data point with $x_1 = 3$, $x_2 = 7$.

**Assume** k = 3.

Using the Euclidean formula, the distance from a sample point $(x_1, x_2)$ to the new point $(3, 7)$ is

$$d = \sqrt{(x_1 - 3)^2 + (x_2 - 7)^2}$$

Taking the square root does not change the ordering of the points, so the tables below compare the squared distance $d^2$.

| $x_1$ | $x_2$ | squared distance $d^2$ |
| -------- | -------- | -------- |
| 7 | 7 | $(7-3)^2+(7-7)^2 = 16$ |
| 7 | 4 | $(7-3)^2+(4-7)^2 = 25$ |
| 3 | 4 | $(3-3)^2+(4-7)^2 = 9$ |
| 1 | 4 | $(1-3)^2+(4-7)^2 = 13$ |

Now sort the points by the calculated value and check which ones fall within the k = 3 nearest:

| $x_1$ | $x_2$ | squared distance $d^2$ | Rank | Rank ≤ 3 |
| -------- | -------- | -------- | ------ | --- |
| 3 | 4 | $(3-3)^2+(4-7)^2 = 9$ | 1 | yes |
| 1 | 4 | $(1-3)^2+(4-7)^2 = 13$ | 2 | yes |
| 7 | 7 | $(7-3)^2+(7-7)^2 = 16$ | 3 | yes |
| 7 | 4 | $(7-3)^2+(4-7)^2 = 25$ | 4 | no |

Now select the k = 3 nearest observations and take their values from the classification column:

| $x_1$ | $x_2$ | squared distance $d^2$ | Rank | Rank ≤ 3 | classification of nearest neighbour |
| -------- | -------- | -------- | ------ | --- | ------- |
| 3 | 4 | $(3-3)^2+(4-7)^2 = 9$ | 1 | yes | good |
| 1 | 4 | $(1-3)^2+(4-7)^2 = 13$ | 2 | yes | good |
| 7 | 7 | $(7-3)^2+(7-7)^2 = 16$ | 3 | yes | bad |
| 7 | 4 | $(7-3)^2+(4-7)^2 = 25$ | 4 | no | -- |

By simple majority vote, good --> 2 and bad --> 1, so we can conclude that the new data point $x_1 = 3$, $x_2 = 7$ belongs to the good category.
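
The worked example above can be reproduced with a short Python sketch. This is only an illustration using the standard library, not a reference implementation: the names `training_data` and `knn_predict` are hypothetical and chosen for this note, and the four points and k = 3 are taken from the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

# Training data from the worked example: ((x1, x2), label)
training_data = [
    ((7, 7), "bad"),
    ((7, 4), "bad"),
    ((3, 4), "good"),
    ((1, 4), "good"),
]


def knn_predict(training_data, query, k):
    """Return (predicted label, k nearest neighbours) for the query point.

    Hypothetical helper written for this note.
    """
    # Sort all labelled points by Euclidean distance to the query point.
    by_distance = sorted(training_data, key=lambda item: dist(item[0], query))
    nearest = by_distance[:k]
    # Majority vote among the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], nearest


label, nearest = knn_predict(training_data, query=(3, 7), k=3)
print(label)    # good
print(nearest)  # the three closest points with their labels
```

With an even k, the vote can end in a tie. In this sketch `Counter.most_common` then returns the tied label it encountered first, which is the label of the closer neighbour; choosing an odd k for two classes avoids the tie altogether.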