**K-Nearest Neighbors (KNN) Classification Overview:** K-Nearest Neighbors is a supervised machine learning algorithm used for classification and regression tasks. It makes predictions based on the majority class (for classification) or the average value (for regression) of a point's k nearest neighbors in the feature space. We'll use the Iris dataset for this example.

**Example Using the Iris Dataset:**

**Step 1: Import Libraries**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
```

**Step 2: Load and Explore the Iris Dataset**

```python
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data    # Feature matrix (150 samples, 4 features)
y = iris.target  # Target labels (3 classes)
```

**Step 3: Split the Data into Training and Testing Sets**

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

**Step 4: Create and Train the KNN Classifier Model**

```python
# Create a KNN classifier, spelling out the (default) parameters so they are easy to change
knn_model = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto',
                                 leaf_size=30, p=2, metric='minkowski')

# Train the model on the training data
knn_model.fit(X_train, y_train)
```

*Parameters That Can Be Changed:*

1. **n_neighbors** (default=5):
   - The number of neighbors to consider when making predictions. Adjusting this value can significantly impact the model's behavior.
2. **weights** (default='uniform'):
   - The weight function used in prediction. Options include 'uniform' (all neighbors have equal weight) and 'distance' (closer neighbors have more influence).
3. **algorithm** (default='auto'):
   - The algorithm used to compute nearest neighbors. Options include 'auto', 'ball_tree', 'kd_tree', and 'brute'.
4. **leaf_size** (default=30):
   - Leaf size passed to BallTree or KDTree. It can impact the speed of tree construction and querying.
5. **p** (default=2):
   - The power parameter for the Minkowski distance metric. With p=2 it is Euclidean distance; with p=1 it is Manhattan distance.
6. **metric** (default='minkowski'):
   - The distance metric used when calculating distances between points. Options include 'euclidean', 'manhattan', 'chebyshev', and more.

**Step 5: Make Predictions**

```python
# Make predictions on the test data
y_pred = knn_model.predict(X_test)
```

**Step 6: Evaluate the Model**

```python
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Generate a classification report
classification_rep = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Classification Report:")
print(classification_rep)
```

These parameters let you fine-tune the KNN classifier's behavior for your specific classification task and dataset: you can adjust the number of neighbors, the distance metric, and other hyperparameters to optimize model performance.

Next, here's an implementation of K-Nearest Neighbors for regression, along with an explanation and, again, the parameters that can be changed.

**K-Nearest Neighbors (KNN) Regression Overview:** KNN regression works by finding the k nearest data points in the training set to a given input and predicting the output as the average (or weighted average) of the target values of those nearest neighbors.
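To make the averaging idea concrete, here is a minimal from-scratch sketch of a single KNN regression prediction in plain NumPy. The tiny dataset and the `knn_predict` helper are hypothetical, for illustration only; the walkthrough below uses scikit-learn's `KNeighborsRegressor` instead.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict one value as the mean target of the k nearest training points."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Average the neighbors' target values (a 'distance' weighting scheme
    # would weight each y by 1/distance instead of averaging uniformly)
    return y_train[nearest].mean()

# Toy one-feature dataset (hypothetical values)
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([1.2, 1.9, 3.1, 3.9, 5.2])

print(knn_predict(X_train, y_train, np.array([2.5])))  # mean of the 3 nearest targets
```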
**Example Using a Sample Dataset:**

**Step 1: Import Libraries and Load Dataset**

```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Load the California Housing dataset (the Boston Housing dataset was
# removed in scikit-learn 1.2, so load_boston no longer works)
data = fetch_california_housing()
X = data.data    # Feature matrix
y = data.target  # Target values (median house value)
```

Because KNN is distance-based, features on very different scales should normally be standardized (e.g., with `StandardScaler`) before fitting; that step is omitted here to keep the example minimal.

**Step 2: Split the Data into Training and Testing Sets**

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

**Step 3: Create and Train the KNN Regression Model**

```python
# Create a KNN regression model with K=5 (you can adjust K)
knn_model = KNeighborsRegressor(n_neighbors=5)

# Train the model on the training data
knn_model.fit(X_train, y_train)
```

*Parameters That Can Be Changed:*

1. **n_neighbors** (default=5):
   - The number of neighbors to consider when making predictions. Adjust this to change the size of the neighborhood.
2. **weights** (default='uniform'):
   - Determines how the contributions of neighbors are weighted. Options include 'uniform' (all neighbors have equal weight) and 'distance' (closer neighbors have more influence).

**Step 4: Make Predictions**

```python
# Make predictions on the test data
y_pred = knn_model.predict(X_test)
```

**Step 5: Evaluate the Model**

```python
# Calculate Mean Squared Error (MSE) and R-squared (R2) score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared (R2) Score: {r2:.2f}")
```

**Explanation:**

1. We import the necessary libraries: NumPy for numerical operations and scikit-learn for the KNN regressor, the dataset, and the evaluation metrics.
2. We load the California Housing dataset and split it into training and testing sets to evaluate the model's performance.
3. We create a KNN regression model with a specified number of neighbors (K). You can adjust the `n_neighbors` parameter to change the size of the neighborhood considered when making predictions.
4. The model is trained on the training data using the `fit` method.
5. We use the trained model to make predictions on the test data.
6. Finally, we evaluate the model's performance using Mean Squared Error (MSE) and the R-squared (R2) score.

K-Nearest Neighbors regression is a simple yet effective algorithm, particularly useful when you want predictions based on the similarity of points in the feature space. Adjusting the number of neighbors (K) and the weighting scheme can noticeably change the model's performance, letting you tailor it to your specific regression problem; one way to search those settings systematically is shown below.
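As a sketch of that search, the snippet below uses scikit-learn's `GridSearchCV` to cross-validate combinations of `n_neighbors` and `weights` for the regression model above. The grid values are arbitrary illustrative choices, and `X_train`/`y_train` are assumed from the walkthrough; the same pattern works for `KNeighborsClassifier`.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Candidate hyperparameter values -- illustrative choices, not recommendations
param_grid = {
    "n_neighbors": [3, 5, 7, 11, 15],
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validated grid search; regressors are scored by R^2 by default
grid = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print(f"Best cross-validated R2 score: {grid.best_score_:.2f}")
```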