K-Means Clustering

**K-Means Clustering Overview:** K-Means is an unsupervised machine learning algorithm that groups similar data points into clusters based on their features. It partitions the data into K clusters so that each data point belongs to the cluster with the nearest mean (centroid), aiming to minimize the within-cluster sum of squared distances. K-Means is widely used for tasks such as customer segmentation and image compression. In this example, we use a synthetic dataset to demonstrate K-Means clustering.

**Example Using a Dataset:**

**Step 1: Import Libraries**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
```

**Step 2: Generate Synthetic Data**

```python
# Generate synthetic data with three clusters (you can replace this with your own dataset)
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
```

**Step 3: Create and Train the K-Means Model**

```python
# Create a K-Means model with customizable parameters.
# n_init=10 runs 10 differently seeded initializations and keeps the best one;
# it is set explicitly because its default changed across scikit-learn versions.
kmeans_model = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)

# Train the model on the data
kmeans_model.fit(X)
```

**Params That Can Be Changed**

1. **n_clusters** (default=8):
   - The number of clusters (K) to form, which is also the number of centroids to generate.
2. **init** (default='k-means++'):
   - The method for initializing the cluster centroids. Common options are 'k-means++' (chooses initial centroids that are spread apart, sampling each new one with probability proportional to its squared distance from the nearest centroid already chosen), 'random' (picks K observations at random from the data), or an array of shape (n_clusters, n_features) containing custom initial centroids.

**Step 4: Get Cluster Assignments**

```python
# Cluster index (0 to K-1) assigned to each training point during fit()
labels = kmeans_model.labels_
```

**Step 5: Visualize the Clusters**

```python
# Plot the data points colored by cluster, plus the learned cluster centers
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans_model.cluster_centers_[:, 0], kmeans_model.cluster_centers_[:, 1],
            s=200, c='red', label='Centroids')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.title("K-Means Clustering")
plt.legend()
plt.show()
```

**Explanation:**

1. We import the necessary libraries: NumPy for numerical operations, Matplotlib for plotting, and scikit-learn for both the synthetic data generator (`make_blobs`) and the `KMeans` estimator.
2. We generate synthetic data with the `make_blobs` function, creating three clusters for this example. You can replace this step with your own dataset.
3. We create a K-Means model using `KMeans`. In this step, we set two customizable parameters:
   - `n_clusters`: Specifies the number of clusters (K) to form.
   - `init`: Determines how the cluster centroids are initialized. The default 'k-means++' spreads the initial centroids apart, which usually speeds up convergence and reduces the chance of ending in a poor local optimum compared with purely random initialization.

   We also pass `n_init=10` (keep the best of 10 initializations, measured by inertia) and `random_state=42` so the results are reproducible.
4. The model is trained on the data using `fit`, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points until the assignments stabilize or an iteration limit is reached.
5. We read each data point's cluster index from the fitted model's `labels_` attribute. These assignments are computed during `fit`; to assign new, unseen points, use `kmeans_model.predict(...)`.
6. Finally, we visualize the clusters by plotting the data points (colored by cluster) and the cluster centers (in red). This shows how the K-Means algorithm has grouped similar data points together.

You can customize the `n_clusters` and `init` parameters to control the number of clusters and the initialization method. Because K must be chosen in advance, it is worth trying several values of `n_clusters` when the true number of groups in your data is unknown.
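The overview's rule that each point belongs to the cluster with the nearest mean is the core of Lloyd's algorithm, the classic procedure behind K-Means. The sketch below implements it from scratch with NumPy to make the two alternating steps (assign points, then recompute means) visible. It is a minimal illustration under stated assumptions: `lloyd_kmeans` is a hypothetical helper, it initializes from randomly chosen data points (not k-means++), and it omits the restarts and optimizations that scikit-learn's `KMeans` provides.

```python
import numpy as np
from sklearn.datasets import make_blobs


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's K-Means with random-point initialization."""
    rng = np.random.default_rng(seed)

    def assign(centroids):
        # Squared Euclidean distance from every point to every centroid, shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)  # index of the nearest centroid (mean)

    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so the labels match the returned centroids
    return assign(centroids), centroids


X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels, centroids = lloyd_kmeans(X, k=3)
print("Centroids:\n", centroids)
print("Points per cluster:", np.bincount(labels, minlength=3))
```

Because the result depends on the starting points, a single run can land in a local optimum; this is why scikit-learn defaults to k-means++ seeding and supports several restarts via `n_init`.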
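The **init** parameter accepts a strategy name or an explicit array of starting centroids. A quick way to see its effect is to fit one model per option and compare the final inertia (within-cluster sum of squared distances) and the number of iterations. This is a sketch on the same synthetic data; the `custom_init` rows (points 0, 100, and 200) are an arbitrary, hypothetical choice for illustration, and `n_init=1` is used so each model reflects a single initialization.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Hypothetical starting centroids: three data points picked by index, for illustration only
custom_init = X[[0, 100, 200]]

settings = {
    "k-means++": KMeans(n_clusters=3, init='k-means++', n_init=1, random_state=0),
    "random": KMeans(n_clusters=3, init='random', n_init=1, random_state=0),
    # An explicit array fixes the starting point, so only one initialization makes sense
    "custom array": KMeans(n_clusters=3, init=custom_init, n_init=1),
}

for name, model in settings.items():
    model.fit(X)
    print(f"{name:>12}: inertia={model.inertia_:.1f}, iterations={model.n_iter_}")
```

On well-separated blobs like these, the options often reach similar solutions; differences tend to show up on harder data, where a poor start can leave a run stuck with a higher inertia.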
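Step 4 reads `labels_`, which only covers the points the model was trained on. For points the model has not seen, `KMeans.predict` returns the index of the nearest learned centroid. The sketch below checks `predict` against a manual nearest-centroid computation; the `new_points` coordinates are made up for illustration and are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans_model = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42).fit(X)

# Hypothetical new observations (illustrative coordinates only)
new_points = np.array([[0.0, 0.0], [-5.0, -5.0], [5.0, 5.0]])

# predict() assigns each new point to the nearest learned centroid
predicted = kmeans_model.predict(new_points)

# The same rule by hand: squared Euclidean distance to every centroid, then argmin
dists = ((new_points[:, None, :] - kmeans_model.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
manual = dists.argmin(axis=1)

print("predict():", predicted)
print("manual:   ", manual)
```

If you only need assignments for the training data, `kmeans_model.fit_predict(X)` fits the model and returns the labels in one call.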
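Since `n_clusters` must be set before fitting, a common heuristic when the number of groups is unknown is to fit several values of K and compare two scores: the inertia (`inertia_`, which always tends to fall as K grows, so one looks for the "elbow" where it stops dropping sharply) and the silhouette score (between -1 and 1, higher means tighter, better-separated clusters). This is a sketch on the same synthetic data; the tested range of K (2 to 6) is an arbitrary assumption.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

inertias = {}
silhouettes = {}
for k in range(2, 7):  # candidate values of K (arbitrary range)
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42).fit(X)
    inertias[k] = model.inertia_  # within-cluster sum of squared distances
    silhouettes[k] = silhouette_score(X, model.labels_)  # in [-1, 1], higher is better

for k in inertias:
    print(f"K={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")

# Pick the K with the highest average silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print("K with highest silhouette:", best_k)
```

Neither score is definitive on real data, so it is worth combining them with a visual check like the Step 5 plot and with domain knowledge.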