---
# System prepended metadata

title: Principal Component Analysis (PCA)

---

**Principal Component Analysis (PCA) Overview:**

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional representation while retaining as much of the data's variance as possible. PCA identifies the principal components: mutually orthogonal linear combinations of the original features, ordered so that the first component explains the most variance in the data, the second explains the most of what remains, and so on.
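To make the idea of "linear combinations that explain the most variance" concrete, here is a minimal NumPy-only sketch that computes principal components from the eigenvectors of the covariance matrix. The helper name `pca_eig` and the toy data are hypothetical and exist only for illustration; scikit-learn's `PCA` uses an SVD-based implementation, so treat this as one possible way to get an equivalent projection, not as the library's method.

```python
import numpy as np

def pca_eig(X, n_components):
    """Project X onto its top principal components via the covariance matrix."""
    X_centered = X - X.mean(axis=0)          # PCA works on mean-centered data
    cov = np.cov(X_centered, rowvar=False)   # feature-by-feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest eigenvalues
    components = eigvecs[:, order].T         # each row is one principal direction
    explained_ratio = eigvals[order] / eigvals.sum()
    return X_centered @ components.T, components, explained_ratio

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X_toy = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

Z, components, ratio = pca_eig(X_toy, n_components=2)
print(Z.shape)   # (n_samples, n_components)
print(ratio)     # fraction of total variance explained by each kept component
```

Because the first two toy features move together, most of the variance falls along a single direction, which is why the first entry of `ratio` dominates.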

In this example, we'll use PCA to reduce the Iris dataset from four features to two and visualize the result.

**Example Using a Dataset:**

**Step 1: Import Libraries**
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
```

**Step 2: Load and Prepare the Dataset**
```python
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Feature matrix
y = iris.target  # Target labels
```

**Step 3: Perform PCA Dimensionality Reduction**
```python
# Create a PCA object with the desired number of components (e.g., 2)
n_components = 2
pca = PCA(n_components=n_components)

# Fit and transform the data to reduce dimensionality
X_pca = pca.fit_transform(X)
```

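**Parameters You Can Change**
1. **n_components**: The number of principal components to retain. As an integer, it can be at most the smaller of the number of samples and the number of features (4 for Iris). As a float between 0 and 1, it instead tells scikit-learn to keep just enough components to explain that fraction of the variance.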
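As a rough sketch of how the choice of `n_components` plays out, the snippet below fits PCA on Iris once with an integer and once with a float. The 0.95 threshold is an arbitrary value chosen for illustration, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Integer: keep exactly this many components
pca_fixed = PCA(n_components=2).fit(X)
print(pca_fixed.explained_variance_ratio_)        # variance share of each kept component
print(pca_fixed.explained_variance_ratio_.sum())  # total variance share retained

# Float in (0, 1): keep the fewest components whose cumulative
# explained variance exceeds this fraction
pca_frac = PCA(n_components=0.95).fit(X)
print(pca_frac.n_components_)                     # number of components actually kept
```

Checking `explained_variance_ratio_` after fitting is a quick way to see how much information a given setting discards.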

**Step 4: Visualize the Reduced Data**
```python
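# Scatter plot of the samples in the space of the first two principal components
plt.figure(figsize=(8, 6))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k')
plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
# The classes are discrete, so name them in a legend instead of a continuous colorbar
handles, _ = scatter.legend_elements()
plt.legend(handles, iris.target_names, title='Target Class')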
plt.show()
```

**Explanation:**

1. We import the necessary libraries: NumPy for numerical operations, Matplotlib for visualization, and scikit-learn for the Iris dataset loader and the PCA implementation.

2. We load the Iris dataset, a popular classification dataset with 150 samples, four feature columns, and three target classes.

3. We create a PCA object with the desired number of principal components (in this case, 2) using scikit-learn's PCA class.

4. We fit the PCA model to the data and transform the data to reduce its dimensionality. The transformed data (`X_pca`) now has two columns, one per principal component, instead of the original four features.

5. We visualize the reduced data using a scatter plot. Each point represents a data sample projected onto the first two principal components. The colors represent the target classes, allowing us to observe how the data clusters in the reduced space.
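The steps above can be checked by inspecting the fitted model. The following sketch prints the shape of the reduced data and the weights that define each principal component as a linear combination of the original features. It also reproduces the projection by hand, assuming the default settings (no whitening), where the transform amounts to centering the data and multiplying by the component weights.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2)
X_pca = pca.fit_transform(iris.data)

print(X_pca.shape)  # (n_samples, n_components)

# Each row of components_ holds the feature weights of one principal component
for i, weights in enumerate(pca.components_, start=1):
    terms = ", ".join(f"{w:+.2f} * {name}" for w, name in zip(weights, iris.feature_names))
    print(f"PC{i}: {terms}")

# Manual projection: subtract the per-feature mean, then apply the weights
X_manual = (iris.data - pca.mean_) @ pca.components_.T
print(np.allclose(X_manual, X_pca))
```

Reading the weights this way shows which original measurements contribute most to each axis of the scatter plot.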

PCA is a powerful tool for dimensionality reduction, data visualization, and feature extraction. By selecting an appropriate number of principal components, you can balance the trade-off between preserving information and reducing dimensionality to meet the needs of your specific analysis or modeling task.
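One way to see this trade-off, sketched below, is to measure how well the original data can be rebuilt from a reduced representation. Using mean squared reconstruction error as the yardstick is an assumption made for this illustration; other criteria, such as downstream model accuracy, may suit a given task better.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

errors = []
for k in range(1, X.shape[1] + 1):
    pca = PCA(n_components=k)
    X_reduced = pca.fit_transform(X)
    X_restored = pca.inverse_transform(X_reduced)  # map back to the original feature space
    errors.append(np.mean((X - X_restored) ** 2))
    print(f"{k} component(s): mean squared reconstruction error = {errors[-1]:.4f}")
```

The error shrinks as components are added and is essentially zero once all four are kept, since nothing is discarded at that point.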