**Naive Bayes Classifier Overview:**
Naive Bayes is a probabilistic classifier based on Bayes' theorem, combined with the "naive" assumption that features are conditionally independent given the class. It's particularly effective for text classification tasks such as spam detection and sentiment analysis: it computes the probability that a data point belongs to each class given its features and predicts the most probable class.
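Concretely, the classifier applies Bayes' theorem under that independence assumption and predicts the class that maximizes the posterior (the standard formulation, shown here for reference):
```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y}\, P(y) \prod_{i=1}^{n} P(x_i \mid y)
```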
We'll use a text classification example for this implementation.
**Example Using a Text Classification Dataset:**
**Step 1: Import Libraries**
```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
```
**Step 2: Prepare the Text Data**
```python
# Sample text data
text_data = ["This is a positive message.",
"Negative sentiment in this text.",
"A positive outlook is important.",
"This doesn't look good."]
# Corresponding labels (0 for negative, 1 for positive)
labels = np.array([1, 0, 1, 0])
```
**Step 3: Vectorize the Text Data**
```python
# Create a CountVectorizer to convert text into a numerical format
vectorizer = CountVectorizer()
# Fit and transform the text data into a document-term matrix
X = vectorizer.fit_transform(text_data)
```
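If you want to inspect the result, the snippet below prints the learned vocabulary and a dense view of the matrix. This is just a quick sanity check; `get_feature_names_out` assumes scikit-learn 1.0 or newer.
```python
# Each column of X corresponds to one vocabulary word
print(vectorizer.get_feature_names_out())
# Dense view of the document-term matrix (fine for tiny data;
# avoid .toarray() on large corpora, since X is stored sparse)
print(X.toarray())
```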
**Step 4: Split the Data into Training and Testing Sets**
```python
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
```
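With only four documents, this split leaves a single test sample, which is worth verifying (purely a sanity check on this toy dataset):
```python
# On this toy dataset: 3 training documents, 1 test document
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
```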
**Step 5: Create and Train the Naive Bayes Classifier Model**
```python
# Create a Naive Bayes classifier model with customizable parameter
naive_bayes_model = MultinomialNB(alpha=1.0)
# Train the model on the training data
naive_bayes_model.fit(X_train, y_train)
```
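After fitting, the model exposes its learned statistics; the attributes below are part of `MultinomialNB`'s public API and are shown here just as a peek under the hood:
```python
# Log prior probability of each class, log P(y)
print(naive_bayes_model.class_log_prior_)
# Smoothed log likelihood of each word given each class, log P(x_i | y)
# (shape: n_classes x n_vocabulary_words)
print(naive_bayes_model.feature_log_prob_.shape)
```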
**Parameters That Can Be Changed**
1. **alpha** (default=1.0):
- Additive (Laplace/Lidstone) smoothing parameter. `alpha=1.0` is Laplace smoothing, values between 0 and 1 give Lidstone smoothing, and `alpha=0` means no smoothing. Smoothing prevents zero probabilities for words that never co-occur with a given class in the training data, which improves the model's handling of rare words; a sketch of tuning this value follows below.
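In practice you would choose `alpha` on held-out data. The loop below is a minimal sketch of such a sweep, reusing the variables from the steps above; the candidate values are illustrative, and scores on a one-sample test set are not meaningful.
```python
# Illustrative sweep over smoothing strengths; the values are arbitrary
for alpha in (0.01, 0.1, 0.5, 1.0):
    model = MultinomialNB(alpha=alpha)
    model.fit(X_train, y_train)
    print(f"alpha={alpha}: test accuracy = {model.score(X_test, y_test):.2f}")
```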
**Step 6: Make Predictions**
```python
# Make predictions on the test data
y_pred = naive_bayes_model.predict(X_test)
```
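`MultinomialNB` can also return per-class probabilities rather than hard labels, which is useful for thresholding or ranking predictions:
```python
# Probability of each class (columns ordered as in naive_bayes_model.classes_)
y_proba = naive_bayes_model.predict_proba(X_test)
print(y_proba)
```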
**Step 7: Evaluate the Model**
```python
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Generate a classification report
# Pass labels explicitly: with a one-sample test set, y_test may contain
# only one class, and target_names would otherwise mismatch
classification_rep = classification_report(y_test, y_pred, labels=[0, 1],
                                           target_names=["Negative", "Positive"],
                                           zero_division=0)
print("Classification Report:")
print(classification_rep)
```
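To classify unseen text, transform it with the already-fitted vectorizer rather than fitting a new one (the example sentence below is made up):
```python
# Hypothetical new message; transform (not fit_transform) reuses the vocabulary
new_message = ["What a great and positive result!"]
new_X = vectorizer.transform(new_message)
print(naive_bayes_model.predict(new_X))  # 1 = positive, 0 = negative
```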
**Explanation:**
1. We import the necessary libraries: NumPy for numerical operations and scikit-learn for text vectorization (`CountVectorizer`), the Naive Bayes classifier (`MultinomialNB`), data splitting, and evaluation metrics.
2. We prepare a sample text dataset with corresponding labels. In this example, 0 represents negative sentiment, and 1 represents positive sentiment.
3. We use CountVectorizer to convert the text data into a numerical format, creating a document-term matrix where each row represents a document (text) and each column represents a unique word (feature).
4. The data is split into training and testing sets, with 80% used for training and 20% for testing; on this four-document toy dataset, that means three training documents and a single test document.
5. We create a Naive Bayes classifier model using `MultinomialNB`, which is suitable for text data.
6. The `alpha` parameter controls the amount of additive smoothing applied to the probability estimates. `alpha=1.0` corresponds to Laplace smoothing (values between 0 and 1 give Lidstone smoothing), which prevents zero probabilities and can improve the model's performance on rare words.
7. The model is trained on the training data using `fit`.
8. We use the trained model to make predictions on the test data.
9. We evaluate the model's performance using accuracy and generate a classification report that includes precision, recall, F1-score, and support for each class.