**Linear Regression Overview:**
Linear regression is a supervised machine learning algorithm used for predicting a continuous target variable based on one or more predictor features. It models the relationship between the independent variables and the dependent variable by fitting a linear equation to the observed data.
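Before turning to scikit-learn, here is a minimal sketch of the underlying idea: fitting the line `y = intercept + slope * x` to toy data using the closed-form least-squares solution for a single feature. The data here are made up purely for illustration.

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1 (illustrative only)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# Ordinary least squares for one feature:
#   slope     = cov(X, y) / var(X)
#   intercept = mean(y) - slope * mean(X)
slope = np.cov(X, y, bias=True)[0, 1] / np.var(X)
intercept = y.mean() - slope * X.mean()
print(slope, intercept)  # -> 2.0 1.0 (recovers the true line)
```

Scikit-learn's `LinearRegression` solves the same minimization, generalized to any number of features.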
In this example, we'll use a dataset to demonstrate linear regression.
**Example Using a Dataset:**
**Step 1: Import Libraries**
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```
**Step 2: Load and Explore the Dataset**
```python
# Load a sample dataset (you can replace this with your own dataset)
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
# Use a single feature for simplicity
X = diabetes.data[:, np.newaxis, 2] # Use the third feature (BMI)
y = diabetes.target
```
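Since this step is about exploring the data as well as loading it, a few quick checks are worth running first. The snippet below repeats the loading lines so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target

# Inspect shapes and feature names before modeling
print(diabetes.data.shape)     # (442, 10): 442 samples, 10 features
print(diabetes.feature_names)  # index 2 is 'bmi', the feature selected above
print(X.shape, y.shape)        # (442, 1) (442,)
```

Note that `X` is kept two-dimensional (`(442, 1)` rather than `(442,)`) because scikit-learn estimators expect a 2-D feature matrix.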
**Step 3: Split the Data into Training and Testing Sets**
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
**Step 4: Create and Train the Linear Regression Model**
```python
# Create a linear regression model
linear_regression_model = LinearRegression()
# Train the model on the training data
linear_regression_model.fit(X_train, y_train)
```
**Parameters That Can Be Changed**
1. **fit_intercept** (default=True):
- Specifies whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
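To see the effect of this parameter, here is a small sketch on made-up data that lies exactly on a line through the origin; with `fit_intercept=False` the model is forced through the origin and `intercept_` is fixed at zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data on the line y = 3x, which passes through the origin
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([3.0, 6.0, 9.0])

# No intercept is estimated; only the slope is fit
model = LinearRegression(fit_intercept=False).fit(X_toy, y_toy)
print(model.coef_, model.intercept_)  # -> [3.] 0.0
```

If the true relationship does not pass through the origin, disabling the intercept will bias the fit, so the default of `True` is usually the safer choice.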
**Step 5: Make Predictions**
```python
# Make predictions on the test data
y_pred = linear_regression_model.predict(X_test)
```
**Step 6: Evaluate the Model**
```python
# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
# Calculate R-squared (coefficient of determination)
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.2f}")
# Visualize the regression line
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel("Feature")
plt.ylabel("Target")
plt.title("Linear Regression")
plt.show()
```
**Explanation:**
1. We import the necessary libraries: NumPy for numerical operations, Matplotlib for visualization, and scikit-learn for splitting the data, fitting the model, and computing evaluation metrics.
2. We load a sample dataset, in this case, the Diabetes dataset, which contains feature data and a target variable.
3. We split the dataset into training and testing sets. Here, we use 80% of the data for training and 20% for testing.
4. We create a Linear Regression model using `LinearRegression`.
5. The `fit_intercept` parameter is introduced. When set to True (the default), the model calculates an intercept. You can set it to False if you want to fit a linear equation without an intercept.
6. The model is trained on the training data using `fit`.
7. We use the trained model to make predictions on the test data.
8. We evaluate the model's performance using the mean squared error (MSE) and R-squared (coefficient of determination). MSE measures the average squared difference between predicted and actual values, while R-squared indicates the proportion of variance in the target that the model explains. We also visualize the regression line.
You can customize the `fit_intercept` parameter to control whether the linear regression model includes an intercept. Setting it to `False` is appropriate when the data are already centered or when the relationship is known to pass through the origin; otherwise the default of `True` should be kept.
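The MSE and R-squared metrics used above can also be computed by hand, which makes their definitions concrete. The arrays below are small illustrative values, not outputs from the diabetes example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative true values and predictions
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 8.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The manual values match scikit-learn's implementations
assert np.isclose(mse_manual, mean_squared_error(y_true, y_hat))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

An R-squared of 1.0 would mean the predictions explain all variance in the target; 0.0 means the model does no better than always predicting the mean.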