**Linear Regression Overview:**
Linear regression is a supervised machine learning algorithm used for predicting a continuous target variable based on one or more predictor features. It models the relationship between the independent variables and the dependent variable by fitting a linear equation to the observed data.
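Before turning to scikit-learn, here is a minimal sketch of the underlying idea: fitting the line `y = intercept + slope * x` to toy data using the closed-form least-squares solution for a single feature. The data here are made up purely for illustration.

```python
import numpy as np

# Toy data lying exactly on the line y = 2x + 1 (illustrative only)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# Ordinary least squares for one feature:
#   slope     = cov(X, y) / var(X)
#   intercept = mean(y) - slope * mean(X)
slope = np.cov(X, y, bias=True)[0, 1] / np.var(X)
intercept = y.mean() - slope * X.mean()
print(slope, intercept)  # -> 2.0 1.0 (recovers the true line)
```

Scikit-learn's `LinearRegression` solves the same minimization, generalized to any number of features.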
In this example, we'll use a dataset to demonstrate linear regression.
**Example Using a Dataset:**
**Step 1: Import Libraries**
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
```
**Step 2: Load and Explore the Dataset**
```python
# Load a sample dataset (you can replace this with your own dataset)
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
# Use a single feature for simplicity
X = diabetes.data[:, np.newaxis, 2] # Use the third feature (BMI)
y = diabetes.target
```
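Since this step is about exploring the data as well as loading it, a few quick checks are worth running first. The snippet below repeats the loading lines so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target

# Inspect shapes and feature names before modeling
print(diabetes.data.shape)     # (442, 10): 442 samples, 10 features
print(diabetes.feature_names)  # index 2 is 'bmi', the feature selected above
print(X.shape, y.shape)        # (442, 1) (442,)
```

Note that `X` is kept two-dimensional (`(442, 1)` rather than `(442,)`) because scikit-learn estimators expect a 2-D feature matrix.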
**Step 3: Split the Data into Training and Testing Sets**
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
**Step 4: Create and Train the Linear Regression Model**
```python
# Create a linear regression model
linear_regression_model = LinearRegression()
# Train the model on the training data
linear_regression_model.fit(X_train, y_train)
```
**Parameters That Can Be Changed**
1. **fit_intercept** (default=True):
- Specifies whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
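To see the effect of this parameter, here is a small sketch on made-up data that lies exactly on a line through the origin; with `fit_intercept=False` the model is forced through the origin and `intercept_` is fixed at zero:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data on the line y = 3x, which passes through the origin
X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([3.0, 6.0, 9.0])

# No intercept is estimated; only the slope is fit
model = LinearRegression(fit_intercept=False).fit(X_toy, y_toy)
print(model.coef_, model.intercept_)  # -> [3.] 0.0
```

If the true relationship does not pass through the origin, disabling the intercept will bias the fit, so the default of `True` is usually the safer choice.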
**Step 5: Make Predictions**
```python
# Make predictions on the test data
y_pred = linear_regression_model.predict(X_test)
```
**Step 6: Evaluate the Model**
```python
# Calculate mean squared error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
# Calculate R-squared (coefficient of determination)
r2 = r2_score(y_test, y_pred)
print(f"R-squared: {r2:.2f}")
# Visualize the regression line
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel("Feature")
plt.ylabel("Target")
plt.title("Linear Regression")
plt.show()
```
**Explanation:**
1. We import the necessary libraries: NumPy for numerical operations, Matplotlib for visualization, and scikit-learn for splitting the data, fitting the model, and computing evaluation metrics.
2. We load a sample dataset, in this case, the Diabetes dataset, which contains feature data and a target variable.
3. We split the dataset into training and testing sets. Here, we use 80% of the data for training and 20% for testing.
4. We create a Linear Regression model using `LinearRegression`.
5. The `fit_intercept` parameter is introduced. When set to True (the default), the model calculates an intercept. You can set it to False if you want to fit a linear equation without an intercept.
6. The model is trained on the training data using `fit`.
7. We use the trained model to make predictions on the test data.
8. We evaluate the model's performance using the mean squared error (MSE) and R-squared (coefficient of determination). MSE measures the average squared difference between predicted and actual values, while R-squared indicates the proportion of variance in the target that the model explains. We also visualize the regression line.
You can customize the `fit_intercept` parameter to control whether the linear regression model includes an intercept. Setting it to `False` is appropriate when the data are already centered or when the relationship is known to pass through the origin; otherwise the default of `True` should be kept.
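The MSE and R-squared metrics used above can also be computed by hand, which makes their definitions concrete. The arrays below are small illustrative values, not outputs from the diabetes example:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative true values and predictions
y_true = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.5, 5.0, 8.0])

# MSE: mean of squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The manual values match scikit-learn's implementations
assert np.isclose(mse_manual, mean_squared_error(y_true, y_hat))
assert np.isclose(r2_manual, r2_score(y_true, y_hat))
```

An R-squared of 1.0 would mean the predictions explain all variance in the target; 0.0 means the model does no better than always predicting the mean.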