# Linear Regression

A *linear regression model* is a fundamental tool in statistics and modeling, expressed in the form:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon $$

| Component | Definition | Role |
| :---: | :---: | :--- |
| $y$ | *Dependent Variable* | The variable we aim to predict (*Target*) |
| $x_1, \dots, x_p$ | *Independent Variables* | The input features used for prediction (*Features*) |
| $\beta_0, \dots, \beta_p$ | *Model Parameters* | The coefficients computed during training |
| $\varepsilon$ | *Residual* | The error term: the difference between the observed $y$ and the predicted value |

The process of constructing a model involves *observing data*, choosing a *cost function*, and *computing parameters* that minimize that cost.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.linalg as la
import pandas as pd
```

## Least Squares Method

### Residual Sum of Squares

Let $(\mathbf{x}_1,y_1),\dots,(\mathbf{x}_N,y_N)$ be a data set of $N$ observations where each $\mathbf{x}_i = (x_{i,1},\dots,x_{i,p})$ is a vector of values of the independent variables $x_1,\dots,x_p$. Consider a linear regression model

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon $$

The **residual sum of squares** is the cost function

$$ RSS = \sum_{i=1}^N |y_i - (\beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p}) |^2 $$

The goal of **least squares linear regression** is to compute parameters $\beta_0,\beta_1,\dots,\beta_p$ which minimize $RSS$.

### Normal Equations

The RSS minimization problem has a closed-form solution derived using linear algebra; the solution is unique whenever the design matrix defined below has linearly independent columns.

1. **Vector Form:** We express the problem using matrices, where the data is organized into the vector $\mathbf{y}$ and the **design matrix** $X$ (which includes a column of ones for the intercept $\beta_0$), and the parameters into the vector $\boldsymbol{\beta}$:
   $$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & \cdots & x_{N,p} \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}$$
   The RSS can then be written concisely as $RSS = || \mathbf{y} - X \boldsymbol{\beta} ||^2$.
2. **Orthogonality Condition:** The vector $X \boldsymbol{\beta}$ that minimizes the distance to $\mathbf{y}$ must be the orthogonal projection of $\mathbf{y}$ onto the column space of $X$. This requires that the residual vector $\mathbf{y} - X \boldsymbol{\beta}$ be orthogonal to every column of $X$; equivalently, it lies in the **nullspace of $X^T$**.
3. **Final Equation:** This leads directly to the **normal equations**:
   $$X^T X \boldsymbol{\beta} = X^T \mathbf{y}$$
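The normal equations translate directly into a few lines of NumPy. The following is a minimal sketch on synthetic data, not part of the examples below: the function name `fit_least_squares`, the coefficients, and the random data are ours. It builds a design matrix, solves $X^T X \boldsymbol{\beta} = X^T \mathbf{y}$ with `scipy.linalg.solve`, and recovers the coefficients used to generate the data.

```python
import numpy as np
import scipy.linalg as la

def fit_least_squares(X, y):
    # Solve the normal equations X^T X beta = X^T y
    return la.solve(X.T @ X, X.T @ y)

# Synthetic data generated from y = 2 + 3*x1 - x2 + small noise
rng = np.random.default_rng(0)
N = 100
x1 = rng.uniform(0, 10, N)
x2 = rng.uniform(0, 10, N)
y = 2 + 3*x1 - x2 + rng.normal(0, 0.1, N)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(N), x1, x2])
beta = fit_least_squares(X, y)
print(beta)  # approximately [2, 3, -1]
```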
### Example 1: Real Estate Prices

We apply the least squares method to fit a simple linear model ($y = \beta_0 + \beta_1 x + \varepsilon$) to real estate data, where $y$ is the listing price (in millions of dollars) and $x$ is the apartment size (in square feet).

#### Construct Data Matrix $X$

Since the model is $y = \beta_0 + \beta_1 x$, the matrix $X$ has two columns: a column of ones (for $\beta_0$) and a column of the apartment sizes $x$ (for $\beta_1$).

```python=
x = [1557,1452,767,900,1018,802,924,981,879,819,829,1088,1085]
y = [1.398,1.750,0.899,0.848,1.149,0.825,0.999888,1.175,0.885,0.749888,1.098,1.099,1.198]
plt.plot(x,y,'.')
plt.title('Listing Price on REALTOR.ca')
plt.xlabel('Apartment Size (sqft)'), plt.ylabel('Price (Millions $)'), plt.grid(True)
plt.show()
```

![84f650e9-ade0-469b-ab3b-659e2b1a875a](https://hackmd.io/_uploads/SkhmDRtJWg.png)

```python=+
X = np.column_stack([np.ones(len(x)),x])
X
```

```
array([[1.000e+00, 1.557e+03],
       [1.000e+00, 1.452e+03],
       [1.000e+00, 7.670e+02],
       [1.000e+00, 9.000e+02],
       [1.000e+00, 1.018e+03],
       [1.000e+00, 8.020e+02],
       [1.000e+00, 9.240e+02],
       [1.000e+00, 9.810e+02],
       [1.000e+00, 8.790e+02],
       [1.000e+00, 8.190e+02],
       [1.000e+00, 8.290e+02],
       [1.000e+00, 1.088e+03],
       [1.000e+00, 1.085e+03]])
```

#### Solve Normal Equations

We solve the system $X^T X \boldsymbol{\beta} = X^T \mathbf{y}$ for the parameter vector $\boldsymbol{\beta}$.

```python=+
beta = la.solve(X.T@X,X.T@y)
beta
```

```
array([0.10620723, 0.00096886])
```

#### Plot Model

The resulting linear model $\hat{y} = 0.1062 + 0.00096886 x$ is plotted over the data. The coefficient $\beta_1 \approx 0.00096886$ indicates that the listing price increases by about $968.86$ dollars per square foot.

```python=+
xs = np.linspace(700,1600)
ys = beta[0] + beta[1]*xs
plt.plot(x,y,'.',xs,ys)
plt.title('Listing Price on REALTOR.ca'), plt.grid(True)
plt.xlabel('Apartment Size (sqft)'), plt.ylabel('Price (Millions $)')
plt.show()
```

![bfbfdaca-91d1-4aae-b519-7add0d777a2a](https://hackmd.io/_uploads/rk0hwAt1-g.png)

### Example 2: Temperature

We use least squares linear regression to model the daily average temperature $T$ as a periodic function of the day of the year $d$, which accounts for the annual fluctuation of temperature. The data consists of measurements from the Vancouver Airport from 1995 to 2023. We begin by loading and plotting the raw data to visualize the seasonal trend.

```python=
df = pd.read_csv('temperature.csv')
df.head()
```

```
   day  month  year  dayofyear  avg_temperature
0   13      4  2023        103             7.10
1   12      4  2023        102             5.19
2   11      4  2023        101             8.00
3   10      4  2023        100             7.69
4    9      4  2023         99             9.30
```

```python=+
df.tail()
```

```
      day  month  year  dayofyear  avg_temperature
9995    1     12  1995        335             6.40
9996   30     11  1995        334             9.15
9997   29     11  1995        333            11.50
9998   28     11  1995        332             9.75
9999   27     11  1995        331             6.90
```

```python=+
plt.plot(df['dayofyear'],df['avg_temperature'],'.',alpha=0.1,lw=0)
plt.title('Daily Average Temperature 1995-2023')
plt.xlabel('Day of the Year'), plt.ylabel('Temperature (Celsius)'), plt.grid(True)
plt.show()
```

![be5b33bb-caaa-4232-8e9e-2f46e0c875c5](https://hackmd.io/_uploads/BJBgYAYyWe.png)

#### Construct Matrix $X$ and Solve

The temperature variation is naturally periodic over $365$ days, which suggests the regression model

$$ T = \beta_0 + \beta_1 \cos(2 \pi d/365) + \beta_2 \sin(2 \pi d/365) $$

This is a linear regression model because it is linear in its *parameters* $\boldsymbol{\beta} = [\beta_0, \beta_1, \beta_2]^T$, even though the *feature* $d$ is transformed by cosine and sine functions. The features are $x_1 = \cos(2 \pi d/365)$ and $x_2 = \sin(2 \pi d/365)$.

The **design matrix $X$** used in the normal equations $X^T X \boldsymbol{\beta} = X^T \mathbf{y}$ therefore has three columns:

1. A column of **ones** (for the intercept $\beta_0$)
2. A column of **cosine terms** (for $\beta_1$)
3. A column of **sine terms** (for $\beta_2$)
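Written out explicitly, with $d_1,\dots,d_N$ denoting the observed days of the year, these three columns give the design matrix

$$ X = \begin{bmatrix} 1 & \cos(2\pi d_1/365) & \sin(2\pi d_1/365) \\ 1 & \cos(2\pi d_2/365) & \sin(2\pi d_2/365) \\ \vdots & \vdots & \vdots \\ 1 & \cos(2\pi d_N/365) & \sin(2\pi d_N/365) \end{bmatrix} $$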
```python=+
N = len(df)
d = df['dayofyear']
T = df['avg_temperature']
X = np.column_stack([np.ones(N),np.cos(2*np.pi*d/365),np.sin(2*np.pi*d/365)])
beta = la.solve(X.T@X,X.T@T)

ds = np.linspace(0,365,500)
Ts = beta[0] + beta[1]*np.cos(2*np.pi*ds/365) + beta[2]*np.sin(2*np.pi*ds/365)

plt.plot(df['dayofyear'],df['avg_temperature'],'.',alpha=0.1,lw=0)
plt.plot(ds,Ts,'r'), plt.grid(True)
plt.title('Daily Average Temperature 1995-2023')
plt.xlabel('Day of the Year'), plt.ylabel('Temperature (Celsius)')
plt.show()
```

![0c2ff4b3-fc6f-4b58-b65f-0f6f2fcc3e57](https://hackmd.io/_uploads/S1fXKCtkWe.png)

## Residuals and $R^2$

### Residuals

A linear regression model is built from a dataset of $N$ observations, $(\mathbf{x}_1,y_1),\dots,(\mathbf{x}_N,y_N)$, where $\mathbf{x}_i$ is the vector of feature values for the $i$-th observation. The model is:

$$ y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon $$

The quantity $\varepsilon$ is the *residual* or error term. For the $i$-th observation it is the difference between the observed value $y_i$ and the value predicted by the model:

$$ \varepsilon_i = y_i - (\beta_0 + \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p}) $$

The central goal of the least squares method is to compute the parameters $\beta_0, \dots, \beta_p$ that minimize the *residual sum of squares (RSS)*, the sum of the squared residuals:

$$ RSS = \sum_{i = 1}^N \varepsilon_i^2 $$

Once the parameters have been computed to minimize the $RSS$, the model is validated using metrics such as the coefficient of determination $R^2$ and diagnostic plots.

### Coefficient of Determination $R^2$

While the $RSS$ measures the total error of the model, it does not, on its own, tell us how well the model fits the data compared to a baseline. The *coefficient of determination* $R^2$ addresses this by comparing the model's error $RSS$ to the error of a trivial model that always predicts the mean of the dependent variable, $\bar{y}$. The coefficient is defined as:

$$ R^2 = 1 - \frac{\sum_{i=1}^N \varepsilon_i^2}{\sum_{i=1}^N (y_i - \bar{y})^2} = 1 - \frac{RSS}{TSS} $$

where $TSS$ is the total sum of squares and $\bar{y}$ is the mean of the observed dependent variable:

$$ \bar{y} = \frac{1}{N} \sum_{i=1}^N y_i $$

#### Interpretation

The $R^2$ value is near 1 when the model fits the data closely (low $RSS$). It is near 0 when the model performs no better than simply predicting the mean $\bar{y}$ for every observation.

#### Adjusted $R^2$ ($R^2_{adj}$)

The $RSS$ naturally decreases (or stays the same) whenever a new feature is added to the model, so $R^2$ tends to increase with the number of features $p$ even when the new features add little information. To account for the number of features used, we define the *adjusted $R^2$ coefficient*:

$$ R^2_{adj} = 1 - \frac{N-1}{N-p-1}\frac{\sum_{i=1}^N \varepsilon_i^2}{\sum_{i=1}^N (y_i - \bar{y})^2} $$

The factor $\frac{N-1}{N-p-1}$ penalizes the model for including excessive or unnecessary features, providing a more reliable measure of model quality when comparing models with different numbers of features.

### Graphical Regression Diagnostics

A linear regression model is considered a good representation of the data if the residuals $\varepsilon$ satisfy certain statistical assumptions. We use graphical diagnostics to verify these properties visually:

| Property to Check | Diagnostic Plot | Expected Result |
| :--- | :--- | :--- |
| Normality | Histogram of residuals ($\varepsilon_i$) | A bell curve centered at 0 |
| Homoscedasticity & Independence | Plot $\varepsilon_i$ vs. each feature $x_{i,j}$ | A random scatter with constant variance around the horizontal axis |
| No Time/Index Dependence | Plot $\varepsilon_i$ vs. the observation index $i$ (or time $t_i$) | A random scatter with constant variance around the horizontal axis |
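These three checks are easy to script once the residuals are in hand. The sketch below wraps them in a reusable helper; the function name `plot_residual_diagnostics` and its arguments are ours (not part of the original notes) and assume the residuals and features are array-like.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_residual_diagnostics(residuals, features, feature_names=None):
    # 1. Normality: histogram of the residuals
    plt.hist(residuals, bins=50, density=True), plt.grid(True)
    plt.title('Distribution of Residuals'), plt.xlabel('Residual')
    plt.show()
    # 2. Homoscedasticity / independence: residuals vs. each feature
    for j, xj in enumerate(features):
        name = feature_names[j] if feature_names else f'Feature {j+1}'
        plt.plot(xj, residuals, '.', alpha=0.2), plt.grid(True)
        plt.xlabel(name), plt.ylabel('Residual')
        plt.show()
    # 3. No index dependence: residuals vs. observation index
    plt.plot(np.arange(len(residuals)), residuals, '.', alpha=0.2), plt.grid(True)
    plt.xlabel('Observation Index'), plt.ylabel('Residual')
    plt.show()
```

The examples below carry out exactly these steps explicitly, plot by plot.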
### Example: Real Estate Price Prediction

This example uses real estate data from the Kitsilano neighborhood in Vancouver to illustrate how multiple features impact a linear regression model and how $R^2$ and $R^2_{adj}$ are used for model comparison.

| Variable | Description | Python Array |
| :---: | :--- | :---: |
| $y$ (Target) | Listing Price (Millions $) | `y` |
| $x_1$ (Feature) | Apartment Size (sqft) | `x1` |
| $x_2$ (Feature) | Number of Bedrooms | `x2` |
| $x_3$ (Feature) | Number of Bathrooms | `x3` |

```python=
x1 = np.array([1050,943,1557,662,829,724,482,702,733,637,819,802,771,779,823,924,1088,1018])
x2 = np.array([2,2,2,1,2,1,1,1,2,1,2,2,1,2,2,2,2,2])
x3 = np.array([2,2,2,1,2,1,1,1,1,1,1,1,1,2,2,2,2,2])
y = np.array([1.39,1.079,1.398,0.859,1.098,0.619,0.625,0.639,0.728888,0.778,0.749888,0.825,0.858,0.899,0.8999,0.999888,1.099,1.149])
N = len(y) # Total number of observations
```

#### Model 1: Size Only ($y = \beta_0 + \beta_1 x_1 + \varepsilon$)

We begin with the simplest model, using only apartment size ($x_1$) as the feature.

1. Construct $X$ and solve for $\boldsymbol{\beta}$: the design matrix $X$ has $p=1$ feature column (plus the intercept column).

```python=+
X = np.column_stack([np.ones(N),x1])
beta = la.solve(X.T@X,X.T@y)

# Plotting the data with the fitted line
x1s = np.linspace(500,1600,50)
ys = beta[0] + beta[1]*x1s
plt.plot(x1,y,'.')
plt.plot(x1s,ys)
plt.title('Listing Price on REALTOR.ca (Model 1)')
plt.xlabel('Size (sqft)'), plt.ylabel('Price (Millions $)'), plt.grid(True)
plt.show()
```

![427a3a91-089f-4a10-b96c-3c951a7f0c2c](https://hackmd.io/_uploads/ry6l9RtkZg.png)

2. Calculate $R^2$ and $R^2_{adj}$ (using $p=1$):

```python=+
residuals = y - (beta[0] + beta[1]*x1)
ybar = np.mean(y) # Mean of the target variable y
TSS = np.sum((y - ybar)**2) # Total Sum of Squares
RSS = np.sum(residuals**2) # Residual Sum of Squares
R2x1 = 1 - RSS/TSS
R2adjx1 = 1 - (1 - R2x1)*(N - 1)/(N - 1 - 1)
print('R2 =',R2x1,'R2adj =',R2adjx1)
```

```
R2 = 0.7235562224127696 R2adj = 0.7062784863135677
```

#### Model 2: Size and Bedrooms ($y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$)

We add the number of bedrooms ($x_2$) to the model. The number of features is now $p=2$.

```python=+
X = np.column_stack([np.ones(N),x1,x2])
beta = la.solve(X.T@X,X.T@y)
residuals = y - (beta[0] + beta[1]*x1 + beta[2]*x2)
R2x1x2 = 1 - np.sum(residuals**2)/TSS
# Calculate R2adj: R2adj = 1 - (1 - R2)*(N - 1)/(N - p - 1)
R2adjx1x2 = 1 - (1 - R2x1x2)*(N - 1)/(N - 2 - 1)
print('R2 =',R2x1x2,'R2adj =',R2adjx1x2)
```

```
R2 = 0.7400376017871559 R2adj = 0.7053759486921101
```

Note that $R^2$ increased slightly from Model 1 (0.7236 to 0.7400), while the adjusted $R^2$ dipped slightly (0.7063 to 0.7054), suggesting that the number of bedrooms adds little information beyond the apartment size.
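The fit-and-score steps are identical for every candidate model, so before adding a third feature it is worth noting that they can be packaged in a small helper. This is only a sketch: the function `fit_and_score` and its interface are ours, and it relies on the imports and the arrays `y`, `N`, and `TSS` already defined above.

```python
def fit_and_score(feature_columns):
    # Fit by solving the normal equations and report R2 and adjusted R2
    p = len(feature_columns)
    X = np.column_stack([np.ones(N)] + list(feature_columns))
    beta = la.solve(X.T @ X, X.T @ y)
    residuals = y - X @ beta
    R2 = 1 - np.sum(residuals**2)/TSS
    R2adj = 1 - (1 - R2)*(N - 1)/(N - p - 1)
    return beta, R2, R2adj

# For example, fit_and_score([x1, x2]) reproduces Model 2 above.
```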
#### Model 3: Size, Bedrooms, and Bathrooms ($y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$)

We add the number of bathrooms ($x_3$) to the model. The number of features is $p=3$.

```python=+
X = np.column_stack([np.ones(N),x1,x2,x3])
beta = la.solve(X.T@X,X.T@y)
residuals = y - (beta[0] + beta[1]*x1 + beta[2]*x2 + beta[3]*x3)
R2x1x2x3 = 1 - np.sum(residuals**2)/TSS
# Calculate R2adj: R2adj = 1 - (1 - R2)*(N - 1)/(N - p - 1)
R2adjx1x2x3 = 1 - (1 - R2x1x2x3)*(N - 1)/(N - 3 - 1)
print('R2 =',R2x1x2x3,'R2adj =',R2adjx1x2x3)
```

```
R2 = 0.8302699983155557 R2adj = 0.7938992836688891
```

#### Conclusion on Model Comparison

| Model | Features (p) | R² | R²adj |
| :---: | :--- | :---: | :---: |
| 1 | $x_1$ (Size) | $0.7236$ | $0.7063$ |
| 2 | $x_1, x_2$ (Size, Beds) | $0.7400$ | $0.7054$ |
| 3 | $x_1, x_2, x_3$ (Size, Beds, Baths) | $\mathbf{0.8303}$ | $\mathbf{0.7939}$ |

The coefficient of determination *increases* from $0.72$ to $0.83$ as we include the number of bedrooms and bathrooms, meaning *Model 3 explains $83.03\%$ of the variability in the listing price*. The adjusted $R^2$ also increases, from about $0.71$ to $0.79$, and the jump occurs when the number of bathrooms is added, confirming that this feature in particular contributes real explanatory power rather than merely inflating $R^2$.

### Example: Temperature Regression Diagnostics

This example continues the analysis of the daily average temperature data using the linear regression model $T = \beta_0 + \beta_1 \cos(2 \pi d/365) + \beta_2 \sin(2 \pi d/365)$. We now validate the model using residual analysis and diagnostic plots.

#### Model Fit and Coefficients

The periodic model is fit to the data, and the resulting curve (red line) shows a good approximation of the annual temperature cycle:

```python=
N = len(df)
d = df['dayofyear']
T = df['avg_temperature']
X = np.column_stack([np.ones(N),np.cos(2*np.pi*d/365),np.sin(2*np.pi*d/365)])
beta = la.solve(X.T@X,X.T@T)

ds = np.linspace(0,365,500)
x1 = np.cos(2*np.pi*ds/365)
x2 = np.sin(2*np.pi*ds/365)
Ts = beta[0] + beta[1]*x1 + beta[2]*x2

plt.plot(df['dayofyear'],df['avg_temperature'],'.',alpha=0.1,lw=0)
plt.plot(ds,Ts,'r'), plt.grid(True)
plt.title('Daily Average Temperature 1995-2023')
plt.xlabel('Day of the Year'), plt.ylabel('Temperature (Celsius)')
plt.show()
```

![715739f6-099d-4956-80c1-2e5ea4bc5433](https://hackmd.io/_uploads/B1ER90K1bl.png)

#### Graphical Regression Diagnostics

We perform diagnostic checks on the residuals ($\varepsilon_i$) to evaluate the model's validity.

##### 1. Normality of Residuals

The histogram of the residuals should approximate a bell curve centered at zero.

```python=+
# Compute the residuals and plot the distribution
f = lambda d,beta: beta[0] + beta[1]*np.cos(2*np.pi*d/365) + beta[2]*np.sin(2*np.pi*d/365)
residuals = T - f(d,beta)

plt.hist(residuals,bins=50,density=True), plt.grid(True)
plt.title('Distribution of Residuals')
plt.xlabel('Residual'), plt.ylabel('Relative Frequency')
plt.show()
```

![e55b93c4-328c-4e2a-a431-b133b22046ec](https://hackmd.io/_uploads/BJcko0tJbx.png)

The plot strongly suggests that the residuals are **normally distributed** around a mean of approximately zero ($\text{mean} \approx 2.85 \times 10^{-14}$; the mean is essentially zero by construction, since the model includes an intercept), satisfying the first property of a good model. The variance of the residuals is approximately $6.77$.
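The quoted mean and variance can be reproduced directly from the `residuals` array computed above (a quick check, not part of the original listing; note that the pandas `.var()` method uses the sample variance, which is essentially the same as the population variance for this many observations):

```python
# Summary statistics of the residuals
print('mean =', residuals.mean())
print('variance =', residuals.var())
```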
##### 2. Independence from Features ($x_1$ and $x_2$)

The residuals are plotted against each input feature ($x_1 = \cos(2 \pi d/365)$ and $x_2 = \sin(2 \pi d/365)$) to check for systematic patterns.

```python=+
plt.plot(np.cos(2*np.pi*d/365),residuals,'.',alpha=0.1), plt.grid(True)
plt.xlabel('$x_1$ (Cosine Term)'), plt.ylabel('Residual')
plt.show()
```

![b264ce19-e504-42f1-8b14-6142271b101c](https://hackmd.io/_uploads/rJcPjRKkWx.png)

```python=+
plt.plot(np.sin(2*np.pi*d/365),residuals,'.',alpha=0.1), plt.grid(True)
plt.xlabel('$x_2$ (Sine Term)'), plt.ylabel('Residual')
plt.show()
```

![cc19800f-8fdc-4961-88cc-5d3a787a3938](https://hackmd.io/_uploads/BkKKs0Y1Zl.png)

Both plots show **no clear dependence** on the trigonometric features, which is a positive sign for the model.

##### 3. Independence from Time/Index ($d$)

The residuals are plotted against the day of the year ($d$) to check for patterns over time.

```python=+
plt.plot(d,residuals,'.',alpha=0.1), plt.grid(True)
plt.xlabel('Day of the Year'), plt.ylabel('Residual')
plt.show()
```

![0200f945-700c-4759-a46d-9fa14d1539bb](https://hackmd.io/_uploads/H1wm2AYJ-e.png)

**Model Deficiency:** The plot of residuals vs. the day of the year reveals a **non-random pattern**: the error appears to vary with an approximate period of six months. This indicates that the current model is *missing a component* needed to accurately capture the semi-annual variation in temperature. This variation can be built into an improved model!

## Regression Packages

There are many Python packages available for computing and analyzing linear regression models. We will examine the core functionalities of `scipy.stats`, `sklearn.linear_model`, and `statsmodels` by applying them to the Real Estate Price example.

### Example: Real Estate Price Prediction (Continued)

The data consists of $N=18$ observations, where the target variable $y$ is the listing price (in Millions $), and the features are size $x_1$, bedrooms $x_2$, and bathrooms $x_3$.

```python=
x1 = np.array([1050,943,1557,662,829,724,482,702,733,637,819,802,771,779,823,924,1088,1018])
x2 = np.array([2,2,2,1,2,1,1,1,2,1,2,2,1,2,2,2,2,2])
x3 = np.array([2,2,2,1,2,1,1,1,1,1,1,1,1,2,2,2,2,2])
y = np.array([1.39,1.079,1.398,0.859,1.098,0.619,0.625,0.639,0.728888,0.778,0.749888,0.825,0.858,0.899,0.8999,0.999888,1.099,1.149])
N = len(y) # Number of observations (N = 18)
```

#### `scipy.stats`

The `scipy.stats` package contains the [`linregress`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html) function, which is limited to computing linear regression for a single feature ($p=1$).

| Output | Description |
| :---: | :--- |
| `beta1` | The slope ($\beta_1$) |
| `beta0` | The intercept ($\beta_0$) |
| `r` | Correlation coefficient (so that $r^2 = R^2$ for a single-feature model) |
| `pvalue` | $p$-value for the hypothesis test that the slope is zero |
| `stderr` | Standard error of the estimated slope |

```python=+
from scipy.stats import linregress

# Fit the size-only model: y = beta0 + beta1*x1
beta1, beta0, r, pvalue, stderr = linregress(x1,y)
print(beta0,beta1)
print(r**2)
```

```
0.18918767130612613 0.0008660748169516868
0.7235562224127696
```

The resulting coefficients match the explicit calculation for *Model 1 (Size Only)* exactly.
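`linregress` also returns the standard error of the slope, which can be turned into a rough 95% confidence interval for $\beta_1$ under the usual normality assumptions. This short sketch is ours, added for illustration; it uses the `beta1` and `stderr` values returned above.

```python
from scipy.stats import t

# 95% confidence interval for the slope: beta1 +/- t_crit * stderr,
# with N - 2 degrees of freedom for a single-feature model
tcrit = t.ppf(0.975, df=len(x1) - 2)
print('slope CI:', beta1 - tcrit*stderr, beta1 + tcrit*stderr)
```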
#### `sklearn.linear_model`

The `sklearn` library uses an object-oriented approach for machine learning models. The [`LinearRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) object can handle *multiple features* and is designed for prediction tasks. We fit *Model 3* ($y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$) using all three features.

```python=+
from sklearn.linear_model import LinearRegression

# X must be a matrix of features, Y must be a column vector
X = np.column_stack([x1,x2,x3])
Y = y.reshape((len(y),1))
model = LinearRegression().fit(X,Y)
```

| Attribute/Method | Description | Result (Model 3) |
| :--- | :--- | :--- |
| `model.intercept_` | The intercept ($\beta_0$) | $\approx 0.1538$ |
| `model.coef_` | Vector of coefficients ($\beta_1, \beta_2, \beta_3$) | $\approx [0.00059, -0.0341, 0.2157]$ |
| `model.score(X,Y)` | Computes $R^2$ | $\approx 0.8303$ |

**Prediction:** The method `model.predict()` can be used to predict the listing price for a new property:

```python=+
# Predict price for 1000 sqft, 2 beds, 1 bath
model.predict([[1000,2,1]])
# This is equivalent to explicitly computing:
# model.intercept_ + model.coef_[0,0]*1000 + model.coef_[0,1]*2 + model.coef_[0,2]*1
```

```
array([[0.89599539]])
```

The predicted listing price is approximately $0.896$ million dollars.

#### `statsmodels`

The `statsmodels` package is focused on in-depth statistical analysis and provides comprehensive output, including $p$-values and standard errors for all coefficients. We use the [`OLS`](https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html) (Ordinary Least Squares) class.

**Note:** `statsmodels.OLS` does not automatically include the intercept ($\beta_0$). You must manually add a column of ones to the feature matrix $X$.

```python=+
import statsmodels.api as sm

# Manually add a column of ones for the intercept
X = np.column_stack([np.ones(N),x1,x2,x3])
model = sm.OLS(y,X).fit()
```

| Attribute | Description | Result (Model 3) |
| :--- | :--- | :--- |
| `model.params` | Vector of parameters ($[\beta_0, \beta_1, \beta_2, \beta_3]$) | $\approx [0.1538, 0.00059, -0.0341, 0.2157]$ |
| `model.rsquared` | Coefficient of Determination ($R^2$) | $\approx 0.8303$ |
| `model.rsquared_adj` | Adjusted $R^2$ ($R^2_{adj}$) | $\approx 0.7939$ |

**Prediction:** When using `model.predict()`, the input array must include the leading $x_0=1$ for the intercept term.

```python=+
# Predict price for 1000 sqft, 2 beds, 1 bath (Input: [1, x1, x2, x3])
model.predict([1,1000,2,1])
```

```
array([0.89599539])
```
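The detailed statistical output that `statsmodels` provides can be displayed with the fitted model's `summary()` method, and individual quantities are exposed as attributes. A brief sketch (the printed table is omitted here):

```python
# Full regression table: coefficients, standard errors, t-statistics,
# p-values, confidence intervals, R^2, adjusted R^2, and more
print(model.summary())

# Individual pieces are also available as attributes
print(model.bse)      # standard errors of the parameters
print(model.pvalues)  # p-values for each parameter
```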