# Multivariate Midterm
###### tags: `multivariate`
## Q1
See `q1.R`
> (a) Calculate the Mahalanobis distances and draw a Q-Q plot.

```
[1] "Mahalanobis distance"
[1] 3.565612 6.328101 3.420948 1.840835 3.520075 6.124092 5.354612
[8] 7.096413 17.892974 2.687432 8.300846 1.373607 4.947074 4.503419
[15] 7.211220 9.082503 3.619675 4.184885 6.849703 8.624747 4.347635
[22] 12.126761 2.259121 5.213604 5.020921 1.777118 3.238747 2.463155
[29] 2.912949 8.855924 3.187987 1.851106 5.290228 3.462169 4.246507
[36] 1.523147 8.709898 1.163788 5.319714 2.836044 6.629040 8.385439
[43] 5.005949 4.148234 5.353638 20.861587 6.762496 2.897667 25.172561
[50] 6.448091
```
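Then remove the four outliers, i.e. the observations with the four largest distances: 49 (25.17), 46 (20.86), 9 (17.89) and 22 (12.13).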
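The printed values are squared distances $d_i^2 = (x_i-\bar x)^\top S^{-1}(x_i-\bar x)$. R's `mahalanobis()` returns $d_i^2$, and the 50 values sum to about $294 = (50-1)\times 6$, which is what squared distances computed with the sample covariance matrix must sum to. The Q-Q plot therefore compares the ordered $d_i^2$ with $\chi^2_6$ quantiles, one degree of freedom for each of y1 to y6. `q1.R` is not shown, so the code below is only a NumPy sketch of that computation. The chi-square quantiles use the Wilson-Hilferty approximation, which is an assumption made to avoid a SciPy dependency; `scipy.stats.chi2.ppf` gives exact values.

```python
import numpy as np
from statistics import NormalDist


def squared_mahalanobis(X):
    """d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar) for every row of X,
    where S is the unbiased (n - 1) sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    # Solve S z_i = (x_i - xbar) instead of forming S^{-1} explicitly.
    z = np.linalg.solve(S, centered.T).T
    return np.einsum("ij,ij->i", centered, z)


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation of the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * np.sqrt(c)) ** 3


def chi2_qq_points(d2, df):
    """(theoretical, observed) coordinates for a chi-square Q-Q plot."""
    n = len(d2)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([chi2_quantile(p, df) for p in probs])
    return theoretical, np.sort(d2)


def largest_k(d2, k):
    """1-based indices (as in R) of the k largest distances, largest first."""
    return (np.argsort(d2)[::-1][:k] + 1).tolist()
```

If `d2` holds the 50 printed values, `largest_k(d2, 4)` gives `[49, 46, 9, 22]`, which are the observations removed above.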

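> (b) Boxplot

I standardize each variable before plotting because the scales differ widely: the variance ranges from 0.64 for y1 to about 2.0×10^6 for y3. The means and variances are: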
```
[1] "mean"
y1 y2 y3 y4 y5 y6
15.142 45.320 5408.000 23.080 25.540 21.120
[1] "variance"
y1 y2 y3 y4 y5 y6
6.441184e-01 4.711837e+00 2.013404e+06 9.048327e+01 5.751878e+01 1.810776e+01
```

Alternatively, plot each variable in a separate boxplot on its own scale.
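Below is a NumPy sketch of the standardization and of the statistics a boxplot draws. It is not taken from `q1.R`. The 1.5 × IQR whisker rule is the default of R's `boxplot()`. The quartile method here is an assumption and can differ slightly from R's Tukey hinges.

```python
import numpy as np


def standardize(X):
    """Centre each column and divide by its sample standard deviation (n - 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def box_stats(x, coef=1.5):
    """Quartiles, whisker ends and outlying points for one boxplot.

    Uses np.percentile's default linear interpolation; R's boxplot() uses
    Tukey's hinges (fivenum), so quartiles can differ slightly for small n.
    """
    x = np.asarray(x, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return {"q1": q1, "median": med, "q3": q3,
            "whiskers": (inside.min(), inside.max()),
            "outliers": np.sort(outliers)}
```

After standardizing, every column has mean 0 and variance 1, so all six boxes fit on one axis.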

## Q2
See `q2.R`
> Draw a scatter plot and the regression line

The result of regressing x2 on x1:
```
Call:
lm(formula = x2 ~ x1, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.507 -5.119 -2.795 2.917 15.329
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.2960 3.9423 2.612 0.0156 *
x1 0.9529 0.3898 2.445 0.0226 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.18 on 23 degrees of freedom
Multiple R-squared: 0.2063, Adjusted R-squared: 0.1718
F-statistic: 5.977 on 1 and 23 DF, p-value: 0.02257
```
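The coefficients, standard errors and $R^2$ in this table follow from the closed-form simple-regression formulas. The NumPy sketch below uses a hypothetical helper `simple_ols` that is not taken from `q2.R`. The output is internally consistent: in simple regression the overall F equals the square of the slope's t value, and $2.445^2 \approx 5.98$ matches 5.977.

```python
import numpy as np


def simple_ols(x, y):
    """Least-squares fit y = b0 + b1 x with standard errors and R^2,
    mirroring the main numbers of R's summary(lm(y ~ x))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s2 = resid @ resid / (n - 2)          # residual variance, n - 2 df
    se_b1 = np.sqrt(s2 / sxx)
    se_b0 = np.sqrt(s2 * (1.0 / n + x.mean() ** 2 / sxx))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return {"coef": (b0, b1), "se": (se_b0, se_b1),
            "sigma": np.sqrt(s2), "r2": r2}
```

Each t value is `coef / se`. For example, 0.9529 / 0.3898 ≈ 2.445 for the slope above.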
> Comment on possible outliers using Cook's distance

Observation 21 is clearly an outlier.
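Cook's distance can be computed from the residuals and leverages of the fit: $D_i = \frac{e_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2}$, with $p = 2$ coefficients here. The NumPy sketch below should give the same values as R's `cooks.distance()` for an `lm` fit. The cut-offs $D_i > 4/n$ (here $4/25 = 0.16$, since the fit has 23 residual df) and $D_i > 1$ are common rules of thumb, not part of the original analysis.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2) for OLS.

    X must already contain the intercept column; p = number of columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)
    # Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = np.einsum("ij,ij->i", X, np.linalg.solve(X.T @ X, X.T).T)
    return e ** 2 * h / (p * s2 * (1.0 - h) ** 2)
```

The closed form agrees with the definition "how far all fitted values move when observation i is left out". The test checks this by refitting 25 times.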
## Q3
See `q3.R`
> Are the profiles of the two groups parallel?

Test for parallelism:
```
[1] "t2"
[1,] 0.1716331
[1] "f"
[1,] 0.0750895
[1] "f critical"
[1] 4.737414
```
Since F = 0.075 < F critical = 4.737, we fail to reject the hypothesis that the two profiles are parallel.
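The parallelism test is a Hotelling $T^2$ test on successive differences, $C\bar x_1 - C\bar x_2$, where $C$ is the $(p-1)\times p$ contrast matrix. Under $H_0$, $F = \frac{n_1+n_2-p}{(n_1+n_2-2)(p-1)}T^2 \sim F(p-1,\,n_1+n_2-p)$. The printed numbers are consistent with $p = 3$ and $n_1+n_2 = 10$: $0.0750895/0.1716331 \approx 7/16 = \frac{10-3}{(10-2)(3-1)}$, and 4.737 is the 5% critical value of $F(2, 7)$. `q3.R` is not shown, so the NumPy sketch below is an illustration, not the script itself.

```python
import numpy as np


def successive_difference_matrix(p):
    """(p-1) x p contrast matrix C with rows e_{j+1} - e_j."""
    C = np.zeros((p - 1, p))
    for j in range(p - 1):
        C[j, j], C[j, j + 1] = -1.0, 1.0
    return C


def parallel_profile_test(X1, X2):
    """Hotelling T^2 test of H0: C mu1 = C mu2 (parallel profiles).

    Returns (T2, F); under H0, F ~ F(p - 1, n1 + n2 - p).
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    C = successive_difference_matrix(p)
    d = C @ (X1.mean(axis=0) - X2.mean(axis=0))
    V = (1.0 / n1 + 1.0 / n2) * C @ S_pooled @ C.T
    T2 = float(d @ np.linalg.solve(V, d))
    F = (n1 + n2 - p) / ((n1 + n2 - 2) * (p - 1)) * T2
    return T2, F
```

Because every row of $C$ sums to zero, adding a constant to all variables of one group leaves the statistic unchanged. That matches the idea that parallelism ignores the level of the profiles.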

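> If they are at the same level, are the two sample means equal?

If the profiles are parallel and at the same level (coincident profiles), the two mean vectors are equal.
If they are only at the same level (equal sums of the components) but not parallel, the mean vectors need not be equal.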
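The sketch below shows the two follow-up tests, assuming multivariate normal groups with a common covariance matrix. The level test uses only the sums $\mathbf 1^\top\bar x_1$ and $\mathbf 1^\top\bar x_2$. Equality of the whole mean vectors is the two-sample Hotelling $T^2$ test. The function names are illustrative and do not come from `q3.R`.

```python
import numpy as np


def pooled_covariance(X1, X2):
    """Pooled sample covariance S_p with n1 + n2 - 2 degrees of freedom."""
    n1, n2 = len(X1), len(X2)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)


def same_level_test(X1, X2):
    """T^2 for H0: 1'mu1 = 1'mu2 (profiles at the same level).

    Under H0, T^2 follows F(1, n1 + n2 - 2)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    one = np.ones(p)
    diff = one @ (X1.mean(axis=0) - X2.mean(axis=0))
    var = (1.0 / n1 + 1.0 / n2) * (one @ pooled_covariance(X1, X2) @ one)
    return float(diff ** 2 / var)


def equal_means_test(X1, X2):
    """Two-sample Hotelling T^2 for H0: mu1 = mu2 (coincident profiles).

    Returns (T2, F); under H0, F ~ F(p, n1 + n2 - p - 1)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    V = (1.0 / n1 + 1.0 / n2) * pooled_covariance(X1, X2)
    T2 = float(d @ np.linalg.solve(V, d))
    F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
    return T2, F
```

The level statistic is just the squared pooled two-sample t statistic computed on each subject's row sum.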
## Q4
See `q4.R`
> Show $Y_1$, $Y_2$, $Y_3$, $Y_4$ by regression analysis
```
Call:
lm(formula = Y1 ~ X1 + X2 + X3 + X4, data = data)
Residuals:
Min 1Q Median 3Q Max
-5.2239 -0.7260 0.0905 1.0783 2.2895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -74.23167 13.35744 -5.557 7.54e-07 ***
X1 -3.12032 1.91471 -1.630 0.10869
X2 0.09758 0.03203 3.046 0.00350 **
X3 0.04940 0.01851 2.669 0.00989 **
X4 85.07615 12.54761 6.780 7.39e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.498 on 57 degrees of freedom
Multiple R-squared: 0.7475, Adjusted R-squared: 0.7297
F-statistic: 42.18 on 4 and 57 DF, p-value: < 2.2e-16
Call:
lm(formula = Y2 ~ X1 + X2 + X3 + X4, data = data)
Residuals:
Min 1Q Median 3Q Max
-0.63540 -0.19946 -0.01305 0.19567 0.82886
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -24.014741 3.061117 -7.845 1.24e-10 ***
X1 -1.184892 0.438793 -2.700 0.0091 **
X2 0.009134 0.007341 1.244 0.2185
X3 0.008353 0.004242 1.969 0.0538 .
X4 28.754768 2.875531 10.000 3.76e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3433 on 57 degrees of freedom
Multiple R-squared: 0.7855, Adjusted R-squared: 0.7704
F-statistic: 52.18 on 4 and 57 DF, p-value: < 2.2e-16
Call:
lm(formula = Y3 ~ X1 + X2 + X3 + X4, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.04161 -0.40548 0.01791 0.36841 1.16904
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -45.763252 5.770990 -7.930 8.99e-11 ***
X1 -1.485668 0.827238 -1.796 0.07781 .
X2 0.047027 0.013839 3.398 0.00124 **
X3 0.025301 0.007997 3.164 0.00250 **
X4 45.798211 5.421112 8.448 1.24e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6472 on 57 degrees of freedom
Multiple R-squared: 0.8171, Adjusted R-squared: 0.8043
F-statistic: 63.66 on 4 and 57 DF, p-value: < 2.2e-16
Call:
lm(formula = Y4 ~ X1 + X2 + X3 + X4, data = data)
Residuals:
Min 1Q Median 3Q Max
-1.1554 -0.1668 0.0346 0.2458 0.4996
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.727292 3.110419 -5.699 4.45e-07 ***
X1 -0.549977 0.445861 -1.234 0.222445
X2 0.029166 0.007459 3.910 0.000248 ***
X3 0.010951 0.004310 2.541 0.013804 *
X4 16.219950 2.921844 5.551 7.71e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3488 on 57 degrees of freedom
Multiple R-squared: 0.7633, Adjusted R-squared: 0.7466
F-statistic: 45.94 on 4 and 57 DF, p-value: < 2.2e-16
```
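The four regressions share the same predictors, so they can also be fitted in one multivariate least-squares step, $\hat B = (Z^\top Z)^{-1} Z^\top Y$. Column $k$ of $\hat B$ equals the coefficients of the separate `lm` fit for $Y_k$. What the joint fit adds is the residual covariance matrix. Its diagonal gives the squared residual standard errors above (1.498, 0.3433, 0.6472, 0.3488, each on 62 − 5 = 57 df). The NumPy sketch below is an illustration; `q4.R` may fit the models differently.

```python
import numpy as np


def multivariate_ols(Z, Y):
    """Fit all responses at once: B_hat = (Z'Z)^{-1} Z'Y.

    Z: n x q design matrix including the intercept column.
    Y: n x m matrix of responses (one column per Y_k).
    Returns B_hat (q x m) and the residual covariance (m x m, divisor n - q).
    """
    Z = np.asarray(Z, float)
    Y = np.asarray(Y, float)
    B = np.linalg.solve(Z.T @ Z, Z.T @ Y)
    E = Y - Z @ B
    n, q = Z.shape
    Sigma = E.T @ E / (n - q)
    return B, Sigma
```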
> Construct a 95% prediction interval for $Y_3$ at X1=0.33, X2=45.5, X3=220,375, X4=1.01
```
fit lwr upr
1 7.708677 4.661659 10.75569
```
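A prediction interval at a new design row $z_0$ (with a leading 1 for the intercept) is $z_0^\top\hat\beta \pm t_{0.975,\,n-q}\, s\sqrt{1 + z_0^\top (Z^\top Z)^{-1} z_0}$. The interval printed above is symmetric about the fit, with half-width 7.708677 − 4.661659 ≈ 3.047. In the NumPy sketch below, the t quantile is passed in as a parameter rather than computed. `scipy.stats.t.ppf(0.975, n - q)` would supply it, and R's `predict(..., interval = "prediction")` does the whole computation.

```python
import numpy as np


def prediction_interval(Z, y, z0, t_crit):
    """(fit, lwr, upr) = z0'b -/+ t_crit * s * sqrt(1 + z0'(Z'Z)^{-1} z0).

    Z: n x q design with intercept column; z0: new design row (length q);
    t_crit: upper alpha/2 quantile of the t distribution with n - q df.
    """
    Z = np.asarray(Z, float)
    y = np.asarray(y, float)
    z0 = np.asarray(z0, float)
    n, q = Z.shape
    ZtZ = Z.T @ Z
    b = np.linalg.solve(ZtZ, Z.T @ y)
    e = y - Z @ b
    s = np.sqrt(e @ e / (n - q))            # residual standard error
    fit = float(z0 @ b)
    half = t_crit * s * np.sqrt(1.0 + z0 @ np.linalg.solve(ZtZ, z0))
    return fit, fit - half, fit + half
```

The "1 +" under the square root accounts for the noise in a new observation. Without it, the formula gives the narrower confidence interval for the mean response.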
## Q5
See `q5.R`
> Forward stepwise regression and backward stepwise regression

Forward selection:
```
Call:
lm(formula = Y ~ X.21 + X.23 + X.7 + X.57 + X.16 + X.52 + X.25 +
X.5 + X.53 + X.8 + X.24 + X.6 + X.20 + X.22 + X.42 + X.18 +
X.27 + X.46 + X.49 + X.45 + X.19 + X.33 + X.9 + X.26 + X.12 +
X.17 + X.37 + X.44 + X.3 + X.4 + X.48 + X.47 + X.43 + X.1 +
X.2 + X.35 + X.55 + X.40 + X.30 + X.11 + X.54 + X.38 + X.10 +
X.50 + X.56 + X.34 + X.39, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.26423 -0.21604 -0.05733 0.21312 0.94535
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.014e-01 1.130e-02 17.828 < 2e-16 ***
X.21 5.228e-02 4.647e-03 11.252 < 2e-16 ***
X.23 1.784e-01 1.512e-02 11.801 < 2e-16 ***
X.7 2.125e-01 1.301e-02 16.327 < 2e-16 ***
X.57 8.054e-05 9.542e-06 8.440 < 2e-16 ***
X.16 7.471e-02 6.015e-03 12.420 < 2e-16 ***
X.52 6.783e-02 6.128e-03 11.068 < 2e-16 ***
X.25 -2.291e-02 3.644e-03 -6.287 3.54e-10 ***
X.5 8.446e-02 7.562e-03 11.169 < 2e-16 ***
X.53 2.359e-01 2.138e-02 11.032 < 2e-16 ***
X.8 9.437e-02 1.256e-02 7.515 6.81e-14 ***
X.24 9.112e-02 1.141e-02 7.982 1.80e-15 ***
X.6 1.205e-01 1.835e-02 6.568 5.67e-11 ***
X.20 6.114e-02 9.822e-03 6.224 5.27e-10 ***
X.22 4.462e-02 5.335e-03 8.364 < 2e-16 ***
X.42 -3.984e-02 6.556e-03 -6.077 1.32e-09 ***
X.18 5.797e-02 9.449e-03 6.135 9.25e-10 ***
X.27 -1.223e-02 1.504e-03 -8.128 5.58e-16 ***
X.46 -3.899e-02 5.440e-03 -7.166 8.94e-13 ***
X.49 -1.415e-01 2.199e-02 -6.434 1.37e-10 ***
X.45 -3.531e-02 4.949e-03 -7.135 1.12e-12 ***
X.19 1.413e-02 3.082e-03 4.584 4.69e-06 ***
X.33 -4.334e-02 8.791e-03 -4.930 8.51e-07 ***
X.9 7.363e-02 1.866e-02 3.946 8.07e-05 ***
X.26 -2.189e-02 6.678e-03 -3.278 0.001055 **
X.12 -2.775e-02 5.852e-03 -4.741 2.19e-06 ***
X.17 5.216e-02 1.182e-02 4.412 1.05e-05 ***
X.37 -3.445e-02 1.266e-02 -2.720 0.006546 **
X.44 -3.323e-02 7.801e-03 -4.259 2.09e-05 ***
X.3 4.012e-02 1.003e-02 4.001 6.40e-05 ***
X.4 1.190e-02 3.459e-03 3.439 0.000588 ***
X.48 -5.748e-02 1.695e-02 -3.391 0.000702 ***
X.47 -1.939e-01 6.332e-02 -3.062 0.002209 **
X.43 -6.581e-02 2.330e-02 -2.824 0.004765 **
X.1 -5.005e-02 1.674e-02 -2.990 0.002803 **
X.2 -1.212e-02 3.784e-03 -3.204 0.001364 **
X.35 -3.089e-02 1.152e-02 -2.682 0.007345 **
X.55 2.109e-04 1.813e-04 1.163 0.244814
X.40 4.044e-02 2.625e-02 1.541 0.123471
X.30 -5.130e-02 1.552e-02 -3.305 0.000958 ***
X.11 5.707e-02 2.621e-02 2.178 0.029473 *
X.54 2.734e-02 1.160e-02 2.356 0.018491 *
X.38 -5.228e-02 2.235e-02 -2.339 0.019364 *
X.10 1.617e-02 7.707e-03 2.099 0.035902 *
X.50 -6.085e-02 2.175e-02 -2.798 0.005160 **
X.56 7.158e-05 3.643e-05 1.964 0.049536 *
X.34 5.602e-02 3.050e-02 1.837 0.066263 .
X.39 -1.862e-02 1.168e-02 -1.594 0.110948
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3261 on 4553 degrees of freedom
Multiple R-squared: 0.5593, Adjusted R-squared: 0.5547
F-statistic: 122.9 on 47 and 4553 DF, p-value: < 2.2e-16
```

Backward elimination:
```
> summary(backward_aic)
Call:
lm(formula = Y ~ X.1 + X.2 + X.3 + X.4 + X.5 + X.6 + X.7 + X.8 +
X.9 + X.10 + X.11 + X.12 + X.13 + X.14 + X.15 + X.16 + X.17 +
X.18 + X.19 + X.20 + X.21 + X.22 + X.23 + X.24 + X.25 + X.26 +
X.27 + X.28 + X.29 + X.30 + X.31 + X.32 + X.33 + X.34 + X.35 +
X.36 + X.37 + X.38 + X.39 + X.40 + X.41 + X.42 + X.43 + X.44 +
X.45 + X.46 + X.47 + X.48 + X.49 + X.50 + X.51 + X.52 + X.53 +
X.54 + X.55 + X.56 + X.57, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.27083 -0.21836 -0.05765 0.21419 0.92732
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.003e-01 1.152e-02 17.390 < 2e-16 ***
X.1 -4.982e-02 1.678e-02 -2.969 0.003007 **
X.2 -1.205e-02 3.790e-03 -3.178 0.001492 **
X.3 3.928e-02 1.006e-02 3.906 9.51e-05 ***
X.4 1.192e-02 3.461e-03 3.444 0.000579 ***
X.5 8.421e-02 7.570e-03 11.124 < 2e-16 ***
X.6 1.188e-01 1.842e-02 6.451 1.22e-10 ***
X.7 2.129e-01 1.303e-02 16.342 < 2e-16 ***
X.8 9.399e-02 1.257e-02 7.477 9.05e-14 ***
X.9 7.247e-02 1.896e-02 3.822 0.000134 ***
X.10 1.507e-02 7.780e-03 1.937 0.052842 .
X.11 5.686e-02 2.623e-02 2.167 0.030253 *
X.12 -2.786e-02 5.901e-03 -4.721 2.41e-06 ***
X.13 1.190e-02 1.668e-02 0.714 0.475423
X.14 4.860e-03 1.474e-02 0.330 0.741597
X.15 1.852e-02 2.196e-02 0.843 0.399050
X.16 7.506e-02 6.024e-03 12.460 < 2e-16 ***
X.17 5.172e-02 1.186e-02 4.359 1.34e-05 ***
X.18 5.540e-02 9.763e-03 5.674 1.48e-08 ***
X.19 1.413e-02 3.103e-03 4.554 5.39e-06 ***
X.20 6.172e-02 9.840e-03 6.272 3.89e-10 ***
X.21 5.269e-02 4.663e-03 11.299 < 2e-16 ***
X.22 4.477e-02 5.346e-03 8.374 < 2e-16 ***
X.23 1.748e-01 1.591e-02 10.987 < 2e-16 ***
X.24 9.089e-02 1.143e-02 7.953 2.28e-15 ***
X.25 -2.317e-02 3.655e-03 -6.340 2.52e-10 ***
X.26 -2.163e-02 6.736e-03 -3.211 0.001332 **
X.27 -1.220e-02 1.508e-03 -8.091 7.52e-16 ***
X.28 3.987e-03 1.267e-02 0.315 0.753016
X.29 -7.450e-03 1.118e-02 -0.666 0.505356
X.30 -5.195e-02 1.610e-02 -3.227 0.001260 **
X.31 -2.329e-02 1.939e-02 -1.201 0.229792
X.32 6.332e-03 1.687e-01 0.038 0.970057
X.33 -4.198e-02 8.845e-03 -4.746 2.13e-06 ***
X.34 5.114e-02 1.660e-01 0.308 0.758033
X.35 -3.117e-02 1.221e-02 -2.552 0.010729 *
X.36 2.648e-02 1.963e-02 1.349 0.177312
X.37 -3.321e-02 1.271e-02 -2.612 0.009027 **
X.38 -5.344e-02 2.247e-02 -2.378 0.017449 *
X.39 -1.975e-02 1.172e-02 -1.686 0.091853 .
X.40 4.076e-02 2.724e-02 1.497 0.134588
X.41 -8.364e-03 1.438e-02 -0.582 0.560912
X.42 -3.693e-02 7.603e-03 -4.857 1.23e-06 ***
X.43 -6.324e-02 2.356e-02 -2.684 0.007304 **
X.44 -3.238e-02 7.863e-03 -4.118 3.89e-05 ***
X.45 -3.525e-02 4.958e-03 -7.110 1.34e-12 ***
X.46 -3.781e-02 5.777e-03 -6.546 6.57e-11 ***
X.47 -1.952e-01 6.345e-02 -3.076 0.002108 **
X.48 -5.822e-02 1.697e-02 -3.432 0.000605 ***
X.49 -1.401e-01 2.205e-02 -6.354 2.30e-10 ***
X.50 -5.996e-02 2.231e-02 -2.687 0.007232 **
X.51 -5.905e-02 4.488e-02 -1.316 0.188291
X.52 6.805e-02 6.136e-03 11.090 < 2e-16 ***
X.53 2.332e-01 2.171e-02 10.741 < 2e-16 ***
X.54 2.769e-02 1.162e-02 2.383 0.017200 *
X.55 2.327e-04 1.831e-04 1.270 0.203973
X.56 6.675e-05 3.713e-05 1.798 0.072240 .
X.57 7.986e-05 9.656e-06 8.270 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3262 on 4543 degrees of freedom
Multiple R-squared: 0.5599, Adjusted R-squared: 0.5544
F-statistic: 101.4 on 57 and 4543 DF, p-value: < 2.2e-16
```
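In the output above, forward selection stops at 47 predictors (F on 47 and 4553 DF), while the backward model keeps all 57 (F on 57 and 4543 DF). The two searches therefore end at different models. Given the name `backward_aic`, the models were presumably produced by R's `step()`, which scores an `lm` with `extractAIC()`, i.e. $n\log(\mathrm{RSS}/n) + 2k$. The NumPy sketch below shows greedy forward selection under that criterion. It is an illustration, not `q5.R`.

```python
import numpy as np


def ols_aic(X, y, cols):
    """AIC in the form R's extractAIC() uses for lm: n*log(RSS/n) + 2*k,
    where k counts the intercept plus the selected columns."""
    n = len(y)
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = float(np.sum((y - Z @ beta) ** 2))
    return n * np.log(rss / n) + 2 * Z.shape[1]


def forward_aic(X, y):
    """Greedy forward selection: add the column that lowers AIC the most,
    stop when no addition lowers it. Returns columns in the order added."""
    selected, remaining = [], list(range(X.shape[1]))
    current = ols_aic(X, y, selected)
    while remaining:
        best_aic, best_j = min((ols_aic(X, y, selected + [j]), j)
                               for j in remaining)
        if best_aic >= current:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = best_aic
    return selected
```

The backward model still contains terms with tiny t values (e.g. X.32, t = 0.038). Dropping a single term with |t| below about √2 lowers this AIC, so it may be worth checking how `backward_aic` was produced.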
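> Train on the training data with different values of alpha.

Here `alpha` is the elastic-net mixing parameter:
* 0 => ridge
* 1 => lasso
* between 0 and 1 => elastic net (a mix of both penalties)

The `alpha` printed below runs over 0, 1, ..., 10. Since the mixing parameter must lie in [0, 1], these are presumably steps of 0.1 (0 = ridge, 10 = lasso). For each one, `min lambda` and `min loss` are presumably the λ with the smallest cross-validated loss and that loss.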
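This assumes the models were fitted with R's `glmnet`/`cv.glmnet`. The `alpha`/`lambda` naming suggests so, but `q5.R` is not shown. Under that assumption, the penalized least-squares objective for a Gaussian response is shown below. Setting $\alpha = 0$ leaves only the ridge term, and $\alpha = 1$ leaves only the lasso term.

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\beta_0-x_i^\top\beta\right)^2
+\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2+\alpha\,\lVert\beta\rVert_1\right]
$$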
```
[1] "alpha"
[1] 0
[1] "min lambda"
[1] 0.2629002
[1] "min loss"
[1] 0.4418458
[1] "alpha"
[1] 1
[1] "min lambda"
[1] 0.08217515
[1] "min loss"
[1] 0.438109
[1] "alpha"
[1] 2
[1] "min lambda"
[1] 0.06542304
[1] "min loss"
[1] 0.4413074
[1] "alpha"
[1] 3
[1] "min lambda"
[1] 0.0478678
[1] "min loss"
[1] 0.4406127
[1] "alpha"
[1] 4
[1] "min lambda"
[1] 0.03271152
[1] "min loss"
[1] 0.4437336
[1] "alpha"
[1] 5
[1] "min lambda"
[1] 0.01803742
[1] "min loss"
[1] 0.4415254
[1] "alpha"
[1] 6
[1] "min lambda"
[1] 0.0239339
[1] "min loss"
[1] 0.4350735
[1] "alpha"
[1] 7
[1] "min lambda"
[1] 0.02471012
[1] "min loss"
[1] 0.4365429
[1] "alpha"
[1] 8
[1] "min lambda"
[1] 0.01970057
[1] "min loss"
[1] 0.4361745
[1] "alpha"
[1] 9
[1] "min lambda"
[1] 0.01921898
[1] "min loss"
[1] 0.4383328
[1] "alpha"
[1] 10
[1] "min lambda"
[1] 0.01436034
[1] "min loss"
[1] 0.4354951
```
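The search above picks, for each alpha, the λ with the smallest cross-validated loss. Below is a from-scratch NumPy sketch of that search using cyclic coordinate descent on the objective $\frac{1}{2n}\lVert y-\beta_0-X\beta\rVert^2+\lambda\big[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\big]$. It is not `glmnet`, which adds warm starts along a λ path, convergence checks and its own standardization details. The fold assignment, iteration count and alpha grid (`[i / 10 for i in range(11)]`) are assumptions for illustration.

```python
import numpy as np


def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * max(abs(z) - g, 0.0)


def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Cyclic coordinate descent for
        (1/2n)||y - b0 - X b||^2 + lam * ((1 - alpha)/2 ||b||^2 + alpha ||b||_1).
    Assumes every column of X has mean 0 and mean square 1."""
    n, p = X.shape
    b0 = float(np.mean(y))          # optimal intercept for centred columns
    b = np.zeros(p)
    r = y - b0                      # current residual
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + b[j]      # x_j' (partial residual) / n
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b0, b


def cv_loss(X, y, lam, alpha, k=5, seed=0):
    """k-fold cross-validated mean squared error for one (lam, alpha) pair."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    errors = []
    for f in range(k):
        train, test = folds != f, folds == f
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        b0, b = elastic_net_cd((X[train] - mu) / sd, y[train], lam, alpha)
        pred = b0 + ((X[test] - mu) / sd) @ b
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


def best_lambda_per_alpha(X, y, alphas, lambdas):
    """For each alpha, the lambda with the smallest CV loss and that loss."""
    result = {}
    for a in alphas:
        losses = [cv_loss(X, y, lam, a) for lam in lambdas]
        i = int(np.argmin(losses))
        result[a] = (lambdas[i], losses[i])
    return result
```

The same seed gives the same folds for every (alpha, λ) pair, so the losses are compared on identical splits.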
> Apply the minimum-error model to the test data (the last 1/3 of the data) to evaluate the mean squared error.

The best model on the test data:
```
[1] "alpha"
[1] 0.7
[1] "MSE"
[1] 0.8083746
```
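The test set here is fixed as the last third of the rows, taken without shuffling. Below is a minimal NumPy sketch of that split and of choosing the alpha with the smallest test MSE. Rounding the cut to `n - n // 3` training rows is an assumption. `predictors` is a hypothetical mapping from each alpha to a fitted model's prediction function.

```python
import numpy as np


def split_last_third(X, y):
    """First 2/3 of the rows for training, last 1/3 for testing (no shuffling)."""
    n = len(y)
    cut = n - n // 3
    return X[:cut], y[:cut], X[cut:], y[cut:]


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def pick_best(predictors, X_test, y_test):
    """predictors: mapping alpha -> function X -> predictions.
    Returns (best alpha, its test MSE, all test MSEs)."""
    scores = {a: mse(y_test, f(X_test)) for a, f in predictors.items()}
    best = min(scores, key=scores.get)
    return best, scores[best], scores
```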