you got it — here’s a “work-by-hand” version for Problems 1 & 2 with every algebra step explicit, plus a single sanity-check code cell (already run above) so you can compare after you do it yourself. I’ll also outline how each R line maps to the math so your coding mirrors the derivations.
---
# Problem 1 — step-by-step math (show every equation)
We’re given the simple regression (robust SEs shown in parentheses):
$$
\widehat{RENT}
= \hat\beta_0 + \hat\beta_1\,INCOME
= 450.2\;(97.6) + 12.85\;(3.90)\cdot INCOME
$$

* $n=55 \Rightarrow df = n-2 = 53$.
* $INCOME$ is in **hundreds** of dollars per month.
## (a) 95% CI for $\beta_1$ and one-sided test $H_0:\beta_1\le 10$ vs $H_1:\beta_1>10$
**95% CI formula (two-sided, robust $t$):**

$$
\beta_1 \in \left[\hat\beta_1 - t_{0.975,\,53}\cdot SE(\hat\beta_1),\ \ \hat\beta_1 + t_{0.975,\,53}\cdot SE(\hat\beta_1)\right]
$$

Plug in the given numbers (leave $t_{0.975,\,53}$ symbolic for now):

$$
\left[\,12.85 - t_{0.975,\,53}\cdot 3.90,\ \ 12.85 + t_{0.975,\,53}\cdot 3.90\,\right]
$$

(Use your table for $t_{0.975,\,53}\approx 2.006$ when you compute.)
**One-sided test statistic:**

$$
t
= \frac{\hat\beta_1 - 10}{SE(\hat\beta_1)}
= \frac{12.85 - 10}{3.90}
$$

Compare that $t$ to $t_{0.95,\,53}$ (or compute the one-sided p-value from the $t$ CDF).
> sanity-check numbers (after you compute by hand):
> CI $\approx [5.03,\ 20.67]$, one-sided $t \approx 0.731$ (already verified in the code cell)
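If you'd rather double-check these two numbers in R (instead of the Python cell), here is a minimal sketch that uses only the reported estimate, SE, and df:

```r
# P1(a) check from the given numbers: beta1_hat = 12.85, SE = 3.90, df = 53
b1 <- 12.85; se1 <- 3.90; df1 <- 53
tcrit <- qt(0.975, df = df1)                    # two-sided 95% critical value
ci    <- c(b1 - tcrit * se1, b1 + tcrit * se1)  # 95% CI for beta1
t_one <- (b1 - 10) / se1                        # one-sided t vs H0: beta1 <= 10
p_one <- 1 - pt(t_one, df = df1)                # one-sided p-value
ci; t_one; p_one
```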
## (b) Interpretation of $\hat\beta_1$
By definition in a simple regression, $\hat\beta_1$ is the average change in $RENT$ for a one-unit change in $INCOME$:

$$
\Delta \widehat{RENT} \approx \hat\beta_1 \cdot \Delta INCOME
$$

With $INCOME$ measured in **hundreds** of dollars, a one-unit increase means +$100/month. So:

$$
+100\text{ dollars in income}\ \Rightarrow\ +\$12.85\text{ rent (on average)}
$$

For +$1,000, $\Delta INCOME = 10 \Rightarrow 10\cdot 12.85 = \$128.50$.
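The unit conversion is the only place where it is easy to slip, so a quick scaling check in R (using the coefficient 12.85 given above):

```r
# INCOME is in hundreds of dollars, so one unit = +$100/month of income
b1 <- 12.85
b1 * 1    # rent change for a +$100/month income increase
b1 * 10   # rent change for a +$1,000/month income increase (= 128.5)
```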
## (c) Policy concern logic
Interpret the full CI range from (a). If the upper bound is large (e.g., >$15 per $100 of income), income growth could pass through strongly to rent; if the lower bound is small (e.g., ~$5 per $100), the pass-through is modest. Argue from the interval, not just the point estimate.
## (d) Predicted rent at $INCOME=60$ (i.e., $6,000/month)
**Use the fitted line:**

$$
\widehat{RENT}(60)
= \hat\beta_0 + \hat\beta_1\cdot 60
= 450.2 + 12.85\cdot 60
$$

> sanity-check number: $\$1{,}221.20$.
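In R this check is a one-liner with the given intercept and slope:

```r
# P1(d) check: predicted rent at INCOME = 60 (i.e., $6,000/month)
450.2 + 12.85 * 60   # = 1221.2
```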
## (e) Which SLR assumption is shaky & why
State the **Zero Conditional Mean** assumption:

$$
\mathbb{E}[u\mid INCOME]=0
$$

Explain why it may fail (omitted factors such as supply constraints or amenities that are correlated with both income and rent), and why a violation biases $\hat\beta_1$.
---
# Problem 2 — step-by-step math (show every equation)
Model:
$$
WAGE_i=\beta_0+\beta_1\,EDUC_i+\beta_2\,EXPER_i+\beta_3\,FEMALE_i+u_i
$$
Estimated equation (robust SEs):
$$
\widehat{WAGE}
= 8.25\;(2.50) + 0.75\;(0.15)\cdot EDUC + 0.20\;(0.10)\cdot EXPER - 1.50\;(0.50)\cdot FEMALE
$$
Other givens:
* $n=150 \Rightarrow df = n-k = 150-4 = 146$.
* $SSR_{UR}=102.4$, $SSR_R=118.0$ for the restriction in part (c).
* $\mathrm{Var}(\hat\beta_2)=0.0100$, $\mathrm{Var}(\hat\beta_3)=0.2500$, $\mathrm{Cov}(\hat\beta_2,\hat\beta_3)=0.0015$.
## (a) Interpret $\hat\beta_1=0.75$
Definition (ceteris paribus):

$$
\Delta \widehat{WAGE} \approx \hat\beta_1 \cdot \Delta EDUC
$$

So one extra year of education $\Rightarrow +\$0.75$ per hour on average, **holding EXPER and FEMALE fixed**.
## (b) Test $H_0:\beta_1=0.60$ vs $H_1:\beta_1\neq 0.60$ (5%)
**Test statistic (robust $t$):**

$$
t = \frac{\hat\beta_1 - 0.60}{SE(\hat\beta_1)}
= \frac{0.75 - 0.60}{0.15}
$$

Compare $|t|$ to $t_{0.975,\,146}$ or use a two-sided p-value.
> sanity-check number: $t\approx 1.00$ → do not reject at 5%.

**95% CI (two-sided):**

$$
\hat\beta_1 \pm t_{0.975,\,146}\cdot SE(\hat\beta_1)
= 0.75 \pm t_{0.975,\,146}\cdot 0.15
$$
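To verify both the test and the interval in R, again using only the reported estimate, robust SE, and df:

```r
# P2(b) check: beta1_hat = 0.75, SE = 0.15, df = 146, H0: beta1 = 0.60
b1 <- 0.75; se1 <- 0.15; df2 <- 146
t_stat <- (b1 - 0.60) / se1                      # robust t under H0
p_two  <- 2 * pt(-abs(t_stat), df = df2)         # two-sided p-value
ci     <- b1 + c(-1, 1) * qt(0.975, df2) * se1   # 95% CI for beta1
t_stat; p_two; ci
```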
## (c) Joint test $H_0:\beta_2=\beta_3=0$ (Wald $F$ via SSRs)
Let $q=2$ be the number of restrictions. Use the classic nested-model $F$:

$$
F
= \frac{\big(SSR_R-SSR_{UR}\big)/q}{SSR_{UR}/(n-k)}
= \frac{(118.0-102.4)/2}{102.4/146}
$$

Report $F$, the df $(q,\,n-k)=(2,\,146)$, and the p-value.
> sanity-check number: $F\approx 11.121$ → reject at 5%.
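A quick R recomputation of that $F$ from the two SSRs given in the problem:

```r
# P2(c) check: SSR_R = 118.0, SSR_UR = 102.4, q = 2 restrictions, n - k = 146
SSR_R <- 118.0; SSR_UR <- 102.4
q <- 2; df_resid <- 146
F_stat <- ((SSR_R - SSR_UR) / q) / (SSR_UR / df_resid)
p_F    <- 1 - pf(F_stat, q, df_resid)   # upper-tail p-value
F_stat; p_F
```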
## (d) Test $H_0:\beta_2=\beta_3$ (equivalently $H_0:\beta_2-\beta_3=0$)
**Estimate of the linear combination:**

$$
\widehat{L} = \hat\beta_2 - \hat\beta_3 = 0.20 - (-1.50) = 1.70
$$

**Variance of the difference (variance of a linear combination):**

$$
\mathrm{Var}(\hat\beta_2-\hat\beta_3)
= \mathrm{Var}(\hat\beta_2) + \mathrm{Var}(\hat\beta_3) - 2\,\mathrm{Cov}(\hat\beta_2,\hat\beta_3)
$$

Plug in the numbers:

$$
= 0.0100 + 0.2500 - 2\cdot 0.0015
= 0.2570
$$

$$
SE(\widehat{L})=\sqrt{0.2570}
$$

**Test statistic:**

$$
t = \frac{\widehat{L} - 0}{SE(\widehat{L})}
= \frac{1.70}{\sqrt{0.2570}}
$$

Compare $|t|$ to $t_{0.975,\,146}$ or use the two-sided p-value.
> sanity-check numbers: $SE(\widehat{L})\approx 0.50695$, $t\approx 3.353$ → reject at 5%.
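The same arithmetic in R, straight from the given variances and covariance:

```r
# P2(d) check: beta2_hat = 0.20, beta3_hat = -1.50, df = 146
b2 <- 0.20; b3 <- -1.50
v2 <- 0.0100; v3 <- 0.2500; cov23 <- 0.0015
L_hat <- b2 - b3                         # estimated linear combination
se_L  <- sqrt(v2 + v3 - 2 * cov23)       # SE of the difference
t_L   <- L_hat / se_L                    # t statistic for H0: L = 0
p_L   <- 2 * pt(-abs(t_L), df = 146)     # two-sided p-value
c(L_hat = L_hat, SE = se_L, t = t_L, p = p_L)
```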
---
# How the **code** mirrors the math (line-by-line “why”)
Below is R you can submit. I annotate each line to show the math mapping.
```r
# Load robust-inference toolkits (HC1 robust covariance matrix)
library(sandwich) # vcovHC() builds robust VCV (HC1) -> gives robust SEs
library(lmtest) # coeftest() applies that VCV -> t, p using robust SEs
library(car) # linearHypothesis() -> Wald F with robust VCV
# === PROBLEM 3 (example of the same workflow on data) ===
df <- read.csv("card_subset.csv")
# (1) Estimate: wage_d = beta0 + beta1*educ + beta2*exper + beta3*fatheduc + beta4*motheduc + u
m1 <- lm(wage_d ~ educ + exper + fatheduc + motheduc, data = df)
# coeftest(m1, vcov = vcovHC(m1, type="HC1")):
# - Computes robust VCV \hat{V}_HC1
# - Reports t = (beta_hat - 0) / SE_HC1(beta_hat); we can extract educ row
coeftest(m1, vcov = vcovHC(m1, type = "HC1"))["educ", ]
# (2) Joint H0: exper = 0, fatheduc = 0, motheduc = 0
# linearHypothesis() constructs the Wald statistic:
# W = (R b_hat - r)' [ R V_hat R' ]^{-1} (R b_hat - r)
# and returns an F version with robust V
linearHypothesis(
  m1,
  c("exper = 0", "fatheduc = 0", "motheduc = 0"),
  vcov = vcovHC(m1, type = "HC1"),
  test = "F"
)
# (3) Add nearc4 (augmented model)
m2 <- lm(wage_d ~ educ + exper + fatheduc + motheduc + nearc4, data = df)
# Test H0: nearc4 = 0 with robust SEs
coeftest(m2, vcov = vcovHC(m2, type = "HC1"))["nearc4", ]
# (4) Prediction at specified covariates
# Standard (homoskedastic) 95% prediction interval:
newdat <- data.frame(educ=12, exper=10, fatheduc=12, motheduc=12, nearc4=0)
predict(m2, newdata = newdat, interval = "prediction", level = 0.95)
```
If you want to **practice the math in R** for Problem 2’s (d) linear combo, you can do:
```r
# Example of building a linear-combination SE by hand (same idea as P2(d)).
# Here we reuse m2, whose coefficients are ordered
#   (Intercept), educ, exper, fatheduc, motheduc, nearc4,
# so this contrast illustrates L = beta_exper - beta_fatheduc.
# Check names(coef(m2)) and adjust 'a' if your coefficient order differs.
V <- vcovHC(m2, type = "HC1")                 # robust VCV
b <- coef(m2)
a <- c(0, 0, 1, -1, 0, 0)                     # one entry per coefficient
L_hat <- sum(a * b)                           # a' b_hat
SE_L  <- sqrt(as.numeric(t(a) %*% V %*% a))   # sqrt(a' V_hat a)
t_L   <- L_hat / SE_L                         # t statistic for H0: L = 0
```
(That code is the matrix version of the scalar formula you used: $\mathrm{Var}(a'\hat\beta)=a'\,\widehat{V}\,a$.)
---
# One sanity-check code cell (done)
I already executed a small Python cell that **only** recomputes the final numeric outputs for P1(a,d) and P2(b,c,d) from the given numbers. Use it to check your hand work *after* you write it out.
It produced:
* **Problem 1**:
  * CI for $\beta_1$: $[5.0278,\ 20.6722]$
  * $t$-stat (one-sided vs 10): $0.7308$
  * $\widehat{RENT}(60)=1221.2$
* **Problem 2**:
  * $t$-stat for $H_0:\beta_1=0.60$: $1.0000$
  * $F$-stat for $H_0:\beta_2=\beta_3=0$: $11.1211$
  * $\hat\beta_2-\hat\beta_3=1.7$, $SE=0.50695$, $t=3.3534$
(Those match what you should get by following the equations above.)
---
if you want, I can turn this into a printable “worksheet” where each sub-part has blanks to fill: (i) write the symbolic formula, (ii) substitute numbers, (iii) compute, (iv) short interpretation.