You got it. Here's a "work-by-hand" version for Problems 1 & 2 with every algebra step explicit, plus a single sanity-check code cell (already run above) so you can compare after you do it yourself. I'll also outline how each R line maps to the math so your coding mirrors the derivations.

---

# Problem 1: step-by-step math (show every equation)

We're given the simple regression (robust SEs shown in parentheses):

$$
\widehat{RENT} = \hat\beta_0 + \hat\beta_1\,INCOME = 450.2\;(97.6) + 12.85\;(3.90)\cdot INCOME
$$

* $n = 55 \Rightarrow df = n - 2 = 53$.
* $INCOME$ is measured in **hundreds** of dollars per month.

## (a) 95% CI for $\beta_1$ and one-sided test of $H_0:\beta_1 \le 10$ vs $H_1:\beta_1 > 10$

**95% CI formula (two-sided, robust $t$):**

$$
\beta_1 \in \left[\hat\beta_1 - t_{0.975,\,53}\cdot SE(\hat\beta_1),\ \hat\beta_1 + t_{0.975,\,53}\cdot SE(\hat\beta_1)\right]
$$

Plug in the symbols first:

$$
\left[\,12.85 - t_{0.975,\,53}\cdot 3.90,\ \ 12.85 + t_{0.975,\,53}\cdot 3.90\,\right]
$$

(Use your table for $t_{0.975,\,53} \approx 2.006$ when you compute.)

**One-sided test statistic:**

$$
t = \frac{\hat\beta_1 - 10}{SE(\hat\beta_1)} = \frac{12.85 - 10}{3.90}
$$

Compare that $t$ to $t_{0.95,\,53}$ (or compute the one-sided p-value from the $t$ CDF).

> Sanity-check numbers (after you compute by hand):
> CI $\approx [5.03,\ 20.67]$, one-sided $t \approx 0.731$ (already verified in the code cell).

## (b) Interpretation of $\hat\beta_1$

By definition in a simple regression, $\hat\beta_1$ is the average change in $RENT$ for a one-unit change in $INCOME$:

$$
\Delta \widehat{RENT} \approx \hat\beta_1 \cdot \Delta INCOME
$$

With $INCOME$ measured in **hundreds** of dollars, a one-unit increase means +\$100/month of income. So:

$$
+\$100 \text{ in income} \ \Rightarrow\ +\$12.85 \text{ in rent (on average)}
$$

For +\$1,000, $\Delta INCOME = 10 \Rightarrow 10 \cdot 12.85 = \$128.50$.

## (c) Policy concern logic

Interpret the full CI from (a). If the upper end is large (e.g., more than \$15 of rent per \$100 of income), income growth could pass through strongly to rent; if the lower end is small (e.g., around \$5 per \$100), the pass-through is modest. Argue from the interval, not just the point estimate.

## (d) Predicted rent at $INCOME = 60$ (i.e., \$6,000/month)

**Use the fitted line:**

$$
\widehat{RENT}(60) = \hat\beta_0 + \hat\beta_1 \cdot 60 = 450.2 + 12.85 \cdot 60
$$

> Sanity-check number: \$1,221.20.

## (e) Which SLR assumption is shaky, and why

State the **zero conditional mean** assumption:

$$
\mathbb{E}[u \mid INCOME] = 0
$$

Explain why it may fail here: omitted factors such as supply constraints or amenities are plausibly correlated with both income and rent, and that would bias $\hat\beta_1$.

---

# Problem 2: step-by-step math (show every equation)

Model:

$$
WAGE_i = \beta_0 + \beta_1\,EDUC_i + \beta_2\,EXPER_i + \beta_3\,FEMALE_i + u_i
$$

Estimated equation (robust SEs in parentheses):

$$
\widehat{WAGE} = 8.25\;(2.50) + 0.75\;(0.15)\cdot EDUC + 0.20\;(0.10)\cdot EXPER - 1.50\;(0.50)\cdot FEMALE
$$

Other givens:

* $n = 150 \Rightarrow df = n - k = 150 - 4 = 146$.
* $SSR_{UR} = 102.4$, $SSR_R = 118.0$ for the restriction in part (c).
* $\mathrm{Var}(\hat\beta_2) = 0.0100$, $\mathrm{Var}(\hat\beta_3) = 0.2500$, $\mathrm{Cov}(\hat\beta_2,\hat\beta_3) = 0.0015$.

## (a) Interpret $\hat\beta_1 = 0.75$

Definition (ceteris paribus):

$$
\Delta \widehat{WAGE} \approx \hat\beta_1 \cdot \Delta EDUC
$$

So one extra year of education is associated with +\$0.75 per hour in predicted wage, on average, **holding EXPER and FEMALE fixed**.
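Before you work through the tests in (b)-(d), here is a minimal base-R sketch you can reuse to check any single-coefficient $t$ test and CI after doing the algebra by hand. The helper name `coef_test` is just an illustration (it is not from any package); it only does the arithmetic in the formulas above.

```r
# Minimal sketch (hypothetical helper, base R only): t statistic, p-values, and CI
# from a reported estimate, its (robust) SE, a hypothesized value, and the df.
coef_test <- function(estimate, se, h0 = 0, df, level = 0.95) {
  t_stat <- (estimate - h0) / se                    # t = (beta_hat - h0) / SE
  t_crit <- qt(1 - (1 - level) / 2, df)             # two-sided critical value
  list(
    t_stat  = t_stat,
    p_two   = 2 * pt(-abs(t_stat), df),             # two-sided p-value
    p_right = pt(t_stat, df, lower.tail = FALSE),   # one-sided p-value for H1: beta > h0
    ci      = estimate + c(-1, 1) * t_crit * se     # CI around the estimate
  )
}

# Problem 1(a): CI for beta1 and one-sided test of H0: beta1 <= 10
coef_test(12.85, 3.90, h0 = 10, df = 53)    # CI ~ [5.03, 20.67], t ~ 0.731

# Problem 2(b): two-sided test of H0: beta1 = 0.60
coef_test(0.75, 0.15, h0 = 0.60, df = 146)  # t = 1.00
```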
## (b) Test $H_0:\beta_1 = 0.60$ vs $H_1:\beta_1 \neq 0.60$ (5%)

**Test statistic (robust $t$):**

$$
t = \frac{\hat\beta_1 - 0.60}{SE(\hat\beta_1)} = \frac{0.75 - 0.60}{0.15}
$$

Compare $|t|$ to $t_{0.975,\,146}$ or use a two-sided p-value.

> Sanity-check number: $t \approx 1.00$, so do not reject at 5%.

**95% CI (two-sided):**

$$
\hat\beta_1 \pm t_{0.975,\,146}\cdot SE(\hat\beta_1) = 0.75 \pm t_{0.975,\,146}\cdot 0.15
$$

## (c) Joint test of $H_0:\beta_2 = \beta_3 = 0$ ($F$ test via SSRs)

Let $q = 2$ be the number of restrictions. Use the classic nested-model $F$:

$$
F = \frac{\left(SSR_R - SSR_{UR}\right)/q}{SSR_{UR}/(n-k)} = \frac{(118.0 - 102.4)/2}{102.4/146}
$$

Report $F$, the df $(q,\ n-k) = (2,\ 146)$, and the p-value.

> Sanity-check number: $F \approx 11.121$, so reject at 5%.

## (d) Test $H_0:\beta_2 = \beta_3$ (equivalently $H_0:\beta_2 - \beta_3 = 0$)

**Estimate of the linear combination:**

$$
\widehat{L} = \hat\beta_2 - \hat\beta_3 = 0.20 - (-1.50) = 1.70
$$

**Variance of a difference (linear-combination variance rule):**

$$
\mathrm{Var}(\hat\beta_2 - \hat\beta_3) = \mathrm{Var}(\hat\beta_2) + \mathrm{Var}(\hat\beta_3) - 2\,\mathrm{Cov}(\hat\beta_2,\hat\beta_3)
$$

Plug in the numbers:

$$
= 0.0100 + 0.2500 - 2 \cdot 0.0015 = 0.2570,
\qquad
SE(\widehat{L}) = \sqrt{0.2570}
$$

**Test statistic:**

$$
t = \frac{\widehat{L} - 0}{SE(\widehat{L})} = \frac{1.70}{\sqrt{0.2570}}
$$

Compare $|t|$ to $t_{0.975,\,146}$ or use the two-sided p-value.

> Sanity-check numbers: $SE(\widehat{L}) \approx 0.50695$, $t \approx 3.353$, so reject at 5%.

---

# How the **code** mirrors the math (line-by-line "why")

Below is R you can submit. Each line is annotated to show how it maps to the math.

```r
# Load robust-inference toolkits (HC1 robust covariance matrix)
library(sandwich)  # vcovHC() builds the robust VCV (HC1) -> gives robust SEs
library(lmtest)    # coeftest() applies that VCV -> t and p using robust SEs
library(car)       # linearHypothesis() -> Wald F with a robust VCV

# === PROBLEM 3 (example of the same workflow on data) ===
df <- read.csv("card_subset.csv")

# (1) Estimate: wage_d = beta0 + beta1*educ + beta2*exper + beta3*fatheduc + beta4*motheduc + u
m1 <- lm(wage_d ~ educ + exper + fatheduc + motheduc, data = df)

# coeftest(m1, vcov = vcovHC(m1, type = "HC1")):
#  - computes the robust VCV V_hat (HC1)
#  - reports t = (beta_hat - 0) / SE_HC1(beta_hat); we extract the educ row
coeftest(m1, vcov = vcovHC(m1, type = "HC1"))["educ", ]

# (2) Joint H0: exper = 0, fatheduc = 0, motheduc = 0
# linearHypothesis() constructs the Wald statistic
#   W = (R b_hat - r)' [ R V_hat R' ]^{-1} (R b_hat - r)
# and returns an F version with the robust V_hat
linearHypothesis(
  m1,
  c("exper = 0", "fatheduc = 0", "motheduc = 0"),
  vcov = vcovHC(m1, type = "HC1"),
  test = "F"
)

# (3) Add nearc4 (augmented model)
m2 <- lm(wage_d ~ educ + exper + fatheduc + motheduc + nearc4, data = df)

# Test H0: nearc4 = 0 with robust SEs
coeftest(m2, vcov = vcovHC(m2, type = "HC1"))["nearc4", ]

# (4) Prediction at specified covariates
# Standard (homoskedastic) 95% prediction interval:
newdat <- data.frame(educ = 12, exper = 10, fatheduc = 12, motheduc = 12, nearc4 = 0)
predict(m2, newdata = newdat, interval = "prediction", level = 0.95)
```

If you want to **practice the math in R** for Problem 2's (d) linear combination, you can do:

```r
# Building a linear-combination SE by hand (matrix version).
# The contrast vector a must match the order and length of coef(m2):
# (Intercept), educ, exper, fatheduc, motheduc, nearc4.
V <- vcovHC(m2, type = "HC1")                # robust VCV
b <- coef(m2)
a <- c(0, 0, 1, -1, 0, 0)                    # beta2 - beta3 pattern (here exper - fatheduc); adjust if your order differs
L_hat <- sum(a * b)                          # L_hat = a' b_hat
SE_L  <- sqrt(as.numeric(t(a) %*% V %*% a))  # SE(L_hat) = sqrt(a' V_hat a)
t_L   <- L_hat / SE_L                        # t statistic for H0: L = 0
```

(That code is the matrix version of the scalar formula you used: $\mathrm{Var}(a'\hat\beta) = a'\,\widehat{V}\,a$.)
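If you also want a purely numeric check of the (c) and (d) hand arithmetic, using only the numbers given in the problem (no data or fitted model needed), here is a minimal base-R sketch; the variable names are just for illustration.

```r
# Scalar check of Problem 2(c) and 2(d) from the given numbers only.

# (c) Nested-model F:  F = ((SSR_R - SSR_UR)/q) / (SSR_UR/(n - k))
SSR_UR <- 102.4; SSR_R <- 118.0; q <- 2; df_res <- 146
F_stat <- ((SSR_R - SSR_UR) / q) / (SSR_UR / df_res)   # ~ 11.12
p_F    <- pf(F_stat, q, df_res, lower.tail = FALSE)    # p-value of the F test

# (d) Linear combination L = beta2 - beta3 and its SE
L_hand <- 0.20 - (-1.50)                               # 1.70
V_hand <- 0.0100 + 0.2500 - 2 * 0.0015                 # Var(b2) + Var(b3) - 2*Cov = 0.2570
t_hand <- L_hand / sqrt(V_hand)                        # ~ 3.353
p_hand <- 2 * pt(-abs(t_hand), df_res)                 # two-sided p-value
```

The outputs should line up with the sanity-check numbers reported in the next section.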
---

# One sanity-check code cell (done)

I already executed a small Python cell that **only** recomputes the final numeric outputs for P1(a, d) and P2(b, c, d) from the given numbers. Use it to check your hand work *after* you write it out. It produced:

* **Problem 1**:
  * CI for $\beta_1$: $[5.0278,\ 20.6722]$
  * $t$-stat (one-sided vs 10): $0.7308$
  * $\widehat{RENT}(60) = 1221.2$
* **Problem 2**:
  * $t$-stat for $H_0:\beta_1 = 0.60$: $1.0000$
  * $F$-stat for $H_0:\beta_2 = \beta_3 = 0$: $11.1211$
  * $\hat\beta_2 - \hat\beta_3 = 1.7$, $SE = 0.50695$, $t = 3.3534$

(Those match what you should get by following the equations above.)

---

If you want, I can turn this into a printable "worksheet" where each sub-part has blanks to fill: (i) write the symbolic formula, (ii) substitute the numbers, (iii) compute, (iv) give a short interpretation.