$$
% My definitions
\def\ve{{\varepsilon}}
\def\dd{{\text{ d}}}
\newcommand{\dif}[2]{\frac{d #1}{d #2}} % for derivatives
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} % for partial derivatives
\def\R{\text{R}}
\def\E{\mathbb{E}}
$$

# Q1

Suppose we want to study recidivism, which is the tendency of a convicted criminal to re-offend. In particular, we want to study how long first-time offenders "wait" before they re-offend. Our data include $N$ first-time offenders who, as part of a prison reform, were released from prison on January 1st, 1990. For each of them, we observe the time $T_i \ge 0$ (in years) before they re-offend. If $T_i = 10$, then $i$ re-offended after 10 years, and so on. For simplicity, ignore the possibility of death.

a. Suppose the time to re-offend $T$ follows an exponential distribution with parameter $\theta_0 > 0$. Its density is $f(T ; \theta_0) = \theta_0 \exp(-\theta_0 T)$ and its CDF is $F(T ; \theta_0) = 1 - \exp(-\theta_0 T)$, with mean $\frac{1}{\theta_0}$ and variance $\frac{1}{\theta_0^2}$. Determine the maximum likelihood estimator (MLE) of $\theta_0$.
b. Suppose now that data collection ended abruptly on December 31st, 2020. Denote this cutoff time by $T^* = 30$. For each individual, we observe either the time of re-offense, $T_i$, or $T^*$, whichever comes first. This phenomenon is known as right-censoring of the data. In other words, for each $i$ we observe $W_i = \min \{T_i, T^*\}$ and $C_i = 1\{T_i \le T^*\}$, where $1\{\cdot\}$ is an indicator function equal to $1$ if the argument $\{\cdot\}$ is true and $0$ otherwise. Determine the MLE of $\theta_0$ when there is right-censoring of this kind.
c. Report the asymptotic distributions of the MLE estimators from (a) and (b).
d. Describe how you would test the null hypothesis that the average number of years before re-offending equals 3, using the MLE estimator from (a) and the asymptotic distribution from (c).

## Answer

### a

$$
L(T_i, \theta_0) = \prod_{i=1}^n \theta_0 e^{-\theta_0 T_i}
\implies \log L = \sum_{i=1}^n (\ln \theta_0 - \theta_0 T_i) = n \ln \theta_0 - \theta_0 \sum_{i=1}^n T_i.
$$

By the FOC,

$$
\hat{\theta}_0 = \frac{n}{\sum_{i=1}^n T_i}.
$$

### b

For a single observation, the likelihood contribution is the density when the re-offense is observed ($W_i < T^*$, i.e. $C_i = 1$) and the survival probability $P(T_i > T^*) = e^{-\theta_0 T^*}$ when the observation is censored ($W_i = T^*$, i.e. $C_i = 0$):

$$
L(W_i, C_i, \theta_0, n=1) =
\begin{cases}
\theta_0 e^{-\theta_0 W_i} & \text{ if } C_i = 1,\\
e^{-\theta_0 W_i} & \text{ if } C_i = 0.
\end{cases}
$$

We can combine the two cases as

$$
L(W_i, C_i, \theta_0, n=1) = (\theta_0 e^{-\theta_0 W_i})^{C_i} (e^{-\theta_0 W_i})^{1-C_i}.
$$

With $n$ observations, the likelihood function is

$$
L(W, C, \theta_0) = \prod_{i=1}^n (\theta_0 e^{-\theta_0 W_i})^{C_i} (e^{-\theta_0 W_i})^{1-C_i}.
$$

The log-likelihood function is

$$
\log L(W, C, \theta_0) = \sum_{i=1}^n C_i (\ln \theta_0 - \theta_0 W_i) + \sum_{i=1}^n (1-C_i)(-\theta_0 W_i) = \left(\sum_{i=1}^n C_i\right) \ln \theta_0 - \theta_0 \sum_{i=1}^n W_i.
$$

By the FOC,

$$
\hat{\theta}_0 = \frac{\sum_{i=1}^n C_i}{\sum_{i=1}^n W_i}.
$$

:::info
We can check the above answer. When $T^* \to \infty$, almost no observation is censored, so $C_i = 1$ and $W_i = T_i$ for (almost) every $i$, and the answer for (b) reduces to the answer for (a).
:::

### c

Apply the [MLE formula](https://hackmd.io/NLZtzH3hTSmO5kjDoWBsVg?view#MLE): $\sqrt{n}(\hat{\theta}_0 - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})$, where $I(\theta_0)$ is the Fisher information of a single observation.

For (a), the per-observation log-likelihood is $\ln \theta - \theta T_i$, whose second derivative is $-\frac{1}{\theta^2}$, so $I(\theta_0) = \frac{1}{\theta_0^2}$ and

$$
\sqrt{n}(\hat{\theta}_0 - \theta_0) \xrightarrow{d} N(0, \theta_0^2).
$$

For (b), the per-observation log-likelihood is $C_i \ln \theta - \theta W_i$, whose second derivative is $-\frac{C_i}{\theta^2}$, so $I(\theta_0) = \frac{\E[C_i]}{\theta_0^2} = \frac{1 - e^{-\theta_0 T^*}}{\theta_0^2}$ and

$$
\sqrt{n}(\hat{\theta}_0 - \theta_0) \xrightarrow{d} N\left(0, \frac{\theta_0^2}{1 - e^{-\theta_0 T^*}}\right).
$$

### d

Because the mean is $\frac{1}{\theta_0}$, the null hypothesis that the average number of years before re-offending equals 3 is equivalent to $H_0: \theta_0 = \frac{1}{3}$. Using the MLE from (a) and its asymptotic distribution from (c), form the Wald statistic

$$
t = \frac{\sqrt{n}\left(\hat{\theta}_0 - \frac{1}{3}\right)}{\hat{\theta}_0},
$$

which is approximately $N(0,1)$ under the null (the asymptotic standard deviation $\theta_0$ is replaced by the consistent estimate $\hat{\theta}_0$). Reject $H_0$ at the 5% level if $|t| > 1.96$.
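Not part of the original answer: a minimal numpy simulation sketch to check parts (a), (b), and (d) numerically. The parameter values are illustrative choices, and the cutoff is set well below the question's $T^* = 30$ so that censoring actually binds in the simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, n = 1 / 3, 5_000          # true rate (mean 3 years) and sample size
T_star = 3.0                      # cutoff chosen below the question's 30 so censoring binds

# Simulate exponential re-offense times with mean 1/theta0.
T = rng.exponential(scale=1 / theta0, size=n)

# (a) Uncensored MLE: n / sum(T_i).
theta_hat_a = n / T.sum()

# (b) Right-censored data: W_i = min(T_i, T*), C_i = 1{T_i <= T*}; MLE = sum(C_i) / sum(W_i).
W = np.minimum(T, T_star)
C = (T <= T_star).astype(float)
theta_hat_b = C.sum() / W.sum()

# (d) Wald test of H0: mean = 3 <=> theta0 = 1/3, using asymptotic variance theta0^2 / n.
t_stat = np.sqrt(n) * (theta_hat_a - 1 / 3) / theta_hat_a

print(theta_hat_a, theta_hat_b, t_stat)   # both estimates should be close to 1/3
```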
# Q2

Suppose that $X_n$ is a sequence of double exponential random variables with density $\frac{1}{\lambda_n} \exp(-|x|/ \lambda_n)$ with parameter $\lambda_n$. Suppose that the sequence of random variables $Y_n$ is constructed from $X_n$ such that

$$
Y_n =
\begin{cases}
0, & \text{ if } X_n \le 0,\\
X_n, & \text{ if } X_n \in (0,1),\\
1, & \text{ if } X_n \ge 1.
\end{cases}
$$

Now suppose that $\lambda_n \to \infty$.

a. Does $X_n$ converge in distribution? If it does not, prove it formally; otherwise find its limiting distribution.
b. Does $Y_n$ converge in distribution? If it does not, prove it formally; otherwise find its limiting distribution.

## Answer

:::warning
There is a typo in the question. The PDF should be
$$
\frac{1}{2\lambda_n} \exp(-|x|/ \lambda_n).
$$
I am not fully confident in this solution.
:::

### a

The graph of the Laplace density suggests that, as $\lambda_n$ grows, the density flattens out and the limit of $X_n$ is not a proper distribution.

https://en.wikipedia.org/wiki/Laplace_distribution#/media/File:Laplace_pdf_mod.svg

First, we know $f(x)=f(-x)$, so we can just consider the case of $x \ge 0$. We want to show that the probability of any fixed interval goes to zero. Formally, for any $k>0$,

$$
P(0 \le X_n \le k) = \int_0^k \frac{1}{2\lambda_n} \exp(-|x|/ \lambda_n) \dd x
= \int_0^k \frac{1}{2\lambda_n} \exp(-x/ \lambda_n) \dd x = - \frac{1}{2} e^{-x/ \lambda_n} \Big|_0^k = \frac{1}{2} (1-e^{-k/ \lambda_n}).
$$

Hence, when $\lambda_n \to \infty$, $P(0 \le X_n \le k) \to 0$ for every fixed $k$. In other words, the probability of any finite interval vanishes in the limit: the mass escapes to $\pm\infty$, and the pointwise limit of the CDFs, $F_n(x) \to \frac{1}{2}$ for every $x$, is not a valid CDF. Therefore $X_n$ does not converge in distribution.

### b

For any $\lambda_n$, $P(X_n \le 0) = \frac{1}{2}$ by the symmetry of the density around zero. From the calculation above (with $k=1$), $P(X_n \in(0,1)) = \frac{1}{2} (1-e^{-1/ \lambda_n})$ and $P(X_n \ge 1) = \frac{1}{2} e^{-1/ \lambda_n}$. Hence, as $\lambda_n \to \infty$, $P(X_n \in(0,1)) \to 0$ and $P(X_n \ge 1) \to \frac{1}{2}$. So $P(Y_n = 0) \to \frac{1}{2}$ and $P(Y_n = 1) \to \frac{1}{2}$: the limiting distribution of $Y_n$ is a Bernoulli distribution with probability $\frac{1}{2}$.
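Not part of the original answer: a quick numpy check of (a) and (b), sampling from the (corrected) Laplace density with a very large $\lambda$ standing in for the $\lambda_n \to \infty$ limit. The specific $\lambda$, interval, and sample size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n = 1e6, 1_000_000   # a very large lambda_n stands in for the limit

# Sample X_n from the Laplace density (1 / (2*lam)) * exp(-|x| / lam).
X = rng.laplace(loc=0.0, scale=lam, size=n)

# Y_n censors X_n to the interval [0, 1].
Y = np.clip(X, 0.0, 1.0)

# P(a <= X_n <= b) -> 0 for any fixed finite interval ...
print(np.mean((X >= 0) & (X <= 100)))        # ~ 0: mass escapes to +/- infinity
# ... while Y_n puts roughly half its mass at 0 and half at 1.
print(np.mean(Y == 0.0), np.mean(Y == 1.0))  # each ~ 0.5: Bernoulli(1/2) limit
```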
# Q3

Consider the linear model,

$$
Y = X_1 \beta_1 + X_2 \beta_2 + \epsilon,
$$

where $X_1$ is $1\times k_1$, $X_2$ is $1 \times k_2$, and $Z$ is $1 \times m$, with $m \ge k = k_1+k_2$. Assume $E[Z'\epsilon ] = 0$. Suppose you observe a random sample, of size $N$, from the joint distribution of $\{Y, X_1, Z\}$. $X_2$ is not included in the dataset. Consider the 2SLS estimator, $b_1^*$, obtained from the short regression of $Y$ on $X_1$ only, using $Z$ as an instrument for $X_1$. Let $b^{2sls}_1$ and $b^{2sls}_2$ be the 2SLS estimators that would be obtained from the long regression of $Y$ on $X_1$ and $X_2$, using $Z$ as an instrument for $X_1$ and $X_2$.

a. Show that $b_1^*$ is a linear function of $b^{2sls}_1$ and $b^{2sls}_2$.
b. Find the probability limit of $b_1^*$. Under what conditions is $b_1^*$ a consistent estimator of $\beta_1$?
c. Find the limiting distribution of $\sqrt{N}( b_1^*-\beta_1)$, where $\beta_1$ is the probability limit in (b).

## Answer

:::warning
Be careful with the vector dimensions, which vary across the questions.
:::

### a

Stack the observations into matrices, so that $Y$ is $N \times 1$, $X_1$ is $N \times k_1$, $X = [X_1 \ X_2]$ is $N \times k$, and $Z$ is $N \times m$. To compute the 2SLS estimator, we first regress $X_1$ on $Z$ to obtain the fitted values

$$
\hat{X}_1 = Z(Z'Z)^{-1}Z' X_1,
$$

then regress $Y$ on $\hat{X}_1$:

$$
b_1^* = (\hat{X}_1'\hat{X}_1)^{-1} \hat{X}_1' Y.
$$

Defining $b^{2SLS} = \begin{pmatrix}b^{2sls}_1\\ b^{2sls}_2 \end{pmatrix}$, we analogously have

$$
\hat{X} = Z(Z'Z)^{-1}Z' X = [\hat{X}_1 \ \hat{X}_2], \qquad b^{2SLS} = (\hat{X}'\hat{X})^{-1} \hat{X}' Y.
$$

:::info
The easiest way to show the statement is to use the projection concept from linear algebra.
$\hat{X}_1$ is the projection of $X_1$ onto the space spanned by $Z$, and $b_1^*$ is the coefficient from projecting $Y$ onto the space spanned by $\hat{X}_1$. Since the columns of $\hat{X}_1$ are among the columns of $\hat{X}$, the column space of $\hat{X}_1$ is a subspace of that of $\hat{X}$. Hence the 2SLS residual $Y - \hat{X}b^{2SLS}$, which is orthogonal to $\hat{X}$, is also orthogonal to $\hat{X}_1$, so $b_1^*$ is a linear function of $b^{2sls}_1$ and $b^{2sls}_2$.
:::

Explicitly, $\hat{X}_1'(Y - \hat{X}b^{2SLS}) = 0$ implies $\hat{X}_1' Y = \hat{X}_1'\hat{X}\, b^{2SLS}$, so

$$
b_1^* = (\hat{X}_1'\hat{X}_1)^{-1} \hat{X}_1'\hat{X}\, b^{2SLS} = b^{2sls}_1 + (\hat{X}_1'\hat{X}_1)^{-1} \hat{X}_1'\hat{X}_2\, b^{2sls}_2.
$$

We can also expand all of the equations above; after some algebra,

$$
b_1^* = (X_1' \Delta X_1)^{-1} X_1' \Delta Y, \qquad
b^{2SLS} = (X' \Delta X)^{-1} X' \Delta Y,
$$

where $\Delta=Z(Z'Z)^{-1}Z'$ is the projection matrix onto the column space of $Z$.

### b

Substituting $Y = X_1 \beta_1 + X_2 \beta_2 + \epsilon$,

$$
b_1^* = (X_1'\Delta X_1)^{-1} X_1'\Delta (X_1 \beta_1 + X_2 \beta_2 + \epsilon)
= \beta_1 + (X_1'\Delta X_1)^{-1} X_1'\Delta X_2\, \beta_2 + (X_1'\Delta X_1)^{-1} X_1'\Delta \epsilon.
$$

By the law of large numbers, $\frac{1}{N}X_1'Z$, $\frac{1}{N}Z'Z$, $\frac{1}{N}Z'X_2$, and $\frac{1}{N}Z'\epsilon$ converge in probability to $E[X_1'Z]$, $E[Z'Z]$, $E[Z'X_2]$, and $E[Z'\epsilon]=0$ (inside the expectations, $X_1$, $X_2$, and $Z$ denote the row vectors for a single observation). Hence

$$
\text{plim}\; b_1^* = \beta_1 + \left(E[X_1'Z]E[Z'Z]^{-1}E[Z'X_1]\right)^{-1} E[X_1'Z]E[Z'Z]^{-1}E[Z'X_2]\, \beta_2.
$$

So $b_1^*$ is a consistent estimator of $\beta_1$ if $\beta_2 = 0$, or if the instruments are uncorrelated with the omitted regressors, $E[Z'X_2]=0$ (more generally, if $E[X_1'Z]E[Z'Z]^{-1}E[Z'X_2]=0$).

### c

Apply the standard asymptotic-normality formula. Let $\beta_1^*$ denote the probability limit from (b) and define $u^* = Y - X_1\beta_1^*$. When $E[Z'u^*] = 0$ (for example, under the consistency conditions in (b), so that $\beta_1^* = \beta_1$ and $u^* = X_2\beta_2 + \epsilon$ is uncorrelated with $Z$),

$$
\sqrt{N}(b_1^* - \beta_1^*) \xrightarrow{d} N\left(0,\; A^{-1}\,\Omega\, A^{-1}\right),
$$

where $A = E[X_1'Z]E[Z'Z]^{-1}E[Z'X_1]$ and $\Omega = E[X_1'Z]E[Z'Z]^{-1}E[u^{*2}Z'Z]E[Z'Z]^{-1}E[Z'X_1]$.

:::info
The exam is probably open-note.
:::

# Q4

Consider the following regression:

$$
Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i D_i + u_i,
$$

where $X$ is a one-dimensional random variable, $D_i$ is a dummy variable that can take only the values one and zero, and $cov(u_i, X_i) = 0$. Suppose you mistakenly estimate the model

$$
Y_i = \gamma_0 + \gamma_1 X_i + e_i,
$$

which omits the cross-product term $X_i D_i$. Assume that $E[X_i |D_i = 1] = 0$, $\beta_2\neq 0$, and $Var[X]$ and $Var[XD]$ are known and finite.

a. Find the probability limit of the OLS estimator $\hat{\gamma}_1$ in terms of $\beta_1$, $\beta_2$, $Var[X]$ and $Var[XD]$.
b. Can you give an example of a joint distribution of $X$ and $D$ such that the estimator $\hat{\gamma}_1$ is a consistent estimator of $\beta_1$?

## Answer

### a

We know

$$
\text{plim}\; \hat{\gamma}_1 = \frac{Cov( Y, X)}{Var(X)},
$$

where

$$
Cov(Y,X) = Cov(\beta_0 + \beta_1 X + \beta_2 X D + u, X) = \beta_1 Var(X) + \beta_2 Cov(XD, X).
$$

By definition,

$$
Cov(XD, X)= E[X^2 D] - E[XD]E[X], \qquad
Var(XD)=E[X^2 D^2] - (E[XD])^2.
$$

Define $P(D=1)=p$. Then

$$
E[XD] = E[XD|D=1] \cdot p + \underbrace{E[XD|D=0]}_{0} \cdot (1-p) = E[X|D=1] \cdot p = 0,
$$

since $E[X|D=1]=0$, and

$$
E[X^2 D] = E[X^2 D|D=1] \cdot p + \underbrace{E[X^2 D|D=0]}_0 \cdot (1-p) = E[X^2 |D=1] \cdot p,\\
E[X^2 D^2] = E[X^2 D^2|D=1] \cdot p + \underbrace{E[X^2 D^2|D=0]}_0 \cdot (1-p) = E[X^2 |D=1] \cdot p,
$$

where the last equality uses $D^2 = D$. Hence

$$
Cov(XD, X) = E[X^2 D] = E[X^2 D^2] = Var(XD).
$$

Thus,

$$
\text{plim}\; \hat{\gamma}_1 = \beta_1 + \beta_2 \frac{Var(XD)}{Var(X)}.
$$

### b

Since $\beta_2 \neq 0$ and $Var(X)$ is finite (and nonzero), $Var(XD)$ needs to be zero. From part (a), we know

$$
Var(XD) = E[X^2 |D=1] \cdot p.
$$

Consider the following joint distribution: $P(D=1) = P(D=0)= \frac{1}{2}$, with $X=0$ if $D=1$ and $X=1$ if $D=0$. Then $E[X|D=1]=0$, $Var(X)=\frac{1}{4}$ is finite and nonzero, and $XD = 0$ with probability one, so $Var(XD)=0$ and $\hat{\gamma}_1$ is consistent for $\beta_1$. This joint distribution satisfies all of the above assumptions.
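Not part of the original answer: a small numpy Monte Carlo sketch for Q4(a), checking that the OLS slope from the mis-specified regression approaches $\beta_1 + \beta_2 \frac{Var(XD)}{Var(X)}$. The data-generating process (normal $X$, the particular coefficient values) is an assumed illustration chosen to satisfy $E[X|D=1]=0$, not something specified in the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1, beta2 = 0.5, 1.0, 2.0   # illustrative true coefficients

# Assumed DGP satisfying E[X | D = 1] = 0.
D = rng.binomial(1, 0.5, size=n)
X = np.where(D == 1,
             rng.normal(0.0, 1.0, size=n),   # mean zero when D = 1
             rng.normal(1.0, 1.0, size=n))
u = rng.normal(0.0, 1.0, size=n)
Y = beta0 + beta1 * X + beta2 * X * D + u

# Mis-specified OLS of Y on X only (with an intercept); slope is the first coefficient.
gamma1_hat = np.polyfit(X, Y, 1)[0]

# Predicted probability limit: beta1 + beta2 * Var(XD) / Var(X), from sample moments.
predicted = beta1 + beta2 * np.var(X * D) / np.var(X)

print(gamma1_hat, predicted)   # the two numbers should be close for large n
```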