---
tags: metric, question
---

# Metrics 2021 Final

A personal solution; it may not be correct.

## Q1

### Question

$X_n$ is a sequence of random variables such that for each $n$ and each finite $k$ the moment $E[(X_n)^k]$ exists. Moreover, there exists a random variable $X$ such that for each finite $k$ its moment $E[X^k]$ exists and $\lim_{n \to \infty} E[(X_n)^k] = E[X^k]$.

1. Does the sequence $X_n$ converge to $X$ in probability? Prove or disprove.
2. Does the sequence $X_n$ converge to $X$ in distribution? Prove or disprove.

Formally and rigorously provide the answers to these questions.

### Answer

1. No. Let $\{X_n\}$ be a trivial sequence whose elements are all equal to the same random variable $Y$, where $Y$ follows the Bernoulli distribution with $p = 0.5$. Let $X = 1 - Y$. Then $X$ and $Y$ have the same distribution function, so all of their moments coincide and the moment-convergence condition holds trivially. However, $|X_n - X| = |2Y - 1| = 1$ almost surely, so $P(|X_n - X| \ge 1/2) = 1$ for every $n$ and $X_n$ does not converge to $X$ in probability.

2. No. Counterexample: http://z14120902.github.io/week3/When%20Do%20the%20Moments%20Uniquely%20Identify%20a%20Distribution.pdf

   More specifically, construct a trivial sequence $\{X_n\}$ with all elements equal to $Y$, where $Y$ follows the standard log-normal distribution,
   $$
   f_Y(y) = \frac{e^{-\frac{1}{2}(\log y)^2}}{\sqrt{2 \pi}\, y}, \quad y > 0.
   $$
   Let $X$ be the random variable with density
   $$
   f_X(x) = \frac{e^{-\frac{1}{2}(\log x)^2}}{\sqrt{2 \pi}\, x} \left( 1 + \sin (2 \pi \log x) \right), \quad x > 0.
   $$
   According to the literature, the two distributions share all integer moments,
   $$
   E(Y^k) = E(X^k) = e^{\frac{k^2}{2}}, \quad \forall k \in \mathbb{N},
   $$
   yet they are different distributions. Hence $X_n$ trivially converges in distribution to $Y$ but not to $X$, even though all moments converge.
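The crucial fact behind this counterexample is that multiplying the log-normal density by $1 + \sin(2\pi \log x)$ leaves every integer moment unchanged while still yielding a valid density. Below is a minimal numerical sanity check of that claim, assuming NumPy and SciPy are available; it substitutes $t = \log x$ so each moment becomes an integral over the whole real line, and compares the first few moments of $f_Y$ and $f_X$ with $e^{k^2/2}$. It is only an illustration, not part of the formal argument.

```python
# Numerical sanity check: the sin-perturbed log-normal density f_X has the
# same integer moments as the standard log-normal density f_Y.
import numpy as np
from scipy.integrate import quad

def moment(k, perturbed):
    """E[X^k] under f_Y (perturbed=False) or the perturbed density f_X (perturbed=True)."""
    def integrand(t):
        # substitution t = log x: x^k f(x) dx becomes this integrand over the real line
        base = np.exp(k * t - 0.5 * t ** 2) / np.sqrt(2 * np.pi)
        return base * (1 + np.sin(2 * np.pi * t)) if perturbed else base
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

for k in range(5):
    print(f"k={k}: E[Y^k]={moment(k, False):.6f}  "
          f"E[X^k]={moment(k, True):.6f}  exp(k^2/2)={np.exp(k ** 2 / 2):.6f}")
```

For $k = 0$ the output also confirms that $f_X$ integrates to one, so the perturbed function is indeed a probability density.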
## Q2

### Question

The dummy variable $D$ is defined by $D = 1\{aX_1 + bX_2 \ge 0\}$. The covariates are independent of each other: $X_1$ takes value $1$ with probability $1/2$ and value $-1$ with probability $1/2$, and $X_2$ takes value $1$ with probability $1/2$ and value $-1$ with probability $1/2$. We observe the i.i.d. sample $\{d_t, x_{1t}, x_{2t}\}^T_{t=1}$ of covariates and outcomes.

1. Prove that without additional restrictions the parameters $a$ and $b$ are not point-identified.
2. Prove that under the parameter constraint $|a| = |b| = 1$ the model is identified if and only if $P(D = 1) = 3/4$ and either $\Pr(D = 0, X_1 = X_2 = 1) = 1/4$ or $\Pr(D = 0, X_1 = -X_2 = 1) = 1/4$ or $\Pr(D = 0, X_1 = -X_2 = -1) = 1/4$ or $\Pr(D = 0, X_1 = X_2 = -1) = 1/4$.
3. Suppose that the model is identified under the parameter constraint $|a| = |b| = 1$. Provide a maximum likelihood estimator for $a$ and $b$ and prove that it is consistent.
4. Show that if $\hat{a}$ is the maximum likelihood estimator with probability limit $a_0$, then for any positive number $\alpha > 0$, $T^{\alpha}\{\Pr(\hat{a} = a_0) - 1\} \to 0$ as $T \to \infty$. In other words, this estimator has an exponential convergence rate.

### Answer

1. For any $(a, b)$ and any $c > 0$,
   $$
   aX_1 + bX_2 \ge 0 \iff c(aX_1 + bX_2) = caX_1 + cbX_2 \ge 0.
   $$
   Hence $(a, b)$ and $(ca, cb)$ generate exactly the same distribution of the observables, so no estimator can distinguish between them and the parameters are not point-identified.

2. There are only four possible realizations of $(X_1, X_2)$, namely
   $$
   (1, 1), (1, -1), (-1, 1), (-1, -1).
   $$
   Given $|a| = |b| = 1$, there are also only four possible values of $(a, b)$,
   $$
   \Theta = \{(1, 1), (1, -1), (-1, 1), (-1, -1)\}.
   $$
   It is easy to check that for any $(a, b) \in \Theta$, the event $\{D = 0\}$ occurs only at the single realization $(X_1, X_2) = (-a, -b)$, so $P(D = 1) = 3/4$ and exactly one of the four listed conditions holds. Conversely, each of the four conditions pins down $(a, b)$ uniquely as minus the realization on which $D = 0$, so the model is identified exactly under the stated conditions.

3. By definition,
   $$
   (\hat{a}, \hat{b})_{MLE} = \arg \max_{(a, b) \in \Theta} L(\{d_t, x_{1t}, x_{2t}\}^T_{t=1}; (a, b)) = \arg \max_{(a, b) \in \Theta} \prod_{t=1}^{T} P(x_{1t}, x_{2t}, d_t; (a, b)).
   $$
   If we observe $d_t = 0$ for some $t$, then we can identify the parameter exactly, because three of the four candidate parameter values make the likelihood equal to $0$. Hence we have the following rule,
   $$
   (\hat{a}, \hat{b}) = \begin{cases} (1, 1), &\text{ if } (d_t = 0, x_{1t} = -1, x_{2t} = -1) \text{ for some } t\\ (1, -1), &\text{ if } (d_t = 0, x_{1t} = -1, x_{2t} = 1) \text{ for some } t\\ (-1, 1), &\text{ if } (d_t = 0, x_{1t} = 1, x_{2t} = -1) \text{ for some } t\\ (-1, -1), &\text{ if } (d_t = 0, x_{1t} = 1, x_{2t} = 1) \text{ for some } t \end{cases}
   $$
   However, even when we only observe $d_t = 1$, we still obtain some information. For example, if we observe $(d_1, x_{11}, x_{21}) = (1, 1, 1)$, then $(a, b) \neq (-1, -1)$; and if we observe
   $$
   (d_1, x_{11}, x_{21}) = (1, 1, 1), \quad (d_2, x_{12}, x_{22}) = (1, -1, 1), \quad (d_3, x_{13}, x_{23}) = (1, 1, -1),
   $$
   the parameter must be $(a, b) = (1, 1)$, since every other candidate makes the likelihood equal to $0$. Hence the MLE combines the two rules above. If we never observe $d_t = 0$ and do not observe all three covariate pairs $(x_1, x_2)$ that occur with $d_t = 1$, we randomly assign one of the parameter values that keep the likelihood positive, each with probability $1/2$ or $1/3$ depending on how many remain.

   Consistency follows because, under the true parameter, the event $\{D = 0\}$ has probability $1/4$, so the probability that $d_t = 0$ is never observed in $T$ draws is $(3/4)^T \to 0$; hence $\Pr\big((\hat{a}, \hat{b}) = (a_0, b_0)\big) \to 1$.

4. Since the exact MLE rule is complicated, consider a simpler and less powerful estimator: if we never observe $d_t = 0$ and do not observe the three covariate pairs that pin down the parameter, we assign each of the four parameter values at random with probability $1/4$. Given $\{d_t, x_{1t}, x_{2t}\}_{t=1}^T$, the only situation in which we cannot determine the true parameter deterministically is when $d_t = 1$ for all $t$ and the covariates take at most two of the three realizations of $(x_1, x_2)$ that generate $D = 1$ under the true parameter. Hence,
   $$
   \Pr(\hat{a} = a_0) - 1 = 0 \cdot \Pr(\text{get useful information}) + \frac{-1}{2} \cdot \Pr(\text{do not get useful information}),
   $$
   and the probability of the second event is at most
   $$
   {3 \choose 2} \left(\frac{2}{4}\right)^T = 3 \cdot \left(\frac{2}{4}\right)^T.
   $$
   Finally,
   $$
   \left| T^{\alpha}\{\Pr(\hat{a} = a_0) - 1\} \right| \le T^{\alpha} \cdot \frac{1}{2} \cdot 3 \cdot \left(\frac{2}{4}\right)^T = \frac{3}{2}\, T^{\alpha}\, 2^{-T} \to 0,
   $$
   where the factor $2^{-T}$ shows that the convergence is exponentially fast. The exact MLE converges at least as fast as this simplified estimator.

## Q3

### Question

Using a sample of $n$ i.i.d. observations $\{x_i\}^n_{i=1}$, construct an optimal test of size $\alpha$ for the hypothesis $H$ corresponding to the uniform distribution $U[0, 1]$ against the alternative hypothesis $K$ corresponding to the uniform distribution $U[h, 1 + h]$, where $h > 0$. Plot the power function of your test as a function of $h$.

### Answer

Let $x_{(n)} = \max\{x_1, \cdots, x_n\}$. Consider the test decision
$$
\delta = \begin{cases} 1, &\text{ if } x_{(n)} \ge k\\ 0, &\text{ if } x_{(n)} < k \end{cases}
$$
We only need to calibrate $k$ based on $\alpha$. Under $H$, $x_{(n)}$ has CDF $t \mapsto t^n$ on $[0, 1]$, so
$$
P(x_{(n)} \ge k \mid h = 0) = 1 - k^n \le \alpha \implies k \ge (1 - \alpha)^{\frac{1}{n}},
$$
and a test of size exactly $\alpha$ takes $k = (1 - \alpha)^{1/n}$.

For the power function, we always reject $H$ when $h \ge k$, since then $x_{(n)} \ge h \ge k$ with probability one; so we only need to compute the case $0 \le h \le k$:
$$
\beta(h) = \begin{cases} P(x_{(n)} \ge k \mid h) = 1 - (k - h)^n, &\text{ if } 0 \le h \le k\\ 1, &\text{ if } h \ge k \end{cases}
$$
For optimality, a fully rigorous proof is tedious; here is the conceptual intuition. When $n = 1$, the rejection region is $S = \{x \ge 0 : x \ge k\}$ with $1 - k = \alpha$, and it is clear that any region with higher power under the alternative must have a larger size $\alpha$. A plot of the power function is sketched below.
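Since the question asks for a plot, here is a minimal plotting sketch, assuming NumPy and Matplotlib are available, that draws $\beta(h)$ for a few illustrative sample sizes $n$ at an illustrative size $\alpha = 0.05$ (both choices are arbitrary, not given in the problem).

```python
# Plot the power function beta(h) of the test "reject H iff x_(n) >= k",
# with critical value k = (1 - alpha)^(1/n), for several sample sizes.
import numpy as np
import matplotlib.pyplot as plt

alpha = 0.05  # illustrative size
h = np.linspace(0, 1, 500)

for n in (1, 5, 20):
    k = (1 - alpha) ** (1 / n)                      # critical value of size alpha
    power = np.where(h < k, 1 - (k - h) ** n, 1.0)  # beta(h); equals 1 once h >= k
    plt.plot(h, power, label=f"n = {n}")

plt.axhline(alpha, linestyle="--", color="grey", label=r"size $\alpha$")
plt.xlabel("h")
plt.ylabel(r"power $\beta(h)$")
plt.title(r"Power of the test $x_{(n)} \geq (1-\alpha)^{1/n}$")
plt.legend()
plt.show()
```

Each curve starts at $\beta(0) = \alpha$ and reaches $1$ at $h = k = (1 - \alpha)^{1/n}$; as $n$ grows, the power at every fixed $h > 0$ increases toward $1$.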