---
tags: metric, question
---
# Metrics 2021 Final
Personal solutions; they may not be correct.
## Q1
### Question
$X_n$ is a sequence of random variables such that for each $n$ and each finite $k$, the moment $E[(X_n)^k]$ exists. Moreover, there exists a random variable $X$ such that for each finite $k$ its moment $E[X^k]$ exists and $\lim_{n \to \infty} E[(X_n)^k] = E[X^k]$.
1. Does the sequence $X_n$ converge to $X$ in probability? Prove or disprove.
2. Does the sequence $X_n$ converge to $X$ in distribution? Prove or disprove.
Formally and rigorously provide the answers to these questions.
### Answer
1. No.
Let $\{X_n\}$ be the trivial sequence in which every element is the same random variable $Y$, where $Y$ follows the Bernoulli distribution with $p = 0.5$. Let $X = 1 - Y$. Then $X$ and $Y$ have the same distribution function, so all of their moments agree and the moment convergence above holds trivially. However, their realizations always differ, so $X_n$ cannot converge to $X$ in probability.
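To make this explicit (a short check using only the definitions above): for every $k \ge 1$,
$$
E[(X_n)^k] = E[Y^k] = P(Y = 1) = \tfrac{1}{2} = P(X = 1) = E[X^k],
$$
yet $|X_n - X| = |Y - (1 - Y)| = |2Y - 1| = 1$ almost surely, so $P(|X_n - X| > \varepsilon) = 1$ for every $0 < \varepsilon < 1$ and every $n$.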
2. No.
Counterexample:
http://z14120902.github.io/week3/When%20Do%20the%20Moments%20Uniquely%20Identify%20a%20Distribution.pdf
More specifically, construct the trivial sequence $\{X_n\}$ with all elements equal to $Y$, where $Y$ follows the standard log-normal distribution,
$$
f_Y(y) = \frac{e^{-0.5(\log y)^2}}{\sqrt{2 \pi} y}.
$$
Let $X$ be a random variable with the following density function,
$$
f_X(x) = \frac{e^{-0.5(\log x)^2}}{\sqrt{2 \pi} x} \left( 1 + \sin (2 \pi \log x)\right).
$$
According to the literature, we have
$$
E(Y^k) = E(X^k) = e^{\frac{k^2}{2}}, \quad \forall k \in \mathbb{N},
$$
so all moments of $X$ and $Y$ coincide even though their distributions differ. The trivial sequence therefore satisfies the moment-convergence assumption, yet $X_n$ does not converge to $X$ in distribution.
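A quick numerical sanity check (a sketch, assuming SciPy is available): after the substitution $u = \log x$, the $k$-th moment of either density becomes an integral of $e^{ku - u^2/2}/\sqrt{2\pi}$, with or without the extra factor $1 + \sin(2\pi u)$, and both should return $e^{k^2/2}$.
```python
import numpy as np
from scipy.integrate import quad

def moment(k, perturbed=False):
    """k-th moment of the (perturbed) log-normal density, written as an
    integral over u = log(x)."""
    def integrand(u):
        base = np.exp(k * u - 0.5 * u ** 2) / np.sqrt(2 * np.pi)
        return base * (1 + np.sin(2 * np.pi * u)) if perturbed else base
    # The Gaussian factor concentrates the mass around u = k, so wide finite
    # limits are enough for quad to converge.
    return quad(integrand, k - 15.0, k + 15.0, limit=200)[0]

for k in range(1, 5):
    print(k, moment(k), moment(k, perturbed=True), np.exp(k ** 2 / 2))
```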
## Q2
### Question
Dummy variable $D$ is defined by $D = 1\{aX_1 + bX_2 \ge 0\}$. The covariates are independent of each other; $X_1$ takes value $1$ with probability $1/2$ and value $-1$ with probability $1/2$, and $X_2$ takes value $1$ with probability $1/2$ and value $-1$ with probability $1/2$. We observe the i.i.d. sample $\{d_t,x_{1t},x_{2t}\}^T_{t=1}$ of covariates and outcomes.
1. Prove that without additional restrictions the parameters $a$ and $b$ are not point-identified.
2. Prove that under the parameter constraint $|a| = |b| = 1$ the model is identified if and only if $P(D = 1) = 3/4$ and either $Pr(D = 0, X_1 = X_2 = 1) = 1/4$ or $Pr(D = 0, X_1 = -X_2 = 1) = 1/4$ or $Pr(D = 0, X_1 = -X_2 = -1) = 1/4$ or $Pr(D = 0, X_1 = X_2 = -1) = 1/4$.
3. Suppose that the model is identified under the parameter constraint $|a| = |b| = 1$. Provide a maximum likelihood estimator for $a$ and $b$ and prove that it is consistent.
4. Show that if $\hat{a}$ is the maximum likelihood estimator with probability limit $a_0$, then for any positive number $\alpha > 0$, $T^{\alpha}\{Pr(\hat{a} = a_0) - 1\} \to 0$ as $T \to \infty$. In other words, this estimator has an exponential convergence rate.
### Answer
1. For any $(a, b)$ and $c>0$,
$$
aX_1+bX_2 \ge 0 \iff c(aX_1+bX_2) = acX_1+bcX_2 \ge0.
$$
Hence $(a, b)$ and $(ca, cb)$ generate exactly the same distribution of $(D, X_1, X_2)$ and are observationally equivalent; for example, $(1, 1)$ and $(2, 2)$ cannot be distinguished by any sample. Therefore the parameters are not point-identified.
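A trivial check (a sketch; the values of $a$, $b$, $c$ below are illustrative, not from the question): scaling $(a, b)$ by any $c > 0$ leaves the outcome $D$ unchanged on every covariate cell, so no sample can tell the two parameter vectors apart.
```python
from itertools import product

a, b, c = 1.0, 1.0, 2.5  # illustrative parameter values and scale factor
for x1, x2 in product([1, -1], repeat=2):
    # D is the same under (a, b) and (c*a, c*b) for every realization of (X1, X2)
    assert int(a * x1 + b * x2 >= 0) == int(c * a * x1 + c * b * x2 >= 0)
print("(a, b) and (c*a, c*b) are observationally equivalent.")
```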
2. There are only four possible realizations of $(X_1, X_2)$, namely
$$
(1, 1), (1, -1), (-1, 1), (-1, -1).
$$
Given $|a|=|b|= 1$, there are also four possible values of $(a, b)$,
$$
\Theta= \{(1, 1), (1, -1), (-1, 1), (-1, -1)\}.
$$
It is easy to check that for any $(a, b) \in \Theta$, $aX_1 + bX_2 < 0$ holds exactly when $(X_1, X_2) = (-a, -b)$; hence $P(D = 1) = 3/4$ and exactly one of the four conditions $Pr(D = 0, \cdot) = 1/4$ holds. Since distinct parameter values in $\Theta$ correspond to distinct conditions, observing which condition holds identifies $(a, b)$; the enumeration below makes the mapping explicit.
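A small enumeration (a sketch of the check claimed above): for each admissible $(a, b)$ with $|a| = |b| = 1$, compute $P(D = 1)$ and find the unique covariate cell on which $D = 0$; each cell has probability $1/4$ by independence.
```python
from itertools import product

for a, b in product([1, -1], repeat=2):
    d = {(x1, x2): int(a * x1 + b * x2 >= 0) for x1, x2 in product([1, -1], repeat=2)}
    p_d1 = sum(d.values()) / 4  # every (x1, x2) cell has probability 1/4
    zero_cell = [cell for cell, val in d.items() if val == 0][0]
    print(f"(a, b) = ({a:+d}, {b:+d}):  P(D = 1) = {p_d1},  D = 0 only at (X1, X2) = {zero_cell}")
```
The output shows that every $(a, b) \in \Theta$ implies $P(D = 1) = 3/4$ and that the four parameter values map to four distinct $D = 0$ cells.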
3. By definition,
$$
(\hat{a},\hat{b})_{MLE} = \arg \max_{(a, b) \in \Theta} L(\{d_t,x_{1t},x_{2t}\}^T_{t=1}; (a,b)) = \arg \max_{(a, b) \in \Theta} \prod_{t=1}^T P(x_{1t}, x_{2t}, d_t; (a,b)).
$$
If we observe $d_t = 0$ for some $t$, then we can fully identify the parameter: three of the four candidate parameter values make the likelihood equal to $0$. Hence we have the following rule:
$$
(\hat{a},\hat{b}) = \begin{cases}
(1,1), &\text{ if } (d_t=0, x_{1t}= -1, x_{2t} = -1) \text{ for some } t\\
(1,-1), &\text{ if } (d_t=0, x_{1t}= -1, x_{2t} = 1) \text{ for some } t\\
(-1,1), &\text{ if } (d_t=0, x_{1t}= 1, x_{2t} = -1) \text{ for some } t\\
(-1,-1), &\text{ if } (d_t=0, x_{1t}= 1, x_{2t} = 1) \text{ for some } t\\
\end{cases}
$$
However, even when we only observe $d_t = 1$, we still obtain some information. For example, if we observe $(d_1, x_{11}, x_{21}) = (1, 1, 1)$, then $(a,b)\neq (-1,-1)$; if we observe
$$
(d_1, x_{11}, x_{21}) = (1, 1, 1), \quad (d_2, x_{12}, x_{22}) = (1, -1, 1), \quad (d_3, x_{13}, x_{23}) = (1, 1, -1)
$$
then the parameter must be $(a, b) = (1, 1)$, because every other parameter value makes the likelihood equal to $0$.
Hence, the MLE is the combination of the above two rules; a compact implementation is sketched below. If we never observe $d_t = 0$ and the observations with $d_t = 1$ leave more than one candidate, several parameter values attain the maximal likelihood, and we pick one of the remaining feasible values at random (with probability $1/2$ or $1/3$ each). Consistency follows because the event $\{d_t = 0 \text{ for some } t \le T\}$ has probability $1 - (3/4)^T \to 1$, and on that event the MLE equals the true parameter exactly.
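A compact implementation sketch of the rule just described (a hypothetical helper; the data are assumed to be sequences `d`, `x1`, `x2` of length $T$). Because each candidate either matches every observation (likelihood $(1/4)^T$) or contradicts at least one (likelihood $0$), maximizing the likelihood amounts to keeping the candidates consistent with the sample and breaking ties at random.
```python
import random
from itertools import product

def mle(d, x1, x2):
    """Maximum likelihood estimate of (a, b) under |a| = |b| = 1."""
    candidates = list(product([1, -1], repeat=2))  # Theta = {(+-1, +-1)}
    feasible = [(a, b) for a, b in candidates
                if all(int(a * u + b * v >= 0) == dt
                       for dt, u, v in zip(d, x1, x2))]
    # The true parameter is always feasible; the set is a singleton as soon as
    # some d_t = 0 is observed, or when all three d = 1 cells appear.
    return random.choice(feasible)
```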
4. Since the exact MLE rule is somewhat involved, consider a simpler and weakly less powerful estimator: whenever the sample does not determine the parameter, assign each of the four candidate values with probability $1/4$.
Given $\{d_t,x_{1t},x_{2t}\}_{t=1}^T$, the only case in which we cannot determine the true parameter deterministically is when every observation has $d_t = 1$ and the covariates $(x_{1t}, x_{2t})$ take at most two distinct values over the whole sample.
Hence,
$$
Pr(\hat{a} = a_0) - 1 = 0 \cdot P(\text{get useful information}) - \frac{1}{2} \cdot P(\text{do not get useful information}).
$$
By the union bound over the ${3 \choose 2} = 3$ pairs of covariate cells in which such a sample can be confined, the probability in the second term satisfies
$$
P(\text{do not get useful information}) \le {3 \choose 2} \left(\frac{2}{4}\right)^T = 3 \cdot \left(\frac{1}{2}\right)^T.
$$
Finally,
$$
\left| T^{\alpha}\{Pr(\hat{a} = a_0) - 1\} \right| \le T^{\alpha} \cdot \frac{1}{2} \cdot 3 \cdot \left(\frac{1}{2}\right)^T = \frac{3}{2} T^{\alpha} 2^{-T} \to 0,
$$
where the factor $2^{-T}$ shows the exponential convergence rate. The actual MLE uses strictly more of the sample information than this simplified estimator, so the same bound holds for it as well.
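A Monte Carlo illustration (a sketch, assuming NumPy; the simplified estimator and the bound are as above, with the true parameter fixed at $(a_0, b_0) = (1, 1)$ only for concreteness):
```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
a0, b0 = 1, 1
theta = list(product([1, -1], repeat=2))

def simplified_estimate(x1, x2, d):
    feasible = [(a, b) for a, b in theta
                if np.all((a * x1 + b * x2 >= 0).astype(int) == d)]
    if len(feasible) == 1:           # the sample pins the parameter down exactly
        return feasible[0]
    return theta[rng.integers(4)]    # otherwise guess uniformly over all four

for T in (4, 8, 12):
    reps, errors = 20000, 0
    for _ in range(reps):
        x1 = rng.choice([1, -1], size=T)
        x2 = rng.choice([1, -1], size=T)
        d = (a0 * x1 + b0 * x2 >= 0).astype(int)
        errors += simplified_estimate(x1, x2, d)[0] != a0
    print(T, errors / reps, 1.5 * 2.0 ** (-T))  # empirical Pr(a_hat != a0) vs bound
```
The empirical error rate shrinks geometrically in $T$, in line with the $\frac{3}{2} 2^{-T}$ bound derived above.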
## Q3
### Question
Using a sample of $n$ i.i.d. observations $\{x_i\}^n_{i=1}$, construct an optimal test of size $\alpha$ for the hypothesis $H$ corresponding to the uniform distribution $U[0,1]$ against the alternative hypothesis $K$ corresponding to the uniform distribution $U[h, 1+h]$, where $h > 0$. Plot the power function of your test as a function of $h$.
### Answer
Let $x_{(n)} = \max\{x_1, \cdots, x_n\}$. Consider the test decision
$$
\delta = \begin{cases}
1, &\text{ if } x_{(n)} \ge k\\
0, &\text{ if } x_{(n)} < k\\
\end{cases}
$$
We only need to calibrate $k$ based on $\alpha$; under $H$, $P(x_{(n)} < k) = k^n$, so
$$
P(x_{(n)} \ge k \mid h = 0) = 1 - k^n = \alpha \implies k = (1 - \alpha)^{\frac{1}{n}}.
$$
For the power function, the test rejects $H$ with probability one when $h \ge k$ (since then $x_{(n)} \ge h \ge k$ almost surely), so we only need to compute the case $0 \le h < k$:
$$
\beta(h) = \begin{cases}
P(x_{(n)} \ge k \mid h) = 1 - (k - h)^n, &\text{ if } 0 \le h < k\\
1, &\text{ if } h \ge k\\
\end{cases}
$$
Note that $\beta(0) = 1 - k^n = \alpha$, matching the size, and that $\beta(h)$ is increasing in $h$; the plot is sketched below.
For optimality, a fully rigorous proof is tedious, so here is the conceptual intuition. When $n = 1$, the rejection region is $S = \{x \ge 0 : x \ge k\}$ with $1 - k = \alpha$. Any rejection region with higher power against $K$ must also carry more probability under $H$, i.e., it must have size larger than $\alpha$.
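A sketch of the requested plot (assuming Matplotlib is available; $\alpha = 0.05$ and $n = 5$ are illustrative choices, not from the question):
```python
import numpy as np
import matplotlib.pyplot as plt

alpha, n = 0.05, 5                              # illustrative size and sample size
k = (1 - alpha) ** (1 / n)                      # critical value from the size condition

h = np.linspace(0, 1.5 * k, 400)
beta = np.where(h < k, 1 - (k - h) ** n, 1.0)   # power function derived above

plt.plot(h, beta)
plt.axhline(alpha, linestyle="--", label=r"size $\alpha$")
plt.axvline(k, linestyle=":", label=r"$h = k$")
plt.xlabel(r"$h$")
plt.ylabel(r"power $\beta(h)$")
plt.legend()
plt.show()
```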