---
tags: metrics, group
---

Formulas in Econometrics
===

$$
% My definitions
\def\ve{{\varepsilon}}
\def\dd{{\text{ d}}}
\def\E{{\mathbb{E}}}
\newcommand{\dif}[2]{\frac{d #1}{d #2}} % for derivatives
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} % for partial derivatives
\def\R{\text{R}}
$$

# Conditional Distribution

## Conditional Expectation

$$ E(X\mid Y)=f(Y). $$

It is important to note that a conditional expectation/variance is a function of a random variable, not (necessarily) a constant.

## Law of Iterated Expectations

$$ \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)) $$

## Conditional Variance

$$ \operatorname {Var} (Y\mid X)=\operatorname {E} {\Big (}{\big (}Y-\operatorname {E} (Y\mid X){\big )}^{2}\mid X{\Big )} = \operatorname {E} (Y^{2}\mid X)-\operatorname {E} (Y\mid X)^{2}. $$

### Conditional and Unconditional Variance

$$ \operatorname {Var} (Y)=\operatorname {E} (\operatorname {Var} (Y\mid X))+\operatorname {Var} (\operatorname {E} (Y\mid X)). $$

# Asymptotic Theory

## Weak Law of Large Numbers (WLLN)

Suppose $Y$ is a random variable with $\E[Y]=\mu$. Then the sample average converges in probability to the expectation,
$$ \frac{1}{n} \sum_{i=1}^n Y_i = \bar{Y}_n \to_p \E[Y]=\mu $$

## Continuous Mapping Theorem (CMT)

1. If $X_n \to_p X$ and $g(\cdot)$ is a continuous function, then $g(X_n) \to_p g(X)$.
2. If $X_n \to_d X$ and $g(\cdot)$ is a continuous function, then $g(X_n) \to_d g(X)$.

## Central Limit Theorem (CLT)

Suppose $Y$ is a random variable with $\E[Y]=\mu$ and $V[Y] = \sigma^2$. The sample average of $n$ observations converges to $\mu$ by the WLLN. If we scale the centered sample average by the square root of $n$, we have
$$ \sqrt{n} (\bar{Y} - \mu) \to_d \mathcal{N} (0, \sigma^2) $$

## Delta Method

### Univariate

If
$$ \sqrt{n} (Y_n - \theta) \to_d \mathcal{N} (0, \sigma^2) $$
and $g$ is a function such that $g'(\theta) \neq 0$, then
$$ \sqrt{n} (g(Y_n) - g(\theta)) \to_d \mathcal{N} (0, g'(\theta)^2\sigma^2) $$

### Multivariate

If
$$ \sqrt{n} (Y_n - \theta) \to_d \mathcal{N} (0, \Sigma) $$
and $h$ is a function such that $\nabla h(\theta) \neq 0$, then
$$ \sqrt{n} (h(Y_n) - h(\theta)) \to_d \mathcal{N} (0, \nabla h(\theta)^T \Sigma \nabla h(\theta)) $$

# Least Squares

## Ordinary Least Squares (OLS)

Consider the following model,
$$ y = x'\beta + e, $$
where $y$ and $e$ are scalars, and $x$ and $\beta$ are $k \times 1$ vectors. Assume $\E[e|x]=0$ and $\E(x x')$ is full rank. Suppose we have $n$ iid observations $\{y_i, x_i\}_{i=1}^n$ that follow the above model, and we want to estimate $\beta$.

### Matrix form

Let
$$ Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n\end{pmatrix}, \quad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n'\end{pmatrix}, \quad E = \begin{pmatrix} e_1 \\ \vdots \\ e_n\end{pmatrix}, \quad X' = \begin{pmatrix} x_1 & \cdots & x_n\end{pmatrix}. $$
The above model can be written as
$$ Y = X \beta + E. $$
The OLS estimator is
$$ \hat{\beta} = (X'X)^{-1}X'Y $$

### Scalar form

Define
$$ \E_n(x x') = \frac{1}{n} \sum_{i=1}^n x_i x_i', \quad \E_n(x y) = \frac{1}{n} \sum_{i=1}^n x_i y_i. $$
The OLS estimator can be written as
$$ \hat{\beta} = \E_n(x x')^{-1} \E_n(x y), $$
which is equivalent to the matrix form.
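As a quick numerical check (my addition, not part of the original notes), the following NumPy sketch simulates data from the linear model above and verifies that the matrix form $(X'X)^{-1}X'Y$ and the scalar sample-moment form $\E_n(xx')^{-1}\E_n(xy)$ give the same estimate. The design (a constant plus two standard-normal regressors, standard-normal errors) is an arbitrary illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design: n observations, k = 3 regressors (constant + 2 covariates)
n, k = 500, 3
beta = np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
e = rng.normal(size=n)                       # E[e | x] = 0 by construction
Y = X @ beta + e

# Matrix form: beta_hat = (X'X)^{-1} X'Y
beta_matrix = np.linalg.solve(X.T @ X, X.T @ Y)

# Scalar (sample-moment) form: E_n(x x')^{-1} E_n(x y)
Exx = (X.T @ X) / n
Exy = (X.T @ Y) / n
beta_moment = np.linalg.solve(Exx, Exy)

print(beta_matrix)
print(np.allclose(beta_matrix, beta_moment))  # True: the two forms coincide
```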
### Consistency

For the matrix form,
\begin{align}
\hat{\beta} &= (X'X)^{-1}X'Y = (X'X)^{-1}X'(X \beta + E)\\
&= \beta + (X'X)^{-1} X'E\\
&= \beta + \left(\frac{1}{n} \sum_{i=1}^n x_i x_i' \right)^{-1} \frac{1}{n} \sum_{i=1}^n x_i e_i \\
&= \beta +\E_n(x x')^{-1} \E_n(x e)
\end{align}
By the WLLN,
$$ \E_n(x x') \to_p \E(x x'), \quad \E_n(x e) \to_p \E(xe). $$
By the LIE,
$$ \E[xe] = \E [\E[xe|x] ] = \E [x\E[e|x]] = \E[0] = 0 $$
Applying Slutsky's theorem, $\E_n(x x')^{-1} \E_n(x e) \to_p \E(x x')^{-1} \cdot 0 = 0$, so $\hat{\beta} \to_p \beta$.

The proof for the scalar form is similar,
\begin{align}
\hat{\beta} &=\E_n(x x')^{-1} \E_n(x y) \\
&= \E_n(x x')^{-1} \E_n(x (x' \beta + e))\\
&= \beta + \E_n(x x')^{-1} \E_n(x e),
\end{align}
and the same argument gives the same result.

### Normality

Under standard regularity conditions, the OLS estimator has the asymptotic distribution
$$ \sqrt{n}(\hat{\beta} - \beta) \to_d \mathcal{N}(0, V), $$
where
$$ V = \E[xx']^{-1} \E[xx' e^2] \E[xx']^{-1}. $$
If we assume the error term is homoskedastic, $V(E) = \sigma^2 I$ (or $\operatorname{cov}(xx', e^2)=0$), this simplifies to
$$ V = \sigma^2 \E[xx']^{-1}. $$

# Test

## Wald Statistic

Suppose we have estimated a $k \times 1$ vector $\hat{\beta}$ from $n$ observations. By the CLT, we have
$$ (\hat{\beta} - \beta) \to_d \mathcal{N}(0, \Sigma). $$
Let $H$ be a $p \times k$ matrix with rank $p$. We want to test
$$ H_0: H \beta = \theta\\ H_1: H \beta \neq \theta $$
If the null hypothesis is true, we have
$$ W = (H\hat{\beta} - H \beta)' (H \Sigma H')^{-1}(H\hat{\beta} - H \beta) \sim \chi^2(p) $$

## Zero Null Subvector Test

### General Cases

If the null hypothesis is that several parameters are zero, we call it a zero null subvector test. Suppose $k_2$ parameters are set to zero under the null. In this case,
$$ W = (n-k) \frac{R^2 - R^{2*}}{1-R^2} \sim \chi^2(k_2), $$
where $R^{2*}$ is the R-squared from the restricted regression.

### All Slopes Zero

If we want to test the null hypothesis that all coefficients except the constant are zero, the statistic becomes
$$ W = (n-k) \frac{R^2 - 0}{1-R^2} = (n-k) \frac{R^2}{1-R^2} \sim \chi^2(k-1). $$

:::info
Goldberger uses the F distribution instead,
$$ V= \frac{n-k}{k_2} \frac{R^2 - R^{2*}}{1-R^2} \sim F(k_2,n-k), $$
$$ V = \frac{n-k}{k-1} \frac{R^2 - 0}{1-R^2} = \frac{n-k}{k-1} \frac{R^2}{1-R^2} \sim F(k-1, n-k). $$
:::
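As an illustration (my addition), the sketch below computes the Wald statistic $W = (H\hat{\beta} - \theta)'(H \Sigma H')^{-1}(H\hat{\beta} - \theta)$ for a linear restriction and its $\chi^2(p)$ p-value. The function name `wald_test` and the numerical inputs ($\hat{\beta}$, $\hat{\Sigma}$, $H$, $\theta$) are hypothetical, chosen only to show the mechanics.

```python
import numpy as np
from scipy import stats

def wald_test(beta_hat, Sigma_hat, H, theta):
    """Wald statistic for H0: H beta = theta, H a p x k full-row-rank matrix.

    Sigma_hat is the estimated covariance matrix of beta_hat.
    Returns the statistic W and its chi-square(p) p-value.
    """
    diff = H @ beta_hat - theta
    W = diff @ np.linalg.solve(H @ Sigma_hat @ H.T, diff)
    p = H.shape[0]
    return W, stats.chi2.sf(W, df=p)

# Hypothetical inputs: k = 3 coefficients, test beta_2 = beta_3 = 0 (p = 2 restrictions)
beta_hat = np.array([1.2, 0.4, -0.1])
Sigma_hat = np.diag([0.04, 0.02, 0.03])
H = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
theta = np.zeros(2)

W, pval = wald_test(beta_hat, Sigma_hat, H, theta)
print(W, pval)
```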
# GMM

The following is based on Bruce Hansen, CH 13.

## Definition

Suppose $(y, x)$ has a joint distribution indexed by the parameter $\beta$, and we know there is a well-behaved function $g$ such that
$$ \E[g(y,x; \beta)]=0. $$
Then we can use this moment condition to estimate $\beta$ (under standard regularity conditions). The sample analog is
$$ \bar{g}_n(\beta)= \frac{1}{n} \sum_{i=1}^n g(y_i,x_i; \beta) $$
The GMM estimator is
$$ \hat{\beta}_{GMM} = \arg \min_\beta n \bar{g}_n(\beta)' \Delta \bar{g}_n(\beta), $$
where $\Delta$ is a positive-definite weighting matrix.

## Limit Distribution

### General Formula

Let
$$ \Omega = \E \left[ \pd{g(y,x; \beta)}{\beta} \right], \quad \Sigma = \E[g(y,x; \beta) g(y,x; \beta)'] $$
Under standard regularity conditions,
$$ \sqrt{n} (\hat{\beta}_{GMM} - \beta) \to_d \mathcal{N}(0, V), $$
where
$$ V = (\Omega' \Delta \Omega)^{-1} \Omega' \Delta \Sigma \Delta \Omega (\Omega' \Delta \Omega)^{-1} $$

### Special Cases

Let $J$ and $K$ be the dimensions of $g$ and $\beta$, respectively.

#### $J=K$

The inverse of $\Omega$ exists and the limiting variance becomes
$$ V= \Omega^{-1} \Sigma (\Omega')^{-1} $$

#### Efficient Weighting Matrix

We can achieve the smallest variance by choosing $\Delta = \Sigma^{-1}$; the limiting variance then becomes
$$ V= (\Omega' \Sigma^{-1} \Omega)^{-1} $$

### Examples

Most of the estimators covered in the lecture are GMM estimators.

#### MLE

Suppose $f(y, x; \beta)$ is the pdf of the joint distribution. MLE is just GMM with the score as the moment condition,
$$ g(y,x; \beta) = \pd{\log f(y, x; \beta)}{\beta} $$
Under standard regularity conditions, MLE is asymptotically efficient. The limit distribution is
$$ \sqrt{n}(\hat{\beta}-\beta) \to_d \mathcal{N}(0,(\Omega' \Sigma^{-1} \Omega)^{-1} ) = \mathcal{N}(0, \Sigma^{-1} ), $$
where $\Sigma = \mathcal{I}$ is the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information); the simplification uses the information matrix equality ($\Omega = -\Sigma$). The efficiency property is based on the [Cramér–Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound).

#### NLLS

Suppose $\E[y|x] = m(x; \beta)$. The squared-error loss is
$$ \E [(y - m(x; \beta))^2|x] $$
The first-order condition is
$$ \E\left[ 2 (y - m(x; \beta)) \left(- \pd{m(x; \beta)}{\beta}\right) \Big| x\right] =0, $$
so we have the following moment condition,
$$ g(y,x; \beta) = (y - m(x; \beta)) \pd{m(x; \beta)}{\beta} $$

#### MOM

Let $u = y - \E[y|x]$. We can estimate $\beta$ by finding some instrument $v$ that satisfies
$$ g(y,x; \beta) = v'(y - \E[y|x]) = v'u, \quad \E[v'u] =0 $$

#### Linear IV

Suppose $z_i$ is an $l \times 1$ vector of instruments, $y_i$ is a scalar, and $x_i$ is a $k \times 1$ vector of regressors, with the moment condition
$$ g_i(\beta) = z_i(y_i-x_i' \beta), \quad \E[g_i(\beta)]=0 $$
Let $Z, Y, X$ be the matrices and vector stacking the $n$ observations. The GMM estimator is
\begin{align}
\hat{\beta}_{GMM} &= \arg \min_\beta J(\beta) \\
&= \arg \min_\beta n (Z'Y - Z'X \beta)' \Delta (Z'Y - Z'X \beta)
\end{align}
By the first-order condition, we have
$$ \hat{\beta}_{GMM}=(X'Z \Delta Z' X)^{-1} (X'Z \Delta Z' Y) $$

#### Just-identified

In this case ($l = k$, so $Z'X$ is square and invertible), the estimator does not depend on $\Delta$ and reduces to
$$ \hat{\beta}_{GMM}= (Z'X)^{-1}Z'Y = \hat{\beta}_{IV} $$

#### Over-identified

If we choose the weighting matrix $\Delta = (Z'Z)^{-1}$, we obtain
$$ \hat{\beta}_{GMM}= \left(X'Z (Z'Z)^{-1} Z' X\right)^{-1} X'Z (Z'Z)^{-1} Z' Y = \hat{\beta}_{2SLS}, $$
the two-stage least squares (2SLS) estimator.
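To make the linear IV formulas concrete, here is a small NumPy sketch (my addition, not from Hansen) that evaluates $\hat{\beta}_{GMM}=(X'Z \Delta Z' X)^{-1} (X'Z \Delta Z' Y)$ with the weighting matrix $\Delta = (Z'Z)^{-1}$, i.e. 2SLS, on simulated data. The data-generating process (one endogenous regressor, two excluded instruments) is an arbitrary assumption for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical DGP: one endogenous regressor x, two excluded instruments
z = rng.normal(size=(n, 2))                      # excluded instruments
u = rng.normal(size=n)
e = 0.8 * u + rng.normal(size=n)                 # error correlated with x -> endogeneity
x = z @ np.array([1.0, 0.5]) + u                 # first stage
X = np.column_stack([np.ones(n), x])             # k = 2 (constant + endogenous regressor)
Z = np.column_stack([np.ones(n), z])             # l = 3 (constant + two instruments)
beta = np.array([0.5, 1.5])
Y = X @ beta + e

# GMM with weighting matrix Delta = (Z'Z)^{-1} (over-identified case -> 2SLS)
Delta = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ Delta @ Z.T @ X
b = X.T @ Z @ Delta @ Z.T @ Y
beta_gmm = np.linalg.solve(A, b)

print(beta_gmm)   # close to [0.5, 1.5]; plain OLS would be biased here
```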
# Special Model

## Probit Model

We observe only $(y, x)$ and want to estimate $\beta$. The value of $y$ is determined by
$$ y = \begin{cases} 1 & \text{ if } x\beta + u >0\\ 0 & \text{ if } x\beta + u \le 0 \end{cases} $$
where $u$ is an error term following $u|x \sim \mathcal{N}(0,1)$. Thus,
$$ \E[y|x] = P(y=1|x)=P(u>-x\beta) = \Phi(x \beta). $$
With the random sample $\{y_i, x_i\}_{i=1}^n$, the pdf/likelihood is
$$ L(y, x; \beta) = \prod_{i=1}^n \Phi(x_i \beta)^{y_i} (1-\Phi(x_i \beta))^{1-y_i} $$
The log-likelihood is
$$ l(y,x; \beta) = \log L(y, x; \beta) = \sum_{i=1}^n \left \{ {y_i} \log \Phi(x_i \beta) + (1-y_i) \log (1-\Phi(x_i \beta)) \right \} $$
Hence, we can estimate $\beta$ by MLE,
$$ \hat{\beta}_{MLE} = \arg \max_{\beta} l(y,x; \beta) $$

## Tobit Model

The original model is
$$ y^* = x \beta + u, \quad u|x \sim \mathcal{N}(0, \sigma^2). $$
However, we observe only $y$, given by
$$ y = \begin{cases} y^* & \text{ if } y^* >c\\ c & \text{ if } y^* \le c \end{cases} $$
The density/probability of $y$ depends on its value,
$$ f(y|x) = \begin{cases} 0 & \text{ if } y <c\\ 1-\Phi(\frac{x\beta - c}{\sigma}) & \text{ if } y=c \\ \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) & \text{ if } y>c \\ \end{cases} $$
so the likelihood/pdf is
$$ f(y|x) = \left[ 1-\Phi(\frac{x\beta - c}{\sigma}) \right]^{1(y=c)} \left[ \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) \right]^{1(y>c)} $$

## Truncation

The original model is
$$ y^* = x \beta + u, \quad u|x \sim \mathcal{N}(0, \sigma^2). $$
However, we observe only $y$, where
$$ y = y^* \text{ if } y^* \ge c, $$
and observations with $y^* < c$ do not appear in the sample. The likelihood function is
\begin{align}
f(y|x) &= f(y^*|x, y^*\ge c)\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) P(y^*\ge c)^{-1}\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma})\left[1-\Phi(\frac{c-x\beta}{\sigma})\right]^{-1}\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma})\left[\Phi(\frac{x\beta-c}{\sigma})\right]^{-1}
\end{align}
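As a closing illustration (my addition), the sketch below codes the truncated-normal log-likelihood from the last display and maximizes it numerically with `scipy.optimize.minimize`. The simulated design, the truncation point $c = 0$, and the log-variance parameterization are assumptions made only for this example.

```python
import numpy as np
from scipy import optimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Hypothetical DGP: y* = b0 + b1*x + u, u ~ N(0, sigma^2); keep only y* >= c
b_true, sigma_true, c = np.array([1.0, 2.0]), 1.5, 0.0
x = rng.normal(size=5000)
y_star = b_true[0] + b_true[1] * x + sigma_true * rng.normal(size=5000)
keep = y_star >= c                       # truncation: other units are never sampled
y, x = y_star[keep], x[keep]
X = np.column_stack([np.ones(x.size), x])

def neg_loglik(params):
    """Negative truncated-normal log-likelihood:
    log f(y|x) = log phi((y - x'b)/sigma) - log sigma - log Phi((x'b - c)/sigma)."""
    b, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)            # parameterize sigma > 0 via its log
    resid = (y - X @ b) / sigma
    ll = norm.logpdf(resid) - log_sigma - norm.logcdf((X @ b - c) / sigma)
    return -ll.sum()

start = np.array([0.0, 0.0, 0.0])        # (b0, b1, log sigma)
res = optimize.minimize(neg_loglik, start, method="BFGS")
b_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print(b_hat, sigma_hat)                  # should be near [1.0, 2.0] and 1.5
```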