---
tags: metrics, group
---
Formulas in Econometrics
===
$$
% My definitions
\def\ve{{\varepsilon}}
\def\dd{{\text{ d}}}
\def\E{{\mathbb{E}}}
\newcommand{\dif}[2]{\frac{d #1}{d #2}} % for derivatives
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}} % for partial derivatives
\def\R{\text{R}}
$$
# Conditional Distribution
## Conditional Expectation
$$ \operatorname{E} (X\mid Y)=f(Y).$$
It is important that a conditional expectation/variance is a function of a random variable, not (necessarily) a constant.
## Law of Iterated Expectations
$$ \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y))$$
## Conditional Variance
$$ \operatorname {Var} (Y|X)=\operatorname {E} {\Big (}{\big (}Y-\operatorname {E} (Y\mid X){\big )}^{2}\mid X{\Big )} = \operatorname {E} (Y^{2}|X)-\operatorname {E} (Y|X)^{2}.$$
### Conditional and Unconditional Variance
$$ \operatorname {Var} (Y)=\operatorname {E} (\operatorname {Var} (Y\mid X))+\operatorname {Var} (\operatorname {E} (Y\mid X)).$$
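A quick Monte Carlo check of this decomposition (a Python sketch with simulated data; the model $Y = X + |X|\,\varepsilon$ is an illustrative choice for which $\E[Y|X]=X$ and $\operatorname{Var}(Y\mid X)=X^2$):
```python
import numpy as np

# Monte Carlo check of Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) for the toy model
# Y = X + |X| * eps with X ~ N(0,1), eps ~ N(0,1) independent,
# so that E[Y|X] = X and Var(Y|X) = X^2, and both sides equal 2.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.standard_normal(n)
eps = rng.standard_normal(n)
y = x + np.abs(x) * eps

lhs = y.var()                    # Var(Y)
rhs = np.mean(x**2) + x.var()    # E[Var(Y|X)] + Var(E[Y|X])
print(lhs, rhs)                  # both close to 2
```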
# Asymptotic Theory
## Weak Law of Large Numbers (WLLN)
Suppose $Y_1, \dots, Y_n$ are iid copies of a random variable $Y$ with $\E[Y]=\mu$. Then the sample average converges to the expectation in probability,
$$
\frac{1}{n} \sum_{i=1}^n Y_i = \bar{Y}_n \to_p \E[Y]=\mu
$$
## Continuous Mapping Theorem (CMT)
1. If $X_n \to_p X$, and $g(\cdot)$ is some continuous function, then $g(X_n) \to_p g(X)$.
2. If $X_n \to_d X$, and $g(\cdot)$ is some continuous function, then $g(X_n) \to_d g(X)$.
## Central Limit Theorem (CLT)
Suppose $Y_1, \dots, Y_n$ are iid copies of a random variable $Y$ with $\E[Y]=\mu$ and $\operatorname{Var}[Y] = \sigma^2 < \infty$. The sample average converges to $\mu$ by the WLLN.
If we scale its deviation from $\mu$ by $\sqrt{n}$, we have
$$
\sqrt{n} (\bar{Y} - \mu) \to_d \mathcal{N} (0, \sigma^2)
$$
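A small simulation sketch (Python/numpy; the Exponential(1) distribution is an illustrative choice with $\mu = \sigma^2 = 1$) shows both results at once: the sample average settles at $\mu$, while $\sqrt{n}(\bar{Y}_n - \mu)$ keeps a variance near $\sigma^2$ for every $n$.
```python
import numpy as np

# WLLN and CLT by simulation for Y ~ Exponential(1), so mu = 1 and sigma^2 = 1.
rng = np.random.default_rng(0)
mu = 1.0

for n in (10, 100, 1_000):
    ybar = rng.exponential(1.0, size=(5_000, n)).mean(axis=1)
    # WLLN: the sample averages concentrate around mu as n grows.
    # CLT: sqrt(n) * (ybar - mu) keeps a variance near sigma^2 = 1 for every n.
    print(n, ybar.mean(), (np.sqrt(n) * (ybar - mu)).var())
```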
## Delta method
### Univariate
If we have
$$
\sqrt{n} (Y_n - \theta) \to_d \mathcal{N} (0, \sigma^2)
$$
and $g$ is a function such that $g'(\theta)$ exists and is not $0$, then we have
$$
\sqrt{n} (g(Y_n) - g(\theta)) \to_d \mathcal{N} (0, g'(\theta)^2\sigma^2)
$$
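A Monte Carlo sketch (Python/numpy; $g(y)=y^2$ and Exponential(1) data are illustrative choices, so the predicted limiting variance is $g'(\mu)^2\sigma^2 = 4$):
```python
import numpy as np

# Delta method check: Y ~ Exponential(1) (mu = 1, sigma^2 = 1) and g(y) = y^2,
# so the predicted limiting variance is g'(mu)^2 * sigma^2 = (2 * 1)^2 * 1 = 4.
rng = np.random.default_rng(0)
n, reps = 1_000, 10_000
ybar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

stat = np.sqrt(n) * (ybar**2 - 1.0)   # sqrt(n) * (g(Ybar_n) - g(mu))
print(stat.var())                     # close to 4
```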
### Multivariate
If we have
$$
\sqrt{n} (Y_n - \theta) \to_d \mathcal{N} (0, \Sigma)
$$
and $h$ is a function such that $\nabla h(\theta)$ exists and is not $0$, then we have
$$
\sqrt{n} (h(Y_n) - h(\theta)) \to_d \mathcal{N} (0, \nabla h(\theta)^T \Sigma \nabla h(\theta))
$$
# Least Squares
## Ordinary Least Squares (OLS)
Consider the following model,
$$
y = x'\beta + e,
$$
where $y$ and $e$ are scalars, and $x$ and $\beta$ are $k \times 1$ vectors. Assume $\E[e|x]=0$ and $\E(x x')$ has full rank.
Suppose we have $n$ iid observations $\{y_i, x_i\}_{i=1}^n$ that follow the above model, and we want to estimate $\beta$.
### Matrix form
Let
$$
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n\end{pmatrix}, \quad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n'\end{pmatrix}, \quad
E = \begin{pmatrix} e_1 \\ \vdots \\ e_n\end{pmatrix}, \quad
X' = \begin{pmatrix} x_1 & \cdots & x_n\end{pmatrix},
$$
The above model can be written as
$$
Y = X \beta + E.
$$
The OLS estimator is
$$
\hat{\beta} = (X'X)^{-1}X'Y
$$
### Scalar form
Define
$$
\E_n(x x') = \frac{1}{n} \sum_{i=1}^n x_i x_i', \E_n(x y) = \frac{1}{n} \sum_{i=1}^n x_i y_i
$$
The OLS estimator can be written as
$$
\hat{\beta} = \E_n(x x')^{-1} \E_n(x y),
$$
which is equivalent to the matrix form.
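A minimal numpy sketch with simulated data (the coefficient values are illustrative) confirms that the matrix formula and the sample-moment formula give the same $\hat{\beta}$:
```python
import numpy as np

# Simulate y_i = x_i' beta + e_i with k = 3 (a constant plus two regressors).
rng = np.random.default_rng(0)
n, beta = 5_000, np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
Y = X @ beta + rng.standard_normal(n)

# Matrix form: (X'X)^{-1} X'Y (solving the normal equations directly).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Scalar/moment form: E_n(x x')^{-1} E_n(x y); identical up to rounding.
Exx = X.T @ X / n
Exy = X.T @ Y / n
beta_hat_moment = np.linalg.solve(Exx, Exy)

print(beta_hat, beta_hat_moment)
```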
### Consistency
For matrix form,
\begin{align}
\hat{\beta} &= (X'X)^{-1}X'Y = (X'X)^{-1}X'(X \beta + E)\\
&= \beta + (X'X)^{-1} X'E\\
&= \beta + \left(\frac{1}{n} \sum_{i=1}^n x_i x_i' \right)^{-1} \frac{1}{n} \sum_{i=1}^n x_i e_i \\
&= \beta +\E_n(x x')^{-1} \E_n(x e)
\end{align}
By the WLLN,
$$
\E_n(x x') \to_p \E(x x'), \quad \E_n(x e) \to_p \E(xe),
$$
By LIE,
$$
\E[xe] = \E [\E[xe|x] ] = \E [x\E[e|x]] = \E[0] = 0
$$
Applying Slutsky's theorem, $\E_n(x x')^{-1} \E_n(x e) \to_p \E(x x')^{-1} \cdot 0 =0$, so $\hat{\beta} \to_p \beta$.
The proof of scalar form is similar,
\begin{align}
\hat{\beta} &=\E_n(x x')^{-1} \E_n(x y) \\
&= \E_n(x x')^{-1} \E_n(x (x' \beta + e))\\
&= \beta + \E_n(x x')^{-1} \E_n(x e),
\end{align}
and we can get the same result from the above argument.
### Asymptotic Normality
Under some regularity conditions, the OLS estimator has the following asymptotic distribution,
$$
\sqrt{n}(\hat{\beta} - \beta) \to_d \mathcal{N}(0, V),
$$
where
$$
V = \E[xx']^{-1} \E[xx' e^2] \E[xx']^{-1}.
$$
If we assume homoskedasticity of the error term, $\E[e^2 \mid x] = \sigma^2$ (so that $\operatorname{Var}(E \mid X) = \sigma^2 I$), or more generally that $\operatorname{cov}(xx', e^2)=0$, we have
$$
V = \E[xx']^{-1} \sigma^2
$$
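Both variance formulas can be estimated by plugging sample moments and OLS residuals into the expressions above. A sketch (simulated data with heteroskedastic errors, chosen so the two estimates differ):
```python
import numpy as np

# Plug-in estimates of the sandwich (robust) and homoskedastic variance formulas.
rng = np.random.default_rng(0)
n, beta = 5_000, np.array([1.0, 2.0, -0.5])
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
e = np.abs(X[:, 1]) * rng.standard_normal(n)     # error variance depends on x
Y = X @ beta + e

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

Exx_inv = np.linalg.inv(X.T @ X / n)             # E_n[x x']^{-1}
meat = (X * resid[:, None] ** 2).T @ X / n       # E_n[x x' e^2]
V_robust = Exx_inv @ meat @ Exx_inv              # sandwich formula
V_homosk = Exx_inv * np.mean(resid ** 2)         # valid only under homoskedasticity

# Standard errors of beta_hat are sqrt(diag(V) / n).
print(np.sqrt(np.diag(V_robust) / n))
print(np.sqrt(np.diag(V_homosk) / n))
```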
# Test
## Wald Statistic
Suppose we have estimated a $k \times 1$ vector $\hat{\beta}$ from $n$ observations. By the CLT, we have
$$
(\hat{\beta} - \beta) \to_d \mathcal{N}(0, \Sigma)
$$
Let $H$ be a $p \times k$ matrix with rank $p$. We want to test
$$
H_0: H \beta = \theta\\
H_1: H \beta \neq \theta\\
$$
If the null hypothesis is true, we have
$$
W = (H\hat{\beta} - H \beta)' (H \Sigma H')^{-1}(H\hat{\beta} - H \beta) \sim \chi^2(p)
$$
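A sketch of the computation (Python with scipy; the estimate, covariance, and restriction matrix below are purely illustrative):
```python
import numpy as np
from scipy.stats import chi2

# Wald test of H0: H beta = theta given an estimate and its covariance.
# All numbers below are purely illustrative.
beta_hat = np.array([0.9, 2.1, -0.4])
Sigma = np.diag([0.04, 0.05, 0.03])      # covariance of beta_hat
H = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # restrict the last two coefficients
theta = np.array([2.0, 0.0])             # hypothesized values under H0

diff = H @ beta_hat - theta
W = diff @ np.linalg.solve(H @ Sigma @ H.T, diff)
p_value = chi2.sf(W, df=H.shape[0])      # compare W with chi^2(p), p = rank(H)
print(W, p_value)
```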
## Zero Null Subvector Test
### General Cases
If the null hypothesis is that several parameters are zero, we call it a zero null subvector test. Suppose $k_2$ parameters are restricted to zero under the null hypothesis. In this case,
$$
W = \frac{n-k}{1} \frac{R^2 - R^{2*}}{(1-R^2)} \sim \chi^2(k_2) ,
$$
where $R^{2*}$ is the R-squared from the restricted regression.
### All Slopes Zero
If we want to test the null hypothesis that all coefficients except the constant are zero, the statistic becomes
$$
W = \frac{n-k}{1} \frac{R^2 - 0}{(1-R^2)} = \frac{n-k}{1} \frac{R^2 }{(1-R^2)} \sim \chi^2(k-1) ,
$$
:::info
Goldberger uses the $F$ distribution instead,
$$
V= \frac{n-k}{k_2} \frac{R^2 - R^{2*}}{(1-R^2)} \sim F(k_2,n-k) ,
$$
and for the all-slopes-zero case,
$$
V = \frac{n-k}{k-1} \frac{R^2 - 0}{(1-R^2)} = \frac{n-k}{k-1} \frac{R^2 }{(1-R^2)} \sim F(k-1, n-k) ,
$$
:::
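A sketch of both versions of the test computed from $R^2$ values (the sample size and $R^2$ numbers are illustrative):
```python
from scipy.stats import chi2, f

# Zero-null-subvector test from R^2 values (illustrative numbers).
n, k, k2 = 200, 5, 2            # sample size, regressors incl. constant, restrictions
R2, R2_star = 0.40, 0.35        # unrestricted and restricted R-squared

W = (n - k) * (R2 - R2_star) / (1 - R2)      # Wald form, compare with chi^2(k2)
F = W / k2                                   # Goldberger's F form, F(k2, n - k)
print(W, chi2.sf(W, df=k2))
print(F, f.sf(F, dfn=k2, dfd=n - k))
```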
# GMM
The following is based on Bruce Hansen, CH 13.
## Definition
Suppose $(y, x)$ has a joint distribution indexed by the parameter $\beta$, and we know there is a well-behaved function $g$ such that
$$
\E[g(y,x; \beta)]=0.
$$
Then we can use this moment condition to estimate $\beta$ (under some regularity conditions).
The sample analog is
$$
\bar{g}_n(\beta)= \frac{1}{n} \sum_{i=1}^n g(y_i,x_i; \beta)
$$
The GMM estimator is
$$
\hat{\beta}_{GMM} = \arg \min_\beta n \bar{g}_n(\beta)' \Delta \bar{g}_n(\beta),
$$
where $\Delta$ is a positive-definite weighting matrix.
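A sketch of the definition (Python with scipy.optimize; the linear moment $g(y,x;\beta) = x(y - x'\beta)$ and $\Delta = I$ are illustrative choices, in which case the minimizer is just OLS):
```python
import numpy as np
from scipy.optimize import minimize

# GMM with the linear moment g(y, x; b) = x (y - x'b) and Delta = I.
rng = np.random.default_rng(0)
n, beta = 2_000, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = X @ beta + rng.standard_normal(n)
Delta = np.eye(2)

def gbar(b):
    # Sample analog of E[g(y, x; b)].
    return X.T @ (Y - X @ b) / n

def J(b):
    g = gbar(b)
    return n * g @ Delta @ g

res = minimize(J, x0=np.zeros(2), method="BFGS")
print(res.x)    # with this moment, the minimizer coincides with OLS
```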
## Limit Distribution
### General Formula
Let
$$
\Omega = \E [ \pd{g(y,x; \beta)}{\beta}], \Sigma = \E[g(y,x; \beta) g(y,x; \beta)']
$$
Under some regularity conditions,
$$
\sqrt{n} (\hat{\beta}_{GMM} - \beta) \to_d \mathcal{N}(0, V),
$$
where
$$
V = (\Omega' \Delta \Omega)^{-1} \Omega' \Delta \Sigma \Delta \Omega (\Omega' \Delta \Omega)^{-1}
$$
### Special Cases
Let $J$ and $K$ be the dimensions of $g$ and $\beta$, respectively.
#### $J=K$
When $J = K$ (just identified), $\Omega$ is square and invertible, and the limiting variance becomes
$$
V= \Omega^{-1} \Sigma \Omega^{-1}
$$
#### Efficient Weighting Matrix
We can achieve the smallest limiting variance by choosing $\Delta = \Sigma^{-1}$; the limiting variance then becomes
$$
V= (\Omega' \Sigma^{-1} \Omega)^{-1}
$$
### Examples
Most of the estimators covered in the lecture are GMM estimators.
#### MLE
Suppose $f(y, x; \beta)$ is the pdf of the joint distribution. MLE is just GMM with the following moment condition (the score),
$$
g(y,x; \beta) = \pd{\log f(y, x; \beta)}{\beta}
$$
Under some regularity conditions, MLE attains the [Cramér–Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound) asymptotically, i.e. it is asymptotically efficient. The limit distribution is
$$
\sqrt{n}(\hat{\beta}-\beta) \to_d \mathcal{N}(0,(\Omega' \Sigma^{-1} \Omega)^{-1} ) = \mathcal{N}(0, \Sigma^{-1} ),
$$
where $\Sigma = \mathcal{I}$ is the [Fisher information](https://en.wikipedia.org/wiki/Fisher_information) and, by the information equality, $\Omega = -\mathcal{I}$, so the variance collapses to $\mathcal{I}^{-1}$.
#### NLLS
Suppose $\E[y|x] = m(x; \beta)$. The expected squared error is
$$
\E [(y - m(x; \beta))^2|x]
$$
The first-order condition is
$$
\E[ 2 (y - m(x; \beta)) (- \pd{m(x; \beta)}{\beta})|x] =0
$$
so we have the following moment condition,
$$
g(y,x; \beta) = (y - m(x; \beta)) \pd{m(x; \beta)}{\beta}
$$
#### MOM
Let $u = y - \E[y|x]$. We can estimate $\beta$ by finding some $v$ (a function of $x$) that satisfies
$$
g(y,x; \beta) = v'(y - \E[y|x]) = v'u, \E[v'u] =0
$$
#### Linear IV
Suppose $z_i$ is an $l \times 1$ vector of instrumental variables, $y_i$ is a scalar, and $x_i$ is a $k \times 1$ vector of regressors. We have the following moment condition,
$$
g_i(\beta) = z_i(y_i-x_i' \beta), \quad \E[g_i(\beta)]=0
$$
Let $Z$, $X$, and $Y$ be the matrices and the vector stacked from the $n$ observations. The GMM estimator is
\begin{align}
\hat{\beta}_{GMM} &= \arg \min_\beta J(\beta) \\
&= \arg \min_\beta \frac{1}{n} (Z'Y - Z'X \beta)' \Delta (Z'Y - Z'X \beta)
\end{align}
By the first-order condition, we have
$$
\hat{\beta}_{GMM}=(X'Z \Delta Z' X)^{-1} (X'Z \Delta Z' Y)
$$
#### Just-identified
In this case ($l = k$, so $Z'X$ is square and invertible),
$$
\hat{\beta}_{GMM}= (Z'X)^{-1}Z'Y = \hat{\beta}_{IV}
$$
#### Over-identified
If we choose the weighting matrix $\Delta = (Z'Z)^{-1}$, we have
$$
\hat{\beta}_{GMM}= (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'Y = \hat{\beta}_{2SLS}
$$
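A numpy sketch with a simulated endogenous regressor (the data-generating values are illustrative) compares the just-identified IV formula with the 2SLS/GMM formula under $\Delta = (Z'Z)^{-1}$:
```python
import numpy as np

# Linear IV: one endogenous regressor, l = 3 instruments (incl. constant), k = 2.
rng = np.random.default_rng(0)
n = 10_000
Z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
v = rng.standard_normal(n)
x_endog = Z[:, 1] + Z[:, 2] + v                  # regressor correlated with the error
e = 0.8 * v + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x_endog])
Y = X @ np.array([1.0, 0.5]) + e

# Over-identified case with Delta = (Z'Z)^{-1}: the 2SLS estimator.
Delta = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ Delta @ Z.T @ X
b = X.T @ Z @ Delta @ Z.T @ Y
beta_2sls = np.linalg.solve(A, b)

# Just-identified case: keep only two instruments so Z'X is square.
Zj = Z[:, :2]
beta_iv = np.linalg.solve(Zj.T @ X, Zj.T @ Y)

print(beta_2sls, beta_iv)    # both close to (1.0, 0.5); OLS would be inconsistent here
```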
# Special Models
## Probit Model
We can only observe $(y, x)$, and we want to estimate $\beta$. The value of $y$ is determined by the following equation,
$$
y = \begin{cases} 1 & \text{ if } x\beta + u >0\\
0 & \text{ if } x\beta + u \le 0
\end{cases}
$$
where $u$ is an error term following $u|x \sim \mathcal{N}(0,1)$.
Thus,
$$
\E[y|x] = P(y=1|x)=P(u>-x\beta) = \Phi(x \beta).
$$
With the random sample $\{y_i, x_i\}_{i=1}^n$, the pdf/likelihood is
$$
L(y, x; \beta) = \prod_{i=1}^n \Phi(x_i \beta)^{y_i} (1-\Phi(x_i \beta))^{1-y_i}
$$
The log-likelihood is
$$
l(y,x; \beta) = \log L(y, x; \beta) = \sum_{i=1}^n \left \{ {y_i} \log \Phi(x_i \beta) + (1-y_i) \log (1-\Phi(x_i \beta)) \right \}
$$
Hence, we can estimate $\beta$ by MLE,
$$
\hat{\beta}_{MLE} = \arg \max_{\beta} l(y,x; \beta)
$$
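A sketch of the estimation (Python with scipy; simulated data, with `norm.logcdf` used instead of `log(norm.cdf(...))` for numerical stability):
```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Probit MLE by direct maximization of the log-likelihood above.
rng = np.random.default_rng(0)
n, beta = 5_000, np.array([0.5, 1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = (X @ beta + rng.standard_normal(n) > 0).astype(float)

def neg_loglik(b):
    xb = X @ b
    # log Phi(xb) when y = 1 and log(1 - Phi(xb)) = log Phi(-xb) when y = 0.
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)    # close to (0.5, 1.0)
```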
## Tobit Model
The original model is
$$
y^* = x \beta + u, u|x \sim \mathcal{N}(0, \sigma^2)
$$
However, we can only observe $y$, which is based on
$$
y = \begin{cases} y^* & \text{ if } y^* >c\\
c & \text{ if } y^* \le c
\end{cases}
$$
The pdf/probability mass of $y$, depending on its value, is
$$
f(y|x) = \begin{cases} 0 & \text{ if } y <c\\
1-\Phi(\frac{x\beta - c}{\sigma}) & \text{ if } y=c \\
\frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) & \text{ if } y>c \\
\end{cases}
$$
The likelihood/pdf is
$$
f(y|x) = \left[ 1-\Phi(\frac{x\beta - c}{\sigma}) \right]^{1(y=c)} \left[ \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) \right]^{1(y>c)}
$$
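A sketch of Tobit MLE built directly from the two cases of the likelihood (simulated data, censoring at $c=0$; $\sigma$ is estimated on the log scale to keep it positive):
```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Tobit MLE with censoring from below at c = 0.
rng = np.random.default_rng(0)
n, beta, sigma, c = 5_000, np.array([0.5, 1.0]), 1.0, 0.0
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y_star = X @ beta + sigma * rng.standard_normal(n)
y = np.maximum(y_star, c)                    # censored observations pile up at c

def neg_loglik(params):
    b, log_s = params[:-1], params[-1]
    s = np.exp(log_s)                        # keep sigma positive
    xb = X @ b
    censored = y <= c
    ll_cens = norm.logcdf((c - xb) / s)      # log P(y = c|x) = log[1 - Phi((xb - c)/s)]
    ll_obs = norm.logpdf((y - xb) / s) - log_s   # log[(1/s) phi((y - xb)/s)]
    return -np.sum(np.where(censored, ll_cens, ll_obs))

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[2]))           # close to (0.5, 1.0) and 1.0
```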
## Truncation
The original model is
$$
y^* = x \beta + u, u|x \sim \mathcal{N}(0, \sigma^2)
$$
However, we only observe $y$ when the latent variable exceeds the threshold,
$$
y = y^* \quad \text{if } y^* \ge c,
$$
and observations with $y^* < c$ do not appear in the sample.
The likelihood function is
\begin{align}
f(y|x) &= f(y^*|x, y^*\ge c)\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma}) P(y^*\ge c \mid x)^{-1}\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma})[1-\Phi(\frac{c-x\beta}{\sigma})]^{-1}\\
&= \frac{1}{\sigma} \phi(\frac{y-x\beta}{\sigma})[\Phi(\frac{x\beta-c}{\sigma})]^{-1}
\end{align}
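A sketch of truncated-regression MLE built from this likelihood (simulated data; observations below the threshold are dropped before estimation, and $\sigma$ is again parameterized on the log scale):
```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Truncated-regression MLE: only draws with y* >= c enter the sample.
rng = np.random.default_rng(0)
n, beta, sigma, c = 20_000, np.array([0.5, 1.0]), 1.0, 0.0
X_all = np.column_stack([np.ones(n), rng.standard_normal(n)])
y_all = X_all @ beta + sigma * rng.standard_normal(n)
keep = y_all >= c
X, y = X_all[keep], y_all[keep]

def neg_loglik(params):
    b, log_s = params[:-1], params[-1]
    s = np.exp(log_s)
    xb = X @ b
    # log f(y | x, y* >= c) = log[(1/s) phi((y - xb)/s)] - log Phi((xb - c)/s)
    return -np.sum(norm.logpdf((y - xb) / s) - log_s - norm.logcdf((xb - c) / s))

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[2]))    # close to (0.5, 1.0) and 1.0
```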