---
tags: metric, memo
---
# Three-Stage Least Squares
## Setting
Combining the SUR and IV estimators to estimate a system of equations yields the Three-Stage Least Squares (3SLS) estimator.
Suppose we want to estimate $M$ equations with $T$ observations each, where the observations are i.i.d.:
$$y_i = Z_i \beta_i + U_i, \quad E(Z_i 'U_i)\neq 0, \quad i=1,2,\dots,M$$
Let $K =\sum_{j=1}^M k_j$ be the total number of regressors in the system. We can stack the vectors as:
$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_M\end{pmatrix}$: $TM \times 1$ vector
$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M\end{pmatrix}$: $K \times 1$ vector
$U = \begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_M\end{pmatrix}$: $TM \times 1$ vector
$Z = \begin{pmatrix} Z_1 & 0 & \dots & 0 \\ 0 & Z_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & Z_M\end{pmatrix}$: $TM \times K$ block-diagonal matrix
After stacking, we get
$$y= Z \beta + U$$
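As a minimal sketch of this stacking step (assuming the equations are stored as lists of NumPy arrays; `stack_system` is a name we introduce):
```python
import numpy as np
from scipy.linalg import block_diag

def stack_system(y_list, Z_list):
    """Stack M equations y_i = Z_i b_i + U_i into y = Z b + U.

    y_list: M response vectors of shape (T,);
    Z_list: M regressor matrices of shape (T, k_i).
    """
    y = np.concatenate(y_list)   # (T*M,) stacked responses
    Z = block_diag(*Z_list)      # (T*M, K) block-diagonal regressors
    return y, Z
```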
To estimate $\beta_i$, we have a $T \times l$ instrument matrix $X$ such that $E(X'U_i)=0$ for $i=1,\dots,M$ and $l \ge \max_i\{k_i\}$. We can use 2SLS to estimate each $\beta_i$:
$$\hat{\beta}_{i,2SLS}=(Z_i'P_X Z_i)^{-1} Z_i'P_X y_i$$
where $P_X = X(X'X)^{-1}X'$ is the projection matrix onto the columns of $X$.
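A minimal NumPy sketch of this equation-by-equation step (the function name `tsls_eq` is ours; it forms $P_X$ explicitly, which is fine only for small $T$):
```python
import numpy as np

def tsls_eq(y_i, Z_i, X):
    """2SLS for one equation: (Z_i' P_X Z_i)^{-1} Z_i' P_X y_i."""
    PX = X @ np.linalg.solve(X.T @ X, X.T)   # P_X = X (X'X)^{-1} X'
    return np.linalg.solve(Z_i.T @ PX @ Z_i, Z_i.T @ PX @ y_i)
```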
## GLS
Again, a SUR-type GLS estimator can be more efficient. We use the following procedure to derive it, premultiplying the stacked system by the instruments:
$$(I_M \otimes X)'y =(I_M \otimes X)'Z\beta + (I_M \otimes X)' U$$
For the error term, since the observations are i.i.d., $E(UU')=\Sigma \otimes I_T$, so we get:
$$\Omega = V((I_M \otimes X)' U)=E((I_M \otimes X)' UU'(I_M \otimes X))=(I_M \otimes X') E(UU')(I_M \otimes X)\\
=(I_M \otimes X') (\Sigma \otimes I_T)(I_M \otimes X) \\
=\Sigma \otimes X'X$$
Hence, $\Omega^{-1} = \Sigma^{-1} \otimes (X'X)^{-1}.$
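This inverse can be verified with the Kronecker mixed-product rule $(A \otimes B)(C \otimes D) = AC \otimes BD$:
$$(\Sigma^{-1} \otimes (X'X)^{-1})(\Sigma \otimes X'X) = \Sigma^{-1}\Sigma \otimes (X'X)^{-1}(X'X) = I_M \otimes I_l = I_{Ml}.$$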
By the GLS formula, we get:
$$\hat{\beta}_{GLS} = [ Z'(I_M \otimes X)\Omega^{-1}(I_M \otimes X)'Z]^{-1} Z'(I_M \otimes X)\Omega^{-1}(I_M \otimes X)'y\\
= [ Z'(I_M \otimes X)(\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')Z]^{-1} Z'(I_M \otimes X)(\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')y\\
= [ Z'(\Sigma^{-1} \otimes X(X'X)^{-1}X')Z]^{-1} Z'(\Sigma^{-1} \otimes X(X'X)^{-1}X')y\\
=(Z'(\Sigma^{-1} \otimes P_X)Z)^{-1}Z'(\Sigma^{-1} \otimes P_X)y$$
Letting $[\sigma^{ij}] =\Sigma^{-1}$, we can rewrite the GLS estimator as:
$$\hat{\beta}_{GLS} = \begin{pmatrix}\sigma^{11} (Z_1' P_X Z_1) & \sigma^{12} (Z_1' P_X Z_2) & ... & \sigma^{1M} (Z_1' P_X Z_M) \\ \sigma^{21} (Z_2' P_X Z_1) & \sigma^{22} (Z_2' P_X Z_2) & ... & \sigma^{2M} (Z_2' P_X Z_M) \\ . & .&...& . \\ \sigma^{M1} (Z_M' P_X Z_1) & \sigma^{M2} (Z_M' P_X Z_2) & ... & \sigma^{MM} (Z_M' P_X Z_M)\end{pmatrix}^{-1} \begin{pmatrix}Z'_1(\sum_j \sigma^{1j} P_X y_j) \\ Z'_2(\sum_j \sigma^{2j} P_X y_j) \\... \\Z'_M(\sum_j \sigma^{Mj} P_X y_j) \end{pmatrix}.$$
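A minimal NumPy sketch of this GLS step for a known $\Sigma^{-1}$ (the name `gls_iv` is ours; it forms the $TM \times TM$ weight $\Sigma^{-1} \otimes P_X$ explicitly, so it only scales to small problems):
```python
import numpy as np

def gls_iv(y, Z, X, Sigma_inv):
    """GLS on the instrumented system: (Z'(S^{-1} kron P_X)Z)^{-1} Z'(S^{-1} kron P_X)y.

    y: stacked (T*M,) response, Z: (T*M, K) block-diagonal regressors,
    X: (T, l) instruments, Sigma_inv: (M, M) inverse error covariance.
    """
    PX = X @ np.linalg.solve(X.T @ X, X.T)   # P_X = X (X'X)^{-1} X'
    W = np.kron(Sigma_inv, PX)               # Sigma^{-1} kron P_X
    return np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
```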
### Feasible estimator
Again, we need 2SLS to estimate $\Sigma$, $\hat{\Sigma}=[\hat{\sigma}_{ij}]$:
$$\hat{\sigma}_{ij}=\frac{1}{T}(y_i-Z_i \hat{\beta}_i)'(y_j-Z_j \hat{\beta}_j)$$
where $\hat{\beta}_i$ is the 2SLS estimator of equation $i$. Baltagi uses a degrees-of-freedom-adjusted estimator of $\hat{\sigma}_{ij}$:
$$\hat{\sigma}_{ij} = \sum_{t=1}^T \frac{e_{it}e_{jt}}{(T-k_i)^{0.5}(T-k_j)^{0.5}}=\hat{s}_{ij}$$
where $e_{it}$ is the 2SLS residual of equation $i$ at observation $t$, $k_i$ is the number of regressors in equation $i$, and $\hat{s}_{ij}$ is Baltagi's notation.
$$\hat{\beta}_{3SLS} =(Z'(\hat{\Sigma}^{-1} \otimes P_X)Z)^{-1}Z'(\hat{\Sigma}^{-1} \otimes P_X)y$$
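Putting the steps together, a minimal sketch of the feasible estimator using the unadjusted $\hat{\sigma}_{ij} = e_i'e_j/T$ (the function name is ours):
```python
import numpy as np
from scipy.linalg import block_diag

def feasible_3sls(y_list, Z_list, X):
    """Feasible 3SLS: per-equation 2SLS, residual-based Sigma-hat, then GLS."""
    T, M = X.shape[0], len(y_list)
    PX = X @ np.linalg.solve(X.T @ X, X.T)           # P_X = X (X'X)^{-1} X'
    E = np.empty((T, M))
    for i, (y_i, Z_i) in enumerate(zip(y_list, Z_list)):
        b_i = np.linalg.solve(Z_i.T @ PX @ Z_i, Z_i.T @ PX @ y_i)  # 2SLS
        E[:, i] = y_i - Z_i @ b_i                    # 2SLS residuals
    Sigma_hat = E.T @ E / T                          # sigma_ij = e_i'e_j / T
    y, Z = np.concatenate(y_list), block_diag(*Z_list)
    W = np.kron(np.linalg.inv(Sigma_hat), PX)        # Sigma-hat^{-1} kron P_X
    return np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
```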
## When 3SLS equals 2SLS
### Uncorrelated Errors between Equations
If $\Sigma$ is diagonal, i.e., $\sigma_{ij}=0$ for $i \neq j$, then $\Sigma^{-1}=\mathrm{diag}[1/\sigma_{ii}]$ is also a diagonal matrix and $\hat{\beta}_{3SLS} =\hat{\beta}_{2SLS}$.
The proof is similar to the SUR case; a block-diagonal sketch follows.
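With $\sigma^{ij}=0$ for $i \neq j$, the partitioned form above becomes block-diagonal and the scalars $\sigma^{ii}$ cancel block by block:
$$\hat{\beta}_{3SLS} = \mathrm{diag}[\sigma^{ii} Z_i'P_X Z_i]^{-1}\begin{pmatrix}\sigma^{11}Z_1'P_X y_1 \\ \vdots \\ \sigma^{MM}Z_M'P_X y_M\end{pmatrix} = \begin{pmatrix}(Z_1'P_X Z_1)^{-1}Z_1'P_X y_1 \\ \vdots \\ (Z_M'P_X Z_M)^{-1}Z_M'P_X y_M\end{pmatrix} = \hat{\beta}_{2SLS}.$$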
### Just-identification in every equation
If $X'Z_i$ is square and invertible for each $i$ (i.e., $l = k_i$ for all $i$), then $\hat{\beta}_{3SLS} =\hat{\beta}_{2SLS}$.
In this case
$$ Z'(I_M \otimes X)\\
= \begin{pmatrix}Z_1' & 0 & \dots & 0 \\
0 & Z_2' & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & Z_M'\end{pmatrix}
\begin{pmatrix} X & 0 & \dots & 0 \\
0 & X & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & X \\
\end{pmatrix} \\
= \begin{pmatrix}Z_1'X & 0 & \dots & 0 \\
0 & Z_2'X & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & Z_M'X\end{pmatrix} \\
=\mathrm{diag}[Z_i'X]$$
Similarly,
$$(I_M \otimes X')Z =\mathrm{diag}[X'Z_i]$$
We also know the form of the inverse matrix:
$$[(I_M \otimes X')Z]^{-1}\\
= \begin{pmatrix}X'Z_1 & 0 & \dots & 0 \\
0 & X'Z_2 & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & X'Z_M\end{pmatrix}^{-1} \\
= \begin{pmatrix}(X'Z_1)^{-1} & 0 & \dots & 0 \\
0 & (X'Z_2)^{-1} & \dots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & (X'Z_M)^{-1}\end{pmatrix}\\
=\mathrm{diag}[(X'Z_i)^{-1}]$$
Hence,
$$\hat{\beta}_{3SLS} = [ Z'(I_M \otimes X)(\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')Z]^{-1} Z'(I_M \otimes X)(\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')y\\
= [\mathrm{diag}[Z_i'X](\Sigma^{-1} \otimes (X'X)^{-1})\mathrm{diag}[X'Z_i]]^{-1}\, \mathrm{diag}[Z_i'X](\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')y\\
= \mathrm{diag}[(X'Z_i)^{-1}](\Sigma \otimes X'X)\mathrm{diag}[(Z_i'X)^{-1}]\,\mathrm{diag}[Z_i'X](\Sigma^{-1} \otimes (X'X)^{-1})(I_M \otimes X')y\\
= \mathrm{diag}[(X'Z_i)^{-1}](I_M \otimes X')y\\
= \begin{pmatrix}(X'Z_1)^{-1}X'y_1 \\ (X'Z_2)^{-1}X'y_2 \\ \vdots \\ (X'Z_M)^{-1}X'y_M\end{pmatrix}\\
=\hat{\beta}_{2SLS}$$
where $(I_M \otimes X')y$ stacks the vectors $X'y_i$, and the last equality holds because under just-identification each 2SLS estimator reduces to the IV form $\hat{\beta}_{i,2SLS}=(Z_i'P_X Z_i)^{-1} Z_i'P_X y_i=(X'Z_i)^{-1}X'y_i$.
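As a numerical sanity check, a toy simulation (reusing `feasible_3sls` from the sketch above; all names and data-generating choices are ours) in which every equation is just-identified:
```python
import numpy as np

rng = np.random.default_rng(0)
T, M, l = 500, 2, 2
X = rng.normal(size=(T, l))                       # common instruments, l = k_i
U = rng.normal(size=(T, M)) @ np.array([[1.0, 0.5], [0.0, 1.0]])  # cross-correlated errors
Z_list = [X @ rng.normal(size=(l, l)) + 0.3 * U[:, [i]] for i in range(M)]  # endogenous regressors
beta = [np.array([1.0, -2.0]), np.array([0.5, 3.0])]
y_list = [Z_list[i] @ beta[i] + U[:, i] for i in range(M)]

# Equation-by-equation IV estimator (X'Z_i)^{-1} X'y_i
iv = np.concatenate([np.linalg.solve(X.T @ Z_list[i], X.T @ y_list[i]) for i in range(M)])
b3 = feasible_3sls(y_list, Z_list, X)
print(np.allclose(iv, b3))                        # True: 3SLS equals 2SLS here
```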