---
GA: UA-159972578-2
---
###### tags: `計量經濟` `Econometrics` `Panel Data` `plm` `Random Effect Model` `Fixed Effect Model`
# Panel Data Model (TSCS Data)
+ Read the [RPubs version](https://rpubs.com/RitaTang/plm)
+ Reference:
1. [Econometrics Academy](https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models)
2. CUHK Materials
## Content
[1. Data structure](#ch1)
[2. Model overview](#ch2)
[3. Model selection](#ch3)
[4. Model interpretation](#ch4)
## <a id='ch1'></a> 1. What is panel data?
+ Data structure: each observation is indexed by an individual and a time period

+ Prepare Data
```{r message=FALSE, warning=FALSE}
library(plm)
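mydata <- read.csv("panel_wage.csv")
# attach() puts the columns on the search path, so variables can be used without the 'mydata$' prefix
attach(mydata)
# One model formula shared by every estimator below
formula <- lwage ~ exp + exp2 + wks + ed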
```
+ Variable List
<table>
<tr>
<td>
lwage
</td>
<td>
log(Wage)
</td>
<td>
Dependent Variable
</td>
</tr>
<tr>
<td>
exp
</td>
<td>
Experience
</td>
<td>
Varying Regressor
</td>
</tr>
<tr>
<td>
exp2
</td>
<td>
Experience^2
</td>
<td>
Varying Regressor
</td>
</tr>
<tr>
<td>
wks
</td>
<td>
Weeks worked
</td>
<td>
Varying Regressor
</td>
</tr>
<tr>
<td>
ed
</td>
<td>
Education
</td>
<td>
Time-invariant Regressor
</td>
</tr>
</table>
+ Set as Panel Data
```{r}
pdata <- pdata.frame(mydata, index=c("id","t"))
```
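
To make the individual-by-time structure concrete, here is a minimal sketch on a made-up toy panel (the data frame `toy` and its values are hypothetical, not taken from `panel_wage.csv`). It checks that every (id, t) pair appears exactly once, which is what a balanced panel means.

```{r}
# Hypothetical toy panel in long format: 3 individuals observed over 4 periods
toy <- data.frame(
  id = rep(1:3, each = 4),   # individual index
  t  = rep(1:4, times = 3)   # time index
)
toy$y <- c(1.0, 1.2, 1.1, 1.4,  2.0, 2.1, 2.3, 2.2,  0.5, 0.7, 0.6, 0.9)

# In a balanced panel each (id, t) cell holds exactly one observation
obs_per_cell  <- table(toy$id, toy$t)
is_balanced   <- all(obs_per_cell == 1)
n_individuals <- length(unique(toy$id))   # N
n_periods     <- length(unique(toy$t))    # T

c(N = n_individuals, T = n_periods, NT = nrow(toy))
is_balanced
```

With plm loaded, `pdim(pdata)` reports the same kind of summary for the real data.
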
## <a id='ch2'></a> 2. Dealing with correlation among errors: data transformations
### I. Estimators
+ Pooled OLS
    + Transformation: none
    + No. obs: NT
    + **Ignores unobserved heterogeneity across individuals** (possible correlation within groups)
```{r results='hide'}
pooling <- plm(formula, data=pdata, model= "pooling")
summary(pooling) # Same as lm(), which ignores the panel structure of the data
```
+ Between (individual)
    + Transformation: **time average of every variable**
    + No. obs: N
    + Loses information (the variation over time within each individual)
```{r results='hide'}
between <- plm(formula, data=pdata, model= "between")
summary(between)
```
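
The between transformation can also be done by hand, which shows why only N observations remain. This is a sketch on simulated data in base R (the names `d`, `means`, and `between_by_hand` and the panel dimensions are assumptions for illustration), not the computation plm performs internally.

```{r}
set.seed(1)
n_id <- 50; n_t <- 5                              # hypothetical panel dimensions
d <- data.frame(id = rep(1:n_id, each = n_t))
d$x <- rnorm(n_id * n_t)
d$y <- 1 + 0.5 * d$x + rep(rnorm(n_id), each = n_t) + rnorm(n_id * n_t)

# Between transformation: collapse each individual's series to its time average
means <- aggregate(cbind(y, x) ~ id, data = d, FUN = mean)   # one row per individual

# OLS on the individual means is the between estimator
between_by_hand <- lm(y ~ x, data = means)
nobs(between_by_hand)                             # N rows, not NT
```
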
+ Within (individual, across time), Fixed Effects (FE)
    + Transformation: **time-demeaning**
    + No. obs: NT
    + The individual-specific effect (𝜶i) cancels out (each person's own effect is subtracted)
    + No individual-specific error is left (the error between individuals); only the idiosyncratic error remains
        + 𝜶 is the individual-specific error
        + e is the idiosyncratic error
    + **Time-invariant variables are dropped**
```{r results='hide'}
fixed <- plm(formula, data=pdata, model= "within")
summary(fixed)
```
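
The time-demeaning step can be written out as a short derivation. The notation is an assumption (standard textbook symbols, with $x_{it}$ the regressors and $\beta$ the slopes); only $\alpha_i$ and $e_{it}$ come from the notes above.

$$
\begin{aligned}
y_{it} &= x_{it}'\beta + \alpha_i + e_{it} \\
\bar{y}_i &= \bar{x}_i'\beta + \alpha_i + \bar{e}_i \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (e_{it} - \bar{e}_i)
\end{aligned}
$$

Subtracting the second line from the first removes $\alpha_i$. A time-invariant regressor such as `ed` satisfies $x_{it} = \bar{x}_i$, so its demeaned value is zero in every period and its coefficient cannot be estimated.
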
<center>

(Illustration of the difference between the Between and Within estimators)
</center>
<br>
+ LSDV, Fixed Effects (FE): add a dummy variable for each individual
+ First-Diff (FD)
    + Transformation: **one-period difference**
    + No. obs: N(T-1)
    + The individual-specific effect (𝜶i) cancels out
    + **Time-invariant variables are dropped**
    + No constant (it is differenced out)
    + Valid only when strict exogeneity holds
```{r results='hide'}
firstdiff <- plm(formula, data=pdata, model= "fd")
summary(firstdiff) # No real intercept: the differenced exp is constant, so the coefficient on exp is reported under (Intercept)
```
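
The within, LSDV, and first-difference ideas above can be compared by hand. This is a sketch on simulated data in base R (all names and the true slope of 2 are assumptions for illustration); it does not reproduce plm's internals.

```{r}
set.seed(2)
n_id <- 30; n_t <- 4                       # hypothetical panel dimensions
d <- data.frame(id = rep(1:n_id, each = n_t), t = rep(1:n_t, times = n_id))
alpha <- rep(rnorm(n_id), each = n_t)      # individual-specific effect
d$x <- alpha + rnorm(n_id * n_t)           # regressor correlated with alpha
d$y <- 2 * d$x + alpha + rnorm(n_id * n_t)

# Within: subtract each individual's time average, then regress without intercept
d$y_dm <- d$y - ave(d$y, d$id)
d$x_dm <- d$x - ave(d$x, d$id)
b_within <- coef(lm(y_dm ~ x_dm - 1, data = d))[["x_dm"]]

# LSDV: keep the raw data and add one dummy per individual
b_lsdv <- coef(lm(y ~ x + factor(id), data = d))[["x"]]

# First difference: difference consecutive periods within each individual
dy <- unlist(tapply(d$y, d$id, diff))
dx <- unlist(tapply(d$x, d$id, diff))
b_fd <- coef(lm(dy ~ dx - 1))[["dx"]]

c(within = b_within, lsdv = b_lsdv, fd = b_fd)
```

The within and LSDV slopes agree up to rounding error, since the dummies absorb exactly what demeaning removes. With more than two periods the FD slope generally differs from them, although it targets the same coefficient, and it uses N(T-1) rows.
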

<br>

<center>(Why FE model is preferred to FD model)</center>
<br>
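+ Random Effect (RE)
    + Transformation: **weighted average of the between and within estimates**
    + theta (or lambda) lies between 0 and 1
        + The closer it is to 0, the closer RE is to pooled OLS; the closer it is to 1, the closer RE is to the within estimator
    + If the individual-specific effect is small and the model is not inconsistent (RE and FE are both consistent), choose the RE model (more efficient)
    + No. obs: NT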
```{r results='hide'}
random <- plm(formula, data=pdata, model= "random")
summary(random)
```
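
The weight theta can be made explicit. The following is the usual textbook form for a balanced panel with $T$ periods, given as a sketch; the symbols $\sigma_\alpha^2$ and $\sigma_e^2$ (variances of the individual-specific and idiosyncratic errors) are assumptions of notation, and the intercept is omitted for brevity.

$$
\theta = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\,\sigma_\alpha^2}},
\qquad
y_{it} - \theta\,\bar{y}_i = (x_{it} - \theta\,\bar{x}_i)'\beta + \big[(1-\theta)\,\alpha_i + (e_{it} - \theta\,\bar{e}_i)\big]
$$

When $\sigma_\alpha^2 = 0$, $\theta = 0$ and nothing is subtracted, which is pooled OLS. As $T\,\sigma_\alpha^2$ grows relative to $\sigma_e^2$, $\theta$ approaches 1 and the transformation approaches full time-demeaning, which is the within estimator.
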
### II. Situation
+ In econometrics, consistency matters most; efficiency comes second
<table>
<tr>
<td>
Estimator \ True model
</td>
<th>
Pooled model
</th>
<th>
RE model
</th>
<th>
FE model
</th>
</tr>
<tr>
<th>
Pooled OLS estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>
<tr>
<th>
Between estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>
<tr>
<th>
Within or FE estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Consistent
</td>
</tr>
<tr>
<th>
RE estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>
</table>
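
The "FE model" column of the table can be illustrated with a small simulation. This is a sketch under assumed values (true slope 1, an individual effect that is correlated with the regressor, hypothetical panel dimensions), using base R only.

```{r}
set.seed(3)
n_id <- 500; n_t <- 5                      # hypothetical panel dimensions
id    <- rep(1:n_id, each = n_t)
alpha <- rep(rnorm(n_id), each = n_t)      # individual-specific effect
x     <- alpha + rnorm(n_id * n_t)         # FE world: x is correlated with alpha
y     <- 1 * x + alpha + rnorm(n_id * n_t) # true slope is 1

# Pooled OLS leaves alpha in the error term, where it is correlated with x
b_pooled <- coef(lm(y ~ x))[["x"]]

# Within estimator: demeaning by individual removes alpha before estimation
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
b_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

c(pooled = b_pooled, within = b_within)
```

In this setup the pooled slope settles well above the true value of 1, while the within slope stays close to it, matching the Inconsistent and Consistent entries in the last column.
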
## <a id='ch3'></a> 3. Choose a Model
<center>

(Flowchart for Choosing a Model)
</center>
<br>

<br>

<br>
+ Heteroscedasticity: BP test
```{r message=FALSE, warning=FALSE}
library(lmtest)
bptest(pooling)
```
+ Other statistical tests
```{r}
## LM test for random effects versus OLS
plmtest(pooling)
## F test for fixed effects versus OLS
pFtest(fixed, pooling)
```
+ Hausman test: FE vs. RE
    + Can be calculated only for the time-varying regressors.
    + Significant: use the fixed effects.
    + Insignificant: use the random effects.
```{r}
phtest(random, fixed)
```
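
For reference, the statistic behind the Hausman test has the following standard form. It is given as a sketch in assumed notation, where $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the coefficients on the $k$ time-varying regressors.

$$
H = (\hat\beta_{FE} - \hat\beta_{RE})'\,\Big[\widehat{\mathrm{Var}}(\hat\beta_{FE}) - \widehat{\mathrm{Var}}(\hat\beta_{RE})\Big]^{-1}(\hat\beta_{FE} - \hat\beta_{RE}) \;\sim\; \chi^2_k
$$

Under the null hypothesis that RE is consistent, the two sets of coefficients should differ only by sampling noise, so $H$ is small. A large $H$ (a significant test) signals that the RE estimates have drifted away from the FE estimates, which is why FE is then preferred.
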
## <a id='ch4'></a> 4. Interpretation
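+ Every estimator shows that more experience and more education are associated with higher wages
+ Model by model:
    + **Pooled OLS:** across individuals and time, one additional year of experience is associated with a 4% higher wage
    + **Between:** an individual with one more year of experience has, on average, a 3% higher wage than others
    + **Within:** each additional year of experience above an individual's own average is associated with an 11% higher wage
    + **First differences:** from one year to the next, each additional year of experience is associated with an 11% higher wage
    + **Random:** each additional year of experience above an individual's average is associated with an 8% higher wage
+ The Hausman test shows that the FE and RE coefficients differ significantly, so we choose the FE model
+ Rho is the share of the error variance that comes from the individual-specific term. Here that share is very high (FE: 98%, RE: 81%); the remainder is due to the idiosyncratic error
    + Rho reflects variation across individuals (because no individual-level term is included in the model)
    + If the individual term has an effect, it ends up in the error and is mixed with the true (idiosyncratic) error
+ Lambda is 82%, so the RE estimates are closer to the within estimates than to the pooled estimates
+ FE removes all individual effects, so its R² is larger
+ The R-squared values show that the between estimator explains 32% of the between variation, while the FE and RE estimators explain 66% and 63% of the within variation, respectively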
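
The link between Rho and Lambda can be checked with a few lines of arithmetic. This is a sketch that assumes the usual balanced-panel formula for theta and an assumed panel length of 7 periods; the notes do not state T, and `theta_from_rho` is a hypothetical helper, not a plm function.

```{r}
# theta (lambda) implied by rho, the share of error variance due to the individual effect
theta_from_rho <- function(rho, n_t) {
  # sigma_e^2 / (sigma_e^2 + T * sigma_alpha^2), rewritten in terms of rho
  ratio <- (1 - rho) / ((1 - rho) + n_t * rho)
  1 - sqrt(ratio)
}

# rho = 0.81 is the RE value quoted above; n_t = 7 is an assumed number of periods
theta_re <- theta_from_rho(rho = 0.81, n_t = 7)
round(theta_re, 2)
```

Under that assumed T, the implied theta is in line with the Lambda of 82% quoted above; a different T would give a different value.
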