---
# System prepended metadata

title: Panel Data Model (TSCS Data)
tags: [Fixed Effect Model, Econometrics, plm, Random Effect Model, Panel Data]

---

###### tags: `Econometrics` `Panel Data` `plm` `Random Effect Model` `Fixed Effect Model`

# Panel Data Model (TSCS Data)

+ Read the [RPubs version](https://rpubs.com/RitaTang/plm)
+ Reference:
	1. [Econometrics Academy](https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models)
	2. CUHK Materials


## Content

[1. Data Structure](#ch1)
[2. Model Overview](#ch2)
[3. Model Selection](#ch3)
[4. Model Interpretation](#ch4)


## <a id='ch1'></a> 1. What is panel data?

+ Structure in data: Individual & Time

![](https://i.imgur.com/fKI7LLa.png =60%x)

+ Prepare Data
```{r message=FALSE, warning=FALSE}
library(plm)
mydata <- read.csv("panel_wage.csv")
attach(mydata) # use this data frame from here on, so variables can be called without the 'mydata$' prefix
formula <- lwage ~ exp + exp2 + wks + ed
```

+ Variable List

<table>
<tr>
<td>
lwage
</td>
<td>
log(Wage)
</td>
<td>
Dependent Variable
</td>
</tr>

<tr>
<td>
exp
</td>
<td>
Experience
</td>
<td>
Varying Regressor
</td>
</tr>

<tr>
<td>
exp2
</td>
<td>
Experience^2
</td>
<td>
Varying Regressor
</td>
</tr>

<tr>
<td>
wks
</td>
<td>
Weeks worked
</td>
<td>
Varying Regressor
</td>
</tr>

<tr>
<td>
ed
</td>
<td>
Education
</td>
<td>
Time-invariant Regressor
</td>
</tr>
</table>


+ Set as Panel Data
```{r}
pdata <- pdata.frame(mydata, index=c("id","t"))
```


## <a id='ch2'></a> 2. Dealing with Correlated Errors: Data Transformations

### I. Estimators

+ Pooled OLS
    + Transformation: None
    + No. obs:  NT
    + **Ignores the unobserved heterogeneity of individuals** (possible correlation within groups)
```{r results='hide'}
pooling <- plm(formula, data=pdata, model= "pooling")
summary(pooling) # same as lm(), but ignores the panel data structure
```


+ Between (individual)
    + Transformation: **Time averages of all variables**
    + No. obs:  N
    + Loses information (the variation within each individual over time is discarded)

```{r results='hide'}
between <- plm(formula, data=pdata, model= "between")
summary(between)
```
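
To make the between transformation concrete, here is a minimal base-R sketch on a made-up balanced panel. The data frame `toy` and its columns `id`, `t`, `x`, `y` are hypothetical and are not taken from `panel_wage.csv`. It averages each variable over time and then runs OLS on the N averaged rows.

```{r}
# Toy balanced panel: 4 hypothetical individuals observed over 3 periods
set.seed(1)
toy <- data.frame(id = rep(1:4, each = 3), t = rep(1:3, times = 4))
toy$x <- rnorm(12)
toy$y <- 1 + 0.5 * toy$x + rep(rnorm(4), each = 3) + rnorm(12, sd = 0.1)

# Collapse to one row per individual: the time averages of y and x
means <- aggregate(cbind(y, x) ~ id, data = toy, FUN = mean)

# OLS on the averaged rows is the between regression
between_by_hand <- lm(y ~ x, data = means)
nrow(means)            # number of observations used: N, not NT
coef(between_by_hand)
```

Only cross-individual differences in the averages identify the slope here, which is why the between estimator discards the within-individual variation.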

+ Within (individual) across time, Fixed Effect (FE)
    + Transformation: **Time-demean**
    + No. obs:  NT
    + Individual-specific effect (𝜶i) is cancelled (the person-level effect is subtracted out)
        + Only the idiosyncratic error remains (the error within an individual over time)
        + ![](https://i.imgur.com/prDdXtN.png)
        + Error (𝜶) is the individual-specific error
        + Error (e) is the idiosyncratic error
    + **Time-invariant variables are dropped**

```{r results='hide'}
fixed <- plm(formula, data=pdata, model= "within")
summary(fixed)
```
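
The time-demeaning step can be written out as follows. This is a sketch in a standard one-way error-component notation; the symbols $y_{it}$, $x_{it}$, $\alpha_i$, $e_{it}$ are assumed here and may differ from those in the figure.

$$
\begin{aligned}
y_{it} &= \alpha_i + x_{it}'\beta + e_{it} \\
\bar{y}_i &= \alpha_i + \bar{x}_i'\beta + \bar{e}_i \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (e_{it} - \bar{e}_i)
\end{aligned}
$$

Subtracting the second line from the first removes $\alpha_i$. For a time-invariant regressor $z_i$ (such as `ed`), $z_i - \bar{z}_i = 0$ in every period, so its coefficient cannot be estimated and the variable is dropped.
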
<center>

![](https://i.imgur.com/5R8WkgO.png =80%x)
(Illustration of the difference between Between and Within)
</center>

<br>

+ LSDV, Fixed Effects (FE): include a dummy variable for each individual

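The equivalence between LSDV and the within estimator can be checked on a small example. This is a minimal base-R sketch on a made-up balanced panel (the data frame `toy` and its columns are hypothetical, not the wage data): it fits OLS with one dummy per individual, then fits OLS on time-demeaned data, and compares the slope on `x`.

```{r}
# Toy balanced panel: 4 hypothetical individuals observed over 3 periods
set.seed(1)
toy <- data.frame(id = rep(1:4, each = 3), t = rep(1:3, times = 4))
toy$x <- rnorm(12)
toy$y <- 1 + 0.5 * toy$x + rep(rnorm(4), each = 3) + rnorm(12, sd = 0.1)

# LSDV: one dummy per individual via factor(id)
lsdv <- lm(y ~ x + factor(id), data = toy)

# Within: subtract each individual's time mean, then OLS without an intercept
toy$y_dm <- toy$y - ave(toy$y, toy$id)
toy$x_dm <- toy$x - ave(toy$x, toy$id)
within_by_hand <- lm(y_dm ~ x_dm - 1, data = toy)

# The slope on x is the same in both regressions
all.equal(unname(coef(lsdv)["x"]), unname(coef(within_by_hand)["x_dm"]))
```

The point estimates agree, but the standard errors reported by `lm()` on the hand-demeaned data do not account for the N estimated individual means, so they should not be used as they are.
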
+ First Difference (FD)
    + Transformation: **One-period difference**
    + No. obs:  N(T-1)
    + Individual-specific effect (𝜶i) is cancelled
    + **Time-invariant variables are dropped**
    + No constant (it is differenced away)
    + Can only be used when strict exogeneity holds


```{r results='hide'}
firstdiff <- plm(formula, data=pdata, model= "fd")
summary(firstdiff) # there is no true intercept; the coefficient on exp is reported as (Intercept)
```
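
The first-difference transformation, and the reason the `exp` coefficient shows up as the intercept, can be sketched as follows. The notation is assumed (standard one-way error-component symbols), and the second line assumes that experience rises by exactly one unit each period.

$$
\begin{aligned}
y_{it} - y_{i,t-1} &= (x_{it} - x_{i,t-1})'\beta + (e_{it} - e_{i,t-1}) \\
\Delta \mathrm{exp}_{it} = 1 \;&\Rightarrow\; \beta_{\mathrm{exp}}\,\Delta \mathrm{exp}_{it} = \beta_{\mathrm{exp}} \quad \text{(a constant)}
\end{aligned}
$$

The individual effect $\alpha_i$ and the original constant cancel in the difference. Under the stated assumption, the differenced experience variable is indistinguishable from a column of ones, so its coefficient is absorbed into the reported `(Intercept)`.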

![](https://i.imgur.com/OTlZvuc.png =85%x)

<br>

![](https://i.imgur.com/n5ZuxtY.png =85%x)
<center>(Why the FE model is preferred to the FD model)</center>

<br>

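+ Random Effect (RE)
    + Transformation: **Weighted average of between & within estimates**
        + ![](https://i.imgur.com/cAH1HRW.png)
        + theta (or lambda) lies between 0 and 1
        + The closer it is to 0, the closer the estimator is to pooled OLS; the closer to 1, the closer it is to the within estimator
        + If the individual-specific effect is small and the model is not inconsistent (both RE and FE are consistent), choose the RE model (it is more efficient)
    + No. obs:  NT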

```{r results='hide'}
random <- plm(formula, data=pdata, model= "random")
summary(random)
```
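
The role of theta can be made explicit with the usual quasi-demeaning form of the RE estimator. This is a sketch for a balanced panel with T periods; the symbols $\mu$, $\sigma_\alpha^2$ and $\sigma_e^2$ (overall constant, variance of the individual effect, variance of the idiosyncratic error) are assumed notation and may differ from the figure.

$$
\begin{aligned}
y_{it} - \theta\bar{y}_i &= (1-\theta)\mu + (x_{it} - \theta\bar{x}_i)'\beta + \left[(1-\theta)\alpha_i + (e_{it} - \theta\bar{e}_i)\right] \\
\theta &= 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\sigma_\alpha^2}}
\end{aligned}
$$

When $\sigma_\alpha^2 = 0$, $\theta = 0$ and the equation reduces to pooled OLS. As $T\sigma_\alpha^2$ grows relative to $\sigma_e^2$, $\theta \to 1$ and the transformation approaches full time-demeaning, that is, the within estimator.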


### II. Situation

+ In econometrics, consistency matters most; efficiency comes second
<table>
<tr>
<td>
Estimator \ True model
</td>
<th>
Pooled model
</th>
<th>
RE model
</th>
<th>
FE model
</th>
</tr>

<tr>
<th>
Pooled OLS estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>

<tr>
<th>
Between estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>

<tr>
<th>
Within or FE estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Consistent
</td>
</tr>

<tr>
<th>
RE estimator
</th>
<td>
Consistent
</td>
<td>
Consistent
</td>
<td>
Inconsistent
</td>
</tr>
</table>

## <a id='ch3'></a> 3. Choose a Model


<center>

![](https://i.imgur.com/xk6SQSU.png =65%x)
(Flowchart for Choosing a Model)
</center>

<br>


![](https://i.imgur.com/GRNELxw.png =85%x)
<br>
![](https://i.imgur.com/ZzwuUyQ.png =85%x)
<br>


+ Heteroscedasticity: BP test
```{r message=FALSE, warning=FALSE}
library(lmtest)
bptest(pooling)
```

+ Other statistical tests
```{r}
## LM test for random effects versus OLS
plmtest(pooling)
## F test for fixed effects versus OLS
pFtest(fixed, pooling)
```

+ Hausman test: FE vs. RE
    + Can be calculated only for the time-varying regressors.
    + Significant: use the fixed effects.
    + Insignificant: use the random effects.
```{r}
phtest(random, fixed)
```
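
For reference, the statistic behind the Hausman test can be sketched in its textbook form. The notation is assumed: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the coefficient vectors on the time-varying regressors only, and $k$ is the number of those regressors.

$$
H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[\widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE})\right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \;\overset{a}{\sim}\; \chi^2_k
$$

Under the null hypothesis that the individual effects are uncorrelated with the regressors, both estimators are consistent and the difference should be small. A large $H$ (a significant test) suggests that RE is inconsistent, which is why FE is chosen in that case.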




## <a id='ch4'></a> 4. Interpretation
<center>

![](https://i.imgur.com/vFaj4Fx.png =65%x)

</center>

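+ Whichever estimator is used, the results show that more experience and more education are associated with higher wages
+ Model by model
    + [Pooled OLS] Across individuals and time, one additional year of work experience is associated with 4% higher wages
    + [Between] People with one more year of work experience have average wages 3% higher than others
    + [Within] Each additional year of work experience above an individual's own average is associated with 11% higher wages
    + [First differences] From one year to the next, each additional year of work experience is associated with 11% higher wages
    + [Random] Each additional year of work experience above an individual's own average is associated with 8% higher wages
+ Because the Hausman test shows that the FE and RE coefficients differ significantly, we choose the FE model
+ Rho is the share of variation that is individual-specific. In this example a very high share (FE: 98%, RE: 81%) is explained by the individual-specific term; the remaining unexplained part is due to the idiosyncratic error
    + Rho captures variation across individuals (because no individual terms are included in the model)
    + If that variation has an effect, it ends up in the error term, mixed with the true (idiosyncratic) error
+ Lambda is 82%, so the RE estimates are closer to the within estimates than to the pooled estimates
    + FE removes all individual effects, so its R-squared is larger
+ The R-squared values show that the between estimator explains 32% of the between variation, while the FE and RE estimators explain 66% and 63% of the within variation, respectively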

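The link between Rho and Lambda can be checked with a short calculation. This is a minimal sketch, assuming a balanced panel and the standard formula for theta (called Lambda above). The value `n_periods = 7` is an assumption, since the number of periods is not stated in these notes, and `rho` is the RE figure quoted above.

```{r}
rho <- 0.81        # share of variance due to the individual effect (RE value quoted above)
n_periods <- 7     # ASSUMPTION: periods per individual; not stated in these notes

# theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_alpha^2)),
# rewritten using rho = sigma_alpha^2 / (sigma_alpha^2 + sigma_e^2)
theta <- 1 - sqrt((1 - rho) / ((1 - rho) + n_periods * rho))
round(theta, 2)
```

Under these assumptions theta comes out at about 0.82, in line with the Lambda reported above.
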

