---
title: Panel Data Model (TSCS Data)
tags: [Fixed Effect Model, Econometrics, plm, Random Effect Model, Panel Data]
---

# Panel Data Model (TSCS Data)

+ Read the [RPubs version](https://rpubs.com/RitaTang/plm)
+ Reference:
    1. [Econometrics Academy](https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models)
    2. CUHK Materials

## Content
[1. Data structure](#ch1)
[2. Model introduction](#ch2)
[3. Model selection](#ch3)
[4. Model interpretation](#ch4)

## 1. What is panel data?
+ Structure in data: Individual & Time
![](https://i.imgur.com/fKI7LLa.png =60%x)
+ Prepare Data
```{r message=FALSE, warning=FALSE}
library(plm)
mydata <- read.csv("panel_wage.csv")
attach(mydata) # attach() lets later code refer to columns without the 'mydata$' prefix
formula <- lwage ~ exp + exp2 + wks + ed
```
+ Variable List

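| Variable | Description | Type |
| --- | --- | --- |
| lwage | log(Wage) | Dependent variable |
| exp | Experience | Time-varying regressor |
| exp2 | Experience^2 | Time-varying regressor |
| wks | Weeks worked | Time-varying regressor |
| ed | Education | Time-invariant regressor |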
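
The split between time-varying and time-invariant regressors can be checked directly from the data. The sketch below is only an illustration: `is_time_invariant` is a hypothetical helper (it is not part of plm), and the `toy` data frame holds made-up values rather than rows from `panel_wage.csv`.

```r
# Hypothetical helper (not part of plm): TRUE if x never changes within any id
is_time_invariant <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1))
}

# Toy panel with made-up values: 3 individuals observed for 2 periods
toy <- data.frame(
  id  = c(1, 1, 2, 2, 3, 3),
  t   = c(1, 2, 1, 2, 1, 2),
  exp = c(3, 4, 10, 11, 7, 8),   # grows by one each period
  ed  = c(12, 12, 16, 16, 9, 9)  # fixed within each individual
)

is_time_invariant(toy$exp, toy$id)  # experience changes over time
is_time_invariant(toy$ed, toy$id)   # education does not
```

On this toy panel the first call returns `FALSE` and the second returns `TRUE`; the same call could be applied to `mydata$ed` and `mydata$id`, assuming those columns exist as named in the variable list.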

+ Set as Panel Data
```{r}
pdata <- pdata.frame(mydata, index=c("id","t"))
```

## 2. Dealing with correlation among errors: data transformations
### I. Estimators
+ Pooled OLS
    + Transformation: None
    + No. obs: NT
    + **Ignores the unobserved heterogeneity of individuals** (possible correlation within groups)
```{r results='hide'}
pooling <- plm(formula, data=pdata, model= "pooling")
summary(pooling) # same as lm(), but ignores the panel structure of the data
```
+ Between (individual)
    + Transformation: **Time average of all variables**
    + No. obs: N
    + Loses information (the variation within each individual is discarded)
```{r results='hide'}
between <- plm(formula, data=pdata, model= "between")
summary(between)
```
+ Within (individual, across time), Fixed Effects (FE)
    + Transformation: **Time-demeaning**
    + No. obs: NT
    + The individual-specific effect (𝜶i) is cancelled (each person's own effect is subtracted out)
    + Only the idiosyncratic error (the error that varies within an individual) remains
    + ![](https://i.imgur.com/prDdXtN.png)
        + Error(𝜶) is the individual-specific error
        + Error(e) is the idiosyncratic error
    + **Time-invariant variables are dropped**
```{r results='hide'}
fixed <- plm(formula, data=pdata, model= "within")
summary(fixed)
```
![](https://i.imgur.com/5R8WkgO.png =80%x)
(Illustration of the difference between Between and Within)
+ LSDV, Fixed Effects (FE): include a dummy variable for each individual
+ First-Diff (FD)
    + Transformation: **One-period difference**
    + No. obs: N(T-1)
    + The individual-specific effect (𝜶i) is cancelled
    + **Time-invariant variables are dropped**
    + No constant (it is differenced away)
    + Can only be used when strict exogeneity holds
```{r results='hide'}
firstdiff <- plm(formula, data=pdata, model= "fd")
summary(firstdiff) # there is no true intercept; the coefficient on exp is reported under (Intercept)
```
![](https://i.imgur.com/OTlZvuc.png =85%x)
![](https://i.imgur.com/n5ZuxtY.png =85%x) (Why FE model is preferred to FD model)
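
As a rough illustration of the transformations listed above, the following base R sketch applies time-demeaning, individual dummies (LSDV), and one-period differencing by hand to a simulated panel. Everything in it is an assumption made for the example: the data are simulated, the names (`alpha`, `x_w`, `dx`, and so on) are invented, and the true slope is set to 2. It does not use `panel_wage.csv` or plm.

```r
# Simulated balanced panel: N individuals, T_ periods (all values are made up)
set.seed(1)
N  <- 50
T_ <- 5
id <- rep(1:N, each = T_)
t  <- rep(1:T_, times = N)
alpha <- rep(rnorm(N), each = T_)        # individual-specific effect
x <- 0.5 * alpha + rnorm(N * T_)         # regressor correlated with alpha
y <- 1 + 2 * x + alpha + rnorm(N * T_)   # true slope is 2

# Within transformation: subtract each individual's time average
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
b_within <- coef(lm(y_w ~ x_w - 1))[["x_w"]]

# LSDV: one dummy per individual gives the same slope as the within estimator
b_lsdv <- coef(lm(y ~ x + factor(id)))[["x"]]

# First difference: subtract the previous period within each individual
# (rows are sorted by id and then t, so diff() runs in time order)
dx <- ave(x, id, FUN = function(v) c(NA, diff(v)))
dy <- ave(y, id, FUN = function(v) c(NA, diff(v)))
b_fd <- coef(lm(dy ~ dx - 1))[["dx"]]    # lm() drops the NA rows

# Pooled OLS ignores alpha; here x is correlated with alpha by construction
b_pooled <- coef(lm(y ~ x))[["x"]]

c(within = b_within, lsdv = b_lsdv, fd = b_fd, pooled = b_pooled)
```

Under these assumptions `b_within` and `b_lsdv` agree up to numerical precision, and the differenced regression uses N(T-1) = 200 rows, one fewer per individual than the NT = 250 rows of the within regression.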
+ Random Effects (RE)
    + Transformation: **Weighted average of between & within estimates**
    + ![](https://i.imgur.com/cAH1HRW.png)
        + theta (or lambda) lies between 0 and 1
        + The closer it is to 0, the closer RE is to pooled OLS; the closer it is to 1, the closer RE is to the within estimator
        + If the individual-specific effect is small and the model is not inconsistent (both RE and FE are consistent), choose the RE model (more efficient)
    + No. obs: NT
```{r results='hide'}
random <- plm(formula, data=pdata, model= "random")
summary(random)
```
### II. Situation
+ In econometrics, consistency matters most; efficiency comes second

| Estimator \ True model | Pooled model | RE model | FE model |
| --- | --- | --- | --- |
| Pooled OLS estimator | Consistent | Consistent | Inconsistent |
| Between estimator | Consistent | Consistent | Inconsistent |
| Within or FE estimator | Consistent | Consistent | Consistent |
| RE estimator | Consistent | Consistent | Inconsistent |
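
For reference, the RE transformation described in the estimator list is commonly written as a partial (quasi) demeaning. The notation below is an assumption, since the notes show the formula only as an image: $\mu$ is the intercept, bars denote individual time averages, and $\sigma_\alpha^2$ and $\sigma_e^2$ are the variances of the individual-specific and idiosyncratic errors.

$$
y_{it} = \mu + x_{it}'\beta + \alpha_i + e_{it}
$$

$$
y_{it} - \theta\,\bar{y}_i = (1-\theta)\,\mu + (x_{it} - \theta\,\bar{x}_i)'\beta + \left[(1-\theta)\,\alpha_i + (e_{it} - \theta\,\bar{e}_i)\right],
\qquad
\theta = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\,\sigma_\alpha^2}}
$$

Setting $\theta = 0$ leaves the data untransformed (pooled OLS), while $\theta = 1$ gives full time-demeaning (the within estimator), which matches the two limiting cases noted for theta.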

## 3. Choose a Model
![](https://i.imgur.com/xk6SQSU.png =65%x)
(Flowchart for Choosing a Model)
![](https://i.imgur.com/GRNELxw.png =85%x)
![](https://i.imgur.com/ZzwuUyQ.png =85%x)
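+ Heteroscedasticity: BP test
```{r message=FALSE, warning=FALSE}
library(lmtest)
bptest(pooling)
```
+ Other statistical tests
```{r}
## LM test for random effects versus OLS
plmtest(pooling)
## F test for fixed effects versus OLS
pFtest(fixed, pooling)
```
+ Hausman test: FE vs. RE
    + Can be calculated only for the time-varying regressors.
    + Significant: use the fixed effects.
    + Insignificant: use the random effects.
```{r}
phtest(random, fixed)
```

## 4. Explanation
![](https://i.imgur.com/vFaj4Fx.png =65%x)
+ Every estimator shows that more experience and more education are associated with higher wages
+ Model by model
    + 【Pooled OLS】Across individuals and time, one additional year of work experience is associated with a 4% higher wage
    + 【Between】People with one more year of work experience have an average wage that is 3% higher
    + 【Within】For each additional year of work experience above an individual's own average, the wage is 11% higher
    + 【First differences】From one year to the next, each additional year of work experience is associated with an 11% higher wage
    + 【Random】For each additional year of work experience above an individual's own average, the wage is 8% higher
+ Because the Hausman test shows that the FE and RE coefficients differ significantly, we choose the FE model
+ Rho is the share of variation due to the individual-specific term. In this example a very high share (FE: 98% & RE: 81%) is explained by the individual-specific term; the remainder is due to the idiosyncratic error
    + Rho captures variation across individuals (because no individual term is included in the model)
    + If that variation has an effect, it ends up in the error term, mixed with the true (idiosyncratic) error
+ Lambda is 82%, so the RE estimates are closer to the within estimates than to the pooled estimates
+ FE removes all individual effects, so its R-squared is larger
+ The R-squared values show that the between estimator explains 32% of the between variation, while the FE and RE estimators explain 66% and 63% of the within variation, respectively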
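
For reference, the Hausman comparison of FE and RE used in the model choice is usually written as the statistic below. The notation is an assumption, since it does not appear in the notes: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the coefficient vectors on the $k$ time-varying regressors, and $\widehat{V}(\cdot)$ are their estimated covariance matrices.

$$
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)' \left[\widehat{V}(\hat\beta_{FE}) - \widehat{V}(\hat\beta_{RE})\right]^{-1} \left(\hat\beta_{FE} - \hat\beta_{RE}\right) \;\sim\; \chi^2_k \quad \text{under } H_0
$$

Here $H_0$ is that the RE estimator is consistent (and efficient). A large $H$ means the two sets of coefficients differ by more than sampling noise would explain, which is the case where the notes pick FE.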