###### tags: `Econometrics` `Panel Data` `plm` `Random Effect Model` `Fixed Effect Model`

# Panel Data Model (TSCS Data)

+ Read the [RPubs version](https://rpubs.com/RitaTang/plm)
+ Reference:
    1. [Econometrics Academy](https://sites.google.com/site/econometricsacademy/econometrics-models/panel-data-models)
    2. CUHK Materials

## Content

[1. Data structure](#ch1)
[2. Model overview](#ch2)
[3. Model selection](#ch3)
[4. Interpretation](#ch4)

## <a id='ch1'></a> 1. What is panel data?

+ Structure in the data: individual & time

![](https://i.imgur.com/fKI7LLa.png =60%x)

+ Prepare data

```{r message=FALSE, warning=FALSE}
library(plm)
mydata <- read.csv("panel_wage.csv")
attach(mydata) # keep using this data frame from here on, so variables can be called without the 'mydata$' prefix
formula <- lwage ~ exp + exp2 + wks + ed
```

+ Variable list

<table>
  <tr> <td> lwage </td> <td> log(Wage) </td> <td> Dependent Variable </td> </tr>
  <tr> <td> exp </td> <td> Experience </td> <td> Varying Regressor </td> </tr>
  <tr> <td> exp2 </td> <td> Experience^2 </td> <td> Varying Regressor </td> </tr>
  <tr> <td> wks </td> <td> Weeks worked </td> <td> Varying Regressor </td> </tr>
  <tr> <td> ed </td> <td> Education </td> <td> Time-invariant Regressor </td> </tr>
</table>

+ Set as panel data

```{r}
pdata <- pdata.frame(mydata, index=c("id","t"))
```

## <a id='ch2'></a> 2. Dealing with correlation among errors: data transformations

### I. Estimators

+ Pooled OLS
    + Transformation: none
    + No. obs: NT
    + **Ignores the unobserved heterogeneity of individuals** (possible correlation within groups)

    ```{r results='hide'}
    pooling <- plm(formula, data=pdata, model= "pooling")
    summary(pooling) # same as lm(), but ignores the panel data structure
    ```

+ Between (individual)
    + Transformation: **Time average of all variables**
    + No. obs: N
    + Loses information

    ```{r results='hide'}
    between <- plm(formula, data=pdata, model= "between")
    summary(between)
    ```

+ Within (individual, across time), Fixed Effects (FE)
    + Transformation: **Time-demean**
    + No. obs: NT
    + Individual-specific effect (𝜶i) cancelled (the individual's effect is subtracted out)
    + The individual-specific error is removed; only the (time-demeaned) idiosyncratic error remains
    + ![](https://i.imgur.com/prDdXtN.png)
        + Error (𝜶) is the individual-specific error
        + Error (e) is the idiosyncratic error
    + **Time-invariant variables are dropped**

    ```{r results='hide'}
    fixed <- plm(formula, data=pdata, model= "within")
    summary(fixed)
    ```

<center>

![](https://i.imgur.com/5R8WkgO.png =80%x)
(Illustration of the difference between Between and Within)

</center>
<br>

+ LSDV, Fixed Effects (FE): include a dummy variable for each individual

+ First Difference (FD)
    + Transformation: **One-period difference**
    + No. obs: N(T-1)
    + Individual-specific effect (𝜶i) cancelled
    + **Time-invariant variables are dropped**
    + No constant (it is differenced out)
    + Can be used only when the strict exogeneity assumption holds

    ```{r results='hide'}
    firstdiff <- plm(formula, data=pdata, model= "fd")
    summary(firstdiff) # there is no true intercept; the coefficient on exp is reported as (Intercept)
    ```

![](https://i.imgur.com/OTlZvuc.png =85%x)
<br>
![](https://i.imgur.com/n5ZuxtY.png =85%x)
<center>(Why FE model is preferred to FD model)</center>
<br>

+ Random Effects (RE)
    + Transformation: **Weighted average of between & within estimates**
    + ![](https://i.imgur.com/cAH1HRW.png)
    + theta (or lambda) lies between 0 and 1
        + The closer it is to 0, the closer RE is to pooled OLS; the closer it is to 1, the closer RE is to the within estimator
    + If the individual-specific effect is small and the model is not inconsistent (both RE and FE are consistent), choose the RE model (more efficient)
    + No. obs: NT

    ```{r results='hide'}
    random <- plm(formula, data=pdata, model= "random")
    summary(random)
    ```

### II. Situation

+ In econometrics, consistency matters most; efficiency comes second

<table>
  <tr> <td> Estimator \ True model </td> <th> Pooled model </th> <th> RE model </th> <th> FE model </th> </tr>
  <tr> <th> Pooled OLS estimator </th> <td> Consistent </td> <td> Consistent </td> <td> Inconsistent </td> </tr>
  <tr> <th> Between estimator </th> <td> Consistent </td> <td> Consistent </td> <td> Inconsistent </td> </tr>
  <tr> <th> Within or FE estimator </th> <td> Consistent </td> <td> Consistent </td> <td> Consistent </td> </tr>
  <tr> <th> RE estimator </th> <td> Consistent </td> <td> Consistent </td> <td> Inconsistent </td> </tr>
</table>

## <a id='ch3'></a> 3. Choose a Model

<center>

![](https://i.imgur.com/xk6SQSU.png =65%x)
(Flowchart for Choosing a Model)

</center>
<br>

![](https://i.imgur.com/GRNELxw.png =85%x)
<br>
![](https://i.imgur.com/ZzwuUyQ.png =85%x)
<br>

+ Heteroscedasticity: BP test

```{r message=FALSE, warning=FALSE}
library(lmtest)
bptest(pooling)
```

+ Other statistical tests

```{r}
## LM test for random effects versus OLS
plmtest(pooling)
## F test for fixed effects versus OLS
pFtest(fixed, pooling)
```

+ Hausman test: FE vs. RE
    + Can be calculated only for the time-varying regressors.
    + Significant: use the fixed effects.
    + Insignificant: use the random effects.

```{r}
phtest(random, fixed)
```

## <a id='ch4'></a> 4. Explanation

<center>

![](https://i.imgur.com/vFaj4Fx.png =65%x)

</center>

+ Whichever estimator is used, the results show that more experience and more education are associated with higher wages
+ For each model:
    + [Pooled OLS] Across individuals and time, one additional year of work experience leads to a 4% increase in wages
    + [Between] A person with one more year of work experience has an average wage 3% higher than others
    + [Within] Each additional year of work experience, relative to the individual's own average, is associated with 11% higher wages
    + [First differences] From one year to the next, each additional year of work experience raises wages by 11%
    + [Random] Each additional year of work experience, relative to the individual's own average, is associated with 8% higher wages
+ Because the Hausman test shows that the FE and RE coefficients differ significantly, we choose the FE model
+ Rho is the share of variation that is individual-specific. In this example a very high share (FE: 98%, RE: 81%) is explained by the individual-specific term; the unexplained remainder is due to the idiosyncratic error
    + Rho captures variation across individuals (because no individual term is included in the model)
    + If the individual carries an effect, that effect ends up in the error term, mixed with the true (idiosyncratic) error
+ Lambda is 82%, so the RE estimates are closer to the within estimates than to the pooled estimates
+ FE subtracts out all individual effects, so its R-squared is larger
+ The R-squared values show that the between estimator explains 32% of the between variation, while the FE and RE estimators explain 66% and 63% of the within variation, respectively
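
The rho and lambda figures above can be tied together with the standard random-effects formulas for a balanced panel. This is only a sketch under stated assumptions: $\sigma_\alpha^2$ and $\sigma_e^2$ are notation introduced here for the variances of the individual-specific and idiosyncratic errors, and $T = 7$ is assumed purely for illustration, since the panel length is not stated in these notes.

$$
\rho = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_e^2},
\qquad
\lambda = 1 - \sqrt{\frac{\sigma_e^2}{\sigma_e^2 + T\,\sigma_\alpha^2}}
        = 1 - \sqrt{\frac{1-\rho}{(1-\rho) + T\rho}}
$$

$$
\rho = 0.81,\; T = 7
\;\Rightarrow\;
\lambda = 1 - \sqrt{\frac{0.19}{0.19 + 7 \times 0.81}}
        = 1 - \sqrt{\frac{0.19}{5.86}}
        \approx 1 - 0.18 = 0.82
$$

Under that assumed $T$, the RE rho of 81% reproduces the reported lambda of about 82%. The formula also shows why lambda moves toward 1 (the within estimator) as the individual-specific variance or the number of periods grows, and toward 0 (pooled OLS) as $\sigma_\alpha^2$ shrinks.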