Simulation of the Post-Double LASSO Estimator
===
## 1. Model setup
#### Structural model (partially linear model):
$$
y_i = \alpha_0 d_i + \boldsymbol{x}_i' \boldsymbol{\theta}_g + l_d \gamma_{g,i} +\sigma_y(d_i, \boldsymbol{x}_i) \varepsilon_i, \quad \varepsilon_i \sim N(0,1)
$$
$$
d_i = \boldsymbol{x}_i' \boldsymbol{\theta}_m + l_d \gamma_{m, i} + \sigma_d(\boldsymbol{x}_i) \nu_i, \quad \nu_i \sim N(0,1)
$$
- $n = 100$
- $p = \mbox{dim}(\boldsymbol{x}_i) = 200$
- $\alpha_0 = 0.5$
- $\boldsymbol{x}_i \sim N(0, \Sigma) \mbox{ with } \Sigma_{kj} = (0.5)^{|j-k|}$
- $\gamma_{g,i} \sim N(0, 1)$
- $\gamma_{m,i} \sim N(0, 1)$
- $l_d = 1$ if the DGP is Design 1 or Design 2
<br>
#### Three different DGPs:
1. Design 1 (homoscedastic setting):
- $\theta_{g,j} = c_y \beta_{0,j}$
- $\theta_{m,j} = c_d \beta_{0,j}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_y = \sigma_d = 1$
2. Design 2 (heteroscedastic setting):
- $\theta_{g,j} = c_y \beta_{0,j}$
- $\theta_{m,j} = c_d \beta_{0,j}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_{d,i} = \sqrt{\frac{(1+x_i'\beta_0)^2}{E_n(1+x_i'\beta_0)^2}}$
- $\sigma_{y,i} = \sqrt{\frac{(1+\alpha_0 d_i + x_i'\beta_0)^2}{E_n(1+\alpha_0 d_i + x_i'\beta_0)^2}}$
3. Design 3 (combination of deterministic and random coefficients):
- $\theta_{g, j} = \begin{cases} c_{y}(\frac{1}{j})^{2} & \text{for } j \le 5 \\ \text{drawn i.i.d. from } N(0, \frac{1}{p}) & \text{for } j > 5\end{cases}$
- $\theta_{m, j} = \begin{cases} c_{d}(\frac{1}{j})^{2} & \text{for } j \le 5 \\ \text{drawn i.i.d. from } N(0, \frac{1}{p}) & \text{for } j > 5\end{cases}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_{d,i} = \sqrt{\frac{(1+x_i'\beta_0)^2}{E_n(1+x_i'\beta_0)^2}}$
- $\sigma_{y,i} = \sqrt{\frac{(1+\alpha_0 d_i + x_i'\beta_0)^2}{E_n(1+\alpha_0 d_i + x_i'\beta_0)^2}}$
*NOTE: The article does not specify $\sigma_{d, i}$ and $\sigma_{y, i}$ for Design 3; we assume they are the same as in Design 2.*
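The constants $c_y$ and $c_d$ above rescale $\boldsymbol{\beta}_0$ so that the two equations hit target population $R^2$ values. For $c_d$ this gives the closed form used in the code below; a sketch of the algebra, ignoring the $\gamma$ term (as the code does) and using $\mathrm{Var}(\nu_i) = 1$:
$$
R^2_d = \frac{c_d^2\,\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0}{c_d^2\,\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0 + 1}
\;\Longrightarrow\;
c_d = \sqrt{\frac{R^2_d/(1-R^2_d)}{\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0}}.
$$
The analogous condition for $c_y$ is quadratic in $c_y$, because $d_i$ itself depends on $c_d$ and enters the $y$-equation, so the code solves it numerically with `multiroot`.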
<br>
<br>
---
## 2. Code (implemented in R)
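The chunks in this section assume the following packages are attached; the original write-up does not list them, so this setup is our reconstruction. In particular, `rMVNorm` is assumed to come from `mixAK` (`MASS::mvrnorm` is a close alternative).
```{r}
# Packages assumed by the code below (not listed in the original write-up):
library(glmnet)    # LASSO fits
library(rootSolve) # multiroot: root finder used to solve for c_y
library(magrittr)  # the %>% pipe
library(MASS)      # ginv: Moore-Penrose generalized inverse
library(mixAK)     # rMVNorm: multivariate normal sampler (assumed source)
```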
#### 2.1 Generate the covariance matrix $\Sigma$ of $X$
```{r}
####################################################################
##                                                                ##
##  SigmaMat_generator: covariance matrix of X                    ##
##                                                                ##
##  1. p: number of variables                                     ##
##  2. base: decay rate of the correlations (default 0.5)         ##
##                                                                ##
####################################################################
SigmaMat_generator <- function(p, base = 0.5) {
  # Sigma[k, j] = base^|k - j| (Toeplitz), built in one call via outer()
  return(base^abs(outer(1:p, 1:p, "-")))
}
```
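A quick sanity check of the generator on a small case (our addition):
```{r}
# For p = 3 and base = 0.5 we expect
#   1.00 0.50 0.25
#   0.50 1.00 0.50
#   0.25 0.50 1.00
SigmaMat_generator(3)
```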
<br>
#### 2.2 Simulate the data
```{r}
##########################################################################
## ##
## simulate_DGP: simulate DGP according to the design ##
## ##
##  1. Design: which design (1, 2, or 3) to simulate                   ##
## 2. n: number of observations ##
## 3. p: number of variables ##
## 4. alpha: true value for parameter of interest ##
## 5. Rsq_d: R-square in second equation ##
## 6. Rsq_y: R-square in first equation ##
## ##
## RETURN: a list containing the elements of DGP ##
## ##
##########################################################################
simulate_DGP <- function(Design, n, p, alpha, Rsq_d, Rsq_y) {
#################### Simulating the data #####################
# list being returned when the function is called
res <- list()
# epsilon: error term in 1st equation
e <- rnorm(n, 0, 1) %>% matrix(nrow = n, ncol = 1)
# v: error term in 2nd equation
v <- rnorm(n, 0, 1) %>% matrix(nrow = n, ncol = 1)
##################### Local setting for different DESIGN #####################
# Design 1: homoscedastic
# Design 2: heteroscedastic
# Design 3: combination of deterministic and random coefficients
if (Design == 1) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1) # beta_{0,j} = (1/j)^2 as specified
# Sigma: covariance matrix in X
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
################## decide the constants c_d and c_y ###################
# c_d: closed form solution
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
# c_y: solve second-degree polynomial equation
a <- t(beta_0)%*%SigmaMat%*%beta_0
b <- 2*alpha*c_d*t(beta_0)%*%SigmaMat%*%beta_0
c <- -((alpha^2+1)/(1-Rsq_y^2)-alpha^2-1)
func <- function(x) a*x^2 + b*x + c # define second order polynomial equation
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
########################################################################
# regression coefficient: theta = c * beta
# constant: control the R_square for each reduced form
theta_g = c_y*beta_0
theta_m = c_d*beta_0
# treatment variable: d
sigma_d = 1
d <- X%*%theta_m + sigma_d*v # simulate d
# outcome variable: y
sigma_y = 1
y <- alpha*d + X%*%theta_g + sigma_y*e # simulate y
} else if (Design == 2) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1)
# Sigma: covariance matrix in X
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
########################## decide the constants c_d and c_y #########################
# c_d: closed form solution
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
# c_y: solve second-degree polynomial equation
a <- alpha^2*(1-Rsq_y)*t(beta_0)%*%SigmaMat%*%beta_0
b <- c_d*(1-Rsq_y)*(t(beta_0)%*%beta_0)
c <- (c_d^2*t(beta_0)%*%SigmaMat%*%beta_0+1.25)*(1-Rsq_y)-1.25
func <- function(x) a*x^2 + b*x + c # define second order polynomial equation
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
##############################################################################
# regression coefficient: theta = c * beta
theta_g = c_y*beta_0
theta_m = c_d*beta_0
# treatment variable: d
sigma_d <- sqrt((1+X%*%beta_0)^2 / (mean((1+X%*%beta_0)^2)))
d <- X%*%theta_m + sigma_d*v # simulate d
# outcome variable: y
sigma_y <- sqrt((1+alpha*d+X%*%beta_0)^2 / (mean((1+alpha*d+X%*%beta_0)^2)))
y <- alpha*d + X%*%theta_g + sigma_y*e
} else if (Design == 3) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1)
# Sigma
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
# Generating thetas for Design 3
theta_g <- matrix(nrow = p)
theta_m <- matrix(nrow = p)
###################### decide the constants c_d and c_y ######################
# decide c_d & c_y
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
a <- alpha^2*(1-Rsq_y)*t(beta_0)%*%SigmaMat%*%beta_0
b <- c_d*(1-Rsq_y)*(t(beta_0)%*%beta_0)
c <- (c_d^2*t(beta_0)%*%SigmaMat%*%beta_0+1.25)*(1-Rsq_y)-1.25
func <- function(x) a*x^2 + b*x + c
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
###############################################################################
for (j in 1:p) {
if (j <= 5) {
theta_g[j, 1] <- c_y * ((1/j)^2)
theta_m[j, 1] <- c_d * ((1/j)^2)
} else {
theta_g[j, 1] <- rnorm(1, mean = 0, sd = sqrt(1/p))
theta_m[j, 1] <- rnorm(1, mean = 0, sd = sqrt(1/p))
}
}
# treatment variable: d
sigma_d <- sqrt((1+X%*%beta_0)^2 / (mean((1+X%*%beta_0)^2)))
d <- X%*%theta_m + sigma_d*v
# outcome variable: y
sigma_y <- sqrt((1+alpha*d+X%*%beta_0)^2 / (mean((1+alpha*d+X%*%beta_0)^2)))
y <- alpha*d + X%*%theta_g + sigma_y*e
} else {
stop("Design must be 1, 2, or 3")
}
res[["X"]] <- X
res[["d"]] <- d
res[["y"]] <- y
res[["c_y"]] <- c_y
res[["c_d"]] <- c_d
return(res)
}
```
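A minimal usage sketch; the $R^2$ targets here are placeholder values, since the section does not fix them:
```{r}
# Simulate Design 1 at the dimensions above; Rsq_d and Rsq_y are assumed values
sim <- simulate_DGP(Design = 1, n = 100, p = 200, alpha = 0.5,
                    Rsq_d = 0.5, Rsq_y = 0.5)
str(sim, max.level = 1)  # returns X, d, y, c_y, c_d
```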
#### 2.3 Post-Double LASSO estimator
```{r}
##################################################################################
## ##
##  postDouble_LASSO: return the estimate of alpha from the post-double       ##
##  LASSO method                                                              ##
##                                                                            ##
##  1. n: number of observations                                              ##
##  2. p: number of variables                                                 ##
##  3. lst: list containing the simulated data                                ##
## ##
## RETURN: estimation of alpha by postDouble method ##
## ##
##################################################################################
postDouble_LASSO <- function(n, p, lst) {
## storing the index of remaining variables after LASSO
idx_fit1 <- vector(mode = "integer")
idx_fit2 <- vector(mode = "integer")
X <- lst$X
y <- lst$y
d <- lst$d
################# Estimating procedure ###############
# constants for the penalty level (same values as in iterated_loading)
const <- 1.1   # slack constant c of (2.14)
gamma <- 0.05  # significance level
# lambda proposed by (2.14)
lambda <- 2*const*sqrt(n)*qnorm(1-gamma/(2*p))
# penalty loading proposed by APPENDIX A
loading1 <- iterated_loading(n, p, X, y, nu = 0.01, K = 5)
# fit the LASSO with the lambda and loadings defined above: regress y on X
lasso1 <-
glmnet(X, y, family = "gaussian", alpha = 1, lambda = lambda/n,
penalty.factor = loading1, intercept = F)
# which variables still remain in the model?
beta_lasso1 <- as.vector(lasso1$beta)
idx_fit1 <- which(beta_lasso1 != 0)
# penalty loading proposed by APPENDIX A
loading2 <- iterated_loading(n, p, X, d, nu = 0.01, K = 5)
lasso2 <-
glmnet(X, d, family = "gaussian", alpha = 1, lambda = lambda/n,
penalty.factor = loading2, intercept = F)
# which variables still remain in the model?
beta_lasso2 <- as.vector(lasso2$beta)
idx_fit2 <- which(beta_lasso2 != 0)
# union of idx_fit1 and idx_fit2
union_idx <- sort(union(idx_fit1, idx_fit2))
# s_hat: number of elements in the union set
s_hat <- length(union_idx)
# define new design matrix used in post-OLS stage
X_new <- cbind(d, X[, union_idx])
# estimate the parameter of interest by OLS, using the generalized inverse
para_hat <- ginv(t(X_new)%*%X_new)%*%t(X_new)%*%y
alpha_hat <- para_hat[1, 1]
return(alpha_hat)
}
```
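The defining step is the union of the two selected sets: any control picked in either reduced form enters the final OLS, which protects $\hat{\alpha}$ against dropping a confounder that predicts $d$ strongly but $y$ only weakly. Continuing the sketch from Section 2.2:
```{r}
# Post-double-selection estimate on the simulated data from Section 2.2
alpha_hat_double <- postDouble_LASSO(n = 100, p = 200, lst = sim)
alpha_hat_double  # should be close to the true alpha = 0.5
```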
<br>
#### 2.4 Conventional Post-LASSO estimator
```{r}
#######################################################################
## ##
##  post_LASSO: return the estimate of alpha from the naive post-LASSO  ##
##  method (d is excluded from the penalty)                             ##
##                                                                      ##
##  1. n: number of observations                                        ##
##  2. p: number of variables                                           ##
##  3. lst: list containing the simulated data                          ##
## ##
## RETURN: estimation of alpha by post LASSO method ##
## ##
#######################################################################
post_LASSO <- function(n, p, lst) {
# store the indices of variables retained by LASSO; d is left unpenalized (loading 0), so it is never dropped
idx_fit <- vector(mode = "integer")
X <- lst$X
y <- lst$y
d <- lst$d
# define new design matrix used in single-LASSO
X_new <- cbind(d, X)
######################### Estimating procedure ##########################
# constants for the penalty level (same values as in iterated_loading)
const <- 1.1   # slack constant c of (2.14)
gamma <- 0.05  # significance level
# lambda proposed by (2.14)
lambda <- 2*const*sqrt(n)*qnorm(1-gamma/(2*p))
# penalty loading proposed by APPENDIX A
loading <- iterated_loading(n, p, X, y, 0.01, 5)
loading <- c(0, loading)
# fit the LASSO using the lambda defined above
lasso.naive <-
glmnet(X_new, y, family = "gaussian", alpha = 1, lambda = lambda/n, intercept = F,
penalty.factor = loading)
# which variables still remain in the model after model selection (contains d for sure)
beta_lasso <- as.vector(lasso.naive$beta)
idx_fit <- which(beta_lasso != 0)
# define new design matrix used in post-staged OLS
X_new <- X_new[, idx_fit]
# estimate parameter by OLS
para_hat <- ginv(t(X_new)%*%X_new)%*%t(X_new)%*%y
alpha_hat <- para_hat[1, 1]
return(alpha_hat)
}
```
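For comparison, the naive single-selection estimate on the same draw:
```{r}
# Naive post-LASSO estimate on the same simulated data
alpha_hat_naive <- post_LASSO(n = 100, p = 200, lst = sim)
alpha_hat_naive
```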
<br>
#### 2.5 Lambda and penalty loadings
```{r}
##############################################################################
## ##
##  iterated_loading: choose the penalty loadings by the iterated method  ##
## ##
## 1. n: number of observations ##
## 2. p: number of variables ##
## 3. X: design matrix containing covariates ##
##  4. y: vector of the response variable                                 ##
##  5. nu: tolerance level                                                ##
##  6. K: maximum number of iterations                                    ##
## ##
## RETURN: a p-dimensional vector ##
## ##
##############################################################################
iterated_loading <- function(n, p, X, y, nu, K) {
# constant
c <- 1.1
# gamma (significant level)
gamma <- 0.05
# lambda proposed by (2.14)
init_lambda <- 2*c*sqrt(n)*qnorm(1-gamma/(2*p))
# least squares estimator via the generalized inverse
# (note: with p > n this minimum-norm fit interpolates, so the stage-0
# residuals below can be essentially zero)
beta_bar <- ginv(t(X)%*%X)%*%t(X)%*%y
############## estimating the penalty loadings #################
# penalty loading in zero stage
init_loading <- vector(length = p, mode = "numeric")
residual <- as.vector(y - X%*%beta_bar)
for (j in 1:p) {
init_loading[j] = sqrt(mean(X[ ,j]^2*residual^2))
}
# stage counter
k <- 0
# penalty loadings at the current stage (refreshed each iteration)
current_loading <- vector(length = p, mode = "numeric")
# penalty loadings for the next stage, seeded with the stage-0 loadings
next_loading <- init_loading
while (TRUE) {
# increment k in each iteration
k = k+1
print(paste("This is", k, "stage"))
# reassign l_k_next_hat to l_k_hat from previous iteration
current_loading <- next_loading
### (1) compute post-LASSO estimator based on loading current_loading
lasso_k <-
glmnet(X, y, family = "gaussian", alpha = 1, lambda = init_lambda/n,
intercept = F, penalty.factor = current_loading)
beta_lasso <- as.vector(lasso_k$beta)
# index set
idx_fit <- which(beta_lasso != 0)
# estimate post-LASSO estimator
X_temp <- X[, idx_fit] %>% matrix(nrow = n, byrow = F)
beta_tilda <- ginv(t(X_temp)%*%X_temp)%*%t(X_temp)%*%y
### (2) calculate next loading
s_hat <- length(idx_fit)
print(paste("the variable remaining after LASSO in", k, "stage:", s_hat))
for (j in 1:p) {
next_loading[j] = sqrt(mean(X[ ,j]^2 * as.vector(y - X[, idx_fit] %*% beta_tilda)^2)) *
sqrt(n / (n - s_hat))
}
### (3): stopping criteria
criteria <- max(abs(current_loading - next_loading))
if (criteria <= nu || k >= K) {
print(paste("completed at stage", k))
print(paste("criteria:", criteria))
print(next_loading)
print(paste("S_hat: ", s_hat))
return(next_loading)
}
}
}
```
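For concreteness, the penalty level of (2.14) at this simulation's dimensions, with $c = 1.1$ and $\gamma = 0.05$ as set inside `iterated_loading`:
```{r}
# lambda = 2 c sqrt(n) Phi^{-1}(1 - gamma/(2p)) at n = 100, p = 200
2 * 1.1 * sqrt(100) * qnorm(1 - 0.05 / (2 * 200))  # approx. 80.7
```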
#### 2.6 Main function
```{r}
main <- function(seed, Design, n, p, alpha, Rsq_d, Rsq_y) {
##############################################################################
## ##
## main: main function to make estimation based on different method ##
## ##
##  1. seed: controls the randomness                                       ##
##  2. Design: which design (1, 2, or 3) to simulate                       ##
##  3. n: number of observations                                           ##
##  4. p: number of covariates                                             ##
##  5. alpha: true value of the parameter of interest                      ##
##  6. Rsq_d: target R-square for equation (2)                             ##
##  7. Rsq_y: target R-square for equation (1)                             ##
## ##
##############################################################################
# return object
est_result <- list()
# control the randomness
set.seed(seed)
# simulate the data
sim <- simulate_DGP(Design, n, p, alpha, Rsq_d, Rsq_y)
# estimate the alpha from postDouble LASSO method
postDouble_est <- postDouble_LASSO(n, p, sim)
est_result[["postDouble_est"]] <- postDouble_est
# estimate the alpha from post LASSO method
post_est <- post_LASSO(n, p, sim)
est_result[["post_est"]] <- post_est
return(est_result)
}
```
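A minimal sketch of a driver that could produce the RMSE tables in Section 4; the number of replications and the $R^2$ targets are our assumptions, since the text does not report them:
```{r}
# Hypothetical driver: replicate both estimators across seeds and report RMSE.
# nrep, Rsq_d, and Rsq_y are assumed values, not taken from the original text.
run_simulation <- function(Design, nrep = 100, n = 100, p = 200,
                           alpha = 0.5, Rsq_d = 0.5, Rsq_y = 0.5) {
  double_est <- numeric(nrep)
  naive_est  <- numeric(nrep)
  for (s in 1:nrep) {
    est <- main(seed = s, Design = Design, n = n, p = p,
                alpha = alpha, Rsq_d = Rsq_d, Rsq_y = Rsq_y)
    double_est[s] <- est$postDouble_est
    naive_est[s]  <- est$post_est
  }
  c(RMSE_double = sqrt(mean((double_est - alpha)^2)),
    RMSE_naive  = sqrt(mean((naive_est  - alpha)^2)))
}
run_simulation(Design = 1)  # e.g., Design 1
```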
<br>
<br>
---
## 3. Results from simulations
### 3.1 Original $c_y$



### 3.2 Revised $c_y$

### 3.3 Revised $c_y$ and larger $\lambda$



---
## 4. RMSE
### 4.1 RMSE with original $c_y$
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1947 | 0.3483 |
| Design 2 | 0.2780 | 0.4100 |
| Design 3 | 4.5254 | 0.4360 |
### 4.2 RMSE with revised $c_y$
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1181 | 0.2376 |
### 4.3 RMSE with revised $c_y$ and larger lambda
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1340 | 0.2497 |
| Design 2 | 0.1834 | 0.2576 |
| Design 3 | 0.2006 | 0.2071 |