Simulation of the Post-Double LASSO Estimator
===
## 1. Model setup
#### Structural model (partially linear model):
$$
y_i = \alpha_0 d_i + \boldsymbol{x}_i' \boldsymbol{\theta}_g + l_d \gamma_{g,i} +\sigma_y(d_i, \boldsymbol{x}_i) \varepsilon_i, \quad \varepsilon_i \sim N(0,1)
$$
$$
d_i = \boldsymbol{x}_i' \boldsymbol{\theta}_m + l_d \gamma_{m, i} + \sigma_d(\boldsymbol{x}_i) \nu_i, \quad \nu_i \sim N(0,1)
$$
- $n = 100$
- $p = \mbox{dim}(\boldsymbol{x}_i) = 200$
- $\alpha_0 = 0.5$
- $\boldsymbol{x}_i \sim N(0, \Sigma) \mbox{ with } \Sigma_{kj} = (0.5)^{|j-k|}$
- $\gamma_{g,i} \sim N(0, 1)$
- $\gamma_{m,i} \sim N(0, 1)$
- $l_d = 1$ if the DGP is Design 1 or Design 2
<br>
#### Three different DGPs:
1. Design 1 (homoscedastic setting):
- $\theta_{g,j} = c_y \beta_{0,j}$
- $\theta_{m,j} = c_d \beta_{0,j}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_y = \sigma_d = 1$
2. Design 2 (heteroscedastic setting):
- $\theta_{g,j} = c_y \beta_{0,j}$
- $\theta_{m,j} = c_d \beta_{0,j}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_{d,i} = \sqrt{\frac{(1+x_i'\beta_0)^2}{E_n(1+x_i'\beta_0)^2}}$
- $\sigma_{y,i} = \sqrt{\frac{(1+\alpha_0 d_i + x_i'\beta_0)^2}{E_n(1+\alpha_0 d_i + x_i'\beta_0)^2}}$
3. Design 3 (combination of deterministic and random coefficients):
- $\theta_{g, j} = \begin{cases} c_{y}(\frac{1}{j})^{2} & \text{for } j \le 5 \\ \text{drawn i.i.d. from } N(0, \frac{1}{p}) & \text{for } j > 5\end{cases}$
- $\theta_{m, j} = \begin{cases} c_{d}(\frac{1}{j})^{2} & \text{for } j \le 5 \\ \text{drawn i.i.d. from } N(0, \frac{1}{p}) & \text{for } j > 5\end{cases}$
- $\beta_{0,j} = (1/j)^2, \quad j=1,2,...,200$
- $\sigma_{d,i} = \sqrt{\frac{(1+x_i'\beta_0)^2}{E_n(1+x_i'\beta_0)^2}}$
- $\sigma_{y,i} = \sqrt{\frac{(1+\alpha_0 d_i + x_i'\beta_0)^2}{E_n(1+\alpha_0 d_i + x_i'\beta_0)^2}}$
*NOTE: The article does not specify $\sigma_{d, i}$ and $\sigma_{y, i}$ for Design 3; we assume they are the same as in Design 2.*
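The constants $c_y$ and $c_d$ above rescale $\boldsymbol{\beta}_0$ so that the two equations hit target population $R^2$ values. For $c_d$ this gives the closed form used in the code below; a sketch of the algebra, ignoring the $\gamma$ term (as the code does) and using $\mathrm{Var}(\nu_i) = 1$:
$$
R^2_d = \frac{c_d^2\,\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0}{c_d^2\,\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0 + 1}
\;\Longrightarrow\;
c_d = \sqrt{\frac{R^2_d/(1-R^2_d)}{\boldsymbol{\beta}_0'\Sigma\boldsymbol{\beta}_0}}.
$$
The analogous condition for $c_y$ is quadratic in $c_y$, because $d_i$ itself depends on $c_d$ and enters the $y$-equation, so the code solves it numerically with `multiroot`.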
<br>
<br>
---
## 2. Code (implemented in R)
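The chunks in this section assume the following packages are attached; the original write-up does not list them, so this setup is our reconstruction. In particular, `rMVNorm` is assumed to come from `mixAK` (`MASS::mvrnorm` is a close alternative).
```{r}
# Packages assumed by the code below (not listed in the original write-up):
library(glmnet)    # LASSO fits
library(rootSolve) # multiroot: root finder used to solve for c_y
library(magrittr)  # the %>% pipe
library(MASS)      # ginv: Moore-Penrose generalized inverse
library(mixAK)     # rMVNorm: multivariate normal sampler (assumed source)
```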
#### 2.1 Generate the covariance matrix $\Sigma$ of $X$
```{r}
####################################################################
##                                                                ##
##  SigmaMat_generator: covariance matrix of X                    ##
##                                                                ##
##  1. p: number of variables                                     ##
##  2. base: decay rate of the correlations (default 0.5)         ##
##                                                                ##
####################################################################
SigmaMat_generator <- function(p, base = 0.5) {
  # Sigma[k, j] = base^|k - j| (Toeplitz), built in one call via outer()
  return(base^abs(outer(1:p, 1:p, "-")))
}
```
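A quick sanity check of the generator on a small case (our addition):
```{r}
# For p = 3 and base = 0.5 we expect
#   1.00 0.50 0.25
#   0.50 1.00 0.50
#   0.25 0.50 1.00
SigmaMat_generator(3)
```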
<br>
#### 2.2 Simulate the data
```{r}
##########################################################################
## ##
## simulate_DGP: simulate DGP according to the design ##
## ##
##  1. Design: which design (1, 2, or 3) to simulate                   ##
## 2. n: number of observations ##
## 3. p: number of variables ##
## 4. alpha: true value for parameter of interest ##
## 5. Rsq_d: R-square in second equation ##
## 6. Rsq_y: R-square in first equation ##
## ##
## RETURN: a list containing the elements of DGP ##
## ##
##########################################################################
simulate_DGP <- function(Design, n, p, alpha, Rsq_d, Rsq_y) {
#################### Simulating the data #####################
# list being returned when the function is called
res <- list()
# epsilon: error term in 1st equation
e <- rnorm(n, 0, 1) %>% matrix(nrow = n, ncol = 1)
# v: error term in 2nd equation
v <- rnorm(n, 0, 1) %>% matrix(nrow = n, ncol = 1)
##################### Local setting for different DESIGN #####################
# Design 1: homoscedastic
# Design 2: heteroscedastic
# Design 3: combination of deterministic and random coefficients
if (Design == 1) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1) # beta_{0,j} = (1/j)^2 as specified
# Sigma: covariance matrix in X
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
################## decide the constants c_d and c_y ###################
# c_d: closed form solution
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
# c_y: solve second-degree polynomial equation
a <- t(beta_0)%*%SigmaMat%*%beta_0
b <- 2*alpha*c_d*t(beta_0)%*%SigmaMat%*%beta_0
c <- -((alpha^2+1)/(1-Rsq_y^2)-alpha^2-1)
func <- function(x) a*x^2 + b*x + c # define second order polynomial equation
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
########################################################################
# regression coefficient: theta = c * beta
# constant: control the R_square for each reduced form
theta_g = c_y*beta_0
theta_m = c_d*beta_0
# treatment variable: d
sigma_d = 1
d <- X%*%theta_m + sigma_d*v # simulate d
# outcome variable: y
sigma_y = 1
y <- alpha*d + X%*%theta_g + sigma_y*e # simulate y
} else if (Design == 2) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1)
# Sigma: covariance matrix in X
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
########################## decide the constants c_d and c_y #########################
# c_d: closed form solution
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
# c_y: solve second-degree polynomial equation
a <- alpha^2*(1-Rsq_y)*t(beta_0)%*%SigmaMat%*%beta_0
b <- c_d*(1-Rsq_y)*(t(beta_0)%*%beta_0)
c <- (c_d^2*t(beta_0)%*%SigmaMat%*%beta_0+1.25)*(1-Rsq_y)-1.25
func <- function(x) a*x^2 + b*x + c # define second order polynomial equation
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
##############################################################################
# regression coefficient: theta = c * beta
theta_g = c_y*beta_0
theta_m = c_d*beta_0
# treatment variable: d
sigma_d <- sqrt((1+X%*%beta_0)^2 / (mean((1+X%*%beta_0)^2)))
d <- X%*%theta_m + sigma_d*v # simulate d
# outcome variable: y
sigma_y <- sqrt((1+alpha*d+X%*%beta_0)^2 / (mean((1+alpha*d+X%*%beta_0)^2)))
y <- alpha*d + X%*%theta_g + sigma_y*e
} else if (Design == 3) {
# beta
beta_0 = (1/(1:p))^2 %>% matrix(nrow = p, ncol = 1)
# Sigma
SigmaMat <- SigmaMat_generator(p, base = 0.5)
# X: design matrix
meanMat <- matrix(0, nrow = p, ncol = 1) # zero-mean vector
X <- rMVNorm(n, mean = meanMat, Sigma = SigmaMat) %>% matrix(nrow = n, ncol = p)
# Generating thetas for Design 3
theta_g <- matrix(nrow = p)
theta_m <- matrix(nrow = p)
###################### decide the constants c_d and c_y ######################
# decide c_d & c_y
c_d <- sqrt(((1/(1-Rsq_d)) - 1) / (t(beta_0)%*%SigmaMat%*%beta_0)) %>% as.numeric()
a <- alpha^2*(1-Rsq_y)*t(beta_0)%*%SigmaMat%*%beta_0
b <- c_d*(1-Rsq_y)*(t(beta_0)%*%beta_0)
c <- (c_d^2*t(beta_0)%*%SigmaMat%*%beta_0+1.25)*(1-Rsq_y)-1.25
func <- function(x) a*x^2 + b*x + c
solver <- multiroot(func, start = 0) # using Newton method
c_y <- solver$root
###############################################################################
for (j in 1:p) {
if (j <= 5) {
theta_g[j, 1] <- c_y * ((1/j)^2)
theta_m[j, 1] <- c_d * ((1/j)^2)
} else {
theta_g[j, 1] <- rnorm(1, mean = 0, sd = sqrt(1/p))
theta_m[j, 1] <- rnorm(1, mean = 0, sd = sqrt(1/p))
}
}
# treatment variable: d
sigma_d <- sqrt((1+X%*%beta_0)^2 / (mean((1+X%*%beta_0)^2)))
d <- X%*%theta_m + sigma_d*v
# outcome variable: y
sigma_y <- sqrt((1+alpha*d+X%*%beta_0)^2 / (mean((1+alpha*d+X%*%beta_0)^2)))
y <- alpha*d + X%*%theta_g + sigma_y*e
} else {
stop("Design must be 1, 2, or 3")
}
res[["X"]] <- X
res[["d"]] <- d
res[["y"]] <- y
res[["c_y"]] <- c_y
res[["c_d"]] <- c_d
return(res)
}
```
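A minimal usage sketch; the $R^2$ targets here are placeholder values, since the section does not fix them:
```{r}
# Simulate Design 1 at the dimensions above; Rsq_d and Rsq_y are assumed values
sim <- simulate_DGP(Design = 1, n = 100, p = 200, alpha = 0.5,
                    Rsq_d = 0.5, Rsq_y = 0.5)
str(sim, max.level = 1)  # returns X, d, y, c_y, c_d
```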
#### 2.3 Post-Double LASSO estimator
```{r}
##################################################################################
## ##
##  postDouble_LASSO: return the estimate of alpha from the post-double       ##
##  LASSO method                                                              ##
##                                                                            ##
##  1. n: number of observations                                              ##
##  2. p: number of variables                                                 ##
##  3. lst: list containing the simulated data                                ##
## ##
## RETURN: estimation of alpha by postDouble method ##
## ##
##################################################################################
postDouble_LASSO <- function(n, p, lst) {
## storing the index of remaining variables after LASSO
idx_fit1 <- vector(mode = "integer")
idx_fit2 <- vector(mode = "integer")
X <- lst$X
y <- lst$y
d <- lst$d
################# Estimating procedure ###############
# constants for the penalty level (same values as in iterated_loading)
const <- 1.1   # slack constant c of (2.14)
gamma <- 0.05  # significance level
# lambda proposed by (2.14)
lambda <- 2*const*sqrt(n)*qnorm(1-gamma/(2*p))
# penalty loading proposed by APPENDIX A
loading1 <- iterated_loading(n, p, X, y, nu = 0.01, K = 5)
# fit the LASSO with the lambda and loadings defined above: regress y on X
lasso1 <-
glmnet(X, y, family = "gaussian", alpha = 1, lambda = lambda/n,
penalty.factor = loading1, intercept = F)
# which variables still remain in the model?
beta_lasso1 <- as.vector(lasso1$beta)
idx_fit1 <- which(beta_lasso1 != 0)
# penalty loading proposed by APPENDIX A
loading2 <- iterated_loading(n, p, X, d, nu = 0.01, K = 5)
lasso2 <-
glmnet(X, d, family = "gaussian", alpha = 1, lambda = lambda/n,
penalty.factor = loading2, intercept = F)
# which variables still remain in the model?
beta_lasso2 <- as.vector(lasso2$beta)
idx_fit2 <- which(beta_lasso2 != 0)
# union of idx_fit1 and idx_fit2
union_idx <- sort(union(idx_fit1, idx_fit2))
# s_hat: number of elements in the union set
s_hat <- length(union_idx)
# define new design matrix used in post-OLS stage
X_new <- cbind(d, X[, union_idx])
# estimate the parameter of interest by OLS, using the generalized inverse
para_hat <- ginv(t(X_new)%*%X_new)%*%t(X_new)%*%y
alpha_hat <- para_hat[1, 1]
return(alpha_hat)
}
```
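The defining step is the union of the two selected sets: any control picked in either reduced form enters the final OLS, which protects $\hat{\alpha}$ against dropping a confounder that predicts $d$ strongly but $y$ only weakly. Continuing the sketch from Section 2.2:
```{r}
# Post-double-selection estimate on the simulated data from Section 2.2
alpha_hat_double <- postDouble_LASSO(n = 100, p = 200, lst = sim)
alpha_hat_double  # should be close to the true alpha = 0.5
```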
<br>
#### 2.4 Conventional Post-LASSO estimator
```{r}
#######################################################################
## ##
##  post_LASSO: return the estimate of alpha from the naive post-LASSO  ##
##  method (d is excluded from the penalty)                             ##
##                                                                      ##
##  1. n: number of observations                                        ##
##  2. p: number of variables                                           ##
##  3. lst: list containing the simulated data                          ##
## ##
## RETURN: estimation of alpha by post LASSO method ##
## ##
#######################################################################
post_LASSO <- function(n, p, lst) {
# store the indices of variables retained by LASSO; d is left unpenalized (loading 0), so it is never dropped
idx_fit <- vector(mode = "integer")
X <- lst$X
y <- lst$y
d <- lst$d
# define new design matrix used in single-LASSO
X_new <- cbind(d, X)
######################### Estimating procedure ##########################
# constants for the penalty level (same values as in iterated_loading)
const <- 1.1   # slack constant c of (2.14)
gamma <- 0.05  # significance level
# lambda proposed by (2.14)
lambda <- 2*const*sqrt(n)*qnorm(1-gamma/(2*p))
# penalty loading proposed by APPENDIX A
loading <- iterated_loading(n, p, X, y, 0.01, 5)
loading <- c(0, loading)
# fit the LASSO using the lambda defined above
lasso.naive <-
glmnet(X_new, y, family = "gaussian", alpha = 1, lambda = lambda/n, intercept = F,
penalty.factor = loading)
# which variables still remain in the model after model selection (contains d for sure)
beta_lasso <- as.vector(lasso.naive$beta)
idx_fit <- which(beta_lasso != 0)
# define new design matrix used in post-staged OLS
X_new <- X_new[, idx_fit]
# estimate parameter by OLS
para_hat <- ginv(t(X_new)%*%X_new)%*%t(X_new)%*%y
alpha_hat <- para_hat[1, 1]
return(alpha_hat)
}
```
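For comparison, the naive single-selection estimate on the same draw:
```{r}
# Naive post-LASSO estimate on the same simulated data
alpha_hat_naive <- post_LASSO(n = 100, p = 200, lst = sim)
alpha_hat_naive
```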
<br>
#### 2.5 Lambda and penalty loadings
```{r}
##############################################################################
## ##
##  iterated_loading: choose the penalty loadings by the iterated method  ##
## ##
## 1. n: number of observations ##
## 2. p: number of variables ##
## 3. X: design matrix containing covariates ##
##  4. y: vector of the response variable                                 ##
##  5. nu: tolerance level                                                ##
##  6. K: maximum number of iterations                                    ##
## ##
## RETURN: a p-dimensional vector ##
## ##
##############################################################################
iterated_loading <- function(n, p, X, y, nu, K) {
# constant
c <- 1.1
# gamma (significant level)
gamma <- 0.05
# lambda proposed by (2.14)
init_lambda <- 2*c*sqrt(n)*qnorm(1-gamma/(2*p))
# least squares estimator via the generalized inverse
# (note: with p > n this minimum-norm fit interpolates, so the stage-0
# residuals below can be essentially zero)
beta_bar <- ginv(t(X)%*%X)%*%t(X)%*%y
############## estimating the penalty loadings #################
# penalty loading in zero stage
init_loading <- vector(length = p, mode = "numeric")
residual <- as.vector(y - X%*%beta_bar)
for (j in 1:p) {
init_loading[j] = sqrt(mean(X[ ,j]^2*residual^2))
}
# stage counter
k <- 0
# penalty loadings at the current stage (refreshed each iteration)
current_loading <- vector(length = p, mode = "numeric")
# penalty loadings for the next stage, seeded with the stage-0 loadings
next_loading <- init_loading
while (TRUE) {
# increment k in each iteration
k = k+1
print(paste("This is", k, "stage"))
# reassign l_k_next_hat to l_k_hat from previous iteration
current_loading <- next_loading
### (1) compute post-LASSO estimator based on loading current_loading
lasso_k <-
glmnet(X, y, family = "gaussian", alpha = 1, lambda = init_lambda/n,
intercept = F, penalty.factor = current_loading)
beta_lasso <- as.vector(lasso_k$beta)
# index set
idx_fit <- which(beta_lasso != 0)
# estimate post-LASSO estimator
X_temp <- X[, idx_fit] %>% matrix(nrow = n, byrow = F)
beta_tilda <- ginv(t(X_temp)%*%X_temp)%*%t(X_temp)%*%y
### (2) calculate next loading
s_hat <- length(idx_fit)
print(paste("the variable remaining after LASSO in", k, "stage:", s_hat))
for (j in 1:p) {
next_loading[j] = sqrt(mean(X[ ,j]^2 * as.vector(y - X[, idx_fit] %*% beta_tilda)^2)) *
sqrt(n / (n - s_hat))
}
### (3): stopping criteria
criteria <- max(abs(current_loading - next_loading))
if (criteria <= nu || k >= K) {
print(paste("completed at stage", k))
print(paste("criteria:", criteria))
print(next_loading)
print(paste("S_hat: ", s_hat))
return(next_loading)
}
}
}
```
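For concreteness, the penalty level of (2.14) at this simulation's dimensions, with $c = 1.1$ and $\gamma = 0.05$ as set inside `iterated_loading`:
```{r}
# lambda = 2 c sqrt(n) Phi^{-1}(1 - gamma/(2p)) at n = 100, p = 200
2 * 1.1 * sqrt(100) * qnorm(1 - 0.05 / (2 * 200))  # approx. 80.7
```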
#### 2.6 Main function
```{r}
main <- function(seed, Design, n, p, alpha, Rsq_d, Rsq_y) {
##############################################################################
## ##
## main: main function to make estimation based on different method ##
## ##
##  1. seed: controls the randomness                                       ##
##  2. Design: which design (1, 2, or 3) to simulate                       ##
##  3. n: number of observations                                           ##
##  4. p: number of covariates                                             ##
##  5. alpha: true value of the parameter of interest                      ##
##  6. Rsq_d: target R-square for equation (2)                             ##
##  7. Rsq_y: target R-square for equation (1)                             ##
## ##
##############################################################################
# return object
est_result <- list()
# control the randomness
set.seed(seed)
# simulate the data
sim <- simulate_DGP(Design, n, p, alpha, Rsq_d, Rsq_y)
# estimate the alpha from postDouble LASSO method
postDouble_est <- postDouble_LASSO(n, p, sim)
est_result[["postDouble_est"]] <- postDouble_est
# estimate the alpha from post LASSO method
post_est <- post_LASSO(n, p, sim)
est_result[["post_est"]] <- post_est
return(est_result)
}
```
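A minimal sketch of a driver that could produce the RMSE tables in Section 4; the number of replications and the $R^2$ targets are our assumptions, since the text does not report them:
```{r}
# Hypothetical driver: replicate both estimators across seeds and report RMSE.
# nrep, Rsq_d, and Rsq_y are assumed values, not taken from the original text.
run_simulation <- function(Design, nrep = 100, n = 100, p = 200,
                           alpha = 0.5, Rsq_d = 0.5, Rsq_y = 0.5) {
  double_est <- numeric(nrep)
  naive_est  <- numeric(nrep)
  for (s in 1:nrep) {
    est <- main(seed = s, Design = Design, n = n, p = p,
                alpha = alpha, Rsq_d = Rsq_d, Rsq_y = Rsq_y)
    double_est[s] <- est$postDouble_est
    naive_est[s]  <- est$post_est
  }
  c(RMSE_double = sqrt(mean((double_est - alpha)^2)),
    RMSE_naive  = sqrt(mean((naive_est  - alpha)^2)))
}
run_simulation(Design = 1)  # e.g., Design 1
```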
<br>
<br>
---
## 3. Results from simulations
### 3.1 Original $c_y$



### 3.2 Revised $c_y$

### 3.3 Revised $c_y$ and larger $\lambda$



---
## 4. RMSE
### 4.1 RMSE with original $c_y$
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1947 | 0.3483 |
| Design 2 | 0.2780 | 0.4100 |
| Design 3 | 4.5254 | 0.4360 |
### 4.2 RMSE with revised $c_y$
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1181 | 0.2376 |
### 4.3 RMSE with revised $c_y$ and larger lambda
| RMSE | Post-Double LASSO | Naive Post-LASSO |
|:--------:|:-------:|:------:|
| Design 1 | 0.1340 | 0.2497 |
| Design 2 | 0.1834 | 0.2576 |
| Design 3 | 0.2006 | 0.2071 |