# Sparse PCA
#### Principal component analysis (PCA)
PCA is a popular data-processing and dimension-reduction technique, with numerous applications in engineering, biology, and social science.
PCA produces uncorrelated derived variables while losing as little information as possible, which we regard as a major advantage. Its drawback is that each principal component is a linear combination of all of the original variables, so even after dimensionality reduction a large number of variables is still explicitly used. The elastic net, a generalization of the lasso, is a good variable-selection technique. We estimate the principal components with sparse loadings, an approach we term **sparse principal component analysis (SPCA)**.
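As a quick illustration of why sparse loadings are desirable, the following sketch (not from the paper; toy data only) runs ordinary PCA via the SVD and shows that every entry of a loading vector is generically nonzero:

```python
# Minimal sketch: ordinary PCA on a toy matrix, showing that every loading is
# typically nonzero, i.e. each PC is a linear combination of ALL original
# variables -- the interpretability problem that motivates sparse loadings.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 observations, 5 variables (toy data)
X -= X.mean(axis=0)                    # center before PCA

# PCA via SVD: columns of Vt.T are the loading vectors, s**2 / n the PC variances
U, s, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt.T
explained_var = s**2 / X.shape[0]

print(loadings[:, 0])     # first PC: all 5 entries are (generically) nonzero
print(explained_var)      # variances of the principal components
```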
#### The Lasso and The Elastic Net
In contrast to ordinary least squares, the lasso is a penalized least-squares method that imposes a constraint on the L1 norm of the regression coefficients. The lasso estimate $\hat{\beta}_{\text{lasso}}$ is obtained by minimizing the lasso criterion
$$ \hat{\beta}_{\text{lasso}} = \arg\min_{\beta}\left\|Y-\sum_{j=1}^{p}X_j\beta_j\right\|^2+\lambda\sum_{j=1}^{p}|\beta_j|$$The lasso continuously shrinks the coefficients toward zero and achieves its prediction accuracy via the bias-variance trade-off, yielding an accurate and sparse model. However, when p > n the lasso can select at most n variables, a limitation the elastic net overcomes. For non-negative ${\lambda}_1$ and ${\lambda}_2$, the elastic net estimate is
$$ \hat{\beta}_{\text{en}} = (1+\lambda_2)\left\{\arg\min_{\beta}\left\|Y-\sum_{j=1}^{p}X_j\beta_j\right\|^2+\lambda_2\sum_{j=1}^{p}|\beta_j|^2+\lambda_1\sum_{j=1}^{p}|\beta_j|\right\}$$The elastic net penalty is a convex combination of the ridge and lasso penalties; the lasso is the special case $\lambda_2 = 0$.
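As an illustrative sketch (not part of the paper), both estimators can be fit with scikit-learn; note that scikit-learn's `alpha` and `l1_ratio` parameterize the penalties differently from the $\lambda$'s above, and its `ElasticNet` solves the naive elastic net without the $(1+\lambda_2)$ rescaling:

```python
# Illustrative sketch: lasso and elastic net fits on simulated sparse data.
# sklearn minimizes (1/(2n))||y - Xb||^2 + penalty, so `alpha` is a rescaled
# version of the lambdas in the criteria above.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]                    # sparse ground truth
y = X @ beta_true + 0.1 * rng.normal(size=50)

lasso = Lasso(alpha=0.1).fit(X, y)                  # L1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X, y)  # mix of L1 and L2 penalties

print(np.flatnonzero(lasso.coef_))   # lasso zeroes out most coefficients
print(np.flatnonzero(enet.coef_))    # elastic net is also sparse
```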
#### Motivation
A procedure called SCoTLASS imposes an L1 constraint directly on PCA to obtain sparse loadings, but it suffers from high computational cost and loadings that are often not sparse enough. As an alternative, an elastic net regression can be used for the same purpose: by regressing each principal component on the variables, its loadings can be recovered. The pitfall of this approach, that a unique solution is not always obtained (for example when p > n), is handled by adding a positive ridge penalty. This yields a sparse approximation to every principal component; effectively, PCA is transformed into k independent ridge regression problems, and the lasso penalty is then added to produce sparse loadings. This can be generalized to:
1) A starts at the loadings of the first k ordinary principal components.
2) For fixed A, solve the elastic net problem for each j = 1, 2, ..., k to obtain B.
3) For fixed B, compute the SVD of $\mathbf{X}^T\mathbf{X}B = UDV^T$ and update $A=UV^T$.
4) Repeat steps 2 and 3 until convergence.
5) Normalize: $\hat{V}_j = \beta_j / \left\|\beta_j\right\|$ for j = 1, ..., k.
The explained total variance is computed from the QR decomposition $\hat{Z} = QR$ of the modified principal components: the adjusted variance of the $j$th component is $\left\|\hat{Z}_{j\,\cdot\,1,\ldots,j-1}\right\|^{2} = R_{jj}^{2}$, where $\hat{Z}_{j\,\cdot\,1,\ldots,j-1}$ denotes the part of $\hat{Z}_j$ not explained by the first $j-1$ components, so the explained total variance equals $\sum_{j=1}^{k} R_{jj}^{2}$.
This algorithm is computationally efficient when p is small (p < n). A rough code sketch of the procedure is given below.
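The following is a rough sketch of the alternating procedure above, including the QR-based adjusted variance. It is a hedged illustration, not the paper's implementation: the elastic-net step uses scikit-learn, whose `alpha`/`l1_ratio` parameters only loosely correspond to $\lambda$ and $\lambda_{1,j}$, and convergence checking is replaced by a fixed iteration count.

```python
# Sketch of the alternating SPCA algorithm (illustrative parameters).
import numpy as np
from sklearn.linear_model import ElasticNet

def spca(X, k, alpha=0.1, l1_ratio=0.9, n_iter=100):
    X = X - X.mean(axis=0)                       # center the data
    # Step 1: A starts at the loadings of the first k ordinary PCs
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:k].T                                 # p x k
    B = np.zeros_like(A)
    for _ in range(n_iter):                      # step 4: iterate (fixed count here)
        # Step 2: for fixed A, solve an elastic net for each j with response X @ A[:, j]
        for j in range(k):
            enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
            B[:, j] = enet.fit(X, X @ A[:, j]).coef_
        # Step 3: for fixed B, SVD of X^T X B = U D V^T and update A = U V^T
        U, _, Vt2 = np.linalg.svd(X.T @ (X @ B), full_matrices=False)
        A = U @ Vt2
    # Step 5: normalize each column of B to unit length
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0] = 1.0
    V_hat = B / norms
    # Adjusted variance via QR of the modified PCs Z = X V_hat: diag(R)^2
    Z = X @ V_hat
    _, R = np.linalg.qr(Z)
    adjusted_var = np.diag(R) ** 2
    return V_hat, adjusted_var
```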
#### SPCA for p >> n
As λ → ∞, the elastic net step reduces to a soft-thresholding rule,
$$\beta_{j} = \left ( \left|\alpha_{j}^{T}X^{T}X \right|-\frac{\lambda _{1,j}}{2} \right )_{+}\operatorname{sign}\left ( \alpha _{j}^{T}X^{T}X \right ),$$
which replaces step 2 of the general algorithm.
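A minimal sketch of this soft-thresholding update (the function name and the `lam1` argument are illustrative, standing in for $\lambda_{1,j}$):

```python
# Soft-thresholding replacement for the elastic-net step when p >> n.
import numpy as np

def soft_threshold_step(X, alpha_j, lam1):
    """beta_j = (|alpha_j^T X^T X| - lam1/2)_+ * sign(alpha_j^T X^T X)."""
    z = X.T @ (X @ alpha_j)                      # the vector alpha_j^T X^T X
    return np.sign(z) * np.maximum(np.abs(z) - lam1 / 2.0, 0.0)
```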
### Examples
#### THE PITPROPS DATASET
The pitprops data have n = 180 observations and p = 13 variables.
Since this is an n >> p dataset, we considered the first six PCs with λ = 0 and λ1 = (0.06, 0.16, 0.1, 0.5, 0.5, 0.5). Compared with the modified PCs of SCoTLASS, the PCs from SPCA account for a larger amount of variance (75.8% vs. 69.3%) with a much sparser loading structure. The important variables associated with the six PCs do not overlap, which makes the interpretation easier and clearer. Looking at the loadings of the first six sparse PCs, it is interesting to note that although the variance does not decrease strictly monotonically, the adjusted variance follows the right order.
Moreover, SPCA took only seconds to run in R, whereas optimizing SCoTLASS over several values of t is a far harder computational challenge.
#### A SYNTHETIC EXAMPLE
We have three hidden factors V1, V2, and V3, where V3 is derived from V1 and V2. Their variances are 290, 300, and 283.8, respectively, and the numbers of observed variables associated with the three factors are 4, 4, and 2.
Therefore V2 and V1 are almost equally important, and they are much more important than V3. The first two PCs together explain 99.6% of the total variance.
These facts suggest that we only need two derived variables with the "correct" sparse representations. When SPCA (λ = 0) and simple thresholding were applied, SPCA correctly identified the sets of important variables; in fact, SPCA delivered the ideal sparse representations of the first two principal components. SCoTLASS was also able to find a solution, but it took more time than SPCA.
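A simulation sketch of this setup: the coefficients $-0.3$ and $0.925$ and the unit-variance noise terms are taken from the original SPCA paper and are assumptions here (they are consistent with $\mathrm{Var}(V_3) \approx 283.8$).

```python
# Simulate the three-factor synthetic example (assumed generating coefficients).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
V1 = rng.normal(0, np.sqrt(290), n)
V2 = rng.normal(0, np.sqrt(300), n)
V3 = -0.3 * V1 + 0.925 * V2 + rng.normal(0, 1, n)   # derived from V1 and V2

# Ten observed variables: four driven by V1, four by V2, two by V3
X = np.column_stack(
    [V1 + rng.normal(0, 1, n) for _ in range(4)]
    + [V2 + rng.normal(0, 1, n) for _ in range(4)]
    + [V3 + rng.normal(0, 1, n) for _ in range(2)]
)
print(X.shape)   # (1000, 10); the first two PCs should explain roughly 99.6% of the variance
```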
#### RAMASWAMY DATA
This analysis aims to find a set of genes that are biologically relevant to the outcome (e.g., tumor type or survival time).
For microarray data like this, SCoTLASS does not appear to be practically useful for finding sparse PCs. The rationale for SPCA is intuitive: if the (sparse) principal component can explain a large part of the total variance of gene expression levels, then the subset of genes representing that principal component is considered important.
The dataset has p = 16603 genes and n = 144 samples. We apply SPCA with λ = ∞ (the soft-thresholding variant above) to find the leading sparse PC. With SPCA, the percentage of explained variance decreases only slowly as the sparsity increases.