# Paper note

### Finding nodes via functional CCA and CVR

First transform the data with multiresolution spline basis functions (Tzeng and Huang, 2018), then run canonical variate regression to obtain initial values for generative topographic mapping regression.

\begin{align*}
z_G &= F_{G,r_G}W_G+\epsilon_G \\
z_A &= F_{A,r_A}W_A+\epsilon_A \\
z_B &= F_{B,r_B}W_B+\epsilon_B
\end{align*}

Here $z_G \in \mathbb{R}^{T \times n_G}$, $z_A \in \mathbb{R}^{T \times n_A}$, $z_B \in \mathbb{R}^{T \times n_B}$, assuming data are observed at time points $t=1,\dots,T$. $F$ is the matrix of basis functions produced by the multiresolution spline basis functions, and $r_{\cdot}$ is the number of basis functions retained; that number is chosen following the functional CCA approach (Huang et al. 2020).

\begin{gather*}
\begin{pmatrix} F_{G,r_G}' z_G \\ F_{A,r_A}' z_A \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} F_{G,r_G}' \Sigma_G F_{G,r_G} & F_{G,r_G}' \Sigma_{G,A} F_{A,r_A} \\ F_{A,r_A}' \Sigma_{A,G} F_{G,r_G} & F_{A,r_A}' \Sigma_A F_{A,r_A} \end{pmatrix} \right) \\
r_\delta^* = \min\left\{r=1,\dots,r_{\max} : RV_G(r)\geq\delta,\; RV_A(r)\geq\delta\right\} \\
RV_G(r) \equiv \frac{\operatorname{tr}(F_{G,r}' \Sigma_G F_{G,r})}{\operatorname{tr}(F_{G,r_{\max}}' \Sigma_G F_{G,r_{\max}})}
\end{gather*}

Once the basis functions are obtained, each of the three model equations above can be fitted by linear regression to get the corresponding $W_{\cdot}$, with $W_{\cdot} \in \mathbb{R}^{T \times r_{\cdot}}$. $W_{\cdot}$ can be viewed as the representation of the original data $z_{\cdot}$ after the basis-function transform, and $r_{\cdot} \leq n_{\cdot}$. With the $W_{\cdot}$ in hand, we can run canonical variate regression (Luo et al.
2016) on them.

\begin{equation}
\begin{aligned}
\min\;& \eta \left\| W_A\nu_A - W_B\nu_B \right\|_{F}^{2} \\
&+ (1-\eta)\left\| W_G - \left(\mathbf{1}\alpha^{T} + W_A\nu_A \beta + W_B\nu_B \beta\right) \right\|_{F}^{2} \\
&+ \rho(\nu_A, \lambda_A, \nu_B, \lambda_B)
\end{aligned}
\end{equation}

Here $\nu_{\cdot} \in \mathbb{R}^{r_{\cdot} \times k}$, where $k$ is chosen freely; it is the dimension CVR projects down to. $W_A\nu_A$ and $W_B\nu_B$ will correspond to the nodes in GTM.

#### Implementation

Generate spatial data --> find the basis functions --> find the coefficients.

```R=
# generate spatial data
library(autoFRK)
set.seed(42)

generate_data <- function(l, n, s, timepoint){
  grid1 <- grid2 <- seq(0, 1, l = l)
  grids <- expand.grid(grid1, grid2)
  # two smooth spatial patterns
  fn <- matrix(0, l*l, 2)
  fn[, 1] <- cos(sqrt((grids[, 1] - 0)^2 + (grids[, 2] - 1)^2) * pi)
  fn[, 2] <- cos(sqrt((grids[, 1] - 0.75)^2 + (grids[, 2] - 0.25)^2) * 2 * pi)
  # random temporal weights
  wt <- matrix(0, 2, timepoint)
  for (tt in 1:timepoint) wt[, tt] <- c(rnorm(1, sd = 5), rnorm(1, sd = 3))
  yt <- fn %*% wt
  # observe n locations with noise variance s
  obs <- sort(sample(l*l, n))
  zt <- yt[obs, ] + matrix(rnorm(n * timepoint), n, timepoint) * sqrt(s)
  X <- grids[obs, ]  # locations aligned with the rows of zt
  list(yt = yt, zt = zt, X = X)
}

# find basis functions with mrts, keeping the smallest r with RV(r) above 0.5
basis_function <- function(zt, X, n){
  f <- mrts(X, n)
  sigma_G <- cov(t(zt))
  rv_max <- sum(diag(t(f) %*% sigma_G %*% f))
  f_basis <- f  # fallback: keep all basis functions
  for (i in 1:(n-1)){
    rv <- sum(diag(t(f[, 1:(n-i)]) %*% sigma_G %*% f[, 1:(n-i)]))
    if (rv/rv_max <= 0.5){
      f_basis <- f[, 1:(n-i+1)]
      break
    }
  }
  f_basis
}

# regress each time point on the basis functions
weight <- function(zt, f_basis, timepoint){
  coefficients_matrix <- matrix(NA, nrow = timepoint, ncol = dim(f_basis)[2])
  for (i in 1:timepoint){
    model_linear <- lm(zt[, i] ~ f_basis - 1)
    coefficients_matrix[i, ] <- coef(model_linear)
  }
  coefficients_matrix
}
```

Run the same steps for $A$, $B$, and $G$ to obtain $W_A$, $W_B$, $W_G$:

```R=
l <- 30
n <- 70
s <- 5
timepoint <- 50
result <- generate_data(l, n, s, timepoint)
yt <- result$yt
zt <- result$zt
X <- result$X
f_basis <- basis_function(zt, X, n)
coefficients_matrix_G <- weight(zt, f_basis, timepoint)

na <- 500
nb <-
500
set.seed(0)
obs_a <- sort(sample(l*l, na))
set.seed(1)
obs_b <- sort(sample(l*l, nb))
sa <- 10
sb <- 15
za <- 2*yt[obs_a, ] + 0.75 + matrix(rnorm(na * timepoint), na, timepoint) * sqrt(sa)
zb <- 3*yt[obs_b, ] - 0.65 + matrix(rnorm(nb * timepoint), nb, timepoint) * sqrt(sb)
grid1 <- grid2 <- seq(0, 1, l = l)
grids <- expand.grid(grid1, grid2)
Xa <- grids[obs_a, ]
Xb <- grids[obs_b, ]
f_basis_a <- basis_function(za, Xa, na)
coefficients_matrix_a <- weight(za, f_basis_a, timepoint)
f_basis_b <- basis_function(zb, Xb, nb)
coefficients_matrix_b <- weight(zb, f_basis_b, timepoint)
```

Now apply CVR. $W_A$, $W_B$, $W_G$ are all high-dimensional, but `CVR` only supports a univariate response, so:

* Loop over the columns: if $W_G \in \mathbb{R}^{T \times r_{G}}$, the loop has to run $r_{G}$ times.

```R=
library(CVR)

X_list <- list(X1 = coefficients_matrix_a, X2 = coefficients_matrix_b)
result_a <- result_b <- list()
for (i in 1:dim(coefficients_matrix_G)[2]){
  cvr_result <- CVR(coefficients_matrix_G[, i], X_list, rankseq = 2,
                    etaseq = 0.02, family = "gaussian", penalty = "L1")
  wx1 <- as.matrix(as.data.frame(cvr_result$solution$W[1]))
  wx2 <- as.matrix(as.data.frame(cvr_result$solution$W[2]))
  # store each column's projections instead of overwriting them
  result_a[[i]] <- coefficients_matrix_a %*% wx1
  result_b[[i]] <- coefficients_matrix_b %*% wx2
}
```

* Block diagonal matrix: flatten $W_G$ and put $r_G$ copies of $W_A$ and $W_B$ on a block diagonal.

```R=
library(Matrix)
library(CVR)

r_G <- dim(coefficients_matrix_G)[2]
block_diag_a <- bdiag(replicate(r_G, coefficients_matrix_a, simplify = FALSE))
block_diag_b <- bdiag(replicate(r_G, coefficients_matrix_b, simplify = FALSE))
flatten_y <- c(coefficients_matrix_G)
X_list <- list(X1 = block_diag_a, X2 = block_diag_b)
# test: CVR may need dense matrices
# aa <- as.matrix(block_diag_a)
# bb <- as.matrix(block_diag_b)
# X_list <- list(X1 = aa, X2 = bb)
cvr_result <- CVR(flatten_y, X_list, rankseq = 2, etaseq = 0.02,
                  family = "gaussian", penalty = "L1")

# export to CSV
A <- cvr_result$solution$W[1]
A <- as.data.frame(matrix(unlist(A), ncol = 2, byrow = TRUE))
write.csv(A, file = "WA.csv", row.names = FALSE)
B <- cvr_result$solution$W[2]
B <- as.data.frame(matrix(unlist(B), ncol = 2, byrow = TRUE))
write.csv(B, file = "WB.csv", row.names = FALSE)
```

Both the loop and the block-diagonal approach run into the same issue: because the response $Y$ of `CVR` is univariate, we end up with one pair of $\nu_A$ and $\nu_B$ per column of $W_G$, i.e. $r_G$ pairs in total. Does that mean each column of $W_G$ has to be handled separately, so that GTM must be run $r_G$ times?

* CVR can in fact handle a multivariate response in one go via `cvrsolver`:

```R=
Xlist <- list(coefficients_matrix_a, coefficients_matrix_b)
Wini <- SparseCCA(coefficients_matrix_a, coefficients_matrix_b, 2, 0.7, 0.7)
opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
result <- cvrsolver(coefficients_matrix_G[, c(1, 2)], Xlist = Xlist, rank = 2,
                    eta = 0.5, Lam = 0.02, family = "gaussian",
                    Wini = Wini, penalty = "L1", opts = opts)
```

This resolves both of the problems above.

Notes:
1. To check that the spatial data were generated correctly:

```R=
require(fields)
z_range <- range(yt)
for (i in 1:50){
  image.plot(matrix(yt[, i], l, l), zlim = z_range)
  Sys.sleep(0.5)
}
```

![plot_result](https://hackmd.io/_uploads/BkcfpM7U6.gif)

2.
I have not yet cross-checked the block-diagonal results against the loop results, but the block-diagonal results currently look strange:
    * The entries that should hold values are all 0; this may be because the values are too small or because the data were not standardized.
    * The loop version, on the other hand, seems fine (to be confirmed).

### GTM (Generative Topographic Mapping)

Here $z$ denotes the latent grid, $c_k$ the nodes, $\phi$ the radial basis functions, $W_x$ the coefficients of the linear combination of the radial basis functions, and $\epsilon_x$ the noise variance.

\begin{gather*}
p(z) = \frac{1}{K}\sum_{k=1}^{K} \delta(z-c_k) \\
x = \phi(z,W_x) + e_x = \Phi(z)W_x + e_x \\
p(x \mid z, W_x, \epsilon_x) = N(x \mid \phi(z,W_x), \epsilon_x I)
\end{gather*}

The marginal distribution of $x$:

\begin{gather*}
p(x \mid W_x, \epsilon_x) = \frac{1}{K}\sum_{k=1}^{K} p(x \mid c_k, W_x, \epsilon_x) = \frac{1}{K}\sum_{k=1}^{K} N(x \mid \phi(c_k,W_x), \epsilon_x I)
\end{gather*}

The posterior responsibilities of the Gaussian mixture components:

\begin{gather*}
R_{ki} = p(c_k \mid x_i) = \frac{p(x_i \mid c_k)}{\sum_{j=1}^{K} p(x_i \mid c_j)} = \frac{N(x_i \mid \phi(c_k, W_x), \epsilon_x I)}{\sum_{j=1}^{K} N(x_i \mid \phi(c_j, W_x), \epsilon_x I)}
\end{gather*}

The expectation of the log-likelihood function (E-step):

\begin{gather*}
E(\ln L) = \sum_{i=1}^{n} \sum_{k=1}^{K} R_{ki} \ln \left\{ \frac{1}{K} N(x_i \mid \phi(c_k, W_x), \epsilon_x I) \right\}
\end{gather*}

Maximizing with respect to $W_x$:

\begin{gather*}
\sum_{i=1}^{n} \sum_{k=1}^{K} R_{ki} \left\{ W_x \phi(c_k) - x_i \right\}\phi(c_k)^{T} = 0 \\
\Phi G \Phi^T W_x^T = \Phi R X
\end{gather*}

$G$ is a diagonal matrix with entries $G_{kk} = \sum_{i=1}^{n} R_{ki}$, and $\Phi = (\phi(c_1), \dots, \phi(c_K))$.

### Next questions

According to my advisor, $\nu_A, \nu_B, \nu_G$ should be used directly as the basis functions, i.e.

\begin{gather*}
\Phi_A = \nu_A \\
\Phi_B = \nu_B \\
\Phi_G = \nu_G
\end{gather*}

If $\nu_A, \nu_B, \nu_G$ are the basis functions, what is the input data? The intuitive candidate is $W_A$:

\begin{gather*}
W_A = \Phi(W_A\nu_A)Q_A + e_x
\end{gather*}

where $Q_A$ is the weight of the linear combination of the basis functions.

One important problem: $W_A$ and $W_B$ each have a projection, $\nu_A$ and $\nu_B$, taking them to a low-dimensional space, but for $W_G$ there is no $\nu_G$ that projects $W_G$ to a low-dimensional space.

1. Is it a problem that $A$ and $B$ each have their own low-dimensional space?
    * I don't think so, because the low-dimensional space in GTM was never required to be shared in the first place.
2.
Is it a problem that $\nu_G$ does not exist?
    * I think it is. As noted above, only $W_G$ lacks a corresponding $\nu_G$, and intuitively that causes trouble when computing $R_{ki}$.

Proposed solution: run PCA on $W_G$ first, then do CVR.
* Suppose PCA gives a low-dimensional projection of $W_G$; its meaning is the same as for $A$ and $B$: it comes from a linear combination of $W_G$ itself (the loadings mentioned earlier).
* If the low-dimensional projections of $W_G$, $W_A$, $W_B$ need to have the same dimension, the number of nodes used for the basis-function transform can already be fixed at the PCA step.
* All that remains is to pick some basis function, form linear combinations of the nodes, and read off the result.

### sJIVE: Supervised Joint and Individual Variation Explained

sJIVE is another candidate solution; it does the same job as CVR but with a different criterion.

\begin{gather*}
X_1 = U_1S_J + W_1S_1 + E_1 \\
X_2 = U_2S_J + W_2S_2 + E_2 \\
y = \theta_1S_J + \theta_{21}S_1 + \theta_{22}S_2 + E_Y
\end{gather*}

\begin{gather*}
\underset{S_J,\,\theta_1,\,\theta_{2i},\,U_i,\,W_i,\,S_i,\; i=1,\dots,K}{\arg\min}\; \eta \sum_{i=1}^{K} \| X_i - U_iS_J - W_iS_i\|_F^2 + (1-\eta)\Big\| y - \theta_1S_J - \sum_{i=1}^{K} \theta_{2i}S_i\Big\|_F^2
\end{gather*}

Compared with CVR, this criterion has an extra shared low-dimensional projection $S_J$, plus an individual projection for each dataset, $S_1$ and $S_2$. The correspondence of $G, A, B$ to the response position is the same as with CVR; in addition, $y$ also gets a low-dimensional projection here, which to some extent fixes the earlier problem that the CVR response has none. But this seems to raise another question: should $S_J$ be taken as the nodes, ignoring $S_1$ and $S_2$? I think they can probably be ignored, because $S_J$ is only used as nodes and will still pass through GTM's nonlinear transformation afterwards. One more important point: there is an R [implementation](https://github.com/lockEF/r.jive/tree/master).

The basis function can then be rewritten as:

\begin{gather*}
\Phi(S_J) = \exp\left(-\frac{\lVert S_J - C_{S_J} \rVert^2}{2\sigma^2}\right)
\end{gather*}

#### Implementation

Note that installing this package directly inside R does not provide `sJIVE`; it has to be downloaded from GitHub or installed with the code below:

```R=
install.packages('devtools')
library(devtools)
install_github("lockEF/r.jive")
```

With the package installed, sJIVE runs after specifying a few parameters; the data source is described [here](https://github.com/lockEF/r.jive/blob/master/inst/doc/BRCA_Example.html).

```R=
# sJIVE on the BRCA example data
library(r.jive)
data(BRCA_data)  # loads the object `Data`
X <- list(Data$Expression, Data$Methylation)
y <- t(Data$miRNA)[, 1]
Results <- sJIVE(X, y, eta = 0.5, rankJ = 2, rankA = c(25, 25))
```

![Results](https://hackmd.io/_uploads/BJ9KV94Up.png)

#### Steps after the next steps

Assuming this works: "nodes" originally meant a few center points chosen on the grid, but if the equation above holds, $S_J$ is treated less like the nodes and more like the grid, with the centers of $S_J$ then taken as the nodes.

### Implementing GTM down to one dimension

A GTM implementation assuming a single latent variable; the nodes can be swapped out later as needed. GTM is implemented in Python for now.

Generating the data: draw 1000 points $t \sim U(0, 3.15)$ and build $y$ from them; $y$ is the input data. The grid points are drawn from a standard normal. The basis function: radial basis
functions, using 5 centers.

```python=
import numpy as np
from scipy.spatial.distance import cdist

t_size = 1000
variance_of_rbfs = 1
num_map_grids = 20
num_centers = 5

# input data: y = t^2 + 1.5 sin(2t)
t = np.random.uniform(0, 3.15, size=t_size)
y = t**2 + 1.5*np.sin(2*t)
y = y.reshape(y.shape[0], 1)

# latent grid points, drawn from a standard normal
map_grids = np.random.normal(0, 1, size=num_map_grids)
map_grids = map_grids.reshape(map_grids.shape[0], 1)

# RBF centers, evenly spaced over the range of y
centers = np.linspace(y.min(), y.max(), num_centers)
centers = centers.reshape(centers.shape[0], 1)

phi = cdist(map_grids, centers, 'sqeuclidean')
phi = np.exp(-phi / 2.0 / variance_of_rbfs)
```

The grid-minus-centers distance is needed because the basis function expresses a distance from the grid points to the centers; one can say it maps the centers onto the grid.

Choosing initial values for W and beta (how to initialize beta is still undecided):

```python=
# initialize W so that phi @ W roughly reproduces the grid
W = np.linalg.pinv(phi).dot(map_grids)
beta = 1
```

Responsibilities:

```python=
def calculate_R(W, beta):
    distance = cdist(y, phi.dot(W), 'sqeuclidean')
    p = np.exp(-beta / 2.0 * distance)
    sum_p = p.sum(axis=1)
    R = p / np.reshape(sum_p, (p.shape[0], 1))
    likelihood_value = (np.log((beta / 2.0 / np.pi) ** (y.shape[1] / 2.0)
                               / num_map_grids * sum_p)).sum()
    print('log likelihood: %s' % likelihood_value)
    return R
```

The EM iterations:

```python=
num_iter = 10
_lambda = 0.001  # small ridge term to keep the M-step well-conditioned
for i in range(num_iter):
    R = calculate_R(W, beta)
    phi_solve = phi.T.dot(np.diag(R.sum(axis=0)).dot(phi)) \
        + _lambda/beta * np.identity(phi.shape[1])
    W = np.linalg.inv(phi_solve).dot(phi.T.dot(R.T.dot(y)))
    beta = y.size / (R * cdist(y, phi.dot(W), 'sqeuclidean')).sum()
```
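As a sanity check, the pieces above can be condensed into one self-contained script. This is only a sketch: the seeded `default_rng`, the `log_likelihood` helper, and the list `lls` recording the per-iteration log-likelihood are my additions for verification, not part of the note's implementation. Tracking the log-likelihood across EM iterations makes it easy to confirm the updates actually improve the fit.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

# data: y = t^2 + 1.5 sin(2t), as above
t = rng.uniform(0, 3.15, size=1000)
y = (t**2 + 1.5 * np.sin(2 * t)).reshape(-1, 1)

# latent grid and RBF centers
map_grids = rng.normal(0, 1, size=20).reshape(-1, 1)
centers = np.linspace(y.min(), y.max(), 5).reshape(-1, 1)
phi = np.exp(-cdist(map_grids, centers, 'sqeuclidean') / 2.0)

W = np.linalg.pinv(phi).dot(map_grids)  # initial mapping
beta = 1.0
lam = 0.001  # small ridge term for the M-step

def log_likelihood(W, beta):
    d = cdist(y, phi.dot(W), 'sqeuclidean')
    p = np.exp(-beta / 2.0 * d)
    return np.log((beta / (2 * np.pi)) ** (y.shape[1] / 2.0)
                  / len(map_grids) * p.sum(axis=1)).sum()

lls = []
for _ in range(10):
    # E-step: responsibility of each grid point for each observation
    d = cdist(y, phi.dot(W), 'sqeuclidean')
    p = np.exp(-beta / 2.0 * d)
    R = p / p.sum(axis=1, keepdims=True)
    # M-step: ridge-regularized weighted least squares for W, then beta
    G = np.diag(R.sum(axis=0))
    A = phi.T @ G @ phi + (lam / beta) * np.eye(phi.shape[1])
    W = np.linalg.solve(A, phi.T @ R.T @ y)
    beta = y.size / (R * cdist(y, phi.dot(W), 'sqeuclidean')).sum()
    lls.append(log_likelihood(W, beta))

print('first vs last log likelihood:', lls[0], lls[-1])
```

The recorded sequence `lls` should be (approximately) non-decreasing; if it drops, the E- or M-step has a bug.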