# paper note
### Finding the nodes with functional CCA and CVR
The data are first transformed with multiresolution spline basis functions (Tzeng and Huang, 2018), and canonical variate regression is then applied to obtain initial values for generative topographic mapping regression.
\begin{align*}
z_G &= W_G F_{G,r_G}'+\epsilon_G \\
z_A &= W_A F_{A,r_A}'+\epsilon_A \\
z_B &= W_B F_{B,r_B}'+\epsilon_B \\
\end{align*}
Here $z_G \in \mathbb{R}^{T \times n_G}$, $z_A \in \mathbb{R}^{T \times n_A}$, $z_B \in \mathbb{R}^{T \times n_B}$, assuming data are observed at time points $t=1,\ldots,T$. $F$ can be viewed as a basis-function matrix generated by the multiresolution spline basis functions, and $r_{\cdot}$ is the number of selected basis functions; the number is chosen following the functional CCA approach (Huang et al., 2020).
\begin{gather*}
\begin{pmatrix} F_{G,r_G}'z_G(t) \\ F_{A,r_A}'z_A(t) \end{pmatrix} \sim \mathcal{N}
\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
F_{G,r_G}' \Sigma_G F_{G,r_G} & F_{G,r_G}' \Sigma_{G,A} F_{A,r_A} \\
F_{A,r_A}' \Sigma_{A,G} F_{G,r_G} & F_{A,r_A}' \Sigma_A F_{A,r_A}
\end{pmatrix}
\right) \\
r_\delta^* = \min\left\{r=1,\ldots,r_{\max}:RV_G(r)\geq\delta,\ RV_A(r)\geq\delta\right\} \\
RV_G(r) \equiv \frac{\mathrm{tr}(F_{G,r}' \Sigma_G F_{G,r})}{\mathrm{tr}(F_{G,r_{\max}}' \Sigma_G F_{G,r_{\max}})}
\end{gather*}
Given the basis functions, the $W_{\cdot}$ in (5), (6), (7) can be obtained by linear regression, with $W_{\cdot} \in \mathbb{R}^{T \times r_{\cdot}}$. $W_{\cdot}$ can be viewed as the representation of the original data $z_{\cdot}$ after the basis-function transformation, with $r_{\cdot} \leq n_{\cdot}$. Once the $W_{\cdot}$ are obtained, canonical variate regression (Luo et al., 2016) is applied to them.
\begin{equation}
\begin{aligned}
&\min_{\alpha,\beta,\nu_A,\nu_B}\ \eta \left\| W_A\nu_A-W_B\nu_B \right\|_{F}^{2} \\
&\quad + (1-\eta)\left\| W_G - \left(\mathbf{1}\alpha^{T}+W_A\nu_A \beta + W_B\nu_B \beta\right) \right\|_{F}^{2} \\
&\quad + \rho(\nu_A, \lambda_A) + \rho(\nu_B, \lambda_B)
\end{aligned}
\end{equation}
Here $\nu_{\cdot} \in \mathbb{R}^{r_{\cdot} \times k}$, where $k$ is chosen freely, i.e., the dimension CVR reduces to. The projections $W_A\nu_A$ and $W_B\nu_B$ then correspond to the nodes in GTM.
#### Implementation
Generate spatial data --> compute basis functions --> compute coefficients
```R=
# Generate spatial data
library(autoFRK)
set.seed(42)
generate_data <- function(l, n, s, timepoint){
  grid1 <- grid2 <- seq(0, 1, l = l)
  grids <- expand.grid(grid1, grid2)
  # two fixed spatial patterns
  fn <- matrix(0, l*l, 2)
  fn[, 1] <- cos(sqrt((grids[, 1] - 0)^2 + (grids[, 2] - 1)^2) * pi)
  fn[, 2] <- cos(sqrt((grids[, 1] - 0.75)^2 + (grids[, 2] - 0.25)^2) * 2 * pi)
  # random temporal weights for each pattern
  wt <- matrix(0, 2, timepoint)
  for (tt in 1:timepoint) wt[, tt] <- c(rnorm(1, sd = 5), rnorm(1, sd = 3))
  yt <- fn %*% wt
  # observe n locations with additive noise (variance s)
  obs <- sample(l*l, n)
  zt <- yt[sort(obs), ] + matrix(rnorm(n * timepoint), n, timepoint) * sqrt(s)
  X <- grids[obs, ]
  return(list(yt = yt, zt = zt, X = X))
}

# Find basis functions with mrts; keep the smallest r with RV(r) >= 0.5
basis_function <- function(zt, X, n){
  f <- mrts(X, n)
  sigma_G <- cov(t(zt))                 # spatial covariance across time points
  rv_max <- sum(diag(t(f) %*% sigma_G %*% f))
  f_basis <- f                          # fallback if the ratio never drops below 0.5
  for (i in 1:(n-1)){
    m_rv <- t(f[, 1:(n-i)]) %*% sigma_G %*% f[, 1:(n-i)]
    r_star <- sum(diag(m_rv)) / rv_max
    if (r_star <= 0.5){
      f_basis <- f[, 1:(n-i+1)]         # keep one more column than the first r below the threshold
      break
    }
  }
  return(f_basis)
}

# Regress each time point on the basis functions to get the coefficient matrix W
weight <- function(zt, f_basis, timepoint){
  coefficients_matrix <- matrix(NA, nrow = timepoint, ncol = dim(f_basis)[2])
  for (i in 1:timepoint){
    model_linear <- lm(zt[, i] ~ f_basis - 1)
    coefficients_matrix[i, ] <- coef(model_linear)
  }
  return(coefficients_matrix)
}
```
Do the same for $A$, $B$, and $G$ to obtain $W_A$, $W_B$, $W_G$.
```R=
# G: simulate the field and compute its basis coefficients
l <- 30
n <- 70
s <- 5
timepoint <- 50
result <- generate_data(l, n, s, timepoint)
yt <- result$yt
zt <- result$zt
X <- result$X
f_basis <- basis_function(zt, X, n)
coefficients_matrix_G <- weight(zt, f_basis, timepoint)

# A and B: rescaled, shifted, noisier versions of the same field at other locations
na <- 500
nb <- 500
set.seed(0)
obs_a <- sample(l*l, na)
set.seed(1)
obs_b <- sample(l*l, nb)
sa <- 10
sb <- 15
za <- 2*yt[sort(obs_a), ] + 0.75 + matrix(rnorm(na * timepoint), na, timepoint) * sqrt(sa)
zb <- 3*yt[sort(obs_b), ] - 0.65 + matrix(rnorm(nb * timepoint), nb, timepoint) * sqrt(sb)
grid1 <- grid2 <- seq(0, 1, l = l)
grids <- expand.grid(grid1, grid2)
Xa <- grids[obs_a, ]
Xb <- grids[obs_b, ]
f_basis_a <- basis_function(za, Xa, na)
coefficients_matrix_a <- weight(za, f_basis_a, timepoint)
f_basis_b <- basis_function(zb, Xb, nb)
coefficients_matrix_b <- weight(zb, f_basis_b, timepoint)
```
Next, apply CVR. $W_A, W_B, W_G$ are all high-dimensional, but CVR only supports a univariate response, so:
* Handle it with a loop: if $W_G \in \mathbb{R}^{T \times r_{G}}$, the loop has to run $r_G$ times.
```R=
library(CVR)
X_list <- list(X1 = coefficients_matrix_a, X2 = coefficients_matrix_b)
result_a <- result_b <- list()
for (i in 1:dim(coefficients_matrix_G)[2]){
  # one CVR fit per column of W_G (univariate response)
  cvr_result <- CVR(coefficients_matrix_G[, i], X_list, rankseq = 2, etaseq = 0.02,
                    family = "gaussian", penalty = "L1")
  wx1 <- cvr_result$solution$W[[1]]
  wx2 <- cvr_result$solution$W[[2]]
  result_a[[i]] <- coefficients_matrix_a %*% wx1   # W_A %*% nu_A for column i
  result_b[[i]] <- coefficients_matrix_b %*% wx2   # W_B %*% nu_B for column i
}
```
* Block diagonal matrix: flatten $W_G$ into a vector and build block diagonal matrices from $W_A$ and $W_B$.
```R=
library(Matrix)
library(CVR)
# r_G copies of W_A / W_B on the block diagonal, one block per column of W_G
r_G <- dim(coefficients_matrix_G)[2]
block_diag_a <- bdiag(rep(list(coefficients_matrix_a), r_G))
block_diag_b <- bdiag(rep(list(coefficients_matrix_b), r_G))
flatten_y <- c(coefficients_matrix_G)      # stack the columns of W_G into one long response
X_list <- list(X1 = block_diag_a, X2 = block_diag_b)
# test
# aa = as.matrix(block_diag_a)
# bb = as.matrix(block_diag_b)
# X_list <- list(X1 = aa, X2 =bb)
cvr_result <- CVR(flatten_y, X_list, rankseq = 2, etaseq = 0.02,
family = "gaussian", penalty = "L1")
# TO CSV
A <- as.data.frame(as.matrix(cvr_result$solution$W[[1]]))
write.csv(A, file = "WA.csv", row.names = FALSE)
B <- as.data.frame(as.matrix(cvr_result$solution$W[[2]]))
write.csv(B, file = "WB.csv", row.names = FALSE)
```
Both the loop and the block diagonal approach run into the same issue: because the response $Y$ in `CVR` is univariate, we end up with one pair of $\nu_A$ and $\nu_B$ per column of $W_G$, i.e., $r_G$ pairs in total. Does that mean each column of $W_G$ has to be handled separately, so that GTM would have to be run $r_G$ times?
* CVR can handle a multi-column response in one call via `cvrsolver`:
```R=
Xlist <- list(coefficients_matrix_a, coefficients_matrix_b)
# sparse CCA initialization for the loading matrices
Wini <- SparseCCA(coefficients_matrix_a, coefficients_matrix_b, 2, 0.7, 0.7)
opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
result <- cvrsolver(coefficients_matrix_G[, c(1, 2)],   # two response columns at once
                    Xlist = Xlist,
                    rank = 2,
                    eta = 0.5,
                    Lam = c(0.02, 0.02),                 # one penalty per predictor set
                    family = "gaussian",
                    Wini = Wini,
                    penalty = "L1",
                    opts = opts)
```
With this, both of the issues above are resolved.
Additional notes:
1. The generated spatial data can be checked visually:
```R=
require(fields)
# animate the true field over all time points on a common color scale
z_range <- range(yt)
for (i in 1:timepoint){
  image.plot(matrix(yt[, i], l, l), zlim = z_range)
  Sys.sleep(0.5)
}
```

2. I have not yet cross-checked the loop results against the block diagonal results, but the block diagonal results currently look odd:
* The entries that should be nonzero are all 0; possibly the values are too small, or it is a standardization issue.
* The loop version looks fine so far (to be confirmed).
### GTM (Generative Topographic Mapping)
Here $z$ denotes the latent grid points, $c_k$ the nodes, $\phi$ the radial basis functions, $W_x$ the coefficients of the linear combination of the radial basis functions, and $\epsilon_x$ the variance of the Gaussian noise model.
\begin{gather*}
p(z) = \frac{1}{K}\sum_{k=1}^{K} \delta(z-c_k) \\
x=\phi(z,W_x)+e_x=\Phi(z)W_x+e_x \\
p(x|z,W_x,\epsilon_x) = N(x|\phi(z,W_x), \epsilon_xI)
\end{gather*}
The marginal distribution of $x$:
\begin{gather*}
p(x|W_x, \epsilon_x) = \frac{1}{K}\sum_{k=1}^{K}p(x|c_k, W_x, \epsilon_x) \\
= \frac{1}{K}\sum_{k=1}^{K}N(x|\phi(c_k,W_x), \epsilon_xI)
\end{gather*}
Posterior responsibilities of the mixture components:
\begin{gather*}
R_{ki} = p(c_k | x_i) = \frac{p(x_i | c_k)}{\sum_{j=1}^{K} p(x_i | c_j)} = \frac{N(x_i | \phi(c_k, W_x), \epsilon_x I)}{\sum_{j=1}^{K} N(x_i | \phi(c_j, W_x), \epsilon_x I)}
\end{gather*}
Expectation of the log-likelihood (E-step):
\begin{gather*}
E(\ln L) = \sum_{i=1}^{n} \sum_{k=1}^{K} R_{ki} \ln \left\{ \frac{1}{K} N(x_i | \phi(c_k, W_x), \epsilon_x I) \right\}
\end{gather*}
Maximizing with respect to $W_x$:
\begin{gather*}
\sum_{i=1}^{n} \sum_{k=1}^{K} R_{ki} \left\{ W_x \phi(c_k) - x_i\right\}\phi(c_k)^{T}=0 \\
\Phi G \Phi^T W_x^T = \Phi R X \\
\end{gather*}
$G$ is a diagonal matrix with entries $G_{kk}=\sum_{i=1}^{n}R_{ki}$, and $\Phi = (\phi(c_1), \ldots , \phi(c_K))$.
### Next questions
According to my advisor, $\nu_A, \nu_B, \nu_G$ should be used directly as the basis functions, that is:
\begin{gather*}
\Phi_A = \nu_A \\
\Phi_B = \nu_B \\
\Phi_G = \nu_G \\
\end{gather*}
If $\nu_A, \nu_B, \nu_G$ are used as the basis functions, what is the input data? The intuitive choice is that $W_A$ is the input data:
\begin{gather*}
W_A=\Phi(W_A\nu_A)Q_A+e_x
\end{gather*}
$Q_A$ is the weight matrix of the linear combination of the basis functions; a small sketch of fitting $Q_A$ is given below.
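A minimal sketch of this in R, assuming `nu_A` is taken from the CVR output (the name `cvr_result$solution$W[[1]]` follows the loop version above) and that the RBF centers are picked by k-means on the latent points; the number of centers and the bandwidth are arbitrary choices, not settled ones:
```R=
# Sketch only: fit Q_A in  W_A ~ Phi(W_A nu_A) Q_A  by least squares.
W_A <- coefficients_matrix_a                        # T x r_A basis coefficients
nu_A <- cvr_result$solution$W[[1]]                  # assumed r_A x k weights from the CVR step
latent_A <- W_A %*% nu_A                            # T x k latent points
M <- 5                                              # number of RBF centers (arbitrary)
centers <- kmeans(latent_A, centers = M)$centers    # centers chosen in the latent space
sigma2 <- 1                                         # RBF bandwidth (arbitrary here)
D2 <- as.matrix(dist(rbind(latent_A, centers)))^2   # squared pairwise distances
Phi_A <- exp(-D2[1:nrow(latent_A), nrow(latent_A) + (1:M)] / (2 * sigma2))  # T x M design
Q_A <- solve(t(Phi_A) %*% Phi_A, t(Phi_A) %*% W_A)  # M x r_A least-squares fit of Q_A
```
If the fit is reasonable, `Phi_A %*% Q_A` should roughly reconstruct $W_A$.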
One important question: $W_A$ and $W_B$ each have a projection, $\nu_A$ and $\nu_B$, that maps them into a low-dimensional space, but for $W_G$ there is no $\nu_G$ that projects $W_G$ into a low-dimensional space.
1. Is it a problem that $A$ and $B$ each have their own low-dimensional space?
* I don't think so, because the low-dimensional space in GTM does not have to be shared in the first place.
2. Is the absence of $\nu_G$ a problem?
* I think it is. As noted above, only $W_G$ has no corresponding $\nu_G$, and intuitively this causes trouble when computing $R_{ki}$.
Proposed solution: run PCA on $W_G$ first and then do CVR (see the sketch after this list).
* Suppose PCA gives a low-dimensional projection of $W_G$; its meaning is the same as for $A$ and $B$: it is obtained as a linear combination of $W_G$ itself (the loadings mentioned earlier).
* If the low-dimensional projections of $W_G, W_A, W_B$ need to have the same dimension, the number of components can be fixed at the PCA step, which also fixes how many nodes are used for the basis-function transformation.
* After that, it only remains to apply some basis function to the nodes as a linear combination to obtain the result.
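A minimal sketch of this PCA-then-CVR idea, reusing `coefficients_matrix_G/a/b` from the code above; the choice of `k`, the penalty values, and the field name `res_pca$W` are assumptions rather than settled choices:
```R=
# Sketch: project W_G to k dimensions with PCA, then run cvrsolver on the projection.
k <- 2
pca_G <- prcomp(coefficients_matrix_G, center = TRUE, scale. = FALSE)
WG_low <- pca_G$x[, 1:k]                         # T x k PCA scores of W_G
Wini <- SparseCCA(coefficients_matrix_a, coefficients_matrix_b, k, 0.7, 0.7)
opts <- list(standardization = FALSE, maxIters = 300, tol = 0.005)
res_pca <- cvrsolver(WG_low,
                     Xlist = list(coefficients_matrix_a, coefficients_matrix_b),
                     rank = k, eta = 0.5, Lam = c(0.02, 0.02),
                     family = "gaussian", Wini = Wini, penalty = "L1", opts = opts)
nodes_A <- coefficients_matrix_a %*% res_pca$W[[1]]   # candidate GTM nodes from A (assumed field name)
nodes_B <- coefficients_matrix_b %*% res_pca$W[[2]]
```
The columns of `nodes_A` and `nodes_B` would then play the role that $W_A\nu_A$ and $W_B\nu_B$ played before.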
### sJIVE: Supervised Joint and Individual Variation Explained
sJIVE is another possible approach; it does the same job as CVR but with a different criterion.
\begin{gather*}
X_1 = U_1S_J+W_1S_1+E_1 \\
X_2 = U_2S_J+W_2S_2+E_2 \\
y = \theta_1S_J + \theta_{21}S_1 + \theta_{22}S_2 + E_Y \\
\end{gather*}
\begin{gather*}
\underset{S_J,\,\theta_1,\,\theta_{2i},\,U_i,\,W_i,\,S_i,\; i=1,\ldots,K}{\arg\min}\;\; \eta\sum_{i=1}^{K} \| X_i - U_iS_J - W_iS_i\|_F^2 + (1-\eta)\| y - \theta_1S_J - \sum_{i=1}^{K} \theta_{2i}S_i\|_F^2
\end{gather*}
Compared with CVR, this criterion has an extra $S_J$ acting as a shared low-dimensional projection, and each dataset also gets its own individual projection $S_1, S_2$. The correspondence of $G, A, B$ to the response position is the same as in CVR. In addition, $y$ also gets a low-dimensional representation here, which to some extent fixes the earlier problem that the response has no low-dimensional projection under CVR. But this seems to raise another question: should $S_J$ be used as the nodes, with $S_1$ and $S_2$ ignored?
For this question, I think they can probably be ignored, since $S_J$ is only used for the nodes and will still go through GTM's nonlinear transformation afterwards. Another important point: there is an R [implementation](https://github.com/lockEF/r.jive/tree/master).
The basis function can then be rewritten as:
\begin{gather*}
\Phi(S_J) = \exp\left(-\frac{\lVert S_J - C_{S_J} \rVert^2}{2\sigma^2}\right)
\end{gather*}
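As a rough illustration (not part of sJIVE itself), such an RBF basis over $S_J$ could be built as below, assuming `S_J` is the joint score matrix with samples in its columns and that the centers $C_{S_J}$ are picked by k-means; all names and the bandwidth are placeholders:
```R=
# Sketch: build Phi(S_J) = exp(-||S_J - C_{S_J}||^2 / (2 sigma^2)).
# `S_J` (rankJ x n) is a placeholder for the joint scores; transpose so rows are samples.
SJ_pts <- t(S_J)
num_centers <- 5
C_SJ <- kmeans(SJ_pts, centers = num_centers)$centers   # centers chosen on S_J itself
sigma <- 1                                              # bandwidth, chosen arbitrarily here
D2 <- as.matrix(dist(rbind(SJ_pts, C_SJ)))^2
Phi_SJ <- exp(-D2[1:nrow(SJ_pts), nrow(SJ_pts) + (1:num_centers)] / (2 * sigma^2))
```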
#### Implementation
Note that installing this package directly inside R does not provide sJIVE; it has to be downloaded from GitHub or installed directly with the code below:
```R=
install.packages('devtools')
library(devtools)
install_github("lockEF/r.jive")
```
Once the package is available, sJIVE can be run by specifying a few parameters; the data source can be found [here](https://github.com/lockEF/r.jive/blob/master/inst/doc/BRCA_Example.html).
```R=
# sjive
library(r.jive)
data(BRCA_data)   # loads an object named Data
X = list(Data$Expression, Data$Methylation)
y = t(Data$miRNA)[,1]
Results = sJIVE(X, y, eta=0.5, rankJ=2, rankA=c(25,25))
```

#### What comes after that
If this works: the nodes originally meant a few center points chosen on the grid, but if the equation above holds, it is less like treating $S_J$ as the nodes and more like treating $S_J$ as the grid, and then taking center points of $S_J$ as the nodes.
### Implementation of GTM reducing the data to one dimension
A GTM implementation with a single latent variable.
The nodes can be swapped out later as needed; GTM is currently implemented in Python.
Generating the data:
Draw 1000 points $t \sim U(0, 3.15)$ and construct $y$ from them; $y$ is the input data.
The grid points are drawn from a standard normal.
Basis functions: radial basis functions with 5 centers.
```python=
import numpy as np
from scipy.spatial.distance import cdist

t_size = 1000
variance_of_rbfs = 1
num_map_grids = 20
num_centers = 5

# input data y built from t ~ U(0, 3.15)
t = np.random.uniform(0, 3.15, size=t_size)
y = t**2 + 1.5*np.sin(2*t)
y = y.reshape(-1, 1)

# latent grid points drawn from a standard normal
map_grids = np.random.normal(0, 1, size=num_map_grids).reshape(-1, 1)

# RBF centers spread evenly over the range of y
centers = np.linspace(y.min(), y.max(), num_centers).reshape(-1, 1)

# RBF design matrix: exp(-||grid - center||^2 / (2 * variance))
phi = cdist(map_grids, centers, 'sqeuclidean')
phi = np.exp(-phi / 2.0 / variance_of_rbfs)
```
The reason for taking grid point minus center is that this basis function expresses the distance from the grid points to the centers; it can be viewed as mapping from the centers onto the grid points.
Choosing initial values for $W$ and $\beta$:
(the way to initialize $\beta$ is still undecided)
```python=
# initialize W by least squares, mapping the RBF features back onto the grid points
W = np.linalg.pinv(phi).dot(map_grids)
beta = 1
```
Responsibilities:
```python=
def calculate_R(W, beta):
    # squared distances between data points and the mapped grid points phi @ W
    distance = cdist(y, phi.dot(W), 'sqeuclidean')
    p = np.exp(-beta / 2.0 * distance)
    sum_p = p.sum(axis=1)
    R = p / sum_p.reshape(-1, 1)          # responsibilities R_ki
    likelihood_value = np.log(
        (beta / 2.0 / np.pi) ** (y.shape[1] / 2.0) / num_map_grids * sum_p
    ).sum()
    print('log likelihood: %s' % likelihood_value)
    return R
```
EM algorithm:
```python=
num_iter = 10
_lambda = 0.001    # regularization added to the M-step
for i in range(num_iter):
    R = calculate_R(W, beta)
    # M-step: regularized solve for W
    phi_solve = phi.T.dot(np.diag(R.sum(axis=0)).dot(phi)) + _lambda/beta * np.identity(phi.shape[1])
    W = np.linalg.solve(phi_solve, phi.T.dot(R.T.dot(y)))
    # update the inverse noise variance beta
    beta = y.size / (R * cdist(y, phi.dot(W), 'sqeuclidean')).sum()
```