# Adv_Biostat (2020.12.26)

###### tags: `BioStat`

Pre-class note: Nashville, where Dr. Chih-Yuan Hsu is based, was hit by a bombing.

# Grading

- Lab Homework 25%
- Homework 25%
- Paper Critique 50%

# FDA Case

![Phase 3 Programme Overview](https://i.imgur.com/y0atcYp.png)
Figure. Phase 3 Programme Overview

> :question: Why not AZLI versus TIS? A: for fear of losing to TIS
> :question: Why not a noninferiority design? A: <font color="#f00">sample size requirement:</font> noninferiority >> superiority

## Some Concerns

- How can we confirm that the causative pathogen of each event is Pseudomonas aeruginosa?
- The trial looks more like prevention than treatment.
- Why allocate the four arms 2:2:1:1?
    - Patients who enroll all hope to join an active arm.
    - An unbalanced design makes the trial more attractive to patients (attracts more people).
    - Stepped Wedge Design (SWT): a modified crossover design
- During recruitment, patients who responded poorly to TIS were excluded.
- An EUA is issued only when no existing treatment is available, e.g., COVID-19 vaccines.
- Why split the dose into BID and TID arms?
    - Dose determination is hard: the sponsor could not confirm the optimal dose in Phase I and II (yet the FDA approved the use).
    - The sponsor guessed a rising trend in efficacy from Placebo to BID to TID.
    - Another example: Oxford and AstraZeneca, where half a dose worked better than a full dose, which cost extra time and money to resolve.

## Voting

> **VOTE:**
> Has the Applicant provided substantial evidence of the efficacy and safety of 75 mg **three** times daily of AZLI for the requested indication of improvement of respiratory symptoms and pulmonary function in cystic fibrosis patients with Pseudomonas aeruginosa? In your response, discuss the rationale for your answer.
| | **YES** -- 15 members | **NO** -- 2 members |
| -------- | -------- | -------- |
| | including infectious disease doctors | including the speaker and a patient with a PhD |
| Reason | no existing drug is available (including for patients with poor TIS tolerance): children have been injected until there is no place left to inject | the FDA dumped the mess on the committee, and the question concerns TID rather than BID, which carries more risk |
| | EU approved | |
| | Canada may follow the EU's lead | |

# Study Design

## Efficacy confirmation: superiority, noninferiority, and equivalence

> [color=#2bcbd1]**No design is perfect; each has its own limitations.**

### Superiority: show you are better on at least one variable (the primary endpoint)

- determines the sample size and trial monitoring
- other outcomes
    - secondary endpoints
        - different from the primary endpoint: e.g., QOL
        - same as the primary endpoint: based on subgroups
    - surrogate endpoints

### Noninferiority (room for statisticians to contribute): my drug is slightly less effective, but it is safer (an ethical concern), more convenient, suitable for more users, or cheaper

- The FDA allows starting from a noninferiority trial and publishing a superiority paper.
- The reverse, superiority → noninferiority, is not allowed.
- two-tailed; superiority must be predetermined for the data analysis
- the overall alpha must account for the multiple comparisons
- the CI requirement (and hence sample size) is larger than for superiority

### (Bio)equivalence

# Four main types of studies

**Ranked by rigor (to decrease bias):** Clinical > Cohort > Case Control > Descriptive

The rigor of the study design essentially determines the impact factor of the journal that will publish it.

| Correlational | Case Control | Cohort | | Clinical Trial |
| -------- | -------- | -------- | -------- | -------- |
| cause vs. outcome | the goal is to generate hypotheses | nested case-control | case-cohort study | prospective |
| cannot control confounding variables | the control group has recall bias | | | control |
| quick and simple | cannot use ordinary logistic regression | | | human |

## Correlational Studies

> *Of course, a correlation between X and Y does not prove causation but indicates that either X influences Y, Y influences X, or X and Y are influenced by a common underlying mechanism.*

- cause vs. outcome
    - cigarettes per capita vs. CHD deaths per 100,000 population (1960)
- only aggregate-level data are collected; individual-level information is missing
- interpretation is risky: confounding variables cannot be controlled
- quick and simple

### An interesting paper from NEJM:

![](https://i.imgur.com/nHEV73O.png)
Chocolate Consumption, Cognitive Function, and Nobel Laureates (2012)

- the data source is Wikipedia
- r = 0.791, P < 0.0001
- Discussion: to win a Nobel Prize, eat more chocolate?
- there is no background support; it is just a trick
- a similar exercise: Letters per 10 million vs. Prizes per 10 million (r = 0.921, p < 0.00001)

# Pay attention to outliers: Pearson vs. Spearman

| Pearson | Spearman |
| ------------------ | -------- |
| linear correlation | measures whether (x, y) is monotone |
| very sensitive to outliers | less affected by outliers |
| ![](https://i.imgur.com/0XF9yxX.png) The figure shows that a Pearson value of 0 cannot distinguish different patterns | |

## A New Coefficient of Correlation: ξn

> ξn is 0 **if and only if** the variables are independent.

- simple asymptotic theory under the hypothesis of independence

![](https://i.imgur.com/M01vXSs.png)

## Case Control Studies

- the goal is not testing but generating a hypothesis
- collect cases (diseased) and controls (disease-free)
- the control group has recall bias
- matching process
    - matched variables show no correlation in the statistics by construction
- ordinary logistic regression cannot be used; use **conditional** logistic regression

### Remdesivir as an example

<p>
<img src="https://i.imgur.com/n6rj5Io.png" alt="First Case of 2019 Novel Coronavirus in the US">
<em>First Case of 2019 Novel Coronavirus in the US</em>
</p>

- Remdesivir for the Treatment of Covid-19 — Final Report

> The Kaplan–Meier estimates of mortality were 6.7% with remdesivir and 11.9% with placebo by day 15 and 11.4% with remdesivir and 15.2% with placebo by day 29 (**hazard ratio, 0.73**; 95% CI, 0.52 to 1.03).

- EUA

> [color=#2bcbd1]**Basic training for a statistician:**
> the ability to make plots (e.g., violin plots) and to understand the information they can convey

Yet no one in the room had heard of or used an UpSet plot...
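The outlier behaviour contrasted above, and the new coefficient ξn, can both be illustrated in a few lines of base R. This is a sketch, not lecture code: `xi_cor` is a simple implementation of the ξn formula for the no-ties case, and the toy data (a monotone series plus one extreme outlier) are made up for illustration.

```r
# xi_cor: Chatterjee's coefficient for continuous data with no ties.
# Sort the pairs by x, take the ranks r of the reordered y, then
# xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1).
xi_cor <- function(x, y) {
  n <- length(x)
  r <- rank(y[order(x)])
  1 - 3 * sum(abs(diff(r))) / (n^2 - 1)
}

x <- 1:20
y <- x                     # perfectly monotone toy data
x_out <- c(x, 100)         # one extreme outlier
y_out <- c(y, -100)

cor(x_out, y_out, method = "pearson")   # dragged negative by the single outlier
cor(x_out, y_out, method = "spearman")  # still clearly positive
xi_cor(x, y)                            # equals 1 - 3/(n + 1) = 6/7 for y = x
```

For independent variables ξn fluctuates around 0, matching the quoted characterisation, and for any monotone relationship it approaches 1 as n grows.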
## Cross-Sectional Surveys

- snapshot: no temporal ordering
- used in pilot studies

> In the US, physicians who do clinical research rarely analyze their own data, which shows that statisticians really do have a profession of their own.

## Cohort Studies

### Nested Case-Control (more reliable)

- a case-control study within a particular cohort
- matching
    - propensity scores can be used
    - propensity scores require a large number of variables and do not handle unknown variables well

### Case-cohort study

- randomly selected controls
- only the cases are identified, so the controls may be contaminated with undetected cases
- easier and faster, but the statistical analysis is harder

**A good research question**: bio-bank studies as an example

- radiation dose in radiation cancer survivors **vs.** mutation probabilities in their offspring
- a nested case-control design would be more reliable, but time and budget do not allow it
- Reference
    - Genetic Disease in the Children of Danish Survivors of Childhood and Adolescent Cancer
    - The use of Next Generation Sequencing Technology to Study
    - MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis
    - Mitochondria sequence mapping strategies and practicability of mitochondria variant detection from exome and RNA sequencing data
    - Bivariate Poisson models with varying offsets: an application to the paired mitochondrial DNA dataset

## Clinical Trial: prospective (less bias), controlled, on humans

### Phases

> Many statistics PhD theses are about PK models. The current mainstream method in the US is [BOIN](https://odin.mdacc.tmc.edu/~yyuan/Software/BOIN/paper.pdf), especially in cancer.
> Yu Shyr was the first to bring adaptive BOIN to the NCBI; at the time nobody was able to review it.

| Phase I | Phase II | Phase III |
| -------- | -------- | -------- |
| safety: safety should take priority over efficacy | there are 16 kinds of designs | the control can be active or negative |

# Phase III Trials: three types of design

| Parallel Design | Crossover Design | Factorial Design |
| -------- | -------- | -------- |
| | ![](https://i.imgur.com/4XHSrED.png) order difference | ![](https://i.imgur.com/Of3UMN1.png) A can be an inhibitor (e.g., an antihypertensive drug), B can be exercise |
| | excludes the interaction effect | additive or synergy effect predetermined |

## Crossover Design

- assumption: no interaction effect (residual + carryover effect)
- based on PK studies
- washout period
- once a significant interaction effect appears (p < .15), only the results from the earlier period can be used

> :question: Think about why .15 is used as the cut-off.

- requires fewer patients, but the trial takes longer
- limitation: patients may die during the trial

## High-order crossover design

![](https://i.imgur.com/oP2KbZK.png)

- power improves as the number of crossovers increases
- Williams' design with three treatments
    - ABB: two consecutive B periods reveal a stable outcome
- Williams' design with four treatments

## Factorial Design

- assumption: additive or synergy effect predetermined
- usually an additive effect
- an additive effect requires a smaller sample size
- the only design that can test for a synergy effect
- fractional factorial design

# Group Allocation Design: the study unit is an institution

![](https://i.imgur.com/TKNU4fQ.png)

In a group allocation design, the study unit is an institution (hospital, community, school, nursing home) rather than an individual.

# Stepped-wedge Cluster Randomized Trial (SW-CRT)

> SW-CRTs are essentially a **one-way** crossover trial.

| ![](https://i.imgur.com/iz61VoV.png) | ![](https://i.imgur.com/cBLeoij.png) |
| -------- | -------- |
| SW-CRT takes the time effect into account | In both the SW-CRT and the waitlist design, patients must wait; treatment cannot start immediately. |
| Since everyone eventually receives the treatment, the recruitment rate improves | |

Effective analysis methods for SW-CRTs are still lacking, which makes them a hot research topic in statistics.

# Hypothesis Testing: a one-tailed test

![](https://i.imgur.com/476osFd.png)

- negative delta: the clinically significant level
- as long as the result stays out of DC, we can claim DT
- proving equivalence: p < .05, reject both DC and DT

## Null Hypothesis

> An insignificant p-value <font color="#f00">does not</font> mean equivalence.
> ![](https://i.imgur.com/pb2Ymli.png)

A claim can only be tested by rejecting the null hypothesis (p < .05).

---

# Data Analysis: MCMC

**Preface:**

![](https://i.imgur.com/6BLutc4.png)

An unknown distribution (with a huge state space) makes sampling challenging. With MCMC we can randomly draw representative samples for parameter inference.

MCMC has limitations, though: computation can be slow, and because it is driven entirely by randomness, properties such as convergence are hard to control, which has pushed the field to look for better approximation methods.

# The essence of Monte Carlo simulation: let a part represent the whole

## Needles scattered on the floor carry meaning — Buffon's needle

**Problem:**

![](https://i.imgur.com/C8gk6Zo.png)

Spread a large sheet of paper on a table and rule it with equally spaced parallel lines. Drop a needle of length L onto the sheet. What is the probability that the needle crosses a line?

| ![](https://i.imgur.com/5qDNvVk.png) | ![](https://i.imgur.com/Klrhijs.png) |
| -------- | -------- |
| Let the line spacing be D, the needle length L, and the crossing proportion p. Then $𝜋=\dfrac{2L}{Dp}$ | After N throws, the proportion of crossings approaches 1/𝜋 (here with 2L = D) |

**Verify with code**

- Throw 10,000 needles and estimate the probability of crossing a line:

```r
## Monte Carlo for the crossing probability P (with L = 1 and D = 2, P = 1/pi)
L <- 1; D <- 2                      # needle length and line spacing
n <- 10000
phi <- runif(n, min = 0, max = pi)  # angle between needle and lines
y <- runif(n, min = 0, max = 1)     # distance from needle centre to nearest line (0 to D/2)
sum(y < 0.5*sin(phi))/n             # approximates 2L/(pi*D) = 1/pi = 0.318
```

- Drop 10,000 random points and estimate the probability of landing inside the unit circle:

```r
# A function to estimate pi
pifun <- function(n) {
  x <- runif(n, min = -1, max = 1)  # generate x
  y <- runif(n, min = -1, max = 1)  # generate y
  L <- sqrt(x^2 + y^2)              # distance to the origin
  P <- sum(L < 1)/n                 # proportion inside the unit circle
  4*P
}
pifun(1000)
```

Raising the number of draws to 100, 1000, and 10000 shows the value getting closer and closer to $\pi$.

## Law of Large Numbers

**Idea:** approximate the population mean with the sample mean

> The **weak law** describes how a sequence of probabilities converges, and the **strong law** describes how a sequence of random variables behaves in the limit.

## Weak Law of Large Numbers

As more samples are drawn, the sample mean converges to the population mean (the probability of exceeding any margin shrinks).

![](https://i.imgur.com/fvtCLNR.png)

- independent: each coin flip is unaffected by the previous one
- identically: every flip takes place under the same setting and environment

**Comparing the approximation across distributions**

![](https://i.imgur.com/Ifo5ga0.png)

```r
#### Weak law of large numbers

# A function to calculate a sample mean
samplemeanfun <- function(dist, n) {
  if (dist == "normal") {
    xbar <- mean(rnorm(n, mean = 10, sd = 3))
  } else if (dist == "exponential") {
    xbar <- mean(rexp(n, rate = 0.5))
  } else {
    xbar <- mean(rbinom(n, size = 30, prob = 0.3))
  }
  xbar
}

n <- 100
epsilon <- 0.3

set.seed(101)
xbars <- replicate(1000, samplemeanfun(dist = "normal", n))
sum(abs(xbars - 10) > epsilon)/1000   # mean = 10

set.seed(101)
xbars <- replicate(1000, samplemeanfun(dist = "exponential", n))
sum(abs(xbars - 2) > epsilon)/1000    # mean = 2

set.seed(101)
xbars <- replicate(1000, samplemeanfun(dist = "binomial", n))
sum(abs(xbars - 9) > epsilon)/1000    # mean = 9

# The Cauchy distribution has no mean, so the sample mean never settles down
set.seed(101)
xbars <- replicate(1000, mean(rcauchy(n, location = 0, scale = 1)))
sum(abs(xbars - 0) > epsilon)/1000
```

![](https://i.imgur.com/ueLwQqZ.png)
![](https://i.imgur.com/0sxYsMV.png)

*As N grows, the distribution of the sample mean gets closer and closer to normal.*

### Central Limit Theorem (CLT)

![](https://i.imgur.com/gEUzlvM.png)

*The error will obey the square root law.*

---

# Markov Chain Monte Carlo (MCMC)

**Markov Chain Monte Carlo (MCMC)** estimates parameters under Bayesian analysis by combining algorithms such as Gibbs sampling and Metropolis-Hastings.

## The problem to solve: without knowing the distribution, sampling is hard

![](https://i.imgur.com/HvoFjcT.png)

*The denominator is hard to compute.*

**Goal Setting:** to generate a sequence of random variables that are not fully independent and approximately form a (posterior) distribution.

## Bayesian Inference: Prior + Data → Posterior

### Frequentist vs. Bayesian Methods

![](https://i.imgur.com/gY99dfC.png)

- In frequentist inference, probabilities are interpreted as long run frequencies. The goal is to create procedures with long run frequency guarantees.
- In Bayesian inference, probabilities are interpreted as subjective degrees of belief. The goal is to state and analyze your beliefs.

![](https://i.imgur.com/XL7ogTy.png)

- choose the Beta distribution as the prior

![](https://i.imgur.com/D5utLa0.png)

*When $\alpha$ and $\beta$ are high, the Beta distribution can be approximated by a Normal.*

- Bayesians always insist that inference must condition on the data, that it should not involve possible data that were never observed.
- $p(x) = \int_\theta p( x | \theta ) p( \theta )\mathrm{d} \theta$ needs to be computed, which is often intractable.

### Beta Distribution

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial, negative binomial and geometric distributions. The beta distribution is a suitable model for the random behavior of percentages and proportions.

> Why choose the Beta distribution as the prior?

1. When the likelihood is binomial, the Beta distribution is mathematically convenient: the posterior stays in the same family throughout the Bayesian updating.
2. The Beta distribution is very flexible and can take many shapes, as the gif animation at the beginning showed.

### Classical Properties of $\bar{\theta}$: The Bernstein–von Mises Theorem

The variance of the asymptotic distribution provides the standard errors of the estimates and allows for hypothesis testing, the accuracy of which rises with sample size.
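The conjugacy in point 1 can be checked numerically. A minimal sketch, where the Beta(2, 2) prior and the counts (32 successes in 50 trials) are made-up illustration values rather than lecture data:

```r
# Conjugate updating: a Beta(a, b) prior with a Binomial(n, theta) likelihood
# yields a Beta(a + y, b + n - y) posterior -- no integral required.
a <- 2; b <- 2     # hypothetical prior
n <- 50; y <- 32   # hypothetical data: 32 successes in 50 trials

post_a <- a + y
post_b <- b + n - y
post_mean <- post_a / (post_a + post_b)  # posterior mean

# The posterior mean is a weighted average of the MLE y/n and the
# prior mean a/(a + b), with weight w = n/(a + b + n) on the data:
w <- n / (a + b + n)
all.equal(post_mean, w * (y/n) + (1 - w) * (a/(a + b)))  # TRUE
```

As n grows, w tends to 1 and the data dominate, which is why the prior matters less and less with increasing sample size.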
### Parameter Estimation

The usual assumption treats a parameter as unknown but fixed; Bayesians instead regard the parameter as having a distribution, and knowledge of it typically becomes more certain as the sample size increases.

If past experience is lacking, the prior distribution can be chosen to have minimal influence on the overall result (a non-informative prior).

The MCMC algorithm iteratively updates the Markov chain based on the transition probability from one state to another. Eventually, the chain attains equilibrium, when the joint probability distribution for the current state approaches the stationary distribution. The parameters that lead to the stationary distribution are taken as the model parameters learned for the particular training image.

## Markov Chain

Collecting all the probabilities of moving from one state to another yields a transition matrix. As long as this transition matrix has a few nice properties, the sampling distribution of the corresponding Markov chain can be shown to converge to the desired probability distribution.

There are infinitely many Markov chains that converge to the target distribution; we want to find one that converges quickly.

### Reversibility

- **Reversible** means that, for any state, the probability of flowing from it to another state equals the probability of flowing back.
- Markov chains are often reversible: once the chain finally converges to the desired distribution, **the probability flowing out of each state equals the probability flowing in**, so the total outflow naturally equals the total inflow.
- However, the optimal transition matrix is not necessarily reversible.
    - If the state space has n equally probable states, the optimal transition matrix moves each state to some other state with probability one, cycling through all the states before returning.
    - When n > 2, such a transition matrix is clearly not reversible, since each state flows in from and out to different states.

## Burn-In: one method of selecting a starting distribution

![](https://i.imgur.com/RxLTPMK.png)

Typically, initial samples are not completely valid because the Markov chain has not stabilized to the stationary distribution. Burn-in lets you discard these initial samples that have not yet reached stationarity.

- start with a sample from m and then run n steps
- burn-in means throwing away some iterations at the beginning of an MCMC run
- how many draws to discard is itself a tuning choice

![https://www.johndcook.com/mcmc.svg](https://www.johndcook.com/mcmc.svg)

Because we are going to average some function of the samples, we burn in a chain, stopping after some finite *n* once it has entered a high-probability region.

### Concerns

- Convergence (equilibrium): what if the chain is in a lower-probability region at the end of the burn-in period than at the beginning?
- Rate of convergence: what if the chain is slowly moving toward a higher-probability region but is still not close at the end of the burn-in?

If the LLN (resp. CLT) holds for a stationary Harris recurrent Markov chain, then the LLN (resp. CLT) also holds for any other Markov chain having the same stationary transition probabilities:

- invariant
- $\pi$-irreducible
- Harris recurrent

## MCMC algorithms: the Metropolis-Hastings algorithm and the Gibbs sampler

Different MCMC methods differ mainly in how the Markov chain is constructed.

### Metropolis-Hastings algorithm (the general form)

![](https://i.imgur.com/DVi6qmP.png)

1. Generate a **proposal** (or a candidate) sample $x^{cand}$ from the proposal distribution $q(x^{(i)} | x^{(i-1)} )$
2. Compute the **acceptance probability** via the acceptance function $\alpha(x^{cand} | x^{(i-1)} )$ based upon the proposal distribution and the full joint density $\pi(\cdot)$
3. **Accept** the candidate sample with probability $\alpha$, the acceptance probability, **or reject** it with probability $1-\alpha$

Decide on the target function (that hard-to-compute distribution); the proposal must be easier to evaluate than the target.

The MH algorithm starts with simulating a candidate sample $x^{cand}$ from the proposal distribution $q(\cdot)$. Note that samples from the proposal distribution are not accepted automatically as posterior samples. These candidate samples are accepted probabilistically based on the acceptance probability $\alpha(\cdot)$.

There are mainly two kinds of proposal distributions, symmetric and asymmetric. A proposal distribution is symmetric if $q(x^{(i)} \mid x^{(i-1)}) = q(x^{(i-1)} \mid x^{(i)})$. Straightforward choices of symmetric proposals include Gaussian distributions or Uniform distributions centered at the current state of the chain. For example, with a Gaussian proposal we have $x^{cand} = x^{(i-1)} + \mathrm{Normal}(0, \sigma)$. Because the pdf satisfies $\mathrm{Normal}(x^{cand} - x^{(i-1)}; 0, \sigma) = \mathrm{Normal}(x^{(i-1)} - x^{cand}; 0, \sigma)$, this is a symmetric proposal.
This proposal distribution randomly perturbs the current state of the chain, and then either accepts or rejects the perturbed value. Algorithms of this form are called "random-walk Metropolis algorithms."

Acceptance function: intuitively, the MH acceptance function is designed to strike a balance between the following two constraints:

1. The sampler should tend to visit higher-probability areas under the full joint density (this constraint is given by the ratio $\frac{\pi(x^{cand})}{\pi(x^{(i-1)})}$);
2. The sampler should explore the space and avoid getting stuck at one site, e.g., the sampler can reverse its previous move in the space (this constraint is given by the ratio $\frac{q(x^{(i-1)} \mid x^{cand})}{q(x^{cand} \mid x^{(i-1)})}$).

It is important that the MH acceptance function has this particular form because it ensures that the MH algorithm satisfies the condition of detailed balance, which guarantees that the stationary distribution of the MH algorithm is in fact the target posterior that we are interested in (see Gilks et al., 1996, for more details).

For Bayesian posterior inference, $S_{0}(\theta) \propto p(\theta) \displaystyle\prod_{i=1}^{N} p( x_{i} | \theta )$

Two drawbacks:

1. Burn-in is unnecessarily slow.
2. $Var[I] \propto \frac{1}{T}$ is too high.
```r
#### Metropolis-Hastings algorithm
n <- 1000; y <- 709; a <- 3; b <- 2; ap <- 3; bp <- 2
M <- 100500  # k = 1, 2, ..., M

targ.fun <- function(theta0) dbinom(y, size = n, theta0)*dbeta(theta0, a, b)  # target distribution
prop.fun <- function(theta0) dbeta(theta0, ap, bp)                            # proposal distribution

theta <- numeric(M)
theta[1] <- 0.5
for (k in 2:M) {
  theta.k <- theta[k-1]
  theta.cand <- rbeta(1, ap, bp)  # independent Beta proposal
  ratio <- targ.fun(theta.cand)*prop.fun(theta.k)/(targ.fun(theta.k)*prop.fun(theta.cand))
  alpha.theta <- min(1, ratio)
  accept <- sample(c(1, 0), size = 1, prob = c(alpha.theta, 1 - alpha.theta))
  theta[k] <- theta.cand*(accept == 1) + theta.k*(accept == 0)
}

# The 1st to 500th draws are thrown out, known as the burn-in
theta.mcmc <- theta[(500 + 1):M]
mean(theta.mcmc)
```

The result is close to 0.708.

![](https://i.imgur.com/Ry1VjGC.png)

### Gibbs Sampling (a special case)

The proposal distributions are the posterior conditionals. For an intractable multivariate joint distribution, we can pick one coordinate at a time and sample it from its conditional distribution, so the Markov chain moves along a single dimension in each step.

```r
#### Gibbs sampler
n <- 30
a <- 3
b <- 2
M <- 10^5

# generate data (yk, thetak), k = 1, 2, ...
# but only store yk, k = 1, 2, ...
y <- numeric(M)
y[1] <- sample(c(0:n), 1)
for (i in 2:M) {
  theta <- rbeta(1, y[i-1] + a, n - y[i-1] + b)
  y[i] <- rbinom(1, n, theta)
}

# Histogram for yk, k = 1, 2, ...
hist(y, breaks = seq(-0.5, n + 0.5, by = 1), freq = FALSE,
     main = "Beta-Binomial distribution")
mean(y)  # sample mean

### Theoretical distribution of y
px <- function(x, n, a, b) {
  choose(n, x)*gamma(a + b)*gamma(x + a)*gamma(n - x + b)/
    (gamma(a)*gamma(b)*gamma(a + b + n))
}
z <- c(0:n)
points(z, px(z, n, a, b), type = "h", col = 4, lwd = 5)
legend("topleft", c("MCMC", "Theoretical"), col = c(1, 4), lty = c(1, 1), lwd = 2)
```

Beta-Binomial Distribution

#### Locally Optimal Sampler

For the chosen coordinate, apply the result obtained above: sort the conditional probabilities, let the probability flow from the smallest state up to the largest, and redistribute the largest back to the rest.

This is easy to program: in the Gibbs sampler, simply add a sorting step where the sampling from the conditional probability takes place.

## Takeaways

MCMC is a class of methods in which we can simulate draws that are slightly dependent and are approximately from a (posterior) distribution.

## Homework

1. MH
2. Gibbs sampling