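# Adv_Biostat (2020.12.28)
###### tags: `BioStat`

Simple randomisation **CANNOT** guarantee that any two arms are balanced at any given point in time.

Standards of a good randomisation
1. **Treatment balance:** with 1:1 randomisation of two treatments, the treatment and control arms should be very similar in size (50:50) whenever the trial is stopped. The closer the sample size gets to infinity, the better the balance (the trend is already visible around N = 1000). Conversely, when the sample size is not very large, balance is hard to achieve (especially in earlier trials).
2. **Balance across variables:** all baseline covariates (variables, prognostic factors) should be very similar. The two groups should differ only in whether they receive the drug or treatment.

---
The next two methods both aim to keep the treatment arms balanced.

Permuted Block Randomisation
When to use: small sample size, earlier trials.
At the end of each block, exactly half of the patients are on A and half on B.
Note: the block sizes must also be assigned randomly (and kept concealed), otherwise investigators can guess the next patient's assignment.
Solution: two-level randomisation. Randomly assign the block size, then decide the order within the block (Method 2).
![](https://i.imgur.com/Px0MNtR.png)
![](https://i.imgur.com/y6PST7B.png)

Biased Coin (not widely used, though 3-5 papers in NEJM or JAMA still use it each year)
Start with simple randomisation and track which arm is ahead and by how much.
When one arm leads by more than C, use a biased coin that favours the trailing arm, so that balance is restored during the study.
As long as the difference does not exceed C, keep using simple randomisation.
1. Biased coin cannot guarantee the degree of imbalance between the two treatments (the maximum difference between the two groups).
2. Permuted block randomisation guarantees that the maximum difference is half the block size (b/2). Example: with block size = 4, the worst case is AABB.
3. The lists for both permuted block and biased coin are predetermined: once the computer has generated the list, the first patient takes the first entry, followed by the second, the third, and so on. Methods that do not need a predetermined list also exist.

Notes:

---
Stratified permuted block randomization (70-80% of Phase III trials)
Beyond treatment balance, it also handles **balance across variables**: for each disease-related variable, half of the patients are on A and half on B.
Stratified randomization guarantees treatment balance **within prognostic factors**.

Process
1. First define strata (variables strongly related to the disease).
2. Randomization is performed within each stratum, and is usually blocked. Use different blocking patterns in each cell.

If a study covers several disease stages, how should the relationship between cancer death and stage be handled?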
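For example, with an early and an intermediate stage, run one permuted block randomization within the early stage and a separate one within the intermediate stage, and so on.
Without stratification, balance is hard to recover even with later regression adjustment (for example, Cox regression).
![](https://i.imgur.com/bZyJ4SQ.png)
![](https://i.imgur.com/02frkrs.png)
Even when balance has been achieved within each stratum, the model must still account for the stratum. Because the block sizes are randomly assigned, overall imbalance can be avoided.

Recall the AZLI example: the company failed because it used only permuted block randomization.
It did not stratify because, with a small sample size and many variables, sparse cells arise easily.

Minimization Randomisation
When to use:
1. the sample size is small (for example < 100), and
2. there are many prognostic factors: several variables are related to the outcome and all of them need to be balanced.

**Minimization:** (adaptive randomisation)
Balances treatments simultaneously over several prognostic factors (strata).
Does **not** balance **within** cross-classified stratum cells; balances over the marginal totals of each stratum separately.
![](https://i.imgur.com/Ff6QSez.png)
26 patients are on treatment A and 24 on B.
Looking closer, there are three variables.
Say the 51st patient enrolled in the study is male, age ≥ 61, stage III.
![](https://i.imgur.com/JRAVpaP.png)
The marginal totals for these three levels are 27 for A and 24 for B.
B is behind and needs to catch up, so the 51st patient is assigned to B.
After the new patient is added, the table must be updated: Male becomes 16:15, age ≥ 61 becomes 4:7, and stage III becomes 7:5.
limitation: balance within cross-classified stratum cells is not guaranteed (similar to biased coin), so at first glance age looks even less balanced than before.
As the example shows, minimization cannot be prepared in advance; each assignment depends on how the trial has unfolded so far.
> When there is a tie, use simple randomization.

Because there are many variables and the method is adaptive, guessing the next assignment is unlikely.
If this is still a concern, add a probability element (the biased coin idea), for example: assign to the trailing arm with probability 3/4 and to the leading arm with probability 1/4.
FDA also accepts this approach. Consider the following situation: the computer crashes just when an assignment is needed. The physician takes a coin from a pocket and flips it, heads for A and tails for B, and decides on the spot.
How were the first 50 patients assigned? By simple randomisation.
Question: should weights be given according to the severity of the different stages?

Weighting
As one would expect, the higher the weight, the better the balance in that stratum.
![](https://i.imgur.com/d0MSzPP.png)
The weights are chosen based on clinical evidence.
**But:** if stage is considered far more important than gender, why stratify by gender at all?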
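
The marginal-total rule from the 51st-patient example, together with the weighting idea, can be sketched in R. This is a minimal illustration, not the lecturer's code: the pre-assignment counts are recovered from the updated totals quoted in the notes (16:15, 4:7, 7:5 after the patient joins B), and the helper `minimize_assign` and its `weights` argument are hypothetical names introduced here.

```r
# Marginal counts for the levels matching the 51st patient
# (male, age >= 61, stage III) BEFORE assignment. Recovered from the
# updated totals in the notes; the data frame layout is an assumption.
margins <- data.frame(
  level = c("sex: male", "age: >= 61", "stage: III"),
  A = c(16, 4, 7),
  B = c(14, 6, 4)
)

# Hypothetical helper: sum the (optionally weighted) marginal counts
# per arm and assign the new patient to the arm that is behind.
minimize_assign <- function(margins, weights = rep(1, nrow(margins))) {
  total_a <- sum(weights * margins$A)
  total_b <- sum(weights * margins$B)
  if (total_a == total_b) {
    arm <- sample(c("A", "B"), 1)  # tie: fall back to simple randomisation
  } else if (total_a > total_b) {
    arm <- "B"
  } else {
    arm <- "A"
  }
  list(total_a = total_a, total_b = total_b, arm = arm)
}

# Equal weights: totals are 27 (A) vs 24 (B), so the patient goes to B
res <- minimize_assign(margins)
print(res)

# Update the table after the assignment, as the notes require
margins[[res$arm]] <- margins[[res$arm]] + 1
print(margins)
```

Passing, say, `weights = c(1, 1, 2)` to the original table doubles the influence of stage on the totals; the specific weights are illustrative only and would come from clinical evidence in practice.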
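Conversely, if every factor is considered important, randomisation becomes difficult to carry out.
Detecting the importance of key factors
> Use no more than 3-4 variables, and preferably make each variable binary (above vs. below 50, pre- vs. post-menopause, stage I+II vs. III+IV, before vs. after metastasis).

Correlation between strata does not affect randomisation; it only needs to be handled when analysing the results.
But if there is multicollinearity (whenever A occurs, B always occurs), stratify on only one of the two (this is usually not the case).

Minimization is an excellent method for achieving balance in a relatively **small study with several stratification factors**.
**Careful** Balance by margins does not guarantee overall treatment balance or balance within stratum cells.

---
**Stratifying by institution**
When the lecturer graduated in 1994, the lecturer applied for several jobs. The US economy was in bad shape at the time, and one of the interviews included this question:
"If you randomise a multicenter trial, does center need to be stratified?"
A center is, for example, a medical center.
In a multicenter trial, center must always be stratified.
The design process really matters. For example, when a company ships a vaccine, the quantity and dose of control are tightly controlled: a site does not receive the next box until the current one is used up.

---
Biased Urn Model (also a form of adaptive randomisation)
When to use: to assign more patients to the better treatment. It is a special case and so is not commonly used.
Urn Model:
![](https://i.imgur.com/i24q1Dj.png)
limitation: the outcome must be known quickly (otherwise the next patient cannot be randomised).
So it usually appears in the operating room or in the ED/ER (a patient with coronary heart disease is given an injection, and whether the clot has dissolved is known right away).

Urn Model case study: the ECMO story
![](https://i.imgur.com/q9uMvLX.png)
Background:
The lecturer's alma mater, the University of Michigan, contributed greatly to the development of ECMO.
Congenital heart defects or malformations in infants required major open-heart surgery.
At the time, however, 99% died after surgery. ECMO was developed as an extracorporeal circulation machine that takes over heart and lung function.
The next step was FDA approval, but FDA insisted on evidence from a randomised trial.
So the team turned to 石瑜's teacher Richard Cornell (not the dissertation advisor; probably just a professor on the thesis defence committee).
Cornell is a real gentleman. After discussions, FDA still insisted on trial evidence, so Cornell developed the urn model.

Randomization scheme
1. The first patient would be randomized to ECMO or conventional therapy with **equal** probability (one of each ball in the urn).
2. For each patient who survived on ECMO or died on conventional therapy, one ECMO ball would be added to the urn.
3. For each patient who survived on conventional therapy or died on ECMO, one conventional therapy ball would be added to the urn.
4. **Randomization would continue until 10** of one ball type had been added to the urn. Then randomization would cease, and all patients would be assigned to the successful therapy.

Why start with 1 ECMO ball and 1 conventional ball, rather than 3:3, 3:5, or some other numbers?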
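The idea is similar to a Bayesian model, with a prior and a likelihood.
If the first ECMO treatment succeeds, adding one ECMO ball to a 1:1 urn raises the ECMO proportion by a large step; with more starting balls the proportion would change very slowly.
The draws were made in front of witnesses and documented with Polaroid photographs.

Results
- Patient 1 randomized to ECMO ⇒ Survived ⇒ Subsequent odds of randomization to ECMO = 2 : 1
- Patient 2 randomized to conventional therapy ⇒ Died ⇒ Subsequent odds of randomization to ECMO = 3 : 1
- Patient 3 randomized to ECMO ⇒ Survived ⇒ Subsequent odds of randomization to ECMO = 4 : 1
- All remaining patients randomized to ECMO ⇒ Survived
- Final results:
    - 11 ECMO patients ⇒ All survived
    - 1 control patient ⇒ Died

That day the department chair happened to walk through the hallway of the public house, and the lecturer asked why he looked so haggard.
It turned out that the chair had drawn the conventional therapy, and the mother had cried bitterly.
After making a cup of coffee, the lecturer comforted him and shared an experience from elections in Taiwan: chill the conventional ball, so that you know which one not to draw.

Because each assignment is conditional on the previous results, the Chi-Square Goodness of Fit Test cannot be used.
Biometrika devoted a special issue to how to handle the biased urn with sound statistical inference; the conclusion was that there is no good method.

---
# Trial Monitoring (GO or NO GO)
Reasons for treatment monitoring:
- Early dramatic benefits
- Potential harmful effects
- Differences so unimpressive that showing a difference at the end of the trial is very unlikely (the question statisticians are challenged on most often)

This applies beyond medicine: industrial projects, and any other project, should also have a monitoring system.

---
## Repeated Testing for Significance
repeated testing problem:
Each test spends 5% of alpha (that is, a 5% chance of making a mistake and reaching the wrong decision).
With multiple looks, alpha inflates very quickly.
**Overall Goal:** keep the overall alpha at 5%.

### Group Sequential Methods (all are alpha-spending functions)
Focus on **existing** data

| | Haybittle-Peto | Pocock | O'Brien-Fleming |
| -------- | -------- | --- | -------- |
| way to spend $\alpha$ | ad hoc | evenly | nonlinear |

![](https://i.imgur.com/e1PMLoD.png)
At the first two looks, the Pocock method is the most likely to stop the study.
If a sponsor believes a drug is very promising, it can gamble on Pocock. But if the trial has not stopped by the fourth look, there is a risk, because Pocock's boundary at the final look is stricter than O'Brien-Fleming's.

**Conclusion:**
A true alpha-spending function lets you choose the look times freely; they do not have to be fixed at 20, 40, 60, 80, and 100%.
Suggested example: look first at 50%, then at 70, 80, and 100%.

early stopping boundary for superiority: the treatment must beat the stopping boundary before the trial is stopped.
early stopping for futility (the drug does not work): use a constant boundary; there is no need to wait until the result is that much worse before stopping.
DeMets and Ware propose using a fixed lower boundary at which you "accept" H0. For example, set $Z_i = -1.5$ or $-2.0$.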
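
The repeated-testing problem that motivates these group sequential boundaries can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the lecture: five equally spaced looks, test statistics simulated directly under H0, and a Pocock constant of 2.413, which is the commonly tabulated value for five looks at two-sided $\alpha$ = 0.05 rather than a number taken from these notes. The helper `crossing_rate` is a hypothetical name.

```r
set.seed(2020)
n_sim   <- 20000  # simulated trials under H0
n_looks <- 5      # equally spaced looks (20%, 40%, ..., 100% of outcomes)

# Independent N(0, 1) increments: one row per trial, one column per look
increments <- matrix(rnorm(n_sim * n_looks), nrow = n_sim)
# Cumulative sums along each row; apply() returns looks in rows, so transpose
cum_sums <- t(apply(increments, 1, cumsum))
# The Z statistic at look k is S_k / sqrt(k), which is N(0, 1) under H0
z_stats <- sweep(cum_sums, 2, sqrt(seq_len(n_looks)), "/")

# Hypothetical helper: share of trials that cross a constant two-sided
# boundary at one or more looks
crossing_rate <- function(z, boundary) {
  mean(apply(abs(z) > boundary, 1, any))
}

alpha_naive  <- crossing_rate(z_stats, qnorm(0.975))  # 1.96 at every look
alpha_pocock <- crossing_rate(z_stats, 2.413)         # assumed Pocock constant
print(c(naive = alpha_naive, pocock = alpha_pocock))
```

Testing at 1.96 on every look should give an overall type I error well above 5% (roughly 0.14 for five looks), while the constant Pocock boundary should bring it back close to 0.05.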
In practice, the constant negative boundary proposed by DeMets and Ware is not commonly used; conditional power is used most often.

#### Group Sequential Methods: Haybittle-Peto (not commonly used)
Use a large critical value, say Z = ±3.0, for all interim tests (i < N).
Standard normal distribution: bell-shaped, symmetric, mean = 0, SD = 1.
Z = ±3.0 means that very little alpha is spent (the p-value must be very small) (c.f. Z = ±1.96).
Because so little alpha is spent at the earlier looks, no later adjustment is needed:
Then any adjustment for repeated testing at the final test (i = N) is **negligible**, and the conventional critical value can be used.
The conventional significance level is 0.05; the suggestion here is to use a slightly smaller value such as 0.045, 0.047, or 0.048 to judge whether the final result is positive.
Note: This is an **ad hoc** method. It is a rule of thumb, not true alpha-spending (it does not control the $\alpha$-level exactly).

#### Group Sequential Methods: Pocock (more commonly used)
evenly spend $\alpha$
Suppose the data will be examined five times. The first look happens when 20% of the **outcomes** have been collected (not when 20% of the patients are enrolled), the second when 40% of the outcomes have been collected, and the rest at 60, 80, and 100%.
![](https://i.imgur.com/l8v08gX.png)
**Note:** the p-value has to be very small at every look, from the first to the last, so a final p-value that looks clearly small can still result in a negative study.

#### Group Sequential Methods: O'Brien-Fleming (most common)
nonlinear use of alpha
Just as stratified randomisation is the most common randomisation method, O'Brien-Fleming is the most common method in monitoring.
$Z \cdot \sqrt{\frac{\#\text{tests}}{i}}$
![](https://i.imgur.com/tQcmIKA.png)
> [Z-to-$\alpha$ lookup table](http://davidmlane.com/hyperstat/z_table.html)

The table gives the area under the curve (alpha) for a given Z, as a reference for how small the p-value must be.
Statistics students can simply use R.
Example:
| Column 1 | Column 2 | Column 3 | | |
| ------------------------------------ | ------------------------------------ | ------------------------------------ | ------------------------------------ | --- |
| ![](https://i.imgur.com/7go3Bqy.png) | ![](https://i.imgur.com/BNqxi6f.png) | ![](https://i.imgur.com/ACrK6Bz.png) | ![](https://i.imgur.com/mNy7pKs.png) |![](https://i.imgur.com/Y10zLnP.png) |

Rejection becomes progressively easier.
At the final look, the threshold is already close to 0.05:
![](https://i.imgur.com/dxiNMbY.png)

### Curtailed Sampling Procedures (controversial)
Consider **future** data
- Simple curtailment: A study is stopped as soon as the result is inevitable (i.e., it could not be reversed).
- Stochastic curtailment: A study is stopped as soon as the result is highly probable.

**controversy:** there is no notion of multiple comparison testing or alpha-spending (the same point on which frequentists criticise Bayesians).
Bayesians hold that multiple looks do not create a repeated testing problem; they simply calculate the posterior distribution.

#### Conditional Power (can be viewed as a weighted average)
recall: power
P(test statistic > critical value | $\delta = p_I - p_c$)
Power is the probability that you can tell a story if there is a story.

Conditional power is a form of stochastic curtailment.
Conditional power is conditioned on:
1. future data (under the design assumption)
2. observed data so far

Back to this figure:
![](https://i.imgur.com/e1PMLoD.png)
At the second look, 40% of the data are in and the observed improvement (delta) of the new treatment is already visible. The remaining 60% has not been seen yet, so we assume that it has the same delta as the original assumption in the power analysis done with PS.
The observed 40% may differ from that delta, but we still assume that the future 60% follows the power analysis. Combining the two parts and recomputing the power in PS gives the conditional power.
Suppose a 20% improvement was expected at the design stage. Once the trial is running, the first 40% turns out very badly, with no improvement at all over standard treatment, that is, delta is 0.
The conditional power is then smaller than the ordinary power. Conditional power can be used to set the futility stopping boundary.
If early results show:
- Intervention better than expected ⇒ conditional power large
- Intervention worse than expected ⇒ conditional power small (unless sample size is increased)

---
##### Example
Effect of tilarginine acetate in patients with acute myocardial infarction and cardiogenic shock
JAMA 2007
![](https://i.imgur.com/xO83ZgE.jpg)
Mortality was actually higher with the drug.
![](https://i.imgur.com/68o5PZY.png)
Open the PS software to verify the paper:
- the endpoint is mortality, so it is a binary endpoint
- relative to 50%, there is a 25% relative reduction:
    - P_1 = 50% × (1 − 25%) = 37.5%

![](https://i.imgur.com/0BIU6eM.png)
2n = 658 (329 per arm)

---
![](https://i.imgur.com/FGp0MkR.jpg)
The first look occurs when half of the data are in; stop if the conditional power is below 10% (less than a 10% chance of a positive result).
**Note:** there is **no** consensus on the stopping boundary for conditional power.
The lecturer usually stops at 20% (anything from 10 to 30% is acceptable).
30-day all-cause mortality: 48% in treatment and 42% in placebo
P_1 = 37.5%, P_2 = 50%
![](https://i.imgur.com/lvIQDgO.png)
Verification gives a conditional power of 0.194.
Why is it not exactly 20%, as stated in the paper?
Because the balance is not as perfect as we imagine.
![](https://i.imgur.com/kDTEDM4.png)
Adjust the parameters to 43.9% and 45.6%:
![](https://i.imgur.com/SmfEhH7.png)

---
### Simpson's Paradox

---
# Homework Explanation (2:55:00)
## code
See p29 and p35

1. For a study with n = 32 patients (two treatment arms), set up a randomization table using each of the following strategies. In each case, discuss your results in terms of the balance achieved.
A total of 64 patients take part in the study.
a) Simple randomization
Assign using the Pocock table.
b) Randomization stratified by disease stage (at two levels) and blocked in blocks of size four (you do not know beforehand how many patients of each stage will be accrued)
The list is prepared before the trial starts, so prepare 64 assignments for the severe stage and 64 for the mild stage.
c) Biased coin randomization (c=2)
d) Simple urn randomization (start with one ball for each arm)
This is not play-the-winner. Outcomes do not matter here; the only goal is to end with 50% in each arm.
e) Biased urn randomization (start with 1 ball for each arm and assuming that the success rates are 0.8 and 0.2 for arm 1 and arm 2 respectively)
Recall the ECMO example.
Start with one ball of each colour and draw one; arm 1 has an 80% chance of success. On success, add a ball of the same colour; on failure, add a ball of the other colour.
f) Repeat (e) (start with 5 balls for each arm). Compare the results of (e) and (f) and comment on the difference.
This brings in the Bayesian concept: when the starting number of balls goes from 1:1 to 5:5, the prior becomes stronger.
Provide sufficient documentation so that someone else will be able to independently verify your assignments.
Bonus: Write a computer script to automatically generate the randomization.

2. Use the minimization technique for two strata: disease stage (at two levels) and performance status (at three levels).
Note: remember to update the table each time a new patient is added.
Assume that the randomization for the first 15 patients has resulted in the assignments given in Table 1 below. Assign the next three patients with characteristics given in Table 2 below. Choose one of the two minimization criteria, and state which you chose. First use equal weights. Do the assignments change if the weights for stage and performance status are 2:1?
For the equal weights case, discuss your results in terms of the balance achieved overall and the balance achieved marginally for each level of each stratification factor.
Bonus: Instead of manually calculating treatment assignments, write a computer script to automatically generate the treatment assignment.

3. You plan to perform a clinical trial comparing two treatments (new vs. standard) where the outcome is dichotomous (success/failure).
Use the uncorrected chi-square test.
You have previous data showing the success rate on the standard therapy to be 0.40. An increase in the success rate of 0.15 would be clinically important. You want to test the hypothesis H0: PA = PB versus the alternative Ha: PA ≠ PB, with an alpha-level of 0.05, and a power of 0.90 to detect a difference of 0.15.
a) Calculate the sample size using 1) Dupont's software, 2) the uncorrected formula from the notes, 3) the formula corrected for continuity, 4) computer simulation method (bonus)
There are two parameters: N and P.
The two binomial distributions have success rates of 0.40 and 0.55, and N is unknown:
Start by assuming N = 10, that is, draw 10 observations from each distribution and count the successes and failures. Then run the chi-square test and check whether the p-value is below .05 (reject H0). Repeat 100-1000 times and look at the power.
Keep increasing N and stop once more than 80% of the p-values are < .05 (use 90% for the power of 0.90 specified above); that N gives the power we want.
Throughout the simulation, P0 and P1 are fixed and only N varies.
b) Compare your answers in (a) and discuss any differences.
c) For the corrected sample size calculated in (a[3]), calculate the power for five other values of $\delta$ = PA − PB and sketch the power curve.
Use PS to plot five different values of $\delta$.
Observe the trend: The bigger the difference, the higher the power.
d) What is the confidence interval half-width (d) that you expect based on the (corrected) sample size calculated in (a)?
e) If you expect that 10% of the patients randomized to new therapy will cross over to the standard treatment and 5% of the patients randomized to standard therapy will cross over to the new treatment, what sample size should you actually use (based on a[3])?
Crossover includes both drop-out and drop-in:
$N_{adjusted} = N \frac{1}{(1 - R_1 - R_2)^2}$

4. Sample size where the outcome is continuous.
For the example presented in class (cholesterol reduction):
a) Sketch a plot of N as a function of $\sigma$ (for the given $\alpha$ and $\beta$), based on calculating three to five points. Comment.
The larger the variance, the larger the required N.
b) For the originally calculated sample size of 2N = 1,051, what confidence interval half-width (d) do you expect?
Assume that $\sigma_1$ and $\sigma_2$ are equal.
c) Can you verify the sample size calculation based on computer simulation method (bonus)?
Draw from two normal distributions, one with mean = 0 and SD = 1 and the other with mean = 1 and SD = 1. Start with N = 10, run a t-test on the draws to get a p-value, and count how many p-values are below .05.
Keep increasing N; at N = 17, over 1000 runs, about 80% of the simulations have a p-value below .05.

5. Find and plot the Pocock and O'Brien-Fleming boundaries for performing N = 8 tests, using $\alpha$ = 0.05.
```r
library(ggplot2)

n_tests <- 8
look <- seq_len(n_tests)
# O'Brien-Fleming: final critical value 2.07 scaled by sqrt(N / i)
z_of <- 2.07 * sqrt(n_tests / look)
# Pocock: the same critical value at every look
z_pocock <- rep(2.512, n_tests)

# Build the columns outside data.frame() so that they get proper names
graph <- data.frame(
  Methods = rep(c("O'Brien-Fleming", "Pocock"), each = n_tests),
  n = rep(look, 2),
  z = c(z_of, z_pocock)
)

ggplot(graph, aes(x = n, y = z, group = Methods)) +
  ylim(0, 6) +
  geom_line(lwd = 0.5) +
  geom_point(aes(color = Methods), size = 5) +
  labs(x = "Test number (i of N = 8)", y = "Z-Value") +
  theme_bw() +
  theme(
    legend.position = "bottom",
    legend.text = element_text(size = 14),
    axis.text.x = element_text(size = 12),
    axis.text.y = element_text(size = 12),
    axis.title.x = element_text(face = "bold", size = 14),
    axis.title.y = element_text(face = "bold", size = 14)
  )
```

---
# final exam: explanation
Please critique the attached paper, "Effect of Vitamin D3 Supplementation on Severe Asthma Exacerbations in Children With Asthma and Low Vitamin D Levels The VDKA Randomized Clinical Trial" (Erick Forno, MD, MPH et al., JAMA. 2020;324(8):752-760), according to the following questions.
Choose 5 of the questions to answer.
1. Background
2. Study objectives
3. Primary Hypothesis
4. Primary endpoint
5. Secondary hypotheses
6. Secondary end points
7. Measure of response (endpoint)
8. Randomization/stratification scheme
9. Sample size justification
10. Plans for interim monitoring
11. Plans for statistical analysis

When the arms are small and imbalanced, FDA does not accept an explanation based on PS or propensity score (IPTW) adjustment, because too much information is thrown away. These methods are therefore more commonly used in retrospective studies.