# Adv_Biostat (2020.12.27) ###### tags: `BioStat` - Sample Size Determination - 用幾隻老鼠、收幾個病人、幾個學校才合理 - 以統計學術語稱為Power Analysis - 多少機會可以發表在top journal if there was really a story - 要多次評估 - 最糟的情況,經費會不會不夠? - 最好的情況 - Randomization - Balance - Study Monitor (GO or NO GO) - MCMC: Gibbs Sampler --- # Sample Size Determination WHEN to determine: trial/study planning NO post hoc power power越大越好 ## Types of Errors in Hypothesis Testing $\alpha$: reject Nonhypothesis when nonhypothesis is true 新藥沒有比現有的藥好,卻說這是個好藥 將糟糕的藥引入市場 隨著試驗的進行,error應該逐漸降低 FDA regulation: Phase II (10-20%) Phase III (1-5%) $\beta$: do no reject Nonhypothesis when nonhypothesis is not true 是一個好藥 錯失很好的藥 FDA regulation: no matter Phase II or III (10-20%) power=1-$\beta$ 明顯地,Type I error較危險 > p-value應該被取代了 > CI,看upp bound和lower bound variation 新藥比老藥進步多少(animal model to human, literary research: ideas from published paper, early phase (pilot) study) 抗高血壓藥 都只是參考 真正決定significant level時, 若現在看到的improvement成真,我的colleages有多希望把它現有的處方簽改成我的新藥 significant level多大?大到 不是數學deal with的問題 例如:相比老藥,新藥顯著減少1 mmHg的血壓 如果power要越高,樣本數就要越大。 alpha越小,樣本數越大 variation越大,樣本數就要越大 (選outcome variable時,選variation較小的,才能節省成本) 病人表現表現差異大,越不consistent variance小,越能精準預測藥效 clinically siganificant level越大,樣本數就可以越小 clinically siganificant是最challenge的部分,需要一定的時間 clinical significant level 跟effect size是同樣的東西 > :warning: N越小的情況下: > standard error越大,而不是standard deviation > SE一定小於SD ## Sample Size Fn. 在observational study中,不做power analysis,而是用 - Reference - Scientific Rigor in the Age of COVID-19 - 重concept、較general - The design and analysis of non-randomized studies: A case study of off-label use of hydroxychloroquine in the COVID-19 pandemic - technical、較detailed ## Preparing to calculate sample size 1. What is the main **purpose** of the trial? (This is the question on which sample size is based.) superiority/noninferiority/equivalence 2. What is the principal measure of patient outcome (endpoint)? Is this measure continuous or discrete? Censoring? - continuous (在任兩數中間可再insert任意數,依然make sense。例如年齡) - discrete (insert) - Censoring (例如time-to-event) - 跟missing data不同 Censor定義的兩種情況 - end of study,要開始分析資料、寫文章,event沒發生在subject(老鼠沒生病、病人沒感染或住院) - lost follow (沒意願繼續參加、搬遷、失聯) 4. What statistical test will be used to assess treatment difference (e.g.,t-test, log-rank, chi-square)? With what α-level? One-tailed or two-tailed? FDA規定所有實驗two-tailed test (除了noninferiority是one-tailed test) 較保守,所以約定俗成 By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%. 6. What result is anticipated with the standard treatment (e.g., average value or rate)? 5. How **small** a treatment difference is it important to detect ($\delta$), and with what degree of certainty (power = $1−\beta$ )? $\delta$=clinically significant level ## power curve 願不願意做實驗的評估 | ![](https://i.imgur.com/JF8BWTs.png)| ![](https://i.imgur.com/5UvYs3O.png) | | -------- | -------- | | Text | $\delta$越小,所需樣本數越大 | --- ### Dichotomous response variables Compare drug A (standard) vs. drug B (new). 病人的outcome是成或敗,是escriptive variable $\rightarrow$ chi-square test proportion: 變成continuous variable The total sample size required (N in each group) is: ![](https://i.imgur.com/Ck5N6h6.png) 分母是 因為A是standard,可以從文獻得知,pilot study知道B的window。 若新藥進步幅度很大,所需的N就很小。 分子 $$ ![](https://i.imgur.com/qu9bJ1D.png) example: 設40%失敗率 新藥可將40->30% pB=0.3 (pilot study也證實) --- ### Use PS to draw alpha, beta curve # Randomization # Study Monitor: GO or NO GO 健保資料庫: All of Us crossover: dependent, matched two proportion: 兩個失敗率 m: ratio between treatment and control ![](https://i.imgur.com/96FmxGB.png) ![](https://i.imgur.com/ZYsjvzy.png) 476*2=952 **Correction:** **Underlying** distribution is **binomial (discrete)**, which we approximate with a normal distribution (continuous). ![](https://i.imgur.com/gP2svBw.png) ![](https://i.imgur.com/mg2sKL6.png) | Column 1 | Column 2 | Column 3 | | -------- | -------- | -------- | | 0.05 | | 140 | | 0.10 | | 199 | | 0.20 | | 293 | | 0.30 | | 356 | | 0.40 | | 387 | | 0.50 | | 391 | | 0.55 | | 387 | | 0.60 | | 356 | | 0.70 | Text | 293 | | 0.80 | Text | 152 | | 0.85 | Text | 152 | 同樣improve 10%,卻有variation 且呈現對稱 成功(或失敗)率極高的情況下,遇到一個病人是成功或失敗很好猜。 primary endpoint無論定成功或失敗,都用一樣的sample size .5做中線,往上下推10% CI決定樣本數 CI越窄,評估越precise,N越大 ### 考慮Dropouts和Drop-ins - 病人拒絕服用新藥。 仍然要給他吃標準的藥 經常發生 beta-blocker標準,compare A ACE inhibitor 通常不太有drop-ins treatment outcome dilution $N_{adjusted}=N\frac{1}{(1-R)^2}$ FDA有標準intention to treat的分析資料方式 modified intention to treat更好 (病人至少吃過一次藥) FDA除了要求report ITT (intention to treat),也要report PP(per-protocol) $N_{adjusted}=N\frac{1}{(1-R_1-R_2)^2}$ 實驗前後就應推估,如果推估不夠嚴謹。降低power,p-value變大 如何估計? 若無法well-tolerated,就有較好的compliance rate 5%較理想,10%平常 請問老師,chi square有自己的probability table,那我們為什麼要用normal distribution的probability table數字來估計sample size之後,才再校正呢? (成大醫院神經部 林伯昱) 承這個問題,請問dropout不算是protocol violation嗎?dropout rate太高會不會影響對這個研究的可信度?(公衛所博一 鄭維鈞) dropout不算是protocol violation noninferiority study中,comply rate就很重要了 dropout只限於pros 如果使用資料庫分析,就是retrospective study,用CI的concept得知precision和reproducibility --- ![](https://i.imgur.com/rKn5KII.png) ![](https://i.imgur.com/AzfKRAc.png) | ![](https://i.imgur.com/PJiSvfn.png) | ![](https://i.imgur.com/VAJd14i.png) | | -------- | -------- | | Text | Text | ![](https://i.imgur.com/WN4FAwF.png) delta 非常sensitive 所以需要sensitivity analysis 經費已知,需要多少病人才能達到目標delta --- **Exercise:** ![](https://i.imgur.com/SIvBPEz.png) | $\delta$ | $\sigma$ | N | $\delta$ | $\sigma$ | N | | -------- | -------- | --- | --- | -------- | -------- | | Text | Text | | | | Text | | Column 1 | Column 2 | | | | Column 3 | | -------- | -------- | | | | -------- | | Text | Text | | | | Text | | Text | Text | | | | Text | | Text | Text | | | | Text | | Text | Text | | | | Text | 在continuous variable中,effector size表示detect多少SD。 所以儘可能挑continuous variable 且儘可能挑variance小 (較consistent),所需N較小 8隻老鼠 ![](https://i.imgur.com/uOfMZTs.png) 改alpha/2 Sample size for change-from-baseline response variables --- ## Time-to-event data ![](https://i.imgur.com/8oFGDbZ.png) 做survival時,多用median,而不用mean 因為有outlier surgeon,病人離開ICU 一旦出院可以活得很長 平均survival time是4年,但是 50%可以存活至少5年以上 三個公式 | $\phi(\lambda)=\frac{\lambda^2}{}$![](https://i.imgur.com/eqxma8m.png) | ![](https://i.imgur.com/mwbVuUG.png) |![](https://i.imgur.com/pslE0vc.png) | | ------------------------------------------------- | -------- | -------- | | All patients enter at the beginning of the study. | recruited continually during this study period | accrual occurs over a fixed time period, T0, followed by a fixed interval of follow-up, T | | 主要用於animal model | | 最常見 | power跟N沒有太直接關係。做survival時有censoring。 number of event真正影響power(多少病人真的生病、需要IV、recurrent) 所以,所需樣本數更大 (第一個才有更多機會觀察event) --- PS experiment 第一天,每個病人都進來 0,5,1,127 (單位:年) 5,0,1,208 2,3,1,192 以指數函數作為base function,只是根據follow-up、median等參數做approximation,所以沒有考慮censoring。 m: Ratio of control to experimental patients. ![](https://i.imgur.com/vQhmep9.png) > 對照組1:1一定比1:2所需要的sample size少。 BUT 樣本數少不代表試驗一定會順利。 例如:實務操作上,arm ratio的是影響recruitment | m1 | | N | | Column 2 | N | | --- | --- | --- | -------- | -------- | -------- | | | | | | | | | | | | | | | | | | | | | | | | | | Text | Text | Text | ![](https://i.imgur.com/MWr6Ds7.png) Survival Analysis需要的樣本數最多,不容易做 (除非效果很顯著) Binary次之 Continuous最少 prospective study 用 power analysis,老師的文章也提到 retrospective study 用 precision analysis, 那可以類推到 prospective cohort study 用 power analysis,case-control study 或cross-sectional, correlational study 是用 precision study嗎?謝謝老師。 --- 肺癌 ![](https://i.imgur.com/6jDViLU.png) 預估recruit兩年(104週) follow-up變短,需要的N變大 (因為event少) 實務上:藉由拉長時間來降低N PS的時間要統一 --- ## Randomisation (只存在於prospective study) 亂數表無法guarantee治療的balancing ### permuted block 保證end of block的A、B各佔一半 ![](https://i.imgur.com/Hf5xO80.png) block size 實際上大家都會猜 如何避免大家猜測? 副作用? Block size can be **varied** over time, even randomly. 2、4和6也randomly assign 也可在R中直接抽樣,可是要確定分配比例 ### biased coin 1989在Michigan做第一份工作(RA), 拿到一個鞋盒 需要100份random assignment (用biased coin) 除了統計學家,只有藥劑師真的知道名單 拿著biased coin分組的病人並不算是have "the same chance" of begin assigned to either intervntion or control? 例如open label的狀況,其實知道目前已收了24個治療20個control病人,那就會知道我手上現在要入trial的病人比較容易入control? --- # Data Analysis: Regression - Descriptive statistics - Regression analysis - Multiple imputation for missing values - Variable selection: Elastic net/LASSO 補足missing point --- ``` #### Data loading SimData2 <- read.csv('C:/Users/c64/Documents/成大/修課/ADVANCED BIOSTATISTICS IN BIOMEDICINE AND RESEARCH DESIGN/1227/Datasets/SimData2.csv', header=T) str(SimData2) head(SimData2, 10) ``` 先看有多少個variable ```r 'data.frame': 1000 obs. of 5 variables: $ y : num 5.89 5.31 1.13 11.63 5.46 ... $ x1: chr "a" "b" "a" "b" ... $ x2: num 0.945 4.682 1.753 1.843 -2.755 ... $ x3: num 0.2219 0.0839 0.7957 -2.066 0.159 ... $ x4: int 1 1 2 3 3 3 3 2 3 3 ... ``` 這個variable missing的狀況如何? ``` Descriptive Statistics (N=1000) +------+--------------------------------------+ | | | | |(N=1000) | +------+--------------------------------------+ |y | 1.335610/ 4.962864/10.423755 | +------+--------------------------------------+ |x1 : b| 0.48 (476) | +------+--------------------------------------+ |x2 | 0.3705265/1.8947198/3.4206255 | +------+--------------------------------------+ |x3 |-0.680790652/ 0.004947291/ 0.718200553| +------+--------------------------------------+ |x4 : 1| 0.32 (325) | +------+--------------------------------------+ | 2 | 0.33 (331) | +------+--------------------------------------+ | 3 | 0.34 (344) | +------+--------------------------------------+ ``` 自動判別是category ```r #### Descriptive Statistics: Hmisc::summaryM library(Hmisc) #output 1 summaryM(y + x1 + x2 + x3 + x4 ~ 1, data = SimData2, test = F, overall = F, na.include=T) #output 2 summaryM(y + x2 + x3 + x4 ~ x1, data = SimData2, test = F, overall = F, na.include=T) ``` 設a是舊藥,b是舊藥,試問a和b之間是否存在明顯的不平衡? ``` Descriptive Statistics (N=1000) +------+-----------------------------------+-----------------------------------+ | |a |b | | |(N=524) |(N=476) | +------+-----------------------------------+-----------------------------------+ |y | 0.3101109/ 3.2154996/ 9.4013758 | 2.9911276/ 6.1313437/11.4271228 | +------+-----------------------------------+-----------------------------------+ |x2 | -0.4032653/ 0.9111485/ 2.3550091 | 1.5850898/ 2.9520165/ 4.2215834 | +------+-----------------------------------+-----------------------------------+ |x3 |-0.65960105/ 0.05697835/ 0.73958568|-0.68771211/-0.04715820/ 0.68992131| +------+-----------------------------------+-----------------------------------+ |x4 : 1| 0.29 (154) | 0.36 (171) | +------+-----------------------------------+-----------------------------------+ | 2 | 0.35 (183) | 0.31 (148) | +------+-----------------------------------+-----------------------------------+ | 3 | 0.36 (187) | 0.33 (157) | +------+-----------------------------------+-----------------------------------+ ``` --- ## Linear regression analysis 畫出散佈圖,以線逼近,residual最小時得到最好的function | Column 1 | Column 2 | | -------- | -------- | | Text | Text | hyper-plane 圖形上畫不出來 Model fitting: ``` 'data.frame': 1000 obs. of 4 variables: $ y : num 3.303 4.727 0.973 0.776 -4.945 ... $ x1: chr "a" "b" "a" "b" ... $ x2: num 0.945 4.682 1.753 1.843 -2.755 ... $ x3: num 0.2219 0.0839 0.7957 -2.066 0.159 ``` ``` y x1 x2 x3 1 3.3034375 a 0.9451998 0.22192251 2 4.7267805 b 4.6815861 0.08394058 3 0.9733954 a 1.7530319 0.79568740 4 0.7763351 b 1.8432958 -2.06603159 5 -4.9449693 a -2.7552663 0.15904736 6 -1.3593703 a -0.3308874 0.78268076 7 3.0481263 b 2.8547990 -0.64282890 8 4.5344029 a 2.9982046 -1.13656199 9 8.5468337 b 6.0937402 -1.01275800 10 1.1499367 a -1.0332654 0.67824236 ``` ``` Call: lm(formula = y ~ x1 + x2 + x3, data = SimData1) Residuals: Min 1Q Median 3Q Max -6.2936 -1.3303 -0.0203 1.3184 6.0389 Coefficients: Estimate Std. Error t value (Intercept) -0.14262 0.08968 -1.590 x1b 1.18597 0.13651 8.688 x2 0.99130 0.02977 33.302 x3 0.17449 0.06135 2.844 Pr(>|t|) (Intercept) 0.11208 x1b < 2e-16 *** x2 < 2e-16 *** x3 0.00455 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.942 on 996 degrees of freedom Multiple R-squared: 0.6398, Adjusted R-squared: 0.6387 F-statistic: 589.6 on 3 and 996 DF, p-value: < 2.2e-16 ``` | 越均勻散佈、越沒有pattern越好 | 越接近45°越好 | | ------------------------------------ | -------- | | ![](https://i.imgur.com/s9QMWxU.png) | ![](https://i.imgur.com/XB5EmOf.png) | | ![](https://i.imgur.com/kCha2eU.png) | ![](https://i.imgur.com/G9mDjFs.png) | --- - Continuous outcome: 45.2, 44.1, 60.2, … - lm(y ~ x1 + x2 +…) - Discrete outcome: - <font color="#f00">Dichotomy (binary)</font> outcome: No, Yes - Count outcome: 0, 1, 2, 3, … 在medical journal較少 (例如:車子流量) - <font color="#f00">Ordinal</font> outcome: - 1 < 2 < 3 - … - <font color="#f00">Survival outcome</font>: Time to event Censoring Generalized linear models g: link function --- ``` 'data.frame': 768 obs. of 10 variables: $ pregnant: int 6 1 8 1 0 5 3 10 2 8 ... $ glucose : int 148 85 183 89 137 116 78 115 197 125 ... $ pressure: int 72 66 64 66 40 74 50 NA 70 96 ... $ triceps : int 35 29 NA 23 35 NA 32 NA 45 NA ... $ insulin : int NA NA NA 94 168 NA 88 NA 543 NA ... $ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ... $ pedigree: num 0.627 0.351 0.672 0.167 2.288 ... $ age : int 50 31 32 21 33 30 26 29 53 54 ... $ diabetes: chr "pos" "neg" "pos" "neg" ... $ obesity : int 1 0 0 0 1 0 1 1 1 NA ... ``` ``` pregnant glucose pressure triceps insulin 1 6 148 72 35 NA 2 1 85 66 29 NA 3 8 183 64 NA NA 4 1 89 66 23 94 5 0 137 40 35 168 6 5 116 74 NA NA 7 3 78 50 32 88 8 10 115 NA NA NA 9 2 197 70 45 543 10 8 125 96 NA NA mass pedigree age diabetes obesity 1 33.6 0.627 50 pos 1 2 26.6 0.351 31 neg 0 3 23.3 0.672 32 pos 0 4 28.1 0.167 21 neg 0 5 43.1 2.288 33 pos 1 6 25.6 0.201 30 neg 0 7 31.0 0.248 26 pos 1 8 35.3 0.134 29 neg 1 9 30.5 0.158 53 pos 1 10 NA 0.232 54 pos NA ``` 實踐上常常會成Odds ratio (OR) estimate不容易解釋 ![](https://i.imgur.com/0XnVWpd.png) 把outcome當作factor或文字會影響到資料結果 glm ``` 'data.frame': 768 obs. of 10 variables: $ pregnant: int 6 1 8 1 0 5 3 10 2 8 ... $ glucose : int 148 85 183 89 137 116 78 115 197 125 ... $ pressure: int 72 66 64 66 40 74 50 NA 70 96 ... $ triceps : int 35 29 NA 23 35 NA 32 NA 45 NA ... $ insulin : int NA NA NA 94 168 NA 88 NA 543 NA ... $ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ... $ pedigree: num 0.627 0.351 0.672 0.167 2.288 ... $ age : int 50 31 32 21 33 30 26 29 53 54 ... $ diabetes: chr "pos" "neg" "pos" "neg" ... $ obesity : int 1 0 0 0 1 0 1 1 1 NA ... ``` 所以要加一行 ``` BinOutcome$diabetes <- factor(BinOutcome$diabetes) ``` ## Ordinal Outcome ```r #### Ordinal outcome OrdOutcome <- read.csv('C:/Users/c64/Documents/成大/修課/ADVANCED BIOSTATISTICS IN BIOMEDICINE AND RESEARCH DESIGN/1227/Datasets/OrdOutcome.csv', header=T) str(OrdOutcome) levels(OrdOutcome$apply) # "somewhat likely" "unlikely" "very likely?€? OrdOutcome$apply <- ordered(OrdOutcome$apply, levels = c("unlikely", "somewhat likely", "very likely")) library(MASS) fit4 <- polr(apply ~ pared + public + gpa, data = OrdOutcome, Hess=TRUE) summary(fit4) #### Confidence Interval for OR coeftable <- coef(summary(fit4)) pvalue <- 2 * pnorm(abs(coeftable[, "t value"]), lower.tail = FALSE) cbind(coeftable, "p_value" = pvalue) CI_OR <- exp(cbind(Est=coeftable[,"Value"], CI.L=coeftable[,"Value"] - 1.96 * coeftable[,"Std. Error"], CI.U=coeftable[,"Value"] + 1.96 * coeftable[,"Std. Error"])) round(CI_OR, 3) # or use R command CI_OR2 <- exp(cbind(OR=coef(fit4), confint(fit4))) round(CI_OR2 , 3) # Model diagnostics: goodness of fit [Hosmer-Lemeshow tests] library(generalhoslem) logitgof(OrdOutcome$apply, fitted(fit4), ord = TRUE) #logitgof(OrdOutcome$apply[-c(fit4$na.action)], fitted(fit4), ord = TRUE) if NA exists ``` 在此之前都是mean regression model,下面來看看不同的例子 ## Survival: Cox considered hazards instead of on mean 沒有intercept (= $\beta_0$) (已經合併在裡面) intercept是baseline表現 ### Case Study: lung cancer 資料修正 factor() lm和glm ## Multiple imputation for missing values missing values的危害 (p.29) 波浪符可代換