# Adv_Biostat (2020.12.27)
###### tags: `BioStat`
- Sample Size Determination
- 用幾隻老鼠、收幾個病人、幾個學校才合理
- 以統計學術語稱為Power Analysis
- 多少機會可以發表在top journal if there was really a story
- 要多次評估
- 最糟的情況,經費會不會不夠?
- 最好的情況
- Randomization
- Balance
- Study Monitor (GO or NO GO)
- MCMC: Gibbs Sampler
---
# Sample Size Determination
WHEN to determine: trial/study planning
NO post hoc power
power越大越好
## Types of Errors in Hypothesis Testing
$\alpha$: reject Nonhypothesis when nonhypothesis is true 新藥沒有比現有的藥好,卻說這是個好藥
將糟糕的藥引入市場
隨著試驗的進行,error應該逐漸降低
FDA regulation: Phase II (10-20%)
Phase III (1-5%)
$\beta$: do no reject Nonhypothesis when nonhypothesis is not true
是一個好藥
錯失很好的藥
FDA regulation: no matter Phase II or III (10-20%)
power=1-$\beta$
明顯地,Type I error較危險
> p-value應該被取代了
> CI,看upp bound和lower bound
variation
新藥比老藥進步多少(animal model to human, literary research: ideas from published paper, early phase (pilot) study)
抗高血壓藥
都只是參考
真正決定significant level時,
若現在看到的improvement成真,我的colleages有多希望把它現有的處方簽改成我的新藥
significant level多大?大到
不是數學deal with的問題
例如:相比老藥,新藥顯著減少1 mmHg的血壓
如果power要越高,樣本數就要越大。
alpha越小,樣本數越大
variation越大,樣本數就要越大 (選outcome variable時,選variation較小的,才能節省成本)
病人表現表現差異大,越不consistent
variance小,越能精準預測藥效
clinically siganificant level越大,樣本數就可以越小
clinically siganificant是最challenge的部分,需要一定的時間
clinical significant level 跟effect size是同樣的東西
> :warning: N越小的情況下:
> standard error越大,而不是standard deviation
> SE一定小於SD
## Sample Size
Fn.
在observational study中,不做power analysis,而是用
- Reference
- Scientific Rigor in the Age of COVID-19
- 重concept、較general
- The design and analysis of non-randomized studies: A case study of off-label use of hydroxychloroquine in the COVID-19 pandemic
- technical、較detailed
## Preparing to calculate sample size
1. What is the main **purpose** of the trial? (This is the question on which sample size is based.)
superiority/noninferiority/equivalence
2. What is the principal measure of patient outcome (endpoint)? Is this measure continuous or discrete? Censoring?
- continuous (在任兩數中間可再insert任意數,依然make sense。例如年齡)
- discrete (insert)
- Censoring (例如time-to-event)
- 跟missing data不同
Censor定義的兩種情況
- end of study,要開始分析資料、寫文章,event沒發生在subject(老鼠沒生病、病人沒感染或住院)
- lost follow (沒意願繼續參加、搬遷、失聯)
4. What statistical test will be used to assess treatment difference (e.g.,t-test, log-rank, chi-square)? With what α-level? One-tailed or two-tailed?
FDA規定所有實驗two-tailed test (除了noninferiority是one-tailed test)
較保守,所以約定俗成
By convention two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.
6. What result is anticipated with the standard treatment (e.g., average value or rate)?
5. How **small** a treatment difference is it important to detect ($\delta$), and with what degree of certainty (power = $1−\beta$ )?
$\delta$=clinically significant level
## power curve
願不願意做實驗的評估
| |  |
| -------- | -------- |
| Text | $\delta$越小,所需樣本數越大 |
---
### Dichotomous response variables
Compare drug A (standard) vs. drug B (new).
病人的outcome是成或敗,是escriptive variable
$\rightarrow$ chi-square test
proportion: 變成continuous variable
The total sample size required (N in each group) is:

分母是
因為A是standard,可以從文獻得知,pilot study知道B的window。
若新藥進步幅度很大,所需的N就很小。
分子
$$

example:
設40%失敗率
新藥可將40->30%
pB=0.3 (pilot study也證實)
---
### Use PS to draw alpha, beta curve
# Randomization
# Study Monitor: GO or NO GO
健保資料庫: All of Us
crossover: dependent, matched
two proportion: 兩個失敗率
m: ratio between treatment and control


476*2=952
**Correction:**
**Underlying** distribution is **binomial (discrete)**, which we approximate with a normal distribution (continuous).


| Column 1 | Column 2 | Column 3 |
| -------- | -------- | -------- |
| 0.05 | | 140 |
| 0.10 | | 199 |
| 0.20 | | 293 |
| 0.30 | | 356 |
| 0.40 | | 387 |
| 0.50 | | 391 |
| 0.55 | | 387 |
| 0.60 | | 356 |
| 0.70 | Text | 293 |
| 0.80 | Text | 152 |
| 0.85 | Text | 152 |
同樣improve 10%,卻有variation
且呈現對稱
成功(或失敗)率極高的情況下,遇到一個病人是成功或失敗很好猜。
primary endpoint無論定成功或失敗,都用一樣的sample size
.5做中線,往上下推10%
CI決定樣本數
CI越窄,評估越precise,N越大
### 考慮Dropouts和Drop-ins
- 病人拒絕服用新藥。
仍然要給他吃標準的藥
經常發生
beta-blocker標準,compare A ACE inhibitor
通常不太有drop-ins
treatment outcome dilution
$N_{adjusted}=N\frac{1}{(1-R)^2}$
FDA有標準intention to treat的分析資料方式
modified intention to treat更好 (病人至少吃過一次藥)
FDA除了要求report ITT (intention to treat),也要report PP(per-protocol)
$N_{adjusted}=N\frac{1}{(1-R_1-R_2)^2}$
實驗前後就應推估,如果推估不夠嚴謹。降低power,p-value變大
如何估計?
若無法well-tolerated,就有較好的compliance rate
5%較理想,10%平常
請問老師,chi square有自己的probability table,那我們為什麼要用normal distribution的probability table數字來估計sample size之後,才再校正呢? (成大醫院神經部 林伯昱)
承這個問題,請問dropout不算是protocol violation嗎?dropout rate太高會不會影響對這個研究的可信度?(公衛所博一 鄭維鈞)
dropout不算是protocol violation
noninferiority study中,comply rate就很重要了
dropout只限於pros
如果使用資料庫分析,就是retrospective study,用CI的concept得知precision和reproducibility
---


|  |  |
| -------- | -------- |
| Text | Text |

delta 非常sensitive
所以需要sensitivity analysis
經費已知,需要多少病人才能達到目標delta
---
**Exercise:**

| $\delta$ | $\sigma$ | N | $\delta$ | $\sigma$ | N |
| -------- | -------- | --- | --- | -------- | -------- |
| Text | Text | | | | Text |
| Column 1 | Column 2 | | | | Column 3 |
| -------- | -------- | | | | -------- |
| Text | Text | | | | Text |
| Text | Text | | | | Text |
| Text | Text | | | | Text |
| Text | Text | | | | Text |
在continuous variable中,effector size表示detect多少SD。
所以儘可能挑continuous variable
且儘可能挑variance小 (較consistent),所需N較小
8隻老鼠

改alpha/2
Sample size for change-from-baseline response variables
---
## Time-to-event data

做survival時,多用median,而不用mean
因為有outlier
surgeon,病人離開ICU
一旦出院可以活得很長
平均survival time是4年,但是
50%可以存活至少5年以上
三個公式
| $\phi(\lambda)=\frac{\lambda^2}{}$ |  | |
| ------------------------------------------------- | -------- | -------- |
| All patients enter at the beginning of the study. | recruited continually during this study period | accrual occurs over a fixed time period, T0, followed
by a fixed interval of follow-up, T |
| 主要用於animal model | | 最常見 |
power跟N沒有太直接關係。做survival時有censoring。
number of event真正影響power(多少病人真的生病、需要IV、recurrent)
所以,所需樣本數更大 (第一個才有更多機會觀察event)
---
PS experiment
第一天,每個病人都進來
0,5,1,127 (單位:年)
5,0,1,208
2,3,1,192
以指數函數作為base function,只是根據follow-up、median等參數做approximation,所以沒有考慮censoring。
m: Ratio of control to experimental patients.

> 對照組1:1一定比1:2所需要的sample size少。
BUT 樣本數少不代表試驗一定會順利。
例如:實務操作上,arm ratio的是影響recruitment
| m1 | | N | | Column 2 | N |
| --- | --- | --- | -------- | -------- | -------- |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | Text | Text | Text |

Survival Analysis需要的樣本數最多,不容易做 (除非效果很顯著)
Binary次之
Continuous最少
prospective study 用 power analysis,老師的文章也提到 retrospective study 用 precision analysis, 那可以類推到 prospective cohort study 用 power analysis,case-control study 或cross-sectional, correlational study 是用 precision study嗎?謝謝老師。
---
肺癌

預估recruit兩年(104週)
follow-up變短,需要的N變大 (因為event少)
實務上:藉由拉長時間來降低N
PS的時間要統一
---
## Randomisation (只存在於prospective study)
亂數表無法guarantee治療的balancing
### permuted block
保證end of block的A、B各佔一半

block size
實際上大家都會猜
如何避免大家猜測?
副作用?
Block size can be **varied** over time, even randomly.
2、4和6也randomly assign
也可在R中直接抽樣,可是要確定分配比例
### biased coin
1989在Michigan做第一份工作(RA),
拿到一個鞋盒
需要100份random assignment (用biased coin)
除了統計學家,只有藥劑師真的知道名單
拿著biased coin分組的病人並不算是have "the same chance" of begin assigned to either intervntion or control?
例如open label的狀況,其實知道目前已收了24個治療20個control病人,那就會知道我手上現在要入trial的病人比較容易入control?
---
# Data Analysis: Regression
- Descriptive statistics
- Regression analysis
- Multiple imputation for missing values
- Variable selection: Elastic net/LASSO
補足missing point
---
```
#### Data loading
SimData2 <- read.csv('C:/Users/c64/Documents/成大/修課/ADVANCED BIOSTATISTICS IN BIOMEDICINE AND RESEARCH DESIGN/1227/Datasets/SimData2.csv', header=T)
str(SimData2)
head(SimData2, 10)
```
先看有多少個variable
```r
'data.frame': 1000 obs. of 5 variables:
$ y : num 5.89 5.31 1.13 11.63 5.46 ...
$ x1: chr "a" "b" "a" "b" ...
$ x2: num 0.945 4.682 1.753 1.843 -2.755 ...
$ x3: num 0.2219 0.0839 0.7957 -2.066 0.159 ...
$ x4: int 1 1 2 3 3 3 3 2 3 3 ...
```
這個variable missing的狀況如何?
```
Descriptive Statistics (N=1000)
+------+--------------------------------------+
| | |
| |(N=1000) |
+------+--------------------------------------+
|y | 1.335610/ 4.962864/10.423755 |
+------+--------------------------------------+
|x1 : b| 0.48 (476) |
+------+--------------------------------------+
|x2 | 0.3705265/1.8947198/3.4206255 |
+------+--------------------------------------+
|x3 |-0.680790652/ 0.004947291/ 0.718200553|
+------+--------------------------------------+
|x4 : 1| 0.32 (325) |
+------+--------------------------------------+
| 2 | 0.33 (331) |
+------+--------------------------------------+
| 3 | 0.34 (344) |
+------+--------------------------------------+
```
自動判別是category
```r
#### Descriptive Statistics: Hmisc::summaryM
library(Hmisc)
#output 1
summaryM(y + x1 + x2 + x3 + x4 ~ 1, data = SimData2, test = F, overall = F, na.include=T)
#output 2
summaryM(y + x2 + x3 + x4 ~ x1, data = SimData2, test = F, overall = F, na.include=T)
```
設a是舊藥,b是舊藥,試問a和b之間是否存在明顯的不平衡?
```
Descriptive Statistics (N=1000)
+------+-----------------------------------+-----------------------------------+
| |a |b |
| |(N=524) |(N=476) |
+------+-----------------------------------+-----------------------------------+
|y | 0.3101109/ 3.2154996/ 9.4013758 | 2.9911276/ 6.1313437/11.4271228 |
+------+-----------------------------------+-----------------------------------+
|x2 | -0.4032653/ 0.9111485/ 2.3550091 | 1.5850898/ 2.9520165/ 4.2215834 |
+------+-----------------------------------+-----------------------------------+
|x3 |-0.65960105/ 0.05697835/ 0.73958568|-0.68771211/-0.04715820/ 0.68992131|
+------+-----------------------------------+-----------------------------------+
|x4 : 1| 0.29 (154) | 0.36 (171) |
+------+-----------------------------------+-----------------------------------+
| 2 | 0.35 (183) | 0.31 (148) |
+------+-----------------------------------+-----------------------------------+
| 3 | 0.36 (187) | 0.33 (157) |
+------+-----------------------------------+-----------------------------------+
```
---
## Linear regression analysis
畫出散佈圖,以線逼近,residual最小時得到最好的function
| Column 1 | Column 2 |
| -------- | -------- |
| Text | Text |
hyper-plane 圖形上畫不出來
Model fitting:
```
'data.frame': 1000 obs. of 4 variables:
$ y : num 3.303 4.727 0.973 0.776 -4.945 ...
$ x1: chr "a" "b" "a" "b" ...
$ x2: num 0.945 4.682 1.753 1.843 -2.755 ...
$ x3: num 0.2219 0.0839 0.7957 -2.066 0.159
```
```
y x1 x2 x3
1 3.3034375 a 0.9451998 0.22192251
2 4.7267805 b 4.6815861 0.08394058
3 0.9733954 a 1.7530319 0.79568740
4 0.7763351 b 1.8432958 -2.06603159
5 -4.9449693 a -2.7552663 0.15904736
6 -1.3593703 a -0.3308874 0.78268076
7 3.0481263 b 2.8547990 -0.64282890
8 4.5344029 a 2.9982046 -1.13656199
9 8.5468337 b 6.0937402 -1.01275800
10 1.1499367 a -1.0332654 0.67824236
```
```
Call:
lm(formula = y ~ x1 + x2 + x3, data = SimData1)
Residuals:
Min 1Q Median 3Q Max
-6.2936 -1.3303 -0.0203 1.3184 6.0389
Coefficients:
Estimate Std. Error t value
(Intercept) -0.14262 0.08968 -1.590
x1b 1.18597 0.13651 8.688
x2 0.99130 0.02977 33.302
x3 0.17449 0.06135 2.844
Pr(>|t|)
(Intercept) 0.11208
x1b < 2e-16 ***
x2 < 2e-16 ***
x3 0.00455 **
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.942 on 996 degrees of freedom
Multiple R-squared: 0.6398, Adjusted R-squared: 0.6387
F-statistic: 589.6 on 3 and 996 DF, p-value: < 2.2e-16
```
| 越均勻散佈、越沒有pattern越好 | 越接近45°越好 |
| ------------------------------------ | -------- |
|  |  |
|  |  |
---
- Continuous outcome: 45.2, 44.1, 60.2, …
- lm(y ~ x1 + x2 +…)
- Discrete outcome:
- <font color="#f00">Dichotomy (binary)</font> outcome: No, Yes
- Count outcome: 0, 1, 2, 3, …
在medical journal較少 (例如:車子流量)
- <font color="#f00">Ordinal</font> outcome:
- 1 < 2 < 3
- …
- <font color="#f00">Survival outcome</font>:
Time to event
Censoring
Generalized linear models
g: link function
---
```
'data.frame': 768 obs. of 10 variables:
$ pregnant: int 6 1 8 1 0 5 3 10 2 8 ...
$ glucose : int 148 85 183 89 137 116 78 115 197 125 ...
$ pressure: int 72 66 64 66 40 74 50 NA 70 96 ...
$ triceps : int 35 29 NA 23 35 NA 32 NA 45 NA ...
$ insulin : int NA NA NA 94 168 NA 88 NA 543 NA ...
$ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ...
$ pedigree: num 0.627 0.351 0.672 0.167 2.288 ...
$ age : int 50 31 32 21 33 30 26 29 53 54 ...
$ diabetes: chr "pos" "neg" "pos" "neg" ...
$ obesity : int 1 0 0 0 1 0 1 1 1 NA ...
```
```
pregnant glucose pressure triceps insulin
1 6 148 72 35 NA
2 1 85 66 29 NA
3 8 183 64 NA NA
4 1 89 66 23 94
5 0 137 40 35 168
6 5 116 74 NA NA
7 3 78 50 32 88
8 10 115 NA NA NA
9 2 197 70 45 543
10 8 125 96 NA NA
mass pedigree age diabetes obesity
1 33.6 0.627 50 pos 1
2 26.6 0.351 31 neg 0
3 23.3 0.672 32 pos 0
4 28.1 0.167 21 neg 0
5 43.1 2.288 33 pos 1
6 25.6 0.201 30 neg 0
7 31.0 0.248 26 pos 1
8 35.3 0.134 29 neg 1
9 30.5 0.158 53 pos 1
10 NA 0.232 54 pos NA
```
實踐上常常會成Odds ratio (OR)
estimate不容易解釋

把outcome當作factor或文字會影響到資料結果
glm
```
'data.frame': 768 obs. of 10 variables:
$ pregnant: int 6 1 8 1 0 5 3 10 2 8 ...
$ glucose : int 148 85 183 89 137 116 78 115 197 125 ...
$ pressure: int 72 66 64 66 40 74 50 NA 70 96 ...
$ triceps : int 35 29 NA 23 35 NA 32 NA 45 NA ...
$ insulin : int NA NA NA 94 168 NA 88 NA 543 NA ...
$ mass : num 33.6 26.6 23.3 28.1 43.1 25.6 31 35.3 30.5 NA ...
$ pedigree: num 0.627 0.351 0.672 0.167 2.288 ...
$ age : int 50 31 32 21 33 30 26 29 53 54 ...
$ diabetes: chr "pos" "neg" "pos" "neg" ...
$ obesity : int 1 0 0 0 1 0 1 1 1 NA ...
```
所以要加一行
```
BinOutcome$diabetes <- factor(BinOutcome$diabetes)
```
## Ordinal Outcome
```r
#### Ordinal outcome
OrdOutcome <- read.csv('C:/Users/c64/Documents/成大/修課/ADVANCED BIOSTATISTICS IN BIOMEDICINE AND RESEARCH DESIGN/1227/Datasets/OrdOutcome.csv', header=T)
str(OrdOutcome)
levels(OrdOutcome$apply) # "somewhat likely" "unlikely" "very likely??
OrdOutcome$apply <- ordered(OrdOutcome$apply, levels = c("unlikely", "somewhat likely", "very likely"))
library(MASS)
fit4 <- polr(apply ~ pared + public + gpa, data = OrdOutcome, Hess=TRUE)
summary(fit4)
#### Confidence Interval for OR
coeftable <- coef(summary(fit4))
pvalue <- 2 * pnorm(abs(coeftable[, "t value"]), lower.tail = FALSE)
cbind(coeftable, "p_value" = pvalue)
CI_OR <- exp(cbind(Est=coeftable[,"Value"],
CI.L=coeftable[,"Value"] - 1.96 * coeftable[,"Std. Error"],
CI.U=coeftable[,"Value"] + 1.96 * coeftable[,"Std. Error"]))
round(CI_OR, 3)
# or use R command
CI_OR2 <- exp(cbind(OR=coef(fit4), confint(fit4)))
round(CI_OR2 , 3)
# Model diagnostics: goodness of fit [Hosmer-Lemeshow tests]
library(generalhoslem)
logitgof(OrdOutcome$apply, fitted(fit4), ord = TRUE)
#logitgof(OrdOutcome$apply[-c(fit4$na.action)], fitted(fit4), ord = TRUE) if NA exists
```
在此之前都是mean regression model,下面來看看不同的例子
## Survival: Cox
considered hazards instead of on mean
沒有intercept (= $\beta_0$) (已經合併在裡面)
intercept是baseline表現
### Case Study: lung cancer
資料修正
factor()
lm和glm
## Multiple imputation for missing values
missing values的危害
(p.29)
波浪符可代換