Bland–Altman plot 實作
===
###### tags: `python` `statistics`
![](https://hackmd.io/_uploads/Sk7OiHREc.gif)
Bland–Altman plot 將兩組觀(預)測值間差異可視化,並估計觀(預)測值間差異的可信區間。
# 縱座標:觀(預)測值間差異
縱座標為觀(預)測值間差異 $\Delta o_j=o_j^1-o_j^2$,其中 $o_j^i$ 為第 $i$ 組的第 $j$ 筆觀(預)測值;假設觀(預)測值間差異為常態分佈:
$$D\sim N(\mu_s,\hat{s}^2)$$
其中 $\mu_s=\overline{\Delta o_j}$ 為差異的樣本平均,$\hat{s}^2=\frac{1}{J-1}\sum_{j=1}^J(\Delta o_j-\mu_s)^2$ 為差異的校正後樣本變異,則差異的(95%)信賴區間可以進一步被估計:
$$\left[\mu_s-1.96\hat{s}, \mu_s+1.96\hat{s}\right]$$
# 橫座標:觀(預)測值平均
橫座標為觀(預)測值平均 $\overline{o_j}=\frac{1}{2}(o_j^1+o_j^2)$;不過差異的(95%)信賴區間的估計和觀(預)測值平均有何關聯?
沒有關聯。
可視化觀(預)測值平均的目的是直觀的檢視有無 proportional bias,如同 [funnel plot](https://en.wikipedia.org/wiki/Funnel_plot) 直觀的檢視有無 publicaiton bias 一般;以[範例研究](https://pubmed.ncbi.nlm.nih.gov/35300618/)(測量門牙到食道腫瘤的距離)為例,如果今天同時受測者包含小人族和巨人族,其觀(預)測值很可能不在同一個尺度上:巨人族測得的觀(預)測值可能是 1000公分,相差 10 公分算十分精準,而對觀(預)測值 5 公分的小人族來說,相差 10 公分卻是 200% 的差距。
這類由 proportional bias,也就是觀(預)測值尺度造成的偏差,可以呈現在以觀(預)測值平均為橫座標的作圖上:proportional bias [嚴重的話可以看到尖端朝左的漏斗狀分佈](https://upload.wikimedia.org/wikipedia/commons/9/9b/Bland-altman_plot_of_three_clinicians%27_ratings_of_burn_size%2C_using_two_different_methods.png)。
那有 proportional bias 的話怎麼辦?直覺的方法當然是取 logarithm。但實務上用途有限;同樣以範例研究為例,一個直觀的門牙到食道腫瘤的距離模型為:
$$o_j=d_j^{\text{Lar}}+d_j^{\text{Eso}}*r_j^{\text{Tum}}$$
其中 $d_j^{\text{Lar}}$ 為門牙到食道的距離,$d_j^{\text{Eso}}$ 為食道長度,$r_j^{\text{Tum}}$ 為腫瘤在食道的相對位置。即使 $d_j^{\text{Lar}}$ 正比 $d_j^{\text{Eso}}$,乘上 $r_j^{\text{Tum}}$ 的後者與前者的相加仍會破壞其 logarithm 後的線性關係。換句話說,logarithm 和類似的轉換的效益與否很依賴對原始分佈的領域知識。
# Implementation
## Observations
```
fig, ax = plt.subplots()
fig.set_size_inches(9, 6)
idx = np.arange(a_1.size)
for i, j, k in zip(idx, a_1, esc):
ax.plot([0, j], [i + 1-0.15]*2, color='green', alpha=0.5)
ax.plot([0, k], [i + 1+0.15]*2, color='blue', alpha=0.5)
xlm = np.array([np.min([a_1, esc]) * 0.9 // 1,
np.max([a_1, esc]) * 1.1 // 1])
ax.scatter(a_1, idx+1, color='green', s=10, alpha=0.5)
ax.scatter(esc, idx+1, color='blue', s=10, alpha=0.5)
ax.set_xlim(xlm)
ax.plot([],[], '.-', color='green', alpha=0.5, label='Reader A exam 1')
ax.plot([],[], '.-', color='blue', alpha=0.5, label='Endoscopy')
ax.set_ylim([0, a_1.size+1])
ax.set_xticks(np.arange(15,45,5))
ax.set_yticks(idx + 1)
ax.set_ylabel("Tumor indices")
ax.set_xlabel("Measurements (cm)")
ax.legend(frameon=False)
pass
```
![](https://hackmd.io/_uploads/HJ5YuAA4q.png)
## Calculate differences
```
fig, ax = plt.subplots()
fig.set_size_inches(9, 6)
idx = np.arange(a_1.size)
dis = a_1 - esc
xlm = np.array([np.min([a_1, esc]) * 0.9 // 1,
np.max([a_1, esc]) * 1.1 // 1])
a_m = (a_1+esc) / 2
for i, j, k in zip(idx, dis, esc):
if j > 0:
ax.plot( [k, j + k], [i + 1.]*2, color='green', alpha=0.5)
else:
ax.plot( [k, j + k], [i + 1.]*2, color='blue', alpha=0.5)
ax.plot([a_m[i]]*2, [0, a_1.size+1],color='gray', alpha=0.5, linewidth=0.5)
ax.scatter( (a_1), idx+1, color='green', s=10, alpha=0.5)
ax.scatter( (esc), idx+1, color='blue', s=10, alpha=0.5)
ax.scatter( (a_m), idx+1, color='orange', s=10, label='Mean')
ax.set_xlim(xlm)
ax.set_ylim([0, a_1.size+1])
ax.set_xticks(np.arange(15,45,5))
ax.set_yticks(idx + 1)
ax.set_ylabel("Tumor indices")
ax.set_xlabel("Measurements (cm)")
ax.legend(frameon=False)
pass
```
![](https://hackmd.io/_uploads/BykyF0AE5.png)
## Sort by differences
```
fig, ax = plt.subplots()
fig.set_size_inches(9, 6)
idx = np.arange(a_1.size)
dis = a_1 - esc
# bsl = np.min([a_1, esc], axis=0)
xlm = np.array([np.min([a_1, esc]) * 0.9 // 1,
np.max([a_1, esc]) * 1.1 // 1])
a_m = (a_1+esc) / 2
for i, j, k in zip(idx, dis, esc):
if j > 0:
ax.plot( [k, j + k], [j]*2, color='green', alpha=0.5)
else:
ax.plot( [k, j + k], [j]*2, color='blue', alpha=0.5)
ax.scatter( (a_1), dis, color='green', s=10, alpha=0.5)
ax.scatter( (esc), dis, color='blue', s=10, alpha=0.5)
ax.scatter( (a_m), dis, color='orange', s=10,)
ax.plot(xlm, [0]*2, color='gray', alpha=0.5)
ax.set_xlim(xlm)
ylm = ax.get_ylim()
for i, j, k in zip(idx, dis, esc):
ax.plot([a_m[i]]*2, ylm,color='gray', alpha=0.5, linewidth=0.5)
ax.set_ylim(ylm)
ax.set_xticks(np.arange(15,45,5))
ax.set_ylabel("Difference (cm)")
ax.set_xlabel("Measurements (cm)")
ax.legend(frameon=False)
pass
```
![](https://hackmd.io/_uploads/SJkSF0AV5.png)
## Estimate the Gaussian
```
fig, ax = plt.subplots()
fig.set_size_inches(9, 6)
idx = np.arange(a_1.size)
dis = a_1 - esc
xlm = np.array([np.min([a_1, esc]) * 0.9 // 1,
np.max([a_1, esc]) * 1.1 // 1])
a_m = (a_1+esc) / 2
for i, j, k in zip(idx, dis, esc):
if j > 0:
ax.plot( xlm, [j]*2, color='gray', alpha=0.5, linewidth=0.5)
else:
ax.plot( xlm, [j]*2, color='gray', alpha=0.5, linewidth=0.5)
ax.scatter( (a_m), dis, color='orange', s=10,)
ax.scatter(np.full_like(dis, 15), dis, color='red', s=10, alpha=0.5)
mne_dis = dis.mean()
std_dis = dis.std(ddof=1)
ylm = ax.get_ylim()
pts = np.linspace(*ylm, 1001)
ax.plot((get_gau(pts, mne_dis, std_dis) * 50 + 15) , pts, color='red',alpha=0.5)
ax.set_xlim(xlm)
ax.set_ylim(ylm)
ax.set_xticks(np.arange(15,45,5))
ax.set_ylabel("Difference (cm)")
ax.set_xlabel("Mean (cm)")
ax.legend(frameon=False)
pass
```
![](https://hackmd.io/_uploads/SkZiY00Nq.png)
## Estimate the CI
```
fig, ax = plt.subplots()
fig.set_size_inches(9, 6)
idx = np.arange(a_1.size)
dis = a_1 - esc
xlm = np.array([np.min([a_1, esc]) * 0.9 // 1,
np.max([a_1, esc]) * 1.1 // 1])
a_m = (a_1+esc) / 2
ax.scatter( (a_m), dis, color='orange', s=10,)
mne_dis = dis.mean()
std_dis = dis.std(ddof=1)
ylm = ax.get_ylim()
pts = np.linspace(*ylm, 1001)
low, hgh = mne_dis - np.abs(std_dis) * 1.96, mne_dis + np.abs(std_dis) * 1.96
ax.plot(xlm, [low]*2, color='gray', alpha=0.8, label=f"95% CI: {low.round(3)} to {hgh.round(3)}")
ax.plot(xlm, [hgh]*2, color='gray', alpha=0.8, )
ax.plot(xlm, [mne_dis]*2, '--', color='gray', alpha=0.8, label=f"mean difference: {mne_dis.round(3)}")
ax.plot(xlm, [0]*2, color='gray', alpha=0.5)
ax.set_xlim(xlm)
ax.set_xticks(np.arange(15,45,5))
ax.set_ylabel("Difference (cm)")
ax.set_xlabel("Mean (cm)")
ax.legend(frameon=False)
pass
```
![](https://hackmd.io/_uploads/Skt3F00Nc.png)