Bland–Altman plot 實作 === ###### tags: `python` `statistics` ![](https://hackmd.io/_uploads/Sk7OiHREc.gif) Bland–Altman plot 將兩組觀(預)測值間差異可視化,並估計觀(預)測值間差異的可信區間。 # 縱座標:觀(預)測值間差異 縱座標為觀(預)測值間差異 $\Delta o_j=o_j^1-o_j^2$,其中 $o_j^i$ 為第 $i$ 組的第 $j$ 筆觀(預)測值;假設觀(預)測值間差異為常態分佈: $$D\sim N(\mu_s,\hat{s}^2)$$ 其中 $\mu_s=\overline{\Delta o_j}$ 為差異的樣本平均,$\hat{s}^2=\frac{1}{J-1}\sum_{j=1}^J(\Delta o_j-\mu_s)^2$ 為差異的校正後樣本變異,則差異的(95%)信賴區間可以進一步被估計: $$\left[\mu_s-1.96\hat{s}, \mu_s+1.96\hat{s}\right]$$ # 橫座標:觀(預)測值平均 橫座標為觀(預)測值平均 $\overline{o_j}=\frac{1}{2}(o_j^1+o_j^2)$;不過差異的(95%)信賴區間的估計和觀(預)測值平均有何關聯? 沒有關聯。 可視化觀(預)測值平均的目的是直觀的檢視有無 proportional bias,如同 [funnel plot](https://en.wikipedia.org/wiki/Funnel_plot) 直觀的檢視有無 publicaiton bias 一般;以[範例研究](https://pubmed.ncbi.nlm.nih.gov/35300618/)(測量門牙到食道腫瘤的距離)為例,如果今天同時受測者包含小人族和巨人族,其觀(預)測值很可能不在同一個尺度上:巨人族測得的觀(預)測值可能是 1000公分,相差 10 公分算十分精準,而對觀(預)測值 5 公分的小人族來說,相差 10 公分卻是 200% 的差距。 這類由 proportional bias,也就是觀(預)測值尺度造成的偏差,可以呈現在以觀(預)測值平均為橫座標的作圖上:proportional bias [嚴重的話可以看到尖端朝左的漏斗狀分佈](https://upload.wikimedia.org/wikipedia/commons/9/9b/Bland-altman_plot_of_three_clinicians%27_ratings_of_burn_size%2C_using_two_different_methods.png)。 那有 proportional bias 的話怎麼辦?直覺的方法當然是取 logarithm。但實務上用途有限;同樣以範例研究為例,一個直觀的門牙到食道腫瘤的距離模型為: $$o_j=d_j^{\text{Lar}}+d_j^{\text{Eso}}*r_j^{\text{Tum}}$$ 其中 $d_j^{\text{Lar}}$ 為門牙到食道的距離,$d_j^{\text{Eso}}$ 為食道長度,$r_j^{\text{Tum}}$ 為腫瘤在食道的相對位置。即使 $d_j^{\text{Lar}}$ 正比 $d_j^{\text{Eso}}$,乘上 $r_j^{\text{Tum}}$ 的後者與前者的相加仍會破壞其 logarithm 後的線性關係。換句話說,logarithm 和類似的轉換的效益與否很依賴對原始分佈的領域知識。 # Implementation ## Observations ``` fig, ax = plt.subplots() fig.set_size_inches(9, 6) idx = np.arange(a_1.size) for i, j, k in zip(idx, a_1, esc): ax.plot([0, j], [i + 1-0.15]*2, color='green', alpha=0.5) ax.plot([0, k], [i + 1+0.15]*2, color='blue', alpha=0.5) xlm = np.array([np.min([a_1, esc]) * 0.9 // 1, np.max([a_1, esc]) * 1.1 // 1]) ax.scatter(a_1, idx+1, color='green', s=10, alpha=0.5) ax.scatter(esc, idx+1, color='blue', s=10, alpha=0.5) ax.set_xlim(xlm) ax.plot([],[], '.-', color='green', alpha=0.5, label='Reader A exam 1') ax.plot([],[], '.-', color='blue', alpha=0.5, label='Endoscopy') ax.set_ylim([0, a_1.size+1]) ax.set_xticks(np.arange(15,45,5)) ax.set_yticks(idx + 1) ax.set_ylabel("Tumor indices") ax.set_xlabel("Measurements (cm)") ax.legend(frameon=False) pass ``` ![](https://hackmd.io/_uploads/HJ5YuAA4q.png) ## Calculate differences ``` fig, ax = plt.subplots() fig.set_size_inches(9, 6) idx = np.arange(a_1.size) dis = a_1 - esc xlm = np.array([np.min([a_1, esc]) * 0.9 // 1, np.max([a_1, esc]) * 1.1 // 1]) a_m = (a_1+esc) / 2 for i, j, k in zip(idx, dis, esc): if j > 0: ax.plot( [k, j + k], [i + 1.]*2, color='green', alpha=0.5) else: ax.plot( [k, j + k], [i + 1.]*2, color='blue', alpha=0.5) ax.plot([a_m[i]]*2, [0, a_1.size+1],color='gray', alpha=0.5, linewidth=0.5) ax.scatter( (a_1), idx+1, color='green', s=10, alpha=0.5) ax.scatter( (esc), idx+1, color='blue', s=10, alpha=0.5) ax.scatter( (a_m), idx+1, color='orange', s=10, label='Mean') ax.set_xlim(xlm) ax.set_ylim([0, a_1.size+1]) ax.set_xticks(np.arange(15,45,5)) ax.set_yticks(idx + 1) ax.set_ylabel("Tumor indices") ax.set_xlabel("Measurements (cm)") ax.legend(frameon=False) pass ``` ![](https://hackmd.io/_uploads/BykyF0AE5.png) ## Sort by differences ``` fig, ax = plt.subplots() fig.set_size_inches(9, 6) idx = np.arange(a_1.size) dis = a_1 - esc # bsl = np.min([a_1, esc], axis=0) xlm = np.array([np.min([a_1, esc]) * 0.9 // 1, np.max([a_1, esc]) * 1.1 // 1]) a_m = (a_1+esc) / 2 for i, j, k in zip(idx, dis, esc): if j > 0: ax.plot( [k, j + k], [j]*2, color='green', alpha=0.5) else: ax.plot( [k, j + k], [j]*2, color='blue', alpha=0.5) ax.scatter( (a_1), dis, color='green', s=10, alpha=0.5) ax.scatter( (esc), dis, color='blue', s=10, alpha=0.5) ax.scatter( (a_m), dis, color='orange', s=10,) ax.plot(xlm, [0]*2, color='gray', alpha=0.5) ax.set_xlim(xlm) ylm = ax.get_ylim() for i, j, k in zip(idx, dis, esc): ax.plot([a_m[i]]*2, ylm,color='gray', alpha=0.5, linewidth=0.5) ax.set_ylim(ylm) ax.set_xticks(np.arange(15,45,5)) ax.set_ylabel("Difference (cm)") ax.set_xlabel("Measurements (cm)") ax.legend(frameon=False) pass ``` ![](https://hackmd.io/_uploads/SJkSF0AV5.png) ## Estimate the Gaussian ``` fig, ax = plt.subplots() fig.set_size_inches(9, 6) idx = np.arange(a_1.size) dis = a_1 - esc xlm = np.array([np.min([a_1, esc]) * 0.9 // 1, np.max([a_1, esc]) * 1.1 // 1]) a_m = (a_1+esc) / 2 for i, j, k in zip(idx, dis, esc): if j > 0: ax.plot( xlm, [j]*2, color='gray', alpha=0.5, linewidth=0.5) else: ax.plot( xlm, [j]*2, color='gray', alpha=0.5, linewidth=0.5) ax.scatter( (a_m), dis, color='orange', s=10,) ax.scatter(np.full_like(dis, 15), dis, color='red', s=10, alpha=0.5) mne_dis = dis.mean() std_dis = dis.std(ddof=1) ylm = ax.get_ylim() pts = np.linspace(*ylm, 1001) ax.plot((get_gau(pts, mne_dis, std_dis) * 50 + 15) , pts, color='red',alpha=0.5) ax.set_xlim(xlm) ax.set_ylim(ylm) ax.set_xticks(np.arange(15,45,5)) ax.set_ylabel("Difference (cm)") ax.set_xlabel("Mean (cm)") ax.legend(frameon=False) pass ``` ![](https://hackmd.io/_uploads/SkZiY00Nq.png) ## Estimate the CI ``` fig, ax = plt.subplots() fig.set_size_inches(9, 6) idx = np.arange(a_1.size) dis = a_1 - esc xlm = np.array([np.min([a_1, esc]) * 0.9 // 1, np.max([a_1, esc]) * 1.1 // 1]) a_m = (a_1+esc) / 2 ax.scatter( (a_m), dis, color='orange', s=10,) mne_dis = dis.mean() std_dis = dis.std(ddof=1) ylm = ax.get_ylim() pts = np.linspace(*ylm, 1001) low, hgh = mne_dis - np.abs(std_dis) * 1.96, mne_dis + np.abs(std_dis) * 1.96 ax.plot(xlm, [low]*2, color='gray', alpha=0.8, label=f"95% CI: {low.round(3)} to {hgh.round(3)}") ax.plot(xlm, [hgh]*2, color='gray', alpha=0.8, ) ax.plot(xlm, [mne_dis]*2, '--', color='gray', alpha=0.8, label=f"mean difference: {mne_dis.round(3)}") ax.plot(xlm, [0]*2, color='gray', alpha=0.5) ax.set_xlim(xlm) ax.set_xticks(np.arange(15,45,5)) ax.set_ylabel("Difference (cm)") ax.set_xlabel("Mean (cm)") ax.legend(frameon=False) pass ``` ![](https://hackmd.io/_uploads/Skt3F00Nc.png)