# NIS Table Feature Selection
###### tags: `國泰專案`

## Introduction
A distinctive property of the NIS table is that each patient has multiple records per feature, which makes it harder to analyse than data with one record per patient. Extracting the important information from the NIS table is therefore the key problem. Drawing on our experience from the EWS project, we developed an effective feature selection method.

## Definitions
Population: the union of all patients with suspected sepsis and all patients whose discharge diagnosis is sepsis.

## Procedure
1. **Step 1**: Choose suitable time windows. A window can be chosen from clinicians' experience or selected automatically with machine learning methods. The ERSTAY table in the Remarks gives the distribution of emergency room (ER) length of stay, in hours, for patients in the **population**. Most patients clearly stay 0-5 hours (a cumulative share of about 81%).
    > Recommendation: patients who stay 0-14 hours in the ER account for roughly 93% of cases (83% if NA values are included), so we build models separately on three time windows, 0-4, 0-9, and 0-14 hours, and then compare which of the three gives the best modelling performance.
2. **Step 2**: Within each time window chosen in Step 1, compute the statistics below, then keep the ones with the highest feature importance as model inputs.
    + Mean: the average of the patient's ER measurements
    + First: the patient's earliest ER measurement
    + Last: the patient's latest ER measurement
    + Max: the largest of the patient's ER measurements
    + Min: the smallest of the patient's ER measurements
3. **Step 3**: Add trend features. NIS features are time series, so a trend can be computed for each feature as the slope of the regression line fitted to its time series. The figure below shows the spo2 trajectory of the patient with ACCOUNTNO I10700000041: the x-axis is time, the y-axis is the spo2 value, the red points are the measurements, and the blue line is the regression line. The fitted slope is -0.00367368, which indicates a slow downward trend in spo2.

    ![](https://i.imgur.com/PbRK2lu.png =250x150)

## Remarks
1. **ERSTAY table** (records where ERSTAY is NA are excluded)

    | ERSTAY (hours) | Count | Cumulative count | Share | Cumulative share |
    | --------------:| -----:| --------:| -------:| --------:|
    | 0 | 4,993 | 4,993 | 21.488% | 21.488% |
    | 1 | 6,005 | 10,998 | 25.844% | 47.332% |
    | 2 | 3,871 | 14,869 | 16.659% | 63.991% |
    | 3 | 2,088 | 16,957 | 8.986% | 72.977% |
    | 4 | 1,147 | 18,104 | 4.936% | 77.914% |
    | 5 | 764 | 18,868 | 3.288% | 81.202% |
    | 6 | 573 | 19,441 | 2.466% | 83.668% |
    | 7 | 411 | 19,852 | 1.769% | 85.436% |
    | 8 | 350 | 20,202 | 1.506% | 86.943% |
    | 9 | 334 | 20,536 | 1.437% | 88.380% |
    | 10 | 328 | 20,864 | 1.412% | 89.792% |
    | 11 | 261 | 21,125 | 1.123% | 90.915% |
    | 12 | 269 | 21,394 | 1.158% | 92.073% |
    | 13 | 221 | 21,615 | 0.951% | 93.024% |
    | 14 | 200 | 21,815 | 0.861% | 93.884% |
    | 15 | 187 | 22,002 | 0.805% | 94.689% |
    | 16 | 171 | 22,173 | 0.736% | 95.425% |
    | 17 | 147 | 22,320 | 0.633% | 96.058% |
    | 18 | 136 | 22,456 | 0.585% | 96.643% |
    | 19 | 104 | 22,560 | 0.448% | 97.091% |
    | 20 | 78 | 22,638 | 0.336% | 97.426% |
    | 21 | 94 | 22,732 | 0.405% | 97.831% |
    | 22 | 94 | 22,826 | 0.405% | 98.235% |
    | 23 | 74 | 22,900 | 0.318% | 98.554% |
    | 24 | 49 | 22,949 | 0.211% | 98.765% |
    | 25 | 46 | 22,995 | 0.198% | 98.963% |
    | 26 | 39 | 23,034 | 0.168% | 99.131% |
    | 27 | 30 | 23,064 | 0.129% | 99.260% |
    | 28 | 27 | 23,091 | 0.116% | 99.376% |
    | 29 | 20 | 23,111 | 0.086% | 99.462% |
    | 30 | 9 | 23,120 | 0.039% | 99.501% |
    | 31 | 5 | 23,125 | 0.022% | 99.522% |
    | 32 | 7 | 23,132 | 0.030% | 99.552% |
    | 33 | 9 | 23,141 | 0.039% | 99.591% |
    | 34 | 8 | 23,149 | 0.034% | 99.626% |
    | 35 | 6 | 23,155 | 0.026% | 99.651% |
    | 36 | 7 | 23,162 | 0.030% | 99.682% |
    | 37 | 5 | 23,167 | 0.022% | 99.703% |
    | 38 | 5 | 23,172 | 0.022% | 99.725% |
    | 39 | 5 | 23,177 | 0.022% | 99.746% |
    | 40 | 4 | 23,181 | 0.017% | 99.763% |
    | 41 | 5 | 23,186 | 0.022% | 99.785% |
    | 42 | 9 | 23,195 | 0.039% | 99.824% |
    | 43 | 1 | 23,196 | 0.004% | 99.828% |
    | 44 | 5 | 23,201 | 0.022% | 99.849% |
    | 45 | 1 | 23,202 | 0.004% | 99.854% |
    | 46 | 4 | 23,206 | 0.017% | 99.871% |
    | 47 | 3 | 23,209 | 0.013% | 99.884% |
    | 48 | 1 | 23,210 | 0.004% | 99.888% |
    | 49 | 2 | 23,212 | 0.009% | 99.897% |
    | 50 | 3 | 23,215 | 0.013% | 99.910% |
    | 52 | 2 | 23,217 | 0.009% | 99.918% |
    | 53 | 3 | 23,220 | 0.013% | 99.931% |
    | 54 | 1 | 23,221 | 0.004% | 99.935% |
    | 57 | 1 | 23,222 | 0.004% | 99.940% |
    | 64 | 1 | 23,223 | 0.004% | 99.944% |
    | 65 | 1 | 23,224 | 0.004% | 99.948% |
    | 66 | 2 | 23,226 | 0.009% | 99.957% |
    | 72 | 1 | 23,227 | 0.004% | 99.961% |
    | 74 | 1 | 23,228 | 0.004% | 99.966% |
    | 76 | 2 | 23,230 | 0.009% | 99.974% |
    | 88 | 1 | 23,231 | 0.004% | 99.978% |
    | 100 | 1 | 23,232 | 0.004% | 99.983% |
    | 117 | 1 | 23,233 | 0.004% | 99.987% |
    | 332 | 1 | 23,234 | 0.004% | 99.991% |
    | 358 | 1 | 23,235 | 0.004% | 99.996% |
    | 904 | 1 | 23,236 | 0.004% | 100.000% |
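
The per-window statistics and the regression-slope trend described in the procedure can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline. It assumes each measurement is a hypothetical `(ACCOUNTNO, hours since ER arrival, value)` tuple, that a window such as 0-4 is the closed interval [0, 4] hours, and that the trend is an ordinary least-squares slope. The function names and the sample readings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def ols_slope(times, values):
    """Slope of the least-squares line through (time, value) points.

    Returns None when the slope is undefined (fewer than two points,
    or all measurements taken at the same time).
    """
    if len(times) < 2:
        return None
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        return None
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx


def window_features(records, window_hours):
    """Per-patient mean/first/last/max/min/slope within [0, window_hours].

    `records` is an iterable of (account, hour, value) tuples; this layout
    is an assumption, not the real NIS schema.
    """
    by_patient = defaultdict(list)
    for account, hour, value in records:
        if 0 <= hour <= window_hours:  # assumed closed window
            by_patient[account].append((hour, value))

    features = {}
    for account, points in by_patient.items():
        points.sort()  # chronological order, so first/last are well defined
        times = [t for t, _ in points]
        values = [v for _, v in points]
        features[account] = {
            "mean": mean(values),
            "first": values[0],
            "last": values[-1],
            "max": max(values),
            "min": min(values),
            "slope": ols_slope(times, values),
        }
    return features


# Made-up spo2 readings for two illustrative patients (not real data).
records = [
    ("A", 0.0, 98.0), ("A", 1.0, 97.0), ("A", 2.0, 96.0), ("A", 6.0, 90.0),
    ("B", 0.5, 95.0),
]
for window in (4, 9, 14):
    print(window, window_features(records, window))
```

Running the loop over the three candidate windows gives one feature set per window, which could then be fed to separate models and compared. A patient with a single measurement in the window gets `None` for the slope, so a downstream model would need some policy for that missing value.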