# NIS Table Feature Selection
###### tags: `國泰專案`
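## Introduction
In the NIS table, each patient can have multiple records for the same feature. This makes the table harder to analyze than data with one record per patient, so the key problem is how to extract its important information. Drawing on experience from the EWS project, we developed an effective feature selection method.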
## Definitions
Base population: the union of all suspected-sepsis patients and all patients with a discharge diagnosis of sepsis.
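A minimal sketch of the union, assuming each group is available as a set of patient identifiers (the IDs and the function name `base_population` are made up for illustration). A patient who belongs to both groups is counted once.

```python
def base_population(suspected_sepsis: set[str], discharge_dx_sepsis: set[str]) -> set[str]:
    """Union of suspected-sepsis patients and patients discharged with a sepsis diagnosis."""
    return suspected_sepsis | discharge_dx_sepsis

# Hypothetical patient identifiers, for illustration only.
suspected = {"A001", "A002", "A003"}
discharged_with_sepsis = {"A002", "A004"}
print(sorted(base_population(suspected, discharged_with_sepsis)))
# ['A001', 'A002', 'A003', 'A004']
```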
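## Procedure
1. **Step 1**: Choose an appropriate time window. The window can be chosen from physicians' experience or selected automatically with machine learning methods. The ERSTAY statistics table in the Notes shows the distribution of hours that patients in the **base population** stayed in the emergency department (ED). Most patients clearly stayed 0-5 hours (81.2% cumulatively).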
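> Recommendation: patients with an ED stay of 0-14 hours make up about 93.9% of the records (about 83% if NA values are included). We therefore build separate models with three time windows, 0-4, 0-9, and 0-14 hours (covering 77.9%, 88.4%, and 93.9% of the non-NA records), and compare which window gives the best model.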
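The coverage of each candidate window can be read off the cumulative counts. The sketch below recomputes it from the per-hour counts in the ERSTAY statistics table (NA excluded). The function name `window_coverage` is illustrative, and "0-N" is taken to include hour N, which matches the table's cumulative column.

```python
# Per-hour ERSTAY counts for hours 0-14, copied from the ERSTAY statistics
# table in the Notes (records with ERSTAY = NA excluded).
ERSTAY_COUNTS = {
    0: 4993, 1: 6005, 2: 3871, 3: 2088, 4: 1147,
    5: 764, 6: 573, 7: 411, 8: 350, 9: 334,
    10: 328, 11: 261, 12: 269, 13: 221, 14: 200,
}
TOTAL_NON_NA = 23236  # total non-NA records, from the table's last row


def window_coverage(upper_hour: int) -> float:
    """Share of non-NA records with ERSTAY in 0..upper_hour (inclusive)."""
    covered = sum(n for hour, n in ERSTAY_COUNTS.items() if hour <= upper_hour)
    return covered / TOTAL_NON_NA


for upper in (4, 9, 14):
    print(f"0-{upper} h: {window_coverage(upper):.3%}")
# 0-4 h: 77.914%
# 0-9 h: 88.380%
# 0-14 h: 93.884%
```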
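2. **Step 2**: Within the time window chosen in Step 1, compute the following statistics, then select the statistics with the highest feature importance for modeling.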
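+ Mean: the average of the patient's ED measurements
+ First: the patient's earliest ED measurement
+ Last: the patient's latest ED measurement
+ Max: the patient's maximum ED measurement
+ Min: the patient's minimum ED measurement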
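A minimal sketch of Step 2 for one patient and one measurement. It assumes the records are `(hours since ED arrival, value)` pairs with `None` for missing values. That layout is an assumption and not the NIS table schema. The second helper shows one possible reading of "select the statistic with the highest feature importance": keep one statistic per measurement. The `"<measurement>_<statistic>"` feature names and the importance scores are hypothetical and would come from whatever model is being trained.

```python
from statistics import mean
from typing import Optional

STATS = ("mean", "first", "last", "max", "min")


def window_stats(measurements: list[tuple[float, Optional[float]]],
                 window_hours: float) -> dict[str, Optional[float]]:
    """Mean/first/last/max/min of measurements taken within [0, window_hours]."""
    # Drop missing values and anything outside the window, then order by time.
    in_window = sorted((t, v) for t, v in measurements
                       if v is not None and 0 <= t <= window_hours)
    if not in_window:
        return {k: None for k in STATS}  # nothing measured in the window
    values = [v for _, v in in_window]
    return {
        "mean": mean(values),
        "first": values[0],   # earliest measurement in the window
        "last": values[-1],   # latest measurement in the window
        "max": max(values),
        "min": min(values),
    }


def best_statistic_per_measurement(importances: dict[str, float]) -> dict[str, str]:
    """For each measurement, keep the statistic with the highest importance score."""
    best: dict[str, tuple[str, float]] = {}
    for feature, score in importances.items():
        measurement, _, stat = feature.rpartition("_")  # "spo2_last" -> ("spo2", "last")
        if measurement not in best or score > best[measurement][1]:
            best[measurement] = (stat, score)
    return {m: stat for m, (stat, _) in best.items()}


spo2 = [(0.5, 97.0), (2.0, 95.0), (3.5, 96.0), (6.0, 93.0)]
print(window_stats(spo2, 4))
# {'mean': 96.0, 'first': 97.0, 'last': 96.0, 'max': 97.0, 'min': 95.0}
print(best_statistic_per_measurement(
    {"spo2_mean": 0.12, "spo2_last": 0.20, "hr_max": 0.15, "hr_min": 0.05}))
# {'spo2': 'last', 'hr': 'max'}
```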
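3. **Step 3**: Add trend features. NIS features are time series, so a trend can be computed for each feature as the slope of the regression line fitted to that feature's time series. The figure below shows the SpO2 trend for the patient with ACCOUNTNO I10700000041. The x-axis is time and the y-axis is the SpO2 value. The red dots are the measurements and the blue line is the regression line. The fitted slope is -0.00367368, which indicates that SpO2 is slowly declining.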
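The trend feature is the ordinary least-squares slope, sum((t - t̄)(y - ȳ)) / sum((t - t̄)²). Below is a minimal sketch. The name `trend_slope` is illustrative. The source does not state the x-axis time unit, so the slope is expressed in SpO2 units per whatever unit `times` uses, and the -0.00367368 value above can only be compared once that unit is known. Timestamps must be converted to numeric offsets first.

```python
from typing import Optional, Sequence


def trend_slope(times: Sequence[float], values: Sequence[float]) -> Optional[float]:
    """Least-squares slope of `values` regressed on `times`; None if undefined."""
    n = len(times)
    if n != len(values):
        raise ValueError("times and values must have the same length")
    if n < 2:
        return None  # a single point has no trend
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        return None  # all measurements share one timestamp
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


print(trend_slope([0, 1, 2, 3], [98, 97, 96, 95]))  # -1.0
```

On Python 3.10+, `statistics.linear_regression(times, values).slope` gives the same value. It raises an error instead of returning `None` when every time point is identical.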

## Notes
1. **ERSTAY statistics table** (records with ERSTAY = NA excluded)

| ERSTAY (hours) | Count | Cumulative count | Share | Cumulative share |
| --------------:| -----:| --------:| -------:| --------:|
| 0 | 4,993 | 4,993 | 21.488% | 21.488% |
| 1 | 6,005 | 10,998 | 25.844% | 47.332% |
| 2 | 3,871 | 14,869 | 16.659% | 63.991% |
| 3 | 2,088 | 16,957 | 8.986% | 72.977% |
| 4 | 1,147 | 18,104 | 4.936% | 77.914% |
| 5 | 764 | 18,868 | 3.288% | 81.202% |
| 6 | 573 | 19,441 | 2.466% | 83.668% |
| 7 | 411 | 19,852 | 1.769% | 85.436% |
| 8 | 350 | 20,202 | 1.506% | 86.943% |
| 9 | 334 | 20,536 | 1.437% | 88.380% |
| 10 | 328 | 20,864 | 1.412% | 89.792% |
| 11 | 261 | 21,125 | 1.123% | 90.915% |
| 12 | 269 | 21,394 | 1.158% | 92.073% |
| 13 | 221 | 21,615 | 0.951% | 93.024% |
| 14 | 200 | 21,815 | 0.861% | 93.884% |
| 15 | 187 | 22,002 | 0.805% | 94.689% |
| 16 | 171 | 22,173 | 0.736% | 95.425% |
| 17 | 147 | 22,320 | 0.633% | 96.058% |
| 18 | 136 | 22,456 | 0.585% | 96.643% |
| 19 | 104 | 22,560 | 0.448% | 97.091% |
| 20 | 78 | 22,638 | 0.336% | 97.426% |
| 21 | 94 | 22,732 | 0.405% | 97.831% |
| 22 | 94 | 22,826 | 0.405% | 98.235% |
| 23 | 74 | 22,900 | 0.318% | 98.554% |
| 24 | 49 | 22,949 | 0.211% | 98.765% |
| 25 | 46 | 22,995 | 0.198% | 98.963% |
| 26 | 39 | 23,034 | 0.168% | 99.131% |
| 27 | 30 | 23,064 | 0.129% | 99.260% |
| 28 | 27 | 23,091 | 0.116% | 99.376% |
| 29 | 20 | 23,111 | 0.086% | 99.462% |
| 30 | 9 | 23,120 | 0.039% | 99.501% |
| 31 | 5 | 23,125 | 0.022% | 99.522% |
| 32 | 7 | 23,132 | 0.030% | 99.552% |
| 33 | 9 | 23,141 | 0.039% | 99.591% |
| 34 | 8 | 23,149 | 0.034% | 99.626% |
| 35 | 6 | 23,155 | 0.026% | 99.651% |
| 36 | 7 | 23,162 | 0.030% | 99.682% |
| 37 | 5 | 23,167 | 0.022% | 99.703% |
| 38 | 5 | 23,172 | 0.022% | 99.725% |
| 39 | 5 | 23,177 | 0.022% | 99.746% |
| 40 | 4 | 23,181 | 0.017% | 99.763% |
| 41 | 5 | 23,186 | 0.022% | 99.785% |
| 42 | 9 | 23,195 | 0.039% | 99.824% |
| 43 | 1 | 23,196 | 0.004% | 99.828% |
| 44 | 5 | 23,201 | 0.022% | 99.849% |
| 45 | 1 | 23,202 | 0.004% | 99.854% |
| 46 | 4 | 23,206 | 0.017% | 99.871% |
| 47 | 3 | 23,209 | 0.013% | 99.884% |
| 48 | 1 | 23,210 | 0.004% | 99.888% |
| 49 | 2 | 23,212 | 0.009% | 99.897% |
| 50 | 3 | 23,215 | 0.013% | 99.910% |
| 52 | 2 | 23,217 | 0.009% | 99.918% |
| 53 | 3 | 23,220 | 0.013% | 99.931% |
| 54 | 1 | 23,221 | 0.004% | 99.935% |
| 57 | 1 | 23,222 | 0.004% | 99.940% |
| 64 | 1 | 23,223 | 0.004% | 99.944% |
| 65 | 1 | 23,224 | 0.004% | 99.948% |
| 66 | 2 | 23,226 | 0.009% | 99.957% |
| 72 | 1 | 23,227 | 0.004% | 99.961% |
| 74 | 1 | 23,228 | 0.004% | 99.966% |
| 76 | 2 | 23,230 | 0.009% | 99.974% |
| 88 | 1 | 23,231 | 0.004% | 99.978% |
| 100 | 1 | 23,232 | 0.004% | 99.983% |
| 117 | 1 | 23,233 | 0.004% | 99.987% |
| 332 | 1 | 23,234 | 0.004% | 99.991% |
| 358 | 1 | 23,235 | 0.004% | 99.996% |
| 904 | 1 | 23,236 | 0.004% | 100.000% |
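The table can be regenerated from raw values. Below is a minimal sketch. It assumes ERSTAY is already an integer number of hours, with `None` for NA. How ERSTAY is rounded to hours is not stated here, and `erstay_table` is an illustrative name. NA records are dropped before counting, and only hours that actually occur get a row, which is why the table jumps from 50 to 52.

```python
from collections import Counter
from typing import Iterable, Optional


def erstay_table(erstay_hours: Iterable[Optional[int]]) -> list[tuple[int, int, int, float, float]]:
    """Rows of (hours, count, cumulative count, share %, cumulative share %), NA excluded."""
    counts = Counter(h for h in erstay_hours if h is not None)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for hour in sorted(counts):  # only hours that actually occur
        cumulative += counts[hour]
        rows.append((hour, counts[hour], cumulative,
                     round(100 * counts[hour] / total, 3),
                     round(100 * cumulative / total, 3)))
    return rows


for row in erstay_table([0, 1, 1, None, 2, 1]):
    print(row)
# (0, 1, 1, 20.0, 20.0)
# (1, 3, 4, 60.0, 80.0)
# (2, 1, 5, 20.0, 100.0)
```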