# textCNN_XLNET_Cross Validation

## Before cross validation

* Originally, all of the data was used for training
* Testing used all abnormal samples plus normal samples at a 1:1 ratio

#### Problem

All of the testing data had already appeared in the training data, which may cause overfitting.

## Using cross validation

* fold = 5
* data: word_labeled_zh_200918_02:17CC.csv
* time: 6.5 hr

**Data description**

![](https://i.imgur.com/U6bY2UN.png)

#### CV results: accuracy is high, but performance is poor

![](https://i.imgur.com/Rxd9KaS.png)

---

![](https://i.imgur.com/6pNj9bF.png)

---

![](https://i.imgur.com/ji0N55g.png)

---

![](https://i.imgur.com/BO1zLZg.png)

---

![](https://i.imgur.com/4Z9bGwD.png)

### Suspected cause

The data is imbalanced and unevenly distributed.

# Update 2020/12/27

### Data information

* training data: word_label_data_remove_long_word_201223.csv
* fold: 5 (Train:Test = 8:2)

### Results

#### Fold 0 (**Accuracy: 0.936**)

* model: **text_cnn_best_0.9362186788154897_LR0.001_BATCH100_EPOCH100**

![](https://i.imgur.com/Cm235lr.png)
![](https://i.imgur.com/TtIk83F.png)

* prediction errors (err: 465): https://docs.google.com/spreadsheets/d/1JNuHyBOSBIFLDATdaWymgu30GHRJvtpigSLGUlC_GSg/edit?usp=sharing

---

#### Fold 1 (**Accuracy: 0.933**)

* model: **text_cnn_best_0.9335387913707625_LR0.001_BATCH100_EPOCH100**

![](https://i.imgur.com/oPPO4Tg.png)
![](https://i.imgur.com/riP3T1N.png)

* prediction errors (err: 473): https://docs.google.com/spreadsheets/d/1dxq_1wdXgpiiCCqsWOrFsZtX5Le391bMMNd7xl0aYg8/edit?usp=sharing

---

#### Fold 2 (**Accuracy: 0.928**)

* model: **text_cnn_best_0.9287149939702533_LR0.001_BATCH100_EPOCH100**

![](https://i.imgur.com/gSoZmGh.png)
![](https://i.imgur.com/ASetfaC.png)

* prediction errors (err: 536): https://docs.google.com/spreadsheets/d/1AH8HBkOyNcMRAXzGeBv-ZUt6Nv3xx6Pc73Cs4qE9QRk/edit?usp=sharing

---

#### Fold 3 (**Accuracy: 0.937**)

* model: **text_cnn_best_0.9371482176360225_LR0.001_BATCH100_EPOCH100**

![](https://i.imgur.com/ASetfaC.png)
![](https://i.imgur.com/OHHvX4k.png)

* prediction errors (err: 451):
https://docs.google.com/spreadsheets/d/13P8Waolh6jKTpKIPDpV701RZsEz1Cv_mlOBzCHqi_60/edit?usp=sharing

---

#### Fold 4 (**Accuracy: 0.926**)

* model: **text_cnn_best_0.9261592066470116_LR0.001_BATCH100_EPOCH100**

![](https://i.imgur.com/5uw71Q2.png)
![](https://i.imgur.com/6Lb6ZHS.png)

* prediction errors (err: 537): https://docs.google.com/spreadsheets/d/1Kr8XNZwokP58-EVb41BQy5tv1kzF9FNWXs9nJeS8iVk/edit?usp=sharing

# Update 2021/1/3

## Adjusting weights

![](https://i.imgur.com/QeK2jCz.png)

**Performance improved only slightly; the confusion matrices are still poor.**

#### Fold 0

![](https://i.imgur.com/CNFt6kg.png)

* model: text_cnn_best_0.9385647216633132_LR0.001_BATCH100_EPOCH100

![](https://i.imgur.com/6OpFo2m.png)

#### Fold 1

![](https://i.imgur.com/dFHO0uv.png)

* model: text_cnn_best_0.9362843729040912_LR0.001_BATCH100_EPOCH100

![](https://i.imgur.com/PgozWaH.png)

#### Fold 2

![](https://i.imgur.com/jOErAnH.png)

* model: text_cnn_best_0.9299704856452912_LR0.001_BATCH100_EPOCH100

![](https://i.imgur.com/qynJnDg.png)

#### Fold 3

![](https://i.imgur.com/N0fpWeB.png)

* model: text_cnn_best_0.9331902334317145_LR0.001_BATCH100_EPOCH100

![](https://i.imgur.com/HvPAITM.png)

#### Fold 4

![](https://i.imgur.com/9NlueL7.png)

* model: text_cnn_best_0.9369466058492085_LR0.001_BATCH100_EPOCH100

![](https://i.imgur.com/qlwu3MD.png)

### Code

https://colab.research.google.com/drive/1tysuKunII1paJo6VXZtDcW6Qq5bxacJz?usp=sharing

###### tags: `Progress Report`
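
This report points to imbalanced, unevenly distributed data as the likely cause of the weak results, and the 2021/1/3 update responds by adjusting weights. As a rough illustration only, the sketch below shows one common way to build 5 folds that keep the normal/abnormal ratio similar in every fold, together with one common inverse-frequency class weighting. The notebook's actual splitting code and weight values are not reproduced in this note, so the function names, the toy labels, and the weighting formula are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs with similar class ratios in every fold."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into the folds
    # so that each fold receives its share of every class.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is an assumed scheme for illustration, not the report's actual weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical toy labels: 90 "normal" samples and 10 "abnormal" ones.
labels = ["normal"] * 90 + ["abnormal"] * 10
splits = list(stratified_kfold(labels, n_splits=5))
weights = inverse_frequency_weights(labels)
```

With these toy labels, each test fold holds 20 samples, 2 of them abnormal, and the abnormal class receives weight 5.0 against roughly 0.56 for the normal class. Weights of this kind are typically passed to a weighted loss function. Whether that matches the adjustment shown in the screenshot above is not something this note confirms.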