# Data Science Final Report (資料科學期末報告)

----

## Preparing

### (1) Import libraries

![](https://i.imgur.com/GgRzjc9.png =x200)

---

### (2) Read the data into a pandas DataFrame

![](https://i.imgur.com/wfl3EVX.png =x30)

---

## Data Preprocessing

### Convert the one-hot encoded labels into class labels

![](https://i.imgur.com/BTNz7LQ.png =x100)

---

### Describe the features

![](https://i.imgur.com/nfKKDRR.png =x600)

---

### Counting the faults

![](https://i.imgur.com/JdhND41.png =x200)

---

### Fault count plot

![](https://i.imgur.com/lkFH8AR.png)

---

### The correlation matrix

![](https://i.imgur.com/jzVY0V3.png =x500)

* The correlation coefficient of "TypeOfSteel_A300" and "TypeOfSteel_A400" is -1, so we can drop one of them.
* The correlation coefficient of "X_Minimum" and "X_Maximum" is -1, so we can drop one of them.
* The correlation coefficient of "Y_Minimum" and "Y_Maximum" is -1, so we can drop one of them.

---

### Split train/test data

![](https://i.imgur.com/QgCDpy9.png)

---

### Applying Random Forest

![](https://i.imgur.com/KfTQY2w.png =x400)

---

![](https://i.imgur.com/fMNTs7u.png)

### Assessing feature importance with the random forest

![](https://i.imgur.com/mgYZv0t.png =x400)

![](https://i.imgur.com/jUebsts.png =x500)

* SigmoidOfAreas has low importance, so we remove it.

![](https://i.imgur.com/xEz2Up1.png =x40)

* The most important features are: 1. X_Minimum, 2. X_Maximum, 3. Y_Maximum, 4. Y_Minimum, 5. Pixels_Areas, 6. X-Perimeter, 7. Y-Perimeter.

### Applying Random Forest after removing the least important feature

#### Accuracy chart

![](https://i.imgur.com/0bJFISK.png)

#### Confusion matrix

![](https://i.imgur.com/VmKXjyO.png)

#### Classification result

![](https://i.imgur.com/hmJIaiB.png =x250)
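---

## Code Sketches

The steps above are shown only as screenshots, so the blocks below are minimal Python sketches of each step rather than the exact code from the report. They assume the standard Steel Plates Faults dataset, pandas, and scikit-learn; any file name, variable name, or parameter value that is not visible in the screenshots is an assumption.

### Import libraries

A plausible import cell for this workflow (the exact set of libraries in the screenshot may differ):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```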
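### Read the data into a pandas DataFrame

A sketch assuming the dataset is stored locally as a CSV named `faults.csv` (the file name is an assumption):

```python
# Load the Steel Plates Faults data into a DataFrame and take a first look.
df = pd.read_csv("faults.csv")
print(df.shape)  # the standard dataset has 1941 rows and 34 columns
df.head()
```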
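### Convert the one-hot encoded labels into class labels

In the standard Steel Plates Faults dataset the 7 fault types are stored as separate 0/1 indicator columns, so a single class-label column can be built with `idxmax`. The column names below come from that dataset:

```python
# One-hot encoded fault columns in the standard dataset.
fault_cols = ["Pastry", "Z_Scratch", "K_Scatch", "Stains",
              "Dirtiness", "Bumps", "Other_Faults"]

# Collapse the indicator columns into a single class label per row.
df["Fault"] = df[fault_cols].idxmax(axis=1)
df["Fault"].head()
```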
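### Describe the features

The feature summary table is most likely the output of `DataFrame.describe()`, which reports count, mean, standard deviation, min, quartiles, and max per column:

```python
# Summary statistics for the feature columns (fault columns excluded).
df.drop(columns=fault_cols + ["Fault"]).describe().T
```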
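### Counting the faults and plotting the counts

A sketch of counting how many samples fall into each fault class and plotting the distribution; the exact plot type in the report is not visible, so a bar chart is assumed:

```python
# Number of samples per fault class.
fault_counts = df["Fault"].value_counts()
print(fault_counts)

# Bar chart of the class distribution.
fault_counts.plot(kind="bar")
plt.xlabel("Fault type")
plt.ylabel("Number of samples")
plt.title("Fault class distribution")
plt.tight_layout()
plt.show()
```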
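### The correlation matrix

A sketch of the correlation heatmap and of checking the pairs called out above; a coefficient of ±1 means one column of the pair carries no extra information, which is why the report suggests dropping one of each pair (which column was actually dropped is not visible in the screenshots):

```python
# Correlation matrix of the features, visualised as a heatmap.
feature_df = df.drop(columns=fault_cols + ["Fault"])
corr = feature_df.corr()

plt.figure(figsize=(14, 12))
sns.heatmap(corr, cmap="coolwarm", center=0)
plt.title("Feature correlation matrix")
plt.show()

# Inspect the pairs called out in the report.
print(corr.loc["TypeOfSteel_A300", "TypeOfSteel_A400"])
print(corr.loc["X_Minimum", "X_Maximum"])
print(corr.loc["Y_Minimum", "Y_Maximum"])
```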
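### Split train/test data

A sketch of the hold-out split; the 80/20 ratio, `random_state`, and stratification are assumptions, since the actual arguments are not visible:

```python
# Features and class labels.
X = df.drop(columns=fault_cols + ["Fault"])
y = df["Fault"]

# Hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```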
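### Applying Random Forest

A sketch of fitting a random forest and measuring test accuracy; `n_estimators=100` and the other hyperparameters are assumed values:

```python
# Fit a random forest on the training set.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set.
y_pred = rf.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, y_pred))
```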
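### Assessing feature importance and removing SigmoidOfAreas

A sketch of reading the impurity-based importances off the fitted forest, plotting them, and dropping `SigmoidOfAreas`, which the report identifies as the least important feature:

```python
# Impurity-based feature importances from the fitted forest.
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# Bar chart of importances, most to least important.
importances.plot(kind="bar", figsize=(12, 4))
plt.ylabel("Importance")
plt.tight_layout()
plt.show()

# SigmoidOfAreas ranks lowest, so it is removed from the feature set.
X_train_red = X_train.drop(columns=["SigmoidOfAreas"])
X_test_red = X_test.drop(columns=["SigmoidOfAreas"])
```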
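### Random Forest after removing the least important feature

A sketch of retraining on the reduced feature set and producing the accuracy, confusion matrix, and classification report shown in the last three figures:

```python
# Retrain on the reduced feature set and re-evaluate.
rf_red = RandomForestClassifier(n_estimators=100, random_state=42)
rf_red.fit(X_train_red, y_train)
y_pred_red = rf_red.predict(X_test_red)

print("Test accuracy:", accuracy_score(y_test, y_pred_red))

# Confusion matrix: rows are true classes, columns are predicted classes.
print(confusion_matrix(y_test, y_pred_red))

# Per-class precision, recall, and F1 score.
print(classification_report(y_test, y_pred_red))
```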
{"metaMigratedAt":"2023-06-16T03:34:26.149Z","metaMigratedFrom":"Content","title":"資料科學期末報告","breaks":true,"contributors":"[{\"id\":\"23d7162c-196b-4bfc-adf7-d13ed86771a6\",\"add\":1709,\"del\":10}]"}
    294 views