# Data Science Final Report
----
## Preparation
### (1) Import Libraries
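The report's own import cell is not reproduced here. A typical set of imports for this kind of pandas + scikit-learn workflow might look like the following; the exact list is an assumption, not the original code:

```python
# Assumed imports for a pandas + scikit-learn classification workflow
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```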

---
### (2) Read the Data into a Pandas DataFrame
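The report does not name the data file. As a minimal sketch, `pd.read_csv` loads comma-separated data into a DataFrame; here a tiny inline sample with hypothetical values stands in for the real file, and the column names are assumptions:

```python
import io
import pandas as pd

# Tiny inline stand-in for the real data file (hypothetical values and columns).
# With the real file, pass its path to pd.read_csv instead of a StringIO buffer.
csv_text = """X_Minimum,X_Maximum,Pixels_Areas,Bumps,Stains,Other_Faults
42,50,267,1,0,0
645,651,108,0,1,0
829,835,71,0,0,1
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (3, 6): 3 rows, 6 columns
print(df.head())  # first rows for a quick visual check
```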

---
## Data Preprocessing
### Convert the One-Hot Encoded Labels into Class Labels
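One common way to do this conversion in pandas, sketched here with hypothetical label columns, is `idxmax(axis=1)`: in each row it returns the name of the column holding the maximum, which for a valid one-hot row is the single column set to 1:

```python
import pandas as pd

# Hypothetical one-hot label columns: exactly one 1 per row marks the fault class
df = pd.DataFrame({
    "Pixels_Areas": [267, 108, 71, 150],
    "Bumps":        [1, 0, 0, 0],
    "Stains":       [0, 1, 0, 1],
    "Other_Faults": [0, 0, 1, 0],
})
label_cols = ["Bumps", "Stains", "Other_Faults"]

# Sanity check: every row must have exactly one active label
assert (df[label_cols].sum(axis=1) == 1).all()

# idxmax(axis=1) returns the column name holding the 1 in each row
df["Fault"] = df[label_cols].idxmax(axis=1)
df = df.drop(columns=label_cols)
print(df["Fault"].tolist())  # ['Bumps', 'Stains', 'Other_Faults', 'Stains']
```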

---
### Describe the features
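Summary statistics for each numeric feature can be obtained with `DataFrame.describe()`. A minimal example on hypothetical values:

```python
import pandas as pd

# Hypothetical numeric features
df = pd.DataFrame({
    "Pixels_Areas": [267, 108, 71, 150],
    "X_Perimeter":  [17, 10, 8, 12],
})
# One column per feature; rows are count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)
```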

---
### Counting the Faults
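Assuming this step counts how many samples fall into each fault class (the original code is not shown), `value_counts` on the class-label column gives both the absolute counts and, with `normalize=True`, the class shares. The labels below are hypothetical:

```python
import pandas as pd

# Hypothetical class-label column produced by the one-hot conversion step
faults = pd.Series(["Bumps", "Stains", "Other_Faults", "Stains", "Bumps", "Bumps"],
                   name="Fault")

counts = faults.value_counts()               # sorted from most to least frequent
shares = faults.value_counts(normalize=True)  # fraction of samples per class
print(counts)
print(shares)
```

Checking the class balance this way before training is useful because a heavily imbalanced target can make plain accuracy misleading.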

---
### Plotting the Fault Counts
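The original figure is not included. A bar chart of the per-class counts is one plausible way to visualize them; the sketch below uses matplotlib with hypothetical counts:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-class counts (e.g. the output of value_counts())
counts = pd.Series({"Bumps": 3, "Stains": 2, "Other_Faults": 1})

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(counts.index, counts.values)  # one bar per fault class
ax.set_xlabel("Fault type")
ax.set_ylabel("Number of samples")
ax.set_title("Samples per fault type")
fig.tight_layout()
# Use fig.savefig(...) or plt.show() to export or display the chart
```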

---
### The Correlation Matrix
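`DataFrame.corr()` computes the pairwise Pearson correlation matrix. The sketch below, on hypothetical data, also flags one column from each pair whose absolute correlation is (numerically) 1, which is the kind of redundancy noted in the list that follows; the 0.999 threshold is an assumption:

```python
import numpy as np
import pandas as pd

# Hypothetical features: the two steel-type flags are complementary, so r = -1
df = pd.DataFrame({
    "TypeOfSteel_A300": [1, 0, 1, 0, 1],
    "TypeOfSteel_A400": [0, 1, 0, 1, 0],
    "Pixels_Areas":     [267, 108, 71, 150, 90],
})
corr = df.corr()  # Pearson correlation matrix

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair is seen once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
# Flag a column if its |r| with any earlier column is effectively 1
to_drop = [col for col in upper.columns if (upper[col].abs() > 0.999).any()]
df_reduced = df.drop(columns=to_drop)
print(to_drop)  # ['TypeOfSteel_A400']
```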

* The correlation coefficient of "TypeOfSteel_A300" and "TypeOfSteel_A400" is -1, so we can drop one of them.
* The correlation coefficient of "X_Minimum" and "X_Maximum" is -1, so we can drop one of them.
* The correlation coefficient of "Y_Minimum" and "Y_Maximum" is -1, so we can drop one of them.

---
### Split train/test data
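The report does not state its split ratio. As a sketch, scikit-learn's `train_test_split` with an assumed 70/30 split and `stratify=y` keeps the class proportions the same in both parts; the data here is hypothetical:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))           # hypothetical feature matrix
y = np.repeat(["Bumps", "Stains"], 50)  # hypothetical class labels

# Hold out 30% for testing (assumed ratio); stratify preserves class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)  # (70, 4) (30, 4)
```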

---
### Applying Random Forest
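A minimal sketch of fitting a random forest and measuring test accuracy with scikit-learn. The synthetic data from `make_classification` stands in for the real features, and the hyperparameters are assumptions rather than the report's settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 3-class stand-in for the real features (hypothetical data)
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# n_estimators=100 is scikit-learn's default; random_state makes runs repeatable
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {acc:.3f}")
```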

---

### Assessing feature importance with random forest
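A fitted `RandomForestClassifier` exposes impurity-based importances (mean decrease in impurity, normalized to sum to 1) through `feature_importances_`. Whether the report used this measure or another one is not stated; the sketch below assumes it, on synthetic data where the first three columns are informative and the rest are noise:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# shuffle=False keeps the 3 informative columns first; f3..f5 are pure noise
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features from most to least important
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
least_important = importances.index[-1]  # candidate for removal
```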


* SigmoidOfAreas has the lowest importance, so we remove it.

* These are the most important features:
  1. X_Minimum
  2. X_Maximum
  3. Y_Maximum
  4. Y_Minimum
  5. Pixels_Areas
  6. X_Perimeter
  7. Y_Perimeter
### Applying Random Forest after Removing the Least Important Feature
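A sketch of this step under the same assumptions as before (synthetic data, hypothetical column names, impurity-based importance): train on all features, drop the single least important one, retrain on the same split, and compare test accuracy:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_arr, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                               random_state=1)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

def fit_and_score(cols):
    """Train a random forest on the given columns; return (model, test accuracy)."""
    model = RandomForestClassifier(n_estimators=100, random_state=1)
    model.fit(X_train[cols], y_train)
    return model, accuracy_score(y_test, model.predict(X_test[cols]))

all_cols = list(X.columns)
full_model, acc_full = fit_and_score(all_cols)

# Drop the feature with the lowest importance, then retrain on the same split
least = all_cols[full_model.feature_importances_.argmin()]
reduced_cols = [c for c in all_cols if c != least]
_, acc_reduced = fit_and_score(reduced_cols)
print(f"dropped {least}: accuracy {acc_full:.3f} -> {acc_reduced:.3f}")
```

Reusing the same split and `random_state` means the accuracy difference reflects the removed feature rather than a different data partition.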
#### Accuracy Chart

#### Confusion Matrix
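`sklearn.metrics.confusion_matrix` counts, for each true class (row), how often each class was predicted (column). A small example with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted fault labels
y_true = ["Bumps", "Bumps", "Stains", "Stains", "Other_Faults", "Bumps"]
y_pred = ["Bumps", "Stains", "Stains", "Stains", "Bumps", "Bumps"]
labels = ["Bumps", "Stains", "Other_Faults"]

# Rows are true classes, columns are predicted classes, both ordered as `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[2 1 0]
#  [0 2 0]
#  [1 0 0]]
```

The diagonal holds the correct predictions. To draw the matrix as a heatmap, scikit-learn 1.0+ provides `ConfusionMatrixDisplay.from_predictions`.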

#### Classification result
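Per-class precision, recall and F1 score can be produced with `sklearn.metrics.classification_report`. The labels below are hypothetical; `zero_division=0` reports 0 instead of a warning for a class that is never predicted:

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted fault labels
y_true = ["Bumps", "Bumps", "Stains", "Stains", "Other_Faults", "Bumps"]
y_pred = ["Bumps", "Stains", "Stains", "Stains", "Bumps", "Bumps"]
labels = ["Bumps", "Stains", "Other_Faults"]

# Text table for the report, plus a dict version for programmatic access
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
report = classification_report(y_true, y_pred, labels=labels,
                               zero_division=0, output_dict=True)
```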
