
# Lab 1-2

[TOC]

Path: `/mlsec/malware/malware-classification.ipynb`

## Code

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn import tree, ensemble, naive_bayes
from sklearn import model_selection
from sklearn.metrics import confusion_matrix
from matplotlib import pyplot as plt
%matplotlib inline
```

> `cross_validation` is no longer available in scikit-learn; use `model_selection` instead.
>
> `%matplotlib inline` is a Jupyter Notebook magic command. It renders `matplotlib.pyplot` figures directly in the notebook output.

```python
df = pd.read_csv('data.csv.original', sep='|')
# The first 41323 rows are legitimate binaries; the remaining rows are malicious.
legit_binaries = df[0:41323].drop(['legitimate'], axis=1)
malicious_binaries = df[41323:].drop(['legitimate'], axis=1)
```

```python
legit_binaries['FileAlignment'].value_counts()
```

```python
malicious_binaries['FileAlignment'].value_counts()
```

```python
# Q1: plot and observe more data
plt.figure(figsize=(20, 10))
# `normed` was removed from matplotlib; `density=True` is its replacement.
plt.hist([legit_binaries['SectionsMaxEntropy'], malicious_binaries['SectionsMaxEntropy']],
         range=[0, 8], density=True, color=["green", "red"],
         label=["legitimate", "malicious"])
plt.legend()
plt.show()
```

```python
X = df.drop(['Name', 'md5', 'legitimate'], axis=1).values
y = df['legitimate'].values
```

```python
# Build a forest and compute the feature importances.
# n_estimators: the number of trees in the forest.
forest = ExtraTreesClassifier(n_estimators=10).fit(X, y)
# Meta-transformer for selecting features based on importance weights.
model = SelectFromModel(forest, prefit=True)
```

```python
X_new = model.transform(X)
print('before X.shape: {}'.format(X.shape))
print('after X.shape: {}'.format(X_new.shape))
```

```python
nb_features = X_new.shape[1]
# Indices of the selected features, sorted by descending importance.
indices = np.argsort(forest.feature_importances_)[::-1][:nb_features]
for f in range(nb_features):
    # The +2 offset skips the 'Name' and 'md5' columns that were dropped from X.
    print("%d. feature %s (%f)" % (f + 1, df.columns[2 + indices[f]],
                                   forest.feature_importances_[indices[f]]))
```

```python
X_train, X_test, y_train, y_test = model_selection.train_test_split(X_new, y, test_size=0.2)
clf = tree.DecisionTreeClassifier(max_depth=10)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
score = clf.score(X_test, y_test)
print("DecisionTree : %f %%" % (score * 100))
print(confusion_matrix(y_test, y_pred))
```

Train a DecisionTree, predict on the test set, and print its accuracy and confusion matrix.

```python
# Q2: Set up more models
algorithms = {
    "DecisionTree": tree.DecisionTreeClassifier(max_depth=10),
    "RandomForest": ensemble.RandomForestClassifier(n_estimators=50),
    "GradientBoosting": ensemble.GradientBoostingClassifier(n_estimators=50),
    "AdaBoost": ensemble.AdaBoostClassifier(n_estimators=100),
    "GNB": naive_bayes.GaussianNB(),
}
X_train, X_test, y_train, y_test = model_selection.train_test_split(X_new, y, test_size=0.2)
```

Set up each algorithm and create a fresh train/test split.

```python
# Q3: Find the best one
results = {}
y_preds = {}
print("Now testing all algorithms\n")
for algo in algorithms:
    clf = algorithms[algo]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    score = clf.score(X_test, y_test)
    print("%s : %f %%" % (algo, score * 100))
    results[algo] = score
    y_preds[algo] = y_pred
# Pick the algorithm with the highest test accuracy.
winner = max(results, key=results.get)
print('\nWinning algorithm is %s with a %f %% success' % (winner, results[winner] * 100))
```

Run every algorithm on the test set and print the one with the highest accuracy.

```python
for algo in algorithms:
    print(algo)
    print(confusion_matrix(y_test, y_preds[algo]))
```

Compute the confusion matrix for each algorithm's predictions and print it.

<style> span.hidden-xs:after { content: ' × ML Security' !important; } </style>

###### tags: `ML Security`
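The confusion matrices printed in the last step are easier to compare once they are turned into rates. The sketch below is one way to do that, not part of the original notebook. It assumes scikit-learn's layout (rows are true labels, columns are predicted labels, ordered 0 then 1) and assumes that `legitimate == 1` marks a legitimate file and `0` a malicious one. The helper name `malware_rates` and the example counts are hypothetical.

```python
def malware_rates(cm):
    """Derive simple rates from a 2x2 confusion matrix.

    `cm` follows scikit-learn's layout: rows are true labels, columns are
    predicted labels, both ordered 0 then 1.
    Assumption: label 0 = malicious, label 1 = legitimate.
    """
    (mal_as_mal, mal_as_legit), (legit_as_mal, legit_as_legit) = cm
    total = mal_as_mal + mal_as_legit + legit_as_mal + legit_as_legit
    return {
        # Share of all files that were classified correctly.
        "accuracy": (mal_as_mal + legit_as_legit) / total,
        # Share of malicious files that were predicted as legitimate.
        "missed_malware_rate": mal_as_legit / (mal_as_mal + mal_as_legit),
        # Share of legitimate files that were predicted as malicious.
        "false_alarm_rate": legit_as_mal / (legit_as_mal + legit_as_legit),
    }


# Made-up counts for illustration only; these are not results from the lab.
example_cm = [[90, 10], [5, 95]]
rates = malware_rates(example_cm)
for name, value in rates.items():
    print("%s: %.3f" % (name, value))
```

In the notebook, the same helper could be called as `malware_rates(confusion_matrix(y_test, y_preds[algo]))` inside the loop. For a malware detector, the missed-malware rate is often more informative than accuracy alone, because two models with similar accuracy can differ in how many malicious files they let through.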
