ML for Security
=
https://ppt.cc/f6zq5x
https://drive.google.com/drive/folders/1J7WuuxGKE0oVle5WU7cnXvCq9eQfkkAl?usp=sharing
## Slide
[DeepCorr](https://arxiv.org/abs/1808.07285)
Primary option:
- Ask the TA for the USB drive and copy the Docker image from it, then load it with either command:
```
sudo docker load -i <filename>
sudo docker load < <filename>
```
Fallback option:
- Image:
https://drive.google.com/file/d/16HIzBegZEW7uNJrhsqTRQsWJxjC-CqSw/view?usp=sharing
Or pull the mlsec Docker image:
```
docker pull bletchley/mlsec:taiwanno1
```
Once Ubuntu is set up and Docker is installed, run the following (the image is about 6 GB):
```
docker run -p 8888:8888 -it bletchley/mlsec:taiwanno1 bash
```
When the download finishes, run this inside the container:
```
jupyter notebook --allow-root
```
Paste the URL that contains the token into your browser, but edit the host part first:
change http://(<container_id> or 127.0.0.1):8888 to http://127.0.0.1:8888
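The host rewrite described above can also be done mechanically. This is a minimal sketch, assuming the URL printed by Jupyter has the `(<container_id> or 127.0.0.1)` shape shown in the note; the function name `fix_jupyter_url`, the container ID, and the token are hypothetical.

```python
import re

def fix_jupyter_url(url: str, host: str = "127.0.0.1") -> str:
    """Replace the '(<container_id> or 127.0.0.1)' placeholder with a real host."""
    return re.sub(r"\(.*? or 127\.0\.0\.1\)", host, url, count=1)

# Hypothetical URL as printed in the container's terminal.
printed = "http://(0123456789ab or 127.0.0.1):8888/?token=abc123"
print(fix_jupyter_url(printed))
```

A URL that already uses a plain host is returned unchanged, so the helper is safe to apply to whatever Jupyter prints.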
- [Visual introduction to decision trees](http://www.r2d3.us/visual-intro-to-machine-learning-part-1/)
- How to choose a split: entropy, Gini
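To make the two split criteria concrete, here is a small standard-library sketch that computes entropy and Gini impurity for a parent node and for a candidate split. The `fraud`/`ok` labels and the helper names are made up for illustration; a decision tree would pick the split with the lowest weighted impurity.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(left, right, measure):
    """Impurity after a split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Hypothetical node with a 50/50 class mix, and one candidate split of it.
parent = ["fraud"] * 4 + ["ok"] * 4
left = ["fraud"] * 3 + ["ok"]
right = ["fraud"] + ["ok"] * 3

print("parent:", entropy(parent), gini(parent))
print("split entropy:", weighted_impurity(left, right, entropy))
print("split gini:", weighted_impurity(left, right, gini))
```

Both measures drop after the split, which is what makes it a useful one; a split that leaves the class mix unchanged in both children would give no reduction.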
Payment
==
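One-hot encoding
```
# assumes `import pandas as pd` and that df is already loaded
# turn the categorical paymentMethod column into 0/1 indicator columns
df = pd.get_dummies(df, columns=["paymentMethod"])
df.sample(3)
```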
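For intuition about what one-hot encoding produces, the sketch below does the same transformation by hand on a list of dicts. It is not how pandas implements `get_dummies`; the category values `paypal` and `creditcard` are hypothetical, and the `column_value` naming simply mirrors the pandas default.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the categorical one.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded

# Hypothetical rows; the real dataset's categories may differ.
rows = [
    {"paymentMethod": "paypal", "label": 0},
    {"paymentMethod": "creditcard", "label": 1},
    {"paymentMethod": "paypal", "label": 0},
]
for row in one_hot(rows, "paymentMethod"):
    print(row)
```

Each row ends up with exactly one indicator set to 1, so the model never sees an artificial ordering between payment methods.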
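Training/testing set
```
from sklearn.model_selection import train_test_split

# hold out 33% of the rows for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('label', axis=1), df['label'],
    test_size=0.33, random_state=17)
```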
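What the split does can be sketched with the standard library alone: shuffle with a fixed seed, then cut off a test portion. This is an illustrative approximation, not scikit-learn's implementation, and it will not select the same rows as `train_test_split`; `split_rows` is a hypothetical helper.

```python
import random

def split_rows(rows, test_size=0.33, seed=17):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_rows(range(100))
print(len(train), len(test))
```

Fixing the seed is what makes results comparable between runs, which is the role `random_state` plays in the scikit-learn call.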
Model
```
from sklearn.linear_model import LogisticRegression

# Initialize and train classifier model
clf = LogisticRegression().fit(X_train, y_train)
# Make predictions on test set
y_pred = clf.predict(X_test)
```
Docker "no space left on device" problem
https://colobu.com/2018/10/22/no-space-left-on-device-for-docker/
ImportError: cannot import name 'cross_validation'
```python
# replace 'from sklearn import cross_validation' with:
from sklearn.model_selection import train_test_split
```
```python
from sklearn import ensemble, naive_bayes, tree

# classifiers to compare, keyed by display name
algorithms = {
    "DecisionTree": tree.DecisionTreeClassifier(max_depth=10),
    "RandomForest": ensemble.RandomForestClassifier(n_estimators=50),
    "GradientBoosting": ensemble.GradientBoostingClassifier(n_estimators=50),
    "AdaBoost": ensemble.AdaBoostClassifier(n_estimators=100),
    "GNB": naive_bayes.GaussianNB(),
}
```
```python
results = {}
y_preds = {}
print("Now testing algorithms\n")
# train each classifier and record its test accuracy and predictions
for algo in algorithms:
    clf = algorithms[algo]
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    score = clf.score(X_test, y_test)
    print("%s : %f %%" % (algo, score*100))
    results[algo] = score
    y_preds[algo] = y_pred
```
## Missing Value
```python
# fraction of rows that contain at least one missing value
num_orig_rows = len(df)
num_full_rows = len(df.dropna())
(num_orig_rows - num_full_rows)/float(num_orig_rows)
```
Drop Rows
```python
df_droprows = df.dropna()
build_model(df_droprows)
```
Drop Columns
```python
# keep only these columns ('OverTime' spelled as in the imputation step)
df_dropcols = df[['MonthlyIncome','OverTime','Label']]
build_model(df_dropcols)
```
Fill NA as -1
```python
# use df.fillna to fill column with -1
df_sentinel = df.fillna(value=-1)
build_model(df_sentinel)
```
Use median
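```python
# use the median instead of the mean
# Imputer was removed in scikit-learn 0.22; SimpleImputer (0.20+) replaces it
import numpy as np
from sklearn.impute import SimpleImputer

imp = SimpleImputer(missing_values=np.nan, strategy='median')
df_imputed = pd.DataFrame(imp.fit_transform(df),
                          columns=['TotalWorkingYears', 'MonthlyIncome',
                                   'OverTime', 'DailyRate', 'Label'])
build_model(df_imputed)
```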
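As a rough illustration of why the median is used here, the sketch below imputes missing values by hand with the standard library. The rows and the `impute_median` helper are hypothetical, and `None` stands in for NaN; the point is that one extreme income barely moves the median, whereas it would drag the mean far upward.

```python
from statistics import median

def impute_median(rows, columns):
    """Return copies of `rows` with None in `columns` replaced by the column median."""
    # Compute each column's median over the values that are present.
    medians = {
        col: median(r[col] for r in rows if r[col] is not None)
        for col in columns
    }
    return [
        {**r, **{c: medians[c] for c in columns if r[c] is None}}
        for r in rows
    ]

# Made-up rows; the 100000 income is a deliberate outlier.
rows = [
    {"MonthlyIncome": 3000, "DailyRate": 800},
    {"MonthlyIncome": None, "DailyRate": 1200},
    {"MonthlyIncome": 5000, "DailyRate": None},
    {"MonthlyIncome": 100000, "DailyRate": 1000},
]
filled = impute_median(rows, ["MonthlyIncome", "DailyRate"])
print(filled[1]["MonthlyIncome"], filled[2]["DailyRate"])
```

With these rows the mean income of the known values would be 36000, which is not representative of any row, so the median is the safer fill value.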
## Network Monitor
```python
from sklearn import ensemble
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# hold out 20% of the data for testing
x_train, x_test, y_train, y_test = train_test_split(
    data, label, test_size=0.20, random_state=123)

# Random Forest classifier
clf = ensemble.RandomForestClassifier()
clf.fit(x_train, y_train)
pred = clf.predict(x_test)

# show the confusion matrix
cm = confusion_matrix(y_test, pred)
print(cm)

# show the accuracy score
a_score = accuracy_score(y_test, pred)
print("Accuracy score:\t %.2f%%" % (a_score * 100))
```
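To help read the printed matrix, here is a hand-rolled binary version using the same layout scikit-learn uses for 0/1 labels: rows are true classes, columns are predicted classes, giving `[[TN, FP], [FN, TP]]`. The labels below are invented, and `confusion_matrix_2x2` and `accuracy` are illustrative helpers, not library functions.

```python
def confusion_matrix_2x2(y_true, y_pred, positive=1):
    """Return [[TN, FP], [FN, TP]] for binary labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # attack missed
        elif p == positive:
            fp += 1  # false alarm
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

def accuracy(cm):
    """Fraction of correct predictions: the diagonal over the total."""
    (tn, fp), (fn, tp) = cm
    return (tn + tp) / (tn + fp + fn + tp)

# Invented labels: 1 = attack, 0 = normal traffic.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
cm = confusion_matrix_2x2(y_true, y_pred)
print(cm)
print("Accuracy score:\t %.2f%%" % (accuracy(cm) * 100))
```

In a security setting the two off-diagonal cells matter differently: false negatives are missed attacks, while false positives are alerts someone has to triage, so accuracy alone can hide an unacceptable miss rate.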
## nsl-kdd-classification

## sklearn-gridsearch

---
# Z3 CTF
https://gist.github.com/ekse/baee0cabbe12861443a5#file-harder_serial-py
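> Oh, z3 will be covered a bit later
> https://pastebin.com/iRX8AEwc (?
> It was not really covered in the end, only touched on briefly; the principles behind Z3 will be explained if there is time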