
# Lab 0-1

[TOC]

Path: `/mlsec/frauddetect/logistic-regression-fraud-detection.ipynb`

## pandas

pandas is a Python data-analysis library. It provides two main data structures, Series and DataFrame:

- Series: a one-dimensional labeled array, often used for data such as time series
- DataFrame: structured (table-like) data, e.g. a CSV file

## Code:

```python=
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Read in the data from the CSV file
df = pd.read_csv('datasets/payment_fraud.csv')
df.sample(15)
```

Import pandas and read the data.
`.sample(n)` returns n randomly selected rows.

```python=
# Convert categorical feature into dummy variables with one-hot encoding
# Q1: Which column is the categorical feature?
# Q2: Use pd.get_dummies to convert it to one-hot encoding
# Usage: pd.get_dummies(<your data>, columns=[<column name>])
df = pd.get_dummies(df, columns=['paymentMethod'])
```

`get_dummies()` performs **one-hot encoding**, which is the most important step in this part.
Some columns carry no numeric meaning but are still important and cannot be ignored. **One-hot encoding** converts such a column into numeric indicator columns so that the model can do arithmetic on it.

```python=
df.sample(3)

# Split dataset up into train and test sets
# Q3: Use df.drop to drop the label and generate the feature data
# Q4: Split data into 2:1 by train_test_split
# Usage: X_train, X_test, y_train, y_test = train_test_split(features data frame, label data frame, test_size=<test_size>, random_state=17)
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('label', axis=1), df['label'], test_size=0.33, random_state=17)
```

`train_test_split()` splits the data into **train data** and **test data**.
`.drop()` removes the given rows or columns: with `axis=0` it drops rows, with `axis=1` it drops columns (here, the `label` column).
`test_size` is the fraction of samples that goes into the test set; if an integer is given, it is the absolute number of test samples.
`random_state` is the random seed. Once it is set, every run produces the same split, which is handy when repeating an experiment.

```python=
#df = pd.get_dummies(df, columns=['paymentMethod'])
# Initialize and train classifier model
# Q5: New LogisticRegression Model and fit the data you have
# Usage: clf = LogisticRegression().fit(feature data frame, label data frame)
clf = LogisticRegression().fit(X_train, y_train)
y_pred = clf.predict(X_test)
```

`LogisticRegression()` and `.fit()` train the model, and `.predict()` makes predictions on the test set.

```python=
# Q6: Use predict to test sample
# Make predictions on test set
# Usage: clf.predict(data to test)
# Compare test set predictions with ground truth labels
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

`accuracy_score()` prints the final accuracy.
`confusion_matrix()` computes the confusion matrix, which shows where the model's errors fall.
Final training result:

```
0.999922738159623
[[12753     0]
 [    1   189]]
```

<style>
span.hidden-xs:after {
  content: ' × ML Security' !important;
}
</style>

###### tags: `ML Security`
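
The behavior of `get_dummies()`, `.drop()`, `test_size`, and `random_state` described above can be checked on a small table without the lab dataset. The following is a minimal sketch, not part of the lab notebook: only the column names `paymentMethod` and `label` come from the lab, while the `toy` frame, its rows, and the `amount` column are made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: the rows and the 'amount' column are invented;
# only the names 'paymentMethod' and 'label' mirror the lab.
toy = pd.DataFrame({
    "amount": [10, 25, 7, 300, 12, 45],
    "paymentMethod": ["paypal", "creditcard", "storecredit",
                      "creditcard", "paypal", "creditcard"],
    "label": [0, 0, 1, 0, 0, 1],
})

# One-hot encoding: the text column is replaced by one indicator
# column per category (paymentMethod_creditcard, ...).
encoded = pd.get_dummies(toy, columns=["paymentMethod"])
print(list(encoded.columns))

# axis=1 drops a column; axis=0 drops a row by its index label.
features = encoded.drop("label", axis=1)
without_first_row = encoded.drop(0, axis=0)

# A float test_size is a fraction of the data; an int is a row count.
X_train, X_test, y_train, y_test = train_test_split(
    features, encoded["label"], test_size=2, random_state=17)

# Repeating the call with the same random_state gives the same split.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    features, encoded["label"], test_size=2, random_state=17)
same_split = X_test.index.equals(X_test2.index)
print(len(X_train), len(X_test), same_split)
```

Printing the column list before and after encoding is a quick way to confirm that the categorical column is gone and that each category became its own indicator column.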
