# ML for Cyber Security
Project Report\
CSAW-HackML-2020\
Name: Eugene Wang\
Net-ID: yjw259
https://github.com/y56/CSAW-HackML-2020/blob/master/report.md
https://hackmd.io/@y56/H1xHFwbTD
## Introduction
In this lab we are given backdoored CNNs (called bad-net/bd_model) with known architecture (refer to `architecture.py`) and we want to "repair" the bad-net. Imagine we are buying service to train a model for us, or using some unknown source of model. Attackers may train the model to perform normally on "clean" data while output misleading result on "poisoned data" with "trigger" it.
## Methodology
Following the method in ***Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks***, I use fine-pruning as the defense approach. Although the authors also mention the pruning-aware attack. I assume the bad net we considered as just baseline attack.
### Pruning
I reset the the 77% lowest contribution neurons under clean input. The idea behind is that we believe the backdoor pattern will be capture in the `conv_3` layer but its contribution will be low when clean input are processed.
### Fine-tuning
In this step, I retrain the model on clean input. As mentioned in the paper, this is a technique of transfer learning. We can view it as we drop some suspicious rules in the bad net first, and then train the model based on the detoxified (yet less powerful) model. Note that those reset neurons will gain weights/bias again. I save fine-pruned models for each bd model.
### Combining
I use a function `accuracy_calculator_for_combined_models()` to call both bad net and fine-pruned net and compare their outputs. An input will be considered clean if the two models give the same result, otherwise, backdoored.
### Detecting backdoored data for evaluating performance
:::warning
Actually no need for this since the provided posioned data are completely backdoored.
:::
For `anonymous_1_poisoned_data.h5` I use number of purple `(128,255,255)` pixels to detect backdoored data. This only for evaluation. I am not using this information to do the repairing. I use `check_anonymous_1_poisoned_data.py` to label backdoored inputs of `anonymous_1_poisoned_data.data` such that I can calculated *accuracy on clean data*, *attack success rate on backdoored data*, and *attack detection rate*.
## Discussion
### anonymous_1_bd_net
```
python3 eval_defense.py data/anonymous_1_poisoned_data.h5 models/anonymous_1_bd_net.h5 data/trigger_anonymous_1_poisoned_data.pkl
```
* before repair
* bad net on clean validation data:
* acc: 97.18
* bad net on clean test data:
* acc: 97.19
* bad net on its corresponding poisoned data:
* acc: 91.40
* pruned
* pruned bad net on clean validation data:
* acc: 36.81
* pruned bad net on clean test data: acc:
* acc: 37.28
* pruned bad net on its corresponding poisoned data:
* acc: 50.08
* tuned and pruned
* tuned pruned bad net on clean validation data:
* acc: 99.89
* tuned pruned bad net on clean test data:
* acc: 95.259
* tuned pruned bad net on its corresponding poisoned data:
* acc: 8.37
* repaired (by comparing)
* repaired net on clean validation data:
* acc: 97.13
* inferred attack_ratio: 2.80 (true as 0)
* repaired net on clean test data:
* * acc: 93.84
* inferred attack_ratio: 5.64 (true as 0)
* repaired net on its corresponding poisoned data:
* acc: 8.35
* inferred attack_ratio: 83.66
* true attack ratio (using purple detection) :
* 90.93
* accuracy on those clean data within poisoned data:
* 0.21
* I don\'t know why so low, maybe all data od `anonymous_1_poisoned_data.h5` are backdoored? So actually no clean data in it.
* All data in `anonymous_1_poisoned_data.h5` are labeled as zero. So, very possible there are no clean data in it.
* detected rate for those truly bd data within poisoned data:
* 92.14
* attack succes for those clean data wihun poisoned data:
* 7.67
## Reference
Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks
Kang Liu, Brendan Dolan-Gavitt, Siddharth Garg
https://arxiv.org/abs/1805.12185
https://github.com/kangliucn
## log
### sunglasses_bd_net
```bash
python3 eval_defense.py data/sunglasses_poisoned_data.h5 models/sunglasses_bd_net.h5
```
```python=
bad net on clean validation data: acc: 97.88689702953148
bad net on clean test data: acc: 97.77864380358535
bad net on its corresponding poisoned data: acc: 99.99220576773187
pruned bad net on clean validation data: acc: 35.51571836840738
pruned bad net on clean test data: acc: 35.861262665627436
pruned bad net on its corresponding poisoned data: acc: 93.4684333593141
tuned pruned bad net on clean validation data: acc: 99.76617303195636
tuned pruned bad net on clean test data: acc: 93.4684333593141
tuned pruned bad net on its corresponding poisoned data: acc: 9.540140296180827
repaired net on clean validation data: acc, inferred attack_ratio: (97.70503160994197, 2.260327357755261)
repaired net on clean test data: acc, inferred attack_ratio (92.79033515198752, 6.609508963367109)
repaired net on its corresponding poisoned data: overall acc, inferred attack_ratio: (9.540140296180827, 90.45206547155105)
```
### multi_trigger_multi_target_bd_net
#### eyebrows_poisoned_data
```bash
python3 eval_defense.py "data/Multi-trigger Multi-target/eyebrows_poisoned_data.h5" models/multi_trigger_multi_target_bd_net.h5
```
```python=
bad net on clean validation data: acc: 96.26742876937733
bad net on clean test data: acc: 96.00935307872174
bad net on its corresponding poisoned data: acc: 91.34840218238503
pruned bad net on clean validation data: acc: 46.03793193037152
pruned bad net on clean test data: acc: 45.74434918160561
pruned bad net on its corresponding poisoned data: acc: 74.29851909586905
tuned pruned bad net on clean validation data: acc: 99.91339741924308
tuned pruned bad net on clean test data: acc: 95.22213561964146
tuned pruned bad net on its corresponding poisoned data: acc: 58.47622759158223
repaired net on clean validation data: acc, inferred attack_ratio: (96.25010825322595, 3.6806096821685284)
repaired net on clean test data: acc, inferred attack_ratio (93.03975058456741, 6.266562743569759)
repaired net on its corresponding poisoned data: overall acc, inferred attack_ratio: (58.44699922057678, 34.343335931410756)
```
#### lipstick_poisoned_data
```bash
python3 eval_defense.py "data/Multi-trigger Multi-target/lipstick_poisoned_data.h5" models/multi_trigger_multi_target_bd_net.h5
```
```python=
bad net on clean validation data: acc: 96.26742876937733
bad net on clean test data: acc: 96.00935307872174
bad net on its corresponding poisoned data: acc: 91.52377240841777
pruned bad net on clean validation data: acc: 46.03793193037152
pruned bad net on clean test data: acc: 45.74434918160561
pruned bad net on its corresponding poisoned data: acc: 27.572096648480127
tuned pruned bad net on clean validation data: acc: 99.96535896769724
tuned pruned bad net on clean test data: acc: 95.17537022603274
tuned pruned bad net on its corresponding poisoned data: acc: 0.9840218238503509
repaired net on clean validation data: acc, inferred attack_ratio: (96.24144799515025, 3.749891746774054)
repaired net on clean test data: acc, inferred attack_ratio (93.02416212003118, 6.266562743569759)
repaired net on its corresponding poisoned data: overall acc, inferred attack_ratio: (0.9840218238503509, 91.69914263445051)
```
#### sunglasses_poisoned_data
```bash
python3 eval_defense.py "data/Multi-trigger Multi-target/sunglasses_poisoned_data.h5" models/multi_trigger_multi_target_bd_net.h5
```
```python=
bad net on clean validation data: acc: 96.26742876937733
bad net on clean test data: acc: 96.00935307872174
bad net on its corresponding poisoned data: acc: 100.0
pruned bad net on clean validation data: acc: 46.03793193037152
pruned bad net on clean test data: acc: 45.74434918160561
pruned bad net on its corresponding poisoned data: acc: 0.009742790335151987
tuned pruned bad net on clean validation data: acc: 99.89607690309171
tuned pruned bad net on clean test data: acc: 95.18316445830087
tuned pruned bad net on its corresponding poisoned data: acc: 0.13639906469212784
repaired net on clean validation data: acc, inferred attack_ratio: (96.2068069628475, 3.7585520048497445)
repaired net on clean test data: acc, inferred attack_ratio (92.98519095869057, 6.360093530787217)
repaired net on its corresponding poisoned data: overall acc, inferred attack_ratio: (0.13639906469212784, 99.86360093530787)
```
### anonymous_2_bd_net
```bash
python3 eval_defense_nodata.py nodata models/anonymous_2_bd_net.h5
```
```python=
bad net on clean validation data: acc: 95.82575560751711
bad net on clean test data: acc: 95.96258768511302
pruned bad net on clean validation data: acc: 36.780116047458215
pruned bad net on clean test data: acc: 37.44349181605612
tuned pruned bad net on clean validation data: acc: 99.91339741924308
tuned pruned bad net on clean test data: acc: 94.94933749025721
repaired net on clean validation data: acc, inferred attack_ratio: (95.79111457521434, 0.04156923876331515)
repaired net on clean test data: acc, inferred attack_ratio (92.64224473889323, 0.06851130163678877)
```