# Unicorn illustrative examples
Ideal characteristics:
- real-life
- popular
- have problems that can be fixed
- fixes require:
  - modifying the pipeline after the initial test, ideally by adding side measurements synchronized with something else
  - modifying the deployment: choosing which devices to deploy on
- Quick modification and redeployment allow us to fix data collection or easily experiment with newer data
## VPN/Non VPN
NO, reason: could be fixed during data preprocessing (after data collection), so the platform is not required
## Heartbleed @ CIC-IDS-2017
Pros:
- CIC-IDS-2017
Cons:
- Shows only pipeline modification; deployment modification is not needed
## IDS @ CIC-IDS-2017
Pros:
- CIC-IDS-2017
- Deployment modification will help (more dispersed environment)
Cons:
- Pipeline modification isn't easy to advocate for
- Holland's paper :D
## Kitsune @ Mirai dataset
Pros:
- Easy to ask for pipeline modification
Cons:
- Deployment modification isn't so easy to advocate for
- No 'easy fix' even for pipeline modification
ToDo:
- Look at other datasets from the same source
- Pulse-wave attacks?
- Creating the dataset in a pulse-wave manner
## OS Fingerprinting @ CIC-IDS-2017
Pros:
- Simpler to understand
- Easy to advocate for deployment changes (again TTL)
Cons:
- Pipeline modification?
- Holland again
- Probably ignore because we cannot reproduce/recollect
## IoT Device Fingerprinting
Pros:
- Shows different architectures for netunicorn
Cons:
- No deployment modifications
- Pipeline modification is hard (P4 switch) and possibly cannot be split into separate tasks
## Pensieve
Cons:
- Not easy to check and hard to fit into the system
# Intermediate result
Something based on CIC-IDS-2017, because we already know its deployment problems (TTL), and it is very popular and close to real life.
## Mediocre example: IDS @ CIC-IDS-2017
The main problem with this dataset is the TTL-based separability of harmless/harmful traffic, because all attackers were outside of the network. So, it is easy to justify deployment modification: deploy attacker code inside the network and harmless code outside of the network.
Pipeline modification isn't so easy to justify. We could say we want to add periodic TTL modification, synchronized with the experiment start, to imitate attacks coming from different network distances and bring the dataset closer to a real-world distribution.
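A minimal sketch of what such a periodic TTL-modification step could look like, written as plain Python rather than netunicorn's actual task API (the function name and parameters are hypothetical; it assumes a Linux node with root access and `sysctl` available):

```python
import random
import subprocess
import time

# Common OS default initial TTL values; randomizing among them makes the
# TTL observed at the collector vary, removing the TTL shortcut feature.
TTL_CANDIDATES = [32, 64, 128, 255]

def randomize_ttl_periodically(duration_s: int = 600, period_s: int = 60) -> None:
    """Every `period_s` seconds, rewrite the node's default IPv4 TTL,
    so generated traffic appears to come from varying network distances."""
    deadline = time.time() + duration_s
    while time.time() < deadline:
        ttl = random.choice(TTL_CANDIDATES)
        # Requires root; changes the kernel's default TTL for newly sent packets.
        subprocess.run(["sysctl", "-w", f"net.ipv4.ip_default_ttl={ttl}"], check=True)
        time.sleep(period_s)

if __name__ == "__main__":
    # Launch together with the traffic-generation task at experiment start
    # so the TTL changes stay synchronized with data collection.
    randomize_ttl_periodically()
```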
Cons:
- Still Holland's paper :D
- Rather old
Proposal:
- Create another task on top of the dataset?
- Detection of particular classes, so we can enrich these classes and therefore justify pipeline modification?
## https://www.unb.ca/cic/datasets/darknet2020.html
Darknet traffic classification (public internet vs. darknet)
## https://www.unb.ca/cic/datasets/ids-2018.html
CSE-CIC-IDS-2018, collected on AWS
## M-Lab
Cons:
- It would probably be hard to find problems related to deployment, as the measurements are internet-wide
## Beauty and the Burst
ToDo:
- Take students' code