# Unicorn illustrative examples
Ideal characteristics:
- real-life
- popular
- have problems that can be fixed
- fixes require:
  - modifying the pipeline after the initial test, ideally by adding side measurements synchronized with something else
  - modifying the deployment: choosing which devices to deploy on
- Quick modification and redeployment allow us to fix data collection or easily experiment with newer data
## VPN/Non VPN
NO, reason: could be fixed during data preprocessing (after data collection), so the platform is not required
## Heartbleed @ CIC-IDS-2017
Pros:
- CIC-IDS-2017
Cons:
- Shows only pipeline modification; deployment modification is not needed
## IDS @ CIC-IDS-2017
Pros:
- CIC-IDS-2017
- Deployment modification will help (more dispersed environment)
Cons:
- Pipeline modification isn't easy to advocate for
- Holland's paper :D
## Kitsune @ Mirai dataset
Pros:
- Easy to ask for pipeline modification
Cons:
- Deployment modification isn't so easy to advocate for
- No 'easy fix' even for pipeline modification
ToDo:
- Look at other datasets from the same source
- Pulse-wave attacks?
- Creating the dataset in a pulse-wave manner
## OS Fingerprinting @ CIC-IDS-2017
Pros:
- Simpler to understand
- Easy to advocate for deployment changes (again TTL)
Cons:
- Pipeline modification?
- Holland again
- Probably ignore because we cannot reproduce/recollect
## IoT Device Fingerprinting
Pros:
- Shows different architectures for netunicorn
Cons:
- No deployment modifications
- Pipeline modification is hard (P4 switch) and possibly cannot be split into separate tasks
## Pensieve
Cons:
- Not easy to check and hard to fit into the system
# Intermediate result
Something based on CIC-IDS-2017, because we already know its deployment problems (TTL), and it is very popular and close to real life.
## Mediocre example: IDS @ CIC-IDS-2017
The main problem with this dataset is the TTL-based separability of harmless/harmful traffic, because all attackers were outside of the network. So, it is easy to justify deployment modification: deploy attacker code inside the network and harmless code outside of the network.
Pipeline modification isn't so easy to justify. We could say we want to add periodic TTL modification, synchronized with the experiment start, to imitate attacks coming from different network distances and bring the dataset closer to a real-world distribution.
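A minimal sketch of what such a periodic TTL-modification step could look like, written as plain Python rather than netunicorn's actual task API (the function name and parameters are hypothetical; it assumes a Linux node with root access and `sysctl` available):

```python
import random
import subprocess
import time

# Common OS default initial TTL values; randomizing among them makes the
# TTL observed at the collector vary, removing the TTL shortcut feature.
TTL_CANDIDATES = [32, 64, 128, 255]

def randomize_ttl_periodically(duration_s: int = 600, period_s: int = 60) -> None:
    """Every `period_s` seconds, rewrite the node's default IPv4 TTL,
    so generated traffic appears to come from varying network distances."""
    deadline = time.time() + duration_s
    while time.time() < deadline:
        ttl = random.choice(TTL_CANDIDATES)
        # Requires root; changes the kernel's default TTL for newly sent packets.
        subprocess.run(["sysctl", "-w", f"net.ipv4.ip_default_ttl={ttl}"], check=True)
        time.sleep(period_s)

if __name__ == "__main__":
    # Launch together with the traffic-generation task at experiment start
    # so the TTL changes stay synchronized with data collection.
    randomize_ttl_periodically()
```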
Cons:
- Still Holland's paper :D
- Rather old
Proposal:
- Create another task on top of the dataset?
- Detection of particular classes, so we can enrich these classes and therefore justify pipeline modification?
## https://www.unb.ca/cic/datasets/darknet2020.html
Darknet traffic classification (public internet vs. darknet)
## https://www.unb.ca/cic/datasets/ids-2018.html
CSE-CIC-IDS-2018, collected on AWS
## M-Lab
Cons:
- It would probably be hard to find problems related to deployment, as the measurements are internet-wide
## Beauty and the Burst
ToDo:
- Take students' code