# Network Intrusion Detection System (NIDS) using Tree-based Ensemble Learning

## 1. Overview

Have you ever wondered whether somebody is trying to "hack" your network? Or how "secure" the networks we connect our devices to really are? Most of the time they are secure enough, unless one or more skilled attackers want to break in. When that happens, not only can your passwords and privacy be compromised, but a whole organization can fall to pieces under the impact of a cyber-attack. That is when an antivirus or a firewall alone is not enough, and where network intrusion detection comes in to help us against the bad guys.

## 2. Project Structure

A Network Intrusion Detection System (NIDS) is a security software application designed to

- monitor network traffic behaviour,
- detect **malicious activity**,
- identify cyber-attacks,
- alert administrators.

The approach of this project is to train several tree-based **machine learning** models on a pre-processed dataset and compare their accuracy and precision.

```graphviz
digraph hierarchy {
  nodesep=0.4 // increases the separation between nodes
  node [color=Darkgreen,fontname=Arial,shape=box] // all nodes share this shape and colour
  edge [color=black, style=solid] // all edges look like this
  "Networking"->{SDN, NFV, "Cyber-Security"}
  "Cyber-Security" -> {NIDS, HIDS}
  NIDS->{"CSE-CIC-IDS-Dataset"}
  "CSE-CIC-IDS-Dataset"->"Machine Learning"
}
```

## 3. Details of tree-based algorithms

```graphviz
digraph hierarchy {
  nodesep=0.4 // increases the separation between nodes
  node [color=Darkgreen,fontname=Arial,shape=box] // all nodes share this shape and colour
  edge [color=black, style=solid] // all edges look like this
  "Machine Learning"->{ "Decision Tree" "Random Forest" "Bagging" "XGBoost" "CatBoost" "LightGBM" }
}
```

### Binary Decision Tree Classifier

```graphviz
digraph BinaryTreeClassifier {
  node [shape=box];
  Root [label="Weather?\nSunny / Rainy"];
  Left [label="Play"];
  Right [label="Temperature?\nHot / Cold"];
  RightLeft [label="Not Play"];
  RightRight [label="Play"];
  Root -> Left [label=" Sunny"];
  Root -> Right [label=" Rainy"];
  Right -> RightLeft [label=" Hot"];
  Right -> RightRight [label=" Cold"];
}
```

### Random Forest

```graphviz
digraph RandomForestClassifier {
  subgraph cluster_tree1 {
    label="Tree 1";
    node [shape=box, style=rounded];
    Tree1Root [label="Age? \n < 30 / >= 30"];
    Tree1Left [label="Eligible"];
    Tree1Right [label="Not Eligible"];
    Tree1Root -> Tree1Left [label=" < 30"];
    Tree1Root -> Tree1Right [label=" >= 30"];
  }
  subgraph cluster_tree2 {
    label="Tree 2";
    node [shape=box, style=rounded];
    Tree2Root [label="Income? \n < 50k / >= 50k"];
    Tree2Left [label="Not Eligible"];
    Tree2Right [label="Eligible"];
    Tree2Root -> Tree2Left [label=" < 50k"];
    Tree2Root -> Tree2Right [label=" >= 50k"];
  }
}
```

### Bagging

```graphviz
digraph Bagging {
  subgraph root_tree {
    label="Root Tree";
    node [shape=box];
    "Root Tree" -> {Tree1Root Tree2Root Tree3Root};
    subgraph cluster_tree1 {
      label="Tree 1";
      node [shape=box];
      Tree1Root [label="Age? \n < 30 / >= 30"];
      Tree1Left [label="Eligible"];
      Tree1Right [label="Not Eligible"];
      Tree1Root -> Tree1Left [label=" < 30"];
      Tree1Root -> Tree1Right [label=" >= 30"];
    }
    subgraph cluster_tree2 {
      label="Tree 2";
      node [shape=box];
      Tree2Root [label="Income? \n < 50k / >= 50k"];
      Tree2Left [label="Not Eligible"];
      Tree2Right [label="Eligible"];
      Tree2Root -> Tree2Left [label=" < 50k"];
      Tree2Root -> Tree2Right [label=" >= 50k"];
    }
    subgraph cluster_tree3 {
      label="Tree 3";
      node [shape=box];
      Tree3Root [label="Experience? \n < 2 years / >= 2 years"];
      Tree3Left [label="Not Eligible"];
      Tree3Right [label="Eligible"];
      Tree3Root -> Tree3Left [label=" < 2 years "];
      Tree3Root -> Tree3Right [label=" >= 2 years "];
    }
  }
  subgraph results {
    label="results";
    node [shape=box];
    {Tree3Left Tree3Right Tree2Right Tree2Left Tree1Right Tree1Left} -> {"Decision"};
  }
}
```

## 4. Dataset Description

The dataset was created as the result of a collaborative project between the Canadian Institute for Cybersecurity (CIC) and the Communications Security Establishment (CSE). Because of privacy and confidentiality concerns, organizations rarely share their traffic data, so realistic, publicly available intrusion datasets are extremely scarce.

![image](https://hackmd.io/_uploads/Sk32fmXHT.png)

Figure 1: Network Topology.

> The dataset `CSE-CIC-IDS2018` is hosted on Amazon Web Services (AWS).

## 5. Types of Attacks

:::warning
| No. | Attack |
|---|--------|
| 1 | Brute-force attack |
| 2 | Web attack |
| 3 | Infiltration attack |
| 4 | Botnet attack |
| 5 | DDoS and port scan |
:::

## 6. Results

:::success
![image](https://hackmd.io/_uploads/ryQmiRRHa.png =200x150)
![image](https://hackmd.io/_uploads/S17rjCCr6.png =200x150)
![image](https://hackmd.io/_uploads/HyxUjCAS6.png =200x150)
![image](https://hackmd.io/_uploads/H1edDi0RHa.png =200x150)
![image](https://hackmd.io/_uploads/rJVdsRCHT.png =200x150)
![image](https://hackmd.io/_uploads/H1lKsRRBT.png =200x150)
![image](https://hackmd.io/_uploads/rkciiA0Sa.png)
:::

## 7. Prerequisites and Installation

You can run and test this project directly in Google Colab by opening the following URL or scanning the first QR code below.

[Open Google Colab](https://githubtocolab.com/mjacker/MJCapstone/blob/master/0_merged_ipynb_files_for_google_colab.ipynb)

Alternatively, clone this repository and set up a `venv` environment using the `requirement.yml` file. The repository is available at the following URL or through the second QR code below.

[Open Github Repository](https://github.com/mjacker/MJCapstone/tree/master)

![Google Colab - ML-IDS](https://hackmd.io/_uploads/B1yYpc-Sa.png =300x300)
![My github Capstone](https://hackmd.io/_uploads/HyRFT9ZB6.png =300x300)
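
### Appendix: a minimal bagging sketch

To make the Bagging diagram from the tree-based algorithms section more concrete, here is a small, dependency-free sketch of the idea: train several one-split trees ("stumps") on bootstrap samples and let them vote. This is not the code used in the project notebooks. The feature names (`packets_per_second`, `failed_logins`), the toy rows, and all helper functions are hypothetical and exist only for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue  # a split must send samples to both sides
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No valid split (all sampled rows identical): always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("-inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] < t else right_label


def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical toy flows: [packets_per_second, failed_logins]
rows = [
    [12, 0], [30, 1], [25, 0], [48, 2], [18, 1],
    [400, 20], [650, 35], [900, 60], [520, 28], [780, 45],
]
labels = ["benign"] * 5 + ["attack"] * 5

ensemble = fit_bagging(rows, labels)
print(predict_bagging(ensemble, [1000, 50]))
print(predict_bagging(ensemble, [20, 1]))
```

A Random Forest follows the same pattern but additionally restricts each split to a random subset of features, which this sketch does not do. In practice a library implementation would be used in place of these hand-written helpers.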