網絡協議逆向分析

# 網絡協議逆向分析 ## 實驗簡介 ![](https://hackmd.io/_uploads/rJ-sNpeUn.png) ## 关于bitmap,bloom filter,sketch在网络流中的应用 ### Bitmap Bitmap是一种基础的数据结构，它用于表示1位或0位的二进制值集合。在网络流中，Bitmap用来帮助确定哪些IP地址已经被访问过，以及哪些端口正在使用等等。同时Bitamp也可以用于流量监测方面，例如记录传输协议类型和客户端操作系统，从而帮助识别并防范恶意攻击。 ### Bloom Filter Bloom Filter是一种高效的概率型数据结构，用于判断元素是否属于某个集合。在网络流中，Bloom Filter被广泛用于减少缓存服务器的负载压力。它可以快速确定一个特定的URL是否已经被访问过。同时，Bloom Filter还常用于常量字符串的比较工作中（如DNS查询），从而提高处理速度。 ### Sketch Sketch是一类算法，其目标是利用小内存预计算和估计计数各种数据集合。Sketch使用哈希函数来将原始数据转换为计数器数组，并使用一些统计技巧来估计结果。在网络流中，Sketch也是很有用的。例如，可以将它们用于对流量进行相关性分析（如流量聚类或基于应用程序的分类）以及对网络状况进行快速探测。 ## 設計Direat Bitmap ```python import hashlib class DirectBitmap: def __init__(self, size): self.size = size self.bitmap = [0 for _ in range(size)] self.count = 0 def add_packet(self, packet): index = hash(packet) % self.size if self.bitmap[index] == 1: self.count += 1 self.bitmap[index] = 1 def get_count(self): return sum(self.bitmap) ``` 这段代码实现了Bloom Filter功能，在初始化时，设定了Bloom Filter的大小，创建了一个全为0的列表，用于存储数据包的哈希值，Hash函數调用hashlib.sha256库，取值則是把(srclP, dstlP, srcPort, dstPort, Protocol, size)元組當成字符串。 ## 在Direct Bitmap下，L與Hash的碰撞次數測試如下圖所示 ![](https://hackmd.io/_uploads/rJDiFk7Ih.png) 上图显示L与hash结果之间的关系，图中的L size越大，Number of hash's collisions越小，他们之间是呈反比例关系的。 ## 关于测试标准bloom filter，以下是关于该实验的设计与分析 ###### 設計Bloom Filter * **設計Bloom Filter** * **收集PCAP文件用作測試Bloom Filter的數據** * **列出BloomFilter長度N，Hash函數個數K，數據流個體M之間的關係** * **測試使用Bitarray與使用list在空間利用上的差異** * **比較 bloom filter / set / dict 之間的大小** ```python import hashlib import bitarray import random import string import time import sys class BloomFilter: def __init__(self, size): self.size = size self.bitarray = bitarray.bitarray(size) self.list = [0] * size self.bitarray.setall(False) self.hash_functions = [hashlib.md5, hashlib.sha1, hashlib.sha256] def add_data(self, data): for hash_func in self.hash_functions: index = int(hash_func(data).hexdigest(), 16) % self.size self.bitarray[index] = 1 self.list[index] = 1 def is_data_exist(self, data): result = bitarray.bitarray([True]) for hash_func in self.hash_functions: index = int(hash_func(data).hexdigest(), 16) % self.size result &= self.bitarray[index:index+1] if result[0]: return True else: return False ``` ### 代碼解釋為了可以直接操縱 bit 所以特別找了 bitarray 這個套件來使用，要不然使用一般的 python list 的話 size 就會比較大。首先我們初始化一個 class，這次我們使用 3 個 hash function: md5, sha1, sha256，在 python 的 hashlib 已經實作好了，可以直接使用。 ```python def __init__(self, size): self.size = size self.bitarray = bitarray.bitarray(size) self.list = [0] * size self.bitarray.setall(False) self.hash_functions = [hashlib.md5, hashlib.sha1, hashlib.sha256] ``` 接下來就是實作 add_data 這個 function，hash 之後 mod 就可拿到 index，再去改值就好了，為了看到底差多少，這邊連 list 也要一起改值喔。 ```python def add_data(self, data): for hash_func in self.hash_functions: index = int(hash_func(data).hexdigest(), 16) % self.size self.bitarray[index] = 1 self.list[index] = 1 ``` 再來就是確認該筆 data 有沒有出現過的 is_data_exist function，只要把拿到的 bit 全部 AND 起來看是不是 1 就可以了。 ```python def is_data_exist(self, data): result = bitarray.bitarray([True]) for hash_func in self.hash_functions: index = int(hash_func(data).hexdigest(), 16) % self.size result &= self.bitarray[index:index+1] if result[0]: return True else: return False ``` Bloom Filter 數據測試根據實驗要求，數據采用PCAP格式。引入庫 ```python from scapy.all import * from scapy.layers.inet import IP, TCP, UDP from BloomFilter import BloomFilter 讀取PCAP文件內容 def readData(path: str) -> (list, set): pcap = rdpcap(path) data_list = list() dataSet = set() for pkt in pcap: # 获取时间 time = pkt.time try: # 获取IP地址 srcIP = pkt[IP].src dstIP = pkt[IP].dst # 获取端口号 if pkt.haslayer(TCP): srcPort = pkt[TCP].sport dstPort = pkt[TCP].dport protocol = "TCP" size = len(pkt[TCP]) elif pkt.haslayer(UDP): srcPort = pkt[UDP].sport dstPort = pkt[UDP].dport protocol = "UDP" size = len(pkt[UDP]) else: continue x = srcIP + dstIP + str(srcPort) + str(dstPort) + protocol + str(size) data_list.append(str.encode(x)) dataSet.add(x) except Exception as e: print(e) break return data_list, dataSet ``` 主函數 ```python if name == “main”: data_list, dataSet = readData("test.pcap") test_list, _ = readData("now.pcap") contain_data_size = len(data_list) test_case_amount = len(test_list) size = 20000 print("Number of items in the filter: ", contain_data_size) print("Number of bits in the filter: ", size) print("Number of hash functions: ", 3) false_positive_amount = 0 bloomFilter = BloomFilter(size) # Add data on BloomFilter for data in data_list: bloomFilter.add_data(data) # Test BloomFilter's false positive for data in test_list: if bloomFilter.is_data_exist(data) and (data not in dataSet): false_positive_amount += 1 fail_rate = round((false_positive_amount / test_case_amount) * 100, 2) print("{:20} {}\n".format("success case amount:", test_case_amount - false_positive_amount)) print("{:20} {}\n".format("fail case amount:", false_positive_amount)) print("{:20} {}%\n".format("fail rate:", fail_rate)) ``` 實驗結果設 N = 11140, M = 20000, K = 3 則 ![](https://hackmd.io/_uploads/H1LYz_j8h.png) ![](https://hackmd.io/_uploads/BkTl9_i8h.png) 可以看見，最終的結果很接近理論分析的公式。設 N = 1000, M = 10000, K = 3 則 ![](https://hackmd.io/_uploads/BkkLxHnLh.png) ![](https://hackmd.io/_uploads/rkoUlS2Ih.png) 可以看見，最終的結果很接近理論分析的公式。