CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph

# CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph

---

## Abstract

- Blue: original text
- Green: translation
- Yellow: notes

:::info
To prevent the same known vulnerabilities from affecting different firmware, searching known vulnerabilities in binary firmware across different architectures is crucial. Because the accuracy of existing cross-architecture vulnerability search methods is not high, **we propose a staged approach based on support vector machine (SVM) and attributed control flow graph (ACFG) at the function level to improve the accuracy using prior knowledge**. Furthermore, for efficiency, we utilize the k-nearest neighbor (kNN) algorithm to prune and SVM to refine in the function prefilter stage. Although the accuracy of the proposed method using kNN-SVM approach is slightly lower than the accuracy of the method using only SVM, its efficiency is significantly enhanced. We have implemented our approach CVSkSA to search several vulnerabilities in real-world firmware images. The experimental results show that the accuracy of the proposed method using kNN-SVM approach is close to the accuracy of the method using only SVM in most cases, while the former is approximately four times faster than the latter.
:::

:::success
Building on SVM and the attributed control flow graph (ACFG), the authors propose a staged, function-level approach that uses prior knowledge to improve accuracy. For efficiency, the prefilter stage combines kNN and SVM; this is slightly less accurate than using SVM alone. The experiments show that the kNN-SVM variant is close to the SVM-only variant in accuracy in most cases, while being roughly four times faster.
:::

:::warning
This paper is a good way to understand the concept of the ACFG. How the ACFG-based method differs from the plain CFG used in earlier work is worth exploring.
:::

## Introduction

:::info
In general, firmware refers not only to the interface combining the hardware with the software, but also to the software residing in the hardware. Firmware is an important part of IoT systems. The BIOS in computer systems, the programs in extension ROM, and the executable programs of common network devices, such as routers, switches, and webcams, are all typical firmware.

However, similar to common software, firmware may also have vulnerabilities, i.e., weaknesses that attackers can exploit to perform unauthorized actions and that introduce potential risk to IoT systems (Costin et al. 2014). ZoomEye's statistical report (Knownsec 2017) showed that 23% of more than 60,000 routers available for public search were affected by backdoor mechanisms in 2013. According to the report of OWASP (Open Web Application Security Project) in 2014, among the top 10 attacks on IoT systems, the attacks on software and firmware in embedded devices ranked ninth (Open Web Application Security Project 2016). With increasingly more security incidents occurring due to malicious firmware (Adelstein et al. 2002), there is a greater awareness of the importance of the vulnerability search in firmware, especially in cross-architecture scenarios.

Because obtaining the source code of most firmware is difficult, the known vulnerability search works mainly at the binary code level. Many approaches exist to search known vulnerabilities at the binary code level. However, most of these existing approaches either utilize dynamic analysis or are limited to the same architecture. Dynamic analysis of firmware generally needs the specific devices or a simulation environment to run the target binary firmware. Additionally, there are several strict requirements for the target code to execute, so it is laborious to apply dynamic analysis to cross-architecture vulnerability search cases. Other approaches, such as the k-gram and sequence alignment of instructions, obtain the opcodes or instructions for analysis, but these approaches are closely tied to the specific architecture; therefore, directly applying them to the known vulnerability search across different architectures is difficult. Although Pewny et al. (2015) studied the search for known binary vulnerabilities across different architectures, there has been little research on cross-architecture known vulnerability search.

Recently, one advanced method to search known vulnerabilities across different architectures was proposed by Eschweiler et al. (2016). They employed a prescreening method to screen out most of the dissimilar functions and then used the MCS (maximum common subgraph) algorithm to determine the true matching function among a few suspicious functions. Although the method is effective in the cross-architecture cases, its prescreening stage seems unreliable and might result in poor accuracy in some scenarios (e.g., the case in Section 3).

In this paper, to enhance the overall efficiency, we adopted a staged strategy for the vulnerability search in firmware. To obtain a small portion of the candidate functions quickly, we first used kNN to prune (similar to the work by Eschweiler et al. in 2016), and then used SVM to refine, which can result in a higher accuracy in the prescreening stage. Then, inspired by the work of Feng et al. (2016), we used bipartite matching to pick out the true matching functions from the suspicious functions that remained from the prescreening stage. The experimental results show that CVSkSA achieves good performance in vulnerability search.

This paper is a significant extension of the conference paper published at DSA 2017 (Lin et al. 2017). On the basis of the original paper, we further propose a hybrid method using a kNN-SVM approach to improve the efficiency considerably at an acceptable or negligible cost of accuracy. The idea is to use kNN to quickly screen out obvious non-candidates, so that SVM is applied only to a small number of highly suspected functions. Compared with our previous work that applied only SVM (Lin et al. 2017), the kNN-SVM-based method reduces the query time severalfold, e.g., from 0.18 s to 0.032 s in the condition "ARM to MIPS", at the expense of slightly lower correctness, e.g., from 99.7 to 99.6%. The kNN-SVM-based method also outperforms other state-of-the-art approaches, i.e., Multi-MH (Pewny et al. 2015), Multi-k-MH (Pewny et al. 2015), and discovRE (Eschweiler et al. 2016), in terms of overall performance.

In summary, our major contributions are as follows:

1. We show a staged approach, CVSkSA, to search vulnerabilities in binary firmware across different architectures, which can take advantage of the knowledge that we already have about the vulnerabilities.
2. We perform some experiments on a baseline dataset and real-world firmware images. Compared with Multi-MH (Pewny et al. 2015), Multi-k-MH (Pewny et al. 2015), and discovRE (Eschweiler et al. 2016), the experimental results show that CVSkSA has not only a better accuracy in vulnerability search but also a higher efficiency.

The rest of this paper is organized as follows: Section 2 is the overview of the proposed approach. Section 3 mainly discusses the implementation of our approach, and compares it with the state-of-the-art approaches. Section 4 shows the experimental evaluation of our approach on real-world vulnerable functions and firmware images under realistic conditions. Section 5 presents some related works about recognizing the known vulnerabilities and some works about the hybrid method using kNN-SVM in other application fields. Section 6 mainly demonstrates some restrictions of our approach. Section 7 provides a summary of our work and presents the future work.
:::

:::success
Firmware is the interface between hardware and software. It includes the BIOS, programs in ROM, and the programs of network devices. Like other software, it may contain vulnerabilities, meaning weaknesses that an attacker can exploit to perform unauthorized operations, which pose a potential risk to IoT systems.

Recovering the original source code from firmware is quite difficult, so known-vulnerability search is mostly done on the binary executable. Most existing search methods either rely on dynamic analysis or are restricted to a single architecture (they cannot work across architectures). Dynamic analysis is constrained and handles different architectures poorly. Other methods, such as k-gram and instruction sequence alignment, analyze opcodes or instructions, but they only fit a specific architecture.

Pewny et al. (2015) raised the cross-architecture vulnerability search problem in earlier work, and the area is still relatively little studied. More recently, Eschweiler et al. (2016) proposed a stronger method: it uses prescreening to filter out most dissimilar functions, then uses the MCS (maximum common subgraph) algorithm to find the match among the remaining suspicious functions. The method is effective across architectures, but its prescreening stage looks unreliable and performs poorly in some scenarios.

The authors' method uses a staged strategy to improve efficiency. To obtain a small set of candidate functions, they first prune with kNN and then refine with SVM, which can raise accuracy. This paper is an extension of the DSA 2017 paper (Lin et al. 2017).
:::

:::warning
Read this as a continuation of the previous paper.
:::

## Approach overview

###### tags: `thesis`
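
The two-step prefilter described in the Introduction (kNN to prune, then SVM to refine) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature vectors, the function names `knn_prune`, `make_linear_refiner`, and `prefilter`, and the weights are all hypothetical. The refine step uses a hand-written linear decision function, which has the same form as a trained linear SVM but is not trained here.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def knn_prune(query: Vector, candidates: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: keep the k candidates nearest to the query (Euclidean distance)."""
    return heapq.nsmallest(
        k, candidates, key=lambda name: math.dist(query, candidates[name])
    )


def make_linear_refiner(weights: Vector, bias: float) -> Callable[[Vector, Vector], bool]:
    """Stage 2 stand-in: accept a pair when w . |q - c| + b > 0.

    A trained linear SVM has a decision function of this form;
    the weights passed in below are invented for illustration.
    """
    def is_match(query: Vector, cand: Vector) -> bool:
        diff = [abs(q - c) for q, c in zip(query, cand)]
        return sum(w * d for w, d in zip(weights, diff)) + bias > 0

    return is_match


def prefilter(
    query: Vector,
    candidates: Dict[str, Vector],
    k: int,
    is_match: Callable[[Vector, Vector], bool],
) -> List[str]:
    """Run the cheap kNN prune first, then the classifier only on the shortlist."""
    shortlist = knn_prune(query, candidates, k)
    return [name for name in shortlist if is_match(query, candidates[name])]


# Hypothetical per-function feature vectors (e.g. simple numeric counts).
query = (10.0, 3.0, 2.0)
candidates = {
    "f_a": (10.0, 3.0, 2.0),
    "f_b": (11.0, 3.0, 2.0),
    "f_c": (14.0, 6.0, 2.0),
    "f_d": (40.0, 9.0, 7.0),
    "f_e": (2.0, 0.0, 0.0),
}
is_match = make_linear_refiner(weights=(-1.0, -1.0, -1.0), bias=2.0)
print(prefilter(query, candidates, k=3, is_match=is_match))
```

With these made-up numbers, the kNN step keeps `f_a`, `f_b`, and `f_c`, and the linear step then drops `f_c`. The point of the ordering is that the more expensive classifier runs on only `k` functions instead of the whole firmware image, which is the efficiency argument the paper makes for kNN-SVM over SVM alone.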