# CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph
---
## Abstract
- Blue: original text
- Green: translation
- Yellow: thoughts
:::info
To prevent the same known vulnerabilities from affecting different firmware, searching
known vulnerabilities in binary firmware across different architectures is crucial. Because
the accuracy of existing cross-architecture vulnerability search methods is not high, **we propose a staged approach based on support vector machine (SVM) and attributed control flow graph (ACFG) at the function level to improve the accuracy using prior knowledge**. Furthermore,
for efficiency, we utilize the k-nearest neighbor (kNN) algorithm to prune and SVM
to refine in the function prefilter stage. Although the accuracy of the proposed method using
kNN-SVM approach is slightly lower than the accuracy of the method using only SVM, its
efficiency is significantly enhanced. We have implemented our approach CVSkSA to search
several vulnerabilities in real-world firmware images. The experimental results show that
the accuracy of the proposed method using kNN-SVM approach is close to the accuracy of
the method using only SVM in most cases, while the former is approximately four times
faster than the latter.
:::
:::success
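Based on SVM and the attributed control flow graph (ACFG), the authors propose a staged approach at the function level that improves accuracy by using prior knowledge.
For efficiency, they also use a kNN-SVM approach (kNN to prune, SVM to refine), although its accuracy is slightly lower than that of SVM alone.
The experimental results show that the accuracy of the kNN-SVM method is close to that of the SVM-only method in most cases, while the former is approximately four times faster.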
:::
:::warning
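This paper is a good way to understand the concept of the ACFG. How the ACFG-based approach differs from a conventional CFG is worth exploring.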
:::
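
The "kNN to prune, SVM to refine" prefilter from the abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the three-value feature vectors, the function names, and the hand-set `WEIGHTS` and `BIAS` are all hypothetical, and a plain linear decision function stands in for a trained SVM.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two function-level feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_prune(query, functions, k):
    """Stage 1: keep the names of the k functions closest to the query."""
    ranked = sorted(functions, key=lambda name: euclidean(query, functions[name]))
    return ranked[:k]

def linear_score(query, candidate, weights, bias):
    """Stand-in for a trained SVM decision function (assumption):
    a linear score over the per-feature absolute differences."""
    diff = [abs(x - y) for x, y in zip(query, candidate)]
    return sum(w * d for w, d in zip(weights, diff)) + bias

def prefilter(query, functions, k, weights, bias):
    """Stage 2: run the costlier classifier only on the kNN survivors."""
    survivors = knn_prune(query, functions, k)
    return [name for name in survivors
            if linear_score(query, functions[name], weights, bias) > 0]

# Hypothetical features: (instruction count, basic-block count, call count).
FUNCTIONS = {
    "f_a": [10, 3, 2],
    "f_b": [11, 3, 2],
    "f_c": [40, 12, 9],
    "f_d": [14, 6, 2],
}
QUERY = [10, 3, 3]             # features of the known vulnerable function
WEIGHTS = [-1.0, -1.0, -1.0]   # made-up values, not learned from data
BIAS = 3.0

CANDIDATES = prefilter(QUERY, FUNCTIONS, k=3, weights=WEIGHTS, bias=BIAS)
print(CANDIDATES)
```

The point of the two stages is that the cheap distance ranking discards the obviously dissimilar functions, so the classifier is evaluated on only `k` functions instead of the whole firmware image.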
## Introduction
:::info
In general, firmware refers not only to the interface combining the hardware with the software,
but also to the software residing in the hardware. Firmware is an important part of
IoT systems, the BIOS in computer systems, and the programs in extension ROM, as well
as executable programs of common network devices, such as routers, switches, and webcams,
all of which are typical firmware. However, similar to common software, firmware may also have vulnerabilities (referring to the weaknesses that can be exploited by attackers
performing unauthorized actions, which introduce potential risk to IoT systems) (Costin
et al. 2014).
ZoomEye’s statistical report (Knownsec 2017) showed that 23% of more than
60,000 routers available for public search were affected by backdoor mechanisms in 2013.
According to the report of OWASP (Open Web Application Security Project) in 2014,
among the top 10 attacks on IoT systems, the attacks on software and firmware in embedded
devices ranked ninth (Open web application security project 2016). With increasingly
more security incidents occurring due to malicious firmware (Adelstein et al. 2002), there
is a greater awareness of the importance of the vulnerability search in firmware, especially
in cross-architecture scenarios.
Because obtaining the source code of most firmware is difficult, the known vulnerability
search works mainly at the binary code level. Abundant approaches exist to search known
vulnerabilities at the binary code level. However, most of these existing approaches either
utilize dynamic analysis or are limited to the same architecture. The dynamic analysis on
the firmware generally needs the specific devices or the simulation environment to run the
target binary firmware. Additionally, there are several strict requirements for the target code
to execute, so it is laborious to apply the dynamic analysis to cross-architecture vulnerability
search cases. Other approaches, such as the k-gram and sequence alignment of instructions,
obtain the opcodes or instructions for analysis, but these approaches are closely related to
the specific architecture; therefore, directly applying them to the known vulnerability search
across different architectures is difficult.
Although Pewny et al. (2015) previously raised the problem of searching the known binary vulnerabilities across different architectures, there are only few researches
on cross-architecture known vulnerability search. Recently, one advanced method to search
known vulnerabilities across different architectures was proposed by Eschweiler et al.
(2016). They employed a prescreening method to screen out most of the dissimilar functions
and then used the MCS (maximum common subgraph) algorithm to determine the true
matching function among a few suspicious functions. Although the method is effective in
the cross-architecture cases, its prescreening stage seems unreliable and might result in poor
accuracy in some scenarios (e.g., the case in Section 3).
In this paper, to enhance the overall efficiency, we adopted a staged strategy for the vulnerability
search in firmware. To obtain a small portion of the candidate functions quickly,
we first used kNN to prune (similar to the work by Eschweiler et al. in 2016), and then
used SVM to refine, which can result in a higher accuracy in the prescreening stage. Then,
inspired by the work of Feng et al. (2016), we used bipartite matching to pick out the true
matching functions from the suspicious functions that remained from the prescreening stage.
The experimental results show that CVSkSA achieves good performance in vulnerability
search.
This paper is a significant extension of the conference paper published at DSA 2017
(Lin et al. 2017). On the basis of the original paper, we further propose a hybrid method
using a kNN-SVM approach to improve the efficiency considerably at an acceptable or
negligible cost of accuracy. The idea is to use kNN for fast screening out of obvious
non-candidates, and then, SVM is only applied to a small number of highly suspected
functions. The kNN-SVM-based method can reduce the query time to a few times lower
than that of our previous work only applying SVM (Lin et al. 2017), e.g., from 0.18 s to
0.032 s in the condition “ARM to MIPS”, at the expense of slightly lower correctness, e.g.,
from 99.7 to 99.6%. The kNN-SVM-based method also outperforms other state-of-the-art
approaches, i.e., Multi−MH (Pewny et al. 2015), Multi−k−MH (Pewny et al. 2015), and discovRE (Eschweiler et al. 2016), in terms of overall performance. In summary, our major
contributions are as follows:
1. We show a staged approach, CVSkSA, to search vulnerabilities in binary firmware
across different architectures, which can take advantage of the knowledge that we
already know about the vulnerabilities.
2. We perform some experiments on a baseline dataset and real-world firmware images.
Compared with Multi−MH (Pewny et al. 2015), Multi−k−MH (Pewny et al. 2015),
and discovRE (Eschweiler et al. 2016), the experimental results show that CVSkSA has
not only a better accuracy in vulnerability search but also a higher efficiency.
The rest of this paper is organized as follows: Section 2 is the overview of the proposed
approach. Section 3 mainly discusses the implementation of our approach, and compares
it with the state-of-the-art approaches. Section 4 shows the experimental evaluation of our
approach on real-world vulnerable functions and firmware images under realistic conditions.
Section 5 presents some related works about recognizing the known vulnerabilities
and some works about the hybrid method using kNN-SVM in other application fields.
Section 6 mainly demonstrates some restrictions of our approach. Section 7 provides a
summary of our work and presents the future work.
:::
:::success
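Firmware is the interface between hardware and software.
It includes the BIOS, programs in ROM, and the programs running on network devices, but it may contain vulnerabilities (weaknesses that attackers can exploit to perform unauthorized actions), which bring potential risk to IoT systems.
Obtaining the original source code of firmware is quite difficult, so the search for known vulnerabilities is done mainly on binaries.
Most existing search approaches either rely on dynamic analysis or are limited to a single architecture (they cannot work across architectures).
Dynamic analysis is hard to apply across different architectures and is subject to strict constraints.
Some methods, such as k-gram and instruction sequence alignment, analyze opcodes or instructions, but they only fit a specific architecture.
Pewny et al. (2015) previously raised the problem of cross-architecture vulnerability search, and there is still relatively little research on it.
Recently, Eschweiler et al. (2016) proposed a more advanced method.
It uses a prescreening step to filter out most of the dissimilar functions,
and then uses the MCS (maximum common subgraph) algorithm to find the true match among the remaining suspicious functions. Although this method is effective across architectures, its prescreening stage seems unreliable and performs poorly in some scenarios.
The authors' method uses a staged strategy to improve efficiency: to obtain a small set of candidate functions, they first prune with kNN and then refine with SVM, which raises the accuracy of the prescreening stage.
This paper is an extension of the DSA 2017 conference paper (Lin et al. 2017).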
:::
:::warning
Read this as a continuation of the previous paper.
:::
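
To make the final stage more concrete, here is a small sketch of scoring two ACFGs by bipartite matching of their basic blocks. It rests on several assumptions: the per-block attribute vectors, the normalized distance, and the unmatched-block penalty of 1.0 are illustrative choices, not the cost function from the paper or from Feng et al. (2016). It also brute-forces the minimum-cost assignment, which is only feasible for tiny graphs; a real implementation would use something like the Hungarian algorithm.

```python
from itertools import permutations

def block_distance(a, b):
    """Normalized distance in [0, 1] between two basic-block attribute vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0

def acfg_similarity(blocks_a, blocks_b):
    """Similarity from a minimum-cost one-to-one matching of basic blocks.

    Each block of the smaller graph is matched to a distinct block of the
    larger one; every unmatched block costs 1.0 (assumed penalty).
    """
    small, large = sorted((blocks_a, blocks_b), key=len)
    if not large:
        return 1.0  # two empty graphs are trivially identical
    best = min(
        sum(block_distance(s, large[j]) for s, j in zip(small, assignment))
        for assignment in permutations(range(len(large)), len(small))
    )
    cost = best + (len(large) - len(small))
    return 1.0 - cost / len(large)

# Hypothetical per-block attributes: (instructions, calls, arithmetic instructions).
ARM_BLOCKS = [[5, 1, 0], [3, 0, 1]]
MIPS_BLOCKS = [[3, 0, 1], [5, 1, 0]]   # same blocks, different order
OTHER_BLOCKS = [[20, 6, 4]]            # an unrelated function

SAME = acfg_similarity(ARM_BLOCKS, MIPS_BLOCKS)
DIFFERENT = acfg_similarity(ARM_BLOCKS, OTHER_BLOCKS)
print(SAME, DIFFERENT)
```

Because the attributes are counts rather than raw opcodes, the same comparison can be applied to functions compiled for different architectures, which is why the matching is order-independent here.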
## Approach overview
###### tags: `thesis`