Staged Method of Code Similarity Analysis for Firmware Vulnerability Detection

# Staged Method of Code Similarity Analysis for Firmware Vulnerability Detection 記得先跟老師說明上次回顧 --- ## Abstract 提出現有檢測方式缺點及問題: - Existing vulnerability detection technology based on simple feature matching cannot reach high accuracy to detect ﬁrmware vulnerabilities while using a control ﬂow graph matching directly has proven to be too expensive [關於 feature matching](https://www.csie.ntu.edu.tw/~cyy/courses/vfx/05spring/lectures/scribe/04scribe.pdf) 為了可以提升大量檢測和準確度的方式，作者提出一種多階段檢測的方式: - To address the problem of accurateand efﬁcient, we present a method of staged ﬁrmware vulnerability detection based on code similarity. 第一階段使用 NN 來看 function 相似度，可以提升效率 - The ﬁrst stage, function embedding based on neural network is used to analyze the similarities among functions, and large-scale ﬁrmware security inspection can be achieved efﬁciently 第二階段比較function local call 流程圖後做細粒度分析這階段可提升準確性 - The second stage, the similarity among function local call ﬂow graphs is calculated for ﬁne-grained ﬁrmware security analysis, and this stage can improve the accuracy of vulnerability detection. We compared our method with state-of-the-art approaches, andtheexperimentalresultsdemonstratethatourmethodismoreaccurate 最後ˇ比較他們的方法和 state-of-the-art 準確性和效能上都比較優秀準確度高達86% (我認為蠻低的或許可以再想想如何提升) --- ## Introduction ==**前言**== 這邊著重在 function 相似度比較 - Function similarity analysis technology plays an important role in many ﬁelds, such as software plagiarism detection and malicious code detection 作者提到韌體分析和電腦軟體之間分析差異，受到IOT的架構不同的關係加上很多非opensource很難去做動態分析或是symbolic execution ，雖然symbolic execution 正確率高但性能開銷很大加上韌體通常很大，加上環境相當難建構，因此動態分析相當難用，目前比較常用的還是feature-based 的這種方法 However, as opposed to software security analysis in the ﬁeld of PC,almostallﬁrmwareofIoTequipmentisnotopen source [8], [9]; moreover, the processor architectures and operating systems of IoT equipment are diverse. - Firmware cannot be analyzed by methods used on PC software, such as symbolic execution or dynamic analysis. Although the accuracy of symbolic execution is high, the performance overhead is substantially greater. Thus, symbolic execution is not suitable for ﬁrmware analysis, as the size of the ﬁrmware is often large. IoT equipment includes numerous peripherals, which make it hard to build an integrated environmentforﬁrmware thus,ﬁrmware dynamic analysisis difﬁcult work. 特徵分析的方法比較有效率，且相當依靠特徵取出的品質 - Among the existing ﬁrmware security analysis methods, feature-based analysis is more effective, and depends on the quality of extracted features 但是根據目前方法中沒有看到有考慮到函數調用之間的關聯的方法 - none of these existing methods have considered the invoke relations among functions 函數調用會決定有沒有bug可以觸發，因此是很重要 - the function invoke relation determines whether or not bugs can be triggered ==**比較別人論文Gemini 並說明他的缺點**== #略 ==**Contribution**== - We propose a novel staged ﬁrmware vulnerability detection method. The ﬁrst stage uses a residual network to embed the function features into the high-dimensional space for efﬁcient analysis of function similarity, while the second stage uses the hierarchical weighted bipartite graph to calculate the function LCG similarity. The combination of the two stages provides higher similarity analysis accuracy compared with existing method 我們提出了一種分階段的固件漏洞檢測方法。第一階段使用殘差網絡將功能特徵嵌入到高維空間中，以有效分析功能相似度，而第二階段使用層次加權二部圖來計算功能LCG相似度。與現有方法相比，兩個階段的結合提供了更高的相似度分析準確性 - We design a method using the genetic algorithm to optimize function features. In addition to CFG features, the call ﬂow graph features of the function are considered, and the genetic algorithm is applied to obtain an optimal weighted feature combination, which improves the function similarity accuracy. 我們設計了一種使用遺傳算法優化功能特徵的方法。除CFG特徵外，還考慮了函數的調用流圖特徵，並應用遺傳算法獲得了最優的加權特徵組合，從而提高了函數相似度。 - We implement a prototype system and verify its effectiveness. The experiments demonstrate that the prototype yields an AUC of 0.981, which is higher than the AUC of 0.971 achieved by the state-of-the-art methods, and obtains the best result of 26 true positive samples in the top 30 suspicious vulnerable functions in real-world testing. Furthermore, it can complete the retraining process in 1 h and can thus be reconstructed rapidly, according to new datasets and purposes 我們實施原型系統並驗證其有效性。實驗表明，該原型產生的AUC為0.981，高於通過最新方法獲得的AUC為0.971，並且在前30個可疑脆弱功能中獲得了26個真實陽性樣本的最佳結果。實際測試。此外，根據新的數據集和用途，它可以在1小時內完成再培訓過程，因此可以快速重建。 --- ## Background ==**FUNCTION SIMILARITY ANALYSIS**== Feature matching is often used in vulnerability detection, providing high efﬁciency and accuracy. The main concept of feature matching is to extract the vulnerability feature, and then verify whether the target binary has any code similar thereto. For example,used strings as feature to detect the ﬁrmware security. Function similarity analysis is one of the methods of feature matching, and extracts the syntactic features of functions to compare the differences among them. Existing function similarity analysis methods are mainly suitable for functions with the same processer architecture; 但韌體有很多不同的架構如MISP　ＡＭＤ however, multiple architectures exist for ﬁrmware functions. The architecture of the same product ﬁrmware may differ from model to model; thus, signiﬁcant syntactic diversity exists among ﬁrmware functions. However, there is little semantic difference in homologous functions, because semantics primarily express functional attributes, whichwouldnotvarywithachangeinarchitecture.Bycomparing a large number of CFGs and call ﬂow graphs of cross-platformbinariescompiledfromthesamesourcecode, it can be found that the graph structure is exactly the same. The call ﬂow graph of the function represents the invoke relation among functions; if the invocation order changes, so does the program action. Therefore, regardless of whether thebinaryarchitecture,operatingsystem,compiler,andcompile compilation options are the same, the call ﬂow graph of the programs should be the same if they are compiled from the same source code. The CFG describes the internal characteristics of the function, and expresses the jump path of the basic block. Owing to the differences in the registers, address offsets, and instruction opcode among architectures, theorderofthebasicblocksofthehomologousfunctionswill change slightly, but the overall structure of the CFG will not vary substantially. Therefore, the call ﬂow graph and CFG are appropriate features for the cross-architecture similarity analysis of functions. Reference proposed a method for cross-architecture function similarity analysis based on maximumcommonsubgraphmatching,whichachievedhigh accuracy. However, the experiments demonstrated that direct graph matching is very expensive, and is not suitable for large-scale security detection of ﬁrmware functions. 特徵匹配通常用於漏洞檢測中，以提供高效和準確性。特徵匹配的主要概念是提取漏洞特徵，然後驗證目標二進製文件是否具有與其相似的任何代碼。例如，[13]使用字符串作為檢測固件安全性的功能。功能相似性分析是特徵匹配的方法之一，它提取功能的句法特徵以比較它們之間的差異。現有功能相似性分析方法主要適用於具有相同處理器架構的功能。但是，固件功能存在多種體系結構。同一產品固件的體系結構可能因模型而異；因此，固件功能之間存在顯著的句法多樣性。但是，同源函數之間的語義差異很小，因為語義主要表示功能屬性，而屬性不會隨體系結構的變化而變化。通過比較大量的CFG並調用從相同源代碼編譯的跨平台二進制流圖，可以發現圖結構完全相同。函數的調用流圖表示函數之間的調用關係；如果調用順序更改，則程序操作也會更改。因此，不管二進制體系結構，操作系統，編譯器和編譯編譯選項是否相同，如果它們是從相同的源代碼編譯的，則程序的調用流圖應該相同。 CFG描述功能的內部特性，並表示基本塊的跳轉路徑。由於體系結構之間寄存器，地址偏移量和指令操作碼的差異，基本塊軟同源功能的順序將略有變化，但CFG的總體結構不會有實質性變化。因此，調用流圖和CFG是用於功能的跨體系結構相似性分析的適當功能。參考文獻[10]提出了一種基於最大子圖匹配的跨架構函數相似度分析方法，該方法具有較高的準確性。但是，實驗表明，直接圖匹配非常昂貴，並且不適合固件功能的大規模安全檢測。 ==**NETWORK REPRESENTATION LEARNING OF FUNCTION**== Network representation learning (NRL) uses lowdimensional, dense, and real-valued vectors to represent network nodes; that is, for mapping to k-dimensional hidden space. NRL introduces the concept of embedding ,which can be represented as illustrated in Figure 1, in which nodeuismappedtod-dimensionalspacebyakernelfunction and transformed into a vector. NRL offers two main advantages: 1. nodes of networks with similar structures will have similar embedding; and 2. homophilic network nodes will have similar embedding. For example, nodes u and s6 have similar embedding in Figure 1. The property of NRL can be used to analyze the similarity of ﬁrmware functions, because homologous functions should have similar embedding, even if the processor architecture, operating system, and compiler differ. Furthermore, the computational efﬁciency of vector similarity is higher than that of graph similarity 網絡表示學習（NRL）使用低維，密集和實值向量表示網絡節點；也就是說，用於映射到k維隱藏空間。 NRL引入了嵌入，的概念，可以如圖1所示來表示，其中結點通過容器函數映射到維空間，並轉換為向量。 NRL具有兩個主要優點：1.具有相似結構的網絡節點將具有相似的嵌入； 2.同質網絡節點將具有相似的嵌入。例如，節點u和s6在圖1中具有相似的嵌入。NRL的屬性可用於分析固件功能的相似性，因為即使處理器架構，操作系統和編譯器不同，同源功能也應具有相似的嵌入。此外，向量相似度的計算效率高於圖相似度 --- ## APPROACH OVERVIEW To realize the similarity analysis among ﬁrmware functions more efﬁciently and obtain the potential relations among ﬁrmware function features, which can be embedded in highdimensional space. Following embedding, a ﬁrmware function is represented as a numerical vector, which can be used directly to analyze function similarity without accessing the original functions. Any action on the function can achieve the same effect on the function embedding vector by means of certain operations, but vector analysis is more efﬁcient than raw function analysis. An overview of our method is presented in Figure 2. We implement a staged ﬁrmware function similarity analysis method, which can not only satisfy large-scale analysis, but also ensure accuracy. Moreover, two databases are created to contain the embeddings of the ﬁrmware and vulnerability functions, respectively: FirmwareDB and VulnerabilityDB, which can be used to detect the ﬁrmware security rapidly. In Figure 2, the inputs are ﬁrmware binary code and vulnerability functions. The commercial disassembly tool IDA Pro was used to extract the function features, includingintra-functionCFGfeaturesandfunctioncallgraph features. To represent the function more accurately, a feature selection procedure, the genetic algorithm, was used to ﬁlter out the low-impact and redundant features and generate a weightedfunctionfeaturecombination(sectionIV).Aneural network was designed to embed the function in the highdimensional space, where the function could be represented as a numerical vector. Finally, the two-stage code similarity analysis was performed. 為了更有效地實現固件功能之間的相似性分析，並獲得可以嵌入到高維空間中的固件功能特徵之間的潛在關係。嵌入之後，固件函數被表示為數值向量，可以直接用於分析函數相似性而無需訪問原始函數。對函數的任何動作都可以通過某些操作對函數嵌入向量實現相同的效果，但是向量分析比原始函數分析更為有效。圖2概述了我們的方法。我們實現了分階段的固件功能相似性分析方法，該方法不僅可以滿足大規模分析的需要，而且可以確保准確性。此外，創建了兩個數據庫來分別包含固件和漏洞功能的嵌入：FirmwareDB和VulnerabilityDB，可用於快速檢測固件安全。在圖2中，輸入是固件二進制代碼和漏洞功能。商業拆卸工具IDA Pro [21]用於提取功能特徵，包括功能內CFG功能和功能調用圖特徵。為了更準確地表示功能，使用特徵選擇程序（遺傳算法）篩選低影響力和冗餘功能，並生成加權功能特徵組合（第IV節）。設計了神經網絡，將功能嵌入到高維空間中，其中該函數可以表示為數值向量。最後，進行了兩階段代碼相似性分析。 ![](https://i.imgur.com/qbwgkkb.png) ### B. TWO STAGES ANALYSIS ##### First stage: Function embedding similarity analysis. - This was achieved by calculating the cosine angle of two embeddings to obtain the similarity, which is suitable for large-scale ﬁrmware security inspection. Further details on function embedding similarity analysis are provided in section V. 這是通過計算兩個嵌入的餘弦角以獲得相似度來實現的，該相似度適用於大型固件安全檢查。第五節提供了有關函數嵌入相似性分析的更多詳細信息。 ##### Second stage: LCG similarity analysis. - The LCGs of the ﬁrmware and vulnerability functions form a multi-layer weighted bipartite graph, and the similarity of nodes on both sides of the graph could be calculated using the algorithm during the ﬁrst stage. The invocations among functions were consideredinthisstage,whichcouldimprovetheaccuracyof the vulnerability similarity analysis but sacriﬁce efﬁciency. Further details on LCG similarity are provided in section VI. The reason for performing two stages of code similarity analysis is to satisfy different analysis demands. If one only wishes to determine whether the ﬁrmware contains suspicious vulnerability functions; that is, whether or not the ﬁrmware is secure, stage one can satisfy this requirement, as the ﬁrmware is probably insecure if it contains many suspicious vulnerability functions and the similarity scores are high. If an accurate result is desired, especially to determine whether the function is really a vulnerability function, both stages one and two should be performed. The LCGcontainsfunctioninvokerelationsthatareimportantfor bugtriggering.Furthermore,thefunctioninvokerelationsare more robust to heterogeneity of the architecture than CFG features. 固件和漏洞功能的LCG形成了多層加權二部圖，並且可以在第一階段使用該算法計算圖兩側節點的相似度。在此階段考慮了功能之間的調用，這可能會提高漏洞相似性分析的準確性，但會提高效率。第六節提供了有關LCG相似性的更多詳細信息。執行代碼相似性分析的兩個階段的原因是為了滿足不同的分析需求。如果僅希望確定固件是否包含可疑的漏洞功能；也就是說，無論固件是否安全，第一階段都可以滿足此要求，因為如果固件包含許多可疑的漏洞功能並且相似度很高，則該固件可能是不安全的。如果需要準確的結果，尤其是要確定該功能是否真的是漏洞功能，則應該執行第一階段和第二階段。 LCG包含對錯誤觸發很重要的功能調用。此外，功能調用對體系結構的異構性比CFG功能更健壯。 ## FEATURE SELECTION - **FIRMWARE FUNCTION FEATURE EXTRACTION** - complexity grouping > 在此階段，根據複雜度對韌體中的所有函數進行分類。 > 對於函數，其複雜度越高，表示邏輯越複雜，並且功能中出現漏洞的可能性也越高。 > 為了不遺漏低複雜度功能中的漏洞，我們將所有函數分組並在每組中判斷漏洞特徵。 > 函數的複雜度體現在兩個方面：函數本身的邏輯複雜度和參考關係的複雜度，分別由CYCLOMATIC COMPLEXITY和NUMBER OF TIMES A FUNCTION IS CALLED來衡量 - **CYCLOMATIC COMPLEXITY** ： We calculate the cyclomatic complexity of a function according to the number of points and edges of the function control flow graph - **NUMBER OF TIMES A FUNCTION IS CALLED**　： If a function is vulnerable and is called multiple times, it implies that there are multiple ways to trigger its vulnerability - vulnerability-feature ranking stages >在漏洞特徵排名階段，根據漏洞特徵索引對每組中的函數進行排序，並確定每組中最易受攻擊的函數。函數的漏洞功能可以體現在兩個方面：SENSITIVITY FUNCTION CALL INDEX和NUMBER OF MEMORY OPERATIONS 一些敏感的function call 或是記憶體存取不當的操作 ![](https://i.imgur.com/ojZ85XB.png) 概念: preanalysis 事先分組完後，再用演算法下去搜索 - **DUMP CONTEXT** 為了確保emulation enviroment 的環境的準確性他們 dump出暫存器記憶體的架構資訊內容這邊設計一些演算法是為了像是 D-Link DIR系列的router 只有提供用戶web介面，簡化一些配置的工作因此透過幾種方式來執行system command - BASED ON TELNET/SSH SERVICE：we can easily get the shell of the device and execute system commands - BASED ON DEVICE DEBUG INTERFACE : We can generally obtain the shell of the device through the UART port - BASED ON FIRMWARE UPDATE MECHANISM : 如果韌體在更新過程中未驗證，則可以提取他，修改啟動腳本rcS，並在打開設備電源時提供Telnet或SSH服務。重新打包固件，最後將修改後的固件更新到設備。獲取設備外殼後，我們通過網絡將靜態鏈接的gdbserver上傳到設備，然後通過gdbserver指定設備端的調試端口，通過gdb連接主機端的遠程調試端口，然後運行到入口點開始準備轉儲上下文。意思即為:修改更新的韌體，並將RCS寫入開啟TELENT OR ssh的服務再用gdbserver去連接到他等到debugging port等 - **HOOK** > 在開始emulate之前，框架會分析韌體程式的GOT資訊。設置完CPU的初始環境後，我們遍歷GOT表並讀取內存以獲取GOT每個條目的地址，以獲得函數的實際地址。但是，由於ELF的惰性綁定機制[19]，對於GOT中的某些功能，地址的綁定將無法完成。接下來，透過解析固件中的動態鏈接庫文件的符號表信息，可以獲得動態鏈接庫中的庫函數的偏移量。 > 選擇一個表示為funcX的函數；該函數在內存中的地址定義為mem_addrfuncX，而動態鏈接庫中funcX的偏移地址為offsetfuncX。因此，我們獲得內存libcaddr中動態鏈接庫文件的實際加載地址，如下所示重點：FIRMCORN將為用戶提供一個接口，以用自定義函數替換原始函數或跳過__dl_runtime_resolve [21]的地址解析，並直接跳轉到內存中該函數的實際地址。同樣，FIRMCORN可以基於GOTmem和GOTorig在模糊測試過程中自動識別和跳過不必要的功能，從而提高了虛擬執行的速度。上述過程的實現原理如圖2所示。對於靜態鏈接的二進製文件，編譯器在可執行程序的編譯過程中將所需的庫文件編譯到程序中。這種方法仍然可以自動識別功能並通過分析二進制符號表來添加鉤子 ==簡略:做hook是為了替幻他接下來要用 Unicorn Engine (CPU emulator )的客製化函數去替換一些原本的函數，可以在模糊測試時跳過不必要的功能，增加虛擬執行速度== 在FIRMCORN的hook子模塊中，我們使用Unicorn Engine [20]提供的hook_add函數添加類型為UC_HOOK_CODE的回調函數來監視固件的運行地址是GOTmem還是GOTorig - **OPTIMIZED VIRTUAL EXECUTION** - CPU EMULATOR 主要核心利用Unicorn Engine API ，Unicorn Engine [22]僅保留了QEMU的CPU仿真器部分，刪除了其他設備的仿真，並提供了python接口綁定 - MULTI ARCHITECTURE x86傳參和 MIPS環境中傳參的不同，因此提供一致的街口，供不同架構執行 - HEURISTIC OPTIMIZATION：we specifically describe the three types of functions, namely unresolved, unnecessary, and hardware-specific functions - UNRESOLVED FUNCTION > Although the context information is extracted as much as possible, dynamically allocated memory, such as heap space, may not be initialized and thus not obtained at the entry point; therefore, import of this part of memory in advance is not possible. If the library function reads and writes this part of the memory, it will cause errors in the emulation process. We define these functions as unresolved functions in the framework. 簡要:一些動態分配的記憶體配置(像是heap)，不會初始化，因此可能會導致框架錯誤就，他們系統無法解決的部分，就被分配到這一塊 - UNNECESSARY FUNCTION > 模糊處理過程中不需要一些功能，例如puts功能和類似功能。我們將這些功能定義為不必要的功能。為了實現更有效的模糊測試，我們的框架為用戶提供了跳過某些功能的界面。當執行這些功能時，程序計數器（PC）寄存器將被設置為下一條指令的地址，並且堆棧將被平衡。 - HARDWARE-SPECIFIC FUNCTION > oT設備固件可以訪問硬件功能，例如讀取GPIO引腳或NVRAM區域；但是，在仿真過程中缺少這些硬件引腳將導致程序停止運行並崩潰 - ** Fuzz test** ## Evalution ###### tags: `thesis`