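# Android Malware Clustering using Community Detection

Many APT groups now combine Android malware with Windows malware in their attacks. Identifying the family of an Android malware sample can shorten the period during which the malware stays latent. This study proposes Android Malware Clustering using Community Detection on Android Privilege-based Similarity Network. Built on the MobSF static analysis platform, the technique extracts the permission-granting behavior of Android apps and, through permission similarity analysis, builds a clustering mechanism for Android malware that identifies malware families. We analyze 200 APT malware samples from 5 families in the Android Malware dataset (https://github.com/sk3ptre/AndroidMalware_2020). The proposed method analyzes malware families comprehensively, using both qualitative and quantitative indicators. This study makes the following contributions:

- We propose a …
- …
- …

# Introduction

## Background and Motivation

The number of smartphones on the market is rising sharply, and with the Android platform becoming the market leader, the need for a malware analysis platform is urgent (Burguera, I., Zurutuza, U., & Nadjm-Tehrani, S., 2011). As phones support more data and services, the opportunities for vulnerabilities also increase. Experience with the Android security policy shows that it begins with a relatively easy-to-understand MAC enforcement model, but the number and subtlety of its refinements make it hard to discover an application's policy simply by looking at it. Delegation is also mixed into the otherwise typical MAC model. This situation makes gaining a firm grasp of the Android security model nontrivial (Enck, W., Ongtang, M., & McDaniel, P., 2009).

Domestically, the share of the population that relies solely on mobile phones for internet access keeps growing: from 11.9% in ROC year 105 (2016) to 18.7% in 106 (2017), 28.0% in 107 (2018), and 31.2% in 108 (2019). The official Google Play store hosts about 2.8 million mobile apps. After financial services went mobile, the Android banking trojan Geost was first disclosed in research by Sebastian García, Maria Jose Erquiaga, and Anna Shirokova of the Stratosphere Laboratory, who detected the trojan while monitoring the HtBot malicious proxy network. The botnet targets Russian banks, and by the time the research was published in Virus Bulletin last year, the number of victims had already exceeded 800,000. Rapid detection of Android malware with different intents is therefore increasingly important for the information security of mobile commerce.

With the spread of the Internet of Things, 5G, and other technologies, smart mobile devices such as phones and tablets are developing rapidly and seeing growing use. This growth brings more malware targeting the platform: Android apps are multiplying across the mobile ecosystem, but so is Android malware (Liu, K., Xu, S., Xu, G., Zhang, M., Sun, D., & Liu, H., 2020). Surveys of Android malware detection focus increasingly on machine learning methods, and the introduction of artificial intelligence methods such as machine learning has greatly improved the prospects for detecting Android malware (Enck, W., Ongtang, M., & McDaniel, P., 2009).

1.2 What is the key to the clustering mechanism? Which methods have handled it in the past? ....REF1, REF2, REF3…

Saxe, J., & Sanders, H.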
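(2018), in the book Malware Data Science: Attack Detection and Attribution, describe static analysis techniques for disassembling malware and examining the recovered assembly code. Static analysis is an effective way to obtain useful information about malware by studying its components on disk, but it cannot observe the malware's behavior. Dynamic analysis differs: whereas static analysis looks at malware in file form, dynamic analysis runs the malware in a safe, contained environment to see how it behaves. This is like introducing a dangerous bacterial strain into a sealed environment to observe its effect on cells. Dynamic analysis can get around common static analysis hurdles such as packing and obfuscation, and it gives more direct insight into the purpose of a given malware sample. Dynamic analysis techniques are relevant to malware data science, and examples of dynamic analysis can be studied with open-source tools such as malwr.com. A clustering mechanism supports malware similarity analysis, shared-code analysis, and visualization.

Defense capability is largely constrained by the limited understanding of emerging mobile malware and by the lack of timely access to relevant samples. These samples have been systematically characterized from several aspects, including their installation methods, activation mechanisms, and the nature of the malicious payloads they carry. The characterization, together with a subsequent evolution-based study of representative families, shows that they are evolving rapidly to evade detection by existing mobile anti-virus software (Zhou, Y., & Jiang, X., 2012). However, the monetary and time costs of dynamic analysis are too high to fit readily within decision-makers' risk tolerance. The characterization of these malware samples also shows that 86.0% repackage legitimate apps to include malicious payloads, 36.7% contain platform-level exploits to escalate privilege, and 93.0% have bot-like capabilities (Zhou, Y. et al., 2012).

Kim et al. build a power consumption history from collected samples and generate power signatures from that history for power-aware malware detection. Ongtang et al. propose Saint, which protects the interfaces an app exposes to other apps according to security policies defined by the app developer. Fuchs et al. propose SCanDroid, which extracts an app's security specifications and applies data flow analysis to check whether any data flow violates them. Enck et al. propose the Kirin security service to certify apps; they define various potentially dangerous permission combinations as rules that block the installation of potentially unsafe apps. Desnos et al. develop algorithms that help them build rules. They propose a signature-based method that also uses permission attributes, and then build control flow graphs from the collected material to detect Android malware. The approach of Zhou et al. sometimes leads to a high false-positive rate. Some related work applies anomaly detection mechanisms to mobile malware detection.

What? Why? How (X)? What were the results? How well do you think they did (breakthroughs, key points, problems)?

## Ideas

How we do it

Traditional static analysis methods are hampered by hardened and obfuscated apps. Barrera et al. propose a method that identifies clusters of apps based on their requested permissions, trying to determine which permissions particular app categories typically use. Di Cerbo et al. aim to detect suspicious apps by analyzing their permissions. Kim et al. analyze permissions by parsing the DEX and manifest files of malicious apps. Zhou et al. detect infections by known Android malware using permission-based behavioral footprints (Wu, D. J., Mao, C. H., Wei, T. E., Lee, H. M., & Wu, K. P., 2012).

Dynamic analysis needs too much time and space, and the features extracted by existing visualization methods are simple. To address these problems, Fang et al. propose a new familial classification method for Android malware based on DEX file section features. DEX files are converted into RGB images and plain text according to their sections, and the color and texture features of the images and the text features are computed as sample features. Because color, texture, and text features differ in dimensionality and value range, the authors choose late feature fusion with multiple kernel learning for classification. Compared with the traditional grayscale image method, whose classification accuracy is 0.92, their method is 0.04 higher. The common early feature fusion method performs well under an SVM classifier, with precision, recall, and F1 reaching 0.94, but the precision of their method reaches 0.96, which is 0.02 higher. It also reduces feature extraction time by 2.999 seconds compared with the frequent-subsequence method. The experimental results show that familial classification based on DEX file section features achieves high classification efficiency and precision.

SO stands for "Shared Object", a binary that the machine can run directly. SO files mainly exist on Unix and Linux systems; because Android is based on the Linux kernel, it inherits the related Linux designs. All Java code resides in DEX files, but some C code remains in SO files. Analyzing SO files and extracting more features from them could further improve classification accuracy (Fang, Y., Gao, Y., Jing, F., & Zhang, L., 2020).

What?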
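The permission-based grouping discussed in this section can be illustrated with a small sketch. It is not the system described in this paper: the source does not state which similarity measure or community detection algorithm is used, so Jaccard similarity and a deterministic label propagation are assumptions made here for illustration. The sample names and permission sets are hypothetical; in practice the permission sets would be taken from a static analysis report such as the one MobSF produces.

```python
from itertools import combinations

# Hypothetical samples: app name -> requested permissions
# (short names standing in for android.permission.* entries).
SAMPLES = {
    "sms_a": {"SEND_SMS", "READ_SMS", "RECEIVE_SMS"},
    "sms_b": {"SEND_SMS", "READ_SMS", "RECEIVE_SMS", "INTERNET"},
    "sms_c": {"SEND_SMS", "READ_SMS", "INTERNET"},
    "spy_a": {"RECORD_AUDIO", "CAMERA", "ACCESS_FINE_LOCATION"},
    "spy_b": {"RECORD_AUDIO", "CAMERA", "ACCESS_FINE_LOCATION", "INTERNET"},
}


def jaccard(a, b):
    """Jaccard similarity of two permission sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def build_network(perms, threshold=0.5):
    """Connect two apps when their permission similarity reaches the threshold."""
    graph = {app: {} for app in perms}
    for u, v in combinations(sorted(perms), 2):
        weight = jaccard(perms[u], perms[v])
        if weight >= threshold:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def label_propagation(graph, max_rounds=20):
    """Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the smaller label so the
    result is deterministic. Isolated nodes keep their own label."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue
            scores = {}
            for neighbour, weight in graph[node].items():
                label = labels[neighbour]
                scores[label] = scores.get(label, 0.0) + weight
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group nodes that share a label into sorted member lists."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, []).append(node)
    return sorted(sorted(members) for members in groups.values())


network = build_network(SAMPLES, threshold=0.5)
print(communities(label_propagation(network)))
```

With the hypothetical data above, the three SMS-oriented samples end up in one community and the two spyware-like samples in another. The threshold of 0.5 is an arbitrary choice for the sketch; a real data set would need it tuned, and a library implementation of community detection could replace the hand-written label propagation.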
Observations: how to do it

## Contributions

- We use a Privilege-based Similarity Network to group the …..via….
- We design and develop a system...for….MobSF
- We collect several popular GitHub Android malware datasets and conduct extensive experiments...result.

The remainder of this paper is organized as follows. The methodology of FalDroid is detailed in Section II, and its two usages are presented in Section III. The experimental results are reported in Section IV. After providing a discussion of the limitations of FalDroid in Section V, we introduce related work in Section VI. We conclude the paper with a discussion of future work in Section VII.

# System Architecture

<a href="https://ibb.co/6v9rnfc"><img src="https://i.ibb.co/5YpGMfX/Mob-SF-Architecture.jpg" alt="Mob-SF-Architecture" border="0"></a>

# Experiments and Evaluation
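For the experiments and evaluation of the malware family clustering, one quantitative indicator that could be reported is cluster purity: the fraction of samples that belong to the majority family of their cluster. The source mentions quantitative indicators without naming them, so the choice of purity is an assumption. The sample identifiers and family names below are hypothetical.

```python
from collections import Counter


def dominant_family(cluster, family_of):
    """Return the most common ground-truth family in one cluster."""
    return Counter(family_of[s] for s in cluster).most_common(1)[0][0]


def purity(clusters, family_of):
    """Fraction of samples that match the majority family of their cluster."""
    total = sum(len(cluster) for cluster in clusters)
    if total == 0:
        return 0.0
    majority = sum(
        Counter(family_of[s] for s in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster
    )
    return majority / total


# Hypothetical ground truth and clustering result.
FAMILY_OF = {
    "s1": "FamilyA",
    "s2": "FamilyA",
    "s3": "FamilyB",
    "s4": "FamilyB",
    "s5": "FamilyB",
}
CLUSTERS = [["s1", "s2", "s3"], ["s4", "s5"]]

print(purity(CLUSTERS, FAMILY_OF))
```

In this toy case four of the five samples sit in a cluster dominated by their own family, so the purity is 0.8. Purity rewards many small clusters, so it is usually read alongside the number of clusters or a second measure.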