Background Knowledge for Malware Analysis: Malware Detection
====

:::success
Parent page: [Background Knowledge for Malware Analysis](https://hackmd.io/s/S1kLEr5x#)
:::

Topics
----

### Malware for evaluation
[Eicar – EUROPEAN EXPERT GROUP FOR IT-SECURITY](http://www.eicar.org/)

### Datasets
Research datasets for anti-malware work: MWS 2011 Datasets
http://www.iwsec.org/mws/2011/manuscript/1A1-1.pdf

* The malware samples were provided by FFRI [[source](http://www.ffri.jp/assets/files/monthly_research/MR201407_An%20Example%20of%20Antivirus%20Detection%20Rates%20and%20Similarity%20of%20Undetected%20Malware_ENG.pdf)]

Introduction to FFRI Dataset 2014
http://www.iwsec.org/mws/2014/files/FFRI_Dataset_2014.pdf

#### Machine learning
A malware sample can be treated as a kind of string or image, so machine learning can be applied to it.

Microsoft Malware Classification Challenge (2015)
https://www.kaggle.com/c/malware-classification

* A competition on classifying malware with machine learning
* The dataset can still be downloaded (as of April 2016)

Clustering
----

Effectiveness of unknown malware classification by logistic regression analysis
http://www.ffri.jp/assets/files/monthly_research/MR201402_Effectiveness%20of%20unknown%20malware%20classification%20by%20logistic%20regression%20analysis_JPN.pdf

Malware clustering based on dynamic (behavioral) information
http://www.ffri.jp/assets/files/monthly_research/MR201311_Behavioral-based_malware_clustering_JPN.pdf

Applying machine learning to security technology
http://www.ffri.jp/assets/files/monthly_research/MR201306_Machine_learning_for_computer_security_JPN.pdf

Fighting advanced malware using machine learning
http://www.ffri.jp/assets/files/research/research_papers/psj13-murakami_EN.pdf

### Clustering with fuzzy hashing
#### ssdeep
TODO

#### impfuzzy
https://github.com/JPCERTCC/impfuzzy/

[Classifying malware with Import API and fuzzy hashing: impfuzzy (2016-05-09) - JPCERT/CC Eyes | JPCERT Coordination Center official blog](https://blogs.jpcert.or.jp/ja/2016/05/impfuzzy.html)

> In this article we propose a new method, impfuzzy, and show through comparison with existing methods that the values obtained by applying impfuzzy to malware make it possible to accurately find similar malware.

#### Trend Micro Locality Sensitive Hashing (TLSH)
TODO

[Detecting cryptocurrency-mining malware by clustering with machine learning | Trend Micro Security Blog](http://blog.trendmicro.co.jp/archives/17221) (2018/4/6)

Detection by surface analysis
----

### [yara](https://plusvic.github.io/yara/)
> The pattern matching swiss knife for malware researchers

Put simply, it is an OSS tool that detects and classifies files based on characteristic byte sequences found in them.

install on Arch Linux:

```
sudo pacman -S yara yara-python
```

usage:

```
% yara --help
YARA 3.5.0, the pattern matching swiss army knife.
Usage: yara [OPTION]... RULES_FILE FILE | DIR | PID

Mandatory arguments to long options are mandatory for short options too.

  -t,  --tag=TAG                   print only rules tagged as TAG
  -i,  --identifier=IDENTIFIER     print only rules named IDENTIFIER
  -n,  --negate                    print only not satisfied rules (negate)
  -D,  --print-module-data         print module data
  -g,  --print-tags                print tags
  -m,  --print-meta                print metadata
  -s,  --print-strings             print matching strings
  -e,  --print-namespace           print rules' namespace
  -p,  --threads=NUMBER            use the specified NUMBER of threads to scan a directory
  -l,  --max-rules=NUMBER          abort scanning after matching a NUMBER of rules
  -d VAR=VALUE                     define external variable
  -x MODULE=FILE                   pass FILE's content as extra data to MODULE
  -a,  --timeout=SECONDS           abort scanning after the given number of SECONDS
  -k,  --stack-size=SLOTS          set maximum stack size (default=16384)
  -r,  --recursive                 recursively search directories
  -f,  --fast-scan                 fast matching mode
  -w,  --no-warnings               disable warnings
  -v,  --version                   show version information
  -h,  --help                      show this help and exit

Send bug reports and suggestions to: vmalvarez@virustotal.com.
```

* Note that __scanning stops when a .yar file contains an error__ (possibly without printing anything?).

Example run \*1:

```
[katc@K_atc originalfile]$ yara -v
yara 3.5.0
[katc@K_atc originalfile]$ yara -w ~/malware/rules/index.yar PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
maldoc_OLE_file_magic_number PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
RTF_Shellcode PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
Big_Numbers0 PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
Big_Numbers1 PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
Big_Numbers3 PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
Big_Numbers4 PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
contentis_base64 PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
with_urls PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
without_images PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
without_attachments PlugX_RTF_dropper_42fba80f105aa53dfbf50aeba2d73cae
```

What is interesting in this result is that `RTF_Shellcode` matched. I am not very familiar with shellcode in RTF files, but the rule appears to have fired on an unnatural run of \x90 produced by a nop sled used for address adjustment (\x90 is the nop instruction on Intel CPUs; a nop sled is a run of such nops and is a common exploit technique). Note that `$scregex` matches the ASCII text `90` repeated 2 to 20 times, that is, \x90 bytes as they appear when written as hex text inside the document. The definition of the `RTF_Shellcode` rule is shown below.

```
// This rule have beed improved by Javier Rascon
rule RTF_Shellcode : maldoc
{
    meta:
        author = "RSA-IR – Jared Greenhill"
        date = "01/21/13"
        description = "identifies RTF's with potential shellcode"
        filetype = "RTF"

    strings:
        $rtfmagic={7B 5C 72 74 66}
        /* $scregex=/[39 30]{2,20}/ */
        $scregex=/(90){2,20}/

    condition:
        ($rtfmagic at 0) and ($scregex)
}
```

(\*1) Sample: ac7d02465d0b1992809e16aaae2cd779470a99e0860c4d8a2785d97ce988667b (sha256), [VirusTotal](https://virustotal.com/ja/file/ac7d02465d0b1992809e16aaae2cd779470a99e0860c4d8a2785d97ce988667b/analysis/), [hybrid-analysis](https://www.hybrid-analysis.com/sample/ac7d02465d0b1992809e16aaae2cd779470a99e0860c4d8a2785d97ce988667b?environmentId=100), [YaraRules Analyzer](https://analysis.yararules.com/analysis/586b8bbfe41a9c3e9aa25467)

#### yara rules
For `RULES_FILE`, the rules in this repository can be used: https://github.com/Yara-Rules/rules/

* Note that as of [commit 42ce524](https://github.com/Yara-Rules/rules/commit/42ce524845c10649f99b231bb206261ac8522191), it barely works unless the import of "./malware/MALW_DirtyCow.yar" is commented out.

Online version: https://analysis.yararules.com

A grab bag (licensing unclear): https://github.com/mikesxrs/Open-Source-YARA-rules

##### Writing yara rules
* [Writing YARA rules — yara 3.4.0 documentation](http://yara.readthedocs.io/en/v3.4.0/writingrules.html)

TODO: maybe briefly include what I wrote in the doujinshi...?

#### Writing a yara plugin
I wrote about this in the doujinshi [TomoriNao Vol. 1](https://tomorinao.pro/goods/books.html).

Detecting malicious traffic
----

__TODO__: I would also like to cover techniques used in SOCs and at customer sites...

### IDS
Matches packets and flows against signatures and writes the matching flows to a log.

#### snort
Reading: "snort" (すのーと). TODO

#### Suricata
https://suricata-ids.org/

On Ubuntu the version installed via apt is old, so it may be better to install it from git.

Signatures:

* Emerging Threats
    * Provides both paid and free signatures
    * https://rules.emergingthreats.net/

### SIEM
Reading: "seem" (しーむ). An engine that gathers logs and performs security correlation analysis.

### Logs
[Detecting the Datper malware from proxy logs (2017-08-17)](http://www.jpcert.or.jp/magazine/acreport-datper.html) (2017/8/17)
About detecting C2 communication.

[Investigating traces of the Datper malware: an investigation using log analysis tools (Splunk, ELK stack) (2017-09-25)](http://www.jpcert.or.jp/magazine/acreport-search-datper.html) (2017/9/25)
Modern log analysis.

[Introduction to Log Analysis for Security: Techniques for Finding Traces of Cyber Attacks (Software Design plus series)](https://www.amazon.co.jp/dp/429710041X/)

### pcap analysis
[PacketTotal - A free, online PCAP analysis engine](https://www.packettotal.com/)
An online service that returns detection results when you upload a pcap.

### Traces left in HTTP request headers
TODO

### DNS
[Global DNS Hijacking Campaign: DNS Record Manipulation at Scale | FireEye Inc](https://www.fireeye.com/blog/threat-research/2019/01/global-dns-hijacking-campaign-dns-record-manipulation-at-scale.html) (2019/1/9)

### Machine learning
[Focus Research (1): "Detecting malicious traffic by log analysis using deep learning"](https://www.iij.ad.jp/dev/report/iir/042/02.html)

> Deep learning can also be used as a method for discovering malicious communication. Here we introduce a method for detecting the malicious traffic of malware infections and Exploit Kits from the massive logs of ordinary firewalls and web proxy servers.
>
> This article is a restructured version of what was presented at the Briefings of the international security conference "Black Hat Europe 2018" under the title "Deep Impact: Recognizing Unknown Malicious Activities from Zero Knowledge" (\*1).

### Datasets
* List: [16. Public Data Sets — Suricata 4.1.0-dev documentation](https://suricata.readthedocs.io/en/latest/public-data-sets.html)
    * Feeding these pcaps to suricata is fairly interesting
* List: [Public PCAP files for download](https://www.netresec.com/?page=PcapFiles)

References:

#### KDD Cup 99 Data
> Anyone involved in cybersecurity has probably heard of the dataset called KDD Cup 99 Data at least once. KDD Cup is a data mining competition held by the international conference SIGKDD, and KDD Cup 99 Data is its data on network intrusion detection. The task given was to classify normal traffic and attacks.
> From [Notes on KDD Cup 99 Data | 一生あとで読んでろ](http://ntddk.github.io/2016/11/23/kdd-cup-99-data/) (by ntddk)

### Using features left in packets
A method for detecting and classifying malicious traffic based on features of communication protocol headers
https://ipsj.ixsq.nii.ac.jp/ej/index.php?active_action=repository_view_main_item_detail&page_id=13&block_id=8&item_id=106530&item_no=1

tkiwa, a packet detection tool based on characteristic TCP/IP headers
http://ipsr.ynu.ac.jp/tkiwa/

### Awaiting classification
[A study on the effectiveness of User-Agent in malware traffic detection methods](http://www.slideshare.net/recruitcojp/useragent-54370987)

Design and implementation of rule file automation based on analysis of malicious malware networks
https://ipsj.ixsq.nii.ac.jp/ej/?action=pages_view_main&active_action=repository_view_main_item_detail&item_id=108925&item_no=1&page_id=13&block_id=8

A study on malware infection detection methods that take into account the temporal state transitions of features (master's thesis)
https://dspace.wul.waseda.ac.jp/dspace/handle/2065/36065

[Classifying malware-infection traffic logs using data compression algorithms](http://www.slideshare.net/JubatusOfficial/ss-63592780)

Toward detecting unknown malware
----

### What you need
Items to consider are shown in 《》. Points marked with ★ are differentiators, so please put some thought into them.

* Dataset 《Specialize in something? Or generalize?》
    * Benign 《standard Windows executables and libraries, legitimate websites, email, ...》
    * Malicious 《exe, dll, pdf, js, ps1, docx, apk, pcap, malspam, ...》
        * Ideally in-the-wild samples rather than home-made mock malware
    * Sources★ 《research datasets, VirusTotal, online sandboxes, honeypots, crawlers, blacklisted sites (\*1), ...》
        * When the dataset was collected also matters. The malware in circulation differs by period, so a dataset may turn out to be of little use for detecting current samples.
    * Sample label information 《Which AV to adopt?, VirusTotal, ...》
* Features★ 《clustering, hash values, libraries, execution traces, VirusTotal, conversion to images, ...》
* Sandbox 《Cuckoo? A commercial sandbox? An online sandbox?》
* Machine learning★ 《Use Chainer?, Which learning model?, ...》
* Hardware 《CPU, memory, GPU》

(\*1) In some cases, samples judged malicious can be downloaded from sites reported on VirusTotal.

### Examples in practice
* [A prayer for the dying antivirus | 一生あとで読んでろ](https://ntddk.github.io/2017/09/10/a-prayer-for-the-dying-antivirus/) (2017/9/10)

For a start, you could search Japanese papers for "未知マルウェア" (unknown malware):
https://scholar.google.co.jp/scholar?hl=ja&as_sdt=0%2C5&q=%E6%9C%AA%E7%9F%A5%E3%83%9E%E3%83%AB%E3%82%A6%E3%82%A7%E3%82%A2&btnG=

Or search for "未知マルウェア" on a patent search site:
https://www.j-platpat.inpit.go.jp/web/all/top/BTmTopPage

Predicting infection from telemetry data
----

**If you have high-powered computing resources, give it a try!** The previous competition was in 2015.

[Microsoft Malware Prediction | Kaggle](https://www.kaggle.com/c/microsoft-malware-prediction) (ended 2019/5)

> The goal of this competition is to predict a Windows machine’s probability of getting infected by various families of malware, based on different properties of that machine. The telemetry data containing these properties and the machine infections was generated by combining heartbeat and threat reports collected by Microsoft's endpoint protection solution, Windows Defender.

The rules in a nutshell: the state of each machine (i.e. telemetry data) is given as a csv. In the training data, whether a machine is infected is given in the `HasDetections` column. Players predict the value of `HasDetections` for each machine. Can this really be done...? (Well, the point is to try it and see.)

[Microsoft AI competition explores the next evolution of predictive technologies in security - Microsoft Secure](https://cloudblogs.microsoft.com/microsoftsecure/2018/12/13/microsoft-ai-competition-explores-the-next-evolution-of-predictive-technologies-in-security/) (2018/12/13)
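
To make the task format described above concrete, here is a minimal sketch of a baseline that predicts a probability for `HasDetections` from a single categorical column. It is only an illustration under loud assumptions: the csv is a tiny synthetic stand-in, the columns `MachineId` and `AvState` are hypothetical names made up for this example, and a per-value infection rate is a deliberately naive model, not what any competitor actually used.

```python
import csv
import io
from collections import defaultdict

# Synthetic stand-in for the training csv. Only `HasDetections` comes from
# the competition description; `MachineId` and `AvState` are hypothetical.
TRAIN_CSV = """MachineId,AvState,HasDetections
m1,on,0
m2,on,0
m3,off,1
m4,off,1
m5,on,1
m6,off,0
m7,off,1
m8,on,0
"""


def fit_rates(rows, feature, label="HasDetections"):
    """Return (infection rate per feature value, overall infection rate)."""
    counts = defaultdict(lambda: [0, 0])  # value -> [infected, total]
    for row in rows:
        entry = counts[row[feature]]
        entry[0] += int(row[label])
        entry[1] += 1
    infected = sum(entry[0] for entry in counts.values())
    total = sum(entry[1] for entry in counts.values())
    rates = {value: inf / n for value, (inf, n) in counts.items()}
    return rates, infected / total


def predict(row, rates, fallback, feature):
    """Predicted probability that HasDetections is 1 for one machine."""
    # Feature values never seen in training fall back to the overall rate.
    return rates.get(row[feature], fallback)


rows = list(csv.DictReader(io.StringIO(TRAIN_CSV)))
rates, base_rate = fit_rates(rows, "AvState")

for machine in ({"AvState": "off"}, {"AvState": "unknown"}):
    print(machine, predict(machine, rates, base_rate, "AvState"))
```

A real attempt would read the actual training csv, use many columns, and swap in a proper learner; the sketch only shows the shape of the problem: rows of machine properties in, one probability per machine out.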