---
tags: decompiler
title: Source of the samples
---

# Overview

## Draft Analysis of dataset for the paper

We used pyc files collected by [ReversingLabs](https://www.reversinglabs.com/) from April 2016 to July 2021. We received 1905 raw pyc files and 627 PyInstaller-bundled executables. The samples have threat levels ranging from 1 to 5 (average 2.10), with 5 being the highest degree of threat. The samples also span various threat families (such as PBot, Lazagne, SeaDuke) and the threat types listed below.

| Threat type | Count | | Threat type | Count |
| -------- | -------- | -------- | -------- | -------- |
| Adware | 1479 | | Trojan | 519 |
| Hacktool | 103 | | Packed | 82 |
| Ransomware | 75 | | Backdoor | 51 |
| Exploit | 43 | | Infostealer | 27 |
| PUA | 19 | | Malware | 13 |
| Worm | 10 | | Spyware | 5 |
| Downloader | 4 | | Virus | 4 |
| Keylogger | 1 | | Certificate | 1 |
| Dropper | 1 | | Network | 1 |

We unpacked the PyInstaller executables to obtain 134,920 pyc files. We removed duplicates by hashing each file, which left 44,796 unique pyc files.

## Comments by Prof Kim

As far as I remember, I shared 1000 pyc malware samples with you guys (I guess Meng) in April 2020, and I also uploaded them to the GitHub repo. Check the commit message here. (https://github.com/roguedream/py-mal-sample/commit/0999a0bcdb73dafebe18f71c1daa41a86e425080)

Then, Ali and Meng said that most of the pyc malware samples were `PBOT` and that you needed more samples other than `PBOT`.

On Aug. 4th, 2021, I filtered out the `PBOT` malware classification and downloaded another 1906 malicious pyc files from ReversingLabs. The pyc files were zipped as `pyc_malware.zip` and shared on Aug. 4th, 2021. Since PyInstaller samples were requested as well, I downloaded 627 malicious PyInstaller executables and shared them with you in `pyinstaller_malware.zip` on the same day.

Questions remain:

- What was the source of each of them?
    - These pyc files and PyInstaller executables were collected by ReversingLabs.
- When were these samples collected?
    - I will share the JSON files, which include all the information. You can parse them to get related details such as when the samples were collected and how serious the malware is. All reports are included in the JSON files.
- What time period were these samples collected over?
    - For the PyInstaller executables, the `first_seen` dates are:
        - earliest: 2016-07-21
        - latest: 2021-07-28
    - For the PyInstaller executables, the `last_seen` dates are:
        - earliest: 2019-02-12
        - latest: 2021-07-28
    - For the pyc files, the `first_seen` dates are:
        - earliest: 2016-04-13
        - latest: 2021-06-21
    - For the pyc files, the `last_seen` dates are:
        - earliest: 2017-12-02
        - latest: 2021-07-06

---

The samples come from several sources.

- Oct 16 2021 -> Pyc malware received
    - Uploaded on GitHub by Prof Kim
    - [link](https://github.com/roguedream/py-mal-sample)
    - ![](https://i.imgur.com/ZEdDbMN.png)
- 2021/08/05 -> PyInstaller files received
    - Uploaded on gdrive and received by Meng
- March 17 2021 -> Malware 1-6 received
    - Uploaded on GitHub by Prof Kim
    - [link](https://github.com/roguedream/py-mal-sample)
    - [ref](https://discord.com/channels/719402366209884190/816750378728161290/816797237954609154)
    - ![](https://i.imgur.com/PJW8ExT.png)

Questions remain:

- What was the source of each of them?
- When were these samples collected?
- What time period were these samples collected over?
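
For reference, the hash-based duplicate removal mentioned in the draft analysis could be sketched roughly as follows. This is a minimal illustration, not the script that was actually used: the choice of SHA-256, the helper names `file_sha256` and `unique_pyc_files`, and the directory name in the usage note are all assumptions, since the notes do not record which hash function or layout was used.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_pyc_files(root: Path) -> dict[str, Path]:
    """Map each distinct content hash to one representative .pyc path.

    Assumption: two files count as duplicates when their bytes are identical.
    """
    unique: dict[str, Path] = {}
    # Sorting makes the choice of representative deterministic.
    for path in sorted(root.rglob("*.pyc")):
        unique.setdefault(file_sha256(path), path)
    return unique
```

Calling `unique_pyc_files(Path("unpacked_pyc"))` on a hypothetical directory of unpacked files returns one path per distinct hash, so the length of the result is the number of unique pyc files.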