Project Summary for `Forced-Execution`

---
tags: forced-execution
---

# Project Summary for `Forced-Execution`

# Timeline

- Find Scientific Contributions in Batch 2 of Samples: [Now - 11 June]
- Find Scientific Contributions in Batch 1 of Samples (Revisit): [11 June - 25 June]
- Complete the outline of the paper with figures [25 June]
- Start Writing Paper: [25 June - 2 July]
- Revision period [2 July - 9 July]
- Revision incorporation [9 July - 16 July]
- Create Website and Add content [9 June - 16 July]
- NDSS: 24 July

### Intro/Background

Python has become the go-to language for scripting almost anything, given the vast range of libraries and open-source modules available to developers. Because of this, many malicious parties have settled on writing malware entirely in Python, which may be dropped onto systems as plain scripts or obfuscated within an executable.

### Tool Description

We introduce a tool that takes these Python programs, be they executables, plain-text scripts, or Python object files, and **forcefully executes** them in order to **uncover the (potentially malicious) abilities or functionality** of these programs.

* This is done by x-force, j-force, malmax

Our tool **manipulates the predicates**, i.e., *forcefully executes* code, in order to achieve 100% coverage of the code. Beyond that, our tool forces all combinations of paths and logs fine-grained run-time information about the program (e.g., variable values, function arguments, etc.).

* This is done by x-force, j-force, malmax
* [comment] Covering all combinations is not possible. It causes the path explosion problem.

Moreover, it overrides errors by leveraging our Python forced-objects, which fill in the gaps where forceful execution may break the program.
* This is "automatically recovering from exceptions/faults"
* Done by X-force, blanket execution
* https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/egele

In doing so, our tool gives a holistic picture of what these malware programs are capable of doing and what their real intentions may be.

* Same as x-force and others

# Background

Python code comes in different variants, and so do the corresponding malware programs. Attackers can drop their scripts in the following formats:

- Python executables
    - [name=yk] This is just packed python. Unpacking is not our main contribution -- so forget about this
- **Python object files** (`.pyo` / `.pyc`)
    - [name=yk] **This is our focus**
- Plain-text python scripts
    - [name=yk] I understand that there can be source code based malware, but it would not really qualify to say in the paper. I would rather use this for internal analysis only. So analyzing this does NOT help us. We gain no credit. So, do it when you can't handle other samples.

Furthermore, these programs tend to hide their intentions, either by obfuscating themselves or by embedding their malicious intent within a benign program, so that reading the source code directly is challenging or less meaningful.

* [name=yk] static obfuscation (e.g., changing variable names) is not our focus
* [name=yk] runtime packing/unpacking (e.g., base64encoding/decoding and eval) is in scope, but not much of our focus. This is boring topic so we should focus on others unless we have nothing else to talk.
* [name=yk] **evasive** techniques are in scope and important.

Other than obfuscation, certain Python language features create hurdles for analysts trying to understand the program, and at times it becomes ~~impossible~~ very difficult to create or even infer the program environment under which the malicious behavior is uncovered.

* [name=yk] impossible is the word you may never use in science.

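
To make the evasive case concrete, the following is a minimal sketch of a sample whose behavior depends on the host it runs on, together with a toy way of forcing its predicate. Everything in it is hypothetical: the sample source, the host name, the `ForceIf` transformer, and the `run` helper are invented for illustration. The forcing is done by rewriting the AST with the standard `ast` module, which is a stand-in chosen for brevity and not a description of how our tool or X-Force forces predicates.

```python
import ast

# Hypothetical evasive sample: the payload only runs on one specific host.
SAMPLE = '''
import platform

def main(log):
    if platform.node() == "victim-host-01":
        log.append("payload")      # stand-in for the hidden behaviour
    else:
        log.append("decoy")        # what an analyst sees on any other machine
'''


class ForceIf(ast.NodeTransformer):
    """Replace the test of every `if` statement with a fixed outcome."""

    def __init__(self, outcome):
        self.outcome = outcome

    def visit_If(self, node):
        self.generic_visit(node)  # also rewrite nested ifs
        forced = ast.Constant(value=self.outcome)
        node.test = ast.copy_location(forced, node.test)
        return node


def run(source, outcome=None):
    """Run `main` from `source`; if `outcome` is set, force every `if` to it."""
    tree = ast.parse(source)
    if outcome is not None:
        tree = ast.fix_missing_locations(ForceIf(outcome).visit(tree))
    namespace = {}
    exec(compile(tree, "<sample>", "exec"), namespace)
    log = []
    namespace["main"](log)
    return log


natural = run(SAMPLE)              # depends on the analysis machine
forced_true = run(SAMPLE, True)    # payload branch, regardless of host
forced_false = run(SAMPLE, False)  # decoy branch
```

On a machine that is not named `victim-host-01`, the natural run only ever logs the decoy, while the forced run reaches the payload branch without the analyst having to reconstruct the expected environment.
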
We analyze these limitations and design our tool to cater to such situations.

## Challenges

Before stating the goals, we list below the challenges that Python programs pose for regular tools.

1. ~~Flipping predicates can lead to state/path explosion.~~
2. ~~Loops can be infinitely run by naive predicate flipping.~~
3. Blocks not associated with predicates, such as exception blocks, may be missed.
4. Forced execution may lead to missing/garbage data, resulting in faulty object creation.
5. Wrong objects may be passed to logically incorrect paths, leading to erroneous executions.
6. Uncontrolled forced execution through imported modules may lead to forced execution of an exponentially larger code base.
7. Missing I/O may crash the program, creating a need for some level of simulation.
8. Fake object -> Fake object explosion
9. Traversing through imported modules leads to exponentially increased logs => how to avoid this? Execute only the functions and classes defined in the main file
10. Impact of fake objects on regular executions
11. Fork explosion from naively forking at every conditional

```
try:
    f = open('xxx')
except OSError:
    # create: the notes leave this open (create the file, or a fake object in its place)
    pass
```

## Goals

1. **Predicate manipulation**: The foundation of the tool, covering conditionals and loops. We want to ensure all blocks are covered.
2. **Exception blocks**: These are the `try`/`except` blocks. We want to execute both blocks regardless of whether an error is raised.
3. **Missing modules/dependencies**: These are modules that may be missing from a fragment of code. We want to ignore them and simulate their methods and objects at run-time. {Exposes some functionality for unknown object - convert blackbox to whitebox}
4. **Dynamic object manipulation**: We want to create attributes and methods of Python objects that may be missing at run-time, so that execution proceeds smoothly.
5. ~~**File and Network I/O**: We want to force execution w.r.t. disk and network interactions and have them execute smoothly.~~

# Design

We build our tool on top of the Python interpreter. W

# Related works

# Conclusion