---
tags: decompiler
title: SP updates
---
| Parse errors breakdown | Uncompyle6 | Decompyle3 | Uncompyle2 | Unpyc37 | Decompyle++ |
|------------------------|--------------|--------------|--------------|--------------|--------------|
| Conditional block | 16.1% (0.8%) | 10.1% (1.9%) | 34.6% (0.8%) | 10.2% (1.2%) | 38.1% (1.5%) |
| Boolean expression | 14.3% (1.2%) | 2.2% (0.4%) | 11.1% (1.1%) | 33.9% (0.7%) | 3.1% (1.4%) |
| Loop block | 20.2% (0.2%) | 31.9% (1.2%) | 42.4% (1.4%) | 6.3% (0.1%) | 27.2% (0.4%) |
| Try/except block | 31.1% (0.8%) | 51.2% (2.2%) | 0.0% (0.0%) | 26.0% (1.0%) | 9.4% (0.7%) |
| With block | 9.8% (0.4%) | 3.6% (0.3%) | 0.0% (0.0%) | 0.0% (0.0%) | 16.8% (1.7%) |
| Other | 8.5% (1.1%) | 1.0% (0.2%) | 11.9% (0.6%) | 23.6% (0.5%) | 5.4% (1.5%) |

| Category | Preliminary | Entire dataset | Diff |
|---------------------------|-------------|----------------|-------|
| Missing parsing rules | 44.5% | 38.7% | -5.8% |
| Conflicting parsing rules | 42.4% | 39.4% | -3.0% |
| Unsupported instructions | 11.9% | 21.1% | +9.2% |
| Implementation bugs | 1.2% | 0.8% | -0.4% |
# To do:
1. Update abstract to add *5 decompilers*.
2. Update the following to 375

3. Rephrase

# Reviewer feedback:
The following is the original response (click details to expand):
:::spoiler

:::
For each of the points, the breakdown is as follows.
### 2. Extend evaluations to include other Python decompilers (e.g., pycdc and unpyc) and obfuscation methods.
#### Other decompilers.
For this part, we extend our evaluation to the decompilers pycdc, uncompyle2, and unpyc37. (***Section 5*** in paper)
The tables for each of their evaluations are linked below: (***Table 5*** in paper)
- [Explicit](https://docs.google.com/spreadsheets/d/1r90fQLIHTJmQ9S2upRzgzuoX-itusia5VWSFqFKfAkA/edit#gid=406200740)
- [Implicit](https://docs.google.com/spreadsheets/d/1w06MACucgc2HsG5q08Tuwkjv0Eu98BGzGwcx-LKuGX4/edit#gid=1186557262)
##### Decompilers without an error-reporting mechanism.
unpyc37 and pycdc do not report the offsets or functions that induce errors. We therefore slice the module by functions to pinpoint the erroneous ones. In all these cases, the entry point of perturbation is the root node.
:::danger
Note that there may be cases where a single function contains multiple errors.
:::
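The slicing step above can be sketched with the standard library: a module's code object nests one code object per function in `co_consts`, so each function can be extracted and fed to a decompiler in isolation. This is an illustrative sketch, not our actual tooling; the helper name is ours.

```python
import types

def find_function_code_objects(module_code):
    """Recursively collect the per-function code objects nested in a
    module's code object, so each function can be analyzed alone."""
    found = {}
    for const in module_code.co_consts:
        if isinstance(const, types.CodeType):
            found[const.co_name] = const
            found.update(find_function_code_objects(const))  # nested defs
    return found

# Example: compile a small module and slice it by function.
source = """
def ok():
    return 1

def broken(lib):
    for i in lib:
        pass
"""
module_code = compile(source, "<module>", "exec")
functions = find_function_code_objects(module_code)
print(sorted(functions))  # ['broken', 'ok']
```

Each extracted code object can then be marshalled back into a minimal `.pyc` and decompiled separately to localize the failing function.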
##### unpyc37 and uncompyle2:
We find PyFET performs well against these two decompilers.
##### Pycdc:
pycdc exhibits special cases stemming from its immature implementation. For each explicit error, we document the observed patterns [here](https://hackmd.io/Zu2FG0VxStutWpB1ynD27A?view#Segmentation-fault). In addition, we find a variety of unsupported instructions (22 instructions across all Python versions), for which we apply the rules [here](https://hackmd.io/faccVWwjT8COgXD6jv9MDQ#Unsupported-instructions1).
###### Design issues in Pycdc:
For implicit errors, we also observe an unusual class of syntax errors. The decompiler uses the same set of grammar rules across all Python versions, i.e., it treats an instruction such as `POP_JUMP_IF_FALSE` in Python 2.7 the same way it would in Python 3.8.
Moreover, the decompiler scales poorly with newer Python versions. The developers decide whether to support a new version based only on the number of unsupported instructions, which leads them to miss parsing rules for instructions whose meaning changes with context.
For example, Python 3.8 drops the `SETUP_LOOP` instruction for loops (7,395 instances in our data). The decompiler does not handle such instances, leading either to implicit errors at loops or to outputs with syntax errors.
```
def foo():
    for i in lib:
        z = z
```
decompiles to:
```
def foo():
    pass
```
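The version drift above can be observed directly with the stdlib `dis` module: `SETUP_LOOP` is present in the 3.7 opcode table but absent from 3.8 onward, so loops compile to plain jump instructions instead. A minimal check (run under Python 3.8+):

```python
import dis

# SETUP_LOOP was removed from CPython's bytecode in 3.8; a decompiler
# whose loop grammar is keyed on it will miss every 3.8+ loop.
print('SETUP_LOOP' in dis.opmap)  # False on Python >= 3.8

def foo(lib):
    z = None
    for i in lib:
        z = i
    return z

# Inspect the instructions actually emitted for the loop.
loop_ops = {ins.opname for ins in dis.get_instructions(foo)}
print('SETUP_LOOP' in loop_ops)  # False on Python >= 3.8
```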
We see the same behavior for try/except, where the decompiler mistreats `SETUP_FINALLY` because the instruction's handling was never ported to newer Python versions.
:::warning
Try porting these instructions to Python 3.7.
:::
###### Perf eval
(***Section 5.3*** in paper)
We also extend our performance evaluation; the results are [here](https://docs.google.com/spreadsheets/d/1URVPfhsh3qCLQ2-UtEYLa-riGX-QQWtakbEwB69MjQw/edit#gid=43625599).
#### Obfuscation.
(***Section 5.4*** in paper)
Details on obfuscation techniques can be found [here](https://hackmd.io/@jGDeFsroQ7C7kfC8L-kU-Q/HJldH-v35).
### 3. Conduct root cause analysis to identify and characterize bugs or other issues in Python decompilers that result in decompilation errors.
(***Section 2*** in paper)
For root cause analysis, we categorize errors along two main dimensions for all decompilers: 1) "type of error in a decompiler" and 2) "error location in bytecode".
For the type of error in a decompiler, we define 5 main categories:
1. **Unsupported instructions**: errors where the decompiler fails to recognize an instruction or does not support it (excluding errors introduced after migration).
2. **Unsupported python versions**: errors caused by pyc files from Python versions unsupported by uncompyle6 and decompyle3.
3. **Parse error (missing grammar)**: errors where the decompiler fails to map any grammar rule to a pattern in the binary.
4. **Grammar conflict**: errors caused by the decompiler inferring an incorrect grammar rule for a certain pattern (implicit errors).
5. **Other errors**: bugs in the decompiler itself (e.g., segmentation faults and DNF).
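The bucketing of raw decompiler output into these categories could be sketched as a pattern-matching pass over stderr lines. The patterns below are purely illustrative (real decompiler messages vary by tool and version), and the function name is ours:

```python
import re

# Illustrative patterns only -- these do NOT reproduce the exact error
# strings of any specific decompiler; they sketch how stderr lines
# could be bucketed into the five root-cause categories above.
CATEGORIES = [
    ("Unsupported instructions", re.compile(r"[Uu]nsupported (opcode|instruction)")),
    ("Unsupported python versions", re.compile(r"[Uu]nsupported (bytecode|Python) version")),
    ("Parse error (missing grammar)", re.compile(r"[Pp]arse error|cannot parse")),
    ("Grammar conflict", re.compile(r"[Ii]nvalid syntax|[Ss]yntax ?[Ee]rror")),
]

def classify(stderr_line):
    for category, pattern in CATEGORIES:
        if pattern.search(stderr_line):
            return category
    return "Other errors"  # segfaults, DNF, miscellaneous decompiler bugs

print(classify("Unsupported opcode: LOAD_METHOD"))      # Unsupported instructions
print(classify("Segmentation fault (core dumped)"))     # Other errors
```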

For parse errors and grammar conflicts, we study the context in the bytecode where the errors originate. Note that pinpointing the exact grammar conflict, or which grammar rule is missing, is difficult because decompilers attempt all possible grammars greedily (i.e., they will look for loops or with statements inside a conditional if at least one instruction overlaps between the respective grammar rules).
As such we categorize the contexts of failures as follows:
1. conditionals
2. booleans
3. loops
4. try/except
5. with block
6. other

The final results are found [here](https://docs.google.com/spreadsheets/d/1i3dRGD0GWnQ9OlN7ajnxxfHSSFn5N_yVIgcSytjc3CA/edit#gid=1708920927).
### 4. Depict the human effort necessary to fix decompiler defects. Compare the human work involved with PYFET.
(***Section 5.5*** in paper -- Add new section and ***Section 6*** in paper)
To depict the human effort necessary, we analyze the complexity of the decompilers to show that updating a decompiler takes non-trivial effort. We present these details [here](https://hackmd.io/N8_tSnbQRRiNzGY48dyvtA?view#Complexity-of-decompiler).
For case studies, we analyze other decompilers and present them [here](https://hackmd.io/N8_tSnbQRRiNzGY48dyvtA?view#New-case-study).
### Overall, derived from items 2, 3, and 4, the reviewers would want a precise, direct response to the following question: how generic is PYFET and how much effort does it save versus directly patching the bugs?
(***Section 6*** in paper)
- From **4** we show that directly fixing a decompiler takes non-trivial effort, which PyFET avoids.
- From **3** we see that the errors have varied root causes, and pinpointing the exact location of an error is itself a difficult task. Our design fundamentally relieves the analyst of finding the root cause: our tool perturbs the binary to locate and fix the error.
- From **2** we see that our approach is generic and can complement all decompilers in fixing errors, despite the poor design or implementation choices their developers make.
### 5. Perform a ground-truth study in which you compare the decompiled binaries with the ground truth that was used to produce the binaries, to study the usefulness of the proposed transformations. How different are the two? Are the differences small enough that they can still aid meaningful forensic analysis?
(***Section 5.X*** in paper - New section for correctness?)
We run our ground-truth study on **1,300** samples from 100 applications downloaded from GitHub. Each sample contains an error that requires a fix from one of the 26 FETs. Note that we ensure each sample has **only one error**, to measure the impact of a single transformation on the rest of the code. Also note that we instrument errors for FET rules for which we did not find sufficient candidate samples.
The details can be found [here](https://hackmd.io/iSB4nTTiROuTXPcf9xxTcg).
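One way to frame the ground-truth comparison is structural rather than textual: parse both the original source and the decompiled output and compare their ASTs, so that formatting, comments, and redundant parentheses are ignored while names and control flow still must match. This is a sketch of the comparison idea under that assumption, not our actual measurement pipeline:

```python
import ast

def ast_equal(src_a, src_b):
    """Compare two Python sources structurally: formatting, comments,
    and blank lines are ignored; identifiers and control flow are not."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))

ground_truth = "def foo(x):\n    return x + 1\n"
decompiled   = "def foo(x):  # recovered\n\n    return (x + 1)\n"
print(ast_equal(ground_truth, decompiled))  # True: only formatting differs

lossy = "def foo(x):\n    pass\n"
print(ast_equal(ground_truth, lossy))  # False: the body was lost
```

A stricter variant could normalize local variable names first, since decompilers cannot always recover the original identifiers.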
### 1. Clarify confusions and ambiguities identified by reviewers. Deliver whatever you have promised to fix/enhance in your rebuttal.
- Update intro to incorporate changes in paper
- Issues with decompiler - add detail and move decompiler study
- Major root causes for errors
- Fundamental design choices that makes our tool better ("handling obfuscated bytecode within the decompiler would disrupt the current design of the decompiler significantly.")
- Add reasoning for choice of decompilers (Eval?)
- Update implicit error description to correct factual misunderstandings
- not blindly replace errors
- Type 4 issue
- Split eval of Py 3.9 binaries
- Keep separate
- Justify not migrating python 2.7~3.6 binaries to decompyle3
- Explain binary-level transformation for FET (i.e., the rules in the table are for presentation purposes only)
- Discuss other obfuscation techniques
- Add scope with PyFET not handling source level obfuscation techniques
- Give breakdown for SETs/FETs
- Discussion with developers?
- Add future work for extensibility of PyFET
- Explain how FETs are derived
- Add discussion on errors introduced with transformation
- Explicitly mention that we use different datasets for training and testing

