---
tags: decompiler
title: Timeline
---

# Overview

## Dashboard:

- [x] Complete rule table [name=Ali] - **Saturday Midnight**
  - [x] Categorize them
  - [x] Add in paper :warning: In process
- [x] Add more implicit errors
  - [x] Find implicit error functions [name=Chijung]
  - [x] Inspect for unique new implicit errors in 2000 remaining samples [name=Chijung]
  - [x] Centralize and add them in the paper [name=Ali]
- [ ] (V-C) Complete Eval [name=Ali] - **Sunday Midnight**
  - [x] Most Effective Transformation Rules
    - [x] Explain top 3 rules
    - [x] Fill XXX
  - [x] # of Transformations Applied
    - [x] Report rules applied and number of rules applied per block
  - [x] Errors per File/Function
    - [x] Count of files and functions with 2 or more errors
    - [x] Highest number of errors in a file
    - [x] Highest number of errors in a single function
  - [x] Types of Transformation Rules Applied
    - [x] Explain top rules for each Python version
    - [x] Comment on implicit errors
    - [x] Concrete numbers of rules for Python 2.7
- [ ] (V-D) Time overhead calculation [name=Ali] - **Monday 6pm**
  - [ ] Craft Experiments - **Saturday Midnight**
    - [ ] Time taken for applying rule w.r.t. file size
    - [ ] Time taken for decompilation w.r.t. file size
  - [ ] Add results in paper
- [ ] Handling Opcode Remapping Case study [name=Ali] - **10pm Monday**
  - [ ] Recheck
- [ ] Cover more remaining samples [name=Ali]
  - [ ] Resolve more errors
  - [ ] Update Eval table to reflect coverage

## :alarm_clock: Timeline

- Saturday April 2:
  - Deadline @ 7:59 am

## :closed_book: Overall tasks

- [x] 1. Case studies [link](https://hackmd.io/@aliahad97/rkm8HC9Z9):
  - [x] a. Python 3.9 case study
  - [x] b. Regular case study of malicious sample
  - [x] c. Customized Python VM - shuffled instructions
- [ ] 2. Evaluation section [link](https://hackmd.io/@aliahad97/ByCo2tFb9)
  - [x] a. Samples statistics
  - [x] b. Error inducing statistics
  - [x] c. List down what can and cannot be
  - [x] d. Add initial tables to paper
  - [x] e. Find stats related to blocks
  - [ ] f. Evaluate correctness
  - [ ] g. Evaluate time taken, efficiency and performance
- [ ] 3. Cover Python 3.9 cases ([rules](https://hackmd.io/@aliahad97/HkDD5eVWq), [progress](https://docs.google.com/spreadsheets/d/10dA4An1F36qm5aruGctzKn6nRIQk8xQtvaYodVtsoMI/edit#gid=1627318841))
  - [x] a. Initial draft
  - [x] b. Add to paper
  - [x] c. Finalize rules
  - [x] d. Finalize transformations
  - [ ] e. Reflect changes in paper
- [x] 4. Handle "Other errors" [link](https://docs.google.com/spreadsheets/d/1i3dRGD0GWnQ9OlN7ajnxxfHSSFn5N_yVIgcSytjc3CA/edit#gid=0)
  - [x] a. Check up on `parse errors`
  - [x] b. Inspect other errors
  - [x] c. Finalize draft and charts for these errors
- [x] 5. Eval CFG changes "After" decompilation [name=Chijung]
  - [x] a. Compare CFG in mal dataset
  - [x] b. Compare CFG in benign dataset (ground truth)
- [ ] 6. Update dataset numbers [link](https://docs.google.com/spreadsheets/d/1lWiTob6nIFrQFSZFpIHcUmtopqbEJNi0JVm1GPklqTQ/edit?pli=1#gid=1429326896)
  - [x] a. Prune library related samples
  - [x] b. Merge Malware with pyinstaller samples
  - [x] c. Create table for resolved errors
  - [ ] d. Resolve errors not yet resolved, and give the reason where they cannot be
- [ ] 7. Paper tasks
  - [ ] a. Discussion section
  - [ ] b. Add centralized rules
- [ ] 8. CFG
  - [ ] a. Complete the entire list of implicit errors

# :book: Notes:

- Percentages should be around 5%
- Don't rely on percentages

# :handshake::pencil: Meeting notes:

## Meeting 2

- Use diagrams
- Py3.9
  - Are these big changes?
- Case study
  - shouldn't be trivial
  - failing point
  - Stitch multiple samples
- Go visualize later
  - Create tables for now
- Eval:
  - Basic blocks
    - run regex and see match
  - CFG done by:
    - Clear picture by Monday!

## Meeting 1

- Examples of changing CFG
  - Why does this happen?
  - Versions diff
  - recompilation test with benign source code samples
- Case studies
  - py3.9
  - customized Python VM - shuffled instructions
- Find numbers
  - What we have
  - What we can attain
  - How many rules
  - Basic blocks transformed
  - How many errors
  - iterations for a function and iterations for a file
- Correctness?????
  - number of errors
  - average of errors
- Source of samples: when and where we got them
- Transformations that lead to new errors, and what we do about those?
- Push all to 100%
  - remove library files
- Prepare for other errors

## IF ASKED

- Does priority of rule impact the tool
  - priority can be devised for performance but it doesn't affect the validity
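The recompilation test from the Meeting 1 notes (recompile benign source and compare it against the decompiled output) could be sketched roughly as below. This is only a minimal illustration, not the project's actual harness: `normalize` and `same_bytecode` are hypothetical helper names, and it assumes both sources are compiled by the same interpreter version, so only file names and line numbers are allowed to differ.

```python
import types


def normalize(code):
    """Reduce a code object to a comparable tuple.

    File names and line numbers are deliberately left out, so a decompiled
    source with different layout or comments can still compare equal.
    Nested code objects (functions, classes) are normalized recursively.
    """
    consts = tuple(
        normalize(c) if isinstance(c, types.CodeType)
        else (type(c).__name__, repr(c))  # keeps 1, 1.0 and True distinct
        for c in code.co_consts
    )
    return (code.co_name, code.co_code, code.co_names, code.co_varnames, consts)


def same_bytecode(original_src, decompiled_src):
    """Hypothetical check: do both sources compile to equivalent bytecode?"""
    a = compile(original_src, "<original>", "exec")
    b = compile(decompiled_src, "<decompiled>", "exec")
    return normalize(a) == normalize(b)


# Toy inputs standing in for a benign sample and its decompiled form.
ORIGINAL = "def f(x):\n    return x + 1\n"
DECOMPILED_OK = "# recovered by decompiler\n\n\ndef f(x):\n    return x + 1\n"
DECOMPILED_BAD = "def f(x):\n    return x + 2\n"
```

Calling `same_bytecode(ORIGINAL, DECOMPILED_OK)` should report a match despite the shifted line numbers, while `same_bytecode(ORIGINAL, DECOMPILED_BAD)` should not. A stricter or looser notion of equivalence (for example, comparing CFGs instead of raw bytecode) may fit the evaluation better.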