---
tags: comp-decomp
title: Task for Hamza
---

:::info
UPDATE: Focus on samples in dir: `samples/cpython_38`

Data overview [here](https://docs.google.com/spreadsheets/d/1czDlDY0NtSVFOAXQBvBDk-lP3BrwfqZ4bHeaVZVLEPc/edit#gid=0).
:::

# Mapping slices to bytecode

For this task, you have to map bytecode slices to source code so that we can build a glossary of which bytecode patterns correspond to which source code.

The data structure is as follows:

```
code_object_to_source_map = [
    (code_object_1, corresponding_source_code_1),
    (code_object_2, corresponding_source_code_2),
    (code_object_3, corresponding_source_code_3),
    (code_object_4, corresponding_source_code_4),
    ...
    ...
    (code_object_N, corresponding_source_code_N),
]
```

# Approaches

What you can do is:

1. Take one piece of source code
2. Recompile it
3. Get the code object
4. Add it to the list

Do this for source code of varying sizes to create a corpus of code-object-to-source-code maps.

:::info
Note that we will use this corpus as a stepping stone to synthesize our own code objects later in the project. Focus on getting the best list you can.
:::

## Functions that can help:

See the function `get_closest_code` in the file `lib/decompiler_helper.py`, which takes an initial code object and computes the similarity score between the original code object and the decompiled code object.
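
The four steps listed under Approaches could be sketched roughly as below. This is only a minimal illustration, not the project's actual tooling: the helper names `build_code_object_to_source_map` and `opnames`, the `"<sample>"` filename, and the two inline sample strings are all made up for the example. It uses only the built-in `compile` and the standard `dis` module. Bytecode differs between interpreter versions, so this presumably needs to run under the CPython version that matches the samples.

```python
import dis


def build_code_object_to_source_map(sources):
    """Compile each source string and pair the code object with its source.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    code_object_to_source_map = []
    for source in sources:
        # Steps 2 and 3: recompile the source and get the top-level code object.
        code_object = compile(source, "<sample>", "exec")
        # Step 4: add the (code object, source) pair to the list.
        code_object_to_source_map.append((code_object, source))
    return code_object_to_source_map


def opnames(code_object):
    """Return the opcode names of a code object, useful for spotting patterns."""
    return [instruction.opname for instruction in dis.get_instructions(code_object)]


# Illustrative inputs only; real inputs would be read from the sample directory.
samples = [
    "x = 1\n",
    "def f(a, b):\n    return a + b\n",
]

corpus = build_code_object_to_source_map(samples)

for code_object, source in corpus:
    print(repr(source), opnames(code_object))
```

Only the top-level code object of each sample is stored here. Code objects for nested functions live in `co_consts` of the top-level object, so whether to add those as separate entries is a design choice left open.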