Week 4 Progress

# Week 4 Progress ### Key notes: - There is no way to test the new function `is_subgraph`. - Task is not only writing function, but creating end to end solution - For each function you write, there should be a small test that needs to be written to ensure reproducibility TODO: - Return the exact subgraph and draw a cfg of it as well ### Main task: ```is_subgraph checking``` Notes and approaches from prototyping a solution: - Approach: make a set of vertices and a set of edges, then check that all of nodes and edges in the subgraph are in the other graph (check both ways). This is the only true robust and simple way to check the subgraph of a graph. - DFS and BFS methods always have the higher likelihood of false negative classifications depending on the order neighboring nodes are traversed. - Concern: How do we represent the vertices and edges? - the id's of a subgraph may be different than another graph, so id does NOT work. (ex: same basic block w/ differnt IDs) - we could use all of the instructions and make some sort of unique identifier with string concatenation - problem: how would we represent edges? Lengthy string comparison are NOT efficient and would be memory intensive #### My Solution: HASHING - notes on python's hash complexity. It is O(n) for new strings and amortized constant for reused strings https://stackoverflow.com/questions/51338089/complexity-of-the-internal-hash-function-in-python - Integer comparisons with the integers returned from the built-in hash function are much faster than string comparisons - Nodes with the same basic block but different id's will be correctly indentified as the same node because the hash function is deterministic (same data always evalutates to the same hash) and uses the instructions of a node as a parameter. - Two helper functions were also added to cfg.py to keep code cleaner and more modular - Future tasks/questions: - should I integrate the is_subgraph function into the CFG object? It would have the header ``` CFG.is_subgraph(self, other_cfg)``` - Does using the hash function present any problems? Did I overlook any easier/alternative methods that avoid using it?