Some Ideas and Questions about Value Set Analysis

This blog is about my basic ideas about VSA and some questions, whose aim is to try to expose my misunderstanding and throw some problems about the difficulty in applying VSA onto RadecoIL.

The remainder of the blog is organized as follows:

My Basic Ideas about Applying VSA onto RadecoIL
- Data Struct
- Algorithm
Some Obstacle in front of Our Work
- Initial Global Data
- Call Site
- Condition and Loop

By the way, all of these mentioned below will ONLY concentrate on the basic VSA from "Analyzing Memory Accesses in x86 Executables".

The GMOD-Based Merge Function mentioned in "Improved Memory-Access Analysis for x86 Executables"
VSA with ASI mentioned in DIVINE: DIscovering Variables IN Executables

will be included after solving all of problems in this blog.

My Basic Ideas about Applying VSA onto RadecoIL

Data Struct

VSA consists of three basic compositions:

a_loc: A_locs corresponds to the variables in source code, which contain local variables (gathered from R2), global (gathered from R2), heap-allocation variables and registers.
Value Set: N-tuple of PIC, for n memory regions, while RIC consist of four numbers (a, b, c, d) as a segment: a * [b, c] + d. In general, pure numbers could be in the GLOBAL region (Like what the original paper did).
Abstract Store: Abstract Stores are a map from a_locs to Value Sets, for every single instruction. In RadecoIL, it could be bounded with every single OpCode/Phi node.

From now on, I have three ideas about the whole data structures, but I'm not sure which to choose, or other better ones.

Building a HashMap for every OpCode/Phi node with another HashMap. The latter will map every a_loc to its value set. The shortage of this implement is the huge memory and time consumption, as both equal O(nmp), while n is the number of SSA nodes, m is the numbers of a_locs and p is the numbers of memory regions.
Another idea is based on the truth that every Opcode/Phi node will only changes one or two a_locs. We could maintain a HashMap from a_locs to value sets during the traveling in SSA. This will cut down the consumption, but increase the possibility of writing bugs. The reason is that this implementation need us to consider much more special cases, like a loop or conditional statements.
A middle course, maintain abstract store at every basic block's entry and exit.

Algorithm

The algoriths for all data structures are similar, working in as BFS.

The program maintains a heap, which consist of basic blocks. The sort key of the heap is the number of unvisited predecessor, from small to big. The aim of this heap is that we could distinguish whether the block is a loop entry. When Heap pops a basic block, if the number of unvisited predecessor is zero, it means is's not a loop entry block, otherwise, it is.
Push the start node into heap with the initial abstract store and a_locs.
If the heap is empty, go to step 7.
Pop a basic block, if it's a loop entry, use limited widening for every phi node, otherwise, use simple union/join.
Trave the Opcode node in order within the poped block, and calculate the abstract store at the exit of this block.
Push all the successor of the poped block. Go to step 4.
End the program.

Some Obstacles in front of Our Work

Initial Global Data

However, the algorithm mentioned above is not complete. The first obstacle is the initial global data. The problem occurs in the step 2 for initial abstract store

We all know that global datas could have initial value. It could be an address, or a pure number, or more common, a zero number in .bss segment. But the point is, we only concentrate on single function, that means we could not know whether other functions have changed their value. We cannot have a accurate initial abstrace store for global.

Only one solution I could have is to assume they are any value.

Call Site

Call Statements are the most difficult point in the whole alogrithm. We cannot calculate abstract store when we meet an OpCall, because we cannot know what happen will inside the callee, even the BP/SP could be changed in special situation like function alloc.

In the original paper, the auther solved this problem by building a supergraph of the executable, which includes all the functions. But we could do that, because it's an amazing consumption on time and memory.

Thus, the only solution is to analyze functions in order, from leaf to the root in a special tree. The tree consists of functions, where the father function node calls the son function node. But, there are still some sub-problems:

In common situation, the leaf functions are always library functions. So how could we get the abstract store from library functions?
The point of analyzing functions in order is that we should get the register who works for the return value. So how could we finger this register out? (See afC command in radare2, implementing afCr may be a way to go)
Another point of this implement is to find out the preserved registers, which will be used to determine the abstract store in caller function. The method mentioned in the original paper is try to find the smallest sp_delta, but how to find the sp_delta?
Recursive function

If above problems have been solved (or maybe there are still other problems), we could gather information form callee function and handle OpCall

Condition and Loop

We have mentioned a method to finger the loop entry. By the way, that idea was made by myself, so I'm not sure it's total right, even I have proved it. There're also other problems about the limited widening:

The first problem is the, in the original paper, authers could easily calculate the value range of the register which controls the execution of the loop. However, in our radeco-lib, it's not an easy work. Because our block's selectors are always flag register, but not general registers. How could we get the real condition for a conditional statement? It should be solved, even we bypass it in VSA, but we will meet it in C-AST.
The second problem is the Affine Relations of general registers. I think it's a hard work to find them

However, the problem in Condition and Loop is not so urgent, because we can give up a litte preciseness and replace Limited Widening with Ordinary Widening ;P

Some Pre-work should be done

Make a HashMap from SSA node into it's registers (for calculating abstract store)
Gather global variables and their initial value from r2, especially some readonly data.