# CDAP Development Update 2

This update is a bit delayed. Future updates will be posted about every two weeks.

In my last [update](https://hackmd.io/@alexchenzl/rkXcGporF), I set the goal for this stage: design and implement a prototype of the predictive engine based on the Geth codebase. Development is going about as well as I expected, though I ran into some difficulties before I found a workaround.

At the same time, my main goals in this program have become clearer than ever as I get more deeply involved in it:

* Accomplish the project: Static Analysis to Predict Access List
* Gain a deep understanding of the go-ethereum source code, because it is the most widely used Ethereum execution layer client

Progress
---

At first, I spent some time reading the EVM implementation of go-ethereum to understand some of the optimizations that have been applied, such as [simplified run loop instruction calling](https://medium.com/@jeff.ethereum/the-removal-of-the-experimental-evm-ae4d8ffbef3e), [64-bit gas instructions](https://medium.com/@jeff.ethereum/optimising-the-ethereum-virtual-machine-58457e61ca15#.58zkoudg0) and [using uint256](https://chfast.github.io/Go-ethereum-EVM-optimization-report/).

[Etk](https://github.com/quilt/etk) was the first project I was interested in at the beginning of this program. It uses symbolic execution in its disassembler to analyze EVM bytecode. So I also spent some time reading its source code to learn how it analyzes bytecode, and found a minor [bug](https://github.com/quilt/etk/issues/85).

Then I began to think about the predictive engine. There are two potential solutions for it:

* Using pattern matching to find the data access list
* Simulating the execution of transactions and recording data access

The first one should be easier to build, but it can only find basic static data accesses. The second one is more complex, but it can also find dynamic data accesses.

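
To make the difference between the two options concrete, here is a minimal sketch of the second one: a toy interpreter that executes a few opcodes and records every storage slot it touches. It is written in Python for brevity rather than in Go, and it is not the Geth EVM or the engine described in this update. The names `AccessRecorder` and `simulate` are hypothetical; only the opcode byte values are the real EVM ones.

```python
from dataclasses import dataclass, field

# Real EVM opcode values; everything else in this sketch is a toy.
PUSH1, SLOAD, SSTORE, ADD, STOP = 0x60, 0x54, 0x55, 0x01, 0x00


@dataclass
class AccessRecorder:
    """Hypothetical helper: collects each storage slot touched, in first-seen order."""
    slots: list = field(default_factory=list)

    def touch(self, slot: int) -> None:
        if slot not in self.slots:
            self.slots.append(slot)


def simulate(code: bytes, storage: dict, recorder: AccessRecorder) -> None:
    """Run a tiny subset of EVM bytecode and record storage accesses."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == PUSH1:
            stack.append(code[pc])  # one-byte immediate operand
            pc += 1
        elif op == ADD:
            stack.append((stack.pop() + stack.pop()) % 2**256)
        elif op == SLOAD:
            slot = stack.pop()
            recorder.touch(slot)
            stack.append(storage.get(slot, 0))
        elif op == SSTORE:
            slot, value = stack.pop(), stack.pop()  # key is on top, value below it
            recorder.touch(slot)
            storage[slot] = value
        elif op == STOP:
            break
        else:
            raise ValueError(f"unsupported opcode {op:#x}")


# PUSH1 0x00, SLOAD, SLOAD, STOP: the second slot key is the value read from slot 0.
recorder = AccessRecorder()
simulate(bytes([PUSH1, 0x00, SLOAD, SLOAD, STOP]), {0: 5, 5: 42}, recorder)
print(recorder.slots)  # [0, 5]
```

Slot `0` is visible in the bytecode itself, so pattern matching could find it. Slot `5` only appears once the value stored in slot `0` is known, which is the kind of dynamic access that needs simulation.
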

In the end I decided to implement the second solution, both for the reason above and because it helps me understand the details of the EVM implementation.

My initial idea was to use a symbolic expression to represent an unknown storage value when executing a transaction payload in the engine. Ideally, when a storage slot key depends on the value of a previous slot, the key could sometimes be calculated right after retrieving that value, without calling the prediction function again.

I began to implement it, but found it too complex. The instructions, stack, memory and statedb would all need too much modification. It would obviously also perform badly even if it were implemented.

I decided to put the implementation work aside for a while. I wanted to see how data access happens in a real transaction, so I read the tracer implementation in Geth and wrote a [trace tool](https://github.com/alexchenzl/TraceHelper.git) to extract the data access list together with the context information of transactions. This tool should help me design test cases, debug, and even find patterns to optimize the predictive engine.

While implementing the trace tool, I realized that I only need a specific uint256 value such as `0xFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFCFC` to tag an unknown storage value, instead of an expression. Then the stack and memory don't need any changes, and the instructions don't need many changes either. An in-memory statedb still needs to be implemented. The complexity is much lower. This is the approach I'm working on now.

Next Goals
---

* Finish the first version of a runnable predictive engine
* Write some basic test cases and make them pass
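
For reference, the tagging idea from the Progress section can be sketched roughly as follows. This is only an illustration in Python, not the engine's actual code. The class `InMemoryState` and its `get_state` method are hypothetical names, and the sketch assumes the tag is a full 256-bit word made of repeated `0xFC` bytes. How the real engine treats values derived from a tagged word is not covered here.

```python
# Assumed tag value: 32 bytes of 0xFC, i.e. one full uint256 word.
UNKNOWN = int("FC" * 32, 16)


class InMemoryState:
    """Hypothetical in-memory state: known slots return their value, others the tag."""

    def __init__(self, known=None):
        self.known = dict(known or {})  # (address, slot) -> value
        self.accessed = []              # every (address, slot) read, in first-seen order
        self.missing = []               # reads that were answered with UNKNOWN

    def get_state(self, address: str, slot: int) -> int:
        key = (address, slot)
        if key not in self.accessed:
            self.accessed.append(key)
        if key in self.known:
            return self.known[key]
        if key not in self.missing:
            self.missing.append(key)
        return UNKNOWN  # plain integer, so stack and memory code need no changes


state = InMemoryState({("0xabc", 0): 7})
first = state.get_state("0xabc", 0)       # known locally, returns 7
second = state.get_state("0xabc", first)  # slot 7 is not known locally
print(second == UNKNOWN)  # True
print(state.missing)      # [('0xabc', 7)]
```

Because the tag is an ordinary 256-bit integer, it can sit on the stack or in memory like any other word. The `missing` list then names the slots whose real values would have to be fetched before a more precise run.
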