This document is an overview of the artifact that accompanies the conditionally accepted OOPSLA'19 paper
BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-program Path Sampling and Per-path Abstract Interpretation
by Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, and Xiangyu Zhang.
The artifact is provided as a VirtualBox appliance named BDA-Artifact.ova, with Ubuntu 18.04.2 LTS installed. Necessary software such as radare2, python2.7, and sqlite3 is pre-installed. Additionally, the Rust nightly toolchain is installed. The MD5 checksum of the provided appliance is
MD5 (BDA-Artifact.ova) = 2bde8ade5728d447eaf0da81323b3023
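Before importing, you can verify the integrity of the download. For example, on Linux:
# Verify the appliance checksum (use md5 instead of md5sum on macOS)
md5sum BDA-Artifact.ova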
Desktop users (Windows, Linux, macOS, or Solaris) can download Oracle VirtualBox from www.virtualbox.org and import the VM following the instructions here.
Linux terminal users can run:
# Install Oracle VirtualBox
sudo apt install virtualbox
# Import the pre-built appliance
vboxmanage import BDA-Artifact.ova
# Start Virtual Machine
VBoxHeadless --startvm BDA-Artifact &
Note that the virtual machine needs 15 GB of memory (10 GB might work; users can adjust this before starting the VM via the GUI or the following command) and nearly 20 GB of disk space. Please make sure your environment satisfies these requirements.
# Reset memory to 10G if necessary
VBoxManage modifyvm BDA-Artifact --memory 10240
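You can confirm the configured memory before starting the VM, for example:
# Confirm the VM's configured memory (prints memory=10240 after the reset above)
VBoxManage showvminfo BDA-Artifact --machinereadable | grep -i memory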
Moreover, when the VM is running, VirtualBox forwards port 12345 of the host machine to port 22 of the guest. Please make sure the host machine's port 12345 is not occupied.
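For example, you can check that nothing on the host is already listening on that port:
# The port is free if this prints nothing
ss -ltn | grep ':12345'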
Desktop users can log in to the virtual machine through the Oracle VirtualBox GUI, using the following username and password.
username: bda
password: bda
Terminal users can log in via SSH from the host machine with the above credentials.
ssh -p 12345 bda@127.0.0.1
In this section, we briefly describe our implementation of BDA, which is located at ~/sabre in the VM.
BDA is written in Rust, with 15K LoC. The following are the important parts of BDA.
src/
Main implementation of BDA, which we call the sabre framework, short for Sampling-based Analysis for Binary Reverse Engineering.
src/analyzer/
Analysis plugins for sabre, including the posterior analysis plugin for BDA (Section 6).
src/engine/
Radare2-based per-path abstract interpreter, which combines the emulator and the path_sampler (Section 5).
src/medium/
Intermediate Representation (IR) and Abstract Domains.
src/medium/metadata/
Abstract domains for the per-path abstract interpreter (Section 5).
src/medium/HIG/
Customized low-level IR of sabre, which can represent the CFG and DDG.
src/medium/UBG/
Weighted high-level IR of sabre, used for unbiased whole-program path sampling (Section 4).
artifacts/
Artifact working directory.
artifacts/clean.sh
Bash script for cleaning current analysis results.
artifacts/run.sh
Bash script for running analysis.
artifacts/rebuild.sh
Bash script for rebuilding BDA.
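You can browse these components inside the VM; for instance, a rough line count over the Rust sources confirms the 15K LoC figure mentioned above:
# Count the Rust source lines of the sabre framework
find ~/sabre/src -name '*.rs' | xargs wc -l | tail -n 1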
Please cd to the artifact directory via
cd ~/sabre/artifacts
The bash scripts may fail if the user's $PWD is not $HOME/sabre/artifacts.
To (re-)build BDA, run
./rebuild.sh
Note that this step might take 10 to 30 minutes and needs a network connection to download third-party packages.
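Since BDA is written in Rust, rebuilding presumably fetches crates over the network (exactly which hosts rebuild.sh contacts is an assumption here); a quick connectivity check before a long rebuild can save time:
# Sanity-check network connectivity before rebuilding (crates.io is assumed)
curl -sI https://crates.io > /dev/null && echo network OK || echo network unreachable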
There are five figures/tables in Section 7 (Evaluation). In this section, we describe how to reproduce the empirical evaluation for Fig. 9, Table 4, and Table 5.
Additionally, according to Table 6 (Runtime overhead), some of the benchmarks require BDA to run for a long time (more than 24 hours per executable in the VM) and consume large amounts of memory (more than 50 GB), and their results would occupy more than 10 GB of disk space. It is very inconvenient to prepare and distribute such a VM and reproduce those results, so we picked several relatively small benchmarks to evaluate. Note that, theoretically, the size of the benchmark does not influence BDA's accuracy.
We do not evaluate Table 6 (Runtime overhead), because its performance claims cannot be reproduced in a VM.
We also skip Fig. 10 (Effect of sampling), because it requires repeating the normal analysis 10 times, taking much longer than a single analysis (around one whole day for a small executable and more than a week for large ones).
If any user thinks it is necessary to evaluate the above missing parts, please contact zhan3299@purdue.edu for further discussion.
As mentioned above, evaluating BDA needs a long time and a large amount of memory. Thus, we recommend evaluating BDA one target at a time.
# cd to Artifact Directory
cd ~/sabre/artifacts
# show all analysis targets
ls
181.mcf/ # Time: 0.5 - 1.5 h; Memory: 1 - 2 GB
164.gzip/ # Time: 3 - 5 h; Memory: 3 - 5 GB
256.bzip2/ # Time: 3 - 5 h; Memory: 3 - 5 GB
254.gap/ # Time: 4 - 8 h; Memory: 4 - 9 GB
252.eon/ # Time: 5 - 9 h; Memory: 8 - 12 GB
Every directory under ~/sabre/artifacts is a pre-prepared analysis target. We offer the above five benchmarks for evaluation.
We use 181.mcf
as an example to show how to evaluate BDA.
cd ~/sabre/artifacts
./run.sh 181.mcf
After that, the script shows the estimated time and memory consumption for the given analysis target as a warning:
./run.sh 181.mcf
Start to run analysis for 181.mcf
This analysis might take 30 to 90 minutes, and consume 1 to 2 GB memory
Please make sure whether environment is valid, and continue? (y/n)
Press y to begin the analysis and wait for the result. During the analysis, a few logs are printed. Note that if the memory requirement is not satisfied, the analysis will be very slow due to memory swapping.
./run.sh 181.mcf
Start to run analysis for 181.mcf
This analysis might take 30 to 90 minutes, and consume 1 to 2 GB memory
Please make sure whether environment is valid, and continue? (y/n)
y
Running migration ValueInit_create_table
Running migration VariableInit_create_table
[1/4] Sampling done.
[2/4] Calculating dependence with analysis done.
[3/4] Calculating dependence without analysis done.
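While the analysis runs, you can watch memory pressure from a second terminal (e.g., another SSH session) to confirm the VM is not swapping:
# Print memory and swap usage every 5 seconds; growing swap usage means the analysis will slow down
free -h -s 5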
When the analysis ends, it outputs the results as follows:
./run.sh 181.mcf
Start to run analysis for 181.mcf
This analysis might take 30 to 90 minutes, and consume 1 to 2 GB memory
Please make sure whether environment is valid, and continue? (y/n)
y
Running migration ValueInit_create_table
Running migration VariableInit_create_table
[1/4] Sampling done.
[2/4] Calculating dependence with analysis done.
[3/4] Calculating dependence without analysis done.
[4/4] Testing intra-procedure paths coverage done.
Finial Report:
# This data is for Table 4. (181.mcf BDA part)
# Due to randomization, following is accepted range:
# MISS: 0 ~ 15
# Extra: 1K ~ 3K
# Mistyped: 10% ~ 20%
# *Less* Miss/Extra/Mistyped means BDA is more accurate.
Report (Analysis Enable: true):
FOUND: 4554
MISS: 2(0.10%)
EXTRA: 2506
MISTYPED: 682(14.98%)
----------------------------------------------------------------------
# This data is for Table 5 w/o analysis (181.mcf part).
# Due to randomization, this data will vary in a large range.
# *More* Miss means posterior analysis is more necessary.
Report (Analysis Enable: false):
FOUND: 2506
MISS: 36(1.76%)
EXTRA: 492
MISTYPED: 88(3.51%)
----------------------------------------------------------------------
# This data is for Fig 9 (181.mcf part).
Covered Rate:
0.0: 0%
0.1: 0%
0.2: 0%
0.3: 0%
0.4: 0%
0.5: 0%
0.6: 0%
0.7: 0%
0.8: 0%
0.9: 0%
1.0: 100%
======================================================================
The main concern is that our analysis is randomized, which means the final results should fall into an accepted range rather than match an exact number. We show this range in the comments for Table 4 (which is our final result), e.g.:
# Due to randomization, following is accepted range:
# MISS: 0 ~ 15
# Extra: 1K ~ 3K
# Mistyped: 10% ~ 20%
For Table 5, due to the lack of posterior analysis and the enormous number of whole-program paths, the result may vary over a large range (so we do not give an accepted range for it).
For Fig. 9, the covered rate indicates the percentage of functions for which BDA has achieved various levels of coverage. The first number is the coverage level, and the second number is the percentage of functions. Taking 0.9: 6% as an example, it means that 6% of the functions have reached the 0.9 coverage level (BDA covered 90%~99% of their intra-procedure paths).
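As a convenience, a run's output can be checked against the accepted ranges mechanically. The snippet below is a hypothetical helper, not shipped with the artifact; it saves the run's output and checks the Table 4 MISS count for 181.mcf against the 0 ~ 15 range:
# Hypothetical helper (not part of the artifact): capture the run's output
./run.sh 181.mcf | tee 181.mcf.log
# Extract the MISS count from the analysis-enabled report
miss=$(awk -F'[:(]' '/Analysis Enable: true/ {t=1} t && /MISS/ {gsub(/ /, "", $2); print $2; exit}' 181.mcf.log)
if [ "$miss" -le 15 ]; then echo "MISS=$miss: within accepted range"; else echo "MISS=$miss: out of range"; fi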
If users want to see more internal logs, they can run
DUMP_LOG=true ./run.sh 181.mcf
The other benchmarks can be evaluated via
./run.sh 181.mcf/
./run.sh 164.gzip/
./run.sh 256.bzip2/
./run.sh 254.gap/
./run.sh 252.eon/
We also offer the capability to run all pre-prepared targets.
# Clean current analysis results
./clean.sh
# Run all
./run.sh
Running all the analyses might take a whole day. After they finish, run
# Gather analysis result
./result.sh
cat reports.txt
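Terminal users can then copy the gathered report back to the host through the forwarded port:
# Run on the host machine to fetch the report out of the VM
scp -P 12345 bda@127.0.0.1:sabre/artifacts/reports.txt .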