Execution visualization for rv32emu

唐文駿

Objective

The primary goal of this project is to provide a visual representation of the behavior of RISC-V programs while incorporating hardware performance estimations, enabling deeper insights into program execution and system performance.

Execution Plan

1. Study the Reference Project

  • Analyze ama-riscv-sim to understand its mechanisms for:
    • Logging executed instructions and runtime profiling.
    • Visualizing program behavior with Python scripts.

2. Analyze the Current Emulator

  • Evaluate the visualization output that rv32emu already provides.

3. Apply Visualization Methods to the Current Emulator

  • Integrate a logging mechanism in rv32emu to capture the sequence of executed instructions, program counters, and execution times.

Visualization in ama-riscv-sim

ama-riscv-sim uses ./run_analysis.py to implement visualization.
The script visualizes RISC-V instruction execution and memory access patterns through charts and data analysis. Its main functions are explained below.

1. User Input Parsing and Command Validation

The program parses user-provided command-line arguments and performs corresponding actions based on the parameters, such as processing instruction logs or execution trace files.

import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Analysis of memory access logs and traces")
    parser.add_argument('-i', '--inst_log', type=str, help="Path to JSON instruction count log with profiling data")
    parser.add_argument('-t', '--trace', type=str, help="Path to binary execution trace")
    # Additional parameters omitted...
    return parser.parse_args()
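
The validation half of this step is not shown in the excerpt. A minimal sketch, assuming the two inputs are meant to be mutually exclusive; the argument group and the `argv` parameter are illustrative and are not taken from run_analysis.py:

```python
import argparse

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Analysis of memory access logs and traces")
    # Assumption: exactly one kind of input is expected per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('-i', '--inst_log', type=str, help="Path to JSON instruction count log")
    group.add_argument('-t', '--trace', type=str, help="Path to binary execution trace")
    return parser.parse_args(argv)

args = parse_args(['-i', 'inst_profiler.json'])
print(args.inst_log, args.trace)
```

Passing both `-i` and `-t` makes argparse print a usage error and exit, so the later stages only ever see one input.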

2. File Processing

The provided binary execution trace file is read and converted into a DataFrame for further processing and analysis.

import numpy as np
import pandas as pd

def load_bin_trace(bin_log, args) -> pd.DataFrame:
    dtype = np.dtype([
        ('pc', np.uint32), ('isz', np.uint32),
        ('dmem', np.uint32), ('dsz', np.uint32), ('sp', np.uint32),
    ])
    data = np.fromfile(bin_log, dtype=dtype)
    df = pd.DataFrame(data, columns=['pc', 'isz', 'dmem', 'dsz', 'sp'])
    # Additional processing logic omitted...
    return df
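
To make the record layout concrete, here is a standard-library sketch that decodes the same five 32-bit fields without NumPy. The little-endian byte order and the sample values are assumptions made for illustration:

```python
import struct

# pc, isz, dmem, dsz, sp as five uint32 values (little-endian is an assumption).
RECORD = struct.Struct("<5I")
FIELDS = ("pc", "isz", "dmem", "dsz", "sp")

def decode_trace(raw: bytes) -> list[dict]:
    """Decode back-to-back 20-byte records into dictionaries."""
    return [dict(zip(FIELDS, rec)) for rec in RECORD.iter_unpack(raw)]

# Two made-up records: an instruction fetch only, then one with a 4-byte data access.
raw = (RECORD.pack(0x80000000, 4, 0, 0, 0x80010000)
       + RECORD.pack(0x80000004, 4, 0x80002000, 4, 0x80010000))
rows = decode_trace(raw)
print(hex(rows[1]["dmem"]))
```

Each record is 20 bytes, so a trace whose size is not a multiple of 20 would indicate a truncated or mismatched file.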

3. Visualization Generation

Two main types of visualizations are generated: a histogram of instruction execution frequency and a time series chart of instruction execution behavior. The time series shows how instruction execution changes during the program's runtime, enabling analysis of instruction access patterns.

3.1. Histogram Generation

import matplotlib.pyplot as plt

def draw_inst_log(df, hl_groups, title, args) -> plt.Figure:
    df_g = df[['i_type', 'count']].groupby('i_type').sum()
    fig, ax = plt.subplots()
    ax.barh(df_g.index, df_g['count'], color="skyblue")
    # Additional chart settings omitted...
    return fig
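
The aggregation that feeds the bar chart is a group-and-sum over instruction types. A standard-library sketch of that step, using made-up rows and hypothetical type names:

```python
from collections import defaultdict

# Hypothetical rows: (instruction, i_type, count). The values are made up.
rows = [("add", "alu", 1024), ("lw", "mem", 256), ("sw", "mem", 128), ("beq", "branch", 64)]

def sum_by_type(rows):
    """Sum the counts of all instructions that share an i_type."""
    totals = defaultdict(int)
    for _inst, i_type, count in rows:
        totals[i_type] += count
    return dict(totals)

totals = sum_by_type(rows)
print(totals)
```

The resulting mapping corresponds to what `df_g` holds before it is passed to `ax.barh`.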

3.2. Time Series Chart Generation

def draw_exec(df, hl_groups, title, symbols, args, ctype) -> plt.Figure:
    fig, ax_t = plt.subplots()
    ax_t.step(df.index, df['pc'], where='pre', lw=1.5)
    # Additional chart settings omitted...
    return fig
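
The same PC sequence that is drawn as a step plot can also be summarized as a frequency table, which makes hot loops easy to spot. A small sketch with a made-up PC sequence:

```python
from collections import Counter

# Made-up trace: 0x100 runs three times, 0x104 twice, 0x108 once.
pcs = [0x100, 0x104, 0x100, 0x104, 0x100, 0x108]

def hottest(pcs, n=2):
    """Return the n most frequently executed PCs with their counts."""
    return Counter(pcs).most_common(n)

hot = hottest(pcs)
print([(hex(pc), count) for pc, count in hot])
```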

Which file records the dynamic changes?


4. Execution and Data Saving

The main function orchestrates the above functionalities and, based on user options, saves the generated data or charts to files.


def run_main(args) -> None:
    if args.inst_log:
        df, fig = run_inst_log(args.inst_log, hl_groups, title, args)
        # Save CSV or images...
    elif args.trace:
        df, figs_dict = run_bin_trace(args.trace, hl_groups, title, args)
        # Save charts...

Environment

$ riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (g04696df09) 14.2.0
$ gcc --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
$ sysctl -a | grep machdep.cpu     
machdep.cpu.cores_per_package: 8
machdep.cpu.core_count: 8
machdep.cpu.logical_per_package: 8
machdep.cpu.thread_count: 8
machdep.cpu.brand_string: Apple M2

Step 1: Try to execute ama-riscv-sim on an example file

../src/ama-riscv-sim ../sw/baremetal/vector_ew_mac_uint8/basic.bin --out_dir_tag=basic_test

Problem 1: make fails under src

main.cpp:4:10: fatal error: 'external/cxxopts/include/cxxopts.hpp' file not found
#include "external/cxxopts/include/cxxopts.hpp"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [build/main.o] Error 1

The file cxxopts.hpp was not found because the external/cxxopts submodule might not have been initialized. cxxopts is a C++ library for handling command-line parameters and is usually included in projects as a git submodule.

Solution: Initialize the git submodules so that external/cxxopts is checked out, adding cxxopts as a submodule if it is missing.

not used [-Werror,-Wunused-private-field]
        main_memory* mem;
                     ^
1 error generated.
make: *** [build/hw_models/cache.o] Error 1

Problem 2: In cache.h, the private member variable mem is reported as unused.
In this project, the purpose of the mem pointer is to enable the cache class to interact with the main memory. This is evident from the following key points in the code:

#if CACHE_MODE == CACHE_MODE_FUNC
act_line.data = mem->rd_line(addr - BASE_ADDR);
#endif

When the cache requires a write-back, dirty data needs to be written back to the main memory:

#if CACHE_MODE == CACHE_MODE_FUNC and defined(CACHE_VERIFY)
mem->wr_line((act_line.tag << tag_off) - BASE_ADDR, act_line.data);
#endif

However, these operations are surrounded by conditional compilation directives:

#if CACHE_MODE == CACHE_MODE_FUNC
#if CACHE_MODE == CACHE_MODE_FUNC and defined(CACHE_VERIFY)

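This indicates that the mem pointer is only used in functional simulation mode (CACHE_MODE_FUNC). In any other mode the compiler sees no use of the member, and -Werror turns the resulting -Wunused-private-field warning into a build failure.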
Solution:
First, add the mem member variable in cache.h:

class cache {
    private:
        uint32_t sets;
        uint32_t ways;
        // ... other members ...
        main_memory* mem;  // Add this line
        bool speculative_exec_active;

Modify the constructor in cache.cpp:

cache::cache(uint32_t sets, uint32_t ways, std::string cache_name, main_memory* mem) :
    sets(sets), ways(ways), cache_name(cache_name), mem(mem)
{
    validate_inputs(sets, ways);
    // ... rest of the constructor ...
}

Update the initialization in main_memory.cpp:

#ifdef ENABLE_HW_PROF
,
icache(hw_cfg.icache_sets, hw_cfg.icache_ways, "icache", this),
dcache(hw_cfg.dcache_sets, hw_cfg.dcache_ways, "dcache", this)
#endif

Problem 3: the software under sw does not build

 % make -j
../Makefile.inc:63: *** commands commence before first target. Stop.

Solution 3: manually build main.c under /sw/baremetal/ with riscv64-unknown-elf-gcc

Don't put screenshots which contain plaintext only.

Step 1 result:

Screenshot 2025-01-22 9.24.26 PM
The run produces the following files, which are used in step 2.
Screenshot 2025-01-22 9.43.58 PM

exec.log records the detailed execution process of the program (timeline).
inst_profiler.json captures the usage frequency of various instructions.
hw_stats.json logs hardware performance metrics (e.g., cache hit rates).

{
"add": {"count": 1024},
"sub": {"count": 0},
"sll": {"count": 0},
"srl": {"count": 0},
"sra": {"count": 0},
    ...
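
Reading this file back is straightforward. The sketch below ranks the executed instructions by count; the sample string only mirrors the shape of the excerpt above and its values are made up:

```python
import json

# Made-up sample in the same shape as the excerpt above.
sample = '{"add": {"count": 1024}, "sub": {"count": 0}, "lw": {"count": 300}}'

def executed_counts(text: str) -> list[tuple[str, int]]:
    """Return (name, count) pairs for executed instructions, most frequent first."""
    prof = json.loads(text)
    pairs = [(name, entry["count"]) for name, entry in prof.items() if entry["count"] > 0]
    return sorted(pairs, key=lambda p: p[1], reverse=True)

ranked = executed_counts(sample)
print(ranked)
```

Dropping zero-count entries keeps the chart focused on instructions that actually ran.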

Step 2: Try to execute ama-riscv-sim on an example file

Execution result
Figure 1: instructions profiled

Screenshot 2025-01-22 10.08.12 PM
Figure 2: PC frequency profiled
We need to identify how the simulator generates the instruction count JSON file. Once this is clarified, we can adapt the approach for rv32emu and reuse the same Python script to visualize its instructions.

Log Generation Mechanism for ama-riscv-sim

exec.log / inst_profiler.json / hw_stats.json
These three simulation log files are produced by profilers.cpp.

Execution Flow Summary

1. Launching the Simulator

When executing ./ama-riscv-sim to start the simulator, the program enters main() and creates the memory and core:

int main(int argc, char* argv[]) {
    // ... Parse command-line arguments ...
    
    // Create main memory and core
    memory mem(test_bin, hw_cfg);
    core rv32(&mem, gen_log_path(test_bin, out_dir_tag), cfg, hw_cfg);
    rv32.exec();  // Start execution
}

2. Core Execution and Profiler Initialization

The core begins executing the program. Inside the core class, the profiler is initialized:

class core {
    profiler prof;  // Instruction profiler

    void step() {
        uint32_t inst = fetch();  // Fetch instruction
        prof.new_inst(inst);      // Notify profiler of new instruction

        // Record statistics during instruction execution
        switch(inst_type) {
            case BRANCH:
                prof.log_inst(opc_j::i_beq, taken, direction);
                break;
            default:
                prof.log_inst(opc);
                break;
        }
    }
};

Key Points:

  • The profiler maintains an execution counter for each instruction type.
  • Special handling is implemented for branch instruction behavior.

3. Profiler Behavior During Execution

The profiler records instruction types and counts during execution:

class profiler {
    // Instruction counter array
    struct prof_g_t {
        std::string name;
        uint64_t count;
    } prof_g_arr[NUM_INST];

    // Branch instruction counters
    struct prof_j_t {
        std::string name;
        uint64_t count_taken;
        uint64_t count_taken_fwd;
        uint64_t count_not_taken;
        uint64_t count_not_taken_fwd;
    } prof_j_arr[NUM_BRANCH];

    void log_inst(opc_g opc) {
        prof_g_arr[TO_U32(opc)].count++;  // Increment count
    }

    void log_inst(opc_j opc, bool taken, b_dir_t direction) {
        if (taken) {
            prof_j_arr[TO_U32(opc)].count_taken++;
            if (direction == b_dir_t::forward)
                prof_j_arr[TO_U32(opc)].count_taken_fwd++;
        }
    }
};

Key Points:

  • The profiler tracks all executed instructions and branch instruction details, including directions and taken counts.
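
The counting scheme can be modeled in a few lines of Python. This is a toy model, not the simulator's code: the class and method names are hypothetical, and the not-taken path is an assumption, since the excerpt only shows the taken case.

```python
from collections import defaultdict

class Profiler:
    """Toy model of the per-instruction and per-branch counters."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.branches = defaultdict(
            lambda: {"taken": 0, "taken_fwd": 0, "not_taken": 0, "not_taken_fwd": 0})

    def log_inst(self, name: str) -> None:
        self.counts[name] += 1

    def log_branch(self, name: str, taken: bool, forward: bool) -> None:
        key = "taken" if taken else "not_taken"  # not-taken handling is assumed
        self.branches[name][key] += 1
        if forward:
            self.branches[name][key + "_fwd"] += 1

prof = Profiler()
prof.log_inst("add")
prof.log_branch("beq", taken=True, forward=False)   # e.g. a loop back-edge
prof.log_branch("beq", taken=False, forward=True)
print(dict(prof.counts), dict(prof.branches))
```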

4. Outputting Execution Statistics

When the program completes execution, the profiler outputs statistics:

  • inst_profiler.json: Contains execution statistics for all instructions, including a taken/forward breakdown for branch instructions.

void profiler::log_to_file() {
    // Open output file
    ofs.open(log_path + "inst_profiler.json");

    // Output general instruction statistics
    for (const auto &i : prof_g_arr) {
        ofs << "\"" << i.name << "\": {\"count\": " << i.count << "}," << std::endl;
    }

    // Output branch instruction statistics
    for (const auto &e : prof_j_arr) {
        ofs << "\"" << e.name << "\": {"
            << "\"count\": " << e.count_taken + e.count_not_taken << ","
            << "\"breakdown\": {"
            << "\"taken\": " << e.count_taken << ","
            << "\"taken_fwd\": " << e.count_taken_fwd
            << "}}," << std::endl;
    }
}
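
The excerpt assembles the JSON text by hand, so every entry ends with a comma and the enclosing braces have to be written separately (presumably in the omitted part). For comparison, here is a sketch of the same record shape produced with Python's json module; the function name and the input dictionaries are hypothetical:

```python
import json

def profile_to_json(general: dict, branches: dict) -> str:
    """Serialize general and branch counters in the inst_profiler.json shape."""
    out = {name: {"count": count} for name, count in general.items()}
    for name, b in branches.items():
        out[name] = {
            "count": b["taken"] + b["not_taken"],
            "breakdown": {"taken": b["taken"], "taken_fwd": b["taken_fwd"]},
        }
    return json.dumps(out, indent=2)

text = profile_to_json({"add": 1024},
                       {"beq": {"taken": 3, "taken_fwd": 1, "not_taken": 2}})
print(text)
```

A serializer guarantees well-formed output, which matters because the analysis script parses this file with a strict JSON reader.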

Current visualization in rv32emu

rv32emu already has its own visualization. The goal here is to understand how it works and then apply the Python code to it.

Step 1: Run rv_histogram

We will compile tests/nqueens.c into an ELF file and analyze it using rv_histogram. The steps are as follows:

Issue 1:
pocoloco@wenjuntangdeMacBook-Pro-2 rv32emu % build/rv_histogram -a nqueens.elf
Failed to open nqueens.elf
Solution:
From the code in src/elf.c, it is clear that the simulator expects a 32-bit RISC-V ELF file:

/* must be 32bit ELF */
if (e->hdr->e_ident[EI_CLASS] != ELFCLASS32)
    return false;

/* check if machine type is RISC-V */
if (e->hdr->e_machine != EM_RISCV)
    return false;
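
The same two checks can be run on a file before handing it to the tool. The sketch below inspects only the first 20 bytes of the ELF header; the field offsets and constants come from the ELF format, while the little-endian read of e_machine and the function name are assumptions made for this example:

```python
import struct

EI_CLASS, ELFCLASS32, EM_RISCV = 4, 1, 243  # constants from the ELF format

def is_rv32_elf(header: bytes) -> bool:
    """Check magic, 32-bit class, and RISC-V machine type."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    if header[EI_CLASS] != ELFCLASS32:
        return False
    # e_machine is the 16-bit field at byte offset 18 (little-endian assumed).
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_RISCV

# Synthetic headers: e_ident (16 bytes), then e_type and e_machine.
hdr32 = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 243)
hdr64 = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 243)
print(is_rv32_elf(hdr32), is_rv32_elf(hdr64))
```

For a real file, the header can be read with `open(path, "rb").read(20)`.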

Recompile using RV32:

  1. Compile nqueens.c into an ELF file:

riscv64-unknown-elf-gcc -O2 -march=rv32im -mabi=ilp32 -static tests/nqueens.c -o build/nqueens.elf

  2. Analyze it using rv_histogram:

  • Display usage statistics for all instructions:

build/rv_histogram -a nqueens.elf

  • Display register usage statistics:

build/rv_histogram -r nqueens.elf

  • Display both instruction and register usage statistics simultaneously:

build/rv_histogram -ar nqueens.elf

Result for step 1:

Screenshot 2025-01-22 11.43.59 PM
Figure 3: build/rv_histogram -a nqueens.elf

Screenshot 2025-01-23 12.49.01 AM
Figure 4: build/rv_histogram -r nqueens.elf

Histogram Generation Mechanism for rv32emu

The primary goal of rv_histogram.c is to parse ELF files, collect and analyze the usage frequency of RV32 instructions or registers, and finally visualize the results using histograms.

Steps to Process and Analyze Data

1. Parsing Command-Line Arguments

The program begins by parsing command-line arguments using the parse_args() function. It determines the target ELF file and whether to perform analysis on registers or instructions.

2. Loading and Parsing the ELF File

In the main() function, the program loads the ELF file and extracts its program sections using the following logic:

elf_t *e = elf_new();
if (!elf_open(e, elf_prog)) { /* Handle ELF loading */ }

uint8_t *elf_first_byte = get_elf_first_byte(e);
const struct Elf32_Shdr **shdrs =
    (const struct Elf32_Shdr **) &elf_first_byte[hdr->e_shoff];

3. Extracting Executable Instruction Sections

The program iterates through each section in the ELF file, checking if the section type (sh_type) is SHT_PROGBITS and if the section contains executable instructions (sh_flags with SHF_EXECINSTR). Identified executable sections are further analyzed:

while (ptr < exec_end_addr) {
    insn = *((uint32_t *) ptr);
    ptr += 4; /* advance past a 32-bit instruction; compressed handling omitted here */
    rv_decode(&ir, insn);
    hist_record(&ir);
}
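
The walk over an executable section can be illustrated in Python. This is a simplified stand-in for the C loop: it classifies instructions by their major opcode field instead of calling a full decoder, and it uses the RISC-V rule that an encoding whose two lowest bits are not 11 is a 16-bit compressed instruction. The function name and the sample bytes are made up for the example.

```python
from collections import Counter

def count_opcodes(code: bytes) -> Counter:
    """Walk little-endian RISC-V code and count instructions per major opcode."""
    hist = Counter()
    ptr = 0
    while ptr + 2 <= len(code):
        low = int.from_bytes(code[ptr:ptr + 2], "little")
        if (low & 0b11) != 0b11:      # 16-bit compressed encoding
            hist["compressed"] += 1
            ptr += 2
            continue
        if ptr + 4 > len(code):       # truncated 32-bit instruction
            break
        insn = int.from_bytes(code[ptr:ptr + 4], "little")
        hist[insn & 0x7F] += 1        # bits 6..0 hold the major opcode
        ptr += 4
    return hist

# addi x0, x0, 0 (0x00000013), c.nop (0x0001), add x1, x2, x3 (0x003100b3)
code = bytes.fromhex("13000000") + bytes.fromhex("0100") + bytes.fromhex("b3003100")
hist = count_opcodes(code)
print(hist)
```

Encodings longer than 32 bits are not handled in this sketch.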

4. Instruction and Register Frequency Analysis

For each instruction, the program performs frequency analysis based on the show_reg parameter. The logic updates the frequency statistics as follows:

  • Instruction Frequency: Updates rv_insn_stats using insn_hist_incr().
  • Register Frequency: Updates rv_reg_stats using reg_hist_incr().

static void insn_hist_incr(const rv_insn_t *ir);
static void reg_hist_incr(const rv_insn_t *ir);

5. Calculating Frequencies and Generating Histograms

The program calculates the highest instruction or register frequency using find_max_freq() and generates histograms for each instruction or register based on the statistics:

find_max_freq(rv_insn_stats, N_RV_INSNS + 1);
print_hist_stats(rv_insn_stats, N_RV_INSNS + 1);

static void print_hist_stats(const rv_hist_t *stats, size_t stats_size) {
    char hist_bar[max_col * 3 + 1];
    float percent;
    size_t idx = 1;

    for (size_t i = 0; i < stats_size; i++) {
        const char *insn_reg = stats[i].insn_reg;
        size_t freq = stats[i].freq;

        percent = ((float) freq / total_freq) * 100;
        if (percent < 1.00)
            continue;

        printf(fmt, idx, insn_reg, percent, freq,
               gen_hist_bar(hist_bar, sizeof(hist_bar), freq, max_freq, max_col,
                            used_col));
        idx++;
    }
}
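
The printing logic comes down to three steps: compute each entry's share of the total, skip entries below 1%, and scale the bar length against the most frequent entry. A Python sketch of that idea follows; the function name, the '#' bar character, and the line format are simplifications, not the tool's exact output:

```python
def format_hist(stats: dict, max_col: int = 40, min_percent: float = 1.0) -> list[str]:
    """Return one text line per entry whose share is at least min_percent."""
    total = sum(stats.values())
    if total == 0:
        return []
    max_freq = max(stats.values())
    lines = []
    for name, freq in stats.items():
        percent = freq / total * 100
        if percent < min_percent:
            continue                                # same 1% cut-off as the C code
        bar = "#" * (freq * max_col // max_freq)    # scale against the largest entry
        lines.append(f"{len(lines) + 1:3}. {name:<8}{percent:6.2f}% [{freq:<6}] {bar}")
    return lines

lines = format_hist({"addi": 600, "lw": 395, "fence": 5})
print("\n".join(lines))
```

In this sample, fence accounts for 0.5% of the total and is therefore omitted.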

Issue 2: Bad address
% build/rv32emu -p build/nqueens.elf
Error: Bad address

Implementation for RV32emu: rv_pyvisual

1. Add rv_pyvisual.c

rv_pyvisual.c writes a statistics file named output.json under build/pyvisual and then calls run_analysis.py to generate figures for the target input ELF file.

2. Update mk/tools.mk so that make tool builds rv_pyvisual

$ git diff HEAD mk/tools.mk
+PYVIS_BIN := $(OUT)/rv_pyvisual
 
+PYVIS_OBJS := \
+       riscv.o \
+       utils.o \
+       map.o \
+       elf.o \
+       decode.o \
+       mpool.o \
+       utils.o \
+       rv_pyvisual.o
+

+PYVIS_OBJS := $(addprefix $(OUT)/, $(PYVIS_OBJS))
+deps += $(PYVIS_OBJS:%.o=%.o.d)
 
-TOOLS_BIN += $(HIST_BIN)
+$(PYVIS_BIN): $(PYVIS_OBJS)
+       $(VECHO) "  LD\t$@\n"
+       $(Q)$(CC) -o $@ -D RV32_FEATURE_GDBSTUB=0 $^ $(LDFLAGS)
+
+TOOLS_BIN += $(HIST_BIN) $(PYVIS_BIN)
 

3. Add python visualization code run_analysis.py

4. Running rv_pyvisual (Basic Syntax)

$ build/rv_pyvisual [options] <elf_file_path> [options]

Example:

$ build/rv_pyvisual -i build/nqueens.elf -l ""

Or run the instruction log analysis with highlighted groups:

build/rv_pyvisual -i build/nqueens.elf -l "lw,lh,lb,lhu,lbu,sw,sh,sb bne,beq,blt,bge,bgeu,bltu jal,jalr"
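
Judging from this example, the -l argument appears to separate groups with spaces and the members of a group with commas. A sketch of parsing it under that assumption; the function name is hypothetical and the format is inferred from the command above, not from the script:

```python
def parse_highlight_groups(spec: str) -> list[list[str]]:
    """Split 'a,b c,d' into [['a', 'b'], ['c', 'd']]; an empty string gives []."""
    return [group.split(",") for group in spec.split()]

groups = parse_highlight_groups(
    "lw,lh,lb,lhu,lbu,sw,sh,sb bne,beq,blt,bge,bgeu,bltu jal,jalr")

# Map each instruction to the index of its group, e.g. to pick a bar color.
group_of = {inst: idx for idx, members in enumerate(groups) for inst in members}
print(len(groups), group_of["beq"])
```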

Screenshot 2025-01-23 12.01.04 PM

Don't put screenshots which contain plaintext only!

5. Result

instruction_hbar

Code

GitHub: PochariChun / rv32emu visualization
gist:visualization:rv_pyvisual.c
gist:visualization:run_analysis.py

Reference