# Taint analysis

Valgrind taint analysis tool: taintgrind

A few days ago I mentioned wanting to relate all variables to each other. The simplest way is to use taintgrind, so here we look for the relationships between variables.

Clone CoreMark with `git clone https://github.com/eembc/coremark.git`, then edit the following files.

/home/gcc-plugin/coremark/barebones/core_portme.mak
/home/gcc-plugin/coremark/posix/core_portme.mak

```makefile=
# Flag: OUTFLAG
#   Use this flag to define how to get an executable (e.g -o)
OUTFLAG= -o
# Flag: CC
#   Use this flag to define compiler to use
CC?= cc
# Flag: CFLAGS
#   Use this flag to define compiler options. Note, you can add compiler options from the command line using XCFLAGS="other flags"
PORT_CFLAGS = -O0 -g
```

/home/gcc-plugin/Makefile

```makefile=
PLUGIN_CFLAGS = -I/home/rex603/valgrind-3.18.1/taintgrind/ -I/home/rex603/valgrind-3.18.1/include
HEADERS = coremark.h
CHECK_FILES = $(ORIG_SRCS) $(HEADERS)
….
compile: $(OPATH) $(SRCS) $(HEADERS)
	$(CC) $(PLUGIN_CFLAGS) $(CFLAGS) $(XCFLAGS) $(SRCS) $(OUTCMD) 2> ./check.log
```

Note that `-O0` is used so that as much information as possible about the unoptimized code is preserved, and remember `-g` so that source code locations are kept.

# Build Valgrind & taintgrind

Make sure the Valgrind version is 3.18.1, which is the Valgrind version that matches the last taintgrind update. The download source for capstone-3.0.4 also needs to be changed.

```bash=
wget https://sourceware.org/pub/valgrind/valgrind-3.18.1.tar.bz2
tar jxvf valgrind-3.18.1.tar.bz2
# Enter the extracted directory, not the tarball
cd valgrind-3.18.1
git clone http://github.com/wmkhoo/taintgrind.git
cd taintgrind
# Before building, change the capstone-3.0.4.tar.gz download source to:
# https://opentuna.cn/pypi/web/packages/e7/29/e9ad2a12c38f19e9ca8aff05122e5b9e271da6ecbfb6c4e20aee381b49ff/capstone-3.0.4.tar.gz#sha256=945d3b8c3646a1c3914824c416439e2cf2df8969dd722c8979cdcc23b40ad225
sudo sh build_taintgrind.sh
```

# Run

```bash=
/home/rex603/valgrind-3.18.1/build/bin/valgrind --tool=taintgrind
```

# build test case

The first include path, `-I../`, is the taintgrind directory that contains `taintgrind.h`. The second, `-I../../include`, is the `include` directory under Valgrind.

```bash=
gcc -I../ -I../../include -O0 -g ./test.c -o test
```

Suppose we want to find all taint paths of `a`:

```c=
#include "taintgrind.h"
#include <stdio.h>

int test(int x)
{
    // printf("test%d",x);
    // TNT_TAINT(&x,sizeof(x));
    return x % 99;
}

int get_sign(int x)
{
    if (x == 0)
        return test(x);
    if (x < 0)
        return test(x);
    return test(x);
}

int main(int argc, char **argv)
{
    // Turns on printing
    // TNT_START_PRINT();
    int a = 1000;
    // Defines int a as tainted
    TNT_TAINT(&a, sizeof(a));
    int s = get_sign(a);
    // Turns off printing
    // TNT_STOP_PRINT();
    return 0;
}
```

# run test case

```bash=
/home/rex603/valgrind-3.18.1/build/bin/valgrind --tool=taintgrind ./tests/test
```

![](https://hackmd.io/_uploads/S1Y7kAY16.png)

# coremark test add trace point

## core_main.c

```c=
#error "Cannot use a static data area with multiple contexts!"
#endif
#elif (MEM_METHOD == MEM_MALLOC)
    for (i = 0; i < MULTITHREAD; i++)
    {
        ee_s32 malloc_override = get_seed(7);
        if (malloc_override != 0)
            results[i].size = malloc_override;
        else
            results[i].size = TOTAL_DATA_SIZE;
        results[i].memblock[0] = portable_malloc(results[i].size);
        results[i].seed1       = results[0].seed1;
        results[i].seed2       = results[0].seed2;
        results[i].seed3       = results[0].seed3;
        results[i].err         = 0;
        results[i].execs       = results[0].execs;
    }
    // Trace point: mark the memblock field of the first context as tainted
    TNT_TAINT(&results[0].memblock, sizeof(results[0].memblock));
```

# run coremark

The resulting log is quite large.

```bash=
/home/valgrind-3.18.1/build/bin/valgrind --tool=taintgrind ./coremark.exe 0x0 0x0 0x66 1 7 1 2000 2> logfile
```

![](https://hackmd.io/_uploads/rkZR1RYy6.png)

# parse logfile

```python=
file_path = "logfile"
results = {}

with open(file_path, "r") as file:
    for line in file:
        parts = line.strip().split("|")
        if len(parts) >= 2:
            source_code, asm, variable_memory = parts[0], parts[1], parts[-1].strip()
            if "<-" in variable_memory:
                # Split on the first "<-" only, so the unpacking cannot fail
                operation, variable = variable_memory.split("<-", 1)
                source_code = source_code.strip()
                operation = operation.strip()
                variable = variable.strip()
                # Keep whichever side of "<-" contains ":" (the side that names a variable)
                if ":" in variable:
                    result_save = f"{source_code} | {asm} | {variable}"
                elif ":" in operation:
                    result_save = f"{source_code} | {asm} | {operation}"
                else:
                    # Neither side names a variable, so skip this line
                    continue
                # Use the formatted entry as the dictionary key so duplicates collapse
                results.setdefault(result_save, set()).add(result_save)

# Print the simplified results
for result_save, data_set in results.items():
    for data in data_set:
        print(data)
```

# output log

After filtering, a little over 600 variables remain. In other words, `memblock` is related to more than 600 variables.

![](https://hackmd.io/_uploads/Sk0Ge0F1T.png)
![](https://hackmd.io/_uploads/BJ5Xx0Kya.png)
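
The filtered output lists one entry per log line, so the same variable can still show up many times. To check the "600+" figure, the distinct variable names can be counted directly. The following is a minimal sketch, not part of the original workflow. It assumes that the last `|`-separated field of a log line has the form `lhs <- rhs` and that a variable reference is written as `name:address`. The function name `tainted_variable_names` and the sample lines are hypothetical and only illustrate that assumed format.

```python
def tainted_variable_names(lines):
    """Collect distinct variable names from taintgrind-style log lines.

    Assumption: the last "|"-separated field looks like "lhs <- rhs",
    and a variable reference is written as "name:address".
    """
    names = set()
    for line in lines:
        # Only the last field describes the taint flow
        flow = line.rsplit("|", 1)[-1]
        if "<-" not in flow:
            continue
        # Look at every token on both sides of the arrow
        for token in flow.replace("<-", " ").replace(",", " ").split():
            if ":" in token:
                # Drop the address part, keep the variable name
                names.add(token.rsplit(":", 1)[0])
    return names


# Hypothetical sample lines in the assumed format (not real tool output)
sample = [
    "0x1: main (test.c:20) | t1 = LOAD I32 t0 | 0x3e8 | 0xffffffff | t1 <- a:1ffefffe8c",
    "0x2: test (test.c:7) | STORE t2 = t1 | 0x3e8 | 0xffffffff | x:1ffefffe5c <- t1",
    "0x3: test (test.c:7) | t3 = LOAD I32 t2 | 0x3e8 | 0xffffffff | t3 <- x:1ffefffe5c",
    "0x4: main (test.c:21) | t4 = Add32 t3 0x1 | 0x3e9 | 0xffffffff | t4 <- t3",
]

names = tainted_variable_names(sample)
print(len(names), sorted(names))  # 2 ['a', 'x']
```

To apply it to the real log, pass an open file object instead of `sample`, for example `tainted_variable_names(open("logfile"))`. If the actual log uses a different notation for variables, the token test needs to be adjusted accordingly.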