A Rough Analysis of AFL++

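![image](https://hackmd.io/_uploads/r1wJrV21R.png)

A look at how AFL++ works internally.

# Install

My environment is Docker running on Windows WSL2. Installation guide: https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/INSTALL.md

```bash=
docker pull aflplusplus/aflplusplus:latest
docker run -ti -v [localfolder]:/src aflplusplus/aflplusplus
```

# Build afl++

Inside the container, copy the AFL++ source tree into the mapped folder:

```bash=
cp -R /AFLplusplus/ /src
```

If you are on WSL2 and want to edit the source code with VS Code in the folder mapped from Docker, you have to set the permissions once more, from outside the container, on the AFLplusplus folder that was copied out of it (the copy is owned by root):

```bash=
# in the WSL folder, outside the container
# (prefix with sudo if your WSL user cannot change root-owned files)
chmod -R 777 [localfolder]/AFLplusplus
```

After that you can edit the source code with VS Code. Because some files in the image have already been built, rebuilding the full source is quick:

```bash=
cd /src/AFLplusplus   # inside the container
make
make install
```

# Example

## Source Tree

To build the example, first create a test folder:

```bash
mkdir testfolder
cd testfolder
```

It should end up containing roughly these entries; `fuzz_in` also needs a test case file:

```bash=
-rwxr-xr-x 1 root root 19248 Apr 4 15:11 fuzzTest
-rwxrwxrwx 1 root root   186 Apr 4 14:02 fuzzTest.c
drwxrwxrwx 2 root root  4096 Apr 4 14:02 fuzz_in
```

## Script

```bash=
mkdir fuzz_in
touch fuzzTest.c
# single quotes stop the shell from expanding $# and $@
echo '1234567892&^$%^$#$@' > fuzz_in/testcase
```

## Testcase

fuzzTest.c:

```c=
#include <stdio.h> // standard I/O library

int main(int argc, char *argv[]) {
    char buf[100] = {0}; // character buffer
    gets(buf);   // read input: stack buffer overflow risk (gets() was removed in C11, so newer toolchains may warn or reject it)
    printf(buf); // print input: format string vulnerability
    return 0;
}
```

## Build

```bash=
afl-gcc -g -o ./fuzzTest fuzzTest.c
```

## Fuzz

```bash=
afl-fuzz -i fuzz_in -o fuzz_out ./fuzzTest
```

## Run

![image](https://hackmd.io/_uploads/ByOLl-6yC.png)

# compiler instrumentation

While reading through the AFL++ architecture, I found that it instruments the program at compile time, i.e., it injects extra code into the binary. Together with afl-fuzz at runtime, this lets the fuzzer control the binary's lifecycle and observe which paths it explores. The simplest way to see this is to look at what afl-gcc does:

```bash=
afl-gcc -g -o ./fuzzTest fuzzTest.c
```

## afl-as.c

```bash
/home/x213212/afl/AFLplusplus/src/afl-as.c
```

This file contains the `main` entry point of afl-as, the wrapper around GNU `as` that the toolchain runs as its assembler whenever you compile with afl-gcc (or afl-clang):

```c=
/* Main entry point */

int main(int argc, char **argv) {

  s32 pid;
  u32 rand_seed, i, j;
  int status;
  u8 *inst_ratio_str = getenv("AFL_INST_RATIO");

  struct timeval tv;
  struct timezone tz;

  clang_mode = !!getenv(CLANG_ENV_VAR);

  if ((isatty(2) && !getenv("AFL_QUIET")) || getenv("AFL_DEBUG") != NULL)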
  {
    SAYF(cCYA "afl-as" VERSION cRST " by Michal Zalewski\n");
  } else {
    be_quiet = 1;
  }

  if (argc < 2 || (argc == 2 && strcmp(argv[1], "-h") == 0)) {
    fprintf(
        stdout,
        "afl-as" VERSION " by Michal Zalewski\n"
        "\n%s [-h]\n\n"
        "This is a helper application for afl-fuzz. It is a wrapper around GNU "
        "'as',\n"
        "executed by the toolchain whenever using afl-gcc or afl-clang. You "
        "probably\n"
        "don't want to run this program directly.\n\n"
        "Rarely, when dealing with extremely complex projects, it may be "
        "advisable\n"
        "to set AFL_INST_RATIO to a value less than 100 in order to reduce "
        "the\n"
        "odds of instrumenting every discovered branch.\n\n"
        "Environment variables used:\n"
        "AFL_AS: path to assembler to use for instrumented files\n"
        "AFL_CC: fall back path to assembler\n"
        "AFL_CXX: fall back path to assembler\n"
        "TMPDIR: directory to use for temporary files\n"
        "TEMP: fall back path to directory for temporary files\n"
        "TMP: fall back path to directory for temporary files\n"
        "AFL_INST_RATIO: user specified instrumentation ratio\n"
        "AFL_QUIET: suppress verbose output\n"
        "AFL_KEEP_ASSEMBLY: leave instrumented assembly files\n"
        "AFL_AS_FORCE_INSTRUMENT: force instrumentation for asm sources\n"
        "AFL_HARDEN, AFL_USE_ASAN, AFL_USE_MSAN, AFL_USE_UBSAN, AFL_USE_LSAN:\n"
        " used in the instrumentation summary message\n",
        argv[0]);
    exit(1);
  }

  gettimeofday(&tv, &tz);
  rand_seed = tv.tv_sec ^ tv.tv_usec ^ getpid();
  // in fast systems where pids can repeat in the same seconds we need this
  for (i = 1; (s32)i < argc; i++)
    for (j = 0; j < strlen(argv[i]); j++)
      rand_seed += argv[i][j];

  srandom(rand_seed);

  edit_params(argc, argv);

  if (inst_ratio_str) {
    if (sscanf(inst_ratio_str, "%u", &inst_ratio) != 1 || inst_ratio > 100) {
      FATAL("Bad value of AFL_INST_RATIO (must be between 0 and 100)");
    }
  }

  if (getenv(AS_LOOP_ENV_VAR)) {
    FATAL("Endless loop when calling 'as' (remove '.' from your PATH)");
  }

  setenv(AS_LOOP_ENV_VAR, "1", 1);

  /* When compiling with ASAN, we don't have a particularly elegant way to skip
     ASAN-specific branches. But we can probabilistically compensate for
     that... */

  if (getenv("AFL_USE_ASAN") || getenv("AFL_USE_MSAN")) {
    sanitizer = 1;
    if (!getenv("AFL_INST_RATIO")) { inst_ratio /= 3; }
  }

  if (!just_version) { add_instrumentation(); }

  if (!(pid = fork())) {
    execvp(as_params[0], (char **)as_params);
    FATAL("Oops, failed to execute '%s' - check your PATH", as_params[0]);
  }

  if (pid < 0) { PFATAL("fork() failed"); }

  if (waitpid(pid, &status, 0) <= 0) { PFATAL("waitpid() failed"); }

  if (!getenv("AFL_KEEP_ASSEMBLY")) { unlink(modified_file); }

  exit(WEXITSTATUS(status));
}
```

afl-as first calls `edit_params()` to adjust the arguments, then calls `add_instrumentation()` to insert the instrumentation, and finally forks and runs the real `as` to assemble the result.

## add_instrumentation

```c=
/* Process input file, generate modified_file. Insert instrumentation in all
   the appropriate places. */

static void add_instrumentation(void) {

  static u8 line[MAX_LINE];

  FILE *inf;
  FILE *outf;
  s32   outfd;
  u32   ins_lines = 0;

  u8 instr_ok = 0, skip_csect = 0, skip_next_label = 0, skip_intel = 0,
     skip_app = 0, instrument_next = 0;

#ifdef __APPLE__
  u8 *colon_pos;
#endif /* __APPLE__ */

  if (input_file) {
    inf = fopen(input_file, "r");
    if (!inf) { PFATAL("Unable to read '%s'", input_file); }
  } else {
    inf = stdin;
  }

  outfd = open(modified_file, O_WRONLY | O_EXCL | O_CREAT, DEFAULT_PERMISSION);

  if (outfd < 0) { PFATAL("Unable to write to '%s'", modified_file); }

  outf = fdopen(outfd, "w");

  if (!outf) { PFATAL("fdopen() failed"); }

  while (fgets(line, MAX_LINE, inf)) {

    /* In some cases, we want to defer writing the instrumentation trampoline
       until after all the labels, macros, comments, etc. If we're in this
       mode, and if the line starts with a tab followed by a character, dump
       the trampoline now.
    */

    if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
        instrument_next && line[0] == '\t' && isalpha(line[1])) {
      fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
              R(MAP_SIZE));

      instrument_next = 0;
      ins_lines++;
    }

    /* Output the actual line, call it a day in pass-thru mode. */

    fputs(line, outf);

    if (pass_thru) { continue; }

    /* All right, this is where the actual fun begins. For one, we only want to
       instrument the .text section. So, let's keep track of that in processed
       files - and let's set instr_ok accordingly. */

    if (line[0] == '\t' && line[1] == '.') {

      /* OpenBSD puts jump tables directly inline with the code, which is
         a bit annoying. They use a specific format of p2align directives
         around them, so we use that as a signal. */

      if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
          isdigit(line[10]) && line[11] == '\n') {
        skip_next_label = 1;
      }

      if (!strncmp(line + 2, "text\n", 5) ||
          !strncmp(line + 2, "section\t.text", 13) ||
          !strncmp(line + 2, "section\t__TEXT,__text", 21) ||
          !strncmp(line + 2, "section __TEXT,__text", 21)) {
        instr_ok = 1;
        continue;
      }

      if (!strncmp(line + 2, "section\t", 8) ||
          !strncmp(line + 2, "section ", 8) || !strncmp(line + 2, "bss\n", 4) ||
          !strncmp(line + 2, "data\n", 5)) {
        instr_ok = 0;
        continue;
      }
    }

    /* Detect off-flavor assembly (rare, happens in gdb). When this is
       encountered, we set skip_csect until the opposite directive is
       seen, and we do not instrument. */

    if (strstr(line, ".code")) {
      if (strstr(line, ".code32")) { skip_csect = use_64bit; }
      if (strstr(line, ".code64")) { skip_csect = !use_64bit; }
    }

    /* Detect syntax changes, as could happen with hand-written assembly.
       Skip Intel blocks, resume instrumentation when back to AT&T. */

    if (strstr(line, ".intel_syntax")) { skip_intel = 1; }
    if (strstr(line, ".att_syntax")) { skip_intel = 0; }

    /* Detect and skip ad-hoc __asm__ blocks, likewise skipping them.
    */

    if (line[0] == '#' || line[1] == '#') {
      if (strstr(line, "#APP")) { skip_app = 1; }
      if (strstr(line, "#NO_APP")) { skip_app = 0; }
    }

    /* If we're in the right mood for instrumenting, check for function
       names or conditional labels. This is a bit messy, but in essence,
       we want to catch:

         ^main:      - function entry point (always instrumented)
         ^.L0:       - GCC branch label
         ^.LBB0_0:   - clang branch label (but only in clang mode)
         ^\tjnz foo  - conditional branches

       ...but not:

         ^# BB#0:    - clang comments
         ^ # BB#0:   - ditto
         ^.Ltmp0:    - clang non-branch labels
         ^.LC0       - GCC non-branch labels
         ^.LBB0_0:   - ditto (when in GCC mode)
         ^\tjmp foo  - non-conditional jumps

       Additionally, clang and GCC on MacOS X follow a different convention
       with no leading dots on labels, hence the weird maze of #ifdefs
       later on.

     */

    if (skip_intel || skip_app || skip_csect || !instr_ok || line[0] == '#' ||
        line[0] == ' ') {
      continue;
    }

    /* Conditional branch instruction (jnz, etc). We append the instrumentation
       right after the branch (to instrument the not-taken path) and at the
       branch destination label (handled later on). */

    if (line[0] == '\t') {
      if (line[1] == 'j' && line[2] != 'm' && R(100) < (long)inst_ratio) {
        fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
                R(MAP_SIZE));

        ins_lines++;
      }

      continue;
    }

    /* Label of some sort. This may be a branch destination, but we need to
       read carefully and account for several different formatting
       conventions.
     */

#ifdef __APPLE__

    /* Apple: L<whatever><digit>: */

    if ((colon_pos = strstr(line, ":"))) {
      if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {

#else

    /* Everybody else: .L<whatever>: */

    if (strstr(line, ":")) {
      if (line[0] == '.') {

#endif /* __APPLE__ */

        /* .L0: or LBB0_0: style jump destination */

#ifdef __APPLE__

        /* Apple: L<num> / LBB<num> */

        if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3))) &&
            R(100) < (long)inst_ratio) {

#else

        /* Apple: .L<num> / .LBB<num> */

        if ((isdigit(line[2]) ||
             (clang_mode && !strncmp(line + 1, "LBB", 3))) &&
            R(100) < (long)inst_ratio) {

#endif /* __APPLE__ */

          /* An optimization is possible here by adding the code only if the
             label is mentioned in the code in contexts other than call / jmp.
             That said, this complicates the code by requiring two-pass
             processing (messy with stdin), and results in a speed gain
             typically under 10%, because compilers are generally pretty good
             about not generating spurious intra-function jumps.

             We use deferred output chiefly to avoid disrupting
             .Lfunc_begin0-style exception handling calculations (a problem on
             MacOS X). */

          if (!skip_next_label) {
            instrument_next = 1;
          } else {
            skip_next_label = 0;
          }
        }

      } else {

        /* Function label (always instrumented, deferred mode). */

        instrument_next = 1;
      }
    }
  }

  if (ins_lines) { fputs(use_64bit ? main_payload_64 : main_payload_32, outf); }

  if (input_file) { fclose(inf); }
  fclose(outf);

  if (!be_quiet) {
    if (!ins_lines) {
      WARNF("No instrumentation targets found%s.",
            pass_thru ? " (pass-thru mode)" : "");
    } else {
      char modeline[100];
      snprintf(modeline, sizeof(modeline), "%s%s%s%s%s%s",
               getenv("AFL_HARDEN") ? "hardened" : "non-hardened",
               getenv("AFL_USE_ASAN") ? ", ASAN" : "",
               getenv("AFL_USE_MSAN") ? ", MSAN" : "",
               getenv("AFL_USE_TSAN") ? ", TSAN" : "",
               getenv("AFL_USE_UBSAN") ? ", UBSAN" : "",
               getenv("AFL_USE_LSAN") ? ", LSAN" : "");

      OKF("Instrumented %u locations (%s-bit, %s mode, ratio %u%%).",
          ins_lines, use_64bit ? "64" : "32", modeline, inst_ratio);
    }
  }
}
```

Where does the injected assembly itself come from? It is defined in afl-as.h (the `trampoline_fmt_32`/`trampoline_fmt_64` and `main_payload_32`/`main_payload_64` strings used above):

```bash=
/home/x213212/afl/AFLplusplus/include/afl-as.h
```

![image](https://hackmd.io/_uploads/rybHVb6yA.png)

This article describes the matching rules: https://zhuanlan.zhihu.com/p/583178410

* Patterns that get instrumented:
    1. `^main:` - function entry point
    2. `^.L0:` - GCC branch label
    3. `^.LBB0_0:` - clang branch label
    4. `^\tjnz foo` - conditional branch
* Patterns that are deliberately not matched:
    1. `^# BB#0:` - clang comment
    2. `^ # BB#0:` - clang comment
    3. `^.Ltmp0:` - clang non-branch label
    4. `^.LC0` - GCC non-branch label
    5. `^.LBB0_0:` - GCC non-branch label (same form as the clang branch label above, so the compiler mode, `clang_mode`, decides which one it is)
    6. `^\tjmp foo` - unconditional jump

https://www.secpulse.com/archives/166956.html also explains that, along each executed path, the instrumentation writes into a bitmap that records the explored paths, and that this bitmap lives in shared memory. That leads to the fork server, outlined next.

# fork_server

The instrumentation we just looked at is really what gives the fuzzer control over the program's lifecycle. This article explains the injected assembly in detail, so let's go straight to it: https://ch4r1l3.github.io/2019/03/08/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%903%E2%80%94%E2%80%94afl-as-h%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/

![image](https://hackmd.io/_uploads/Hkvhwb61C.png)
![image](https://hackmd.io/_uploads/BkihP-pkA.png)

```c=
#include <stdio.h>

int main(void) {          // instrumentation A: function entry
    int a;
    scanf("%d", &a);
    if (a == 0xdeadbeef)  // instrumentation B: jnz not taken
        *((char *)0) = 1; // instrumentation C: jnz taken
    return 0;
}
```

In other words, the fork server parks the instrumented binary at its first instrumentation point (here, the entry of `main`) and, for each test case, `fork()`s a child that runs the input from there. Because the binary was instrumented at compile time, every path a child takes is recorded in the shared-memory bitmap, so afl-fuzz can tell when an input reaches a new path instead of re-exploring known ones. If a forked child times out or crashes, the fork server (the child's parent) and afl-fuzz exchange that status through pipes. This way the target does not have to start from scratch on every run, and a crash only takes down the forked child: the fork server stays alive, so the fuzzer can keep going from the same starting point at any time.

# fuzzing

From there, the fuzzer's algorithms keep mutating inputs to explore new paths and record any crashes, which you then analyze and fix in the code. This analysis is fairly shallow, but the overall flow should be roughly as described: https://bbs.kanxue.com/thread-254705.htm

# ref

- https://tttang.com/user/f1tao
- https://ch4r1l3.github.io/2019/03/05/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%901%E2%80%94%E2%80%94afl-gcc-c%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/
- https://ch4r1l3.github.io/2019/03/06/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%902%E2%80%94%E2%80%94afl-as-c%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/
- https://ch4r1l3.github.io/2019/03/08/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%903%E2%80%94%E2%80%94afl-as-h%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/
- https://ch4r1l3.github.io/2019/03/09/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%904%E2%80%94%E2%80%94afl-fuzz-c%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%901/
- https://ch4r1l3.github.io/2019/03/10/AFL%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%905%E2%80%94%E2%80%94afl-fuzz-c%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%902/
- https://ithelp.ithome.com.tw/users/20151153/ironman/5164?page=1
- https://ithelp.ithome.com.tw/articles/10288409