---
title: Grammar-Fuzzing
author: bynx
tags: fuzzing
---
# Grammar-based Fuzzing
One major issues whitnessed for grammar-based fuzzing happens to be a familiar effect shared with LR parsers.
To prevent recursion hell, like with parsing, we can establish our grammar as a _derivation tree_, or _concrete syntax tree (CST)_
> _Derivation trees consist of_ **_nodes_** _which have other nodes as their children - i.e,_ **_child nodes_**_. Trees start with one node that has no parent; this is referred to as the_ **_root node_**.
Take the following sample as our working example:
```javascript
1. x = 0;
2. if (x) x++;
```
Parsing our sample leads to the following:
```mermaid
%% Derivation Tree %%
flowchart LR;
classDef exec fill:#ffabbb,stroke:#333,stroke-width:2px;
classDef root fill:#ECECFF,stroke:#333,stoke-width:8px;
classDef term fill:#da96ea,color:#3d3d3d;
classDef whitespace fill:#da96ea,color:#3d3d3d,height:2.2%;
rt(["Program (Root)"]):::root===a{{"1/3 "}}:::exec===>L1
rt(["Program (Root)"]):::root===b{{"2/3"}}:::exec==>WS("WhiteSpace")-->nl("#92;n"):::term
rt(["Program (Root)"]):::root===c{{"3/3"}}:::exec===>L2
subgraph L2["L2:\nif (x) x++;"]
direction TB
ES2("ExprStmt")-->IfExpr("IfExpr");
%% IfExpr kw and conditional block %%
IfExpr---> kw("Keyword")----->conditional("if"):::term;
IfExpr--->WS_2a("WhiteSpace")----->w3(" "):::whitespace;
IfExpr---> P_2a("Punctuation")----->paren_open("("):::term;
IfExpr---> id_2("ident")----->id2("x"):::term;
IfExpr---> P_2b("Punctuation")----->paren_close(")"):::term;
IfExpr--->WS_2b("WhiteSpae")----->w4(" "):::whitespace;
IfExpr===>C{Decision}
C{Decision}-.true.->U("UnaryExpr");
C{Decision}-.false.->L("Literal");
ES2("ExprStmt")-------->sc2(";"):::term;
%% UnaryExpr conditional %%
U-->id_c("ident")--->id3("x"):::term;
U-->P_c("Punctuation")--->Add("++"):::term;
L---->n("null"):::term;
subgraph cond[" "]
U
id_c
id3
U
P_c
Add
L
n
end
end
subgraph L1["L1:\nx = 0;"]
direction TB
ES1["ExprStmt"]-->AssignmentExpr;
AssignmentExpr-->id_1("ident")--->id1("x"):::term;
AssignmentExpr-->WS_1a("Whitespace")-..->w1(" "):::whitespace;
AssignmentExpr-->P_1("Punctuation")--->eq("="):::term;
AssignmentExpr-->WS_1b("WhiteSpace")-..->w2(" "):::whitespace;
AssignmentExpr-->Literal--->lit("0"):::term;
ES1["ExprStmt"]----->sc(";"):::term;
end
```
$$
\textit{Concrete Syntax Tree (CST)}
$$
Below we've provided the simplistic view of our AST, where boxed Expressesions represent our Statements:
```mermaid
%% Abtract Syntax Tree %%
flowchart TD
p("Program")
p--"1"-->s1
p--"2"--->s2
exp2-.->exp3("UnaryExpr")
exp2-.->exp4("Literal")
subgraph pg["Program Block"]
subgraph s1["Stmt Block"]
exp1("BindExpr")
sc1(";")
end
subgraph s2["Stmt Block"]
exp2("IfExpr")
exp3("UnaryExpr")
exp4("Literal")
sc2(";")
end
end
```
$$
\textit{Abstract Syntax Tree (AST)}
$$
Fuzzing targets with some manner of Domain-Specific Language (DSL) _is_ grammar-based fuzzing. Using a specific grammar (granularity) provides non-negligible benefit. Commonnly ASTs are used for ease of understanding; conversely, IRs or bytes are used to provide more effecient computations, increasing speed.
---
We're given a _**secure**_ fuzz target, $T$. Assume $T$ is as _optimal_ of a fuzz target as possible, _e.g., has exactly one input that returns a bool indicating memory corruption_.
Without an accompanying proof, we can't _really_ know $T$ is secure; so let's prove (i.e. fuzz) it. Well $-$ starting with the worst ideas $-$ let's just brute force the entire _Control Flow Graph (CFG)_ of $T$. Our working example's CFG is provided below.
```mermaid
%% Control Flow Graph %%
flowchart TD
a(("PC\nEntry"))
b("Assignment")
c{"Boolean\nConditional"}
d("Increment")
e(("PC\nExit"))
style a fill:#ffabbb,stroke:#333,stroke-width:1px,scale:1.2;
style b fill:#eec2c9,stroke:#333,stroke-width:2px;
style c fill:#d9d8d8,stroke:#333,stroke-width:2px;
style d fill:#cce2df,stroke:#333,stroke-width:1px,scale:1.5;
style e fill:#beece7,stroke:#333,stroke-width:1px,scale:1.5;
a==>b==>c
c-.False..->e
c-.True..->d-..->e
```
$$
\textit{Control Flow Graph (CFG)}
$$
This is the flow that $T$ executes, or rather the _path_ that is traversed on the tree of $T$'s CFG.
_$-$ So, what if we just **walked every edge** of the tree $\dots$_
Let's do it; under our _"optimal target"_ assumption, this will suffice as our _proof_.
---
Screw it $-$, let's make our working example what we call $T$. Cool. Now, we're going to assume that we can give it an input that is stored in `x`. Let's bring the code up again $\dots$
```javascript
x = 4;
if (x) x++;
```
Massaging this a bit $\dots$
```rust=
extern "C" {
/// Our goal is to validate the security
/// of this function via proof-by-fuzz.
///
/// We are manipulating the full range
/// of i32 -- our 'grammar's input space
///
/// Upon inspection, we may even hope `c_Add` overflows.
fn fuzz_trgt_hook(mut i32 x) -> Result<(),OverflowError> {
x = 4;
if (x) {
x = c_Add(x,1),
}
Ok(())
}
}
```
## sfd