# Fuzzing regexp
Kang-min Liu // @gugod
---
# `.*.*=.*`
![Regexp::Debugger in action](https://blog-cloudflare-com-assets.storage.googleapis.com/2019/07/555-steps.gif)
https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/
---
![](https://blog-cloudflare-com-assets.storage.googleapis.com/2019/07/failing-x-x.png)
That’s bad because as the input size goes up the match time goes up super-linearly.
---
# A proof-of-concept
https://github.com/gugod/bin/blob/master/refuzz
```
# ./refuzz '.*.*=.*'
0 7 False
1 20 False
2 41 False
3 72 False
4 115 False
5 172 False
6 245 False
7 336 False
8 447 False
9 580 False
10 737 False
11 920 False
12 1131 False
13 1372 False
14 1645 False
15 1952 False
16 2295 False
17 2676 False
18 3097 False
19 3560 False
20 4067 False
```
---
# Regexp::Debugger
use Regexp::Debugger save_to => "rx.json";
$str =~ /$re/;
---
Did you know
the core of Regexp::Debugger
is a regexp (`s///e`) tomatch all regexps ?
---
# re pragma
use re 'debug';
"x=xxxxxxxxxxxxxxxxxxxx" =~ /.*.*=.*/;
---
```
Compiling REx ".*.*=.*"
Final program:
1: STAR (3)
2: REG_ANY (0)
3: STAR (5)
4: REG_ANY (0)
5: EXACT <=> (7)
7: STAR (9)
8: REG_ANY (0)
9: END (0)
floating "=" at 0..9223372036854775807 (checking floating) anchored(MBOL) implicit minlen 1
Matching REx ".*.*=.*" against "x=xxxxxxxxxxxxxxxxxxxx"
Intuit: trying to determine minimum start position...
doing 'check' fbm scan, [0..22] gave 1
Found floating substr "=" at offset 1 (rx_origin now 0)...
(multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
0 <> <x=xxxxxxxx> | 0| 1:STAR(3)
| 0| REG_ANY can match 22 times out of 2147483647...
22 <xxxxxxxxxxxx> <> | 1| 3:STAR(5)
| 1| REG_ANY can match 0 times out of 2147483647...
| 1| failed...
21 <xxxxxxxxxxx> <x> | 1| 3:STAR(5)
| 1| REG_ANY can match 1 times out of 2147483647...
| 1| failed...
20 <xxxxxxxxxx> <xx> | 1| 3:STAR(5)
| 1| REG_ANY can match 2 times out of 2147483647...
| 1| failed...
19 <xxxxxxxxx> <xxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 3 times out of 2147483647...
| 1| failed...
18 <xxxxxxxx> <xxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 4 times out of 2147483647...
| 1| failed...
17 <xxxxxxx> <xxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 5 times out of 2147483647...
| 1| failed...
16 <xxxxxx> <xxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 6 times out of 2147483647...
| 1| failed...
15 <xxxxx> <xxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 7 times out of 2147483647...
| 1| failed...
14 <xxxxx> <xxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 8 times out of 2147483647...
| 1| failed...
13 <xxxxx> <xxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 9 times out of 2147483647...
| 1| failed...
12 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 10 times out of 2147483647...
| 1| failed...
11 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 11 times out of 2147483647...
| 1| failed...
10 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 12 times out of 2147483647...
| 1| failed...
9 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 13 times out of 2147483647...
| 1| failed...
8 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 14 times out of 2147483647...
| 1| failed...
7 <xxxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 15 times out of 2147483647...
| 1| failed...
6 <=xxxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 16 times out of 2147483647...
| 1| failed...
5 <x=xxx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 17 times out of 2147483647...
| 1| failed...
4 <x=xx> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 18 times out of 2147483647...
| 1| failed...
3 <x=x> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 19 times out of 2147483647...
| 1| failed...
2 <x=> <xxxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 20 times out of 2147483647...
| 1| failed...
1 <x> <=xxxxxxxxx> | 1| 3:STAR(5)
| 1| REG_ANY can match 21 times out of 2147483647...
1 <x> <=xxxxxxxxx> | 2| 5:EXACT <=>(7)
2 <x=> <xxxxxxxxxx> | 2| 7:STAR(9)
| 2| REG_ANY can match 20 times out of 2147483647...
22 <xxxxxxxxxxxx> <> | 3| 9:END(0)
Match successful!
Freeing REx: ".*.*=.*"
```
---
# Findings
- Regexp::Debugger is as accurate as what textbook teaches you.
- internal changes for emitting events is wanted
- analysis info from `re` is specific to perl5
- perl5 RE is **very** optimized
- .oO( Had Cloudflare used perl5, they could have dodged the outage... )
---
# Extra
p5find-regex (from App::p5find)
https://metacpan.org/source/GUGOD/App-p5find-0.02/script/p5find-regex
---
```
# p5find-regex
t/bin/nsh-head-request.pl:23: my ($gimme) = ($self->{request_info}{'query_string'} ||'') =~ m'gimme_content_length=(1?)$';
t/live-connect-timeout.t:41: $exception =~ s/\n//g;
examples/hijkurl:32: $header ? ( head => [split /: /, $header, 2] ) : (),
lib/Hijk.pm:79: for (split /${CRLF}/o, $head) {
lib/Hijk.pm:80: my ($key, $value) = split /: /, $_, 2;
```
---
# END
- WIP
- @gugod (CPAN, twitter, github, irc, etc)
- Shall we find some buggy regexp ?
{"metaMigratedAt":"2023-06-14T23:18:38.652Z","metaMigratedFrom":"YAML","title":"Fuzzing regexp","breaks":true,"slideOptions":"{\"transition\":\"none\"}","contributors":"[{\"id\":\"dcce249b-7b0f-4093-b97b-479c7762dfba\",\"add\":7225,\"del\":291}]"}