changed 5 years ago
Published Linked with GitHub

Fuzzing regexp

Kang-min Liu // @gugod


.*.*=.*

Regexp::Debugger in action

https://blog.cloudflare.com/details-of-the-cloudflare-outage-on-july-2-2019/


That’s bad because as the input size goes up the match time goes up super-linearly.


A proof-of-concept

https://github.com/gugod/bin/blob/master/refuzz

# ./refuzz '.*.*=.*'
0	7	False
1	20	False
2	41	False
3	72	False
4	115	False
5	172	False
6	245	False
7	336	False
8	447	False
9	580	False
10	737	False
11	920	False
12	1131	False
13	1372	False
14	1645	False
15	1952	False
16	2295	False
17	2676	False
18	3097	False
19	3560	False
20	4067	False

Regexp::Debugger

​​​​use Regexp::Debugger save_to => "rx.json";
​​​​
​​​​$str =~ /$re/;

Did you know
the core of Regexp::Debugger
is a regexp (s///e) tomatch all regexps ?


re pragma

​​​​use re 'debug';
​​​​
​​​​"x=xxxxxxxxxxxxxxxxxxxx" =~ /.*.*=.*/;

Compiling REx ".*.*=.*"
Final program:
   1: STAR (3)
   2:   REG_ANY (0)
   3: STAR (5)
   4:   REG_ANY (0)
   5: EXACT <=> (7)
   7: STAR (9)
   8:   REG_ANY (0)
   9: END (0)
floating "=" at 0..9223372036854775807 (checking floating) anchored(MBOL) implicit minlen 1 
Matching REx ".*.*=.*" against "x=xxxxxxxxxxxxxxxxxxxx"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [0..22] gave 1
  Found floating substr "=" at offset 1 (rx_origin now 0)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
   0 <> <x=xxxxxxxx>         |   0| 1:STAR(3)
                             |   0| REG_ANY can match 22 times out of 2147483647...
  22 <xxxxxxxxxxxx> <>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 0 times out of 2147483647...
                             |   1|  failed...
  21 <xxxxxxxxxxx> <x>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 1 times out of 2147483647...
                             |   1|  failed...
  20 <xxxxxxxxxx> <xx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 2 times out of 2147483647...
                             |   1|  failed...
  19 <xxxxxxxxx> <xxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 3 times out of 2147483647...
                             |   1|  failed...
  18 <xxxxxxxx> <xxxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 4 times out of 2147483647...
                             |   1|  failed...
  17 <xxxxxxx> <xxxxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 5 times out of 2147483647...
                             |   1|  failed...
  16 <xxxxxx> <xxxxxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 6 times out of 2147483647...
                             |   1|  failed...
  15 <xxxxx> <xxxxxxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 7 times out of 2147483647...
                             |   1|  failed...
  14 <xxxxx> <xxxxxxxx>      |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 8 times out of 2147483647...
                             |   1|  failed...
  13 <xxxxx> <xxxxxxxxx>     |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 9 times out of 2147483647...
                             |   1|  failed...
  12 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 10 times out of 2147483647...
                             |   1|  failed...
  11 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 11 times out of 2147483647...
                             |   1|  failed...
  10 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 12 times out of 2147483647...
                             |   1|  failed...
   9 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 13 times out of 2147483647...
                             |   1|  failed...
   8 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 14 times out of 2147483647...
                             |   1|  failed...
   7 <xxxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 15 times out of 2147483647...
                             |   1|  failed...
   6 <=xxxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 16 times out of 2147483647...
                             |   1|  failed...
   5 <x=xxx> <xxxxxxxxxx>    |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 17 times out of 2147483647...
                             |   1|  failed...
   4 <x=xx> <xxxxxxxxxx>     |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 18 times out of 2147483647...
                             |   1|  failed...
   3 <x=x> <xxxxxxxxxx>      |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 19 times out of 2147483647...
                             |   1|  failed...
   2 <x=> <xxxxxxxxxx>       |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 20 times out of 2147483647...
                             |   1|  failed...
   1 <x> <=xxxxxxxxx>        |   1|  3:STAR(5)
                             |   1|  REG_ANY can match 21 times out of 2147483647...
   1 <x> <=xxxxxxxxx>        |   2|   5:EXACT <=>(7)
   2 <x=> <xxxxxxxxxx>       |   2|   7:STAR(9)
                             |   2|   REG_ANY can match 20 times out of 2147483647...
  22 <xxxxxxxxxxxx> <>       |   3|    9:END(0)
Match successful!
Freeing REx: ".*.*=.*"

Findings

  • Regexp::Debugger is as accurate as what textbook teaches you.

    • internal changes for emitting events is wanted
  • analysis info from re is specific to perl5

    • perl5 RE is very optimized
    • .oO( Had Cloudflare used perl5, they could have dodged the outage )

Extra

p5find-regex (from App::p5find)

https://metacpan.org/source/GUGOD/App-p5find-0.02/script/p5find-regex


# p5find-regex

t/bin/nsh-head-request.pl:23:    my ($gimme) = ($self->{request_info}{'query_string'} ||'') =~ m'gimme_content_length=(1?)$';
t/live-connect-timeout.t:41:    $exception =~ s/\n//g;
examples/hijkurl:32:    $header ? ( head => [split /: /, $header, 2] ) : (),
lib/Hijk.pm:79:                for (split /${CRLF}/o, $head) {
lib/Hijk.pm:80:                    my ($key, $value) = split /: /, $_, 2;

END

  • WIP

  • @gugod (CPAN, twitter, github, irc, etc)

  • Shall we find some buggy regexp ?

Select a repo