# Fuzzing at Scale
---
### Who are we?
+ Payatu Software Labs
+ Project Srishti
+ CloudFuzz
+ Vulnerability Research
+ CVE-2014-8446, CVE-2015-6086, CVE-2017-{8774, 8775, 8776, 8773, 5005, 8453, 8454, 8455, 5032, 3038, 10942, 10943, 10944, 10994, 11221, 11231, 16368}, CVE-2018-{9950, 9951, 12798, 8389} ...
---
### What do we want?
<span><!-- .element: class="fragment" data-fragment-index="1" -->
Fuzz `all` the things</span>
---
### Motivation
- Google didn't open source **ClusterFuzz** until February 2019
- Nightmare project
- Third-party code
- Not as much automation as we wanted
- Easy to understand our own code and design decisions
---
### What do we have?
Resources - memory, processing, power
Current Infrastructure
+ **168+32 CPU Cores**
+ **480+ GB RAM**
----
#### Old

----
#### New

---
### How do we do it?
+ Select target
+ Attack surface analysis
+ Build/Deploy
+ Generate testcases
+ Detect crashes using harness
+ Reproduce & Minimize crashes
+ Write exploit?
+ Report to vendors?
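----
#### Sketch: crash-detecting harness
A minimal sketch of the "detect crashes using harness" step, assuming a POSIX host where a target killed by a signal reports a negative return code. The names `classify` and `run_target`, and the 10 second default timeout, are illustrative assumptions and not part of the platform described here.

```python
import subprocess
from typing import List


def classify(returncode: int, timed_out: bool) -> str:
    """Map one execution result to 'hang', 'crash' or 'ok'."""
    if timed_out:
        return "hang"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N (e.g. -11 for SIGSEGV).
        return "crash"
    # Assumption: ordinary non-zero exits are not treated as crashes here.
    return "ok"


def run_target(cmd: List[str], timeout: float = 10.0) -> str:
    """Run the target once on a testcase and classify the outcome."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return classify(0, timed_out=True)
    return classify(proc.returncode, timed_out=False)
```

A real harness would likely also parse debugger or sanitizer output; this sketch only looks at the exit status.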
---
### Primary Target
+ Browsers/JS engines
+ File parsers
+ Whatever's interesting - kernels, compilers, ...
---
### Architecture
+ Microservices architecture
+ Manager & Worker based
+ Job-oriented - gives fine-grained control
+ Platform independent - *Fuzz `all` the things*
+ Plugin system
+ Heartbeat-based health monitoring
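----
#### Sketch: heartbeat monitoring
One way heartbeat-based health monitoring could look, as a sketch only. The class name `HeartbeatMonitor`, the 30 second timeout, and the injectable `clock` are assumptions made for illustration; the slides do not describe the actual implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each worker (hypothetical helper)."""

    timeout: float = 30.0  # assumed: seconds of silence before a worker counts as dead
    clock: Callable[[], float] = time.monotonic
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, worker_id: str) -> None:
        """Record a heartbeat from a worker."""
        self.last_seen[worker_id] = self.clock()

    def dead_workers(self) -> List[str]:
        """Return workers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout)
```

A manager could call `beat()` for every heartbeat message it receives and poll `dead_workers()` from a periodic task.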
---
### Managers
+ Manage the platform
+ Create jobs
+ Record interesting stuff
---
### Workers
+ Execute the jobs
+ Bootstrapped VMs/containers
+ Cross platform
---
### Jobs
+ Download Corpus
+ Gather Coverage
+ Generate Testcases
+ Execute Testcases
+ Reproduce Crashes
+ Minimize Crashes
---
### Scale
+ Managers 3x
+ Workers 100x
+ ~2 million jobs per day
+ 95% are fuzz jobs
---
## Details
---
## UI
----
#### Overall

----
#### Statistics

----
#### Bots

----
#### Realtime Coverage

----
#### Crashes

----
#### Coverage Engines

----
#### Mutation Engines

----
#### Crashes

----
#### Some more Crashes

----
#### Bug Reports

---
### Distillation
+ Auto download of corpus
+ Minset calculation - similar to afl-cmin
+ Start from the union coverage of all files, then repeat: choose the smallest remaining file, keep it if it covers new blocks, subtract its coverage
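----
#### Sketch: greedy minset
The minset steps above can be sketched as follows. This is a simplified greedy pass and not afl-cmin's exact algorithm; the corpus layout (file name mapped to size and a set of basic-block ids) and the function name `greedy_minset` are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_minset(corpus: Dict[str, Tuple[int, Set[int]]]) -> List[str]:
    """corpus maps file name -> (size in bytes, set of covered basic-block ids)."""
    remaining: Set[int] = set()
    for _, blocks in corpus.values():
        remaining |= blocks  # union coverage of all files

    minset: List[str] = []
    # Visit files smallest first; ties are broken by name for determinism.
    for name in sorted(corpus, key=lambda n: (corpus[n][0], n)):
        if not remaining:
            break  # every block is already covered by the chosen files
        _, blocks = corpus[name]
        if blocks & remaining:  # the file contributes blocks not covered yet
            minset.append(name)
            remaining -= blocks  # subtract its coverage
    return minset
```

Preferring small files keeps the per-testcase execution time low, at the cost of sometimes picking more files than a strictly optimal set cover would.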
---
### Coverage Management
+ View realtime coverage graphs
+ Gather coverage for minset generation
+ Explore aggregated coverage in IDA Lighthouse
---
### Grammar Inference for Input Generation
#### Using ML to generate a high-coverage corpus
* Learn the grammar from a set of input files
* Realtime generation of input from inferred grammar
----
#### Example: PDF object generation
* Deeper look at data

----
#### Details
* Dataset: Millions of PDF objects
* Model: LSTM (Long Short-Term Memory) networks
*[Image credit](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)*
----
#### Output
* Generated PDF objects
```
17 0 obj
<</Subtype/Link/Rect[ 142.50 12.10 76.92 314.4 13.488] /BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://than /ersa0]>>
endob 1082 1540 119 0 0 02 0 R/Type/Encoding/Roding/Identiseica1
/ColorSpace 46 0 R/Percjs[63 0 R/Regi Com50 0 R/FontDescriptor/9 0 R/BaseFont/Helveticating.pdf)/S/URI>>/Rect[456.867 585.487 598.178 744.645]>>
endobj
18 0 obj
<</Type/Annot/Border[0 0 0]/Dest(refg16)/Subtype/Link/Rect[848.902 224.944>>
endobj
19 0 obj
<</Type/Page
/A/DivB<</Groul 87
00 0>>
endobj
20 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-US) /StructParent 417>>
endobj
```
---
### Browser Fuzzing
+ Old school DOM fuzzing?
+ Grammar based fuzzing techniques
+ Domato and Dharma
+ Ease of writing and maintaining grammar
+ Modifications to the testcase generation engines
+ Custom fork of dharma
+ Custom extensions
+ Python generators
+ Dropped Domato in favor of Dharma
+ Code fragment based grammar generation
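----
#### Sketch: grammar-based generation
To illustrate the general idea behind grammar-based testcase generation, here is a toy generator. It does not use Dharma's or Domato's grammar syntax or APIs; the `GRAMMAR` table, the `generate` function, and the depth limit are hypothetical and exist only for this example.

```python
import random
from typing import Dict, List

# Hypothetical toy grammar: each rule maps to a list of alternatives, and
# each alternative is a list of symbols (rule names or literal text).
GRAMMAR: Dict[str, List[List[str]]] = {
    "stmt": [["var x = ", "expr", ";"], ["elem", ".", "method", "(", "expr", ");"]],
    "expr": [["number"], ["expr", " + ", "expr"], ["elem", ".", "prop"]],
    "elem": [["document.body"], ["document.createElement('div')"]],
    "method": [["appendChild"], ["removeChild"]],
    "prop": [["innerHTML"], ["childNodes.length"]],
    "number": [["0"], ["-1"], ["4294967295"]],
}


def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 6) -> str:
    """Expand a symbol into text by picking random alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # literal text, emitted as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, force the first (non-recursive) alternative
        # so that expansion always terminates.
        alternatives = alternatives[:1]
    choice = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in choice)
```

Passing a seeded `random.Random` makes a testcase reproducible from its seed, which helps when a crash has to be regenerated later.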
---
### PDF Parser Fuzzing
+ Integrating **radamsa** and other external data generators and mutators
+ Custom mutators in Python
+ JavaScript engines in PDF readers?
+ Reusing same mutators/generators explored in browser fuzzing
+ Complexity?
+ UI automation to auto-close forms/dialogs
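----
#### Sketch: a custom mutator
A custom mutator in Python can be very small. The bit-flipping mutator below is an illustrative sketch; the name `flip_bits` and the 1% default ratio are assumptions, and it is not one of the mutators actually used on the platform.

```python
import random


def flip_bits(data: bytes, rng: random.Random, ratio: float = 0.01) -> bytes:
    """Flip one random bit at each of roughly len(data) * ratio positions (at least one)."""
    if not data:
        return data
    buf = bytearray(data)
    count = max(1, int(len(buf) * ratio))
    for _ in range(count):
        pos = rng.randrange(len(buf))      # random byte offset
        buf[pos] ^= 1 << rng.randrange(8)  # flip one of its eight bits
    return bytes(buf)
```

Because the mutator is format-agnostic, the same function could be pointed at PDF files or at generated JavaScript, in the spirit of reusing mutators across targets.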
---
### Crash Management
+ View on WebUI, check for interestingness
+ One click manual reproduction and minimization
+ Reproduce & minimize only interesting crashes
+ Prevents unnecessary resource utilization
+ Download testcase for manual analysis
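----
#### Sketch: filtering interesting crashes
One possible reading of "check for interestingness" is to skip known-boring crash kinds and duplicates before spending resources on reproduction and minimization. Everything below is an assumption for illustration: the crash record layout, the `BORING_KINDS` policy, and bucketing by the top three stack frames.

```python
import hashlib
from typing import Dict, List, Set

BORING_KINDS = {"null-deref", "stack-exhaustion"}  # assumed triage policy


def bucket_id(frames: List[str], depth: int = 3) -> str:
    """Hash the top `depth` frames so repeated crashes share a bucket."""
    key = "|".join(frames[:depth])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]


def is_interesting(crash: Dict, seen: Set[str]) -> bool:
    """True for the first crash in a bucket whose kind is not on the boring list."""
    if crash["kind"] in BORING_KINDS:
        return False
    bucket = bucket_id(crash["frames"])
    if bucket in seen:
        return False  # duplicate of a crash that was already queued
    seen.add(bucket)
    return True
```

Only crashes for which `is_interesting` returns `True` would then be queued for reproduce and minimize jobs.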
---
### Extendable
+ All code in Python
+ Abstracted classes - Manager, Worker, Plugins
+ ThreadOrProcess based on implementation/use
+ Commit in code, manage in database
---
### Deployment
+ Build target
+ Copy targets to template
+ Deploy copies of template
+ Automate with Ansible
+ Periodic tasks to manage workers
+ Kill hangs
+ Update framework source from git
+ Restart workers
+ Update status to view on WebUI
---
### Decisions
+ Scale database access
  - Don't give database access to workers
  - Cache frequent queries whose results rarely change, such as the list of active projects
+ Beanstalk as a message bus
  - Jobs - pushed by managers, consumed by workers
  - Config - Interactive to-and-fro
  - Results - pushed by workers, consumed by managers
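----
#### Sketch: caching frequent queries
Caching a frequent, rarely-changing query such as "all active projects" could be done with a small time-to-live cache. The sketch below is an assumption about how such a cache might look; `TTLCache`, the loader callback, and the 60 second TTL in the usage note are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches loader results per key for `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, calling `loader` only when it is missing or stale."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = loader()  # e.g. the real database query
        self._store[key] = (now, value)
        return value
```

With `cache = TTLCache(ttl=60.0)`, a manager would call `cache.get("active_projects", query_fn)` and hit the database at most once a minute for that key.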
---
### Decisions
+ Job Segregation
  - Each project has its own queues for all types of jobs
  - Ensures each job matches its environment
  - Bot publishes its config and environment once initialized
  - Bot manager replies with relevant projects and config
  - Bot initializes and publishes what's available
  - Manager saves the state in cache
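----
#### Sketch: matching bots to project queues
The handshake above could be sketched as: the bot publishes its environment, the manager picks the projects whose requirements it satisfies, and the bot subscribes to those projects' queues. The `project.job_type` queue naming, the requirement dictionaries, and all function names are assumptions for illustration; the job types mirror the Jobs slide.

```python
from typing import Dict, List

# One queue per job type, following the Jobs slide (identifiers are assumed).
JOB_TYPES = ["corpus", "coverage", "generate", "execute", "reproduce", "minimize"]


def queue_name(project: str, job_type: str) -> str:
    """Hypothetical naming scheme: one queue per (project, job type) pair."""
    return f"{project}.{job_type}"


def match_projects(bot_env: Dict[str, str], projects: Dict[str, Dict[str, str]]) -> List[str]:
    """Return projects whose requirements are all satisfied by the bot's environment."""
    return sorted(
        name
        for name, required in projects.items()
        if all(bot_env.get(key) == value for key, value in required.items())
    )


def queues_for_bot(bot_env: Dict[str, str], projects: Dict[str, Dict[str, str]]) -> List[str]:
    """Queues a bot should consume from, given what it published about itself."""
    return [queue_name(p, j) for p in match_projects(bot_env, projects) for j in JOB_TYPES]
```

Since a bot only ever consumes from queues of projects it matched, a job cannot land on a machine that lacks the target's environment.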
---
### What did we learn?
+ Having a stable platform really helps
+ Fuzzing is a full time job
+ Attack surface analysis is a must
+ Keep improving generators/mutators
+ Focus on coverage driven fuzzing
+ Don't be afraid to experiment
+ Failing is expected
---
### Roadmap
+ Expand hardware infrastructure
+ Add RabbitMQ support
+ Crash Recycling
+ AFL
+ libFuzzer
+ Android and macOS support
+ More custom mutators/generators
+ Auto crash analysis
---
# Q/A