Fuzzing at Scale

# Fuzzing at Scale

---

### Who are we?

+ Payatu Software Labs
+ Project Srishti
+ CloudFuzz
+ Vulnerability Research
    + CVE-2014-8446, CVE-2015-6086, CVE-2017-{8774, 8775, 8776, 8773, 5005, 8453, 8454, 8455, 5032, 3038, 10942, 10943, 10944, 10994, 11221, 11231, 16368}, CVE-2018-{9950, 9951, 12798, 8389} ...

---

### What do we want?

<span> Fuzz `all` the things</span>

---

### Motivation

- Google didn't open source **ClusterFuzz** until February 2019
- Nightmare project
    - Third-party code
    - Not as much automation as we wanted
- Easy to understand our own code and design decisions

---

### What do we have?

Resources: memory, processing, power

Current Infrastructure

+ **168+32 CPU Cores**
+ **480+ GB RAM**

----

#### Old

![](https://pbs.twimg.com/media/CLuw_Q9UcAADg6I.jpg =330x600)

----

#### New

![](https://i.imgur.com/fbHnzol.jpg =330x600)

---

### How do we do it?

+ Select target
+ Attack surface analysis
+ Build/Deploy
+ Generate testcases
+ Detect crashes using harness
+ Reproduce & Minimize crashes
+ Write exploit?
+ Report to vendors?

---

### Primary Target

+ Browsers/JS engines
+ File parsers
+ Whatever's interesting: kernels, compilers, ...


---

### Architecture

+ Microservices architecture
+ Manager & Worker based
+ Job oriented: gives fine-grained control
+ Platform independent: *Fuzz `all` the things*
+ Plugin system
+ Heartbeat-based health monitoring

---

### Managers

+ Manage the platform
+ Create jobs
+ Record interesting stuff

---

### Workers

+ Execute the jobs
+ Bootstrapped VMs/containers
+ Cross platform

---

### Jobs

+ Download Corpus
+ Gather Coverage
+ Generate Testcases
+ Execute Testcases
+ Reproduce Crashes
+ Minimize Crashes

---

### Scale

+ Managers 3x
+ Workers 100x
+ ~2 million jobs in a day
+ 95% fuzz jobs

---

## Details

---

## UI

----

#### Overall

![](https://i.imgur.com/0pN9WE1.png)

----

#### Statistics

![](https://i.imgur.com/Ql0sauR.png)

----

#### Bots

![](https://i.imgur.com/7veZ556.png)

----

#### Realtime Coverages

![](https://i.imgur.com/k0DIBwg.png)

----

#### Crashes

![](https://i.imgur.com/OQI32Yc.png)

----

#### Coverage Engines

![](https://i.imgur.com/yevwsQ1.png)

----

#### Mutation Engines

![](https://i.imgur.com/GE6K5ji.png)![](https://i.imgur.com/Sul0zwr.png)

----

#### Crashes

![](https://i.imgur.com/NRnZqCd.png)

----

#### Some more Crashes

![](https://i.imgur.com/qGxRf6Y.png)

----

#### Bug Reports

![](https://i.imgur.com/GO54t58.png)

---

### Distillation

+ Auto download of corpus
+ Minset calculation, similar to afl-cmin
+ Take the union coverage of all files, then repeatedly choose the smallest file, keep it if it adds new blocks, and subtract its coverage

---

### Coverage Management

+ View realtime coverage graphs
+ Gather coverage for minset generation
+ Explore aggregated coverage in IDA Lighthouse

---

### Grammar Inference for Input Generation

#### Using ML to generate high coverage corpus

* Learn the grammar from a set of input files
* Realtime generation of input from the inferred grammar

----

#### Example PDF objects generation

* Deeper look at data

![Inside PDF](https://i.imgur.com/YPBnfNU.png)

----

#### Details

* Dataset: Millions of PDF objects
* Model: LSTM (Long Short Term Memory) Networks


|![LSTM](https://i.imgur.com/RWGq4G0.png) |
|:--:|
|*[Image credit](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)*|

----

#### Output

* Generated PDF objects

```
17 0 obj
<</Subtype/Link/Rect[ 142.50 12.10 76.92 314.4 13.488]
/BS<</W 0>>/F 4/A<</Type/Action/S/URI/URI(http://than
/ersa0]>>
endob
1082 1540 119 0 0 02 0 R/Type/Encoding/Roding/Identiseica1
/ColorSpace 46 0 R/Percjs[63 0 R/Regi Com50 0 R/FontDescriptor/9 0 R/BaseFont/Helveticating.pdf)/S/URI>>/Rect[456.867 585.487 598.178 744.645]>>
endobj
18 0 obj
<</Type/Annot/Border[0 0 0]/Dest(refg16)/Subtype/Link/Rect[848.902 224.944>>
endobj
19 0 obj
<</Type/Page
/A/DivB<</Groul 87 00 0>>
endobj
20 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(en-US)
/StructParent 417>>
endobj
```

---

### Browser Fuzzing

+ Old school DOM fuzzing?
+ Grammar based fuzzing techniques
+ Domato and Dharma
+ Ease of writing and maintaining grammar
+ Testcase generation engines modification
+ Custom fork of Dharma
+ Custom extensions
+ Python generators
+ Dropped Domato in favor of Dharma
+ Code fragment based grammar generation

---

### PDF Parser Fuzzing

+ Integrating **radamsa** and other external data generators and mutators
+ Custom mutators in Python
+ JavaScript engines in PDF readers?
+ Reusing same mutators/generators explored in browser fuzzing
+ Complexity?
+ UI automation for auto form/dialog close

---

### Crash Management

+ View on WebUI, check for interestingness
+ One click manual reproduction and minimization
+ Reproduce & minimize only interesting crashes
    + Prevents unnecessary resource utilization
+ Download testcase for manual analysis

---

### Extendable

+ All code in Python
+ Abstracted classes: Manager, Worker, Plugins
+ ThreadOrProcess based on implementation/use
+ Commit in code, manage in database

---

### Deployment

+ Build target
+ Copy targets to template
+ Deploy copies of template
+ Automate with Ansible
+ Periodic tasks to manage workers
    + Kill hangs
    + Update framework source from git
    + Restart workers
    + Update status to view on WebUI

---

### Decisions

+ Scale database access
    - Don't give database access to workers
    - Cache frequent queries whose results rarely change, such as the list of all active projects
+ Beanstalk as a message bus
    - Jobs: pushed by managers, consumed by workers
    - Config: interactive to-and-fro
    - Results: pushed by workers, consumed by managers

---

### Decisions

+ Job Segregation
    - Each project has its own queues for all types of jobs
    - Ensures each job matches its environment
    - Bot publishes its config and environment once initialized
    - Bot manager replies with relevant projects and config
    - Bot initializes and publishes what's available
    - Manager saves the state in cache

---

### What did we learn?

+ Having a stable platform really helps
+ Fuzzing is a full time job
+ Attack surface analysis is a must
+ Keep improving generators/mutators
+ Focus on coverage driven fuzzing
+ Don't be afraid to experiment
+ Failing is expected

---

### Roadmap

+ Expand hardware infrastructure
+ Add RabbitMQ support
+ Crash Recycling
+ AFL
+ libFuzzer
+ Android and macOS support
+ More custom mutators/generators
+ Auto crash analysis

---

# Q/A
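
---

### Appendix: Minset Sketch

The minset step from the Distillation slide (take the union coverage of all files, then walk the files from smallest to largest and keep each one that still adds new blocks) can be sketched roughly as below. This is a minimal illustration, not the platform's actual code: the function name `minset`, the `coverage` and `sizes` dictionaries, and the use of integer block IDs are all assumptions made for the example.

```python
from typing import Dict, List, Set


def minset(coverage: Dict[str, Set[int]], sizes: Dict[str, int]) -> List[str]:
    """Pick a small subset of files that preserves the union coverage.

    coverage: hypothetical map of file name -> set of covered block IDs
    sizes:    hypothetical map of file name -> file size in bytes
    """
    # Union coverage of all files: the blocks we still need to cover.
    remaining: Set[int] = set().union(*coverage.values())
    picked: List[str] = []

    # Visit files from smallest to largest (name breaks ties deterministically).
    for name in sorted(coverage, key=lambda n: (sizes[n], n)):
        new_blocks = coverage[name] & remaining
        if new_blocks:
            # The file contributes blocks nobody picked so far has covered.
            picked.append(name)
            remaining -= new_blocks
        if not remaining:
            break  # every block in the union is now covered
    return picked


if __name__ == "__main__":
    coverage = {
        "a.pdf": {1, 2, 3},
        "b.pdf": {2, 3},
        "c.pdf": {3, 4},
        "d.pdf": {1, 2, 3, 4},
    }
    sizes = {"a.pdf": 300, "b.pdf": 100, "c.pdf": 200, "d.pdf": 900}
    print(minset(coverage, sizes))  # ['b.pdf', 'c.pdf', 'a.pdf']
```

Favoring small files keeps each execution cheap, at the cost of sometimes picking more files than strictly necessary: in the example, `d.pdf` alone covers every block, but three smaller files are chosen instead.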