# SAST Tool analysis

## Semgrep

Semgrep was easy to use and the CLI output was easy to interpret. I should call out that I needed to specify the default config; this is not strictly required, but you cannot disable metrics reporting unless you specify the config. I also had to do some investigation into the default ignore list and add my own custom `.semgrepignore` file to include tests in the scanning and allow a fair comparison. This information was well detailed in the docs.

:heavy_check_mark: Supports excluding specific files from scanning
:heavy_check_mark: Supports excluding specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

```bash
brew install semgrep
```

#### Running the tool

```bash
semgrep --config "p/default" --metrics=off
```

#### Output and exporting

CLI output is easy to read and contains a summary of the findings. There is also the option to export the information in the following formats: emacs single-line, gitlab-sast, gitlab-secrets, json, junit-xml, sarif, text and vim single-line.

#### Supported Languages

The following languages are supported: Go, Java, JavaScript, JSON, Terraform, TypeScript, Ruby, Python, PHP and more which I haven't listed because they are likely not relevant to Klaviyo.

## Snyk

Snyk was easy to use and the output was easy to interpret. This tool worked straight out of the box without any modifications to the base command. The documentation was easy to read through.

:heavy_check_mark: Supports excluding specific files from scanning
:heavy_check_mark: Supports excluding specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

```bash
curl https://static.snyk.io/cli/latest/snyk-macos -o snyk
chmod +x ./snyk
mv ./snyk /usr/local/bin/
```

#### Running the tool

```bash
snyk code test
```

#### Output and exporting

The CLI output is easy to read and contains summary information.
The information can also be exported in JSON or SARIF format.

#### Supported Languages

The following languages are supported: Go, Java, JavaScript, TypeScript, Ruby, Python, PHP and more which I haven't listed because they are likely not relevant to Klaviyo.

## SonarQube

Installation was simple, and scanning a project involved adding it through the UI and then copy-pasting the suggested command into the terminal to run the scan. This worked well for the most part until attempting to run on a Java project, which needed to be compiled in order to run the scan. This is an extra step that may increase complexity and run time. I did try to run the Docker solution but could not get it to work; the scanner seemed to run forever, so I gave up and ran it bare-metal.

It should be noted that this tool seems to cover a lot more than SAST: it provides suggestions to improve other aspects of code quality, not just security. We should consider whether we want these features.

:heavy_check_mark: Supports excluding specific files from scanning
:heavy_check_mark: Supports ignoring specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

```bash
brew install sonarqube
brew install sonar-scanner
```

#### Running the tool

The Java version management of the bare-metal version made this somewhat tricky, especially considering that the Java repos needed to be compiled for this to work (possibly requiring a third Java version). Ideally we would have the server running elsewhere, so this is not a big deal, but it is still frustrating for local development.

```bash
# Must use java version 17
export PATH="/opt/homebrew/opt/openjdk@17/bin:$PATH"
sonar console

# Must use a later java version
PATH="/opt/homebrew/opt/openjdk/bin:$PATH" sonar-scanner \
  -Dsonar.projectKey=Test-App \
  -Dsonar.sources=. \
  -Dsonar.host.url=http://localhost:9000 \
  -Dsonar.token=****
```

#### Output and exporting

Results are output to the server and can be viewed in the browser.
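For programmatic access, the server's web API can presumably be queried directly; below is a minimal sketch of building such a query against SonarQube's `api/issues/search` endpoint (the host, project key, and parameter values are illustrative assumptions, not something I tested here):

```python
from urllib.parse import urlencode

# Sketch: build a query URL for SonarQube's web API (api/issues/search).
# The host and project key are illustrative; authenticate with a user
# token via HTTP basic auth when actually calling the endpoint.
def issues_search_url(host: str, project_key: str, types: str = "VULNERABILITY") -> str:
    params = urlencode({"componentKeys": project_key, "types": types})
    return f"{host}/api/issues/search?{params}"

print(issues_search_url("http://localhost:9000", "Test-App"))
# http://localhost:9000/api/issues/search?componentKeys=Test-App&types=VULNERABILITY
```

An authenticated GET against the returned URL should yield the findings as JSON, which would cover the missing export story.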
There is a lot of information about good code practice which we are not interested in. Specifically, I looked at "Security Hotspots" when recording the number of findings. I could not find any options for exporting the data; it looks like the suggested way to get reports is to query the web APIs.

#### Supported Languages

The following languages are supported: Go, Java, JavaScript, TypeScript, Ruby, Python, PHP and more which I haven't listed because they are likely not relevant to Klaviyo.

## Arnica

The biggest drawback of this tool is that it does not have a CLI. I also could not find a way to manually trigger a scan (only via hooks), nor a way to export the results of the scan. The tool also comes with some features we may not be interested in, such as SCA and secret detection.

:x: Supports excluding specific files from scanning
:heavy_check_mark: Supports ignoring specific issues
:x: Supports excluding rules from the scan

#### Installation

Sign in using GitHub credentials and provide the app access to the GitHub repositories you want scanned. There is no CLI for this tool, and since I did not want to provide access to Klaviyo code, I did not test this on the Klaviyo app.

#### Running the tool

The tool is triggered when the app is initially connected to GitHub; following that, you can set up triggers for the scanning. The options for these triggers are:

- On pull request
- On push

#### Output and exporting

The output can be viewed in the UI; I did not find any options for exporting the data.

#### Supported Languages

The following languages are supported: Go, Java, JavaScript, TypeScript, Ruby, Python, PHP and more which I haven't listed because they are likely not relevant to Klaviyo.

## Aikido

This tool seems very similar to Arnica: it is UI only and scans code through GitHub. The tool seems very young and is missing a lot of features. It has no CLI, cannot export results, and reported the fewest findings when run against the vulnerable repos.
I cannot tell what features it adds on top of the other tools I have tested, and it does not seem to do SAST very well.

:x: Supports excluding specific files from scanning
:heavy_check_mark: Supports ignoring specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

Sign in using GitHub credentials and provide the app access to the GitHub repositories you want scanned. There is no CLI for this tool, and since I did not want to provide access to Klaviyo code, I did not test this on the Klaviyo app.

#### Running the tool

The tool is triggered when the app is initially connected to GitHub; you can also trigger a scan from the UI.

#### Output and exporting

The output can be viewed in the UI and a PDF report can be exported; I did not find any other options for exporting the data.

#### Supported Languages

The following languages are supported: Go, Java, JavaScript, TypeScript, Ruby, Python, PHP and more which I haven't listed because they are likely not relevant to Klaviyo.

## Bandit

This tool was easy to install and run. The CLI output was simple to interpret, and it supports exporting the data in many different file formats. It ran a lot faster than the other tools and found ~15x as many issues in the Klaviyo app as the tool with the second-highest count (false positives?). The major downside of the tool is that it only supports Python.

:heavy_check_mark: Supports excluding specific files from scanning
:heavy_check_mark: Supports excluding specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

```bash
# Assuming you have python installed
pip install bandit
```

#### Running the tool

```bash
bandit -r .
```

#### Output and exporting

The CLI output is easy to read and contains summary information. The information can also be exported in one of the following formats: csv, html, json, screen, txt, xml, yaml or custom.
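The JSON export lends itself to quick post-processing, which could help triage the huge finding count. A minimal sketch, assuming only the `results` / `issue_severity` fields of Bandit's JSON report (the sample data below is fabricated; a real report comes from `bandit -r . -f json -o bandit.json`):

```python
import json
from collections import Counter

# Fabricated sample in the shape of Bandit's JSON report: "results" is a
# list of findings, each carrying an "issue_severity" field.
report = {
    "results": [
        {"issue_severity": "HIGH", "test_id": "B602"},
        {"issue_severity": "LOW", "test_id": "B101"},
        {"issue_severity": "LOW", "test_id": "B101"},
    ]
}

def severity_counts(report: dict) -> Counter:
    """Count Bandit findings per severity level."""
    return Counter(r["issue_severity"] for r in report["results"])

print(severity_counts(report))
# Counter({'LOW': 2, 'HIGH': 1})
```

Sorting the Klaviyo app's 41871 findings by severity like this would be a cheap way to gauge the false-positive rate.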
#### Supported Languages

This tool only supports Python.

## CodeQL

Of all the tools I tested, this was by far the trickiest to use. Installation required multiple steps; I was unsuccessful in getting it installed properly using the official documentation and had to rely on a Medium article. Additionally, in order to run the tool you have to be very conscious of the languages in the codebase, since you need to build a database for each language. The multi-step approach to running the scan (build databases, then run a scan on each database) made everything a lot more complex. The tool also required compilation of Java (and likely other compiled languages), which made this worse. This tool identified a good number of findings but was a lot slower than the other tools. The CLI output from the tool did not provide much meaningful data, and I needed to manually look into the .sarif file to get information about the findings.

:heavy_check_mark: Supports excluding specific files from scanning
:x: Supports excluding specific issues
:heavy_check_mark: Supports excluding rules from the scan

#### Installation

#### Running the tool

```bash
codeql database create ./.codeql_db --language="python,javascript" --db-cluster && \
codeql database analyze ./.codeql_db/javascript --format="sarif-latest" --output=./codeql_js.sarif --sarif-category=javascript --verbose && \
codeql database analyze ./.codeql_db/python --format="sarif-latest" --output=./codeql_py.sarif --sarif-category=python
```

#### Output and exporting

The CLI output was not very helpful, but there were options for exporting the data. The data can be exported in CSV, SARIF, and graph formats.
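Since I had to dig into the `.sarif` files by hand, a small script helps. A minimal sketch that assumes only SARIF's standard `runs[].results[]` structure (the sample findings below are fabricated):

```python
import json

# Fabricated sample in the shape of a SARIF file; real files come from
# `codeql database analyze ... --format="sarif-latest"`.
sarif = {
    "runs": [
        {"results": [
            {"ruleId": "js/xss", "level": "error"},
            {"ruleId": "js/sql-injection", "level": "warning"},
        ]}
    ]
}

def count_findings(sarif: dict) -> int:
    """Total number of results across all SARIF runs."""
    return sum(len(run.get("results", [])) for run in sarif["runs"])

print(count_findings(sarif))
# 2
```

For a real file, load it first with `json.load(open("codeql_py.sarif"))`; the same script works for any tool that exports SARIF (Semgrep and Snyk included).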
## Benchmarking

### Findings

| Repo          | Semgrep | Snyk | SonarQube | Arnica | Aikido | Bandit | CodeQL | Average           |
| ------------- | ------: | ---: | --------: | -----: | -----: | -----: | -----: | ----------------: |
| BenchmarkJava | 5848    | 3526 | 1125      | 2759   | 548    | N/A    | 4072   | 2979.67           |
| VAmPI         | 8       | 2    | 7         | 5      | 1      | 6      | 3      | 4.57              |
| WebGoat       | 163     | 236  | 72        | 127    | 172    | N/A    | 66     | 139.33            |
| django.nv     | 37      | 9    | 13        | 41     | 8      | 5      | 65     | 25.43             |
| dvna          | 33      | 38   | 6         | 22     | 9      | N/A    | 35     | 23.83             |
| juice-shop    | 123     | 289  | 264       | 56     | 97     | N/A    | 174    | 167.17            |
| Klaviyo App   | 2680    | 588  | 2765      | N/A    | N/A    | 41871  | 236    | 9628.00 (1567.25) |

> Semgrep by default ignores all test files; I modified the `.semgrepignore` file to not ignore any files for this testing, after seeing test files show up in the Snyk and SonarQube results.

### Timing (in seconds)

| Repo          | Semgrep | Snyk   | SonarQube | Arnica | Aikido         | Bandit | CodeQL  | Average |
| :------------ | ------: | -----: | --------: | -----: | -------------: | -----: | ------: | ------: |
| BenchmarkJava | 107.88  | 69.37  | 331.42    | 95.33  | 18 (116 total) | N/A    | 209.56  | 138.59  |
| VAmPI         | 12.14   | 8.32   | 12.86     | 95.33  | 6 (45 total)   | 0.36   | 149.25  | 40.61   |
| WebGoat       | 49.21   | 66.33  | 84.95     | 95.33  | 8 (61 total)   | N/A    | 213.96  | 86.30   |
| django.nv     | 17.66   | 13.50  | 50.25     | 95.33  | 5 (41 total)   | 0.46   | 239.59  | 60.26   |
| dvna          | 11.74   | 8.39   | 33.72     | 95.33  | 8 (48 total)   | N/A    | 47.27   | 34.08   |
| juice-shop    | 37.32   | 132.41 | 109.98    | 95.33  | 4 (63 total)   | N/A    | 98.22   | 79.54   |
| Klaviyo App   | 627.19  | 873.61 | 617.10    | N/A    | N/A            | 161.71 | 2437.85 | 943.49  |

> **Aikido timings:** Since this tool does more than SAST, I have also included the total time, which covers scans of: dependencies, exposed secrets, SAST, infrastructure as code and surface monitoring.
>
> **Arnica timings:** Arnica does not provide per-repo scan timings but rather the time to scan all the repos.
> I was unable to find a way to easily configure a scan of a single repo, so instead I included the total time divided by the number of repos.
>
> **CodeQL timings:** The timings here include the time to build the database and then analyze the results. For multi-language repos this includes building and analyzing each database.

## Rankings

The rankings are calculated as the average of `tool_result_for_repo / average_tool_result_for_repo` over all repos tested. A lower score in the timing table reflects a better result, while a higher score in the findings table reflects a better result.

#### Timing

| Rank      | 1      | 2      | 3       | 4     | 5         | 6      | 7      |
| --------- | :----: | :----: | :-----: | :---: | :-------: | :----: | :----: |
| **Tool**  | Bandit | Aikido | Semgrep | Snyk  | SonarQube | Arnica | CodeQL |
| **Score** | 0.063  | 0.123  | 0.488   | 0.648 | 1.079     | 1.620  | 2.407  |

#### Findings

| Rank      | 1      | 2       | 3      | 4     | 5      | 6         | 7      |
| --------- | :----: | :-----: | :----: | :---: | :----: | :-------: | :----: |
| **Tool**  | Bandit | Semgrep | CodeQL | Snyk  | Arnica | SonarQube | Aikido |
| **Score** | 1.953  | 1.248   | 1.084  | 1.008 | 0.967  | 0.722     | 0.485  |

#### Usage

For scoring usage I took into account how simple it was to set up the tool, run a scan and interpret the results.

| Rank      | 1      | 2    | 3       | 4      | 5      | 6         | 7      |
| --------- | :----: | :--: | :-----: | :----: | :----: | :-------: | :----: |
| **Tool**  | Bandit | Snyk | Semgrep | Aikido | Arnica | SonarQube | CodeQL |
| **Score** | 5      | 5    | 4.5     | 4      | 3      | 3         | 2      |

# TODO

- Allowing ignore rules
- Custom rules (nice to have)
- IDE integration (nice to have)

# Rule out tools

- Once down to 3, ping Justin Pagano (Static analysis tooling channel)
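For reproducibility, the ranking computation described above (mean of `tool_result_for_repo / average_tool_result_for_repo`) can be sketched as follows. The data here is only a two-repo excerpt of the Findings table, so the resulting scores differ from the full-table values in the Rankings section:

```python
# Excerpt of the Findings table: repo -> {tool: finding count}.
findings = {
    "VAmPI":     {"Semgrep": 8,  "Snyk": 2, "SonarQube": 7},
    "django.nv": {"Semgrep": 37, "Snyk": 9, "SonarQube": 13},
}
# Per-repo averages from the table's Average column.
repo_average = {"VAmPI": 4.57, "django.nv": 25.43}

def score(tool: str) -> float:
    """Mean of (tool result / repo average) over the repos the tool ran on."""
    ratios = [results[tool] / repo_average[repo]
              for repo, results in findings.items() if tool in results]
    return sum(ratios) / len(ratios)

for tool in ("Semgrep", "Snyk", "SonarQube"):
    print(tool, round(score(tool), 3))
```

Repos where a tool reports N/A are simply excluded from that tool's mean, which is how the Bandit rows were handled.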