# Secure System Development Lab 2 - Static Application Security Testing
**Team Members:**
* Daniyar Cherekbashev (d.cherekbashev@innopolis.university)
* Gleb Statkevich (g.statkevich@innopolis.university)
* Daria Kalashnikova (d.kalashnikova@innopolis.university)
## Task 1
**1. Q: What are the differences between source code scanners and binary scanners?**
A:
Source code scanners analyse the program's source code and look for insecure coding practices (such as improper input validation, buffer overflows, and insecure data storage). They need access to the source and are typically used during the development phase.
Binary scanners analyse the compiled code and look for vulnerabilities that are visible in the compiled artifact (buffer overflows, hard-coded credentials, and insecure permissions). Because they do not need the source code, they can also assess third-party components, and they are typically used in post-deployment security testing to identify potential vulnerabilities in an application.
Both types of scanners play a crucial role in securing software applications.
**2. Q: Explain how Abstract Syntax Trees (AST) can help to find vulnerabilities and what kind of vulnerabilities can be found more effectively.**
A: An Abstract Syntax Tree (AST) is a hierarchical (tree) representation of a program's source code, which allows an in-depth and structured analysis of the codebase. Because a scanner can match on the structure of the code (for example, "a call to `eval()` whose argument is not a constant") instead of on raw text, it is not confused by formatting or comments and reports fewer false matches than a plain text search. Vulnerabilities that can be found more effectively this way include:
- Injection vulnerabilities: the AST shows where input is concatenated into a query or command, i.e. insecure input handling (e.g. SQL injection)
- Code injection vulnerabilities: the AST can reveal code that is executed dynamically (e.g. `eval()`)
- Information leakage vulnerabilities: the AST can reveal potential data leaks (e.g. sensitive information passed to logging calls or exposed in error messages)
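To make the idea concrete, here is a minimal sketch using Python's standard `ast` module. It illustrates the principle only and is not how Semgrep itself is implemented; the sample program and the `find_dynamic_eval` helper are hypothetical. It walks the tree and reports `eval()`/`exec()` calls whose first argument is not a string literal.

```python
import ast

# Hypothetical sample program to scan.
SOURCE = """\
import os
user_expr = os.environ.get("EXPR", "1 + 1")
# eval(user_expr) inside a comment is not a call, so it is not reported
print(eval("2 * 21"))
print(eval(user_expr))
"""


class DynamicEvalFinder(ast.NodeVisitor):
    """Record line numbers of eval()/exec() calls whose first argument is not a string literal."""

    def __init__(self) -> None:
        self.findings: list[int] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Match the structure of the call (a Call node whose callee is the
        # name eval/exec) instead of searching the raw text.
        if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
            first = node.args[0] if node.args else None
            is_literal = isinstance(first, ast.Constant) and isinstance(first.value, str)
            if not is_literal:
                self.findings.append(node.lineno)
        # Keep walking so nested calls such as print(eval(...)) are visited too.
        self.generic_visit(node)


def find_dynamic_eval(source: str) -> list[int]:
    finder = DynamicEvalFinder()
    finder.visit(ast.parse(source))
    return finder.findings


print(find_dynamic_eval(SOURCE))  # [5]
```

A plain `grep "eval("` would also hit the comment on line 3 and the harmless literal call on line 4; the AST check reports only line 5. The sketch also shows a typical SAST blind spot: an alias such as `run = eval; run(user_expr)` is not reported.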
**3. Q: What is Static Code Analysis?**
A: A set of techniques for analysing source code. Its purpose is to find errors, basic mistakes and logical flaws that may lead to runtime errors or vulnerabilities. The analysis is called "static" because the software does not need to be executed.
**4. Q: Give and explain the benefit(s) of Static Analysis Tool.**
A:
- Easy to integrate into the pipeline and run frequently (e.g. on every commit or merge request)
- Increased productivity: static analysis tools automate part of the code review process
- Early detection of essential vulnerabilities (insecure usage of functions, injections, etc.)
- Increased code quality: the tools discipline developers to write better code
**5. Q: Give and explain the limitation(s) of Static Analyzers.**
A:
- Only basic vulnerabilities can be detected: injections, insecure functions, improper function usage
- Language-specific: the tools have to be adapted to every new language used
- False positives (findings in code that cannot actually be exploited, or where the flaw is already mitigated) and false negatives (actual issues in the codebase that are missed)
- Limited scope: the analysis is based on predefined rules and patterns, so it may not catch every type of issue or vulnerability and cannot analyse architectural issues (authentication design, data flow between components, etc.)
For this lab, we'll be using Semgrep + OWASP Juice Shop.
## Task 3.1 Semgrep - Set up your environment
Imported Juice Shop project:

Semgrep pipeline (with included owasp-top-10 rule from semgrep.dev):
```yaml=
semgrep:
  image: semgrep/semgrep
  script: semgrep ci
  rules:
    - if: $CI_PIPELINE_SOURCE == "web"
    - if: $CI_MERGE_REQUEST_IID
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  variables:
    SEMGREP_APP_TOKEN: $SEMGREP_APP_TOKEN
```
After adding the pipeline, we can see that scanning is in progress in our dashboard:

Our CI job has successfully passed and reported 81 findings.

We can now access the Semgrep results in the dashboard: 19 of the 81 findings have high severity.

Given the high number of vulnerabilities, especially high-severity ones, the Juice Shop web application clearly has significant security flaws (as expected from a deliberately vulnerable application). The presence of NoSQL injection, SSTI leading to RCE, and XXE highlights the need for thorough security testing and remediation.
## Task 3.2 Semgrep - Analysis
**Case 2:**
```python=
import jwt as tokenizer

# JWT Auth bypass: verification disabled with verify=False
def accept_request(token):
    decode = tokenizer.decode(token, "password", verify=False)

# JWT algorithm bypass: no explicit list of accepted algorithms
def accept_another_request(token):
    decode = tokenizer.decode(token, "password")
```
Scan results:

1. Vulnerability: the JWT secret (`"password"`) is hard-coded in the application, so anyone with access to the source code can forge valid tokens.
2. Semgrep rule: `python.pyjwt.python-pyjwt-hardcoded-secret.python-pyjwt-hardcoded-secret`
**Case 3:**
```python=
import sqlite3
from passlib.hash import pbkdf2_sha256

def db_init():
    users = [
        ('ace', pbkdf2_sha256.encrypt('123456')),
        ('semper', pbkdf2_sha256.encrypt('Password')),
        ('alex', pbkdf2_sha256.encrypt('Waiver2'))
    ]
    conn = sqlite3.connect('users.sqlite')
    c = conn.cursor()
    c.execute("DROP TABLE users")
    c.execute("CREATE TABLE users (user text, password text, failures int)")
    for u, p in users:
        c.execute("INSERT INTO users (user, password, failures) \
            VALUES ('%s', '%s', '%d')" % (u, p, 0))
    conn.commit()
    conn.close()

if __name__ == '__main__':
    db_init()
```
Scan results:

1. Vulnerability: a raw SQL query is built with `%` string formatting and executed, so a value containing a quote can change the structure of the query (SQL injection).
2. Semgrep rules:
- `python.sqlalchemy.security.sqlalchemy-execute-raw-query.sqlalchemy-execute-raw-query`
- `python.lang.security.audit.formatted-sql-query.formatted-sql-query`
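A minimal sketch of the same `db_init` with the injection removed: values are bound through sqlite3's `?` placeholders instead of `%` formatting, and `DROP TABLE IF EXISTS` avoids an error when the table does not exist yet. The changed signature (connection and credentials passed in by the caller) is our assumption, and `hashlib.pbkdf2_hmac` stands in for passlib only so that the example has no third-party dependency; the iteration count is an assumed value.

```python
import hashlib
import os
import sqlite3

PBKDF2_ITERATIONS = 600_000  # assumed work factor


def hash_password(password: str) -> str:
    # Stdlib stand-in for passlib's pbkdf2_sha256: random salt + PBKDF2-HMAC-SHA256.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def db_init(conn: sqlite3.Connection, users: list[tuple[str, str]]) -> None:
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS users")
    c.execute("CREATE TABLE users (user text, password text, failures int)")
    # '?' placeholders: sqlite3 passes the values separately from the SQL text,
    # so a quote inside a username cannot change the statement.
    c.executemany(
        "INSERT INTO users (user, password, failures) VALUES (?, ?, ?)",
        [(name, hash_password(password), 0) for name, password in users],
    )
    conn.commit()
```

A username such as `o'brien'); DROP TABLE users;--` is then stored verbatim as data instead of being executed as SQL.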
### Adding Semgrep rules to CI
1. We imported the cases into GitLab:

2. After adding our rules to the rule policies, we ran the Semgrep CI job in the created project:

As we can see, Semgrep successfully detected these rule violations.

## Task 3.3 Semgrep - Remediation
We will fix three issues from the OWASP Juice Shop project.
### NoSQL Injection
The first high-severity vulnerability reported by Semgrep is a potential NoSQL injection, which might be possible at 8 endpoints.

**CWE-943**: Improper Neutralization of Special Elements in Data Query Logic
**CVSS v3.1** score: 4.3 `CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:N/A:N/E:F/RC:C`
**Remediation**
To prevent NoSQL injection, it is important to validate and sanitize user input: enforce strict data types and filter out malicious input with pattern matching. It is also important to use parameterized queries instead of constructing queries dynamically, because they keep the query logic separate from the user input and handle escaping automatically, so the input is always treated as data and never as part of the query. For NoSQL databases this means passing user input only as a value in a query object, never inside a string that is evaluated as code (such as a MongoDB `$where` expression).
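For this case (the `routes/trackOrder.ts` endpoint), we can sanitize the user-controlled order `id` taken from the request parameters:
```javascript=
// Keep only letters, digits, underscores and hyphens before the id reaches the query
const id = String(req.params.id).replace(/[^\w-]+/g, '')
```
This removes every character that is not a letter, digit, underscore or hyphen, including the quotes and operators needed to break out of the query expression, which prevents this injection.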
Example:

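The same sanitization can be expressed in Python to check how it treats a few inputs. This is a sketch of the regex only, not of the Juice Shop route; `sanitize_order_id` and the sample order id are hypothetical, and `re.ASCII` is needed because Python's `\w` matches Unicode letters by default, whereas JavaScript's `\w` is ASCII-only.

```python
import re

# Equivalent of /[^\w-]+/g: every run of characters outside [A-Za-z0-9_-] is removed.
UNSAFE_ID_CHARS = re.compile(r"[^\w-]+", re.ASCII)


def sanitize_order_id(raw: object) -> str:
    # str() flattens non-string input (e.g. a parsed JSON object) to text,
    # similar to String(...) in the original, so it cannot act as a query operator.
    return UNSAFE_ID_CHARS.sub("", str(raw))


print(sanitize_order_id("5267-f9cd5882f54c75a3"))  # 5267-f9cd5882f54c75a3
print(sanitize_order_id("' || true || '"))         # true
```

A payload such as `' || true || '` is reduced to the literal `true`, so it can no longer close the quoted string in the query; a query-operator object such as `{"$ne": None}` is flattened to plain text. Checking the id against the expected format with a full-match allowlist regex would be stricter still.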
### XXE
Another interesting finding is a call to the `parseXml()` function with unsanitized, user-supplied data. This might open a potential vector for an XXE (XML External Entity) attack.

**CWE-611**: Improper Restriction of XML External Entity Reference
**CVSS v3.1** score: 7.5 `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N`
**Remediation**
Most XXE vulnerabilities arise because the application's XML parsing library supports potentially dangerous XML features that the application does not need or intend to use. The easiest and most effective way to prevent XXE attacks is to disable those features.
In this case, it is enough to set the `noent` option to `false` in the `parseXml()` call, which is evaluated inside a `vm` sandbox with a 2-second timeout:
```javascript=
const xmlDoc = vm.runInContext('libxml.parseXml(data, { noblanks: true, noent: false, nocdata: true })', sandbox, { timeout: 2000 })
```
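Despite its name, the `noent` option (libxml2's `XML_PARSE_NOENT`) means "substitute entities": when it is `true`, every entity reference, including external ones such as `<!ENTITY xxe SYSTEM "file:///etc/passwd">`, is replaced by its content, which is exactly what an XXE attack relies on. Setting it to `false` leaves entity references unexpanded, which closes this XXE vector.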
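The same principle (disable the dangerous XML features instead of sanitizing around them) can be sketched with Python's standard library. This is an analogue for illustration, not the Juice Shop fix, which relies on the `noent` option of its `parseXml()` call: the hypothetical `safe_fromstring` helper rejects any document that declares a DOCTYPE or an entity before handing it to `ElementTree`, so its safety does not depend on parser defaults. In production Python code, the third-party `defusedxml` package provides this behaviour ready-made.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class ForbiddenDTDError(ValueError):
    """Raised when a document declares a DOCTYPE or an entity."""


def _forbid(*_args: object) -> None:
    raise ForbiddenDTDError("DOCTYPE/ENTITY declarations are not allowed")


def safe_fromstring(data: bytes) -> ET.Element:
    # First pass: a bare expat parser whose DOCTYPE and entity-declaration
    # handlers raise, so no entity can be declared, let alone expanded.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid
    checker.EntityDeclHandler = _forbid
    checker.Parse(data, True)
    # Second pass: the document is known to be DTD-free, so build the tree.
    return ET.fromstring(data)
```

A plain document such as `<order><id>42</id></order>` parses normally, while a classic payload that declares `<!ENTITY xxe SYSTEM "file:///etc/passwd">` is rejected with `ForbiddenDTDError` before any entity is resolved.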
### SSTI to RCE
Another finding that caught our eye is unvalidated user input that is compiled into a template, which leads to a server-side template injection (SSTI) vulnerability.

**CWE-1336**: Improper Neutralization of Special Elements Used in a Template Engine
**CVSS v3.1** score: 9.8 `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`
**Remediation**
Just as with NoSQL injection, user input must first be sanitized before it is used in a template. Various means are possible (regular expressions, allowlists of authorised expressions, etc.).
Additionally, it is crucial to use a secure template engine that restricts execution of arbitrary code and limits the functionality of templates.
For an even more secure approach, process user-supplied data in an isolated environment where risky modules and features are disabled.
For our specific case (the Pug engine), a simple fix is to match the user input against a regex such as `/#{(.*)}/` and, if it matches, either refuse to render it or remove the special characters from the input and render the sanitized version. This pattern only covers `#{...}` interpolation, so validating input against an allowlist is the more robust option.

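Both options for the Pug case (refusing to render, or stripping the special characters) can be sketched in Python for illustration. The regex is a slightly widened version of `/#{(.*)}/`: it also covers Pug's unescaped `!{...}` interpolation and, thanks to `re.DOTALL`, payloads spanning several lines. The function names are hypothetical, and since Pug has further constructs (such as `#[...]` tag interpolation), the rejection check alone is not a complete defence.

```python
import re

# '#{...}' is Pug's escaped interpolation, '!{...}' the unescaped one.
PUG_INTERPOLATION = re.compile(r"[#!]\{.*\}", re.DOTALL)


def reject_template_syntax(username: str) -> str:
    """Option 1: refuse to render input that contains interpolation syntax."""
    if PUG_INTERPOLATION.search(username):
        raise ValueError("username contains template syntax")
    return username


def strip_template_syntax(username: str) -> str:
    """Option 2: remove the characters Pug needs for interpolation, then render."""
    return re.sub(r"[#!{}]", "", username)


print(strip_template_syntax("#{7*7}"))  # 7*7
```

Stripping turns a payload such as `#{7*7}` into the inert text `7*7`, which is rendered literally instead of being evaluated as `49`.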
### Re-scanning the project
After fixing all of the vulnerabilities above, we scanned the project again. The latest scan shows roughly 3 fewer findings than before:

Conveniently, Semgrep provides a `Fixed` status that lists all vulnerabilities fixed within the latest commit.

Overall, SAST with Semgrep makes the process of tracking and fixing vulnerabilities much more intuitive and easier.