# Protect the System Call, Protect (most of) the World with BASTION
###### tags: `security` `system call`
[website](https://www.unexploitable.systems/publication/jelesnianskibastion/)
## Abstract
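* Enforces system call integrity at **runtime**
* 3 contexts
    * (Call Type): which system calls may be invoked, and how (directly or indirectly)
    * (Control Flow): which call paths may lead to a system call
    * (Argument Integrity): which argument values are legitimate
* Bastion
    * a compiler
    * a runtime monitor system
* Case study
    * NGINX, SQLite, and vsFTPd
    * result: low runtime overhead of 0.60%, 2.01%, and 1.65%, respectively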
## Introduction
* Traditional methods:
    * debloating, system call filtering, and system call sandboxing
    * idea: disable unused system calls
    * problem: system calls the application still needs stay enabled, so attackers can still invoke them (e.g., via code reuse)
* Contributions:
    * **Novel system call contexts for system call integrity**
        * 3 system call contexts, namely **Call Type**, **Control Flow**, and **Argument Integrity** contexts
    * **Bastion defense enforcing system call integrity**
        * compiler pass
            * analyzes all system call usage, performs instrumentation, and generates metadata
        * runtime monitor
            * enforces the static and dynamic aspects of each system call context
    * **Security & performance evaluation**
        * NGINX, SQLite, and vsFTPd
## Background
### System Call Usage in Attacks
* 400+ system calls in recent Linux kernels
* Only a few system calls are desired by attackers; these are called **sensitive system calls**
### Current system call protection mechanisms
* Attack surface reduction
    * Debloating techniques
        * Remove unused code
        * Use static program analysis or dynamic coverage analysis
        * Problem: many sensitive system calls are needed for library loading (mmap, mprotect), so they cannot be removed
    * System call filtering
        * Seccomp: a system call filtering framework
        * The user must define an allowlist/denylist for a given application
        * Can restrict system call arguments
        * Problem: cannot remove sensitive-but-necessary system calls
        * Problem: restricting an argument to a constant value applies across the entire application scope. **E.g., if there are two callsites of mprotect, one read-only and one read-executable, the application-wide policy must allow read-executable for both**
* Control-flow integrity (CFI)
    * LLVM CFI (forward-edge: checks indirect calls)
    * Performs analysis to generate an allowed set of targets per callsite, called an equivalence class (EC)
    * Problem: large ECs are imprecise (attackers keep many valid targets), while small ECs are hard to compute precisely
    * Problem: runtime overhead, e.g., Intel PT-based enforcement
* Data-flow integrity (DFI)
    * Instruments every `load` and `store` instruction
    * Problem: overhead
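To make the application-wide seccomp limitation concrete, here is a minimal C sketch (Linux assumed; all function names are hypothetical) with two `mprotect` callsites, one read-only and one read-executable. A seccomp argument rule only sees the syscall number and raw arguments, not which callsite issued the call, so a single rule for `mprotect`'s `prot` argument has to admit the union of both values.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Callsite A: seal a data page as read-only. */
int seal_data(void *page, size_t len) {
    return mprotect(page, len, PROT_READ);
}

/* Callsite B: make a page read-executable (loader/JIT style). */
int make_code(void *page, size_t len) {
    return mprotect(page, len, PROT_READ | PROT_EXEC);
}

/* A seccomp rule cannot tell callsite A from callsite B, so one
   app-wide rule for the prot argument must allow their union. */
int app_wide_allowed_prot(void) {
    const int per_callsite[] = { PROT_READ, PROT_READ | PROT_EXEC };
    int allowed = 0;
    for (size_t i = 0; i < sizeof per_callsite / sizeof per_callsite[0]; i++)
        allowed |= per_callsite[i];
    return allowed;
}

/* Runs both callsites on one anonymous page; returns 0 on success. */
int demo_two_callsites(void) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int rc = seal_data(p, len);
    if (rc == 0)
        rc = make_code(p, len);
    munmap(p, len);
    return rc;
}
```

The result of `app_wide_allowed_prot()` includes `PROT_EXEC`, so callsite A is effectively granted the permission only callsite B needs.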
## Contexts for system call integrity
Q: What's a legitimate use of a system call?
A: two aspects: (1) control-flow integrity and (2) data integrity
In this paper, three contexts are defined accordingly:
1. Call-type context
2. Control-flow context
3. Argument integrity context
### Call-type context
Only permitted system calls are invoked, and only in the expected manner.
* direct call or indirect call
    * direct call: `int ret = chmod("AAA", S_IWOTH);`
    * indirect call: through a function pointer
* Each system call falls into one of three categories:
    * not-callable
    * directly-callable
    * indirectly-callable
* It is rare for system calls to be called from an indirect callsite
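The two call styles can be illustrated with a small C sketch (hypothetical helper names; a temporary file stands in for `"AAA"`). The first function calls `chmod` directly, so the target is fixed at compile time; the second takes `chmod`'s address and calls it through a function pointer, which is the pattern that makes a system call indirectly-callable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Signature shared by chmod and the function pointer below. */
typedef int (*chmod_fn)(const char *, mode_t);

/* Direct call: the call instruction names chmod explicitly. */
int chmod_direct(const char *path) {
    return chmod(path, S_IRUSR | S_IWUSR | S_IWOTH);
}

/* Indirect call: the target is whatever fp holds at runtime. */
int chmod_indirect(chmod_fn fp, const char *path) {
    return fp(path, S_IRUSR | S_IWUSR | S_IWOTH);
}

/* Creates a temp file, chmods it both ways, reports the permission bits. */
int demo_call_types(mode_t *out_mode) {
    char path[] = "/tmp/bastion_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);
    int rc = chmod_direct(path);
    if (rc == 0)
        rc = chmod_indirect(chmod, path);   /* address of chmod is taken here */
    struct stat st;
    if (rc == 0)
        rc = stat(path, &st);
    if (rc == 0)
        *out_mode = st.st_mode & 0777;
    unlink(path);
    return rc;
}
```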
### Control-flow context
* Record the valid call paths to every sensitive system call and enforce them at runtime
### Argument integrity context
* A system call argument is either (1) a direct argument or (2) an extended argument
    * direct argument: e.g., a constant or a local variable
    * extended argument: e.g., a pointer, whose pointed-to memory must also be protected
        * if the pointed-to data is a struct, its fields must be protected as well
### Real world code examples
- [ ] Legitimate use of the execve system call in NGINX
```clike
// nginx/src/os/unix/ngx_process.c
static void ngx_execute_proc(ngx_cycle_t *cycle, void *data){
    ngx_exec_ctx_t *ctx = data;
    // Legitimate NGINX usage of execve system call
    if (execve(ctx->path, ctx->argv, ctx->envp) == -1) {
        ...
    }
    exit(1);
}
// nginx/src/core/ngx_output_chain.c
ngx_int_t ngx_output_chain(ngx_output_chain_ctx_t *ctx, ngx_chain_t *in){
    ...
    if (in->next == NULL && ngx_output_chain_as_is(ctx, in->buf) ) {
        return ctx->output_filter(ctx->filter_ctx, in);
    }
    ...
}
```
* Attack: corrupt `ctx->output_filter` so that it points to `ngx_execute_proc`
* Attack: corrupt the exec context so that `ctx->path` = `"/bin/sh"`

BASTION:
* Call-Type context: static analysis shows that `execve` must be called directly, but at runtime it is reached through an indirect call
* Control-Flow context: detects the attack
* Argument Integrity context: detects the attack
- [ ] Snippet of NGINX code that can be compromised to reach and call the mprotect system call by corrupting the index used to select a code pointer
    * vulnerable call: `v[index].get_handler(...)`
```clike
// nginx/src/http/ngx_http_variables.c
ngx_http_variable_value_t *ngx_http_get_indexed_variable( ngx_http_request_t *r, ngx_uint_t index){
    ...
    if (v[index].get_handler(r, &r->variables[index], v[index].data) == NGX_OK) {
        ngx_http_variable_depth++;
        if (v[index].flags & NGX_HTTP_VAR_NOCACHEABLE) {
            r->variables[index].no_cacheable = 1;
        }
        return &r->variables[index];
    }
    ...
    return NULL;
}
```
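* Attack: corrupt `index` (e.g., via a buffer overflow) so that `v[index].get_handler` resolves to `mprotect`
    * `r` (the first argument) becomes the address of the memory region to attack
    * `mprotect` then changes that region's permissions

BASTION:
* Call-Type context: `mprotect` is never legitimately called indirectly
* Control-Flow context: the call path to `mprotect` is not a valid one
* Argument Integrity context: the arguments do not match their legitimate values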
## Threat model and assumptions
* Attacker capability: arbitrary memory read/write
* Assumed defenses:
    * Data Execution Prevention (DEP): attackers cannot inject or modify code
    * Address Space Layout Randomization (ASLR)
    * Shadow stack (CET)
* Hardware and OS kernel are trusted
    * Attacks targeting the OS kernel or hardware (e.g., Spectre) are out of scope
* BASTION protects a subset of the available system calls
## BASTION design and implementation
* Choose 20 sensitive system calls (Table 1)
* BASTION = BASTION-compiler pass + BASTION-monitor
### BASTION compiler
* Generate context metadata
* Leverage light-weight library API for dynamic tracking of sensitive data (Argument integrity)
#### Analysis for Call-Type Context
* System calls in the program fall into three categories:
    * not-callable
    * directly-callable
    * indirectly-callable
* BASTION analyzes the entire program’s LLVM IR instructions and checks all call instructions:
    * directly-callable: the system call is the target of a direct function call
    * indirectly-callable: the address of the system call is taken (e.g., assigned to a function pointer)
    * not-callable: neither of the above
* Metadata:
    * pairs of system call numbers and their call type
    * list of legitimate indirect callsites (i.e., offsets in the program binary)
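The classification rules above can be sketched in C over a toy stand-in for the IR. The event kinds, struct names, and table sizes are hypothetical (the real pass walks LLVM IR); call types are kept as bit flags because a system call can be both directly and indirectly callable.

```c
#include <assert.h>
#include <stddef.h>

/* Call-type bits per system call; 0 means not-callable. */
enum { CT_NOT_CALLABLE = 0, CT_DIRECT = 1, CT_INDIRECT = 2 };

/* Hypothetical, heavily simplified view of what the pass finds in the IR. */
enum ir_kind {
    IR_DIRECT_CALL,        /* direct call to a system call wrapper */
    IR_ADDR_TAKEN,         /* a system call wrapper's address is taken */
    IR_INDIRECT_CALLSITE   /* a call through a function pointer */
};

struct ir_event {
    enum ir_kind kind;
    int syscall_nr;          /* ignored for IR_INDIRECT_CALLSITE */
    unsigned long offset;    /* instruction offset in the binary */
};

#define MAX_SYSCALLS 512
#define MAX_ICALL_SITES 64

/* Hypothetical metadata: per-syscall call type + legit indirect callsites. */
struct calltype_metadata {
    unsigned char type[MAX_SYSCALLS];
    unsigned long icall_sites[MAX_ICALL_SITES];
    size_t n_icall_sites;
};

void classify_call_types(const struct ir_event *ev, size_t n,
                         struct calltype_metadata *md) {
    for (size_t i = 0; i < MAX_SYSCALLS; i++)
        md->type[i] = CT_NOT_CALLABLE;
    md->n_icall_sites = 0;
    for (size_t i = 0; i < n; i++) {
        switch (ev[i].kind) {
        case IR_DIRECT_CALL:
            assert(ev[i].syscall_nr >= 0 && ev[i].syscall_nr < MAX_SYSCALLS);
            md->type[ev[i].syscall_nr] |= CT_DIRECT;
            break;
        case IR_ADDR_TAKEN:
            assert(ev[i].syscall_nr >= 0 && ev[i].syscall_nr < MAX_SYSCALLS);
            md->type[ev[i].syscall_nr] |= CT_INDIRECT;
            break;
        case IR_INDIRECT_CALLSITE:
            assert(md->n_icall_sites < MAX_ICALL_SITES);
            md->icall_sites[md->n_icall_sites++] = ev[i].offset;
            break;
        }
    }
}
```

Any system call never touched by an event stays `CT_NOT_CALLABLE`, which is what lets the monitor kill it outright.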
#### Analysis for Control-Flow Context
* Checked only when a sensitive system call is invoked, which costs less than CFI, which checks every indirect control-flow transfer

compile time:
1. generate the CFG
2. identify all function callee→caller relationships that reach system call callsites
3. for each callsite, recursively record all callee→caller associations
4. stop once reaching either main() or an indirect function call

runtime:
1. unwind the stack frames
2. verify callee→caller relations until the bottom of the stack (i.e., main) or an indirect callsite
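The compile-time steps can be sketched as a backward worklist traversal over a toy call graph. This is an assumption-laden simplification: functions are small integer IDs, the granularity is per function rather than per callsite, and all names are hypothetical. Direct edges produce callee→caller pairs; indirect edges and `main` end the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 16
#define MAX_PAIRS 64

/* Hypothetical call-graph edge: `caller` calls `callee`. */
struct cg_edge { int caller, callee; bool indirect; };

/* Output metadata: valid callee->caller pairs. */
struct cf_metadata { int callee[MAX_PAIRS], caller[MAX_PAIRS]; size_t n; };

/* Walk backwards from the function holding a sensitive syscall callsite,
   recording callee->caller pairs; stop at main_fn and at indirect edges. */
void record_paths(const struct cg_edge *e, size_t n_edges, int syscall_fn,
                  int main_fn, struct cf_metadata *md) {
    bool visited[MAX_FUNCS] = { false };
    int worklist[MAX_FUNCS];
    size_t top = 0;
    md->n = 0;
    worklist[top++] = syscall_fn;
    visited[syscall_fn] = true;
    while (top > 0) {
        int f = worklist[--top];
        if (f == main_fn)
            continue;                       /* bottom of the stack reached */
        for (size_t i = 0; i < n_edges; i++) {
            if (e[i].callee != f || e[i].indirect)
                continue;                   /* indirect edges end the path */
            assert(md->n < MAX_PAIRS);
            md->callee[md->n] = f;
            md->caller[md->n] = e[i].caller;
            md->n++;
            if (!visited[e[i].caller]) {    /* each function queued once */
                visited[e[i].caller] = true;
                worklist[top++] = e[i].caller;
            }
        }
    }
}
```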
#### Analysis for Argument Integrity Context
* Checks not only system call arguments but also the variables they are data-dependent on; together these are called sensitive variables
* Maintains a shadow copy of each sensitive variable’s legitimate value in a shadow memory region
* Updates the shadow copy whenever the sensitive variable is updated legitimately
* Binds each argument to its position in the system call so that the Bastion runtime monitor can check argument integrity

1. enumerate all variables used in system call arguments
2. perform a backward data-flow analysis, traversing the use-def chains to derive any other variables used to define sensitive variables
    * newly identified data-dependent variables are added to the set of sensitive variables
3. if there is a write to a field of a struct (e.g., the size field of gshm in Figure 2), that write is added to the sensitive variables

**Repeat 2 and 3 until no new sensitive variables are found**
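The fixpoint loop can be sketched in C over a toy table of def-use facts. Variables are hypothetical integer IDs, and a struct field write such as `gshm->size = n` is modeled as the same kind of edge (from `n` into the struct variable); the real pass works on LLVM use-def chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 32

/* Hypothetical fact: `dst` (or one of its struct fields) is defined
   using the value of `src`. */
struct def { int dst, src; };

/* Marks syscall arguments sensitive, then keeps adding any variable that
   flows into a sensitive one until nothing changes. Returns the count. */
size_t find_sensitive(const struct def *defs, size_t n_defs,
                      const int *syscall_args, size_t n_args,
                      bool sensitive[MAX_VARS]) {
    for (int v = 0; v < MAX_VARS; v++)
        sensitive[v] = false;
    for (size_t i = 0; i < n_args; i++) {          /* step 1 */
        assert(syscall_args[i] >= 0 && syscall_args[i] < MAX_VARS);
        sensitive[syscall_args[i]] = true;
    }
    bool changed = true;
    while (changed) {                              /* repeat steps 2 and 3 */
        changed = false;
        for (size_t i = 0; i < n_defs; i++) {
            if (sensitive[defs[i].dst] && !sensitive[defs[i].src]) {
                sensitive[defs[i].src] = true;
                changed = true;
            }
        }
    }
    size_t count = 0;
    for (int v = 0; v < MAX_VARS; v++)
        count += sensitive[v];
    return count;
}
```

A variable that is only defined *from* a sensitive one (not *into* it) stays non-sensitive, since the analysis runs backwards along use-def chains.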
* Once all sensitive variables are identified, Bastion inserts a `ctx_write_mem` call after every store to a memory-backed sensitive variable to keep its shadow copy up to date
* Before each sensitive system call callsite, Bastion inserts `ctx_bind_mem_X` or `ctx_bind_const_X` to bind arguments to their respective argument position X
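A minimal sketch of this instrumentation, assuming hypothetical signatures (the notes name `ctx_write_mem`, `ctx_bind_mem_X`, and `ctx_bind_const_X` but not their parameters). A small lookup table stands in for Bastion's segment-register-based shadow region, the argument position is a parameter rather than part of the function name, and `args_intact` plays the monitor's role by comparing syscall arguments against the shadow copies and bound constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shadow memory: (address, copy) pairs. */
#define SHADOW_SLOTS 16
struct shadow_entry { const long *addr; long copy; };
static struct shadow_entry shadow[SHADOW_SLOTS];
static size_t n_shadow;

/* Hypothetical bindings for argument positions 1..6. */
enum bind_kind { BIND_NONE, BIND_MEM, BIND_CONST };
struct binding { enum bind_kind kind; const long *addr; long value; };
static struct binding bound[7];

static struct shadow_entry *shadow_find(const long *addr) {
    for (size_t i = 0; i < n_shadow; i++)
        if (shadow[i].addr == addr)
            return &shadow[i];
    return NULL;
}

/* Inserted after every legitimate store to a sensitive variable. */
void ctx_write_mem(const long *addr) {
    struct shadow_entry *s = shadow_find(addr);
    if (s == NULL) {
        assert(n_shadow < SHADOW_SLOTS);
        s = &shadow[n_shadow++];
        s->addr = addr;
    }
    s->copy = *addr;
}

/* Inserted before the callsite (stand-ins for ctx_bind_mem_X / ctx_bind_const_X). */
void ctx_bind_mem(int pos, const long *addr) {
    assert(pos >= 1 && pos <= 6);
    bound[pos] = (struct binding){ BIND_MEM, addr, 0 };
}
void ctx_bind_const(int pos, long value) {
    assert(pos >= 1 && pos <= 6);
    bound[pos] = (struct binding){ BIND_CONST, NULL, value };
}

/* Monitor side: actual[1..6] are the trapped syscall's arguments. */
bool args_intact(const long actual[7]) {
    for (int pos = 1; pos <= 6; pos++) {
        const struct binding *b = &bound[pos];
        if (b->kind == BIND_CONST && actual[pos] != b->value)
            return false;
        if (b->kind == BIND_MEM) {
            const struct shadow_entry *s = shadow_find(b->addr);
            if (s == NULL || actual[pos] != s->copy)
                return false;
        }
    }
    return true;
}
```

A store that bypasses `ctx_write_mem` (e.g., an attacker's arbitrary write) leaves the shadow copy stale, so the mismatch shows up at the next bound system call.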
### Bastion runtime monitor
#### Initializing the Bastion Monitor
* Loading metadata:
    * The monitor retrieves ELF, DWARF, and linked library file information to recover symbol addresses
    * It loads the Bastion context metadata into the monitor’s memory
* Launching a Bastion-protected application:
    * performs `fork` to spawn a child process in which the Bastion-enabled application runs
    * initializes a shadow memory region under a segmentation register
    * initializes seccomp to trap on sensitive system calls in the child process
    * uses `ptrace` to access the application’s state
* Trapping a system call invocation:
    * a custom seccomp-BPF filter traps the application’s sensitive system calls
        * SECCOMP_RET_ALLOW: non-sensitive system calls, ignored
        * SECCOMP_RET_KILL: disables all not-callable system calls
        * SECCOMP_RET_TRACE: directly- and indirectly-callable system calls, so that the Bastion monitor can verify them
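A filter with the three actions above can be sketched with the standard Linux seccomp-BPF macros. The system call lists passed in are assumptions for illustration (the real lists come from Bastion's metadata), and the sketch assumes the x86-64 syscall ABI. With `SECCOMP_RET_TRACE`, a tracer attached with `PTRACE_O_TRACESECCOMP` is notified; if no tracer is present, the system call fails with `ENOSYS`.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Emits "if nr == sysno: return action". */
static size_t emit_rule(struct sock_filter *out, size_t i,
                        unsigned sysno, unsigned action) {
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, sysno, 0, 1);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, action);
    return i;
}

/* Builds the program into out[]; returns its length, or 0 if cap is too small. */
size_t build_filter(const unsigned *traced, size_t nt,
                    const unsigned *killed, size_t nk,
                    struct sock_filter *out, size_t cap) {
    if (cap < 4 + 2 * (nt + nk) + 1)
        return 0;
    size_t i = 0;
    /* Syscall numbers below assume x86-64: kill any other ABI. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                            AUDIT_ARCH_X86_64, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                            offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < nk; k++)     /* not-callable */
        i = emit_rule(out, i, killed[k], SECCOMP_RET_KILL);
    for (size_t k = 0; k < nt; k++)     /* directly/indirectly callable */
        i = emit_rule(out, i, traced[k], SECCOMP_RET_TRACE);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return i;
}

/* Installs the filter in the calling process (e.g., the child before exec). */
int install_filter(struct sock_filter *f, size_t len) {
    struct sock_fprog prog = { .len = (unsigned short)len, .filter = f };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Usage would be, in the forked child: build the filter, call `install_filter`, then exec the protected application, while the parent traces it.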
#### Enforcing Call-Type Context
* Take the PC of the trapped system call, look up the call-type metadata, and check the call type
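The call-type check can be sketched as a lookup in C. The metadata layout and function names are hypothetical, and the sketch assumes the monitor has already turned the PC into a callsite offset and knows whether that callsite is an indirect call. An indirect invocation is accepted only if the system call is indirectly-callable *and* the callsite is one of the recorded legitimate indirect callsites.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CT_NOT_CALLABLE = 0, CT_DIRECT = 1, CT_INDIRECT = 2 };

/* Hypothetical in-memory form of the call-type metadata. */
struct ct_entry { long nr; unsigned type; };
struct ct_metadata {
    const struct ct_entry *types;
    size_t n_types;
    const unsigned long *icall_sites;   /* legitimate indirect callsites */
    size_t n_icall_sites;
};

static unsigned lookup_type(const struct ct_metadata *md, long nr) {
    for (size_t i = 0; i < md->n_types; i++)
        if (md->types[i].nr == nr)
            return md->types[i].type;
    return CT_NOT_CALLABLE;             /* absent: never called by the program */
}

static bool is_legit_icall_site(const struct ct_metadata *md, unsigned long off) {
    for (size_t i = 0; i < md->n_icall_sites; i++)
        if (md->icall_sites[i] == off)
            return true;
    return false;
}

/* callsite: binary offset derived from the PC; via_indirect: whether
   that callsite is an indirect call instruction. */
bool call_type_ok(const struct ct_metadata *md, long nr,
                  unsigned long callsite, bool via_indirect) {
    unsigned t = lookup_type(md, nr);
    if (t == CT_NOT_CALLABLE)
        return false;
    if (!via_indirect)
        return (t & CT_DIRECT) != 0;
    return (t & CT_INDIRECT) != 0 && is_legit_icall_site(md, callsite);
}
```

In the NGINX `execve` example, `execve` would be recorded as directly-callable only, so an invocation arriving through an indirect callsite is rejected.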
#### Enforcing Control-Flow Context
* Stack trace: unwind the stack and get each function's callsite offset
* CFG metadata: a list of callees and their respective valid callers
* Continue until the entire stack has been vetted or an indirect call is encountered
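The runtime walk can be sketched in C, assuming the stack has already been unwound into a list of frames (the actual unwinding through `ptrace` and DWARF is out of scope here). Functions are hypothetical integer IDs, and `entered_indirectly` marks a frame whose caller reached it through an indirect callsite, which is where verification stops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata entry: `caller` may legitimately call `callee`. */
struct cf_pair { int callee, caller; };

/* One unwound frame, innermost first. */
struct frame { int func; bool entered_indirectly; };

static bool pair_allowed(const struct cf_pair *p, size_t n, int callee, int caller) {
    for (size_t i = 0; i < n; i++)
        if (p[i].callee == callee && p[i].caller == caller)
            return true;
    return false;
}

/* Checks callee->caller links until main, an indirect callsite, or the
   end of the unwound stack; any link missing from the metadata fails. */
bool control_flow_ok(const struct cf_pair *pairs, size_t n_pairs,
                     const struct frame *stack, size_t depth, int main_fn) {
    for (size_t i = 0; i + 1 < depth; i++) {
        if (stack[i].func == main_fn || stack[i].entered_indirectly)
            return true;
        if (!pair_allowed(pairs, n_pairs, stack[i].func, stack[i + 1].func))
            return false;
    }
    return true;
}
```

Stopping at indirect callsites keeps the check cheap, which matters given the short NGINX call depths reported in the evaluation.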
#### Enforcing Argument Integrity Context
* Verifies the integrity of all sensitive variables in the current call stack
* Takes the PC and checks the associated argument integrity context metadata
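On the monitor side, the trapped system call's arguments come from the child's registers. A minimal sketch, assuming x86-64 and registers already fetched with `ptrace(PTRACE_GETREGS, ...)` at the seccomp stop: the syscall number is in `orig_rax`, the arguments are in `rdi`, `rsi`, `rdx`, `r10`, `r8`, `r9`, and the PC is in `rip`. The `expected_arg` struct is hypothetical and stands for values the monitor derives from shadow copies or bound constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/user.h>

/* x86-64 syscall ABI: number in orig_rax, arguments 1..6 in these registers. */
void syscall_args_from_regs(const struct user_regs_struct *r,
                            long *nr, unsigned long args[6]) {
    *nr = (long)r->orig_rax;
    args[0] = r->rdi; args[1] = r->rsi; args[2] = r->rdx;
    args[3] = r->r10; args[4] = r->r8;  args[5] = r->r9;
}

/* Hypothetical expectation: 1-based argument position and legitimate value. */
struct expected_arg { int pos; unsigned long value; };

bool arguments_match(const struct user_regs_struct *r,
                     const struct expected_arg *exp, size_t n) {
    long nr;
    unsigned long args[6];
    syscall_args_from_regs(r, &nr, args);
    (void)nr;   /* call-type and control-flow checks use the number and rip */
    for (size_t i = 0; i < n; i++) {
        if (exp[i].pos < 1 || exp[i].pos > 6)
            return false;
        if (args[exp[i].pos - 1] != exp[i].value)
            return false;
    }
    return true;
}
```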
### Implementation
* Linux x86-64 v5.19.14
* LLVM Module
* hardware-based shadow stack ```-fcf-protection=full```
* Intel Tiger Lake and AMD Ryzen 7 processors
* Glibc v2.28+
* Binutils v2.29+
Efforts:
* LLVM module: 3,939 lines of code
* Bastion’s C runtime library: 659 lines of code
* Bastion runtime monitor (a C program): 7,313 lines of code
## EVALUATION
### Evaluation Methodology
* 8-core (16-hardware thread) machine featuring an AMD Ryzen 7 PRO 5850U processor and 16 GB DDR4 memory
* Bastion LLVM compiler
* Results are averaged over five runs
* **NGINX, SQLite, and vsftpd**
### Performance Evaluation
* NGINX:
    * wrk, an HTTP benchmarking tool
        * sends concurrent HTTP requests
        * measures throughput
    * NGINX configured with a maximum of 1,024 connections per processor
    * 32 worker threads
    * never incurred more than 0.60% degradation compared to the unprotected NGINX baseline
    * Argument Integrity context adds the most overhead
    * NGINX uses many sensitive system calls (e.g., mprotect, mmap) during its initialization phase but seldom uses them when idle or processing requests, so Bastion is rarely triggered at runtime
    * average call depth is only 5.2 frames, with minimum and maximum call depths of 4 and 9
* SQLite:
    * DBT2, a database transaction processing benchmark
        * mix of read and write SQL operations for large data warehouse transactions
        * 10 second new-thread delay and a 10 minute workload duration
        * performance measured as new-order transactions per minute (NOTPM)
    * Overhead:
        * Call-Type: 0.92%
        * Control-Flow: 1.48%
        * Argument Integrity: 2.01%
* vsftpd:
    * dkftpbench, an FTP benchmark program
        * fetches a 100 MB file from vsftpd, launching clients one after another for a 120 second duration
    * Overhead: 1.65% in the worst case

* Argument Integrity context is the most costly
* LLVM CFI is expensive because it is triggered at every indirect callsite; NGINX does not have many indirect calls

* Sensitive system calls (e.g., mprotect, mmap) are never called indirectly via a function pointer
* Comparison with CET and LLVM CFI:
    * CET maintains a secondary (shadow) stack; upon returning from a function, the CPU compares the return addresses in the shadow stack and the normal stack
    * LLVM CFI performs verification at every indirect callsite
## SECURITY EVALUATION
### ROP Attacks
* The libc library call `system` = `fork` + `execl`
* exec-type system calls, to spawn a root shell
* `mprotect` or `chmod` system calls, to change memory or file permissions (e.g., to make them executable)
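To show which sensitive calls sit behind `system`, here is a minimal sketch of a `system()`-style helper (the name `tiny_system` is hypothetical). It forks, and the child calls `execl`, a libc wrapper that ends in the `execve` system call. Real `system()` implementations also manage signals and may use other process-creation primitives; this only illustrates the call chain that ROP payloads hijack.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd via /bin/sh -c and returns the raw wait status (or -1). */
int tiny_system(const char *cmd) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                 /* reached only if execl fails */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

For example, `tiny_system("exit 3")` yields a status for which `WEXITSTATUS` is 3.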
### Direct Attacker Manipulation of System Calls
* Go after system calls directly: set up callsites and arguments with the desired values
* The **CsCFI** attack leverages mprotect to make the entire libc readable, writable, and executable, revealing the code layout to perform arbitrary code execution
* **AOCR’s Attack 1** uses open and write to reveal the code layout of NGINX and execute arbitrary code
* Bastion detects both attacks through its Control-Flow and Argument Integrity contexts
* LLVM CFI cannot defend against either attack
    * In the CsCFI attack, although mprotect is never used, its address is still taken because this system call is needed to support dynamic loading of shared libraries
### Indirect Attacker Manipulation of System Calls
* full-function code re-use, data-oriented attacks, and COOP
* The **NEWTON CPI attack** avoids corrupting any code or data pointers. It corrupts the index variable of an array of function pointers to make the array index point to a system call location
* Call-Type context blocks the invocation of a system call never used in the program code base
## DISCUSSION AND LIMITATIONS
### Bastion under Arbitrary Memory Corruption
* To bypass all three of Bastion’s contexts, the attacker would realistically need to perform arbitrary reads/writes many times to match the expected context values without violating static constraints
### Protecting Filesystem Related System Calls
* The main challenge is that this type of system call is called much more frequently
* Full Bastion context checking incurs high overhead, e.g., 96.7% for NGINX
* Most of the overhead results from fetching the protected process's state using `ptrace` (< 95.7%, delta between Rows 1 and 2)
    * plus additional context-switching overhead to access the protected program
* One way to eliminate the ptrace overhead would be to run the Bastion monitor inside the kernel