Protect the System Call, Protect (most of) the World with BASTION

tags: `security` `system call`


Abstract

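BASTION enforces system call integrity at runtime using 3 contexts
- (Call Type): which system call is invoked and how it is called (directly or indirectly)
- (Control Flow): whether the system call is reached along a legitimate control-flow path
- (Argument Integrity): whether the system call arguments hold legitimately produced values
BASTION consists of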
- a compiler
- a runtime monitor system
Case study
- NGINX, SQLite, and vsFTPd
- result: runtime overhead of only 0.60%, 2.01%, and 1.65%, respectively

Introduction

Traditional methods:
- debloating, system call filtering, and system call sandboxing
- idea: disable system calls the application does not use
- problem: system calls the application does need (including sensitive ones) can still be invoked by attackers, e.g., through code-reuse attacks
Contribution:
- Novel system call contexts for system call integrity
  - 3 system call contexts, namely Call Type, Control Flow, and Argument Integrity contexts
- Bastion defense enforcing system call integrity
  - compiler pass
    - analyzes all system call usage, performs instrumentation, and generates metadata
  - runtime monitor
    - enforces the static and dynamic aspects of each system call context
- Security & performance evaluation
  - NGINX, SQLite, and vsFTPd

Background

System Call Usage in Attacks

400+ system calls in the recent Linux kernel
Only a few system calls are useful to attackers; these are called sensitive system calls

Current system call protection mechanisms

Attack surface reduction
- Debloating techniques
- Remove unused code
- Identify unused code via static program analysis or dynamic coverage analysis
- Problem: many sensitive system calls are still needed for library loading (mmap, mprotect)
System call filtering
- Seccomp: Linux system call filtering framework
- User needs to define an allowlist/denylist for a given application
- Can restrict system call arguments
- Problem: cannot remove sensitive-but-necessary system calls
- Problem: restricting an argument to a constant value applies across the entire application scope. E.g., if there are two call sites of mprotect, one passing read-only and the other read-executable, the policy must allow read-executable for the whole application
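The scoping problem can be illustrated with a small C model (not seccomp itself): an argument rule in a filter sees only the system call number and arguments, never the call site, so it collapses into one process-wide allowlist. The two mprotect call sites and the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Two hypothetical mprotect call sites in one application. */
enum { SITE_RO = 0, SITE_RX = 1, NUM_SITES = 2 };

/* The prot value each call site legitimately passes. */
const int site_prot[NUM_SITES] = {
    [SITE_RO] = PROT_READ,
    [SITE_RX] = PROT_READ | PROT_EXEC,
};

/* Filter-style rule: it cannot tell which call site issued the system
 * call, so it must allow the union of all legitimate values. */
bool app_wide_allows(int prot) {
    for (int i = 0; i < NUM_SITES; i++)
        if (site_prot[i] == prot)
            return true;
    return false;
}

/* Per-call-site rule: each site may pass only its own value. */
bool per_site_allows(int site, int prot) {
    assert(site >= 0 && site < NUM_SITES);
    return site_prot[site] == prot;
}
```

A read-executable request issued from the read-only site passes app_wide_allows but fails per_site_allows.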
Control-flow integrity (CFI)
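- Supported by LLVM (forward-edge checks at indirect callsites)
- Performs analysis to generate an allowed set of targets per callsite, called an equivalence class (EC)
- Problem: large ECs -> imprecise, leaving many valid targets for attackers; small ECs -> more precise but require costlier analysis
- Problem: runtime overhead, e.g., for approaches based on Intel PT (Processor Trace)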
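A minimal sketch of a per-callsite EC check, assuming a type-based EC for one hypothetical indirect call site of type int (*)(void *, unsigned long, int). fake_mprotect is a hypothetical stand-in for an address-taken mprotect: the coarse EC admits it simply because its signature matches, while the finer EC rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Signature of one hypothetical indirect call site. */
typedef int (*handler_fn)(void *, unsigned long, int);

/* Address-taken functions that match that signature. */
int handler_a(void *p, unsigned long n, int f) { (void)p; (void)n; return f; }
int handler_b(void *p, unsigned long n, int f) { (void)p; (void)n; return -f; }
/* Stand-in for an address-taken mprotect (hypothetical, not libc's). */
int fake_mprotect(void *p, unsigned long n, int f) { (void)p; (void)n; (void)f; return 0; }

/* Coarse, type-based EC: every address-taken function with a matching type. */
const handler_fn coarse_ec[] = { handler_a, handler_b, fake_mprotect };
/* Finer EC: only the targets this call site can legitimately reach. */
const handler_fn fine_ec[] = { handler_a, handler_b };

/* CFI check placed before the indirect call: is the target in the EC? */
bool target_in_ec(handler_fn target, const handler_fn *ec, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (ec[i] == target)
            return true;
    return false;
}
```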
Data-flow integrity (DFI)
- Instruments every load and store instruction
- Problem: high runtime overhead

Contexts for system call integrity

Q: What's a legitimate use of a system call?
A: one that preserves two properties: (1) control-flow integrity and (2) data integrity

In this paper, three contexts are defined accordingly

Call-type context
Control-flow context
Argument integrity context

Call-type context

Only permitted system calls are invoked, and only in the expected manner (direct or indirect call).

A system call can be invoked through a direct call or an indirect call
- direct call: int ret = chmod("AAA", S_IWOTH)
- indirect call: through a function pointer
Each system call falls into one of three categories:
- not-callable
- directly-callable
- indirectly-callable
It is rare for system calls to be called from an indirect call site

Control-flow context

Record the valid call paths that reach each sensitive system call, and enforce this context at runtime

Argument integrity context

A system call argument is either (1) a direct argument or (2) an extended argument
direct argument: e.g., a constant or a local variable passed by value
extended argument: e.g., a pointer, where the data it points to also matters
if an argument refers to a struct, its fields must be protected as well

Real world code examples

Legitimate use of the execve system call in NGINX

// nginx/src/os/unix/ngx_process.c
static void ngx_execute_proc(ngx_cycle_t *cycle, void *data){
    ngx_exec_ctx_t *ctx = data;
    // Legitimate NGINX usage of execve system call
    if (execve(ctx->path, ctx->argv, ctx->envp) == -1) {
        ...
    }
    exit(1);
}

// nginx/src/core/ngx_output_chain.c
ngx_int_t ngx_output_chain(ngx_output_chain_ctx_t *ctx, ngx_chain_t *in){
    ...
    if (in->next == NULL && ngx_output_chain_as_is(ctx, in->buf) ) {
        return ctx->output_filter(ctx->filter_ctx, in);
    }
    ...
}

Attack: corrupt memory so that
ctx->output_filter = ngx_execute_proc
ctx->path = "/bin/sh"

BASTION:

Call-type context: static analysis shows that execve is a directly-callable system call, but at runtime it is reached through an indirect call
Control-flow context: detects the attack
Argument integrity context: detects the attack

Snippet of NGINX code that can be compromised to reach and call the mprotect system call by corrupting index, the array index used in the vulnerable code-pointer call v[index].get_handler()

// nginx/src/http/ngx_http_variables.c
ngx_http_variable_value_t *ngx_http_get_indexed_variable( ngx_http_request_t *r, ngx_uint_t index){
    ...
    if (v[index].get_handler(r, &r->variables[index], v[index].data) == NGX_OK) {
        ngx_http_variable_depth++;
        if (v[index].flags & NGX_HTTP_VAR_NOCACHEABLE) {
            r->variables[index].no_cacheable = 1;
        }
        return &r->variables[index];
    }
    ...
    return NULL;
}

Attack: via a buffer overflow, modify index such that v[index].get_handler points to mprotect
r (the first argument) = the memory region to exploit
mprotect then changes the permission of that region

BASTION:

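Call-type context: mprotect is never called through an indirect call in NGINX
Control-flow context: the control path leading to mprotect is not legitimate
Argument integrity context: the arguments do not match their legitimate values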

Threat model and assumptions

Attackers have arbitrary memory read/write capabilities
Data Execution Prevention (DEP)
- attackers cannot inject or modify code due to DEP
Address Space Layout Randomization (ASLR)
Shadow Stack (CET)
Hardware and OS kernel are trusted
- Attackers going for OS kernel and hardware (Spectre) are out of scope
BASTION protects a subset of available system calls

BASTION design and implementation

BASTION protects 20 sensitive system calls (Table 1)
BASTION = BASTION-compiler pass + BASTION-monitor

BASTION compiler

Generates context metadata
Leverages a lightweight runtime library API for dynamic tracking of sensitive data (Argument Integrity context)

Analysis for Call-Type Context

System calls in the program fall into three categories
- not-callable
- directly-callable
- indirectly-callable
BASTION analyzes the entire program’s LLVM IR instructions and checks all call instructions.
- directly-callable: the system call is the target of a direct function call
- indirectly-callable: the address of the system call is taken (e.g., assigned to a function pointer)
- not-callable: neither of the above
Metadata:
- pairs of system call numbers and their call type
- list of legitimate indirect callsites (i.e., offset in a program binary)
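The classification rule can be sketched in C, assuming the compiler pass has already reduced its IR scan to two hypothetical per-syscall facts (used as a direct callee; address taken). Letting "address taken" win when a system call is also called directly is an assumption of this sketch, not something the notes state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

typedef enum { NOT_CALLABLE, DIRECTLY_CALLABLE, INDIRECTLY_CALLABLE } call_type_t;

/* Hypothetical summary of what the pass found for one system call. */
typedef struct {
    long nr;            /* system call number */
    bool direct_call;   /* is the target of some direct call instruction */
    bool address_taken; /* its address is taken, e.g. assigned to a pointer */
} syscall_facts_t;

/* One metadata entry: (system call number, call type). */
typedef struct {
    long nr;
    call_type_t type;
} call_type_entry_t;

call_type_t classify(const syscall_facts_t *f) {
    if (f->address_taken)
        return INDIRECTLY_CALLABLE; /* assumption: takes precedence over direct use */
    if (f->direct_call)
        return DIRECTLY_CALLABLE;
    return NOT_CALLABLE;
}

/* Emit the (number, call type) pairs; out must hold n entries. */
size_t build_call_type_metadata(const syscall_facts_t *facts, size_t n,
                                call_type_entry_t *out) {
    assert(facts != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i].nr = facts[i].nr;
        out[i].type = classify(&facts[i]);
    }
    return n;
}
```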

Analysis for Control-Flow Context

Checked only when a sensitive system call is invoked -> lower overhead than CFI, which enforces checks on every indirect control-flow transfer

compile time:

generate CFG
identify all function callee→caller relationships that reach system call callsites
For each callsite, recursively records all callee→caller associations
stops once reaching either main() or an indirect function call
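A minimal sketch of the compile-time walk, assuming the call graph is given as direct caller→callee edges over integer function IDs (a hypothetical encoding). Because indirect calls contribute no static edge here, the recursion naturally ends at functions reached only indirectly, and it explicitly stops at main().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EDGES 64
#define MAX_PAIRS 64

/* Direct call edge: caller calls callee (function IDs are hypothetical). */
typedef struct { int caller, callee; } call_edge_t;

typedef struct {
    int main_fn;                  /* ID of main() */
    call_edge_t edges[MAX_EDGES]; /* direct calls only; indirect calls add no edge */
    size_t n_edges;
} call_graph_t;

/* Metadata: legitimate callee -> caller associations. */
typedef struct { int callee, caller; } cf_pair_t;
typedef struct { cf_pair_t pairs[MAX_PAIRS]; size_t n; } cf_metadata_t;

bool has_pair(const cf_metadata_t *m, int callee, int caller) {
    for (size_t i = 0; i < m->n; i++)
        if (m->pairs[i].callee == callee && m->pairs[i].caller == caller)
            return true;
    return false;
}

/* Starting from the function that contains a system call callsite, record
 * every callee -> caller pair and recurse into callers until main().
 * A pair is recorded before recursing, so cycles cannot loop forever. */
void record_callers(const call_graph_t *g, int fn, cf_metadata_t *m) {
    if (fn == g->main_fn)
        return;
    for (size_t i = 0; i < g->n_edges; i++) {
        const call_edge_t *e = &g->edges[i];
        if (e->callee != fn || has_pair(m, fn, e->caller))
            continue; /* not a caller of fn, or already recorded */
        assert(m->n < MAX_PAIRS);
        m->pairs[m->n++] = (cf_pair_t){ .callee = fn, .caller = e->caller };
        record_callers(g, e->caller, m);
    }
}
```

In the usage below, function 4 calls the system-call function 3 directly but is itself only entered via a function pointer, so the walk records (3, 4) and goes no further along that branch.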

runtime:

unwind stack frame
verifies callee→caller relations until the bottom of the stack (i.e., main), or an indirect callsite

Analysis for Argument Integrity Context

checks not only system call arguments but also the arguments’ data-dependent variables; together these are called sensitive variables
maintains a shadow copy of the sensitive variable’s legitimate value in a shadow memory region
updates the shadow copy whenever the sensitive variable is updated legitimately
binds each argument to a certain position for the system call so the Bastion runtime monitor can check argument integrity

1. enumerate all variables used in system call arguments
2. perform a backward data-flow analysis, traversing the use-def chains to derive any other variables used to define sensitive variables
   - newly identified data-dependent variables are added to the set of sensitive variables
3. if there is a write to a field of a struct (e.g., the size field of gshm in Figure 2), that write is added to the sensitive variables
4. repeat steps 2 and 3 until no new sensitive variables are found
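This iteration is essentially a worklist fixpoint over use-def edges. The sketch assumes a hypothetical compact encoding where uses[v] lists the variables read when v is defined; a store into a struct field (like the gshm size field) would simply be one more edge in this encoding.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VARS 32
#define MAX_USES 4

/* Hypothetical use-def encoding: uses[v] lists the variables read by the
 * instruction(s) that define v. */
typedef struct {
    int n_vars;
    int n_uses[MAX_VARS];
    int uses[MAX_VARS][MAX_USES];
} use_def_t;

/* Marks every variable that flows into a system call argument and returns
 * how many were marked. Each variable enters the worklist at most once,
 * so the loop ends when no new sensitive variables are found. */
int find_sensitive(const use_def_t *ud, const int *args, int n_args,
                   bool sensitive[MAX_VARS]) {
    int worklist[MAX_VARS];
    int top = 0, count = 0;
    assert(ud->n_vars <= MAX_VARS);
    for (int v = 0; v < ud->n_vars; v++)
        sensitive[v] = false;
    /* Seed: variables used directly as system call arguments. */
    for (int i = 0; i < n_args; i++) {
        if (!sensitive[args[i]]) {
            sensitive[args[i]] = true;
            worklist[top++] = args[i];
            count++;
        }
    }
    /* Backward propagation along use-def chains. */
    while (top > 0) {
        int v = worklist[--top];
        for (int j = 0; j < ud->n_uses[v]; j++) {
            int u = ud->uses[v][j];
            if (!sensitive[u]) {
                sensitive[u] = true;
                worklist[top++] = u;
                count++;
            }
        }
    }
    return count;
}
```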

Once all sensitive variables are identified, Bastion instruments ctx_write_mem after any memory-backed sensitive variable store to keep its shadow copy up-to-date
Before each sensitive system call callsite, Bastion instruments ctx_bind_mem_X or ctx_bind_const_X to bind arguments to their respective argument position X
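A rough sketch of what the instrumented calls maintain, in plain C. The names shadow_write_mem, bind_const_arg, bind_mem_arg, and verify_args are hypothetical analogues of ctx_write_mem, ctx_bind_const_X, and ctx_bind_mem_X (whose real signatures are not given here). The shadow region is a simple table rather than a segment-register-addressed region, sensitive variables are assumed to be long-sized, and verify_args plays the role of the monitor's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHADOW_SLOTS 64
#define MAX_ARGS 6 /* Linux x86-64 system calls take up to six arguments */

/* Shadow copy of one sensitive (long-sized) variable. */
typedef struct { const long *addr; long copy; } shadow_slot_t;
static shadow_slot_t shadow[SHADOW_SLOTS];
static size_t n_shadow;

typedef enum { UNBOUND, BOUND_CONST, BOUND_MEM } bind_kind_t;
typedef struct { bind_kind_t kind; long value; const long *addr; } arg_binding_t;
static arg_binding_t bindings[MAX_ARGS];

static shadow_slot_t *find_slot(const long *addr) {
    for (size_t i = 0; i < n_shadow; i++)
        if (shadow[i].addr == addr)
            return &shadow[i];
    return NULL;
}

/* Analogue of ctx_write_mem: called right after a legitimate store. */
void shadow_write_mem(const long *addr) {
    shadow_slot_t *s = find_slot(addr);
    if (s == NULL) {
        assert(n_shadow < SHADOW_SLOTS);
        s = &shadow[n_shadow++];
        s->addr = addr;
    }
    s->copy = *addr;
}

/* Analogues of ctx_bind_const_X / ctx_bind_mem_X for argument position pos. */
void bind_const_arg(int pos, long value) {
    assert(pos >= 0 && pos < MAX_ARGS);
    bindings[pos] = (arg_binding_t){ .kind = BOUND_CONST, .value = value };
}

void bind_mem_arg(int pos, const long *addr) {
    assert(pos >= 0 && pos < MAX_ARGS);
    bindings[pos] = (arg_binding_t){ .kind = BOUND_MEM, .addr = addr };
}

/* Monitor-side check: each bound argument must equal its constant, or come
 * from a variable whose current value still matches its shadow copy. */
bool verify_args(const long actual[MAX_ARGS]) {
    for (int pos = 0; pos < MAX_ARGS; pos++) {
        const arg_binding_t *b = &bindings[pos];
        if (b->kind == BOUND_CONST && actual[pos] != b->value)
            return false;
        if (b->kind == BOUND_MEM) {
            const shadow_slot_t *s = find_slot(b->addr);
            if (s == NULL || *b->addr != s->copy || actual[pos] != s->copy)
                return false;
        }
    }
    return true;
}
```

An attacker who overwrites a sensitive variable without going through the instrumented store leaves the shadow copy stale, which verify_args then reports.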

Bastion runtime monitor

Initializing the Bastion Monitor

Loading metadata:
- The monitor retrieves ELF, DWARF, and linked library file information to recover symbol addresses
- loads Bastion context metadata into the monitor’s memory
Launching a Bastion-protected application:
- performs fork to spawn a child process where the child runs the Bastion-enabled application
- initializes a shadow memory region under a segmentation register
- initializes seccomp: trap on sensitive system calls in the child process
- ptrace: access the application’s state
Trapping a system call invocation:
- a custom seccomp-BPF filter traps on the application’s sensitive system calls
- SECCOMP_RET_ALLOW: non-sensitive system calls are ignored (allowed)
- SECCOMP_RET_KILL: disables any not-callable system calls
- SECCOMP_RET_TRACE: directly- and indirectly-callable system calls, so that these system calls can be verified by the Bastion monitor
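A minimal seccomp-BPF filter using the three actions, built with the Linux <linux/seccomp.h> and <linux/filter.h> interfaces. Which system calls land in which category (setuid killed, mprotect and mmap traced) is a hypothetical example rather than Bastion's generated policy, and the architecture check assumes x86-64.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Example policy (hypothetical): setuid is not-callable, mprotect and mmap
 * are callable sensitive system calls, everything else is non-sensitive. */
struct sock_filter bastion_filter[] = {
    /* Kill if the ABI is not x86-64, since syscall numbers differ per ABI. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Load the system call number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* Not-callable: kill. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_setuid, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* Directly/indirectly-callable: stop and notify the ptrace-based monitor. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mprotect, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_mmap, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE),
    /* Non-sensitive: allow. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

#define BASTION_FILTER_LEN (sizeof bastion_filter / sizeof bastion_filter[0])

/* Install the filter in the calling process (e.g. the forked child).
 * Returns 0 on success or a negative errno value. */
int install_bastion_filter(void) {
    struct sock_fprog prog = {
        .len = (unsigned short)BASTION_FILTER_LEN,
        .filter = bastion_filter,
    };
    /* Lets an unprivileged process install a seccomp filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -errno;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -errno;
    return 0;
}
```

install_bastion_filter() would be called in the forked child before it execs the application. SECCOMP_RET_TRACE only stops the child if a ptrace tracer has set PTRACE_O_TRACESECCOMP; with no tracer attached, the traced system call is not executed and fails with ENOSYS.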

Enforcing Call-Type Context

Take the $PC of the trapped system call, look up the call-type metadata, and check that the system call’s call type allows this invocation
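A minimal sketch of the call-type check, assuming the monitor has already turned the trapped $PC into the binary offset of the call that led to the system call and knows whether that call instruction was indirect (how it recovers both is not modeled). The metadata layout and the example offsets in the usage are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>

typedef enum { NOT_CALLABLE, DIRECTLY_CALLABLE, INDIRECTLY_CALLABLE } call_type_t;

typedef struct { long nr; call_type_t type; } call_type_entry_t;

/* Call-type metadata produced at compile time (layout is hypothetical). */
typedef struct {
    const call_type_entry_t *types;
    size_t n_types;
    const uint64_t *indirect_sites; /* legitimate indirect callsite offsets */
    size_t n_sites;
} call_type_md_t;

static call_type_t lookup_type(const call_type_md_t *md, long nr) {
    for (size_t i = 0; i < md->n_types; i++)
        if (md->types[i].nr == nr)
            return md->types[i].type;
    return NOT_CALLABLE; /* sensitive system call absent from metadata */
}

/* callsite_off: binary offset of the call that led to the system call.
 * via_indirect: whether that call was an indirect call (assumed known). */
bool check_call_type(const call_type_md_t *md, long nr,
                     uint64_t callsite_off, bool via_indirect) {
    assert(md != NULL);
    call_type_t t = lookup_type(md, nr);
    if (t == NOT_CALLABLE)
        return false;
    if (!via_indirect)
        return true; /* assumption: direct calls are fine for both callable types */
    if (t != INDIRECTLY_CALLABLE)
        return false;
    for (size_t i = 0; i < md->n_sites; i++)
        if (md->indirect_sites[i] == callsite_off)
            return true;
    return false;
}
```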

Enforcing Control-Flow Context

stack trace: unwinds the stack and gets each function’s callsite offset
CFG metadata: a list of callees and their respective valid callers
verification continues until the entire stack has been vetted or an indirect call is encountered
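A minimal sketch of the runtime check, assuming the unwinder yields, for each frame, a function ID and whether that frame was entered through an indirect call (a hypothetical simplification of matching callsite offsets). The metadata is a flat list of allowed (callee, caller) pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Control-flow metadata: one legitimate callee -> caller association. */
typedef struct { int callee, caller; } cf_pair_t;

/* One unwound frame (hypothetical): its function ID and whether it was
 * entered through an indirect call. */
typedef struct { int fn; bool entered_indirectly; } frame_t;

static bool pair_allowed(const cf_pair_t *md, size_t n, int callee, int caller) {
    for (size_t i = 0; i < n; i++)
        if (md[i].callee == callee && md[i].caller == caller)
            return true;
    return false;
}

/* frames[0] issued the system call; higher indices are outer callers.
 * Vetting stops successfully at main() or at a frame entered indirectly. */
bool check_control_flow(const cf_pair_t *md, size_t n_md,
                        const frame_t *frames, size_t n_frames, int main_fn) {
    assert(md != NULL && frames != NULL);
    for (size_t i = 0; i < n_frames; i++) {
        if (frames[i].fn == main_fn || frames[i].entered_indirectly)
            return true;
        if (i + 1 == n_frames)
            return false; /* stack ended before main() or an indirect call */
        if (!pair_allowed(md, n_md, frames[i].fn, frames[i + 1].fn))
            return false;
    }
    return false; /* empty stack */
}
```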

Enforcing Argument Integrity Context

verifies the integrity of all sensitive variables in the current call stack
takes the $PC and checks the associated argument integrity context metadata

Implementation

Linux x86-64 v5.19.14
Compiler pass implemented as an LLVM module
Hardware-based shadow stack enabled with -fcf-protection=full
Intel Tiger Lake and AMD Ryzen 7 processors
Glibc v2.28+
Binutils v2.29+

Implementation effort:

LLVM module: 3,939 lines of code
Bastion’s C runtime library: 659 lines of code
Bastion runtime monitor (C program): 7,313 lines of code

EVALUATION

Evaluation Methodology

8-core (16-hardware thread) machine featuring an AMD Ryzen 7 PRO 5850U processor and 16 GB DDR4 memory
Applications compiled with the Bastion LLVM compiler
Results are reported as the average over five runs
NGINX, SQLite, and vsftpd

Performance Evaluation

NGINX:
- wrk, HTTP benchmarking tool
- sends concurrent HTTP requests
- measure throughput
- NGINX maximum of 1,024 connections per processor
- 32 worker threads
- never incurred more than 0.60% degradation compared to the unprotected NGINX baseline
- Argument Integrity context adds the most overhead
- uses many sensitive system calls (e.g., mprotect, mmap) during its initialization phase but seldom uses them when idle or processing requests -> Bastion is rarely triggered at runtime
- average call-depth is only 5.2 frames, with 4 and 9 being minimum and maximum stack call-depths
SQLite:
- DBT2, database transaction processing benchmark
- mix of read and write SQL operations for large data warehouse transactions
- 10 second new thread delay and a 10 minute workload duration
- performance metric: number of new-order transactions per minute (NOTPM)
- Overhead:
  - Call-Type: 0.92%
  - Control-Flow: 1.48%
  - Argument Integrity: 2.01%
VSFTPD:
- dkftpbench, FTP benchmark program
- clients fetch a 100 MB file from vsftpd, launched one after another over a 120-second duration
- Overhead: 1.65% in the worst case

Argument Integrity context is most costly
LLVM CFI is expensive because it is triggered at every indirect callsite => NGINX does not have many indirect calls

Sensitive system calls (e.g., mprotect, mmap) are never called indirectly via a function pointer
Comparison with CET and LLVM CFI
- CET: maintain a secondary (shadow) stack. Upon returning from a function, the CPU compares return addresses in the shadow stack and the normal stack
- LLVM CFI: verification at every indirect callsite

SECURITY EVALUATION

ROP Attacks

libc library function system() = fork + execl
exec-type system calls, to gain access to a root shell
mprotect or chmod system calls to change memory or file permissions to be executable

Direct Attacker Manipulation of System Calls

Attackers go after system calls directly, setting up callsites and arguments to desired values
The CsCFI attack leverages mprotect to make the entire libc readable, writable, and executable, revealing the code layout to perform arbitrary code execution
AOCR’s Attack 1 uses open and write to reveal the code layout of NGINX to execute arbitrary code
Bastion’s Control-Flow and Argument Integrity contexts detect both attacks
LLVM CFI cannot defend against either attack.
- In the CsCFI attack, even though mprotect is never used by the application, its address is still taken because this system call is necessary to support dynamic loading of shared libraries, so it remains a valid CFI target

Indirect Attack Manipulation of System Calls

Examples: full-function code reuse, data-oriented attacks, and COOP
The NEWTON CPI attack avoids corrupting any code or data pointers. It corrupts the index variable of an array of function pointers to make the array index point to a system call location
Call-Type context blocks the invocation of a system call never used in the program code base

DISCUSSION AND LIMITATIONS

Bastion under Arbitrary Memory Corruption

To bypass all three of Bastion’s contexts, the attacker would realistically need to perform arbitrary reads/writes many times to match the expected context values without violating static constraints

The main challenge is that this type of system call is called much more frequently
Full Bastion context checking incurs high overhead, e.g., 96.7% for NGINX
- A majority of overhead results from fetching protected process state using ptrace (< 95.7%, delta between Rows 1 and 2)
- additional context switching overhead to access the protected program
- one way to eliminate the ptrace overhead would be to run the Bastion monitor inside the kernel