Protect the System Call, Protect (most of) the World with BASTION

tags: security system call

Abstract

  • enforces at runtime
  • 3 contexts
    • (Call type): which system call is invoked and how it is called
    • (Control Flow)
    • (Argument Integrity)
  • Bastion
    • a compiler
    • a runtime monitor system
  • Case study
    • NGINX, SQLite, and vsFTPd
    • result: low overhead of 0.60%, 2.01%, and 1.65%, respectively

Introduction

  • Traditional methods:
    • debloating, system call filtering, and system call sandboxing
    • idea: disabling unused system calls
    • problem: system calls that the application still needs remain invocable, so an attacker can still reach them (e.g., via code reuse)
  • Contribution:
    • Novel system call contexts for system call integrity
      • 3 system call contexts, namely Call Type, Control Flow, and Argument Integrity contexts
    • Bastion defense enforcing system call integrity
      • compiler pass
        • analyzes all system call usage, performs instrumentation, and generates metadata
      • runtime monitor
        • static and dynamic aspects of each system call context
    • Security & performance evaluation
      • NGINX, SQLite, and vsFTPd

Background

System call usage in attacks

  • 400+ system calls in recent Linux kernels
  • Only a few system calls are useful to attackers; these are called sensitive system calls

Current system call protection mechanisms

  • Attack surface reduction
    • Debloating techniques
    • Remove unused code
    • static program analysis or dynamic coverage analysis
    • Problem: many sensitive system calls are used for library loading (mmap, mprotect)
  • System call filtering
    • Seccomp: a system call filtering framework
    • User needs to define an allowlist/denylist for a given application
    • Can restrict a system call argument
    • Problem: cannot remove sensitive-but-necessary system calls
    • Problem: restricting an argument to a constant value applies across the entire application. E.g., if there are two callsites of mprotect, one read-only and the other read-executable, the application-wide policy must allow read-executable
  • Control-flow integrity (CFI)
    • LLVM support (backward)
    • performs analysis to generate an allowed set of targets per callsite, called an equivalence class (EC)
    • Problem: large ECs -> inconvenient; small ECs -> dangerous
    • Problem: runtime overhead, e.g., Intel PT
  • Data-flow integrity (DFI)
    • instrument every load and store instruction
    • Problem: overhead
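
The application-wide scope problem noted for system call filtering can be illustrated with a small model. This is only a sketch under stated assumptions: the `P_*` bits, the three function names, and the "union of callsite needs" policy are hypothetical simplifications and not seccomp's actual rule format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for memory protection bits (hypothetical values). */
enum { P_READ = 1, P_WRITE = 2, P_EXEC = 4 };

/* Application-wide policy: the union of what every callsite legitimately needs. */
int app_wide_allowed(const int *callsite_prot, size_t n) {
    int allowed = 0;
    for (size_t i = 0; i < n; i++)
        allowed |= callsite_prot[i];
    return allowed;
}

/* A filter that sees only the argument value, not which callsite issued it. */
bool filter_allows(int allowed, int requested) {
    return (requested & ~allowed) == 0;
}

/* A per-callsite policy: each callsite may only pass its own expected value. */
bool callsite_allows(const int *callsite_prot, size_t n, size_t site, int requested) {
    return site < n && callsite_prot[site] == requested;
}
```

With callsites `{ P_READ, P_READ | P_EXEC }`, the application-wide filter accepts `P_READ | P_EXEC` even when it arrives from the read-only callsite, while the per-callsite check rejects it.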

Contexts for system call integrity

Q: What's a legitimate use of a system call?
A: two variants: (1) control-flow integrity (2) data integrity

In this paper, three contexts are defined accordingly

  1. Call-type context
  2. Control-flow context
  3. Argument integrity context

Call-type context

Only permitted system calls may be called, and only in the permitted manner.

  • direct call or indirect call
    • direct call: int ret = chmod("AAA", S_IWOTH)
    • indirect call: function pointer
  • a system call is one of the categories:
    • not-callable
    • directly-callable
    • indirectly-callable
  • It is rare for system calls to be called from an indirect call site
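
A minimal sketch of the Call-Type check, assuming each system call carries two permission flags (both false means not-callable). The type and function names are hypothetical and do not come from the Bastion code base.

```c
#include <assert.h>
#include <stdbool.h>

/* How a callsite reached the system call at runtime. */
typedef enum { VIA_DIRECT_CALL, VIA_INDIRECT_CALL } invocation_t;

/* Call type recorded at compile time (hypothetical layout).
 * not-callable:        direct = false, indirect = false
 * directly-callable:   direct = true
 * indirectly-callable: indirect = true */
typedef struct {
    bool direct;
    bool indirect;
} call_type_t;

/* Returns true only if the observed invocation matches the recorded call type. */
bool call_type_permits(call_type_t type, invocation_t how) {
    return how == VIA_DIRECT_CALL ? type.direct : type.indirect;
}
```

For the `chmod` example above, the recorded type would be directly-callable, so the same system call reached through a function pointer fails the check.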

Control-flow context

  • Keep the valid paths to all sensitive system calls, and enforce this context at runtime

Argument integrity context

  • A system call argument is either (1) a direct argument or (2) an extended argument
  • direct argument: e.g., constant, local variable
  • extended argument: e.g., pointer
  • if an argument involves a struct, the relevant fields must be tracked too

Real world code examples

  • Legitimate use of the execve system call in NGINX
// nginx/src/os/unix/ngx_process.c
static void ngx_execute_proc(ngx_cycle_t *cycle, void *data){
    ngx_exec_ctx_t *ctx = data;
    // Legitimate NGINX usage of execve system call
    if (execve(ctx->path, ctx->argv, ctx->envp) == -1) {
        ...
    }
    exit(1);
}
// nginx/src/core/ngx_output_chain.c
ngx_int_t ngx_output_chain(ngx_output_chain_ctx_t *ctx, ngx_chain_t *in){
    ...
    if (in->next == NULL && ngx_output_chain_as_is(ctx, in->buf) ) {
        return ctx->output_filter(ctx->filter_ctx, in);
    }
    ...
}
  • ctx->output_filter = ngx_execute_proc
  • ctx->path = "/bin/sh"

BASTION:

  • Call-type context: static analysis shows that "execve" must be a direct call -> but at runtime it is reached through an indirect call
  • Control-Flow context: detects the attack
  • Argument integrity context: detects the attack
  • Snippet of NGINX code that can be compromised to reach and call the mprotect system call elsewhere by corrupting index in vulnerable code pointer
    v[index].get_handler()
// nginx/src/http/ngx_http_variables.c
ngx_http_variable_value_t *ngx_http_get_indexed_variable( ngx_http_request_t *r, ngx_uint_t index){
    ...
    if (v[index].get_handler(r, &r->variables[index], v[index].data) == NGX_OK) {
        ngx_http_variable_depth++;
        if (v[index].flags & NGX_HTTP_VAR_NOCACHEABLE) {
            r->variables[index].no_cacheable = 1;
        }
        return &r->variables[index];
    }
    ...
    return NULL;
}
  • buffer overflow: modify index, such that v[index].get_handler = mprotect
  • r = memory region to exploit
  • change permission

BASTION:

  • mprotect is never called indirectly
  • the control-flow path is invalid
  • the argument is wrong

Threat model and assumptions

  • arbitrary memory read/write
  • Data Execution Prevention(DEP)
    • attackers cannot inject or modify code due to DEP
  • Address Space Layout Randomization (ASLR)
  • Shadow Stack (CET)
  • Hardware and OS kernel are trusted
    • Attackers going for OS kernel and hardware (Spectre) are out of scope
  • BASTION protects a subset of available system calls

BASTION design and implementation

  • Choose 20 sensitive system calls (Table 1)
  • BASTION = BASTION-compiler pass + BASTION-monitor

BASTION compiler

  • Generate context metadata
  • Leverage light-weight library API for dynamic tracking of sensitive data (Argument integrity)

Analysis for Call-Type Context

  • System calls in the program fall into three categories

    • not-callable
    • directly-callable
    • indirectly-callable
  • BASTION analyzes the entire program’s LLVM IR instructions and checks all call instructions.

    • directly-callable: a system call is a target of a direct function call
    • indirectly-callable: the address of a system call is taken and used in the left-hand-side of an assignment
    • not-callable: neither of the above
  • Metadata:

    • pairs of system call numbers and their call type
    • list of legitimate indirect callsites (i.e., offset in a program binary)
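
The classification step can be sketched as a pass over a list of observed uses. This is an illustration only: the real analysis walks LLVM IR, whereas here the uses are handed in as a plain array, and `use_t`, `call_type_t`, `classify_uses`, and `MAX_SYSCALLS` are hypothetical names. The sketch also assumes a system call may be seen both ways, so it keeps two flags instead of a single category.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SYSCALLS 512 /* assumption: large enough for this sketch */

typedef enum { USE_DIRECT_CALL, USE_ADDRESS_TAKEN } use_kind_t;

/* One observed use of a system call in the program (hypothetical record). */
typedef struct {
    int syscall_nr;
    use_kind_t kind;
} use_t;

/* Both flags false means the system call is not-callable. */
typedef struct {
    bool direct;
    bool indirect;
} call_type_t;

/* Fill the per-system-call table from the observed uses. */
void classify_uses(const use_t *uses, size_t n, call_type_t table[MAX_SYSCALLS]) {
    for (size_t i = 0; i < MAX_SYSCALLS; i++)
        table[i] = (call_type_t){ false, false };
    for (size_t i = 0; i < n; i++) {
        int nr = uses[i].syscall_nr;
        if (nr < 0 || nr >= MAX_SYSCALLS)
            continue; /* ignore out-of-range numbers */
        if (uses[i].kind == USE_DIRECT_CALL)
            table[nr].direct = true;
        else
            table[nr].indirect = true;
    }
}
```

Any system call that never appears in the use list stays not-callable, which matches the third category above.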

Analysis for Control-Flow Context

  • performed only when a system call is invoked -> reduces overhead (CFI, in contrast, enforces a check on every indirect control-flow transfer)

compile time:

  1. generates the CFG
  2. identifies all function callee→caller relationships that reach system call callsites
  3. for each callsite, recursively records all callee→caller associations
  4. stops once reaching either main() or an indirect function call
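
The compile-time steps above can be sketched as a recursive walk over a call graph. Everything here is a simplified assumption: functions are identified by name, the graph is a flat edge array, and `edge_t`, `valid_set_t`, and `collect_callers` are hypothetical. An indirect edge is recorded as an endpoint and the walk does not continue past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 64

/* One call-graph edge: caller invokes callee, directly or indirectly. */
typedef struct {
    const char *caller;
    const char *callee;
    bool indirect;
} edge_t;

/* The recorded callee→caller associations. */
typedef struct {
    const edge_t *pairs[MAX_PAIRS];
    size_t count;
} valid_set_t;

static bool already_recorded(const valid_set_t *set, const edge_t *e) {
    for (size_t i = 0; i < set->count; i++)
        if (set->pairs[i] == e)
            return true;
    return false;
}

/* Record every callee→caller pair reachable upward from `callee`. */
void collect_callers(const edge_t *graph, size_t n, const char *callee, valid_set_t *out) {
    if (strcmp(callee, "main") == 0)
        return; /* step 4: reached main() */
    for (size_t i = 0; i < n; i++) {
        const edge_t *e = &graph[i];
        if (strcmp(e->callee, callee) != 0 || already_recorded(out, e))
            continue; /* unrelated edge, or already seen (guards against cycles) */
        if (out->count == MAX_PAIRS)
            return;
        out->pairs[out->count++] = e; /* step 3: record the association */
        if (!e->indirect)
            collect_callers(graph, n, e->caller, out); /* step 4: stop at indirect calls */
    }
}
```

Starting the walk at the function that contains the system call callsite yields only the edges that can legitimately lead to it; edges above an indirect call are left out.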

runtime:

  1. unwinds the stack frames
  2. verifies callee→caller relations until the bottom of the stack (i.e., main) or an indirect callsite
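
The runtime side can be sketched as a walk over an unwound stack trace. As a simplifying assumption, frames are identified by function name here, whereas the monitor works with callsite offsets; `pair_t`, `find_pair`, and `verify_stack` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A callee→caller association recorded at compile time. */
typedef struct {
    const char *callee;
    const char *caller;
    bool indirect;
} pair_t;

static const pair_t *find_pair(const pair_t *valid, size_t n,
                               const char *callee, const char *caller) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(valid[i].callee, callee) == 0 &&
            strcmp(valid[i].caller, caller) == 0)
            return &valid[i];
    return NULL;
}

/* frames[0] holds the system call callsite; frames[depth - 1] is the outermost frame. */
bool verify_stack(const char *const *frames, size_t depth,
                  const pair_t *valid, size_t n) {
    if (depth == 0)
        return false;
    for (size_t i = 0; i + 1 < depth; i++) {
        const pair_t *p = find_pair(valid, n, frames[i], frames[i + 1]);
        if (p == NULL)
            return false; /* this callee→caller relation was never recorded */
        if (p->indirect)
            return true;  /* stop at a recorded indirect callsite */
    }
    return strcmp(frames[depth - 1], "main") == 0; /* bottom of the stack */
}
```

A trace whose frames all match recorded pairs down to `main` passes; a trace that reaches the system call through an unrecorded caller is rejected at the first mismatching pair.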

Analysis for Argument Integrity Context

  • checks not only system call arguments but also the arguments’ data-dependent variables -> sensitive variables
  • maintains a shadow copy of the sensitive variable’s legitimate value in a shadow memory region
  • updates the shadow copy whenever the sensitive variable is updated legitimately
  • binds each argument to a certain position for the system call so the Bastion runtime monitor can check argument integrity

  1. enumerates all variables used in system call arguments
  2. performs a backward data-flow analysis, traversing the use-def chains to derive any other variables used to define sensitive variables
    • newly identified data-dependent variables are added to the set of sensitive variables
  3. if there is a write to a field of a struct (e.g., size field of gshm in Figure 2), that write is added to the sensitive variables
    repeat 2 and 3 until no new sensitive variables are found
  • Once all sensitive variables are identified, Bastion instruments ctx_write_mem after any memory-backed sensitive variable store to keep its shadow copy up-to-date
  • Before each sensitive system call callsite, Bastion instruments ctx_bind_mem_X or ctx_bind_const_X to bind each argument to its respective argument position X
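
The shadow-copy idea can be sketched in a few functions. The `sketch_*` helpers below are loosely modelled on the roles of ctx_write_mem, ctx_bind_mem_X, and ctx_bind_const_X; their signatures, the fixed-size tables, and the use of `long` values are assumptions made for illustration, not Bastion's actual runtime library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SHADOW 32
#define MAX_ARGS 6

typedef struct { const long *addr; long value; } shadow_entry_t;
typedef struct { bool bound; bool is_const; const long *addr; long value; } binding_t;

static shadow_entry_t shadow[MAX_SHADOW]; /* stand-in for the shadow memory region */
static size_t shadow_count;
static binding_t bindings[MAX_ARGS];      /* argument position -> expected source */

static const shadow_entry_t *find_shadow(const long *addr) {
    for (size_t i = 0; i < shadow_count; i++)
        if (shadow[i].addr == addr)
            return &shadow[i];
    return NULL;
}

/* Instrumented after each legitimate store: refresh the shadow copy. */
void sketch_write_mem(const long *addr) {
    for (size_t i = 0; i < shadow_count; i++)
        if (shadow[i].addr == addr) { shadow[i].value = *addr; return; }
    if (shadow_count < MAX_SHADOW)
        shadow[shadow_count++] = (shadow_entry_t){ addr, *addr };
}

/* Instrumented before the callsite: bind a variable or constant to position pos. */
void sketch_bind_mem(int pos, const long *addr) {
    if (pos >= 0 && pos < MAX_ARGS)
        bindings[pos] = (binding_t){ true, false, addr, 0 };
}

void sketch_bind_const(int pos, long value) {
    if (pos >= 0 && pos < MAX_ARGS)
        bindings[pos] = (binding_t){ true, true, NULL, value };
}

/* Monitor side: compare the actual arguments with the bound expectations. */
bool sketch_verify_args(const long *args, int nargs) {
    for (int pos = 0; pos < nargs && pos < MAX_ARGS; pos++) {
        const binding_t *b = &bindings[pos];
        long expected;
        if (!b->bound)
            continue; /* no expectation recorded for this position */
        if (b->is_const) {
            expected = b->value;
        } else {
            const shadow_entry_t *e = find_shadow(b->addr);
            if (e == NULL || *b->addr != e->value)
                return false; /* variable changed without a legitimate store */
            expected = e->value;
        }
        if (args[pos] != expected)
            return false;
    }
    return true;
}
```

A store that bypasses `sketch_write_mem` (simulating memory corruption) leaves the shadow copy stale, so the mismatch shows up when the arguments are verified.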

Bastion runtime monitor

Initializing the Bastion Monitor

  • Loading metadata:
    • The monitor retrieves ELF, DWARF, and linked library file information to recover symbol addresses
    • loads Bastion context metadata into the monitor’s memory
  • Launching a Bastion-protected application:
    • performs fork to spawn a child process where the child runs the Bastion-enabled application
    • initializes a shadow memory region under a segmentation register
    • initializes seccomp: trap on sensitive system calls in the child process
    • ptrace: access the application’s state
  • Trapping a system call invocation:
    • custom seccomp-BPF filter to trap on the application’s sensitive system call
    • SECCOMP_RET_ALLOW: non-sensitive system calls, ignored by the monitor
    • SECCOMP_RET_KILL: disables any not-callable system calls
    • SECCOMP_RET_TRACE: directly- and indirectly-callable system calls, so these system calls can be verified by the Bastion monitor
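
The mapping from call type to filter action can be sketched as a small decision function. The local `ACT_*` enum stands in for the SECCOMP_RET_ALLOW, SECCOMP_RET_KILL, and SECCOMP_RET_TRACE return values so the example compiles without Linux headers; `syscall_info_t` and `action_for` are hypothetical names, and building the actual BPF program is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the three seccomp return actions named above. */
typedef enum { ACT_ALLOW, ACT_KILL, ACT_TRACE } action_t;

/* What the compile-time metadata says about one system call (hypothetical layout). */
typedef struct {
    bool sensitive; /* is it in the protected set? */
    bool direct;    /* directly-callable */
    bool indirect;  /* indirectly-callable */
} syscall_info_t;

/* Choose the filter action for one system call. */
action_t action_for(syscall_info_t info) {
    if (!info.sensitive)
        return ACT_ALLOW; /* not monitored */
    if (!info.direct && !info.indirect)
        return ACT_KILL;  /* not-callable: never legitimate */
    return ACT_TRACE;     /* hand over to the monitor for context checks */
}
```

Only the third case reaches the monitor, which keeps the common path (non-sensitive system calls) free of tracing cost.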

Enforcing Call-Type Context

  • Take $PC, look up the metadata, check the call type

Enforcing Control-Flow Context

  • stack trace: unwinds and gets each function callsite offset
  • CFG metadata: a list of callees and their respective valid callers
  • until the entire stack has been vetted or an indirect call is encountered

Enforcing Argument Integrity Context

  • verifies integrity of all sensitive variables in the current call stack
  • Take $PC, check the associated argument integrity context metadata

Implementation

  • Linux x86-64 v5.19.14
  • LLVM Module
  • hardware-based shadow stack -fcf-protection=full
  • Intel Tiger Lake and AMD Ryzen 7 processors
  • Glibc v2.28+
  • Binutils v2.29+

Efforts:

  • LLVM module: 3,939 lines of code
  • Bastion’s C runtime library: 659 lines of code
  • Bastion runtime monitor (a C program): 7,313 lines of code

EVALUATION

Evaluation Methodology

  • 8-core (16-hardware thread) machine featuring an AMD Ryzen 7 PRO 5850U processor and 16 GB DDR4 memory
  • Bastion LLVM compiler
  • Results are reported as the average over five runs
  • NGINX, SQLite, and vsftpd

Performance Evaluation

  • NGINX:

    • wrk, HTTP benchmarking tool
    • sends concurrent HTTP requests
    • measure throughput
    • NGINX maximum of 1,024 connections per processor
    • 32 worker threads
    • never incurred more than 0.60% degradation compared to the unprotected NGINX baseline
    • Argument Integrity context adds the most overhead
    • uses many sensitive system calls (e.g., mprotect, mmap) during its initialization phase but seldom while idle or processing requests -> Bastion is rarely triggered at runtime
    • average call-depth is only 5.2 frames, with 4 and 9 being minimum and maximum stack call-depths
  • SQLite:

    • DBT2, database transaction processing benchmark
    • mix of read and write SQL operations for large data warehouse transactions
    • 10 second new thread delay and a 10 minute workload duration
    • number of new-order transactions per-minute (NOTPM) for performance
    • Overhead:
      • Call-Type: 0.92%
      • Control-Flow: 1.48%
      • Argument Integrity: 2.01%
  • VSFTPD:

    • dkftpbench, FTP benchmark program
    • fetch a 100 MB file from vsftpd launching clients one after another for a 120 second duration
    • Overhead: worst 1.65%

  • Argument Integrity context is most costly
  • LLVM CFI is expensive in general because it is triggered at every indirect callsite; NGINX does not have many indirect calls

  • mprotect, mmap

  • Sensitive system calls are never called indirectly via a function pointer

  • Compare with CET, LLVM CFI

    • CET: maintain a secondary (shadow) stack. Upon returning from a function, the CPU compares return addresses in the shadow stack and the normal stack
    • LLVM CFI: verification at every indirect callsite
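
The shadow-stack comparison can be modelled in software as follows. This is only a toy model of what CET does in hardware: `stacks_t`, `on_call`, and `on_return` are hypothetical names, and the fixed depth is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64

/* The normal stack is attacker-writable; the shadow stack is assumed protected. */
typedef struct {
    unsigned long normal[MAX_DEPTH];
    unsigned long shadow[MAX_DEPTH];
    size_t top;
} stacks_t;

/* On a call, the return address is pushed to both stacks. */
bool on_call(stacks_t *s, unsigned long return_addr) {
    if (s->top == MAX_DEPTH)
        return false;
    s->normal[s->top] = return_addr;
    s->shadow[s->top] = return_addr;
    s->top++;
    return true;
}

/* On a return, both copies are popped and must agree. */
bool on_return(stacks_t *s) {
    if (s->top == 0)
        return false;
    s->top--;
    return s->normal[s->top] == s->shadow[s->top];
}
```

Overwriting the return address on the normal stack makes `on_return` report a mismatch, which is the case a ROP chain would trigger.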

SECURITY EVALUATION

ROP Attacks

  • libc library call system = fork + execl
  • exec-type system call, to create access to a root shell
  • mprotect or chmod system calls to change memory or file permissions to be executable

Direct Attacker Manipulation of System Calls

  • Go after system calls directly, setting up callsites and arguments with the desired values
  • The CsCFI attack leverages mprotect to make the entire libc readable, writable, and executable, revealing the code layout to perform arbitrary code execution
  • AOCR’s Attack 1 uses open and write to reveal the code layout of NGINX and execute arbitrary code
  • Control-Flow and Argument Integrity contexts detect both attacks
  • LLVM CFI cannot defend against either attack.
    • In the CsCFI attack, although mprotect is never used by the application, its address is still taken because this system call is necessary to support dynamic loading of shared libraries

Indirect Attack Manipulation of System Calls

  • full-function code re-use, data-oriented attacks, and COOP
  • The NEWTON CPI attack avoids corrupting any code or data pointers. It corrupts the index variable of an array of function pointers to make the array index point to a system call location
  • Call-Type context blocks the invocation of a system call never used in the program code base

DISCUSSION AND LIMITATIONS

Bastion under Arbitrary Memory Corruption

  • To bypass all three of Bastion’s contexts, the attacker realistically would need to perform arbitrary read/write many times to match the expected context values without violating static constraints
  • The main challenge is that this type of system call is called much more frequently
  • Full Bastion context checking incurs high overhead – e.g., 96.7% for NGINX
    • A majority of overhead results from fetching protected process state using ptrace (< 95.7%, delta between Rows 1 and 2)
    • additional context switching overhead to access the protected program
    • one way to eliminate the ptrace overhead would be to run the Bastion monitor inside the kernel