# Pointer Authentication and Memory Tagging
All about ARM security extensions
# Pointer Authentication
* A pointer authentication code (PAC) is stored in otherwise unused upper bits of a pointer
* Each PAC is derived from the original pointer, another 64-bit value (e.g. the stack pointer), and a secret 128-bit key
[PAC it up: Towards Pointer Integrity using ARM Pointer Authentication](https://www.usenix.org/system/files/sec19fall_liljestrand_prepub.pdf)
[slide1](https://events.static.linuxfound.org/sites/events/files/slides/slides_23.pdf)
## Background
* Code reuse attacks: return-oriented programming (ROP) and jump-oriented programming (JOP)
* Memory protection largely prevents code injection
* Various mitigations exist today
    * e.g. ASLR, execute-only memory, CFI, canaries, pointer mangling, shadow stacks
    * difficult to integrate
    * non-trivial performance / code size impact
    * inhibit debugging
## ROP protection example
Backwards-compatible version:
![](https://i.imgur.com/ima8rST.png)
Normal version:
![](https://i.imgur.com/LzjCJ9S.png)
## Theory
* instructions to sign and authenticate pointers
* architecture provides mechanism, not policy
* Pointer Authentication Code (PAC)
    * authentication metadata stored within the pointer
    * no additional space required
    * derived from:
        - A pointer value
        - A 64-bit context value
        - A 128-bit secret key
![](https://i.imgur.com/9j76EHo.png)
* Key:
    * 128-bit values inhibit prediction / forging of PACs
    * APIAKey can be used, but not read or written, at EL0 (userspace)
    * The kernel maintains an APIAKey value for **each process** (shared by all threads within), which is initialised to a **random value at exec()** time.
    * limited risk of disclosure / modification
    * five separate keys:
        * two for executable (instruction) pointers
        * two for data pointers
        * one "general" key
* Pointer memory layout (the PAC size depends on the VA size and on whether tagging is enabled):
    * 7 PAC bits with a 48-bit VA and tagging enabled
      ![](https://i.imgur.com/SmV9nQy.png)
    * 15 PAC bits with a 48-bit VA and tagging disabled
      ![](https://i.imgur.com/3bKZTJX.png)
* Operations:
    - sign
        - `PAC*` instructions sign pointers with PACs
        - Result is not a usable pointer
          ![](https://i.imgur.com/EWOJRja.png)
    - authenticate
        - `AUT*` instructions authenticate PACs
        - If the PAC matches, the result is the original pointer
        - If the PAC doesn't match, the result is an invalid pointer → faults upon use
          ![](https://i.imgur.com/VeNQzp7.png)
    - strip
        - `XPAC*` instructions strip PACs
        - Result is the original pointer
        - No authentication is performed
          ![](https://i.imgur.com/E3uEBpA.png)
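
The layout and derivation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pac_field_bits`, `compute_pac` and `embed_pac` are hypothetical helper names, bit 55 is assumed to be reserved for selecting the address range, and HMAC-SHA256 is used as a stand-in keyed function. Real hardware uses a dedicated block cipher (QARMA is the recommended one, implementations may choose another), so the PAC values produced here do not match any real CPU.

```python
import hashlib
import hmac
from typing import List


def pac_field_bits(va_bits: int, tbi: bool) -> List[int]:
    """Bit positions that can hold PAC bits in a 64-bit pointer.

    Assumption: bit 55 selects the address range and is never a PAC bit;
    with tagging (top-byte-ignore) enabled, bits 63:56 hold the tag instead.
    """
    positions = list(range(va_bits, 55))
    if not tbi:
        positions += list(range(56, 64))
    return positions


def compute_pac(pointer: int, context: int, key: bytes, n_bits: int) -> int:
    """Stand-in PAC: keyed MAC over (pointer, context), truncated to n_bits."""
    message = pointer.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << n_bits) - 1)


def embed_pac(pointer: int, pac: int, positions: List[int]) -> int:
    """Scatter the PAC bits into the given (unused) pointer bit positions."""
    for i, bit in enumerate(positions):
        if (pac >> i) & 1:
            pointer |= 1 << bit
    return pointer


if __name__ == "__main__":
    key = bytes(range(16))                      # 128-bit key (demo value only)
    positions = pac_field_bits(48, tbi=False)   # 15 positions
    pac = compute_pac(0x00007FFFDEADBEE0, 0x1000, key, len(positions))
    print(hex(embed_pac(0x00007FFFDEADBEE0, pac, positions)))
```

With a 48-bit VA, `pac_field_bits` yields 7 positions with tagging and 15 without, matching the two layouts listed above. Because the context is part of the MAC input, the same pointer signed with a different context (for example a different stack pointer) will usually carry a different PAC.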
## Mechanism
* Insert a PAC into a pointer
* Strip a PAC from a pointer
* Authenticate and strip a PAC from a pointer
    * If authentication succeeds, the code is removed, yielding the original pointer
    * If authentication fails, bits are set in the pointer such that it is guaranteed to cause a fault if used
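
A minimal software model of these three operations, assuming a 48-bit VA with tagging enabled (PAC in bits 54:48) and a userspace pointer whose bit 55 is 0. The names `sign`, `authenticate`, `strip` and `ERROR_BIT` are hypothetical, HMAC-SHA256 stands in for the hardware cipher, and the single error bit is a simplification of the error code that real hardware writes on failure.

```python
import hashlib
import hmac

PAC_SHIFT = 48                        # PAC occupies bits 54:48 (assumed layout)
PAC_BITS = 7
PAC_MASK = ((1 << PAC_BITS) - 1) << PAC_SHIFT
ERROR_BIT = 1 << 53                   # simplified "poison" bit set on failure


def _mac(address: int, context: int, key: bytes) -> int:
    """Stand-in keyed MAC truncated to PAC_BITS (not the hardware algorithm)."""
    message = address.to_bytes(8, "little") + context.to_bytes(8, "little")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return digest[0] & ((1 << PAC_BITS) - 1)


def strip(pointer: int) -> int:
    """Remove the PAC without checking it (userspace pointer: field becomes 0)."""
    return pointer & ~PAC_MASK


def sign(pointer: int, context: int, key: bytes) -> int:
    """Insert a PAC computed over the address and context."""
    address = strip(pointer)
    return address | (_mac(address, context, key) << PAC_SHIFT)


def authenticate(pointer: int, context: int, key: bytes) -> int:
    """Return the original pointer on success, a non-canonical one on failure."""
    address = strip(pointer)
    actual = (pointer & PAC_MASK) >> PAC_SHIFT
    if actual == _mac(address, context, key):
        return address
    return address | ERROR_BIT        # any later use of this pointer would fault


def is_canonical(pointer: int) -> bool:
    """Userspace pointer with a 48-bit VA: bits 55:48 must all be zero."""
    return (pointer >> 48) & 0xFF == 0


if __name__ == "__main__":
    key = bytes(16)
    signed = sign(0x00007FFFDEADBEE0, 0x1234, key)
    print(hex(authenticate(signed, 0x1234, key)))             # original pointer
    print(is_canonical(authenticate(signed ^ (1 << 48), 0x1234, key)))
```

Flipping any PAC bit makes authentication fail deterministically in this model. A wrong context or a modified address is caught only probabilistically, since a 7-bit PAC leaves a 1 in 128 chance of an accidental match.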
## Usage
* Optional ARMv8.3-A extension
* Detects illicit modification of pointers (and data structures)
* Backwards-compatible subset
    * binaries using some features can run on any ARMv8-A CPU (without protection)
    * distributions only need one set of binaries
# Memory Tagging
* "Hardware-ASAN on steroids"
* RAM overhead: 3%-5%
* CPU overhead: (hoping for) low-single-digit %
* AArch64 only, because it needs Top Byte Ignore (TBI)
    * With 48-bit virtual addresses, the most significant 16 bits of an address held in a 64-bit general-purpose register (GPR64) must normally be all ones (0xFFFF) or all zeros (0x0000)
    * When tagged addressing support is enabled, the top eight bits [63:56] of the virtual address are ignored by the processor
* Two types of tags
    * Every aligned 16 bytes of memory have a 4-bit tag stored separately
    * Every pointer has a 4-bit tag stored in the top byte
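
The two kinds of tags can be illustrated with plain bit arithmetic. This sketch assumes the pointer tag sits in bits 59:56 of the top byte, as in Arm MTE; the helper names (`set_tag`, `get_tag`, `untagged`, `granule_index`) are hypothetical.

```python
GRANULE = 16                    # bytes of memory covered by one memory tag
TAG_SHIFT = 56                  # assumed position of the 4-bit pointer tag
TAG_MASK = 0xF << TAG_SHIFT
TOP_BYTE_MASK = 0xFF << 56      # bits ignored for translation under TBI


def set_tag(pointer: int, tag: int) -> int:
    """Return the pointer with its 4-bit tag replaced."""
    return (pointer & ~TAG_MASK) | ((tag & 0xF) << TAG_SHIFT)


def get_tag(pointer: int) -> int:
    """Extract the 4-bit tag from the top byte."""
    return (pointer >> TAG_SHIFT) & 0xF


def untagged(pointer: int) -> int:
    """Address as seen by translation when the top byte is ignored."""
    return pointer & ~TOP_BYTE_MASK


def granule_index(pointer: int) -> int:
    """Index of the 16-byte granule whose memory tag covers this address."""
    return untagged(pointer) // GRANULE


if __name__ == "__main__":
    p = set_tag(0x00007FFF00001230, 0xA)
    print(hex(p), get_tag(p), hex(untagged(p)), granule_index(p))
```

A load or store is allowed only when `get_tag(pointer)` equals the memory tag stored for `granule_index(pointer)`.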
## Mechanism
* Allocation:
    - Align allocations to 16 bytes
    - Choose a 4-bit tag (random is ok)
    - Tag the pointer
    - Tag the memory (optionally initialize it at no extra cost)
* Deallocation:
    - Re-tag the memory with a different tag
* To use tagging with heap allocations, only the allocator needs to use the new instructions; the rest of the code performs standard LDR/STR
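
The allocation and deallocation steps can be simulated in software. The sketch below is a toy model, not an MTE implementation: `TaggedHeap` and `TagCheckFault` are hypothetical names, the heap is a bump allocator over a `bytearray`, and `load` plays the role of a tag-checked LDR. As a demo-only choice, a new tag is picked to differ from the tag of the preceding granule so that the adjacent-overflow case is always caught; with purely random tags detection is probabilistic.

```python
import random

GRANULE = 16
TAG_SHIFT = 56
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Raised when a pointer tag does not match the memory tag."""


class TaggedHeap:
    def __init__(self, size: int, seed: int = 0):
        self.mem = bytearray(size)
        self.tags = [0] * (size // GRANULE)   # one 4-bit tag per granule
        self.sizes = {}                       # address -> rounded size
        self.brk = 0
        self.rng = random.Random(seed)

    def _new_tag(self, exclude):
        return self.rng.choice([t for t in range(1, 16) if t not in exclude])

    def _set_tags(self, addr: int, size: int, tag: int) -> None:
        # O(size): every granule of the object has to be tagged
        for g in range(addr // GRANULE, (addr + size) // GRANULE):
            self.tags[g] = tag

    def malloc(self, size: int) -> int:
        size = -(-max(size, 1) // GRANULE) * GRANULE   # align to 16 bytes
        addr = self.brk
        if addr + size > len(self.mem):
            raise MemoryError("toy heap exhausted")
        self.brk += size
        prev = self.tags[addr // GRANULE - 1] if addr else 0
        tag = self._new_tag({prev})
        self._set_tags(addr, size, tag)       # tag the memory
        self.sizes[addr] = size
        return (tag << TAG_SHIFT) | addr      # tag the pointer

    def free(self, pointer: int) -> None:
        addr = pointer & ADDR_MASK
        size = self.sizes.pop(addr)
        old_tag = pointer >> TAG_SHIFT
        self._set_tags(addr, size, self._new_tag({old_tag}))  # re-tag

    def load(self, pointer: int) -> int:
        addr = pointer & ADDR_MASK
        if (pointer >> TAG_SHIFT) != self.tags[addr // GRANULE]:
            raise TagCheckFault(hex(pointer))
        return self.mem[addr]
```

In this model `heap.load(a + 32)` on a 20-byte object `a` (rounded up to 32 bytes) raises `TagCheckFault`, as does any load through a pointer after `heap.free`. Accesses to the padding inside the last granule (bytes 20 to 31 here) are not detected, because tags have 16-byte granularity.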
## Detect
* Heap-buffer-overflow
![](https://i.imgur.com/oB4sswc.png)
* Heap-use-after-free
![](https://i.imgur.com/IoJs96V.png)
## Kernel ABI and HWCAPs
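1. On AArch64 the TCR_EL1.TBI0 bit (in the Translation Control Register) is set by default, so userspace can access memory through pointers with a non-zero top byte
1. Syscalls need to accept any valid tagged pointer and treat it the same as the corresponding untagged pointer
1. Behavior is undefined for an invalid tagged pointer
1. See Documentation/arm64/tagged-address-abi.rst in the kernel source tree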
## MTE Enabled Kernel Interface
1. Built on top of the newly introduced AArch64 Tagged Address ABI
1. The kernel exposes a new mmap()/mprotect() flag: PROT_MTE
1. The kernel supports both tag check fault types: precise (synchronous) and imprecise (asynchronous)
![](https://i.imgur.com/lfOHUix.png)
1. How does it work?
    - The memory allocator ultimately invokes mmap()
    - With the special flag **PROT_MTE**, the mapped memory has tagging enabled
    - The allocator tags the memory and returns a tagged pointer to the application
![](https://i.imgur.com/tFbugSe.png)
## Compiler perspective (challenges)
* [slide1](https://llvm.org/devmtg/2018-10/slides/Serebryany-Stepanov-Tsyrklevich-Memory-Tagging-Slides-LLVM-2018.pdf)
* The C family of languages has poor memory safety
    * Use-after-free / buffer-overflow / uninitialized memory
    * These bugs make up a large share of High and Critical severity bugs
* AddressSanitizer (ASAN) is not enough
    * Hard to use in production
    * Not a security mitigation
* Tagging heap objects
    * CPU: malloc/free become O(size) operations
* Tagging stack objects
    * CPU: function prologue becomes O(frame size)
    * Stack size: local variables aligned by 16
    * Code size: extra instructions per function entry/exit
    * Register pressure: local variables have unique tags, not as simple as [SP, #offset]
* Malloc zero-fill
    * ![](https://i.imgur.com/ryTwghr.png)
    * ![](https://i.imgur.com/wV4GBHG.png)
    * Reduces the overhead
## Existing implementations
* SPARC ADI
    * HW ASAN
    * 4-bit tags per 64 bytes of memory
    * high RAM overhead due to 64-byte alignment
* LLVM HWASAN
    * Software implementation similar to ASAN (LLVM ToT)
    * 8-bit tags per 16 bytes of memory
    * AArch64-only (uses top-byte-ignore)
    * Overhead: 6% RAM, 2x CPU, 2x code size
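
The RAM figures quoted in these notes can be sanity-checked with simple arithmetic. The sketch below separates the two sources of overhead: storage for the tags themselves, and padding caused by rounding each allocation up to the granule size. The function names are hypothetical, and the 20-byte object is just an example.

```python
def tag_storage_overhead(tag_bits: int, granule_bytes: int) -> float:
    """Fraction of extra RAM needed to store one tag per granule."""
    return tag_bits / (granule_bytes * 8)


def padded_size(size: int, granule_bytes: int) -> int:
    """Size of an allocation after rounding up to the granule size."""
    return -(-size // granule_bytes) * granule_bytes


if __name__ == "__main__":
    schemes = {
        "4-bit tag per 16 bytes": (4, 16),
        "SPARC ADI (4-bit tag per 64 bytes)": (4, 64),
        "HWASAN (8-bit tag per 16 bytes)": (8, 16),
    }
    for name, (bits, granule) in schemes.items():
        storage = tag_storage_overhead(bits, granule)
        print(f"{name}: tag storage {storage:.2%}, "
              f"20-byte object occupies {padded_size(20, granule)} bytes")
```

A 4-bit tag per 16 bytes costs 3.125% for tag storage, which is consistent with the 3%-5% RAM overhead listed under Memory Tagging, and 8 bits per 16 bytes costs 6.25%, close to the 6% quoted for HWASAN. With 64-byte granules the tag storage itself is under 1%, so the RAM cost comes mainly from padding: a 20-byte object occupies 64 bytes instead of 32.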