# Compiler: The Program, the Language, & the Computer Work (2)
###### tags: `Compiler` `Tokenizer` `Parser` `Optimization` `Compiler Design` `GCC` `Clang` `LLVM` `JVM` `TurboFan` `V8` `Automata` `GNU` `Linux` `Kernel` `Program Analysis` `Operating System` `Computer Architecture` `ASM` `ABI` `ISA` `Runtime` `Interpreter` `Toolchain` `C` `C++` `Java` `JavaScript` `C#` `Cross Platform` `JIT` `AOT` `Automata` `Computing Machine`
These are the learning notes, research, & discussions on compiler design for GNU/Linux systems. Topics include optimization strategies, program analysis, runtime support, cross-compilers, & other advanced subjects.
:::info
:arrow_right: Previous article: [Compiler: The Program, the Language, & the Computer Work (1) - shibarashinu](https://hackmd.io/@shibarashinu/SyEHz-JHC).
:::
:::warning
**Program vs. Language vs. System**
:arrow_right: Related topics with compiler design, implementations, & applications: [我所不知道的 JavaScript - shibarashinu](https://hackmd.io/@shibarashinu/BkAHFociR).
:::
> All Linux kernel functions discussed here primarily based on *version 6.11*.
## Compiler Support
### Built-In Support
Support provided by the compiler elevates the compiling code, such as:
- Adorns `IR` with fine-grained labels.
- Provides miscinallious control options for programmer to fine-tune their programs (spanning *OS support*, *machine restriction*, *machine acceleration*, *compilation policy*, *optimization options*, *security enhencement*, *code specialization*, *customization rules*, ...).
For example:
- Turn "general" **`memcpy()`** into "specialized" functions:
- **`__builtin_memcpy_inline()`**: Enabling inline expansion or loop unrolling.
- **`__builtin_memcpy_align()`**: Enabling specific hardware features or alignment requirements.
[Re: __builtin_memcpy and alignment assumptions - GNU GCC](https://gcc.gnu.org/legacy-ml/gcc/2016-01/msg00046.html)
- **`__builtin_memcpy_vectorized()`**: Enabling vector instructions for bulk memory operations.
> *Instruction support*, *data bus bandwidth*, *MM in OS* all matter.
- **`__builtin_memcpy_check_bounds()`**: Enabling buffer size checks for secure operations.
- **Lambda function closure issues:** [【python】这个十多年的bug,没点黑魔法还真解决不了 - 码农高天](https://www.youtube.com/watch?v=Mp_f1sBckjU)
- **Intrinsic function:** **`_mm_cmpistrc()`** wraps **CISC SIMD's SSE** (a machine acceleration instruction).
- [x86 intrinsics list - Microsoft Learn](https://learn.microsoft.com/en-us/cpp/intrinsics/x86-intrinsics-list?view=msvc-170)
- **`__builtin_expect()`** provide **`likely()`** & **`unlikely()`** options in branch controls.
- Degenerate **`printf(char[])`** without format specifiers (e.g., *%s*, *%d*, *%.2f* ...) to **`puts(char[])`**.
:::info
:arrow_right: For entensive compiler-specific support, see the following [compiler extensions](#Compiler-Extensions) section.
:::
:::warning
**Sometimes, the developer just want using the unmodified functions without any optimization or transformation.**
Then it should be very careful on tuning compiler options for special development scenarios.
For example, In embedded system development, if the device doesn't implement the `puts()` function, issues may arise when `printf()` is replaced with `puts()` due to default compiler behavior.
To prevent optimization on specific built-in function, do:
```
gcc -fno-builtin-func_name ...
```
More:
- [Can't step into string.h function with GDB - StackOverflow](https://stackoverflow.com/questions/15306090/cant-step-into-string-h-function-with-gdb)
:::
### (Optional) Hardware Acceleration
Compilers not only can do hardware-independent optimization like above mentioned: loop invariants (code motion), BB optimization, control flow analysis, ...
Some compilers also support aggressive optimization for each machine- (system-) dependent environment:
```
gcc --march=native ...
```
Or using above-mentioned intrinsics functions, libraries, drivers, or OS support functionalities to leverage hardware-level acceleration features, like:
- loop unrolling (relying on instruction-level parallelism).
- specialized hardware-dependent instruction: simd, branchless `cmov`, speculative prediction, ...
- cache-friendly, NUMA systems, ... locality-sensitive optimization.
- security features: CFI, secure enclaves, ...
- adds-on systems / hardware features: DMA, GPGPU, ...
:::warning
But for those really general, common, crucial, or performance-oriented operations, the compilers may decide enabling hardware acceleration by default, for example: CPU parallelism / concurency, FPU, MMU, atomic ops (e.g., futex), ...
:::
### Whole-Program Modularization & Link-Time Optimization
The program is generally divided into several procedures to make the job easy in analyzing, debugging, etc.
:::info
**Terminology**
- **Local Optimization:** the optimization in a *Basic Block*.
- **Regional Optimization:** the optimization in an *Extended Basic Block*.
- **Global Optimization:** the optimization of a *whole procedure*.
- **Interprocedural optimization:** the optimization of *whole programs* which includes different procedures.
:::
**Pros:**
- Limiting the amount of code that the compiler considers at any particular time.
- Keeping the compile-time data structures small and limiting the cost of various compile-time algorithms.
**Cons:**
- Preventing the compiler from understanding what happens inside a call.
- Requiring frequent jumps.
:::warning
Same work can be applied in code bundlers like *Webpack Optimization*.
:::
#### Compilation Unit (Translation Unit)
Each `*.c` file is a compilation unit treated as an independent unit that can perform abstraction, encapsulation, optimization, variable visibility (e.g., `static`, `local`, `global`), and source tree tracking.
:::warning
This is the point where *Makefile* can do parallel compilation by processing independent modules.
```sh!
make -j$(nproc) ...
```
:::
Related topics:
- [Why won't extern link to a static variable? - StackOverflow](https://stackoverflow.com/questions/2841762/why-wont-extern-link-to-a-static-variable)
#### Incremental Compilation
Incremental compilation minimizes recompilation by only recompiling parts of the code that have changed, while retaining unchanged sections.
> 
>
> (Source: [Incremental Compilation - Rust Blog](https://blog.rust-lang.org/2016/09/08/incremental.html))
#### Interprocedural Optimization (IPO)
To reduce the inefficiencies that are introduced by separate procedures which can't be done within the *Global Optimaization*, an *Intraprocedural Optimization*, the compiler may analyze and transform multiple procedures together. Interprocedural Optimization helps in achieving this.
:::success
**Inline Substitution:**
The compiler can improve the efficiency by replacing the call site with a copy of the callee’s body via *Link-Time Optimization*.
This helps in allowing the compiler to avoid most of the procedure linkage code, resulting in the streamlining of the program’s call graph.
e.g., Inline the shared object's function
```cpp=
int isOdd(int num) {
return (num & 0x1);
}
```
More:
- [Link-time optimization for the kernel - LWN](https://lwn.net/Articles/512882/)
**Points-To Analysis:**
A compile-time technique that helps identify relationships between pointer variables and the memory locations that they point to during program execution.
:::
### Cross-Platform Parallel Computing
#### OpenMP (Open Multi-Processing)
An API standard that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.
#### C++ AMP *(Deprecated)*
C++ Accelerated Massive Parallelism (C++ AMP) accelerates execution of C++ code by taking advantage of data-parallel hardware such as a graphics processing unit (GPU) on a discrete graphics card.
[C++ AMP Overview - Microsoft](https://learn.microsoft.com/en-us/cpp/parallel/amp/cpp-amp-overview?view=msvc-160)
### Cross Compilation
To build on one platform a binary that will run on another platform.

:::warning
**For example, develop embedded systems in arm architecuture on Intel x86 machines.**
The *cross compilation* enables efficient and effective deployment of software onto embedded devices, which are often resource-constrained & have specialized processors or operating systems.
> **It usually needs a set of "Cross Compiler Toolchain" to complete the whole task:** Runtime support (& maybe using different linker script) should be taken into account.
:::
## Runtime Support
[Runtime system - Wiki](https://en.wikipedia.org/wiki/Runtime_system)
The compiler's work for runtime environment supports is like:
- Mapping program elements like "procedure names" and "identifiers" to their actual memory locations during the runtime.
- Wrap the user's program with language-specified, system-provided environment help, like calling convention, loaders, function prologue/epilogue, etc.
It is so-called the **dynamic state of the target machine** that includes *software libraries* and *environment variables*, to support running processes.
[Runtime Environments in Compiler Design - GeeksforGeeks](https://www.geeksforgeeks.org/runtime-environments-in-compiler-design/)
:::warning
By being packaged as a *"Programming Language"* aimed at developers and then implemented by *"Execution Systems"* (e.g., Operating Systems, Runtime Systems) that focus on practical processing, they work together to achieve computer functionality and complete tasks.
:::
### Procedure Call Stack
Keep track of the live *procedure activations* (i.e. the procedures whose execution have not been completed).

The call stack consist of **activation records** that provide essential runtime enviroment components: *local vriables*, *temporary values*, *parent / other procedure snapshots*, *machine status*, *return value*, *parameters*, *dynamic links*, ...
### Runtime Storage (Section -> Segment)
- **Target Code:** loaded from ELF's `.text`.
- **Data Objects:**
- **Global Static Variables:**
- With initialized values: loaded from ELF's `.data`.
- Uninitialized: OS allocates a zeroed segment based on ELF's `.bss`.
- **Automatic Data Objects:** Stack.
- **Dynamic Data Objects:** Heap.
- **Access Links:** refer to non-local data held in other activation records.
- **Control links:** information of procedures' control flow.
[Executable Linkers are basically just home theater setups - Chris Kanich](https://www.youtube.com/watch?v=eQ0KOT_J8Sk)
### Application Binary Interface (ABI)
[System V ABI (on x86_64 Systems) - OS Dev Wiki](https://wiki.osdev.org/System_V_ABI)
> 
> (Source: [x32 ABI - Wiki](https://en.wikipedia.org/wiki/X32_ABI))
#### Calling Conventions
- How to pass/handle parameters.
- [How Many Arguments Can Be Passed in Variadic Functions (The GNU C Library) - GNU.org](https://www.gnu.org/software/libc/manual/html_node/How-Many-Arguments.html):
> There is no general way for a function to determine the number and type of the optional arguments it was called with. That is, the developer maintains their own calling convention for each variadic function.
A variadic function example:
```c=
#include <stdarg.h>
void print(const char* ptr, ...) {
va_list args;
for (va_start(args, ptr); *ptr != '\0'; ++ptr) {
if (*ptr == 'd') {
int i = va_arg(args, int);
printf("%d", d);
}
}
va_end(args);
}
int main() {
print("__d", 'a', 1.2, 3);
}
```
- **Number of optional arguments as a fixed argument:** This convention implies all of the optional arguments are of the same type.
- **Bitmasks & types of optional arguments as a fixed argument:** If the bit is set, fetch the value of the next argument, otherwise use a default value.
> For example: The *format string argument* of `printf`.
- **Special Terminator as the last optional argument:** This should assume that the special terminator isn’t otherwise meaningful to the function.
> For example: A null pointer indicates the end of the argument list in `execl`.
:::warning
**This Could Leak Vulnerables due to the Variadic Function Characteritstics!**
For example, if we can make the format argument `"%x, %x, %x, %x"` & `printf`, it will pop 4 stack values & print them in hex, potentially leaking sensitive information.
Refs:
- [Format String Vulnerability - CTF Handbook](https://ctf101.org/binary-exploitation/what-is-a-format-string-vulnerability/)
:::
- **Function/Syscalls following the ABI standard/convention:**
- **x86 user function call vs. kernel function call**
- user function parameter:
```arm!
rdi, rsi, rdx, rcx, r8, r9, (SIMD regs ...)
```
- the kernel uses:
```arm!
rdi, rsi, rdx, r10, r8, r9
```
[List of x86 calling conventions - Wiki](https://en.wikipedia.org/wiki/X86_calling_conventions#List_of_x86_calling_conventions)
- **What tasks the machine should do before & after the call**
e.g., standard function call's prologue / epilogue.
- **Instruction alignment for each function call**
for example, 16-Byte function alignment improves:
- Instruction cache efficiency.
- Branch prediction & fetch alignment.
- Loop performance (at hotspot gains more).
[Why does this function push RAX to the stack as the first operation? - StackOverflow](https://stackoverflow.com/questions/37773787/why-does-this-function-push-rax-to-the-stack-as-the-first-operation)
- **Stack management**
- **Stack pointer (`rsp`/`rbp`) alignment, scratch/preserved register management for every function call**.
e.g., alignment of the stack:
```arm=
call <func> ; push $ra (8-Byte) onto stack
```
inside the function:
```arm=
sub rsp 0x8 ; stack pointer should be 16-Byte aligned
...
```
or
```arm=
push rbp ; now stack pointer auto aligned by 16-Byte!
...
```
- **Stack frame layout**
e.g, return address, parameters, & local variables, ...
- **Stack, function call optimization / security**
e.g., tail call optimization, stack canary, CET, ...
- **Symbol Name Resolutions:**
- **Dynamically Linking (by glibc `ld.so`):**
- Lazy loading of shared libraries: default (i.e., no `LD_PRELOAD=...`, no full RELRO)
- Lazy binding of function symbols: default (i.e., `LD_BIND_NOW=0`)
:::info
:arrow_right: For more details, see below glibc implementation of user program's dynamically linked shared libraries on Linux: [C-Runtime & C-Library (glibc on GNU/Linux)](#C-Runtime-amp-C-Library-glibc-on-GNULinux) section.
:::
:::warning
**Reduce Dynamically Linked Shared Library Lookup Scope for an Unsolved Symbol**
By default, a symbol is searched through ALL loaded shared libraries until it resolved, how to reduce the symbol lookup scope?
1. (Proactive) non-exposed functions:
```c=
__attribute__((visibility("hidden"))) int func() {}
```
2. (Passive) using `dlopen` to dynamically control symbol resolution scope:
```c=
void *math_lib = dlopen("libmath.so", RTLD_LAZY | RTLD_LOCAL);
...
void (*foo)() = (void (*)())dlsym(math_lib, "func");
```
`RTLD_...` flags: [dlopen(3) — Linux manual page](https://man7.org/linux/man-pages/man3/dlopen.3.html)
:::
- **Name Mangling:** provides means to encode added information in the symbol names (e.g., *function*, *structure*, *class*) **to pass extra semantic information from the compiler to the linker**.
- **Type Representations:** define how different static/dynamic types are stored, accessed, and manipulated (especially useful in object-oriented programming & casting scenarios).
- **Primitive Types:** *int*, *float*, *char*, *bool*, ...
- **Composite Types:** *arrays*, *structures*, *unions*, *tables*, *blobs*, *streams*, ...
- **User-Defined Types:** *int32_t*, *audio* derived from *blobs*, ...
- **Machine-Specific Architectures & Machanisms:** the standards for performing the tasks on specific platform(s) & runtimes (i.e., operating systems, language runtime supports, ...), such as:
- **Memory Management:** stacks, heaps, files, libraries, devices, paging, caching, garbage collection, thread-safe shared memory, IPC, brk(), mmap(), MMU, ASLR, ELF loaders, dynamically linking, ...
- **Exception/Interrupt Handling:** defines how error/event/signal conditions are communicated and handled between the operating system and programs.
- **Thread Support & Synchronization:** encapsulate & establish standards for data sharing and synchronization in parallel processing.
- **Load Program into Memory:** e.g., ELF loading process, GOT table, ...
More:
- [How do linkers resolve symbols? Systems Programming CS Lecture - Chris Kanich](https://www.youtube.com/watch?v=6XVUIeAaROU)
- [Application Binary Interface - Derya Cortuk](https://medium.com/@derya.cortuk/application-binary-interface-92c52d46bbf6)
### C-Runtime & C-Library (glibc on GNU/Linux)
[Creating a C Library - OSDev Wiki](https://wiki.osdev.org/Creating_a_C_Library)
> 
>
> (Source: [[Slides] How A Compiler Works: GNU Toolchain - jserv - slideshare](https://www.slideshare.net/slideshow/how-a-compiler-works-gnu-toolchain/45294966))
#### Executable Load & Return
*Provides supports at the beginning & the end of the process.*
:+1: [How programs get run: ELF binaries - LWN.net](https://lwn.net/Articles/631631/)
:::warning
**ELF File Format**

[In-depth: ELF - The Extensible & Linkable Format - stacksmashing](https://www.youtube.com/watch?v=nC1U1LJQL8o)
:::
- **Dynamic Linking of (PIC) Shared Libraries & (PIE) Executables:**
[Linux Executable Symbol Relocation Explained - Chris Kanich](https://www.youtube.com/watch?v=E804eTETaQs)
> Simple dynamically linking flow chart with `.got` (individual mutable function offset pointers) & `.plt` (shared fixed function stubs for lazy linking):
> 
>
> (Source: [动态链接库(dll)是如何工作的? - 奇乐编程学院](https://www.youtube.com/watch?v=QUaSgq-ivbw))
> Dynamic linking `call <puts@plt@libc.so>` example:
> 
>
> (Source: [GOT表和PLT表 - saku376](https://saku376.github.io/2021/04/30/GOT%E8%A1%A8%E5%92%8CPLT%E8%A1%A8/))
**At compile time:**
```sh!
# shared library
gcc -c math.c -o math.o -fPIC -g -Og # compile
gcc math.o -o libmath.so -shared # link
gcc math.c -o libmath.so -fPIC -g -Og -shared # compile & link
```
```sh!
# main executable
gcc (-fPIE) -c main.c -o main.o # compile
gcc (-pie) main.o -o main -lmath -L. # link
gcc main.c -o main -lmath -L. # compile & link
```
- ==**Compiler**== makes function calls indirectly route to `.plt` (code) sections/segments.
- ==**Linker**== (`ld`) makes `.plt` stub inderectly route to `.got` (data) sections/segments.
**At runtime:**
```sh!
# setup dynamically-linked shared library directories
export LD_LIBRARY_PATH=".:$LD_LIBRARY_PATH"
# execute
./main
```
```sh!
# check for the executable's shared library dependencies
ldd ./main
```
- ==**Dynamic Linker**== (`ld.so`) lazily links the shared libraries's symbols:
For example: a `int *add(int, int)` in libmath.so that called by ./main (PIE) program.
0. the kernel starts ELF loading process ([load_elf_binary](https://elixir.bootlin.com/linux/v6.15.1/source/fs/binfmt_elf.c#L824)) after the `exec` syscall, bringing (replacing) ELF file image in RAM.
then the kernel calling the user-space `ld.so` program interpreter, the dynamic linker, specified in `PT_INTERP` (ELF program header -> section/segment address).
the `ld.so` loads shared libraries specified in `.dynamic` section, updates each shared library `_GLOBAL_OFFSET_TABLE_`, & puts them to the shared library `link_map` linked list.
> `ld.so` uses breadth-first traversal to load ALL required shared libraries.
- memory layout
```sh!
(gdb) info proc map
```

:::warning
**Maybe Some `LD_PRELOAD` Shared Libraries Are Specified**
`ld.so` first processes them, then the `DT_NEEDED` shared libraries listed in the main executable.
:::
:::warning
**Maybe Full RELRO is On**
`ld.so` additionally links all shared library symbols, updates `<func>@got.plt` address slots, & then sets `.got.plt` (pointed by `DT_JMPREL`) segment to READONLY.
:::
1. the ./main calls the function's `<func>@plt` (code) stub.
```arm=
mov esi, <int32_arg0>
mov edi, <int32_arg1>
call <add@plt>
...
```
:::warning
The going-through segments of the following steps are **PIE-/sharedlibrary-independenly**, meaning the dynamically linking can be performed independently & iteratively.
:::
:::info
Note here is using function `call` to the `.plt` stub, then `jmp` several trampoline of `<func>@plt`, `<func>@got.plt` to the actual function address.
:::
2. `.plt` code segment of ./main contains all dynamic-linking related code:
```arm=
; runtime symbol resolver (ld.so) trampoline (using GOT data)
push [_GLOBAL_OFFSET_TABLE_+8] ; GOT[1] (see below)
jmp [_GLOBAL_OFFSET_TABLE_+16] ; GOT[2]
nop DWORD PTR [rax+0x0] ; alignment padding
; <add@plt> stub
jmp [<add@got.plt>] ; jmp to "add" addr or the following line?
push 0x0 ; resolution symbol index
jmp <dyn_res@plt> ; resolve the symbol via ld.so
; <printf@plt> stub
jmp [<printf@got.plt>]
push 0x1
jmp <dyn_res@plt>
...
```
:::warning
**[CS:APP] `_GLOBAL_OFFSET_TABLE_` in `.got.plt` Data Segment**
This table is used when those `<func>@plt` code stubs want to initiate a runtime symbol resolution call.

- **GOT[0]**: `.dynamic` address (see below).
- **GOT[1]**: `link_map` linked list pointer ---- each shared library data of dynamically linking.
- [/elf/link.h](https://elixir.bootlin.com/glibc/glibc-2.40.9000/source/elf/link.h#L101)
```c=
struct link_map
{
char *l_name; // shared object's file path
ElfW(Addr) l_addr; // shared object's base address offset
ElfW(Dyn) *l_ld; // shared object's .dynamic address
struct link_map *l_next, *l_prev;
};
```
> shared objects (main executable & shared libraries) have there own `.dynamic` section.
- `.dynamic` section's entries:
```sh!
readelf --dynamic <dynamically-linked-program>
```

> inspect `.dynamic` entries via GDB:
> ```sh!
> # check .dynamic: info files
> (gdb) p $dyn = <.dynamic-addr>
> (gdb) p $dyn_arr = (Elf64_Dyn*) $dyn
> (gdb) p $i = 0
> (gdb) p $dyn_arr[$i++]
> ...
> ```
[Dynamic Section (Linker and Libraries Guide) - Oracle Docs](https://docs.oracle.com/cd/E19683-01/817-3677/chapter6-42444/index.html)
- **`DT_STRTAB`**: string table `.dynstr` address

- **`DT_NEED`**: required shared library name
> Type `DT_NEED` (`dtag = 1`) entries only store the character offset in `.dynstr` table:
> ```sh!
> (gdb) x/1s <.dynstr-addr> + d_val
> ```
> 
- **`DT_SYMTAB`**: symbol table `.dynsym` address
```sh!
readelf --dyn-syms <dynamically-linked-program>
```

- **`DT_HASH`/`DT_GNU_HASH`**: symbol hash table(s)
:::info
**Hashing for Dynamic Linkers Searching Symbols**
- **SysV Hash (ELF Hash Table):** uses the extra `.hash` ELF section to do hashing function with candidate buckets & chaining.
- **GNU Hash (GNU ld):** uses the extra `.gnu.hash` ELF section to do hashing funtion with bloom filter layers, candidate buckets & sorted chaining.
:::
- **`DT_RELA`/`DT_RELASZ`**: relocation entries
- **`DT_PLTGOT`**: `.got.plt` address
- **GOT[2]**: `_dl_runtime_resolve()` address.
> **`_dl_runtime_resolve`**: address of dynamic linker's (`ld.so`) runtime symbol resolution procedure.
- **GOT[3+]**: each of the dynamically linked symbol's resolved address.
> Initially, the `<func>@got.plt` slot was stuffed with the next instruction of `<func>@plt`, ++as if doing the normal sequential execution of the runtime symbol resolution of `<func>` with `ld.so`!++
>
> (recall **`<func>@plt`:** `jmp [<func>@got.plt]`)
The control flow is rather like:
:::info
**Does `.got` has corresponding `<func>` symbol's resolved address?**
1. **yes**: `jmp <func>`.
2. **no**: continue the `<func@plt>` stub's following steps ---- the symbol resolution that writes the actual address into `<func>@got.plt`.
:::
> [!Note]
> The format & interpretation of the Global Offset Table is processor specific (ABI specific).
[Dynamic Linking - Linux Foundation](https://refspecs.linuxfoundation.org/ELF/zSeries/lzsabi0_zSeries/x2251.html)
:::
:::info
**`.got` Data Segment vs. `.got.plt` Data Segment**
- **`.got`**: stores glibc necessary runtime symbol's resolved address slot (e.g., `__libc_start_main`, `__cxa_finalize`, ...).
> For general global symbols use.
- **`.got.plt`**: provides the `_GLOBAL_OFFSET_TABLE_` & the user program symbol's resolved address slot.
:::
3. searches from ALL loaded `<lib>.so` chained in `link_map`.
4. searches the inquired symbol by ++symbol hash table++.
5. updates the `.got.plt` corresponding function address slot.
:::info
**Can this Mechanism be Exploited due to Vulnerabilities?**
:arrow_right: RELRO (RELocation Read Only) Technique: [Hardware/Software-Based Security - shibarashinu](https://hackmd.io/@shibarashinu/H1UUR_p-0#RELocation-Read-Only-RELRO)
:::
- ==**OS**== handles the effective on-demand loading: demand paging mechanism, copy on write for global variables, OS memory management, ...
:::success
**[TODO] Profile-Guided & Adaptive Dynamically Linking Optimization**
*Extra Layer of Symbol Resolutions Using ID-Based & Priority-Based Query Systems.*
- **ID-Based Look-Up Systems:**
- C++ namespace mangling.
- Symbol stripping for productions: game engines, app bundlers (e.g., Webpack), ...
- WASM index-based function references.
- Windows COMs (Component Object Model) with interface IDs (GUIDs).
- **Priority-Based Look-Up Systems:**
- DBMS's query planners & optimizers (statically or dynamically).
- index optimization, clustered index.
- query plan caching, bundling, tuning.
- Profile-guided symbol table layout: heuristic observation, AI (machine learning) recognizing the workload pattern.
- runtime profiling, query statistics, histogram collections.
- IR, control flow aiding supports.
- JIT-like inline patching, hot-path rewriting, hot-module replacement.
- cacheline, superscalar, ... & other computer architecture optimizations.
:::warning
E.g., Adjust the `.gnu.hash`'s hashing & chaining layouts, tune bloom filters & hierachical multi-hashing logics, predictions & fallbacks, or add middle layers.
:::
> These require compilers (`gcc`/`clang`/`llvm`), linkers (`ld`), dynamic linkers (`ld.so`), ABI formats (`ELF`) & OS (runtime) supports all at once.
But the gain is huge: we can preload the hotspot code, memory layouts, function calls & systems resources are dynamically used in the optimizing way, it can also surely be seamlessly adapted to advanced OS memory management, multitasking, parallel processing, realtime demands, embedded systems, ... Then the entire ecosystem built & adapting this OS can benifit from this, from low-level module apis to end-user apps.
:::
#### Dynamically Linking of OS Shared Libraries & Data
*Reduces the need of syscall for some non-sensitive common data & virtual interfaces to the kernel.*

> `ld` loads vDSO:
> 
> 
> 
>
> (Source: [The vDSO on arm64 - ARM](https://blog.linuxplumbersconf.org/2016/ocw/system/presentations/3711/original/LPC_vDSO.pdf))
`(gdb) info proc map`:

:::info
**Linux Kernel Entries**
Source code: [/arch/x86/entry/](https://elixir.bootlin.com/linux/v6.15.3/source/arch/x86/entry)
- syscalls
- vsyscall interface
- vDSO
:::
- **vDSO (virtual Dynamic Shared Object)**:
Some utils procedures to get the vvar (vDSO Variables). e.g., `__vdso_clock_gettime()`, `__vdso_gettimeofday()`, `__vdso_getcpu()`, `__vdso_time()`, ...
:::info
**Same Address Can Have Multiple Symbols**
For example, defined in `.symtab` for compile-time funcitons & `.dynsym` for runtime functions):

:::


> Where `0x00007ffff7fc----` is in the range of `[vvar]` section.
Source Code:
- vDSO headers: [/include/vdso/](https://elixir.bootlin.com/linux/v6.11/source/include/vdso)
- vDSO for x86: [/arch/x86/entry/vdso/](https://elixir.bootlin.com/linux/v6.11/source/arch/x86/entry/vdso)
- vDSO for ARM: [/arch/arm/vdso/](https://elixir.bootlin.com/linux/v6.11/source/arch/arm/vdso)
:::info
**Auxiliary Vector**
When a program is executed, it receives information from the operating system about the environment in which it is operating. The form of this information is a table of key-value pairs, where the keys are from the set of `AT_*` values in [uapi/linux/auxvec.h](https://elixir.bootlin.com/linux/v6.11/source/include/uapi/linux/auxvec.h) (used to be in *elf.h*).
This (architecture-specific) first entry in the vector is the `AT_SYSINFO_EHDR` value for x86_64; this indicates ==the location of the vDSO page==.
Refs:
- [Auxiliary Vector - GNU C Library](https://www.gnu.org/software/libc/manual/html_node/Auxiliary-Vector.html)
- [How programs get run: ELF binaries](https://lwn.net/Articles/631631/)
- [Linux — The Auxiliary Vector (AUXV) - Shlomi Boutnaru](https://medium.com/@boutnaru/linux-the-auxiliary-vector-auxv-cba527871b50)
:::
Refs:
- [vdso(7) - Linux manual page](https://man7.org/linux/man-pages/man7/vdso.7.html)
- [Implementing virtual system calls - LWN.net](https://lwn.net/Articles/615809/)
- [vDSO: 快速的 Linux 系統呼叫機制 - jserv](https://hackmd.io/@sysprog/linux-vdso)
- **vvar (vDSO Variables)**: The read-only data copied from the kernel space.
#### Utility C Library Calls
- [C Standard Library](https://en.wikipedia.org/wiki/C_standard_library): Core set of functions & features defined by the *ANSI C standard (ISO C)*.
- [C POSIX Library](https://en.wikipedia.org/wiki/C_POSIX_library): POSIX OS API that builds on top of C Standard Library (for UNIX-like OSs).
- [float.h](https://en.wikipedia.org/wiki/C_data_types#float.h) (compile time) / [fenv.h](https://en.wikipedia.org/wiki/C_mathematical_functions#fenv.h) (runtime): floating-point properties (e.g., precision, range) / floating-point environment settings (e.g., rounding modes, exception handling).
- [unistd.h](https://en.wikipedia.org/wiki/Unistd.h): ==syscall wrapper functions==, such as: `fork`, `pipe`, `sleep`, & I/O primitives (`read`, `write`, `close`, ...).
:::info
These are specific for POSIX OS API, for broader & miscellaneous functions look up headers in *sys/* (e.g., *sys/tree.h*, *sys/socket.h*, *sys/epoll.h*, ...).
:::
- [dlfcn.h](https://en.wikipedia.org/wiki/Dynamic_loading): ==OS dynamic loading utils functions==, such as: `dlopen`, `dlerror`, `dlclose`, `dl_runtime`, ...
- [stdio.h](https://en.wikipedia.org/wiki/C_file_input/output): ==C file input/output utils functions==, such as: `printf`, `sscanf`, `puts`, `gets_s`, `fopen`, `fclose`, `fread`, `fwrite`, ...
- [stdlib.h](https://en.wikipedia.org/wiki/C_dynamic_memory_allocation): ==C dynamic malloc utils functions==, such as: `malloc`, `calloc`, `free`, ...
- [signal.h](https://en.wikipedia.org/wiki/C_signal_handling): ==OS interaction with user processes utils functions==, such as: high-level events of *exception occurrences* / *hardware interrupt* / *inter-process messages* that bubble up from OS.
- [setjmp.h](https://en.wikipedia.org/wiki/Setjmp.h): `setjmp` / `longjmp` are useful for implementing `try ... catch` or coroutines.
```C=
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>
#include <string.h>
jmp_buf jump_buffer;
enum { UNDEFINED = -1, NORMAL, DIVIDE_BY_ZERO };
typedef struct { int error, result; } Result;
void handle_divide_error(int sig) {
longjmp(jump_buffer, DIVIDE_BY_ZERO);
}
Result divide(int numerator, int denominator) {
Result result = { .error = UNDEFINED };
switch (setjmp(jump_buffer)) {
case NORMAL:
result.result = numerator / denominator;
result.error = NORMAL;
break;
case DIVIDE_BY_ZERO:
result.result = 0;
result.error = DIVIDE_BY_ZERO;
break;
}
return result;
}
int main() {
signal(SIGFPE, handle_divide_error);
Result result = divide(5, 0);
if (result.error)
printf("error code: %i", result.error);
else
printf("result: %i", result.result);
return 0;
}
```
Example code:
```C=
#include <stdio.h> // for fprintf()
#include <unistd.h> // for close()
#include <sys/epoll.h> // for syscall epoll_create1()
int main()
{
int epoll_fd = epoll_create1(0);
if (epoll_fd == -1)
{
fprintf(stderr, "Failed to create epoll fd\n");
return 1;
}
if (close(epoll_fd))
{
fprintf(stderr, "Failed to close epoll fd\n");
return 1;
}
return 0;
}
```
More:
- [Why is malloc() considered a library call and not a system call? - StackOverflow](https://stackoverflow.com/questions/71413587/why-is-malloc-considered-a-library-call-and-not-a-system-call)
- [I made the same game in Assembly, C and C++ - Nathan Baggs](https://www.youtube.com/watch?v=2eeXj-ck9VA)
#### Handle Exception & Runtime Supports
*Supports the process with OS & hardware/software functionarities via C libraries.*
> E.g., Software-based alternative for supporting FPU hardware acceleration feature.
### C++ RTTI (Run-Time Type Information)
Determines the data type of a variable at runtime, it is used when invoking *dynamic_cast* or *typeid operators*.
> The C++ standard does not determine how exactly RTTI is implemented, **its implementation is "delegated" to the application binary interface (ABI)**. Most compilers store information in a **virtual table (vtable)**. If you need details, you can look at the vtable implementation on the x86_64 platform.
More:
- [[Slides] Static Analysis of C++ Virtual Tables (from GCC) - James Rowley, 2023](https://hardwear.io/usa-2023/presentation/analyzing-decompiled-c++vtables-and-objects-in-GCC-binaries.pdf)
:::info
**Get Rids of the RTTI's dynamic_cast**

`dynamic_cast` is bad because:
1. It work only on polymorphic types.
2. It doesn't work with the `-fno-rtti` compiler's flag.
3. ABI-specific information about types is required for its work.
4. Downcasting works via the vtable view, and this is a slow operation.
> Alternatively, we can implement a simple self-hosting dynamic_cast by additionally wrapping our current class with a parent class that contains type information that can be determined at runtime.
For example:
```cpp=
void foo(const Stmt *stmt)
{
if (stmt->m_kind == Stmt::Type::IfStmt)
{
auto ifStmt = static_cast<const IfStmt *>(stmt)
Visit(*ifStmt);
}
else if (stmt->m_kind == Stmt::Type::ForStmt)
{
auto forStmt = static_cast<const ForStmt *>(stmt)
Visit(*forStmt);
}
}
```
Refs:
- [Is there life without RTTI or How we wrote our own dynamic_cast - Vladislav Stolyarov](https://pvs-studio.com/en/blog/posts/cpp/0998/)
:::
### Virtual Machine
#### C# .NET CLR (Common Language Runtime)
The *JIT compiler* in *.NET* translates *IL (Intermediate Language)* into native machine code with *CLR* support, enabling *IL* to be dynamically transformed into specific-platform machine code.
## C Extensions & Metaprogramming
### Language Standard
- **`_Generic` Keyword**
*Static polymorphism of Template in C!*
[Generic selection - cppreference.com](https://en.cppreference.com/w/c/language/generic.html)
```c=
#define func(a, b) \
_Generic((a), \
int: _Generic((b), int: "i+i", char *: "i+c", default: "unknown"), \
char *: _Generic((b), int: "c+i", char *: "c+c", default: "unknown"), \
default: "unknown")
int main() {
func(1, 3);
func("str", 3);
func(3, "str");
func("str", 3.14);
return 0;
}
```
### Compiler Extensions
[Extensions to the C Language Family - GCC 12.2.0](https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/C-Extensions.html#C-Extensions)
- **`({ ... })` Expression**
*GNU Extension*
For example:
- [/include/linux/kernel.h](https://elixir.bootlin.com/linux/v6.11.10/source/include/linux/kernel.h#L300)
```c=
#define trace_puts(str) ({ \
static const char *trace_printk_fmt __used \
__section("__trace_printk_fmt") = \
__builtin_constant_p(str) ? str : NULL; \
\
if (__builtin_constant_p(str)) \
__trace_bputs(_THIS_IP_, trace_printk_fmt); \
else \
__trace_puts(_THIS_IP_, str, strlen(str)); \
})
```
:::info
**For More Standard & Portable Practice in Modern C++**
Use lambda expressions to wrap them up:
```C++=
auto x = [&] {
if (...)
return ...;
else
return ...;
}();
```
:::
- **`typeof` (`__typeof__`)**
*GNU Extension*
:::warning
**Something like the C version of `decltype`!**
:::
For example:
- [/include/linux/string.h](https://elixir.bootlin.com/linux/v6.11.1/source/include/linux/string.h#L496)
```c=
#define memset_startat(obj, v, member) \
({ \
u8 *__ptr = (u8 *)(obj); \
typeof(v) __val = (v); \
memset(__ptr + offsetof(typeof(*(obj)), member), __val, \
sizeof(*(obj)) - offsetof(typeof(*(obj)), member \
})
```
- **Other GNU Extensions:** `__builtin_types_compatible_p`, `__builtin_choose_expr`, `__builtin_offsetof`, `__attribute__((...))`, ...
So a simple template function can be implemented at the program level easily without additional programming language standard support:
```c=
#define foo(x) \
({ \
typeof (x) tmp = (x); \
if (__builtin_types_compatible_p (typeof (x), long double)) \
tmp = foo_long_double (tmp); \
else if (__builtin_types_compatible_p (typeof (x), double)) \
tmp = foo_double (tmp); \
else if (__builtin_types_compatible_p (typeof (x), float)) \
tmp = foo_float (tmp); \
else \
abort (); \
tmp; \
})
```
- [Other Built-in Functions Provided by GCC - GNU GCC](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html)
- [Linux 核心原始程式碼巨集: max, min - jserv](https://hackmd.io/@sysprog/linux-macro-minmax)
- `__attribute__((visibility("hidden")))`: [Why use the Global Offset Table for symbols defined in the shared library itself? - StackOverflow](https://stackoverflow.com/questions/55587313/why-use-the-global-offset-table-for-symbols-defined-in-the-shared-library-itself)
- [How exactly does __attribute__((constructor)) work? - StackOverflow](https://stackoverflow.com/questions/2053029/how-exactly-does-attribute-constructor-work)
### Preprocessor Macro
- **Fully Expanded Previous Macro**
As the following shows, consider a macro `STR(str) printf(#str)`, if `str` is also a macro, WHEN should it be expanded?
```c=
#define STR_X(str) printf(#str "\n")
#define STR(str) STR_X(str)
int main() {
STR_X(STR_X(hello)); // STR_X(hello)
// vs.
STR(STR(hello)); // printf("hello" "\n")
return 0;
}
```
- **Get Finite `__VA_ARGS__` Count**
:::warning
Pure macro without using C++ `constexpr`!
:::
- [/sysdeps/unix/sysdep.h](https://elixir.bootlin.com/glibc/glibc-2.41.9000/source/sysdeps/unix/sysdep.h#L111)
```c=
#define __INLINE_SYSCALL_NARGS_X(a,b,c,d,e,f,g,h,n,...) n
#define __INLINE_SYSCALL_NARGS(...) \
__INLINE_SYSCALL_NARGS_X (__VA_ARGS__,7,6,5,4,3,2,1,0,)
#define __INLINE_SYSCALL_DISP(x,...) \
__SYSCALL_CONCAT (x,__INLINE_SYSCALL_NARGS(__VA_ARGS__))(__VA_ARGS__)
__INLINE_SYSCALL_NARGS(0, 1, 2, 3, 4);
// n = 4
// of (0, 1, 2, 3, 4, /* push 1 arg +1 */ 7, 6, 5, "4", ...)
```
## Misc.
### Type Conversion Issues -- Compatible or Incompatible?
*Type-Based Alias Analysis (TBAA) & Type Punning Rules*
> An extended topic on ++Pointer Alias Analysis++.
:::info
**Language Standard vs. Compiler Implementation**
Is *Undefined Behavior* always bad?
UB makes the implementation of the language features enable the compiler to behave differently as for respecting the platform foundations & needs.
:::
- [Example 1] The violation of Type-Based Alias Analysis (TBAA) in C:
```c=
float f = 3.14f;
int *p = (int*) &f; // violates TBAA (undefined behavior)
*p = 10; // type punning (undefined behavior)
```
- [Example 2] Does the compiler consider the 2 pointer parameters of `func` from the same source?
```cpp=
void func(int *i, short *s) {
// type punning undefined behavior: is s aliased to i?
*i = 10;
*s = 20;
*i += 1; // 11 or 21?
}
int main() {
int i;
func(&i, (short*) &i); // violates TBAA
}
```
```sh!
gcc -O1 ... # 21 (1-to-1 reflection code, treat them the same)
gcc -O3 ... # 11 (optimized code, adopt TBAA, treat them differently, then perform the constant propagation: *s = 20; *i = 11;)
```
:::info
**Why does the compiler behave like this?**
> Undefined behavior at `(short *) &i` (violates the standard states below)!
:::
:::warning
**Solution**
1. avoid undefined behavior of TBAA violation:
```cpp!
(short*) &i
```
2. Or use `restrict` pointers to promise unique pointers with no aliases:
```cpp!
void func(int *restrict i, short *restrict s) {
...
}
```
:::
- [Example 3] Does the compiler recognize the same type of the pointer & the object type of the address through a bunch of type-casting pointers?
```cpp=
int x = 10;
float *y = (float*) &x; // violates TBAA (undefined behavior)
int *z = (int*) y; // violates TBAA (undefined behavior)
*z = 42; // type punning (legal)
```
But `x` <=> `*z` deduction successfully based on the below standard spec.
> The standard only cares if the lvalue type is compatible with object type.
:::danger
**Conclution**
Legal type punning operation, but based on undefined behavior operation.
:::
:::warning
**ISO C11 § 6.5 Expressions**
> Can lvalue access the living object's value? Is lvalue type compatible with object type?
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
> The intent of this list is to specify those circumstances in which ==an object may or may not be aliased==.
- a type compatible with the effective type of the object, for example:
```c!
int i = 10;
// compatible type
__int32_t *p = &i;
*p = 20;
```
- a qualified version of a type compatible with the effective type of the object, for example:
```c!
int i = 10;
// compatible type
volatile const int *const restrict p = &i;
*p = 20;
```
- a type that is the signed or unsigned type corresponding to the effective type of the object, for example:
```c!
int i = 10;
// compatible (but interpret data differently)
unsigned int *p = &i;
*p = 20;
```
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object, for example:
```c!
int i;
// compatible type (but interpret data differently)
const unsigned int *p = &i;
*p = 20;
```
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), for example:
```c!
int i = 10;
// compatible type (but interpret data differently)
union u { double d; const int i; };
union u* p = (union u*) &i;
*p = 20;
```
- a character type, for example:
```c!
int i = 10;
// compatible type (but interpret data differently)
char *p = (char*) i;
*p = 20;
```
:::
For those who is not satisfied the above memtioned standard spec (i.e., violates TBAA):
:::info
**Solution: Safe Type Punning**
*1 by 1 byte writing.*
- `memcpy`/`memmove`:
```c=
int i = 10;
struct S {...};
struct S s;
memcpy(&s, &i, sizeof(int));
```
- copied as `char[]`:
```c=
int i = 10;
float f;
for (int _i = 0; _i < sizeof(int); _i++)
f[_i] = (char*) i[_i];
```
:::
- [C - Type punning, Strict aliasing, and Endianness - StackOverflow](https://stackoverflow.com/questions/74673767/c-type-punning-strict-aliasing-and-endianness)
- [Type-Based Alias Analysis in C - Stefan SF](https://stefansf.de/post/type-based-alias-analysis/)
### XX Macro
*Preprocessor Trick*
Firstly, define the attribute keys (& values) wrapped with `XX(...)`, a *magic* macro function, then at the definitions of stuctures or functions, dynamically define how `XX` should perform on those data.
[libuv - GitHub](https://github.com/libuv/libuv)
For example:
```c=
#define MY_EXPAND_ERRNO_MAP(XX) \
XX(EOK, "good one") \
XX(EAPPLE, "bad apple") \
XX(EBANANA, "bad banana") \
...
```
then expand it as needed:
```c=
typedef enum {
#define MY_ERRNO_ENUM(key, _) MY_##key,
// e.g., MY_EOK, MY_EAPPLE, MY_EBANANA, ...
MY_EXPAND_ERRNO_MAP(MY_ERRNO_ENUM)
#undef MY_ERRNO_ENUM
MY_ERRNO_MAX
} my_errno_t;
const char* my_errno_strs[] = {
#define MY_ERRNO_STR(_, str) str,
// e.g., "good one", "bad apple", ...
MY_EXPAND_ERRNO_MAP(MY_ERRNO_STR)
#undef MY_ERRNO_STR
"unknown error"
};
const char* my_strerror(my_errno_t err) {
if (err > MY_ERRNO_MAX || err < 0)
err = MY_ERRNO_MAX;
return my_errno_strs[err];
}
```
### Quantum Computing Compiler
- [LLVM 教學(2):llvm-project 組成的 3 大類別與其資料夾結構介紹 - 灣區筆記](https://bayareanotes.com/llvm-project-layout/)
- [[Paper] Optimized Compiler for Distributed Quantum Computing - D Cuomo, 2023](https://dl.acm.org/doi/10.1145/3579367)