# CS:APP Ch.7 Linking: Key Points
###### tags: `CS:APP`
* Compiler driver
* Invokes the `language preprocessor`, `compiler`, `assembler`, and `linker` as needed on behalf of the user.
* `unix> gcc -O2 -g -o p main.c swap.c`

1. C preprocessor (cpp)
* Translates the C source file _main.c_ into an ASCII **intermediate file** _main.i_.
* `cpp [other arguments] main.c /tmp/main.i`
2. C compiler (cc1)
* Translates _main.i_ into an ASCII **assembly language file** _main.s_.
* `cc1 /tmp/main.i main.c -O2 [other arguments] -o /tmp/main.s`
3. Assembler (as)
* Translates _main.s_ into a **relocatable object file** _main.o_.
* `as [other arguments] -o /tmp/main.o /tmp/main.s`
4. Linker program (ld)
* The driver repeats steps 1 to 3 for _swap.c_ to produce _swap.o_. The linker then combines _main.o_ and _swap.o_, along with the necessary system object files, to create the **executable object file** _p_.
* `ld -o p [system object files and args] /tmp/main.o /tmp/swap.o`
* Object file
1. Relocatable object file
* Contains binary code and data in a form that can be combined with other relocatable object files **at compile time** to create an executable object file.
2. Executable object file
* Contains binary code and data in a form that can be copied directly into memory and executed.
3. Shared object file
* A special type of relocatable object file that can be loaded into memory and **linked dynamically**, at either load time or run time.
* Object file formats vary from system to system
* Modern Unix systems use the Unix **Executable and Linkable Format (ELF)**
* Object files are merely collections of blocks of bytes. A linker concatenates blocks together
* ELF relocatable object file format

* ELF header
* Begins with a 16-byte sequence that describes the **word size** and **byte ordering** of the system that generated the file.
* Allows a linker to parse and interpret the object file.
* Use the **`readelf -h filename.o`** command to dump the ELF header in human-readable form.
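
The word size and byte ordering mentioned above are stored at fixed positions in those first 16 bytes. The following is a minimal sketch of decoding them by hand; the helper name `parse_elf_ident` and the struct `elf_ident_info` are hypothetical, and the offsets are written out as local constants rather than taken from a system header.

```c
#include <stddef.h>
#include <string.h>

/* Positions inside the 16-byte identification array at the start of an ELF file. */
enum { EI_CLASS_OFF = 4, EI_DATA_OFF = 5, ELF_IDENT_SIZE = 16 };

struct elf_ident_info {
    int word_bits;      /* 32 or 64, or 0 if the class byte is unrecognized */
    int little_endian;  /* 1 = little-endian, 0 = big-endian, -1 = unrecognized */
};

/* Hypothetical helper: decode the first 16 bytes of a file.
 * Returns 0 on success, -1 if the buffer is too short or lacks the ELF magic. */
int parse_elf_ident(const unsigned char *buf, size_t len,
                    struct elf_ident_info *out)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < (size_t)ELF_IDENT_SIZE || memcmp(buf, magic, sizeof magic) != 0)
        return -1;

    switch (buf[EI_CLASS_OFF]) {          /* 1 = 32-bit objects, 2 = 64-bit objects */
    case 1:  out->word_bits = 32; break;
    case 2:  out->word_bits = 64; break;
    default: out->word_bits = 0;  break;
    }
    switch (buf[EI_DATA_OFF]) {           /* 1 = little-endian, 2 = big-endian */
    case 1:  out->little_endian = 1;  break;
    case 2:  out->little_endian = 0;  break;
    default: out->little_endian = -1; break;
    }
    return 0;
}
```

A caller would read the first 16 bytes of an object file into a buffer and pass it to `parse_elf_ident`; the result should agree with the `Class` and `Data` lines that `readelf -h` prints.
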

* .text
* The machine code of the compiled program
* .rodata
* Read-only data such as the format strings in printf statements, and jump tables for switch statements
* .data
* **Initialized global** C variables.
* **Local C variables are maintained at run time on the stack**
* .bss
* **Uninitialized global** C variables.
* This section occupies no actual space in the object file; it is merely a placeholder.
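
To make the section list above concrete, here is a small sketch that annotates where each kind of C object typically ends up. All names are made up for illustration, and the placements describe common GCC-on-Linux behavior rather than a guarantee; they can be checked on a compiled object file with `readelf -S` or `objdump -h`.

```c
int initialized_counter = 42;      /* initialized global: .data */
int uninitialized_counter;         /* uninitialized global: .bss */
const char greeting[] = "hello";   /* read-only data: typically .rodata */

/* The machine code of this function goes in .text. */
int bump_and_sum(int step)
{
    int local_total;               /* local variable: on the stack (or in a register) at run time */
    static int call_count;         /* static local: not on the stack; zero-initialized, so .bss */

    call_count++;
    initialized_counter += step;
    uninitialized_counter += step;
    local_total = initialized_counter + uninitialized_counter + call_count;
    return local_total;
}
```

Compiling this file with `gcc -c` and inspecting the result shows that `local_total` has no entry in any section, which is the point of the note about local variables above.
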
* Symbol
* `Variables` and `functions` all have `names` in source code, and we refer to them by those names.
* A symbol is a symbolic name for an area of memory, which makes programs easier for humans to read and reason about.
* Symbol Visibility
* In C programs, the **`static`** and **`extern`** keywords affect the visibility of symbols.
* **`extern`** tells the compiler: "do not allocate any space in memory for this variable; leave a reference to this symbol in the object code, where it will be fixed up by the **linker**."
* **`static`** (at file scope) tells the compiler: "do not export this symbol from the object file." The symbol is recorded as local to its module, so the linker never matches it against references from other object files, and **we can reuse the name in other files.**
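
A short sketch of the two keywords in one file follows. The names (`shared_total`, `call_count`, `add_to_total`) are hypothetical; in a real project the definition of `shared_total` would normally live in a different `.c` file, and it is placed at the bottom here only so the example links on its own.

```c
/* Declaration only: no storage is allocated here. The reference to
 * shared_total is left in the object code for the linker to resolve. */
extern int shared_total;

/* File-private variable: other object files cannot refer to this name,
 * so they are free to define their own call_count. */
static int call_count = 0;

/* File-private helper function, also invisible to other modules. */
static int clamp_non_negative(int v)
{
    return v < 0 ? 0 : v;
}

/* Global function: its symbol is visible to the linker. */
int add_to_total(int amount)
{
    call_count++;
    shared_total += clamp_non_negative(amount);
    return call_count;
}

/* Definition of shared_total (normally in another file; see the note above). */
int shared_total = 0;
```

Running `readelf -s` on the compiled object should list `call_count` and `clamp_non_negative` with `LOCAL` binding, and `add_to_total` and `shared_total` with `GLOBAL` binding.
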
* Linker
* Takes as input a collection of **relocatable object files** and generates as output a **fully linked executable object file**.
* To build the executable, the linker must perform two main tasks:
* Symbol resolution
* Associates each **symbol reference** with exactly one **symbol definition**.
* Relocation
* Compilers and assemblers generate code and data sections that start at address 0. The linker relocates these sections by associating a memory location with each symbol definition, and then modifying all of the references to those symbols so that they point to this memory location.
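
The relocation step described above can be sketched as a toy patching routine. This is not how any particular linker is implemented: the struct `toy_reloc`, the function `apply_reloc`, and the 4-byte reference width are assumptions chosen for illustration. It covers two common cases, an absolute reference and a reference stored relative to its own address.

```c
#include <stdint.h>
#include <string.h>

/* Toy relocation entry: "the 4 bytes at `offset` in this section refer to a symbol". */
struct toy_reloc {
    uint32_t offset;       /* where in the section the reference lives */
    int32_t  addend;       /* constant added to the symbol address */
    int      pc_relative;  /* nonzero: store the distance from the reference to the target */
};

/* Patch one reference inside `section`, which the toy linker has placed at
 * run-time address `section_addr`. `symbol_addr` is the run-time address
 * chosen for the referenced symbol. The value is stored in host byte order. */
void apply_reloc(uint8_t *section, uint32_t section_addr,
                 const struct toy_reloc *r, uint32_t symbol_addr)
{
    uint32_t value = symbol_addr + (uint32_t)r->addend;

    if (r->pc_relative)
        value -= section_addr + r->offset;   /* subtract the address of the reference itself */

    memcpy(section + r->offset, &value, sizeof value);
}
```

For example, with the section placed at `0x4004d0`, a PC-relative reference at offset `0xf` with addend `-4` to a symbol at `0x4004e8` is patched to `0x5`, since `0x4004e8 - 4 - 0x4004df = 0x5`. The addresses are illustrative only.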

* How Linkers Resolve Multiply Defined Global Symbols
* At compile time, the compiler exports each **global symbol** to the assembler as either **strong** or **weak**.
* strong symbol :
* Functions
* **Initialized** global variables
* weak symbol :
* **Uninitialized** global variables
* Unix linkers use the following rules for dealing with multiply defined symbols:
1. Multiple strong symbols are not allowed.
2. Given a strong symbol and multiple weak symbols, choose the strong symbol.
3. Given multiple weak symbols, choose any of the weak symbols.
* For example:

:negative_squared_cross_mark: : If two modules each define the function `main`, the linker generates an error message because the strong symbol `main` is defined multiple times (rule 1).

:negative_squared_cross_mark: : Similarly, if two modules each define an initialized global `x`, the linker generates an error message because the strong symbol `x` is defined twice (rule 1).

```
unix> gcc -o foobar3 foo3.c bar3.c
unix> ./foobar3
x = 15212
```
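:warning: : Here `x` is strong in one module (initialized to 15213) and weak in the other (uninitialized), so the linker picks the strong definition (rule 2). At run time, function `f` in the second module writes to that same `x`, changing its value from 15213 to 15212.
:warning: : The linker normally gives no indication that it has detected multiple definitions of `x`. (GCC 10 and later default to `-fno-common`, which turns this case into a link error.)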
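
The three rules for multiply defined symbols can be modelled with a small function that looks at every definition of one symbol name. This is only a sketch of the decision logic, not linker code; `struct definition`, `resolve_symbol`, and the return-code convention are hypothetical, and for rule 3 the sketch arbitrarily keeps the first weak definition.

```c
#include <stddef.h>

enum strength { WEAK = 0, STRONG = 1 };

struct definition {
    const char *module;       /* object file that defines the symbol */
    enum strength strength;   /* strong: function or initialized global */
};

/* Apply the rules to all definitions of ONE symbol name.
 * Returns the index of the chosen definition, -1 if there are no
 * definitions, or -2 if two strong definitions clash (rule 1). */
int resolve_symbol(const struct definition *defs, size_t n)
{
    int chosen = -1;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (chosen >= 0 && defs[chosen].strength == STRONG)
                return -2;        /* rule 1: multiple strong symbols are an error */
            chosen = (int)i;      /* rule 2: a strong symbol beats any weak one */
        } else if (chosen < 0) {
            chosen = (int)i;      /* rule 3: any weak symbol will do; keep the first */
        }
    }
    return chosen;
}
```

With one strong and several weak definitions, the function returns the index of the strong one, which mirrors the `foobar3` run above where the initialized `x` wins.
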
## Linking with Libraries
### Static Library
* A static library is simply a group of object files.
* At link time, the linker copies only the object modules in the library that are referenced by the application program.
* On Unix systems, static libraries are stored on disk in a particular file format known as an _archive_. An archive is a collection of concatenated relocatable object files, with a header that describes the size and location of each member object file. Archive filenames are denoted with the **.a** suffix.
:+1: : Provide a way for programmers to use standard functions while linking in only the modules a program actually references.
:warning: : Each executable still carries its own copy of every library module it uses (for example, the code for `printf`), which wastes disk space and memory across a system.
:warning: : When a static library is updated, every program that uses it must be explicitly relinked to pick up the change, which is time-consuming and hard to maintain.
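
The idea that the linker copies only referenced members out of an archive can be sketched as follows. This is a deliberately simplified model: each member defines exactly one symbol, and members' own undefined references are ignored. `struct member`, `select_members`, and the member names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct member {
    const char *name;      /* archive member, e.g. "addvec.o" */
    const char *defines;   /* the single symbol this toy member defines */
};

/* Mark in `selected` (one int per member) the members that define a symbol
 * the program still needs. Returns the number of members selected. */
size_t select_members(const struct member *archive, size_t n_members,
                      const char *const *undefined, size_t n_undefined,
                      int *selected)
{
    size_t count = 0;

    for (size_t i = 0; i < n_members; i++) {
        selected[i] = 0;
        for (size_t j = 0; j < n_undefined; j++) {
            if (strcmp(archive[i].defines, undefined[j]) == 0) {
                selected[i] = 1;   /* this member would be copied into the executable */
                count++;
                break;
            }
        }
    }
    return count;
}
```

If a program references only `addvec`, an archive holding `addvec.o` and `multvec.o` contributes just `addvec.o`; the unreferenced member is left out of the executable.
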
### Shared Library
* An object module that, **at run time**, can be loaded at an arbitrary memory address and linked with a program in memory.
* The code and data in a shared library are shared by all of the executable object files that reference the library, instead of being copied into each executable as with static libraries.

---
References :
http://www.cs.cmu.edu/afs/cs/academic/class/15213-f19/www/lectures/14-linking.pdf