<style>
details > summary {
display: list-item;
}
</style>
# Booting 1 ~ 3
note:
Hello, everyone. Today, we'll discuss the first three chapters of the book.
----
[TOC]
---
## From Bootloader to Kernel
> [name= 楊皓宇, 紀政良, 李建緯]
note:
Let's start with the first chapter.
----
Power button -> BIOS -> MBR (bootloader)
-> kernel real-mode code
note:
This chapter details the process starting from powering on the device, accessing the BIOS, loading the bootloader from the disk, and finally executing the kernel code.
----
### Power Button
1. The motherboard sends a signal to the power supply device.
2. The power supply provides the proper amount of electricity to the computer.
3. Once the motherboard receives the *power good signal*, it tries to start the CPU.
4. The CPU resets all leftover data in its registers and *sets predefined values for each of them*.
note:
When the power button is pressed, the motherboard sends a signal to the power supply, which then delivers the appropriate amount of electricity to the computer. Upon receiving the "power good" signal, the motherboard initiates the startup of the CPU. The CPU then clears any leftover data in its registers and assigns predefined values to each.
----
#### The Program Counter of x86 in real mode
- Two 16-bit registers form a 20-bit memory address
    - `CS` (Code Segment) (`CS_selector`)
        - Selects the memory segment.
        - `CS_base` is hidden from the programmer.
    - `IP` (Instruction Pointer)
        - The offset within the memory segment.
note:
Now, we'll introduce the program counter in the x86 CPU architecture when operating in real mode, which we'll cover in detail later. In real mode, two visible 16-bit registers combine to form a 20-bit memory address space. CS, standing for Code Segment (or CS_selector), is used to select a memory segment. Additionally, for the Intel 80386 and later models, there's a hidden register called CS_base, designed to accommodate larger memory spaces. Next, the IP, or Instruction Pointer, specifies the offset within the memory segment.
----
- Intel 8086 (16-bit)
```C
CS: 0xFFFF // code segment selector
IP: 0x0000 // instruction pointer
```
- Intel 80286 (16-bit)
```C
CS: 0xF000
IP: 0xFFF0
```
- Intel 80386 and later (32-bit and 64-bit)
```C
CS: 0xF000
IP: 0xFFF0
```
```C
CS_base: 0xFFFF_0000 // hidden
```
note:
These are the predefined values for CS and IP, as illustrated in the slide.
----
### BIOS
1. The processor starts working in *real mode*.
- An operating mode of all x86 CPUs.
- Addresses in real mode always correspond to real locations in memory.
- 20-bit memory address space.
$$
\begin{aligned}
\text{CS Base} &= \text{CS Selector} \times 16\\
\text{Memory Address} &= \text{CS Base} + \text{IP}
\end{aligned}
$$
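
As a quick check of the formula, here is a minimal C sketch. The helper name `real_mode_address` is ours, not something from the kernel or the book, and the 20-bit wraparound of the 8086 is not modeled.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode address = selector * 16 + offset. */
uint32_t real_mode_address(uint16_t selector, uint16_t offset)
{
    /* Multiplying by 16 is the same as shifting left by 4 bits. */
    return ((uint32_t)selector << 4) + offset;
}
```

Both the 8086 reset values (`0xFFFF:0x0000`) and the 80286 reset values (`0xF000:0xFFF0`) map to the same address, `0xFFFF0`.
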
note:
After the processor starts, it operates in "real mode," which is a mode present in all x86 processors. In real mode, addresses directly correspond to physical memory locations, meaning there is no memory protection feature. Regardless of being 16-bit or 32-bit, programs are limited to a 20-bit memory address space.
The relationship between the registers is defined by a formula: CS_base equals CS_selector multiplied by 16, or equivalently, shifted left by 4 bits. The memory address is then calculated by adding the CS_base to the instruction pointer.
----
2. The processor starts executing code at `0xFFFF_FFF0`.
- *Reset vector*
- The memory controller on the motherboard *redirects* the memory read request to the BIOS ROM.
- Commonly, it is a `near jump` instruction, guiding the system to the rest of BIOS boot code.
note:
After the processor starts, it begins executing the first instruction located at a specific address, known as the reset vector. Since RAM is empty at boot, the motherboard's memory controller redirects memory read requests to the BIOS ROM. Notably, the first instruction at the reset vector is typically a near jump command, which directs the system to the remainder of the BIOS boot code.
----
Recall the initial register data and formula:
```http
CS_selector: 0xF000
CS_base: 0xFFFF_0000
IP: 0xFFF0
```
$$
\begin{aligned}
\text{CS Base} &= \text{CS Selector} \times 16\\
\text{Memory Address} &= \text{CS Base} + \text{IP}
\end{aligned}
$$
According to the formula above, `CS_base` should be `0xF_0000` instead of `0xFFFF_0000`.
Why is `CS_base` not `CS_selector` times 16?
note:
Recall the initial register values and the formula. Can you spot what is wrong? According to the formula above, `CS_base` should be `0xF_0000` instead of `0xFFFF_0000`.
Why is `CS_base` not `CS_selector` times 16?
----
- Intel 8086
- 16-bit processor, 20-bit memory address
- Reset vector: `0xF_FFF0`
- Intel 80386 and later
- 32-bit or 64-bit
- Reset vector: `0xFFFF_FFF0`
> In real mode, a program can only access a 20-bit memory address space.
note:
For the Intel 8086, the reset vector corresponds directly to its CS and IP values. However, for the Intel 80386 and later models, the reset vector is a 32-bit address. Considering that real mode only supports a 20-bit memory address space, how can a processor access this larger address?
----
To bridge this gap while staying compatible with the 20-bit addressing of real mode, **modern x86 CPUs initialize the hidden `CS base` to `0xFFFF_0000` at reset instead of deriving it from the `CS selector`.** The `CS selector` and `IP` keep their legacy values, and `CS base` plus `IP` gives the 32-bit reset vector.
```http
CS_selector: 0xF000
CS_base: 0xFFFF_0000
IP: 0xFFF0
Reset_vector: 0xFFFF_FFF0
```
----
Furthermore, the distinction between jump instructions is also important.
A `near jump` changes only `IP` and leaves `CS` as is, so the hidden `CS base` stays at `0xFFFF_0000`. A `far jump`, however, reloads the `CS selector`, and the `CS base` is recomputed from it (`CS selector` × 16), so the two are back in sync.
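
The difference between the two jumps can be sketched with a small model. The struct and function names here are hypothetical, and the jump targets in the usage note are arbitrary illustration values, not addresses taken from a real BIOS.

```c
#include <stdint.h>

/* Hypothetical model of the code-segment state. */
struct cs_state {
    uint16_t selector; /* visible CS selector */
    uint32_t base;     /* hidden CS base */
    uint16_t ip;       /* instruction pointer */
};

/* Reset values of the 80386 and later, as listed on the slides. */
struct cs_state reset_state(void)
{
    struct cs_state s = { 0xF000, 0xFFFF0000u, 0xFFF0 };
    return s;
}

/* The next instruction is fetched from hidden base + IP. */
uint32_t fetch_address(struct cs_state s)
{
    return s.base + s.ip;
}

/* A near jump replaces only IP; the hidden base is untouched. */
struct cs_state near_jump(struct cs_state s, uint16_t target)
{
    s.ip = target;
    return s;
}

/* A far jump reloads the selector; in real mode the base becomes selector * 16. */
struct cs_state far_jump(struct cs_state s, uint16_t selector, uint16_t target)
{
    s.selector = selector;
    s.base = (uint32_t)selector << 4;
    s.ip = target;
    return s;
}
```

In this model, `fetch_address(reset_state())` is `0xFFFFFFF0`. A near jump to `0xE05B` stays just below 4 GB (`0xFFFFE05B`), while a far jump to `0xF000:0xE05B` lands below 1 MB (`0xFE05B`).
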
----
## Bootloader
note:
Next, we'll talk about the bootloader. This essential component serves as the bridge between the firmware's initial power-on state and the loading of the operating system.
----
### How does the bootloader start?
- The BIOS chooses a bootable device according to its configuration.
- The BIOS tries to find a boot sector that ends with a magic number (`0x55`, `0xaa`).
- On a hard drive with an MBR, the bootloader code occupies the first 446 bytes of the first sector.
- The BIOS copies the boot sector to the fixed memory location `0x7c00` and jumps there.
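
The signature check can be sketched as follows. This is only an illustration of the rule "the sector ends with `0x55`, `0xaa`"; the function name is hypothetical and a real BIOS implements the check in firmware.

```c
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns 1 if the 512-byte sector ends with the bytes 0x55, 0xAA. */
int has_boot_signature(const uint8_t sector[SECTOR_SIZE])
{
    return sector[SECTOR_SIZE - 2] == 0x55 &&
           sector[SECTOR_SIZE - 1] == 0xAA;
}
```
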
note:
The BIOS, a basic check-up system for your computer, turns on first. It looks for a place to start the operating system from a list of sources like hard drives or USBs. It searches for a unique code, known as the 'magic number,' that marks a boot sector. Once found, it copies the boot sector to a specific memory location and starts running it.
----
### Bootloader - GRUB 2
#### `boot.img`
- Contains only the code needed to load the core image of GRUB 2.
- Stored in the boot sector, so it must fit in 446 bytes.
- Too small to read a file system; it loads the first sector of the core image by disk location.
note:
boot.img is the first step in using GRUB 2, our main bootloader.
Because of space limits, it's very compact. Its job is to locate and jump to GRUB 2's core program.
----
### Bootloader - GRUB 2
#### `diskboot.img`
- The core image begins with `diskboot.img`
- Stored immediately after the first sector, in the unused space before the first partition.
- Loads the rest of the core image into memory.
- Then, it executes the `grub_main` function.
note:
diskboot.img comes next, stored in a space before the first partition. It loads the complete GRUB 2, which includes the drivers necessary for reading the filesystem into memory, and then starts grub_main.
----
### Bootloader - GRUB 2
#### `grub_main`
- Initialize the console
- Get the base address for modules
- Set the root device
- Load/parse the GRUB configuration file
- Load modules
- ..
note:
grub_main is where the action happens in GRUB 2. It sets up the console for display, figures out where modules are stored, and chooses the 'root' device. Then, it reads the GRUB configuration file to know what else needs to be loaded.
// This step is crucial for loading other modules and preparing the system for use.
----
### Bootloader - GRUB 2
#### `grub_main`
- Moves GRUB to normal mode and calls `grub_normal_execute`
- Shows the boot menu to select an OS
note:
At the end of execution, the grub_main function moves grub to normal mode. The grub_normal_execute function completes the final preparations and shows a menu to select an operating system.
----
### Bootloader - GRUB 2
#### `grub_menu_execute_entry`
- The bootloader loads the kernel according to the memory layout defined by the boot protocol.
- When the bootloader transfers control to the kernel, it starts at:
`X + sizeof(KernelBootSector) + 1`
note:
In the grub_menu_execute_entry phase, GRUB 2 carefully prepares to hand over control to the operating system's kernel. This process is guided by specific rules known as the boot protocol, ensuring that the kernel is loaded correctly into memory.
The transition point, where GRUB hands off to the kernel, is given by the formula shown on the slide. Here, 'X' is the address of the kernel boot sector being loaded, and sizeof(KernelBootSector) is the size of the kernel's boot sector, ensuring the kernel starts executing at exactly the right location. This precision ensures the system boots smoothly.
----
### The Beginning of the Kernel Setup Stage
- The kernel is stored in a compressed format.
- It first sets up the environment for the decompressor and some memory-management structures.
- It then decompresses the actual kernel and jumps to it.
note:
Its first task is to get ready for the main event. Since the kernel is stored in a compressed format to save space, it must first be decompressed. But before that, the kernel sets up the necessary environment. This setup involves configuring the decompressor and arranging memory management components.
After these preparations, the kernel proceeds to decompress itself. Once decompression is complete, the kernel jumps into action, beginning its core functions and taking over the system's operation. This moment is critical, marking the transition from booting to an operational state where the operating system takes the lead.
----
### The boot sector
- Starts with `MZ`, followed by a `PE` header.
- The actual entry point of the kernel setup code is at an offset from the start of the kernel boot sector.
- Execution jumps over the setup header fields and continues at `start_of_setup`.
note:
The boot sector is essentially the starting line for the operating system's kernel. It has a unique signature that begins with 'MZ', followed by a 'PE' header.
However, the actual starting point for setting up the kernel isn't right at the beginning of this sector. It's a bit further along, at a specific distance known as an offset.
Once we reach this starting point, the kernel begins its setup routine, known as start_of_setup. This phase involves a variety of settings, known as headers, which tell the kernel how to behave and what resources it has at its disposal.
That's a closer look at how a computer transitions from off to fully operational through the bootloader and kernel setup. This process, while complex, ensures that your computer starts correctly and is ready for use.
---
#### Bootloader -> kernel (real-mode code)
```
bzImage
|
|---- setup.elf // <---- header.S and some c code
|
|---- vmlinux
|
|---- setup.bin
|
|---- vmlinux.bin.gz
```
----
### Kernel real-mode code
![image](https://hackmd.io/_uploads/HJhmuEjTT.png)
<!--
```=118
| Protected-mode kernel |
100000 +------------------------+
| I/O memory hole |
0A0000 +------------------------+
| Reserved for BIOS | Leave as much as possible unused
~ ~
| Command line | (Can also be below the X+10000 mark)
X+10000 +------------------------+
| Stack/heap | For use by the kernel real-mode code.
X+08000 +------------------------+
| Kernel setup | The kernel real-mode code.
| Kernel boot sector | The kernel legacy boot sector.
X +------------------------+
| Boot loader | <- Boot sector entry point 0000:7C00
001000 +------------------------+
| Reserved for MBR/BIOS |
000800 +------------------------+
| Typically used by MBR |
000600 +------------------------+
| BIOS use only |
000000 +------------------------+
``` -->
<!-- ![image](https://hackmd.io/_uploads/Hko9AAcp6.png) -->
#### In this case X = 0x10000;
#### Kernel setup is at 0x10200
----
[header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S)
1. legacy boot sector (512 bytes)
2. first part of kernel setup
3. (boot header information)
----
#### Kernel legacy code
<!-- <span style="font-size:0.5em;"> -->
[header.S](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/header.S)
![螢幕快照 2024-03-10 00-13-10](https://hackmd.io/_uploads/H1lzpNoaT.png)
<!--
``` =39
.code16
.section ".bstext", "ax"
.global bootsect_start
bootsect_start:
#ifdef CONFIG_EFI_STUB
# "MZ", MS-DOS header
.byte 0x4d
.byte 0x5a
#endif
``` -->
<!--
``` =279
.globl hdr
hdr:
setup_sects: .byte 0 /* Filled in by build.c */
root_flags: .word ROOT_RDONLY
syssize: .long 0 /* Filled in by build.c */
ram_size: .word 0 /* Obsolete */
vid_mode: .word SVGA_MODE
root_dev: .word 0 /* Filled in by build.c */
boot_flag: .word 0xAA55
# offset 512, entry point
.globl _start
``` -->
<!-- </span> -->
----
#### Kernel legacy code
![螢幕快照 2024-03-10 00-13-50](https://hackmd.io/_uploads/rkgtTNo6a.png)
##### According to the Linux [boot protocol](https://github.com/torvalds/linux/blob/v4.16/Documentation/x86/boot.txt)
----
#### Kernel legacy code
<!-- <span style="font-size:0.6em;"> -->
- Contains code that shows an error message.
- It runs only if no third-party bootloader is used and the BIOS loads the first sector of the kernel image to `0x7c00` and executes it.
![image](https://hackmd.io/_uploads/B14lzksT6.png)
<!-- </span> -->
----
#### Kernel setup code in header.S
```
// first part of setup header
```
![image](https://hackmd.io/_uploads/SkvOg_o66.png)
```
// second part of setup header
```
<!-- <span style="font-size:0.5em;">
``` =289
# offset 512, entry point
.globl _start
_start:
# Explicitly enter this as bytes, or the assembler
# tries to generate a 3-byte jump here, which causes
# everything else to push off to the wrong offset.
.byte 0xeb # short (2-byte) jump
.byte start_of_setup-1f
1:
```
``` =558
init_size: .long INIT_SIZE # kernel initialization size
handover_offset: .long 0 # Filled in by build.c
# End of setup header #####################################################
.section ".entrytext", "ax"
start_of_setup:
# Force %es = %ds
movw %ds, %ax
movw %ax, %es
cld
```
</span> -->
<!-- <span style="font-size:0.5em;">
``` =616
# Jump to C code (should not return)
calll main
```
</span> -->
![image](https://hackmd.io/_uploads/HyQSWujp6.png)
----
### Kernel setup code in header.S
<!-- <span style="font-size:0.5em;"> -->
- Aligning the segment registers
- Stack setup
- BSS setup
- Jump to main
<!-- </span> -->
----
### Aligning the segment registers
![image](https://hackmd.io/_uploads/SyzoYC5aT.png)
<!-- <span style="font-size:0.5em;"> -->
#### we want es=cs=ss=ds=0x1000
<!-- </span> -->
----
### Aligning the segment registers
<!-- <span style="font-size:0.7em;"> -->
Make sure that all segment register values are equal
- <font color=#FFFF00>ds</font> is already <font color=#00FFFF>0x1000</font>
- Force <font color=#00FF00>es</font> = <font color=#FFFF00>ds</font>
- stack setup (set sp), then let <font color=#FF00FF>ss</font> = <font color=#FFFF00>ds</font>
- <font color=#FF0000>cs</font> is currently <font color=#00FFFF>0x1020</font>, and it cannot be written directly
<!-- </span> -->
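
To see where `0x1000` and `0x1020` come from, here is a small sketch that assumes `X = 0x10000` as on the earlier slide. The helper names are hypothetical.

```c
#include <stdint.h>

/* Segment value that reaches a 16-byte-aligned physical address at offset 0. */
uint16_t segment_of(uint32_t phys)
{
    return (uint16_t)(phys >> 4);
}

/* Offset to pair with `segment` so that segment:offset still reaches `phys`. */
uint16_t offset_in(uint16_t segment, uint32_t phys)
{
    return (uint16_t)(phys - ((uint32_t)segment << 4));
}
```

With these helpers, the load address `0x10000` gives `ds = 0x1000`, and the entry point `0x10200` gives `cs = 0x1020` with offset 0. The same entry point is also reachable as `0x1000:0x0200`, which is why `cs` can be aligned with `ds` without moving any code.
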
----
### Aligning the segment registers
<!-- <span style="font-size:0.7em;"> -->
- <font size = "6"> grub2 loads kernel setup code at address <font color=#00FFFF>0x10000</font>, but starts from<font color=#FF7788> 0x10200 </font> </font>
- <font size = "6"> we want to let <font color=#FF0000>cs</font> = <font color=#00FFFF>0x1000</font></font>
- <font size = "6"> push <font color=#FFFF00>ds</font> and the address of the next instruction </font>
- <font size = "6"> then 'return' </font>
![image](https://hackmd.io/_uploads/HyasTrjTa.png)
<!-- ```c=599
pushw %ds
pushw $6f
lretw
6:
# Check signature at end of setup
cmpl $0x5a5aaa55, setup_sig
jne setup_bad
``` -->
<!-- </span> -->
----
### Stack setup
- <font size="6">in loadflags, if <font color=#FF0000> CAN_USE_HEAP </font>is clear (=0)</font>
- <font size="6">sp = _end + STACK_SIZE</font>
![image](https://hackmd.io/_uploads/SyxZrp9TT.png)
----
### Stack setup
- <font size="6">if <font color=#FF0000> CAN_USE_HEAP</font> flag is set (=1)</font>
- <font size="6"> sp = heap_end_ptr + STACK_SIZE </font>
<font size="6"> or </font>
- <font size="6"> sp = 0xFFFC </font>
<font size="6"> if heap_end_ptr + STACK_SIZE overflows </font>
![image](https://hackmd.io/_uploads/BJ7DrT5aT.png)
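
The two cases on the stack setup slides can be combined into one C model. This is a sketch, not the kernel's code: the function name is hypothetical, `stack_size` is a parameter because the value of `STACK_SIZE` depends on the kernel version, and the 4-byte alignment step is an assumption based on our reading of `header.S`.

```c
#include <stdint.h>

/* Hypothetical model of how the setup code picks the stack pointer. */
uint16_t choose_sp(int can_use_heap, uint16_t end, uint16_t heap_end_ptr,
                   uint16_t stack_size)
{
    /* CAN_USE_HEAP clear: start from _end; set: start from heap_end_ptr. */
    uint16_t base = can_use_heap ? heap_end_ptr : end;
    uint32_t sum = (uint32_t)base + stack_size;
    /* A 16-bit overflow is turned into zero to prevent wraparound. */
    uint16_t sp = (sum > 0xFFFF) ? 0 : (uint16_t)sum;

    sp &= (uint16_t)~3u;   /* assumption: align down to 4 bytes */
    if (sp == 0)
        sp = 0xFFFC;       /* never leave sp at zero */
    return sp;
}
```
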
----
#### BSS setup
- <font size="6">Linux carefully ensures this area of memory is zeroed</font>
- <font size="6">The code writes zeros from __bss_start to _end</font>
![image](https://hackmd.io/_uploads/rkPyf0q6p.png)
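
In C terms, clearing the BSS amounts to the loop below. This is a byte-wise analogue with a hypothetical function name; the code in `header.S` uses a string-store instruction rather than a C loop.

```c
#include <stdint.h>

/* Zero every byte in [start, end), like the range __bss_start.._end. */
void clear_range(uint8_t *start, uint8_t *end)
{
    for (uint8_t *p = start; p < end; p++)
        *p = 0;
}
```
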
----
<font size="5"> Now we have the stack and BSS, so we can jump to the main() C function </font>
![螢幕快照 2024-03-10 00-16-16](https://hackmd.io/_uploads/BJtviSo66.png)
----
### Takeaway Questions (1)
- What is the entry point of the BIOS code for a 64-bit x86 CPU?
(A). `0xF_FFF0`
(B). `0xFFFF_FFF0`
\(C\). `0xFFFF_FFFF_FFFF_FFF0`
----
### Takeaway Questions (2)
- Which stage looks for a boot sector that ends with the specific signature (`0x55`, `0xaa`)?
(A). Bootloader
(B). BIOS
\(C\). Kernel setup
----
### Takeaway Questions (3)
- What is the entry point of the kernel code in this case?
(A). `0x7C00`
(B). `0x10000`
\(C\). `0x10200`
---
## First steps in the kernel setup code
> [name=高士軒, 黃爾泰]
----
### Protected mode
- `Protected mode` was the main mode of Intel processors from the 80286 until Intel 64 and `long mode` arrived.
- `Real mode` gives very limited access to RAM: only `1 MB`.
- The 20-bit address bus was replaced with a 32-bit address bus.
- This allows access to 4 GB of memory.
- The main difference between `real mode` and `protected mode` is memory management.
----
#### Memory management in protected mode
- The size and location of each segment is described by an associated data structure called the `Segment Descriptor`.
- These segment descriptors are stored in a data structure called the `Global Descriptor Table (GDT)`.
- The address of GDT is stored in the special `GDTR` register. There will be an operation for loading it from memory, something like:
```
lgdt gdt
```
----
#### Segment descriptor in GDT
- Each descriptor is 64 bits in size. The general scheme of a descriptor is:
```
 63         56         51   48    45           39        32
------------------------------------------------------------
|             | |B| |A|       | |   | |0|E|W|A|            |
| BASE 31:24  |G|/|L|V| LIMIT |P|DPL|S|  TYPE | BASE 23:16 |
|             | |D| |L| 19:16 | |   | |1|C|R|A|            |
------------------------------------------------------------
 31                         16 15                         0
------------------------------------------------------------
|                             |                            |
|          BASE 15:0          |       LIMIT 15:0           |
|                             |                            |
------------------------------------------------------------
```
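
The layout above can be turned into a small packing routine. This is a minimal sketch with hypothetical names. It assumes the flags argument carries bits 40-47 of the descriptor (TYPE, S, DPL, P) in its low byte and bits 52-55 (AVL, L, D/B, G) in its top four bits.

```c
#include <stdint.h>

/* Hypothetical helper: pack base, limit and flags into a 64-bit descriptor. */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint16_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0x0000FFFFu);        /* LIMIT 15:0  -> bits 0-15  */
    d |= (uint64_t)(base  & 0x00FFFFFFu) << 16;  /* BASE 23:0   -> bits 16-39 */
    d |= (uint64_t)(flags & 0xF0FFu) << 40;      /* flags       -> bits 40-47, 52-55 */
    d |= (uint64_t)(limit & 0x000F0000u) << 32;  /* LIMIT 19:16 -> bits 48-51 */
    d |= (uint64_t)(base  & 0xFF000000u) << 32;  /* BASE 31:24  -> bits 56-63 */
    return d;
}

/* Reassemble the 32-bit base from its two pieces. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0x00FFFFFFu) |
           (uint32_t)((d >> 32) & 0xFF000000u);
}
```

For example, a flat 4 GB code segment (base 0, limit `0xFFFFF`, flags `0xC09B`) packs to `0x00CF9B000000FFFF`.
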
----
#### Segment selector in protected mode
- Segment registers contain segment selectors as in `real mode`.
- Each Segment Descriptor has an associated Segment Selector which is a 16-bit structure:
```c
 15             3 2  1     0
-----------------------------
|      Index     | TI | RPL |
-----------------------------
```
- `Index` stores the index number of the descriptor in the GDT.
- `TI` (Table Indicator) selects the table: 0 for the GDT, 1 for the LDT.
- `RPL` contains the Requester's Privilege Level.
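
Splitting a selector into its fields is a matter of shifts and masks. The sketch below uses hypothetical names; the factor of 8 follows from each descriptor being 64 bits.

```c
#include <stdint.h>

struct selector_fields {
    uint16_t index; /* bits 15..3: descriptor index */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1;
    f.rpl = sel & 3;
    return f;
}

/* Byte offset of the descriptor inside its table (8 bytes per entry). */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8;
}
```

For the selector `0x10`, the index is 2, `TI` is 0 and `RPL` is 0, so the descriptor sits 16 bytes into the GDT.
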
----
#### Get physical address in protected mode
- Locate the descriptor at `GDT address + Index × 8` (each descriptor is 8 bytes), using the `Index` from the selector.
- Add the offset to the segment base address stored in that descriptor.
![Physical address](https://hackmd.io/_uploads/Hk8Lhms66.png)
----
### Copy boot parameters
- The first function called in `main` is `copy_boot_params(void)`.
- The `boot_params` structure includes a member `setup_header hdr`, which contains parameters used in later initialization.
- It uses `memcpy`, defined in `copy.S`, to copy `hdr` into `boot_params`.
----
### Console initialization
- Next, the `console_init` function is called.
- It tries to parse the port address and baud rate of the serial port and initializes it.
- It outputs the string below to test whether the serial port initialization succeeded.
```c
puts("early console in setup code\n");
```
- The `puts` function prints character by character using BIOS interrupt `0x10`.
----
### Heap initialization
- Initialize the heap with the `init_heap` function.
- Checks `CAN_USE_HEAP` flag from `loadflags`.
- `loadflags` is a bitmask that also holds other flags.
- The inline assembly calculates the address of `stack_end`:
- `stack_end = esp - STACK_SIZE;`
```c
char *stack_end;
if (boot_params.hdr.loadflags & CAN_USE_HEAP) {
asm("leal %P1(%%esp),%0"
: "=r" (stack_end) : "i" (-STACK_SIZE));
```
----
### Heap initialization
- `heap_end` is declared in another header file.
- The last check is whether `heap_end` is greater than `stack_end`. If it is, `stack_end` is assigned to `heap_end` so that the heap does not overlap the stack.
```c
if (heap_end > stack_end)
    heap_end = stack_end;
```
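
Putting the two heap slides together, the calculation can be modeled with plain integers. This is a sketch with a hypothetical function name; the `0x200` added to `heap_end_ptr` is an assumption based on our reading of `main.c` and is not shown on the slides.

```c
#include <stdint.h>

/* Hypothetical model of init_heap(): addresses are plain integers here. */
uint32_t compute_heap_end(uint32_t esp, uint32_t stack_size,
                          uint32_t heap_end_ptr)
{
    uint32_t stack_end = esp - stack_size;     /* lowest address kept for the stack */
    uint32_t heap_end = heap_end_ptr + 0x200;  /* assumption: offset used by the kernel */

    if (heap_end > stack_end)
        heap_end = stack_end;                  /* the heap must not reach the stack */
    return heap_end;
}
```
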
----
![The first example](https://hackmd.io/_uploads/BksbR7jaT.png)
----
![The second example](https://hackmd.io/_uploads/S1W30boTp.png)
---
### Takeaway Questions (4)
- What is the main reason to use `protected mode`?
(A). Faster execution speed.
(B). More available memory space.
\(C\). Lower hardware requirements.
----
### CPU Validation
- Function `validate_cpu`
- Checks whether the CPU is at the required CPU level by calling `check_cpu`
```
check_cpu(&cpu_level, &req_level, &err_flags);
if (cpu_level < req_level) {
...
return -1;
}
```
----
### CPU Validation
- Function `check_cpu`
1. Checks that the CPU has the required level
e.g. long mode on x86_64
1. Makes vendor-specific preparations
e.g. turning on SSE+SSE2 on AMD CPUs if they are missing
----
### CPU Validation
- After validation of CPU, `set_bios_mode` is called.
- It issues BIOS interrupt `0x15` to tell the BIOS that long mode will be used.
----
### Memory Detection
- The next step is to get information about memory from the BIOS.
- The `detect_memory` function provides a map of available RAM to the CPU.
- There are several programming interfaces for memory detection, such as `0xe820`, `0xe801` and `0x88`.
- We will take `0xe820` as an example.
----
### Memory Detection:[`detect_memory_e820`](https://github.com/torvalds/linux/blob/0adb32858b0bddf4ada5f364a84ed60b196dbcda/arch/x86/boot/memory.c#L20)
- Initialize the `biosregs` structure with `0xe820` call.
```
initregs(&ireg);
ireg.ax = 0xe820;
ireg.cx = sizeof buf;
ireg.edx = SMAP;
ireg.di = (size_t)&buf;
```
----
### Memory Detection:[`detect_memory_e820`](https://github.com/torvalds/linux/blob/0adb32858b0bddf4ada5f364a84ed60b196dbcda/arch/x86/boot/memory.c#L20)
1. `ax` : the number of the function (0xe820)
1. `cx` : size of the buffer which will contain data about the memory (`sizeof buf`)
1. `edx` : `SMAP`(ASCII) magic number
1. `es:di` : contain the address of the buffer (`&buf`)
1. `ebx` : zero on the first call; afterwards, the continuation value returned by the previous call.
----
### Memory Detection:[`detect_memory_e820`](https://github.com/torvalds/linux/blob/0adb32858b0bddf4ada5f364a84ed60b196dbcda/arch/x86/boot/memory.c#L20)
- A loop calls `intcall(0x15, &ireg, &oreg)`.
- It gets memory information through the BIOS interrupt.
- Each call returns one entry, so the interrupt is called repeatedly.
```
// in each iteration
intcall(0x15, &ireg, &oreg);
ireg.ebx = oreg.ebx; // pass the returned continuation value to the next call
```
- The loop terminates when `ebx` = 0.
- The collected data is written into an array of `e820_entry`.
----
### Memory Detection:[`detect_memory_e820`](https://github.com/torvalds/linux/blob/0adb32858b0bddf4ada5f364a84ed60b196dbcda/arch/x86/boot/memory.c#L20)
- Each `e820_entry` contains
1. start of memory segment
1. size of memory segment
1. type of memory segment (usable or reserved)
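
The continuation loop can be sketched without a real BIOS. Everything below is a hypothetical stand-in: `fake_e820_call` plays the role of `intcall(0x15, ...)`, `struct mem_entry` mirrors the three fields listed above, and the table values are copied from the `dmesg` sample in this deck.

```c
#include <stdint.h>

struct mem_entry {   /* hypothetical stand-in for e820_entry */
    uint64_t addr;   /* start of the region */
    uint64_t size;   /* length in bytes */
    uint32_t type;   /* 1 = usable, 2 = reserved */
};

/* Fake firmware table standing in for the BIOS. */
static const struct mem_entry fake_map[] = {
    { 0x00000000, 0x0009FC00, 1 },
    { 0x0009FC00, 0x00000400, 2 },
    { 0x00100000, 0x3FEE0000, 1 },
};
#define FAKE_COUNT (sizeof fake_map / sizeof fake_map[0])

/* Mimics one 0xe820 call: fills one entry and returns the next ebx. */
static uint32_t fake_e820_call(uint32_t ebx, struct mem_entry *out)
{
    *out = fake_map[ebx];
    return (ebx + 1 < FAKE_COUNT) ? ebx + 1 : 0;
}

/* Collect entries until the continuation value returns to zero. */
int collect_memory_map(struct mem_entry *table, int max_entries)
{
    uint32_t ebx = 0;
    int count = 0;

    if (max_entries <= 0)
        return 0;
    do {
        ebx = fake_e820_call(ebx, &table[count]);
        count++;
    } while (ebx != 0 && count < max_entries);

    return count;
}
```

The loop stops either when the fake BIOS reports `ebx == 0` or when the caller's table is full.
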
----
- Sample `dmesg` output:
```
[ 0.0] e820: BIOS-provided physical RAM map:
[ 0.0] BIOS-e820: [mem 0x00000000-0x0009fbff] usable
[ 0.0] BIOS-e820: [mem 0x0009fc00-0x0009ffff] reserved
[ 0.0] BIOS-e820: [mem 0x000f0000-0x000fffff] reserved
[ 0.0] BIOS-e820: [mem 0x00100000-0x3ffdffff] usable
[ 0.0] BIOS-e820: [mem 0x3ffe0000-0x3fffffff] reserved
[ 0.0] BIOS-e820: [mem 0xfffc0000-0xffffffff] reserved
```
----
### Keyboard Initialization
- `keyboard_init` function
- Call interrupt `0x16` to query the status of the keyboard
```
initregs(&ireg);
ireg.ah = 0x02; /* Get keyboard status */
intcall(0x16, &ireg, &oreg);
boot_params.kbd_status = oreg.al;
```
- Call interrupt `0x16` again to set repeat rate and delay.
```
ireg.ax = 0x0305; /* Set keyboard repeat rate */
```
----
### Querying
- The next steps are queries for different parameters.
- We will not dive into the details of these queries now.
- In the next few slides, we will look at some of these functions as examples.
----
### Function: `query_ist`
- `query_ist` gets Intel SpeedStep information (SpeedStep is a variable CPU frequency feature provided by Intel).
- It first checks the CPU level.
- If the level is sufficient, it calls interrupt `0x15` to get the information and saves the result to `boot_params`.
----
### Function: `query_apm_bios`
- `query_apm_bios` calls the interrupt `0x15` with `ah=0x53` to check `APM` installation.
- APM : Advanced Power Management
- A standard for power management
----
### Function: `query_apm_bios`
- Next, it calls `0x15` again, but with `ax=0x5304`, to disconnect the `APM` interface, and then connects the 32-bit protected mode interface.
- In the end, it fills `boot_params.apm_bios_info` with values obtained from the BIOS.
----
### Function: `query_apm_bios`
- Note: `query_apm_bios` is executed only if the `CONFIG_APM` or `CONFIG_APM_MODULE` compile-time option is set.
----
### Function: `query_edd`
- The last is the `query_edd` function, which queries EDD (`Enhanced Disk Drive`) information from the BIOS.
- Enhanced Disk Drive
- An interface that provides better access to hard drives
- Can be disabled on the kernel command line
- It also uses a loop to query this information
----
### Function: `query_edd`
- The simplified code of `query_edd`
```
for (devno = 0x80; devno < 0x80 + EDD_MBR_SIG_MAX; devno++) {
if (!get_edd_info(devno, &ei) &&
boot_params.eddbuf_entries < EDDMAXNR) {
memcpy(edp, &ei, sizeof ei);
edp++;
boot_params.eddbuf_entries++;
}
...
...
...
}
```
----
### Takeaway Questions (5)
- What does function `validate_cpu` do?
(A). Check whether the CPU is at the required level.
(B). Check whether CPU is broken.
\(C\). Benchmark to check the speed of the CPU.
----
### Takeaway Questions (6)
- What does function `detect_memory_e820` do?
(A). Detect the bandwidth of memory.
(B). Detect the DRAM generation.
\(C\). Get information about the available memory address ranges.
---
## Video mode initialization and transition to protected mode
> [name=Ting Shiuan Guan, Tim Lin]
----
`main()` in [`arch/x86/boot/main.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/main.c#L177-L182)
```c=177
/* Set the video mode */
set_video();
/* Do the last things and invoke protected mode */
go_to_protected_mode();
}
```
----
### Kernel data types
| size (bytes) | 1 | 2 | 4 | 8 |
| :--: | :--: | :---: | :--: | :--: |
| signed type | char | short | int | long |
| unsigned type | u8 | u16 | u32 | u64 |
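
The unsigned names in the table are plain typedefs. The sketch below shows one common way to define them and checks the sizes at compile time. It is an illustration, not the kernel's header. Note that `long` is 8 bytes only on LP64 targets, so the 8-byte type is written as `long long` here, which also holds for 32-bit builds.

```c
typedef unsigned char      u8;
typedef unsigned short     u16;
typedef unsigned int       u32;
typedef unsigned long long u64;

/* Compile-time checks of the sizes listed in the table. */
_Static_assert(sizeof(u8)  == 1, "u8 must be 1 byte");
_Static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
_Static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");
_Static_assert(sizeof(u64) == 8, "u64 must be 8 bytes");
```
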
----
### Heap API
Defined in [`arch/x86/boot/boot.h`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/boot.h#L174)
```c=170 [1-100|5|15-16|6-14|18-21]
/* Heap -- available for dynamic lists. */
extern char _end[];
extern char *HEAP;
extern char *heap_end;
#define RESET_HEAP() ((void *)( HEAP = _end ))
static inline char *__get_heap(size_t s, size_t a, size_t n)
{
char *tmp;
HEAP = (char *)(((size_t)HEAP+(a-1)) & ~(a-1));
tmp = HEAP;
HEAP += s*n;
return tmp;
}
#define GET_HEAP(type, n) \
((type *)__get_heap(sizeof(type),__alignof__(type),(n)))
static inline bool heap_free(size_t n)
{
return (int)(heap_end-HEAP) >= (int)n;
}
```
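
The heap API above is a bump allocator: align the pointer, hand it out, and advance it. The sketch below reproduces the idea in user space with a fixed array in place of the memory after `_end`. All names are hypothetical, and, as in the original, the caller is expected to check for room before allocating.

```c
#include <stddef.h>
#include <stdint.h>

static _Alignas(16) char arena[256];            /* stand-in for the memory after _end */
static char *heap_ptr = arena;                  /* plays the role of HEAP */
static char *arena_end = arena + sizeof arena;  /* plays the role of heap_end */

/* Counterpart of RESET_HEAP(): forget every allocation at once. */
void reset_heap(void)
{
    heap_ptr = arena;
}

/* Round the pointer up to alignment a (a power of two), then reserve s * n bytes. */
char *get_heap(size_t s, size_t a, size_t n)
{
    char *tmp;

    heap_ptr = (char *)(((uintptr_t)heap_ptr + (a - 1)) & ~(uintptr_t)(a - 1));
    tmp = heap_ptr;
    heap_ptr += s * n;
    return tmp;
}

/* Counterpart of heap_free(): is there room for n more bytes? */
int heap_has_room(size_t n)
{
    return (size_t)(arena_end - heap_ptr) >= n;
}
```

After allocating three single bytes, a request for 4-byte-aligned data starts at offset 4 rather than 3, which is the effect of the `(a-1)` masking step.
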
----
### Video Mode
`main()` in [`arch/x86/boot/main.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/main.c#L177-L182)
```c=177 [2]
/* Set the video mode */
set_video();
/* Do the last things and invoke protected mode */
go_to_protected_mode();
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=319 []
void set_video(void)
{
u16 mode = boot_params.hdr.vid_mode;
RESET_HEAP();
store_mode_params();
save_screen();
probe_cards(0);
for (;;) {
if (mode == ASK_VGA)
mode = mode_menu();
if (!set_mode(mode))
break;
printf("Undefined video mode number: %x\n", mode);
mode = ASK_VGA;
}
boot_params.hdr.vid_mode = mode;
vesa_store_edid();
store_mode_params();
if (do_restore)
restore_screen();
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=319 [3]
void set_video(void)
{
u16 mode = boot_params.hdr.vid_mode;
RESET_HEAP();
```
<!-- * Filled in [`copy_boot_params`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/main.c#L30) -->
* Boot protocol
* header `vid_mode`
* offset `0x01FA` / size `2`
* command line options `vga=<mode>`
* integer / `normal` / `ext` / `ask`
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=321 [3,5]
u16 mode = boot_params.hdr.vid_mode;
RESET_HEAP();
store_mode_params();
save_screen();
probe_cards(0);
```
* Reset the heap.
* Store the current mode parameters in `boot_params.screen_info`.
----
In [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c)
`store_mode_params()`
```c=57 [7,17-18|20-26|28-30|32-41]
/*
* Store the video mode parameters for later usage by the kernel.
* This is done by asking the BIOS except for the rows/columns
* parameters in the default 80x25 mode -- these are set directly,
* because some very obscure BIOSes supply insane values.
*/
static void store_mode_params(void)
{
u16 font_size;
int x, y;
/* For graphics mode, it is up to the mode-setting driver
(currently only video-vesa.c) to store the parameters */
if (graphic_mode)
return;
store_cursor_position();
store_video_mode();
if (boot_params.screen_info.orig_video_mode == 0x07) {
/* MDA, HGC, or VGA in monochrome mode */
video_segment = 0xb000;
} else {
/* CGA, EGA, VGA and so forth */
video_segment = 0xb800;
}
set_fs(0);
font_size = rdfs16(0x485); /* Font size, BIOS area */
boot_params.screen_info.orig_video_points = font_size;
x = rdfs16(0x44a);
y = (adapter == ADAPTER_CGA) ? 25 : rdfs8(0x484)+1;
if (force_x)
x = force_x;
if (force_y)
y = force_y;
boot_params.screen_info.orig_video_cols = x;
boot_params.screen_info.orig_video_lines = y;
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=323 [4]
RESET_HEAP();
store_mode_params();
save_screen();
probe_cards(0);
```
* Save contents of screen to heap.
----
`save_screen()`
```c=241 [12-15]
static void save_screen(void)
{
/* Should be called after store_mode_params() */
saved.x = boot_params.screen_info.orig_video_cols;
saved.y = boot_params.screen_info.orig_video_lines;
saved.curx = boot_params.screen_info.orig_x;
saved.cury = boot_params.screen_info.orig_y;
if (!heap_free(saved.x*saved.y*sizeof(u16)+512))
return; /* Not enough heap to save the screen */
saved.data = GET_HEAP(u16, saved.x*saved.y);
set_fs(video_segment);
copy_from_fs(saved.data, 0, saved.x*saved.y*sizeof(u16));
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=325 [3]
store_mode_params();
save_screen();
probe_cards(0);
for (;;) {
```
* Probe video drivers and generate mode lists.
----
`probe_cards()` in [`arch/x86/boot/video-mode.c`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video-mode.c#L30-L49)
```c=30 [12,19,15]
/* Probe the video drivers and have them generate their mode lists. */
void probe_cards(int unsafe)
{
struct card_info *card;
static u8 probed[2];
if (probed[unsafe])
return;
probed[unsafe] = 1;
for (card = video_cards; card < video_cards_end; card++) {
if (card->unsafe == unsafe) {
if (card->probe)
card->nmodes = card->probe();
else
card->nmodes = 0;
}
}
}
```
----
`video_vga` in [`arch/x86/boot/video-vga.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video-vga.c#L284-L288)
```c=284
static __videocard video_vga = {
.card_name = "VGA",
.probe = vga_probe,
.set_mode = vga_set_mode,
};
```
`__videocard` macro in [`arch/x86/boot/video.h`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.h#L83-L84)
```c=83
#define __videocard struct card_info __section(".videocards") __attribute__((used))
```
----
`video_cards` in [`arch/x86/boot/setup.ld`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/setup.ld#L29-L33)
```ld=29
.videocards : {
video_cards = .;
*(.videocards)
video_cards_end = .;
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=327 [3,12,7-8|4-5,10-11]
probe_cards(0);
for (;;) {
if (mode == ASK_VGA)
mode = mode_menu();
if (!set_mode(mode))
break;
printf("Undefined video mode number: %x\n", mode);
mode = ASK_VGA;
}
boot_params.hdr.vid_mode = mode;
```
----
:::spoiler `mode_menu()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L202-L232)
```c=202
static unsigned int mode_menu(void)
{
int key;
unsigned int sel;
puts("Press <ENTER> to see video modes available, "
"<SPACE> to continue, or wait 30 sec\n");
kbd_flush();
while (1) {
key = getchar_timeout();
if (key == ' ' || key == 0)
return VIDEO_CURRENT_MODE; /* Default */
if (key == '\r')
break;
putchar('\a'); /* Beep! */
}
for (;;) {
display_menu();
puts("Enter a video mode or \"scan\" to scan for "
"additional modes: ");
sel = get_entry();
if (sel != SCAN)
return sel;
probe_cards(1);
}
}
```
:::
![image](https://hackmd.io/_uploads/SkZ1bX96a.png)
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=329 [5-6,11]
for (;;) {
if (mode == ASK_VGA)
mode = mode_menu();
if (!set_mode(mode))
break;
printf("Undefined video mode number: %x\n", mode);
mode = ASK_VGA;
}
boot_params.hdr.vid_mode = mode;
vesa_store_edid();
```
----
`set_mode()` in [`arch/x86/boot/video-mode.c`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video-mode.c#L144-L171)
```c=144 [15]
/* Set mode (with recalc if specified) */
int set_mode(u16 mode)
{
int rv;
u16 real_mode;
/* Very special mode numbers... */
if (mode == VIDEO_CURRENT_MODE)
return 0; /* Nothing to do... */
else if (mode == NORMAL_VGA)
mode = VIDEO_80x25;
else if (mode == EXTENDED_VGA)
mode = VIDEO_8POINT;
rv = raw_set_mode(mode, &real_mode);
if (rv)
return rv;
if (mode & VIDEO_RECALC)
vga_recalc_vertical();
/* Save the canonical mode number for the kernel, not
an alias, size specification or menu position */
#ifndef _WAKEUP
boot_params.hdr.vid_mode = real_mode;
#endif
return 0;
}
```
----
`raw_set_mode()` in [`arch/x86/boot/video-mode.c`](https://github.com/torvalds/linux/blob/master/arch/x86/boot/video-mode.c#L69-L111)
```c=69 [13,22]
/* Set mode (without recalc) */
static int raw_set_mode(u16 mode, u16 *real_mode)
{
int nmode, i;
struct card_info *card;
struct mode_info *mi;
/* Drop the recalc bit if set */
mode &= ~VIDEO_RECALC;
/* Scan for mode based on fixed ID, position, or resolution */
nmode = 0;
for (card = video_cards; card < video_cards_end; card++) {
mi = card->modes;
for (i = 0; i < card->nmodes; i++, mi++) {
int visible = mi->x || mi->y;
if ((mode == nmode && visible) ||
mode == mi->mode ||
mode == (mi->y << 8)+mi->x) {
*real_mode = mi->mode;
return card->set_mode(mi);
}
if (visible)
nmode++;
}
}
/* Nothing found? Is it an "exceptional" (unprobed) mode? */
for (card = video_cards; card < video_cards_end; card++) {
if (mode >= card->xmode_first &&
mode < card->xmode_first+card->xmode_n) {
struct mode_info mix;
*real_mode = mix.mode = mode;
mix.x = mix.y = 0;
return card->set_mode(&mix);
}
}
/* Otherwise, failure... */
return -1;
}
```
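----
### Sketch: how a mode number is matched
`raw_set_mode()` accepts a menu position, a fixed mode ID, or a packed resolution (`rows << 8` plus columns). A simplified sketch of that lookup over a single table; `struct text_mode`, `mode_from_resolution`, `find_mode` and the IDs used with it are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down version of a mode table entry. */
struct text_mode {
    uint16_t id;    /* fixed mode ID */
    uint8_t cols;   /* text columns */
    uint8_t rows;   /* text rows */
};

/* Pack a resolution the way the lookup compares it:
 * rows in the high byte, columns in the low byte. */
static uint16_t mode_from_resolution(uint8_t cols, uint8_t rows)
{
    return (uint16_t)((rows << 8) + cols);
}

/* Return the index of the first entry that matches `wanted` by menu
 * position, fixed ID, or packed resolution; -1 if nothing matches. */
static int find_mode(const struct text_mode *modes, int n, uint16_t wanted)
{
    int pos = 0;    /* menu position counts visible modes only */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int visible = modes[i].cols || modes[i].rows;

        if ((wanted == pos && visible) ||
            wanted == modes[i].id ||
            wanted == mode_from_resolution(modes[i].cols, modes[i].rows))
            return i;
        if (visible)
            pos++;
    }
    return -1;
}
```

With this packing, 80 columns by 25 rows becomes `0x1950`.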
----
[arch/x86/boot/video-vga.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video-vga.c)
```c=284 [4]
static __videocard video_vga = {
.card_name = "VGA",
.probe = vga_probe,
.set_mode = vga_set_mode,
};
```
`vga_set_mode()`
```c=193 [10,13-15]
static int vga_set_mode(struct mode_info *mode)
{
/* Set the basic mode */
vga_set_basic_mode();
/* Override a possibly broken BIOS */
force_x = mode->x;
force_y = mode->y;
switch (mode->mode) {
case VIDEO_80x25:
break;
case VIDEO_8POINT:
vga_set_8font();
break;
case VIDEO_80x43:
vga_set_80x43();
break;
case VIDEO_80x28:
vga_set_14font();
break;
case VIDEO_80x30:
vga_set_80x30();
break;
case VIDEO_80x34:
vga_set_80x34();
break;
case VIDEO_80x60:
vga_set_80x60();
break;
}
return 0;
}
```
----
`vga_set_8font()` in [arch/x86/boot/video-vga.c](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video-vga.c)
```c=63 [8-11]
static void vga_set_8font(void)
{
/* Set 8x8 font - 80x43 on EGA, 80x50 on VGA */
struct biosregs ireg;
initregs(&ireg);
/* Set 8x8 font */
ireg.ax = 0x1112;
/* ireg.bl = 0; */
intcall(0x10, &ireg, NULL);
/* Use alternate print screen */
ireg.ax = 0x1200;
ireg.bl = 0x20;
intcall(0x10, &ireg, NULL);
/* Turn off cursor emulation */
ireg.ax = 0x1201;
ireg.bl = 0x34;
intcall(0x10, &ireg, NULL);
/* Cursor is scan lines 6-7 */
ireg.ax = 0x0100;
ireg.cx = 0x0607;
intcall(0x10, &ireg, NULL);
}
```
----
`set_video()` in [`arch/x86/boot/video.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/video.c#L319-L345)
```c=338 [2|3|4|6-7]
}
boot_params.hdr.vid_mode = mode;
vesa_store_edid();
store_mode_params();
if (do_restore)
restore_screen();
}
```
* EDID (Extended Display Identification Data)
----
### Last preparation before transition into protected mode
----
[`arch/x86/boot/main.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/main.c)
```c=177 [5]
/* Set the video mode */
set_video();
/* Do the last things and invoke protected mode */
go_to_protected_mode();
}
```
----
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
- Before entering protected mode, there are a few things left to do.
```c [3-12]
void go_to_protected_mode(void)
{
realmode_switch_hook();
if (enable_a20()) {
...
}
reset_coprocessor();
mask_all_interrupts();
setup_idt();
setup_gdt();
protected_mode_jump(boot_params.hdr.code32_start,
(u32)&boot_params + (ds() << 4));
}
```
----
### realmode_switch_hook
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=104 [4]
void go_to_protected_mode(void)
{
/* Hook before leaving real mode, also disables interrupts */
realmode_switch_hook();
/* Enable the A20 gate */
if (enable_a20()) {
puts("A20 gate not responding, unable to boot...\n");
die();
}
/* Reset coprocessor (IGNNE#) */
reset_coprocessor();
/* Mask all interrupts in the PIC */
mask_all_interrupts();
...
```
----
### realmode_switch_hook
- Invoke the `realmode_swtch` hook if present
- Hooks are used if the bootloader runs in a hostile environment.
- `io_delay`
- `asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT));`
```c [3-6]
static void realmode_switch_hook(void)
{
if (boot_params.hdr.realmode_swtch) {
asm volatile("lcallw *%0"
: : "m" (boot_params.hdr.realmode_swtch)
: "eax", "ebx", "ecx", "edx");
} else {
asm volatile("cli");
outb(0x80, 0x70); /* Disable NMI */
io_delay();
}
}
```
----
### ADVANCED BOOT LOADER HOOKS
```txt
If the boot loader runs in a particularly hostile environment (such as
LOADLIN, which runs under DOS) it may be impossible to follow the
standard memory location requirements.
Such a boot loader may use the
following hooks that, if set, are invoked by the kernel at the
appropriate time. The use of these hooks should probably be
considered an absolutely last resort!
```
----
### realmode_switch_hook (No hook)
- Disable NMI (Non-Maskable Interrupt)
- A hardware interrupt that standard interrupt masking (`cli`) cannot block
- Usually raised on unrecoverable hardware errors
- Disabled by writing 0x80 (bit 7 set) to the CMOS address register at port 0x70
```c [7-11]
static void realmode_switch_hook(void)
{
if (boot_params.hdr.realmode_swtch) {
asm volatile("lcallw *%0"
: : "m" (boot_params.hdr.realmode_swtch)
: "eax", "ebx", "ecx", "edx");
} else {
asm volatile("cli");
outb(0x80, 0x70); /* Disable NMI */
io_delay();
}
}
```
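----
### Sketch: the byte written to port 0x70
Port 0x70 does double duty: the low 7 bits select a CMOS register and bit 7 gates NMI. A small sketch of how the value `0x80` comes about; `cmos_addr_byte` and `CMOS_NMI_DISABLE` are illustrative names, and no port I/O is performed here.

```c
#include <assert.h>
#include <stdint.h>

#define CMOS_NMI_DISABLE 0x80u  /* bit 7 of the byte written to port 0x70 */

/* Build the byte for the CMOS address port: the low 7 bits select the
 * register, bit 7 masks NMI when set. */
static uint8_t cmos_addr_byte(uint8_t reg, int nmi_disabled)
{
    uint8_t value = (uint8_t)(reg & 0x7fu);

    if (nmi_disabled)
        value |= CMOS_NMI_DISABLE;
    assert((value & 0x7fu) == (reg & 0x7fu));
    return value;
}
```

Register 0 with NMI disabled gives exactly the `0x80` used by `outb(0x80, 0x70)`.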
----
### Enable A20 line
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=104 [6-10]
void go_to_protected_mode(void)
{
/* Hook before leaving real mode, also disables interrupts */
realmode_switch_hook();
/* Enable the A20 gate */
if (enable_a20()) {
puts("A20 gate not responding, unable to boot...\n");
die();
}
/* Reset coprocessor (IGNNE#) */
reset_coprocessor();
/* Mask all interrupts in the PIC */
mask_all_interrupts();
...
```
----
### Enable A20 line
- The A20 line carries the 21st address bit (bit 20) on the address bus
- While it is masked, addresses at and above 1 MiB wrap around to low memory
- Newer Intel CPUs no longer implement the A20 gate
```c
int enable_a20(void)
{
int loops = A20_ENABLE_LOOPS;
int kbc_err;
while (loops--) {
/* First, check to see if A20 is already enabled
(legacy free, etc.) */
if (a20_test_short())
return 0;
/* Next, try the BIOS (INT 0x15, AX=0x2401) */
enable_a20_bios();
if (a20_test_short())
return 0;
/* Try enabling A20 through the keyboard controller */
kbc_err = empty_8042();
if (a20_test_short())
return 0; /* BIOS worked, but with delayed reaction */
if (!kbc_err) {
enable_a20_kbc();
if (a20_test_long())
return 0;
}
/* Finally, try enabling the "fast A20 gate" */
enable_a20_fast();
if (a20_test_long())
return 0;
}
return -1;
}
```
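----
### Sketch: what a masked A20 line does
The `a20_test_*()` calls above rely on address wrap-around. A minimal model of real-mode address formation with and without A20, assuming the usual `segment * 16 + offset` rule; `real_mode_addr` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address produced by seg:off in real mode.
 * With A20 masked, address bit 20 is forced to 0, so accesses just
 * above 1 MiB alias the bottom of memory. */
static uint32_t real_mode_addr(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t linear = ((uint32_t)seg << 4) + off;

    assert(linear <= 0x10ffefu);    /* maximum is 0xffff:0xffff */
    if (!a20_enabled)
        linear &= ~(UINT32_C(1) << 20);
    return linear;
}
```

With A20 masked, `0xffff:0x0210` and `0x0000:0x0200` hit the same byte, so a write through one is visible through the other. With A20 enabled they differ by 1 MiB.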
----
### Reset Coprocessor
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=104 [12-13]
void go_to_protected_mode(void)
{
/* Hook before leaving real mode, also disables interrupts */
realmode_switch_hook();
/* Enable the A20 gate */
if (enable_a20()) {
puts("A20 gate not responding, unable to boot...\n");
die();
}
/* Reset coprocessor (IGNNE#) */
reset_coprocessor();
/* Mask all interrupts in the PIC */
mask_all_interrupts();
...
```
----
### Reset Coprocessor
- Resets the math coprocessor (FPU)
1. Write 0 to port 0xf0 to clear the busy latch (IGNNE#).
2. Write 0 to port 0xf1 to reset the coprocessor.
```c
outb(0, 0xf0);
outb(0, 0xf1);
```
----
### Mask All Interrupts
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=104 [15-16]
void go_to_protected_mode(void)
{
/* Hook before leaving real mode, also disables interrupts */
realmode_switch_hook();
/* Enable the A20 gate */
if (enable_a20()) {
puts("A20 gate not responding, unable to boot...\n");
die();
}
/* Reset coprocessor (IGNNE#) */
reset_coprocessor();
/* Mask all interrupts in the PIC */
mask_all_interrupts();
...
```
----
### Mask All Interrupts
- Mask all interrupts on the secondary and primary `PIC` (Programmable Interrupt Controller)
- Except for `IRQ2` on the primary `PIC`
- The `IRQ2` line cascades the secondary PIC into the primary PIC
```c
outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */
outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */
```
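----
### Sketch: where `0xfb` comes from
In the PIC mask register a set bit disables the matching IRQ line. A short sketch of the arithmetic behind the two `outb()` values; `pic_mask_all_except` is an illustrative helper and performs no port I/O.

```c
#include <assert.h>
#include <stdint.h>

/* Mask value that disables every IRQ line of one PIC except `irq`.
 * A 1 bit masks (disables) the line, a 0 bit leaves it enabled. */
static uint8_t pic_mask_all_except(unsigned irq)
{
    assert(irq < 8);
    return (uint8_t)(0xffu & ~(1u << irq));
}
```

`pic_mask_all_except(2)` yields `0xfb`: everything masked on the primary PIC except the cascade line. The secondary PIC simply gets `0xff`.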
----
### Setting up the Interrupt Descriptor Table
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=124 [2]
/* Actual transition to protected mode... */
setup_idt();
setup_gdt();
protected_mode_jump(boot_params.hdr.code32_start,
(u32)&boot_params + (ds() << 4));
}
```
----
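### IDT Pointer Struct
- `setup_idt()` reuses `struct gdt_ptr`: `lidt` and `lgdt` both take a 16-bit limit followed by a 32-bit base address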
```c
struct gdt_ptr {
u16 len;
u32 ptr;
} __attribute__((packed));
```
----
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
- Load a null IDT
- Interrupts are a separate topic; the real IDT is set up later in the boot process
```c=95 [3-4]
static void setup_idt(void)
{
static const struct gdt_ptr null_idt = {0, 0};
asm volatile("lidtl %0" : : "m" (null_idt));
}
```
----
### Set up Global Descriptor Table
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=124 [3]
/* Actual transition to protected mode... */
setup_idt();
setup_gdt();
protected_mode_jump(boot_params.hdr.code32_start,
(u32)&boot_params + (ds() << 4));
}
```
----
### `boot_gdt`
- Intel recommends 16-byte alignment
```c
static const u64 boot_gdt[] __attribute__((aligned(16))) = {
/* CS: code, read/execute, 4 GB, base 0 */
[GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff),
/* DS: data, read/write, 4 GB, base 0 */
[GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff),
/* TSS: 32-bit tss, 104 bytes, base 4096 */
/* We only have a TSS here to keep Intel VT happy;
we don't actually use it for anything. */
[GDT_ENTRY_BOOT_TSS] = GDT_ENTRY(0x0089, 4096, 103),
};
```
----
### Load GDT
```c
static struct gdt_ptr gdt;
gdt.len = sizeof(boot_gdt)-1;
gdt.ptr = (u32)&boot_gdt + (ds() << 4);
asm volatile("lgdtl %0" : : "m" (gdt));
```
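----
### Sketch: filling in `gdt_ptr`
`lgdt` needs a limit (size minus one) and a linear base address, while the setup code only knows offsets relative to `ds`. A sketch of both computations; `make_gdt_ptr` and its parameters are illustrative, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the struct shown earlier: 2-byte limit, 4-byte base. */
struct gdt_ptr {
    uint16_t len;
    uint32_t ptr;
} __attribute__((packed));

/* Build the operand for lgdt from a real-mode data segment, the table's
 * offset inside that segment, and the table size in bytes. */
static struct gdt_ptr make_gdt_ptr(uint16_t ds, uint16_t table_offset,
                                   uint16_t table_bytes)
{
    struct gdt_ptr p;

    assert(table_bytes > 0);
    p.len = (uint16_t)(table_bytes - 1);          /* limit = size - 1 */
    p.ptr = ((uint32_t)ds << 4) + table_offset;   /* segment base + offset */
    return p;
}
```

Assuming the TSS entry sits at index 4, `boot_gdt` has 5 entries of 8 bytes, so `len` is 39.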
----
### `GDT_ENTRY` Macro
`GDT_ENTRY(flags, base, limit)`
----
### `flags` of `GDT_ENTRY`
- <1>(G) granularity bit: if 1, the limit is counted in 4 KiB units
- <1>(D) default operand size: 0 = 16-bit segment, 1 = 32-bit segment
- <1>(L) 64-bit code segment if 1
- <1>(AVL) available for use by system software
- <4>limit bits 19:16 of the descriptor (taken from `limit`, not from `flags`)
- <1>(P) segment present in memory
- <2>(DPL) descriptor privilege level; 0 is the most privileged
- <1>(S) 1 = code or data segment, 0 = system segment
- <3>segment type (execute, conforming or expand-down, read/write)
- <1>(A) accessed bit
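----
### Sketch: packing a descriptor
The call `GDT_ENTRY(0xc09b, 0, 0xfffff)` takes the flags first, then base, then limit, and scatters them into one 64-bit descriptor. A sketch of that packing as a plain function, following the standard x86 descriptor layout; `gdt_entry` and the `flag_*` helpers are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack flags, base and limit into the 8-byte x86 segment descriptor layout. */
static uint64_t gdt_entry(uint64_t flags, uint64_t base, uint64_t limit)
{
    assert(limit <= 0xfffffULL);    /* the limit field is 20 bits wide */
    return ((base  & 0xff000000ULL) << (56 - 24)) |  /* base 31:24  */
           ((flags & 0x0000f0ffULL) << 40) |         /* access + G/D/L/AVL */
           ((limit & 0x000f0000ULL) << (48 - 16)) |  /* limit 19:16 */
           ((base  & 0x00ffffffULL) << 16) |         /* base 23:0   */
            (limit & 0x0000ffffULL);                 /* limit 15:0  */
}

/* Pull individual fields back out of the 16-bit flags value. */
static unsigned flag_g(uint16_t f)    { return (f >> 15) & 1u; }
static unsigned flag_d(uint16_t f)    { return (f >> 14) & 1u; }
static unsigned flag_p(uint16_t f)    { return (f >> 7) & 1u; }
static unsigned flag_dpl(uint16_t f)  { return (f >> 5) & 3u; }
static unsigned flag_s(uint16_t f)    { return (f >> 4) & 1u; }
static unsigned flag_type(uint16_t f) { return f & 0xfu; }
```

For the boot code segment, `0xc09b` decodes to G=1, D=1, P=1, DPL=0, S=1, type `0xb` (execute/read, accessed), and the packed descriptor is `0x00cf9b000000ffff`.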
----
### Actual transition into protected mode
[`arch/x86/boot/pm.c`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pm.c)
```c=124 [4-5]
/* Actual transition to protected mode... */
setup_idt();
setup_gdt();
protected_mode_jump(boot_params.hdr.code32_start,
(u32)&boot_params + (ds() << 4));
}
```
----
- `protected_mode_jump(jump_location, boot_parameters)`
----
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- Copy the `boot_params` pointer from `edx` into `esi`
- Load `cs` into `bx` and shift it left by 4 to get the segment base in `ebx`
```asm= [2-5]
GLOBAL(protected_mode_jump)
movl %edx, %esi # Pointer to boot_params table
xorl %ebx, %ebx
movw %cs, %bx
shll $4, %ebx
addl %ebx, 2f
jmp 1f # Short jump to serialize on 386/486
1:
...
```
----
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- `addl %ebx, 2f` patches the far-jump operand at label `2` to (cs << 4) + `in_pm32`
- `in_pm32` is where execution continues after the transition to 32-bit mode
```asm=2 [5-9]
movl %edx, %esi # Pointer to boot_params table
xorl %ebx, %ebx
movw %cs, %bx
shll $4, %ebx
addl %ebx, 2f
jmp 1f # Short jump to serialize on 386/486
1:
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
```
----
### What?
- Why jump to the very next instruction?
```asm=7 [2-3]
addl %ebx, 2f
jmp 1f # Short jump to serialize on 386/486
1:
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
```
----
### What? Explained
- Why jump to the very next instruction?
- **Flush the instructions the CPU has already prefetched** (`addl %ebx, 2f` has just rewritten the jump operand in memory)
```asm=7 [2-3]
addl %ebx, 2f
jmp 1f # Short jump to serialize on 386/486
1:
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
```
----
### Store DS and TSS
```asm=7 [5-6]
addl %ebx, 2f
jmp 1f # Short jump to serialize on 386/486
1:
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
```
----
### Enable Protected-mode
- Set the Protection Enable (`PE`) bit in control register `cr0`
```asm=11 [4-6]
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
movl %cr0, %edx
orb $X86_CR0_PE, %dl # Protected mode
movl %edx, %cr0
```
----
### Finally, A Long Jump
- `0x66`: operand-size override prefix, so 16-bit code can encode a jump with a 32-bit offset
- `0xea`: far jump opcode
- `in_pm32`: 32-bit offset, already patched to (cs << 4) + `in_pm32`
- `__BOOT_CS`: target code segment selector
```asm=14 [5-8]
movl %cr0, %edx
orb $X86_CR0_PE, %dl # Protected mode
movl %edx, %cr0
# Transition to 32-bit mode
.byte 0x66, 0xea # ljmpl opcode
2: .long in_pm32 # offset
.word __BOOT_CS # segment
ENDPROC(protected_mode_jump)
```
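----
### Sketch: the bytes of the long jump
The hand-assembled jump is a prefix, an opcode, a 32-bit offset and a 16-bit selector, with the offset patched at run time. A sketch that builds the same 8 bytes in a buffer; `encode_ljmpl` and `LJMPL_LEN` are illustrative, and the selector value `0x10` used in the note below is an assumption about `__BOOT_CS` (GDT index 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LJMPL_LEN 8u  /* prefix + opcode + 4-byte offset + 2-byte selector */

/* Write "ljmpl $selector, $target" as raw bytes, where target is the
 * link-time offset rebased by the real-mode code segment, i.e. the
 * value that "addl %ebx, 2f" leaves at label 2. */
static size_t encode_ljmpl(uint8_t out[LJMPL_LEN], uint16_t cs,
                           uint32_t offset, uint16_t selector)
{
    uint32_t target = ((uint32_t)cs << 4) + offset;

    assert(out != NULL);
    out[0] = 0x66;                          /* operand-size override */
    out[1] = 0xea;                          /* far jump opcode */
    out[2] = (uint8_t)(target & 0xffu);     /* offset, little endian */
    out[3] = (uint8_t)((target >> 8) & 0xffu);
    out[4] = (uint8_t)((target >> 16) & 0xffu);
    out[5] = (uint8_t)((target >> 24) & 0xffu);
    out[6] = (uint8_t)(selector & 0xffu);   /* selector, little endian */
    out[7] = (uint8_t)(selector >> 8);
    return LJMPL_LEN;
}
```

With `cs = 0x1000`, an offset of `0x0234` and selector `0x10`, the buffer holds `66 ea 34 02 01 00 10 00`.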
----
### First Code Running in Protected Mode
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- Set up the segment registers
```asm=51 [2-7]
GLOBAL(in_pm32)
# Set up data segments for flat 32-bit mode
movl %ecx, %ds
movl %ecx, %es
movl %ecx, %fs
movl %ecx, %gs
movl %ecx, %ss
# The 32-bit code sets up its own stack, but this way we do have
# a valid stack if some debugging hack wants to use it.
addl %ebx, %esp
# Set up TR to make Intel VT happy
ltr %di
# Clear registers to allow for future extensions to the
# 32-bit boot protocol
xorl %ecx, %ecx
xorl %edx, %edx
xorl %ebx, %ebx
xorl %ebp, %ebp
xorl %edi, %edi
# Set up LDTR to make Intel VT happy
lldt %cx
jmpl *%eax # Jump to the 32-bit entrypoint
ENDPROC(in_pm32)
```
----
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- Set up a valid stack
```asm=51 [8-10]
GLOBAL(in_pm32)
# Set up data segments for flat 32-bit mode
movl %ecx, %ds
movl %ecx, %es
movl %ecx, %fs
movl %ecx, %gs
movl %ecx, %ss
# The 32-bit code sets up its own stack, but this way we do have
# a valid stack if some debugging hack wants to use it.
addl %ebx, %esp
# Set up TR to make Intel VT happy
ltr %di
# Clear registers to allow for future extensions to the
# 32-bit boot protocol
xorl %ecx, %ecx
xorl %edx, %edx
xorl %ebx, %ebx
xorl %ebp, %ebp
xorl %edi, %edi
# Set up LDTR to make Intel VT happy
lldt %cx
jmpl *%eax # Jump to the 32-bit entrypoint
ENDPROC(in_pm32)
```
----
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- Clear the general-purpose registers before jumping
```asm=51 [15-21]
GLOBAL(in_pm32)
# Set up data segments for flat 32-bit mode
movl %ecx, %ds
movl %ecx, %es
movl %ecx, %fs
movl %ecx, %gs
movl %ecx, %ss
# The 32-bit code sets up its own stack, but this way we do have
# a valid stack if some debugging hack wants to use it.
addl %ebx, %esp
# Set up TR to make Intel VT happy
ltr %di
# Clear registers to allow for future extensions to the
# 32-bit boot protocol
xorl %ecx, %ecx
xorl %edx, %edx
xorl %ebx, %ebx
xorl %ebp, %ebp
xorl %edi, %edi
# Set up LDTR to make Intel VT happy
lldt %cx
jmpl *%eax # Jump to the 32-bit entrypoint
ENDPROC(in_pm32)
```
----
[`arch/x86/boot/pmjump.S`](https://github.com/torvalds/linux/blob/v4.16/arch/x86/boot/pmjump.S)
- JUMP!
```asm=51 [26]
GLOBAL(in_pm32)
# Set up data segments for flat 32-bit mode
movl %ecx, %ds
movl %ecx, %es
movl %ecx, %fs
movl %ecx, %gs
movl %ecx, %ss
# The 32-bit code sets up its own stack, but this way we do have
# a valid stack if some debugging hack wants to use it.
addl %ebx, %esp
# Set up TR to make Intel VT happy
ltr %di
# Clear registers to allow for future extensions to the
# 32-bit boot protocol
xorl %ecx, %ecx
xorl %edx, %edx
xorl %ebx, %ebx
xorl %ebp, %ebp
xorl %edi, %edi
# Set up LDTR to make Intel VT happy
lldt %cx
jmpl *%eax # Jump to the 32-bit entrypoint
ENDPROC(in_pm32)
```
----
### Takeaway Questions (7)
- What is the purpose of the `heap_free` function?
\(A\). Free memory allocated on the heap.
\(B\). Check if the heap has enough space.
\(C\). Swap the heap to disk to free memory.
----
### Takeaway Questions (8)
- Why is there a jump to the next instruction?
\(A\). Clear out prefetched instructions
\(B\). Recalculate PC by an offset
\(C\). Store general purpose registers
---
Thanks