
Booting 1 ~ 3

note:
Hello, everyone. Today, we'll discuss the first three chapters of the book.



From Bootloader to Kernel

楊皓宇, 紀政良, 李建緯

note:
Let's start with the first chapter.


Power button -> BIOS -> MBR (bootloader)
-> kernel real-mode code

note:
This chapter details the process starting from powering on the device, accessing the BIOS, loading the bootloader from the disk, and finally executing the kernel code.


Power Button

  1. The motherboard sends a signal to the power supply device.
  2. The power supply provides the proper amount of electricity to the computer.
  3. Once the motherboard receives the power good signal, it tries to start the CPU.
  4. The CPU resets all leftover data in its registers and sets predefined values for each of them.

note:
When the power button is pressed, the motherboard sends a signal to the power supply, which then delivers the appropriate amount of electricity to the computer. Upon receiving the "power good" signal, the motherboard initiates the startup of the CPU. The CPU then clears any leftover data in its registers and assigns predefined values to each.


The Program Counter of x86 in real mode

  • Two 16-bit registers for 20-bit memory address
  • CS (Code Segment) (CS_selector)
    • Select the memory segment.
    • CS_base is hidden from programmer.
  • IP (Instruction Pointer)
    • the offset of the memory segment.

note:
Now, we'll introduce the program counter in the x86 CPU architecture when operating in real mode, which we'll cover in detail later. In real mode, two visible 16-bit registers combine to form a 20-bit memory address space. CS, standing for Code Segment (or CS_selector), is used to select a memory segment. Additionally, for the Intel 80386 and later models, there's a hidden register called CS_base, designed to accommodate larger memory spaces. Next, the IP, or Instruction Pointer, specifies the offset within the memory segment.


  • Intel 8086 (16-bit)
    ​​​​CS: 0xFFFF // code segment selector
    ​​​​IP: 0x0000 // instruction pointer
    
  • Intel 80286 (16-bit)
    ​​​​CS: 0xF000
    ​​​​IP: 0xFFF0
    
  • Intel 80386 and later (32-bit and 64-bit)
    ​​​​CS: 0xF000
    ​​​​IP: 0xFFF0
    
    ​​​​CS_base: 0xFFFF_0000  // hidden
    

note:
These are the predefined values for CS and IP, as illustrated in the slide.


BIOS

  1. The processor starts working in real mode.
    • An operating mode of all x86 CPUs.
    • Addresses in real mode always correspond to real locations in memory.
    • 20-bit memory address space.

\[ \begin{aligned} \text{CS Base} &= \text{CS Selector} \times 16\\ \text{Memory Address} &= \text{CS Base} + \text{IP} \end{aligned} \]

note:
After the processor starts, it operates in "real mode," which is a mode present in all x86 processors. In real mode, addresses directly correspond to physical memory locations, meaning there is no memory protection feature. Regardless of being 16-bit or 32-bit, programs are limited to a 20-bit memory address space.
The relationship between the registers is defined by a formula: CS_base equals CS_selector multiplied by 16, or equivalently, shifted left by 4 bits. The memory address is then calculated by adding the CS_base to the instruction pointer.
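
As a minimal sketch of the formula above, the helper below turns a segment selector and an offset into a 20-bit address. The function name `real_mode_linear` is only illustrative, and the 20-bit mask models an 8086-style address bus.

```c
#include <stdint.h>

/* Real-mode address translation: address = (selector * 16) + offset.
 * The result is truncated to 20 bits, as on a CPU with 20 address lines. */
uint32_t real_mode_linear(uint16_t selector, uint16_t offset)
{
    uint32_t base = (uint32_t)selector << 4;  /* CS base = CS selector * 16 */
    return (base + offset) & 0xFFFFFu;        /* keep only 20 address bits */
}
```

With the 8086 reset values (CS = 0xFFFF, IP = 0x0000) this gives 0xFFFF0, the same address that CS = 0xF000 and IP = 0xFFF0 produce.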


  1. The processor starts running code at 0xFFFF_FFF0.
    • Reset vector
    • The memory controller on the motherboard redirects the memory read request to the BIOS ROM.
    • Commonly, it is a near jump instruction, guiding the system to the rest of BIOS boot code.

note:
After the processor starts, it begins executing the first instruction located at a specific address, known as the reset vector. Since RAM is empty at boot, the motherboard's memory controller redirects memory read requests to the BIOS ROM. Notably, the first instruction at the reset vector is typically a near jump command, which directs the system to the remainder of the BIOS boot code.


Recall the initial register data and formula:

CS_selector: 0xF000
CS_base:     0xFFFF_0000
IP:          0xFFF0

\[ \begin{aligned} \text{CS Base} &= \text{CS Selector} \times 16\\ \text{Memory Address} &= \text{CS Base} + \text{IP} \end{aligned} \]

According to the formula above, CS_base should be 0xF_0000 instead of 0xFFFF_0000.
Why is CS_base not the CS selector times 16?

note:
Recall the initial register data and the formula. Can you spot what is wrong? According to the formula above, CS_base should be 0xF_0000 instead of 0xFFFF_0000.
Why is CS_base not the CS selector times 16?


  • Intel 8086
    • 16-bit processor, 20-bit memory address
    • Reset vector: 0xF_FFF0
  • Intel 80386 and later
    • 32-bit or 64-bit
    • Reset vector: 0xFFFF_FFF0

In real mode, a program can only access a 20-bit memory address space.

note:
For the Intel 8086, the reset vector corresponds directly to its CS and IP values. However, for the Intel 80386 and later models, the reset vector is a 32-bit address. Considering that real mode only supports a 20-bit memory address space, how can a processor access this larger address?


To address this difference and maintain compatibility with the 20-bit memory address system of the real mode, modern x86 CPUs are designed to initialize the CS selector and CS base register in a way that aligns with this legacy requirement.

CS_selector:  0xF000
CS_base:      0xFFFF_0000
IP:           0xFFF0
Reset_vector: 0xFFFF_FFF0

Furthermore, the distinction between jump instructions is also important.
A near jump changes only IP and leaves CS as is, so the special CS base set at reset stays in effect. A far jump reloads the CS selector, and the CS base is then recomputed as the selector times 16.
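
The behaviour described above can be sketched with a tiny model of the CS selector, the hidden CS base, and IP. The struct and function names are hypothetical; only the register values come from the slide.

```c
#include <stdint.h>

/* Toy model of the code-segment state of an 80386-or-later CPU. */
struct cs_state {
    uint16_t selector;  /* visible CS selector */
    uint32_t base;      /* hidden CS base */
    uint16_t ip;        /* instruction pointer */
};

/* Values right after reset: the base does NOT equal selector * 16. */
struct cs_state cpu_reset(void)
{
    struct cs_state s = { 0xF000, 0xFFFF0000u, 0xFFF0 };
    return s;
}

/* Address of the next instruction fetch. */
uint32_t fetch_address(const struct cs_state *s)
{
    return s->base + s->ip;
}

/* A near jump only replaces IP; selector and base are untouched. */
void near_jump(struct cs_state *s, uint16_t new_ip)
{
    s->ip = new_ip;
}

/* A far jump reloads the selector and recomputes the base from it. */
void far_jump(struct cs_state *s, uint16_t new_selector, uint16_t new_ip)
{
    s->selector = new_selector;
    s->base = (uint32_t)new_selector << 4;
    s->ip = new_ip;
}
```

In this model the first fetch happens at 0xFFFF_FFF0, a near jump stays in the top 64 KB below 4 GB, and the first far jump drops execution into the ordinary 1 MB real-mode range.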


Bootloader

note:
Next, we'll talk about the bootloader. This essential component serves as the bridge between the firmware's initial power-on state and the loading of the operating system.


How does the bootloader start?

  • The BIOS chooses a bootable device from its configuration.
  • The BIOS tries to find a boot sector that ends with a magic number.
    • On a hard drive with an MBR, the bootloader code occupies the first 446 bytes of the first sector; the magic number is the last two bytes of that 512-byte sector.
  • The BIOS copies the boot sector to the fixed memory address 0x7c00 and jumps there.

note:
The BIOS runs first and performs basic checks of the hardware. It then picks a boot device, such as a hard drive or a USB stick, from its configured list and looks for a boot sector that ends with a signature known as the 'magic number'. Once found, it copies that sector to a fixed memory location and starts running it.
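
The signature test can be sketched as follows, assuming a 512-byte sector whose last two bytes must be 0x55 and 0xAA. The function name `is_bootable_sector` is made up for illustration; real firmware does this check internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Returns true if the buffer holds a full sector that ends with the
 * boot signature 0x55, 0xAA at byte offsets 510 and 511. */
bool is_bootable_sector(const uint8_t *sector, size_t len)
{
    if (len < SECTOR_SIZE)
        return false;
    return sector[SECTOR_SIZE - 2] == 0x55 &&
           sector[SECTOR_SIZE - 1] == 0xAA;
}
```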


Bootloader - GRUB 2

boot.img

  • Contains only the code needed to load the core image of GRUB 2.
  • Installed into the boot sector, so it has to fit in 446 bytes.
  • Too small to include a file system driver; it only knows where the core image starts.

note:
boot.img is the first step in using GRUB 2, our main bootloader.
Because of space limits, it's very compact. Its job is to locate and jump to GRUB 2's core program.


Bootloader - GRUB 2

diskboot.img

  • The core image begins with diskboot.img
  • Stored immediately after the first sector in the unused space.
  • Loads the rest of the core image into memory.
  • Then, it executes the grub_main function.
    note:
    diskboot.img comes next, stored in a space before the first partition. It loads the complete GRUB 2, which includes the drivers necessary for reading the filesystem into memory, and then starts grub_main.

Bootloader - GRUB 2

grub_main

  • Initialize the console
  • Get the base address for modules
  • Set the root device
  • Load/parse the GRUB configuration file
  • Load modules
  • ..

note:
grub_main is where the action happens in GRUB 2. It sets up the console for display, figures out where modules are stored, and chooses the 'root' device. Then, it reads the GRUB configuration file to know what else needs to be loaded.
// This step is crucial for loading other modules and preparing the system for use.


Bootloader - GRUB 2

grub_main

  • Moves GRUB to normal mode and executes grub_normal_execute
  • Shows the boot menu to select an OS

note:
At the end of execution, the grub_main function moves grub to normal mode. The grub_normal_execute function completes the final preparations and shows a menu to select an operating system.


Bootloader - GRUB 2

grub_menu_execute_entry

  • The bootloader loads the kernel according to the memory layout defined by the boot protocol.
  • When the bootloader transfers control to the kernel, it starts at:
    X + sizeof(KernelBootSector)

note:
In the grub_menu_execute_entry phase, GRUB 2 carefully prepares to hand over control to the operating system's kernel. This process is guided by specific rules known as the boot protocol, ensuring that the kernel is loaded correctly into memory.
The transition point, where GRUB hands off to the kernel, is given by the formula shown on the slide. Here, 'X' is the address at which the kernel boot sector is loaded, and sizeof(KernelBootSector) is the size of the kernel's boot sector, so the kernel starts executing at exactly the right location.


The Beginning of the Kernel Setup Stage

  • The kernel is stored in a compressed format.
  • It first sets up the environment for the decompressor and some memory-management structures.
  • Decompress the actual kernel and jump to it

note:
Its first task is to get ready for the main event. Since the kernel is stored in a compressed format to save space, it must first be decompressed. But before that, the kernel sets up the necessary environment. This setup involves configuring the decompressor and arranging memory management components.
After these preparations, the kernel proceeds to decompress itself. Once decompression is complete, the kernel jumps into action, beginning its core functions and taking over the system's operation. This moment is critical, marking the transition from booting to an operational state where the operating system takes the lead.


The boot sector

  • Starts with the MZ signature, followed by a PE header.
  • The exact entry point of the kernel setup part is at an offset from the kernel boot sector.
  • It begins with the setup header fields and then continues at start_of_setup.
    note:
    The boot sector is essentially the starting line for the operating system's kernel. It has a unique signature that begins with 'MZ', followed by a 'PE' header.

However, the actual starting point for setting up the kernel isn't right at the beginning of this sector. It's a bit further along, at a specific distance known as an offset.

Once we reach this starting point, the kernel begins its setup routine at start_of_setup. Before that label sits the setup header, a set of fields that tell the kernel how to behave and what resources it has at its disposal.

That's a closer look at how a computer transitions from off to fully operational through the bootloader and kernel setup. This process, while complex, ensures that your computer starts correctly and is ready for use.


Bootloader -> kernel (real-mode code)

  bzImage
     |
     |---- setup.elf   // <---- header.S and some c code
     |
     |---- vmlinux
             |
             |---- setup.bin
             |
             |---- vmlinux.bin.gz
          

Kernel real-mode code

image

In this case X = 0x10000;

Kernel setup is at 0x10200
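
To make the arithmetic concrete, here is a small sketch that derives the setup entry point from the load address X, assuming a 512-byte legacy boot sector. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_SECTOR_SIZE 0x200u  /* 512-byte legacy boot sector */

/* Linear address of the first setup instruction for load address X. */
uint32_t setup_entry_linear(uint32_t load_addr)
{
    return load_addr + BOOT_SECTOR_SIZE;
}

/* The same location expressed as a real-mode code segment with ip = 0.
 * The load address must be 16-byte aligned to be a valid segment start. */
uint16_t setup_entry_cs(uint32_t load_addr)
{
    assert((load_addr & 0xFu) == 0);
    return (uint16_t)((load_addr + BOOT_SECTOR_SIZE) >> 4);
}
```

For X = 0x10000 this yields the linear address 0x10200, or cs = 0x1020 with ip = 0.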


header.S

  1. legacy boot sector (512 bytes)

  2. first part of kernel setup

  3. (boot header information)


Kernel legacy code

header.S

Screenshot 2024-03-10 00-13-10


Kernel legacy code

Screenshot 2024-03-10 00-13-50

According to the Linux boot protocol

Kernel legacy code

  • Contains some code that shows an error message.

  • The message appears if we don't use a third-party bootloader and instead let the BIOS load the first sector of the kernel image at 0x7c00 and run it.
    image


Kernel setup code in header.S

// first part of setup header

image

// second part of setup header

image


Kernel setup code in header.S

  • Aligning the segment registers
  • Stack setup
  • BSS setup
  • Jump to main

Aligning the segment registers

image

we want es=cs=ss=ds=0x1000


Aligning the segment registers

Make sure that all segment register values are equal

  • ds is already 0x1000
  • Force es = ds
  • stack setup (set sp), then let ss = ds
  • cs is currently 0x1020, and it cannot be written directly

Aligning the segment registers

  • GRUB 2 loads the kernel setup code at address 0x10000, but execution starts at 0x10200

  • we want cs = 0x1000

    • push ds and the address of the next instruction
    • then execute a far return, which pops both into ip and cs

image


Stack setup

  • in loadflags, if CAN_USE_HEAP is clear (=0)
    • sp = _end + STACK_SIZE

image


Stack setup

  • if the CAN_USE_HEAP flag is set (=1)
    • sp = heap_end_ptr + STACK_SIZE
      or
    • sp = 0xFFFC
      if heap_end_ptr + STACK_SIZE overflows 16 bits

image
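
The two stack-setup slides can be condensed into one C sketch. The real logic is written in assembly in header.S; the function below only mirrors the rule stated on the slides. STACK_SIZE is set to 1024 as an assumption (the value depends on the kernel version), and the 4-byte alignment step is also an assumption about what the assembly does.

```c
#include <stdint.h>

#define CAN_USE_HEAP 0x80u  /* loadflags bit 7, per the boot protocol */
#define STACK_SIZE   1024u  /* assumption: depends on the kernel version */

/* Pick the initial 16-bit stack pointer for the setup code.
 * end          - offset of _end (end of the setup image)
 * heap_end_ptr - value supplied by the bootloader in the setup header */
uint16_t initial_sp(uint8_t loadflags, uint16_t end, uint16_t heap_end_ptr)
{
    uint16_t base = (loadflags & CAN_USE_HEAP) ? heap_end_ptr : end;
    uint32_t top = (uint32_t)base + STACK_SIZE;

    if (top > 0xFFFFu)              /* the sum does not fit in 16 bits */
        return 0xFFFC;

    return (uint16_t)(top & ~3u);   /* assumed: round down to 4 bytes */
}
```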


BSS setup

  • Linux carefully ensures this area of memory is zeroed
  • The code writes zeros from __bss_start to _end

image


Now we have the stack and BSS, so we can jump to the main() C function

Screenshot 2024-03-10 00-16-16


Takeaway Questions (1)

  • What is the entry point of the BIOS code for 64-bit x86 CPU?
    (A). 0xF_FFF0
    (B). 0xFFFF_FFF0
    (C). 0xFFFF_FFFF_FFFF_FFF0

Takeaway Questions (2)

  • Which stage looks for a boot sector that ends with the specific signature (0x55, 0xaa)?
    (A). Bootloader
    (B). BIOS
    (C). Kernel setup

Takeaway Questions (3)

  • What is the entry point of the kernel code in this case?
    (A). 0x7C00
    (B). 0x10000
    (C). 0x10200

First steps in the kernel setup code

高士軒, 黃爾泰


Protected mode

  • Protected mode was the main mode of Intel processors from the 80286 until Intel 64 and long mode arrived.
  • Real mode offers very limited access to RAM: only 1 MB (2^20 bytes).
    • Protected mode replaced the 20-bit address bus with a 32-bit one.
    • This allows access to 4 GB of memory.
  • The main difference between real mode and protected mode is memory management.

Memory management in protected mode

  • The size and location of each segment is described by an associated data structure called the Segment Descriptor.
  • These segment descriptors are stored in a data structure called the Global Descriptor Table (GDT).
  • The address of GDT is stored in the special GDTR register. There will be an operation for loading it from memory, something like:
lgdt gdt

Segment descriptor in GDT

  • Each descriptor is 64 bits in size. The general scheme of a descriptor is:
    ​​​​ 63         56         51   48    45           39        32 
    ​​​​------------------------------------------------------------
    ​​​​|             | |B| |A|       | |   | |0|E|W|A|            |
    ​​​​| BASE 31:24  |G|/|L|V| LIMIT |P|DPL|S|  TYPE | BASE 23:16 |
    ​​​​|             | |D| |L| 19:16 | |   | |1|C|R|A|            |
    ​​​​------------------------------------------------------------
    
    ​​​​ 31                         16 15                         0 
    ​​​​------------------------------------------------------------
    ​​​​|                             |                            |
    ​​​​|        BASE 15:0            |       LIMIT 15:0           |
    ​​​​|                             |                            |
    ​​​​------------------------------------------------------------
    

Segment selector in protected mode

  • Segment registers contain segment selectors as in real mode.
  • Each Segment Descriptor has an associated Segment Selector which is a 16-bit structure:
    ​​​​                     15             3 2  1     0
    ​​​​                    -----------------------------
    ​​​​                    |      Index     | TI | RPL |
    ​​​​                    -----------------------------
    
    • Index stores the index number of the descriptor in the GDT.
    • TI (Table Indicator) selects the table to look in: 0 for the GDT, 1 for the LDT.
    • RPL contains the Requester's Privilege Level.
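
The selector layout above can be decoded with a few shifts and masks. This is a minimal sketch; the struct and function names are illustrative only.

```c
#include <stdint.h>

/* The three fields packed into a 16-bit segment selector. */
struct selector_fields {
    uint16_t index;  /* bits 15:3, descriptor index in the table */
    uint8_t  ti;     /* bit 2, table indicator (0 = GDT, 1 = LDT) */
    uint8_t  rpl;    /* bits 1:0, requester's privilege level */
};

struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = selector >> 3;
    f.ti    = (selector >> 2) & 1;
    f.rpl   = selector & 3;
    return f;
}
```

For example, the selector 0x10 refers to entry 2 of the GDT with privilege level 0.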

Get physical address in protected mode

  • Find the descriptor at GDT address + Index × 8 (each descriptor is 8 bytes).
  • Add the offset to the segment base address stored in that descriptor.
    Physical address
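
The lookup can be sketched in C as below, assuming paging is off so that the resulting linear address is also the physical address. All function names are hypothetical; the bit positions follow the descriptor diagram shown earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Pack base, limit and flags into a 64-bit segment descriptor.
 * flags: low byte = P/DPL/S/TYPE, bits 15:12 = G, D/B, L, AVL. */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint16_t flags)
{
    assert(limit <= 0xFFFFFu);  /* the limit field is only 20 bits wide */
    return ((uint64_t)(base  & 0xFF000000u) << 32) |  /* BASE 31:24  */
           ((uint64_t)(flags & 0xF0FFu)     << 40) |  /* flag bits   */
           ((uint64_t)(limit & 0x000F0000u) << 32) |  /* LIMIT 19:16 */
           ((uint64_t)(base  & 0x00FFFFFFu) << 16) |  /* BASE 23:0   */
            (uint64_t)(limit & 0x0000FFFFu);          /* LIMIT 15:0  */
}

/* Reassemble the 32-bit base from its two pieces in the descriptor. */
uint32_t descriptor_base(uint64_t desc)
{
    return (uint32_t)((desc >> 16) & 0x00FFFFFFu) |
           (uint32_t)((desc >> 32) & 0xFF000000u);
}

/* Address of the descriptor that a selector refers to in the GDT. */
uint32_t descriptor_address(uint32_t gdt_base, uint16_t selector)
{
    return gdt_base + (uint32_t)(selector >> 3) * 8u;
}

/* Linear address = segment base + offset. */
uint32_t linear_address(uint64_t desc, uint32_t offset)
{
    return descriptor_base(desc) + offset;
}
```

A flat 4 GB code segment, make_descriptor(0, 0xFFFFF, 0xC09B), packs to 0x00CF9B000000FFFF, and with a base of 0 every offset maps to the same linear address.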

Copy boot parameters

  • The first function called in main is copy_boot_params(void).
    • The boot_params structure includes a member setup_header hdr which contains some useful parameters in later initialization.
    • Using memcpy defined in copy.S to copy hdr to boot_params.

Console initialization

  • Next, the console_init function is called.
  • It tries to parse the port address and baud rate of the serial port and initialize it.
  • It then outputs the string below to test whether the serial port initialization succeeded.
    puts("early console in setup code\n");
  • The puts function prints the string one character at a time using BIOS interrupt 0x10.

Heap initialization

  • Initialize the heap with the init_heap function.
    • Checks CAN_USE_HEAP flag from loadflags.
    • loadflags is a bitmask that also contains other flags.
    • The inline assembly is to calculate the address of stack_end.
    • stack_end = esp - STACK_SIZE;
char *stack_end;

if (boot_params.hdr.loadflags & CAN_USE_HEAP) {
    asm("leal %P1(%%esp),%0"
        : "=r" (stack_end) : "i" (-STACK_SIZE));

Heap initialization

  • heap_end is declared in another header file.

  • The last check is whether heap_end is greater than stack_end. If it is, stack_end is assigned to heap_end to make them equal.

    if (heap_end > stack_end)
        heap_end = stack_end;
    

The first example


The second example


Takeaway Questions (4)

  • What is the main reason to use protected mode?
    (A). Faster execution speed.
    (B). More available memory space.
    (C). Lower hardware requirements.

CPU Validation

  • Function validate_cpu
    • Checks whether the CPU meets the required level by calling check_cpu
      ​​​​​​​​check_cpu(&cpu_level, &req_level, &err_flags);
      ​​​​​​​​if (cpu_level < req_level) {
      ​​​​​​​​    ...
      ​​​​​​​​    return -1;
      ​​​​​​​​}
      

CPU Validation

  • Function check_cpu
    1. Checks that the CPU has the right level
      e.g. long mode on x86_64
    2. Makes vendor-specific preparations
      e.g. turning on SSE+SSE2 on AMD CPUs if they are the only missing features

CPU Validation

  • After validation of CPU, set_bios_mode is called.
    • It issues BIOS interrupt 0x15 to tell the BIOS that long mode will be used.

Memory Detection

  • The next step is to get information about memory from the BIOS.
  • The detect_memory function provides a map of available RAM to the CPU.
  • There are several programming interfaces for memory detection, such as 0xe820, 0xe801 and 0x88.
  • We will take 0xe820 as an example.

Memory Detection:detect_memory_e820

  • Initialize the biosregs structure with 0xe820 call.
    ​​​​initregs(&ireg);
    ​​​​ireg.ax  = 0xe820;
    ​​​​ireg.cx  = sizeof buf;
    ​​​​ireg.edx = SMAP;
    ​​​​ireg.di  = (size_t)&buf;
    

Memory Detection:detect_memory_e820

  1. ax : the number of the function (0xe820)
  2. cx : size of the buffer which will contain data about the memory (sizeof buf)
  3. edx : SMAP(ASCII) magic number
  4. es:di : contain the address of the buffer (&buf)
  5. ebx : must be zero on the first call.

Memory Detection:detect_memory_e820

  • A loop calls intcall(0x15, &ireg, &oreg)
    • Each call gets memory information through the BIOS interrupt.
    • One call returns one entry, so the interrupt is called repeatedly.
      // in each iteration
      intcall(0x15, &ireg, &oreg);
      ireg.ebx = oreg.ebx; // pass the returned ebx to the next call

  • The loop terminates when ebx = 0
  • The collected data is written into an array of e820_entry
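
The continuation protocol can be tried out without a real BIOS. In the sketch below, `fake_e820_call` is a made-up stand-in for the interrupt call, and the table it serves is sample data; only the loop shape (start with ebx = 0, feed the returned ebx back in, stop when it is 0 again) follows the description above.

```c
#include <stddef.h>
#include <stdint.h>

struct e820_entry {
    uint64_t addr;  /* start of the memory segment */
    uint64_t size;  /* size of the memory segment */
    uint32_t type;  /* 1 = usable, 2 = reserved */
};

/* Sample table standing in for what the firmware would report. */
static const struct e820_entry fake_bios_map[] = {
    { 0x00000000u, 0x0009FC00u, 1 },
    { 0x0009FC00u, 0x00000400u, 2 },
    { 0x00100000u, 0x3FEE0000u, 1 },
};
#define FAKE_MAP_LEN (sizeof fake_bios_map / sizeof fake_bios_map[0])

/* Hypothetical stand-in for the 0xe820 BIOS call: copies the entry
 * selected by *ebx into *out and stores the next continuation value in
 * *ebx (0 means this was the last entry). Returns 0 on success. */
int fake_e820_call(uint32_t *ebx, struct e820_entry *out)
{
    if (*ebx >= FAKE_MAP_LEN)
        return -1;
    *out = fake_bios_map[*ebx];
    *ebx = (*ebx + 1 < FAKE_MAP_LEN) ? *ebx + 1 : 0;
    return 0;
}

/* Collect entries until the continuation value returns to zero or the
 * caller's table is full. Returns the number of entries stored. */
size_t collect_e820(struct e820_entry *table, size_t max_entries)
{
    size_t count = 0;
    uint32_t ebx = 0;  /* must be zero on the first call */

    if (max_entries == 0)
        return 0;
    do {
        struct e820_entry buf;
        if (fake_e820_call(&ebx, &buf) != 0)
            break;
        table[count++] = buf;
    } while (ebx != 0 && count < max_entries);
    return count;
}
```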

Memory Detection:detect_memory_e820

  • Each e820_entry contains
    1. start of memory segment
    2. size of memory segment
    3. type of memory segment (usable or reserved)

  • Sample dmesg output:
[    0.0] e820: BIOS-provided physical RAM map:
[    0.0] BIOS-e820: [mem 0x00000000-0x0009fbff] usable
[    0.0] BIOS-e820: [mem 0x0009fc00-0x0009ffff] reserved
[    0.0] BIOS-e820: [mem 0x000f0000-0x000fffff] reserved
[    0.0] BIOS-e820: [mem 0x00100000-0x3ffdffff] usable
[    0.0] BIOS-e820: [mem 0x3ffe0000-0x3fffffff] reserved
[    0.0] BIOS-e820: [mem 0xfffc0000-0xffffffff] reserved

Keyboard Initialization

  • keyboard_init function
    • Call interrupt 0x16 to query the status of the keyboard
      ​​​​​​​​initregs(&ireg);
      ​​​​​​​​ireg.ah = 0x02;     /* Get keyboard status */
      ​​​​​​​​intcall(0x16, &ireg, &oreg);
      ​​​​​​​​boot_params.kbd_status = oreg.al;
      
    • Call interrupt 0x16 again to set repeat rate and delay.
      ​​​​​​​​ireg.ax = 0x0305;   /* Set keyboard repeat rate */
      

Querying

  • The next steps are queries for different parameters.
  • We will not dive into details about these queries now.
  • In the next few slides, we will look at a few of these functions as examples.

Function: query_ist

  • Get Intel SpeedStep (a variable CPU frequency feature provided by Intel) information by calling the query_ist function.
    • Checks the CPU level first.
    • If the level is sufficient, calls interrupt 0x15 to get the information and saves the result to boot_params.

Function: query_apm_bios

  • query_apm_bios calls the interrupt 0x15 with ah=0x53 to check APM installation.
  • APM : Advanced Power Management
    • A standard for power management

Function: query_apm_bios

  • Next, it calls 0x15 again, but with ax=0x5304 to disconnect the APM interface and then connect the 32-bit protected mode interface.
  • In the end, it fills boot_params.apm_bios_info with values obtained from the BIOS.

Function: query_apm_bios

  • Note: query_apm_bios is executed only when the CONFIG_APM or CONFIG_APM_MODULE compile flag is set.

Function: query_edd

  • The last is the query_edd function, which queries EDD (Enhanced Disk Drive) information from the BIOS.
  • Enhanced Disk Drive
    • An interface that provides better access to hard drives
  • Can be disabled from the kernel's command line
  • It also uses a loop to query this information

Function: query_edd

  • The simplified code of query_edd
    for (devno = 0x80; devno < 0x80 + EDD_MBR_SIG_MAX; devno++) {
        if (!get_edd_info(devno, &ei) &&
            boot_params.eddbuf_entries < EDDMAXNR) {
            memcpy(edp, &ei, sizeof ei);
            edp++;
            boot_params.eddbuf_entries++;
        }
        ...
        ...
        ...
    }

Takeaway Questions (5)

  • What does function validate_cpu do?
    (A). Check if CPU is in right level.
    (B). Check whether CPU is broken.
    (C). Benchmark to check the speed of the CPU.

Takeaway Questions (6)

  • What does function detect_memory_e820 do?
    (A). Detect the bandwidth of memory.
    (B). Detect the DRAM generation.
    (C). Get information about available memory address ranges.

Video mode initialization and transition to protected mode

Ting Shiuan Guan, Tim Lin


main() in arch/x86/boot/main.c

    /* Set the video mode */
    set_video();

    /* Do the last things and invoke protected mode */
    go_to_protected_mode();
}

Kernel data types

size (bytes)    1     2      4    8
signed type     char  short  int  long
unsigned type   u8    u16    u32  u64

Heap API

Defined in arch/x86/boot/boot.h

/* Heap -- available for dynamic lists. */
extern char _end[];
extern char *HEAP;
extern char *heap_end;
#define RESET_HEAP() ((void *)( HEAP = _end ))
static inline char *__get_heap(size_t s, size_t a, size_t n)
{
    char *tmp;

    HEAP = (char *)(((size_t)HEAP+(a-1)) & ~(a-1));
    tmp = HEAP;
    HEAP += s*n;
    return tmp;
}
#define GET_HEAP(type, n) \
    ((type *)__get_heap(sizeof(type),__alignof__(type),(n)))

static inline bool heap_free(size_t n)
{
    return (int)(heap_end-HEAP) >= (int)n;
}
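
The heap API is a bump allocator: round the pointer up to the requested alignment, hand it out, and advance it. The sketch below reproduces that idea in a form that runs in user space. The 256-byte `arena` is a hypothetical stand-in for the memory after `_end`, and the lowercase function names are chosen here to avoid confusion with the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static _Alignas(16) char arena[256];             /* stand-in for the heap area */
static char *heap_ptr = arena;                   /* plays the role of HEAP */
static char *heap_limit = arena + sizeof arena;  /* plays the role of heap_end */

/* Forget all allocations by moving the pointer back to the start. */
void reset_heap(void)
{
    heap_ptr = arena;
}

/* Round the heap pointer up to `align` (a power of two), then reserve
 * size * count bytes. Like the original, this does not check capacity;
 * callers are expected to ask heap_free() first. */
void *get_heap(size_t size, size_t align, size_t count)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t p = ((uintptr_t)heap_ptr + (align - 1)) & ~(uintptr_t)(align - 1);
    char *start = (char *)p;
    heap_ptr = start + size * count;
    return start;
}

/* Returns nonzero if at least n bytes remain. */
int heap_free(size_t n)
{
    return (ptrdiff_t)(heap_limit - heap_ptr) >= (ptrdiff_t)n;
}
```

Allocating three single bytes and then two 4-byte values places the second allocation 4 bytes after the first, because the pointer is rounded up from 3 to 4 before it is handed out.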

Video Mode

main() in arch/x86/boot/main.c

    /* Set the video mode */
    set_video();

    /* Do the last things and invoke protected mode */
    go_to_protected_mode();
}

set_video() in arch/x86/boot/video.c

void set_video(void)
{
    u16 mode = boot_params.hdr.vid_mode;

    RESET_HEAP();

    store_mode_params();
    save_screen();
    probe_cards(0);

    for (;;) {
        if (mode == ASK_VGA)
            mode = mode_menu();

        if (!set_mode(mode))
            break;

        printf("Undefined video mode number: %x\n", mode);
        mode = ASK_VGA;
    }
    boot_params.hdr.vid_mode = mode;
    vesa_store_edid();
    store_mode_params();

    if (do_restore)
        restore_screen();
}

set_video() in arch/x86/boot/video.c

void set_video(void)
{
    u16 mode = boot_params.hdr.vid_mode;

    RESET_HEAP();
  • Boot protocol
    • header vid_mode
      • offset 0x01FA / size 2
    • command line options vga=<mode>
      • integer / normal / ext / ask

set_video() in arch/x86/boot/video.c

    u16 mode = boot_params.hdr.vid_mode;

    RESET_HEAP();

    store_mode_params();
    save_screen();
    probe_cards(0);
  • Reset heap.
  • Store in boot_params.screen_info.

In arch/x86/boot/video.c
store_mode_params()

/*
 * Store the video mode parameters for later usage by the kernel.
 * This is done by asking the BIOS except for the rows/columns
 * parameters in the default 80x25 mode -- these are set directly,
 * because some very obscure BIOSes supply insane values.
 */
static void store_mode_params(void)
{
    u16 font_size;
    int x, y;

    /* For graphics mode, it is up to the mode-setting driver
       (currently only video-vesa.c) to store the parameters */
    if (graphic_mode)
        return;

    store_cursor_position();
    store_video_mode();

    if (boot_params.screen_info.orig_video_mode == 0x07) {
        /* MDA, HGC, or VGA in monochrome mode */
        video_segment = 0xb000;
    } else {
        /* CGA, EGA, VGA and so forth */
        video_segment = 0xb800;
    }

    set_fs(0);
    font_size = rdfs16(0x485); /* Font size, BIOS area */
    boot_params.screen_info.orig_video_points = font_size;

    x = rdfs16(0x44a);
    y = (adapter == ADAPTER_CGA) ? 25 : rdfs8(0x484)+1;

    if (force_x)
        x = force_x;
    if (force_y)
        y = force_y;

    boot_params.screen_info.orig_video_cols = x;
    boot_params.screen_info.orig_video_lines = y;
}

set_video() in arch/x86/boot/video.c

    RESET_HEAP();

    store_mode_params();
    save_screen();
    probe_cards(0);
  • Save contents of screen to heap.

save_screen()

static void save_screen(void)
{
    /* Should be called after store_mode_params() */
    saved.x = boot_params.screen_info.orig_video_cols;
    saved.y = boot_params.screen_info.orig_video_lines;
    saved.curx = boot_params.screen_info.orig_x;
    saved.cury = boot_params.screen_info.orig_y;

    if (!heap_free(saved.x*saved.y*sizeof(u16)+512))
        return; /* Not enough heap to save the screen */

    saved.data = GET_HEAP(u16, saved.x*saved.y);

    set_fs(video_segment);
    copy_from_fs(saved.data, 0, saved.x*saved.y*sizeof(u16));
}

set_video() in arch/x86/boot/video.c

    store_mode_params();
    save_screen();
    probe_cards(0);

    for (;;) {
  • Probe video drivers and generate mode lists.

probe_cards() in arch/x86/boot/video-mode.c

/* Probe the video drivers and have them generate their mode lists. */
void probe_cards(int unsafe)
{
    struct card_info *card;
    static u8 probed[2];

    if (probed[unsafe])
        return;

    probed[unsafe] = 1;

    for (card = video_cards; card < video_cards_end; card++) {
        if (card->unsafe == unsafe) {
            if (card->probe)
                card->nmodes = card->probe();
            else
                card->nmodes = 0;
        }
    }
}

video_vga in arch/x86/boot/video-vga.c

static __videocard video_vga = {
    .card_name = "VGA",
    .probe     = vga_probe,
    .set_mode  = vga_set_mode,
};

__videocard macro in arch/x86/boot/video.h

#define __videocard \
    struct card_info __section(".videocards") __attribute__((used))

video_cards in arch/x86/boot/setup.ld

.videocards : {
    video_cards = .;
    *(.videocards)
    video_cards_end = .;
}

set_video() in arch/x86/boot/video.c

    probe_cards(0);

    for (;;) {
        if (mode == ASK_VGA)
            mode = mode_menu();

        if (!set_mode(mode))
            break;

        printf("Undefined video mode number: %x\n", mode);
        mode = ASK_VGA;
    }
    boot_params.hdr.vid_mode = mode;

mode_menu() in arch/x86/boot/video.c

static unsigned int mode_menu(void)
{
    int key;
    unsigned int sel;

    puts("Press <ENTER> to see video modes available, "
         "<SPACE> to continue, or wait 30 sec\n");

    kbd_flush();
    while (1) {
        key = getchar_timeout();
        if (key == ' ' || key == 0)
            return VIDEO_CURRENT_MODE; /* Default */
        if (key == '\r')
            break;
        putchar('\a'); /* Beep! */
    }

    for (;;) {
        display_menu();

        puts("Enter a video mode or \"scan\" to scan for "
             "additional modes: ");
        sel = get_entry();
        if (sel != SCAN)
            return sel;

        probe_cards(1);
    }
}

image


set_video() in arch/x86/boot/video.c

    for (;;) {
        if (mode == ASK_VGA)
            mode = mode_menu();

        if (!set_mode(mode))
            break;

        printf("Undefined video mode number: %x\n", mode);
        mode = ASK_VGA;
    }
    boot_params.hdr.vid_mode = mode;
    vesa_store_edid();

set_mode() in arch/x86/boot/video-mode.c

/* Set mode (with recalc if specified) */
int set_mode(u16 mode)
{
    int rv;
    u16 real_mode;

    /* Very special mode numbers... */
    if (mode == VIDEO_CURRENT_MODE)
        return 0; /* Nothing to do... */
    else if (mode == NORMAL_VGA)
        mode = VIDEO_80x25;
    else if (mode == EXTENDED_VGA)
        mode = VIDEO_8POINT;

    rv = raw_set_mode(mode, &real_mode);
    if (rv)
        return rv;

    if (mode & VIDEO_RECALC)
        vga_recalc_vertical();

    /* Save the canonical mode number for the kernel, not
       an alias, size specification or menu position */
#ifndef _WAKEUP
    boot_params.hdr.vid_mode = real_mode;
#endif
    return 0;
}

raw_set_mode() in arch/x86/boot/video-mode.c

/* Set mode (without recalc) */
static int raw_set_mode(u16 mode, u16 *real_mode)
{
    int nmode, i;
    struct card_info *card;
    struct mode_info *mi;

    /* Drop the recalc bit if set */
    mode &= ~VIDEO_RECALC;

    /* Scan for mode based on fixed ID, position, or resolution */
    nmode = 0;
    for (card = video_cards; card < video_cards_end; card++) {
        mi = card->modes;
        for (i = 0; i < card->nmodes; i++, mi++) {
            int visible = mi->x || mi->y;

            if ((mode == nmode && visible) ||
                mode == mi->mode ||
                mode == (mi->y << 8)+mi->x) {
                *real_mode = mi->mode;
                return card->set_mode(mi);
            }

            if (visible)
                nmode++;
        }
    }

    /* Nothing found?  Is it an "exceptional" (unprobed) mode? */
    for (card = video_cards; card < video_cards_end; card++) {
        if (mode >= card->xmode_first &&
            mode < card->xmode_first+card->xmode_n) {
            struct mode_info mix;
            *real_mode = mix.mode = mode;
            mix.x = mix.y = 0;
            return card->set_mode(&mix);
        }
    }

    /* Otherwise, failure... */
    return -1;
}

arch/x86/boot/video-vga.c

static __videocard video_vga = {
    .card_name = "VGA",
    .probe     = vga_probe,
    .set_mode  = vga_set_mode,
};

vga_set_mode()

static int vga_set_mode(struct mode_info *mode)
{
    /* Set the basic mode */
    vga_set_basic_mode();

    /* Override a possibly broken BIOS */
    force_x = mode->x;
    force_y = mode->y;

    switch (mode->mode) {
    case VIDEO_80x25:
        break;
    case VIDEO_8POINT:
        vga_set_8font();
        break;
    case VIDEO_80x43:
        vga_set_80x43();
        break;
    case VIDEO_80x28:
        vga_set_14font();
        break;
    case VIDEO_80x30:
        vga_set_80x30();
        break;
    case VIDEO_80x34:
        vga_set_80x34();
        break;
    case VIDEO_80x60:
        vga_set_80x60();
        break;
    }

    return 0;
}

vga_set_8font() in arch/x86/boot/video-vga.c

static void vga_set_8font(void)
{
    /* Set 8x8 font - 80x43 on EGA, 80x50 on VGA */
    struct biosregs ireg;

    initregs(&ireg);

    /* Set 8x8 font */
    ireg.ax = 0x1112;
    /* ireg.bl = 0; */
    intcall(0x10, &ireg, NULL);

    /* Use alternate print screen */
    ireg.ax = 0x1200;
    ireg.bl = 0x20;
    intcall(0x10, &ireg, NULL);

    /* Turn off cursor emulation */
    ireg.ax = 0x1201;
    ireg.bl = 0x34;
    intcall(0x10, &ireg, NULL);

    /* Cursor is scan lines 6-7 */
    ireg.ax = 0x0100;
    ireg.cx = 0x0607;
    intcall(0x10, &ireg, NULL);
}

set_video() in arch/x86/boot/video.c

    }
    boot_params.hdr.vid_mode = mode;
    vesa_store_edid();
    store_mode_params();

    if (do_restore)
        restore_screen();
}
  • EDID (Extended Display Identification Data)

Last preparation before transition into protected mode


arch/x86/boot/main.c

    /* Set the video mode */
    set_video();

    /* Do the last things and invoke protected mode */
    go_to_protected_mode();
}

arch/x86/boot/pm.c

  • Before entering protected mode, there are a few things to do.
void go_to_protected_mode(void)
{
	realmode_switch_hook();
	if (enable_a20()) {
		...
	}
    reset_coprocessor();
	mask_all_interrupts();
	setup_idt();
	setup_gdt();
	protected_mode_jump(boot_params.hdr.code32_start,
			    (u32)&boot_params + (ds() << 4));
}

realmode_switch_hook

arch/x86/boot/pm.c

void go_to_protected_mode(void)
{
    /* Hook before leaving real mode, also disables interrupts */
    realmode_switch_hook();

    /* Enable the A20 gate */
    if (enable_a20()) {
        puts("A20 gate not responding, unable to boot...\n");
        die();
    }

    /* Reset coprocessor (IGNNE#) */
    reset_coprocessor();

    /* Mask all interrupts in the PIC */
    mask_all_interrupts();
    ...

realmode_switch_hook

  • Invoke the realmode_swtch hook if it is present
    • Hooks are used when the bootloader runs in a hostile environment.
  • io_delay
    • asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT));
static void realmode_switch_hook(void)
{
	if (boot_params.hdr.realmode_swtch) {
		asm volatile("lcallw *%0"
			     : : "m" (boot_params.hdr.realmode_swtch)
			     : "eax", "ebx", "ecx", "edx");
	} else {
		asm volatile("cli");
		outb(0x80, 0x70); /* Disable NMI */
		io_delay();
	}
}

ADVANCED BOOT LOADER HOOKS

If the boot loader runs in a particularly hostile environment (such as
LOADLIN, which runs under DOS) it may be impossible to follow the
standard memory location requirements.  

Such a boot loader may use the
following hooks that, if set, are invoked by the kernel at the
appropriate time.  The use of these hooks should probably be
considered an absolutely last resort!

realmode_switch_hook (No hook)

  • Disable NMI (Non-Maskable Interrupt)
    • A hardware interrupt that standard interrupt masking cannot suppress
    • Usually raised on unrecoverable hardware errors
  • Done by writing 0x80 to the CMOS address register (port 0x70); bit 7 is the NMI-disable bit
static void realmode_switch_hook(void)
{
	if (boot_params.hdr.realmode_swtch) {
		asm volatile("lcallw *%0"
			     : : "m" (boot_params.hdr.realmode_swtch)
			     : "eax", "ebx", "ecx", "edx");
	} else {
		asm volatile("cli");
		outb(0x80, 0x70); /* Disable NMI */
		io_delay();
	}
}
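The byte written to port 0x70 does two jobs: the low 7 bits select a CMOS register and bit 7 controls NMI delivery. A minimal sketch of how such a byte is composed, using a hypothetical helper `cmos_index_byte` that is not part of the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

#define CMOS_NMI_DISABLE 0x80	/* bit 7 of the byte written to port 0x70 */

/* Hypothetical helper: build the byte for the CMOS address port.
 * The low 7 bits select the CMOS register; bit 7 disables NMI when set. */
static uint8_t cmos_index_byte(uint8_t reg, bool disable_nmi)
{
	return (uint8_t)((reg & 0x7f) | (disable_nmi ? CMOS_NMI_DISABLE : 0));
}
```

`cmos_index_byte(0, true)` gives 0x80, the value passed to `outb` in the code above.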

Enable A20 line

arch/x86/boot/pm.c

void go_to_protected_mode(void)
{
	/* Hook before leaving real mode, also disables interrupts */
	realmode_switch_hook();

	/* Enable the A20 gate */
	if (enable_a20()) {
		puts("A20 gate not responding, unable to boot...\n");
		die();
	}

	/* Reset coprocessor (IGNNE#) */
	reset_coprocessor();

	/* Mask all interrupts in the PIC */
	mask_all_interrupts();
	...

Enable A20 line

  • The A20 line carries address bit 20, the 21st bit on the address bus
    • Recent Intel processors no longer support the A20 gate
int enable_a20(void)
{
       int loops = A20_ENABLE_LOOPS;
       int kbc_err;

       while (loops--) {
	       /* First, check to see if A20 is already enabled
		  (legacy free, etc.) */
	       if (a20_test_short())
		       return 0;
	       
	       /* Next, try the BIOS (INT 0x15, AX=0x2401) */
	       enable_a20_bios();
	       if (a20_test_short())
		       return 0;
	       
	       /* Try enabling A20 through the keyboard controller */
	       kbc_err = empty_8042();

	       if (a20_test_short())
		       return 0; /* BIOS worked, but with delayed reaction */
	
	       if (!kbc_err) {
		       enable_a20_kbc();
		       if (a20_test_long())
			       return 0;
	       }
	       
	       /* Finally, try enabling the "fast A20 gate" */
	       enable_a20_fast();
	       if (a20_test_long())
		       return 0;
       }
       
       return -1;
}
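The `a20_test_*` calls rely on address wraparound: with the gate closed, address bit 20 is forced to zero, so an address just above 1 MiB aliases low memory. The sketch below only simulates the address arithmetic. The segment:offset pairs are illustrative, and the kernel's real test compares memory contents rather than computed addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define A20_BIT (1u << 20)

/* Simulated address bus: with the A20 gate closed, address bit 20 is
 * forced to zero, so addresses just above 1 MiB wrap to low memory. */
static uint32_t bus_address(uint16_t segment, uint16_t offset, bool a20_enabled)
{
	uint32_t linear = ((uint32_t)segment << 4) + offset;

	return a20_enabled ? linear : (linear & ~A20_BIT);
}

/* Wraparound check: two segment:offset pairs exactly 1 MiB apart
 * resolve to the same location only when A20 is disabled. */
static bool a20_looks_enabled(bool a20_enabled)
{
	uint32_t low  = bus_address(0x0000, 0x0200, a20_enabled);
	uint32_t high = bus_address(0xffff, 0x0210, a20_enabled);

	return low != high;
}
```

Here 0xffff:0x0210 is linear 0x100200. With A20 disabled it collapses to 0x000200, the same location as 0x0000:0x0200, so the check reports the gate as closed.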

Reset Coprocessor

arch/x86/boot/pm.c

void go_to_protected_mode(void)
{
	/* Hook before leaving real mode, also disables interrupts */
	realmode_switch_hook();

	/* Enable the A20 gate */
	if (enable_a20()) {
		puts("A20 gate not responding, unable to boot...\n");
		die();
	}

	/* Reset coprocessor (IGNNE#) */
	reset_coprocessor();

	/* Mask all interrupts in the PIC */
	mask_all_interrupts();
	...

Reset Coprocessor

  • Clears and resets the math coprocessor
    1. Clear it by writing 0 to port 0xf0.
    2. Reset it by writing 0 to port 0xf1.
outb(0, 0xf0);
outb(0, 0xf1);

Mask All Interrupts

arch/x86/boot/pm.c

void go_to_protected_mode(void)
{
	/* Hook before leaving real mode, also disables interrupts */
	realmode_switch_hook();

	/* Enable the A20 gate */
	if (enable_a20()) {
		puts("A20 gate not responding, unable to boot...\n");
		die();
	}

	/* Reset coprocessor (IGNNE#) */
	reset_coprocessor();

	/* Mask all interrupts in the PIC */
	mask_all_interrupts();
	...

Mask All Interrupts

  • Masks all interrupts on the secondary PIC (Programmable Interrupt Controller) and the primary PIC
    • Except for IRQ2 on the primary PIC.
    • The IRQ2 line cascades the secondary PIC (PIC2) into the primary PIC (PIC1).
outb(0xff, 0xa1);       /* Mask all interrupts on the secondary PIC */
outb(0xfb, 0x21);       /* Mask all but cascade on the primary PIC */
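In a PIC mask byte, a set bit masks the corresponding IRQ line and a clear bit leaves it enabled. A minimal sketch of how 0xfb falls out of "mask everything except IRQ2", using a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: PIC mask byte with every IRQ line masked
 * (bit = 1) except the given line, which stays unmasked (bit = 0). */
static uint8_t pic_mask_all_except(unsigned irq_line)
{
	return (uint8_t)(0xff & ~(1u << irq_line));
}
```

`pic_mask_all_except(2)` returns 0xfb, the value written to port 0x21 above. The secondary PIC simply gets 0xff because none of its lines need to stay open.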

Setting up the Interrupt Descriptor Table

arch/x86/boot/pm.c

	/* Actual transition to protected mode... */
	setup_idt();
	setup_gdt();
	protected_mode_jump(boot_params.hdr.code32_start,
			    (u32)&boot_params + (ds() << 4));
}

IDT Struct

struct gdt_ptr {
	u16 len;
	u32 ptr;
} __attribute__((packed));
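The `packed` attribute matters because `lidt` and `lgdt` read a 6-byte operand: a 16-bit limit followed immediately by a 32-bit base. A minimal sketch of the layout, assuming GCC or Clang, with `uint16_t` and `uint32_t` standing in for the kernel's `u16` and `u32` and with struct names chosen here for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the gdt_ptr struct above, using standard
 * fixed-width types. Requires GCC or Clang for the attribute. */
struct table_ptr {
	uint16_t len;	/* table size in bytes, minus one */
	uint32_t ptr;	/* linear base address of the table */
} __attribute__((packed));

/* For comparison: without packed, compilers typically insert two
 * padding bytes after len so that ptr is 4-byte aligned. */
struct table_ptr_unpacked {
	uint16_t len;
	uint32_t ptr;
};
```

With the attribute, `sizeof(struct table_ptr)` is 6 and `ptr` sits at offset 2, which is the form the CPU expects.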

arch/x86/boot/pm.c

  • Load a null IDT
  • Interrupt handling is a separate topic; the real IDT is set up later in the boot process
static void setup_idt(void)
{
	static const struct gdt_ptr null_idt = {0, 0};
	asm volatile("lidtl %0" : : "m" (null_idt));
}

Set up Global Descriptor Table

arch/x86/boot/pm.c

	/* Actual transition to protected mode... */
	setup_idt();
	setup_gdt();
	protected_mode_jump(boot_params.hdr.code32_start,
			    (u32)&boot_params + (ds() << 4));
}

boot_gdt

  • Intel recommends 16-byte alignment
static const u64 boot_gdt[] __attribute__((aligned(16))) = {
    /* CS: code, read/execute, 4 GB, base 0 */
    [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff),
    /* DS: data, read/write, 4 GB, base 0 */
    [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff),
    /* TSS: 32-bit tss, 104 bytes, base 4096 */
    /* We only have a TSS here to keep Intel VT happy;
       we don't actually use it for anything. */
    [GDT_ENTRY_BOOT_TSS] = GDT_ENTRY(0x0089, 4096, 103),
};
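Each `GDT_ENTRY(flags, base, limit)` call produces one 8-byte descriptor in which base and limit are scattered across several bit ranges. The sketch below is an illustrative packer written from the x86 descriptor layout, not the kernel's macro itself. The name `pack_descriptor` is an assumption.

```c
#include <stdint.h>

/* Illustrative packer for an 8-byte segment descriptor. `flags` uses
 * the same 16-bit form as the table above: the low byte is the access
 * byte and the top nibble holds G/D/L/AVL. */
static uint64_t pack_descriptor(uint16_t flags, uint32_t base, uint32_t limit)
{
	uint64_t d = 0;

	d |= (uint64_t)(limit & 0x0000ffff);			/* limit 15:0  -> bits 0-15  */
	d |= (uint64_t)(base  & 0x00ffffff) << 16;		/* base 23:0   -> bits 16-39 */
	d |= (uint64_t)(flags & 0xf0ff) << 40;			/* access byte and G/D/L/AVL */
	d |= (uint64_t)(limit & 0x000f0000) << (48 - 16);	/* limit 19:16 -> bits 48-51 */
	d |= (uint64_t)(base  & 0xff000000) << (56 - 24);	/* base 31:24  -> bits 56-63 */
	return d;
}
```

Under this layout the code segment entry, `pack_descriptor(0xc09b, 0, 0xfffff)`, comes out as 0x00cf9b000000ffff. With the granularity bit set, the 20-bit limit 0xfffff counts 4 KiB pages, which gives the 4 GB size named in the comments.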

Load GDT

static struct gdt_ptr gdt;

gdt.len = sizeof(boot_gdt)-1;
gdt.ptr = (u32)&boot_gdt + (ds() << 4);

asm volatile("lgdtl %0" : : "m" (gdt));

GDT_ENTRY Macro

GDT_ENTRY(flags, base, limit)


flags of GDT_ENTRY

  • <1> (G) granularity bit
  • <1> (D) 0 = 16-bit segment; 1 = 32-bit segment
  • <1> (L) executed in 64-bit mode if 1
  • <1> (AVL) available for use by system software
  • <4> limit bits 19:16 in the descriptor (taken from the limit argument)
  • <1> (P) segment present in memory
  • <2> (DPL) privilege level; 0 is the highest privilege
  • <1> (S) 1 = code or data segment, 0 = system segment
  • <3> segment type (execute/read for the code segment)
  • <1> (A) accessed bit
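Reading the field list from the most significant bit down, a flags value such as 0xc09b can be decoded mechanically. The sketch below is illustrative only. The struct and function names are assumptions, and bits 11:8 are skipped because they belong to the limit.

```c
#include <stdint.h>

/* Illustrative decoder for the 16-bit flags value. Not kernel code. */
struct seg_flags {
	unsigned g, d, l, avl;	/* bits 15, 14, 13, 12 */
	unsigned present;	/* bit 7 */
	unsigned dpl;		/* bits 6-5 */
	unsigned s;		/* bit 4 */
	unsigned type;		/* bits 3-1 */
	unsigned accessed;	/* bit 0 */
};

static struct seg_flags decode_flags(uint16_t flags)
{
	struct seg_flags f;

	f.g        = (flags >> 15) & 1;
	f.d        = (flags >> 14) & 1;
	f.l        = (flags >> 13) & 1;
	f.avl      = (flags >> 12) & 1;
	f.present  = (flags >> 7) & 1;
	f.dpl      = (flags >> 5) & 3;
	f.s        = (flags >> 4) & 1;
	f.type     = (flags >> 1) & 7;
	f.accessed = flags & 1;
	return f;
}
```

For 0xc09b this gives G = 1, D = 1, L = 0, AVL = 0, P = 1, DPL = 0, S = 1, type = 101 in binary, and accessed = 1: a present, ring-0, 32-bit code segment with 4 KiB granularity.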

Actual transition into protected mode

arch/x86/boot/pm.c

	/* Actual transition to protected mode... */
	setup_idt();
	setup_gdt();
	protected_mode_jump(boot_params.hdr.code32_start,
			    (u32)&boot_params + (ds() << 4));
}

  • protected_mode_jump(jump_location, boot_parameters)

arch/x86/boot/pmjump.S

  • Save &boot_params (passed in edx) into esi
  • Store the cs register in bx and shift it left by 4 to get the code segment base
GLOBAL(protected_mode_jump)
	movl	%edx, %esi		# Pointer to boot_params table

	xorl	%ebx, %ebx
	movw	%cs, %bx
	shll	$4, %ebx
	addl	%ebx, 2f
	jmp	1f			# Short jump to serialize on 386/486
1:
	...

arch/x86/boot/pmjump.S

  • Patch the jump target at label 2 to (cs << 4) + in_pm32
    • in_pm32 is where execution continues in 32-bit mode
	movl	%edx, %esi		# Pointer to boot_params table

	xorl	%ebx, %ebx
	movw	%cs, %bx
	shll	$4, %ebx
	addl	%ebx, 2f
	jmp	1f			# Short jump to serialize on 386/486
1:
	movw	$__BOOT_DS, %cx
	movw	$__BOOT_TSS, %di

What?

  • Why jump to the next instruction?
	addl	%ebx, 2f
	jmp	1f			# Short jump to serialize on 386/486
1:
	movw	$__BOOT_DS, %cx
	movw	$__BOOT_TSS, %di

What? Explained

  • Why jump to the next instruction?
  • It flushes the CPU's prefetched instructions, so a 386/486 does not run stale bytes after addl has patched the upcoming far jump
	addl	%ebx, 2f
	jmp	1f			# Short jump to serialize on 386/486
1:
	movw	$__BOOT_DS, %cx
	movw	$__BOOT_TSS, %di

Store DS and TSS

	addl	%ebx, 2f
	jmp	1f			# Short jump to serialize on 386/486
1:
	movw	$__BOOT_DS, %cx
	movw	$__BOOT_TSS, %di

Enable Protected-mode

  • Set Protection Enable PE bit in Control Register cr0
	movw	$__BOOT_DS, %cx
	movw	$__BOOT_TSS, %di

	movl	%cr0, %edx
	orb	$X86_CR0_PE, %dl	# Protected mode
	movl	%edx, %cr0
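The three CR0 instructions are a read-modify-write that touches only bit 0 and leaves every other control bit as it was. A pure-C model of that step, with names chosen here for illustration since real code has to use `mov` to and from `%cr0`:

```c
#include <stdint.h>

#define CR0_PE 0x1u	/* bit 0 of CR0: Protection Enable */

/* Model of the read-modify-write on CR0: set PE, keep all other bits. */
static uint32_t cr0_enable_protection(uint32_t cr0)
{
	return cr0 | CR0_PE;
}
```

Because only the lowest bit changes, the assembly can use the byte-sized `orb` on `%dl` instead of a full 32-bit `orl`.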

Finally, A Long Jump

  • 0x66: operand-size prefix, which lets 16-bit code use a 32-bit operand
  • 0xea: far jump opcode
  • in_pm32: jump offset, already patched to (cs << 4) + in_pm32
  • __BOOT_CS: target code segment selector
	movl	%cr0, %edx
	orb	$X86_CR0_PE, %dl	# Protected mode
	movl	%edx, %cr0

	# Transition to 32-bit mode
	.byte	0x66, 0xea		# ljmpl opcode
2:	.long	in_pm32			# offset
	.word	__BOOT_CS		# segment
ENDPROC(protected_mode_jump)
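The far jump is emitted as raw bytes so that the earlier `addl %ebx, 2f` can rewrite its 32-bit offset in place. A minimal sketch of that byte layout and the patch, assuming GCC or Clang for the `packed` attribute. The struct, the function name, and all numeric values used with it are hypothetical.

```c
#include <stdint.h>

/* Byte image of the hand-assembled far jump: prefix, opcode,
 * 32-bit offset, 16-bit selector. Names are illustrative. */
struct far_jump {
	uint8_t  prefix;	/* 0x66 operand-size prefix */
	uint8_t  opcode;	/* 0xea far jump */
	uint32_t offset;	/* link-time offset of the target */
	uint16_t selector;	/* code segment selector */
} __attribute__((packed));

/* Model of `addl %ebx, 2f`: add the real-mode code segment base to
 * the stored offset so it becomes a linear address. */
static void patch_far_jump(struct far_jump *j, uint16_t real_mode_cs)
{
	j->offset += (uint32_t)real_mode_cs << 4;
}
```

With a hypothetical link-time offset of 0x0123 and a real-mode cs of 0x1000, the patched offset becomes 0x10123. That is the address the flat, base-0 code segment needs in order to reach the same instruction.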

First time running under Protected-mode

arch/x86/boot/pmjump.S

  • Set up the segment registers
GLOBAL(in_pm32)
	# Set up data segments for flat 32-bit mode
	movl	%ecx, %ds
	movl	%ecx, %es
	movl	%ecx, %fs
	movl	%ecx, %gs
	movl	%ecx, %ss
	# The 32-bit code sets up its own stack, but this way we do have
	# a valid stack if some debugging hack wants to use it.
	addl	%ebx, %esp

	# Set up TR to make Intel VT happy
	ltr	%di

	# Clear registers to allow for future extensions to the
	# 32-bit boot protocol
	xorl	%ecx, %ecx
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ebp, %ebp
	xorl	%edi, %edi

	# Set up LDTR to make Intel VT happy
	lldt	%cx

	jmpl	*%eax			# Jump to the 32-bit entrypoint
ENDPROC(in_pm32)

arch/x86/boot/pmjump.S

  • Set up a valid stack
GLOBAL(in_pm32)
	# Set up data segments for flat 32-bit mode
	movl	%ecx, %ds
	movl	%ecx, %es
	movl	%ecx, %fs
	movl	%ecx, %gs
	movl	%ecx, %ss
	# The 32-bit code sets up its own stack, but this way we do have
	# a valid stack if some debugging hack wants to use it.
	addl	%ebx, %esp

	# Set up TR to make Intel VT happy
	ltr	%di

	# Clear registers to allow for future extensions to the
	# 32-bit boot protocol
	xorl	%ecx, %ecx
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ebp, %ebp
	xorl	%edi, %edi

	# Set up LDTR to make Intel VT happy
	lldt	%cx

	jmpl	*%eax			# Jump to the 32-bit entrypoint
ENDPROC(in_pm32)

arch/x86/boot/pmjump.S

  • Clear the general purpose registers before jumping
GLOBAL(in_pm32)
	# Set up data segments for flat 32-bit mode
	movl	%ecx, %ds
	movl	%ecx, %es
	movl	%ecx, %fs
	movl	%ecx, %gs
	movl	%ecx, %ss
	# The 32-bit code sets up its own stack, but this way we do have
	# a valid stack if some debugging hack wants to use it.
	addl	%ebx, %esp

	# Set up TR to make Intel VT happy
	ltr	%di

	# Clear registers to allow for future extensions to the
	# 32-bit boot protocol
	xorl	%ecx, %ecx
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ebp, %ebp
	xorl	%edi, %edi

	# Set up LDTR to make Intel VT happy
	lldt	%cx

	jmpl	*%eax			# Jump to the 32-bit entrypoint
ENDPROC(in_pm32)

arch/x86/boot/pmjump.S

  • JUMP!
GLOBAL(in_pm32)
	# Set up data segments for flat 32-bit mode
	movl	%ecx, %ds
	movl	%ecx, %es
	movl	%ecx, %fs
	movl	%ecx, %gs
	movl	%ecx, %ss
	# The 32-bit code sets up its own stack, but this way we do have
	# a valid stack if some debugging hack wants to use it.
	addl	%ebx, %esp

	# Set up TR to make Intel VT happy
	ltr	%di

	# Clear registers to allow for future extensions to the
	# 32-bit boot protocol
	xorl	%ecx, %ecx
	xorl	%edx, %edx
	xorl	%ebx, %ebx
	xorl	%ebp, %ebp
	xorl	%edi, %edi

	# Set up LDTR to make Intel VT happy
	lldt	%cx

	jmpl	*%eax			# Jump to the 32-bit entrypoint
ENDPROC(in_pm32)

Takeaway Questions (7)

  • What is the purpose of the heap_free function?
    (A). Free memory allocated on the heap.
    (B). Check if the heap has enough space.
    (C). Swap the heap to disk to free memory.

Takeaway Questions (8)

  • Why is there a jump to the next instruction?
    (A). Clear out prefetched instructions
    (B). Recalculate PC by an offset
    (C). Store general purpose registers

Thanks
