# S4 Hibernate design v1.0

Important note: the purpose of this document is mostly to describe what happens between the loader realizing that there is an image to resume and the moment when the kernel C environment is back. It does not discuss device tree suspend/resume, nor the actions taken when suspending and writing the image.

## Suspend to disk (incomplete).

The format of the suspend image would be ELF. It is more similar to a core dump ELF object than to an executable or a shared object. We will allocate an OS-specific e_type to indicate it. Also e_machine is useful.

```
#define ET_FREEBSD_S4_IMAGE (ET_LOOS + 1)
```

We would utilize program headers:
- To provide the memory segment information, using PT_LOAD segments with p_paddr to locate them in physical memory (p_vaddr does not matter, it is essentially automatic with the kernel page table activation).
- To point to a special segment that contains the data required to start the kernel, very much like savepcb. We need at least the right values for %cr3, %cr4, %rsp, and the entry point. Again, the segment would use an OS-specific p_type.
  ```
  #define PT_FREEBSD_S4_PCB 0x61000054

  struct s4_pcb {
  	uint64_t cr0;
  	uint64_t cr3;		/* kernel page table root */
  	uint64_t cr4;
  	uint64_t rsp;		/* C stack for the entry point */
  	uint64_t rip;		/* entry point */
  	uint64_t gsbase;	/* KGSBASE, to have curthread right away */
  	uint32_t acpic_facs_hwsig; /* hardware signature from the ACPI FACS table */
  };
  ```
- The suspended image activator needs some memory to breathe. This should be solved by the suspending kernel allocating some amount of low (<= 4G) contiguous physical memory, and again communicating its placement to the activator by a private-typed segment. The segment must be part of an existing valid PT_LOAD segment. For a description of the trampoline buffer, see below.
  ```
  #define PT_FREEBSD_S4_TRAMPOLINE 0x61000055

  struct trampoline_buf_desc {
  	uint64_t num_pages;
  	uint64_t phys_page[];
  };
  ```

## Resume before kernel.

I propose to support only UEFI machines, and to have a special chain-loaded UEFI application that activates a suspend image. Selecting the image might be left to the proper loader, which already has rich UI facilities. Also, the text below is written under the assumption that we work with a plain-text image; encryption should be handled by other layers.

The activator would do the following:
- Read program headers from the provided image. As the first action, it must check compatibility of the kernel physical segments map with the layout of segments reported by UEFI. Ideally, this should be done after ExitBootServices(), but then we do not have any way to refuse further activation other than a reboot. This is also the moment to compare the current FACS.HW_SIGNATURE with the one stored in s4_pcb.
- If the image is compatible enough with the UEFI map, we allocate from UEFI at the location specified by `PT_FREEBSD_S4_TRAMPOLINE` and copy some bootstrap there.

BIG PROBLEM: Then we need to read all the data from the PT_LOAD segments of the image into the right memory locations. The trouble is that we need working UEFI fs access (EFI_LOAD_FILE_PROTOCOL), while we are potentially going to write this data over firmware-allocated memory, simply due to the PT_LOAD segments layout. This cannot be solved by buffering, since we need to dump and then restore the whole system memory.

An approach that might help is to remember the whole UEFI memory map, in particular the layout of the boot code and data, and not just the runtime memory. Then we might try to put the trampoline into a place which does not intersect with the boot firmware.

Next, to give the trampoline space to work, we may pre-allocate enough pages in the suspending kernel; let's call it the trampoline buffer. This memory should be described to the loader as free to use.
The consequence is that the content of these pages does not need restoration, and the loader's trampoline can use them as a one-time buffer to transfer data to memory owned by UEFI boot services but otherwise reused by the kernel. The kernel must be careful to ensure that the trampoline buffer does not intersect with any EFI memory as described by the EFI map.

[It is highly likely that it is enough to allocate all pages to the trampoline buffer above 4G](/XKTGrIr9TcOJjtqLav-wNQ)

[This still does not help with loading the rest]

- If the previous step, the load of the physical segments, went fine, we can definitely call ExitBootServices() from the trampoline and start preparing to enter the kernel entry point. Copying from the trampoline buffer to the final memory happens after ExitBootServices(), of course. There, we should be able to simply load %cr3 with the kernel page table root pointer, since the trampoline is mapped 1:1 by UEFI, and the kernel page table also maps it 1:1 below 4G by convention. Then we can load the stack pointer. From now on, we should be in kernel C land.

Kernel startup:
- We are on the BSP, and we must immediately reload the right GDT, IDT, and TR.
- The C stack was left in the call chain to the ACPI power off method. The entry point would restore %rsp to some point in that call chain where the CPU state was saved. This should be handled more accurately, perhaps by unwinding to some mutually agreed point with a mechanism like longjmp.
- Then we need to re-initialize our LAPIC and proceed to the startup of the APs. AP startup should be similar to, but much simpler than, normal boot: since all structures are already set up, we only need to point the hardware at them.
- The trampoline memory can be freed.
- The next step is to resume the device tree. Hand-waving.
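
The first activator step (checking the image's PT_LOAD layout against the UEFI memory map and comparing FACS.HW_SIGNATURE) could look roughly like the sketch below. This is only an illustration under stated assumptions: `struct s4_seg`, `struct s4_efi_range`, `s4_overlap()` and `s4_image_compatible()` are hypothetical names that do not exist in any loader or kernel source, and the refusal policy (reject when any segment touches a range the firmware does not hand over to the OS) is one possible reading of "compatible enough".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified PT_LOAD descriptor: only p_paddr and p_memsz. */
struct s4_seg {
	uint64_t paddr;
	uint64_t size;
};

/* Hypothetical simplified UEFI memory map entry. */
struct s4_efi_range {
	uint64_t start;
	uint64_t size;
	bool	 os_usable;	/* false for runtime, reserved, MMIO ranges */
};

/* Half-open intervals [a, a + alen) and [b, b + blen); assumes no wraparound. */
static bool
s4_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return (a < b + blen && b < a + alen);
}

/*
 * Assumed policy: refuse the image if the FACS hardware signature changed
 * since suspend, or if any PT_LOAD segment touches a range that the
 * firmware does not hand over to the OS.
 */
bool
s4_image_compatible(const struct s4_seg *segs, size_t nsegs,
    const struct s4_efi_range *map, size_t nmap,
    uint32_t saved_hwsig, uint32_t current_hwsig)
{
	assert(segs != NULL && map != NULL);

	if (saved_hwsig != current_hwsig)
		return (false);
	for (size_t i = 0; i < nsegs; i++) {
		for (size_t j = 0; j < nmap; j++) {
			if (!map[j].os_usable &&
			    s4_overlap(segs[i].paddr, segs[i].size,
			    map[j].start, map[j].size))
				return (false);
		}
	}
	return (true);
}
```

Note that the sketch treats boot services memory as usable. That is only true after ExitBootServices(), which is exactly the difficulty described under BIG PROBLEM.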