contributed by <Tao Chiu>
linux
x64
Memory Addressing
IA-32e
System Overview
x86
Long Mode (IA-32e)
SSE
CPUID.80000008H:EAX[bits 7-0] (3.3.1)
The processor is in IA-32e mode whenever CR0.PG = 1 and IA32_EFER.LME = 1. This fact is reported in IA32_EFER.LMA [bit 10]. Software cannot set this bit directly; it is always the logical AND of CR0.PG and IA32_EFER.LME.
IA-32e
Segmentation
In 64-bit mode, the processor treats the segment bases of CS, DS, ES, and SS as zero, creating a linear address that is equal to the effective address.
FS and GS are used for memory addressing. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations.
FS-segment and GS-segment overrides are not checked for a runtime limit nor subjected to attribute checking. Normal segment loads into FS and GS load a standard 32-bit base value into the hidden portion of the segment register. The base address bits above the standard 32 bits are cleared to 0 to allow consistency for implementations that use fewer than 64 bits (3.4.4).
fs/gs base, mentioned here.
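The segment-base rule above can be sketched as a toy model. This is only an illustration, not processor or kernel code: the function names `linear_address` and `base_after_segment_load` and the sample base value are made up for this example.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits


def base_after_segment_load(descriptor_base):
    """Model a normal segment load into FS/GS: only a 32-bit base is kept,
    and the bits above bit 31 are cleared to 0."""
    return descriptor_base & 0xFFFFFFFF


def linear_address(segment, effective, fs_base=0, gs_base=0):
    """Toy model of linear-address formation in 64-bit mode.

    CS, DS, ES and SS are treated as having a zero base, so the linear
    address equals the effective address. FS and GS add their base.
    """
    bases = {"CS": 0, "DS": 0, "ES": 0, "SS": 0,
             "FS": fs_base, "GS": gs_base}
    return (bases[segment] + effective) & MASK64


if __name__ == "__main__":
    # Hypothetical FS base, chosen only for the demonstration.
    print(hex(linear_address("DS", 0x1000)))
    print(hex(linear_address("FS", 0x28, fs_base=0x7F0000001000)))
```

The model ignores limit and attribute checks, which matches the statement above that FS and GS overrides are not subject to them.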
1. Code- and Data-Segment Descriptor Types
When IA-32e mode is active (IA32_EFER.LMA = 1):
If the L-bit (CS.L) is 0, the processor is running in compatibility mode (as if in IA-32 mode). In this case, the D-bit (CS.D) selects the default size for data and addresses: if CS.D = 0, the default size is 16-bit; otherwise, it is 32-bit.
If CS.L = 1, the only valid setting is CS.D = 0.
2. System Descriptor Types: System descriptors such as call gates, interrupt gates, and trap gates are extended to 16 bytes in IA-32e mode. The Type field of a system descriptor defines whether it is a call gate, an interrupt gate, or a trap gate (task gates are not supported in IA-32e mode).
IA-32e Paging
There are four paging modes supported by Intel: basic 32-bit paging, PAE paging, 4-level paging, and 5-level paging.
- CR0.WP, bit[16]: If CR0.WP = 0, supervisor-mode write accesses are allowed to linear addresses with read-only access rights.
- CR0.PG, bit[31]: Enables paging.
- CR4.PAE, bit[5]: Determines the paging mode together with LME in the IA32_EFER MSR.
- CR4.PSE, bit[4]: Enables 4-MByte pages for 32-bit paging.
- CR4.PGE, bit[7]: If CR4.PGE = 1, specified translations may be shared across address spaces.
- CR4.LA57, bit[12]: Determines whether 4-level or 5-level paging is used for IA-32e paging. A #GP is raised if software tries to change this bit while IA-32e mode is active (CR0.PG = 1 and IA32_EFER.LME = 1).
- CR4.PCIDE, bit[17]: Enables process-context identifiers (PCIDs) for 4-level and 5-level paging. PCIDs allow a logical processor to cache information for multiple linear-address spaces.
- CR4.SMEP, bit[20]: If CR4.SMEP = 1, software operating in supervisor mode cannot fetch instructions from linear addresses that are accessible in user mode.
- CR4.SMAP, bit[21]: If CR4.SMAP = 1, software operating in supervisor mode cannot access data at linear addresses that are accessible in user mode.
- CR4.PKE, bit[22]: Allows each linear address to be associated with a protection key.
- CR4.CET, bit[23]: If CR4.CET = 1, certain memory accesses are identified as shadow-stack accesses.
- CR4.PKS, bit[24]: Protection keys for supervisor-mode pages.
- LME in the IA32_EFER MSR, bit[8]: Determines the paging mode together with CR4.PAE.
- NXE in the IA32_EFER MSR, bit[11]: Enables non-executable (NX) pages.
- AC in EFLAGS, bit[18]:

CR3 and the corresponding paging structures should be properly initialized before enabling paging. Also, the page-fault handler should be correctly installed in the IDT; otherwise, any paging-related exception may cause the processor to reset due to a triple fault.

- 4-level paging is used if CR0.PG = 1, CR4.PAE = 1, IA32_EFER.LME = 1, and CR4.LA57 = 0. 4-level paging translates 48-bit linear addresses to 52-bit physical addresses. However, because linear addresses are limited to 48 bits (9*4 + 12), at most 256 TBytes of linear-address space may be accessed at any given time.
- 5-level paging is used if CR0.PG = 1, CR4.PAE = 1, IA32_EFER.LME = 1, and CR4.LA57 = 1. 5-level paging translates 57-bit linear addresses to 52-bit physical addresses.

CR3
CR3 is used to locate the first paging structure, the PML4 or PML5 table. The use of CR3 with 4-level or 5-level paging depends on whether process-context identifiers (PCIDs) have been enabled by setting CR4.PCIDE.
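As a rough illustration of how the control bits select a paging mode and how a linear address breaks down into 9-bit table indices plus a 12-bit page offset (9*4 + 12 = 48, 9*5 + 12 = 57), here is a small sketch. The function names `paging_mode` and `split_linear_address` are hypothetical, and the model covers only 4-KB pages.

```python
def paging_mode(cr0_pg, cr4_pae, efer_lme, cr4_la57=0):
    """Return the paging mode implied by the given control bits."""
    if not cr0_pg:
        return "none"
    if not cr4_pae:
        if efer_lme:
            # Enabling paging with LME = 1 and PAE = 0 faults on hardware.
            raise ValueError("CR0.PG = 1 with LME = 1 requires CR4.PAE = 1")
        return "32-bit"
    if not efer_lme:
        return "PAE"
    return "5-level" if cr4_la57 else "4-level"


def split_linear_address(la, levels=4):
    """Split a linear address into table indices (top level first)
    and the 12-bit offset within a 4-KB page."""
    offset = la & 0xFFF
    indices = [(la >> (12 + 9 * i)) & 0x1FF
               for i in reversed(range(levels))]
    return indices, offset


if __name__ == "__main__":
    print(paging_mode(1, 1, 1))            # IA-32e paging without LA57
    la = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
    print(split_linear_address(la))        # PML4, PDPT, PD, PT indices
```

With `levels=5` the same helper returns one extra leading index for the PML5 table.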
- XD-bit: If IA32_EFER.NXE = 1, execute-disable (if 1, instruction fetches are not allowed from the region controlled by this entry); otherwise, reserved (must be 0).
- Protection key: If CR4.PKE = 1 or CR4.PKS = 1, this may control the page's access rights.
- PS-bit (7): Page size. If the entry maps either a 1-GB or a 2-MB page, this bit must be 1. Otherwise, this entry references a next-level page table.
- PAT-bit: Indirectly determines the memory type used to access the 4-KB page controlled by this entry.
- G-bit: Global; if CR4.PGE = 1, determines whether the translation is global.
- D-bit: Dirty; indicates whether software has written to the page referenced by this entry.
- A-bit: Accessed; indicates whether software has accessed the page referenced by this entry.
- PCD-bit and PWT-bit: Page-level cache disable and page-level write-through; indirectly determine the memory type used to access either a next-level page table or a page.
- U/S-bit: User/supervisor; if 0, user-mode accesses are not allowed to the region controlled by this entry.
- R/W-bit: Read/write; if 0, writes may not be allowed to the region controlled by this entry.
- P-bit: Present; must be 1 to reference a page or map a page table.
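The flag bits listed above can be pulled out of a 64-bit paging-structure entry with a few shifts and masks. The sketch below assumes the bit positions given in the Intel manual (P = 0, R/W = 1, U/S = 2, PWT = 3, PCD = 4, A = 5, D = 6, bit 7 = PS or PAT depending on the level, G = 8, protection key = 62:59, XD = 63); the name `decode_entry` and the sample value are made up for illustration.

```python
# Bit positions of the single-bit flags in a 64-bit entry.
# Bit 7 is PS in a PDPTE/PDE and PAT in a PTE that maps a 4-KB page.
FLAG_BITS = {
    "P": 0, "R/W": 1, "U/S": 2, "PWT": 3, "PCD": 4,
    "A": 5, "D": 6, "PS/PAT": 7, "G": 8, "XD": 63,
}

ADDR_MASK = 0x000FFFFFFFFFF000  # bits 51:12


def decode_entry(entry):
    """Decode the flags of a paging-structure entry into a dict."""
    fields = {name: (entry >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["pkey"] = (entry >> 59) & 0xF  # protection key, bits 62:59
    # Valid as-is for a next-level table or a 4-KB page; entries that map
    # 2-MB or 1-GB pages reuse the low bits of this field.
    fields["addr"] = entry & ADDR_MASK
    return fields


if __name__ == "__main__":
    # Hypothetical entry: present, writable, user, accessed, dirty, no-execute.
    for name, value in decode_entry(0x8000000012345067).items():
        print(name, hex(value))
```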
For a page table at each level,
Note: A processor may cache information from the paging-structure entries in TLBs and paging-structure caches (see Section 4.10). These structures may include information about access rights. The processor may enforce access rights based on the TLBs and paging-structure caches instead of on the paging structures in memory. See section 4.10.4.2 for more information on invalidating TLBs.
It is not covered by this topic at this time. Please refer to the manual or my seminar presentation (Oct. 2019) for more information.
The manual uses the term "memory type" of a memory access for the type of caching used for that access. This behavior may be jointly controlled by bits in the paging structures, the memory-type range registers (MTRRs), and, if supported, a 64-bit MSR table called the page attribute table (IA32_PAT). PAT is supported by all processors that support 4-level or 5-level paging, so we skip the cache-control mechanism without PAT support.
Encoding in MTRR | Memory Type |
---|---|
0x00 | UC |
0x01 | WC |
0x04 | WT |
0x05 | WP |
0x06 | WB |
0x2, 0x3, 0x7-0xFF | Reserved |
MTRRs control caching of selected regions of physical memory. They can be divided into two categories. The first controls the first 1 MB of physical addresses, from 0x0 to 0xFFFFF, with 11 fixed-range MTRRs. The second may control any physical pages with m variable-range MTRRs, where m is reported by the VCNT field of the IA32_MTRRCAP register.
To define a memory type for a region of physical addresses, we work with the IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn registers. The PhysBase and PhysMask fields are used to define the boundaries of the memory region.
Then, the Type field is used to encode the actual memory type of that region.
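To illustrate how a base/mask pair delimits a region, the sketch below uses the matching rule from the Intel manual: an address falls in the range when `address AND PhysMask` equals `PhysBase AND PhysMask`. The helper names, the 36-bit MAXPHYADDR, and the example region are assumptions for the demonstration; the values are full physical addresses, that is, the register fields already shifted left by 12.

```python
# Memory-type encodings, as in the MTRR table above.
MTRR_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB"}


def mask_for_size(size, maxphyaddr=36):
    """Build a mask for a power-of-two sized, size-aligned region."""
    if size & (size - 1):
        raise ValueError("size must be a power of two")
    return ~(size - 1) & ((1 << maxphyaddr) - 1)


def mtrr_matches(addr, phys_base, phys_mask):
    """True if addr lies in the region described by base and mask."""
    return (addr & phys_mask) == (phys_base & phys_mask)


if __name__ == "__main__":
    # Hypothetical 256-MB region starting at 0xC0000000.
    base = 0xC0000000
    mask = mask_for_size(0x10000000)
    print(hex(mask))
    print(mtrr_matches(0xC8000000, base, mask))
    print(mtrr_matches(0xD0000000, base, mask))
    print(MTRR_TYPES[0x06])
```

The mask-based rule is why variable ranges are normally power-of-two sized and aligned to their size.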
Encoding | Memory Type |
---|---|
0x00 | UC |
0x01 | WC |
0x04 | WT |
0x05 | WP |
0x06 | WB |
0x07 | UC- |
0x2, 0x3, 0x8-0xFF | Reserved |
The PAT is a companion feature to the MTRRs; that is, the MTRRs allow mapping of memory types to regions of the physical address space, whereas the PAT allows mapping of memory types to pages within the linear address space. PAT is more flexible than MTRRs in that it has no hardware limit on the number of such attribute settings allowed.
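A short sketch of how a page's PAT, PCD, and PWT bits pick one of the eight entries in IA32_PAT may help here. It assumes the index is formed as PAT*4 + PCD*2 + PWT and uses the power-on default MSR value 0x0007040600070406 described in the Intel manual; an operating system may program different entries, and the function name `pat_memory_type` is made up for this example.

```python
# Memory-type encodings, as in the PAT table above.
PAT_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT",
             0x05: "WP", 0x06: "WB", 0x07: "UC-"}

# Assumed power-on default of the IA32_PAT MSR; the OS may change it.
DEFAULT_PAT_MSR = 0x0007040600070406


def pat_memory_type(pat, pcd, pwt, pat_msr=DEFAULT_PAT_MSR):
    """Return the memory type selected by the three page-level bits."""
    index = (pat << 2) | (pcd << 1) | pwt      # selects entry PA0..PA7
    encoding = (pat_msr >> (8 * index)) & 0x7  # low 3 bits of that byte
    return PAT_TYPES[encoding]


if __name__ == "__main__":
    for pat in (0, 1):
        for pcd in (0, 1):
            for pwt in (0, 1):
                print(pat, pcd, pwt, pat_memory_type(pat, pcd, pwt))
```

The final memory type of an access also depends on the MTRR type of the physical address; that combination step is not modelled here.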
Process-Context Identifiers (PCID)
Processors may cache data from paging structures to accelerate address translation. For a cache entry in either the TLB or the paging-structure caches, the processor may associate the current PCID with the translation information to test whether it belongs to the current address space.
PCID is a 12-bit identifier with the following properties.
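As a rough sketch of where the 12-bit PCID lives, the example below splits a CR3 value into the physical base of the top-level table (bits 51:12) and the PCID (bits 11:0). It assumes the Intel manual's layout: with CR4.PCIDE = 1 the low 12 bits hold the PCID, and with CR4.PCIDE = 0 the current PCID is always 000H. The name `decode_cr3` and the sample value are hypothetical.

```python
TABLE_BASE_MASK = 0x000FFFFFFFFFF000  # bits 51:12 of CR3
PCID_MASK = 0xFFF                     # bits 11:0 of CR3


def decode_cr3(cr3, pcide):
    """Return (top-level table base, current PCID) for a CR3 value.

    With CR4.PCIDE = 0 the low bits of CR3 carry PWT/PCD instead,
    and the current PCID is always 0.
    """
    base = cr3 & TABLE_BASE_MASK
    pcid = (cr3 & PCID_MASK) if pcide else 0
    return base, pcid


if __name__ == "__main__":
    base, pcid = decode_cr3(0x12345005, pcide=True)
    print(hex(base), pcid)
```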
IA-32e with Linux (5.8.7)
Test Environment:
DEBUG_KERNEL
DEBUG_INFO
X86_5LEVEL
PREEMPT
Setting up qemu:
building
qemu option lists