# Linux Memory Management - Zones

[TOC]

## Citations

**NOTICE NOTICE NOTICE** The **Citations** section contains excerpts from the talks listed in the **References**. The origin of each excerpt is given at the end of its paragraph, and the contents are attributed to their respective authors.

### NUMA nodes

If we look at the physical memory of your system, we can see that it is subdivided into so-called **nodes**.

![Screenshot from 2024-10-21 00-10-13](https://hackmd.io/_uploads/ryfYjjzlye.png)

Here we see a simplified example of a NUMA system with two nodes. Part of the memory is in node 0, and part of the memory is in node 1. Both nodes together make up the total memory space of this system. You can see that one node has various CPUs connected, and the other node has various CPUs connected as well. These CPUs can access the memory in their own node very fast. They can also access memory in the other node, but that goes via the internal interconnect, which is slow.

-- [43:23, Tutorial: Linux Memory Management and Containers - Gerlof Langeveld, AT Computing](https://youtu.be/ql1axx--8sI?si=a2OlyiSRvc7qt8La&t=2603)

### Zones

So memory is subdivided into nodes, and nodes are subdivided into **zones** for Linux memory management.

```
cat /proc/buddyinfo
Node 0, zone      DMA      1      0      0      1      0      1      0      0      0      1      2
Node 0, zone    DMA32      4      8      6      8      6      7      7      5      3      5    341
Node 0, zone   Normal   4240   1274   1804   1471    746    462    210     81     24     24   5971
```

### ZONE_DMA

What we see here is that the first zone in memory is the so-called **DMA zone**. That's the first 16 MB of memory. It is still rather precious memory if you are still using ISA controllers. ISA controllers can only handle 24-bit addresses, so they can only do DMA in the first 16 MB of memory. So that's a separate zone.

-- [44:59, Tutorial: Linux Memory Management and Containers - Gerlof Langeveld, AT Computing](https://youtu.be/ql1axx--8sI?si=1YEEDpZxA3NwdsGo&t=2699)

(We can see that physical memory is not quite a homogeneous pool of addresses. That's where we kind of start abstracting this.)

-- [5:42, Inspecting and Optimizing Memory Usage in Linux - João Marcos Costa, Bootlin](https://youtu.be/pIR1H7ZyWe4?si=xd6vDfPmg6pj65Ml&t=342)

### ZONE_DMA32

Then we have another zone, the **DMA32 zone**, which runs from 16 MB to 4 GB. That range is addressable with 32 bits, for 32-bit controllers that might do DMA. They have to have their buffers there.

### ZONE_NORMAL

The rest of your memory is, in fact, the **NORMAL** zone.

### `/proc/buddyinfo`

> *Also see [`proc_buddyinfo(5)`](https://man7.org/linux/man-pages/man5/proc_buddyinfo.5.html)* in the `man` pages.

You can get this information about nodes and zones from the `buddyinfo` file in procfs.

```
# cat /proc/buddyinfo
Node 0, zone   Normal     28     13      8      3      3      1      2      2      2      3      2     51
```

First, I got this from an Arm board, a 32-bit machine. Here we can see that we only have the Normal zone, because we haven't hit the roughly 900 MB limit.

For each of those columns, we have the number of available consecutive memory chunks of a certain size. They all have an order. We have 28 chunks of 4K size, because the order is 0. In the next column we have 13 chunks of double the page size (8K), and so on and so forth.

This is also a way to get an idea of how fragmented your memory is, because on the leftmost side you have the smaller chunks, and on the rightmost side you have the bigger chunks of memory.

-- [7:17, Inspecting and Optimizing Memory Usage in Linux - João Marcos Costa, Bootlin](https://youtu.be/pIR1H7ZyWe4?si=rXiEpeCwgOsytRUN&t=437)

(Another example of `/proc/buddyinfo`, from my laptop):

```
Node 0, zone      DMA      1      0      0      1      0      1      0      0      0      1      2
Node 0, zone    DMA32      4      8      6      8      6      7      7      5      3      5    341
Node 0, zone   Normal   4240   1274   1804   1471    746    462    210     81     24     24   5971
```

### Allocation and zones

If you are allocating memory (for DMA), ultimately you're going to end up allocating pages. That's what we use for DMA. When allocating a big chunk of contiguous memory, you can ask the page allocator for the memory to come from a specific **zone**. At startup, the kernel partitions memory into different zones in order to give some granularity with regard to the location of memory allocations, and that's the best granularity we're going to get for placing memory in a specific location.

-- [27:25, SUSE Labs Conference 2020 - DMA mapping for the Raspberry Pi 4](https://youtu.be/I3iRRYjfPFY?si=2qCKq-iNCwb7bH7l&t=1645)

## References

Also see:

1. [Physical Memory](https://docs.kernel.org/mm/physical_memory.html) in the Linux Kernel Documentation.
2. [Hierarchical NUMA](https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf) at [LPC2017](https://blog.linuxplumbersconf.org/2017/ocw/sessions/4656.html) and its [audio recording](https://blog.linuxplumbersconf.org/2017/wp-content/audio/0914-THU/PlatinumD/04AnshumanKhandual.mp3).
3. [*How the Linux kernel divides up your RAM*](https://utcc.utoronto.ca/~cks/space/blog/linux/KernelMemoryZones) in Chris's Wiki.

### [Tutorial: Linux Memory Management and Containers - Gerlof Langeveld, AT Computing (43:23)](https://youtu.be/ql1axx--8sI?si=a2OlyiSRvc7qt8La&t=2603)

{%youtube ql1axx--8sI %}

### [Inspecting and Optimizing Memory Usage in Linux - João Marcos Costa, Bootlin (5:42)](https://youtu.be/pIR1H7ZyWe4?si=xd6vDfPmg6pj65Ml&t=342)

{%youtube pIR1H7ZyWe4 %}

### [SUSE Labs Conference 2020 - DMA mapping for the Raspberry Pi 4 (27:25)](https://youtu.be/I3iRRYjfPFY?si=2qCKq-iNCwb7bH7l&t=1645)

{%youtube I3iRRYjfPFY %}
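
As a companion to the `/proc/buddyinfo` section above, here is a minimal Python sketch of how the columns could be turned into free-memory figures. It is not taken from any of the cited talks. It assumes a 4 KiB base page, as in the Arm example (other page sizes exist), and the names `parse_buddyinfo`, `free_pages` and `SAMPLE` are purely illustrative. The sample text is copied from this note rather than read from a live system.

```python
import re

# Sample copied from the Arm board example in this note (not a live system).
SAMPLE = """\
Node 0, zone   Normal     28     13      8      3      3      1      2      2      2      3      2     51
"""

LINE_RE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)")


def parse_buddyinfo(text):
    """Map (node, zone) to the list of free-chunk counts, index = order."""
    zones = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip shell prompts, blank lines, and so on
        node, zone, counts = match.groups()
        zones[(int(node), zone)] = [int(c) for c in counts.split()]
    return zones


def free_pages(counts):
    """Total free base pages: a chunk of order n spans 2**n pages."""
    return sum(count << order for order, count in enumerate(counts))


if __name__ == "__main__":
    PAGE_SIZE = 4096  # assumption: 4 KiB base pages
    for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
        pages = free_pages(counts)
        mib = pages * PAGE_SIZE / (1024 * 1024)
        print(f"node {node} zone {zone}: {pages} free pages, {mib:.1f} MiB")
        for order, count in enumerate(counts):
            chunk_kib = (PAGE_SIZE << order) // 1024
            print(f"  order {order:2d}: {count:5d} chunks of {chunk_kib} KiB")
```

On a real machine the same functions could be fed the contents of `/proc/buddyinfo` instead of `SAMPLE`. A total dominated by low orders would hint at the fragmentation mentioned in the talk, although this sketch does not try to quantify it.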
