---
# System prepended metadata

title: The Process Address Space

---

# The Process Address Space

Typical layout of a 32-bit process address space:
![](https://i.imgur.com/lb4xv6m.png)

![](https://i.imgur.com/vQ3utYg.png)

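Every process has an mm_struct that records all of its memory-management information. The VMAs are recorded in the mm_struct, including the description of the address space and the correspondence among pages, VMAs, and files.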
![](https://i.imgur.com/qDQIJNR.png)


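Beyond describing the range itself, each virtual page in a VMA may or may not be backed by a physical page, and the page tables record which physical page it maps to. In the figure, pages 1, 2, and 3 are mapped to physical memory, while pages 4 and 5 have valid addresses but no physical page yet (memory is not handed out until it is used, so the first access triggers a page fault).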

To the user, the pages of a VMA look logically contiguous, but the physical pages backing them may be scattered.
![](https://i.imgur.com/wn8F8sB.png)

# Address Spaces

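VMAs describe how a process is laid out in memory, but where does that information come from? The layout of the executable is fixed at link time by the linker script. When the program is loaded, the kernel reads that layout from the executable and records it in the mm_struct, which is how it becomes the in-memory layout. For more on linker scripts, see: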

http://wen00072.github.io/blog/2014/03/14/study-on-the-linker-script/


![](https://i.imgur.com/GA62l5X.png)


Differences between processes and threads (user/kernel):
https://medium.com/@yovan/os-process-thread-user-kernel-%E7%AD%86%E8%A8%98-aa6e04d35002


What happens during a context switch:
https://stackoverflow.com/questions/12630214/context-switch-internals


![](https://i.imgur.com/Fcqvvea.png)


Segmentation fault: if a process
* accesses a memory address that is not in a valid memory area, or
* accesses a valid area in an invalid manner (for example, writing to a read-only area),

then the kernel kills the process with the dreaded “Segmentation Fault” message.
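Both fault cases above can be observed from user space. The following is a minimal sketch that assumes Linux/POSIX (`mmap`, `sigaction`, `sigsetjmp`). The helper names `sk_map_page` and `sk_probe_write` are hypothetical. The sketch writes one byte to a read-only page (a valid area accessed in an invalid manner) and to a page that was just unmapped (no valid area at all). Normally the kernel's SIGSEGV would kill the process. Here a handler jumps back instead and reports the fault.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helpers (sk_*): a user-space demo, not kernel code. */
static sigjmp_buf sk_probe_env;

/* Jump back into sk_probe_write() instead of letting SIGSEGV kill us. */
static void sk_segv_handler(int sig)
{
    (void)sig;
    siglongjmp(sk_probe_env, 1);
}

/* Map one private anonymous page with the given PROT_* bits; NULL on failure. */
static char *sk_map_page(int prot)
{
    void *p = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE), prot,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if writing one byte to p raises SIGSEGV, 0 otherwise. */
static int sk_probe_write(volatile char *p)
{
    struct sigaction sa, old;
    int faulted = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sk_segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(sk_probe_env, 1) == 0)
        *p = 1;          /* the kernel checks the VMA and its permissions here */
    else
        faulted = 1;     /* reached through siglongjmp() from the handler */

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous disposition */
    return faulted;
}
```

Jumping out of a SIGSEGV handler is fine for a demo like this. Real programs normally let the signal terminate the process.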

Memory areas can contain all sorts of things, such as:
* text section
    * a memory map of the executable file's code
* data section
    * initialized global variables
* bss section
    * uninitialized global variables
* User-space stack
    * A memory map of the zero page used for the process’s user-space stack
* An additional text, data, and bss section for each shared library, such as the C library and dynamic linker, loaded into the process’s address space
* Any memory mapped files
* Any shared memory segments
* Any anonymous memory mappings, such as those associated with malloc()

# The Memory Descriptor

The data structure that describes a process's address space is called the **memory descriptor** (`struct mm_struct`). This structure contains all the information related to the process address space.

![](https://i.imgur.com/hzaVQ4y.png)
![](https://i.imgur.com/fYOY86e.png)

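* mm_users:
    * the number of processes (tasks) using this address space
    * e.g., if two threads share it, mm_users == 2
* mm_count:
    * the primary reference count of the mm_struct itself
    * all of the mm_users together count as a single reference in mm_count
    * kernel code that needs to pin the structure (e.g., a kernel thread borrowing it) increments mm_count directly
    * mm_count can drop to 0 only after mm_users has dropped to 0
* mmap: (linked list)
    * contains all the memory areas in this address space
* mm_rb: (red-black tree)
    * stores the same areas as mmap
    * searching takes O(log n)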

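> Normally the kernel avoids keeping the same data in two different structures. Both are needed here because the linked list (mmap) makes traversing all elements easy, while the red-black tree (mm_rb) makes finding a particular element fast.

All mm_structs are strung together in a doubly linked list through their mmlist field. The first element is init_mm, the memory descriptor of the init process. The list is protected by mmlist_lock, which is defined in kernel/fork.c.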

# Allocating a Memory Descriptor

A task's memory descriptor is referenced from the mm field of its process descriptor (task_struct):

![](https://i.imgur.com/9ET2rcW.png)

So, to access the current process's memory descriptor:

```c
current->mm
```
* copy_mm() copies the parent's memory descriptor to the child during fork().
* The mm_struct itself is allocated from the mm_cachep slab cache through the allocate_mm() macro.

> Generally, each process receives its own mm_struct and thus a unique process address space.

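* Processes may share their address space with their children by setting the CLONE_VM flag when calling clone().
    * Do that, and the child is a thread!!!
    * In Linux, this is essentially the only difference between a process and a thread: threads are just regular processes that happen to share certain resources.
    * With CLONE_VM set, allocate_mm() is not called. The child's mm field simply points to its parent's memory descriptor (the child moves into the parent's house).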

One more tenant has moved into the house, so mm_users is incremented and the child's mm points to the parent's mm:
![](https://i.imgur.com/tPGOQGE.png)
![](https://i.imgur.com/93gaImi.png)
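The decision copy_mm() makes can be modeled in a few lines. This is a user-space toy, not kernel code. `struct sk_task`, `struct sk_mm`, `sk_allocate_mm()` and `sk_copy_mm()` are hypothetical stand-ins, and `SK_CLONE_VM` is an illustrative flag value.
* **With the flag set**, the child reuses the parent's descriptor and bumps mm_users.
* **Without the flag**, the child gets a fresh descriptor. The real kernel also duplicates the parent's VMAs at this point; the sketch omits that step.

```c
#include <stdlib.h>

/* Hypothetical toy model (sk_*), not the kernel's structures. */
#define SK_CLONE_VM 0x100UL   /* illustrative value */

struct sk_mm {
    int mm_users;   /* tasks living in this address space */
    int mm_count;   /* references to the struct; all users together count as 1 */
};

struct sk_task {
    struct sk_mm *mm;          /* NULL for kernel threads */
    struct sk_mm *active_mm;
};

static struct sk_mm *sk_allocate_mm(void)
{
    struct sk_mm *mm = calloc(1, sizeof *mm);
    if (mm) {
        mm->mm_users = 1;
        mm->mm_count = 1;
    }
    return mm;
}

/* Simplified model of copy_mm() at fork()/clone() time. Returns 0 or -1. */
static int sk_copy_mm(unsigned long clone_flags, struct sk_task *parent,
                      struct sk_task *child)
{
    struct sk_mm *mm;

    if (parent->mm == NULL) {            /* kernel thread: nothing to copy */
        child->mm = child->active_mm = NULL;
        return 0;
    }
    if (clone_flags & SK_CLONE_VM) {
        parent->mm->mm_users++;          /* one more tenant in the same house */
        mm = parent->mm;                 /* thread: share, no allocation */
    } else {
        mm = sk_allocate_mm();           /* process: its own descriptor */
        if (mm == NULL)
            return -1;
        /* the real kernel would duplicate the parent's VMAs here */
    }
    child->mm = child->active_mm = mm;
    return 0;
}
```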


# Destroying a Memory Descriptor

When a process exits, the kernel calls exit_mm(), defined in kernel/exit.c, to release its address space:

![](https://i.imgur.com/9HTdkQq.png)
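After some housekeeping, it calls mmput(). mmput() decrements mm_users, and if mm_users reaches zero, it calls mmdrop():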

![](https://i.imgur.com/rL9WgB2.png)
mmdrop() decrements mm_count.
If mm_count drops to zero, the free_mm() macro is invoked. It calls in the garbage truck mentioned earlier and returns the mm_struct to the mm_cachep slab cache (via kmem_cache_free()).
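The two-level reference counting can be sketched as a user-space toy. The names `sk_mmput`, `sk_mmdrop` and `sk_free_mm` are hypothetical. Plain `int`s stand in for the kernel's atomic counters, and a `bool` marks the point where exit_mmap() would tear down the mappings. The sketch shows why a borrowed reference in mm_count keeps the structure alive after the last user is gone.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model (sk_*); the kernel uses atomic counters. */
struct sk_mm {
    int mm_users;
    int mm_count;
    bool mappings_torn_down;   /* stands in for exit_mmap() having run */
};

static int sk_freed;           /* counts frees so the behaviour is observable */

/* Real kernel: kmem_cache_free(mm_cachep, mm) via free_mm(). */
static void sk_free_mm(struct sk_mm *mm)
{
    free(mm);
    sk_freed++;
}

/* Drop one reference to the struct itself. */
static void sk_mmdrop(struct sk_mm *mm)
{
    if (--mm->mm_count == 0)
        sk_free_mm(mm);
}

/* Drop one user of the address space. */
static void sk_mmput(struct sk_mm *mm)
{
    if (--mm->mm_users == 0) {
        mm->mappings_torn_down = true;   /* VMAs and page tables go away */
        sk_mmdrop(mm);                   /* release the single users' reference */
    }
}
```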


# The mm_struct and Kernel Threads

Kernel threads have no process address space and therefore no associated memory descriptor. Hence:

> The mm field of a kernel thread's process descriptor is NULL!

This is the very definition of a kernel thread:
**a process with no user context!!**

That is fine, because kernel threads never access user-space memory. And since kernel threads have no user-space pages, they do not deserve their own memory descriptor and page tables.

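Still, to run at all, kernel threads need something to stand in for their own memory descriptor. There are very good reasons for this:
* kernel threads still need page tables to use
* to avoid wasting memory on an mm_struct and page tables for them
* to avoid wasting CPU cycles switching address spaces just because a kernel thread wants to run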

So, kernel threads: just use the mm_struct of whatever task ran before you!!!!

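> When a process is scheduled onto the CPU, the kernel loads its address space (including its mm_struct) and updates the active_mm field to point to that new address space. Kernel threads have no address space of their own, so their mm is NULL. When a kernel thread is scheduled, the kernel notices that mm is NULL and keeps the previous process's address space loaded. It then points the kernel thread's active_mm field at the previous process's mm_struct, so the kernel thread runs on the page tables of that unlucky previous process.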

> Note! Kernel threads do not access user-space memory. They use only the part of the address space reserved for kernel memory, which is the same for all processes.
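The borrowing logic can be sketched as a simplified user-space model of the address-space part of a context switch. `sk_context_switch()`, `sk_switch_mm()` and the counter `sk_mm_switches` are hypothetical. In the real kernel, the equivalent steps live in the scheduler's context-switch code, use atomic counters, and defer the final mm_count drop.

```c
#include <stddef.h>

/* Hypothetical toy model (sk_*), not the scheduler's real code. */
struct sk_mm   { int mm_count; int id; };
struct sk_task { struct sk_mm *mm; struct sk_mm *active_mm; };

static int sk_mm_switches;   /* how many times page tables would be reloaded */

static void sk_switch_mm(struct sk_mm *from, struct sk_mm *to)
{
    if (from != to)
        sk_mm_switches++;    /* real kernel: load to's page tables */
}

static void sk_context_switch(struct sk_task *prev, struct sk_task *next)
{
    struct sk_mm *oldmm = prev->active_mm;

    if (next->mm == NULL) {          /* kernel thread: borrow, don't switch */
        next->active_mm = oldmm;
        oldmm->mm_count++;           /* pin the borrowed mm_struct */
    } else {
        sk_switch_mm(oldmm, next->mm);
        next->active_mm = next->mm;
    }

    if (prev->mm == NULL) {          /* leaving a kernel thread: return the loan */
        prev->active_mm = NULL;
        oldmm->mm_count--;           /* the real kernel defers this mmdrop() */
    }
}
```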

# Virtual Memory Areas

![](https://i.imgur.com/Fklwo8S.png)


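A VMA (vm_area_struct) describes a single contiguous interval within an address space, and the kernel treats each memory area as a unique object. Each memory area has certain properties, such as permissions and a set of associated operations. This lets a single VMA structure represent many types of memory areas, e.g.: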
* memory-mapped files
* process's user-space stack

![](https://i.imgur.com/aKrL2v8.png)
![](https://i.imgur.com/H7qMOym.png)

Two threads that share an address space also share all the vm_area_struct structures therein.

# VMA Flags

The vm_flags field contains bit flags, defined in `<linux/mm.h>`, that specify the behavior of and provide information about the pages contained in the memory area.

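Unlike the permissions associated with a specific physical page, the VMA flags specify behavior for which the kernel, not the hardware, is responsible.
* vm_flags holds information that applies to each page in the memory area, i.e., to the area as a whole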

![](https://i.imgur.com/c90DM0G.png)
![](https://i.imgur.com/CpFPQM5.png)


* permissions for the pages
    * VM_READ
    * VM_WRITE
    * VM_EXEC
* ex: object code for a process
    * VM_READ and VM_EXEC
* data section from an executable object
    * VM_READ and VM_WRITE
* read-only memory mapped data file
    * VM_READ
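* VM_SHARED (the area is shared with other processes)
    *  specifies whether the memory area contains a mapping that is shared among multiple processes
* VM_IO
    * the area is a mapping of a device's I/O space
    * typically set by device drivers when mmap() is called on their I/O space
* VM_RESERVED
    * the area must not be swapped out
* VM_SEQ_READ
    * hints to the kernel that the application reads this mapping sequentially, so the kernel can optimize (e.g., by increasing read-ahead on the backing file)
* VM_RAND_READ
    * the opposite: reads are relatively random, so the kernel can reduce read-ahead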
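The permission flags map directly onto the `rwxp`/`rwxs` column of `/proc/<pid>/maps`. Below is a small sketch. The `SK_VM_*` constants and both helpers are hypothetical, and their bit values are chosen for illustration; check `<linux/mm.h>` for the real definitions. `sk_access_ok()` mimics the kind of check a page-fault handler makes before allowing an access.

```c
#include <string.h>

/* Hypothetical, illustrative bit values; not copied from <linux/mm.h>. */
#define SK_VM_READ   0x1UL
#define SK_VM_WRITE  0x2UL
#define SK_VM_EXEC   0x4UL
#define SK_VM_SHARED 0x8UL

/* Render flags the way /proc/<pid>/maps shows permissions, e.g. "r-xp". */
static void sk_vma_perms(unsigned long vm_flags, char out[5])
{
    out[0] = (vm_flags & SK_VM_READ)   ? 'r' : '-';
    out[1] = (vm_flags & SK_VM_WRITE)  ? 'w' : '-';
    out[2] = (vm_flags & SK_VM_EXEC)   ? 'x' : '-';
    out[3] = (vm_flags & SK_VM_SHARED) ? 's' : 'p';   /* shared vs private */
    out[4] = '\0';
}

/* Does an access needing the `required` bits fit this VMA's flags? */
static int sk_access_ok(unsigned long vm_flags, unsigned long required)
{
    return (vm_flags & required) == required;
}
```

For example, the text section (VM_READ | VM_EXEC) renders as `r-xp`, and a write to it fails the check, which is the "valid area, invalid manner" segmentation fault.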


# VMA Operations
![](https://i.imgur.com/x0g4jmL.png)

![](https://i.imgur.com/b6lURjk.png)


# Memory Areas in Real Life (an example of one process's actual mappings)


![](https://i.imgur.com/Ffpyr2z.png)

First, let's see what this process's address space contains:
* text section
* data section
* bss section
* process's stack

Assuming the process is dynamically linked with the C library, the same three sections also exist for libc.so and for ld.so.

> This command shows a process's mappings:
> `cat /proc/<pid>/maps`
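Each line of that file follows the format `start-end perms offset major:minor inode path`. Below is a minimal parser sketch that assumes this format, as documented in proc(5). `struct sk_map_line` and `sk_parse_maps_line()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of /proc/<pid>/maps. */
struct sk_map_line {
    unsigned long start, end, offset, inode;
    char perms[5];                       /* e.g. "r-xp" */
    unsigned int dev_major, dev_minor;
    char path[256];                      /* empty for anonymous mappings */
};

/* Parse one maps line; returns 0 on success, -1 if the fixed fields don't match. */
static int sk_parse_maps_line(const char *line, struct sk_map_line *m)
{
    int consumed = 0;

    m->path[0] = '\0';
    if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %n",
               &m->start, &m->end, m->perms, &m->offset,
               &m->dev_major, &m->dev_minor, &m->inode, &consumed) != 7)
        return -1;
    if (consumed > 0)                    /* the rest of the line, if any, is the path */
        sscanf(line + consumed, "%255[^\n]", m->path);
    return 0;
}
```

To inspect the calling process, read `/proc/self/maps` line by line with `fgets()` and pass each line to the parser.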

![](https://i.imgur.com/TVOpQGP.png)

![](https://i.imgur.com/sKhNb68.png)
![](https://i.imgur.com/R14kyr7.png)
![](https://i.imgur.com/G1UrAoX.png)

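In the output:
* The first three lines are the text, data, and bss sections of libc.so (the C library).
* The next two lines are the text and data sections of the executable.
* The following three lines are the text, data, and bss sections of ld.so (the dynamic linker).
* The last line is the process's stack.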

* text (code)
    * readable and executable, of course
* data (initialized global variables)
    * readable and writable
    * not executable
* bss (uninitialized global variables)
    * readable and writable
    * not executable
* stack
    * readable, writable, and executable

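Looking at the address space as a whole:

> If a memory region is shared or nonwritable, the kernel keeps only one copy of it in memory. For example, libc.so can only be read, not modified. It therefore needs to occupy its 1212KB of physical memory just once, instead of every process getting its own copy. The process can access about 1340KB of address space in total, yet only about 40KB of that is writable and private. Quite economical.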

(A 32-bit address space is 4GB, but that does not mean a process uses all 4GB. How much it actually uses depends on how much memory it really requests.)

![](https://i.imgur.com/tuRubAJ.png)


Each memory area shown above is represented by a VMA, i.e., a vm_area_struct. Because this is a process rather than a thread, it has its own mm_struct, referenced from its task_struct.


# Manipulating Memory Areas

The kernel often has to perform operations on a memory area

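For example:
* Given an address, check whether it lies within a VMA.
    * This operation is very frequent and is routine work for mmap().

find_vma() handles this (and searching is naturally faster with the red-black tree):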
![](https://i.imgur.com/iKXFRuO.png)

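find_vma() returns the first memory area whose vm_end is greater than the given address. The address passed in need not be valid: if no such area exists, NULL is returned. Note that the returned area may start above the address, so the address does not necessarily lie inside it.

For efficiency, the result is cached in the mmap_cache field of the mm_struct. The cache is checked first, and the red-black tree is searched only on a miss.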
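The lookup rule can be sketched in user space. This is a hypothetical model built from `struct sk_vma`, `struct sk_mm` and `sk_find_vma()`. A sorted array searched with binary search stands in for the red-black tree, and a one-entry pointer plays the role of mmap_cache.

```c
#include <stddef.h>

/* Hypothetical toy model (sk_*). Each VMA covers [vm_start, vm_end). */
struct sk_vma { unsigned long vm_start, vm_end; };

struct sk_mm {
    struct sk_vma *vmas;          /* sorted, non-overlapping; stands in for mm_rb */
    size_t nr;
    struct sk_vma *mmap_cache;    /* last result, checked first */
};

/* First VMA with vm_end > addr, or NULL. addr may still be below vm_start. */
static struct sk_vma *sk_find_vma(struct sk_mm *mm, unsigned long addr)
{
    struct sk_vma *v = mm->mmap_cache;
    size_t lo = 0, hi = mm->nr;

    if (v && v->vm_start <= addr && addr < v->vm_end)
        return v;                            /* cache hit */

    while (lo < hi) {                        /* binary search instead of rbtree walk */
        size_t mid = lo + (hi - lo) / 2;
        if (mm->vmas[mid].vm_end > addr)
            hi = mid;
        else
            lo = mid + 1;
    }
    if (lo == mm->nr)
        return NULL;                         /* no area ends above addr */
    v = &mm->vmas[lo];
    mm->mmap_cache = v;
    return v;
}
```

A caller that needs "is addr inside a VMA?" must still check `vma->vm_start <= addr` on the result.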

![](https://i.imgur.com/qFrUZRp.png)


# find_vma_intersection()
The find_vma_intersection() function returns the first VMA that overlaps a given
address interval

![](https://i.imgur.com/Dbgsn2w.png)
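find_vma_intersection() builds on find_vma(). It takes the first area that ends above the start of the interval, and rejects that area if it begins at or after the end of the interval. Below is a self-contained sketch with hypothetical names. It uses a linear scan in place of the real find_vma().

```c
#include <stddef.h>

/* Hypothetical toy model (sk_*). Each VMA covers [vm_start, vm_end). */
struct sk_vma { unsigned long vm_start, vm_end; };

/* Linear-scan stand-in for find_vma(): first VMA with vm_end > addr. */
static struct sk_vma *sk_find_vma(struct sk_vma *v, size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (v[i].vm_end > addr)
            return &v[i];
    return NULL;
}

/* First VMA overlapping [start, end), or NULL. */
static struct sk_vma *sk_find_vma_intersection(struct sk_vma *v, size_t n,
                                               unsigned long start,
                                               unsigned long end)
{
    struct sk_vma *vma = sk_find_vma(v, n, start);

    if (vma && end <= vma->vm_start)
        vma = NULL;          /* next area begins at or after end: no overlap */
    return vma;
}
```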


# mmap() and do_mmap(): Creating an Address Interval

![](https://i.imgur.com/flS13aA.png)


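do_mmap() creates a new linear address interval. This does not necessarily create a new VMA. If the new interval is adjacent to an existing one and both share the same permissions, the kernel merges them into one. A new VMA is created only when no such merge is possible.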
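The merge-or-create decision can be illustrated with a toy model. Everything here is hypothetical: `struct sk_mm` holds a fixed-size sorted array, and `sk_add_interval()` does the insertion. The sketch compares only `vm_flags`. The real kernel's merge check also compares other properties, such as the backing file and offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model (sk_*). Each VMA covers [vm_start, vm_end). */
struct sk_vma { unsigned long vm_start, vm_end, vm_flags; };

struct sk_mm {
    struct sk_vma vmas[16];   /* sorted, non-overlapping; fixed capacity */
    size_t nr;
};

/* Add [start, end) with flags, merging with identically-flagged neighbours.
 * Assumes the caller already checked the range is free. Returns 0 or -1. */
static int sk_add_interval(struct sk_mm *mm, unsigned long start,
                           unsigned long end, unsigned long flags)
{
    size_t i = 0;
    while (i < mm->nr && mm->vmas[i].vm_start < start)
        i++;                                     /* insertion position */

    struct sk_vma *prev = i > 0 ? &mm->vmas[i - 1] : NULL;
    struct sk_vma *next = i < mm->nr ? &mm->vmas[i] : NULL;
    int join_prev = prev && prev->vm_end == start && prev->vm_flags == flags;
    int join_next = next && next->vm_start == end && next->vm_flags == flags;

    if (join_prev && join_next) {                /* fills a hole: prev absorbs next */
        prev->vm_end = next->vm_end;
        memmove(next, next + 1, (mm->nr - i - 1) * sizeof *next);
        mm->nr--;
    } else if (join_prev) {
        prev->vm_end = end;                      /* extend previous area */
    } else if (join_next) {
        next->vm_start = start;                  /* extend next area downwards */
    } else {                                     /* no merge: new VMA */
        if (mm->nr == sizeof mm->vmas / sizeof mm->vmas[0])
            return -1;
        memmove(&mm->vmas[i + 1], &mm->vmas[i], (mm->nr - i) * sizeof mm->vmas[0]);
        mm->vmas[i] = (struct sk_vma){ start, end, flags };
        mm->nr++;
    }
    return 0;
}
```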

![](https://i.imgur.com/mfmmbRC.png)

![](https://i.imgur.com/9FcnkWh.png)


![](https://i.imgur.com/Y4xhbxL.png)
* right to left: accessing an illegal region
* left to right: accessing a nonexistent region


* anonymous mapping (backed by memory only, no file)
    * file = NULL
    * offset = 0
* file-backed mapping (backed by a file on disk)
    * otherwise
* addr
    *  specifies the initial address from which to start the search for a free interval
* prot (protection)
    * specifies the access permissions for pages in the memory area

![](https://i.imgur.com/sP5I93M.png)

* flags
    *  specifies flags that correspond to the remaining VMA flags
    * defines how the mapped interval is shared with others

![](https://i.imgur.com/4CmE2ik.png)
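From user space, these parameters show up directly in the `mmap()` call. Below is a minimal sketch that assumes Linux/glibc (`MAP_ANONYMOUS`). `sk_map_anon()` is a hypothetical helper. It creates an anonymous mapping (no file, offset 0) whose `prot` becomes VM_READ | VM_WRITE on the resulting VMA.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: anonymous, private, read/write mapping; NULL on failure. */
static char *sk_map_anon(size_t len)
{
    void *p = mmap(NULL,                        /* addr: let the kernel choose */
                   len,
                   PROT_READ | PROT_WRITE,      /* prot -> VM_READ | VM_WRITE */
                   MAP_PRIVATE | MAP_ANONYMOUS, /* flags: private, no file */
                   -1, 0);                      /* no file descriptor, offset 0 */
    return p == MAP_FAILED ? NULL : p;
}
```

Anonymous pages read as zero. Physical pages are only assigned on first access, through a page fault. Calling `mprotect()` afterwards changes the permissions of the pages in that range.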

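If any parameter is invalid, do_mmap() returns a negative value. Otherwise, a suitable interval in virtual memory is located and, if possible, merged with an adjacent memory area. If no merge is possible:

* a new VMA is allocated from the vm_area_cachep slab cache
* the VMA is added to the address space's linked list and red-black tree (via vma_link())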


# mmap() system call (Page cache)

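There are two kinds of memory mappings:
1. virtual addresses backed by physical memory only (anonymous mappings)
2. virtual addresses backed by a file
    * a huge benefit for I/O
    * also greatly reduces memory copying
    * once the mapping is established, user space can access the region as if it were accessing the I/O directly

A video on memory-mapped I/O:
https://www.youtube.com/watch?v=m7E9piHcfr4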

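The do_mmap() functionality is exported to user space through the mmap() system call (mmap2() on some architectures). File-backed mappings build on the page cache mechanism, which a separate chapter discusses in detail.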

![](https://i.imgur.com/mRI6LTD.png)
![](https://i.imgur.com/C8fPf8e.png)
![](https://i.imgur.com/MJMpGol.png)

Shared file mapping

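Two use cases:
* memory-mapped I/O: read and write the I/O directly through virtual addresses (so accessing the I/O becomes about as fast as reading and writing memory)
    * Normally, when a user process reads data from disk, the data makes two trips. It first moves from the disk into kernel-space memory, and is then copied from there into the process's address space. If the file is mapped directly into user space, DMA fills the page-cache pages that are mapped into the process. The data lands straight in the user-space "home".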

Roughly twice as fast:
![](https://i.imgur.com/xKodWST.png)
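The key point of a shared file mapping is that ordinary memory stores modify the file's cached pages, with no write() system call involved. Below is a minimal sketch that assumes POSIX `mmap()` and `msync()`. `sk_write_via_mapping()` is a hypothetical helper. The sketch demonstrates the mechanism only; it does not measure the speed-up mentioned above.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: store msg at the start of the file behind fd through a
 * MAP_SHARED mapping of len bytes. Assumes the file is at least len bytes and
 * fd is open read/write. Returns 0 on success, -1 on failure. */
static int sk_write_via_mapping(int fd, size_t len, const char *msg)
{
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, strlen(msg));        /* plain stores, no write() call */
    int rc = msync(p, len, MS_SYNC);    /* optional: force write-back to the file */
    munmap(p, len);
    return rc;
}
```

A later `read()` or `pread()` on the same file sees the bytes, because both paths go through the same page-cache pages.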


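> Originally, an application (AP) that wanted to access a device or physical memory could only go through ioctl() or the read()/write() system calls, because of the kernel's protection. For workloads that move large amounts of data, such as video or streaming, that performance is unacceptable, so mmap helps a lot. As long as the device side implements the mmap method, the two sides can exchange data freely.

> A simple illustration:
> * AP: open /dev/mem -> mmap the physical memory address -> the AP happily accesses it
> * DRIVER: ioremap() in module_init -> obtain a memory pointer -> the DRIVER happily accesses it
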
* IPC
    * data transfer (not a byte stream)
    * with filesystem persistence
    * among unrelated processes
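The sharing can be observed with a minimal sketch that assumes POSIX `fork()` and `MAP_ANONYMOUS`. `sk_fork_roundtrip()` is a hypothetical helper. For brevity, it shares memory between a parent and its child. Unrelated processes would instead map the same file with MAP_SHARED. Passing MAP_PRIVATE shows copy-on-write instead: the child's store lands in its own copy.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores value into a one-int mapping created
 * with share_flag (MAP_SHARED or MAP_PRIVATE); returns what the parent then
 * sees, or -1 on error. */
static int sk_fork_roundtrip(int value, int share_flag)
{
    int *cell = mmap(NULL, sizeof *cell, PROT_READ | PROT_WRITE,
                     share_flag | MAP_ANONYMOUS, -1, 0);
    if (cell == MAP_FAILED)
        return -1;
    *cell = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(cell, sizeof *cell);
        return -1;
    }
    if (pid == 0) {              /* child */
        *cell = value;           /* MAP_SHARED: same page; MAP_PRIVATE: COW copy */
        _exit(0);
    }
    waitpid(pid, NULL, 0);       /* ensure the child's store has happened */
    int seen = *cell;
    munmap(cell, sizeof *cell);
    return seen;
}
```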

How two processes share memory:
![](https://i.imgur.com/vFRgzGh.png)
* stack and heap are not shared
* lib.so and abc.dat are shared
* text is shared (different processes forked from the same code)


# Drawbacks of Memory-mapped I/O

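* Wasted memory: a mapping is always a whole number of pages, so the slack between the end of the file and the end of its last page is wasted
    * for small files, this can be a significant fraction of the mapping
* The mapping must fit in the process address space
    * on 32-bit systems, many mappings fragment the address space, making it increasingly hard to find large contiguous free regions; 64-bit systems practically do not have this problem
* There is kernel overhead in creating and maintaining the mappings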


# Removing an Address Interval

![](https://i.imgur.com/fmd5oEm.png)


# Page Tables

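Page tables work by splitting the virtual address into chunks. Each chunk is used as an index into a table. The table entry points either to another table or to the physical page that the address translates to.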

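Linux's page tables have three levels (as described for the 2.6 kernel; later kernels generalize this to four or five levels):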

* top level
    * page global directory (PGD)
    * which is an array of pgd_t types
* second level
    * page middle directory (PMD)
    * which is an array of pmd_t types
* final level
    * page table entries (PTE)
    * simply the page table and consists of page table entries of type pte_t    


Page table lookups are usually performed, at least in part, by the hardware:
![](https://i.imgur.com/VBK6yCt.png)
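What the hardware (or software) walker does can be sketched with a toy three-level table. The split of the address is an assumption modeled on 32-bit x86 PAE: 2 + 9 + 9 index bits and a 12-bit offset. All `sk_*` types and functions are hypothetical. Real entries also carry permission and status bits.

```c
#include <stdint.h>

/* Assumed split (x86 PAE-style): 2 | 9 | 9 | 12 bits = 32-bit address. */
#define SK_PAGE_SHIFT 12
#define SK_PMD_SHIFT  21
#define SK_PGD_SHIFT  30

struct sk_va_parts { unsigned pgd, pmd, pte, offset; };

static struct sk_va_parts sk_split_va(uint32_t va)
{
    struct sk_va_parts p;
    p.pgd    = va >> SK_PGD_SHIFT;                /* top 2 bits  */
    p.pmd    = (va >> SK_PMD_SHIFT) & 0x1ffu;     /* next 9 bits */
    p.pte    = (va >> SK_PAGE_SHIFT) & 0x1ffu;    /* next 9 bits */
    p.offset = va & 0xfffu;                       /* low 12 bits */
    return p;
}

/* Hypothetical toy tables: each level points to the next; the last holds frames. */
typedef struct { uint32_t frame[512]; int present[512]; } sk_pte_table;
typedef struct { sk_pte_table *pte[512]; } sk_pmd_table;
typedef struct { sk_pmd_table *pmd[4]; } sk_pgd_table;

/* Returns 0 and fills *pa, or -1 where hardware would raise a page fault. */
static int sk_walk(const sk_pgd_table *pgd, uint32_t va, uint64_t *pa)
{
    struct sk_va_parts p = sk_split_va(va);
    const sk_pmd_table *pmd = pgd->pmd[p.pgd];
    if (!pmd)
        return -1;
    const sk_pte_table *pt = pmd->pte[p.pmd];
    if (!pt || !pt->present[p.pte])
        return -1;
    *pa = ((uint64_t)pt->frame[p.pte] << SK_PAGE_SHIFT) | p.offset;
    return 0;
}
```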


# TLB

> Looking up all these addresses in memory can be done only so quickly. To facilitate this, most processors implement a translation lookaside buffer, or simply TLB, which acts as a hardware cache of virtual-to-physical mappings. When accessing a virtual address, the processor first checks whether the mapping is cached in the TLB. If there is a hit, the physical address is immediately returned. Otherwise, if there is a miss, the page tables are consulted for the corresponding physical address.

On a context switch, the TLB must be flushed, because process A's address translations are different from process B's:
![](https://i.imgur.com/703X9WQ.png)

Some TLBs, however, tag each entry with an ID for its process (address space). With those, the TLB does not need to be flushed.

![](https://i.imgur.com/5zbeswQ.png)
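The difference between a plain TLB and a tagged one can be sketched as a tiny direct-mapped cache. The `sk_*` names and the 16-entry size are hypothetical. The tag here is an address-space ID (ASID) rather than a literal PID.

```c
#include <stdint.h>

/* Hypothetical toy TLB (sk_*): direct-mapped, tiny on purpose. */
#define SK_TLB_SIZE 16

struct sk_tlb_entry { int valid; unsigned asid; uint32_t vpn, pfn; };
struct sk_tlb { struct sk_tlb_entry e[SK_TLB_SIZE]; };

/* Hit only if both the virtual page number and the address-space ID match. */
static int sk_tlb_lookup(const struct sk_tlb *t, unsigned asid, uint32_t vpn,
                         uint32_t *pfn)
{
    const struct sk_tlb_entry *x = &t->e[vpn % SK_TLB_SIZE];
    if (x->valid && x->vpn == vpn && x->asid == asid) {
        *pfn = x->pfn;
        return 1;
    }
    return 0;   /* miss: walk the page tables, then sk_tlb_fill() */
}

static void sk_tlb_fill(struct sk_tlb *t, unsigned asid, uint32_t vpn, uint32_t pfn)
{
    t->e[vpn % SK_TLB_SIZE] = (struct sk_tlb_entry){ 1, asid, vpn, pfn };
}

/* Untagged design: every context switch must invalidate everything. */
static void sk_tlb_flush_all(struct sk_tlb *t)
{
    for (int i = 0; i < SK_TLB_SIZE; i++)
        t->e[i].valid = 0;
}
```

With the tag, another process's lookup of the same virtual page simply misses, so no flush is needed on a switch. Without it, `sk_tlb_flush_all()` would have to run on every context switch.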

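* Some designs access the cache before MMU translation (logical cache)
    * the cache must store a PID tag, or be flushed on context switch
* Others translate through the MMU first and then access the cache (physical cache)
    * slower, but the cache can be shared without flushing
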
![](https://i.imgur.com/Luen8Vb.png)


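Q: What happens when physical memory runs out?

A: Everyone tries every possible trick to get memory: the buffer cache (mmap I/O), the slab allocator, user malloc(). All of them would happily have every physical page mapped. But having a VMA does not mean you are actually using it. Before physical memory truly runs out, some mechanism must start taking memory back from those who are not using it. In Linux this is the kernel swap daemon (kswapd), which is woken up to reclaim pages when free memory runs low.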

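* What can be swapped out?
    * kernel code and data cannot be swapped out
    * pages are divided into active and inactive ones
    * watermarks (free-memory thresholds) decide when reclaim starts and stops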
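The watermark idea can be sketched as a toy zone. The structure, the three thresholds and both functions are hypothetical simplifications. The part that matches Linux:
* each zone has min/low/high watermarks;
* kswapd is woken when free memory falls below low;
* kswapd then reclaims until free memory reaches high.

Direct reclaim and many other details are omitted.

```c
#include <stdbool.h>

/* Hypothetical toy zone (sk_*); counts are in pages. */
struct sk_zone {
    long free, inactive;
    long wmark_min, wmark_low, wmark_high;
    bool kswapd_awake;
};

/* Allocate one page; fails when the zone is at or below its min watermark. */
static bool sk_alloc_page(struct sk_zone *z)
{
    if (z->free <= z->wmark_min)
        return false;                 /* real kernel: direct reclaim / OOM path */
    z->free--;
    if (z->free < z->wmark_low)
        z->kswapd_awake = true;       /* start background reclaim early */
    return true;
}

/* One kswapd run: reclaim inactive pages until free reaches the high watermark. */
static void sk_kswapd(struct sk_zone *z)
{
    while (z->kswapd_awake && z->free < z->wmark_high && z->inactive > 0) {
        z->inactive--;                /* evict or swap out an inactive page */
        z->free++;
    }
    z->kswapd_awake = false;
}
```

The gap between low and high keeps kswapd from waking up again on every single allocation.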