Sean20405
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights New
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Note Insights Versions and GitHub Sync Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       Owned this note    Owned this note      
    Published Linked with GitHub
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    # OS Memory Management ## Ch. 8 Memory Management ### Memory Hierarchy - Main memory: the only large storage media that the CPU can access directly. [More info](/ABZGtc8UT8e3rvG1o9x-2g#Solid-State-Disks-SSD) - ![image](https://hackmd.io/_uploads/HkVNiMyWyx.png) ### Contigous Memory Allocation - Also known as dynamic memory/space allocation - Heap management in Linux - `mmap(-1)`、`malloc()` 會從 program 的 heap 拿記憶體 - `free()` or `munmap()` 釋出記憶體 - 如果 heap 不夠大了則會用 `brk()` 或 `sbrk()` 增大 heap - 長久操作下來,heap 會產生許多洞,該如何解決這些洞? - First-fit:使用第一個看到且夠大的洞 - Best-fit:使用最剛好的(夠大且最小),所以必須歷遍整個 heap - Worst-fit: 使用最大的 - Buddy System - Linux 對實體記憶體的管理方法 - 將記憶體切成不同大小的 block,每個 block 大小為 $2^n$,$n$ 又稱為 order - 對於每次配置,尋找最接近其大小的 block,查詢有無空間,若無則往上一層尋找,並把上層的 block 切半,一半配置給他,一半加進下一層的 list(等於這層少了一個 block,下一層多了一個 block) - ![image](https://hackmd.io/_uploads/HJGoemybke.png) - Fragmentation - external:所有可用空間總和大於某個 process 所需要,但因為這些空間不連續所以無法配給該 process 使用 - internal:配置給 process 的 memory 空間大於 process 真正所需的空間,這些多出來的空間該 process 用不到,而且也沒辦法供其他 process 使用 - Slab Allocation - 為了解決 external fragmentation - 根本原因為每次 allocation 的 size 都不同 - 解法:每次都 allocate 同樣的 size - part of Linux memory allocation - 因為 kernel object(e.g. inode)很常需要 allocate / deallocate - ![image](https://hackmd.io/_uploads/SkMLuXkWJe.png) - 在實體記憶體中每個 block 都是相同大小 - Linux kernel 用了 slab allocation + buddy system - 對於比較常 allocate / deallocate 的 object,並且其 size 都相同,則用 slab allocation - 若是 size 都不同的,則用 buddy system ### Address Binding - address binding: assign memory addresses to instructions and data - can happen at - compile time - load time - execution time - Compile-time binding - 如果 inst. 和 data 的位置在 compile 的時候就被確定,那就可以先配置 - 如果起始位址變更或被其他 process block 著,則需要重新 compile - Load-time binding - compile 產生 relocatable code,給出相對位址,在 load 的時候才據此決定真正的位址 - Execution-time binding - ![image](https://hackmd.io/_uploads/S1_S3QyZkg.png) - 需要硬體支援 e.g. relocation reg in MMU (Memory Management Unit) ### Paging - Compaction - 一個解決 external fragmentation 的方法 - ![image](https://hackmd.io/_uploads/Sk6HamyZ1e.png) - Slow - 另一個方法:paging - Paging: A modern approach that separates logical (virtual) address space from physical address space - 將 physical memory 分成固定大小的 block(稱為 frame) e.g. 4 KB - 將 logical memory 也拆成固定大小的 block(稱為 page) - page table: translate logical to physical addresses(OS 負責) - ![image](https://hackmd.io/_uploads/S1Of3hPbJe.png) - page number 經過 page table 找到對應的 frame number 後即轉為 physical address - page table 放在 memory 裡 - page table 由 OS 填,查詢跟轉換由硬體(MMU, Memory Menagement Unit)負責 - e.g. - ![image](https://hackmd.io/_uploads/B1z-CnDbJg.png) - logical memory has 16 bytes and physical memory has 32 bytes - page size and frame size are 4 bytes - Q: how many bits are required to represent - A logical address: 16 bytes -> 4 bits - The page number: 16 bytes / 4 bytes = 4 -> 2 bits - A physical address: 32 bytes -> 5 bits - The frame number: 32 bytes / 4 bytes = 8 -> 3 bits - The displacement (p): 2 bits - Each entry of page table (f): 3 bits (to store frame index) - OS 會有個 free-frame list,當 process 需要配置記憶體時,OS 會去這個 list 找 memory 中可用的洞 #### Implementation of Page Table - 每個 process 有自己的 page table - `PTBR` (Page Table Base Register) 指向 page table - `PTLR` (Page Table Length Register) 標示大小 - Data access 會需要兩次的 memory access,一次取用 page table 一次取用真的要的資料 → sol: translation look-aside buffers (TLBs) - Translation Look-aside Buffers (TLBs) - 用 associative 的概念實作 - 像一個 cache,在 CPU 裡,存最近查詢的結果 - ![image](https://hackmd.io/_uploads/BydTbTv-1l.png) - TLB 查詢過程利用 fully associative,通常只有 64~1024 entries,所以需要 replacement policy - EAT (Effective Access Time) - Associative Lookup need $\varepsilon$ time unit ($\varepsilon<<1$) - Memory access need 1 time unit - TLB hit ratio $\alpha$ - $\text{EAT}=(1+\varepsilon)\alpha+(2+\varepsilon)(1-\alpha)=2+\varepsilon-\alpha$ #### Structure of the Page Table - Two-level Page Table (Hierarchical Paging) - Page table are large and sparse → break up into multi-level page table, cna allocated on demand - e.g. Intel 80386 - logical addr. ![image](https://hackmd.io/_uploads/SJxaZP_W1l.png) - ![image](https://hackmd.io/_uploads/BylR-PuWJe.png) - 右邊的 addr. 都是 physical addr. - 如果還是用 logical addr. 就需要再 paging 一次,沒必要 - pros:page table 間可以不連續,不用一開始就配置很大的空間 - cons:TLB miss 增加 - Hashed Page Tables - 在大於 32-bit 的處理器上常用,其 page number 的長度可以很長 - 若使用 multi-level page table,addr. bit 越多就需要越多層,一旦遇到 TLB miss 就需要查很多層 - ![image](https://hackmd.io/_uploads/BkrJHPObkl.png) - TLB miss 的花費跟 hash function 和 hash table 的大小有關 - Inverted Page Table - 若用先前提到的 forward page table,每個 process 都會自己有一個,當 process 一堆 page table 加起來占的空間會變很大 - Goal:所有 process 共用同個 table - 反過來,存 frame num. 對應的 page num.,這樣 page table 的大小僅跟 physical memory 的大小有關 - 在 memory reference 的時候,要給 PID 和需要的 page num. - 需要給 PID 是因為可能有多個 process 映射到同一個 frame - ![image](https://hackmd.io/_uploads/SJ6N8POb1x.png) - search 常用 hash 做,不然 linear search 太慢了 - pages 若發生 hash conflict 可以直接用 linked list 串在一起 - 可能比較難實現 shared memory → 因為要讓不同 pid 及不同 p 對應到同一個 i 需要特別處理,可能得用 linked list 取得所有指向同一個 i 的 (pid, p) pair ### Segmentation - 為了保護記憶體(權限問題) - A program is a collection of segments. A segment is a logical unit such as: main program, procedure, function, local variables, global variables, ... - Common segments and permission - stack segment `rw-`: local var., function arg., return addr., CPU flags, ... - 為什麼 stack 不能執行:stack overrun attack - ![image](https://hackmd.io/_uploads/Hyfc2b-MJg.png) - 把惡意程式碼當作 function arg. 存進 stack,他會覆蓋掉原本的資料 - 當運行到 return addr. 的時候,反而會跳到惡意代碼的位址,進而運行 - 禁止手段:把 stack 設為不可被執行,一旦被執行則回報 segmentation fault - data segment - bss, data(無 / 有初始值的全域變數) `rw-` - data RO (Read Only) `r--`: const. - heap `rw-` :::success ```cpp= #include<stdio.h> #include<stdlib.h> int main() { char *p = "hello world"; p[0] = 'H'; } ``` 會觸發 segmentation fault,因為 C 語言內會將第四行看作是 constant literal ::: - code segment `r-x` (`r` 是為了 copy-on-write 或 debug) - hardware - logical addr. 可以被拆成 `<segment-number, offset>` - Segment table:可以將 logical addr. 轉成 physical addr.(跟 page table 很像),每個 entry 存 - base:segment 的 starting physical addr. - limit:segment 的長度 - Segment-table base register (STBR) - Segment-table length register (STLR) - ![image](https://hackmd.io/_uploads/ryXaR--Myx.png) - segment table 放在記憶體(DRAM)內 - 查詢過程由 MMU 執行 - 填表由 OS 負責 - 在查詢的過程中也會順便檢查要求的權限是否正確 - ![image](https://hackmd.io/_uploads/ryCSlz-Gyl.png) - e.g. `exe S2 offset=20` - 去 segment table 找到第二個 segment,檢查 limit 是否 < offset - 檢查權限(此處應為 `r-x` 而要求權限為 `x` -> 通過) - 去拿 phsical addr. = 4300+20 - segmentation 能保護記憶體 - segment limit reg. 避免 out of range - 控管每類 segment 的權限(rwx) - 看 user / kernel mode,user mode 只能存取自己的 segment - paging vs. segmentation - paging 和 segmentation 的功能大部分是重複的(paging 也能管理每個 page table 的 rwx 權限) - segmentation 的實作會根據不同 CPU 而有不同方法 - 每種 CPU 對 segmentation 的支援程度不同 - 如 RISC 的支援較少 - 在現今為了 portability,OS 很少使用 segmentation - 在 Linux segmentation 的概念已經用 paging 實現 - 只有在 x86 架構上 context switch 的時候會用到 ### Example - Linux on Intel 80386 - 僅使用 6 個 segment(儘管這顆 CPU 定義了更多) - Shared by all processes - Kernel code - Kernel data - User code - User data - The default LDT (usually not used) - Per-core - Task-state(TSS, 用於切換 user / kernel mode) - segmented paging - 先做 segmentation 再 paging - ![image](https://hackmd.io/_uploads/rk5iHzbf1e.png) - selector 實際上有 3 bits 用來 control - descriptor table (segmentation table) 共有 $2^{13}=8192$ 個 entry - Intel x86-64 - 64 bits 太大了,不用用那麼多 logical address,只會用 48 bit addressing - ![image](https://hackmd.io/_uploads/S1UjJE2NJe.png) - 可以 paging 到一半就出去,此時那個 page 叫 super page - page size 有 4 KB、2 MB、1 GB,最基本是 2-level page table - 如此可以增加 TLB cover 到的範圍 - 如果需要的記憶體比較大且連續,就可以使用比較大的的 page table - arm - 支援不同的 page size,可以選如何切 logical address(12-8-12、8-10-14) - ![image](https://hackmd.io/_uploads/BkEDb4nVJe.png) ## Ch. 9 Virtual-Memory Management ### Demand Paging - Requiring a (large) secondary storage as the backup of memory pages - Bringing a page to memory only when it is needed - 不用讀太多東西進 memory - logical address space 可以比 physical address space 還大 - 提升 degree of multiprogramming - 支援共享記憶體空間 - Valid-invalid bit in page table entry - 硬體支援,page table 多一個 bit - 1 -> valid -> page 對 frame 的對應關係已建立 - 0 -> valid -> page 還沒對應到 frame -> logical addr. 還沒被對應到 physical addr. - 在 addr. translation 時如果這個 bit 是 0 會觸發 page fault(一個 trap),並交由 OS 處理後續 - ![image](https://hackmd.io/_uploads/ryp5JcZz1e.png) - ![image](https://hackmd.io/_uploads/S1Tiyq-Mke.png) - I/O 很慢(在硬碟找 / 拿 page 的過程) - Fetch instruction 可能也會導致 page fault(如果那段 code 還沒被執行過,而且那行在新的 page) - Performance - Page fault rate $0\le p\le 1$ - Effective Access Time $$\begin{gather*}\text{EAT} = (1-p)\times \text{memory access} + p\times(\text{page fault overhead} \\ + \text{ swap page out} + \text{swap page in} + \text{restart overhead})\end{gather*}$$ - 但 page fault 發生時代價很高,所以通常 page fault rate 都很小 - 將 TLB 加進考慮 - TLB hit: ETA = TLB access - TLB miss & no page fault: ETA = TLB access + page table access + TLB update - TLB miss & page fault: ETA = TLB access + page table access + page fault handling + TLB update ### Page Replacement - Memory 滿了之後要把哪些 frame 刪掉(called "victim frame"),page replacement 就是一個選擇要刪掉誰的 policy - mechanism - ![image](https://hackmd.io/_uploads/H1tX69Wfye.png) - 寫回 victim frame - 不一定每次都要做(不然花費兩倍的時間,因為寫入硬碟也很慢),若 victim frame 之前在 memory 中有被修改過,才需要寫回硬碟 - 使用 dirty bit 判斷該 frame 是否有被修改過(MMU 查詢) ### Page Replacement Algo. - FIFO #### Optimal Page Replacement (OPT) - 查看未來會用到的 frame,看哪個會最晚被用到就刪掉那個 - 最佳解 - 實際上不可能,因為系統很難知道之後哪些 frame 會被用到 #### Least-Recently Used (LRU) - 利用 temporal locality - Implementation - ![image](https://hackmd.io/_uploads/rJYsZo-fJx.png) - 每次新 reference 一個 page,就把他拉到最前面(MRU, Most-Recently Used) - replace 時直接拿最後一個 - 通常會有個 look up table 直接拿到那個 page 在 list 的 index,不然 linear search 太慢 - Stack property - ![image](https://hackmd.io/_uploads/HyWRQiWMJe.png) - 如果 stack 只有 3 個 ``` a: 2 1 0 b: 7 2 1 ``` -> 是原先的 subset - stack 存的 page 越多,能包含小 stack 能做到的事 -> 效果越好 - LRU 是 stable algo.(給越多 frame 表現越好,但非線性) - ![image](https://hackmd.io/_uploads/SyntEiWf1g.png) - FIFO 無 stack property,frame 越多 page miss 不一定越少(Belady’s Anomaly) - ![image](https://hackmd.io/_uploads/HyyAsd7mye.png) - 如何解決:cache simulation or ghost cache #### LRU approximation algo. - Also called second chance algo. or clock algo. - 實務上 store 和 load 都是一個 instr.,如果每次呼叫都要動用一堆指令把來找到 LRU 並搬到最前面,太慢 - 用 approximation 加硬體支援來改善 - 但 LRU 還是適合用在 IO-based 的系統(e.g. file system or database) - Reference bit - 每個 page 多存一個 bit,如果該 page 被 reference 過就把他設成 1 - Implementation - 把 cache 當做一個圓,輪流去看每個 page - 如果該 page 的 ref bit 為 1,改成 0 - 如果是 0,代表上一次把他改成 0 後都沒有被 ref 過,所以就可以把他丟掉 - ![image](https://hackmd.io/_uploads/BJGvJYX71x.png) - 指針在看到兩個 1 的時候就會把他們設成 0 - 看到 0 就代表可以把那個 page 換掉 - enhanced ver. - 加入 dirty 一起考量 (ref, dirty) 1. (0, 0) 最適合被丟掉 2. (0, 1) 雖然需要寫回,但因為最近沒用到,所以還算可以被丟掉 3. (1, 0) 雖然不用寫回,但最近會被用到,根據 Spatial locality 之後可能還會用到,如果先把他丟掉之後還需要讀入,如此花的時間就跟上述情形差不多了,於是不太適合 4. (1, 1) 最不適合 #### Counting Algorithms - 存每個 page 目前被 reference 的次數 - 每次換掉最少被使用的 page(LFU, Least Frequently Used) - Issue:曾經很常用的 page 會一直待在 cache 很難被取代(因為 cnt 很大) - Sol: 週期性 `cnt bit >>= 1` (aging) - LRU vs. LFU - LRU (Least recently used) - (+) fast response to locality change - (-) sequential reference(一次 reference 大量的東西 e.g. 複製檔案)會把 cache 原先的內容都洗掉,但通常這種操作只會 reference 一次,cache 其實不需要存他們 - 現況還是較常用 - LFU (Least frequently used) - (+) 可以處理 sequential reference,因為他們的 cnt 不會很大 - (-) slow response to locality change - 累積 cnt(warm up) 或把很久沒用但以前很常用的 page 刪掉(cool down)需要一些時間 ### Swap-Space Management - 從 disk 中拿一些空間來做虛擬記憶體,讓記憶體看起來很大,並用上面的 replecement algo. 選擇要放在真的記憶體的資料 - swap space 可以真的是一個檔案(e.g. `C:\pagefile.sys` in Windows、有些 Linux 也有),或者是直接從 disk 分一個 partition(Linux 常用) - ![image](https://hackmd.io/_uploads/rkssvKQm1x.png) - swap map 在 RAM 裡面,其他東西在 disk - 還有 page table 和 page cache 在 memory 內 - 對 page table 內某些已經不在 page cache 的 entry,會有個 bit 標示已被 swap out,他的 frame number 會改為表示 swap partition 的 slot number,標示該 page 放在哪裡 - 如果有 page 要被換掉,則會把他放在 swap partition 空的地方,並更新 page table(利用 page table 的 invalid bit 標示他已不再被 cache 住,該 logical addr. 沒有對應到任何 physical addr.) - swap map 的數字代表當前有幾個 process 正在用,在 shared memory 的時候一個位址可能有多個 process 在存取,每當 detach 的時候就會把這裡減一,減到 0 代表沒有人要用了,就可以把他搬離 swap partition ### OS features based on virtual memory #### Page sharing - shared code - dynamic library (e.g. libc) - kernel code - shared memory - 創建一段記憶體 `shmget()` - 把這段記憶體 attach 到 process 上(在這個 process 的 page table 標示這段記憶體的位址)`shmat()` #### Copy-on-write - A fast implementation of memory duplication between processes, e.g., fast fork() - 在 fork 一個 child 的時候我們需要把 parent 的資料都複製給 child,但直接在 memory 複製一份太不實際 - 因此實際上我們會讓 child 的 page table 也先同樣指到一樣的位址 - 只有在 child 寫入的時候,才會真的從記憶體複製一段給 child 用 (copy on write) - ![image](https://hackmd.io/_uploads/BkaLQ8VQ1l.png) - RO (Read Only) / RW (Read Write) bit:偵測該位址有沒有被寫入 - 由 MMU 查找 → 因此 COW 需要 MMU 支援 - 在 child 的 page table 中,每個 page 的預設值都為 RO - 當 child 想被標記 RO 的 page 的時候,MMU 就會觸發一個 trap 給 OS,讓 OS 去複製記憶體,達成 copy on write - `vfork()` 可用在沒有 MMU 的情況 - parent 先停下來讓 child 用他的 memory - 假定 child 等等會運行`exec()`,等到 child 跑 `exec()` 的時候就會有一個新的記憶體位址給他 - 原先的記憶體就可以還給 parent,child 在上面的修改會被刪掉 #### Memory-mapped file - page table 直接對應到 disk 裡的檔案 - 當我們讀到 page table 裡的那個 entry 後會觸發 page fault,於是就會去找 file system 裡面的那個檔案並讀回來(一般的 page fault 會去找 swap partition) - 若需要修改檔案,可以直接改 memory 內容而不需使用 `fwrite()`,並把 page table 標為 dirty,後續就會由 flush threads 寫回 disk - e.g. `mmap(fd)` - 使得讀寫檔案不用進 kernel mode - `fd` 為一般的檔案,就直接做上面的操作 - `fd` 也可以是 temporary object (在 /dev/shm 下),這些 object 由 `shm_open()` 創建,可以做到 IPC(讓多個 process 可以拿到同一塊記憶體位置) :::warning 這裡的 temporary object 沒有指到任何檔案,只是作為資源共享用 ::: - `fd = -1`:向 kernel 要一塊記憶體 - 於是 page 有分兩種 - Anonymous page - 會指到 process 的記憶體片段,如 data、stack,或 `mmap(-1)` 得到的 memory - 寫回時寫進 swap - File page - 指到一個檔案 - 寫回時直接寫進原本的檔案 - e.g. - Memory of executable binary: file - Process stacks: anonymous - Process heaps: anonymous - Memory created by mmap(fd), fd is regular file: file - Memory created by mmap(fd=-1): anonymous ### Performance Issues #### Thrashing - Thrashing: a state that the kernel is busy swapping pages in and out - ![image](https://hackmd.io/_uploads/ry2_7WNm1l.png) - Thrashing 造成的 performance 問題 - processes 因 page fault 在等待 I/O,使 CPU 使用率下降 - OS 發現 CPU 使用率下降,增加 degree of multiprogramming - 新產生的 processes 跟舊的 processes 搶 memory 中能放的 page 數,使 page fault 更多 - 結果 CPU 的使用率繼續下降 - Working-set model - $\Delta$: a fixed number of page references - WSSi(Working Set if the process $P_i$): a set of page referenced in the most recent $\Delta$ for process $P_i$ - $D=\sum\text{WSS_i}$. If $D>m\Rightarrow$ Thrashing: de-allocate pages from processes - Thrashing Management - swapper: aka midterm scheduler,可以先釋放 process,之後再把該 process 和做到一半的資料取回來 - swapped process 用的 memory 於是就可以放出來給其他人用 → useful to thrashing management - 但現代 UNIX OS 會更傾向使用 finer-granularity page swapping,因為當系統在 thrashing 的時候把 free page 讀回來會比較花時間 - 而當還是無法解決時,kill process #### Background Dirty Page Flushing - 在背景 flush dirty page(把 dirty page 寫回) - Related Linux kernel threads - kswapd (background) & DirectReclaim (blocking): scan and reclaim memory - 當 scan 的速度太慢,就會用 DirectReclaim 先 block 住直到掃完 - flush threads (pdflush vs. bdi_writeback): flush dirty page to storage #### Pre-Paging (Pre-Fetching) - 在 page fault 的時候多讀一些 requested page 後面的資料 - For Linux,會多讀 128 KB = 32 pages - (+) Less disk head movement - Sequential disk access 比較有效率 - (+) Fewer page fault - Spatial locality - (-) 如果後續沒有用到,IO 和 memory 就被浪費 #### TLB Reach & Page Size tradeoff - TLB Reach = (TLB Size) X (Page Size) - 不用去查 page table 的情況有多少 - 理想上,每個 process 的 working set 都在 TLB - Increase TLB reach - 增加 TLB size - 然而 TLB 都是用 fullt-associative,電路面積很大,不太可能 - Increase page size - 一個 page 可以涵蓋的範圍更大了 - Issue - Internal fragmentation:作業系統配置給 process 的 memory 空間大於 process 真正所需的空間,其他 process 也無法使用那些地方,造成浪費 - more I/O traffic - sol: 提供多種 page size e.g. IA-64 supports 4 KB and 4 MB pages - page size tradeoff - large page - (+) small page table - (+) large TLB reach - (-) 多帶用不到的資料、浪費記憶體 - small page - (上面反過來) ### OS Examples - Status Bits Associated with a Page - x86 - ![image](https://hackmd.io/_uploads/r1bNkMEQJe.png) - present = valid bit: 該 page 在不在 memory - RW bit: read-only / read-write - reference bit - dirty bit - user / supervisor: page mode,supervisor 只能被 kernel mode 讀到 - Linux page reclaiming - ![image](https://hackmd.io/_uploads/H1lmZfNmkl.png) - file page:來源為實體檔案 - anonymous page:來源如 heap、shared memory - scan window:會把 window 內的 page 拿出來檢查看能不能回收 - writeback:若該 page 正在被寫回,還不能被丟掉,於是放回去 inactive - swappiness - anonymous page 和 file page 發生 page fault 時的代價不同 - Swap IO: 因為 anonymous page 比較小且隨機,處理起來較慢 - File IO: 檔案通常都是連續的,在 file system 中處理起來較快 - swappiness: a tunable kernel knob - 控制從兩個 page 類型回收的比例 - 0 (file) ~ 200 (anon) - 通常設為 60,亦即傾向 swap file page - 然而當今大多用 SSD 儲存,IO 更快,使得兩個類型的 page fault 花費差異縮小 - 在最近的一些 Linux distro. 會把 swappiness 設為大於 100(e.g. Android on smartphone) - Mobile system - Swap - 手機的記憶體多使用快閃記憶體(flash memory, e.g. eMMC / UFS / NVMe) - 這類的記憶體有讀寫壽命的問題,且此類裝置的寫出量很大,因此若一直把資料 swap 到硬碟不太好 - 如 Android 會在記憶體內部創一個 swap 區(zRAM),少用的資料會被壓縮並移到這個區域 - 因此 anonymous page fault 的 cost 反而降低了,swappiness 會傾向 200 - Other method for free memory - Android:直接結束 app - iOS:要求 app 釋放記憶體

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully