# Outline
1. Term project announcement
2. Virtual Memory
3. OS
4. Boot sequence
5. Bare machine
6. Standalone program
7. RISC-V Vector extension
---
# Glossary
* Memory paging (called swapping on some Unix-like systems): a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. Paging is an important part of virtual memory implementations in modern operating systems, **using secondary storage to let programs exceed the size of available physical memory**.
* [Thread and process](https://www.youtube.com/watch?app=desktop&v=4rLW7zg21gI):
    * A process is a container that can run several threads concurrently.

* [Virtual memory area (VMA)](https://www.oreilly.com/library/view/linux-device-drivers/9781785280009/4759692f-43fb-4066-86b2-76a90f0707a2.xhtml): the kernel uses VMAs **to keep track of the process's memory mappings**; for example, a process has one VMA for its code, one VMA for each type of data, one VMA for each distinct memory mapping (if any), and so on. **VMAs are processor-independent structures**, with permissions and access control flags. Each VMA has a start address and a length, and its size is always a multiple of the page size (PAGE_SIZE). A VMA consists of a number of pages, each of which has an entry in the page table.
    * The memory region described by a VMA is always virtually contiguous, but not necessarily physically contiguous.
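* [Three types of page fault](https://en.wikipedia.org/wiki/Page_fault):
    * Minor: if the page is already loaded in memory when the fault occurs, but the memory management unit has not marked it as loaded, the fault is called a minor page fault or soft page fault.
        * In other words, the data is already in main memory and only the mapping is missing, so no disk access is needed.
    * Major: this is the mechanism the operating system uses to increase the amount of program memory available on demand. The operating system **delays loading parts of the program from disk** until the program tries to use them and triggers a page fault. If the page is not loaded in memory when the fault occurs, the fault is called a major page fault or hard page fault.
        * In short, the page has not been loaded into memory yet.
    * Invalid: if the fault is caused by a reference to an address that is not part of the virtual address space, so that no page in memory can correspond to it, the fault is called an invalid page fault.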
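* sbrk:
    * `int brk(void *addr)`: sets the end of the heap of the running program (the program break) to `addr`; returns 0 on success and -1 on error.
    * `void *sbrk(intptr_t incr)`: `incr` is the number of bytes requested. The call grows the heap by `incr` bytes and returns the previous program break, which is the start of the newly allocated memory. If `incr` is 0, it returns the current break address unchanged.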
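* malloc():
    * Used to request memory dynamically while the program is running; the heap grows toward higher addresses. In the classic implementation described here, it calls sbrk() to obtain the memory it hands out from the heap, and the malloc family is normally the only caller of sbrk().
    * malloc keeps a buffer: it calls sbrk() to request memory from the system, and sbrk() may obtain a chunk of 100-odd KB that goes into the buffer. As long as later malloc requests fit within that chunk, the memory is handed out from the buffer.
    * When the buffer runs out, malloc calls sbrk() again to request another large chunk.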
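* [crt0](https://zh.wikipedia.org/zh-tw/Crt0):
    * Also called c0
    * **A set of execution startup routines** linked into a C program
    * **Performs any initialization required before the program's main function is called**
    * It usually takes the form of an object file called crt0.o, is often written in assembly language, and is automatically included by the linker in every executable it builds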
```asm
    .text
    .globl _start
_start:                     # _start is the entry point known to the linker
    mov  %rsp, %rbp         # set up a new stack frame
    mov  0(%rbp), %rdi      # get argc from the stack
    lea  8(%rbp), %rsi      # get the address of argv from the stack
    call main               # %rdi, %rsi are the first two args to main
    mov  %rax, %rdi         # move the return value of main into the first argument
    call exit               # terminate the program
```
* bss segment:
    * The block starting symbol (abbreviated to .bss or bss) is the portion of an object file, executable, or assembly language code that contains statically allocated variables that are declared but have not been assigned a value yet.
    * Typically only the length of the bss section, but no data, is stored in the object file.
    * The program loader allocates memory for the bss segment when it loads the program. Placing variables that have no value in the .bss section, rather than in the .data or .rodata sections that need initial-value data, reduces the size of the object file.
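* objdump:
    * Part of [GNU Binutils](https://zh.wikipedia.org/zh-tw/GNU_Binutils)
    * A command-line program for displaying various information about object files on Unix-like operating systems
    * For example, it can be used as a disassembler to view an executable in assembly form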
* Region of interest: (I no longer remember what this refers to)
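
The three page-fault types in the glossary can be told apart by two questions: does the address fall inside one of the process's VMAs, and is the page already resident in physical memory? The following is a minimal sketch of that decision, not a real kernel API; `classify_fault`, `vmas`, `resident_pages`, and the 4 KiB `PAGE_SIZE` are hypothetical names and values chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def classify_fault(addr, vmas, resident_pages):
    """Classify a page fault at virtual address `addr`.

    vmas: list of (start, length) ranges the process has mapped.
    resident_pages: set of virtual page numbers whose data is already in
        physical memory, even if the MMU has no mapping for them yet.
    """
    # Invalid: the address lies outside every VMA of the process.
    if not any(start <= addr < start + length for start, length in vmas):
        return "invalid"
    vpn = addr // PAGE_SIZE
    # Minor: the data is in memory, only the MMU mapping is missing.
    if vpn in resident_pages:
        return "minor"
    # Major: the page must first be loaded from disk.
    return "major"


vmas = [(0x1000, 0x3000)]   # one VMA covering pages 1, 2, and 3
resident_pages = {1}        # page 1 is in memory but not yet mapped
print(classify_fault(0x1234, vmas, resident_pages))  # minor
print(classify_fault(0x2234, vmas, resident_pages))  # major
print(classify_fault(0x9000, vmas, resident_pages))  # invalid
```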
---
# Virtual Memory
* Purposes of using virtual memory
    1. Isolation
    2. Protection
* **Hardware** provides the virtual -> physical mapping
* The memory manager maps virtual addresses to physical addresses
* Paging is not the only way to implement virtual memory
## Example
* Each process has its own virtual memory, but all processes must share physical memory

* Pages in physical memory correspond to pages in virtual memory

* Each process has a table mapping its virtual pages to physical pages

* When a process needs more data and physical memory is full, the least recently used (LRU) page is removed from PM and replaced with a new page from disk
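
The replacement rule in the last bullet can be sketched in a few lines. This is only a toy model under stated assumptions: `PhysicalMemory` and `num_frames` are hypothetical names, and a real OS usually approximates LRU rather than tracking the exact order.

```python
from collections import OrderedDict


class PhysicalMemory:
    """Toy physical memory holding at most `num_frames` pages, evicting by LRU."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.pages = OrderedDict()  # page id -> data, least recently used first

    def access(self, page):
        """Touch `page`; return the evicted page id, or None if nothing was evicted."""
        if page in self.pages:
            self.pages.move_to_end(page)  # hit: mark as most recently used
            return None
        evicted = None
        if len(self.pages) >= self.num_frames:
            evicted, _ = self.pages.popitem(last=False)  # drop the LRU page
        self.pages[page] = f"data loaded from disk for page {page}"
        return evicted


pm = PhysicalMemory(num_frames=2)
evictions = [pm.access(p) for p in ["A", "B", "A", "C"]]
print(evictions)  # [None, None, None, 'B']
```

Page B is evicted rather than A because the second access to A made it the most recently used page.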




## Page table
* Each page table entry carries several status fields, such as the valid bit, the dirty bit, and the permission bits.
* A page table can have several levels, for example a 4-level or a 5-level page table.
* We can only “find” mappings for pages we own!
* Therefore, we cannot construct physical addresses we do not have access to :)
* All other mappings are invalid or blank!

* PTs are normally kept in main memory.
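
A minimal sketch of how the valid, permission, and dirty bits take part in a lookup. The entry layout, the `translate` helper, the `PageFault` exception, and the 1 KiB page size are assumptions made for illustration and do not correspond to any specific architecture.

```python
PAGE_SIZE = 1024  # assumed 1 KiB pages, so the offset is the low 10 bits


class PageFault(Exception):
    pass


def translate(page_table, vaddr, write=False):
    """Translate a virtual address using a per-process page table.

    page_table maps VPN -> dict with 'valid', 'writable', 'dirty', 'ppn'.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(f"no valid mapping for VPN {vpn}")
    if write and not entry["writable"]:
        raise PageFault(f"VPN {vpn} is read-only")
    if write:
        entry["dirty"] = True  # page now differs from its copy on disk
    return entry["ppn"] * PAGE_SIZE + offset


page_table = {
    0: {"valid": True, "writable": False, "dirty": False, "ppn": 7},
    1: {"valid": True, "writable": True, "dirty": False, "ppn": 3},
}
print(hex(translate(page_table, 0x005)))              # 0x1c05
print(hex(translate(page_table, 0x404, write=True)))  # 0xc04
```

Any VPN that is missing from the table raises `PageFault`, which mirrors the point that a process cannot construct a physical address it has no mapping for.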
## Translation Lookaside Buffer (TLB)
* Since there is locality in the pages of data being accessed, there must also be locality in the translations of those pages.
* To speed up translation, we build a separate cache for the Page Table.
* **Notice** that what is stored in the TLB is NOT memory data, but the VPN → PPN mappings.
* TLBs are usually small, typically 32–128 entries.
* TLB access time is comparable to cache access time (much faster than accessing main memory).
* TLBs are usually **fully or highly associative**.
* Fetching Data on a Memory Read
    * Check TLB (input: VPN, output: PPN)
    * Check cache (input: PPN + Page Offset, output: data)
    * Very clear layout


* Typical TLB Entry Format
    * **Valid**: Whether that TLB ENTRY is valid (unrelated to the PT valid bit)
    * **Access Rights**: Data from the PT
    * **Dirty**: Consistent with PT
    * **Ref**: Used to implement LRU
        * Set when page is accessed, cleared periodically by OS
    * **PPN**: Data from PT
    * **VPN**: The tag that is compared against the requested VPN
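
The two-step read described above (TLB first, then memory) can be modelled as follows. This is a hedged sketch: the `TLB` class, its LRU policy, the flat `page_table` dict, and the `read` helper are hypothetical, and the cache is folded into a plain `memory` dict for brevity.

```python
from collections import OrderedDict

PAGE_SIZE = 1024  # assumed page size


class TLB:
    """Tiny fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PPN, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]  # slow path: walk the page table in memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU translation
        self.entries[vpn] = ppn
        return ppn


def read(vaddr, tlb, page_table, memory):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = tlb.lookup(vpn, page_table)        # step 1: VPN -> PPN
    return memory[ppn * PAGE_SIZE + offset]  # step 2: PPN + offset -> data


page_table = {0: 5, 1: 2}
memory = {5 * PAGE_SIZE + 8: "x", 2 * PAGE_SIZE + 0: "y"}
tlb = TLB()
values = [read(8, tlb, page_table, memory),
          read(8, tlb, page_table, memory),
          read(PAGE_SIZE, tlb, page_table, memory)]
print(values, tlb.hits, tlb.misses)  # ['x', 'x', 'y'] 1 2
```

Note that the TLB stores only translations; the data itself always comes from `memory`.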

## Context Switching
* How does a single processor run many programs at once?
    * Context switch: Changing of internal state of processor (switching between processes).
    * Save register values (and PC) and change value in Page Table Base register.
* What happens to the TLB?
    * Current entries are for a different process (similar VAs, though!)
    * Set all entries to **invalid on context switch**
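
The steps above can be sketched as a toy model. The `CPU` class, `make_process`, and the dict-based process record are hypothetical; the sketch only illustrates the order of operations and the reason for flushing the TLB.

```python
class CPU:
    """Toy CPU state touched by a context switch (hypothetical model)."""

    def __init__(self):
        self.registers = [0] * 32
        self.pc = 0
        self.page_table_base = None
        self.tlb = {}  # VPN -> PPN for the running process only


def context_switch(cpu, old, new):
    # Save the outgoing process's registers and PC.
    old["registers"] = list(cpu.registers)
    old["pc"] = cpu.pc
    # Restore the incoming process's state.
    cpu.registers = list(new["registers"])
    cpu.pc = new["pc"]
    # Point the MMU at the new process's page table.
    cpu.page_table_base = new["page_table"]
    # Cached translations belong to the old process: invalidate them all.
    cpu.tlb.clear()


def make_process(pc, page_table):
    return {"registers": [0] * 32, "pc": pc, "page_table": page_table}


proc_a = make_process(pc=0x100, page_table={0: 5})
proc_b = make_process(pc=0x200, page_table={0: 9})  # same VPN, different PPN

cpu = CPU()
cpu.pc, cpu.page_table_base = proc_a["pc"], proc_a["page_table"]
cpu.tlb[0] = 5   # process A has touched virtual page 0
cpu.pc = 0x104   # process A has advanced by one instruction
context_switch(cpu, proc_a, proc_b)
print(hex(proc_a["pc"]), hex(cpu.pc), cpu.tlb)  # 0x104 0x200 {}
```

Without the flush, process B's access to virtual page 0 would hit the stale entry and reach physical page 5, which belongs to process A.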

## Hierarchical Page Table
* With a 1-level PT, what if I only need the first page (0) and the last page (N) of my virtual address space? I still have to load the entire page table!
* What if there was a way to only load/create the sections I need as I need them?
* Answer: a Hierarchical Page Table
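
A minimal two-level sketch of this idea, assuming 1 KiB pages and a 22-bit VPN split into two 11-bit indices. The split and the `TwoLevelPageTable` class are illustrative assumptions, not the layout of any particular ISA.

```python
OFFSET_BITS = 10  # assumed 1 KiB pages
L2_BITS = 11      # assumed: low 11 bits of the VPN index the second level,
                  # the remaining high 11 bits index the first level


class TwoLevelPageTable:
    def __init__(self):
        self.l1 = {}  # L1 index -> second-level table (dict: L2 index -> PPN)

    def _split(self, vaddr):
        vpn = vaddr >> OFFSET_BITS
        return vpn >> L2_BITS, vpn & ((1 << L2_BITS) - 1)

    def map(self, vaddr, ppn):
        i1, i2 = self._split(vaddr)
        self.l1.setdefault(i1, {})[i2] = ppn  # create the L2 table on demand

    def translate(self, vaddr):
        i1, i2 = self._split(vaddr)
        ppn = self.l1[i1][i2]  # a KeyError here models a page fault
        return (ppn << OFFSET_BITS) | (vaddr & ((1 << OFFSET_BITS) - 1))


pt = TwoLevelPageTable()
pt.map(0x00000000, ppn=1)  # first page of the 32-bit address space
pt.map(0xFFFFFC00, ppn=2)  # last page of the 32-bit address space
print(len(pt.l1))                     # 2
print(hex(pt.translate(0xFFFFFC04)))  # 0x804
```

Only two second-level tables exist after mapping the first and last pages, instead of one flat table with 2^22 entries.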


### Example: How Big is the Page Table?
* Layout
    * 64 MiB RAM
    * 32-bit virtual address space
    * 1 KiB pages
* Answer: More than 1000 pages
* Details
    * Offset bits = log2(1024) = 10
    * Virtual page bits = 32 - 10 = 22 (2^22 entries in the page table!)
    * Physical page bits = log2(2^26) - 10 = 26 - 10 = 16 (so each entry takes about 2 B, ignoring status bits)
    * Total bytes ≈ 2^22 * 2 B = 2^23 B (8 MiB)
    * **Number of pages = 2^23 / 2^10 = 2^13 PAGES**
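
The same arithmetic can be checked with a short script. The variable names are illustrative, and the entry size ignores status bits, as in the example above.

```python
ram_bytes = 64 * 2**20  # 64 MiB of RAM
va_bits = 32            # 32-bit virtual address space
page_bytes = 2**10      # 1 KiB pages

# For a power of two n, n.bit_length() - 1 equals log2(n).
offset_bits = page_bytes.bit_length() - 1               # 10
vpn_bits = va_bits - offset_bits                        # 22
ppn_bits = (ram_bytes.bit_length() - 1) - offset_bits   # 16
entry_bytes = (ppn_bits + 7) // 8                       # 2, ignoring status bits
table_bytes = 2**vpn_bits * entry_bytes                 # 2**23
pages_for_table = table_bytes // page_bytes             # 2**13

print(offset_bits, vpn_bits, ppn_bits, entry_bytes)  # 10 22 16 2
print(table_bytes, pages_for_table)                  # 8388608 8192
```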
---
# Microkernel and monolithic kernels

## Ecall
* [Venus](https://inst.eecs.berkeley.edu/~cs61c/su21/resources/venus-reference/#writing-larger-risc-v-programs) provides many simple syscalls using the ecall RISC-V instruction.
* How to issue a syscall?
    1. Place the syscall number in a0
    2. Place arguments to the syscall in the a1 register
    3. Issue the ecall instruction
## Multiprogramming
* Solution: **VM**, which gives each program the illusion of a large, private, uniform store
## Segments


## Fragment memory
* Page boundary check
---
# Standalone program
* Build everything from scratch (without an OS).
* Like HW3.
* Normally, the compiler itself only provides a few headers such as float.h and limits.h.
---
# References
* [Course video](https://www.youtube.com/watch?v=A8AmpS3V8cE)
* [Course material 1](https://wiki.csie.ncku.edu.tw/arch/schedule)
* [Course material 2](https://inst.eecs.berkeley.edu/~cs61c/su20/pdfs/lectures/lec19.pdf)
* [Course material 3](https://inst.eecs.berkeley.edu/~cs61c/su20/pdfs/lectures/lec18.pdf)