# Mini RPi3 OS - Mini UART Initialization & Mailbox
**Goal:**
Design an OS that runs on the Raspberry Pi 3 Model B, and organize related notes along the way.
## Command
#### aarch64
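AArch64 is one of the two execution states of ARMv8 (the 8th-generation ARM architecture specification); the other is AArch32. AArch64 uses the A64 RISC instruction set.
(The RPi3 CPU is an ARM Cortex-A53, which supports A64.)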
#### A64
AArch64 executes the A64 instruction set with 64-bit registers, but every instruction is still a fixed 32 bits long.
- The architecture provides **31** general-purpose registers.
- Each register can be accessed in two ways:
  - **64-bit X register (X0..X30)**
  - **32-bit W register (W0..W30)**, the lower 32 bits of the corresponding X register.
  These are two separate views of the same register.
- When a W register is written, the top 32 bits of the 64-bit register are zeroed.
**General-purpose integer registers**

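There is no X31 register. That register encoding is used for special purposes: depending on the instruction, it refers to the stack pointer (SP) or the zero register (XZR/WZR).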
**Floating-point/SIMD/NEON registers**

ref:
[AArch64 Instruction Set](https://medium.com/%20vince-engineering/aarch64-instruction-set-architecture-19d2d68392b)
[ARM64 ABI 慣例概觀](https://learn.microsoft.com/zh-tw/cpp/build/arm64-windows-abi-conventions?view=msvc-160)
[基础篇(三).A64指令集](https://blog.csdn.net/heshuangzong/article/details/128059606)
#### `MacOS` - aarch64-elf
1. Install tools
```
brew install qemu aarch64-elf-gcc aarch64-elf-binutils aarch64-elf-gdb
```
- aarch64-elf-gcc → cross compiler
- aarch64-elf-ld → linker
- aarch64-elf-objcopy → converts the ELF into a raw binary
- aarch64-elf-gdb → debugger
- qemu-system-aarch64 → emulates the Raspberry Pi 3
2. Create obj
```
aarch64-elf-gcc -c a.S main.c
```
- output a.o, main.o
3. Link
```
aarch64-elf-ld -T linker.ld -o kernel8.elf a.o main.o
```
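- `-T linker.ld`: use `linker.ld` as the linker script, which decides where each section is placed in memory.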
4. Create kernel image
```
aarch64-elf-objcopy -O binary kernel8.elf kernel8.img
```
- Makefile
```make
# Compiler / Toolchain
CROSS = aarch64-elf-
CC = $(CROSS)gcc
LD = $(CROSS)ld
OBJCOPY = $(CROSS)objcopy
# Paths
INCLUDE = include
SRC_DIR = src
# Sources
SRCS = main.c $(wildcard $(SRC_DIR)/*.c)
OBJS = $(SRCS:.c=.o)
# Flags
CFLAGS = -I$(INCLUDE) \
         -fno-stack-protector -Wall -Wextra -Wpedantic -O2 \
         -ffreestanding -nostdinc -nostdlib -nostartfiles -g
# Build rules
all: kernel8.img
	@echo "Build completed. Cleaning intermediate files..."
	rm -f $(OBJS)
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
kernel8.img: boot.o $(OBJS)
	$(LD) -T linker.ld -o kernel8.elf boot.o $(OBJS)
	$(OBJCOPY) -O binary kernel8.elf kernel8.img
boot.o: boot.S
	$(CC) $(CFLAGS) -c $< -o $@
clean:
	rm -f kernel8.elf kernel8.img boot.o $(OBJS)
run:
	qemu-system-aarch64 -M raspi3b -serial null -serial stdio \
		-display none -kernel kernel8.img
debug:
	qemu-system-aarch64 -M raspi3b -kernel kernel8.img -display none -S -s
```
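**CFLAGS**
| Flag | Description | Why |
| -------------------------- | --------------- | ----------- |
| `-I$(INCLUDE)` | Add the header search path | Find our own `.h` files |
| `-fno-stack-protector` | Disable the stack protector | No libc to provide the `__stack_chk_fail` support |
| `-Wall -Wextra -Wpedantic` | Enable warnings | Catch likely bugs early |
| `-O2` | Optimize | Better performance |
| `-ffreestanding` | Freestanding mode | Bare metal: no hosted environment assumed |
| `-nostdinc` | Don't search system headers | Avoid accidentally including host headers |
| `-nostdlib` | Don't link system libraries | We write our own OS |
| `-nostartfiles` | Don't link system startup files | We provide our own `boot.S` |
| `-g` | Generate debug symbols | For GDB |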
#### `Linux`
[IOC5226: Operating System Capstone - lab0](https://oscapstone.github.io/labs/lab0.html)
### Debug
`QEMU`
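1. Open two terminals. In the first one, start QEMU and let it wait for GDB:
```
qemu-system-aarch64 -M raspi3b -kernel kernel8.img -display none -S -s
```
-S: freeze the CPU at startup, before the first instruction (wait for GDB)
-s: shorthand for -gdb tcp::1234, which opens port 1234 for GDB
2. In the second one, start GDB:
```
aarch64-elf-gdb
file kernel8.elf # load the symbol table
target remote :1234 # connect to QEMU
```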
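### Common GDB commands
| Command | Description |
| ---------------- | ------------------- |
| `info registers` | Show all registers |
| `x/10i $pc` | Show 10 instructions starting at the current PC |
| `b _start` | Set a breakpoint at `_start` |
| `b main` | Set a breakpoint at `main()` |
| `c` | Continue until the next breakpoint |
| `si` | Step one instruction (into calls) |
| `ni` | Step one instruction (over calls) |
| `info break` | List all breakpoints |
| `p/x $sp` | Print the stack pointer in hex |
| `x/16x $sp` | Examine the stack contents |
| `detach` | Detach from QEMU without stopping it |
| `quit` | Exit GDB |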
## Lab file structure
Project structure
```
project/
│
├── include/
│ ├── uart.h
│ ├── shell.h
│ ├── string.h
│ ├── gpio.h
│ ├── mbox.h
│ └── reboot.h
│
├── src/
│ ├── uart.c
│ ├── shell.c
│ ├── string.c
│ ├── mbox.c
│ └── reboot.c
│
├── main.c
├── boot.S
├── linker.ld
└── Makefile
```
## Boot
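| Step | Description | Why |
| ---------------------------- | ------------------------------ | ------------ |
| Set the stack pointer (`sp`) | Makes function calls and local variables usable | C needs a stack |
| Clear the `.bss` section | Sets uninitialized global variables to 0 | Required by the C language |
| Initialize the data section | Copy `.data` from ROM to RAM | Globals get their correct initial values (not needed on RPi3: the GPU firmware loads the whole image into RAM) |
| Multi-core control | Only one CPU (the master core) runs the kernel | The other cores idle |
| Set the Exception Level (EL1) | Enter the privilege level an OS needs | Access system registers (not done in this lab's `boot.S`) |
| Jump to the C entry point (`main`) | Enter the main logic | Start the OS |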
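**Why a master core:**
:::info
If all four cores ran the same `_start`,
they would all initialize the BSS section, set up the stack, and call `main()` at the same time.
The result: several cores racing to zero the same memory, several cores sharing one stack, and a crashed system.
:::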
- **boot.S**
```asm
.section .text.boot
.global _start
// Initialization: only core 0 continues, the other cores sleep
_start:                         // placed at 0x80000 by linker.ld
    // read the CPU id and stop the secondary cores
    mrs     x1, mpidr_el1
    and     x1, x1, #3          // 4-core CPU: keep the lowest two bits (core 0..3)
    cbz     x1, setting         // compare and branch on zero: core 0 jumps to setting
halt:
    wfe                         // wait for event (low-power sleep)
    b       halt                // woken up: go back to sleep
// Set sp to _stack_top; the stack grows downward toward _end
setting:
    ldr     x1, =_stack_top     // x1 = address of _stack_top (defined in linker.ld)
    mov     sp, x1              // sp = x1
    ldr     x1, =__bss_start    // x1: 64-bit start address of .bss
    ldr     w2, =__bss_size     // w2: 32-bit count of 8-byte words (defined in linker.ld)
// loop that clears the .bss section
clear_bss:
    cbz     w2, kernel_main     // nothing (left) to clear
    str     xzr, [x1], #8       // store 0 (xzr) at [x1], then x1 += 8 (post-index)
    sub     w2, w2, #1
    cbnz    w2, clear_bss
// Go to the C program
kernel_main:
    bl      main                // branch with link: x30 = return address
    b       halt                // if main returns, park this core too
```
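>**_start**: finds core 0 among the 4 CPUs and lets only it run `setting`, so the setup is not done repeatedly
> **halt**: puts the CPU into a low-power state until an event arrives (Wait For Event); when woken, the CPU continues with the next instruction (here `b halt`, back into the loop)
> **setting**: sets `sp` to the top of an unused RAM region that serves as the stack (the linker script defines this top address, `_stack_top`; the stack grows downward from it)
> **clear_bss**: before the C code runs, all uninitialized global variables must be zeroed
> cbz: Compare and Branch on Zero -> if w2 == 0, branch to `kernel_main`
> str: Store -> store xzr (a 64-bit zero) at the address in x1, then x1 += 8 (8 bytes = 64 bits)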
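### Kernel initialization:
Code segment:

#### Initial memory layout (from the lowest address)
- `.text` starts at 0x80000 (512 KiB), followed by `.rodata` and `.data`
- BSS section (zeroed, like memzero)
- Stack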
### Flow:
```
Power On
↓
GPU boot (fully automatic)
↓
GPU reads bootcode.bin / start.elf from the SD card
↓
GPU loads kernel8.img into RAM (0x80000)
↓
GPU starts CPU0 at 0x80000
↓
============== ARM CPU starts executing ==============
boot.S
↓
1. Disable interrupts
2. Set the Exception Level (usually EL2 or EL1)
3. Set the stack pointer
4. Clear BSS (.bss = 0)
5. Jump to the C kernel main
↓
============== C kernel initialization begins ==============
uart_init()
↓
Initialize GPIO / UART
↓
Output
```
## Link
- linker.ld
```ld
SECTIONS
{
/* setting start point */
. = 0x80000;
.text : { KEEP(*(.text.boot)) *(.text .text.* .gnu.linkonce.t*) }
.rodata : { *(.rodata .rodata.* .gnu.linkonce.r*) }
PROVIDE(_data = .);
.data : { *(.data .data.* .gnu.linkonce.d*) }
.bss (NOLOAD) : {
. = ALIGN(16);
__bss_start = .;
*(.bss .bss.*)
*(COMMON)
. = ALIGN(16); /* keep the size a multiple of 8 and _end (so _stack_top) 16-byte aligned */
__bss_end = .;
}
_end = .;
/* stack size = 64KB */
_stack_top = _end + 0x10000;
/DISCARD/ : { *(.comment) *(.gnu*) *(.note*) *(.eh_frame*) }
}
__bss_size = (__bss_end - __bss_start)>>3;
```
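- 0x80000: the kernel is placed starting at physical address 0x80000, where the Raspberry Pi GPU bootloader loads a 64-bit kernel.
- .text:
  - [KEEP](https://hackmd.io/wuglnPdHQbiTgJUTeHP4cw#KEEPsymbol): protects the section from removal; even if it looks unreferenced in the final analysis, it is never deleted. If the link command uses `--gc-sections` ("garbage collection of unused sections"), sections that are not used are removed.
  - .text.boot: the section declared with `.section .text.boot` in boot.S; listing it first puts `_start` at the lowest address (0x80000).
  - .text.*: GCC puts each function in its own section (when compiling with `-ffunction-sections`)
  - [.gnu.linkonce.t](https://hackmd.io/wuglnPdHQbiTgJUTeHP4cw#linkeonce): "link once only" sections for text, used to handle duplicate function definitions
- PROVIDE(_data = .): defines the `_data` symbol for use from assembly or C. If another linked object already defines this symbol, that definition is used; otherwise this one is (somewhat like `#ifndef`, a safer way to define a symbol).
- .bss:
  - .bss (NOLOAD): the section is not loaded from the image (think ROM or NOR flash); the boot code zeroes it at run time.
  - . = ALIGN(16): align to the next 16-byte boundary (16 bytes = 4 words)
  - *(.bss .bss.*): collect the `.bss` section and the per-variable `.bss.*` sections from all object files
- /DISCARD/ : { *(.comment) *(.gnu*) *(.note*) *(.eh_frame*) }: drops comment, note, GNU-specific and unwind (`.eh_frame`, used by C++) sections the kernel does not need, keeping it clean.
- __bss_size = (__bss_end - __bss_start) >> 3:
  bss size = (end - start) / 8, i.e. the BSS byte size counted in 8-byte units, matching the loop in boot.S: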
```
clear_bss:
cbz w2, kernel_main
str xzr,[x1],#8
```
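As a reading aid, the `clear_bss` loop and the `__bss_size` computation can be modeled in C. This is only a sketch for understanding: the real loop must stay in assembly because it runs before `.bss` is zeroed, which C code assumes. `clear_bss_model` and `bss_words` are hypothetical names.

```c
#include <stdint.h>

/*
 * C model of clear_bss in boot.S (illustration only).
 *   start -> x1 (__bss_start), count -> w2 (__bss_size = bytes / 8)
 */
void clear_bss_model(uint64_t *start, uint32_t count)
{
    while (count != 0) {   /* cbz w2, kernel_main */
        *start++ = 0;      /* str xzr, [x1], #8 : store 0, then x1 += 8 */
        count--;           /* sub w2, w2, #1 */
    }                      /* cbnz w2, clear_bss */
}

/* Same arithmetic as __bss_size = (__bss_end - __bss_start) >> 3 */
uint32_t bss_words(uintptr_t bss_start, uintptr_t bss_end)
{
    return (uint32_t)((bss_end - bss_start) >> 3);
}
```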
:::info
`KEEP(*(.text.boot))` is not strictly necessary here,
because the linker will not discard this section (neither LTO nor `--gc-sections` is enabled).
:::
## Mini UART Initialization
```
uart.h ← register addresses and function prototypes
uart.c ← initialization, read/write, and transmit functions
main.c ← main program: calls uart_init(), prints Hello World
```
### GPIO
RPi3 accesses peripherals through **[MMIO](https://ithelp.ithome.com.tw/articles/10364165)**: when the CPU performs a load/store on certain physical addresses, it is actually operating on that peripheral's registers.
RPi3 has several GPIO lines for basic input/output devices such as LEDs or buttons. Some GPIO lines also provide alternate functions such as UART and SPI.
Before using the UART, the GPIO pins must be configured to the corresponding mode. TX (GPIO 14) and RX (GPIO 15) (physical pins 8 and 10) can be used by both the **mini UART** and the **PL011 UART**.

ref: https://www.chipwaygo.com/doc/gpio_pin.php
:::info
There is a VideoCore/ARM MMU translating physical addresses to bus addresses. **The MMU maps physical address 0x3f000000 to bus address 0x7e000000**. In your code, you should use physical addresses instead of bus addresses. However, the reference uses bus addresses, so you have to translate them into physical ones.
physical = bus - 0x3F000000 (e.g., bus 0x7E215000 → physical 0x3F215000)
:::
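The conversion can be kept in one small helper so datasheet addresses can be copied verbatim. A minimal sketch for the RPi3 only; `bus_to_phys`, `BUS_PERIPH_BASE` and `PHYS_PERIPH_BASE` are hypothetical names, not part of the lab code.

```c
#include <stdint.h>

/* RPi3 (BCM2837): peripheral bus addresses 0x7Exxxxxx appear to the ARM
 * at physical 0x3Fxxxxxx, i.e. physical = bus - 0x3F000000. */
#define BUS_PERIPH_BASE  0x7E000000u
#define PHYS_PERIPH_BASE 0x3F000000u

static inline uint32_t bus_to_phys(uint32_t bus)
{
    return bus - BUS_PERIPH_BASE + PHYS_PERIPH_BASE;
}
```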

define
```
#define MMIO_BASE 0x3F000000
#define AUX_BASE (MMIO_BASE + 0x215000)
//(bus: 0x7E21 5000-> phy: 0x3F21 5000)
```
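#### PL011 (UART0)
The standard, full-featured UART.
Set ALT = 0.
#### Mini UART (UART1) - used in this lab
Part of the "Auxiliary peripherals": fewer features, but simple to initialize.
Set ALT = 5; the mini UART itself is enabled through the auxiliary enable register (AUX_ENABLES).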
:::info
Auxiliary peripherals Register Map (base bus address = 0x7E21 5000)
:::
| Address | Register Name | Description | Size (bits) |
| ----------- | ------------------ | ---------------------------- | ---- |
| 0x7E21_5000 | AUX_IRQ | Auxiliary Interrupt status | 3 |
| 0x7E21_5004 | **AUX_ENABLES** | Auxiliary enables | 3 |
| 0x7E21_5040 | AUX_MU_IO_REG | Mini Uart I/O Data | 8 |
| 0x7E21_5044 | **AUX_MU_IER_REG** | Mini Uart Interrupt Enable | 8 |
| 0x7E21_5048 | AUX_MU_IIR_REG | Mini Uart Interrupt Identify | 8 |
| 0x7E21_504C | AUX_MU_LCR_REG | Mini Uart Line Control | 8 |
| 0x7E21_5050 | AUX_MU_MCR_REG | Mini Uart Modem Control | 8 |
| 0x7E21_5054 | AUX_MU_LSR_REG | Mini Uart Line Status | 8 |
| 0x7E21_5058 | AUX_MU_MSR_REG | Mini Uart Modem Status | 8 |
| 0x7E21_505C | AUX_MU_SCRATCH | Mini Uart Scratch | 8 |
| 0x7E21_5060 | **AUX_MU_CNTL_REG** | Mini Uart Extra Control | 8 |
| 0x7E21_5064 | AUX_MU_STAT_REG | Mini Uart Extra Status | 32 |
| 0x7E21_5068 | AUX_MU_BAUD_REG | Mini Uart Baudrate | 16 |
| 0x7E21_5080 | AUX_SPI0_CNTL0_REG | SPI 1 Control register 0 | 32 |
| 0x7E21_5084 | AUX_SPI0_CNTL1_REG | SPI 1 Control register 1 | 8 |
| 0x7E21_5088 | AUX_SPI0_STAT_REG | SPI 1 Status | 32 |
| 0x7E21_5090 | AUX_SPI0_IO_REG | SPI 1 Data | 32 |
| 0x7E21_5094 | AUX_SPI0_PEEK_REG | SPI 1 Peek | 16 |
| 0x7E21_50C0 | AUX_SPI1_CNTL0_REG | SPI 2 Control register 0 | 32 |
| 0x7E21_50C4 | AUX_SPI1_CNTL1_REG | SPI 2 Control register 1 | 8 |
| 0x7E21_50C8 | AUX_SPI1_STAT_REG | SPI 2 Status | 32 |
| 0x7E21_50D0 | AUX_SPI1_IO_REG | SPI 2 Data | 32 |
| 0x7E21_50D4 | AUX_SPI1_PEEK_REG | SPI 2 Peek | 16 |
- uart.h
``` cpp
#ifndef UART_H
#define UART_H
#define MMIO_BASE 0x3F000000 // RPi3's base mem addr
// Auxiliary mini UART registers
#define AUX_BASE (MMIO_BASE + 0x215000) // bus address 0x7E21 5000
#define AUX_ENABLES ((volatile unsigned int*)(AUX_BASE + 0x04))
#define AUX_MU_IO_REG ((volatile unsigned int*)(AUX_BASE + 0x40))
#define AUX_MU_IER_REG ((volatile unsigned int*)(AUX_BASE + 0x44))
#define AUX_MU_IIR_REG ((volatile unsigned int*)(AUX_BASE + 0x48))
#define AUX_MU_LCR_REG ((volatile unsigned int*)(AUX_BASE + 0x4C))
#define AUX_MU_MCR_REG ((volatile unsigned int*)(AUX_BASE + 0x50))
#define AUX_MU_LSR_REG ((volatile unsigned int*)(AUX_BASE + 0x54))
#define AUX_MU_CNTL_REG ((volatile unsigned int*)(AUX_BASE + 0x60))
#define AUX_MU_BAUD_REG ((volatile unsigned int*)(AUX_BASE + 0x68))
void uart_init(void);
void uart_send(char c);
char uart_recv(void);
void uart_puts(const char* str);
void uart_gets(char *buf);
void uart_hex(unsigned int d);
#endif
```
The GPIO has 41 registers. All accesses are assumed to be 32-bit.
| Address | Field Name | Description | Size | Read/Write |
| ----------- | ---------- | ------------------------------------- | ---- | ---------- |
| 0x7E20_0000 | GPFSEL0 | GPIO Function Select 0 | 32 | R/W |
| 0x7E20_0004 | GPFSEL1 | GPIO Function Select 1 | 32 | R/W |
| 0x7E20_0008 | GPFSEL2 | GPIO Function Select 2 | 32 | R/W |
| 0x7E20_000C | GPFSEL3 | GPIO Function Select 3 | 32 | R/W |
| 0x7E20_0010 | GPFSEL4 | GPIO Function Select 4 | 32 | R/W |
| 0x7E20_0014 | GPFSEL5 | GPIO Function Select 5 | 32 | R/W |
| 0x7E20_0018 | - | Reserved | - | - |
| 0x7E20_001C | GPSET0 | GPIO Pin Output Set 0 | 32 | W |
| 0x7E20_0020 | GPSET1 | GPIO Pin Output Set 1 | 32 | W |
| 0x7E20_0024 | - | Reserved | - | - |
| 0x7E20_0028 | GPCLR0 | GPIO Pin Output Clear 0 | 32 | W |
| 0x7E20_002C | GPCLR1 | GPIO Pin Output Clear 1 | 32 | W |
| 0x7E20_0030 | - | Reserved | - | - |
| 0x7E20_0034 | GPLEV0 | GPIO Pin Level 0 | 32 | R |
| 0x7E20_0038 | GPLEV1 | GPIO Pin Level 1 | 32 | R |
| 0x7E20_003C | - | Reserved | - | - |
| 0x7E20_0040 | GPEDS0 | GPIO Pin Event Detect Status 0 | 32 | R/W |
| 0x7E20_0044 | GPEDS1 | GPIO Pin Event Detect Status 1 | 32 | R/W |
| 0x7E20_0048 | - | Reserved | - | - |
| 0x7E20_004C | GPREN0 | GPIO Pin Rising Edge Detect Enable 0 | 32 | R/W |
| 0x7E20_0050 | GPREN1 | GPIO Pin Rising Edge Detect Enable 1 | 32 | R/W |
| 0x7E20_0054 | - | Reserved | - | - |
| 0x7E20_0058 | GPFEN0 | GPIO Pin Falling Edge Detect Enable 0 | 32 | R/W |
| 0x7E20_005C | GPFEN1 | GPIO Pin Falling Edge Detect Enable 1 | 32 | R/W |
| 0x7E20_0060 | - | Reserved | - | - |
| 0x7E20_0064 | GPHEN0 | GPIO Pin High Detect Enable 0 | 32 | R/W |
| 0x7E20_0068 | GPHEN1 | GPIO Pin High Detect Enable 1 | 32 | R/W |
| 0x7E20_006C | - | Reserved | - | - |
| 0x7E20_0070 | GPLEN0 | GPIO Pin Low Detect Enable 0 | 32 | R/W |
| 0x7E20_0074 | GPLEN1 | GPIO Pin Low Detect Enable 1 | 32 | R/W |
| 0x7E20_0078 | - | Reserved | - | - |
| 0x7E20_007C | GPAREN0 | GPIO Pin Async. Rising Edge Detect 0 | 32 | R/W |
| 0x7E20_0080 | GPAREN1 | GPIO Pin Async. Rising Edge Detect 1 | 32 | R/W |
| 0x7E20_0084 | - | Reserved | - | - |
| 0x7E20_0088 | GPAFEN0 | GPIO Pin Async. Falling Edge Detect 0 | 32 | R/W |
| 0x7E20_008C | GPAFEN1 | GPIO Pin Async. Falling Edge Detect 1 | 32 | R/W |
| 0x7E20_0090 | - | Reserved | - | - |
| 0x7E20_0094 | GPPUD | GPIO Pin Pull-up/down Enable | 32 | R/W |
| 0x7E20_0098 | GPPUDCLK0 | GPIO Pin Pull-up/down Enable Clock 0 | 32 | R/W |
| 0x7E20_009C | GPPUDCLK1 | GPIO Pin Pull-up/down Enable Clock 1 | 32 | R/W |
| 0x7E20_00A0 | - | Reserved | - | - |
| 0x7E20_00B0 | - | Test | 4 | R/W |
- gpio.h
```c
#ifndef GPIO_H
#define GPIO_H
#ifndef MMIO_BASE
#define MMIO_BASE 0x3F000000 // RPi3 peripheral base (physical)
#endif
#define GPFSEL0 ((volatile unsigned int*)(MMIO_BASE+0x00200000))
#define GPFSEL1 ((volatile unsigned int*)(MMIO_BASE+0x00200004))
#define GPFSEL2 ((volatile unsigned int*)(MMIO_BASE+0x00200008))
#define GPFSEL3 ((volatile unsigned int*)(MMIO_BASE+0x0020000C))
#define GPFSEL4 ((volatile unsigned int*)(MMIO_BASE+0x00200010))
#define GPFSEL5 ((volatile unsigned int*)(MMIO_BASE+0x00200014))
#define GPSET0 ((volatile unsigned int*)(MMIO_BASE+0x0020001C))
#define GPSET1 ((volatile unsigned int*)(MMIO_BASE+0x00200020))
#define GPCLR0 ((volatile unsigned int*)(MMIO_BASE+0x00200028))
#define GPLEV0 ((volatile unsigned int*)(MMIO_BASE+0x00200034))
#define GPLEV1 ((volatile unsigned int*)(MMIO_BASE+0x00200038))
#define GPEDS0 ((volatile unsigned int*)(MMIO_BASE+0x00200040))
#define GPEDS1 ((volatile unsigned int*)(MMIO_BASE+0x00200044))
#define GPHEN0 ((volatile unsigned int*)(MMIO_BASE+0x00200064))
#define GPHEN1 ((volatile unsigned int*)(MMIO_BASE+0x00200068))
#define GPPUD ((volatile unsigned int*)(MMIO_BASE+0x00200094))
#define GPPUDCLK0 ((volatile unsigned int*)(MMIO_BASE+0x00200098))
#define GPPUDCLK1 ((volatile unsigned int*)(MMIO_BASE+0x0020009C))
#endif
```
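`volatile`: prevents the compiler from optimizing away or caching these register accesses, which would otherwise make reads/writes to the hardware go wrong.
Each GPIO pin can be put into a different function mode (3 bits per pin):
000: input
001: output
100: ALT0, 101: ALT1, 110: ALT2, 111: ALT3, 011: ALT4, 010: ALT5
GPFSEL0 selects the functions of GPIO 0~9, and GPFSEL1 selects those of GPIO 10~19.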
**GPFSEL0**

**GPFSEL1**

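To set GPIO 14 and 15 to ALT5 (010), modify bits 12-14 (GPIO 14) and 15-17 (GPIO 15) of GPFSEL1:
GPFSEL1 = `... 010 010 ...` (bits 17..12)
You could write `*GPFSEL1 = 0x00012000;` directly, but that would overwrite the settings of the other pins GPIO 10~19 (which may already be in use), so read-modify-write instead.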
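The read-modify-write can be factored into a helper that computes which GPFSELn register a pin lives in and the new register value. A sketch assuming the function codes listed above; `gpfsel_index`, `gpfsel_update` and the `GPIO_FUNC_*` macros are hypothetical names. It is pure arithmetic, so it can be checked on a host before being used on the board as `*GPFSEL1 = gpfsel_update(*GPFSEL1, 14, GPIO_FUNC_ALT5);`.

```c
#include <stdint.h>

/* Function-select codes (3 bits per pin), from the table above */
#define GPIO_FUNC_INPUT  0u  /* 000 */
#define GPIO_FUNC_OUTPUT 1u  /* 001 */
#define GPIO_FUNC_ALT0   4u  /* 100 */
#define GPIO_FUNC_ALT5   2u  /* 010 */

/* Which GPFSELn register controls `pin` (10 pins per register) */
static inline unsigned gpfsel_index(unsigned pin)
{
    return pin / 10;
}

/* Return `reg` with only `pin`'s 3-bit field replaced by `func` */
static inline uint32_t gpfsel_update(uint32_t reg, unsigned pin, uint32_t func)
{
    unsigned shift = (pin % 10) * 3;
    reg &= ~(7u << shift);        /* clear the old function */
    reg |= (func & 7u) << shift;  /* set the new function */
    return reg;
}
```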
In addition, a GPIO pin may have a pull-up or pull-down resistor enabled by default.
That would interfere with the UART signal, so it should be turned off.
##### Initialization
1. Set AUXENB register to enable mini UART. Then mini UART register can be accessed.
2. Set AUX_MU_CNTL_REG to 0. Disable transmitter and receiver during configuration.
3. Set AUX_MU_IER_REG to 0. Disable interrupt because currently you don’t need interrupt.
4. Set AUX_MU_LCR_REG to 3. Set the data size to 8 bit.
5. Set AUX_MU_MCR_REG to 0. Don’t need auto flow control.
6. Set AUX_MU_BAUD to 270. Set the baud rate to 115200; after booting, the system clock is 250 MHz.
$\text{baud rate} = \frac{\text{system clock}}{8 \times (\text{AUX\_MU\_BAUD} + 1)}$
7. Set AUX_MU_IIR_REG to 6. Clear the receive and transmit FIFOs.
8. Set AUX_MU_CNTL_REG to 3. Enable the transmitter and receiver
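The value in step 6 follows from solving the formula for AUX_MU_BAUD. A small sketch, assuming the 250 MHz clock stated above; `mini_uart_baud_reg` and `mini_uart_actual_baud` are hypothetical helper names. It rounds to the nearest integer and reports the rate actually produced.

```c
#include <stdint.h>

/* baud = clock / (8 * (AUX_MU_BAUD + 1))
 *   =>  AUX_MU_BAUD = clock / (8 * baud) - 1, rounded to nearest */
uint32_t mini_uart_baud_reg(uint32_t clock_hz, uint32_t baud)
{
    uint32_t div = 8u * baud;
    return (clock_hz + div / 2) / div - 1;  /* round, then subtract 1 */
}

/* Baud rate actually produced by a given register value */
uint32_t mini_uart_actual_baud(uint32_t clock_hz, uint32_t reg)
{
    return clock_hz / (8u * (reg + 1));
}
```

With 250 MHz the register value is 270 and the real rate is about 115313 baud, roughly 0.1% above 115200, which is comfortably within normal UART tolerance.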
- uart.c
```c
#include "uart.h"
#include "gpio.h"
/* Busy-wait for about n cycles; the "nop" keeps -O2 from deleting the loop */
static void delay(unsigned int n)
{
    while (n--) __asm__ volatile("nop");
}
/**
 * Set baud rate and characteristics (115200 8N1) and map to GPIO
 */
void uart_init(void)
{
    unsigned int r;
    /* initialize UART */
    *AUX_ENABLES = 1;       // Set AUXENB register to enable mini UART.
    *AUX_MU_CNTL_REG = 0;   // Disable transmitter and receiver during configuration.
    *AUX_MU_IER_REG = 0;    // Disable interrupt.
    *AUX_MU_LCR_REG = 3;    // Set the data size to 8 bit.
    *AUX_MU_MCR_REG = 0;    // Don't need auto flow control.
    /*
     * baud rate = system_clock_freq / (8 * (AUX_MU_BAUD + 1))
     * system_clock_freq = 250 MHz (default)
     *
     * AUX_MU_BAUD + 1 = 250,000,000 / (8 * 115,200) ~= 271.3
     * AUX_MU_BAUD     = 270 (actual rate ~= 115,313 baud)
     */
    *AUX_MU_BAUD_REG = 270; // Set baud rate to 115200
    *AUX_MU_IIR_REG = 6;    // Clear the RX/TX FIFOs
    /* map UART1 to GPIO pins 14/15 (ALT5 = 0b010) */
    r = *GPFSEL1;
    r &= ~((7 << 12) | (7 << 15)); // clear the GPIO14/15 function bits
    r |= (2 << 12) | (2 << 15);    // set 0b010 (ALT5)
    *GPFSEL1 = r;                  // write back
    /* turn off pull-up/down on pins 14 and 15
     *
     * To remove the pull-up/down from a GPIO pin:
     * 1. Write to GPPUD to set the required control signal (00 = off).
     * 2. Wait 150 cycles for the control signal to settle.
     * 3. Write to GPPUDCLK0 to clock the control signal into the desired GPIO pins (14, 15).
     * 4. Wait 150 cycles for the clock to take effect.
     * 5. Write to GPPUD and GPPUDCLK0 again to remove the signal.
     */
    *GPPUD = 0;                          // Disable pull-up/down
    delay(150);                          // wait for the control signal to settle
    *GPPUDCLK0 = (1 << 14) | (1 << 15);  // Assert clock on pins 14, 15
    delay(150);                          // wait for the clock to take effect
    *GPPUDCLK0 = 0;                      // remove the clock
    /* enable Tx, Rx only after everything above is configured */
    *AUX_MU_CNTL_REG = 3;
}
void uart_send(char c)
{
    while (!(*AUX_MU_LSR_REG & 0x20)); // wait until the TX FIFO can accept a byte
    *AUX_MU_IO_REG = c;                 // write the byte
}
char uart_recv(void)
{
    while (!(*AUX_MU_LSR_REG & 0x01)); // wait until the RX FIFO holds a byte
    return *AUX_MU_IO_REG & 0xFF;
}
// send string
void uart_puts(const char* str)
{
    while (*str) uart_send(*str++);
}
```
```
*AUX_MU_CNTL_REG = 0; // disable TX/RX
...
*AUX_MU_CNTL_REG = 3;
```
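While the UART is enabled, UART parameters such as
- data size (LCR)
- flow control (MCR)
- baud rate (BAUD)
- FIFO (IIR)
should not be changed.
So the correct order is:
1. Turn TX/RX off first (AUX_MU_CNTL_REG = 0)
2. Set all UART parameters (LCR, BAUD, IER, FIFO, ...)
3. Configure the GPIO ALT function
4. Disable the pull-up/down
5. Finally turn TX/RX on (AUX_MU_CNTL_REG = 3)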
`while (!(*AUX_MU_LSR_REG & 0x20));`
The AUX_MU_LSR_REG register shows the data status:

Bit 5 (0x20, transmitter empty) = 1 means the transmit FIFO can accept at least 1 byte.
`while (!(*AUX_MU_LSR_REG & 0x01));`

Bit 0 (0x01, data ready) = 1 means AUX_MU_IO_REG has data to read.
`return *AUX_MU_IO_REG & 0xFF;`
The AUX_MU_IO_REG:

IO_REG is 32 bits wide, but the received UART data is only in the lower 8 bits, and a `char` is also 8 bits.
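The two status bits and the byte mask can be given names, which makes the polling loops self-documenting. A sketch; the macro and helper names are hypothetical, only the bit positions come from the text above. In the driver, `uart_send` would then poll with `while (!lsr_can_write(*AUX_MU_LSR_REG));`.

```c
#include <stdint.h>

/* AUX_MU_LSR_REG bits used by uart_send / uart_recv */
#define LSR_DATA_READY 0x01u  /* bit 0: receive FIFO holds at least 1 byte */
#define LSR_TX_EMPTY   0x20u  /* bit 5: transmit FIFO can accept at least 1 byte */

static inline int lsr_can_read(uint32_t lsr)  { return (lsr & LSR_DATA_READY) != 0; }
static inline int lsr_can_write(uint32_t lsr) { return (lsr & LSR_TX_EMPTY) != 0; }

/* Only the low 8 bits of AUX_MU_IO_REG carry the received byte */
static inline char io_reg_to_char(uint32_t io) { return (char)(io & 0xFFu); }
```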
## Simple Shell
### shell command
- uart.c
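```c
// ...
char uart_recv(void)
{
    while (!(*AUX_MU_LSR_REG & 0x01)); // wait until the RX FIFO holds a byte
    return *AUX_MU_IO_REG & 0xFF;
}
// recv string: read one line into buf (NUL-terminated), echoing each character
// assumes buf can hold UART_GETS_MAX bytes (shell.c uses char buf[64])
#define UART_GETS_MAX 64
void uart_gets(char *buf)
{
    int i = 0;
    while (1) {
        char c = uart_recv();            // read a new character every iteration
        if (c == '\n' || c == '\r') {    // compare with char literals, not strings
            buf[i] = '\0';               // null character
            uart_puts("\r\n");
            return;
        }
        if (i < UART_GETS_MAX - 1) {     // leave room for '\0'
            buf[i++] = c;
            uart_send(c);                // echo back
        }
    }
}
// ...
```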
- shell.c
```c
#include "shell.h"
#include "uart.h"
#include "string.h"
#include "reboot.h"
void shell_start(void)
{
uart_puts("Welcome to mini RPi3 Shell!\r\n");
while (1) {
uart_puts("> ");
char buf[64];
uart_gets(buf);
if (!strcmp(buf, "help")) {
uart_puts("help : print this help menu\n");
uart_puts("hello : print Hello World!\n");
uart_puts("reboot : reboot the device\n");
}
else if (!strcmp(buf, "hello")) {
uart_puts("Hello World!\n");
}
else if (!strcmp(buf, "reboot")) {
// ...
}
else {
uart_puts("Unknown command\r\n");
}
}
}
```
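- `#include "string.h"` refers to our own minimal `include/string.h` (there is no libc), not the system header.
Deleting typed characters (backspace) is not supported yet.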
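Because `-nostdinc -nostdlib` leave no libc, the shell needs its own `strcmp`. A minimal sketch of what `src/string.c` might contain; it is shown under the hypothetical name `kstrcmp` so it can also be compiled and tested on a host next to the real libc (in the kernel it would be named `strcmp` and declared in `include/string.h`).

```c
/* Minimal strcmp for the freestanding kernel (no libc).
 * Returns 0 if equal, <0 if a sorts before b, >0 otherwise. */
int kstrcmp(const char *a, const char *b)
{
    while (*a && *a == *b) {
        a++;
        b++;
    }
    /* compare as unsigned char, like the standard strcmp */
    return (unsigned char)*a - (unsigned char)*b;
}
```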
### reboot
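RPi3 does not provide an on-board reset button.
To make the CPU reboot, we have to go through the watchdog and power-management registers.
The RPi3 reboot relies on the PM_RSTC / PM_WDOG registers (they are not documented in the BCM2835 ARM Peripherals datasheet):
| Register / value | Function |
| -------------------------- | ---------------------------------------- |
| `PM_RSTC` (0x3F10001C) | Reset control: selects the reset type (soft/full) |
| `PM_WDOG` (0x3F100024) | Watchdog timer: triggers a reset when the countdown expires |
| `PM_PASSWORD` (0x5A000000) | Password that must accompany every write, to prevent accidental writes |
1. Set the reset-control flag (full reset)
2. Write the watchdog countdown
3. Include the magic number (password) in every write
4. The system reboots when the countdown expires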
- reboot.c
```c
#include "uart.h"
#define PM_PASSWORD 0x5a000000
#define PM_RSTC     0x3F10001c
#define PM_WDOG     0x3F100024
// Write a 32-bit value to the MMIO register at addr
void set(long addr, unsigned int value) {
    volatile unsigned int* point = (volatile unsigned int*)addr;
    *point = value;
}
// Reboot after the watchdog has counted down `tick` ticks
void reset(int tick) {
    set(PM_RSTC, PM_PASSWORD | 0x20);  // reset type: full reset
    set(PM_WDOG, PM_PASSWORD | tick);  // watchdog countdown in ticks
}
// Cancel a pending watchdog reset
void cancel_reset() {
    set(PM_RSTC, PM_PASSWORD | 0);     // clear the reset type
    set(PM_WDOG, PM_PASSWORD | 0);     // stop the countdown
}
```
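- `set(PM_RSTC, PM_PASSWORD | 0x20)`
Setting PM_RSTC to 0x20 (the full-reset flag) means that when the watchdog countdown ends, the whole system is reset.
- `set(PM_WDOG, PM_PASSWORD | tick);`
PM_WDOG = tick sets how many ticks the countdown lasts before the reset fires. After the reset, the board boots again from the beginning: the GPU firmware reloads kernel8.img at 0x80000.
- `cancel_reset()`
Cancels a pending watchdog reset.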
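The register values written by `reset()` can be computed separately from the MMIO writes. This sketch borrows the constants of the Linux `bcm2835_wdt` watchdog driver (`PM_RSTC_WRCFG_CLR`, `PM_RSTC_WRCFG_FULL_RESET`, `PM_WDOG_TIME_SET`, and 65536 ticks per second); treat them as assumptions that were not verified against this lab. Unlike `reset()` above, `pm_rstc_value` keeps the other bits of the current PM_RSTC value (read-modify-write). All helper names are hypothetical.

```c
#include <stdint.h>

/* Constants as used by the Linux bcm2835_wdt driver (assumption) */
#define PM_PASSWORD              0x5a000000u
#define PM_RSTC_WRCFG_CLR        0xffffffcfu  /* clears the reset-config field */
#define PM_RSTC_WRCFG_FULL_RESET 0x00000020u  /* reset-config = full reset */
#define PM_WDOG_TIME_SET         0x000fffffu  /* 20-bit countdown field */

/* Seconds to watchdog ticks (1 s = 65536 ticks, assumption) */
uint32_t wdog_secs_to_ticks(uint32_t secs)
{
    return secs << 16;
}

/* Value for PM_WDOG: password + countdown, masked to the 20-bit field */
uint32_t pm_wdog_value(uint32_t ticks)
{
    return PM_PASSWORD | (ticks & PM_WDOG_TIME_SET);
}

/* Value for PM_RSTC: keep the other bits, select full reset, add password */
uint32_t pm_rstc_value(uint32_t current)
{
    return (current & PM_RSTC_WRCFG_CLR) | PM_PASSWORD | PM_RSTC_WRCFG_FULL_RESET;
}
```

Under that tick length, `reset(10)` in the shell fires after roughly 10/65536 s (about 0.15 ms), and the 20-bit field caps the countdown just below 16 s.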
## Mailbox
Mailbox is a communication mechanism between the ARM CPU and the VideoCore IV GPU. There are two hardware mailboxes (Mailbox 0 and Mailbox 1):
| Hardware Mailbox | Direction | Description |
| ------------- | ------------ | ----------- |
| **Mailbox 0** | **VC → ARM** | GPU writes, ARM reads |
| **Mailbox 1** | **ARM → VC** | ARM writes, GPU reads |
```
+--------------------+
ARM WRITE -------->| Mailbox 1 register |--------> VideoCore
+--------------------+
^
│
│ message = address | channel
v
+--------------------+
ARM READ <--------| Mailbox 0 register |<-------- VideoCore
+--------------------+
```
**Goal**: Get the hardware’s information
- board revision
- ARM memory base address and size
Mailbox messages:
- The mailbox interface has 28 bits (MSB) available for the value and 4 bits (LSB) for the channel.
- Request message: 28 bits (MSB) buffer address
- Response message: 28 bits (MSB) buffer address
- channel: 4 bits(LSB)
>bits[3:0] = channel
>bits[31:4] = message buffer address
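Because only the upper 28 bits carry the address, the low 4 bits of the buffer address must be 0, i.e., the buffer must be 16-byte aligned (`aligned(16)`). This is done with `__attribute__`:
`__attribute__((attribute-list))` attaches attributes to the declaration of a function, variable, or type; attribute-list is a comma-separated list of attributes.
For example, `volatile unsigned int __attribute__((aligned(16))) mbox[36];` declares an array whose starting address is aligned to 16 **bytes**.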
example:
```cpp
#include <stdio.h>
volatile unsigned int __attribute__((aligned(16))) mbox[36];
int main()
{
printf("%p\n%p\n%p\n", &mbox[0], &mbox[1], &mbox[2]);
return 0;
}
```
output (the array starts on a 16-byte boundary; consecutive elements are 4 bytes apart): `0x601040`, `0x601044`, `0x601048`
ref:
- [來了解GNU C __attribute__](https://medium.com/@fearless1997s/來了解gnu-c-attribute-f06d49af2454)
- [Asked: Is array different w/ and w/o GCC aligned attribute in rpi3?](https://stackoverflow.com/questions/68388487/is-array-different-w-and-w-o-gcc-aligned-attribute-in-rpi3)
The lower 4 bits hold the channel. Mailbox 0 defines the following channels:
- 0: Power management
- 1: Framebuffer
- 2: Virtual UART
- 3: VCHIQ
- 4: LEDs
- 5: Buttons
- 6: Touch screen
- 7: (not listed)
- 8: Property tags (ARM -> VC)
- 9: Property tags (VC -> ARM)
:::info
- Channel 8: Request from ARM for response by VC
- Channel 9: Request from VC for response by ARM (none currently defined)
:::
A channel is a number that tells you and the GPU what the information sent through the mailbox means. We only need channel 1, the **framebuffer channel**, and channel 8, the **property channel (ARM -> VC)**.
(This lab's goals, the board revision and the ARM memory base address and size, both use channel 8, the property channel, so the lower 4 bits are always written as 0b1000 = 8.)
To compose a mailbox message, pack the whole message (the buffer size, the tags, and so on) into an array (the buffer); the GPU writes its response back into the same buffer.
The buffer contains the following structure:
- `[0]` u32: total buffer size in bytes (n u32 words = n × 4 bytes)
- `[1]` u32: request/response code
  - Request codes: `0x00000000` process request; all other values reserved
  - Response codes: `0x80000000` request successful; `0x80000001` error parsing request buffer (partial response); all other values reserved
- `[2...]`: a sequence of tags; each tag contains:
  - tag ID (u32)
  - value buffer size in bytes (u32)
  - tag request/response code (u32)
  - value buffer
- `[n-1]` u32: `0x00000000`, the end tag
To request data from the VideoCore GPU (e.g., board revision, serial number, memory info),
build a message buffer in this structure, then send the buffer address combined with mailbox channel 8.
Mailbox property interface(channel 8, 9) contains several [tags](https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface) to indicate different operations.
#### Board revision
- Tag: 0x00010002
- Request: length 0
- Response: length 4
  - Value: u32: board revision
#### ARM memory base address
- Tag: 0x00010005
- Request: length 0
- Response: length 8
  - Value:
    - u32: base address in bytes
    - u32: size in bytes

Future formats may specify multiple base+size combinations.
The buffer contains the following structure:
```
...
[2] tag identifier
[3] value buffer size in bytes
[4] tag request code
[5..] value buffer
...
```

e.g.
```cpp
mbox[0] = 7 * 4 ; // buffer size
mbox[1] = 0x00000000 ; // request code
mbox[2] = 0x00010002 ; // tag: Get Board Revision
mbox[3] = 4 ; // value buffer size = 4 bytes
mbox[4] = 0 ; // request (bit31=0)
mbox[5] = 0 ; // value buffer (empty) → filled by the GPU
mbox[6] = 0 ; // end tag
```
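Both requests in this lab have the same shape (one tag with a value buffer), so the packing can be written once. A sketch; `mbox_build_one_tag` is a hypothetical helper, not part of the lab code, and it assumes `value_bytes` is a multiple of 4 and that the buffer is large enough. The board-revision request above corresponds to `mbox_build_one_tag(mbox, 0x00010002, 4)` followed by a call on channel 8.

```c
#include <stdint.h>

#define MBOX_REQUEST 0x00000000u
#define TAG_REQUEST  0x00000000u
#define TAG_LAST     0x00000000u

/* Pack one property tag whose value buffer is `value_bytes` long into buf.
 * Returns the number of u32 words used; buf[0] gets the size in bytes. */
uint32_t mbox_build_one_tag(volatile uint32_t *buf, uint32_t tag, uint32_t value_bytes)
{
    uint32_t i = 0, words = value_bytes / 4;
    buf[i++] = 0;             /* [0] total size, filled in below */
    buf[i++] = MBOX_REQUEST;  /* [1] request code */
    buf[i++] = tag;           /* [2] tag identifier */
    buf[i++] = value_bytes;   /* [3] value buffer size in bytes */
    buf[i++] = TAG_REQUEST;   /* [4] tag request code */
    for (uint32_t w = 0; w < words; w++)
        buf[i++] = 0;         /* [5..] value buffer, filled by the GPU */
    buf[i++] = TAG_LAST;      /* end tag */
    buf[0] = i * 4;           /* total size in bytes */
    return i;
}
```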
Mailbox registers are accessed through MMIO. We only need the Mailbox 0 Read/Write register (the CPU reads from the GPU), the Mailbox 0 Status register (check the mailbox state), and the Mailbox 1 Read/Write register (the CPU writes to the GPU).
### Mailbox registers
The mailbox block sits at offset 0xB880 from the peripheral base (BCM2835/6/7); on the RPi3:
`Mailbox base = 0x3F00B880 = MMIO_BASE + 0xB880 (offset)`
**mailbox address and flags**
```cpp
#define MBOX_BASE (MMIO_BASE + 0xb880)
#define MBOX_READ ((volatile unsigned int*)(MBOX_BASE))
#define MBOX_STATUS ((volatile unsigned int*)(MBOX_BASE + 0x18))
#define MBOX_WRITE ((volatile unsigned int*)(MBOX_BASE + 0x20))
#define MBOX_EMPTY 0x40000000
#define MBOX_FULL 0x80000000
```
- MBOX_READ: Mailbox 0's read register; the GPU sends data to the CPU, and the CPU reads it here
- MBOX_STATUS: `bit31 = FULL, bit30 = EMPTY`
  hence `#define MBOX_FULL 0x80000000` and `#define MBOX_EMPTY 0x40000000`
- MBOX_WRITE: Mailbox 1's register; the CPU writes data here for the GPU to read
To pass messages by the mailbox, you need to prepare a message array. Then apply the following steps:
1. Combine the message address (upper 28 bits) with channel number (lower 4 bits)
2. Check if Mailbox 0 status register’s full flag is set.
3. If not, then you can write to Mailbox 1 Read/Write register.
4. Check if Mailbox 0 status register’s empty flag is set.
5. If not, then you can read from Mailbox 0 Read/Write register.
6. Check if the value is the same as you wrote in step 1.
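Steps 1 and 6 are plain bit manipulation, so they can be checked in isolation. A minimal sketch; `mbox_compose`, `mbox_is_our_reply` and the convention of returning 0 for a misaligned or out-of-range address are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Step 1: combine a 16-byte-aligned buffer address (upper 28 bits)
 * with a channel number (lower 4 bits). Returns 0 if the address is
 * misaligned or does not fit in 32 bits (hypothetical convention). */
uint32_t mbox_compose(uintptr_t buf_addr, uint8_t channel)
{
    if ((buf_addr & 0xF) != 0 || buf_addr > 0xFFFFFFFFu)
        return 0;
    return (uint32_t)buf_addr | (channel & 0xFu);
}

/* Step 6: a reply belongs to us if it equals the word we wrote */
int mbox_is_our_reply(uint32_t reply, uint32_t sent)
{
    return reply == sent;
}
```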
- mbox.h
```cpp
#ifndef _MBOX_H
#define _MBOX_H
#include "gpio.h" // MMIO_BASE
extern volatile unsigned int mbox[36];
/* mailbox address and flags */
#define MBOX_BASE (MMIO_BASE + 0xb880)
#define MBOX_READ ((volatile unsigned int*)(MBOX_BASE))
#define MBOX_STATUS ((volatile unsigned int*)(MBOX_BASE + 0x18))
#define MBOX_WRITE ((volatile unsigned int*)(MBOX_BASE + 0x20))
#define MBOX_EMPTY 0x40000000
#define MBOX_FULL 0x80000000
#define MBOX_RES_SUCCESS 0x80000000
#define MBOX_RES_ERROR 0x80000001
#define MBOX_REQUEST 0x0
/* channels */
#define MBOX_CH_POWER 0
#define MBOX_CH_FB 1
#define MBOX_CH_VUART 2
#define MBOX_CH_VCHIQ 3
#define MBOX_CH_LEDS 4
#define MBOX_CH_BTNS 5
#define MBOX_CH_TOUCH 6
#define MBOX_CH_COUNT 7
#define MBOX_CH_PROP 8
/* tags */
#define TAG_REQUEST 0x00000000
#define MBOX_TAG_GETSERIAL 0x00010004
#define MBOX_TAG_GETBOARD 0x00010002
#define MBOX_TAG_GETARMMEM 0x00010005
#define MBOX_TAG_LAST 0x00000000
int mbox_call(unsigned char channel);
int get_board_revision(void);
int get_arm_mem(void);
#endif
```
- mbox.c
```cpp
#include "mbox.h"
/* 16-byte aligned so the low 4 bits of its address are free for the channel */
volatile unsigned int __attribute__((aligned(16))) mbox[36];
int mbox_call(unsigned char channel) {
    // step 1: upper 28 bits = buffer address, lower 4 bits = channel
    unsigned int r = (unsigned int)(((unsigned long)mbox) & (~0xF)) | (channel & 0xF);
    // steps 2-3: wait until the full flag is clear, then write
    while (*MBOX_STATUS & MBOX_FULL) {}
    *MBOX_WRITE = r;                      // address of message + channel
    // steps 4-6: wait for the response to our message
    while (1) {
        while (*MBOX_STATUS & MBOX_EMPTY) {}   // wait until the empty flag is clear
        if (r == *MBOX_READ) {                 // is this the reply to our message?
            return mbox[1] == MBOX_RES_SUCCESS; // did the GPU process it successfully?
        }
    }
    return 0; // not reached
}
int get_board_revision(void) {
    mbox[0] = 7 * 4;             // u32: buffer size in bytes
    mbox[1] = MBOX_REQUEST;      // u32: buffer request code
    // tags begin
    mbox[2] = MBOX_TAG_GETBOARD; // u32: tag identifier
    mbox[3] = 4;                 // u32: value buffer size (max of request and response length)
    mbox[4] = TAG_REQUEST;       // u32: tag request code
    mbox[5] = 0;                 // value buffer: board revision (filled by the GPU)
    // tags end
    mbox[6] = MBOX_TAG_LAST;     // u32: end tag
    return mbox_call(MBOX_CH_PROP); // message-passing procedure (the 6 steps above)
}
int get_arm_mem(void) {
    mbox[0] = 8 * 4;              // u32: buffer size in bytes
    mbox[1] = MBOX_REQUEST;       // u32: buffer request code
    // tags begin
    mbox[2] = MBOX_TAG_GETARMMEM; // u32: tag identifier
    mbox[3] = 8;                  // u32: value buffer size (max of request and response length)
    mbox[4] = TAG_REQUEST;        // u32: tag request code
    mbox[5] = 0;                  // value buffer: base address
    mbox[6] = 0;                  // value buffer: size
    // tags end
    mbox[7] = MBOX_TAG_LAST;      // u32: end tag
    return mbox_call(MBOX_CH_PROP); // message-passing procedure (the 6 steps above)
}
```
#### Update uart & shell
- uart.c
```cpp
/**
* Display a binary value in hexadecimal
*/
void uart_hex(unsigned int d) {
unsigned int n;
for(int c = 28; c >= 0; c -= 4) {
        // get the current nibble, from the highest one down
n = (d >> c) & 0xF;
// 0-9 => '0'-'9', 10-15 => 'A'-'F'
n += ((n > 9) ? 0x37 : 0x30);
uart_send(n);
}
}
```
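`uart_hex` relies on ASCII arithmetic: for a nibble n in 10..15, `n + 0x37` lands on 'A'..'F' (0x37 + 10 = 0x41 = 'A'), and for 0..9, `n + 0x30` lands on '0'..'9'. The same loop writing into a buffer instead of the UART makes this easy to test on a host; `hex_to_buf` is a hypothetical name used only for this illustration.

```c
/* Write the 8 hex digits of d into out, plus a terminating NUL,
 * using the same nibble loop as uart_hex. */
void hex_to_buf(unsigned int d, char out[9])
{
    int i = 0;
    for (int c = 28; c >= 0; c -= 4) {
        unsigned int n = (d >> c) & 0xF;               /* current nibble, high to low */
        out[i++] = (char)(n + (n > 9 ? 0x37 : 0x30));  /* 'A'..'F' or '0'..'9' */
    }
    out[i] = '\0';
}
```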
- shell.c
```cpp
#include "shell.h"
#include "uart.h"
#include "string.h"
#include "reboot.h"
#include "mbox.h"
void shell_start(void)
{
uart_puts("Welcome to mini RPi3 Shell!\r\n");
uart_puts("Please type 'help' for more assistance!\r\n");
while (1) {
uart_puts("> ");
char buf[64];
uart_gets(buf);
if (!strcmp(buf, "help")) {
uart_puts("help : print this help menu\r\n");
uart_puts("hello : print Hello World!\r\n");
uart_puts("reboot : reboot the device\r\n");
uart_puts("info : get board reversion and ARM memory info\r\n");
} else if (!strcmp(buf, "hello")) {
uart_puts("Hello World!\r\n");
} else if (!strcmp(buf, "reboot")) {
uart_puts("Rebooting...\r\n");
reset(10);
} else if (!strcmp(buf, "info")) {
if (get_board_revision()) {
uart_puts("My board revision is: 0x");
uart_hex(mbox[5]);
uart_puts("\r\n");
}
if (get_arm_mem()) {
uart_puts("My ARM memory base address is: 0x");
uart_hex(mbox[5]); // base
uart_puts("\r\n");
uart_puts("My ARM memory size is: 0x");
uart_hex(mbox[6]); // size
uart_puts("\r\n");
}
}
else {
uart_puts("Unknown command\r\n");
}
}
}
```
### Advanced
Mini GPU Compute Offload Demo
### ref:
- [Mailbox property interface](https://github.com/raspberrypi/firmware/wiki/Mailbox-property-interface)
- [Mailbox](https://grasslab.github.io/osdi/en/hardware/mailbox.html#mailbox)
- [raspberrypi/firmware/Mailboxes](https://github.com/raspberrypi/firmware/wiki/Mailboxes)
## Main
- main.c
```c
#include "uart.h"
#include "shell.h"
void main()
{
uart_init();
uart_puts("Hello World!\n");
// into shell
shell_start();
// never return;
// while(1) {}
}
```
---
SD card directory structure:
/boot (SD root)
├── bootcode.bin
├── start.elf
├── fixup.dat
├── config.txt
└── kernel8.img
- config.txt
```
arm_64bit=1
enable_uart=1
kernel=kernel8.img
```
- arm_64bit=1 → makes the RPi3 run the kernel in 64-bit AArch64 mode.
- enable_uart=1 → enables the UART.
- kernel=kernel8.img → the kernel image file to load.
## Result
### on qemu
```shell
(base) denny@jiangguanyudeMacBook-Air Lab1 % make run
qemu-system-aarch64 -M raspi3b -serial null -serial stdio \
-display none -kernel kernel8.img
Hello World!
Welcome to mini RPi3 Shell!
> help
help : print this help menu
hello : print Hello World!
reboot : reboot the device
info : get board reversion and ARM memory info
> reboot
Rebooting...
Hello World!
Welcome to mini RPi3 Shell!
> info
My board revision is: 0x00A02082
My ARM memory base address is: 0x00000000
My ARM memory size is: 0x3C000000
```
### on board
#### 1. Find the serial device name
```
% ls /dev/cu.
/dev/cu.Bluetooth-Incoming-Port /dev/cu.WHP01K
/dev/cu.debug-console /dev/cu.X10
```
#### 2. Connect at baud rate 115200
```
% screen /dev/cu.usbserial-0001 115200
```
result:
```
Hello World!
Welcome to mini RPi3 Shell!
> help
help : print this help menu
hello : print Hello World!
reboot : reboot the device
info : get board reversion and ARM memory info
> info
My board revision is: 0x00A52082
My ARM memory base address is: 0x00000000
My ARM memory size is: 0x3B400000
>
```
#### 3.1 Exit
`Ctrl + A` then `K` then `y`
#### 3.2 Detach (keep the session running in the background)
`Ctrl + A` then `D`
Find the detached process:
```
% lsof | grep usbserial
screen 22264 ...
```
Then either 1. reattach to it:
```
screen -r 22264
```
2. or kill it:
```
kill 22264
```