# NEORV32
###### tags: `Computer Architecture 2021`
The [NEORV32](https://github.com/stnolting/neorv32) Processor is a customizable microcontroller-like **system on chip** (SoC) that is based on the **RISC-V** NEORV32 CPU.
We focus on the features of the NEORV32 CPU. The following are our observations.
## Architecture

The NEORV32 CPU sits somewhere between a classical pipelined architecture, where each stage requires exactly one processing cycle (if not stalled), and a classical multi-cycle architecture.
### Pipeline

- Front end: responsible for fetching 32-bit instruction words. The fetched instruction data is stored in a **FIFO queue**, the **instruction prefetch buffer**.
- Back end: responsible for the actual execution of instructions. It takes data from the **instruction prefetch buffer** and assembles 32-bit instruction words.
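
The decoupling of front end and back end through a FIFO can be sketched in C as a small ring buffer. This is only an illustrative software model, not the NEORV32 hardware description: the names `ipb_t`, `ipb_push`, `ipb_pop` and the depth `IPB_DEPTH` are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPB_DEPTH 4u /* hypothetical buffer depth, chosen only for this sketch */

typedef struct {
    uint32_t data[IPB_DEPTH];
    unsigned head;  /* next entry the back end will read   */
    unsigned tail;  /* next entry the front end will write */
    unsigned count; /* number of valid entries             */
} ipb_t;

void ipb_init(ipb_t *q) {
    q->head = 0;
    q->tail = 0;
    q->count = 0;
}

/* Front end: store a fetched word; returns false if the buffer is full,
 * which in this model means the fetch has to wait. */
bool ipb_push(ipb_t *q, uint32_t word) {
    if (q->count == IPB_DEPTH) {
        return false;
    }
    q->data[q->tail] = word;
    q->tail = (q->tail + 1u) % IPB_DEPTH;
    q->count++;
    return true;
}

/* Back end: take the oldest word; returns false if the buffer is empty,
 * which in this model means execution has to wait. */
bool ipb_pop(ipb_t *q, uint32_t *word) {
    if (q->count == 0u) {
        return false;
    }
    *word = q->data[q->head];
    q->head = (q->head + 1u) % IPB_DEPTH;
    q->count--;
    return true;
}
```

Words leave the buffer in the same order they were fetched, so the back end always sees instructions in program order even when the two sides run at different speeds.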
### Multi-cycle

- Register file
    - A register file is an array of processor registers in a central processing unit (CPU).
- ALU
    - Performs arithmetic and bitwise operations on integer binary numbers.
- Co-Processor
    - Shifter
    - Mul/Div
    - Bit manipulation
    - FPU
### Memory

As a Von-Neumann machine, the CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are merged into a single processor-internal bus via a prioritizing bus switch (data accesses have higher priority).
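
The priority rule of the bus switch can be written down as a tiny decision function. This is a behavioral sketch only; the type `bus_grant_t` and the function `bus_switch_grant` are hypothetical names and do not come from the NEORV32 sources.

```c
#include <stdbool.h>

typedef enum {
    BUS_IDLE,   /* nobody requests the bus           */
    BUS_DATA,   /* data access gets the bus          */
    BUS_IFETCH  /* instruction fetch gets the bus    */
} bus_grant_t;

/* Decide who may use the shared processor-internal bus.
 * Data accesses win whenever both interfaces request at the same time. */
bus_grant_t bus_switch_grant(bool ifetch_req, bool data_req) {
    if (data_req) {
        return BUS_DATA;
    }
    if (ifetch_req) {
        return BUS_IFETCH;
    }
    return BUS_IDLE;
}
```

With this rule, an instruction fetch that collides with a load or store simply waits until the data access has finished.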
### Summary
The **optimal CPI** (cycles per instruction) is 2: in the best case, an instruction needs one cycle in the pipelined front end and one cycle in the multi-cycle back end.
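
As a rough worked example, the CPI gives a lower bound on the execution time of a program. The helper names and the 100 MHz clock used in the usage note are assumptions for illustration, not measured NEORV32 figures.

```c
#include <stdint.h>

/* Number of clock cycles needed for `instructions` instructions at a given CPI. */
uint64_t cycles_for(uint64_t instructions, uint64_t cpi) {
    return instructions * cpi;
}

/* Execution time in seconds for a given cycle count and clock frequency in Hz. */
double seconds_for(uint64_t cycles, double clock_hz) {
    return (double)cycles / clock_hz;
}
```

For instance, 1000 instructions at the optimal CPI of 2 take at least `cycles_for(1000, 2)` = 2000 cycles, which would be 20 microseconds on a hypothetical 100 MHz clock. Real programs take longer because loads, stores, branches and multi-cycle operations need additional cycles.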
## Toolchain setup
- We installed a prebuilt toolchain from [riscv-gcc-prebuilt](https://github.com/stnolting/riscv-gcc-prebuilt).
- There are two options to choose from:
    - rv32i-2.0.0: 32-bit base integer instruction set with 32 32-bit registers.
    - rv32e-1.0.0: differs from RV32I in that the register file is reduced to 16 32-bit registers.

| Release (tag) | Download archive | GCC | binutils | march | mabi | clib |
|----|----|----|----|----|----|----|
| rv32i-2.0.0 | [💾 download (.tar.gz)](https://github.com/stnolting/riscv-gcc-prebuilt/releases/tag/rv32i-2.0.0) | 10.2.0 | 2.35 | rv32i | ilp32 | newlib |
| rv32e-1.0.0 | [💾 download (.tar.gz)](https://github.com/stnolting/riscv-gcc-prebuilt/releases/tag/rv32e-1.0.0) | 10.1.0 | 2.34 | rv32e | ilp32e | newlib |
- Create a folder where you want to install the toolchain, for example `/opt/riscv/`:
```shell
$ sudo mkdir /opt/riscv
```
- Navigate to the download folder and decompress the toolchain:
```shell
$ sudo tar -xzf riscv32-unknown-elf.gcc-10.2.0.rv32i.ilp32.newlib.tar.gz -C /opt/riscv/
```
- Add the toolchain's `bin` folder to your system's `PATH` environment variable. You can also add this command to your `.bashrc` so that you don't have to enter it again every time you boot your computer.
```shell
$ export PATH=$PATH:/opt/riscv/bin
```
- Now you can test the toolchain:
```shell
$ riscv32-unknown-elf-gcc -v
```

## Neorv32/sim setup
- Clone version v1.6.5, the latest release at the time of writing:
```shell
$ git clone git@github.com:stnolting/neorv32.git --branch v1.6.5
```
- Navigate to an example project in the NEORV32 example folder and run the following commands:
```shell
$ cd neorv32/sw/example/blink_led/
$ make check
```
- Everything is working fine if `Toolchain check OK` appears at the end.

- Instructions for running the simulation are in `sim/README`.
- Before running `./sim/run_riscv_arch_test.sh`, install `ghdl`, which provides the virtual execution platform for the test framework.
```shell
$ sudo apt install ghdl
```
- Now you can run the tests.
:warning: Simulating all the test cases takes quite some time.
```shell
$ ./sim/run_riscv_arch_test.sh
```
- The script should complete successfully.

## Hello world!
- The `hello_world` example is in the `/neorv32/sw/example/hello_world/` folder.
- Run `make help` to list the `make` targets that the `Makefile` provides.
```shell
$ make help
```

- Where do these `make` targets come from?
    - The example's `Makefile` only contains `include $(NEORV32_HOME)/sw/common/common.mk`.
    - So the targets are defined in `common.mk`, located in the `/neorv32/sw/common/` folder.
### Execute
```shell
$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=rv32imac clean_all sim
```
---
- Result

The warning below shows that the simulation mode of UART0 is enabled by the `USER_FLAGS+=-DUART0_SIM_MODE` makefile flag.
```
$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=rv32imac clean_all sim
../../../sw/lib/source/neorv32_uart.c: In function 'neorv32_uart0_setup':
../../../sw/lib/source/neorv32_uart.c:140:4: warning: #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! [-Wcpp]
140 | #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only!
    | ^~~~~~~
```
The size of the executable code (`text`) and the static data memory requirements (`data`, `bss`):
```
Memory utilization:
   text    data     bss     dec     hex filename
   4612       0     116    4728    1278 main.elf
```
The application code is installed as pre-initialized IMEM. This is the default approach for simulation.
```
Compiling ../../../sw/image_gen/image_gen
Installing application image to ../../../rtl/core/neorv32_application_image.vhd
```
A hint about the UART simulation mode:
```
Simulating neorv32_application_image.vhd...
Tip: Compile application with USER_FLAGS+=-DUART[0/1]_SIM_MODE to auto-enable UART[0/1]'s simulation mode (redirect UART output to simulator console).
```
List of (default) arguments that were sent to the simulator. Here: maximum simulation time (10ms).
```
Using simulation runtime args: --stop-time=10ms
```
"Sanity checks" from the core’s VHDL files. These reports give some brief information about the SoC/CPU configuration (→ generics). If there are problems with the current configuration, an ERROR will appear.
```
../../rtl/core/neorv32_top.vhd:361:3:@0ms:(assertion note): NEORV32 PROCESSOR IO Configuration: GPIO MTIME UART0 UART1 SPI TWI PWM WDT CFS SLINK NEOLED XIRQ GPTMR XIP
../../rtl/core/neorv32_top.vhd:386:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).
../../rtl/core/neorv32_top.vhd:410:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing on-chip debugger (OCD).
../../rtl/core/neorv32_cpu.vhd:171:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration (MARCH): RV32IMACBU_Zicsr_Zicntr_Zihpm_Zifencei_Zfinx_Debug
../../rtl/core/neorv32_cpu.vhd:193:3:@0ms:(assertion note): NEORV32 CPU CONFIG NOTE: Implementing NO dedicated hardware reset for uncritical registers (default, might reduce area). Set package constant <dedicated_reset_c> = TRUE to configure a DEFINED reset value for all CPU registers.
../../rtl/core/neorv32_cpu_cp_bitmanip.vhd:147:3:@0ms:(assertion note): Implementing bit-manipulation (B) sub-extensions: ZbbZba
../../rtl/core/mem/neorv32_imem.legacy.vhd:90:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Using legacy HDL style IMEM.
../../rtl/core/mem/neorv32_imem.legacy.vhd:91:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (16384 bytes), pre-initialized with application (4612 bytes).
../../rtl/core/mem/neorv32_dmem.legacy.vhd:73:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Using legacy HDL style DMEM.
../../rtl/core/mem/neorv32_dmem.legacy.vhd:74:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal DMEM (RAM, 8192 bytes).
../../rtl/core/neorv32_wishbone.vhd:144:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing STANDARD Wishbone protocol.
../../rtl/core/neorv32_wishbone.vhd:148:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing auto-timeout (256 cycles).
../../rtl/core/neorv32_wishbone.vhd:152:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing LITTLE-endian byte order.
../../rtl/core/neorv32_wishbone.vhd:156:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing registered RX path.
../../rtl/core/neorv32_slink.vhd:175:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing 8 RX and 8 TX stream links.
```
Program execution:
```
##
## ## ## ##
## ## ######### ######## ######## ## ## ######## ######## ## ################
#### ## ## ## ## ## ## ## ## ## ## ## ## ## #### ####
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
## ## ## ######### ## ## ######### ## ## ##### ## ## #### ###### ####
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
## #### ## ## ## ## ## ## ## ## ## ## ## #### ####
## ## ######### ######## ## ## ## ######## ########## ## ################
## ## ## ##
##
Hello world! :)
```

## NEORV32 CPU feature
The following RISC-V-compatible ISA extensions are available.
### rv32I
- I : **Base Integer ISA**. The CPU always supports the complete `rv32i` base ISA.
- When you `make` your project, `rv32i` is used **by default** unless you specify another `MARCH`.
- The base instruction set includes the following instructions:
    - alu: `addi` `slti` `sltiu` `xori` `ori` `andi` `slli` `srli` `srai` `add` `sub` `sll` `slt` `sltu` `xor` `srl` `sra` `or` `and`
    - memory: `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw`
    - immediate: `lui` `auipc`
    - branches: `beq` `bne` `blt` `bge` `bltu` `bgeu`
    - jumps: `jal` `jalr`
    - environment: `ecall` `ebreak` `fence`
<!--

-->
### rv32A
- A : Atomic Memory Access
- An atomic operation is an operation or a series of operations that **cannot be interrupted**.
- The following two instructions are available:
    - `lr.w` : load-reserved
        - It works like a normal `lw` instruction, but it also sets a **data memory access lock**.
    - `sc.w` : store-conditional
        - It performs the memory write only if the lock is still intact, and it returns the lock state as its result.
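
The interplay of `lr.w` and `sc.w` can be modeled in plain C. The sketch below is a simplified software model with a single memory word; the `model_*` names and the `lrsc_model_t` type are hypothetical and only serve this illustration. It follows the RISC-V convention that `sc.w` returns 0 on success and a non-zero value on failure.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mem;  /* one memory word, enough for the illustration */
    bool     lock; /* models the data memory access lock           */
} lrsc_model_t;

/* lr.w: load the word and set the lock. */
uint32_t model_lr_w(lrsc_model_t *m) {
    m->lock = true;
    return m->mem;
}

/* Anything that invalidates the reservation (assumed here: an interfering
 * access or a trap) breaks the lock. */
void model_break_lock(lrsc_model_t *m) {
    m->lock = false;
}

/* sc.w: write only if the lock is still intact.
 * Returns 0 on success, 1 on failure; the lock is released either way. */
uint32_t model_sc_w(lrsc_model_t *m, uint32_t value) {
    bool ok = m->lock;
    if (ok) {
        m->mem = value;
    }
    m->lock = false;
    return ok ? 0u : 1u;
}

/* Typical usage pattern: retry until the store-conditional succeeds.
 * Returns the value that was in memory before the increment. */
uint32_t model_atomic_inc(lrsc_model_t *m) {
    uint32_t old;
    do {
        old = model_lr_w(m);
    } while (model_sc_w(m, old + 1u) != 0u);
    return old;
}
```

The retry loop in `model_atomic_inc` is the usual way to build read-modify-write operations from the two instructions: if the lock was broken between the load and the store, the whole sequence is simply repeated.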
<!--

-->
### rv32B
- B : Bit-Manipulation Operations
- The `B` extension is frozen but not officially ratified yet.
### rv32C
- C : Compressed Instructions
- Provides 16-bit encodings of common instructions to reduce code size.
- The `C` extension is available when the `CPU_EXTENSION_RISCV_C` configuration generic is true.
- The extension includes the following instructions:
    - `c.addi4spn` `c.lw` `c.sw` `c.nop` `c.addi` `c.jal` `c.li` `c.addi16sp` `c.lui` `c.srli` `c.srai` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.j` `c.beqz` `c.bnez` `c.slli` `c.lwsp` `c.jr` `c.mv` `c.ebreak` `c.jalr` `c.add` `c.swsp`
<!--

-->
### rv32M
- M : Integer Multiplication and Division
- When the `CPU_EXTENSION_RISCV_M` configuration generic is true, hardware-accelerated integer multiplication and division operations are available.
    - multiplication: `mul` `mulh` `mulhsu` `mulhu`
    - division: `div` `divu` `rem` `remu`
<!--

-->
### rv32U
- U : Less-Privileged User Mode
- The user-mode ISA extension adds a second, less-privileged operation mode besides the basic (and highest-privileged) machine mode.
- It is available when the `CPU_EXTENSION_RISCV_U` configuration generic is true. Code executed in user mode cannot access machine-mode CSRs.
- Furthermore, user-mode access to the address space (like peripheral/IO devices) can be constrained via the physical memory protection (PMP).
### rv32X
- X : NEORV32-Specific (Custom) Extensions
- The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR.
- The CPU provides 16 fast interrupt requests (FIRQ), which are controlled via custom bits in the `mie` and `mip` CSRs.
- All undefined, unimplemented, malformed or illegal instructions raise an illegal instruction exception.
### rv32_Zfinx
- Zfinx : Implements the 32-bit single-precision floating-point extension (using integer registers) when its configuration generic is true.
- The extension includes the following instructions:
    - conversion: `fcvt.s.w` `fcvt.s.wu` `fcvt.w.s` `fcvt.wu.s`
    - comparison: `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s`
    - computational: `fadd.s` `fsub.s` `fmul.s`
    - sign-injection: `fsgnj.s` `fsgnjn.s` `fsgnjx.s`
    - number classification: `fclass.s`
    - additional CSRs: `fcsr` `frm` `fflags`
- Fused multiply-add instructions are not supported!
- Division (`fdiv.s`) and square root (`fsqrt.s`) instructions are not supported yet!
- Subnormal ("de-normalized") numbers are not supported by the NEORV32. Subnormal numbers (exponent = 0) are flushed to zero, i.e. they are set to +/- 0.
- The Zfinx extension is not yet officially ratified!
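
Flushing subnormal numbers to zero can be illustrated on the bit level. The function below is a host-side sketch that assumes `float` is a 32-bit IEEE 754 single-precision value; the name `flush_subnormal` is made up for this example and is not part of the NEORV32 software library.

```c
#include <stdint.h>
#include <string.h>

/* Return x unchanged unless it is subnormal (exponent field = 0 and
 * mantissa != 0). Subnormal inputs become +0 or -0, keeping their sign. */
float flush_subnormal(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret the float as raw bits */

    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0u && mantissa != 0u) {
        bits &= 0x80000000u;                 /* keep only the sign bit */
        memcpy(&x, &bits, sizeof x);
    }
    return x;
}
```

Normal numbers, zeros, infinities and NaNs pass through unchanged; only values whose exponent field is zero with a non-zero mantissa are replaced by a signed zero.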
### rv32_Zicsr
- Zicsr : Implements the control and status register (CSR) access instructions when its configuration generic is true. When this option is disabled, no interrupts, no exceptions and no machine information are available.
- The extension includes the following instructions:
    - CSR access: `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci`
    - environment: `mret` `wfi`
- `wfi` : the "wait for interrupt" instruction acts like a sleep command. When executed, the CPU is halted until a valid interrupt request occurs.
### rv32_Zicntr
- Zicntr : Implements the basic CPU counter CSRs (`time[h]`, `[m]cycle[h]`, `[m]instret[h]`) when its configuration generic is true.
- This extension is stated as mandatory by the RISC-V spec.
## Example code
### Find the Highest Altitude ([1732. Find the Highest Altitude](https://leetcode.com/problems/find-the-highest-altitude/))
There is a biker going on a road trip. The road trip consists of ```n + 1``` points at different altitudes. The biker starts his trip on point ```0``` with altitude equal to ```0```.
You are given an integer array ```gain``` of length ```n``` where ```gain[i]``` is the **net gain in altitude** between points ```i``` and ```i + 1``` for all ```(0 <= i < n)```. Return the **highest altitude** of a point.
```c=
#include <neorv32.h>

int main() {
    // capture all exceptions and give debug info via UART
    // this is not required, but keeps us safe
    neorv32_rte_setup();

    // abort if UART0 is not implemented
    if (neorv32_uart0_available() == 0) {
        return 1;
    }

    // init UART at default baud rate, no parity bits, no hw flow control
    neorv32_uart0_setup(BAUD_RATE, PARITY_NONE, FLOW_CONTROL_NONE);

    // check available hardware extensions and compare with compiler flags
    neorv32_rte_check_isa(0); // silent = 0 -> show message if isa mismatch

    // net altitude gain between consecutive points
    int gain[] = {-5, 1, 5, 0, -7, -5, 10, 9, -4, 8, -4, 2, -4, 4, 3, -2}, len = 16;
    int altitude = 0, highest = 0;

    // running sum of gains; remember the largest value seen so far
    for (int i = 0; i < len; i++) {
        altitude += gain[i];
        if (highest < altitude)
            highest = altitude;
    }

    neorv32_uart0_printf("highest altitude : %i\n", highest);
    return 0;
}
```
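
To check the algorithm on a host machine without the NEORV32 library, the loop can be pulled out into a standalone function. This is a sketch for testing purposes only; the function name `highest_altitude` is our own choice and does not appear in the program above.

```c
#include <stddef.h>

/* Running-sum scan: the altitude at each point is the sum of all gains so far.
 * The start point has altitude 0, so the result is never negative. */
int highest_altitude(const int *gain, size_t len) {
    int altitude = 0;
    int highest = 0;
    for (size_t i = 0; i < len; i++) {
        altitude += gain[i];
        if (altitude > highest) {
            highest = altitude;
        }
    }
    return highest;
}
```

For the 16-element `gain` array used above, the running altitudes are -5, -4, 1, 1, -6, -11, -1, 8, 4, 12, 8, 10, 6, 10, 13, 11, so the function returns 13.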

### CPU feature comparison
| | text | data | bss | dec | hex | executable size |
| -------- | -------- | -------- | -------- | -------- | -------- |-|
| rv32i | 4408 | 0 |116 |4524 | 11ac| 4420 |
| rv32ia | 4408 | 0 |116 |4524 |11ac |4420 |
| rv32ic | 3568 | 0 |116 |3684 |e64 |3580 |
| rv32im | 4096 | 0 | 116|4212 |1074 | 4108 |
### CPU feature analysis
* Comparing `rv32i` with `rv32ia`, the `text section size` and the `executable size` are the same. This is because our test code does not need atomic memory accesses. In each loop iteration, we access data memory only once, to read the array element.
* Comparing `rv32i` with `rv32ic`, the `text section size` and the `executable size` of `rv32ic` are smaller than those of `rv32i`. This is because `rv32ic` includes compressed instructions: many base instructions are replaced by their 16-bit encodings, which reduces the overall code size.
* Comparing `rv32i` with `rv32im`, the `text section size` shrinks by 312 bytes (4408 to 4096) even though our test code does not use `mul` or `div` directly. The difference probably comes from library code, for example the number conversion inside the `printf` function, which can use the hardware instructions instead of software routines when `M` is available.
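
One plausible source of the size difference is software arithmetic: without the `M` extension, every `/` or `%` in library code has to be carried out by a routine built from base instructions. The sketch below shows a generic shift-and-subtract division to give an idea of what such a routine involves. It is a hypothetical illustration (`soft_udiv` is our own name), not the actual code linked in by the toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32-bit division using only shifts, compares and subtractions.
 * Returns the quotient and optionally stores the remainder in *rem.
 * Division by zero mirrors the RISC-V divu/remu results:
 * quotient = all ones, remainder = dividend. */
uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0;
    uint64_t r = 0; /* 64 bits so that the shift below cannot overflow */

    if (d == 0u) {
        if (rem != NULL) {
            *rem = n;
        }
        return UINT32_MAX;
    }

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u); /* bring down the next dividend bit */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1u << i;     /* this quotient bit is 1 */
        }
    }

    if (rem != NULL) {
        *rem = (uint32_t)r;
    }
    return q;
}
```

A loop like this runs 32 iterations per division and occupies space in the text section, whereas with `M` enabled the compiler can emit a single `divu` or `remu` instruction instead.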