Computer Architecture
GitHub: FreeRTOS-on-VexRiscv
Use three terminals to run the demo.
Take VGA project in VexRiscvSocSoftware for example.
Run
and type the following commands in GDB
Then the RGB square appears in the VGA GUI window.
There are other projects for Briey in VexRiscvSocSoftware.
From the description:
- `/source` contains the FreeRTOS source code.
- `/Demo` contains a demo application for every official FreeRTOS port.
- `/Test` contains the tests performed on common code and the portable layer code.

The virt board is a platform which does not correspond to any real hardware; it is designed for use in virtual machines. It is the recommended board type if you simply want to run a guest such as Linux and do not care about reproducing the idiosyncrasies and limitations of a particular bit of real-world hardware.
We can find the memory capacity definitions in `Briey.scala`:
- `0x80000000` and `onChipRamSize` (4 kB)
- `0x40000000` and `sdramLayout.capacity`
That is why the linker script (`<VexRiscvSocSoftware>/projects/briey/libs/linker.ld`) in VexRiscvSocSoftware shows that
- `0x80000000` with `LENGTH = 4K`
- `0x40000000` with `LENGTH = 64M`
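As a quick sanity check of this memory map, the following minimal sketch classifies an address by region and tests whether an image of a given size fits. The origins and lengths are the ones listed above; the names `region_t`, `region_of` and `fits` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Origins and lengths as listed in the linker script above. */
#define ONCHIP_RAM_ORIGIN 0x80000000u
#define ONCHIP_RAM_LENGTH (4u * 1024u)           /* 4K  */
#define SDRAM_ORIGIN      0x40000000u
#define SDRAM_LENGTH      (64u * 1024u * 1024u)  /* 64M */

/* Hypothetical region tags, used only in this sketch. */
typedef enum { REGION_NONE, REGION_ONCHIP_RAM, REGION_SDRAM } region_t;

/* Return the region that contains addr, or REGION_NONE. */
region_t region_of(uint32_t addr)
{
    if (addr >= ONCHIP_RAM_ORIGIN && addr - ONCHIP_RAM_ORIGIN < ONCHIP_RAM_LENGTH)
        return REGION_ONCHIP_RAM;
    if (addr >= SDRAM_ORIGIN && addr - SDRAM_ORIGIN < SDRAM_LENGTH)
        return REGION_SDRAM;
    return REGION_NONE;
}

/* Does an image of `size` bytes fit in a region of `region_length` bytes? */
int fits(uint32_t size, uint32_t region_length)
{
    return size <= region_length;
}
```

For instance, `fits(8192u, ONCHIP_RAM_LENGTH)` is false, which is the kind of overflow that forces larger sections out of the on-chip RAM.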
Modify `fake_rom.lds` and rename it to `linker.ld`.
Copy the header files from `VexRiscvSocSoftware/libs` (`gpio.h`, `interrupt.h`, `prescaler.h`, `timer.h`, `uart.h`, `vga.h`) to `RISC-V_VexRiscv-Briey_GCC/Vex_libs`, and integrate `briey.h` (which contains several base address macros) into `gpio.h`.
Rename `riscv-virt` to `riscv-briey` and modify the UART port.
Also modify the associated parts.
Reference: Use Briey timer instead of machine timer
main.c
- Implement `vPortSetupTimerInterrupt()`, which is marked as weak in `Source/portable/GCC/RISC-V/port.c`, using `configTICK_RATE_HZ` (which is defined in `FreeRTOSConfig.h`).
- Implement `handle_trap()`, which is used to handle timer interrupts. It has to reset the PENDING bit for the timer and increment the system tick count.
`riscv-briey.c`
The `handle_trap()` function is referenced by `-DportasmHANDLE_INTERRUPT=handle_trap` in the Makefile. This is an assembler macro, not a compiler macro.
- `mstatus`: `0x1808` sets `MPP[1:0] = 11`, which stands for machine mode, and also sets `MIE = 1` (enabled).
- `0x880` (the `mie` register fields) sets MEIE and MTIE to 1, which enables machine-level external interrupts and machine timer interrupts.

`onChipRam` (4K) does not have enough capacity to store our code (text section). Moving the `.text` and `.data` sections to sdram (64M) solves this problem.
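The two CSR constants mentioned above (`0x1808` and `0x880`) can be rebuilt from their individual bit fields. The sketch below uses the bit positions from the RISC-V privileged specification; the macro and function names are hypothetical and are not taken from the project sources.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the RISC-V privileged specification. */
#define MSTATUS_MIE       (1u << 3)   /* global machine interrupt enable */
#define MSTATUS_MPP_SHIFT 11          /* MPP occupies bits 12:11 */
#define MSTATUS_MPP_MASK  (3u << MSTATUS_MPP_SHIFT)
#define MIE_MTIE          (1u << 7)   /* machine timer interrupt enable */
#define MIE_MEIE          (1u << 11)  /* machine external interrupt enable */
#define PRIV_MACHINE      3u          /* MPP encoding 0b11 = machine mode */

/* mstatus with MPP = machine mode and MIE = 1. */
uint32_t mstatus_value(void)
{
    return (PRIV_MACHINE << MSTATUS_MPP_SHIFT) | MSTATUS_MIE;
}

/* mie with MEIE and MTIE set. */
uint32_t mie_value(void)
{
    return MIE_MEIE | MIE_MTIE;
}

/* Extract the MPP field from an mstatus value. */
uint32_t mstatus_mpp(uint32_t mstatus)
{
    return (mstatus & MSTATUS_MPP_MASK) >> MSTATUS_MPP_SHIFT;
}
```

Decoding a value the same way is handy when reading registers in gdb: an `mstatus` of `0x1800` still has `MPP = 11` but `MIE = 0`.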
Copy `<FreeRTOS repo>/FreeRTOS/Demo/Common` into `<FreeRTOS-Briey>/Demo`, since it is not provided in the repo.
"Full" vs "Minimal" demo application files
- The files in the `FreeRTOS/Demo/Common/Full` directory assume a hosted environment and are only used by demos that run on top of old DOS systems (which is also why the Partest.c filename is cryptic: it could only use short filenames in 8.3 format).
- The files in the `FreeRTOS/Demo/Common/Minimal` directory do not assume a hosted environment.

Under `/FreeRTOS`
I ran into an unknown problem: gdb gets stuck and cannot enter the `main` function.
Notice that the project is compiled with `-march=rv32imac`. The compressed instructions shrink the `nop` instruction to 2 bytes (the previous work used 7 `nop` instructions to push the entry to `0x80000020`). I think there is another problem that makes gdb get stuck (not figured out yet, but I suspect that single-stepping (`s`) in gdb advances `pc` by 4 at each step).
Also, the `li` instruction is compressed to 2 bytes. But in `riscv64-unknown-elf-gdb`, `a1` is not loaded with `0` properly.
with gdb
This causes the next instruction, `bne`, to jump to the label `secondary`, which handles the multicore case and should not be taken in my case.
So I decided to build the project without compressed instructions.
Check multi-lib
But this method simply gives up on compressed instructions.
There are some discussions about errors and incompatible features, and some projects with RVC-compatible implementations.
Some similar projects
Following the configuration above, modify `IBusCachedPlugin`
Makefile
Furthermore, remember that executing `ecall` will trigger a SWI and jump to the address in `mtvec`, which is `0x80000020`. That is why Oscar added several `nop` instructions to push `main_entry` to `0x80000020`.
But with RVC, `nop` is compressed to 2 bytes, which means I need twice as many `nop` instructions to do the same thing.
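The padding arithmetic can be sketched as follows. The helper `nops_needed` is hypothetical, and the start address `0x80000004` used in the usage note is an assumption (one 4-byte instruction at the reset address), chosen because it is consistent with the 7 uncompressed `nop` instructions mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Number of nops needed to pad from `here` up to `target`.
   nop_bytes is 4 for an uncompressed nop and 2 for a compressed (RVC) nop.
   Returns -1 if target is below here or the gap is not a multiple of nop_bytes. */
int nops_needed(uint32_t here, uint32_t target, uint32_t nop_bytes)
{
    assert(nop_bytes == 2u || nop_bytes == 4u);
    if (target < here)
        return -1;
    uint32_t gap = target - here;
    if (gap % nop_bytes != 0u)
        return -1;
    return (int)(gap / nop_bytes);
}
```

Under that assumption, `nops_needed(0x80000004u, 0x80000020u, 4u)` gives 7 and `nops_needed(0x80000004u, 0x80000020u, 2u)` gives 14, which matches the "2 times" observation.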
`lma` problem
There is a function in `portASM.S` that traps exceptions; see its description.
Run `(gdb) info register` to check the `mcause` value:
- `mcause` = 0x5, which stands for load access fault
- `mepc` = 0x80000074, which points at `80000074: 00052283 lw t0,0(a0)`
- `mstatus` = 0x1800

Check the disassembly
In the assembly code the instruction is `la a0, _data_lma`, but in the disassembly it is linked to `_bss_lma`.
Place `_data_lma` so that the LMA is the same as the VMA.
Now it can successfully enter the `main` function, but it traps into `freertos_risc_v_application_exception_handler` again.
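When the exception handler is hit, the `mcause` value read in gdb can be decoded with a small lookup. The sketch below lists the standard exception codes from the RISC-V privileged specification; the function names are hypothetical and this is not code from `portASM.S`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>  /* strcmp, used when comparing the returned names */

/* The top bit of mcause distinguishes interrupts (1) from exceptions (0). */
int mcause_is_interrupt(uint32_t mcause)
{
    return (int)((mcause >> 31) & 1u);
}

/* Name of a synchronous exception code (check mcause_is_interrupt first). */
const char *mcause_exception_name(uint32_t mcause)
{
    switch (mcause & 0x7FFFFFFFu) {
    case 0:  return "instruction address misaligned";
    case 1:  return "instruction access fault";
    case 2:  return "illegal instruction";
    case 3:  return "breakpoint";
    case 4:  return "load address misaligned";
    case 5:  return "load access fault";
    case 6:  return "store/AMO address misaligned";
    case 7:  return "store/AMO access fault";
    case 8:  return "environment call from U-mode";
    case 9:  return "environment call from S-mode";
    case 11: return "environment call from M-mode";
    default: return "unknown or reserved";
    }
}
```

With this table, `mcause_exception_name(0x5)` returns "load access fault", the case seen at `mepc = 0x80000074`.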
The bug is caused by setting `CFLAGS` with `-march=rv32ima` and `LDFLAGS` with `-march=rv32imac`, both with `-mabi=ilp32`.
- `CFLAGS` is used to compile the code
- `LDFLAGS` is used to link with the linker script

It was fixed when I used the modified Briey SoC (with compressed instructions).
`t0` = 2 stands for illegal instruction (see the previous work).
Change the `Briey.scala` setting
And it can successfully enter the task.
But it still traps into an exception.
However, if I use `GCC/RISC-V/portASM.S` from FreeRTOS Kernel V10.4.6, it can successfully run the context switch.
In the previous version, `portASM.S` uses `portasmHANDLE_INTERRUPT` for handling external interrupts. But in the latest version (FreeRTOS Kernel V10.5.1) the code was refactored, and `portasmHANDLE_INTERRUPT` disappeared.
"Using FreeRTOS on RISC-V Microcontrollers" mentions that building FreeRTOS for a RISC-V core requires:
So I changed the file to use external interrupts.
portASM.S
For some unknown reason, the result is not what I expected. The result should look like the one below,
which is shown in the previous work.
Notice that in the previous work `configUSE_PREEMPTION` is set to 0, which means cooperative scheduling. So I just added `portYIELD();` at the end of the `sender` and `receiver` tasks.
With that, the context switch runs as I expected.
In `VexRiscv.scala`
a 5-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) is defined,
and the Fetch stage has an option for whether RVC is supported,
which is used in `Services.scala`.
There are also several regression tests that report clock cycle counts.
In `src/test/cpp/regression/main.cpp`, we can see how the regression tests work.
A ready-valid handshake protocol is implemented.
Notice that `instanceCycles` is enclosed by two for loops that call `preCycle()` and `postCycle()`.
The definitions of these two functions depend on the IBUS configuration. Take `IBUS_SIMPLE` as an example.
It calls every `simElement->preCycle()` and checks whether `iBus_cmd_valid` and `iBus_cmd_ready` are both high, to make sure data is transferred correctly.
For example
Slow writer(sender) and fast reader(receiver)
Fast writer(sender) and slow reader(receiver)
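The two cases above follow the same rule: a transfer happens only in a cycle where `valid` and `ready` are both high. The following minimal sketch models that rule on per-cycle sample arrays; `count_transfers` is a hypothetical helper and is not part of the regression test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the cycles in which a transfer happens, i.e. valid and ready
   are both high in the same cycle. Each array holds one sample per cycle. */
size_t count_transfers(const uint8_t *valid, const uint8_t *ready, size_t cycles)
{
    size_t transfers = 0;
    for (size_t i = 0; i < cycles; i++) {
        if (valid[i] && ready[i])
            transfers++;
    }
    return transfers;
}
```

With a slow writer (`valid` = 1,0,1,0) and a reader that is always ready, two transfers occur in four cycles; swapping the two signal patterns gives the same count for the fast writer and slow reader case.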
Under `/regression`, use the command
Given `TRACE=yes`, it generates an `.fst` file that we can inspect with GTKWave.
Take `rv32ui-p-xor.fst`, which comes from one of the regression tests, as an example.
When `valid` and `ready` are both 1, the `pc` increases by 4 at the next rising clock edge. This ensures that the PC increments in step with the instruction transfers.
The `check()` function uses `assertEq`, which throws an exception if its two inputs are not equal, to compare:
- `payload_address` and `regFileWriteRefIndex`
- `payload_data` and `WriteRef`

If both checks pass, `instanceCycles += 1;`
As for the `postCycle()` function, I think it is related to GDB (not figured out yet).
`mtime` increments, and if `mTime` is greater than `mTimeCmp`, it generates `timerInterrupt`.
Based on the paper and the reference link, construct a test method.
Symbol (paper) | Meaning | Briey SoC |
---|---|---|
C | Cache size | 4096 bytes |
N | Array size | (process size) x (process number) |
s | Stride | (process size) |
b | Line size | 32 bytes |
a | Associativity | 1 way |
The associated definitions can be found in `Briey.scala`.
The paper also describes several regimes, with different size settings, to measure the time per iteration with or without cache misses.
Regime | Size of Array | Stride | Frequency of Misses | Time per Iteration |
---|---|---|---|---|
1 | \(1 \leq N \leq C\) | \(1 \leq s \leq N/2\) | no misses | \(T_{no-miss}\) |
2.a | \(C \lt N\) | \(1 \leq s \lt b\) | one miss every \(b/s\) elements | \(T_{no-miss} + Ds/b\) |
2.b | \(C \lt N\) | \(b \leq s \lt N/a\) | one miss every element | \(T_{no-miss} + D\) |
2.c | \(C \lt N\) | \(N/a \leq s \leq N/2\) | no misses | \(T_{no-miss}\) |
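The regime table can be turned into a small classifier, which helps when choosing array sizes and strides for a test. This is a sketch under stated assumptions: regime 2.a is taken as \(1 \leq s \lt b\) so that the stride ranges do not overlap, \(D\) is taken to be the miss penalty, and the names `regime_t`, `classify_regime` and `time_per_iteration` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGIME_1, REGIME_2A, REGIME_2B, REGIME_2C, REGIME_INVALID } regime_t;

/* Classify (N, s) for a cache of size C, line size b, associativity a.
   All sizes use the same unit (for example bytes). */
regime_t classify_regime(uint32_t N, uint32_t s, uint32_t C, uint32_t b, uint32_t a)
{
    assert(a > 0u && b > 0u);
    if (N == 0u || s == 0u || s > N / 2u)
        return REGIME_INVALID;   /* outside 1 <= s <= N/2 */
    if (N <= C)
        return REGIME_1;         /* array fits in the cache: no misses */
    if (s < b)
        return REGIME_2A;        /* one miss every b/s elements */
    if (s < N / a)
        return REGIME_2B;        /* one miss every element */
    return REGIME_2C;            /* no misses */
}

/* Time per iteration from the table; D is assumed to be the miss penalty. */
double time_per_iteration(regime_t r, double t_no_miss, double D, uint32_t s, uint32_t b)
{
    switch (r) {
    case REGIME_2A: return t_no_miss + D * (double)s / (double)b;
    case REGIME_2B: return t_no_miss + D;
    default:        return t_no_miss;
    }
}
```

With the Briey values (C = 4096, b = 32, a = 1), an array of 8192 bytes with stride 64 falls in regime 2.b. Note that with a = 1 the range of regime 2.c is empty, because \(N/a = N\) is larger than the maximum stride \(N/2\).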
In the reference, a simple array is first used to test the influence of cache misses.
In Briey, we have a 4096-byte cache. Set `configMINIMAL_STACK_SIZE = 128` and use a function to create a number of tasks.
In the task handler, use `xTaskGetTickCount` to get the current tick count.
I tried to use `etime - stime` to measure the context switch latency, but it does not seem to work (`stime` and `etime` have the same value most of the time).
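One likely reason is resolution: the tick count only changes once per tick, so any interval shorter than one tick usually reads as zero. The sketch below illustrates this with hypothetical numbers (a 50 MHz clock and a 1000 Hz tick rate are assumptions, not values taken from the project); the function names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse during one RTOS tick. */
uint32_t cycles_per_tick(uint32_t cpu_hz, uint32_t tick_hz)
{
    assert(tick_hz > 0u);
    return cpu_hz / tick_hz;
}

/* Tick-count difference observed for an interval of `cycles` CPU cycles
   that starts `offset` cycles after a tick boundary. */
uint32_t ticks_observed(uint32_t offset, uint32_t cycles, uint32_t per_tick)
{
    assert(per_tick > 0u);
    return (offset + cycles) / per_tick - offset / per_tick;
}
```

With these assumed numbers one tick spans 50000 cycles, so a 500-cycle interval is seen as 0 ticks unless it happens to straddle a tick boundary, in which case it is seen as 1. A cycle counter would give a finer measurement than the tick count.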
The implementation is still in progress…
Context switching involves storing the context (state) of the currently executing task onto the stack and restoring the context of the task to be executed from the stack. The context includes
The overhead can be reduced by migrating kernel services such as scheduling, time tick processing, and interrupt handling to hardware.
This paper aims to reduce the effect of context switch overhead. One method is to dedicate certain registers to a thread, which eliminates the need to save and restore the context, but in return reduces the number of registers available to other threads.
This paper provides a method that reduces the overhead by restricting the use of memory during context switching, which it achieves by adding register files to the processor. This lets the processor compute at a much faster rate, thereby reducing the overhead.
The implementation involves additional register files and two context-switching instructions (`scxt`, `rcxt`).
First, modify the original Plasma MIPS design by implementing all the "reg_bank" registers in the FPGA's logic blocks, so that the context registers can be saved to a register file in one CPU clock cycle.
Furthermore, 4 additional register files are implemented, each holding 12 context registers.
Modified MIPS Architecture (the red part is modified part)
Next, implement two context-switching instructions (`scxt`, `rcxt`) to access these register files for storing and restoring the context during a context-switching operation.
First, develop a cooperative operating system involving basic functions for
Next, modify the GCC compiler to make it aware of the newly added instructions; it is in turn used to compile the MIPS C files. The instructions are specified in GNU binutils, where the file `mips-opc.c` contains all the instructions supported by the MIPS processor, like this.
This makes the GCC compiler compatible with the modified architecture.
The first application contains four tasks created using `createTask()`. The first task increments variables and then adds them, stores the result in a sum variable, and finally displays the number of clock cycles.
The second application comprises four tasks: two of them are structured to undergo fast context switching using the internal register files, and the other two are structured to undergo context switching using external RAM.
Restricting context switching to the processor itself by modifying the CPU architecture, without accessing external memory for the task contexts, eliminates the extra time consumption.