# N64 emulator extensions for homebrew developers
This document describes a set of extensions to the N64 hardware that emulators
can optionally support to help the development of homebrew software.
THIS DOCUMENT IS A DRAFT: It is not currently frozen. Feel free to propose comments and improvements before we move to the implementation phase.
## Extension entrypoint
The entrypoint for all extensions is the `TNE` instruction (trap if not equal). Normally, this opcode takes two registers and raises an exception if their current values differ. When the same register is passed twice, the opcode is effectively a nop. Compilers never generate this form (as it would be pointless), and it has zero side effects when run on real hardware. It is thus a good candidate for an extension point caught only by emulators.
The `TNE` opcode also contains a 10-bit code field that the exception handler can normally use to distinguish between traps. In our case, we use the code field to select the extension that we want to run. The actual extension number is specified in bits `4..9`, while bits `0..3` are reserved as additional input to "tweak" the way the extension operates.
The register referenced by `TNE` can be used either to provide additional input or to receive the extension's output. If no register is needed as input, `$0` should conventionally be used for forward compatibility.
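The encoding described above can be sketched as a small helper. This is only an illustration: the function name `encode_emux` and its parameters are hypothetical, and the field positions (registers in bits 25..21 and 20..16, code in bits 15..6, function field 0x36) are those of the standard MIPS `TNE` instruction.

```python
TNE_FUNCT = 0x36  # function field of the MIPS TNE instruction (SPECIAL opcode 0)


def encode_emux(reg: int, ext: int, sub: int = 0) -> int:
    """Build the 32-bit TNE word for extension `ext`, tweak bits `sub`, register `reg`.

    Hypothetical helper, not part of the specification.
    """
    if not (0 <= reg < 32 and 0 <= ext < 64 and 0 <= sub < 16):
        raise ValueError("field out of range")
    # 10-bit code field: extension number in bits 9..4, tweak bits in 3..0.
    code = (ext << 4) | sub
    # The same register is placed in both operand slots, which makes the
    # instruction a nop on real hardware.
    return (reg << 21) | (reg << 16) | (code << 6) | TNE_FUNCT


# log(byte) on register t1 ($9), i.e. code 0x30
print(f"{encode_emux(9, 0x3, 0x0):#010x}")  # 0x01290c36
```

Passing `reg=0` gives the conventional "no input" form that uses `$0`.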
## Disassembly
To make explicit that `TNE` is the extension entrypoint, we suggest that emulators showing a disassembly/trace format the opcode as `EMUX`. This should be done for all `TNE` opcodes that use the same register for both operands. For instance, the opcode

    0x00e70436  tne $7, $7, 0x10

should be disassembled as:

    0x00e70436  emux $7, breakpoint

as `breakpoint` is the name of the extension selected by code 0x10.
Notice that a `TNE` opcode with different registers should be left as-is. For instance:

    0x00c70436  tne $6, $7, 0x10

must not be rewritten: the two registers are different, so this is not an emulator extension entrypoint.
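A disassembler following this rule could look roughly like the sketch below. The function name `disassemble_tne` and the name table are assumptions made for illustration. For brevity only the extension family name is printed, not the command selected by the low four bits.

```python
# Family names taken from the extension list in this document.
EXTENSION_NAMES = {
    0x0: "detect", 0x1: "breakpoint", 0x2: "trace", 0x3: "log",
    0x4: "dump_regs", 0x5: "profile", 0x1F: "control",
}


def disassemble_tne(word: int) -> str:
    """Format a TNE word, using `emux` only when both registers match.

    Hypothetical helper, not part of the specification.
    """
    if (word >> 26) != 0 or (word & 0x3F) != 0x36:
        raise ValueError("not a TNE instruction")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    code = (word >> 6) & 0x3FF
    if rs != rt:
        # Different registers: a genuine trap instruction, left untouched.
        return f"tne ${rs}, ${rt}, {code:#x}"
    ext = code >> 4
    name = EXTENSION_NAMES.get(ext, f"ext_{ext:#x}")
    return f"emux ${rs}, {name}"


print(disassemble_tne(0x01290C36))  # emux $9, log
```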
## Extension detection
The guest application might want to detect whether the emulator supports extensions (and which ones are supported, as an emulator might not implement the full specification). To do so, the application can try to run extension 0x0.
Extension 0x0 is called `detect`, and is used specifically as a way to detect extensions. The specified register is used as output of the extension: it will be filled with a 64-bit mask, where each bit reports whether the corresponding extension is supported. Thus, a way to detect emulator extensions is the following:

    li    at, 0x0               # clear the at register
    emux  at, detect            # run extension detection
    bnez  at, has_extensions    # if not zero, extensions are supported

Notice how the above sequence is completely harmless when run on real hardware.
## RSP support
The extensions can be invoked on both the VR4300 and the RSP. Technically, `TNE` is an invalid opcode on the RSP, but the RSP has no exceptions or side effects for running invalid opcodes, so the opcode can be used freely even there.
## Emulator-level extension toggle
We suggest that emulators add a user-level option to enable or disable emulator homebrew extensions. While extensions have been designed to be completely harmless when run on hardware and on emulators of any accuracy level, we advise against giving ROMs an overly easy way to perform a more general "emulator detection" that cannot be overridden by the user.
We value a homebrew ecosystem that targets real hardware first and foremost, and we want to avoid as much as possible homebrew software that explicitly changes its behavior when an emulator is detected (at least, not without user consent).
By providing a way to disable extension support at the emulator level, we give users the final choice on whether they want to enable "emulator enhancements" or not, and we avoid a fragmented ecosystem where ROMs behave differently on different emulators and on hardware.
## Extension list
This is the list of all defined extensions. The extension number must be encoded in the highest 6 bits of the `code` field of the `TNE` opcode (`code[9..4]`). Emulators are expected to only check those bits to decide which extension to run, as the lowest 4 bits are reserved to each extension (see below).
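On the emulator side, the dispatch rule above (together with the user-level toggle) might be sketched as follows. The class `EmuxDispatcher` and its handler signature are hypothetical. The sketch assumes that an unsupported or disabled extension behaves exactly like the nop seen on real hardware.

```python
class EmuxDispatcher:
    """Hypothetical dispatcher; not part of the specification."""

    def __init__(self, enabled=True):
        self.enabled = enabled   # user-level toggle for all extensions
        self.handlers = {}       # extension number -> callable(sub, reg_value)

    def register(self, ext, handler):
        self.handlers[ext] = handler

    def execute(self, code, reg_value):
        """Run the extension selected by `code`; return the new register value."""
        if not self.enabled:
            return reg_value                 # behave like real hardware
        ext, sub = code >> 4, code & 0xF     # only code[9..4] picks the extension
        if ext == 0x0:                       # detect: report a support bitmask
            mask = 1                         # bit 0: detect itself
            for number in self.handlers:
                mask |= 1 << number
            return mask
        handler = self.handlers.get(ext)
        if handler is None:
            return reg_value                 # unknown extension: plain nop
        result = handler(sub, reg_value)
        return reg_value if result is None else result
```

A handler that returns `None` leaves the register untouched, which matches the "not modified" output of most extensions.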
### 0x0: `detect`
Detect extensions supported by the emulator.
Input:
* `code[9..4]`: 0x0
* `code[3..0]`: ignored
* `register`: ignored
Output:
* `register`: 64-bit bitmask where ones mark supported extensions. NOTE: on RSP, only the lowest 32 bits of the bitmask are reported.
The specified register is filled with a bitmask, where each bit corresponds to an extension; the bit should be set to 1 if the extension is supported by the emulator, and to 0 otherwise. For instance, if an emulator writes 0x3 into the register, it means that only extensions 0x0 (`detect`) and 0x1 (`breakpoint`) are supported.
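As an illustration of how the bitmask reads, the following sketch turns a `detect` result into a list of extension numbers. The function name `supported_extensions` and the `on_rsp` flag are hypothetical.

```python
def supported_extensions(mask: int, on_rsp: bool = False) -> list[int]:
    """List the extension numbers whose bit is set in a `detect` result.

    Hypothetical helper, not part of the specification.
    """
    if on_rsp:
        mask &= 0xFFFFFFFF  # the RSP only reports the lowest 32 bits
    return [number for number in range(64) if (mask >> number) & 1]


print(supported_extensions(0x3))  # [0, 1]
```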
### 0x1: `breakpoint`
Trigger/configure a breakpoint in the built-in/attached debugger of the emulator.
This is an extension "family" covering several different commands.
#### 0x10: `breakpoint(now)`
Immediate break in the debugger
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x0
* `register`: ignored
Output:
* `register`: not modified
#### 0x11: `breakpoint(set)`
Set a breakpoint in the debugger
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x1
* `register`: virtual address of the memory location where the breakpoint should be configured
Output:
* `register`: not modified
#### 0x12: `breakpoint(unset)`
Remove a breakpoint from the debugger
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x2
* `register`: virtual address of the memory location where the breakpoint should be removed
Output:
* `register`: not modified
#### 0x13: `breakpoint(watch)`
Add a watchpoint: break when a certain location is written to.
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x3
* `register`: virtual address of the memory location where the watchpoint should be added
Output:
* `register`: not modified
#### 0x14: `breakpoint(watch_any)`
Add a watchpoint: break when a certain location is read or written to.
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x4
* `register`: virtual address of the memory location where the watchpoint should be added
Output:
* `register`: not modified
#### 0x15: `breakpoint(unwatch)`
Remove a watchpoint
Input:
* `code[9..4]`: 0x1
* `code[3..0]`: 0x5
* `register`: virtual address of the memory location where the watchpoint should be removed
Output:
* `register`: not modified
### 0x2: `trace`
This extension asks the emulator to start tracing opcodes run after it on the current CPU.
Traces are emulator dependent, so this specification does not provide any guidance on the actual format of traces, or where they should be written or displayed.
Tracing should be activated only on the CPU that has run this opcode (so either the VR4300 or the RSP).
This is an extension "family" covering several different commands.
#### 0x20: `trace(start)`
Start tracing from the current PC until manually stopped.
Input:
* `code[9..4]`: 0x2
* `code[3..0]`: 0x0
* `register`: ignored
Output:
* `register`: not modified
Running `trace(start)` while another trace is in progress will reset the tracing engine to the new specified configuration. For instance, if a limited trace was in progress, when `emux $0, trace(start)` is run, the trace will switch to limitless mode and keep tracing until manually stopped.
#### 0x21: `trace(count)`
Input:
* `code[9..4]`: 0x2
* `code[3..0]`: 0x1
* `register`: number of instructions to trace
Output:
* `register`: not modified
The trace engine is started (or its configuration modified, if it was already running) and it will continue tracing for the number of opcodes specified in the register. After that many opcodes, tracing will automatically stop.
Example:

    li    t0, 100
    emux  t0, trace(count)

This extension will cause the emulator to emit a trace of the next 100 opcodes run by this CPU.
#### 0x22: `trace(stop)`
Stop tracing.
Input:
* `code[9..4]`: 0x2
* `code[3..0]`: 0x2
* `register`: ignored
Output:
* `register`: not modified
Notice that it is also valid to run this extension while a limited trace is running. For instance, if the ROM has previously run `li at, 100; emux at, trace(count)` and is currently tracing, and the 60th opcode being run is an `emux $0, trace(stop)`, tracing should stop there, without tracing the next 40 opcodes.
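The interaction between the three trace commands can be summarized by a small state machine. The sketch below is one possible reading, with hypothetical names (`TraceEngine`, `on_instruction`). It assumes that a count of zero traces nothing and that the emulator calls `on_instruction` only for opcodes executed after the `emux` itself.

```python
class TraceEngine:
    """Hypothetical per-CPU trace state; not part of the specification."""

    def __init__(self):
        self.active = False
        self.remaining = None  # None means "no limit"

    def command(self, sub, reg_value=0):
        if sub == 0x0:    # trace(start): switch to limitless tracing
            self.active, self.remaining = True, None
        elif sub == 0x1:  # trace(count): trace the next reg_value opcodes
            self.active, self.remaining = reg_value > 0, reg_value
        elif sub == 0x2:  # trace(stop): stop even if a count is pending
            self.active, self.remaining = False, None

    def on_instruction(self, text, sink):
        """Append `text` to the list `sink` if tracing is currently active."""
        if not self.active:
            return
        sink.append(text)
        if self.remaining is not None:
            self.remaining -= 1
            if self.remaining == 0:
                self.active = False
```

Each command fully replaces the previous configuration, which is how a `trace(start)` issued during a limited trace switches it to limitless mode.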
### 0x3: `log`
This extension asks the emulator to display a log message, with a specified content. Each call to a `log` extension will provide one or more characters to display, which might or might not form a single complete message.
The emulator should make no assumptions on the formatting of the string, and specifically it must not assume that each extension call is a "complete line of text". The best way to display the log is akin to piping the bytes directly to the console (or writing the bytes into a log file). The emulator should avoid adding newlines of its own, such as by displaying the string provided by a single `log` extension call on a standalone line. On the other hand, the emulator is allowed to buffer the received data internally until a newline is received.
The text to display is always treated as a UTF-8 encoded string.
#### 0x30: `log(byte)`
Log a single character (byte) of text.
Input:
* `code[9..4]`: 0x3
* `code[3..0]`: 0x0
* `register`: Byte of text to log (in bits 0..7).
Output:
* `register`: not modified
Example:
```
.balign 8
MSG: .ascii "hello\n\0\0"
ld t0, MSG
dsrl t1, t0, 56
0x01290c36 emux t1, log(byte)
dsrl t1, t0, 48
0x01290c36 emux t1, log(byte)
dsrl t1, t0, 40
0x01290c36 emux t1, log(byte)
dsrl t1, t0, 32
0x01290c36 emux t1, log(byte)
dsrl t1, t0, 24
0x01290c36 emux t1, log(byte)
dsrl t1, t0, 16
0x01290c36 emux t1, log(byte)
```
This should make the emulator emit a log such as:
```
[DEBUG] hello
```
#### 0x31: `log(string)`
Log a zero-terminated string.
Input:
* `code[9..4]`: 0x3
* `code[3..0]`: 0x1
* `register`: virtual address (VR4300) or DMEM address (RSP) of the zero-terminated string to log.
Output:
* `register`: not modified
A zero-terminated string is a sequence of bytes whose length is not explicitly declared, but is terminated by the special byte `0x0`. As explained above, this string does not necessarily constitute a full message, and should simply be appended to the previously logged contents.
Example:
```
MSG: .ascii "hello\n\0"
la t0, MSG
0x01080c76 emux t0, log(string)
```
This should make the emulator emit a log such as:
```
[DEBUG] hello
```
#### 0x32: `log(buflen)`
Set the length of a memory buffer that will be logged. This extension is used to provide the length for a buffer logged via the extension `0x33` (`log(buf)`).
Input:
* `code[9..4]`: 0x3
* `code[3..0]`: 0x2
* `register`: length of the buffer in bytes.
Output:
* `register`: not modified
The length configured via this extension remains valid for multiple calls to extension `log(buf)`.
#### 0x33: `log(buf)`
Log a memory buffer. This allows logging an arbitrary number of bytes from a memory buffer. The length must have been configured via extension `0x32` (`log(buflen)`) before calling this extension.
Input:
* `code[9..4]`: 0x3
* `code[3..0]`: 0x3
* `register`: virtual address (VR4300) or DMEM address (RSP) of a buffer to log.
Output:
* `register`: not modified
Example:
```
MSG: .ascii "hello\n"
li t1, 3
0x01290cb6 emux t1, log(buflen) # Configure buffer length
la t0, MSG
0x01080cf6 emux t0, log(buf) # Output 3 bytes
la t0, MSG+3
0x01080cf6 emux t0, log(buf) # Output 3 bytes
```
This should make the emulator emit a log such as:
```
[DEBUG] hello
```
Notice that the emulator MUST NOT display this output:
```
[DEBUG] hel
[DEBUG] lo
```
because, just like in the `log(byte)` case, a single extension call is not guaranteed to represent a complete message.
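One way an emulator could satisfy these rules is to buffer incoming bytes and only emit text when a newline arrives. The sketch below assumes a hypothetical `LogSink` class and models guest memory as a flat `bytes` object indexed directly by address, with no address translation. The `[DEBUG] ` prefix simply mirrors the sample output in this document.

```python
class LogSink:
    """Hypothetical line-buffering log sink; not part of the specification."""

    def __init__(self, prefix="[DEBUG] "):
        self.prefix = prefix
        self.pending = bytearray()  # bytes received since the last newline
        self.lines = []             # complete lines, ready to be displayed
        self.buflen = 0             # length configured via log(buflen)

    def write(self, data: bytes):
        self.pending += data
        while (newline := self.pending.find(b"\n")) >= 0:
            line = bytes(self.pending[:newline])
            del self.pending[:newline + 1]
            self.lines.append(self.prefix + line.decode("utf-8", errors="replace"))

    def log_byte(self, reg_value: int):
        self.write(bytes([reg_value & 0xFF]))        # only bits 0..7 are used

    def log_string(self, memory: bytes, addr: int):
        end = memory.index(0, addr)                  # stop at the 0x0 terminator
        self.write(memory[addr:end])

    def log_buflen(self, reg_value: int):
        self.buflen = reg_value                      # stays valid across calls

    def log_buf(self, memory: bytes, addr: int):
        self.write(memory[addr:addr + self.buflen])
```

With this design, the two 3-byte `log(buf)` calls from the example produce a single `[DEBUG] hello` line, because nothing is emitted until the newline is seen.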
### 0x4: `dump_regs`
This extension asks the emulator to display a register dump.
In all the extensions of this family, the input register is treated specially: when its value is not 0, it is used as a bitmask that specifies which registers should be dumped. When its value is zero, it means that all registers should be dumped.
#### 0x40: `dump_regs(gpr)`
Dump general purpose registers of the current CPU.
Input:
* `code[9..4]`: 0x4
* `code[3]`: if 1, prefer decimal representation
* `code[2]`: if 1, include also lo/hi in the dump
* `code[1..0]`: 0x0
* `register`: bitmask of which registers must be dumped (if 0, then all registers must be dumped).
Output:
* `register`: not modified
Example:
```
0x00001036 emux $0, dump_regs(gpr)
```
This should provide an output such as:
```
GPR:
zr: ---- ---- 0000 0000 at: ---- ---- 8080 0000 v0: ---- ---- 800c 0000 v1: ---- ---- 800c 0000
a0: ---- ---- 8006 dd88 a1: ff01 0401 0000 0000 a2: ff01 0401 0000 0000 a3: ff01 0401 0000 0000
t0: ---- ---- 8004 72b0 t1: ---- ---- 0000 0004 t2: ff01 0401 0000 0000 t3: ff01 0401 0000 0000
t4: ff01 0401 0000 0000 t5: ---- ---- 0000 0000 t6: ---- ---- 0000 0000 t7: ---- ---- 807f fdc0
s0: ---- ---- 8006 dd88 s1: ---- ---- 8004 65b0 s2: ---- ---- 8004 6610 s3: ffff ffff 0000 0000
s4: ---- ---- 8004 65b0 s5: ---- ---- 0050 4040 s6: ---- ---- 0000 003f s7: ---- ---- 0000 0001
t8: ---- ---- 0000 0000 t9: ---- ---- 8002 42c0 k0: 8002 56c0 0000 000f k1: ---- ---- 807f fef8
gp: ---- ---- 8004 df60 sp: ---- ---- 807f fe78 s8: ---- ---- 0000 0000 ra: ---- ---- 8001 597c
lo: ---- ---- 0000 000f hi: ---- ---- 0000 0080
```
Example:
```
li t0, (1<<8)|(1<<9)|(1<<10) # Dump registers t0,t1,t2
0x01081236 emux t0, dump_regs(gpr,decimal)
```
This should provide an output such as:
```
t0: 1458 t1: -123 t2: 256
```
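The register bitmask and the low code bits of this family can be decoded as in the sketch below. The helper names are hypothetical, and the sketch assumes that bit *i* of the mask selects register number *i*, as in the `t0,t1,t2` example above. Register names follow the sample dump in this document.

```python
GPR_NAMES = [
    "zr", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
]


def selected_registers(mask: int) -> list[str]:
    """Names of the GPRs to dump; a zero mask selects all of them."""
    if mask == 0:
        return list(GPR_NAMES)
    return [name for i, name in enumerate(GPR_NAMES) if (mask >> i) & 1]


def decode_dump_flags(code: int) -> dict:
    """Split the low four code bits of a dump_regs call (hypothetical helper)."""
    return {
        "decimal": bool(code & 0x8),  # code[3]: prefer decimal representation
        "option": bool(code & 0x4),   # code[2]: meaning depends on the register file
        "regfile": code & 0x3,        # code[1..0]: 0 = gpr, 1 = cop0, 2 = cop1/cop2
    }


print(selected_registers((1 << 8) | (1 << 9) | (1 << 10)))  # ['t0', 't1', 't2']
```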
#### 0x41: `dump_regs(cop0)`
Dump COP0 registers of the current CPU.
Input:
* `code[9..4]`: 0x4
* `code[3]`: if 1, prefer decimal representation
* `code[2]`: ignored
* `code[1..0]`: 0x1
* `register`: bitmask of which registers must be dumped (if 0, then all registers must be dumped).
Output:
* `register`: not modified
#### 0x42: `dump_regs(cop1)` (VR4300)
Dump COP1 registers. This extension can only be run on VR4300. Notice that this same opcode means `dump_regs(cop2)` when run on RSP.
Input:
* `code[9..4]`: 0x4
* `code[3]`: if 1, prefer decimal representation
* `code[2]`: if 1, interpret FPU registers as double precision
* `code[1..0]`: 0x2
* `register`: bitmask of which registers must be dumped (if 0, then all registers must be dumped).
Output:
* `register`: not modified
Example:
```
0x000010b6 emux $0, dump_regs(cop1)
```
should produce an output such as:
```
FPR:
$f0: 40f000007f0834ca $f1: 40f000004cb2d05e $f2: 3faeccfe43044fe1 $f3: 3fc66666434a5eca
$f4: 000000004324ee6a $f5: 00000000433a9a62 $f6: 000000004260a23a $f7: 000000004311207f
$f8: 0000000043152a03 $f9: 00000000423d34d3 $f10: 0000000042c00000 $f11: 0000000000000000
$f12: 40f00000ff0834ca $f13: 3f84282242800000 $f14: 0000000042d76040 $f15: 0000000042eb426b
$f16: 000000003f800000 $f17: 0000000000000000 $f18: 0000000000000000 $f19: 0000000000000000
$f20: 0000000043a00000 $f21: 0000000043700000 $f22: 000000003dcccccd $f23: 000000003f4ccccd
$f24: 0000000030000000 $f25: 0000000000000000 $f26: 0000000000000000 $f27: 0000000000000000
$f28: 0000000000000000 $f29: 0000000000000000 $f30: 0000000000000000 $f31: 0000000000000000
```
Example:
```
li t0, (1<<10) | (1<<20) | (1<<21)
0x010813b6 emux t0, dump_regs(cop1,double,decimal)
```
should produce an output such as:
```
$f10: 0.06015772407604892
$f20: 0.17499998365086916
$f21: <Denormal>
```
#### 0x42: `dump_regs(cop2)` (RSP)
Dump COP2 registers. This extension can only be run on RSP. Notice that this same opcode means `dump_regs(cop1)` when run on VR4300.
Input:
* `code[9..4]`: 0x4
* `code[3]`: if 1, prefer decimal representation
* `code[2]`: if 1, include accumulator and flag registers in the dump
* `code[1..0]`: 0x2
* `register`: bitmask of which registers must be dumped (if 0, then all registers must be dumped).
Output:
* `register`: not modified
### 0x5: `profile`
This extension family asks the emulator to profile a section of code. A profile is initiated with a "start" command, and terminated with a "stop" command.
Profiling collects several metrics, whose list is emulator dependent. The bare minimum is CPU cycles, but that is not very helpful per se as the CPU is able to read that by itself via COP0. Other more useful metrics are listed below.
Profiling collects metrics in several "slots". Running "start"/"stop" only affects the slot specified via the input register. The application can start multiple slots in parallel, and metrics should be collected in all the running slots. The maximum number of available slots is emulator dependent, but we suggest supporting 65536 slots.
Notice that slots must be shared between VR4300 and RSP. That is, VR4300 might start a profile on slot 15 and RSP might then stop profiling on slot 15; they both refer to the same slot. Profile metrics must always cover both VR4300 and RSP, irrespective of which CPU started the profiling.
#### 0x50: `profile(start)`
Start collecting metrics in the specified slot.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x0
* `register`: slot index
Output:
* `register`: not modified
#### 0x51: `profile(stop)`
Stop collecting metrics in the specified slot.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x1
* `register`: slot index
Output:
* `register`: not modified
#### 0x52: `profile(clear)`
Clear all metrics in the specified slot to zero.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x2
* `register`: slot index
Output:
* `register`: not modified
#### 0x53: `profile(reset)`
Reset all metrics in all the slots
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x3
* `register`: ignored
Output:
* `register`: not modified
#### 0x54: `profile(logenable)`
Enable a metric to be displayed in the log.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x4
* `register[15..0]`: metric to enable
The command `profile(log)` displays the metrics collected in a certain slot. Since not all metrics are useful at any given time, and displaying too many numbers is confusing to the user, multiple calls to this command allow configuring the metrics that will be displayed by subsequent `profile(log)` calls.
#### 0x55: `profile(logreset)`
Reset the log system disabling all metrics. After this call, `profile(logenable)` should be called to configure which metrics to display.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x5
* `register`: ignored
#### 0x56: `profile(log)`
Log the metrics collected in a specified slot. This command displays the collected metrics in an emulator-specific format. Only the ones enabled via `profile(logenable)` should be displayed.
Input:
* `code[9..4]`: 0x5
* `code[3..0]`: 0x6
* `register`: slot index
Output:
* `register`: not modified
Example:
```
li s0, 4
0x021014b6 emux s0, profile(clear) # clear metrics in slot 4
0x02101436 emux s0, profile(start) # start profile in slot 4
jal FunctionToBenchmark
nop
0x02101476 emux s0, profile(stop) # stop profile in slot 4
0x021015b6 emux s0, profile(log) # log metrics collected in slot 4
```
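The slot semantics described in this section can be modelled with a few lines of bookkeeping. The sketch below uses a hypothetical `Profiler` class. It assumes that `profile(reset)` zeroes the metrics without stopping running slots, a point this draft leaves open. The `record` method stands for whatever internal hook the emulator uses to count events from either CPU.

```python
from collections import defaultdict


class Profiler:
    """Hypothetical slot bookkeeping shared by VR4300 and RSP."""

    def __init__(self):
        self.running = set()                                  # slots being profiled
        self.metrics = defaultdict(lambda: defaultdict(int))  # slot -> metric -> value
        self.log_enabled = []                                 # metric ids shown by log()

    def start(self, slot):
        self.running.add(slot)

    def stop(self, slot):
        self.running.discard(slot)

    def clear(self, slot):
        self.metrics.pop(slot, None)

    def reset(self):
        self.metrics.clear()     # assumption: running slots keep running

    def logenable(self, reg_value):
        metric = reg_value & 0xFFFF                           # register[15..0]
        if metric not in self.log_enabled:
            self.log_enabled.append(metric)

    def logreset(self):
        self.log_enabled.clear()

    def record(self, metric, amount=1):
        """Emulator-internal hook: account an event in every running slot."""
        for slot in self.running:
            self.metrics[slot][metric] += amount

    def log(self, slot):
        """Return the enabled metrics of `slot`; display format is up to the emulator."""
        return {metric: self.metrics[slot][metric] for metric in self.log_enabled}
```

Because `record` walks every running slot, overlapping or nested profiles each receive the full count for the period in which they were active.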
#### Metrics list
This is a list of all potentially available metrics. An emulator is not required to implement them all, and it should return 0 when reading a non-implemented metric via `profile(read)`.
* 0x00nn: VR4300 metrics
  * 0x0000: cycle count
  * 0x0001: cycle count within an exception (EXL=1 or ERL=1)
  * 0x0010: icache hits
  * 0x0011: icache misses
  * 0x0020: dcache hits
  * 0x0021: dcache misses
  * 0x0030: number of bytes written to RDRAM
  * 0x0031: number of bytes read from RDRAM
* 0x01nn: VR4300 COP0 metrics
  * 0x0100: TLB lookup hits
  * 0x0101: TLB lookup misses
  * 0x0110: Mini-TLB lookup hits
  * 0x0111: Mini-TLB lookup misses
* 0x02nn: RSP metrics
  * 0x0200: cycle count
  * 0x0201: cycle count while idle (halted)
  * 0x0210: number of DMA bytes RDRAM -> DMEM
  * 0x0211: number of DMA bytes DMEM -> RDRAM
  * 0x0220: total number of pipeline stalls
  * 0x0221: number of pipeline stalls because of vector write/read delay
  * 0x0222: number of pipeline stalls because of general write/read conflict
### 0x1f: `control`
This extension "family" allows the application to control some behaviors of the emulator while running it.
#### 0x1f0: `control(exit)`
Request the emulator to exit.
Input:
* `code[9..4]`: 0x1f
* `code[3..0]`: 0x0
* `register`: exit code of the process
Output:
* `register`: not modified
Through this control command, test suites can be designed to not draw anything on screen, display their results via the `log` extension instead, and then fully shut down the emulator, so that they can be run in a fully non-interactive mode (e.g. even on a headless CI).
NOTE: an emulator implementing this command is expected to fully exit. Just stopping emulation, or even closing the ROM while keeping the emulator window open and active, is not considered a conforming implementation.
#### 0x1f1: `control(fast)`
Request the emulator to run at maximum unbounded speed, ignoring vertical sync or other real time concerns.
Input:
* `code[9..4]`: 0x1f
* `code[3..0]`: 0x1
* `register`: ignored
Output:
* `register`: not modified
This can be useful while debugging code, as in that case the developer might want to do an edit-compile-run cycle and reach a breakpoint or a register dump as quickly as possible.
Unbounded speed should only affect how fast the program executes on the host machine. To the emulated system, the higher speed should not be perceivable.
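The distinction between host speed and emulated time can be illustrated with a minimal frame pacer. The `Pacer` class and its parameters are hypothetical. The point of the sketch is that `control(fast)` only removes the host-side wait, while the emulated cycle counter advances exactly as before.

```python
class Pacer:
    """Hypothetical frame pacer; not part of the specification."""

    def __init__(self, frame_period=1 / 60):
        self.frame_period = frame_period  # host seconds per emulated frame
        self.fast = False
        self.emulated_cycles = 0          # what the guest can observe

    def control_fast(self):
        self.fast = True                  # control(fast) was executed

    def end_of_frame(self, cycles_in_frame, host_seconds_spent):
        """Advance emulated time; return how long the host should sleep."""
        self.emulated_cycles += cycles_in_frame   # identical in both modes
        if self.fast:
            return 0.0                            # never wait on the host side
        return max(0.0, self.frame_period - host_seconds_spent)
```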