# N64 emulator extensions for homebrew developers

This document describes a set of extensions to the N64 hardware that emulators can optionally support to help the development of homebrew software.

THIS DOCUMENT IS A DRAFT: It is not currently frozen. Feel free to propose comments and improvements before we move to the implementation phase.

## Extension entrypoint

All extensions use unused COP0 opcodes. On both VR4300 and RSP, unused COP0 opcodes are completely ignored and generate no (known) side effects.

COP0 opcodes are identified by the lowest 6 bits (bits 5..0) of the 32-bit opcode word. We use values in the range 0x20..0x3F, which are unused. Moreover, bits 6 to 24 of the opcode word are conventionally allocated as follows:

* Bits 24..20: `rd`, refers to a GPR register of the CPU
* Bits 19..15: `rt`, refers to a GPR register of the CPU
* Bits 14..6: `code`, 9 bits of immediate code

Emulators should emulate emux extensions as if each one takes exactly 1 clock cycle and causes no stalls on input/output registers.

## Disassembly

All extension mnemonics start with `x`. The documentation states which inputs/outputs each extension uses. Disassembly should conventionally list arguments in this order:

```
xopcode rd, rt, code
```

though if an argument is unused by the opcode, it can be omitted. Some opcodes also have "optional" arguments, that is, they use the special value 0 as default. In that case, the disassembly can also hide the whole argument.

For instance, `xlog` doesn't use `code`, optionally uses `rt`, and always uses `rd`. All the following syntaxes are valid disassembly of the same opcode word, but the last form is the suggested one:

```
xlog s7, zero, 0
xlog s7, zero
xlog s7
```

## Extension detection

The guest application might want to detect whether the emulator supports extensions (and which ones are supported, as an emulator might not implement the full specification). To do so, the application can try to run the extension `xdetect` (opcode 0x20).
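To make the encoding concrete, here is a minimal Python sketch that packs and unpacks an opcode word using the field layout from the "Extension entrypoint" section, with `xdetect at` as the example. The helper names `emux_encode` and `emux_decode` are hypothetical. The upper bits are an assumption: the sketch uses the standard MIPS COP0 major opcode (0x10 in bits 31..26) with bit 25 set, which this document does not spell out.

```python
# Field layout taken from the "Extension entrypoint" section.
# ASSUMPTION: bits 31..25 hold the standard MIPS COP0 major opcode (0x10)
# with bit 25 set; this document does not define those bits explicitly.
COP0_CO_BASE = 0x42000000


def emux_encode(ext, rd=0, rt=0, code=0):
    """Pack an extension opcode word from its fields (hypothetical helper)."""
    if not 0x20 <= ext <= 0x3F:
        raise ValueError("extension opcode must be in 0x20..0x3F")
    if not (0 <= rd < 32 and 0 <= rt < 32 and 0 <= code < 512):
        raise ValueError("field out of range")
    return COP0_CO_BASE | (rd << 20) | (rt << 15) | (code << 6) | ext


def emux_decode(word):
    """Extract the extension fields from an opcode word (hypothetical helper)."""
    return {
        "ext": word & 0x3F,           # bits 5..0
        "code": (word >> 6) & 0x1FF,  # bits 14..6
        "rt": (word >> 15) & 0x1F,    # bits 19..15
        "rd": (word >> 20) & 0x1F,    # bits 24..20
    }


XDETECT = 0x20
AT = 1  # GPR number of the `at` register

word = emux_encode(XDETECT, rd=AT)
print(hex(word))
print(emux_decode(word))
```

Unused fields are left at 0, which matches the convention that optional arguments default to the special value 0.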
`xdetect` (0x20) is used specifically as a way to detect extensions. The specified `rd` register is used as output of the extension: it will be filled with a 64-bit mask, where each bit reports whether the corresponding extension is supported. Thus, a way to detect emulator extensions is the following:

    li at, 0                 # clear at (in case emux is not supported)
    xdetect at               # run extension detection
    bnez at, has_extensions  # if not zero, extensions are supported

Notice how the above sequence is completely harmless when run on real hardware.

## Emulator-level extension toggle

We suggest that emulators add a user-level option to enable or disable emulator homebrew extensions. While extensions have been designed to be completely harmless when run on hardware and on emulators of any accuracy level, we advise against providing too easy a way for a ROM to perform a more general "emulator detection" that cannot be overridden by the user. We value a homebrew ecosystem that targets real hardware first and foremost, and we want to avoid as much as possible homebrew software that explicitly changes its behavior when an emulator is detected (at least, not without user consent).

By providing a way to disable extension support at the emulator level, we give users the final choice on whether they want to play with "emulator enhancements" or not, and we avoid a fragmented ecosystem where ROMs behave differently on different emulators and on hardware.

## Extension list

This is the list of all defined extensions. The extension code is the COP0 opcode that should be used when encoding that extension.

### 0x20: `xdetect`

Detect extensions supported by the emulator.

Output:

* `rd`: 64-bit bitmask where 1s mark supported extensions.

NOTE: on RSP, only the lowest 32 bits of the bitmask are reported.
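As an illustration of how the returned mask can be interpreted, the following Python sketch maps a 64-bit `xdetect` mask to extension names, using the rule that bit N corresponds to extension opcode N (so `xbreak`, opcode 0x21, is bit 33). The function names are hypothetical; the opcode table is the one defined in this document.

```python
# Extension opcodes as listed in this document.
EXTENSIONS = {
    0x20: "xdetect", 0x21: "xbreak", 0x22: "xbreakpoint", 0x23: "xtrace-start",
    0x24: "xtrace-stop", 0x25: "xlog", 0x26: "xlogregs", 0x27: "xhexdump",
    0x28: "xprof", 0x29: "xprof-read", 0x2C: "xioctl",
}


def build_mask(opcodes):
    """Emulator side: set one bit per supported extension opcode."""
    mask = 0
    for op in opcodes:
        mask |= 1 << op
    return mask


def supported_extensions(mask):
    """Guest/tool side: list the extensions whose bit is set in the mask."""
    mask &= (1 << 64) - 1  # the mask is defined as 64 bits wide
    return [name for op, name in sorted(EXTENSIONS.items()) if (mask >> op) & 1]


mask = build_mask([0x20, 0x21, 0x25])
print(hex(mask))
print(supported_extensions(mask))
```

A mask of 0 yields an empty list, which is the same result the guest gets on real hardware, where the opcode is ignored and the register keeps its cleared value.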
The specified register is filled with a bitmask, where each bit corresponds to an extension; the bit should be set to 1 if the extension (identified by its opcode) is supported by the emulator, and to 0 otherwise. For instance, an emulator must set bit 0x21 (33) in the bitmask to notify that it supports `xbreak`.

### 0x21: `xbreak`

Immediate break in the debugger. When this opcode is run, the emulator should break into its builtin debugger to allow inspecting the state.

### 0x22: `xbreakpoint`

Set/unset a breakpoint/watchpoint within the debugger.

Input:

* `rd`: virtual address of the memory location where the breakpoint should be configured
* `code`: sub-function:
  * 1: Add a breakpoint at the specified address
  * 2: Remove a breakpoint from the specified address
  * 3: Add a read watchpoint at the specified address
  * 4: Remove a read watchpoint at the specified address
  * 5: Add a write watchpoint at the specified address
  * 6: Remove a write watchpoint at the specified address

### 0x23: `xtrace-start`

Start tracing from the current PC until manually stopped.

This extension asks the emulator to start tracing opcodes run after it on the current CPU. Traces are emulator dependent, so this specification does not provide any guidance on the actual format of traces, or where they should be written or displayed. Tracing should be activated only on the CPU that has run this opcode (so either the VR4300 or the RSP).

Input:

* `code`: if not zero, specifies the maximum number of instructions to trace. If zero, tracing is unlimited (until manually stopped)

Running `xtrace-start` while another trace is in progress will reset the tracing engine to the new specified configuration. For instance, if a limited trace was in progress, when `xtrace-start 0` is run, the trace will switch to limitless mode and keep tracing until manually stopped.

Example:

    xtrace-start 100

This extension will cause the emulator to emit a trace of the next 100 opcodes run by this CPU.
`xtrace-start` itself is never part of the trace.

### 0x24: `xtrace-stop`

Stop tracing. Notice that it is valid to run this extension also while a limited trace is running. For instance, if the ROM has previously run the extension `xtrace-start 100` and is currently tracing, and the 60th opcode being run is an `xtrace-stop`, tracing should stop there, without tracing the next 40 opcodes.

`xtrace-stop` is always part of the trace, effectively being the last instruction traced.

### 0x25: `xlog`

This extension asks the emulator to display a log message with the specified content. Each call to `xlog` provides one or more characters to display, which might or might not form a single complete message. The emulator should make no assumptions on the formatting of the string, and specifically it must not assume that each extension call is a "complete line of text". The best way to display the log is akin to piping the bytes directly to the console (or writing the bytes into a log file). The emulator should avoid adding newlines, such as displaying the string provided by a single `xlog` call on a standalone line. On the other hand, the emulator is allowed to buffer the received data internally until a newline is received.

The text to display is always treated as a UTF-8 encoded string.

Input:

* `rd`: virtual address (VR4300) or DMEM address (RSP) of the string to log.
* `rt`: length of the string in bytes. If this value is 0, the string must be zero-terminated.

A zero-terminated string is a sequence of bytes whose length is not explicitly declared, but is terminated by the special byte `0x0`. As explained above, this string does not necessarily constitute a full message, and should simply be appended to the previously logged contents.
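The buffering behavior described above can be sketched on the emulator side as follows. This is only one possible approach, assuming guest memory is exposed as a Python `bytes` object indexed by address; the `XlogSink` class and its `emit` callback are hypothetical names, not part of this specification.

```python
class XlogSink:
    """Accumulates bytes from xlog calls and emits only complete lines."""

    def __init__(self, emit):
        self.emit = emit            # callback receiving one decoded line at a time
        self.pending = bytearray()  # bytes received since the last newline

    def xlog(self, memory, addr, length):
        if length == 0:
            # rt == 0: the string is zero-terminated
            end = memory.index(0, addr)
        else:
            # rt != 0: explicit length in bytes, no terminator required
            end = addr + length
        self.pending += memory[addr:end]

        # Emit every complete line; keep any trailing partial line buffered.
        while b"\n" in self.pending:
            line, _, rest = bytes(self.pending).partition(b"\n")
            self.emit(line.decode("utf-8", errors="replace"))
            self.pending = bytearray(rest)


memory = b"hello\n\0"
lines = []
sink = XlogSink(lines.append)
sink.xlog(memory, 0, 3)  # logs "hel", nothing is emitted yet
sink.xlog(memory, 3, 3)  # logs "lo\n", completing the line
print(lines)
```

Decoding only complete lines also avoids splitting a multi-byte UTF-8 character across two `xlog` calls, since the newline byte never occurs inside a multi-byte sequence.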
Example:

```
MSG: .ascii "hello\n\0"

la t0, MSG
xlog t0, zr       # rt = 0: zero-terminated string
```

This should make the emulator emit a log such as:

```
[DEBUG] hello
```

Another example:

```
MSG: .ascii "hello\n"

la t0, MSG
li t1, 3
xlog t0, t1       # logs "hel"
la t0, MSG+3
xlog t0, t1       # logs "lo\n"
```

This should also make the emulator emit a log such as:

```
[DEBUG] hello
```

Notice that the emulator MUST NOT display this output:

```
[DEBUG] hel
[DEBUG] lo
```

because, as explained, a single `xlog` call is not guaranteed to represent a complete message.

### 0x26: `xlogregs`

Log registers of the current CPU.

Input:

* `rd`: register containing a 32-bit bitmask of which registers must be dumped. If the register has value 0, all (existing) registers should be dumped
* `code[0..1]`: a value specifying which register file to dump:
  * 0: COP0 registers
  * 1: COP1 registers (valid only on VR4300)
  * 2: COP2 registers (valid only on RSP)
  * 3: GPR registers
* `code[2]`: if 1, prefer signed decimal representation. If 0, hexadecimal.
* `code[3]`: if 1, interpret registers as 64-bit floating point values (valid only for COP1)
* `code[4]`: if 1, dump also "extra registers" not covered by the 32-bit bitmask:
  * For COP2 (RSP): accumulator, flags
  * For GPR (VR4300): hi/lo, program counter
  * For GPR (RSP): program counter

Example:

```
#define XDUMP_GPR 3

xlogregs $0, XDUMP_GPR
```

This should provide an output such as:

```
GPR:
zr: ---- ---- 0000 0000  at: ---- ---- 8080 0000  v0: ---- ---- 800c 0000  v1: ---- ---- 800c 0000
a0: ---- ---- 8006 dd88  a1: ff01 0401 0000 0000  a2: ff01 0401 0000 0000  a3: ff01 0401 0000 0000
t0: ---- ---- 8004 72b0  t1: ---- ---- 0000 0004  t2: ff01 0401 0000 0000  t3: ff01 0401 0000 0000
t4: ff01 0401 0000 0000  t5: ---- ---- 0000 0000  t6: ---- ---- 0000 0000  t7: ---- ---- 807f fdc0
s0: ---- ---- 8006 dd88  s1: ---- ---- 8004 65b0  s2: ---- ---- 8004 6610  s3: ffff ffff 0000 0000
s4: ---- ---- 8004 65b0  s5: ---- ---- 0050 4040  s6: ---- ---- 0000 003f  s7: ---- ---- 0000 0001
t8: ---- ---- 0000 0000  t9: ---- ---- 8002 42c0  k0: 8002 56c0 0000 000f  k1: ---- ---- 807f fef8
gp: ---- ---- 8004 df60  sp: ---- ---- 807f fe78  s8: ---- ---- 0000 0000  ra: ---- ---- 8001 597c
lo: ---- ---- 0000 000f  hi: ---- ---- 0000 0080
```

Example:

```
#define XDUMP_GPR     3
#define XDUMP_DECIMAL (1<<2)

li t0, (1<<8)|(1<<9)|(1<<10)   # Dump registers t0,t1,t2
xlogregs t0, XDUMP_GPR | XDUMP_DECIMAL
```

This should provide an output such as:

```
t0: 1458
t1: -123
t2: 256
```

Example:

```
#define XDUMP_FPU 1

xlogregs $0, XDUMP_FPU
```

This should produce an output such as:

```
FPR:
$f0:  40f000007f0834ca  $f1:  40f000004cb2d05e  $f2:  3faeccfe43044fe1  $f3:  3fc66666434a5eca
$f4:  000000004324ee6a  $f5:  00000000433a9a62  $f6:  000000004260a23a  $f7:  000000004311207f
$f8:  0000000043152a03  $f9:  00000000423d34d3  $f10: 0000000042c00000  $f11: 0000000000000000
$f12: 40f00000ff0834ca  $f13: 3f84282242800000  $f14: 0000000042d76040  $f15: 0000000042eb426b
$f16: 000000003f800000  $f17: 0000000000000000  $f18: 0000000000000000  $f19: 0000000000000000
$f20: 0000000043a00000  $f21: 0000000043700000  $f22: 000000003dcccccd  $f23: 000000003f4ccccd
$f24: 0000000030000000  $f25: 0000000000000000  $f26: 0000000000000000  $f27: 0000000000000000
$f28: 0000000000000000  $f29: 0000000000000000  $f30: 0000000000000000  $f31: 0000000000000000
```

Example:

```
#define XDUMP_FPU     1
#define XDUMP_DECIMAL (1<<2)
#define XDUMP_DOUBLE  (1<<3)

li t0, (1<<10) | (1<<20) | (1<<21)   # Dump registers $f10,$f20,$f21
xlogregs t0, XDUMP_FPU | XDUMP_DOUBLE | XDUMP_DECIMAL
```

This should produce an output such as:

```
$f10: 0.06015772407604892
$f20: 0.17499998365086916
$f21: <Denormal>
```

### 0x27: `xhexdump`

Perform a hexdump of the specified buffer, writing it to the log.

Input:

* `rd`: address of the buffer to dump. See `code` for how this address is interpreted
* `rt`: length of the buffer in bytes.
* `code[0]`: how the address is interpreted:
  * `0`: virtual address (VR4300) or IMEM/DMEM (RSP)
  * `1`: RCP physical address (can be run on both CPUs)

Note that `xhexdump` should perform memory accesses in a side-effect free way. In particular:

* Reads from the RSP semaphore should not set it to 1.
  This is the only hardware register in N64 that has side effects on read.
* Reads through virtual addresses that go through the data cache should report either cache contents (if those addresses are cached) or RAM contents (if they are not), but the data cachelines should not be altered in any way.

### 0x28: `xprof`

This extension family asks the emulator to profile a section of code. A profile is initiated with a "start" command, and terminated with a "stop" command.

Profiling collects several metrics, whose list is emulator dependent. The bare minimum is CPU cycles, but that is not very helpful per se, as both CPUs are able to read that by themselves via COP0. Other more useful metrics are listed below.

Profiling collects metrics in several "slots". Running "start"/"stop" only affects the slot specified via the input register. The application can start multiple slots in parallel, and metrics should be collected in all the running slots. The maximum number of available slots is emulator dependent, but we suggest supporting 256 slots.

Notice that slots must be shared between VR4300 and RSP. That is, VR4300 might start a profile on slot 15 and RSP might then stop profiling on slot 15; they both refer to the same slot. Profile metrics must always cover both VR4300 and RSP, irrespective of which CPU started the profiling.

Input:

* `rd`: slot index
* `code`: sub-function:
  * 1: start profiling in the specified slot
  * 2: stop profiling in the specified slot
  * 3: clear (zero) the specified slot
  * 4: reset all the slots (ignore `rd`)

### 0x29: `xprof-read`

Read the current value of a specified metric in a specified slot.

Input:

* `rd`: slot index (if negative, return global counter since boot)
* `rt`: metric ID

Output:

* `rt`: current value of the metric

This function can be used to read back the current value of a metric, which can be used to show realtime profiling data in the running application.
It can request either a slot-specific metric, which accounts for the profiling data accumulated between profile start/stop calls in that slot, or the global metric (specifying a negative slot number), which shows total values since boot.

If a specified metric does not exist or is not supported, `rt` should be set to 0 in output.

### Metrics list

This is a list of all potentially available metrics. An emulator is not required to implement them all, and it should return 0 when reading a non-implemented metric via `xprof-read`.

* 0x00nn: VR4300 metrics
  * 0x0000: cycle count
  * 0x0001: cycle count within an exception (EXL=1 or ERL=1)
  * 0x0010: icache hits
  * 0x0011: icache misses
  * 0x0012: icache writebacks
  * 0x0020: dcache hits
  * 0x0021: dcache misses
  * 0x0022: dcache writebacks
* 0x01nn: VR4300 COP0 metrics
  * 0x0100: TLB lookup hits
  * 0x0101: TLB lookup misses
  * 0x0110: Mini-TLB lookup hits
  * 0x0111: Mini-TLB lookup misses
* 0x02nn: RSP metrics
  * 0x0200: cycle count
  * 0x0201: cycle count while idle (halted)
  * 0x0210: total number of pipeline stalls
  * 0x0211: number of pipeline stalls because of vector write/read delay
  * 0x0212: number of pipeline stalls because of general write/read conflict
* 0x03nn: RDRAM metrics
  * 0x0300: total number of bytes read or written
  * 0x0301: total number of bytes read
  * 0x0302: total number of bytes written
  * 0x0310: number of bytes read or written by VR4300 icache
  * 0x0311: number of bytes read by VR4300 icache
  * 0x0312: number of bytes written by VR4300 icache
  * 0x0320: number of bytes read or written by VR4300 dcache
  * 0x0321: number of bytes read by VR4300 dcache
  * 0x0322: number of bytes written by VR4300 dcache
  * 0x0330: number of bytes read or written by VR4300 uncached load/store
  * 0x0331: number of bytes read by VR4300 uncached load/store
  * 0x0332: number of bytes written by VR4300 uncached load/store
  * 0x0340: number of bytes read or written by RSP DMA
  * 0x0341: number of bytes read by RSP DMA
  * 0x0342: number of bytes written by RSP DMA
  * 0x0350: number of bytes read or written by PI DMA
  * 0x0351: number of bytes read by PI DMA
  * 0x0352: number of bytes written by PI DMA
  * 0x0360: number of bytes read or written by SI DMA
  * 0x0361: number of bytes read by SI DMA
  * 0x0362: number of bytes written by SI DMA
  * 0x0370: number of bytes read or written by RDP (while drawing)
  * 0x0371: number of bytes read by RDP (while drawing)
  * 0x0372: number of bytes written by RDP (while drawing)
  * 0x0380: number of bytes read or written by AI DMA
  * 0x0381: number of bytes read by AI DMA
  * 0x0382: number of bytes written by AI DMA (always 0)
  * 0x0390: number of bytes read or written by VI
  * 0x0391: number of bytes read by VI
  * 0x0392: number of bytes written by VI (always 0)
  * 0x03A0: number of bytes read or written by RDP DMA
  * 0x03A1: number of bytes read by RDP DMA
  * 0x03A2: number of bytes written by RDP DMA (always 0)

### 0x2C: `xioctl`

This extension allows the running application to affect the emulator itself.

Input:

* `code`: operation to perform:
  * 0x1: `exit`: The emulator must exit.
  * 0x2: `fast`: The emulator must switch to frame-unlimited mode (running as fast as possible)
  * 0x3: `slow`: The emulator must switch back to frame-limited mode.
  * 0x4: `pause`: The emulator should pause emulation. The user will have to unpause it themselves if they want to continue running the ROM.

#### `xioctl` operation: `exit`

Through this control command, some testsuites could be designed to not draw anything on screen, display results via the `xlog` extension instead, and then fully shut down the emulator, so that they can be run in a fully non-interactive mode (e.g. even on a headless CI).

NOTE: an emulator implementing this command is expected to fully exit. Just stopping emulation, or even closing the ROM while keeping the emulator window open and active, is not considered a conforming implementation.
#### `xioctl` operation: `fast`/`slow`

Request the emulator to run at maximum unbounded speed, ignoring vertical sync or other real-time concerns. This can be useful while debugging code, as in that case the developer might want to do an edit-compile-run cycle and reach a breakpoint or a register dump as soon as possible.

Unbounded speed should only affect how fast the program executes on the host machine. To the emulated system, the higher speed should not be perceivable.
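One way to keep the speed change invisible to the guest is to let `fast`/`slow` control only the host-side wait between frames, while emulated time advances by the same amount per frame in both modes. The sketch below assumes a simple frame-based main loop; `FramePacer`, `CYCLES_PER_FRAME` and `FRAME_SECONDS` are hypothetical names with illustrative values, not part of this specification.

```python
CYCLES_PER_FRAME = 1_562_500  # illustrative value only
FRAME_SECONDS = 1 / 60        # illustrative value only

XIOCTL_FAST = 0x2
XIOCTL_SLOW = 0x3


class FramePacer:
    """Host-side frame limiter toggled by xioctl fast/slow."""

    def __init__(self, sleep):
        self.sleep = sleep        # injected so the sketch can run without real waiting
        self.fast = False
        self.emulated_cycles = 0

    def xioctl(self, code):
        if code == XIOCTL_FAST:
            self.fast = True
        elif code == XIOCTL_SLOW:
            self.fast = False

    def run_frame(self):
        # Guest-visible time advances identically in both modes.
        self.emulated_cycles += CYCLES_PER_FRAME
        # Only the host-side wait is skipped in frame-unlimited mode.
        if not self.fast:
            self.sleep(FRAME_SECONDS)


waits = []
pacer = FramePacer(waits.append)
pacer.run_frame()           # frame-limited: one host wait recorded
pacer.xioctl(XIOCTL_FAST)
pacer.run_frame()           # frame-unlimited: no host wait
pacer.run_frame()
pacer.xioctl(XIOCTL_SLOW)
pacer.run_frame()           # frame-limited again
print(len(waits), pacer.emulated_cycles)
```

In a real emulator the `sleep` callback would be the vertical sync or audio pacing wait; injecting it here simply keeps the sketch deterministic.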