The Bauhinia Instruction Set Architecture Documentation, version 1.0.0
===

## Chapter 1: Overview

Bauhinia Instruction Set Architecture (Bauhinia ISA) is an assembly language designed and implemented by [Black Bauhinia](https://b6a.black) for HKCERT CTF 2023. Please consider sponsoring us through [Patreon](https://www.youtube.com/watch?v=dQw4w9WgXcQ) to support our further development.

Notably, this architecture document is largely based on Volume 2 of the Intel Architecture Software Developer's Manual. If you upload this document to a _plagiarism detection_ service, it will definitely fail.

### Designers and Authors

This section lists the designers and authors of the Bauhinia ISA. If you hate the architecture, please direct your hate to them directly. Black Bauhinia does not bear responsibility for their design and implementation.

- The architecture is designed by [@khlung1](https://twitter.com/Khlung1) and [@harrier_lcc](https://twitter.com/harrier_lcc).
- The challenges are prepared by [@khlung1](https://twitter.com/Khlung1), [@harrier_lcc](https://twitter.com/harrier_lcc) and [@mystiz613](https://twitter.com/mystiz613).
- The document is compiled by [@khlung1](https://twitter.com/Khlung1), [@harrier_lcc](https://twitter.com/harrier_lcc), [@mystiz613](https://twitter.com/mystiz613) and [@vikychoi](https://twitter.com/vikychoi).

:::info
If you have any questions about the architecture document during the contest, please reach out to the authors by opening a ticket on the HKCERT CTF 2023 Discord server.
:::

### Special Thanks

- Thanks @TrebledJ for spotting the typo in descriptions for `LT` and `LTu`.
(22:44, Nov 10, 2023)
- Thanks @peace_ranger for spotting the typos in descriptions for `NEQ`, `LTu` and `PUSH`, as well as pointing out that `SYSCALL` outputs to `R8` (02:23, Nov 11, 2023)
- Thanks @harrier for spotting the typo in description for `MOV` (10:10, Nov 12, 2023)
- Thanks @Soar for spotting the typo in `mem` definition at Instruction Column (13:35, Nov 12, 2023)

### Instruction Operands

An instruction of Bauhinia ISA has the following format:

```
mnemonic argument1, argument2
```

where:

- a **mnemonic** is a reserved name for a class of instruction operators which have the same function, and
- the **operands** _argument1_ and _argument2_ are optional. There may be zero to two operands, depending on the operator. When present, they take the form of either literals or identifiers for data items. Operand identifiers are either reserved names of registers or are assumed to be assigned to data items declared in another part of the program (which may not be shown in the example).

When two operands are present in an arithmetic or logical instruction, the right operand is the source and the left operand is the destination.

For example, `MOV R3, 1337` is an instruction. In this case, `MOV` is the mnemonic identifier of an operator, `R3` is the destination operand and `1337` is the source operand.

Note that the mnemonic and operands are case-sensitive, and a space (` `) character is required between the mnemonic and the first operand. The operands are comma-separated.

### Registers

There are eight general-purpose registers, namely `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7` and `R8`. There are three registers with specific roles:

- `PC`, the program counter
- `FP`, the frame pointer (or the stack base pointer)
- `SP`, the stack pointer

## Chapter 2: Interpreter

### Execution

The interpreter parses an instruction by capturing the code segment between the program counter (`PC`) and the next newline character.
We will use the below code snippet as an example:

```
MOV R1, 0x1337; assigns 0x1337 to R1
MOV R2, 0xdeadbeef
```

Suppose that `PC` is currently at the beginning of the first line. The instruction captured will be

```
MOV R1, 0x1337; assigns 0x1337 to R1
```

After that, the interpreter strips away the comment by removing everything from the first semicolon onward.

```
MOV R1, 0x1337
```

Note that a [`NOP`](#NOP-No-operation) instruction will be executed if the resulting string is empty.

The program counter advances to the beginning of the next line after the instruction is executed. This does _not_ apply when a jump is taken by the [`JMP`](#JMP-Unconditional-jump), [`JZ`](#JZ-Jump-if-zero) or [`JNZ`](#JNZ-Jump-if-not-zero) instructions.

Additionally, instructions are case-sensitive.

### Memory regions and addresses

Bauhinia ISA defines two memory segments, listed in the table below:

| Segment name | Segment address | Segment size |
| ------------ | --------------- | ------------ |
| Code         | `0x00400000`    | `0x100000`   |
| Stack        | `0xfff00000`    | `0x100000`   |

Any access to memory outside these regions is considered invalid.

### Initial values

When the program starts executing, the registers and the memory are initialized with the following values:

- `PC` will be set to the beginning of the code segment, i.e., `0x00400000`,
- `FP` and `SP` will be set to `0xfffffff0`, and
- the code segment will be filled with the program text, stored as a string.

Take the below program as an example.

```
MOV R1, 0x1337; assigns 0x1337 to R1
MOV R2, 0xdeadbeef
```

When the above program begins, `PC` will be set to `0x00400000`, and `FP` and `SP` will be set to `0xfffffff0`. The figure below shows the memory between `0x00400000` and `0x00400040`.

<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ByLLenqQp.png" style="width: 500px;"></p>

### Limitations

The interpreter keeps track of the number of steps executed.
If there are more than `MAX_STEP_COUNT` steps, the interpreter will stop processing and return a non-zero exit code, stating that the [step count exceeded](#Exit-code-65-Step-count-exceeded) the limit. At the moment, `MAX_STEP_COUNT` is defined to be 131072.

## Chapter 3: Exit codes

This section introduces the set of exit codes implemented by Bauhinia ISA.

| Exit Code | Description         |
| ---------:| ------------------- |
|         0 | OK                  |
|         2 | Invalid instruction |
|         3 | Segmentation fault  |
|        63 | Invalid program     |
|        64 | Bad server config   |
|        65 | Step count exceeded |

### Exit code `0`: OK

This exit code indicates that the program exited successfully.

### Exit code `2`: Invalid instruction

This exit code will be triggered when an invalid instruction is found at the program counter (`PC`). This includes, but is not limited to:

1. Having an invalid instruction. For example, `mov R1, 0x1337` (instructions are case-sensitive).
1. Assigning a value to an immediate. For example, `MOV 0x400000, 0x1337`.
1. Accessing `PC`. For example, `JMP PC`.

Notably, if no system call terminates the program, it is likely that the program will return exit code 2. This is because execution has reached the end of the program, and the interpreter is unable to parse the next instruction.

### Exit code `3`: Segmentation fault

This exit code will be triggered when attempting to read, write or execute memory at an invalid address. All addresses outside the code region and the stack region are considered invalid. Below are examples of code snippets that will return exit code 3:

1. Reading from address `0xdeadbeef`:
   ```
   MOV R1, 0xdeadbeef
   MOV R2, [R1]
   ```
1. Writing to address `0xffeffff0`, assuming that `FP` is `0xfffffff0`:
   ```
   MOV [FP-0x100000], 0x1337
   ```
1. Executing instructions on address `0x13371337`.
   Note that the below instruction itself runs successfully; the failure happens afterwards, when `PC` is at `0x13371337`:
   ```
   JMP 0x13371337
   ```

### Exit code `63`: Invalid program

This exit code will be triggered when a checker is imposed and the program does not pass it. For example, in the _Jump Scare_ challenge, the below Python checker is implemented:

```python=
def checker(code):
    # No comments allowed!
    if ';' in code: return False

    # All instruction should be a JMP instruction!
    for line in code.split('\n'):
        if not line.startswith('JMP '): return False

    return True
```

All the below input code would yield an "invalid program" error, because they do not pass the check:

1. The first line contains a comment. This is rejected on line 3 of the above Python snippet:
   ```
   JMP 0x400000; hello world :)
   JMP 0x400000
   ```
1. The second line does not start with `JMP`. This is rejected on line 6 above:
   ```
   JMP 0x400000
   NOP
   ```

### Exit code `64`: Bad server config

This exit code will be triggered when the server is misconfigured. Please report it to an admin if this happens.

### Exit code `65`: Step count exceeded

This exit code will be triggered when the program has executed for more than `MAX_STEP_COUNT` steps. Please refer to the [interpreter limitations](#Limitations) for the value of `MAX_STEP_COUNT`.

For example, the below program would yield a "step count exceeded" error because it does not terminate within `MAX_STEP_COUNT` steps (in fact, it would never terminate).

```
JMP 0x400000
```

## Chapter 4: Instruction Reference

This chapter provides detailed descriptions of each of the Bauhinia ISA instructions.
### Interpreting the Instruction Reference

#### Instruction Format

The following is an example of the format used for each Bauhinia ISA instruction description in this chapter:

| Instruction    | Description                                                                                                                             |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `MOV r/m, imm` | Copy the immediate value from the second operand (source operand) to the register or memory of the first operand (destination operand) |
| `MOV r/m, reg` | Copy the register in the second operand (source operand) to the register or memory of the first operand (destination operand)          |

#### Instruction Column

The "Instruction" column gives the syntax of the instruction statement as it would appear in a Bauhinia ISA program. The following is a list of the symbols used to represent operands in the instruction statements:

- `imm` (Immediate) -- A 32-bit value used for instructions. It allows the use of a number between 0 (`0x00000000`) and 4294967295 (`0xffffffff`) inclusive. Immediates can be written in base 2, base 8 or base 16 by prepending `0b`, `0o` or `0x` respectively, or in base 10 if there is no prefix.
- `reg` (Register) -- A register operand used for instructions. The registers that can be used are `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `FP` and `SP`.
- `mem` (Memory) -- A 32-bit address that represents the memory. The contents of memory are found at the address provided by the effective address computation. It must be one of `[reg]`, `[reg+imm]`, `[reg-imm]` or `[reg*imm]`.
- `r/m` (Register or memory) -- Either a register or a memory location.

##### Examples

<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ByLLenqQp.png" style="width: 500px;"></p>

- `12345` is an immediate that represents the number $12345_{10}$.
- `0b10001` is an immediate that represents the number $17_{10}$, or $\texttt{10001}_2$.
- `0o13371337` is an immediate that represents the number $3011295_{10}$, or $\texttt{13371337}_8$.
- `0x00f` is an immediate that represents the number $15_{10}$, or $\texttt{f}_{16}$.
- `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `FP` and `SP` are the valid registers.
- `[R1]` is a memory address that points to the value at memory `R1`. For instance, if `R1 = 0x400000` then `[R1]` would be `0x20564f4d`.
- `[R1+0x20]` is a memory address that points to the value at memory `R1+0x20`. For instance, if `R1 = 0x400000` then `[R1+0x20]` would be `0x3152206f`.
- `[R1-8]` is a memory address that points to the value at memory `R1-8`. For instance, if `R1 = 0x400010` then `[R1-8]` would be `0x33317830`.
- `[R1*4]` is a memory address that points to the value at memory `R1*4`. For instance, if `R1 = 0x100000` then `[R1*4]` would be `0x20564f4d`.

#### Description Column

The "Description" column following the "Instruction" column briefly explains the various forms of the instruction. The "Description" section that follows each table contains more details of the instruction's operation.

### `JMP`: Unconditional jump

| Instruction | Description              |
| ----------- | ------------------------ |
| `JMP reg`   | Jump to reg, absolutely  |
| `JMP imm`   | Jump to imm, absolutely  |
| `JMP +imm`  | Jump to +imm, relatively |
| `JMP -imm`  | Jump to -imm, relatively |

#### Description

Transfers the program counter to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.

#### Examples

- `JMP R1`: Sets `PC` to the value of `R1`.
- `JMP 0x401337`: Sets `PC` to `0x401337`.
- `JMP +10`: After running this `JMP` instruction, increases `PC` by 10.

To understand the third example properly, you first need to understand how `PC` moves for normal instructions.

```
0x400000: MOV R1, 1;
          ⬆️
0x40000b: MOV R2, 1;
```

After running the first instruction, `PC` will move to the start of the next instruction (i.e. `0x40000b`, since the instruction text and its newline occupy 11 bytes):

```
0x400000: MOV R1, 1;
0x40000b: MOV R2, 1;
          ⬆️
```

For a relative `JMP`, the offset is applied to the value of `PC` after the `JMP` instruction is executed. For example:

```
0x400000: JMP +11;
0x400009: MOV R2, 1;
0x400014: MOV R1, 1;
```

After running `JMP`, `PC` goes to `0x400014`. This is because `PC` is first moved to `0x400009` after the instruction is executed, and the offset `+11` is then added.

### `JZ`: Jump if zero

| Instruction | Description                       |
| ----------- | --------------------------------- |
| `JZ reg`    | Jump to reg, absolutely, if zero  |
| `JZ imm`    | Jump to imm, absolutely, if zero  |
| `JZ +imm`   | Jump to +imm, relatively, if zero |
| `JZ -imm`   | Jump to -imm, relatively, if zero |

#### Description

Transfers the program counter to a different point in the instruction stream, without recording return information, if the stack top is zero. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location. Note that this pops the current stack top.

### `JNZ`: Jump if not zero

| Instruction | Description                           |
| ----------- | ------------------------------------- |
| `JNZ reg`   | Jump to reg, absolutely, if not zero  |
| `JNZ imm`   | Jump to imm, absolutely, if not zero  |
| `JNZ +imm`  | Jump to +imm, relatively, if not zero |
| `JNZ -imm`  | Jump to -imm, relatively, if not zero |

#### Description

Transfers the program counter to a different point in the instruction stream, without recording return information, if the stack top is non-zero. The destination (target) operand specifies the address of the instruction being jumped to.
This operand can be an immediate value, a general-purpose register, or a memory location. Note that this pops the current stack top.

### `MOV`: Move

| Instruction    | Description         |
| -------------- | ------------------- |
| `MOV r/m, imm` | Move `imm` to `r/m` |
| `MOV r/m, reg` | Move `reg` to `r/m` |
| `MOV reg, r/m` | Move `r/m` to `reg` |

#### Description

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, a register, or a memory location; the destination operand can be a register or a memory location. Note that the two operands cannot both be memory locations.

### `NOT`: One's complement negation

| Instruction | Description |
| ----------- | ----------- |
| `NOT r/m`   | Negate r/m  |

#### Description

Performs a one's complement negation on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.

### `AND`: Logical AND

| Instruction    | Description |
| -------------- | ----------- |
| `AND r/m, reg` | r/m AND reg |
| `AND reg, r/m` | reg AND r/m |
| `AND r/m, imm` | r/m AND imm |

#### Description

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.

### `OR`: Logical OR

| Instruction   | Description |
| ------------- | ----------- |
| `OR r/m, reg` | r/m OR reg  |
| `OR reg, r/m` | reg OR r/m  |
| `OR r/m, imm` | r/m OR imm  |

#### Description

Performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location.
The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.

### `XOR`: Logical exclusive OR

| Instruction    | Description |
| -------------- | ----------- |
| `XOR r/m, reg` | r/m XOR reg |
| `XOR reg, r/m` | reg XOR r/m |
| `XOR r/m, imm` | r/m XOR imm |

#### Description

Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.

### `SAL`: Shift arithmetic left

| Instruction    | Description                  |
| -------------- | ---------------------------- |
| `SAL r/m, imm` | Multiply r/m by 2, imm times |
| `SAL r/m, reg` | Multiply r/m by 2, reg times |

#### Description

The shift arithmetic left (SAL) instruction shifts the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the least significant bit is cleared.

### `SAR`: Shift arithmetic right

| Instruction    | Description                       |
| -------------- | --------------------------------- |
| `SAR r/m, imm` | Signed divide r/m by 2, imm times |
| `SAR r/m, reg` | Signed divide r/m by 2, reg times |

#### Description

The shift arithmetic right (SAR) instruction shifts the bits in the destination operand to the right (toward less significant bit locations). The SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand.
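
As a rough illustration of the two arithmetic shifts described above, the following Python sketch models `SAL` and `SAR` on 32-bit values. The helper names `sal` and `sar` are hypothetical and not part of the interpreter, and the sketch assumes a shift count between 0 and 31, since this document does not state how larger counts are handled.

```python
MASK32 = 0xFFFFFFFF  # operands are 32-bit values


def sal(value: int, count: int) -> int:
    """Shift left by `count` bits; the vacated low bits are cleared."""
    return (value << count) & MASK32


def sar(value: int, count: int) -> int:
    """Shift right by `count` bits, copying in the original sign bit."""
    # Reinterpret the 32-bit pattern as a signed integer so that Python's
    # >> operator performs an arithmetic (sign-preserving) shift.
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32


print(hex(sal(0x40000001, 1)))  # 0x80000002
print(hex(sar(0x80000000, 4)))  # 0xf8000000
print(hex(sar(0x7FFFFFF0, 4)))  # 0x7ffffff
```

The second call shows the sign bit being replicated into the vacated high bits, while the third shows that a value with a clear sign bit is filled with zeros instead.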
### `SHL`: Shift left

| Instruction    | Description                  |
| -------------- | ---------------------------- |
| `SHL r/m, imm` | Multiply r/m by 2, imm times |
| `SHL r/m, reg` | Multiply r/m by 2, reg times |

#### Description

The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation.

### `SHR`: Shift right

| Instruction    | Description                         |
| -------------- | ----------------------------------- |
| `SHR r/m, imm` | Unsigned divide r/m by 2, imm times |
| `SHR r/m, reg` | Unsigned divide r/m by 2, reg times |

#### Description

The shift logical right (SHR) instruction shifts the bits of the destination operand to the right (toward less significant bit locations). For each shift count, the SHR instruction clears the most significant bit.

### `ROL`: Rotate left

| Instruction    | Description               |
| -------------- | ------------------------- |
| `ROL r/m, imm` | Rotate r/m left imm times |
| `ROL r/m, reg` | Rotate r/m left reg times |

#### Description

The rotate left (ROL) instruction shifts all the bits toward more significant bit positions, except for the most significant bit, which is rotated to the least significant bit location.

### `ROR`: Rotate right

| Instruction    | Description                |
| -------------- | -------------------------- |
| `ROR r/m, imm` | Rotate r/m right imm times |
| `ROR r/m, reg` | Rotate r/m right reg times |

#### Description

The rotate right (ROR) instruction shifts all the bits toward less significant bit positions, except for the least significant bit, which is rotated to the most significant bit location.

### `ADD`: Addition

| Instruction    | Description    |
| -------------- | -------------- |
| `ADD r/m, reg` | Add reg to r/m |
| `ADD reg, r/m` | Add r/m to reg |
| `ADD r/m, imm` | Add imm to r/m |

#### Description

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand.
The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. However, two memory operands cannot be used in one instruction.

### `SUB`: Subtraction

| Instruction    | Description           |
| -------------- | --------------------- |
| `SUB r/m, reg` | Subtract reg from r/m |
| `SUB reg, r/m` | Subtract r/m from reg |
| `SUB r/m, imm` | Subtract imm from r/m |

#### Description

Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. However, two memory operands cannot be used in one instruction.

### `MUL`: Signed multiplication

| Instruction    | Description                  |
| -------------- | ---------------------------- |
| `MUL reg, reg` | Signed multiply reg with reg |

#### Description

Performs a signed multiplication of the first operand (destination operand) and the second operand (source operand). The lower 32 bits of the result are stored in the destination operand and the higher 32 bits of the result are stored in the source operand.

### `MULu`: Unsigned multiplication

| Instruction     | Description                    |
| --------------- | ------------------------------ |
| `MULu reg, reg` | Unsigned multiply reg with reg |

#### Description

Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand). The lower 32 bits of the result are stored in the destination operand and the higher 32 bits of the result are stored in the source operand.

### `DIV`: Signed divide

| Instruction    | Description                   |
| -------------- | ----------------------------- |
| `DIV reg, reg` | Signed division of reg by reg |

#### Description

Performs a signed division with the first operand (dividend) divided by the second operand (divisor).
The resulting quotient is stored in the first operand and the remainder is stored in the second operand.

### `DIVu`: Unsigned divide

| Instruction     | Description                     |
| --------------- | ------------------------------- |
| `DIVu reg, reg` | Unsigned division of reg by reg |

#### Description

Performs an unsigned division with the first operand (dividend) divided by the second operand (divisor). The resulting quotient is stored in the first operand and the remainder is stored in the second operand.

### `EQ`: Compare for equality

| Instruction   | Description               |
| ------------- | ------------------------- |
| `EQ imm, imm` | Compare if imm equals imm |
| `EQ r/m, reg` | Compare if r/m equals reg |
| `EQ reg, r/m` | Compare if reg equals r/m |
| `EQ r/m, imm` | Compare if r/m equals imm |
| `EQ imm, r/m` | Compare if imm equals r/m |

#### Description

Compares the first source operand with the second source operand and pushes the result on the stack. If the operands are the same, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `NEQ`: Compare for inequality

| Instruction    | Description                       |
| -------------- | --------------------------------- |
| `NEQ imm, imm` | Compare if imm does not equal imm |
| `NEQ r/m, reg` | Compare if r/m does not equal reg |
| `NEQ reg, r/m` | Compare if reg does not equal r/m |
| `NEQ r/m, imm` | Compare if r/m does not equal imm |
| `NEQ imm, r/m` | Compare if imm does not equal r/m |

#### Description

Compares the first source operand with the second source operand and pushes the result on the stack. If the operands are different, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.
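
The push-a-flag behaviour of `EQ` and `NEQ` can be sketched in Python as follows. This is a minimal model rather than the interpreter's own code: the stack is a plain Python list whose last element is the stack top, and the function names `eq` and `neq` are made up for illustration.

```python
def eq(stack: list, a: int, b: int) -> None:
    """Model of EQ: push 1 if the operands are the same, else 0."""
    stack.append(1 if a == b else 0)


def neq(stack: list, a: int, b: int) -> None:
    """Model of NEQ: push 1 if the operands differ, else 0."""
    stack.append(1 if a != b else 0)


stack = []
eq(stack, 0x1337, 0x1337)   # operands match, so EQ pushes 1
neq(stack, 0x1337, 0x1337)  # operands match, so NEQ pushes 0
print(stack)  # [1, 0]
```

A subsequent `JZ` or `JNZ` pops this flag from the stack top and uses it to decide whether the jump is taken.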
### `GT`: Compare for signed greater than

| Instruction   | Description                               |
| ------------- | ----------------------------------------- |
| `GT imm, imm` | Compare if imm is signed greater than imm |
| `GT r/m, reg` | Compare if r/m is signed greater than reg |
| `GT reg, r/m` | Compare if reg is signed greater than r/m |
| `GT r/m, imm` | Compare if r/m is signed greater than imm |
| `GT imm, r/m` | Compare if imm is signed greater than r/m |

#### Description

Compares the signed first source operand with the signed second source operand and pushes the result on the stack. The most significant bit is treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is greater than the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `GTu`: Compare for unsigned greater than

| Instruction    | Description                                 |
| -------------- | ------------------------------------------- |
| `GTu imm, imm` | Compare if imm is unsigned greater than imm |
| `GTu r/m, reg` | Compare if r/m is unsigned greater than reg |
| `GTu reg, r/m` | Compare if reg is unsigned greater than r/m |
| `GTu r/m, imm` | Compare if r/m is unsigned greater than imm |
| `GTu imm, r/m` | Compare if imm is unsigned greater than r/m |

#### Description

Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is greater than the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.
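
The difference between `GT` and `GTu` only shows up when the most significant bit of an operand is set. The Python sketch below, with hypothetical helper names, computes the flag each instruction would push for a pair of 32-bit operands; it models the comparison only and leaves out the stack.

```python
def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern with the top bit as the sign bit."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def gt(a: int, b: int) -> int:
    """Flag that GT would push: 1 if a > b as signed values, else 0."""
    return 1 if to_signed(a) > to_signed(b) else 0


def gtu(a: int, b: int) -> int:
    """Flag that GTu would push: 1 if a > b as unsigned values, else 0."""
    return 1 if (a & 0xFFFFFFFF) > (b & 0xFFFFFFFF) else 0


print(gt(0xFFFFFFFF, 1))   # 0, because 0xffffffff is -1 when signed
print(gtu(0xFFFFFFFF, 1))  # 1, because 0xffffffff is 4294967295 when unsigned
```

The same reinterpretation applies to the other signed and unsigned comparison pairs in this chapter.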
### `GTE`: Compare for signed greater than or equal

| Instruction    | Description                                           |
| -------------- | ----------------------------------------------------- |
| `GTE imm, imm` | Compare if imm is signed greater than or equal to imm |
| `GTE r/m, reg` | Compare if r/m is signed greater than or equal to reg |
| `GTE reg, r/m` | Compare if reg is signed greater than or equal to r/m |
| `GTE r/m, imm` | Compare if r/m is signed greater than or equal to imm |
| `GTE imm, r/m` | Compare if imm is signed greater than or equal to r/m |

#### Description

Compares the signed first source operand with the signed second source operand and pushes the result on the stack. The most significant bit is treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is greater than or equal to the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `GTEu`: Compare for unsigned greater than or equal

| Instruction     | Description                                             |
| --------------- | ------------------------------------------------------- |
| `GTEu imm, imm` | Compare if imm is unsigned greater than or equal to imm |
| `GTEu r/m, reg` | Compare if r/m is unsigned greater than or equal to reg |
| `GTEu reg, r/m` | Compare if reg is unsigned greater than or equal to r/m |
| `GTEu r/m, imm` | Compare if r/m is unsigned greater than or equal to imm |
| `GTEu imm, r/m` | Compare if imm is unsigned greater than or equal to r/m |

#### Description

Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is greater than or equal to the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.
### `LT`: Compare for signed less than

| Instruction   | Description                            |
| ------------- | -------------------------------------- |
| `LT imm, imm` | Compare if imm is signed less than imm |
| `LT r/m, reg` | Compare if r/m is signed less than reg |
| `LT reg, r/m` | Compare if reg is signed less than r/m |
| `LT r/m, imm` | Compare if r/m is signed less than imm |
| `LT imm, r/m` | Compare if imm is signed less than r/m |

#### Description

Compares the signed first source operand with the signed second source operand and pushes the result on the stack. If the first source operand is less than the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `LTu`: Compare for unsigned less than

| Instruction    | Description                              |
| -------------- | ---------------------------------------- |
| `LTu imm, imm` | Compare if imm is unsigned less than imm |
| `LTu r/m, reg` | Compare if r/m is unsigned less than reg |
| `LTu reg, r/m` | Compare if reg is unsigned less than r/m |
| `LTu r/m, imm` | Compare if r/m is unsigned less than imm |
| `LTu imm, r/m` | Compare if imm is unsigned less than r/m |

#### Description

Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is less than the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `LTE`: Compare for signed less than or equal

| Instruction    | Description                                        |
| -------------- | -------------------------------------------------- |
| `LTE imm, imm` | Compare if imm is signed less than or equal to imm |
| `LTE r/m, reg` | Compare if r/m is signed less than or equal to reg |
| `LTE reg, r/m` | Compare if reg is signed less than or equal to r/m |
| `LTE r/m, imm` | Compare if r/m is signed less than or equal to imm |
| `LTE imm, r/m` | Compare if imm is signed less than or equal to r/m |

#### Description

Compares the signed first source operand with the signed second source operand and pushes the result on the stack.
The most significant bit is treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is less than or equal to the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `LTEu`: Compare for unsigned less than or equal

| Instruction     | Description                                          |
| --------------- | ---------------------------------------------------- |
| `LTEu imm, imm` | Compare if imm is unsigned less than or equal to imm |
| `LTEu r/m, reg` | Compare if r/m is unsigned less than or equal to reg |
| `LTEu reg, r/m` | Compare if reg is unsigned less than or equal to r/m |
| `LTEu r/m, imm` | Compare if r/m is unsigned less than or equal to imm |
| `LTEu imm, r/m` | Compare if imm is unsigned less than or equal to r/m |

#### Description

Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is less than or equal to the second source operand, the value 1 is pushed to the stack; otherwise the value 0 is pushed instead.

### `CALL`: Call procedure

| Instruction | Description                         |
| ----------- | ----------------------------------- |
| `CALL imm`  | Call absolute, address given in imm |
| `CALL r/m`  | Call absolute, address given in r/m |

#### Description

Saves `PC` on the stack and branches to the called procedure specified by the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a register, or a memory location.

### `RET`: Return from procedure

| Instruction | Description                 |
| ----------- | --------------------------- |
| `RET`       | Return to calling procedure |

#### Description

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a `CALL` instruction, and the return is made to the instruction that follows the `CALL` instruction.
Note that the return address is popped from the top of the stack.

### `SYSCALL`: System call

| Instruction | Description                      |
| ----------- | -------------------------------- |
| `SYSCALL`   | System call to system procedures |

#### Description

Calls the system procedure whose number is given in `R8`. A `SYSCALL` takes at most 7 arguments in addition to the syscall number in `R8`: `R1` corresponds to the first argument, `R2` to the second, and so on. None of the implemented system calls takes more than three arguments.

The below table shows a brief summary of the system calls. Please refer to the [system calls reference](#Chapter-5-System-Calls-Reference) for details.

| Syscall  | `R8` | `R1`             | `R2`           | `R3`        | Return value at `R8` |
| -------- | ---- | ---------------- | -------------- | ----------- | -------------------- |
| Input    | `0`  | Buffer address   | Input length   | -           | Input length         |
| Output   | `1`  | Buffer address   | Output length  | -           | Output length        |
| Exit     | `2`  | Exit code        | -              | -           | None                 |
| Readfile | `3`  | Filename address | Buffer address | Buffer size | Number of bytes read |

### `PUSH`: Push a value onto the stack

| Instruction | Description |
| ----------- | ----------- |
| `PUSH imm`  | Push imm    |
| `PUSH r/m`  | Push r/m    |

#### Description

Decrements the stack pointer and then stores the source operand on the top of the stack.

### `POP`: Pop a value from the stack

| Instruction | Description                                           |
| ----------- | ----------------------------------------------------- |
| `POP r/m`   | Pop top of stack into r/m and increment stack pointer |
| `POP reg`   | Pop top of stack into reg and increment stack pointer |

#### Description

Loads the value from the top of the stack to the location specified by the destination operand and then increments the stack pointer. The destination operand can be a register or a memory location.
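
The ordering described for `PUSH` (decrement, then store) and `POP` (load, then increment) can be sketched with a toy Python model. The class `Stack` and its members are hypothetical. The 4-byte slot size is an assumption, chosen to be consistent with the 32-bit operand width and with the `[SP+4*XXX]` addressing given for `COPY`; the starting value of `SP` is the initial value from Chapter 2.

```python
class Stack:
    """Toy model of the Bauhinia stack (4-byte slots are an assumption)."""

    SLOT = 4  # assumed size of one stack item in bytes

    def __init__(self, sp: int = 0xFFFFFFF0) -> None:
        self.sp = sp       # initial SP as given in Chapter 2
        self.memory = {}   # maps an address to the 32-bit value stored there

    def push(self, value: int) -> None:
        self.sp -= self.SLOT                       # decrement first ...
        self.memory[self.sp] = value & 0xFFFFFFFF  # ... then store

    def pop(self) -> int:
        value = self.memory[self.sp]  # load first ...
        self.sp += self.SLOT          # ... then increment
        return value


stack = Stack()
stack.push(0x1337)
stack.push(0xDEADBEEF)
print(hex(stack.sp))     # 0xffffffe8
print(hex(stack.pop()))  # 0xdeadbeef
print(hex(stack.pop()))  # 0x1337
print(hex(stack.sp))     # 0xfffffff0
```

Because the stack pointer is decremented before the store, the stack grows toward lower addresses and `SP` always points at the most recently pushed value.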
### `SWAP`: Swap two values

| Instruction | Description                                                 |
| ----------- | ----------------------------------------------------------- |
| `SWAP imm`  | Swap the top of the stack with the imm-th item on the stack |
| `SWAP r/m`  | Swap the top of the stack with the r/m-th item on the stack |

#### Description

Exchanges the top value on the stack with the value contained in the $i$-th item on the stack, where $i$ is specified by the source operand.

### `COPY`: Copy value on stack

| Instruction | Description                                    |
| ----------- | ---------------------------------------------- |
| `COPY imm`  | Push the imm-th item on the stack to the stack |
| `COPY r/m`  | Push the r/m-th item on the stack to the stack |

#### Description

Pushes the value contained in the $i$-th item on the stack, where $i$ is specified by the source operand, to the top of the stack. Symbolically, `COPY XXX` is equivalent to `PUSH [SP+4*XXX]`.

#### Example

```
PUSH 0x13370
PUSH 0x13371
PUSH 0x13372
PUSH 0x13373
PUSH 0x13374
PUSH 0x13375
COPY 1
```

In the above code snippet, six values are pushed to the stack: `0x13370`, `0x13371`, `0x13372`, `0x13373`, `0x13374` and `0x13375`. The effect of the last instruction, `COPY 1`, is illustrated below:

<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ryCJ2womp.png" style="width: 300px;"></p>

### `NOP`: No operation

| Instruction | Description                                                                            |
| ----------- | -------------------------------------------------------------------------------------- |
| `NOP`       | Does nothing and allows the processor to advance to the next instruction in the program |

#### Description

Performs no operation. It is a one-byte or multi-byte NOP that takes up space in the instruction stream but does not impact machine context, except for the `PC` register.

## Chapter 5: System Calls Reference

This chapter describes all the system calls available in Bauhinia ISA, which can be called via the `SYSCALL` instruction.

### Interpreting the System Calls Reference

Each syscall description begins with a system call (syscall) number.
It should be the value set in `R8` before running the `SYSCALL` instruction. Depending on the system call, registers `R1` through `R7` might also be required; they serve as the arguments to that call.

### Input Syscall

#### Description

Syscall number 0. Reads input from the user and stores it in the buffer at the address given in `R1`. It will attempt to read at most `R2` bytes.

#### Input

- `R1`: Buffer address where input data will be stored.
- `R2`: Size of the buffer.

#### Output

- `R8`: Input length (number of characters read from the user).

### Output Syscall

#### Description

Syscall number 1. Outputs data from the buffer at the address given in `R1` to the user. It will attempt to write at most `R2` bytes.

#### Input

- `R1`: Buffer address containing data to be output.
- `R2`: Size of the data to be output.

#### Output

- `R8`: Output length (number of characters successfully output).

### Exit Syscall

#### Description

Syscall number 2. Terminates the program with the exit code given in `R1`.

#### Input

- `R1`: Exit code indicating the reason for program termination.

#### Output

None

### Readfile Syscall

#### Description

Syscall number 3. Reads the content of the specified file into the provided buffer until `R3` bytes have been read or the end of the file is reached. If the operation fails, returns `0xffffffff`.

#### Input

- `R1`: File name address indicating the name of the file to be read.
- `R2`: Buffer address where the content of the file will be stored.
- `R3`: Size of the buffer.

#### Output

- `R8`: If successful, the number of characters read from the file. Otherwise, `0xffffffff`.

### Example

The Bauhinia ISA code below implements a sample read-file program. It performs the following steps:

1. Read the input from `stdin`
2. Open the file with the filename given by the user
3. Write the file to `stdout`
4.
   Exit the program with exit code 0

```
; read the input from the user, up to 100 bytes, and save to 0x410000
MOV R8, 0;          syscall INPUT
MOV R1, 0x410000;   input address
MOV R2, 100;        input size
SYSCALL;            ; R8 will be the number of bytes read

; read up to 100 bytes from the file with filename at 0x410000 (the user input)
; and store its content at 0x420000.
MOV R8, 3;          syscall READFILE
MOV R1, 0x410000;   file name address
MOV R2, 0x420000;   buffer address
MOV R3, 100;        buffer size
SYSCALL;            ; R8 will be the number of bytes read

; prints the file content at 0x420000 to stdout
MOV R2, R8;         output size
MOV R8, 1;          syscall OUTPUT
MOV R1, 0x420000;   output address
SYSCALL;            ; R8 will be the number of bytes written

; exits the program with exit code 0
MOV R8, 2;          syscall EXIT
MOV R1, 0;          exit code
SYSCALL;            ; the program will be terminated
```

## Chapter 6: Version History

This chapter includes the version history of this document.

| Version | Date         | Description                                        |
| ------- | ------------ | -------------------------------------------------- |
| v1.0.0  | Nov 10, 2023 | The initial release of Bauhinia ISA documentation. |