The Bauhinia Instruction Set Architecture Documentation, version 1.1.0
===
## Chapter 1: Overview
Bauhinia Instruction Set Architecture (Bauhinia ISA) is an assembly language designed and implemented by [Black Bauhinia](https://b6a.black) for HKCERT CTF 2023 and 2024. Please consider sponsoring us through [Patreon](https://www.youtube.com/watch?v=dQw4w9WgXcQ) to support our further development.
Notably, this architecture document borrows heavily from Volume 2 of the Intel Architecture Software Developer's Manual. If you upload the document to a _plagiarism detection_ service, it will definitely fail.
### Designers and Authors
This section lists the designers and authors of the Bauhinia ISA. If you hate the architecture, please direct your hate to them. Black Bauhinia bears no responsibility for their design and implementation.
- The architecture is designed by [@khlung1](https://twitter.com/Khlung1) and [@harrier_lcc](https://twitter.com/harrier_lcc).
- The challenges in 2024 are prepared by [@khlung1](https://twitter.com/Khlung1) and [@harrier_lcc](https://twitter.com/harrier_lcc), and the challenges in 2023 were prepared by [@khlung1](https://twitter.com/Khlung1), [@harrier_lcc](https://twitter.com/harrier_lcc) and [@mystiz613](https://twitter.com/mystiz613).
- The document is compiled by [@khlung1](https://twitter.com/Khlung1), [@harrier_lcc](https://twitter.com/harrier_lcc), [@mystiz613](https://twitter.com/mystiz613) and [@vikychoi](https://twitter.com/vikychoi).
:::info
If you have any questions about the architecture document during the contest, please reach out to the authors by opening a ticket on the HKCERT CTF 2024 Discord server.
:::
### Special Thanks
- Thanks @TrebledJ for spotting the typo in descriptions for `LT` and `LTu`. (22:44, Nov 10, 2023)
- Thanks @peace_ranger for spotting the typos in descriptions for `NEQ`, `LTu` and `PUSH`; as well as pointing out that `SYSCALL` outputs to `R8` (02:23, Nov 11, 2023)
- Thanks @harrier for spotting the typo in description for `MOV` (10:10, Nov 12, 2023)
- Thanks @Soar for spotting the typo in `mem` definition at Instruction Column (13:35, Nov 12, 2023)
- Thanks @peace_ranger for spotting the typo in descriptions for `Invalid instruction` and `Segmentation fault` (20:57, Nov 9, 2024)
### Instruction Operands
An instruction of Bauhinia ISA has the following format:
```asm
mnemonic argument1, argument2
```
where:
- a **mnemonic** is a reserved name for a class of instruction operators which have the same function, and
- the **operands** _argument1_ and _argument2_ are optional. There may be zero to two operands, depending on the operator. When present, they take the form of either literals or identifiers for data items. Operand identifiers are either reserved names of registers or are assumed to be assigned to data items declared in another part of the program (which may not be shown in the example).
When two operands are present in an arithmetic or logical instruction, the right operand is the source and the left operand is the destination.
For example: `MOV R3, 1337` is an instruction. In this case, `MOV` is the mnemonic identifier of an operator, `R3` is the destination operand and `1337` is the source operand.
Note that the mnemonic and operands are case-sensitive, and a space (` `) character is required between the mnemonic and the first operand. The operands are comma-separated.
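The format above can be illustrated with a minimal Python sketch. The function name `parse_instruction` is hypothetical, and the sketch assumes that whitespace around the commas is ignored; the real interpreter may be stricter.

```python
def parse_instruction(line):
    """Split one instruction into (mnemonic, [operands]).

    Assumption: operands are separated by commas and surrounding
    whitespace is ignored; the real interpreter may be stricter.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
    return mnemonic, operands


print(parse_instruction("MOV R3, 1337"))  # ('MOV', ['R3', '1337'])
print(parse_instruction("RET"))           # ('RET', [])
```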
### Registers
There are eight general-purpose registers, namely, `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7` and `R8`.
There are three registers with specific roles:
- `PC`, the program counter
- `FP`, the frame pointer (or the stack base pointer)
- `SP`, the stack pointer
## Chapter 2: Interpreter
### Execution
The interpreter parses an instruction by capturing the text between the program counter (`PC`) and the next newline character. We will use the below code snippet as an example:
```
MOV R1, 0x1337; assigns 0x1337 to R1
MOV R2, 0xdeadbeef
```
Suppose that `PC` is currently at the beginning of the first line. The instruction captured will be
```
MOV R1, 0x1337; assigns 0x1337 to R1
```
After that, it strips away the comment by removing everything from the first semicolon onwards.
```
MOV R1, 0x1337
```
Note that a [`NOP`](#NOP-No-operation) instruction will be executed if the resulting string is empty. The program counter will advance to the beginning of the next line after the instruction is executed. This does _not_ apply when a jump is taken by the [`JMP`](#JMP-Unconditional-jump), [`JZ`](#JZ-Jump-if-zero) or [`JNZ`](#JNZ-Jump-if-not-zero) instructions.
Additionally, instructions are case sensitive.
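The fetch step described in this section might look roughly like the following Python sketch. The names `fetch`, `code` and `pc_offset` are hypothetical, and what happens when `PC` reaches the end of the program is deliberately not modelled here.

```python
def fetch(code, pc_offset):
    """Return (instruction_text, next_offset) for the line starting at pc_offset.

    `code` is the program text and `pc_offset` is PC minus the code base;
    both names are hypothetical.
    """
    end = code.find("\n", pc_offset)
    if end == -1:            # last line without a trailing newline
        end = len(code)
    line = code[pc_offset:end]
    text = line.split(";", 1)[0]   # drop everything from the first semicolon
    if text == "":
        text = "NOP"               # an empty result is executed as NOP
    return text, end + 1


program = "MOV R1, 0x1337; assigns 0x1337 to R1\nMOV R2, 0xdeadbeef\n"
print(fetch(program, 0))   # ('MOV R1, 0x1337', 37)
print(fetch(program, 37))  # ('MOV R2, 0xdeadbeef', 56)
```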
### Memory regions and addresses
Bauhinia ISA defines three memory segments, listed in the table below:
| Segment name | Segment address | Segment size | Permission |
| ------------ | --------------- | ------------ | ---------- |
| Code | `0x00400000` | `0x100000` | `R-X` |
| Bss | `0x00500000` | `0x10000` | `RW-` |
| Stack | `0xfff00000` | `0x100000` | `RW-` |
Any access to memory apart from these regions will be considered invalid.
#### Permission
There are three types of access permission for each segment: read (R), write (W) and execute (X). You can only:
- read from memory if the `R` permission is set on its segment,
- write to memory if the `W` permission is set on its segment, and
- execute from memory if the `X` permission is set on its segment.
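As a rough illustration, the segment table and permission rules can be expressed as a lookup in Python. The names `SEGMENTS` and `is_allowed` are hypothetical, and the sketch checks a single address only; it does not model multi-byte accesses that straddle a segment boundary.

```python
SEGMENTS = [
    # (name, base address, size, permissions) copied from the table above
    ("Code",  0x00400000, 0x100000, "R-X"),
    ("Bss",   0x00500000, 0x10000,  "RW-"),
    ("Stack", 0xfff00000, 0x100000, "RW-"),
]


def is_allowed(address, access):
    """Return True if `access` ('R', 'W' or 'X') is permitted at `address`."""
    for _name, base, size, perms in SEGMENTS:
        if base <= address < base + size:
            return access in perms
    return False  # outside every segment: invalid access


print(is_allowed(0x00400000, "X"))  # True
print(is_allowed(0x00400000, "W"))  # False
print(is_allowed(0xdeadbeef, "R"))  # False
print(is_allowed(0xffeffff0, "W"))  # False
```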
### Initial values
When the program executes, the registers and the memory will be initialized by the following values:
- `PC` will be set to the beginning of the code segment, i.e., `0x00400000`,
- `FP` and `SP` will be set to `0xfffffff0`, and
- the code segment will be set to the defined program in string.
Take the below program as an example.
```
MOV R1, 0x1337; assigns 0x1337 to R1
MOV R2, 0xdeadbeef
```
When the above program begins, `PC` will be set to `0x00400000`, and `FP` and `SP` will be set to `0xfffffff0`. The figure below shows the memory between `0x00400000` and `0x00400040`.
<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ByLLenqQp.png" style="width: 500px;"></p>
### Limitations
The interpreter keeps track of the number of steps executed. If there are more than `MAX_STEP_COUNT` steps, the interpreter will stop processing and return a non-zero exit code, stating that the [step count exceeded](#Exit-code-65-Step-count-exceeded) the limit. At the moment, `MAX_STEP_COUNT` is defined to be 131072.
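A step budget of this kind could be sketched in Python as follows. The `run` function and its `step` callback are hypothetical, and the exact boundary (whether the 131072nd step itself is still allowed) is an assumption.

```python
MAX_STEP_COUNT = 131072
EXIT_STEP_COUNT_EXCEEDED = 65


def run(step, max_steps=MAX_STEP_COUNT):
    """Call `step()` until it returns an exit code or the budget runs out.

    `step` is a hypothetical callback that executes one instruction and
    returns None to continue, or an integer exit code to stop.
    """
    for _ in range(max_steps):
        exit_code = step()
        if exit_code is not None:
            return exit_code
    return EXIT_STEP_COUNT_EXCEEDED


print(run(lambda: None))  # 65: the "program" never terminates
```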
## Chapter 3: Exit codes
This section introduces the set of exit codes implemented by Bauhinia ISA.
| Exit Code | Description |
| ---------:| ------------------- |
| 0 | OK |
| 3 | Invalid instruction |
| 5 | Segmentation fault |
| 63 | Invalid program |
| 64 | Bad server config |
| 65 | Step count exceeded |
| 66 | Stack Smash detected |
### Exit code `0`: OK
This exit code indicates that the program exited successfully.
### Exit code `3`: Invalid instruction
This exit code will be triggered when an invalid instruction is found at the program counter (`PC`). This includes, but is not limited to:
1. Having an invalid instruction. For example, `mov R1, 0x1337` (instructions are case-sensitive).
1. Assigning a value to an immediate. For example, `MOV 0x400000, 0x1337`.
1. Accessing `PC`. For example, `JMP PC`.
Notably, if no system call terminates the program, the program will likely return exit code 3. This is because execution reaches the end of the program and the interpreter is unable to parse the next instruction.
### Exit code `5`: Segmentation fault
This exit code will be triggered when attempting to read, write or execute memory at an invalid address. All addresses outside the code region, the stack region and the bss region are considered invalid. (See [Memory regions and addresses](#Memory-regions-and-addresses))
Below are examples of code snippets that will return exit code 5:
1. Reading from address `0xdeadbeef`:
```
MOV R1, 0xdeadbeef
MOV R2, [R1]
```
1. Writing to address `0xffeffff0`, assuming that `FP` is `0xfffffff0`:
```
MOV [FP-0x100000], 0x1337
```
1. Executing instructions on address `0x13371337`. Note that the below instruction will run successfully, but it will fail when `PC` is at `0x13371337`:
```
JMP 0x13371337
```
### Exit code `63`: Invalid program
This exit code will be triggered when the program is rejected by a checker imposed by the challenge. For example, in the _Jump Scare_ challenge, the below Python checker is implemented:
```python=
def checker(code):
    # No comments allowed!
    if ';' in code: return False
    # All instruction should be a JMP instruction!
    for line in code.split('\n'):
        if not line.startswith('JMP '): return False
    return True
```
All the below input code would yield an "invalid program" error, because they do not pass the check:
1. The first line contains a comment. This is rejected on line 3 of the above Python snippet:
```
JMP 0x400000; hello world :)
JMP 0x400000
```
1. The second line does not start with `JMP`. This is rejected on line 6 above:
```
JMP 0x400000
NOP
```
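To see the checker in action, the sketch below repeats its logic and runs it on the two rejected programs, plus two extra inputs that are not from the original text. Note that a trailing newline produces an empty last line, which this particular checker also rejects.

```python
def checker(code):
    # Same logic as the Jump Scare checker quoted above
    if ';' in code:
        return False
    for line in code.split('\n'):
        if not line.startswith('JMP '):
            return False
    return True


print(checker("JMP 0x400000; hello world :)\nJMP 0x400000"))  # False
print(checker("JMP 0x400000\nNOP"))                            # False
print(checker("JMP 0x400000\nJMP 0x400000"))                   # True
print(checker("JMP 0x400000\n"))                               # False (empty last line)
```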
### Exit code `64`: Bad server config
This exit code will be triggered when the server is misconfigured. Please report it to the admins if this happens.
### Exit code `65`: Step count exceeded
This exit code will be triggered when the program has been executed for more than `MAX_STEP_COUNT` steps.
Please refer to the [interpreter limitations](#Limitations) on the value of `MAX_STEP_COUNT`.
For example, the below program would yield a "step count exceeded" error because it does not terminate within `MAX_STEP_COUNT` steps (in reality, it would never terminate).
```
JMP 0x400000
```
### Exit code `66`: Stack Smash detected
This exit code will be triggered when the `canary` is modified.
A Bauhinia ISA program may use a `canary` as a security measure to detect stack buffer overflows.
A random value called the `canary` is placed before the return address and is checked before a function returns. If the canary value has been modified, it indicates that a buffer has overflowed, and the program exits immediately with this exit code.
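The idea can be illustrated with a toy stack frame in Python. The frame layout (a 16-byte buffer, then a 32-bit canary, then the return address), the little-endian encoding and all names below are assumptions made for illustration only, not the actual layout used by the challenges.

```python
import secrets
import struct

BUF_SIZE = 16  # hypothetical local buffer size


def make_frame(return_address):
    """Build a toy frame: [buffer][canary][return address] (layout assumed)."""
    canary = secrets.randbits(32)
    frame = bytearray(BUF_SIZE) + struct.pack("<II", canary, return_address)
    return frame, canary


def canary_intact(frame, canary):
    """Check performed before the function returns."""
    (stored,) = struct.unpack_from("<I", frame, BUF_SIZE)
    return stored == canary


frame, canary = make_frame(0x00400123)
frame[:8] = b"A" * 8                 # write stays inside the buffer
print(canary_intact(frame, canary))  # True

frame[:24] = b"A" * 24               # overflow reaches canary and return address
print(canary_intact(frame, canary))  # False, barring a 1-in-2**32 collision
```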
## Chapter 4: Instruction Reference
This chapter provides detailed descriptions of each of the Bauhinia ISA instructions.
### Interpreting the Instruction Reference
#### Instruction Format
The following is an example of the format used for each Bauhinia ISA instruction description in this chapter:
| Instruction | Description |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `MOV r/m, imm` | Copy the immediate value from second operand (source operand) to the register or memory of the first operand (destination operand) |
| `MOV r/m, reg` | Copy the register in the second operand (source operand) to the register or memory of the first operand (destination operand) |
#### Instruction Column
The "Instruction" column gives the syntax of the instruction statement as it would appear in a Bauhinia ISA program. The following is a list of the symbols used to represent operands in the instruction statements:
- `imm` (Immediate) -- A 32-bit value used for instructions. It can be any number between 0 (`0x00000000`) and 4294967295 (`0xffffffff`) inclusive. Immediates can be written in base 2, base 8 or base 16 by prepending `0b`, `0o` or `0x` respectively, or in base 10 if there is no prefix.
- `reg` (Register) -- A register used for instructions. The registers that can be used are `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `FP` and `SP`.
- `mem` (Memory) -- A 32-bit address that represents the memory. The contents of memory are found at the address provided by the effective address computation. It must be one of `[reg]`, `[reg+imm]`, `[reg-imm]` or `[reg*imm]`.
- `r/m` (Register or memory) -- Either a register (`reg`) or a memory location (`mem`).
##### Examples
<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ByLLenqQp.png" style="width: 500px;"></p>
- `12345` is an immediate that represents the number $12345_{10}$.
- `0b10001` is an immediate that represents the number $17_{10}$, or $\texttt{10001}_2$.
- `0o13371337` is an immediate that represents the number $3011295_{10}$, or $\texttt{13371337}_8$.
- `0x00f` is an immediate that represents the number $15_{10}$, or $\texttt{f}_{16}$.
- `R1`, `R2`, `R3`, `R4`, `R5`, `R6`, `R7`, `R8`, `FP` and `SP` are the valid registers.
- `[R1]` is a memory operand that refers to the value stored at address `R1`.
For instance, if `R1 = 0x400000` then `[R1]` would be `0x20564f4d`.
- `[R1+0x20]` is a memory operand that refers to the value stored at address `R1+0x20`.
For instance, if `R1 = 0x400000` then `[R1+0x20]` would be `0x3152206f`.
- `[R1-8]` is a memory operand that refers to the value stored at address `R1-8`.
For instance, if `R1 = 0x400010` then `[R1-8]` would be `0x33317830`.
- `[R1*4]` is a memory operand that refers to the value stored at address `R1*4`.
For instance, if `R1 = 0x100000` then `[R1*4]` would be `0x20564f4d`.
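The example values can be reproduced with a short Python sketch. It assumes the code segment holds the two-line example program from Chapter 2 and that 32-bit values are read in little-endian byte order, which is consistent with the values listed above. The helper names `parse_imm` and `read32` are hypothetical.

```python
import struct

CODE_BASE = 0x00400000
PROGRAM = b"MOV R1, 0x1337; assigns 0x1337 to R1\nMOV R2, 0xdeadbeef\n"


def parse_imm(text):
    """Parse an immediate; base 0 lets int() honour 0b, 0o and 0x prefixes."""
    value = int(text, 0)
    if not 0 <= value <= 0xffffffff:
        raise ValueError("immediate out of 32-bit range")
    return value


def read32(address):
    """Read a 32-bit little-endian value from the code segment."""
    return struct.unpack_from("<I", PROGRAM, address - CODE_BASE)[0]


print(parse_imm("0b10001"), parse_imm("0o13371337"), parse_imm("0x00f"))
# 17 3011295 15
print(hex(read32(0x400000)))         # 0x20564f4d  ([R1], R1 = 0x400000)
print(hex(read32(0x400000 + 0x20)))  # 0x3152206f  ([R1+0x20])
print(hex(read32(0x400010 - 8)))     # 0x33317830  ([R1-8], R1 = 0x400010)
print(hex(read32(0x100000 * 4)))     # 0x20564f4d  ([R1*4], R1 = 0x100000)
```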
#### Description Column
The "Description" column following the "Instruction" column briefly explains the various forms of the instruction. The following "Description" sections contain more details of the instruction's operation.
### `JMP`: Unconditional jump
| Instruction | Description |
| ----------- | ------------------------ |
| `JMP reg` | Jump to reg, absolutely |
| `JMP imm` | Jump to imm, absolutely |
| `JMP +imm` | Jump to +imm, relatively |
| `JMP -imm` | Jump to -imm, relatively |
#### Description
Transfers program counter to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.
#### Examples
- `JMP R1`: Sets `PC` to the value of `R1`.
- `JMP 0x401337`: Sets `PC` to `0x401337`.
- `JMP +10`: After running this `JMP` instruction, increases `PC` by 10.
To understand the third example properly, you need to first understand how `PC` moves for normal instructions.
```
0x400000: MOV R1, 1;
⬆️
0x40000b: MOV R2, 1;
```
After running the first instruction, `PC` will move to the start of the next instruction (i.e. `0x40000b`, since the first line occupies 10 characters plus a newline):
```
0x400000: MOV R1, 1;
0x40000b: MOV R2, 1;
⬆️
```
For a relative `JMP`, the offset is applied to the value of `PC` after the `JMP` instruction is executed. For example:
```
0x400000: JMP +11;
0x400009: MOV R2, 1;
0x400014: MOV R1, 1;
```
After running `JMP`, `PC` goes to `0x400014`. This is because `PC` is first moved to `0x400009` after the instruction is executed, and the offset `+11` is then added.
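The arithmetic in this example can be checked with a small Python sketch. The function name `relative_target` is hypothetical, and the sketch assumes each line occupies its characters plus one newline byte.

```python
CODE_BASE = 0x400000
PROGRAM = "JMP +11;\nMOV R2, 1;\nMOV R1, 1;\n"


def relative_target(pc, offset):
    """Return the new PC for a relative jump located at `pc`."""
    start = pc - CODE_BASE
    line_end = PROGRAM.index("\n", start)
    next_pc = CODE_BASE + line_end + 1   # PC after the JMP line is consumed
    return next_pc + offset


print(hex(relative_target(0x400000, 11)))  # 0x400014
```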
### `JZ`: Jump if zero
| Instruction | Description |
| ----------- | --------------------------------- |
| `JZ reg` | Jump to reg, absolutely, if zero |
| `JZ imm` | Jump to imm, absolutely, if zero |
| `JZ +imm` | Jump to +imm, relatively, if zero |
| `JZ -imm` | Jump to -imm, relatively, if zero |
#### Description
Transfers program counter to a different point in the instruction stream without recording return information if the stack top is zero. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.
Note that this pops the current stack top.
### `JNZ`: Jump if not zero
| Instruction | Description |
| ----------- | ------------------------------------- |
| `JNZ reg` | Jump to reg, absolutely, if not zero |
| `JNZ imm` | Jump to imm, absolutely, if not zero |
| `JNZ +imm` | Jump to +imm, relatively, if not zero |
| `JNZ -imm` | Jump to -imm, relatively, if not zero |
#### Description
Transfers program counter to a different point in the instruction stream without recording return information if the stack top is non-zero. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.
Note that this pops the current stack top.
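The conditional jumps can be modelled with a Python list standing in for the stack. This is only a sketch for absolute targets; the function names and the `pc_next` parameter (the address of the following instruction) are hypothetical.

```python
def jz(stack, pc_next, target):
    """JZ: pop the stack top and jump to `target` if it is zero."""
    top = stack.pop()
    return target if top == 0 else pc_next


def jnz(stack, pc_next, target):
    """JNZ: pop the stack top and jump to `target` if it is non-zero."""
    top = stack.pop()
    return target if top != 0 else pc_next


stack = [0]
print(hex(jz(stack, 0x400010, 0x400100)), stack)   # 0x400100 []
stack = [0]
print(hex(jnz(stack, 0x400010, 0x400100)), stack)  # 0x400010 []
```

In both cases the stack top is consumed whether or not the jump is taken.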
### `MOV`: Move
| Instruction | Description |
| -------------- | ------------------- |
| `MOV r/m, imm` | Move `imm` to `r/m` |
| `MOV r/m, reg` | Move `reg` to `r/m` |
| `MOV reg, r/m` | Move `r/m` to `reg` |
#### Description
Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, a register, or a memory location; the destination operand can be a register or a memory location.
Note that the two operands cannot both be memory locations.
### `NOT`: One's complement negation
| Instruction | Description |
| ----------- | ----------- |
| `NOT r/m` | Negate r/m |
#### Description
Performs a one's complement negation on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.
### `AND`: Logical AND
| Instruction | Description |
| -------------- | ----------- |
| `AND r/m, reg` | r/m AND reg |
| `AND reg, r/m` | reg AND r/m |
| `AND r/m, imm` | r/m AND imm |
#### Description
Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.
### `OR`: Logical OR
| Instruction | Description |
| ------------- | ----------- |
| `OR r/m, reg` | r/m OR reg |
| `OR reg, r/m` | reg OR r/m |
| `OR r/m, imm` | r/m OR imm |
#### Description
Performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.
### `XOR`: Logical exclusive OR
| Instruction | Description |
| -------------- | ----------- |
| `XOR r/m, reg` | r/m XOR reg |
| `XOR reg, r/m` | reg XOR r/m |
| `XOR r/m, imm` | r/m XOR imm |
#### Description
Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. However, two memory operands cannot be used in one instruction.
### `SAL`: Shift arithmetic left
| Instruction | Description |
| -------------- | ---------------------------- |
| `SAL r/m, imm` | Multiply r/m by 2, imm times |
| `SAL r/m, reg` | Multiply r/m by 2, reg times |
#### Description
The shift arithmetic left (SAL) instruction shifts the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the least significant bit is cleared.
### `SAR`: Shift arithmetic right
| Instruction | Description |
| -------------- | --------------------------------- |
| `SAR r/m, imm` | Signed divide r/m by 2, imm times |
| `SAR r/m, reg` | Signed divide r/m by 2, reg times |
#### Description
The shift arithmetic right (SAR) instruction shifts the bits in the destination operand to the right (toward less significant bit locations). The SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand.
### `SHL`: Shift left
| Instruction | Description |
| -------------- | ---------------------------- |
| `SHL r/m, imm` | Multiply r/m by 2, imm times |
| `SHL r/m, reg` | Multiply r/m by 2, reg times |
#### Description
The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation.
### `SHR`: Shift right
| Instruction | Description |
| -------------- | ------------------------------------ |
| `SHR r/m, imm` | Unsigned divide r/m by 2, imm times. |
| `SHR r/m, reg` | Unsigned divide r/m by 2, reg times. |
#### Description
Shift logical right (SHR) instructions shift the bits of the destination operand to the right (toward less significant bit locations). For each shift count, the SHR instruction clears the most significant bit.
### `ROL`: Rotate left
| Instruction | Description |
| -------------- | ------------------------- |
| `ROL r/m, imm` | Rotate r/m left imm times |
| `ROL r/m, reg` | Rotate r/m left reg times |
#### Description
The rotate left (ROL) instructions shift all the bits toward more-significant bit positions, except for the most-significant bit, which is rotated to the least-significant bit location.
### `ROR`: Rotate right
| Instruction | Description |
| -------------- | -------------------------- |
| `ROR r/m, imm` | Rotate r/m right imm times |
| `ROR r/m, reg` | Rotate r/m right reg times |
#### Description
The rotate right (ROR) instructions shift all the bits toward less significant bit positions, except for the least-significant bit, which is rotated to the most-significant bit location.
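The shift and rotate families can be compared side by side with a Python sketch on 32-bit values. The function names are hypothetical, and the handling of shift counts of 32 or more is an assumption (shifts are applied literally, rotates are reduced modulo 32); the real interpreter may differ.

```python
MASK = 0xffffffff


def shl(x, n):
    """SHL/SAL: shift left, clearing the low bits."""
    return (x << n) & MASK


def shr(x, n):
    """SHR: logical shift right, clearing the high bits."""
    return (x & MASK) >> n


def sar(x, n):
    """SAR: arithmetic shift right, replicating the sign bit."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK


def rol(x, n):
    """ROL: bits leaving the top re-enter at the bottom."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def ror(x, n):
    """ROR: bits leaving the bottom re-enter at the top."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK


print(hex(shl(0x80000001, 1)))  # 0x2
print(hex(shr(0x80000000, 4)))  # 0x8000000
print(hex(sar(0x80000000, 4)))  # 0xf8000000
print(hex(rol(0x80000001, 1)))  # 0x3
print(hex(ror(0x80000001, 1)))  # 0xc0000000
```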
### `ADD`: Addition
| Instruction | Description |
| -------------- | -------------- |
| `ADD r/m, reg` | Add reg to r/m |
| `ADD reg, r/m` | Add r/m to reg |
| `ADD r/m, imm` | Add imm to r/m |
#### Description
Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. However, two memory operands cannot be used in one instruction.
### `SUB`: Subtraction
| Instruction | Description |
| -------------- | --------------------- |
| `SUB r/m, reg` | Subtract reg from r/m |
| `SUB reg, r/m` | Subtract r/m from reg |
| `SUB r/m, imm` | Subtract imm from r/m |
#### Description
Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. However, two memory operands cannot be used in one instruction.
### `MUL`: Signed multiplication
| Instruction | Description |
| -------------- | ------------------------------ |
| `MUL reg, reg` | Signed multiply reg with reg |
#### Description
Performs a signed multiplication of the first operand (destination operand) and the second operand (source operand). The lower 32 bits of the result would be stored in the destination operand and the higher 32 bits of the result would be stored in the source operand.
### `MULu`: Unsigned multiplication
| Instruction | Description |
| --------------- | ------------------------------ |
| `MULu reg, reg` | Unsigned multiply reg with reg |
#### Description
Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand). The lower 32 bits of the result would be stored in the destination operand and the higher 32 bits of the result would be stored in the source operand.
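The difference between `MUL` and `MULu` shows up in the upper half of the 64-bit product. The Python sketch below assumes two's complement interpretation for the signed case; the function names are hypothetical, and the behaviour when both operands are the same register is not modelled.

```python
MASK = 0xffffffff


def to_signed(x):
    """Interpret a 32-bit value as two's complement (assumed encoding)."""
    return x - (1 << 32) if x & 0x80000000 else x


def mulu(dst, src):
    """MULu: return (new_dst, new_src) = (low 32 bits, high 32 bits)."""
    product = (dst & MASK) * (src & MASK)
    return product & MASK, (product >> 32) & MASK


def mul(dst, src):
    """MUL: same split, but the operands are treated as signed."""
    product = to_signed(dst) * to_signed(src)
    return product & MASK, (product >> 32) & MASK


print([hex(v) for v in mulu(0xffffffff, 2)])  # ['0xfffffffe', '0x1']
print([hex(v) for v in mul(0xffffffff, 2)])   # ['0xfffffffe', '0xffffffff']
```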
### `DIV`: Signed divide
| Instruction | Description |
| -------------- | ---------------------------- |
| `DIV reg, reg` | Signed division reg with reg |
#### Description
Performs a signed division with the first operand (dividend) divided by the second operand (divisor). The resulting quotient would be stored in the first operand and the remainder would be stored in the second operand.
### `DIVu`: Unsigned divide
| Instruction | Description |
| --------------- | ------------------------------ |
| `DIVu reg, reg` | Unsigned division reg with reg |
#### Description
Performs an unsigned division with the first operand (dividend) divided by the second operand (divisor). The resulting quotient would be stored in the first operand and the remainder would be stored in the second operand.
### `EQ`: Compare for equality
| Instruction | Description |
| -------------- | ------------------------ |
| `EQ imm, imm` | Compare if imm equal imm |
| `EQ r/m, reg`  | Compare if r/m equal reg |
| `EQ reg, r/m` | Compare if reg equal r/m |
| `EQ r/m, imm` | Compare if r/m equal imm |
| `EQ imm, r/m` | Compare if imm equal r/m |
#### Description
Compares the first source operand with the second source operand and pushes the result on the stack. If the operands are the same, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `NEQ`: Compare for inequality
| Instruction | Description |
| -------------- | ---------------------------- |
| `NEQ imm, imm` | Compare if imm not equal imm |
| `NEQ r/m, reg` | Compare if r/m not equal reg |
| `NEQ reg, r/m` | Compare if reg not equal r/m |
| `NEQ r/m, imm` | Compare if r/m not equal imm |
| `NEQ imm, r/m` | Compare if imm not equal r/m |
#### Description
Compares the first source operand with the second source operand and pushes the result on the stack. If the operands are different, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `GT`: Compare for signed greater than
| Instruction | Description |
| ------------- | --------------------------------- |
| `GT imm, imm` | Compare if imm signed greater imm |
| `GT r/m, reg` | Compare if r/m signed greater reg |
| `GT reg, r/m` | Compare if reg signed greater r/m |
| `GT r/m, imm` | Compare if r/m signed greater imm |
| `GT imm, r/m` | Compare if imm signed greater r/m |
#### Description
Compares the signed first source operand with the signed second source operand and pushes the result on the stack. The most significant bit would be treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is greater than the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `GTu`: Compare for unsigned greater than
| Instruction | Description |
| -------------- | ----------------------------------- |
| `GTu imm, imm` | Compare if imm unsigned greater imm |
| `GTu r/m, reg` | Compare if r/m unsigned greater reg |
| `GTu reg, r/m` | Compare if reg unsigned greater r/m |
| `GTu r/m, imm` | Compare if r/m unsigned greater imm |
| `GTu imm, r/m` | Compare if imm unsigned greater r/m |
#### Description
Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is unsigned greater than the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
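The same pair of 32-bit values can compare differently under `GT` and `GTu`. The Python sketch below assumes two's complement for the signed interpretation; the function names are hypothetical and return the value that would be pushed onto the stack.

```python
def to_signed(x):
    """Interpret a 32-bit value as two's complement (assumed encoding)."""
    return x - (1 << 32) if x & 0x80000000 else x


def gt(a, b):
    """GT: signed comparison; returns the value that would be pushed."""
    return 1 if to_signed(a) > to_signed(b) else 0


def gtu(a, b):
    """GTu: unsigned comparison."""
    return 1 if a > b else 0


print(gt(0xffffffff, 1))   # 0: 0xffffffff is -1 when signed
print(gtu(0xffffffff, 1))  # 1
```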
### `GTE`: Compare for signed greater than or equal
| Instruction | Description |
| -------------- | ----------------------------------------------- |
| `GTE imm, imm` | Compare if imm signed greater than or equal imm |
| `GTE r/m, reg` | Compare if r/m signed greater than or equal reg |
| `GTE reg, r/m` | Compare if reg signed greater than or equal r/m |
| `GTE r/m, imm` | Compare if r/m signed greater than or equal imm |
| `GTE imm, r/m` | Compare if imm signed greater than or equal r/m |
#### Description
Compares the signed first source operand with the signed second source operand and pushes the result on the stack. The most significant bit would be treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is greater than or equal to the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `GTEu`: Compare for unsigned greater than or equal
| Instruction     | Description                                       |
| --------------- | ------------------------------------------------- |
| `GTEu imm, imm` | Compare if imm unsigned greater than or equal imm |
| `GTEu r/m, reg` | Compare if r/m unsigned greater than or equal reg |
| `GTEu reg, r/m` | Compare if reg unsigned greater than or equal r/m |
| `GTEu r/m, imm` | Compare if r/m unsigned greater than or equal imm |
| `GTEu imm, r/m` | Compare if imm unsigned greater than or equal r/m |
#### Description
Compares the first source operand with the second source operand and pushes the result on the stack. If the first source operand is unsigned greater than or equal to the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `LT`: Compare for signed less than
| Instruction | Description |
| ------------- | ------------------------------------- |
| `LT imm, imm` | Compare if imm signed less than imm |
| `LT r/m, reg` | Compare if r/m signed less than reg   |
| `LT reg, r/m` | Compare if reg signed less than r/m |
| `LT r/m, imm` | Compare if r/m signed less than imm |
| `LT imm, r/m` | Compare if imm signed less than r/m |
#### Description
Compares the first source operand with the second source operand and pushes the result on the stack. If the first source operand is signed less than the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `LTu`: Compare for unsigned less than
| Instruction | Description |
| -------------- | ----------------------------------- |
| `LTu imm, imm` | Compare if imm unsigned less than imm |
| `LTu r/m, reg` | Compare if r/m unsigned less than reg |
| `LTu reg, r/m` | Compare if reg unsigned less than r/m |
| `LTu r/m, imm` | Compare if r/m unsigned less than imm |
| `LTu imm, r/m` | Compare if imm unsigned less than r/m |
#### Description
Compares the unsigned first source operand with the unsigned second source operand and pushes the result on the stack. If the first source operand is unsigned less than the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `LTE`: Compare for signed less than or equal
| Instruction | Description |
| --------------- | -------------------------------------------- |
| `LTE imm, imm` | Compare if imm signed less than or equal imm |
| `LTE r/m, reg`  | Compare if r/m signed less than or equal reg |
| `LTE reg, r/m`  | Compare if reg signed less than or equal r/m |
| `LTE r/m, imm` | Compare if r/m signed less than or equal imm |
| `LTE imm, r/m` | Compare if imm signed less than or equal r/m |
#### Description
Compares the signed first source operand with the signed second source operand and pushes the result on the stack. The most significant bit would be treated as a sign bit (0 for a positive number and 1 for a negative number). If the first source operand is signed less than or equal to the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `LTEu`: Compare for unsigned less than or equal
| Instruction | Description |
| --------------- | ---------------------------------------------- |
| `LTEu imm, imm` | Compare if imm unsigned less than or equal imm |
| `LTEu r/m, reg` | Compare if r/m unsigned less than or equal reg |
| `LTEu reg, r/m` | Compare if reg unsigned less than or equal r/m |
| `LTEu r/m, imm` | Compare if r/m unsigned less than or equal imm |
| `LTEu imm, r/m` | Compare if imm unsigned less than or equal r/m |
#### Description
Compares the first source operand with the second source operand and pushes the result on the stack. If the first source operand is unsigned less than or equal to the second source operand, value 1 would be pushed to the stack, else value 0 would be pushed instead.
### `CALL`: Call procedure
| Instruction | Description |
| ----------- | ----------------------------------- |
| `CALL imm` | Call absolute, address given in imm |
| `CALL r/m` | Call absolute, address given in r/m |
#### Description
Saves `PC` on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a register, or a memory location.
### `RET`: Return from procedure
| Instruction | Description |
| ----------- | --------------------------- |
| `RET` | Return to calling procedure |
#### Description
Transfers program control to the return address located on the top of the stack. The address is usually placed on the stack by a `CALL` instruction, and the return is made to the instruction that follows the `CALL` instruction.
Note that the return address is popped from the top of the stack.
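
The interplay between `CALL` and `RET` can be sketched with a minimal Python model. The class name `Machine`, the explicit `return_address` parameter, and the addresses used are hypothetical; the document does not specify instruction lengths, so the sketch takes the address of the following instruction as an input instead of computing it.

```python
class Machine:
    """Toy model holding only a program counter and a stack."""

    def __init__(self, pc: int = 0):
        self.pc = pc
        self.stack = []  # the last element is the top of the stack

    def call(self, target: int, return_address: int) -> None:
        # Save the address of the instruction after CALL, then branch.
        self.stack.append(return_address)
        self.pc = target

    def ret(self) -> None:
        # Pop the return address from the top of the stack and jump to it.
        self.pc = self.stack.pop()
```

A `call` followed by a `ret` leaves the stack as it was and resumes at the saved address, provided the called procedure has balanced its own pushes and pops.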
### `SYSCALL`: System call
| Instruction | Description |
| ----------- | -------------------------------- |
| `SYSCALL` | System call to system procedures |
#### Description
Calls the system procedure whose syscall number is specified in `R8`.
A `SYSCALL` takes at most 7 arguments in addition to the syscall number in `R8`. `R1` holds the first argument, `R2` holds the second, and so on.
None of the implemented system calls takes more than three arguments. The table below gives a brief summary of the system calls. Please refer to the [system calls reference](#Chapter-5-System-Calls-Reference) for details.
| Syscall | `R8` | `R1` | `R2` | `R3` | Return value at `R8` |
| -------- | ---- | ---------------- | -------------- | ----------- | -------------------- |
| [Input](#Input-Syscall) | `0` | Buffer address | Input length | - | Input length |
| [Output](#Output-Syscall) | `1` | Buffer address | Output length | - | Output length |
| [Exit](#Exit-Syscall) | `2` | Exit code | - | - | None |
| [Readfile](#Readfile-Syscall) | `3` | Filename address | Buffer address | Buffer size | Number of bytes read |
| [Listfile](#Listfile-Syscall) | `4` | - | - | - | Number of files listed |
| [Exec](#Exec-Syscall) | `5` | Filename address | - | - | None |
| [Download](#Download-Syscall) | `6` | Filename address | URL address | - | Number of bytes downloaded |
| [Random](#Random-Syscall) | `7` | - | - | - | Random number |
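
When writing tooling such as a tracer or disassembler, the table above could be encoded as a lookup structure. The Python sketch below is one possible encoding; the names `SYSCALLS` and `decode_syscall` and the register dictionary layout are assumptions for illustration.

```python
# Syscall number -> (name, descriptions of the arguments in R1, R2, R3 order)
SYSCALLS = {
    0: ("Input", ("buffer address", "input length")),
    1: ("Output", ("buffer address", "output length")),
    2: ("Exit", ("exit code",)),
    3: ("Readfile", ("filename address", "buffer address", "buffer size")),
    4: ("Listfile", ()),
    5: ("Exec", ("filename address",)),
    6: ("Download", ("filename address", "URL address")),
    7: ("Random", ()),
}


def decode_syscall(regs: dict) -> tuple:
    """Return (name, {argument description: value}) for the state before SYSCALL."""
    name, params = SYSCALLS[regs["R8"]]
    args = {param: regs[f"R{i}"] for i, param in enumerate(params, start=1)}
    return name, args
```

For instance, a register state with `R8 = 3` decodes to the Readfile syscall with its three arguments taken from `R1`, `R2` and `R3`.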
### `PUSH`: Push a value onto the stack
| Instruction | Description |
| ----------- | ----------- |
| `PUSH imm` | Push imm |
| `PUSH r/m` | Push r/m |
#### Description
Decrements the stack pointer and then stores the source operand on the top of the stack.
### `POP`: Pop a value from the stack
| Instruction | Description |
| ----------- | ----------------------------------------------------- |
| `POP r/m` | Pop top of stack into r/m and increment stack pointer |
| `POP reg` | Pop top of stack into reg and increment stack pointer |
#### Description
Loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a register or memory location.
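
The ordering of the stack pointer update in `PUSH` and `POP` can be sketched in Python as below. The 4-byte slot size is inferred from the `[SP+4*XXX]` notation used for `COPY`; the class name `Stack`, the sparse `mem` dictionary and the initial stack pointer are hypothetical.

```python
WORD = 4  # assumed size of one stack slot in bytes


class Stack:
    def __init__(self, top: int):
        self.sp = top   # hypothetical initial stack pointer
        self.mem = {}   # sparse memory: address -> 32-bit word

    def push(self, value: int) -> None:
        self.sp -= WORD                          # decrement first
        self.mem[self.sp] = value & 0xFFFFFFFF   # then store at the new top

    def pop(self) -> int:
        value = self.mem[self.sp]                # load from the top first
        self.sp += WORD                          # then increment
        return value
```

In this model the stack grows towards lower addresses, and `SP` always points at the most recently pushed value.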
### `SWAP`: Swap two values
| Instruction | Description |
| ----------- | ----------------------------------------------------------- |
| `SWAP imm` | Swap the top of the stack with the imm-th item on the stack |
| `SWAP r/m` | Swap the top of the stack with the r/m-th item on the stack |
#### Description
Exchanges the top value on the stack with the $i$-th item on the stack, where $i$ is specified by the source operand.
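
A minimal Python sketch of this exchange follows. It assumes that items are counted from the top of the stack starting at 0, the same convention that `COPY` uses with `[SP+4*XXX]`; the document does not state the indexing for `SWAP` explicitly, so treat this as an assumption. The function name `swap` is hypothetical.

```python
def swap(stack: list, i: int) -> None:
    """Swap the top of the stack with the i-th item from the top.

    The list's last element is the top of the stack; i = 0 is assumed
    to denote the top itself.
    """
    if not 0 <= i < len(stack):
        raise IndexError("item index outside the stack")
    top = len(stack) - 1
    stack[top], stack[top - i] = stack[top - i], stack[top]
```

Under this convention, swapping with index 0 leaves the stack unchanged.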
### `COPY`: Copy value on stack
| Instruction | Description |
| ----------- | ---------------------------------------------- |
| `COPY imm` | Push the imm-th item on the stack to the stack |
| `COPY r/m` | Push the r/m-th item on the stack to the stack |
#### Description
Pushes the value contained in the $i$-th item on stack specified by the source operand to the top of the stack. Symbolically, `COPY XXX` is equivalent to `PUSH [SP+4*XXX]`.
#### Example
```
PUSH 0x13370
PUSH 0x13371
PUSH 0x13372
PUSH 0x13373
PUSH 0x13374
PUSH 0x13375
COPY 1
```
In the above code snippet, six values are pushed to the stack: `0x13370`, `0x13371`, `0x13372`, `0x13373`, `0x13374` and `0x13375`. The figure below illustrates the effect of the last instruction, `COPY 1`:
<p style="text-align: center;"><img src="https://hackmd.io/_uploads/ryCJ2womp.png" style="width: 300px;"></p>
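
The example can also be traced with a short Python sketch that follows the stated equivalence `PUSH [SP+4*XXX]`. It assumes that `SP` is read before the push decrements it, so index 0 refers to the current top of the stack; the function name `copy` is hypothetical.

```python
def copy(stack: list, i: int) -> None:
    """Push a copy of the i-th item from the top (0 = top) onto the stack."""
    if not 0 <= i < len(stack):
        raise IndexError("item index outside the stack")
    stack.append(stack[len(stack) - 1 - i])


stack = []  # the last element is the top of the stack
for value in (0x13370, 0x13371, 0x13372, 0x13373, 0x13374, 0x13375):
    stack.append(value)  # PUSH

copy(stack, 1)  # COPY 1
```

Under that assumption, the value one slot below the top, `0x13374`, ends up duplicated on the top of the stack, and the stack holds seven items.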
### `NOP`: No operation
| Instruction | Description |
| ----------- | ------------------------------------------------------------------------------------- |
| `NOP` | Does nothing and allows the processor to advance to the next instruction in the program |
#### Description
Performs no operation. It is a one-byte or multi-byte NOP that takes up space in the instruction stream but does not impact machine context, except for the PC register.
## Chapter 5: System Calls Reference
This chapter describes all the system calls available in Bauhinia ISA, which can be called via the `SYSCALL` instruction.
### Interpreting the System Calls Reference
Each syscall description begins with its system call (syscall) number. This is the value that must be set in `R8` before executing the `SYSCALL` instruction. Depending on the system call, some of `R1`, `R2`, up to `R7` may also be required; they serve as the arguments to that call.
### Input Syscall
#### Description
Syscall number 0. Reads input from the user and stores it in the specified buffer at `R1`. It will attempt to read at most `R2` bytes.
#### Input
- `R1`: Buffer address where input data will be stored.
- `R2`: Size of the buffer.
#### Output
- `R8`: Input length (number of characters read from the user).
### Output Syscall
#### Description
Syscall number 1. Outputs data from the specified buffer at `R1` to the user. It will attempt to write at most `R2` bytes.
#### Input
- `R1`: Buffer address containing data to be output.
- `R2`: Size of the data to be output.
#### Output
- `R8`: Output length (number of characters successfully output).
### Exit Syscall
#### Description
Syscall number 2. Terminates the program with the specified exit code at `R1`.
#### Input
- `R1`: Exit code indicating the reason for program termination.
#### Output
None
### Readfile Syscall
#### Description
Syscall number 3. Reads the content of the specified file into the provided buffer until reading `R3` bytes or reaching the end of the file. If the operation fails, returns `0xffffffff`.
#### Input
- `R1`: File name address indicating the name of the file to be read.
- `R2`: Buffer address where the content of the file will be stored.
- `R3`: Size of the buffer.
#### Output
- `R8`: If successful, returns the number of characters read from the file. Otherwise, returns `0xffffffff`.
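
Because a failed read is reported through the same register as a byte count, a caller should check `R8` for `0xffffffff` before using it as a length. The Python sketch below illustrates the check from the point of view of a host-side tool; the names `READFILE_ERROR` and `readfile_result` are hypothetical.

```python
from typing import Optional

READFILE_ERROR = 0xFFFFFFFF  # failure value documented for the Readfile syscall


def readfile_result(r8: int, buffer: bytes) -> Optional[bytes]:
    """Return the bytes actually read, or None if the syscall reported failure."""
    if r8 == READFILE_ERROR:
        return None
    return buffer[:r8]
```

Checking for the failure value first avoids treating `0xffffffff` as a very large length in a later Output syscall.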
### Listfile Syscall
#### Description
Syscall number 4. Lists all files in the filesystem. The output is in the following format:
```
<file1_name>
<file2_name>
<file3_name>
...
```
#### Input
None
#### Output
- `R8`: If successful, returns the number of files listed.
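
A captured listing in the format above can be turned into a list of names with a few lines of Python. This sketch assumes one file name per line separated by `\n`, possibly with a trailing newline; the function name `parse_listing` and the file names in the usage note are hypothetical.

```python
def parse_listing(output: str) -> list:
    """Split the Listfile output into file names, ignoring empty lines."""
    return [line for line in output.split("\n") if line]
```

For a listing such as `"flag.txt\nmain.bin\n"`, the function returns the two names, and the length of the result can be compared against the count returned in `R8`.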
### Exec Syscall
#### Description
Syscall number 5. Loads and executes a file with the filename at `R1` from the filesystem.
#### Input
- `R1`: File name address indicating the name of the file to be executed.
#### Output
None
### Download Syscall
#### Description
Syscall number 6. Accesses the internet, visits the URL provided at `R2`, retrieves the response data and saves it into a file with the name given at `R1`. If the size of the response data is larger than `0x10000` bytes or the operation fails, it throws an error.
#### Input
- `R1`: File name address indicating the name of the file where the downloaded data will be saved.
- `R2`: URL address indicating the URL to be downloaded.
#### Output
- `R8`: If successful, returns the size of the downloaded file in bytes. Otherwise, it throws an error.
### Random Syscall
#### Description
Syscall number 7. Generates and returns a random number.
#### Input
None
#### Output
- `R8`: Returns a random 32-bit number.
### Example
The Bauhinia ISA code below implements a sample function that reads a file. It performs the following steps:
1. Read the input from `stdin`
2. Read the file with the filename given by the user
3. Write the file to `stdout`
4. Exit the program with exit code 0
```asm
; read the input from the user, up to 100 bytes, and save to 0x410000
MOV R8, 0; syscall INPUT
MOV R1, 0x410000; input address
MOV R2, 100; input size
SYSCALL;
; R8 will be the number of bytes read
; read up to 100 bytes from the file with filename at 0x410000 (the user input)
; and stores its content at 0x420000.
MOV R8, 3; syscall READFILE
MOV R1, 0x410000; file name address
MOV R2, 0x420000; buffer address
MOV R3, 100; buffer size
SYSCALL;
; R8 will be the number of bytes read
; prints the file content at 0x420000 to stdout
MOV R2, R8; output size
MOV R8, 1; syscall OUTPUT
MOV R1, 0x420000; output address
SYSCALL;
; R8 will be the number of bytes written
; exits the program with exit code 0
MOV R8, 2; syscall EXIT
MOV R1, 0; exit code
SYSCALL;
; the program will be terminated
```
## Chapter 6: Version History
This chapter includes the version history of this document.
| Version | Date | Description |
| ------- | ------------ | -------------------------------------------------------------------- |
| v1.0.0 | Nov 10, 2023 | The initial release of Bauhinia ISA documentation. |
| v1.0.1 | Nov 12, 2023 | Fixed multiple incorrectly documented features pointed out by users. |
| v1.1.0 | Nov 8, 2024 | Added Bss segment, syscalls: listfile, exec, download and random, exit codes: stack smash detected; updated exit codes for invalid instruction and segmentation fault; implemented segment permission |