Assembly Syntax for EOF Stack Instructions

EOF introduces new stack instructions with immediate operands: DUPN, SWAPN, and EXCHANGE. The operands determine what elements of the stack will be accessed and manipulated.

There's no standard or obvious way to represent these operands in human-readable assembly syntax, and different developers may come up with different solutions. If the representation ends up subtly different across tools (e.g., off by one), the same assembly instruction may behave differently depending on which syntax variant it is read as. People and tools may be misled, and in the worst case this could be used to conceal malicious behavior. Unclear communication about these opcodes could also lead to bugs in various places.

Because of this, we should standardize the representation of the operands of each of these instructions.

I'll describe three alternative approaches:

  1. Verbatim: The operands are written in assembly with the same numerical value that is present in the bytecode, probably as a decimal number.
  2. Traditional: The operands are written in a way that matches the numbering used in the original DUP* and SWAP* instructions.
  3. Exact: The operands are written to reflect the actual stack heights touched by the instruction in 1-based indexing.

The final choice could be a combination of these approaches.

Notation

The notation <bytecode> <-> <assembly> means that <bytecode> disassembles to <assembly> and that <assembly> assembles to <bytecode>.

I'll use a space to separate an opcode from its immediate operand, like DUPN 4, but the discussion should apply equally to different syntax, like DUPN[4].

DUPN and SWAPN

Running examples and their effect described in 1-based indexing:

  • e6 03: DUPN instruction that duplicates the 4th stack element.
  • e7 03: SWAPN instruction that swaps the 5th stack element with the 1st.
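
The two running examples can be checked against a small stack model. This is a minimal sketch, not a reference implementation: the stack is a Python list with index 0 as the top, and the general rules (DUPN touches position immediate + 1, SWAPN touches position immediate + 2) are an assumption generalized from the two examples above. The helper names `dupn` and `swapn` are made up for illustration.

```python
def dupn(stack, imm):
    """Duplicate the (imm + 1)-th element (1-based) onto the top of the stack."""
    return [stack[imm]] + stack


def swapn(stack, imm):
    """Swap the top element with the (imm + 2)-th element (1-based)."""
    out = list(stack)
    out[0], out[imm + 1] = out[imm + 1], out[0]
    return out


stack = ["a", "b", "c", "d", "e", "f"]  # index 0 is the top of the stack

print(dupn(stack, 0x03))   # e6 03: copies the 4th element, 'd', to the top
print(swapn(stack, 0x03))  # e7 03: swaps the 5th element, 'e', with the 1st
```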

1. Verbatim

Examples:

  • e6 03 <-> DUPN 3
  • e7 03 <-> SWAPN 3

2. Traditional

According to this approach, for example, DUPN 4 should be functionally equivalent to DUP4, and SWAPN 4 should be functionally equivalent to SWAP4.

The assembly operand is thus the numerical value in the bytecode plus one.

Examples:

  • e6 03 <-> DUPN 4
  • e7 03 <-> SWAPN 4

3. Exact

In this approach, the SWAPN operand is the bytecode value plus two, while the DUPN operand is the bytecode value plus one. Immediates that are equal in bytecode are therefore not numerically equal in assembly.

Examples:

  • e6 03 <-> DUPN 4
  • e7 03 <-> SWAPN 5

Note that SWAPN 5 is thus functionally equivalent to SWAP4.
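
The three approaches for DUPN and SWAPN differ only in the offset between the bytecode immediate and the assembly operand, which can be sketched as a lookup table. The names `OFFSETS`, `disassemble`, and `assemble` are hypothetical and do not come from any existing tool; the offsets are read off the examples above.

```python
# Offset added to the bytecode immediate to obtain the assembly operand.
OFFSETS = {
    "verbatim":    {"DUPN": 0, "SWAPN": 0},
    "traditional": {"DUPN": 1, "SWAPN": 1},
    "exact":       {"DUPN": 1, "SWAPN": 2},
}


def disassemble(mnemonic, imm, convention):
    """Render a bytecode immediate as assembly text under one convention."""
    return f"{mnemonic} {imm + OFFSETS[convention][mnemonic]}"


def assemble(text, convention):
    """Parse assembly text back into (mnemonic, bytecode immediate)."""
    mnemonic, operand = text.split()
    imm = int(operand) - OFFSETS[convention][mnemonic]
    if not 0 <= imm <= 0xFF:
        raise ValueError(f"operand out of range for {convention}: {text}")
    return mnemonic, imm


for convention in OFFSETS:
    print(convention, disassemble("DUPN", 0x03, convention),
          disassemble("SWAPN", 0x03, convention))
```

The same text assembles to different bytecode depending on the convention: `assemble("SWAPN 4", "traditional")` yields immediate 3, while `assemble("SWAPN 4", "exact")` yields immediate 2. This is the off-by-one hazard that motivates standardizing.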

EXCHANGE

EXCHANGE has a single-byte immediate, but conceptually two nibble-sized operands.

This can be confusing, so I'll use one running example, e8 13: the EXCHANGE instruction with the effect of swapping the 3rd stack element with the 7th (in 1-based indexing).

Visualizing the stack: a b C d e f G h … -> a b G d e f C h …
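
The stack picture above can be reproduced with a short sketch. The decoding rule used here is an assumption consistent with the running example: with high nibble `hi` and low nibble `lo`, the instruction swaps positions `hi + 2` and `hi + lo + 3` (1-based). The function names are illustrative only.

```python
def decode_exchange(imm):
    """Return the two 1-based stack positions swapped by EXCHANGE imm.

    Assumed rule: n = high nibble + 1, m = low nibble + 1,
    and positions n + 1 and n + m + 1 are swapped.
    """
    n = (imm >> 4) + 1
    m = (imm & 0x0F) + 1
    return n + 1, n + m + 1


def exchange(stack, imm):
    """Apply EXCHANGE imm to a stack given as a list with the top at index 0."""
    i, j = decode_exchange(imm)
    out = list(stack)
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out


stack = list("abCdefGh")
print(decode_exchange(0x13))            # positions touched by e8 13
print("".join(exchange(stack, 0x13)))   # stack after the exchange
```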

1. Verbatim

There are two ways to interpret the verbatim approach.

  • One operand:
    • e8 13 <-> EXCHANGE 0x13 (in decimal EXCHANGE 19)
  • Two operands:
    • e8 13 <-> EXCHANGE 1 3

2. Traditional

There is no direct equivalent to EXCHANGE prior to EOF, so this alternative proposes to draw an equivalence with a sequence of SWAP or SWAPN instructions: EXCHANGE n m should be functionally equivalent to SWAPN n SWAPN m SWAPN n, with the SWAPN operands also written in the traditional syntax. For example, EXCHANGE 2 6 should be equivalent to SWAP2 SWAP6 SWAP2.

Example:

  • e8 13 <-> EXCHANGE 2 6
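
As a sanity check on the proposed equivalence, the sketch below applies SWAP2 SWAP6 SWAP2 to the example stack and confirms that only the 3rd and 7th elements change places. The `swap` helper models the legacy SWAPk instructions (swap the top with the element k positions below it); `exchange_traditional` is a hypothetical name for the three-swap sequence.

```python
def swap(stack, k):
    """Legacy SWAPk: swap the top element with the (k + 1)-th element (1-based)."""
    out = list(stack)
    out[0], out[k] = out[k], out[0]
    return out


def exchange_traditional(stack, n, m):
    """EXCHANGE n m in the traditional syntax, expressed as SWAPn SWAPm SWAPn."""
    return swap(swap(swap(stack, n), m), n)


stack = list("abCdefGh")  # index 0 is the top of the stack
print("".join(exchange_traditional(stack, 2, 6)))
```

The first SWAP2 brings C to the top, SWAP6 trades it for G, and the second SWAP2 puts G where C was and restores the original top.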

3. Exact

Example:

  • e8 13 <-> EXCHANGE 3 7
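
To compare the alternatives side by side, the following sketch renders one EXCHANGE immediate in each syntax discussed above. It assumes the same nibble decoding as the running example (positions `hi + 2` and `hi + lo + 3`, 1-based); `exchange_syntaxes` is a hypothetical helper, not part of any existing assembler.

```python
def exchange_syntaxes(imm):
    """Render an EXCHANGE immediate under each candidate assembly syntax."""
    hi, lo = imm >> 4, imm & 0x0F
    first = hi + 2            # 1-based position of the shallower element
    second = first + lo + 1   # 1-based position of the deeper element
    return {
        "verbatim (one operand)":  f"EXCHANGE {imm:#04x}",
        "verbatim (two operands)": f"EXCHANGE {hi} {lo}",
        "traditional":             f"EXCHANGE {first - 1} {second - 1}",
        "exact":                   f"EXCHANGE {first} {second}",
    }


for name, text in exchange_syntaxes(0x13).items():
    print(f"e8 13 <-> {text}    ({name})")
```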