EVM bytecode programming

1. Environment setup

Run a local testnet using dapptools:

dapp testnet

In a second terminal, source your testnet environment:

source ~/env-testnet

Wait, you don't have a testnet env file? Here it is:

export ETH_KEYSTORE=~/.dapp/testnet/8545/keystore
export ETH_FROM=$(cat ~/.dapp/testnet/8545/config/account | head -n 1)
export ETH_FROM_2=$(cat ~/.dapp/testnet/8545/config/account | tail -n 1)
export ETH_PASSWORD=/dev/null
export ETH_GAS=1000000
export ETH_RPC_URL=http://localhost:8545

Always source that file after running the testnet.

All right, let's do this!

2. The simplest smart contract ever

00

That's it.

That opcode means stop. Not revert or anything. Just stop. So if you send a transaction, any transaction to that smart contract, it will accept it. Isn't that cool?

3. Deploying the simplest smart contract ever

Run the following:

seth send --create 0x00

Did you run it? Please run it now.

$ seth send --create 0x00
seth-send: Published transaction with 1 bytes of calldata.
seth-send: 0x74fc23e06481497e0df23476341e1595dd4b5b887337dc5a102e0e9c45a05f63
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 25.
0x551c27e50aa07c52770924d1de13ac55b0186ec2

Congratulations! You did it! Or did you?

4. The return statement

Actually, you didn't deploy shit. Take a look:

$ seth code 0x551c27e50aa07c52770924d1de13ac55b0186ec2
0x

Did you really think it was gonna be that easy? C'mon. What did you think this was?

In order to actually deploy something, you need to include code that returns the code you actually want to deploy:

f3

f3 means return. You're finishing the execution and returning something back. In this case, as you're sending a transaction with no recipient (a contract-creation transaction), that means you're deploying a contract with whatever code you return.

So run this:

seth send --create 0x00f3

Nah, I'm just messing with you. That won't work either.

5. Memory and stack

I promise that very soon you'll be able to deploy stuff. Trust me.

f3 actually returns stuff that's stored in memory. For instance, let's assume that in memory position number 0 you have value 0 and you want to deploy that. So you have to let f3 know about the position in memory and the length of what you're interested in. You do that by putting it into the stack:

60 01
60 00
f3

Now we're talking. 60 means push. Push to the stack, that is. And whatever comes after 60 is the value you're pushing.

So in the code above, we're pushing a 01 and then a 00 into the stack, and then we're calling f3. What f3 does internally is take these two values from the stack and interpret them as the input it needs in order to return something from memory.

So it's like function parameters, but they're in the stack.

In this case, we're telling f3 to return 01 bytes of information from memory position 00. Why 01 though? Because we only wanna return a single byte of code, namely 00.
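
If the push order feels backwards, here's a tiny Python sketch of what's going on. It's a toy model, not real EVM code: `stack`, `memory` and `ret` are names made up for illustration. The last thing you push is the first thing f3 takes:

```python
# Toy model of the EVM stack: a Python list whose end is the top of the stack.
stack = []
memory = bytearray(32)  # memory starts out zeroed

stack.append(0x01)  # 60 01 -> push the length
stack.append(0x00)  # 60 00 -> push the memory position


def ret(stack, memory):
    """Toy model of f3: take the memory position first, then the length."""
    offset = stack.pop()
    length = stack.pop()
    return bytes(memory[offset:offset + length])


returned = ret(stack, memory)
print(returned.hex())  # 00
```

So f3 empties those two stack positions and hands back one byte of memory.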

6. Putting stuff into memory

Now that we know how to return something from memory, let's learn how to put something in memory that we want to return. In this case, some code.

And then, we will be able to deploy!

So the command to put things into memory is 52. You tell it what it is you wanna put in memory and where in memory you wanna put it, and then you call 52:

60 00
60 00
52

In our case, we just wanna put a 00 in memory position 00. That's all.
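
One detail worth knowing: 52 doesn't write a single byte, it writes a whole 32-byte word. Here's a rough Python model of that. The `mstore` helper is hypothetical and ignores gas and the finer rules of how memory grows:

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Toy model of 52: write `value` as a 32-byte big-endian word at `offset`."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # memory grows as needed
    memory[offset:end] = value.to_bytes(32, "big")


memory = bytearray()
mstore(memory, 0x00, 0x00)   # 60 00, 60 00, 52
after_zero = bytes(memory)
print(len(after_zero))       # 32

mstore(memory, 0x00, 0x20)   # a non-zero value lands in the LAST byte of the word
print(memory.hex())
```

With a 00 you can't tell the difference, but with any other value you'll see it sitting at the end of the 32 bytes, with zeroes in front.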

7. Deploying our first bytecode smart contract, for real

60 00 // put 00 in the stack
60 00 // put another 00 in the stack
52    // store 00 in memory position 00
60 01 // put 01 in the stack
60 00 // put 00 in the stack
f3    // return 1 byte of data from memory position 00

Makes sense, or what? Let's try it:

$ seth send --create 600060005260016000f3
seth-send: Published transaction with 10 bytes of calldata.
seth-send: 0xabb42b4054173a797f61b8ed3d93e06810c200374be500cb6bd44b27dcf25ccc
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 32.
0xbd6101666fb519155af79f377793444069a08899

And there you have it. Your first smart contract written directly in bytecode. Take a look:

$ seth code 0xbd6101666fb519155af79f377793444069a08899
0x0

You see? You just deployed a zero to the blockchain. Power to you.

8. A "hola mundo" string

Let's try something else. Let's deploy some code that actually does something. How about returning a "hola mundo" string?

$ seth --from-ascii 'hola mundo' | seth --to-bytes32
0x686f6c61206d756e646f00000000000000000000000000000000000000000000

There you have it. We need to return that.

But also, let's retain compatibility with Solidity, because what can you do. In Solidity, strings are dynamically-sized arrays, so you first specify the string's location, then its length, and then the actual string:

$ seth --to-uint256 32; # string starts at position 32 (right after me) \
seth --to-uint256 10; # string has 10 characters \
seth --from-ascii 'hola mundo' | seth --to-bytes32
0x0000000000000000000000000000000000000000000000000000000000000020
0x000000000000000000000000000000000000000000000000000000000000000a
0x686f6c61206d756e646f00000000000000000000000000000000000000000000

Concat that removing the 0xs in the middle and there you have your string.
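
If you don't have seth at hand, the same three words can be built in plain Python. This is just a sketch: `to_word` is a made-up helper, and it only works for strings of up to 32 bytes:

```python
def to_word(n: int) -> bytes:
    """Encode an integer as a 32-byte big-endian word (like seth --to-uint256)."""
    return n.to_bytes(32, "big")


text = b"hola mundo"
encoded = (
    to_word(32)                 # where the string starts
    + to_word(len(text))        # how many characters it has
    + text.ljust(32, b"\x00")   # the characters, right-padded to 32 bytes
)
print("0x" + encoded.hex())
```

That prints the three values above glued together, 96 bytes in total.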

9. Actually returning the string

Let's work backwards. You wanna write some code that returns something. So let's start with that. What was the opcode that returns stuff? Do you remember? Hint: it starts with an f. Everything that's kinda related to finishing the execution starts with an f.

It's f3. So let's start with that:

f3

It takes two "arguments" as stack values. First, the memory position; then, the length of the stuff in memory you wanna return.

When I say first I mean the outermost element in the stack, and when I say second, I mean the innermost element, or the one that was pushed first. So first actually means last. But you know what I mean, right? Something like this:

[offset length]

That's the stack and it goes from left to right, so when you push something, you push it to the left of it. So f3 takes that, and then returns.

In this case, the length is gonna be 3 slots of 32 bytes, that is 96, or 0x60. And for the offset, let's put it in memory position 0:

60 60
60 00
f3

So you first push the length, then the offset in memory, then return. Now let's put that string in memory.

10. Putting stuff in memory (again)

Please tell me what was the opcode for putting stuff in memory. Did you put it in your memory?

It starts with a 5, as IO operations tend to do. It's 52. It takes two "arguments" from the stack, like so:

[offset value]

In this case, we want to put it in memory position 0. This goes against some convention, by the way. You're supposed to allocate some free memory space, but tbh I don't understand what's the use of that, so I'm not gonna do it.

60 00
52

Then we need to put the whole string in memory, but 52 only does it 32 bytes at a time. So we need three 52 operations. Let's start with the first 32 bytes and work from there.

60 20
60 00
52

The first 32 bytes are just zeroes and then a 20 at the end. Check. The second 32 bytes are very similar:

60 0a
60 20
52

Take into account that you always need to write two hex characters per byte, even if the first one is a zero, because the EVM has no other way to tell bytes apart. The spaces between bytes are just a human convention, and they'll be gone in the final bytecode.

Now, let's put the actual string in memory. This is gonna be a bit of a challenge.

11. Pushing many bytes to the stack

60 only pushes one byte to a stack position. But that's a problem, because we need to push 10 bytes this time. If we use 60 ten times, we will end up with 10 bytes in 10 stack positions. What we want is 10 bytes in a single stack position.

Introducing the 6* family. Each member of the 6* family pushes an increasingly large number of bytes to the stack. Which is confusing, because 61 doesn't push 1 byte, but two. So in this case, we want the 69, which pushes 10 bytes.

So we run seth --from-ascii 'hola mundo' and put the result here:

69 68 6f 6c 61 20 6d 75 6e 64 6f
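
If you don't feel like working out which member of the family you need, the rule is simple enough to sketch in Python. The `push` helper is made up for illustration; the family runs from 60 (1 byte) up to 7f (32 bytes):

```python
def push(data: bytes) -> bytes:
    """Return the right push opcode for `data`, followed by the data itself."""
    if not 1 <= len(data) <= 32:
        raise ValueError("a push takes between 1 and 32 bytes")
    opcode = 0x60 + len(data) - 1  # 60 pushes 1 byte, 61 pushes 2, ... 7f pushes 32
    return bytes([opcode]) + data


hola = push(b"hola mundo")
print(hola.hex(" "))  # 69 68 6f 6c 61 20 6d 75 6e 64 6f
```

Note that `bytes.hex` only accepts a separator from Python 3.8 onwards.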

Then we put it in the third memory slot (which, being a 32-byte slot, starts at 64, aka 0x40):

60 40
52

12. A recap of our "hola mundo" contract

// put a '32' in memory position '00'
60 20
60 00
52

// put a '10' in memory position '32'
60 0a
60 20
52

// put 'hola mundo' in memory position '64'
69 68 6f 6c 61 20 6d 75 6e 64 6f
60 40
52

// tell f3 the size and position of what we want to return
60 60
60 00

// return it
f3
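
Before deploying anything, you can sanity-check the code above with a toy interpreter. This is only a sketch under heavy assumptions: the `run` function is made up, it knows just the three kinds of opcode we've used so far (push, 52 and f3), and it ignores gas entirely:

```python
def run(code: bytes) -> bytes:
    """Run bytecode that only uses push (60 to 7f), 52 and f3; return f3's output."""
    stack, memory, pc = [], bytearray(), 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if 0x60 <= op <= 0x7f:            # push 1 to 32 bytes
            n = op - 0x5f
            stack.append(int.from_bytes(code[pc:pc + n], "big"))
            pc += n
        elif op == 0x52:                  # store a 32-byte word in memory
            offset, value = stack.pop(), stack.pop()
            if len(memory) < offset + 32:
                memory.extend(bytes(offset + 32 - len(memory)))
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0xf3:                  # return a slice of memory
            offset, length = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + length])
        else:
            raise NotImplementedError(f"opcode {op:02x}")
    return b""


runtime = bytes.fromhex(
    "6020600052"                # put 32 in memory position 0
    "600a602052"                # put 10 in memory position 32
    "69686f6c61206d756e646f"    # push 'hola mundo'
    "604052"                    # put it in memory position 64
    "60606000f3"                # return 96 bytes from memory position 0
)
output = run(runtime)
print(len(runtime), output.hex())
```

Look closely at the last 32 bytes of the output: 'hola mundo' ends up at the end of its slot, with the zeroes in front, because 52 writes the pushed value as a right-aligned word. The seth --to-bytes32 output in section 8 had the zeroes after the string, which is where a Solidity decoder would expect them.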

13. Deploying "hola mundo"

The above code returns "hola mundo". But in order to deploy that code, we need to return it in the transaction we make to the blockchain. So we need to write code that returns code that returns "hola mundo". Like, literally.

We kinda did that earlier, but at the time we only needed to deploy one byte. Now we need to deploy like 20, so putting it in the stack and then in memory would be pretty cumbersome. There must be something better. A way to copy code into memory.

That's called 39. Things that start with 3 tend to be related to the user input. In this case, we are the user, and we are inputting the code. So 39 copies code into memory, and we must tell it where in memory we want to copy it, and the length and offset of the code we wanna copy.

So we need to push the following things into the stack in order to use 39:

[memoryPosition codePosition length]

The first one (I mean, the last one we should push) is easy: we're just gonna put that code in memory position 0. The second one is tricky, so let's leave it for later, and the third one is the length of the code. Let's count how many bytes our hola mundo contract has.

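To me, it looks like it has 29 bytes. That's 1d. So let's go with that:

60 1d // copy 1d bytes
60 ?? // which are in code position ??
60 00 // to memory position 0
39    // do it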

Now. When we run the transaction, this is the first code that we're gonna put there, and then we're gonna put all the hola mundo code. So the code position is gonna be right after this block of code. So let's count how many bytes it has (counting also the ??, that is, the count itself) and replace ?? with that number.

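It looks like it has seven bytes. So, given it starts with 0, the position of the hola mundo code would be 07:

60 1d // copy 1d bytes
60 07 // which are in code position 07
60 00 // to memory position 0
39    // do it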

Oh shit. We forgot that we also need to return the code. Here we're merely copying it to memory. So the code position is gonna be larger than 07. Sorry about that. And welcome to the world of constantly counting bytes.

The code for returning would be

60 1d // return 1d bytes
60 00 // which are in memory position 0
f3

So putting it all together,

60 1d // copy 1d bytes
60 ?? // which are in code position ??
60 00 // to memory position 0
39    // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3    // do it

And now that we have it, we can count again the number of bytes we just wrote and replace ?? with it. It's 12, a.k.a. 0c:

60 1d // copy 1d bytes
60 0c // which are in code position 0c
60 00 // to memory position 0
39    // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3    // do it
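
If counting bytes by hand makes you nervous, you can let Python do it. This is a minimal sketch, assuming the contract code is shorter than 256 bytes so that every push is a plain 60; the `constructor` function and its template are made up for illustration:

```python
def constructor(runtime: bytes) -> bytes:
    """Build the deployment code from this section for the given contract code.

    Only handles contract code shorter than 256 bytes, since every push is a 60.
    """
    if len(runtime) > 0xff:
        raise ValueError("this sketch only uses one-byte pushes")
    template = "60{length:02x}60{offset:02x}60003960{length:02x}6000f3"
    # The contract code sits right after the deployment code, so its position
    # equals the size of the deployment code itself.
    prefix_size = len(bytes.fromhex(template.format(length=0, offset=0)))
    return bytes.fromhex(template.format(length=len(runtime), offset=prefix_size))


runtime = bytes.fromhex(
    "6020600052600a60205269686f6c61206d756e646f60405260606000f3"
)
deploy = constructor(runtime) + runtime
print(constructor(runtime).hex())  # 601d600c600039601d6000f3
print(len(deploy))                 # 41
```

The trick is the same one we did by hand: fill in a dummy value, measure the block, then fill in the real one.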

14. Deploying "hola mundo" (for real now)

Putting together the deployment (a.k.a. constructor) code with the actual contract code, we have the following:

60 1d // copy 1d bytes
60 0c // which are in code position 0c
60 00 // to memory position 0
39    // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3    // do it

// put a '32' in memory position '00'
60 20
60 00
52

// put a '10' in memory position '32'
60 0a
60 20
52

// put 'hola mundo' in memory position '64'
69 68 6f 6c 61 20 6d 75 6e 64 6f
60 40
52

// tell f3 the size and position of what we want to return
60 60
60 00

// return it
f3

Now we need to remove all the comments and put the bytecode together. Save this to a local file and do

sed 's/\/\/.*//g' file | tr '\n' ' ' | sed 's/ //g'

Now deploy the output with

$ sed 's/\/\/.*//g' file | tr '\n' ' ' | sed 's/ //g' | xargs seth send --create
seth-send: Published transaction with 41 bytes of calldata.
seth-send: 0xb30ce20c19fc6c5f7476c14b02987e69ae559d2f2fe01995aab49129ad6c526c
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 3.
0x0e2e77d95bb96308e22ce237f766c5db1fc31657
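
If sed is not your thing, the comment-stripping step can also be sketched in Python. The `assemble` function is a made-up name, and the sample below only uses the first four lines of the deployment code:

```python
import re


def assemble(source: str) -> str:
    """Drop // comments and all whitespace, leaving one long hex string."""
    without_comments = re.sub(r"//.*", "", source)
    hex_string = "".join(without_comments.split())
    bytes.fromhex(hex_string)  # raises ValueError if something isn't valid hex
    return hex_string


source = """
60 1d // copy 1d bytes
60 0c // which are in code position 0c
60 00 // to memory position 0
39    // do it
"""
print(assemble(source))  # 601d600c600039
```

The extra `bytes.fromhex` call is there only as a cheap typo check before you spend gas on a deployment.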

15. Calling "hola mundo"

$ seth call 0x0e2e77d95bb96308e22ce237f766c5db1fc31657
0x0000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000686f6c61206d756e646f

Great!

Now let's transform that into ascii to finally see our long-awaited message:

$ seth call 0x0e2e77d95bb96308e22ce237f766c5db1fc31657 | seth --to-ascii
 
hola mundo
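
You can also pick the returned data apart by hand. Here's a small Python sketch that reads the three 32-byte words from the seth call output above (the variable names are made up, and it assumes exactly the layout this contract returns):

```python
# The seth call output from above, without the leading 0x.
raw = bytes.fromhex(
    "00" * 31 + "20"                       # position of the string
    + "00" * 31 + "0a"                     # length of the string
    + "00" * 22 + "686f6c61206d756e646f"   # the slot holding the characters
)

position = int.from_bytes(raw[0:32], "big")
length = int.from_bytes(raw[position:position + 32], "big")
slot = raw[position + 32:position + 64]

print(position, length)                 # 32 10
# In this contract the characters sit at the end of the slot.
print(slot[-length:].decode("ascii"))   # hola mundo
```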

There you have it.

16. Next steps

Did you like bytecode programming? Check out my repo where I did a full ERC-20 implementation with bytecode. I also made some tools for easier processing of bytecode. Among other niceties, you don't have to count bytes anymore with these tools.

Also, check out https://www.ethervm.io/ for a full reference of all opcodes. https://github.com/crytic/evm-opcodes includes the gas cost of each, as well as https://github.com/wolflo/evm-opcodes. I'm not sure whether these pages are really up-to-date.

https://github.com/quilt/etk/ has a more robust framework for bytecode modification. It uses the opcode names instead of their actual numbers and allows for using multiple files.