# EVM bytecode programming

## 1. Environment setup

Run a local testnet using [dapptools](https://github.com/dapphub/dapptools):

```bash
dapp testnet
```

On a second terminal, source your testnet environment:

```bash
source ~/env-testnet
```

Wait, you don't have a testnet env file? Here it is:

```bash=
export ETH_KEYSTORE=~/.dapp/testnet/8545/keystore
export ETH_FROM=$(cat ~/.dapp/testnet/8545/config/account | head -n 1)
export ETH_FROM_2=$(cat ~/.dapp/testnet/8545/config/account | tail -n 1)
export ETH_PASSWORD=/dev/null
export ETH_GAS=1000000
export ETH_RPC_URL=http://localhost:8545
```

Always source that file *after* running the testnet.

Alright, let's do this!

## 2. The simplest smart contract ever

```
00
```

That's it. That opcode means `stop`. Not revert or anything. Just `stop`. So if you send a transaction, any transaction, to that smart contract, it will accept it. Isn't that cool?

## 3. Deploying the simplest smart contract ever

Run the following:

```bash
seth send --create 0x00
```

Did you run it? Please run it now.

```bash
$ seth send --create 0x00
seth-send: Published transaction with 1 bytes of calldata.
seth-send: 0x74fc23e06481497e0df23476341e1595dd4b5b887337dc5a102e0e9c45a05f63
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 25.
0x551c27e50aa07c52770924d1de13ac55b0186ec2
```

Congratulations! You did it! Or did you?

## 4. The return statement

Actually, you didn't deploy shit. Take a look:

```bash
$ seth code 0x551c27e50aa07c52770924d1de13ac55b0186ec2
0x
```

Did you really think it was gonna be that easy? C'mon. What did you think this was?

In order to actually deploy something, you need to include code that *returns* the code you actually want to deploy:

```
f3
```

`f3` means `return`. You're finishing the execution and returning something back. In this case, you're sending a contract-creation transaction (one with no recipient address). That means whatever code you return becomes the code of the new contract.
So run this:

```bash
seth send --create 0x00f3
```

Nah, I'm just messing with you. That won't work either. Execution hits `00` (`stop`) before `f3` ever runs, and `f3` needs some arguments anyway, as you're about to see.

## 5. Memory and stack

I promise that very soon you'll be able to deploy stuff. Trust me.

`f3` actually returns stuff that's stored in memory. For instance, let's assume that memory position number 0 holds the value 0 and you want to deploy that. So you have to let `f3` know the position in memory and the length of what you're interested in. You do that by putting them into the stack:

```
60 01 60 00 f3
```

Now we're talking. `60` means `push`. Push to the stack, that is. And whatever comes after `60` is the value you're pushing. So in the code above, we're pushing a `01` and then a `00` into the stack, and then we're calling `f3`.

What `f3` does internally is take these two values from the stack and interpret them as the input it needs in order to return something from memory. So it's like function parameters, but they're in the stack. In this case, we're telling `f3` to return `01` bytes of information from memory position `00`.

Why `01` though? Because we only wanna return a single byte of code, namely `00`.

## 6. Putting stuff into memory

Now that we know how to return something from memory, let's learn how to put something in memory that we want to return. In this case, some code. And then, we will be able to deploy!

So the command to put things into memory is `52`. You tell it what you wanna put in memory and where in memory you wanna put it, and then you call `52`:

```
60 00 60 00 52
```

In our case, we just wanna put a `00` in memory position `00`. That's all.

## 7. Deploying our first bytecode smart contract, for real

```
60 00 // put 00 in the stack
60 00 // put another 00 in the stack
52 // store 00 in memory position 00
60 01 // put 01 in the stack
60 00 // put 00 in the stack
f3 // return 1 byte of data from memory position 00
```

Makes sense, or what?
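You can check the reasoning above without a testnet. Here's a toy Python model of the EVM, written for illustration only. It is an assumption-laden sketch: it supports just `00`, the `60` to `7f` push family, `52` and `f3`, and it ignores gas, stack limits and every other opcode. The top of the stack is the end of a Python list, so the last value pushed is the first one `52` or `f3` pops.

```python
# Toy model of a tiny EVM subset, for illustration only.
# Assumption: no gas, no stack limit, only the opcodes used in this tutorial.

def run(code: bytes) -> bytes:
    """Execute `code` and return whatever `f3` (return) hands back."""
    stack: list[int] = []   # top of the stack is the END of this list
    memory = bytearray()    # memory starts out empty; new bytes are zero
    pc = 0                  # program counter

    def ensure(size: int) -> None:
        # Memory grows on demand and is zero-filled.
        if len(memory) < size:
            memory.extend(b"\x00" * (size - len(memory)))

    while pc < len(code):
        op = code[pc]
        if op == 0x00:                          # stop: return nothing
            return b""
        elif 0x60 <= op <= 0x7F:                # push1 .. push32
            n = op - 0x5F                       # 60 pushes 1 byte, 61 pushes 2, ...
            stack.append(int.from_bytes(code[pc + 1:pc + 1 + n], "big"))
            pc += n                             # skip over the pushed bytes
        elif op == 0x52:                        # mstore, stack is [offset value]
            offset, value = stack.pop(), stack.pop()
            ensure(offset + 32)                 # always writes a full 32-byte word
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        elif op == 0xF3:                        # return, stack is [offset length]
            offset, length = stack.pop(), stack.pop()
            ensure(offset + length)
            return bytes(memory[offset:offset + length])
        else:
            raise NotImplementedError(f"opcode {op:02x} not supported")
        pc += 1
    return b""  # running off the end of the code behaves like stop


print(run(bytes.fromhex("00")).hex())                    # empty: nothing returned
print(run(bytes.fromhex("600060005260016000f3")).hex())  # the single byte 00
```

Run on the section 7 bytecode, the model returns the single byte `00`. That byte is what the creation transaction below should leave behind as contract code.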
Let's try it:

```bash
$ seth send --create 600060005260016000f3
seth-send: Published transaction with 10 bytes of calldata.
seth-send: 0xabb42b4054173a797f61b8ed3d93e06810c200374be500cb6bd44b27dcf25ccc
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 32.
0xbd6101666fb519155af79f377793444069a08899
```

And there you have it. Your first smart contract written directly in bytecode. Take a look:

```bash
$ seth code 0xbd6101666fb519155af79f377793444069a08899
0x0
```

You see? You just deployed a zero to the blockchain. Power to you.

## 8. A "hola mundo" string

Let's try something else. Let's deploy some code that actually does something. How about returning a "hola mundo" string?

```bash
$ seth --from-ascii 'hola mundo' | seth --to-bytes32
0x686f6c61206d756e646f00000000000000000000000000000000000000000000
```

There you have it. We need to return that. But let's also retain compatibility with Solidity, because what can you do. In Solidity, strings are dynamically-sized arrays, so you first specify the position where the string data starts, then its length, and then the actual string:

```bash
$ seth --to-uint256 32; # string starts at position 32 (right after me) \
seth --to-uint256 10; # string has 10 characters \
seth --from-ascii 'hola mundo' | seth --to-bytes32
0x0000000000000000000000000000000000000000000000000000000000000020
0x000000000000000000000000000000000000000000000000000000000000000a
0x686f6c61206d756e646f00000000000000000000000000000000000000000000
```

Concatenate those, remove the `0x`s in the middle, and there you have your string.

## 9. Actually returning the string

Let's work backwards. You wanna write some code that returns something. So let's start with that. What was the opcode that returns stuff? Do you remember? Hint: it starts with an `f`. The `f*` opcodes are the system operations, and the ones that finish the execution (like `return` and `revert`) live there.

It's `f3`. So let's start with that:

```
f3
```

It takes two "arguments" as stack values.
First, the memory position; then, the length of the stuff in memory you wanna return. When I say *first* I mean the outermost element in the stack (its top), and when I say *second*, I mean the innermost element, or the one that was pushed first. So first actually means last. But you know what I mean, right? Something like this:

```
[offset length]
```

That's the stack, with its top on the left, so when you push something, it goes to the left of everything else. So `f3` takes that, and then returns.

In this case, the length is gonna be 3 slots of 32 bytes, that is 96, or 0x60. And for the offset, let's put it in memory position 0:

```
60 60 60 00 f3
```

So you first push the length, then the offset in memory, then return. Now let's put that string in memory.

## 10. Putting stuff in memory (again)

Please tell me the opcode for putting stuff in memory. Did you put it in *your* memory? It starts with a `5`, like the other memory and storage operations. It's `52`. It takes two "arguments" from the stack, like so:

```
[offset value]
```

In this case, we want to put it in memory position 0. This goes against a Solidity convention, by the way. Solidity-compiled code keeps a "free memory pointer" at memory position `0x40` and allocates memory from there. Nothing but our own code ever touches this memory, though, so I'm not gonna bother.

```
60 00 52
```

Then we need to push the whole string to memory, but `52` only does it 32 bytes at a time. So we need 3 `52` operations. Let's start with the first 32 bytes and work from there.

```
60 20 60 00 52
```

The first 32 bytes are just zeroes and then a `20` at the end. Check. The second 32 bytes are very similar:

```
60 0a 60 20 52
```

Take into account that you always need to write both hex digits of a byte, even if the first one is a zero. Once the spaces are gone, there's no other way to tell the bytes apart. The spaces between bytes are just a human convention, and they'll be removed before we deploy.

Now, let's put the actual string in memory. This is gonna be a bit of a challenge.

## 11. Pushing many bytes to the stack

`60` only pushes one byte to a stack position.
But that's a problem, because we need to push 10 bytes this time. If we use `60` ten times, we will end up with 10 bytes in 10 stack positions. What we want is 10 bytes in a single stack position.

Introducing the `6*` family. Each member of the `6*` family pushes an increasingly large number of bytes to the stack. Which is confusing, because `61` doesn't push `1` byte, but two. So in this case, we want the `69`, which pushes 10 bytes. The family continues into the `7*` range, all the way up to `7f`, which pushes 32 bytes.

So we run `seth --from-ascii 'hola mundo'` and put the result here:

```
69 68 6f 6c 61 20 6d 75 6e 64 6f
```

Then we put it in the third memory slot (which, being a 32-byte slot, starts at 64, aka 0x40):

```
60 40 52
```

One caveat: stack values are 32-byte numbers, so `52` writes our 10 bytes at the *end* of that slot, after 22 zero bytes. That differs from the `seth --to-bytes32` output in section 8, where the string sits at the start of the slot. You can spot the difference in the output of section 15. For a strictly ABI-compliant string, you could push the whole padded 32-byte word with `7f` instead. That makes the contract 22 bytes longer, and all the byte counts below would change accordingly.

## 12. A recap of our "hola mundo" contract

```
// put a '32' in memory position '00'
60 20 60 00 52
// put a '10' in memory position '32'
60 0a 60 20 52
// put 'hola mundo' in memory position '64'
69 68 6f 6c 61 20 6d 75 6e 64 6f 60 40 52
// tell f3 the size and position of what we want to return
60 60 60 00
// return it
f3
```

## 13. Deploying "hola mundo"

The above code returns "hola mundo". But in order to deploy that code, we need to return it in the transaction we send to the blockchain. So we need to write code that returns code that returns "hola mundo". Like, literally.

We kinda did that [earlier](#7-Deploying-our-first-bytecode-smart-contract-for-real), but at the time we only needed to deploy one byte. Now we need to deploy almost 30, so putting it in the stack and then in memory would be pretty cumbersome. There must be something better. A way to copy code into memory.

That's called `39`. Things that start with `3` tend to be related to the environment the code runs in, like the user input. In this case, we are the user, and we are inputting the code. So `39` copies code into memory. We must tell it where in memory we want to copy it, plus the length and offset of the code we wanna copy.
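The only new ingredient here is `39`. Here is a rough Python model of it, written for illustration and assuming we ignore gas and memory-expansion costs. The function name `codecopy` and its parameter names are hypothetical. The model copies a slice of the running code into memory, and any bytes past the end of the code read as zero.

```python
def codecopy(memory: bytearray, code: bytes,
             mem_pos: int, code_pos: int, length: int) -> None:
    """Model of 39: copy code[code_pos:code_pos+length] into memory at mem_pos."""
    chunk = code[code_pos:code_pos + length]
    chunk += b"\x00" * (length - len(chunk))    # reading past the end of the code yields zeros
    if len(memory) < mem_pos + length:          # memory grows (zero-filled) as needed
        memory.extend(b"\x00" * (mem_pos + length - len(memory)))
    memory[mem_pos:mem_pos + length] = chunk


# Hypothetical example: 3 bytes of "code", copy 4 bytes starting at code position 1.
mem = bytearray()
codecopy(mem, bytes.fromhex("aabbcc"), mem_pos=0, code_pos=1, length=4)
print(mem.hex())  # bbcc0000
```

In a creation transaction, the "code" being copied is the transaction data itself: the constructor followed by the contract code.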
So we need to push the following things into the stack in order to use `39`:

```
[memoryPosition codePosition length]
```

The first one (I mean, the last one we should push) is easy: we're just gonna put that code in memory position 0. The second one is tricky, so let's leave it for later. The third one is the length of the code. Let's count how many bytes our hola mundo contract has. To me, it looks like it has 29 bytes. That's `1d`. So let's go with that:

```
60 1d 60 ?? 60 00 39
```

Now. In the transaction we send, this block of code comes first, followed by all the hola mundo code. So the code position is gonna be right after this block of code. So let's count how many bytes it has (including the ??, that is, the count itself) and replace ?? with that number. It looks like it has seven bytes. So, given it starts at 0, the position of the hola mundo code would be `07`:

```
60 1d 60 07 60 00 39
```

Oh shit. We forgot that we also need to return the code. Here we're merely copying it to memory. So the code position is gonna be larger than `07`. Sorry about that. And welcome to the world of constantly counting bytes.

The code for returning would be

```
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3
```

So putting it all together,

```
60 1d // copy 1d bytes
60 ?? // which are in code position ??
60 00 // to memory position 0
39 // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3 // do it
```

And now that we have it, we can count again the number of bytes we just wrote and replace `??` with it. It's 12, a.k.a. `0c`:

```
60 1d // copy 1d bytes
60 0c // which are in code position 0c
60 00 // to memory position 0
39 // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3 // do it
```

## 14. Deploying "hola mundo" (for real now)

Putting together the deployment (a.k.a.
constructor) code with the actual contract code, we have the following:

```
60 1d // copy 1d bytes
60 0c // which are in code position 0c
60 00 // to memory position 0
39 // do it
60 1d // return 1d bytes
60 00 // which are in memory position 0
f3 // do it
// put a '32' in memory position '00'
60 20 60 00 52
// put a '10' in memory position '32'
60 0a 60 20 52
// put 'hola mundo' in memory position '64'
69 68 6f 6c 61 20 6d 75 6e 64 6f 60 40 52
// tell f3 the size and position of what we want to return
60 60 60 00
// return it
f3
```

Now we need to remove all the comments and put the bytecode together. Save this to a local file (called `file` here) and run

```bash
sed 's/\/\/.*//g' file | tr '\n' ' ' | sed 's/ //g'
```

Now deploy the output with

```bash
$ sed 's/\/\/.*//g' file | tr '\n' ' ' | sed 's/ //g' | xargs seth send --create
seth-send: Published transaction with 41 bytes of calldata.
seth-send: 0xb30ce20c19fc6c5f7476c14b02987e69ae559d2f2fe01995aab49129ad6c526c
seth-send: Waiting for transaction receipt...
seth-send: Transaction included in block 3.
0x0e2e77d95bb96308e22ce237f766c5db1fc31657
```

## 15. Calling "hola mundo"

```bash
$ seth call 0x0e2e77d95bb96308e22ce237f766c5db1fc31657
0x0000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000000a00000000000000000000000000000000000000000000686f6c61206d756e646f
```

Great! Now let's transform that into ASCII to finally see our long-awaited message:

```bash
$ seth call 0x0e2e77d95bb96308e22ce237f766c5db1fc31657 | seth --to-ascii
hola mundo
```

There you have it.

## 16. Next steps

Did you like bytecode programming? Check out [my repo](https://github.com/e18r/evm) where I did a full ERC-20 implementation with bytecode. I also made some tools for easier processing of bytecode. Among other niceties, these tools mean you don't have to count bytes anymore.

Also, check out https://www.ethervm.io/ for a full reference of all opcodes.
https://github.com/crytic/evm-opcodes includes the gas cost of each, as well as https://github.com/wolflo/evm-opcodes. I'm not sure whether these pages are really up-to-date. https://github.com/quilt/etk/ has a more robust framework for bytecode modification. It uses the opcode names instead of their actual numbers and allows for using multiple files.
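To get a feel for how that byte counting can be automated, here is a rough Python sketch. It is a hypothetical illustration, not the code from the repo above, and the helper names `assemble` and `with_constructor` are made up for this example. It strips comments and whitespace the way the `sed` pipeline in section 14 does. It then prepends the 12-byte constructor from section 13, with the length and code position filled in automatically. It assumes the contract code is shorter than 256 bytes, so a single `60` push is enough for the length.

```python
import re


def assemble(listing: str) -> bytes:
    """Strip // comments and whitespace, like the sed pipeline in section 14."""
    hex_digits = re.sub(r"//.*", "", listing)    # drop comments up to end of line
    hex_digits = re.sub(r"\s+", "", hex_digits)  # drop spaces and newlines
    return bytes.fromhex(hex_digits)


def with_constructor(runtime: bytes) -> bytes:
    """Prepend the copy-and-return constructor from section 13."""
    n = len(runtime)
    if n > 0xFF:
        raise ValueError("this sketch pushes the length with 60, so it must be < 256 bytes")
    header_len = 12  # 60 n 60 pos 60 00 39 60 n 60 00 f3 is 12 bytes long
    constructor = bytes([
        0x60, n, 0x60, header_len, 0x60, 0x00, 0x39,  # copy n bytes from code position 12 to memory 0
        0x60, n, 0x60, 0x00, 0xF3,                    # return n bytes from memory position 0
    ])
    return constructor + runtime


runtime = assemble("""
// put a '32' in memory position '00'
60 20 60 00 52
// put a '10' in memory position '32'
60 0a 60 20 52
// put 'hola mundo' in memory position '64'
69 68 6f 6c 61 20 6d 75 6e 64 6f 60 40 52
// tell f3 the size and position of what we want to return
60 60 60 00
// return it
f3
""")
deployment = with_constructor(runtime)
print(len(deployment))    # 41, the same 41 bytes of calldata as in section 14
print(deployment.hex())   # can be passed to seth send --create
```

Since the constructor always has the same shape, its own length is a constant. Only the contract length needs computing, which is exactly the part that was easy to get wrong by hand.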