Write-up for Assembly Crash Course

# **Write-up for Assembly Crash Course**

### 1. set-register:
---
> In this level, you will work with registers! Please set the following:
> rdi = 0x1337

Use `mov` to assign `0x1337` to `rdi`:
```assembly
mov rdi, 0x1337
```

### 2. set-multiple-registers:
---
> In this level, you will work with multiple registers. Please set the following:
> rax = 0x1337
> r12 = 0xCAFED00D1337BEEF
> rsp = 0x31337

Similarly, use `mov` to assign each value to its register:
```assembly
mov rax, 0x1337
mov r12, 0xCAFED00D1337BEEF
mov rsp, 0x31337
```

### 3. add-to-register:
---
> Do the following:
> add 0x331337 to rdi

Here we need to add `0x331337` to the current value of `rdi`, so we use `add`. `mov rdi, 0x331337` would overwrite the register instead of adding to it.
```assembly
add rdi, 0x331337
```

### 4. linear-equation-registers:
---
> Using your new knowledge, please compute the following:
>
> f(x) = mx + b, where:
> m = rdi
> x = rsi
> b = rdx
> Place the result into rax.

In this challenge, we need to compute mx + b, where m = `rdi`, x = `rsi`, and b = `rdx`.
* First, we compute m * x (`rdi * rsi`) using `imul`, which leaves the product in `rdi`:
```assembly
imul rdi, rsi
```
* Then, we copy the product into `rax`:
```assembly
mov rax, rdi
```
* Finally, we add `rdx` to `rax`:
```assembly
add rax, rdx
```

### 5. integer-division:
---
> Please compute the following:
>
> speed = distance / time, where:
> distance = rdi
> time = rsi
> speed = rax
> Note that distance will be at most a 64-bit value, so rdx should be 0 when dividing.

In this challenge, we need to divide `rdi` by `rsi` and store the result in `rax`. We use `div`, which divides the 128-bit value `rdx:rax` by its operand, then stores the quotient in `rax` and the remainder in `rdx`. So we first set `rdx` to 0 and move `rdi` into `rax` to get an accurate result.
```assembly
mov rdx, 0
mov rax, rdi
div rsi
```

### 6. modulo-operation:
---
> Please compute the following: rdi % rsi
>
> Place the value in rax.


In the previous challenge, we divided to get the quotient. Here we divide to get the remainder, which `div` leaves in `rdx`. We divide as before and then move `rdx` into `rax`.
```assembly
mov rdx, 0
mov rax, rdi
div rsi
mov rax, rdx
```

### 7. set-upper-byte:
---
> For example, the lower 32 bits of rax can be accessed using eax, the lower 16 bits using ax, and the lower 8 bits using al.

//64-bit
```
MSB                                    LSB
+----------------------------------------+
|                   rax                  |
+--------------------+-------------------+
                     |        eax        |
                     +---------+---------+
                               |   ax    |
                               +----+----+
                               | ah | al |
                               +----+----+
Lower register bytes access is applicable to almost all registers.
```
> Using only one move instruction, please set the upper 8 bits of the ax register to 0x42.

The upper 8 bits of `ax` are accessible as `ah` (the lower 8 bits are `al`), so we set `ah` to `0x42`:
```assembly
mov ah, 0x42
```

### 8. efficient-modulo:
---
> If we have x % y, and y is a power of 2, such as 2^n, the result will be the lower n bits of x.
> Please compute the following:
> rax = rdi % 256
> rbx = rsi % 65536

Since 256 and 65536 are 2^8^ and 2^16^ respectively, the remainders of `rdi % 256` and `rsi % 65536` are the lowest 8 bits of `rdi` and the lowest 16 bits of `rsi`.
For `rax`, the lowest 8 bits are `al`, and for `rdi`, the lowest 8 bits are `dil`.
For `rbx`, the lowest 16 bits are `bx`, and for `rsi`, the lowest 16 bits are `si`.
Therefore:
- `rdi % 256` is already held in `dil`.
- `rsi % 65536` is already held in `si`.

We only need to copy them. Note that writing `al` and `bx` leaves the upper bits of `rax` and `rbx` untouched, so this is correct only if those bits are already zero.
```assembly
mov al, dil
mov bx, si
```

### 9. byte-extraction:
---
> Using only the following instructions:
>
> mov, shr, shl
> Please perform the following: Set rax to the 5th least significant byte of rdi.
>
> For example:
>
> rdi = | B7 | B6 | B5 | B4 | B3 | B2 | B1 | B0 |
> Set rax to the value of B4

B4 occupies bits 32 to 39 of `rdi`, so shifting `rdi` right by 32 bits moves B4 into the lowest byte, `dil`. We then copy `dil` into `al`. This writes only the low byte of `rax`, so it relies on the upper 56 bits of `rax` already being zero; a variant that does not is `mov rax, rdi`, `shl rax, 24`, `shr rax, 56`.
```assembly
shr rdi, 32
mov al, dil
```

### 10. bitwise-and:
---
> Without using the following instructions: mov, xchg, please perform the following:
>
> Set rax to the value of (rdi AND rsi)

We rely on two identities:
a xor a = 0
0 xor b = b
First, `xor rax, rax` clears `rax`, and `xor rax, rdi` then copies `rdi` into `rax` without using `mov`. Finally, `and rax, rsi` computes the bitwise AND of `rax` and `rsi`.
```assembly
xor rax, rax
xor rax, rdi
and rax, rsi
```

### 11. check-even:
---
> Using only the following instructions:
>
> and, or, xor
>
> Implement the following logic:
>
> if x is even then y = 1
> else y = 0
> Where:
>
> x = rdi
> y = rax

The parity (even or odd) of a number is determined by its 2^0^ bit.
- If this bit is `1`, the number is odd.
- If this bit is `0`, the number is even.

To implement this logic:
- If `rdi` is even, `rax = 1`.
- If `rdi` is odd, `rax = 0`.

We use `and rdi, 1` to isolate the 2^0^ bit of `rdi`, copy it into `rax` with the two `xor` instructions from the previous level, and then use `xor rax, 1` to invert the bit (`1 → 0`, `0 → 1`).
```assembly
and rdi, 1
xor rax, rax
xor rax, rdi
xor rax, 1
```

### 12. memory-read:
---
> Please perform the following: Place the value stored at 0x404000 into rax. Make sure the value in rax is the original value stored at 0x404000.

Use the following instruction to load the value at memory address `0x404000` into `rax`:
```assembly
mov rax, [0x404000]
```

### 13. memory-write:
---
> Please perform the following: Place the value stored in rax to 0x404000.

Swap the operands to store `rax` at the address instead:
```assembly
mov [0x404000], rax
```

### 14. memory-increment:
---
> Please perform the following:
> Place the value stored at 0x404000 into rax.
> Increment the value stored at the address 0x404000 by 0x1337.
> Make sure the value in rax is the original value stored at 0x404000 and make sure that [0x404000] now has the incremented value.

Load the original value into `rax` first, then add `0x1337` to the value in memory through `rbx`:
```assembly
mov rax, [0x404000]
mov rbx, 0x1337
add [0x404000], rbx
```

### 15. byte-access:
---
> Please perform the following: Set rax to the byte at 0x404000.

We have:
Byte = 1 byte = 8 bits
so we load it into the 8-bit register `al`. This writes only `al`; the rest of `rax` keeps its previous value.
```assembly
mov al, [0x404000]
```

### 16. memory-size-access:
---
> Please perform the following:
>
> Set rax to the byte at 0x404000
> Set rbx to the word at 0x404000
> Set rcx to the double word at 0x404000
> Set rdx to the quad word at 0x404000

The breakdown of the names of memory sizes:
Quad Word = 8 bytes = 64 bits
Double Word = 4 bytes = 32 bits
Word = 2 bytes = 16 bits
Byte = 1 byte = 8 bits

The size of the destination register selects how many bytes are read:
```assembly
mov al, [0x404000]
mov bx, [0x404000]
mov ecx, [0x404000]
mov rdx, [0x404000]
```

### 17. little-endian-write:
---
> For this challenge we will give you two addresses created dynamically each run.
>
> The first address will be placed in rdi.
> The second will be placed in rsi.
>
> Using the earlier mentioned info, perform the following:
> Set [rdi] = 0xdeadbeef00001337
> Set [rsi] = 0xc0ffee0000
>
> Hint: it may require some tricks to assign a big constant to a dereferenced register.
> Try setting a register to the constant value then assigning that register to the dereferenced register.

`mov` cannot store a 64-bit immediate directly to memory (a memory destination only accepts an immediate of up to 32 bits, sign-extended), so we first load each constant into a register and then store the register to memory:
`0xdeadbeef00001337` is a 64-bit value, held in a 64-bit register.
`0xc0ffee0000` is a 40-bit value, also held in a 64-bit register.
```assembly
mov rax, 0xdeadbeef00001337
mov rbx, 0xc0ffee0000
mov [rdi], rax
mov [rsi], rbx
```

### 18. memory-sum:
---
> Perform the following:
> Load two consecutive quad words from the address stored in rdi.
> Calculate the sum of the previous steps' quad words.
> Store the sum at the address in rsi.

The two consecutive quad words are at `[rdi]` and `[rdi+8]`. Load the first into `rax`, add the second to it, then store `rax` at the address held in `rsi`:
```assembly
mov rax, [rdi]
add rax, [rdi+8]
mov [rsi], rax
```

### 19. stack-subtraction:
---
> On x86, the pop instruction will take the value from the top of the stack and put it into a register.
>
> Similarly, the push instruction will take the value in a register and push it onto the top of the stack.
>
> Using these instructions, take the top value of the stack, subtract rdi from it, then put it back.

Retrieve the value at the top of the stack using `pop rax`.
Subtract the value of `rdi` from `rax`.
Push the result back onto the stack using `push rax`.
```assembly!
pop rax
sub rax, rdi
push rax
```

### 20. swap-stack-values:
---
> Using only the following instructions:
>
> push
> pop
> Swap values in rdi and rsi.
>
> Example:
>
> If to start rdi = 2 and rsi = 5
> Then to end rdi = 5 and rsi = 2

Use `push` to store the values of `rdi` and then `rsi` on the stack. The stack is last-in, first-out, so the first `pop` returns the old `rsi` value, which we pop into `rdi`, and the second returns the old `rdi` value, which we pop into `rsi`.
```assembly!
push rdi
push rsi
pop rdi
pop rsi
```

### 21. average-stack-values:
---
> Without using pop, please calculate the average of 4 consecutive quad words stored on the stack. Push the average on the stack.
>
> Hint:
>
> RSP+0x?? Quad Word A
> RSP+0x?? Quad Word B
> RSP+0x?? Quad Word C
> RSP Quad Word D

`rsp` always holds the address of the top of the stack, so the quad words A, B, C, and D are at `[rsp+24]`, `[rsp+16]`, `[rsp+8]`, and `[rsp]`, respectively. We accumulate their sum in `rax` and then divide by `rbx` (loaded with 4). Because `div` divides `rdx:rax`, we also set `rdx` to 0 first.
```assembly!
mov rax, [rsp]
add rax, [rsp+8]
add rax, [rsp+16]
add rax, [rsp+24]
mov rdx, 0
mov rbx, 4
div rbx
push rax
```

### 22. absolute-jump:
---
>
> For all jumps, there are three types:
>
> Relative jumps: jump + or - the next instruction.
> Absolute jumps: jump to a specific address.
> Indirect jumps: jump to the memory address specified in a register.
>
> In this level, we will ask you to do an absolute jump. Perform the following: Jump to the absolute address 0x403000.

In x86, an absolute jump (a jump to a specific address) is done by first putting the target address in a register and then executing `jmp` on that register.
```assembly!
mov rax, 0x403000
jmp rax
```

### 23. relative-jump:
---
> In this level, we will ask you to do a relative jump. You will need to fill space in your code with something to make this relative jump possible. We suggest using the nop instruction. It's 1 byte long and very predictable.
>
> In fact, the assembler that we're using has a handy .rept directive that you can use to repeat assembly instructions some number of times: GNU Assembler Manual
>
> Useful instructions for this level:
>
> jmp (reg1 | addr | offset)
> nop
> Hint: For the relative jump, look up how to use labels in x86.
>
> Using the above knowledge, perform the following:
>
> Make the first instruction in your code a jmp.
> Make that jmp a relative jump to 0x51 bytes from the current position.
> At the code location where the relative jump will redirect control flow, set rax to 0x1.

`.rept` repeats an instruction a given number of times, and we need `jmp` to land on an instruction 0x51 bytes (81 bytes) past the end of the `jmp` itself. Since `nop` is 1 byte long and does nothing, we fill the gap with 81 of them and let the assembler compute the offset from the label.
```assembly!
jmp jump
.rept 81
nop
.endr
jump:
mov rax, 0x1
```

### 24. jump-trampoline:
---
> Create a two jump trampoline:
> Make the first instruction in your code a jmp.
> Make that jmp a relative jump to 0x51 bytes from its current position.
> At 0x51, write the following code:
> Place the top value on the stack into register rdi.
> jmp to the absolute address 0x403000.

We simply combine the two previous tasks:
```assembly!
jmp jump
.rept 81
nop
.endr
jump:
pop rdi
mov rax, 0x403000
jmp rax
```

### 25. conditional-jump:
---
> Using the above knowledge, implement the following:
>
> if [x] is 0x7f454c46:
>     y = [x+4] + [x+8] + [x+12]
> else if [x] is 0x00005A4D:
>     y = [x+4] - [x+8] - [x+12]
> else:
>     y = [x+4] * [x+8] * [x+12]
> Where:
>
> x = rdi, y = rax.
> Assume each dereferenced value is a signed dword. This means the values can start as a negative value at each memory position.
>
> A valid solution will use the following at least once:
>
> jmp (any variant), cmp

We read `[x]` into `eax` once, compare it against each constant, and fall through to the multiplication when neither matches:
```assembly!
    mov eax, dword ptr [rdi]        ; read [x] into eax
    cmp eax, 0x7f454c46
    jne else_if
    mov eax, dword ptr [rdi+4]      ; eax = [x+4]
    add eax, dword ptr [rdi+8]      ; eax += [x+8]
    add eax, dword ptr [rdi+12]     ; eax += [x+12]
    jmp end
else_if:
    cmp eax, 0x00005A4D
    jne else
    mov eax, dword ptr [rdi+4]      ; eax = [x+4]
    sub eax, dword ptr [rdi+8]      ; eax -= [x+8]
    sub eax, dword ptr [rdi+12]     ; eax -= [x+12]
    jmp end
else:
    mov eax, dword ptr [rdi+4]      ; eax = [x+4]
    imul eax, dword ptr [rdi+8]     ; eax *= [x+8]
    imul eax, dword ptr [rdi+12]    ; eax *= [x+12]
end:
    nop
```

### 26. indirect-jump:
---
>
> Using the above knowledge, implement the following logic:
>
> if rdi is 0:
>     jmp 0x40301e
> else if rdi is 1:
>     jmp 0x4030da
> else if rdi is 2:
>     jmp 0x4031d5
> else if rdi is 3:
>     jmp 0x403268
> else:
>     jmp 0x40332c
> Please do the above with the following constraints:
>
> Assume rdi will NOT be negative.
> Use no more than 1 cmp instruction.
> Use no more than 3 jumps (of any variant).
> We will provide you with the number to 'switch' on in rdi.
> We will provide you with a jump table base address in rsi.
> Here is an example table:
>
> [0x40427c] = 0x40301e (addrs will change)
> [0x404284] = 0x4030da
> [0x40428c] = 0x4031d5
> [0x404294] = 0x403268
> [0x40429c] = 0x40332c

Use `cmp` to compare the value of `rdi` with 3. If it is greater than 3 (`ja`, an unsigned comparison), jump to `default_case`, which jumps to the default address stored in the fifth table entry, `[rsi + 0x20]`.
In the cases of 0, 1, 2, or 3, jump to the address stored at `[rsi + rdi * 8]` in the jump table.
```assembly!
    cmp rdi, 3
    ja default_case
    jmp [rsi + rdi*8]
default_case:
    jmp [rsi + 0x20]
```

### 27. average-loop:
---
> In most programming languages, a structure exists called the for-loop, which allows you to execute a set of instructions for a bounded amount of times. The bounded amount can be either known before or during the program's run, with "during" meaning the value is given to you dynamically.
>
> As an example, a for-loop can be used to compute the sum of the numbers 1 to n:
>
> sum = 0
> i = 1
> while i <= n:
>     sum += i
>     i += 1
> Please compute the average of n consecutive quad words, where:
>
> rdi = memory address of the 1st quad word
> rsi = n (amount to loop for)
> rax = average computed

As with the average of 4 quad words, the average of n quad words can be computed with a loop built from jump instructions. We load the first quad word into `rax` and use `rbx` as the index of the next element, starting at 1 and stopping when it reaches n. Because `div` divides `rdx:rax`, we set `rdx` to 0 before dividing the sum by `rsi`.
```assembly!
    mov rax, [rdi]
    mov rbx, 0x01
loop:
    cmp rbx, rsi
    je end
    add rax, [rdi+rbx*8]
    add rbx, 0x01
    jmp loop
end:
    mov rdx, 0
    div rsi
```

### 28. count-non-zero:
---
> A second loop structure exists called the while-loop to fill this demand. In the while-loop, you iterate until a condition is met.
>
> As an example, say we had a location in memory with adjacent numbers and we wanted to get the average of all the numbers until we find one bigger or equal to 0xff:
>
> average = 0
> i = 0
> while x[i] < 0xff:
>     average += x[i]
>     i += 1
> average /= i
> Using the above knowledge, please perform the following:
>
> Count the consecutive non-zero bytes in a contiguous region of memory, where:
>
> rdi = memory address of the 1st byte
> rax = number of consecutive non-zero bytes
> Additionally, if rdi = 0, then set rax = 0 (we will check)!
>
> An example test-case, let:
>
> rdi = 0x1000
> [0x1000] = 0x41
> [0x1001] = 0x42
> [0x1002] = 0x43
> [0x1003] = 0x00
> Then: rax = 3 should be set.

```
    mov rax, 0          ; rax = 0, initialize counter
    cmp rdi, 0          ; if rdi == 0
    je end              ; → rax remains 0, exit
loop:
    mov bl, [rdi]       ; read 1 byte from address rdi into bl
    cmp bl, 0           ; compare with 0
    je end              ; if equal → end loop
    inc rax             ; increment counter
    inc rdi             ; move to the next byte
    jmp loop
end:
    nop
```

### 29. string-lower:
---
> In this level, you will be provided with a contiguous region of memory again and will loop over each performing a conditional operation till a zero byte is reached. All of which will be contained in a function!
>
> A function is a callable segment of code that does not destroy control flow.
>
> Functions use the instructions "call" and "ret".
>
> The "call" instruction pushes the memory address of the next instruction onto the stack and then jumps to the value stored in the first argument.
>
> Let's use the following instructions as an example:
>
> 0x1021 mov rax, 0x400000
> 0x1028 call rax
> 0x102a mov [rsi], rax
> call pushes 0x102a, the address of the next instruction, onto the stack.
> call jumps to 0x400000, the value stored in rax.
> The "ret" instruction is the opposite of "call".
>
> ret pops the top value off of the stack and jumps to it.
>
> Let's use the following instructions and stack as an example:
>
>                             Stack ADDR  VALUE
> 0x103f mov rax, rdx         RSP + 0x8   0xdeadbeef
> 0x1042 ret                  RSP + 0x0   0x0000102a
> Here, ret will jump to 0x102a.
>
> Please implement the following logic:
>
> str_lower(src_addr):
>     i = 0
>     if src_addr != 0:
>         while [src_addr] != 0x00:
>             if [src_addr] <= 0x5a:
>                 [src_addr] = foo([src_addr])
>                 i += 1
>             src_addr += 1
>     return i
> foo is provided at 0x403000. foo takes a single argument as a value and returns a value.
>
> All functions (foo and str_lower) must follow the Linux amd64 calling convention (also known as System V AMD64 ABI): System V AMD64 ABI
>
> Therefore, your function str_lower should look for src_addr in rdi and place the function return in rax.
>
> An important note is that src_addr is an address in memory (where the string is located) and [src_addr] refers to the byte that exists at src_addr.
>
> Therefore, the function foo accepts a byte as its first argument and returns a byte.

Solve: `str_lower` receives only `src_addr`, so the loop ends at the null terminator rather than on a length in `rsi`. The counter lives in `rbx`, which `foo` must preserve under the calling convention; since `rbx` is callee-saved, we also save and restore it ourselves. `foo` is called through a register, as in the challenge's own `call rax` example.
```assembly!
    push rbx                    ; rbx is callee-saved, preserve it
    mov rbx, 0                  ; Initialize counter i = 0
    cmp rdi, 0                  ; Check if src_addr is null
    je end
while:
    mov al, byte ptr [rdi]      ; Load current character
    cmp al, 0                   ; Check for null terminator
    je end
    cmp al, 0x5a                ; Compare with 'Z'
    ja next
    inc rbx                     ; Increment counter i
    push rdi                    ; Save string pointer
    mov dil, al                 ; Pass character to foo
    mov rax, 0x403000
    call rax                    ; Call foo
    pop rdi                     ; Restore pointer
    mov byte ptr [rdi], al      ; Store result
next:
    inc rdi                     ; Advance string pointer
    jmp while
end:
    mov rax, rbx                ; Return counter i
    pop rbx                     ; Restore the caller's rbx
    ret
```

### 30. most-common-byte:
---
> A function stack frame is a set of pointers and values pushed onto the stack to save things for later use and allocate space on the stack for function variables.
>
> First, let's talk about the special register rbp, the Stack Base Pointer.
>
> The rbp register is used to tell where our stack frame first started. As an example, say we want to construct some list (a contiguous space of memory) that is only used in our function. The list is 5 elements long, and each element is a dword. A list of 5 elements would already take 5 registers, so instead, we can make space on the stack!
>
> The assembly would look like:
>
> ; setup the base of the stack as the current top
> mov rbp, rsp
> ; move the stack 0x14 bytes (5 * 4) down
> ; acts as an allocation
> sub rsp, 0x14
> ; assign list[2] = 1337
> mov eax, 1337
> mov [rbp-0xc], eax
> ; do more operations on the list ...
> ; restore the allocated space
> mov rsp, rbp
> ret
> Notice how rbp is always used to restore the stack to where it originally was. If we don't restore the stack after use, we will eventually run out. In addition, notice how we subtracted from rsp, because the stack grows down. To make the stack have more space, we subtract the space we need. The ret and call still work the same.
>
> Consider the fact that to assign a value to list[2] we subtract 12 bytes (3 dwords). That is because stack grows down and when we moved rsp our stack contains addresses <rsp, rbp).
>
> Once again, please make function(s) that implement the following:
>
> most_common_byte(src_addr, size):
>     i = 0
>     while i <= size-1:
>         curr_byte = [src_addr + i]
>         [stack_base - curr_byte * 2] += 1
>         i += 1
>
>     b = 1
>     max_freq = 0
>     max_freq_byte = 0
>     while b <= 0x100:
>         if [stack_base - b * 2] > max_freq:
>             max_freq = [stack_base - b * 2]
>             max_freq_byte = b
>         b += 1
>
>     return max_freq_byte
> Assumptions:
>
> There will never be more than 0xffff of any byte
> The size will never be longer than 0xffff
> The list will have at least one element
> Constraints:
>
> You must put the "counting list" on the stack
> You must restore the stack like in a normal function
> You cannot modify the data at src_addr

I had some difficulties with this challenge, so I analyzed it a bit more carefully (also as a way to review the previous challenges).

**Set up the function frame:**
```assembly
most_common_byte:
    push rbp
    mov rbp, rsp
    sub rsp, 0x200
```
* Allocate 512 bytes = 256 words (2 bytes per entry) on the stack to create a frequency array for counting occurrences of each byte value (0–255).

**Initialize the count array to 0:**
```assembly
    mov rcx, 0x100
    lea r8, [rbp - 0x200]
    xor eax, eax
.init_loop:
    mov [r8 + rcx*2 - 2], ax
    loop .init_loop
```
* `rcx = 256`: the number of entries (words).
* `r8` is the base address of the count array.
* The loop writes `ax = 0` to each element at `[r8 + rcx*2 - 2]` (each entry is a word = 2 bytes); `loop` decrements `rcx` and repeats until it reaches 0.

**Count the frequency of each byte:**
```assembly
    xor ecx, ecx
.count_loop:
    cmp ecx, esi
    jge .count_done
    movzx eax, byte ptr [rdi + rcx]
    inc word ptr [r8 + rax*2]
    inc ecx
    jmp .count_loop
```
* `ecx` is used as the loop index `i`. Writing `ecx` zeroes the upper half of `rcx`, so the full register `rcx` can be used in the address (a 64-bit base cannot be combined with a 32-bit index).
* The loop continues until `i >= size` (`esi`).
* Reads the byte at `[rdi + i]` and zero-extends it into `eax` (a value from 0–255).
* Increments the corresponding count at `[r8 + rax*2]`.

**Find the byte with the highest frequency:**
```assembly
.count_done:
    mov ecx, 0
    xor edx, edx
    xor eax, eax
```
* `ecx` iterates through each byte value (0–255).
* `edx` stores `max_freq`.
* `eax` stores `max_freq_byte`.
```assembly
.find_max_loop:
    cmp ecx, 0xff
    jg .find_max_done
    movzx r9d, word ptr [r8 + rcx*2]
    cmp r9d, edx
    jle .not_greater
    mov edx, r9d
    mov eax, ecx
.not_greater:
    inc ecx
    jmp .find_max_loop
```
* Loads the count for byte value `ecx` into `r9d`, a caller-saved scratch register (`rbx` would have to be saved and restored under the calling convention), and compares it with the current `max_freq` (`edx`).
* If it is strictly greater, updates `max_freq = r9d` and `max_freq_byte = ecx`, so on a tie the lowest byte value wins.

**End the function:**
```assembly
.find_max_done:
    mov rsp, rbp
    pop rbp
    ret
```
* Restore the stack to its original state.
* Return the result (most common byte) in `rax`.
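
To cross-check the walkthrough above without the challenge harness, the same counting logic can be modelled in a few lines of Python. This is only a sketch of my reading of the assembly, not part of the challenge: the function name `most_common_byte_model` is made up for illustration, and it assumes the counters are 16-bit words indexed from 0 to 255, with ties going to the lowest byte value.

```python
def most_common_byte_model(data: bytes) -> int:
    """Hypothetical reference model of the assembly walkthrough above."""
    # The 0x200-byte "counting list": 256 entries, one 16-bit word each.
    counts = [0] * 256

    # Counting pass: one increment per input byte, wrapping like a 16-bit word.
    for value in data:
        counts[value] = (counts[value] + 1) & 0xFFFF

    # Max-finding pass: the strict '>' keeps the first (lowest) byte on a tie.
    max_freq = 0
    max_freq_byte = 0
    for b in range(256):
        if counts[b] > max_freq:
            max_freq = counts[b]
            max_freq_byte = b
    return max_freq_byte


if __name__ == "__main__":
    # 'a' (0x61) appears twice, every other byte once.
    print(hex(most_common_byte_model(b"abca")))
```

Feeding the same buffer to the assembly and to this model is a quick way to spot an off-by-one in the indexing or a wrong tie-break.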