---
tags: CSE
---

---

> [toc]

---

## MIPS
__MIPS__ (Microprocessor without Interlocked Pipelined Stages) is a reduced instruction set computer ([RISC](https://en.wikipedia.org/wiki/Reduced_instruction_set_computer)) instruction set architecture ([ISA](https://en.wikipedia.org/wiki/Instruction_set_architecture)).

### registers
| Number | Name | Comments |
| ------ | ---- | -------- |
| 0 | $zero | always zero |
| 1 | $at | reserved for assembler |
| 2,3 | $v0 - $v1 | first and second return values, respectively |
| 4,...,7 | $a0 - $a3 | first four arguments to functions |
| 8,...,15 | $t0 - $t7 | temporary registers |
| 16,...,23 | $s0 - $s7 | saved registers |
| 24,25 | $t8, $t9 | temporary registers |
| 26,27 | $k0, $k1 | reserved for kernel (operating system) |
| 28 | $gp | global pointer |
| 29 | $sp | stack pointer |
| 30 | $fp | frame pointer |
| 31 | $ra | return address |

### instruction formats
* __R-type:__ op(6)+rs(5)+rt(5)+rd(5)+shamt(5)+funct(6)
* __I-type:__ op(6)+rs(5)+rt(5)+immediate(16)
* __J-type:__ op(6)+address(26)

_*the numbers in parentheses are field widths in bits_

__op__: basic operation of the instruction, traditionally called the __opcode__.
__rs__: the first register source operand.
__rt__: the second register source operand.
__rd__: the register destination operand. it gets the result of the operation.
__shamt__: shift amount.
__funct__: function. this field, often called the _function code_, selects the specific variant of the operation in the op field.

### arithmetic instruction
* ```add $rd, $rs, $rt``` $rd = $rs + $rt (raises an exception on signed overflow)
* ```addu $rd, $rs, $rt``` $rd = $rs + $rt (no overflow exception)
* ```addi $rt, $rs, immed``` $rt = $rs + immed (immed is a 16-bit constant, sign-extended)
* ```sub $rd, $rs, $rt``` $rd = $rs - $rt (raises an exception on signed overflow)
* ```subu $rd, $rs, $rt``` $rd = $rs - $rt (no overflow exception)

__EXAMPLE:__
turn the c code segment ```f=(g+h)-(i+j);``` into MIPS assembly code. the variables ```f,g,h,i,j``` are assigned to the registers $s0, $s1, $s2, $s3, $s4, respectively.
ans:
```
add $t0, $s1, $s2    # $t0 = g + h
add $t1, $s3, $s4    # $t1 = i + j
sub $s0, $t0, $t1    # f = (g + h) - (i + j)
```

### basic memory instruction
* ```lw $rt, offset($rs)``` _load word_, $rt = Memory[$rs + offset]; the offset is in bytes
* ```sw $rt, offset($rs)``` _store word_, Memory[$rs + offset] = $rt; the offset is in bytes

__EXAMPLE:__
turn the c code segment ```g=h+A[8];``` into MIPS assembly code. the variables ```g,h``` are assigned to the registers $s1, $s2 respectively, and the _base address_ of the int array ```A``` is in $s3.

ans:
```
lw $t0, 32($s3)      # a word is 4 bytes, so A[8] is at byte offset 8 * 4 = 32
add $s1, $s2, $t0    # g = h + A[8]
```

__EXAMPLE:__
turn the c code segment ```A[12] = h + A[8];``` into MIPS assembly code. the variable ```h``` is assigned to the register $s2 and the base address of the int array ```A``` is in $s3.

ans:
```
lw $t0, 32($s3)      # $t0 = A[8]
add $t0, $s2, $t0    # $t0 = h + A[8]
sw $t0, 48($s3)      # A[12] = $t0 (12 * 4 = 48)
```

### how instructions are represented in the computer
__EXAMPLE:__
turn ```add $t0, $s1, $s2``` into machine code.

ans:
remember the instruction format: R-type: op(6)+rs(5)+rt(5)+rd(5)+shamt(5)+funct(6)
the opcode of the ```add``` instruction is 0 (its funct is 32), the number of ```$t0``` is 8, ```$s1``` is 17, ```$s2``` is 18, and they correspond to rd, rs, rt respectively.
so the machine code is

| op | rs | rt | rd | shamt | funct |
|---|---|---|---|---|---|
| 0b000000 | 0b10001 | 0b10010 | 0b01000 | 0b00000 | 0b100000 |

which is __0x02324020__.

__EXAMPLE:__
turn the c code segment ```A[300] = h + A[300];``` into machine code. the variable ```h``` is assigned to $s2, and the base of the array is in $t1.

ans:
the assembly code is
```
lw $t0, 1200($t1)    # 300 * 4 = 1200
add $t0, $s2, $t0
sw $t0, 1200($t1)
```
the machine code of ```lw``` is:

| op | rs | rt | immediate |
| --- | --- | --- | --- |
| 35=0b100011 | 9=0b01001 | 8=0b01000 | 1200=0b0000010010110000 |

which is __0x8D2804B0__.

the machine code of ```add``` is:

| op | rs | rt | rd | shamt | funct |
| --- | --- | --- | --- | --- | --- |
| 0b000000 | 0b10010 | 0b01000 | 0b01000 | 0b00000 | 0b100000 |

which is __0x02484020__.
the machine code of ```sw``` is:

| op | rs | rt | immediate |
| --- | --- | --- | --- |
| 43=0b101011 | 9=0b01001 | 8=0b01000 | 1200=0b0000010010110000 |

which is __0xAD2804B0__.

### bitwise instruction
* ```sll $rd, $rt, shamt``` $rd = $rt << shamt
* ```srl $rd, $rt, shamt``` $rd = $rt >> shamt (logical shift, fills with zeros)
* ```and $rd, $rs, $rt```
* ```andi $rt, $rs, immed```
* ```or $rd, $rs, $rt```
* ```ori $rt, $rs, immed```
* ```nor $rd, $rs, $rt```
* ```xor $rd, $rs, $rt```
* ```xori $rt, $rs, immed```

note that MIPS has no ```nori``` instruction; a bitwise NOT is written as ```nor $rd, $rs, $zero```. the immediate of ```andi```, ```ori``` and ```xori``` is zero-extended, not sign-extended.

### conditional branch instruction
* ```beq $rs, $rt, Label``` if ($rs == $rt) goto Label (Label is actually an immediate relative to PC; we will discuss it later.)
* ```bne $rs, $rt, Label``` if ($rs != $rt) goto Label
* ```slt $rd, $rs, $rt``` \$rd = ($rs < $rt)
* ```sltu $rd, $rs, $rt``` \$rd = ($rs < $rt), unsigned int comparison
* ```slti $rt, $rs, immed``` \$rt = (\$rs < immed)
* ```sltiu $rt, $rs, immed``` \$rt = (\$rs < immed), unsigned int comparison

note that ```slt``` never causes an overflow exception. see: https://stackoverflow.com/questions/69061872/mips-processor-why-sub-cause-overflow-and-slt-isnt

__EXAMPLE:__
turn the c code segment
```c
if (i == j)
    f = g + h;
else
    f = g - h;
```
into MIPS code. the variables ```f``` through ```j``` correspond to the five registers $s0 through $s4.

ans:
```
    bne $s3, $s4, ELSE    # if (i != j) goto ELSE
    add $s0, $s1, $s2     # f = g + h
    j EXIT                # goto EXIT
ELSE:
    sub $s0, $s1, $s2     # f = g - h
EXIT:
```

### loop
turn the c code segment
```c
while (save[i] == k)
    i += 1;
```
into MIPS code. ```i,k``` correspond to $s3, $s5, respectively, and the base address of ```save``` is in $s6.

ans:
```
WHILE:
    sll $t0, $s3, 2       # $t0 = i * 4
    add $t0, $s6, $t0     # $t0 = &save[i]
    lw $t1, 0($t0)        # $t1 = save[i]
    bne $t1, $s5, EXIT    # if (save[i] != k) goto EXIT
    addi $s3, $s3, 1      # i += 1
    j WHILE
EXIT:
```

optimized version (the address is computed once and then advanced by 4 bytes per iteration):
```
    sll $t0, $s3, 2       # $t0 = i * 4
    add $t0, $s6, $t0     # $t0 = &save[i]
WHILE:
    lw $t1, 0($t0)        # $t1 = save[i]
    bne $t1, $s5, EXIT    # if (save[i] != k) goto EXIT
    addi $s3, $s3, 1      # i += 1 (keep i up to date for code after the loop)
    addi $t0, $t0, 4      # advance to &save[i]
    j WHILE
EXIT:
```

### function call
an outline of the stages of a function call:
1. Put parameters in a place where the procedure can access them.
2. Transfer control to the procedure.
3. Acquire the storage resources needed for the procedure.
4. Perform the desired task.
5. Put the result value in a place where the calling program can access it.
6. Return control to the point of origin, since a procedure can be called from several points in a program.

MIPS provides several registers for function calls:
* $a0 - $a3 : four argument registers in which to pass parameters.
* $v0 - $v1 : two value registers in which to return values.
* $ra : one return address register to return to the point of origin.

MIPS assembly language includes an instruction just for function calls: ```jal address``` (jump and link). it jumps to the address and simultaneously saves the return address in register \$ra.

to jump to an address held in a register (```j address``` takes only an immediate value), MIPS provides ```jr $rs```, which jumps to the address stored in register $rs.

note that ```jal address``` changes nothing but \$ra (and the PC), so the caller must put the arguments into $a0 - $a3 before ```jal```, and the callee must place the result in $v0 and $v1 before returning control to the caller with ```jr $ra```.

implicit in the stored-program idea is the need for a register that holds the address of the instruction currently being executed. this register is called the __program counter__, abbreviated _PC_ in the MIPS architecture. the instruction ```jal``` actually saves PC+4 in $ra, so that the link points to the following instruction and the procedure can return there.

suppose a compiler needs more registers for a procedure than the four argument and two return value registers. then we need another data structure to help, and the ideal one is a __stack__. the stack pointer is named $sp in MIPS. by historical precedent, the stack "grows" from higher addresses to lower addresses.

__EXAMPLE:__
turn the c function
```c
int leaf_example(int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}
```
into MIPS code.
```f``` corresponds to $s0.

ans:
```
LEAF_EXAMPLE:
    addi $sp, $sp, -12    # move down the stack pointer (room for 3 words)
    sw $s0, 0($sp)        # store the content of $s0
    sw $t0, 4($sp)        # store the content of $t0
    sw $t1, 8($sp)        # store the content of $t1
    add $t0, $a0, $a1     # $t0 = g + h
    add $t1, $a2, $a3     # $t1 = i + j
    sub $s0, $t0, $t1     # f = (g + h) - (i + j)
    add $v0, $s0, $zero   # return value = f
    lw $t1, 8($sp)        # restore $t1
    lw $t0, 4($sp)        # restore $t0
    lw $s0, 0($sp)        # restore $s0
    addi $sp, $sp, 12     # move back the stack pointer
    jr $ra
```

### recursive function
turn the c function
```c
int fact(int n)
{
    if (n < 1)
        return 1;
    else
        return n * fact(n - 1);
}
```
into MIPS code.

ans:
```
FACT:
    slti $t0, $a0, 1      # $t0 = (n < 1)
    beq $t0, $zero, RECU  # if (n >= 1) goto RECU
    addi $v0, $zero, 1    # return 1
    jr $ra
RECU:
    addi $sp, $sp, -8     # room for $ra and n
    sw $ra, 0($sp)
    sw $a0, 4($sp)
    addi $a0, $a0, -1     # argument = n - 1
    jal FACT              # $v0 = fact(n - 1)
    lw $a0, 4($sp)        # restore n
    lw $ra, 0($sp)        # restore the return address
    addi $sp, $sp, 8
    mul $v0, $a0, $v0     # return n * fact(n - 1)
    jr $ra
```

### memory layout
![](https://i.imgur.com/Kj41WWc.png)
* the .bss, data and text sections are called the _static memory layout_.
* to allocate space on the heap, we need the [system call sbrk()](www.doc.ic.ac.uk/lab/secondyear/spim/node8.html), and then use a register to access the data.

__EXAMPLE:__
```
#include <stdlib.h>  // for malloc

int var1 = 1; // data section
int var2;     // .bss section

int main()
{
    int arr1;                               // stack
    int* var3 = (int *)malloc(sizeof(int)); // heap
    return 0;
}
```

### allocate on heap
first, __system calls__ must be mentioned here. system calls are a collection of functions provided by the OS for requesting services from the kernel, such as process control, file management, device management, communication, etc. [more detail](https://hackmd.io/@combo-tw/Linux-%E8%AE%80%E6%9B%B8%E6%9C%83/%2F%40combo-tw%2FBJPoAcqQS)

making a system call is slightly different from a common function call. the general steps are:
1. load the system call number into register $v0.
2. load argument values, if any, into $a0, $a1, $a2, or $f12 as specified.
3. issue the ```syscall``` instruction.
4. retrieve return values, if any, from the result registers as specified.

__EXAMPLE:__
show the value stored in $t0 on the console.
```
li $v0, 1             # system call 1 prints an integer
add $a0, $t0, $zero   # $a0 = the integer to print
syscall
```

[system call number table](https://courses.missouristate.edu/kenvollmar/mars/help/syscallhelp.html)

now, if we want to allocate space on the heap, we need system call 9, sbrk(). sbrk() takes one argument, _increment_, and increases the program's data space by _increment_ bytes. the return value is the original value of the program break. (the program break defines the end of the process's data segment.) (you can find more detail on the [man page](https://man7.org/linux/man-pages/man2/brk.2.html))

here is an example:
```
# allocation
li $v0, 9
addi $a0, $zero, 1024   # i want to allocate 1024 bytes
syscall
add $s0, $v0, $zero     # store the original value of the program break
# ... now we can use this space through $s0

# free
# note: the Linux sbrk() accepts a negative increment; a simulator may reject it
li $v0, 9
addi $a0, $zero, -1024  # decrease the program break by 1024
syscall
```

### the pointers of the memory layout
* frame pointer (\$fp):
![](https://i.imgur.com/Km4p4Wo.png)
in normal situations \$fp isn't strictly necessary, but when debugging, \$fp is needed to trace function calls.
* stack pointer (\$sp)
* global pointer (\$gp): the global pointer points into the static data section. it makes accessing static (global) data more convenient, since such data can be reached with a single 16-bit offset from \$gp.
* program counter (PC): holds the address of the current instruction; it is not one of the 32 general-purpose registers.

### dealing with ASCII character
an ASCII character is 8 bits long, so we can't just use ```lw,sw```.
MIPS provides ```lb $rt, offset($rs), lbu $rt, offset($rs), sb $rt, offset($rs)```, which work on a single byte: ```lb``` sign-extends the loaded byte, ```lbu``` zero-extends it, and ```sb``` stores only the rightmost 8 bits of $rt. (there is no ```sbu```; a store needs no extension.)

__EXAMPLE:__
turn the c function
```c
void strcpy(char x[], char y[])
{
    int i;
    i = 0;
    while((x[i] = y[i]) != '\0')
        ++i;
}
```
into MIPS code.
ans:
```
STRCPY:
    addi $sp, $sp, -4
    sw $s0, 0($sp)             # save $s0
    add $s0, $zero, $zero      # i = 0
WHILE:
    add $t0, $s0, $a0          # $t0 = &x[i]
    add $t1, $s0, $a1          # $t1 = &y[i]
    lbu $t2, 0($t1)            # $t2 = y[i]
    sb $t2, 0($t0)             # x[i] = y[i]
    beq $t2, $zero, EXIT_WHILE # stop after copying '\0'
    addi $s0, $s0, 1           # ++i
    j WHILE
EXIT_WHILE:
    lw $s0, 0($sp)             # restore $s0
    addi $sp, $sp, 4
    jr $ra
```

### dealing with unicode character
in a 16-bit unicode encoding such as UTF-16, a character (code unit) takes 16 bits, which is a halfword.
MIPS provides ```lh $rt, offset($rs), lhu $rt, offset($rs), sh $rt, offset($rs)``` for this situation. (as with bytes, there is no ```shu```.)

### 32-bit immediate operands
although a MIPS register holds a 32-bit value, an I-type instruction can carry only a 16-bit immediate. to solve this, MIPS provides ```lui $rt, immed```, which loads the upper 16 bits. what about the lower 16 bits? we can use ```ori``` (bitwise OR) to fill them in.
for example: how to store ```0b00000000001111010000100100000000``` into \$s0?
```
lui $s0, 61           # upper half: 0b0000000000111101 = 61
ori $s0, $s0, 2304    # lower half: 0b0000100100000000 = 2304
```
*the effect of the ```lui``` instruction: ```lui``` transfers the 16-bit immediate constant field value into the leftmost 16 bits of the register, filling the lower 16 bits with 0.

### addressing in jump
now we introduce the relative of ```jal```, ```j address```. this instruction changes the PC according to the immediate value, and the actual address the PC goes to is calculated by the rule below:
1. calculate ```val = address << 2``` (now it's 28 bits long)
2. concatenate the upper 4 bits of the PC (more precisely, of PC+4) with the ```val``` just calculated. (now it's 32 bits long)
3. change the value of the PC to the address just calculated

the rule applies to every J-type instruction.

### addressing in branches
now we go on to branches. remember the format of the J-type instruction: the address takes 26 bits, while the immediate of a branch instruction takes only 16 bits. if the address had to fit in this 16-bit field, no program could be larger than $2^{16}$, which is way too small for today's programs.
to solve this, the branch address is located relative to the PC (actually PC+4). this form of branch addressing is called __PC-relative addressing__.

*the PC-relative offset counts _words_, not bytes: target = (PC + 4) + (offset << 2).

__EXAMPLE:__
```
Loop:
    sll $t1, $s3, 2
    add $t1, $t1, $s6
    lw $t0, 0($t1)
    bne $t0, $s5, Exit
    addi $s3, $s3, 1
    j Loop
Exit:
```
assume the address of ```Loop``` is 80000. the machine code will be:
![](https://i.imgur.com/I13fPcm.png)

for instance, ```bne``` sits at 80012 and ```Exit``` at 80024, so the offset field of ```bne``` is (80024 - 80016) / 4 = 2, and the address field of ```j Loop``` is 80000 / 4 = 20000.

### MIPS addressing mode summary
1. _immediate addressing_, where the operand is a constant within the instruction itself. _e.g._ ```addi $rt, $rs, immed```
2. _register addressing_, where the operand is a register. _e.g._ ```jr $ra```
3. _base_ or _displacement addressing_, where the operand is at the memory location whose address is the sum of a register and a constant in the instruction. _e.g._ ```lw $rt, offset($rs)```
4. _PC-relative addressing_, where the branch address is the sum of the PC and a constant in the instruction. _e.g._ ```beq $rs, $rt, immed```
5. _pseudodirect addressing_, where the jump address is the 26 bits of the instruction concatenated with the upper bits of the PC. _e.g._ ```j address```

branching far away: given
```
beq $s0, $s1, L1
```
if L1 is too far away for the 16-bit offset, the assembler will replace it with the code below:
```
    bne $s0, $s1, L2
    j L1
L2:
```

### parallelism and synchronization
to support parallelism, MIPS provides ```ll $rt, offset($rs), sc $rt, offset($rs)```, called _load linked_ and _store conditional_. these instructions are used in sequence:

```ll $rt, offset($rs)``` loads the value at memory location $rs + offset into $rt, and saves the memory address in the link register.

```sc $rt, offset($rs)``` stores $rt into memory location $rs + offset only if the contents of the location specified by the load linked have not been changed before the store conditional to the same address occurs; on success $rt is set to 1. if the contents were changed, the store is not performed and $rt is set to 0.
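
The ll/sc behaviour described above can be modelled with a small C sketch. This is only a toy under stated assumptions: it tracks a single memory word and a single link flag, whereas real hardware keeps the link per processor and per address. All names (`mem_word`, `link_valid`, `ll_word`, `sc_word`, `other_store`, `atomic_exchange_word`) are hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (an assumption for illustration, not real hardware):
 * one memory word and one link flag for a single processor. */
static uint32_t mem_word = 0;
static bool link_valid = false;

/* ll: load the word and remember that we are linked to it. */
static uint32_t ll_word(void) {
    link_valid = true;
    return mem_word;
}

/* A store by anyone else to the linked word breaks the link. */
static void other_store(uint32_t value) {
    mem_word = value;
    link_valid = false;
}

/* sc: store only if the link is still intact.
 * Returns 1 on success and 0 on failure, like the value written to $rt. */
static uint32_t sc_word(uint32_t value) {
    if (!link_valid) {
        return 0;
    }
    mem_word = value;
    link_valid = false;
    return 1;
}

/* Atomic exchange built from ll/sc: retry until the sc succeeds. */
static uint32_t atomic_exchange_word(uint32_t new_value) {
    uint32_t old;
    do {
        old = ll_word();
    } while (sc_word(new_value) == 0);
    return old;
}
```

In this model, calling `other_store` between `ll_word` and `sc_word` makes `sc_word` return 0 and leaves the word untouched, which is the failure case that forces a retry.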
__EXAMPLE:__
```
again:
    add $t0, $zero, $s0   # copy the exchange value
    ll $t1, 0($s1)        # load linked
    sc $t0, 0($s1)        # store conditional
    beq $t0, $zero, again # if the store failed, $t0 is set to zero, so retry
    add $s0, $zero, $t1   # put the loaded value into $s0
```

for almost all processor families, only one ```ll``` can be held at a time, so issuing a second ```ll``` invalidates the first, and a context switch on a processor holding an ```ll``` normally causes the ```sc``` to fail when it is eventually performed.

### how an executable file is produced
![](https://i.imgur.com/GV8Cf12.png)

### object file
an object file is a combination of machine language instructions, data, and the information needed to place instructions properly in memory.

to produce the binary version of each instruction in the assembly language program, the assembler must determine the addresses corresponding to all labels. assemblers keep track of labels used in branches and data transfer instructions in a __symbol table__, which is a table that matches names of labels to the addresses of the memory words that instructions occupy.

the object file for UNIX systems typically contains six sections:
* the _header_ describes the size and position of the other sections of the object file.
* the _text segment_ contains the machine language code.
* the _static data segment_ contains data allocated for the life of the program.
* the _relocation information_ identifies instruction and data words that depend on absolute addresses when the program is loaded into memory.
* the _symbol table_ contains the remaining labels that are not defined, such as external references.
* the _debugging information_ contains a concise description of how the modules were compiled, so that a debugger can associate machine instructions with C source files and make data structures readable.

### linker
a linker is a systems program that combines independently assembled machine language programs and resolves all undefined labels into an executable file.
the linker uses the relocation information and symbol table in each object module to resolve all undefined labels. such references occur in branch instructions, jump instructions, and data addresses, so the job of this program is much like that of an editor: it finds the old addresses and replaces them with the new addresses.

the reason a linker is useful is that it is much faster to patch code than it is to recompile and reassemble.

the linker produces an executable file that can be run. typically, an executable file has the same format as an object file, except that it contains no unresolved references.

__EXAMPLE:__
there are two object files below, and the example shows how a linker works. there are instructions that refer to the addresses of procedures A and B, and instructions that refer to the addresses of data words X and Y.
*we show the instructions in assembly language just to make the example understandable; in reality, the instructions would be numbers.

----

__object file header__:

| name | text size | data size |
| --------- | ----------- | ----------- |
| procedure A | 0x100 | 0x20 |

__text segment:__

| address | instruction |
| ------- | ----------- |
| 0 | ```lw $a0, 0($gp)``` |
| 4 | ```jal 0``` |
| ... | ... |

__data segment:__

| address | data |
| --- | --- |
| 0 | X |

__relocation information:__

| address | instruction | dependency |
| -------- | -------- | -------- |
| 0 | ```lw``` | X |
| 4 | ```jal``` | B |

__symbol table:__

| label | address |
| -------- | -------- |
| X | - |
| B | - |

---

__object file header__:

| name | text size | data size |
| --------- | ----------- | ----------- |
| procedure B | 0x200 | 0x30 |

__text segment:__

| address | instruction |
| ------- | ----------- |
| 0 | ```sw $a1, 0($gp)``` |
| 4 | ```jal 0``` |
| ... | ... |

__data segment:__

| address | data |
| --- | --- |
| 0 | Y |

__relocation information:__

| address | instruction | dependency |
| -------- | -------- | -------- |
| 0 | ```sw``` | Y |
| 4 | ```jal``` | A |

__symbol table:__

| label | address |
| -------- | -------- |
| Y | - |
| A | - |

----

> memory layout
> ![](https://i.imgur.com/F7uDezS.png)

after linking:
__executable file header:__

| text size | data size |
| ----------- | ----------- |
| 0x300 | 0x50 |

according to the picture above, the text segment starts at 0x400000. the text size of procedure A is 0x100, so the text of procedure B starts at 0x400100. the data segment is laid out in the same way: the data size of procedure A is 0x20, so X is at 0x10000000 and Y is at 0x10000020.

\$gp points to 0x10008000, which is 268468224 in decimal. the first data word is at 0x10000000 = 268435456, so its offset from the global pointer is -32768. the second one is at 0x10000020 = 268435488, so its offset from the global pointer is -32736. (you may ask why the textbook writes 0x8000: the offset is a signed 16-bit integer, so 0x8000 is -32768 in decimal, and -32736 is 0x8020. so the textbook is right!)

the start address of procedure A is 0x400000 and that of procedure B is 0x400100. ```jal``` takes its 26-bit address field, turns it into a 28-bit address (filling the two lowest bits with zeros, which is the same as shifting left by 2) and concatenates the upper four bits of PC + 4. so the field for a given destination is the destination address with the upper four bits dropped and then divided by 4:

* 0x400000 = 0b0000 0000 0100 0000 0000 0000 0000 0000 -> 0b0000 0100 0000 0000 0000 0000 0000 -> 0b0000 0001 0000 0000 0000 0000 0000 = 0x100000 = 1048576
* 0x400100 = 0b0000 0000 0100 0000 0000 0001 0000 0000 -> 0b0000 0100 0000 0000 0001 0000 0000 -> 0b0000 0001 0000 0000 0000 0100 0000 = 0x100040 = 1048640

__text segment:__

| address | instruction |
| ------- | ----------- |
| 0x400000 | ```lw $a0, -32768($gp)``` |
| 0x400004 | ```jal 1048640``` |
| ... | ... |
| 0x400100 | ```sw $a1, -32736($gp)``` |
| 0x400104 | ```jal 1048576``` |

__data segment:__

| address | data |
| ------- | --- |
| 0x10000000 | X |
| ... | ... |
| 0x10000020 | Y |
| ... | ... |

### loader
the loader follows these steps in UNIX systems:
1. reads the executable file header to determine the size of the text and data segments.
2. creates an address space large enough for the text and data.
3. copies the instructions and data from the executable file into memory.
4. copies the parameters (if any) to the main program onto the stack.
5. initializes the machine registers and sets the stack pointer to the first free location.
6. jumps to a start-up routine that copies the parameters into the argument registers and calls the main routine of the program. when the main routine returns, the start-up routine terminates the program with an exit system call.

### a c sort example
turn the c sort code into MIPS assembly code.
```c=
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

void sort(int v[], int n) // sorts v[0..n-1]
{
    int i, j;
    for (i = 0; i < n; ++i)
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; --j)
            swap(v, j);
}
```

ans:
first deal with ```swap(int*, int)```
```
SWAP:
    sll $a1, $a1, 2       # index * 4, it's important
    add $t0, $a0, $a1     # $t0 = &v[k]
    lw $t1, 0($t0)        # $t1 = v[k]
    lw $t2, 4($t0)        # $t2 = v[k+1]
    sw $t1, 4($t0)        # v[k+1] = old v[k]
    sw $t2, 0($t0)        # v[k] = old v[k+1]
    jr $ra
```
then deal with ```sort(int*, int)```
```
SORT:
    addi $sp, $sp, -16
    sw $s0, 0($sp)
    sw $s1, 4($sp)
    sw $s2, 8($sp)
    sw $ra, 12($sp)       # since sort itself makes a function call

    add $s0, $zero, $zero # i = 0
    add $s2, $zero, $a1   # $s2 = n
LOOP1:
    # LOOP1 condition: i < n
    slt $t0, $s0, $s2     # $t0 = (i < n)
    beq $t0, $zero, EXIT_LOOP1 # if (i < n) == 0 goto EXIT_LOOP1
    addi $s1, $s0, -1     # j = i - 1
LOOP2:
    # LOOP2 condition: j >= 0
    slt $t0, $s1, $zero   # $t0 = (j < 0)
    bne $t0, $zero, EXIT_LOOP2 # if (j < 0) != 0, goto EXIT_LOOP2
    # LOOP2 condition: v[j] > v[j+1]
    sll $t4, $s1, 2       # $t4 = j * 4
    add $t0, $a0, $t4     # $t0 = &v[j]
    lw $t1, 0($t0)        # v[j]
    lw $t2, 4($t0)        # v[j+1]
    slt $t3, $t2, $t1     # $t3 = (v[j+1] < v[j])
    beq $t3, $zero, EXIT_LOOP2 # if (v[j+1] < v[j]) == 0, goto EXIT_LOOP2

    add $a1, $s1, $zero   # $a1 = j
    jal SWAP              # function call, $a0 = v, $a1 = j (relies on SWAP leaving $a0 untouched)
    addi $s1, $s1, -1     # --j
    j LOOP2
EXIT_LOOP2:
    addi $s0, $s0, 1      # ++i
    j LOOP1
EXIT_LOOP1:
    lw $ra, 12($sp)
    lw $s2, 8($sp)
    lw $s1, 4($sp)
    lw $s0, 0($sp)
    addi $sp, $sp, 16
    jr $ra
```

### fallacies and pitfalls
fallacy: _CISC has higher performance than RISC_

in the past, the statement was true. at that time memory access was very slow, so to reduce how often memory was accessed, engineers preferred information-dense instructions (i.e. instructions that contain many steps, like ```xchg``` in x86). the high frequency of memory accesses (for reading instructions) was therefore a fatal shortcoming for RISC, "at that time".

as memory access got faster, RISC started to gain the advantage. its characteristic (simple instructions that each do one thing) allows a technique called [instruction pipeline](https://en.wikipedia.org/wiki/Instruction_pipelining) to be implemented, which speeds RISC up a lot.

so, is RISC faster than CISC now? no. the two keep evolving and learning from each other. for example, a CISC processor can turn a big instruction into a sequence of RISC-like instructions and then pipeline them.

the only thing we can state is: __"there is no single most efficient one, only the one that fits best."__

fallacy: _write in assembly language to obtain the highest performance._

no, since the compiler is usually smarter than you and knows the arithmetic of your computer better than you do.

pitfall: _forgetting that sequential word addresses in machines with byte addressing do not differ by one._

remember to multiply by 4!
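
The pitfall can be checked with a small C sketch. `word_byte_offset` and `word_address` are hypothetical helpers (not from the text) that assume 4-byte words, as on 32-bit MIPS. They reproduce the offsets used in the earlier examples: A[8] is at offset 32, A[12] at 48 and A[300] at 1200.

```c
#include <stdint.h>

#define WORD_SIZE 4u  /* bytes per word (assumption: 32-bit MIPS) */

/* Byte offset of the word at `index`, i.e. the constant written in
 * lw/sw as offset($base). Same effect as `sll $t0, $index, 2`. */
static uint32_t word_byte_offset(uint32_t index) {
    return index * WORD_SIZE;
}

/* Byte address of A[index], given the base address of the array A. */
static uint32_t word_address(uint32_t base, uint32_t index) {
    return base + word_byte_offset(index);
}
```

Two neighbouring words therefore differ by 4 in their byte addresses, not by 1, which is why the array index is shifted left by 2 before it is added to the base register.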