CH1 - HackMD

# Overview --- * We can divide computer into two major section 1. Software 2. Hardware ![](https://hackmd.io/_uploads/Bkxe-3aOh.png) ## Storage-Program Concept --- * Proposed by John von Neumann also called **Von Neumann Architechture** * Says that the instctions and data can be stored in memory as **number** (like binary 0/1) ![](https://hackmd.io/_uploads/B1a6Ghp_n.png) * Consiste five majot components 1. **Control** 2. **Datapath** * Is the place instuctions run * Usually combined with Control unit and called **CPU** 3. **Memory** 4. **Input** 5. **Output** * Since we can't access the memory two times in the same time(Fetch instruction/Data access)this leads to a bottleneck, also known as the **Von Neumann Bottleneck** * To overcome it we can divide memory into two sections,one for instruction and the other for data * This method arose the **Harvard Architecture** ![](https://hackmd.io/_uploads/SJegvhaun.png) ## Instruction Set Architecture(ISA) --- * Computer command in High level language called statement,in **Low level language** called **Instruction** * Instruction is the most basic operation(*Primitive Operation*)that CPU can excute * **Instruction Set** and **Relative Hardware Information** build up the *ISA* * Work as the interface between the *lowest* level of software and *highest* level of hardware * Is an **abstraction** so the implmentation of same ISA can be different #### Different ISA --- ##### CISC(Complex Istruction Set Computer) * Include more complex instruction to provide a **better enviroment for programming** * Most x86 processor are this type(Intel/AMD) ##### RISC(Reduced Istruction Set Computer) * Include only the simplest instruction for CPU to achieve **faster instruction excution** * ARM is the most representative user #### Hardware Information in ISA --- ##### Memory * Is a large **single dimesional array** * Size is measured by *Array Length* * *Data Size in one slot* ![](https://hackmd.io/_uploads/H1x_VA6_h.png) * With different definations of data size in each memory slot we have two addressing method 1. Byte Addressing * The size of each slot is *1 Byte* 2. Word Addressing * The size of each slot is *1 Word* whose size is defined by the computer * Usually will be the **maximum bits of the data CPU can access in one time** * Like 32bits mean the bus between CPU and Memory is 32bits * For the *Write* operation, the **starting position** we choose **must be the mutiple of the word size defined by the computer**,also known as **Aligenmet** > Example: > > MIPS is using byte addressing with 32bits(4bytes) of data/address line,so that for each data CPU wants it needs to go through 4 memory positions. > >That's say the starting position of any write operation must be **mutiple of 4** > ![](https://hackmd.io/_uploads/H1y3iZgK2.png) * Address conforms to aligenment is **Word Address**,others called *Byte Address* * For *Read* operation,since the slot is only 1byte and the data is 4bytes so there's need to arrange the way data place 1. *Big Endian* * **MSB placed at the highest address** * Used by **MIPS** 2. *Little Endian* * **LSB placed at the highest address** * Used by Intel 80x86 > EX: > Assume that data is 0x3D2F4E88 > > Big Endian > > > |10004|10005|10006|10007| > |-----|-----|-----|-----| > | 88 | 4E | 2F | 3D | > > Little Endian > |10004|10005|10006|10007| > |-----|-----|-----|-----| > | 3D | 2F | 4E | 88 | --- ##### Register > ![](https://hackmd.io/_uploads/rJl936-Kh.png) ###### Special-Purpose Register * **Hi/Lo** is for mutiple and divide. * For mutiple the left 8 bytes will be stored in Hi REG while right 8 bytes will be stored in Lo REG * For divide the quotient will be stored in Hi REG while the remainder will be stored in Lo REG * **PC** stands for *Program Counter* storing the **address of next instruction** --- ##### General-Purpose Register > ![](https://hackmd.io/_uploads/HJv306bY3.png) * **Procedure call** * The procedure which is being call is *callee* * When storing data it must start storing from 0 > EX: > > caller = ADD(1,10),function address = 1030 > ```c++= > ADD(int x, int y) > { > int sum = 0; > for(int i = x ; i <= y ; i++) > sum += i; > return sum; > } > ``` > when calling > |a3|a2|a1|a0| > |--|--|--|--| > |0 |0 |10|1 | > > when return > |v1|v0|ra| > |--|--|--| > |0 |55 |1030| * **Variable** * Since Register is limited so while changing procedure or calling procedure the **callee**(Can reduce the number of register needed to be stored) need to store the original data in the *Saved Register* into memory * For **Saved Register** the *compiler* will find *frequently used* variable and store it in saved register and these information will be stored in *Symbol Table* like this > ![](https://hackmd.io/_uploads/By37KC-F3.png) * But more Saved Register might not have better efficency.Since it might increase the *clock time* of the CPU and harder to encode the register * **Memory Management** > ![](https://hackmd.io/_uploads/rJeGU0bF2.png) * *Local Variables* store in the **Stack** * *Dynamic Variables* store in the **Heap** * *SP* store the **top address of the stack** * *FP* store the **bottom address of thestack** * MIPS has this feature to avoid the confusion(may have many reference for one variable) casued by the change of SP during the runtime * We can use *FP+offset* to easily locate certain variable in the stack * *GP* store the **start address of the Static block**(in this case is 05F00000) ##### Translation Hierarchy * There's two way to run a program 1. **Compilation**(Translation) * High Level Language => *Assembly Language*(by **Compiler**) => *Machine Language*(by **Assembler**) => RUN(by **Linker**&**Loader**) * Linker will combined the original Machine Language with other things like library provided by the programming language to produce the excutable file * To load a excutable file with loader there's six steps 1. **Read the header** of the exe file to know the size of the file 2. Find a cotinuous space(**Address Space**) in memory which is large enough for the file 3. Copy the exe file into memory 4. If needed **copy the parameter into memory** 5. **Initialize the register** like set the sp to the first free sapce of the stack 6. Jump to **Start-up rountine** to run the program,that says that *all running program are a sub-program of the start-up rountine* and can be terminated by an exit system call called by the start-up rountine * Adopted by most programming language since it can **use the hardware of the computer directly** 2. **Interpretation** * Java => Java Binrary Code(by **Compiler**) => Run(with the use of the **Virtual Machine** in this case is JVM) * This methode can improve the portability and provide **platform independent** for the code but with the use of the virtual machine it's inefficient * To increase the efficency some frequently used functions will be compiled into machine language by the Just-In-Time compiler(JIT) ##### Instruction Set > ![](https://hackmd.io/_uploads/HkApYUMY3.png) 1. **Data Movement** * For data movment we use *reference* like **lw t0,4($s1)** since the memory location of program can be different during the runtime * Instructions mentioned below are **the only instructions will access memory** > Example: > ![](https://hackmd.io/_uploads/SJUibgPK2.png) > A[12] = h + A[8] in assembly = ? > 　For an int array each element is 1 bytes => **the offset can be derived by mutiple 4 with the index**. > 　For example A[12] can be represented by 4*12($s0) =48($s0) > So the answer is > ***lw t0,32(s0)*** > ***add s1,s1,t0*** > ***sw s1,48(s0)*** * MIPS also supports other instructions 1. **lh**(Load Half)&**lhu**(Load Half Unsigned) > Let s0 = 10004 > ![](https://hackmd.io/_uploads/r1fcHxwF3.png) > **lh t0, 2(s0)** > => t0 = , ,E8,AC but the first two bits is not defined in this case we need to extend the bits with the first avaliable=>**E** in binary is 1110 => extend 1 to the first two bits > =>**ANS = FF,FF,E8,AC** > > **lhu t0, 2(s0)** > => t0 = , , E8,AC since the data we load is an unsigned => the first bit is 0 =>extend 0 to the first two bits => **ANS = 00,00,E8,AC** 3. **lb**(Load Byte)&**lbu**(Load Byte Unsigned) * Same as the lh but only load one byte > ***lb t0, 4(s0)*** > t0 = , , ,B4 => B in binary is 1011 => **ANS = FF,FF,FF,B4** 4. **sh**(Store Half)&**sb**(Store Byte) * When storing we store the data into a smaller space(register is 4bytes but we store 1 byte at a time)so there's no need to extend the bits >let s0 = 10040 and t1 = FF,23,BE,F4 >***sh t0 4(s0)*** => **ANS => 10044 = BE 10045 = F4** >***sb t0 4(s0)*** => **ANS => 10044 = BE** * Sometimes we use an element in array as the other array's index like *A[B[4]]* > let A(s6) B(s7) > To derived the index for A we need to first load the B[4] > **(1).** ***lw t0,16(s7)*** > 　　Since there's no mutiple in MIPS we can use **shift left as mutiple 2**(shift right is divide 2) > **(2).** ***sll t0, 2***(shift left 2 bits = mutiple 4) > **(3).** ***add t0,t0,s6***(get the index of A) > **(4).** ***lw t1,0(t0)*** => GET the data we want in array A 2. **Operation** * **Logicl Movement** 1. **Shift** * Can shift to right or left * For **unsigned number** shift left equals to *mutiple 2^n*,shfit right equals to *divide 2^n* * The shft out bit will be discarded * Has **logical** & **arithmetic** * For *logical* the undefined bits are setting to 0(has L&R) * For *arithmetic* are set according to the sign it is(has only *R*) > t0 = 10010001 => ***sll t0 1*** =>0010001**0** > In this case the MSB 1 is discarded,The LSB is set to 0 > ***sra t0 1*** =>**1**1001000 > In this the sign is 1 so the MSB is reset to 1 and the LSB 1 is discarded 2. **Rotate** * The discarded bits will be reused to the direction we rotate > 10010001 => rotate left 1 => 0010001**1** > > MSB 1 is rotated to LSB 3. **Boolean Arithmetic** * Operations like **and**/**or**/**nor**/**xor**/ * MIPS doesn't have not instruction instead it use **nor** to implement not > Pseudo: ***not t0 s1*** > Actual assembly : ***nor t0 s1 s1*** > Other examples like **swap s0 s1** > Actual assembly: ***xor s0 s0 s1***(s0 = s0 xor s1) > 　　　　　　　　***xor s1 s0 s1*** (s1 = [s0 xor s1]xor s1 = s0) > 　　　　　　　　***xor s0 s0 s1*** (s0 = [s0 xor s1]xor s0 = s1) > **a xor a = 0* * **MASK** * Can reserve bits we want while set other bits to 0 * Implemented by **and** operation > s0 = 010**1110**010 => let s1 = 000**1111**000 > ***and s0 s0 s1*** => 000**1110**000 * **SET** * Can set the bits we want to 1 while others remained the same * Implemented by **or** operation > s0 = 010**1110**010 => let s1 = 000**1111**000 > ***or s0 s0 s1*** => 010**1111**010 * **Field Isolation** * Extract bits in the range(*field*) we want * Can be implementd by **shift** operations(most efficient way) > ![](https://hackmd.io/_uploads/r1GF1EqY2.png) > ![](https://hackmd.io/_uploads/B14q1V9t2.png) > (a): > 　field range from 16 to 0 > 　**(1).** ***sll t1 t0 9*** (t1 = [Field][000000][remaining bits in the right]) > 　**(2).** ***srl t1 t1 15*** > (b): > 　14+i-j = 31 => The ans should looks like [Field][000...000] > 　**(1).** ***srl t1 t0 6***(t1 = [remained btis in the left][000000][Field]) > 　**(2).** ***sll t1 t1 15*** >(c.): >　14+i-j = 31 => The ans should looks like [Field][xxx..xxx] >　**(1).** ***srl t2 t0 6***(t2 = [000000][xx...xxx][Field]) >　**(2).** ***sll t2 t2 15***(t2 = [Field][000...000]) >　**(3).** ***or t1 t1 t2*** * **Arithmetic** * Can only perform **Binary Operations** * For compiler, it produce assembly code with the symbol table and high level language > ![](https://hackmd.io/_uploads/B1-t6LGF2.png) > > A = B+C+D > > ***add $t0 , $s1 , $s2*** > > ***add $s0 , $t0 , $s3*** * **Divide** and **Modulus** (*Divider must be the power of 2*) can be done by division the data > 12/4 => 4 = $2^2$ => 12 = 0b1100 => divison from bits number 2 => ans = 0b11 = 3 > 17%2 => 2 = $2^1$ => 17 = 0b10001=> division from bits number 1 => 0b1 = 1 * **Flow Control** * Can jump betweeen the different **Basic Block** > Basic block is a sequence of instructions without brach(except jump-in(branch label) in begining and jump-out(branch instruction) in the end) * We use **Label** to specify different basic block 1. **beq**(Branch if equal) & **beq**(Branch if not equal) > ![](https://hackmd.io/_uploads/B1qvUyFtn.png) > The final **j exit** is required to avoid the program continue > ![](https://hackmd.io/_uploads/HyN7vJFY3.png) > Let A(s5) g(s1) h(s2) i(s3) j(s4) > **Assembly**=> > ***Loop: sll t1 s3 2*** // Get the index of A > 　　***add t1 t1 s5*** > 　　***lw to 0(s3)*** // Get the data in A[i] > 　　***add s1 s1 t0*** // g = g + A[i] > 　　***add s3 s3 s4*** // i = i + j > 　　***beq s1 s2 Loop*** * **slt**(Set if not equal) > ***slt t0 s1 s2*** equals to the following C statement: > ````C > if(s1 < s2) > t0 = 1; > else > t0 = 0; > ```` * **Immediate Instructions** * One operand can be constant**(avoid the *lantency for loading data* from memory) > ***addi t1 t0 5*** * The constant is **signed** and ranged from $2^{15}$ to $-2^{15}$ * Most instructions in MIPS have immediate variant(since the constant is signed so there's *no subi*) * If the constant is larger than 32 bits,we divied it into two 16bits segements and load it into register respectively > addi s0 zero 80000 > 80000 in 2's complement binary is `0000000000000001,0011100010000000` => in deciaml `1,14464` > **(1).** ***lui t0 1*** > **(2).** ***ori to 14463*** * **Pseudoinstruction** * Some instructions is not included by MIPS but the assmebly support * The assembler will transform it to the instrcutions that MIPS can excute.Following is some examples.(There's more see textbook) 1. **blt**(Branch if less than) > Pseudo : ***blt s1 s2 L*** > Actual assembly: ***slt t0 s1 s2*** > 　　　　　　　***bne t0 zero L*** 2. **bgt**(Branch if greater than) > Pseudo : ***brt s1 s2 L*** > Actual assembly: ***slt t0 s2 s1*** > 　　　　　　　***bne t0 zero L*** ##### Instruction Format * There's four types of format 1. **R-type**(*Register*) > ![](https://hackmd.io/_uploads/BkM_euoFh.png) * For instructions involved with **three register**(*all arithmetic instructions*) or **two register alonged with 5 bits offset**(*all shift movement*) * OP code **always be `000000`** 3. **I-type**(*Immediate*) > ![](https://hackmd.io/_uploads/Byg9WujK3.png) * For instructions involved with **two register and 16bits offset/constant** * **Branch/Immediate/Data Movement** 5. **J-type**(*Jump*) > ![](https://hackmd.io/_uploads/HkXYG_it2.png) * Only for **j/jal** * Temporary Register numbered from `8-15` * Saved Register numbered from `16-23` * Op code and Function code can be found in table > ![](https://hackmd.io/_uploads/r1e_VuoYn.png) ![](https://hackmd.io/_uploads/H16dVdot3.png) >(1). ***sll t1 s3 2*** => rs=null,rt=t1,rd=s3 => [000000][00000][10010][01001][00010][000000] >(2). ***add t1 t1 s6*** => rs=t1,rt=s6,rd=t1 => [000000][01001][10110][01001][00000][100001] >(3). ***lw t0 0(t1)*** => rs=t1,rt=t0 =>[100011][01001][01000][000000000000] >(4). **bne t0 s3 Exit** => rs=t0,rt=s3 =>[000101][01000][10101][0000000000000010] >* The offset for branch is **calculated from the address of PC**,in this case after fetch bne the PC points to *addi s3 s3 1* and the Exit block is in next 2 memory location => the offset is 2 > >(5). ***addi s3 s3 1*** => rs=s3,rd=s3 =>[001000][01011][01011][0000000000000001] >(6). ***j Loop*** =>[000010][100111000100000] >* MIPS supports **at most 4GB of memory and divide into 16($2^4$) blocks with each of the block is 256MB** => The first 4bits is block number. >* One program will not access other program's memory,that's say the intra-jump instruction will jump between the same block => **The first 4bits can be ingored** >* For one slot in memory it store 4byte => The change will only be occurred after bits nubmer 2 => **bits 1 and 0 can be ingored** > * We only need 32-4-2 = 26 bits to specify an address in same memory block * The range of jump/branch is determined by the number of the address bits in intruction * **Jump** * The destination address for jump is reconstructed by the assmebler and the first 4 bits which indictate the memory block that program exisit is determined by the PC,so the **Jump instruction can't jump between differnet memory block** * **Branch** * Branch takes 16bits as offset so the range for jumping is **2000-1FFC** * The memory is a ring structure so **it is possible for branch instruction branches to different memory block** * When we want to branch to a address far away we can combined jump with branch to do so > ![](https://hackmd.io/_uploads/HkRzsze9n.png) ###### Procedure Call * Both caller and callee has something to do before calling and returning > ![](https://hackmd.io/_uploads/BJRpKZfch.png) * If the callee will call other procedure(not leaf procedure)it needs to store the **ra** and **v0** > ![](https://hackmd.io/_uploads/SJiMj-f93.png) >Eample: >![](https://hackmd.io/_uploads/H18js-G5h.png)![](https://hackmd.io/_uploads/SJb3sbG5h.png) > > > --- >For callee: > >--- >*First Section*(Store register into stack) >* There's only 1 register needs to store >***addi sp sp -4*** // One variable needs 4 bytes >***sw s0 0(sp)*** // Store f into stack >--- >*Second Section*(Procedure Body) >***add a0 a0 a1*** // g+h >***add a2 a2 a3*** // i+j >***sub s0 a0 a2*** // f=(g+h)-(i+j) > >--- >*Third Section*(Setup return value) >***add v0 s0 zero*** > >--- >*Fourth Section*(Restore the data in stack) >***lw s0 0(sp)*** >***addi sp sp 4*** > >--- >*Fifth Section*(Return) >***jr ra*** >Example 2: >![](https://hackmd.io/_uploads/r1uKZMfcn.png) >* n stores in a0 >* The procedure will be the caller and callee at the same time >--- >*First Section*(Store register into stack) >* There's only 1 register needs to store >***addi sp sp -8*** // Store ra and v0 >***sw ra 0(sp)*** // Store ra into stack >***sw v0 4(sp)*** // Store v0 into stack >--- >*Section for If(n-1)* >***slti t0 a0 1*** // if(n < 1) set s0 into 1 >***beq t0 zero ELSE*** // Branch if t0=0(n>1) > >--- >*Section for ture* >***addi v0 1 zero*** // Set up return value >***addi sp sp 8*** // Reset sp >***jr ra*** // Return >* Since the segements above haven't change ra and v0 so there's no need to restore it >--- >*Section for ELSE* >***addi v0 v0 -1*** // Setup augument (n-1) >***jal FACT*** // Call FACT >***lw ra 0(sp)*** // Restore the original ra >***lw a0 4(sp)*** // Restore the original v0(n) >***mul v0 a0 v0*** // Setup return value (n*fact(n-1)) >***jr ra*** // Return ##### Addressing Mode * There's Five diffrent addressing mode 1. **Immediate Addressing Mode** * The operand is **constant** within the instruciton > ***addi t0 t0 1*** > ![](https://hackmd.io/_uploads/ryLk1umc2.png) 2. **Register Addressing Mode** * The operand is in the **register** > ***add t0 t0 t1*** > ![](https://hackmd.io/_uploads/rkgG1dX5h.png) 3. **Base or Displacement Addressing Mode** * The operand is in the a certain location of memory and the address is a sum of **register and constant offset** > ***add t0 t0 4(s0)*** >![](https://hackmd.io/_uploads/HkYXJ_Qqh.png) 4. **PC-Relative Addressing Mode** * The **Branch** address is a sum of PC and constant offset > ***beq t0 t1 LABEL*** > ![](https://hackmd.io/_uploads/r1JrJ_m53.png) 5. **Pseudodirect Addressing Mode** * The **Jump** address is a sum of 26bits address and upper four bits of PC > ***j LABEL*** > ![](https://hackmd.io/_uploads/ByTrydX93.png) * When designing ISA there's four things need to be awared 1. **Simplicity favors regularity** > Like all instructions are 32 bits 2. **Smaller is faster** > Like there's only 32 registers 3. **Make the common case fast** > Like immediate and PC-relative addressing mode 4. **Good design demands some compromises** > Like there's three different instructions format ##### RISC vs CISC >![](https://hackmd.io/_uploads/ByLbTj4cn.png) ##### Different styles of ISA 1. **Accumulator** * There's **only one register** in the CPU called accumulator > Example: a = b + c > ***load Memory[Address_b]*** > ***add Memory[Address_c]*** > ***store Memory[Address_a]*** * Since the only operand in the instruction is accumulatot this style is also known as **1 operand style** 2. **Memory to Memory** * Can have register or no register > Example: a = b + c > ***add Memory[Address_a], Memory[Address_b], Memory[Address_c]*** * **3 operands style** 3. **Stack** * Have **no register** >Example: a = b + c >***push Memory[Address_b]*** >***push Memory[Address_c]*** >***add*** >***pop Memory[Address_a]*** * The instruction is for the **top 2 element in the stack** * Must push/pop with the order given by the high level language >Example: a = (b+c) - d >***push b*** >***push c*** >***add*** >***push d*** >***sub*** >***pop a*** * **0 operand style** 4. **Load/Store** * Like MIPS >Example: a = b + c >***load R1 b*** >***load R2 c*** >***add R1 R1 R2*** >***store R1 a*** * **3 operands style**