# Overview
---
* We can divide computer into two major section
1. Software
2. Hardware

## Storage-Program Concept
---
* Proposed by John von Neumann also called **Von Neumann Architechture**
* Says that the instctions and data can be stored in memory as **number** (like binary 0/1)

* Consiste five majot components
1. **Control**
2. **Datapath**
* Is the place instuctions run
* Usually combined with Control unit and called **CPU**
3. **Memory**
4. **Input**
5. **Output**
* Since we can't access the memory two times in the same time(Fetch instruction/Data access)this leads to a bottleneck, also known as the **Von Neumann Bottleneck**
* To overcome it we can divide memory into two sections,one for instruction and the other for data
* This method arose the **Harvard Architecture**

## Instruction Set Architecture(ISA)
---
* Computer command in High level language called statement,in **Low level language** called **Instruction**
* Instruction is the most basic operation(*Primitive Operation*)that CPU can excute
* **Instruction Set** and **Relative Hardware Information** build up the *ISA*
* Work as the interface between the *lowest* level of software and *highest* level of hardware
* Is an **abstraction** so the implmentation of same ISA can be different
#### Different ISA
---
##### CISC(Complex Istruction Set Computer)
* Include more complex instruction to provide a **better enviroment for programming**
* Most x86 processor are this type(Intel/AMD)
##### RISC(Reduced Istruction Set Computer)
* Include only the simplest instruction for CPU to achieve **faster instruction excution**
* ARM is the most representative user
#### Hardware Information in ISA
---
##### Memory
* Is a large **single dimesional array**
* Size is measured by *Array Length* * *Data Size in one slot*

* With different definations of data size in each memory slot we have two addressing method
1. Byte Addressing
* The size of each slot is *1 Byte*
2. Word Addressing
* The size of each slot is *1 Word* whose size is defined by the computer
* Usually will be the **maximum bits of the data CPU can access in one time**
* Like 32bits mean the bus between CPU and Memory is 32bits
* For the *Write* operation, the **starting position** we choose **must be the mutiple of the word size defined by the computer**,also known as **Aligenmet**
> Example:
>
> MIPS is using byte addressing with 32bits(4bytes) of data/address line,so that for each data CPU wants it needs to go through 4 memory positions.
>
>That's say the starting position of any write operation must be **mutiple of 4**
> 
* Address conforms to aligenment is **Word Address**,others called *Byte Address*
* For *Read* operation,since the slot is only 1byte and the data is 4bytes so there's need to arrange the way data place
1. *Big Endian*
* **MSB placed at the highest address**
* Used by **MIPS**
2. *Little Endian*
* **LSB placed at the highest address**
* Used by Intel 80x86
> EX:
> Assume that data is 0x3D2F4E88
>
> Big Endian
>
>
> |10004|10005|10006|10007|
> |-----|-----|-----|-----|
> | 88 | 4E | 2F | 3D |
>
> Little Endian
> |10004|10005|10006|10007|
> |-----|-----|-----|-----|
> | 3D | 2F | 4E | 88 |
---
##### Register
> 
###### Special-Purpose Register
* **Hi/Lo** is for mutiple and divide.
* For mutiple the left 8 bytes will be stored in Hi REG while right 8 bytes will be stored in Lo REG
* For divide the quotient will be stored in Hi REG while the remainder will be stored in Lo REG
* **PC** stands for *Program Counter* storing the **address of next instruction**
---
##### General-Purpose Register
> 
* **Procedure call**
* The procedure which is being call is *callee*
* When storing data it must start storing from 0
> EX:
>
> caller = ADD(1,10),function address = 1030
> ```c++=
> ADD(int x, int y)
> {
> int sum = 0;
> for(int i = x ; i <= y ; i++)
> sum += i;
> return sum;
> }
> ```
> when calling
> |a3|a2|a1|a0|
> |--|--|--|--|
> |0 |0 |10|1 |
>
> when return
> |v1|v0|ra|
> |--|--|--|
> |0 |55 |1030|
* **Variable**
* Since Register is limited so while changing procedure or calling procedure the **callee**(Can reduce the number of register needed to be stored) need to store the original data in the *Saved Register* into memory
* For **Saved Register** the *compiler* will find *frequently used* variable and store it in saved register and these information will be stored in *Symbol Table* like this
> 
* But more Saved Register might not have better efficency.Since it might increase the *clock time* of the CPU and harder to encode the register
* **Memory Management**
> 
* *Local Variables* store in the **Stack**
* *Dynamic Variables* store in the **Heap**
* *SP* store the **top address of the stack**
* *FP* store the **bottom address of thestack**
* MIPS has this feature to avoid the confusion(may have many reference for one variable) casued by the change of SP during the runtime
* We can use *FP+offset* to easily locate certain variable in the stack
* *GP* store the **start address of the Static block**(in this case is 05F00000)
##### Translation Hierarchy
* There's two way to run a program
1. **Compilation**(Translation)
* High Level Language => *Assembly Language*(by **Compiler**) => *Machine Language*(by **Assembler**) => RUN(by **Linker**&**Loader**)
* Linker will combined the original Machine Language with other things like library provided by the programming language to produce the excutable file
* To load a excutable file with loader there's six steps
1. **Read the header** of the exe file to know the size of the file
2. Find a cotinuous space(**Address Space**) in memory which is large enough for the file
3. Copy the exe file into memory
4. If needed **copy the parameter into memory**
5. **Initialize the register** like set the sp to the first free sapce of the stack
6. Jump to **Start-up rountine** to run the program,that says that *all running program are a sub-program of the start-up rountine* and can be terminated by an exit system call called by the start-up rountine
* Adopted by most programming language since it can **use the hardware of the computer directly**
2. **Interpretation**
* Java => Java Binrary Code(by **Compiler**) => Run(with the use of the **Virtual Machine** in this case is JVM)
* This methode can improve the portability and provide **platform independent** for the code but with the use of the virtual machine it's inefficient
* To increase the efficency some frequently used functions will be compiled into machine language by the Just-In-Time compiler(JIT)
##### Instruction Set
> 
1. **Data Movement**
* For data movment we use *reference* like **lw t0,4($s1)** since the memory location of program can be different during the runtime
* Instructions mentioned below are **the only instructions will access memory**
> Example:
> 
> A[12] = h + A[8] in assembly = ? <br>
> For an int array each element is 1 bytes => **the offset can be derived by mutiple 4 with the index**.<br>
> For example A[12] can be represented by 4*12($s0) =48($s0)<br>
> So the answer is
> ***lw t0,32(s0)***
> ***add s1,s1,t0***
> ***sw s1,48(s0)***
* MIPS also supports other instructions
1. **lh**(Load Half)&**lhu**(Load Half Unsigned)
> Let s0 = 10004<br>
> <br>
> **lh t0, 2(s0)**<br>
> => t0 = , ,E8,AC but the first two bits is not defined in this case we need to extend the bits with the first avaliable=>**E** in binary is 1110 => extend 1 to the first two bits
> =>**ANS = FF,FF,E8,AC** <br>
>
> **lhu t0, 2(s0)** <br>
> => t0 = , , E8,AC since the data we load is an unsigned => the first bit is 0 =>extend 0 to the first two bits => **ANS = 00,00,E8,AC**
3. **lb**(Load Byte)&**lbu**(Load Byte Unsigned)
* Same as the lh but only load one byte
> ***lb t0, 4(s0)***
> t0 = , , ,B4 => B in binary is 1011 => **ANS = FF,FF,FF,B4**
4. **sh**(Store Half)&**sb**(Store Byte)
* When storing we store the data into a smaller space(register is 4bytes but we store 1 byte at a time)so there's no need to extend the bits
>let s0 = 10040 and t1 = FF,23,BE,F4<br>
>***sh t0 4(s0)*** => **ANS => 10044 = BE 10045 = F4**
>***sb t0 4(s0)*** => **ANS => 10044 = BE**
* Sometimes we use an element in array as the other array's index like *A[B[4]]*
> let A(s6) B(s7) <br>
> To derived the index for A we need to first load the B[4]<br>
> **(1).** ***lw t0,16(s7)***<br>
> Since there's no mutiple in MIPS we can use **shift left as mutiple 2**(shift right is divide 2)
> **(2).** ***sll t0, 2***(shift left 2 bits = mutiple 4)
> **(3).** ***add t0,t0,s6***(get the index of A)
> **(4).** ***lw t1,0(t0)*** => GET the data we want in array A
2. **Operation**
* **Logicl Movement**
1. **Shift**
* Can shift to right or left
* For **unsigned number** shift left equals to *mutiple 2^n*,shfit right equals to *divide 2^n*
* The shft out bit will be discarded
* Has **logical** & **arithmetic**
* For *logical* the undefined bits are setting to 0(has L&R)
* For *arithmetic* are set according to the sign it is(has only *R*)
> t0 = 10010001 => ***sll t0 1*** =>0010001**0**
> In this case the MSB 1 is discarded,The LSB is set to 0
> ***sra t0 1*** =>**1**1001000
> In this the sign is 1 so the MSB is reset to 1 and the LSB 1 is discarded
2. **Rotate**
* The discarded bits will be reused to the direction we rotate
> 10010001 => rotate left 1 => 0010001**1**
>
> MSB 1 is rotated to LSB
3. **Boolean Arithmetic**
* Operations like **and**/**or**/**nor**/**xor**/
* MIPS doesn't have not instruction instead it use **nor** to implement not
> Pseudo: ***not t0 s1***
> Actual assembly : ***nor t0 s1 s1***
> Other examples like **swap s0 s1**
> Actual assembly: ***xor s0 s0 s1***(s0 = s0 xor s1)
> ***xor s1 s0 s1*** (s1 = [s0 xor s1]xor s1 = s0)
> ***xor s0 s0 s1*** (s0 = [s0 xor s1]xor s0 = s1)
> **a xor a = 0*
* **MASK**
* Can reserve bits we want while set other bits to 0
* Implemented by **and** operation
> s0 = 010**1110**010 => let s1 = 000**1111**000
> ***and s0 s0 s1*** => 000**1110**000
* **SET**
* Can set the bits we want to 1 while others remained the same
* Implemented by **or** operation
> s0 = 010**1110**010 => let s1 = 000**1111**000
> ***or s0 s0 s1*** => 010**1111**010
* **Field Isolation**
* Extract bits in the range(*field*) we want
* Can be implementd by **shift** operations(most efficient way)
> 
> 
> (a):
> field range from 16 to 0
> **(1).** ***sll t1 t0 9*** (t1 = [Field][000000][remaining bits in the right])
> **(2).** ***srl t1 t1 15***
> (b):
> 14+i-j = 31 => The ans should looks like [Field][000...000]
> **(1).** ***srl t1 t0 6***(t1 = [remained btis in the left][000000][Field])
> **(2).** ***sll t1 t1 15***
>(c.):
> 14+i-j = 31 => The ans should looks like [Field][xxx..xxx]
> **(1).** ***srl t2 t0 6***(t2 = [000000][xx...xxx][Field])
> **(2).** ***sll t2 t2 15***(t2 = [Field][000...000])
> **(3).** ***or t1 t1 t2***
* **Arithmetic**
* Can only perform **Binary Operations**
* For compiler, it produce assembly code with the symbol table and high level language
> 
>
> A = B+C+D
>
> ***add $t0 , $s1 , $s2***
>
> ***add $s0 , $t0 , $s3***
* **Divide** and **Modulus** (*Divider must be the power of 2*) can be done by division the data
> 12/4 => 4 = $2^2$ => 12 = 0b1100 => divison from bits number 2 => ans = 0b11 = 3
> 17%2 => 2 = $2^1$ => 17 = 0b10001=> division from bits number 1 => 0b1 = 1
* **Flow Control**
* Can jump betweeen the different **Basic Block**
> Basic block is a sequence of instructions without brach(except jump-in(branch label) in begining and jump-out(branch instruction) in the end)
* We use **Label** to specify different basic block
1. **beq**(Branch if equal) & **beq**(Branch if not equal)
> 
> The final **j exit** is required to avoid the program continue
> <br>
> Let A(s5) g(s1) h(s2) i(s3) j(s4)
> **Assembly**=>
> ***Loop: sll t1 s3 2*** // Get the index of A
> ***add t1 t1 s5***
> ***lw to 0(s3)*** // Get the data in A[i]
> ***add s1 s1 t0*** // g = g + A[i]
> ***add s3 s3 s4*** // i = i + j
> ***beq s1 s2 Loop***
* **slt**(Set if not equal)
> ***slt t0 s1 s2*** equals to the following C statement:
> ````C
> if(s1 < s2)
> t0 = 1;
> else
> t0 = 0;
> ````
* **Immediate Instructions**
* One operand can be constant**(avoid the *lantency for loading data* from memory)
> ***addi t1 t0 5***
* The constant is **signed** and ranged from $2^{15}$ to $-2^{15}$
* Most instructions in MIPS have immediate variant(since the constant is signed so there's *no subi*)
* If the constant is larger than 32 bits,we divied it into two 16bits segements and load it into register respectively
> addi s0 zero 80000
> 80000 in 2's complement binary is `0000000000000001,0011100010000000` => in deciaml `1,14464`
> **(1).** ***lui t0 1***
> **(2).** ***ori to 14463***
* **Pseudoinstruction**
* Some instructions is not included by MIPS but the assmebly support
* The assembler will transform it to the instrcutions that MIPS can excute.Following is some examples.(There's more see textbook)
1. **blt**(Branch if less than)
> Pseudo : ***blt s1 s2 L***
> Actual assembly: ***slt t0 s1 s2***
> ***bne t0 zero L***
2. **bgt**(Branch if greater than)
> Pseudo : ***brt s1 s2 L***
> Actual assembly: ***slt t0 s2 s1***
> ***bne t0 zero L***
##### Instruction Format
* There's four types of format
1. **R-type**(*Register*)
> 
* For instructions involved with **three register**(*all arithmetic instructions*) or **two register alonged with 5 bits offset**(*all shift movement*)
* OP code **always be `000000`**
3. **I-type**(*Immediate*)
> 
* For instructions involved with **two register and 16bits offset/constant**
* **Branch/Immediate/Data Movement**
5. **J-type**(*Jump*)
> 
* Only for **j/jal**
* Temporary Register numbered from `8-15`
* Saved Register numbered from `16-23`
* Op code and Function code can be found in table
>  
>(1). ***sll t1 s3 2*** => rs=null,rt=t1,rd=s3 => [000000][00000][10010][01001][00010][000000]
>(2). ***add t1 t1 s6*** => rs=t1,rt=s6,rd=t1 => [000000][01001][10110][01001][00000][100001]
>(3). ***lw t0 0(t1)*** => rs=t1,rt=t0 =>[100011][01001][01000][000000000000]
>(4). **bne t0 s3 Exit** => rs=t0,rt=s3 =>[000101][01000][10101][0000000000000010]
>* The offset for branch is **calculated from the address of PC**,in this case after fetch bne the PC points to *addi s3 s3 1* and the Exit block is in next 2 memory location => the offset is 2
>
>(5). ***addi s3 s3 1*** => rs=s3,rd=s3 =>[001000][01011][01011][0000000000000001]
>(6). ***j Loop*** =>[000010][100111000100000]
>* MIPS supports **at most 4GB of memory and divide into 16($2^4$) blocks with each of the block is 256MB** => The first 4bits is block number.
>* One program will not access other program's memory,that's say the intra-jump instruction will jump between the same block => **The first 4bits can be ingored**
>* For one slot in memory it store 4byte => The change will only be occurred after bits nubmer 2 => **bits 1 and 0 can be ingored**
> * We only need 32-4-2 = 26 bits to specify an address in same memory block
* The range of jump/branch is determined by the number of the address bits in intruction
* **Jump**
* The destination address for jump is reconstructed by the assmebler and the first 4 bits which indictate the memory block that program exisit is determined by the PC,so the **Jump instruction can't jump between differnet memory block**
* **Branch**
* Branch takes 16bits as offset so the range for jumping is **2000-1FFC**
* The memory is a ring structure so **it is possible for branch instruction branches to different memory block**
* When we want to branch to a address far away we can combined jump with branch to do so
> 
###### Procedure Call
* Both caller and callee has something to do before calling and returning
> 
* If the callee will call other procedure(not leaf procedure)it needs to store the **ra** and **v0**
> 
>Eample:
>
>
>
> ---
>For callee:
>
>---
>*First Section*(Store register into stack)
>* There's only 1 register needs to store
>***addi sp sp -4*** // One variable needs 4 bytes
>***sw s0 0(sp)*** // Store f into stack
>---
>*Second Section*(Procedure Body)
>***add a0 a0 a1*** // g+h
>***add a2 a2 a3*** // i+j
>***sub s0 a0 a2*** // f=(g+h)-(i+j)
>
>---
>*Third Section*(Setup return value)
>***add v0 s0 zero***
>
>---
>*Fourth Section*(Restore the data in stack)
>***lw s0 0(sp)***
>***addi sp sp 4***
>
>---
>*Fifth Section*(Return)
>***jr ra***
>Example 2:
>
>* n stores in a0
>* The procedure will be the caller and callee at the same time
>---
>*First Section*(Store register into stack)
>* There's only 1 register needs to store
>***addi sp sp -8*** // Store ra and v0
>***sw ra 0(sp)*** // Store ra into stack
>***sw v0 4(sp)*** // Store v0 into stack
>---
>*Section for If(n-1)*
>***slti t0 a0 1*** // if(n < 1) set s0 into 1
>***beq t0 zero ELSE*** // Branch if t0=0(n>1)
>
>---
>*Section for ture*
>***addi v0 1 zero*** // Set up return value
>***addi sp sp 8*** // Reset sp
>***jr ra*** // Return
>* Since the segements above haven't change ra and v0 so there's no need to restore it
>---
>*Section for ELSE*
>***addi v0 v0 -1*** // Setup augument (n-1)
>***jal FACT*** // Call FACT
>***lw ra 0(sp)*** // Restore the original ra
>***lw a0 4(sp)*** // Restore the original v0(n)
>***mul v0 a0 v0*** // Setup return value (n*fact(n-1))
>***jr ra*** // Return
##### Addressing Mode
* There's Five diffrent addressing mode
1. **Immediate Addressing Mode**
* The operand is **constant** within the instruciton
> ***addi t0 t0 1***
> 
2. **Register Addressing Mode**
* The operand is in the **register**
> ***add t0 t0 t1***
> 
3. **Base or Displacement Addressing Mode**
* The operand is in the a certain location of memory and the address is a sum of **register and constant offset**
> ***add t0 t0 4(s0)***
>
4. **PC-Relative Addressing Mode**
* The **Branch** address is a sum of PC and constant offset
> ***beq t0 t1 LABEL***
> 
5. **Pseudodirect Addressing Mode**
* The **Jump** address is a sum of 26bits address and upper four bits of PC
> ***j LABEL***
> 
* When designing ISA there's four things need to be awared
1. **Simplicity favors regularity**
> Like all instructions are 32 bits
2. **Smaller is faster**
> Like there's only 32 registers
3. **Make the common case fast**
> Like immediate and PC-relative addressing mode
4. **Good design demands some compromises**
> Like there's three different instructions format
##### RISC vs CISC
>
##### Different styles of ISA
1. **Accumulator**
* There's **only one register** in the CPU called accumulator
> Example: a = b + c
> ***load Memory[Address_b]***
> ***add Memory[Address_c]***
> ***store Memory[Address_a]***
* Since the only operand in the instruction is accumulatot this style is also known as **1 operand style**
2. **Memory to Memory**
* Can have register or no register
> Example: a = b + c
> ***add Memory[Address_a], Memory[Address_b], Memory[Address_c]***
* **3 operands style**
3. **Stack**
* Have **no register**
>Example: a = b + c
>***push Memory[Address_b]***
>***push Memory[Address_c]***
>***add***
>***pop Memory[Address_a]***
* The instruction is for the **top 2 element in the stack**
* Must push/pop with the order given by the high level language
>Example: a = (b+c) - d
>***push b***
>***push c***
>***add***
>***push d***
>***sub***
>***pop a***
* **0 operand style**
4. **Load/Store**
* Like MIPS
>Example: a = b + c
>***load R1 b***
>***load R2 c***
>***add R1 R1 R2***
>***store R1 a***
* **3 operands style**