# Design Abstraction
* As we mentioned before in CH1,ISA is an *Design Abstraction* can be implemented in different ways but all implementation must **follow the reqiurement provided by the ISA**
> 
## Components to Build up Computer in MIPS
1. **Memory**
> 
> 
> *Read line is optional*
2. Register/Instruction Format
>
* We use *Adder* to perform calculation **for memory address**,the ALU is used for other purpose
> 
* And it is often to extend bits less than 32bits to 32bits.We use a dedicated *extendsion unit* to perform it.
> 
3. Logic Gate
>
# Constructing Datapath
* It's difficult to build up whole datapath in one step.Therefore we divide it into small parts and reconstruct it to datapath
* And the logic graph might be messy to see.We can use a **Register Transfer Language(RTL)** to simulate the data transfering in programming language
> 
## Different Instruction's Datapath
1. **R-Type**
* Use ***add t1 s1 s2*** as example
>
>
* In RTL => ***R[rd] <- R[rs] op R[rt]***
2. **Load**
* Use **lw s1 12(s2)** as example
* The address is derived by adding 12 and s2 but the 12 is an 16 bits number thus there's need to extend it to 32bits
>
* In RTL => ***R[rt] <- MEM[R[rs]+SE(imm16))]***
3. **Store**
* Use **sw s1 12(s2)** as example
>
* In RTL => ***MEM[R[rs]+SE(imm16)] <- R[rt]***
4. **Branch**
* Use **beq s1 s2 L1** as example
* Recall that we implement comparsion by subtract them therefore in this case we can't use ALU for calculating address.
> 
* The branch address is calculated by $(PC+4)+ L1*4$ since we start counting from the next instruction's address
## Rebuilding Datapath
* We rebuild it following the order decided by the similarity of the hardware used by each of the single datapaths
> 
1. *Combine lw and sw*
> 
* Eventhough the *Read data* will have output when performing lw but the control line will keep it off from writing the memory
2. *Combine with R-type*
> 
* In this case,some data lines are linked together which is not accpetable.Therefore we use a **multiplexer** to decied which line will be used
> 
* For example,when performing R-type instruction like *add t1 s1 s2*.The first MUX = 1,second MUX = 0,third MUX = 1
3. *Combine with Branch*
> 
* The upper part is used for calculating the branch address
* The *zero line* of ALU is connected to the MUX along with an AND gate whose input 1 is 0 to avoid errors when performing instructions other than branch
4. *Combine with Control Units*
* We use *multiple level* CU for constructing
> 
> 
## Constructing Control Unit
### ALU Control
* The ALU can perform *add/sub/and/or/slt/nor* therefore it needs 4bits to represent all 5 instructions
> 
* Depending on the instruction type the Main Control will give ALU Control different input
> 
* By combining the chart above with the instruction we defined we can derive a truth table for the output of ALU Control
> 
* Since some of the inputs are don't care thus we can simplify the truth table to the table below(**JUST WATCH IT**)
>
> 
### Main Control
* Main Contorl has 6 output lines to control *4 MUX & 3 Storage R/W & 2 ALUop*
> 
> For a *R-type* instruction
> (1). RegDst = 1 (2). RegWrite = 1 (3). ALUSrc = > 0 (4).ALUOp0 = 1 ALUOp1 = 0
> (5). Branch = 0 (6). MemWtite = 0 (7). MemRead = 0 (8). MemtoReg = 0
* The following chart can be derived by the process above
> 
> Some question will place the MemtoReg MUX upside down
* Like the ALU Control, some input is don't care factor in this case.Therefore we can simplify the chart above to the following chart(**JUST WATCH IT**)
> 
* To implement the truth table in hardware,we use **Programmable Logic Array(PLA)**
> 
> For *R-type* instruction the OP code is `000000` and the output *RegDst & RegWrite & ALUOp1* should be `1` in R-type.Therefore we make the output of AND Gate_R be the `1` and line it to the correspond output line.We drived the hardware by the process above.
#### PLA
* Is inspired by the **SumOfProduct** form of boolean expression
> For example => $f(a,b,c) = a\bar b+abc+b\bar c$
> It will always be some result of AND gate along with the OR Gate to get the final answer
> Thus we prepare some AND&OR Gate beforehand and use a program to define which output line is needed to connect to the OR Gate
> 
#### Adding JUMP
* Since the JUMP instruction will influence the PC thus we need to add another MUX after the PCSrc
> 
* We first *shift right 2* to restore the 0 of last 2 bits.For the first 4 bits(Block number),we can use the PC+4's block number directly since it's inside the same program
#### Effect of the Control Line
* Consider the following line is being stuck at 0/1 which instructions will still work.
> 
##### RegWrite
| | R-type | lw | sw | beq |
| --- |:--------:| -------- | -------- | -------- |
| 0 | $\times$ | $\times$ | $\vee$ | $\vee$ |
| 1 | $\vee$ | $\vee$ | $\times$ | $\times$ |
##### ALUop1
| | R-type | lw | sw | beq |
| --- |:-------------------:| -------- | -------- | ------ |
| 0 | $\times$(exception) | $\vee$ | $\vee$ | $\vee$ |
| 1 | $\vee$ | $\times$ | $\times$ | $\vee$ |
* The **Addtion for R-type will still work** when stuck at 0.Since the ALU will perform add when ALUop1 = 0.
##### ALUop0
| | R-type | lw | sw | beq |
| --- |:---------:| -------- | -------- | -------- |
| 0 | $\vee$ | $\vee$ | $\vee$ | $\times$ |
| 1 | $\vee$`!` | $\times$ | $\times$ | $\vee$ |
* `!` ALUop0 is don't care term for R-type instruction
##### Branch
| | R-type | lw | sw | beq |
| --- |:--------:| -------- | -------- |:--------:|
| 0 | $\vee$ | $\vee$ | $\vee$ | $\times$ |
| 1 | $\times$ | $\times$ | $\times$ | $\vee$ |
##### MemRead
| | R-type | lw | sw | beq |
| --- |:------:| -------- | ------ |:------:|
| 0 | $\vee$ | $\times$ | $\vee$ | $\vee$ |
| 1 | $\vee$ | $\vee$ | $\vee$ | $\vee$ |
##### MemWrite
| | R-type | lw | sw | beq |
| --- |:--------:| -------- | -------- |:--------:|
| 0 | $\vee$ | $\vee$ | $\times$ | $\vee$ |
| 1 | $\times$ | $\times$ | $\vee$ | $\times$ |
* Since it will write the data to wrong adrress whenever there's an address in memory file.Therfore when then MemWrite is stuck at 1 all instructions except sw will be wrong.
> Example:
> (2.) 1 4
> (3.) 3
# Single Cycle Machine Performance
* To determine the cycle time for the machine we need to first find out the **longest time needed to perform an instruction** by identify the **critical path** for all instructions.
>
* From the chart above we can see that **lw** has the longest critical path,that's say **the minimum time for clock cycle is the time used for perform lw**
> Example:
> | | R-type | lw | sw | beq | Jump|
> | --- |:--------:| -------- | -------- |:--------:|:----|
> | Time| 400 | 600 | 550 | 350 | 200 |
>
> For single cycle => cycle time = 600ps
> For variable clock => avg clock time =$400*0.45+500*0.23+550*0.1+350*0.15+200*0.05 = 447.5ps$
> variable clock is faster than single clock by $600/447.5=1.34$
> Example:
> 1. R-type => PIMRAMR => $400+30+200+120+30+200 = 980$
> 2. lw => PIMRADMR => $400+30+200+120+350+30+200 = 1300$
> 3. beq => PIMRAM => $400+30+200+120+30 = 780$
# Multiple Cycle Machine
* We break instrucitons into series of steps and those **steps contain only one main functional unit** to avoid lengthing the cycle time(i.e longest steps)
* Functional units contains **Memory/Register File,ALU**
>
> Example:

> Caulation => CPI = 4
> Branch => CPI = 3
> MemRead => CPI = 5
> MemWrite => CPI = 4
> Cycle Time = 60ns
> $AvgExeTime = AvgCPI * \frac{1}{CycleTime}$
> =>$4*0.47+3*0.2+5*0.21+4*0.12 = 4.01$ => $Ans =4.93*60*10^{-9} = 240.6ns$
> Example:
> lw needs 5 steps => 6th clock is fetching instruction for lw t3 4(t3)
> The second step for beq is *reading register&decoding* => 12th clock is reading register and decoding
> Example:

> (1).
> $AvgCPI_{Multiple} = 6*0.2+5*0.14+4*0.5+4*0.12+3*0.04 = 4.5$
> $AvgCPI_{Single} = 0.5+0.2+0.14+0.12+0.04 = 1$
> => 4.5/1 =4.5
>(2).
>In single lw takes 60ns => In multiple the cycle time is $60/6= 10ns$ but it has 2ns delay between each step => *12ns*
> =>$Speedup = \frac{1*60}{4.5*12} = 1.11$
> Example:
>
> 1). *4+N* since it is the longest excution time among all instructions
> 2). $ExeTime_{Mul} = {\frac{\sum_{i=1} ^{N}4+i}{N}* 1ns = 4.5+4N}$
> For single cycle the $AvgCPI = 1$ =>$ExeTime_{Single} = 1 * 4+N = 4+N$
> => $SpeedUp = \frac{4+N}{4.5+4N}$
> 3). $\frac{4+N}{4.5+4N} = 5 => N = -12.33$ => Impossible
> 4). i.e the frequency for class i instruction is $\frac{i}{\sum_{i=1} ^{N} i}$ => $ExeTime_{Mul} = {\frac{\sum_{i=1} ^{N}(4+i)i}{\sum_{i=1} ^{N} i}= 4+\frac{2N+1}{3}}$
> => $SpeedUp= \frac{4+N}{4+\frac{2N+1}{3}} = \frac{12+3N}{13+2N}$
> 5). $ExeTime_{Single} = 9$, $ExeTime_{Mul} = 2*\frac{\sum_{i=1} ^{5}\frac{(4+i)}{2}}{5} = 7$
> => $SpeedUp = 9/7 = 1.2857$