CH4 - HackMD

# Design Abstraction * As we mentioned before in CH1,ISA is an *Design Abstraction* can be implemented in different ways but all implementation must **follow the reqiurement provided by the ISA** > ![](https://hackmd.io/_uploads/H1_ASEDoh.png) ## Components to Build up Computer in MIPS 1. **Memory** > ![](https://hackmd.io/_uploads/HJPG8NPs2.png) > ![](https://hackmd.io/_uploads/SyOQUEPin.png) > *Read line is optional* 2. Register/Instruction Format >![](https://hackmd.io/_uploads/ByinLEwi3.png) * We use *Adder* to perform calculation **for memory address**,the ALU is used for other purpose > ![](https://hackmd.io/_uploads/SybSK4von.png) * And it is often to extend bits less than 32bits to 32bits.We use a dedicated *extendsion unit* to perform it. > ![](https://hackmd.io/_uploads/H1B49NDsn.png) 3. Logic Gate >![](https://hackmd.io/_uploads/Byp55Evo3.png) # Constructing Datapath * It's difficult to build up whole datapath in one step.Therefore we divide it into small parts and reconstruct it to datapath * And the logic graph might be messy to see.We can use a **Register Transfer Language(RTL)** to simulate the data transfering in programming language > ![](https://hackmd.io/_uploads/H1R7nEvih.png) ## Different Instruction's Datapath 1. **R-Type** * Use ***add t1 s1 s2*** as example >![](https://hackmd.io/_uploads/ryjLsVDs2.png) >![](https://hackmd.io/_uploads/r1LFj4wjn.png) * In RTL => ***R[rd] <- R[rs] op R[rt]*** 2. **Load** * Use **lw s1 12(s2)** as example * The address is derived by adding 12 and s2 but the 12 is an 16 bits number thus there's need to extend it to 32bits >![](https://hackmd.io/_uploads/BJ27I_ui2.png) * In RTL => ***R[rt] <- MEM[R[rs]+SE(imm16))]*** 3. **Store** * Use **sw s1 12(s2)** as example >![](https://hackmd.io/_uploads/H1P7zF_ih.png) * In RTL => ***MEM[R[rs]+SE(imm16)] <- R[rt]*** 4. **Branch** * Use **beq s1 s2 L1** as example * Recall that we implement comparsion by subtract them therefore in this case we can't use ALU for calculating address. > ![](https://hackmd.io/_uploads/SkH4dtdon.png) * The branch address is calculated by $(PC+4)+ L1*4$ since we start counting from the next instruction's address ## Rebuilding Datapath * We rebuild it following the order decided by the similarity of the hardware used by each of the single datapaths > ![](https://hackmd.io/_uploads/HyiKiYdi3.png) 1. *Combine lw and sw* > ![](https://hackmd.io/_uploads/rymanFujn.png) * Eventhough the *Read data* will have output when performing lw but the control line will keep it off from writing the memory 2. *Combine with R-type* > ![](https://hackmd.io/_uploads/S1lgJqdjn.png) * In this case,some data lines are linked together which is not accpetable.Therefore we use a **multiplexer** to decied which line will be used > ![](https://hackmd.io/_uploads/r1ajl9Ojn.png) * For example,when performing R-type instruction like *add t1 s1 s2*.The first MUX = 1,second MUX = 0,third MUX = 1 3. *Combine with Branch* > ![](https://hackmd.io/_uploads/HJTL8cdon.png) * The upper part is used for calculating the branch address * The *zero line* of ALU is connected to the MUX along with an AND gate whose input 1 is 0 to avoid errors when performing instructions other than branch 4. *Combine with Control Units* * We use *multiple level* CU for constructing > ![](https://hackmd.io/_uploads/r1Izuquo3.png) > ![](https://hackmd.io/_uploads/BJOSu5doh.png) ## Constructing Control Unit ### ALU Control * The ALU can perform *add/sub/and/or/slt/nor* therefore it needs 4bits to represent all 5 instructions > ![](https://hackmd.io/_uploads/B1zz25dsn.png) * Depending on the instruction type the Main Control will give ALU Control different input > ![](https://hackmd.io/_uploads/SkTY95Oj2.png) * By combining the chart above with the instruction we defined we can derive a truth table for the output of ALU Control > ![](https://hackmd.io/_uploads/rk_g3qus3.png) * Since some of the inputs are don't care thus we can simplify the truth table to the table below(**JUST WATCH IT**) >![](https://hackmd.io/_uploads/r1TY3cdon.png) > ![](https://hackmd.io/_uploads/Syqzh1qo2.png) ### Main Control * Main Contorl has 6 output lines to control *4 MUX & 3 Storage R/W & 2 ALUop* > ![](https://hackmd.io/_uploads/Bk_ohkqjh.png) > For a *R-type* instruction > (1). RegDst = 1 (2). RegWrite = 1 (3). ALUSrc = > 0 (4).ALUOp0 = 1 ALUOp1 = 0 > (5). Branch = 0 (6). MemWtite = 0 (7). MemRead = 0 (8). MemtoReg = 0 * The following chart can be derived by the process above > ![](https://hackmd.io/_uploads/Bk07zg9oh.png) > Some question will place the MemtoReg MUX upside down * Like the ALU Control, some input is don't care factor in this case.Therefore we can simplify the chart above to the following chart(**JUST WATCH IT**) > ![](https://hackmd.io/_uploads/r1O3Hx5o3.png) * To implement the truth table in hardware,we use **Programmable Logic Array(PLA)** > ![](https://hackmd.io/_uploads/BynZPx9jh.png) > For *R-type* instruction the OP code is `000000` and the output *RegDst & RegWrite & ALUOp1* should be `1` in R-type.Therefore we make the output of AND Gate_R be the `1` and line it to the correspond output line.We drived the hardware by the process above. #### PLA * Is inspired by the **SumOfProduct** form of boolean expression > For example => $f(a,b,c) = a\bar b+abc+b\bar c$ > It will always be some result of AND gate along with the OR Gate to get the final answer > Thus we prepare some AND&OR Gate beforehand and use a program to define which output line is needed to connect to the OR Gate > ![](https://hackmd.io/_uploads/S177OZ5i2.png) #### Adding JUMP * Since the JUMP instruction will influence the PC thus we need to add another MUX after the PCSrc > ![](https://hackmd.io/_uploads/ByFIYmain.png) * We first *shift right 2* to restore the 0 of last 2 bits.For the first 4 bits(Block number),we can use the PC+4's block number directly since it's inside the same program #### Effect of the Control Line * Consider the following line is being stuck at 0/1 which instructions will still work. > ![](https://hackmd.io/_uploads/SJdHRXai2.png) ##### RegWrite | | R-type | lw | sw | beq | | --- |:--------:| -------- | -------- | -------- | | 0 | $\times$ | $\times$ | $\vee$ | $\vee$ | | 1 | $\vee$ | $\vee$ | $\times$ | $\times$ | ##### ALUop1 | | R-type | lw | sw | beq | | --- |:-------------------:| -------- | -------- | ------ | | 0 | $\times$(exception) | $\vee$ | $\vee$ | $\vee$ | | 1 | $\vee$ | $\times$ | $\times$ | $\vee$ | * The **Addtion for R-type will still work** when stuck at 0.Since the ALU will perform add when ALUop1 = 0. ##### ALUop0 | | R-type | lw | sw | beq | | --- |:---------:| -------- | -------- | -------- | | 0 | $\vee$ | $\vee$ | $\vee$ | $\times$ | | 1 | $\vee$`!` | $\times$ | $\times$ | $\vee$ | * `!` ALUop0 is don't care term for R-type instruction ##### Branch | | R-type | lw | sw | beq | | --- |:--------:| -------- | -------- |:--------:| | 0 | $\vee$ | $\vee$ | $\vee$ | $\times$ | | 1 | $\times$ | $\times$ | $\times$ | $\vee$ | ##### MemRead | | R-type | lw | sw | beq | | --- |:------:| -------- | ------ |:------:| | 0 | $\vee$ | $\times$ | $\vee$ | $\vee$ | | 1 | $\vee$ | $\vee$ | $\vee$ | $\vee$ | ##### MemWrite | | R-type | lw | sw | beq | | --- |:--------:| -------- | -------- |:--------:| | 0 | $\vee$ | $\vee$ | $\times$ | $\vee$ | | 1 | $\times$ | $\times$ | $\vee$ | $\times$ | * Since it will write the data to wrong adrress whenever there's an address in memory file.Therfore when then MemWrite is stuck at 1 all instructions except sw will be wrong. > Example:![](https://hackmd.io/_uploads/HkITiSTsn.png) > (2.) 1 4 > (3.) 3 # Single Cycle Machine Performance * To determine the cycle time for the machine we need to first find out the **longest time needed to perform an instruction** by identify the **critical path** for all instructions. >![](https://hackmd.io/_uploads/ry7IZUaoh.png) * From the chart above we can see that **lw** has the longest critical path,that's say **the minimum time for clock cycle is the time used for perform lw** > Example:![](https://hackmd.io/_uploads/S1h1MUTj2.png) > | | R-type | lw | sw | beq | Jump| > | --- |:--------:| -------- | -------- |:--------:|:----| > | Time| 400 | 600 | 550 | 350 | 200 | > > For single cycle => cycle time = 600ps > For variable clock => avg clock time =$400*0.45+500*0.23+550*0.1+350*0.15+200*0.05 = 447.5ps$ > variable clock is faster than single clock by $600/447.5=1.34$ > Example:![](https://hackmd.io/_uploads/HJq_NUajn.png) > 1. R-type => PIMRAMR => $400+30+200+120+30+200 = 980$ > 2. lw => PIMRADMR => $400+30+200+120+350+30+200 = 1300$ > 3. beq => PIMRAM => $400+30+200+120+30 = 780$ # Multiple Cycle Machine * We break instrucitons into series of steps and those **steps contain only one main functional unit** to avoid lengthing the cycle time(i.e longest steps) * Functional units contains **Memory/Register File,ALU** >![](https://hackmd.io/_uploads/BJGDcapj2.png) > Example:![](https://hackmd.io/_uploads/BJwoq6aoh.png) ![](https://hackmd.io/_uploads/ryyhq66j3.png) > Caulation => CPI = 4 > Branch => CPI = 3 > MemRead => CPI = 5 > MemWrite => CPI = 4 > Cycle Time = 60ns > $AvgExeTime = AvgCPI * \frac{1}{CycleTime}$ > =>$4*0.47+3*0.2+5*0.21+4*0.12 = 4.01$ => $Ans =4.93*60*10^{-9} = 240.6ns$ > Example:![](https://hackmd.io/_uploads/Syn5p6aon.png) > lw needs 5 steps => 6th clock is fetching instruction for lw t3 4(t3) > The second step for beq is *reading register&decoding* => 12th clock is reading register and decoding > Example:![](https://hackmd.io/_uploads/B1llbAash.png) ![](https://hackmd.io/_uploads/BkRg-Aas2.png) > (1). > $AvgCPI_{Multiple} = 6*0.2+5*0.14+4*0.5+4*0.12+3*0.04 = 4.5$ > $AvgCPI_{Single} = 0.5+0.2+0.14+0.12+0.04 = 1$ > => 4.5/1 =4.5 >(2). >In single lw takes 60ns => In multiple the cycle time is $60/6= 10ns$ but it has 2ns delay between each step => *12ns* > =>$Speedup = \frac{1*60}{4.5*12} = 1.11$ > Example:![](https://hackmd.io/_uploads/HJNdezk2h.png) >![](https://hackmd.io/_uploads/B1wqgfknh.png) > 1). *4+N* since it is the longest excution time among all instructions > 2). $ExeTime_{Mul} = {\frac{\sum_{i=1} ^{N}4+i}{N}* 1ns = 4.5+4N}$ > 　 For single cycle the $AvgCPI = 1$ =>$ExeTime_{Single} = 1 * 4+N = 4+N$ > 　 => $SpeedUp = \frac{4+N}{4.5+4N}$ > 3). $\frac{4+N}{4.5+4N} = 5 => N = -12.33$ => Impossible > 4). i.e the frequency for class i instruction is $\frac{i}{\sum_{i=1} ^{N} i}$ => $ExeTime_{Mul} = {\frac{\sum_{i=1} ^{N}(4+i)i}{\sum_{i=1} ^{N} i}= 4+\frac{2N+1}{3}}$ > 　=> $SpeedUp= \frac{4+N}{4+\frac{2N+1}{3}} = \frac{12+3N}{13+2N}$ > 5). $ExeTime_{Single} = 9$, $ExeTime_{Mul} = 2*\frac{\sum_{i=1} ^{5}\frac{(4+i)}{2}}{5} = 7$ > 　=> $SpeedUp = 9/7 = 1.2857$