# Rework Homework1 ## Container With Most Water Leetcode hyperlink:https://leetcode.com/problems/container-with-most-water/ You are given an integer array height of length n. There are n vertical lines drawn such that the two endpoints of the ith line are (i, 0) and (i, height[i]). Find two lines that together with the x-axis form a container, such that the container contains the most water. Return the maximum amount of water a container can store. Notice that you may not slant the container. Example 1: ![](https://i.imgur.com/nozDFt8.png) Input: height = [1,8,6,2,5,4,8,3,7] Output: 49 Explanation: The above vertical lines are represented by array [1,8,6,2,5,4,8,3,7]. In this case, the max area of water (blue section) the container can contain is 49. ### C code ``` #include<stdio.h> #include<stdlib.h> int maxArea(int* height, int heightSize){ int left=0; int right=heightSize-1; int shortline; int currentarea; int maxarea=0; while(left!=right) { if(height[left]>height[right]) { shortline=height[right]; currentarea=shortline*(right-left); if(currentarea>maxarea) maxarea=currentarea; right--; } else { shortline=height[left]; currentarea=shortline*(right-left); if(currentarea>maxarea) maxarea=currentarea; left++; } } return maxarea; } int main() { int height[]={1,8,6,2,5,4,3,7}; int heightSize=8; int a; a=maxArea(height,heightSize); printf("%d\n",a); } ``` ## Assembly code ``` .data arr1: .word 1,8,6,2,5,4,3,7 # h[] = {1,8,6,2,5,4,3,7} len: .word 8 #len=8 .text main: la s1,arr1 # load address of the array lw s2,len la s3,arr1 add s4,zero,zero #currentarea add s5,zero,zero #maxarea add t1,zero,zero #left's area addi t2,s2,-1 #right's area lw a3 28(s3) #a3=h[7] #right's value lw a4,0(s1) #a4=h[0] #left's value add a6,zero,zero #right-left loop: beq t1,t2,print #if left=right then return maxarea sub a6,t2,t1 #a6=right-left bgt a4,a3,left #if left>right mul s4,a4,a6 #currentarea=left*a6 addi t1,t1,1 #left'value+1 addi s1,s1,4 lw a4,0(s1) #left'value+1 bgt s4,s5,max #if currentarea>maxarea=>max jal ra,loop max: add,s5,s4,zero jal loop left: mul s4,a3,a6 #currentarea=right*a6 addi t2,t2,-1 #right'value-1 addi s3,s3,-4 lw a3,28(s3) #right'value-1 bgt s4,s5,max jal ra,loop print: mv a0,s5 #equal to addi a0,s5,0 li a7,1 ecall ``` ### Analyze #### RISC-V Pipline We choose a 5-stage pipline processor to run the code. “5-stage” means we divide the process of executing into five parts. They are: 1. Instruction fetch (IF) 2. Instruction decode and register fetch (ID) 3. Execute (EX) 4. Memory access (MEM) 5. Register write back (WB) The value of each stage will be stored in pipeline register. The figure below shows the whole pipline of RISC-V. ![](https://i.imgur.com/hqRkhG9.png) #### Stages of the pipline ##### Instruction fetch (IF) In IF stage, processor get the instruction from memory with the address stored in PC. Also, PC will add 4 for next instruction if there is no branch or jump instruction. ![](https://i.imgur.com/U6rgSuF.png) When there is a **nop** added in the pipline, the enable signal in PC will turn into unable, which means stopping reading the next instruction.Also,we can see that branch instruction is in **EX stage**,so if we need to branch,we will have two nops. ##### Instruction decode and register fetch (ID) In this stage, processor decodes the instruction and decide which registers are going to use. In addition, we will give register a value, which is calculated or read from data by the previous instruction. ![](https://i.imgur.com/MWsOB6f.png) The instruction is 0x4730263 can be decoded into some parts. In riscv-spec we can find the beq specification: ![](https://i.imgur.com/j6dAMmk.png) * opcode : 110011 An I-type instruction * rs1: 0x06 * rs2: 0x07 * Imm : 0x00000044 ##### Execute (EX) In this stage, the ALU of the processor calculates value.Take the figure below for example. ![](https://i.imgur.com/4IfZhkl.png) Also, this is the stage that calculate the address of next instruction when jump or branch instructions are executed in this stage. ![](https://i.imgur.com/Vz1HTTf.png) The figure above shows how the jal is executed. The ALU calculates the address and sends back to PC in IF-stage. ![](https://i.imgur.com/JguTbM6.png) The figure above shows how beq is executed. The Branch taken signal is triggered if the condition is true. The ALU calculates the address and send to PC in IF-stage. ##### Memory access (MEM) In this stage, processor will store the data int o the memory or read the data from memory with the address caculated by ALU. Take the figure below for example, the processor accesses address 0x10000014, which is calculated by ALU, in data memory and get the data is 0x00000004. ![](https://i.imgur.com/m9HFkdH.png) ##### Register write back (WB) In this stage, the data will be written back to a register if the instruction requests for it. The figure below shows how the data write back to register. ![](https://i.imgur.com/ZoSpHQn.png) #### Data Hazards We are likely to face Data Hazard when instructions nearby acquire same register or data. For example, ``` sub a6,t2,t1 #a6=right-left bgt a4,a3,left #if left>right mul s4,a4,a6 #currentarea=left*a6 ``` The first instruction loads the data from ALU or memory into the register in WB stage, but the second instruction need the data right away.In theory, we need to add a nop between these two instructions. But we have forwarding line that we can use forwarding to send the data to EX stage. ![](https://i.imgur.com/KuQ1UiW.png) #### Control Hazards Control Hazards are likely to occur when Control Transfer Instructions are executed. For example, ``` bgt s4,s5,max #if currentarea>maxarea=>max jal ra,loop max: add,s5,s4,zero jal loop ``` If s4<s5,then the jal instruction will jump to loop. As a result, the following instructions add and jal instructions will not be executed right away. However, these instrutions are now in the pipline. Therefore, we have to use two nops to replace them. ![](https://i.imgur.com/z6V6iB2.png) => ![](https://i.imgur.com/bBK2kEz.png) ### Cache Miss When running a program, the processor has to get some data from memory. However, the processor runs too fast so that the memory can’t provide data on time. As a result, we need a cache, which can read and write data much faster than main memory, to store the data that processor needs. When a processor request the data that is not stored in cache, it is called cache miss. The processor needs to get the data from the memory and put in the cache, which causes much more time than just access cache. Therefore, the cache miss rate plays an important role in the performance of a processor. Cache miss have three type: **Compulsory misses**:Also known as cold start misses.Due to the first assumption, when a task executes on a processor, if not preempted by other tasks, the only cache misses are compulsory misses. **Capacity misses**:Capacity misses occur when the cache is too small to hold all concurrently used data. Conflict misses are caused when several addresses map to the same set and evict blocks that are still needed. **Conflict misses**:Conflict misses are caused when several addresses map to the same set and evict blocks that are still needed. ##### Cache design patterns There are three way to choose associative,respectively for 1.direct mapped 2.set associative 3.fully associative ![](https://i.imgur.com/dH9gfVy.png) Ripes in present have choose for three: ![](https://i.imgur.com/KDRqthW.png) First,I use fully associative: ![](https://i.imgur.com/DCOhLwP.png) In Ripes simulator, the address of data or instruction is divided into four parts, which are tag, index, Block and offset. Processor can find the data it needs in cache with these four parts. ![](https://i.imgur.com/Y4eSECy.png) In picture,we can see that * Tag: 31:9 * Index: 8:4 * Block: 3:2 * Offset: 1:0 The offset is always 00 because an address is **4-byte**. With the information above, the processor can search the data it needs in cache quickly. ![](https://i.imgur.com/8R3mDQw.png) The figure above shows how a **cache hit** looks like. The data is right in the cache so that the processor can get the data immediately. However, there are some data that the processor needs are not in cache. Thefore, we have to access the memory, get the data and put in the cache. Usually, we bring a block of data instead of only 1 value, since we are likely use the data around which are stored around the value we need now. ![](https://i.imgur.com/CpbY4Ar.png) #### Data cache miss In the program, the data cache miss is only caused by loading the element to the array. ![](https://i.imgur.com/oOYJBjI.png) ![](https://i.imgur.com/Zb8f5Su.png) ![](https://i.imgur.com/TPjiClK.png) So,in this case, there is only Compulsory misses,but in other cases,they may have Compulsory misses,Capacity misses,and Conflict misses. But if I change cache set into: ![](https://i.imgur.com/ylWBVRC.png) It will cause **Capacity misses**: ![](https://i.imgur.com/04FIrq8.png) ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ ![](https://i.imgur.com/TB4MQiu.png) #### Instruction cache miss There are only **Compulsory misses**. ![](https://i.imgur.com/cPEXhjV.png) When running a program, the processor needs to load the instructions from main memory, which are likely to cause instruction cache miss. Since the number of instructions in a single problem is fixed, the more times a certain part of code is execute, the higher the hit rate is. ## Gas-Station Leetcode hyperlink:https://leetcode.com/problems/gas-station/ There are n gas stations along a circular route, where the amount of gas at the ith station is gas[i]. You have a car with an unlimited gas tank and it costs cost[i] of gas to travel from the ith station to its next (i + 1)th station. You begin the journey with an empty tank at one of the gas stations. Given two integer arrays gas and cost, return the starting gas station's index if you can travel around the circuit once in the clockwise direction, otherwise return -1. If there exists a solution, it is guaranteed to be unique ### C code ``` #include<stdio.h> #include<stdlib.h> int gas_station(int* gas, int gasSize, int* cost, int costSize) { int sum=0; int total=0; int start=0; int i; for(i=0;i<gasSize;i++) { sum+=gas[i]-cost[i]; total+=gas[i]-cost[i]; if (sum<0) { sum=0; start=i+1; } } return (total<0)?-1:start; } int main() { int gas[]={1,2,3,4,5,2,1,2,5,8,2}; int cost[]={3,4,5,1,2,3,7,4,2,6,1}; int gasSize=11; int costSize=11; int a; a=gas_station(gas,gasSize,cost,costSize); printf("%d\n",a); } ``` ## Assembly code ``` .data arr1: .word 1,2,3,4,5,2,1,2,5,8,2 arr2: .word 3,4,5,1,2,3,7,4,2,6,1 len: .word 11 .text main: la s1,arr1 # gas la s2,arr2 #cost lw s3,len #gas and cost size add s4,zero,zero #sum add a1,zero,zero #total add a2,zero,zero #start add a3,zero,zero #i lw a4,0(s1) #gas[0] lw a5,0(s2) #cost[0] li a6,-1 #-1 loop: sub t0,a4,a5 #t0=gas[i]-cost[i] add s4,s4,t0 #sum=sum+t0 add a1,a1,t0 #total=total+t0 bltz s4,restart #sum<0 =>restart addi a3,a3,1 #i=i+1 beq a3,s3,ans #if i==len =>ans addi,s1,s1,4 lw a4,0(s1) #gas[i+1] addi s2,s2,4 lw a5,0(s2) #cost[i+1] jal loop restart: li s4,0 #sum=0 addi a2,a3,1 #start=i+1 addi a3,a3,1 #i=i+1 addi,s1,s1,4 lw a4,0(s1) #gas[i+1] addi s2,s2,4 lw a5,0(s2) #cost[i+1] jal loop bad: mv a0,a6 j 12 ans: bltz a1,bad mv a0,a2 #equal to addi a0,a2,0 li a7,1 ecall ``` ### Cache Miss **Compulsory misses**: ![](https://i.imgur.com/F8zysy9.png) For the data that first put from memory to cache,it will cause Compulsory misses.Also,because of cache setting ![](https://i.imgur.com/QYvuWqh.png) One cycle will make 4 word ,so if I have 23 components,23/4=5. It will make 6 times Compulsory misses. **Capacity misses**: When we change cache set into ![](https://i.imgur.com/mn3FS0K.png) Because of lack of cache size,when the new datas come,it will make capacity misses. ![](https://i.imgur.com/QPEMo7y.png) ![](https://i.imgur.com/mpoRHMJ.png) **Conflict misses**: When we increase associative from 2-way to 8-way,we can see that hit rate is increase. ![](https://i.imgur.com/2PYk6JW.png) => ![](https://i.imgur.com/AiEhj3n.png) So,this result accomplish that reduce the associative of cache will increase the competition with each block. So if we want to make the highest hit rate,we should use fully-connected.But it is hard to accomplish fully-connected on the hardware.