# CA_HW1 ## Problem You are given an array prices where prices[i] is the price of a given stock on the ith day. You want to maximize your profit by choosing a single day to buy one stock and choosing a different day in the future to sell that stock. Return the maximum profit you can achieve from this transaction. If you cannot achieve any profit, return 0. examples: ![](https://i.imgur.com/XrstpPT.png) ![](https://i.imgur.com/sAlQbO9.png) ## Method First We should initialize our min_peice varibles to be the max value of "int", the p_gap varibles means the gap of profit, so in the beginning the value of p_gap should be zero. After we finished the initialization. We start to scan the array which recorded the stock price everyday. First, when the data that we scaned was less than the value of "min_price" varibles, we updated the value of min_price to be the value of the data that we scaned (i.e. prices[i] ). Second, if the prices[i] value minus the min_value is larger than the p_gap varibles, than we updated the p_gap value to be prices[i] minus min_price. After the for loop is done, p_gap value should be the largest profit in these trading days. ## The answer of the question in Leetcode (without main function) ```=c int maxProfit(int* prices, int pricesSize){ int min_price=INT_MAX; int p_gap=0; for (int i=0;i<pricesSize;i++){ if(prices[i]<=min_price) min_price=prices[i]; //update min_price value if(prices[i]-min_price>=p_gap) p_gap=prices[i]-min_price; //update the newest price gap } return p_gap; } ``` ## C code & Assembly ```=c #include <stdio.h> #include <limits.h> int maxProfit(int* prices, int pricesSize){ int min_price=INT_MAX; int p_gap=0; for (int i=0;i<pricesSize;i++){ if(prices[i]<=min_price) min_price=prices[i]; //update min_price value if(prices[i]-min_price>=p_gap) p_gap=prices[i]-min_price; //update the newest price gap } return p_gap; } int main(){ int prices[] = {7,1,5,3,6,4}; printf("%d\n", maxProfit(prices,6)); return 0; } ``` ```=asm .data arr: .word 7,1,5,3,6,4,11,3,5,7,1,20 # array len: .word 12 # number of elemnet in array str: .string "Max Profit in these days is " .text # s1 = base address of array # t0 = i # t1 = nums[i] # s2 = p_gap (gap of prize) # s3 = numsSize (number of elemnet in array) # s4 = min_price (intmax) main: la s1, arr # the value in s1 is the address of nums[0] li s4, -1 # s4 = -1 srli s4, s4, 1 # let -1 shift right logic imm by 1, to get the largest num lw s3, len # s3 = numsSize add t0, x0, x0 # i = 0 add s2, x0, x0 # s2 =0 jal ra, loop # jump to loop and save the addr. of instr "jal ra, print" to reg "ra" jal ra, print # jump to print and save the addr. of instr "li a7, 10" to reg "ra" li a7, 10 # end the program ecall loop: lw t1, 0(s1) # t1 = prices[i] bgt t1, s4, if2 # if (prices[i]>min_price) than branch add s4, x0, t1 # min_price=prices[i] if2: sub t2, t1, s4 # price[i]-min_price blt t2, s2, if3 # if(prices[i]-min_price<p_gap) than branch sub s2, t1, s4 # p_gap=prices[i]-min_price if3: addi s1, s1, 4 # nums++ (address move forward) addi t0, t0, 1 # i++ blt t0, s3, loop # if i < array length jr ra # else, return to main print: la a0, str # load string li a7, 4 # print string ecall add a0, s2, x0 # load result li a7, 1 # print integer ecall jr ra # go back to main ``` ## Pseudo instruction translation & machine code ```asm 00000000 <main>: 0: 10000497 auipc x9 0x10000 4: 00048493 addi x9 x9 0 8: fff00a13 addi x20 x0 -1 c: 001a5a13 srli x20 x20 1 10: 10000997 auipc x19 0x10000 14: 0209a983 lw x19 32 x19 18: 000002b3 add x5 x0 x0 1c: 00000933 add x18 x0 x0 20: 010000ef jal x1 16 <loop> 24: 034000ef jal x1 52 <print> 28: 00a00893 addi x17 x0 10 2c: 00000073 ecall 00000030 <loop>: 30: 0004a303 lw x6 0 x9 34: 006a4463 blt x20 x6 8 <if2> 38: 00600a33 add x20 x0 x6 0000003c <if2>: 3c: 414303b3 sub x7 x6 x20 40: 0123c463 blt x7 x18 8 <if3> 44: 41430933 sub x18 x6 x20 00000048 <if3>: 48: 00448493 addi x9 x9 4 4c: 00128293 addi x5 x5 1 50: ff32c0e3 blt x5 x19 -32 <loop> 54: 00008067 jalr x0 x1 0 00000058 <print>: 58: 10000517 auipc x10 0x10000 5c: fdc50513 addi x10 x10 -36 60: 00400893 addi x17 x0 4 64: 00000073 ecall 68: 00090533 add x10 x18 x0 6c: 00100893 addi x17 x0 1 70: 00000073 ecall 74: 00008067 jalr x0 x1 0 ``` ## Result A testing data running in Leetcode ![](https://i.imgur.com/J8y34Uk.png) Same output result in ripes ![](https://i.imgur.com/oJmVC3Y.png) ![](https://i.imgur.com/dHFBHxE.png) ## Analysis We have five stage in our RISC V pipelining processor. The main reason that we want to use pipeline architecture is to improve the throughput. In the ideal circumstances. Speedup of the pipeline machine compare to the signal cycyle machine is n-stages times. (ideal means without consider the pipeline stall situation) ### Instrucions 1. auipc => (add upper immediate to pc) is used to build pc-relative addresses and uses the U-type format. AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits withzeros, adds this offset to the pc, then places the result in register rd. Let us take a example to totally understand it! e.g. ```=asm .data arr: .word 7,1,5,3,6,4,11,3,5,7,1,20 # array len: .word 12 # number of elemnet in array . . . . . . . main: la s1, arr # s1 = pointer to nums[0] li s4, -1 # s4 = -1 srli s4, s4, 1 # let -1 shift right logic imm by 1, to get the largest num lw s3, len # s3 = numsSize add t0, x0, x0 # i = 0 add s2, x0, x0 # s2 = 0 jal ra, loop # jump to loop and save the addr. of instr "jal ra, print" to reg "ra" jal ra, print # jump to print and save the addr. of instr "li a7, 10" to reg "ra" li a7, 10 # end the program ecall ``` After assembler transfer, the machine code will become: ```=asm 0: 10000497 auipc x9 0x10000 4: 00048493 addi x9 x9 0 8: fff00a13 addi x20 x0 -1 c: 001a5a13 srli x20 x20 1 10: 10000997 auipc x19 0x10000 14: 0209a983 lw x19 32 x19 ``` Let's us discuss this by drawing memory diagram by hand write. ![](https://i.imgur.com/hcLRxfc.jpg) 2. la => load address 2. li => load immediate 3. jal =>jump to the labeled address and store pc+4 into the ra register. 4. ecall => systam call 5. jr => edit the value of pc to the value in ra register. ### Five stages in pipeline architecture IF---ID---EX---MEM---WB IF:Instruction fetch stage We fetch the instrution from memory in this stage ID:Instruction decode stage We decode the instrution that we fetch in earlier stage EX:Excution stage We excute the instruction by the powerful ALU. MEM: Memory stage The instruction write or read the data in memory, (it depends.) WB: write back stage If the instrucion needs to save data back to the register. ### Why we need pipeline regs? To separate signal in each stages. We can easily flush the instruction in the pipeline register or enable it by control signal. ### What will be happen when we encounter branch condition? ![](https://i.imgur.com/GFMGPU3.png) ![](https://i.imgur.com/yvIRFdj.png) ``` jal x1 16 <loop> ``` When this instruction is in excute stage, after the calculation in ALU, The PC register will be changed and the instruction in Instruction fetch stage & Instruction decode stage should be flushed.(PC register is changed, the instruction in these two stage is no longer vaild) The control signal let the clear signal in pipeline reg of IFID IDEX is 1, the data in these regs will be set to 0. ![](https://i.imgur.com/d0mmcLV.png) ![](https://i.imgur.com/7zFvfha.png) ![](https://i.imgur.com/gZatAdP.png) The incorrect instruction will be eliminated and the correct instruction will be enter the pipeline. ### Why the ecall instruction let the pipeline stall? ![](https://i.imgur.com/mzmxQNl.png) ![](https://i.imgur.com/m7PO3wU.png) The Ecall instruction is in the EX stage now, and we have to let the pipeline stall. ``` addi x10 x10 -36 addi x17 x0 4 ``` The value of these two instruction should be store back to the register file, so we must let the pipeline stall to avoid the incorrectness. (i.e. To read the wrong data before the new data updated) Wait for these two instruction save the value back to the register file. ### Any load-use data hazard encountered in this example? Here is a simple example of load-use data hazard ![](https://i.imgur.com/TfFai7o.png) Now see our cases. ![](https://i.imgur.com/CNosubD.png) Register x6 is not ready until the instruction is in the MEM stage, but the blt instruction are going to use the value of x6 register in EX stage, although this pipeline machine has the forwarding lines but we still have to stall one clock cycle to prevent error when excuting the program. ## Reference Computer Organization and Design: The Hardware Software Interface: RISC V Edition by David A. Patterson, John L. Hennessy https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md