arch2025-homework1

# Problem B

Assignment1: RISC-V Assembly and Instruction Pipeline

Problem: refer to [Quiz1 of Computer Architecture (2025 Fall) Problem B](https://hackmd.io/@sysprog/arch2025-quiz1-sol#Problem-B)

## Explain

This assignment implements a logarithmic 8-bit codec (uf8) that maps 20-bit unsigned integers ([0, 1,015,792]) to 8-bit symbols via logarithmic quantization, delivering 2.5:1 compression and ≤6.25% relative error. We implement `encode` and `decode`, then check that every value produced by `decode` maps back to the same symbol when it is encoded again.

## Decode

![Screenshot 2025-10-18 220224](https://hackmd.io/_uploads/SyHZ0zZRel.png)

Here e = floor(b/16) and m = b mod 16. We use this formula to decode an 8-bit symbol into a 20-bit unsigned integer.

## Encode

![Screenshot 2025-10-18 220021](https://hackmd.io/_uploads/BJoaafZ0ex.png)

We use this formula to encode a 20-bit unsigned integer into an 8-bit symbol. The test loop checks every symbol for errors; when the program prints 1, the whole loop is correct.

## Solution

### AI tool

I used ChatGPT when I had a bug I could not solve, for some grammar and translation, to decide when values need to be kept in s0-s11, and to explain the C code in [Quiz1 of Computer Architecture (2025 Fall) Problem B](https://hackmd.io/@sysprog/arch2025-quiz1-sol#Problem-B). More than 90% of the code was written and reasoned out by myself. The remaining part (less than 10%) is the bug fixing mentioned above: the bugs were logical mistakes in branch conditions that made the code jump to the wrong branch, and I could not find them on my own.

## clz

`clz` counts the leading zeros of a value, which gives the position of the most significant bit (MSB) used by encode.

```c
clz:
    addi sp, sp, -16
    sw   ra, 12(sp)
    li   t0, 16               # c = 16
    li   t1, 32               # n = 32
loop:
    beq  t0, zero, end        # if c == 0, leave the loop
    srl  a1, a0, t0           # y = x >> c
    beq  a1, zero, decision   # if y == 0, only halve c
    sub  t1, t1, t0           # n -= c
    mv   a0, a1               # x = y
decision:
    srli t0, t0, 1            # c >>= 1
    j    loop
end:
    sub  a0, t1, a0           # return n - x
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret                       # return to caller
```

With the MSB position from `clz`, we can go on to
encode and decode.

## Decode

```c
decode:
    addi sp, sp, -16
    sw   ra, 12(sp)
    andi t0, a0, 0x0f     # m = b mod 16 (mantissa)
    srli t1, a0, 4        # e = b / 16 (exponent)
    addi t2, t1, -15
    sub  t2, zero, t2     # t2 = 15 - e
    li   t3, 0x7fff       # 2^15 - 1
    srl  t3, t3, t2       # t3 = 2^e - 1
    slli t3, t3, 4        # offset = (2^e - 1) * 16
    sll  t0, t0, t1       # m << e
    add  a0, t0, t3       # D(b) = (m << e) + offset
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
```

This part decodes a uf8 symbol into a 20-bit unsigned integer. The verification code at the end uses it.

## Encode

```c
encode:
    addi sp, sp, -16
    sw   ra, 12(sp)
    mv   t5, a0           # t5 = value
    li   t1, 16
    bltu t5, t1, small    # value < 16: the symbol is the value itself
fex:
    mv   a0, t5
    jal  ra, clz
    add  t0, a0, zero
    addi t0, t0, -31
    sub  t0, zero, t0     # t0 = 31 - clz(value), the MSB position
    addi t1, zero, -4     # t1 = -4
    bltu t1, x0, tu0      # unsigned "< 0" is never true, so never taken
    li   t0, 15
    blt  t0, t1, tu15     # 15 < -4 is false, so never taken
    li   t1, 15
tu0:
    li   t1, 0            # reached by fall-through: exp starts at 0
    j    nextf
tu15:
    li   t1, 15
nextf:
    li   t2, 1
    sll  t2, t2, t1
    addi t2, t2, -1       # 2^exp - 1 (t1 is exp)
    slli t2, t2, 4        # overflow = (2^exp - 1) * 16
adj:
    sltu t0, zero, t1     # t0 = (exp > 0)
    mv   t4, t5
    sltu t3, t4, t2       # t3 = (value < overflow)
    and  t0, t0, t3       # t0 = 1 when both conditions hold
    addi t3, zero, 1
    bne  t0, t3, fae      # condition false: search upward for the exponent
    addi t2, t2, -16
    srli t2, t2, 1        # overflow = (overflow - 16) >> 1
    addi t1, t1, -1       # exp--
    beq  t0, zero, adj
fae:
    addi t0, zero, 15
    bgeu t1, t0, fend     # stop when exp >= 15
    slli t2, t2, 1
    addi t2, t2, 16       # t2 = overflow * 2 + 16 (next overflow)
    bltu t5, t2, fend     # value < next overflow: exponent found
    addi t1, t1, 1        # exp++
    j    fae
fend:
    sub  t0, t5, t2       # t0 = value - t2
    srl  t0, t0, t1       # shift right by exp
    slli t1, t1, 4        # exp << 4
    andi t0, t0, 0xf      # keep the 4 mantissa bits
    add  t3, t1, t0       # symbol = (exp << 4) + mantissa
fin:
    andi a0, t3, 0xff
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
small:
    mv   a0, t5
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
```

This is the most complicated part of the homework: there are many branches, and it is easy to jump to the wrong one. It encodes a 32-bit unsigned integer into the corresponding uf8 symbol by finding the exponent (step size) first and then the mantissa.

## Verify

```c
bool:
    addi sp, sp, -16
    sw   ra, 8(sp)
    li   s0, -1           # s0 = previous_value
    li   s1, 1            # s1 = passed (true)
    li   s2, 0            # s2 = fl, the current symbol
    li   s3, 256          # loop while fl < 256
confirm:
    mv   a0, s2
    andi a0, s2, 0xff
    jal  ra, decode
    mv   t5, a0           # t5 = value = decode(fl)
    mv   a0, t5
    jal  ra, encode
    mv   t6, a0           # t6 = fl2 = encode(value)
    andi t4, s2, 0xff
    andi t6, t6, 0xff
    beq  t4, t6, ch1      # round trip matches
    mv   a0, s2           # print fl
    li   a7, 1
    ecall
    la   a0, str1
    li   a7, 4
    ecall
    mv   a0, t5           # print value
    li   a7, 1
    ecall
    la   a0, str2
    li   a7, 4
    ecall
    mv   a0, t6           # print fl2
    li   a7, 1
    ecall
    la   a0, n
    li   a7, 4
    ecall
    li   s1, 0            # passed = false
ch1:
    blt  s0, t5, ch2      # previous_value < value: OK
    mv   a0, s2           # print fl
    li   a7, 1
    ecall
    la   a0, str3
    li   a7, 4
    ecall
    mv   a0, t5           # print value
    li   a7, 1
    ecall
    la   a0, str4
    li   a7, 4
    ecall
    mv   a0, s0           # print previous_value
    li   a7, 1
    ecall
    la   a0, n
    li   a7, 4
    ecall
    li   s1, 0            # passed = false
ch2:
    mv   s0, t5           # previous_value = value
    addi s2, s2, 1
    bltu s2, s3, confirm
    mv   a0, s1           # return passed
    lw   ra, 8(sp)
    addi sp, sp, 16
    ret
```

The last function calls decode and encode to verify the round trip. For all 256 uf8 symbols, it decodes the symbol, encodes the decoded value again, and checks that the result equals the original symbol. It also checks that each decoded value is larger than the previous one.

## Full code

```c
.data
str1: .string ":produce value"
str2: .string "but encodes back to"
n:    .asciz "\n"
str3: .string ":value"
str4: .string "<= previous_value"
msg:  .asciz "\n"
.text
.globl main
main:
    jal  ra, bool
    li   a7, 1            # print the result (1 = all passed)
    ecall
    li   a7, 10           # exit
    ecall
clz:
    addi sp, sp, -16
    sw   ra, 12(sp)
    li   t0, 16               # c = 16
    li   t1, 32               # n = 32
loop:
    beq  t0, zero, end        # if c == 0, leave the loop
    srl  a1, a0, t0           # y = x >> c
    beq  a1, zero, decision   # if y == 0, only halve c
    sub  t1, t1, t0           # n -= c
    mv   a0, a1               # x = y
decision:
    srli t0, t0, 1            # c >>= 1
    j    loop
end:
    sub  a0, t1, a0           # return n - x
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
decode:
    addi sp, sp, -16
    sw   ra, 12(sp)
    andi t0, a0, 0x0f     # m = b mod 16 (mantissa)
    srli t1, a0, 4        # e = b / 16 (exponent)
    addi t2, t1, -15
    sub  t2, zero, t2     # t2 = 15 - e
    li   t3, 0x7fff       # 2^15 - 1
    srl  t3, t3, t2       # t3 = 2^e - 1
    slli t3, t3, 4        # offset = (2^e - 1) * 16
    sll  t0, t0, t1       # m << e
    add  a0, t0, t3       # D(b) = (m << e) + offset
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
encode:
    addi sp, sp, -16
    sw   ra, 12(sp)
    mv   t5, a0           # t5 = value
    li   t1, 16
    bltu t5, t1, small    # value < 16: the symbol is the value itself
fex:
    mv   a0, t5
    jal  ra, clz
    add  t0, a0, zero
    addi t0, t0, -31
    sub  t0, zero, t0     # t0 = 31 - clz(value), the MSB position
    addi t1, zero, -4     # t1 = -4
    bltu t1, x0, tu0      # unsigned "< 0" is never true, so never taken
    li   t0, 15
    blt  t0, t1, tu15     # 15 < -4 is false, so never taken
    li   t1, 15
tu0:
    li   t1, 0            # reached by fall-through: exp starts at 0
    j    nextf
tu15:
    li   t1, 15
nextf:
    li   t2, 1
    sll  t2, t2, t1
    addi t2, t2, -1       # 2^exp - 1 (t1 is exp)
    slli t2, t2, 4        # overflow = (2^exp - 1) * 16
adj:
    sltu t0, zero, t1     # t0 = (exp > 0)
    mv   t4, t5
    sltu t3, t4, t2       # t3 = (value < overflow)
    and  t0, t0, t3       # t0 = 1 when both conditions hold
    addi t3, zero, 1
    bne  t0, t3, fae
    addi t2, t2, -16
    srli t2, t2, 1        # overflow = (overflow - 16) >> 1
    addi t1, t1, -1       # exp--
    beq  t0, zero, adj
fae:
    addi t0, zero, 15
    bgeu t1, t0, fend     # stop when exp >= 15
    slli t2, t2, 1
    addi t2, t2, 16       # t2 = overflow * 2 + 16 (next overflow)
    bltu t5, t2, fend     # value < next overflow: exponent found
    addi t1, t1, 1        # exp++
    j    fae
fend:
    sub  t0, t5, t2       # t0 = value - t2
    srl  t0, t0, t1       # shift right by exp
    slli t1, t1, 4        # exp << 4
    andi t0, t0, 0xf      # keep the 4 mantissa bits
    add  t3, t1, t0       # symbol = (exp << 4) + mantissa
fin:
    andi a0, t3, 0xff
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
small:
    mv   a0, t5
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
bool:
    addi sp, sp, -16
    sw   ra, 8(sp)
    li   s0, -1           # s0 = previous_value
    li   s1, 1            # s1 = passed (true)
    li   s2, 0            # s2 = fl, the current symbol
    li   s3, 256          # loop while fl < 256
confirm:
    mv   a0, s2
    andi a0, s2, 0xff
    jal  ra, decode
    mv   t5, a0           # t5 = value = decode(fl)
    mv   a0, t5
    jal  ra, encode
    mv   t6, a0           # t6 = fl2 = encode(value)
    andi t4, s2, 0xff
    andi t6, t6, 0xff
    beq  t4, t6, ch1      # round trip matches
    mv   a0, s2           # print fl
    li   a7, 1
    ecall
    la   a0, str1
    li   a7, 4
    ecall
    mv   a0, t5           # print value
    li   a7, 1
    ecall
    la   a0, str2
    li   a7, 4
    ecall
    mv   a0, t6           # print fl2
    li   a7, 1
    ecall
    la   a0, n
    li   a7, 4
    ecall
    li   s1, 0            # passed = false
ch1:
    blt  s0, t5, ch2      # previous_value < value: OK
    mv   a0, s2           # print fl
    li   a7, 1
    ecall
    la   a0, str3
    li   a7, 4
    ecall
    mv   a0, t5           # print value
    li   a7, 1
    ecall
    la   a0, str4
    li   a7, 4
    ecall
    mv   a0, s0           # print previous_value
    li   a7, 1
    ecall
    la   a0, n
    li   a7, 4
    ecall
    li   s1, 0            # passed = false
ch2:
    mv   s0, t5           # previous_value = value
    addi s2, s2, 1
    bltu s2, s3, confirm
    mv   a0, s1           # return passed
    lw   ra, 8(sp)
    addi sp, sp, 16
    ret
```

## Result

![Screenshot 2025-10-18 222008](https://hackmd.io/_uploads/rkrHMmWRll.png)

This result shows that the whole loop is correct. The execution info on the single-cycle processor is shown below.

![Screenshot 2025-10-18 222023](https://hackmd.io/_uploads/rkxUz7-Age.png)

This is the execution info on the 5-stage CPU:

![Screenshot 2025-10-18 225518](https://hackmd.io/_uploads/SJqv9XZCxe.png)

Although the 5-stage CPU needs more cycles, it is more efficient, because the pipeline overlaps instructions and each cycle only has to cover one stage.

## Pipeline

### Stage 1: Instruction Fetch (IF)

![Screenshot 2025-10-18 231245](https://hackmd.io/_uploads/BkoqR7ZCex.png)

Fetch the next instruction from the Instruction Memory and update the Program Counter (PC) to point to the next instruction.

### Stage 2: Instruction Decode and Register Fetch (ID)

![Screenshot 2025-10-18 232401](https://hackmd.io/_uploads/r1KXbVW0ge.png)

Decode the instruction (determine its operation type such as add, lw, jal, etc.) and read operands from the Register File.

### Stage 3: Execute (EX)

![Screenshot 2025-10-18 232730](https://hackmd.io/_uploads/BkFgGNZ0ge.png)

Perform ALU operations (addition, subtraction, multiplication, shift, or logic). For memory instructions, calculate the effective address. Branch decisions are also made in this stage.

### Stage 4: Memory Access (MEM)

![Screenshot 2025-10-18 232907](https://hackmd.io/_uploads/rJuLzEW0ee.png)

Access Data Memory if the instruction is lw or sw. For other instructions, this stage simply passes the result forward.

### Stage 5: Write Back (WB)

![Screenshot 2025-10-18 233027](https://hackmd.io/_uploads/SJ3izVZAge.png)

Write the result of computation or memory access back to the Register File, e.g., writing the ALU result to the destination register rd.

![Screenshot 2025-10-18 230835](https://hackmd.io/_uploads/r1Uc67bRgx.png)

### Observation

In this part of the code we can observe a hazard. While **blt** is executing, **mv** is already in ID, so when the branch is taken we need to flush **mv** and **jal** so that **blt** can jump correctly.

### Hazard

1. Data hazard: one instruction depends on data from a previous instruction.
2. Control hazard (branch): the branch decision (jump target) is not known yet.

My code runs into both hazards. When **jal** jumps to its target, **mv a0,t5** has not written back yet, which is a data hazard. When **blt** wants to jump, **mv** is already in IF, which is a control hazard. The pipeline needs a flush or a bubble to resolve these hazards.
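
The cost of these hazards can be estimated with a small model. The sketch below is not taken from the assignment or from the simulator: the function names are hypothetical, and the flush penalty of 2 is an assumption based on the observation above that two instructions (**mv** and **jal**) are flushed when **blt** is taken.

```c
#include <stdint.h>

/* Hypothetical helper: rough cycle estimate for an in-order pipeline.
 *   stages          pipeline depth (5 for the 5-stage CPU)
 *   instructions    number of executed instructions
 *   taken_branches  number of taken branches and jumps
 *   flush_penalty   instructions flushed per taken branch (assumed 2)
 *   stall_cycles    bubbles inserted for data hazards
 */
uint64_t pipeline_cycles(uint64_t stages, uint64_t instructions,
                         uint64_t taken_branches, uint64_t flush_penalty,
                         uint64_t stall_cycles)
{
    if (instructions == 0)
        return 0;
    uint64_t fill = stages - 1; /* cycles until the first instruction retires */
    return instructions + fill + taken_branches * flush_penalty + stall_cycles;
}

/* Cycles per instruction (CPI); returns 0.0 for an empty program. */
double cycles_per_instruction(uint64_t cycles, uint64_t instructions)
{
    return instructions ? (double)cycles / (double)instructions : 0.0;
}
```

With no hazards, 100 instructions take 100 + 4 = 104 cycles in this model. Every taken branch adds the flush penalty, which is why the 5-stage CPU reports more cycles than the single-cycle one even though each of its cycles is shorter.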
# Problem C

Assignment1: RISC-V Assembly and Instruction Pipeline

Problem: refer to [Quiz1 of Computer Architecture (2025 Fall) Problem C](https://hackmd.io/@sysprog/arch2025-quiz1-sol)

## Explain

![Screenshot 2025-10-19 000929](https://hackmd.io/_uploads/rydzhVbRex.png)

## Solution

### Test data

#### Test 1

![Screenshot 2025-10-19 001352](https://hackmd.io/_uploads/rJcHT4b0xx.png)

Input: 0xff23, 0x0345

sqrt input: 0x3f80

Output in bfloat16:

- add: 0xff23
- sub:
- mul: 0xc2fa
- div: 0xff80
- sqrt: 0x3f80

![Screenshot 2025-10-19 001836](https://hackmd.io/_uploads/rkRxCE-Rxg.png)

#### Test 2

Input: 0x5438, 3842

sqrt input: 0x3f80

![Screenshot 2025-10-19 004141](https://hackmd.io/_uploads/Syt8mSbRex.png)

Output in bfloat16:

- add: 0x5438
- sub: 0x5438
- mul: 0x4d0b
- div: 0x5bff
- sqrt: 0x56df

#### Test 3

Input: 0x3f80, 0x3f00

sqrt input: 0x3f80

![Screenshot 2025-10-19 004951](https://hackmd.io/_uploads/Sy3sSSZ0lg.png)

![Screenshot 2025-10-19 005055](https://hackmd.io/_uploads/ByLFBHZRxl.png)

Output in bfloat16:

- add: 0x3fc0
- sub: 0x3f00
- mul: 0x3f00
- div: 0x407f
- sqrt: 0x3f80

### Assembly code

```c
# problem 3
.data
smask: .half 0x8000
emask: .half 0x7f80
mmask: .half 0x007f       # all unsigned
ebias: .word 127
nan:   .half 0x7fc0
bzero: .half 0x0000
.text
.globl main
main:
    li   a0, 0x3f80
    li   a1, 0x3f00
    jal  add
    li   t6, 0xffff
    and  a0, a0, t6
    slli a0, a0, 16
    li   a7, 34           # print the result in hex
    ecall
    li   a0, 0x3f80
    li   a1, 0x3f00
    jal  sub
    li   t6, 0xffff
    and  a0, a0, t6
    slli a0, a0, 16
    li   a7, 34
    ecall
    li   a0, 0x3f80
    li   a1, 0x3f00
    jal  mul
    li   t6, 0xffff
    and  a0, a0, t6
    slli a0, a0, 16
    li   a7, 34
    ecall
    li   a0, 0x3f80
    li   a1, 0x3f00
    jal  div
    li   t6, 0xffff
    and  a0, a0, t6
    slli a0, a0, 16
    li   a7, 34
    ecall
    li   a0, 0x3f80
    jal  sqrt0
    li   t6, 0xffff
    and  a0, a0, t6
    slli a0, a0, 16
    li   a7, 34
    ecall
    li   a7, 10           # exit
    ecall
isnan:
    addi sp, sp, -16
    sw   ra, 12(sp)
    la   t1, emask
    lhu  t0, 0(t1)
    and  t1, a0, t0
    srli t1, t1, 7
    sltiu t2, t1, 0xff    # t2 = 1 if the shifted exponent is below 0xff
    xori t2, t2, 0x01     # invert: t2 = 1 only if exp == 0xff
    la   t1, mmask
    lhu  t0, 0(t1)
    and  t3, a0, t0       # extract the mantissa
    sltu t3, zero, t3     # t3 = 1 if mantissa != 0
    and  a0, t2, t3
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
isinf:
    addi sp, sp, -16
    sw   ra, 12(sp)
    la   t1, emask
    lhu  t0, 0(t1)
    and  t1, a0, t0
    srli t1, t1, 7
    sltiu t2, t1, 0xff    # t2 = 1 if the shifted exponent is below 0xff
    xori t2, t2, 0x01     # invert: t2 = 1 only if exp == 0xff
    la   t1, mmask
    lhu  t0, 0(t1)
    and  t3, a0, t0       # extract the mantissa
    sltu t3, zero, t3     # t3 = 1 if mantissa != 0
    xori t3, t3, 1        # invert: t3 = 1 if mantissa == 0
    and  a0, t2, t3
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
iszero:
    addi sp, sp, -16
    sw   ra, 12(sp)
    li   t0, 0x7fff
    and  t1, a0, t0
    sltu t1, zero, t1
    xori a0, t1, 0x01
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
f322bf16:
    addi sp, sp, -16
    sw   ra, 12(sp)
    mv   t0, a0
    srli t2, a0, 23       # extract the 8 exponent bits
    andi t2, t2, 0xff
    li   t1, 0xff
    beq  t2, t1, q        # exp == 0xff: just shift the original value right by 16
    srli t0, a0, 16
    andi t0, t0, 1
    li   t1, 0x7fff
    add  t0, t0, t1
    add  t0, t0, a0
q:
    srli a0, t0, 16
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
b162f32:
    addi sp, sp, -16
    sw   ra, 12(sp)
    slli a0, a0, 16
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
unpackbf16:
    srli t0, a0, 15       # t0 = sign a
    andi t0, t0, 1
    srli t1, a1, 15       # t1 = sign b
    andi t1, t1, 1
    srli t2, a0, 7        # t2 = exp a
    andi t2, t2, 0xff
    srli t3, a1, 7        # t3 = exp b
    andi t3, t3, 0xff
    andi t4, a0, 0x7f     # t4 = mantissa a
    andi t5, a1, 0x7f     # t5 = mantissa b
    mv   s0, t0
    mv   s1, t1
    mv   s2, t2
    mv   s3, t3
    mv   s4, t4
    mv   s5, t5
    ret
add:
    addi sp, sp, -48
    sw   ra, 12(sp)
    sw   s0, 0(sp)
    sw   s1, 4(sp)
    sw   s2, 8(sp)
    sw   s3, 16(sp)
    sw   s4, 20(sp)
    sw   s5, 24(sp)
    sw   a0, 28(sp)
    sw   a1, 32(sp)
    jal  ra, unpackbf16
    li   t0, 0xff
    bne  s2, t0, other
    bne  s4, zero, reta
    bne  s3, t0, reta
    sltu t1, x0, s5
    xor  t2, s0, s1
    sltiu t2, t2, 1
    or   t3, t1, t2
    bne  t3, x0, retb
    li   a0, 0x7fc0
    j    last
other:
    li   t0, 255
    bne  s3, t0, add1     # if exp b == 0xff, return b
    j    retb
add1:
    bne  s2, x0, add2
    beq  s4, x0, retb     # a is zero (exp a == 0 and mant a == 0): return b
add2:
    bne  s3, x0, x80
    beq  s5, x0, reta     # b is zero (exp b == 0 and mant b == 0): return a
x80:
    beq  s2, x0, x802
    ori  s4, s4, 128
x802:
    beq  s3, x0, doother
    ori  s5, s5, 128
doother:
    sub  t0, s2, s3       # t0 = exponent difference
    mv   t6, t0
    bge  x0, t0, expdif1
    mv   t1, s2           # t1 = result exponent
    li   t2, 8
    blt  t2, t0, reta
    srl  s5, s5, t0
    j    sign
expdif1:
    beq  t6, x0, expdif2
    mv   t1, s3
    li   t2, -8
    blt  t0, t2, retb
    sub  t0, x0, t0
    srl  s4, s4, t0
    j    sign
expdif2:
    mv   t1, s2
    j    sign
sign:
    bne  s0, s1, sign2
    mv   t2, s0           # t2 = result sign = sign a
    add  t3, s4, s5       # t3 = result mantissa
    andi t4, t3, 0x100
    beq  t4, x0, psign
    srli t3, t3, 1
    addi t1, t1, 1
    li   t5, 0xff
    blt  t1, t5, psign
    slli t4, t2, 15
    li   t5, 0x7f80
    or   a0, t4, t5
    j    last
psign:
    slli t4, t2, 15
    andi t5, t1, 0xff
    slli t5, t5, 7
    andi t6, t3, 0x7f
    or   t4, t4, t5
    or   a0, t4, t6
    j    last
sign2:
    blt  s4, s5, bch2
    mv   t2, s0
    sub  t3, s4, s5
    j    almost
bch2:
    mv   t2, s1
    sub  t3, s5, s4
almost:
    beq  t3, x0, ret0
    j    slt
slt:
loop:
    andi t4, t3, 0x80
    bne  t4, x0, gg
    slli t3, t3, 1
    addi t1, t1, -1
    bge  zero, t1, ret0
    j    loop
gg:
    slli t4, t2, 15
    andi t5, t1, 0xff
    slli t5, t5, 7
    andi t6, t3, 0x7f
    or   t4, t4, t5
    or   a0, t4, t6
    j    last
reta:
    lw   a0, 28(sp)
    j    last
retb:
    lw   a0, 32(sp)
    j    last
ret0:
    li   a0, 0
    j    last
last:
    lw   ra, 12(sp)
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   s3, 16(sp)
    lw   s4, 20(sp)
    lw   s5, 24(sp)
    addi sp, sp, 48
    ret
sub:
    addi sp, sp, -16
    sw   ra, 12(sp)
    la   t1, smask
    lhu  t1, 0(t1)
    xor  a1, a1, t1       # flip the sign of b, then add
    jal  ra, add
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
mul:
    addi sp, sp, -48
    sw   ra, 12(sp)
    sw   s0, 0(sp)
    sw   s1, 4(sp)
    sw   s2, 8(sp)
    sw   s3, 16(sp)
    sw   s4, 20(sp)
    sw   s5, 24(sp)
    sw   a0, 28(sp)
    sw   a1, 32(sp)
    jal  ra, unpackbf16
    xor  t0, s0, s1       # t0 = result sign
    li   t1, 0xff
    bne  s2, t1, mult2
    bne  s4, x0, mreta
    bne  s3, x0, mult1
    beq  s5, x0, retn
mult1:
    slli t2, t0, 15
    li   t1, 0x7f80
    or   a0, t1, t2
    j    multend
mult2:
    bne  s3, t1, mult3
    bne  s5, x0, mretb
    bne  s2, x0, mult1
    beq  s4, x0, retn
mult3:
    or   t1, s2, s4
    beq  t1, x0, mret0
    or   t1, s3, s5
    beq  t1, x0, mret0
mult4:
    mv   t2, x0           # exp_adjust = 0
    beq  s2, x0, mloop
    j    else
mloop:
    andi t1, s4, 0x80
    bne  t1, x0, expa1
    slli s4, s4, 1
    addi t2, t2, -1
    j    mloop
expa1:
    li   t1, 1
    mv   s2, t1
    j    else
else:
    ori  s4, s4, 0x80
mult5:
    beq  s3, x0, mloop1
    j    else1
mloop1:
    andi t1, s5, 0x80
    bne  t1, x0, expb1
    slli s5, s5, 1
    addi t2, t2, -1
    j    mloop1
expb1:
    li   t1, 1
    mv   s3, t1
    j    else1
else1:
    ori  s5, s5, 0x80
newm:
    mv   t5, s4
    mv   t1, s5
    mv   t6, x0
muloop:
    andi t3, t1, 1
    beq  t3, x0, nadd
    add  t6, t6, t5       # t6 = result mantissa (shift-and-add product)
nadd:
    srli t1, t1, 1
    slli t5, t5, 1
    bne  t1, x0, muloop
reltexp:
    add  t5, s2, s3
    addi t5, t5, -127
    add  t5, t5, t2       # t5 = result exponent
findmant:
    li   t1, 0x8000
    and  t1, t6, t1
    beq  t1, x0, felse
    srli t6, t6, 8
    andi t6, t6, 0x7f
    addi t5, t5, 1
    j    findexp
felse:
    srli t6, t6, 7
    andi t6, t6, 0x7f
findexp:
    li   t1, 0xff
    blt  t5, t1, elsep
    slli t0, t0, 15
    li   t1, 0x7f80
    or   a0, t0, t1
    j    multend
elsep:
    blt  x0, t5, mlast
    li   t1, -6
    blt  t5, t1, mnext
    li   t1, 1
    sub  t1, t1, t5
    srl  t6, t6, t1
    mv   t5, x0
    j    mlast
mnext:
    slli a0, t0, 15
    j    multend
mlast:
    slli t0, t0, 15
    andi t6, t6, 0x7f
    andi t5, t5, 0xff
    slli t5, t5, 7
    or   t0, t0, t5
    or   a0, t0, t6
    j    multend
mreta:
    lw   a0, 28(sp)
    j    multend
mretb:
    lw   a0, 32(sp)
    j    multend
retn:
    li   a0, 0x7fc0
    j    multend
mret0:
    slli a0, t0, 15
    j    multend
multend:
    lw   ra, 12(sp)
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   s3, 16(sp)
    lw   s4, 20(sp)
    lw   s5, 24(sp)
    addi sp, sp, 48
    ret
div:
    addi sp, sp, -48
    sw   ra, 12(sp)
    sw   s0, 0(sp)
    sw   s1, 4(sp)
    sw   s2, 8(sp)
    sw   s3, 16(sp)
    sw   s4, 20(sp)
    sw   s5, 24(sp)
    sw   a0, 28(sp)
    sw   a1, 32(sp)
    jal  ra, unpackbf16
    xor  t0, s0, s1       # t0 = result sign
    li   t1, 0xff
    bne  s3, t1, div2
    bne  s5, x0, dretb
    bne  s2, t1, nextd
    beq  s4, x0, dret0
    j    nextd
nextd:
    slli a0, t0, 15
    j    divend
div2:
    bne  s3, x0, div3
    bne  s5, x0, div3
    bne  s2, x0, ret1nf
    beq  s4, x0, dret0
    j    ret1nf
div3:
    bne  s2, t1, div4
    bne  s4, x0, dreta
    slli t0, t0, 15
    li   t1, 0x7f80
    or   a0, t0, t1
    j    divend
div4:
    bne  s2, x0, div5
    bne  s4, x0, div5
    j    nextd
div5:
    beq  x0, s2, div6
    ori  s4, s4, 0x80
div6:
    beq  x0, s3, dpset
    ori  s5, s5, 0x80
dpset:
    slli t3, s4, 15       # t3 = dividend
    mv   t4, s5           # t4 = divisor
    mv   t5, x0           # t5 = quotient
    mv   t1, x0           # t1 = i
    li   t2, 16
dloop:
    slli t5, t5, 1
    sub  t6, x0, t1
    addi t6, t6, 15       # t6 = 15 - i
    srl  t6, t4, t6       # t6 = divisor >> (15 - i); the quiz's C reference appears to shift left here
    blt  t3, t6, divjump
    sub  t6, x0, t1
    addi t6, t6, 15
    srl  t6, t4, t6
    sub  t3, t3, t6
    ori  t5, t5, 1
    j    divjump
divjump:
    addi t1, t1, 1
    blt  t1, t2, dloop
chexp:
    sub  t1, s2, s3       # t1 = result exponent
    addi t1, t1, 127
    bne  s2, x0, chexp2
    addi t1, t1, -1
    j    chexp2
chexp2:
    bne  s3, x0, qucheck
    addi t1, t1, 1
    j    qucheck
qucheck:
    li   t2, 0x8000
    and  t2, t5, t2
    bne  t2, x0, rl8
qucheck2:
    li   t3, 1
    bge  t3, t1, rl8
    slli t5, t5, 1
    addi t1, t1, -1
    li   t2, 0x8000
    and  t2, t5, t2
    beq  t2, x0, qucheck2
rl8:
    srli t5, t5, 8
andq:
    andi t5, t5, 0x7f
lastcheck:
    li   t3, 0xff
    bge  t1, t3, ret1nf
    bge  x0, t1, nextd
    slli t0, t0, 15
    andi t1, t1, 0xff
    slli t1, t1, 7
    andi t5, t5, 0x7f
    or   t0, t0, t1
    or   a0, t0, t5
    j    divend
dreta:
    lw   a0, 28(sp)
    j    divend
dretb:
    lw   a0, 32(sp)
    j    divend
dret0:
    la   t1, nan
    lhu  a0, 0(t1)
    j    divend
ret1nf:
    slli t0, t0, 15
    li   t1, 0x7f80
    or   a0, t0, t1
    j    divend
divend:
    lw   ra, 12(sp)
    lw   s0, 0(sp)
    lw   s1, 4(sp)
    lw   s2, 8(sp)
    lw   s3, 16(sp)
    lw   s4, 20(sp)
    lw   s5, 24(sp)
    addi sp, sp, 48
    ret
sqrt0:
    addi sp, sp, -16
    sw   ra, 12(sp)
    srli t0, a0, 15
    andi t0, t0, 1        # t0 = sign
    srli t1, a0, 7
    andi t1, t1, 0xff     # t1 = exp
    andi t2, a0, 0x7f     # t2 = mantissa
hs:
    li   t3, 0xff
    bne  t3, t1, sqr
    bne  t2, x0, sreta
    j    ches
ches:
    bne  t0, x0, sretn
    j    sreta
sqr:
    bne  t1, x0, sqrn
    beq  t2, x0, sret0
    j    sqrn
sqrn:
    bne  t0, x0, sretn
    j    denorm
denorm:
    beq  t1, x0, sret0
    j    algo
algo:
    addi t3, t1, -127     # t3 = e
    ori  t4, t2, 0x80     # t4 = m
    andi t5, t3, 1
    bne  t5, x0, selse
    srai t5, t3, 1
    addi t6, t5, 127      # t6 = new exp
    j    binarysearch
selse:
    slli t4, t4, 1
    addi t5, t3, -1
    srai t5, t5, 1
    addi t6, t5, 127
binarysearch:
    li   t0, 90           # low
    li   t1, 256          # high
    li   t2, 128          # result
sloop:
    blt  t1, t0, normlize
    add  t5, t0, t1
    srli t5, t5, 1        # mid = (low + high) / 2
    mul  t3, t5, t5       # needs the M extension
    srli t3, t3, 7        # mid * mid / 128
    blt  t4, t3, selse2
    mv   t2, t5
    addi t0, t5, 1
    j    sloop
selse2:
    addi t1, t5, -1
    j    sloop
normlize:
    li   t0, 256
    blt  t2, t0, elseif
    srli t2, t2, 1
    addi t6, t6, 1
    j    sfmant
elseif:
    li   t0, 128
    bge  t2, t0, sfmant
    li   t0, 1
loopnew:
    li   t1, 128
    bge  t2, t1, sfmant
    bge  t0, t6, sfmant
    slli t2, t2, 1
    addi t6, t6, -1
    j    loopnew
sfmant:
    li   t0, 0xff
    bge  t6, t0, sret07f80
    bge  x0, t6, sret0
    j    retnew
sreta:
    mv   a0, a0
    j    sqrtend
sretn:
    la   t3, nan
    lhu  a0, 0(t3)
    j    sqrtend
sret0:
    li   a0, 0
    j    sqrtend
sret07f80:
    li   a0, 0x7f80
    j    sqrtend
retnew:
    andi t1, t6, 0xff
    slli t1, t1, 7
    andi t2, t2, 0x7f
    or   a0, t1, t2
    j    sqrtend
sqrtend:
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
```

## Result

![Screenshot 2025-10-19 005754](https://hackmd.io/_uploads/Bk5HDHbAlx.png)

This is the execution info.

# [leetcode 190. Reverse Bits](https://leetcode.com/problems/reverse-bits/)

## Explain

Reverse bits of a given 32 bits signed integer.

Example 1:

Input: n = 43261596

Output: 964176192

Explanation:

| Integer | Binary |
| --- | --- |
| 43261596 | 00000010100101000001111010011100 |
| 964176192 | 00111001011110000010100101000000 |

Example 2:

Input: n = 2147483644

Output: 1073741822

Explanation:

| Integer | Binary |
| --- | --- |
| 2147483644 | 01111111111111111111111111111100 |
| 1073741822 | 00111111111111111111111111111110 |

Constraints:

- 0 <= n <= 2^31 - 2
- n is even.

## Solution

I have 3 versions of C code for this LeetCode problem.

**Version 1**:

```c
int reverseBits(int n) {
    // Split n into four bytes: x is the highest byte, u the lowest.
    int x = n >> 24;
    int y = n >> 16;
    int z = n >> 8;
    int u = n;
    x = x & 0xff;
    y = y & 0xff;
    z = z & 0xff;
    u = u & 0xff;
    int t = 0;
    int s = 0;
    int p = 0;
    int q = 0;
    // Reverse the 8 bits inside each byte.
    for (int i = 0; i < 8; i++) {
        int bit;
        int v = 1;
        bit = x & v;
        x >>= 1;
        bit <<= (7 - i);
        t |= bit;
    }
    x = t;
    for (int i = 0; i < 8; i++) {
        int bit;
        int v = 1;
        bit = y & v;
        y >>= 1;
        bit <<= (7 - i);
        s |= bit;
    }
    y = s;
    for (int i = 0; i < 8; i++) {
        int bit;
        int v = 1;
        bit = z & v;
        z >>= 1;
        bit <<= (7 - i);
        p |= bit;
    }
    z = p;
    for (int i = 0; i < 8; i++) {
        int bit;
        int v = 1;
        bit = u & v;
        u >>= 1;
        bit <<= (7 - i);
        q |= bit;
    }
    u = q;
    // Swap the byte order: the lowest byte becomes the highest.
    u <<= 24;
    z <<= 16;
    y <<= 8;
    int answer = x + y + z + u;
    return answer;
}
```

My first idea was to divide the 32-bit integer into 4 bytes, reverse the bits inside each byte, and then swap the byte order (the first byte becomes the last byte, and so on). That gave me this first version.

**Version 2**:

```c
int reverseBits(int n) {
    unsigned int t = 0;
    unsigned int u = (unsigned int)n;
    for (int i = 0; i < 32; i++) {
        unsigned int bit = u & 1;   // take the lowest bit
        u >>= 1;
        bit <<= (31 - i);           // move it to the mirrored position
        t |= bit;
    }
    return (int)t;
}
```

This is my second version. Version 1 felt too complicated, so I decided to use a single loop over all 32 bits. The loop runs 32 times, and in iteration i it takes the lowest remaining bit and moves it to bit 31 - i, which builds the final answer.

**Version 3**:

```c
int reverseBits(int n) {
    n = ((n & 0x55555555u) << 1)  | ((n >> 1)  & 0x55555555u);
    n = ((n & 0x33333333u) << 2)  | ((n >> 2)  & 0x33333333u);
    n = ((n & 0x0f0f0f0fu) << 4)  | ((n >> 4)  & 0x0f0f0f0fu);
    n = ((n & 0x00ff00ffu) << 8)  | ((n >> 8)  & 0x00ff00ffu);
    n = ((n & 0x0000ffffu) << 16) | ((n >> 16) & 0x0000ffffu);
    return n;
}
```

The last version runs a fixed five steps, so its cost is O(1), while versions 1 and 2 loop once per bit and are O(n) in the number of bits. This version uses only bitwise operations: it needs no loop and no branch, so it avoids control hazards.
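
One way to gain confidence in the mask version is to compare it against the simple loop on unsigned values. The sketch below is illustrative and not one of the submitted versions: the function names are hypothetical, and it uses `uint32_t` instead of `int` so that every shift is well defined.

```c
#include <stdint.h>

/* Loop version: in each iteration, shift the result left and append
 * the lowest remaining bit of x. After 32 rounds bit i sits at 31 - i. */
uint32_t reverse_bits_loop(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Mask version: swap neighbouring bits, then pairs, nibbles, bytes,
 * and finally the two 16-bit halves. */
uint32_t reverse_bits_mask(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0f0f0f0fu) << 4) | ((x >> 4) & 0x0f0f0f0fu);
    x = ((x & 0x00ff00ffu) << 8) | ((x >> 8) & 0x00ff00ffu);
    x = (x << 16) | (x >> 16);
    return x;
}

/* Returns 1 when both versions agree on x, 0 otherwise. */
int reverse_bits_agree(uint32_t x)
{
    return reverse_bits_loop(x) == reverse_bits_mask(x);
}
```

Calling `reverse_bits_agree` on the two LeetCode examples (43261596 and 2147483644) is a quick sanity check, and reversing a value twice should give the original value back.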

### Assembly code

**Test 1**:

Input: 2147483644

![Screenshot 2025-10-19 232040](https://hackmd.io/_uploads/ry3RWKfAle.png)

Output: 1073741822

**Test 2**:

Input: 43261596

![Screenshot 2025-10-19 231953](https://hackmd.io/_uploads/rybh-KMAll.png)

Output: 964176192

**Test 3**:

Input: 1084732

![Screenshot 2025-10-19 231902](https://hackmd.io/_uploads/HkNYZFf0el.png)

Output: 1018234880

```c
.data
oddevenmask:   .word 0x55555555
twobitsmask:   .word 0x33333333
fourbitsmask:  .word 0x0f0f0f0f
eightbitsmask: .word 0x00ff00ff
.text
.globl main
main:
    li   a0, 1084732          # input n
    la   t0, oddevenmask
    lw   t0, 0(t0)
    la   t1, twobitsmask
    lw   t1, 0(t1)
    la   t2, fourbitsmask
    lw   t2, 0(t2)
    la   t3, eightbitsmask
    lw   t3, 0(t3)
    srli t4, a0, 1            # swap neighbouring bits
    and  t4, t4, t0
    and  t5, a0, t0
    slli t5, t5, 1
    or   a0, t5, t4
    srli t4, a0, 2            # swap 2-bit pairs
    and  t4, t4, t1
    and  t5, a0, t1
    slli t5, t5, 2
    or   a0, t5, t4
    srli t4, a0, 4            # swap nibbles
    and  t4, t4, t2
    and  t5, a0, t2
    slli t5, t5, 4
    or   a0, t5, t4
    srli t4, a0, 8            # swap bytes
    and  t4, t4, t3
    and  t5, a0, t3
    slli t5, t5, 8
    or   a0, t5, t4
    srli t4, a0, 16           # swap the two 16-bit halves
    slli t5, a0, 16
    or   a0, t4, t5
    li   a7, 34               # print the result in hex
    ecall
```