[Toc]

# arithmetic for computers

some basic concepts of the arithmetic logic unit (ALU), starting from a 1-bit ALU,

![截圖 2024-04-30 上午11.09.35](https://hackmd.io/_uploads/Hkje-kR-R.png)
![截圖 2024-04-30 上午11.11.38](https://hackmd.io/_uploads/B1_uWJRbA.png)

then we could combine them to build a 32-bit ALU.

![截圖 2024-04-30 上午11.12.00](https://hackmd.io/_uploads/B1y5Wk0ZR.png)

---

well, we could also add more functions and finally draw a simple figure to represent an ALU.

![截圖 2024-04-30 上午11.15.33](https://hackmd.io/_uploads/SklPMk0WR.png)
![截圖 2024-04-30 上午11.16.02](https://hackmd.io/_uploads/Bk0OMkAZR.png)
![截圖 2024-04-30 上午11.16.13](https://hackmd.io/_uploads/ryjYG1AbC.png)

with a sample Verilog behavioral definition of a RISC-V ALU:

```verilog
module RISCVALU(ALUctl, A, B, ALUOut, Zero);
  input  [3:0]  ALUctl;
  input  [31:0] A, B;
  output reg [31:0] ALUOut;
  output Zero;

  assign Zero = (ALUOut == 0); // Zero is true if ALUOut is 0

  always @(ALUctl, A, B) begin
    case (ALUctl)
      4'b0000: ALUOut <= A & B;
      4'b0001: ALUOut <= A | B;
      4'b0010: ALUOut <= A + B;
      4'b0110: ALUOut <= A - B;
      4'b0111: ALUOut <= A < B ? 1 : 0; // set less than
      4'b1100: ALUOut <= ~(A | B);      // NOR
      default: ALUOut <= 0;
    endcase
  end
endmodule
```

## addition & subtraction

addition: `a + b = (a ^ b) + ((a & b) << 1)` (XOR gives the per-bit sum, AND shifted left gives the carries; apply the identity repeatedly until the carry term is zero)
subtraction: `a - b = a + (-b) = a + ((~b) + 1)` (two's complement negation)

in RISC-V, the designers didn't include overflow checks on integer arithmetic operations in the base instruction set; however, many overflow checks can be cheaply implemented using RISC-V branches, such as:

```
add  t0, t1, t2
bltu t0, t1, overflow
```

which is an unsigned addition overflow check: the sum wrapped around iff it is smaller than an operand.
for signed addition and subtraction, overflow can be detected from the signs of the operands and the result:

| operation | operand A | operand B | result indicating overflow |
| --------- | --------- | --------- | -------------------------- |
| $A+B$     | $\ge 0$   | $\ge 0$   | $< 0$                      |
| $A+B$     | $< 0$     | $< 0$     | $\ge 0$                    |
| $A-B$     | $\ge 0$   | $< 0$     | $< 0$                      |
| $A-B$     | $< 0$     | $\ge 0$   | $\ge 0$                    |

## multiplication

the naive way to do multiplication:

![image](https://hackmd.io/_uploads/BkwU-DslC.png)

multiplication hardware:

![image](https://hackmd.io/_uploads/BkkAWY0WR.png)
![image](https://hackmd.io/_uploads/Byjv-woxA.png)

multiplication algorithm:

![image](https://hackmd.io/_uploads/SkGKWDsg0.png)

disadvantages:
1. half of the 64-bit multiplicand register is always zero, so its space is wasted.
2. since the multiplicand register is 64-bit, a 64-bit ALU is required, even though each addition only produces 32 bits of useful data.

---

optimized multiplier: perform the add and shift steps in parallel

==EXAMPLE== using 4-bit numbers to save space, multiply $2_{10} \times 3_{10}$.

signed multiplication:

![截圖 2024-04-30 晚上10.39.48](https://hackmd.io/_uploads/SJEaMKCZC.png)

faster multiplication: distribute the additions over many adders working in parallel

![image](https://hackmd.io/_uploads/Hypmfvol0.png)

## division

the naive way to do division:

![image](https://hackmd.io/_uploads/ByKufvslR.png)

division hardware:

![image](https://hackmd.io/_uploads/HyaYfPogR.png)

division algorithm:

![image](https://hackmd.io/_uploads/B1iczvoe0.png)

*dividend and divisor are both 32-bit.

==EXAMPLE== using a 4-bit version of the algorithm to save pages, let's try $7_{10} \div 2_{10}$

![截圖 2024-04-30 晚上10.42.57](https://hackmd.io/_uploads/Sk1tmKAZR.png)
![截圖 2024-04-30 晚上10.43.26](https://hackmd.io/_uploads/BJ1omtCZA.png)

## floating point

defined by the IEEE 754 standard, the notation in __normalized binary__ (computers represent numbers in base 2) is $\pm 1.xxxxx_{2} \times 2^{yyyy}$, with two representations: single precision (32 bits) and double precision (64 bits).
IEEE floating-point format: ==$x = (-1)^s \times (1+F) \times 2^{E-\text{Bias}}$==

* $s$: sign bit, $0$ means non-negative and $1$ means negative
* normalized significand: $1.0 \le |\text{significand}| < 2.0$; there is always a leading 1 bit before the binary point, so there is no need to represent it explicitly. $F$ is the stored fraction; the significand is $F$ with the "$1.$" restored.
* $E$: excess (biased) representation: $E = \text{actual exponent} + \text{Bias}$, which keeps the stored exponent unsigned; the bias is $127$ for single precision and $1023$ for double precision (the all-zero and all-one exponent fields are reserved for special values)

single precision (32 bits)

![截圖 2024-04-30 晚上10.53.01](https://hackmd.io/_uploads/BJ60SY0WR.png)

double precision (64 bits)

![截圖 2024-04-30 晚上10.53.21](https://hackmd.io/_uploads/BJnJLYAbR.png)

single-precision range: the exponent fields `0000 0000` and `1111 1111` are reserved
the smallest normalized value has exponent field $00000001_2$ and fraction $000\ldots00_2$ (23 bits): $\pm 1.0 \times 2^{1-127} = \pm 1.0 \times 2^{-126}$
the largest value has exponent field $11111110_2$ and fraction $111\ldots11_2$: $\pm 1.11111111111111111111111_2 \times 2^{254-127} \approx \pm 2^{128}$

relative precision of single and double:
single: $23 \times \log_{10}2 \approx 23 \times 0.3 \approx 6$ decimal digits
double: $52 \times \log_{10}2 \approx 52 \times 0.3 \approx 16$ decimal digits

__denormal numbers__: exponent field all zeros, _mantissa_ nonzero: $x = (-1)^s \times (0+F) \times 2^{1-\text{Bias}}$, smaller in magnitude than any normalized number
__zero__: occurs when both _exp_ and _mantissa_ are zero: $x = (-1)^s \times 0$ (the sign matters: both $+0.0$ and $-0.0$ represent 0.0)
__infinity__: occurs when the bits of _exp_ are all ones and the _mantissa_ equals zero: $x = (-1)^s \times \infty$ (the sign matters)
__NaN__: occurs when the bits of _exp_ are all ones and the _mantissa_ is nonzero: $x = \text{NaN}$ (the sign doesn't matter)

![截圖 2024-04-30 晚上11.22.18](https://hackmd.io/_uploads/rkDn3KRWA.png)

---

floating addition algorithm:

![截圖 2024-04-30 晚上11.23.32](https://hackmd.io/_uploads/H1lZaYRbR.png)

==EXAMPLE== what decimal number does the floating-point bit pattern
`0x427D0000` represent under the IEEE 754 standard?

$427D0000_{16} = 0100\ 0010\ 0111\ 1101\ 0000\ 0000\ 0000\ 0000_2$

the value is $(-1)^0 \times (1+F) \times 2^{E-\text{Bias}}$, where the fraction field is $1111101000\ldots0_2$ (so $F = 0.1111101_2$) and $E = 10000100_2 = 132_{10}$.
hence the decimal number is $1.1111101_2 \times 2^{132-127} = 1.1111101_2 \times 2^5 = 111111.01_2 = 63.25_{10}$

==EXAMPLE== add $0.5_{10}$ and $-0.4375_{10}$, assuming we keep 4 bits of precision:
$0.5_{10} = 0.1_2 = (1.000 \times 2^{-1})_2$
$-0.4375_{10} = (-7/16)_{10} = (-1.110 \times 2^{-2})_2$
align by shifting right the significand of the number with the smaller exponent: $-1.11 \times 2^{-2} = -0.111 \times 2^{-1}$
add the significands: $(1.000 \times 2^{-1}) + (-0.111 \times 2^{-1}) = 0.001 \times 2^{-1}$
normalize: $0.001 \times 2^{-1} = 1.0 \times 2^{-4} = 0.0625_{10}$

==EXAMPLE== multiply $0.5_{10}$ and $-0.4375_{10}$:
$0.5_{10} = 0.1_2 = (1.000 \times 2^{-1})_2$
$-0.4375_{10} = (-7/16)_{10} = (-1.110 \times 2^{-2})_2$
add the exponents: $(-1) + (-2) = -3$, and $-3 + 127 = 124$, so the biased exponent is $124$
multiply the significands: $1.000 \times 1.110 = (1000 \times 1110) \times 0.000001 = 1110000 \times 0.000001 = 1.110000$; rounded to 4 bits: $1.110$
the product is $(1.110 \times 2^{124-127})_2 = (1.75/8)_{10} = 0.21875$; with the sign, the answer is $-0.21875_{10}$

---

fallacy: _a right shift by $n$ bits equals dividing by $2^n$, and a left shift by $n$ bits equals multiplying by $2^n$_

when the number being shifted is <font color=#f00>__unsigned__</font>, the conclusion is __TRUE__. but when the number is signed, the conclusion is wrong. consider $-5$ divided by $4$: in 8-bit binary, $-5$ is `0b11111011`, and we expect $-5 / 4$ to be $-1$ (truncating toward zero). but the logical shift `0b11111011 >> 2` gives `0b00111110`, which is $62$: absolutely wrong. maybe try a shift with sign extension? the arithmetic shift `0b11111011 >> 2` gives `0b11111110`, which is $-2$ (it rounds toward $-\infty$ instead of toward zero): close, but still wrong.
```cpp=
#include <iostream>

int main() {
    std::cout << ((-5) >> 2) << std::endl; // arithmetic shift: prints -2, not -1
    return 0;
}
```

![截圖 2024-05-01 凌晨12.09.06](https://hackmd.io/_uploads/Hk6ow5A-R.png)

pitfall: _addition in float is <font color=#f00>__associative__</font>_

floating-point numbers have limited precision, so results are approximations of the real values. consider $a = 1.5 \times 10^{38}$, $b = 1.0$, $c = -1.5 \times 10^{38}$: are `(c+a)+b` and `c+(a+b)` the same? in `(c+a)+b`, $c + a$ cancels exactly to $0$, so the result is $1.0$; in `c+(a+b)`, $b$ is far too small to change $a$, so $a + b$ rounds back to $a$ and the result is $0$.

```cpp=
#include <iostream>
using namespace std;

int main() {
    float a = 1.5e38, b = 1.0, c = -a;
    cout << (c + a) + b << endl; // 1
    cout << c + (a + b) << endl; // 0
    return 0;
}
```

![截圖 2024-05-01 凌晨12.11.05](https://hackmd.io/_uploads/BJEmOcCbR.png)