# Representation of Number
* Numbers in computer can be divided into three different types
1. **Unsigned Integer**
* For n bits of unsigned int the range for it is **$0-(2^n - 1)$**
3. **Signed Integer**
4. **Floating Point Number**
## Representation for Signed Integer
1. **Sign and Magnitude**
* Use the **first bit to indicate** the sign of the nubmer
* For n bits of S&M number the range for it is **$\pm(0\sim2^{n-1})$**
* *Cons*
1. Can't defiend where the sign bit should be
2. Will have both `+0` and `-0`
3. For arithmetic we will need handle the sign bit
2. **1's Complement**
* For a unsigned number **turn all 1 into 0 and 0 to 1**
* For n bits of 1's complement the range for it is **$\pm(0\sim2^{n-1})$**
* *Cons*
1. Still have `+0` and `-0`
2. To get the correct result we must add up the carry
> Ex:
> -3 = 1100, -2 = 1101
> -3 + -2 = (1) 1001 = -6 $\neq$ -5
> 1001 + 1 = 1010 = -5
3. **2's Complement**
* Can be formed by transform the unsigned number into 1's complement and +1
* For n bits the range for it is $-2^{n-1}\sim2^{n-1}-1$
* Have only one +0 and the answer of basic arithmetic is correct for most case
* Adopted by modern computer
#### Overflow
* When performing addition and subtraction sometimes the carry can't be represented by the hardware which cause the sign of the number change
> 
* Two operands with **different sign** will not cause the overflow.
#### Multimedia Arithmetic
* For color we used **1byte to represent RGB** that says the range for each color is $0\sim255$
* But the ALU is 32bits so when performing multimedia arithmetic we split 32bits ALU into 4 8bits ALU and **all performing same instruction**(SIMD,Same Instruction Multiple Data)
* Since the range is $0\sim255$ the number will not exceed 255 and smaller than 0 when performing multimedia arithmetic(**Saturating Operations**)
> Ex:
> For two 8bits unsigned decimal number A = 200, B = 103.Calculate A+B with saturating operation
> The range for 8bits binary number is $-127\sim127$
> A+B = `127` $\neq$ `303`
# Design of ALU
* An ALU should have **and/or/add/sub/nor/nand**
* We use **AND GATE/OR GATE/ONE BIT FULL ADDER/NOT GATE/MULTIPLEXER** as components for ALU
## 1Bit ALU
> 
> 
* The x in the table means that number is *don't care* and for and/or & not/nand we can assume that the **CarryIn can be the same as Binvert** thus we can reduce the control line for 1 bit ALU to 4bits
## 32Bits ALU
* Can perform **and/or/nor/add/sub/slt**
* Built up with 32 1bits ALU with every control lines are connected together
>
>
* The slt is implemented by **subtract two number**
1. We store either 1 or 0 into the desired register that says the **output for 2nd to 31th ALU should all be 0**
2. If a > b than the result for the 31th alu's adder should be 0(since the answer is >0)in the other hand the answer will be 1
3. The output of 1st ALU controling the result for the slt instruction and that output is decided by the output of 31th ALU => **We lined the output of 31th ALU's Less line to the 1st ALU's Less line**
>
* And same as the 1Bits ALU the CarryIn and Binvert can be the same line
* For **zero detection** we simpliy **or all result together and not it after**
> 
* For **overflow detection** we only need to do the detection on the 31th ALU since the **overflow will occur only when the CarryIn and CarryOut are not the same in the 31th ALU** => $CarryIn\oplus CarryOut$
> 
* Every step in the ALU might not be simultaneously.For example the **adder will need to wait the former adder end their job to get the CarryIn they want**(*But there still an answer before they get the CarryIn but the answer will be wrong*)The time for waiting is called **propagation delay**
> Example:
> 
> 
> 1(for not) + 2(for 2-1 mux) + 1(for or) + 4(for 4-1 mux) = 8
> 
> 2(for 2-1 mux) + 3*32(every adder will produce carry) + max(3,4)(since the overflow detection and 4-1 mux can work in parallel) = 102
> 
> 1(for not) + 2(for 2-1 mux) + $3*32$(since the slt is implmented by sub) + max(3,4) = 103
## Multiplication
### Sequential Multiplier
#### Unsigned Multiplication
##### Tranditional Multiplication
* The Multiplicand and Multiplier **shift left for 1 bit** after every multiply
* We add up all partial mutiply result
> 
##### Hardware Friendly Multiplication
* The Mutilplier **shift left for 1 bit** after every mutiply
* The **mutiplier is placed in the register that stores result**(*means that
* we shfit the answer register after mutiply*)
>
* We only use the upper part of the product register so that **4bits ALU** will be enough
#### Signed Multiplication
* The only method for signed multilication we will talk about is **Booth Algorithm**
##### 2Bits Booth Algorithm
* Scan 2 bits of the multiplier each time and according to the result there's three different actions
1. **00 & 11**
* *Do nothing*
2. **01**
* *Subtract the product with multiplicand once*
3. **10**
* *Add the product with multiplicand once*
* The shift movement here is an **Arithmetic Shfit**
* In order to scan thoroughly we need to add a **Dummy zero** before the LSB of the product register
>
> Example: 0010 * 1101
> The product register is `00001101 *0*`.For first scan 0000110**1 0** => subtract 0010
> Product register = `11110110 *1*`=>Scan 1111011**0 1** => add 0010
> Product register = `00001011 *0*`=> Scan 00001011**1 0** => sub 0010
> Product register = `11110101 *1*` => Scan 1111010**1 1** => do nothing
> Product register = `11111010 *1*`
* **Generally Boooth algorithm is faster** than the hardware friedly multiplication but if the mutiplier is **alternating between 0&1 than it will be slower**
> Example: For a multiplier like 10101
> hardware friedly multiplication needs 3 add/sub
> booth algorithm needs 5 add/sub
##### 3Bits Booth Algorithm
* Scan 3bits each time.There's five different actions
1. **000 & 111** 011 100 101 110
* Do nothing
2. **001 & 010**
* Add the product with multiplicand once
3. **011**
* Add the product with multiplicand twice
4. **101 & 110**
* Subtract the product with multiplicand once
5. **100**
* Subtract the product with multiplicand twice
* We can defined the action by do 2 bits scan twice
> For 010 1st round is 10 => add once.2nd round is 01 but after 1st we shift right 1bit to the register which caused the number in register mutilply by 2 => in this case will be sub *twice* => 010 will sub once
### Combinational Multiplier
* All combinational multiplier here are **Unsigned**
#### Fast Multiplier
* Recalling the sum of the partial product we will notice that the LSB is not involving in any arithmetic. So we could take the LSB as the result directly
>
>
#### Parallel Multiplier
* Do the add in parallel
>
* Only need $lgn$ adder for n bits multiplication
## Division
### Unsigned Division
#### Traditional Division
* When performing division we **shift the divisor to right** and **compare the shifted part with the dividend**
1. If **larger** then **shift the quotient to left and add *0* to its LSB**
2. If **smaller** then **shift the quotient to left and add *1* to its LSB** and **subtract the dividend with shifted divisor**
* We put the dividend and remainder together since the final result of the dividend will be the remainder
> Example: 0111/0010
> 
>
> 
#### Hardware Friendly Division
* Instead of shifting the divisor we shift the dividend and compare it with the divisor
* Before the loop starts we first shift the divisor to avoid doing 1 more round of the loop
* Since the quotient and dividend are both shiftng to the left,we can combine them together
* In the end shift the upper part of the remainder register to the right to get the correct remainder since we shift it to left when we quit the loop
* **Watch flow chart**
> 
### Signed Division
* MIPS has no signed division it implement it by do the unsigned division and add the sign afterward
* The sign is following the **dividend**
> Example:
> -7 / -2 => 7/2 = 3...1 => Answer for signed operation is `3...-1`
### Combined the Division and Multiplication
* Both Hardware friendly ways to perform these two instructions use the same hardware.The only difference between them is the direction of shifting.Thus in MIPS it combines Division and Multiplication in one specialized ALU
> 
## Division and Multiplication Instructions in MIPS
> 
# Representation of Floating Number
## IEEE 754
* A number like below is called **Binary Normalized Number**.
> 
* We use **IEEE 754** to represent a binary normalized number and it consisted three parts
1. **Sign**
2. **Fraction** : Numbers following by the binary point.Represnet the *precision*
3. **Exponent** : The expnent of the 2.Reprsent the *range*
* If we combine 1 and fraction part like `1.111` called **significan** and the bits will be *bits for fraction + 1*
* For **single precision** floating number it uses *1word(32bits)* to represent a number
* **Fraction uses 8bits.Exponent uses 23 bits**
* For **double precision** it uses *2word(64bits)* to represent a number
* **Fraction uses 11 bits. Exponent uses 52 bits**
> Example: Turn -0.75 to binary representation in single precision
> 
* As for the exponent part since the 2's complement's range is $-2^7 \sim 2^7-1$ thus the 0 will be `01111111` but its not intuitive to think.If we shift the number by -127($2^7-1$) than the 0 will be `00000000`which is easy to understand.Also known as **Bias Notation**
* To *get the expression* in IEE754 **add 127 to the exponent**
* To translate IEEE754 to *actual nubmer* **sub 127 the exponent**
* For double precision we use **1023** as bias
* In general for **N bits of exponent the Excess will be $2^{n-1} -1$**
* It make the comparision of FP easlier
> 
> For the example above the acutal exponent is `-1` to translate into IEEE754 -1 + 127 = 126 = `01111110` in binary
> So the IEEE754 single precision representation for -0.75 is
> => `1 01111110 100000000000000000000000`
> 
* With the bias notation the 0 will not be all 0 in IEEE754.And some special case will ocuur too.
> 
* From the chart above we can know that
> 
* For Denormalized Number
> 
* Or **in general we can simplify it to the chart below**
> 
> Example:
> 
> 5 bits expoenet => the excess will be $2^{5-1} - 1 = 15$
> total 16 bits => bits for fraction is 16-1(sign)-5=10 bits
> For max = $1.1111111111 * 2^{15}
* From the example above, we can see that the denormalized part is not indeed required but $\pm0, \pm\infty ,Nan$ is required
> Example 2:
> 
> 
## IBM Format
>
* The **base is 16**
* **Excess-64** instead of 63
* Normalized number by move 4 bits left until the MSB is 0
> Example: $-938.8125$
> 
# Floating Point Arithmetic
## FP Addition
* There's five steps to perform a fp addition
1. Shift the number whose exponent is lower to match the other number's exponent
2. Add the significant directly
3. Normalize the number by shift left/right
4. If the fraction requires more bits than the maximum number of bits than round it to match the regulation
5. Check if normalized or not.If so then done,it not then do 3 again.
> 
> 
>Example:
> 
* **FP Addition has no associative law**
## FP Multiplication
* There's five steps tp perform a multiply
1. Add the exponent
2. Multiply the number
3. Normalize the number
4. Round it if needed
5. Set the sign
* For step 1, with biased expression the exponent for each operand is divied by the bias number.But if we add the exponent together than acutally we divide the exponent by the bias two time.Thus we need to divide the final result by the bias again.
> 
> Example:
> 
> 
## Rounding
* To understand rounding in binary we first discuss rounding in decimal
### Rounding in Decimal
> 
* If without the GR bits we drop all bits can't fit inside the fraction bits
> Example for Round to nearest even
> For LSB is even number => 24.50 => drop it => 24
> For odd 23.50 => carry-in => 24
> Example:
> 3 significan bits => 2 bits for fraction
> 
* To get more accurate result,we add **sticky bits** after to track if there's number after the btis.
* If the sticky bit is 0 then even the LSB is odd the 50 case will not carry in.
> 
* To estimate the accuracy of arithmetic operation,we use **Units in the Last Place(ULP)**,which represents the **number of bits being dropped**.That being said the ULP is an **integer**
> Example: $2.34*10^5 + 2.56*10^0$ with 3 significant bits
> 
### Rounding in Binary
* In general,just like the rounding in decimal
> 
> Example:
> 
* MIPS have other rounding rules.All of them are under the circumtance of **GR is non-zero**
1. Rounding to Positive Infinity: If not then carry in
2. Rounding to Negative Infinity
1. Number > 0 => drop it
2. Number < 0 => Carry-In
### Fused Multiply Add
* MIPS has no this kind of instruction
> Example: **madd** in ARM
> madd a a b c => a = a*(b+c)
* Normally it will perform rounding twice,but in this kind of instruction **it will only does rounding once to ensure precision**
# Other things
## Shift Right vs Divide 2
* If the number is > 0 then slr = /2, but with < 0 the result will be wrong
> Example:
> 
* We can add the $2^n - 1$ before head to the result to correction the answer => 1011 + 0011 = 1100 => slr 2 => 1111 = -1
## Overflow Dectection by Software
### Signed
* Signed arithmetic's overflow will be detected by the OS.But it can be done by ourself.
* We use xor to identify the two operands are same sign(same=>answer will be negative) or not.And use the result to compare with the result we get from addu.
> 
> 
### Unsigned
* The unsigned arithmetic is not easy to detect overflow.But we can compare the number with the maximum number allowed to check if the arithmetic will cause overflow or not.
> 
* In that case if t1 is larger than `11111111111111111111111111111111` than overflow will occur => To dectect overflow we first compare t1 and t2.
>
> If use add in the first instruction then if an overflow ocurr the OS will terminate the program which is not we want.