Chap. 03 - Computer Arithmetic

# Chap. 03 - Computer Arithmetic > 課程內容 : 清華大學開放式課程黃婷婷教授 > 參考書目 : Computer Organization and Design: The Hardware/Software Interface (5th), David A. Patterson, John L. Hennessy > > 其他科目內容請參見 [[Here]](https://hackmd.io/@ChenZE/By4SOO6Jyl) ## Content * [Sec. 3.1 Addition and subtraction](#Sec-31-Addition-and-subtraction) * [1. Problem: design MIPS ALU](#1-Problem-design-MIPS-ALU) * [2. Functional specification](#2-Functional-specification) * [Sec. 3.2 Constructing an arithmetic logic unit](#Sec-32-Constructing-an-arithmetic-logic-unit) * [1. Bulding ALU](#1-Bulding-ALU) * [1.1 A bit-slice ALU](#11-A-bit-slice-ALU) * [1.2 A 1-bit ALU](#12-A-1-bit-ALU) * [1.3 How about subtraction](#13-How-about-subtraction) * [1.4 NOR operation](#14-NOR-operation) * [1.5 Set on less than](#15-Set-on-less-than) * [1.6 Overflow (溢位)](#16-Overflow-溢位) * [1.7 Zero detection](#17-Zero-detection) * [2. Fast adders](#2-Fast-adders) * [2.1 Problem with ripple carry adder](#21-Problem-with-ripple-carry-adder) * [2.2 Carry look-ahead](#22-Carry-look-ahead) * [2.3 Cascaded carry look-ahead](#23-Cascaded-carry-look-ahead) * [2.4 Multiple level carry look-ahead adder](#24-Multiple-level-carry-look-ahead-adder) * [2.5 Arithmetic for multimedia](#25-Arithmetic-for-multimedia) * [Sec. 3.3 Multiplication](#Sec-33-Multiplication) * [1. Unsigned multiply](#1-Unsigned-multiply) * [1.1 Multiplication in MIPS](#11-Multiplication-in-MIPS) * [1.2 Unsigned multiply - Ver. 1](#12-Unsigned-multiply---Ver-1) * [1.3 Unsigned multiply - Ver. 2](#13-Unsigned-multiply---Ver-2) * [2. Signed multiply](#2-Signed-multiply) * [2.1 Booth's algorithm](#21-Booths-algorithm) * [2.2 Sequential multiplier](#22-Sequential-multiplier) * [2.3 Faster multiplier](#23-Faster-multiplier) * [Sec. 3.4 Division](#Sec-34-Division) * [1. Division in MIPS](#1-Division-in-MIPS) * [1.1 Divide: paper and pencil](#11-Divide-paper-and-pencil) * [2. Divide - ver. 1](#2-Divide---ver-1) * [2.1 Hardware design](#21-Hardware-design) * [2.2 Algorithm](#22-Algorithm) * [3. Divide - ver. 2](#3-Divide---ver-2) * [3.1 Hardware design](#31-Hardware-design) * [3.2 Algorithm](#32-Algorithm) * [4. Complementary](#4-Complementary) * [Sec. 3.5 Floating point](#Sec-35-Floating-point) * [1. Representation](#1-Representation) * [1.1 Scientific notation](#11-Scientific-notation) * [1.2 FP representation](#12-FP-representation) * [1.3 IEEE-754 standard](#13-IEEE-754-standard) * [1.4 Transformation](#14-Transformation) * [1.5 Zero and special number](#15-Zero-and-special-number) * [2. Addition and multiplication](#2-Addition-and-multiplication) * [2.1 Addition](#21-Addition) * [2.2 Multiplication](#22-Multiplication) * [Clncluding remarks](#Clncluding-remarks) ## Sec. 3.1 Addition and subtraction ### 1. Problem: design MIPS ALU MIPS 的 ALU (算術與邏輯單元，**A**rithmetic and **L**ogic **U**nit) 必須包含以下四種運算 * 算術運算: add (加法) 與 sub (減法) * 主要採用 2-補數的加法器/減法器，並且搭配 overflow(溢位) 的偵測 * 邏輯運算: and、or 與 nor * 比較: slt (set on less than) ### 2. Functional specification 基本的 ALU 構造如下圖，其中 A 與 B 分別表示第一個與第二個 input，且每個 input 都是 32-bits。ALUop 表示 ALU 的 control (控制碼)，共有 4-bits。最後輸出的結果也是 32-bits ![basic-ALU](https://hackmd.io/_uploads/ByyTLNT_C.png) | ALU Control (ALUop) | Function | | :-: | :-: | | 0000 | and | | 0001 | or | | 0010 | add | | 0110 | subtract | | 0111 | set on less than | | 1100 | NOR | :::info ALU 的計算被歸類在 R-format 的 instruction 之中，而 R-format 中的 func field 指的就是 ALU control ::: ## Sec. 3.2 Constructing an arithmetic logic unit ### 1. Bulding ALU #### 1.1 A bit-slice ALU 因為 MIPS 架構中的資料與指令都是 32-bits 的結構，所以將其拆解為較小的 binary bit 進行運算，每個 bit-slice 計算完之後再合併為最後的結果，類似演算法中的 divide and conquer (如下圖) ![image](https://hackmd.io/_uploads/BJ8Y_VTOA.png) #### 1.2 A 1-bit ALU 此處先聚焦在單一位元 (1-bit) 的 ALU 上，以下圖為例。有兩個輸入 A 與 B，兩者會同時進行 AND、OR 與 ADD 的計算(平行運算)，因此理論上會有 3 個輸出值，這 3 個輸出值再通過 Mux 後選擇其中一個作為最後的結果。 :::info MUX(multiplexer)，多工處理器，從多個輸入中選擇其中一個作為輸出值，以 2-to-1 的 Mux 為例，因為有兩個輸入，所以可以使用 1 個 bit 來控制要輸出哪個結果。以此處的 1-bit ALU 中因為有三個輸入 (AND、OR 與 ADD)，所以需要 2-bits 來控制輸出結果 ::: ![image](https://hackmd.io/_uploads/rJLBcEau0.png) 在 Mux 中，AND 所對應的 binary code 是 00，OR 所對應的 binary code 是 01，ADD 所對應的 binary code 是 10。而這個 2-bits 的控制碼就是 ALUop 中的後兩碼 ![image](https://hackmd.io/_uploads/SJGh6Epu0.png) | ALU Control (ALUop) | Function | | :-: | :-: | | 0000 | AND | | 0001 | OR | | 0010 | ADD | | 0110 | subtract | | 0111 | set on less than | | 1100 | NOR | ##### Example: 4-bits ALU 以 4-bits 的結構為例，ALU 的計算如下圖所示，左邊是單一 ALU 的計算，右邊是組合後的 4-bits 結果。Result0 到 Result 3 分別代表每個 bit 計算的結果，Carry In 與 Carry Out 可以想像成是加法計算中的進位機制。紅色箭頭指的是每個 bit 計算時的 operation，也就是 ALUop ![S__2244620](https://hackmd.io/_uploads/HkM30E6_A.jpg) #### 1.3 How about subtraction :::info 2-補數的計算包含兩個步驟: (1) 先做 inverse (i.e., 0/1 變 1/0) (2) 加 1 * $A + B = A + (-B) = A + (B' + 1) = A + B' + 1$ * $B'$ 指的是 $B$ 的 inverse ::: 在減法的計算中，考慮到使用 $A + B = A + (-B)$，所以在 B 的輸入端要額外設計一個 Mux 來選擇是否要做 negate，這個 Mux 只有兩個輸入，所以僅需要 1 個 bit 來控制即可。此外，因為在取 2-補數的時候需要 +1，這個 +1 就由 carry in 來提供，詳細架構如下圖所示 ![image](https://hackmd.io/_uploads/B1RffHTOA.png) 因為減法的本質上就是進行加法運算，所以減法的 ALUop 的後兩碼與加法相同，都是 10。而第 3 個 bit 的 1 指的是 B negate 的 Mux，也就是決定 B 要不要做 negate | ALU Control (ALUop) | Function | | :-: | :-: | | 0000 | AND | | 0001 | OR | | 0010 | ADD | | 0110 | subtract | | 0111 | set on less than | | 1100 | NOR | #### 1.4 NOR operation 考慮 A nor B = (not A) and (not B)。由此可知 NOR 的運算本質上與 AND 相同，因此 ALUop 的後兩碼也相同。另外，因為輸入 B 要做 negate，所以第 3 個 bit 要設為 1。結構圖如下所示。 ![image](https://hackmd.io/_uploads/SJdIBBT_A.png) 同理，因為 A 也會經過一個 A negate 的 Mux，所以**第 4 個 bit 也要設為 1** | ALU Control (ALUop) | Function | | :-: | :-: | | 0000 | AND | | 0001 | OR | | 0010 | ADD | | 0110 | subtract | | 0111 | set on less than | | **1**100 | NOR | 至此，ALUop 中的 4-bits 各自所代表的意義如上所述 #### 1.5 Set on less than :::info (1) **Most significant bit (MSB):** 最高有效位，指的是 binary code 中的最左側數字 (2) **Least significant bit (LSB):** 最低有效位，指的是 binary code 中的最右側數字 (3) 習慣上第 0 個 bit 指的是 binary code 中的作右側數字 ::: 先考慮 1-bit ALU 的結構 (見下左圖)。因為 Mux 多了一個輸入值，所以 set on less than 的 ALUop 後兩碼設為 11。再來，因為 $\mathtt{slt}$ 的運算方式是: 若 $a<b$ 則後面的 register 就設為 1，反之設為 0。因此第 1 個 bit 到第 30 個 bit 是沒有意義的，可以直接設為 0，如下圖的藍色箭頭。 ![S__2244623_0](https://hackmd.io/_uploads/SkGGgUaO0.jpg) 再來考慮 sing bit (第 31 個 bit) 的 ALU (見上右圖)。如果要比較 $a<b$，可以計算 $a-b$ 的結果，$a-b$ 是負數就代表 $a<b$，反之則為 $a>b$。又因為 ALU 內部是進行平行運算，所以減法的結果其實已經有了 (上右圖綠色圓圈處)。當 $a-b$ 的結果為負數時，直接將第 0 bit 設為 1，反之設為 0，如下圖所示的綠色箭頭 ![S__2244626](https://hackmd.io/_uploads/HJnzW8TdA.jpg) 三者結合起來後如圖下圖所示，是一個完整的 $\mathtt{slt}$ 的計算 ![image](https://hackmd.io/_uploads/HyOj-8pdA.png) $\mathtt{slt}$ 的 ALUop 如下，屬於第 4 種輸入的 Mux，所以後兩碼設為 11，因為需要做減法，所以第 3 碼設為 1。 | ALU Control (ALUop) | Function | | :-: | :-: | | 0000 | AND | | 0001 | OR | | 0010 | ADD | | 0110 | subtract | | 0111 | **set on less than** | | 1100 | NOR | #### 1.6 Overflow (溢位) 以 sign number 的 4-bits 運算為例，計算結果理論上應該要在 -8 到 7 之間。 | Decimal | Binar | Decimal | 2-complement | | :-: | :-: | :-: | :-: | | 0 | 0000 | 0 | 0000 | | 1 | 0001 | -1 | 1111 | | 2 | 0010 | -2 | 1110 | | 3 | 0011 | -3 | 1101 | | 4 | 0100 | -4 | 1100 | | 5 | 0101 | -5 | 1011 | | 6 | 0110 | -6 | 1010 | | 7 | 0111 | -7 | 1001 | | | | -8 | 1000 | 但實際上會發現，例如當 $7+3=10$ 或 $-4-5=-9$ 都會超過這個限制範圍，而且以 binary code 計算的話，$7+3$ 會變成 $-6$，$-4-5$ 會變成 7，這就是所謂的 overflow ![image](https://hackmd.io/_uploads/rkNWHUaO0.png) overflow 通常會發生在計算的輸出結果過大(超出上限)或過小(超出下限)的情況，以上述例子來說就是超過 8 或 -7。一般來說 overflow 的發生可以歸類為下列情況 * 當兩數的正負符號不同時，相加不會發生 overflow * 當兩數的正負符號相同時 * 兩正數相加，但結果為負 :arrow_right: 發生 overflow * 兩負數相加，但結果為正 :arrow_right: 發生 overflow 總結來說，當 carry in 的 MSB $\neq$ carry out 的 MSB :arrow_right: 發生 overflow (見下圖) ![S__2244627](https://hackmd.io/_uploads/BJHDK8puA.jpg) 為了避免 overflow，可以根據上述 overflow 的發生狀況提前檢測。檢測方式可以利用第 n-1 個 carry in 的 bit 與第 n-1 個 carry out 的 bit 作 XOR $$ \text{overflow} = \text{carry in}[n-1] \oplus \text{carry out}[n-1] $$ ![image](https://hackmd.io/_uploads/SkAs_Upu0.png) #### 1.7 Zero detection Zero detection logic 是一個大的 NOR gate，用來支援條件跳轉 (conditional jump)，例如: $\mathtt{beq}$，結構如下圖所示。會將所有單一位元的運算結果輸入至 NOR gate 中，只要有一個結果是 1 就會輸出 0 ![image](https://hackmd.io/_uploads/HkA1uv1YA.png) ### 2. Fast adders #### 2.1 Problem with ripple carry adder 加法器 (adder) 在計算時惠涉及到進位 (carry) 的狀況，一般來說 carry 會從 LSB 傳到 MSB。又因為加法的計算過程中是 bit-by-bit 的傳播，所以必須要等前一個 bit 算完才能確定是否有 carry 的產生，但這樣速度太慢了會產生延遲 (delay)。以 N-bits 的資料量來說，因為需要進行 N 次的計算，所以會產生 N-stage 的 delay ![image](https://hackmd.io/_uploads/Hkmm2PktC.png) 改善方式可以透過平行運算以及使用更多的 hardware 來加快速度，以此方式所設計的加法器稱為 carry look-ahead adder (提前進位加法器) #### 2.2 Carry look-ahead ![image](https://hackmd.io/_uploads/BJJ33wkKC.png) 以上圖為例，先看 1-bit 的如何實作。在 1-bit 的 ALU 中，第 0 bit 的 carry out (Cout0) 與第 1 bit 的 carry in (Cin1) 是相同的。可依據下式來判斷是否有進位 (carry out) 產生 $$ \text{Cout} = (B * \text{Cin}) + (A * \text{Cin}) + (A*B) $$ 因為要產生進位就必須有兩個以上的 1 才可能有 carry out，而加法器中會有 3 個輸入 (input A、input B 與 carry in)，所以必須兩兩一組進行判斷，看是不是會有每一組同時有兩個 1。上述數學式中的括號就是兩兩一組進行 AND ($*$) 的運算，當有兩個 1 時 AND 的結果也是 1。最後再將三組結果進行 OR ($+$) 的運算，來判斷三個輸入中會不會出現兩個以上的 1 又因為 Cin[n] = Cout[n-1]，所以根據上圖可以得到兩個數學式 $$ \begin{equation} \textbf{Cin1} = \text{Cout0} = (B_0 * \text{Cin0}) + (A_0 * \text{Cin0}) + (A_0 * B_0) \end{equation} $$ $$ \begin{equation} \text{Cin2} = \text{Cout1} = (B_1 * \textbf{Cin1}) + (A_1 * \textbf{Cin1}) + (A_1 * B_1) \end{equation} $$ 將上述第 1 式的 Cin1 代入第 2 式後可得 $$ \begin{align} \text{Cin2} = &(B_1*B_0*\text{Cin0}) + (B_1 *A_0*\text{Cin0}) + (B_1*A_0*B_0) + \\ &(A_1*B_0*\text{Cin0}) + (A_1*A_0*\text{Cin0}) + (A_1*A_0*B_0) + \\ &(A_1*B_1) \end{align} $$ :::info **Review** (1) Generate carry at bit $i$ Adder 一般會有 3 個 input 做相加，兩個是原始位元的輸入 (a and b)，另一個是前一位元計算後所產生的進位 (carry in)。當 adder 中的 input a 與 input b 都是 1 時，不論是否有 carry in，相加後一定會產生 carry out。可以 AND 運算來判斷 (見下式) $$ a_i * b_i = g_i $$ 當 $g_i = 1$ 表示 generate carry (有進位訊號) (2) Propagate carry via bit $i$ 當 adder 中的 input a 與 input b 只有一個 1 時，則有 carry in 必有 carry out，這個位元的運算可以看成是把前一個位元所產生的 carry out 傳遞至下一個位元。可以用 XOR 運算來判斷 (見下式) $$ a_i \oplus b_i = p_i $$ 當 $p_i = 1$ 表示 propagate carry (有傳播訊號) ::: 由 generate carry 與 propagate carry 的概念可再次改寫上述的 Cin1 與 Cin2 如下: $$ \begin{align} \text{Cin1} = \text{Cout0} &= (B_0 * \text{Cin0}) + (A_0 * \text{Cin0}) + (A_0 * B_0)\\ &= (A_0 * B_0) + (A_0 * \text{Cin0}) + (B_0 * \text{Cin0})\\ &=g_0 + (A_0 + B_0) * \text{Cin0}\\ &=g_0 + (p0 * \text{Cin0}) \end{align} $$ 此式在判斷當前的 $a_0$、$b_0$ 與 $\text{Cin0}$ 三者相加後會不會有進位 (carry out) 的產生。$g_0$ 表示當前的第 0 bit ($a_0$ 與 $b_0$) 會不會產生 generate carry。而 $p_0 * \text{Cin0}$ 表示如果沒有第 0 bit 沒有 generate carry 的話，就判斷需不需要把 $\text{Cin0}$ 往後傳遞。同理，可得 $\text{Cin2}$ 如下: $$ \text{Cin2} = \text{Cout1} = g_1 + (p_1 * g_0) + (p_1 * p_0 * \text{Cin0}) $$ $g_1$ 表示判斷當前的第 1 bit ($a_1$ 與 $b_1$) 會不會產生 generate carry。$p_1*g_0$ 表示第 1 bit 沒有 generate carry，並判斷第 0 bit ($a_0$ 與 $b_0$) 會不會產生 generate carry，而第 1 bit 負責進行 propagate carry。$p_1 * p_0 * \text{Cin0}$ 表示第 1 bit 與第 0 bit 都不會產生 generate carry，所以判斷需不需要把 $\text{Cin0}$ 往後傳遞。同理可再得 $\text{Cin3}$ 如下: $$ \text{Cin3} = \text{Cout2} = g_2 + (p_2 * g_1) + (p_2 * p_1 * g_0) + (p_2 * p_1 * p_0 * \text{Cin0}) $$ 因為要進行 carry look-ahead 的方法做平行運算，所以必須使用更多的 hardware 來實作，是一種以空間換取時間的方法。常見的 carry look-ahead adder 的變形有兩種: (1) cascaded carry looh-ahead addder 與 (2) multiple level carry look-ahead adder #### 2.3 Cascaded carry look-ahead 如果將所有的 bit 都以 look-ahead 的方法來做，那結構會變得太複雜。改善方式是將某幾個 bit 組合在一起視為 block，在 block 內部進行 carry look-ahead 來加速，如下圖所示 ![image](https://hackmd.io/_uploads/HycSpOyFA.png) #### 2.4 Multiple level carry look-ahead adder 將 n-bits 視為一個 block 並在 block 內部進行 look-ahead 後可進行加入運算，但是 block 之間還是會有 delay。改善方式是在 block 與 block 之間再次進行 look-ahead 來加速。 block 是屬於 generate (以大寫 G 表示) 還是屬於 propagate (以大寫 P 表示)，可透過 BLOCK 內部的 n-bits 來判斷。當 block 內部的 bits 都是 propagate 時，該 block 就是 Propagate (以 3- bits 為例) $$ P = p_3 * p_2 * p_1 * p_0 $$ 當 block 內部的 n-bits 至少有一個 generate 時，該 block 就是 Generate (以 3- bits 為例) $$ G = g_3 + (p_3*g_2) + (p_3*p_2*g_1) + (p_3*p_2*p_1*g_0) $$ ![image](https://hackmd.io/_uploads/B1olbtyKA.png) #### 2.5 Arithmetic for multimedia 在圖形 (graphics) 與媒體 (media) 的處理器中，通常是在 8-bits 或 16-bits 的向量資料上運算，例如: SIMD (single-instruction, multiple-data)。類似這樣的 ISA 中，通常會涉及到 saturating operation (飽和運算)，簡單來說就是當發生 overflow 時，取最大數值即可，例如: clipping in audio 或 saturation in video ## Sec. 3.3 Multiplication ### 1. Unsigned multiply 以 MIPS 的架構為例，CPU 中有兩個特殊的 register: Lo 與 Hi，用來進行乘法運算，如下圖所示 ![S__2244637](https://hackmd.io/_uploads/SyfPeq1tC.jpg) #### 1.1 Multiplication in MIPS 在 MIPS 中有關乘法的指令架構如下: $$ \mathtt{mult}\ \ $\text{t1},\ $\text{t2} $$ 可以注意到說並沒有 destination register，因為 32-bits 乘上 32-bits 的結果會是 64-bits，但是 register 無法儲存 64-bits 的資料，所以會使用兩個特殊的 register: Lo 與 Hi 來儲存。\$t1 * \$t2 後的 64-bits 結果中，前 32-bits 會放在 Hi 中，後 32-bits 會放在 Lo 中。使用時可以透過 move from hi ($\mathtt{mfhi}$) 指令與 move from lo ($\mathtt{mflo}$) 指令來將資料從 Hi 與 Lo 中移至一般的 destinaiton register，示意圖如下。 $$ \begin{align} \mathtt{mfhi}\ \ $t3\\ \mathtt{mflo}\ \ $t4\\ \end{align} $$ ![image](https://hackmd.io/_uploads/Hy1CE9ytC.png) #### 1.2 Unsigned multiply - Ver. 1 手寫直式乘法如下所示，以 1000 作為被乘數 (multiplicand)，1001 作為乘數 (multiplier)為例，binary 的乘法過程基本上會牽涉到兩件事: * multiplier * = 1 時，複製 multiplicand (紅色數字) * = 0 時，全部寫 0 * 每乘完一次 multiplicand 的值 (又稱 partial product，藍色數字) 需往左位移 (綠色數字) ![Untitled](https://hackmd.io/_uploads/Hk2pd9ytC.png) 總結來說， 64-bits 的 multiplicand 在運算過程需要往左位移，32-bits 的 multiplier 需要往右位移，因為每次只乘上 multiplier 的一個位元，所以需要進行多次的 iteration (下圖綠色螢光筆) ![S__2244638](https://hackmd.io/_uploads/rywqs5kKR.jpg) 詳細流程圖與演算法如下圖，但以此方式的乘法會有幾個缺點: (1) 因為需要作多次的 iteration，所以乘法的計算速度較慢。(2) multiplicand 的長度雖然是 64-bits，但實際上會用到的有效程度只有內部的 32-bits，且左右的 bits 都是 0。可透過兩種方式改善 ![2未命名 (1)](https://hackmd.io/_uploads/SyHk1jkK0.png) #### 1.3 Unsigned multiply - Ver. 2 > **Improvement (1)** > Instead of shifting multiplicand to left, shift (partial)product to right 如下右圖所示，每次 iteration 中，將原本要左移的 multiplicand (上紅色與上綠色數字) 位置保持不變，改為 partial product 右移(下紅色數字) ![Untitled](https://hackmd.io/_uploads/rkjlmo1F0.png) > **Improvement (2)** > Since product register wastes space, combine multiplier and product register 因為 partial product (32-bits) 是往右移動，multiplier (32-bits) 也是往右移動，所以兩者可以合併在同一個 register 中節省空間。結構與演算法如下兩圖所示 ![3F6344AF-3C92-47AF-A6C5-3A1AF9AE0ED5](https://hackmd.io/_uploads/rkMHVoJKR.jpg) ![74135949-F812-426A-8CA2-FD17695924E4](https://hackmd.io/_uploads/H1kU4sytA.jpg) ### 2. Signed multiply 直式的手寫乘法如下所示，同樣要注意兩件事: * multiplicand 需要做 sign extended * 當 multiplier 算到做左邊時，因為 MSB 是代表 sign bit，所以要進行額外處理 * 當 sign bit = 0 時，直接以 0 * multiplicand * 當 sign bit = 1 時，以 (-1) * multiplicand，也就是減掉 multiplicand ![Untitled](https://hackmd.io/_uploads/rkjj1A1t0.png) #### 2.1 Booth's algorithm 以下列乘法為例 $$ A \times 000\textbf{11111} = A \times (00\textbf{1}00000 - 0000000\textbf{1}) $$ 當進行左邊的乘法運算時，因為有五個 1，所以需要進行 5 次的 iteration，但如果改成右式，雖然需要做減法，但因為只有兩個 1，所以只要進行 2 次的 iteration 即可，可以加速計算，這就是 Booth's algorithm 的精神。 > **Booth's algorithm** > (1) 遇到連續 1 時可拆解 > (2) 如何拆？以 $2_{10} \times 6_{10} = 0010_2 \times 0110_2$ 為例原本的直式乘法如下: ![image](https://hackmd.io/_uploads/S1z3T0rYR.png) 改成下列算法: * multiplier 的,第一個 1 用減的 * 中間都不變 * multiplier 的最後一個 1 之後用加的 原因在於 6 可以拆解為 $6=-2+8 \Rightarrow 0110=-00010+01000$ ![image](https://hackmd.io/_uploads/ryTXRABFC.png) ![image](https://hackmd.io/_uploads/rkISeJUtA.png) ::: warning 但要如何判斷當前 multiplier 中的 bit 是第幾個 1 ? ::: 可使用目前位元 (current bit) 與右側位元 (bit to right) 來判斷 | Current bit | Bit to right | Explanation | Example | Note | | :-: | :-: | :-: | :-: | :-: | | 1 | 0 | begins run of 1 | 0000111**10**00 | 第一個 1 用減的 | | 1 | 1 | middle run of 1 | 000011**11**000 | 中間不動 | | 0 | 1 | end run of 1 | 000**01**111000 | 最後一個 1 用加的 | | 0 | 0 | middle run of 0 | 00**00**1111000 | 中間不動 | ##### Example: $2 \times 7=0010 \times 0111$ ![S__2260995](https://hackmd.io/_uploads/SJ4DDWLFA.jpg) #### 2.2 Sequential multiplier 以直式乘法為例，adder 的計算通常是兩兩一組，算完 multiplier 的一個 bit 後才相加。這個 adder 算完後把答案放在 register 再繼續動作。但實際上每一次 binary operation 的 mulciplicand 都是可以事先得到的 (做 AND)，這種一次性做完而不是依序做完的方法，稱為 combinational multiplier，以下圖直式乘法為例，至少需要 3 個 adder ![S__2260996](https://hackmd.io/_uploads/rkWrjZLYC.jpg) #### 2.3 Faster multiplier Conbinational multiplier 會有 3 個輸入與 2 個輸出並搭配多個 adder。結果只會有一個 full adder 的 delay。如下所示，每個位置都是 fuller adder，不會做 propagate carry ![image](https://hackmd.io/_uploads/B19eyGLYA.png) ## Sec. 3.4 Division 乘法與除法概念相同，共用同一個 component 即可 ### 1. Division in MIPS $$ \mathtt{div} \ \ $\text{t1}, \ \ $\text{t2} $$ 上述 MIPS 的語法表示 \$t1 / \$t2，其中 quotient (商數) 儲存在 $Lo，remainder (餘數) 儲存在 $Hi。 unsigned 的算法可為 $\mathtt{divu}$ 即可。此外可透過下列語法取出 quotient 與 remainder $$ \mathtt{mflo} \ \ $\text{t3} $$ $$ \mathtt{mfhi} \ \ $\text{t4} $$ 在 MIPS 架構中，除法不會檢查 overflow 與 divider by 0。使用者需透過 software 自行檢查，因為結構會變得太過複雜 #### 1.1 Divide: paper and pencil Binary operation 的直式除法運算如下，與 decimal 的除法類似 * Divisor 乘上 quotient 的 1 * 當相減答案大於 0 時 * 直接相減 (橘色部分) * 減完後從 dividend 降一個 bit (藍色箭頭左) * 當相減答案小於 0 時 (橘色部分) * quotient 補 0 * 從 dividend 降一個 bit (藍色箭頭中) 最後 remainder 為負數時除法結束。當 quotient 補 0 且 dividend 下降一個 bit 時，此動作可以視為 shift right (綠色部分) ![S__2260997_0](https://hackmd.io/_uploads/r1234zUKC.jpg) ### 2. Divide - ver. 1 #### 2.1 Hardware design ![S__2260999_0](https://hackmd.io/_uploads/H1i8UMIYC.jpg) 在 hardware 設計上，會有 64-bits 的 divisor register，但初始值只放左邊的 32-bits 並在每次 iteration 中往右移動。 64-bits 的 ALU。64-bits 的 remainder，且初始值為 32-bits 的 dividend。32-bits 的 quotient register 並在每次 iteration 中往左移動。設計圖如上 #### 2.2 Algorithm ![image](https://hackmd.io/_uploads/BJsuUMIYA.png) ### 3. Divide - ver. 2 #### 3.1 Hardware design 如同乘法設計所遇到的問題一樣，在 64-bits 的 divisor register 中只用了中間的 32-bits，且左右兩邊都是 0。改善方式如下: * Divisor 不動，remainder 左移 * Remainder 與 quotient 分享共用一個 register ![image](https://hackmd.io/_uploads/HkmivG8tC.png) #### 3.2 Algorithm ![image](https://hackmd.io/_uploads/H1fpPzUYR.png) ### 4. Complementary 當遇到 signed divided 時，要注意以下事情: * dividend 與 remainder 要有相同的 sign * 當 divisor 與 dividend 的 sign 不同時 (異號)，quotient 為負號總而言之，需滿足除法原理: $$ \text{Dividend}=\text{Divosor} \times \text{Quotient} + \text{Remainder} $$ 此外，因為乘法與除法的計算方式與架構相同，所以兩者的架構是可以共用的 ![image](https://hackmd.io/_uploads/B1EZYf8KR.png) ## Sec. 3.5 Floating point N-bits 數字的表示法可有很多種 | Representation | Range | | :-: | :-: | | Unsigned | $0$ to $2^n - 1$ | | 2's complement | $-2^{n-1}$ ~ $2^{n-1}-1$ | | 1's complement | $-2^{n-1}+1$ ~ $2^{n-1}$ | | Excess M | $-M$ ~ $2^{n}-M-1$ | 但是當面臨到很大/很小的數，或是有理數/無理數等就無法用上述方式做表示，必須用小數點表示。與整數的表示法相比，例如 $2=2.0$ 或 $125=125.0$ 的小數點都在固定位置 (fixed)，但 $1.2$ 或 $104.23$ 的小數點就不會在相同位置 (unfixed)，所以小數表示法才被稱為浮點數 (floating pint number) ### 1. Representation #### 1.1 Scientific notation Binary 的科學記號表示法如下: $$ 1.0_2 \times 2^{-1} $$ * $1.0$ 稱為 significand 或 mantissa * $2^{-1}$ 中 * 2 稱為 base (或 radix) * -1 稱為 expoment 為了統一浮點數的寫法，在表示一個浮點數時都會進行 normalizer 的動作，也就是讓 leading bit $\neq$ 0，如同 decimal 的科學記號表示法中，小數點左側的數字必須為一位數 * Normalized form: $1.0 \times 10^{-9}$ * Not normalized: $0.1 \times 10^{-8}$ 又因為在 binary code 中數字不是 0 就是 1 ，所以 leading bit 必 = 1 #### 1.2 FP representation Normalized format for binary: $$ 1.\text{xxxxxxxx} \times 2^{\text{yyyy}} $$ 因為 leading bit = 1，所以上述表示法寫成 binary code 後只要紀錄 significand 的 xxxxxxxx 與 exponent 的 yyyyyyyy 即可。此外，在浮點數的表示法中又分為 32-bits 的 single precision (單精準度) 與 64-bits 的 double-precision (倍精準度)。兩者 binary code 的表示法與意義如下: ![image](https://hackmd.io/_uploads/ryFnzX8t0.png) * S 表示 signed bit (1-bit) * Exponent 表示 yyyyyyyy 的數值 (8-bits for single and 11-bits for double) * Significand 表示 xxxxxxxx 的數值 (23-bits for single and 52-bits for double) 從 single 到 double 的擴增中，主要是增加 significand 的位數。因為增加 exponent 的 bits 表示增加可表示的數值範圍 (accuracy)，而增加 significand 的 bits 表示增加精準度 (precision) #### 1.3 IEEE-754 standard float number 的表示法一般是參照 IEEE-754 的規定，以下以 32-bits 的 single precision 為主，64-bits 的表示法概念相同。主要分成三個部分說明: sign bit、significand 與 exponent ##### Sign bit * 0 for positive * 1 for negative ##### Significand 因為 normalized 後的浮點數形式都相同，所有的 leading bit 都是 1，所以在 significand 寫法中可以把這個 leading bit 省略 (hidden)，但實際上從 binary code 轉成科學記號時要記得補上。另外，數字 0 因為本身就沒有 leading bit，所以會有另外的表示法 (後面會提到) ##### Exponent 當兩個浮點數比較大小時，通常都會先比較正負 (signed)，再看exponent 的大小，當 exponent 相同時才會比較 significand (所以 binary code 的寫法中 signed bit 與 exponent 才會放前面)。比較 exponent 時如果用減法比較，因為還必須轉成 2's complement 來表示，這樣計算上速度太慢了，所以必須找到一種更快的比較方法，這種方法稱為 biased notation。以 4-bits 為例 (如下圖)，4-bits 最多可表示 -7 ~ 8 的數字範圍，所以將 binary code 與 decimal 由小排大到並一一對應。因為前 7 個位置已經給負數使用，所以 binary code 所表示數字的實際大小為 unsigned number - 7，這種表示法稱為 biased 7。例如綠色部份的 binary code 是 1011 ，以 biased 7 表示的話，實際上的大小為 $$ (1011)_2 - 7 = 11 - 7 = 4 $$ (若以 2's complement 表示的話則為 -5) 以 biased notation 來表示 exponent，就可以依序從 bit 的順序來比較 exponent 的大小，但要記得換回科學記號的寫法時要 - bias 才是真正的 exponent。在 IEEE-754 中，single-precision 使用 bias of 127 ($2^8-1$)，double-precision 使用 bias of 1023 ($2^{10}-1$) ##### Summary 以一個 single-precision 的表示法為例， ![image](https://hackmd.io/_uploads/ryB5F7IK0.png) 寫成科學記號的表示為 $$ (-1)^s \times (1.\text{significand}) \times 2^{\text{exponent}-127} $$ #### 1.4 Transformation ##### FP to decimal 以下面的 IEE-754 表示法為例 $$ 0\ |\ 0110\ 1000\ |\ 101\ 0101\ 0100\ 0011\ 0100\ 0010 $$ * Signed bit: 0 (positive) * Exponent: * unsugn: $0110\ 1000_2=64+32+8=104_{10}$ * bias 127: $104=127=-23$ * Significand: $1.0+2^{-1}+2^{-3}+2^{-5}+2^{-7}+2^{-9}+2^{-14}+2^{-15}+2^{-17}+2^{-22}$ * hidden bit: 1 * others: $2^{-1}+2^{-3}+2^{-5}+2^{-7}+2^{-9}+2^{-14}+2^{-15}+2^{-17}+2^{-22}=0.666115$ 最後結果為 $1.666115 \times 2^{-23}$ ##### Decimal to FP 先做 decimal 轉成 binary。以 0.75 為例 $$ \begin{align} 0.75 \times 2 = \textbf{1}.5\\ 0.5 \times 2 = \underline{1}.0\\ \end{align} $$ 所以 $(0.75)_{10}=(0.\textbf{1}\underline{1})_2$，在做 binary 轉成 FP，$(0.11)_2=1.1 \times 2^{-1}$ * signed bit = 0 * exponent: $-1+127=126=01111110$ * significand: 1 $(1.1 \times 2^{-1})_2 = 0\ 01111110\ 10000000000000000000000$ #### 1.5 Zero and special number 不論是 single-precision 還是 double-precision，在 exponent 的位置都保留 00..00 (全為 0) 與 11..11 (全為 1) 另做其他用途，因此 exponent 可以分成 3 種狀況討論 * 全為 0 * 0...01 ~ 1...10 (1~254) * 全為 1 再根據 significand 的不同可以產生以下幾種狀況，其中 exponent = 1~254 為一般的浮點數表示法 | Exponent | Significand | Object | | :-: | :-: | :-: | | 0...0 | 0 | ??? | | 0...0 | non-zero | ??? | | 1~254 | anything | ± floating number | | 1...1 | 0 | ??? | | 1...1 | non-zero | ??? | ##### Representation for 0 當 exponent 都 = 0 且 significant 也都 = 0 時，表示此 floating number = 0，再根據 signed bit 的不同分為 +0 與 -0，要注意的是這種形式下的浮點數沒有 hidden bit (即 leading bit 1) :::info 事實上，只要是 exponent 都 = 0 的特殊狀況，都沒有 hidden bit 1 ::: ##### Gradual underflow bit representation 能表示的浮點數值有限，如果浮點數太小會發生 under flow；反之太大會發生 overflow。gradual underflow 指的是在 bit representation 有限的情況下，盡可能去表示 underflow 的數，發生在 exponent 都 = 0 但 significand ≠ 0 時，用來表示一個 denormalized 的數。理論上來說，當 exponent = 0..01 且 segnificand = 0..0 時會有最小值 $1.0000\ 0000\ 0000\ 0000\ 0000\ 000 \times 2^{-126}$，所謂的 de-normalized number 指的是當 exponent 都 = 0 但 significand ≠ 0 時，不用加 hidden bit 1，直接寫浮點數即可。例如 $$ 0\ | 0000\ 0000\ |\ 0000\ 0000\ 0000\ 0000\ 0000\ 000\ | = 0.0000\ 0000\ 0000\ 0000\ 0000\ 011 \times 2^{-126} $$ ##### ± Infinity 當浮點數運算時，如果發生除以 0 的狀況，會產生 ±∞，發生在 exponent 都 = 1 且 significant 都 = 0 時，又根據 signed bit 的不同分成 ±∞ ##### Not a number 每些情況下算不出答案的情況，例如 $\sqrt{-4}$，會以 NaN (Not a Number) 呈現，當 exponent 都 = 1 且 significand ≠ 0 時就用來表示 NaN，主要用來 debug | Exponent | Significand | Object | | :-: | :-: | :-: | | 0...0 | 0 | ±0 | | 0...0 | non-zero | gradual underflow | | 1~254 | anything | ± floating number | | 1...1 | 0 | ±∞ | | 1...1 | non-zero | NaN | ### 2. Addition and multiplication #### 2.1 Addition 浮點數加法原則上有 4 個步驟，並以 $1.000_2 \times 2^{-1} + (-1.110)_2 \times2^{-2}$ 為例 (0.5+(-0.4375)) 1. 對齊小數點 * 將較小的數右移 (right shift) 來對齊小數點 (binary point) * 透過兩數相減來找較小的數 * Ex: $1.000_2 \times 2^{-1} + (-0.111)_2 \times2^{-1}$ 2. significand 相加 * Ex: $(1.000_2 - 0.111_2) \times 2^{-1}=0.001 \times 2^{-1}$ 3. Normalization 並檢查是否有 over/underflow (需要的話) * 減小 exponent :arrow_right: 左移 * 增加 exponent :arrow_right: 右移 * 檢查 over/underflow * Ex: $1.000_2 \times 2^{-4}$ 4. Round the mantissa (捨入) 並 renormalized (需要的話) * Ex: $1.000_2 \times 2^{-4}$ :::info 這邊所指的右移 (right shift) 與左移 (left shift) 與一般數字的移動不同，一般十進位的小數在移動時是移動小數點，這邊的移動是指移動數字 (1) left shift 時數字往左 $\Leftrightarrow$ 小數點往右移動 $\Leftrightarrow$ 指數變小 (2) right shift 時數字往右 $\Leftrightarrow$ 小數點往左移動 $\Leftrightarrow$ 指數變大 ::: 架構設計如下，左圖為原圖，右圖為輔助說明 ![未命名 (1)](https://hackmd.io/_uploads/BJ0LcDIKC.jpg) #### 2.2 Multiplication 浮點數乘法原則上有 5 個步驟，並以 $1.000_2 \times 2^{-1} \times (-1.110)_2 \times2^{-2}$ 為例 (0.5×(-0.4375)) 1. exponent 相加 * 因為 exponent 是使用 biased notation 表示，所以兩個 exponent 相加時會加算一次的 bias，必須多減掉一次的 bias 才是真正的 exponent (見下補充) * Ex: * unbiased: (-1) + (-2 = -3 * biased: (-1+127) + (-2+127) $\Rightarrow$ (-3)+254-127=(-3)+127 2. mantissa 相乘 * Ex: $1.000_2 \times 1.110_2 = 1.1102_2 \Rightarrow 1.110_2 \times 2^{-3}$ 3. normalized 結果並檢查 under/underflow * Ex: $1.110_2 \times 2^{-3}$ 不變 4. Round mantissa 並 renormalized (需要的話) * Ex: $1.110_2 \times 2^{-3}$ 不變 5. 設置 signed bit * Ex: $-1.110_2 \times 2^{-3}$ :::info 以 biased = 7 為例 Exponent = 1001 時表示 $1101_2-7=9_{10}-7=2$ 次方 Exponent = 1100 時表示 $1100_2-7=12_{10}-7=5$ 次方當兩個 exponent 相加時 1001 + 1100 = 10101，如果直接使用 biase = 7 換算會得到 $10101_2-7=21_{10}-7=14$。因為在換算時每個 exponent 都有算一次的 biased，所以相加後必須減去一個 bias 才是真正的 exponent，也就是 $21-7-7=7_{10}=0111_2$ 才是真正的 exponent (或是 $21-14=7 \Rightarrow 10101_2-01110_2=0111_2$ 也可以) ::: 在 MIPS 的架構設計上，因為浮點數計算與表示方式不一樣，所以浮點數會有自己的處理器，稱為 coprocessor 1 (或 FPU)，是一種專用於處理浮點數的處理器 :::info CPU 是針對整數運算處理 ::: ![image](https://hackmd.io/_uploads/BkiY-uIYR.png) 同樣的，在浮點數運算中 MIPS 也會有自己的 instrustion * single precision: $\mathtt{add.s}$、$\mathtt{sub.s}$、... * double precision: $\mathtt{add.d}$、$\mathtt{sub.d}$、... register 方面，FPU 內部會有自己的 register 提供給浮點數使用，同樣是 32 個 32-bits 的 register，以\$f0、\$f1、... 依序命名。但如果是 double precision 的運算時，因為 32-bits 的長度不夠，所以 register 會以成對的方式 (\$f0/\$f1、\$f2/\$f3、... ) 提供 64-bits 的計算如果要在 main processor 與 co-processor 之間交換資料，可以使用 $\mathtt{mfc0}$ (move from c0)、$\mathtt{mtc0}$ (move to c0)、$\mathtt{mfc1}$ (moce from c1) ... 等指令 ## Clncluding remarks 在 MIPS 的 ISA 中，只有 54 是常被使用的，其中整數算術佔了 100%，浮點數運算佔了 97%，因此要如果要設計 ISA 可以從這個方面著手。要加速執行的速度，可以藉由最佳化這 54 個指令，要降低成本，可以省去一些不常用到的指令來讓 software 自己完成。