Computer Organization - Note

# Computer Organization Design - Note
###### `Junior year, fall semester` `Computer Organization` `Computer Architecture`

Answers to the exercises the professor **assigned** and to the **in-class assignments** (**worth 50%**).

-------**Mobile navigation**-------
* CH1
    * [CH1 textbook exercises (wrong edition)](#課本烏龍習題1)
    * [CH1 textbook exercises](#課本習題1)
    * [CH1 class assignment 1](#課堂作業1-1)
    * [CH1 class assignment 2](#課堂作業1-2)
* CH2
    * [CH2 textbook exercises](#課本習題2)
    * [CH2 class assignment](#課堂作業2)
* CH3
    * [CH3 textbook exercises](#課本習題3)

--------**End of navigation**--------

---

# CH1
## <a id="課本烏龍習題1"></a>Textbook Exercises 1 (Wrong Edition)
The professor mixed up the textbook editions. The problems in this section come from the 5th edition and are kept for reference only. The real assignment (6th edition) is here: :point_right: [Correct textbook exercises](#課本習題1) :point_left:

* ***1.2*** The eight great ideas in computer architecture are similar to ideas from other fields. Match the eight ideas from computer architecture, “*Design for Moore’s Law*”, “*Use Abstraction to Simplify Design*”, “*Make the Common Case Fast*”, “*Performance via Parallelism*”, “*Performance via Pipelining*”, “*Performance via Prediction*”, “*Hierarchy of Memories*”, and “*Dependability via Redundancy*” to the following ideas from other fields:
a. **Assembly lines in automobile manufacturing**
b. **Suspension bridge cables**
c. **Aircraft and marine navigation systems that incorporate wind information**
d. **Express elevators in buildings**
e. **Library reserve desk**
f. **Increasing the gate area on a CMOS transistor to decrease its switching time**
g. **Adding electromagnetic aircraft catapults (which are electrically powered as opposed to current steam-powered models), allowed by the increased power generation offered by the new reactor technology**
h. **Building self-driving cars whose control systems partially rely on existing sensor systems already installed into the base vehicle, such as lane departure systems and smart cruise control systems**

>:bulb:
>![](https://hackmd.io/_uploads/SJrdJcYbp.png)

---

* ***1.3*** Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.

>:bulb:
>![](https://hackmd.io/_uploads/ryzKk9YWa.png)

---

* ***1.5*** Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
**a. Which processor has the highest performance expressed in instructions per second?**
**b. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.**
**c. We are trying to reduce the execution time by 30% but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?**

>:bulb:
>![](https://hackmd.io/_uploads/rJfqJ5F-a.png)
>![](https://hackmd.io/_uploads/SyZskct-6.png)

---

* ***1.7*** Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.
**a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.**
**b. Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A’s code versus the clock of the processor running compiler B’s code?**
**c. A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?**

>:bulb:
>![](https://hackmd.io/_uploads/HJF01cYba.png)

---

* ***1.8*** The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static power and 90 W of dynamic power. The Core i5 Ivy Bridge, released in 2012, had a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of dynamic power.

**1.8.1 For each processor find the average capacitive loads.**
>:bulb:
>![](https://hackmd.io/_uploads/ByV-e9Y-6.png)

**1.8.2 Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.**
>:bulb:
>![](https://hackmd.io/_uploads/rkaZx5tbT.png)

**1.8.3 If the total dissipated power is to be reduced by 10%, how much should the voltage be reduced to maintain the same leakage current? Note: power is defined as the product of voltage and current.**
>:bulb:
>![](https://hackmd.io/_uploads/HJOfe9K-6.png)

---

* ***1.9*** Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency. Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the number of processors) but the number of branch instructions per processor remains the same.

**1.9.1 Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.**
>:bulb:
>![](https://hackmd.io/_uploads/H17Dx9K-6.png)

**1.9.2 If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?**
>:bulb:
>![](https://hackmd.io/_uploads/ry1_g5Kb6.png)

**1.9.3 To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?**
>:bulb:
>![](https://hackmd.io/_uploads/Hk3OxqFZa.png)

---

* ***1.11*** The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.

**1.11.1 Find the CPI if the clock cycle time is 0.333 ns.**
>:bulb:
>![](https://hackmd.io/_uploads/r1Vol9KbT.png)

**1.11.2 Find the SPECratio.**
>:bulb:
>![](https://hackmd.io/_uploads/S1njeqKWp.png)

**1.11.3 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.**
>:bulb:
>![](https://hackmd.io/_uploads/BJLhgct-6.png)

**1.11.4 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.**
>:bulb:
>![](https://hackmd.io/_uploads/SyY6x5tW6.png)

**1.11.5 Find the change in the SPECratio for this change.**
>:bulb:
>![](https://hackmd.io/_uploads/rJ-0x9t-p.png)

**1.11.6 Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.**
>:bulb:
>![](https://hackmd.io/_uploads/Sy50x5F-T.png)

**1.11.7 This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?**
>:bulb:
>![](https://hackmd.io/_uploads/Bym1Z5Ybp.png)

**1.11.8 By how much has the CPU time been reduced?**
>:bulb:
>![](https://hackmd.io/_uploads/SJ91WqFba.png)

**1.11.9 For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.**
>:bulb:
>![](https://hackmd.io/_uploads/ByLe-9Fba.png)

**1.11.10 Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.**
>:bulb:
>![](https://hackmd.io/_uploads/r1W-WqYW6.png)

**1.11.11 Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.**
>:bulb:
>![](https://hackmd.io/_uploads/BJj-W5tbp.png)

---

* ***1.13*** Another pitfall cited in Section 1.10 is expecting to improve the overall performance of a computer by improving only one aspect of the computer. Consider a computer running a program that requires 250 s, with 70 s spent executing FP instructions, 85 s executed L/S instructions, and 40 s spent executing branch instructions.

**1.13.1 By how much is the total time reduced if the time for FP operations is reduced by 20%?**
>:bulb:
>![](https://hackmd.io/_uploads/H1ISW9tb6.png)

**1.13.2 By how much is the time for INT operations reduced if the total time is reduced by 20%?**
>:bulb:
>![](https://hackmd.io/_uploads/r1AHbcYZa.png)

**1.13.3 Can the total time be reduced by 20% by reducing only the time for branch instructions?**
>:bulb:
>![](https://hackmd.io/_uploads/ByuUbqFbp.png)

## <a id="課本習題1"></a>Textbook Exercises 1
The real textbook exercises start here.

* ***1.2*** The seven great ideas in computer architecture are similar to ideas from other fields. Match the seven ideas from computer architecture, **“Use Abstraction to Simplify Design”, “Make the Common Case Fast”, “Performance via Parallelism”, “Performance via Pipelining”, “Performance via Prediction”, “Hierarchy of Memories”, and “Dependability via Redundancy”** to the following ideas from other fields:
a. Assembly lines in automobile manufacturing
b. Suspension bridge cables
c. Aircraft and marine navigation systems that incorporate wind information
d. Express elevators in buildings
e. Library reserve desk
f. Increasing the gate area on a CMOS transistor to decrease its switching time
g. Building self-driving cars whose control systems partially rely on existing sensor systems already installed into the base vehicle, such as lane departure systems and smart cruise control systems

>:bulb:
>![](https://hackmd.io/_uploads/H13FySpWp.png)

---

* ***1.3*** Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.

>:bulb: For the answer to this problem, see the same problem in the wrong-edition section above.
>High-level language is the language generally consisting of words and algebraic notation, which can be translated by a **compiler** into assembly language. The **assembler** then translates this assembly language into binary machine language. Thus, the compiler and assembler play an important role in translating a program written in a high-level language into machine language. The steps involved in translating a high-level program into the language executed by the processor are as follows:
>1. The high-level program consists of words and algebraic notation, whereas the processor only understands binary digits (bits).
>2. The compiler divides every high-level statement into assembly instructions, each made of a mnemonic followed by a set of operands.
>3. The mnemonic names the operation that the instruction performs. For example, in A+B the operation to be performed is addition, so the mnemonic is add.
>4. Finally, the assembler converts the translated assembly language instructions into machine language that the processor understands.

---

* ***1.5*** Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an increase of 20% in the CPI. What clock rate should we have to get this time reduction?

>:bulb:
>![](https://hackmd.io/_uploads/SkfYMr6ZT.png)

---

* ***1.7*** Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2. Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which is faster: P1 or P2?
a. What is the global CPI for each implementation?
b. Find the clock cycles required in both cases.

>![](https://hackmd.io/_uploads/HJ-lXrpZp.png)

---

* ***1.8*** Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.
a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.
b. Assume the compiled programs run on two different processors. If the execution times on the two processors are the same, how much faster is the clock of the processor running compiler A’s code versus the clock of the processor running compiler B’s code?
c. A new compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or B on the original processor?

>![](https://hackmd.io/_uploads/B1RFmBT-a.png)

---

* ***1.9*** The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static power and 90 W of dynamic power. The Core i5 Ivy Bridge, released in 2012, has a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of dynamic power.

**1.9.1** For each processor find the average capacitive loads.
>![](https://hackmd.io/_uploads/rynWrH6b6.png)

**1.9.2** Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.
>![](https://hackmd.io/_uploads/SyKMrrp-T.png)

**1.9.3** If the total dissipated power is to be reduced by 10%, how much should the voltage be reduced to maintain the same leakage current? Note: Power is defined as the product of voltage and current.
>![](https://hackmd.io/_uploads/Sy97SSaba.png)
>![](https://hackmd.io/_uploads/ByrNrSpZa.png)

---

* ***1.11*** Assume a 15 cm diameter wafer has a cost of 12, contains 84 dies, and has 0.020 defects/cm². Assume a 20 cm diameter wafer has a cost of 15, contains 100 dies, and has 0.031 defects/cm².

**1.11.1** Find the yield for both wafers.
>![](https://hackmd.io/_uploads/HJeI8HTZp.png)

**1.11.2** Find the cost per die for both wafers.
>![](https://hackmd.io/_uploads/HJ_8UHTWa.png)

**1.11.3** If the number of dies per wafer is increased by 10% and the defects per area unit increases by 15%, find the die area and yield.
>![](https://hackmd.io/_uploads/r1SwLSp-a.png)

**1.11.4** Assume a fabrication process improves the yield from 0.92 to 0.95. Find the defects per area unit for each version of the technology given a die area of 200 mm².

>![](https://hackmd.io/_uploads/H1euLB6ZT.png)

---

* ***1.13*** Section 1.11 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.

**1.13.1** One usual fallacy is to consider the computer with the largest clock rate as having the largest performance. Check if this is true for P1 and P2.
>![](https://hackmd.io/_uploads/Bk29_Sab6.png)

**1.13.2** Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.
>![](https://hackmd.io/_uploads/SkIouSTZ6.png)

**1.13.3** A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.

>![](https://hackmd.io/_uploads/S1G2uSaWT.png)
>![](https://hackmd.io/_uploads/r123_ra-p.png)

**1.13.4** Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as ![](https://hackmd.io/_uploads/Sk-GPrpWa.png) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.
>![](https://hackmd.io/_uploads/SytTurpba.png)

---

## <a id="課堂作業1-1"></a>Class Assignment 1-1
* ![](https://hackmd.io/_uploads/rkn0v9KW6.png)

>:bulb:
>1. ![](https://hackmd.io/_uploads/B1-t_2qbp.png)
>---
>2. **Control unit, arithmetic logic unit (ALU), memory, input devices, and output devices**
>---
>3. ![](https://hackmd.io/_uploads/ryzUB25WT.png)
>---
>4.
>>**1. Memory Management**: manages main memory and decides which programs may occupy main memory and which may not, and so on.
>>**2. Processor / Process Management**: manages the use of the central processing unit.
>>**3. Device Management**: manages the operation of input/output peripherals; in particular, each device needs a device driver to drive it.
>>**4. Information Management**: manages the file structures and file contents on hard disks, floppy disks, and optical discs.
>
>---
>5. ![](https://hackmd.io/_uploads/B1R3Un5Zp.png)
>---
>6. **Public cloud**, **private cloud**, **community cloud**, **hybrid cloud**
>![](https://hackmd.io/_uploads/rydvPncbT.png)
>---
>7.
>>* **Use abstraction to simplify design**
>>* **Make the common case fast**
>>* **Performance via parallelism**
>>* **Performance via pipelining**
>>* **Performance via prediction**
>>* **Hierarchy of memories**
>>* **Dependability via redundancy**

---

## <a id="課堂作業1-2"></a>Class Assignment 1-2
* ![](https://hackmd.io/_uploads/H1iDuctZp.png)
* ![](https://hackmd.io/_uploads/rkvt_cKZT.png)

>:bulb:
>1. ![](https://hackmd.io/_uploads/ry9g52q-6.png)
>---
>2. **(1): CPU time = (7.5 x 10^9) x 0.8 / (5 x 10^9) = 1.2 s**
>**(2): 1.2 / 3 = 0.4 = 40%** (Hint: the wall-clock time is the full 3 s that one complete run takes)
>---
>3. **100 / 3 = (100 - 80) + 80 / x**
>**=> x (improvement factor) = 6**
>---
>4. **speedUp1 = 1 / ((0.2/10) + (1-0.2)) = 1.22**
>**speedUp2 = 1 / ((0.5/2) + (1-0.5)) = 1.33**
>**=> the second option is better**
>---
>5. **A is (25 / 15 = 1.67) times faster than B**
>---
>6. **(2)**
>(Hint: a data center has to serve many users, so throughput is the metric to use)

---

# CH2
## <a id="課本習題2"></a>Textbook Exercises 2
* **2.3** For the following C statement, write the corresponding MIPS assembly code. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
**B[8] = A[i-j];**
>:bulb:
>![image](https://hackmd.io/_uploads/SJyNsOjN6.png)

* **2.4** For the MIPS assembly instructions above, what is the corresponding C statement? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
![image](https://hackmd.io/_uploads/rkW26X9NT.png)
>:bulb:
>![image](https://hackmd.io/_uploads/HJcBidiVT.png)

* **2.6** Translate 0xabcdef12 into decimal.
>:bulb:
>![image](https://hackmd.io/_uploads/HyevsusEa.png)

* **2.8** Translate the following MIPS code to C. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
**addi $t0, $s6, 4
add $t1, $s6, $0
sw \$t1, 0 ($t0)
lw \$t0, 0 ($t0)
add $s0, $t1, $t0**
>:bulb:
>![image](https://hackmd.io/_uploads/rkqwo_oVT.png)

* **2.9** :point_right: (Not in the exam scope) :point_left: For each MIPS instruction in Exercise 2.8, show the value of the opcode (op), source register (rs) and funct field, and destination register (rd) fields. For the I-type instructions, show the value of the immediate field, and for the R-type instructions, show the value of the second source register (rt).
>:bulb:
>![image](https://hackmd.io/_uploads/ByP6OpFwa.png)

* **2.11** Assume that $s0 holds the value 128~ten~.
    * **1.** For the instruction add $t0, $s0, $s1, what is the range(s) of values for $s1 that would result in overflow?
    * **2.** For the instruction sub $t0, $s0, $s1, what is the range(s) of values for $s1 that would result in overflow?
    * **3.** For the instruction sub $t0, $s1, $s0, what is the range(s) of values for $s1 that would result in overflow?

>:bulb:
>![image](https://hackmd.io/_uploads/B1wK5hiE6.png)

* **2.16** :point_right: (Not in the exam scope) :point_left: Assume that we would like to expand the MIPS register file to 128 registers and expand the instruction set to contain four times as many instructions.
    * **1.** How would this affect the size of each of the bit fields in the R-type instructions?
      >![image](https://hackmd.io/_uploads/rkrwYTtD6.png)
    * **2.** How would this affect the size of each of the bit fields in the I-type instructions?
      >![image](https://hackmd.io/_uploads/BJS_tpKvT.png)
    * **3.** How could each of the two proposed changes decrease the size of an MIPS assembly program? On the other hand, how could the proposed change increase the size of an MIPS assembly program?
      >![image](https://hackmd.io/_uploads/BkFFK6FvT.png)

* **2.18** Find the shortest sequence of MIPS instructions that extracts bits 16 down to 11 from register $t0 and uses the value of this field to replace bits 31 down to 26 in register $t1 without changing the other bits of registers $t0 and $t1 (Be sure to test your code using $t0 = 0 and $t1 = 0xffffffffffffffff. Doing so may reveal a common oversight.)
>:bulb:
>![image](https://hackmd.io/_uploads/BJhYids4T.png)

* **2.21** Assume $t0 holds the value 0x010100000. What is the value of $t2 after the following instructions?
**slt $t2, $0, $t0
bne $t2, $0, ELSE
j DONE
ELSE: addi $t2, $t2, 2
DONE:**
>:bulb:
>![image](https://hackmd.io/_uploads/Hyr9iOo46.png)

* **2.23** Consider a proposed new instruction named rpt. This instruction combines a loop’s condition check and counter decrement into a single instruction.
  For example, rpt $s0, loop would do the following:
  ```c
  if ($s0 > 0) {
      $s0 = $s0 - 1;
      goto loop;
  }
  ```
    * **1.** If this instruction were to be implemented in the MIPS instruction set, what is the most appropriate instruction format?
    * **2.** What is the shortest sequence of MIPS instructions that performs the same operation?

>:bulb:
>![image](https://hackmd.io/_uploads/ryGoo_sEp.png)

* **2.25** Translate the following C code to MIPS assembly code. Use a minimum number of instructions. Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D.
```c
for (i = 0; i < a; i++)
    for (j = 0; j < b; j++)
        D[4 * j] = i + j;
```
>:bulb:
>![image](https://hackmd.io/_uploads/B1Gg12oNp.png)

* **2.26** How many MIPS instructions does it take to implement the C code from Exercise 2.25? If the variables a and b are initialized to 10 and 1 and all elements of D are initially 0, what is the total number of MIPS instructions that is executed to complete the loop?
>:bulb:
>![image](https://hackmd.io/_uploads/rykZy3sE6.png)

* **2.27** Translate the following loop into C. Assume that the C-level integer i is held in register $t1, $s2 holds the C-level integer called result, and $s0 holds the base address of the integer MemArray.
**addi $t1, $0, 0
LOOP: lw \$s1, 0($s0)
add $s2, $s2, $s1
addi $s0, $s0, 4
addi $t1, $t1, 1
slti $t2, $t1, 100
bne $t2, $s0, LOOP**
>:bulb:
>![image](https://hackmd.io/_uploads/Hkc-1nj4T.png)

* **2.28** Rewrite the loop from Exercise 2.27 to reduce the number of MIPS instructions executed. Hint: Notice that variable i is used only for loop control.
>:bulb:
>![image](https://hackmd.io/_uploads/BJSGkhsNa.png)

* **2.29** Implement the following C code in MIPS assembly.
Hint: Remember that the stack pointer must remain aligned on a multiple of 16.
![image](https://hackmd.io/_uploads/HkanRVqV6.png)
>:bulb:
>![b7ab6726-8f6d-49a1-9453-2a98d0312bd6](https://hackmd.io/_uploads/SJYuJ2o4p.jpg)
>![b7ab6726-8f6d-49a1-9453-2a98d0312312bd6](https://hackmd.io/_uploads/BkzKJhs4T.jpg)

* **2.40** Assume that for a given program 70% of the executed instructions are arithmetic, 10% are load/store, and 20% are branch.
    * **1.** Given the instruction mix and the assumption that an arithmetic instruction requires 2 cycles, a load/store instruction takes 6 cycles, and a branch instruction takes 3 cycles, find the average CPI.
    * **2.** For a 25% improvement in performance, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all?
    * **3.** For a 50% improvement in performance, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all?

>:bulb:
>![image](https://hackmd.io/_uploads/Bygkjkho4T.png)

## <a id="課堂作業2"></a>Class Assignment 2
### **Part 1**
* **1.** Suppose we store the decimal value 1,000,000 starting at memory address 80. If it is stored in big-endian order and in little-endian order, what does each memory location hold?
>![image](https://hackmd.io/_uploads/B1A7iSc4p.png)

* **2.** Explain why the register file includes $zero.
>Use 1: loading a constant into a register
>**addi $t0, $zero, 5**
>Use 2: copying a value from one register to another
>**add $s3, $s2, $zero**

* **3.** Store 1023~ten~ and -1023~ten~ in 16-bit signed number format.
>1023: 0000 0011 1111 1111
>-1023: 1111 1100 0000 0001

* **4.** Write out the field layouts of the three MIPS instruction formats: R, I, and J.
>![image](https://hackmd.io/_uploads/Syz_2v9Ea.png)

* **5.** Convert the following hexadecimal number to binary, or binary number to hexadecimal.
(a) eca8 6420(16)
>1110 1100 1010 1000 0110 0100 0010 0000

(b) 0001 0011 1010 0111 1001 1011 1101 1111(2)
>13A79BDF

### **Part 2**
* **1.** Write the MIPS assembly code for the following C statement: A[12]=A[1]+A[2]-A[3]+4, where the base address of A is stored in $s2.
>![image](https://hackmd.io/_uploads/BkbMAw9E6.png)

* **2.** Write the following sequence of code into MIPS assembler: x=x+y+z-q, where x, y, z, and q are stored in registers $s1-$s4
>![image](https://hackmd.io/_uploads/BJ8XCD54a.png)
* **3.** For the following binary entries, what instruction do they represent? What type of instruction do the binary entries above represent?
1010 1110 0000 1011 0000 0000 0000 0100(2)
>(Not covered in class)

* **4.** In the snippet of MIPS assembler code below, how many times is the instruction memory accessed? How many times is the data memory accessed? (Count only accesses to memory, not registers.)
**lw \$t0, 0($a0)
addi $t0, $t0, 1
lw \$t1, 1($a0)
addi $t1, $t1, 1**
>![image](https://hackmd.io/_uploads/rJL4AP94a.png)

* **5.** The logical instructions below are not included in the MIPS instruction set, but can be synthesized using one or more MIPS assembly instructions. Provide a minimal set of MIPS instructions that may be used in place of the instructions. If the value of $t2=0x00FFA5A5 and the value of $t3=0xFFFF003C, what is the result in $t1?
(a) andn $t1, $t2, $t3 // bit-wise AND of $t2, !$t3
(b) xnor $t1, $t2, $t3 // bit-wise exclusive-NOR
>![image](https://hackmd.io/_uploads/HJGBCPqVa.png)

* **6.** Convert the C function below to MIPS assembly language. (Assume that arguments g, h, i, j are stored in \$a0~\$a3 and f in \$s0.)
**int leaf_example(int g, int h, int i, int j){
int f;
f=(g+h)-(i+j);
return f;
}**
>![image](https://hackmd.io/_uploads/SkgL0PqNa.png)

* **7.** Write out the five MIPS addressing modes.
>(Not covered in class)

* **8.** Please use a single MIPS instruction to move a value in $t0 to $s1.
>![image](https://hackmd.io/_uploads/Hyj8CD5Na.png)

* **9.** Write the MIPS assembly code for the C statement A[12]=h+A[8]; Assume that h is stored in register $s0, and the base address of array A is stored in $s1.
>![image](https://hackmd.io/_uploads/HkUPAP5Vp.png)

* **10.** Explain the meaning of each of the following MIPS instructions using an if-statement. Denote the program counter as PC when needed. Note that an instruction has four bytes.
**(a) slt $s1, $s2, $s3
(b) slti $s1, $s2, 100
(c\) bne $s1, $s2, 25**
>![image](https://hackmd.io/_uploads/BkyO0D5Na.png)

* **11.** Suppose you have two variables a and b. (1) Write C code that swaps the values of the two variables. (2) Assume that before the code runs, a and b are in registers \$s1 and \$s2, respectively; write the MIPS assembly code for the C code above. (3) Convert that assembly code into binary machine code.
>![image](https://hackmd.io/_uploads/HytuCw94p.png)

* **12.** Write the C code and MIPS assembly code for the change-making system of a drink vending machine. The machine only accepts a 50-dollar coin and gives change in 10-dollar and 1-dollar coins.
>My own attempt; correctness unknown.
>![59df09f2-4973-4cec-a90c-107f32e48188](https://hackmd.io/_uploads/rJzaYhiNa.jpg)
>![59df09f2-4973-4cec-a90c-107f32e4000188](https://hackmd.io/_uploads/rkF6YhoV6.jpg)

# CH3
## <a id="課本習題3"></a>Textbook Exercises 3
* ***3.1*** What is 5ED4 − 07A4 when these values represent unsigned 16-bit hexadecimal numbers? The result should be written in decimal. Show your work.
>:bulb:
>![image](https://hackmd.io/_uploads/rko-wpKvT.png)

---

* ***3.2*** What is 5ED4 − 07A4 when these values represent signed 16-bit hexadecimal numbers stored in sign-magnitude format?
>:bulb:
>Same as **3.1**

---

* ***3.4*** What is 4365 − 3412 when these values represent unsigned 12-bit octal numbers? The result should be written in decimal. Show your work.
>:bulb:
>![image](https://hackmd.io/_uploads/Bkkmw6KPp.png)

---

* ***3.5*** What is 4365 − 3412 when these values represent signed 12-bit octal numbers stored in sign-magnitude format? The result should be written in decimal. Show your work.
>:bulb:
>![image](https://hackmd.io/_uploads/HJ67DaFwa.png)

---

* ***3.6*** Assume 185 and 122 are unsigned 8-bit decimal integers. Calculate 185 – 122. Is there overflow, underflow, or neither?
>:bulb:
>![image](https://hackmd.io/_uploads/BkvVPpKv6.png)

---

* ***3.7*** Assume 185 and 122 are signed 8-bit decimal integers stored in sign-magnitude format. Calculate 185 + 122. Is there overflow, underflow, or neither?
>:bulb:
>![image](https://hackmd.io/_uploads/HkREvTYDp.png)

---

* ***3.9*** Assume 151 and 214 are signed 8-bit decimal integers stored in two’s complement format. Calculate 151 + 214 using saturating arithmetic. The result should be written in decimal.
Show your work.
>:bulb:
>![image](https://hackmd.io/_uploads/rJDSvTFDa.png)

---

* ***3.12*** Using a table similar to that shown in Figure 3.6, calculate the product of the octal unsigned 6-bit integers 62 and 12 using the hardware described in Figure 3.3. You should show the contents of each register on each step.
![image](https://hackmd.io/_uploads/Sy6wqzjvp.png)
>:bulb:
>![image](https://hackmd.io/_uploads/HJBO1aFwa.png)

---

* ***3.13*** Using a table similar to that shown in Figure 3.6, calculate the product of the octal unsigned 8-bit integers 62 and 12 using the hardware described in Figure 3.5. You should show the contents of each register on each step.
![image](https://hackmd.io/_uploads/Skc8JmsDp.png)
>:bulb:
>![image](https://hackmd.io/_uploads/B1GjyatD6.png)

---

* ***3.18*** (Can be skipped) Using a table similar to that shown in Figure 3.10, calculate 74 divided by 21 using the hardware described in Figure 3.8. You should show the contents of each register on each step. Assume both inputs are unsigned 6-bit integers.
>:bulb:
>![image](https://hackmd.io/_uploads/rySEx6tPp.png)

* ***3.23*** Write down the binary representation of the decimal number 63.25 assuming the IEEE 754 single precision format.
>:bulb:
>![image](https://hackmd.io/_uploads/ry2rNaFva.png)

* ***3.24*** Write down the binary representation of the decimal number 63.25 assuming the IEEE 754 double precision format.
>:bulb:
>![image](https://hackmd.io/_uploads/r1Qd4aYDT.png)