# **Chapter 1 Computer Abstractions and Technology**
:::info
#### 1. Aside from the smart cell phones used by a billion people, list and describe four other types of computers.
:::
1. Personal Computer
2. supercomputer
3. embedded computer
4. server
:::info
#### 2. The eight great ideas in computer architecture are similar to ideas from other fi elds. Match the eight ideas from computer architecture, “Design for Moore’s Law”, “Use Abstraction to Simplify Design”, “Make the Common Case Fast”, “Performance via Parallelism”, “Performance via Pipelining”, “Performance via Prediction”, “Hierarchy of Memories”, and “Dependability via Redundancy” to the following ideas from other fi elds:
```
a. Assembly lines in automobile manufacturing
b. Suspension bridge cables
c. Aircraft and marine navigation systems that incorporate wind information
d. Express elevators in buildings
e. Library reserve desk
f. Increasing the gate area on a CMOS transistor to decrease its switching time
g. Adding electromagnetic aircraft catapults (which are electrically-powered
as opposed to current steam-powered models), allowed by the increased power
generation off ered by the new reactor technology
h. Building self-driving cars whose control systems partially rely on existing
sensor systems already installed into the base vehicle, such as lane departure
systems and smart cruise control systems
```
:::
(a) Performance via Parallelism<br>
(b) Dependability via Redundancy<br>
\(c\) Performance via Prediction<br>
(d) Make the Common Case Fast<br>
(e) Hierarchy of Memories<br>
(f) Design for Moore’s Law<br>
(g) Performance via Pipelining<br>
(h) Use Abstraction to Simplify Design
:::info
#### 3. Describe the steps that transform a program written in a high-level language such as C into a representation that is directly executed by a computer processor.
:::
1. C 透過 compiler, 編譯成 assembly code
2. assembly code 透過 assembler, 組譯成 machine code
:::info
#### 4. Assume a color display using 8 bits for each of the primary colors (red, green, blue) per pixel and a frame size of 1280 × 1024.
```
a. What is the minimum size in bytes of the frame buffer to store a frame?
b. How long would it take, at a minimum, for the frame to be sent over a 100 Mbit/s
network?
```
:::
(a) 1280 * 1024 bytes<br>
(b) (1280 * 1024 * 8) / (100 * 10^6)
:::info
#### 5. Consider three different processors P1, P2, and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and has a CPI of 2.2.
```
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of cycles
and the number of instructions.
c. We are trying to reduce the execution time by 30% but this leads to an increase
of 20% in the CPI. What clock rate should we have to get this time reduction?
```
:::
(a)
>補充 : Instrcutions per second = clock rate / CPI
\begin{array}
\ && P1 && P2 && P3 \\
\hline
\ Instructions \ per \ second && 2*10^9 && 2.5*10^9 && 1.818*10^9 \\
\end{array}
Ans : P2<br>
(b)
\begin{array}
\ && P1 && P2 && P3 \\
\hline
\ cycles && 30*10^9 && 25*10^9 && 40*10^9 \\
\ instructions && 20*10^9 && 25*10^9 && 18.18*10^9 \\
\end{array}
\(c\)
>補充 : clock rate = instructions * CPI / EXEtime<br>
$$clock rate of P1 = 3GHz \times {1.2\over0.7} = 5.143 GHz$$
$$clock rate of P2 = 2.5GHz \times {1.2 \over 0.7} = 4.286 GHz$$
$$clock rate of P3 = 4GHz \times {1.2 \over 0.7} = 6.857 GHz
$$
:::info
#### 6. Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C, and D). P1 with a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate of 3 GHz and CPIs of 2, 2, 2, and 2. <br><br>Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which implementation is faster?
```
a. What is the global CPI for each implementation?
b. Find the clock cycles required in both cases.
```
:::
(a)<br>
$$CPI\ of\ P1 = 1 \times 0.1 + 2 \times 0.2 + 3 \times 0.5 + 3 \times 0.2 = 2.6$$
$$CPI\ of\ P2 = 2 \times 0.1 + 2 \times 0.2 + 2 \times 0.5 + 2 \times 0.2 = 2
$$
(b)<br>
$$
clock\ cycle\ required\ of\ P1 = {2.6 * 10^6} \qquad
clock\ cycle\ required\ of\ P2 = {2 * 10^6}
$$
:::info
#### 7. Compilers can have a profound impact on the performance of an application. Assume that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9 and an execution time of 1.5 s.
```
a. Find the average CPI for each program given that the processor has a clock cycle
time of 1 ns.
b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?
c. A new compiler is developed that uses only 6.0E8 instructions and has an
average CPI of 1.1. What is the speedup of using this new compiler versus using
compiler A or B on the original processor?
```
:::
(a)<br>
$$ CPI_A = {{1.1 * 10^9} \over {10^9}} = 1.1 \qquad
CPI_B = {{1.5 \times 10^9} \over {1.2 \times 10^9}} = 1.25 $$
(b)
$$ speed up = {cycles \ of \ B \over cycles \ of \ A} = {{1.2 \times 10^9 \times 1.25} \over {10^9 \times 1.1}} = 1.364$$
\(c\)
$${cycles\ of\ A\over cycles\ of\ new}={{10^9 \times 1.1} \over {6 \times 10^8 \times 1.1}}=1.667\qquad
{cycles\ of\ B\over cycles\ of\ new}={{1.2\times10^9\times1.25}\over {6 \times 10^8 \times 1.1}}=2.273
$$
:::info
#### 8. The Pentium 4 Prescott processor, released in 2004, had a clock rate of 3.6 GHz and voltage of 1.25 V. Assume that, on average, it consumed 10 W of static power and 90 W of dynamic power.<br><br>The Core i5 Ivy Bridge, released in 2012, had a clock rate of 3.4 GHz and voltage of 0.9 V. Assume that, on average, it consumed 30 W of static power and 40 W of dynamic power.
:::
:::info
<b>8.1 For each processor find the average capacitive loads.</b>
:::
**補充 : dynamic power = 1/2 * Capacitive load * Voltage^2^ * Frequency switched
$$Pentium\ 4\ Prescott : {2 * 90W \over 1.25V^2 * 3.6GHz} = 3.2\times10^{-8}$$
$$Core\ i5\ Ivy\ Bridge : {2 * 40W \over 0.9V^2 * 3.4GHz} = 2.9\times10^{-8}$$
:::info
<b>8.2 Find the percentage of the total dissipated power comprised by static power and the ratio of static power to dynamic power for each technology.</b>
:::
$$Pentium\ 4\ Prescott : {10 \over 100} = 10\% $$
$$Core\ i5\ Ivy\ Bridge : {30 \over 70} = 42.9\%
$$
:::info
<b>~~8.3 If the total dissipated power is to be reduced by 10%, how much should the voltage be reduced to maintain the same leakage current? Note: power is defined as the product of voltage and current.~~</b>
:::
:::info
#### 9. Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. <br><br>Assume that each processor has a 2 GHz clock frequency. Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is the number of processors) but the number of branch instructions per processor remains the same.
:::
:::info
<b>9.1 Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.</b>
:::
\begin{array}
\ core && arithmetic && load/store && branch && cycles && EXEtime && speedup\\
\hline
\ 1 && 2.56*10^9 && 1.28*10^9 && 256*10^6 && 7.94*10^{10} && 39.7s && 1\\
\ 2 && 1.83*10^9 && 9.14*10^8 && 256*10^6 && 5.67*10^{10} && 28.3s && 1.4\\
\ 4 && 9.14*10^8 && 4.57*10^8 && 256*10^6 && 2.83*10^{10} && 14.2s && 2.8\\
\ 8 && 4.57*10^8 && 2.29*10^8 && 256*10^6 && 1.42*10^{10} && 7.1s && 5.6\\
\end{array}
:::info
<b>9.2 If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?</b>
:::
\begin{array}
\ core && EXEtime \\
\hline
\ 1 && 41 \\
\ 2 && 29.3 \\
\ 4 && 14.6 \\
\ 8 && 7.33 \\
\end{array}
:::info
<b>9.3 To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?</b>
:::
Ans : 3 (計算量太大, 直接抄解答)
:::info
#### 10. Assume a 15 cm diameter wafer has a cost of 12, contains 84 dies, and has 0.020 defects/cm2. Assume a 20 cm diameter wafer has a cost of 15, contains 100 dies, and has 0.031 defects/cm2.
:::
:::info
<b>10.1 Find the yield for both wafers.</b>
:::
$$
die\ area_{15cm}={7.5^2\times\pi \over 84} = 2.1cm^2 \qquad yield_{15cm}={1\over({1+{0.02\times2.1 \over 2}})^2}=95.93\%
$$$$
die\ area_{20cm}={10^2\times\pi \over 100} = 3.14cm^2 \qquad yield_{20cm}={1\over({1+{0.31\times3.14 \over 2}})^2}=90.93\%
$$
:::info
<b>10.2 Find the cost per die for both wafers.</b>
:::
$$
cost\ per\ die_{15cm}={12 \over 84 \times 0.9593}=0.1489
$$$$
cost\ per\ die_{20cm}={15 \over 100 \times 0.9093}=0.165
$$
:::info
<b>10.3 If the number of dies per wafer is increased by 10% and the
defects per area unit increases by 15%, find the die area and yield</b>
:::
>算出新的 dies per area 和 defects per area
$$
die\ area_{15cm}={7.5^2\times\pi \over 84\times1.1} = 1.91cm^2 \qquad yield_{15cm}={1\over({1+{0.02\times1.15\times 2.1 \over 2}})^2}=95.75\%
$$$$
die\ area_{20cm}={10^2\times\pi \over 100\times 1.1} = 2.86cm^2 \qquad yield_{20cm}={1\over({1+{0.31\times1.15\times3.14 \over 2}})^2}=90.82\%$$
:::info
<b>10.4 Assume a fabrication process improves the yield from 0.92 to 0.95. Find the defects per area unit for each version of the technology given a die area of 200 mm^2^.</b>
:::
$$
defect_{0.92}=2\times{0.92^{-0.5}-1\over200}=0.043\ defect/cm^2
$$$$
defect_{0.92}=2\times{0.95^{-0.5}-1\over200}=0.026\ defect/cm^2$$
:::info
#### 11. Th e results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona has an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.
:::
:::info
<b>11.1 Find the CPI if the clock cycle time is 0.333 ns.</b>
:::
$$
CPI={750s\over0.333ns\times2.389\times10^{12}}=0.943
$$
:::info
<b>11.2 Find the SPECratio.</b>
:::
$$
SPECratio={9650\over750}=12.86
$$
:::info
<b>11.3 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.</b>
:::
Ans : increase 10%
:::info
<b>11.4 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.</b>
:::
1.1 * 1.05 = 1.155<br>
Ans : increase 15.5%
:::info
<b>11.5 Find the change in the SPECratio for this change.</b>
:::
$$
SPECratio = {reference\ time \over CPU\ time} \qquad Change = {1\over1.155}=0.866
$$
Ans : decrease 13.4%
:::info
<b>11.6 Suppose that we are developing a new version of the AMD
Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.</b>
:::
$$
{2.389\times10^{12}\times0.85\times CPI\over4GHz}=700s \qquad CPI={700\times4GHz\over2.389\times10^{12}\times0.85}=1.379
$$
:::info
<b>11.7 This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?</b>
:::
$$
CPI\ ratio = {1.379\over0.943}=1.462\qquad Clock\ rate\ ratio={4GHz\over3GHz}=1.333
$$
Ans : They are dissimilar, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
:::info
<b>11.8 By how much has the CPU time been reduced?</b>
:::
$$
{700\over750}=0.933
$$
Ans : reduce 6.7%
:::info
<b>11.9 For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting to the CPI and with a clock rate of 4 GHz, determine the number of instructions.</b>
:::
$$
{IC\times1.61\over4GHz}=960s\times0.9\qquad IC={960s\times0.9\times4GHz\over1.61}=2.147\times10^{12}
$$
:::info
<b>11.10 Determine the clock rate required to give a further 10%
reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.</b>
:::
$$
clock\ rate=3GHz\times{1\over0.9}=3.333
$$
:::info
<b>11.11 Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.</b>
:::
$$
clock\ rate={2.147\times10^{12}\times1.61\times0.85\over960\times0.8}=3.826\times10^9
$$
:::info
#### 12. Section 1.10 cites as a pitfall the utilization of a subset of the performance equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions.
:::
:::info
<b>12.1 One usual fallacy is to consider the computer with the
largest clock rate as having the largest performance. Check if this is true for P1 and P2.</b>
:::
$$
EXEtime_1={5\times10^9\times0.9\over4\times10^9}=1.125s\qquad EXEtime_2={1\times10^9\times0.75\over3\times10^9}=0.25s
$$
Ans : largest clock rate does not means largest performance
:::info
<b>12.2 Another fallacy is to consider that the processor executing
the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 1.0E9 instructions.</b>
:::
$$
EXEtime\ of\ P1\ executing\ 10^9\ instr.={10^9\times0.9\over4\times10^9}=0.225s
$$$$
instr.\ count\ of\ P2 ={0.225s\times3\times10^9\over0.75\times}=9\times10^8$$
Ans : largest number of instr. count isn't means need a large CPU time
:::info
<b>12.3 A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.</b>
:::
$$
MIPS_1={5\times10^9\over1.125\times10^6}=4.444\times10^3 \qquad MIPS_2={1\times10^9\over0.25\times10^6}=4\times10^3
$$
Ans : largest MIPS dose not means largest performance
:::info
<b>12.4 Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as MFLOPS = No. FP operations / (execution time × 1E6) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the programs.</b>
:::
$$
MFLOPS_1={5\times10^9\times0.4\over1.125\times10^6}=1.778\times10^3 \qquad MFLOPS_2={1\times10^9\times0.4\over0.25\times10^6}=1.6\times10^3
$$
:::info
#### 13. Another pitfall cited in Section 1.10 is expecting to improve the overall performance of a computer by improving only one aspect of the computer. Consider a computer running a program that requires 250 s, with 70 s spent executing FP instructions, 85 s executed L/S instructions, and 40 s spent executing branch instructions.
:::
:::info
<b>13.1 By how much is the total time reduced if the time for FP operations is reduced by 20%?</b>
:::
$$
70\times0.8=56,\quad 70-56=14
$$
Ans : reduce 14s
:::info
<b>13.2 By how much is the time for INT operations reduced if the total time is reduced by 20%?</b>
:::
$$
55\times0.8=44,\quad 55-44=11
$$
Ans : reduce 11s
:::info
<b>13.3 Can the total time can be reduced by 20% by reducing only the time for branch instructions?</b>
:::
$$
{40\over250}=16\%
$$
Ans : No, because the branch part only accounts for 16% of total running time
:::info
#### 14. Assume a program requires the execution of 50 × 10^6^ FP instructions, 110 × 10^6^ INT instructions, 80 × 10^6^ L/S instructions, and 16 × 10^6^ branch instructions. The CPI for each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2 GHz clock rate.
:::
:::info
<b>14.1 By how much must we improve the CPI of FP instructions if we want the program to run two times faster?</b>
:::
$$
50\times10^6\times1+110\times10^6\times1+80\times10^6\times4+16\times10^6\times2=512\times10^6\ cycles
$$$$
50\times10^6\times CPI_{improve}+110\times10^6\times1+80\times10^6\times4+16\times10^6\times2=256\times10^6\ cycles
$$$$
CPI_{improve}<0
$$
Ans : impossible
:::info
<b>14.2 By how much must we improve the CPI of L/S instructions if we want the program to run two times faster?</b>
:::
$$
50\times10^6\times1+110\times10^6\times1+80\times10^6\times CPI_{improve}+16\times10^6\times2=256\times10^6\ cycles
$$$$
CPI_{improve}={256-50-110-32\over80}={74\over80}=0.925
$$
:::info
<b>14.3 By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?</b>
:::
$$
50\times10^6\times1\times0.6+110\times10^6\times1\times0.6+80\times10^6\times4\times0.7+16\times10^6\times2\times0.7=342.4\times10^6\ cycles
$$$$
execution\ time={342.4\times10^6\over2\times10^9}=0.171s
$$
:::info
#### 15. When a program is adapted to run on multiple processors in a multiprocessor system, the execution time on each processor is comprised of computing time and the overhead time required for locked critical sections and/or to send data from one processor to another.<br><br>Assume a program requires t = 100 s of execution time on one processor. When run p processors, each processor requires t/p s, as well as an additional 4 s of overhead, irrespective of the number of processors. Compute the per-processor execution time for 2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup relative to a single processor and the ratio between actual speedup versus ideal speedup (speedup if there was no overhead).
:::
\begin{array}
\ core && execution && contain\ overhead && speedup && actual\ speedup/ideal\ speedup\\
\hline
\ 1 && 100 && && && \\
\ 2 && 50 && 54 && 1.852 && 0.926\\
\ 4 && 25 && 29 && 3.448 && 0.862\\
\ 8 && 12.5 && 16.5 && 6.061 && 0.758\\
\ 16 && 6.25 && 10.25 && 9.976 && 0.624\\
\ 32 && 3.125 && 7.125 && 14.035 && 0.439\\
\ 64 && 1.567 && 5.567 && 17.963 && 0.281\\
\ 128 && 0.781 && 4.781 && 20.916 && 0.163\\
\end{array}
---
# **Chapter 2 Instructions: Language of the Computer**
:::info
#### 1. For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, and i are given and could be considered 32-bit integers as declared in a C program. Use a minimal number of MIPS assembly instructions.<br><br>    f = g + (h − 5);
:::
```
addi i, h, -5
add f, g, i
```
:::info
#### 2. For the following MIPS assembly instructions above, what is a corresponding C statement?
```
add f, g, h
add f, i, f
```
:::
f = i + (g + h)
:::info
#### 3. For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.<br><br>    B[8] = A[i−j];
:::
```
sub $t0, $s3, $s4
sll $t0, $t0, 2
add $t0, $t0, $s6
lw $t0, 0($t0)
sw $t0, 32($s7)
```
:::info
#### 4. For the MIPS assembly instructions below, what is the corresponding C statement? Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
```
sll $t0, $s0, 2 # $t0 = f * 4
add $t0, $s6, $t0 # $t0 = &A[f]
sll $t1, $s1, 2 # $t1 = g * 4
add $t1, $s7, $t1 # $t1 = &B[g]
lw $s0, 0($t0) # f = A[f]
addi $t2, $t0, 4
lw $t0, 0($t2)
add $t0, $t0, $s0
sw $t0, 0($t1)
```
:::
B[g] = A[f] + A[f+1]
:::info
#### 5. For the MIPS assembly instructions in Exercise 2.4, rewrite the assembly code to minimize the number if MIPS instructions (if possible) needed to carry out the same function.
:::
```
sll $t0, $s0, 2 # $t0 = f * 4
add $t0, $s6, $t0 # $t0 = &A[f]
sll $t1, $s1, 2 # $t1 = g * 4
add $t1, $s7, $t1 # $t1 = &B[g]
lw $s0, 0($t0) # f = A[f]
lw $t0, 4($t0)
add $t0, $t0, $s0
sw $t0, 0($t1)
```
:::info
#### 6. The table below shows 32-bit values of an array stored in memory.
\begin{array}
\ Address && Data \\
\hline
24 && 2 \\
38 && 4 \\
32 && 3 \\
36 && 6 \\
40 && 1 \\
\end{array}
:::
:::info
<b>6.1 For the memory locations in the table above, write C code to sort the data from lowest to highest, placing the lowest value in the smallest memory location shown in the figure. Assume that the data shown represents the C variable called Array, which is an array of type int, and that the first number in the array shown is the first element in the array. Assume that this particular machine is a byte-addressable machine and a word consists of four bytes.</b>
:::
```
temp1 = Array[0];
temp2 = Array[1];
Array[0] = Array[4];
Array[4] = Array[3];
Array[1] = temp1;
Array[3] = temp2;
```
:::info
<b>6.2 For the memory locations in the table above, write MIPS code to sort the data from lowest to highest, placing the lowest value in the smallest memory location. Use a minimum number of MIPS instructions. Assume the base address of Array is stored in register $s6.</b>
:::
```
lw $t0, 0($s6)
lw $t1, 4($s6)
lw $t2, 16($s6)
sw $t2, 0($s6)
sw $t0, 4($s6)
lw $t0, 12($s6)
sw $t0, 16($s6)
sw $t1, 12($s6)
```
:::info
#### 7. Show how the value 0xabcdef12 would be arranged in memory of a little-endian and a big-endian machine. Assume the data is stored starting at address 0.
:::
\begin{array}
\ little-endian &&& big-endian \\
\ Address \ \ Data &&& Address \ \ Data \\
\hline
0 \qquad \quad 12 &&& 0 \qquad \quad ab\\
1 \qquad \quad ef &&& 1 \qquad \quad cd\\
2 \qquad \quad cd &&& 2 \qquad \quad ef\\
3 \qquad \quad ab &&& 3 \qquad \quad 12\\
\end{array}
:::info
#### 8. Translate 0xabcdef12 into decimal.
:::
Ans : 2882400018
:::info
#### 9. Translate the following C code to MIPS. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively. Assume that the elements of the arrays A and B are 4-byte words:<br><br>    B[8] = A[i] + A[j];
:::
```
sll $t0, $s3, 2
add $t0, $t0, $s6
lw $t0, 0($t0)
sll $t1, $s4, 2
add $t1, $t1, $s6
lw $t1, 0($t1)
add $t0, $t1, $t0
sw $t0, 32($s7)
```
:::info
#### 10. Translate the following MIPS code to C. Assume that the variables f, g, h, i, and j are assigned to registers $s0, $s1, $s2, $s3, and $s4, respectively. Assume that the base address of the arrays A and B are in registers $s6 and $s7, respectively.
```
addi $t0, $s6, 4
add $t1, $s6, $0
sw $t1, 0($t0)
lw $t0, 0($t0)
add $s0, $t1, $t0
```
:::
A[1] = (&A)<br>
f = A[1] + (&A)
:::info
#### 11. For each MIPS instruction, show the value of the opcode(OP), source register (RS), and target register (RT) fields. For the I-type instructions, show the value of the immediate field, and for the R-type instructions, show the value of the destination register(RD) field.
:::
\begin{array}
\ code && type && op && rs && rt && rd && imm16\\
\hline
\ addi\ $t0,\ $s6,\ 4 && I && 8 && 22 && 8 && && 4\\
\ add\ $t1,\ $s6,\ $0 && R && 0 && 22 && 0 && 9 &&\\
\ sw\ $t1,\ 0($t0) && I && 43 && 8 && 9 && && 0\\
\ lw\ $t0,\ 0($t0) && I && 35 && 8 && 8 && && 0\\
\ add\ $s0,\ $t1,\ $t0 && R && 0 && 9 && 8 && 16 &&\\
\end{array}
:::info
#### 12. Assume that registers $s0 and $s1 hold the values 0x80000000 and 0xD0000000, respectively.
:::
:::info
<b>12.1 What is the value of $t0 for the following assembly code?<br><br>    add $t0, $s0, $s1</b>
:::
>remember use 2's complement
Ans : 0x50000000
:::info
<b>12.2 Is the result in $t0 the desired result, or has there been overflow?</b>
:::
Ans : overflow
:::info
<b>12.3 For the contents of registers $s0 and $s1 as specified above,
what is the value of $t0 for the following assembly code?<br><br>    sub $t0, $s0, $s1</b>
:::
Ans : 0xB000000
:::info
<b>12.4 Is the result in $t0 the desired result, or has there been overflow?</b>
:::
Ans : no overflow
:::info
<b>12.5 For the contents of registers $s0 and $s1 as specified above, what is the value of $t0 for the following assembly code?</b>
```
add $t0, $s0, $s1
add $t0, $t0, $s0
```
:::
Ans : 0xD0000000
:::info
<b>12.6 Is the result in $t0 the desired result, or has there been overflow?</b>
:::
And : overflow
:::info
#### 2.13 Assume that $s0 holds the value 128~ten~.
:::
:::info
<b>13.1 For the instruction add $t0, $s0, $s1, what is the range(s) of values for $s1 that would result in overflow?</b>
:::
Ans : 2^31^-129 < $s1 < -2^31^-128
:::info
<b>13.2 For the instruction sub $t0, $s0, $s1, what is the range(s) of values for $s1 that would result in overflow?</b>
:::
Ans : -2^31^+129< $s1 < 2^31^+128
:::info
<b>13.3 For the instruction sub $t0, $s1, $s0, what is the range(s) of values for $s1 that would result in overflow?</b>
:::
Ans : -2^32^+128 < $s1 < 2^31^+127
:::info
#### 14. Provide the type and assembly language instruction for the following binary value: 0000 0010 0001 0000 1000 0000 0010 0000~two~
:::

\begin{array}
\ op && rs && rt && rd && shamt &&funct\\
\hline
\ 000000 && 10000 && 10000 && 10000 && 00000 && 100000\\
\end{array}
Ans : R-type, add $s0, $s0, $s0
:::info
#### 15. Provide the type and hexadecimal representation of following instruction: sw $t1, 32($t2)
:::
\begin{array}
\ op && rs && rt && imm16\\
\hline
\ 101011 && 01010 && 01001 && 0000000000100000\\
\end{array}
Ans : I-type, 0xAD490020
:::info
#### 16. Provide the type, assembly language instruction, and binary representation of instruction described by the following MIPS fields:<br><br>    op=0, rs=3, rt=2, rd=3, shamt=0, funct=34
:::
\begin{array}
\ op && rs && rt && rd && shamt &&funct\\
\hline
\ 000000 && 00011 && 00010 && 00011 && 00000 && 100010\\
\end{array}
Ans : R-type, sub $3, $3, $2
:::info
#### 17. Provide the type, assembly language instruction, and binary representation of instruction described by the following MIPS fields:<br><br>    op=0x23, rs=1, rt=2, const=0x4
:::
\begin{array}
\ op && rs && rt && imm16\\
\hline
\ 010111 && 00011 && 00010 && 0000000000000100\\
\end{array}
Ans : R-type, sub $3, $3, $2
:::info
#### 18. Assume that we would like to expand the MIPS register file to 128 registers and expand the instruction set to contain four times as many instructions.
:::
:::info
<b>18.1 How this would this affect the size of each of the bit fields in the R-type instructions?</b>
:::
\begin{array}
\ op && rs && rt && rd && shamt && funct\\
\hline
\ 8-bit && 7-bit && 7-bit && 7-bit && 5-bit && 6-bit\\
\end{array}
Ans : 40 bits
:::info
<b>18.2 How this would this aff ect the size of each of the bit fields in the I-type instructions?</b>
:::
\begin{array}
\ op && rs && rt && imm16\\
\hline
\ 8-bit && 7-bit && 7-bit && 16-bit\\
\end{array}
Ans : 38 bits
:::info
<b>18.3 How could each of the two proposed changes decrease the size of an MIPS assembly program? On the other hand, how could the proposed change increase the size of an MIPS assembly program?</b>
:::
more registers → more bits per instruction → could increase code size<br>
more registers → less register spills → less instructions<br>
more instructions → more appropriate instruction → decrease code size<br>
more instructions → larger opcodes → larger code size
:::info
#### 19. Assume the following register contents:<br><br>    $t0 = 0xAAAAAAAA, $t1 = 0x12345678
:::
:::info
<b>19.1 For the register values shown above, what is the value of $t2
for the following sequence of instructions?</b>
```
sll $t2, $t0, 44
or $t2, $t2, $t1
```
:::
Ans : 0x12345678
:::info
<b>19.2 For the register values shown above, what is the value of $t2 for the following sequence of instructions?</b>
```
sll $t2, $t0, 4
andi $t2, $t2, −1
```
:::
Ans : 0xAAAAAA9F
:::info
<b>19.3 For the register values shown above, what is the value of $t2 for the following sequence of instructions?</b>
```
srl $t2, $t0, 3
andi $t2, $t2, 0xFFEF
```
:::
0x15555555 + 0xFFEF = 0x15565544<br>
Ans : 0x15565544
:::info
#### 20. Find the shortest sequence of MIPS instructions that extracts bits 16 down to 11 from register $t0 and uses the value of this field to replace bits 31 down to 26 in register $t1 without changing the other 26 bits of register $t1.
:::
```
srl $t0, $t0, 11
sll $t0, $t0, 26
sll $t1, $t1, 6
srl $t1, $t1, 6
or $t1, $t1, $t0
```
:::info
#### 21. Provide a minimal set of MIPS instructions that may be used to implement the following pseudoinstruction:<br><br>    not $t1, $t2 // bit-wise invert
:::
```
nor $t1, $zero, $t2
```
:::info
#### 22. For the following C statement, write a minimal sequence of MIPS assembly instructions that does the identical operation. Assume $t1 = A, $t2 = B, and $s1 is the base address of C.<br><br>    A = C[0] << 4;
:::
```
lw $s1, 0($s1)
sll $t1, $s1, 4
```
:::info
#### 23. Assume $t0 holds the value 0x00101000. What is the value of $t2 after the following instructions?
```
slt $t2, $0, $t0
bne $t2, $0, ELSE
j DONE
ELSE: addi $t2, $t2, 2
DONE:
```
:::
Ans : 3
:::info
#### 24. Suppose the program counter (PC) is set to 0x2000 0000. Is it possible to use the jump (j) MIPS assembly instruction to set the PC to the address as 0x4000 0000? Is it possible to use the branch-on-equal (beq) MIPS assembly instruction to set the PC to this same address?
:::
jump 只能在相同 block 中跳躍 => impossible<br>
beg 最多跳躍至 0x20020000 => impossible
:::info
#### 25. The following instruction is not included in the MIPS instruction set:<br><br>    rpt $t2, loop # if(R[rs]>0) R[rs]=R[rs]−1, PC=PC+4+BranchAddr
:::
:::info
<b>25.1 If this instruction were to be implemented in the MIPS instruction set, what is the most appropriate instruction format?</b>
:::
Ans : I-type
:::info
<b>25.2 What is the shortest sequence of MIPS instructions that performs the same operation?</b>
:::
```
slt $t1, $zero, $t2
bne $t1, $zero loop
loop: addi $t1, $t1, -1
```
:::info
#### 26. Consider the following MIPS loop:
```
LOOP: slt $t2, $0, $t1
beq $t2, $0, DONE
subi $t1, $t1, 1
addi $s2, $s2, 2
j LOOP
DONE:
```
:::
:::info
<b>26.1 Assume that the register $t1 is initialized to the value 10. What is the value in register $s2 assuming $s2 is initially zero?</b>
:::
Ans : 20
:::info
<b>26.2 For each of the loops above, write the equivalent C code routine. Assume that the registers $s1, $s2, $t1, and $t2 are integers A, B, i, and temp, respectively.</b>
:::
```
while(i > 0){
i--;
B = B + 2;
}
```
:::info
<b>26.3 For the loops written in MIPS assembly above, assume that the register $t1 is initialized to the value N. How many MIPS instructions are executed?</b>
:::
Ans : 5N+2
:::info
#### 27. Translate the following C code to MIPS assembly code. Use a minimum number of instructions. Assume that the values of a, b, i, and j are in registers $s0, $s1, $t0, and $t1, respectively. Also, assume that register $s2 holds the base address of the array D.
```
for(i=0; i<a; i++)
for(j=0; j<b; j++)
D[4*j] = i + j;
```
:::
>有點難, 解答的看不懂, 下面是自己寫的
```
add $t0, $zero, $zero
Loop1: slt $t2, $t0, $s0 # if $t2=0, then Exit1
beq $t2, $zero, Exit1
add $t1, $zero, $zero
Loop2: slt $t2, $t1, $s1
beq $t2, $zero, Exit2 # if $t2=0, then Exit2
sll $t3, $1, 4 # $t3 << 4
add $t3, $t3, $s2 # $t3 = $t3 + base addr of D
add $t4, $t0, $t1 # $t4 = i + j
sw $t4, 0($t3)
addi $t1, $t1, 1
j Loop2
Exit2: addi $t0, $t0, 1
j Loop1
Exit1:
```
:::info
#### 28. How many MIPS instructions does it take to implement the C code from Exercise 2.27? If the variables a and b are initialized to 10 and 1 and all elements of D are initially 0, what is the total number of MIPS instructions that is executed to complete the loop?
:::
14 instruction to implement<br>
There are 88 instructions that is executed to complete
:::info
#### 29. Translate the following loop into C. Assume that the C-level integer i is held in register $t1, $s2 holds the C-level integer called result, and $s0 holds the base address of the integer MemArray.
```
add $t1, $0, $0
LOOP: lw $s1, 0($s0)
add $s2, $s2, $s1
addi $s0, $s0, 4
addi $t1, $t1, 1
slti $t2, $t1, 100
bne $t2, $0, LOOP
```
:::
```
for (i=0; i<100; i++) {
result += MemArray[s0];
s0 = s0 + 1;
}
```
:::info
#### 30. Rewrite the loop from Exercise 2.29 to reduce the number of MIPS instructions executed.
:::
```
addi $t1, $s0, 400
LOOP: lw $s1, 0($t0)
add $s2, $s2, $s1
addi $s0, $s0, -4
bne $t2, $s0, LOOP
```
:::info
#### 31. ~~Implement the following C code in MIPS assembly. What is the total number of MIPS instructions needed to execute the function?~~
```
int fib(int n){
if (n==0)
return 0;
else if (n == 1)
return 1;
else
return fib(n−1) + fib(n−2);
}
```
:::
:::info
#### 32. Functions can often be implemented by compilers “in-line.” An in-line function is when the body of the function is copied into the program space, allowing the overhead of the function call to be eliminated. Implement an “in-line” version of the C code above in MIPS assembly. What is the reduction in the total number of MIPS assembly instructions needed to complete the function? Assume that the C variable n is initialized to 5.
:::
>透過將 function program 放入 main 中, 減少 function call
Due to the recursive nature of the code, it is not possible for the compiler to in-line the function call.
:::info
#### 33. For each function call, show the contents of the stack after the function call is made. Assume the stack pointer is originally at address 0x7ffffffc, and follow the register conventions as specified in Figure 2.11.
:::
\begin{array}
\ address && content && pointer\\
\hline
\ 0x7ffffffc && $sp && old\ $sp\\
\ 0x7ffffff8 && $ra &&\\
\ 0x7ffffff4 && $s0 &&\\
\ 0x7ffffff0 && $s0 && current\ $sp\\
\end{array}
:::info
#### 34. Translate function f into MIPS assembly language. If you need to use registers $t0 through $t7, use the lower-numbered registers first. Assume the function declaration for func is “int f(int a, int b);”. The code for function f is as follows:
```
int f(int a, int b, int c, int d){
return func(func(a,b),c+d);
}
```
:::
\begin{array}
\ variable && a && b && c && d \\
\hline
\ register && $a0 && $a1 && $a2 && $a3 \\
\end{array}
```
f: addi $sp, $sp, -4
sw $ra, 4($sp)
add $t0, $a2, $a3 # $t0 = c+d
jal func # $a0, $a1 直接給 funct(a,b) 使用, result 存入 $v0
move $a0, $v0
move $a1, $t0
jal func
lw $ra, 4($sp)
addi $sp, $sp, 4
jr $ra
```
:::info
#### 35. Can we use the tail-call optimization in this function? If no, explain why not. If yes, what is the difference in the number of executed instructions in f with and without the optimization?
:::
>看很久還是不知道怎模運作的
(Tail-call optimization enables user to avoid the allocation of a new stack frame for a function because the calling function will return the value that it gets from the called function.)<br><br>
take 2.34 for the example<br>
We can use the tail-call optimization for the second call to func, but then we must restore $ra, $s0, $s1, and $sp before that call. We save only one instruction (jr $ra).
:::info
#### 36. Right before your function f from Exercise 2.34 returns, what do we know about contents of registers $t5, $s3, $ra, and $sp? Keep in mind that we know what the entire function f looks like, but for function func we only know its declaration.
:::
* $ra : 呼叫 f 函數的 return address
* $s3, $sp : 與呼叫 f 函數前相同
* $t5 : 無法預測其值, 雖然 f 沒有使用到 $t5, 但可能被 func 修改過
:::info
#### 37. Write a program in MIPS assembly language to convert an ASCII number string containing positive and negative integer decimal strings, to an integer. Your program should expect register $a0 to hold the address of a nullterminated string containing some combination of the digits 0 through 9. Your program should compute the integer value equivalent to this string of digits, then place the number in register $v0. If a non-digit character appears anywhere in the string, your program should stop with the value −1 in register $v0. For example, if register $a0 points to a sequence of three bytes 50ten, 52ten, 0ten (the nullterminated string “24”), then when the program stops, register $v0 should contain the value 24~ten~.
:::
>我不會, 答案抄來的
```
MAIN: addi $sp, $sp, -4
sw $ra, ($sp)
add $t6, $0, 0x30 # ‘0’
add $t7, $0, 0x39 # ‘9’
add $s0, $0, $0
add $t0, $a0, $0
LOOP: lb $t1, ($t0)
slt $t2, $t1, $t6
bne $t2, $0, DONE
slt $t2, $t7, $t1
bne $t2, $0, DONE
sub $t1, $t1, $t6
beq $s0, $0, FIRST
mul $s0, $s0, 10
FIRST:add $s0, $s0, $t1
addi $t0, $t0, 1
j LOOP
DONE: add $v0, $s0, $0
lw $ra, ($sp)
addi $sp, $sp, 4
jr $ra
```
:::info
#### 38. Consider the following code:<br><br>    lbu $t0, 0($t1)<br>    sw $t0, 0($t2)<br><br>Assume that the register $t1 contains the address 0x1000 0000 and the register $t2 contains the address 0x1000 0010. Note the MIPS architecture utilizes big-endian addressing. Assume that the data (in hexadecimal) at address 0x10000000 is: 0x11223344. What value is stored at the address pointed to by register $t2?
:::
\begin{array}
\ address && 0x10000000 && 0x10000001 && 0x10000010 && 0x10000011 \\
\hline
\ data && 0x11 && 0x22 && 0x33 && 0x44 \\
\ $t0 && 0x00000011
\end{array}
Ans : 0x00000011
:::info
#### 39. Write the MIPS assembly code that creates the 32-bit constant 0010 0000 0000 0001 0100 1001 0010 0100~two~ and stores that value to register $t1.
:::
```
lui $t1, 0x2001
ori $t1, 0x4924
```
:::info
#### 40. If the current value of the PC is 0x00000000, can you use a single jump instruction to get to the PC address as shown in Exercise 2.39?
:::
Ans : No, they are not in same block
:::info
#### 41. If the current value of the PC is 0x00000600, can you use a single branch instruction to get to the PC address as shown in Exercise 2.39?
:::
Ans : No, the upper limit address is 0x00020600
:::info
#### 42. If the current value of the PC is 0x1FFFf000, can you use a single branch instruction to get to the PC address as shown in Exercise 2.39?
:::
Ans : Yes, the upper limit is 0x2001F000
:::info
#### 43. Write the MIPS assembly code to implement the following C code:<br><br>    lock(lk);<br>    shvar=max(shvar,x);<br>    unlock(lk);<br><br> Assume that the address of the lk variable is in $a0, the address of the shvar variable is in $a1, and the value of variable x is in $a2. Your critical section should not contain any function calls. Use ll/sc instructions to implement the lock() operation, and the unlock() operation is simply an ordinary store instruction.
:::
>idk
:::info
#### 44. Repeat Exercise 2.43, but this time use `ll/sc` to perform an atomic update of the shvar variable directly, without using `lock()` and `unlock()`. Note that in this problem there is no variable `lk`.
:::
>idk
:::info
#### 45. Using your code from Exercise 2.43 as an example, explain what happens when two processors begin to execute this critical section at the same time, assuming that each processor executes exactly one instruction per cycle.
:::
>idk
:::info
#### 46. Assume for a given processor the CPI of arithmetic instructions is 1, the CPI of load/store instructions is 10, and the CPI of branch instructions is 3. Assume a program has the following instruction breakdowns: 500 million arithmetic instructions, 300 million load/store instructions, 100 million branch instructions.
:::
:::info
<b>46.1 Suppose that new, more powerful arithmetic instructions are added to the instruction set. On average, through the use of these more powerful arithmetic instructions, we can reduce the number of arithmetic instructions needed to execute a program by 25%, and the cost of increasing the clock cycle time by only 10%. Is this a good design choice? Why?</b>
:::
$$ Original \ EXEtime = 1 \times (500\times1 + 300\times10 + 100\times3)\times10^6 = 3800 \times 10^6$$
$$ Improved \ EXEtime = 1.1 \times (0.75\times500\times1 + 300\times10 + 100\times3)\times10^6 = 4042.5 \times 10^6$$
Ans : It is not a good choice
:::info
<b>46.2 Suppose that we find a way to double the performance of arithmetic instructions. What is the overall speedup of our machine? What if we find a way to improve the performance of arithmetic instructions by 10 times?</b>
:::
$$ 1. \quad Speedup = {{500\times1 + 300\times10 + 100\times 3}\over{0.5\times500\times1 + 300\times10 + 100\times 3}} = {3800 \over 3550} = 107\% $$
$$ 2. \quad Speedup = {{500\times1 + 300\times10 + 100\times 3}\over{0.1\times500\times1 + 300\times10 + 100\times 3}} = {3800 \over 3350} = 113\%$$
:::info
#### 47. Assume that for a given program 70% of the executed instructions are arithmetic, 10% are load/store, and 20% are branch.
:::
:::info
<b>47.1 Given this instruction mix and the assumption that an arithmetic instruction requires 2 cycles, a load/store instruction takes 6 cycles, and a branch instruction takes 3 cycles, find the average CPI.</b>
:::
$$ CPI = 0.7\times2 + 0.1\times6 + 0.2\times3 = 2.6
$$
:::info
<b>47.2 For a 25% improvement in performance, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all?</b>
:::
$$ 2.6\times 0.75 = 0.7\times cycle + 0.1\times6 + 0.2\times3
$$$$ cycle = 1.07
$$
:::info
<b>47.3 For a 50% improvement in performance, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all?</b>
:::
$$ 2.6\times 0.5 = 0.7\times cycle + 0.1\times6 + 0.2\times3
$$$$ cycle = 0.14
$$
# **Chapter 3 Arithmetic for Computers**
:::success
#### 1. What is 5ED4 - 07A4 when these values represent unsigned 16-bit hexadecimal numbers? The result should be written in hexadecimal. Show your work.
:::
\begin{array}
\ && binary && hexadecimal\\
\hline
\ && \text{0101 1110 1101 0100} && 5ED4\\
\ - && \text{0000 0111 1010 0100} && 07A4\\
\hline
\ && \text{0101 0111 0011 0000} && 5730
\end{array}
Ans : 5730
:::success
#### 2. What is 5ED4 - 07A4 when these values represent signed 16-bit hexadecimal numbers stored in sign-magnitude format? The result should be written in hexadecimal. Show your work.
:::
\begin{array}
\ && binary && hexadecimal\\
\hline
\ && \text{0101 1110 1101 0100} && 5ED4\\
\ - && \text{0000 0111 1010 0100} && 07A4\\
\hline
\ && \text{0101 0111 0011 0000} && 5730
\end{array}
Ans : 5730
:::success
#### 3. Convert 5ED4 into a binary number. What makes base 16 (hexadecimal) an attractive numbering system for representing values in computers?
:::
Ans : 0101 1110 1101 0100
:::success
#### 4. What is 4365 - 3412 when these values represent unsigned 12-bit octal numbers? The result should be written in octal. Show your work.
:::
\begin{array}
\ && binary && octal\\
\hline
\ && \text{100 011 110 101} && 4365\\
\ - && \text{011 100 001 010} && 3412\\
\hline
\ && \text{000 111 101 011} && 0753
\end{array}
Ans : 0753
:::success
#### 5. What is 4365 - 3412 when these values represent signed 12-bit octal numbers stored in sign-magnitude format? The result should be written in octal. Show your work.
:::
\begin{array}
\ && binary && octal\\
\hline
\ && \text{100 011 110 101} && 4365\\
\ - && \text{011 100 001 010} && 3412\\
\hline
\ && \text{100 011 110 101} && -0365\\
\ - && \text{011 100 001 010} && 3412\\
\hline
\ - && \text{011 111 110 110} && -3777\\
\hline
\ && \text{111 111 111 111} && 7777\\
\end{array}
Ans : 7777
:::success
#### 6. Assume 185 and 122 are unsigned 8-bit decimal integers. Calculate 185 – 122. Is there overflow, underflow, or neither?
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10111001} && 185\\
\ - && \text{01111010} && 122\\
\hline
\ && \text{00111111} && 63
\end{array}
Ans : neither
:::success
#### 7. Assume 185 and 122 are signed 8-bit decimal integers stored in sign-magnitude format. Calculate 185 + 122. Is there overflow, underflow, or neither?
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10111001} && 185\\
\ + && \text{01111010} && 122\\
\hline
\ && \text{00111001} && -57\\
\ + && \text{01111010} && 122\\
\hline
\ && \text{01000001} && 65
\end{array}
Ans : neither
:::success
#### 8. Assume 185 and 122 are signed 8-bit decimal integers stored in sign-magnitude format. Calculate 185 - 122. Is there overflow, underflow, or neither?
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10111001} && 185\\
\ - && \text{01111010} && 122\\
\hline
\ && \text{00111001} && -57\\
\ - && \text{01111010} && 122\\
\hline
\ && \text{10110011} && -179
\end{array}
Ans : overflow
:::success
#### 9. Assume 151 and 214 are signed 8-bit decimal integers stored in two’s complement format. Calculate 151 + 214 using saturating arithmetic. The result should be written in decimal. Show your work.
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10010111} && 151\\
\ + && \text{11010110} && 214\\
\hline
\ && \text{10010111} && -105\\
\ + && \text{11010110} && -42\\
\hline
\ && \text{11111111} && -128 (saturating)
\end{array}
Ans : -128
:::success
#### 10. Assume 151 and 214 are signed 8-bit decimal integers stored in two’s complement format. Calculate 151 - 214 using saturating arithmetic. The result should be written in decimal. Show your work.
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10010111} && 151\\
\ - && \text{11010110} && 214\\
\hline
\ && \text{10010111} && -105\\
\ - && \text{11010110} && -42\\
\hline
\ && \text{10010111} && -105\\
\ + && \text{00101010} && 42\\
\hline
\ && \text{11000001} && -63
\end{array}
Ans : -128
:::success
#### 11. Assume 151 and 214 are unsigned 8-bit integers. Calculate 151 + 214 using saturating arithmetic. The result should be written in decimal. Show your work.
:::
\begin{array}
\ && binary && decimal\\
\hline
\ && \text{10010111} && 151\\
\ + && \text{11010110} && 214\\
\hline
\ && \text{11111111} && 255(saturating)
\end{array}
Ans : 255
:::success
#### 12. Using a table similar to that shown in Figure 3.6, calculate the product of the octal unsigned 6-bit integers 62 and 12 using the hardware described in Figure 3.3. You should show the contents of each register on each step.

:::
>做到 step4 就有答案了, 但題目要求全部 run 完
>
\begin{array}
\ binary && octal\\
\hline
\ \text{110 010} && 62\\
\ \text{001 010} && 12\\
\end{array}
\begin{array}
\ Step && Action && Multiplier && Multiplicand && Product\\
\hline
\ 0 && Initialization && \text{001 010} && \text{000 000 110 010} && \text{000 000 000 000}\\
\hline
\ 1 && nop && \text{001 010} && \text{000 000 110 010} && \text{000 000 000 000}\\
\ 1 && Mplier右移 && \text{000 101} && \text{000 000 110 010} && \text{000 000 000 000}\\
\ 1 && Mcand左移 && \text{000 101} && \text{000 001 100 100} && \text{000 000 000 000}\\
\hline
\ 2 && Pro=Pro+Mcand && \text{000 101} && \text{000 001 100 100} && \text{000 001 100 100}\\
\ 2 && Mplier右移 && \text{000 010} && \text{000 001 100 100} && \text{000 001 100 100}\\
\ 2 && Mcand左移 && \text{000 010} && \text{000 011 001 000} && \text{000 001 100 100}\\
\hline
\ 3 && nop && \text{000 010} && \text{000 011 001 000} && \text{000 001 100 100}\\
\ 3 && Mplier右移 && \text{000 001} && \text{000 011 001 000} && \text{000 001 100 100}\\
\ 3 && Mcand左移 && \text{000 001} && \text{000 110 010 000} && \text{000 001 100 100}\\
\hline
\ 4 && Pro=Pro+Mcand && \text{000 001} && \text{000 110 010 000} && \text{000 111 110 100}\\
\ 4 && Mplier右移 && \text{000 000} && \text{000 110 010 000} && \text{000 111 110 100}\\
\ 4 && Mcand左移 && \text{000 000} && \text{001 100 100 000} && \text{000 111 110 100}\\
\hline
\ 5 && nop && \text{000 000} && \text{001 100 100 000} && \text{000 111 110 100}\\
\ 5 && Mplier右移 && \text{000 000} && \text{001 100 100 000} && \text{000 111 110 100}\\
\ 5 && Mcand左移 && \text{000 000} && \text{011 001 000 000} && \text{000 111 110 100}\\
\hline
\ 6 && nop && \text{000 000} && \text{011 001 000 000} && \text{000 111 110 100}\\
\ 6 && Mplier右移 && \text{000 000} && \text{011 001 000 000} && \text{000 111 110 100}\\
\ 6 && Mcand左移 && \text{000 000} && \text{110 010 000 000} && \text{000 111 110 100}\\
\end{array}
:::success
#### 13. Using a table similar to that shown in Figure 3.6, calculate the product of the hexadecimal unsigned 8-bit integers 62 and 12 using the hardware described in Figure 3.5. You should show the contents of each register on each step.

:::
\begin{array}
\ binary && octal\\
\hline
\ \text{110 010} && 62\\
\ \text{001 010} && 12\\
\end{array}
\begin{array}
\ Step && Action && Multiplicand && Product\\
\hline
\ 0 && Initialization && \text{110 010} && \text{000 000 001 010}\\
\hline
\ 1 && nop && \text{110 010} && \text{000 000 001 010}\\
\ 1 && Pro右移 && \text{110 010} && \text{000 000 000 101}\\
\hline
\ 2 && Pro=Pro+Mcand && \text{110 010} && \text{110 010 000 101}\\
\ 2 && Pro右移 && \text{110 010} && \text{011 001 000 010}\\
\hline
\ 3 && nop && \text{110 010} && \text{011 001 000 010}\\
\ 3 && Pro右移 && \text{110 010} && \text{001 100 100 001}\\
\hline
\ 4 && Pro=Pro+Mcand && \text{110 010} && \text{111 110 100 001}\\
\ 4 && Pro右移 && \text{110 010} && \text{011 111 010 000}\\
\hline
\ 5 && nop && \text{110 010} && \text{011 111 010 000}\\
\ 5 && Pro右移 && \text{110 010} && \text{001 111 101 000}\\
\hline
\ 6 && nop && \text{110 010} && \text{001 111 101 000}\\
\ 6 && Pro右移 && \text{110 010} && \text{000 111 110 100}\\
\end{array}
:::success
#### 14. Calculate the time necessary to perform a multiply using the approach given in Figures 3.3 and 3.4 if an integer is 8 bits wide and each step of the operation takes 4 time units. Assume that in step 1a an addition is always performed—either the multiplicand will be added, or a zero will be. Also assume that the registers have already been initialized (you are just counting how long it takes to do the multiplication loop itself). If this is being done in hardware, the shifts of the multiplicand and multiplier can be done simultaneously. If this is being done in software, they will have to be done one after the other. Solve for each case.


:::
* For hardware : <br>
addition, shift(Mcand, Mplier), decide(loop over) take 3 cycles<br>
3 * 8 * 4 = 96 time units
* For software : <br>
decide(add), addition, shift(Mcand), shift(Mplier), decide(loop over) take 5 cycles<br>
5 * 8 * 4 = 160 time units
:::success
#### 15. Calculate the time necessary to perform a multiply using the approach described in the text (31 adders stacked vertically) if an integer is 8 bits wide and an adder takes 4 time units.
:::
there are 8-1 adder, 7*4=28<br>Ans : 28 time units
:::success
#### 16. Calculate the time necessary to perform a multiply using the approach given in Figure 3.7 if an integer is 8 bits wide and an adder takes 4 time units.

:::
4*log~2~8=12<br>Ans : 12
:::success
#### 17. As discussed in the text, one possible performance enhancement is to do a shift and add instead of an actual multiplication. Since 9 * 6, for example, can be written (2 * 2 * 2 + 1) * 6, we can calculate 9 * 6 by shifting 6 to the left 3 times and then adding 6 to that result. Show the best way to calculate 0x33 * 0x55 using shifts and adds/subtracts. Assume both inputs are 8-bit unsigned integers.
:::
0x33 = (00110011)~2~<br>
0x33 * 0x55 = (2^5^+2^4^+2+1)*0x55<br>
>0x55左移5位 + 0x55左移4位 + 0x55左移1位 + 0x55
:::success
#### 18. Using a table similar to that shown in Figure 3.10, calculate 74 divided by 21 using the hardware described in Figure 3.8. You should show the contents of each register on each step. Assume both inputs are unsigned 6-bit integers.

:::
\begin{array}
\ binary && octal\\
\hline
\ \text{111 100} && 74\\
\ \text{010 001} && 21\\
\end{array}
\begin{array}
\ Step && Action && Divisor && Remainder && Quotient\\
\hline
\ 0 && Initialization && \text{010 001 000 000} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 1 && R=R-D && \text{010 001 000 000} && \text{101 111 111 100} && \text{000 000}\\
\ 1 && R<0, R=R+D, Q<<0 && \text{010 001 000 000} && \text{000 000 111 100} && \text{000 000}\\
\ 1 && D右移 && \text{001 000 100 000} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 2 && R=R-D && \text{001 000 100 000} && \text{111 000 011 100} && \text{000 000}\\
\ 2 && R<0, R=R+D, Q<<0 && \text{001 000 100 000} && \text{000 000 111 100} && \text{000 000}\\
\ 2 && D右移 && \text{000 100 010 000} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 3 && R=R-D && \text{000 100 010 000} && \text{111 100 101 100} && \text{000 000}\\
\ 3 && R<0, R=R+D, Q<<0 && \text{000 100 010 000} && \text{000 000 111 100} && \text{000 000}\\
\ 3 && D右移 && \text{000 010 001 000} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 4 && R=R-D && \text{000 010 001 000} && \text{111 110 110 100} && \text{000 000}\\
\ 4 && R<0, R=R+D, Q<<0 && \text{000 010 001 000} && \text{000 000 111 100} && \text{000 000}\\
\ 4 && D右移 && \text{000 001 000 100} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 5 && R=R-D && \text{000 001 000 100} && \text{111 111 111 000} && \text{000 000}\\
\ 5 && R<0, R=R+D, Q<<0 && \text{000 001 000 100} && \text{000 000 111 100} && \text{000 000}\\
\ 5 && D右移 && \text{000 000 100 010} && \text{000 000 111 100} && \text{000 000}\\
\hline
\ 6 && R=R-D && \text{000 000 100 010} && \text{000 000 011 010} && \text{000 000}\\
\ 6 && R>0, Q<<1 && \text{000 000 100 010} && \text{000 000 011 010} && \text{000 001}\\
\ 6 && D右移 && \text{000 000 010 001} && \text{000 000 011 010} && \text{000 001}\\
\hline
\ 7 && R=R-D && \text{000 000 010 001} && \text{000 000 011 010} && \text{000 001}\\
\ 7 && R>0, Q<<1 && \text{000 000 010 001} && \text{000 000 001 001} && \ \text{000 011}\\
\ 7 && D右移 && \text{000 000 001 000} && \text{000 000 001 001} && \text{000 011}\\
\end{array}
:::success
#### 19. Using a table similar to that shown in Figure 3.10, calculate 74 divided by 21 using the hardware described in Figure 3.11. You should show the contents of each register on each step. Assume A and B are unsigned 6-bit integers. This algorithm requires a slightly different approach than that shown in Figure 3.9. You will want to think hard about this, do an experiment or two, or else go to the web to figure out how to make this work correctly. (Hint: one possible solution involves using the fact that Figure 3.11 implies the remainder register can be shifted either direction.)

:::
\begin{array}
\ binary && octal\\
\hline
\ \text{111 100} && 74\\
\ \text{010 001} && 21\\
\end{array}
\begin{array}
\ Step && Action && Divisor && Remainder\\
\hline
\ 0 && Initialization && \text{010 001} && \text{000 000 111 100} \\
\hline
\ 1 && R左移 && \text{010 001} && \text{000 001 111 000}\\
\ 1 && R=R-D && \text{010 001} && \text{110 000 111 000}\\
\ 1 && R<0, R=R+D && \text{010 001} && \text{000 001 111 000}\\
\hline
\ 2 && R左移 && \text{010 001} && \text{000 011 110 000}\\
\ 2 && R=R-D && \text{010 001} && \text{110 010 110 000}\\
\ 2 && R<0, R=R+D && \text{010 001} && \text{000 011 110 000}\\
\hline
\ 3 && R左移 && \text{010 001} && \text{000 111 100 000}\\
\ 3 && R=R-D && \text{010 001} && \text{110 110 100 000}\\
\ 3 && R<0, R=R+D && \text{010 001} && \text{000 111 100 000}\\
\hline
\ 4 && R左移 && \text{010 001} && \text{001 111 000 000}\\
\ 4 && R=R-D && \text{010 001} && \text{111 110 000 000}\\
\ 4 && R<0, R=R+D && \text{010 001} && \text{001 111 000 000}\\
\hline
\ 5 && R左移 && \text{010 001} && \text{011 110 000 000}\\
\ 5 && R=R-D && \text{010 001} && \text{001 101 000 000}\\
\ 5 && R>0, R_0=1 && \text{010 001} && \text{001 101 000 001}\\
\hline
\ 6 && R左移 && \text{010 001} && \text{011 010 000 010}\\
\ 6 && R=R-D && \text{010 001} && \text{001 001 000 010}\\
\ 6 && R>0, R_0=1 && \text{010 001} && \text{001 001 000 011}\\
\end{array}
:::success
#### 20. What decimal number does the bit pattern 0×0C000000 represent if it is a two’s complement integer? An unsigned integer?
:::
Ans : 12*16^6^ for both questions
:::success
#### 21 If the bit pattern 0×0C000000 is placed into the Instruction Register, what MIPS instruction will be executed?
:::
\begin{array}
\ op && 26bit address\\
\hline
\ 000011 && 00000000000000000000000000\\
\end{array}
Ans : jal 0x00000000
:::success
#### 22. What decimal number does the bit pattern 0×0C000000 represent if it is a floating point number? Use the IEEE 754 standard.
:::
\begin{array}
\ sign && exponent && fraction\\
\hline
\ 0 && 00011000 && 00000000000000000000000\\
\end{array}
Ans : 1*2^-103^
:::success
#### 23. Write down the binary representation of the decimal number 63.25 assuming the IEEE 754 single precision format.
:::
63.25 = (111111.01)~2~ = 1.1111101 * 2^5^
\begin{array}
\ sign && exponent && fraction\\
\hline
\ 0 && 10000100 && 11111010000000000000000\\
\end{array}
Ans : 0x427D0000
:::success
#### 24. ~~Write down the binary representation of the decimal number 63.25 assuming the IEEE 754 double precision format.~~
:::
:::success
#### 25. Write down the binary representation of the decimal number 63.25 assuming it was stored using the single precision IBM format (base 16, instead of base 2, with 7 bits of exponent).
:::
63.25 = (111111.01)~2~ = 1.1111101 * 2^5^ = 0.0011111101 * 16^2^
\begin{array}
\ sign && exponent && fraction\\
\hline
\ 0 && 1000010 && 001111110100000000000000\\
\end{array}
Ans : 01000010001111110100000000000000
:::success
#### 26. Write down the binary bit pattern to represent -1.5625 * 10^-1^ assuming a format similar to that employed by the DEC PDP-8 (the left most 12 bits are the exponent stored as a two’s complement number, and the rightmost 24 bits are the fraction stored as a two’s complement number). No hidden 1 is used. Comment on how the range and accuracy of this 36-bit pattern compares to the single and double precision IEEE 754 standards.
:::
-1.5625 * 10^-1^ = -0.15625 = (-0.00101)~2~ = -0.101 * 2^-2^
\begin{array}
\ exponent && fraction\\
\hline
\ 111111111110 && 101100000000000000000000\\
\end{array}
>fraction = 0.101 (encode>>) -010100000...0 (2's complment>>) 1011000...0
Ans : 111111111110101100000000000000000000
:::success
#### 27. IEEE 754-2008 contains a half precision that is only 16 bits wide. The left most bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed. Write down the bit pattern to represent -1.5625 * 10^-1^ assuming a version of this format, which uses an excess-16 format to store the exponent. Comment on how the range and accuracy of this 16-bit floating point format compares to the single precision IEEE 754 standard.
:::
\begin{array}
\ sign && exponent && fraction\\
\hline
\ 1 && 01100 && 0100000000\\
\end{array}
Ans : 1011000100000000
:::success
#### 28. The Hewlett-Packard 2114, 2115, and 2116 used a format with the left most 16 bits being the fraction stored in two’s complement format, followed by another 16-bit field which had the left most 8 bits as an extension of the fraction (making the fraction 24 bits long), and the rightmost 8 bits representing the exponent. However, in an interesting twist, the exponent was stored in sign-magnitude format with the sign bit on the far right! Write down the bit pattern to represent -1.5625 * 10^-1^ assuming this format. No hidden 1 is used. Comment on how the range and accuracy of this 32-bit pattern compares to the single precision IEEE 754 standard.
:::
\begin{array}
\ fraction(24) && exponent(8)\\
\hline
\ 101100000000000000000000 && 00000101 && \\
\end{array}
Ans : 10110000000000000000000000000101
:::success
#### 29. Calculate the sum of 2.6125 * 10^1^ and 4.150390625 * 10^-1^ by hand, assuming A and B are stored in the 16-bit half precision described in Exercise 3.27. Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps.
:::
2.6125 * 10^1^ = 1.1010001000 * 2^4^<br>
4.150390625 * 10^-1^ = 1.1010100111 * 2^-2^ = 0.0000011010100111
\begin{array}
\ && & GR & S\\
\ && 1.1010001000 & 00\\
\ + && 0.0000011010 & 10 & 0111\\
\hline
\ && 1.1010100010 & 10 & 1 & Round\ up\\
\hline
\ && 1.1010100011\\
\end{array}
Ans : 1.1010100011 * 2^4^
:::success
#### 30. Calculate the product of –8.0546875 * 10^0^ and -1.79931640625 * 10^–1^ by hand, assuming A and B are stored in the 16-bit half precision format described in Exercise 3.27. Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps; however, as is done in the example in the text, you can do the multiplication in human-readable format instead of using the techniques described in Exercises 3.12 through 3.14. Indicate if there is overflow or underfl ow. Write your answer in both the 16-bit floating point format described in Exercise 3.27 and also as a decimal number. How accurate is your result? How does it compare to the number you get if you do the multiplication on a calculator?
:::
-8.0546875 = -1.0000000111 * 2^3^ <br>
-1.79931640625 * 10^-1^ = -1.0111000010 * 2^-3^<br>
(-1.0000000111 * 2^3^) * (-1.0111000010 * 2^-3^) = 1.01110011000001001110 * 2^0^
\begin{array}
\ && & GR & S\\
\ && 1.0111001100 & 00 & 01001110\\
\hline
\ && 1.0111001100 & 00 & 1 &Round\ up\\
\hline
\ && 1.0111001100\\
\end{array}
Ans : 1.0111001100
:::success
#### 31. ~~Calculate by hand 8.625 * 10^1^ divided by -4.875 * 10^0^. Show all the steps necessary to achieve your answer. Assume there is a guard, a round bit, and a sticky bit, and use them if necessary. Write the final answer in both the 16-bit floating point format described in Exercise 3.27 and in decimal and compare the decimal result to that which you get if you use a calculator.~~
:::
:::success
#### 32. Calculate (3.984375 * 10^-1^ + 3.4375 * 10^-1^) + 1.771 * 10^3^ by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
:::
3.984375 * 10^-1^ = 1.1001100000 * 2^-2^<br>
3.4375 * 10^-1^ = 1.0110000000 * 2^-2^<br>
1.771 * 10^3^ = 1.1011101011 * 2^10^
\begin{array}
\ && & GR & S\\
\ && 1.1010001000 & 00\\
\ + && 0.0000011010 & 10 & 0111\\
\hline
\ && 1.1010100010 & 10 & 1 & Round\ up\\
\hline
\ && 1.1010100011 && \\
\end{array}
:::success
#### 33. Calculate 3.984375 * 10^-1^ + (3.4375 * 10^-1^ + 1.771 * 10^3^) by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
:::
3.984375 * 10^-1^ = 1.1001100000 * 2^-2^<br>
3.4375 * 10^-1^ = 1.0110000000 * 2^-2^<br>
1.771 * 10^3^ = 1.1011101011 * 2^10^
\begin{array}
\ && & GR & S\\
\ && 1.1001100000\\
\ + && 1.0110000000\\
\hline
\ && 10.1111100000\\
\end{array}
1.1001100000 * 2^-2^ + 1.0110000000 * 2^-2^ = 1.0111110000 * 2^-1^ = 0.00000000001011111
\begin{array}
\ && & GR & S\\
\ && 1.1011101011\\
\ + && 0.0000000000 & 10 & 11111\\
\hline
\ && 1.1011101011 & 10 & 1 & Round\ up\\
\hline
\ && 1.1011101100\\
\end{array}
Ans : 1.1011101100 * 2^10^
:::success
#### 34. Based on your answers to 3.32 and 3.33, does (3.984375 * 10^-1^ + 3.4375 * 10^-1^) + 1.771 * 10^3^ = 3.984375 * 10^-1^ + (3.4375 * 10^-1^ + 1.771 * 10^3^)?
:::
>Q.33 是算左邊, 現在要計算等號右邊的結果
1.771 * 10^3^ = 1.1011101011 * 2^10^<br>
3.4375 * 10^-1^ = 1.0110000000 * 2^-2^ = 0.000000000001011 * 2^10^<br>
3.984375 * 10^-1^ = 1.1001100000 * 2^-2^ = 0.00000000000110011
\begin{array}
\ && & GR & S\\
\ && 0.0000000000 & 01 & 011\\
\ + && 1.1011101011\\
\hline
\ && 1.1011101011 \\
\ + && 0.0000000000 & 01 & 10011 \\
\hline
\ && 1.1011101011\\
\end{array}
Ans : Not equal
:::success
#### 35. Calculate (3.41796875 * 10^-3^ * 6.34765625 * 10^-3^) * 1.05625 * 10^2^ by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
:::
> 不想寫ㄟ
:::success
#### 36. Calculate 3.41796875 10^-3^ * (6.34765625 * 10^-3^ * 1.05625 * 10^2^) by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
:::
> 不想寫ㄟ
:::success
#### 37. Based on your answers to 3.35 and 3.36, does (3.41796875 10^-3^ * 6.34765625 * 10^-3^) * 1.05625 * 10^2^ = 3.41796875 * 10^-3^ * (6.34765625 * 10^-3^ * 1.05625 * 10^2^)?
:::
> 不想寫ㄟ
:::success
#### 38. Calculate 1.666015625 * 10^0^ * (1.9760 * 10^4^ * -1.9744 * 10^4^) by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit floating point format and in decimal.
:::
> 不想寫ㄟ
:::success
#### 39. Calculate (1.666015625 * 10^0^ * 1.9760 * 10^4^) * (1.666015625 * 10^0^ * -1.9744 * 10^4^) by hand, assuming each of the values are stored in the 16-bit half precision format described in Exercise 3.27 (and also described in the text). Assume 1 guard, 1 round bit, and 1 sticky bit, and round to the nearest even. Show all the steps, and write your answer in both the 16-bit fl oating point format and in decimal.
:::
> 不想寫ㄟ
:::success
#### 40. Based on your answers to 3.38 and 3.39, does (1.666015625 * 10^0^ * 1.9760 * 10^4^) * (1.666015625 * 10^0^ * -1.9744 * 10^4^) = 1.666015625 * 10^0^ * (1.9760 * 10^4^ * -1.9744 * 10^4^)?
:::
> 不想寫ㄟ
:::success
#### 41. Using the IEEE 754 floating point format, write down the bit pattern that would represent -1/4. Can you represent -1/4 exactly? 3.42 What do you get if you add -1/4 to itself 4 times? What is -1/4 * 4? Are they the same? What should they be?
:::
\begin{array}
\ sign && exponent && fraction\\
\hline
\ 1 && 01111101 && 00000000000000000000000\\
\end{array}
Ans : They are same
:::success
#### 42. What do you get if you add -1/4 to itself 4 times? What is -1/4 * 4? Are they the same? What should they be?
:::
Ans : They are same
:::success
#### 43. Write down the bit pattern in the fraction of value 1/3 assuming a floating point format that uses binary numbers in the fraction. Assume there are 24 bits, and you do not need to normalize. Is this representation exact?
:::
Ans : No, 0101 0101 0101 0101 0101 0101
:::success
#### 44. Write down the bit pattern in the fraction assuming a floating point format that uses Binary Coded Decimal (base 10) numbers in the fraction instead of base 2. Assume there are 24 bits, and you do not need to normalize. Is this representation exact?
:::
Ans : No, 0011 0011 0011 0011 0011 0011
:::success
#### 45. Write down the bit pattern assuming that we are using base 15 numbers in the fraction instead of base 2. (Base 16 numbers use the symbols 0–9 and A–F. Base 15 numbers would use 0–9 and A–E.) Assume there are 24 bits, and you do not need to normalize. Is this representation exact?
:::
Ans : Yes, 0101 0000 0000 0000 0000 0000
:::success
#### 46. Write down the bit pattern assuming that we are using base 30 numbers in the fraction instead of base 2. (Base 16 numbers use the symbols 0–9 and A–F. Base 30 numbers would use 0–9 and A–T.) Assume there are 20 bits, and you do not need to normalize. Is this representation exact?
:::
Ans : Yes, 01010 00000 00000 00000
:::success
#### 47. ~~The following C code implements a four-tap FIR filter on input array sig_in. Assume that all arrays are 16-bit fixed-point values~~.
```
for (i 3;i< 128;i )
sig_out[i] sig_in[i-3] * f[0] sig_in[i-2] * f[1]
sig_in[i-1] * f[2] sig_in[i] * f[3];
```
<b>~~Assume you are to write an optimized implementation this code in assembly language on a processor that has SIMD instructions and 128-bit registers. Without knowing the details of the instruction set, briefl y describe how you would implement this code, maximizing the use of sub-word operations and minimizing the amount of data that is transferred between registers and memory. State all your assumptions about the instructions you use.~~</b>
:::
> idk
>official solution : https://github.com/dmohindru/cod5e/tree/master/solutions