[Toc]
## eight great concepts
* design for __Moore's Law__
* use __abstraction__ to simplify design
* make the __common case fast__
* performance via __parallelism__
* performance via __pipelining__
* performance via __prediction__
* __hierarchy__ of memories
* __dependability__ via redundancy
## from high-level language to hardware
```sequence
high level language (such as C) -> assembly language: compiler
assembly language -> object file: assembler
object file -> executable (machine code): linker
```

## performance
response time: how long it takes to complete a task.
throughput (bandwidth): number of tasks completed per unit time
==DEFINITION==
* $performance = {1 \over execution\,time}$
the relative performance:
"$x$ is $n$ times faster than $y$" means $x$ is $n$ times as fast as $y$: the execution time on $y$ is $n$ times as long as it is on $x$
$$
n = {performance_x \over performance_y} = {execution\,time_y \over execution\,time_x}
$$
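
A minimal Python sketch of the two definitions above; the function names and the 10 s / 15 s timings are hypothetical, chosen only for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_as_fast(time_x_s: float, time_y_s: float) -> float:
    """Return n such that x is n times as fast as y.

    n = performance_x / performance_y, which equals time_y / time_x.
    """
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical timings: x takes 10 s, y takes 15 s for the same task.
n = times_as_fast(10.0, 15.0)
print(f"x is {n:.2f} times as fast as y")  # x is 1.50 times as fast as y
```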
## CPU time
clock period: duration of a clock cycle
clock frequency (rate): cycles per second, e.g., 4.0 GHz $= 4.0 \times 10^9$ Hz
$$
\begin{aligned}
CPU\ time &= {CPU\ clock\ cycles \times clock\ cycle\ time} \\
&= {CPU\ clock\ cycles \over clock\ rate}
\end{aligned}
$$
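
The two forms of the CPU time equation give the same answer because clock cycle time is the reciprocal of clock rate. A small Python sketch, with hypothetical helper names and a hypothetical workload of 8 billion cycles on a 4.0 GHz clock:

```python
def cpu_time_from_rate(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time = CPU clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


def cpu_time_from_period(clock_cycles: float, clock_cycle_time_s: float) -> float:
    """CPU time = CPU clock cycles * clock cycle time."""
    return clock_cycles * clock_cycle_time_s


# Hypothetical workload: 8 billion cycles on a 4.0 GHz clock.
rate_hz = 4.0e9
period_s = 1.0 / rate_hz  # 0.25 ns per cycle
print(cpu_time_from_rate(8e9, rate_hz))     # 2.0
print(cpu_time_from_period(8e9, period_s))  # approximately 2.0 (floating-point rounding)
```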
performance is __improved__ by:
1. reducing the number of clock cycles
2. increasing the clock rate
3. however, hardware designers must often __trade off__ these two (e.g., a faster clock may require more clock cycles)
---
==EXAMPLE==
Computer A: 2 GHz clock rate, 10 s CPU time
goal: design computer B with a 6 s CPU time
B can use a faster clock, but this makes it need $1.2 \times$ as many clock cycles as A
what clock rate should B have?
ANSWER:
$$
\begin{aligned}
clock\ rate_B &= {clock\ cycles_B \over CPU\ time_B} \\
&= {1.2 \times clock\ cycles_A \over 6\ s} \\
&= {1.2 \times 10\ s \times 2\ GHz \over 6\ s} \\
&= {24 \times 10^9\ cycles \over 6\ s} \\
&= 4\ GHz
\end{aligned}
$$
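
The arithmetic can be double-checked with a short Python sketch; `required_clock_rate` is a hypothetical helper name, and the inputs are the numbers given in the example.

```python
def required_clock_rate(rate_a_hz: float, time_a_s: float,
                        cycle_factor: float, target_time_s: float) -> float:
    """Clock rate computer B needs to reach target_time_s when it uses
    cycle_factor times as many clock cycles as computer A."""
    cycles_a = time_a_s * rate_a_hz      # clock cycles = CPU time * clock rate
    cycles_b = cycle_factor * cycles_a   # the faster clock costs extra cycles
    return cycles_b / target_time_s      # clock rate = cycles / CPU time


rate_b = required_clock_rate(2e9, 10.0, 1.2, 6.0)
print(f"clock rate B = {rate_b / 1e9:.1f} GHz")  # clock rate B = 4.0 GHz
```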
## instruction performance
* $performance = {1 \over CPU\ time}$
* $CPU\ clock\ cycles = instruction\ count \times average\ clock\ cycles\ per\ instruction\ (CPI)$
* $CPU\ time = instruction\ count \times CPI \times clock\ cycle\ time$
_*note: $clock\ cycle\ time = {1 \over clock\ rate}$_
==EXAMPLE==
| processor | clock rate | CPI |
| --------- | ---------- | --- |
| P1 | 2 GHz | 0.7 |
| P2 | 3 GHz | 1.5 |
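1. which processor has the higher performance (assuming both run the same program, i.e., the same $IC$)?
$performance_{P1}={2 \times 10^9 \over IC \times 0.7} \approx {2.86 \times 10^9 \over IC}$
$performance_{P2}={3 \times 10^9 \over IC \times 1.5} = {2 \times 10^9 \over IC}$
P1 has the higher performance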
2. if each processor executes a program in 10 s, find the number of __cycles__ and the number of __instructions__
$P1: 10={IC \times 0.7 \over 2 \times 10^9 }$, $IC={2 \times 10^{10} \over 0.7} \approx 2.86 \times 10^{10}$, $cycles=IC \times CPI=2 \times 10^{10}$
$P2: 10={IC \times 1.5 \over 3 \times 10^9 }$, $IC={3 \times 10^{10} \over 1.5} = 2 \times 10^{10}$, $cycles=IC \times CPI=3 \times 10^{10}$
3. we want to reduce the execution time by 10%, but this leads to a 20% increase in CPI. What clock rate is needed to achieve this time reduction?
$P1: 9={{2 \times 10^{10} \over 0.7} \times (0.7 \times 1.2) \over rate_1}$, $rate_1={2.4 \times 10^{10} \over 9} \approx 2.67\ GHz$
$P2: 9={2 \times 10^{10} \times (1.5 \times 1.2) \over rate_2}$, $rate_2={3.6 \times 10^{10} \over 9} = 4\ GHz$
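
A Python sketch that reproduces all three parts of the example. `Processor` and the helper names are hypothetical; part 1 assumes both processors run the same program (same $IC$), so comparing clock rate / CPI is enough.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    clock_rate_hz: float
    cpi: float


def instructions_per_second(p: Processor) -> float:
    """For a fixed IC, performance is proportional to clock rate / CPI."""
    return p.clock_rate_hz / p.cpi


def cycles_and_ic(p: Processor, time_s: float) -> tuple[float, float]:
    """Clock cycles and instruction count for a run lasting time_s seconds."""
    cycles = time_s * p.clock_rate_hz  # cycles = CPU time * clock rate
    ic = cycles / p.cpi                # IC = cycles / CPI
    return cycles, ic


def new_clock_rate(p: Processor, time_s: float,
                   time_cut: float, cpi_increase: float) -> float:
    """Clock rate needed when CPU time shrinks by time_cut and CPI grows by cpi_increase."""
    _, ic = cycles_and_ic(p, time_s)   # the program (and its IC) stays the same
    new_time = time_s * (1 - time_cut)
    new_cpi = p.cpi * (1 + cpi_increase)
    return ic * new_cpi / new_time     # clock rate = IC * CPI / CPU time


for p in (Processor("P1", 2e9, 0.7), Processor("P2", 3e9, 1.5)):
    cycles, ic = cycles_and_ic(p, 10.0)
    rate = new_clock_rate(p, 10.0, 0.10, 0.20)
    print(f"{p.name}: {instructions_per_second(p):.3g} instr/s, "
          f"cycles = {cycles:.3g}, IC = {ic:.3g}, new rate = {rate / 1e9:.2f} GHz")
```

Because $IC$ is unchanged, the required clock rate is simply the old rate scaled by $1.2 / 0.9$.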
---
__conclusion__
$$
\begin{aligned}
CPU\ time &= {instructions \over program}\times {clock\ cycles \over instruction}\times {seconds \over clock\ cycle} \\
&=IC \times CPI \times T_C
\end{aligned}
$$
$IC$: number of instructions in a program
$CPI$: average number of clock cycles per instruction
$T_C$: duration of each clock cycle
| component            | $IC$         | $CPI$        | $T_C$        |
| -------------------- | ------------ | ------------ | ------------ |
| algorithm            | $\checkmark$ | $\Delta$     |              |
| programming language | $\checkmark$ | $\checkmark$ |              |
| compiler             | $\checkmark$ | $\checkmark$ |              |
| ISA                  | $\checkmark$ | $\checkmark$ | $\checkmark$ |

$\checkmark$: affects, $\Delta$: may affect

## power
$power_{dynamic} \propto {1 \over 2} \times capacitive\ load \times voltage^2 \times frequency\ switched$
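
Because the relation is a proportionality, only ratios are meaningful. A minimal Python sketch, assuming arbitrary units and hypothetical scaling factors, shows why voltage matters most: it enters squared.

```python
def dynamic_power(capacitive_load: float, voltage: float, frequency_hz: float) -> float:
    """Dynamic power ~ 1/2 * C * V^2 * f (arbitrary units, proportionality only)."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency_hz


def power_ratio(load_scale: float, voltage_scale: float, freq_scale: float) -> float:
    """New/old dynamic power when C, V and f are each multiplied by a scale factor."""
    old = dynamic_power(1.0, 1.0, 1.0)
    new = dynamic_power(load_scale, voltage_scale, freq_scale)
    return new / old  # the constant 1/2 cancels out


# Hypothetical scaling: keep C, lower both voltage and frequency by 15%.
print(f"{power_ratio(1.0, 0.85, 0.85):.3f}")  # 0.614
```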

## Amdahl's law
pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance. Amdahl's law gives the execution time after the improvement:
$T_{improved} = {T_{affected} \over improvement\ factor} + T_{unaffected}$
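
A minimal Python sketch of the formula above; the helper names and the 80 s / 20 s split are hypothetical. It also shows the limit: no improvement factor can push the total below $T_{unaffected}$.

```python
def improved_time(t_affected: float, t_unaffected: float, improvement_factor: float) -> float:
    """Amdahl's law: only the affected part of the run is sped up."""
    return t_affected / improvement_factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float, improvement_factor: float) -> float:
    """Ratio of total time before to total time after the improvement."""
    before = t_affected + t_unaffected
    return before / improved_time(t_affected, t_unaffected, improvement_factor)


# Hypothetical program: 80 s in the part being improved, 20 s elsewhere.
print(improved_time(80.0, 20.0, 4.0))    # 40.0
print(overall_speedup(80.0, 20.0, 4.0))  # 2.5
# Even an enormous improvement factor stays below (80 + 20) / 20 = 5x.
print(overall_speedup(80.0, 20.0, 1e12))
```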
==EXAMPLE==
consider a computer running a program that requires 250 s, with 70 s spent executing FP (floating-point) instructions, 85 s executing L/S (load/store) instructions, 40 s executing branch instructions, and the rest on INT (integer) instructions
1. by how much is the total time reduced if the time for FP operations is reduced by 20%?
$250-(250-70 \times 0.2)=250-236=14s$
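2. can the total time be reduced by 20% by reducing only the time for branch instructions? why?
$250 \times 0.8 = {40 \over x}+(250-40) \Rightarrow {40 \over x} = -10$, so $x = -4 < 0$, which is impossible: even eliminating branch time entirely still leaves $210\ s > 200\ s$.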
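
Both answers can be checked with a small Python sketch; `reduced_total` and `required_factor` are hypothetical helpers, and the times are the ones given in the example.

```python
def reduced_total(total_s: float, part_s: float, fraction_cut: float) -> float:
    """Total time after cutting one component's time by fraction_cut."""
    return total_s - part_s * fraction_cut


def required_factor(total_s: float, part_s: float, target_s: float):
    """Improvement factor x such that part/x + (total - part) == target.

    Returns None when no finite positive x exists, i.e. when the
    unaffected time alone already meets or exceeds the target.
    """
    budget = target_s - (total_s - part_s)  # time left for the improved part
    if budget <= 0:
        return None
    return part_s / budget


TOTAL, FP, BRANCH = 250.0, 70.0, 40.0
print(TOTAL - reduced_total(TOTAL, FP, 0.2))        # reduction of about 14 s
print(required_factor(TOTAL, BRANCH, TOTAL * 0.8))  # None: a 20% cut is out of reach
```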
## MIPS (millions of instructions per second)
$$
\begin{aligned}
MIPS &= {instruction\ count \over execution\ time \times 10^6} \\
&={instruction\ count \over {instruction\ count \times CPI \over clock\ rate}\times10^6} \\
&= {clock\ rate \over CPI \times 10^6}
\end{aligned}
$$
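
A Python sketch confirming that the first and last forms of the derivation agree; the helper names and the machine parameters are hypothetical. Since the instruction count cancels, MIPS alone does not say how long a given program takes.

```python
def mips_from_counts(instruction_count: float, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_rate(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI * 10^6); the instruction count cancels out."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machine: 2 billion instructions, CPI of 2, 1 GHz clock.
ic, cpi, rate = 2e9, 2.0, 1e9
time_s = ic * cpi / rate                # 4.0 s
print(mips_from_counts(ic, time_s))     # 500.0
print(mips_from_rate(rate, cpi))        # 500.0
```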
==EXAMPLE==
| measurement | Processor A | Processor B |
| ------------------ | --------- | --------- |
| instruction counts | 8 billion | 6 billion |
| clock rate | 800 MHz | 700 MHz |
| CPI | 1.5 | 1.2 |
1. which processor has the higher MIPS
$MIPS_A={800 \times 10^6\over 1.5 \times 10^6} \approx 533.3$
$MIPS_B={700 \times 10^6 \over 1.2 \times 10^6} \approx 583.3$
hence processor B has the higher MIPS
2. which processor is faster
$CPU\ time_A = {8 \times 10^9 \times 1.5 \over 800 \times 10^6} = 15\ s$
$CPU\ time_B = {6 \times 10^9 \times 1.2 \over 700 \times 10^6} \approx 10.29\ s$
since $CPU\ time_A > CPU\ time_B$, processor B is faster
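
Both comparisons from the table can be reproduced in Python; the `Measurement` class is a hypothetical helper, and the numbers come from the table. Here the two metrics happen to agree, but only CPU time accounts for the instruction count.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    instruction_count: float
    clock_rate_hz: float
    cpi: float

    def mips(self) -> float:
        """MIPS = clock rate / (CPI * 10^6)."""
        return self.clock_rate_hz / (self.cpi * 1e6)

    def cpu_time_s(self) -> float:
        """CPU time = IC * CPI / clock rate."""
        return self.instruction_count * self.cpi / self.clock_rate_hz


a = Measurement(8e9, 800e6, 1.5)
b = Measurement(6e9, 700e6, 1.2)
for name, m in (("A", a), ("B", b)):
    print(f"{name}: MIPS = {m.mips():.1f}, CPU time = {m.cpu_time_s():.2f} s")
```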