CH3 - HackMD

# Performance Definition * For different target the index for estimate performance will be different 1. Personal Use : Execution Time 2. Server : Throughput * We define the $Performance = 1/Execution Time$.To determine that $PC_{a}$ is faster than $PC_{b}$ by *n* times and n is calculated by $Performance_a / Performance_b$ * **Performance is inversely proportion to Execution time** > ![](https://hackmd.io/_uploads/rke2zTzjn.png) * Since the I/O time and System CPU time is uncontrollable.We will focus on the **User CPU time**. # User Time ## Components of User Time * To avoid errors caused by the data transfer speed.We use a *clock* to synchronize the data and ensure the result is correct * Clock can be derived by $Clock = 1 / CPU\,Speed$ the *CPU Speed also known as **clock rate*** > ![](https://hackmd.io/_uploads/B1BqVazsn.png) > ![](https://hackmd.io/_uploads/SkJ7rpGi3.png) * To compare **different machine** we only need to focus on the $Avg\,CPI * Cycle\,Time$(*Intruction Time*) since the IC will remain the same when running same program. * To compare **different program** we focus on the $IC * Avg\,CPI$(*CPU Clocks*) since the cycle time will be the same --- > Example:![](https://hackmd.io/_uploads/B1EESTzi3.png) > For A the $Instruction\,time = 250 * 2 = 500$ > For B the $Instruction\,time = 500 * 1.2 = 600$ > =>A is faster than B by $600/500 = 1.2\ times$ > Example :![](https://hackmd.io/_uploads/SJVoP6zj3.png) > $Cycles = Execution time / clock\ rate = 10 / (1/2*10^9)$ = 20G > For B $Cycles_B = 20 * 1.2 = 24G$ the desired clock $speed = 24G/6 = 4Ghz$ > Example: > ![](https://hackmd.io/_uploads/Hy3XYpMj2.png) > ![](https://hackmd.io/_uploads/BJB0upGih.png) > 1. $Instructions_1 = 2 + 1 + 2 = 5$,$Instructions_2 = 4 + 1 + 1 = 6$ > 2. Sequence 2 > 3. $CPI_1 = (2*1+2*1+3*2)/5 = 2$, $CPI_2 = (4*1+1*2+3*1) = 1.5$ >Example: ![](https://hackmd.io/_uploads/rJ02YTfo3.png) > (a.) $IPC_1 = 20*10^9/2*10^9 = 10$,$IPC_2 = 30*10^9/1.5*10^9 = 20$,$IPC_3 = 90*10^9/3*10^9 = 30$ > (b.) P2 is slower than P1 by 7/10 = 0.7times => To speed up to 7s the clock rate needs to be $1.5/0.7 = 2.14\ Ghz$ > (c.) Same as above,the ratio is 0.9 => $30*0.9 = 27 * 10^9$ ## Factors that will influence the User Time >![](https://hackmd.io/_uploads/BJweipGj2.png) # Million Instructions Per Second(MIPS) * Can be used for estimate efficiency but **it's not absolutely correct** > ![](https://hackmd.io/_uploads/B1ZGh14sn.png) > And the instruction time can be expressed like below > ![](https://hackmd.io/_uploads/ryv2hyEj2.png) * Like what we said before,MIPS can't be used for compare PC's efficiency > ![](https://hackmd.io/_uploads/r1I_pJEj2.png) > 1. For CISC,one instruction can perform more difficult job than a RISC instruction > 2. Different program will have different MIPS since the instructions inside is different --- > Example: > ![](https://hackmd.io/_uploads/SJ0yJg4i2.png) > $MIPS = 10^7/0.25 * 10^9 = 40$ > $CPI = (100*10^6*0.25)/10^7 = 2.5$ > Example:![](https://hackmd.io/_uploads/HyTBbgVi2.png) > ![](https://hackmd.io/_uploads/ByBIWg4j3.png) > (a). $CPI_{M1} = (0.6+0.6+0.4) = 1.6$,$CPI_{M2} = (1.2+0.9+0.4) = 2.5$ > (b). $IT = 1/(MIPS)*10^6$ => $MIPS = 1/IT*10^6$. $MIPS_{M1} = 1/((1/{80*10^6})*1.6)*10^6 = 50$,$MIPS_{M2} = 1/((1/{100*10^6})*2.5)*10^6 = 40$ > (c). Choose the most frequent instructions to reduce its CPI => Reduce A'CPI to 1 => $CPI_{M2} = (0.6+0.9+0.4) = 1.9$ => $MIPS_{M2} = 1/((1/{100*10^6})*1.9)*10^6 = 52>50$ > Example: > ![](https://hackmd.io/_uploads/HkwW8g4ih.png) > Since every mul 4 instruction's CPI is reduced to 2(2 * 1) thus the time we saved is coming from it => $10-9 * 1Ghz = 1G = Total\ Clocks\ Saved$ => $1G/2 = 0.5G = Total\ Instruction\ Swapped$ > Example:![](https://hackmd.io/_uploads/S1A_Pl4o2.png) > ![](https://hackmd.io/_uploads/BJSoPlNs3.png) > 1. $MIPS_A = (L+F*N)/Y*10^6$ $MIPS_B = (L+F)/Z*10^6$ > 2. Put the numbers into the equation above => $Y = (L+F*N)/MIPS_A*10^6 = 100ms$,$Z = (L+F)/MIPS_B*10^6 = 91.67ms$ > 3. Yes,with the co-processor the excution time is faster. # Amdahl's Law * Used for describe the improvement of the execution time > ![](https://hackmd.io/_uploads/rJNfUt4o2.png) --- > Example:![](https://hackmd.io/_uploads/SybowtEon.png) > $Speedup = 10/(5/5)+5 = 5/3$ > Example:![](https://hackmd.io/_uploads/SJN-LtNi3.png) > Let total *time spend on fp arithmetic = x* => $3 = 100/(x/5)+(100-x) = 250/3$ --- * From the example above we can drived an formula for speedup by letting the *time affected = f* and *improvement = s* > ![](https://hackmd.io/_uploads/HJEl2Y4in.png) --- > Example:![](https://hackmd.io/_uploads/rJfjhFEsh.png) > $Time\ after\ speedup = 60/3 = 20$ => $20 = 20 + 40/s$ => s needs to be infinity that the speedup of 3 is achieveable => impossible > Example:![](https://hackmd.io/_uploads/ByP8JcNsn.png) > 1. False,clock rate increase the execution time will reduce > 2. True > 3. True > 4. False,$1/(0.4/40+0.6) = 1.63$ > 5. False,$1/(0.5/20+0.5 = 1.9$ > 6. True,$1/(0.6/10+0.4) = 2.17$ > Example:![](https://hackmd.io/_uploads/B1JgWcNs3.png) > (1).$1/(0.2/4+0.8) = 1.17$ > (2).$1/(0.5/2 + 0.5) = 1.3$ > (3).$1/(0.2/4+0.5/2+0.3) = 1.6$ > Example:![](https://hackmd.io/_uploads/By9cq0Hjn.png) > Each loop is independent => Can be perform in parallel > Without parallelized => $Cycles_{Before} = 20 + 200 * 200 = 40020$ > In parallel => $Cycles_{After} = 20 + 200 = 220$ > $Answer = 40020 / 220 = 181.91$ > Example:![](https://hackmd.io/_uploads/H1i9pAri2.png) > (a) Let `X` be the time used for excuting MMX => $2 = \frac{1}{\frac{x}{10}+(1-x)} => x = 55\%$ > (b) Let `T` be total time spend before and since the time spend on MMX mode is 10 times faster => $Answer = \frac{\frac{0.55T}{10}}{\frac{0.55T}{10}+0.45T} = \frac{\frac{0.55}{10}}{\frac{0.55}{10}+0.45} = 10.89\%$ > (c) If whole program is in MMX then the speedup will be 10 => half is 5 => $5 = \frac{1}{\frac{x}{10}+(1-x)} => x =88\%$ > Example:![](https://hackmd.io/_uploads/Skb4uk8oh.png) > Enhanced mode used 80% of the time means the *80% of the time spend after have been speedup* > (1). Since the time after is speedup 10 times => Let `T` be the time spend after => Original time spend on fast mode = $10 * 0.8 * T = 8T$ => $SpeedUp = \frac{8T}{8T+0.2T} = 8.2$ > (2). $Answer = \frac{8T}{8.2T} = 97\%$ # Performance Summarizing ## Arithmetic Mean * Consider a time of excution chart below > ![](https://hackmd.io/_uploads/S181eeIj2.png) * We can get performace by $\frac{Time_B}{Time_A}$,for example => for program 1 computer B is faster than computer A by $\frac{100}{20} = 5$ times faster * And if we take the arithmetic mean for all program, we can get a overall comparison for those two > ![](https://hackmd.io/_uploads/HkZe-lIo3.png) * To get a more accurate result,we can consider adding *weight* to different program ## Geometric Mean * We take one machine as reference and *normalized* the run time for other machine with it by the formula $SPECratio = \frac{Time_{Reference}}{Time_{Measured}}$ > ![](https://hackmd.io/_uploads/Sk_i7gLoh.png) * With the use of GM,we can compare machines without worrying the reference changed > ![](https://hackmd.io/_uploads/rJNeBxLih.png) * But it's **not suitable to compare** performance since it don't predict excution time > ![](https://hackmd.io/_uploads/S17iSlUj2.png) > From the chart above, we can see that B'S overall performance is better than A but the GM of both of them are the same --- > Example:![](https://hackmd.io/_uploads/SJI4Lg8in.png) >1. The excution time will increase by 10% >2. The excution time will increase by $1.1*1.05 = 1.155$ >3.The reference time is original time => $SPECratio = \frac{1}{1.155} = 0.86$ => Decreased by 14% > Example: > ![](https://hackmd.io/_uploads/BkvBwQwjh.png) > 1. $(1*0.1)+(3*0.1)+(2*0.5)+(5*0.1)+(4*0.2) = 2.7$ > 2. $10^9 * 2.7 * (10*10^-9) = 27sec$ > 3. Since the multiply can be replaced by the mul-add by 5% and it needs to perform arithmetic too,thus the percentage of arithmetic instruction will reduce by 5% too. > => Jump = 10% Branch = 10% Arithmetic = 45% Multiply = 5% Memory = 20% Mul-Add = 5% > * The total percentage isn't 100% when calculating CPI,we need to **divide the current total percentage** to get a correct answer => > $CPI = (1*0.1)+(3*0.1)+(2*0.45)+(5*0.05)+(4*0.2) + (5*0.05) / 0.95 = 2.74$ > 4. $10^9 * 0.95 * 2.74 * (10*10^-9) = 26.03sec$ > Example:![](https://hackmd.io/_uploads/HJnJaQDin.png) > 1. $(1*0.4)+(3*0.2)+(3*0.2)+(2*0.2) = 2$ > 2. Load > 3. ALU = 40% Load = 10% Store = 20% Branch = 20% => > $CPI = ((2*0.4)+(3*0.1)+(3*0.2)+(2*0.2)) / 0.9= 2.33$ > =>No,since both CPI and Cycle Time have increased,the excution time will also increase