
2016q3 Homework1 (Compute-pi)

contributed by <vic85821>

Development Environment

  • Ubuntu 16.04.1 LTS
  • Linux version 4.4.0-38-generic
  • CPU : Intel® Core™ i5-3230M CPU @ 2.60GHz
  • MEM : 8GB
  • L1d cache : 32K
  • L1i cache : 32K
  • L2 cache : 256K
  • L3 cache : 3072K

Background

In my first two years I never had to deal with time measurement, so this assignment is a chance to understand it once and for all.

1. sys / user / real time

  • system time

    • CPU time spent running kernel-mode code on behalf of the process (e.g. system calls)
  • user time

    • CPU time spent running the process's user-mode code
    • Only CPU time actually spent on the process counts; time spent blocked or time used by other processes is excluded

    user time + system time = CPU time the process consumes

  • real time

    • Can be thought of as "look at a watch when the program starts, look again when it ends, and take the difference"
    • Measured against wall-clock time
    • Includes time the process spends blocked (e.g. waiting for I/O)
  • Question (see the sketch after this list)

    • Is user time + system time = real time? (NO)
      • user time + system time > real time : in a multithreaded process the CPU time of every thread is summed, so the total can exceed the elapsed time
      • user time + system time < real time : the process spends part of the elapsed time blocked or interrupted (e.g. waiting for I/O)
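
A minimal sketch (my own, not part of the assignment) to observe the three quantities directly, using getrusage() for user/system time and gettimeofday() for real time:

#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* A busy loop accumulates user time; the sleep adds only to real time. */
int main(void)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; i++)   /* pure computation: user time */
        x += 1.0 / (i + 1);
    sleep(1);                               /* blocked: real time only */

    gettimeofday(&t1, NULL);

    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("user: %ld.%06ld s\n", (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec);
    printf("sys : %ld.%06ld s\n", (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
    printf("real: %f s\n", (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
    return 0;
}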

2. CPU / wall-clock time

  • CPU time
    • Time the process actually spends executing on the CPU; for a multithreaded process, CPU time is the sum over all threads
  • wall-clock time
    • The system derives it from the RTC (Real Time Clock)
    • As the name suggests, it is essentially the time elapsed in the real world, but it is not always strictly monotonically increasing (a quick demonstration follows below)
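
A quick way to see the difference between CPU time and wall-clock time (a sketch, not part of the assignment): time a sleep with both CLOCK_PROCESS_CPUTIME_ID (per-process CPU time) and CLOCK_MONOTONIC (a monotonic wall-clock-style timer). The CPU time stays near zero while the wall-clock reading grows by about one second.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double to_sec(struct timespec t)
{
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void)
{
    struct timespec cpu0, cpu1, wall0, wall1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    clock_gettime(CLOCK_MONOTONIC, &wall0);

    sleep(1);   /* blocked: consumes wall-clock time, almost no CPU time */

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);

    printf("CPU  time: %f s\n", to_sec(cpu1) - to_sec(cpu0));
    printf("wall time: %f s\n", to_sec(wall1) - to_sec(wall0));
    return 0;
}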

  • make check
time ./time_test_baseline
N = 400000000 , pi = 3.141593
3.60user 0.02system 0:03.62elapsed 100%CPU (0avgtext+0avgdata 1768maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
time ./time_test_openmp_2
N = 400000000 , pi = 3.141593
3.90user 0.00system 0:01.95elapsed 199%CPU (0avgtext+0avgdata 1584maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
time ./time_test_openmp_4
N = 400000000 , pi = 3.141593
7.24user 0.00system 0:01.95elapsed 370%CPU (0avgtext+0avgdata 1744maxresident)k
0inputs+0outputs (0major+93minor)pagefaults 0swaps
time ./time_test_avx
N = 400000000 , pi = 3.141593
1.64user 0.00system 0:01.64elapsed 99%CPU (0avgtext+0avgdata 1576maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
time ./time_test_avxunroll
N = 400000000 , pi = 3.141593
1.46user 0.00system 0:01.47elapsed 99%CPU (0avgtext+0avgdata 1788maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps

baseline version

double computePi_v1(size_t N)
{
    double pi = 0.0;
    double dt = 1.0 / N; // dt = (b-a)/N, b = 1, a = 0
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N; // x = ti = a+(b-a)*i/N = i/N
        pi += dt / (1.0 + x * x); // integrate 1/(1+x^2), i = 0....N
    }
    return pi * 4.0;
}

π is 4 times $\int_0^1 \frac{1}{1+x^2}\,dx$
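
For reference, the Riemann-sum step that links this formula to the loop above (left-endpoint rule, $\Delta t = 1/N$, $x_i = i/N$):

$$\int_0^1 \frac{dx}{1+x^2} = \arctan(1) - \arctan(0) = \frac{\pi}{4} \approx \sum_{i=0}^{N-1} \frac{\Delta t}{1+x_i^2} \quad\Rightarrow\quad \pi \approx 4\sum_{i=0}^{N-1} \frac{\Delta t}{1+x_i^2}$$

which is exactly what computePi_v1() accumulates before the final multiplication by 4.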

Discrete statistics (Monte Carlo)

A circle of radius 1 has area π.
So if we generate uniformly random points inside a square of side 2 that encloses the circle, then (points inside the circle) / (total points) ≈ π/4.
π = 4 * that ratio

double compute_pi_toss(size_t N)
{
    size_t sum = 0; /* must start at 0, otherwise the count begins with garbage */
    for(size_t i = 0; i < N; i++)
    {
        double x,y,z;
        x = (double)rand()/RAND_MAX;
        y = (double)rand()/RAND_MAX;
        z = x*x+y*y;

        if (z<=1) sum++;
    }
    return sum/(double)N * 4.0;
}

The error is clearly visible and the execution time is long:

N = 400000000 , pi = 3.141574
9.84user 0.03system 0:09.87elapsed 99%CPU (0avgtext+0avgdata 1792maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps

Leibniz

Formula:

$\frac{\pi}{4} = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$

Proof:
Start from the Riemann integral: $\frac{\pi}{4} = \arctan(1) = \int_0^1 \frac{1}{1+x^2}\,dx$

Decompose the integrand into a finite geometric series plus a remainder term:
$\frac{1}{1+x^2} = 1 - x^2 + x^4 - \cdots + (-1)^n x^{2n} + \frac{(-1)^{n+1} x^{2n+2}}{1+x^2}$

But x = 1 lies on the boundary of the radius of convergence, so an extra argument is needed: the integral of the remainder term is bounded by $\int_0^1 x^{2n+2}\,dx = \frac{1}{2n+3}$, which tends to 0.

As $n \to \infty$, the terms other than the remainder integral converge to the Leibniz series, so the series equals $\frac{\pi}{4}$.

Q.E.D.

Code

double compute_pi_leibniz_baseline(size_t N)
{
    double pi = 0.0;
    for (size_t i = 0; i < N; i++)
    {
        pi += pow(-1.0,i) / (2*i + 1);
    }
    return pi * 4.0; /* the partial sums converge to pi/4 */
}
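
The pow(-1.0, i) call is relatively expensive; a small variant (my own sketch, not the version that was benchmarked) carries the alternating sign in a variable instead:

/* Sketch: same Leibniz series, with the sign flipped manually instead of calling pow(). */
double compute_pi_leibniz_nopow(size_t N)
{
    double pi = 0.0;
    double sign = 1.0;
    for (size_t i = 0; i < N; i++) {
        pi += sign / (2 * i + 1);
        sign = -sign;            /* alternate +1, -1, +1, ... */
    }
    return pi * 4.0;             /* the series converges to pi/4 */
}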

Euler

Formula:

$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2}$, so $\pi = \sqrt{6 \sum_{n=1}^{\infty} \frac{1}{n^2}}$

Code

double compute_pi_eular_baseline(size_t N)
{
    double pi = 0.0;
    for (size_t i = 1; i < N; i++)
    {
        pi += 1 / pow(i,2);
    }
    return sqrt(6 * pi);
}

Chart (execution time)

  • Because I added one more method, debugging took forever Orz
  • toss is the discrete-statistics (Monte Carlo) method; it is clearly the slowest, because every iteration calls rand() for the x and y coordinates and then computes the distance to the origin
  • The differences between the other methods are invisible at this scale, so I removed toss and plotted again
  • The default colors looked ugly, so I looked up how to change them: plot sin(x) lt rgb "#FF0000"

  • Data: N from 100 to 250000 in steps of 4096

gnuplot script

reset
set xlabel 'N'
set ylabel 'time(sec)'
set style fill solid
set key left
set grid
set title 'compute pi time'
set term png enhanced font 'Verdana,10'
set datafile separator ","
set output 'runtime.png'

plot "result_clock_gettime.csv" using 1:2 title 'baseline' with lines lt rgb "#FF0000", \
     "" using 1:3 title 'openmp_2' with lines lt rgb "#FF4500", \
     "" using 1:4 title 'openmp_4' with lines lt rgb "#FFD700", \
     "" using 1:5 title 'avx' with lines lt rgb "#7FFF00", \
     "" using 1:6 title 'avxunroll' with lines lt rgb "#0000FF"

The chart shows that some of the measurements fluctuate.

Chart analysis

The fluctuation is most pronounced for "baseline" and "openmp_4".
In benchmark_clock_gettime.c, execution time is measured with clock_gettime():
int clock_gettime(clockid_t clk_id, struct timespec *tp);

struct timespec {
        time_t   tv_sec;        /* seconds */
        long     tv_nsec;       /* nanoseconds */
};

The program calls clock_gettime(CLOCK_ID, &start) with #define CLOCK_ID CLOCK_MONOTONIC_RAW. What is CLOCK_MONOTONIC_RAW?

  • CLOCK_MONOTONIC v.s. CLOCK_MONOTONIC_RAW
    • CLOCK_MONOTONIC : a monotonic system clock that cannot be set; it is derived from jiffies, but it is still slewed by the incremental corrections applied by adjtime()/NTP
    • CLOCK_MONOTONIC_RAW : similar to CLOCK_MONOTONIC, except that it is not subject to those adjustments and thus reflects the raw hardware-based time
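
For reference, a minimal timing harness in the same spirit (my own sketch; the actual benchmark_clock_gettime.c may differ in details such as function names):

#include <stdio.h>
#include <time.h>

#define CLOCK_ID CLOCK_MONOTONIC_RAW

/* Elapsed time between two timespec samples, in seconds. */
static double diff_in_second(struct timespec t1, struct timespec t2)
{
    return (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1e9;
}

double computePi_v1(size_t N);   /* baseline version shown earlier */

int main(void)
{
    struct timespec start, end;
    clock_gettime(CLOCK_ID, &start);
    double pi = computePi_v1(400000000);
    clock_gettime(CLOCK_ID, &end);
    printf("pi = %f, elapsed = %f sec\n", pi, diff_in_second(start, end));
    return 0;
}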

OpenMP

As an extra experiment, let's see whether the number of OpenMP threads affects this fluctuation!!

It looks like only the thread = 4 case is noticeably affected.
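
The OpenMP versions themselves are not reproduced in this note; the sketch below shows the usual shape such a version takes (assuming a plain parallel-for reduction; the actual time_test_openmp_* code may differ), with the thread count as the only knob varied above:

#include <omp.h>

/* Sketch of an OpenMP version of the integration: parallel for with a sum reduction. */
double compute_pi_openmp(size_t N, int threads)
{
    double pi = 0.0;
    double dt = 1.0 / N;
    #pragma omp parallel for num_threads(threads) reduction(+ : pi)
    for (size_t i = 0; i < N; i++) {
        double x = (double) i / N;
        pi += dt / (1.0 + x * x);
    }
    return pi * 4.0;
}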

Correction

Keep only the data that falls within the 95% confidence interval and plot again.

I suddenly realized my way of computing the confidence interval was wrong: I should repeat the measurement for each value of N and then take the values within the 95% confidence interval.

Leibniz and Euler take noticeably more time.

Code

  • Makefile
gencsv: default
	for i in `seq 100 4096 250000`; do \
		for j in `seq 1 1 100`; do \
			printf "%d," $$i;\
			./benchmark_clock_gettime $$i; \
		done > result_clock_gettime.csv ; \
         ./confidence_interval; \
	done > result_clock_gettime_refine.csv

plot: gencsv
	gnuplot runtime.gb
  • confidence_interval.c
/* NUM_DATA is the number of repeated measurements per N (100 in the Makefile loop). */

/* Arithmetic mean of num samples. */
double compute_avg(double *data, int num)
{
    double sum = 0.0;

    for (int i = 0; i < num; ++i)
        sum += data[i];

    return sum / num;
}

/* Sample standard deviation (divided by NUM_DATA - 1). */
double compute_std(double avg, double *data)
{
    double sum = 0.0;

    for (int i = 0; i < NUM_DATA; ++i)
    {
        sum += pow(data[i] - avg, 2);
    }

    return sqrt(sum / (double) (NUM_DATA - 1));
}

/* Keep samples within avg +/- 2*std: roughly the 95% interval if the samples are normally distributed. */
void compute_ci(double avg, double std, double *max, double *min)
{
    *max = avg + 2 * std;
    *min = avg - 2 * std;
}
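
The main() of confidence_interval is not shown above. A hypothetical way to wire these helpers together (placed in the same file, assuming #define NUM_DATA 100; the real I/O format may differ) would be:

/* Hypothetical usage: read NUM_DATA timing samples, drop those outside
 * avg +/- 2*std, and print the mean of the survivors. */
#include <stdio.h>

int main(void)
{
    double data[NUM_DATA];
    for (int i = 0; i < NUM_DATA; ++i)
        if (scanf("%lf", &data[i]) != 1)
            return 1;

    double avg = compute_avg(data, NUM_DATA);
    double std = compute_std(avg, data);
    double max, min;
    compute_ci(avg, std, &max, &min);

    double sum = 0.0;
    int kept = 0;
    for (int i = 0; i < NUM_DATA; ++i) {
        if (data[i] >= min && data[i] <= max) {
            sum += data[i];
            kept++;
        }
    }
    printf("%f\n", kept ? sum / kept : avg);
    return 0;
}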
