Try   HackMD

2019q1 Homework1 (lab0)

tags: sysprog 2019 2019spring

contributed by < czvkcui >

認真看作業要求,除了提及如何逐步達到要求,還需要進行:

  1. 改善程式效能
  2. 解釋自動評分系統運作的原理
  3. 提及 qtest 的行為和裡頭的技巧

還未達成符合預期目標,請繼續付出!

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

queue implement

q_new,q_free,q_insert_head,q_insert_tail,q_remove_head,q_size,q_reverse implement

q_new

typedef struct { list_ele_t *head, *tail; /* Linked list of elements */ /* You will need to add more fields to this structure to efficiently implement q_size and q_insert_tail */ size_t size; } queue_t; queue_t *q_new() { queue_t *q = malloc(sizeof(queue_t)); if (q) { q->head = NULL; q->tail = NULL; q->size = 0; } return q; }

size保存linked list 的長度
q_new 初始化queue_t結構

q_free

void q_free(queue_t *q) { /* Free queue structure */ if (!q) return; list_ele_t* t; while (q->head) { t = q->head->next; free(q->head->value); free(q->head); q->head = t; } free(q); }

q_insert_head

bool q_insert_head(queue_t *q, char *s) { list_ele_t *newh = malloc(sizeof(list_ele_t)); int len = strlen(s); if (!q) goto error; if (!newh) goto error; newh->value = malloc((len + 1) * sizeof(char)); if (!newh->value) goto error; strncpy(newh->value, s, len); newh->value[len] = '\0'; newh->next = q->head; if (!q->head) { q->tail = newh; } q->head = newh; q->size++; return true; error: if (newh) free(newh); return false; }

利用goto來統一處理例外狀況

q_insert_tail

bool q_insert_tail(queue_t *q, char *s) { list_ele_t *newh = malloc(sizeof(list_ele_t)); int len = strlen(s); if (!q) goto error; if (!newh) goto error; newh->value = malloc((len + 1) * sizeof(char)); if (!newh->value) goto error; strncpy(newh->value, s, len); newh->value[len] = '\0'; newh->next = NULL; if (!q->tail) { q->head = newh; q->tail = newh; } q->tail->next = newh; q->tail = newh; q->size++; return true; error: if (newh) free(newh); return false; }

q_remove_head

bool q_remove_head(queue_t *q, char *sp, size_t bufsize) { if (!q || !q->head) goto error; char* str = q->head->value; if (sp) { strncpy(sp, str, bufsize - 1); sp[bufsize - 1] = '\0'; } list_ele_t* t = q->head; q->head = q->head->next; free(t->value); free(t); q->size--; if (q->size == 0) q->tail = NULL; return true; error: return false; }

q_size

int q_size(queue_t *q) { if (!q || !q->head) return 0; return q->size; }

q_reverse

void q_reverse(queue_t *q) { if (!q || !q->head || !q->head->next) return; if (q->head == q->tail) return; q->tail = q->head; q->head = q->head->next; list_ele_t* back = q->tail, *curr = q->head->next; while (curr != NULL) { q->head->next = back; back = q->head; q->head = curr; curr = curr->next; } q->head->next = back; q->tail->next = NULL; }

解釋自動評分系統運作的原理

scripts

git hook

we have 2 hooks inside our scripts:

  1. pre-commit.hook
  2. commit-msg.hook

pre-commit.hook

  • trigger on git commit command
  • check format is suitable with clang format
  • use cppcheck checking

commit-msg.hook

  1. check commit message with regular expression

qtest command interpret

初始化階段:

queue_init();
init_cmd();
console_init();
set_verblevel(level);

init_cmd和console_init會向console註冊所使用的command

command parse

main->run_console->cmd_select->interpret_cmd->interpret_cmda

interpret_cmda執行

cmd_ptr next_cmd = cmd_list;
bool ok = true;
while (next_cmd && strcmp(argv[0], next_cmd->name) != 0)
        next_cmd = next_cmd->next;
if (next_cmd) {
    ok = next_cmd->operation(argc, argv);
    if (!ok)
        record_error();
} else {
    ... // error handling
}

cmd_list是一個link list,它保存了queue_initinit_cmd所註冊的command。透過循序搜尋找到符合的cmd,然後執行對應的函式(next_cmd->operation)。

report.[hc]

static void check_exceed(size_t new_bytes)
{
    size_t limit_bytes = (size_t) mblimit << 20;
    size_t request_bytes = new_bytes + current_bytes;
    if (mblimit > 0 && request_bytes > limit_bytes) {
        report_event(MSG_FATAL,
                     "Exceeded memory limit of %u megabytes with %lu bytes",
                     mblimit, request_bytes);
    }
}

Q:Why limit_bytes = mblimit<<20 ?
A:mblimit 作為判斷malloc有無超出上限,mblimit<<20代表最高可以使用的大小(KB),在本例中0<<20還是為0(mblimit預設為0表示無上限)

void init_time(double *timep)
{
    (void) delta_time(timep);
}

Q:Why we need (void) before delta_time(timep) ?
A:某些編譯器可能會警告使用者沒有處理返回值,加上(void)來處理返回值(轉形成void),但是像是printf等常用的function可以在宣告時加上__attribute__(noreturn)來避免。

在gcc中 -Wall 預設會開啟Wunused-result,不過在我的平台上(gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0),刪除(void)和加上-Wunused-result都沒有效果。此外也可以在function宣告加上__attribute__((warn_unused_result))強迫警告使用者處理返回值。

getrusage:

獲取資源的使用情形

/* Number of bytes resident in physical memory */
size_t resident_bytes()
{
    struct rusage r;
    size_t mem = 0;
    int code = getrusage(RUSAGE_SELF, &r);
    if (code < 0) {
        report_event(MSG_ERROR, "Call to getrusage failed");
    } else {
        mem = r.ru_maxrss * 1024;
    }
    return mem;
}

declaration:
int getrusage(int who, struct rusage *usage);
usage接受一個參數負責返回結果,而who則有以下參數

  • RUSAGE_SELF:
    只統計本身process的使用資訊(包含thread)
  • RUSAGE_CHILDREN:
    會統計所有children和calling的process的資訊
  • RUSAGE_THREAD(linux 2.6.26以後):
    統計calling thread的資訊
    rusage structure:
struct rusage {
               struct timeval ru_utime; /* user CPU time used */
               struct timeval ru_stime; /* system CPU time used */
               long   ru_maxrss;        /* maximum resident set size */
               long   ru_ixrss;         /* integral shared memory size */
               long   ru_idrss;         /* integral unshared data size */
               long   ru_isrss;         /* integral unshared stack size */
               long   ru_minflt;        /* page reclaims (soft page faults) */
               long   ru_majflt;        /* page faults (hard page faults) */
               long   ru_nswap;         /* swaps */
               long   ru_inblock;       /* block input operations */
               long   ru_oublock;       /* block output operations */
               long   ru_msgsnd;        /* IPC messages sent */
               long   ru_msgrcv;        /* IPC messages received */
               long   ru_nsignals;      /* signals received */
               long   ru_nvcsw;         /* voluntary context switches */
               long   ru_nivcsw;        /* involuntary context switches */
           };

console.[hc]

cmd_select ,run_console:

bool run_console(char *infile_name)
{
    if (!push_file(infile_name)) {
        report(1, "ERROR: Could not open source file '%s'", infile_name);
        return false;
    }
    while (!cmd_done()) {
        cmd_select(0, NULL, NULL, NULL, NULL);
    }
    return err_cnt == 0;
}

harness.[hc]

test_malloc and test_free

queue.c因為有include harness.h 因此呼叫的mallocfreetest_malloctest_free
struct of block_ele_t

typedef struct BELE {
    struct BELE *next;
    struct BELE *prev;
    size_t payload_size;
    size_t magic_header; /* Marker to see if block seems legitimate */
    unsigned char payload[0];
    /* Also place magic number at tail of every block */
} block_ele_t;

payload[0]是C99後引入的語法,可以允許使用者在runtime時才決定該array大小。

block_ele_t *new_block =
        malloc(size + sizeof(block_ele_t) + sizeof(size_t));

size是是真正要給使用者使用的空間大小(bytes),加上block_ele_t的大小,sizeof(size_t)則是儲存footer的空間。在test_malloc中,block_ele_t會依序填上MAGICHEADERsizeFOOTERpayload,接著將next指向link list的開頭(allcated則指向新的new block,舊的prev指標指向新的block),再由allocated_count++後完成記憶體分配。test_free會檢查要被釋放的block的MAGICHEADERMAGICFOOTER。當payload存取的資料超過payload的大小時會覆蓋自己的MAGICFOOTERMAGICHEADER則可以檢查自己的block是否被別人所覆蓋[1]。最後test_free將block移出double link list,allocated_count--釋放block。

trigger_exception,exception_setup,exception_cancel

jmp_read的type是volatile,在參考from C 語言 setjmp 與 longjmp 函數用法教學後得知:

  • 未開啟最佳化options(-O)只有 register 宣告的變數會改變(not change:volatile.global,static,local)
    • with -O option: register,local宣告的變數值均會改變
  • sigsetjmp,siglongjmp:
    • setjmplongjmp一樣都是nonlocal jmp
    • 但是setjmplongjmp並不會在jump儲存當前的signal mask
void trigger_exception(char *msg)
{
    error_occurred = true;
    error_message = msg;
    if (jmp_ready)
        siglongjmp(env, 1);
    else
        exit(1);
}

當exception觸發(SIGSEGV,SIGALRM),jmp_rady會被trigger成true,而進行setlongjmp至上一個exception_setup位置(if (sigsetjmp(env, 1))),印出錯誤訊息後返回。
exception_cancel:清除'jmp_ready'和error_message,若是超時則會觸發alarm。

qtest.c

bool do_new(int argc, char *argv[])
{
    if (argc != 1) {
        report(1, "%s takes no arguments", argv[0]);
        return false;
    }
    bool ok = true;
    if (q != NULL) {
        report(3, "Freeing old queue");
        ok = do_free(argc, argv);
    }
    error_check();
    if (exception_setup(true))
        q = q_new();
    exception_cancel();
    qcnt = 0;
    show_queue(3);
    return ok && !error_check();
}

根據exception_setupexception_cancel可以得知

static void queue_init()
{
    fail_count = 0;
    q = NULL;
    signal(SIGSEGV, sigsegvhandler);
    signal(SIGALRM, sigalrmhandler);
}

queue_init會使qtest在初始化時註冊SIGSEGVSIGALRM,當中斷發生時OS會通知signal handler做對應的處理,在本例中SIGSEGV發生時會交由sigsegvhandler,而SIGALARMsigalrmhandler處理,兩者都是印出錯誤訊息後,呼叫trigger_exception

效能改善

Test in parallel

修改driver.py 以 process pool來執行:
Performance counter stats for 'python scripts/driver.py'(with process pool):

​​​​   1276.391265      task-clock (msec)         #    1.327 CPUs utilized          
​​​​           842      context-switches          #    0.660 K/sec                  
​​​​            58      cpu-migrations            #    0.045 K/sec                  
​​​​       135,961      page-faults               #    0.107 M/sec                  
​​​​ 3,941,655,638      cycles                    #    3.088 GHz                    
​​​​ 6,034,081,563      instructions              #    1.53  insn per cycle         
​​​​ 1,167,962,182      branches                  #  915.050 M/sec                  
​​​​     2,832,393      branch-misses             #    0.24% of all branches        

​​​​   0.961677493 seconds time elapsed

Performance counter stats for 'python scripts/driver.py':

​​​​   1007.687420      task-clock (msec)         #    0.797 CPUs utilized          
​​​​           803      context-switches          #    0.797 K/sec                  
​​​​            12      cpu-migrations            #    0.012 K/sec                  
​​​​       131,010      page-faults               #    0.130 M/sec                  
​​​​ 3,107,377,907      cycles                    #    3.084 GHz                    
​​​​ 5,928,785,192      instructions              #    1.91  insn per cycle         
​​​​ 1,146,581,106      branches                  # 1137.834 M/sec                  
​​​​     2,129,417      branch-misses             #    0.19% of all branches        

​​​​   1.264532373 seconds time elapsed

In [2]: (1.264532373-0.961677493)/1.264532373 * 100
Out[2]: 23.949950706402355

compiler opt

在用perf record分析時,發現有些部份可能可以交由編譯器最佳化,例如每次malloc都會被呼叫但是回傳值都是一樣的fail_allocation

  • orig
        21,424,519      cache-misses              #   91.576 % of all cache refs      ( +-  0.05% )
        23,395,281      cache-references                                              ( +-  0.17% )
     1,580,167,387      instructions              #    1.51  insn per cycle           ( +-  0.05% )
     1,049,546,666      cycles                                                        ( +-  1.31% )
           246,893      branch-misses                                                 ( +-  2.07% )

       0.343512614 seconds time elapsed                                               ( +-  1.37% )
  • gcc -O2 flag
        21,198,677      cache-misses              #   92.116 % of all cache refs      ( +-  0.06% )
        23,013,085      cache-references                                              ( +-  0.14% )
     1,242,986,343      instructions              #    1.39  insn per cycle           ( +-  0.05% )
       893,054,339      cycles                                                        ( +-  1.19% )
           232,820      branch-misses                                                 ( +-  1.79% )

       0.291908910 seconds time elapsed  
  • gcc -O3 flag
        21,257,036      cache-misses              #   92.071 % of all cache refs      ( +-  0.06% )
        23,087,689      cache-references                                              ( +-  0.15% )
     1,245,395,797      instructions              #    1.36  insn per cycle           ( +-  0.06% )
       916,791,967      cycles                                                        ( +-  1.35% )
           243,474      branch-misses                                                 ( +-  1.96% )

       0.299784791 seconds time elapsed                                               ( +-  1.45% )
  • clang -O2 flag
        21,258,614      cache-misses              #   91.996 % of all cache refs      ( +-  0.04% )
        23,108,203      cache-references                                              ( +-  0.16% )
     1,242,551,526      instructions              #    1.34  insn per cycle           ( +-  0.06% )
       924,695,143      cycles                                                        ( +-  1.43% )
           261,872      branch-misses                                                 ( +-  8.10% )

       0.302822045 seconds time elapsed                                               ( +-  1.53% )
  • clang -O3 flag
        21,243,947      cache-misses              #   92.033 % of all cache refs      ( +-  0.03% )
        23,082,922      cache-references                                              ( +-  0.11% )
     1,241,602,442      instructions              #    1.35  insn per cycle           ( +-  0.05% )
       917,623,562      cycles                                                        ( +-  1.30% )
           249,315      branch-misses                                                 ( +-  5.23% )

       0.299945914 seconds time elapsed                                               ( +-  1.28% )

驗證常數時間複雜度[2]

Reference


  1. 0xff07筆記 ↩︎

  2. afcidk筆記 ↩︎