contributed by <ruby0109
>
反省這兩個禮拜糟糕的自己, 完全不行阿QAQQQ
參考了CheHsuan 和 TempoJiJi兩位同學的筆記之後, 發現自己沒有驗證原本的程式正確性, mmap沒有下去探討, 學習thread pool等沒有先從小程式實作,和遇到問題不會拆解所以卡住。以下加入預期目標, 繼續完成作業。
之前進度:phonebook
開始實作之前, 我先看了吳彥寬同學程式解說video, 筆記如下:
fgets
本身就可以讀行, 所以可以簡化
補充fgets基礎知識:
根據c99 規格書,
#include <stdio.h>
char *fgets(char * restrict s, int n,FILE * restrict stream);
description:
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additionalcharacters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
理解:
fgets讀取stream指向的stream最多n-1個字元到s指向的陣列中, 遇到換行符號或是檔案結尾字元就不會再讀, '\0' 會寫在陣列最後一個字元之後。
strncasecmp
:
#include <strings.h>
int strcasecmp(const char *s1, const char *s2);int strncasecmp(const char *s1, const char *s2, size_t n);
from linux manual:
strcasecmp() :function compares the two strings s1 and s2, ignoring the case of the characters. It returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be less than, to match, or be greater than s2.The strncasecmp() function is similar, except it compares the only first n bytes of s1
strncasecmp比較s1前n bytes和s2, 比較時忽略大小寫, 若s1和s2相同傳回0; s1長度大於s2傳大於0值; 小於則傳負值。
memory pool 和 pthread 這兩個部份看影片仍然不太明白
可以在findname放其他資料,而不是在append整個放進去(findname或許會些微上升, 可是不會很多)append進去的只有lastName 的指標, findname時再放入lastName
整個append用pthread下去做而不是只有mmap(還不懂為什麼原本只做mmap時間會上升)
先執行一次看看, make plot之後:
Performance counter stats for './phonebook_orig' (100 runs):
4,457,101 cache-misses # 91.778 % of all cache refs( +- 0.07% )
4,856,373 cache-references ( +- 0.05% )
263,262,629 instructions # 1.69 insns per cycle ( +- 0.02% )
155,898,150 cycles ( +- 0.06% )
0.076228638 seconds time elapsed ( +- 0.31% )
Performance counter stats for './phonebook_opt' (100 runs):
3,548,725 cache-misses # 61.592 % of all cache refs( +- 0.25% )
5,761,661 cache-references ( +- 0.46% )
224,875,100 instructions # 1.93 insns per cycle ( +- 0.04% )
116,299,224 cycles ( +- 0.42% )
0.278607164 seconds time elapsed ( +- 2.06% )
優化後的cache miss略下降一些。
file align
先看在main.c中被include的file.c產生的align.txt, 直接點開的話裡面只有aaaa,用vim 的話則是:
aaaa
^@^@^@^@^@^@^@^@^@^@^@aaaaa
^@^@^@^@^@^@^@^@^@^@aaaaaa
^@^@^@^@^@^@^@^@^@aaaaaaa
^@^@^@^@^@^@^@^@aaaaaaaa
^@^@^@^@^@^@^@aaaaaaaah
^@^@^@^@^@^@aaaaaaauugh
^@^@^@^@aaaaaagh
^@^@^@^@^@^@^@aaaaaahhhhh
^@^@^@^@aaaaaaugh
^@^@^@^@^@^@aaaaagh
^@^@^@^@^@^@^@^@aaaaah
^@^@^@^@^@^@^@^@^@aaaarthur
^@^@^@^@^@^@aaaaw
^@^@^@^@^@^@^@^@^@^@aaagghhhh
^@^@^@^@^@^@aaah
^@^@^@^@^@^@^@^@^@^@^@aaaugh
^@^@^@^@^@^@^@^@^@aaccf
不是很懂^@是什麼, 參考what is ^@ character in vi?才知道是NULL bytes。不過奇怪的是,首先第一行沒有^@,再來每一行的行數不同,像第7行有17個,第8行只有12個。
先用GDB觀察main.c跳進file_align function後的執行情形。在wbuf和rbuf都有align, 顯示出來的值也都正確。rbuf的大小似乎可以調小一點, 畢竟也有措施避免strcpy的overflow。
linked list的資料是否正確
試試看原本同學寫的debug.h,跑完之後發現在終端機不太方便測試是否有遺漏的資訊, 要怎麼比對呢? 可以直接用findname
// use findname to test whether all elements in the list
// open file for test and origin word.txt
FILE *test,*org;
test = fopen("test","a");
org = fopen(DICT_FILE, "r");
char line[MAX_LAST_NAME_SIZE];
// use findname to test whether all elements in the list
while(fgets(line, sizeof(line),org)){
e = pHead;
if(findName(line, e) == NULL) fprintf(test,"%s not found\n",line);
}
fclose(org);
fclose(test);
直接把words.txt檔的東西丟下去找, 可是這樣若找尋350000筆,每筆0.03897要尋找約22分鐘。發現TempoJiJi大大直接另外寫一個測, 才發現原來這個版本沒有Hash
寫了entry_test.c 可是有segmentation fault
debug還是卡了好久, 用GDB下去看發現是在append_target中, 第60行的地方, malloc執行到第三個word就會直接掛掉。
把他改成小一點的程式, 縮小錯誤範圍:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#define MAX_LAST_NAME_SIZE 16
#define HASH_SIZE 42737
#define DICT_FILE "./dictionary/words.txt"
typedef struct __PHONE_BOOK_ENTRY {
char *lastName;
struct __PHONE_BOOK_ENTRY *pNext;
} entry;
entry *append_target(char lastName[], entry *e)
{
e->pNext = (entry *)malloc(sizeof(entry));
e = e->pNext;
strcpy(e->lastName, lastName);
e->pNext = NULL;
return e;
}
int main(void)
{
FILE *org;
org = fopen("./dictionary/words.txt","r");
char line[MAX_LAST_NAME_SIZE];
entry *target, *tHead;
tHead = (entry *)malloc(sizeof(entry));
target = tHead;
target -> pNext = NULL;
while(fgets(line,sizeof(line),org)) {
target = append_target(line,target);
}
fclose(org);
free(tHead);
free(tHead->pNext);
}
gdb跑出的結果:
Program received signal SIGSEGV, Segmentation fault.
__strcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S:605
605 ../sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S: 沒有此一檔案或目錄.
發現是struct 忘記改><
感謝徐朝逸大大幫我看codeQAQ
幾經波折, 把所有菜鳥可以犯的錯犯過一遍然後debug完後, 發現有一長串都沒有在裡面, 這裡擷取一小部份:
aaaa
not found
angelika
not found
anklet
not found
antescript
not found
antirobin
not found
assiento
not found
發現我的程式是錯的!!雖然有幾個字母帶進原程式會發現的確沒在裡面, 可以很多都是跑得出來的。把append和findname找不到的資料互相比較, 發現其實有append進去, 可是findName沒有找到, 原來是findName的地方有寫錯, 這樣代表之前自己的code也有錯…
正確性解決
把程式改好之後, 跑出來的結果:
aaaa
not found
aaaaa
not found
aaaaaa
not found
aaaaaaa
not found
因為pthrea跑的方式是
thread0:0 4 8..
thread1:1 5 9..
thread2:2 6 10..
thread3:3 7 11..
所以可以推測這可能是pthread各自list的開頭有問題。
仔細看之後發現在main.c 中的pHead = stack[i]->pHead->pNext和etmp->pNext = stack[i]->pHead->pNext 應該要把pNext去掉!!
if ((suffix = (pad - strlen(rbuf))) != 0)
strcpy(wbuf, rbuf);
typedef struct _Thread_Stack_ {
char *StartAdrs;
char *EndAdrs;
int tid;
int nthread;
entry *PoolPtr;
entry *pHead;
entry *pTail;
} ThrdStack;
筆記:
concurrency in computer world is a way to build things, concurrent is a composition of
independently executing things typically funcitons but they don't have to be(independently executing processes)這裡的process泛指各種thread, coroutine等
理解:concurrency就像是一種方法, 由各個獨立執行的函式(包含process, thread, coroutine?等等)組成, 可是方法不一定要採用。
parallelism is simultaneously executing lots of multiple thing, possible related possible not
理解:parallelism是同時執行很多東西,可能相關,也可能互相獨立。
concurrency is about dealing with a lot of things at once and
parallelism is about doing a lot of things at once
理解:concurrency是一次處理很多東西, 但不一定要同時執行, parallelism 就是同時執行很多東西。
concurrency is about the structure
parallelism is about the execution
concurrency is a structure things that you can maybe use parallelism to do a better job
but parallism is not the goal of concurrency
concurrency's goal is a good structure
ex. OS deal with lots of things, but these things aren't necessarily parallel, only one processor only one of them is
ever running the time, so there are concurrencty structure, but not necessary parallel
理解:parallism不是concurrency的目標, 就像是OS也是一次處理很多東西, 而且不一定要都同時執行。也就是concurrency是一個結構,可是不一定要用parallism的方式執行。
parallelism像是一個箭頭,可以分成不同的方向
concurrency gives you a structure program different pieces, but we need to find out how to coordinate those pieces
and communication
理解:concurrency給我們一個架構, 可以把程式分成不同的區塊, 可是我們必須找出怎麼協調和讓這些區塊可以溝通
後面的GO就聽不是很懂了><
The Free Lunch Is Over
A Fundamental Turn Toward Concurrency in Software
By Herb Sutter
這30年來, CPU 設計者主要藉由改善以下來達到效能的提昇
可是發展到了現在 clock speed很難再提昇, 因為如果再提昇會有無法散去高熱, 太耗能跟電流液出的情形
Myths and Realities: 2 x 3GHz < 6 GHz
未來可以藉由以下來繼續提高效能
Concurrency is the next major revolution in how we write software
作業中的concurrency:
參考筆記並閱讀The Free Lunch Is Over, 對於重構的兩則常見誤解
valgrind:
偵測undefined behavior
valgrind ./phonebook_opt
function and memory profiler
data-race detection(覺得這好酷!!)
valgrind --tool=helgrind ./phonebook_opt
可以用來看memory leak
--leak-check=full
14872 Memcheck, a memory error detector
14872 Copyright © 2002-2013, and GNU GPL'd, by Julian Seward et al.
14872 Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
14872 Command: ./phonebook_opt
14872
size of entry : 24 bytes
execution time of append() : 0.118666 sec
execution time of findName() : 0.002815 sec
14872
14872 HEAP SUMMARY:
14872 in use at exit: 2,215 bytes in 16 blocks
14872 total heap usage: 28 allocs, 12 frees, 8,408,637 bytes allocated
14872
14872 16 bytes in 1 blocks are definitely lost in loss record 1 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x400F23: file_align.4663 (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x40101B: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 16 bytes in 1 blocks are definitely lost in loss record 2 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x4013F6: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 16 bytes in 1 blocks are definitely lost in loss record 3 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x40142A: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 16 bytes in 1 blocks are definitely lost in loss record 4 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x40166E: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x40147E: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 24 bytes in 1 blocks are definitely lost in loss record 5 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x401050: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 107 bytes in 1 blocks are definitely lost in loss record 8 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x4013F6: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 107 bytes in 1 blocks are definitely lost in loss record 9 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x40142A: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 107 bytes in 1 blocks are definitely lost in loss record 10 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x4016B1: findName (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x40147E: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 192 bytes in 4 blocks are definitely lost in loss record 11 of 13
14872 at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
14872 by 0x401709: ThrdInitial (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872 by 0x4011EC: main (in /home/ruby/2016_autumn/W2/phonebook-concurrent/phonebook_opt)
14872
14872 LEAK SUMMARY:
14872 definitely lost: 601 bytes in 12 blocks
14872 indirectly lost: 0 bytes in 0 blocks
14872 possibly lost: 0 bytes in 0 blocks
14872 still reachable: 1,614 bytes in 4 blocks
14872 suppressed: 0 bytes in 0 blocks
14872 Reachable blocks (those to which a pointer was found) are not shown.
14872 To see them, rerun with: –leak-check=full –show-leak-kinds=all
14872
14872 For counts of detected and suppressed errors, rerun with: -v
14872 ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 0 from 0)
問題:看到一些文章說free並不是必須, 可是大部分是好的, 比如In C, is it necessary to free a pointer at exit?, What REALLY happens when you don't free after malloc?, Does calling free or delete ever release memory back to the “system” 要視情況而定, 可是有種很難體會的感覺。
Solve memory leak
在findName中, 不需要把lastname放進entry->lastName 中, 因為strncasecmp就是用pointer即可。
學習李昆憶大大的重構方式
優化append
結果:
struct threadpool_t {
pthread_mutex_t lock;
pthread_cond_t notify;
pthread_t *threads;//放置所有thread空間的指標
threadpool_task_t *queue;//放置所有task空間的指標
int thread_count;
int queue_size;
int head;
int tail;
int count;
int shutdown;
int started;
};
threads的struct是pthread_t;
task queue的struct則是由funciton和argument組成
typedef struct {
void (*function)(void *);
void *argument;
} threadpool_task_t;
再來看執行的流程圖
TypeError: y.shiftX is not a function
不懂的地方:
next = (pool->tail + 1) % pool->queue_size;
pool->head = (pool->head + 1) % pool->queue_size;
上面這個是像一個ring, 有head和tail在兩端移動
從圖上可以發現四個thread的時候最快, 8個以上時間就變得相當高
改進:
學習github.ignore
thread pool影響
ruby
2016_Autumn
HW2
phonebook-concurrent