2016q3 Homework2 (phonebook-concurrency)

contributed by <kobeyu>

tags: `kobeyu`

Assigned reading

The Free Lunch Is Over

Concurrency vs Parallelism

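The most obvious difference between concurrency and parallelism is the number of CPUs involved. Concurrency splits the work into several small tasks that a single CPU handles by interleaving them, whereas parallelism means multiple CPUs compute at the same time. Parallelism does require that the pieces of data being processed have no ordering constraints and no dependencies on one another.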
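To make the "no dependencies" requirement concrete, here is a minimal sketch with pthreads. The names `parallel_sum`, `slice_t`, and `NUM_WORKERS` are made up for illustration and do not come from the homework code. Each thread sums its own non-overlapping slice of the array, so the threads share nothing they write to and no lock is needed.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4 /* hypothetical worker count, not from the homework */

/* One slice per thread; slices never overlap, so no locking is needed. */
typedef struct {
    const int *data;
    size_t begin, end; /* half-open range [begin, end) */
    long partial;      /* written only by the owning thread */
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    s->partial = 0;
    for (size_t i = s->begin; i < s->end; i++)
        s->partial += s->data[i];
    return NULL;
}

/* Stores the sum of data[0..n) in *out. Returns 0 on success, -1 on failure. */
int parallel_sum(const int *data, size_t n, long *out)
{
    pthread_t tid[NUM_WORKERS];
    slice_t slice[NUM_WORKERS];
    size_t chunk = n / NUM_WORKERS;

    for (int t = 0; t < NUM_WORKERS; t++) {
        slice[t].data = data;
        slice[t].begin = (size_t)t * chunk;
        /* The last worker also takes the remainder. */
        slice[t].end = (t == NUM_WORKERS - 1) ? n : (size_t)(t + 1) * chunk;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0) {
            for (int j = 0; j < t; j++) /* wait for threads already started */
                pthread_join(tid[j], NULL);
            return -1;
        }
    }

    long total = 0;
    for (int t = 0; t < NUM_WORKERS; t++) {
        pthread_join(tid[t], NULL); /* partial is safe to read after join */
        total += slice[t].partial;
    }
    *out = total;
    return 0;
}
```

Whether the four threads really run in parallel depends on how many cores the machine has. On a single core the same program still works, but the threads are only interleaved, which is the concurrency case described above.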

Sequenced-before

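When writing a program, we intuitively assume that it executes in the order we wrote it, but that is not necessarily what happens. Sequenced-before is the relation that describes this ordering: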

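If A is sequenced-before B, the evaluation of A completes before the evaluation of B begins.
If A is not sequenced-before B and B is sequenced-before A, the evaluation of B completes before the evaluation of A begins.
If A is not sequenced-before B and B is not sequenced-before A, there are two possibilities: the evaluations are indeterminately sequenced (one completes before the other, but which comes first is unspecified), or they are unsequenced, in which case the two evaluations may even overlap (because the compiler or CPU may interleave their instructions).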
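The three cases can be sketched in C roughly as follows. The helper names (`a`, `b`, `order_with_comma`, `sum_unordered`, `last_trace`) are hypothetical and exist only for this illustration. The comma operator forces an order, the two operands of `+` do not, and the truly unsequenced case is shown only in a comment because executing it is undefined behavior.

```c
#include <stddef.h>

static char trace[8]; /* records the order in which a() and b() ran */
static size_t trace_len;

static int a(void) { trace[trace_len++] = 'A'; return 1; }
static int b(void) { trace[trace_len++] = 'B'; return 2; }

/* The comma operator introduces a sequence point:
 * a() is sequenced-before b(), so the trace is always "AB". */
const char *order_with_comma(void)
{
    trace_len = 0;
    (void)(a(), b());
    trace[trace_len] = '\0';
    return trace;
}

/* The operands of + are not ordered relative to each other. The two calls
 * are indeterminately sequenced: they never overlap, but the compiler may
 * run either one first. The result is 3 either way. */
int sum_unordered(void)
{
    trace_len = 0;
    int r = a() + b();
    trace[trace_len] = '\0';
    return r;
}

/* Trace left behind by the most recent call above: "AB" or "BA". */
const char *last_trace(void) { return trace; }

/* Unsequenced example, deliberately NOT executed:
 *     int i = 0;
 *     i = i++ + 1;   // two unsequenced modifications of i: undefined behavior
 */
```

Code that relies on the trace of `sum_unordered()` being "AB" may happen to work with one compiler and break with another, which is exactly why the ordering rules matter.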

Code walkthrough (phonebook-concurrency)

Directory structure

.
├── LICENSE
├── Makefile
├── README.md
├── calculate.c
├── debug.h
├── dictionary
│   ├── Prime-10000.csv
│   ├── all-names.txt
│   ├── female-names.txt
│   ├── male-names.txt
│   ├── tolowercase.c
│   ├── words.txt
│   └── words_test.txt
├── file.c
├── file_align.c
├── main.c
├── phonebook_opt
├── phonebook_opt.c
├── phonebook_opt.h
├── phonebook_orig
├── phonebook_orig.c
├── phonebook_orig.h
└── scripts
    ├── install-git-hooks
    ├── pre-commit.hook
    └── runtime.gp

2 directories, 24 files

file.c

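This project adds a new file, file.c. Reading the code shows that it takes each entry from the original words.txt and pads it to a fixed size (the file defines `#define MAX_BUFF_SIZE 1000`). As a result, the 3206080 bytes of words.txt become 5598400 bytes in align.txt. Note that 5598400 is not a multiple of 1000 (it equals 349900 × 16), so the actual per-record width is presumably a smaller constant than MAX_BUFF_SIZE. I am still thinking about why this is done and whether there is a better way.
=> The purpose of the padding is to make every line the same length. Previously, reading a line needed a for loop that scanned for \0 or \n. With fixed-length records, the file can be mapped with mmap and each line read directly at a known offset, which is more efficient.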
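The padding idea can be sketched as below. This is not the actual file.c: `pad_file`, `map_records`, and `record_at` are hypothetical helpers, and `RECORD_SIZE` is assumed to be 16 only because 5598400 divides evenly by 16. The sketch also assumes every input line is shorter than `RECORD_SIZE`.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define RECORD_SIZE 16 /* assumed pad width, not taken from file.c */

/* Copy each line of `in` to `out`, padded with '\0' to RECORD_SIZE bytes.
 * Returns the number of records written, or -1 on error. */
long pad_file(const char *in, const char *out)
{
    FILE *fin = fopen(in, "r");
    if (!fin)
        return -1;
    FILE *fout = fopen(out, "w");
    if (!fout) {
        fclose(fin);
        return -1;
    }

    char line[RECORD_SIZE + 2]; /* room for the text, '\n' and '\0' */
    long count = 0;
    while (fgets(line, sizeof line, fin)) {
        char rec[RECORD_SIZE] = {0};
        size_t len = strcspn(line, "\n"); /* length without the newline */
        if (len > RECORD_SIZE - 1)
            len = RECORD_SIZE - 1;        /* keep a terminating '\0' */
        memcpy(rec, line, len);
        if (fwrite(rec, 1, RECORD_SIZE, fout) != RECORD_SIZE) {
            count = -1;
            break;
        }
        count++;
    }
    fclose(fin);
    if (fclose(fout) != 0)
        count = -1;
    return count;
}

/* Map the padded file read-only. Returns NULL on failure. */
const char *map_records(const char *path, size_t *size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return p;
}

/* Record i starts at a fixed offset, so no scan for '\n' is needed. */
const char *record_at(const char *base, size_t i)
{
    return base + i * RECORD_SIZE;
}
```

Because every record has the same size, the mapped file can also be divided evenly among threads: thread t can start at `record_at(base, t * records_per_thread)` without having to read anything that comes before it.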

main.c

pthread_setconcurrency

man:

NOTES
       The default concurrency level is 0.

       Concurrency levels are meaningful only for M:N threading implementations, where at any moment a subset of a process's set of user-level
       threads may be bound to a smaller number of kernel-scheduling entities. Setting the concurrency level allows the application to give
       the system a hint as to the number of kernel-scheduling entities that should be provided for efficient execution of the application.

       Both LinuxThreads and NPTL are 1:1 threading implementations, so setting the concurrency level has no meaning. In other words, on
       Linux these functions merely exist for compatibility with other systems, and they have no effect on the execution of a program.
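A small sketch of how the call pair can be exercised. `request_concurrency` is a hypothetical wrapper, not something in main.c. It sets a level and reads back whatever `pthread_getconcurrency()` reports. The `_GNU_SOURCE` define is an assumption about what glibc headers need in order to declare these functions.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* assumed necessary for glibc to declare these functions */
#endif
#include <pthread.h>

/* Ask for a concurrency level and return the level reported afterwards,
 * or -1 if the request was rejected. The value is only a hint: per the
 * man page notes, it does not change scheduling on Linux. */
int request_concurrency(int level)
{
    if (pthread_setconcurrency(level) != 0)
        return -1;
    return pthread_getconcurrency();
}
```

A caller could print `request_concurrency(4)` to see what the local C library does with the hint. Whatever is reported, a 1:1 implementation still schedules each thread as its own kernel entity, so the call can be dropped on Linux without changing behavior.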