2016q3 Homework2 (phonebook-concurrent)

contributed by <shouchengH>

malloc、calloc、alloca 比較

malloc : 向 heap 申請空間，一開始無做初始化 (type為void *，回傳NULL表示無空間)，所以需要 memset。realloc 則是用來調整空間大小，作法為重新宣告一段空間，在搬之前空間到目前空間 (回傳NULL表示ptr為原來位址，否則為新位址空間)，使用free 釋放。 void *malloc(size_t size);
calloc : 類似malloc，將空間初始化設置 0，sbrk則是增加、釋放空間大小，void *sbrk(intptr_t increment); ，increment + 則分配空間，- 則釋放空間， void *calloc(size_t nmemb, size_t size);
alloca : 向 stack 申請空間，所以不需要釋放，因為在某function內宣告，只會存在此function。 void *alloca( size_t size );
alloc : alloc.h 不是一個在ANSI(American Standards Association)的函式庫，其中定義alloca、malloc 等函式。
C99 SPEC中出現：

extern void *alloc(size_t);
double *dp = alloc(sizeof *dp);

alloc function需要回傳一個大小與宣告一樣的空間 (sizeof *dp -> double 大小)

測試結果 ：分別使用-std=c99、C11、C90 測試alloc 、 aligned_alloc 、 malloc， aligned_alloc 用於C11， C11 SPEC 中定義 void *aligned_alloc(size_t alignment, size_t size); 。 alloc 並無被定義

目前參考這兩篇 :

free只需要傳開頭的指標，不需要傳大小，因為malloc如果宣告12byte，他12byte+cookie來紀錄狀況
使用 malloc 分配出來的每個區塊會有一些管理這些區塊的額外的開銷，稱為 "cookie"，這些額外開銷可能有 4~32 個 bytes，如果你分配的是小區塊，那這些額外開銷就會變得非常可觀
- 如果額外的開銷是 4 bytes，那你每新建一個 4 bytes 的區塊，額外開銷就佔 50%
sysconf

編譯器最佳化

編譯器最佳化會造成程式碼順序變動， int a 表示 alloca i32, align 4 ，每次存取空間會有load/store ，
Static Single Assignment : 編譯器每個變數只能被賦值一次，所以會採用再每次賦值時加上一個編號。如果遇到if else ，會採用a₃ = Φ(a₁, a₂)，等判斷完才給其值

Example 1 ->
int foo() {					int foo() {
	int a;						int a;
	a = 0;						a1 = 0;
	a = 1;						a2 = 1;
	return a;					return a2;
}						}
此時第二個foo 可以直接回傳a2

Constant Propagation : 將b以數值代入，可以少一個load的時間，是一種 Copy Propagation

Example 2 ->
int foo(int a) {				int foo(int a) {
	int b = 10;					int b = 10;
	return a + b;					return a + 10;
}						}

Copy Propagation : 去掉變數不必要的複製，減少 cache miss

Example 3 ->
b = a;						b = a;
c = b;						c = a;

Inline Function : 這樣可以減少函數呼叫的時間，如果運算過大(如遞迴)，編譯器不會執行Inline

Example 4 ->
int main() {					int main() {
	square(2);					{
	square(4);						x = 2;	
}								printf("%d", x*x);
							}
inline void square(int x){				{
	printf("%d", x*x)					x = 4;
}								printf("%d", x*x);
							}
						}

Dead Code Elimination : 將多餘的程式碼去除，使用上述方法，會出現很多不被需要的程式碼
Common Subexpression Elimination : 可以減少許多多餘的運算

Example 5 ->
#define STEP 3 
#define SIZE 100 
int i, j, k, p[SIZE]; 				j = 2 * STEP;
for (i = 0; i < SIZE; ++i) { 			for (i = 0; i < SIZE; ++i) 
	j = 2 * STEP; 					k = p[i] * j;
	k = p[i] * j; 
}

Loop Unroll : 再HW1介紹過

GCC 提供-O0(不做最佳化)、-O1、-O2、-O3、-Os ，可將C程式碼轉成組合語言，分別看其最佳化結果

MMAP

所有user建立的執行檔都是user space，OS kenrel的執行程式碼則是kernel space 程式碼。在32位元的CPU，LINUX把0~3G的虛擬位址空間分給user space，3G~4G的虛擬位址空間分給kernel space
通常 I/O 對應的位址在kernel space ，不由 user 隨意使用，要使用則需使用system call
使用MMAP, user 可以載user space直接使用記憶體配置的 file 空間
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
void *mmap(void *start,size_t length, int prot, int flags, int fd, off_t offsize);
start : 起始位置，通常設為NULL，kernel 會自己找尋位置
length : 映射大小 (文件大小)，
prot : PROT_EXEC 映射區域可被執行
PROT_READ 映射區域可被讀取
PROT_WRITE 映射區域可被寫入
PROT_NONE 映射區域不能存取
flags : MAP_SHARED 允許其他映射該文件的行程共享，對映射區域的寫入資料會複製回文件
fd : open 的回傳值，-1代表

回傳值 ：回傳起始位置

POSIX Thread

Thread = 4的結果圖
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
方法一：使用CPU榜定，預想如果OS只讓thread 都執行在同一個CPU，所以讓thread分別榜定在某一個CPU上，達成加快效能

#define __USE_GNU 
#include <unistd.h>
#include <sched.h>

CPU_ZERO(&mask);      
CPU_SET(count%num, &mask); 
sched_setaffinity(app->tid+1, sizeof(mask), &mask);

Image Not Showing Possible Reasons