contributed by < stanleytazi >
stanleytazi
hw4
phonebook
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 69
Model name: Intel(R) Core(TM) i5-4260U CPU @ 1.40GHz
Stepping: 1
CPU MHz: 926.484
CPU max MHz: 2700.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 3072K
NUMA node0 CPU(s): 0-3
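Before class, the instructor listed other students' shared notes that are worth exploring. I found that the questions csielee wants to investigate are very similar to mine and go even deeper, so they are a good reference. They include: what pthread_setconcurrency() actually does; separating the implementation from the interface so that different implementations can be swapped in more easily; and calling mmap() directly instead of doing file alignment + mmap().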
The pthread_setconcurrency() function informs the implementation of
the application's desired concurrency level, specified in new_level.
The implementation takes this only as a hint: POSIX.1 does not
specify the level of concurrency that should be provided as a result
of calling pthread_setconcurrency().
...
CONFORMING TO
POSIX.1-2001, POSIX.1-2008.
Concurrency levels are meaningful only for M:N threading
implementations, where at any moment a subset of a process's set of
user-level threads may be bound to a smaller number of kernel-
scheduling entities. Setting the concurrency level allows the
application to give the system a hint as to the number of kernel-
scheduling entities that should be provided for efficient execution
of the application.
Both LinuxThreads and NPTL are 1:1 threading implementations, so
setting the concurrency level has no meaning. In other words, on
Linux these functions merely exist for compatibility with other
systems, and they have no effect on the execution of a program.
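The last paragraph shows that, according to the manual, pthread_setconcurrency() and pthread_getconcurrency() exist on Linux only for compatibility and have no effect on the execution of a program. Next, we can use experimental data to see how the execution time differs when this function is not called.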
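The experiment described above could be set up roughly as follows. This is a minimal sketch and not the assignment's code: `timed_run`, `spin`, and `N_THREADS` are hypothetical names, and the worker only burns CPU so that the one difference between runs is whether `pthread_setconcurrency()` is called.

```c
#define _XOPEN_SOURCE 700   /* exposes pthread_setconcurrency() in glibc headers */
#include <pthread.h>
#include <time.h>

#define N_THREADS 4         /* assumption: one thread per logical CPU listed above */

/* Hypothetical worker: burns a little CPU so there is something to time. */
static void *spin(void *arg)
{
    volatile unsigned long *acc = arg;
    for (unsigned long i = 0; i < 200000UL; i++)
        *acc += i;
    return NULL;
}

/*
 * Run N_THREADS workers and return the elapsed wall-clock time in seconds.
 * If use_hint is non-zero, pthread_setconcurrency(N_THREADS) is called first.
 * Returns a negative value if a thread could not be created.
 */
double timed_run(int use_hint)
{
    pthread_t tid[N_THREADS];
    unsigned long acc[N_THREADS] = {0};
    struct timespec start, end;
    int created = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* Only a hint, so its return value is deliberately ignored here. */
    if (use_hint)
        (void) pthread_setconcurrency(N_THREADS);

    for (; created < N_THREADS; created++)
        if (pthread_create(&tid[created], NULL, spin, &acc[created]) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);

    clock_gettime(CLOCK_MONOTONIC, &end);

    if (created != N_THREADS)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `timed_run(0)` and `timed_run(1)` several times each and comparing the averages, rather than single runs, helps separate a real effect from run-to-run noise.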
for (int i = 0; i < THREAD_NUM; i++) {
    if (i == 0) {
        pHead = thread_args[i]->lEntry_head;
        // orig: pHead = thread_args[i]->lEntry_head->pNext;
        DEBUG_LOG("Connect %d head string %s %p\n", i,
                  pHead->lastName, thread_args[i]->data_begin);
    } else {
        e->pNext = thread_args[i]->lEntry_head;
        // orig: e->pNext = thread_args[i]->lEntry_head->pNext;
        DEBUG_LOG("Connect %d head string %s %p\n", i,
                  e->pNext->lastName, thread_args[i]->data_begin);
    }
    e = thread_args[i]->lEntry_tail;
    DEBUG_LOG("Connect %d tail string %s %p\n", i,
              e->lastName, thread_args[i]->data_begin);
    DEBUG_LOG("round %d\n", i);
}
int count_orig = 0;
while (fgets(line, sizeof(line), fp)) {
    /* find the end of the line, then overwrite the trailing '\n' */
    while (line[i] != '\0')
        i++;
    line[i - 1] = '\0';
    i = 0;
    e = append(line, e);
    count_orig++;
}
CheHsuan's shared notes
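Seeing this reminded me that this function (mmap()) was covered in the OS course: it maps file I/O into memory, after which the data can be manipulated directly in memory, avoiding excessive read()/write() calls.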
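To make the idea concrete, the following is a minimal sketch of reading a file through `mmap()` instead of a `read()` loop. The helper name `count_lines_mmap` is hypothetical and does not come from the phonebook code; it only counts newline bytes in the mapped region.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Count '\n' bytes in a file by mapping it into memory.
 * Returns the number of newlines, or -1 on error.
 */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    char *map = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (map == MAP_FAILED)      /* failure is MAP_FAILED, not NULL */
        return -1;

    long lines = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (map[i] == '\n')
            lines++;

    munmap(map, (size_t) st.st_size);
    return lines;
}
```

After the single `mmap()` call the file contents are read with ordinary pointer accesses, and the kernel pages the data in on demand.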
It turns out that the execution times do differ, and by more than 20%. So where is my understanding of the man page wrong? Unless pthreads on Linux are actually M:N?
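One way to approach the M:N question without relying on timings is to ask the C library which thread implementation it provides. The sketch below assumes glibc, where `confstr()` accepts the `_CS_GNU_LIBPTHREAD_VERSION` extension; `thread_impl_name` is a hypothetical helper. On glibc the reported string typically begins with "NPTL", which the man page excerpt above describes as a 1:1 implementation.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Write the thread implementation name reported by the C library into buf
 * as a NUL-terminated string. Returns 1 on success and 0 if the library
 * does not report one; in that case buf is set to the empty string.
 */
int thread_impl_name(char *buf, size_t len)
{
    if (len == 0)
        return 0;
    buf[0] = '\0';
#ifdef _CS_GNU_LIBPTHREAD_VERSION   /* glibc extension, may be absent elsewhere */
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (n == 0) {
        buf[0] = '\0';
        return 0;
    }
    return 1;
#else
    return 0;
#endif
}
```

If this does report NPTL, a difference of more than 20% would more plausibly come from measurement variance than from the concurrency hint, so repeating the runs and comparing averages is a reasonable next step.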
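/* Open the dictionary and map the whole file into memory. */
int fd = open(DICT_FILE, O_RDONLY | O_NONBLOCK);
off_t file_size = fsize(DICT_FILE);
/* MAP_PRIVATE + PROT_WRITE: append() overwrites '\n' with '\0' in a private copy. */
map = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
/* mmap() reports failure with MAP_FAILED, not NULL. */
assert(map != MAP_FAILED && "mmap error");
data_end = map;
size_t sizeForThread = file_size/THREAD_NUM;
for (int i = 0; i < THREAD_NUM; i++) {
    thread_args[i].data_start = data_end;
    if (i != (THREAD_NUM-1)) {
        /* jump to the nominal boundary, then advance to the next newline */
        data_end = map + (i + 1) * sizeForThread;
        while (*data_end != '\n')
            data_end++;
        thread_args[i].data_end = data_end;
        data_end++;
    } else {
        /* the last thread takes everything up to the end of the file */
        thread_args[i].data_end = map + file_size;
    }
    thread_args[i].entry_list_head = NULL;
    thread_args[i].entry_list_tail = NULL;
}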
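The partitioning step above can also be written as a standalone function. This is a sketch under stated assumptions, not the code from the notes: `chunk` and `split_on_newlines` are hypothetical names, each `end` here points one past the newline (the original stores a pointer to the newline itself), and every scan is bounded by the end of the buffer so that a file without a trailing newline cannot be overrun.

```c
#include <stddef.h>

/* Hypothetical descriptor for the byte range handed to one thread. */
typedef struct {
    const char *start;  /* first byte of the chunk */
    const char *end;    /* one past the last byte of the chunk */
} chunk;

/*
 * Split buf[0..len) into n chunks of roughly len/n bytes each, moving every
 * cut forward so that it lands just after a '\n'. Requires n >= 1.
 */
void split_on_newlines(const char *buf, size_t len, chunk *out, int n)
{
    const char *limit = buf + len;
    const char *cur = buf;
    size_t target = len / (size_t) n;

    for (int i = 0; i < n; i++) {
        out[i].start = cur;
        if (i == n - 1) {
            cur = limit;                    /* last chunk takes the rest */
        } else {
            const char *cut = buf + (size_t)(i + 1) * target;
            if (cut < cur)
                cut = cur;                  /* previous chunk already passed this point */
            while (cut < limit && *cut != '\n')
                cut++;
            cur = (cut < limit) ? cut + 1 : limit;
        }
        out[i].end = cur;
    }
}
```

With very long lines or a small file, some trailing chunks may come out empty (`start == end`), which a worker can treat as having nothing to do.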
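/* Thread start routine: pthread_create() expects the signature void *(*)(void *). */
void *append(void *arg)
{
    thread_arg *t_arg = (thread_arg *) arg;
    char *data = t_arg->data_start;   /* start of the current line */
    int w = 0;                        /* offset within the current line */
    int count = 0;
    entry *e = NULL;
    while (data < t_arg->data_end) {
        if (*(data+w) == '\n') {
            count++;
            e = (entry *)malloc(sizeof(entry));
            /* point into the mapped buffer instead of copying the name */
            e->lastName = data;
            *(data+w) = '\0';
            data+=(w+1);
            w = 0;
            /* the first entry created stays at the tail; new ones are pushed at the head */
            if (!t_arg->entry_list_tail)
                t_arg->entry_list_tail = e;
            e->pNext = t_arg->entry_list_head;
            t_arg->entry_list_head = e;
        }
        w++;
    }
    t_arg->count = count;
    pthread_exit(NULL);
}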
2.27 │3f: mov -0x20(%rbp),%eax
2.27 │ movslq %eax,%rdx
2.27 │ mov -0x18(%rbp),%rax
2.27 │ add %rdx,%rax
4.55 │ movzbl (%rax),%eax
25.00 │ cmp $0xa,%al
│ ↓ jne c9
│ count++;
4.55 │ addl $0x1,-0x1c(%rbp)
│ e = (entry *)malloc(sizeof(entry));
11.36 │ mov $0x18,%edi
│ → callq malloc@plt
2.27 │ mov %rax,-0x8(%rbp)
│ e->lastName = data;
│ mov -0x8(%rbp),%rax
6.82 │ mov -0x18(%rbp),%rdx
│ mov %rdx,(%rax)
│ *(data+w) = '\0';
4.55 │ mov -0x20(%rbp),%eax
│ movslq %eax,%rdx
│ mov -0x18(%rbp),%rax