2016q3 Homework 2 (phonebook-concurrent)

# 2016q3 Homework 2 (phonebook-concurrent) contributed by <`carolc0708`> [作業說明 A06: phonebook-concurrent](https://hackmd.io/s/rJsgh0na) ## 作業要求 - [ ]學習第二週提及的 [concurrency](/s/H10MXXoT) 程式設計: [WEEK2 課程筆記](https://hackmd.io/s/B1Rv5Z_0) - [ ]延續 [A01: phonebook](/s/S1RVdgza) 的開發方向，本作業著重於透過 POSIX Thread 來縮減 `alloc()` 的時間成本 * 詳細閱讀吳彥寬的[實驗](/s/BkN1JNQp)，指出他的實做缺失，並提出改進縮減 append() 時間的可行提案，接著開發程式來驗證 * 提示：可透過建立 thread pool 來管理 worker thread * 第一週 phonebook 未完成和待改進項目也一併在 [phonebook-concurrent](https://github.com/sysprog21/phonebook-concurrent) 的基礎下進行 - [ ] 學習 [concurrent-ll](https://github.com/jserv/concurrent-ll) (concurrent linked-list 實作) 的 scalability 分析方式，透過 gnuplot 製圖比較 list 操作的效能 ## mmap() * 在未優化的程式中，使用了 `fgets()` 會 blocking I/O :::info **blocking I/O & non-blocking I/O** * Blocking IO means that a given thread cannot do anything more until the IO is fully received (in the case of sockets this wait could be a long time). For blocking IO you either need to accept that you are going to wait for every IO request or you are going to need to fire off a thread per request (Which will get very complicated very quickly). * Non-blocking IO means an IO request is queued straight away and the function returns. The actual IO is then processed at some later pointer by the kernel. For non-blocking IO you can send off multiple requests but you need to bear in mind that the data will not be available until some "later" point. This checking that the data has actually arrived is probably the most complicated part. ::: 這樣一來，即使將程式改為 multithread 版本也無法保證效能提升。故使用 mmap() 先將記憶體內容映射到對應的記憶體空間，做快速的記憶體存取。 * 參考 [記憶體映射函數 mmap 的使用方法](http://welkinchen.pixnet.net/blog/post/41312211-%E8%A8%98%E6%86%B6%E9%AB%94%E6%98%A0%E5%B0%84%E5%87%BD%E6%95%B8-mmap-%E7%9A%84%E4%BD%BF%E7%94%A8%E6%96%B9%E6%B3%95) :::info MMAP(2) Linux Programmer's Manual MMAP(2) NAME mmap, munmap - map or unmap files or devices into memory SYNOPSIS ``` #include <sys/mman.h> void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset); int munmap(void *addr, size_t length); ``` DESCRIPTION **mmap()** creates a new mapping in the virtual address space of the calling process. The starting address for the new mapping is specified in addr. The length argument specifies the length of the mapping. If addr is NULL, then the kernel chooses the address at which to create the mapping; this is the most portable method of creating a new mapping. If addr is not NULL, then the kernel takes it as a hint about where to place the mapping; on Linux, the mapping will be created at a nearby page boundary. The address of the new mapping is returned as the result of the call. The contents of a file mapping (as opposed to an anonymous mapping; see MAP_ANONYMOUS below), are initialized using length bytes starting at offset offset in the file (or other object) referred to by the file descriptor fd. offset must be a multiple of the page size as returned by sysconf(_SC_PAGE_SIZE). **munmap()** The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region. The address addr must be a multiple of the page size. All pages containing a part of the indicated range are unmapped, and subsequent references to these pages will generate SIGSEGV. It is not an error if the indicated range does not contain any mapped pages. ::: ## dprintf() `int dprintf(int fd, const char *format, ...);` The function dprintf() is the same as fprintf(3) except that it outputs to a file descriptor, fd, instead of to a stdio stream. ## pthread_setconcurrency() NAME pthread_setconcurrency, pthread_getconcurrency - set/get the concurrency level SYNOPSIS #include <pthread.h> int pthread_setconcurrency(int new_level); int pthread_getconcurrency(void); Compile and link with -pthread. DESCRIPTION The pthread_setconcurrency() function informs the implementation of the application's desired concurrency level, specified in new_level. The implementation takes this only as a hint: POSIX.1 does not specify the level of concurrency that should be provided as a result of calling pthread_setconcurrency(). Specifying new_level as 0 instructs the implementation to manage the concurrency level as it deems appropriate. pthread_getconcurrency() returns the current value of the concurrency level for this process. RETURN VALUE On success, pthread_setconcurrency() returns 0; on error, it returns a nonzero error number. pthread_getconcurrency() always succeeds, returning the concurrency level set by a previous call to pthread_setconcurrency(), or 0, if pthread_setconcur‐ rency() has not previously been called. ERRORS pthread_setconcurrency() can fail with the following error: EINVAL new_level is negative. POSIX.1 also documents an EAGAIN error ("the value specified by new_level would cause a system resource to be exceeded"). VERSIONS These functions are available in glibc since version 2.1. ATTRIBUTES For an explanation of the terms used in this section, see attributes(7). ┌──────────────────────────┬───────────────┬─────────┐ │Interface │ Attribute │ Value │ ├──────────────────────────┼───────────────┼─────────┤ │pthread_setconcurrency(), │ Thread safety │ MT-Safe │ │pthread_getconcurrency() │ │ │ └──────────────────────────┴───────────────┴─────────┘ ## PThread 來縮減 alloc() 的時間成本 #### 先建置一般的 Thread Pool ![](https://i.imgur.com/w7R8x3P.png) * 複習 condition variable 的概念 ![](https://i.imgur.com/WRWJzKh.png) :::info * **pthread_cond_wait()** blocks the calling thread until the specified condition is signalled. This routine should be called while mutex is locked, and it will automatically release the mutex while it waits. After signal is received and thread is awakened, mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking mutex when the thread is finished with it. * Recommendation: Using a WHILE loop instead of an IF statement (see watch_count routine in example below) to check the waited for condition can help deal with several potential problems, such as: * If several threads are waiting for the same wake up signal, they will take turns acquiring the mutex, and any one of them can then modify the condition they all waited for. * If the thread received the signal in error due to a program bug The Pthreads library is permitted to issue spurious wake ups to a waiting thread without violating the standard. * The **pthread_cond_signal()** routine is used to signal (or wake up) another thread which is waiting on the condition variable. It should be called after mutex is locked, and must unlock mutex in order for pthread_cond_wait() routine to complete. * The **pthread_cond_broadcast()** routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state. * It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait(). * Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example: * Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block. * Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked). ::: * 參考 [Threadpool 程式碼](https://github.com/mbrossard/threadpool/blob/master/src/threadpool.c)，實作 threadpool 在此使用 `thread_num = 4`, `queue_size = 16` ，執行時間與沒有 threadpool 的 optimized 版本相比其實沒有差很多。 ![](https://i.imgur.com/mkzPh71.png) #### Lock-free Thread Pool ![](https://i.imgur.com/SShhQqR.png) * 參考 [LFT pool](https://github.com/xhjcehust/LFTPool) 程式碼，實作 lock-free threadpool 在此使用 `thread_num = 4`，結果 append() 執行時間暴增。 ![](https://i.imgur.com/KM75HK3.png)