# Mutex and Semaphore Mutex、Semaphore 跟 Binary Semaphore 的差別網路上講解的非常清楚。Google 搜尋的答覆不外乎以下幾種: 1. Ownership 的概念 2. 使用上本質的差異(資料保護或執行緒同步) 3. Mutex 可以,但 Semaphore 所不能解決的問題 (e.g. priority inversion,recursive deadlock) ++**但,你用的 Mutex 真的是你知道的 Mutex 嗎?**++ ## 實驗設計 考慮三條 thread A, B, C Thread A, B 想辦法去取得 critical section 的操作後並 lock 住,但不釋放 lock。因此當 A 取得資源後,B 便會 waiting,反之亦然。而 Thread C 負責從第三方去實現 unlock。 ### 預期行為 mutex 因為有 **ownership** 的概念,上述的操作是不被允許。解鎖將會失敗 (mutex 的所有權在 A,但 C 想去解鎖),然後未取得資源的 thread 會 deadlock。 而 binary semaphore 則會成功。 考慮下方兩個實作的比較: Binary Semaphore v.s. Mutex ```shell $ gcc file.c -o file -lpthread ``` - [ ] pthread binary Semaphore ```clike #include <stdio.h> #include <pthread.h> #include <semaphore.h> #include <unistd.h> sem_t bin_sem; void* lock_thread(void* arg) { sem_wait(&bin_sem); printf("thread %d Entered..\n", (int) pthread_self()); return NULL; } void* unlock_thread(void* arg) { printf("thread %d try to unlock\n", (int) pthread_self()); sem_post(&bin_sem); return NULL; } int main() { sem_init(&bin_sem, 0, 1); pthread_t t1, t2, t3; pthread_create(&t1, NULL, lock_thread, NULL); pthread_create(&t2, NULL, lock_thread, NULL); sleep(2); pthread_create(&t3, NULL, unlock_thread, NULL); pthread_join(t1, NULL); pthread_join(t2, NULL); pthread_join(t3, NULL); sem_destroy(&bin_sem); return 0; } ``` 執行結果: ```shell thread 778938112 Entered.. thread 762152704 try to unlock thread 770545408 Entered.. ``` - [ ] Mutex ```clike #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <semaphore.h> #include <unistd.h> #include <errno.h> #include <string.h> pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; void* lock_thread(void* arg) { pthread_mutex_lock(&mutex); printf("thread %d Entered..\n", (int) pthread_self()); return NULL; } void* unlock_thread(void* arg) { printf("thread %d try to unlock\n", (int) pthread_self()); int err = pthread_mutex_unlock(&mutex); if (err) fprintf(stderr, "Value of errno: %s\n", strerror(err)); return NULL; } int main() { pthread_t t1, t2, t3; pthread_create(&t1, NULL, lock_thread, NULL); pthread_create(&t2, NULL, lock_thread, NULL); sleep(2); pthread_create(&t3, NULL, unlock_thread, NULL); pthread_join(t1, NULL); pthread_join(t2, NULL); pthread_join(t3, NULL); pthread_mutex_destroy(&mutex); return 0; } ``` 執行結果: ```shell thread 1699403520 Entered.. thread 1610610432 try to unlock thread 1691010816 Entered.. ``` **哪尼,Mutex 就給我這樣解鎖了??** ~~太沒潔操了吧~~ <img src="https://i.imgur.com/EL0OMV8.png" style="width:300px;display: block;margin:auto;"> --- **事情並不單純,這樣怎麼跟面試官交代。** 參考 [man page `pthread_mutex_unlock()`](https://linux.die.net/man/3/pthread_mutex_lock) 裏面有提到 If the mutex type is **PTHREAD_MUTEX_DEFAULT**, attempting to recursively lock the mutex results in undefined behavior. **Attempting to unlock the mutex if it was not locked by the calling thread results in undefined behavior.** Attempting to unlock the mutex if it is not locked results in undefined behavior. > 原來是 undefined behavior (UB) 的部份阿,但從實驗結果來看,UB 還是將資源釋放出來了,這樣不是跟 binary semaphore 一樣了嗎? > [name=ShaoHua Wang] 如果有仔細地看 [man page `pthread_mutex_unlock()`](https://linux.die.net/man/3/pthread_mutex_lock) 會發現 If the mutex type is **PTHREAD_MUTEX_ERRORCHECK**, then error checking shall be provided. If a thread attempts to relock a mutex that it has already locked, an error shall be returned. **If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, an error shall be returned.** <img src="https://i.imgur.com/1sC3qRU.png" style="width:300px;display: block;margin:auto;"> --- 再參考以下實作: ### Mutex w/ Error Check ```clike #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <semaphore.h> #include <unistd.h> #include <errno.h> #include <string.h> // Error Check Type Mutex pthread_mutex_t mutex = PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP; void* lock_thread(void* arg) { //wait pthread_mutex_lock(&mutex); printf("thread %d Entered..\n", (int) pthread_self()); return NULL; } void* unlock_thread(void* arg) { printf("thread %d try to unlock\n", (int) pthread_self()); int err = pthread_mutex_unlock(&mutex); if (err) fprintf(stderr, "Value of errno: %s\n", strerror(err)); return NULL; } int main() { pthread_t t1, t2, t3; pthread_create(&t1, NULL, lock_thread, NULL); pthread_create(&t2, NULL, lock_thread, NULL); sleep(2); pthread_create(&t3, NULL, unlock_thread, NULL); pthread_join(t1, NULL); pthread_join(t2, NULL); pthread_join(t3, NULL); pthread_mutex_destroy(&mutex); return 0; } ``` 執行結果: ```shell thread 885999360 Entered.. thread 869213952 try to unlock Value of errno: Operation not permitted (deadlock) ``` **果然行為才如預期一般!** <img src="https://i.imgur.com/51EPw1V.png" style="width:300px;display: block;margin:auto;"> --- 那這樣不就代表 **defualt mutex = binary semaphore** 了嗎? ~~不行,我不能接受~~ 其實不然,我們從 pthread mutex 實作來看為什麼預設的 Mutex 會這樣做 * [glibc `pthread.h`](https://code.woboq.org/userspace/glibc/sysdeps/nptl/pthread.h.html#PTHREAD_MUTEX_TIMED_NP) 其中 **PTHREAD_MUTEX_TIMED_NP** 是預設的 mutex 的屬性 ```clike enum { PTHREAD_MUTEX_TIMED_NP, PTHREAD_MUTEX_RECURSIVE_NP, PTHREAD_MUTEX_ERRORCHECK_NP, PTHREAD_MUTEX_ADAPTIVE_NP .... }; ``` * 在 [glibc `pthread_mutex_lock.c`](https://code.woboq.org/userspace/glibc/nptl/pthread_mutex_lock.c.html) 中可以發現,不論是什麼類型的 mutex 都會紀錄下持有者 (**ownership**) ```clike int __pthread_mutex_lock (pthread_mutex_t *mutex) { unsigned int type = PTHREAD_MUTEX_TYPE_ELISION (mutex); /* * check mutex type code section */ pid_t id = THREAD_GETMEM (THREAD_SELF, tid); /* Record the ownership.*/ mutex->__data.__owner = id; /* * other code section */ return 0; } /* * other code section */ #ifndef __pthread_mutex_lock weak_alias (__pthread_mutex_lock, pthread_mutex_lock) hidden_def (__pthread_mutex_lock) #endif ``` * 而導致 default mutex 因第三方解鎖而釋放資源的原因在 [glibc `pthread_mutex_unlock.c`](https://code.woboq.org/userspace/glibc/nptl/pthread_mutex_unlock.c.html#__pthread_mutex_unlock_usercnt) 之中。 ```clike int attribute_hidden __pthread_mutex_unlock_usercnt (pthread_mutex_t *mutex, int decr) { int type = PTHREAD_MUTEX_TYPE_ELISION (mutex); /* * other code section */ /* 檢查 Mutex 的 type */ /* Default mutex 並不會檢查 id 是否為原本 owner 的 */ if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP) == PTHREAD_MUTEX_TIMED_NP) { /* Always reset the owner field. */ normal: mutex->__data.__owner = 0; if (decr) /* One less user. */ --mutex->__data.__nusers; /* Unlock. */ lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex)); LIBC_PROBE (mutex_release, 1, mutex); return 0; } /* * other code section */ else { /* Error checking mutex. 會檢查解鎖的 thread id 是否為 owner id */ assert (type == PTHREAD_MUTEX_ERRORCHECK_NP); if (mutex->__data.__owner != THREAD_GETMEM (THREAD_SELF, tid) || ! lll_islocked (mutex->__data.__lock)) return EPERM; goto normal; } } /* * other code section */ int __pthread_mutex_unlock (pthread_mutex_t *mutex) { return __pthread_mutex_unlock_usercnt (mutex, 1); } weak_alias (__pthread_mutex_unlock, pthread_mutex_unlock) hidden_def (__pthread_mutex_unlock) ``` 儘管 **default** mutex 跟 binary semaphore 在特定行為上是相同的,但 mutex 本身還是遵守著紀錄持有者的行為,也就是與 semaphore 最大的差異。 依據 POSIX 標準,mutex 有以下四種型態: * NORMAL * ERRORCHECK * RECURSIVE * DEFAULT 其中,在 Pthread 中,`DEFAULT` 對應於 PTHREAD_MUTEX_NORMAL,但在其他實作中可能會有截然不同的行為。使用 lock 時,應該明確指定是 ERRORCHECK 或 RECURSIVE,以確保行為符合預期。 **你今天用對 mutex 了嗎?**