Memory barrier (1)

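###### tags: `Linux kernel`

# Memory barrier (1)

The first time I ran into memory barriers was while reading the ARM MMU document. Near the end there was a short section on them, which roughly said that some instructions may execute out of order while the CPU is running, so you place a memory barrier to guarantee the execution order of certain instructions. Official documents tend to be like this: they state a fact flatly and leave the reader with a pile of questions. Sure, I know the compiler may emit machine code in a different order than the source, and I know the CPU runs instructions out of order for performance. **But in what situation do I actually need a memory barrier? Am I supposed to be so afraid of reordering that I put a memory barrier after every line of code?**

Recently I came across memory barriers again while reading Linux code, so I tried to understand them a bit better. There are two kinds, one for each source of reordering: a compiler barrier for compiler reordering, and a CPU memory barrier for CPU out-of-order execution. The references at the end explain these basics in detail. Here I mainly use code to illustrate them.

Below is a program that demonstrates CPU out-of-order execution. It must be compiled with `-D_GNU_SOURCE` and `-pthread`, otherwise the build fails.

```
gcc -D_GNU_SOURCE -pthread -o test_mb test_mb.c
```

```
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for cpu_set_t and pthread_setaffinity_np() */
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <stdio.h>

static pthread_barrier_t barr, barr_end;
/* volatile keeps the compiler from reordering or eliding these accesses,
 * so any reordering observed below comes from the CPU. */
volatile int x, y, r1, r2;

static void* thread1(void *arg)
{
    while(1) {
        pthread_barrier_wait(&barr);
        x = 1;   // store
        r1 = y;  // load
        pthread_barrier_wait(&barr_end);
    }
    return NULL;
}

static void* thread2(void *arg)
{
    while (1) {
        pthread_barrier_wait(&barr);
        y = 1;   // store
        r2 = x;  // load
        pthread_barrier_wait(&barr_end);
    }
    return NULL;
}

int main(void)
{
    /* 3 participants: thread1, thread2 and main */
    pthread_barrier_init(&barr, NULL, 3);
    pthread_barrier_init(&barr_end, NULL, 3);

    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);

    /* pin each thread to its own CPU */
    int cpu_1 = 0;
    int cpu_2 = 1;
    cpu_set_t cs;
    CPU_ZERO(&cs);
    CPU_SET(cpu_1, &cs);
    pthread_setaffinity_np(t1, sizeof(cs), &cs);
    CPU_ZERO(&cs);
    CPU_SET(cpu_2, &cs);
    pthread_setaffinity_np(t2, sizeof(cs), &cs);

    // wait result
    while(1) {
        // init variable
        x = y = r1 = r2 = 0;
        pthread_barrier_wait(&barr);
        pthread_barrier_wait(&barr_end);
        printf("r1 = %d, r2 = %d\n", r1, r2);
        assert(!(r1 == 0 && r2 == 0));
    }

    /* not reached: the loop above only ends when the assert fires */
    pthread_barrier_destroy(&barr);
    pthread_barrier_destroy(&barr_end);
    return 0;
}
```

The program creates two threads and uses `pthread_setaffinity_np()` to pin them to two different CPUs. `pthread_barrier_wait()` blocks the calling thread until the configured number of threads (3 in this example) have called it. Once all of those calls have returned, the barrier resets to its initial state, so the next call blocks again until 3 threads have arrived. The flow of the program is therefore:

1. t1 and t2 wait in `pthread_barrier_wait(&barr)`
2. `main()` calls `pthread_barrier_wait(&barr)`, which releases t1 and t2
3. t1, t2 and `main()` all wait together in `pthread_barrier_wait(&barr_end)`
4. `main()` checks whether out-of-order execution happened
5. The `while(1)` loop in `main()` repeats

As long as each thread's store and load take effect in program order, `r1` and `r2` can never both be 0, because whichever store comes first is visible to the other thread's load. The assert checks for exactly that case. The result is shown in the screenshot; on my machine the assert fires in under a second:

![](https://i.imgur.com/czkjYR7.png)

Finally, a memory barrier fixes the problem (`mfence` is an x86 instruction, so this version only builds on x86):

```
...
    while(1) {
        pthread_barrier_wait(&barr);
        x = 1;   // store
        __asm__ __volatile__("mfence" ::: "memory");
        r1 = y;  // load
        pthread_barrier_wait(&barr_end);
    }
...
    while (1) {
        pthread_barrier_wait(&barr);
        y = 1;   // store
        __asm__ __volatile__("mfence" ::: "memory");
        r2 = x;  // load
        pthread_barrier_wait(&barr_end);
    }
...
```

On a UP (uniprocessor) system you basically do not have to worry about CPU run-time out-of-order execution, which is why, back when I was working on a single-core CPU like the ARM 926EJS, I really could not imagine what a memory barrier was for.

Now for the last question. By this logic, almost any data shared across threads runs into this problem, doesn't it? One thread computes a value, another thread picks it up and makes a decision based on it. That is completely ordinary. So why does nothing go wrong? Or, put the other way around: most developers never give memory barriers a thought, so how do they still manage to write correct multi-threaded programs?

**The main reason is that the various synchronization mechanisms already imply memory barriers**, so developers who never use a memory barrier directly are still fine. For example, the `preempt_disable()` inside `spin_lock()` ends with a call to `barrier()`, which is the compiler barrier; the lock acquisition itself then provides the CPU-level ordering on SMP:

![](https://i.imgur.com/bSqVucT.png)

![](https://i.imgur.com/0o0oDOy.png)

Data shared between threads is normally protected by some synchronization mechanism anyway, and doing that is equivalent to using a memory barrier.

References:

1. [理解 Memory barrier](https://blog.csdn.net/zhangxiao93/article/details/42966279)
2. [Memory barrier](https://ithelp.ithome.com.tw/articles/10213513)
3. [pthread_barrier_wait](https://pubs.opengroup.org/onlinepubs/009696899/functions/pthread_barrier_wait.html)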
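Addendum: the `mfence` fix above is x86-only. Below is a minimal sketch of the same store/load test using the C11 `atomic_thread_fence()` instead, which the compiler maps to whatever full barrier the target CPU needs. This is my own variant, not the code from the post: `struct side`, `worker()` and `count_reorderings()` are hypothetical names introduced for illustration, the CPU pinning is left out, and the loop runs a fixed number of rounds instead of forever.

```c
#define _POSIX_C_SOURCE 200809L  /* for pthread_barrier_t under strict -std=c11 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_barrier_t start_barrier, end_barrier;
static atomic_int x, y;   /* the two shared flags */
static int r1, r2;        /* what each thread observed */
static int rounds;        /* set before the threads are created */
static bool use_fence;    /* set before the threads are created */

/* Hypothetical helper type: which flag a thread stores to, which it loads. */
struct side { atomic_int *mine; atomic_int *other; int *result; };

static void *worker(void *arg)
{
    struct side *s = arg;
    for (int i = 0; i < rounds; i++) {
        pthread_barrier_wait(&start_barrier);
        atomic_store_explicit(s->mine, 1, memory_order_relaxed);      /* store */
        if (use_fence)
            atomic_thread_fence(memory_order_seq_cst);  /* full barrier, portable */
        *s->result = atomic_load_explicit(s->other, memory_order_relaxed); /* load */
        pthread_barrier_wait(&end_barrier);
    }
    return NULL;
}

/* Runs n rounds and returns how often both threads read 0. */
int count_reorderings(int n, bool fence)
{
    struct side a = { &x, &y, &r1 }, b = { &y, &x, &r2 };
    pthread_t t1, t2;
    int both_zero = 0, rc;

    rounds = n;
    use_fence = fence;
    pthread_barrier_init(&start_barrier, NULL, 3);  /* two workers + caller */
    pthread_barrier_init(&end_barrier, NULL, 3);
    rc = pthread_create(&t1, NULL, worker, &a);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &b);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < n; i++) {
        atomic_store(&x, 0);                    /* reset while workers are parked */
        atomic_store(&y, 0);
        pthread_barrier_wait(&start_barrier);   /* release both workers */
        pthread_barrier_wait(&end_barrier);     /* wait until both are done */
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&start_barrier);
    pthread_barrier_destroy(&end_barrier);
    return both_zero;
}
```

Calling `count_reorderings(100000, false)` from a `main()` may or may not report a non-zero count, depending on the machine and on how the threads get scheduled. With `fence` set to `true` the count should always be 0, since a sequentially consistent fence between the store and the load on both sides rules out the "both read 0" outcome. Build it with `-pthread`, as with the original program.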