Parallel Programming and System LAB 2

## 目標 1. 量化 Mutex (Futex)、Spinlock (Busy-wait) 與 Atomic (CAS) 的效能差異。 2. 分析高競爭 (High Contention) 下的排程器行為與 Context Switch 成本。 3. Padding, Data Layout Strategy ## 平行的代價在CA等課程中，大多會在pipline單元接觸到 **Harzard** 的概念，而這裡的平行化其實也有著類似的問題。除去完全無依賴性的情況，**data harzard**的發生會是顯而易見的，但在平行化中更多的應該會是硬體爭奪，這是源自於計算資源有限，這從小範圍的memory、regitster、ALU，到更大的單位如Threads、L1/2/3 Cache、Main Memory、SM、Core等。 ### 執行緒的資源清單 | 資源類型 | 狀態 | | :--- | :--- | | **Stack (堆疊)** | **獨佔** | | **Registers (暫存器)** | **獨佔** | | **Heap / Global (堆積/全域)** | **共享** | | **Files / Sockets** | **共享**| 而在OS或CA Intro中，有提到了許多困境的形式，最常見的應該就屬**Dead Lock**，故為防止污染與競爭侵占，Lock的機制產生，但這也對平行化產生了負面影響 > 在平行計算中，我們稱這種爭搶為 **Contention（競爭）**。 > 如果你的程式 90% 的時間都在處理自己私有的 Stack 數據，你的加速比會很完美；但如果你 90% 的時間都在搶那個全域的 `counter`，你的電腦核心再多也沒用，因為大家都卡在排隊。 ### 競爭情況(簡述) 當 Thread A 搶到鎖，而 Thread B 也想要時： 1. **User Space (Spin)**：Thread B 會先在門外等著（Spinlock），等A出來。 2. **Kernel Space (Context Switch)**：如果 A 遲遲不出來，OS 就會介入，把 B 移出 CPU，使B Threads sleep。 3. **喚醒**：當 A 釋放鎖，OS wake up B threads。這個wake up 即recreate，也就是之前所測量的overhead。 ### OS 機制 * **序列化效應（Serialization）** 即使有 4 個執行緒，但在任何時間點，LOCKED的狀態永遠只會出現在一個執行緒的賽道上。這就是互斥鎖的本質：將並行變回串列。 * **Queue** * **Unfairness** 除非有piority，不然fmutex是不公平的 ### 視覺化的code #### visualizer_thread_contention ```c // ~/pthreads/visualizer/pthread/visualizer_thread_contention.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define NUM_THREADS 5 #define ITERATIONS 10 // --- 共享資源 --- int shared_pot = -1; // 所有人都要搶著修改 int use_lock = 0; // 安全開關：0 = 沒鎖 (亂搶), 1 = 有鎖 (排隊) pthread_mutex_t lock; void print_ui(int id, const char* action, const char* color, int val) { // 每個執行緒有自己的行，顯示目前的動作和讀取到的值 printf("\033[%d;1HThread %d: [%s%-12s\033[0m] | 讀取到的值: %2d", id + 4, id, color, action, val); // 在上方顯示目前的值真相 printf("\033[2;1H================================"); printf("\033[3;1H [ 共享變數的現況 ] -> 【 %2d 】", shared_pot); printf("\033[2;35H模式: %s", use_lock ? "Mutex ON" : "No Lock"); fflush(stdout); } void* thread_worker(void* arg) { int id = *(int*)arg; for (int i = 0; i < ITERATIONS; i++) { // 1. 準備階段 print_ui(id, "準備中", "\033[32m", shared_pot); usleep(rand() % 800000 + 200000); // 2. 進入爭奪 if (use_lock) pthread_mutex_lock(&lock); print_ui(id, "正在修改!", "\033[31m", shared_pot); // 模擬「讀取 -> 修改 -> 寫回」的延遲 int temp = shared_pot; usleep(100000); // 故意製造空隙讓別人有機會插隊 shared_pot = id; if (use_lock) pthread_mutex_unlock(&lock); // 3. 修改完成 usleep(500000); } return NULL; } int main(int argc, char** argv) { if (argc > 1) use_lock = atoi(argv[1]); // 可以透過指令參數控制 pthread_t threads[NUM_THREADS]; int ids[NUM_THREADS]; pthread_mutex_init(&lock, NULL); printf("\033[2J\033[?25l"); // 清屏、隱藏游標 printf("\033[1;1H=== 共享變數強奪實驗 ==="); for (int i = 0; i < NUM_THREADS; i++) { ids[i] = i; pthread_create(&threads[i], NULL, thread_worker, &ids[i]); } for (int i = 0; i < NUM_THREADS; i++) pthread_join(threads[i], NULL); printf("\033[%d;1H\n實驗結束。請觀察落差。\033[?25h\n", NUM_THREADS + 5); return 0; } ``` #### visualizer_thread_static ```c // ~/pthread/visualizer_thread_static.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <time.h> #define NUM_THREADS 4 #define TOTAL_STEPS 10 pthread_mutex_t lock; int lock_counts[NUM_THREADS] = {0}; // 紀錄每個 Thread 搶到鎖的次數 void print_status(int id, const char* status, const char* color) { // 顯示 ID, 狀態, 以及搶到鎖的累積次數 printf("\033[%d;1HThread %d: [%s%-12s\033[0m] | Success: %d 次", id + 3, id, color, status, lock_counts[id]); fflush(stdout); } void* thread_worker(void* arg) { int id = *(int*)arg; for (int i = 0; i < TOTAL_STEPS; i++) { print_status(id, "THINKING", "\033[32m"); // 綠色 usleep(rand() % 500000); print_status(id, "WAITING...", "\033[33m"); // 黃色 pthread_mutex_lock(&lock); // --- 進入臨界區 --- lock_counts[id]++; print_status(id, "LOCKED!!", "\033[31m"); // 紅色 usleep(300000); // ---------------- pthread_mutex_unlock(&lock); } print_status(id, "FINISHED", "\033[34m"); return NULL; } int main() { pthread_t threads[NUM_THREADS]; int ids[NUM_THREADS]; pthread_mutex_init(&lock, NULL); srand(time(NULL)); printf("\033[2J\033[?25l\033[1;1H=== 資源爭奪計數器 ===\n"); for (int i = 0; i < NUM_THREADS; i++) { ids[i] = i; pthread_create(&threads[i], NULL, thread_worker, &ids[i]); } for (int i = 0; i < NUM_THREADS; i++) pthread_join(threads[i], NULL); printf("\033[%d;1H\n總結：即便核心再多，鎖（Lock）依然讓執行變回了排隊。\n\033[?25h", NUM_THREADS + 4); pthread_mutex_destroy(&lock); return 0; } ``` ## Synchronization Primitives 在 Linux 中，同步原語的實作涉及硬體指令與作業系統調度器的協作。我們將比較三種核心技術： * **Mutex (POSIX Mutex)** 基於 futex (Fast Userspace Mutex)。在無競爭時待在使用者態，有競爭時透過系統呼叫進入核心態掛起執行緒。 * **Spinlock** 執行緒在使用者態不斷輪詢（Busy-waiting）。不觸發 Context Switch，但極度消耗 CPU 週期。 > Context Switch : CPU在不同的排程中，切換不同register、PC、Stack Pointer的狀況，也就是切換threads * **Atomic (C11/C++11 Memory Model)** 利用硬體層級的 CAS (Compare-and-Swap) 指令，直接在 CPU 快取一致性協定（MESI）層級完成同步。 ### Threads Contention ```c //sync_compare.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <stdatomic.h> #define NUM_ITERATIONS 10000000 #define NUM_THREADS 16 typedef struct { int counter; // 給 Mutex 與 Spinlock 使用的普通整數 pthread_mutex_t mutex; // OS 級別的互斥鎖 pthread_spinlock_t spinlock; // 使用者態的自旋鎖 _Atomic int atomic_counter; // 硬體級別的原子變數 } shared_data_t; // 測試函式指標類型 typedef void* (*test_func_t)(void*); // --- 不同的同步實作 --- void* test_mutex(void* arg) { shared_data_t* data = (shared_data_t*)arg; for (int i = 0; i < NUM_ITERATIONS / NUM_THREADS; i++) { pthread_mutex_lock(&data->mutex); // 確定鎖沒鎖 data->counter++; // 臨界區 (Critical Section) pthread_mutex_unlock(&data->mutex); // 釋放鎖 } return NULL; } // pthread_mutex_lock 在 Linux 下 // 通常實作為 Futex (Fast Userspace Mutex)。 // 它會先嘗試在 User-space 獲取鎖， // 失敗後才會透過系統呼叫進入核心態掛起執行緒， // 這是為了減少不必要的 Context Switch。 void* test_spinlock(void* arg) { shared_data_t* data = (shared_data_t*)arg; for (int i = 0; i < NUM_ITERATIONS / NUM_THREADS; i++) { pthread_spin_lock(&data->spinlock); // 不斷循環檢查鎖是否釋放，CPU 不休息 data->counter++; pthread_spin_unlock(&data->spinlock); } return NULL; } // Spinlock 適合鎖定時間極短的情境。 // 但在環境中，若執行緒發生 Preemption（被 OS 強行奪走 CPU）， // 其它在 Spin 的執行緒會白白浪費整個時間片（Time Slice）。 void* test_atomic(void* arg) { shared_data_t* data = (shared_data_t*)arg; for (int i = 0; i < NUM_ITERATIONS / NUM_THREADS; i++) { atomic_fetch_add(&data->atomic_counter, 1); } return NULL; } // 對應到 x86 架構的 LOCK ADD 指令。 // 它不需要經過 OS 核心， // 而是透過 CPU 的 Cache Coherency Protocol (如 MESI) // 在快取層級鎖定該記憶體行（Cache Line）， // 避免其它核心同時修改。 double run_experiment(test_func_t func, shared_data_t* data) { pthread_t threads[NUM_THREADS]; struct timespec start, end; // Default 初始化計數器 data->counter = 0; atomic_store(&data->atomic_counter, 0); // fork clock_gettime(CLOCK_MONOTONIC, &start); for (int i = 0; i < NUM_THREADS; i++) pthread_create(&threads[i], NULL, func, data); for (int i = 0; i < NUM_THREADS; i++) pthread_join(threads[i], NULL); clock_gettime(CLOCK_MONOTONIC, &end); return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9; } int main() { shared_data_t data; pthread_mutex_init(&data.mutex, NULL); pthread_spin_init(&data.spinlock, PTHREAD_PROCESS_PRIVATE); printf("執行緒數量: %d, 總迭代次數: %d\n", NUM_THREADS, NUM_ITERATIONS); printf("Mutex 耗時: %f s\n", run_experiment(test_mutex, &data)); printf("Spinlock 耗時: %f s\n", run_experiment(test_spinlock, &data)); printf("Atomic 耗時: %f s\n", run_experiment(test_atomic, &data)); pthread_mutex_destroy(&data.mutex); pthread_spin_destroy(&data.spinlock); return 0; } ``` #### Compile & Eval ``` # 1. 編譯 gcc -O3 -pthread src/pthreads/sync_compare.c -o bin/sync_compare # 2. 測量 Mutex (觀察 Context Switches) perf stat -e cpu-clock,faults,context-switches,cpu-migrations ./sync # WSL的話要先改，因為在 WSL2 中，系統核心（Kernel）是由微軟提供的特製版本，且 sysctl 的路徑處理有時與原生 Ubuntu 不同 sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid' ``` #### Return Mutex ``` 執行緒數量: 16, 總迭代次數: 10000000 Mutex 耗時: 0.540014 s Performance counter stats for './sync': 89271 context-switches 586 cpu-migrations <not supported> cycles 0.541140951 seconds time elapsed 0.706919000 seconds user 6.872599000 seconds sys ``` #### Return Spin ``` 執行緒數量: 16, 總迭代次數: 10000000 Spinlock 耗時: 1.273953 s Performance counter stats for './sync': 456 context-switches 3 cpu-migrations <not supported> cycles 1.275071647 seconds time elapsed 17.841535000 seconds user 0.007791000 seconds sys ``` #### Return Atomic ``` 執行緒數量: 16, 總迭代次數: 10000000 Atomic 耗時: 0.183654 s Performance counter stats for './sync': 32 context-switches 1 cpu-migrations <not supported> cycles 0.184785650 seconds time elapsed 2.445988000 seconds user 0.003851000 seconds sys ``` ## No Lock? 如果沒有鎖，怎麼辦? - Cache Coherence & False Sharing > 記憶體分區 ```c // nolock.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #define ITERATIONS 100000000 // 一億次運算，放大效能差距 #define NUM_THREADS 4 // 建議設定為你的實體核心數 // 情況 A：緊鄰存放 (觸發 False Sharing) struct bad_struct { long volatile count[NUM_THREADS]; }; // 情況 B：對齊存放 (規避 False Sharing) // 使用 64 位元組的 Padding 確保每個 count 獨立佔用一個 Cache Line struct good_struct { struct { long volatile count; long padding[7]; // 1個long(8b) + 7個long(56b) = 64 bytes } data[NUM_THREADS]; }; struct bad_struct s1; struct good_struct s2; void* bench_bad(void* arg) { int id = *(int*)arg; for (long i = 0; i < ITERATIONS; i++) { s1.count[id]++; } return NULL; } void* bench_good(void* arg) { int id = *(int*)arg; for (long i = 0; i < ITERATIONS; i++) { s2.data[id].count++; } return NULL; } double get_time(void* (*func)(void*)) { pthread_t threads[NUM_THREADS]; int ids[NUM_THREADS]; struct timespec start, end; clock_gettime(CLOCK_MONOTONIC, &start); for (int i = 0; i < NUM_THREADS; i++) { ids[i] = i; pthread_create(&threads[i], NULL, func, &ids[i]); } for (int i = 0; i < NUM_THREADS; i++) pthread_join(threads[i], NULL); clock_gettime(CLOCK_MONOTONIC, &end); return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9; } int main() { printf("--- False Sharing 效能研究 (Threads: %d) ---\n", NUM_THREADS); double time_bad = get_time(bench_bad); printf("緊鄰存放 (Bad) 耗時: %.4f 秒\n", time_bad); double time_good = get_time(bench_good); printf("對齊存放 (Good) 耗時: %.4f 秒\n", time_good); printf("加速比 (Speedup): %.2fx\n", time_bad / time_good); return 0; } ``` ``` # perf 在wsl下沒辦法看cache ，所以最多看時間 --- False Sharing 效能研究 (Threads: 4) --- 緊鄰存放 (Bad) 耗時: 0.2050 秒對齊存放 (Good) 耗時: 0.0257 秒加速比 (Speedup): 7.98x ``` 有趣的是，如果更改padding的間隔 ``` # 改為3 a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- 緊鄰存放 (Bad) 耗時: 0.3086 秒對齊存放 (Good) 耗時: 0.0343 秒加速比 (Speedup): 9.01x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- 緊鄰存放 (Bad) 耗時: 0.4284 秒對齊存放 (Good) 耗時: 0.3608 秒加速比 (Speedup): 1.19x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- 緊鄰存放 (Bad) 耗時: 0.3226 秒對齊存放 (Good) 耗時: 0.0303 秒加速比 (Speedup): 10.65x ``` 速度明顯開始抖動，雖然有環境影響(電腦內其餘工作的因素)，但padding為3顯然不能隔開，這時候就看運氣了 ### alignas * alignas(64) 的作用：確保這個變數的起始地址，一定是 64 的倍數。當你把它放在 struct aligned_element 裡並宣告成陣列時，編譯器會自動在每個 long 之間插入剛好足夠的空間（Padding），確保每個 data[i] 都佔據一個獨立的 Cache Line。 * 可移植性 (Portability)：如果你手動寫 long padding[6]，在 long 為 4 bytes 的系統上，總長度會縮水（4+24=28），導致對齊失效。但 alignas(64) 永遠會保證 64 bytes 的間距。 ```c // nolock_opt.c #include <pthread.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <stdalign.h> // 必須包含此標頭檔來使用 alignas #define ITERATIONS 100000000 #define NUM_THREADS 4 // 情況 A：緊鄰存放 (Bad) // 整個陣列連續存放，4 個 long 共 32 bytes，全部擠在同一個 64-byte Cache Line struct bad_struct { volatile long count[NUM_THREADS]; }; // 情況 B：對齊存放 (Good) // 我們定義一個結構體，強制它在記憶體中以 64 bytes 為單位對齊 struct aligned_element { alignas(64) volatile long count; }; struct good_struct { struct aligned_element data[NUM_THREADS]; }; struct bad_struct s1; struct good_struct s2; // --- 執行緒函式與計時邏輯 (保持不變) --- void* bench_bad(void* arg) { int id = *(int*)arg; for (long i = 0; i < ITERATIONS; i++) { s1.count[id]++; } return NULL; } void* bench_good(void* arg) { int id = *(int*)arg; for (long i = 0; i < ITERATIONS; i++) { s2.data[id].count++; } return NULL; } double get_time(void* (*func)(void*)) { pthread_t threads[NUM_THREADS]; int ids[NUM_THREADS]; struct timespec start, end; clock_gettime(CLOCK_MONOTONIC, &start); for (int i = 0; i < NUM_THREADS; i++) { ids[i] = i; pthread_create(&threads[i], NULL, func, &ids[i]); } for (int i = 0; i < NUM_THREADS; i++) pthread_join(threads[i], NULL); clock_gettime(CLOCK_MONOTONIC, &end); return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9; } int main() { printf("--- False Sharing 效能研究 (Threads: %d) ---\n", NUM_THREADS); printf("Cache Line 對齊大小設定為: 64 bytes\n\n"); double time_bad = get_time(bench_bad); printf("緊鄰存放 (Bad) 耗時: %.4f 秒\n", time_bad); double time_good = get_time(bench_good); printf("對齊存放 (Good) 耗時: %.4f 秒\n", time_good); printf("加速比 (Speedup): %.2fx\n", time_bad / time_good); return 0; } ``` ``` 緊鄰存放 (Bad) 耗時: 0.4143 秒對齊存放 (Good) 耗時: 0.0264 秒加速比 (Speedup): 15.69x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.2063 秒對齊存放 (Good) 耗時: 0.0275 秒加速比 (Speedup): 7.49x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.1933 秒對齊存放 (Good) 耗時: 0.0260 秒加速比 (Speedup): 7.44x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.2006 秒對齊存放 (Good) 耗時: 0.0261 秒加速比 (Speedup): 7.67x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.2008 秒對齊存放 (Good) 耗時: 0.0259 秒加速比 (Speedup): 7.76x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.3147 秒對齊存放 (Good) 耗時: 0.0261 秒加速比 (Speedup): 12.05x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.4128 秒對齊存放 (Good) 耗時: 0.0276 秒加速比 (Speedup): 14.93x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.2148 秒對齊存放 (Good) 耗時: 0.0251 秒加速比 (Speedup): 8.55x a@owner:~/sp/parallel-programming-lab/src/pthreads$ ./nolock --- False Sharing 效能研究 (Threads: 4) --- Cache Line 對齊大小設定為: 64 bytes 緊鄰存放 (Bad) 耗時: 0.1953 秒對齊存放 (Good) 耗時: 0.0291 秒加速比 (Speedup): 6.70x ``` ### Cache Line 與 MESI 協定現代電腦中的記憶體並不是以單一字節為單位存取的，而是以 64 Bytes 為一組的 Cache Line (快取行) 進行搬運，所以當 Thread A 修改了位址 0x100 的數據，而 Thread B 修改位址 0x104 的數據時： 1. 如果這兩個位址落在同一個 64-byte 的 Cache Line 裡。 2. 硬體的 MESI (Modified, Exclusive, Shared, Invalid) 協定會認為整個 Cache Line 都被修改了。 3. Thread B 所在的核心必須強制將其快取設為「失效」，並重新從記憶體或 Thread A 的快取中抓取最新資料。 > 這種現象被稱為 False Sharing (偽共享)：邏輯上沒有共享，硬體上卻被迫同步 ### Question 在Cache Line 與 MESI 協定下，false sharing狀況很常發生，也就是當thread A去cache內修改threads B共用的64byte的其中一個，即便沒有改到B的資料，但還是會被當作汙染重新更新或抓取。如果是在重抓的過程中Threads A又修改，那這樣Threads B不會進入無限抓取嗎? > 這被稱作快取乒乓 (Cache Ping-Pong) ，如果執行緒 A 瘋狂寫入，執行緒 B 確實會陷入一種「不斷被宣告無效、不斷重抓、抓到又立刻失效」的循環。這雖然不會導致程式當掉，但會讓效能掉進黑洞。不過，就算看起來會無限循環，CPU 硬體的匯流排仲裁 (Bus Arbitration)會介入 >* 在 MESI 協定背後，所有的核心都連在一條「訊息通道（Bus）」上。當核心 B 想要重抓資料時： > 1. 發出請求：核心 B 會發出一個 Read Invalid (RFO) 訊號。 > 2. 仲裁機制：匯流排不會永遠偏袒核心 A。硬體會確保每個核心都有機會獲得匯流排的控制權。 > 3. 強制暫停：當核心 B 獲得控制權準備讀取時，核心 A 的寫入請求會被暫時擋住（Stall），直到 B 完成這一次的抓取。 #### 數據佈局 (Data Layout) 既然硬體是 64-byte 一組在管，我們寫程式就要有「邊界感」。 | 策略 | 做法 | 效果 | | :--- | :--- | :--- | | **避讓** | 使用 `padding` 或 `alignas(64)` | 讓不同執行緒的資料住在不同「房間」。 | | **私有化** | 使用 `Thread Local Storage` | 每個執行緒先算自己的，算完才回報，完全不吵架。 | | **唯讀化** | 盡量讓共享資料變為 `const` | 如果狀態是 **S (Shared)** 且沒人修改，大家都能飛速讀取。 | 簡單來說，在無Layout與lock策略下，讓平行運作的**核心 A 改一個位元組，核心 B 附近的 63 個位元組都會跟著廢掉。** LAB3 https://hackmd.io/@1p0_VOUHTXGcHlBU9A0OEA/SywA3pXiWe

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.