論文〈Dude, is my code constant time〉導讀

contributed by < chiangkd >

oreparaz/dudect
論文

Timing Leakage Detection

執行 leakage detection test，透過量測兩個不同的資料的執行時間 (execution time) 來檢查二者的時間分佈有沒有顯著性差異 (statistically different)

第 1 步: 測量執行時間

重複測量兩個不同資料類型的輸入在加密函數 (cryptographic function) 的執行時間

a) Class definition

常見 fix-vs-random tests，透過給定兩種不同類別的輸入資料，重複對 function 進行量測

固定資料: 常用 corner-case
隨機資料

b) Cycle counters

現在 CPU 提供 cycle counter ，例如 x86 架構的 TSC 及 Arm 架構的 performance counter，可以精準統計執行成本。

c) Environmental conditions

為了最小化環境改變對測試結果的影響，每個測量都會對應到隨機選擇的 class

第 2 步: 進行後處理

在每個獨立的統計測量之前先做一些後處理 (post-processing)。

a) Cropping

在較長的執行時間中，典型的時間分佈會呈現正偏態 (positive skew)。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

可能的原因為測量誤差 (measurement artifacts)，主要的行程被作業系統或其他外部無關的活動 (extraneous activities) 打斷 (interrupt)，可藉由過裁切 (crop) 掉大於某個大於、固定，且 class-independent 的閾值 (threshold，也譯作臨界值)。

b) Higher-order preprocessing

統計測試可引入 high-order pre-processing ，例如 centered product, mimicking higher-order DPA attacks

第 3 步: 施加統計測試

使用統計測試來推翻 (disprove) 虛無假說 (null hypothesis)，這邊試圖推翻虛無假說「兩個時間分佈相等」

有幾種可用的統計測試方式:

a) t-test

又稱 Student's t-test，常見的實作有 Welch's t-test，適用在兩個 populations 有相同的平均值時，且變異數不同及樣本空間不同時會更可靠。

在此處如果 t-test 有結合前述 cropping pre-processing，就不再僅測試均值相等性 (equality of means) 而是更高的統計情境，因為 cropping 是一個非線性的轉換。

b) Non-parametric tests 無母數統計

這類統計優點在於對資料分佈的假設比較少，但缺點是其收斂速度較慢且需要更多樣本，例如 Kolmogorov-Smirnov test。

討論

實作

在〈Dude, is my code constant time〉中採用 2015 年出廠的 Apple MacBook 搭配 LLVM/clang-703.0.31 並指定 -O2 最佳化等級，每個實驗執行 120 秒。

a) 前處理

這裡為了測試受限的分佈範圍，會捨棄大於百分比

p

的值，特別是 lower cycle count tail ，而 upper tail 會更受與資料無關的雜訊影響，這個過程會讓我們拿到更好的實驗結果 (檢測更快)，但原文中也提供沒有前處理(pre-processing) 的測量資料。

b) 統計測試

使用 Welch's test 搭配 Welford's variance 計算方法來改進數值穩定性 (numeric stability)。

Welch's test
Welch's test 是一種假設檢驗方法，用來比較二個獨立樣本的平均值是否有顯著差異，在 Welch's test 中，假設二個樣本的平均值都相等，然後計算兩個樣本的 t 統計量，該統計量的分佈會近似於自由度小的那個樣本的 t 分佈，然後根據 t 分佈的臨界值，計算

p

值，判斷是否拒絕原假設，如果

p

值小於預先選定的顯著性水平 (通常是

0.05

) ，則可以拒絕原假設，認為兩個樣本的平均值不相等。

Welford's variance
Welford's variance 與傳統的方法相比具有更好的數值穩定性和更低的計算成本， Welford's variance 方法藉由逐一存取迴圈樣本中每筆資料，依次計算每個資料點的平均、均方差和標準差，在計算過程中保存樣本數、樣本平均值、總平方差、總平方和等中間變量，再透過這些中間變量的運算，可以在不需要儲存所有資料點的情況下逐步計算樣本平方差。

記憶體比較

記憶體比較操作常出現在需要常數時間執行時間的加密任務 (cryptiographic context)，例如 password verification 以及 message authentication tag verification

a) 時間變異的實作

這裡以 memcmp 為例，作為 smoke test，程式碼在 examples/simple/example.c，透過比較輸入字串以及一個 "secret" 的固定字串 (內部實作 white-box)

下圖畫出了 fix 和 radom classes 的累積分佈函數(機率密度函數的積分)。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

可見到在單次執行都小於 100 個 CPU 時鐘週期，而且可以看到兩個 class 有較大的差異，有 timing leakage 產生(延伸閱讀 timing attack)。

透過繪製 t-static 圖來判斷二個分佈圖是否存在差異

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

圖中不同的線代表著不同的 threshold

p

percentile value.

紅色區塊代表
$p$ 高於 4.5 threshold value
綠色區塊代表
$p$ 低於 4.5 threshold value

由上圖可以看到在任意次測試中，所有線都超過

p = 4.5

的 threshold，且其中存在達到

| t | = 1000

的曲線。

在 dudect.h 提及雖然論文表明 4.5 有強大的統計說明這兩個分佈具有差異 (分辨得出來)，但在實作上它給出較寬裕的值 10

#define t_threshold_moderate                                                   \
  10 // test failed. Pankaj likes 4.5 but let's be more lenient

接著，論文比較另一個 black-box，亦即沒在內部實作 secret ，代表計算估算端 (evaluator) 不知 s，因此在此測試下 fixed class 設定成 0 (沒有寫出詳細原因，不過我認為是因為不需要測試 corner-case，如同前述提到 fixed value 是為了測試 special corner-case)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

由上圖可以看到分佈圖更靠近，但仍不同，一樣需要推翻兩個分佈是一樣的虛無假說

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

由上圖可以看到 time leakage 還是可以判斷，但是相較於上一個案例更難 (出現了 below 4.5 的區域，代表沒有偵測到差異)，且

| t |

statistic 的值更小了，代表兩組資料分佈更相似，差異更小

b) 常數直接實作

本論文也分析了常數時間複雜度 (constant time) 的 memcmp(3)，後者運用邏輯運算以比較結果，且沒有提早跳出的機制。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

由圖可以看到幾乎是一模一樣的分佈

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

| t |

statics 的圖都分佈在綠色區域，代表這兩個分佈基本上沒有差異，在 20 million (

2 \times 10^{7}

) 次測試中都是 indistinguishable 的！

但在 memcmp(3) 有提到不建議使用 memcmp 比較具安全性的資料，因為它需要保持常數時間複雜度，這也就隱含了其實 memcmp 其實並不保證是常數時間複雜度，在 Linux 上這類函式應自行實作。

NOTES
Do not use memcmp() to compare security critical data, such as cryptographic secrets, because the required CPU time depends on the number of equal bytes. Instead, a function that performs comparisons in constant time is required. Some operating systems provide such a function (e.g., NetBSD's consttime_memequal()), but no such function is specified in POSIX. On Linux, it may be necessary to implement such a function oneself.

論文〈Dude, is my code constant time〉導讀

Timing Leakage Detection

第 1 步: 測量執行時間

a) Class definition

b) Cycle counters

c) Environmental conditions

第 2 步: 進行後處理

a) Cropping

b) Higher-order preprocessing

第 3 步: 施加統計測試

a) t-test

b) Non-parametric tests 無母數統計

討論

實作

a) 前處理

b) 統計測試

記憶體比較

a) 時間變異的實作

b) 常數直接實作

參考資訊

Read more

面試紀錄 & 練習 (聯發科)

2023q1 Homework1 (lab0)

將現有的 Raspberry PI 系統備份

Run FreeRTOS on VexRiscv and Measure/Tweak context switch overhead