# SIGMA: Secure GPT Inference with Functional Secret Sharing

- 2PC with preprocessing.
- GPU acceleration, with different protocols for GPU and CPU.
- An order of magnitude better latency and communication than CrypTen.
- Standard semi-honest 2PC security.

SIGMA builds on Orca. Orca targets convolutional neural networks, which only need simple non-linearities such as ReLU. SIGMA proposes FSS-based protocols for the non-linear operations in transformers: GeLU, Softmax, and layer normalization.

## Description of FSS

An FSS scheme is a pair of algorithms, Gen and Eval:

$$Gen: (g, \lambda) \rightarrow (k_0, k_1)$$
$$Eval: (b, k_b, x) \rightarrow g_b(x)$$

Given a function $g$ and a security parameter $\lambda$, Gen generates keys $k_0$ and $k_1$, which define the function shares $g_0$ and $g_1$. Eval takes a party index $b$, the key $k_b$, and an input $x$, and outputs the evaluation of the function share $g_b$ on $x$, with $g_0(x) + g_1(x) = g(x)$.

## 2PC with preprocessing from FSS

With FSS alone, both parties know the input to the function shares. To achieve 2PC, each party's input has to be hidden from the other, so we mask each wire $w_i$ with correlated randomness $r_i$.

Given gates $\{g\}$ and wires $\{w\}$:

- Offline phase: for each wire $w_i$, sample a random mask $r_i$, and generate FSS keys for the offset function $g^{[r_i, r_j]}(x) = g(x - r_i) + r_j$, where $w_i$ is the input wire and $w_j$ the output wire of gate $g$.
- Online phase: the two parties evaluate the offset-function FSS keys on the masked input $\hat{x} = x + r_i$, obtaining shares of $g(x) + r_j$, i.e., the masked output of gate $g$. They reconstruct the masked output and evaluate the gates in topological order.

## Distributed Point Function

Point function:

$$f_{\alpha, \beta}^\bullet(x) = \begin{cases} \beta & \text{if } x = \alpha \\ 0 & \text{otherwise} \end{cases}$$

A distributed point function (DPF) is the FSS version of the point function. For the protocols proposed in this paper, a single-bit output with $\beta = 1$ suffices.

Theorem 1 (cost of DPF), with input bit-width $n$ and $v = \log_2(\lambda + 1)$:

- When $n > v$: key size $(n - v)(\lambda + 2) + 2\lambda$ bits; Gen makes $2(n - v)$ PRG calls and Eval makes $n - v$.
- When $n \le v$: key size $2^n$ bits; 0 PRG calls.
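The small-domain case of Theorem 1 ($n \le v$: key size $2^n$, zero PRG calls) corresponds to simply secret-sharing the whole truth table of the point function. A minimal Python sketch under that reading (the names `gen` and `eval_share` are mine, not the paper's; output group is $\mathbb{Z}_2$ since $\beta = 1$):

```python
import secrets

# Toy truth-table DPF for the small-domain case (n <= v): each party's key
# is an XOR share of the point function's 2^n-entry truth table, so no PRG
# is needed and the key size is exactly 2^n bits.

def gen(alpha: int, n: int):
    """Split the truth table of f_{alpha,1} into two XOR shares of 2^n bits."""
    table = [1 if x == alpha else 0 for x in range(2 ** n)]
    k0 = [secrets.randbits(1) for _ in range(2 ** n)]
    k1 = [t ^ s for t, s in zip(table, k0)]
    return k0, k1

def eval_share(k_b, x: int) -> int:
    """Each party outputs its share of f_{alpha,1}(x): one bit of its table."""
    return k_b[x]

k0, k1 = gen(alpha=5, n=4)
# XOR of the two parties' outputs reconstructs the point function.
assert all((eval_share(k0, x) ^ eval_share(k1, x)) == (1 if x == 5 else 0)
           for x in range(16))
```

Each key alone is a uniformly random bit-string, so it reveals nothing about $\alpha$; the clever $n > v$ construction of Theorem 1 compresses exactly this object down to roughly $n \cdot \lambda$ bits using a PRG.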
When $n \le v$, the key is simply the $2^n$-bit truth table and no PRG calls are needed.

In this paper, they set $\lambda = 127$ and implement the length-doubling PRG using two calls of AES-128. A single AES call suffices for Eval, since only half of the output is used.

### Comparison using DPF keys

The Grotto paper introduces comparison using DPF keys with $\max(n - v, 0)$ PRG calls.

## Cryptographic building blocks from previous works

- **Multiplication**: Beaver-triple-based FSS protocol. Key size $3n$ bits. Non-interactive.
- **Select**: takes a selector bit $s$ and a payload $x$; returns $x$ if $s = 1$ and $0$ otherwise. Non-interactive. Key size $4n$ bits.
- **Selectlin**: parameterized by $\gamma = \{(\alpha_0, \beta_0), (\alpha_1, \beta_1), (\alpha_2, \beta_2), (\alpha_3, \beta_3)\}$ with $\alpha_i, \beta_i \in U_N$ for all $i$. Takes two selector bits $s_0, s_1$ and a payload $x$, and outputs $\text{selectlin}_{n,\gamma}(s_0, s_1, x) = \alpha_{2s_0 + s_1} x + \beta_{2s_0 + s_1}$. Non-interactive. Key size $8n$ bits.
- **Look-up table**: input bit-width $n$, output bit-width $l$, and a public table $T$; takes input $x$ and returns $T[x]$. Key size = key size of $DPF_{n,1}$ + $n + 2l$ bits. 1 round of communication with $2l$ bits; the online phase makes $2^{n-v} - 1$ PRG calls.

### Truncation protocol

Shift the value right by $f$ bits in an FSS manner. Arithmetic right shift is achieved via logical right shift (ARS shifts a signed value, LRS an unsigned value).

![スクリーンショット 2024-04-30 181224](https://hackmd.io/_uploads/B1P0TEAb0.png)

$ARS_{n,f}$ is realized securely such that $keysize(\Pi^{ARS}_{n,f}) = keysize(\Pi^{TR}_{n,f}) + keysize(DPF_{n-f,1}) + 2n + 1$. The online phase requires one evaluation each of $DPF_{f,1}$ and $DPF_{n-f,1}$ and communicates $2(n - f) + 4$ bits in 3 rounds.

TR (truncate-reduce): drop the low $f$ bits of the $n$-bit input and return an $(n - f)$-bit number.

### DReLU and comparison protocols

DReLU is the derivative of ReLU. Key size: $keysize(DPF_{n-1,1}) + 1$. The online phase requires one evaluation of $DPF_{n-1,1}$. Comparison is realized by calling DReLU.

## Non-linear functions

GeLU, Softmax, and layer normalization.

Overall direction: use look-up tables for these unary functions, and reduce the table size with appropriate approximations.
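The look-up-table direction can be sketched in the clear for GeLU: tabulate the function over a small fixed-point domain and answer queries by indexing. The 8-bit width and scale of 16 here are illustrative choices, not parameters from the paper, and the real protocol would perform the lookup obliviously via the LUT gate above:

```python
import math

# Toy LUT evaluation of GeLU over signed 8-bit fixed-point inputs.
# SCALE and N-bit width are assumptions for the sketch, not SIGMA's values.

SCALE = 16          # fixed-point scale: real value = integer / SCALE

def gelu(x: float) -> float:
    """Exact GeLU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Precompute T[x] for every signed 8-bit input (-128 .. 127).
TABLE = [gelu(i / SCALE) for i in range(-128, 128)]

def gelu_lut(xi: int) -> float:
    """Look up GeLU for a signed 8-bit fixed-point input xi."""
    return TABLE[xi + 128]

# The lookup matches the direct computation on the tabulated grid.
assert abs(gelu_lut(32) - gelu(2.0)) < 1e-12   # 32 / 16 = 2.0
```

The table has $2^n$ entries, so the whole game of the approximations in the paper is shrinking $n$ (via truncation and interval splitting) until the LUT key size and online PRG-call count become affordable.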
![スクリーンショット 2024-04-30 183927](https://hackmd.io/_uploads/S1UkREAZ0.png)
![スクリーンショット 2024-04-30 183924](https://hackmd.io/_uploads/r1ykR4RbR.png)
![スクリーンショット 2024-04-30 184300](https://hackmd.io/_uploads/BypkR40ZC.png)

The figures above show how the non-linear functions are realized.

## Evaluation

![スクリーンショット 2024-04-30 184633](https://hackmd.io/_uploads/BJNzAN0ZR.png)

Round complexity appears to be high.

![スクリーンショット 2024-04-30 184944](https://hackmd.io/_uploads/r1jAREAW0.png)

Inference on a 13-billion-parameter model takes about 30 seconds (wat!).

Key size (should estimate):

![スクリーンショット 2024-04-30 185711](https://hackmd.io/_uploads/B1QqeBAWC.png)
![スクリーンショット 2024-04-30 185821](https://hackmd.io/_uploads/Bko6grAWC.png)

Accuracy:

![スクリーンショット 2024-04-30 185844](https://hackmd.io/_uploads/SJc1-rAWA.png)

---

Discussion points

- What are the requirements on the parties?
  - It seems both parties joining the computation must act as symmetric actors when computing these functions.
- What kind of deployment scenario does this protocol cover?
- How could malicious security be achieved?