# SIGMA: Seure GPT Inference with Functional Secret Sharing
- 2PC with preprocessing
- GPU acceleration. different protocol for GPU and CPU
- Order of magnitude better latency, communication than CrypTen
- Standard 2PC security (semi-honest?)
SIGMA is based on Orca.
Orca only focuses on convolutional neural network that supports simple non-linearities like relu.
SIGMA propose FSS based protocol for non-linear operations like GeLU, Softmax, and layer-normalization.
## Description of FSS
Pair of algorithm Gen and Eval.
$Gen: g, \lambda \rightarrow (k_0, k_1)$.
$Eval: (b, k_b, x) \rightarrow g_b(x)$
Gen generates keys for 0 and 1 which define function $g_0$ and $g_1$ given function g and security parameter lamda.
Eval takes $b, k_b, x$ and produces output of evaluation of the function share $g_b$ with input $x$.
## 2-PC from pre-processing from FSS
with FSS, 2 parties know input to function shares.
In order to acieve 2-PC, input has to be hidden from each other. We mask input with correlated randomness $r_i$ for each wire $w_i$. Given gates {g}, wire {w}.
In offline phase, for each wire $w_i$, sample random masks $r_i$.
and generates FSS for offset function $g^{[r_i,r_j]}(x) = g(x-r_i) + r_j$
These shared functions, when evaluated with masked input $\hat{x} = x + r_j$, returns shares of $g(x)+r_j$ which is masked output of gate g.
In online phase, 2 parties evaluate each offset FSS gates and reconstruct masked output and evaluate gates in topological order.
## Distributed Point Function
point function:
$$f_{\alpha, \beta}^\bullet(x) =
\begin{cases}
\beta & \text{if $x = \alpha$ }\\
0 & \text{otherwise}
\end{cases}
$$
Distributed point function: FSS version of point function. In the protocols proposed in this paper, it suffices to have output {1,0} and $\beta = 1$
Theorem1 Cost of DPF
When $n > v$, ($v = log_2(\lambda+1)$)
key size $(n-v) \cdot (\lambda + 2) + 2\lambda$
Number of PRG calls $2(n-v)$ for Gen. $n-v$ for Eval.
When $n <= v$
key size $2^n$
PRG calls: 0
In this pape, they set $\lambda = 127$ and implemet length doubling PRG using 2 calls of AES-128.
single AES call suffices for Eval as only half of the output is used.
### Comparison using DPF keys
In Grotto paper. they introduce comparison using DPF
max((n-v), 0) calls of PRG.
## Cryptographic building blocks from previous works.
Multiplication -> beaver-triple based FSS-protocol. key size 3n bits. non-interactive.
Select -> takes a selectorbit s and payload x. returns x if s = 1, 0 otherwise.
non-interactive protocol. keysize 4n.
Selectlin -> γ = {(α0, β0), (α1, β1), (α2, β2), (α3, β3)} with αi, βi ∈ UN, ∀i ∈. It takes as input two selector bits s0, s1, and a payload x, and outputs selectlinn,γ(s0, s1, x) =
α2s0+s1x+β2s0+s1. non-interactive keysize 8n.
Look-up table. input bit width n, output bitwidth l and public table T.
takes input x and returns T\[x\]. keysize = keysize of DPFn,1 + n + 2l.
1 round communication with 2l bits. online phase invoke 2^(n-v)-1 times PRG calls.
### Truncation protocol
Shift the value by f-bits in FSS manner.
Achieves arithmetic right shift by logical right shift. (ARS -> shift signed value, LRS -> shift unsigned value)

$ARS_{n,f}$ securely such that $keysize(ΠARS_{n,f}) = keysize(Π^{TR}_{n,f})+ keysize(DPF_{n−f,1})+2n+1$.
Online phase requires 1 evaluation each of DPFf,1 and DPFn−f,1 and communicates $2(n − f) + 4$ bits in 3 rounds.
TR: truncation reduce. drop f bits of the n-bit input and return (n-f) bit number.
### DReLU and Comparison Protocols
DReLU is derivative of ReLU.
keysize: keysize(DPF_n-1,1) + 1
online phase require 1 evaluation of DPF_n-1,1
call DReLU to realize comparison.
## Non-linear functions
GeLU, softmax, layer normalization
Overall direction: use look-up table for these unary functions.
Try to reduce table size by appropriate approximation.



How to achieve non-linear functions.
## Evaluation

Round complexity should be high.

13billion inference -> 30sec (wat!)
Keysize(should estimate)


Accuracy

---
Discussion point
- what are the parties requirement?
- It seems the both party joining the computation should act as symmetric actor when computing these functions.
- What kind of deployment scenario does this protocol cover?
- How to achieve malicious security?
-