# Brief Intro to Privacy Preserving Machine Learning
---
## Why Privacy Preserving Machine Learning (PPML)
- Protect sensitive data during ML process
- Prevent sending sensitive data directly to MLaaS server
- Prevent leaking model and its parameters
- LLMs memorize training examples
- A public language model fine-tuned on private data can be misused to recover private information
- Infer whether a specific user is in the training set (membership inference)
- etc.
---
## In different ML steps
- Collecting data
- Training the model
- Deployment and Inference
---
## Approaches for PPML
---
### Differential Privacy
Add calibrated noise to computations on the dataset (e.g. query results or gradients) so that the output reveals almost nothing about any single individual.
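A minimal sketch of the Laplace mechanism for a counting query. A count has sensitivity 1 (adding or removing one record changes it by at most 1), so noise drawn from `Laplace(0, 1/ε)` gives ε-differential privacy. The function name and data are illustrative, not from a specific library.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a count under epsilon-differential privacy.
    Sensitivity of a count query is 1, so noise scale is 1/epsilon."""
    true_count = sum(1 for x in data if predicate(x))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: noisy count of records with age > 50
ages = [23, 45, 67, 34, 52, 71]
noisy = laplace_count(ages, lambda a: a > 50, epsilon=1.0)
```

Smaller ε means more noise and stronger privacy; the true count here is 3, and each release returns a different noisy value.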
---
### Federated learning, decentralized learning
Instead of sending data to a server for training, each data holder trains the model on their local device and only the aggregated model is shared with the central server.
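The server-side aggregation step can be sketched as FedAvg-style weighted averaging: each client's parameters are weighted by its local dataset size. This is a simplified sketch of the aggregation only, not a full federated training loop.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average per-layer parameters, weighted by local dataset size.
    Only parameters reach the server; raw data stays on each device."""
    total = sum(client_sizes)
    num_layers = len(client_params[0])
    return [
        sum((n / total) * params[layer]
            for params, n in zip(client_params, client_sizes))
        for layer in range(num_layers)
    ]

# Two clients, one weight matrix each; client b has 3x more data
a = [np.array([[1.0, 2.0]])]
b = [np.array([[3.0, 4.0]])]
global_params = fedavg([a, b], client_sizes=[10, 30])
```

The resulting global parameters are pulled toward the client with more data (weights 0.25 and 0.75 here).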
---
### **Secure neural network inference** (SNNI)
Use MPC/FHE techniques to run machine learning inference in a privacy-preserving manner: the model holder does not reveal the model parameters, and the input holder does not reveal the input data.
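A minimal sketch of additive secret sharing over a prime field, a building block many MPC-based SNNI protocols use: a value is split so that any strict subset of shares reveals nothing, yet parties can add shared values locally. This is only the sharing primitive, not a full protocol (multiplications additionally need e.g. Beaver triples); the modulus choice is illustrative.

```python
import random

P = 2**61 - 1  # prime modulus of the secret-sharing field (illustrative)

def share(x, n=2):
    """Split x into n additive shares mod P; any n-1 shares look random."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Linear operations work share-wise: each party adds its own shares
xs, ys = share(12), share(30)
zs = [(a + b) % P for a, b in zip(xs, ys)]
assert reconstruct(zs) == 42
```

This locality of linear operations is why matrix multiplications and convolutions are cheap under secret sharing, while non-linear layers are the hard part.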
---
## Use cases
- Healthcare
- Finance
- Sensitive data for statistics
---
# Secure neural network inference
Problem setting: the model and the input are held by different parties — how can each party's data stay private?
\* e.g. ML as a service (MLaaS): a pre-trained NN model is offered as a service to users.
---
## Requirements, Challenges
- Accuracy
- Latency
In a NN, different layers use different operations. Linear operations (matrix multiplication, convolution) map well onto MPC/FHE, but non-linear operations such as **activation functions** and **pooling layers** are expensive or must be approximated.
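One common workaround for HE-friendly inference is replacing a non-linear activation with a low-degree polynomial, which homomorphic schemes can evaluate. A sketch below fits a degree-2 polynomial to ReLU on a bounded input range; the range and degree are illustrative choices, and the approximation error is the accuracy cost mentioned above.

```python
import numpy as np

# Fit a degree-2 polynomial to ReLU on [-4, 4] (least squares).
# Low-degree polynomials are cheap under FHE; exact ReLU is not.
xs = np.linspace(-4.0, 4.0, 1000)
coeffs = np.polyfit(xs, np.maximum(xs, 0.0), deg=2)

def poly_relu(x):
    """HE-friendly stand-in for ReLU, valid only on the fitted range."""
    return np.polyval(coeffs, x)
```

Inputs outside the fitted range blow up quickly, so protocols that use this trick must keep activations bounded (e.g. via normalization).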
---
**How to handle non-linear operations without losing accuracy and efficiency**
**How to handle floating point/fixed point numbers for better accuracy**
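Since MPC/FHE protocols typically work over integers, real values are usually encoded in fixed point: multiply by a scale, round, and truncate after each multiplication to keep the scale constant. A minimal sketch with 16 fractional bits (the parameter choice is illustrative):

```python
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS

def encode(x):
    """Float -> fixed-point integer with 16 fractional bits."""
    return round(x * SCALE)

def decode(x):
    """Fixed-point integer -> float."""
    return x / SCALE

def fxp_mul(a, b):
    """Multiply two fixed-point values; the product carries 2*SCALE_BITS
    fractional bits, so truncate once to restore the scale."""
    return (a * b) >> SCALE_BITS  # Python's >> is an arithmetic shift

a, b = encode(1.5), encode(-2.25)
result = decode(fxp_mul(a, b))  # close to -3.375
```

The truncation step is a known pain point in MPC: shares of a secret value cannot be truncated locally without small errors, which is one source of accuracy loss in these protocols.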
---
## Protocols
Different protocols make different tradeoffs among:
- Accuracy
- Computational complexity
- Round complexity
- Communication complexity
---
## GC (garbled circuit) based
Pros
- Lower computational cost
- Constant round
Cons
- High communication cost
- Lower accuracy
---
## FHE (fully homomorphic encryption) based
Pros
- Lower communication cost
- Constant round
Cons
- Higher computational overhead
- Requires bootstrapping
- Lower accuracy
---
## FSS (function secret sharing) based
Pros
- GPU acceleration
- Better latency
- Better communication cost
Cons
- Semi-honest security only
- Key size can grow large
- Requires a setup phase for each inference
---
## Mixed protocol
Combine the strengths of different protocols, e.g. secret sharing or HE for linear layers and GC for non-linear layers.
---
## State-of-the-art protocol performance
---
### About study group
Bi-weekly on Tuesdays, 12:00 GMT. Next session: 25th June.
List of topics covered in the study group
https://hackmd.io/q8dWbmD2S-GNtnkj2azeUA