# Brief Intro to Privacy Preserving Machine Learning
---
## Why Privacy Preserving Machine Learning (PPML)
- Protect sensitive data during ML process
- Prevent sending sensitive data directly to MLaaS server
- Prevent leaking model and its parameters
- LLMs memorize training examples
- A public language model fine-tuned on private data can be misused to recover private information
- Infer whether a specific user is in the training set (membership inference)
- etc.
---
## In different ML steps
- Collecting data
- Training the model
- Deployment and Inference
---
## Approaches for PPML
---
### Differential Privacy
Add calibrated noise to computations on the dataset (e.g. query results or gradients) so that the output reveals almost nothing about any single individual.
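A minimal sketch of the Laplace mechanism for a counting query. A count has sensitivity 1 (adding or removing one record changes it by at most 1), so noise drawn from `Laplace(0, 1/ε)` gives ε-differential privacy. The function name and data are illustrative, not from a specific library.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a count under epsilon-differential privacy.
    Sensitivity of a count query is 1, so noise scale is 1/epsilon."""
    true_count = sum(1 for x in data if predicate(x))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: noisy count of records with age > 50
ages = [23, 45, 67, 34, 52, 71]
noisy = laplace_count(ages, lambda a: a > 50, epsilon=1.0)
```

Smaller ε means more noise and stronger privacy; the true count here is 3, and each release returns a different noisy value.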
---
### Federated learning, decentralized learning
Instead of sending data to a server for training, each data holder trains the model on their local device and only the aggregated model is shared with the central server.
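The server-side aggregation step can be sketched as FedAvg-style weighted averaging: each client's parameters are weighted by its local dataset size. This is a simplified sketch of the aggregation only, not a full federated training loop.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average per-layer parameters, weighted by local dataset size.
    Only parameters reach the server; raw data stays on each device."""
    total = sum(client_sizes)
    num_layers = len(client_params[0])
    return [
        sum((n / total) * params[layer]
            for params, n in zip(client_params, client_sizes))
        for layer in range(num_layers)
    ]

# Two clients, one weight matrix each; client b has 3x more data
a = [np.array([[1.0, 2.0]])]
b = [np.array([[3.0, 4.0]])]
global_params = fedavg([a, b], client_sizes=[10, 30])
```

The resulting global parameters are pulled toward the client with more data (weights 0.25 and 0.75 here).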
---
### **Secure neural network inference** (SNNI)
Use MPC/FHE techniques to run machine learning inference in a privacy-preserving manner: the model holder does not reveal the model parameters, and the input holder does not reveal the input data.
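A minimal sketch of additive secret sharing over a prime field, a building block many MPC-based SNNI protocols use: a value is split so that any strict subset of shares reveals nothing, yet parties can add shared values locally. This is only the sharing primitive, not a full protocol (multiplications additionally need e.g. Beaver triples); the modulus choice is illustrative.

```python
import random

P = 2**61 - 1  # prime modulus of the secret-sharing field (illustrative)

def share(x, n=2):
    """Split x into n additive shares mod P; any n-1 shares look random."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Linear operations work share-wise: each party adds its own shares
xs, ys = share(12), share(30)
zs = [(a + b) % P for a, b in zip(xs, ys)]
assert reconstruct(zs) == 42
```

This locality of linear operations is why matrix multiplications and convolutions are cheap under secret sharing, while non-linear layers are the hard part.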
---
## Use cases
- Healthcare
- Finance
- Sensitive data for statistics
---
# Secure neural network inference
Problem setting: the model and the input are held by different parties — how can each party's data stay private?
\* e.g. ML as a service (MLaaS): a pre-trained NN model is offered as a service to users.
---
## Requirements, Challenges
- Accuracy
- Latency
In a NN, different layers use different operations. Linear operations (matrix multiplication, convolution) map well onto MPC/FHE, but non-linear operations such as **activation functions** and **pooling layers** are expensive or must be approximated.
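One common workaround for HE-friendly inference is replacing a non-linear activation with a low-degree polynomial, which homomorphic schemes can evaluate. A sketch below fits a degree-2 polynomial to ReLU on a bounded input range; the range and degree are illustrative choices, and the approximation error is the accuracy cost mentioned above.

```python
import numpy as np

# Fit a degree-2 polynomial to ReLU on [-4, 4] (least squares).
# Low-degree polynomials are cheap under FHE; exact ReLU is not.
xs = np.linspace(-4.0, 4.0, 1000)
coeffs = np.polyfit(xs, np.maximum(xs, 0.0), deg=2)

def poly_relu(x):
    """HE-friendly stand-in for ReLU, valid only on the fitted range."""
    return np.polyval(coeffs, x)
```

Inputs outside the fitted range blow up quickly, so protocols that use this trick must keep activations bounded (e.g. via normalization).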
---
**How to handle non-linear operations without losing accuracy and efficiency**
**How to handle floating point/fixed point numbers for better accuracy**
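Since MPC/FHE protocols typically work over integers, real values are usually encoded in fixed point: multiply by a scale, round, and truncate after each multiplication to keep the scale constant. A minimal sketch with 16 fractional bits (the parameter choice is illustrative):

```python
SCALE_BITS = 16
SCALE = 1 << SCALE_BITS

def encode(x):
    """Float -> fixed-point integer with 16 fractional bits."""
    return round(x * SCALE)

def decode(x):
    """Fixed-point integer -> float."""
    return x / SCALE

def fxp_mul(a, b):
    """Multiply two fixed-point values; the product carries 2*SCALE_BITS
    fractional bits, so truncate once to restore the scale."""
    return (a * b) >> SCALE_BITS  # Python's >> is an arithmetic shift

a, b = encode(1.5), encode(-2.25)
result = decode(fxp_mul(a, b))  # close to -3.375
```

The truncation step is a known pain point in MPC: shares of a secret value cannot be truncated locally without small errors, which is one source of accuracy loss in these protocols.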
---
## Protocols
Different protocols make different tradeoffs among:
- Accuracy
- Computational complexity
- Round complexity
- Communication complexity
---
## GC (garbled circuit) based
Pros
- Lower computational cost
- Constant round
Cons
- High communication cost
- Lower accuracy
---
## FHE (fully homomorphic encryption) based
Pros
- Lower communication cost
- Constant round
Cons
- Higher computational overhead
- Requires bootstrapping
- Lower accuracy
---
## FSS (function secret sharing) based
Pros
- GPU acceleration
- Better latency
- Better communication cost
Cons
- Semi-honest security only
- Key size can grow large
- Requires a setup phase for each inference
---
## Mixed protocol
Combine the strengths of different protocols, e.g. secret sharing or HE for linear layers and GC for non-linear layers.
---
## State-of-the-art protocol performance
---
### About study group
Bi-weekly on Tuesdays, 12:00 GMT. Next session: 25th June.
List of topics covered in the study group
https://hackmd.io/q8dWbmD2S-GNtnkj2azeUA