Vessels: Efficient and Scalable DNN Prediction on Trusted Processors
===
## Introduction
* Deep learning is popular and prevalent
* Object, voice recognition, image classification
* Medical diagnosis, face recognition, autonomous driving
* DNN
* Training -> prediction
* Stealing and manipulation attacks on deep learning
* Training data, model, and architecture are all represented as data
* Data breaches, Trojan attacks
* Data at rest and in motion are protected by cryptography
* Difficult to protect data in use (during prediction)
* Protecting DNN prediction using Intel SGX
* SGX can protect the confidentiality and integrity of data during DNN prediction
* EPC limit of 128MB
* Linux: supports EPC page swapping; Windows: cannot run enclaves exceeding the EPC
* A single page swap takes XXX cycles [CK: Cite]
* DNN prediction requires a large amount of memory (see the footprint sketch after this list)
* A DenseNet model is XXX MB and requires XXX MB of memory for prediction
* EPC is shared by all enclaves in the physical machine
* Scalability issue with many prediction requests in parallel
* Instant DNN prediction in the cloud (e.g., Siri, Alexa) [CK: Use the examples in the Neurosurgeon paper]
* EPC thrashing
* Existing work and their problems
* Fig. 1: Performance of existing DNN prediction with SGX
* AlexNet (?)
* Native, Eleos, TF-Trusted [CK: Use these as the baselines instead of Darknet-SGX in evaluation?]
* Y-axis: Avg. response time (per request) [CK: instead of throughput?]
* X-axis: Number of simultaneous requests
* Eleos: We ported Darknet (a DL framework written in C) to run on Eleos
* Replaces SGX page swapping with more efficient page swapping inside the enclave
* Slow because it is not designed for DNN workloads
* DNN prediction reads and writes many floating-point numbers at a very high frequency
* Frequent page swapping and address translation for each floating-point access (illustrated in the first sketch after this list)
* TF-Trusted: TensorFlow Lite on SCONE
* SCONE
* Reduces the number of context switches for system calls
* Does not address page swapping, which matters more for DNN prediction performance
* TensorFlow Lite
* Quantization: integer-only prediction for embedded devices (see the quantization sketch after this list)
* Changes the DNN model: affects model accuracy and does not support layers that require floating-point operations
* Slow because XXX
* Neither addresses the scalability issue
* In this paper, we focus on the memory usage of DNN prediction with SGX
* Systematic study: we found that the memory footprint of DNN prediction can be significantly reduced
* [CK: Turn the bullet points O1-4 to regular sentences]
* A new system for efficient and scalable DNN prediction with SGX
* Vessel: an optimized enclave for efficient and scalable DNN prediction
* Reduces peak enclave memory without modifying the DNN architecture or the model
* This reduces expensive SGX page swapping significantly
* Brings the performance of protected DNN prediction with SGX to that of unprotected prediction
* Carefully schedules the launch of vessels for parallel prediction requests to avoid EPC thrashing, minimizing the impact on scalability
* Key techniques of the system (illustrative sketches for each follow this list)
* Shared memory across layers
* On-demand weight loading
* EPC-aware scheduling: based on enclave size estimation before execution
* Experimental results
* Contributions