# Cloud Computing & Networking
###### tags: `UCR`
## VMs
### Interfaces
* Assembly instructions, System calls, API
* Types of virtualization
* Bare-metal
* Native VM (Type 1)
* VMM between HW and guest OS
* VMM in **privilege** mode (**kernel mode**)
* Every VM runs an OS
* Provide services on top of bare-metal, mediate access to the hardware
* e.g. VMWare
* Unmodified OS runs in user mode; the hypervisor does binary translation for sensitive instructions
* The hypervisor is the real kernel; with VT it traps sensitive instructions
* Scans program code and replaces sensitive instructions with calls to special procedures
* Hosted VM (Type 2)
* VMM running on top of host OS
* Hypervisor is an emulator
* e.g. KVM and QEMU
* Dual mode VM
* One part runs in non-privileged mode (user mode), one in privileged mode
* Host OS needs modifications
* Dom0 keeps drivers out of the VMM
* Hybrid organization (Para-virtualization)
* Mostly bare-metal, modified kernel talks directly to the hypervisor
* e.g. Xen
* OS-level virtualization
* OS allows multiple secure virtual servers, supporting isolation
* e.g. Docker and LXC
* Application-level
* e.g. JVM and Wine
### Rings
* Ring 0: Full access to HW such as CPU and memory; can do anything
* Ring 3: User mode; cannot execute certain privileged instructions or write certain registers
* Privileged instructions in user mode cause a trap, and the hypervisor takes over
* Linux X86 only uses ring 0 and ring 3
### Memory Virtualization
* Need a shadow page table
* Hypervisor maintains a shadow page table that maps VM virtual pages to actual machine frames
* OS: VM virtual pages -> VM's physical address
* Guest page tables are marked read-only, so updates trap to the hypervisor
* Use a hypercall to change the page table in para-virtualization (a small sketch of the mapping follows below)
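A minimal sketch of the shadow-table idea above (the dictionary "tables" and page numbers are made up for illustration): the guest maps virtual pages to guest-physical frames, the VMM maps guest-physical frames to machine frames, and the shadow table caches the composed mapping so the MMU can use it directly.

```python
# Sketch: composing guest and VMM mappings into a shadow page table.
# All structures are flat dictionaries; real page tables are multi-level.

guest_page_table = {0: 7, 1: 3, 2: 9}      # guest virtual page -> guest "physical" frame
vmm_p2m_table    = {7: 101, 3: 42, 9: 55}  # guest "physical" frame -> machine frame

def build_shadow_table(gpt, p2m):
    """Shadow table maps guest virtual pages directly to machine frames."""
    return {vpn: p2m[ppn] for vpn, ppn in gpt.items()}

shadow = build_shadow_table(guest_page_table, vmm_p2m_table)
print(shadow)  # {0: 101, 1: 42, 2: 55}

# Because the guest's page tables are read-only, any guest update traps to the
# hypervisor (or is requested via hypercall under para-virtualization), which
# then refreshes the corresponding shadow entries.
```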
### IO Virtualization
* Hypervisor creates virtual disk
* Large empty files on physical disks appear as disks to the guest OS
* Hypervisor converts block numbers to file offsets
### Full virtualization
* Run guest OS unmodified
* When the OS executes a privileged instruction, it is trapped by the hypervisor
* Expensive; translate OS binaries and cache them
### Para-virtualization
* Modify OS to make it VM-aware
#### Xen
* Guest OSs aware of Xen, in user mode
* CPU
* Fast system call handler can be serviced without trapping to VMM
* Handler is validated before installing
* Interrupts are replaced with a lightweight event system
* I/O
* Event notification replaces interrupts
* I/O data transferred to the guest OS via Xen with a shared memory buffer
* Xen updates a bitmap, and the guest OS can register handlers
* Rings
* Xen runs in ring 0, guest OS kernels in ring 1 (on x86-32), and applications in ring 3
* Control transfer
* Explicit control transfer from guests to monitor: Hypercalls
* Xen-readable flags may be set by a domain for pending events
* Memory
* Guests cannot install privileged segment descriptors, but have direct read access
* Virtual -> Physical (done by guest OS) -> Machine (done by VMM): a 2-level mapping
* VMM creates a shadow page table that maps VPN (virtual page number) to MFN (machine frame number)
* Keeps the shadow table consistent by marking those hardware page-table pages read-only
* Ballooning
* Deduplication: VMM can share memory across read-only pages, such as between two guests running the same operating system
* Networking
* Each virtual interface has two I/O rings
* Xen models a virtual firewall+router
* Packet transmission
* Xen copies the packet header and changes the source IP and destination
* If no receive frame is available, drop the packet
* Avoids copies from Xen to guests
* Disk virtualizations
### VT-x
* Ring compression
* Non-trapping instructions
* VMX (Virtual machine extensions)
* VMM runs in VMX root operation
* Guest OS runs in VMX non-root operation
* VMX Transitions
* Cause processor state to be swapped (via the VMCS)
* VMX Entry: Transition into VMX non-root operations
* VMX Exit: Transition into root operation
* VMCS: VM control structure
* MMU Virtualization
* VPID
* First gen forces TLB flush on each VMX transition
* Shadow page table, memory overhead
* [Extended Page Tables](https://www.exploit-db.com/docs/45546)
* Help manage page table entries
* VMM maintains PPN->MPN
* One page table is maintained by guest OS, which is used to generate the guest-physical address.
* The other page table is maintained by VMM, which is used to map guest physical address to host physical address.
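To contrast with shadow paging, here is a small sketch (hypothetical page numbers, single-level tables) of the two-dimensional walk that EPT performs in hardware: the guest's table translates guest-virtual to guest-physical, the VMM's EPT translates guest-physical to host-physical, and no shadow table has to be maintained.

```python
# Sketch: nested translation with an EPT-style second-level table.
guest_pt = {0x1: 0x10, 0x2: 0x20}    # guest virtual page -> guest physical page (guest OS)
ept      = {0x10: 0xA0, 0x20: 0xB0}  # guest physical page -> host physical page (VMM)

def translate(gva_page):
    gpa_page = guest_pt[gva_page]    # first dimension: walked from the guest's CR3
    hpa_page = ept[gpa_page]         # second dimension: walked from the EPT pointer
    return hpa_page

assert translate(0x1) == 0xA0
# Unlike shadow paging, the VMM never intercepts guest page-table writes;
# the hardware performs both walks itself on a TLB miss.
```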
## Containers
* No hypervisor, shared kernel with host OS
* Lightweight
* OS provides resource isolation
* Scheduling
* Share-based scheduling
### LXC
* Namespaces: Restrict what a container can see
* Processes, MNT (mount points), PID, NET (NICs, routing), users
* Cgroups: Limit resources and provide priorities; keep track of how much resource a container uses (see the sketch after this list)
* Chroot
* Provides userspace libraries and tools that run on Linux
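As a concrete illustration of the cgroup side, here is a minimal sketch that limits a process with cgroup v2 (it assumes the unified hierarchy is mounted at /sys/fs/cgroup, a hypothetical group name `demo`, and root privileges; paths differ on cgroup v1 systems).

```python
# Sketch: constraining a process with cgroup v2 resource controls.
import os
from pathlib import Path

CG = Path("/sys/fs/cgroup/demo")    # hypothetical group name

def create_limited_group(mem_bytes, cpu_quota_us, cpu_period_us=100_000):
    CG.mkdir(exist_ok=True)
    (CG / "memory.max").write_text(str(mem_bytes))                   # memory limit
    (CG / "cpu.max").write_text(f"{cpu_quota_us} {cpu_period_us}")   # CPU quota/period

def enter_group(pid):
    (CG / "cgroup.procs").write_text(str(pid))                       # move the process in

if __name__ == "__main__":
    create_limited_group(mem_bytes=256 * 1024 * 1024, cpu_quota_us=50_000)
    enter_group(os.getpid())
    # Namespaces (MNT, PID, NET, user, ...) restrict what the process can *see*;
    # they are normally set up with clone()/unshare() flags, not shown here.
```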
### Docker
* On top of host OS, there's a Docker engine
* Docker image
* A snapshot from which containers are instantiated
* Union file system: Allow containers to use FS safely
* Copy-on-write file system
## Live Migration
* Why?
* We want to move a VM without interrupting its service
* Load balancing
* VM checkpointing
* Pause VM
* Write out all its memory state to disk
* Write out processor state
* Make a copy of its disk data
### VMs
* Save a copy of VM
* VMs are not aware of being moved, and network connections are not interrupted
* Zero down time
* Steps
* Copy all memory pages as the VM is running
* Track what memory pages are written by the VM during transfer
* Resend all dirty pages
* Repeat until very few pages left
* Pause and send the final set of pages
* Consideration
* Down time
* Scale
* Cost of migration
* reduce total migration time
* Minimize network activity
* Process migration
* Pure stop-and-copy
* Long downtime
* Minimal total migration time = downtime
* Pure Demand Paging
* Copy minimal execution context to target
* Restart VM at target
* Pull memory contents from source as and when needed
* Slow warm-up phase at target during page faults across network
### Memory
* Push
* Pages are pushed across the network to the destination
* Stop and Copy
* The source VM is stopped and pages are copied to the destination
* Pull
* New VM executes and pulls pages on page faults
#### Pre-copy (Push stage)
* Do not freeze the source; let it continue running
* Copy memory contents to target
* Copy all pages during first iteration
* Each subsequent iteration copies pages that were dirtied by the VM during the previous iteration
* When number of dirty pages is small enough, stop-and-copy
* Challenge
* Some sets of pages update frequently.
* Network bandwidth
* Issues
* "Peek" those dirty pages in the previous round (Rapid page Dirtying)
* Dynamic Rate-Limiting
* Higher migration time
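The iterative loop above can be summarized in a few lines. This is only a sketch: `send` is a stub for the network transfer, and the dirty set is simulated instead of coming from the hypervisor's dirty-page bitmap.

```python
# Sketch: the pre-copy (push) migration loop.
import random

def send(pages):
    pass                                     # placeholder for pushing pages over the network

def run_vm_one_round(all_pages):
    """Stand-in for letting the VM run; returns the pages it dirtied meanwhile."""
    return set(random.sample(sorted(all_pages), k=max(1, len(all_pages) // 10)))

def pre_copy_migrate(all_pages, threshold=16, max_rounds=10):
    dirty = set(all_pages)                   # round 1: every page is "dirty"
    for _ in range(max_rounds):
        send(dirty)                          # copy while the VM keeps running
        dirty = run_vm_one_round(all_pages)  # pages written during the transfer
        if len(dirty) <= threshold:
            break
    send(dirty)                              # stop-and-copy: pause VM, send the residue
    return dirty

pre_copy_migrate(set(range(1024)))
```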
#### Post-copy
* Downtime before resume
* Advantage
* Lower network overhead
* Each page sent only once over the network
* Total migration time lower for write-intensive workloads
* Disadvantage
* Cold start penalty at destination
* Stages
* Freeze VM at source
* Migrate CPU states and minimum states to dest
* Start VM at target without memory
* Fetch the memory via
* Demand paging over the network
* Active push from source
* Prepaging stage
* Bubbling
* Write the VM to the disk
#### Hybrid Migration
* Do the first iteration as a full memory copy
* Demand paging
* Compared to pre-copy, lower total migration time
* Compared to post-copy, smaller cold-start penalty due to fewer network-bound page faults
#### Network
* Generate an unsolicited **ARP reply** (gratuitous ARP) from the migrated host
* Some packets may be lost
#### Disk
* Network-Attached Storage
### Xen
* Managed migration
* 1st round
* Copy all memory pages to the destination
* Replace original page table with shadow page table, marked as read-only
* Create a dirty bit map
* 2nd to n-1 round
* If VM wants to modify a page, invoke Xen to set the appropriate bit in the dirty bit map
* Resend dirty page
* nth round
* When the number of dirty pages exceeds the upper bound, do the final stop-and-copy
* Self migration
* The OS, not Xen, does what is mentioned above. In the nth round, the OS disables all activity when the number of dirtied pages exceeds the upper bound
* Finally, copy all dirty pages to shadow buffer
### Optimization
* Dynamic rate limiting
* Rapid page dirtying
### Blackbox and gray box strategy (Sandpiper)
* Decides when and where to migrate, and how many resources to allocate
* Automatically detect and mitigate hotspots through VM migration
#### Architecture
* On physical machine
* Nucleus
* Monitor resources and report to control plane on servers
* Control Plane on Centralized server
* Hotspot Detectors
* Profile Engine: Decide how much to allocate
* Migration Manager: Determine where to migrate
* Black box: Only data outside VM
* Gray box: Access to OS stats and logs
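A sketch of the black-box side of hotspot detection (the threshold, k, and n values are illustrative, not Sandpiper's exact parameters): a server is flagged hot only if enough of its recent utilization observations exceed a threshold, which filters out transient spikes.

```python
# Sketch: flag a hotspot when at least k of the last n observations exceed a threshold.
from collections import deque

def is_hotspot(samples, threshold=0.75, k=3, n=5):
    recent = deque(samples, maxlen=n)            # keep only the n most recent samples
    return sum(u > threshold for u in recent) >= k

print(is_hotspot([0.5, 0.8, 0.9, 0.85, 0.6]))    # True: 3 of the last 5 exceed 0.75
```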
## Remote Replication of VM over the WAN
* Why not use the public cloud
* Privacy concerns
* Don't want to share CPU cycles with competitors
* Problems
* Transparency
* Application may have been written for LAN environment
* Must add Internet gateway
* Security
* Must configure firewall rules to limit access
* Flexible resources management
### Vision: Virtual Cloud Pool
Flexible cross-data-center resource pool
* Goals
* A secure collection of server, storage, and network resources
* Seamlessly and securely connects enterprises and cloud data centers
* Use cases
* Enterprise Consolidation: Simplify deployment into the cloud
* Cloud Bursting : WAN migration enables dynamic placement
* Follow the sun: Application moves to be closer to clients and data
* VPNs
* L3 and L2 MPLS based VPN (Virtual Private Network)
* VPLS (LAN Service): Bridge the local networks at multiple sites
* Cloud manager
* Allocate computation and storage resources
* Manage VLAN assignment within cloud network
* Can create a logical router on physical routers
* Network manager
* Create and configure VPN Endpoints
* Reserve network resources
* Centralized VPN Controller
* Act as route reflector
* Route propagated via BGP
* L2 VPN makes WAN like LAN, by changing ARP
* Optimize WAN migration
* Detect identical regions in memory or disk and only send once
* Send only page delta for partially changed blocks
* Smart stop: keep iterating only while the data sent exceeds the data dirtied
* Then stop at a local minimum of dirtied data
### Containers
* Checkpoints / Restore in userspace
* Smaller footprint compared to VMs
#### Challenge
* Environment: Cgroups and namespaces
* Process: May have many child processes; have to track down dependencies
* Not all have same API
* Check for CPU compatibility
## Networking
* Cloud Isolation
* Motivation: Cloud Platform not optimized for enterprise use
* Problem
* Transparency
* Must deal with public IPs and configure DNS
* Security
* Servers in the LAN are exposed
* Fine-grained firewall rules are difficult to manage in a dynamic environment
* Flexible Resource Management
* Does not support network resource reservation
* Dynamic Cloud Pool
* Seamlessly securely connect enterprises and cloud data center
* Cloud Bursting
* Enable efficient migration of resources between data centers
* Virtual Private Cloud
* Virtual Private Networks
* VPLS
* Operates at Layer 2, so no need to change IP addresses
* Bridges the local networks at multiple sites
* Dynamic VPN Endpoints
* Centralized VPN Controller
* Acts as route reflector between sites
* Can adjust ruleset to modify VPN topology
* Route updates propagated via BGP
* WAN Migration Optimizations
* Storage Migration
* Async copy of disk store to remote site initially
* Sync copy of incremental updates subsequently
* Redundancy elimination
* Detect identical regions in memory or disk and send them only once
* Use fingerprints
* Or send page deltas: only send the delta for partially changed data blocks during migration
* Smart Stop
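A small sketch of the fingerprint-based redundancy elimination mentioned above (SHA-1 stands in for whatever fingerprint the real system uses): a block whose hash has already been sent is replaced by a short reference.

```python
# Sketch: send each identical memory/disk region only once, by fingerprint.
import hashlib

seen = {}                                # fingerprint -> block id already at the destination

def transfer(block_id, data):
    fp = hashlib.sha1(data).hexdigest()
    if fp in seen:
        return ("ref", seen[fp])         # identical region: send only a reference
    seen[fp] = block_id
    return ("data", data)                # first occurrence: send the full block

print(transfer(1, b"A" * 4096)[0])       # 'data'
print(transfer(2, b"A" * 4096)[0])       # 'ref'
# Smart stop then ends the iterative phase once a round sends more data than
# the VM dirtied during that round, i.e. at a local minimum of dirtied data.
```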
## Container Migration
* Pre-copy
* Track memory changes; copy memory while tasks are running
* Pros: Once migrated, the source node can disappear
* Cons: Unpredictable and non-guaranteed
* Iterations may take long
* Post-copy
* Migrate all but memory, turn on "network swap" on destination
* Pros: Predictable migration time
* Cons: Source node death means death of container
* Why it is difficult
* Needs to copy states of all objects
* Needs to check dependencies between containers
* All have different amounts of data, APIs
* Needs to check CPU compatibility
* Needs to load necessary kernel module
* Non-shared filesystems should be copied
* Rollback on source node
* Implementation
* CRIU: memory copies and save/restore states
* P.Haul: Checks and deals with the filesystem
## Distributed File System & Externel Synchrony
### CAP Theorem
* Partition Tolerance
* Consistency
* Availability
* Choose 2 from 3
### Synced vs Asynced
* Synced
* Strong reliability but slow
* No data loss, no unsafe replies
* Performance varies with latency
* Guarantee ordering
* Caller blocked until operation completes
* Asynced
* Relaxed reliability guarantees, reasonable performance
* Fast
* External Synchrony
* How to improve both durability and performance for a local file system
* `sync()` in Unix may write data only to the drive's volatile cache
* Data not safe
* External sync: same causal ordering without blocking applications
* Optimization
* Group committing
* External output is buffered and the process continues execution (see the sketch below)
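A toy sketch of external synchrony (function names are made up): writes return immediately, but any externally visible output that causally follows an uncommitted write is buffered until the commit, so the outside world sees the same ordering as a synchronous file system.

```python
# Sketch: buffer external output until the writes it depends on have committed.
pending_writes = []
output_buffer = []

def write(data):
    pending_writes.append(data)          # asynchronous from the application's point of view

def send_output(msg):
    if pending_writes:
        output_buffer.append(msg)        # output causally follows uncommitted writes: hold it
    else:
        print(msg)                       # nothing pending: release immediately

def commit():                            # e.g. a group commit of the journal
    pending_writes.clear()
    while output_buffer:
        print(output_buffer.pop(0))      # release buffered output in order

write("journal entry")
send_output("saved!")                    # buffered, not yet visible externally
commit()                                 # now "saved!" appears
```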
### ACID property
* Atomicity
* An operation is either performed in its entirety or not performed at all
* Hypervisor should support migration and suspend/resume
* Need atomic snapshot
* Consistency
* If a transaction is executed from the beginning to the end, it should take the database from one consistent state to another
* Isolation
* Appear as it is being executed in isolation from other transaction
* Durability
* The changes applied to the database by a committed transaction must persist in the database
### Remus: VM Replication
* Remus divides time into epochs
* Perform a checkpoint at the end of each epoch
* Suspend primary VM
* Copy the memory state
* Resume primary VM
* Send asynchronous messages to backup containing state changes
* Backups apply state changes
* Replication
* State of replica must be synchronized with primary before the output of primary is externally visible
* Buffer outputs
* Allow computation to be performed asynchronously
* Overlapping normal execution with replication
* Outbound packets are buffered until checkpointed states are committed to backups
* Disk updates are asynchronously mirrored in the memory of backup
* Writes are released to disk after the checkpoint is committed on the backup
* VM image resides in memory on the backup; the VM does not execute until a failure occurs
* After failure, backup VM begins execution from the latest checkpoint
* The work done during the failed epoch will be lost
* Consumes small amount of backup resources
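The epoch loop above can be sketched as follows (the VM, backup, and network buffer are stubs; the real mechanism lives inside Xen): execution continues speculatively while the checkpoint is shipped, and buffered output is only released once the backup acknowledges it.

```python
# Sketch: one Remus-style epoch — checkpoint, replicate asynchronously, then release output.
def remus_epoch(vm, backup, net_buffer):
    vm.pause()
    delta = vm.copy_dirty_memory_and_cpu_state()   # kept short: the VM is paused only briefly
    vm.resume()                                    # speculative execution continues
    backup.apply_async(delta)                      # replication overlaps normal execution
    backup.wait_for_ack()                          # checkpointed state is now safe on the backup
    net_buffer.release()                           # buffered packets become externally visible

class _Stub:                                       # placeholders so the sketch runs standalone
    def __getattr__(self, name):
        return lambda *args, **kwargs: None

remus_epoch(_Stub(), _Stub(), _Stub())
```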
### Pipecloud: Using causality to overcome speed-of-light delays on cloud-based disaster recovery
* Cloud provides data backup and failover mechanisms
* Challenge
* Latency
* Locating the cloud site close to primary is not always feasible
* Pipeline synchrony
* No replies to client until prior writes are replicated to backup site
* Overlapping execution with replication
* Hold network replies until writes complete
* Clients see the system as synchronous
* No need to modify the application
* Track disk writes and network calls, intercepted by hypervisor
* Replicate disk to backup, block network packets until prior disk write committed at replica
* Ensure total ordering: **Count pending and committed writes**
* Use **vector clocks** to track pending writes and ensure causal ordering
* Must propagate dependencies for multi-tier apps
* The more execution overlaps with replication, the lower the replication overhead
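A minimal sketch of the write-counting idea (simplified to a single counter rather than full vector clocks): every outgoing packet is tagged with the number of disk writes issued so far and held until that many writes have been committed at the replica.

```python
# Sketch: pipelined synchrony — hold replies until their causally preceding writes commit.
import collections

issued = 0                  # disk writes issued by the VM so far
committed = 0               # disk writes acknowledged by the remote replica
held = collections.deque()  # (required_commit_count, packet)

def on_disk_write():
    global issued
    issued += 1

def on_network_send(packet):
    held.append((issued, packet))        # the packet may depend on every write issued so far
    flush()

def on_replica_ack():
    global committed
    committed += 1
    flush()

def flush():
    while held and held[0][0] <= committed:
        _, pkt = held.popleft()
        print("released", pkt)

on_disk_write(); on_network_send("HTTP 200 OK")   # held: one write still pending
on_replica_ack()                                  # write committed at replica -> reply released
```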
## Data Center
* What clouds are built on
* Challenge
* Multiple applications
* Load balancing
* Direct requests to different servers
* Operational Cost
* Power usage
* Cost: Servers > Power infrastructure > Power draw > Network
* Design Considerations
* 2 types of applications
* Outward facing
* Internal Computation
* Unpredictable workloads
* Server failures; the traffic matrix between servers is constantly changing
* Objectives
* Uniform high capacity
* No need to consider topology when adding servers
* Easy management: plug-and-play as in L2
* Any server can be assigned any IP address
* Performance Isolation
### Fat Tree
* A switch-centric architecture
* Has identical bandwidth at any bisection
* Increased throughput between racks
* Increased reliability with redundancy
* Can be built using cheap devices with uniform capacity
* Problem
* L2 algorithm: Control Plane flooding
* Modification
* First-level is prefix lookup
* Goal: Don't want host address to change
### Portland: Fault-Tolerant Layer 2
* Use host IP address as host identifier
* Use "Pseudo MAC" to identify location
* IP address ->Pseudo MAC Address
* Topology independent
* Switches need to know their pod number, position, and level
* Edge switches get the pod number from the fabric manager
* Fabric manager
* A server
* Maps the IP address to Pseudo MAC
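A sketch of how a PMAC can encode location and how the fabric manager resolves it (the 16/8/8/16-bit field split follows the PortLand paper; the registration API here is invented for illustration).

```python
# Sketch: PortLand-style pseudo MAC addresses and fabric-manager lookups.
def make_pmac(pod, position, port, vmid):
    """48-bit PMAC = pod(16 bits) . position(8) . port(8) . vmid(16)."""
    val = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(val >> s) & 0xff:02x}" for s in range(40, -1, -8))

fabric_manager = {}                      # IP address -> PMAC, learned on ingress/ARP

def register(ip, pod, position, port, vmid):
    fabric_manager[ip] = make_pmac(pod, position, port, vmid)

def resolve_arp(ip):
    return fabric_manager.get(ip)        # edge switch answers the ARP with the PMAC

register("10.0.1.2", pod=1, position=0, port=2, vmid=1)
print(resolve_arp("10.0.1.2"))           # 00:01:00:02:00:01 -> location-encoding prefix
```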
### VL2
* Location-specific IP address
* Valiant Load Balancing: 2 hops
* 1st: Randomly chosen intermediate node (random path up)
* 2nd: To destination (Non-random path down)
* Not limited to a single pod
* VL2 Agent
* Intercept ARP request
* Convert req to a unicast query to the VL2 directory system
* Intercept packet from host
* Advantage
* Load balancing
* Simple migration
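A tiny sketch of the Valiant Load Balancing used above (switch names are hypothetical): each flow first bounces off a randomly chosen intermediate switch, then takes the normal path down to its destination.

```python
# Sketch: VLB picks a random intermediate hop to spread load over all core links.
import random

intermediate_switches = ["int-1", "int-2", "int-3", "int-4"]

def vlb_path(src_tor, dst_tor):
    bounce = random.choice(intermediate_switches)    # random path up
    return [src_tor, bounce, dst_tor]                # non-random path down

print(vlb_path("tor-3", "tor-17"))
```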
### Dcell
* Use servers to do the forwarding
* Because servers have multiple ports
* Uses small, inexpensive switches
* No connections between switches
### Bcube
* Leverages servers as part of the routing infrastructure
* Server has multiple ports
### Agility
Any service, any server
* Achieve agility
* Workload management: Rapidly install service code on a server
* *e.g.* MapReduce
* Storage management: For a server to access persistent data
* Use Distributed filesystem, such as GFS
* Network management: For a server to communicate with other server regardless of where they are in data center
### Networking
* Architecture
* Topology
* Routing algorithms
* Switching strategies
## Software Defined Network
An approach to network management that enables dynamic, programmatically efficient network configuration
* Decouples data plane and control plane
* Features can be implemented on network OS
* Programmable routing
### Benefits
* Elastic resource allocation
* Scalability
* Load balancing
### Steps
* Open interface for packet forwarding
* Openflow protocol
* Communication between the controller and the network devices(*e.g.* switch)
* Main components: Flow and Group table
* Controller can manipulate tables with Openflow protocol
* Flow table: Reactively or proactively defines how incoming packets are forwarded
* Established by the controller
* Group table: Additional processing
* At least one Network OS
* A distributed system that creates a consistent, up-to-date network view
* Get state info from forwarding elements
* Control program run on top of network OS
* Operates on view of network
* Input: Global network view
* Output: Configuration of each network device
* Not a distributed system
* The abstraction hides the distributed nature
* Forwarding Abstraction
* Abstract away hardware
* Flexible
* Minimum
* Switch
* MAC addresses are grouped into a common subnet
* Network controller
* Logically centralized
* Router matches on IP prefix, switch matches on destination MAC address
### Forwarding vs Routing
* Forwarding
* Data plane
* Directing a data packet to an outgoing link
* Routing
* Control plane
* Compute the paths that packets follow
### Openflow
A protocol between SDN controller and Openflow switches
#### Components
* Flow table: Define how incoming packets are forwarded
* Controller may manipulate the tables via the OpenFlow protocol
* Group table: Additional processing
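A sketch of what a flow table amounts to (match fields and actions are simplified; real OpenFlow entries also carry priorities, counters, and timeouts): the switch scans controller-installed rules in priority order and applies the first matching action, falling back to a table-miss entry.

```python
# Sketch: a flow table as an ordered list of (match, action) rules.
flow_table = [
    ({"ip_dst": "10.0.0.5"}, {"type": "output", "port": 3}),
    ({"eth_type": 0x0806},   {"type": "controller"}),   # punt ARP to the controller
    ({},                     {"type": "drop"}),         # table-miss entry
]

def apply(packet):
    for match, action in flow_table:                    # highest-priority rule first
        if all(packet.get(k) == v for k, v in match.items()):
            return action
    return {"type": "drop"}

print(apply({"ip_dst": "10.0.0.5", "eth_type": 0x0800}))  # forwarded out port 3
print(apply({"eth_type": 0x0806}))                         # sent to the controller
```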
## Openstack
An open-source Cloud operating system, aligned with Ubuntu release cycle
### Design
* Reliability: Minimal dependencies
* Loosely coupled components
* Publish/Subscribe message service: Advanced Message Queuing Protocol (AMQP) and RabbitMQ
* Loosely coupled asynchronous interactions between core components and services
* even for remote procedure calls
* Local database
### Core components
* API component
* stateless
* One or more internal components
* A local database
### Component
* Nova: Computing
* Provides on-demand virtual services
* API: Restful
* Compute: Manage instances, talks to hypervisors
* Scheduler: Coordinates all services and determines placement of newly requested resources
* Database: Storing runtime states of cloud infrastructure
* Glance: Dealing with images
* Neutron: Networking
* Manages large network
* Decoupling the logical view of the network from physical view
* Cinder: Volume service
* Block storage
* Handles creation, attachment, and detachment of volumes
* Swift: Object storage
* Provides a completely distributed storage platform that can be accessed via APIs
* Proxy server
* Account server
* Container server
* Object server
* Heat: Orchestrations
* Horizon: Dashboard
* Web interface
* Keystone: Authentication
* Initiate a VM
* Queueing for messages between components
* Uses the publish/subscribe model
## Borg: An OS for Cluster
### Motivation
* Provide resources sharing
* Provide high reliability and availability of the cluster
* Programmers may focus on application
### Borg Structure
* Borgmaster: master for cell (Subset of cluster)
* Scheduler makes decision where to place tasks
* May have up to 5 replicas, but only one elected master, decided by Paxos
:::info
A job consists of multiple tasks
:::
* Borglet: Agent running on each machine, monitoring the tasks
* Borgmaster polls Borglets periodically
### Scheduling in data center
* Goals
* Increase throughput
* Reduce machine fragmentation
* Increase the number of unused machines
* Types
* long-running
* batch jobs: low priority jobs, less sensitive to short-term performance fluctuations
* How to find a candidate
* Scoring picks one of the feasible machines
* Minimize the job being preempted
* Affirmative
* Scalability
* Separate scheduler
* Separate threads to poll the Borglets
* Choose a random subset of machines to score
* How to estimate
* Maximize throughput, total useful work
* Minimize preemptions
* May maximize or minimize unused machine for power saving
* Naming
* Name each task
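A rough sketch of the placement flow (the scoring function is illustrative, not Borg's real heuristics): filter feasible machines, score only a random sample for scalability, and pick the best-scoring candidate, preferring machines that are already partly full to reduce fragmentation.

```python
# Sketch: Borg-style feasibility check + randomized scoring.
import random

def schedule(task, machines, sample_size=10):
    feasible = [m for m in machines
                if m["free_cpu"] >= task["cpu"] and m["free_mem"] >= task["mem"]]
    if not feasible:
        return None                                   # might instead trigger preemption
    candidates = random.sample(feasible, min(sample_size, len(feasible)))
    # Smaller leftover capacity scores better: pack tasks to leave whole machines free.
    return min(candidates, key=lambda m: (m["free_cpu"] - task["cpu"],
                                          m["free_mem"] - task["mem"]))

machines = [{"name": f"m{i}", "free_cpu": random.randint(1, 16),
             "free_mem": random.randint(1, 64)} for i in range(100)]
print(schedule({"cpu": 2, "mem": 4}, machines))
```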
## Kubernetes
* Pods
* Consists of one or more containers which share
* Volumes
* A network namespace
* The atomic unit of Kubernetes
* Are **ephemeral**
* Service
* Unified method of accessing workloads for pods
* Durable resources, not ephemeral
* static cluster IP, DNS namespace
* Controller
* API server
* Provides the outward-facing REST interfaces
* Provides authentication, request validation
* etcd: Stable storage, cluster datastore
* Key-value store, stores objects and configurations
* Uses Raft consensus (a Paxos-like protocol)
* Kube-controller-manager
* The primary daemon managing all core component controller loops
* Monitor cluster via api-server and steer cluster to the desired state
* Kube-scheduler
* Evaluates workload requirements and attempts to place it on a matching resource
* Use bin packing
* Node components
* Kubelet
* Manages the lifecycle of pods
* Provides HTTP Endpoints
* May act as an HTTP server
* Creates the pod
* Kube-proxy: Providing communication
* Manages the network rules on each node
* Performs connection forwarding or load balancing
* Runtime engine
* Containerd
* Networking
* Pod Network
* CNI enables communications between pods
* Is capable of addressing other container's IP addresses without resorting to NAT
* Each pod get its own IP address
## Container Networking
### Microservice
An approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.
* Properties
* Small code base
* Scalable
* Resilient
* Benefits
* A highly reliable, scalable and resource efficient application
* Enable small development team
* Teams are free to use the languages and tools suited to the job
* Rapid application development
* Cloud Native Application
* Loosely coupled distributed applications
* Datastore
* Packaging
* Teams
* Overhead is low enough
* Challenges
* Service Discovery
* Operational overhead
* Service Dependency
* Traffic and Load balancing
* Fault tolerance
* Auto-scale
### Basics
* Container eth0 bridged at OS level to physical interface
* Minimal Requirement
* IP connectivity in containers network
* IP Address Management(IPAM) and network device creation
* External connectivity via host NAT and route advertisement
* Docker
* CNM (Container Network Model)
* Endpoint in Network Sandbox inside container
* Deals with routing table, NAT
* Sandbox: Contains the configuration of a container's network stack
* Network: A group of endpoints that can communicate with each other
* Endpoint: Joins a sandbox to the network, implemented by a veth pair, or an OpenvSwitch port
* Layer 2 subnet in Linux bridge `docker0`
* Partition and isolation are achieved by dividing network address
* Kubernetes
* CNI: Different from CNM; capable of addressing other containers' IP addresses without resorting to NAT
* Support any container runtime
* Plug-in is called when a pod is initiated or removed
## Service discovery
### Types
* Client discovery
* Clients talk to the service registry
* Client services need to be service-registry aware
* Server discovery
* Clients talk to the load balancer and the load balancer talks to the servers
### Should provides
* Services need to discover each other dynamically
* Health check load balancing
* Create a new overlay network
* Creates a service
## Serverless Computing
Users do not manage the servers: **no concern about server management or capacity planning**
* Never pay for idle
* On demand scaling
* Care only about application logic; event-triggered
* When to use
* Short running
* Event driven
* Stateless
* Microservice, Mobile Backend, IoT, BoT, service integration
* When not to use
* Long running
* Stateful
* Latency critical
* Benefits
* Event-driven
* Auto-scale
* Fully-managed services
* Limitations
* Applications still need some sort of state
* May need remote and local debugging
* Providers impose limits on execution time, bandwidth, etc.
* Malicious code can corrupt the shared machine
* May not be cheaper for non-event triggered applications
* Bad for
* Long running, stateful, latency critical
* *e.g.* Database
### Usage
* Independent modules for
* Authentication service
* Various database
### Architecture
* Sense
* Analyze
* Respond
* Multiple recipients can respond independently
* Act immediately upon event arrival, or asynchronously at a later point in time
* Events are pushed
### Features
* Functions
* Decouple software function from SW/HW resources
* The source code of the function provided by the user
* Event-driven
* Decouple event sources from processor
* Stateless
* Decouple Computation from state
* Microservice
* Decouple concerns (separation of concerns)
* Delegation
### FaaS
* Function metadata store
* Platform and function specific configuration
* Reason
* Have 2 different access patterns
* execution model
* Execution = function + runtime + resource
* Pricing model
* memory usage + compute time
* Per request
### Challenge
#### Cold start
* Response time = cold start time + user response time
* Factors increase cold start time
* Static type language vs dynamic type
* Memory size of the function
* VPC (Virtual Private Cloud)
* Expensive to keep
* Code size of the function
#### Keep warm
* Reuse function
* Multiplexing requests onto the same instance
* Runtime pooling
* User provided functions
* Runtime: Specific to programming languages
* Predictive scheduling
* Anticipating the demand of functions and deploy the functions ahead of time
* Function prefetching
* Move data to where it is most likely to be executed soon
* Horizontal pre-warming
* Deploy the dependent functions ahead of time
* Scheduled Warming
### Concerns
## Network Function Virtualization
* Goals
* Make an efficient and customizable data plane
* Run network functions in software
* Run them in virtual machines
* Advantage
* Scalable
* Cheap
* Rapid on-demand instantiation and removal
### Middlebox
* Types
* Firewall
* DDOS Protection
* QoE Monitor
* etc
* Are mainly software based functions
* Why they exists
* A solution in response to changing performance, security, and policy-compliance requirements
* Network Functions
* switches
* Intrusion Detection System
* Drawbacks
* Hard to modify
* Costly
### Packet Processing
* Linux Packet Processing
* NIC uses DMA to copy data
* Interrupts when packets arrive
* Copy packets from kernel space to user space: overhead
* Use a system call to transmit packets to kernel space
* User space Packet Processing
* NIC uses DMA to copy data into user space driver
* Use Polling to find when packet arrives (**DPDK**)
* Use regular function call to transmit packets from user space
* Design principle
* Combined NF
* Everything has to be compiled together, tight code dependency
* Fast calls to NFs
* VM or container per NF
* Can be dynamically instantiated
### DPDK
* Bypass the kernel
* Benefits
* Get larger pages
* No extra copies
* No context switching problems
* Copy the descriptor to user space
* Applications that needs high speed access to low-level devices
* Solutions
* Polling
* User mode driver
* Pthread affinity
* Get the benefit of caching in CPU
* Huge pages (not 4 KB)
* 4 packets per page
* 4 KB pages in kernel space update the Translation Lookaside Buffer (TLB) frequently
* On Sandy Bridge, there are 64 entries
* Lockless inter-core communications
* High-throughput bulk-mode I/O calls
* Process in batches
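The difference from the interrupt-driven kernel path can be sketched as a busy-poll loop that pulls packets in bursts (the NIC ring here is just an in-memory queue; a real DPDK driver reads DMA descriptors from huge-page memory).

```python
# Sketch: poll-mode, batched receive loop in the style of a user-space driver.
import collections

nic_ring = collections.deque(f"pkt{i}" for i in range(100))   # filled by DMA in reality
BURST = 32

def rx_burst(max_pkts=BURST):
    """Pull up to one burst of packets without sleeping or taking an interrupt."""
    batch = []
    while nic_ring and len(batch) < max_pkts:
        batch.append(nic_ring.popleft())
    return batch

while True:
    pkts = rx_burst()          # busy-poll: the core spins instead of blocking
    if not pkts:
        break                  # a real poll loop would simply keep spinning
    for p in pkts:
        pass                   # process the whole burst to amortize per-call overhead
```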
### OpenNetVM
* Packets are DMAed into huge-page memory
* Packets can be modified in place
* Only one application accesses a packet at a time, avoiding locks
* One writer, multiple readers
* If a VM has file descriptor
* Moving descriptors only
* Shared memory is allocated by the manager (called NetVM)
* Trusted
* Ported to Docker containers
#### Design Principles
* Providing efficient IO for modular NF which can be dynamically deployed and managed
* Service Chain
* Chain together functionalities to build more complex services
* Monolithic
* Good: Fast calls to NF
* Bad: Code dependencies, must be compiled together
* Modular
* Good: Dynamically instantiate, greater control over resource allocation
* Bad: Higher cost when moving data between NFs
* Eliminate abstraction layers when they cause overhead