Virtualization and Containerization
This document is meant to be a gentle introduction to virtualization and containerization.
The key concepts discussed in the lecture that accompanies this document are
- virtualization
- containerization
- virtualization vs containerization
- orchestration
For more info, please visit the Useful Links section at the end of this document.
Introduction
- Generally speaking, it is a good practice to run individual services (or associated services) on independent and isolated servers.
- However, having one physical machine per service (or group of associated services) is not a scalable model because of cost and other constraints such as physical space, hardware availability, cooling, and power.
This example does not scale very well because it is not cost-effective (among other issues). Notice the resources are not being used efficiently. Figure taken from RedHat.
Virtualization
- Virtualization is a technology that lets you create useful IT services using resources that are traditionally bound to hardware.
- This technology allows devs and users to isolate applications in software.
- Virtualization uses a hypervisor to create and run virtual machines (VMs).
- A virtual machine (VM) is a compute resource that uses software instead of a physical computer to run programs and deploy apps.
- Virtual machines are logically isolated from one another, with their own operating system kernel.
- The machine running the hypervisor is often referred to as the host machine or host.
- The operating system (OS) of the host machine is referred to as the host operating system (OS).
- The virtual machine running on the host machine is often referred to as a guest machine or guest.
- The operating system (OS) of the guest machine is referred to as the guest operating system (OS).
Hypervisors separate the physical resources from the virtual environments. Figure taken from RedHat.
- Virtualization lets devs and users separate their data from the guest OS running the application, i.e. more than likely your data and your service will live on different hosts.
Reconsider the first example. We were able to reduce the number of physical servers by running two services on the same machine, each in its own virtual machine. Figure taken from RedHat.
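The consolidation in this example can be sketched as a toy placement problem. This is not how a real hypervisor schedules guests; it is a first-fit-decreasing heuristic over CPU cores only, and the names `Host` and `place_vms` as well as every number are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine with a fixed number of CPU cores (hypothetical model)."""
    name: str
    cores: int
    guests: dict[str, int] = field(default_factory=dict)

    @property
    def free_cores(self) -> int:
        return self.cores - sum(self.guests.values())


def place_vms(demands: dict[str, int], cores_per_host: int) -> list[Host]:
    """Place each VM on the first host with room; open a new host when none fits."""
    hosts: list[Host] = []
    # Largest VMs first (first-fit decreasing).
    for vm, cores in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if cores > cores_per_host:
            raise ValueError(f"{vm} needs {cores} cores; a host has {cores_per_host}")
        target = next((h for h in hosts if h.free_cores >= cores), None)
        if target is None:
            target = Host(name=f"host-{len(hosts) + 1}", cores=cores_per_host)
            hosts.append(target)
        target.guests[vm] = cores
    return hosts


# Hypothetical services that would otherwise each occupy a physical server.
demands = {"web": 2, "database": 4, "mail": 1, "monitoring": 1}
hosts = place_vms(demands, cores_per_host=8)
for host in hosts:
    print(host.name, host.guests, "free cores:", host.free_cores)
```

With these made-up demands, four services fit on a single eight-core host instead of four separate machines.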
Desktop Virtualization
Check virtual.andrew.cmu.edu.
This document does not cover desktop virtualization, but be aware that this service is offered to you as a member of the CMU community.
Local Server/OS Virtualization
Figure taken from RedHat.
- OS virtualization is very popular among data scientists and people working in data analytics.
- It is very straightforward to deploy a guest OS on a laptop or desktop.
- In this context, the capability of running multiple operating systems and applications on a single computer is the most important aspect, e.g. a host machine running Windows with a Linux guest machine (the poster child).
Oracle VM VirtualBox running on a MacBook Pro circa 2015.
- Oracle VM VirtualBox is a free, widely used solution for local virtualization on personal computers and small machines.
- VMware is a commercial enterprise solution.
- oVirt is a free open-source virtualization enterprise solution.
- Local virtualization is widely used for testing or running apps that are only available or stable on a specific OS.
The cloud
- The cloud is not a physical entity, but instead is a vast network of remote servers around the globe.
The cloud can make your life easier especially if you are not interested in DevOps.
- Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform.
- Amazon Web Services Free Tier is designed to give you hands-on experience with a range of AWS services at no charge.
- Other companies, such as Google Cloud and Microsoft Azure, offer similar packages.
Some cloud computing services offer free allocations for full-time students enrolled in accredited universities like CMU.
For example, the GitHub Student Developer Pack.
Local virtualization vs the cloud
- From my personal experience, choosing between local and cloud computing (as well as a provider) always boils down to cost.
- Keep in mind
  - physical space
  - hardware availability
  - cooling and power, and
  - human resources
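The cost comparison above can be made concrete with a rough break-even calculation. This is a minimal sketch; the function name `months_to_break_even` and every figure are hypothetical placeholders, not real prices from any provider.

```python
from typing import Optional


def months_to_break_even(upfront_local: float,
                         monthly_local: float,
                         monthly_cloud: float) -> Optional[float]:
    """Months after which owning hardware becomes cheaper than renting.

    Returns None when the cloud is not more expensive per month,
    in which case the local option never catches up.
    """
    monthly_saving = monthly_cloud - monthly_local
    if monthly_saving <= 0:
        return None
    return upfront_local / monthly_saving


# All figures are made-up placeholders.
upfront = 6000.0        # server purchase
monthly_local = 150.0   # power, cooling, space, share of an admin's time
monthly_cloud = 400.0   # comparable rented instance

print(months_to_break_even(upfront, monthly_local, monthly_cloud))
```

With these placeholder numbers the purchase pays for itself after 24 months; a short-lived project would favor the cloud.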
Data Virtualization
- Data that's spread all over can be consolidated into a single source.
- Data virtualization tools sit in front of multiple data sources and allow them to be treated as a single source.
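The idea of treating several sources as one can be sketched in a few lines. This is a toy illustration, not a data virtualization product; the two sources, the sample rows, and the function names are all assumptions.

```python
import csv
import io
import sqlite3
from typing import Iterator

# Source 1: a relational database (in-memory SQLite, hypothetical rows).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source 2: a CSV export from another system (hypothetical rows).
CSV_EXPORT = "id,name\n3,Linus\n4,Margaret\n"


def sql_customers() -> Iterator[dict]:
    for row_id, name in db.execute("SELECT id, name FROM customers ORDER BY id"):
        yield {"id": row_id, "name": name, "source": "sqlite"}


def csv_customers() -> Iterator[dict]:
    for row in csv.DictReader(io.StringIO(CSV_EXPORT)):
        yield {"id": int(row["id"]), "name": row["name"], "source": "csv"}


def all_customers() -> Iterator[dict]:
    """Virtual view: one interface, no data copied into a new store."""
    yield from sql_customers()
    yield from csv_customers()


for customer in all_customers():
    print(customer)
```

Callers of `all_customers` do not need to know where each record lives, which is the point of placing a layer in front of the sources.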
Containerization
Mandatory container joke added to every presentation about containers
- Containerization is defined as a form of operating system virtualization, through which applications are run in isolated user spaces called containers, all using the same shared operating system (OS).
- A container is essentially a fully packaged and portable computing environment.
- Containers are meant to be
  - portable
  - scalable
  - easier to build and deploy
  - easier to manage
  - isolated
Docker
- Docker is an open platform for developing, shipping, and running applications or services.
- A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. For an example Dockerfile, click here.
- The Registry is a stateless, highly scalable server-side application that stores and lets you distribute Docker images.
- Docker Hub is the most widely known public registry.
Docker uses a client-server architecture. Figure taken from Docker.
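Because images are pulled from a registry, an image name encodes where the image lives. The sketch below is a simplified take on the usual naming convention (Docker Hub as the default registry, `library/` for official images, `latest` as the default tag). It is not Docker's own parser, it ignores digests, and the names `ImageRef` and `parse_image_ref` are hypothetical.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into registry, repository, and tag.

    Simplified: digests (name@sha256:...) are not handled.
    """
    first, slash, rest = ref.partition("/")
    # The first component is a registry host only if it looks like one.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, name = first, rest
    else:
        registry, name = "docker.io", ref
    # A tag is a ":" suffix on the last path component.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        name, tag = name[:colon], name[colon + 1:]
    else:
        tag = "latest"
    # Official images on Docker Hub live under the "library" namespace.
    if registry == "docker.io" and "/" not in name:
        name = "library/" + name
    return ImageRef(registry, name, tag)


print(parse_image_ref("ubuntu"))
print(parse_image_ref("ubuntu:22.04"))
print(parse_image_ref("localhost:5000/team/app:1.0"))
```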
Singularity
- Singularity is yet another container platform.
- It is targeted towards isolating software, not deploying microservices.
Take this comment with a grain of salt: Docker is generally not used on HPC clusters due to security concerns.
Orchestration
- Container orchestration automates the provisioning, deployment, networking, scaling, availability, and lifecycle management of containers.
Virtualization versus Containerization
Different techniques to solve similar problems. Figure taken from e4developer.
In practice, we tend to combine virtualization and containerization to create and deploy services.
Example. Hello World.
A Dockerfile that can be used to print Hello, World! looks like
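A minimal sketch of such a Dockerfile, held in a Python string so that a short script can write it to disk. The `alpine:3.19` base image, the `echo` command, and the helper name `write_dockerfile` are assumptions made for illustration.

```python
from pathlib import Path

# Assumed contents: the alpine base image and the echo command are
# illustrative choices.
DOCKERFILE = """\
FROM alpine:3.19
CMD ["echo", "Hello, World!"]
"""


def write_dockerfile(directory: str) -> Path:
    """Write the Dockerfile text into the given directory and return its path."""
    path = Path(directory) / "Dockerfile"
    path.write_text(DOCKERFILE)
    return path
```

Assuming Docker is installed, calling `write_dockerfile(".")` and then running `docker build -t hello-example .` followed by `docker run --rm hello-example` should print the greeting.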
A Singularity definition file that can be used to print Hello, World! looks like
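A minimal sketch of such a definition file, again held in a Python string. Bootstrapping from the `alpine:3.19` Docker image, the file name `hello.def`, and the helper name `write_definition` are assumptions made for illustration.

```python
from pathlib import Path

# Assumed contents: the base image comes from a Docker registry, and the
# runscript section holds the command executed when the container runs.
SINGULARITY_DEF = """\
Bootstrap: docker
From: alpine:3.19

%runscript
    echo "Hello, World!"
"""


def write_definition(directory: str) -> Path:
    """Write the definition file into the given directory and return its path."""
    path = Path(directory) / "hello.def"
    path.write_text(SINGULARITY_DEF)
    return path
```

Assuming Singularity is installed and you have the privileges it requires for builds, `singularity build hello.sif hello.def` followed by `singularity run hello.sif` should print the greeting.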
Example. FALCON for Python over Docker.
halcon is a Python implementation of the Feedback Adaptive Loop for Content-Based Retrieval (FALCON) algorithm as described in
- Leejay Wu, Christos Faloutsos, Katia P. Sycara, and Terry R. Payne. 2000. FALCON: Feedback Adaptive Loop for Content-Based Retrieval. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB '00), Amr El Abbadi, Michael L. Brodie, Sharma Chakravarthy, Umeshwar Dayal, Nabil Kamel, Gunter Schlageter, and Kyu-Young Whang (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 297-306.
The algorithm works, but the code was written a long time ago. If I were to install the library on a newer OS, it would more than likely fail. However, because I am using a Docker image, the library still runs.
Upon inspection, the image exists locally
and running an example in the original repo leads to
The take-home lesson for this example is that I can very easily deploy and run legacy code (from about 7 years ago) on a modern system.
Example. Autolab.

Autolab is a course management platform that enables instructors to offer autograded programming assignments to their students.

The take-home lesson for this example is that you can create and deploy services that use both local and cloud computing resources by exploiting the fact that Docker images are portable.
Example. Galaxy Project.

Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational research.

Using Galaxy you can create complex workflows by connecting tools. Each tool could be deployed.
A simple Galaxy deployment has at least two machines, virtual or physical: one to host the Galaxy server and another, a compute node, to run workflows.
However, using Docker it is possible to scale this service.

Figure taken from Galaxy Project.
For more information about how to use Singularity containers for running Galaxy jobs, click here.
Conclusion

Figure taken from Kubernetes.io.
Advanced Topics
Orchestration

- Container orchestration automates the deployment, management, scaling, and networking of containers.
- Platform as a service (PaaS) is an enabler for software development where a third-party service provider delivers a platform to customers so they can develop, run, and manage software applications without the need to build and maintain the underlying infrastructure themselves.
Kubernetes

Figure taken from Kubernetes.io.
- Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.
- When you deploy Kubernetes, you get a cluster.
- A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
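Declarative configuration means you state the desired result and the orchestrator works out the steps. The toy reconciliation function below illustrates that idea only. It is not the Kubernetes API, and the name `reconcile`, the application names, and the replica counts are hypothetical.

```python
def reconcile(desired: dict[str, int], running: dict[str, int]) -> list[str]:
    """Return the actions that move the running state toward the desired state."""
    actions = []
    for app in sorted(desired.keys() | running.keys()):
        want = desired.get(app, 0)
        have = running.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} x {app}")
        elif have > want:
            actions.append(f"stop {have - want} x {app}")
    return actions


# Desired state, as a user would declare it (hypothetical workloads).
desired = {"web": 3, "worker": 2}
# Observed state of the cluster at this moment.
running = {"web": 1, "worker": 2, "old-job": 1}

print(reconcile(desired, running))
```

An orchestrator repeats a comparison of this kind continuously, so a crashed container shows up as a gap between desired and running and gets replaced.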
OpenShift

- Red Hat OpenShift and Kubernetes are both container orchestration software, but Red Hat OpenShift is packaged as a downstream enterprise open source platform—meaning it’s undergone additional testing and contains additional features not available from the Kubernetes open source project.
Common Workflow Language

- The Common Workflow Language (CWL) is a specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.
Example. Hello World.
Consider this example. You will need to create two files: tool.cwl and job.yml.
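A minimal sketch of what the two files could contain, held in Python strings so that a short script can write them to disk. The tool wraps `echo` with a single string input; the input name `message`, the CWL version, and the helper name `write_cwl_example` are assumptions made for illustration.

```python
from pathlib import Path

# Assumed tool description: a command-line tool that runs `echo`
# with one string input passed as the first argument.
TOOL_CWL = """\
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
"""

# Assumed job file: the value supplied for the `message` input.
JOB_YML = """\
message: "Hello, World!"
"""


def write_cwl_example(directory: str) -> tuple[Path, Path]:
    """Write tool.cwl and job.yml into the given directory."""
    tool = Path(directory) / "tool.cwl"
    job = Path(directory) / "job.yml"
    tool.write_text(TOOL_CWL)
    job.write_text(JOB_YML)
    return tool, job
```

Assuming a CWL runner such as `cwltool` is installed, `cwltool tool.cwl job.yml` should run the tool with the input from the job file.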
Airflow

- Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows.
- When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
Example. Hello World.
Consider the example below
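The sketch that follows is not Airflow code. It uses only Python's standard library (`graphlib`) to illustrate the underlying idea of a workflow defined as code: tasks are plain functions and dependencies determine the execution order. The task names and the `run_workflow` helper are hypothetical. In Airflow, a comparable workflow would be declared as a DAG of operators and executed by the scheduler.

```python
from graphlib import TopologicalSorter


def say_hello() -> str:
    return "Hello"


def say_world() -> str:
    return "World"


# Task name -> callable.
tasks = {"hello": say_hello, "world": say_world}

# Task name -> set of tasks that must finish first.
dependencies = {"hello": set(), "world": {"hello"}}


def run_workflow() -> list[str]:
    """Run every task once, respecting the declared dependencies."""
    results = []
    for name in TopologicalSorter(dependencies).static_order():
        results.append(tasks[name]())
    return results


print(" ".join(run_workflow()))
```

Because the workflow is ordinary code, it can be versioned, reviewed, and tested like any other module.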
Useful Links
Vocabulary
- Container. A container is essentially a fully packaged and portable computing environment.
- Containerization. Containerization is defined as a form of operating system virtualization, through which applications are run in isolated user spaces called containers, all using the same shared operating system (OS).
- Docker. The term Docker can refer to (1) the Docker project as a whole, which is a platform for developers and sysadmins to develop, ship, and run applications, or (2) the docker daemon process running on the host which manages images and containers (also called Docker Engine).
- Dockerfile. A Dockerfile is a text document that contains all the commands you would normally execute manually in order to build a Docker image. Docker can build images automatically by reading the instructions from a Dockerfile.
- Singularity. Singularity is a container platform. It allows you to create and run containers that package up pieces of software in a way that is portable and reproducible.
- Hypervisor. A hypervisor is software that creates and runs virtual machines (VMs). A hypervisor, sometimes called a virtual machine monitor (VMM), isolates the hypervisor operating system and resources from the virtual machines and enables the creation and management of those VMs.
- Singularity image file. A single file, in the Singularity Image Format (SIF), that holds a complete container image.
- Virtualization. Virtualization is technology that lets you create useful IT services using resources that are traditionally bound to hardware. It allows you to use a physical machine’s full capacity by distributing its capabilities among many users or environments.
- Virtual machine. A virtual machine (VM) is a compute resource that uses software instead of a physical computer to run programs and deploy apps.