Unit 5 - Virtualization, Containerization and Elasticity in Cloud Computing
5.1 Elasticity
Elasticity is defined as "the degree to which a system is able to adapt to workload changes by
provisioning and de-provisioning resources in an autonomic manner, such that at each point in
time the available resources match the current demand as closely as possible."
EC2 instances: A business can increase the number of EC2 instances to handle a traffic spike,
then reduce the number of instances when traffic subsides.
Cloud elasticity: A cloud provider can allocate additional resources to a user's account on
demand when this feature is enabled, allowing the user to handle unexpected or sudden
workload spikes.
AWS Lambda: A serverless compute service that automatically scales in response to incoming
requests. Each request triggers a separate instance of your function, allowing for virtually
unlimited scaling without provisioning servers.
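As a sketch of how the EC2 example above is commonly configured, here is a minimal
CloudFormation excerpt for an Auto Scaling group; the launch template reference and capacity
numbers are assumptions, not values from this unit.

# autoscaling.yaml - illustrative CloudFormation excerpt (launch template and sizes are hypothetical)
Resources:
  WebAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "2"             # floor kept during quiet periods
      MaxSize: "10"            # ceiling reached during traffic spikes
      DesiredCapacity: "2"
      LaunchTemplate:
        LaunchTemplateId: !Ref WebLaunchTemplate   # hypothetical launch template defined elsewhere
        Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
      AvailabilityZones: !GetAZs ""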
5.2 Containers
Cloud containers are software packages that contain an application's code, its libraries, and the
other dependencies it needs to run in the cloud. Traditionally, software had to be packaged in
multiple formats to run in different environments such as Windows, Linux, macOS, and mobile. A
container, by contrast, packages the software and all of its dependencies into a single unit that
can run anywhere. All containers on a host are run by a single operating system kernel and
therefore use fewer resources than virtual machines.
Running containers in the cloud provides additional flexibility and performance benefits at scale:
you can distribute and manage containers across many different cloud servers or instances. In
short, containers are a lightweight form of virtualization that packages an application and its
dependencies together, allowing it to run consistently across different computing environments.
Virtualization
Virtualization is a technology that allows you to create and manage virtual versions of physical
resources, such as servers, storage devices, and network resources. It creates a virtual version
of an underlying service or resource, and it is one of the main cost-effective, hardware-reducing,
and energy-saving techniques used by cloud providers. Virtualization allows a single physical
instance of a resource or an application to be shared among multiple customers and
organisations at the same time.
● Host Machine: The physical machine on which the virtual machine is built is known as
the Host Machine.
● Guest Machine: The virtual machine itself is referred to as the Guest Machine.
Benefits of Virtualization
● More flexible and efficient allocation of resources.
● Enhanced development productivity.
● Lower cost of IT infrastructure.
● Remote access and rapid scalability.
● High availability and disaster recovery.
● Pay-per-use of IT infrastructure on demand.
● Ability to run multiple operating systems on the same hardware.
Containers and virtual machines are not mutually exclusive. For instance, an organisation might
leverage both technologies by running containers inside VMs to increase isolation and security
while reusing already installed tools for automation, backup, and monitoring.
Characteristics of Containers:
Isolation: Containers run in isolated environments, ensuring that applications do not interfere
with each other. This allows multiple containers to run on the same host without conflict.
Efficiency: Containers share the host operating system's kernel, making them more lightweight
than traditional virtual machines (VMs). This allows for faster start-up times and better resource
utilisation.
Scalability: Containers can be easily scaled up or down based on demand. Orchestrators like
Kubernetes automate the management of containerized applications, allowing for dynamic
scaling.
Version Control: Container images can be versioned, making it easy to roll back to previous
versions or deploy specific versions in different environments.
Docker
Containerization: Docker allows you to package applications and their dependencies into
containers. Each container is isolated from others and runs consistently across different
environments.
Docker Images: A Docker image is a lightweight, standalone, executable package that includes
everything needed to run a piece of software, including the code, runtime, libraries, and
environment variables. Images can be versioned and shared.
Docker Hub: A cloud-based registry service for sharing and managing Docker images. Users
can upload their images to Docker Hub or pull existing images for use.
Docker Compose: A tool for defining and running multi-container Docker applications using a
YAML file. It allows you to specify the services, networks, and volumes needed for your
application.
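For illustration, a minimal docker-compose.yml for a two-service application might look like this;
the service names, images, and ports are assumptions:

# docker-compose.yml - illustrative two-service application
services:
  web:
    build: .                 # build the web image from the local Dockerfile
    ports:
      - "8080:80"            # map host port 8080 to container port 80
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example   # illustrative only; use secrets in practice
    volumes:
      - db-data:/var/lib/postgresql/data   # persist data outside the container lifecycle
volumes:
  db-data: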
Docker Swarm: A native clustering and orchestration tool for Docker, allowing you to manage a
group of Docker engines as a single virtual system. It simplifies deployment and scaling of
applications across multiple Docker hosts.
Components of Docker
● Docker Engine: The core part of Docker that handles the creation and
management of containers.
● Docker Image: A read-only template used for creating containers, containing the
application code and its dependencies.
● Docker Hub: A cloud-based repository for finding and sharing container images.
● Dockerfile: A script that contains instructions to build a Docker image.
● Docker Registry: A storage and distribution system for Docker images, where you
can store images in both public and private modes.
Working of Docker
Containers: Docker packages applications and their dependencies into containers. A container
is a lightweight, standalone, executable package that includes everything needed to run the
application, such as the code, runtime, libraries, and system tools.
Images: Containers are created from images, which are read-only templates. An image
contains all the necessary files and instructions to run an application. You can create your own
images or use existing ones from Docker Hub, a public repository of images.
Docker Engine: This is the core component of Docker, responsible for running containers. It
can be installed on various operating systems and provides a REST API for managing
containers.
Docker Compose: For applications consisting of multiple containers, Docker Compose allows
you to define and manage them using a YAML file. This makes it easier to configure services,
networks, and volumes.
Networking and Storage: Docker provides networking capabilities to connect containers and
manage communication. It also offers options for persistent storage, allowing data to be stored
outside of the container lifecycle.
Isolation: Containers run in isolated environments, ensuring that they don’t interfere with one
another or the host system. This isolation helps in consistent deployments and easier
troubleshooting.
DevOps
DevOps combines development (Dev) and operations (Ops) to increase the efficiency, speed,
and security of software development and delivery compared to traditional processes. DevOps
practices enable software development (dev) and operations (ops) teams to accelerate delivery
through automation, collaboration, fast feedback, and iterative improvement. Stemming from an
Agile approach to software development, a DevOps process expands on the cross-functional
approach of building and shipping applications in a faster and more iterative manner.
DevOps is a set of practices, tools, and a cultural philosophy that automate and integrate
the processes between software development and IT teams. It emphasises team
empowerment, cross-team communication and collaboration, and technology
automation.
It seeks to shorten the software development lifecycle and increase the frequency of software
releases.
Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes,
these two teams merge into a single team where the engineers work across the entire
application lifecycle — from development and test to deployment and operations — and have a
range of multidisciplinary skills.
DevOps teams use tools to automate and accelerate processes, which helps to increase
reliability. A DevOps toolchain helps teams tackle important DevOps fundamentals including
continuous integration, continuous delivery, automation, and collaboration. DevOps values are
sometimes applied to teams other than development. When security teams adopt a DevOps
approach, security is an active and integrated part of the development process. This is called
DevSecOps.
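To make the toolchain idea concrete, here is a minimal sketch of a continuous integration
pipeline written as a GitHub Actions workflow; the image name, branch, and test script are
hypothetical.

# .github/workflows/ci.yml - illustrative CI sketch
name: ci
on:
  push:
    branches: [main]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # fetch the repository source
      - name: Build the application image
        run: docker build -t myapp:${{ github.sha }} .        # hypothetical image name
      - name: Run the test suite inside the container
        run: docker run --rm myapp:${{ github.sha }} ./run-tests.sh   # hypothetical test script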
Container Registry
A container registry is a repository for storing, sharing, and managing container images.
Registries can be public or private:
Public registries are commonly used by individuals or small teams that want to get up and
running with their registry as quickly as possible. However, as their organisations grow, more
complex security issues such as patching, privacy, and access control can arise.
Private registries provide a way to incorporate security and privacy into enterprise container
image storage, either hosted remotely or on-premises. These private registries often come with
advanced security features and technical support.
Most cloud providers offer private image registry services: Google offers the Google Container
Registry, AWS provides Amazon Elastic Container Registry (ECR), and Microsoft has the Azure
Container Registry.
Benefits of container registry
● Provide a place to save images for sharing and collaboration
● Act as a single source of truth for an application or component, providing the most recent
version ready for replication and use
● Define container images approved for use in the organisation
Kubernetes Architecture
The Kubernetes master is responsible for managing the entire cluster: it coordinates all activities
inside the cluster and communicates with the worker nodes to keep Kubernetes and your
application running.
a. API Server
i. All the administrative tasks are done by the API server within the master node.
ii. If we want to create, delete, update, or display a Kubernetes object, the request
has to go through this API server.
iii. The API server validates and configures API objects such as pods, services,
replication controllers, and deployments, and it is responsible for exposing APIs
for every operation.
b. Scheduler
i. It is a service in the master responsible for distributing the workload.
ii. It is responsible for tracking resource utilisation on each worker node and
placing the workload on nodes that have available resources and can accept it.
iii. The scheduler places pods across available nodes according to the constraints
you specify in the configuration file.
c. Controller Manager
i. Also known as controllers, it is a daemon that runs in a non-terminating loop and
is responsible for collecting and sending information to the API server.
ii. It regulates the Kubernetes cluster by performing lifecycle functions such as
namespace creation, lifecycle-event garbage collection, terminated-pod garbage
collection, cascading-deletion garbage collection, node garbage collection, and
many more.
iii. Basically, the controller watches the desired state of the cluster; if the current
state does not meet the desired state, the control loop takes corrective steps to
bring the current state in line with the desired state.
d. etcd
i. It is a lightweight, distributed key-value database. In Kubernetes, it is the central
database for storing the current cluster state at any point in time and is also used
to store configuration details such as subnets, config maps, etc. It is written in
the Go programming language.
The Kubernetes worker node contains all the necessary services to manage the networking
between the containers, communicate with the master node, and assign resources to the
scheduled containers.
a. Kubelet
i. It is the primary node agent; it runs on each worker node inside the cluster and
communicates with the master node.
ii. It gets the pod specifications through the API server, executes the containers
associated with the pods, and ensures that the containers described in the pods
are running and healthy.
iii. If the kubelet notices any issues with the pods running on a worker node, it
tries to restart the pod on the same node.
iv. If the issue is with the worker node itself, the Kubernetes master detects the
node failure and recreates the pods on another healthy node.
b. Kube-Proxy
i. It is the core networking component inside the Kubernetes cluster.
ii. It is responsible for maintaining the entire network configuration.
iii. Kube-Proxy maintains the distributed network across all the nodes, pods, and
containers and exposes the services to the outside world.
c. Pods
i. A pod is a group of containers that are deployed together on the same host.
Pods let us deploy multiple dependent containers together; the pod acts as a
wrapper around these containers, and we interact with and manage the
containers primarily through the pod (see the manifest sketch after this list).
d. Docker
i. Docker is the containerization platform used to package your application and all
of its dependencies together in the form of containers, so that your application
works seamlessly in any environment, whether development, test, or production.
Docker is a tool designed to make it easier to create, deploy, and run
applications by using containers.
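As a concrete illustration of the pod concept above, here is a minimal pod manifest with a main
application container and a logging sidecar; all names and images are illustrative assumptions.

# pod.yaml - minimal multi-container pod sketch
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  containers:
    - name: web              # main application container
      image: nginx:1.25
      ports:
        - containerPort: 80
    - name: log-agent        # sidecar sharing the pod's network and lifecycle
      image: busybox:1.36
      command: ["sh", "-c", "tail -f /dev/null"]   # placeholder process for illustration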
Scaling
Scaling in Kubernetes refers to the process of adjusting the number of pod replicas in response
to application demand. It can be done both manually and automatically, ensuring that
applications can handle varying loads efficiently.
1. Manual Scaling
a. Replicas: You can manually adjust the number of replicas in a deployment by
changing the replicas field in the deployment configuration.
b. kubectl Command: Use the command kubectl scale deployment
<deployment-name> --replicas=<number> to increase or decrease the number of
pod instances.
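For example, the replicas field sits in the deployment specification; in this sketch the deployment
name and image are hypothetical:

# deployment.yaml - excerpt showing manual scaling via the replicas field
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                # desired number of pod instances
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25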
2. Horizontal Pod Autoscaler (HPA)
a. Automatic Scaling: HPA automatically adjusts the number of pod replicas based
on observed CPU utilization (or other select metrics).
b. Configuration: You define HPA using metrics that specify the desired resource
utilization. Kubernetes will then increase or decrease the number of pods to
maintain this level.
c. Example: If you set HPA to maintain 50% CPU usage, and the average CPU
usage goes above this threshold, HPA will increase the number of pods.
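A minimal HPA manifest matching the 50% CPU example above might look like this; the target
deployment name and replica bounds are assumptions:

# hpa.yaml - illustrative HorizontalPodAutoscaler sketch
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:            # the deployment whose replica count HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # target average CPU utilisation across pods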
3. Vertical Pod Autoscaler (VPA)
a. Resource Adjustment: VPA adjusts the resource requests and limits for
containers within a pod based on historical usage patterns.
b. Use Case: This is useful when workloads require more CPU or memory over
time but do not need to scale out (increase the number of pods).
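A minimal VPA object might be sketched as follows; note that VPA is an add-on that must be
installed separately, and the target name is hypothetical:

# vpa.yaml - illustrative VerticalPodAutoscaler sketch (requires the VPA add-on)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Auto"       # let VPA apply recommended requests/limits automatically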
4. Cluster Autoscaler
a. Node Scaling: This component automatically adjusts the size of the Kubernetes
cluster by adding or removing nodes based on pod requirements.
b. Integration: Works with cloud providers to scale nodes up or down depending on
resource needs.
Pipeline
A data pipeline deals with information flowing from one end to another. In simple words, it
collects data from various sources, processes it as per requirements, and transfers it to a
destination through a sequence of steps: it extracts data from various sources, transforms it,
and moves it from one system to another. A typical pipeline involves the following components:
1. Source
2. Destination
3. Data flow
4. Processing
5. Workflow
6. Monitoring
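Purely as an illustration of how these components fit together, a pipeline definition in a
hypothetical orchestration tool could be written like this; the schema, keys, and names are all
invented for illustration:

# pipeline.yaml - purely illustrative; the schema and names are invented
name: orders-pipeline
source:                      # 1. Source: where data is extracted from
  type: postgres
  table: raw_orders
destination:                 # 2. Destination: where results are delivered
  type: s3
  bucket: analytics-output
processing:                  # 3/4. Data flow and processing: ordered transformations
  - step: clean-nulls
  - step: aggregate-daily
schedule: "0 * * * *"        # 5. Workflow: run every hour
alerts:                      # 6. Monitoring: notify on failures
  on_failure: email-team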
Microservices
Microservices are an architectural approach to developing software applications as a collection
of small, independent services that communicate with each other over a network. Microservice
is a small, loosely coupled service that is designed to perform a specific business function and
each microservice can be developed, deployed, and scaled independently.
This architecture allows you to take a large monolithic application and decompose it into small,
manageable components/services. It is considered a building block of modern applications.
Microservices can be written in a variety of programming languages and frameworks, and each
service acts as a mini-application on its own.
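In a Kubernetes deployment, each microservice is typically exposed through its own Service so
that other services can reach it by name over the network. A minimal sketch, assuming a
hypothetical "orders" microservice listening on port 8080:

# service.yaml - exposes one microservice inside the cluster (names and ports hypothetical)
apiVersion: v1
kind: Service
metadata:
  name: orders               # other microservices can call http://orders:8080
spec:
  selector:
    app: orders              # routes traffic to pods labelled app=orders
  ports:
    - port: 8080             # port exposed by the service
      targetPort: 8080       # port the container listens on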
Hybrid and Multicloud Kubernetes refer to the deployment and management of Kubernetes
clusters across different environments, combining on-premises infrastructure with public cloud
services or multiple cloud providers. Here's an overview of each concept:
Hybrid Kubernetes
Definition: Hybrid Kubernetes involves running Kubernetes clusters that span both on-premises
data centres and public cloud environments.
Use Cases:
● Regulatory Compliance: Sensitive data can be kept on-premises while utilising cloud
resources for less sensitive workloads.
● Scalability: Businesses can scale their applications in the cloud during peak demand
while maintaining core services on-premises.
Multicloud Kubernetes
Definition: Multicloud Kubernetes refers to using multiple cloud providers (e.g., AWS, Azure,
Google Cloud) to run Kubernetes clusters, often with a focus on leveraging the best features
from each provider.
Key Features:
● Avoid Vendor Lock-in: By using multiple cloud providers, organisations can avoid being
tied to a single vendor, increasing flexibility and negotiating power.
● Optimised Resource Utilisation: Organisations can choose the best services or pricing
options from different clouds, optimising costs and performance.
● Redundancy and Resilience: Distributing workloads across multiple clouds can enhance
fault tolerance and availability.
Use Cases: