Chapter 8 - Virtualization and Cloud Computing Part 1


Virtualization and Cloud Computing for Big Data

Big Data Technology


Introduction
 Virtualization and cloud computing are important technologies that
support Big Data.
 Virtualization provides the added level of efficiency that makes big data
platforms a reality. Although virtualization is technically not a requirement
for big data analysis, software frameworks are more efficient in a virtualized
environment.
What is Virtualization?
 Virtualization is the ability to run multiple operating systems on a single
physical system and share the underlying hardware resources.*
 It is the process by which one computer hosts the appearance of many
computers.
 Virtualization is used to improve IT throughput and reduce costs by using
physical resources as a pool from which virtual resources can be allocated.

*VMWare white paper, Virtualization Overview
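
The idea of treating physical resources as a pool from which virtual resources are allocated can be illustrated with a minimal Python sketch. The `ResourcePool` class, its fields, and the numbers used are hypothetical and chosen only for illustration; real hypervisors use far more elaborate accounting.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical pool of physical resources shared by virtual machines."""
    total_cpus: int
    total_mem_gb: int
    allocations: dict = field(default_factory=dict)  # VM name -> (cpus, mem_gb)

    def free(self):
        """Return (cpus, mem_gb) not yet handed out to any VM."""
        used_cpus = sum(c for c, _ in self.allocations.values())
        used_mem = sum(m for _, m in self.allocations.values())
        return self.total_cpus - used_cpus, self.total_mem_gb - used_mem

    def allocate(self, vm_name, cpus, mem_gb):
        """Carve a virtual allocation out of the pool; refuse if it does not fit."""
        free_cpus, free_mem = self.free()
        if cpus > free_cpus or mem_gb > free_mem:
            return False
        self.allocations[vm_name] = (cpus, mem_gb)
        return True

    def release(self, vm_name):
        """Return a VM's resources to the pool."""
        self.allocations.pop(vm_name, None)


pool = ResourcePool(total_cpus=16, total_mem_gb=64)
print(pool.allocate("vm-linux", cpus=8, mem_gb=32))     # request fits in the pool
print(pool.allocate("vm-windows", cpus=12, mem_gb=16))  # asks for more CPUs than remain
print(pool.free())
```

Releasing a VM returns its share to the pool, which is what lets the same hardware serve different workloads over time.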


What is Virtualization?
 Virtualization is the abstraction of computer resources into virtual
versions, such as virtual computer hardware platforms, storage devices, and
computer network resources.
Virtualization Architecture
• A Virtual machine (VM) is an isolated runtime environment (guest OS and
applications)
• Multiple virtual systems (VMs) can run on a single physical system
Hypervisor
A hypervisor, a.k.a. a virtual machine manager/monitor (VMM), or
virtualization manager, is a program that allows multiple operating systems
to share a single hardware host.
 Each guest operating system appears to have the host's processor, memory,
and other resources all to itself. However, the hypervisor is actually
controlling the host processor and resources, allocating what is needed to
each operating system in turn and making sure that the guest operating
systems (called virtual machines) cannot disrupt each other.
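
How a hypervisor might hand the host processor to each guest "in turn" can be sketched as simple round-robin time slicing. This is a deliberately simplified illustration, not how any particular hypervisor schedules guests; the function name, guest names, and slice lengths are assumptions.

```python
from collections import deque


def round_robin(guests, time_slice_ms, total_ms):
    """Share one host CPU among guests in fixed time slices.

    Returns a dict mapping each guest name to the CPU time (ms) it received.
    """
    if time_slice_ms <= 0:
        raise ValueError("time_slice_ms must be positive")
    received = {name: 0 for name in guests}
    queue = deque(guests)
    remaining = total_ms
    while remaining > 0 and queue:
        guest = queue.popleft()
        slice_ms = min(time_slice_ms, remaining)
        received[guest] += slice_ms   # the guest runs as if it owned the CPU
        remaining -= slice_ms
        queue.append(guest)           # back of the line for its next turn
    return received


print(round_robin(["guest-a", "guest-b", "guest-c"], time_slice_ms=10, total_ms=60))
```

Each guest only ever sees its own slices, which is why it appears to have the processor to itself.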
Hypervisor
• The hypervisor plays a central role in virtualization by virtualizing the
hardware. It supports running multiple operating systems concurrently in
virtual servers created within a physical server.

• The virtualization layer is the software responsible for hosting and managing
all VMs. The virtualization layer is a hypervisor running directly on the
hardware.

• Examples: VMware, Xen, KVM.


Server without Virtualization
• Only one OS can run at a time within a server.
• Underutilization of resources.
• Inflexible and costly infrastructure.
• Hardware changes require manual effort and access to the physical server.
[Diagram: Multiple Software Applications / Operating System / Hardware (CPU, Memory, NIC, Disk)]
Server with Virtualization
• Can run multiple OS simultaneously.
• Each OS can have different hardware configuration.
• Efficient utilization of hardware resources.
• Each virtual machine is independent.
• Save electricity, initial cost to buy servers, space etc.
• Easy to manage and monitor virtual machines centrally.
[Diagram: Virtual Server 1 and Virtual Server 2 (each with Multiple Software Applications and an Operating System) / Hypervisor / Hardware (CPU, Memory, NIC, Disk)]

Server with Virtualization
Full virtualization
• Enables hypervisors to run an unmodified guest operating system
(e.g. Windows 2003 or XP).
• Guest OS is not aware that it is being virtualized.
• E.g.: VMware uses a combination of direct execution and binary
translation techniques to achieve full virtualization of server systems.
[Diagram: Virtual Server 1 and Virtual Server 2 (each with Multiple Software Applications and an Operating System) / Hypervisor / Hardware (CPU, Memory, NIC, Disk)]

Server with Virtualization
Para virtualization
• Involves explicitly modifying the guest operating system (e.g. SUSE Linux
Enterprise Server 11) so that it is aware of being virtualized, which
allows near-native performance.
• Improves performance.
• Lower overhead.
• E.g.: Xen supports both Hardware Assisted Virtualization (HVM) and
Para-Virtualization (PV).
[Diagram: Virtual Server 1 and Virtual Server 2 (each with Multiple Software Applications and a Para virtualized Guest Operating System) / Hypervisor / VMM / Hardware (CPU, Memory, NIC, Disk)]

Server with Virtualization
Bare metal Approach
• Type I Hypervisor.
• Runs directly on the system hardware.
• May require hardware assisted virtualization technology support by the CPU.
• Limited set of hardware drivers provided by the hypervisor vendor.
• E.g.: Xen, VMware ESXi
[Diagram: VM, VM, VM / Hypervisor (Kernel, Driver) / Hardware]
Server with Virtualization
Hosted Approach
• Type II Hypervisor.
• Runs virtual machines on top of a host OS (Windows, Unix, etc.).
• Relies on host OS for physical resource management.
• Host operating system provides drivers for communicating with
the server hardware.
• E.g.: VirtualBox
[Diagram: VM, VM, Applications / Hypervisor / Host Operating System / Hardware]
Characteristics of Virtualization
 Virtualization has three characteristics that support the scalability and
operating efficiency required for big data environments:
 Partitioning: In virtualization, many applications and operating systems are
supported in a single physical system by partitioning the available resources.
 Isolation: Each virtual machine is isolated from its host physical system and
other virtualized machines. Because of this isolation, if one virtual instance
crashes, the other virtual machines and the host system aren’t affected. In
addition, data isn’t shared between one virtual instance and another.
 Encapsulation: A virtual machine can be represented as a single file, so you
can identify it easily based on the services it provides.
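
Encapsulation, in the sense of a virtual machine being representable as a single file, can be illustrated with a toy Python sketch. Here a dictionary stands in for the VM and JSON stands in for the image format; both are assumptions made for illustration, since real VM images use dedicated disk formats and contain far more state.

```python
import json
import os
import tempfile


def save_vm(vm, path):
    """Write the whole (toy) VM description to one file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vm, f)


def load_vm(path):
    """Recreate the VM description from that file, e.g. on another host."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical VM description used only for this example.
vm = {"name": "analytics-node", "cpus": 4, "mem_gb": 16, "disk": ["block0", "block1"]}

with tempfile.TemporaryDirectory() as tmp:
    image_path = os.path.join(tmp, "analytics-node.vm.json")
    save_vm(vm, image_path)
    restored = load_vm(image_path)

print(restored == vm)
```

Because the whole environment travels in one file, copying that file is, in this simplified picture, all it takes to move or duplicate the machine.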
Benefits of Virtualization
 Cost reduction: Sharing resources helps reduce costs
 Isolation: Virtual machines are isolated from each other as if they were
physically separated
 Encapsulation: Virtual machines encapsulate a complete computing
environment
 Hardware Independence: Virtual machines run independently of underlying
hardware
 Portability: Virtual machines can be migrated between different hosts.
Application of Virtualization for Big Data
 Big data application virtualization
 Application infrastructure virtualization provides an efficient way to manage
applications in context with customer demand.
 The application is encapsulated in a way that removes its dependencies from the
underlying physical computer system.
 This helps to improve the overall manageability and portability of the application.
 In addition, the application infrastructure virtualization software typically allows
for codifying business and technical usage policies to make sure that each of your
applications leverages virtual and physical resources in a predictable way.
 Efficiencies are gained because you can more easily distribute IT resources
according to the relative business value of your applications.
 Application infrastructure virtualization used in combination with server
virtualization can help to ensure that business service-level agreements are met.
 Server virtualization monitors CPU and memory usage, but does not account for
variations in business priority when allocating resources.
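
The difference between allocation that follows business priority and allocation that ignores it can be sketched in Python. The function names, application names, and weights below are hypothetical; actual application infrastructure virtualization products express such policies in their own ways.

```python
def allocate_by_priority(total_cpus, apps):
    """Split CPUs among applications in proportion to a business-priority weight.

    `apps` maps an application name to a positive weight (hypothetical policy).
    """
    total_weight = sum(apps.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: total_cpus * weight / total_weight for name, weight in apps.items()}


def allocate_evenly(total_cpus, apps):
    """Baseline that ignores business priority, for comparison."""
    return {name: total_cpus / len(apps) for name in apps}


weights = {"billing": 3, "reports": 1}   # billing is assumed to matter three times as much
print(allocate_by_priority(12, weights))
print(allocate_evenly(12, weights))
```

Codifying the weights as data is one way to make resource usage predictable and tied to the relative business value of each application.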
Application of Virtualization for Big Data
 Big data network virtualization
 Network virtualization provides an efficient way to use networking as a pool of
connection resources.
 Instead of relying on the physical network for managing traffic, you can create
multiple virtual networks all utilizing the same physical implementation.
 This can be useful if you need to define a network for data gathering with a certain
set of performance characteristics and capacity and another network for
applications with different performance and capacity.
 Virtualizing the network helps reduce network bottlenecks and improves the
capability to manage the large distributed data required for big data analysis.
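
Creating several virtual networks on one physical implementation, each with its own capacity, can be sketched as follows. The `PhysicalLink` and `VirtualNetwork` classes and the bandwidth figures are hypothetical and model only capacity reservation, not real traffic handling.

```python
from dataclasses import dataclass


@dataclass
class VirtualNetwork:
    name: str
    bandwidth_mbps: int   # capacity reserved for this virtual network


class PhysicalLink:
    """One physical network whose capacity is divided among virtual networks."""

    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.networks = []

    def reserved(self):
        """Total bandwidth already promised to virtual networks."""
        return sum(n.bandwidth_mbps for n in self.networks)

    def add_network(self, name, bandwidth_mbps):
        """Create a virtual network if enough unreserved capacity remains."""
        if self.reserved() + bandwidth_mbps > self.capacity_mbps:
            raise ValueError(f"not enough capacity for {name}")
        network = VirtualNetwork(name, bandwidth_mbps)
        self.networks.append(network)
        return network


link = PhysicalLink(capacity_mbps=10_000)
link.add_network("data-gathering", 6_000)   # network tuned for bulk ingestion
link.add_network("applications", 3_000)     # separate network for application traffic
print(link.capacity_mbps - link.reserved())
```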
Application of Virtualization for Big Data
 Big data processor and memory virtualization
 Processor virtualization helps to optimize the processor and maximize
performance. Memory virtualization decouples memory from the servers.
 In big data analysis, you may have repeated queries of large data sets and the
creation of advanced analytic algorithms, all designed to look for patterns and
trends that are not yet understood.
 These advanced analytics can require lots of processing power (CPU) and memory
(RAM).
 Without sufficient CPU and memory resources, some of these computations can
take a long time.
Application of Virtualization for Big Data
 Big data and storage virtualization
 Data virtualization can be used to create a platform for dynamic linked data
services. This allows data to be easily searched and linked through a unified
reference source.
 As a result, data virtualization provides an abstract service that delivers data in a
consistent form regardless of the underlying physical database.
 In addition, data virtualization exposes cached data to all applications to improve
performance.
 Storage virtualization combines physical storage resources so that they are more
effectively shared. This reduces the cost of storage and makes it easier to manage
data stores required for big data analysis.
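
The data virtualization points above (a consistent form regardless of the underlying database, plus cached data shared by callers) can be illustrated with a small Python facade. The `DataService` class and the two dictionary "databases" are stand-ins invented for this sketch.

```python
class DataService:
    """Hypothetical facade that hides which backend actually holds a record."""

    def __init__(self, backends):
        self.backends = backends      # list of callables: key -> record or None
        self.cache = {}
        self.backend_reads = 0        # counts reads that reached a backend

    def get(self, key):
        if key in self.cache:         # cached data is served without a backend read
            return self.cache[key]
        for fetch in self.backends:
            self.backend_reads += 1
            record = fetch(key)
            if record is not None:
                # Normalise every record to the same shape before returning it.
                result = {"id": key, "value": record}
                self.cache[key] = result
                return result
        return None


# Two stand-in "databases" holding different records.
sql_like = {"cust-1": "Alice"}
kv_like = {"cust-2": "Bob"}

service = DataService([sql_like.get, kv_like.get])
print(service.get("cust-2"))
print(service.get("cust-2"))      # second call is answered from the cache
print(service.backend_reads)
```

Callers only ever see the `{"id": ..., "value": ...}` shape, so a backend could be swapped without changing application code.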
Cloud Computing and Big Data

 One of the vital issues that organizations face with the storage and management of
Big Data is the huge investment needed to acquire the required hardware setup and
software packages.

 Some of these resources may be overutilized or underutilized as requirements
vary over time.

 We can overcome these challenges by providing a set of computing resources that
can be shared through cloud computing.
Cloud Computing Services
SaaS, PaaS, IaaS
• Storage-as-a-service
• Database-as-a-service
• Information-as-a-service
• Process-as-a-service
• Application-as-a-service
• Integration-as-a-service
• Security-as-a-service
• Mgt/Governance-as-a-service
• Testing-as-a-service
• Network-as-a-service
Cloud Computing evolution
Cloud Computing and Big Data

 The cloud computing environment saves costs related to infrastructure in an
organization by providing a framework that can be optimized and expanded
horizontally.
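
Horizontal expansion means adding more identical instances rather than making one machine bigger. A minimal sizing sketch in Python follows; the function name, the request rates, and the per-instance capacity are assumptions made for illustration.

```python
import math


def instances_needed(load_rps, capacity_per_instance_rps, min_instances=1):
    """Number of identical instances needed to serve `load_rps` requests/second."""
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity must be positive")
    return max(min_instances, math.ceil(load_rps / capacity_per_instance_rps))


# Hypothetical load levels over a day; each instance is assumed to handle 200 requests/second.
for load in (0, 400, 950):
    print(load, instances_needed(load, capacity_per_instance_rps=200))
```

Scaling back down when the load drops is what keeps resources from sitting idle.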
Cloud Computing and Big Data

 Cloud computing is the delivery of computing services (including servers, storage,
databases, networking, software, analytics, and intelligence) over the Internet (“the
cloud”) to offer faster innovation, flexible resources, and economies of scale. You
typically pay only for cloud services you use, helping you lower your operating costs,
run your infrastructure more efficiently, and scale as your business needs change.
Cloud Computing and Big Data

Features of Cloud Computing


Scalability

Elasticity

Resource Pooling

Self-Service

Low Cost

Fault Tolerance
Cloud Computing and Big Data

Cloud Deployment Models

Public Cloud (End-User Level Cloud)

Private Cloud (Enterprise-Level Cloud)

Community Cloud

Hybrid Cloud
Cloud Computing and Big Data

Cloud Deployment Models

• Public Cloud (End-User Level Cloud): A cloud that is owned and managed by a
company other than the one using it (the user can be either an individual or a
company) is known as a public cloud.

• Private Cloud (Enterprise Level Cloud): The cloud that remains entirely in the
ownership of the organization using it is known as a private cloud.
Cloud Computing and Big Data

Cloud Deployment Models

• Community Cloud: Community cloud is a type of cloud that is shared among various
organizations with a common tie.

• Hybrid Cloud: A cloud environment that combines two or more deployment models
(for example, private and public), so that internal and external service providers
together offer the services, is known as a hybrid cloud.
Cloud Computing and Big Data

Cloud Delivery Models


• Infrastructure as a Service (IaaS): It is one of the categories of cloud computing
services, which makes virtualized computing resources available over the Internet.
• Platform as a Service (PaaS): It is built above IaaS and is the layer that
interacts with the users, allowing them to deploy and use applications created using
programming and run-time environment platforms that are supported by the provider.
• Software as a Service (SaaS): SaaS is one of the most popular cloud-based
models and comprises applications provided by the service provider.
Cloud Computing and Big Data

Cloud Providers in Big Data Market

 Big Data cloud providers have been gearing up to bring the most advanced technologies at
competitive prices in the market.

 Some providers are established, whereas some of them are relatively new to the field of
cloud services. Some of these providers are rendering services that are relevant to Big Data
analytics only. Some such providers are as follows:
 Amazon
 Google
 Microsoft Azure (formerly Windows Azure)
SaaS, PaaS and IaaS applications
Benefits of Big data analysis in Cloud.

 Improved analysis
 With the advancement of Cloud technology, big data analysis has improved and
produces better results. Hence, companies prefer to perform big data
analysis in the Cloud. Moreover, the Cloud helps to integrate data from numerous
sources.
Benefits of Big data analysis in Cloud.

 Simplified Infrastructure
 Big Data analysis places a tremendous strain on infrastructure, as the data comes
in large volumes, at varying speeds, and in varying types that traditional
infrastructures usually cannot keep up with.

 Because Cloud computing provides flexible infrastructure that can be scaled
according to the needs at the time, it is easy to manage workloads.
Benefits of Big data analysis in Cloud.

 Lowering the cost


 Both Big Data and Cloud technology deliver value to organizations by
reducing the cost of ownership.
 The pay-per-use model of the Cloud turns CAPEX into OPEX. In addition,
open-source Apache software cuts the licensing cost of Big Data tools, which
would otherwise cost millions to build or buy.
 The Cloud enables customers to process big data without owning large-scale big
data resources. Hence, both Big Data and Cloud technology are driving costs
down and bringing value to the enterprise.
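
The shift from CAPEX to OPEX can be made concrete with a small arithmetic sketch in Python. All prices, usage figures, and function names below are invented for illustration and do not reflect any provider's actual pricing.

```python
def on_premises_cost(hardware_cost, monthly_upkeep, months):
    """Up-front purchase (CAPEX) plus running costs over the period."""
    return hardware_cost + monthly_upkeep * months


def pay_per_use_cost(hourly_rate, hours_used_per_month, months):
    """Pure operating expense (OPEX): pay only for the hours actually used."""
    return hourly_rate * hours_used_per_month * months


# Hypothetical one-year comparison for a cluster that is busy 200 hours a month.
own = on_premises_cost(hardware_cost=100_000, monthly_upkeep=1_000, months=12)
rent = pay_per_use_cost(hourly_rate=2, hours_used_per_month=200, months=12)
print(own, rent)
```

The comparison depends heavily on utilization: a cluster that runs around the clock narrows the gap, so the assumed hours of use are the key input.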
Benefits of Big data analysis in Cloud.

 Security and Privacy


 Data security and privacy are two major concerns when dealing
with enterprise data.
 Moreover, when your application is hosted on a Cloud platform,
security becomes a primary concern because of the open
environment and limited user control.
 On the other hand, being open source, a Big Data
solution like Hadoop uses a lot of third-party services and
infrastructure.
 Hence, system integrators nowadays bring in Private Cloud
solutions that are elastic and scalable and that also
leverage scalable distributed processing.
 In addition, Cloud data is stored and processed in a central location
commonly known as the Cloud storage server.
 The service provider and the customer also sign a service level
agreement (SLA) to establish trust between them. If required, the provider also
applies advanced levels of security control.
 This supports the security of big data in Cloud computing, covering the
following issues:
 Protecting big data from advanced threats.
 How Cloud service providers maintain storage and data.


 There are rules associated with service level agreements for protecting:
 data
 capacity
 scalability
 security
 privacy
 availability of data storage and data growth
 On the other hand, in many organizations, big data analytics is utilized to
detect and prevent advanced threats and malicious hackers.
Potential challenges of cloud computing for
Big Data
 Less control over security
 These large datasets often contain sensitive information such as individuals’
addresses, credit card details, social security numbers, and other personal
information.
 Ensuring that this data is kept protected is of paramount importance. Data
breaches could mean serious penalties under various regulations and a tarnished
company brand, which can lead to loss of customers and revenue.
Potential challenges of cloud computing for
Big Data
 Less control over compliance
 Compliance is another concern that you’ll have to think about when moving data to
the cloud.
 Cloud service providers maintain a certain level of compliance with various
regulations such as HIPAA, PCI, and many more. But similar to security, you no
longer have full control over your data’s compliance requirements.
 Even if your CSP is managing a good chunk of your compliance, you should make
sure you know the answers to the following questions:
 Where is the data going to reside?
 Who is going to manage it, and who can access it?
 What local data regulations do I need to comply with?
Potential challenges of cloud computing for
Big Data
 Network dependency and latency issues
 The flipside of having easy connectivity to data in the cloud is that
availability of the data is highly reliant on network connection.
 This dependence on the internet means that the system could be prone to
service interruptions.
 In addition, the issue of latency in the cloud environment could well come
into play given the volume of data that’s being transferred, analyzed, and
processed at any given time.
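
The effect of data volume on transfer time can be estimated with simple arithmetic. The Python sketch below gives a rough lower bound only; the function name, the decimal unit convention, and the example sizes and bandwidths are assumptions, and real transfers are slower because of protocol overhead and congestion.

```python
def transfer_time_seconds(size_gb, bandwidth_mbps, round_trip_latency_ms=0.0):
    """Rough lower bound: one round trip plus size divided by bandwidth.

    Uses decimal units (1 GB = 8000 megabits) and ignores protocol overhead.
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    megabits = size_gb * 8000
    return round_trip_latency_ms / 1000 + megabits / bandwidth_mbps


# Hypothetical case: moving 1000 GB (1 TB) over a 100 Mbps connection.
seconds = transfer_time_seconds(size_gb=1000, bandwidth_mbps=100)
print(seconds / 3600, "hours")
```

For large data sets the bandwidth term dominates, which is why moving computation to where the data already resides is often preferable to moving the data.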
References
 https://www.dummies.com/programming/big-data/big-data-visualization/the-importance-of-virtualization-to-big-data/
 https://www.whizlabs.com/blog/big-data-and-cloud-computing/
 Big Data & Virtualization: Concept familiarization and relation between them
accelerate the insight of Big Data and Virtualization with a laconic concept and
significant overview, https://www.ijedr.org/papers/IJEDR1803107.pdf
