Cloud Computing Presentation Notes

The document discusses cloud computing concepts including definitions, characteristics, deployment and service models, and open source private cloud software options like CloudStack, Eucalyptus, and OpenStack. It also covers data center and virtualization technologies, different types and levels of virtualization, and the pros and cons of virtualization. Key topics include cloud architecture, virtual machine monitors, full vs para vs hardware virtualization, and virtualization in cloud computing.

RAJASTHAN TECHNICAL UNIVERSITY, KOTA

Department of Computer Science Engineering


Cloud Computing Presentation
VII Semester
Presentation by:
Manvendra Singh (17/478)
Gaurav Tak (20/361)
Nakul Sharma (20/378)
Rahul Jain (20/389)
Submitted to:
Dr. R.K. Banyal
Associate Professor (Department of Computer Science)
UNIT I
• Introduction: Historical Development
• Cloud Computing Architecture
• The Cloud Reference Model
• Cloud Characteristics
• Cloud Deployment Models
• Public
• Private
• Community
• Hybrid
• Cloud Delivery Models
• IaaS
• PaaS
• SaaS
• Open-Source Private Cloud Software
• Eucalyptus
• OpenNebula
• OpenStack
CLOUD COMPUTING: DEFINITION

“Cloud Computing is a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort or service provider
interaction.” - NIST
CLOUD COMPUTING: ESSENTIAL CHARACTERISTICS

NIST identifies five essential characteristics of cloud computing:


• On-demand self service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
CLOUD COMPUTING: COMMON CHARACTERISTICS

• Massive Scale
• Homogeneity
• Virtualization
• Low-Cost Software
• Resilient Computing
• Geographic Distribution
• Service Orientation
• Advanced Security
CLOUD COMPUTING: ARCHITECTURE
CLOUD COMPUTING:
DEPLOYMENT MODELS

• Public Cloud
• Private Cloud
• Community Cloud
• Hybrid Cloud
CLOUD COMPUTING:
SERVICE MODELS

• Infrastructure-as-a-Service (IaaS):
• Virtual computing, storage, and network
resources that can be provisioned on demand
• Platform-as-a-Service (PaaS):
• Application development frameworks,
operating systems, and deployment
frameworks
• Software-as-a-Service (SaaS):
• Applications, management, and user
interfaces provided over a network
OPEN SOURCE PRIVATE CLOUD
SOFTWARE: CLOUDSTACK

• Apache CloudStack is open source cloud software that can be used for
creating private cloud offerings.
• CloudStack manages the network, storage, and compute nodes that make up a
cloud infrastructure.
• A CloudStack installation consists of a Management Server and the cloud
infrastructure that it manages.
• Zones
• The Management Server manages one or more zones where each zone is typically a single
datacenter.

• Pods
• Each zone has one or more pods. A pod is a rack of hardware comprising a switch and
one or more clusters.

• Cluster
• A cluster consists of one or more hosts and a primary storage. A host is a compute node
that runs guest virtual machines.

• Primary Storage
• The primary storage of a cluster stores the disk volumes for all the virtual machines
running on the hosts in that cluster.

• Secondary Storage
• Each zone has a secondary storage that stores templates, ISO images, and disk volume
snapshots
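
The zone, pod, cluster, and host hierarchy described above can be pictured as nested data structures. The following is a minimal Python sketch only: the class names, field names, and storage URLs are illustrative assumptions and do not reflect CloudStack's actual API or database schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str  # compute node that runs guest virtual machines


@dataclass
class Cluster:
    name: str
    primary_storage: str  # holds disk volumes for VMs in this cluster
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Pod:
    name: str  # a rack: one switch plus one or more clusters
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Zone:
    name: str  # typically a single datacenter
    secondary_storage: str  # templates, ISO images, volume snapshots
    pods: List[Pod] = field(default_factory=list)

    def all_hosts(self) -> List[Host]:
        """Walk zone -> pods -> clusters -> hosts."""
        return [h for p in self.pods for c in p.clusters for h in c.hosts]


# Hypothetical layout: one zone, one pod, one cluster, two hosts.
zone = Zone(
    name="zone-1",
    secondary_storage="nfs://secondary.example/export",
    pods=[
        Pod("pod-1", [
            Cluster("cluster-1", "nfs://primary.example/export",
                    [Host("host-1"), Host("host-2")]),
        ]),
    ],
)
host_names = [h.name for h in zone.all_hosts()]
```

In this sketch the Management Server would hold a list of such `Zone` objects; that role is not modelled here.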
OPEN SOURCE PRIVATE CLOUD
SOFTWARE: EUCALYPTUS

• Eucalyptus is open source software for building private and
hybrid clouds that are compatible with Amazon Web Services (AWS) APIs.
• Node Controller
• The Node Controller (NC) hosts the virtual machine instances and manages the virtual network endpoints.

• The cluster level (availability zone) consists of three components:

• Cluster Controller: manages the virtual machines and is the front end for a cluster.

• Storage Controller: provides Eucalyptus block volumes and snapshots to the
instances within its specific cluster. The SC is equivalent to the AWS Elastic Block Store
(EBS).

• VMware Broker: an optional component that provides an AWS-compatible
interface for VMware environments.

• The cloud level has two components:

• Cloud Controller: provides an administrative interface for cloud management and
performs high-level resource scheduling, system accounting, authentication, and quota
management.

• Walrus: equivalent to Amazon S3; serves as persistent storage for all of the
virtual machines in the Eucalyptus cloud. Walrus can be used as a simple
Storage-as-a-Service.
OPEN SOURCE PRIVATE CLOUD
SOFTWARE: OPENSTACK

• OpenStack is a cloud operating system comprising a collection of interacting
services that control computing, storage, and networking resources.
• The OpenStack compute service (nova-compute) manages networks of virtual
machines running on nodes, providing virtual servers on demand.
• The networking service (nova-network) provides connectivity between the
interfaces of other OpenStack services.
• The volume service (cinder) manages storage volumes for virtual machines.
• The object storage service (swift) allows users to store and retrieve files.
• The identity service (keystone) provides authentication and authorization for
other services.
• The image registry (glance) acts as a catalog and repository for virtual machine
images.
• The OpenStack scheduler (nova-scheduler) maps the nova-api calls to the
appropriate OpenStack components. The scheduler takes the virtual machine
requests from the queue and determines where they should run.
• The messaging service (RabbitMQ) acts as a central node for message passing
between daemons.
• Orchestration activities such as running an instance are performed by
nova-api, which accepts and responds to end-user compute API calls.
• The OpenStack dashboard (horizon) provides a web-based interface for
managing OpenStack services.
UNIT II

• Data Center Technology
• Virtualization
• Characteristics of Virtualized Environments
• Taxonomy of Virtualization Techniques
• Virtualization and Cloud Computing
• Pros and Cons of Virtualization
• Implementation Levels of Virtualization Tools and Mechanisms
• VMware
• Microsoft Hyper-V
• KVM
• VirtualBox
DATA CENTER TECHNOLOGY

A data center is a specialized IT infrastructure that houses centralized IT resources, such as servers, databases, and software systems.
Data centers are typically comprised of the following technologies and components:
• Virtualization
• Standardization and Modularity
• Automation
• Remote Operation and Management
• High Availability
• Security-Aware Design, Operation and Management
• Facilities
• Computing Hardware
• Storage Hardware
• Networking Hardware
VIRTUALIZATION

• Virtualization refers to the partitioning
of resources of a physical system (such
as computing, storage, network, and
memory) into multiple virtual
resources.
• It is a key enabling technology of cloud
computing that allows pooling of
resources.
• In cloud computing, resources are
pooled to serve multiple users using
multi-tenancy.
HYPERVISOR

• The virtualization layer consists of a
hypervisor or a virtual machine monitor
(VMM).
• The hypervisor presents a virtual operating
platform to a guest operating system (OS).
• Type-1 Hypervisor
• Type-1 or native hypervisors run directly on
the host hardware and control the hardware
and monitor the guest operating systems.
• Type-2 Hypervisor
• Type-2 hypervisors or hosted hypervisors run
on top of a conventional (main/host)
operating system and monitor the guest
operating systems.
TYPES OF VIRTUALIZATION

• Full Virtualization
• In full virtualization, the virtualization layer completely decouples the guest OS from the underlying hardware. The guest
OS requires no modification and is not aware that it is being virtualized. Full virtualization is enabled by direct execution
of user requests and binary translation of OS requests.
• Para-Virtualization
• In para-virtualization, the guest OS is modified to enable communication with the hypervisor to improve performance and
efficiency. The guest OS kernel is modified to replace non-virtualizable instructions with hypercalls that communicate
directly with the hypervisor in the virtualization layer.
• Hardware Virtualization
• Hardware-assisted virtualization is enabled by hardware features such as Intel’s Virtualization Technology (VT-x) and
AMD’s AMD-V. In hardware-assisted virtualization, privileged and sensitive calls are set to automatically trap to the
hypervisor. Thus, there is no need for either binary translation or para-virtualization.
CHARACTERISTICS OF VIRTUALIZED ENVIRONMENTS

• Increased Security
• Managed Execution
• Sharing
• Aggregation
• Emulation
• Isolation
• Performance Tuning
• Virtual machine migration
• Portability
TAXONOMY OF VIRTUALIZATION TECHNIQUES

• Execution virtualization
• Machine reference model
• Hardware-level virtualization
• Hypervisors
• Hardware virtualization techniques
• Operating system-level virtualization
• Programming language-level virtualization
• Application-level virtualization
• Other types of virtualization
• Storage virtualization
• Network virtualization
• Desktop virtualization
• Application server virtualization
PROS OF VIRTUALIZATION

• Managed execution
• Isolation
• Simplified allocation and partitioning of resources
• Portability and self-containment
• Efficient use of resources
• Disaster Recovery
• Cloud Migration
CONS OF VIRTUALIZATION

• Performance degradation
• Virtualization interposes an abstraction layer between the guest and the host, which causes increased latencies for the guest.
• In hardware virtualization, the VMM is executed and scheduled together with other applications, sharing the resources of the
host with them and thereby causing performance degradation.
• In the case of programming language virtual machines, binary translation and interpretation can slow down the execution of
managed applications.
• Inefficiency and degraded user experience
• Virtualization can sometimes lead to an inefficient use of the host.
• Some of the specific features of the host cannot be exposed by the abstraction layer and therefore become inaccessible. This can
happen due to device drivers in the case of hardware virtualization and due to a lack of specific libraries in the case of
programming-level virtualization.
• Security holes and new threats
• In the case of hardware virtualization, malicious programs can preload themselves before the OS and act as a thin VMM
toward it. The OS is then controlled and can be manipulated to extract sensitive information.
• Examples: BluePill and SubVirt.
IMPLEMENTATION LEVELS OF VIRTUALIZATION

• Instruction Set Architecture Level (ISA)


• Hardware Abstraction Level (HAL)
• Operating System Level
• Library Level
• Application Level
VIRTUALIZATION TOOLS

| Feature | VMware | Microsoft Hyper-V | KVM | VirtualBox |
|---|---|---|---|---|
| Vendor | VMware Inc. | Microsoft | Open Source | Oracle |
| Licensing Model | Commercial | Commercial | Open Source | Open Source (with optional extensions) |
| Host OS | VMware ESXi, vSphere | Windows Server | Linux (Kernel-based) | Windows, Linux, macOS |
| Guest OS Support | Wide range | Windows, Linux | Various Linux | Windows, Linux, macOS, more |
| Hypervisor Type | Type 1 (Bare-Metal) | Type 1 (Bare-Metal) | Type 1 (Bare-Metal) | Type 2 (Hosted) |
| Performance | High | Good | Good | Good |
| Scalability | Excellent | Good | Good | Good |
| Management Tools | vCenter, Web Client | Hyper-V Manager, System Center | Libvirt, virt-manager | VirtualBox Manager |
| Snapshot Support | Yes | Yes | Yes | Yes |
| Live Migration | vMotion | Live Migration | Live Migration | No native support |
| High Availability | Yes (HA/DRS) | Yes (Failover Clustering) | Manual setup | No native support |
| Community/Support | Strong | Strong | Strong | Moderate |


UNIT III
• Cloud Infrastructure Mechanism
• Cloud Storage
• Cloud Usage Monitor
• Resource Replication
• Specialized Cloud Mechanism
• Load Balancer
• SLA Monitor
• Pay-per-use Monitor
• Audit Monitor
• Failover System
• Hypervisor
• Resource Cluster
• Multi Device Broker
• State Management Database
• Cloud Management Mechanism
• Remote Administration System
• Resource Management System
• SLA Management System
• Billing Management System
CLOUD INFRASTRUCTURE MECHANISM:
CLOUD STORAGE

• The cloud storage device mechanism represents storage devices that are
designed specifically for cloud-based provisioning.
• Instances of these devices can be virtualized and are able to provide
fixed-increment capacity allocation in support of the pay-per-use
mechanism.
• Cloud storage devices can be exposed for remote access via cloud
storage services.
• Cloud Storage Levels
• Files

• Blocks

• Datasets

• Objects

• Cloud Storage Technical Interfaces


• Network Storage Interfaces

• Object Storage Interfaces

• Database Storage Interfaces


• Relational Data Storage

• Non-Relational Data Storage


CLOUD INFRASTRUCTURE MECHANISM:
CLOUD USAGE MONITOR

• The cloud usage monitor mechanism is a lightweight and autonomous software
program responsible for collecting and processing IT resource usage data.

• Cloud usage monitors can be designed to forward collected usage data to a log
database for post-processing and reporting purposes.

• They can be implemented in three agent-based formats, depending
on the type of usage metrics they are designed to collect and the way usage data
needs to be collected.

• Monitoring Agent
• A monitoring agent is an intermediary, event-driven program that exists as a service agent and
resides along existing communication paths to transparently monitor and analyse dataflows.

• This type of cloud usage monitor is commonly used to measure network traffic and message
metrics.

• Resource Agent
• A resource agent is a processing module that collects usage data by having event-driven
interactions with specialized resource software.

• This module is used to monitor usage metrics based on pre-defined observable events at the
resource software level, such as initiating, suspending, resuming, and vertical scaling.

• Polling Agent
• A polling agent is a processing module that collects cloud service usage data by polling IT
resources.

• This type of cloud service monitor is commonly used to periodically monitor IT resource status,
such as uptime and downtime.
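
The polling agent described above can be sketched in a few lines of Python. This is an illustrative model only: the `PollingAgent` class, the probe functions, and the in-memory `log` list (standing in for the log database) are assumptions introduced for the example.

```python
from typing import Callable, Dict, List


class PollingAgent:
    """Polls IT resources and records whether each one was up."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]):
        # probes maps a resource name to a function returning True if it is up
        self.probes = probes
        self.log: List[dict] = []  # stand-in for the log database

    def poll_once(self, tick: int) -> None:
        """Query every resource once and append the result to the log."""
        for name, probe in self.probes.items():
            self.log.append({"tick": tick, "resource": name, "up": probe()})

    def uptime_ratio(self, name: str) -> float:
        """Fraction of polls in which the named resource was up."""
        samples = [e["up"] for e in self.log if e["resource"] == name]
        return sum(samples) / len(samples) if samples else 0.0


# Hypothetical probe: a resource that is down on the third poll only.
states = iter([True, True, False, True])
agent = PollingAgent({"vm-a": lambda: next(states)})
for tick in range(4):
    agent.poll_once(tick)
ratio = agent.uptime_ratio("vm-a")
```

A real agent would run `poll_once` on a timer and query actual resource endpoints; the loop here replaces the timer so the example stays deterministic.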
CLOUD INFRASTRUCTURE MECHANISM:
RESOURCE REPLICATION

• Replication is defined as the creation of multiple instances of the same
IT resource.

• Replication is typically performed when an IT resource’s availability
and performance need to be enhanced.

• Virtualization technology is used to implement the resource replication
mechanism to replicate cloud-based IT resources.
SPECIALIZED CLOUD MECHANISM:
LOAD BALANCER

• The load balancer mechanism is a runtime agent that balances a workload
across two or more IT resources to increase performance and
capacity beyond what a single IT resource can provide. It is a form of
horizontal scaling.

• Load balancers can perform a range of specialized runtime
workload distribution functions that include:

• Asymmetric Distribution: Larger workloads are issued to IT resources with
higher processing capabilities.

• Workload Prioritization: Workloads are scheduled, queued, discarded, and
distributed according to their priority levels.

• Content-Aware Distribution: Requests are distributed to different IT
resources as dictated by the request content.

• A load balancer is programmed or configured with a set of
performance and QoS rules and parameters with the general
objectives of optimizing IT resource usage, avoiding overloads, and
maximizing throughput.
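
Asymmetric distribution can be illustrated with a small greedy assignment. The sketch below is one possible policy, not a prescribed algorithm: the `distribute` function, the capacity numbers, and the resource names are assumptions made for the example.

```python
from typing import Dict, List


def distribute(workloads: List[int], capacities: Dict[str, int]) -> Dict[str, List[int]]:
    """Assign each workload to the resource whose relative load
    (assigned work / capacity) would be lowest after taking it."""
    assigned: Dict[str, List[int]] = {name: [] for name in capacities}
    load = {name: 0 for name in capacities}
    for size in sorted(workloads, reverse=True):  # place the largest first
        target = min(capacities, key=lambda n: (load[n] + size) / capacities[n])
        assigned[target].append(size)
        load[target] += size
    return assigned


# Hypothetical resources: "big" has four times the capacity of "small".
plan = distribute([8, 4, 2, 2], {"big": 4, "small": 1})
```

With these inputs the larger workloads land on the higher-capacity resource, and the smaller resource receives only a small share.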
SPECIALIZED CLOUD MECHANISM:
SLA MONITOR

• The SLA monitor mechanism is used to specifically observe the runtime
performance of cloud services to ensure that they are fulfilling the
contractual QoS requirements that are published in SLAs.

• The data collected by the SLA monitor is processed by an SLA
management system to be aggregated into SLA reporting metrics.

• The system can proactively repair or fail over cloud services when
exception conditions occur, such as when the SLA monitor reports a
cloud service as down.
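
As a rough illustration of how monitor data might be aggregated into an SLA reporting metric, the following sketch computes an availability percentage from up/down samples and compares it to a target. The function names, the sample data, and the target values are hypothetical.

```python
from typing import List


def availability_percent(samples: List[bool]) -> float:
    """Share of monitor samples in which the service responded."""
    if not samples:
        raise ValueError("no samples collected")
    return 100.0 * sum(samples) / len(samples)


def meets_sla(samples: List[bool], target_percent: float) -> bool:
    """True if observed availability reaches the SLA target."""
    return availability_percent(samples) >= target_percent


# Hypothetical data: 1000 samples, 2 of which found the service down.
samples = [True] * 998 + [False] * 2
observed = availability_percent(samples)
ok_99_5 = meets_sla(samples, 99.5)
ok_99_9 = meets_sla(samples, 99.9)
```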
SPECIALIZED CLOUD MECHANISM:
PAY-PER-USE MONITOR

• The pay-per-use monitor mechanism measures cloud-based IT
resource usage in accordance with predefined pricing parameters
and generates usage logs for fee calculations and billing purposes.

• The monitoring variables include:

• Request/Response message quantity

• Transmitted data volume

• Bandwidth consumption

• The data collected by the pay-per-use monitor is processed by a
billing management system that calculates the payment fees.
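
A fee calculation over usage logs might look like the sketch below. The record fields and the two prices are invented for illustration and are not taken from any real provider; `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class UsageRecord:
    consumer: str
    requests: int     # request/response message quantity
    data_gb: Decimal  # transmitted data volume, in GB


# Hypothetical pricing parameters.
PRICE_PER_1000_REQUESTS = Decimal("0.40")
PRICE_PER_GB = Decimal("0.09")


def fee(records: List[UsageRecord], consumer: str) -> Decimal:
    """Sum the charges for one consumer across all usage log entries."""
    total = Decimal("0")
    for r in records:
        if r.consumer != consumer:
            continue
        total += Decimal(r.requests) / 1000 * PRICE_PER_1000_REQUESTS
        total += r.data_gb * PRICE_PER_GB
    return total.quantize(Decimal("0.01"))


usage_log = [
    UsageRecord("acme", 5000, Decimal("10")),
    UsageRecord("acme", 2500, Decimal("3")),
    UsageRecord("other", 100, Decimal("1")),
]
acme_fee = fee(usage_log, "acme")
```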
SPECIALIZED CLOUD MECHANISM:
AUDIT MONITOR

• The audit monitor mechanism is used to collect audit tracking data
for networks and IT resources in support of (or dictated by)
regulatory or contractual obligations.

• An audit monitor is generally implemented as a monitoring agent that
intercepts “login” requests and stores the requestor’s security
credentials, as well as both failed and successful login attempts, in a
log database for future audit reporting purposes.
SPECIALIZED CLOUD MECHANISM:
FAILOVER SYSTEM

• The failover system mechanism is used to increase the reliability
and availability of IT resources by using established clustering
technology to provide redundant implementations.

• A failover system is configured to automatically switch over to a
redundant or standby IT resource instance whenever the currently
active IT resource becomes unavailable.

• Failover systems come in two basic configurations:

• Active-Active

• In an active-active configuration, redundant implementations of the IT
resource actively serve the workload synchronously. Load balancing
among active instances is required.

• When a failure is detected, the failed instance is removed from the load
balancing scheduler. Whichever IT resource remains operational when a
failure is detected takes over the processing.

• Active-Passive

• In an active-passive configuration, a standby or inactive
implementation is activated to take over the processing from the IT
resource that becomes unavailable, and the corresponding workload is
redirected to the instance taking over the operation.
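
The active-passive configuration can be sketched as a router that promotes a standby when the active instance fails a health check. The class name, the instance names, and the hand-toggled `health` dictionary are assumptions for this example; real failover systems rely on clustering software and heartbeats.

```python
from typing import Callable, List


class ActivePassiveFailover:
    """Routes requests to the active instance; on failure, promotes a standby."""

    def __init__(self, instances: List[str], is_healthy: Callable[[str], bool]):
        if not instances:
            raise ValueError("at least one instance is required")
        self.active = instances[0]
        self.standbys = list(instances[1:])
        self.is_healthy = is_healthy

    def route(self) -> str:
        """Return the instance that should receive the next request."""
        while not self.is_healthy(self.active):
            if not self.standbys:
                raise RuntimeError("no healthy instance available")
            self.active = self.standbys.pop(0)  # activate the next standby
        return self.active


# Hypothetical health state, toggled by hand to simulate a failure.
health = {"primary": True, "standby": True}
system = ActivePassiveFailover(["primary", "standby"], lambda name: health[name])
before = system.route()
health["primary"] = False
after = system.route()
```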
SPECIALIZED CLOUD MECHANISM:
HYPERVISOR

• The hypervisor mechanism is a fundamental part of virtualization
infrastructure that is primarily used to generate virtual server
instances of a physical server.

• A hypervisor is generally limited to one physical server and can
therefore only create virtual images of that server. Similarly, a
hypervisor can only assign virtual servers it generates to resource
pools that reside on the same underlying physical server.

• A hypervisor has limited virtual server management features, such
as increasing the virtual server’s capacity or shutting it down.

• A virtual infrastructure manager (VIM) provides a range of features
for administering multiple hypervisors across physical servers.

• Hypervisor software can be installed directly on bare-metal servers
and provides features for controlling, sharing, and scheduling the
usage of hardware resources, such as processor power, memory, and
I/O. These can appear to each virtual server’s operating system as
dedicated resources.
SPECIALIZED CLOUD MECHANISM:
RESOURCE CLUSTER

• The resource cluster mechanism is used to group multiple IT resource instances
so that they can be operated as a single IT resource. This increases the
combined computing capacity, load balancing, and availability of the clustered
IT resources.

• Common Resource Cluster Types

• Server Cluster

• Database Cluster

• Large Dataset Cluster

• Basic types of Resource Clusters

• Load Balanced Cluster

• This resource cluster specializes in distributing workloads among cluster nodes to
increase IT resource capacity while preserving the centralization of IT resource
management.

• It usually implements a load balancer mechanism that is either embedded within the
cluster management platform or set up as a separate IT resource.

• HA Cluster

• A high-availability cluster maintains system availability in the event of multiple node
failures and has redundant implementations of most or all the clustered IT resources.

• It implements a failover system mechanism that monitors failure conditions and
automatically redirects the workload away from any failed nodes.
SPECIALIZED CLOUD MECHANISM:
MULTI-DEVICE BROKER

• The multi-device broker mechanism is used to facilitate runtime data
transformation to make a cloud service accessible to a wider range of cloud
service consumer programs and devices.

• A multi-device broker contains the mapping logic necessary to transform data
exchanges between a cloud service and different types of cloud service
consumer devices.

• Multi-device brokers commonly exist as gateways or incorporate gateway
components, such as:

• XML Gateway: Transmits and validates XML data

• Cloud Storage Gateway: Transforms cloud storage protocols and encodes storage devices
to facilitate data transfer and storage

• Mobile Device Gateway: Transforms the communication protocols used by mobile
devices into protocols that are compatible with a cloud service

• The levels at which transformation logic can be created include:

• Transport Protocols

• Messaging Protocols

• Storage Device Protocols

• Data Schemas/Data Models


SPECIALIZED CLOUD MECHANISM:
STATE MANAGEMENT DATABASE

• A state management database is a storage device that is used to
temporarily persist state data for software programs.

• As an alternative to caching state data in memory, software
programs can off-load state data to the database to reduce the
amount of runtime memory they consume.

• By doing so, the software programs and the surrounding
infrastructure are more scalable.

• State management databases are commonly used by cloud services,
especially those involved in long-running runtime activities.

• During the lifespan of a cloud service instance, it may be required
to remain stateful and keep state data cached in memory, even when
idle.

• By deferring state data to a state repository, the cloud service can
transition to a stateless condition (or a partially stateless condition),
thereby temporarily freeing system resources.
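
The idea of deferring state to a repository can be sketched with Python's built-in `sqlite3` module. Everything named here (`StateRepository`, `CloudService`, the `state` table, the session data) is a hypothetical illustration; the in-memory SQLite database is used only so the example runs without setup and stands in for an external database.

```python
import json
import sqlite3


class StateRepository:
    """Persists state data outside the service's own memory cache."""

    def __init__(self, path: str = ":memory:"):
        # ":memory:" keeps the demo self-contained; a real deployment
        # would point at a separate database server or file.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state (session_id TEXT PRIMARY KEY, data TEXT)"
        )

    def save(self, session_id: str, state: dict) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO state (session_id, data) VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self.db.commit()

    def load(self, session_id: str) -> dict:
        row = self.db.execute(
            "SELECT data FROM state WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


class CloudService:
    """Keeps state in memory while active and off-loads it when idle."""

    def __init__(self, repo: StateRepository):
        self.repo = repo
        self.sessions = {}  # in-memory state cache

    def pause(self, session_id: str) -> None:
        # Defer state to the repository and drop it from memory.
        self.repo.save(session_id, self.sessions.pop(session_id))

    def resume(self, session_id: str) -> None:
        self.sessions[session_id] = self.repo.load(session_id)


service = CloudService(StateRepository())
service.sessions["s1"] = {"step": 3, "cart": ["disk", "vm"]}
service.pause("s1")
in_memory_while_idle = "s1" in service.sessions
service.resume("s1")
restored = service.sessions["s1"]
```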
CLOUD MANAGEMENT MECHANISM:
REMOTE ADMINISTRATION SYSTEM

• The remote administration system mechanism provides tools and
user interfaces for external cloud resource administrators to
configure and administer cloud-based IT resources.

• A remote administration system can establish a portal for access to
administration and management features of various underlying
systems, including the resource management, SLA management,
and billing management systems.

• The tools and APIs provided by a remote administration system are
generally used by the cloud provider to develop and customize
online portals that provide cloud consumers with a variety of
administrative controls.

• The two primary types of portals that are created with the remote
administration system are:

• Usage and Administration Portal

• Self-Service Portal
CLOUD MANAGEMENT MECHANISM:
RESOURCE MANAGEMENT SYSTEM

• The resource management system mechanism helps coordinate IT resources in
response to management actions performed by both cloud consumers and
cloud providers.

• The virtual infrastructure manager (VIM) is the core of this system. It
coordinates the server hardware so that virtual server instances can be created
from the most expedient underlying physical server. It can be used to manage a
range of virtual IT resources across multiple physical servers.

• Tasks that are typically automated and implemented through the resource
management system include:

• Managing virtual IT resource templates that are used to create pre-built instances, such as
virtual server images.

• Allocating and releasing virtual IT resources into the available physical infrastructure in
response to the starting, pausing, resuming, and termination of virtual IT resource
instances.

• Coordinating IT resources in relation to the involvement of other mechanisms, such as
resource replication, load balancer, and failover system.

• Enforcing usage and security policies throughout the lifecycle of cloud service instances.

• Monitoring operational conditions of IT resources.


CLOUD MANAGEMENT MECHANISM:
SLA MANAGEMENT SYSTEM

• The SLA management system mechanism represents a range of
commercially available cloud management products that provide
features pertaining to the administration, collection, storage,
reporting, and runtime notification of SLA data.

• An SLA management system deployment will generally include a
repository used to store and retrieve collected SLA data based on
pre-defined metrics and reporting parameters.

• It will further rely on one or more SLA monitor mechanisms to
collect the SLA data that can then be made available in near-real
time to usage and administration portals to provide ongoing
feedback regarding active cloud services.

• The metrics monitored for individual cloud services are aligned with
the SLA guarantees in corresponding cloud provisioning contracts.
CLOUD MANAGEMENT MECHANISM:
BILLING MANAGEMENT SYSTEM

• The billing management system mechanism is dedicated to the
collection and processing of usage data as it pertains to cloud
provider accounting and cloud consumer billing.

• It relies on pay-per-use monitors to gather runtime usage data that is
stored in a repository that the system components then draw from
for billing, reporting, and invoicing purposes.

• A cloud service consumer exchanges messages with a cloud service.

• A pay-per-use monitor keeps track of the usage and collects data
relevant to billing, which is forwarded to a repository that is part of
the billing management system.

• The system periodically calculates the consolidated cloud service
usage fees and generates an invoice for the cloud consumer.

• The invoice may be provided to the cloud consumer through the
usage and administration portal.
UNIT IV
• Apache Hadoop
• Hadoop MapReduce
• Hadoop Distributed File System
• Hadoop I/O
• Developing a MapReduce Application
• MapReduce Types and Formats
• MapReduce Features
• Hadoop Cluster Setup
• Administering Hadoop
APACHE HADOOP

• A Hadoop cluster comprises a master node, a backup node, and a
number of slave nodes.

• The master node runs the NameNode and JobTracker processes, and
the slave nodes run the DataNode and TaskTracker components of
Hadoop.

• The backup node runs the Secondary NameNode process.

• NameNode
• NameNode keeps the directory tree of all files in the file system and tracks
where across the cluster the file data is kept. It does not store the data of
these files itself. Client applications talk to the NameNode whenever they
wish to locate a file, or when they want to add/copy/move/delete a file.

• Secondary NameNode
• NameNode is a Single Point of Failure for the HDFS Cluster. An optional
Secondary NameNode which is hosted on a separate machine creates
checkpoints of the namespace.

• JobTracker
• The JobTracker is the service within Hadoop that distributes MapReduce
tasks to specific nodes in the cluster, ideally the nodes that have the data, or
at least are in the same rack.
APACHE HADOOP

• TaskTracker

• A TaskTracker is a node in a Hadoop cluster that accepts Map, Reduce, and
Shuffle tasks from the JobTracker.

• Each TaskTracker has a defined number of slots, which indicates the number
of tasks that it can accept.

• DataNode

• A DataNode stores data in an HDFS file system.

• A functional HDFS file system has more than one DataNode, with data
replicated across them.

• DataNodes respond to requests from the NameNode for file system
operations.

• Client applications can talk directly to a DataNode once the NameNode
has provided the location of the data.

• Similarly, MapReduce operations assigned to TaskTracker instances near a
DataNode talk directly to the DataNode to access the files.

• TaskTracker instances can be deployed on the same servers that host
DataNode instances, so that MapReduce operations are performed close to
the data.
HADOOP MAPREDUCE

• A MapReduce job consists of two phases:

• Map: In the map phase, data is read from a distributed file system and
partitioned among a set of computing nodes in the cluster. The data is sent
to the nodes as a set of key-value pairs. The Map tasks process the input
records independently of each other and produce intermediate results as
key-value pairs. The intermediate results are stored on the local disk of the
node running the Map task.

• Reduce: When all the Map tasks are completed, the Reduce phase begins,
in which the intermediate data with the same key is aggregated.

• Optional Combine Task

• An optional Combine task can be used to aggregate the mapper's
intermediate data that shares the same key before
transferring the output to the Reduce task.
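
The Map, Combine, and Reduce steps can be illustrated with a word count written in plain Python. This is a single-process sketch of the data flow only; it does not use the Hadoop API, and the function names and input splits are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_task(line: str) -> List[Pair]:
    """Emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Optional local aggregation of one mapper's output."""
    local: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def shuffle(mapper_outputs: Iterable[List[Pair]]) -> Dict[str, List[int]]:
    """Group intermediate values by key across all mappers."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key: str, values: List[int]) -> Pair:
    """Aggregate all intermediate values that share a key."""
    return key, sum(values)


# Two hypothetical input splits, each handled by its own Map task.
splits = ["the cloud the grid", "the cloud"]
intermediate = [combine(map_task(line)) for line in splits]
counts = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
```

The combiner shrinks the first mapper's output from four pairs to three before anything is transferred, which is the saving the optional Combine task is meant to provide.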
MAPREDUCE JOB EXECUTION WORKFLOW
(HADOOP I/O)

• MapReduce job execution starts when the client applications submit jobs to the JobTracker.

• The JobTracker returns a JobID to the client application. The JobTracker talks to the
NameNode to determine the location of the data.

• The JobTracker locates TaskTracker nodes with available slots at or near the data.

• The TaskTrackers send out heartbeat messages to the JobTracker, usually every few minutes,
to reassure the JobTracker that they are still alive. These messages also inform the JobTracker
of the number of available slots, so the JobTracker can stay up to date with where in the
cluster new work can be delegated.

• The JobTracker submits the work to the TaskTracker nodes when they poll for tasks. To
choose a task for a TaskTracker, the JobTracker uses various scheduling algorithms (default is
FIFO).

• The TaskTracker nodes are monitored using the heartbeat signals that are sent by the
TaskTrackers to the JobTracker.

• The TaskTracker spawns a separate JVM process for each task so that any task failure does
not bring down the TaskTracker.

• The TaskTracker monitors these spawned processes while capturing the output and exit
codes. When the process finishes, successfully or not, the TaskTracker notifies the
JobTracker. When the job is completed, the JobTracker updates its status.
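
The interplay of heartbeats, free slots, and FIFO scheduling can be modelled with a toy scheduler. This sketch is not Hadoop code: `ToyJobTracker`, its method names, and the task and tracker names are invented for illustration, and data locality is deliberately ignored.

```python
from collections import deque
from typing import Deque, List, Tuple


class ToyJobTracker:
    """Hands queued tasks to TaskTrackers in FIFO order, limited by free slots."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()              # pending tasks, oldest first
        self.assignments: List[Tuple[str, str]] = []  # (tracker, task) history

    def submit(self, tasks: List[str]) -> None:
        """Queue the tasks of a newly submitted job."""
        self.queue.extend(tasks)

    def heartbeat(self, tracker: str, free_slots: int) -> List[str]:
        """Called when a tracker reports in; returns the tasks it should run."""
        given = []
        while free_slots > 0 and self.queue:
            task = self.queue.popleft()
            given.append(task)
            self.assignments.append((tracker, task))
            free_slots -= 1
        return given


job_tracker = ToyJobTracker()
job_tracker.submit(["map-0", "map-1", "map-2"])
first = job_tracker.heartbeat("tt-1", free_slots=2)
second = job_tracker.heartbeat("tt-2", free_slots=2)
third = job_tracker.heartbeat("tt-1", free_slots=1)
```

The second tracker reports two free slots but receives only one task because the queue is then empty, and the third heartbeat receives nothing.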
DEVELOPING A MAPREDUCE APPLICATION

• Writing a program in MapReduce follows a certain pattern. You start by writing your map and reduce
functions, ideally with unit tests to make sure they do what you expect. Then you write a driver
program to run a job, which can run from your IDE using a small subset of the data to check that it is
working. If it fails, you can use your IDE’s debugger to find the source of the problem. With this
information, you can expand your unit tests to cover this case and improve your mapper or reducer as
appropriate to handle such input correctly.
• When the program runs as expected against the small dataset, you are ready to unleash it on a cluster.
Running against the full dataset is likely to expose some more issues, which you can fix as before, by
expanding your tests and mapper or reducer to handle the new cases. Debugging failing programs in
the cluster is a challenge, so we look at some common techniques to make it easier.
• After the program is working, you may wish to do some tuning, first by running through some
standard checks for making MapReduce programs faster and then by doing task profiling. Profiling
distributed programs is not easy, but Hadoop has hooks to aid the process.
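
The "write map and reduce functions, then unit test them locally" pattern can be sketched in plain Python. This is a minimal word-count illustration that does not use the Hadoop API; the function names `map_words`, `reduce_counts`, and `run_local` are hypothetical, and `run_local` stands in for the driver running on a small subset of the data.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts collected for one word."""
    return word, sum(counts)


def run_local(lines):
    """Tiny local driver: group map output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Unit-test style checks on a small dataset before going to a cluster.
assert list(map_words("Cloud cloud")) == [("cloud", 1), ("cloud", 1)]
assert reduce_counts("cloud", [1, 1, 1]) == ("cloud", 3)
print(run_local(["a b a", "B"]))
```

When an input breaks the job on the cluster, the failing record can be added as another check here before the mapper or reducer is changed.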
UNIT V
• Basic Terms and Concepts
• Threat Agents
• Cloud Security Threats
• Cloud Security Mechanism
• Encryption
• Hashing
• Digital Signature
• Public Key Infrastructure
• Identity and Access Management
• Single Sign-On
• Cloud Based Security Groups
• Hardened Virtual Server Images
FUNDAMENTAL CLOUD SECURITY:
BASIC TERMS AND CONCEPTS

• Confidentiality: Confidentiality is the characteristic of something being made accessible only to authorized parties.
• Integrity: Integrity is the characteristic of not having been altered by an unauthorized party.
• Authenticity: Authenticity is the characteristic of something having been provided by an authorized party.
• Availability: Availability is the characteristic of being accessible and usable during a specified time period.
• Threat: A threat is a potential security violation that can challenge defenses in an attempt to breach privacy and/or cause harm.
• Vulnerability: A vulnerability is a weakness that can be exploited either because it is protected by insufficient security controls, or because
existing security controls are overcome by an attack.
• Risk: Risk is the possibility of loss or harm arising from performing an activity.
• Security Controls: Security controls are countermeasures used to prevent or respond to security threats and to reduce or avoid risk.
• Security Mechanisms: Security mechanisms are components comprising a defensive framework that protects IT resources, information, and
services.
• Security Policies: A security policy establishes a set of security rules and regulations.
FUNDAMENTAL CLOUD SECURITY:
THREAT AGENTS

• A threat agent is an entity that poses a threat because
it is capable of carrying out an attack.
• Cloud security threats can originate either internally
or externally, from humans or software programs.
TYPES OF THREAT AGENTS

• Anonymous Attacker
• An anonymous attacker is a non-trusted cloud service consumer without permissions in the cloud.

• It typically exists as an external software program that launches network-level attacks through public networks.

• When anonymous attackers have limited information on security policies and defenses, it can inhibit their ability to formulate effective attacks.

• Therefore, anonymous attackers often resort to committing acts like bypassing user accounts or stealing user credentials, using methods that either ensure
anonymity or require substantial resources for prosecution.

• Malicious Service Agent


• A malicious service agent is able to intercept and forward the network traffic that flows within a cloud.

• It typically exists as a service agent (or a program pretending to be a service agent) with compromised or malicious logic.

• It may also exist as an external program able to remotely intercept and potentially corrupt message contents.

• Trusted Attacker
• A trusted attacker shares IT resources in the same cloud environment as the cloud consumer and attempts to exploit legitimate credentials to target cloud providers
and the cloud tenants with whom they share IT resources.

• Unlike anonymous attackers (which are non-trusted), trusted attackers usually launch their attacks from within a cloud’s trust boundaries by abusing legitimate
credentials or via the appropriation of sensitive and confidential information.

• Trusted attackers (also known as malicious tenants) can use cloud-based IT resources for a wide range of exploitations, including the hacking of weak authentication
processes, the breaking of encryption, the spamming of e-mail accounts, or to launch common attacks, such as denial of service campaigns.
TYPES OF THREAT AGENTS

• Malicious Insider
• Malicious Insiders are human threat agents acting on behalf of or in relation to the cloud provider.

• They are typically current or former employees or third parties with access to the cloud provider’s premises.

• This type of threat agent carries tremendous damage potential, as the malicious insider may have administrative privileges for accessing cloud consumer IT
resources.
CLOUD SECURITY THREATS

• Traffic Eavesdropping
• Traffic eavesdropping occurs when data being transferred to or within a cloud (usually from the cloud consumer to the cloud provider) is passively intercepted by a
malicious service agent for illegitimate information gathering purposes.

• The aim of this attack is to directly compromise the confidentiality of the data and, possibly, the confidentiality of the relationship between the cloud consumer and
cloud provider.

• Because of the passive nature of the attack, it can more easily go undetected for extended periods of time.

• Malicious Intermediary
• The malicious intermediary threat arises when messages are intercepted and altered by a malicious service agent, thereby potentially compromising the message’s
confidentiality and/or integrity.

• It may also insert harmful data into the message before forwarding it to its destination.

• Denial of Service
• The objective of the denial of service (DoS) attack is to overload IT resources to the point where they cannot function properly.

• This form of attack is commonly launched in one of the following ways:


• The workload on cloud services is artificially increased with imitation messages or repeated communication requests.

• The network is overloaded with traffic to reduce its responsiveness and cripple its performance.

• Multiple cloud service requests are sent, each of which is designed to consume excessive memory and processing resources.

• Successful DoS attacks produce server degradation and/or failure.


CLOUD SECURITY THREATS

• Insufficient Authorization
• The insufficient authorization attack occurs when access is granted to an attacker erroneously or too broadly, resulting in the attacker getting access to IT resources
that are normally protected.

• This is often a result of the attacker gaining direct access to IT resources that were implemented under the assumption that they would only be accessed by trusted
consumer programs.

• A variation of this attack, known as weak authentication, can result when weak passwords or shared accounts are used to protect IT resources.

• Within cloud environments, these types of attacks can lead to significant impacts depending on the range of IT resources and the range of access to those IT
resources the attacker gains.

• Virtualization Attack
• A virtualization attack exploits vulnerabilities in the virtualization platform to jeopardize its confidentiality, integrity, and/or availability.

• A trusted attacker successfully accesses a virtual server to compromise its underlying physical server.

• Within public clouds, where a single physical IT resource may be providing virtualized IT resources to multiple cloud consumers, such an attack can have
significant repercussions.

• Overlapping Trust Boundaries


• If physical IT resources within a cloud are shared by different cloud service consumers, these cloud service consumers have overlapping trust boundaries.

• Malicious cloud service consumers can target shared IT resources with the intention of compromising cloud consumers or other IT resources that share the same
trust boundary.

• The consequence is that some or all of the other cloud service consumers could be impacted by the attack, and/or the attacker could use virtual IT resources against
others that happen to also share the same trust boundary.
CLOUD SECURITY MECHANISM:
ENCRYPTION

• The encryption mechanism is a digital coding system dedicated to preserving
the confidentiality and integrity of data. It is used for encoding plaintext data
using a standardized algorithm (cipher) into a protected and unreadable format
(ciphertext).

• When encryption is applied to plaintext data, the data is paired with a string of
characters called an encryption key, a secret message that is established by and
shared among authorized parties. The encryption key is used to decrypt the
ciphertext back into its original plaintext format.

• The encryption mechanism can help counter the traffic eavesdropping, malicious
intermediary, insufficient authorization, and overlapping trust boundaries
security threats.

• There are two common forms of encryption:

• Symmetric Encryption: It uses the same key for both encryption and decryption, both of
which are performed by authorized parties that use the one shared key.

• Asymmetric Encryption: It relies on the use of two different keys, namely a private key
and a public key. The private key is known only to its owner while the public key is
commonly available. A document that was encrypted with a private key can only be
correctly decrypted with the corresponding public key and vice versa.
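
The symmetric case can be illustrated with a toy cipher in which one shared key both encrypts and decrypts. This is a teaching sketch only and must not be used to protect real data: the keystream construction (SHA-256 over the key and a counter, XORed with the data) and the names `keystream` and `xor_cipher` are assumptions made for the example, not a standardized cipher.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the shared key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice with the same key restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = b"key shared by authorized parties"
plaintext = b"cloud consumer record 42"

ciphertext = xor_cipher(shared_key, plaintext)   # encrypt
recovered = xor_cipher(shared_key, ciphertext)   # decrypt with the same key
wrong = xor_cipher(b"some other key", ciphertext)

print(recovered == plaintext, wrong == plaintext)
```

In practice a vetted cipher such as AES from a maintained cryptography library would be used instead of a hand-built construction.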
CLOUD SECURITY MECHANISM:
HASHING

• The hashing mechanism is used when a one-way, non-reversible form of data
protection is required.

• Once hashing has been applied to a message, it is locked, and no key is provided
for the message to be unlocked.

• A common application of this mechanism is the storage of passwords.

• Hashing technology can be used to derive a hashing code or message digest
from a message, which is often of a fixed length and smaller than the original
message.

• The message sender can then utilize the hashing mechanism to attach the
message digest to the message.

• The recipient applies the same hash function to the message to verify that the
produced message digest is identical to the one that accompanied the message.

• Any alteration to the original data results in an entirely different message digest
and clearly indicates that tampering has occurred.

• In addition to its utilization for protecting stored data, cloud threats that can
be mitigated by the hashing mechanism include malicious intermediary and
insufficient authorization.
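
Both uses of hashing described above, message digests and password storage, can be sketched with Python's standard `hashlib` module. The choice of SHA-256, the salted PBKDF2 password hash, and the function names are assumptions made for this example; the slides do not prescribe a particular algorithm.

```python
import hashlib
import hmac
import os


def digest(message: bytes) -> str:
    """Fixed-length message digest (SHA-256, 64 hex characters)."""
    return hashlib.sha256(message).hexdigest()


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """One-way password hash; only the salt and this hash are stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the candidate password and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), stored)


# Sender attaches the digest; recipient recomputes it to detect tampering.
message = b"transfer 100 units to account 7"
attached = digest(message)
print(digest(message) == attached)                             # unchanged message
print(digest(b"transfer 900 units to account 7") == attached)  # altered message

salt = os.urandom(16)
stored = hash_password("correct horse", salt)
print(check_password("correct horse", salt, stored))
```

A digest sent alongside the message only detects tampering if an intermediary cannot replace the digest as well, which is the gap the digital signature mechanism closes.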
CLOUD SECURITY MECHANISM:
DIGITAL SIGNATURE

• The digital signature mechanism is a means of providing data authenticity and
integrity through authentication and non-repudiation.

• A message is assigned a digital signature prior to transmission, which is then rendered
invalid if the message experiences any subsequent, unauthorized modifications.

• A digital signature provides evidence that the message received is the same as the one
created by its rightful sender.

• Both hashing and asymmetrical encryption are involved in the creation of a digital
signature, which essentially exists as a message digest that was encrypted by a private
key and appended to the original message.

• The recipient verifies the signature validity and uses the corresponding public key to
decrypt the digital signature, which produces the message digest.

• The hashing mechanism can also be applied to the original message to produce this
message digest.

• Identical results from the two different processes indicate that the message maintained
its integrity.

• The digital signature mechanism helps mitigate the malicious intermediary, insufficient
authorization, and overlapping trust boundaries security threats.
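
The sign-and-verify flow above (hash the message, encrypt the digest with the private key, decrypt with the public key, compare) can be sketched with textbook RSA. This is a toy under stated assumptions: the key is built from two small known primes, there is no padding, and the digest is reduced modulo `N`, so it is far too weak for real use. All names are hypothetical, and `pow(E, -1, ...)` needs Python 3.8 or later.

```python
import hashlib

# Toy key pair built from two known Mersenne primes (far too small for real use).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse of E)


def digest_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: encrypt the message digest with the private key."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: decrypt the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_int(message)


original = b"provision 2 virtual servers"
signature = sign(original)

print(verify(original, signature))
print(verify(b"provision 200 virtual servers", signature))
```

Production systems use a vetted library with proper padding (for example RSA-PSS) or a different signature scheme altogether.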
CLOUD SECURITY MECHANISM:
PUBLIC KEY INFRASTRUCTURE (PKI)

• The Public Key Infrastructure (PKI) is a common approach for managing the
issuance of asymmetric keys, which exists as a system of protocols, data formats,
rules, and practices that enable large-scale systems to securely use public key
cryptography.

• This system is used to associate public keys with their corresponding key owners
(known as public key identification) while enabling the verification of key
validity.

• PKIs rely on the use of digital certificates, which are digitally signed data structures
that bind public keys to certificate owner identities, as well as to related
information, such as validity periods.

• Digital certificates are usually digitally signed by a third-party certificate authority
(CA). The majority of digital certificates are issued by only a handful of trusted CAs
like VeriSign and Comodo.

• The PKI is a dependable method for implementing asymmetric encryption,
managing cloud consumer and cloud provider identity information, and helping to
defend against the malicious intermediary and insufficient authorization threats.

• The PKI mechanism is primarily used to counter the insufficient authorization
threat.
CLOUD SECURITY MECHANISM:
IDENTITY AND ACCESS MANAGEMENT (IAM)
• The identity and access management (IAM) mechanism encompasses the components and policies necessary to control and
track user identities and access privileges for IT resources, environments, and systems.
• IAM mechanisms exist as systems comprised of four main components:
• Authentication: Username and password combinations remain the most common forms of user authentication credentials managed by the
IAM system, which also can support digital signatures, digital certificates, biometric hardware (fingerprint readers), specialized software
(such as voice analysis programs), and locking user accounts to registered IP or MAC addresses.
• Authorization: The authorization component defines the correct granularity for access controls and oversees the relationships between
identities, access control rights, and IT resource availability.
• User Management: Related to the administrative capabilities of the system, the user management program is responsible for creating new
user identities and access groups, resetting passwords, defining password policies, and managing privileges.
• Credentials Management: The credential management system establishes identities and access control rules for defined user accounts,
which mitigates the threat of insufficient authorization.

• The IAM mechanism is primarily used to counter the insufficient authorization, denial of service, and overlapping trust
boundaries threats.
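
How the authentication, authorization, and user management components fit together can be sketched as a small in-memory class. This is an illustration only: `IamSystem`, its methods, and the group and resource names are hypothetical, and a real IAM system would add auditing, password policies, and persistent storage.

```python
import hashlib
import hmac
import os


class IamSystem:
    """Toy IAM: users belong to groups, and groups are granted (resource, action) pairs."""

    def __init__(self):
        self._users = {}    # username -> (salt, password hash, set of groups)
        self._grants = {}   # group -> set of (resource, action)

    def create_user(self, username, password, groups):
        # User management: store a salted one-way hash, never the password itself.
        salt = os.urandom(16)
        pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._users[username] = (salt, pw_hash, set(groups))

    def grant(self, group, resource, action):
        self._grants.setdefault(group, set()).add((resource, action))

    def authenticate(self, username, password):
        # Authentication: check the username and password combination.
        record = self._users.get(username)
        if record is None:
            return False
        salt, pw_hash, _ = record
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(candidate, pw_hash)

    def is_authorized(self, username, resource, action):
        # Authorization: allowed only if one of the user's groups holds the grant.
        record = self._users.get(username)
        if record is None:
            return False
        return any((resource, action) in self._grants.get(g, set()) for g in record[2])


iam = IamSystem()
iam.create_user("asha", "s3cret!", groups=["operators"])
iam.grant("operators", "virtual-server-1", "restart")
print(iam.authenticate("asha", "s3cret!"))
print(iam.is_authorized("asha", "virtual-server-1", "delete"))
```

Denying by default, as `is_authorized` does for unknown users and missing grants, is what keeps access from being granted too broadly.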
CLOUD SECURITY MECHANISM:
SINGLE SIGN-ON (SSO)

• The single sign-on (SSO) mechanism enables one cloud service consumer to be
authenticated by a security broker, which establishes a security context that is
persisted while the cloud service consumer accesses other cloud services or
cloud-based IT resources. Otherwise, the cloud service consumer would need to
re-authenticate itself with every subsequent request.

• The SSO mechanism essentially enables mutually independent cloud services
and IT resources to generate and circulate runtime authentication and
authorization credentials.

• The credentials initially provided by the cloud service consumer remain valid for
the duration of a session, while its security context information is shared.

• The SSO mechanism’s security broker is especially useful when a cloud service
consumer needs to access cloud services residing on different clouds.

• The mechanism does not directly counter any of the cloud security threats. It
primarily enhances the usability of cloud-based environments for access and
management of distributed IT resources and solutions.
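
The security broker idea can be sketched with a signed session token: the broker authenticates the consumer once and issues a token that other services can validate without asking for credentials again. The token format, the secret shared between broker and services, and all names below are assumptions made for this example; real deployments typically rely on standards such as SAML or OpenID Connect.

```python
import hashlib
import hmac


class SecurityBroker:
    """Toy broker: issues an HMAC-signed token that carries the security context."""

    def __init__(self, secret: bytes, ttl_seconds: int = 3600):
        self._secret = secret
        self._ttl = ttl_seconds

    def issue_token(self, consumer_id: str, now: int) -> str:
        # Called once, after the consumer has authenticated with the broker.
        # Assumes consumer_id does not contain the "|" separator.
        payload = f"{consumer_id}|{now + self._ttl}"
        sig = hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()
        return f"{payload}|{sig}"


def validate_token(secret: bytes, token: str, now: int):
    """Run by each cloud service: return the consumer id, or None if invalid."""
    try:
        consumer_id, expires, sig = token.split("|")
    except ValueError:
        return None
    payload = f"{consumer_id}|{expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                      # token was forged or altered
    if now >= int(expires):
        return None                      # session has ended
    return consumer_id


secret = b"secret shared by broker and services"
broker = SecurityBroker(secret, ttl_seconds=600)
token = broker.issue_token("consumer-17", now=1_000)

print(validate_token(secret, token, now=1_100))   # service A, within the session
print(validate_token(secret, token, now=2_000))   # after the session expired
```

The expiry time plays the role of the session duration mentioned above: the same token is accepted by every service until the session ends.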
CLOUD SECURITY MECHANISM:
CLOUD-BASED SECURITY GROUPS

• Cloud resource segmentation is a process by which separate physical and virtual IT
environments are created for different users and groups.

• Resource segmentation is used to enable virtualization by allocating a variety of physical IT
resources to virtual machines.

• The cloud-based resource segmentation process creates cloud-based security group
mechanisms that are determined through security policies.

• Networks are segmented into logical cloud-based security groups that form logical network
perimeters.

• Each cloud-based IT resource is assigned to at least one logical cloud-based security group.
Each logical cloud-based security group is assigned specific rules that govern the
communication between the security groups.

• Multiple virtual servers running on the same physical server can become members of
different logical cloud-based security groups.

• Virtual servers can further be separated into public-private groups, development-production
groups, or any other designation configured by the cloud resource administrator.

• Cloud-based security groups delineate areas where different security measures can be applied.
Properly implemented cloud-based security groups help limit unauthorized access to IT
resources in the event of a security breach. This mechanism can be used to help counter the
denial of service, insufficient authorization, and overlapping trust boundaries threats, and is
closely related to the logical network perimeter mechanism.
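
Group membership and inter-group rules can be sketched as two small tables and a lookup. The group names, server names, and the `can_communicate` helper are hypothetical, and the rule model (an explicit allow-list of source and destination groups, with everything else denied) is an assumption made for the example.

```python
# Each virtual server belongs to at least one logical security group.
MEMBERSHIP = {
    "vm-web-1": {"public-web"},
    "vm-app-1": {"private-app"},
    "vm-db-1": {"private-db"},
    "vm-dev-1": {"development"},
}

# Rules govern communication between groups: (source group, destination group).
ALLOWED = {
    ("public-web", "private-app"),
    ("private-app", "private-db"),
}


def can_communicate(src, dst, membership=MEMBERSHIP, allowed=ALLOWED):
    """Allow traffic only if some rule links a group of src to a group of dst."""
    return any(
        (src_group, dst_group) in allowed
        for src_group in membership.get(src, set())
        for dst_group in membership.get(dst, set())
    )


print(can_communicate("vm-web-1", "vm-app-1"))  # web tier may reach the app tier
print(can_communicate("vm-web-1", "vm-db-1"))   # no rule, so the traffic is denied
```

Because the servers are keyed by group rather than by host, two virtual servers on the same physical server can still sit on opposite sides of a rule.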
CLOUD SECURITY MECHANISM:
HARDENED VIRTUAL SERVER IMAGES

• A virtual server is created from a template configuration called a
virtual server image (or virtual machine image).

• Hardening is the process of stripping unnecessary software from a
system to limit potential vulnerabilities that can be exploited by
attackers.

• Removing redundant programs, closing unnecessary server ports, and
disabling unused services, internal root accounts, and guest access are
all examples of hardening.

• A hardened virtual server image is a template for virtual service
instance creation that has been subjected to a hardening process.

• This generally results in a virtual server template that is significantly
more secure than the original standard image.

• Hardened virtual server images help counter the denial of service,
insufficient authorization, and overlapping trust boundaries threats.
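
A hardening pass can be sketched as a comparison between an image's configuration and an approved baseline. The dictionary layout, the baseline values, and the `hardening_findings` helper are hypothetical; real images would be inspected with the platform's own tooling.

```python
# Hypothetical approved baseline for a hardened web server image.
BASELINE = {
    "allowed_ports": {22, 443},
    "allowed_services": {"sshd", "nginx"},
}


def hardening_findings(image, baseline=BASELINE):
    """List the changes needed to bring an image description in line with the baseline."""
    findings = []
    for port in sorted(image["open_ports"] - baseline["allowed_ports"]):
        findings.append(f"close port {port}")
    for service in sorted(image["services"] - baseline["allowed_services"]):
        findings.append(f"disable service {service}")
    if image.get("guest_access", False):
        findings.append("disable guest access")
    return findings


standard_image = {
    "open_ports": {21, 22, 23, 443},
    "services": {"sshd", "nginx", "telnetd", "ftpd"},
    "guest_access": True,
}
hardened_image = {
    "open_ports": {22, 443},
    "services": {"sshd", "nginx"},
    "guest_access": False,
}

print(hardening_findings(standard_image))
print(hardening_findings(hardened_image))
```

An empty findings list is the signal that the image is ready to be saved as the hardened template.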
