Cloud Computing (100 Questions & Answers)
1. What is a Cloud?
⚫ Cloud computing traces its roots to the 1960s when John McCarthy
suggested that computing might one day be organized as a utility.
⚫ The term “cloud computing” came into use in the early 2000s.
⚫ Amazon Web Services (AWS) launched in 2006 as one of the first
significant cloud platforms, providing scalable computing services over
the internet.
⚫ Other companies, like Google, Microsoft, and IBM, followed, and the
cloud industry rapidly expanded, evolving with innovations such as
SaaS, PaaS, and IaaS.
Advantages:
Disadvantages:
Q12. What are the Key Stages in migrating to the cloud? (Answer:
Write about 3 stages -> Plan, Execute and Monitor)
Encryption ensures that data is secure both in transit and at rest by converting it into
unreadable text that can only be decrypted with the correct key.
Cloud providers offer services for data encryption at different layers (file, database,
network, etc.).
Q.16 What is utility computing?
Utility computing is a model of computing where computing resources—such as
processing power, storage, and networking—are provided and billed in a manner similar
to traditional public utilities (like electricity or water). In utility computing, users pay only
for the resources they consume, rather than maintaining and managing their own
infrastructure. This on-demand, pay-as-you-go model is one of the foundational
principles behind cloud computing.
Q.17 What is Virtual Desktop Infrastructure?
Virtual Desktop Infrastructure (VDI) is a technology that enables the creation,
management, and delivery of virtualized desktop environments to users, typically hosted
in a data center or cloud. VDI allows users to access desktop environments remotely
from any device, with the desktop operating system (OS), applications, and data residing
on centralized virtual machines (VMs), rather than on physical endpoints like laptops or
desktops.
Q.18 What is encryption?
Encryption is a process used to protect data by converting it into a form that is unreadable
to anyone who does not have the correct decryption key or password. The primary purpose
of encryption is to safeguard sensitive information from unauthorized access, ensuring
confidentiality and integrity, both in transit and at rest.
In simpler terms, encryption scrambles data in a way that only authorized users (or
systems) with the correct decryption key can access the original, readable version of the
data.
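For illustration, here is a minimal Python sketch of symmetric encryption using the third-party cryptography package's Fernet recipe; the library choice is an assumption, since cloud providers normally wrap encryption behind their own key-management services.
```python
# A minimal sketch of encrypting data "at rest" with a symmetric key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # the secret key; in practice held in a key-management service
cipher = Fernet(key)

plaintext = b"Customer record: card ending 4242"
ciphertext = cipher.encrypt(plaintext)    # unreadable without the key
restored = cipher.decrypt(ciphertext)     # only possible with the correct key

assert restored == plaintext
print(ciphertext)   # an opaque token that is safe to store
```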
Cloud Security Challenges:
• Data Security: Storing data in the cloud raises concerns about data
breaches, unauthorized access, and data loss. Cloud service providers must
implement robust security measures to protect sensitive data from
cyberattacks.
• Data Privacy: Since cloud resources are often shared among multiple
customers, data privacy is a concern, especially when dealing with
personally identifiable information (PII) or confidential corporate data.
• Compliance: Organizations are required to comply with various
regulatory frameworks (e.g., GDPR, HIPAA) when storing data in the
cloud. Ensuring that the cloud provider meets these compliance standards
can be a challenge.
• Shared Responsibility Model: In a cloud environment, the responsibility
for security is shared between the cloud provider and the customer.
Misunderstanding of this model can lead to gaps in security.
All of the concrete realizations of cloud computing can be organized into a layered view covering the entire stack, from hardware appliances to software systems. Utilizing cloud resources provides the "computing horsepower" needed to deliver services. This layer is frequently implemented in a data center containing dozens or even millions of stacked nodes. Because it can be constructed from a range of resources, including clusters and even networked PCs, cloud infrastructure can be heterogeneous in character. The infrastructure can also include database systems and other storage services. The physical infrastructure is managed by the core middleware, whose goals are to create an optimal runtime environment for applications and to make the best use of resources. At the bottom of the stack, virtualization technologies are employed to provide runtime environment customization, application isolation, sandboxing, and quality of service. Hardware virtualization is most frequently used at this level. Hypervisors control the pool of available resources and expose the distributed infrastructure as a collection of virtual machines. By adopting virtual machine technology, it is possible to finely partition hardware resources such as CPU and memory, and to virtualize specific devices to accommodate user and application needs.
• Cloud Provider
• Cloud Carrier
• Cloud Broker
• Cloud Auditor
• Cloud Consumer
Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in cloud computing. There are five major actors defined in the
NIST cloud computing reference architecture, which are described below:
Cloud Provider:
A group or object that delivers cloud services to cloud consumers or end-users. It offers various
components of cloud computing. Cloud computing consumers purchase a growing variety of cloud
services from cloud service providers. There are various categories of cloud-based services
mentioned below:
• IaaS Providers: In this model, the cloud service providers offer infrastructure components
that would exist in an on-premises data center. These components consist of servers,
networking, and storage as well as the virtualization layer.
• SaaS Providers: In Software as a Service (SaaS), vendors provide a wide range of
business technologies, such as human resources management (HRM) software and customer
relationship management (CRM) software, all of which the SaaS vendor hosts and delivers
as services over the internet.
• PaaS Providers: In Platform as a Service (PaaS), vendors offer cloud infrastructure and
services that users can access to perform many functions. In PaaS, services and products are
mostly used in software development. PaaS providers offer more services than IaaS
providers: they provide the operating system and middleware, along with the application
stack, on top of the underlying infrastructure.
Cloud Auditor: An entity that can conduct independent assessments of cloud services, security,
performance, and the information system operations of a cloud implementation. The services
provided by Cloud Service Providers (CSPs) can be evaluated by auditors in terms of
privacy impact, security controls, performance, etc. A Cloud Auditor can assess the
security controls in the information system to determine the extent to which the controls are
implemented correctly, operating as planned, and producing the desired outcome with respect to
meeting the security requirements of the system. There are three major roles of the Cloud Auditor, which
are mentioned below:
• Security Audit.
• Privacy Impact Audit.
• Performance Audit.
Q.4 What is meant by on-demand provisioning? State its purpose in the cloud.
Resource provisioning means the selection, deployment, and run-time management of software
(e.g., database server management systems, load balancers) and hardware resources (e.g., CPU,
storage, and network) to ensure guaranteed performance for applications. By provisioning the
resources, QoS parameters such as availability, throughput, security, response time, reliability, and
performance must be achieved without violating the SLA.
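As a concrete illustration, here is a hypothetical sketch of on-demand provisioning using the AWS SDK for Python (boto3); the AMI ID, instance type, and region are placeholders, not values taken from this text.
```python
# Hypothetical sketch: provision a VM only when it is needed, release it when it is not.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a single VM on demand (pay-as-you-go).
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Provisioned on demand:", instance_id)

# Release the resource when demand drops, so billing stops.
ec2.terminate_instances(InstanceIds=[instance_id])
```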
Q.5 What do you mean by cloud storage? Describe its types.
Cloud Storage as a Service (STaaS) provides on-demand storage resources over the internet. It abstracts the
complexities of storage infrastructure, offering a scalable and cost-effective solution for storing and managing data.
Advantages of Cloud Storage:
Scalability:
Cloud storage can easily scale up or down based on demand, allowing organizations to
pay for only the storage they use.
Cost Efficiency:
Organizations can avoid the upfront costs of purchasing and maintaining physical
hardware, paying only for the storage resources consumed.
Accessibility:
Data stored in the cloud can be accessed from anywhere with an internet connection,
facilitating remote access and collaboration.
Redundancy and Reliability:
Cloud storage providers often implement redundant storage mechanisms, ensuring data
durability and high availability.
Data Security:
Cloud storage services implement robust security measures, including encryption and
access controls, to protect stored data.
Automatic Updates and Maintenance:
Cloud storage providers handle infrastructure updates and maintenance, relieving users
from these operational tasks.
Key Features of Amazon S3:
Object Storage: Amazon S3 allows users to store and retrieve any amount of data as objects,
each consisting of data, a key, and metadata.
Scalability: S3 provides virtually unlimited storage capacity, and it scales automatically to handle
growing amounts of data.
Data Durability and Availability: S3 achieves high durability by storing data across multiple
locations and availability zones, ensuring high availability and reliability.
Security Features: S3 supports data encryption in transit and at rest, access control policies, and
integration with AWS Identity and Access Management (IAM) for fine-grained access control.
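The object model described above (data + key + metadata, with encryption at rest) can be sketched with boto3; the bucket and key names below are illustrative placeholders.
```python
# Hedged sketch of storing and retrieving an S3 object.
import boto3

s3 = boto3.client("s3")

# Store an object: the key identifies it, metadata travels with it,
# and server-side encryption protects it at rest.
s3.put_object(
    Bucket="example-reports-bucket",
    Key="2024/q1/summary.csv",
    Body=b"region,revenue\nemea,1200\n",
    Metadata={"owner": "finance"},
    ServerSideEncryption="AES256",
)

# Retrieve the same object by its key.
obj = s3.get_object(Bucket="example-reports-bucket", Key="2024/q1/summary.csv")
print(obj["Body"].read().decode())
```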
• Scalability.
• Security: Because data is properly separated, the chances of data theft by attackers are
considerably reduced.
Q.14 Explain PLATFORM AS A SERVICE (PaaS)
Platform as a Service (PaaS) is a type of cloud computing that helps developers build
applications and services over the internet by providing them with a platform. PaaS helps
users maintain control over their business applications.
Advantages of PaaS
• PaaS is simple and very convenient for the user, as it can be accessed via a web
browser.
• PaaS has the capability to efficiently manage the application lifecycle.
Disadvantages of PaaS
• PaaS offers limited control over the infrastructure: users have less control over the
environment and are not able to make some customizations.
• PaaS has a high dependence on the provider.
Cloud Broker: An organization or a unit that manages the performance, use, and delivery of cloud services by
enhancing specific capabilities and offering value-added services to cloud consumers. It combines
and integrates various services into one or more new services. Brokers provide service arbitrage,
which allows flexibility and opportunistic choices. There are three major services offered by a
cloud broker:
• Service Intermediation.
• Service Aggregation.
• Service Arbitrage.
Disadvantages of SaaS
• SaaS solutions have limited customization, which means they have some restrictions
within the platform.
• SaaS gives users little control over their data.
• SaaS applications are generally cloud-based, so they require a stable internet connection to work properly.
1. Data Loss –
Data loss is one of the issues faced in cloud computing. It is also known as data leakage. Our
sensitive data is in the hands of somebody else, and we do not have full control over our
database. So, if the security of the cloud service is breached by hackers, it may be possible
that the hackers will get access to our sensitive data or personal files.
2. Account Hijacking –
Account hijacking is the most serious security issue in cloud computing. If the account of a
user or an organization is hijacked by a hacker, the hacker has full authority to perform
unauthorized activities.
UNIT – IV
Resource Management and Security in Cloud
1. What is resource management in cloud computing?
Answer:
Resource management in cloud computing involves efficiently allocating, monitoring, and
optimizing cloud resources (like computing power, storage, and network bandwidth) to ensure
that applications run smoothly and cost-effectively. It includes provisioning, scaling, and
deallocating resources based on demand.
2. Why is cloud security important?
Answer:
Cloud security is essential to protect data, applications, and services hosted in the cloud from
cyber threats, data breaches, and unauthorized access. It ensures confidentiality, integrity, and
availability of sensitive information, preventing loss and reputational damage.
3. What are some common cloud security threats?
Answer:
Common cloud security threats include data breaches, loss of data control, insecure APIs,
denial-of-service attacks, and misconfigurations. These vulnerabilities can lead to unauthorized access,
data loss, and disruption of services.
4. What is the Shared Responsibility Model in cloud security?
Answer:
The Shared Responsibility Model defines the security responsibilities of both the cloud provider
and the customer. The provider secures the infrastructure, while the customer is responsible for
securing their data, applications, and access controls within the cloud environment.
5. What is Identity and Access Management (IAM) in the cloud?
Answer:
IAM in the cloud is a framework used to control and manage user access to cloud resources. It
ensures that only authorized users have appropriate access levels to applications and data,
improving security and reducing the risk of unauthorized access.
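A hedged sketch of the least-privilege idea using boto3's IAM client; the user name, policy name, and bucket ARN are hypothetical.
```python
# Illustrative sketch: grant a user only read access to one bucket, nothing more.
import json
import boto3

iam = boto3.client("iam")

# A least-privilege policy limited to reading a single bucket.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-reports-bucket",
            "arn:aws:s3:::example-reports-bucket/*",
        ],
    }],
}

iam.create_user(UserName="report-reader")
iam.put_user_policy(
    UserName="report-reader",
    PolicyName="reports-read-only",
    PolicyDocument=json.dumps(read_only_policy),
)
```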
6. What is auto-scaling in cloud computing?
Answer:
Auto-scaling in cloud computing automatically adjusts the number of active resources (such as
servers or VMs) based on current demand. This ensures optimal resource usage, reduces costs,
and prevents system overloads during peak usage times.
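The scaling rule can be illustrated with a toy Python sketch; the CPU thresholds and instance limits below are assumptions, not values from any particular provider.
```python
# A toy sketch of an auto-scaling decision rule: scale out when load is high,
# scale in when it is low, within fixed bounds.
def desired_instance_count(current: int, cpu_utilization: float,
                           min_instances: int = 1, max_instances: int = 10) -> int:
    if cpu_utilization > 0.75:          # overloaded: add capacity
        current += 1
    elif cpu_utilization < 0.25:        # underused: remove capacity to save cost
        current -= 1
    return max(min_instances, min(max_instances, current))

# Example: demand spikes, then subsides.
fleet = 2
for load in [0.30, 0.80, 0.90, 0.85, 0.20, 0.10]:
    fleet = desired_instance_count(fleet, load)
    print(f"cpu={load:.0%} -> run {fleet} instance(s)")
```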
7. How can cloud resource usage be optimized?
Answer:
Cloud resource usage can be optimized by using auto-scaling, load balancing, resource allocation
strategies, and monitoring tools to track performance. Additionally, choosing cost-effective
service tiers and rightsizing resources based on actual usage helps reduce wastage.
8. What is the role of encryption in cloud security?
Answer:
Encryption in cloud security is the process of converting data into an unreadable format to
prevent unauthorized access. It is applied to data both in transit (while being transferred) and at
rest (while stored) to protect sensitive information.
9. What is the role of monitoring in cloud security?
Answer:
Monitoring in cloud security involves continuously observing cloud resources and activities for
anomalies, unauthorized access, or performance issues. It helps detect potential threats early,
ensuring timely responses to mitigate risks.
10. What is a Cloud Access Security Broker (CASB)?
Answer:
A CASB is a security tool that acts as an intermediary between users and cloud services,
enforcing security policies. It helps organizations monitor and control cloud service usage,
ensuring data security, compliance, and preventing unauthorized access.
Instead of managing hardware or software in-house, users rely on these third-party providers to
deliver scalable, on-demand resources over the internet.
• Offers a platform for developers to build, test, and deploy applications without managing
the underlying infrastructure.
• Examples:
o Heroku: A platform for building, running, and scaling apps.
o Google App Engine: A managed platform for app development.
o Microsoft Azure App Service: For web and mobile app development.
4. Storage Services:
• Examples:
o Cloudflare: For web performance and security.
o Twilio: Cloud-based communications (SMS, voice, video).
o Snowflake: Cloud-based data warehousing and analytics.
These third-party services reduce the need for businesses to manage their own infrastructure and
enable them to focus on core activities.
2. Enhanced Security
• Data Protection: Helps enforce security measures like encryption, access control, and
monitoring.
• Compliance: Ensures adherence to industry regulations such as GDPR, HIPAA, or PCI
DSS.
3. Performance Optimization
4. Simplified Operations
• Ensures data recovery processes are in place to minimize downtime during failures.
1. On-Demand Availability
o Users can access resources whenever needed without prior commitments.
2. Pay-Per-Use Model
o Charges are based on actual consumption, making it cost-efficient.
3. Scalability
o Resources can scale up or down dynamically based on user needs.
4. Infrastructure Abstraction
o Users do not need to own or manage physical infrastructure; they access
virtualized resources.
• Cost Savings: Eliminates the need for large upfront investments in IT infrastructure.
• Flexibility: Allows businesses to adjust resources as demand fluctuates.
• Focus on Core Business: Frees up organizations to focus on innovation rather than
managing IT infrastructure.
By leveraging cloud management and utility computing, organizations can optimize their
cloud usage, minimize costs, and improve operational efficiency.
a. Static Resource Provisioning: For applications that have predictable and generally
unchanging demands/workloads, it is possible to use "static provisioning" effectively. With
advance provisioning, the customer contracts with the provider for services and the provider
prepares the appropriate resources in advance of the start of service. The customer is charged a flat
fee or is billed on a monthly basis.
b. Dynamic Resource Provisioning: With dynamic provisioning, the provider allocates more
resources as they are needed and removes them when they are no longer required. The customer
is billed on a pay-per-use basis.
c. User Self-Provisioning: With user self-provisioning (also known as cloud self-service), the
customer purchases resources from the cloud provider through a web form, creating a customer
account and paying for resources with a credit card. The provider's resources are available for
customer use within hours, if not minutes.
When an organization elects to store data or host applications on the public cloud, it loses its
ability to have physical access to the servers hosting its information. As a result, potentially
sensitive data is at risk from insider attacks. According to a recent Cloud Security Alliance
report, insider attacks are the sixth biggest threat in cloud computing. Therefore, cloud service
providers must ensure that thorough background checks are conducted for employees who have
physical access to the servers in the data center. Additionally, data centers must be frequently
monitored for suspicious activity.
In order to conserve resources, cut costs, and maintain efficiency, cloud service providers often
store more than one customer's data on the same server. As a result, there is a chance that one
user's private data can be viewed by other users (possibly even competitors). To handle such
sensitive situations, cloud service providers should ensure proper data isolation and logical
storage segregation.
The extensive use of virtualization in implementing cloud infrastructure brings unique security
concerns for customers or tenants of a public cloud service. Virtualization alters the relationship
between the OS and underlying hardware – be it computing, storage or even networking. This
introduces an additional layer – virtualization – that itself must be properly configured, managed
and secured. Specific concerns include the potential to compromise the virtualization software,
or "hypervisor". While these concerns are largely theoretical, they do exist. For example, a
breach in the administrator workstation with the management software of the virtualization
software can cause the whole datacenter to go down or be reconfigured to an attacker's liking.
In public and hybrid cloud environments, the loss of overall service visibility and the
associated lack of control can be a problem. A loss of visibility in the cloud can mean a loss
of control over several aspects of IT management and data security. Where legacy-style
in-house infrastructure was entirely under the control of the company, cloud services delivered
by third-party providers don't offer the same level of granularity with regard to administration
and management.
Despite the fact that generally speaking, enterprise-grade cloud services are more secure than legacy
architecture, there is still a potential cost in the form of data breaches and downtime. With public and
private cloud offerings, resolving these types of problems is in the hands of the third-party
provider. Consequently, the business has very little control over how long critical business
systems may be offline, as well as how well the breach is managed.
Vendor Lock-In
For companies that come to rely heavily on public and hybrid cloud platforms, there is a danger
that they become forced to continue with a specific third-party vendor simply to retain
operational capacity. If critical business applications are locked into a single vendor, it can be
very difficult to make tactical decisions such as moving to a new vendor. In effect, the vendor is
being provided with the leverage it needs to force the customer into an unfavorable contract.
Logicworks recently performed a survey that found that some 78% of IT decision
makers blame the fear of vendor lock-in as a primary reason for their organization failing to gain
maximum value from cloud computing.
Compliance Complexity
In sectors such as healthcare and finance, where legislative requirements with regard to storage
of private data are heavy, achieving full compliance whilst using public or private cloud
offerings can be more complex. Many enterprises attempt to gain compliance by using a cloud
vendor that is deemed fully compliant. Indeed, data shows that some 51% of firms in the USA
rely on nothing more than a statement of compliance from their cloud vendor as confirmation
that all legislative requirements have been met. But what happens when at a later stage, it is
found that the vendor is not actually fully compliant? The client company could find itself facing
non-compliance, with very little control over how the problem can be resolved.
• HTTP
o The acronym “Hypertext Transfer Protocol”
o HTTP is a request/response standard between a client and a server (see the short sketch after this list)
o For distributed, collaborative, hypermedia information systems
• XMPP
o The acronym “Extensible Messaging and Presence Protocol”
o Used for near-real-time, extensible instant messaging and presence information
o XMPP remains the core protocol of the Jabber Instant Messaging and Presence
technology
• SIMPLE
o Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions
o For registering for presence information and receiving notifications
o It is also used for sending short messages and managing a session of realtime messages
between two or more participants
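A short standard-library sketch of the HTTP request/response exchange mentioned in the list above; the URL is a placeholder.
```python
# Minimal sketch of HTTP's request/response pattern using only Python's standard library.
from urllib import request

req = request.Request("https://example.com/", method="GET")   # the client's request
with request.urlopen(req) as resp:                             # the server's response
    print(resp.status, resp.reason)          # e.g. "200 OK"
    print(resp.headers["Content-Type"])      # response metadata
    body = resp.read()                       # the hypermedia payload itself
    print(len(body), "bytes received")
```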
1. Enhanced Security
• Access Control: Ensures that only authorized users can access specific resources.
• Least Privilege Principle: Limits access rights to the minimum required for users to
perform their tasks.
• Reduced Risk of Insider Threats: By managing access tightly, IAM minimizes the risk
of unauthorized actions by internal personnel.
2. Regulatory Compliance
• Simplifies compliance with standards like GDPR, HIPAA, and PCI DSS by ensuring
proper logging, monitoring, and auditing of access.
• Facilitates access reviews and ensures only appropriate permissions are granted.
3. Improved Productivity
• Single Sign-On (SSO): Allows users to access multiple systems with one set of
credentials, reducing the time spent on logging into various platforms.
• Automated Provisioning: Speeds up the onboarding process by automatically granting
access to necessary resources.
4. Cost Efficiency
• Reduces IT overhead by automating user provisioning, password resets, and de-provisioning.
• Prevents data breaches and the associated financial and reputational costs.
• Offers a unified platform to manage and monitor user access across multiple systems and
applications.
• Simplifies administration, especially in multi-cloud or hybrid environments.
• Streamlined Access: SSO and self-service portals improve usability, reducing frustration
caused by multiple logins or forgotten passwords.
• Consistency: Provides uniform access across all resources and platforms
8. Scalability
SaaS applications can range from customer relationship management (CRM) tools (e.g.,
Salesforce) to office productivity software (e.g., Google Workspace) and enterprise resource
planning (ERP) systems. Because the software and data are hosted on cloud servers owned by
third-party providers, SaaS security is essential to ensure that data is protected from breaches,
loss, and unauthorized access.
Key Aspects of SaaS Security:
Hadoop was started by Doug Cutting and Mike Cafarella in 2002. Its origin was the
Google File System paper, published by Google.
Let's focus on the history of Hadoop in the following steps:
o In 2002, Doug Cutting and Mike Cafarella started to work on a project, Apache Nutch. It
is an open source web crawler software project.
o While working on Apache Nutch, they were dealing with big data. Storing that data was
very costly, which became a problem for the project. This problem became one of the
important reasons for the emergence of Hadoop.
o In 2003, Google introduced a file system known as GFS (Google File System). It is a
proprietary distributed file system developed to provide efficient access to data.
o In 2004, Google released a white paper on MapReduce. This technique simplifies
data processing on large clusters.
o In 2005, Doug Cutting and Mike Cafarella introduced a new file system known as NDFS
(Nutch Distributed File System). This file system also included MapReduce.
o In 2006, Doug Cutting joined Yahoo. On the basis of the Nutch project, Doug Cutting
introduced a new project, Hadoop, with a file system known as HDFS
(Hadoop Distributed File System). Hadoop's first version, 0.1.0, was released in this year.
o Doug Cutting named his project Hadoop after his son's toy elephant.
o In 2007, Yahoo ran two clusters of 1000 machines.
o In 2008, Hadoop became the fastest system to sort 1 terabyte of data on a 900-node
cluster within 209 seconds.
o In 2013, Hadoop 2.2 was released.
o In 2017, Hadoop 3.0 was released.
Oracle VirtualBox is open source virtualization software that allows users to run multiple
operating systems on a single device and easily deploy them to the cloud.
MapReduce is a data processing tool used to process data in parallel in a
distributed form. It was introduced in 2004, on the basis of the paper titled "MapReduce:
Simplified Data Processing on Large Clusters," published by Google.
MapReduce is a paradigm with two phases: the mapper phase and the reducer
phase. In the mapper, the input is given in the form of key-value pairs. The output of the
mapper is fed to the reducer as input. The reducer runs only after the mapper is over. The
reducer also takes input in key-value format, and the output of the reducer is the final output.
Phases of MapReduce:
• Input Phase − Here we have a Record Reader that translates each record in an input file
and sends the parsed data to the mapper in the form of key-value pairs.
• Map − Map is a user-defined function, which takes a series of key-value pairs and
processes each one of them to generate zero or more key-value pairs.
• Intermediate Keys − The key-value pairs generated by the mapper are known as
intermediate keys.
• Combiner − A combiner is a type of local Reducer that groups similar data from the map
phase into identifiable sets. It takes the intermediate keys from the mapper as input and
applies a user-defined code to aggregate the values in a small scope of one mapper. It is
not a part of the main MapReduce algorithm; it is optional.
• Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It downloads
the grouped key-value pairs onto the local machine, where the Reducer is running. The
individual key-value pairs are sorted by key into a larger data list. The data list groups the
equivalent keys together so that their values can be iterated easily in the Reducer task.
• Reducer − The Reducer takes the grouped key-value paired data as input and runs a
Reducer function on each one of them. Here, the data can be aggregated, filtered, and
combined in a number of ways, and it requires a wide range of processing. Once the
execution is over, it gives zero or more key-value pairs to the final step.
• Output Phase − In the output phase, we have an output formatter that translates the final
key-value pairs from the Reducer function and writes them onto a file using a record
writer.
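The phases above can be mirrored in a few lines of plain Python; no Hadoop cluster is involved, and the input documents are made up for illustration.
```python
# A plain-Python sketch of map -> shuffle/sort -> reduce for word counting.
from collections import defaultdict

documents = {"doc1": "cloud computing in the cloud",
             "doc2": "computing at scale"}

# Map: each record becomes zero or more intermediate key-value pairs.
def mapper(key, value):
    for word in value.split():
        yield word, 1

# Shuffle and sort: group all intermediate values by key.
groups = defaultdict(list)
for doc_id, text in documents.items():
    for word, count in mapper(doc_id, text):
        groups[word].append(count)

# Reduce: aggregate each group's values into the final output.
def reducer(key, values):
    return key, sum(values)

for word in sorted(groups):
    print(reducer(word, groups[word]))   # e.g. ('cloud', 2)
```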
Hadoop Architecture
The Hadoop architecture is a package of the file system, MapReduce engine and the
HDFS (Hadoop Distributed File System). The MapReduce engine can be
MapReduce/MR1 or YARN/MR2.
A Hadoop cluster consists of a single master and multiple slave nodes. The master node
includes the JobTracker and NameNode, whereas the slave nodes include the TaskTracker
and DataNode.
Both NameNode and DataNode are capable enough to run on commodity machines. The Java
language is used to develop HDFS. So any machine that supports Java language can easily run
the NameNode and DataNode software.
NameNode
• The NameNode is the master server that stores the metadata of HDFS: the directory tree
and the mapping of files to blocks.
DataNode
• DataNodes are the slave nodes that store the actual data blocks and serve read/write
requests from clients.
Job Tracker
• The role of the JobTracker is to accept MapReduce jobs from clients and process the data
by using the NameNode.
• In response, the NameNode provides metadata to the JobTracker.
Task Tracker
• The TaskTracker runs the map and reduce tasks assigned by the JobTracker and reports
progress back to it.
MapReduce Layer
The MapReduce layer comes into play when the client application submits a MapReduce job to
the JobTracker. In response, the JobTracker sends the request to the appropriate TaskTrackers.
Sometimes a TaskTracker fails or times out. In such a case, that part of the job is rescheduled.
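For a feel of how a client typically interacts with the HDFS layer described above, here is a hedged sketch that shells out to the standard hdfs dfs command line from Python; it assumes a configured Hadoop client and uses placeholder paths and file names.
```python
# Hedged sketch: driving HDFS from Python via the standard "hdfs dfs" CLI.
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' command and raise if it fails."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/demo")                 # NameNode records the new directory's metadata
hdfs("-put", "local_report.csv", "/user/demo/")    # file blocks are stored on DataNodes
hdfs("-ls", "/user/demo")                          # list the directory to confirm the upload
```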
• GAE is a platform-as-a-service product that provides web app developers and enterprises
with access to Google's scalable hosting and tier 1 internet service.
• GAE requires that applications be written in Java or Python, store data in Google
Bigtable and use the Google query language. Noncompliant applications require
modification to use GAE.
• GAE provides more infrastructure than other scalable hosting services, such as Amazon
Elastic Compute Cloud (EC2). GAE also eliminates some system administration and
development tasks to make writing scalable applications easier.
• Google provides GAE free up to a certain amount of use for the following resources:
▪ processor (CPU)
▪ storage
▪ application programming interface (API) calls
▪ concurrent requests
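Below is a minimal sketch of what a GAE standard-environment app might look like, assuming Python with Flask and an accompanying app.yaml that declares the runtime; neither the framework nor the file names are prescribed by the description above.
```python
# main.py -- a minimal sketch of a GAE standard-environment app (assumes Flask
# plus an app.yaml such as "runtime: python39" alongside it).
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # GAE routes web requests to this handler and scales instances automatically.
    return "Hello from Google App Engine!"

if __name__ == "__main__":
    # Local development only; on App Engine the platform serves the app.
    app.run(host="127.0.0.1", port=8080, debug=True)
```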
Advantages of GAE
• Ease of setup and use. GAE is fully managed, so users can write code without
considering IT operations and back-end infrastructure. The built-in APIs enable users to
build different types of applications. Access to application logs also facilitates debugging
and monitoring in production.
• Pay-per-use pricing. GAE's billing scheme only charges users daily for the resources
they use. Users can monitor their resource usage and bills on a dashboard.
• Scalability. Google App Engine automatically scales as workloads fluctuate, adding and
removing application instances or application resources as needed.
• Security. GAE supports the ability to specify a range of acceptable Internet Protocol (IP)
addresses. Users can allow list specific networks and services and blocklist specific IP
addresses.
GAE disadvantages
OpenStack is a collection of open source software modules and tools that provides a framework
to create and manage both public cloud and private cloud infrastructure.
Components of OpenStack:
As open source software, OpenStack has a community that collaborates on it, and this community
has defined nine components that form the “core” of OpenStack. The community maintains these
components and they are distributed as part of any OpenStack system.
Nova
This is the primary computing engine behind OpenStack. It allows deploying and managing
virtual machines and other instances to handle computing tasks.
Swift
The storage system for objects and files is referred to as Swift. In traditional storage systems,
files are referred to by a location on the disk drive, whereas in OpenStack Swift files are referred to
by a unique identifier, and Swift decides where to store the files.
Scaling is therefore made easier, because developers don't have to worry about the
capacity of a single system behind the software. It also lets the system decide on the best way
to back up data in case of network or hardware problems.
Cinder
This is the block storage component, analogous to a traditional computer's access to specific
disk locations. It enables the cloud system to access data with higher speed in situations where
that is an important requirement.
Neutron
Neutron is the networking component of OpenStack. It makes all the components communicate
with each other smoothly, quickly and efficiently.
Horizon
This is the OpenStack dashboard. It’s the graphical interface to OpenStack and the first
component that users starting with OpenStack will see.
There is an OpenStack API that allows developers to access all the components individually, but
the dashboard is the management platform that gives system administrators a view of what is
going on in the cloud.
Keystone
This is the component that provides identity services for OpenStack. Basically, this is a
centralized list of all the users and their permissions for the services they use in the OpenStack
cloud.
Glance: It is the component that provides image services, i.e., virtual copies of hard disks.
Glance allows these images to be used as templates when deploying new virtual machine
instances.
Heat: Heat is the orchestration component of OpenStack. It allows developers to store the
requirements of a cloud application in a file that defines what resources are necessary for that
application. In this way, it improves the management of the infrastructure needed for a cloud
service to run.
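To show how these components fit together from a developer's point of view, here is an illustrative sketch using the openstacksdk client; the cloud name, image, flavor, and network are placeholders for values that would come from a clouds.yaml file or a real deployment.
```python
# Illustrative sketch of launching a Nova instance via openstacksdk.
import openstack

conn = openstack.connect(cloud="my-openstack")   # Keystone handles authentication

image = conn.image.find_image("ubuntu-22.04")      # image served by Glance
flavor = conn.compute.find_flavor("m1.small")      # a Nova flavor
network = conn.network.find_network("private")     # network managed by Neutron

server = conn.compute.create_server(
    name="demo-instance",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)   # block until Nova reports ACTIVE
print(server.status)
```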
Cloud computing offers computing resources such as servers, databases, storage, networking,
runtime environments, virtualization, and software to its customers on demand over the
internet. Customers consume these cloud services with a pay-as-you-go pricing model.
Now the term federation is associated with the cloud. Federation means associating small
divisions into a single group for performing a common task. A federated cloud is formed by
connecting the cloud environments of several cloud providers using a common standard. This
federation in the cloud helps providers easily scale up resources to match business needs.
Cloud federation properties can be classified into two categories i.e. functional cloud federation
properties and usage cloud federation properties.
1. Authentication: Cloud federation involves several foreign resources that participate
in the federation. To consume these foreign resources, the customer must be provided
with access credentials relevant to the target foreign resource. However, the respective foreign
resource must also have authentication information for the customer.
2. Integrity: Integrity in the federated cloud means that the providers participating in the
federation offer and demand consistent resources. If the federated cloud environment fails to
provide the resources, its purpose becomes questionable.
To maintain the consistency of the environment, management by the providers is needed: they can
designate a federation administrative board, or the providers can automate the process so that
administrative action is triggered when any irregularity is detected.
3. Monitoring: A federated cloud can be monitored in two ways: global monitoring and Monitoring
as a Service (MaaS). Global monitoring aids in maintaining the federated cloud. MaaS provides
information that helps in tracking contracted services to the customer.
4. Object: Marketed objects in cloud computing are the infrastructure, software, and platforms
that are offered to the customer as a service. These objects have to pass through the federation
when consumed in the federated cloud.
5. Contracts: In cloud computing, the agreement between provider and consumer, i.e. the service
level agreement (SLA), has both technical and administrative commitments between
provider and consumer. In addition to the SLA, a federated cloud has a federation level agreement
that encloses commitments to the functional and usage properties.
6. Provisioning: Allocating services and resources offered by the cloud provider to the customer
through the federation. It can be done manually or automatically. In the automatic way, the best
provider is chosen to allocate the resources and services to the customer. In the manual way, an
entity in the federation selects the provider to allocate the resources and services.
MapReduce is a data processing tool used to process data in parallel in a
distributed form. It was introduced in 2004, on the basis of the paper titled "MapReduce:
Simplified Data Processing on Large Clusters," published by Google.
MapReduce is a paradigm with two phases: the mapper phase and the reducer
phase. In the mapper, the input is given in the form of key-value pairs. The output of the
mapper is fed to the reducer as input. The reducer runs only after the mapper is over. The
reducer also takes input in key-value format, and the output of the reducer is the final output.
Features of MapReduce:
1. Scalability
Apache Hadoop is a highly scalable framework. This is because of its ability to store and
distribute huge data sets across plenty of servers. These servers are inexpensive and can operate
in parallel. We can easily scale the storage and computation power by adding servers to the
cluster.
2. Flexibility
3. Security and Authentication
The MapReduce programming model uses the HBase and HDFS security platforms, which allow
access only to authenticated users to operate on the data. Thus, it protects against unauthorized
access to system data and enhances system security.
4. Cost-effective solution
Hadoop’s scalable architecture with the MapReduce programming framework allows the storage
and processing of large data sets in a very affordable manner.
5. Fast
Hadoop uses a distributed storage method called as a Hadoop Distributed File System that
basically implements a mapping system for locating data in a cluster.
6. Simple Programming Model
Amongst the various features of Hadoop MapReduce, one of the most important is that
it is based on a simple programming model. Basically, this allows programmers to develop
MapReduce programs that can handle tasks easily and efficiently.
7. Parallel Programming
One of the major aspects of the working of MapReduce programming is its parallel processing. It
divides the tasks in a manner that allows their execution in parallel. The parallel processing
allows multiple processors to execute these divided tasks. So the entire program is run in less
time.
8. Availability
Whenever the data is sent to an individual node, the same set of data is forwarded to some other
nodes in a cluster. So, if any particular node suffers from a failure, then there are always other
copies present on other nodes that can still be accessed whenever needed. This assures high
availability of data.
• GAE is a platform-as-a-service product that provides web app developers and enterprises
with access to Google's scalable hosting and tier 1 internet service.
• GAE requires that applications be written in Java or Python, store data in Google
Bigtable and use the Google query language. Noncompliant applications require
modification to use GAE.
• GAE provides more infrastructure than other scalable hosting services, such as Amazon
Elastic Compute Cloud (EC2). GAE also eliminates some system administration and
development tasks to make writing scalable applications easier.
• Google provides GAE free up to a certain amount of use for the following resources:
▪ processor (CPU)
▪ storage
▪ application programming interface (API) calls
▪ concurrent requests
• API selection. GAE has several built-in APIs, including the following five:
• Managed infrastructure. Google manages the back-end infrastructure for users. This
approach makes GAE a serverless platform and simplifies API management.
• Several programming languages. GAE supports a number of languages, including GO,
PHP, Java, Python, NodeJS, .NET and Ruby. It also supports custom runtimes.
• Support for legacy runtimes. GAE supports legacy runtimes, which are versions of
programming languages no longer maintained. Examples include Python 2.7, Java 8 and
Go 1.11.
• Application diagnostics. GAE lets users record data and run diagnostics on applications
to gauge performance.
• Security features. GAE enables users to define access policies with the GAE firewall
and managed Secure Sockets Layer/Transport Layer Security certificates for free.
• Traffic splitting. GAE lets users route requests to different application versions.
• Versioning. Applications in Google App Engine function as a set of microservices that
refer back to the main source code. Every time code is deployed to a service with the
corresponding GAE configuration files, a version of that service is created.
The Hadoop Distributed File System (HDFS) is a distributed file system for Hadoop. It uses
a master/slave architecture. This architecture consists of a single NameNode that performs the
role of master and multiple DataNodes that perform the role of slaves.
Both NameNode and DataNode are capable enough to run on commodity machines. The Java
language is used to develop HDFS. So any machine that supports Java language can easily run
the NameNode and DataNode software.
Modules of Hadoop
1. HDFS: Hadoop Distributed File System. Google published its GFS paper, and on the basis
of that, HDFS was developed. It states that files will be broken into blocks and stored
in nodes over the distributed architecture.
2. Yarn: Yet Another Resource Negotiator is used for job scheduling and managing the cluster.
3. Map Reduce: This is a framework which helps Java programs to do the parallel
computation on data using key-value pairs. The Map task takes input data and converts it
into a data set which can be computed in key-value pairs. The output of the Map task is
consumed by the Reduce task, and then the output of the reducer gives the desired result.
4. Hadoop Common: These Java libraries are used to start Hadoop and are used by other
Hadoop modules.
Federation in the cloud is the ability to connect two or more cloud computing environments of
distinct cloud service providers. Federation can be classified into four types.
• Permissive federation
Permissive federation allows the interconnection of the cloud environments of two service
providers without verifying the identity of the peer cloud using DNS lookups. This raises the
chances of domain spoofing.
• Verified Federation
Verified federation allows interconnection of the cloud environments of two service
providers only after the peer cloud is identified using the information obtained from
DNS. Though the identity verification prevents spoofing, the connection is still not
encrypted and there are chances of a DNS attack.
• Encrypted Federation
Encrypted federation allows interconnection of the cloud environments of two service
providers only if the peer cloud supports Transport Layer Security (TLS). The peer cloud
interested in the federation must provide a digital certificate, but since such certificates may
be self-signed, encrypted federation still results in weak identity verification.
• Trusted Federation
Trusted federation allows two clouds from different providers to connect only under the
provision that the peer cloud supports TLS and, along with that, provides a digital certificate
issued by a certification authority (CA) that is trusted by the authenticating cloud.
19. Explain the advantages and disadvantages of OpenStack, and its components.
Apart from the various projects which constitute the OpenStack platform, there are nine major
services, namely Nova, Neutron, Swift, Cinder, Keystone, Horizon, Glance, Ceilometer, and Heat.
Advantages of using OpenStack
The future of the cloud is federated, and when you look at the broad categories of apps
moving to the cloud, the truth of this statement begins to become clear. Gaming, social
media, Web, eCommerce, publishing, CRM – these applications demand truly global
coverage, so that the user experience is always on, local and instant, with ultra-low latency.
That’s what the cloud has always promised to be.
The problem is that end users can’t get that from a single provider, no matter how large.
The federated cloud model is a force for real democratization in the cloud market. It’s how
businesses will be able to use local cloud providers to connect with customers, partners and
employees anywhere in the world. It’s how end users will finally get to realize the promise of the
cloud. And, it’s how data center operators and other service providers will finally be able to
compete with, and beat, today’s so-called global cloud providers.