Unit III Cloud Computing CSE 8th Semester
Resource Overview
What is a resource?
In the context of Google Cloud, a resource can refer to the service-level resources that
are used to process your workloads (VMs, DBs, and so on) as well as to the account-
level resources that sit above the services, such as projects, folders, and the
organization.
Both Cloud IAM and Organization policies are inherited through the hierarchy, and the
effective policy at each node of the hierarchy is the result of policies directly applied at
the node and policies inherited from its ancestors.
The following diagram shows an example resource hierarchy illustrating the core
account-level resources involved in administering your Google Cloud account.
Domain
• Your company Domain is the primary identity of your organization and establishes your company's
identity with Google services, including Google Cloud.
• At the domain level, you define which users should be associated with your organization when using
Google Cloud.
• Domain is also where you can universally administer policy for your users and devices (for example,
enable 2-factor authentication, reset passwords for any users in your organization).
• The G Suite or Cloud Identity account is associated with exactly one Organization.
• You manage the domain-level functionality using the Google Admin Console (admin.google.com).
For more information on the hierarchy of resources, see the Resource Manager
documentation.
Organization
• An Organization is the root node of the Google Cloud hierarchy of resources.
• All Google Cloud resources that belong to an Organization are grouped under the Organization node,
allowing you to define settings, permissions, and policies for all projects, folders, resources, and Cloud
Billing accounts it parents.
• An Organization is associated with exactly one Domain (established with either a G Suite or Cloud
Identity account), and is created automatically when you set up your domain in Google Cloud.
• Using an Organization, you can centrally manage your Google Cloud resources and your users' access to
those resources. This includes:
• Reactive management: an Organization resource provides a safety net to regain access to lost resources
(for example, if one of your team members loses their access or leaves the company).
• The various roles and resources that are related to Google Cloud (including the organization, projects,
folders, resources, and Cloud Billing accounts) are managed within the Google Cloud Console.
Folders
• Folders are a grouping mechanism and can contain projects, other folders, or a combination of both.
• Folders and projects are all mapped under the Organization node.
• Folders can be used to group resources that share common Cloud IAM policies.
• While a folder can contain multiple folders or resources, a given folder or resource can have exactly one
parent.
For more details about using folders, see Creating and Managing Folders.
Projects
• Projects are required to use service-level resources (such as Compute Engine virtual machines (VMs),
Pub/Sub topics, Cloud Storage buckets, and so on).
• All service-level resources are parented by projects, the base-level organizing entity in Google Cloud.
• You can use projects to represent logical projects, teams, environments, or other collections that map to
a business function or structure.
• Projects form the basis for enabling services, APIs, and Cloud IAM permissions.
For more details about projects, see Creating and Managing Projects.
Resources
• Google Cloud service-level resources are the fundamental components that make up all Google Cloud
services, such as Compute Engine virtual machines (VMs), Pub/Sub topics, Cloud Storage buckets, and so
on.
• For billing and access control purposes, resources exist at the lowest level of a hierarchy that also
includes projects and an organization.
Labels
• Labels help you categorize your Google Cloud resources (such as Compute Engine instances).
• You can attach labels to each resource, then filter the resources based on their labels.
• Labels are great for cost tracking at a granular level. Information about labels is forwarded to the billing
system, so you can analyze your charges by label (see the sketch below).
For more details about using labels, see Creating and Managing Labels.
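As an illustration, the sketch below (assuming the google-cloud-storage Python client and a hypothetical bucket name) attaches labels to a Cloud Storage bucket and then filters buckets by label; it is only one concrete example of managing labels.

# A minimal sketch of attaching labels to a Cloud Storage bucket with the
# google-cloud-storage Python client; the bucket name and label values are
# placeholders used for illustration only.
from google.cloud import storage

client = storage.Client()                       # uses your default credentials and project
bucket = client.get_bucket("example-team-bucket")

# Merge in labels used for cost tracking and filtering.
labels = bucket.labels
labels.update({"env": "dev", "team": "analytics", "cost-center": "cc-123"})
bucket.labels = labels
bucket.patch()                                  # persists the label changes

# Labels can then be used to filter resources, for example when listing buckets.
for b in client.list_buckets():
    if b.labels.get("team") == "analytics":
        print(b.name)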
Overview
A Cloud Billing account is set up in Google Cloud and is used to define who pays for a
given set of Google Cloud resources and Google Maps Platform APIs. Access control to
a Cloud Billing account is established by Cloud Identity and Access Management
(Cloud IAM) roles. A Cloud Billing account is connected to a Google payments profile.
Your Google payments profile includes a payment instrument to which costs are
charged.
A Cloud Billing account:
• Defines who pays for a given set of resources; project usage is charged to the linked Cloud Billing account.
• Results in a single invoice per Cloud Billing account.
• Operates in a single currency.
• Is connected to a Google payments profile, which includes a payment instrument defining how you pay for your charges.
• Has billing-specific roles and permissions to control access to and modification of billing-related functions (established by Cloud Identity and Access Management roles).
A Google payments profile:
• Processes payments for ALL Google services (not just Google Cloud).
• Stores information such as the name, address, and tax ID (when legally required) of whoever is responsible for the profile.
• Stores your various payment instruments (credit cards, debit cards, bank accounts, and other payment methods you have used to buy through Google in the past).
• Functions as a document center, where you can view invoices, payment history, and so on.
• Costs are charged automatically to the payment instrument connected to the Cloud Billing account.
• The documents generated for self-serve accounts include statements, payment receipts, and tax
invoices, and are accessible in the Cloud Console.
• Invoices are also accessible in the Cloud Console, as are payment receipts.
• You must be eligible for invoiced billing. Learn more about invoiced billing eligibility.
Payments profile types
When you create your payments profile, you'll be asked to specify the profile type. This
information must be accurate for tax and identity verification. This setting can't be
changed. When you are setting up your payments profile, make sure to choose the
type that best fits how you plan to use your profile.
• Individual
• You're using your account for your own personal payments.
• If you register your payments profile as an individual, then only you can manage the profile. You won't
be able to add or remove users, or change permissions on the profile.
• Business
• You're paying on behalf of a business, organization, partnership, or educational institution.
• You use Google payments center to pay for Play apps and games, and Google services like Google Ads,
Google Cloud, and Fi phone service.
• A business profile allows you to add other users to the Google payments profile you manage, so that
more than one person can access or manage a payments profile.
• All users added to a business profile can see the payment information on that profile.
Charging cycle
The charging cycle on your Cloud Billing account determines how and when you pay for
your Google Cloud services and your use of Google Maps Platform APIs.
For self-serve Cloud Billing accounts, your Google Cloud costs are charged
automatically in one of two ways:
• Monthly billing: Costs are charged on a regular monthly cycle.
• Threshold billing: Costs are charged when your account has accrued a specific amount.
For self-serve Cloud Billing accounts, your charging cycle is automatically assigned
when you create the account. You do not get to choose your charging cycle and you
cannot change the charging cycle.
For invoiced Cloud Billing accounts, you typically receive one invoice per month and the
amount of time you have to pay your invoice (your payment terms) is determined by the
agreement you made with Google.
Billing contacts
A Cloud Billing account includes one or more contacts that are defined on the Google
Payments profile that is connected to the Cloud Billing account. These contacts are
people who are designated to receive billing information specific to the payment
instrument on file (for example, when a credit card needs to be updated). To access and
manage this list of contacts, you can use the Payments console or you can use
the Cloud Console.
Subaccounts
Subaccounts are intended for resellers. If you are a reseller, you can use
subaccounts to represent your customers' charges for the purpose of chargebacks.
Cloud Billing subaccounts allow you to group charges from projects together on a
separate section of your invoice. A billing subaccount is a Cloud Billing account with a
billing linkage to a reseller's master Cloud Billing account on which the charges appear.
The master Cloud Billing account must be on invoiced billing.
A subaccount behaves like a Cloud Billing account in most ways: it can have projects
linked to it, Cloud Billing data exports can be configured on it, and it can have Cloud
IAM roles defined on it. Any charges made to projects linked to the subaccount are
grouped and subtotalled on the invoice, and the effect on resource management is that
access control policy can be entirely segregated on the subaccount to allow for
customer separation and management.
The Cloud Billing Account API provides the ability to create and manage subaccounts.
Use the API to connect to your existing systems and provision new customers or
chargeback groups programmatically.
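As a rough sketch only, the following shows how a reseller might call the Cloud Billing API (billingAccounts.create) over REST to provision a subaccount. The master billing account ID and display name are placeholders, and the exact request fields should be verified against the current API reference.

# A hedged sketch of provisioning a reseller subaccount through the Cloud
# Billing REST API (billingAccounts.create). The account ID and display name
# below are placeholders, not real values.
import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())

payload = {
    "displayName": "Customer A chargeback group",
    # The new account is created as a subaccount of the reseller's (master)
    # billing account, so its charges are grouped on that invoice.
    "masterBillingAccount": "billingAccounts/012345-567890-ABCDEF",
}

resp = requests.post(
    "https://cloudbilling.googleapis.com/v1/billingAccounts",
    headers={"Authorization": f"Bearer {credentials.token}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())  # the newly created subaccount resource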
• Payment linkages define which Cloud Billing account pays for a given project.
The following diagram shows the relationship of ownership and payment linkages for a
sample organization.
In the diagram, the organization has ownership over Projects 1, 2, and 3, meaning that
it is the Cloud IAM permissions parent of the three projects.
The Cloud Billing account is linked to Projects 1, 2, and 3, meaning that it pays for costs
incurred by the three projects.
The Cloud Billing account is also linked to a Google payments profile, which stores
information like name, address, and payment methods.
Note: Although you link Cloud Billing accounts to projects, Cloud Billing accounts are not parents of
projects in a Cloud IAM sense, and therefore projects don't inherit permissions from the Cloud Billing
account they are linked to.
In this example, any users who are granted Cloud IAM billing roles on the organization
also have those roles on the Cloud Billing account and the projects.
For more information on granting Cloud IAM billing roles, see Overview of Cloud Billing
access control.
Roles Overview
What are roles?
Roles grant one or more privileges to a user that allow them to perform a common business
function.
Google Cloud offers Cloud Identity and Access Management (Cloud IAM) to manage
access control to your Google Cloud resources. Cloud IAM lets you control who
(users) has what access (roles) to which resources by setting Cloud IAM policies. To
assign permissions to a user, you use Cloud IAM policies to grant specific role(s) to a
user. Roles have one or more permissions bundled within them, controlling user access
to resources.
You can set a Cloud IAM policy (roles) at the organization level, the folder level,
the project level, or (in some cases) on the service-level resource.
Policies are inherited through the hierarchy. The effective policy at each node of the
hierarchy is the result of policies directly applied at the node and policies inherited from
its ancestors. If you set a policy at the Organization level, it is inherited by all its child
folders and projects. If you set a policy at the project level, it is inherited by all its child
resources. You can enforce granular permissions at different levels in the resource
hierarchy to ensure that the right individuals have the ability to spend within Google
Cloud.
• Document who your admins are and communicate those names to people in your organization
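The toy model below (plain Python, not the real Cloud IAM API) makes the inheritance rule concrete: the effective policy at a node is the merge of the bindings applied directly to that node and the bindings of every ancestor. The hierarchy and bindings shown are hypothetical.

# A toy illustration of Cloud IAM policy inheritance, not the real API.
from collections import defaultdict

# Hypothetical hierarchy: organization -> folder -> project.
parents = {"project-a": "folder-eng", "folder-eng": "org-example", "org-example": None}

# Bindings applied directly at each node: role -> set of members.
direct_bindings = {
    "org-example": {"roles/billing.admin": {"user:cfo@example.com"}},
    "folder-eng": {"roles/viewer": {"group:eng@example.com"}},
    "project-a": {"roles/editor": {"user:dev@example.com"}},
}

def effective_policy(node):
    """Merge bindings from the node and from every ancestor up to the organization."""
    merged = defaultdict(set)
    while node is not None:
        for role, members in direct_bindings.get(node, {}).items():
            merged[role] |= members
        node = parents.get(node)
    return dict(merged)

# The project keeps its own editor binding and inherits the folder's viewer
# binding and the organization's billing binding.
print(effective_policy("project-a"))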
Important Roles
The following diagram represents the Google Cloud resource hierarchy in complete
form, and calls out the important high-access roles at each level:
Domain
The G Suite or Cloud Identity super administrators at the domain level are the first users who can access an organization after it is created.
Organization
Payments Profile
Payments profiles are managed outside of your Cloud Organization, in the Google Payments Center, a single location where you can manage the ways you pay for all Google products and services, such as Google Ads, Google Cloud, and Fi phone service. Payments profiles are connected to Cloud Billing accounts.
First of all, you do not need to buy the hardware and maintain it
with your own team. The information in the cloud is stored on
several servers at the same time, which means that even if one or
two servers are damaged, you will not lose your information. This
also helps provide high uptime, up to 99.9%.
Cloud computing is the perfect choice for those who do not need
high performance constantly but use it from time to time. You can
get a subscription and use the resources you paid for. Most
providers even let you pause the subscription when you do not need
it, and at the same time you are able to control everything and get
instant help from the support team.
Automation
Cost
With cloud computing, you do not need to pay for the services you
don’t use: the subscription model means you choose the amount of
space, processing power, and other components that you really
need.
Security
Many people are not sure about the security of cloud services. Why
might they be insecure? Because a company uses a third-party
solution to store its data, it is reasonable to worry that the provider
could access confidential data without permission. However, there
are good solutions for preventing such leaks.
Ease of access
Users can access cloud databases from virtually anywhere, using a vendor’s API or
web interface.
Scalability
Cloud databases can expand their storage capacities at run time to accommodate
changing needs. Organizations only pay for what they use.
Disaster recovery
In the event of a natural disaster, equipment failure or power outage, data is kept secure
through backups on remote servers.
Considerations for cloud databases
• Control options
Users can opt for a virtual machine image managed like a traditional database or a
provider's database as a service (DBaaS); a connection sketch follows this list.
• Database technology
SQL databases are difficult to scale but very common. NoSQL databases scale more
easily but do not work with some applications.
• Security
Most cloud database providers encrypt data and provide other security measures;
organizations should research their options.
• Maintenance
When using a virtual machine image, one should ensure that IT staffers can maintain
the underlying infrastructure.
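For example, under the DBaaS control option mentioned above, an application usually connects to the provider's database endpoint with an ordinary driver. The sketch below assumes a hosted PostgreSQL instance; the hostname, database name, and credentials are placeholders.

# A minimal sketch of the "access from anywhere" pattern: connecting to a
# hosted (DBaaS) PostgreSQL endpoint over the network with a standard driver.
# The endpoint and credentials are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-db.provider.example.com",  # endpoint exposed by the provider
    dbname="appdb",
    user="app_user",
    password="app_password",
    sslmode="require",                       # most providers require TLS
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())

conn.close()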
Today, the need to process large amounts of data has grown in engineering, science, commerce, and economics. The ability to process huge data from multiple sources remains a critical challenge. Many organizations face difficulties when dealing with a large amount of data: they are unable to manage, manipulate, process, share, and retrieve it with traditional software tools, because those tools are costly and time-consuming for data processing. The term large-scale processing is focused on how to handle applications with massive datasets. Such applications devote the largest fraction of execution time to moving data from data storage to the computing node in a computing environment. The main challenges behind such applications are data storage capacity and processor computing power constraints. Developers need hundreds or thousands of processing nodes and a large volume of storage devices to process complex applications with large datasets; such applications process multi-terabyte to petabyte-sized datasets, and traditional data processing methods such as sequential processing and centralized data processing are not effective in solving these kinds of problems.
The question is: how can large amounts of distributed data be processed quickly, with good response times and replication, at minimum cost? One of the best ways to process huge data is to perform parallel and distributed computing in a cloud computing environment. Cloud computing, as a distributed computing paradigm, aims at having large datasets processed on available computer nodes by using a MapReduce framework. MapReduce is a software framework introduced by Google in 2004; it runs on a large cluster of machines and is highly scalable. It is a high-performance processing technique for solving large-scale dataset problems. MapReduce computations process terabytes to petabytes of data on thousands of processors. Google uses MapReduce for indexing web pages. Its main aim is to process, in parallel, large amounts of data stored on a distributed cluster of computers.
This study presents a way to solve large-scale dataset processing problems in parallel and distributed mode, operating on a large cluster of machines, by using the MapReduce framework. It is a basis for taking advantage of the cloud computing paradigm as a new, realistic computation industry standard. The first contribution of this work is to propose a framework for running a MapReduce system in a cloud environment based on the captured requirements and to present its implementation on Amazon Web Services. The second contribution is to present an experiment of running the MapReduce system in a cloud environment to validate the proposed framework and to present the evaluation of the experiment based on criteria such as speed of processing, data-storage usage, response time, and cost efficiency.
The rest of the paper is organized as follows. Section II provides background information and a definition of MapReduce and Hadoop. Section III describes the workflow of MapReduce, gives a general introduction to the Map and Reduce functions, and also describes Hadoop, an implementation of the MapReduce framework. Section IV presents MapReduce in cloud computing. Section V presents related MapReduce systems. Section VI captures a set of requirements to develop the framework. Section VII shows the proposed framework and its implementation on Amazon Web Services for running a MapReduce system in a cloud environment; it also presents an experiment of running a MapReduce system in a cloud environment to validate the proposed framework, along with the resulting outcomes and evaluation criteria.
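To make the Map and Reduce steps described above concrete, here is a toy, single-process word-count sketch in Python. A real framework such as Hadoop distributes the same phases across a cluster and performs the shuffle between them; this is only an illustration of the programming model.

# A toy, single-process illustration of the MapReduce programming model for
# the classic word-count problem.
from collections import defaultdict

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reduce: sum the counts emitted for each word."""
    return (word, sum(counts))

documents = ["the cloud stores data", "the cloud processes data in parallel"]

intermediate = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle(intermediate)
results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results)  # e.g. {'the': 2, 'cloud': 2, 'data': 2, ...}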
EC2 was the idea of engineer Chris Pinkham who conceived it as a way to
scale Amazon's internal infrastructure. Pinkham and engineer Benjamin Black
presented a paper on their ideas to Amazon CEO Jeff Bezos, who liked what
he read and requested details on virtual cloud servers.
EC2 was then developed by a team in Cape Town, South Africa. Pinkham
provided the initial architecture guidance for EC2, gathered a development
team and led the project along with Willem van Biljon.
In 2006, Amazon announced a limited public beta test of EC2, and in 2007
added two new instance types -- Large and Extra-Large. Amazon announced
the addition of static IP addresses, availability zones, and user
selectable kernels in spring 2008, followed by the release of the Elastic Block
Store (EBS) in August.
Amazon EC2 went into full production on October 23, 2008. Amazon also
released a service level agreement (SLA) for EC2 that day, along with
Microsoft Windows and SQL Server in beta form on EC2. Amazon added
the AWS Management Console, load balancing, autoscaling, and cloud
monitoring services in 2009.
As of 2019, EC2 and Amazon Simple Storage Service (S3) are the most
popular of Amazon's AWS products.
Data only remains on an EC2 instance while it is running, but a developer can
use an Amazon Elastic Block Store volume for an extra level of durability and
Amazon S3 for EC2 data backup.
Cost
(Figure: a breakdown of Amazon EC2 instances and their associated prices.)
Benefits
Getting started with EC2 is easy, and because EC2 is controlled by APIs,
developers can commission any number of server instances at the same time
to quickly increase or decrease capacity. EC2 allows for complete control of
instances, which makes operation as simple as if the machine were in-house.
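As an illustration of commissioning capacity through the API, the boto3 sketch below launches a small fleet of instances; the AMI ID, key pair, and security group are placeholders, not recommended values.

# A hedged sketch of commissioning EC2 instances through the API with boto3.
# The AMI ID, key pair, and security group below are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=3,                        # increase capacity by asking for more instances
    KeyName="example-keypair",
    SecurityGroupIds=["sg-0123456789abcdef0"],
)

for instance in instances:
    print(instance.id, instance.state)

# Capacity can be decreased just as quickly by terminating instances:
# ec2.instances.filter(InstanceIds=[i.id for i in instances]).terminate()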
Challenges
Security -- developers must make sure that public-facing instances are
running securely.
Ongoing maintenance -- Amazon EC2 instances are virtual machines that run
in Amazon's cloud. However, they ultimately run on physical hardware which
can fail. AWS alerts developers when an instance must be moved due to
hardware maintenance. This requires ongoing monitoring.
EC2 vs. S3
Both Amazon EC2 and Amazon S3 are important services that allow
developers to maximize use of the AWS cloud. The main difference between
Amazon EC2 and S3 is that EC2 is a computing service that allows
companies to run servers in the cloud, while S3 is an object storage service
used to store and retrieve data from AWS through the Internet. S3 is like a
giant hard drive in the cloud, while EC2 offers CPU and RAM in addition to
storage. Many developers use both services for their cloud computing needs.
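As a small companion sketch for the storage side of this comparison, the boto3 calls below back up a file (for example, one produced on an EC2 instance) to S3 and later restore it; the bucket name, object key, and file paths are placeholders.

# Backing up a local file to S3 and restoring it with boto3.
# The bucket name, key, and paths are placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Upload a local file as an S3 object.
s3.upload_file("/tmp/app-data.db", "example-backup-bucket", "backups/app-data.db")

# Later, restore the object to local disk on the same or another instance.
s3.download_file("example-backup-bucket", "backups/app-data.db", "/tmp/app-data.db")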
Execution Environment
Data Management
Windows Azure, SQL Azure, and the associated services provide opportunities for
storing and managing data in a range of ways. The following data management
services and features are available:
• Azure Storage: This provides four core services for persistent and durable
data storage in the cloud. The services support a REST interface that can be
accessed from within Azure-hosted or on-premises (remote) applications. For
information about the REST API, see “Windows Azure Storage Services
REST API Reference” at http://msdn.microsoft.com/en-
us/library/dd179355.aspx. The four storage services are:
o The Azure Table Service provides a table-structured storage mechanism
based on the familiar rows and columns format, and supports queries for
managing the data. It is primarily aimed at scenarios where large volumes
of data must be stored, while being easy to access and update. For more
detailed information see “Table Service Concepts”
at http://msdn.microsoft.com/en-us/library/dd179463.aspx and “Table
Service API” at http://msdn.microsoft.com/en-us/library/dd179423.aspx.
o The Binary Large Object (BLOB) Service provides a series of
containers aimed at storing text or binary data. It provides both Block
BLOB containers for streaming data, and Page BLOB containers for
random read/write operations. For more detailed information see
“Understanding Block Blobs and Page Blobs”
at http://msdn.microsoft.com/en-us/library/ee691964.aspx and “Blob
Service API” at http://msdn.microsoft.com/en-us/library/dd135733.aspx. (A short Python sketch of uploading and downloading a blob follows this list.)
o The Queue Service provides a mechanism for reliable, persistent
messaging between role instances, such as between a Web role and a
Worker role. For more detailed information see “Queue Service Concepts”
at http://msdn.microsoft.com/en-us/library/dd179353.aspx and “Queue
Service API” at http://msdn.microsoft.com/en-us/library/dd179363.aspx.
o Windows Azure Drives provide a mechanism for applications to mount a
single volume NTFS VHD as a Page BLOB, and upload and download
VHDs via the BLOB. For more detailed information see “Windows Azure
Drive” (PDF) at http://go.microsoft.com/?linkid=9710117.
• SQL Azure Database: This is a highly available and scalable cloud database
service built on SQL Server technologies, and supports the familiar T-SQL
based relational database model. It can be used with applications hosted in
Windows Azure, and with other applications running on-premises or hosted
elsewhere. For more detailed information see “SQL Azure Database”
at http://msdn.microsoft.com/en-us/library/ee336279.aspx.
• Data Synchronization: SQL Azure Data Sync is a cloud-based data
synchronization service built on Microsoft Sync Framework technologies. It
provides bi-directional data synchronization and data management capabilities
allowing data to be easily shared between multiple SQL Azure databases and
between on-premises and SQL Azure databases. For more detailed information
see “Microsoft Sync Framework Developer Center”
at http://msdn.microsoft.com/en-us/sync.
• Caching: This service provides a distributed, in-memory, low latency and
high throughput application cache service that requires no installation or
management, and dynamically increases and decreases the cache size
automatically as required. It can be used to cache application
data, ASP.NET session state information, and for ASP.NET page output
caching. For more detailed information see “Caching Service (Windows Azure
AppFabric)” at http://msdn.microsoft.com/en-us/library/gg278356.aspx.
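As a sketch of working with the Blob service from Python, the example below uses the newer azure-storage-blob package rather than the REST APIs referenced above; the connection string, container name, and blob name are placeholders.

# A hedged sketch of uploading and downloading a block blob with the
# azure-storage-blob package. The connection string and names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("example-container")

# Upload text data as a block blob.
container.upload_blob(name="notes/hello.txt", data=b"hello from azure storage")

# Download it again.
blob = container.download_blob("notes/hello.txt")
print(blob.readall())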
Networking Services
Windows Azure provides several networking services that you can take advantage
of to maximize performance, implement authentication, and improve
manageability of your hosted applications. These services include the following:
• Content Delivery Network (CDN). The CDN allows you to cache publicly
available static data for applications at strategic locations that are closer (in
network delivery terms) to end users. The CDN uses a number of data centers
at many locations around the world, which store the data in BLOB storage that
has anonymous access. These do not need to be locations where the
application is actually running. For more detailed information see “Delivering
High-Bandwidth Content with the Windows Azure CDN”
at http://msdn.microsoft.com/en-us/library/ee795176.aspx.
• Virtual Network Connect. This service allows you to configure roles of an
application running in Windows Azure and computers on your on-premises
network so that they appear to be on the same network. It uses a software agent
running on the on-premises computer to establish an IPsec-protected
connection to the Windows Azure roles in the cloud, and provides the
capability to administer, manage, monitor, and debug the roles directly. For
more detailed information see “Connecting Local Computers to Windows
Azure Roles” at http://msdn.microsoft.com/en-us/library/gg433122.aspx.
• Virtual Network Traffic Manager. This is a service that allows you to set up
request redirection and load balancing based on three different methods.
Typically you will use Traffic Manager to maximize performance by
redirecting requests from users to the instance in the closest data center using
the Performance method. Alternative load balancing methods available are
Failover and Round Robin. For more detailed information see “Windows
Azure Traffic Manager” at http://msdn.microsoft.com/en-
us/WAZPlatformTrainingCourse_WindowsAzureTrafficManager.
• Access Control. This is a standards-based service for identity and access
control that makes use of a range of identity providers (IdPs) that can
authenticate users. ACS acts as a Security Token Service (STS), or token
issuer, and makes it easier to take advantage of federation authentication
techniques where user identity is validated in a realm or domain other than that
in which the application resides. An example is controlling user access based
on an identity verified by an identity provider such as Windows Live ID or
Google. For more detailed information see “Access Control Service 2.0”
at http://msdn.microsoft.com/en-us/library/gg429786.aspx and “Claims Based
Identity & Access Control Guide” at http://claimsid.codeplex.com/.
• Service Bus. This provides a secure messaging and data flow capability for
distributed and hybrid applications, such as communication between Windows
Azure hosted applications and on-premises applications and services, without
requiring complex firewall and security infrastructures. It can use a range of
communication and messaging protocols and patterns to provide delivery
assurance, reliable messaging; can scale to accommodate varying loads; and
can be integrated with on-premises BizTalk Server artifacts. For more detailed
information see “AppFabric Service Bus” at http://msdn.microsoft.com/en-
us/library/ee732537.aspx
EUCALYPTUS—OPEN SOURCE SOFTWARE SUPPORTING CLOUD COMPUTING
Many other cloud vendors support Eucalyptus, so today, it is the most portable
option available. Eucalyptus also works with most of the currently available Linux
distributions, including Ubuntu, Red Hat Enterprise Linux (RHEL), CentOS,
SUSE Linux Enterprise Server (SLES), openSUSE, Debian, and Fedora.
Importantly, Eucalyptus can use a variety of virtualization technologies, including
VMware, Xen, and KVM, to implement the cloud abstractions it supports.
Eucalyptus’s Walrus is an S3-compatible implementation of cloud storage. It is
well-described in The Eucalyptus Open-source Cloud-computing System.
The Ubuntu Enterprise Cloud (UEC) is powered by Eucalyptus and brings an
Amazon EC2-like infrastructure inside the firewall. It appears that the recently
announced Nimbula, which supports private versions of EC2, is similar.
Ubuntu UEC is open source, with commercial support available from Canonical
Ltd., a company founded (and funded) by South African entrepreneur Mark
Shuttleworth (formerly the official maintainer of Debian, a version of Linux, and
founder of Thawte Consulting) for the promotion of free software projects.
Canonical is registered in the Isle of Man (a British Crown dependency), a favorable
tax jurisdiction, and employs staff around the world, as well as in its main offices
in London. Ubuntu JeOS is an efficient variant of Ubuntu configured specifically
for virtual appliances.
Eucalyptus Enterprise Edition 2.0 was built on the core Eucalyptus open source platform, with
additional functionality designed to optimize the building and deploying of massively scalable, high
performance private clouds in the enterprise. The latest release adds support for Windows Server
2003 and 2008 and Windows 7 virtual machines. (Previously, only Linux images were supported).
Other changes include new accounting and user group management capabilities, allowing
administrators to easily define groups of users and allocate different levels of access based on a
group’s needs.