Cloud Computing Notes
Computing is the process of using computer technology to complete a given goal-oriented task.
Cloud computing is the delivery of computing resources (like storage, processing power, and software) over
the internet. Major providers like Google Cloud Platform (GCP), Amazon Web Services (AWS), and
Microsoft Azure manage these resources for users. This allows users to access services like data storage and
web-based software without owning physical hardware. Key benefits include:
• Work from Anywhere: Access data and services from any device.
• Cost Efficiency: Lower setup and maintenance costs.
• Backup and Recovery: Simplified data backup and recovery.
• Database Security: Enhanced protection for stored data.
Evolution of computing models:
• Client-Server Model: Central server holds data; users connect to access it.
• Distributed Computing Model: Multiple computers share resources via networking.
• Cloud Computing: Emerged in the early 2000s, with AWS launching web services in 2002, Google App Engine arriving in 2008, and Microsoft Azure becoming generally available in 2010. Cloud computing improved efficiency and flexibility.
Cloud service models:
• Infrastructure as a Service (IaaS): Provides basic computing resources like virtual machines and
storage.
• Platform as a Service (PaaS): Offers hardware and software tools over the internet for application
development.
• Software as a Service (SaaS): Delivers software applications over the internet without needing
installation.
• Anything as a Service (XaaS): Encompasses a variety of services delivered online, paid for based
on usage.
Conclusion
Cloud computing offers flexible, scalable, and cost-effective computing resources. It includes various types
like public, private, and hybrid clouds, and services such as IaaS, PaaS, SaaS, and XaaS. While it provides
significant benefits like accessibility and cost savings, it also presents challenges like security risks and cost
management.
Traditional computing has been integral to business and daily life, but it has several limitations. Initially,
businesses relied on manual processes like pen, paper, and fax machines. Over time, computers replaced
these methods, making business operations more efficient.
Limitations of traditional computing:
1. High Cost and Complexity: Setting up and maintaining in-house IT infrastructure is expensive and complex. It involves significant investment in hardware, software, and skilled personnel.
2. Time-Consuming: The process of setting up infrastructure can take weeks or months.
3. Maintenance Burden: Regular maintenance, updates, and security measures are continuous
burdens.
4. Scalability Issues: Scaling up or down based on demand is difficult and often requires additional
investment.
5. Limited Access and Flexibility: Traditional systems are not easily accessible remotely, limiting
flexibility and mobility.
6. Resource Management: Organizations often face challenges in managing hardware and software
resources efficiently.
When needing IT infrastructure, organizations can either manage it themselves or outsource to third parties.
However, both options come with their own set of issues, such as managing hardware and software, which
may not be the organization's core competency.
Traditional outsourcing doesn't solve all problems because different users have different requirements. For
instance, application developers need a different setup than end-users. Therefore, traditional outsourcing
can't fully address the complexities and concerns of all users.
Layers of a computing system:
1. Infrastructure: The hardware components like processors, memory, and storage devices. It requires basic amenities like power and cooling.
2. Platform: The combination of hardware and software, including operating systems and runtime
libraries, where applications run. It is also where software development occurs.
3. Application: The software applications accessed by end-users for various tasks like document
editing, gaming, or business operations.
Technological advancements have led to the development of cloud computing, which offers computing
resources as services, much like utilities. This model removes the burden of managing infrastructure,
platforms, and applications, allowing users to focus on their core tasks.
This model offers flexibility, cost-efficiency, and better service quality by leveraging the economies of scale
of large cloud vendors. Users only need a basic device and internet connection to access these services,
reducing the demands on local hardware.
Concerns:
While cloud computing raises concerns about reliability and data security, it generally offers better or
comparable safeguards than traditional computing. The key advantage is the significant reduction in cost and
complexity, along with improved service flexibility and quality.
Benefits of Cloud Computing:
Cloud computing has transformed the scope of computing, offering it as an on-demand utility service. Its benefits, outlined earlier (anywhere access, cost efficiency, simplified backup and recovery, and enhanced data security), make cloud computing a cost-effective, flexible, and reliable alternative to traditional computing methods.
Challenges of Cloud Computing:
While cloud computing offers many benefits, it also presents challenges such as security and privacy risks, cost management, and vendor lock-in. These highlight areas where cloud computing needs to improve, and ongoing efforts by cloud vendors aim to mitigate these issues.
Types of Cloud Computing:
Cloud computing provides computing resources as a service, with various deployment models like public,
private, hybrid, and community clouds. These models differ based on implementation, hosting, and
accessibility. Here are the main types:
1. Public Cloud:
o Operated by third-party providers offering services over the internet with pay-as-you-go
billing.
o Examples: Microsoft Azure, Google App Engine, Amazon EC2.
o Benefits: Cost-effective, scalable, ideal for small businesses.
o Drawback: Shared resources may lead to security concerns.
2. Private Cloud:
o Runs on private infrastructure, offering dedicated resources to a single organization.
o Examples: HP Data Centers, Elastic-Private Cloud.
o Benefits: High security and control, customizable.
o Drawback: High cost, requires skilled management.
3. Hybrid Cloud:
o Combines public and private clouds, allowing data and applications to be shared between
them.
o Benefits: Scalable, cost-effective, secure, and can handle peak loads.
o Drawback: Complex to manage.
4. Community Cloud:
o Shared by several organizations with similar needs, managed internally or by a third party.
o Benefits: Cost-sharing, higher security than public clouds.
o Drawback: Limited storage and bandwidth, not suitable for all businesses.
5. Multi-cloud:
o Uses multiple cloud services from different providers.
o Benefits: Prevents vendor lock-in, enhances reliability, meets specific business and
application needs.
o Drawback: Can be complex to manage and integrate.
1. Infrastructure-as-a-Service (IaaS):
o Provides virtualized hardware resources such as virtual processors, memory, storage, and
networks.
o Consumers can build virtual machines and other infrastructure components.
o Examples: Amazon EC2, Google Compute Engine.
o Benefits: No need to manage physical hardware, scalable, cost-effective.
2. Platform-as-a-Service (PaaS):
o Offers a platform for developing, testing, and deploying applications.
o Includes infrastructure plus middleware, development tools, and runtime environments.
o Examples: Google App Engine, Microsoft Azure.
o Benefits: Simplifies application development, reduces the cost of ownership, supports
collaborative work.
3. Software-as-a-Service (SaaS):
o Delivers software applications over the internet, accessible via web browsers.
o Hosted and maintained by the service provider, including updates and data management.
o Examples: Salesforce CRM, Google Apps, Microsoft Office 365.
o Benefits: No need for software installation or maintenance, accessible from anywhere, cost-
effective subscription model.
These models collectively enable users to focus on their core tasks while leveraging scalable and efficient
cloud resources.
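As a hedged illustration of the IaaS model, the sketch below launches a virtual server with the AWS SDK for Python (boto3). The AMI ID, instance type, and region are placeholders, and valid AWS credentials are assumed; any IaaS provider's SDK would follow a similar pattern.

```python
# Minimal IaaS provisioning sketch using boto3. The image ID, instance type,
# and region are placeholders; real values depend on your account and region.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical machine image ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", instances[0].id)
```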
Comparison of the traditional system model and the cloud system model:
• Ownership: Traditional — the organization owns and maintains its own hardware; Cloud — resources are owned and managed by the provider.
• Deployment: Traditional — on-premises installation; Cloud — services run in provider data centers and are accessed over the internet.
• Expenditure: Traditional — large upfront capital expenditure; Cloud — pay-as-you-go operational expenditure.
• Scalability: Traditional — limited and manual; Cloud — elastic and largely automatic.
• Resource Management: Traditional — managed in-house; Cloud — managed by the provider.
• Work Environment: Traditional — access is largely tied to the local infrastructure; Cloud — accessible from anywhere with an internet connection.
XaaS categories such as Security-, Identity-, Storage-, Database-, Backup-, Compliance-, Desktop-, and Monitoring-as-a-Service extend the range of cloud services beyond infrastructure, platform, and software, providing specialized solutions for various business needs.
Open-source cloud platforms:
• Eucalyptus: A cloud computing platform that enables users to build private and hybrid clouds compatible
with Amazon Web Services (AWS) APIs.
• OpenNebula: An open-source cloud computing toolkit for managing virtualized data centers and private
cloud infrastructure.
• Nebula: Developed by NASA, Nebula is an open-source cloud computing platform designed for scientific
research and high-performance computing.
• Nimbus: An open-source toolkit for building Infrastructure-as-a-Service (IaaS) clouds, primarily focused
on providing cloud computing capabilities for scientific and academic research.
• OpenStack: A widely-used open-source cloud computing platform for building public and private clouds,
offering infrastructure and platform services.
• Apache VCL (Virtual Computing Lab): An open-source cloud computing platform designed for
managing and provisioning virtual machines in educational environments.
• Apache CloudStack: An open-source cloud computing platform for building and managing public,
private, and hybrid clouds.
• Enomaly ECP (Elastic Computing Platform): An open-source cloud computing platform for building
private and public clouds, offering infrastructure services with features like auto-scaling and resource
scheduling.
Resource Virtualization:
Overview: Virtualization in cloud computing creates virtual versions of computing resources like servers,
storage, and networks. It allows multiple virtual instances to run on a single physical infrastructure,
enhancing resource utilization, scalability, and flexibility. Virtualization optimizes hardware usage and
reduces costs, making it a cornerstone of cloud computing.
Work of Virtualization in Cloud Computing: Virtualization enables users to share infrastructure and
reduce costs in cloud computing. It allows outsourcing IT maintenance, streamlining operations, and
optimizing expenses, thus becoming a cost-saving advantage for cloud users.
Benefits of Virtualization:
1. Better Performance: Utilizes CPU resources more efficiently, leading to improved performance.
2. Enhanced Security: Virtual machines are logically separated, enhancing security and availability.
3. Cost Reduction: Enables running multiple virtual machines on the same hardware, reducing
operational costs.
4. Reliability: Offers better reliability and disaster recovery compared to traditional systems.
5. Environmentally Friendly: Reduces physical resource consumption, making it more
environmentally friendly.
Drawbacks of Virtualization: High initial cost of setting up the virtualization environment, the need for skilled staff to manage it, some performance overhead, and the risk that a failure of a physical host affects all virtual machines running on it.
Characteristics of Virtualization: Partitioning (many virtual machines on one physical machine), isolation (each virtual machine runs independently and securely), encapsulation (a virtual machine's state can be captured in files), and hardware independence (virtual machines can be moved across physical hosts).
Types of Virtualization: Server (hardware) virtualization, operating system virtualization, storage virtualization, network virtualization, desktop virtualization, and application virtualization.
Uses of Virtualization:
1. Data Integration
2. Business Integration
3. Service-Oriented Architecture (SOA) Data Services
4. Searching Organizational Data
Conclusion: Virtualization in cloud computing enhances resource pooling, cost savings, and environmental
friendliness. It offers various benefits like better performance, enhanced security, and reliability. Different
types of virtualization cater to diverse needs, ensuring scalability, reliability, and accessibility of data.
Resource Pooling:
• Design: A resource pooling architecture combines identical computing resources into pools and ensures synchronized allocation.
• Categories of Resources: Computing resources are categorized into computer/server, network, and storage pools, with a focus on processors, memory, network devices, and storage.
1. Server Pool:
• Setup: Physical servers are grouped into pools and installed with operating systems and system
software.
• Virtualization: Virtual machines are created on these servers, and physical processor and memory
components are linked to increase server capacity.
2. Storage Pool:
• Setup: Storage disks are configured with proper partitioning and formatting and provided to
consumers in a virtualized mode.
3. Network Pool:
• Setup: Networking components like switches and routers are pre-configured and delivered in
virtualized mode to consumers to build their own networks.
4. Hierarchical Organization:
• Structure: Data centers organize separate resource pools for server, processor, memory, storage, and
network components.
• Hierarchy: Hierarchical structures establish parent and child relationships among pools to handle
complexity and ensure fault tolerance.
Benefits: Efficient resource utilization, scalability, and fault tolerance through hierarchical, synchronized management of pooled resources.
Conclusion: Resource pooling in cloud computing enables efficient allocation and management of
computing resources. By grouping resources into interconnected pools and organizing them hierarchically,
cloud providers ensure scalability, fault tolerance, and efficient resource utilization.
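A minimal, provider-agnostic sketch of the pooling idea described above: identical, pre-configured virtual machines are held in a pool, leased out on request, and returned for reuse. The class and VM identifiers are illustrative, not any vendor's API.

```python
# Illustrative resource pool: hand out pre-configured VMs and take them back.
class VMPool:
    def __init__(self, vm_ids):
        self.free = list(vm_ids)
        self.in_use = set()

    def acquire(self):
        if not self.free:
            raise RuntimeError("pool exhausted: add capacity or wait")
        vm = self.free.pop()
        self.in_use.add(vm)
        return vm

    def release(self, vm):
        self.in_use.discard(vm)
        self.free.append(vm)            # returned VM becomes available again

pool = VMPool(["vm-1", "vm-2", "vm-3"])
lease = pool.acquire()
pool.release(lease)
```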
Resource sharing in cloud computing increases resource utilization rates by allowing multiple
applications to use pooled resources. It involves distributing pooled and virtualized resources among
applications, users, and servers, requiring appropriate architectural support.
1. Quality of Service (QoS): Maintaining performance isolation is crucial for ensuring QoS as multiple
applications compete for the same resources.
2. Predictability: Predicting response and turnaround time becomes difficult due to resource sharing,
necessitating optimized resource management strategies.
Multi-tenancy:
• Definition: Multi-tenancy allows a resource component to serve different consumers while keeping
them isolated from each other.
• Implementation: In cloud computing, multi-tenancy is realized through ownership-free sharing of
resources in virtualized mode and temporary allocation from resource pools.
Types of Tenancy:
• Public Cloud: Co-tenants are unrelated parties who share the same computing infrastructure.
• Community Cloud: Co-tenants belong to the same community and share similar interests.
• Private Cloud: Tenancy is limited to sub-tenants (departments or units) internal to a single organization.
Multi-tenancy at different service levels:
• Infrastructure as a Service (IaaS): Shared computing infrastructure resources like servers and storage.
• Platform as a Service (PaaS): Sharing of operating system by multiple applications.
• Software as a Service (SaaS): Single application instance and/or database instance serving multiple
consumers.
Benefits of Multi-tenancy: Lower cost per consumer through shared resources, higher resource utilization, and simpler maintenance and upgrades for the provider.
Conclusion: Resource sharing and multi-tenancy in cloud computing enable efficient resource utilization
and cost reduction by distributing pooled resources among applications and users while maintaining
performance isolation and ensuring QoS.
RESOURCE PROVISIONING:
In traditional computing, setting up new servers or virtual servers is time-consuming. Cloud computing, with
its virtualization and IaaS model, enables rapid provisioning of resources, often in just minutes, if the
required resources are available. This is a significant advantage of cloud computing, allowing users to create
virtual servers through self-service interfaces.
Flexible Provisioning in Cloud Computing: Flexible resource provisioning is essential in cloud computing
to meet varying demands. Orchestration of resources must be intelligent and rapid to provision resources to
applications dynamically.
Role of SLA: Service level agreements (SLAs) between consumers and cloud providers help estimate
resource requirements. Cloud providers plan resources based on SLAs to dynamically allocate physical
resources to virtual machines (VMs) running end-user applications.
Resource Provisioning Approaches: Resource provisioning in cloud computing is enabled through VM
provisioning. Two approaches are static and dynamic provisioning. Static provisioning allocates resources
once during VM creation, while dynamic provisioning adjusts resource capacity based on workload
fluctuations.
Hybrid Approach: A hybrid provisioning approach combines static and dynamic provisioning to address
real-time scenarios with changing load in cloud computing.
Resource Under-provisioning and Over-provisioning: Traditional computing often faces resource under-
provisioning or over-provisioning issues due to fixed-size resource allocation. Cloud computing mitigates
these issues by employing dynamic or hybrid resource provisioning approaches, ensuring high performance
at lower costs.
VM Sizing: VM sizing ensures the allocated resources match the workload. It can be done on a VM-by-VM
basis or jointly, allowing unused resources from less loaded VMs to be allocated to others.
Dynamic Provisioning and Fault Tolerance: Dynamic resource provisioning in cloud computing enhances
fault tolerance by replacing faulty nodes with new ones. This leads to a zero-downtime architecture,
ensuring continuous operation even during hardware failures.
In essence, resource provisioning in cloud computing is characterized by its flexibility, automation, and
ability to ensure high performance while optimizing costs.
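A toy sketch of the dynamic provisioning / VM sizing idea described above, assuming hypothetical helpers get_cpu_utilization() and resize_vm() that a real system would implement against its virtualization layer:

```python
# Dynamic provisioning loop: adjust a VM's capacity when measured utilization
# drifts outside a target band. Helper functions are passed in because the
# actual calls depend on the underlying virtualization platform.
def right_size(vm_id, current_vcpus, get_cpu_utilization, resize_vm,
               low=0.30, high=0.80, min_vcpus=1, max_vcpus=16):
    """Grow or shrink a VM's vCPU allocation based on observed load."""
    utilization = get_cpu_utilization(vm_id)         # e.g. 0.0-1.0 over last 5 min
    if utilization > high and current_vcpus < max_vcpus:
        return resize_vm(vm_id, current_vcpus + 1)   # scale capacity up
    if utilization < low and current_vcpus > min_vcpus:
        return resize_vm(vm_id, current_vcpus - 1)   # release unused capacity
    return current_vcpus                             # workload within target band
```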
Case Study:
The following case study illustrates the use of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) in a real-world scenario:
Background: A tech startup, XYZ Innovations, has developed a new social media analytics tool aimed at
helping businesses analyze and optimize their social media marketing strategies. The company is growing
rapidly and needs to scale its infrastructure to handle increasing demand from customers.
IaaS Implementation: To meet their infrastructure needs, XYZ Innovations decides to adopt IaaS. They
choose a popular cloud provider and provision virtual servers to host their application. With IaaS, they have
full control over the virtual machines, allowing them to install and configure the necessary software stack
for their social media analytics tool. They can also scale their infrastructure up or down based on demand,
ensuring they only pay for the resources they use.
PaaS Implementation: As XYZ Innovations continues to grow, they realize that managing the entire
software stack on their own virtual machines is becoming cumbersome. They decide to migrate their
application to a PaaS solution offered by their cloud provider. With PaaS, they no longer need to worry
about managing the underlying infrastructure. Instead, they focus on developing and deploying their
application using the tools and services provided by the PaaS platform. This allows them to accelerate their
development process and focus more on innovation.
SaaS Implementation: To complement their social media analytics tool, XYZ Innovations decides to offer
additional services to their customers. They develop a SaaS application that integrates with their existing
platform and provides advanced reporting and visualization features. This SaaS application is hosted on the
same cloud platform as their PaaS solution, allowing seamless integration between the two. Customers can
now access the social media analytics tool and the additional reporting features through a web browser,
without the need for any installation or maintenance on their end.
Benefits:
• Scalability: With IaaS, XYZ Innovations can easily scale their infrastructure to handle increasing
demand.
• Development Efficiency: PaaS enables faster development and deployment of applications,
allowing XYZ Innovations to innovate more rapidly.
• User Accessibility: SaaS makes their services easily accessible to customers, who can access the
applications from anywhere with an internet connection.
• Cost Savings: By leveraging cloud computing services, XYZ Innovations can reduce upfront
infrastructure costs and pay only for the resources they use.
Conclusion: By adopting IaaS, PaaS, and SaaS solutions, XYZ Innovations has been able to rapidly scale
their infrastructure, accelerate their development process, and offer additional services to their customers.
This has helped them stay competitive in the rapidly evolving market of social media analytics.
UNIT-2
Scaling in the Cloud:
What is Scaling?
• Scaling is about a system's ability to grow or shrink as needed. In computing, it means handling
varying workloads efficiently without performance issues or unnecessary costs.
• In traditional computing, scaling resources manually is common, but it's inefficient because
resources often go unused.
• Cloud computing offers dynamic and automatic scaling, adjusting resources on-the-fly based on
demand.
Auto-Scaling in Cloud:
• How it Works: The system automatically adjusts resources based on predefined conditions or
schedules.
• Scaling Boundaries: Limits can be set to prevent excessive scaling, with manual intervention
required if boundaries are exceeded.
In summary, scaling in the cloud is about adapting resources to meet demand efficiently. It involves
proactive planning based on predictable patterns and reactive adjustments to real-time changes, all done
automatically within predefined limits.
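A minimal sketch of such an auto-scaling rule with explicit scaling boundaries; metric() and set_instance_count() are assumed hypothetical hooks into the monitoring and orchestration layer:

```python
# Auto-scaling rule: add or remove instances based on a load metric,
# never crossing the configured minimum/maximum boundaries.
def autoscale(current, metric, set_instance_count,
              scale_out_at=0.75, scale_in_at=0.25,
              min_instances=2, max_instances=20):
    load = metric()                       # e.g. average CPU utilization of the group
    if load > scale_out_at and current < max_instances:
        current += 1                      # add capacity under heavy load
    elif load < scale_in_at and current > min_instances:
        current -= 1                      # shed capacity when demand drops
    set_instance_count(current)           # apply the (possibly unchanged) desired size
    return current
```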
Types of Scaling:
• Vertical Scaling (Scaling Up): Increasing a system's capacity by replacing existing components
with more powerful ones.
• Horizontal Scaling (Scaling Out): Increasing capacity by adding more resources without replacing
existing ones.
Vertical Scaling:
• Advantages: Simple to implement and requires no change to the application architecture.
• Disadvantages: Usually requires downtime while components are upgraded, is limited by the maximum capacity of a single machine, and high-end hardware is expensive.
Horizontal Scaling:
• Advantages: No service interruption, uses commodity hardware, and can handle almost unlimited
traffic.
• Disadvantages: Complex to manage, requires distributed computing architecture, and may need
application redesign.
• Vertical Scaling: Simple but limited. Suitable for any computing environment.
• Horizontal Scaling: Complex but cost-effective. Needs distributed computing architecture.
Horizontal Scaling in the Cloud:
• Performance: Measured by response time and overall completion time for tasks.
• Scalability: Refers to the system's ability to maintain performance with growing user numbers.
• Ideal Scalable System: Response time should remain consistent regardless of concurrent users.
In essence, while vertical scaling is simpler but limited, horizontal scaling offers greater scalability potential
but requires a more complex setup. Balancing performance and scalability is crucial for a robust computing
system.
In short: vertical scaling upgrades existing machines (simple, but capped by single-machine limits and usually needing downtime), while horizontal scaling adds machines (no interruption and near-unlimited growth, but needing a distributed architecture).
Cloud bursting is like a safety valve for your computing system. When your organization's internal
infrastructure or private cloud can't handle the demand, it expands into a public cloud temporarily. This
setup is called cloud bursting, and it lets you manage changing resource needs without disrupting your users.
Cloud Bursting:
• What is it? When your internal infrastructure or private cloud can't handle the demand, so your
system expands into a public cloud temporarily.
• How does it work? It's based on two main functions: automated scaling listener and resource
replicator. The scaling listener decides when to switch to the external cloud, and the replicator keeps
your systems synchronized during the switch.
• When is it used? It's handy for systems with sudden spikes in traffic, like during events or
promotions.
• What is scalability? It's the ability of your computing system to handle more workload efficiently.
• Why does it matter? Scalability means happy customers because your system can handle more
traffic without slowing down. Even tiny delays in loading times can hurt business—Amazon saw a
1% decrease in revenue for every 100-millisecond delay.
• How does cloud computing help? Cloud computing gives you almost unlimited scalability. Your
applications can automatically adjust to handle more traffic when needed and scale down during
quieter times, letting you focus on your business goals.
So, cloud bursting is like a safety net, ensuring your system can handle whatever comes its way, while
scalability is crucial for keeping customers happy and your business thriving.
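A simplified sketch of the automated scaling listener side of cloud bursting, assuming illustrative endpoints and a fixed private-capacity threshold; a real deployment would also run a resource replicator to keep the two environments synchronized:

```python
# Scaling listener for cloud bursting: serve from private capacity until it is
# exhausted, then redirect overflow to a public-cloud endpoint. Endpoints and
# the capacity check are placeholders.
PRIVATE_ENDPOINT = "https://app.internal.example.com"
PUBLIC_BURST_ENDPOINT = "https://app.public-cloud.example.com"

def route_request(active_sessions, private_capacity=500):
    """Return the endpoint that should handle the next request."""
    if active_sessions < private_capacity:
        return PRIVATE_ENDPOINT          # normal operation on the private cloud
    return PUBLIC_BURST_ENDPOINT         # burst: spill over to the public cloud
```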
Capacity Planning:
Why does it matter? It's all about balancing supply and demand. You want to have just the right amount of resources (like servers and storage) to support your applications, neither too much nor too little.
In the past: Companies used to estimate their resource needs for a long time ahead and invest in fixed
amounts of computing power. But this often led to wasted resources or not having enough when demand
spiked.
Now: With cloud computing, things have changed. You can rent resources as you need them, paying only
for what you use. This flexibility helps avoid wasted resources and ensures you have enough when demand
goes up.
Who's responsible? Everyone has a role. Cloud providers manage the physical resources, while consumers
(like businesses) estimate their needs and communicate them through service level agreements (SLAs).
Benefits: Proper capacity planning means better performance for your applications and more control over
costs. It's like having just the right amount of popcorn for movie night—enough to enjoy without leftovers.
Steps in capacity planning (a simple forecasting sketch follows this list):
• Look at past usage patterns to understand how resources are typically used over time.
• Analyze data to predict future demand, considering factors like seasonal variations and special
events.
• This helps in knowing when resources will be needed the most.
• Monitor the current usage of resources like processors, memory, storage, and network connectivity.
• Identify any bottlenecks or stress points where resources are maxed out and affecting performance.
• Understand how the system responds to different levels of demand.
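A toy sketch of the forecasting step, assuming hourly usage samples; real capacity planning would also model seasonality and special events:

```python
# Suggest how much capacity to provision from recent usage history:
# take the larger of the recent average and recent peak, plus headroom.
def suggest_capacity(hourly_usage_history, window=24, headroom=0.25):
    """hourly_usage_history: list of observed resource usage values."""
    recent = hourly_usage_history[-window:]
    expected = sum(recent) / len(recent)          # average recent demand
    peak = max(recent)                            # recent worst case
    return max(expected, peak) * (1 + headroom)   # provision peak plus a buffer

print(suggest_capacity([120, 150, 180, 400, 390, 210] * 4))
```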
Load Balancing:
Definition: Load balancing is a crucial technique in distributed computing systems. It ensures that
processing and communication tasks are evenly distributed across network resources, preventing any single
resource from becoming overloaded.
Purpose: Load balancing is especially important for handling large and unpredictable numbers of service
requests. It evenly distributes workload among multiple computing resources like processors and memory,
improving overall system performance.
Benefits:
1. Improved Resource Utilization: Efficient load balancing ensures that resources are utilized
optimally, maximizing system performance.
2. Enhanced Reliability: By distributing workload across multiple resources, load balancing increases
system reliability. If one component fails, others can continue functioning without disruption.
3. Scalability: Load balancing enables a system to scale according to demand, redirecting load to
newly added resources as needed.
4. Continuous Availability: By preventing overloading of resources, load balancing ensures that
applications remain available at all times, enhancing user experience.
Implementation: Load balancing can be implemented through hardware or software solutions. Software-
based solutions are cost-effective and easier to configure and maintain compared to hardware-based
solutions.
Importance in Cloud Computing: Load balancing is essential in cloud computing to maintain operational
efficiency and reliability. It ensures that workloads are properly distributed among available resources,
making the system scalable and tolerant to failures. It also maximizes resource availability and minimizes
downtime, critical for businesses relying on cloud services.
Categories of Load Balancing
Static Approach:
• Definition: Static load balancing does not use run-time information about the system. It assigns tasks to resources solely based on task characteristics, without considering the current state of resources.
• Operation: Tasks are evenly distributed among available resources based on predefined rules. It
doesn't track the workload of each resource.
• Advantages: Simple to design and implement, uses algorithms like round robin.
• Example: If two servers are available and six similar requests come in, static load balancing ensures
each server handles three requests.
Dynamic Approach:
• Definition: Dynamic load balancing considers the current state of resources when distributing tasks.
It adjusts task allocation based on real-time information about resource load.
• Operation: Load balancers continually monitor resource load and adjust task distribution
accordingly. Tasks are assigned based on current resource availability.
• Advantages: Ensures even resource utilization, increases fault tolerance, and supports scalability.
• Example: If one server becomes overloaded, dynamic load balancing redirects tasks to less loaded
servers to maintain system stability.
Organization of load balancing:
• Distributed Approach: All nodes share the task of load balancing, either cooperatively or non-cooperatively.
• Non-Distributed Approach: Load balancing is managed by one or a cluster of nodes. It can be
centralized or semi-distributed.
• Centralized: A single node manages load balancing for the entire system, communicating directly
with other nodes.
• Semi-Distributed: Load balancing tasks are partitioned among groups of nodes, each with a central
node responsible for balancing within the group.
• Objective: Balancing loads while minimizing overhead, optimizing response time, ensuring system
health, and facilitating scalability.
Load balancing plays a critical role in optimizing resource utilization and ensuring efficient task execution
in distributed systems like cloud computing. Whether using a static or dynamic approach, load balancers
help maintain system performance, reliability, and scalability.
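A minimal sketch contrasting the two approaches: round robin ignores server state (static), while least-connections picks the backend with the fewest active connections (dynamic). Server names and counters are illustrative.

```python
import itertools

servers = ["server-a", "server-b", "server-c"]          # hypothetical backends
rr = itertools.cycle(servers)
active_connections = {s: 0 for s in servers}

def round_robin():
    return next(rr)                                     # rotate regardless of load

def least_connections():
    return min(active_connections, key=active_connections.get)

# Dispatching one request with the dynamic policy:
target = least_connections()
active_connections[target] += 1   # a real balancer decrements this when the request completes
```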
LOAD BALANCING ALGORITHMS
The main goal of load balancing algorithms is to ensure no server is overloaded in terms of capacity or performance. These algorithms are broadly divided into class-agnostic and class-aware categories, described later in this section.
Persistence, or stickiness, ensures that all requests from a single client session go to the same backend
server. This is important for applications that need to maintain session state.
• Features of load balancers implemented as application delivery controllers (ADCs) include:
o SSL Offloading: Moves encryption and decryption tasks from the backend servers to the ADC.
o Content Compression: Reduces the amount of data sent over the network, improving speed and bandwidth use.
Case Studies
1. Google Cloud
o DNS Level Load Balancing: Redirects requests to the nearest data center.
o Data Center Level Load Balancing: Distributes requests based on server load within the
data center.
2. Amazon EC2
o Auto-Scaling Group Integration: Dynamically adjusts capacity based on traffic.
o Controller Service: Monitors load balancers' health and performance.
o DNS Records: Updates load balancer records dynamically for efficient scaling.
Load balancing algorithms are classified based on their awareness of the requests:
1. Class-Agnostic Algorithms:
o Round Robin
o Random
o Least Connection
2. Class-Aware Algorithms:
o Content-Aware: Routes similar requests to the same server.
o Client-Aware: Routes requests from similar clients to specific servers.
o Client & Content Aware: Combines both approaches.
By choosing the right load balancing algorithm, cloud service providers can enhance system performance,
reliability, and efficiency.
Data-Intensive Computing:
Data-intensive computing involves processing large datasets that require efficient management and rapid data movement. Traditional enterprise storage systems are inadequate for such tasks. Key requirements include:
• Partitioning and Distribution: Large datasets need to be partitioned and processed across multiple
nodes.
• Scalability: Effective data partitioning and distribution promote scalability.
• I/O Performance: High-performance data processing demands efficient data handling to reduce
access time.
• Parallel and Distributed Processing: Handling complex data efficiently requires parallel and
distributed computing models like MapReduce.
Cloud native file systems must meet several challenges not faced by traditional file systems, including:
• Multi-Tenancy: Ensuring isolation and security for multiple tenants sharing resources.
• Scalability: Supporting both upward and downward scaling to meet varying storage needs without
resource wastage.
• Unlimited Storage: Providing virtually unlimited and fault-tolerant storage using inexpensive
hardware.
• Efficiency: Handling thousands of concurrent operations efficiently.
• Compatibility: Maintaining backward compatibility with existing file system interfaces.
• Metered Use: Enabling resource usage metering.
• Error Detection and Recovery: Incorporating automatic error detection and recovery mechanisms.
Processing large datasets often requires parallel and distributed programming models such as MapReduce, in which a job is split into many map tasks that process data partitions in parallel and reduce tasks that aggregate the intermediate results, as sketched below.
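A minimal, single-process illustration of the MapReduce model (word count); real frameworks such as Hadoop run the map and reduce phases in parallel across many nodes reading from a distributed file system:

```python
from collections import defaultdict

def map_phase(document):
    # Emit (key, value) pairs: one (word, 1) per occurrence.
    return [(word.lower(), 1) for word in document.split()]

def reduce_phase(word, counts):
    # Aggregate all values emitted for the same key.
    return word, sum(counts)

documents = ["the cloud scales", "the cloud stores data"]
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):        # map
        grouped[word].append(count)           # shuffle: group pairs by key
print(dict(reduce_phase(w, c) for w, c in grouped.items()))  # reduce
```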
To support high-performance computing, several cloud-native file systems have been developed, including:
• IBM General Parallel File System (GPFS): An early high-performance distributed file system
developed by IBM.
• Google File System (GFS): A scalable and reliable distributed file system using inexpensive
hardware, designed for large file storage and access.
• Hadoop Distributed File System (HDFS): An open-source implementation of GFS, providing
scalable and reliable data storage on commodity servers.
• Ghost Cloud File System: A scalable private cloud file system designed for use within an Amazon
Web Services (AWS) account.
• Gluster File System (GlusterFS): An open-source distributed file system capable of scaling across
multiple disks, machines, and data centers.
• Kosmos File System (KFS): An open-source GFS implementation developed in C++, also known as
CloudStore.
• Sector Distributed File System: Another open-source file system inspired by GFS.
Storage Deployment Models
• Public Cloud Storage: Accessible to anyone and provided by third-party service providers.
• Private Cloud Storage: Managed by the consumer enterprise, can be set up on-premises or off-
premises.
• Hybrid Cloud Storage: Combines public and private storage, using private storage for critical data
and public storage for archiving.
Several managed cloud storage services are popular among developers, providing reliable and high-
performance storage:
• Amazon Elastic Block Store (EBS): Provides block-level storage volumes for Amazon EC2
instances, supporting various file systems.
• Amazon Simple Storage Service (S3): Stores files as objects within buckets, capable of handling
trillions of objects.
• Google Cloud Storage: Persistent storage attached to Google Compute Engine (GCE), storing data
as objects within buckets.
These advancements in cloud file systems and storage solutions are crucial for meeting the demands of high-
performance and data-intensive computing environments.
UNIT-3
Multi-tenant software:
Applications have traditionally been developed for single enterprises, where the data belongs to one
organization. However, SaaS platforms require a single application to handle data from multiple customers,
known as multi-tenancy. Multi-tenancy can also be achieved through virtualization, where each tenant has
its own virtual machines. This section focuses on application-level multi-tenancy.
1 Multi-Entity Support
Before SaaS, large organizations needed applications to support multiple units (entities) while keeping data
segregated. For example, a bank's software should allow branch-level users to see only their data, but also
support centralized changes and global processing like inter-branch reconciliation. This is similar to multi-
tenancy.
To achieve multi-entity support, each database table includes an additional column (OU_ID) indicating the
organizational unit. Queries filter data based on the current user's unit. This is called the single schema
model, where one schema holds data for all entities.
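A minimal sketch of the single schema model using SQLite; the table, column, and unit names are illustrative. The point is that every query carries the OU_ID filter that keeps each unit's data segregated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (ou_id TEXT, acct_no TEXT, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [("branch-01", "A-100", 2500.0),
                  ("branch-02", "A-200", 900.0)])

def accounts_for_unit(ou_id):
    # The OU_ID filter is what keeps each organizational unit's data separate.
    return conn.execute(
        "SELECT acct_no, balance FROM accounts WHERE ou_id = ?", (ou_id,)
    ).fetchall()

print(accounts_for_unit("branch-01"))   # only branch-01 rows are visible
```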
Advantages of the single schema model include easy upgrades for all customers. However, it can be complex to support custom fields for specific customers: when custom fields are managed in separate tables, upgrades become more complicated.
2 Multi-Schema Approach
The multi-schema approach uses separate schemas for each customer. This simplifies application design and makes it easier to re-engineer existing applications. Each schema can have its own customizations, which makes the system more flexible.
3 Multi-Tenancy Using Cloud Data Stores
Cloud data stores like Google App Engine's Datastore, Amazon SimpleDB, and Azure's data services support multi-tenancy. For example, Google App Engine allows classes to be created dynamically for different schemas, which makes it easier to manage multi-tenant applications on these platforms.
4 Data Access Control for Enterprise Applications
Multi-tenancy can also be useful within an enterprise, where access to data needs to be controlled based on
various rules. Data Access Control (DAC) can manage who sees what data. In a generic DAC implementation, each table has an additional DAC_ID field, rules define which records each DAC_ID grants access to, and users are assigned DAC roles.
In traditional databases, SQL queries can enforce DAC by joining the data tables with the access-rule tables. Many cloud data stores do not support such joins, so DAC must be implemented in the application code, as sketched below.
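A minimal sketch of application-side DAC filtering; the role-to-DAC_ID grants and record layout are illustrative, and the data store is assumed to return records that the application then filters:

```python
role_grants = {                     # DAC rules: role -> set of permitted DAC_IDs
    "branch_manager": {"dac-north", "dac-east"},
    "teller": {"dac-north"},
}

def visible_records(records, user_role):
    # Filter in application code because the store cannot join rules and data.
    allowed = role_grants.get(user_role, set())
    return [r for r in records if r["dac_id"] in allowed]

records = [{"id": 1, "dac_id": "dac-north"}, {"id": 2, "dac_id": "dac-west"}]
print(visible_records(records, "teller"))   # only records the role may see
```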
To summarize, achieving multi-tenancy, especially with cloud databases, involves complexities and careful
planning to ensure data security and efficient customization.
Relational databases have dominated enterprise applications since the 1980s, providing a reliable way to
store and retrieve data, especially for transaction processing. However, with the rise of web services from
companies like Google, new distributed systems like GFS (Google File System) and BigTable, as well as
programming models like MapReduce, have been developed. These new systems are particularly effective
for handling large volumes of data and parallel processing.
Relational Databases
Relational databases use SQL to interact with users and applications, optimizing query execution through
memory and disk operations. Data is typically stored in rows on disk pages, but column-oriented storage can
be more efficient for read-heavy tasks like analytics.
Relational databases have evolved to support parallel processing using different architectures (shared
memory, shared disk, shared nothing). They handle transaction isolation through locking mechanisms,
which become complex in parallel and distributed setups. These databases are designed for transaction
processing, but large-scale parallelism requires new approaches.
Cloud File Systems: GFS and HDFS
GFS (Google File System) and HDFS (Hadoop Distributed File System) are designed for managing large
files across clusters of servers. They handle hardware failures and support multiple clients reading, writing,
and appending data in parallel. These systems break files into chunks, which are replicated across different
servers to ensure reliability. Clients access data chunks directly after getting metadata from a master server,
ensuring consistency and fault tolerance through regular updates.
BigTable (by Google) and HBase (on HDFS) are distributed storage systems that manage data in a
structured, multi-dimensional map format. Data is accessed by row key, column key, and timestamp, with
column families storing related data together, similar to column-oriented databases. Data is managed by
tablet servers, with metadata servers locating these tablets.
Amazon's Dynamo is a key-value store that handles high volumes of concurrent updates using distributed
object versioning and quorum consistency, ensuring data reliability across nodes. Data is stored with
versioning, allowing for conflict resolution by applications. A quorum protocol ensures consistency by
requiring reads and writes to access multiple replicas.
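A small illustration of the quorum condition: with N replicas, a read touching R replicas and a write touching W replicas are guaranteed to overlap whenever R + W > N, so the read sees the latest write.

```python
# Quorum condition used by Dynamo-style stores.
def is_strongly_consistent(n_replicas, read_quorum, write_quorum):
    return read_quorum + write_quorum > n_replicas

print(is_strongly_consistent(3, 2, 2))   # True: 2 + 2 > 3, read overlaps the write
print(is_strongly_consistent(3, 1, 1))   # False: reads may return stale data
```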
Google App Engine's Datastore and Amazon's SimpleDB are built on BigTable and Dynamo, respectively.
Datastore uses BigTable's infrastructure for efficient data storage and querying. SimpleDB uses Dynamo's
key-value approach, providing scalable, fault-tolerant storage.
Summary
Cloud data strategies like BigTable, HBase, and Dynamo offer scalable and fault-tolerant alternatives to
traditional relational databases. They are particularly well-suited for large-scale, parallel data processing,
leveraging distributed file systems like GFS and HDFS for efficient data access and reliability. These
systems support the growing needs of enterprise applications in the cloud.
1. Database in Cloud:
• Implementation Forms:
o Traditional Database Solution on IaaS: Users deploy database applications on virtual
machines.
o Database-as-a-Service (DBaaS): Service providers manage backend administration tasks.
2. Data Models:
• Structured Data (SQL Model): Traditional relational databases like Oracle, SQL Server.
• Unstructured Data (NoSQL Model): Suitable for scalable systems, efficient for unstructured data
sets. Examples: Amazon SimpleDB, Google Datastore, Apache Cassandra.
3. Database-as-a-Service (DBaaS):
• Deployment Options:
o Traditional Deployment: Users deploy traditional RDBMS on cloud servers, manage
administration.
o Relational Database-as-a-Service: Fully-managed RDBMS by cloud providers.
• Amazon RDS: Supports MySQL, Oracle, SQL Server, PostgreSQL, and Amazon Aurora.
• Google Cloud SQL: Managed MySQL database.
• Azure SQL Database: Managed Microsoft SQL Server.
In essence, cloud databases offer flexibility in deployment, management, and scalability, catering to both
structured and unstructured data needs. Managed services relieve users of administrative burdens, providing
on-demand access and automated functionalities.
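A hedged sketch of how an application reaches a managed relational service: from the client's side it looks like an ordinary MySQL connection, just pointed at the endpoint the provider assigns. The hostname, credentials, database, and table are placeholders, and the mysql-connector-python package is assumed.

```python
import mysql.connector

conn = mysql.connector.connect(
    host="mydb-instance.example-region.rds.amazonaws.com",  # provider-assigned endpoint (placeholder)
    user="app_user",
    password="example-password",
    database="orders",
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM customers")   # hypothetical table
print(cursor.fetchone())
conn.close()
```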
NoSQL databases are a new type of database system that handle large volumes of unstructured data more
efficiently than traditional relational databases. They are crucial for cloud computing environments due to
their ability to store and retrieve data effectively. Let's break down the key points:
1. Emergence of Big Data: With the rise of web-based platforms, data started growing exponentially
in volume and complexity. Big data refers to massive sets of structured and unstructured data with
characteristics like high volume, velocity (fast-changing), and variety (different formats like text,
audio, video).
2. Challenges with Relational Databases: Traditional relational databases faced challenges in
handling the increasing volume of data, especially with the surge in online transactions and social
networking. They struggled to scale horizontally, meaning they couldn't easily distribute data across
multiple servers to handle increased loads.
3. Need for NoSQL Databases: As web applications moved to cloud computing, it became evident
that traditional relational databases were inadequate for handling modern data requirements. NoSQL
databases emerged as a solution because they could scale horizontally, distribute data efficiently, and
handle unstructured data effectively.
4. Characteristics of NoSQL Databases:
o Horizontal Scalability: NoSQL databases can scale out across multiple servers to handle
increasing data loads.
o Flexible Schemas: They allow for schema-less data storage, enabling easy adaptation to
changing data structures.
o Non-relational: NoSQL databases can efficiently handle both relational and non-relational
data.
o Auto-distribution and Replication: Data distribution and replication are automatic
processes in NoSQL databases, ensuring high availability and fault tolerance.
o Integrated Caching: Many NoSQL databases offer integrated caching capabilities to
improve performance.
5. Types of NoSQL Databases:
o Key-Value: Simplest form where data is stored as key-value pairs.
o Document-Oriented: Data is stored in documents, usually using JSON or XML formats.
o Column-Family: Data is grouped in columns, allowing for efficient querying.
o Graph: Data is stored as graph structures, useful for applications with complex relationships.
6. Popular NoSQL Databases: Examples include MongoDB, Cassandra, HBase, DynamoDB,
CouchDB, and Neo4j. Each has its own strengths and weaknesses, catering to different use cases.
7. Selecting the Right NoSQL Database: Choosing the appropriate NoSQL database depends on the
specific requirements of the application. There's no one-size-fits-all solution, and often multiple
NoSQL databases may be used together to optimize performance.
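As an illustration of the key-value model listed above, the hedged sketch below writes and reads an item in Amazon DynamoDB with boto3. It assumes a table named user_profiles with partition key user_id already exists and that AWS credentials are configured.

```python
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("user_profiles")

# Store a schema-less item: attributes can vary from item to item.
table.put_item(Item={"user_id": "u-42", "name": "Asha", "plan": "pro"})

# Retrieve it by key.
response = table.get_item(Key={"user_id": "u-42"})
print(response.get("Item"))
```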
Content Delivery Networks (CDNs): A CDN caches and serves content from servers located close to end users. Overall, CDNs are essential for delivering content quickly and reliably to users worldwide, improving user experience and business success.
Security Reference Model
Security is a big concern in cloud computing, just like in regular computing. When individuals or businesses
switch to cloud services, they want to be sure their data and operations are safe. Cloud computing means
working with data and applications outside your usual setup, which can feel risky.
People often say, "Cloud computing is a great idea, but it's not secure." But, if done right, cloud computing
can be just as safe as traditional methods. Understanding the risks and choosing the right setup for your
needs is key.
In the past, we used firewalls to protect our networks from outside threats. But with cloud computing, the
traditional security boundaries change. Computing resources move outside our usual security zones, and we
need new ways to keep them safe.
Several groups, like the Cloud Security Alliance (CSA) and the Jericho Forum, have worked on creating
security standards for cloud computing. They've published guidelines and best practices to help both
providers and users stay safe.
Key questions to ask before adopting a cloud service:
1. User Access: Who has control over your data in the cloud?
2. Regulatory Compliance: Are your cloud providers following security rules?
3. Data Location: Do you know where your data is stored?
4. Data Segregation: Is your data kept separate from others'?
5. Recovery: What happens if there's a disaster? Can you get your data back?
6. Investigative Support: Can you investigate any problems that occur?
7. Long-Term Viability: What happens if your cloud provider goes out of business?
By asking these questions and choosing reputable providers, you can make sure your cloud computing is as
safe as possible.
Cloud Security Reference Model
For years, various groups and organizations worked on developing a model to address cloud security. The
Jericho Forum proposed the "Cloud Cube Model" in 2009 to tackle the issue of security boundaries blurring
among collaborating businesses.
This model suggests that cloud security shouldn't just be measured by whether systems are "internal" or
"external." It looks at factors like who manages the cloud and who has access rights.
Primary Objectives
The Cloud Cube Model aims to represent different cloud formations, highlight their characteristics, and
show the benefits and risks. It also emphasizes that traditional approaches to computing aren't always
obsolete.
Cloud security depends on both providers and consumers. Consumers have more responsibility for security
management with Infrastructure as a Service (IaaS), while it decreases with Software as a Service (SaaS).
Security Policy
Security policies guide reliable security implementation in a system. These policies include management
policy, regulatory policy, advisory policy, and informative policy.
Building trust in cloud computing is essential. Providers can strengthen trust by obtaining security certifications such as the Security, Trust & Assurance Registry (STAR) from the Cloud Security Alliance (CSA), and by being transparent about their security measures so consumers feel confident adopting cloud services.
• Cloud security must address basic requirements like confidentiality, integrity, availability, identity
management, and access control.
• These requirements are not new, but their implementation in the cloud requires special attention.
Responsibility:
• SLAs establish trust between providers and consumers, outlining service capabilities and security
measures.
• They should include detailed security provisions and responsibilities for both parties.
Threats:
• Threats like eavesdropping, fraud, theft, sabotage, and external attacks exist in cloud computing.
• Cloud-specific threats include infrastructure, information, and access control vulnerabilities.
Infrastructure Security: Securing the physical and virtualized infrastructure (servers, storage, networks, and hypervisors) that hosts cloud services.
Information Security: Protecting the confidentiality, integrity, and availability of data stored and processed in the cloud.
Provider Continuity:
• Plans should be in place for scenarios like provider bankruptcy, acquisition, or service discontinuation.
• Consumers should have access to their data and be able to move it to other providers if needed.
Overall, cloud security requires collaboration between providers and consumers, clear SLAs, and measures
to protect data integrity, confidentiality, and availability.
Identity management and access control, also known as IAM, are essential for secure computing. Here's
why:
1. Operational Efficiency: IAM automates user verification, making system operations smoother.
2. Enhanced Security: It protects systems and data from harmful attacks.
IAM includes:
• Identification Management: Users state their identities, usually with a unique ID or username.
• Authentication Management: Verifies a user's identity through passwords, fingerprints, or retina
scans.
• Authorization Management: Determines a user's level of access rights after verifying identity.
• Access Management: Implements organizational policies and system privileges for resource
requests.
• Accountability: Ensures individuals cannot deny actions within the system, using audit trails and
logs.
• Monitoring and Auditing: Allows users to monitor, audit, and report compliance issues regarding
resource access.
In cloud computing, IAM is crucial because consumers give up direct control over the infrastructure; both service providers and consumers therefore share responsibility for ensuring security.
IAM in cloud computing includes robust authentication, authorization, and access control mechanisms, often
using modern technologies like biometrics or smart cards.
User authentication can be simplified with Single Sign-On (SSO), which lets a user authenticate once and access multiple related systems, and Federated Identity Management (FIM), which extends this trust across organizational boundaries so identities established in one domain are accepted in another.
Access control limits user access to systems and data. Models like Mandatory Access Control (MAC),
Discretionary Access Control (DAC), and Non-Discretionary Access Control determine access policies
based on user roles and system requirements.
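A minimal sketch of a role-based (non-discretionary) access check; the roles, resource types, and permissions are illustrative policy data, not any particular product's model:

```python
# Policies map each role to the operations it may perform on a resource type.
policies = {
    "admin":   {"report": {"read", "write", "delete"}},
    "analyst": {"report": {"read"}},
}

def is_authorized(role, resource_type, action):
    return action in policies.get(role, {}).get(resource_type, set())

print(is_authorized("analyst", "report", "read"))    # True
print(is_authorized("analyst", "report", "delete"))  # False: denied by policy
```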
Cloud Security Design Principles:
Common design principles include least privilege, defense in depth, separation of duties, and fail-safe defaults. Frameworks like ITIL and ISO 27001/27002 provide guidelines for managing cloud security effectively.
Security-as-a-Service:
Organizations can outsource security management responsibilities to third-party providers, who offer
services like email and web content filtering, vulnerability management, and identity management. These
services are subscription-based and managed by the provider.
In summary, IAM, access control, and robust security design principles are critical for ensuring cloud
security, with frameworks and Security-as-a-Service options available to help organizations manage their
security effectively.
Understanding Privacy: Privacy and security are both important in cloud computing, but they're not the
same. While security focuses on protecting data from unauthorized access, privacy is about controlling who
can access personal or sensitive information.
Consumer Responsibility: Cloud service providers are mainly responsible for maintaining privacy, but
consumers should also consider their own privacy needs when agreeing to service terms.
Regulatory Compliance: Meeting legal requirements in different countries can be challenging due to the
global nature of cloud data storage. Consumers should understand their privacy rights and the provider's
approach to compliance.
GRC Issues: Governance, risk, and compliance become more complex with cloud adoption. Regular audits
can help identify any violations and build consumer trust.
What is Privacy?: Privacy involves keeping personal information confidential and under control.
Personally identifiable information (PII), like names and contact details, falls under privacy regulations.
Key privacy considerations:
• Access to Data: Consumers need clarity on who can access their data and under what circumstances.
• Compliance: Privacy laws vary by region, so consumers must understand legal implications.
• Storage Location: Data stored in the cloud can be spread across multiple locations, subject to
different privacy laws.
• Retention and Destruction: Consumers should know how long data will be stored and what
happens when they switch providers.
Key Concerns: Privacy laws differ globally, making it crucial for consumers to understand their rights and
the provider's privacy policies.
Importance of Privacy Policy: Cloud providers should have clear privacy policies to address consumer
concerns and build trust. These policies outline how privacy is maintained and any limitations on liability.
Understanding Compliance: Compliance involves following regulations and laws, which can be
challenging due to differing standards across countries and regions. Cloud computing adds complexity
because data is stored in multiple locations with different regulations.
Role of Service Level Agreements (SLAs): SLAs between cloud service providers (CSPs) and consumers
address compliance issues. Both parties must agree on requirements and resolve any violations.
Shared Responsibility: Compliance isn't solely the provider's responsibility. Consumers also play a role in
monitoring and addressing compliance issues.
Governance, Risk, and Compliance (GRC): GRC encompasses governance strategies, risk management,
and compliance to regulations. It's crucial for organizations to address these areas together to avoid conflicts.
Importance of GRC:
• Information Explosion: Increased data requires better management for security and privacy.
• Scores of Regulations: Many regulations make compliance challenging.
• Globalization: Businesses operate globally, facing varied regulations.
• Cloud Computing: Cloud adds complexity due to data storage across jurisdictions.
Automated GRC Monitoring: Tools automate GRC monitoring, enhancing visibility and efficiency.
Solutions from vendors like SAP offer automated GRC programs.
Auditing:
• System Audit: Regular checks to assess system performance, security, and compliance.
• Internal and External Audit: Internal audits are performed by employees, while external audits are
conducted by independent professionals.
Audit Frameworks: Commonly used frameworks and standards for cloud audits include SOC 1/SOC 2 attestation reports, ISO/IEC 27001, COBIT, and the CSA Cloud Controls Matrix.
Auditing the Cloud: Cloud customers need assurance that providers comply with regulations. Right to
Audit clauses in contracts allow clients to conduct audits. CSPs can adopt compliance programs based on
standard frameworks to build trust.
UNIT-4
Portability and interoperability are critical concerns in cloud computing, impacting a consumer's ability to
move their systems between different providers and integrate applications across various environments. Let's
break down the key points:
Portability:
• Definition: Portability refers to the ability to move computing entities, such as virtual machines,
applications, or development environments, from one system to another without losing functionality.
• Levels of Concern: Portability issues vary across different layers of cloud services, including SaaS,
PaaS, and IaaS.
• Categories of Portability: Data portability, application portability, and platform portability are the
primary categories of concern.
• Challenges: Moving data between cloud services requires standard data formats. Application
portability is hindered by platform-specific features and APIs. Platform portability involves either
reusing platform components or bundling machine images.
Interoperability:
• Definition: Interoperability is the ability of separately developed systems and applications, possibly running in different clouds or on-premises, to exchange information and work together, typically through well-defined interfaces and data formats.
Addressing Challenges:
• Portability Solutions: Standardizing data formats and APIs, as well as initiatives like the Simple
Cloud API, help address portability challenges.
• Interoperability Solutions: Standardizing communication interfaces and data formats, leveraging
service-oriented architecture (SOA) principles, and initiatives from organizations like CCIF and
DMTF are key to promoting interoperability.
• Historical Context: Portability and interoperability challenges have been present in computing since
earlier days, but cloud computing exacerbates these issues due to the distributed nature of cloud
environments.
• Current Efforts: Technologists are actively working on addressing these challenges through
standardization efforts and collaborative initiatives.
Overall, understanding and addressing portability and interoperability concerns are essential for consumers
considering cloud adoption, as they impact the flexibility and effectiveness of cloud deployments.
Various situations where interoperability and portability issues may arise in cloud computing, along with
potential recommendations and remedies:
Scenario 1: Switching from one cloud provider to another
• Portability Considerations:
o SaaS Level: Data portability is critical, ensuring data format, extent, and semantics align
between providers. Migration tools or standard data formats can aid in this.
o PaaS Level: Application portability depends on compatibility of virtual machines, operating
systems, and development environments. Portable machine image formats like OVF can
assist in this.
• Interoperability Considerations:
o For SaaS, functional interface similarities between old and new providers are important. API
compatibility is crucial for applications using SaaS APIs.
o For PaaS and IaaS, API interoperability is vital for applications.
• Portability Considerations: Data and functionality/process integration are crucial. APIs must be
well-defined for both on-premises and cloud applications.
• Interoperability Considerations: Integration effort is reduced with SOA techniques or PaaS, where
customers have more control. IaaS integration requires less effort as customers control platform
services and applications.
Addressing these scenarios effectively involves understanding the specific challenges and leveraging
appropriate technologies and standards to ensure seamless interoperability and portability between cloud
services and environments.
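As a small illustration of data-level portability, here is a sketch in Python (the table name, columns, and file names are hypothetical). It exports records from a source data store into a provider-neutral JSON file that a target service could import, which is the kind of standard-format hand-off the recommendations above rely on.

import json
import sqlite3  # stands in for the source provider's data store in this sketch

def export_customers(db_path, out_path):
    """Dump a hypothetical 'customers' table to provider-neutral JSON."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT id, name, email FROM customers").fetchall()
    records = [dict(row) for row in rows]  # plain dicts, no vendor-specific types
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"entity": "customer", "records": records}, f, indent=2)
    conn.close()
    return len(records)

# Usage (assumes a local SQLite file containing a 'customers' table):
# count = export_customers("source.db", "customers.json")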
Machine imaging and virtual appliances are both valuable tools in the realm of cloud computing, each
serving specific purposes and offering distinct benefits. Let's break down the key points about each:
Machine Imaging:
• Definition: A machine image is essentially a clone of an entire system stored in a file, which can be
deployed later to launch multiple instances of that machine.
• Contents: It includes the operating system, pre-installed applications, and tools necessary to run the
machine.
• Benefits:
o Streamlines deployment of virtual servers or machines, providing users with pre-configured
options.
o Facilitates faster disaster recovery by allowing users to restore systems from image backups.
Virtual Appliance:
• Definition: A virtual appliance is a pre-configured virtual machine image that bundles an application together with the operating system and supporting software it needs, so it can be deployed and run directly on a virtualization platform.
Distinguishing Features:
• Machine Image vs. Machine Instance: A machine image serves as a template from which multiple
machine instances can be launched. Each instance is a virtual server created from the image.
• Deployment Objective: Machine imaging primarily focuses on launching machine instances over
virtual infrastructure, while virtual appliances aim to deploy applications over virtual infrastructure.
Open Virtualization Format (OVF):
• Definition: OVF is a packaging standard developed by the DMTF to address portability issues of virtual systems, such as virtual machine images or virtual appliances.
• Purpose: OVF enables cross-platform portability by providing a common packaging format for
virtual systems.
• Delivery: An OVF package consists of one or more files, including an XML descriptor file containing metadata, virtual disk files, and other related data. The package can be delivered either as a set of individual files or as a single OVA (Open Virtual Appliance) file, which is a TAR archive of the whole package.
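Because an OVA file is a TAR archive of the OVF package, its contents can be inspected with ordinary tools. The sketch below uses only the Python standard library; the file name is a placeholder.

import tarfile

def list_ova_contents(ova_path):
    """Print the files bundled inside an OVA package (an OVA is a TAR archive)."""
    with tarfile.open(ova_path, "r") as archive:
        for member in archive.getmembers():
            kind = ("descriptor" if member.name.endswith(".ovf")
                    else "manifest" if member.name.endswith(".mf")
                    else "payload (e.g. a virtual disk)")
            print(f"{member.name:40s} {member.size:>12d} bytes  [{kind}]")

# Usage (placeholder file name):
# list_ova_contents("my-appliance.ova")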
In summary, machine imaging and virtual appliances offer efficient ways to deploy and manage applications
in cloud computing environments, with each serving specific needs and providing unique advantages. The
adoption of standards like OVF further enhances interoperability and portability across different
virtualization platforms.
The key differences between virtual machine images and virtual appliances therefore lie in their contents, their deployment objectives, and whether additional software can be installed after deployment.
• Frontend and Backend: Cloud computing involves client applications at the frontend and the cloud
as the backend, which is composed of multiple layers and abstractions.
• Backend Components: The backend comprises a cloud building platform and cloud services.
• Platform for Cloud: This includes the infrastructure and support system for cloud operations.
• Cloud Services: Comprise infrastructure, platform, and application services, with infrastructure
services being the most common.
Design Characteristics
• Deployment Options: Private, public, hybrid, and community deployments, with three main service
delivery categories: IaaS, PaaS, and SaaS.
• Degree of Abstraction: Increases as one moves upwards along the service levels, with flexibility
decreasing as abstraction increases.
• Simplified Asset Management: Consumers can easily manage computing assets through cloud
service providers, with capacity expansion and provisioning becoming simpler and more agile.
These summaries capture the key points discussed in the text, providing a clearer understanding of cloud
management and programming models.
1. Business Support:
o Involves activities like customer management, contract management, inventory management,
accounting and billing, reporting and auditing, and pricing and rating.
2. Provisioning and Configuration:
o Core activities include rapid provisioning, resource changing, monitoring and reporting,
metering, and SLA management.
3. Portability and Interoperability:
o Focuses on data, application, and platform portability, as well as service interoperability.
• Cloud management tools enhance performance attributes like effectiveness, security, and flexibility.
• They include features like resource configuration, provisioning, policy management, performance
monitoring, cost monitoring, performance optimization, and security management.
• Various vendors offer cloud management solutions tailored to different cloud environments.
• Standards like CIMI (Cloud Infrastructure Management Interface) and OVF (Open Virtualization
Format) simplify cloud management by providing standard APIs and packaging formats.
• These standards enable the development of vendor-independent cloud management tools.
• Responsibilities vary between service consumers and providers across SaaS, PaaS, and IaaS models.
• Consumers have minimal responsibilities in SaaS, moderate in PaaS, and maximum in IaaS.
• Phases include service template creation, SLA definition and service contracts, service creation and
provisioning, service optimization and customization, service maintenance and governance, and
service retirement.
• The lifecycle starts with service creation and ends with service retirement, aiming for optimal
resource usage.
The lifecycle of cloud services has several stages to manage their dynamic nature: it begins with creating service templates and ends with retiring services, aiming for efficient resource usage throughout.
SLA MANAGEMENT
An SLA (Service-Level Agreement) is a legal agreement between a service provider and a consumer or
carrier regarding the quality of services provided. In cloud computing, SLAs define various parameters for
service quality, such as uptime or response time for issue resolution. Before finalizing the SLA, both parties
must clearly identify the Service-Level Objectives (SLOs).
TYPES OF SLA
1. Infrastructure SLA: This covers issues related to the infrastructure, like network connectivity and
server availability. For example, it might guarantee that network packet loss will not exceed 1% in a
month.
2. Application SLA: This focuses on application performance metrics, such as web server latency or
database response time.
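As a small illustration of how SLOs of both kinds can be checked, here is a sketch in Python; the metric names and thresholds are hypothetical examples, not taken from any particular provider's SLA.

# Hypothetical SLOs: metric names, thresholds, and comparison directions are illustrative only.
SLOS = {
    "monthly_packet_loss_pct": ("<=", 1.0),    # infrastructure SLA example
    "web_latency_ms_p95":      ("<=", 300.0),  # application SLA example
    "monthly_uptime_pct":      (">=", 99.9),
}

def check_slos(measurements):
    """Return True/False per SLO for the measured values."""
    results = {}
    for name, (op, threshold) in SLOS.items():
        value = measurements[name]
        results[name] = value <= threshold if op == "<=" else value >= threshold
    return results

# Usage with made-up measurements:
print(check_slos({"monthly_packet_loss_pct": 0.4,
                  "web_latency_ms_p95": 280.0,
                  "monthly_uptime_pct": 99.95}))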
SLA LIFECYCLE
1. Contract Definition: Service providers create SLAs based on standard templates, which can be
customized for individual customers.
2. Publishing and Discovery: Providers publish their service offerings, and consumers search the
service catalog to find suitable options.
3. Negotiation: Both parties negotiate the terms of the SLA before signing the final agreement.
4. Operationalization: Once the SLA is in effect, monitoring, accounting, and enforcement activities
ensure compliance with the agreed-upon terms.
5. Decommissioning: If the relationship between the consumer and provider ends, this phase specifies
the terms for terminating the agreement.
Overall, the SLA lifecycle involves defining, negotiating, implementing, and terminating agreements to
ensure that service quality meets expectations.
In cloud computing, disasters or damage to physical computing resources cause less harm than in traditional setups, because data centers are well protected and replicated across networked sites. Disaster recovery planning involves two key factors:
1. Recovery Point Objective (RPO): This determines the maximum acceptable data loss in case of a
disaster. For example, if the RPO is 8 hours, backups must be made at least every 8 hours.
2. Recovery Time Objective (RTO): This specifies the acceptable downtime for the system in case of a
disaster. For instance, if the RTO is 6 hours, the system must be operational again within 6 hours.
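A minimal sketch of how a plan can be checked against these two objectives (Python; the numbers in the example are hypothetical):

def plan_meets_objectives(backup_interval_h, est_recovery_h, rpo_h, rto_h):
    """RPO: worst-case data loss equals the backup interval.
       RTO: the estimated time to restore service must not exceed the objective."""
    return {
        "meets_rpo": backup_interval_h <= rpo_h,
        "meets_rto": est_recovery_h <= rto_h,
    }

# Example: backups every 6 hours and ~4 hours to restore, against RPO = 8h and RTO = 6h.
print(plan_meets_objectives(backup_interval_h=6, est_recovery_h=4, rpo_h=8, rto_h=6))
# -> {'meets_rpo': True, 'meets_rto': True}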
A good disaster recovery plan should meet the RPO and RTO needs. It involves:
A. Backups and data retention: Regular backups of persistent data are crucial for recovery. Each service provider has its own backup strategy.
B. Geographic redundancy: Storing data in multiple geographic locations increases the chances of survival during disasters.
C. Organizational redundancy: Maintaining backups with alternative service providers safeguards against unexpected shutdowns or failures.
InterCloud, also called cloud federation, is the concept of connecting multiple clouds to share resources and
support each other in case of saturation. It's like a "cloud of clouds" where communication protocols allow
clouds to interact. This federation enables workload flexibility and resource sharing among interconnected
clouds with similar architecture and interfaces.
CLOUD PROGRAMMING: A CASE STUDY WITH ANEKA
Aneka is a cloud platform developed by the University of Melbourne, Australia, and commercialized by
Manjrasoft, an Australian company. It's named after the Sanskrit word for "many in one" because it supports
multiple programming models: task programming, thread programming, and MapReduce programming.
Let's break down the key components and programming models supported by Aneka:
1. Components of Aneka:
o Executors: These are nodes that execute tasks.
o Schedulers: They arrange the execution of tasks across multiple nodes.
o WorkUnits: These are the units of work that make up an application.
o Manager: This is a client component that communicates with the Aneka system.
2. Programming Models:
o Thread Programming:
▪ This model focuses on achieving high performance by allowing multiple threads to
run simultaneously.
▪ In Aneka, it's implemented using distributed threads called Aneka threads.
▪ Developers use APIs similar to .NET's thread class, making it easy to port existing
multi-threaded applications to Aneka.
o Task Programming:
▪ This model considers applications as collections of independent tasks that can be
executed in any order.
▪ Aneka supports this model with APIs like the ITask interface.
▪ Tasks can be bundled and sent to the Aneka cloud for execution.
o MapReduce Programming:
▪ This model, popularized by Google, is used for processing large volumes of data
efficiently.
▪ Aneka's implementation follows the structure of Hadoop, a Java-based framework.
▪ It involves two phases, map and reduce, in which data is filtered, sorted, and summarized (a minimal, generic sketch of these phases follows this list).
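Below is a minimal, generic sketch of the two phases in Python. It is not Aneka's API (Aneka exposes MapReduce through its .NET programming model); it only shows how a map function and a reduce function cooperate to count words.

from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["cloud computing with aneka", "task thread and mapreduce models", "cloud models"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(pairs))   # e.g. {'cloud': 2, 'models': 2, ...}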
Aneka is built on the Microsoft .NET framework and provides tools and APIs for developing .NET
applications. It can be deployed on various cloud platforms, including public clouds like Amazon EC2 and
Microsoft Azure, as well as private clouds. This flexibility allows developers to build hybrid applications
with minimal effort.
Popular cloud services:
Cloud computing has grown enormously over the past decade, with major players like Amazon, Microsoft, and Google leading the way. They offer a wide range of cloud services catering to businesses and individuals alike.
1. Amazon Web Services (AWS): Amazon is a big player in cloud services, especially in
Infrastructure-as-a-Service (IaaS). Their suite of services, AWS, covers everything from computing
to storage to databases.
o Elastic Compute Cloud (EC2): This allows users to create virtual servers in the cloud.
o Simple Storage Service (S3) and Elastic Block Store (EBS): Used for data storage in the cloud.
o Elastic Beanstalk: A platform for developing and deploying applications.
o Relational Database Service (RDS), DynamoDB, and SimpleDB: For managing databases.
o CloudFront: A content delivery network (CDN) service.
2. Amazon Elastic Compute Cloud (EC2): This is Amazon's main cloud computing platform, where
users can create and manage virtual servers with different configurations.
o EC2 uses different instance types like general purpose, compute optimized, memory
optimized, storage optimized, and GPU instances.
o Users can choose the instance type based on their application needs, like balancing between
processor, memory, storage, and network resources.
3. Amazon Storage Systems:
o Amazon Simple Storage Service (Amazon S3): A scalable object storage service for durable data storage.
o Amazon Elastic Block Store (Amazon EBS): Persistent storage designed for use with
Amazon EC2 instances.
o S3 is good for storing data independently from other services, while EBS is specifically for
use with EC2 instances.
4. AWS Elastic Beanstalk: This is Amazon's Platform-as-a-Service (PaaS) offering, allowing users to
develop and deploy applications seamlessly.
5. Database Services of AWS:
o Amazon Relational Database Service (Amazon RDS): A managed relational database
service supporting various database engines like MySQL, PostgreSQL, Oracle, and Microsoft
SQL Server.
o SimpleDB and DynamoDB: Fully managed NoSQL database services catering to different
workload needs.
6. Amazon CDN Service: CloudFront: A content delivery network service for distributing content to
end-users with low latency.
7. Amazon Message Queuing Service: SQS (Simple Queue Service): A fully managed message queuing service for storing messages as they travel between computing nodes.
These services make it easier for businesses and individuals to access computing resources, store data,
deploy applications, and manage databases in the cloud.
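As a concrete illustration of how a few of these AWS services are driven from code, here is a sketch using the boto3 Python SDK. The region, AMI ID, bucket, and queue identifiers are placeholders, and valid AWS credentials (and billing) would be needed to actually execute the calls.

import boto3

REGION = "us-east-1"  # placeholder region

def launch_small_server(ami_id):
    """EC2: start one t2.micro instance from the given machine image and return its ID."""
    ec2 = boto3.client("ec2", region_name=REGION)
    resp = ec2.run_instances(ImageId=ami_id, InstanceType="t2.micro", MinCount=1, MaxCount=1)
    return resp["Instances"][0]["InstanceId"]

def store_backup(bucket, key, data):
    """S3: write an object durably into a bucket."""
    boto3.client("s3", region_name=REGION).put_object(Bucket=bucket, Key=key, Body=data)

def enqueue_order(queue_url, body):
    """SQS: pass a message between loosely coupled application components."""
    boto3.client("sqs", region_name=REGION).send_message(QueueUrl=queue_url, MessageBody=body)

# Usage (all identifiers are placeholders):
# instance_id = launch_small_server("ami-0123456789abcdef0")
# store_backup("my-example-bucket", "backups/notes.txt", b"hello cloud")
# enqueue_order("https://sqs.us-east-1.amazonaws.com/123456789012/orders", "order-123")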
Microsoft Azure is Microsoft's cloud service offering, encompassing various components like Azure Virtual
Machine, Azure Platform, Azure Storage, Azure Database Services, and Azure Content Delivery Network
(CDN). Here's a simplified breakdown:
1. Azure Virtual Machine: Allows users to create scalable virtual machines on-demand. It runs on the
Azure operating system with a layer called the 'Fabric Controller' managing computing and storage
resources. Users can choose between Windows and Linux operating systems.
2. Azure Platform: Combines Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS)
offerings. It provides application servers, storage, networking, and computing infrastructure.
Developers can use the cloud-enabled .NET Framework for Windows-based application
development.
3. Azure Storage: Provides scalable, durable, and highly-available storage for Azure Virtual Machines.
It includes services like Blob storage, Table storage, Queue storage, and File storage. Blob storage is
used for storing large volumes of unstructured data.
4. Azure Database Services: Offers various database services like Azure SQL Database (a managed
relational database service), DocumentDB (a NoSQL document database service), and HDInsight (an
Apache Hadoop solution for Big Data processing).
5. Azure Content Delivery Network (CDN): Enables easy delivery of high-bandwidth content hosted
in Azure by caching blobs and static content at strategically-placed locations for high-performance
content delivery.
6. Microsoft's SaaS Offerings: Includes services like Office 365, OneDrive, and SharePoint Online.
Office 365 is a cloud version of the traditional Microsoft Office suite with additional services like
Outlook, Skype, and OneDrive. OneDrive offers free online storage, while SharePoint Online
facilitates secure collaboration.
In summary, Microsoft Azure provides a comprehensive range of cloud services, including virtual machines,
platform services, storage solutions, database services, content delivery network, and software-as-a-service
offerings like Office 365, OneDrive, and SharePoint Online.
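As a small illustration of using Azure Blob storage from code, here is a sketch with the azure-storage-blob Python package (v12-style client). The connection string, container, and blob names are placeholders.

from azure.storage.blob import BlobServiceClient

def upload_note(conn_str, container, blob_name, data):
    """Store a small piece of unstructured data as a blob."""
    service = BlobServiceClient.from_connection_string(conn_str)
    container_client = service.get_container_client(container)
    container_client.upload_blob(name=blob_name, data=data, overwrite=True)

# Usage (placeholders only; a real storage-account connection string is required):
# upload_note("DefaultEndpointsProtocol=...;AccountName=...;AccountKey=...",
#             "notes", "cloud/unit4.txt", b"hello azure")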
Google Cloud, like other major cloud providers, offers a wide range of services across different categories
like Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
Let's break down the key offerings:
• Google Compute Engine: IaaS for creating and managing virtual machines.
• Google App Engine: PaaS for building and hosting scalable web applications.
• Google Cloud Storage: Durable object storage for unstructured data.
• Cloud SQL and Cloud Datastore: Managed relational and NoSQL database services.
• SaaS offerings: Productivity and collaboration tools such as Gmail, Google Drive, and Google Docs.
In summary, Google Cloud provides a comprehensive suite of cloud services, ranging from infrastructure
and platform solutions to productivity and collaboration tools, catering to the needs of developers,
businesses, and individual users alike.
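A comparable sketch for Google Cloud Storage, using the google-cloud-storage Python client; the bucket and object names are placeholders, and application-default credentials are assumed.

from google.cloud import storage

def upload_note(bucket_name, object_name, text):
    """Store a small text object in a Cloud Storage bucket."""
    client = storage.Client()                 # uses application-default credentials
    bucket = client.bucket(bucket_name)
    bucket.blob(object_name).upload_from_string(text)

# Usage (placeholders only):
# upload_note("my-example-bucket", "notes/unit4.txt", "hello google cloud")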
AWS, Azure, and Google Cloud thus offer broadly comparable compute, storage, database, and content-delivery services.
UNIT-5
Technological Evolution:
Information Systems:
Enterprise Components
Component-Based Structure:
Integration Needs:
Integration Levels:
1. Data Level: Direct data transfer.
2. API Level: Applications publish API libraries.
3. Service Level: Applications publish services (e.g., web services).
4. User Interface Level: Common user interfaces using mashup APIs.
5. Workflow Level: Tasks in one application trigger tasks in others.
Overall Goal:
• EA provides a roadmap for continuous IT adaptation, balancing current needs with future goals.
In-House vs. Cloud: When deciding where to deploy enterprise applications, consider these factors:
• Complexity: Extensive IT infrastructures with many applications and servers are costly and difficult
to manage.
• Server Sprawl: Using many underutilized servers is inefficient. Virtualization allows multiple
applications to share servers, improving efficiency.
Automation:
• Cloud platforms offer automation (e.g., server provisioning), reducing manual efforts and costs.
• Cost Savings: Virtualization and automation in the cloud lead to significant savings.
• Efficiency: Shared resources and automated management enhance IT efficiency.
Future chapters will explore cloud platforms and their technologies in detail.
Partner Model
• Partners can be organizations or individuals.
• Roles: A partner can have multiple roles (customer, supplier, employee).
• Communications: Track interactions with partners through different mechanisms.
13.3 Products
Product Model
Order Model
Quote Model
Work Model
This simplified version highlights the key components and relationships in enterprise applications without
the detailed technical specifications.
13.6 Billing
Billing Process:
13.7 Accounting
Accounting:
Enterprise Processes:
Integration Challenges:
• Integration vs. Development Costs: Weighing integration challenges against high development
costs.
SaaS Considerations:
This simplified version covers the essential points about billing, accounting, enterprise processes, and the
considerations for building vs. buying software and using SaaS.
Custom enterprise applications and Dev 2.0
In the previous chapter, we learned that enterprises have various information needs, which can be managed
using CRM, SCM, or core ERP systems. These systems can be either pre-packaged or custom-built,
depending on specific requirements.
Enterprise applications are made up of different parts called application components. Each component
manages specific tasks like storing data, interacting with users, and executing business logic. These
components are organized into layers, following a design pattern called 'model-view-controller' (MVC). For
example, in a typical application server architecture, the presentation layer deals with user interfaces, the
business logic layer executes computations, and the data access layer interacts with the database.
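A minimal sketch of this layering in Python (all class names, fields, and the in-memory 'database' are hypothetical; real application servers add far more machinery):

class DataAccessLayer:
    """Talks to the database; an in-memory dict stands in for real tables here."""
    def __init__(self):
        self._orders = {}
    def save_order(self, order_id, order):
        self._orders[order_id] = order
    def find_order(self, order_id):
        return self._orders.get(order_id)

class BusinessLogicLayer:
    """Validates input and applies business rules before touching the data layer."""
    def __init__(self, dal):
        self.dal = dal
    def place_order(self, order_id, quantity, unit_price):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        order = {"quantity": quantity, "total": quantity * unit_price}
        self.dal.save_order(order_id, order)
        return order

class PresentationLayer:
    """Renders results for the user; a real system would emit HTML or JSON."""
    def __init__(self, logic):
        self.logic = logic
    def submit_order_form(self, order_id, quantity, unit_price):
        order = self.logic.place_order(order_id, quantity, unit_price)
        return f"Order {order_id} accepted, total = {order['total']:.2f}"

ui = PresentationLayer(BusinessLogicLayer(DataAccessLayer()))
print(ui.submit_order_form("O-1", 3, 19.99))   # Order O-1 accepted, total = 59.97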
In transaction-processing applications, users interact with the system by viewing, entering, and modifying
data. For instance, when entering a new order, data validation and submission happen through layers like
HTML pages and server-side scripts. Traditional web architectures retrieve pages through server requests,
while newer paradigms like AJAX allow for more efficient page updates without full page reloads. These
architectures use controllers to manage navigation and data flow between different layers.
Common patterns for user interfaces include search, results, editing, and creating pages. For example, an
order entry system may have screens for searching orders, viewing results, and editing order details. These
interfaces can be formalized using defined rules and patterns, making it easier to design and develop similar
interfaces for other applications.
Formal models help define user interface behavior, making it easier to reuse patterns and frameworks across
different applications. While technology evolves rapidly, formal models provide a stable foundation for
adapting to new technologies and patterns. However, as new technologies emerge, these models may need to
be updated to accommodate new patterns and features.
In summary, understanding software architecture principles and common UI patterns is crucial for building
scalable enterprise applications, whether using pre-packaged solutions or custom development.
Business Logic:
• Business logic refers to the rules and processes that govern how data is handled within an
application, apart from the user interface.
• It includes tasks like data validation, computation, transaction management, data manipulation, and
calling other methods.
• For example, when a user submits an order form, the business logic validates the order, computes the
total value, manages the transaction, and inserts the order data into the database.
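A compact sketch of such a business-logic function (Python with an in-memory SQLite table; the schema is hypothetical), showing validation, computation, transaction management, and the database insert together:

import sqlite3

def submit_order(conn, order_id, quantity, unit_price):
    """Validate, compute the total, and insert the order inside a single transaction."""
    if quantity <= 0 or unit_price < 0:
        raise ValueError("invalid order data")            # validation
    total = quantity * unit_price                         # computation
    with conn:                                            # transaction: commit on success, rollback on error
        conn.execute("INSERT INTO orders(id, quantity, total) VALUES (?, ?, ?)",
                     (order_id, quantity, total))         # data manipulation
    return total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders(id TEXT PRIMARY KEY, quantity INTEGER, total REAL)")
print(submit_order(conn, "O-42", 2, 10.5))   # 21.0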
Rule-based Computing:
• Rule-based computing uses formal logic to define rules that determine whether certain conditions are
true or false.
• These rules are useful for validation and other computations.
• Rule engines evaluate these rules against facts (data) to determine their truth value.
• Rule engines can be backward-chaining (evaluating specific predicates) or forward-chaining
(determining all true predicates).
• Rule-based systems allow for dynamic addition and modification of rules at runtime.
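A toy forward-chaining evaluation in Python (the rules and facts are hypothetical; production rule engines are far more sophisticated):

# Each rule: (name, condition over the fact dictionary, fact to assert when the rule fires).
RULES = [
    ("large_order",    lambda f: f.get("order_total", 0) > 10000,  ("needs_approval", True)),
    ("preferred_rate", lambda f: f.get("customer_tier") == "gold", ("discount_pct", 5)),
]

def forward_chain(facts):
    """Repeatedly fire rules whose conditions hold until no new facts are derived."""
    facts = dict(facts)
    changed = True
    while changed:
        changed = False
        for name, condition, (key, value) in RULES:
            if condition(facts) and facts.get(key) != value:
                facts[key] = value        # assert the derived fact
                changed = True
    return facts

print(forward_chain({"order_total": 12500, "customer_tier": "gold"}))
# {'order_total': 12500, 'customer_tier': 'gold', 'needs_approval': True, 'discount_pct': 5}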
Logic Maps:
• Some platforms use a visual approach called Logic Map, based on the MapReduce cloud programming paradigm.
• Logic Maps represent business logic functions like create, search, and update as nodes in a graph.
• These nodes manipulate relational records and perform computations without a user interface.
• Logic Maps allow for parallel execution of tasks in cloud environments and handle complex business
logic functions efficiently.
MDA and MDI:
• MDA (Model Driven Architecture) transforms models into code using code generators.
• MDI (Model Driven Interpreter) directly interprets higher-level abstractions at runtime, without generating code.
• In MDI, the application functionality is represented by a meta-model, which is interpreted directly.
• MDA typically involves visual modeling or higher-level language specifications translated into code.
• MDI platforms use a WYSIWYG approach, instantly reflecting changes in the running platform.
• This makes Dev 2.0 platforms 'always on', eliminating compile and build cycles and allowing
immediate testing and use of incremental changes.
2. Error Handling:
• It's about how an application deals with errors and communicates them to users.
• Different types of errors need different handling, from technical errors to business validation failures.
• Error management frameworks ensure errors are reported uniformly and appropriately.
• Errors can be shown immediately or bundled until it's appropriate to inform the user.
• Error handling is crucial for user experience and involves multiple layers of the application
architecture.
3. Transaction Management:
• It's about ensuring data integrity and concurrency control during user interactions.
• Each user interaction should be atomic, especially in multi-user scenarios.
• Optimistic concurrency control is commonly used, where each record has a version number.
• If data changes between read and write attempts, a version number mismatch triggers a transaction
failure.
• Transaction management ensures data consistency and is essential for multi-user applications.
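A minimal sketch of optimistic concurrency control with version numbers (Python; the in-memory 'table' and record shape are hypothetical):

class VersionConflict(Exception):
    pass

# In-memory stand-in for a table: each record carries a version number.
records = {"order-1": {"version": 1, "status": "NEW"}}

def update_record(record_id, read_version, new_status):
    """Apply the update only if nobody has changed the record since it was read."""
    current = records[record_id]
    if current["version"] != read_version:
        raise VersionConflict("record changed since it was read; retry the transaction")
    current["status"] = new_status
    current["version"] += 1               # a successful write bumps the version

snapshot = records["order-1"]["version"]            # users A and B both read version 1
update_record("order-1", snapshot, "APPROVED")      # A writes first: version becomes 2
try:
    update_record("order-1", snapshot, "REJECTED")  # B still holds version 1 -> conflict
except VersionConflict as err:
    print("update rejected:", err)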
All these aspects are considered cross-cutting concerns, meaning they involve multiple layers of the
application architecture. Aspect-oriented programming (AOP) is a technique used to manage such concerns
efficiently.
15.1 Implementing Workflow in an Application: In a simple leave approval process, employees request
leave, which goes to their supervisor for approval. If approved, it's recorded by HR. Employees are notified
of the outcome. The process involves forms for requesting leave, approving it, and viewing the result.
15.2 Workflow Meta-Model Using ECA Rules: The process involves activities like applying for leave, approving it, and viewing results. These activities are performed by different actors such as employees, supervisors, and HR managers. We can model this using Event-Condition-Action (ECA) rules.
15.3 ECA Workflow Engine: A simple workflow engine can be created using ECA rules. It updates
activity lists and sends notifications based on changes in the process. Users can view their pending tasks
through a worklist.
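A toy ECA-style engine for the leave approval flow (Python; the state names are hypothetical and the 'notifications' are simply printed):

# Event-Condition-Action rules for the leave process:
# (event, required current state) -> (new state, notification / worklist entry)
ECA_RULES = {
    ("leave_requested", "NEW"):              ("PENDING_APPROVAL", "worklist: supervisor must approve"),
    ("approved",        "PENDING_APPROVAL"): ("APPROVED",         "notify HR and employee: approved"),
    ("rejected",        "PENDING_APPROVAL"): ("REJECTED",         "notify employee: rejected"),
}

def handle_event(event, request):
    """Fire the rule whose event matches and whose state condition holds; otherwise ignore the event."""
    rule = ECA_RULES.get((event, request["state"]))
    if rule is None:
        print(f"event '{event}' ignored in state {request['state']}")
        return
    new_state, notification = rule
    request["state"] = new_state                   # action 1: advance the process
    print(f"{notification} ({request['id']})")     # action 2: notify / update the worklist

leave = {"id": "LR-7", "state": "NEW"}
handle_event("leave_requested", leave)
handle_event("approved", leave)
print(leave)   # {'id': 'LR-7', 'state': 'APPROVED'}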
15.4 Using an External Workflow Engine: External workflow engines are necessary when transactions
from multiple applications need to be managed. They handle flow updates, notifications, worklists, and
navigation between applications.
15.5 Process Modeling and BPMN: BPMN (Business Process Modeling Notation) is a graphical way to
represent processes. It includes activities, gateways, transactions, and exceptions to model complex
processes more intuitively.
15.6 Workflow in the Cloud: Workflow services are ideal for cloud deployment, as they enable integration
between different applications deployed on-premise or in the cloud. More cloud-based workflow solutions
are likely to emerge in the future.
This section explores enterprise search and analytics, covering motivations, knowledge discovery tasks, business intelligence, data warehousing, OLAP, and parallel OLAP using MapReduce. The underlying theme is leveraging data to uncover hidden patterns, drive business decisions, and optimize operational processes. Let's break down some key points:
Data Classification:
• What it is: Sorting data into different groups based on certain characteristics.
• How it's done: We represent our data in a matrix. Each row represents a data point, and each column
represents a feature. We use techniques like the Singular Value Decomposition (SVD) to simplify
the data and find patterns.
• Example: Imagine we have documents categorized into physics (P), computing (C), and biology (B).
We can use SVD to analyze these documents and classify new ones based on their content.
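A small illustration with NumPy (the tiny document-term matrix below is made up; real collections are far larger and sparser):

import numpy as np

# Rows = documents, columns = term counts for (quantum, algorithm, cell) - made-up data.
X = np.array([
    [4, 1, 0],   # physics document
    [1, 5, 0],   # computing document
    [0, 1, 4],   # biology document
], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
X2 = U[:, :2] @ np.diag(S[:2]) @ Vt[:2, :]     # rank-2 approximation keeps the dominant patterns

# A new document can be classified by projecting it into the reduced 'concept' space
# and comparing it with the known documents there.
new_doc = np.array([3, 2, 0], dtype=float)
print(np.round(X2, 2))
print(np.round(new_doc @ Vt[:2, :].T, 2))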
Computing SVD in Parallel:
• What it is: A method to compute the SVD of large data sets by breaking the computation into pieces.
• How it works: We divide the data into smaller parts and process them in parallel using MapReduce. This speeds up the computation, especially for big data sets.
Clustering Data:
• What it is: Grouping similar data points together without predefined categories, so that natural groupings in the data emerge on their own.
Anomaly Detection:
• What it is: Identifying unusual data points that don't fit with the rest.
• How it's done: We look for data points that are not well-represented by the main patterns in the data.
These outliers could indicate something unusual or suspicious.
• Example: Using SVD, we can identify data points that don't fit well with the main patterns in the
data, potentially highlighting anomalies.
In essence, these methods help us make sense of large amounts of data by simplifying them and finding
patterns or anomalies.
Enterprise search differs from web search in several important ways:
1. Ranking Difficulty: Web search prioritizes popular results, while enterprise search prioritizes accurate ones. In addition, within a company more user information is available to tailor results.
2. Data Structure: Web data is linked explicitly through hyperlinks, while enterprise data is often a
mix of text and structured records in databases, which aren't always linked directly.
3. Security: Enterprise data often has security restrictions, unlike public web data.
4. Data Variety: Web data is mainly text and easily located via URLs, but enterprise data is a mix of
text, structured records, and various formats.
To handle search efficiently, techniques like MapReduce and Latent Semantic Indexing (LSI) are used:
• MapReduce: It efficiently processes search queries in parallel, breaking down tasks into smaller
chunks handled by different processors.
• LSI: It uses a mathematical method called Singular Value Decomposition (SVD) to understand the
context of search terms better, capturing synonyms and polysemous words.
For structured data, traditional SQL queries have limitations, so there's growing interest in applying search
technology to databases:
• Challenges: SQL queries may not cover all search needs, especially when dealing with multiple
databases or cloud platforms that don't support joins.
• Solutions: Techniques like indexing and parallel processing can help search structured data
efficiently, though incorporating relationships between data items without explicit joins is an
ongoing area of research.
Overall, the goal is to make searching within a company as seamless and effective as searching the web,
even with the added complexities of enterprise data.
We've discussed various cloud computing technologies and their impact on enterprise software needs,
mainly focusing on major providers like Amazon, Google, and Microsoft. However, there's more to the
cloud ecosystem beyond these giants. Emerging technologies complement public clouds, enable
interoperability with private data centers, and facilitate private cloud creation within enterprises.
1. Public Cloud Providers: These offer Infrastructure as a Service (IaaS) and Platform as a Service
(PaaS) solutions. Examples include Amazon EC2 (IaaS) and Google App Engine (PaaS). Some
traditional hosting providers are also entering the cloud space.
2. Cloud Management Platforms and Tools: Tools like RightScale and EnStratus help manage
complex cloud deployments. They offer features like infrastructure configuration, monitoring, and
load balancing.
3. Tools for Building Private Clouds: Enterprises are interested in private clouds, combining
virtualization, self-service infrastructure, and dynamic monitoring. Tools like VMware and 3tera
assist in this. Grid computing technologies are also used in high-performance computing
environments.
Examples:
• Eucalyptus (IaaS): An open-source framework for building private clouds. It mimics Amazon
EC2's infrastructure and provides insights into cloud architectures.
• AppScale (PaaS on IaaS): Mimics Google App Engine's platform on an IaaS platform. It allows
scalable deployment of GAE-like environments on platforms like EC2 or Eucalyptus.
These tools simplify cloud deployment and management, catering to diverse enterprise needs.