Cloud Computing
Logical Network Perimeter
• Description: A virtual boundary created around a cloud environment to secure and isolate
cloud resources. It is implemented through firewalls, VPNs, and VLANs to control access and
ensure that only authorized entities can interact with the cloud infrastructure.
• Purpose: Enhances security and compliance by segregating data and restricting access.
Key features
• The Logical Network Perimeter is a virtual boundary that defines
and protects the digital resources of a network. Unlike a physical
network perimeter, which is tied to physical devices such as routers
or firewalls, a logical perimeter is defined in software and can span
multiple locations and cloud environments.
• The key features of a Logical Network Perimeter are
– Access Control: Policies to regulate who can access the network.
– Segmentation: Dividing the network into smaller, secure segments.
– Encryption: Protecting data in transit within and outside the perimeter.
– Virtual Boundaries: Implemented via software-defined networking (SDN),
virtual private networks (VPNs), or cloud-based solutions.
Components of LNP
• Firewalls:
– Monitor and control incoming and outgoing traffic based on predefined rules.
– Types include network firewalls (e.g., AWS Network Firewall) and web application firewalls (e.g., Azure WAF).
• Virtual Private Networks (VPNs):
– Secure communication by encrypting data transmitted between on-premises environments and cloud services.
– Examples include AWS Client VPN and Azure VPN Gateway.
• Gateways:
– Internet Gateways: Enable cloud resources to communicate with the internet securely.
– NAT Gateways: Allow instances without public IPs to access external services while keeping them private.
– API Gateways: Manage and secure API traffic.
• Identity and Access Management (IAM):
– Enforces authentication and authorization for accessing resources.
• Intrusion Detection and Prevention Systems (IDPS):
– Detect and block malicious activity within the cloud network.
• Network Segmentation Tools:
– Virtual LANs (VLANs), security groups, and subnets are used to segregate traffic.
• Cloud Security Posture Management (CSPM):
– Monitors compliance and configuration of network perimeters.
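As a minimal sketch of how these components enforce a perimeter in practice, the following Python snippet models default-deny, rule-based access control in the style of a cloud security group. The rule set, CIDR ranges, and field names are illustrative assumptions, not a real provider API.

```python
# Sketch of logical-perimeter access control: inbound traffic is allowed only
# if it matches an explicit rule (default deny), mirroring how security groups
# and firewall rules segment a virtual network. Rules here are hypothetical.
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class Rule:
    cidr: str        # source range allowed to connect
    port: int        # destination port
    protocol: str    # "tcp" or "udp"

# Hypothetical rules for a private subnet: SSH from an admin VPN, HTTPS from anywhere.
RULES = [
    Rule("10.8.0.0/16", 22, "tcp"),
    Rule("0.0.0.0/0", 443, "tcp"),
]

def is_allowed(src_ip: str, port: int, protocol: str) -> bool:
    """Default-deny: traffic passes only if some rule explicitly matches."""
    return any(
        ip_address(src_ip) in ip_network(r.cidr)
        and port == r.port
        and protocol == r.protocol
        for r in RULES
    )

print(is_allowed("10.8.1.5", 22, "tcp"))     # True  (admin VPN -> SSH)
print(is_allowed("203.0.113.7", 22, "tcp"))  # False (internet -> SSH blocked)
```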
Purpose
• Access Control: Regulates which users, devices, or services can
interact with network resources.
• Data Security: Protects sensitive information from
unauthorized access.
• Threat Mitigation: Identifies, monitors, and blocks malicious
traffic or activities.
• Compliance: Helps organizations meet industry standards and
regulations (e.g., HIPAA, PCI DSS).
Security Implications
• Enhanced Security Posture:
– Reduces exposure to threats by segmenting networks and applying least-
privilege access.
• Compliance with Standards:
– Helps align with industry-specific security frameworks and regulations.
• Dynamic Threat Mitigation:
– Real-time monitoring and response to malicious activities.
• Challenges:
– Misconfigurations in cloud settings can lead to vulnerabilities.
– Insider threats or compromised accounts may bypass logical perimeters.
Examples of Implementation of LNP in Cloud Infrastructure
Types of Hypervisors
1. Type 1 (Bare-Metal Hypervisors)
– Runs directly on the host's hardware.
– Does not require an underlying operating system.
– Provides better performance and efficiency due to direct access to hardware.
– Common in data centers and enterprise environments.
– Examples:
• VMware ESXi, Microsoft Hyper-V, Xen
• KVM (Kernel-based Virtual Machine)
2. Type 2 (Hosted Hypervisors)
– Runs on top of an existing operating system.
– Easier to set up and use but may have overhead since it relies on the host OS.
– Common for desktop and personal use.
– Examples:
• VMware Workstation, Oracle VirtualBox, Parallels Desktop
• QEMU (can also act as a Type 1 in certain configurations)
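As an illustration, a Type 1 hypervisor such as KVM can be managed programmatically. The sketch below assumes the libvirt-python bindings and a local KVM/QEMU host, and simply lists the virtual machines the hypervisor knows about.

```python
# Hedged sketch: query a KVM/QEMU hypervisor through the libvirt Python
# bindings (assumes libvirt-python is installed and a local KVM host exists).
import libvirt

conn = libvirt.open("qemu:///system")  # connect to the system-level hypervisor
try:
    for dom in conn.listAllDomains():  # every defined VM, running or not
        state, _reason = dom.state()
        print(dom.name(), "running" if state == libvirt.VIR_DOMAIN_RUNNING else "stopped")
finally:
    conn.close()
```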
Functions of a Hypervisor
Aspect | Physical Server | Virtual Server
Resource Usage | Dedicated to a single purpose or workload. | Shares physical resources with other VMs.
Flexibility | Limited scalability; hardware upgrades required. | Highly scalable with configurable resources.
Cost | Higher upfront costs for hardware. | Lower costs due to shared infrastructure.
Fault Tolerance | Requires additional hardware for redundancy. | Enhanced through snapshots and failovers.
Benefits of Virtual Servers in Cloud Computing
• Cost Efficiency:
– Reduces hardware investment by sharing physical resources.
• Scalability and Flexibility:
– Adjust resources as needed to match demand.
• Improved Disaster Recovery:
– Snapshots and backups allow quick recovery.
• Resource Optimization:
– Maximizes hardware utilization.
• Ease of Management:
– Centralized management tools simplify deployment and monitoring.
• Portability:
– Virtual servers can be easily migrated across environments.
Case Studies of Virtual Server Usage
a. Block Storage
• Definition: Provides raw storage blocks to be formatted by the user, similar to hard
drives. It works at the block level.
• Use Cases: Databases, Virtual Machine Disks, High-performance workloads
• Examples: Amazon Elastic Block Store (EBS), Google Persistent Disks, Microsoft Azure
Disk Storage
• Advantages:
– Low latency
– High IOPS (Input/Output Operations Per Second)
– Granular control over data
• Disadvantages:
– No metadata support
– Typically more expensive than other options
b. Object Storage
• Definition: Data is stored as objects, each containing the data, metadata, and a
unique identifier.
• Use Cases: Media storage, Archival and backup, Static website hosting
• Examples: Amazon S3, Google Cloud Storage, Azure Blob Storage
• Advantages:
– Scalability
– Supports metadata and retrieval by unique IDs
– Cost-effective for large amounts of unstructured data
• Disadvantages:
– High latency compared to block storage
– Not ideal for transactional data
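As a hedged example of how object storage differs from block storage, the snippet below uses the AWS SDK for Python (boto3) to store an object together with user-defined metadata, then reads that metadata back without downloading the data. The bucket name, key, and credentials are assumed placeholders.

```python
# Object storage sketch with boto3: each object carries data, metadata, and a
# unique key. Assumes AWS credentials are configured and the bucket exists.
import boto3

s3 = boto3.client("s3")

# Store an object along with custom metadata.
s3.put_object(
    Bucket="example-media-bucket",
    Key="reports/2024/usage.csv",
    Body=b"timestamp,bytes_used\n2024-01-01,1048576\n",
    Metadata={"department": "finance", "retention": "7y"},
)

# Retrieve the metadata without downloading the object body.
head = s3.head_object(Bucket="example-media-bucket", Key="reports/2024/usage.csv")
print(head["Metadata"])        # {'department': 'finance', 'retention': '7y'}
print(head["ContentLength"])   # size of the stored object in bytes
```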
What is the difference between block storage and object storage?
• Cloud file storage is a method for storing data in the cloud that provides servers and
applications access to data through shared file systems. This compatibility makes cloud file
storage ideal for workloads that rely on shared file systems and provides simple integration
without code changes.
• What is a cloud file system?
• A cloud file system is a hierarchical storage system in the cloud that provides shared access
to file data. Users can create, delete, modify, read, and write files, as well as organize them
logically in directory trees for intuitive access.
• What is cloud file sharing?
• Cloud file sharing is a service that provides simultaneous access for multiple users to a
common set of files stored in the cloud. Security for online file storage is managed with user
and group permissions so that administrators can control access to the shared file data.
How does cloud file storage help with collaboration?
• Cloud file storage allows team members to access, view, and edit the same files simultaneously
and in near real time, from virtually any location. Edits are visible to users or groups as they
are made, and changes are synced and saved so that users or groups see the most recent version
of the file. Collaboration through cloud file sharing offers many benefits:
– Work together and achieve shared goals, even with remote members.
– Schedule work flexibly by sharing tasks between collaborators in different time zones.
– Share and edit large files, like video or audio files, with ease.
– Receive notifications when files are edited or updated in real time.
– Share ideas or suggestions by leaving comments on shared files.
• What are the use cases for cloud file storage?
• Cloud file storage provides the flexibility to support and integrate with existing applications, plus the ease
of deploying, managing, and maintaining all your files in the cloud. These two key advantages give organizations
the ability to support a broad spectrum of applications and verticals. Use cases such as large content
repositories, development environments, media stores, and user home directories are ideal workloads for
cloud-based file storage.
Cloud-Based Storage Solutions
• While both terms are often used in the same conversations, this isn’t an either/or
decision. Data backups and redundancy offer two different and equally valuable ways of
ensuring business continuity in the face of unplanned accidents, unexpected attacks, or
system failures.
• Redundancy is designed to increase operational uptime, boost staff productivity, and
reduce the amount of time that a system is unavailable due to a failure.
• Backup, on the other hand, is designed to kick in when something goes wrong, allowing
you to completely rebuild no matter what caused the failure.
• In short, redundancy prevents failure while backups prevent loss. In a modern business
environment that is inherently dependent on access to large volumes of data, it’s clear
that operational redundancy and backups are both critical elements of an effective
continuity strategy.
• As interchangeable as they may seem, backup and redundancy have two distinct
meanings, and both play an important role.
Data Backup and Redundancy Techniques
Ensuring data durability and availability is crucial in cloud storage.
a. Data Backup Techniques
– Snapshot Backups: Periodic point-in-time backups of block storage volumes.
– Incremental Backups: Storing only the changes made since the last backup.
– Cloud-to-Cloud Backups: Backing up data across multiple cloud platforms.
b. Redundancy Models
– Replication:
• Cross-region replication for disaster recovery (e.g., Amazon S3 CRR).
• High availability through multi-zone replication.
– RAID-Like Architectures (Redundant Array of Independent Disks):
• Some storage solutions mimic RAID for distributed data.
c. Disaster Recovery Plans
– Ensuring data is available in case of a failure or disaster.
– Testing recovery strategies periodically.
d. Durability Guarantees
– Many providers offer "11 nines" (99.999999999%) durability for object storage (e.g., Amazon S3).
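As a rough worked illustration (not a statement of any provider's guarantee), "11 nines" of annual durability implies an expected loss rate so small that even very large object counts rarely lose data:

```python
# Back-of-the-envelope durability arithmetic (illustrative only).
durability = 0.99999999999        # 99.999999999% annual durability
p_loss = 1 - durability           # ~1e-11 chance a given object is lost in a year
objects = 10_000_000              # e.g. ten million stored objects

expected_losses_per_year = objects * p_loss
print(expected_losses_per_year)   # ~0.0001 -> roughly one object lost every 10,000 years
```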
What is RAID?
RAID 0 (Striping)
• RAID 0 distributes data across all disks in the array, splitting it into blocks and writing each block to a
separate disk.
• Advantages: increased speed of operation due to parallel reads and writes across the disks.
• Disadvantages: lack of redundancy — failure of one disk will result in the loss of all data in the disk
array.
• Application: Used in tasks where high performance is a priority, for example, video processing or
working with large files.
RAID 1 (Mirroring)
• RAID 1 creates a copy of the data on each disk, ensuring its safety.
• Advantages: high fault tolerance - if one disk fails, all information remains on the second.
• Disadvantages: Reduces available storage space by half.
• Examples of Use: used in systems where high reliability is required, for example, in banking systems.
RAID 2
• This configuration uses striping across disks, with some disks
storing error checking and correcting (ECC) information.
• RAID 2 also uses a dedicated Hamming code parity, a linear form of
ECC.
• RAID 2 has no advantage over RAID 3 and is no longer used.
RAID 3
• This technique uses striping and dedicates one drive to storing parity information. The
embedded ECC information is used to detect errors.
• Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other
drives. Because an I/O operation addresses all the drives at the same time, RAID 3 cannot
overlap I/O.
• For this reason, RAID 3 is best for single-user systems with long record applications
RAID 4
• This level uses large stripes, which means a user can read
records from any single drive.
• Overlapped I/O can then be used for read operations. Because
all write operations are required to update the parity drive, no
I/O overlapping is possible
RAID 5 (Striping with Distributed Parity)
• RAID 5 distributes data and parity across disks, providing a balance between speed and reliability.
• This level is based on parity block-level striping. The parity information is striped across each drive, enabling the array
to function, even if one drive were to fail.
• The array's architecture enables read and write operations to span multiple drives. This results in performance better
than that of a single drive, but not as high as a RAID 0 array. RAID 5 requires at least three disks.
• Advantages: effective combination of performance, reliability, and disk space utilization.
• Disadvantages: recovery after a failure can take time and reduce performance.
• Application: widely used on servers where both performance and protection are important.
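To make the parity idea concrete, here is a minimal sketch (plain Python, not a real RAID controller) showing how an XOR parity block rebuilds the contents of a failed disk from the surviving disks:

```python
# Parity-based recovery as used conceptually in RAID 5: the parity block is the
# XOR of the data blocks, so XOR-ing the survivors with the parity restores the
# missing block.
def xor_blocks(*blocks: bytes) -> bytes:
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

disk1 = b"AAAA"
disk2 = b"BBBB"
disk3 = b"CCCC"
parity = xor_blocks(disk1, disk2, disk3)   # stored on another disk (rotated across disks in RAID 5)

# Disk 2 fails: reconstruct its block from the remaining disks plus parity.
recovered = xor_blocks(disk1, disk3, parity)
print(recovered == disk2)                  # True
```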
RAID 6 (Striping with Double Parity)
• RAID 6 is similar to RAID 5 but uses double parity information, allowing it to withstand the failure of
two disks simultaneously. RAID 6 requires an array of 4 or more disks to operate.
• Advantages: higher reliability compared to RAID 5.
• Disadvantages: higher disk overhead, as two disks' worth of capacity is used for parity; slower write
performance than RAID 5 arrays because of the double-parity calculation.
• Where it is used: ideal for databases and critical applications.
RAID 10 (RAID 1+0)
• RAID 10 combines the methods of RAID 1 and RAID 0, creating mirrored pairs of disks with data
distributed across all pairs.
• Advantages: fast operation and data protection.
• Disadvantages: high disk costs, as half of the capacity is used for duplication.
• Examples of Use: ideal for databases and critical applications.
Other Nested RAID
• RAID 01 (RAID 0+1). RAID 0+1 is similar to RAID 1+0, except the data
organization method is slightly different. Rather than creating a mirror
and then striping it, RAID 0+1 creates a stripe set and then mirrors the
stripe set.
• RAID 03 (RAID 0+3, also known as RAID 53 or RAID 5+3). This level
uses striping in RAID 0 style for RAID 3's virtual disk blocks. This offers
higher performance than RAID 3, but at a higher cost.
• RAID 50 (RAID 5+0). This configuration combines RAID 5 distributed
parity with RAID 0 striping to improve RAID 5 performance without
reducing data protection.
BENEFITS OF RAID
• Improved cost-effectiveness because lower-priced disks are used in large
numbers.
• Using multiple hard drives enables RAID to improve the performance of a single
hard drive.
• Increased computer speed and reliability after a crash, depending on the
configuration.
• Reads and writes can be performed faster than with a single drive with RAID 0.
This is because a file system is split up and distributed across drives that work
together on the same file.
• There is increased availability and resiliency with parity levels such as RAID 5, and with
mirroring two drives can contain the same data, ensuring one will continue to work if the
other fails.
Cloud Usage Monitor
• A cloud usage monitor is a mechanism that tracks, analyzes, and logs resource
usage within a cloud environment in order to optimize utilization and minimize costs.
• It provides visibility into resource usage patterns such as CPU usage, memory usage,
network traffic, and storage usage.
• It enables businesses to identify underutilized resources and make the necessary
adjustments to optimize performance and reduce expenses.
• It provides detailed metrics on resource consumption, user activities, and
performance.
• Cloud usage monitoring is implemented via automated monitoring software, giving
centralized visibility and control over the cloud infrastructure.
• Purpose:
– Enables billing, performance optimization, and resource management.
– By using cloud usage monitoring tools, businesses can stay ahead of performance issues,
identify areas for improvement, and maintain optimal cloud performance.
Roles of Usage Monitoring in Cloud
• Key Roles:
– Performance Management: Ensures workloads meet performance requirements by monitoring key metrics
(e.g., CPU, memory, Network and Disk I/O).
– Cost Optimization: Identifies underutilized resources to reduce unnecessary expenses.
– Troubleshooting: Provides insights into system behavior to diagnose and resolve issues quickly.
– Scalability: Helps predict demand trends to provision resources dynamically.
– Compliance(Log Analysis & Regular Audit): Tracks resource usage for audit, error logs for debugging and
regulatory compliance.
– Health Checks:
• Regular checks on system and service status (e.g., HTTP responses, latency).
– Alerts and Notifications:
• Automated alerts for predefined thresholds or abnormal behavior.
• Cloud monitoring assesses three main areas: Performance Monitoring, Security
Monitoring, and Compliance Monitoring. Each area is critical for managing the health,
security, and regulatory compliance of cloud services.
Benefits of Monitors in Cloud Computing
• a. AWS CloudWatch
• A monitoring and observability service.
• Features:
– Tracks metrics like CPU utilization, disk I/O, and network activity.
– Logs management via Amazon CloudWatch Logs.
– Alerts and automated responses using CloudWatch Alarms.
– Dashboards for visualization.
• b. Azure Monitor
• A comprehensive monitoring service for Azure resources.
• Features:
– Tracks metrics and logs for VMs, databases, and web apps.
– Integrated with Azure Alerts for incident response.
– Application Insights for performance monitoring of applications.
• c. Google Cloud Operations Suite (formerly Stackdriver)
• Features:
– Tracks resource metrics, logs, and traces.
– Real-time monitoring with customizable dashboards.
• d. Third-Party Tools
• Datadog: Unified monitoring across multi-cloud environments.
• New Relic: Application performance monitoring (APM) and infrastructure monitoring.
• SolarWinds: Hybrid IT monitoring with deep analytics.
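As a hedged example of configuring one of these tools programmatically, the snippet below uses boto3 to create a CloudWatch alarm (item a above) that fires when average CPU utilization stays above 80% for two consecutive 5-minute periods. The instance ID and SNS topic ARN are placeholders.

```python
# Create a CloudWatch alarm with boto3 (assumes AWS credentials and an existing
# EC2 instance; the instance ID and SNS topic ARN below are placeholders).
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-example",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                      # evaluate 5-minute averages
    EvaluationPeriods=2,             # two consecutive breaches before alarming
    Threshold=80.0,                  # percent CPU
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],
)
```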
Storage Usage Metrics to Monitor
• Monitoring the usage of storage devices is a critical aspect of managing
the infrastructure in cloud computing. It helps in optimizing performance,
managing costs, ensuring data availability, and detecting potential issues.
Key metrics for monitoring cloud storage include:
– Capacity Usage: The amount of used vs. available storage.
– I/O Operations: Read/write operations per second (IOPS).
– Latency: Time taken to perform storage operations.
– Throughput: Data transfer rate during storage operations.
– Error Rates: Frequency of storage-related errors.
– Snapshot and Backup Sizes: For disaster recovery and compliance.
Metrics Tracked in Cloud Monitoring
a. Real-Time Monitoring
– Provides live insights into resource usage.
– Useful for:
• Immediate issue detection and resolution.
• Dynamic scaling based on demand spikes.
– Tools: AWS CloudWatch Live Metrics, Azure Monitor Live Metrics.
b. Historical Monitoring
– Stores past usage data for trend analysis and capacity planning.
– Useful for:
• Identifying usage patterns.
• Forecasting future resource needs.
– Tools: Google Cloud Logging, AWS CloudWatch Logs
Benefits for Resource Optimization and Cost Control
a. Resource Optimization
– Ensures resources are neither over- nor under-provisioned.
– Identifies idle or underutilized resources for rightsizing.
– Helps with workload placement for optimal performance.
b. Cost Control
– Detects and eliminates unused resources (e.g., unattached volumes).
– Tracks cost-related metrics to avoid budget overruns.
– Enables cost allocation by tagging resources for usage accountability.
c. Improved Application Performance
– Detects performance bottlenecks (e.g., high CPU usage).
– Ensures applications run smoothly under varying loads.
d. Proactive Issue Resolution
– Triggers alerts for potential problems (e.g., memory thresholds).
– Reduces downtime through faster incident response.
Resource Replication
• Synchronous replication simultaneously writes data to two systems instead of one. Asynchronous replication
writes data to the primary storage array first and then copies data to the replica.
• Synchronous replication ensures data consistency but may introduce latency and performance issues due to
waiting for acknowledgments. On the other hand, asynchronous replication offers better performance but risks
potential data loss during failures.
• Synchronous replication is the process of copying data over a network to create multiple current copies of the
data. Synchronous replication is mainly used for high-end transactional applications that need instant failover if
the primary node fails.
• The benefits of asynchronous replication:
– Asynchronous replication requires substantially less bandwidth than synchronous replication.
– It is designed to work over long distances.
– Asynchronous replication can tolerate some degradation in connectivity.
• In contrast, synchronous replication allows failover from primary to secondary data storage to occur nearly
instantaneously, ensuring little to no application downtime. However, as noted above, it requires LAN-class
bandwidth between the servers (possibly an extended LAN across two geographically remote sites) and
may also require specialized hardware, depending on the implementation.
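The following sketch (a toy in-memory model, not tied to any specific storage product) contrasts the two approaches: a synchronous write returns only after the replica is updated, while an asynchronous write returns immediately and a background worker applies the change to the replica.

```python
# Toy model of synchronous vs. asynchronous replication between two "stores".
import queue
import threading

primary: dict[str, bytes] = {}
replica: dict[str, bytes] = {}
_pending: "queue.Queue[tuple[str, bytes]]" = queue.Queue()

def write_sync(key: str, value: bytes) -> None:
    primary[key] = value
    replica[key] = value           # caller waits for the replica write (adds latency)

def write_async(key: str, value: bytes) -> None:
    primary[key] = value
    _pending.put((key, value))     # caller returns at once; replica may lag (risk of loss)

def _replicator() -> None:
    while True:
        key, value = _pending.get()
        replica[key] = value
        _pending.task_done()

threading.Thread(target=_replicator, daemon=True).start()

write_sync("a", b"1")              # replica is current immediately
write_async("b", b"2")             # replica catches up shortly afterwards
_pending.join()                    # wait for background replication in this demo
print(primary == replica)          # True once the queue drains
```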
How does resource replication occur?
The process of resource replication typically involves several steps:
• Data Distribution: When data is stored in the cloud, it's often replicated across
multiple servers or data centers. This distribution ensures that if one server fails, the
data can still be accessed from another location without interruption.
• Automatic Replication: Cloud platforms often have built-in mechanisms for
automatic replication. When a file or piece of data is uploaded to the cloud, it's
automatically replicated to multiple locations according to predefined replication
policies set by the cloud provider or user.
• Load Balancing: Resource replication also involves load balancing mechanisms to
evenly distribute workloads across replicated resources. This ensures optimal
performance and prevents any single server from becoming overwhelmed with
requests.
How does resource replication occur?
• Synchronization: To ensure consistency across replicated resources, synchronization
mechanisms are employed. Changes made to data or applications in one location are
synchronized with all other replicated instances in real-time or at defined intervals.
• Failover and Disaster Recovery: Replication plays a crucial role in failover and disaster recovery
scenarios. If one server or data center experiences a failure, traffic can be rerouted to
replicated resources, minimizing downtime and ensuring continuity of service.
• Geographical Distribution: Cloud providers often replicate resources across multiple
geographic regions to improve performance and provide resilience against natural disasters or
regional outages.
Overall, resource replication in cloud computing is essential for maintaining high availability,
reliability, and resilience in the face of hardware failures, network issues, or other disruptions.
By distributing data and services across multiple locations, cloud providers ensure that users
can access their resources consistently and without interruption.
Ready-Made Environment
• Key Components:
– Redundancy:
• Duplicate resources to act as backups.
• Can be active-passive or active-active.
– Automatic Detection:
• Monitoring systems identify failures in real-time.
– Failover Mechanism:
• Seamless redirection of traffic or workloads to backup resources.
– Recovery Time Objective (RTO) and Recovery Point Objective (RPO):
• Metrics to ensure minimal downtime and data loss.
Importance of Failover Mechanisms in System Design
• Failover is a crucial component of system design, particularly in settings where reliability and uptime are critical.
Failover is important for the following reasons:
– High Availability: In the event that a system or component fails, failover makes sure that services continue to be
offered. This is essential for systems like banking systems, emergency services, and e-commerce platforms that must
be available around-the-clock.
– Redundancy: In the event of a breakdown, failover systems offer redundancy by having backup parts or resources
prepared to take over. The possibility that a single point of failure will bring down the entire system is reduced by this
redundancy.
– Fault Tolerance: By automatically identifying faults and rerouting workload or traffic to healthy components, failover
techniques enhance fault tolerance. This lessens the effect that malfunctions have on the system as a whole.
– Disaster Recovery: Failover is a crucial part of any disaster recovery strategy. Failover techniques aid in the
prompt restoration of services and reduction of downtime in the event of a disaster, such as a hardware malfunction,
network outage, or natural disaster.
– Business Continuity: By reducing downtime and guaranteeing the continued availability of vital services, failover
assures business continuity. This is especially crucial for companies whose operations significantly depend on their IT
infrastructure.
– Customer Satisfaction: Higher customer satisfaction is a result of dependable services. By preserving service
dependability and availability, failover techniques make sure that users may continue to access the services they
require.
Failover Strategies
• Active-Active
– In an active-active setup, redundant IT resource implementations actively and synchronously support
the workload, and load is balanced among the active instances. When a failure is detected, the failed
instance is removed from the load-balancing scheduler and its processing is transferred to the IT
resources that remain active.
• Active-Passive
– A standby or inactive implementation is triggered in an active-passive setup to take over processing
from an IT resource that becomes unavailable, and the associated workload is directed to the instance
taking over the operation.
– Some failover systems are designed to reroute workloads to operational IT resources, relying on
specialized load balancers to detect failure conditions and remove failed IT resource instances from
the workload distribution. This kind of failover system is appropriate for IT resources that support
stateless processing and don’t need execution state management. Where state must be preserved, the
redundant or standby IT resource implementations are required to share their state and execution
context, in technology architectures that are often based on clustering and virtualization technologies.
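As a minimal sketch of the active-passive idea (hypothetical endpoints and thresholds; real deployments use load balancers or DNS failover, as on the following slides, rather than an in-process loop), the monitor below probes the active endpoint and redirects to the standby after repeated health-check failures:

```python
# Active-passive failover sketch: a health check watches the active endpoint
# and switches to the standby when it stops responding. Endpoints are placeholders.
import time
import urllib.request

PRIMARY = "http://primary.internal/health"
STANDBY = "http://standby.internal/health"
active = PRIMARY

def healthy(url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor(interval: float = 5.0, failures_before_switch: int = 3) -> None:
    """Run in a background thread in practice; switches traffic to the standby."""
    global active
    failures = 0
    while True:
        if healthy(active):
            failures = 0
        else:
            failures += 1
            if active == PRIMARY and failures >= failures_before_switch:
                active = STANDBY   # failover: redirect workload to the standby
                failures = 0
        time.sleep(interval)
```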
Failover Systems in Computing
• Benefits:
– Minimizes downtime and disruptions.
– Ensures business continuity.
– Meets compliance and SLA requirements.
• Challenges and Best Practices
• Challenges:
– Complex setup for large-scale systems.
– Balancing cost and performance.
– Ensuring proper testing and validation.
• Best Practices:
– Implement regular failover testing.
– Use multi-region or multi-zone deployments.
– Monitor failover systems themselves.
– Optimize cost by scaling resources dynamically.
Failover Systems in Cloud Computing
• Cloud Computing Platforms:
– Cloud computing platforms implement failover mechanisms to maintain uptime and availability of virtualized resources and services.
• Approaches:
– DNS Failover:
• Uses DNS to route traffic to healthy resources.
• Example: AWS Route 53, Cloudflare.
– Load Balancer Failover:
• Distributes traffic across multiple instances and redirects when an instance fails.
• Example: AWS Elastic Load Balancer, Azure Load Balancer.
– Cluster-based Failover:
• Utilizes clustered servers or containers to ensure high availability.
• Example: Kubernetes, VMware vSphere.
– Backup and Restore:
• A simpler failover strategy involving restoring from backups.
– Hypervisor-based failover mechanisms, such as VMware HA (High Availability) and Microsoft Hyper-V Replica, automatically restart virtual machines on healthy hosts in case of host
failures.
– Load balancers and DNS-based failover solutions distribute incoming traffic across multiple servers or data centers and redirect traffic to healthy instances during failures or performance
degradation.
• Conclusion
Monitoring and failover systems are integral to the resilience of cloud-based infrastructures. By combining robust
monitoring tools with a reliable failover mechanism, organizations can ensure high availability and maintain service
continuity even during unexpected disruptions. Monitoring and failover systems are critical components of cloud
computing infrastructures, ensuring reliability, availability, and seamless service delivery.
Thank You!