Cloud Computing Notes 1
Course Description:
1. Overview of distributed computing:
trends of computing, introduction to distributed computing, Cloud
Computing
2. Distributed systems:
- Data centers,
- Virtualization, synchronization, and replication
- Web 2.0,
- Service- and utility-oriented computing.
- Architectures of DS
- Communication in DS
8. Cloud applications:
Healthcare and biology. Geo-science. Business and consumer
applications.
1. Trends of Computing:
• Cloud Computing:
Modern trend providing on-demand, scalable, utility-based services.
- People need DC because they need high performance; through it they want
to achieve:
1/ Parallelism: access to a lot of CPUs, memory, storage, and data (see the
sketch after this list).
2/ Fault tolerance: if one computer fails, others can take over its tasks.
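As a minimal sketch of the parallelism idea (not part of the course material), the example below uses Python's standard multiprocessing module to spread independent work across CPU cores; the square() workload and the input range are placeholders chosen for illustration.

```python
# Minimal parallelism sketch: spread independent tasks across CPU cores.
# The square() workload and the input range are hypothetical placeholders.
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool() as pool:                 # one worker process per core by default
        results = pool.map(square, range(10))
    print(results)                       # [0, 1, 4, ..., 81]
```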
Conclusion:
Distributed computing has transformed the way we build and deploy
software systems by enabling efficient resource utilization, increased
system reliability, and enhanced scalability.
1. Definition:
2. Primary Goals
o Service Continuity: The system should keep operating even if
some nodes or network links fail.
o Consistency & Coordination: Ensuring that data and state remain
consistent and synchronized across nodes.
o Scalability & Transparency: Ability to scale up by adding nodes
while hiding the complexity from end users.
3. Difference between DS and DC:
• Distributed Systems: Emphasis on the system as a whole, ensuring
reliability, coordination, and consistency across nodes.
• Distributed Computing: Emphasis on dividing and conquering
computational tasks and optimizing for parallel execution.
Examples
• Distributed Systems:
Social Media Platforms that store and serve billions of user posts
from data centers worldwide.
Cloud-Based services where each service runs on different nodes
but collectively forms a single application.
Distributed Databases (e.g., Cassandra, MongoDB clusters)
managing data replication and partitioning.
• Distributed Computing:
MapReduce / Spark Clusters performing big data analytics or
machine learning tasks on enormous datasets.
Scientific Simulations (e.g., protein folding, climate modeling)
spread over high-performance computing clusters.
Client/Server Systems
In client/server systems, the client sends input to the server, which processes
the request and sends back a response. This model is fundamental and can
involve multiple servers.
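To make the request/response flow concrete, here is a hedged sketch of a minimal TCP server and client using Python's standard socket module; the local address, port, and message are arbitrary choices for the example, not anything specified in the notes.

```python
# Minimal client/server sketch: the client sends a request, the server
# "processes" it (here, just upper-casing) and returns a response.
import socket
import threading

HOST, PORT = "127.0.0.1", 5050      # arbitrary local address for this sketch

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind((HOST, PORT))
srv.listen()

def serve_once():
    conn, _ = srv.accept()           # wait for one client connection
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())  # process the request and reply
    srv.close()

threading.Thread(target=serve_once, daemon=True).start()

# Client side: send input, receive the server's response.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.sendall(b"hello server")
    print(cli.recv(1024))            # b'HELLO SERVER'
```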
Peer-to-Peer Systems
Peer-to-peer systems are decentralized, with each node acting as both client
and server. Nodes perform tasks on their local memory and share data
through a supporting medium.
5. Conclusion:
A. Centralized Architectures:
B. Decentralized Architectures:
Hybrid architecture is particularly beneficial for systems that require both the
reliability and consistency of centralized control and the scalability and fault
tolerance of decentralized components. By combining these approaches,
hybrid architecture can provide a balanced and efficient solution for complex,
large-scale distributed systems.
Fundamental Components :
• Cloud Data Centers: Cloud providers like AWS, Google Cloud, and
Microsoft Azure operate large-scale data centers that provide
infrastructure and services to distributed systems globally.
• Edge Data Centers: Edge data centers are smaller facilities located
closer to end-users to reduce latency and improve performance for
distributed systems, such as content delivery networks (CDNs) and
Internet of Things (IoT) applications.
A. Edge computing
Key Points
1. Location of Processing
a. Instead of sending large volumes of raw data to a distant cloud
server, the processing (computation and analytics) happens near
the devices—either on the device itself or on a local edge node.
b. This local or near-local node could be a specialized gateway, an on-
premises server, or even an embedded computer in a machine.
2. Benefits
a. Reduced Latency: Immediate processing near the data source
allows real-time or near real-time responses (crucial in applications
like self-driving cars, industrial automation, or health monitoring).
b. Bandwidth Savings: By not transmitting all raw data to the cloud,
organizations can lower network usage and associated costs.
c. Reliability: Systems remain partially functional even if the
connection to the central cloud is disrupted—helpful for remote
locations or harsh environments.
d. Data Privacy: Sensitive or proprietary data can be processed
locally, limiting what is sent over external networks.
3. Use Cases
a. IoT (Internet of Things): Smart sensors in manufacturing or smart
cities can analyze data at the edge, only sending summarized
results to the cloud.
b. Autonomous Vehicles: Quick decision-making (e.g., obstacle
avoidance) requires computing near the source (the car itself).
c. Healthcare: Wearable or bedside devices that need immediate
local analysis for patient monitoring.
d. Retail: Local servers in stores can handle point-of-sale data,
personalized recommendations, and inventory management
without depending on a constant, high-bandwidth connection.
4. Challenges
a. Infrastructure Complexity: Setting up edge nodes across multiple
locations can be more complex than using a single central cloud.
b. Security: More endpoints and distributed nodes can increase the
attack surface, requiring robust security measures.
c. Maintenance: Monitoring and updating a large number of edge
devices requires careful orchestration and management strategies.
5. Relation to Cloud Computing
a. Edge computing often complements cloud computing rather than
replaces it.
b. Hybrid Approach: Perform immediate or real-time processing at
the edge, then send aggregated or less time-sensitive data to the
cloud for further analysis, archiving, or advanced machine learning
tasks.
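As a rough sketch of this hybrid approach (under the assumption of a flat batch of numeric sensor readings and a stubbed-out cloud upload), the edge node below processes raw data locally and forwards only an aggregated summary; read_sensor_batch() and send_to_cloud() are hypothetical placeholders.

```python
# Edge/cloud hybrid sketch: process raw readings at the edge, send only a summary.
import statistics

def read_sensor_batch():
    # In a real edge node this would poll actual sensors.
    return [21.3, 21.5, 22.1, 35.0, 21.4]

def send_to_cloud(summary):
    # Placeholder for an upload to a cloud ingestion endpoint.
    print("uploading summary:", summary)

readings = read_sensor_batch()
alerts = [r for r in readings if r > 30.0]        # real-time decision at the edge
summary = {
    "count": len(readings),
    "mean": round(statistics.mean(readings), 2),
    "alerts": len(alerts),
}
send_to_cloud(summary)   # only aggregated, less time-sensitive data leaves the edge
```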
In Summary
B. Virtualization:
• Types of Virtualization:
o Hardware Virtualization: This involves creating virtual versions of
physical hardware components, like CPUs, storage devices, and
network resources. Popular platforms include VMware, Hyper-V, and
VirtualBox.
o Software Virtualization: This allows multiple software
applications to run on a single physical machine, often using virtual
environments or containers like Docker.
o Desktop Virtualization: This lets you run multiple desktop
environments on a single physical computer, enabling users to
switch between operating systems or applications seamlessly.
o Network Virtualization: Combines multiple physical networks into
a single virtualized network (e.g., VPNs, SDN, NFV).
o Storage Virtualization: Pools multiple storage devices into a
single, logical unit for efficient management (e.g., SAN, NAS, Cloud-
based storage).
• Benefits:
o Cost Savings: Reduces the need for physical hardware, leading to
lower costs for purchasing, maintaining, and powering devices.
o Resource Optimization: Maximizes the utilization of physical
resources, as multiple VMs can share a single physical machine’s
CPU, memory, and storage.
o Scalability: Makes it easier to scale your infrastructure up or down
based on demand, by adding or removing VMs as needed.
o Disaster Recovery: Simplifies backup and recovery processes, as
VMs can be easily copied, moved, or restored in the event of
hardware failure.
• Example:
Here is a real application of virtualization that you can run on your own
computer: search YouTube for "How to use VirtualBox - Tutorial for
Beginners" and try to do the same on your machine.
C. Consistency
D. Replication
1/ Data Replication:
v. DISTRIBUTED REPLICATION:
Distributed replication distributes data or services across multiple nodes
in a less structured manner than primary-backup or chain replication.
Replicas can be distributed geographically or logically across the network.
E.g., a Content Delivery Network (CDN), such as Netflix.
Data Redundancy: Ensures that data is available even if one copy is lost
or corrupted.
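As a loose illustration of this redundancy idea (not a production replication protocol), the sketch below writes every update to several in-memory replicas, so a read still succeeds if one copy is lost; the replica dictionaries and key names are invented for the example.

```python
# Data replication sketch: each write goes to every replica, so the value
# survives the loss of any single copy. Replicas are plain dicts here.
replicas = [{}, {}, {}]              # stand-ins for nodes in different locations

def replicated_write(key, value):
    for replica in replicas:         # propagate the update to all copies
        replica[key] = value

def replicated_read(key):
    for replica in replicas:         # fall through to the next copy if one is missing
        if key in replica:
            return replica[key]
    raise KeyError(key)

replicated_write("order:42", "shipped")
replicas[0].clear()                  # simulate losing one replica
print(replicated_read("order:42"))   # still "shipped" thanks to redundancy
```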
2/ System Replication:
iii. Benefits:
3/ Synchronization:
Data Synchronization:
i. File Synchronization: Keeps files consistent across multiple
devices or storage locations. For example, syncing files between
your computer and cloud storage.
ii. Database Synchronization: Ensures that data in multiple
databases or database instances remains consistent. This is crucial
for distributed systems where data is spread across different
servers.
Benefits:
Consistency: Ensures that all copies of data are the same,
reducing the risk of data discrepancies.
Accessibility: Allows users to access the most recent version of
data from any location or device.
Collaboration: Facilitates real-time collaboration by ensuring
that changes made by one user are immediately reflected for
others.
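A minimal sketch of file synchronization follows, under the assumption of two flat directories of regular files: whichever side has the newer modification time is copied to the other. The directory names are hypothetical.

```python
# File synchronization sketch: make two directories hold the newest copy of
# each file, based on modification time. Flat directories of files assumed.
import shutil
from pathlib import Path

def sync(dir_a: Path, dir_b: Path) -> None:
    names = {p.name for p in dir_a.iterdir()} | {p.name for p in dir_b.iterdir()}
    for name in names:
        a, b = dir_a / name, dir_b / name
        if not b.exists() or (a.exists() and a.stat().st_mtime > b.stat().st_mtime):
            shutil.copy2(a, b)   # a is newer (or missing on the b side): push a -> b
        elif not a.exists() or b.stat().st_mtime > a.stat().st_mtime:
            shutil.copy2(b, a)   # b is newer (or missing on the a side): push b -> a

sync(Path("local_folder"), Path("cloud_folder"))   # hypothetical directory names
```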
System Synchronization:
Benefits:
Accuracy: Ensures that actions or events occur at the correct
time and in the correct sequence.
Reliability: Reduces the risk of errors or conflicts caused by
out-of-sync data or systems.
Efficiency: Improves the overall performance and
responsiveness of systems by ensuring that they operate in
harmony.
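To make the "correct sequence" point concrete, here is a tiny sketch, assuming two worker threads sharing one counter, where a lock keeps concurrent updates from interleaving out of sync; this is a generic illustration, not a mechanism named in the notes.

```python
# System synchronization sketch: a lock serializes updates so concurrent
# workers cannot corrupt the shared counter with out-of-sync writes.
import threading

counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(100_000):
        with lock:               # only one thread updates the counter at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                   # reliably 200000 with the lock in place
```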
4/ Consistency Guarantees :
For serving massive online traffic in the modern Web, it’s very common to
have an infrastructure set up with multiple replicas (or partitions). Whether it’s
your Facebook activity, GMail data or Amazon’s order history — everything is
replicated across datacenters, availability zones and possibly across countries
to ensure data is not lost and the systems remain highly available even if one
of the replicas crashes. This poses a challenge: how do we keep data
consistent across replicas? Without consistency, an email you recently sent
through GMail might disappear, or an item deleted from your Amazon cart might
reappear. Or even worse, a financial transaction might be lost, causing losses
of thousands of dollars. Losing cart items may be tolerable at times, but
losing money is a big no-no!
There are two kinds of consistency:
- Weak Consistency and Strong Consistency.
A/ Weak Consistency:
Weak consistency is typical of NoSQL data stores such as MongoDB, Amazon
DynamoDB, and Cassandra. These systems are known for built-in high
availability and performance. In the presence of partitions and network
issues, they relax consistency to support this behaviour. As shown in
Figure 1, weaker consistency means higher availability, performance, and
throughput, at the cost of more anomalous data.
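A hedged sketch of this weak (eventual) consistency behaviour: a write lands on one replica first, so a read from another replica can briefly return stale data until the update propagates. The replica names and the cart data are invented for the example.

```python
# Weak/eventual consistency sketch: the write reaches replica_a immediately,
# while replica_b only catches up when the (simulated) propagation runs.
replica_a = {"cart": ["book"]}
replica_b = {"cart": ["book"]}

def write(key, value):
    replica_a[key] = value           # acknowledged after one replica is updated

def propagate():
    replica_b.update(replica_a)      # asynchronous replication, some time later

write("cart", ["book", "laptop"])
print(replica_b["cart"])             # ['book'] -> stale read (anomaly)
propagate()
print(replica_b["cart"])             # ['book', 'laptop'] -> eventually consistent
```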
Properties:
B/ Strong Consistency :
4. Communication in a DS :
We said a distributed system is a collection of interconnected computers
that work together to achieve a common goal. Without communication, we
cannot achieve coordination, data sharing, and resource distribution
between the different components of a DS.
To make it easier to deal with the numerous levels and issues involved in
communication, the International Standards Organization (ISO) developed a
reference model that clearly identifies the various levels involved, gives them
standard names, and points out which level should do which job. This model
is called the Open Systems Interconnection Reference Model (Day and
Zimmerman, 1983), usually abbreviated as ISO OSI or sometimes just the
OSI model.
A/ SERVICE-ORIENTED COMPUTING
• Key Features:
Modularity: Applications are composed of smaller,
independent services.
Interoperability: Services can work together, even if they are
built on different platforms or technologies.
Loose Coupling: Services are independent and depend
on each other minimally. If one service fails or is updated, it does not
break the entire system.
Discoverability: Services should be easy to locate and
access when needed. They are often registered in a service
directory (like UDDI, Universal Description, Discovery, and
Integration) where applications can search for and find available
services.
Scalability: Services can be scaled independently based on
demand.
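As a loose illustration of modularity and loose coupling (not any specific SOA standard), the sketch below exposes one small, independent "pricing" service over HTTP using only Python's standard library; the port, route, and catalogue are arbitrary choices for the example.

```python
# Service-oriented sketch: a tiny, independent pricing service with its own
# HTTP interface. Other services can call it without knowing its internals.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PRICES = {"book": 12.5, "laptop": 899.0}      # hypothetical catalogue

class PricingService(BaseHTTPRequestHandler):
    def do_GET(self):
        item = self.path.lstrip("/")          # e.g. GET /book
        body = json.dumps({"item": item, "price": PRICES.get(item)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serve on an arbitrary local port; another service could scale this
    # component independently of the rest of the application.
    HTTPServer(("127.0.0.1", 8080), PricingService).serve_forever()
```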
B/ UTILITY-ORIENTED COMPUTING
Both paradigms aim to optimize resource usage and provide flexibility, but
SOC focuses on building modular applications, while utility computing
emphasizes resource provisioning and cost efficiency. Eg: Amazon Web
Services, Microsoft Azure, Virtual Machines on Cloud Platforms.