
Unit 01 : Fundamentals of Cloud Computing

Cloud computing is the delivery of various services via the Internet—these services include data storage,
servers, databases, networking, and software. Instead of owning their own computing infrastructure or
data centers, companies can lease access to anything from applications to storage from a cloud service
provider.

Characteristics of Cloud Computing :


On-Demand Self-Service
 Users can access computing resources such as servers, storage, and applications whenever they
need them, without requiring human intervention from the service provider.
Broad Network Access
 Resources are accessible over a network (usually the internet) and can be used on various devices
such as laptops, smartphones, or tablets.
Resource Pooling
 The cloud provider’s resources are pooled together to serve multiple users. Resources are
dynamically allocated and reassigned according to demand, ensuring efficiency. This is often
achieved using multi-tenancy models.
Measured Service
 Cloud systems automatically control and optimize resource usage by metering services. Users only
pay for what they use, based on metrics such as processing time, data transfer, or storage capacity.
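
For illustration, here is a small Python sketch of how a metered, pay-as-you-go bill could be computed. The rates and usage figures are invented for the example and do not reflect any provider's actual pricing.

# Hypothetical pay-per-use billing: all rates and usage figures are illustrative.
compute_hours = 120      # hours of VM time used this month
storage_gb = 50          # GB stored for the month
egress_gb = 10           # GB of data transferred out

RATE_COMPUTE = 0.05      # $ per compute hour (assumed)
RATE_STORAGE = 0.02      # $ per GB-month (assumed)
RATE_EGRESS = 0.09       # $ per GB transferred (assumed)

bill = (compute_hours * RATE_COMPUTE
        + storage_gb * RATE_STORAGE
        + egress_gb * RATE_EGRESS)
print(f"Monthly charge: ${bill:.2f}")   # -> Monthly charge: $7.90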

Cloud Deployment Models :

Cloud computing services are deployed based on the type of access and the ownership of the
infrastructure. The primary deployment models are Public Cloud, Private Cloud, Community Cloud, and
Hybrid Cloud. Here's an explanation of each:

1. Public Cloud
A public cloud is a type of cloud infrastructure that is open for use by the general public. It is owned,
managed, and operated by third-party cloud service providers.
 Examples: Microsoft Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP).
 Features:
o Accessible via the internet to anyone willing to pay.
o Cost-effective since resources are shared among multiple users.
o Highly scalable and flexible.
 Advantages:
o No maintenance for the user.
o Pay-as-you-go pricing.
 Disadvantages:
o Less control over resources.
o May not meet specific security or compliance needs.

2. Private Cloud
A private cloud is a dedicated cloud environment used exclusively by a single organization. It may be hosted
on-premises or by a third-party provider but is not shared with other users.
 Examples: VMware, OpenStack, or a custom private setup.
 Features:
o Offers higher security and control compared to public clouds.
o Can be tailored to meet specific organizational needs.
 Advantages:
o Greater security and privacy.
o Ideal for businesses with strict compliance requirements.
 Disadvantages:
o Higher cost due to dedicated resources.
o Requires in-house expertise for management and maintenance.

3. Community Cloud
A community cloud is a cloud infrastructure shared by multiple organizations with similar interests, goals,
or compliance requirements. It is operated for the benefit of the specific community.
 Examples: Government organizations sharing infrastructure for data sharing or compliance
purposes.
 Features:
o Collaboration and resource sharing among a specific group of organizations.
o Managed either internally or by a third-party provider.
 Advantages:
o Cost is shared among participants.
o Enhanced collaboration within the community.
 Disadvantages:
o Limited scalability compared to public clouds.
o Managing shared governance can be challenging.

4. Hybrid Cloud
A hybrid cloud combines two or more cloud models (public, private, or community), allowing data and
applications to be shared between them. It provides the flexibility of multiple deployment environments.
 Examples: A company might use a private cloud for sensitive operations and a public cloud for less
critical workloads.
 Features:
o Seamless integration of public and private clouds.
o Offers the best of both worlds by balancing control and scalability.
 Advantages:
o Greater flexibility in handling workloads.
o Cost optimization by using public cloud resources for less sensitive tasks.
 Disadvantages:
o Complex to manage and integrate.
o Security concerns when data moves between public and private clouds.

Comparison of Cloud Deployment Models (Public, Private, Community, and Hybrid Cloud)

Ownership
 Public Cloud: Owned and managed by a third-party provider.
 Private Cloud: Owned by a single organization.
 Community Cloud: Shared ownership by a group of organizations.
 Hybrid Cloud: Combination of public and private clouds.

Accessibility
 Public Cloud: Open to the general public.
 Private Cloud: Restricted to a single organization.
 Community Cloud: Restricted to a specific group of users.
 Hybrid Cloud: Both public and private resources are accessible.

Cost
 Public Cloud: Pay-per-use; cost is shared among users.
 Private Cloud: High cost due to dedicated resources.
 Community Cloud: Costs are shared among community members.
 Hybrid Cloud: Costs depend on the combination used.

Control
 Public Cloud: Limited control over infrastructure.
 Private Cloud: Full control over resources.
 Community Cloud: Shared control among participating organizations.
 Hybrid Cloud: Partial control over private cloud resources.

Security
 Public Cloud: Lower security; data is shared over the public network.
 Private Cloud: High security; dedicated infrastructure.
 Community Cloud: Moderate security, tailored for community needs.
 Hybrid Cloud: Balances security across public and private clouds.

Scalability
 Public Cloud: Highly scalable with on-demand resources.
 Private Cloud: Limited by the organization’s resources.
 Community Cloud: Moderate scalability, depending on community agreements.
 Hybrid Cloud: Highly scalable with public resources and private infrastructure.

Use Case
 Public Cloud: Suitable for startups or general-purpose applications.
 Private Cloud: Ideal for sensitive data and compliance-heavy environments.
 Community Cloud: Best for organizations with shared goals (e.g., healthcare).
 Hybrid Cloud: Suitable for organizations requiring flexibility and resource optimization.

Examples
 Public Cloud: AWS, Microsoft Azure, Google Cloud.
 Private Cloud: VMware Private Cloud, OpenStack.
 Community Cloud: Government or healthcare-specific clouds.
 Hybrid Cloud: A company using AWS for testing and a private cloud for operations.

Cloud Components :

1. Servers
2. Data center
3. Networking
4. Virtualization
5. Security
6. Client computer

Cloud Service Models :

Cloud computing offers three primary service models: Infrastructure as a Service (IaaS), Platform as a
Service (PaaS), and Software as a Service (SaaS). These models differ in the level of control and
responsibility users have over the resources.
1. Infrastructure as a Service (IaaS) :
IaaS provides virtualized computing resources over the internet, such as servers, storage, and networking.
It gives users control over the underlying infrastructure while offloading hardware maintenance to the
cloud provider.
 Examples: Amazon EC2 (AWS), Microsoft Azure Virtual Machines, Google Compute Engine.
 Features:
o Users manage the operating system, applications, and middleware.
o Scalability to meet demand.
o Resources are billed on a pay-as-you-use basis.
 Advantages:
o Flexibility to configure infrastructure as needed.
o Cost savings by avoiding physical hardware investment.
 Disadvantages:
o Users are responsible for managing the software stack.
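
As a hedged illustration of IaaS in practice, the sketch below uses the AWS SDK for Python (boto3) to launch a single virtual machine on Amazon EC2. It assumes boto3 is installed and AWS credentials are already configured; the AMI ID is a placeholder, not a real image.

import boto3

# Launch one small EC2 instance (IaaS): the provider supplies the hardware,
# while the user remains responsible for the OS and software stack on the VM.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",    # placeholder AMI ID; substitute a real image for your region
    InstanceType="t2.micro",   # small, pay-as-you-go instance size
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", response["Instances"][0]["InstanceId"])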

2. Platform as a Service (PaaS)


PaaS provides a platform that includes tools, frameworks, and runtime environments for application
development. Users focus on developing and running applications while the cloud provider manages the
underlying infrastructure.
 Examples: Google App Engine, Microsoft Azure App Services, Heroku.
 Features:
o Developers get pre-configured environments for coding, testing, and deploying.
o Integration with databases, middleware, and development tools.
 Advantages:
o Simplifies development and deployment.
o No need to manage infrastructure, middleware, or runtime.
 Disadvantages:
o Limited control over the underlying infrastructure.
o May lead to vendor lock-in if applications are tightly integrated with the provider’s tools.
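
To make the PaaS workflow concrete, here is a minimal sketch of the kind of application code a developer might push to a platform such as Heroku or Azure App Services. It assumes the Flask package is available; reading the listening port from a PORT environment variable is an assumption typical of such platforms, which supply the runtime and infrastructure themselves.

import os
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # The developer writes only the application; the PaaS provider
    # manages the OS, runtime, scaling, and underlying infrastructure.
    return "Hello from a PaaS-hosted app!"

if __name__ == "__main__":
    # Many platforms inject the port to listen on via an environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))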

3. Software as a Service (SaaS)


SaaS delivers software applications over the internet, eliminating the need to install or maintain them
locally. Users access the software via a web browser.
 Examples: Google Workspace (Docs, Gmail), Microsoft Office 365, Salesforce.
 Features:
o Applications are fully managed by the cloud provider.
o Users access the software through a web interface or app.
 Advantages:
o Easy to use; no need for installation or maintenance.
o Cost-effective with subscription-based pricing.
 Disadvantages:
o Limited customization options.
o Data security concerns if sensitive information is stored in the cloud.

Comparison of IaaS, PaaS, and SaaS :

Definition
 IaaS: Provides virtualized infrastructure like servers, storage, and networking.
 PaaS: Provides a platform with tools and frameworks for application development.
 SaaS: Provides ready-to-use software applications.

User Responsibility
 IaaS: Manage OS, middleware, applications, and data.
 PaaS: Manage applications and data.
 SaaS: Only manage user settings and data.

Target Users
 IaaS: IT administrators and system architects.
 PaaS: Developers and application teams.
 SaaS: End-users requiring specific software.

Control Level
 IaaS: High control over infrastructure and configurations.
 PaaS: Moderate control over the platform.
 SaaS: Minimal control; application is fully managed.

Examples
 IaaS: AWS EC2, Google Compute Engine.
 PaaS: Google App Engine, Heroku.
 SaaS: Gmail, Microsoft Office 365.

Scalability
 IaaS: Highly scalable infrastructure.
 PaaS: Scalable development environment.
 SaaS: Scalable application usage.

Cost Structure
 IaaS: Pay-per-use for resources consumed.
 PaaS: Pay for platform usage.
 SaaS: Subscription-based or pay-per-user.

Complexity
 IaaS: High; requires expertise to configure and maintain.
 PaaS: Medium; easier to use than IaaS.
 SaaS: Low; ready-to-use with minimal setup.

Cloud Economics and Benefits :

Cloud Economics refers to the financial aspects of using cloud computing, emphasizing cost-efficiency,
scalability, and optimization. Businesses adopt cloud solutions to reduce upfront investments in
infrastructure and optimize operating costs.
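
A rough, illustrative calculation (all figures are assumptions, not real prices) shows how shifting from an upfront hardware purchase to pay-as-you-go changes the cost picture:

# Compare upfront (on-premises) cost with pay-as-you-go (cloud) cost over time.
# Every number below is an assumption chosen only for illustration.
server_purchase = 12000     # $ upfront for a physical server
onprem_monthly = 200        # $ per month for power, cooling, and administration
cloud_monthly = 450         # $ per month for an equivalent cloud VM

for months in (12, 24, 36):
    onprem = server_purchase + onprem_monthly * months
    cloud = cloud_monthly * months
    print(f"After {months} months: on-prem ${onprem}, cloud ${cloud}")
# After 12 months: on-prem $14400, cloud $5400
# After 24 months: on-prem $16800, cloud $10800
# After 36 months: on-prem $19200, cloud $16200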

Key Benefits of Cloud Computing

1. Cost Efficiency
2. Scalability and Flexibility
3. Faster Time to Market
4. Accessibility and Collaboration
5. Business Continuity
6. Global Reach
7. Security Enhancements

Services Offered by Cloud Computing :

1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Storage Services
4. Networking Services
5. Security Services
6. Analytics and AI Services
Cloud Computing Architecture

The architecture of cloud computing can be visualized with key components divided into the front end and
the back end of the system, interacting through the internet. Here’s a simplified view of cloud computing
architecture:

Frontend

 Components:
o User Interface (UI): Web browsers or applications used to interact with the cloud (e.g.,
Google Drive, AWS Management Console).
o Client Devices: Computers, smartphones, or tablets that access cloud services.
 Purpose: Provides the interface for users to access and utilize cloud resources.

Backend

 Management: This part handles administrative tasks and operations of the cloud services.
 Application: The software applications hosted on the cloud.
 Service: Various services provided by the cloud, such as computing power, databases, networking,
and messaging.
 Storage: Manages data storage solutions, ensuring efficient data management and retrieval.
 Security: Ensures protection and security of data and applications hosted in the cloud.

Communication between the frontend and backend components happens over the internet, facilitating
seamless interaction and service delivery.
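
As a simple illustration of this front-end/back-end interaction, the sketch below shows a client (front end) calling a cloud back-end service over HTTPS with Python's requests library. The endpoint URL and response format are hypothetical.

import requests

# Hypothetical back-end endpoint; replace with a real service URL.
url = "https://api.example-cloud.com/v1/files"

# The front end sends a request over the internet; the back end replies with JSON.
response = requests.get(url, params={"folder": "reports"}, timeout=10)
response.raise_for_status()              # stop on HTTP errors
for item in response.json():             # iterate over the returned file records
    print(item)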

1.5 Cloud-Based Integrated Development Environment: a cloud IDE lets developers write, run, and debug code directly in the browser, without installing a local toolchain.

Unit 02 : Virtualization
Introduction to Virtualization

Virtualization is the process of creating virtual versions of physical hardware, such as servers, storage,
networks, or even applications. It allows multiple operating systems or workloads to run on a single
physical machine by abstracting the hardware resources.

 Purpose: To optimize resource utilization, improve scalability, and simplify IT infrastructure.


 Key Technology: Hypervisors, which enable multiple virtual machines (VMs) to share physical
hardware.

Characteristics of a Virtualized Environment

1. Isolation
2. Encapsulation
3. Hardware Independence
4. Elasticity and Scalability
5. Resource Pooling
6. Fault Tolerance
7. Improved Security
8. Live Migration

Virtualization Reference Model

The Virtualization Reference Model is a conceptual framework that describes the components and layers
in a virtualized environment. It helps understand how physical resources are abstracted and managed to
provide virtualized services.

Key Components of the Virtualization Reference Model

1. Physical Hardware Layer

 Definition: The foundation layer containing the actual physical resources.


 Components:
o CPU: Provides processing power.
o Memory (RAM): Temporary storage for running applications.
o Storage: Hard drives or SSDs for data storage.
o Network: Physical networking devices like routers, switches, and NICs.
 Purpose: Supplies the raw computing power and resources needed for virtualization.

2. Virtualization Layer

 Definition: This layer abstracts the physical hardware and creates virtual instances.
 Key Technology: Hypervisor (or Virtual Machine Monitor).
 Hypervisor Types:
o Type 1 (Bare-Metal): Runs directly on hardware (e.g., VMware ESXi, Microsoft Hyper-V).
o Type 2 (Hosted): Runs on an existing OS (e.g., Oracle VirtualBox, VMware Workstation).
 Purpose:
o Allocates physical resources to virtual machines.
o Provides isolation between virtual machines.

pg. 7 By Ghanshyam
3. Virtual Machines (VMs)

 Definition: Independent virtualized environments that mimic physical computers.


 Components:
o Virtual CPUs (vCPUs): Virtual processors allocated from physical CPUs.
o Virtual Memory: Memory allocated from physical RAM.
o Virtual Disk: Virtual storage for the operating system and data.
o Virtual Network Interfaces: Simulated NICs for network connectivity.
 Purpose: Run operating systems and applications independently.

4. Management Layer

 Definition: Provides tools and software for managing virtualized resources and environments.
 Components:
o Orchestration Tools: Automate resource allocation (e.g., Kubernetes, OpenStack).
o Monitoring Tools: Track performance and usage (e.g., vCenter, Prometheus).
o Resource Management: Allocate and optimize hardware resources dynamically.
 Purpose: Simplifies the management of VMs, storage, and networks.

5. User Interface Layer

 Definition: The interface through which users interact with and manage the virtualized
environment.
 Components:
o Web portals or dashboards for administrators and users.
o APIs for programmatic access to virtual resources.
 Purpose: Provide easy access for managing and using virtual machines and resources.

Layered view of the reference model (top to bottom):

 User Interface Layer (Web Portals, APIs, Dashboards)
 Management Layer (Orchestration, Monitoring, Resource Management)
 Virtualization Layer (Hypervisors, Resource Abstraction)
 Virtual Machines (VMs) (vCPU, vMemory, vDisk, vNetwork)
 Physical Hardware Layer (CPU, Memory, Storage, Network)

Types of Virtualization

1. Server Virtualization
o Divides a single server into multiple virtual servers.
o Example: VMware ESXi, Microsoft Hyper-V.
o Use: Hosting multiple applications.
2. Desktop Virtualization
o Provides virtual desktops hosted on a remote server.
o Example: Citrix Virtual Apps, VMware Horizon.
o Use: Remote work environments.
3. Network Virtualization
o Abstracts physical networks into virtual networks.
o Example: VMware NSX, Cisco ACI.
o Use: Simplifies cloud network management.
4. Storage Virtualization
o Combines physical storage into a virtual pool.
o Example: VMware vSAN, NetApp.
o Use: Simplified storage management.
5. Application Virtualization
o Runs apps without installing them on the OS.
o Example: Citrix XenApp, Microsoft App-V.
o Use: Compatibility and simplified deployment.
6. Data Virtualization
o Integrates data from multiple sources into one view.
o Example: Denodo, IBM Data Virtualization.
o Use: Business intelligence and analytics.

Technology Types: VMware, Microsoft Hyper-V, KVM, Xen

These are prominent virtualization technologies used to create and manage virtualized environments.
Here's a brief overview:

1. VMware

 Type: Proprietary, Enterprise-grade virtualization.


 Features:
o Offers Type 1 (ESXi) and Type 2 (VMware Workstation) hypervisors.
o Advanced features like vMotion (live migration), High Availability, and Fault Tolerance.
o Excellent for enterprise-level data centers and cloud solutions.
 Use Cases: Server virtualization, virtual desktop infrastructure (VDI).
 Examples: VMware ESXi, vSphere, Workstation.

2. Microsoft Hyper-V

 Type: Proprietary, Hypervisor-based virtualization (Type 1).


 Features:
o Built into Windows Server and Windows 10/11 Pro.
o Supports live migration, replication, and resource control.
o Integration with Microsoft ecosystem (Azure, Windows).
 Use Cases: Windows-centric environments, server consolidation.
 Examples: Windows Server Hyper-V, Hyper-V Manager.

3. KVM (Kernel-based Virtual Machine)

 Type: Open-source, Type 1 hypervisor built into the Linux kernel.


 Features:
o Converts Linux into a full-featured hypervisor.
o Lightweight, efficient, and highly customizable.
o Supports a wide range of guest OSs.
 Use Cases: Open-source and cost-efficient virtualization, cloud solutions.
 Examples: KVM with management tools like QEMU, oVirt, and OpenStack.
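
As a small illustration of managing a KVM host programmatically, the sketch below uses the libvirt Python bindings (one of the common management interfaces for KVM/QEMU) to list virtual machines and their state. It assumes the libvirt-python package is installed and a local KVM hypervisor is running; it is a sketch, not a complete management tool.

import libvirt

# Connect read-only to the local QEMU/KVM hypervisor.
conn = libvirt.openReadOnly("qemu:///system")

# List every defined domain (virtual machine) and whether it is running.
for dom in conn.listAllDomains():
    state = "running" if dom.isActive() else "stopped"
    print(f"{dom.name()}: {state}")

conn.close()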

4. Xen

 Type: Open-source, Type 1 hypervisor.


 Features:
o High performance and scalability.
o Used in cloud platforms like AWS.
o Supports paravirtualization (PV) for better performance and hardware-assisted
virtualization (HVM).
 Use Cases: Cloud environments, hosting providers.
 Examples: Xen Project, Citrix Hypervisor.

Advantages of Virtual Machines (VMs)

Virtual machines (VMs) provide numerous benefits for modern IT environments, especially in virtualization, cloud computing, and resource management. Here's a breakdown of the key areas:

1. Virtual Machines (VMs)
 Isolation
 Platform Independence
 Resource Efficiency
 Flexibility
 Backup and Recovery

2. VM Migration
 Live Migration
 Load Balancing
 Hardware Maintenance
 Disaster Recovery
 Flexibility

3. Consolidation
 Optimized Resource Usage
 Cost Savings
 Simplified Management
 Environmental Benefits

4. Management
 Centralized Control
 Automation
 Enhanced Security
 Scalability
 Monitoring and Analytics

Steps to Build a Virtual Machine Using VMware

1. Install VMware: Download and install VMware Workstation or Fusion on your physical machine.
2. Launch VMware: Open the application and select "Create a New Virtual Machine".
3. Choose Configuration: Select Typical for default settings or Custom for advanced configurations.
4. Provide OS Installation Media: Use a physical disc, ISO file, or choose to install the OS later.
5. Select OS Type: Choose the guest operating system (e.g., Windows, Linux) and version.
6. Name and Save VM: Enter a name and specify a location to save VM files.
7. Configure Resources: Assign CPU, memory (RAM), storage, and network settings.
8. Create and Power On: Finish setup, power on the VM, and install the OS.
9. Install VMware Tools: Optimize performance by installing VMware Tools after OS installation.
10. Save and Use: Save settings and start using the VM.

Features of Virtualization

1. Resource Sharing
2. Isolation
3. Hardware Independence
4. Scalability
5. Flexibility
6. Snapshot and Backup
7. Live Migration
8. Security
9. Cost Efficiency
10. Centralized Management

Unit 03 : Storage in Clouds
Storage in Cloud :

Cloud storage refers to storing data on remote servers accessed via the internet. Instead of relying on local
storage devices, cloud services offer scalable, on-demand storage solutions.
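
As a minimal sketch of using cloud object storage from code, the example below uploads a local file to a bucket and downloads it again with the AWS SDK for Python (boto3). It assumes boto3 is installed and credentials are configured; the bucket and object names are placeholders.

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket-name"        # placeholder bucket name

# Store a local file in object storage, then retrieve a copy of it.
s3.upload_file("report.csv", bucket, "backups/report.csv")
s3.download_file(bucket, "backups/report.csv", "report_copy.csv")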

Storage System Architecture :

Storage system architecture refers to the design and organization of storage systems, defining how data is
stored, managed, and accessed. A typical storage architecture involves several key components:

Storage Devices

 HDDs and SSDs: These devices store data physically. SSDs are faster and more reliable than HDDs.

Storage Networks

 DAS (Direct-Attached Storage): Storage directly connected to a machine.


 NAS (Network-Attached Storage): Provides file-level storage over a network.
 SAN (Storage Area Network): A high-speed network providing block-level storage to multiple
machines.

Storage Controllers

 These manage the interaction between storage devices and users. RAID controllers offer
redundancy and improve performance.

Data Virtualization

 Virtualization abstracts physical storage, allowing a more efficient use of storage resources.

Data Security & Management

 Backup, encryption, and deduplication help manage, secure, and reduce the storage footprint.

Diagram : Storage Devices (HDD/SSD) → Storage Controllers → Data Network (DAS, NAS, SAN) →
Virtualization Layer (if applicable) → User Access and Management.

Benefits of Virtualized Data Center (VDC)

1. Cost Efficiency
2. Scalability and Flexibility
3. Improved Resource Utilization
4. High Availability and Disaster Recovery
5. Simplified Management
6. Security and Isolation
7. Remote Access and Mobility
8. Faster Provisioning

Virtualized Data Center (VDC) Architecture and Environment

A Virtualized Data Center (VDC) utilizes virtualization technologies across the data center infrastructure,
including servers, storage, networking, desktops, and applications. VDC allows for flexibility, scalability,
and improved resource utilization while reducing costs and complexity.

VDC Architecture Components

A typical VDC architecture consists of the following components:

Server Virtualization

 Hypervisor (e.g., VMware ESXi, Microsoft Hyper-V, KVM) abstracts physical servers to create virtual
machines (VMs).
 Multiple VMs can run on the same physical hardware, maximizing CPU, memory, and storage
utilization.

Storage Virtualization

 Physical storage devices (e.g., hard drives, SSDs) are abstracted into a unified pool of storage.
 SAN (Storage Area Networks) or NAS (Network-Attached Storage) provide flexible and scalable
storage solutions.
 Features like Data Deduplication, RAID, and Snapshots improve storage efficiency and availability.

Networking Virtualization

 Virtual network switches and routers create logical networks, separate from the physical network.
 Software-Defined Networking (SDN) enables flexible network configurations and centralized
management of the network infrastructure.

Desktop Virtualization

 Virtual Desktop Infrastructure (VDI) allows the creation and management of virtual desktops that
are hosted on centralized servers.
 Users access these desktops remotely, making it easier to manage and secure.

Application Virtualization

 Applications run on a centralized server or in the cloud, with users accessing them remotely, rather
than installing them locally on physical machines.
 Tools like Citrix XenApp or VMware Horizon can provide application delivery.

VDC Environment Setup

The environment consists of:

 Physical Servers: Hosting hypervisors for virtualization.


 Storage Systems: A combination of SAN or NAS providing scalable, accessible storage.
 Network Infrastructure: SDN and virtualized network resources allow optimized connectivity.
 Management Layer: Tools (e.g., VMware vCenter, Microsoft System Center) for managing and
orchestrating virtualized resources across servers, storage, and network.

Virtualization Techniques :

Server Virtualization

 VMWare, Hyper-V, KVM are used to create multiple VMs on a single physical server.
 Benefits: Improved hardware utilization, simplified management, isolation, and security.

Storage Virtualization

 SAN/NAS provides shared storage resources across multiple servers.


 Benefits: Centralized management, redundancy, and flexibility.

Networking Virtualization

 SDN allows for programmable networks where software controls network behavior.
 Benefits: Easier management, improved scalability, and faster network adjustments.

Desktop Virtualization

 VDI (Virtual Desktop Infrastructure) or Remote Desktop Services allow users to access virtualized
desktops remotely.
 Benefits: Centralized desktop management, reduced hardware costs, and better security.

Application Virtualization

 App Virtualization Tools (e.g., VMware Horizon, Citrix XenApp) allow applications to run on
centralized servers.
 Benefits: Simplified application management, no need for local installation, and easy updates.

Steps to Design a Storage System for a Cloud Setup (Short Version)

1. Determine Storage Requirements


o Capacity: Estimate the total storage required based on data volume, growth, and retention
needs.
o Performance: Identify the required IOPS (Input/Output Operations Per Second), latency, and
throughput.
2. Choose the Storage Type
o Object Storage (e.g., Amazon S3) for unstructured data.
o Block Storage (e.g., Amazon EBS) for databases and high-performance applications.
o File Storage (e.g., Amazon EFS) for shared file systems.
3. Select Storage Architecture
o Distributed Storage: Use technologies like RAID, SAN, or NAS for redundancy and scalability.
o Cloud-Native Storage: Leverage cloud providers' managed storage services to ensure
scalability and reliability.
4. Plan Data Redundancy and Backup
o Implement RAID configurations (RAID 1, RAID 5, or RAID 10) for redundancy.
o Set up automated backups for data protection, ensuring disaster recovery.
5. Define Data Security Measures
o Apply encryption (in-transit and at-rest) to protect sensitive data.
o Implement access controls and authentication mechanisms to secure data.
6. Select a Storage Management Tool

o Use cloud providers’ management tools (e.g., AWS Management Console, Google Cloud
Storage Manager) for provisioning, monitoring, and scaling storage.
7. Consider Data Lifecycle Management
o Implement data tiering strategies (hot, cold, and archival storage) to optimize cost based on
data access frequency.
8. Implement Monitoring and Scaling
o Set up monitoring (e.g., CloudWatch, Prometheus) for real-time tracking of storage
utilization.
o Enable auto-scaling to automatically increase or decrease storage capacity based on demand.

Block and File Level Storage Virtualization


Block-Level Storage Virtualization

 Definition: In block-level virtualization, data is stored in blocks, and the storage system abstracts
these blocks from the physical storage hardware. The virtualization layer presents logical storage
volumes to the operating system, enabling the use of different storage devices as a single,
consolidated block storage resource.
 Example: Technologies like SAN (Storage Area Network) and iSCSI.

File-Level Storage Virtualization
 Definition: In file-level virtualization, the data is organized in a file system (such as NTFS or ext4)
rather than as individual blocks. Virtualization occurs at the file level, and the system abstracts the
files across different storage systems, often using a network file system.
 Example: NAS (Network Attached Storage) solutions.

Virtual Provisioning and Automated Storage Tiering


Virtual Provisioning
 Definition: Virtual provisioning involves dynamically allocating storage resources as needed without
the immediate need to allocate physical storage. It allows storage to be provisioned based on actual
usage, leading to more efficient resource utilization.

Automated Storage Tiering
 Definition: Automated storage tiering dynamically moves data between different types of storage
media (e.g., SSD, HDD) based on predefined policies, access frequency, or performance needs. Data
that is frequently accessed is placed on faster storage (like SSD), while less critical data moves to
slower, more cost-effective storage.
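
The pure-Python sketch below illustrates the tiering idea only; it is not a vendor API. The object names, access counts, and promotion threshold are assumptions chosen for the example.

# Toy automated-tiering policy: frequently accessed objects are promoted to the
# fast tier (SSD); rarely accessed objects are demoted to the slow tier (HDD).
objects = {
    "invoices.db":   {"accesses_last_30d": 540, "tier": "HDD"},
    "photos_2019":   {"accesses_last_30d": 2,   "tier": "SSD"},
    "website_cache": {"accesses_last_30d": 95,  "tier": "SSD"},
}

HOT_THRESHOLD = 100   # accesses per month above which data belongs on SSD (assumed)

for name, obj in objects.items():
    target = "SSD" if obj["accesses_last_30d"] >= HOT_THRESHOLD else "HDD"
    if target != obj["tier"]:
        print(f"Moving {name}: {obj['tier']} -> {target}")
        obj["tier"] = target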

Virtual Storage Area Network (VSAN)

 Definition: A VSAN is a software-defined storage solution that creates a virtualized storage area
network by pooling together storage resources across multiple physical machines. It abstracts the
underlying storage infrastructure, providing a unified, flexible virtual storage layer that is easily
scalable.
 Key Features:
1. Virtualizes both block and file storage resources.
2. It is integrated into hyperconverged infrastructure (HCI), with storage, compute, and
networking combined into a single platform.
3. It is often used with VMware vSphere for virtualized environments.
Benefits of VSAN
1. Scalability: Easily scale storage capacity by adding new nodes to the network.
2. Simplified Management: Centralized management for both compute and storage resources.
3. Cost Efficiency: Reduces the need for separate storage hardware and simplifies the infrastructure.
4. Performance: Ensures better resource utilization with optimized storage performance across virtual
machines.
5. High Availability: Built-in fault tolerance and disaster recovery features ensure continuous uptime.

Cloud File Systems: Google File System (GFS) and Hadoop Distributed File System (HDFS)

Google File System (GFS) and Hadoop Distributed File System (HDFS) are both distributed file systems
designed to handle large-scale data storage across clusters of machines. They share many similarities but
also have key differences.

Google File System (GFS)


 Developed by: Google

 Purpose: Designed to meet Google's needs for scalable and fault-tolerant data storage to support its
search engines, indexing, and data processing systems.
 Characteristics:
1. Data Replication: GFS replicates data across multiple nodes (typically three copies) to ensure
fault tolerance and high availability.
2. Write-Once, Append-Only: GFS allows files to be written once and then read multiple times,
with the ability to append data to existing files but no modification of existing data.

3. Chunk-Based Storage: Data is divided into fixed-size chunks (64MB by default), and each
chunk is stored across multiple machines. Each chunk has a primary replica and several
replicas for redundancy.

4. Fault Tolerance: GFS has mechanisms to detect node failures and automatically replicate
missing or corrupted data from replicas.
5. Metadata: Metadata (e.g., file names, permissions) is stored in a centralized master server,
but actual data is distributed across worker nodes.
6. Optimized for Large Files: Optimized for large files (typically in GB to TB sizes) and heavy
workloads, like MapReduce.

Hadoop Distributed File System (HDFS)


 Developed by: Apache Software Foundation, part of the Hadoop ecosystem.
 Purpose: Built to support distributed storage for big data processing workloads, HDFS is used
extensively with frameworks like Apache Spark and MapReduce to handle large data sets efficiently.
 Characteristics:
1. Data Replication: Similar to GFS, HDFS replicates data blocks to multiple nodes (default
replication factor of 3) to ensure fault tolerance.
2. Write-Once, Read-Many: Like GFS, HDFS uses write-once semantics: a file is written once and read many times; existing data cannot be modified in place, so changes require appending or writing new files.
3. Block-Based Storage: HDFS divides data into blocks (typically 128MB or 256MB) and
distributes these blocks across different nodes in the cluster.
4. Master-Slave Architecture: The NameNode (master) stores metadata (file system
namespace) and the DataNodes (slaves) store the actual data blocks.
5. Fault Tolerance: HDFS has mechanisms to recover from hardware failures, including block
replication and re-replication of lost blocks if necessary.

6. Optimized for Large Files: HDFS is optimized for high throughput of large files, ideal for
batch processing and analytic workloads.
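
A short worked example (assuming the 128MB default block size and replication factor of 3 quoted above) shows how a 1 GB file is laid out in HDFS:

import math

# Assumed figures: 1 GiB file, 128 MiB blocks, replication factor 3.
file_size_mib = 1024
block_size_mib = 128
replication = 3

blocks = math.ceil(file_size_mib / block_size_mib)   # 8 blocks
block_replicas = blocks * replication                # 24 block replicas in the cluster
raw_storage_mib = file_size_mib * replication        # 3072 MiB of raw capacity consumed

print(blocks, block_replicas, raw_storage_mib)       # -> 8 24 3072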

Verification Strategies in GFS and HDFS (Short Version)


1. Checksum Verification:
o Both GFS and HDFS use checksums (e.g., CRC32) to verify data integrity during read operations. When data is written, a checksum is generated and stored. During retrieval, the checksum is validated to detect corruption (a minimal sketch of this check follows this list).
2. Data Replication:
o GFS and HDFS replicate data across multiple nodes to ensure fault tolerance. The default
replication factor is 3 for both systems. If a node or data chunk fails, replication ensures data
availability from other replicas.

3. Error Detection:
o Both systems employ error detection methods during data transfers, ensuring that data
remains intact during inter-node communication. Corrupted data is replaced from replica
copies.
4. Block/Chunk Recovery:

o In case of data corruption or node failure, both systems re-replicate the data from healthy
replicas to maintain fault tolerance and data integrity.
5. Heartbeat and Failure Detection:
o HDFS uses a Heartbeat mechanism to detect DataNode failures. In GFS, the master server
tracks replicas and initiates recovery in case of failure.
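
The sketch below illustrates the checksum check from item 1 using Python's zlib.crc32; it is a conceptual illustration of the idea, not the actual GFS or HDFS implementation.

import zlib

def checksum(data: bytes) -> int:
    # CRC32, the same family of checksum GFS and HDFS use for integrity checks.
    return zlib.crc32(data)

# On write: store the checksum alongside the data block.
block = b"some block of file data"
stored_crc = checksum(block)

# On read: recompute and compare; a mismatch means the block is corrupted
# and should be fetched again from another replica.
def verify(data: bytes, expected: int) -> bool:
    return checksum(data) == expected

print(verify(block, stored_crc))                   # True  (intact data)
print(verify(block + b"!corruption", stored_crc))  # False (corrupted data)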

Comparison: GFS vs. HDFS


Purpose
 GFS: Designed for Google’s internal large-scale data storage needs.
 HDFS: Designed for storing and processing large datasets in the Hadoop ecosystem.

Data Storage Units
 GFS: Chunk-based storage (64MB by default).
 HDFS: Block-based storage (128MB or 256MB blocks).

Data Replication
 GFS: Default replication factor is 3.
 HDFS: Default replication factor is 3.

File Modification
 GFS: Write-once, append-only.
 HDFS: Write-once, read-many.

Metadata Storage
 GFS: Centralized master node stores metadata.
 HDFS: Metadata is stored in the NameNode (master).

Fault Tolerance
 GFS: Automated replication for fault tolerance.
 HDFS: Replication of blocks across nodes for fault tolerance.

Write Access
 GFS: File chunks cannot be modified once written, but can be appended.
 HDFS: Files can be written once and appended; no random writes.

Performance
 GFS: Optimized for large-scale data processing and search tasks.
 HDFS: Optimized for batch processing and analytics.

Use Cases
 GFS: Google search indexing, data processing.
 HDFS: Big data analytics, batch processing, data warehousing.

Integration
 GFS: Primarily integrated into Google infrastructure.
 HDFS: Integrated into the Hadoop ecosystem (e.g., MapReduce, Spark).
