CCS367 Storage Technologies Book

The document outlines the syllabus for the CCS367 Storage Technologies course at Anna University, covering various aspects of storage systems, intelligent storage systems, storage networking technologies, backup and replication, and securing storage infrastructure. It includes detailed topics such as cloud computing, RAID, SAN architectures, data protection, and business continuity planning. Each unit is structured to provide foundational knowledge and practical applications in the field of storage technologies.


CCS367 Storage Technologies BOOK

Storage Technologies (Anna University)


CCS367 STORAGE TECHNOLOGIES


SYLLABUS
UNIT I - STORAGE SYSTEMS
Introduction to Information Storage: Digital data and its types, Information storage,
Key characteristics of data center and Evolution of computing platforms.
Information Lifecycle Management. Third Platform Technologies: Cloud
computing and its essential characteristics, Cloud services and cloud deployment
models, Big data analytics, Social networking and mobile computing,
Characteristics of third platform infrastructure and Imperatives for third platform
transformation. Data Center Environment: Building blocks of a data center,
Compute systems and compute virtualization and Software-defined data center.
UNIT II - INTELLIGENT STORAGE SYSTEMS AND RAID
Components of an intelligent storage system, Components, addressing, and
performance of hard disk drives and solid-state drives, RAID, Types of intelligent
storage systems, Scale-up and scale-out storage Architecture.
UNIT III - STORAGE NETWORKING TECHNOLOGIES AND
VIRTUALIZATION
Block-Based Storage System, File-Based Storage System, Object-Based and
Unified Storage. Fiber Channel SAN: Software-defined networking, FC SAN
components and architecture, FC SAN topologies, link aggregation, and zoning,
Virtualization in FC SAN environment. Internet Protocol SAN: iSCSI protocol,
network components, and connectivity, Link aggregation, switch aggregation, and
VLAN, FCIP protocol, connectivity, and configuration. Fiber Channel over
Ethernet SAN: Components of FCoE SAN, FCoE SAN connectivity, Converged
Enhanced Ethernet, FCoE architecture.
UNIT IV - BACKUP, ARCHIVE AND REPLICATION
Introduction to Business Continuity, Backup architecture, Backup targets and
methods, Data deduplication, Cloud-based and mobile device backup, Data archive,
Uses of replication and its characteristics, Compute based, storage-based, and
network-based replication, Data migration, Disaster Recovery as a Service
(DRaaS).
UNIT V SECURING STORAGE INFRASTRUCTURE
Information security goals, Storage security domains, Threats to a storage
infrastructure, Security controls to protect a storage infrastructure, Governance,
risk, and compliance, Storage infrastructure management functions, Storage
infrastructure management processes.


TABLE OF CONTENTS
UNIT 1 - STORAGE SYSTEMS
TOPICS PAGE NO
1.1 Introduction to Information Storage 11
1.1.1 Digital data and its types 13
1.1.1.1 Structured data 14
1.1.1.2 Unstructured data 14
1.1.1.3 Semi-Structured data 14
1.1.2 Information storage 16
1.1.3 Key characteristics of data center 17
1.1.4 Evolution of computing platforms 19
1.1.5 Information Lifecycle Management (ILM) 20
1.1.5.1 Information Lifecycle 20
1.1.5.2 Example of ILM 21
1.1.5.3 Characteristics of ILM 21
1.1.5.4 Implementation of ILM 22
1.1.5.5 Benefits of ILM 23
1.2 Third Platform Technologies 24
1.2.1 Cloud computing 25
1.2.2 Essential characteristics of cloud computing 25
1.2.3 Cloud service models 26
1.2.3.1 Infrastructure as a Service (IaaS) 27
1.2.3.2 Platform as a Service (PaaS) 28
1.2.3.3 Software as a Service (SaaS) 29
1.2.4 Types of cloud / Cloud deployment models 31
1.2.4.1 Public cloud 33
1.2.4.2 Private cloud 34
1.2.4.3 Hybrid cloud 35
1.2.4.4 Community cloud 37
1.2.4.5 Multi-Cloud 38
1.2.5 Big data Analytics 40
1.2.6 Social networking 45
1.2.7 Mobile computing 47
1.2.8 Characteristics of third platform infrastructure 50
1.2.9 Imperatives for third platform transformation 51
1.3 Data center environment 52
1.3.1 Building blocks of data center 53
1.3.2 Compute systems 56
1.3.3 Compute virtualization 57
1.3.4 Software-defined data center 58
Two Mark Questions with Answers 59
Review Questions 63


UNIT 2 - INTELLIGENT STORAGE SYSTEMS AND RAID


TOPICS PAGE
NO
2.1 Intelligent Storage Systems 65
2.1.1 Components of Intelligent storage system 65
2.1.1.1 Front-end 65
2.1.1.2 Cache 67
2.1.1.3 Back-end 71
2.1.1.4 Physical disks 72
2.1.2 Components of Hard disk drives and Solid-state drives 73
2.1.2.1 Platter 73
2.1.2.2 Spindle 74
2.1.2.3 Read/Write Head 74
2.1.2.4 Actuator Arm Assembly 74
2.1.2.5 Controller 75
2.1.2.6 Physical Disk Structure 75
2.1.2.7 Zoned Bit Recording 76
2.1.3 Addressing of Hard disk drives and Solid-state drives 78
2.1.3.1 Logical Block Addressing 78
2.1.4 Performance of Hard disk drives and Solid-state drives 79
2.1.4.1 Disk Service Time 79
2.1.5 Types of Intelligent storage systems 81
2.1.5.1 High-end Intelligent storage system 82
2.1.5.2 Mid-range Intelligent storage system 83
2.2 Data Protection: RAID (Redundant Array of Independent Disks) 83
2.2.1 Implementation of RAID 83
2.2.1.1 Software RAID 84
2.2.1.2 Hardware RAID 84
2.2.2 RAID Array Components 84
2.2.3 RAID Levels 85
2.2.3.1 Striping 85
2.2.3.2 Mirroring 86
2.2.3.3 Parity 87
2.2.3.4 RAID 0: block-by-block striping 89
2.2.3.5 RAID 1: block-by-block mirroring 90
2.2.3.6 RAID 0+1: striping and mirroring combined (Nested RAID) 91
2.2.3.7 RAID 2: Bit-Level Striping with Dedicated Parity 92
2.2.3.8 RAID 3: Byte-Level Striping with Dedicated Parity 94
2.2.3.9 RAID 4: Block-Level Striping with Dedicated Parity 95


2.2.3.10 RAID 5: Block-Level Striping with Distributed Parity 95
2.2.3.11 RAID 6: Block-Level Striping with Two Parity Blocks 97
2.2.4 RAID Comparison 98
2.2.5 RAID Impact on Disk Performance 99
2.2.5.1 Application IOPS and RAID Configurations 100
2.2.6 Hot Spares 101
2.3 Scale-up and Scale-out storage Architecture 102
2.3.1 Comparison of Scale-up and Scale-out Storage 103
2.3.2 Advantages 103
2.3.3 Disadvantages 104
Two Mark Questions with Answers 106
Review Questions 109

UNIT 3 - STORAGE NETWORKING TECHNOLOGIES AND VIRTUALIZATION
TOPICS PAGE
NO
3.1 Storage system 110
3.1.1 Types of Storage Systems 110
3.1.1.1 Block-Based Storage System 110
3.1.1.2 File-Based Storage System 112
3.1.1.3 Object-Based Storage System 112
3.1.1.4 Unified Storage 114
3.2 Fiber Channel Storage Area Network (FC SAN) 116
3.2.1 Software-defined networking 117
3.2.2 FC SAN Components and Architecture 119
3.2.2.1 Physical Components: Host bus adapters and 119
converged network adapters
3.2.2.2 FC Interconnecting devices: (Hubs, Switches and 120
Directors)
3.2.2.3 FC Storage Arrays 120
3.2.2.4 FC Cabling: (Multimode fiber (MMF), Single-mode fiber (SMF)) 121
3.2.2.5 Logical Components: FC SAN Protocol Stack 122
3.2.2.6 FC SAN Addressing 123
3.2.2.7 FC Fabrics 125
3.2.2.8 FC Frame structure 125
3.2.2.9 FC Services (Fabric login server, Name server, 127
Fabric controller, Management server)


3.2.2.10 FC flow control 128


3.2.2.11 Zoning 128
3.2.2.12 FC Classes and Service 130
3.2.2.13 Virtual SAN 131
3.2.3 FC SAN Connectivity 134
3.2.3.1 Point-to-Point 134
3.2.3.2 Fiber Channel Arbitrated Loop 135
3.2.3.3 Fiber Channel Switched Fabric 137
3.2.4 FC SAN Port virtualization 139
3.2.4.1 Types of Ports (N_Port, E_Port, F_Port, G_Port) 139
3.2.4.2 N_Port Virtualization 141
3.2.4.3 N_Port ID Virtualization (NPIV) 142
3.2.5 FC SAN Topologies 142
3.2.5.1 Single-Switch topology 142
3.2.5.2 Mesh topology 143
3.2.5.3 Core-edge topology 144
3.2.6 Link aggregation and zoning 145
3.2.6.1 Link aggregation with example 145
3.2.6.2 Zoning 145
3.2.6.3 Best practices for zoning 146
3.2.6.4 Types of Zoning (WWN zoning, Port zoning, Mixed zoning) 147
3.2.7 Virtualization in FC SAN (VSAN) environment 147
3.2.7.1 Configuring VSAN 149
3.2.7.2 VSAN versus Zone 149
3.2.7.3 VSAN Trunking 149
3.2.7.4 VSAN Tagging 150
3.2.8 Basic troubleshooting tips for Fiber Channel (FC) SAN 150
issues
3.3 IP SAN 152
3.3.1 iSCSI 153
3.3.1.1 Components of iSCSI 155
3.3.1.2 iSCSI Host Connectivity 156
3.3.1.3 Topologies for iSCSI Connectivity 157
3.3.1.4 iSCSI Protocol Stack 158
3.3.1.5 iSCSI Discovery 159
3.3.1.6 iSCSI Names 160
3.3.1.7 iSCSI Session 161
3.3.1.8 iSCSI PDU 161
3.3.1.9 Link aggregation, Switch aggregation 162
3.3.1.10 VLAN 163
3.3.1.11 Ordering and Numbering 165
3.3.1.12 iSCSI Error Handling & Security 165


3.3.2 FCIP 166


3.3.2.1 FCIP Protocol 167
3.3.2.2 FCIP Connectivity and Topologies 167
3.3.2.3 FCIP Configuration 168
3.3.2.4 FCIP Performance and Security 169
3.4 Fiber Channel over Ethernet Storage Area Network (FCoE SAN) 169
3.4.1 Components of FCoE SAN 169
3.4.2 FCoE SAN connectivity 171
3.4.3 Converged Enhanced Ethernet 172
3.4.4 FCoE Architecture 176
Two Mark Questions with Answers 179
Review Questions 182

UNIT 4 - BACKUP, ARCHIVE AND REPLICATION


TOPICS PAGE
NO
4.1 Introduction to Business Continuity
4.1.1 Information Availability
4.1.1.1 Causes of Information Unavailability
4.1.1.2 Measuring Information Availability
4.1.1.3 Consequences of Downtime
4.1.2 Business Continuity Terminology
4.1.3 Business Continuity Planning Lifecycle
4.1.4 Failure Analysis
4.1.4.1 Single Point of Failure
4.1.4.2 Fault Tolerance
4.1.4.3 Multipathing Software
4.1.5 Business Impact Analysis
4.1.6 BC Technology Solutions
4.1.7 Concept in Practice: EMC Power Path
4.1.7.1 Power Path Features
4.1.7.2 Dynamic Load Balancing
4.1.7.3 Automatic Path Failover
4.2 Backup and Recovery
4.2.1 Backup (Protecting the data for short term) Purpose
4.2.1.1 Disaster Recovery
4.2.1.2 Operational Backup
4.2.1.3 Archival
4.2.1.4 Backup Considerations
4.2.1.5 Backup Granularity
4.2.1.6 Recovery Considerations
4.2.1.7 Backup and Restore Operations


4.3 Backup Architecture and components


4.3.1 Backup Servers
4.3.2 Backup Clients
4.3.3 Media Servers
4.3.4 Backup Destinations/Targets/Backup Technologies
4.3.4.1 Tape Library and Tape Drives
4.3.4.2 Disk Drives
4.3.4.3 Virtual Tape Libraries
4.3.4.4 Taking backup to the Cloud
4.3.5 Backup methods
4.3.5.1 Hot Backups/Online Backups
4.3.5.2 Cold Backups/Offline Backups
4.3.5.3 Local Area Network (LAN) Based Backups
4.3.5.4 LAN-Free Backups / Storage Area Network (SAN)-based Backups
4.3.5.5 Serverless Backup
4.3.5.6 Network Data Management Protocol (NDMP) Backup
4.3.5.7 Direct Primary Storage Backup
4.3.6 Types of Data Backups
4.3.6.1 Full Backup
4.3.6.2 Cumulative (Differential) Backup
4.3.6.3 Incremental Backup
4.3.6.4 Synthetic Backup
4.3.6.5 Incremental Forever Backup
4.3.6.6 Image Backups
4.3.6.7 Application-Aware Backups
4.3.7 Data Deduplication
4.3.8 Cloud-based backup
4.3.8.1 Backup as a Service
4.3.8.2 Backup service deployment options in a cloud-based backup
4.3.8.2.1 Local backup service (managed backup service)
4.3.8.2.2 Remote backup service
4.3.8.2.3 Replicated backup service
4.3.8.3 Archive as a Service
4.3.8.4 Key considerations for cloud-based archiving
4.3.8.4.1 Service Level Agreement (SLA)
4.3.8.4.2 Vendor lock-in
4.3.8.4.3 Compliance
4.3.8.4.4 Data Security
4.3.8.4.5 Pricing


4.3.9 Mobile-based backup


4.3.9.1 Adaptive sub-file backup technique (mobile)
4.3.9.2 Mobile Device Management (MDM) (mobile)
4.4 Data Archive (Protecting Data for long term)
4.4.1 Archive operations
4.4.2 Components of Archive Solution Architecture
4.4.2.1 Archiving agent
4.4.2.2 Archiving server
4.4.2.3 Archiving storage device.
4.5 Replication
4.5.1 Uses of Replication
4.5.2 Characteristics of Replication
4.5.3 Types of Data Replication
4.5.3.1 Local Replication
4.5.3.2 Remote Replication
4.5.4 Compute-based replication
4.5.5 Storage-based replication
4.5.5.1 Storage System based Local Replication
Techniques
4.5.5.1.1 Full Volume Replication (Cloning)
4.5.5.1.2 Pointer based Virtual Replication
(Snapshot)
4.5.5.2 Storage System based Remote Replication
Techniques
4.5.5.2.1 Synchronous Replication
4.5.5.2.2 Asynchronous Replication
4.5.5.2.3 Multi-site Replication
4.5.6 Network-based replication (continuous data protection (CDP))
4.5.6.1 Components of CDP: Journal Volume, CDP Appliance, Write Splitter
4.6 Data Migration
4.6.1 Advantages of Data Migration Techniques
4.6.2 Data Migration Techniques
4.6.2.1 Storage System based Migration
4.6.2.2 Virtual Appliance based Migration
4.6.2.3 Virtual Machine Live Migration
4.6.2.4 Virtual Machine Storage Migration
4.7 Disaster Recovery as a Service (DRaaS)
4.7.1 Working Nature of DRaaS
4.7.2 Issues to consider when choosing a DRaaS provider
4.7.3 Why do organizations choose DRaaS?
4.7.4 Types of cloud disaster recovery


4.7.4.1 Cloud-to-cloud recovery


4.7.4.2 Hybrid cloud recovery
4.7.4.3 Server-to-cloud recovery
4.7.5 Advantages of Disaster Recovery as a Service (DRaaS)
4.7.6 Key attributes to use DRaaS in an organization
Two Mark Questions with Answers
Review Questions

UNIT 5 - SECURING STORAGE INFRASTRUCTURE

TOPICS PAGE
NO
5.1 Introduction to Information Security
5.2 Information security goals
5.2.1 Confidentiality
5.2.2 Integrity
5.2.3 Availability
5.2.4 Accountability
5.2.5 Authentication
5.2.6 Authorization
5.2.7 Auditing
5.3 Information Security Considerations
5.3.1 Risk assessment
5.3.2 Assets and Threats
5.3.3 Vulnerability
5.3.4 Security Controls
5.3.4.1 Preventive
5.3.4.2 Detective
5.3.4.3 Corrective
5.3.5 Defense in depth
5.4 Storage Security Domains
5.4.1 Securing the Application Access Domain
5.4.2 Securing the Management Access Domain
5.4.3 Securing Backup, Recovery, and Archive (BURA)
5.5 Threats to a storage infrastructure
5.5.1 Unauthorized access
5.5.2 Denial of Service (DoS)
5.5.3 Distributed DoS (DDoS) attack
5.5.4 Data loss
5.5.5 Malicious Insiders
5.5.6 Account Hijacking
5.5.7 Insecure APIs


5.5.8 Shared technology vulnerability


5.5.9 Media Theft
5.6 Security controls to protect a storage infrastructure
5.7 Governance, risk, and compliance
5.8 Storage infrastructure management functions
5.9 Storage infrastructure management processes.
Two Mark Questions with Answers
Review Questions


UNIT 1 - STORAGE SYSTEMS


Introduction to Information Storage: Digital data and its types, Information storage, Key
characteristics of data center and Evolution of computing platforms. Information Lifecycle
Management. Third Platform Technologies: Cloud computing and its essential characteristics,
Cloud services and cloud deployment models, Big data analytics, Social networking and mobile
computing, Characteristics of third platform infrastructure and Imperatives for third platform
transformation. Data Center Environment: Building blocks of a data center, Compute systems
and compute virtualization and Software-defined data center.

1.1 Introduction to Information Storage


Information storage refers to the process of storing and preserving data in a way that allows for
easy retrieval and access. In today's digital age, where vast amounts of information are generated
and consumed every second, effective information storage is crucial for businesses, organizations,
and individuals alike.
There are various methods and technologies used for information storage, each with its own
advantages and limitations.
Some of the most common forms of information storage are:
1. Magnetic Storage: Magnetic storage is one of the oldest and most widely used methods of
storing information. It involves using magnetic fields to encode data on a medium such as hard
disk drives (HDDs) or magnetic tapes. HDDs are commonly used in computers to store operating
systems, software applications, and user data. Magnetic tapes are often used for long-term
archival storage due to their high capacity.
2. Solid-State Storage: Solid-state storage has gained popularity in recent years due to its faster
access times and lower power consumption compared to traditional magnetic storage. Solid-state
drives (SSDs) use flash memory technology to store data electronically. They are commonly
found in laptops, smartphones, and other portable devices.
3. Optical Storage: Optical storage uses lasers to read and write data on optical discs such as
CDs, DVDs, and Blu-ray discs. These discs have a reflective layer that stores binary data as pits
and lands. Optical storage is widely used for distributing software, movies, music, and other
multimedia content.
4. Cloud Storage: Cloud storage has revolutionized the way we store and access information. It
involves storing data on remote servers accessed via the internet. Cloud storage services offer
scalability, accessibility from anywhere with an internet connection, and backup capabilities.
Popular cloud storage providers include Google Drive, Dropbox, and Microsoft OneDrive.
5. Tape Storage: Although not as commonly used as other storage methods, tape storage is still
widely used for long-term archival and backup purposes. Magnetic tape cartridges provide
high-capacity storage and are often used by large organizations and data centers. Effective
information storage involves considerations such as data security, redundancy, scalability, and
cost-effectiveness.


6. Network-Attached Storage (NAS): NAS is a type of storage device that connects to a network
and provides file-level data storage to multiple clients. It is commonly used in homes and small
businesses to centralize data storage and facilitate easy file sharing and access.
7. Storage Area Network (SAN): SAN is a specialized network that connects multiple storage
devices to servers, allowing for high-speed data transfer and centralized storage management.
SANs are commonly used in large enterprises and data centers that require high-performance
storage solutions.
8. Virtualization: Virtualization technology allows for the abstraction of physical storage
resources, enabling multiple virtual machines or servers to share a common pool of storage. This
improves resource utilization and simplifies management in virtualized environments.
9. Redundant Array of Independent Disks (RAID): RAID is a method of combining multiple
physical disk drives into a single logical unit for improved performance, reliability, or both.
Different RAID levels offer various combinations of data striping, mirroring, and parity for
enhanced data protection and performance.
10. Data Backup and Disaster Recovery: Information storage also involves implementing
backup strategies to protect against data loss due to hardware failures, natural disasters, or human
errors. Regular backups ensure that critical data can be restored in case of an unforeseen event.
11. Data Compression: Data compression techniques are often used to reduce the size of stored
information, optimizing storage space and improving transfer speeds. Compression algorithms
remove redundant or unnecessary data from files without compromising their integrity.
12. Data Encryption: To ensure the security and privacy of stored information, data encryption
techniques are employed. Encryption transforms data into an unreadable format using
cryptographic algorithms, making it accessible only to authorized users with the appropriate
decryption keys.
13. Big Data Storage: With the exponential growth of data in recent years, big data storage
solutions have emerged to handle massive volumes of structured and unstructured data.
Technologies like Hadoop Distributed File System (HDFS) and NoSQL databases provide
scalable and distributed storage architectures for big data analytics.

Effective information storage involves a combination of these methods and technologies, tailored
to the specific needs and requirements of an organization or individual. It is essential to consider
factors such as data access speed.
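The parity mechanism behind RAID (point 9 above) can be sketched with XOR: the parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. This is a toy illustration, not a particular RAID implementation, and the block contents are arbitrary:

```python
from functools import reduce

# Toy RAID-style parity: the parity block is the byte-wise XOR of the
# data blocks, so any single lost block can be rebuilt from the others.
def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # equal-sized stripe units
parity = reduce(xor_blocks, data_blocks)    # stored on the parity disk

# Simulate losing one "disk", then rebuild its contents from the
# surviving data blocks plus the parity block.
lost = data_blocks[1]
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = reduce(xor_blocks, survivors)
print(rebuilt == lost)  # True
```

This is the same principle RAID 5 applies per stripe, with the parity rotated across the member disks.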
Data
Data is a collection of raw facts from which conclusions may be drawn.
Examples:

✓ Handwritten letters,
✓ A printed book,
✓ A family photograph,
✓ A movie on video tape,


✓ Printed and duly signed copies of mortgage papers,


✓ A bank’s ledgers, and
✓ An account holder’s passbooks.
1.1.1 DIGITAL DATA AND ITS TYPES

✓ Digital data refers to any information that is processed and stored in a digital format, such as
text, images, audio, and video. This data is represented using binary code (1s and 0s) and can
be easily manipulated and transmitted electronically.
✓ Examples of digital data can include emails, social media posts, digital images, music files,
video files, and more.
✓ Digital data is used widely in modern society and is essential for communication,
entertainment, education, research, and business operations.
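The binary representation mentioned above can be made concrete with a short Python sketch; the sample string is an arbitrary illustration:

```python
# Every piece of digital data is ultimately a sequence of bits (1s and 0s).
text = "data"
raw = text.encode("utf-8")                      # the bytes actually stored
bits = " ".join(f"{byte:08b}" for byte in raw)  # each byte as eight 1s/0s
print(bits)  # 01100100 01100001 01110100 01100001
```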

With the advancement of computer and communication technologies, the rate of data generation
and sharing has increased exponentially.
The following is a list of some of the factors that have contributed to the growth of digital
data:

✓ Increase in data processing capabilities: Modern-day computers provide a significant


increase in processing and storage capabilities. This enables the conversion of various types
of content and media from conventional forms to digital formats.
✓ Lower cost of digital storage: Technological advances have steadily decreased the cost of
storage devices, providing low-cost data storage solutions. This cost benefit has increased the
rate at which data is being generated and stored.


✓ Affordable and faster communication technology: The rate of sharing digital data is now
much faster than traditional approaches. A handwritten letter may take a week to reach its
destination, whereas it only takes a few seconds for an e-mail message to reach its recipient.
Types of digital data
Data can be classified based on how it is stored and managed.
Digital data are classified as follows,

✓ Structured Data
✓ Unstructured Data
✓ Semi-Structured Data
1.1.1.1 Structured data:
Structured data is data that is highly organized and follows a predefined schema or data model.
In structured data, values are organized in rows and columns. It is typically stored and managed
using a DBMS (Database Management System).
Examples -

✓ Relational databases: Tables with rows and columns containing data.


✓ Spreadsheets: Excel files with tabular data.
✓ XML (Extensible Markup Language) documents with a defined structure.
✓ JSON (JavaScript Object Notation) files when they adhere to a defined schema
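As a minimal sketch of the rows-and-columns model above, the following uses SQLite, a DBMS bundled with Python; the table and column names are invented for illustration:

```python
import sqlite3

# Structured data: a predefined schema of rows and columns managed by a DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, holder TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (holder, balance) VALUES (?, ?)",
    [("Asha", 2500.0), ("Ravi", 1200.0)],
)

# Because the schema is fixed, querying and filtering are straightforward.
rows = conn.execute("SELECT holder, balance FROM accounts WHERE balance > 2000").fetchall()
print(rows)  # [('Asha', 2500.0)]
```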
1.1.1.2 Unstructured data:
Unstructured data is data that lacks a specific format and often contains text, multimedia, or other
free-form content. Because unstructured data cannot be organized into rows and columns, it is
difficult for business applications to query and retrieve.
Examples -

✓ Text documents (e.g., Word documents, PDFs) without a consistent structure.


✓ Images and videos.
✓ Social media posts and comments.
✓ Email messages.
1.1.1.3 Semi-Structured Data:
Semi-structured data is data that does not have a fixed schema, yet is not completely raw or
unstructured. It is not captured or formatted in conventional ways and does not follow the tabular
format of a relational database. However, it contains structural elements such as tags and
organizational metadata that make it easier to analyse.


Examples

✓ HTML code,
✓ Graphs and tables,
✓ E-mails,
✓ XML documents.
Advantage:
The advantage of semi-structured data is that it is more flexible and simpler to scale than
structured data.
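The flexibility of semi-structured data can be seen in a small JSON example; the records and field names below are invented for illustration:

```python
import json

# Semi-structured data: every value is tagged with a field name, but the
# records need not share one rigid schema -- the second record carries a
# "location" field that the first one lacks.
records = json.loads("""
[
  {"name": "sensor-1", "reading": 21.5},
  {"name": "sensor-2", "reading": 19.0, "location": {"building": "B", "floor": 3}}
]
""")

for rec in records:
    # The tags (keys) let us navigate each record even without a fixed schema.
    building = rec.get("location", {}).get("building", "unknown")
    print(rec["name"], rec["reading"], building)
```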
Difference between Structured, Unstructured and Semi-structured data

Properties | Structured data | Unstructured data | Semi-structured data
Technology | Relational database tables | Character and binary data | XML / RDF
Transaction management | Matured transaction management, various concurrency techniques | No transaction management, no concurrency | Transaction management adapted from RDBMS, not matured
Version management | Versioning over tuples, rows, tables, etc. | Versioned as a whole | Versioning over tuples or graphs is possible
Flexibility | Less flexible; schema-dependent, rigorous schema | Very flexible; absence of schema | Flexible; tolerant schema
Scalability | Scaling the DB schema is difficult | Very scalable | Schema scaling is simple
Robustness | Very robust | Less robust | New technology, not widely spread
Query performance | Structured queries allow complex joins | Only textual queries possible | Queries over anonymous nodes are possible
Format | Predefined format | Variety of formats | -
Analysis | Easy | Difficult | Easy


1.1.2 Information Storage


Information
Information is the intelligence and knowledge derived from data. Businesses analyze raw data in
order to identify meaningful trends. Effective data analysis not only extends its benefits to existing
businesses, but also creates the potential for new business opportunities by using the information
in creative ways.
Storage
Data created by individuals or businesses must be stored so that it is easily accessible for further
processing. In a computing environment, devices designed for storing data are termed storage
devices or simply storage. The type of storage used varies based on the type of data and the rate
at which it is created and used.
Examples -

✓ Devices such as memory in a cell phone or digital camera, DVDs, CD-ROMs, and hard disks
in personal computers.
✓ Businesses have several options available for storing data including internal hard disks,
external disk arrays and tapes.

Figure - Virtuous cycle of Information


Types of storage
Primary storage, also known as primary memory, is immediate-access storage for active data. It
is the fastest and most expensive type of storage, directly accessible by the CPU. Read-Only
Memory (ROM), Random Access Memory (RAM), cache, and flash memory are primary storage.
Secondary storage is long-term storage for inactive data. It is slower than primary storage but has
larger capacity. Magnetic storage devices (hard disk drives (HDDs), tape drives, floppy disk
drives), optical storage devices (CD drives, DVD drives, Blu-ray drives), and solid-state storage
devices (solid-state drives (SSDs), USB drives) are secondary storage.
Tertiary storage is offline storage used for backup and archival purposes. Cloud storage is an
example of tertiary storage.


Information storage is a fundamental concept in the world of data and technology. Information
storage refers to the process of collecting, preserving, and organizing data in a manner that allows
for efficient retrieval and use at a later time.
Information storage systems are an integral part of our daily lives, playing a crucial role in various
fields, including business, science, education, personal communication.

Feature | Primary storage | Secondary storage
Speed | Fastest | Fast
Data availability | Critical | Non-critical
Data durability | Yes | Not necessary
First byte retrieval | Milliseconds | Sub-second
Likely storage type | Flash/SSD | HDD/spinning disk, tape
Cost | More expensive | Less expensive

1.1.3 Key characteristics of data center


A data center stores and manages large amounts of mission-critical data. It is a collection of
servers on which applications are hosted and accessed via the internet. The data center
infrastructure includes computers, storage systems, network devices, dedicated power backups,
and environmental controls (such as air conditioning and fire suppression).


Availability: All data center elements should be designed to ensure accessibility. The inability of
users to access data can have a significant negative impact on a business.
Security: Policies, procedures, and proper integration of the data center core elements must be
established to prevent unauthorized access to information. In addition to the security measures
for client access, specific mechanisms must ensure that servers access only their allocated
resources on storage arrays.
Scalability: Data center operations should be able to allocate additional processing capabilities
or storage on demand, without interrupting business operations. Business growth often requires
deploying more servers, new applications, and additional databases. The storage solution should
be able to grow with the business.
Performance: All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed. The infrastructure should be able
to support performance requirements.
Data integrity: Data integrity refers to mechanisms such as error correction codes or parity bits
which ensure that data is written to disk exactly as it was received. Any variation in data during
its retrieval implies corruption, which may affect the operations of the organization.
Capacity: Data center operations require adequate resources to store and process large amounts
of data efficiently. When capacity requirements increase, the data center must be able to provide
additional capacity without interrupting availability, or, at the very least, with minimal disruption.
Capacity may be managed by reallocation of existing resources, rather than by adding new
resources.
Manageability: A data center should perform all operations and activities in the most efficient
manner. Manageability can be achieved through automation and the reduction of human (manual)
intervention in common tasks.
Monitoring: Monitoring is a continuous process of gathering information on the various elements
and services running in the data center. The reason is obvious: to predict the unpredictable.
Reporting: Resource performance, capacity, and utilization information gathered together at a
point in time.
Provisioning: The process of providing the hardware, software, and other resources required to
run a data center.
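The availability characteristic above is commonly quantified as the fraction of time a system is operational, computed from mean time between failures (MTBF) and mean time to repair (MTTR); the figures below are made-up values for illustration:

```python
# Availability = MTBF / (MTBF + MTTR): uptime as a share of total time.
mtbf_hours = 8750.0   # hypothetical mean time between failures
mttr_hours = 10.0     # hypothetical mean time to repair

availability = mtbf_hours / (mtbf_hours + mttr_hours)
downtime_per_year = (1 - availability) * 365 * 24   # expected hours down

print(f"availability = {availability:.4%}")           # availability = 99.8858%
print(f"downtime = {downtime_per_year:.1f} h/year")   # downtime = 10.0 h/year
```

Even a seemingly high availability figure like 99.9% still allows roughly 8.8 hours of downtime per year, which is why data center targets are usually expressed in "nines".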


1.1.4 Evolution of Computing Platforms


The evolution of computing platforms has been a dynamic and fascinating journey that spans
several decades. This evolution has been driven by advances in technology, changes in user needs,
and shifts in the computing industry.
Computing platforms evolved over the time as follows
First Platform: Mainframes and Centralized Computing.
Second Platform: Personal Computers and Client-Server architecture.
Third Platform: Cloud Computing and Mobile Devices.
Fourth Platform: Artificial Intelligence and Machine Learning.
Mainframes (1940s-1950s): The earliest computers were massive mainframe machines that
filled entire rooms. They were primarily used for scientific and military applications, such as
calculations for the Manhattan Project.
Minicomputers (1960s-1970s): These were smaller and more affordable than mainframes,
making them accessible to universities and businesses. They played a significant role in scientific
research and early business computing.
Personal Computers (PCs) (1970s-1980s): The introduction of the microprocessor, notably the
Intel 4004 and 8080, led to the development of the first personal computers. Companies like
Apple and IBM brought PCs to the mass market. This era also saw the emergence of operating
systems like MS-DOS.
Graphical User Interfaces (GUIs) (1980s-1990s): GUIs, popularized by Apple's Macintosh and
Microsoft's Windows, made computers more user-friendly. The mouse-driven interface and icons
replaced command-line interfaces for many tasks.
Networking and the Internet (1980s-1990s): The development of TCP/IP protocols and the
World Wide Web in the late 20th century revolutionized how people access and share information.
This era also saw the rise of email and early online communities.
Client-Server Computing (1990s): Client-server architecture became dominant. Applications
were split between client (user interface) and server (data processing) components, enabling
distributed computing.
Laptops and Mobile Computing (1990s-present): The laptop brought computing mobility,
while the development of smartphones and tablets expanded computing to a wider audience.
These devices integrated various technologies, including touchscreens and mobile operating
systems like iOS and Android.
Cloud Computing (2000s-present): Cloud computing services, like Amazon Web Services
(AWS), Google Cloud, and Microsoft Azure, revolutionized data storage and processing. Users
could access computing resources on-demand, without needing to own or manage physical
servers.


Virtualization and Containers (2000s-present): Virtualization technologies (e.g., VMware)


and containerization (e.g., Docker) enabled more efficient use of server hardware, improving
resource allocation and deployment.
Big Data and Analytics (2010s-present): The explosion of data led to the development of big
data technologies, such as Hadoop and Spark, for processing and analyzing massive datasets.
This era also saw the rise of data science and machine learning.
Edge Computing (2010s-present): As the Internet of Things (IoT) grew, edge computing
emerged to process data closer to the source, reducing latency and bandwidth requirements.
Quantum Computing (ongoing): Quantum computing is an emerging field with the potential to
solve complex problems that classical computers cannot. Companies like IBM, Google, and
startups are working on quantum computers.
AI and Machine Learning (ongoing): Advances in AI and machine learning have led to the
development of specialized hardware (e.g., GPUs and TPUs) and platforms (e.g., TensorFlow,
PyTorch) for deep learning and other AI applications.
Decentralized and Blockchain (ongoing): Technologies like blockchain have introduced
decentralized computing platforms for applications beyond cryptocurrencies, including supply
chain management and smart contracts.
Augmented Reality (AR) and Virtual Reality (VR) (ongoing): AR and VR technologies are
pushing the boundaries of computing by creating immersive digital experiences, impacting
industries such as gaming, education, and healthcare.
Key Challenges in Managing Information:
In order to frame an effective information management policy, businesses need to consider the
following key challenges of information management:
Exploding digital universe: The rate of information growth is increasing exponentially.
Duplication of data to ensure high availability, and its repurposing, have also contributed to the
multifold increase in information growth.
Increasing dependency on information: The strategic use of information plays an important
role in determining the success of a business and provides competitive advantages in the
marketplace.
Changing value of information: Information that is valuable today may become less important
tomorrow. The value of information often changes over time. Framing a policy to meet these
challenges involves understanding the value of information over its lifecycle.
1.1.5 Information Lifecycle Management
1.1.5.1 Information Lifecycle:
The information lifecycle is the “change in the value of information” over time. When data is first
created, it often has the highest value and is used frequently. As data ages, it is accessed less
frequently and is of less value to the organization. Understanding the information lifecycle helps
to deploy appropriate storage infrastructure, according to the changing value of information.


1.1.5.2 Example: Changing Value of Information
For example, in a sales order application, the value of the information changes from the
time the order is placed until the time that the warranty becomes void (see Figure -Changing value
of sales order information). The value of the information is highest when a company receives a
new sales order and processes it to deliver the product. After order fulfilment, the customer or
order data need not be available for real-time access. The company can transfer this data to less
expensive secondary storage with lower accessibility and availability requirements unless or until
a warranty claim or another event triggers its need. After the warranty becomes void, the company
can archive or dispose of data to create space for other high-value information.

Figure - Changing value of sales order information


Information Lifecycle Management:
Information lifecycle management (ILM) is a proactive strategy that enables an IT organization
to effectively manage the data throughout its lifecycle, based on predefined business policies.
This allows an IT organization to optimize the storage infrastructure for maximum return on
investment.
1.1.5.3 Information Lifecycle strategy:
An ILM strategy should include the following characteristics:
Business-centric: It should be integrated with key processes, applications, and initiatives of the
business to meet both current and future growth in information.
Centrally managed: All the information assets of a business should be under the purview of the
ILM strategy.
Policy-based: The implementation of ILM should not be restricted to a few departments. ILM
should be implemented as a policy and encompass all business applications, processes, and
resources.
Heterogeneous: An ILM strategy should take into account all types of storage platforms and
operating systems.
Optimized: Because the value of information varies, an ILM strategy should consider the
different storage requirements and allocate storage resources based on the information’s value to
the business.


1.1.5.4 Information Lifecycle Management Implementation:


The process of developing an ILM strategy includes four activities—classifying, implementing,
managing, and organizing:
Classifying data and applications on the basis of business rules and policies to enable
differentiated treatment of information.
Implementing policies by using information management tools, starting from the creation of
data and ending with its disposal.
Managing the environment by using integrated tools to reduce operational complexity
Organizing storage resources in tiers to align the resources with data classes, and storing
information in the right type of infrastructure based on the information’s current value
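The "organizing storage resources in tiers" activity can be sketched as a simple age-based classification policy. The tier names and age thresholds below are hypothetical, chosen to mirror the sales-order example:

```python
from datetime import date, timedelta

# Hypothetical ILM policy: tier assignment by data age (thresholds are illustrative).
POLICY = [
    (timedelta(days=90),  "tier-1 (high-performance primary storage)"),
    (timedelta(days=365), "tier-2 (low-cost secondary storage)"),
]
ARCHIVE = "tier-3 (archive / eligible for disposal)"

def classify(created: date, today: date) -> str:
    """Return the storage tier for a record based on its age."""
    age = today - created
    for limit, tier in POLICY:
        if age <= limit:
            return tier
    return ARCHIVE

today = date(2024, 1, 1)
print(classify(date(2023, 12, 1), today))  # recent order    -> tier-1
print(classify(date(2023, 6, 1), today))   # fulfilled order -> tier-2
print(classify(date(2021, 1, 1), today))   # past warranty   -> tier-3
```

A real ILM tool would combine age with business rules (warranty status, regulatory retention periods) rather than age alone, but the classify-then-place pattern is the same.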
Implementing ILM across an enterprise is an ongoing process. (Figure -Implementation of ILM)
illustrates a three-step road map to enterprise-wide ILM.
Steps 1 and 2 are aimed at implementing ILM in a limited way across a few enterprise-critical
applications.
In Step 1, the goal is to implement a storage networking environment. Storage architectures offer
varying levels of protection and performance and this acts as a foundation for future policy-based
information management in Steps 2 and 3. The value of tiered storage platforms can be exploited
by allocating appropriate storage resources to the applications based on the value of the
information processed.
Step 2 takes ILM to the next level, with detailed application or data classification and linkage of
the storage infrastructure to business policies. These classifications and the resultant policies can
be automatically executed using tools for one or more applications, resulting in better
management and optimal allocation of storage resources.
Step 3 of the implementation is to automate more of the applications or data classification and
policy management activities in order to scale to a wider set of enterprise applications.


Figure- Implementation of ILM


1.1.5.5 ILM Benefits
Implementing an ILM strategy has the following key benefits that directly address the challenges
of information management:

✓ Improved utilization by using tiered storage platforms and increased visibility of all
enterprise information.
✓ Simplified management by integrating process steps and interfaces with individual tools and
by increasing automation.
✓ A wider range of options for backup, and recovery to balance the need for business continuity.
✓ Maintaining compliance by knowing what data needs to be protected for what length of time.
✓ Lower Total Cost of Ownership (TCO) by aligning the infrastructure and management costs
with information value. As a result, resources are not wasted, and complexity is not
introduced by managing low-value data at the expense of high-value data.


1.2 Third Platform Technologies


"Third Platform Technologies" is a term often used in the context of information technology and
business to describe the next wave of computing and technology platforms that follow the first
two platforms. The first platform was the mainframe era, which dominated from the 1950s to the
1970s. The second platform was the client-server era, which emerged in the 1980s and continued
through the 1990s and early 2000s. The third platform represents the current and evolving
technology landscape, which has been shaping the digital world since the mid-2000s and
continues to do so.
Key characteristics of Third Platform Technologies include:
Cloud Computing: Cloud computing has become a fundamental component of the third
platform. It enables the delivery of computing services over the internet, providing scalable and
flexible infrastructure for businesses. Cloud services include Infrastructure as a Service (IaaS),
Platform as a Service (PaaS), and Software as a Service (SaaS).
Big Data and Analytics: The third platform is characterized by the proliferation of data and the
tools and technologies to analyze and derive insights from this data. Big data analytics, data
warehouses, and data lakes are central to this platform.
Mobile Computing: Mobile devices, such as smartphones and tablets, play a critical role in the
third platform. Mobile apps and mobile internet usage have become ubiquitous, transforming how
people interact with technology and access information.
Social Media: Social media platforms like Facebook, Twitter, and LinkedIn have become integral
to both personal and business communication. They have also created new opportunities for
marketing, customer engagement, and data collection.
IoT (Internet of Things): The third platform encompasses the growing network of
interconnected devices and sensors that collect and exchange data. IoT has applications in various
industries, including healthcare, manufacturing, transportation, and smart cities.
Artificial Intelligence (AI) and Machine Learning (ML): AI and ML technologies are a
cornerstone of the third platform. They enable automation, predictive analytics, natural language
processing, and other advanced capabilities that are transforming industries and processes.
Blockchain: While originally associated with cryptocurrencies like Bitcoin, blockchain
technology has found applications beyond digital currencies. It is used for secure and transparent
record-keeping and transactions in various industries.
Cybersecurity: As technology becomes more integrated into daily life and business operations,
cybersecurity becomes increasingly important. Protecting data and systems from cyber threats is
a central concern in the third platform.
Edge Computing: With the growth of IoT and the need for real-time processing, edge computing
has emerged as a key technology within the third platform. Edge computing involves processing
data closer to the source (e.g., IoT devices) rather than relying solely on centralized cloud
infrastructure.


Augmented Reality (AR) and Virtual Reality (VR): AR and VR technologies are transforming
the way we interact with digital information and environments, impacting industries like gaming,
education, healthcare, and more.
The third platform represents a shift towards a more interconnected, data-driven, and technology-
dependent world. It has significant implications for businesses, as they must adapt to leverage
these technologies for competitive advantage and operational efficiency. Additionally, it
continues to evolve, with emerging technologies like quantum computing and 5G networks
expected to further shape the landscape of the third platform in the future.

1.2.1 Cloud computing


Cloud computing is the delivery of computing services—including servers, storage, databases,
networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster
innovation, flexible resources, and economies of scale. You typically pay only for cloud services
you use, helping you lower your operating costs, run your infrastructure more efficiently, and
scale as your business needs change.
Cloud computing is on-demand access, via the internet, to computing resources—applications,
servers (physical servers and virtual servers), data storage, development tools, networking
capabilities, and more—hosted at a remote data center managed by a cloud services provider (or
CSP). The CSP makes these resources available for a monthly subscription fee or bills them
according to usage.

1.2.2 Essential characteristics of Cloud Computing

1. On-demand self-service: Cloud computing services do not require human administrators;
   users themselves can provision, monitor, and manage computing resources as needed.
2. Broad network access: Computing services are provided over standard networks and are
   accessible from heterogeneous devices.


3. Rapid elasticity: IT resources should be able to scale out and in quickly, on an as-needed
   basis. Resources are provisioned whenever the user requires them and released as soon as
   the requirement ends.
4. Resource pooling: IT resources (e.g., networks, servers, storage, applications, and
   services) are pooled and shared across multiple applications and tenants, with multiple
   clients served from the same physical resources.
5. Measured service: Resource utilization is tracked for each application and tenant,
   providing both the user and the resource provider with an account of what has been used.
   This supports monitoring, billing, and effective use of resources.
6. Multi-tenancy: Cloud computing providers can support multiple tenants (users or
organizations) on a single set of shared resources.
7. Virtualization: Cloud computing providers use virtualization technology to abstract
underlying hardware resources and present them as logical resources to users.
8. Resilient computing: Cloud computing services are typically designed with redundancy and
fault tolerance in mind, which ensures high availability and reliability.
9. Flexible pricing models: Cloud providers offer a variety of pricing models, including pay-
per-use, subscription-based, and spot pricing, allowing users to choose the option that best
suits their needs.
10. Security: Cloud providers invest heavily in security measures to protect their users’ data and
ensure the privacy of sensitive information.
11. Automation: Cloud computing services are often highly automated, allowing users to deploy
and manage resources with minimal manual intervention.
12. Sustainability: Cloud providers are increasingly focused on sustainable practices, such as
energy-efficient data centers and the use of renewable energy sources, to reduce their
environmental impact.
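Characteristics 5 and 9 (measured service and flexible pricing) amount to metering each tenant's consumption and billing it against a rate card. The toy pay-per-use sketch below illustrates this; the rates and resource names are invented:

```python
# Hypothetical per-unit rates; real providers publish far more granular rate cards.
RATES = {
    "vm_hours": 0.05,           # price per virtual-machine hour
    "storage_gb_months": 0.02,  # price per GB stored for a month
    "egress_gb": 0.09,          # price per GB of outbound traffic
}

def monthly_bill(usage: dict) -> float:
    """Sum metered usage against the rate card: pay only for what was used."""
    return round(sum(RATES[item] * qty for item, qty in usage.items()), 2)

tenant_a = {"vm_hours": 720, "storage_gb_months": 100, "egress_gb": 10}
print(monthly_bill(tenant_a))  # 720*0.05 + 100*0.02 + 10*0.09 -> 38.9
```

The same metering data that drives the bill also feeds capacity planning and chargeback reports, which is why measured service is listed as an essential characteristic.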
1.2.3 Cloud Service Models
Most cloud computing services fall into four broad categories:

• Infrastructure as a service (IaaS),


• Platform as a service (PaaS),
• Software as a service (SaaS),
• Serverless.
These are sometimes called the cloud computing "stack" because they build on top of one another.


Cloud service models


1.2.3.1 Infrastructure as a Service (IaaS):

Infrastructure as a service (IaaS) refers to cloud computing services that provide IT
infrastructure—servers and virtual machines (VMs), storage, networks, operating systems— as a
service from a cloud provider on a pay-as-you-go basis. IaaS is also known as Hardware as a
Service (HaaS). It is a computing infrastructure managed over the internet.

Example: DigitalOcean, Linode, Amazon Web Services (AWS), Microsoft Azure, Google
Compute Engine (GCE), Rackspace, and Cisco Metacloud.

Characteristics of IaaS:

There are the following characteristics of IaaS –

o Resources are available as a service


o Services are highly scalable
o Dynamic and flexible
o GUI and API-based access
o Automated administrative tasks

Advantages of IaaS:

1. Cost-Effective: Eliminates capital expense and reduces ongoing cost and IaaS customers
pay on a per-user basis, typically by the hour, week, or month.
2. Website hosting: Running websites using IaaS can be less expensive than traditional web
hosting.


3. Security: The IaaS Cloud Provider may provide better security than your existing
software.
4. Maintenance: There is no need to manage the underlying data center or the introduction
of new releases of the development or underlying software. This is all handled by the IaaS
Cloud Provider.

Disadvantages of laaS:

1. Limited control over infrastructure: IaaS providers typically manage the underlying
infrastructure and take care of maintenance and updates, but this can also mean that users
have less control over the environment and may not be able to make certain
customizations.
2. Security concerns: Users are responsible for securing their own data and applications,
which can be a significant undertaking.
3. Limited access: Cloud computing may not be accessible in certain regions and countries
due to legal policies.

1.2.3.2 Platform as a Service (PaaS):

Platform as a service (PaaS) refers to cloud computing services that supply an on-demand
environment for developing, testing, delivering, and managing software applications. PaaS is
designed to make it easier for developers to quickly create web or mobile apps, without worrying
about setting up or managing the underlying infrastructure of servers, storage, network, and
databases needed for development.

Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine,
Apache Stratos, Magento Commerce Cloud, and OpenShift.

Characteristics of PaaS:

There are the following characteristics of PaaS -

o Accessible to various users via the same development application.


o Integrates with web services and databases.
o Builds on virtualization technology, so resources can easily be scaled up or down as per
the organization's need.
o Support multiple languages and frameworks.
o Provides an ability to "Auto-scale".

Advantages of PaaS:

1. Simple and convenient for users: It provides much of the infrastructure and other IT
services, which users can access anywhere via a web browser.
2. Cost-Effective: It charges for the services provided on a per-use basis thus eliminating
the expenses one may have for on-premises hardware and software.


3. Efficiently managing the lifecycle: It is designed to support the complete web


application lifecycle: building, testing, deploying, managing, and updating.
4. Efficiency: It allows for higher-level programming with reduced complexity thus, the
overall development of the application can be more effective.

Disadvantages of PaaS:

1. Limited control over infrastructure: PaaS providers typically manage the underlying
infrastructure and take care of maintenance and updates, but this can also mean that users
have less control over the environment and may not be able to make certain
customizations.
2. Dependence on the provider: Users are dependent on the PaaS provider for the
availability, scalability, and reliability of the platform, which can be a risk if the provider
experiences outages or other issues.
3. Limited flexibility: PaaS solutions may not be able to accommodate certain types of
workloads or applications, which can limit the value of the solution for certain
organizations.

1.2.3.3 Software as a Service (SaaS):

Software as a service (SaaS) is a method for delivering software applications over the internet,
on demand and typically on a subscription basis. With SaaS, cloud providers host and manage
the software application and underlying infrastructure, and handle any maintenance, like software
upgrades and security patching. Users connect to the application over the internet, usually with a
web browser on their phone, tablet, or PC.

Example: BigCommerce, Google Apps, Salesforce, Dropbox, Cisco WebEx, Slack,


GoToMeeting, Cloud9 Analytics, Cloud Switch, Microsoft Office 365, Eloqua and Cloud Tran.

Characteristics of SaaS:

There are the following characteristics of SaaS -

o Managed from a central location


o Hosted on a remote server
o Accessible over the internet
o Users are not responsible for hardware and software updates. Updates are applied
automatically.
o The services are purchased on the pay-as-per-use basis.

Advantages of SaaS:

1. Cost-Effective: Pay only for what you use.


2. Reduced time: Users can run most SaaS apps directly from their web browser without
needing to download and install any software. This reduces the time spent in installation


and configuration and can reduce the issues that can get in the way of the software
deployment.
3. Accessibility: We can Access app data from anywhere.
4. Automatic updates: Rather than purchasing new software, customers rely on a SaaS
provider to automatically perform the updates.
5. Scalability: It allows the users to access the services and features on-demand.

Disadvantages of SaaS:

1. Limited customization: SaaS solutions are typically not as customizable as on-premises


software, meaning that users may have to work within the constraints of the SaaS
provider’s platform and may not be able to tailor the software to their specific needs.
2. Dependence on internet connectivity: SaaS solutions are typically cloud-based, which
means that they require a stable internet connection to function properly. This can be
problematic for users in areas with poor connectivity or for those who need to access the
software in offline environments.
3. Security concerns: SaaS providers are responsible for maintaining the security of the
data stored on their servers, but there is still a risk of data breaches or other security
incidents.
4. Limited control over data: SaaS providers may have access to a user’s data, which can
be a concern for organizations that need to maintain strict control over their data for
regulatory or other reasons.

Serverless Computing

Overlapping with PaaS, Serverless Computing focuses on building app functionality without
spending time continually managing the servers and infrastructure required to do so. The cloud
provider handles the setup, capacity planning, and server management for you. Serverless
architectures are highly scalable and event-driven, only using resources when a specific function
or trigger occurs.

Advantages of Serverless Computing:

1. Lower costs - Serverless computing is generally very cost-effective, as traditional cloud


providers of backend services (server allocation) often result in the user paying for unused
space or idle CPU time.
2. Simplified scalability - Developers using serverless architecture don’t have to worry
about policies to scale up their code. The serverless vendor handles all of the scaling on
demand.
3. Simplified backend code - With FaaS, developers can create simple functions that
independently perform a single purpose, like making an API call.
4. Quicker turnaround - Serverless architecture can significantly cut time to market.
Instead of needing a complicated deploy process to roll out bug fixes and new features,
developers can add and modify code on a piecemeal basis.
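The event-driven, pay-only-when-triggered model can be sketched as a small registry of functions that run only when their trigger event arrives. This is a toy illustration of the FaaS style, not any vendor's API:

```python
# Minimal event-driven function registry, in the FaaS style.
handlers = {}

def on(event_type):
    """Decorator: register a function to run when `event_type` occurs."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register

@on("object.uploaded")
def make_thumbnail(event):
    # A single-purpose function, deployed and billed independently.
    return f"thumbnail created for {event['key']}"

def dispatch(event):
    """The platform invokes (and bills) a handler only when its trigger fires."""
    fn = handlers.get(event["type"])
    return fn(event) if fn else None

print(dispatch({"type": "object.uploaded", "key": "photo.jpg"}))
print(dispatch({"type": "object.deleted", "key": "photo.jpg"}))  # no handler -> None
```

In a real serverless platform the dispatcher, scaling, and billing are the provider's responsibility; the developer supplies only the small single-purpose functions.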

Difference between IaaS, PaaS, and SaaS:


The table below summarizes the differences between IaaS, PaaS, and SaaS:

What it provides:
• IaaS provides a virtual data center to store information and create platforms for app
  development, testing, and deployment.
• PaaS provides virtual platforms and tools to create, test, and deploy apps.
• SaaS provides web software and apps to complete business tasks.

Resources offered:
• IaaS provides access to resources such as virtual machines, virtual storage, etc.
• PaaS provides runtime environments and deployment tools for applications.
• SaaS provides software as a service to the end users.

Typical users:
• IaaS is used by network architects.
• PaaS is used by developers.
• SaaS is used by end users.

Scope of the stack:
• IaaS provides only Infrastructure.
• PaaS provides Infrastructure + Platform.
• SaaS provides Infrastructure + Platform + Software.
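The "stack" relationship between the three models can also be expressed as a split of management responsibility per layer. The layer names and the split below are a simplified illustration, not a definitive mapping:

```python
# Illustrative split of management responsibility per service model.
LAYERS = ["networking", "storage", "servers", "os", "runtime", "application", "data"]

MANAGED_BY_PROVIDER = {
    "IaaS": {"networking", "storage", "servers"},
    "PaaS": {"networking", "storage", "servers", "os", "runtime"},
    "SaaS": set(LAYERS),  # simplified: provider manages the whole stack
}

def user_manages(model: str) -> list:
    """Layers left to the customer under the given service model."""
    return [layer for layer in LAYERS if layer not in MANAGED_BY_PROVIDER[model]]

print(user_manages("IaaS"))  # customer still runs the OS, runtime, app, and data
print(user_manages("PaaS"))  # customer supplies only the application and data
print(user_manages("SaaS"))  # nothing: the provider manages every layer
```

Moving down the list from IaaS to SaaS, each model hands one more slice of the stack to the provider, which is why the three are described as building on top of one another.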

1.2.4 Cloud deployment models / Types of cloud models:


A cloud deployment model functions as a virtual computing environment with a deployment
architecture that varies depending on the amount of data users want to store and who has access
to the infrastructure.
Cloud Deployment Models
✓ Public cloud
✓ Private cloud
✓ Community cloud
✓ Hybrid cloud
Public - Public cloud supports all users who want to make use of a computing resource, such as
hardware (OS, CPU, memory, storage) or software (application server, database) on a
subscription basis. Resource available for the general public under the Pay as you go model.


Examples- Google Workspace, Amazon Web Services (AWS), Dropbox, and Microsoft offerings
like Microsoft 365 and Azure, as well as streaming services like Netflix.
Private - Private cloud is a Infrastructure used by a single organization. In simple words,
Resource managed and used by the organization.
Examples - Amazon VPC, HPE, VMware, and IBM.
Community – Community cloud supports multiple organizations sharing computing resources
that are part of a community. Resource shared by several organizations, usually in the same
industry.
Examples - Health Care community cloud, Scientific Research Sector.
Hybrid - An organization makes use of interconnected private and public cloud infrastructure.
The hybrid cloud deployment model is managed partly by the service provider and partly by the
organization.
Examples - Google Application Suite (Gmail, Google Apps, and Google Drive), Office 365 (MS
Office on the Web and One Drive), Amazon Web Services.
Common Adoption Issues for Cloud
• It is impossible to provide 100% availability without a high-availability architecture.
• Vendor lock-in is a persistent concern, though in practice most users learn to live with it.
• It is almost impossible to guarantee 100% security and privacy protection.
• Enterprise users must still meet their business and legal record-keeping obligations.


1.2.4.1 Public Cloud Model


As its names suggest, the public cloud is available to the general public, and resources are shared
between all users. They are available to anyone, from anywhere, using the Internet. The public
cloud deployment model is one of the most popular types of cloud.

Public Cloud Architecture


This computing model is hosted at the vendor’s data center. The public cloud model makes the
resources, such as storage and applications, available to the public over the WWW. It serves all
the requests; therefore, resources are almost infinite.
Characteristics of Public Cloud
Here are the essential characteristics of the Public Cloud:
1. Uniformly designed Infrastructure
2. Works on the Pay-as-you-go basis
3. Economies of scale
4. SLA guarantees that all users have a fair share with no priority
5. It is a multi-tenant architecture, so the risk of data leakage is higher than in a private cloud
Advantages of Public Cloud Deployments
Here are the pros/benefits of the Public Cloud Deployment Model:
1. Highly available anytime and anywhere, with robust permission and authentication
mechanism.
2. There is no need to maintain the cloud.
3. Does not have any limit on the number of users.


4. The cloud service provider owns and maintains the entire infrastructure, so you don’t
need to set up any hardware.
5. Does not cost you any maintenance charges as the service provider does it.
6. It works on the Pay as You Go model, so you don’t have to pay for items you don’t use.
7. There is no significant upfront fee, making it excellent for enterprises that require
immediate access to resources.
Disadvantages of Public Cloud Deployments
Here are the cons/drawbacks of the Public Cloud Deployment Model:
1. It has lots of issues related to security.
2. Privacy and organizational autonomy are not possible.
3. You don’t control the systems hosting your business applications.
1.2.4.2 Private Cloud Model
The private cloud deployment model is a dedicated environment for one user or customer. Users
don’t share their hardware with any other users, as all the hardware is theirs.
It is a one-to-one environment for single use, so there is no need to share your hardware with
anyone else. The main difference between private and public cloud deployment models is how
you handle the hardware. It is also referred to as an “internal cloud,” reflecting the ability to
access systems and services within the organization’s boundary.

How Private Cloud Works


Characteristics of Private Cloud
Here are the essential characteristics of the Private Cloud:
1. It has a non-uniformly designed infrastructure.


2. Very low risk of data leaks.


3. Provides End-to-End Control.
4. Weak SLA, but you can apply custom policies.
5. Internal Infrastructure to manage resources easily.
Advantages of Private Cloud Deployments
Here are the pros/benefits of the Private Cloud Deployment Model:
1. You have complete command over service integration, IT operations, policies, and user
behaviour.
2. Companies can customize their solution according to market demands.
3. It offers exceptional reliability in performance.
4. A private cloud enables the company to tailor its solution to meet specific needs.
5. It provides higher control over system configuration according to the company’s
requirements.
6. Private cloud works with legacy systems that cannot access the public cloud.
7. This Cloud Computing Model is small, and therefore it is easy to manage.
8. It is suitable for storing corporate information that only permitted staff can access.
9. You can incorporate as many security services as needed to secure your cloud.
Disadvantages of Private Cloud Deployments
Here are the cons/drawbacks of the Private Cloud Deployment Model:
1. It is a fully on-premises-hosted cloud that requires significant capital to purchase and
maintain the necessary hardware.
2. Companies that want extra computing power must take extra time and money to scale up
their Infrastructure.
3. Scalability depends on the choice of hardware.
1.2.4.3 Hybrid Cloud Model
A hybrid cloud deployment model combines public and private clouds. Creating a hybrid cloud
computing model means that a company uses the public cloud but owns on-premises systems and
provides a connection between the two. They work as one system, which is a beneficial model
for a smooth transition into the public cloud over an extended period.
Some companies cannot operate solely in the public cloud because of security concerns or data
protection requirements. So, they may select the hybrid cloud to combine the requirements with
the benefits of a public cloud. It enables on-premises applications with sensitive data to run
alongside public cloud applications.


How the Hybrid Cloud Works


Characteristics of Hybrid Cloud
Here are the characteristics of the Hybrid Cloud:
1. Provides better security and privacy
2. Offers improved scalability
3. Cost-effective Cloud Deployment Model
4. Simplifies data and application portability
Advantages of Hybrid Cloud Deployments
Here are the pros/benefits of the Hybrid Cloud Deployment Model:
1. It gives the power of both public and private clouds.
2. It offers better security than the Public Cloud.
3. Public clouds provide scalability, so you pay for extra capacity only when it is
required.
4. It enables businesses to be more flexible and to design personalized solutions that meet
their particular needs.
5. Data is separated correctly, so the chances of data theft by attackers are considerably
reduced.
6. It provides robust setup flexibility so that customers can customize their solutions to fit
their requirements.
Disadvantages of Hybrid Cloud Deployments


Here are the cons/drawbacks of the Hybrid Cloud Deployment Model:


1. It is applicable only when a company has varied use or demand for managing the
workloads.
2. Managing a hybrid cloud is complex, and the added management overhead can drive up
costs.
3. Its security features are not as good as those of the private cloud.
1.2.4.4 Community Cloud Model
Community clouds are cloud-based infrastructure models that enable multiple organizations to
share resources and services based on standard regulatory requirements. It provides a shared
platform and resources for organizations to work on their business requirements. This Cloud
Computing model is operated and managed by community members, third-party vendors, or both.
The organizations that share standard business requirements make up the members of the
community cloud.

How Community Cloud Works


Advantages of Community Cloud Deployments
Here are the pros/benefits of the Community Cloud Deployment Model:
1. You can establish a low-cost private cloud.
2. It helps you to do collaborative work on the cloud.
3. It is cost-effective, as multiple organizations or communities share the cloud.


4. You can share resources, Infrastructure, etc., with multiple organizations.


5. It is a suitable model for both collaboration and data sharing.
6. Gives better security than the public cloud.
7. It offers a collaborative space that allows clients to enhance their efficiency.
Disadvantages of Community Cloud Deployments
Here are the cons/drawbacks of the Community Cloud Deployment Model:
1. Because of its restricted bandwidth and storage capacity, community resources often pose
challenges.
2. It is not a very popular and widely adopted cloud computing model.
3. Security and segmentation are challenging to maintain.
1.2.4.5 Multi-Cloud Model

Multi-Cloud Architecture
Multi-cloud computing refers to using public cloud services from multiple cloud service
providers. In a multi-cloud configuration, a company runs workloads on IaaS or PaaS from
several vendors, such as Azure, AWS, or Google Cloud Platform.
There are many reasons an organization selects a multi-cloud strategy. Some use it to avoid
vendor lock-in, while others use multi-cloud deployments to combat shadow IT: by officially
supporting several providers, employees can still benefit from a specific public cloud service
without violating strict IT policies.
Benefits of Multi-Cloud Deployment Model


Here are the pros/benefits of the Multi-Cloud Deployment Model:


1. A multi-cloud deployment model helps organizations choose the specific services that
work best for them.
2. It provides a reliable architecture.
3. With multi-cloud models, companies can choose the best Cloud service provider based on
contract options, flexibility with payments, and customizability of capacity.
4. It allows you to select cloud regions and zones close to your clients.
Disadvantages of Multi-Cloud Deployments
Here are the cons/drawbacks of the Multi-Cloud Deployment Model:
1. Multi-cloud adoption increases the complexity of your business.
2. Finding developers, engineers, and cloud security experts who know multiple clouds is
difficult.
How to select the suitable Cloud Deployment Models
Companies are extensively using these cloud computing models all around the world. Each of
them solves a specific set of problems. So, finding the right Cloud Deployment Model for you or
your company is important.
Here are points you should remember for selecting the right Cloud Deployment Model:
1. Scalability: You need to check if your user activity is growing quickly or unpredictably
with spikes in demand.
2. Privacy and security: Select a service provider that protects your privacy and the
security of your sensitive data.
3. Cost: You must decide how many resources you need for your cloud solution. Then
calculate the approximate monthly cost for those resources with different cloud providers.
4. Ease of use: You must select a model with no steep learning curve.
5. Legal Compliance: You need to check whether any relevant law stops you from selecting
a specific cloud deployment model.
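The cost check in point 3 can be approximated with simple arithmetic. The following sketch compares a hypothetical monthly bill across providers; the provider names, unit prices, and resource needs are entirely made up, and real cloud pricing models are far more complex.

```python
# Back-of-the-envelope monthly cost comparison (all prices hypothetical).
needs = {"vm_hours": 720, "storage_gb": 500}  # one VM all month + 500 GB

providers = {  # illustrative per-unit prices only, not real quotes
    "provider_a": {"vm_hours": 0.05, "storage_gb": 0.02},
    "provider_b": {"vm_hours": 0.04, "storage_gb": 0.03},
}

def monthly_cost(prices, needs):
    """Sum unit price times required quantity for each resource."""
    return round(sum(needs[item] * prices[item] for item in needs), 2)

costs = {name: monthly_cost(p, needs) for name, p in providers.items()}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)  # {'provider_a': 46.0, 'provider_b': 43.8} provider_b
```

Repeating this calculation for each candidate provider, as point 3 suggests, makes the cost dimension of the decision concrete.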


Comparison of Top Cloud Deployment Models

Parameters                   | Public         | Private                                 | Community                      | Hybrid
-----------------------------|----------------|-----------------------------------------|--------------------------------|--------------------------------
Setup and use                | Easy           | Need help from a professional IT team   | Require a professional IT team | Require a professional IT team
Scalability and Elasticity   | Very High      | Low                                     | Moderate                       | High
Data Control                 | Little to none | Very High                               | Relatively High                | High
Security and privacy         | Very low       | Very high                               | High                           | Very high
Reliability                  | Low            | High                                    | Higher                         | High
Demand for in-house software | No             | Very high in-house software requirement | No                             | In-house software is not a must

1.2.5 Big data analytics


Big Data analytics is a process used to extract meaningful insights, such as hidden patterns,
unknown correlations, market trends, and customer preferences. Big Data analytics provides
various advantages—it can be used for better decision making, preventing fraudulent activities,
among other things.
Importance of Big Data Analytics:
In today’s world, Big Data analytics is fueling everything we do online—in every industry.
Example : Take the music streaming platform Spotify for example. The company has nearly 96
million users that generate a tremendous amount of data every day. Through this information, the
cloud-based platform automatically generates suggested songs—through a smart
recommendation engine—based on likes, shares, search history, and more. What enables this is
the techniques, tools, and frameworks that are a result of Big Data analytics.
If you are a Spotify user, you have likely come across the top recommendation section, which
is based on your likes, past listening history, and other signals. Spotify achieves this with a
recommendation engine that collects user data and then filters it using algorithms.
Benefits and Advantages of Big Data Analytics
1. Risk Management


Use Case: Banco de Oro, a Philippine banking company, uses Big Data analytics to identify
fraudulent activities and discrepancies. The organization leverages it to narrow down a list of
suspects or root causes of problems.
2. Product Development and Innovations
Use Case: Rolls-Royce, one of the largest manufacturers of jet engines for airlines and armed
forces across the globe, uses Big Data analytics to analyze how efficient the engine designs are
and if there is any need for improvements.
3. Quicker and Better Decision Making Within Organizations
Use Case: Starbucks uses Big Data analytics to make strategic decisions. For example, the
company leverages it to decide if a particular location would be suitable for a new outlet or not.
They will analyze several different factors, such as population, demographics, accessibility of the
location, and more.
4. Improve Customer Experience
Use Case: Delta Air Lines uses Big Data analysis to improve customer experiences. They monitor
tweets to find out their customers’ experience regarding their journeys, delays, and so on. The
airline identifies negative tweets and does what’s necessary to remedy the situation. By publicly
addressing these issues and offering solutions, it helps the airline build good customer relations.
Big Data
According to Gartner, the definition of Big Data is:
“Big data is high-volume, high-velocity, and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision making.”
This definition clearly answers the “What is Big Data?” question – Big Data refers to complex
and large data sets that have to be processed and analyzed to uncover valuable information that
can benefit businesses and organizations.
However, there are certain basic tenets of Big Data that will make it even simpler to answer what
is Big Data:
✓ It refers to a massive amount of data that keeps on growing exponentially with time.
✓ It is so voluminous that it cannot be processed or analyzed using conventional data
processing techniques.
✓ It includes data mining, data storage, data analysis, data sharing, and data visualization.
✓ The term is an all-comprehensive one including data, data frameworks, along with the
tools and techniques used to process and analyze the data.
Types of Big Data
Now that we are on track with what is big data, let’s have a look at the types of big data:
a) Structured
Structured data is one of the types of big data. By structured data, we mean data that can be
processed, stored, and retrieved in a fixed format. It refers to highly organized information that


can be readily and seamlessly stored and accessed from a database by simple search engine
algorithms.
For instance, the employee table in a company database will be structured as the employee details,
their job positions, their salaries, etc., will be present in an organized manner.
b) Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This
makes it very difficult and time-consuming to process and analyze unstructured data.
Email is an example of unstructured data. Structured and unstructured are two important types of
big data.
c) Semi-structured
Semi-structured data is the third type of big data. It contains both of the formats mentioned
above, that is, structured and unstructured data. To be precise, it refers to data that, although
not classified under a particular repository (database), contains vital information or tags that
segregate individual elements within the data.
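As a small illustration, Python's `json` module can parse a semi-structured record: there is no fixed table schema, yet the tags still let us segregate individual elements. The record below is invented for the example.

```python
import json

# A hypothetical semi-structured record: tagged fields, but no rigid schema.
raw = '{"id": 101, "name": "Asha", "tags": ["hr", "payroll"], "notes": "joined 2021"}'

record = json.loads(raw)          # parse the JSON text into a dict
# The tags identify individual elements even without a fixed table layout.
print(record["name"], record["tags"])
```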
The History of Big Data
Although the concept of big data itself is relatively new, the origins of large data sets go back to
the 1960s and '70s when the world of data was just getting started with the first data centers and
the development of the relational database.
Around 2005, people began to realize just how much data users generated through Facebook,
YouTube, and other online services. Hadoop (an open-source framework created specifically to
store and analyze big data sets) was developed that same year. NoSQL also began to gain
popularity during this time.
The development of open-source frameworks, such as Hadoop (and more recently, Spark) was
essential for the growth of big data because they make big data easier to work with and cheaper
to store. In the years since then, the volume of big data has skyrocketed. Users are still generating
huge amounts of data—but it’s not just humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the
internet, gathering data on customer usage patterns and product performance. The emergence of
machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has expanded
big data possibilities even further. The cloud offers truly elastic scalability, where developers can
simply spin up ad hoc clusters to test a subset of data.
Uses and Examples of Big Data Analytics
There are many different ways that Big Data analytics can be used in order to improve businesses
and organizations. Here are some examples:
• Using analytics to understand customer behaviour in order to optimize the customer
experience


• Predicting future trends in order to make better business decisions


• Improving marketing campaigns by understanding what works and what doesn't
• Increasing operational efficiency by understanding where bottlenecks are and how to fix
them
• Detecting fraud and other forms of misuse sooner
These are just a few examples — the possibilities are really endless when it comes to Big Data
analytics. It all depends on how you want to use it in order to improve your business.
The Lifecycle Phases of Big Data Analytics
• Stage 1 - Business case evaluation - The Big Data analytics lifecycle begins with a
business case, which defines the reason and goal behind the analysis.
• Stage 2 - Identification of data - Here, a broad variety of data sources are identified.
• Stage 3 - Data filtering - All of the identified data from the previous stage is filtered here
to remove corrupt data.
• Stage 4 - Data extraction - Data that is not compatible with the tool is extracted and then
transformed into a compatible form.
• Stage 5 - Data aggregation - In this stage, data with the same fields across different
datasets are integrated.
• Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to discover
useful information.
• Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView, Big
Data analysts can produce graphic visualizations of the analysis.
• Stage 8 - Final analysis result - This is the last step of the Big Data analytics lifecycle,
where the final results of the analysis are made available to business stakeholders who
will take action.
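Stages 3 through 6 of the lifecycle can be illustrated in miniature. The records and field names below are invented for illustration; real pipelines run these stages on distributed tooling rather than on in-memory lists.

```python
# Hypothetical raw sales records entering the pipeline.
raw_records = [
    {"region": "south", "amount": 120},
    {"region": "south", "amount": None},   # corrupt: dropped in Stage 3
    {"region": "north", "amount": "80"},   # wrong type: transformed in Stage 4
    {"region": "north", "amount": 50},
]

# Stage 3 - data filtering: drop corrupt rows.
clean = [r for r in raw_records if r["amount"] is not None]

# Stage 4 - data extraction/transformation: coerce into a compatible form.
for r in clean:
    r["amount"] = int(r["amount"])

# Stage 5 - data aggregation: integrate rows sharing the same field.
totals = {}
for r in clean:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]

# Stage 6 - data analysis: a simple statistic over the aggregated data.
print(totals)                      # {'south': 120, 'north': 130}
print(max(totals, key=totals.get)) # north
```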
Different Types of Big Data Analytics
Here are the four types of Big Data analytics:
1. Descriptive Analytics - This summarizes past data into a form that people can easily read. This
helps in creating reports, like a company’s revenue, profit, sales, and so on. Also, it helps in the
tabulation of social media metrics.
Use Case: The Dow Chemical Company analyzed its past data to increase facility utilization
across its office and lab space. Using descriptive analytics, Dow was able to identify underutilized
space. This space consolidation helped the company save nearly US $4 million annually.
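As a toy illustration of descriptive analytics, the snippet below summarizes a hypothetical list of monthly revenue figures into a readable report; the numbers and field names are made up for the example.

```python
import statistics

# Hypothetical monthly revenue figures to be summarized into a report.
revenue = [42_000, 45_500, 39_000, 51_250]

report = {
    "total": sum(revenue),                 # overall revenue
    "mean": statistics.mean(revenue),      # average month
    "best_month": max(revenue),            # strongest month
}
print(report)
```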
2. Diagnostic Analytics
This is done to understand what caused a problem in the first place. Techniques like drill-down,
data mining, and data recovery are all examples. Organizations use diagnostic analytics because
they provide an in-depth insight into a particular problem.


Use Case: An e-commerce company’s report shows that their sales have gone down, although
customers are adding products to their carts. This can be due to various reasons like the form
didn’t load correctly, the shipping fee is too high, or there are not enough payment options
available. This is where you can use diagnostic analytics to find the reason.
3. Predictive Analytics
This type of analytics looks into the historical and present data to make predictions of the future.
Predictive analytics uses data mining, AI, and machine learning to analyze current data and make
predictions about the future. It works on predicting customer trends, market trends, and so on.
Use Case: PayPal determines what kind of precautions they have to take to protect their clients
against fraudulent transactions. Using predictive analytics, the company uses all the historical
payment data and user behavior data and builds an algorithm that predicts fraudulent activities.
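A minimal sketch of the predictive idea, assuming a hypothetical quarterly sales series: fit a least-squares trend line to past data points and extrapolate one step ahead. Real predictive analytics layers far richer models (data mining, AI, machine learning) on top of this principle.

```python
# Hypothetical past data: four quarters of sales (in thousands).
xs = [1, 2, 3, 4]
ys = [10.0, 12.0, 13.5, 16.0]

# Ordinary least-squares fit of y = intercept + slope * x.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

def predict(x):
    """Extrapolate the fitted trend line to a future period."""
    return intercept + slope * x

print(round(predict(5), 2))  # 17.75 - forecast for the next quarter
```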
4. Prescriptive Analytics
This type of analytics prescribes the solution to a particular problem. Prescriptive analytics
builds on both descriptive and predictive analytics. Most of the time, it relies on AI and
machine learning.
Use Case: Prescriptive analytics can be used to maximize an airline’s profit. This type of analytics
is used to build an algorithm that will automatically adjust the flight fares based on numerous
factors, including customer demand, weather, destination, holiday seasons, and oil prices.
Big Data Analytics Tools
Here are some of the key big data analytics tools :
• Hadoop - helps in storing and analyzing data
• MongoDB - used on datasets that change frequently
• Talend - used for data integration and management
• Cassandra - a distributed database used to handle chunks of data
• Spark - used for real-time processing and analyzing large amounts of data
• STORM - an open-source real-time computational system
• Kafka - a distributed streaming platform that is used for fault-tolerant storage
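Hadoop's processing model is MapReduce. The sketch below imitates its map, shuffle, and reduce phases in plain Python on a tiny in-memory "dataset" (the input lines are invented); Hadoop distributes the same pattern across a cluster of machines.

```python
from collections import defaultdict

lines = ["big data big insight", "data at scale"]  # toy stand-in for HDFS files

# Map phase: emit a (word, 1) pair for every word.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group pairs by key so each word's counts land together.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts for each key.
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts)  # {'big': 2, 'data': 2, 'insight': 1, 'at': 1, 'scale': 1}
```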
Big Data Industry Applications
Here are some of the sectors where Big Data is actively used:
• Ecommerce - Predicting customer trends and optimizing prices are a few of the ways e-
commerce uses Big Data analytics
• Marketing - Big Data analytics helps to drive high ROI marketing campaigns, which result
in improved sales
• Education - Used to develop new and improve existing courses based on market
requirements


• Healthcare - With the help of a patient’s medical history, Big Data analytics is used to
predict how likely they are to have health issues
• Media and entertainment - Used to understand the demand of shows, movies, songs, and
more to deliver a personalized recommendation list to its users
• Banking - Customer income and spending patterns help to predict the likelihood of
choosing various banking offers, like loans and credit cards
• Telecommunications - Used to forecast network capacity and improve customer
experience
• Government - Big Data analytics helps governments in law enforcement, among other
things
1.2.6 Social networking
Social networks are websites and apps that allow users and organizations to connect,
communicate, share information and form relationships. People can connect with others in the
same area, families, friends, and those with the same interests. Social networks are one of the
most important uses of the internet today.
Popular social networking sites -- such as Facebook, Yelp, Twitter, Instagram and TikTok --
enable individuals to maintain social connections, stay informed, and access and share a
wealth of information. These sites also enable marketers to reach their target audiences.
Social Networking:
The term social networking entails having connections in both the real and the digital worlds.
Today, this term is mainly used to reference online social communications. The internet has made
it possible for people to find and connect with others who they may never have met otherwise.
Online social networking is dependent on technology and internet connectivity. Users can access
social networking sites using their PCs, tablets or smartphones. Most social networking sites run
on a back end of searchable databases that use advanced programming languages, such as Python,
to organize, store and retrieve data in an easy-to-understand format. For example, Tumblr uses
such products and services in its daily operations as Google Analytics, Google Workspace and
WordPress.
Purpose of Social Networking:
Social networking fulfils the following four main objectives:
1. Sharing -Friends or family members who are geographically dispersed can connect
remotely and share information, updates, photos and videos. Social networking also
enables individuals to meet other people with similar interests or to expand their current
social networks.
2. Learning - Social networks serve as great learning platforms. Consumers can instantly
receive breaking news, get updates regarding friends and family, or learn about what's
happening in their community.


3. Interacting - Social networking enhances user interactions by breaking the barriers of time
and distance. With cloud-based video communication technologies such as WhatsApp or
Instagram Live, people can talk face to face with anyone in the world.
4. Marketing - Companies may tap into social networking services to enhance brand
awareness with the platform's users, improve customer retention and conversion rates, and
promote brand and voice identity.
Types of Social Networking:
The six most common types are the following:
1. Social connections. This is a type of social network where people stay in touch with
friends, family members, acquaintances or brands through online profiles and updates, or
find new friends through similar interests. Some examples are Facebook, Myspace and
Instagram.
2. Professional connections. Geared toward professionals, these social networks are
designed for business relationships. These sites can be used to make new professional
contacts, enhance existing business connections and explore job opportunities, for
example. They may include a general forum where professionals can connect with co-
workers or offer an exclusive platform based on specific occupations or interest levels.
Some examples are LinkedIn, Microsoft Yammer and Microsoft Viva.
3. Sharing of multimedia. Various social networks provide video- and photography-sharing
services, including YouTube and Flickr.
4. News or informational. This type of social networking allows users to post news stories
and informational or how-to content, and can be general purpose or dedicated to a single topic.
These social networks include communities of people who are looking for answers to
everyday problems and they have much in common with web forums. Fostering a sense
of helping others, members provide answers to questions, conduct discussion forums or
teach others how to perform various tasks and projects. Popular examples include Reddit,
Stack Overflow or Digg.
5. Communication. Here, social networks focus on allowing the user to communicate
directly with each other in one-on-one or group chats. They have less focus on posts or
updates and are like instant messaging apps. Some examples are WhatsApp, WeChat and
Snapchat.
6. Educational. Educational social networks offer remote learning, enabling students and
teachers to collaborate on school projects, conduct research, and interact through blogs
and forums. Google Classroom, LinkedIn Learning and ePals are popular examples.
Advantages of Social Networking
1. Brand awareness. Social networking enables companies to reach out to new and existing
clients. This helps to make brands more relatable and promotes brand awareness.
2. Instant reachability. By erasing the physical and spatial boundaries between people, social
networking websites can provide instant reachability.
3. Builds a following. Organizations and businesses can use social networking to build a
following and expand their reach globally.
4. Business success. Positive reviews and comments generated by customers on social
networking platforms can help improve business sales and profitability.


5. Increased website traffic. Businesses can use social networking profiles to boost and direct
inbound traffic to their websites. They can achieve this, for example, by adding inspiring
visuals, using plugins and shareable social media buttons, or encouraging inbound linking.
Disadvantages of Social Networking:
1. Rumors and misinformation. Incorrect information can slip through the cracks of social
networking platforms, causing havoc and uncertainty among consumers. Often, people
take anything posted on social networking sites at face value instead of verifying the
sources.
2. Negative reviews and comments. A single negative review can adversely affect an
established business, especially if the comments are posted on a platform with a large
following. A tarnished business reputation can often cause irreparable damage.
3. Data security and privacy concerns. Social networking sites can inadvertently put
consumer data at risk. For instance, if a social networking site experiences a data breach,
the users of that platform are automatically put at risk as well. According to Business
Insider, a data breach in April 2021 leaked the personal data of more than 500 million
Facebook users.
4. Time-consuming process. Promoting a business on social media requires constant
upkeep and maintenance. Creating, updating, preparing and scheduling regular posts can
take a considerable amount of time. This can be especially cumbersome for small
businesses that may not have the extra staff and resources to dedicate to social media
marketing.
1.2.7 Mobile computing
Mobile Computing refers to a technology that allows transmission of data, voice and video via
a computer or any other wireless-enabled device, free from any connection to a fixed physical
link. It allows users to move from one physical location to another during communication.
Introduction of Mobile Computing
Mobile Computing is a technology that provides an environment that enables users to transmit
data from one device to another device without the use of any physical link or cables.
In other words, Mobile computing allows transmission of data, voice and video via a computer
or any other wireless-enabled device without being connected to a fixed physical link. In this
technology, data transmission is done wirelessly with the help of wireless devices such as
mobiles, laptops etc.
With Mobile Computing technology one can access and transmit data from any remote locations
without being present there physically. Mobile computing technology provides a vast coverage
area for communication. It is one of the fastest and most reliable sectors of the computing
technology field.
The concept of Mobile Computing can be divided into three parts:
o Mobile Communication
o Mobile Hardware


o Mobile Software
Mobile Communication
Mobile Communication specifies the framework that is responsible for the working of mobile
computing technology: an infrastructure that ensures seamless, consistent, and reliable
communication among wireless devices. The mobile communication framework consists of
the protocols, services, bandwidth, and portals necessary to facilitate and support the stated
services, and these components are responsible for delivering a smooth communication
process.
Mobile communication can be divided in the following four types:
1. Fixed and Wired
2. Fixed and Wireless
3. Mobile and Wired
4. Mobile and Wireless

Fixed and Wired:


In Fixed and Wired configuration, the devices are fixed at a position, and they are connected
through a physical link to communicate with other devices.
Example - Desktop Computer.
Fixed and Wireless:
In Fixed and Wireless configuration, the devices are fixed at a position, and they are connected
through a wireless link to make communication with other devices.
Example - Communication Towers, WiFi router
Mobile and Wired:
In Mobile and Wired configuration, some devices are wired, and some are mobile. They
altogether make communication with other devices.
Example - Laptops.


Mobile and Wireless:


In Mobile and Wireless configuration, the devices can communicate with each other irrespective
of their position. They can also connect to any network without the use of any wired device.
Example - WiFi Dongle.
Mobile Hardware
Mobile hardware consists of mobile devices or device components that can be used to receive or
access the service of mobility. Examples of mobile hardware can be smartphones, laptops,
portable PCs, tablet PCs, Personal Digital Assistants, etc.

These devices are inbuilt with a receptor medium that can send and receive signals. These devices
are capable of operating in full-duplex. It means they can send and receive signals at the same
time. They don't have to wait until one device has finished communicating for the other device
to initiate communications.
Mobile Software
Mobile software is a program that runs on mobile hardware. It is designed to deal capably
with the characteristics and requirements of mobile applications, and it acts as the operating
system of the mobile device; in other words, it is the heart of the mobile system. It is an
essential component that operates the mobile device.

This provides portability to mobile devices, which ensures wireless communication.


Applications of Mobile Computing
Following is a list of some significant fields in which mobile computing is generally applied:


o Web or Internet access.


o Global Position System (GPS).
o Emergency services.
o Entertainment services.
o Educational services.
1.2.8 Characteristics of third platform infrastructure
The Third Platform infrastructure is characterized by several key features and characteristics that
set it apart from previous computing paradigms. These characteristics are fundamental to the
modern IT landscape and enable organizations to leverage advanced technologies to meet their
evolving business needs.
The primary characteristics of Third Platform infrastructure:

Scalability - Third Platform infrastructure is designed to be highly scalable. Cloud services can
be easily scaled up or down based on demand, ensuring that organizations can handle fluctuations
in workload without overprovisioning resources. This scalability supports agility and cost-
efficiency.

Security and Compliance - Security is a critical consideration in Third Platform infrastructure.


Cloud providers implement robust security measures, including encryption, identity and access
management, and compliance certifications. Organizations must also take responsibility for
securing their applications and data.

Availability - Third Platform infrastructure is designed for high availability and disaster recovery.
Cloud providers offer redundancy and data replication across multiple regions to ensure business
continuity.

Performance - Performance in Third Platform infrastructure is a critical consideration as


organizations increasingly rely on digital technologies to deliver services, process data, and
support their operations. Ensuring optimal performance is essential to meet user expectations,
maintain productivity, and deliver a positive user experience.

Manageability - Manageability in Third Platform infrastructure is crucial for efficiently


deploying, monitoring, and maintaining complex IT systems and services. As organizations
increasingly rely on digital technologies to support their operations, ensuring manageability is
essential for maintaining security, scalability, performance, and overall operational excellence.

Ease of Access - Ease of access generally refers to how easily individuals can obtain or interact
with something, whether it's information, services, physical spaces, or digital resources. In the
context of technology and digital services, ease of access is a critical consideration to ensure that
users can efficiently and conveniently access and utilize various resources.


Agility - Agility in Third Platform infrastructure refers to the ability of an organization's IT


environment to rapidly adapt to changing business needs, technology advancements, and market
dynamics. Third Platform infrastructure, which encompasses cloud computing, big data analytics,
mobile computing, and other modern technologies, is designed to provide the flexibility and
agility required for organizations to stay competitive and responsive in a rapidly evolving digital
landscape. Ability to quickly adapt to changing needs.

Flexibility - Third Platform infrastructure offers flexibility in choosing deployment models.


Organizations can adopt a hybrid cloud approach, combining public and private cloud resources
as needed. This flexibility allows them to optimize performance, security, and compliance while
meeting specific business requirements. Support a wide range of workloads and applications.

Resiliency - Resiliency in Third Platform infrastructure is a critical aspect that ensures that IT
systems and services can continue to operate reliably and recover quickly from disruptions or
failures. With the increasing complexity of modern IT environments, resiliency is essential to
maintain business continuity, minimize downtime, and protect against various threats and
challenges. High availability and Fault tolerance.

Interoperability - Interoperability in Third Platform infrastructure refers to the ability of various


systems, applications, and components to work together seamlessly, share data, and exchange
information effectively. Ability of computer systems or software to exchange and make use of
information.

1.2.9 Imperatives for third platform transformation.


The following imperatives are the principal reasons for the transformation to the third platform.
New Business Models:
The term business model refers to a company's plan for making a profit. It identifies the products
or services the business plans to sell, its identified target market, and any anticipated expenses.
Business models are important for both new and established businesses. They help new,
developing companies attract investment, recruit talent, and motivate management and staff. New
business models are a main imperative for the third platform transformation.
Established businesses should regularly update their business model or they'll fail to anticipate
trends and challenges ahead. Business models also help investors evaluate companies that interest
them and employees understand the future of a company they may aspire to join.
Retailer, Manufacturer, Fee-for-Service, Subscription, Freemium, Bundling, Marketplace,
Affiliate, Razor Blade, Reverse Razor Blade, Franchise, Pay-As-You-Go, Brokerage are some
types of new business models.
Data intelligence
Data intelligence refers to the tools and methods that enterprise-scale organizations use to better
understand the information they collect, store, and utilize to improve their products and/or
services. Apply AI and machine learning to stored data, and you get data intelligence.


Agility
Agility is the ability to move and adapt quickly. Agility in today’s business, and in the world
around it, is an important factor in the third platform transformation.
Intelligent Operations
Intelligent Operations is a bold new approach to achieve Operational Excellence (OpEx). It uses
digital transformation to optimize production, minimize equipment downtime, enhance human
performance, and manage operational risks.
New product and Services
A product is a tangible offering to a customer, whereas a service is an intangible offering. The
former is usually a one-time exchange for value. In contrast, a service usually involves a longer
period of time. The value of a product is inherent in the tangible offering itself, for example, in
the can of paint or a pair of pants. In contrast, the value of a service often comes from the eventual
benefit that the customer perceives from the time while using the service. In addition, the
customer often judges the value of a service based on the quality of the relationship between the
provider and the customer while using the service.
Mobility
Business mobility, also known as enterprise mobility, is the growing trend of businesses to offer
remote working options, allow the use of personal laptops and mobile devices for business
purposes, and make use of cloud technology for data access.
Social Networking
Social networking involves using online social media platforms to connect with new and existing
friends, family, colleagues, and businesses. Individuals can use social networking to announce
and discuss their interests and concerns with others who may support or interact with them.
1.3 Data Center Environment
Organizations maintain data centers to provide centralized data processing capabilities across the
enterprise. Data centers store and manage large amounts of mission-critical data. The data center
infrastructure includes computers, storage systems, network devices, dedicated power backups,
and environmental controls (such as air conditioning and fire suppression).
Large organizations often maintain more than one data center to distribute data processing
workloads and provide backups in the event of a disaster. The storage requirements of a data
center are met by a combination of various storage architectures.
Core Elements
Five core elements are essential for the basic functionality of a data center:
Application: An application is a computer program that provides the logic for computing
operations. Applications, such as an order processing system, can be layered on a database, which
in turn uses operating system services to perform read/write operations to storage devices.


Database: A database management system (DBMS) provides a structured way to store data in
logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of
data.
Server and operating system: A computing platform that runs applications and databases.
Network: A data path that facilitates communication between clients and servers or between
servers and storage.
Storage array: A device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the elements
must work together to address data processing requirements. Figure 1-5 shows an example of an
order processing system that involves the five core elements of a data center and illustrates their
functionality in a business process.

1. A customer places an order through the GUI of the order processing application software
located on the client computer.
2. The client connects to the server over the LAN and accesses the DBMS located on the
server to update the relevant information such as the customer name, address, payment
method, products ordered, and quantity ordered.
3. The DBMS uses the server operating system to read and write this data to the database
located on physical disks in the storage array.
4. The Storage Network provides the communication link between the server and the storage
array and transports the read or write commands between them.
5. The storage array, after receiving the read or write commands from the server, performs
the necessary operations to store the data on physical disks.
Figure 1-5: Example of an order processing system
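The five-step flow above can be sketched as a toy model in Python. All class and method names here (StorageArray, DBMS, OrderApplication, and so on) are invented for illustration only; a real deployment involves separate machines, a LAN, a storage network, and a real DBMS rather than in-process calls.

```python
# Illustrative sketch of the five core elements co-operating in an
# order-processing flow. Every name below is hypothetical.

class StorageArray:
    """Stores data persistently on (simulated) physical disks."""
    def __init__(self):
        self.disks = {}

    def write(self, key, value):
        self.disks[key] = value          # step 5: data lands on the physical disks


class DBMS:
    """Structures data into tables; delegates persistence to the storage array."""
    def __init__(self, storage):
        self.storage = storage           # reached over the storage network

    def update(self, table, record):
        # steps 3-4: read/write commands travel via the server OS and the
        # storage network to the storage array
        self.storage.write((table, record["order_id"]), record)


class OrderApplication:
    """Application logic layered on top of the database."""
    def __init__(self, dbms):
        self.dbms = dbms                 # reached by the client over the LAN

    def place_order(self, customer, items):
        record = {"order_id": 1, "customer": customer, "items": items}
        self.dbms.update("orders", record)   # step 2: update the order details
        return record["order_id"]


# step 1: the customer places an order through the application's interface
app = OrderApplication(DBMS(StorageArray()))
order_id = app.place_order("Alice", ["disk drive"])
print("order stored with id", order_id)
```

The point of the sketch is the layering: the application never touches the disks directly; each element only talks to the layer beneath it.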
1.3.1 Building blocks of a data center
Physical Infrastructure:
Facility Location:
The location of the data center is crucial, considering factors like proximity to power sources,
network connectivity, and disaster risk.
Building:


The physical structure housing the data center, which must be designed to withstand
environmental threats, such as earthquakes, floods, and fire.
Power Infrastructure:
Ensures a continuous and reliable power supply, including backup generators and uninterruptible
power supplies (UPS) to prevent data loss during power outages.
Cooling and HVAC:
Efficient cooling systems are necessary to maintain the proper temperature and humidity levels
inside the data center, as servers generate a significant amount of heat.
Racks and Cabinets:
Server Racks:
These house the servers, networking equipment, and other hardware. They are designed to
optimize space and airflow for cooling.
Cabinets:
Secure cabinets or enclosures to store networking equipment, switches, and patch panels.
Servers and Hardware:
Server Hardware:
The core computing components, including servers, storage devices, and backup systems.
Storage Systems:
Arrays of hard drives or solid-state drives (SSDs) for data storage and retrieval.
Networking Equipment:
Routers, switches, and firewalls to manage data traffic within the data center and across networks.
Power Distribution:
Power Distribution Units (PDUs):
Devices that distribute power to servers and networking equipment within racks and cabinets.
Redundancy:
Implementing redundancy in power distribution to ensure uninterrupted operation.
Cooling and Environmental Control:
Precision Air Conditioning: HVAC systems designed to maintain a controlled environment for
temperature and humidity.
Hot and Cold Aisle Containment: Arranging server racks in hot and cold aisles to optimize
cooling efficiency.
Network Infrastructure:


Internet Connectivity: Multiple high-speed internet connections and redundancy for


uninterrupted network access.
Firewalls and Security:
Hardware and software systems to protect the data center from cyber threats.
Load Balancers:
Distribute incoming network traffic across multiple servers to optimize performance and ensure
high availability.
Monitoring and Management:
Data Center Management Software:
Tools for monitoring and managing servers, networking equipment, and power consumption.
Environmental Sensors:
Sensors for tracking temperature, humidity, and other environmental factors.
Security Measures:
Physical Security:
Access controls, surveillance cameras, and biometric authentication to protect against
unauthorized access.
Fire Suppression:
Fire detection and suppression systems to safeguard against fire hazards.
Backup and Disaster Recovery:
Data Backup:
Regularly backing up critical data to prevent data loss in case of hardware failures or disasters.
Disaster Recovery Plan:
Strategies and procedures for quickly recovering data and operations in the event of a catastrophe.
Scalability and Future Expansion:
Data centers are typically designed with the ability to scale up by adding more servers, storage,
and networking equipment as the organization's needs grow.
Compliance and Regulations:
Ensuring that the data center complies with industry-specific regulations and standards, such as
GDPR, HIPAA, or ISO 27001.
Human Resources:
Skilled personnel for managing and maintaining the data center, including system administrators,
network engineers, and security experts.


Documentation and Processes:


Proper documentation of configurations, processes, and procedures for efficient data center
management and troubleshooting.
Energy Efficiency:
Implementing green technologies and practices to reduce energy consumption and environmental
impact.
Remote Management:
Tools and systems for remote monitoring and management of data center resources, allowing for
remote troubleshooting and maintenance.
1.3.2 Compute systems
A compute system essentially means a computer; however, data centers use specific computers called
servers, and so the term computing system will often be used to refer to a server.

A personal computer tends to come with more input/output devices such as a mouse, keyboard,
and monitor so a user can interact directly with the system while servers do not usually include
components for direct user interaction. Of course, a computing system, even a data center server,
must have logical components such as an OS and other forms of system software.
Outside of a data center, computing systems can be large or small, depending on a user’s
computing needs. For example, a gamer will probably use a large gaming laptop or PC tower
because they need computing speed and space for good graphics, but that would be overkill for
someone just looking up a recipe to make dinner. They would be fine with using a tablet or phone.
However, in a data center, a computing system needs to be large enough to provide the amount
of processing power needed to store and process data for a large number of users at one time such
as for businesses, companies, and government agencies. That is why data centers use servers.
This is because unlike desktop computers, which focus their processing power on user-friendly
features like a desktop screen and media, servers are designed to use most of their processing
power to host services and to interact with other machines. This makes servers more efficient and
better equipped to deal with higher workloads.


The tower system is a traditional PC tower. These are also commonly used in computer labs and
offices so more than likely, you’ve used a tower system like a personal computer, but not as a
server.
The rack-mounted system is a thin, large rectangular compute system that slides onto the racks
of a frame. When the frame is full of rack-mounted systems, it resembles a tall metal set of
drawers. A typical metal frame used for this type of system is called a 19-inch rack, based on its
width. The height of the frame is measured by slots available for rack-mounted servers, called
rack units (U). For example, a standard 19-inch rack is 42U, which means it holds up to 42
one-unit (1U) rack-mounted servers. A 42U frame is taller than the average person, but smaller
frames are available, such as 1U and 8U, which take up less space.
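The rack-unit arithmetic can be checked with a one-line calculation; the function name is made up for the example.

```python
def servers_per_rack(rack_units, server_height_u):
    """How many equally sized rack-mounted servers fit in one frame."""
    return rack_units // server_height_u

# A standard 42U rack holds 42 one-U servers, or 21 two-U servers.
print(servers_per_rack(42, 1))   # 42
print(servers_per_rack(42, 2))   # 21
```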
The blade server, like the rack-mounted, has rectangular hardware inserted into a larger frame.
However, these are usually inserted vertically into the frame, which would look like a set of
drawers on its side. The adoption of smaller form-factor blade servers is growing dramatically.
Since the transition to blade architectures is generally driven by a desire to consolidate physical
IT resources, virtualization is an ideal complement for blade servers because it delivers benefits
such as resource optimization, operational efficiency, and rapid provisioning.
1.3.3 Compute virtualization
Compute virtualization is the process of creating a virtual version of computing hardware, operating
systems, computer networks, or other resources. It simplifies traditional architectures in order
to reduce the number of physical devices.

Compute virtualization is a process which enhances the efficiency and reduces the cost of IT
infrastructure. It provides a flexible model for virtual machines through which physical servers
are treated as a pool of resources. It works by consolidating the servers, and thus reduces the need


for computer equipment and other related hardware, thus reducing the costs. It simplifies the
business procedures related to licensing, and thus can make things more manageable. It creates a
centralized infrastructure which can be shared and accessed from various employees sitting in
different locations at a time.
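The idea of treating physical servers as a pool of resources can be sketched with a toy first-fit placement routine. All names here (Hypervisor, place_vm, and so on) are invented for illustration and do not correspond to any real virtualization API.

```python
# Minimal sketch of treating physical servers as a pool of resources
# and consolidating virtual machines onto them. Purely illustrative.

class PhysicalServer:
    def __init__(self, name, cpus):
        self.name, self.free_cpus = name, cpus


class Hypervisor:
    def __init__(self, servers):
        self.servers = servers           # the consolidated resource pool

    def place_vm(self, vm_name, cpus_needed):
        # first-fit placement: use the first host with enough spare CPUs
        for host in self.servers:
            if host.free_cpus >= cpus_needed:
                host.free_cpus -= cpus_needed
                return host.name
        raise RuntimeError("pool exhausted: no host can fit " + vm_name)


pool = Hypervisor([PhysicalServer("host-a", 8), PhysicalServer("host-b", 8)])
print(pool.place_vm("web-vm", 6))    # host-a
print(pool.place_vm("db-vm", 4))     # host-b (host-a has only 2 CPUs left)
```

Consolidation shows up in the second placement: the VM lands on whichever host still has capacity, rather than requiring a dedicated machine.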

1.3.4 Software-defined data center


Software-defined data center (SDDC) refers to a data center where infrastructure is virtualized
through abstraction, resource pooling, and automation to deliver Infrastructure-as-a-service
(IAAS). Software-defined infrastructure lets IT administrators easily provision and manage
physical infrastructure using software-defined templates and APIs to define and automate
infrastructure configuration and lifecycle operations.

Software-defined data centers are considered by many to be the next step in the evolution of
virtualization, container and cloud services.


Advantages of a software-defined data center


Simplify data center management:
SDDC can be managed via a central dashboard enabling IT users to view inventory, health status
and control all of the server, storage and networking infrastructure via intelligent software.
Faster IT Service Delivery
Software-defined intelligence enables automated provisioning with repeatable templates that
ensure high reliability, consistency, and control across the SDDC.
Reduce Costs
By utilizing composable infrastructure within an SDDC, IT can pool resources for any workload
(bare metal, virtualized, and containerized) to eliminate silos and reduce overprovisioning, while
utilizing software-defined intelligence to achieve faster time to value.
Gain cloud agility on-premises
SDDC enables “infrastructure as code” for superior control, programmability, and extensibility.
Business applications, infrastructure management, automation and service orchestration tools can
stand-up infrastructure and provision resources in real time to support dynamic workloads and
fluctuating business demands, and to enable DevOps, self-service IT and agile development
practices.

Two Marks with Answers

1. Define Data
Ans:

Data is a collection of raw facts from which conclusions may be drawn. Handwritten letters, a printed book,
a family photograph, a movie on video tape, printed and duly signed copies of mortgage papers, a bank’s
ledgers, and an account holder’s passbooks are all examples of data.

2. Define Information
Ans:

Information is the intelligence and knowledge derived from data. Effective data analysis not only extends
its benefits to existing businesses, but also creates the potential for new business opportunities by using
the information in creative ways.

3. What is called as Digital Data?


Ans:

➢ Digital Data is any information that is processed and stored in a digital format, such as text, images,
audio, and video.
➢ It can be easily manipulated and transmitted electronically.
➢ This data is represented using binary code (0s and 1s)
➢ Examples - Emails, Social Media posts, Digital Images, Music Files, Video Files, and more.

4. What are the types of Digital Data?


Ans: The types of Digital Data are as follows

➢ Structured Data
➢ Unstructured Data
➢ Semi-Structured Data

5. What is Structured Data with Example?


✓ Structured data is usually stored in well-defined schemas such as Databases.
✓ Structured data is data that is highly organized and follows a predefined schema or data model. It is
generally tabular, with columns and rows that clearly define its attributes.
Examples:

➢ Relational databases, Spread Sheets, XML file, JSON files.

6. What is Unstructured Data with Example?


Ans: Unstructured data is data that lacks a specific format; it often contains text, multimedia, or other
content. Data is unstructured if it cannot be stored in rows and columns, which makes it difficult for
business applications to query and retrieve.

Examples:

➢ Text documents, Images and Videos, Social media posts and comments, Email messages.

7. What is Semi- Structured Data with Example?


Ans: Semi-structured data is data that is not captured or formatted in conventional ways.

It does not follow the format of a tabular data model or relational databases, and it does not have a fixed
schema. The data is not completely raw or unstructured; it contains some structural elements, such as tags
and organizational metadata, that make it easier to analyze.

Example – HTML code, Graphs and tables, E-mails, XML documents.

8. What is Big Data?


Ans: The definition of big data is data that contains greater variety, arriving in increasing volumes and
with more velocity. This is also known as the three Vs. Put simply, big data is larger, more complex data
sets, especially from new data sources.

9. Difference between Structured, Unstructured and Semi-structured data with examples


Ans:

Properties compared across structured, unstructured, and semi-structured data:

Technology: Structured - relational database tables; Unstructured - character and binary data; Semi-structured - XML / RDF.

Transaction management: Structured - matured transaction management with various concurrency techniques; Unstructured - no transaction management and no concurrency; Semi-structured - transaction management adapted from RDBMS, not matured.

Version management: Structured - versioning over tuples, rows, tables, etc.; Unstructured - versioned as a whole; Semi-structured - versioning over tuples or graphs is possible.

Flexibility: Structured - less flexible, dependent on a rigorous schema; Unstructured - very flexible, absence of schema; Semi-structured - flexible, tolerant schema.

Scalability: Structured - scaling the database schema is difficult; Unstructured - very scalable; Semi-structured - schema scaling is simple.

Robustness: Structured - very robust; Unstructured - less robust; Semi-structured - new technology, not widely spread.

Query performance: Structured - structured queries allow complex joins; Unstructured - only textual queries possible; Semi-structured - queries over anonymous nodes are possible.

Format: Structured - predefined format; Unstructured - variety of formats; Semi-structured - -.

Analysis: Structured - easy; Unstructured - difficult; Semi-structured - easy.

Examples: Structured - relational databases, spreadsheets; Unstructured - text documents, images and videos, social media posts and comments, email messages; Semi-structured - HTML code, graphs and tables, e-mails, XML documents.

10. What is data explosion?


Ans:

Inexpensive and easier ways to create, collect, and store all types of data, coupled with increasing
individual and business needs, have led to accelerated data growth, popularly termed the data explosion.
Data has different purposes and criticality, so both individuals and businesses have contributed in varied
proportions to this data explosion.

11. What is meant by Storage?


Ans:

➢ Data created by individuals or businesses must be stored so that it is easily accessible for further
processing. In a computing environment, devices designed for storing data are termed storage devices
or simply storage.
➢ The type of storage used varies based on the type of data and the rate at which it is created and used.
➢ Devices such as memory in a cell phone or digital camera, DVDs, CD-ROMs, and hard disks in
personal computers are examples of storage devices.
➢ Businesses have several options available for storing data including internal hard disks, external disk
arrays and tapes.


12. What is meant by Information Storage?


Ans:

➢ Information Storage refers to the process of collecting, preserving, and organizing data in a manner
that allows for efficient retrieval and use at a later time.
➢ Information storage systems are an integral part of our daily lives, playing a crucial role in various
fields, including Business, Science, Education, Personal communication.
13. What is meant by Cloud Computing?
Ans: Cloud computing is the delivery of computing services—including servers, storage, databases,
networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster
innovation, flexible resources, and economies of scale.

14. Define Computing?


Ans: Computing is the process of using computer technology to complete a given goal-oriented task.
Computing may encompass the design and development of software and hardware systems for a broad
range of purposes – often structuring, processing and managing any kind of information – to aid in the
pursuit of scientific studies, making intelligent systems, and creating and using different media for
entertainment and communication.

15. What is Information Lifecycle?


Ans:

The information lifecycle is the “change in the value of information” over time. When data is first created,
it often has the highest value and is used frequently. As data ages, it is accessed less frequently and is of
less value to the organization.

16. What are the types of Cloud Deployment Models?


Ans: Types of cloud deployment models are as follows

➢ Public Cloud
➢ Private Cloud
➢ Hybrid Cloud
➢ Community Cloud
➢ Multi-Cloud
17. What is public cloud with example?
Ans: The public cloud is defined as computing services offered by third-party providers over the public
Internet, making them available to anyone who wants to use or purchase them. They may be free or sold
on-demand, allowing customers to pay only per usage for the CPU cycles, storage, or bandwidth they
consume.

Examples: Google Workspace, Amazon Web Services (AWS), Dropbox, and Microsoft offerings like
Microsoft 365 and Azure, as well as streaming services like Netflix.

18. What is private cloud with example?


Ans: A private cloud is a cloud computing environment dedicated to a single organization. Any cloud
infrastructure has underlying compute resources like CPU and storage that you provision on demand
through a self-service portal. In a private cloud, all resources are isolated and in the control of one
organization.

Examples: Amazon VPC, and offerings from HPE, VMware, and IBM. These leverage technologies such as
virtualization, management software, and automation. A private cloud can also leverage DevOps and
cloud-native practices to maximize agility.

19. What are the types Cloud Service Models?


Infrastructure as a Service (IaaS) - Infrastructure (compute, storage, and network) is provided.

Platform as a Service (PaaS) - The operating system and network are provided, in addition to the infrastructure.

Software as a Service (SaaS) - The required software, operating system, and network are all provided.

Anything/Everything as a Service (XaaS) - A combination of all services with some additional services.

Function as a Service (FaaS) - A serverless model closely related to PaaS, in which individual functions run on demand.

20. What is Data Center?


Ans: It is a collection of servers where applications are placed and accessed via the internet. A data center is a
physical room, building or facility that houses IT infrastructure for building, running, and delivering
applications and services, and for storing and managing the data associated with those applications and
services.

21. What is Software- Defined Data Center?


Ans: Software-defined data center (SDDC) refers to a data center where infrastructure is virtualized
through abstraction, resource pooling, and automation to deliver Infrastructure-as-a-Service (IaaS).

22. What is called as Big Data Analytics?


Ans: Big data analytics describes the process of uncovering trends, patterns, and correlations in large
amounts of raw data to help make data-informed decisions. These processes use familiar statistical analysis
techniques—like clustering and regression—and apply them to more extensive datasets with the help of
newer tools.

23. What is meant by Compute Virtualization?


Ans: Compute virtualization is a process by which a virtual version is created of computing hardware,
operating systems, computer networks or other resources. It is a simplification of traditional architectures
in order to reduce the number of physical devices.

24. What is Mobile Computing?


Ans: Mobile computing refers to the set of IT technologies, products, services and operational strategies
and procedures that enable end users to access computation, information and related resources and
capabilities while mobile. Mobile most commonly refers to access in motion, where the user is not
restricted to a given geographic location.

25. What is Compute system?


Ans: A compute system essentially means a computer; however, data centers use specific computers called
servers, and so the term computing system will often be used to refer to a server.

26. What are the Key Components of Data Center Environment?


Ans: The key components of a data center design include

➢ Routers,
➢ Switches,
➢ Firewalls,
➢ Storage systems,
➢ Servers,
➢ Application-delivery controllers.

Review Questions:

1. Explain in detail about Evolution of Computing Platforms?


2. Write about the characteristics of Third Platform Infrastructure?


3. Explain in detail about the Building blocks of Data Center?


4. Explain about Social Networking in Detail?
5. Explain about ILM in detail?
6. Explain the characteristics of Data Center in detail?
7. Discuss in detail about Cloud Service and Deployment models?
8. Explain in detail about IaaS, PaaS, SaaS with Examples?
9. Write about Mobile Computing with example in detail?
10. Explain in detail about Big Data Analytics?
11. Discuss about imperatives for third platform transformation?
12. Explain about the essential characteristics of cloud computing?
13. Explain in detail about compute systems and Compute virtualization
14. Describe about Software defined data-center?


UNIT 2 - INTELLIGENT STORAGE SYSTEMS AND RAID


2.1 Intelligent Storage Systems

The intelligent storage systems are arrays that provide highly optimized I/O processing
capabilities. These arrays have an operating environment that controls the management,
allocation, and utilization of storage resources. These storage systems are configured with large
amounts of memory called cache and use sophisticated algorithms to meet the I/O requirements
of performance sensitive applications.
2.1.1 Components of an Intelligent Storage System
An intelligent storage system consists of four key components: front end, cache, back end, and
physical disks. Figure 4-1 illustrates these components and their interconnections. An I/O request
received from the host at the front-end port is processed through cache and the back end, to enable
storage and retrieval of data from the physical disk. A read request can be serviced directly from
cache if the requested data is found in cache.
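The read path described above can be sketched as: service the request from cache on a hit, otherwise fetch the data from the physical disks through the back end and stage it into cache. The code below is a simplified illustrative model with invented names, not a real array's firmware.

```python
# Simplified read path of an intelligent storage system: a read is
# serviced from cache on a hit; on a miss, data is fetched from the
# back end (physical disks) and staged into cache for future reads.

class SimpleArray:
    def __init__(self, disk_data):
        self.disk = disk_data      # stands in for the back end + physical disks
        self.cache = {}

    def read(self, block):
        if block in self.cache:
            return self.cache[block], "hit"
        data = self.disk[block]    # slow path through the back end
        self.cache[block] = data   # stage into cache
        return data, "miss"


array = SimpleArray({0: "A", 1: "B"})
print(array.read(0))   # ('A', 'miss') -- first access goes to disk
print(array.read(0))   # ('A', 'hit')  -- now serviced directly from cache
```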

2.1.1.1 Front End


The front end provides the interface between the storage system and the host. It consists of two
components: front-end ports and front-end controllers. The front-end ports enable hosts to
connect to the intelligent storage system. Each front-end port has processing logic that executes
the appropriate transport protocol, such as SCSI, Fibre Channel, or iSCSI, for storage
connections. Redundant ports are provided on the front end for high availability. Front-end
controllers route data to and from cache via the internal data bus. When cache receives write data,
the controller sends an acknowledgment message back to the host. Controllers optimize I/O
processing by using command queuing algorithms.
Front-End Command Queuing
Command queuing is a technique implemented on front-end controllers. It determines the
execution order of received commands and can reduce unnecessary drive-head movements and
improve disk performance. When a command is received for execution, the command queuing
algorithm assigns a tag that defines the sequence in which commands should be executed. With


command queuing, multiple commands can be executed concurrently based on the organization
of data on the disk, regardless of the order in which the commands were received.
The most commonly used command queuing algorithms are as follows:
■ First In First Out (FIFO): This is the default algorithm where commands are executed in the
order in which they are received (Figure 4-2 [a]). There is no reordering of requests for
optimization; therefore, it is inefficient in terms of performance.
■ Seek Time Optimization: Commands are executed based on optimizing read/write head
movements, which may result in reordering of commands. Without seek time optimization, the
commands are executed in the order they are received. For example, as shown in Figure 4-2(a),
the commands are executed in the order A, B, C and D. The radial movement required by the
head to execute C immediately after A is less than what would be required to execute B. With
seek time optimization, the command execution sequence would be A, C, B and D, as shown in
Figure 4-2(b).
■ Access Time Optimization: Commands are executed based on the combination of seek time
optimization and an analysis of rotational latency for optimal performance.
Command queuing can also be implemented on disk controllers and this may further supplement
the command queuing implemented on the front-end controllers. Some models of SCSI and Fibre
Channel drives have command queuing implemented on their controllers.
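As an illustration of how seek time optimization can reorder a queue, the following is a minimal greedy shortest-seek-first sketch in Python. The command tags and track numbers mirror the A, B, C, D example above, but the function name, the starting head position, and the greedy strategy are assumptions for illustration, not a vendor's actual algorithm.

```python
# Illustrative sketch (not a vendor algorithm): greedy shortest-seek-first
# reordering of queued commands by track number, assuming the head starts
# at track 0 and each command is a (tag, track) pair.

def reorder_shortest_seek(commands, head_start=0):
    """Return commands reordered so the next command serviced is always
    the one whose track needs the least radial head movement."""
    pending = list(commands)
    ordered = []
    head = head_start
    while pending:
        # Pick the pending command with the smallest radial movement.
        nearest = min(pending, key=lambda cmd: abs(cmd[1] - head))
        pending.remove(nearest)
        ordered.append(nearest)
        head = nearest[1]  # head settles on that command's track
    return ordered

# FIFO order would be A, B, C, D; with seek optimization, C (track 12)
# is serviced right after A (track 10) because it needs less head travel
# than B (track 40).
queue = [("A", 10), ("B", 40), ("C", 12), ("D", 45)]
print([tag for tag, _ in reorder_shortest_seek(queue)])  # ['A', 'C', 'B', 'D']
```

The hypothetical track numbers are chosen so the result matches the A, C, B, D sequence described for Figure 4-2(b).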


2.1.1.2 Cache
Cache is an important component that enhances the I/O performance in an intelligent storage
system. Cache is semiconductor memory where data is placed temporarily to reduce the time
required to service I/O requests from the host. Cache improves storage system performance by
isolating hosts from the mechanical delays associated with physical disks, which are the slowest
components of an intelligent storage system. Accessing data from a physical disk usually takes a
few milliseconds because of seek times and rotational latency. If a disk has to be accessed by the
host for every I/O operation, requests are queued, which results in a delayed response. Accessing
data from cache takes less than a millisecond. Write data is placed in cache and then written to
disk. After the data is securely placed in cache, the host is acknowledged immediately.
Structure of Cache
Cache is organized into pages or slots, the smallest units of cache allocation. The size of
a cache page is configured according to the application I/O size. Cache consists of the data store
and tag RAM. The data store holds the data while tag RAM tracks the location of the data in the
data store (see Figure 4-3) and in disk. Entries in tag RAM indicate where data is found in cache
and where the data belongs on the disk. Tag RAM includes a dirty bit flag, which indicates
whether the data in cache has been committed to the disk or not. It also contains time-based
information, such as the time of last access, which is used to identify cached information that has
not been accessed for a long period and may be freed up.

Read Operation with Cache

When a host issues a read request, the front-end controller accesses the tag RAM to determine
whether the required data is available in cache. If the requested data is found in the cache, it is
called a read cache hit or read hit and data is sent directly to the host, without any disk operation
(see Figure 4-4[a]). This provides a fast response time to the host (about a millisecond). If the
requested data is not found in cache, it is called a cache miss and the data must be read from the
disk (see Figure 4-4[b]). The back-end controller accesses the appropriate disk and retrieves the
requested data. Data is then placed in cache and is finally sent to the host through the front-end
controller. Cache misses increase I/O response time.


A pre-fetch, or read-ahead, algorithm is used when read requests are sequential. In a sequential
read request, a contiguous set of associated blocks is retrieved. The intelligent storage system
offers fixed and variable pre-fetch sizes.
In fixed pre-fetch, the intelligent storage system pre-fetches a fixed amount of data. It is most
suitable when I/O sizes are uniform.
In variable pre-fetch, the storage system pre-fetches an amount of data in multiples of the size of
the host request. Maximum pre-fetch limits the number of data blocks that can be pre-fetched to
prevent the disks from being rendered busy with pre-fetch at the expense of other I/O.

Read performance is measured in terms of the read hit ratio, or the hit rate, usually expressed as
a percentage. This ratio is the number of read hits with respect to the total number of read requests.
A higher read hit ratio improves the read performance.
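The read hit/miss flow and the hit ratio can be sketched as follows. This is a minimal illustration of the tag lookup, miss fill, and hit-rate bookkeeping described above; the class and attribute names are hypothetical, not a real array's implementation.

```python
# Minimal sketch of servicing reads through cache: tag lookup, miss fill
# from disk, and read hit ratio computation. Illustrative names only.

class ReadCache:
    def __init__(self):
        self.pages = {}       # block number -> data (the "data store")
        self.hits = 0
        self.requests = 0

    def read(self, block, disk):
        self.requests += 1
        if block in self.pages:        # read hit: served from cache
            self.hits += 1
        else:                          # read miss: fetch from disk, then cache
            self.pages[block] = disk[block]
        return self.pages[block]

    @property
    def hit_ratio(self):
        # Number of read hits with respect to total read requests.
        return self.hits / self.requests if self.requests else 0.0

disk = {n: f"data-{n}" for n in range(8)}
cache = ReadCache()
for block in [1, 2, 1, 3, 1]:
    cache.read(block, disk)
print(cache.hit_ratio)   # 0.4 — two hits (repeat reads of block 1) out of five
```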
Write Operation with Cache
Write operations with cache provide performance advantages over writing directly to disks. When
an I/O is written to cache and acknowledged, it is completed in far less time (from the host’s
perspective) than it would take to write directly to disk. Sequential writes also offer opportunities
for optimization because many smaller writes can be coalesced for larger transfers to disk drives
with the use of cache.
A write operation with cache is implemented in the following ways:


■ Write-back cache: Data is placed in cache and an acknowledgment is sent to the host
immediately. Later, data from several writes are committed (de-staged) to the disk. Write response
times are much faster, as the write operations are isolated from the mechanical delays of the disk.
However, uncommitted data is at risk of loss in the event of cache failures.
■ Write-through cache: Data is placed in the cache and immediately written to the disk, and an
acknowledgment is sent to the host. Because data is committed to disk as it arrives, the risks of
data loss are low but write response time is longer because of the disk operations.
Cache can be bypassed under certain conditions, such as very large size write I/O. In this
implementation, if the size of an I/O request exceeds the predefined size, called write aside size,
writes are sent to the disk directly to reduce the impact of large writes consuming a large cache
area. This is particularly useful in an environment where cache resources are constrained and
must be made available for small random I/Os.
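The three write paths just described can be sketched as a simple decision function. The threshold name `write_aside_size_kb` and the step lists are illustrative assumptions, not a real controller's logic.

```python
# Sketch of the write paths above: write-back, write-through, and the
# write-aside bypass for large I/Os. Names and thresholds are illustrative.

def handle_write(size_kb, mode, write_aside_size_kb=1024):
    """Return the steps taken for a write request, in order."""
    if size_kb > write_aside_size_kb:
        # Large I/O bypasses cache so it doesn't consume a large cache area.
        return ["write to disk", "ack host"]
    if mode == "write-back":
        # Ack immediately; de-stage data to disk later.
        return ["place in cache", "ack host", "de-stage to disk later"]
    if mode == "write-through":
        # Commit to disk before acknowledging, so data-loss risk is low.
        return ["place in cache", "write to disk", "ack host"]
    raise ValueError(mode)

print(handle_write(4, "write-back"))     # fast ack straight from cache
print(handle_write(4096, "write-back"))  # exceeds write aside size: bypass
```

Note how write-back acknowledges before the disk write, which is exactly why uncommitted data is at risk if the cache fails.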
Cache Implementation
Cache can be implemented as either dedicated cache or global cache. With dedicated cache,
separate sets of memory locations are reserved for reads and writes. In global cache, both reads
and writes can use any of the available memory addresses. Cache management is more efficient
in a global cache implementation, as only one global set of addresses has to be managed.
Global cache may allow users to specify the percentages of cache available for reads and writes
in cache management. Typically, the read cache is small, but it should be increased if the
application being used is read intensive. In other global cache implementations, the ratio of cache
available for reads versus writes is dynamically adjusted based on the workloads.
Cache Management
Cache is a finite and expensive resource that needs proper management. Even though intelligent
storage systems can be configured with large amounts of cache, when all cache pages are filled,
some pages have to be freed up to accommodate new data and avoid performance degradation.
Various cache management algorithms are implemented in intelligent storage systems to
proactively maintain a set of free pages and a list of pages that can be potentially freed up
whenever required:
■ Least Recently Used (LRU): An algorithm that continuously monitors data access in cache
and identifies the cache pages that have not been accessed for a long time. LRU either frees up
these pages or marks them for reuse. This algorithm is based on the assumption that data which
hasn’t been accessed for a while will not be requested by the host. However, if a page contains
write data that has not yet been committed to disk, data will first be written to disk before the
page is reused.


■ Most Recently Used (MRU): An algorithm that is the converse of LRU. In MRU, the pages that
have been accessed most recently are freed up or marked for reuse. This algorithm is based on
the assumption that recently accessed data may not be required for a while.
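LRU eviction with the dirty-bit check can be sketched as below, using Python's `collections.OrderedDict` to stand in for the recency bookkeeping. The page names and data are hypothetical; the point is that a dirty page is committed to disk before it is reused.

```python
# Sketch of LRU page reuse with a dirty-bit check: the least recently
# used page is freed, but uncommitted (dirty) data is written to disk
# first. OrderedDict keeps entries oldest-first, standing in for LRU order.

from collections import OrderedDict

def free_lru_page(cache, disk):
    """cache maps page -> (data, dirty); the oldest entry is the LRU page."""
    page, (data, dirty) = cache.popitem(last=False)  # evict oldest entry
    if dirty:
        disk[page] = data    # commit to disk before the page is reused
    return page

cache = OrderedDict()
cache["p1"] = ("old", True)    # dirty: write data not yet committed
cache["p2"] = ("new", False)
disk = {}
freed = free_lru_page(cache, disk)
print(freed, disk)             # p1 is freed, and its data is flushed to disk
```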
As cache fills, the storage system must take action to flush dirty pages (data written into the cache
but not yet written to the disk) in order to manage its availability.
Flushing is the process of committing data from cache to the disk. On the basis of the I/O access
rate and pattern, high and low levels called watermarks are set in cache to manage the flushing
process.
1. High watermark (HWM) is the cache utilization level at which the storage system starts
high speed flushing of cache data.
2. Low watermark (LWM) is the point at which the storage system stops the high-speed or
forced flushing and returns to idle flush behavior.
The cache utilization level, as shown in Figure 4-5, drives the mode of flushing to be used:
■ Idle flushing: Occurs continuously, at a modest rate, when the cache utilization level is between
the high and low watermark.
■ High watermark flushing: Activated when cache utilization hits the high watermark. The
storage system dedicates some additional resources to flushing. This type of flushing has minimal
impact on host I/O processing.
■ Forced flushing: Occurs in the event of a large I/O burst when cache reaches 100 percent of
its capacity, which significantly affects the I/O response time. In forced flushing, dirty pages are
forcibly flushed to disk.
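The mapping from cache utilization to flushing mode can be sketched as a small function. The specific watermark values (40 and 80 percent) are illustrative assumptions; real systems set them based on I/O access rate and pattern.

```python
# Sketch of choosing the flushing mode from cache utilization, per the
# watermark scheme described above. LWM/HWM values are illustrative.

def flush_mode(utilization, lwm=0.4, hwm=0.8):
    if utilization >= 1.0:
        return "forced flushing"            # cache is 100% full
    if utilization >= hwm:
        return "high watermark flushing"    # extra resources devoted to flushing
    if utilization >= lwm:
        return "idle flushing"              # continuous, modest-rate flushing
    return "no flushing needed"

for u in (0.3, 0.6, 0.9, 1.0):
    print(u, flush_mode(u))
```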
Cache Data Protection
Cache is volatile memory, so a power failure or any kind of cache failure will cause the loss of
data not yet committed to the disk. This risk of losing uncommitted data held in cache can be
mitigated using cache mirroring and cache vaulting:
■ Cache mirroring: Each write to cache is held in two different memory locations on two
independent memory cards. In the event of a cache failure, the write data will still be safe in the
mirrored location and can be committed to the disk. Reads are staged from the disk to the cache;
therefore, in the event of a cache failure, the data can still be accessed from the disk. As only
writes are mirrored, this method results in better utilization of the available cache. In cache


mirroring approaches, the problem of maintaining cache coherency is introduced. Cache
coherency means that data in two different cache locations must be identical at all times. It is the
responsibility of the array operating environment to ensure coherency.

■ Cache vaulting: Cache is exposed to the risk of uncommitted data loss due to power failure.
This problem can be addressed in various ways: powering the memory with a battery until AC
power is restored or using battery power to write the cache content to the disk. In the event of
extended power failure, using batteries is not a viable option because in intelligent storage
systems, large amounts of data may need to be committed to numerous disks and batteries may
not provide power for sufficient time to write each piece of data to its intended disk. Therefore,
storage vendors use a set of physical disks to dump the contents of cache during power failure.
This is called cache vaulting and the disks are called vault drives. When power is restored, data
from these disks is written back to write cache and then written to the intended disks.
2.1.1.3 Back End
The back end provides an interface between cache and the physical disks. It consists of two
components:
1. Back-end ports
2. Back-end controllers.
The back-end controls data transfers between cache and the physical disks. From cache, data is
sent to the back end and then routed to the destination disk. Physical disks are connected to ports
on the back end.
The back-end controller communicates with the disks when performing reads and writes and also
provides additional, but limited, temporary data storage.
The algorithms implemented on back-end controllers provide error detection and correction,
along with RAID functionality.
For high data protection and availability, storage systems are configured with dual controllers
with multiple ports. Such configurations provide an alternate path to physical disks in the event
of a controller or port failure.
This reliability is further enhanced if the disks are also dual-ported. In that case, each disk port
can connect to a separate controller. Multiple controllers also facilitate load balancing.
2.1.1.4 Physical Disk
A physical disk stores data persistently. Disks are connected to the back-end with either SCSI or
a Fibre Channel interface (discussed in subsequent chapters). An intelligent storage system
enables the use of a mixture of SCSI or Fibre Channel drives and IDE/ATA drives.
Logical Unit Number:
Physical drives or groups of RAID protected drives can be logically split into volumes known as
logical volumes, commonly referred to as Logical Unit Numbers (LUNs).
The use of LUNs improves disk utilization. For example, without the use of LUNs, a host
requiring only 200 GB could be allocated an entire 1TB physical disk. Using LUNs, only the


required 200 GB would be allocated to the host, allowing the remaining 800 GB to be allocated
to other hosts. In the case of RAID protected drives, these logical units are slices of RAID sets
and are spread across all the physical disks belonging to that set. The logical units can also be
seen as a logical partition of a RAID set that is presented to a host as a physical disk. For example,
Figure 4-6 shows a RAID set consisting of five disks that have been sliced, or partitioned, into
several LUNs. LUNs 0 and 1 are shown in the figure.

Note how a portion of each LUN resides on each physical disk in the RAID set. LUNs 0 and 1
are presented to hosts 1 and 2, respectively, as physical volumes for storing and retrieving data.
Usable capacity of the physical volumes is determined by the RAID type of the RAID set.

The capacity of a LUN can be expanded by aggregating other LUNs with it. The result of this
aggregation is a larger capacity LUN, known as a meta LUN. The mapping of LUNs to their
physical location on the drives is managed by the operating environment of an intelligent storage
system.
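The capacity arithmetic of the 200 GB / 1 TB example, and of meta LUN aggregation, can be sketched as follows. The function name and GB figures are illustrative only.

```python
# Sketch of carving LUNs out of a RAID set's usable capacity, mirroring
# the 200 GB / 1 TB example above. Names and numbers are illustrative.

def allocate_lun(free_gb, request_gb):
    """Allocate a LUN of request_gb and return the remaining free capacity."""
    if request_gb > free_gb:
        raise ValueError("insufficient capacity")
    return free_gb - request_gb

free = 1000                     # usable capacity of the RAID set, in GB
free = allocate_lun(free, 200)  # 200 GB LUN allocated to one host
print(free)                     # 800 GB remains available for other hosts

# A meta LUN is the aggregation of several LUNs into one larger LUN:
meta_lun_gb = sum([200, 300])
print(meta_lun_gb)              # 500 GB meta LUN
```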

LUN Masking:

LUN masking is a process that provides data access control by defining which LUNs a host can
access. LUN masking function is typically implemented at the front-end controller. This ensures
that volume access by servers is controlled appropriately, preventing unauthorized or accidental
use in a distributed environment. For example, consider a storage array with two LUNs that store
data of the sales and finance departments. Without LUN masking, both departments can easily
see and modify each other’s data, posing a high risk to data integrity and security. With LUN
masking, LUNs are accessible only to the designated hosts.
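LUN masking behaves like an access-control table consulted at the front end, which can be sketched as below. The host and LUN names echo the sales/finance example but are hypothetical.

```python
# Sketch of LUN masking as an access-control table: each host sees only
# the LUNs assigned to it. Host and LUN names are illustrative.

masking_table = {
    "sales_host":   {"LUN0"},   # sales department data
    "finance_host": {"LUN1"},   # finance department data
}

def can_access(host, lun):
    return lun in masking_table.get(host, set())

print(can_access("sales_host", "LUN0"))    # True: designated host
print(can_access("sales_host", "LUN1"))    # False: masked from this host
```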


2.1.2 Components of Hard disk drives and Solid-state drives:

Components of Hard disk drives:


A disk drive uses a rapidly moving arm to read and write data across a flat platter coated with
magnetic particles. Data is transferred from the magnetic platter through the R/W head to the
computer. Several platters are assembled together with the R/W head and controller, most
commonly referred to as a hard disk drive (HDD). Data can be recorded and erased on a magnetic
disk any number of times. This section details the different components of the disk, the
mechanism for organizing and storing data on disks, and the factors that affect disk performance.
Key components of a disk drive are platter, spindle, read/write head, actuator arm assembly, and
controller (Figure 2-2):

2.1.2.1 Platter
A typical HDD consists of one or more flat circular disks called platters (Figure 2-3). The data is
recorded on these platters in binary codes (0s and 1s). The set of rotating platters is sealed in a
case, called a Head Disk Assembly (HDA). A platter is a rigid, round disk coated with magnetic
material on both surfaces (top and bottom). The data is encoded by polarizing the magnetic area,
or domains, of the disk surface. Data can be written to or read from both surfaces of the platter.
The number of platters and the storage capacity of each platter determine the total capacity of the
drive.


2.1.2.2 Spindle
A spindle connects all the platters, as shown in Figure 2-3, and is connected to a motor. The motor
of the spindle rotates with a constant speed. The disk platter spins at a speed of several thousands
of revolutions per minute (rpm). Disk drives have spindle speeds of 7,200 rpm, 10,000 rpm, or
15,000 rpm. Disks used on current storage systems have a platter diameter of 3.5” (90 mm). When
the platter spins at 15,000 rpm, the outer edge is moving at around 25 percent of the speed of
sound. The speed of the platter is increasing with improvements in technology, although the extent
to which it can be improved is limited.
2.1.2.3 Read/Write Head
Read/Write (R/W) heads, shown in Figure 2-4, read and write data from or to a platter. Drives
have two R/W heads per platter, one for each surface of the platter. The R/W head changes the
magnetic polarization on the surface of the platter when writing data. While reading data, this
head detects magnetic polarization on the surface of the platter. During reads and writes, the R/W
head senses the magnetic polarization and never touches the surface of the platter. When the
spindle is rotating, there is a microscopic air gap between the R/W heads and the platters, known
as the head flying height. This air gap is removed when the spindle stops rotating and the R/W
head rests on a special area on the platter near the spindle. This area is called the landing zone.
The landing zone is coated with a lubricant to reduce friction between the head and the platter.
The logic on the disk drive ensures that heads are moved to the landing zone before they touch
the surface. If the drive malfunctions and the R/W head accidentally touches the surface of the
platter outside the landing zone, a head crash occurs. In a head crash, the magnetic coating on the
platter is scratched and may cause damage to the R/W head. A head crash generally results in data
loss.
2.1.2.4 Actuator Arm Assembly
The R/W heads are mounted on the actuator arm assembly (refer to Figure 2-2 [a]) which
positions the R/W head at the location on the platter where the data needs to be written or read.
The R/W heads for all platters on a drive are attached to one actuator arm assembly and move
across the platters simultaneously. Note that there are two R/W heads per platter, one for each
surface, as shown in Figure 2-4.


2.1.2.5 Controller
The controller (see Figure 2-2 [b]) is a printed circuit board, mounted at the bottom of a disk
drive. It consists of a microprocessor, internal memory, circuitry, and firmware. The firmware
controls power to the spindle motor and the speed of the motor. It also manages communication
between the drive and the host. In addition, it controls the R/W operations by moving the actuator
arm and switching between different R/W heads, and performs the optimization of data access.
2.1.2.6 Physical Disk Structure
Data on the disk is recorded on tracks, which are concentric rings on the platter around the spindle,
as shown in Figure 2-5. The tracks are numbered, starting from zero, from the outer edge of the
platter. The number of tracks per inch (TPI) on the platter (or the track density) measures how
tightly the tracks are packed on a platter.
Each track is divided into smaller units called sectors.
A sector is the smallest, individually addressable unit of storage. The track and sector structure is
written on the platter by the drive manufacturer using a formatting operation. The number of
sectors per track varies according to the specific drive.
The first personal computer disks had 17 sectors per track. Recent disks have a much larger
number of sectors on a single track. There can be thousands of tracks on a platter, depending on
the physical dimensions and recording density of the platter.


2.1.2.7 Zoned Bit Recording


Because the platters are made of concentric tracks, the outer tracks can hold more data than the
inner tracks, because the outer tracks are physically longer than the inner tracks, as shown in
Figure 2-6 (a).
On older disk drives, the outer tracks had the same number of sectors as the inner tracks, so data
density was low on the outer tracks. This was an inefficient use of available space.
Zone bit recording utilizes the disk efficiently. As shown in Figure 2-6 (b), this mechanism groups
tracks into zones based on their distance from the center of the disk. The zones are numbered,
with the outermost zone being zone 0. An appropriate number of sectors per track are assigned to
each zone, so a zone near the center of the platter has fewer sectors per track than a zone on the
outer edge.
However, tracks within a particular zone have the same number of sectors.

Components of Solid-State Drives:


An SSD, or solid-state drive, is a type of storage device used in computers. This non-volatile
storage media stores persistent data on solid-state flash memory. SSDs replace traditional hard
disk drives (HDDs) in computers and perform the same basic functions as a hard drive. But SSDs
are significantly faster in comparison. With an SSD, the device's operating system will boot up
more rapidly, programs will load quicker and files can be saved faster.
A traditional hard drive consists of a spinning disk with a read/write head on a mechanical arm
called an actuator. An HDD reads and writes data magnetically. The magnetic properties,
however, can lead to mechanical breakdowns.
By comparison, an SSD has no moving parts to break or spin up or down. The two key
components in an SSD are the flash controller and NAND flash memory chips. This configuration
is optimized to deliver high read/write performance for sequential and random data requests.

Flash Memory Chip: The data is stored on a solid-state flash memory that contains storage
memory. SSD has interconnected flash memory chips, which are fabricated out of silicon. So,
SSDs are manufactured by stacking chips in a grid to achieve different densities.


Flash Controller: It is an in-built microprocessor that takes care of functions like error
correction, data retrieval, and encryption. It also controls access to input/output (I/O) and
read/write (R/W) operations between the SSD and the host computer.
2.1.3 Addressing of Hard Disk drives and Solid-State Drives
Addressing of Hard Disk Drive:
2.1.3.1 Logical Block Addressing
Earlier drives used physical addresses consisting of the cylinder, head, and sector (CHS) number
to refer to specific locations on the disk, as shown in Figure 2-7 (a), and the host operating system
had to be aware of the geometry of each disk being used. Logical block addressing (LBA), shown
in Figure 2-7 (b), simplifies addressing by using a linear address to access physical blocks of data.
The disk controller translates LBA to a CHS address, and the host only needs to know the size of
the disk drive in terms of the number of blocks. The logical blocks are mapped to physical sectors
on a 1:1 basis.

In Figure 2-7 (b), the drive shows eight sectors per track, eight heads, and four cylinders. This
means a total of 8 × 8 × 4 = 256 blocks, so the block number ranges from 0 to 255. Each block
has its own unique address. Assuming that the sector holds 512 bytes, a 500 GB drive with a
formatted capacity of 465.7 GB will have in excess of 976,000,000 blocks.
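The controller's CHS-to-LBA translation can be sketched with the geometry from the example above (4 cylinders, 8 heads, 8 sectors per track; sectors conventionally numbered from 1). The function names are illustrative; the arithmetic follows the standard linearization of cylinder, head, and sector.

```python
# Sketch of CHS <-> LBA translation for the example geometry above:
# 4 cylinders x 8 heads x 8 sectors/track = 256 blocks (LBA 0..255).

HEADS = 8
SECTORS_PER_TRACK = 8

def chs_to_lba(c, h, s):
    # Sectors are numbered from 1, hence the (s - 1).
    return (c * HEADS + h) * SECTORS_PER_TRACK + (s - 1)

def lba_to_chs(lba):
    c, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    h, s = divmod(rem, SECTORS_PER_TRACK)
    return c, h, s + 1

print(chs_to_lba(0, 0, 1))    # first block  -> LBA 0
print(chs_to_lba(3, 7, 8))    # last block   -> LBA 255
print(lba_to_chs(255))        # (3, 7, 8)
```

With LBA, the host only needs the drive size in blocks; the geometry above stays hidden inside the controller.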
Addressing of Solid-state drives:
The logical block address (LBA) is the standard used to specify the address for write and read
commands. Each LBA addresses a 512-byte sector in the device's storage space, although other
sector sizes occasionally occur.


2.1.4 Performance of Hard disk drives and Solid-state drives


Disk Drive Performance
A disk drive is an electromechanical device that governs the overall performance of the storage
system environment. The various factors that affect the performance of disk drives are discussed
in this section.
2.1.4.1 Disk Service Time
Disk service time is the time taken by a disk to complete an I/O request. Components that
contribute to service time on a disk drive are seek time, rotational latency, and data transfer rate.
1. Seek Time
The seek time (also called access time) describes the time taken to position the R/W heads across
the platter with a radial movement (moving along the radius of the platter). In other words, it is
the time taken to reposition and settle the arm and the head over the correct track. The lower the
seek time, the faster the I/O operation. Disk vendors publish the following seek time
specifications:
■ Full Stroke: The time taken by the R/W head to move across the entire width of the disk, from
the innermost track to the outermost track.
■ Average: The average time taken by the R/W head to move from one random track to another,
normally listed as the time for one-third of a full stroke.
■ Track-to-Track: The time taken by the R/W head to move between adjacent tracks. Each of
these specifications is measured in milliseconds. The average seek time on a modern disk is
typically in the range of 3 to 15 milliseconds.
Seek time has more impact on the read operation of random tracks rather than adjacent tracks. To
minimize the seek time, data can be written to only a subset of the available cylinders. This results
in lower usable capacity than the actual capacity of the drive. For example, a 500 GB disk drive
is set up to use only the first 40 percent of the cylinders and is effectively treated as a 200 GB
drive. This is known as short-stroking the drive.
2. Rotational Latency
To access data, the actuator arm moves the R/W head over the platter to a particular track while
the platter spins to position the requested sector under the R/W head. The time taken by the platter
to rotate and position the data under the R/W head is called rotational latency. This latency
depends on the rotation speed of the spindle and is measured in milliseconds.
The average rotational latency is one-half of the time taken for a full rotation. Similar to the seek
time, rotational latency has more impact on the reading/writing of random sectors on the disk
than on the same operations on adjacent sectors. Average rotational latency is around 5.5 ms for
a 5,400-rpm drive, and around 2.0 ms for a 15,000-rpm drive.
3. Data Transfer Rate
The data transfer rate (also called transfer rate) refers to the average amount of data per unit time
that the drive can deliver to the HBA. It is important to first understand the process of read and


write operations in order to calculate data transfer rates. In a read operation, the data first moves
from disk platters to R/W heads, and then it moves to the drive’s internal buffer. Finally, data
moves from the buffer through the interface to the host HBA. In a write operation, the data moves
from the HBA to the internal buffer of the disk drive through the drive’s interface. The data then
moves from the buffer to the R/W heads. Finally, it moves from the R/W heads to the platters.
Little’s Law relates the number of requests in a system to the arrival rate and response time:

N = a × R

where N is the total number of requests, a is the arrival rate, and R is the average response time.

The data transfer rates during the R/W operations are measured in terms of internal and external
transfer rates, as shown in Figure 2-8.

Internal transfer rate is the speed at which data moves from a single track of a platter’s surface to
internal buffer (cache) of the disk. Internal transfer rate takes into account factors such as the seek
time. External transfer rate is the rate at which data can be moved through the interface to the
HBA. External transfer rate is generally the advertised speed of the interface, such as 133 MB/s
for ATA. The sustained external transfer rate is lower than the interface speed.
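Putting the three components of disk service time together gives a useful back-of-the-envelope calculation. The sketch below uses illustrative numbers for a 15,000-rpm drive (5 ms seek, 64 KB I/O, 40 MB/s sustained transfer, with 1 MB taken as 1000 KB); they are assumptions, not figures from the text.

```python
# Back-of-the-envelope disk service time for one I/O:
# service time = seek time + rotational latency + transfer time.
# Illustrative numbers for a 15,000-rpm drive.

def disk_service_time_ms(seek_ms, rpm, io_kb, transfer_mb_s):
    rotational_latency_ms = 0.5 * (60_000 / rpm)  # half a full rotation
    transfer_ms = io_kb / transfer_mb_s           # KB over MB/s gives ms (1 MB = 1000 KB)
    return seek_ms + rotational_latency_ms + transfer_ms

t = disk_service_time_ms(seek_ms=5, rpm=15_000, io_kb=64, transfer_mb_s=40)
print(round(t, 2))      # 8.6 ms per I/O (5 seek + 2.0 latency + 1.6 transfer)
print(round(1000 / t))  # roughly 116 IOPS a single drive can sustain
```

Note the 2.0 ms rotational latency matches the figure quoted above for a 15,000-rpm drive.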
Performance of Solid-State Drives:
1. IOPS. This acronym stands for input/output operations per second. The metric measures how
many reads and writes an SSD can handle per second. The higher the IOPS, the better.
2. Throughput. This is the SSD's data transfer speed, measured in bytes per second. The higher
throughput, the better, although throughput is affected by elements such as file size and whether
the reads and writes are random or sequential.


3. Latency. This shows how long it takes to process an I/O operation. This process translates to
SSD response time and is measured in microseconds or milliseconds. The lower the latency, the
better.
Solid-state drives are much faster than hard disk drives, and the speed difference between the two
types is significant. When moving big files, HDDs can copy 30 to 150 MB per second (MB/s),
while standard SATA SSDs perform the same action at speeds of 500 MB/s. Newer NVMe SSDs
can get up to astounding speeds: 3,000 to 3,500 MB/s.
With an SSD, one can copy a 20 GB movie in less than 10 seconds, while a hard disk would take
at least two minutes. Upgrading your Mac to an SSD or installing an SSD in your PC will give it
a significant speed boost.
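The copy-time claims above follow directly from size divided by throughput, as this small sketch shows (taking 1 GB as 1000 MB):

```python
# Rough copy-time comparison for a 20 GB file: time = size / throughput.
# Throughput figures are the ones quoted above.

def copy_seconds(size_gb, throughput_mb_s):
    return size_gb * 1000 / throughput_mb_s    # assuming 1 GB = 1000 MB

for drive, speed in [("HDD", 150), ("SATA SSD", 500), ("NVMe SSD", 3000)]:
    print(drive, round(copy_seconds(20, speed), 1), "s")
```

At 3,000 MB/s an NVMe SSD finishes the 20 GB copy in under 10 seconds, consistent with the claim above, while an HDD at 150 MB/s takes over two minutes.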

2.1.5 Intelligent Storage Array


Intelligent storage systems generally fall into one of the following two categories:
■ High-end storage systems
■ Midrange storage systems
Traditionally, high-end storage systems have been implemented with active-active arrays,
whereas midrange storage systems, typically used in small- and medium-sized enterprises, have
been implemented with active-passive arrays. Active-passive arrays provide optimal storage
solutions at lower costs. Enterprises make use of this cost advantage and implement
active-passive arrays to meet specific application requirements such as performance,
availability, and scalability. The distinctions between these two implementations are becoming
increasingly insignificant.


2.1.5.1 High-end Storage Systems


High-end storage systems, referred to as active-active arrays, are generally aimed at large
enterprises for centralizing corporate data. These arrays are designed with a large number of
controllers and cache memory. An active-active array implies that the host can perform I/Os to
its LUNs across any of the available paths (see Figure 4-7).

To address the enterprise storage needs, these arrays provide the following capabilities:
■ Large storage capacity.
■ Large amounts of cache to service host I/Os optimally.
■ Fault tolerance architecture to improve data availability.
■ Connectivity to mainframe computers and open systems hosts.
■ Availability of multiple front-end ports and interface protocols to serve a large number of hosts.
■ Availability of multiple back-end Fibre Channel or SCSI RAID controllers to manage disk
processing.
■ Scalability to support the increased connectivity, performance, and the storage capacity
requirements.
■ Ability to handle large amounts of concurrent I/Os from a number of servers and applications
■ Support for array-based local and remote replication.
In addition to these features, high-end arrays possess some unique features and functionality that
are required for mission-critical applications in large enterprises.


2.1.5.2 Midrange Storage System:


Midrange storage systems are also referred to as active-passive arrays and they are best suited
for small- and medium-sized enterprises.
In an active-passive array, a host can perform I/Os to a LUN only through the paths to the owning
controller of that LUN. These paths are called active paths.
The other paths are passive with respect to this LUN. As shown in Figure 4-8, the host can
perform reads or writes to the LUN only through the path to controller A, as controller A is the
owner of that LUN. The path to controller B remains passive and no I/O activity is performed
through this path.
Midrange storage systems are typically designed with two controllers, each of which contains
host interfaces, cache, RAID controllers, and disk drive interfaces.

Midrange arrays are designed to meet the requirements of small and medium enterprises;
therefore, they host less storage capacity and global cache than active-active arrays.
There are also fewer front-end ports for connection to servers.
However, they ensure high redundancy and high performance for applications with predictable
workloads. They also support array-based local and remote replication.
2.2 Data Protection: RAID (Redundant Array of Independent Disks)
RAID (redundant array of independent disks) is a way of storing the same data in different places
on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure.
There are different RAID levels, however, and not all have the goal of providing redundancy.
2.2.1 Implementation of RAID
There are two types of RAID implementation, hardware and software. Both have their merits and
demerits and are discussed in this section.


2.2.1.1 Software RAID


Software RAID uses host-based software to provide RAID functions. It is implemented at the
operating-system level and does not use a dedicated hardware controller to manage the RAID
array. Software RAID implementations offer cost and simplicity benefits when compared with
hardware RAID. However, they have the following limitations:
■ Performance:
Software RAID affects overall system performance. This is due to the additional CPU cycles
required to perform RAID calculations.
The performance impact is more pronounced for complex implementations of RAID, as detailed
later in this chapter.
■ Supported features:
Software RAID does not support all RAID levels.
■ Operating system compatibility:
Software RAID is tied to the host operating system; hence, upgrades to software RAID or to the
operating system should be validated for compatibility. This leads to inflexibility in the data
processing environment.
2.2.1.2 Hardware RAID
In hardware RAID implementations, a specialized hardware controller is implemented either on
the host or on the array.
These implementations vary in the way the storage array interacts with the host. Controller card
RAID is a host-based hardware RAID implementation in which a specialized RAID controller is
installed in the host and the HDDs are connected to it. The RAID controller interacts with the hard
disks using a PCI bus.
Manufacturers also integrate RAID controllers on motherboards. This integration reduces the
overall cost of the system, but does not provide the flexibility required for high-end storage
systems.
The external RAID controller is an array-based hardware RAID. It acts as an interface between
the host and the disks. It presents storage volumes to the host, and the host manages these volumes
using the supported protocol.
Key functions of RAID controllers are:
■ Management and control of disk aggregations
■ Translation of I/O requests between logical disks and physical disks
■ Data regeneration in the event of disk failures
2.2.2 RAID Array Components
A RAID array is an enclosure that contains a number of HDDs and the supporting hardware and
software to implement RAID.


HDDs inside a RAID array are usually contained in smaller sub-enclosures. These sub-
enclosures, or physical arrays, hold a fixed number of HDDs, and may also include other
supporting hardware, such as power supplies. A subset of disks within a RAID array can be
grouped to form logical associations called logical arrays, also known as a RAID set or a RAID
group (see Figure 3-1).
Logical arrays are composed of logical volumes (LV). The operating system recognizes the LVs
as if they are physical HDDs managed by the RAID controller. The number of HDDs in a logical
array depends on the RAID level used. Configurations could have a logical array with multiple
physical arrays or a physical array with multiple logical arrays.

2.2.3 RAID Levels


RAID levels (see Table 3-1) are defined on the basis of striping, mirroring, and parity techniques.
These techniques determine the data availability and performance characteristics of an array.
Some RAID arrays use one technique, whereas others use a combination of techniques.
Application performance and data availability requirements determine the RAID level selection.
2.2.3.1 Striping
A RAID set is a group of disks. Within each disk, a predefined number of contiguously
addressable disk blocks are defined as strips. The set of aligned strips that spans across all the
disks within the RAID set is called a stripe. Figure 3-2 shows physical and logical representations
of a striped RAID set.


Strip size (also called stripe depth) describes the number of blocks in a strip, and is the maximum
amount of data that can be written to or read from a single HDD in the set before the next HDD
is accessed, assuming that the accessed data starts at the beginning of the strip. Note that all strips
in a stripe have the same number of blocks, and decreasing strip size means that data is broken
into smaller pieces when spread across the disks.
Stripe size is the strip size multiplied by the number of HDDs in the RAID set. Stripe width refers
to the number of data strips in a stripe.
Striped RAID does not protect data unless parity or mirroring is used. However, striping may
significantly improve I/O performance. Depending on the type of RAID implementation, the
RAID controller can be configured to access data across multiple HDDs simultaneously.
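The strip/stripe layout just described reduces to modular arithmetic when mapping a logical block to its location. A minimal sketch for a parity-free striped set (the function and its mapping convention are illustrative, not tied to any particular controller):

```python
def locate_block(logical_block: int, strip_size_blocks: int, num_disks: int):
    """Map a logical block number onto a striped RAID set with no parity.
    Returns (disk index, stripe number, block offset within the strip)."""
    blocks_per_stripe = strip_size_blocks * num_disks
    stripe_number, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    disk, block_in_strip = divmod(offset_in_stripe, strip_size_blocks)
    return disk, stripe_number, block_in_strip

# 5 disks with 4-block strips: block 9 lands on disk 2, stripe 0, offset 1.
print(locate_block(9, strip_size_blocks=4, num_disks=5))   # (2, 0, 1)
```

Because consecutive strips land on different disks, a large sequential read touches all disks in parallel, which is the performance benefit striping provides.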
2.2.3.2 Mirroring
Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of
data. In the event of one HDD failure, the data is intact on the surviving HDD (see Figure 3-3)
and the controller continues to service the host’s data requests from the surviving disk of a
mirrored pair.


When the failed disk is replaced with a new disk, the controller copies the data from the surviving
disk of the mirrored pair. This activity is transparent to the host.
In addition to providing complete data redundancy, mirroring enables faster recovery from disk
failure. However, disk mirroring provides only data protection and is not a substitute for data
backup. Mirroring constantly captures changes in the data, whereas a backup captures point-in-
time images of data.
Mirroring involves duplication of data — the amount of storage capacity needed is twice the
amount of data being stored. Therefore, mirroring is considered expensive and is preferred for
mission-critical applications that cannot afford data loss.
Mirroring improves read performance because read requests can be serviced by both disks.
However, write performance deteriorates, as each write request manifests as two writes on the
HDDs. In other words, mirroring does not deliver the same levels of write performance as a
striped RAID
2.2.3.3 Parity
Parity is a method of protecting striped data from HDD failure without the cost of mirroring. An
additional HDD is added to the stripe width to hold parity, a mathematical construct that allows
re-creation of the missing data. Parity is a redundancy check that ensures full protection of data
without maintaining a full set of duplicate data.
Parity information can be stored on separate, dedicated HDDs or distributed across all the drives
in a RAID set. Figure 3-4 shows a parity RAID. The first four disks, labeled D, contain the data.
The fifth disk, labeled P, stores the parity information, which in this case is the sum of the
elements in each row. Now, if one of the Ds fails, the missing value can be calculated by
subtracting the sum of the rest of the elements from the parity value.


In Figure 3-4, the computation of parity is represented as a simple arithmetic operation on the
data. However, parity calculation is a bitwise XOR operation. Calculation of parity is a function
of the RAID controller.
Compared to mirroring, parity implementation considerably reduces the cost associated with data
protection. Consider a RAID configuration with five disks.
Four of these disks hold data, and the fifth holds parity information. Parity requires 25 percent
extra disk space compared to mirroring, which requires 100 percent extra disk space. However,
there are some disadvantages of using parity.
Parity information is generated from data on the data disk. Therefore, parity is recalculated every
time there is a change in data. This recalculation is time-consuming and affects the performance
of the RAID controller.
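The bitwise XOR parity just described can be sketched in a few lines; note that the same XOR operation both computes the parity strip and rebuilds a lost strip from the survivors plus the parity:

```python
from functools import reduce

def xor_parity(strips):
    """Byte-wise XOR of equal-length strips (data strips, or survivors + parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20", b"\x40\x80"]   # four data strips
parity = xor_parity(data)

# Simulate losing strip 2 and rebuilding it from the three survivors plus parity:
rebuilt = xor_parity([data[0], data[1], data[3], parity])
assert rebuilt == data[2]
```

This is why parity needs only one extra disk's worth of space: any single missing strip is fully determined by the others.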


2.2.3.4 RAID 0: block-by-block striping / striped array with no fault tolerance


In a RAID 0 configuration, data is striped across the HDDs in a RAID set. It utilizes the full
storage capacity by distributing strips of data over multiple HDDs in a RAID set. To read data,
all the strips are put back together by the controller.
The stripe size is specified at a host level for software RAID and is vendor specific for hardware
RAID. Figure 3-5 shows RAID 0 on a storage array in which data is striped across 5 disks.
When the number of drives in the array increases, performance improves because more data can
be read or written simultaneously.
RAID 0 is used in applications that need high I/O throughput. However, if these applications
require high availability, RAID 0 does not provide data protection and availability in the event of
drive failures.
Evaluation
Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks each having B
blocks are fully utilized.
Advantages

• It is easy to implement.
• It utilizes the storage capacity in a better way.
Disadvantages

• A single drive loss can result in the complete failure of the system.
• Not a good choice for a critical system.


2.2.3.5 RAID 1: block-by-block mirroring / disk mirroring


In a RAID 1 configuration, data is mirrored to improve fault tolerance (see Figure 3-6). A RAID
1 group consists of at least two HDDs.
As explained in mirroring, every write is written to both disks, which is transparent to the host in
a hardware RAID implementation.
In the event of disk failure, the impact on data recovery is the least among all RAID
implementations. This is because the RAID controller uses the mirror drive for data recovery and
continuous operation. RAID 1 is suitable for applications that require high availability.
Evaluation
Assume a RAID 1 system of N disks with mirroring level 2, that is, two copies of every block.
Reliability: 1 to N/2
One disk failure can always be handled, because the blocks of that disk have duplicates on some
other disk. If we are lucky and the failed disks belong to different mirrored pairs (for example,
disks 0 and 2 fail while their mirrors, disks 1 and 3, survive), then those failures can also be
handled. So, in the best case, N/2 disk failures can be handled.
Capacity: N*B/2
Only half the space is being used to store data. The other half is just a mirror of the already stored
data.
Advantages

• It covers complete redundancy.


• It can increase data security and speed.
Disadvantages

• It is highly expensive.
• Storage capacity is less.


2.2.3.6 RAID 0+1: striping and mirroring combined (Nested RAID)


Most data centers require data redundancy and performance from their RAID arrays. RAID 0+1
and RAID 1+0 combine the performance benefits of RAID 0 with the redundancy benefits of
RAID 1. They use striping and mirroring techniques and combine their benefits. These types of
RAID require an even number of disks, the minimum being four (see Figure 3-7).
RAID 1+0 is also known as RAID 10 (Ten) or RAID 1/0. Similarly, RAID 0+1 is also known as
RAID 01 or RAID 0/1. RAID 1+0 performs well for workloads that use small, random, write-
intensive I/O.
Some applications that benefit from RAID 1+0 include the following:
■ High transaction rate Online Transaction Processing (OLTP)
■ Large messaging installations
■ Database applications that require high I/O rate, random access, and high availability
A common misconception is that RAID 1+0 and RAID 0+1 are the same. Under normal
conditions, RAID levels 1+0 and 0+1 offer identical benefits. However, rebuild operations in the
case of disk failure differ between the two.
RAID 1+0 is also called striped mirror. The basic element of RAID 1+0 is a mirrored pair, which
means that data is first mirrored and then both copies of data are striped across multiple HDDs in
a RAID set.


When replacing a failed drive, only the mirror is rebuilt. In other words, the disk array controller
uses the surviving drive in the mirrored pair for data recovery and continuous operation. Data
from the surviving disk is copied to the replacement disk.
RAID 0+1 is also called mirrored stripe. The basic element of RAID 0+1 is a stripe. This means
that the process of striping data across HDDs is performed initially and then the entire stripe is
mirrored. If one drive fails, then the entire stripe is faulted.
A rebuild operation copies the entire stripe, copying data from each disk in the healthy stripe to
an equivalent disk in the failed stripe. This causes increased and unnecessary I/O load on the
surviving disks and makes the RAID set more vulnerable to a second disk failure.
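The difference in vulnerability to a second disk failure can be counted directly. Below is a simplified sketch with eight disks; the pair/stripe layout is an illustrative model, not vendor-specific:

```python
from itertools import combinations

DISKS = range(8)

def survives_raid10(failed: set) -> bool:
    # RAID 1+0: disks mirrored in pairs (0,1), (2,3), (4,5), (6,7), pairs striped.
    # Data is lost only if BOTH disks of some mirrored pair fail.
    return all(not {2 * i, 2 * i + 1} <= failed for i in range(4))

def survives_raid01(failed: set) -> bool:
    # RAID 0+1: stripe A = disks 0-3, stripe B = disks 4-7, stripes mirrored.
    # A single failure faults its whole stripe, so data survives only while
    # at least one stripe has no failed disk.
    return not (failed & {0, 1, 2, 3}) or not (failed & {4, 5, 6, 7})

pairs = [set(p) for p in combinations(DISKS, 2)]   # all 28 two-disk failure cases
print(sum(map(survives_raid10, pairs)), "of 28 survivable in RAID 1+0")   # 24
print(sum(map(survives_raid01, pairs)), "of 28 survivable in RAID 0+1")   # 12
```

RAID 1+0 loses data only when both disks of the same mirrored pair fail (4 of the 28 combinations), whereas RAID 0+1 loses data whenever the two failures span both stripes (16 of 28), which matches the vulnerability argument above.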
Advantages:

• Using multiple hard drives enables RAID to improve the performance of a single hard
drive.
• Reads and writes can be performed faster than with a single drive with RAID 0. This is
because a file system is split up and distributed across drives that work together on the
same file.
Disadvantages:

• Nested RAID levels are more expensive to implement than traditional RAID levels,
because they require more disks.
• The cost per gigabyte for storage devices is higher for nested RAID because many of the
drives are used for redundancy.
2.2.3.7 RAID 2: Bit-Level Stripping with Dedicated Parity
In RAID 2, errors in the data are checked at the bit level. The Hamming code parity method is used
to find errors, and a designated drive stores the parity.
The structure of RAID 2 is complex because two sets of disks are used: one set stores the bits of
each data word, and another stores the corresponding error-correction code.
It is not commonly used.
Advantages

• For error correction, it uses the Hamming code.


• It uses one designated drive to store parity.
Disadvantages

• It has a complex structure and high cost due to extra drive.


• It requires an extra drive for error detection.


2.2.3.8 RAID 3: Byte-Level Stripping with Dedicated Parity / Parallel access array with
dedicated parity disks
RAID 3 stripes data for high performance and uses parity for improved fault tolerance. Parity
information is stored on a dedicated drive so that data can be reconstructed if a drive fails. For
example, of five disks, four are used for data and one is used for parity. Therefore, the total disk
space required is 1.25 times the size of the data disks. RAID 3 always reads and writes complete
stripes of data across all disks, as the drives operate in parallel. There are no partial writes that
update one out of many strips in a stripe. Figure 3-8 illustrates the RAID 3 implementation.

RAID 3 provides good bandwidth for the transfer of large volumes of data.
RAID 3 is used in applications that involve large sequential data access, such as video streaming.
Advantages

• Data can be transferred in bulk.


• Data can be accessed in parallel.
Disadvantages

• It requires an additional drive for parity.


• In the case of small-size files, it performs slowly.


2.2.3.9 RAID 4: Block-Level Stripping with Dedicated Parity / Striped array with
independent disks and a dedicated parity disk
Similar to RAID 3, RAID 4 stripes data for high performance and uses parity for improved fault
tolerance (refer to Figure 3-8). Data is striped across all disks except the parity disk in the array.
Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails.
Striping is done at the block level.
Unlike RAID 3, data disks in RAID 4 can be accessed independently so that specific data
elements can be read or written on single disk without read or write of an entire stripe. RAID 4
provides good read throughput and reasonable write throughput.
Evaluation
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity works). If more than
one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made available
for data storage, each disk having B blocks.
Advantages

• It helps in reconstructing the data if at most one disk is lost.


Disadvantages

• It can’t help in reconstructing when more than one disk is lost.


2.2.3.10 RAID 5: Block-Level Stripping with Distributed Parity / Striped array with
independent disks & distributed parity
RAID 5 is a very versatile RAID implementation. It is similar to RAID 4 because it uses striping
and the drives (strips) are independently accessible.
The difference between RAID 4 and RAID 5 is the parity location.
In RAID 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk.
In RAID 5, parity is distributed across all disks.
The distribution of parity in RAID 5 overcomes the write bottleneck. Figure 3-9 illustrates the
RAID 5 implementation.
RAID 5 is preferred for messaging, data mining, medium-performance media serving, and
relational database management system (RDBMS) implementations in which database
administrators (DBAs) optimize data access.


Evaluation
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If more than
one disk fails, there is no way to recover the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
Advantages

• Data can be reconstructed using parity bits.


• It makes the performance better.
Disadvantages

• Its technology is complex and extra space is required.


• If two disks get damaged, data will be lost forever.


2.2.3.11 RAID 6: Block-Level Stripping with two Parity Bits / Striped array with
independent disks & dual distributed parity
RAID 6 works the same way as RAID 5 except that RAID 6 includes a second parity element to
enable survival in the event of the failure of two disks in a RAID group (see Figure 3-10).
Therefore, a RAID 6 implementation requires at least four disks. RAID 6 distributes the parity
across all the disks.
The write penalty in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes perform
better than RAID 6. The rebuild operation in RAID 6 may take longer than that in RAID 5 due
to the presence of two parity sets.

Advantages

• Very high data accessibility.


• Fast read data transactions.
Disadvantages

• Due to double parity, it has slow write data transactions.


• Extra space is required.
Advantages of RAID


Data redundancy: By keeping numerous copies of the data on many disks, RAID can shield data
from disk failures.
Performance enhancement: RAID can enhance performance by distributing data over several
drives, enabling the simultaneous execution of several read/write operations.
Scalability: RAID is scalable, therefore by adding more disks to the array, the storage capacity
may be expanded.
Versatility: RAID is applicable to a wide range of devices, such as workstations, servers, and
personal PCs
Disadvantages of RAID
Cost: RAID implementation can be costly, particularly for arrays with large capacities.
Complexity: The setup and management of RAID might be challenging.
Decreased performance: The parity calculations necessary for some RAID configurations,
including RAID 5 and RAID 6, may result in a decrease in speed.
Single point of failure: While RAID offers data redundancy, it is not a comprehensive backup
solution. The array’s whole contents could be lost if the RAID controller malfunctions.
2.2.4 RAID Comparison
Table 3-2 compares the different types of RAID.
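The capacity side of that comparison follows directly from the evaluations given above. A sketch for N disks of capacity B each (the function is our summary, not part of any standard API):

```python
def usable_capacity(raid_level: str, n: int, b: float) -> float:
    """Usable capacity for n disks of capacity b, per the per-level evaluations above."""
    if raid_level == "RAID 0":
        return n * b                 # no redundancy: all space holds data
    if raid_level in ("RAID 1", "RAID 1+0", "RAID 0+1"):
        return n * b / 2             # half the space holds mirror copies
    if raid_level in ("RAID 3", "RAID 4", "RAID 5"):
        return (n - 1) * b           # one disk's worth of parity
    if raid_level == "RAID 6":
        return (n - 2) * b           # two parity sets
    raise ValueError(f"unknown level: {raid_level}")

print(usable_capacity("RAID 5", 5, 1000))   # 4000
```

For example, five 1,000 GB disks yield 4,000 GB of usable space in RAID 5 but only 2,500 GB mirrored as RAID 1 (with one disk unused, since mirroring needs an even count).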


2.2.5 RAID Impact on Disk Performance


When choosing a RAID type, it is imperative to consider the impact to disk performance and
application IOPS.
In both mirrored and parity RAID configurations, every write operation translates into more I/O
overhead for the disks which is referred to as write penalty. In a RAID 1 implementation, every
write operation must be performed on two disks configured as a mirrored pair while in a RAID 5
implementation, a write operation may manifest as four I/O operations. When performing small
I/Os to a disk configured with RAID 5, the controller has to read, calculate, and write a parity
segment for every data write operation.
Figure 3-11 illustrates a single write operation on RAID 5 that contains a group of five disks.
Four of these disks are used for data and one is used for parity.

The parity (P) at the controller is calculated as follows:

Ep = E1 + E2 + E3 + E4 (XOR operations)

Here, E1 to E4 is the data striped across the RAID group of five disks.

Whenever the controller performs a write I/O, parity must be computed by reading the old parity
(Ep old) and the old data (E4 old) from the disk, which means two read I/Os.

The new parity (Ep new) is computed as follows:

Ep new = Ep old – E4 old + E4 new (XOR operations)


After computing the new parity, the controller completes the write I/O by writing the new data
and the new parity onto the disks, amounting to two write I/Os. Therefore, the controller performs
two disk reads and two disk writes for every write operation, and the write penalty in RAID 5
implementations is 4.
In RAID 6, which maintains dual parity, a disk write requires three read operations: for Ep1 old,
Ep2 old, and E4 old. After calculating Ep1 new and Ep2 new, the controller performs three write
I/O operations for Ep1 new, Ep2 new and E4 new. Therefore, in a RAID 6 implementation, the
controller performs six I/O operations for each write I/O, and the write penalty is 6.
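The RAID 5 read-modify-write sequence can be checked with XOR arithmetic; a sketch with single-byte strips (the variable names mirror the symbols in the formulas above, and both the "+" and "–" in the text are XOR):

```python
def new_parity(ep_old: int, e4_old: int, e4_new: int) -> int:
    """Ep new = Ep old XOR E4 old XOR E4 new."""
    return ep_old ^ e4_old ^ e4_new

e1, e2, e3, e4_old = 0x11, 0x22, 0x33, 0x44
ep_old = e1 ^ e2 ^ e3 ^ e4_old           # full-stripe parity before the write

e4_new = 0x5A                            # a small write updates only strip E4
# The incrementally updated parity equals the parity recomputed from scratch:
assert new_parity(ep_old, e4_old, e4_new) == e1 ^ e2 ^ e3 ^ e4_new
# Cost: two reads (E4 old, Ep old) + two writes (E4 new, Ep new) = write penalty 4.
```

The incremental update is what lets the controller avoid reading the entire stripe for every small write; the price is the fixed four-I/O penalty.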
2.2.5.1 Application IOPS and RAID Configurations
When deciding the number of disks required for an application, it is important to consider the
impact of RAID based on IOPS generated by the application. The total disk load should be
computed by considering the type of RAID configuration and the ratio of read compared to write
from the host.
The following example illustrates the method of computing the disk load in different types of
RAID.
Consider an application that generates 5,200 IOPS, with 60 percent of them being reads.
The disk load in RAID 5 is calculated as follows:
RAID 5 disk load = 0.6 × 5,200 + 4 × (0.4 × 5,200) [because the write penalty for RAID 5 is 4]
= 3,120 + 4 × 2,080
= 3,120 + 8,320
= 11,440 IOPS


The disk load in RAID 1 is calculated as follows:


RAID 1 disk load = 0.6 × 5,200 + 2 × (0.4 × 5,200) [because every write manifests as two writes
to the disks]
= 3,120 + 2 × 2,080
= 3,120 + 4,160
= 7,280 IOPS
The computed disk load determines the number of disks required for the application. If in this
example an HDD with a specification of a maximum 180 IOPS for the application needs to be
used, the number of disks required to meet the workload for the RAID configuration would be as
follows:
■ RAID 5: 11,440 / 180 = 64 disks
■ RAID 1: 7,280 / 180 = 42 disks (approximated to the nearest even number)
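The worked example can be reproduced with a small helper, taking the write penalty per RAID level as given above (the function names and the default 180 IOPS per disk are ours, taken from the example):

```python
import math

WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}

def disk_load_iops(app_iops: float, read_fraction: float, raid_level: str) -> float:
    """Back-end IOPS: reads pass through once; each write costs the penalty."""
    reads = read_fraction * app_iops
    writes = app_iops - reads
    return reads + WRITE_PENALTY[raid_level] * writes

def disks_required(app_iops, read_fraction, raid_level, disk_iops=180):
    return math.ceil(disk_load_iops(app_iops, read_fraction, raid_level) / disk_iops)

print(round(disk_load_iops(5200, 0.6, "RAID 5")))   # 11440
print(disks_required(5200, 0.6, "RAID 5"))          # 64
print(disks_required(5200, 0.6, "RAID 1"))          # 41 -> 42 in practice (even count for mirroring)
```

Note the ceiling division gives 41 disks for RAID 1; the text rounds this up to 42 because mirrored pairs require an even number of disks.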
2.2.6 Hot Spares
A hot spare refers to a spare HDD in a RAID array that temporarily replaces a failed HDD of a
RAID set. A hot spare takes the identity of the failed HDD in the array.
One of the following methods of data recovery is performed depending on the RAID
implementation:
■ If parity RAID is used, then the data is rebuilt onto the hot spare from the parity and the data
on the surviving HDDs in the RAID set.
■ If mirroring is used, then the data from the surviving mirror is used to copy the data.
When the failed HDD is replaced with a new HDD, one of the following takes place:
■ The hot spare becomes a permanent member of the RAID set. This means that it is no longer a hot spare,
and a new hot spare must be configured on the array.
■ When a new HDD is added to the system, data from the hot spare is copied to it. The hot spare
returns to its idle state, ready to replace the next failed drive.
A hot spare should be large enough to accommodate data from a failed drive. Some systems
implement multiple hot spares to improve data availability.
A hot spare can be configured as automatic or user initiated, which specifies how it will be used
in the event of disk failure.
In an automatic configuration, when the recoverable error rates for a disk exceed a predetermined
threshold, the disk subsystem tries to copy data from the failing disk to the hot spare
automatically.
If this task is completed before the damaged disk fails, then the subsystem switches to the hot
spare and marks the failing disk as unusable. Otherwise, it uses parity or the mirrored disk to
recover the data. In the case of a user-initiated configuration, the administrator has control of the
rebuild process.


For example, the rebuild could occur overnight to prevent any degradation of system
performance. However, the system is vulnerable to another failure if a hot spare is
unavailable.
2.3 Scale-up and Scale-out storage Architecture
Scaling up and scaling out are the two main methods used to increase data storage capacity.
Scale-out and scale-up architectures, also known respectively as horizontal scaling and vertical
scaling (their opposites being scale in and scale down), refer to how companies scale their data
storage: by adding more hardware drives (scale-up/vertical scaling), or by adding more software
nodes (scale-out/horizontal scaling).
Scale-up is the more traditional format, but it runs into space issues as data volumes grow and the
need for more and more data storage increases. Hence, the advent of scale-out architectures.
Scale-up Architecture:
In a scale-up data storage architecture, storage drives are added to increase storage capacity and
performance.
The drives are managed by two controllers.
When you run out of storage capacity, you add another shelf of drives to the architecture.
Scale-out Architecture:
A scale-out architecture uses software-defined storage (SDS) to separate the storage hardware
from the storage software, letting the software act as the controllers. This is why scale-out storage
is considered to be network attached storage (NAS).
Scale-out NAS systems involve clusters of software nodes that work together. Nodes can be added
or removed, allowing things like bandwidth, compute, and throughput to increase or decrease as
needed. To upgrade a scale-out system, new clusters must be created.


2.3.1 Comparison of Scale up-Scale-out Storage


Vertical-scaling (scale-up) and horizontal-scaling (scale-out) architectures differ in the way they
scale data storage.
Decoupling storage software from storage hardware in the scale-out model allows companies to
expand their storage capacity when and how they see fit. With scale-up architectures, on the other
hand, another piece of proprietary hardware has to be added to be able to scale.

Scale-up                                           Scale-out

Increasing the size of the instances               Adding more of the same

Simpler to manage                                  More complexity to manage

Lower availability (if the single instance         Higher availability (if a single instance
fails, the service becomes unavailable)            fails, it doesn’t matter)

2.3.2 Advantages
Advantages of Scale-up Architecture
Scaling up offers certain advantages, including:
• Affordability: Because there’s only one large server to manage, scaling up is a cost-
effective way to increase storage capacity, since you’ll end up paying less for your network
equipment and licensing. Upgrading a pre-existing server costs less than purchasing a new
one. Vertical scaling also tends to require less new backup and virtualization software.
• Maintenance: Since you have only one storage system to manage versus a whole cluster
of different elements, scale-up architectures are easier to manage and also make it easier
to address specific data quality issues.


• Simpler communication: Since vertical scaling means having just a single node handling
all the layers of your services, you don’t need to worry about your system synchronizing
and communicating with other machines to work, which can lead to faster response times.
Advantages of Scale-out Architecture
The advantages of scale-out architecture include:
• Better performance: Horizontal scaling allows for more connection endpoints since the
load will be shared by multiple machines, and this improves performance.

• Easier scaling: Horizontal scaling is much easier from a hardware perspective because
all you need to do is add machines.

• Less downtime and easier upgrades: Scaling out means less downtime because you
don’t have to switch anything off to scale or make upgrades. Scaling out essentially allows
you to upgrade or downgrade your hardware whenever you want as you can move all
users, workloads, and data without any downtime. Scale-out systems can also auto-tune
and self-heal, allowing clusters to easily accommodate all data demands.
2.3.3 Disadvantages
Disadvantages of Scale-up Architecture
The disadvantages of scale-up architectures include:
Scalability limitations: Although scaling up is how enterprises have traditionally handled
storage upgrades, this approach has slowly lost its effectiveness. The RAM, CPU, and hard drives
added to a server can only perform to the level the computing housing unit allows. As a result,
performance and capacity become a problem as the unit nears its physical limitations. This, in
turn, impacts backup and recovery times and other mission-critical processes.
Upgrade headaches and downtime: Upgrading a scale-up architecture can be extremely tedious
and involve a lot of heavy lifting. Typically, you need to copy every piece of data from the old
server over to a new machine, which can be costly in terms of both money and downtime. Also,
adding another server to the mix usually means adding another data store, which could result in
the network getting bogged down by storage pools and users not knowing where to look for files.
Both of these can negatively impact productivity. Also, with a scale-up architecture, you need to
take your existing server offline while replacing it with a new, more powerful one. During this
time, your apps will be unavailable.
Disadvantages of Scale-out Architecture
The disadvantages of horizontal scaling include:
• Complexity: It’s always going to be harder to maintain multiple servers compared to a
single server. Also, things like load balancing and virtualization may require adding
software, and machine backups can also be more complex because you’ll need to ensure
nodes synchronize and communicate effectively.


• Cost: Scaling out can be more expensive than scaling up because adding new servers is
far more expensive than upgrading old ones.
Which One Is Best: Scale-out or Scale-up?
The answer depends on your particular needs and resources. Here are some questions to think
about:
• Are your needs long term or short term?
• What’s your budget? Is it big or small?
• What type of workloads are you dealing with?
• Are you dealing with a temporary traffic peak or constant traffic overload?
Once you’ve answered those questions, consider these factors:
• Cost: Horizontal scaling is more expensive, at least initially, so if your budget is tight,
then scaling up might be the best choice.
• Reliability: Horizontal scaling is typically far more reliable than vertical scaling. If
you’re handling a high volume of transactional data or sensitive data, for example, and
your downtime costs are high, you should probably opt for scaling out.
• Geographic distribution: If you have, or plan to have, global clients, you’ll be much
better able to maintain your SLAs via scaling out since a single machine in a single
location won’t be enough for customers to access your services.
• Future-proofing: Because scaling up uses a single node, it’s tough to future-proof a
vertical scaling-based architecture.
With scaling out, it’s much easier to increase the overall performance threshold of your
organization by adding machines. If you’re planning for the long term and operate in a
highly competitive industry with lots of potential disruptors, scaling out would be the best
option.
In short, if you have a bigger budget and expect a steady and large growth in data over a long
period of time and need to distribute an overstrained storage workload across several storage
nodes, scaling out is the best option.
If you haven’t yet maxed out the full potential of your current infrastructure and can still add
CPUs and memory resources to it and you don’t anticipate a meaningfully large growth in your
data set over the next three to five years, then scaling up would likely be the best choice.


Two Mark Questions with Answers


1. What is Storage?

Ans: Storage is a process through which digital data is saved within a data storage device by means of
computing technology. Storage is a mechanism that enables a computer to retain data, either temporarily
or permanently.

➢ Storage devices such as flash drives and hard disks are a fundamental component of most digital
devices since they allow users to preserve all kinds of information such as videos, documents,
pictures and raw data.
➢ Storage may also be referred to as computer data storage or electronic data storage.
2. What is meant by Storage Systems?
Ans: Storage systems, in the context of information technology and data management, refer to the
hardware and software components designed to store and manage digital data, making it accessible for
future retrieval and use. These systems play a fundamental role in modern computing and are essential
for preserving and managing vast amounts of data generated by individuals, organizations, and
applications.

3. List Components of Storage Systems?


Ans: A Storage System Consists of three Components

✓ Hosts
✓ Connectivity
✓ Storage
4. What is meant by Intelligent Storage Systems?

Ans: The intelligent storage systems are arrays that provide highly optimized I/O processing capabilities.
These arrays have an operating environment that controls the management, allocation, and utilization of
storage resources. These storage systems are configured with large amounts of memory called cache and
use sophisticated algorithms to meet the I/O requirements of performance sensitive applications.
5. List the Components of Intelligent Storage Systems?
Ans: An intelligent storage system consists of four key components

✓ Front end,
✓ Cache,
✓ Back end,
✓ Physical disks
6. What are the types of Intelligent Storage System?
Ans: There are two main categories in intelligent storage systems as follows

✓ High end storage system


✓ Midrange storage system


7. What is meant by RAID?


Ans: RAID (redundant array of independent disks) is a way of storing the same data in different places on
multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure. There are
different RAID levels, however, and not all have the goal of providing redundancy.

8. What are the levels of RAID?


Ans: Different RAID Levels are as follows

➢ RAID-0 (Striping)
➢ RAID-1 (Mirroring)
➢ RAID-2 (Bit-Level Striping with Dedicated Parity)
➢ RAID-3 (Byte-Level Striping with Dedicated Parity)
➢ RAID-4 (Block-Level Striping with Dedicated Parity)
➢ RAID-5 (Block-Level Striping with Distributed Parity)
➢ RAID-6 (Block-Level Striping with Two Parity Blocks)
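The distributed-parity idea behind RAID-5 can be illustrated with a short sketch. The block values below are hypothetical; real controllers work on full stripes of sectors, but the XOR arithmetic is the same:

```python
# Sketch: RAID 5-style parity. The parity block is the XOR of the data
# blocks in a stripe, so any single missing block can be rebuilt by
# XOR-ing the surviving blocks together.

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]      # data blocks of one stripe
parity = xor_blocks(data)               # parity block for the stripe

# Simulate losing the drive holding the second block: rebuild it
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print("rebuilt block:", rebuilt)        # → rebuilt block: b'BBBB'
```

The same XOR property underlies the RAID 5 recovery process asked about in the review questions; RAID 6 adds a second, independently computed parity block so that two simultaneous drive failures can be survived.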
9. What is meant by Spindle in Hard Disk Drives?
Ans: Spindle is the axis on which the hard disks spin. In storage engineering, the physical disk drive is
often called a “spindle”, referencing the spinning parts which limit the device to a single I/O operation at
a time and making it the focus of Input/Output scheduling decisions.

10. What is a Hard Disk Drive?


Ans: The hard disk is a type of magnetic disk. It is also called a fixed disk. A hard disk consists of
several circular disks called platters sealed inside a container. The container contains a motor to rotate
the disks. It also contains an access arm and a read/write head to read and write data to the disk. The
platters are used to store the data. A platter in a hard disk is coated with magnetic material.

11. List Components of Hard Disk Drives?


Ans: The Components of Hard Disk drives as follows

➢ Disk Platters
➢ Read/ Write Heads
➢ Head Actuator mechanism
➢ Logic Board
➢ Spindle motor
➢ Cables and Connectors
➢ Configuration items (jumpers, switches, etc.)
12. How do you measure the performance of hard disks and Solid drives?
Ans: The performance of Hard disks and Solid drives are measured as follows

➢ Disk service time-Disk service time is the time taken by a disk to complete an I/O request.


➢ Seek Time- Describes the time taken to position the R/W heads across the platter with a radial
movement.
➢ Rotational latency- The time taken by the platter to rotate and position the data under the R/W
head.
➢ Data Transfer Rate- The data transfer rate refers to the average amount of data per unit time that
the drive can deliver to HBA.
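The three components above add up to the disk service time. A minimal sketch of the arithmetic, using illustrative drive parameters (the seek time, transfer rate, and I/O size here are assumptions, not figures from the text):

```python
# Sketch: estimating average disk service time from seek time,
# rotational latency, and data transfer time. All drive parameters
# below are illustrative assumptions.

RPM = 10000            # spindle speed
avg_seek_ms = 4.0      # average seek time (assumed)
transfer_rate = 130    # sustained transfer rate in MB/s (assumed)
io_size_kb = 8         # I/O request size (assumed)

# Rotational latency averages half a platter revolution.
rotational_ms = 0.5 * (60 / RPM) * 1000              # 3.0 ms at 10K RPM

# Time to move the data once the head is positioned over it.
transfer_ms = (io_size_kb / 1024) / transfer_rate * 1000

service_ms = avg_seek_ms + rotational_ms + transfer_ms
print(f"service time = {service_ms:.2f} ms")         # → service time = 7.06 ms
```

Note how seek time and rotational latency dominate for small random I/O; this is why random workloads stress HDDs far more than sequential ones.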
Little’s Law

N = a × R

where N is the total number of requests in the system, a is the arrival rate, and R is the average response time.
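A quick worked example of Little's Law with assumed numbers:

```python
# Worked example of Little's Law: N = a * R.

a = 120      # arrival rate: 120 I/O requests per second (assumed)
R = 0.025    # average response time: 25 ms (assumed)

N = a * R    # average number of requests outstanding in the system
print(N)     # → 3.0
```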

13. What is meant by scale-up and scale-out Storage?


Ans:

➢ Scaling up is adding further resources, like hard drives and memory, to increase the computing
capacity of physical servers.
➢ Scaling out is adding more servers to your architecture to spread the workload across more
machines.
14. What is RAID 1 with example?
Ans: RAID 1 is also called disk mirroring. Mirroring is a technique whereby data is stored on two
different HDDs, yielding two copies of the data. In the event of one HDD failure, the data remains intact
on the surviving HDD, and the controller continues to service the host's data requests from the surviving
disk of the mirrored pair.

15. What is a host in storage systems?


Ans: Users store and retrieve data through applications. The computers on which these applications run
are referred to as hosts. Hosts can range from simple laptops to complex clusters of servers. A host consists
of physical components (hardware devices) that communicate with one another using logical components
(software and protocols).

16. What is meant by disk service time?


Ans: Disk service time is the time taken by a disk to complete an I/O request.

Components that contribute to service time on a disk drive are seek time, rotational latency, and data transfer rate.

17. How do you address the Hard disk drives?


Ans:

The operating system tells the drive to read or write a certain Logical Block Address (LBA).
Traditionally, each LBA refers to the start of a 512 byte sector on the drive. A 1 TB disk drive will have
around 2 billion sectors each numbered consecutively from the start of the drive.
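The sector arithmetic described above is straightforward; a small sketch (decimal terabyte assumed):

```python
# Sketch: LBA-to-byte-offset arithmetic for a drive with traditional
# 512-byte sectors, as described above.

SECTOR_SIZE = 512
capacity_bytes = 1 * 10**12                 # a "1 TB" drive (decimal TB assumed)

total_sectors = capacity_bytes // SECTOR_SIZE
print(total_sectors)                        # → 1953125000 (roughly 2 billion)

def lba_to_offset(lba):
    """Byte offset on the drive where a given logical block starts."""
    return lba * SECTOR_SIZE

print(lba_to_offset(100))                   # → 51200
```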


Review Questions

1. Discuss about Components of Intelligent Storage Systems?


2. Explain about Redundant Array of Independent Disk and its levels with Examples?
3. Explain the types of intelligent storage systems in detail?
4. Explain in detail about Performance of Hard disk drives and Solid drives?
5. Explain about Scale up and Scale out Storage Architecture in detail?
6. Write a Short note on Components of Hard Disk Drives and Solid drives?
7. Explain in detail about Addressing of Hard disk drives and Solid drives?
8. Why is RAID 1 not a substitute for a backup?
9. Why is RAID 0 not an option for data protection and high availability?
10. Explain the process of data recovery in case of a drive failure in RAID 5.
11. What are the benefits of using RAID 3 in a backup application?
12. Discuss the impact of random and sequential I/O in different RAID configurations.
13. An application has 1,000 heavy users at a peak of 2 IOPS each and 2,000 typical users at
a peak of 1 IOPS each, with a read/write ratio of 2:1. It is estimated that the application
also experiences an overhead of 20 percent for other workloads. Calculate the IOPS
requirement for RAID 1, RAID 3, RAID 5, and RAID 6.
14. Compute the number of drives required to support the application in different RAID
environments if 10K RPM drives with a rating of 130 IOPS per drive were used.


UNIT 3 - STORAGE NETWORKING TECHNOLOGIES AND


VIRTUALIZATION
3.1 Storage system
Files, blocks, and objects are storage formats that hold, organize, and present data in different
ways—each with their own capabilities and limitations. File storage organizes and represents data
as a hierarchy of files in folders; block storage chunks data into arbitrarily organized, evenly sized
volumes; and object storage manages data and links it to associated metadata.
Containers are highly flexible and bring incredible scale to how apps and storage are delivered.
3.1.1 Types of Storage Systems
Different types of storage systems as follows,
o Block-Based Storage System – Examples – SAN (Storage Area Network), iSCSI (Internet
Small Computer System Interface), and local disks.
o File-Based Storage System – Examples – NTFS (New Technology File System), FAT (File
Allocation Table), EXT (Extended File System), NAS (Network-attached Storage).
o Object-Based Storage System – Examples – Google cloud storage, Amazon Simple Storage
Options.
o Unified Storage System – Examples – Dell EMC Unity XT All-Flash Unified Storage and Dell
EMC Unity XT Hybrid Unified Storage.

3.1.1.1 Block-Based Storage System


A block-based storage system is a traditional storage system that provides hosts with block-
level access to the storage volumes. In this type of storage system, the file system is created on
the hosts, and data is accessed over the network at the block level.
Since cloud computing came into the picture, many organisations have moved their applications to
the cloud to gain a cost advantage. To ensure proper functioning of the application and provide
acceptable performance, service providers offer block-based storage in the cloud. The service
providers enable consumers to create block-based storage volumes and attach them to virtual
machine instances. After the volumes are attached, the consumers can create file systems on these
volumes and run applications the way they would in an on-premises data center. This is the most
commonly used type of storage system.
Block-based Storage System Architecture


The block-based storage system may consist of one or more controllers and a number of storage
disks.

Controller
A controller of a block-based storage system consists of three key components: front end, cache,
and back end. An I/O request received from the hosts or compute systems at the front-end port is
processed through cache and back end, to enable storage and retrieval of data from the storage. A
read request can be serviced directly from cache if the requested data is found in the cache. In
modern intelligent storage systems, front end, cache, and back end are typically integrated on a
single board referred as a storage processor or storage controller.
For high data protection and high availability, storage systems are configured with dual
controllers with multiple ports. Such configurations provide an alternative path to physical
storage drives if a controller or port failure occurs. This reliability is further enhanced if the
storage drives are also dual-ported. In that case, each drive port can connect to a separate
controller. Multiple controllers also facilitate load balancing.
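The cache-first read path described above can be sketched as follows. This is a toy model, not any vendor's implementation; the dictionary stands in for the physical drives behind the back end:

```python
# Sketch: a read serviced from cache when possible, otherwise fetched
# through the back end and cached for later reads. All names here are
# illustrative.

class StorageController:
    def __init__(self, backend_blocks):
        self.backend = backend_blocks   # block number -> data (stands in for disks)
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.cache:      # cache hit: no disk access needed
            self.hits += 1
            return self.cache[block_no]
        self.misses += 1                # cache miss: go through the back end
        data = self.backend[block_no]
        self.cache[block_no] = data     # keep a copy for subsequent reads
        return data

disks = {0: b"boot", 1: b"data"}
ctrl = StorageController(disks)
ctrl.read(1)
ctrl.read(1)
print(ctrl.hits, ctrl.misses)           # → 1 1
```

The second read of block 1 never touches the back end, which is exactly why cache hit rate dominates the response time of an intelligent storage system.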
Front End
The front end provides the interface between the storage system and the hosts. It consists of two
components: front-end ports and front-end controllers. Typically, a front end has redundant
controllers for high availability, and each controller contains multiple ports that enable large
numbers of hosts to connect to the intelligent storage system. Each front-end controller has
processing logic that executes the appropriate transport protocol, such as Fibre Channel, iSCSI,
FICON, or FCoE for storage connections. Front-end controllers route data to and from cache via
the internal data bus. When the cache receives the write data, the controller sends an
acknowledgement message back to the compute system.
Backend
The back end provides an interface between cache and the physical storage drives. It consists of
two components: back-end ports and back-end controllers. The back-end controls data transfers
between cache and the physical drives. From cache, data is sent to the back end and then routed
to the destination storage drives. Physical drives are connected to ports on the back end. The
back-end controller communicates with the storage drives when performing reads and writes and
also provides additional, but limited, temporary data storage. The algorithms implemented on
back-end controllers provide error detection and correction, along with RAID functionality.
Storage


Physical storage drives are connected to the back-end storage controller and provide persistent
data storage. Modern intelligent storage systems provide support to a variety of storage drives
with different speeds and types, such as FC, SATA, SAS, and solid state drives. They also support
the use of a mix of SSD, FC, or SATA within the same storage system.
Workloads that have predictable access patterns typically work well with a combination of HDDs
and SSDs. If the workload changes, or constant high performance is required for all the storage
being presented, using SSDs can meet the desired performance requirements.
3.1.1.2 File-Based Storage System
File-based storage systems (NAS) are based on file hierarchies that are complex in structure. Most
file systems have restrictions on the number of files, directories, and levels of hierarchy that can
be supported, which limits the amount of data that can be stored. In contrast, object-based storage
systems store data using a flat address space, where all objects exist at the same level and one
object cannot be placed inside another object.
File sharing allows users to share files with other users. In a file-sharing environment, a user who
creates the file (the creator or owner of a file) determines the type of access (such as read, write,
execute, append, delete) to be given to other users. When multiple users try to access a shared file
at the same time, a locking scheme is used to maintain data integrity and at the same time make
this sharing possible. Some examples of file-sharing methods are
• Peer-to-Peer (P2P) model – A peer-to-peer (P2P) file sharing model uses peer-to-peer
network. P2P enables client machines to directly share files with each other over a
network.
• File Transfer Protocol (FTP) – FTP is a client-server protocol that enables data transfer
over a network. An FTP server and an FTP client communicate with each other using TCP
as the transport protocol.
• Distributed File System (DFS) – A distributed file system (DFS) is a file system that is
distributed across several hosts. A DFS can provide hosts with direct access to the entire
file system, while ensuring efficient management and data security. Hadoop Distributed
File System (HDFS) is an example of distributed file system.
The standard client-server file-sharing protocols, such as NFS and CIFS, enable the owner of a
file to set the required type of access, such as read-only or read-write, for a particular user or
group of users. Using this protocol, the clients can mount remote file systems that are available
on dedicated file servers.
So, for example if somebody shares a folder with you over the network, once you are connected
to the network, the shared folder is ready to use. There is no need to format before accessing it
unlike in block storage. Shared file storage is often referred to as network-attached storage (NAS)
and uses protocols such as NFS and SMB/CIFS to share storage.
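The locking scheme mentioned above can be illustrated with POSIX advisory locks. This is only a local sketch of the idea; NFS and CIFS servers implement their own lock managers, and the file path used here is arbitrary:

```python
# Sketch: advisory file locking to keep a shared file consistent when
# several users write to it. fcntl.flock takes an exclusive (LOCK_EX)
# lock, so other writers block until the lock is released.

import fcntl

with open("/tmp/shared_demo.txt", "w") as f:
    fcntl.flock(f, fcntl.LOCK_EX)       # exclusive lock: other writers wait here
    f.write("owner writes safely\n")
    fcntl.flock(f, fcntl.LOCK_UN)       # release so other users can proceed
print("done")
```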
3.1.1.3 Object-Based Storage System
Object storage is a new type of storage system designed for cloud-scale scalability. Objects are
stored and retrieved from an object store through the web-based APIs such as REST and SOAP.
Each object can be linked with extensive metadata that can be searched and indexed. Object


storage is ideal for rich content data that does not change often and does not require high
performance. It is popular in the public cloud model.
Object-based Storage
Object-based storage device stores data in the form of objects on flat address space based on its
content and other attributes rather than the name and the location. An object is the fundamental
unit of object-based storage that contains user data, related metadata (size, date, ownership, etc.),
and user defined attributes of data (retention, access pattern, and other business-relevant
attributes).
The additional metadata or attributes enable optimized search, retention and deletion of objects.
For example, when bank account information is stored as a file in a NAS system, the metadata is
basic and may include information such as file name, date of creation, owner, and file type. When
stored as an object, the metadata component of the object may include additional information
such as account name, ID, and bank location, apart from the basic metadata.

The object ID is generated using specialized algorithms such as a hash function on the data and
guarantees that every object is uniquely identified. Any changes in the object, like user-based
edits to the file, results in a new object ID. Most of the object storage system supports APIs to
integrate it with software-defined data center and cloud environments.

Unlike SAN and NAS, applications do not know the location of the object stored. With object
storage, the application creates some data and give it to the OSD in exchange for a unique object
id (OID). The application which created the data does not need to know where the object is stored
as long as it is protected and returned whenever the application needed it.
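The hash-derived object ID and the flat address space described above can be sketched in a few lines. The class and field names here are illustrative, not any product's API:

```python
# Sketch: content-derived object IDs in a flat address space. The ID is
# a hash of the object's data, so any edit to the data yields a new ID.

import hashlib

class ObjectStore:
    def __init__(self):
        self.objects = {}               # flat address space: OID -> (data, metadata)

    def put(self, data, metadata=None):
        oid = hashlib.sha256(data).hexdigest()
        self.objects[oid] = (data, metadata or {})
        return oid                      # the application keeps only the OID

    def get(self, oid):
        return self.objects[oid][0]     # location is the store's problem, not the app's

store = ObjectStore()
oid1 = store.put(b"account record v1", {"bank": "HQ branch"})
oid2 = store.put(b"account record v2")  # edited data -> a different object ID
assert oid1 != oid2
print("OIDs differ:", oid1 != oid2)     # → OIDs differ: True
```

Real OSD systems use this principle at scale: the application never knows where an object physically lives, only its ID.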

For example, consider traditional car parking at a shopping mall or restaurant: it is your
responsibility to remember where you parked your car in the huge parking area. With valet
parking, you just hand over your keys, you have no idea where the car will be parked, and it is
brought back to you when you need it. Similarly, in object storage the application does not know
the location of the object but can retrieve it whenever it is needed.
Components of Object based Storage Device


The OSD system is typically composed of three key components: Controllers, internal network,
and storage.
Nodes (controllers)
The OSD system is composed of one or more nodes or controllers. A node is a server that runs
the OSD operating environment and provides services to store, retrieve, and manage data in the
system. Typically, OSD systems are architected to work with inexpensive x86-based nodes; each
node provides both compute and storage resources, and the system scales linearly in capacity and
performance by simply adding nodes.

The OSD node has two key services: metadata service and storage service. The metadata service
is responsible for generating the object ID from the contents of a file. It also maintains the
mapping of the object IDs and the file system namespace. In some implementations, the metadata
service runs inside an application server. The storage service manages a set of disks on which the
user data is stored.
Internal Network
The OSD nodes connect to the storage via an internal network. The internal network provides
node-to-node connectivity and node-to-storage connectivity. The application server accesses the
node to store and retrieve data over an external network.
Storage
OSD typically uses low-cost and high-density disk drives to store the objects. As more capacity
is required, more disk drives can be added to the system.
Object storage is not designed for high-performance and high-change requirements, nor is it
designed for storage of structured data such as databases. This is because object storage often
doesn’t allow updates in place. It is also not necessarily the best choice for data that changes a
lot. What it is great for is storage and retrieval of rich media and other Web 2.0 types of content
such as photos, videos, audio, and other documents.
3.1.1.4 Unified Storage
Unified storage architecture enables the creation of a common storage pool that can be shared
across a diverse set of applications with a common set of management processes.


The key component of a unified storage architecture is unified controller. The unified controller
provides the functionalities of block storage, file storage, and object storage. It contains iSCSI,
FC, FCoE and IP front-end ports for direct block access to application servers and file access to
NAS clients.

For block-level access, the controller configures LUNs and presents them to application servers
and the LUNs presented to the application server appear as local physical disks. A file system is
configured on these LUNs at the server and is made available to applications for storing data.
For NAS clients, the controller configures LUNs and creates a file system on these LUNs and
creates a NFS, CIFS, or mixed share, and exports the share to the clients.
Some storage vendors offer REST API to enable object-level access for storing data from the
web/cloud applications.
The advantages by deploying unified storage systems
• Creates a single pool of storage resources that can be managed with a single management
interface.
• Sharing of pooled storage capacity for multiple business workloads should lead to a lower
overall system cost and administrative time, thus reducing the total cost of ownership
(TCO).
• Provides the capability to plan the overall storage capacity consumption. Deploying a
unified storage system takes away the guesswork associated with planning for file and
block storage capacity separately.
• Increased utilization, with no stranded capacity. Unified storage eliminates the capacity
utilization penalty associated with planning for block and file storage support separately.
• Provides the capability to integrate with software-defined storage environment to provide
next generation storage solutions for mobile, cloud, big data, and social computing needs.


3.2 Fiber Channel Storage Area Network (FC SAN)


Fibre Channel SAN (FC SAN) is also referred to as SAN. It uses the Fibre Channel (FC) protocol for
communication. The FC protocol (FCP) is used to transport data, commands, and status information
between compute systems and storage systems. It is also used to transfer data between
storage systems.
FC SAN is a storage networking technology that allows block storage resources to be shared over
a dedicated high-speed fibre channel (FC) network. Fibre Channel Protocol (FCP) is a mapping
of the SCSI protocol over Fibre Channel networks, i.e the SCSI commands and data blocks are
wrapped up in FC frames and delivered over an FC network.

By using this technology, it is technically possible to share any SCSI device over an FC SAN.
However, 99.9% of the devices shared on an FC SAN are disk storage devices, tape drives, or
tape libraries. These are block devices: effectively raw devices that appear to the operating
system as locally attached devices. They do not have any higher levels of abstraction, such as
file systems, applied to them, which means that in an FC SAN environment the creation or
addition of file systems is the responsibility of the host or server accessing the block storage
device.
FC is a high-speed network technology that runs on high-speed optical fiber cables and serial
copper cables. The FC technology was developed to meet the demand for the increased speed of
data transfer between compute systems and mass storage systems.

The latest FC implementations of 16 GFC offer a throughput of 3200 MB/s (raw bit rates of 16
Gb/s), whereas Ultra640 SCSI is available with a throughput of 640 MB/s. FC is expected to
come with 6400 MB/s (raw bit rates of 32 Gb/s) and 25600 MB/s (raw bit rates of 128 Gb/s)
throughput in 2016. Technical Committee T11, which is the committee within International


Committee for Information Technology Standards (INCITS), is responsible for FC interface


standards.

The flow control mechanism in FC SAN delivers data as fast as the destination buffer is able to
receive it, without dropping frames. FC also has very little transmission overhead. The FC
architecture is highly scalable, and theoretically, a single FC SAN can accommodate
approximately 15 million devices.

3.2.1 Software-defined networking


Traditionally in any data center, a switch or a router consists of a data plane and a control plane.
The function of the data plane is to transfer the network traffic from one physical port to another
port by following rules that are programmed into the component. The function of the control
plane is to provide the programming logic that the data plane follows for switching or routing of
the network traffic.

Software Defined Networking

As per EMC definition, Software-defined networking is an approach to abstract and separate the
control plane functions from the data plane functions. Instead of the built-in control functions at
the network components level, the software external to the components takes over the control
functions.
The software runs on a compute-system or a standalone device and is called network controller.
The network controller interacts with the network components to gather configuration
information and to provide instructions for the data plane in order to handle the network traffic.
Software-defined networking is:

• Directly programmable: Network control is directly programmable because it is


decoupled from forwarding functions.
• Agile: Abstracting control from forwarding lets administrators dynamically adjust
network-wide traffic flow to meet changing needs.


• Centrally managed: Network intelligence is (logically) centralized in software-based


SDN controllers that maintain a global view of the network, which appears to
applications and policy engines as a single, logical switch.

• Programmatically configured: SDN lets network managers configure, manage,


secure, and optimize network resources very quickly via dynamic, automated SDN
programs, which they can write themselves because the programs do not depend on
proprietary software.

• Open standards-based and vendor-neutral: When implemented through open


standards, SDN simplifies network design and operation because instructions are
provided by SDN controllers instead of multiple, vendor-specific devices and
protocols.
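The control/data-plane separation described above can be illustrated with a toy sketch. The switch and controller classes here are purely illustrative, not the API of any real SDN controller:

```python
# Toy sketch of the SDN split: a central "network controller" (control
# plane) computes and installs forwarding rules; each switch's data
# plane only looks them up when forwarding traffic.

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}            # data plane state: destination -> out port

    def forward(self, dst):
        # Data plane: follow the installed rule, or drop if none exists.
        return self.flow_table.get(dst, "drop")

class NetworkController:
    """Control plane, running externally to the switches."""
    def __init__(self, switches):
        self.switches = switches

    def install_rule(self, switch_name, dst, out_port):
        # Programs the data plane of a switch from a central point.
        self.switches[switch_name].flow_table[dst] = out_port

sw1 = Switch("sw1")
ctrl = NetworkController({"sw1": sw1})
ctrl.install_rule("sw1", "host-b", "port2")   # rule programmed centrally
print(sw1.forward("host-b"))                  # → port2
print(sw1.forward("host-x"))                  # → drop
```

Because every flow_table entry originates from one controller, policies can be changed in one place and pushed uniformly across the fabric, which is exactly the centralized-control advantage listed below.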

The key advantages of Software-Defined Networking are:

• Centralized control: The software-defined approach provides a single point of control for the entire SAN infrastructure, which may span data centers. The centralized control plane provides the programming logic for transferring SAN traffic, which can be uniformly and quickly applied across the SAN infrastructure. The programming logic can be upgraded centrally to add new features and to meet application requirements.

• Policy-based automation: With the software-defined approach, many hardware-based SAN management operations, such as zoning, can be automated. Management operations may be programmed in the network controller based on business policies and best practices. This reduces the need for manual operations that are repetitive, error-prone, and time-consuming. Policy-based automation also helps to standardize management operations.

• Simplified, agile management: The network controller usually provides a management interface that includes a limited and standardized set of management functions. With policy-based automation in place, these management functions are available in simplified form, abstracting the underlying operational complexity. This makes it easy to configure a SAN infrastructure and to promptly modify the SAN configuration in response to changing application requirements.

3.2.2 FC SAN Components and Architecture

FC SAN is made up of several physical and logical components, such as:

• Host bus adapters and converged network adapters


• FC Switches and directors
• FC Storage Arrays
• FC Cabling
• FC Fabrics
• FC Name Server
• Zoning
• FC Addressing
• FC Classes and Service
• Virtual SAN

3.2.2.1 Physical Components: Host bus adapters and converged network adapters:
The key FC SAN physical components are network adapters, cables, and interconnecting devices.
These components provide the connection network between the storage system and hosts. Here
we will see the major physical components to design a Fibre Channel SAN environment.
Network adapters: In an FC SAN, the end devices, such as servers, hosts, and storage systems, are all referred to as nodes. Each node is a source or destination of information. Each node
requires one or more network adapters to provide a physical interface for communicating with
other nodes. Hosts and servers connect to the SAN through one or more Fibre Channel host bus
adapters (HBA) or converged network adapters (CNA) which are installed on the PCIe bus of the
host. Examples of network adapters are FC host bus adapters (HBAs) and storage system front-
end adapters.

Hosts interface with the FC SAN via either HBAs or CNAs. These PCI devices appear to the host
operating system as SCSI adapters, and any storage volumes presented to the OS via them appear
as locally attached SCSI devices. Both types of card offer hardware offloads for FCP operations; CNAs additionally offload other protocols such as iSCSI and TCP/IP.


3.2.2.2 FC Interconnecting devices: (Hubs, Switches and Directors)


The commonly used interconnecting devices in FC SANs are FC hubs, FC switches, and FC
directors. FC switches and directors provide the connectivity between hosts and storage. Using
switches for connectivity offers huge scalability as well as good performance and improved
manageability. Switches provide fabric services to help partition the SAN and make it more
manageable.
Switches and directors provide connectivity between end devices such as hosts and storage. They
operate at layers FC-0, FC-1, and FC-2 and provide full bandwidth between communicating end
devices. They also provide various fabric services that simplify management and enable
scalability. Also, if multiple FC switches are properly networked together, they merge and form
a single common fabric.

• FC hubs – FC hubs are used as communication devices in Fibre Channel Arbitrated Loop (FC-AL) implementations. Hubs physically connect nodes in a logical loop or a physical star topology. All the nodes must share the loop because data travels through all the connection points. Because low-cost, high-performance switches are available, FC switches are preferred over FC hubs in FC SAN deployments.

• FC switch – FC switches are more intelligent than FC hubs and directly route data
from one physical port to another. Therefore, the nodes do not share the data path.
Instead, each node has a dedicated communication path. The FC switches are
commonly available with a fixed port count. Some of the ports can be active for
operational purpose and the rest remain unused. The number of active ports can be
scaled-up non-disruptively.

• FC Directors – FC directors are high-end switches with a higher port count. A director has a modular architecture, and its port count is scaled up by inserting additional line cards or blades into the director's chassis. Directors contain redundant components with automated failover capability. Key components such as switch controllers, blades, power supplies, and fan modules are all hot-swappable. These features ensure high availability for business-critical applications.
The difference between directors and switches is that larger switches, usually with 128 or more
ports, are referred to as directors, whereas those with lower port counts are referred to as switches
or workgroup switches. Directors have more high-availability (HA) features and more built-in
redundancy than smaller workgroup-type switches. For example, director switches can have two control processor cards running in active/passive mode. If the active control processor fails, the standby assumes control and service is maintained. This redundant control processor model also allows for non-disruptive firmware updates. Workgroup switches do not have this level of redundancy.
3.2.2.3 FC Storage Arrays
Active-active storage system
Supports access to the LUNs simultaneously through all the storage ports that are available
without significant performance degradation. All the paths are active, unless a path fails.


Active-passive storage system


A system in which one storage processor is actively providing access to a given LUN. The other
processors act as a backup for the LUN and can be actively providing access to other LUN I/O.
I/O can be successfully sent only to an active port for a given LUN. If access through the active
storage port fails, one of the passive storage processors can be activated by the servers accessing
it.
Asymmetrical storage system
Supports Asymmetric Logical Unit Access (ALUA). ALUA-compliant storage systems provide
different levels of access per port. With ALUA, the host can determine the states of target ports
and prioritize paths. The host uses some of the active paths as primary, and uses others as
secondary.
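The ALUA path-selection behavior described above can be modeled in a few lines. The following is a minimal Python sketch, not a real multipathing driver; the port names and state labels are illustrative assumptions:

```python
# Illustrative sketch: selecting I/O paths the way an ALUA-aware host might,
# preferring active/optimized target ports over other states.

def select_paths(paths):
    """Return the preferred set of usable paths, best ALUA state first."""
    # Lower rank = higher priority, mirroring ALUA target port group states.
    rank = {"active/optimized": 0, "active/non-optimized": 1, "standby": 2}
    usable = [p for p in paths if p["state"] in rank]
    if not usable:
        return []
    best = min(rank[p["state"]] for p in usable)
    return [p for p in usable if rank[p["state"]] == best]

paths = [
    {"port": "SP-A:0", "state": "active/optimized"},
    {"port": "SP-A:1", "state": "active/optimized"},
    {"port": "SP-B:0", "state": "active/non-optimized"},
]
print([p["port"] for p in select_paths(paths)])  # the two SP-A primary paths
```

If the optimized paths fail, the same function naturally falls back to the non-optimized paths, which is the behavior the text describes.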

3.2.2.4 FC Cabling: (Multimode fiber (MMF), Single-mode fiber (SMF))


Cables:

FC SAN implementations primarily use optical fiber cabling. Copper cables may be used for shorter distances because they provide an acceptable signal-to-noise ratio for distances up to 30 meters. Optical fiber cables carry data in the form of light. There are two types of optical cables: multimode and single-mode.

• Multimode fiber (MMF) cable carries multiple beams of light projected at different
angles simultaneously onto the core of the cable. In an MMF transmission, multiple

light beams travelling inside the cable tend to disperse and collide. This collision
weakens the signal strength after it travels a certain distance – a process known as
modal dispersion. Due to modal dispersion, an MMF cable is typically used for short
distances, commonly within a data center.

• Single-mode fiber (SMF) carries a single ray of light projected at the center of the core.
The small core and the single light wave help to limit modal dispersion. Single-mode
provides minimum signal attenuation over maximum distance (up to 10 km). A single-
mode cable is used for long-distance cable runs, and the distance usually depends on
the power of the laser at the transmitter and the sensitivity of the receiver.

3.2.2.5 Logical Components: FC SAN Protocol Stack


FC protocol forms the fundamental construct of the FC SAN infrastructure. FC protocol
predominantly is the implementation of SCSI over an FC network. SCSI data is encapsulated and
transported within FC frames.
SCSI over FC overcomes the distance and the scalability limitations associated with traditional
direct-attached storage. Storage devices attached to the FC SAN appear as locally attached
devices to the operating system (OS) or hypervisor running on the compute system.
FC Protocol defines the communication protocol in five layers:
FC-4 Upper Layer Protocol - It is the uppermost layer in the FCP stack. This layer defines the
application interfaces and the way Upper Layer Protocols (ULPs) are mapped to the lower FC
layers. The FC standard defines several protocols that can operate on the FC-4 layer. Some of the
protocols include SCSI, High Performance Parallel Interface (HIPPI) Framing Protocol, ESCON,
Asynchronous Transfer Mode (ATM), and IP.
FC-3 Common Services - This layer defines common services and advanced features, such as striping and hunt groups, that span multiple ports on a node. The functions of this layer are not commonly implemented.


FC-2 Transport Layer


The FC-2 is the transport layer that contains the payload, addresses of the source and destination
ports, and link control information. The FC-2 layer provides Fibre Channel addressing, structure,
and organization of data (frames, sequences, and exchanges). It also defines fabric services,
classes of service, flow control, and routing.
FC-1 Transmission Protocol
This layer defines the transmission protocol that includes serial encoding and decoding rules,
special characters used, and error control. At the transmitter node, an 8-bit character is encoded
into a 10-bit transmissions character. This character is then transmitted to the receiver node. At
the receiver node, the 10-bit character is passed to the FC-1 layer, which decodes the 10-bit
character into the original 8-bit character.
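The 8b/10b scheme means that ten line bits are transmitted for every eight data bits, so only 80 percent of the raw line rate carries payload. A short Python sketch of this arithmetic (the 1.0625 Gbaud figure is the commonly cited nominal line rate for 1 Gb/s FC):

```python
# Sketch of the 8b/10b encoding overhead at FC-1: every 8-bit byte is sent
# as a 10-bit transmission character, so 8/10 of the line rate carries data.

def effective_data_rate_gbps(line_rate_gbaud):
    """Usable bit rate of an 8b/10b-encoded link: 8 data bits per 10 line bits."""
    return line_rate_gbaud * 8 / 10

# 1GFC uses a nominal 1.0625 Gbaud line rate:
print(effective_data_rate_gbps(1.0625))  # 0.85 Gb/s of decoded payload
```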
FC-0 Physical Interface
FC-0 is the lowest layer in the FCP stack. This layer defines the physical interface, media, and
transmission of raw bits. The FC-0 specification includes cables, connectors, and optical and
electrical parameters for a variety of data rates. The FC transmission can use both electrical and
optical media.
3.2.2.6 FC SAN Addressing
An FC address is dynamically assigned when a node port logs on to the fabric. The FC address
has a distinct format.

The first field of the FC address contains the domain ID of the switch. A domain ID is a unique
number provided to each switch in the fabric. Although this is an 8-bit field, there are only 239
available addresses for domain ID because some addresses are deemed special and reserved for
fabric services.


For example, FFFFFC is reserved for the name server, and FFFFFE is reserved for the fabric
login service.
The area ID is used to identify a group of switch ports used for connecting nodes. An example of
a group of ports with common area ID is a port card on the switch.
The last field, the port ID, identifies the port within the group. Therefore, the maximum possible
number of node ports in a switched fabric is calculated as:
239 domains X 256 areas X 256 ports = 15,663,104 ports.
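The address layout and the port-count arithmetic above can be checked with a short Python sketch (the sample address is an arbitrary example):

```python
# Sketch: unpacking a 24-bit FC address into its domain ID, area ID, and
# port ID fields, and reproducing the port-count arithmetic from the text.

def parse_fc_address(addr):
    """Split a 24-bit N_Port address into (domain, area, port) fields."""
    domain = (addr >> 16) & 0xFF   # first byte: switch domain ID
    area = (addr >> 8) & 0xFF      # second byte: group of switch ports
    port = addr & 0xFF             # third byte: port within the group
    return domain, area, port

print(parse_fc_address(0x0A1B2C))  # (10, 27, 44)
print(239 * 256 * 256)             # 15663104 maximum node ports
```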

FC Address of an NL_port
The FC addressing scheme for an NL_port differs from other ports. The two upper bytes in the
FC addresses of the NL_ports in a private loop are assigned zero values.
However, when an arbitrated loop is connected to a fabric through an FL_port, it becomes a public
loop. In this case, an NL_port supports a fabric login.
The two upper bytes of this NL_port are then assigned a positive value, called a loop identifier,
by the switch. The loop identifier is the same for all NL_ports on a given loop.
Figure 6-15 illustrates the FC address of an NL_port in both a public loop and a private loop. The
last field in the FC addresses of the NL_ports, in both public and private loops, identifies the AL-
PA. There are 127 allowable AL-PA addresses; one address is reserved for the FL_port on the
switch.


3.2.2.7 FC Fabrics
A fabric is a collection of connected FC switches that have a common set of services: they share a common name server, a common zoning database, and a common FSPF routing table.
You can also deploy dual redundant fabrics for resiliency. Each fabric is viewed and managed as
a single logical entity and it is common across the fabric to update the zoning configuration from
any switch in the fabric.

Every FC switch in a fabric needs a domain ID. The domain ID is a numeric value that uniquely identifies the switch in the fabric. Domain IDs can be administratively set or dynamically assigned by the principal switch in a fabric during reconfiguration. A domain ID must be unique within a fabric and must not be reused for another switch.
The principal switch is the switch in a fabric that is responsible for managing the distribution of domain IDs within the fabric.
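The following is a simplified Python sketch of how a principal switch might grant unique domain IDs, honoring administratively set values when they are free. It illustrates the uniqueness rule above and is not vendor firmware:

```python
# Illustrative model of principal-switch domain ID assignment.

class PrincipalSwitch:
    def __init__(self):
        self.assigned = {}          # switch name -> domain ID

    def request_domain_id(self, switch, preferred=None):
        used = set(self.assigned.values())
        if preferred is not None and preferred not in used:
            self.assigned[switch] = preferred      # honor the admin's choice
            return preferred
        # Otherwise grant the lowest free ID in the valid range 1..239.
        for candidate in range(1, 240):
            if candidate not in used:
                self.assigned[switch] = candidate
                return candidate
        raise RuntimeError("no free domain IDs in fabric")

p = PrincipalSwitch()
print(p.request_domain_id("sw1", preferred=10))  # 10
print(p.request_domain_id("sw2", preferred=10))  # 1 (10 is already taken)
print(p.request_domain_id("sw3"))                # 2
```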
3.2.2.8 FC Frame structure
In an FC network, data transport is analogous to a conversation between two people, whereby a
frame represents a word, a sequence represents a sentence, and an exchange represents a
conversation.
Exchange: An exchange operation enables two node ports to identify and manage a set of
information units. Each upper layer protocol (ULP) has its protocol-specific information that
must be sent to another port to perform certain operations. This protocol-specific information is
called an information unit. The structure of these information units is defined in the FC-4 layer.
This unit maps to a sequence. An exchange is composed of one or more sequences.
Sequence: A sequence refers to a contiguous set of frames that are sent from one port to another.
A sequence corresponds to an information unit, as defined by the ULP.
Frame: A frame is the fundamental unit of data transfer at FC-2 layer. An FC frame consists of
five parts: start of frame (SOF), frame header, data field, cyclic redundancy check (CRC), and
end of frame (EOF).


The S_ID and D_ID are standard FC addresses for the source port and the destination port,
respectively. The SEQ_ID and OX_ID identify the frame as a component of a specific sequence
and exchange, respectively.
The frame header also defines the following fields:
■ Routing Control (R_CTL): This field denotes whether the frame is a link control frame or a
data frame. Link control frames are non-data frames that do not carry any payload. These frames
are used for setup and messaging. In contrast, data frames carry the payload and are used for data
transmission.
■ Class Specific Control (CS_CTL): This field specifies link speeds for class 1 and class 4 data
transmission.
■ TYPE: This field describes the upper layer protocol (ULP) to be carried on the frame if it is a
data frame. However, if it is a link control frame, this field is used to signal an event such as
“fabric busy.” For example, if the TYPE is 08, and the frame is a data frame, it means that the
SCSI will be carried on an FC.
■ Data Field Control (DF_CTL): A 1-byte field that indicates the existence of any optional
headers at the beginning of the data payload. It is a mechanism to extend header information into
the payload.
■ Frame Control (F_CTL): A 3-byte field that contains control information related to frame
content. For example, one of the bits in this field indicates whether this is the first sequence of
the exchange. The SOF and EOF act as delimiters. The frame header is 24 bytes long and contains
addressing information for the frame. The data field in an FC frame contains the data payload, up
to 2,112 bytes of actual data – in most cases the SCSI data. The CRC checksum facilitates error
detection for the content of the frame. This checksum verifies data integrity by checking whether the content of the frame is received correctly. The CRC checksum is calculated by the sender
before encoding at the FC-1 layer. Similarly, it is calculated by the receiver after decoding at the
FC-1 layer.


3.2.2.9 FC Services (Fabric login server, Name server, Fabric controller, Management
server)
All FC switches, regardless of the manufacturer, provide a common set of services as defined in
the FC standards. These services are available at certain predefined addresses. Some of these
services are Fabric Login Server, Fabric Controller, Name Server, and Management Server.
Fabric Login Server: It is located at the predefined address of FFFFFE and is used during the
initial part of the node’s fabric login process.
Name Server (formally known as Distributed Name Server): It is located at the predefined
address FFFFFC and is responsible for name registration and management of node ports. Each
switch exchanges its Name Server information with other switches in the fabric to maintain a
synchronized, distributed name service.
Fabric Controller: Each switch has a Fabric Controller located at the predefined address
FFFFFD. The Fabric Controller provides services to both node ports and other switches. The
Fabric Controller is responsible for managing and distributing Registered State Change
Notifications (RSCNs) to the node ports registered with the Fabric Controller. If there is a change
in the fabric, RSCNs are sent out by a switch to the attached node ports. The Fabric Controller
also generates Switch Registered State Change Notifications (SW-RSCNs) to every other domain
(switch) in the fabric. These RSCNs keep the name server up-to-date on all switches in the fabric.
Management Server: FFFFFA is the FC address for the Management Server. The Management
Server is distributed to every switch within the fabric. The Management Server enables the FC
SAN management software to retrieve information and administer the fabric.
Fabric services define three login types:

• Fabric login (FLOGI): It is performed between an N_Port and an F_Port. To log on


to the fabric, a node sends a FLOGI frame with the WWNN and WWPN parameters
to the login service at the predefined FC address FFFFFE (Fabric Login Server). In
turn, the switch accepts the login and returns an Accept (ACC) frame with the assigned
FC address for the node. Immediately after the FLOGI, the N_Port registers itself with
the local Name Server on the switch, indicating its WWNN, WWPN, port type, class
of service, assigned FC address, and so on. After the N_Port has logged in, it can query
the name server database for information about all other logged in ports.

• Port login (PLOGI): It is performed between two N_Ports to establish a session. The
initiator N_Port sends a PLOGI request frame to the target N_Port, which accepts it.
The target N_Port returns an ACC to the initiator N_Port. Next, the N_Ports exchange
service parameters relevant to the session.

• Process login (PRLI): It is also performed between two N_Ports. This login relates to
the FC-4 ULPs, such as SCSI. If the ULP is SCSI, N_Ports exchange SCSI-related
service parameters.
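A toy model of the FLOGI flow described above, showing address assignment followed by name-server registration. The class, WWPN value, and record format are illustrative assumptions, not the actual fabric services protocol:

```python
# Illustrative sketch: on FLOGI the fabric login server assigns an FC address,
# after which the N_Port registers with the name server. The area/port packing
# is simplified (area fixed at zero).

class Fabric:
    def __init__(self, domain_id):
        self.domain_id = domain_id
        self.next_port = 0
        self.name_server = {}       # FC address -> registration record

    def flogi(self, wwpn):
        """Fabric login: assign a 24-bit address (domain | area | port)."""
        fc_addr = (self.domain_id << 16) | self.next_port
        self.next_port += 1
        # The N_Port registers itself with the name server after FLOGI.
        self.name_server[fc_addr] = {"wwpn": wwpn, "type": "N_Port"}
        return fc_addr

fab = Fabric(domain_id=0x0A)
addr = fab.flogi("50:06:01:60:3B:10:11:22")   # a made-up WWPN
print(hex(addr))                              # 0xa0000
print(fab.name_server[addr]["wwpn"])
```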


3.2.2.10 FC flow control


Flow control is the process to regulate the data transmission rate between two devices so that a
transmitting device does not overflow a receiving device with data.
A fabric uses the buffer-to-buffer credit (BB_Credit) mechanism for flow control. The BB_Credit
management may occur between any two FC ports.
Flow control defines the pace of the flow of data frames during data transmission. FC technology
uses two flow-control mechanisms: buffer-to-buffer credit (BB_Credit) and end-to-end credit
(EE_Credit).
BB_Credit
FC uses the BB_Credit mechanism for hardware-based flow control. BB_Credit controls the
maximum number of frames that can be present over the link at any given point in time. In a
switched fabric, BB_Credit management may take place between any two FC ports. The
transmitting port maintains a count of free receiver buffers and continues to send frames if the
count is greater than 0. The BB_Credit mechanism provides frame acknowledgment through the
Receiver Ready (R_RDY) primitive.
EE_Credit
The function of end-to-end credit, known as EE_Credit, is similar to that of BB_Credit. When an
initiator and a target establish themselves as nodes communicating with each other, they exchange
the EE_Credit parameters (part of Port Login). The EE_Credit mechanism affects the flow control
for class 1 and class 2 traffic only.
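The BB_Credit rule, send only while the credit count is above zero and regain one credit per R_RDY, can be sketched as follows (an illustrative model, not a driver implementation):

```python
# Sketch of buffer-to-buffer credit flow control: the sender transmits only
# while its credit count is above zero; each R_RDY returns one credit.

class BBCreditLink:
    def __init__(self, bb_credit):
        self.credit = bb_credit     # free receive buffers advertised by peer

    def send_frame(self):
        if self.credit == 0:
            return False            # must wait for an R_RDY
        self.credit -= 1
        return True

    def r_rdy(self):
        self.credit += 1            # receiver freed one buffer

link = BBCreditLink(bb_credit=2)
print(link.send_frame())  # True
print(link.send_frame())  # True
print(link.send_frame())  # False - credits exhausted, sender stalls
link.r_rdy()
print(link.send_frame())  # True - one credit replenished
```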
3.2.2.11 Zoning
Zoning is an FC switch function that enables node ports within the fabric to be logically
segmented into groups and communicate with each other within the group.

Zoning also provides access control, along with other access control mechanisms, such as LUN
masking. Zoning provides control by allowing only the members in the same zone to establish

communication with each other. Multiple zones can be grouped together to form a zone set, and this zone set is applied to the fabric. Any new zone configured needs to be added to the active zone set in order to be applied to the fabric.
Zone members, zones, and zone sets form the hierarchy defined in the zoning process. A zone set
is composed of a group of zones that can be activated or deactivated as a single entity in a fabric.
Multiple zone sets may be defined in a fabric, but only one zone set can be active at a time.
Members are the nodes within the FC SAN that can be included in a zone.
FC switch ports, FC HBA ports, and storage system ports can be members of a zone. A port or
node can be a member of multiple zones. Nodes distributed across multiple switches in a switched
fabric may also be grouped into the same zone. Zone sets are also referred to as zone
configurations.

Best Practices for Zoning

• Always keep the zones small so that troubleshooting is simpler.

• Have only a single initiator in each zone; it is not recommended to have more than one initiator in a zone.

• To make troubleshooting easier, also keep the number of targets in a zone small.

• Give meaningful aliases and names to your zones so that they can be easily identified during troubleshooting.

• Make zone changes with extreme caution and care to prevent unwanted access to sensitive data.


Zoning can be categorized into three types:

WWN zoning: It uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major advantage of WWN
zoning is its flexibility. If an administrator moves a node to another switch port in the fabric, the
node maintains connectivity to its zone partners without having to modify the zone configuration.
This is possible because the WWN is static to the node port. WWN zoning is also sometimes referred to as soft zoning.

Port zoning: It uses the switch port ID to define zones. In port zoning, access to node is
determined by the physical switch port to which a node is connected. The zone members are the
port identifiers (switch domain ID and port number) to which FC HBA and its targets (storage
systems) are connected. If a node is moved to another switch port in the fabric, port zoning must
be modified to allow the node, in its new port, to participate in its original zone. However, if an
FC HBA or storage system port fails, an administrator just has to replace the failed device without
changing the zoning configuration. Port zoning is also sometimes referred to as hard zoning.
Mixed zoning: It combines the qualities of both WWN zoning and port zoning. Using mixed
zoning enables a specific node port to be tied to the WWN of another node.
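The basic access rule, two ports may communicate only if some zone in the single active zone set contains both of them, can be sketched as follows; the zone names and WWPN aliases are made-up examples:

```python
# Sketch of the zoning access rule over an active zone set.

active_zone_set = {
    "zone_hostA_array1": {"wwpn_hostA", "wwpn_array1_p0"},
    "zone_hostB_array1": {"wwpn_hostB", "wwpn_array1_p0"},
}

def can_communicate(port1, port2, zone_set):
    """True if any zone in the active zone set contains both members."""
    return any(port1 in members and port2 in members
               for members in zone_set.values())

print(can_communicate("wwpn_hostA", "wwpn_array1_p0", active_zone_set))  # True
print(can_communicate("wwpn_hostA", "wwpn_hostB", active_zone_set))      # False
```

Note that each zone here follows the single-initiator best practice listed above: the two hosts share a storage port but cannot see each other.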
3.2.2.12 FC Classes and Service
The FC standards define different classes of service to meet the requirements of a wide range of
applications. The table below shows three classes of services and their features (Table 6-1).


Another class of service is Class F, which is intended for use by the switches communicating
through ISLs. Class F is similar to Class 2, and it provides notification of non-delivery of frames.
Other defined Classes 4, 5, and 6 are used for specific applications. Currently, these services are
not in common use.
3.2.2.13 Virtual SAN
Virtual SAN (also called virtual fabric) is a logical fabric on an FC SAN, which enables
communication among a group of nodes regardless of their physical location in the fabric.
Each SAN can be partitioned into smaller virtual fabrics, generally called VSANs. VSANs are similar to VLANs in Ethernet networking and allow a physical SAN to be partitioned into multiple smaller logical SANs/fabrics. It is possible to route traffic between virtual fabrics by using vendor-specific technologies.
In a VSAN, a group of node ports communicate with each other using a virtual topology defined
on the physical SAN. Multiple VSANs may be created on a single physical SAN. Each VSAN
behaves and is managed as an independent fabric. Each VSAN has its own fabric services,
configuration, and set of FC addresses. Fabric-related configurations in one VSAN do not affect
the traffic in another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes, in either site with a common set of requirements.


VSANs improve SAN security, scalability, availability, and manageability. VSANs provide
enhanced security by isolating the sensitive data in a VSAN and by restricting the access to the
resources located within that VSAN.
For example, a cloud provider typically isolates the storage pools for multiple cloud services by
creating multiple VSANs on an FC SAN. Further, the same FC address can be assigned to nodes
in different VSANs, thus increasing the fabric scalability.
The events causing traffic disruptions in one VSAN are contained within that VSAN and are not
propagated to other VSANs. VSANs facilitate an easy, flexible, and less expensive way to
manage networks. Configuring VSANs is easier and quicker compared to building separate
physical FC SANs for various node groups. To regroup nodes, an administrator simply changes
the VSAN configurations without moving nodes and recabling.
Configuring VSAN
To configure VSANs on a fabric, an administrator first needs to define VSANs on fabric switches.
Each VSAN is identified with a specific number called VSAN ID. The next step is to assign a
VSAN ID to the F_Ports on the switch. By assigning a VSAN ID to an F_Port, the port is included
in the VSAN. In this manner, multiple F_Ports can be grouped into a VSAN.
For example, an administrator may group switch ports (F_Ports) 1 and 2 into VSAN 10 (ID) and
ports 6 to 12 into VSAN 20 (ID). If an N_Port connects to an F_Port that belongs to a VSAN, it
becomes a member of that VSAN. The switch transfers FC frames between switch ports that
belong to the same VSAN.
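The port-grouping example above can be sketched directly; in this simplified model, the switch forwards frames only between F_Ports that carry the same VSAN ID:

```python
# Sketch of VSAN membership by F_Port, mirroring the example in the text:
# ports 1-2 belong to VSAN 10 and ports 6-12 to VSAN 20.

vsan_of_port = {1: 10, 2: 10}
vsan_of_port.update({p: 20 for p in range(6, 13)})

def can_forward(ingress_port, egress_port):
    """Frames are switched only between ports within a single VSAN."""
    return (ingress_port in vsan_of_port
            and vsan_of_port.get(ingress_port) == vsan_of_port.get(egress_port))

print(can_forward(1, 2))   # True  - both ports are in VSAN 10
print(can_forward(1, 6))   # False - VSAN 10 versus VSAN 20
```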
VSAN versus Zone


Both VSANs and zones enable node ports within a fabric to be logically segmented into groups.
But they are not the same, and their purposes are different. There is a hierarchical relationship between
them. An administrator first assigns physical ports to VSANs and then configures independent
zones for each VSAN. A VSAN has its own independent fabric services, but the fabric services
are not available on a per-zone basis.

VSAN Trunking

VSAN trunking allows network traffic from multiple VSANs to traverse a single ISL. It supports
a single ISL to permit traffic from multiple VSANs along the same path. The ISL through which
multiple VSAN traffic travels is called a trunk link.

VSAN trunking enables a single E_Port to be used for sending or receiving traffic from multiple
VSANs over a trunk link. The E_Port capable of transferring multiple VSAN traffic is called a
trunk port. The sending and receiving switches must have at least one trunk E_Port configured
for all of or a subset of the VSANs defined on the switches.
VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It reduces the
number of ISLs when the switches are configured with multiple VSANs. As the number of ISLs
between the switches decreases, the number of E_Ports used for the ISLs also reduces. By
eliminating needless ISLs, the utilization of the remaining ISLs increases. The complexity of
managing the FC SAN is also minimized with a reduced number of ISLs.

VSAN Tagging

VSAN tagging is the process of adding to or removing from FC frames a tag that contains VSAN-specific information. Associated with VSAN trunking, it helps isolate FC frames from multiple VSANs that travel through and share a trunk link. Whenever an FC frame enters an FC switch, it is tagged with a VSAN header indicating the VSAN ID of the ingress switch port (F_Port) before the frame is sent down a trunk link.


The receiving FC switch reads the tag and forwards the frame to the destination port that
corresponds to that VSAN ID. The tag is removed once the frame leaves a trunk link to reach an
N_Port.
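A minimal sketch of tagging and untagging, modeling a frame as a Python dictionary; the field names are illustrative, not the actual VSAN header format:

```python
# Sketch of VSAN tagging on a trunk link: the ingress switch prepends a tag
# carrying the VSAN ID; the egress switch strips it before the frame reaches
# an N_Port.

def tag_frame(frame, vsan_id):
    """Add a VSAN tag as the frame enters the trunk link."""
    return {"vsan_tag": vsan_id, **frame}

def untag_frame(tagged):
    """Read and remove the tag as the frame leaves the trunk link."""
    frame = dict(tagged)
    vsan_id = frame.pop("vsan_tag")
    return vsan_id, frame

frame = {"s_id": 0x0A0001, "d_id": 0x0B0002, "payload": b"scsi-data"}
on_trunk = tag_frame(frame, vsan_id=10)
vsan_id, delivered = untag_frame(on_trunk)
print(vsan_id)              # 10
print(delivered == frame)   # True - the tag exists only on the trunk link
```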
3.2.3 FC SAN Connectivity
The FC SAN physical components, such as network cables, network adapters, and hubs or switches, can be used to design a Fibre Channel Storage Area Network. The different types of FC architecture that can be designed are:

• Point-to-point
• Fibre channel arbitrated loop (FC-AL)
• Fibre channel switched fabric (FC-SW).

3.2.3.1 Point-to-Point
Point-to-point is the simplest FC configuration — two devices are connected directly to each
other, as shown in Figure 6-6. This configuration provides a dedicated connection for data
transmission between nodes.
However, the point-to-point configuration offers limited connectivity, as only two devices can communicate with each other at a given time. Moreover, it cannot be scaled to accommodate a large number of network devices. Standard DAS uses point-to-point connectivity.


3.2.3.2 Fiber Channel Arbitrated Loop


In Arbitrated loop connectivity, the devices are attached to a shared loop. Each device contends
with other devices to perform I/O operations. The devices on the loop must “arbitrate” to gain
control of the loop.
At any given time, only one device can perform I/O operations on the loop. Because each device
in a loop must wait for its turn to process an I/O request, the overall performance in FC-AL
environments is low.


Further, adding or removing a device results in loop re-initialization, which can cause a momentary pause in loop traffic. As a loop configuration, FC-AL can be implemented without any interconnecting devices by directly connecting each device to two other devices in a ring through cables. However, FC-AL implementations may also use FC hubs, through which the arbitrated loop is physically connected in a star topology.
FC-AL Transmission
When a node in the FC-AL topology attempts to transmit data, the node sends an arbitration
(ARB) frame to each node on the loop. If two nodes simultaneously attempt to gain control of the
loop, the node with the highest priority is allowed to communicate with another node.
When the initiator node receives the ARB request it sent, it gains control of the loop. The initiator
then transmits data to the node with which it has established a virtual connection. Figure 6-8
illustrates the process of data transmission in an FC-AL configuration.
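Arbitration can be sketched as picking the highest-priority contender. In FC-AL, priority is derived from the Arbitrated Loop Physical Address (AL_PA), where a numerically lower AL_PA means higher priority (AL_PA 0x00, the highest, is reserved for the FL_Port); the sample values are arbitrary:

```python
# Sketch of loop arbitration: when several nodes send ARB at once, the node
# with the highest priority (lowest AL_PA) wins control of the loop.

def arbitrate(contending_al_pas):
    """Return the AL_PA that wins control of the loop, or None if idle."""
    if not contending_al_pas:
        return None
    return min(contending_al_pas)   # lowest AL_PA = highest priority

print(hex(arbitrate([0xE8, 0x01, 0xB2])))  # 0x1 wins the loop
```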


3.2.3.3 Fiber Channel Switched Fabric


FC-Switch: It involves a single FC switch or a network of FC switches (including FC directors)
to interconnect the nodes. It is also referred to as fabric connect. A fabric is a logical space in
which all nodes communicate with one another in a network. In a fabric, the link between any
two switches is called an inter-switch link (ISL). ISLs enable switches to be connected together
to form a single, larger fabric.


They enable the transfer of both storage traffic and fabric management traffic from one switch to
another. In FC-SW, nodes do not share a loop; instead, data is transferred through a dedicated
path between the nodes. Unlike a loop configuration, an FC-SW configuration provides high
scalability. The addition or removal of a node in a switched fabric is minimally disruptive; it does
not affect the ongoing traffic between other nodes. FC switches operate up to FC-2 layer, and
each switch supports and assists in providing a rich set of fabric services such as the FC name
server, the zoning database and time synchronization service. When a fabric contains more than
one switch, these switches are connected through a link known as an inter-switch link.

Inter-switch links (ISLs) connect multiple switches together, allowing them to merge into a
common fabric that can be managed from any switch in the fabric. ISLs can also be bonded into
logical ISLs that provide the aggregate bandwidth of each component ISL as well as providing
load balancing and high-availability features.

FC-SW Transmission

FC-SW uses switches that are intelligent devices. They can switch data traffic from an initiator
node to a target node directly through switch ports. Frames are routed between source and
destination by the fabric. As shown in Figure 6-11, if node B wants to communicate with node
D, both nodes must first log in individually and then transmit data via the FC-SW. This link is
considered a dedicated connection between the initiator and the target.

When the number of tiers in a fabric increases, the distance that a fabric management message
must travel to reach each switch in the fabric also increases. The increase in the distance also
increases the time taken to propagate and complete a fabric reconfiguration event, such as the
addition of a new switch, or a zone set propagation event (detailed later in this chapter). Figure
6-10 illustrates two-tier and three-tier fabric architecture.


3.2.4 FC SAN Port virtualization


3.2.4.1 Types of Ports (N_Port, E_Port, F_Port, G_Port)
The ports in a switched fabric can be one of the following types

• N_Port: It is an end point in the fabric. This port is also known as the node port.
Typically, it is a compute system port (FC HBA port) or a storage system port that is
connected to a switch in a switched fabric.

• E_Port: It is a port that forms the connection between two FC switches. This port is
also known as the expansion port. The E_Port on an FC switch connects to the E_Port
of another FC switch in the fabric ISLs.


• F_Port: It is a port on a switch that connects an N_Port. It is also known as a fabric port.

• G_Port: It is a generic port on a switch that can operate as an E_Port or an F_Port and
determines its functionality automatically during initialization.

Common FC port speeds are 2 Gbps, 4 Gbps, 8 Gbps, and 16 Gbps. FC ports, including HBA
ports, switch ports, and storage array ports, can be configured to autonegotiate their speed;
autonegotiation is a protocol that allows the two devices on a link to agree on a common speed.
Even so, it is good practice to hard-code the same speed at both ends of the link.


World Wide Name (WWN)

Each device in the FC environment is assigned a 64-bit unique identifier called the World Wide
Name (WWN). The FC environment uses two types of WWNs.

• World Wide Node Name (WWNN) – WWNN is used to physically identify FC
network adapters. Unlike an FC address, which is assigned dynamically, a WWN is a
static name for each device on an FC network. WWNs are similar to the Media Access
Control (MAC) addresses used in IP networking.

• WWNs are burned into the hardware or assigned through software. Several
configuration definitions in an FC SAN use WWN for identifying storage systems and
FC HBAs. WWNs are critical for FC SAN configuration as each node port has to be
registered by its WWN before the FC SAN recognizes it.
• World Wide Port Name (WWPN) – WWPN is used to physically identify FC adapter
ports or node ports. For example, a dual-port FC HBA has one WWNN and two
WWPNs
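As an illustration of the 64-bit WWN described above, assuming the common colon-separated hex notation (the same form used in the fcping example later in this chapter), a small helper can validate a WWN string and recover its raw 64-bit value:

```python
# A WWN is a static 64-bit identifier, conventionally written as eight
# colon-separated hex bytes (much like a longer MAC address). This
# helper validates that form and returns the raw 64-bit integer.

def parse_wwn(wwn: str) -> int:
    parts = wwn.split(":")
    if len(parts) != 8 or not all(len(p) == 2 for p in parts):
        raise ValueError(f"not a valid WWN: {wwn!r}")
    return int("".join(parts), 16)  # raises ValueError on non-hex bytes

value = parse_wwn("50:01:43:80:05:6c:22:ae")
print(f"{value:#018x}")  # 0x50014380056c22ae
```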
3.2.4.2 N_Port Virtualization
The proliferation of compute systems in a data centre causes increased use of edge switches in a
fabric. As the edge switch population grows, the number of domain IDs may become a concern
because of the limitation on the number of domain IDs in a fabric. N_Port Virtualization (NPV)
addresses this concern by reducing the number of domain IDs in a fabric.
Edge switches supporting NPV do not require a domain ID. They pass traffic between the core
switch and the compute systems. NPV-enabled edge switches do not perform any fabric services,
and instead forward all fabric activity, such as login and name server registration to the core
switch.
All ports at the NPV edge switches that connect to the core switch are established as NP_Ports
(not E_Ports). The NP_Ports connect to an NPIV-enabled core director or switch. If the core
director or switch is not NPIV-capable, the NPV edge switches do not function. As the switch


enters or exits from NPV mode, the switch configuration is erased and it reboots. Therefore,
administrators should take care when enabling or disabling NPV on a switch. The figure shows
a core-edge fabric that comprises two edge switches in NPV mode and one core switch (an FC
director).
3.2.4.3 N_Port ID Virtualization (NPIV)
It enables a single N_Port (such as an FC HBA port) to function as multiple virtual N_Ports. Each
virtual N_Port has a unique WWPN identity in the FC SAN. This allows a single physical N_Port
to obtain multiple FC addresses.

VMware or Hypervisors leverage NPIV to create virtual N_Ports on the FC HBA and then assign
the virtual N_Ports to virtual machines (VMs). A virtual N_Port acts as a virtual FC HBA port.
This enables a VM to directly access LUNs assigned to it.
NPIV enables an administrator to restrict access to specific LUNs to specific VMs using security
techniques such as zoning and LUN masking, similar to the assignment of a LUN to a physical
compute system. To enable NPIV, both the FC HBAs and the FC switches must support NPIV.
The physical FC HBAs on the compute system, using their own WWNs, must have access to all
LUNs that are to be accessed by VMs running on that compute system.
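The NPIV idea can be sketched as follows. The class, method names, and WWPN values are hypothetical; the sketch only illustrates one physical port exposing several uniquely named virtual N_Ports that can each be targeted by zoning and LUN masking:

```python
# Sketch of NPIV: a single physical N_Port keeps one WWNN but hands
# out a distinct virtual WWPN per VM, so zoning and LUN masking can
# target individual VMs. All WWN values below are made up.

class PhysicalNPort:
    def __init__(self, wwnn, wwpn):
        self.wwnn = wwnn
        self.wwpn = wwpn
        self.virtual_wwpns = {}   # vm name -> virtual WWPN

    def create_virtual_nport(self, vm_name, virtual_wwpn):
        # each virtual N_Port must have a unique WWPN identity
        if virtual_wwpn in self.virtual_wwpns.values():
            raise ValueError("virtual WWPN must be unique in the fabric")
        self.virtual_wwpns[vm_name] = virtual_wwpn
        return virtual_wwpn

hba = PhysicalNPort(wwnn="20:00:00:25:b5:00:00:01",
                    wwpn="20:00:00:25:b5:00:00:02")
hba.create_virtual_nport("vm1", "20:00:00:25:b5:aa:00:01")
hba.create_virtual_nport("vm2", "20:00:00:25:b5:aa:00:02")
print(len(hba.virtual_wwpns))  # one physical port, two virtual N_Ports
```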
3.2.5 FC SAN Topologies
FC SAN offers 3 types of FC Switch topologies. They are
• Single-Switch topology
• Mesh topology
• Core-edge topology
3.2.5.1 Single-Switch topology
In a single-switch topology, the fabric consists of only a single switch. Both the compute systems
and the storage systems are connected to the same switch. A key advantage of a single-switch
fabric is that it does not need to use any switch port for ISLs. Therefore, every switch port is
usable for compute system or storage system connectivity. Further, this topology helps eliminate
FC frames travelling over the ISLs and consequently eliminates the ISL delays.


A typical implementation of a single-switch fabric would involve the deployment of an FC
director. FC directors are high-end switches with a high port count.
When additional switch ports are needed over time, new ports can be added via add-on line cards
(blades) in spare slots available on the director chassis. To some extent, a bladed solution
alleviates the port count scalability problem inherent in a single-switch topology.
3.2.5.2 Mesh Topology
A mesh topology may be one of two types: full mesh or partial mesh. In a full mesh, every
switch is connected to every other switch in the topology.
A full mesh topology may be appropriate when the number of switches involved is small. A
typical deployment would involve up to four switches or directors, with each of them servicing
highly localised compute-to-storage traffic.
In a full mesh topology, a maximum of one ISL or hop is required for compute-to-storage traffic.
However, with the increase in the number of switches, the number of switch ports used for ISL
also increases. This reduces the available switch ports for node connectivity.


In a partial mesh topology, not all the switches are connected to every other switch. In this
topology, several hops or ISLs may be required for the traffic to reach its destination. Partial mesh
offers more scalability than full mesh topology. However, without proper placement of compute
and storage systems, traffic management in a partial mesh fabric might be complicated and ISLs
could become overloaded due to excessive traffic aggregation.
3.2.5.3 Core-edge topology
The core-edge topology has two types of switch tiers: edge and core.
The edge tier is usually composed of switches and offers an inexpensive approach to adding more
compute systems in a fabric. The edge-tier switches are not connected to each other. Each switch
at the edge tier is attached to a switch at the core tier through ISLs.
The core tier is usually composed of directors that ensure high fabric availability. In addition,
typically all traffic must either traverse this tier or terminate at this tier. In this configuration, all
storage systems are connected to the core tier, enabling compute-to-storage traffic to traverse only
one ISL. Compute systems that require high performance may be connected directly to the core
tier and consequently avoid ISL delays.

The core-edge topology increases connectivity within the FC SAN while conserving the overall
port utilization. It eliminates the need to connect edge switches to other edge switches over ISLs.
Reduction of ISLs can greatly increase the number of node ports that can be connected to the
fabric. If fabric expansion is required, then administrators would need to connect additional edge
switches to the core. The core of the fabric is also extended by adding more switches or directors


at the core tier. Based on the number of core-tier switches, this topology has different variations,
such as single-core topology and dual-core topology. To transform a single-core topology to dual-
core, new ISLs are created to connect each edge switch to the new core switch in the fabric.
3.2.6 Link aggregation and zoning
Link aggregation and the zoning in Fiber channel Storage Area Network are as follows
3.2.6.1 Link aggregation with example
Link aggregation combines two or more parallel ISLs into a single logical ISL, called a port-
channel, yielding higher throughput than a single ISL could provide. For example, the aggregation
of 10 ISLs into a single port-channel provides up to 160 Gb/s throughput assuming the bandwidth
of an ISL is 16 Gb/s.

Link aggregation optimizes fabric performance by distributing network traffic across the shared
bandwidth of all the ISLs in a port-channel. This allows the network traffic for a pair of node
ports to flow through all the available ISLs in the port-channel rather than restricting the traffic
to a specific, potentially congested ISL. The number of ISLs in a port channel can be scaled
depending on application’s performance requirement.
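The arithmetic in the example above, and the idea of spreading flows across the member ISLs, can be sketched as follows. The modulo-based member selection is a simplification; real switches use vendor-specific hashing of frame fields:

```python
# Back-of-the-envelope view of a port-channel: aggregate bandwidth is
# the sum of the member ISL bandwidths, and individual flows are spread
# across all members instead of pinning to a single, possibly congested
# ISL. (Member selection here is a simplification.)

def port_channel_bandwidth_gbps(isl_count, isl_gbps):
    # aggregate throughput of the logical ISL
    return isl_count * isl_gbps

def pick_member_isl(flow_id, isl_count):
    # simplistic hash-based member selection for load distribution
    return flow_id % isl_count

print(port_channel_bandwidth_gbps(10, 16))  # 160, matching the example above
```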
3.2.6.2 Zoning
Zoning is an FC switch function that enables node ports within the fabric to be logically
segmented into groups and communicate with each other within the group.

Zoning also provides access control, along with other access control mechanisms, such as LUN
masking. Zoning provides control by allowing only the members in the same zone to establish
communication with each other. Multiple zones can be grouped together to form a zone set and
this zone set is applied to the fabric. Any new zone configured needs to be added to the active
zone set in order to be applied to the fabric.


Zone members, zones, and zone sets form the hierarchy defined in the zoning process. A zone set
is composed of a group of zones that can be activated or deactivated as a single entity in a fabric.
Multiple zone sets may be defined in a fabric, but only one zone set can be active at a time.
Members are the nodes within the FC SAN that can be included in a zone.
FC switch ports, FC HBA ports, and storage system ports can be members of a zone. A port or
node can be a member of multiple zones. Nodes distributed across multiple switches in a switched
fabric may also be grouped into the same zone. Zone sets are also referred to as zone
configurations.
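A minimal sketch of this hierarchy, using hypothetical zone and member names, shows that only one zone set is active at a time and that only members sharing a zone in the active zone set can communicate:

```python
# Minimal model of the zoning hierarchy: members are grouped into
# zones, zones into zone sets, and at most one zone set is active in
# the fabric at a time. Zone and member names are illustrative.

class Fabric:
    def __init__(self):
        self.zone_sets = {}        # name -> {zone name -> set of members}
        self.active_zone_set = None

    def define_zone_set(self, name, zones):
        self.zone_sets[name] = {z: set(m) for z, m in zones.items()}

    def activate(self, name):
        self.active_zone_set = name   # deactivates any previously active set

    def can_communicate(self, a, b):
        if self.active_zone_set is None:
            return False
        zones = self.zone_sets[self.active_zone_set]
        return any(a in members and b in members for members in zones.values())

fabric = Fabric()
fabric.define_zone_set("prod", {"zone1": ["hba1", "array_port1"],
                                "zone2": ["hba2", "array_port1"]})
fabric.activate("prod")
print(fabric.can_communicate("hba1", "array_port1"))  # True
print(fabric.can_communicate("hba1", "hba2"))         # False: no shared zone
```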

3.2.6.3 Best practices for zoning

• Always keep zones small so that troubleshooting is simpler.

• Have only a single initiator in each zone; more than one initiator per zone is not
recommended.

• To make troubleshooting easier, also keep the number of targets in a zone small.


• Give meaningful aliases and names to your zones so that they can be easily identified
during troubleshooting.

• Make zone changes with extreme caution to prevent unwanted access to sensitive
data.

3.2.6.4 Types of Zoning (WWN zoning, Port zoning, Mixed Zoning


WWN zoning: It uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major advantage of WWN
zoning is its flexibility. If an administrator moves a node to another switch port in the fabric, the
node maintains connectivity to its zone partners without having to modify the zone configuration.
This is possible because the WWN is static to the node port. WWN zoning is also sometimes
referred to as soft zoning.

Port zoning: It uses the switch port ID to define zones. In port zoning, access to node is
determined by the physical switch port to which a node is connected. The zone members are the
port identifiers (switch domain ID and port number) to which FC HBA and its targets (storage
systems) are connected. If a node is moved to another switch port in the fabric, port zoning must
be modified to allow the node, in its new port, to participate in its original zone. However, if an
FC HBA or storage system port fails, an administrator just has to replace the failed device without
changing the zoning configuration. Port zoning is also sometimes referred to as hard zoning.
Mixed zoning: It combines the qualities of both WWN zoning and port zoning. Using mixed
zoning enables a specific node port to be tied to the WWN of another node.
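The practical difference between the two schemes can be sketched as follows; the WWN and port identifiers are made up. WWN zone membership follows the device wherever it plugs in, whereas port zone membership follows the switch port regardless of which device is attached:

```python
# Contrast between WWN (soft) and port (hard) zoning: a WWN zone
# checks the device identity, a port zone checks the switch port.
# All identifiers below are illustrative.

def wwn_zone_allows(zone_wwns, device_wwn, switch_port):
    return device_wwn in zone_wwns      # the switch port is irrelevant

def port_zone_allows(zone_ports, device_wwn, switch_port):
    return switch_port in zone_ports    # the device identity is irrelevant

zone_wwns = {"50:01:43:80:05:6c:22:ae"}
zone_ports = {("domain1", 4)}

# Moving the device from port 4 to port 9 on the same switch:
print(wwn_zone_allows(zone_wwns, "50:01:43:80:05:6c:22:ae", ("domain1", 9)))   # True
print(port_zone_allows(zone_ports, "50:01:43:80:05:6c:22:ae", ("domain1", 9))) # False
```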

3.2.7 Virtualization in FC SAN (VSAN) environment


Virtual SAN (also called virtual fabric) is a logical fabric on an FC SAN, which enables
communication among a group of nodes regardless of their physical location in the fabric.


Each SAN can be partitioned into smaller virtual fabrics, generally called VSANs. VSANs are
similar to VLANs in IP networking and allow a physical SAN to be partitioned into multiple
smaller logical SANs/fabrics. It is possible to route traffic between virtual fabrics by using
vendor-specific technologies.
In a VSAN, a group of node ports communicate with each other using a virtual topology defined
on the physical SAN. Multiple VSANs may be created on a single physical SAN. Each VSAN
behaves and is managed as an independent fabric. Each VSAN has its own fabric services,
configuration, and set of FC addresses. Fabric-related configurations in one VSAN do not affect
the traffic in another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes, in either site with a common set of requirements.

VSANs improve SAN security, scalability, availability, and manageability. VSANs provide
enhanced security by isolating the sensitive data in a VSAN and by restricting the access to the
resources located within that VSAN.
For example, a cloud provider typically isolates the storage pools for multiple cloud services by
creating multiple VSANs on an FC SAN. Further, the same FC address can be assigned to nodes
in different VSANs, thus increasing the fabric scalability.
The events causing traffic disruptions in one VSAN are contained within that VSAN and are not
propagated to other VSANs. VSANs facilitate an easy, flexible, and less expensive way to
manage networks. Configuring VSANs is easier and quicker compared to building separate
physical FC SANs for various node groups. To regroup nodes, an administrator simply changes
the VSAN configurations without physically moving nodes or recabling.


3.2.7.1 Configuring VSAN


To configure VSANs on a fabric, an administrator first needs to define VSANs on fabric switches.
Each VSAN is identified with a specific number called VSAN ID. The next step is to assign a
VSAN ID to the F_Ports on the switch. By assigning a VSAN ID to an F_Port, the port is included
in the VSAN. In this manner, multiple F_Ports can be grouped into a VSAN.
For example, an administrator may group switch ports (F_Ports) 1 and 2 into VSAN 10 (ID) and
ports 6 to 12 into VSAN 20 (ID). If an N_Port connects to an F_Port that belongs to a VSAN, it
becomes a member of that VSAN. The switch transfers FC frames between switch ports that
belong to the same VSAN.
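These configuration steps can be sketched as follows. The port numbers and VSAN IDs follow the example above; the forwarding check is a simplification of what the switch actually does:

```python
# Sketch of VSAN configuration: assign VSAN IDs to F_Ports, then the
# switch forwards frames only between ports that belong to the same
# VSAN. (A simplification of real switch behavior.)

port_vsan = {}                 # F_Port number -> VSAN ID

def assign_vsan(ports, vsan_id):
    for p in ports:
        port_vsan[p] = vsan_id

assign_vsan([1, 2], 10)        # VSAN 10: switch ports 1 and 2, as in the text
assign_vsan(range(6, 13), 20)  # VSAN 20: switch ports 6 to 12

def can_forward(src_port, dst_port):
    src, dst = port_vsan.get(src_port), port_vsan.get(dst_port)
    return src is not None and src == dst

print(can_forward(1, 2))   # True: both in VSAN 10
print(can_forward(1, 6))   # False: different VSANs
```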
3.2.7.2 VSAN verses Zone
Both VSANs and zones enable node ports within a fabric to be logically segmented into groups.
But they are not the same, and their purposes are different. There is a hierarchical relationship between
them. An administrator first assigns physical ports to VSANs and then configures independent
zones for each VSAN. A VSAN has its own independent fabric services, but the fabric services
are not available on a per-zone basis.
3.2.7.3 VSAN Trunking
VSAN trunking allows network traffic from multiple VSANs to traverse a single ISL, permitting
traffic from multiple VSANs along the same path. The ISL through which multiple VSAN traffic
travels is called a trunk link.

VSAN trunking enables a single E_Port to be used for sending or receiving traffic from multiple
VSANs over a trunk link. The E_Port capable of transferring multiple VSAN traffic is called a
trunk port. The sending and receiving switches must have at least one trunk E_Port configured
for all of or a subset of the VSANs defined on the switches.
VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It reduces the
number of ISLs when the switches are configured with multiple VSANs. As the number of ISLs
between the switches decreases, the number of E_Ports used for the ISLs also reduces. By
eliminating needless ISLs, the utilization of the remaining ISLs increases. The complexity of
managing the FC SAN is also minimized with a reduced number of ISLs.


3.2.7.4 VSAN Tagging


VSAN tagging is the process of adding or removing a marker or tag to the FC frames that contains
VSAN-specific information. Associated with VSAN trunking, it helps isolate FC frames from
multiple VSANs that travel through and share a trunk link. Whenever an FC frame enters an FC
switch, it is tagged with a VSAN header indicating the VSAN ID of the switch port (F_Port)
before sending the frame down to a trunk link.

The receiving FC switch reads the tag and forwards the frame to the destination port that
corresponds to that VSAN ID. The tag is removed once the frame leaves a trunk link to reach an
N_Port.
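A toy version of the tag-and-untag flow can be sketched as follows; it assumes a made-up two-byte header rather than the real VSAN header format:

```python
# Toy VSAN tagging: the ingress switch prepends a VSAN ID before the
# frame enters the trunk link; the egress switch reads it to pick the
# destination VSAN and strips it before delivery to an N_Port.
# (Two-byte header is an assumption, not the real wire format.)

def tag_frame(frame: bytes, vsan_id: int) -> bytes:
    return vsan_id.to_bytes(2, "big") + frame

def untag_frame(tagged: bytes):
    vsan_id = int.from_bytes(tagged[:2], "big")
    return vsan_id, tagged[2:]

tagged = tag_frame(b"SCSI-DATA", vsan_id=20)
vsan_id, frame = untag_frame(tagged)
print(vsan_id, frame)  # 20 b'SCSI-DATA'
```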
3.2.8 Basic troubleshooting tips for Fiber Channel (FC) SAN issues
There are many areas where errors can be made, and misconfiguration can cause a wide range
of issues. A thorough and deep understanding of the SAN configuration is needed to troubleshoot
storage-related issues; even slight mistakes can lead to major data loss for the organisation.
Before moving on to advanced troubleshooting, follow these tips as a starting point. Other tools
exist for troubleshooting, but these basic first steps can save you time.
1) Always take backup of Switch Configurations
Back up switch configurations at regular intervals, in case you are unable to troubleshoot an
issue and need to revert to a previous configuration. Such backup files tend to be human-readable
flat files that are extremely useful if you need to compare a broken configuration to a previously
known working configuration. Another option is to create a new zone configuration each time
you make a change, and maintain previous versions that can be rolled back to if there are
problems after committing the change.
2) Troubleshooting Connectivity Issues


Many of the day-to-day issues that you see are connectivity issues, such as hosts not being able
to see a new LUN or not being able to see storage or tape devices on the SAN. Connectivity
issues are often due to misconfigured zoning. Each vendor provides different tools to configure
and troubleshoot zoning, but the following common CLI commands can prove very helpful.
fcping
fcping is an FC version of the popular IP ping tool. fcping allows you to test the following:

• Whether a device (N_Port) is alive and responding to FC frames

• End-to-end connectivity between two N_Ports

• Latency

• Zoning between two devices


fcping is available on most switch platforms as well as being a CLI tool for most operating
systems and some HBAs. It works by sending Extended Link Service (ELS) echo request frames
to a destination, and the destination responding with ELS echo response frames. For example

# fcping 50:01:43:80:05:6c:22:ae
fctrace
Another tool that is modeled on a popular IP networking tool is the fctrace tool. This tool traces
a route/path to an N_Port. The following command shows an fctrace command example
# fctrace fcid 0xef0010 vsan 1
3) Things to check while troubleshooting Zoning

• Are your aliases correct?

• If using port zoning, have your switch domain IDs changed?

• If using WWPN zoning, have any of the HBA WWPNs been changed?

• Is your zone in the active zone set?


4) Rescan the SCSI Bus if required
After making zoning changes, LUN masking changes or any other work that changes a
LUN/volume presentation to a host, you may be required to rescan the SCSI bus on that host in
order to detect the new device. The following command shows how to rescan the SCSI bus on a
Windows server using the diskpart tool
DISKPART> list disk
DISKPART> rescan
If you know that your LUN masking and zoning are correct but the server still does not see the
device, it may be necessary to reboot the host.


5) Understanding Switch Configuration Dumps


Each switch vendor also tends to have a built-in command/script that gathers configurations and
logs to be sent to the vendor's tech support group for analysis. The output of these
commands/scripts can also be useful to you as a storage administrator. Each vendor has its own
version of these commands/scripts:
Cisco – show tech-support
Brocade – support-show or support-save
QLogic – create support
6) Use Port Error Counters
Switch-based port error counters are an excellent way to identify physical connectivity issues
such as

• Bad cables (bent, kinked, or otherwise damaged cables)

• Bad connectors (dust on the connectors, loose connectors)


The following example shows the error counters for a physical switch port on a switch:
admin> portshow 4/15
These port counters can sometimes be misleading. It is perfectly normal to see high counts
against some of the values, and it is common to see values increase when a server is rebooted and
when similar changes occur. If you are not sure what to look for, check your switch
documentation, but also compare the counters to some of your known good ports.
If some counters are increasing on a given port that you are concerned with, but they are not
increasing on some known good ports, then you know that you have a problem on that port.
Other commands show similar error counters as well as port throughput. The following
porterrshow command shows some encoding out (enc out) and class 3 discard (disc c3) errors
on port 0. This may indicate a bad cable, a bad port, or another hardware problem:
admin> porterrshow
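The advice above, comparing a suspect port's error counters against known good ports, can be mechanized as a sketch; the counter names and the threshold factor are assumptions, not vendor output formats:

```python
# Sketch of counter comparison: flag counters on a suspect port that
# exceed a known-good baseline by more than `factor` times. Counter
# names and values below are illustrative.

def suspicious_counters(suspect: dict, baseline: dict, factor: int = 10):
    """Return counter names where the suspect port greatly exceeds
    the known-good baseline."""
    return [name for name, value in suspect.items()
            if value > factor * baseline.get(name, 0)]

good_port = {"enc_out": 2, "disc_c3": 0, "crc_err": 1}
bad_port = {"enc_out": 5000, "disc_c3": 120, "crc_err": 1}

print(suspicious_counters(bad_port, good_port))  # ['enc_out', 'disc_c3']
```

This mirrors the manual process: a counter increasing on the suspect port but not on known good ports points at a problem on that port.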
3.3 IP SAN
IP SAN uses Internet Protocol (IP) for the transport of storage traffic instead of Fibre Channel
(FC) cables. It transports block I/O over an IP-based network. Two primary protocols that
leverage IP as the transport mechanism for block-level data transmission are

• Internet SCSI (iSCSI)

• Fibre Channel over IP (FCIP).


iSCSI is a storage networking technology that allows storage resources to be shared over an IP
network, whereas FCIP is an IP-based protocol that enables distributed FC SAN islands to be
interconnected over an existing IP network. In FCIP, FC frames are encapsulated into the IP
payload and transported over an IP network.


IP is a mature technology, and using IP as a storage networking option provides several
advantages.

• Most organizations have an existing IP-based network infrastructure, which could also
be used for storage networking and may be a more economical option than deploying
a new FC SAN infrastructure.

• IP network has no distance limitation, which makes it possible to extend or connect
SANs over long distances. With IP SAN, organizations can extend the geographical
reach of their storage infrastructure and transfer data that are distributed over wide
locations.

• Many long-distance disaster recovery (DR) solutions are already leveraging IP-based
networks. In addition, many robust and mature security options are available for IP
networks.
Typically, a storage system comes with both FC and iSCSI ports. This enables both the native
iSCSI connectivity and the FC connectivity in the same environment.
3.3.1 iSCSI
iSCSI is a storage networking technology that allows storage resources to be shared over an IP
network; most of the storage resources shared on an iSCSI SAN are disk resources.


Just as SCSI messages are mapped onto Fibre Channel in an FC SAN, iSCSI is a mapping of the
SCSI protocol over TCP/IP.

iSCSI is an acronym for Internet Small Computer System Interface. It deals with block storage
and maps SCSI over traditional TCP/IP. The protocol is mostly used for sharing primary storage
such as disk drives, and in some cases it is used in disk backup environments as well.

SCSI commands are encapsulated at each layer of the network stack for eventual transmission
over an IP network. The TCP layer takes care of transmission reliability and in-order delivery
whereas the IP layer provides routing across the network.

In iSCSI SAN, initiators issue read/write data requests to targets over an IP network. Targets
respond to initiators over the same IP network. All iSCSI communications follow this request
response mechanism and all requests and responses are passed over the IP network as iSCSI
Protocol Data Units (PDUs). iSCSI PDU is the fundamental unit of communication in an iSCSI
SAN.
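The encapsulation path described above can be sketched schematically; the headers here are just labels, not real wire formats:

```python
# Schematic of iSCSI encapsulation: a SCSI command is wrapped in an
# iSCSI PDU, which rides in a TCP segment (reliability and in-order
# delivery), which rides in an IP packet (routing). Labels only --
# not the real header formats.

def encapsulate(scsi_command: bytes) -> bytes:
    pdu = b"iSCSI|" + scsi_command   # iSCSI PDU: the unit of communication
    segment = b"TCP|" + pdu          # TCP layer: reliable, in-order delivery
    packet = b"IP|" + segment        # IP layer: routing across the network
    return packet

packet = encapsulate(b"READ(10)")
print(packet)  # b'IP|TCP|iSCSI|READ(10)'
```

The target unwraps the layers in reverse order, which is why all iSCSI requests and responses travel the IP network as PDUs.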


iSCSI performance is influenced by three main components: best initiator performance is
achieved with dedicated iSCSI HBAs, best target performance with purpose-built iSCSI arrays,
and best network performance with dedicated network switches.

Because security is critical in IT infrastructure, multiple layers of security should be
implemented on an iSCSI SAN. These include CHAP for authentication, discovery domains to
restrict device discovery, network isolation, and IPsec for encryption of in-flight data.

3.3.1.1 Components of iSCSI


iSCSI is an IP-based protocol that establishes and manages connections between hosts and storage
systems over IP. iSCSI is an encapsulation of SCSI I/O over IP.
iSCSI encapsulates SCSI commands and data into IP packets and transports them using TCP/IP.
iSCSI is widely adopted for transferring SCSI data over IP between hosts and storage systems
and among storage systems. It is relatively inexpensive and easy to implement, especially in
environments in which an FC SAN does not exist.


Key components for iSCSI communication are

• iSCSI initiators such as an iSCSI HBA

• iSCSI targets such as a storage system with an iSCSI port

• IP-based network such as a Gigabit Ethernet LAN

An iSCSI initiator sends commands and associated data to a target and the target returns data
and responses to the initiator.

3.3.1.2 iSCSI Host Connectivity


iSCSI host connectivity requires a hardware component, such as a NIC with a software
component (iSCSI initiator) or an iSCSI HBA. In order to use the iSCSI protocol, a software
initiator or a translator must be installed to route the SCSI commands to the TCP/IP stack.
A standard NIC, a TCP/IP offload engine (TOE) NIC card, and an iSCSI HBA are the three
physical iSCSI connectivity options.
A standard NIC is the simplest and least expensive connectivity option. It is easy to implement
because most servers come with at least one, and in many cases two, embedded NICs. It requires
only a software initiator for iSCSI functionality. However, the NIC provides no external
processing power, which places additional overhead on the host CPU because it is required to
perform all the TCP/IP and iSCSI processing.


If a standard NIC is used in heavy I/O load situations, the host CPU may become a bottleneck.
A TOE NIC helps alleviate this burden. A TOE NIC offloads the TCP management functions
from the host and leaves only the iSCSI functionality to the host processor. The host passes the
iSCSI information to the TOE card, and the TOE card sends the information to the destination
using TCP/IP. Although this solution improves performance, the iSCSI functionality is still
handled by a software initiator, requiring host CPU cycles.
An iSCSI HBA is capable of providing performance benefits, as it offloads the entire iSCSI and
TCP/IP protocol stack from the host processor. Use of an iSCSI HBA is also the simplest way for
implementing a boot from SAN environment via iSCSI. If there is no iSCSI HBA, modifications
have to be made to the basic operating system to boot a host from the storage devices because the
NIC needs to obtain an IP address before the operating system loads. The functionality of an
iSCSI HBA is very similar to the functionality of an FC HBA, but it is the most expensive option.
A fault-tolerant host connectivity solution can be implemented using host based multipathing
software (e.g., EMC PowerPath) regardless of the type of physical connectivity. Multiple NICs
can also be combined via link aggregation technologies to provide failover or load balancing.
Complex solutions may also include the use of vendor-specific storage-array software that
enables the iSCSI host to connect to multiple ports on the array with multiple NICs or HBAs.
3.3.1.3 Topologies for iSCSI Connectivity

Native iSCSI: Native topologies do not have any FC components; they perform all
communication over IP. The initiators may be either directly attached to targets or connected
using standard IP routers and switches.

In this type of connectivity, the host with iSCSI initiators may be either directly attached to the
iSCSI targets (iSCSI-capable storage systems) or connected through an IP-based network. FC
components are not required for native iSCSI connectivity. The figure below shows a native iSCSI
implementation that includes a storage system with an iSCSI port. The storage system is
connected to an IP network. After an iSCSI initiator is logged on to the network, it can access the
available LUNs on the storage system.

Bridged iSCSI: Bridged iSCSI Connectivity - Bridged topologies enable the co-existence of FC
with IP by providing iSCSI-to-FC bridging functionality. For example, the initiators can exist in
an IP environment while the storage remains in an FC SAN.
This type of connectivity allows the initiators to exist in an IP environment while the storage
systems remain in an FC SAN environment. It enables the coexistence of FC with IP by providing
iSCSI-to-FC bridging functionality. The above figure illustrates a bridged iSCSI implementation.
It shows connectivity between a compute system with an iSCSI initiator and a storage system
with an FC port.
As the storage system does not have any iSCSI port, a gateway or a multi-protocol router is used.
The gateway facilitates the communication between the compute system with iSCSI ports and
the storage system with only FC ports. The gateway converts IP packets to FC frames and vice
versa, thereby bridging the connectivity between the IP and FC environments. The gateway
contains both FC and Ethernet ports to facilitate the communication between the FC and the IP
environments. The iSCSI initiator is configured with the gateway’s IP address as its target
destination. On the other side, the gateway is configured as an FC initiator to the storage system.

3.3.1.4 iSCSI Protocol Stack


SCSI is the command protocol that works at the application layer of the Open System
Interconnection (OSI) model. The initiators and the targets use SCSI commands and responses to
talk to each other. The SCSI commands, data, and status messages are encapsulated into TCP/IP
and transmitted across the network between the initiators and the targets.
The below figure displays a model of iSCSI protocol layers and depicts the encapsulation order
of the SCSI commands for their delivery through a physical carrier.


iSCSI is the session-layer protocol that initiates a reliable session between devices that recognize
SCSI commands and TCP/IP. The iSCSI session-layer interface is responsible for handling login,
authentication, target discovery, and session management.
TCP is used with iSCSI at the transport layer to provide reliable transmission. TCP controls
message flow, windowing, error recovery, and retransmission. It relies upon the network layer of
the OSI model to provide global addressing and connectivity. The OSI Layer 2 protocols at the
data link layer of this model enable node-to-node communication through a physical network.
3.3.1.5 iSCSI Discovery
An iSCSI initiator must discover the location of its targets on the network and the names of the
targets available to it before it can establish a session. This discovery commonly takes place in
two ways: SendTargets discovery or internet Storage Name Service (iSNS).
SendTargets discovery: In SendTargets discovery, the initiator is manually configured with the
target’s network portal (IP address and TCP port number) to establish a discovery session. The
initiator issues the SendTargets command, and thereby the target network portal responds to the
initiator with the location and name of the target.
iSNS: iSNS in the iSCSI SAN is equivalent in function to the Name Server in an FC SAN. It
enables automatic discovery of iSCSI devices on an IP-based network. The initiators and targets
can be configured to automatically register themselves with the iSNS server. Whenever an
initiator wants to know the targets that it can access, it can query the iSNS server for a list of
available targets.
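As a rough illustration of the iSNS workflow described above, the following Python sketch models a name server with which targets register and which initiators query. The class and method names are illustrative, not the iSNS wire protocol; 3260 is the IANA-registered default iSCSI TCP port.

```python
# Simplified sketch of iSNS-style discovery: targets register themselves
# with a name server; an initiator then queries for available targets.
# Illustration of the workflow only, not the actual iSNS protocol.

class SimpleiSNSServer:
    def __init__(self):
        self.targets = {}   # iSCSI target name -> network portal (ip, tcp_port)

    def register_target(self, iscsi_name, ip, port=3260):
        # 3260 is the IANA-registered default iSCSI port
        self.targets[iscsi_name] = (ip, port)

    def query_targets(self):
        # An initiator asks for the list of registered targets
        return dict(self.targets)

isns = SimpleiSNSServer()
isns.register_target("iqn.2015-04.com.example:storage.array1", "10.0.0.5")
available = isns.query_targets()
```

In SendTargets discovery, by contrast, the initiator would be manually configured with the portal and would issue the SendTargets command to it directly.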


3.3.1.6 iSCSI Names


Both the initiators and the targets in an iSCSI environment have iSCSI addresses that facilitate
communication between them. An iSCSI address is comprised of the location of an iSCSI initiator
or target on the network and the iSCSI name. The location is a combination of the host name or
IP address and the TCP port number.
For iSCSI initiators, the TCP port number is omitted from the address. iSCSI name is a unique
worldwide iSCSI identifier that is used to identify the initiators and targets within an iSCSI
network to facilitate communication.
The unique identifier can be a combination of the names of the department, application,
manufacturer, serial number, asset number, or any tag that can be used to recognize and manage
the iSCSI nodes. The following are three types of iSCSI names commonly used.


iSCSI Qualified Name (IQN): An organization must own a registered domain name to generate
iSCSI Qualified Names. This domain name does not need to be active or resolve to an address. It
just needs to be reserved to prevent other organizations from using the same domain name to
generate iSCSI names. A date is included in the name to avoid potential conflicts caused by the
transfer of domain names. An example of an IQN is iqn.2015-04.com.example:optional_string.
The optional_string provides a serial number, an asset number, or any other device identifiers.
IQN enables storage administrators to assign meaningful names to the iSCSI initiators and
targets, making those devices easier to manage.
Extended Unique Identifier (EUI): An EUI is a globally unique identifier based on the IEEE
EUI-64 naming standard. An EUI is composed of the eui prefix followed by a 16-character
hexadecimal name, such as eui.0300732A32598D26.
Network Address Authority (NAA): NAA is another worldwide unique naming format as
defined by the Inter-National Committee for Information Technology Standards (INCITS) T11 –
Fibre Channel (FC) protocols and is used by Serial Attached SCSI (SAS). This format enables
the SCSI storage devices that contain both iSCSI ports and SAS ports to use the same NAA-based
SCSI device name. An NAA is composed of the naa prefix followed by a hexadecimal name,
such as naa.52004567BA64678D. The hexadecimal representation has a maximum size of 32
characters (128-bit identifier).
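The three name formats above can be distinguished mechanically. The following minimal Python sketch applies simple format checks for IQN, EUI, and NAA names; real initiators and targets follow the full rules in the iSCSI standard, so this is a hedged illustration only.

```python
import re

# Minimal format checks for the three iSCSI name types described above.
# Illustrative only; full naming rules are defined by the iSCSI standard.

def iscsi_name_type(name):
    # IQN: iqn.<yyyy-mm>.<reversed domain>[:optional_string]
    if re.fullmatch(r"iqn\.\d{4}-\d{2}\.[^:]+(:.*)?", name):
        return "iqn"
    # EUI: eui prefix + 16 hex characters (64-bit identifier)
    if re.fullmatch(r"eui\.[0-9A-Fa-f]{16}", name):
        return "eui"
    # NAA: naa prefix + 16 or 32 hex characters (64- or 128-bit identifier)
    if re.fullmatch(r"naa\.[0-9A-Fa-f]{16}([0-9A-Fa-f]{16})?", name):
        return "naa"
    return "unknown"
```

For example, the names used earlier in this section classify as expected: `iqn.2015-04.com.example:optional_string` is an IQN, `eui.0300732A32598D26` is an EUI, and `naa.52004567BA64678D` is an NAA name.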
3.3.1.7 iSCSI Session
An iSCSI session is established between an initiator and a target. A session ID (SSID), which
includes an initiator ID (ISID) and a target ID (TSID), identifies a session.
The session can be intended for one of the following:
■ Discovery of available targets to the initiator and the location of a specific target on a network
■ Normal operation of iSCSI (transferring data between initiators and targets)
TCP connections may be added and removed within a session. Each iSCSI connection within the
session has a unique connection ID (CID).
3.3.1.8 iSCSI PDU
iSCSI initiators and targets communicate using iSCSI Protocol Data Units (PDUs). All iSCSI
PDUs contain one or more header segments followed by zero or more data segments. The PDU
is then encapsulated into an IP packet to facilitate the transport.
A PDU includes the components shown in Figure 8-6. The IP header provides packet-routing
information that is used to move the packet across a network. The TCP header contains the
information needed to guarantee the packet’s delivery to the target. The iSCSI header describes
how to extract SCSI commands and data for the target. iSCSI adds an optional CRC, known as
the digest, beyond the TCP checksum and Ethernet CRC to ensure datagram integrity. The header
and the data digests are optionally used in the PDU to validate integrity, data placement, and
correct operation.
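The layering just described (header segment, optional header digest, data segment, optional data digest) can be sketched as follows. The field layout is simplified for illustration, and `zlib.crc32` stands in for the CRC32C digest that iSCSI actually specifies.

```python
import struct
import zlib

# Illustrative sketch of iSCSI PDU layering: a 48-byte basic header
# segment, an optional header digest, the data segment, and an optional
# data digest. Real iSCSI uses CRC32C; zlib.crc32 is a stand-in here.

def build_pdu(opcode, data, use_digests=True):
    # 48-byte Basic Header Segment (BHS); only the opcode and data length
    # are filled in, the rest is zero padding for this sketch.
    bhs = struct.pack("!B3xI40x", opcode, len(data))
    pdu = bhs
    if use_digests:
        pdu += struct.pack("!I", zlib.crc32(bhs))    # header digest
    pdu += data
    if use_digests:
        pdu += struct.pack("!I", zlib.crc32(data))   # data digest
    return pdu

pdu = build_pdu(0x01, b"SCSI payload")
```

The resulting PDU would then be handed to TCP, which prepends its own header, and finally to IP and Ethernet, matching the encapsulation order shown in the protocol stack.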


As shown in Figure 8-7, each iSCSI PDU does not correspond in a 1:1 relationship with an IP
packet. Depending on its size, an iSCSI PDU can span an IP packet or even coexist with another
PDU in the same packet. Therefore, each IP packet and Ethernet frame can be used more
efficiently because fewer packets and frames are required to transmit the SCSI information.

3.3.1.9 Link aggregation, Switch aggregation


Like an FC environment, the link aggregation in an Ethernet network also combines two or more parallel
network links into a single logical link (port-channel).
Link aggregation enables higher throughput than a single link could provide. It also distributes
network traffic across the links, ensuring even link utilization. If a link in the aggregation
is lost, all network traffic on that link is redistributed across the remaining links. Link aggregation can
be performed for links between two switches and between a switch and a node. The below figure shows
an example of link aggregation between two Ethernet switches. In this example, four links between the
switches are aggregated into a single port-channel.
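The flow distribution and failover behavior described above can be sketched in Python. The hash inputs and link names are illustrative choices, not any specific vendor's algorithm: frames of one flow always hash to the same active link, and when a link fails its traffic is redistributed across the remaining links.

```python
# Sketch of hash-based traffic distribution across a port-channel, with
# redistribution on link failure. Hash inputs and names are illustrative.

def pick_link(src_mac, dst_mac, links):
    active = [l for l in links if l["up"]]
    if not active:
        raise RuntimeError("port-channel down: no active member links")
    idx = hash((src_mac, dst_mac)) % len(active)   # same flow -> same link
    return active[idx]["name"]

links = [{"name": "eth1", "up": True}, {"name": "eth2", "up": True},
         {"name": "eth3", "up": True}, {"name": "eth4", "up": True}]

chosen = pick_link("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", links)

# If the chosen link fails, the flow is redistributed over the remaining links:
for l in links:
    if l["name"] == chosen:
        l["up"] = False
rechosen = pick_link("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", links)
```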


Switch aggregation combines two physical switches to make them appear as a single logical switch. All
network links from these physical switches appear as a single logical link. This enables nodes to use a
port-channel across two switches. The network traffic is also distributed across all the links in the port-
channel. Switch aggregation allows ports in both the switches to be active and to forward network traffic
simultaneously. Therefore, it provides more active paths and throughput than a single switch or multiple
non-aggregated switches under normal conditions, resulting in improved node performance. With switch
aggregation, if one switch in the aggregation fails, network traffic will continue to flow through another
switch. In the figure, four physical links to the aggregated switches appear as a single logical link to the
third switch.

3.3.1.10 VLAN

Virtual LANs (VLANs) are logical networks created on a LAN. A VLAN enables communication
between a group of nodes with a common set of functional requirements independent of their physical
location in the network. VLANs are particularly well-suited for iSCSI deployments as they enable
isolating the iSCSI traffic from other network traffic (for example, compute-to-compute traffic) when a
physical Ethernet network is used to transfer different types of network traffic.

A VLAN conceptually functions in the same way as a VSAN. Each VLAN behaves and is managed as
an independent LAN. Two nodes connected to a VLAN can communicate between themselves without
routing of frames even if they are in different physical locations. VLAN traffic must be forwarded via a
router or OSI Layer-3 switching device when two nodes in different VLANs are communicating even if
they are connected to the same physical LAN. Network broadcasts within a VLAN generally do not
propagate to nodes that belong to a different VLAN, unless configured to cross a VLAN boundary.


To configure VLANs, an administrator first defines the VLANs on the switches. Each VLAN is identified
by a unique 12-bit VLAN ID (as per IEEE 802.1Q standard). The next step is to configure the VLAN
membership based on an appropriate technique supported by the switches, such as port-based, MAC-
based, protocol-based, IP subnet address-based, and application-based.

• In the port-based technique, membership in a VLAN is defined by assigning a VLAN ID to a


switch port. When a node connects to a switch port that belongs to a VLAN, the node becomes
a member of that VLAN.

• In the MAC-based technique, the membership in a VLAN is defined on the basis of the MAC
address of the node.

• In the protocol-based technique, different VLANs are assigned to different protocols based on
the protocol type field found in the OSI Layer 2 header.

• In the IP subnet address-based technique, the VLAN membership is based on the IP subnet
address. All the nodes in an IP subnet are members of the same VLAN.

• In the application-based technique, a specific application, for example, a file transfer
protocol (FTP) application, can be configured to execute on one VLAN.
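The port-based technique from the list above can be sketched as follows in Python. The switch, port names, and VLAN IDs are illustrative; the 12-bit VLAN ID range reflects the IEEE 802.1Q standard, in which IDs 0 and 4095 are reserved.

```python
# Minimal sketch of port-based VLAN membership: each switch port is
# assigned a 12-bit VLAN ID, and a node inherits the VLAN of the port
# it connects to. Port names and IDs are illustrative.

class Switch:
    def __init__(self):
        self.port_vlan = {}

    def assign_vlan(self, port, vlan_id):
        if not 1 <= vlan_id <= 4094:     # 12-bit VLAN ID; 0 and 4095 reserved
            raise ValueError("invalid VLAN ID")
        self.port_vlan[port] = vlan_id

    def same_vlan(self, port_a, port_b):
        # Frames are switched directly only within one VLAN; traffic between
        # different VLANs must go through an OSI Layer-3 device.
        return self.port_vlan.get(port_a) == self.port_vlan.get(port_b)

sw = Switch()
sw.assign_vlan("port1", 10)   # iSCSI traffic VLAN
sw.assign_vlan("port2", 10)
sw.assign_vlan("port3", 20)   # general LAN traffic VLAN
```

Here the nodes on port1 and port2 can exchange frames directly, while traffic to the node on port3 would need to be routed.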

VLAN Trunking & Tagging

Similar to VSAN trunking, network traffic from multiple VLANs may traverse a trunk link. A single
network port, called trunk port, is used for sending or receiving traffic from multiple VLANs over a trunk
link. Both the sending and the receiving network components must have at least one trunk port configured
for all or a subset of the VLANs defined on the network component.
As with VSAN tagging, VLAN has its own tagging mechanism. The tagging is performed by inserting a
4-byte tag field containing 12-bit VLAN ID into the Ethernet frame (as per IEEE 802.1Q standard) before
it is transmitted through a trunk link. The receiving network component reads the tag and forwards the
frame to the destination port(s) that corresponds to that VLAN ID. The tag is removed once the frame
leaves a trunk link to reach a node port.
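The tag insertion and removal described above can be sketched with a few lines of Python. The frame contents are illustrative; the tag layout follows IEEE 802.1Q: a 2-byte TPID of 0x8100 followed by a 2-byte TCI whose low 12 bits carry the VLAN ID, inserted after the destination and source MAC addresses.

```python
import struct

# Sketch of 802.1Q VLAN tagging: a 4-byte tag (TPID 0x8100 + 16-bit TCI
# containing the 12-bit VLAN ID) is inserted after the source MAC address
# before the frame crosses a trunk link.

def tag_frame(frame, vlan_id, priority=0):
    tci = (priority << 13) | (vlan_id & 0x0FFF)   # 3-bit PCP, 12-bit VID
    tag = struct.pack("!HH", 0x8100, tci)
    return frame[:12] + tag + frame[12:]          # after dst + src MACs

def untag_frame(tagged):
    # The receiving trunk port strips the tag before delivery to a node port
    vid = struct.unpack("!H", tagged[14:16])[0] & 0x0FFF
    return vid, tagged[:12] + tagged[16:]

frame = b"\xaa" * 6 + b"\xbb" * 6 + b"\x08\x00" + b"payload"
tagged = tag_frame(frame, vlan_id=100)
vid, restored = untag_frame(tagged)
```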

A stretched VLAN is a VLAN that spans across multiple sites over a WAN connection. In a typical
multi-site environment, network traffic between sites is routed through an OSI Layer 3 WAN connection.
Because of the routing, it is not possible to transmit OSI Layer 2 traffic between the nodes in two sites.
A stretched VLAN extends a VLAN across the sites and enables nodes in two different sites to
communicate over a WAN as if they are connected to the same network.
Stretched VLANs also allow the movement of virtual machines (VMs) between sites without the need
to change their network configurations. This simplifies the creation of high-availability clusters, VM
migration, and application and workload mobility across sites.


3.3.1.11 Ordering and Numbering


iSCSI communication between initiators and targets is based on the request response command
sequences. A command sequence may generate multiple PDUs. A command sequence number (CmdSN)
within an iSCSI session is used to number all initiator-to-target command PDUs belonging to the session.
This number is used to ensure that every command is delivered in the same order in which it is
transmitted, regardless of the TCP connection that carries the command in the session.

Command sequencing begins with the first login command and the CmdSN is incremented by one for
each subsequent command. The iSCSI target layer is responsible for delivering the commands to the
SCSI layer in the order of their CmdSN. This ensures the correct order of data and commands at a target
even when there are multiple TCP connections between an initiator and the target using portal groups.

Similar to command numbering, a status sequence number (StatSN) is used to sequentially number status
responses, as shown in Figure 8-8. These unique numbers are established at the level of the TCP
connection.

A target sends the request-to-transfer (R2T) PDUs to the initiator when it is ready to accept data. Data
sequence number (DataSN) is used to ensure in-order delivery of data within the same command. The
DataSN and R2T sequence numbers are used to sequence data PDUs and R2Ts, respectively. Each of
these sequence numbers is stored locally as an unsigned 32-bit integer counter defined by iSCSI. These
numbers are communicated between the initiator and target in the appropriate iSCSI PDU fields during
command, status, and data exchanges.
In the case of read operations, the DataSN begins at zero and is incremented by one for each subsequent
data PDU in that command sequence. In the case of a write operation, the first unsolicited data PDU or
the first data PDU in response to an R2T begins with a DataSN of zero and increments by one for each
subsequent data PDU. R2TSN is set to zero at the initiation of the command and incremented
by one for each subsequent R2T sent by the target for that command.
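The counter behavior described above (an unsigned 32-bit value that starts at zero and increments by one per PDU) can be modeled with modular arithmetic. The class below is an illustrative sketch, not an implementation of the iSCSI state machine.

```python
# Sketch of iSCSI sequence numbering. Each counter (CmdSN, StatSN, DataSN,
# R2TSN) is an unsigned 32-bit integer that wraps around, modeled here
# with modular arithmetic.

MOD = 2 ** 32

class SequenceCounter:
    def __init__(self, start=0):
        self.value = start % MOD

    def next(self):
        current = self.value
        self.value = (self.value + 1) % MOD   # increment by one per PDU
        return current

# DataSN for a read: starts at zero, +1 for each data PDU in the sequence
datasn = SequenceCounter()
first, second = datasn.next(), datasn.next()

# Wraparound behavior of the unsigned 32-bit counter
cmdsn = SequenceCounter(start=2 ** 32 - 1)
last, wrapped = cmdsn.next(), cmdsn.next()
```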

3.3.1.12 iSCSI Error Handling & Security

The iSCSI protocol addresses errors in IP data delivery. Command sequencing is used for flow control;
missing commands, responses, and data blocks are detected using sequence numbers. Use of the
optional digests improves communication integrity in addition to the TCP checksum and Ethernet CRC.


The error detection and recovery in iSCSI can be classified into three levels:

Level 0 = Session Recovery, Level 1 = Digest Failure Recovery and Level 2 = Connection Recovery.
The error-recovery level is negotiated during login.
■ Level 0: If an iSCSI session is damaged, all TCP connections need to be closed and all tasks and
unfulfilled SCSI commands should be completed. Then, the session should be restarted via the repeated
login.
■ Level 1: Each node should be able to selectively recover a lost or damaged
PDU within a session for recovery of data transfer. At this level, identification of an error and data
recovery at the SCSI task level is performed, and an attempt to repeat the transfer of a lost or damaged
PDU is made.
■ Level 2: New TCP connections are opened to replace a failed connection. The new connection picks
up where the old one failed.
iSCSI may be exposed to the security vulnerabilities of an unprotected IP
network. Some of the security methods that can be used are IPSec and authentication solutions such as
Kerberos and CHAP (Challenge-Handshake Authentication Protocol).
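As an illustration of one of these options, CHAP (RFC 1994) lets an initiator prove knowledge of a shared secret without sending it: the target issues a random challenge, and the initiator returns MD5(id || secret || challenge). The sketch below shows that computation; the secret and identifiers are of course placeholders.

```python
import hashlib
import os

# Sketch of CHAP one-way authentication (RFC 1994): the target sends a
# random challenge, and the initiator proves knowledge of the shared
# secret by returning MD5(id || secret || challenge).

def chap_response(chap_id, secret, challenge):
    return hashlib.md5(bytes([chap_id]) + secret + challenge).digest()

secret = b"shared-secret"        # provisioned on both initiator and target
chap_id = 1
challenge = os.urandom(16)       # random challenge from the target

response = chap_response(chap_id, secret, challenge)    # initiator side
expected = chap_response(chap_id, secret, challenge)    # target verifies
authenticated = response == expected
```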

3.3.2 FCIP
FC SAN provides a high-performance infrastructure for localized data movement. Organizations, however,
are now looking for ways to transport data over long distances between their disparate FC SANs at
multiple geographic locations. One of the best ways to achieve this goal is to interconnect geographically
dispersed FC SANs through reliable, high-speed links. This approach involves transporting the FC block
data over the IP infrastructure.

FCIP is an IP-based protocol that enables distributed FC SAN islands to be interconnected over an
existing IP network. In FCIP, FC frames are encapsulated onto the IP payload and transported over an IP
network. The FC frames are not altered while transferring over the IP network. In this manner, FCIP
creates virtual FC links over IP network to transfer FC data between FC SANs. FCIP is a tunnelling
protocol in which FCIP entity such as an FCIP gateway is used to tunnel FC fabrics through an IP
network.

The FCIP standard has rapidly gained acceptance as a manageable, cost-effective way to blend
the best of the two technologies which are FC SAN and the proven, widely deployed IP
infrastructure. As a result, organizations now have a better way to store, protect, and move their
data by leveraging investments in their existing IP infrastructure. FCIP is extensively used in
disaster recovery implementations in which data is replicated to the storage located at a remote
site. It also facilitates data sharing and data collaboration over distance, which is a key
requirement for next generation applications.

3.3.2.1 FCIP Protocol


The FCIP protocol stack is layered as follows. Applications generate SCSI commands and data,
which are processed by various layers of the protocol stack. The upper layer protocol SCSI
includes the SCSI driver program that executes the read-and-write commands. Below the SCSI
layer is the FC protocol (FCP) layer, which is simply an FC frame whose payload is SCSI.

The FC frames can be encapsulated into the IP packet and sent to a remote FC SAN over the IP.
The FCIP layer encapsulates the FC frames onto the IP payload and passes them to the TCP layer.
TCP and IP are used for transporting the encapsulated information across Ethernet, wireless, or
other media that support the TCP/IP traffic.

Encapsulation of FC frame on to IP packet could cause the IP packet to be fragmented when the
data link cannot support the maximum transmission unit (MTU) size of an IP packet. When an IP
packet is fragmented, the required parts of the header must be copied by all fragments. When a
TCP packet is segmented, normal TCP operations are responsible for receiving and re-sequencing
the data prior to passing it on to the FC processing portion of the device.
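The core idea above, that the FC frame is carried unaltered inside the TCP payload, can be sketched in a few lines. The header layout below is deliberately simplified and is not the exact FCIP (RFC 3821) frame format.

```python
import struct

# Illustrative sketch of FCIP encapsulation: an unaltered FC frame is
# wrapped in a small header and carried as a TCP payload. The header
# layout here is simplified, not the exact FCIP (RFC 3821) format.

FCIP_PROTOCOL = 1

def encapsulate(fc_frame):
    # simplified header: protocol number + payload length (5 bytes)
    header = struct.pack("!BI", FCIP_PROTOCOL, len(fc_frame))
    return header + fc_frame          # this becomes the TCP payload

def decapsulate(tcp_payload):
    proto, length = struct.unpack("!BI", tcp_payload[:5])
    return tcp_payload[5:5 + length]  # the FC frame emerges unaltered

fc_frame = b"\x02" + b"FC frame contents"
roundtrip = decapsulate(encapsulate(fc_frame))
```

The round trip leaves the FC frame byte-for-byte identical, which is what lets the fabrics at both ends treat the FCIP gateways as ordinary fabric switches.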
3.3.2.2 FCIP Connectivity and Topologies

In an FCIP environment, FCIP entity such as an FCIP gateway is connected to each fabric via a
standard FC connection. The FCIP gateway at one end of the IP network encapsulates the FC
frames into IP packets. The gateway at the other end removes the IP wrapper and sends the FC
data to the adjoined fabric. The fabric treats these gateways as fabric switches. An IP address is
assigned to the port on the gateway, which is connected to an IP network. After the IP connectivity
is established, the nodes in the two independent fabrics can communicate with each other.


An FCIP environment functions as if it is a single cohesive SAN environment. Before
geographically dispersed SANs are merged, a fully functional layer 2 network exists on the SANs.
This layer 2 network is a standard SAN fabric. These physically independent fabrics are merged
into a single fabric with an IP link between them.

An FCIP gateway router is connected to each fabric via a standard FC connection (see Figure 8-
10). The fabric treats these routers like layer 2 fabric switches. The other port on the router is
connected to an IP network and an IP address is assigned to that port. This is similar to the method
of assigning an IP address to an iSCSI port on a gateway. Once IP connectivity is established, the
two independent fabrics are merged into a single fabric. When merging the two fabrics, all the
switches and routers must have unique domain IDs, and the fabrics must contain unique zone set
names. Failure to ensure these requirements will result in a segmented fabric. The FC addresses
on each side of the link are exposed to the other side, and zoning or masking can be done to any
entity in the new environment.

3.3.2.3 FCIP Configuration


An FCIP tunnel may be configured to merge interconnected fabrics into a single large fabric. In
the merged fabric, FCIP transports existing fabric services across the IP network.
An FCIP tunnel consists of one or more independent connections between two FCIP ports on
gateways (tunnel endpoints). Each tunnel transports encapsulated FC frames over a TCP/IP
network. The nodes in either fabric are unaware of the existence of the tunnel. Multiple tunnels
may be configured between the fabrics based on connectivity requirements. Some implementations
allow aggregating FCIP links (tunnels) to increase throughput and to provide link redundancy and
load balancing.

Frequently, only a small subset of nodes in either fabric requires connectivity across an FCIP
tunnel. Thus, an FCIP tunnel may also use vendor-specific features to route network traffic
between specific nodes without merging the fabrics.

A VSAN, similar to a stretched VLAN, may be extended across sites. The FCIP tunnel may use
vendor-specific features to transfer multiple VSAN traffic through it. The FCIP tunnel functions
as a trunk link and carries tagged FC frames. This allows extending separate VSANs each with
their own fabric services, configuration, and set of FC addresses across multiple sites.
3.3.2.4 FCIP Performance and Security
Performance, reliability, and security should always be taken into consideration when
implementing storage solutions. The implementation of FCIP is also subject to the same
consideration.
From the perspective of performance, multiple paths to multiple FCIP gateways from different
switches in the layer 2 fabric eliminates single points of failure and provides increased bandwidth.
In a scenario of extended distance, the IP network may be a bottleneck if sufficient bandwidth is
not available. In addition, because FCIP creates a unified fabric, disruption in the underlying IP
network can cause instabilities in the SAN environment. These include a segmented fabric,
excessive RSCNs, and host timeouts.
The vendors of FC switches have recognized some of the drawbacks related to FCIP and have
implemented features to provide additional stability, such as the capability to segregate FCIP
traffic into a separate virtual fabric. Security is also a consideration in an FCIP solution because
the data is transmitted over public IP channels. Various security options are available to protect
the data based on the router’s support. IPSec is one such security measure that can be
implemented in the FCIP environment.
3.4 Fiber Channel over Ethernet Storage Area Network (FCoE SAN)
FCoE SAN is a Converged Enhanced Ethernet (CEE) network that is capable of transporting FC
data along with regular Ethernet traffic over high speed (such as 10 Gbps or higher) Ethernet
links.


It uses the FCoE protocol, which encapsulates FC frames into Ethernet frames. The FCoE protocol is defined
by the T11 standards committee. FCoE is based on an enhanced Ethernet standard that supports
Data Center Bridging (DCB) functionalities (also called CEE functionalities). DCB ensures
lossless transmission of FC traffic over Ethernet.

FCoE SAN provides the flexibility to deploy the same network components for transferring both
server-to-server traffic and FC storage traffic. This helps to mitigate the complexity of managing
multiple discrete network infrastructures. FCoE SAN uses multi-functional network adapters and
switches. Therefore, FCoE reduces the number of network adapters, cables, and switches, along
with power and space consumption required in a data center.
3.4.1 Components of FCoE SAN
The key FCoE SAN components are:

• Network adapters such as Converged Network Adapter (CNA) and software FCoE
adapter

• Cables such as copper cables and fiber optical cables

• FCoE switch
Converged Network Adapter (CNA)
The CNA is a physical adapter that provides the functionality of both a standard NIC and an FC
HBA in a single device. It consolidates both FC traffic and regular Ethernet traffic on a common
Ethernet infrastructure. CNAs connect hosts to the FCoE switches. They are responsible for
encapsulating FC traffic onto Ethernet frames and forwarding them to FCoE switches over CEE
links.
They eliminate the need to deploy separate adapters and cables for FC and Ethernet
communications, thereby reducing the required number of network adapters and switch ports.
A CNA offloads the FCoE protocol processing task from the compute system, thereby freeing the
CPU resources of the compute system for application processing. It contains separate modules
for 10 Gigabit Ethernet (GE), FC, and FCoE Application Specific Integrated Circuits (ASICs).
Software FCoE Adapter
Instead of a CNA, a software FCoE adapter may also be used. A software FCoE adapter is OS or
hypervisor kernel-resident software that performs FCoE processing. The FCoE processing
consumes host CPU cycles.

With software FCoE adapters, the OS or hypervisor implements FC protocol in software that
handles SCSI to FC processing. The software FCoE adapter performs FC to Ethernet
encapsulation. Both FCoE traffic (Ethernet traffic that carries FC data) and regular Ethernet
traffic are transferred through supported NICs on the hosts.

FCoE Switch
An FCoE switch has both Ethernet switch and FC switch functionalities. It has a Fibre Channel
Forwarder (FCF), an Ethernet Bridge, and a set of ports that can be used for FC and Ethernet
connectivity. FCF handles FCoE login requests, applies zoning, and provides the fabric services
typically associated with an FC switch. It also encapsulates the FC frames received from the FC
port into the Ethernet frames and decapsulates the Ethernet frames received from the Ethernet
Bridge to the FC frames.
Upon receiving the incoming Ethernet traffic, the FCoE switch inspects the Ethertype of the
incoming frames and uses that to determine their destination. If the Ethertype of the frame is
FCoE, the switch recognizes that the frame contains an FC payload and then forwards it to the
FCF. From there, the FC frame is extracted from the Ethernet frame and transmitted to the FC
SAN over the FC ports. If the Ethertype is not FCoE, the switch handles the frame as regular
Ethernet traffic and forwards it over the Ethernet ports.
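The forwarding decision just described can be sketched as follows. The frame bytes are illustrative; 0x8906 is the registered Ethertype for FCoE, and the Ethertype field sits immediately after the 6-byte destination and 6-byte source MAC addresses.

```python
import struct

# Sketch of the FCoE switch forwarding decision: the switch inspects the
# Ethertype of each incoming frame; 0x8906 (FCoE) frames go to the Fibre
# Channel Forwarder (FCF), everything else is regular Ethernet traffic.

ETHERTYPE_FCOE = 0x8906

def classify_frame(frame):
    # Ethertype follows the 6-byte destination and 6-byte source MACs
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_FCOE:
        return "forward to FCF"    # FC payload extracted, sent to FC ports
    return "forward as Ethernet"   # handled as regular Ethernet traffic

fcoe_frame = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_FCOE) + b"FC payload"
ip_frame = b"\x00" * 12 + struct.pack("!H", 0x0800) + b"IP payload"
```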
3.4.2 FCoE SAN connectivity
The most common FCoE connectivity uses FCoE switches to interconnect a CEE network
containing hosts with an FC SAN containing storage systems. The hosts have FCoE ports that
provide connectivity to the FCoE switches. The FCoE switches enable the consolidation of FC
traffic and Ethernet traffic onto CEE links.


This type of FCoE connectivity is suitable when an organization has an existing FC SAN
environment. Connecting FCoE hosts to the FC storage systems through FCoE switches do not
require any change in the FC environment. And the other type of FCoE connectivity model is the
end-to-end FCoE model. Some vendors offer FCoE ports in their storage systems. These storage
systems connect directly to the FCoE switches.

The FCoE switches form FCoE fabrics between hosts and storage systems and provide end-to-end
FCoE support. The end-to-end FCoE connectivity is suitable for a new FCoE deployment.
3.4.3 Converged Enhanced Ethernet
Traditional Ethernet networks are accessed via a network adapter called a network interface
card (NIC). Each host that wants to connect to the Ethernet network needs at least one.
Similarly, traditional Fibre Channel networks are accessed via a network adapter called a host
bus adapter (HBA) in each host.

However, to create a single data center network capable of transporting both IP and FC storage traffic,
Ethernet adapters had to be significantly enhanced. Accessing an FCoE network requires a new type of
advanced network adapter known as a converged network adapter (CNA). All three of these network
adapters (NIC, HBA, and CNA) are implemented as PCI adapter cards. They can be either expansion
cards or reside directly on the motherboard of a server in what is known as LAN on motherboard (LOM).


A CNA is exactly what the name suggests: a NIC and an HBA converged into a single network card.
For performance reasons, CNAs provide NIC and HBA functionality in hardware (usually an ASIC) so
that tasks such as FCoE encapsulation are fast and do not impact host CPU resources.

You can do general-purpose IP networking, FC storage networking, iSCSI storage, NAS, and even
low-latency, high-performance computing. A single network adapter and a single cable do the whole
job. This also has the positive effect of reduced power and cooling costs. All in all, it delivers
reduced data center running costs and a lower total cost of ownership (TCO).

To create this new enhanced Ethernet, the IEEE formed a new task group within the 802.1 working
group called Data Center Bridging (DCB). This group is responsible for developing a data center
Ethernet network capable of transporting all common data center network traffic types, such as IP
LAN traffic, FC storage traffic, and InfiniBand high-performance computing traffic. The enhanced
Ethernet is generally called Data Center Bridging (DCB), Converged Enhanced Ethernet (CEE), Data
Center Fabric, or Unified Fabric.
This Converged Enhanced Ethernet (CEE) has the following enhancements:

• Increased bandwidth

• Classes of service

• Priorities

• Congestion management

• Enhanced transmission selection (ETS)

All these enhancements are obtained by using converged network adapters together with new hardware such as cables, switch ports, and switches.
Functions of Converged Enhanced Ethernet (CEE)
Conventional Ethernet is lossy in nature, which means that frames might be dropped or lost under
congestion conditions. Therefore, Converged Enhanced Ethernet (CEE) provides a new
specification to the existing Ethernet standard. It eliminates the lossy nature of Ethernet and
enables convergence of various types of network traffic on a common Ethernet infrastructure.
CEE eliminates the dropping of frames due to congestion and thereby ensures lossless
transmission of FCoE (Fibre Channel over Ethernet) traffic over an Ethernet network. The
lossless Ethernet is required for the reliable transmission of FC data over an Ethernet network.
Unlike TCP/IP, the loss of a single FC frame typically requires the entire FC exchange to be aborted and re-transmitted, instead of just re-sending a particular missing frame. CEE makes a
high-speed (such as 10 Gbps or higher) Ethernet network a viable storage networking option,
similar to an FC SAN.

Converged Enhanced Ethernet (CEE) Functions

The CEE requires certain functionalities. These functionalities are defined and maintained by the
Data Center Bridging (DCB) task group, which is a part of the IEEE 802.1 working group. These
functionalities are

• Priority-based flow control

• Enhanced transmission selection

• Congestion notification

• Data center bridging exchange protocol

Priority-based Flow Control (PFC)


Traditional FC manages congestion through the use of a link-level, credit-based flow control that
guarantees no loss of FC frames. Typical Ethernet, coupled with TCP/IP, uses a packet drop flow
control mechanism. The packet drop flow control is not lossless. This challenge is eliminated by
using an IEEE 802.3x Ethernet PAUSE control frame to create a lossless Ethernet. A receiver can
send a PAUSE request to a sender when the receiver’s buffer is filling up. Upon receiving a
PAUSE frame, the sender stops transmitting frames, which guarantees no loss of frames. The
downside of using the Ethernet PAUSE frame is that it operates on the entire link, which might
be carrying multiple traffic flows.

• PFC provides a link-level flow control mechanism. PFC creates eight separate virtual
links on a single physical link and allows any of these links to be paused and restarted
independently.

• PFC enables the PAUSE mechanism based on user priorities or classes of service.
Enabling the PAUSE based on priority allows creating lossless links for network
traffic, such as FCoE traffic.

• This PAUSE mechanism is typically implemented for FCoE traffic, while regular
TCP/IP traffic continues to drop frames.
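The contrast between a link-wide PAUSE and a per-priority PAUSE can be illustrated with a minimal Python sketch. This is a toy model, not any real driver API; the class-of-service values are assumed (priority 3 is commonly used for FCoE traffic).

```python
# Toy model of per-priority pause: a PFC PAUSE stalls only one of the eight
# virtual links, while traffic on the other priorities keeps flowing.

paused = set()

def pfc_pause(priority):
    paused.add(priority)        # receiver's buffer for this class is filling up

def pfc_resume(priority):
    paused.discard(priority)

def can_transmit(priority):
    return priority not in paused

FCOE_PRIORITY, LAN_PRIORITY = 3, 0   # assumed class-of-service values

pfc_pause(FCOE_PRIORITY)             # PAUSE frame for the FCoE class only
print(can_transmit(FCOE_PRIORITY))   # False - FCoE transmission is held
print(can_transmit(LAN_PRIORITY))    # True  - LAN traffic is unaffected
```

An IEEE 802.3x PAUSE, by comparison, would block both priorities, since it operates on the entire link.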
Enhanced Transmission Selection (ETS)
Enhanced transmission selection (ETS) provides a common management framework for the
allocation of bandwidth to different traffic classes, such as LAN, SAN, and Inter Process
Communication (IPC). For example, an administrator may assign 40 percent of network
bandwidth to LAN traffic, 40 percent of bandwidth to SAN traffic, and 20 percent of bandwidth
to IPC traffic. When a particular class of traffic does not use its allocated bandwidth, ETS enables
other traffic classes to use the available bandwidth.
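The bandwidth-sharing behaviour of ETS can be sketched in a few lines of Python, using the 40/40/20 split from the example above. The allocation routine below is an illustrative model, not a real switch scheduler.

```python
# Illustrative model of ETS bandwidth allocation (not a real switch algorithm).
# Each traffic class is guaranteed its configured share of the link; bandwidth
# a class does not use is handed to classes that still have unmet demand.

def ets_allocate(link_bw_gbps, shares_pct, demands_gbps):
    alloc = {}
    spare = 0.0
    for cls, pct in shares_pct.items():
        guaranteed = link_bw_gbps * pct / 100.0
        alloc[cls] = min(guaranteed, demands_gbps.get(cls, 0.0))
        spare += guaranteed - alloc[cls]
    for cls in alloc:                        # redistribute unused bandwidth
        want = demands_gbps.get(cls, 0.0) - alloc[cls]
        take = min(want, spare)
        if take > 0:
            alloc[cls] += take
            spare -= take
    return alloc

# 10 Gbps link with the shares from the example: LAN 40%, SAN 40%, IPC 20%.
shares = {"LAN": 40, "SAN": 40, "IPC": 20}
demands = {"LAN": 2.0, "SAN": 6.0, "IPC": 1.0}   # offered load in Gbps
print(ets_allocate(10.0, shares, demands))
# SAN exceeds its 4 Gbps guarantee because LAN and IPC are under-using theirs.
```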
Congestion notification (CN)
Congestion notification (CN) provides end-to-end congestion management for protocols, such as
FCoE, that do not have built-in congestion control mechanisms. Link level congestion
notification provides a mechanism for detecting congestion and notifying the source to move the
traffic flow away from the congested links. Link level congestion notification enables a switch to
send a signal to other ports that need to stop or slow down their transmissions.

The process of congestion notification and its management is shown in the above figure, which
represents the communication between the nodes A (sender) and B (receiver). If congestion at the
receiving end occurs, the algorithm running on the switch generates a congestion notification
message to the sending node (Node A). In response to the message, the sending end limits the
rate of data transfer.

Data Center Bridging Exchange (DCBX) protocol

DCBX is a discovery and capability exchange protocol, which helps CEE devices to convey and
configure their features with the other CEE devices in the network. DCBX is used to negotiate
capabilities between the switches and the network adapters, which allows the switch to distribute
the configuration values to all the attached adapters. This helps to ensure consistent configuration
across the entire network.
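The kind of exchange DCBX performs can be modelled in a short sketch. The data layout and feature names below are invented for illustration; real DCBX is carried in LLDP TLVs between the switch and the adapter.

```python
# Toy sketch of a DCBX-style capability exchange: the switch advertises its
# CEE configuration and a "willing" adapter adopts the settings for each
# feature it supports, giving a consistent configuration across the network.

def dcbx_negotiate(switch_cfg, adapter_features):
    agreed = {}
    if "pfc" in adapter_features:
        agreed["pfc_priorities"] = switch_cfg["pfc_priorities"]
    if "ets" in adapter_features:
        agreed["ets_shares"] = switch_cfg["ets_shares"]
    return agreed

switch_cfg = {
    "pfc_priorities": {3},                            # lossless class for FCoE
    "ets_shares": {"LAN": 40, "SAN": 40, "IPC": 20},  # percent of link bandwidth
}
print(dcbx_negotiate(switch_cfg, {"pfc", "ets"}))
print(dcbx_negotiate(switch_cfg, {"pfc"}))   # adapter without ETS support
```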
3.4.4 FCoE Architecture
The data in FCoE is sent through FCoE frames. An FCoE frame is an Ethernet frame that contains
an FCoE Protocol Data Unit (PDU). The below diagram shows the FCoE frame structure. The
Ethernet header includes the source and destination MAC addresses, IEEE 802.1Q VLAN tag,
and Ethertype field. FCoE has its own Ethertype.
The FCoE header includes a version field that identifies the version of FCoE being implemented
and some reserved bits. The Start of Frame (SOF) and the End of Frame (EOF) mark the start and
the end of the encapsulated FC frame respectively. The encapsulated FC frame consists of the FC
header and the data being transported (including the FC CRC). The FCoE frame ends with the
Frame Check Sequence (FCS) field that provides error detection for the Ethernet frame. Notice
that the FCoE frame, unlike iSCSI and FCIP, has no TCP/IP overhead.
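As a sketch of the layout just described, the following Python builds the Ethernet and FCoE headers of a frame. The 14-byte FCoE header size and SOF placement follow the FC-BB-5 encapsulation, but the MAC addresses and SOF code value used here are only examples.

```python
import struct

FCOE_ETHERTYPE = 0x8906   # EtherType value assigned to FCoE

def fcoe_frame_header(dst_mac: bytes, src_mac: bytes, sof: int, version: int = 0) -> bytes:
    """Ethernet header followed by the 14-byte FCoE header (4-bit version,
    reserved bits, then the SOF byte); the VLAN tag is omitted for brevity."""
    eth = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    fcoe = bytes([version << 4]) + bytes(12) + bytes([sof])
    return eth + fcoe

# Example MACs and an example SOF code; the encapsulated FC frame and the
# trailing EOF/FCS would follow this header on the wire.
hdr = fcoe_frame_header(b"\x0e\xfc\x00\x01\x02\x03",
                        b"\x0e\xfc\x00\x0a\x0b\x0c", sof=0x2E)
print(len(hdr))   # 14-byte Ethernet header + 14-byte FCoE header = 28
```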

Frame size is an important factor in FCoE. A typical FC data frame has a 2112-byte payload, a
24-byte header, and an FCS. A standard Ethernet frame has a default payload capacity of 1500
bytes. To maintain good performance, FCoE must use jumbo frames to prevent an FC frame from
being split into two Ethernet frames.
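The arithmetic behind the jumbo-frame requirement is easy to check. The payload and header sizes come from the figures above; the 4-byte FC CRC is assumed here.

```python
# Figures from the text: 2112-byte FC payload and 24-byte FC header;
# a 4-byte FC CRC is assumed.
FC_PAYLOAD = 2112
FC_HEADER = 24
FC_CRC = 4
STD_ETH_PAYLOAD = 1500          # default Ethernet payload capacity

fc_frame = FC_HEADER + FC_PAYLOAD + FC_CRC
print(f"Encapsulated FC frame: {fc_frame} bytes")                          # 2140
print(f"Fits one standard Ethernet frame: {fc_frame <= STD_ETH_PAYLOAD}")  # False
```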

The encapsulation of the FC frames occurs through the mapping of the FC frames onto Ethernet, as shown in the figure. FC and traditional networks have stacks of layers, where each layer in the stack represents a set of functionalities. The FC stack consists of five layers – FC-0 through FC-4.
Ethernet is typically considered as a set of protocols that operates at the physical and data link layers in the seven-layer OSI stack. The FCoE protocol specification replaces the FC-0 and FC-1 layers of the FC stack with Ethernet. This provides the capability to carry the FC-2 to FC-4 layers over the Ethernet layer.

FCoE Addressing

An FCoE SAN uses MAC address for frame forwarding. The MAC addresses are assigned to the
VN_Ports, VF_Ports, and VE_Ports. The destination and the source MAC addresses are used to
direct frames to their Ethernet destinations. Both the VF_Ports and the VE_Ports obtain MAC
addresses from the FCoE switch. FCoE supports two types of addressing for the VN_Ports:
server-provided MAC address (SPMA) and fabric-provided MAC address (FPMA). These
addressing types are described below
SPMA: In this type of addressing, the compute systems provide MAC addresses to the associated
VN_Ports. The MAC addresses are issued in accordance with Ethernet standards. These
addresses are either burned in by the manufacturers of the network adapters or are configured by an administrator. SPMA can use a single MAC address exclusively for FCoE traffic, or it can have a different MAC address for each VN_Port.

FPMA: In this type of addressing, the VN_Ports receive MAC addresses from the FCoE switches
dynamically during login. The VN_Ports then use their granted MAC addresses for
communication. This address is derived by concatenating the 24-bit FCoE MAC address prefix (FC-MAP) and the 24-bit FC address assigned to the VN_Port by the FCoE switch. The FC-MAP identifies
the fabric to which an FCoE switch belongs. The FPMA ensures that the MAC addresses are
unique within an FCoE SAN.
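The FPMA construction can be illustrated directly in a few lines. 0x0EFC00 is the default FC-MAP value defined by the FC-BB-5 standard; the FC address below is an arbitrary example assigned by the FCoE switch at login.

```python
def fpma_mac(fc_map: int, fc_id: int) -> str:
    """Concatenate the 24-bit FC-MAP prefix with the 24-bit FC address
    granted at login to form the VN_Port's 48-bit MAC address."""
    mac = (fc_map << 24) | fc_id
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

# Default FC-MAP prefix plus an example FC address assigned by the switch.
print(fpma_mac(0x0EFC00, 0x010203))   # 0e:fc:00:01:02:03
```

Because the switch assigns the FC address, the resulting MAC is guaranteed unique within the fabric identified by the FC-MAP.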
FCoE Frame Forwarding
In an FCoE SAN, a node must know two different addresses to forward a frame to another node.
First, it must know the Ethernet MAC address of the FCoE switch port (VF_Port). Second, it
must know the FC address assigned to the destination node port (VN_Port or N_Port). The MAC
address is used to forward an Ethernet frame containing FC payload over a CEE network. The
FC address is used to send the FC frame, encapsulated into the Ethernet frame, to its FC
destination.

FCoE Process Flow

To understand the FCoE communication, it is important to know the FCoE process. The FCoE
process includes three key phases: discovery, login, and data transfer.

• Discovery phase: In this phase, the FCFs discover each other and form an FCoE
fabric. The FCoE nodes also find the available FCFs for login. Moreover, both the
FCoE nodes and the FCFs discover potential VN_Port to VF_Port pairing.

• Login phase: In this phase, the virtual FC links are established between VN_Ports and
VF_Ports as well as between VE_Ports. VN_ports perform FC login (including
FLOGI, PLOGI, PRLI) to the discovered FCFs and obtain FC addresses. Each
VN_Port also obtains a unique MAC address.

• Data transfer phase: After login, the VN_Ports can start transferring regular FC
frames (encapsulated) over the CEE network.

FCoE Initialization Protocol (FIP)

In an FCoE SAN, an FCoE node needs a discovery mechanism that allows it to discover the
available FCFs before it can perform FC login. The mechanism used for the discovery is the
FCoE Initialization Protocol (FIP).

FIP is used for discovering the FCFs and establishing virtual links between FCoE devices (FCoE
nodes and FCoE switches). Unlike FCoE frames, FIP frames do not transport FC data, but contain
discovery and login/logout parameters. FIP frames are assigned a unique EtherType code to
distinguish them from the FCoE frames.

The FCoE node to FCF discovery and the login use the following FIP operations:

• FCoE node sends multicast FIP Solicitation frame to find which FCFs are available
for login.

• Each FCF replies to the FCoE node by sending unicast FIP Advertisement frame.

• After the FCoE node decides which FCF is appropriate, it sends FIP FLOGI request
to the FCF.

• The selected FCF sends FIP FLOGI Accept which contains both FC address and MAC
address for the VN_Port. The reason for using FIP for FLOGI instead of a regular
FLOGI is that the FIP FLOGI Accept has a field for the FCF to assign a MAC address
to the VN_Port.
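The discovery-and-login exchange above can be sketched as a small simulation. The FCF names, priority values, and address assignments are invented for illustration; real FIP selection criteria are richer than the single priority field used here.

```python
# Small simulation of the FIP discovery/login sequence.

class FCF:
    def __init__(self, name, priority, fc_map=0x0EFC00):
        self.name, self.priority, self.fc_map = name, priority, fc_map
        self._next_fc_id = 0x010001

    def advertisement(self):            # unicast FIP Advertisement
        return {"fcf": self, "priority": self.priority}

    def fip_flogi_accept(self):         # carries both FC address and MAC
        fc_id = self._next_fc_id
        self._next_fc_id += 1
        mac = (self.fc_map << 24) | fc_id      # fabric-provided MAC (FPMA)
        return fc_id, mac

def enode_login(fcfs):
    # 1. multicast FIP Solicitation reaches every FCF
    # 2. each FCF answers with a unicast Advertisement
    ads = [f.advertisement() for f in fcfs]
    # 3. pick an FCF (lowest priority value wins in this sketch)
    best = min(ads, key=lambda ad: ad["priority"])["fcf"]
    # 4. FIP FLOGI; the Accept assigns the FC address and VN_Port MAC
    fc_id, mac = best.fip_flogi_accept()
    return best.name, fc_id, mac

fabric = [FCF("FCF-A", priority=128), FCF("FCF-B", priority=1)]
print(enode_login(fabric))   # the ENode logs in to FCF-B
```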

Two Mark Questions with Answers

1.List the types of storage systems.


Different types of storage systems as follows,
o Block-Based Storage System – Examples – SAN (Storage Area Network), iSCSI
(Internet Small Computer System Interface), and local disks.
o File-Based Storage System – Examples – NTFS (New Technology File System), FAT (File
Allocation Table), EXT (Extended File System).
o Object-Based Storage System – Examples – Google Cloud Storage, Amazon Simple
Storage Service (S3).
o Unified Storage System – Examples – Dell EMC Unity XT All-Flash Unified Storage and
Dell EMC Unity XT Hybrid Unified Storage.

2. Define virtualization.

▪ Virtualization is the technique of masking or abstracting physical resources, which
simplifies the infrastructure and accommodates the increasing pace of business and
technological changes. It increases the utilization and capability of IT resources, such
as servers, networks, or storage devices, beyond their physical limits.
▪ Virtualization simplifies resource management by pooling and sharing resources for
maximum utilization and makes them appear as logical resources with enhanced
capabilities.
3. State the connectivity of iSCSI protocol.
▪ Native iSCSI Connectivity - Native topologies do not have any FC components; they
perform all communication over IP. The initiators may be either directly attached to
targets or connected using standard IP routers and switches.

▪ Bridged iSCSI Connectivity - Bridged topologies enable the co-existence of FC with IP
by providing iSCSI-to-FC bridging functionality. For example, the initiators can exist in
an IP environment while the storage remains in an FC SAN.

4. What is zoning?
▪ Zoning allows for finer segmentation of the switched fabric. Zoning can be used to
instigate a barrier between different environments.
▪ Only the members of the same zone can communicate within that zone; all
other attempts from outside are rejected.
▪ Zoning can be implemented in the following ways:
✓ Hardware zoning
✓ Software zoning

5. Define switch aggregation?

✓ An aggregation switch is a networking device that allows multiple network connections
to be bundled together into a single link. This enables increased bandwidth and better
network performance.

6. What is meant by file-based storage system?

• File storage, also called file-level or file-based storage, stores data in a
hierarchical structure. The data is saved in files and folders, and presented to both the
system storing it and the system retrieving it in the same format.
• Data can be accessed using the Network File System (NFS) protocol for Unix or Linux, or
the Server Message Block (SMB) protocol for Microsoft Windows.
7. Define Link aggregation?

o Link aggregation allows combining multiple Ethernet links into a single logical link
between two networked devices. Link aggregation is sometimes called by other names,
such as Ethernet bonding or Ethernet teaming.

8. List the types of connectivity in FC SAN.

The three types of FC SAN connectivity are as follows:
✓ Point-to-point
✓ Fibre Channel Arbitrated Loop (FC-AL)
✓ Switched Fabric

9. List the types of topologies in FC SAN


The three FC SAN topologies are:
✓ Single-Switch topology
✓ Mesh topology
✓ Core-edge topology
10. What is FCoE SAN?

FCoE SAN is a Converged Enhanced Ethernet (CEE) network that is capable of transporting
FC data along with regular Ethernet traffic over high speed (such as 10 Gbps or higher) Ethernet
links.

Review Questions
1. Explain in detail about Block-based, File-based, Object-based and Unified Storage systems.
2. Describe in detail about the components and architecture of FC SAN
3. Write a note on FC SAN topologies and connectivity
4. Describe in detail about zoning and link aggregation
5. What is a Virtual SAN in an FC SAN? Explain in detail.
6. Explain in detail about the IP SAN Protocols (iSCSI protocol Stack, FCIP protocol Stack)
7. Explain in detail about zoning and aggregation in iSCSI IP SAN
8. Explain the components, performance and addressing in FCIP
9. Explain in detail about the components and architecture of FCoE
10. Describe about Converged Enhanced Ethernet
