CCS367 Storage Technologies Book
TABLE OF CONTENTS
UNIT 1 - STORAGE SYSTEMS
TOPICS PAGE NO
1.1 Introduction to Information Storage 11
1.1.1 Digital data and its types 13
1.1.1.1 Structured data 14
1.1.1.2 Unstructured data 14
1.1.1.3 Semi-Structured data 14
1.1.2 Information storage 16
1.1.3 Key characteristics of data center 17
1.1.4 Evolution of computing platforms 19
1.1.5 Information Lifecycle Management (ILM) 20
1.1.5.1 Information Lifecycle 20
1.1.5.2 Example of ILM 21
1.1.5.3 Characteristics of ILM 21
1.1.5.4 Implementation of ILM 22
1.1.5.5 Benefits of ILM 23
1.2 Third Platform Technologies 24
1.2.1 Cloud computing 25
1.2.2 Essential characteristics of cloud computing 25
1.2.3 Cloud service models 26
1.2.3.1 Infrastructure as a Service (IaaS) 27
1.2.3.2 Platform as a Service (PaaS) 28
1.2.3.3 Software as a Service (SaaS) 29
1.2.4 Types of cloud / Cloud deployment models 31
1.2.4.1 Public cloud 33
1.2.4.2 Private cloud 34
1.2.4.3 Hybrid cloud 35
1.2.4.4 Community cloud 37
1.2.4.5 Multi-Cloud 38
1.2.5 Big data Analytics 40
1.2.6 Social networking 45
1.2.7 Mobile computing 47
1.2.8 Characteristics of third platform infrastructure 50
1.2.9 Imperatives for third platform transformation 51
1.3 Data center environment 52
1.3.1 Building blocks of data center 53
1.3.2 Compute systems 56
1.3.3 Compute virtualization 57
1.3.4 Software-defined data center 58
Two Mark Questions with Answers 59
Review Questions 63
UNIT 5 - SECURING STORAGE INFRASTRUCTURE
TOPICS PAGE NO
5.1 Introduction to Information Security
5.2 Information security goals
5.2.1 Confidentiality
5.2.2 Integrity
5.2.3 Availability
5.2.4 Accountability
5.2.5 Authentication
5.2.6 Authorization
5.2.7 Auditing
5.3 Information Security Considerations
5.3.1 Risk assessment
5.3.2 Assets and Threats
5.3.3 Vulnerability
5.3.4 Security Controls
5.3.4.1 Preventive
5.3.4.2 Detective
5.3.4.3 Corrective
5.3.5 Defense in depth
5.4 Storage Security Domains
5.4.1 Securing the Application Access Domain
5.4.2 Securing the Management Access Domain
5.4.3 Securing Backup, Recovery, and Archive (BURA)
5.5 Threats to a Storage Infrastructure
5.5.1 Unauthorized access
5.5.2 Denial of Service (DoS)
5.5.3 Distributed DoS (DDoS) attack
5.5.4 Data loss
5.5.5 Malicious Insiders
5.5.6 Account Hijacking
5.5.7 Insecure APIs
UNIT 1 - STORAGE SYSTEMS

1.1 Introduction to Information Storage
6. Network-Attached Storage (NAS): NAS is a type of storage device that connects to a network
and provides file-level data storage to multiple clients. It is commonly used in homes and small
businesses to centralize data storage and facilitate easy file sharing and access.
7. Storage Area Network (SAN): SAN is a specialized network that connects multiple storage
devices to servers, allowing for high-speed data transfer and centralized storage management.
SANs are commonly used in large enterprises and data centers that require high-performance
storage solutions.
8. Virtualization: Virtualization technology allows for the abstraction of physical storage
resources, enabling multiple virtual machines or servers to share a common pool of storage. This
improves resource utilization and simplifies management in virtualized environments.
9. Redundant Array of Independent Disks (RAID): RAID is a method of combining multiple
physical disk drives into a single logical unit for improved performance, reliability, or both.
Different RAID levels offer various combinations of data striping, mirroring, and parity for
enhanced data protection and performance (see the parity sketch after this list).
10. Data Backup and Disaster Recovery: Information storage also involves implementing
backup strategies to protect against data loss due to hardware failures, natural disasters, or human
errors. Regular backups ensure that critical data can be restored in case of an unforeseen event.
11. Data Compression: Data compression techniques are often used to reduce the size of stored
information, optimizing storage space and improving transfer speeds. Compression algorithms
remove redundant or unnecessary data from files without compromising their integrity (see the
compression sketch after this list).
12. Data Encryption: To ensure the security and privacy of stored information, data encryption
techniques are employed. Encryption transforms data into an unreadable format using
cryptographic algorithms, making it accessible only to authorized users with the appropriate
decryption keys (see the encryption sketch after this list).
13. Big Data Storage: With the exponential growth of data in recent years, big data storage
solutions have emerged to handle massive volumes of structured and unstructured data.
Technologies like Hadoop Distributed File System (HDFS) and NoSQL databases provide
scalable and distributed storage architectures for big data analytics. Effective information storage
involves a combination of these methods and technologies, tailored to the specific needs and
requirements of an organization or individual. It is essential to consider factors such as data access
speed.
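To make item 9 above concrete, here is a minimal Python sketch of RAID 5-style parity, assuming single-byte stripes and three data disks (toy values, not a real array): the parity block is the XOR of the data blocks, so any one lost disk can be rebuilt from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR corresponding bytes across all blocks to produce a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

# Three "data disks", one tiny stripe each (toy values for illustration)
disks = [bytes([0b10110100]), bytes([0b01101001]), bytes([0b11100011])]
parity = xor_blocks(disks)

# Simulate losing disk 1: XOR of the survivors and the parity rebuilds it
rebuilt = xor_blocks([disks[0], disks[2], parity])
assert rebuilt == disks[1]
```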
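For item 11, a short sketch using Python's standard zlib module shows lossless compression in action: the compressed form is smaller, and decompression restores the original bytes exactly.

```python
import zlib

# Highly redundant input compresses well; decompression restores it exactly
text = b"storage " * 100
compressed = zlib.compress(text, level=9)
restored = zlib.decompress(compressed)

assert restored == text                            # lossless: integrity preserved
print(len(text), "->", len(compressed), "bytes")   # e.g. 800 -> ~25 bytes
```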
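For item 12, a sketch of symmetric encryption using the third-party cryptography package (one possible library choice, not the only one): without the key, the token is unreadable.

```python
# Requires the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # only authorized users should hold this key
cipher = Fernet(key)

token = cipher.encrypt(b"account number: 1234-5678")  # unreadable ciphertext
plain = cipher.decrypt(token)                         # needs the same key
assert plain == b"account number: 1234-5678"
```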
Data
Data is a collection of raw facts from which conclusions may be drawn.
Examples:
✓ Handwritten letters,
✓ A printed book,
✓ A family photograph,
✓ A movie on video tape,
✓ Printed and duly signed copies of mortgage papers, a bank's ledgers, and an account
holder's passbooks.

1.1.1 Digital data and its types
✓ Digital data refers to any information that is processed and stored in a digital format, such as
text, images, audio, and video. This data is represented using binary code (1s and 0s) and can
be easily manipulated and transmitted electronically.
✓ Examples of digital data can include emails, social media posts, digital images, music files,
video files, and more.
✓ Digital data is used widely in modern society and is essential for communication,
entertainment, education, research, and business operations.
With the advancement of computer and communication technologies, the rate of data generation
and sharing has increased exponentially.
The following is a list of some of the factors that have contributed to the growth of digital
data:
✓ Affordable and faster communication technology: The rate of sharing digital data is now
much faster than traditional approaches. A handwritten letter may take a week to reach its
destination, whereas it only takes a few seconds for an e-mail message to reach its recipient.
Types of digital data
Data can be classified based on how it is stored and managed.
Digital data are classified as follows,
✓ Structured Data
✓ Unstructured Data
✓ Semi-Structured Data
1.1.1.1 Structured data:
Structured data is data that is highly organized and follows a predefined schema or data model.
In structured data, records are organized in rows and columns. It is typically stored and managed
using a database management system (DBMS).
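As a brief illustration, Python's built-in sqlite3 module can hold such data; the employee table and its values below are hypothetical, but the fixed rows-and-columns schema is what makes the data structured.

```python
import sqlite3

# In-memory database; the rows and columns follow a predefined schema
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
con.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Asha", 52000.0))
con.commit()

for row in con.execute("SELECT id, name, salary FROM employee"):
    print(row)   # (1, 'Asha', 52000.0)
```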
Examples: employee records, bank transactions, and other tabular data stored in relational
databases.

1.1.1.2 Unstructured data:
Unstructured data lacks a predefined schema and does not follow the format of a tabular data
model or relational databases. Examples include text documents, images, videos, social media
posts, and email messages.

1.1.1.3 Semi-Structured data:
Semi-structured data is not completely raw or unstructured; it contains structural elements such
as tags and organizational metadata that make it easier to analyze, without a fixed schema.

Examples:
✓ HTML code,
✓ Graphs and tables,
✓ E-mails,
✓ XML documents.
Advantage:
The advantage of semi-structured data is that it is more flexible and simpler to scale than
structured data.
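Since XML documents appear in the examples above, here is a small sketch that parses one with Python's standard library; the order document is invented, but it shows how tags let individual elements be queried even without a rigid table structure.

```python
import xml.etree.ElementTree as ET

# No rigid table here, but tags give the data enough structure to query
doc = """<order id="1042">
  <customer>R. Kumar</customer>
  <item sku="HDD-2TB" qty="2"/>
</order>"""

root = ET.fromstring(doc)
print(root.get("id"), root.findtext("customer"), root.find("item").get("sku"))
```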
Difference between Structured, Unstructured and Semi-structured data
1.1.2 Information storage
✓ Devices such as memory in a cell phone or digital camera, DVDs, CD-ROMs, and hard disks
in personal computers.
✓ Businesses have several options available for storing data including internal hard disks,
external disk arrays and tapes.
Information storage is a fundamental concept in the world of data and technology. Information
storage refers to the process of collecting, preserving, and organizing data in a manner that allows
for efficient retrieval and use at a later time.
Information storage systems are an integral part of our daily lives, playing a crucial role in various
fields, including business, science, education, and personal communication.
1.1.3 Key characteristics of data center
Availability: All data center elements should be designed to ensure accessibility. The inability of
users to access data can have a significant negative impact on a business.
Security: Polices, procedures, and proper integration of the data center core elements that will
prevent unauthorized access to information must be established. In addition to the security
measures for client access, specific mechanisms must enable servers to access only their allocated
resources on storage arrays.
Scalability: Data center operations should be able to allocate additional processing capabilities
or storage on demand, without interrupting business operations. Business growth often requires
deploying more servers, new applications, and additional databases. The storage solution should
be able to grow with the business.
Performance: All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed. The infrastructure should be able
to support performance requirements.
Data integrity: Data integrity refers to mechanisms such as error correction codes or parity bits
which ensure that data is written to disk exactly as it was received. Any variation in data during
its retrieval implies corruption, which may affect the operations of the organization.
Capacity: Data center operations require adequate resources to store and process large amounts
of data efficiently. When capacity requirements increase, the data center must be able to provide
additional capacity without interrupting availability, or, at the very least, with minimal disruption.
Capacity may be managed by reallocation of existing resources, rather than by adding new
resources.
Manageability: A data center should perform all operations and activities in the most efficient
manner. Manageability can be achieved through automation and the reduction of human (manual)
intervention in common tasks.
Monitoring: It is a continuous process of gathering information on various elements and services
running in the data center. The reason is obvious – to predict the unpredictable.
Reporting: Resource performance, capacity, and utilization information gathered together at a
point in time.
Provisioning: It is a process of providing the hardware, software and other resources required to
run a data center.
1.1.5 Information Lifecycle Management (ILM)

1.1.5.1 Information Lifecycle
The information lifecycle is the "change in the value of information" over time. When data is first
created, it often has the highest value and is used frequently. As data ages, it is accessed less
frequently and is of less value to the organization.
1.1.5.2 Example of ILM
For example, in a sales order application, the value of the information changes from the
time the order is placed until the time that the warranty becomes void (see Figure -Changing value
of sales order information). The value of the information is highest when a company receives a
new sales order and processes it to deliver the product. After order fulfilment, the customer or
order data need not be available for real-time access. The company can transfer this data to less
expensive secondary storage with lower accessibility and availability requirements unless or until
a warranty claim or another event triggers its need. After the warranty becomes void, the company
can archive or dispose of data to create space for other high-value information.
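A minimal Python sketch of a tiering policy in the spirit of this sales-order example; the 90-day threshold and the tier names are assumptions made for illustration, not part of any standard.

```python
from datetime import date, timedelta

def storage_tier(order_date: date, warranty_end: date, today: date) -> str:
    """Pick a storage tier for an order record based on its age (hypothetical policy)."""
    if today <= order_date + timedelta(days=90):
        return "primary"         # highest value: fast, highly available storage
    if today <= warranty_end:
        return "secondary"       # lower accessibility and availability suffice
    return "archive-or-dispose"  # warranty void: reclaim space for new data

print(storage_tier(date(2023, 1, 10), date(2024, 1, 10), date(2024, 6, 1)))
# -> 'archive-or-dispose'
```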
1.1.5.5 Benefits of ILM
✓ Improved utilization by using tiered storage platforms and increased visibility of all
enterprise information.
✓ Simplified management by integrating process steps and interfaces with individual tools and
by increasing automation.
✓ A wider range of options for backup, and recovery to balance the need for business continuity.
✓ Maintaining compliance by knowing what data needs to be protected for what length of time.
✓ Lower Total Cost of Ownership (TCO) by aligning the infrastructure and management costs
with information value. As a result, resources are not wasted, and complexity is not
introduced by managing low-value data at the expense of high-value data.
1.2 Third Platform Technologies
Augmented Reality (AR) and Virtual Reality (VR): AR and VR technologies are transforming
the way we interact with digital information and environments, impacting industries like gaming,
education, healthcare, and more.
The third platform represents a shift towards a more interconnected, data-driven, and technology-
dependent world. It has significant implications for businesses, as they must adapt to leverage
these technologies for competitive advantage and operational efficiency. Additionally, it
continues to evolve, with emerging technologies like quantum computing and 5G networks
expected to further shape the landscape of the third platform in the future.
1.2.1 Cloud Computing

Cloud computing is the delivery of computing services—including servers, storage, databases,
networking, software, analytics, and intelligence—over the Internet ("the cloud") to offer faster
innovation, flexible resources, and economies of scale.

1.2.2 Essential Characteristics of Cloud Computing

1. On-demand self-service: Cloud computing services do not require human administrators;
users themselves are able to provision, monitor, and manage computing resources as
needed.
2. Broad network access: Computing services are generally provided over standard
networks and to heterogeneous devices.
3. Rapid elasticity: Computing services should have IT resources that can scale out and in
quickly, on an as-needed basis. Whenever a user requires a service it is provided, and it is
scaled back in as soon as the requirement is over.
4. Resource pooling: IT resources (e.g., networks, servers, storage, applications, and
services) are pooled and shared across multiple applications and tenants in a non-dedicated
manner. Multiple clients are served from the same physical resources.
5. Measured service: Resource utilization is tracked for each application and tenant,
providing both the user and the resource provider with an account of what has been used.
This is done for various reasons, such as billing and the effective use of resources.
6. Multi-tenancy: Cloud computing providers can support multiple tenants (users or
organizations) on a single set of shared resources.
7. Virtualization: Cloud computing providers use virtualization technology to abstract
underlying hardware resources and present them as logical resources to users.
8. Resilient computing: Cloud computing services are typically designed with redundancy and
fault tolerance in mind, which ensures high availability and reliability.
9. Flexible pricing models: Cloud providers offer a variety of pricing models, including pay-
per-use, subscription-based, and spot pricing, allowing users to choose the option that best
suits their needs.
10. Security: Cloud providers invest heavily in security measures to protect their users’ data and
ensure the privacy of sensitive information.
11. Automation: Cloud computing services are often highly automated, allowing users to deploy
and manage resources with minimal manual intervention.
12. Sustainability: Cloud providers are increasingly focused on sustainable practices, such as
energy-efficient data centers and the use of renewable energy sources, to reduce their
environmental impact.
1.2.3 Cloud Service Models
Most cloud computing services fall into four broad categories:
1.2.3.1 Infrastructure as a Service (IaaS)
Infrastructure as a service (IaaS) refers to cloud computing services that provide IT
infrastructure—servers and virtual machines (VMs), storage, networks, operating systems—as a
service from a cloud provider on a pay-as-you-go basis. IaaS is also known as Hardware as a
Service (HaaS). It is a computing infrastructure managed over the internet.
Example: DigitalOcean, Linode, Amazon Web Services (AWS), Microsoft Azure, Google
Compute Engine (GCE), Rackspace, and Cisco Metacloud.
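As a hedged illustration of IaaS self-provisioning, launching a virtual machine on AWS might look like the following with the boto3 SDK; the region and AMI ID are placeholders, and the call needs valid AWS credentials to actually run.

```python
# Illustrative only: requires "pip install boto3" and valid AWS credentials;
# the region and AMI ID below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t3.micro",           # pay-as-you-go instance size
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```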
Characteristics of IaaS:
Advantages of IaaS:
1. Cost-Effective: Eliminates capital expense and reduces ongoing costs; IaaS customers
pay on a per-use basis, typically by the hour, week, or month.
2. Website hosting: Running websites using IaaS can be less expensive than traditional web
hosting.
3. Security: The IaaS Cloud Provider may provide better security than your existing
software.
4. Maintenance: There is no need to manage the underlying data center or the introduction
of new releases of the development or underlying software. This is all handled by the IaaS
Cloud Provider.
Disadvantages of IaaS:
1. Limited control over infrastructure: IaaS providers typically manage the underlying
infrastructure and take care of maintenance and updates, but this can also mean that users
have less control over the environment and may not be able to make certain
customizations.
2. Security concerns: Users are responsible for securing their own data and applications,
which can be a significant undertaking.
3. Limited access: Cloud computing may not be accessible in certain regions and countries
due to legal policies.
1.2.3.2 Platform as a Service (PaaS)

Platform as a service (PaaS) refers to cloud computing services that supply an on-demand
environment for developing, testing, delivering, and managing software applications. PaaS is
designed to make it easier for developers to quickly create web or mobile apps, without worrying
about setting up or managing the underlying infrastructure of servers, storage, network, and
databases needed for development.
Example: AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine,
Apache Stratos, Magento Commerce Cloud, and OpenShift.
Characteristics of PaaS:
Advantages of PaaS:
1. Simple and convenient for users: It provides much of the infrastructure and other IT
services, which users can access anywhere via a web browser.
2. Cost-Effective: It charges for the services provided on a per-use basis thus eliminating
the expenses one may have for on-premises hardware and software.
Disadvantages of PaaS:
1. Limited control over infrastructure: PaaS providers typically manage the underlying
infrastructure and take care of maintenance and updates, but this can also mean that users
have less control over the environment and may not be able to make certain
customizations.
2. Dependence on the provider: Users are dependent on the PaaS provider for the
availability, scalability, and reliability of the platform, which can be a risk if the provider
experiences outages or other issues.
3. Limited flexibility: PaaS solutions may not be able to accommodate certain types of
workloads or applications, which can limit the value of the solution for certain
organizations.
1.2.3.3 Software as a Service (SaaS)

Software as a service (SaaS) is a method for delivering software applications over the internet,
on demand and typically on a subscription basis. With SaaS, cloud providers host and manage
the software application and underlying infrastructure, and handle any maintenance, like software
upgrades and security patching. Users connect to the application over the internet, usually with a
web browser on their phone, tablet, or PC.
Characteristics of SaaS:
Advantages of SaaS:
2. Reduced time: Users can run most SaaS applications directly from a web browser, which
eliminates installation and configuration and can reduce the issues that can get in the way
of the software deployment.
3. Accessibility: We can Access app data from anywhere.
4. Automatic updates: Rather than purchasing new software, customers rely on a SaaS
provider to automatically perform the updates.
5. Scalability: It allows the users to access the services and features on-demand.
Disadvantages of SaaS:
Serverless Computing
Overlapping with PaaS, Serverless Computing focuses on building app functionality without
spending time continually managing the servers and infrastructure required to do so. The cloud
provider handles the setup, capacity planning, and server management for you. Serverless
architectures are highly scalable and event-driven, only using resources when a specific function
or trigger occurs.
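A minimal sketch of the serverless model: a single Python handler in the style of an AWS Lambda function. The event shape and the local invocation at the end are illustrative assumptions; in a real deployment, the provider invokes the handler when the trigger fires.

```python
def lambda_handler(event, context):
    """Runs only when a trigger (HTTP request, queue message, ...) fires."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# Local illustration of how the platform might invoke the function:
print(lambda_handler({"name": "storage student"}, context=None))
```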
The below table shows the difference between IaaS, PaaS, and SaaS –
1.2.4 Types of Cloud / Cloud Deployment Models

Public – The public cloud is computing services offered by third-party providers over the public
Internet, making them available to anyone who wants to use or purchase them.
Examples- Google Workspace, Amazon Web Services (AWS), Dropbox, and Microsoft offerings
like Microsoft 365 and Azure, as well as streaming services like Netflix.
Private - A private cloud is infrastructure used by a single organization. In simple words, the
resources are managed and used by that organization.
Examples - Amazon VPC, HPE, VMware, and IBM.
Community – Community cloud supports multiple organizations sharing computing resources
that are part of a community. Resource shared by several organizations, usually in the same
industry.
Examples - Health Care community cloud, Scientific Research Sector.
Hybrid - An organization makes use of interconnected private and public cloud infrastructure.
The hybrid cloud deployment model is partly managed by the service provider and partly by the
organization.
Examples - Google Application Suite (Gmail, Google Apps, and Google Drive), Office 365 (MS
Office on the Web and One Drive), Amazon Web Services.
Common Adoption Issues for Cloud
• It is impossible to provide 100% availability without a high-availability architecture.
• Vendor lock-in is also a concern that users always have, but in practice, they live with it.
• It is almost impossible to guarantee 100% of security and privacy protection.
• Enterprise users must maintain business legal documents.
1.2.4.1 Public Cloud Model

Advantages of Public Cloud Deployments
4. The cloud service providers fully subsidize the entire Infrastructure. Therefore, you don’t
need to set up any hardware.
5. Does not cost you any maintenance charges as the service provider does it.
6. It works on the Pay as You Go model, so you don’t have to pay for items you don’t use.
7. There is no significant upfront fee, making it excellent for enterprises that require
immediate access to resources.
Disadvantages of Public Cloud Deployments
Here are the cons/drawbacks of the Public Cloud Deployment Model:
1. It has lots of issues related to security.
2. Privacy and organizational autonomy are not possible.
3. You don’t control the systems hosting your business applications.
1.2.4.2 Private Cloud Model
The private cloud deployment model is a dedicated environment for one user or customer. Users
don’t share their hardware with any other users, as all the hardware is theirs.
It is a one-to-one environment for single use, so there is no need to share your hardware with
anyone else. The main difference between private and public cloud deployment models is how
you handle the hardware. It is also referred to as “Internal cloud,” which refers to the ability to
access systems and services within an organization or border.
1.2.4.5 Multi-Cloud Architecture
Multi-cloud computing refers to using public cloud services from many cloud service
providers. In a multi-cloud configuration, a company runs workloads on IaaS or PaaS from
multiple vendors, such as Azure, AWS, or Google Cloud Platform.
There are many reasons an organization selects a multi-cloud strategy. Some use it to avoid
vendor lock-in problems, while others combat shadow IT through multi-cloud deployments, so
employees can still benefit from a specific public cloud service even if it does not meet strict IT
policies.
Benefits of Multi-Cloud Deployment Model
1.2.5 Big Data Analytics

Applications of Big Data Analytics:

1. Risk Management
Use Case: Banco de Oro, a Philippine banking company, uses Big Data analytics to identify
fraudulent activities and discrepancies. The organization leverages it to narrow down a list of
suspects or root causes of problems.
2. Product Development and Innovations
Use Case: Rolls-Royce, one of the largest manufacturers of jet engines for airlines and armed
forces across the globe, uses Big Data analytics to analyze how efficient the engine designs are
and if there is any need for improvements.
3. Quicker and Better Decision Making Within Organizations
Use Case: Starbucks uses Big Data analytics to make strategic decisions. For example, the
company leverages it to decide if a particular location would be suitable for a new outlet or not.
They will analyze several different factors, such as population, demographics, accessibility of the
location, and more.
4. Improve Customer Experience
Use Case: Delta Air Lines uses Big Data analysis to improve customer experiences. They monitor
tweets to find out their customers’ experience regarding their journeys, delays, and so on. The
airline identifies negative tweets and does what’s necessary to remedy the situation. By publicly
addressing these issues and offering solutions, it helps the airline build good customer relations.
Big Data
According to Gartner, the definition of Big Data is –
"Big data is high-volume, high-velocity and high-variety information assets that demand
cost-effective, innovative forms of information processing for enhanced insight and decision
making."
This definition clearly answers the “What is Big Data?” question – Big Data refers to complex
and large data sets that have to be processed and analyzed to uncover valuable information that
can benefit businesses and organizations.
However, there are certain basic tenets of Big Data that will make it even simpler to answer what
is Big Data:
✓ It refers to a massive amount of data that keeps on growing exponentially with time.
✓ It is so voluminous that it cannot be processed or analyzed using conventional data
processing techniques.
✓ It includes data mining, data storage, data analysis, data sharing, and data visualization.
✓ The term is an all-comprehensive one including data, data frameworks, along with the
tools and techniques used to process and analyze the data.
Types of Big Data
Now that we are on track with what is big data, let’s have a look at the types of big data:
a) Structured
Structured is one of the types of big data, and by structured data, we mean data that can be
processed, stored, and retrieved in a fixed format. It refers to highly organized information that
can be readily and seamlessly stored and accessed from a database by simple search engine
algorithms.
For instance, the employee table in a company database will be structured as the employee details,
their job positions, their salaries, etc., will be present in an organized manner.
b) Unstructured
Unstructured data refers to the data that lacks any specific form or structure whatsoever. This
makes it very difficult and time-consuming to process and analyze unstructured data.
Email is an example of unstructured data. Structured and unstructured are two important types of
big data.
c) Semi-structured
Semi-structured is the third type of big data. Semi-structured data pertains to data containing
both the formats mentioned above, that is, structured and unstructured data. To be precise, it refers
to data that, although it has not been classified under a particular repository (database), still
contains vital information or tags that segregate individual elements within the data.
The History of Big Data
Although the concept of big data itself is relatively new, the origins of large data sets go back to
the 1960s and '70s when the world of data was just getting started with the first data centers and
the development of the relational database.
Around 2005, people began to realize just how much data users generated through Facebook,
YouTube, and other online services. Hadoop (an open-source framework created specifically to
store and analyze big data sets) was developed that same year. NoSQL also began to gain
popularity during this time.
The development of open-source frameworks, such as Hadoop (and more recently, Spark) was
essential for the growth of big data because they make big data easier to work with and cheaper
to store. In the years since then, the volume of big data has skyrocketed. Users are still generating
huge amounts of data—but it’s not just humans who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the
internet, gathering data on customer usage patterns and product performance. The emergence of
machine learning has produced still more data.
While big data has come far, its usefulness is only just beginning. Cloud computing has expanded
big data possibilities even further. The cloud offers truly elastic scalability, where developers can
simply spin up ad hoc clusters to test a subset of data.
Uses and Examples of Big Data Analytics
There are many different ways that Big Data analytics can be used in order to improve businesses
and organizations. Here are some examples:
• Using analytics to understand customer behaviour in order to optimize the customer
experience
Types of Big Data Analytics

2. Diagnostic Analytics
Use Case: An e-commerce company’s report shows that their sales have gone down, although
customers are adding products to their carts. This can be due to various reasons like the form
didn’t load correctly, the shipping fee is too high, or there are not enough payment options
available. This is where you can use diagnostic analytics to find the reason.
3. Predictive Analytics
This type of analytics looks into the historical and present data to make predictions of the future.
Predictive analytics uses data mining, AI, and machine learning to analyze current data and make
predictions about the future. It works on predicting customer trends, market trends, and so on.
Use Case: PayPal determines what kind of precautions they have to take to protect their clients
against fraudulent transactions. Using predictive analytics, the company uses all the historical
payment data and user behavior data and builds an algorithm that predicts fraudulent activities.
4. Prescriptive Analytics
This type of analytics prescribes the solution to a particular problem. Prescriptive analytics works
with both descriptive and predictive analytics. Most of the time, it relies on AI and machine
learning.
Use Case: Prescriptive analytics can be used to maximize an airline’s profit. This type of analytics
is used to build an algorithm that will automatically adjust the flight fares based on numerous
factors, including customer demand, weather, destination, holiday seasons, and oil prices.
Big Data Analytics Tools
Here are some of the key big data analytics tools:
• Hadoop - helps in storing and analyzing data
• MongoDB - used on datasets that change frequently
• Talend - used for data integration and management
• Cassandra - a distributed database used to handle chunks of data
• Spark - used for real-time processing and analyzing large amounts of data
• STORM - an open-source real-time computational system
• Kafka - a distributed streaming platform that is used for fault-tolerant storage
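To give a flavor of the Spark entry above, here is a hedged PySpark sketch that counts words in a text file; it assumes pyspark is installed, and events.txt is a placeholder path.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, col

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Read a local file (placeholder path), split lines into words, count them
lines = spark.read.text("events.txt")
counts = (lines
          .select(explode(split(col("value"), r"\s+")).alias("word"))
          .groupBy("word").count()
          .orderBy(col("count").desc()))
counts.show(10)
spark.stop()
```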
Big Data Industry Applications
Here are some of the sectors where Big Data is actively used:
• Ecommerce - Predicting customer trends and optimizing prices are a few of the ways e-
commerce uses Big Data analytics
• Marketing - Big Data analytics helps to drive high ROI marketing campaigns, which result
in improved sales
• Education - Used to develop new and improve existing courses based on market
requirements
• Healthcare - With the help of a patient’s medical history, Big Data analytics is used to
predict how likely they are to have health issues
• Media and entertainment - Used to understand the demand of shows, movies, songs, and
more to deliver a personalized recommendation list to its users
• Banking - Customer income and spending patterns help to predict the likelihood of
choosing various banking offers, like loans and credit cards
• Telecommunications - Used to forecast network capacity and improve customer
experience
• Government - Big Data analytics helps governments in law enforcement, among other
things
1.2.6 Social networking
Social networks are websites and apps that allow users and organizations to connect,
communicate, share information and form relationships. People can connect with others in the
same area, families, friends, and those with the same interests. Social networks are one of the
most important uses of the internet today.
Popular social networking sites -- such as Facebook, Yelp, Twitter, Instagram and TikTok --
enable individuals to maintain social connections, stay informed and access, as well as share a
wealth of information. These sites also enable marketers to reach their target audiences.
Social Networking:
The term social networking entails having connections in both the real and the digital worlds.
Today, this term is mainly used to reference online social communications. The internet has made
it possible for people to find and connect with others who they may never have met otherwise.
Online social networking is dependent on technology and internet connectivity. Users can access
social networking sites using their PCs, tablets or smartphones. Most social networking sites run
on a back end of searchable databases that use advanced programming languages, such as Python,
to organize, store and retrieve data in an easy-to-understand format. For example, Tumblr uses
such products and services in its daily operations as Google Analytics, Google Workspace and
WordPress.
Purpose of Social Networking:
Social networking fulfils the following four main objectives:
1. Sharing -Friends or family members who are geographically dispersed can connect
remotely and share information, updates, photos and videos. Social networking also
enables individuals to meet other people with similar interests or to expand their current
social networks.
2. Learning - Social networks serve as great learning platforms. Consumers can instantly
receive breaking news, get updates regarding friends and family, or learn about what's
happening in their community.
3. Interacting - Social networking enhances user interactions by breaking the barriers of time
and distance. With cloud-based video communication technologies such as WhatsApp or
Instagram Live, people can talk face to face with anyone in the world.
4. Marketing - Companies may tap into social networking services to enhance brand
awareness with the platform's users, improve customer retention and conversion rates, and
promote brand and voice identity.
Types of Social Networking:
The six most common types are the following:
1. Social connections. This is a type of social network where people stay in touch with
friends, family members, acquaintances or brands through online profiles and updates, or
find new friends through similar interests. Some examples are Facebook, Myspace and
Instagram.
2. Professional connections. Geared toward professionals, these social networks are
designed for business relationships. These sites can be used to make new professional
contacts, enhance existing business connections and explore job opportunities, for
example. They may include a general forum where professionals can connect with co-
workers or offer an exclusive platform based on specific occupations or interest levels.
Some examples are LinkedIn, Microsoft Yammer and Microsoft Viva.
3. Sharing of multimedia. Various social networks provide video- and photography-sharing
services, including YouTube and Flickr.
4. News or informational. This type of social networking allows users to post news stories and
informational or how-to content, and can be general purpose or dedicated to a single topic.
These social networks include communities of people who are looking for answers to
everyday problems and they have much in common with web forums. Fostering a sense
of helping others, members provide answers to questions, conduct discussion forums or
teach others how to perform various tasks and projects. Popular examples include Reddit,
Stack Overflow or Digg.
5. Communication. Here, social networks focus on allowing the user to communicate
directly with each other in one-on-one or group chats. They have less focus on posts or
updates and are like instant messaging apps. Some examples are WhatsApp, WeChat and
Snapchat.
6. Educational. Educational social networks offer remote learning, enabling students and
teachers to collaborate on school projects, conduct research, and interact through blogs
and forums. Google Classroom, LinkedIn Learning and ePals are popular examples.
Advantages of Social Networking
1. Brand awareness. Social networking enables companies to reach out to new and existing
clients. This helps to make brands more relatable and promotes brand awareness.
2. Instant reachability. By erasing the physical and spatial boundaries between people, social
networking websites can provide instant reachability.
3. Builds a following. Organizations and businesses can use social networking to build a
following and expand their reach globally.
4. Business success. Positive reviews and comments generated by customers on social
networking platforms can help improve business sales and profitability.
5. Increased website traffic. Businesses can use social networking profiles to boost and direct
inbound traffic to their websites. They can achieve this, for example, by adding inspiring
visuals, using plugins and shareable social media buttons, or encouraging inbound linking.
Disadvantages of Social Networking:
1. Rumors and misinformation. Incorrect information can slip through the cracks of social
networking platforms, causing havoc and uncertainty among consumers. Often, people
take anything posted on social networking sites at face value instead of verifying the
sources.
2. Negative reviews and comments. A single negative review can adversely affect an
established business, especially if the comments are posted on a platform with a large
following. A tarnished business reputation can often cause irreparable damage.
3. Data security and privacy concerns. Social networking sites can inadvertently put
consumer data at risk. For instance, if a social networking site experiences a data breach,
the users of that platform are automatically put at risk as well. According to Business
Insider, a data breach in April 2021 leaked the personal data of more than 500 million
Facebook users.
4. Time-consuming process. Promoting a business on social media requires constant
upkeep and maintenance. Creating, updating, preparing and scheduling regular posts can
take a considerable amount of time. This can be especially cumbersome for small
businesses that may not have the extra staff and resources to dedicate to social media
marketing.
1.2.7 Mobile computing
Mobile Computing refers to a technology that allows transmission of data, voice, and video via a
computer or any other wireless-enabled device. It is free from having a connection with a fixed
physical link. It facilitates users to move from one physical location to another during
communication.
Introduction of Mobile Computing
Mobile Computing is a technology that provides an environment that enables users to transmit
data from one device to another device without the use of any physical link or cables.
In other words, Mobile computing allows transmission of data, voice and video via a computer
or any other wireless-enabled device without being connected to a fixed physical link. In this
technology, data transmission is done wirelessly with the help of wireless devices such as
mobiles, laptops etc.
With Mobile Computing technology one can access and transmit data from any remote locations
without being present there physically. Mobile computing technology provides a vast coverage
diameter for communication. It is one of the fastest and most reliable sectors of the computing
technology field.
The concept of Mobile Computing can be divided into three parts:
o Mobile Communication
o Mobile Hardware
o Mobile Software
Mobile Communication
Mobile Communication specifies a framework that is responsible for the working of mobile
computing technology. In this case, mobile communication refers to an infrastructure that ensures
seamless and reliable communication among wireless devices. This framework ensures the
consistency and reliability of communication between wireless devices. The mobile
communication framework consists of elements such as protocols, services, bandwidth, and
portals necessary to facilitate and support the stated services. These elements are responsible for
delivering a smooth communication process.
Mobile communication can be divided into the following four types:
1. Fixed and Wired
2. Fixed and Wireless
3. Mobile and Wired
4. Mobile and Wireless
Mobile Hardware

Mobile hardware includes the mobile devices and device components that receive or access the
service of mobility, such as smartphones and laptops.
These devices are inbuilt with a receptor medium that can send and receive signals. These devices
are capable of operating in full-duplex. It means they can send and receive signals at the same
time. They don't have to wait until one device has finished communicating for the other device
to initiate communications.
Mobile Software
Mobile software is a program that runs on mobile hardware. It is designed to deal capably with
the characteristics and requirements of mobile applications. It is the operating system for mobile
devices; in other words, you can say it is the heart of the mobile system. It is an essential
component that operates the mobile device.
1.2.8 Characteristics of third platform infrastructure
Scalability - Third Platform infrastructure is designed to be highly scalable. Cloud services can
be easily scaled up or down based on demand, ensuring that organizations can handle fluctuations
in workload without overprovisioning resources. This scalability supports agility and cost-
efficiency.
Availability - Third Platform infrastructure is designed for high availability and disaster recovery.
Cloud providers offer redundancy and data replication across multiple regions to ensure business
continuity.
Ease of Access - Ease of access generally refers to how easily individuals can obtain or interact
with something, whether it's information, services, physical spaces, or digital resources. In the
context of technology and digital services, ease of access is a critical consideration to ensure that
users can efficiently and conveniently access and utilize various resources.
Resiliency - Resiliency in Third Platform infrastructure is a critical aspect that ensures that IT
systems and services can continue to operate reliably and recover quickly from disruptions or
failures. With the increasing complexity of modern IT environments, resiliency is essential to
maintain business continuity, minimize downtime, and protect against various threats and
challenges. Key mechanisms include high availability and fault tolerance.
1.2.9 Imperatives for third platform transformation
Agility
Agility is the ability to move quickly and easily. In today's business world, agility is an important
factor for the third platform transformation.
Intelligent Operations
Intelligent Operations is a bold new approach to achieve Operational Excellence (OpEx). It uses
digital transformation to optimize production, minimize equipment downtime, enhance human
performance, and manage operational risks.
New Products and Services
A product is a tangible offering to a customer, whereas a service is an intangible offering. The
former is usually a one-time exchange for value. In contrast, a service usually involves a longer
period of time. The value of a product is inherent in the tangible offering itself, for example, in
the can of paint or a pair of pants. In contrast, the value of a service often comes from the eventual
benefit that the customer perceives from the time while using the service. In addition, the
customer often judges the value of a service based on the quality of the relationship between the
provider and the customer while using the service.
Mobility
Business mobility, also known as enterprise mobility, is the growing trend of businesses to offer
remote working options, allow the use of personal laptops and mobile devices for business
purposes, and make use of cloud technology for data access.
Social Networking
Social networking involves using online social media platforms to connect with new and existing
friends, family, colleagues, and businesses. Individuals can use social networking to announce
and discuss their interests and concerns with others who may support or interact with them.
1.3 Data Center Environment
Organizations maintain data centers to provide centralized data processing capabilities across the
enterprise. Data centers store and manage large amounts of mission-critical data. The data center
infrastructure includes computers, storage systems, network devices, dedicated power backups,
and environmental controls (such as air conditioning and fire suppression).
Large organizations often maintain more than one data center to distribute data processing
workloads and provide backups in the event of a disaster. The storage requirements of a data
center are met by a combination of various storage architectures.
Core Elements
Five core elements are essential for the basic functionality of a data center:
Application: An application is a computer program that provides the logic for computing
operations. Applications, such as an order processing system, can be layered on a database, which
in turn uses operating system services to perform read/write operations to storage devices.
Database: More commonly, a database management system (DBMS) provides a structured way
to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and
retrieval of data.
Server and operating system: A computing platform that runs applications and databases.
Network: A data path that facilitates communication between clients and servers or between
servers and storage.
Storage array: A device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the elements
must work together to address data processing requirements. Figure 1-5 shows an example of an
order processing system that involves the five core elements of a data center and illustrates their
functionality in a business process.
1. A customer places an order through the AUI (application user interface) of the order
processing application software
located on the client computer.
2. The client connects to the server over the LAN and accesses the DBMS located on the
server to update the relevant information such as the customer name, address, payment
method, products ordered, and quantity ordered.
3. The DBMS uses the server operating system to read and write this data to the database
located on physical disks in the storage array.
4. The Storage Network provides the communication link between the server and the storage
array and transports the read or write commands between them.
5. The storage array, after receiving the read or write commands from the server, performs
the necessary operations to store the data on physical disks.
Figure 1-5: Example of an order processing system
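To tie the five core elements together, here is a toy Python sketch of the data path just described; the class names and the in-memory "disks" are illustrative stand-ins for a real DBMS and storage array.

```python
class StorageArray:
    """Stand-in for physical disks behind a storage network."""
    def __init__(self):
        self.disks = {}

    def write(self, key, value):
        self.disks[key] = value          # step 5: persist on physical disks

class DBMS:
    """Stand-in for the database running on the server's operating system."""
    def __init__(self, storage):
        self.storage = storage

    def update(self, table, row):
        # steps 3-4: map the logical row to a block and send it to storage
        self.storage.write(f"{table}:{row['order_id']}", row)

dbms = DBMS(StorageArray())
# steps 1-2: the client sends the order over the LAN to the server's DBMS
dbms.update("orders", {"order_id": 7, "customer": "Priya", "qty": 3})
print(dbms.storage.disks)
```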
1.3.1 Building blocks of a data center
Physical Infrastructure:
Facility Location:
The location of the data center is crucial, considering factors like proximity to power sources,
network connectivity, and disaster risk.
Building:
The physical structure housing the data center, which must be designed to withstand
environmental threats, such as earthquakes, floods, and fire.
Power Infrastructure:
Ensures a continuous and reliable power supply, including backup generators and uninterruptible
power supplies (UPS) to prevent data loss during power outages.
Cooling and HVAC:
Efficient cooling systems are necessary to maintain the proper temperature and humidity levels
inside the data center, as servers generate a significant amount of heat.
Racks and Cabinets:
Server Racks:
These house the servers, networking equipment, and other hardware. They are designed to
optimize space and airflow for cooling.
Cabinets:
Secure cabinets or enclosures to store networking equipment, switches, and patch panels.
Servers and Hardware:
Server Hardware:
The core computing components, including servers, storage devices, and backup systems.
Storage Systems:
Arrays of hard drives or solid-state drives (SSDs) for data storage and retrieval.
Networking Equipment:
Routers, switches, and firewalls to manage data traffic within the data center and across networks.
Power Distribution:
Power Distribution Units (PDUs):
Devices that distribute power to servers and networking equipment within racks and cabinets.
Redundancy:
Implementing redundancy in power distribution to ensure uninterrupted operation.
Cooling and Environmental Control:
Precision Air Conditioning: HVAC systems designed to maintain a controlled environment for
temperature and humidity.
Hot and Cold Aisle Containment: Arranging server racks in hot and cold aisles to optimize
cooling efficiency.
Network Infrastructure:
1.3.2 Compute systems
A personal computer tends to come with more input/output devices such as a mouse, keyboard,
and monitor so a user can interact directly with the system while servers do not usually include
components for direct user interaction. Of course, a computing system, even a data center server,
must have logical components such as an OS and other forms of system software.
Outside of a data center, computing systems can be large or small, depending on a user’s
computing needs. For example, a gamer will probably use a large gaming laptop or PC tower
because they need computing speed and space for good graphics, but that would be overkill for
someone just looking up a recipe to make dinner. They would be fine with using a tablet or phone.
However, in a data center, a computing system needs to be large enough to provide the amount
of processing power needed to store and process data for a large number of users at one time such
as for businesses, companies, and government agencies. That is why data centers use servers.
This is because unlike desktop computers, which focus their processing power on user-friendly
features like a desktop screen and media, servers are designed to use most of their processing
power to host services and to interact with other machines. This makes servers more efficient and
better equipped to deal with higher workloads.
Compute systems for data centers typically come in three form factors: tower, rack-mounted,
and blade.
The tower system is a traditional PC tower. These are also commonly used in computer labs and
offices so more than likely, you’ve used a tower system like a personal computer, but not as a
server.
The rack-mounted system is a thin, large rectangular compute system that slides onto the racks
of a frame. When the frame is full of rack-mounted systems, it resembles a tall metal set of
drawers. A typical metal frame used for this type of system is called a 19-inch rack, based on its
width. The height of the frame is measured by slots available for rack-mounted servers, called
rack units. For example, a normal 19-inch rack is 42U, which means it holds up to 42 rack-
mounted servers. A 42U rack is taller than the average person, but smaller frames are available,
such as 1U and 8U, which take up less space.
The blade server, like the rack-mounted, has rectangular hardware inserted into a larger frame.
However, these are usually inserted vertically into the frame, which would look like a set of
drawers on its side. The adoption of smaller form-factor blade servers is growing dramatically.
Since the transition to blade architectures is generally driven by a desire to consolidate physical
IT resources, virtualization is an ideal complement for blade servers because it delivers benefits
such as resource optimization, operational efficiency, and rapid provisioning.
1.3.3 Compute virtualization
Compute virtualization is a process by which a virtual version of computing hardware, operating
systems, computer networks, or other resources is created. It simplifies traditional architectures
in order to reduce the number of physical devices.
Compute virtualization is a process which enhances the efficiency and reduces the cost of IT
infrastructure. It provides a flexible model for virtual machines through which physical servers
are treated as a pool of resources. It works by consolidating the servers, and thus reduces the need
for computer equipment and other related hardware, thus reducing costs. It simplifies the
business procedures related to licensing and can make things more manageable. It creates a
centralized infrastructure that can be shared and accessed by employees at different locations at
the same time.
1.3.4 Software-defined data center

Software-defined data centers are considered by many to be the next step in the evolution of
virtualization, container and cloud services.
Two Mark Questions with Answers
1. Define Data
Ans:
Data is a collection of raw facts from which conclusions may be drawn. Handwritten letters, a printed book,
a family photograph, a movie on video tape, printed and duly signed copies of mortgage papers, a bank’s
ledgers, and an account holder’s passbooks are all examples of data.
2. Define Information
Ans:
Information is the intelligence and knowledge derived from data. Effective data analysis not only extends
its benefits to existing businesses, but also creates the potential for new business opportunities by using
the information in creative ways.
➢ Digital Data is any information that is processed and stored in a digital format, such as text, images,
audio, and video.
➢ It can be easily manipulated and transmitted electronically.
➢ This data is represented using binary code (0s and 1s)
➢ Examples - Emails, Social Media posts, Digital Images, Music Files, Video Files, and more.
➢ Structured Data
➢ Unstructured Data
➢ Semi - Structured Data
Examples:
➢ Text documents, images and videos, social media posts and comments, email messages.
Unstructured data does not follow the format of a tabular data model or relational databases and
does not have a fixed schema. Semi-structured data is not completely raw or unstructured; it
contains some structural elements, such as tags and organizational metadata, that make it easier
to analyze.
Inexpensive and easier ways to create, collect, and store all types of data, coupled with increasing
individual and business needs, have led to accelerated data growth, popularly termed the data explosion.
Data has different purposes and criticality, so both individuals and businesses have contributed in varied
proportions to this data explosion.
➢ Data created by individuals or businesses must be stored so that it is easily accessible for further
processing. In a computing environment, devices designed for storing data are termed storage devices
or simply storage.
➢ This type of storage used varies based on the type of data and the rate at which it is created and used.
➢ Devices such as memory in a cell phone or digital camera, DVDs, CD-ROMs, and hard disks in
personal computers are examples of storage devices.
➢ Businesses have several options available for storing data including internal hard disks, external disk
arrays and tapes.
➢ Information Storage refers to the process of collecting, preserving, and organizing data in a manner
that allows for efficient retrieval and use at a later time.
➢ Information storage systems are an integral part of our daily lives, playing a crucial role in various
fields, including Business, Science, Education, Personal communication.
13. What is meant by Cloud Computing?
Ans: Cloud computing is the delivery of computing services—including servers, storage, databases,
networking, software, analytics, and intelligence—over the Internet (“the cloud”) to offer faster
innovation, flexible resources, and economies of scale.
The information lifecycle is the “change in the value of information” over time. When data is first created,
it often has the highest value and is used frequently. As data ages, it is accessed less frequently and is of
less value to the organization.
➢ Public Cloud
➢ Private Cloud
➢ Hybrid Cloud
➢ Community Cloud
➢ Multi-Cloud
17. What is public cloud with example?
Ans: The public cloud is defined as computing services offered by third-party providers over the public
Internet, making them available to anyone who wants to use or purchase them. They may be free or sold
on-demand, allowing customers to pay only per usage for the CPU cycles, storage, or bandwidth they
consume.
Examples: Google Workspace, Amazon Web Services (AWS), Dropbox, and Microsoft offerings like
Microsoft 365 and Azure, as well as streaming services like Netflix.
Amazon VPC, HPE, VMware, and IBM. They leverage technologies like Virtualization, Management
Software, and Automation to achieve this. A private cloud can also leverage DevOps and cloud-native
practices to maximize agility.
Software as a Service (SaaS) - required software, operating system, and network are provided.
Anything/Everything as a Service (XaaS) - a combination of all the services, with some additional services.
Function as a Service (FaaS) - similar to PaaS.
➢ Routers,
➢ Switches,
➢ Firewalls,
➢ Storage systems,
➢ Servers,
➢ Application-delivery controllers.
Review Questions:
UNIT 2 - INTELLIGENT STORAGE SYSTEMS AND RAID

2.1 Intelligent Storage System
Intelligent storage systems are arrays that provide highly optimized I/O processing
capabilities. These arrays have an operating environment that controls the management,
allocation, and utilization of storage resources. These storage systems are configured with large
amounts of memory called cache and use sophisticated algorithms to meet the I/O requirements
of performance sensitive applications.
2.1.1 Components of an Intelligent Storage System
An intelligent storage system consists of four key components: front end, cache, back end, and
physical disks. Figure 4-1 illustrates these components and their interconnections. An I/O request
received from the host at the front-end port is processed through cache and the back end, to enable
storage and retrieval of data from the physical disk. A read request can be serviced directly from
cache if the requested data is found in cache.
With command queuing, multiple commands can be executed concurrently based on the organization
of data on the disk, regardless of the order in which the commands were received.
The most commonly used command queuing algorithms are as follows:
■ First In First Out (FIFO): This is the default algorithm where commands are executed in the
order in which they are received (Figure 4-2 [a]). There is no reordering of requests for
optimization; therefore, it is inefficient in terms of performance.
■ Seek Time Optimization: Commands are executed based on optimizing read/write head
movements, which may result in reordering of commands. Without seek time optimization, the
commands are executed in the order they are received. For example, as shown in Figure 4-2(a),
the commands are executed in the order A, B, C and D. The radial movement required by the
head to execute C immediately after A is less than what would be required to execute B. With
seek time optimization, the command execution sequence would be A, C, B and D, as shown in
Figure 4-2(b).
■ Access Time Optimization: Commands are executed based on the combination of seek time
optimization and an analysis of rotational latency for optimal performance.
Command queuing can also be implemented on disk controllers and this may further supplement
the command queuing implemented on the front-end controllers. Some models of SCSI and Fibre
Channel drives have command queuing implemented on their controllers.
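Seek-time reordering can be illustrated with a short sketch. This is a simplified model rather than any vendor's algorithm: each command is reduced to the track it targets (the track numbers are hypothetical), and the queue is always served by picking the command closest to the current head position.

    # Simplified seek-time optimization: always service the queued command
    # whose target track is closest to the current head position.
    def seek_optimized_order(queue, head_position=0):
        order = []
        pending = list(queue)
        while pending:
            # choose the command requiring the smallest radial head movement
            nearest = min(pending, key=lambda cmd: abs(cmd[1] - head_position))
            pending.remove(nearest)
            order.append(nearest[0])
            head_position = nearest[1]
        return order

    # Commands arrive in the order A, B, C, D (hypothetical target tracks).
    commands = [("A", 10), ("B", 90), ("C", 25), ("D", 60)]
    print(seek_optimized_order(commands))  # ['A', 'C', 'D', 'B']; FIFO would give A, B, C, D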
2.1.1.2 Cache
Cache is an important component that enhances the I/O performance in an intelligent storage
system. Cache is semiconductor memory where data is placed temporarily to reduce the time
required to service I/O requests from the host. Cache improves storage system performance by
isolating hosts from the mechanical delays associated with physical disks, which are the slowest
components of an intelligent storage system. Accessing data from a physical disk usually takes a
few milliseconds because of seek times and rotational latency. If a disk has to be accessed by the
host for every I/O operation, requests are queued, which results in a delayed response. Accessing
data from cache takes less than a millisecond. Write data is placed in cache and then written to
disk. After the data is securely placed in cache, the host is acknowledged immediately.
Structure of Cache
Cache is organized into pages or slots, which is the smallest unit of cache allocation. The size of
a cache page is configured according to the application I/O size. Cache consists of the data store
and tag RAM. The data store holds the data while tag RAM tracks the location of the data in the
data store (see Figure 4-3) and in disk. Entries in tag RAM indicate where data is found in cache
and where the data belongs on the disk. Tag RAM includes a dirty bit flag, which indicates
whether the data in cache has been committed to the disk or not. It also contains time-based
information, such as the time of last access, which is used to identify cached information that has
not been accessed for a long period and may be freed up.
When a host issues a read request, the front-end controller accesses the tag RAM to determine
whether the required data is available in cache. If the requested data is found in the cache, it is
called a read cache hit or read hit and data is sent directly to the host, without any disk operation
(see Figure 4-4[a]). This provides a fast response time to the host (about a millisecond). If the
requested data is not found in cache, it is called a cache miss and the data must be read from the
disk (see Figure 4-4[b]). The back-end controller accesses the appropriate disk and retrieves the
requested data. Data is then placed in cache and is finally sent to the host through the front-end
controller. Cache misses increase I/O response time.
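The read hit/miss path described above can be sketched in a few lines, with the tag RAM modeled as a dictionary mapping a disk block number to a cache entry. The structures and names are illustrative, not an actual array implementation.

    # Toy model of a read through cache: tag RAM maps disk block -> cache entry.
    tag_ram = {}                      # block number -> {"data", "dirty", "last_access"}
    disk = {42: "block-42-data"}      # stand-in for the physical disks

    def read_block(block, now):
        entry = tag_ram.get(block)
        if entry is not None:         # read hit: served directly from cache
            entry["last_access"] = now
            return entry["data"], "hit"
        data = disk[block]            # read miss: back end fetches from disk
        tag_ram[block] = {"data": data, "dirty": False, "last_access": now}
        return data, "miss"

    print(read_block(42, now=1))      # ('block-42-data', 'miss') on first access
    print(read_block(42, now=2))      # ('block-42-data', 'hit') once cached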
A pre-fetch, or read-ahead, algorithm is used when read requests are sequential. In a sequential
read request, a contiguous set of associated blocks is retrieved. The intelligent storage system
offers fixed and variable pre-fetch sizes.
In fixed pre-fetch, the intelligent storage system pre-fetches a fixed amount of data. It is most
suitable when I/O sizes are uniform.
In variable pre-fetch, the storage system pre-fetches an amount of data in multiples of the size of
the host request. Maximum pre-fetch limits the number of data blocks that can be pre-fetched to
prevent the disks from being rendered busy with pre-fetch at the expense of other I/O.
Read performance is measured in terms of the read hit ratio, or the hit rate, usually expressed as
a percentage. This ratio is the number of read hits with respect to the total number of read requests.
A higher read hit ratio improves the read performance.
Write Operation with Cache
Write operations with cache provide performance advantages over writing directly to disks. When
an I/O is written to cache and acknowledged, it is completed in far less time (from the host’s
perspective) than it would take to write directly to disk. Sequential writes also offer opportunities
for optimization because many smaller writes can be coalesced for larger transfers to disk drives
with the use of cache.
A write operation with cache is implemented in the following ways:
■ Write-back cache: Data is placed in cache and an acknowledgment is sent to the host
immediately. Later, data from several writes are committed (de-staged) to the disk. Write response
times are much faster, as the write operations are isolated from the mechanical delays of the disk.
However, uncommitted data is at risk of loss in the event of cache failures.
■ Write-through cache: Data is placed in the cache and immediately written to the disk, and an
acknowledgment is sent to the host. Because data is committed to disk as it arrives, the risks of
data loss are low but write response time is longer because of the disk operations.
Cache can be bypassed under certain conditions, such as very large write I/Os. In this
implementation, if the size of an I/O request exceeds a predefined size, called the write aside size,
writes are sent directly to the disk to reduce the impact of large writes consuming a large cache
area. This is particularly useful in an environment where cache resources are constrained and
must be made available for small random I/Os.
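The three write behaviours can be contrasted in a small sketch. The threshold value and acknowledgement points below are assumptions chosen for illustration, not parameters of any particular array.

    WRITE_ASIDE_SIZE = 256 * 1024      # hypothetical write aside size, in bytes

    def handle_write(io_size, mode, cache, disk):
        if io_size > WRITE_ASIDE_SIZE: # write aside: large I/Os bypass cache
            disk.append(io_size)
            return "ack after direct disk write (cache bypassed)"
        if mode == "write-through":
            cache.append(io_size)
            disk.append(io_size)       # committed to disk before acknowledging
            return "ack after disk write"
        cache.append(io_size)          # write-back: acknowledged immediately
        return "ack from cache (de-staged to disk later)"

    print(handle_write(4 * 1024, "write-back", [], []))
    print(handle_write(4 * 1024, "write-through", [], []))
    print(handle_write(512 * 1024, "write-back", [], []))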
Cache Implementation
Cache can be implemented as either dedicated cache or global cache. With dedicated cache,
separate sets of memory locations are reserved for reads and writes. In global cache, both reads
and writes can use any of the available memory addresses. Cache management is more efficient
in a global cache implementation, as only one global set of addresses has to be managed.
Global cache may allow users to specify the percentages of cache available for reads and writes
in cache management. Typically, the read cache is small, but it should be increased if the
application being used is read intensive. In other global cache implementations, the ratio of cache
available for reads versus writes is dynamically adjusted based on the workloads.
Cache Management
Cache is a finite and expensive resource that needs proper management. Even though intelligent
storage systems can be configured with large amounts of cache, when all cache pages are filled,
some pages have to be freed up to accommodate new data and avoid performance degradation.
Various cache management algorithms are implemented in intelligent storage systems to
proactively maintain a set of free pages and a list of pages that can be potentially freed up
whenever required:
■ Least Recently Used (LRU): An algorithm that continuously monitors data access in cache
and identifies the cache pages that have not been accessed for a long time. LRU either frees up
these pages or marks them for reuse. This algorithm is based on the assumption that data which
hasn’t been accessed for a while will not be requested by the host. However, if a page contains
write data that has not yet been committed to disk, data will first be written to disk before the
page is reused.
■ Most Recently Used (MRU): An algorithm that is the converse of LRU. In MRU, the pages that
have been accessed most recently are freed up or marked for reuse. This algorithm is based on
the assumption that recently accessed data may not be required for a while.
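LRU page selection maps naturally onto an ordered dictionary. The sketch below is a simplification: page contents are placeholder strings and de-staging is reduced to a print statement.

    from collections import OrderedDict

    CACHE_PAGES = 3                    # deliberately tiny cache for illustration
    cache = OrderedDict()              # page id -> (data, dirty flag)

    def access_page(page, data, dirty=False):
        if page in cache:
            cache.move_to_end(page)    # most recently used moves to the back
        cache[page] = (data, dirty)
        if len(cache) > CACHE_PAGES:
            victim, (_, vdirty) = cache.popitem(last=False)  # least recently used
            if vdirty:                 # dirty pages must be committed first
                print(f"de-stage page {victim} to disk before reuse")

    for p in ("A", "B", "C", "A", "D"):  # accessing D evicts B, the LRU page
        access_page(p, data=f"data-{p}", dirty=(p == "B"))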
As cache fills, the storage system must take action to flush dirty pages (data written into the cache
but not yet written to the disk) in order to manage its availability.
Flushing is the process of committing data from cache to the disk. On the basis of the I/O access
rate and pattern, high and low levels called watermarks are set in cache to manage the flushing
process.
1. High watermark (HWM) is the cache utilization level at which the storage system starts
high speed flushing of cache data.
2. Low watermark (LWM) is the point at which the storage system stops the high-speed or
forced flushing and returns to idle flush behaviour.
The cache utilization level, as shown in Figure 4-5, drives the mode of flushing to be used:
■ Idle flushing: Occurs continuously, at a modest rate, when the cache utilization level is between
the high and low watermark.
■ High watermark flushing: Activated when cache utilization hits the high watermark. The
storage system dedicates some additional resources to flushing. This type of flushing has minimal
impact on host I/O processing.
■ Forced flushing: Occurs in the event of a large I/O burst when cache reaches 100 percent of
its capacity, which significantly affects the I/O response time. In forced flushing, dirty pages are
forcibly flushed to disk.
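Choosing a flush mode reduces to comparing the utilization level against the watermarks. The watermark percentages below are hypothetical values for illustration only.

    LWM, HWM = 40, 80                  # hypothetical low/high watermarks (% utilization)

    def flush_mode(utilization):
        if utilization >= 100:
            return "forced flushing"   # cache full: dirty pages flushed forcibly
        if utilization >= HWM:
            return "high watermark flushing"  # extra resources devoted until LWM is reached
        return "idle flushing"         # continuous, modest-rate flushing

    for u in (35, 70, 85, 100):
        print(u, "->", flush_mode(u))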
Cache Data Protection
Cache is volatile memory, so a power failure or any kind of cache failure will cause the loss of
data not yet committed to the disk. This risk of losing uncommitted data held in cache can be
mitigated using cache mirroring and cache vaulting:
■ Cache mirroring: Each write to cache is held in two different memory locations on two
independent memory cards. In the event of a cache failure, the write data will still be safe in the
mirrored location and can be committed to the disk. Reads are staged from the disk to the cache;
therefore, in the event of a cache failure, the data can still be accessed from the disk. As only
writes are mirrored, this method results in better utilization of the available cache. In cache
mirroring approaches, maintaining cache coherency, that is, keeping the data in the two cache
locations identical at all times, is the responsibility of the array operating environment.
■ Cache vaulting: Cache is exposed to the risk of uncommitted data loss due to power failure.
This problem can be addressed in various ways: powering the memory with a battery until AC
power is restored or using battery power to write the cache content to the disk. In the event of
extended power failure, using batteries is not a viable option because in intelligent storage
systems, large amounts of data may need to be committed to numerous disks and batteries may
not provide power for sufficient time to write each piece of data to its intended disk. Therefore,
storage vendors use a set of physical disks to dump the contents of cache during power failure.
This is called cache vaulting and the disks are called vault drives. When power is restored, data
from these disks is written back to write cache and then written to the intended disks.
2.1.1.3 Back End
The back end provides an interface between cache and the physical disks. It consists of two
components:
1. Back-end ports
2. Back-end controllers.
The back end controls data transfers between cache and the physical disks. From cache, data is
sent to the back end and then routed to the destination disk. Physical disks are connected to ports
on the back end.
The back-end controller communicates with the disks when performing reads and writes and also
provides additional, but limited, temporary data storage.
The algorithms implemented on back-end controllers provide error detection and correction,
along with RAID functionality.
For high data protection and availability, storage systems are configured with dual controllers
with multiple ports. Such configurations provide an alternate path to physical disks in the event
of a controller or port failure.
This reliability is further enhanced if the disks are also dual-ported. In that case, each disk port
can connect to a separate controller. Multiple controllers also facilitate load balancing.
2.1.1.4 Physical Disk
A physical disk stores data persistently. Disks are connected to the back-end with either SCSI or
a Fibre Channel interface (discussed in subsequent chapters). An intelligent storage system
enables the use of a mixture of SCSI or Fibre Channel drives and IDE/ATA drives.
Logical Unit Number:
Physical drives or groups of RAID protected drives can be logically split into volumes known as
logical volumes, commonly referred to as Logical Unit Numbers (LUNs).
The use of LUNs improves disk utilization. For example, without the use of LUNs, a host
requiring only 200 GB could be allocated an entire 1TB physical disk. Using LUNs, only the
required 200 GB would be allocated to the host, allowing the remaining 800 GB to be allocated
to other hosts. In the case of RAID protected drives, these logical units are slices of RAID sets
and are spread across all the physical disks belonging to that set. The logical units can also be
seen as a logical partition of a RAID set that is presented to a host as a physical disk. For example,
Figure 4-6 shows a RAID set consisting of five disks that have been sliced, or partitioned, into
several LUNs. LUNs 0 and 1 are shown in the figure.
Note how a portion of each LUN resides on each physical disk in the RAID set. LUNs 0 and 1
are presented to hosts 1 and 2, respectively, as physical volumes for storing and retrieving data.
Usable capacity of the physical volumes is determined by the RAID type of the RAID set.
The capacity of a LUN can be expanded by aggregating other LUNs with it. The result of this
aggregation is a larger capacity LUN, known as a meta LUN. The mapping of LUNs to their
physical location on the drives is managed by the operating environment of an intelligent storage
system.
LUN Masking:
LUN masking is a process that provides data access control by defining which LUNs a host can
access. LUN masking function is typically implemented at the front-end controller. This ensures
that volume access by servers is controlled appropriately, preventing unauthorized or accidental
use in a distributed environment. For example, consider a storage array with two LUNs that store
data of the sales and finance departments. Without LUN masking, both departments can easily
see and modify each other’s data, posing a high risk to data integrity and security. With LUN
masking, LUNs are accessible only to the designated hosts.
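Conceptually, LUN masking is an access-control lookup performed at the front end. A toy sketch with hypothetical host and LUN names:

    # Hypothetical masking table: which LUNs each host is allowed to access.
    lun_masking = {
        "sales_host":   {"LUN0"},
        "finance_host": {"LUN1"},
    }

    def can_access(host, lun):
        return lun in lun_masking.get(host, set())

    print(can_access("sales_host", "LUN0"))  # True: designated host
    print(can_access("sales_host", "LUN1"))  # False: masked from finance data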
2.1.2 Disk Drive Components
2.1.2.1 Platter
A typical HDD consists of one or more flat circular disks called platters (Figure 2-3). The data is
recorded on these platters in binary codes (0s and 1s). The set of rotating platters is sealed in a
case, called a Head Disk Assembly (HDA). A platter is a rigid, round disk coated with magnetic
material on both surfaces (top and bottom). The data is encoded by polarizing the magnetic area,
or domains, of the disk surface. Data can be written to or read from both surfaces of the platter.
The number of platters and the storage capacity of each platter determine the total capacity of the
drive.
2.1.2.2 Spindle
A spindle connects all the platters, as shown in Figure 2-3, and is connected to a motor. The
spindle motor rotates at a constant speed. The disk platter spins at a speed of several thousands
of revolutions per minute (rpm). Disk drives have spindle speeds of 7,200 rpm, 10,000 rpm, or
15,000 rpm. Disks used on current storage systems have a platter diameter of 3.5” (90 mm). When
the platter spins at 15,000 rpm, the outer edge is moving at around 25 percent of the speed of
sound. The speed of the platter is increasing with improvements in technology, although the extent
to which it can be improved is limited.
2.1.2.3 Read/Write Head
Read/Write (R/W) heads, shown in Figure 2-4, read and write data from or to a platter. Drives
have two R/W heads per platter, one for each surface of the platter. The R/W head changes the
magnetic polarization on the surface of the platter when writing data. While reading data, this
head detects magnetic polarization on the surface of the platter. During reads and writes, the R/W
head senses the magnetic polarization and never touches the surface of the platter. When the
spindle is rotating, there is a microscopic air gap between the R/W heads and the platters, known
as the head flying height. This air gap is removed when the spindle stops rotating and the R/W
head rests on a special area on the platter near the spindle. This area is called the landing zone.
The landing zone is coated with a lubricant to reduce friction between the head and the platter.
The logic on the disk drive ensures that heads are moved to the landing zone before they touch
the surface. If the drive malfunctions and the R/W head accidentally touches the surface of the
platter outside the landing zone, a head crash occurs. In a head crash, the magnetic coating on the
platter is scratched and may cause damage to the R/W head. A head crash generally results in data
loss.
2.1.2.4 Actuator Arm Assembly
The R/W heads are mounted on the actuator arm assembly (refer to Figure 2-2 [a]) which
positions the R/W head at the location on the platter where the data needs to be written or read.
The R/W heads for all platters on a drive are attached to one actuator arm assembly and move
across the platters simultaneously. Note that there are two R/W heads per platter, one for each
surface, as shown in Figure 2-4.
2.1.2.5 Controller
The controller (see Figure 2-2 [b]) is a printed circuit board, mounted at the bottom of a disk
drive. It consists of a microprocessor, internal memory, circuitry, and firmware. The firmware
controls power to the spindle motor and the speed of the motor. It also manages communication
between the drive and the host. In addition, it controls the R/W operations by moving the actuator
arm and switching between different R/W heads, and performs the optimization of data access.
2.1.2.6 Physical Disk Structure
Data on the disk is recorded on tracks, which are concentric rings on the platter around the spindle,
as shown in Figure 2-5. The tracks are numbered, starting from zero, from the outer edge of the
platter. The number of tracks per inch (TPI) on the platter (or the track density) measures how
tightly the tracks are packed on a platter.
Each track is divided into smaller units called sectors.
A sector is the smallest, individually addressable unit of storage. The track and sector structure is
written on the platter by the drive manufacturer using a formatting operation. The number of
sectors per track varies according to the specific drive.
The first personal computer disks had 17 sectors per track. Recent disks have a much larger
number of sectors on a single track. There can be thousands of tracks on a platter, depending on
the physical dimensions and recording density of the platter.
An SSD, or solid-state drive, is a type of storage device used in computers. This non-volatile
storage media stores persistent data on solid-state flash memory. SSDs replace traditional hard
disk drives (HDDs) in computers and perform the same basic functions as a hard drive. But SSDs
are significantly faster in comparison. With an SSD, the device's operating system will boot up
more rapidly, programs will load quicker and files can be saved faster.
A traditional hard drive consists of a spinning disk with a read/write head on a mechanical arm
called an actuator, and it reads and writes data magnetically. These moving mechanical parts,
however, are prone to breakdown.
By comparison, an SSD has no moving parts to break or spin up or down. The two key
components in an SSD are the flash controller and NAND flash memory chips. This configuration
is optimized to deliver high read/write performance for sequential and random data requests.
Flash Memory Chips: Data is stored on solid-state flash memory chips, which are interconnected
and fabricated out of silicon. SSDs are manufactured by stacking these chips in a grid to achieve
different densities.
Flash Controller: It is an in-built microprocessor that takes care of functions like error
correction, data retrieval, and encryption. It also controls access to input/output (I/O) and
read/write (R/W) operations between the SSD and the host computer.
2.1.3 Addressing of Hard Disk Drives and Solid-State Drives
Addressing of Hard Disk Drive:
2.1.3.1 Logical Block Addressing
Earlier drives used physical addresses consisting of the cylinder, head, and sector (CHS) number
to refer to specific locations on the disk, as shown in Figure 2-7 (a), and the host operating system
had to be aware of the geometry of each disk being used. Logical block addressing (LBA), shown
in Figure 2-7 (b), simplifies addressing by using a linear address to access physical blocks of data.
The disk controller translates LBA to a CHS address, and the host only needs to know the size of
the disk drive in terms of the number of blocks. The logical blocks are mapped to physical sectors
on a 1:1 basis.
In Figure 2-7 (b), the drive shows eight sectors per track, eight heads, and four cylinders. This
means a total of 8 × 8 × 4 = 256 blocks, so the block number ranges from 0 to 255. Each block
has its own unique address. Assuming that the sector holds 512 bytes, a 500 GB drive with a
formatted capacity of 465.7 GB will have in excess of 976,000,000 blocks.
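The translation the disk controller performs follows directly from the geometry. A sketch using the example geometry above (8 sectors per track, 8 heads, 4 cylinders); by convention sectors are numbered from 1:

    HEADS, SECTORS_PER_TRACK = 8, 8    # geometry from the example above

    def lba_to_chs(lba):
        cylinder = lba // (HEADS * SECTORS_PER_TRACK)
        head = (lba % (HEADS * SECTORS_PER_TRACK)) // SECTORS_PER_TRACK
        sector = lba % SECTORS_PER_TRACK + 1   # sectors are 1-based
        return cylinder, head, sector

    print(lba_to_chs(0))     # (0, 0, 1): the first block
    print(lba_to_chs(255))   # (3, 7, 8): the last of the 256 blocks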
Addressing of Solid-state drives:
The logical block address (LBA) is the standard used to specify the address for write and read
commands. Each LBA addresses a 512-byte sector of the device's storage space, although
variations of the sector size occasionally exist.
Data Transfer Rate: To calculate data transfer rates, it is important to understand the process of
read and write operations. In a read operation, the data first moves
from disk platters to R/W heads, and then it moves to the drive’s internal buffer. Finally, data
moves from the buffer through the interface to the host HBA. In a write operation, the data moves
from the HBA to the internal buffer of the disk drive through the drive’s interface. The data then
moves from the buffer to the R/W heads. Finally, it moves from the R/W heads to the platters.
Little's Law relates the average number of I/O requests outstanding in a system (N) to the arrival
rate of requests (a) and the average response time (R): N = a × R. For example, at an arrival rate
of 1,000 IOPS and a response time of 5 ms, an average of 1,000 × 0.005 = 5 requests are outstanding.
The data transfer rates during the R/W operations are measured in terms of internal and external
transfer rates, as shown in Figure 2-8.
Internal transfer rate is the speed at which data moves from a single track of a platter’s surface to
internal buffer (cache) of the disk. Internal transfer rate takes into account factors such as the seek
time. External transfer rate is the rate at which data can be moved through the interface to the
HBA. External transfer rate is generally the advertised speed of the interface, such as 133 MB/s
for ATA. The sustained external transfer rate is lower than the interface speed.
Performance of Solid-State Drives:
1. IOPS. This acronym stands for input/output operations per second. The metric measures how
many reads and writes an SSD can handle per second. The higher the IOPS, the better.
2. Throughput. This is the SSD's data transfer speed, measured in bytes per second. The higher
the throughput, the better, although throughput is affected by elements such as file size and
whether the reads and writes are random or sequential.
3. Latency. This shows how long it takes to process an I/O operation. This process translates to
SSD response time and is measured in microseconds or milliseconds. The lower the latency, the
better.
Solid-state drives are much faster than hard disk drives, and the speed difference between the two
types is significant. When moving big files, HDDs can copy 30 to 150 MB per second (MB/s),
while standard SATA SSDs perform the same action at speeds of 500 MB/s. Newer NVMe SSDs
can get up to astounding speeds: 3,000 to 3,500 MB/s.
With an SSD, one can copy a 20 GB movie in less than 10 seconds, while a hard disk would take
at least two minutes. Upgrading your Mac to an SSD or installing an SSD in your PC will give it
a significant speed boost.
High-end storage systems, generally referred to as active-active arrays, are aimed at large
enterprises. To address their storage needs, these arrays provide the following capabilities:
■ Large storage capacity.
■ Large amounts of cache to service host I/Os optimally.
■ Fault tolerance architecture to improve data availability.
■ Connectivity to mainframe computers and open systems hosts.
■ Availability of multiple front-end ports and interface protocols to serve a large number of hosts.
■ Availability of multiple back-end Fibre Channel or SCSI RAID controllers to manage disk
processing.
■ Scalability to support the increased connectivity, performance, and the storage capacity
requirements.
■ Ability to handle large amounts of concurrent I/Os from a number of servers and applications.
■ Support for array-based local and remote replication.
In addition to these features, high-end arrays possess some unique features and functions that
are required for mission-critical applications in large enterprises.
Midrange arrays are designed to meet the requirements of small and medium enterprises;
therefore, they host less storage capacity and global cache than active-active arrays.
There are also fewer front-end ports for connection to servers.
However, they ensure high redundancy and high performance for applications with predictable
workloads. They also support array-based local and remote replication.
2.2 Data Protection: RAID (Redundant Array of Independent Disks)
RAID (redundant array of independent disks) is a way of storing the same data in different places
on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure.
There are different RAID levels, however, and not all have the goal of providing redundancy.
2.2.1 Implementation of RAID
There are two types of RAID implementation, hardware and software. Both have their merits and
demerits and are discussed in this section.
HDDs inside a RAID array are usually contained in smaller sub-enclosures. These sub-
enclosures, or physical arrays, hold a fixed number of HDDs, and may also include other
supporting hardware, such as power supplies. A subset of disks within a RAID array can be
grouped to form logical associations called logical arrays, also known as a RAID set or a RAID
group (see Figure 3-1).
Logical arrays consist of logical volumes (LV). The operating system recognizes the LVs
as if they are physical HDDs managed by the RAID controller. The number of HDDs in a logical
array depends on the RAID level used. Configurations could have a logical array with multiple
physical arrays or a physical array with multiple logical arrays.
Strip size (also called stripe depth) describes the number of blocks in a strip, and is the maximum
amount of data that can be written to or read from a single HDD in the set before the next HDD
is accessed, assuming that the accessed data starts at the beginning of the strip. Note that all strips
in a stripe have the same number of blocks, and decreasing strip size means that data is broken
into smaller pieces when spread across the disks.
Stripe size equals the strip size multiplied by the number of HDDs in the RAID set; for example,
a strip size of 64 KB across five HDDs gives a stripe size of 320 KB. Stripe width refers to the
number of data strips in a stripe.
Striped RAID does not protect data unless parity or mirroring is used. However, striping may
significantly improve I/O performance. Depending on the type of RAID implementation, the
RAID controller can be configured to access data across multiple HDDs simultaneously.
2.2.3.2 Mirroring
Mirroring is a technique whereby data is stored on two different HDDs, yielding two copies of
data. In the event of one HDD failure, the data is intact on the surviving HDD (see Figure 3-3)
and the controller continues to service the host’s data requests from the surviving disk of a
mirrored pair.
When the failed disk is replaced with a new disk, the controller copies the data from the surviving
disk of the mirrored pair. This activity is transparent to the host.
In addition to providing complete data redundancy, mirroring enables faster recovery from disk
failure. However, disk mirroring provides only data protection and is not a substitute for data
backup. Mirroring constantly captures changes in the data, whereas a backup captures point-in-
time images of data.
Mirroring involves duplication of data — the amount of storage capacity needed is twice the
amount of data being stored. Therefore, mirroring is considered expensive and is preferred for
mission-critical applications that cannot afford data loss.
Mirroring improves read performance because read requests can be serviced by both disks.
However, write performance deteriorates, as each write request manifests as two writes on the
HDDs. In other words, mirroring does not deliver the same levels of write performance as a
striped RAID
2.2.3.3 Parity
Parity is a method of protecting striped data from HDD failure without the cost of mirroring. An
additional HDD is added to the stripe width to hold parity, a mathematical construct that allows
re-creation of the missing data. Parity is a redundancy check that ensures full protection of data
without maintaining a full set of duplicate data.
Parity information can be stored on separate, dedicated HDDs or distributed across all the drives
in a RAID set. Figure 3-4 shows a parity RAID. The first four disks, labeled D, contain the data.
The fifth disk, labeled P, stores the parity information, which in this case is the sum of the
elements in each row. Now, if one of the Ds fails, the missing value can be calculated by
subtracting the sum of the rest of the elements from the parity value.
In Figure 3-4, the computation of parity is represented as a simple arithmetic operation on the
data. However, parity calculation is a bitwise XOR operation. Calculation of parity is a function
of the RAID controller.
Compared to mirroring, parity implementation considerably reduces the cost associated with data
protection. Consider a RAID configuration with five disks.
Four of these disks hold data, and the fifth holds parity information. Parity requires 25 percent
extra disk space compared to mirroring, which requires 100 percent extra disk space. However,
there are some disadvantages of using parity.
Parity information is generated from data on the data disk. Therefore, parity is recalculated every
time there is a change in data. This recalculation is time-consuming and affects the performance
of the RAID controller.
Advantages of RAID 0
• It is easy to implement.
• It utilizes the storage capacity in a better way.
Disadvantages
• A single drive loss can result in the complete failure of the system.
• Not a good choice for a critical system.
Disadvantages of RAID 1
• It is highly expensive.
• Usable storage capacity is only half of the total drive capacity.
When replacing a failed drive, only the mirror is rebuilt. In other words, the disk array controller
uses the surviving drive in the mirrored pair for data recovery and continuous operation. Data
from the surviving disk is copied to the replacement disk.
RAID 0+1 is also called mirrored stripe. The basic element of RAID 0+1 is a stripe. This means
that the process of striping data across HDDs is performed initially and then the entire stripe is
mirrored. If one drive fails, then the entire stripe is faulted.
A rebuild operation copies the entire stripe, copying data from each disk in the healthy stripe to
an equivalent disk in the failed stripe. This causes increased and unnecessary I/O load on the
surviving disks and makes the RAID set more vulnerable to a second disk failure.
Advantages:
• Using multiple hard drives enables nested RAID to outperform a single hard drive.
• With RAID 0, reads and writes can be performed faster than with a single drive, because
the file system is split up and distributed across drives that work together on the same
file.
Disadvantages:
• Nested RAID levels are more expensive to implement than traditional RAID levels,
because they require more disks.
• The cost per gigabyte for storage devices is higher for nested RAID because many of the
drives are used for redundancy.
2.2.3.7 RAID 2: Bit-Level Striping with Dedicated Parity
In RAID 2, data is checked for errors at the bit level using the Hamming code parity method,
with dedicated drives storing the parity information.
The structure of RAID 2 is complex because it uses two groups of disks: one group stores the
bits of each data word, and the other stores the error-correction code.
It is not commonly used.
2.2.3.8 RAID 3: Byte-Level Striping with Dedicated Parity / Parallel access array with
dedicated parity disks
RAID 3 stripes data for high performance and uses parity for improved fault tolerance. Parity
information is stored on a dedicated drive so that data can be reconstructed if a drive fails. For
example, out of five disks, four are used for data and one is used for parity. Therefore, the total
disk space required is 1.25 times the size of the data disks. RAID 3 always reads and writes complete
stripes of data across all disks, as the drives operate in parallel. There are no partial writes that
update one out of many strips in a stripe. Figure 3-8 illustrates the RAID 3 implementation.
RAID 3 provides good bandwidth for the transfer of large volumes of data.
RAID 3 is used in applications that involve large sequential data access, such as video streaming.
2.2.3.9 RAID 4: Block-Level Striping with Dedicated Parity / Striped array with
independent disks and a dedicated parity disk
Similar to RAID 3, RAID 4 stripes data for high performance and uses parity for improved fault
tolerance (refer to Figure 3-8). Data is striped across all disks except the parity disk in the array.
Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails.
Striping is done at the block level.
Unlike RAID 3, data disks in RAID 4 can be accessed independently, so that specific data
elements can be read or written on a single disk without reading or writing an entire stripe.
RAID 4 provides good read throughput and reasonable write throughput.
Evaluation
Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity works). If more than
one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made available
for data storage, each disk having B blocks.
2.2.3.10 RAID 5: Block-Level Striping with Distributed Parity / Striped array with
independent disks and distributed parity
Evaluation
Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If more than
one disk fails, there is no way to recover the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
2.2.3.11 RAID 6: Block-Level Striping with Dual Parity / Striped array with
independent disks & dual distributed parity
RAID 6 works the same way as RAID 5 except that RAID 6 includes a second parity element to
enable survival in the event of the failure of two disks in a RAID group (see Figure 3-10).
Therefore, a RAID 6 implementation requires at least four disks. RAID 6 distributes the parity
across all the disks.
The write penalty in RAID 6 is more than that in RAID 5; therefore, RAID 5 writes perform
better than RAID 6. The rebuild operation in RAID 6 may take longer than that in RAID 5 due
to the presence of two parity sets.
Advantages of RAID
Data redundancy: By keeping numerous copies of the data on many disks, RAID can shield data
from disk failures.
Performance enhancement: RAID can enhance performance by distributing data over several
drives, enabling the simultaneous execution of several read/write operations.
Scalability: RAID is scalable, therefore by adding more disks to the array, the storage capacity
may be expanded.
Versatility: RAID is applicable to a wide range of devices, such as workstations, servers, and
personal computers.
Disadvantages of RAID
Cost: RAID implementation can be costly, particularly for arrays with large capacities.
Complexity: The setup and management of RAID might be challenging.
Decreased performance: The parity calculations necessary for some RAID configurations,
including RAID 5 and RAID 6, may result in a decrease in speed.
Single point of failure: While RAID offers data redundancy, it is not a comprehensive backup
solution. The array's entire contents could be lost if the RAID controller malfunctions.
2.2.4 RAID Comparison
Table 3-2 compares the different types of RAID.
2.2.5 RAID Impact on Disk Performance
In a parity RAID configuration, every stripe stores data elements (here E1 through E4) together
with a parity element computed from them:
Ep = E1 + E2 + E3 + E4 (XOR operations)
Whenever the controller performs a write I/O, parity must be computed by reading the old parity
(Ep old) and the old data (E4 old) from the disk, which means two read I/Os.
After computing the new parity, the controller completes the write I/O by writing the new data
and the new parity onto the disks, amounting to two write I/Os. Therefore, the controller performs
two disk reads and two disk writes for every write operation, and the write penalty in RAID 5
implementations is 4.
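Because parity is a bitwise XOR, both reconstruction and the small-write update are mechanical. A sketch with hypothetical integer values standing in for the data elements:

    from functools import reduce

    data = [0b1010, 0b0110, 0b1111, 0b0001]    # E1..E4 (hypothetical values)
    parity = reduce(lambda a, b: a ^ b, data)  # Ep = E1 ^ E2 ^ E3 ^ E4

    # Rebuild a failed element by XORing the parity with the survivors.
    rebuilt = reduce(lambda a, b: a ^ b, data[:3], parity)  # suppose E4's disk fails
    assert rebuilt == data[3]

    # Small-write update (the source of the write penalty): read old data and
    # old parity, then new parity = old parity ^ old data ^ new data.
    new_e4 = 0b0111
    parity = parity ^ data[3] ^ new_e4
    data[3] = new_e4
    assert parity == reduce(lambda a, b: a ^ b, data)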
In RAID 6, which maintains dual parity, a disk write requires three read operations: for Ep1 old,
Ep2 old, and E4 old. After calculating Ep1 new and Ep2 new, the controller performs three write
I/O operations for Ep1 new, Ep2 new and E4 new. Therefore, in a RAID 6 implementation, the
controller performs six I/O operations for each write I/O, and the write penalty is 6.
2.2.5.1 Application IOPS and RAID Configurations
When deciding the number of disks required for an application, it is important to consider the
impact of RAID based on IOPS generated by the application. The total disk load should be
computed by considering the type of RAID configuration and the ratio of read compared to write
from the host.
The following example illustrates the method of computing the disk load in different types of
RAID.
Consider an application that generates 5,200 IOPS, with 60 percent of them being reads.
The disk load in RAID 5 is calculated as follows:
RAID 5 disk load = 0.6 × 5,200 + 4 × (0.4 × 5,200) [because the write penalty for RAID 5 is 4]
= 3,120 + 4 × 2,080
= 3,120 + 8,320
= 11,440 IOPS
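The same computation generalizes to any RAID level once its write penalty is known. A small helper reproducing the example above (write penalties: RAID 1 = 2, RAID 5 = 4, RAID 6 = 6):

    WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4, "RAID 6": 6}

    def disk_load(iops, read_pct, raid_level):
        reads = iops * read_pct // 100
        writes = iops - reads                  # each write costs the penalty
        return reads + WRITE_PENALTY[raid_level] * writes

    print(disk_load(5200, 60, "RAID 5"))   # 11440, matching the example above
    print(disk_load(5200, 60, "RAID 1"))   # 7280
    print(disk_load(5200, 60, "RAID 6"))   # 15600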
When a hot spare is used, the rebuild of a failed drive can be scheduled to limit its performance
impact. For example, the rebuild could occur overnight to prevent any degradation of system
performance. However, the system is vulnerable to another failure if a hot spare is unavailable.
2.3 Scale-up and Scale-out storage Architecture
Scaling up and scaling out are the two main methods used to increase data storage capacity.
Scale-out and scale-up architectures, also known respectively as horizontal scaling and vertical
scaling, refer to how companies scale their data storage: by adding more hardware drives (scale
up/vertical scaling) or by adding more software nodes (scale out/horizontal scaling).
Scale-up is the more traditional format, but it runs into space issues as data volumes grow and the
need for more and more data storage increases. Hence, the advent of scale-out architectures.
Scale-up Architecture:
In a scale-up data storage architecture, storage drives are added to increase storage capacity and
performance.
The drives are managed by two controllers.
When you run out of storage capacity, you add another shelf of drives to the architecture.
Scale-out Architecture:
A scale-out architecture uses software-defined storage (SDS) to separate the storage hardware
from the storage software, letting the software act as the controllers. This is why scale-out storage
is considered to be network attached storage (NAS).
Scale-out NAS systems involve clusters of software nodes that work together. Nodes can be added
or removed, allowing things like bandwidth, compute, and throughput to increase or decrease as
needed. To upgrade a scale-out system, new clusters must be created.
Scale-up vs. Scale-out:
• Scale-up: lower availability (if the single instance fails, the service becomes unavailable).
• Scale-out: higher availability (the failure of a single instance does not matter, because other
instances continue to serve).
2.3.2 Advantages
Advantages of Scale-up Architecture
Scaling up offers certain advantages, including:
• Affordability: Because there’s only one large server to manage, scaling up is a cost-
effective way to increase storage capacity, since you’ll end up paying less for your network
equipment and licensing. Upgrading a pre-existing server costs less than purchasing a new
one. Vertical scaling also tends to require less new backup and virtualization software.
• Maintenance: Since you have only one storage system to manage versus a whole cluster
of different elements, scale-up architectures are easier to manage and also make it easier
to address specific data quality issues.
• Simpler communication: Since vertical scaling means having just a single node handling
all the layers of your services, you don’t need to worry about your system synchronizing
and communicating with other machines to work, which can lead to faster response times.
Advantages of Scale-out Architecture
The advantages of scale-out architecture include:
• Better performance: Horizontal scaling allows for more connection endpoints since the
load will be shared by multiple machines, and this improves performance.
• Easier scaling: Horizontal scaling is much easier from a hardware perspective because
all you need to do is add machines.
• Less downtime and easier upgrades: Scaling out means less downtime because you
don’t have to switch anything off to scale or make upgrades. Scaling out essentially allows
you to upgrade or downgrade your hardware whenever you want as you can move all
users, workloads, and data without any downtime. Scale-out systems can also auto-tune
and self-heal, allowing clusters to easily accommodate all data demands.
2.3.3 Disadvantages
Disadvantages of Scale-up Architecture
The disadvantages of scale-up architectures include:
• Scalability limitations: Although scaling up is how enterprises have traditionally handled
storage upgrades, this approach has slowly lost its effectiveness. The RAM, CPU, and hard drives
added to a server can only perform to the level the computing housing unit allows. As a result,
performance and capacity become a problem as the unit nears its physical limitations. This, in
turn, impacts backup and recovery times and other mission-critical processes.
• Upgrade headaches and downtime: Upgrading a scale-up architecture can be extremely tedious
and involve a lot of heavy lifting. Typically, you need to copy every piece of data from the old
server over to a new machine, which can be costly in terms of both money and downtime. Also,
adding another server to the mix usually means adding another data store, which could result in
the network getting bogged down by storage pools and users not knowing where to look for files.
Both of these can negatively impact productivity. Also, with a scale-up architecture, you need to
take your existing server offline while replacing it with a new, more powerful one. During this
time, your apps will be unavailable.
Disadvantages of Scale-out Architecture
The disadvantages of horizontal scaling include:
• Complexity: It’s always going to be harder to maintain multiple servers compared to a
single server. Also, things like load balancing and virtualization may require adding
software, and machine backups can also be more complex because you’ll need to ensure
nodes synchronize and communicate effectively.
• Cost: Scaling out can be more expensive than scaling up because adding new servers is
far more expensive than upgrading old ones.
Which One Is Best: Scale-out or Scale-up?
The answer depends on your particular needs and resources. Here are some questions to think
about:
• Are your needs long term or short term?
• What’s your budget? Is it big or small?
• What type of workloads are you dealing with?
• Are you dealing with a temporary traffic peak or constant traffic overload?
Once you’ve answered those questions, consider these factors:
• Cost: Horizontal scaling is more expensive, at least initially, so if your budget is tight,
then scaling up might be the best choice.
• Reliability: Horizontal scaling is typically far more reliable than vertical scaling. If
you’re handling a high volume of transactional data or sensitive data, for example, and
your downtime costs are high, you should probably opt for scaling out.
• Geographic distribution: If you have, or plan to have, global clients, you’ll be much
better able to maintain your SLAs via scaling out since a single machine in a single
location won’t be enough for customers to access your services.
• Future-proofing: Because scaling up uses a single node, it’s tough to future-proof a
vertical scaling-based architecture.
With scaling out, it’s much easier to increase the overall performance threshold of your
organization by adding machines. If you’re planning for the long term and operate in a
highly competitive industry with lots of potential disruptors, scaling out would be the best
option.
In short, if you have a bigger budget and expect a steady and large growth in data over a long
period of time and need to distribute an overstrained storage workload across several storage
nodes, scaling out is the best option.
If you haven’t yet maxed out the full potential of your current infrastructure and can still add
CPUs and memory resources to it and you don’t anticipate a meaningfully large growth in your
data set over the next three to five years, then scaling up would likely be the best choice.
Two Mark Questions with Answers
1. What is meant by Storage?
Ans: Storage is a process through which digital data is saved within a data storage device by means of
computing technology. Storage is a mechanism that enables a computer to retain data, either temporarily
or permanently.
➢ Storage devices such as flash drives and hard disks are a fundamental component of most digital
devices since they allow users to preserve all kinds of information such as videos, documents,
pictures and raw data.
➢ Storage may also be referred to as computer data storage or electronic data storage.
2. What is meant by Storage Systems?
Ans: Storage systems, in the context of information technology and data management, refer to the
hardware and software components designed to store and manage digital data, making it accessible for
future retrieval and use. These systems play a fundamental role in modern computing and are essential
for preserving and managing vast amounts of data generated by individuals, organizations, and
applications.
3. What are the components of a storage system environment?
Ans:
✓ Hosts
✓ Connectivity
✓ Storage
4. What is meant by Intelligent Storage Systems?
Ans: The intelligent storage systems are arrays that provide highly optimized I/O processing capabilities.
These arrays have an operating environment that controls the management, allocation, and utilization of
storage resources. These storage systems are configured with large amounts of memory called cache and
use sophisticated algorithms to meet the I/O requirements of performance sensitive applications.
5. List the Components of Intelligent Storage Systems?
Ans: An intelligent storage system consists of four key components
✓ Front end,
✓ Cache,
✓ Back end,
✓ Physical disks
6. What are the types of Intelligent Storage System?
Ans: There are two main categories of intelligent storage systems:
➢ High-end storage systems (active-active arrays)
➢ Midrange storage systems
The different RAID levels are as follows:
➢ RAID-0 (Striping)
➢ RAID-1 (Mirroring)
➢ RAID-2 (Bit-Level Striping with Dedicated Parity)
➢ RAID-3 (Byte-Level Striping with Dedicated Parity)
➢ RAID-4 (Block-Level Striping with Dedicated Parity)
➢ RAID-5 (Block-Level Striping with Distributed Parity)
➢ RAID-6 (Block-Level Striping with Dual Parity)
9. What is meant by Spindle in Hard Disk Drives?
Ans: Spindle is the axis on which the hard disks spin. In storage engineering, the physical disk drive is
often called a “spindle”, referencing the spinning parts which limit the device to a single I/O operation at
a time and making it the focus of Input/Output scheduling decisions.
The key components of a hard disk drive are:
➢ Disk Platters
➢ Read/ Write Heads
➢ Head Actuator mechanism
➢ Logic Board
➢ Spindle motor
➢ Cables and Connectors
➢ Configuration items (jumpers, switches, etc.)
12. How do you measure the performance of hard disks and Solid drives?
Ans: The performance of Hard disks and Solid drives are measured as follows
➢ Disk service time-Disk service time is the time taken by a disk to complete an I/O request.
➢ Seek Time- Describes the time taken to position the R/W heads across the platter with a radial
movement.
➢ Rotational latency- The time taken by the platter to rotate and position the data under the R/W
head.
➢ Data Transfer Rate – The data transfer rate refers to the average amount of data per unit time
that the drive can deliver to the HBA.
Little's Law relates these quantities: the average number of outstanding I/O requests N equals the
arrival rate a multiplied by the average response time R, that is, N = a × R.
13. What is meant by scaling up and scaling out?
Ans:
➢ Scaling up is adding further resources, like hard drives and memory, to increase the computing
capacity of physical servers.
➢ Scaling out is adding more servers to your architecture to spread the workload across more
machines.
14. What is RAID 1 with example?
Ans: RAID 1 is also called disk mirroring. Mirroring is a technique whereby data is stored on two
different HDDs, yielding two copies of data. In the event of one HDD failure, the data is intact on the
surviving HDD and the controller continues to service the host’s data requests from the surviving disk of
a mirrored pair.
Components that contribute to service time on a disk drive are seek time, rotational latency, and
data transfer rate.
The operating system tells the drive to read or write a certain Logical Block Address (LBA).
Traditionally, each LBA refers to the start of a 512 byte sector on the drive. A 1 TB disk drive will have
around 2 billion sectors each numbered consecutively from the start of the drive.
Review Questions
3.1.1.1 Block-Based Storage System
A block-based storage system may consist of one or more controllers and a number of storage
drives.
Controller
A controller of a block-based storage system consists of three key components: front end, cache,
and back end. An I/O request received from the hosts or compute systems at the front-end port is
processed through cache and back end, to enable storage and retrieval of data from the storage. A
read request can be serviced directly from cache if the requested data is found in the cache. In
modern intelligent storage systems, front end, cache, and back end are typically integrated on a
single board referred to as a storage processor or storage controller.
For high data protection and high availability, storage systems are configured with dual
controllers with multiple ports. Such configurations provide an alternative path to physical
storage drives if a controller or port failure occurs. This reliability is further enhanced if the
storage drives are also dual-ported. In that case, each drive port can connect to a separate
controller. Multiple controllers also facilitate load balancing.
Front End
The front end provides the interface between the storage system and the hosts. It consists of two
components: front-end ports and front-end controllers. Typically, a front end has redundant
controllers for high availability, and each controller contains multiple ports that enable large
numbers of hosts to connect to the intelligent storage system. Each front-end controller has
processing logic that executes the appropriate transport protocol, such as Fibre Channel, iSCSI,
FICON, or FCoE for storage connections. Front-end controllers route data to and from cache via
the internal data bus. When the cache receives the write data, the controller sends an
acknowledgement message back to the compute system.
Back End
The back end provides an interface between cache and the physical storage drives. It consists of
two components: back-end ports and back-end controllers. The back-end controls data transfers
between cache and the physical drives. From cache, data is sent to the back end and then routed
to the destination storage drives. Physical drives are connected to ports on the back end. The
back-end controller communicates with the storage drives when performing reads and writes and
also provides additional, but limited, temporary data storage. The algorithms implemented on
back-end controllers provide error detection and correction, along with RAID functionality.
Storage
Physical storage drives are connected to the back-end storage controller and provide persistent
data storage. Modern intelligent storage systems provide support to a variety of storage drives
with different speeds and types, such as FC, SATA, SAS, and solid state drives. They also support
the use of a mix of SSD, FC, or SATA within the same storage system.
Workloads that have predictable access patterns typically work well with a combination of HDDs
and SSDs. If the workload changes, or constant high performance is required for all the storage
being presented, using SSDs can meet the desired performance requirements.
3.1.1.2 File-Based Storage System
File-based storage systems (NAS) are based on file hierarchies that are complex in structure. Most
file systems have restrictions on the number of files, directories and levels of hierarchy that can
be supported, which limits the amount of data that can be stored. Whereas Object based storage
systems stores data using flat address space where the objects exist at the same level and one
object cannot be placed inside another object.
File sharing allows users to share files with other users. In a file-sharing environment, a user who
creates the file (the creator or owner of a file) determines the type of access (such as read, write,
execute, append, delete) to be given to other users. When multiple users try to access a shared file
at the same time, a locking scheme is used to maintain data integrity and at the same time make
this sharing possible. Some examples of file-sharing methods are
• Peer-to-Peer (P2P) model – A peer-to-peer (P2P) file sharing model uses peer-to-peer
network. P2P enables client machines to directly share files with each other over a
network.
• File Transfer Protocol (FTP) – FTP is a client-server protocol that enables data transfer
over a network. An FTP server and an FTP client communicate with each other using TCP
as the transport protocol.
• Distributed File System (DFS) – A distributed file system (DFS) is a file system that is
distributed across several hosts. A DFS can provide hosts with direct access to the entire
file system, while ensuring efficient management and data security. Hadoop Distributed
File System (HDFS) is an example of distributed file system.
The standard client-server file-sharing protocols, such as NFS and CIFS, enable the owner of a
file to set the required type of access, such as read-only or read-write, for a particular user or
group of users. Using these protocols, the clients can mount remote file systems that are available
on dedicated file servers.
So, for example if somebody shares a folder with you over the network, once you are connected
to the network, the shared folder is ready to use. There is no need to format before accessing it
unlike in block storage. Shared file storage is often referred to as network-attached storage (NAS)
and uses protocols such as NFS and SMB/CIFS to share storage.
3.1.1.3 Object-Based Storage System
Object storage is a new type of storage system designed for cloud-scale scalability. Objects are
stored and retrieved from an object store through the web-based APIs such as REST and SOAP.
Each object can be linked with extensive metadata that can be searched and indexed. Object
storage is ideal for rich content data that does not change often and does not require high
performance. It is popular in the public cloud model.
Object-based Storage
Object-based storage device stores data in the form of objects on flat address space based on its
content and other attributes rather than the name and the location. An object is the fundamental
unit of object-based storage that contains user data, related metadata (size, date, ownership, etc.),
and user defined attributes of data (retention, access pattern, and other business-relevant
attributes).
The additional metadata or attributes enable optimized search, retention and deletion of objects.
For example, when bank account information is stored as a file in a NAS system, the metadata is
basic and may include information such as file name, date of creation, owner, and file type. When
stored as an object, the metadata component of the object may include additional information
such as account name, ID, and bank location, apart from the basic metadata.
The object ID is generated using specialized algorithms, such as a hash function on the data, and
guarantees that every object is uniquely identified. Any change to the object, like a user-based
edit to the file, results in a new object ID. Most object storage systems support APIs to integrate
with software-defined data center and cloud environments.
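Deriving the object ID from the object's content can be illustrated with a hash function. SHA-256 is used here purely as an example; actual OSD implementations may use different algorithms.

    import hashlib

    def object_id(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()   # ID derived from the content

    oid1 = object_id(b"account: 1001, owner: alice")
    oid2 = object_id(b"account: 1001, owner: alice (edited)")
    print(oid1 != oid2)   # True: any edit to the object yields a new object ID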
Unlike SAN and NAS, applications do not know the location of the stored object. With object
storage, the application creates some data and gives it to the OSD in exchange for a unique object
ID (OID). The application that created the data does not need to know where the object is stored,
as long as it is protected and returned whenever the application needs it.
For example, consider traditional car parking at a shopping mall or restaurant. It is your
responsibility to remember where you parked your car in the huge parking area. With valet
parking, however, you just hand over your keys: you have no idea where your car will be parked,
and it is brought back to you when you need it. Similarly, in object storage, the application does
not know the location of the object, but it can get it whenever it is needed.
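To make this concrete, the following is a minimal Python sketch of an object store interface (a
toy illustration, not any vendor's API): put() derives the object ID from the content using a hash
function, and get() retrieves the object by that ID, without the application ever knowing where
the object is kept.

import hashlib

class ObjectStore:
    def __init__(self):
        # OID -> (data, metadata); the physical location is opaque to callers
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        oid = hashlib.sha256(data).hexdigest()  # content-derived, unique object ID
        self._objects[oid] = (data, metadata)
        return oid  # the application keeps only the OID

    def get(self, oid: str) -> bytes:
        data, _metadata = self._objects[oid]
        return data

store = ObjectStore()
oid = store.put(b"bank statement", {"account": "12345", "branch": "Chennai"})
assert store.get(oid) == b"bank statement"
# Editing the data yields different content and hence a new OID, as noted above.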
Components of Object based Storage Device
The OSD system is typically composed of three key components: Controllers, internal network,
and storage.
Nodes (controllers)
The OSD system is composed of one or more nodes or controllers. A node is a server that runs
the OSD operating environment and provides services to store, retrieve, and manage data in the
system. Typically, OSD systems are architected to work with inexpensive x86-based nodes; each
node provides both compute and storage resources, and the system scales linearly in capacity and
performance by simply adding nodes.
The OSD node has two key services: metadata service and storage service. The metadata service
is responsible for generating the object ID from the contents of a file. It also maintains the
mapping of the object IDs and the file system namespace. In some implementations, the metadata
service runs inside an application server. The storage service manages a set of disks on which the
user data is stored.
Internal Network
The OSD nodes connect to the storage via an internal network. The internal network provides
node-to-node connectivity and node-to-storage connectivity. The application server accesses the
node to store and retrieve data over an external network.
Storage
OSD typically uses low-cost and high-density disk drives to store the objects. As more capacity
is required, more disk drives can be added to the system.
Object storage is not designed for high-performance and high-change requirements, nor is it
designed for storage of structured data such as databases. This is because object storage often
doesn’t allow updates in place. It is also not necessarily the best choice for data that changes a
lot. What it is great for is storage and retrieval of rich media and other Web 2.0 types of content
such as photos, videos, audio, and other documents.
3.1.1.4 Unified Storage
Unified storage architecture enables the creation of a common storage pool that can be shared
across a diverse set of applications with a common set of management processes.
The key component of a unified storage architecture is the unified controller. The unified controller
provides the functionalities of block storage, file storage, and object storage. It contains iSCSI,
FC, FCoE and IP front-end ports for direct block access to application servers and file access to
NAS clients.
For block-level access, the controller configures LUNs and presents them to application servers
and the LUNs presented to the application server appear as local physical disks. A file system is
configured on these LUNs at the server and is made available to applications for storing data.
For NAS clients, the controller configures LUNs, creates a file system on these LUNs, creates
an NFS, CIFS, or mixed share, and exports the share to the clients.
Some storage vendors offer REST API to enable object-level access for storing data from the
web/cloud applications.
The advantages of deploying unified storage systems are:
• Creates a single pool of storage resources that can be managed with a single management
interface.
• Sharing of pooled storage capacity for multiple business workloads should lead to a lower
overall system cost and administrative time, thus reducing the total cost of ownership
(TCO).
• Provides the capability to plan the overall storage capacity consumption. Deploying a
unified storage system takes away the guesswork associated with planning for file and
block storage capacity separately.
• Increased utilization, with no stranded capacity. Unified storage eliminates the capacity
utilization penalty associated with planning for block and file storage support separately.
• Provides the capability to integrate with software-defined storage environment to provide
next generation storage solutions for mobile, cloud, big data, and social computing needs.
By using this technology, it is technically possible to share any SCSI device over an FC SAN.
However, 99.9 % of the devices shared on an FC SAN are disk storage devices, tape drives, and
tape libraries. These are block devices: effectively raw devices that appear to the operating
system as locally attached devices. They do not have any higher levels of abstraction, such as
file systems, applied to them, which means that in an FC SAN environment the creation or
addition of file systems is the responsibility of the host or server accessing the block storage
device.
FC is a high-speed network technology that runs on high-speed optical fiber cables and serial
copper cables. The FC technology was developed to meet the demand for the increased speed of
data transfer between compute systems and mass storage systems.
The latest FC implementations of 16 GFC offer a throughput of 3200 MB/s (raw bit rates of 16
Gb/s), whereas Ultra640 SCSI is available with a throughput of 640 MB/s. FC is expected to
come with 6400 MB/s (raw bit rates of 32 Gb/s) and 25600 MB/s (raw bit rates of 128 Gb/s)
throughput in 2016. Technical Committee T11, the committee within the InterNational
Committee for Information Technology Standards (INCITS), is responsible for developing the
FC standards.
The flow control mechanism in FC SAN delivers data as fast as the destination buffer is able to
receive it, without dropping frames. FC also has very little transmission overhead. The FC
architecture is highly scalable, and theoretically, a single FC SAN can accommodate
approximately 15 million devices.
As per the EMC definition, software-defined networking is an approach to abstract and separate the
control plane functions from the data plane functions. Instead of the built-in control functions at
the network components level, the software external to the components takes over the control
functions.
The software runs on a compute system or a standalone device and is called the network
controller. The network controller interacts with the network components to gather configuration
information and to provide instructions for the data plane in order to handle the network traffic.
3.2.2.1 Physical Components: Host bus adapters and converged network adapters:
The key FC SAN physical components are network adapters, cables, and interconnecting devices.
These components provide the connection network between the storage system and hosts. Here
we will see the major physical components to design a Fibre Channel SAN environment.
Network adapters: In an FC SAN, the end devices, such as servers, hosts, and storage systems,
are all referred to as nodes. Each node is a source or destination of information. Each node
requires one or more network adapters to provide a physical interface for communicating with
other nodes. Hosts and servers connect to the SAN through one or more Fibre Channel host bus
adapters (HBA) or converged network adapters (CNA) which are installed on the PCIe bus of the
host. Examples of network adapters are FC host bus adapters (HBAs) and storage system front-
end adapters.
Hosts interface with the FC SAN via either HBAs or CNAs. These PCI devices appear to the host
operating system as SCSI adapters, and any storage volumes presented to the OS via them appear
as locally attached SCSI devices. Both types of card offer hardware offloads for FCP operations,
and CNA cards additionally offer hardware offloads for other protocols such as iSCSI and
TCP/IP.
Interconnecting devices: The commonly used interconnecting devices in an FC SAN are hubs,
switches, and directors.
• FC hub – Hubs physically connect nodes in a logical loop or a physical star topology.
All the nodes must share the loop, because data travels through all the connection
points.
• FC switch – FC switches are more intelligent than FC hubs and directly route data
from one physical port to another. Therefore, the nodes do not share the data path.
Instead, each node has a dedicated communication path. The FC switches are
commonly available with a fixed port count. Some of the ports can be active for
operational purpose and the rest remain unused. The number of active ports can be
scaled-up non-disruptively.
• FC Directors – FC directors are high-end switches with a higher port count. A director
has a modular architecture and its port count is scaled-up by inserting additional line
cards or blades to the director’s chassis. Directors contain redundant components with
automated failover capability. Its key components such as switch controllers, blades,
power supplies, and fan modules are all hot-swappable. These ensure high availability
for business critical applications.
The difference between directors and switches is that larger switches, usually with 128 or more
ports, are referred to as directors, whereas those with lower port counts are referred to as switches
or workgroup switches. Directors have more high-availability (HA) features and more built-in
redundancy than smaller workgroup-type switches. For example, director switches can have two
control processor cards running in active/passive mode. In the event that the active control
processor fails, the standby assumes control and service is maintained. This redundant control
processor model also allows for non-disruptive firmware updates. Workgroup switches do not have this level
of redundancy.
3.2.2.3 FC Storage Arrays
Active-active storage system
Supports access to the LUNs simultaneously through all the storage ports that are available
without significant performance degradation. All the paths are active, unless a path fails.
FC SAN implementations primarily use optical fiber cabling. Copper cables may be used for
shorter distances because they provide an acceptable signal-to-noise ratio for distances up to 30
meters. Optical fiber cables carry data in the form of light. There are two types of optical cables:
multimode and single-mode.
• Multimode fiber (MMF) cable carries multiple beams of light projected at different
angles simultaneously onto the core of the cable. In an MMF transmission, multiple
light beams travelling inside the cable tend to disperse and collide. This collision
weakens the signal strength after it travels a certain distance – a process known as
modal dispersion. Due to modal dispersion, an MMF cable is typically used for short
distances, commonly within a data center.
• Single-mode fiber (SMF) carries a single ray of light projected at the center of the core.
The small core and the single light wave help to limit modal dispersion. Single-mode
provides minimum signal attenuation over maximum distance (up to 10 km). A single-
mode cable is used for long-distance cable runs, and the distance usually depends on
the power of the laser at the transmitter and the sensitivity of the receiver.
The first field of the FC address contains the domain ID of the switch. A domain ID is a unique
number provided to each switch in the fabric. Although this is an 8-bit field, there are only 239
available addresses for domain ID because some addresses are deemed special and reserved for
fabric services.
For example, FFFFFC is reserved for the name server, and FFFFFE is reserved for the fabric
login service.
The area ID is used to identify a group of switch ports used for connecting nodes. An example of
a group of ports with common area ID is a port card on the switch.
The last field, the port ID, identifies the port within the group. Therefore, the maximum possible
number of node ports in a switched fabric is calculated as:
239 domains X 256 areas X 256 ports = 15,663,104 ports.
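As a quick illustration, the three fields of a 24-bit FC address can be extracted with simple bit
operations. The following Python sketch uses a hypothetical address value and also re-checks
the port-count calculation above.

def parse_fc_address(fc_id: int) -> dict:
    # Split a 24-bit FC address into its three one-byte fields.
    return {
        "domain_id": (fc_id >> 16) & 0xFF,  # first byte: switch domain ID
        "area_id": (fc_id >> 8) & 0xFF,     # second byte: group of switch ports
        "port_id": fc_id & 0xFF,            # third byte: port within the group
    }

print(parse_fc_address(0x0A1B2C))  # {'domain_id': 10, 'area_id': 27, 'port_id': 44}
print(239 * 256 * 256)             # 15663104 node ports, as computed above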
FC Address of an NL_port
The FC addressing scheme for an NL_port differs from other ports. The two upper bytes in the
FC addresses of the NL_ports in a private loop are assigned zero values.
However, when an arbitrated loop is connected to a fabric through an FL_port, it becomes a public
loop. In this case, an NL_port supports a fabric login.
The two upper bytes of this NL_port are then assigned a positive value, called a loop identifier,
by the switch. The loop identifier is the same for all NL_ports on a given loop.
Figure 6-15 illustrates the FC address of an NL_port in both a public loop and a private loop. The
last field in the FC addresses of the NL_ports, in both public and private loops, identifies the AL-
PA. There are 127 allowable AL-PA addresses; one address is reserved for the FL_port on the
switch.
3.2.2.7 FC Fabrics
A fabric is a collection of connected FC switches that share a common set of services, such as a
common name server, a common zoning database, and a common FSPF routing table.
You can also deploy dual redundant fabrics for resiliency. Each fabric is viewed and managed as
a single logical entity and it is common across the fabric to update the zoning configuration from
any switch in the fabric.
Every FC switch in a fabric needs a domain ID. The domain ID is a numeric value used to
uniquely identify the switch in the fabric. Domain IDs can be administratively set or dynamically
assigned by the principal switch in a fabric during reconfiguration. A domain ID must be unique
within a fabric and must not be reused for another switch.
The principal switch is the main switch in a fabric; it is responsible for managing the distribution
of domain IDs within the fabric.
3.2.2.8 FC Frame structure
In an FC network, data transport is analogous to a conversation between two people, whereby a
frame represents a word, a sequence represents a sentence, and an exchange represents a
conversation.
Exchange: An exchange operation enables two node ports to identify and manage a set of
information units. Each upper layer protocol (ULP) has its protocol-specific information that
must be sent to another port to perform certain operations. This protocol-specific information is
called an information unit. The structure of these information units is defined in the FC-4 layer.
This unit maps to a sequence. An exchange is composed of one or more sequences.
Sequence: A sequence refers to a contiguous set of frames that are sent from one port to another.
A sequence corresponds to an information unit, as defined by the ULP.
Frame: A frame is the fundamental unit of data transfer at FC-2 layer. An FC frame consists of
five parts: start of frame (SOF), frame header, data field, cyclic redundancy check (CRC), and
end of frame (EOF).
The S_ID and D_ID are standard FC addresses for the source port and the destination port,
respectively. The SEQ_ID and OX_ID identify the frame as a component of a specific sequence
and exchange, respectively.
The frame header also defines the following fields:
■ Routing Control (R_CTL): This field denotes whether the frame is a link control frame or a
data frame. Link control frames are non-data frames that do not carry any payload. These frames
are used for setup and messaging. In contrast, data frames carry the payload and are used for data
transmission.
■ Class Specific Control (CS_CTL): This field specifies link speeds for class 1 and class 4 data
transmission.
■ TYPE: This field describes the upper layer protocol (ULP) to be carried on the frame if it is a
data frame. However, if it is a link control frame, this field is used to signal an event such as
“fabric busy.” For example, if the TYPE is 08 and the frame is a data frame, it means that SCSI
data will be carried over FC.
■ Data Field Control (DF_CTL): A 1-byte field that indicates the existence of any optional
headers at the beginning of the data payload. It is a mechanism to extend header information into
the payload.
■ Frame Control (F_CTL): A 3-byte field that contains control information related to frame
content. For example, one of the bits in this field indicates whether this is the first sequence of
the exchange. The SOF and EOF act as delimiters. The frame header is 24 bytes long and contains
addressing information for the frame. The data field in an FC frame contains the data payload, up
to 2,112 bytes of actual data – in most cases the SCSI data. The CRC checksum facilitates error
detection for the content of the frame. This checksum verifies data integrity by checking whether
the content of the frame is received correctly. The CRC checksum is calculated by the sender
before encoding at the FC-1 layer. Similarly, it is calculated by the receiver after decoding at the
FC-1 layer.
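As a rough illustration of this layout, the following Python sketch unpacks the six 4-byte words
of a 24-byte FC frame header into the fields discussed above. It is a simplified teaching aid based
on the standard header layout, not a production FC parser.

import struct

def parse_fc_frame_header(header: bytes) -> dict:
    if len(header) != 24:
        raise ValueError("an FC frame header is exactly 24 bytes")
    w = struct.unpack(">6I", header)  # six big-endian 32-bit words
    return {
        "R_CTL": w[0] >> 24,            # routing control: link control or data frame
        "D_ID": w[0] & 0xFFFFFF,        # destination port FC address
        "CS_CTL": w[1] >> 24,           # class specific control
        "S_ID": w[1] & 0xFFFFFF,        # source port FC address
        "TYPE": w[2] >> 24,             # ULP carried, e.g. 0x08 for SCSI
        "F_CTL": w[2] & 0xFFFFFF,       # frame control
        "SEQ_ID": w[3] >> 24,           # sequence identifier
        "DF_CTL": (w[3] >> 16) & 0xFF,  # data field control (optional headers)
        "SEQ_CNT": w[3] & 0xFFFF,       # frame position within the sequence
        "OX_ID": w[4] >> 16,            # originator exchange ID
        "RX_ID": w[4] & 0xFFFF,         # responder exchange ID
        "PARAMETER": w[5],              # parameter field (e.g., relative offset)
    }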
3.2.2.9 FC Services (Fabric login server, Name server, Fabric controller, Management
server)
All FC switches, regardless of the manufacturer, provide a common set of services as defined in
the FC standards. These services are available at certain predefined addresses. Some of these
services are Fabric Login Server, Fabric Controller, Name Server, and Management Server.
Fabric Login Server: It is located at the predefined address of FFFFFE and is used during the
initial part of the node’s fabric login process.
Name Server (formally known as Distributed Name Server): It is located at the predefined
address FFFFFC and is responsible for name registration and management of node ports. Each
switch exchanges its Name Server information with other switches in the fabric to maintain a
synchronized, distributed name service.
Fabric Controller: Each switch has a Fabric Controller located at the predefined address
FFFFFD. The Fabric Controller provides services to both node ports and other switches. The
Fabric Controller is responsible for managing and distributing Registered State Change
Notifications (RSCNs) to the node ports registered with the Fabric Controller. If there is a change
in the fabric, RSCNs are sent out by a switch to the attached node ports. The Fabric Controller
also generates Switch Registered State Change Notifications (SW-RSCNs) to every other domain
(switch) in the fabric. These RSCNs keep the name server up-to-date on all switches in the fabric.
Management Server: FFFFFA is the FC address for the Management Server. The Management
Server is distributed to every switch within the fabric. The Management Server enables the FC
SAN management software to retrieve information and administer the fabric.
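The well-known addresses above can be captured in a small lookup table; a Python sketch:

WELL_KNOWN_FC_ADDRESSES = {
    0xFFFFFA: "Management Server",
    0xFFFFFC: "Name Server (Distributed Name Server)",
    0xFFFFFD: "Fabric Controller",
    0xFFFFFE: "Fabric Login Server",
}

def service_at(fc_address: int) -> str:
    # Addresses reserved for fabric services are looked up; others are ordinary ports.
    return WELL_KNOWN_FC_ADDRESSES.get(fc_address, "not a well-known service address")

print(service_at(0xFFFFFC))  # Name Server (Distributed Name Server)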
Fabric services define three login types:
• Fabric login (FLOGI): It is performed between an N_Port and an F_Port. To log on to
the fabric, a node sends a FLOGI frame containing its WWNN and WWPN to the
fabric login server at the well-known FC address FFFFFE, which accepts the login
and returns the FC address assigned to the node.
• Port login (PLOGI): It is performed between two N_Ports to establish a session. The
initiator N_Port sends a PLOGI request frame to the target N_Port, which accepts it.
The target N_Port returns an ACC to the initiator N_Port. Next, the N_Ports exchange
service parameters relevant to the session.
• Process login (PRLI): It is also performed between two N_Ports. This login relates to
the FC-4 ULPs, such as SCSI. If the ULP is SCSI, N_Ports exchange SCSI-related
service parameters.
Zoning also provides access control, along with other access control mechanisms, such as LUN
masking. Zoning provides control by allowing only the members in the same zone to establish
communication with each other. Multiple zones can be grouped together to form a zone set, and
this zone set is applied to the fabric. Any new zone configured needs to be added to the active
zone set in order to be applied to the fabric.
Zone members, zones, and zone sets form the hierarchy defined in the zoning process. A zone set
is composed of a group of zones that can be activated or deactivated as a single entity in a fabric.
Multiple zone sets may be defined in a fabric, but only one zone set can be active at a time.
Members are the nodes within the FC SAN that can be included in a zone.
FC switch ports, FC HBA ports, and storage system ports can be members of a zone. A port or
node can be a member of multiple zones. Nodes distributed across multiple switches in a switched
fabric may also be grouped into the same zone. Zone sets are also referred to as zone
configurations.
The following are some zoning best practices:
• Always keep the zones small so that troubleshooting is simpler.
• Have only a single initiator in each zone; it is not recommended to have more than
one initiator in a zone.
• To make troubleshooting easier, also keep the number of targets in a zone small.
• Give meaningful aliases and names to your zones so that they can be easily identified
during troubleshooting.
• Make zone changes with extreme caution and care, to prevent unwanted access to
sensitive data.
WWN zoning: It uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major advantage of WWN
zoning is its flexibility. If an administrator moves a node to another switch port in the fabric, the
node maintains connectivity to its zone partners without having to modify the zone configuration.
This is possible because the WWN is static to the node port. WWN zoning is also sometimes
referred to as soft zoning.
Port zoning: It uses the switch port ID to define zones. In port zoning, access to a node is
determined by the physical switch port to which the node is connected. The zone members are
the port identifiers (switch domain ID and port number) to which the FC HBA and its targets
(storage systems) are connected. If a node is moved to another switch port in the fabric, port
zoning must be modified to allow the node, in its new port, to participate in its original zone.
However, if an FC HBA or storage system port fails, an administrator just has to replace the
failed device without changing the zoning configuration. Port zoning is also sometimes referred
to as hard zoning.
Mixed zoning: It combines the qualities of both WWN zoning and port zoning. Using mixed
zoning enables a specific node port to be tied to the WWN of another node.
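The member, zone, and zone set hierarchy, and the access check it implies, can be sketched in a
few lines of Python. The WWPNs and zone names below are hypothetical, and real switches
enforce this in the fabric itself; the sketch assumes WWN zoning.

# Each zone is a set of member WWPNs (WWN zoning).
zones = {
    "zone_hostA_array1": {"10:00:00:90:fa:00:00:01",   # host FC HBA port (initiator)
                          "50:06:01:60:3e:a0:00:01"},  # storage front-end port (target)
    "zone_hostB_array1": {"10:00:00:90:fa:00:00:02",
                          "50:06:01:60:3e:a0:00:01"},
}

# Only one zone set can be active on the fabric at a time.
active_zone_set = {"zone_hostA_array1", "zone_hostB_array1"}

def can_communicate(wwpn_a: str, wwpn_b: str) -> bool:
    # Two ports may communicate only if some zone in the active zone set contains both.
    return any(wwpn_a in zones[z] and wwpn_b in zones[z] for z in active_zone_set)

print(can_communicate("10:00:00:90:fa:00:00:01", "50:06:01:60:3e:a0:00:01"))  # True
print(can_communicate("10:00:00:90:fa:00:00:01", "10:00:00:90:fa:00:00:02"))  # False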
3.2.2.12 FC Classes and Service
The FC standards define different classes of service to meet the requirements of a wide range of
applications. Three commonly referenced classes of service and their key features are:
• Class 1 – provides a dedicated connection between two ports, with acknowledgment
of frame delivery.
• Class 2 – connectionless; frames may take different routes, and delivery is
acknowledged.
• Class 3 – connectionless, with no acknowledgment of delivery (a datagram service);
this is the most commonly used class of service in FC SANs.
Another class of services is class F, which is intended for use by the switches communicating
through ISLs. Class F is similar to Class 2, and it provides notification of non-delivery of frames.
Other defined Classes 4, 5, and 6 are used for specific applications. Currently, these services are
not in common use.
3.2.2.13 Virtual SAN
Virtual SAN (also called virtual fabric) is a logical fabric on an FC SAN, which enables
communication among a group of nodes regardless of their physical location in the fabric.
Each SAN can be partitioned into smaller virtual fabrics, generally called VSANs. VSANs are
similar to VLANs in networking, and they allow a physical SAN to be partitioned into multiple
smaller logical SANs/fabrics. It is possible to route traffic between virtual fabrics by using
vendor-specific technologies.
In a VSAN, a group of node ports communicate with each other using a virtual topology defined
on the physical SAN. Multiple VSANs may be created on a single physical SAN. Each VSAN
behaves and is managed as an independent fabric. Each VSAN has its own fabric services,
configuration, and set of FC addresses. Fabric-related configurations in one VSAN do not affect
the traffic in another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes in either site that have a common set of requirements.
VSANs improve SAN security, scalability, availability, and manageability. VSANs provide
enhanced security by isolating the sensitive data in a VSAN and by restricting the access to the
resources located within that VSAN.
For example, a cloud provider typically isolates the storage pools for multiple cloud services by
creating multiple VSANs on an FC SAN. Further, the same FC address can be assigned to nodes
in different VSANs, thus increasing the fabric scalability.
The events causing traffic disruptions in one VSAN are contained within that VSAN and are not
propagated to other VSANs. VSANs facilitate an easy, flexible, and less expensive way to
manage networks. Configuring VSANs is easier and quicker compared to building separate
physical FC SANs for various node groups. To regroup nodes, an administrator simply changes
the VSAN configurations without moving nodes and recabling.
Configuring VSAN
To configure VSANs on a fabric, an administrator first needs to define VSANs on fabric switches.
Each VSAN is identified with a specific number called VSAN ID. The next step is to assign a
VSAN ID to the F_Ports on the switch. By assigning a VSAN ID to an F_Port, the port is included
in the VSAN. In this manner, multiple F_Ports can be grouped into a VSAN.
For example, an administrator may group switch ports (F_Ports) 1 and 2 into VSAN 10 (ID) and
ports 6 to 12 into VSAN 20 (ID). If an N_Port connects to an F_Port that belongs to a VSAN, it
becomes a member of that VSAN. The switch transfers FC frames between switch ports that
belong to the same VSAN.
VSAN versus Zone
Both VSANs and zones enable node ports within a fabric to be logically segmented into groups.
But they are not the same, and their purposes are different. There is a hierarchical relationship between
them. An administrator first assigns physical ports to VSANs and then configures independent
zones for each VSAN. A VSAN has its own independent fabric services, but the fabric services
are not available on a per-zone basis.
VSAN Trunking
VSAN trunking allows network traffic from multiple VSANs to traverse a single ISL. It supports
a single ISL to permit traffic from multiple VSANs along the same path. The ISL through which
multiple VSAN traffic travels is called a trunk link.
VSAN trunking enables a single E_Port to be used for sending or receiving traffic from multiple
VSANs over a trunk link. The E_Port capable of transferring multiple VSAN traffic is called a
trunk port. The sending and receiving switches must have at least one trunk E_Port configured
for all of or a subset of the VSANs defined on the switches.
VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It reduces the
number of ISLs when the switches are configured with multiple VSANs. As the number of ISLs
between the switches decreases, the number of E_Ports used for the ISLs also reduces. By
eliminating needless ISLs, the utilization of the remaining ISLs increases. The complexity of
managing the FC SAN is also minimized with a reduced number of ISLs.
VSAN Tagging
VSAN tagging is the process of adding or removing a marker or tag, containing VSAN-specific
information, to FC frames. Associated with VSAN trunking, it helps isolate FC frames from
multiple VSANs that travel through and share a trunk link. Whenever an FC frame enters an FC
switch, it is tagged with a VSAN header indicating the VSAN ID of the switch port (F_Port)
before sending the frame down to a trunk link.
The receiving FC switch reads the tag and forwards the frame to the destination port that
corresponds to that VSAN ID. The tag is removed once the frame leaves a trunk link to reach an
N_Port.
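The tag-on-ingress, strip-on-egress behavior can be modeled with a short Python sketch. The
port names, VSAN IDs, and dictionary-based frame representation are illustrative only, not the
actual frame format.

f_port_vsan = {"fc1/1": 10, "fc1/2": 10, "fc1/6": 20}  # F_Port -> VSAN ID (hypothetical)

def tag_frame(frame: dict, ingress_f_port: str) -> dict:
    # Ingress switch: attach the VSAN ID of the receiving F_Port before the trunk link.
    return {**frame, "vsan_id": f_port_vsan[ingress_f_port]}

def untag_frame(tagged_frame: dict) -> dict:
    # Egress switch: read the tag to pick the destination VSAN, then strip it
    # before the frame reaches an N_Port.
    frame = dict(tagged_frame)
    frame.pop("vsan_id")
    return frame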
3.2.3 FC SAN Connectivity
The FC SAN physical components, such as network cables, network adapters, and hubs or
switches, can be used to design a Fibre Channel Storage Area Network. The different types of
FC architecture that can be designed are:
• Point-to-point
• Fibre channel arbitrated loop (FC-AL)
• Fibre channel switched fabric (FC-SW).
3.2.3.1 Point-to-Point
Point-to-point is the simplest FC configuration — two devices are connected directly to each
other, as shown in Figure 6-6. This configuration provides a dedicated connection for data
transmission between nodes.
However, the point-to-point configuration offers limited connectivity, as only two devices can
communicate with each other at a given time. Moreover, it cannot be scaled to accommodate a
large number of network devices. Standard DAS uses point-to-point connectivity.
3.2.3.2 Fibre Channel Arbitrated Loop (FC-AL)
In the FC-AL configuration, devices are attached to a shared loop and must arbitrate to gain
control of the loop before performing I/O operations, so the loop bandwidth is shared among all
the devices.
Further, adding or removing a device results in loop re-initialization, which can cause a
momentary pause in loop traffic. As a loop configuration, FC-AL can be implemented without
any interconnecting devices by directly connecting one device to another two devices in a ring
through cables. However, FC-AL implementations may also use FC hubs through which the
arbitrated loop is physically connected in a star topology.
FC-AL Transmission
When a node in the FC-AL topology attempts to transmit data, the node sends an arbitration
(ARB) frame to each node on the loop. If two nodes simultaneously attempt to gain control of the
loop, the node with the highest priority is allowed to communicate with another node.
When the initiator node receives the ARB request it sent, it gains control of the loop. The initiator
then transmits data to the node with which it has established a virtual connection. Figure 6-8
illustrates the process of data transmission in an FC-AL configuration.
3.2.3.3 Fibre Channel Switched Fabric (FC-SW)
An FC switched fabric (FC-SW) network is created by interconnecting FC switches through
inter-switch links (ISLs).
They enable the transfer of both storage traffic and fabric management traffic from one switch to
another. In FC-SW, nodes do not share a loop; instead, data is transferred through a dedicated
path between the nodes. Unlike a loop configuration, an FC-SW configuration provides high
scalability. The addition or removal of a node in a switched fabric is minimally disruptive; it does
not affect the ongoing traffic between other nodes. FC switches operate up to FC-2 layer, and
each switch supports and assists in providing a rich set of fabric services such as the FC name
server, the zoning database and time synchronization service. When a fabric contains more than
one switch, these switches are connected through a link known as an inter-switch link.
Inter-switch links (ISLs) connect multiple switches together, allowing them to merge into a
common fabric that can be managed from any switch in the fabric. ISLs can also be bonded into
logical ISLs that provide the aggregate bandwidth of each component ISL as well as providing
load balancing and high-availability features.
FC-SW Transmission
FC-SW uses switches that are intelligent devices. They can switch data traffic from an initiator
node to a target node directly through switch ports. Frames are routed between source and
destination by the fabric. As shown in Figure 6-11, if node B wants to communicate with node
D, the nodes must individually log in first and then transmit data via the FC-SW. This link is
considered a dedicated connection between the initiator and the target.
When the number of tiers in a fabric increases, the distance that a fabric management message
must travel to reach each switch in the fabric also increases. The increase in the distance also
increases the time taken to propagate and complete a fabric reconfiguration event, such as the
addition of a new switch, or a zone set propagation event (detailed later in this chapter). Figure
6-10 illustrates two-tier and three-tier fabric architecture.
• N_Port: It is an end point in the fabric. This port is also known as the node port.
Typically, it is a compute system port (FC HBA port) or a storage system port that is
connected to a switch in a switched fabric.
• E_Port: It is a port that forms the connection between two FC switches. This port is
also known as the expansion port. The E_Port on an FC switch connects to the E_Port
of another FC switch in the fabric through ISLs.
• F_Port: It is a port on a switch that connects to an N_Port. This port is also known as
the fabric port.
• G_Port: It is a generic port on a switch that can operate as an E_Port or an F_Port and
determines its functionality automatically during initialization.
Common FC port speeds are 2 Gbps, 4 Gbps, 8 Gbps, and 16 Gbps. FC ports, including HBA
ports, switch ports, and storage array ports, can be configured to autonegotiate their speed;
autonegotiation is a protocol that allows two devices to agree on a common speed for the link. It
is a good practice to hard-code the same speed at both ends of the link.
Each device in the FC environment is assigned a 64-bit unique identifier called the World Wide
Name (WWN). WWNs are burned into the hardware or assigned through software. Several
configuration definitions in an FC SAN use WWNs for identifying storage systems and FC
HBAs. WWNs are critical for FC SAN configuration, as each node port has to be registered by
its WWN before the FC SAN recognizes it. The FC environment uses two types of WWNs:
• World Wide Node Name (WWNN) – WWNN is used to uniquely identify a node
(device) in the FC SAN.
• World Wide Port Name (WWPN) – WWPN is used to uniquely identify FC adapter
ports or node ports. For example, a dual-port FC HBA has one WWNN and two
WWPNs.
3.2.4.2 N_Port Virtualization
The proliferation of compute systems in a data center causes increased use of edge switches in a
fabric. As the edge switch population grows, the number of domain IDs may become a concern
because of the limitation on the number of domain IDs in a fabric. N_Port Virtualization (NPV)
addresses this concern by reducing the number of domain IDs in a fabric.
Edge switches supporting NPV do not require a domain ID. They pass traffic between the core
switch and the compute systems. NPV-enabled edge switches do not perform any fabric services,
and instead forward all fabric activity, such as login and name server registration to the core
switch.
All ports at the NPV edge switches that connect to the core switch are established as NP_Ports
(not E_Ports). The NP_Ports connect to an NPIV-enabled core director or switch. If the core
director or switch is not NPIV-capable, the NPV edge switches do not function. As the switch
enters or exits from NPV mode, the switch configuration is erased and it reboots. Therefore,
administrators should take care when enabling or disabling NPV on a switch. The figure shows
a core-edge fabric that comprises two edge switches in NPV mode and one core switch (an FC
director).
3.2.4.3 N_Port ID Virtualization (NPIV)
It enables a single N_Port (such as an FC HBA port) to function as multiple virtual N_Ports. Each
virtual N_Port has a unique WWPN identity in the FC SAN. This allows a single physical N_Port
to obtain multiple FC addresses.
Hypervisors such as VMware leverage NPIV to create virtual N_Ports on the FC HBA and then
assign the virtual N_Ports to virtual machines (VMs). A virtual N_Port acts as a virtual FC HBA
port. This enables a VM to directly access LUNs assigned to it.
NPIV enables an administrator to restrict access to specific LUNs to specific VMs using security
techniques like zoning and LUN masking, similar to the assignment of a LUN to a physical
compute system. To enable NPIV, both the FC HBAs and the FC switches must support NPIV.
The physical FC HBAs on the compute system, using their own WWNs, must have access to all
LUNs that are to be accessed by VMs running on that compute system.
3.2.5 FC SAN Topologies
FC SANs offer three types of FC switch topologies. They are:
• Single-Switch topology
• Mesh topology
• Core-edge topology
3.2.5.1 Single-Switch topology
In a single-switch topology, the fabric consists of only a single switch. Both the compute systems
and the storage systems are connected to the same switch. A key advantage of a single-switch
fabric is that it does not need to use any switch port for ISLs. Therefore, every switch port is
usable for compute system or storage system connectivity. Further, this topology helps eliminate
FC frames travelling over the ISLs and consequently eliminates the ISL delays.
3.2.5.2 Mesh topology
A mesh topology may be a full mesh or a partial mesh. In a full mesh topology, every switch is
connected to every other switch, so any compute system and storage system are at most one ISL
apart.
In a partial mesh topology, not all the switches are connected to every other switch. In this
topology, several hops or ISLs may be required for the traffic to reach its destination. Partial mesh
offers more scalability than full mesh topology. However, without proper placement of compute
and storage systems, traffic management in a partial mesh fabric might be complicated and ISLs
could become overloaded due to excessive traffic aggregation.
3.2.5.3 Core-edge topology
The core-edge topology has two types of switch tiers: edge and core.
The edge tier is usually composed of switches and offers an inexpensive approach to adding more
compute systems in a fabric. The edge-tier switches are not connected to each other. Each switch
at the edge tier is attached to a switch at the core tier through ISLs.
The core tier is usually composed of directors that ensure high fabric availability. In addition,
typically all traffic must either traverse this tier or terminate at this tier. In this configuration, all
storage systems are connected to the core tier, enabling compute-to-storage traffic to traverse only
one ISL. Compute systems that require high performance may be connected directly to the core
tier and consequently avoid ISL delays.
The core-edge topology increases connectivity within the FC SAN while conserving the overall
port utilization. It eliminates the need to connect edge switches to other edge switches over ISLs.
Reduction of ISLs can greatly increase the number of node ports that can be connected to the
fabric. If fabric expansion is required, then administrators would need to connect additional edge
switches to the core. The core of the fabric is also extended by adding more switches or directors
at the core tier. Based on the number of core-tier switches, this topology has different variations,
such as single-core topology and dual-core topology. To transform a single-core topology to dual-
core, new ISLs are created to connect each edge switch to the new core switch in the fabric.
3.2.6 Link aggregation and zoning
Link aggregation and zoning in a Fibre Channel Storage Area Network are as follows.
3.2.6.1 Link aggregation with example
Link aggregation combines two or more parallel ISLs into a single logical ISL, called a port-
channel, yielding higher throughput than a single ISL could provide. For example, the aggregation
of 10 ISLs into a single port-channel provides up to 160 Gb/s throughput assuming the bandwidth
of an ISL is 16 Gb/s.
Link aggregation optimizes fabric performance by distributing network traffic across the shared
bandwidth of all the ISLs in a port-channel. This allows the network traffic for a pair of node
ports to flow through all the available ISLs in the port-channel rather than restricting the traffic
to a specific, potentially congested ISL. The number of ISLs in a port channel can be scaled
depending on the application’s performance requirements.
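Switches commonly spread traffic over the ISLs of a port-channel by hashing on frame fields
such as S_ID, D_ID, and the exchange ID (OX_ID), so that all frames of one exchange stay on
one link (preserving in-order delivery) while different exchanges are spread across links. A
Python sketch of the idea, assuming a four-ISL port-channel:

NUM_ISLS = 4  # ISLs aggregated into one port-channel (assumed)

def pick_isl(s_id: int, d_id: int, ox_id: int) -> int:
    # Frames of the same exchange always hash to the same ISL; different
    # exchanges between the same node ports can land on different ISLs.
    return hash((s_id, d_id, ox_id)) % NUM_ISLS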
3.2.6.2 Zoning
Zoning is an FC switch function that enables node ports within the fabric to be logically
segmented into groups and communicate with each other within the group.
Zoning also provides access control, along with other access control mechanisms, such as LUN
masking. Zoning provides control by allowing only the members in the same zone to establish
communication with each other. Multiple zones can be grouped together to form a zone set and
this zone set is applied to the fabric. Any new zone configured needs to be added to the active
zone set in order to be applied to the fabric.
Zone members, zones, and zone sets form the hierarchy defined in the zoning process. A zone set
is composed of a group of zones that can be activated or deactivated as a single entity in a fabric.
Multiple zone sets may be defined in a fabric, but only one zone set can be active at a time.
Members are the nodes within the FC SAN that can be included in a zone.
FC switch ports, FC HBA ports, and storage system ports can be members of a zone. A port or
node can be a member of multiple zones. Nodes distributed across multiple switches in a switched
fabric may also be grouped into the same zone. Zone sets are also referred to as zone
configurations.
The following are some zoning best practices:
• Always keep the zones small so that troubleshooting is simpler.
• Have only a single initiator in each zone; it is not recommended to have more than
one initiator in a zone.
• To make troubleshooting easier, also keep the number of targets in a zone small.
• Give meaningful aliases and names to your zones so that they can be easily identified
during troubleshooting.
• Make zone changes with extreme caution and care, to prevent unwanted access to
sensitive data.
WWN zoning: It uses World Wide Names to define zones. The zone members are the unique
WWN addresses of the FC HBA and its targets (storage systems). A major advantage of WWN
zoning is its flexibility: if an administrator moves a node to another switch port in the fabric, the
node maintains connectivity to its zone partners without having to modify the zone configuration,
because the WWN is static to the node port. WWN zoning is also sometimes referred to as soft
zoning.
Port zoning: It uses the switch port ID to define zones. In port zoning, access to a node is
determined by the physical switch port to which the node is connected. The zone members are
the port identifiers (switch domain ID and port number) to which the FC HBA and its targets
(storage systems) are connected. If a node is moved to another switch port in the fabric, port
zoning must be modified to allow the node, in its new port, to participate in its original zone.
However, if an FC HBA or storage system port fails, an administrator just has to replace the
failed device without changing the zoning configuration. Port zoning is also sometimes referred
to as hard zoning.
Mixed zoning: It combines the qualities of both WWN zoning and port zoning. Using mixed
zoning enables a specific node port to be tied to the WWN of another node.
3.2.7 Virtual SAN (VSAN)
Each SAN can be partitioned into smaller virtual fabrics, generally called VSANs. VSANs are
similar to VLANs in networking, and they allow a physical SAN to be partitioned into multiple
smaller logical SANs/fabrics. It is possible to route traffic between virtual fabrics by using
vendor-specific technologies.
In a VSAN, a group of node ports communicate with each other using a virtual topology defined
on the physical SAN. Multiple VSANs may be created on a single physical SAN. Each VSAN
behaves and is managed as an independent fabric. Each VSAN has its own fabric services,
configuration, and set of FC addresses. Fabric-related configurations in one VSAN do not affect
the traffic in another VSAN. A VSAN may be extended across sites, enabling communication
among a group of nodes in either site that have a common set of requirements.
VSANs improve SAN security, scalability, availability, and manageability. VSANs provide
enhanced security by isolating the sensitive data in a VSAN and by restricting the access to the
resources located within that VSAN.
For example, a cloud provider typically isolates the storage pools for multiple cloud services by
creating multiple VSANs on an FC SAN. Further, the same FC address can be assigned to nodes
in different VSANs, thus increasing the fabric scalability.
The events causing traffic disruptions in one VSAN are contained within that VSAN and are not
propagated to other VSANs. VSANs facilitate an easy, flexible, and less expensive way to
manage networks. Configuring VSANs is easier and quicker compared to building separate
physical FC SANs for various node groups. To regroup nodes, an administrator simply changes
the VSAN configurations without moving nodes and recabling.
VSAN trunking enables a single E_Port to be used for sending or receiving traffic from multiple
VSANs over a trunk link. The E_Port capable of transferring multiple VSAN traffic is called a
trunk port. The sending and receiving switches must have at least one trunk E_Port configured
for all of or a subset of the VSANs defined on the switches.
VSAN trunking eliminates the need to create dedicated ISL(s) for each VSAN. It reduces the
number of ISLs when the switches are configured with multiple VSANs. As the number of ISLs
between the switches decreases, the number of E_Ports used for the ISLs also reduces. By
eliminating needless ISLs, the utilization of the remaining ISLs increases. The complexity of
managing the FC SAN is also minimized with a reduced number of ISLs.
The receiving FC switch reads the tag and forwards the frame to the destination port that
corresponds to that VSAN ID. The tag is removed once the frame leaves a trunk link to reach an
N_Port.
3.2.8 Basic troubleshooting tips for Fiber Channel (FC) SAN issues
There are many areas where errors can be made, and misconfigured settings can cause a wide
range of issues. A thorough and deep understanding of the SAN configuration is needed to
troubleshoot any storage-related issue, because even small mistakes can lead to significant data
loss. To troubleshoot any kind of situation, follow these tips as a starting step before advanced
troubleshooting. There might be other tools to troubleshoot the issues, but these basic first steps
might help you save time.
1) Always take backup of Switch Configurations
Back up switch configurations at regular intervals, in case you are unable to troubleshoot an
issue and need to revert to a previous configuration. Such backup files tend to be human-readable
flat files that are extremely useful if you need to compare a broken configuration to a previously
known working configuration. Another option is to create a new zone configuration each time
you make a change, and to maintain previous versions that can be rolled back to if there are
problems after committing the change.
2) Troubleshooting Connectivity Issues
Many of the day-to-day issues that you see are connectivity issues such as hosts not being able to
see a new LUN or not being able to see storage or tape devices on the SAN. Connectivity issues
are often due to misconfigured zoning. Each vendor provides different tools to configure and
troubleshoot zoning, but the following common CLI commands can prove very helpful.
fcping
fcping is an FC version of the popular IP ping tool. Among other things, fcping allows you to test the following:
• Latency
# fcping 50:01:43:80:05:6c:22:ae
fctrace
Another tool that is modeled on a popular IP networking tool is the fctrace tool. This tool traces
a route/path to an N_Port. The following shows an example fctrace command:
# fctrace fcid 0xef0010 vsan 1
3) Things to check while troubleshooting Zoning
3.3 IP SAN
IP SAN transports block-level data between compute systems and storage systems over an IP
network. Organizations are adopting IP SAN for reasons such as the following:
• Most organizations have an existing IP-based network infrastructure, which could also
be used for storage networking and may be a more economical option than deploying
a new FC SAN infrastructure.
• Many long-distance disaster recovery (DR) solutions are already leveraging IP-based
networks. In addition, many robust and mature security options are available for IP
networks.
Typically, a storage system comes with both FC and iSCSI ports. This enables both the native
iSCSI connectivity and the FC connectivity in the same environment.
3.3.1 iSCSI
iSCSI is a storage networking technology that allows storage resources to be shared over an IP
network. Most of the storage resources shared on an iSCSI SAN are disk resources.
Just as SCSI messages are mapped onto Fibre Channel in an FC SAN, iSCSI is a mapping of the
SCSI protocol over TCP/IP. iSCSI is an acronym for Internet Small Computer System Interface.
It deals with block storage and maps SCSI over traditional TCP/IP. This protocol is mostly used
for sharing primary storage such as disk drives, and in some cases it is also used in disk backup
environments.
SCSI commands are encapsulated at each layer of the network stack for eventual transmission
over an IP network. The TCP layer takes care of transmission reliability and in-order delivery
whereas the IP layer provides routing across the network.
In iSCSI SAN, initiators issue read/write data requests to targets over an IP network. Targets
respond to initiators over the same IP network. All iSCSI communications follow this request
response mechanism and all requests and responses are passed over the IP network as iSCSI
Protocol Data Units (PDUs). iSCSI PDU is the fundamental unit of communication in an iSCSI
SAN.
iSCSI performance is influenced by three main components: the best initiator performance can
be achieved with dedicated iSCSI HBAs, the best target performance with purpose-built iSCSI
arrays, and the best network performance with dedicated network switches.
Multiple layers of security should be implemented on an iSCSI SAN, as security is critical in IT
infrastructure. These include CHAP for authentication, discovery domains to restrict device
discovery, network isolation, and IPsec for encryption of in-flight data.
An iSCSI SAN is composed of iSCSI initiators (such as iSCSI HBAs or software initiators on
compute systems), iSCSI targets (such as iSCSI-capable storage systems), and an IP-based
network such as a Gigabit Ethernet LAN. An iSCSI initiator sends commands and associated
data to a target, and the target returns data and responses to the initiator.
If a standard NIC is used in heavy I/O load situations, the host CPU may become a bottleneck.
A TOE (TCP Offload Engine) NIC helps alleviate this burden. A TOE NIC offloads the TCP
management functions from the host and leaves the iSCSI functionality to the host processor.
The host passes the iSCSI information to the TOE card, and the TOE card sends the information
to the destination using TCP/IP. Although this solution improves performance, the iSCSI
functionality is still handled by a software initiator, requiring host CPU cycles.
An iSCSI HBA is capable of providing performance benefits, as it offloads the entire iSCSI and
TCP/IP protocol stack from the host processor. Use of an iSCSI HBA is also the simplest way to
implement a boot-from-SAN environment via iSCSI. If there is no iSCSI HBA, modifications
have to be made to the basic operating system to boot a host from the storage devices, because
the NIC needs to obtain an IP address before the operating system loads. The functionality of an
iSCSI HBA is very similar to that of an FC HBA, but it is the most expensive option.
A fault-tolerant host connectivity solution can be implemented using host based multipathing
software (e.g., EMC PowerPath) regardless of the type of physical connectivity. Multiple NICs
can also be combined via link aggregation technologies to provide failover or load balancing.
Complex solutions may also include the use of vendor-specific storage-array software that
enables the iSCSI host to connect to multiple ports on the array with multiple NICs or HBAs.
3.3.1.3 Topologies for iSCSI Connectivity
Native iSCSI: Native topologies do not have any FC components; they perform all
communication over IP. The compute systems with iSCSI initiators may be either directly
attached to the iSCSI targets (iSCSI-capable storage systems) or connected to them through an
IP-based network using standard IP routers and switches. The figure below shows a native iSCSI
implementation that includes a storage system with an iSCSI port. The storage system is
connected to an IP network. After an iSCSI initiator is logged on to the network, it can access
the available LUNs on the storage system.
Bridged iSCSI: Bridged topologies enable the coexistence of FC with IP by providing
iSCSI-to-FC bridging functionality. This type of connectivity allows the initiators to exist in an
IP environment while the storage systems remain in an FC SAN environment. The figure
illustrates a bridged iSCSI implementation: it shows connectivity between a compute system
with an iSCSI initiator and a storage system with an FC port.
As the storage system does not have any iSCSI port, a gateway or a multi-protocol router is used.
The gateway facilitates the communication between the compute system with iSCSI ports and
the storage system with only FC ports. The gateway converts IP packets to FC frames and vice
versa, thereby bridging the connectivity between the IP and FC environments. The gateway
contains both FC and Ethernet ports to facilitate the communication between the FC and the IP
environments. The iSCSI initiator is configured with the gateway’s IP address as its target
destination. On the other side, the gateway is configured as an FC initiator to the storage system.
3.3.1.4 iSCSI Protocol Stack
iSCSI is the session-layer protocol that initiates a reliable session between devices that recognize
SCSI commands and TCP/IP. The iSCSI session-layer interface is responsible for handling login,
authentication, target discovery, and session management.
TCP is used with iSCSI at the transport layer to provide reliable transmission. TCP controls
message flow, windowing, error recovery, and retransmission. It relies upon the network layer of
the OSI model to provide global addressing and connectivity. The OSI Layer 2 protocols at the
data link layer of this model enable node-to-node communication through a physical network.
3.3.1.5 iSCSI Discovery
An iSCSI initiator must discover the location of its targets on the network and the names of the
targets available to it before it can establish a session. This discovery commonly takes place in
two ways: SendTargets discovery or Internet Storage Name Service (iSNS).
SendTargets discovery: In SendTargets discovery, the initiator is manually configured with the
target’s network portal (IP address and TCP port number) to establish a discovery session. The
initiator issues the SendTargets command, and thereby the target network portal responds to the
initiator with the location and name of the target.
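On a Linux host running the open-iscsi initiator, for example, a SendTargets discovery can be
issued with a command of the following form (the portal IP address and port here are
hypothetical):
# iscsiadm -m discovery -t sendtargets -p 10.0.0.50:3260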
iSNS: iSNS in the iSCSI SAN is equivalent in function to the Name Server in an FC SAN. It
enables automatic discovery of iSCSI devices on an IP-based network. The initiators and targets
can be configured to automatically register themselves with the iSNS server. Whenever an
initiator wants to know the targets that it can access, it can query the iSNS server for a list of
available targets.
3.3.1.6 iSCSI Names
A unique worldwide iSCSI identifier, known as an iSCSI name, is used to identify the initiators
and targets within an iSCSI network. The commonly used iSCSI name formats are:
iSCSI Qualified Name (IQN): An organization must own a registered domain name to generate
iSCSI Qualified Names. This domain name does not need to be active or resolve to an address. It
just needs to be reserved to prevent other organizations from using the same domain name to
generate iSCSI names. A date is included in the name to avoid potential conflicts caused by the
transfer of domain names. An example of an IQN is iqn.2015-04.com.example:optional_string.
The optional_string provides a serial number, an asset number, or any other device identifier.
IQNs enable storage administrators to assign meaningful names to iSCSI initiators and iSCSI
targets, and therefore to manage those devices more easily.
Extended Unique Identifier (EUI): An EUI is a globally unique identifier based on the IEEE
EUI-64 naming standard. An EUI is composed of the eui prefix followed by a 16-character
hexadecimal name, such as eui.0300732A32598D26.
Network Address Authority (NAA): NAA is another worldwide unique naming format as
defined by the Inter-National Committee for Information Technology Standards (INCITS) T11 –
Fibre Channel (FC) protocols and is used by Serial Attached SCSI (SAS). This format enables
the SCSI storage devices that contain both iSCSI ports and SAS ports to use the same NAA-based
SCSI device name. An NAA is composed of the naa prefix followed by a hexadecimal name,
such as naa.52004567BA64678D. The hexadecimal representation has a maximum size of 32
characters (128-bit identifier).
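The following Python sketch composes an IQN from its parts and classifies a given iSCSI node
name by format; the date and domain reuse the example values above.

import re

def make_iqn(year_month: str, reversed_domain: str, optional_string: str = "") -> str:
    # Compose an iSCSI Qualified Name: iqn.yyyy-mm.reversed-domain[:string]
    iqn = f"iqn.{year_month}.{reversed_domain}"
    return f"{iqn}:{optional_string}" if optional_string else iqn

def name_format(name: str) -> str:
    # Classify an iSCSI node name as IQN, EUI, or NAA.
    if re.match(r"^iqn\.\d{4}-\d{2}\.", name):
        return "IQN"
    if re.match(r"^eui\.[0-9A-Fa-f]{16}$", name):
        return "EUI"
    if re.match(r"^naa\.[0-9A-Fa-f]{1,32}$", name):
        return "NAA"
    return "unknown"

print(make_iqn("2015-04", "com.example", "disk1"))  # iqn.2015-04.com.example:disk1
print(name_format("eui.0300732A32598D26"))          # EUI
print(name_format("naa.52004567BA64678D"))          # NAA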
3.3.1.7 iSCSI Session
An iSCSI session is established between an initiator and a target. A session ID (SSID), which
includes an initiator ID (ISID) and a target ID (TSID), identifies a session.
The session can be intended for one of the following:
■ Discovery of available targets to the initiator and the location of a specific target on a network
■ Normal operation of iSCSI (transferring data between initiators and targets)
TCP connections may be added and removed within a session. Each iSCSI connection within the
session has a unique connection ID (CID).
3.3.1.8 iSCSI PDU
iSCSI initiators and targets communicate using iSCSI Protocol Data Units (PDUs). All iSCSI
PDUs contain one or more header segments followed by zero or more data segments. The PDU
is then encapsulated into an IP packet to facilitate the transport.
A PDU includes the components shown in Figure 8-6. The IP header provides packet-routing
information that is used to move the packet across a network. The TCP header contains the
information needed to guarantee the packet’s delivery to the target. The iSCSI header describes
how to extract SCSI commands and data for the target. iSCSI adds an optional CRC, known as
the digest, beyond the TCP checksum and Ethernet CRC to ensure datagram integrity. The header
and the data digests are optionally used in the PDU to validate integrity, data placement, and
correct operation.
As shown in Figure 8-7, each iSCSI PDU does not correspond in a 1:1 relationship with an IP
packet. Depending on its size, an iSCSI PDU can span multiple IP packets, or it can coexist with
another PDU in the same packet. Therefore, each IP packet and Ethernet frame can be used more
efficiently because fewer packets and frames are required to transmit the SCSI information.
3.3.1.9 Switch Aggregation
Switch aggregation combines two physical switches to make them appear as a single logical switch. All
network links from these physical switches appear as a single logical link. This enables nodes to use a
port-channel across two switches. The network traffic is also distributed across all the links in the port-
channel. Switch aggregation allows ports in both the switches to be active and to forward network traffic
simultaneously. Therefore, it provides more active paths and throughput than a single switch or multiple
non-aggregated switches under normal conditions, resulting in improved node performance. With switch
aggregation, if one switch in the aggregation fails, network traffic will continue to flow through another
switch. In the figure, four physical links to the aggregated switches appear as a single logical link to the
third switch.
3.3.1.10 VLAN
Virtual LANs (VLANs) are logical networks created on a LAN. A VLAN enables communication
between a group of nodes with a common set of functional requirements independent of their physical
location in the network. VLANs are particularly well-suited for iSCSI deployments as they enable
isolating the iSCSI traffic from other network traffic (for example, compute-to-compute traffic) when a
physical Ethernet network is used to transfer different types of network traffic.
A VLAN conceptually functions in the same way as a VSAN. Each VLAN behaves and is managed as
an independent LAN. Two nodes connected to a VLAN can communicate between themselves without
routing of frames even if they are in different physical locations. VLAN traffic must be forwarded via a
router or OSI Layer-3 switching device when two nodes in different VLANs are communicating even if
they are connected to the same physical LAN. Network broadcasts within a VLAN generally do not
propagate to nodes that belong to a different VLAN, unless configured to cross a VLAN boundary.
To configure VLANs, an administrator first defines the VLANs on the switches. Each VLAN is identified
by a unique 12-bit VLAN ID (as per IEEE 802.1Q standard). The next step is to configure the VLAN
membership based on an appropriate technique supported by the switches, such as port-based,
MAC-based, protocol-based, IP subnet address-based, and application-based.
• In the port-based technique, the membership in a VLAN is defined by the switch port to which
the node is connected.
• In the MAC-based technique, the membership in a VLAN is defined on the basis of the MAC
address of the node.
• In the protocol-based technique, different VLANs are assigned to different protocols based on
the protocol type field found in the OSI Layer 2 header.
• In the IP subnet address-based technique, the VLAN membership is based on the IP subnet
address. All the nodes in an IP subnet are members of the same VLAN.
• In the application-based technique, a specific application, for example, a file transfer protocol
(FTP) application, can be configured to execute on one VLAN.
Similar to VSAN trunking, network traffic from multiple VLANs may traverse a trunk link. A single
network port, called a trunk port, is used for sending and receiving traffic from multiple VLANs over
a trunk link. Both the sending and the receiving network components must have at least one trunk port
configured for all or a subset of the VLANs defined on the network component.
As with VSAN tagging, VLAN has its own tagging mechanism. The tagging is performed by inserting a
4-byte tag field containing 12-bit VLAN ID into the Ethernet frame (as per IEEE 802.1Q standard) before
it is transmitted through a trunk link. The receiving network component reads the tag and forwards the
frame to the destination port(s) that corresponds to that VLAN ID. The tag is removed once the frame
leaves a trunk link to reach a node port.
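The tagging operation itself is a simple byte-level insertion. The sketch below is a minimal
illustration (it ignores double tagging and the DEI bit) that inserts and strips the 4-byte
802.1Q tag at the position the standard defines, immediately after the destination and source
MAC addresses:

import struct

TPID_8021Q = 0x8100  # EtherType value that identifies an 802.1Q tag

def tag_frame(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the two 6-byte MAC addresses."""
    tci = (priority << 13) | (vlan_id & 0x0FFF)  # 3-bit PCP + 12-bit VID
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]

def untag_frame(tagged: bytes) -> tuple[int, bytes]:
    """Read the VLAN ID and strip the tag as the frame leaves the trunk."""
    tpid, tci = struct.unpack("!HH", tagged[12:16])
    assert tpid == TPID_8021Q
    return tci & 0x0FFF, tagged[:12] + tagged[16:]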
A stretched VLAN is a VLAN that spans across multiple sites over a WAN connection. In a typical
multi-site environment, network traffic between sites is routed through an OSI Layer 3 WAN connection.
Because of the routing, it is not possible to transmit OSI Layer 2 traffic between the nodes in two sites.
A stretched VLAN extends a VLAN across the sites and enables nodes in two different sites to
communicate over a WAN as if they are connected to the same network.
Stretched VLANs also allow the movement of virtual machines (VMs) between sites without the need
to change their network configurations. This simplifies the creation of high-availability clusters, VM
migration, and application and workload mobility across sites.
3.3.1.11 iSCSI Command Sequencing
Command sequencing begins with the first login command and the CmdSN is incremented by one for
each subsequent command. The iSCSI target layer is responsible for delivering the commands to the
SCSI layer in the order of their CmdSN. This ensures the correct order of data and commands at a target
even when there are multiple TCP connections between an initiator and the target using portal groups.
Similar to command numbering, a status sequence number (StatSN) is used to sequentially number status
responses, as shown in Figure 8-8. These unique numbers are established at the level of the TCP
connection.
A target sends the request-to-transfer (R2T) PDUs to the initiator when it is ready to accept data. Data
sequence number (DataSN) is used to ensure in-order delivery of data within the same command. The
DataSN and R2T sequence numbers are used to sequence data PDUs and R2Ts, respectively. Each of
these sequence numbers is stored locally as an unsigned 32-bit integer counter defined by iSCSI. These
numbers are communicated between the initiator and target in the appropriate iSCSI PDU fields during
command, status, and data exchanges.
In the case of read operations, the DataSN begins at zero and is incremented by one for each subsequent
data PDU in that command sequence. In the case of a write operation, the first unsolicited data PDU or
the first data PDU in response to an R2T begins with a DataSN of zero and increments by one for each
subsequent data PDU. R2TSN is set to zero at the initiation of the command and incremented
by one for each subsequent R2T sent by the target for that command.
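All of these sequence numbers behave as unsigned 32-bit counters that wrap around at 2^32, as in
the minimal sketch below:

class SequenceCounter:
    """Unsigned 32-bit counter, such as CmdSN, StatSN, DataSN, or R2TSN."""
    def __init__(self, start: int = 0):
        self.value = start & 0xFFFFFFFF

    def next(self) -> int:
        current = self.value
        self.value = (self.value + 1) & 0xFFFFFFFF  # wrap at 2**32
        return current

# DataSN for a read: starts at zero, incremented once per data PDU.
data_sn = SequenceCounter()
print([data_sn.next() for _ in range(4)])  # [0, 1, 2, 3]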
The iSCSI protocol also addresses errors in IP data delivery. Command sequencing is used for flow
control; missing commands, responses, and data blocks are detected using sequence numbers. Use of
the optional digests improves communication integrity in addition to the TCP checksum and Ethernet CRC.
The error detection and recovery in iSCSI can be classified into three levels:
Level 0 = Session Recovery, Level 1 = Digest Failure Recovery and Level 2 = Connection Recovery.
The error-recovery level is negotiated during login.
■ Level 0: If an iSCSI session is damaged, all TCP connections need to be closed and all tasks and
unfulfilled SCSI commands should be completed. Then, the session should be restarted via the repeated
login.
■ Level 1: Each node should be able to selectively recover a lost or damaged
PDU within a session for recovery of data transfer. At this level, identification of an error and data
recovery at the SCSI task level is performed, and an attempt to repeat the transfer of a lost or damaged
PDU is made.
■ Level 2: New TCP connections are opened to replace a failed connection. The new connection picks
up where the old one failed.
iSCSI may be exposed to the security vulnerabilities of an unprotected IP network. Some of the
security methods that can be used are IPSec and authentication solutions such as Kerberos and
CHAP (Challenge-Handshake Authentication Protocol).
3.3.2 FCIP
FC SAN provides a high-performance infrastructure for localized data movement. However,
organizations are now looking for ways to transport data over long distances between their
disparate FC SANs at multiple geographic locations. One of the best ways to achieve this goal is
to interconnect geographically dispersed FC SANs through reliable, high-speed links. This
approach involves transporting FC block data over an IP infrastructure.
FCIP is an IP-based protocol that enables distributed FC SAN islands to be interconnected over an
existing IP network. In FCIP, FC frames are encapsulated in the IP payload and transported over an
IP network. The FC frames are not altered while they travel over the IP network. In this manner,
FCIP creates virtual FC links over an IP network to transfer FC data between FC SANs. FCIP is a
tunneling protocol in which an FCIP entity, such as an FCIP gateway, is used to tunnel FC fabrics
through an IP network.
The FCIP standard has rapidly gained acceptance as a manageable, cost-effective way to blend
the best of two technologies: FC SAN and the proven, widely deployed IP
infrastructure. As a result, organizations now have a better way to store, protect, and move their
data by leveraging investments in their existing IP infrastructure. FCIP is extensively used in
disaster recovery implementations in which data is replicated to the storage located at a remote
site. It also facilitates data sharing and data collaboration over distance, which is a key
requirement for next generation applications.
3.3.2.1 FCIP Protocol Stack
The FC frames can be encapsulated into the IP packet and sent to a remote FC SAN over the IP.
The FCIP layer encapsulates the FC frames onto the IP payload and passes them to the TCP layer.
TCP and IP are used for transporting the encapsulated information across Ethernet, wireless, or
other media that support the TCP/IP traffic.
Encapsulation of FC frame on to IP packet could cause the IP packet to be fragmented when the
data link cannot support the maximum transmission unit (MTU) size of an IP packet. When an IP
packet is fragmented, the required parts of the header must be copied by all fragments. When a
TCP packet is segmented, normal TCP operations are responsible for receiving and re-sequencing
the data prior to passing it on to the FC processing portion of the device.
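The following Python sketch illustrates this layering. The real FCIP encapsulation header defined
by the FC frame encapsulation standards is reduced here to a bare placeholder, and TCP
segmentation is shown as a simple split of the byte stream:

def fcip_encapsulate(fc_frame: bytes) -> bytes:
    """Place an FC frame, unchanged, into the payload handed to TCP.
    The placeholder below stands in for the real encapsulation header."""
    return b"FCIP" + fc_frame

def tcp_segments(payload: bytes, mss: int) -> list[bytes]:
    """TCP, not FCIP, splits the byte stream into MSS-sized segments; the
    receiver re-sequences them before handing data back to the FC side."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]

segments = tcp_segments(fcip_encapsulate(b"\x00" * 2148), mss=1460)
print(len(segments))  # 2: an FC frame larger than one MTU spans segments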
3.3.2.2 FCIP Connectivity and Topologies
In an FCIP environment, an FCIP entity, such as an FCIP gateway, is connected to each fabric via a
standard FC connection. The FCIP gateway at one end of the IP network encapsulates the FC
frames into IP packets. The gateway at the other end removes the IP wrapper and sends the FC
data to the adjoined fabric. The fabric treats these gateways as fabric switches. An IP address is
assigned to the port on the gateway that is connected to the IP network. After IP connectivity
is established, the nodes in the two independent fabrics can communicate with each other.
An FCIP gateway router is connected to each fabric via a standard FC connection (see Figure 8-
10). The fabric treats these routers like layer 2 fabric switches. The other port on the router is
connected to an IP network and an IP address is assigned to that port. This is similar to the method
of assigning an IP address to an iSCSI port on a gateway. Once IP connectivity is established, the
two independent fabrics are merged into a single fabric. When merging the two fabrics, all the
switches and routers must have unique domain IDs, and the fabrics must contain unique zone set
names. Failure to ensure these requirements will result in a segmented fabric. The FC addresses
on each side of the link are exposed to the other side, and zoning or masking can be done to any
entity in the new environment.
Frequently, only a small subset of nodes in either fabric requires connectivity across an FCIP
tunnel. Thus, an FCIP tunnel may also use vendor-specific features to route network traffic
between specific nodes without merging the fabrics.
A VSAN, similar to a stretched VLAN, may be extended across sites. The FCIP tunnel may use
vendor-specific features to transfer multiple VSAN traffic through it. The FCIP tunnel functions
as a trunk link and carries tagged FC frames. This allows extending separate VSANs each with
their own fabric services, configuration, and set of FC addresses across multiple sites.
3.3.2.4 FCIP Performance and Security
Performance, reliability, and security should always be taken into consideration when
implementing storage solutions. The implementation of FCIP is subject to the same
considerations.
From the perspective of performance, multiple paths to multiple FCIP gateways from different
switches in the layer 2 fabric eliminate single points of failure and provide increased bandwidth.
In an extended-distance scenario, the IP network may become a bottleneck if sufficient bandwidth
is not available. In addition, because FCIP creates a unified fabric, disruption in the underlying
IP network can cause instabilities in the SAN environment. These include a segmented fabric,
excessive RSCNs (registered state change notifications), and host timeouts.
The vendors of FC switches have recognized some of the drawbacks related to FCIP and have
implemented features to provide additional stability, such as the capability to segregate FCIP
traffic into a separate virtual fabric. Security is also a consideration in an FCIP solution because
the data is transmitted over public IP channels. Various security options are available to protect
the data based on the router’s support. IPSec is one such security measure that can be
implemented in the FCIP environment.
3.4 Fibre Channel over Ethernet Storage Area Network (FCoE SAN)
FCoE SAN is a Converged Enhanced Ethernet (CEE) network that is capable of transporting FC
data along with regular Ethernet traffic over high speed (such as 10 Gbps or higher) Ethernet
links.
It uses the FCoE protocol, which encapsulates FC frames into Ethernet frames. The FCoE protocol
is defined by the T11 standards committee. FCoE is based on an enhanced Ethernet standard that
supports Data Center Bridging (DCB) functionalities (also called CEE functionalities). DCB
ensures lossless transmission of FC traffic over Ethernet.
FCoE SAN provides the flexibility to deploy the same network components for transferring both
server-to-server traffic and FC storage traffic. This helps to mitigate the complexity of managing
multiple discrete network infrastructures. FCoE SAN uses multi-functional network adapters and
switches. Therefore, FCoE reduces the number of network adapters, cables, and switches, along
with power and space consumption required in a data center.
3.4.1 Components of FCoE SAN
The key FCoE SAN components are:
• Network adapters, such as the Converged Network Adapter (CNA) and the software FCoE
adapter
• FCoE switch
Converged Network Adapter (CNA)
The CNA is a physical adapter that provides the functionality of both a standard NIC and an FC
HBA in a single device. It consolidates both FC traffic and regular Ethernet traffic on a common
Ethernet infrastructure. CNAs connect hosts to the FCoE switches. They are responsible for
encapsulating FC traffic onto Ethernet frames and forwarding them to FCoE switches over CEE
links.
They eliminate the need to deploy separate adapters and cables for FC and Ethernet
communications, thereby reducing the required number of network adapters and switch ports.
A CNA offloads the FCoE protocol processing task from the compute system, thereby freeing the
CPU resources of the compute system for application processing. It contains separate modules
for 10 Gigabit Ethernet (GE), FC, and FCoE Application Specific Integrated Circuits (ASICs).
Software FCoE Adapter
Instead of a CNA, a software FCoE adapter may also be used. A software FCoE adapter is OS or
hypervisor kernel-resident software that performs FCoE processing. This FCoE processing
consumes host CPU cycles.
With software FCoE adapters, the OS or hypervisor implements FC protocol in software that
handles SCSI to FC processing. The software FCoE adapter performs FC to Ethernet
encapsulation. Both FCoE traffic (Ethernet traffic that carries FC data) and regular Ethernet
traffic are transferred through supported NICs on the hosts.
FCoE Switch
An FCoE switch has both Ethernet switch and FC switch functionalities. It has a Fibre Channel
Forwarder (FCF), an Ethernet Bridge, and a set of ports that can be used for FC and Ethernet
connectivity. FCF handles FCoE login requests, applies zoning, and provides the fabric services
typically associated with an FC switch. It also encapsulates the FC frames received from the FC
port into the Ethernet frames and decapsulates the Ethernet frames received from the Ethernet
Bridge to the FC frames.
Upon receiving the incoming Ethernet traffic, the FCoE switch inspects the Ethertype of the
incoming frames and uses that to determine their destination. If the Ethertype of the frame is
FCoE, the switch recognizes that the frame contains an FC payload and then forwards it to the
FCF. From there, the FC frame is extracted from the Ethernet frame and transmitted to the FC
SAN over the FC ports. If the Ethertype is not FCoE, the switch handles the traffic as regular
Ethernet traffic and forwards it over the Ethernet ports.
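This Ethertype-based dispatch can be sketched in a few lines of Python. The FCoE Ethertype value
0x8906 is the assigned one; the frame offsets assume an untagged Ethernet frame:

ETHERTYPE_FCOE = 0x8906  # Ethertype assigned to FCoE

def dispatch(frame: bytes) -> str:
    """Inspect the Ethertype (bytes 12-13 of an untagged frame) and decide
    where the FCoE switch should send the frame."""
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == ETHERTYPE_FCOE:
        return "FCF: extract the FC frame and forward it over the FC ports"
    return "Ethernet bridge: forward the frame over the Ethernet ports"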
3.4.2 FCoE SAN connectivity
The most common FCoE connectivity uses FCoE switches to interconnect a CEE network
containing hosts with an FC SAN containing storage systems. The hosts have FCoE ports that
provide connectivity to the FCoE switches. The FCoE switches enable the consolidation of FC
traffic and Ethernet traffic onto CEE links.
This type of FCoE connectivity is suitable when an organization has an existing FC SAN
environment. Connecting FCoE hosts to FC storage systems through FCoE switches does not
require any change in the FC environment. The other FCoE connectivity model is the
end-to-end FCoE model. Some vendors offer FCoE ports in their storage systems. These storage
systems connect directly to the FCoE switches.
The FCoE switches form FCoE fabrics between hosts and storage systems and provide end-to-end
FCoE support. The end-to-end FCoE connectivity is suitable for a new FCoE deployment.
3.4.3 Converged Enhanced Ethernet
Traditional Ethernet networks are accessed via a network adapter called a network interface
card (NIC). Each host that wants to connect to the Ethernet network needs at least one.
Conversely, traditional Fibre Channel networks are accessed via a network adapter called a host
bus adapter (HBA) in each host.
However, to create a single data center network capable of transporting both IP and FC storage
traffic, Ethernet adapters had to be significantly enhanced and upgraded. Accessing an FCoE
network requires a new type of network adapter called a converged network adapter (CNA); with a
CNA installed, a server can do all of its networking through one adapter. All three of these
network adapters (NIC, HBA, and CNA) are implemented as PCI adapter cards. They can be either
expansion cards or integrated directly on the motherboard of a server, in what is known as LAN on
motherboard (LOM).
A CNA is exactly what the name suggests: a NIC and an HBA converged into a single network
card. For performance reasons, CNAs provide NIC and HBA functionality in
hardware (usually an ASIC) so that tasks like FCoE encapsulation can be performed quickly without
consuming host CPU resources.
A CNA can handle general-purpose IP networking, FC storage networking, iSCSI storage, NAS, and
even low-latency, high-performance computing. A single network adapter and a single cable
do the entire job. This also has the positive effect of reduced power and cooling costs. All in
all, it delivers reduced data center running costs and a lower total cost of ownership (TCO).
To create this new enhanced Ethernet, the IEEE formed a new task group within the 802.1
working group, called Data Center Bridging (DCB). This task group is responsible for the
development of a data center Ethernet network that is capable of transporting all common data
center network traffic types, such as IP LAN traffic, FC storage traffic, and InfiniBand
high-performance computing traffic. The enhanced Ethernet is generally called Data Center
Bridging (DCB), Converged Enhanced Ethernet (CEE), Data Center Fabric, or Unified Fabric.
This Converged Enhanced Ethernet (CEE) has the following enhancements, which are discussed
in the following sections:
• Increased bandwidth
• Classes of service
• Priorities
• Congestion management
All these enhancements are obtained by using converged network adapters and new hardware,
such as cables, switch ports, and switches.
Functions of Converged Enhanced Ethernet (CEE)
Conventional Ethernet is lossy in nature, which means that frames might be dropped or lost under
congestion conditions. Therefore, Converged Enhanced Ethernet (CEE) provides a new
specification to the existing Ethernet standard. It eliminates the lossy nature of Ethernet and
enables convergence of various types of network traffic on a common Ethernet infrastructure.
CEE eliminates the dropping of frames due to congestion and thereby ensures lossless
transmission of FCoE (Fibre Channel over Ethernet) traffic over an Ethernet network. The
lossless Ethernet is required for the reliable transmission of FC data over an Ethernet network.
Unlike TCP/IP, the loss of a single FC frame typically requires the entire FC exchange to be
aborted and re-transmitted, instead of just re-sending the particular missing frame. CEE makes a
high-speed (such as 10 Gbps or higher) Ethernet network a viable storage networking option,
similar to an FC SAN.
The CEE requires certain functionalities. These functionalities are defined and maintained by the
Data Center Bridging (DCB) task group, which is a part of the IEEE 802.1 working group. These
functionalities are:
• Priority-based flow control (PFC)
• Enhanced transmission selection (ETS)
• Congestion notification (CN)
• Data center bridging exchange protocol (DCBX)
Priority-based Flow Control (PFC)
PFC provides a link-level flow control mechanism. PFC creates eight separate virtual links on a
single physical link and allows any of these links to be paused and restarted independently. PFC
enables the PAUSE mechanism based on user priorities or classes of service. Enabling the PAUSE
based on priority allows creating lossless links for network traffic, such as FCoE traffic. This
PAUSE mechanism is typically implemented for FCoE while regular TCP/IP traffic continues to
drop frames.
Enhanced Transmission Selection (ETS)
Enhanced transmission selection (ETS) provides a common management framework for the
allocation of bandwidth to different traffic classes, such as LAN, SAN, and Inter Process
Communication (IPC). For example, an administrator may assign 40 percent of network
bandwidth to LAN traffic, 40 percent of bandwidth to SAN traffic, and 20 percent of bandwidth
to IPC traffic. When a particular class of traffic does not use its allocated bandwidth, ETS enables
other traffic classes to use the available bandwidth.
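A simplified, single-round sketch of this behavior follows; the equal split of leftover bandwidth
among the still-hungry classes is our own simplification of what switch hardware actually does:

def ets_share(allocated: dict[str, float], demand: dict[str, float],
              link_bw: float) -> dict[str, float]:
    """Grant each class its guaranteed share, then let classes that still
    have demand borrow the bandwidth other classes leave unused."""
    grant = {c: min(demand[c], allocated[c] * link_bw) for c in allocated}
    spare = link_bw - sum(grant.values())
    hungry = [c for c in allocated if demand[c] > grant[c]]
    for c in hungry:  # naive equal split of the leftover bandwidth
        grant[c] += min(demand[c] - grant[c], spare / len(hungry))
    return grant

# 40/40/20 split on a 10 Gbps link; SAN is idle, so LAN borrows its share.
print(ets_share({"LAN": 0.4, "SAN": 0.4, "IPC": 0.2},
                {"LAN": 7.0, "SAN": 0.0, "IPC": 2.0}, link_bw=10.0))
# {'LAN': 7.0, 'SAN': 0.0, 'IPC': 2.0}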
Congestion notification (CN)
Congestion notification (CN) provides end-to-end congestion management for protocols, such as
FCoE, that do not have built-in congestion control mechanisms. Link level congestion
notification provides a mechanism for detecting congestion and notifying the source to move the
traffic flow away from the congested links. Link level congestion notification enables a switch to
send a signal to other ports that need to stop or slow down their transmissions.
The process of congestion notification and its management is shown in the above figure, which
represents the communication between the nodes A (sender) and B (receiver). If congestion at the
receiving end occurs, the algorithm running on the switch generates a congestion notification
message to the sending node (Node A). In response to the message, the sending end limits the
rate of data transfer.
Data Center Bridging Exchange Protocol (DCBX)
DCBX is a discovery and capability exchange protocol that helps CEE devices to convey and
configure their features with the other CEE devices in the network. DCBX is used to negotiate
capabilities between the switches and the network adapters, which allows the switch to distribute
the configuration values to all the attached adapters. This helps to ensure consistent configuration
across the entire network.
3.4.4 FCoE Architecture
The data in FCoE is sent through FCoE frames. An FCoE frame is an Ethernet frame that contains
an FCoE Protocol Data Unit (PDU). The diagram below shows the FCoE frame structure. The
Ethernet header includes the source and destination MAC addresses, the IEEE 802.1Q VLAN tag,
and the Ethertype field. FCoE has its own Ethertype.
The FCoE header includes a version field that identifies the version of FCoE being implemented
and some reserved bits. The Start of Frame (SOF) and the End of Frame (EOF) mark the start and
the end of the encapsulated FC frame respectively. The encapsulated FC frame consists of the FC
header and the data being transported (including the FC CRC). The FCoE frame ends with the
Frame Check Sequence (FCS) field that provides error detection for the Ethernet frame. Notice
that the FCoE frame, unlike iSCSI and FCIP, has no TCP/IP overhead.
Frame size is an important factor in FCoE. A typical FC data frame has a 2112-byte payload, a
24-byte header, and an FCS. A standard Ethernet frame has a default payload capacity of 1500
bytes. To maintain good performance, FCoE must use jumbo frames to prevent an FC frame from
being split into two Ethernet frames.
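A quick size calculation shows why. The FC frame sizes below come from the text; the FCoE
framing overhead inside the Ethernet payload is approximate:

FC_PAYLOAD_MAX = 2112   # bytes, maximum FC data field
FC_HEADER = 24          # bytes
FC_CRC = 4              # bytes
FCOE_PDU_OVERHEAD = 18  # approximate FCoE header plus SOF/EOF framing

largest_fc_frame = FC_PAYLOAD_MAX + FC_HEADER + FC_CRC  # 2140 bytes
ethernet_payload_needed = largest_fc_frame + FCOE_PDU_OVERHEAD
print(ethernet_payload_needed)         # roughly 2158 bytes
print(ethernet_payload_needed > 1500)  # True: a jumbo frame is required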
The encapsulation of the FC frames occurs through the mapping of the FC frames onto Ethernet,
as shown in the figure. FC and traditional networks have stacks of layers, where each layer in the
stack represents a set of functionalities. The FC stack consists of five layers: FC-0 through FC-4.
Ethernet is typically considered a set of protocols that operate at the physical and data link
layers of the seven-layer OSI stack. The FCoE protocol specification replaces the FC-0 and FC-1
layers of the FC stack with Ethernet. This provides the capability to carry the FC-2 through FC-4
layers over the Ethernet layer.
FCoE Addressing
An FCoE SAN uses MAC addresses for frame forwarding. The MAC addresses are assigned to the
VN_Ports, VF_Ports, and VE_Ports. The destination and the source MAC addresses are used to
direct frames to their Ethernet destinations. Both the VF_Ports and the VE_Ports obtain MAC
addresses from the FCoE switch. FCoE supports two types of addressing for the VN_Ports:
server-provided MAC address (SPMA) and fabric-provided MAC address (FPMA). These
addressing types are described below.
SPMA: In this type of addressing, the compute systems provide MAC addresses to the associated
VN_Ports. The MAC addresses are issued in accordance with Ethernet standards. These
addresses are either burned-in by the manufacturers of the network adapters or are configured by
an administrator. SPMA can use a single MAC address exclusively for FCoE traffic, or it can use a
different MAC address for each VN_Port.
FPMA: In this type of addressing, the VN_Ports receive MAC addresses from the FCoE switches
dynamically during login. The VN_Ports then use their granted MAC addresses for
communication. This address is derived by concatenating the 24-bit FC MAC address prefix (FC-
MAP) and the 24-bit FC address assigned to the VN_Port by the FCoE switch. FC-MAP identifies
the fabric to which an FCoE switch belongs. The FPMA ensures that the MAC addresses are
unique within an FCoE SAN.
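FPMA construction is a straightforward bit-level concatenation, as the sketch below shows. The
value 0x0EFC00 is the default FC-MAP; the 24-bit FC address is a hypothetical example:

def fpma(fc_map: int, fc_id: int) -> str:
    """Build a fabric-provided MAC address by concatenating the 24-bit
    FC-MAP with the 24-bit FC address assigned at login."""
    mac = (fc_map << 24) | (fc_id & 0xFFFFFF)
    return ":".join(f"{(mac >> s) & 0xFF:02X}" for s in range(40, -1, -8))

print(fpma(0x0EFC00, 0x010203))  # 0E:FC:00:01:02:03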
FCoE Frame Forwarding
In an FCoE SAN, a node must know two different addresses to forward a frame to another node.
First, it must know the Ethernet MAC address of the FCoE switch port (VF_Port). Second, it
must know the FC address assigned to the destination node port (VN_Port or N_Port). The MAC
address is used to forward an Ethernet frame containing FC payload over a CEE network. The
FC address is used to send the FC frame, encapsulated into the Ethernet frame, to its FC
destination.
To understand the FCoE communication, it is important to know the FCoE process. The FCoE
process includes three key phases: discovery, login, and data transfer.
• Discovery phase: In this phase, the FCFs discover each other and form an FCoE
fabric. The FCoE nodes also find the available FCFs for login. Moreover, both the
FCoE nodes and the FCFs discover potential VN_Port to VF_Port pairing.
• Login phase: In this phase, the virtual FC links are established between VN_Ports and
VF_Ports as well as between VE_Ports. VN_Ports perform FC login (including
FLOGI, PLOGI, and PRLI) to the discovered FCFs and obtain FC addresses. Each
VN_Port also obtains a unique MAC address.
• Data transfer phase: After login, the VN_Ports can start transferring regular FC
frames (encapsulated) over the CEE network.
In an FCoE SAN, an FCoE node needs a discovery mechanism that allows it to discover the
available FCFs before it can perform FC login. The mechanism used for the discovery is the
FCoE Initialization Protocol (FIP).
FIP is used for discovering the FCFs and establishing virtual links between FCoE devices (FCoE
nodes and FCoE switches). Unlike FCoE frames, FIP frames do not transport FC data, but contain
discovery and login/logout parameters. FIP frames are assigned a unique EtherType code to
distinguish them from the FCoE frames.
The FCoE node to FCF discovery and the login use the following FIP operations (a Python sketch
of this exchange follows the list):
• FCoE node sends multicast FIP Solicitation frame to find which FCFs are available
for login.
• Each FCF replies to the FCoE node by sending unicast FIP Advertisement frame.
• After the FCoE node decides which FCF is appropriate, it sends FIP FLOGI request
to the FCF.
• The selected FCF sends FIP FLOGI Accept which contains both FC address and MAC
address for the VN_Port. The reason for using FIP for FLOGI instead of a regular
FLOGI is that the FIP FLOGI Accept has a field for the FCF to assign a MAC address
to the VN_Port.
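The sketch below walks through these four steps as plain Python objects; the FCF selection rule
and the returned addresses are hypothetical placeholders, used only to make the message flow
concrete:

from dataclasses import dataclass

@dataclass
class FipFrame:
    op: str       # Solicitation / Advertisement / FLOGI / FLOGI-Accept
    src: str
    payload: dict

def fip_login(node: str, fcfs: list[str]) -> dict:
    """Walk the four FIP operations listed above."""
    trace = [FipFrame("Solicitation", node, {"multicast": True})]
    trace += [FipFrame("Advertisement", fcf, {"unicast_to": node})
              for fcf in fcfs]       # one reply per available FCF
    chosen = fcfs[0]                 # placeholder selection rule
    trace.append(FipFrame("FLOGI", node, {"to": chosen}))
    accept = FipFrame("FLOGI-Accept", chosen,
                      {"fc_id": 0x010203, "mac": "0E:FC:00:01:02:03"})
    trace.append(accept)
    for frame in trace:
        print(frame.op, "from", frame.src)
    return accept.payload            # the VN_Port's FC and MAC addresses

print(fip_login("cna-0", ["fcf-a", "fcf-b"]))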
Two Mark Questions with Answers
2. Define virtualization.
4. What is zoning?
▪ Zoning allows for finer segmentation of the switched fabric. Zoning can be used to
instigate a barrier between different environments.
▪ Only the members of the same zone can communicate within that zone; all
other attempts from outside are rejected.
▪ Zoning can be implemented in the following ways:
✓ Hardware zoning
✓ Software zoning
o Link aggregation combines multiple Ethernet links into a single logical link
between two networked devices. Link aggregation is sometimes called by other names:
Ethernet bonding, Ethernet teaming.
The FC interconnectivity options are as follows:
✓ Point-to-point
✓ Fibre Channel-Arbitrated Loop (FC-AL)
✓ Switched Fabric
FCoE SAN is a Converged Enhanced Ethernet (CEE) network that is capable of transporting
FC data along with regular Ethernet traffic over high speed (such as 10 Gbps or higher) Ethernet
links.
Review Questions
1. Explain in detail about Block-based, File-based, Object-based and Unified Storage systems.
2. Describe in detail about the components and architecture of FC SAN
3. Write a note on FC SAN topologies and connectivity
4. Describe in detail about zoning and link aggregation
5. What is a Virtual SAN in an FC SAN? Explain in detail.
6. Explain in detail about the IP SAN Protocols (iSCSI protocol Stack, FCIP protocol Stack)
7. Explain in detail about zoning and aggregation in iSCSI IP SAN
8. Explain the components, performance and addressing in FCIP
9. Explain in detail about the components and architecture of FCoE
10. Describe in detail about Converged Enhanced Ethernet