Storage Area Network Module-1


Storage Area Network

Module-1
Introduction
 Data centers now view information storage as one of their core
elements, along with applications, databases, operating systems,
and networks.

 Storage technology continues to evolve with technical advancements offering increasingly higher levels of availability, security, scalability, performance, integrity, capacity, and manageability.

 Storage technology and architecture continue to evolve, which enables organizations to consolidate, protect, optimize, and leverage their data to achieve the highest return on information assets.
Cont..
 This Module describes the evolution of information storage architecture from simple direct-attached models to complex networked topologies.

 EVOLUTION OF STORAGE TECHNOLOGY AND ARCHITECTURE:
 Historically, organizations had centralized computers (mainframes) and information storage devices (tape reels and disk packs) in their data centers.

 The evolution of open systems and the affordability and ease of deployment that they offer made it possible for business units/departments to have their own servers and storage.
Cont..
 In earlier implementations of open systems, the storage was typically internal to the server.
• Redundant Array of Independent Disks (RAID): This
technology was developed to address the cost, performance, and
availability requirements of data.
• Direct-attached storage (DAS): This type of storage connects
directly to a server (host) or a group of servers in a cluster. Storage
can be either internal or external to the server.
• Storage area network (SAN): This is a dedicated, high-performance Fibre Channel (FC) network that facilitates block-level communication between servers and storage. A SAN offers scalability, availability, performance, and cost benefits compared to DAS.
Cont..
 Network-attached storage (NAS): This is dedicated storage for
file serving applications. Unlike a SAN, it connects to an existing
communication network (LAN) and provides file access to
heterogeneous clients.
 Internet Protocol SAN (IP-SAN): One of the latest evolutions in
storage architecture, IP-SAN is a convergence of technologies used
in SAN and NAS.
KEY CHARACTERISTICS OF DATA CENTER ELEMENTS
• Uninterrupted operation of data centers is critical to the survival
and success of a business. It is necessary to have a reliable
infrastructure that ensures data is accessible at all times.

• While the requirements, shown in Figure 1-6, are applicable to all elements of the data center infrastructure, our focus here is on storage systems.

• Availability: All data center elements should be designed to ensure accessibility. The inability of users to access data can have a significant negative impact on a business.
Cont..
• Security: Policies, procedures, and proper integration of the data center core elements must be established to prevent unauthorized access to information.
In addition to the security measures for client access, specific mechanisms must enable servers to access only their allocated resources on storage arrays.

• Scalability: Data center operations should be able to allocate additional processing capabilities or storage on demand, without interrupting business operations.
• Performance: All the core elements of the data center should be
able to provide optimal performance and service all processing
requests at high speed.
Cont..
• Data integrity: Data integrity refers to mechanisms, such as error correction codes or parity bits, that ensure data is written to disk exactly as it was received.
• Capacity: Data center operations require adequate resources to
store and process large amounts of data efficiently.
When capacity requirements increase, the data center must be able
to provide additional capacity without interrupting availability, or,
at the very least, with minimal disruption.
• Manageability: A data center should perform all operations and activities in the most efficient manner.
Manageability can be achieved through automation and the reduction of human (manual) intervention in common tasks.
VIRTUALIZATION AND CLOUD COMPUTING
Virtualization
 Virtualization is a technique of abstracting physical resources, such
as compute, storage, and network, and making them appear as
logical resources.

 Common examples of virtualization are virtual memory used on compute systems and partitioning of raw disks.

 Virtualization enables pooling of physical resources and provides an aggregated view of the physical resource capabilities. For example, storage virtualization enables multiple pooled storage devices to appear as a single large storage entity.
Cont..
 Similarly, by using compute virtualization, the CPU capacity of the
pooled physical servers can be viewed as the aggregation of the
power of all CPUs (in megahertz).

 Virtualization also enables centralized management of pooled resources.

 Based on business requirements, capacity can be added to or removed from the virtual resources without any disruption to applications or users.

 Cloud Computing:
Cloud computing addresses these challenges efficiently. It enables individuals or businesses to use IT resources as a service over the network.
Cont..
 It provides highly scalable and flexible computing that enables
provisioning of resources on demand.

 Users can scale their demand for computing resources, including storage capacity, up or down with minimal management effort or service provider interaction.

 Cloud computing empowers self-service requesting through a fully automated request-fulfillment process.

 Cloud computing enables consumption-based metering; therefore, consumers pay only for the resources they use, such as CPU hours used, amount of data transferred, and gigabytes of data stored.
Cont..
 Cloud infrastructure is usually built upon virtualized data centers,
which provide resource pooling and rapid provisioning of
resources.
KEY DATA CENTER ELEMENTS

• The three main components in a storage system environment — the host, connectivity, and storage — are described in this section.

• (i) HOST (COMPUTE):
• The computers on which applications run are referred to as hosts or compute systems. Hosts can be physical or virtual machines.

• Compute virtualization software enables creating virtual machines on top of a physical compute infrastructure.

• A host consists of CPU, memory, I/O devices, and a collection of software to perform computing operations.
CONT..
• This software includes the operating system, file system, logical
volume manager, device drivers, and so on. This software can be
installed as separate entities or as part of the operating system.
• Software runs on a host and enables processing of input and output
(I/O) data.
• The following section details various software components that are
essential parts of a host system:
• Operating System: The operating system organizes and controls hardware components, manages the allocation of hardware resources, and monitors and responds to user actions and the environment.
CONT..
• Memory Virtualization: Memory virtualization enables multiple
applications and processes, whose aggregate memory requirement
is greater than the available physical memory, to run on a host
without impacting each other.
• Device Driver: A device driver is special software that permits the
operating system to interact with a specific device, such as a
printer, a mouse, or a disk drive.
• Volume Manager: The evolution of Logical Volume Managers (LVMs) enabled dynamic extension of file system capacity and efficient storage management.
The LVM is software that runs on the compute system and manages logical and physical storage. In partitioning, a disk drive is divided into logical containers called logical volumes (LVs).
CONT..
 File Systems: A file system consists of logical structures and
software routines that control access to files. It provides users with
the functionality to create, modify, delete, and access files.
 Compute Virtualization: Compute virtualization is a technique for
masking or abstracting the physical hardware from the operating
system. It enables multiple operating systems to run concurrently
on single or clustered physical machines.
This technique enables creating portable virtual compute systems
called virtual machines (VMs). Compute virtualization is achieved
by a virtualization layer that resides between the hardware and
virtual machines.
This layer is also called the hypervisor. The hypervisor provides
hardware resources, such as CPU, memory, and network to all the
virtual machines.
CONT..
(ii) CONNECTIVITY:
• Connectivity refers to the interconnection between hosts or between
a host and peripheral devices, such as printers or storage devices.
• Connectivity and communication between host and storage are
enabled using physical components and interface protocols.
• Physical Components of Connectivity: The physical components
of connectivity are the hardware elements that connect the host to
storage.
• Three physical components of connectivity between the host and
storage are the host interface device, port, and cable (Figure 2-4).
CONT..
• Interface Protocols: A protocol enables communication between the
host and storage. Protocols are implemented using interface devices
(or controllers) at both source and destination.
• The popular interface protocols used for host-to-storage communication are Integrated Device Electronics/Advanced Technology Attachment (IDE/ATA), Small Computer System Interface (SCSI), Fibre Channel (FC), and Internet Protocol (IP).
(iii) STORAGE
• Storage is a core component in a data center. A storage device uses magnetic, optical, or solid-state media. Disks, tapes, and diskettes use magnetic media, whereas CDs/DVDs use optical media for storage.
CONT..
• However, tapes have various limitations in terms of performance
and management, as listed here:
• Data is stored on the tape linearly along the length of the tape.
Search and retrieval of data are done sequentially, and it invariably
takes several seconds to access the data.
• As a result, random data access is slow and time-consuming. This
limits tapes as a viable option for applications that require real-
time, rapid access to data.
• In a shared computing environment, data stored on tape cannot be
accessed by multiple applications simultaneously, restricting its use
to one application at a time.
CONT..
• On a tape drive, the read/write head touches the tape surface, so the
tape degrades or wears out after repeated use.
• The storage and retrieval requirements of data from the tape and the
overhead associated with managing the tape media are significant.
RAID IMPLEMENTATIONS

• The two methods of RAID (Redundant Array of Independent Disks) implementation are hardware and software. Both have their advantages and disadvantages, which are discussed in this section.
Software RAID:
• Software RAID uses host-based software to provide RAID
functions. It is implemented at the operating-system level and
does not use a dedicated hardware controller to manage the RAID
array.
• Software RAID implementations offer cost and simplicity benefits
when compared with hardware RAID. However, they have the
following limitations:
• Performance: Software RAID affects overall system performance.
This is due to additional CPU cycles required to perform RAID
calculations.
Cont..
• Supported features: Software RAID does not support all RAID
levels.
• Operating system compatibility: Software RAID is tied to the host
operating system; hence, upgrades to software RAID or to the
operating system should be validated for compatibility. This leads
to inflexibility in the data-processing environment.
Hardware RAID
• In hardware RAID implementations, a specialized hardware
controller is implemented either on the host or on the array.
• Controller card RAID is a host-based hardware RAID
implementation in which a specialized RAID controller is installed
in the host, and disk drives are connected to it.
Cont..
• The external RAID controller is an array-based hardware RAID. It
acts as an interface between the host and disks. It presents
storage volumes to the host, and the host manages these volumes
as physical drives. The key functions of the RAID controllers are as
follows:
• Management and control of disk aggregations
• Translation of I/O requests between logical disks and physical disks
• Data regeneration in the event of disk failures
RAID TECHNIQUES
• RAID techniques — striping, mirroring, and parity — form the basis
for defining various RAID levels. These techniques determine the
data availability and performance characteristics of a RAID set.
• (i) Striping:
• Striping is a technique to spread data across multiple drives (more
than one) to use the drives in parallel.
• Strip size (also called stripe depth) describes the number of blocks
in a strip and is the maximum amount of data that can be written
to or read from a single disk in the set, assuming that the accessed
data starts at the beginning of the strip.
Cont..
• All strips in a stripe have the same number of blocks.
• Stripe size is the strip size multiplied by the number of data disks in the RAID set. For example, in a five-disk striped RAID set with a strip size of 64 KB, the stripe size is 320 KB (64 KB × 5).
• Striped RAID does not provide any data protection unless parity or
mirroring is used, as discussed in the following sections.
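As a rough sketch of the striping address math described above (illustrative Python; the 512-byte blocks, 64 KB strip size, and five-disk set are assumed example values, not fixed parameters):

    # Map a logical block to (disk, strip on that disk, offset) in a striped set.
    BLOCK_SIZE = 512           # bytes per block (assumed)
    STRIP_SIZE = 64 * 1024     # 64 KB strip, i.e., the stripe depth (assumed)
    NUM_DISKS = 5              # disks in the striped set (assumed)
    BLOCKS_PER_STRIP = STRIP_SIZE // BLOCK_SIZE   # 128 blocks per strip

    def locate(logical_block):
        strip_number = logical_block // BLOCKS_PER_STRIP
        disk = strip_number % NUM_DISKS             # strips rotate across disks
        strip_on_disk = strip_number // NUM_DISKS   # which strip on that disk
        offset = logical_block % BLOCKS_PER_STRIP   # block within the strip
        return disk, strip_on_disk, offset

    print(locate(0))     # (0, 0, 0): the first strip lands on disk 0
    print(locate(128))   # (1, 0, 0): the next 64 KB strip lands on disk 1

Here one full stripe is STRIP_SIZE × NUM_DISKS = 320 KB, matching the example above.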
Cont..
• (ii) Mirroring:
• Mirroring is a technique whereby the same data is stored on two different disk drives, yielding two copies of the data. If one disk drive fails, the data is intact on the surviving disk drive (see Figure 3-3), and the controller continues to service the host’s data requests from the surviving disk of the mirrored pair.
Cont..
• Mirroring involves duplication of data — the amount of storage
capacity needed is twice the amount of data being stored.
• Therefore, mirroring is considered expensive and is preferred for
mission-critical applications that cannot afford the risk of any data
loss.
• Mirroring improves read performance because read requests can
be serviced by both disks.
• Mirroring does not deliver the same levels of write performance as a striped RAID, because each write must be committed to both disks.
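A minimal sketch of the mirrored-pair behavior just described (illustrative Python; the MirroredPair class is a hypothetical stand-in, not a real controller):

    import itertools

    class MirroredPair:
        def __init__(self):
            self.disks = [{}, {}]                  # two copies of every block
            self.reader = itertools.cycle([0, 1])  # alternate reads across disks

        def write(self, block, data):
            for disk in self.disks:                # every write goes to both disks
                disk[block] = data

        def read(self, block):
            return self.disks[next(self.reader)][block]  # either copy can serve

    pair = MirroredPair()
    pair.write(7, b"payload")
    assert pair.read(7) == pair.read(7)  # both copies hold identical data

Because reads alternate between the two disks, read throughput improves, while every write costs two disk writes.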
Cont..
(iii) Parity
• Parity is a method to protect striped data from disk drive failure
without the cost of mirroring. An additional disk drive is added to
hold parity, a mathematical construct that allows re-creation of
the missing data.
• Parity is a redundancy technique that ensures protection of data
without maintaining a full set of duplicate data.
• Calculation of parity is a function of the RAID controller.
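Parity is commonly computed as a bitwise XOR across the data strips, which is why any single missing strip can be re-created from the survivors. A minimal sketch (illustrative Python, not a controller implementation):

    from functools import reduce

    def xor_parity(strips):
        # Byte-wise XOR across equal-length strips.
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

    data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # four data strips
    parity = xor_parity(data)

    # Disk 2 fails: rebuild its strip from the surviving strips plus parity.
    survivors = data[:2] + data[3:] + [parity]
    assert xor_parity(survivors) == data[2]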
RAID LEVELS
• Application performance, data availability requirements,
and cost determine the RAID level selection.
• These RAID levels are defined on the basis of striping,
mirroring, and parity techniques. Some RAID levels use a
single technique, whereas others use a combination of
techniques.
• Table 3-1 shows the commonly used RAID levels.
Cont..
RAID-0
• A RAID 0 configuration uses data striping techniques, where data is striped across all the disks within a RAID set.
• Therefore, it utilizes the full storage capacity of the RAID set. To read data, all the strips are put back together by the controller.

• Figure 3-5 shows RAID 0 in an array in which data is striped across five disks.
Cont..
RAID 0 is a good option for applications that need high I/O throughput.
However, if these applications require high availability during drive failures, RAID 0 is unsuitable because it provides no data protection.
RAID 1:
RAID 1 is based on the mirroring technique. In this RAID
configuration, data is mirrored to provide fault tolerance
(see Figure 3-6).
A RAID 1 set consists of two disk drives and every write is
written to both disks.
Cont..
RAID 1 is suitable for applications that require high availability and for which cost is not a constraint.
Cont..
NESTED RAID
Most data centers require data redundancy and
performance from their RAID arrays.
RAID 1+0 and RAID 0+1 combine the performance benefits of RAID 0 with the redundancy benefits of RAID 1.
They use striping and mirroring techniques and combine
their benefits. These types of RAID require an even number
of disks, the minimum being four (see Figure 3-7).
Cont..
The steps performed in RAID 0+1 are shown in Figure 3-7 (b).
RAID 3:
RAID 3 stripes data for performance and uses parity for fault tolerance.
 Parity information is stored on a dedicated drive so that the data can be reconstructed if a drive fails in the RAID set.
 For example, in a set of five disks, four are used for data and one for parity.
Cont..
 Figure 3-8 illustrates the RAID 3 implementation.
 RAID 3 provides good performance for applications that involve large sequential data access, such as data backup or video streaming.

RAID-4:
Similar to RAID 3, RAID 4 stripes data for high performance
and uses parity for improved fault tolerance.
 Data is striped across all disks except the parity disk in the array. Parity information is stored on a dedicated disk so that the data can be rebuilt if a drive fails.
Cont..
 Unlike RAID 3, data disks in RAID 4 can be accessed independently, so specific data elements can be read or written on a single disk without reading or writing an entire stripe.
 RAID 4 provides good read throughput and reasonable write throughput.
Cont..

RAID 5:
RAID 5 is a versatile RAID implementation. It is similar to
RAID 4 because it uses striping.

 The drives (strips) are also independently accessible. The difference between RAID 4 and RAID 5 is the parity location.
 In RAID 4, parity is written to a dedicated drive, creating a write bottleneck for the parity disk.
 In RAID 5, parity is distributed across all disks to overcome the write bottleneck of a dedicated parity disk. Figure 3-9 illustrates the RAID 5 implementation.
Cont..
 RAID 5 is good for random, read-intensive I/O applications
and preferred for messaging, data mining, medium-
performance media serving, and relational database
management system (RDBMS) implementations, in which
database administrators (DBAs) optimize data access.
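As a rough illustration of distributed parity, the parity strip's position can rotate with the stripe number (a left-symmetric-style sketch in Python; actual layouts vary by implementation):

    NUM_DISKS = 5

    def parity_disk(stripe):
        # Parity moves to a different disk on each successive stripe.
        return (NUM_DISKS - 1 - stripe) % NUM_DISKS

    for stripe in range(5):
        row = ["P" if d == parity_disk(stripe) else "D" for d in range(NUM_DISKS)]
        print("stripe", stripe, ":", " ".join(row))
    # stripe 0: D D D D P
    # stripe 1: D D D P D   ...the parity strip shifts on every stripe

Because no single disk holds all the parity, small writes are spread across the set instead of queuing behind one parity drive.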

RAID 6
RAID 6 works the same way as RAID 5, except that RAID
6 includes a second parity element to enable survival if two
disk failures occur in a RAID set (see Figure 3-10).
 Therefore, a RAID 6 implementation requires at least four
disks.
RAID 6 distributes the parity across all the disks.
Cont..
The write penalty (explained later in this chapter) in RAID 6
is more than that in RAID 5; therefore, RAID 5 writes
perform better than RAID 6.
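As a rough worked comparison: a small random write to a RAID 5 set typically costs four disk I/Os (read old data, read old parity, write new data, write new parity), while the same write to RAID 6 typically costs six, because the second parity must also be read and rewritten. This four-versus-six write penalty is why RAID 5 writes perform better.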

COMPONENTS OF AN INTELLIGENT STORAGE SYSTEM:
• An intelligent storage system consists of four key
components: front end, cache, back end, and physical
disks. Figure 4-1 illustrates these components and their
interconnections.
• An I/O request received from the host at the front-end
port is processed through cache and back end, to enable
storage and retrieval of data from the physical disk.
• A read request can be serviced directly from cache if the
requested data is found in the cache.
Cont..
• In modern intelligent storage systems, front end, cache,
and back end are typically integrated on a single board
(referred to as a storage processor or storage controller).
Cont..
Front End
• The front end provides the interface between the storage
system and the host.
• It consists of two components: front-end ports and
front-end controllers.
• Typically, a front end has redundant controllers for high
availability, and each controller contains multiple ports
that enable large numbers of hosts to connect to the
intelligent storage system.
Cont..
• Front-end controllers route data to and from cache via the internal data bus. When the cache receives the write data, the controller sends an acknowledgment message back to the host.
Cache
• Cache is semiconductor memory where data is placed
temporarily to reduce the time required to service I/O
requests from the host.
Cont..
Read Operation with Cache
Write Operation with Cache
• Write-back cache: Data is placed in cache and an acknowledgment is sent to the host immediately; the data is committed (de-staged) to the disk later.
• Write-through cache: Data is placed in cache and immediately written to the disk, and only then is an acknowledgment sent to the host.
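A minimal sketch of the two write policies (illustrative Python; the Cache class and its page dictionary are hypothetical stand-ins for controller logic):

    class Cache:
        def __init__(self, disk, write_back=True):
            self.disk = disk            # dict standing in for the physical disk
            self.pages = {}             # cached pages
            self.dirty = set()          # pages not yet committed to disk
            self.write_back = write_back

        def write(self, page, data):
            self.pages[page] = data
            if self.write_back:
                self.dirty.add(page)    # ack now; commit to disk later (de-stage)
            else:
                self.disk[page] = data  # write-through: commit before the ack
            return "ack"                # acknowledgment returned to the host

        def flush(self):                # commit dirty pages (write-back only)
            for page in self.dirty:
                self.disk[page] = self.pages[page]
            self.dirty.clear()

Write-back gives lower write latency but risks uncommitted data on a cache failure; write-through is slower but safer.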
Cont..
Cache Implementation: Cache can be implemented as
either dedicated cache or global cache.
 With dedicated cache, separate sets of memory locations are reserved for reads and writes.
 Cache Management: Various cache management algorithms are implemented in intelligent storage systems to proactively maintain a set of free pages and a list of pages that can potentially be freed up whenever required. Common algorithms include LRU (Least Recently Used) and MRU (Most Recently Used).
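A minimal sketch of the LRU policy named above (illustrative Python using an ordered dictionary; real arrays use far more elaborate page management):

    from collections import OrderedDict

    class LRUCache:
        # Holds at most `capacity` pages; evicts the least recently used one.
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = OrderedDict()

        def get(self, page):
            if page not in self.pages:
                return None                     # cache miss
            self.pages.move_to_end(page)        # mark as most recently used
            return self.pages[page]

        def put(self, page, data):
            if page in self.pages:
                self.pages.move_to_end(page)
            elif len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # free the least recently used page
            self.pages[page] = data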
Flushing
• As cache fills, the storage system must take
action to flush dirty pages to manage space
availability.
• Flushing is the process that commits data
from cache to the disk.
• High watermark (HWM): The cache utilization level at which the storage system starts high-speed flushing of cache data.
• Low watermark (LWM): The point at which the storage system stops the high-speed (forced) flushing and returns to idle flush behavior.
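A minimal sketch of watermark-driven flushing (illustrative Python; the 80% and 60% thresholds are assumed values, not vendor defaults):

    HWM = 0.80   # high watermark: start forced flushing above this utilization
    LWM = 0.60   # low watermark: stop forced flushing below this utilization

    def forced_flush_active(utilization, currently_flushing):
        if utilization >= HWM:
            return True               # cache nearly full: flush dirty pages fast
        if utilization <= LWM:
            return False              # enough free pages: return to idle flushing
        return currently_flushing     # between watermarks: keep the current mode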
Cont..
Cache Data Protection:
Cache is volatile memory, so a power failure or any kind of
cache failure will cause loss of the data that is not yet
committed to the disk.
 This risk of losing uncommitted data held in cache can be mitigated using cache mirroring and cache vaulting.
 Cache mirroring: Each write to cache is held in two different memory locations on two independent memory cards.
 If a cache failure occurs, the write data will still be safe in the mirrored location and can be committed to the disk.
Cont..
Cache mirroring introduces the problem of maintaining cache coherency.
 Cache coherency means that data in two different cache locations must be identical at all times.
 It is the responsibility of the array operating environment to ensure coherency.

Cache vaulting: The risk of data loss due to power failure can
be addressed in various ways: powering the memory with a
battery until the AC power is restored or using battery power
to write the cache content to the disk.
Cont..
 If an extended power failure occurs, using batteries is not a viable option, because in intelligent storage systems large amounts of data might need to be committed to numerous disks, and batteries might not provide power long enough to write each piece of data to its intended disk.
 Therefore, storage vendors use a set of physical disks to dump the contents of cache during a power failure. This is called cache vaulting, and the disks are called vault drives.
Cont..
Back End
The back end provides an interface between cache and the
physical disks. It consists of two components: back-end ports
and back-end controllers.
 The back end controls data transfers between cache and the physical disks.
 The algorithms implemented on back-end controllers provide error detection and correction, along with RAID functionality.
Cont..
Physical Disk
Physical disks are connected to the back-end storage
controller and provide persistent data storage.
 Modern intelligent storage systems support a variety of disk drives with different speeds and types, such as FC, SATA, SAS, and flash drives.
 They also support the use of a mix of flash, FC, or SATA drives within the same array.
STORAGE PROVISIONING
Storage provisioning is the process of assigning storage
resources to hosts based on capacity, availability, and
performance requirements of applications running on the
hosts.
 Storage provisioning can be performed in two ways: traditional and virtual.
 Virtual provisioning leverages virtualization technology for provisioning storage for applications.
Traditional Storage Provisioning
In traditional storage provisioning, physical disks are logically
grouped together and a required RAID level is applied to form
a set, called a RAID set.
The number of drives in the RAID set and the RAID level
determine the availability, capacity, and performance of the
RAID set.
 Logical units are created from the RAID sets by partitioning the available capacity into smaller units (seen as slices of the RAID set). These units are then assigned to hosts based on their storage requirements.
 Each logical unit created from the RAID set is assigned a unique ID called a logical unit number (LUN).
 LUNs hide the organization and composition of the RAID set from the hosts. LUNs created by traditional storage provisioning methods are also referred to as thick LUNs, to distinguish them from the LUNs created by virtual provisioning methods.
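As a rough illustration of carving LUNs from a RAID set, the usable capacity and its division into LUNs can be sketched as follows (illustrative Python; the disk count, 600 GB size, and two-way split are assumptions, and real arrays reserve extra overhead):

    def usable_capacity_gb(num_disks, disk_gb, raid_level):
        if raid_level == 0:                    # striping only: all capacity usable
            return num_disks * disk_gb
        if raid_level == 1:                    # mirroring: half the raw capacity
            return num_disks * disk_gb // 2
        if raid_level == 5:                    # one disk's worth of parity
            return (num_disks - 1) * disk_gb
        if raid_level == 6:                    # two disks' worth of parity
            return (num_disks - 2) * disk_gb
        raise ValueError("level not covered in this sketch")

    capacity = usable_capacity_gb(num_disks=5, disk_gb=600, raid_level=5)
    luns = {"LUN 0": capacity // 2, "LUN 1": capacity // 2}   # two equal slices
    print(capacity, luns)   # 2400 GB usable, split across LUN 0 and LUN 1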
Cont..
 Figure 4-5 shows a RAID set consisting of five disks that has been sliced, or partitioned, into two LUNs: LUN 0 and LUN 1. These LUNs are then assigned to Host 1 and Host 2 for their storage requirements.
 When a LUN is configured and assigned to a non-virtualized host, a bus scan is required to identify the LUN. This LUN appears as a raw disk to the operating system.
 To make this disk usable, it is formatted with a file system, and then the file system is mounted.
Cont..
 In a virtualized host environment, the LUN (logical unit number) is assigned to the hypervisor, which recognizes it as a raw disk. This disk is configured with the hypervisor file system, and then virtual disks are created on it.
 The virtual disks are then assigned to virtual machines and appear as raw disks to them.
 To make the virtual disk usable to the virtual machine, steps similar to those in a non-virtualized environment are followed. Here, the LUN space may be shared and accessed simultaneously by multiple virtual machines.
Cont..
LUN Expansion: MetaLUN
A metaLUN is a method to expand LUNs that require additional capacity or performance.
 A metaLUN can be created by combining two or more LUNs. A metaLUN consists of a base LUN and one or more component LUNs.
 MetaLUNs can be either concatenated or striped.
 Concatenated expansion simply adds additional capacity to the base LUN.
 In this expansion, the component LUNs are not required to be of the same capacity as the base LUN.
Cont..
All LUNs in a concatenated metaLUN must be either
protected (parity or mirrored) or unprotected (RAID 0).
Cont..
Striped expansion restripes the base LUN’s data across the
base LUN and component LUNs.
 In striped expansion, all LUNs must be of the same capacity and RAID level.
 Striped expansion provides improved performance due to the increased number of drives being striped (see Figure 4-7).
Cont..
Virtual Storage Provisioning
Virtual provisioning enables creating and presenting a LUN
with more capacity than is physically allocated to it on the
storage array.
 The LUN created using virtual provisioning is called a thin LUN, to distinguish it from the traditional LUN.
Cont..
 A shared pool in virtual provisioning is analogous to a RAID group, which is a collection of drives on which LUNs are created.
 Virtual provisioning enables more efficient allocation of storage to hosts.
 Virtual provisioning also enables oversubscription, where more capacity is presented to the hosts than is actually available on the storage array.
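A minimal sketch of oversubscription with thin LUNs (illustrative Python; the pool size, LUN sizes, and write granularity are assumptions):

    class SharedPool:
        def __init__(self, physical_gb):
            self.physical_gb = physical_gb
            self.allocated_gb = 0

        def allocate(self, gb):
            if self.allocated_gb + gb > self.physical_gb:
                raise RuntimeError("pool exhausted: add drives to the pool")
            self.allocated_gb += gb            # physical space claimed on write

    class ThinLUN:
        def __init__(self, pool, presented_gb):
            self.pool = pool
            self.presented_gb = presented_gb   # capacity the host sees
            self.used_gb = 0

        def write(self, gb):
            self.pool.allocate(gb)             # allocate only what is written
            self.used_gb += gb

    pool = SharedPool(physical_gb=1000)
    luns = [ThinLUN(pool, 500) for _ in range(3)]  # 1,500 GB presented from 1,000 GB
    luns[0].write(200)
    print(pool.allocated_gb)   # 200: only written data consumes the pool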
INTELLIGENT STORAGE SYSTEMS IMPLEMENTATIONS
 Intelligent storage systems generally fall into one of the following two categories:
High-end storage systems
Midrange storage systems
 High-End Storage Systems
High-end storage systems, referred to as active-active arrays, are generally aimed at large enterprise applications.
 These systems are designed with a large number of controllers and cache memory. An active-active array implies that the host can perform I/Os to its LUNs through any of the available controllers (see Figure 4-10).
Cont..
 Midrange Storage Systems
Midrange storage systems are also referred to as active-passive arrays and are best suited for small- and medium-sized enterprise applications.
 They also provide optimal storage solutions at a lower cost.
 Midrange storage systems are typically designed with two controllers, each of which contains host interfaces, cache, RAID controllers, and an interface to disk drives.