Paper 1 SAN
> Striping is a technique to spread data across multiple drives so that the drives can be used
in parallel.
> All the read-write heads work simultaneously, allowing more data to be processed in a shorter
time and increasing performance compared to reading and writing from a single disk.
> Within each disk in a RAID set, a predefined number of contiguously addressable disk
blocks are defined as a strip.
> The set of aligned strips that spans across all the disks within the RAID set is called a stripe.
> The below figure shows physical and logical representations of a striped RAID set.
> Strip size (also called stripe depth) describes the number of blocks in a strip and is the
maximum amount of data that can be written to or read from a single disk in the set.
> All strips in a stripe have the same number of blocks.
Having a smaller strip size means that data is broken into smaller pieces while spread
across the disks.
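The strip and stripe definitions above can be sketched as a small address calculation. This is only an illustration: the function name `locate_block` and the round-robin layout (strip 0 on disk 0, strip 1 on disk 1, and so on) are assumptions, not something the notes specify.

```python
from typing import Tuple

def locate_block(logical_block: int, num_disks: int, strip_size: int) -> Tuple[int, int, int]:
    """Map a logical block to (stripe index, disk index, block number on that disk).

    Assumes strips are laid out round-robin across the disks (hypothetical layout).
    strip_size is the number of blocks in one strip.
    """
    strip_number = logical_block // strip_size       # strip count across the whole RAID set
    stripe_index = strip_number // num_disks         # which row of aligned strips
    disk_index = strip_number % num_disks            # which disk holds that strip
    offset_in_strip = logical_block % strip_size     # position inside the strip
    block_on_disk = stripe_index * strip_size + offset_in_strip
    return stripe_index, disk_index, block_on_disk

# Example: 3 disks, 4 blocks per strip.
for block in (0, 5, 13):
    print(block, locate_block(block, num_disks=3, strip_size=4))
```

With a smaller `strip_size`, consecutive logical blocks move to the next disk sooner, which is what "broken into smaller pieces" means in practice.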
Fig: Physical and logical representations of a striped RAID set (strips and stripe)
Mirroring
Advantages:
complete data redundancy
mirroring enables fast recovery from disk failure.
data protection
Mirroring is not a substitute for data backup. Mirroring constantly captures changes in the data,
whereas a backup captures point-in-time images of the data.
Disadvantages:
Mirroring involves duplication of data, so the amount of storage capacity needed is
twice the amount of data being stored.
Expensive
Parity
> Parity is a method to protect striped data from disk drive failure without the cost of
mirroring.
> An additional disk drive is added to hold parity, a mathematical construct that allows
re-creation of the missing data.
> Parity is a redundancy technique that ensures protection of data without maintaining a full
set of duplicate data.
> Calculation of parity is a function of the RAID controller.
> Parity information can be stored on separate, dedicated disk drives or distributed across all the
drives in a RAID set.
> Now, if one of the data disks fails, the missing value can be calculated by subtracting the sum
of the rest of the elements from the parity value.
> Here, computation of parity is represented as an arithmetic sum of the data. However, parity
calculation is a bitwise XOR operation.
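Since the actual parity calculation is a bitwise XOR, a minimal sketch of it may help. The function names and the sample byte values below are made up for illustration. In a real array this work is done by the RAID controller.

```python
from functools import reduce

def compute_parity(strips):
    """XOR equal-length byte strips together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild_missing(surviving_strips, parity):
    """Recover the one missing strip: XOR of the survivors and the parity."""
    return compute_parity(list(surviving_strips) + [parity])

# Four data strips (sample values are arbitrary) and their parity strip.
data = [b"\x04\x10", b"\x06\x22", b"\x01\x0f", b"\x07\x30"]
parity = compute_parity(data)

# Simulate losing the third disk and rebuilding its strip.
lost = data[2]
rebuilt = rebuild_missing(data[:2] + data[3:], parity)
print(rebuilt == lost)
```

XOR works here because it is its own inverse: XOR-ing the parity with every surviving strip cancels them out and leaves only the missing strip.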
NETWORK ATTACHED STORAGE(NAS)
> NAS is an IP-based, dedicated, high-performance file sharing and storage device.
> Uses network and file-sharing protocols to provide access to the file data.
> Ex: Common Internet File System (CIFS) and Network File System (NFS).
> Enables both UNIX and Microsoft Windows users to share the same data seamlessly.
> A NAS device uses its own operating system and integrated hardware and software components to
meet specific file-service needs.
> Its operating system is optimized for file I/O, which performs better than a general-purpose
server.
> A NAS device can serve more clients than general-purpose servers and provide the benefit of
server consolidation.
Components of NAS
> NAS device has two key components (as shown in Fig 2.33): NAS head and storage.
> In some NAS implementations, the storage could be external to the NAS device and shared with
other hosts.
One or more network interface cards (NICs), which provide connectivity to the client
network.
An optimized operating system for managing the NAS functionality. It translates file
level requests into block-storage requests and further converts the data supplied at the
block level to file data.
> The NAS environment includes clients accessing a NAS device over an IP network using file
sharing protocols.
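The translation from file-level requests to block-storage requests can be illustrated with a simplified sketch. The function name and the 4096-byte block size are assumptions for illustration. A real NAS operating system also consults file-system metadata to find where each block lives on the storage, which is not modeled here.

```python
def file_request_to_blocks(offset, length, block_size=4096):
    """Return the file-relative block numbers that cover a byte range.

    block_size=4096 is an assumed value, not taken from the notes.
    """
    if length <= 0:
        return []
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return list(range(first_block, last_block + 1))

# A 200-byte read starting at byte 4000 straddles two blocks.
print(file_request_to_blocks(4000, 200))
```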
Fig 2.33: Components of NAS
Read Operation with Cache
When a host issues a read request, the storage controller reads the tag RAM to determine
whether the required data is available in cache.
> If the requested data is found in the cache, it is called a read cache hit or read hit and
data is sent directly to the host, without any disk operation (see Fig [a]). This provides a
fast response time to the host (about a millisecond).
> If the requested data is not found in cache, it is called a cache miss and the data must be
read from the disk (see Fig [b]). The back-end controller accesses the appropriate disk and
retrieves the requested data. Data is then placed in cache and is finally sent to the host
through the front-end controller.
> Cache misses increase I/O response time.
> A Pre-fetch, or Read-ahead, algorithm is used when read requests are sequential. In a
sequential read request, a contiguous set of associated blocks is retrieved. Several other
blocks that have not yet been requested by the host can be read from the disk and placed
into cache in advance. When the host subsequently requests these blocks, the read
operations will be read hits.
> This process significantly improves the response time experienced by the host.
> The intelligent storage system offers fixed and variable pre-fetch sizes.
> In fixed pre-fetch, the intelligent storage system pre-fetches a fixed amount of data. It is
most suitable when I/O sizes are uniform.
> In variable pre-fetch, the storage system pre-fetches an amount of data in multiples of the size
of the host request.
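The effect of read hits, read misses, and fixed pre-fetch can be seen in a toy simulation. The `ReadCache` class, the dictionary standing in for the physical disks, and the pre-fetch depth of 3 are all assumptions for illustration. The tag RAM lookup is reduced to a dictionary membership test.

```python
class ReadCache:
    """Toy read cache with optional fixed pre-fetch (read-ahead)."""

    def __init__(self, disk, prefetch_blocks=0):
        self.disk = disk                      # dict: block number -> data, stands in for disks
        self.cache = {}                       # blocks currently held in cache
        self.prefetch_blocks = prefetch_blocks
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:               # read hit: no disk operation
            self.hits += 1
            return self.cache[block]
        self.misses += 1                      # read miss: go to disk
        # Fetch the requested block plus the next prefetch_blocks blocks.
        for b in range(block, block + 1 + self.prefetch_blocks):
            if b in self.disk:
                self.cache[b] = self.disk[b]
        return self.cache[block]

disk = {n: f"data-{n}" for n in range(8)}

no_prefetch = ReadCache(disk)
with_prefetch = ReadCache(disk, prefetch_blocks=3)
for n in range(8):                            # one sequential read of blocks 0..7
    no_prefetch.read(n)
    with_prefetch.read(n)

print(no_prefetch.hits, no_prefetch.misses)
print(with_prefetch.hits, with_prefetch.misses)
```

For a sequential workload, the pre-fetching cache turns most reads into hits. For random reads the extra blocks would mostly be wasted, which is why pre-fetch is tied to sequential requests.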
Fig: Read operation with cache: (a) read hit, (b) read miss
> When an I/O is written to cache and acknowledged, it is completed in far less time (from
the host's perspective) than it would take to write directly to disk.
> Sequential writes also offer opportunities for optimization because many smaller writes
can be coalesced for larger transfers to disk drives with the use of cache.
> A write operation with cache is implemented in the following ways:
> Write-back cache: Data is placed in cache and an acknowledgment is sent to the host
immediately. Later, data from several writes are committed to the disk. Write response
times are much faster, as the write operations are isolated from the mechanical delays of
the disk. However, uncommitted data is at risk of loss in the event of cache failures.
> Write-through cache: Data is placed in the cache and immediately written to the disk,
NAS File Sharing Protocols
> NAS devices support multiple file-service protocols to handle file I/O requests.
> NAS devices enable users to share file data across different operating environments.
> It provides a means for users to migrate transparently from one operating system to another.
Network File System (NFS)
> NFS is a client-server protocol for file sharing that is commonly used on UNIX systems.
> NFS was originally based on the connectionless User Datagram Protocol (UDP).
> It uses Remote Procedure Call (RPC) as a method of inter-process communication between two
computers.
> The NFS protocol provides a set of RPCs to access a remote file system for the following
operations:
Common Internet File System (CIFS)
> CIFS enables clients to access files and services on remote computers over TCP/IP.
> It is a public, or open, variation of the Server Message Block (SMB) protocol.
> It provides the following features to ensure data integrity:
It uses file and record locking to prevent users from overwriting the work of another user
on a file or a record.
It supports fault tolerance and can automatically restore connections and reopen files that
were open prior to an interruption. This feature depends on whether an application is
written to take advantage of this.
CIFS is a stateful protocol because the CIFS server maintains connection information
regarding every connected client. If a network failure or CIFS server failure occurs, the
client receives a disconnection notification. User disruption is minimized if the
application has the embedded intelligence to restore the connection. However, if the
embedded intelligence is missing, the user must take steps to reestablish the CIFS
connection.
RAID Levels
Application performance
Data availability requirements
Cost
RAID 0
> RAID 0 configuration uses data striping techniques, where data is striped across all the disks
within a RAID set. Therefore, it utilizes the full storage capacity of a RAID set.
> To read data, all the strips are put back together by the controller.
> Fig shows RAID 0 in an array in which data is striped across five disks.
Fig: RAID 0
> When the number of drives in the RAID set increases, performance improves because more
data can be read or written simultaneously.
> RAID 0 is a good option for applications that need high I/O throughput.
> However, RAID 0 does not provide data protection or availability during drive failures, so it
is not suitable for applications that require high availability.
RAID 1
Fig: RAID 1
Nested RAID
> Most data centers require data redundancy and performance from their RAID arrays.
> RAID 1+0 and RAID 0+1 combine the performance benefits of RAID 0 with the redundancy
benefits of RAID 1.
> They use striping and mirroring techniques and combine their benefits.
> These types of RAID require an even number of disks, the minimum being four (see
Fig).
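The capacity trade-offs described for striping, mirroring, and nested RAID can be put into a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, all disks are the same size, RAID 1 is taken to be a single two-disk mirrored pair, and parity-based levels are left out.

```python
def usable_capacity(level, num_disks, disk_size_gb):
    """Usable capacity in GB for a few RAID levels, assuming equal-sized disks."""
    if level == "0":
        if num_disks < 2:
            raise ValueError("striping needs more than one disk")
        return num_disks * disk_size_gb           # full capacity, no redundancy
    if level == "1":
        if num_disks != 2:
            raise ValueError("this sketch models RAID 1 as one mirrored pair")
        return disk_size_gb                       # half the raw capacity
    if level in ("1+0", "0+1"):
        if num_disks < 4 or num_disks % 2 != 0:
            raise ValueError("nested RAID needs an even number of disks, at least four")
        return (num_disks // 2) * disk_size_gb    # mirroring halves the raw capacity
    raise ValueError(f"level not covered by this sketch: {level}")

print(usable_capacity("0", 5, 100))
print(usable_capacity("1", 2, 100))
print(usable_capacity("1+0", 4, 100))
```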
Fig: Nested RAID (striping and mirroring)
3) Connectors,
4) Interconnecting devices (such as FC switches or hubs),
5) SAN management software.
Node Ports
> In Fibre Channel, devices such as hosts, storage, and tape libraries are all referred to as
nodes.
> Each node is a source or destination of information for one or more nodes.
MODULE 2
Storage Area Networks
> Each node requires one or more ports to provide a physical interface for communicating
with other nodes.
Fig: Nodes, ports, and links
Cables
> SAN implementations use optical fiber cabling.
> Copper can be used for shorter distances for back-end connectivity.
> Optical fiber cables carry data in the form of light.
> There are two types of optical cables: multi-mode and single-mode.
1) Multi-mode fiber (MMF) cable carries multiple beams of light projected at different angles
simultaneously onto the core of the cable (see Fig 2.2 (a)).
> In an MMF transmission, multiple light beams traveling inside the cable tend to
disperse and collide. This collision weakens the signal strength after it travels a
certain distance, a process known as modal dispersion.
> MMFs are generally used within data centers for shorter distance runs.
2) Single-mode fiber (SMF) carries a single ray of light projected at the center of the core (see
Fig 2.2 (b)).
> In an SMF transmission, a single light beam travels in a straight line through the core
of the fiber.
> The small core and the single light wave limit modal dispersion. Among all types of
fibre cables, single-mode provides minimum signal attenuation over maximum distance.
> A single-mode cable is used for long-distance cable runs, limited only by the power
of the laser at the transmitter and sensitivity of the receiver.
> SMFs are used for longer distances.
Fig 2.2: (a) Multi-mode fiber, (b) Single-mode fiber
Connectors
> Connectors are attached at the end of the cable to enable swift connection and disconnection
of the cable to and from a port.
> A Standard connector (SC) (see Fig 2.3 (a)) and a Lucent connector (LC) (see Fig 2.3 (b))
are two commonly used connectors for fiber optic cables.
> An SC is used for data transmission speeds up to 1 Gb/s, whereas an LC is used for speeds up
to 4 Gb/s.
Interconnecting Devices
1) Hubs,
2) Switches,
3) Directors
> Hubs are used as communication devices in FC-AL implementations. Hubs physically
connect nodes in a logical loop or a physical star topology.
> All the nodes must share the bandwidth because data travels through all the connection
points. Because low-cost, high-performance switches are now available, hubs are no
longer used in SANs.
> Switches are more intelligent than hubs and directly route data from one physical
port to another. Therefore, nodes do not share the bandwidth. Instead, each node has a
dedicated communication path, resulting in bandwidth aggregation.
> Switches are available with:
Fixed port count
> Directors are larger than switches and are deployed for data center implementations.
> The function of directors is similar to that of FC switches, but directors have higher
port count and fault tolerance capabilities.
> Port card or blade has multiple ports for connecting nodes and other FC switches
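The difference between the shared bandwidth of a hub and the dedicated paths of a switch can be shown with an idealized calculation. The function name and the even split among active nodes are assumptions. Protocol overhead and loop arbitration are ignored.

```python
def per_node_bandwidth(device, link_speed_gbps, active_nodes):
    """Idealized bandwidth available to each active node, in Gb/s."""
    if active_nodes < 1:
        raise ValueError("need at least one active node")
    if device == "hub":
        # All nodes share one loop, so the link speed is divided among them.
        return link_speed_gbps / active_nodes
    if device == "switch":
        # Each node has a dedicated path at full link speed.
        return link_speed_gbps
    raise ValueError(f"unknown device type: {device}")

# Eight active nodes on 4 Gb/s links.
print(per_node_bandwidth("hub", 4, 8))
print(per_node_bandwidth("switch", 4, 8))
```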
SAN management software manages the interfaces between hosts, interconnect devices,
and storage arrays.
> The software provides a view of the SAN environment and enables management of
various resources from one central console.