Section 2 - Storage Systems Architecture
Section Objectives
Upon completion of this section, you will be able to:
y Describe the physical and logical components of a host
y Describe common connectivity components and protocols
y Describe features of intelligent disk storage systems
y Describe data flow between the host and the storage
array
The objectives for this section are shown here. Please take a moment to read them.
In This Section
This section contains the following modules:
1. Components of a Host
2. Connectivity
3. Physical Disks
4. RAID Arrays
5. Disk Storage Systems
Additional Information:
y Apply Your Knowledge
y Data Flow Exercise (Student Resource Guide ONLY)
y Case Studies (Student Resource Guide ONLY)
Components of a Host
Upon completion of this module, you will be able to:
y List the hardware and software components of a host
y Describe key protocols and concepts used by each
component
In this module, we look at the hardware and software components of a host, as well as the key
protocols and concepts that make these components work. This provides the context for how data
typically flows within the host, as well as between the hosts and storage systems.
The objectives for this module are shown here. Please take a moment to read them.
Examples of Hosts
Server
Laptop
Group of Servers
Mainframe
A host could be something small, like a laptop, or it could be larger, such as a server, a group or cluster
of servers, or a mainframe. The host has physical (hardware) and logical (software) components. Let’s
look at the physical components first.
(Diagram: CPU, storage, and I/O devices connected by a bus)
The most common physical components found in a host system include the Central Processing Unit
(CPU), Storage, and Input/Output Devices (I/O).
The CPU performs all the computational processing (number-crunching) for the host. This processing
involves running programs, which are a series of instructions that tell the CPU what to do.
Storage can be high-speed, temporary (volatile, meaning that the content is lost when power is
removed) storage, or permanent magnetic or optical storage media.
I/O devices allow the host to communicate with the outside world.
Let’s look at each of these elements, starting with the CPU.
CPU
(Diagram: CPU internals - the ALU, registers, and L1 cache, connected to the rest of the host by a bus)
The CPU consists of three major parts: the Arithmetic Logic Unit, the registers, and the L1 cache.
The Arithmetic Logic Unit (ALU) is the portion of the CPU that performs all the manipulation of data,
such as addition of numbers.
The Registers hold data that is being used by the CPU. Because of their proximity to the ALU,
registers are very fast. CPUs will typically have only a small number of registers – 4 to 20 is common.
L1 cache is additional memory which is associated with the CPU. It holds data and program
instructions that are likely to be needed by the CPU in the near future. The L1 cache will be slower
than registers, but there will be more storage space in the L1 cache than in the registers – 16 KB is
common. Although L1 cache is optional, it is found on most modern CPUs.
The CPU connects to other components in the host via a bus. Buses will be discussed in the
Connectivity module of this Section.
Storage
(Diagram: memory shown as a table of addresses and their data contents, alongside a disk)
(Diagram: the storage hierarchy - L1 cache, L2 cache, magnetic disk, optical disk, and tape - ranked from fast and expensive to slow and low cost)
In any host, there is a variety of storage types. Each type has different characteristics of speed, cost,
and capacity. As a general rule, faster technologies cost more and, as a result, are more scarce.
CPU registers are extremely fast but limited in number to a few tens of locations at most, and are
expensive in terms of both cost and power use. As we move down the list, speeds decrease along with
cost.
Magnetic disks are generally fixed, whereas optical disk and tape use removable media. The cost of
optical and tape media per MB stored is much lower than that of magnetic disk.
I/O Devices
y Human interface
– Keyboard
– Mouse
– Monitor
y Computer-computer interface
– Network Interface Card (NIC)
y Computer-peripheral interface
– USB (Universal Serial Bus) port
– Host Bus Adapter (HBA)
I/O devices allow a host to interact with the outside world by sending and receiving data. The basic I/O
devices, such as the keyboard, mouse and monitor, allow users to enter data and view the results of
operations. Other I/O devices allow hosts to communicate with each other or with peripheral devices,
such as printers and cameras.
HBAs
(Diagram: the host software stack - applications, operating system, volume management, multi-pathing software, device drivers - connecting to multiple HBAs)
The host connects to storage devices using special hardware called a Host Bus Adapter (HBA). HBAs
are generally implemented as either an add-on card or a chip on the motherboard of the host. The ports
on the HBA are used to connect the host to the storage subsystem. There may be multiple HBAs in a
host.
The HBA has the processing capability to handle some storage commands, thereby reducing the
burden on the host CPU.
File Systems
(Diagram: the host software stack - applications, operating system, volume management, multi-pathing software, device drivers - connecting to multiple HBAs)
The file system is the general name given to the host-based logical structures and software routines
used to control access to data storage.
The file system block is the smallest ‘container’ allocated to a file’s data. Each filesystem block is a
contiguous area of physical disk capacity.
y Blocks can range in size, depending on the type of files being stored and accessed.
y The block size is fixed (by the operating system) at the time of file system creation.
y Since most files are larger than the pre-defined filesystem block size, a file’s data spans multiple
filesystem blocks. However, the filesystem blocks containing all of the file’s data may not
necessarily be contiguous on a physical disk. Over time, as files grow larger, the file system
becomes increasingly fragmented.
In multi-user, multi-tasking environments, filesystems manage shared storage resources using:
y Directories, paths and structures to identify file locations
y Volume Managers to hide the complexity of physical disk structures
y File locking capabilities to control access to files. This is important when multiple users or
applications attempt to access the same file simultaneously
The number of files created and accessed by a host can be very large. Instead of using a linear or flat
structure (similar to having many objects in a single box), a filesystem is divided into directories
(smaller boxes), or folders.
Directories:
y Organize file systems into containers which may hold files as well as other (sub)directories
y Hold information about files they contain
A directory is a special type of file containing a list of filenames and associated metadata (information
or data about the file). When a user attempts to access a given file by name, the name is used to look
up the appropriate entry in the directory. That entry holds the corresponding metadata.
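To make the lookup concrete, here is a minimal sketch in Python; the directory contents and metadata fields (size, permissions, block list) are invented for illustration and are not tied to any particular file system.

# A directory modeled as a lookup table from filename to metadata.
directory = {
    "report.txt": {"size": 1432, "permissions": "rw-r--r--", "blocks": [208, 209]},
    "photo.jpg":  {"size": 52100, "permissions": "rw-------", "blocks": [310, 311, 312]},
}

def lookup(name):
    # Return the metadata entry for a filename, as a directory lookup would.
    entry = directory.get(name)
    if entry is None:
        raise FileNotFoundError(name)
    return entry

print(lookup("report.txt")["blocks"])   # -> [208, 209]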
Non-journaling file systems create a potential for lost files because they may use many separate writes
to update their data and metadata. If the system crashes during the write process, metadata or data may
be lost or corrupted. When the system reboots, the filesystem attempts to update the metadata
structures by examining and repairing them. This operation takes a long time on large file systems. If
there is insufficient information to recreate the desired or original structure, files may be misplaced or
lost and file systems corrupted.
A journaling file system uses a separate area called a log, or journal. This journal may contain all the
data to be written (physical journal), or may contain only the metadata to be updated (logical journal).
Before changes are made to the filesystem, they are written to this separate area. Once the journal has
been updated, the operation on the filesystem can be performed. If the system crashes during the
operation, there is enough information in the log to "replay" the log record and complete the operation.
Journaling results in a very quick filesystem check by only looking at the active, most recently
accessed parts of a large file system. In addition, because information about the pending operation is
saved, the risk of files being lost is lessened.
A disadvantage of journaling filesystems is that they are slower than other file systems. This slowdown is the result of the extra operations that have to be performed on the journal each time the filesystem is changed. However, the much shortened time for file system checks and the integrity provided by journaling far outweigh this disadvantage. Nearly all file system implementations use journaling.
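As a rough sketch of the write-ahead idea behind a logical journal (the structures and function names below are invented for illustration, not taken from any real file system):

# Record the intended metadata update in the journal first, then apply it.
# On restart, any entries not marked complete are replayed.
journal = []     # the log of pending operations
metadata = {}    # stand-in for on-disk file system metadata

def journaled_update(key, value):
    entry = {"key": key, "value": value, "committed": False}
    journal.append(entry)        # step 1: write the intent to the journal
    metadata[key] = value        # step 2: apply the change to the file system
    entry["committed"] = True    # step 3: mark the journal entry complete

def replay_after_crash():
    # Re-apply any updates that were logged but never marked complete.
    for entry in journal:
        if not entry["committed"]:
            metadata[entry["key"]] = entry["value"]
            entry["committed"] = True

journaled_update("fileA.owner", "alice")
replay_after_crash()   # nothing pending here, but this is the recovery path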
Volume Management
(Diagram: the host software stack, with the volume management layer highlighted)
The volume manager is an optional intermediate layer that sits between the file system and the physical disks. It 'aggregates' several hard disks to form a large, virtual disk and makes this virtual disk visible to higher-level programs and applications. It optimizes access to storage and simplifies the management of storage resources.
Module Summary
Key points covered in this module:
y Hosts typically have:
– Hardware: CPU, memory, buses, disks, ports, and interfaces
– Software: applications, operating systems, file systems, device
drivers, volume managers
y Journaling enables:
– very fast file system checks in the event of a system crash
– better integrity for the file system structure
These are the key points covered in this module. Please take a moment to review them.
Check your knowledge of this module by taking some time to answer the questions shown on the slide.
Connectivity
Upon completion of this module, you will be able to:
y Describe the physical components of a networked
storage environment
y Describe the logical components (communication
protocols) of a networked storage environment
In the previous module, we looked at the host environment. In this module, we discuss how the host is
connected to storage, and the protocols used for communication between them.
The objectives for this module are shown here. Please take a moment to read them.
(Diagram: a host's CPU and HBA connected over an internal bus, with a cable running from the HBA port to a port on a disk)
Bus Technology
Serial
Serial Bi-directional
Parallel
A bus is a collection of paths that facilitate data transmission from one part of the computer to another.
Physical components communicate across a bus by sending packets of data between the devices.
These packets can travel in a serial path or in parallel paths. In serial communication, the bits travel
one behind the other. In parallel communication, the bits can move along multiple paths
simultaneously.
A simple analogy to describe buses is a highway:
A Serial Bus is a one-way, single-lane highway where data packets travel in a line in one direction.
A Bi-directional Serial Bus is a two-lane road where data packets travel in a line in both directions simultaneously.
A Parallel Bus is a multi-lane highway. It can also be bi-directional, with packets travelling in different lanes in both directions simultaneously.
Note: The Parallel Bi-directional Bus is not shown in this slide.
Bus Technology
y System Bus – connects CPU to Memory
y Local (I/O) Bus – carries data to/from peripheral devices
y Bus width measured in bits
y Bus speed measured in MHz
y Throughput measured in MB/s
Connectivity Protocols
y Protocol = a defined format for communication – allows
the sending and receiving devices to agree on what is
being communicated.
A protocol is a defined format, in this case for communication between hardware or software
components. Communication protocols are defined for systems and components that are:
y Tightly connected entities – such as central processor to RAM, or storage buffers to controllers –
which use standard bus technology (e.g., the system bus or the local I/O bus)
y Directly attached entities or devices connected at moderate distances – such as host to printer or
host to storage
y Network connected entities – such as networked hosts, Network Attached Storage (NAS) or
Storage Area Networks (SAN)
We will discuss the communication protocols (logical components) found in each of these connectivity
models, starting with the tightly connected or bus protocols.
Communication Protocols
(Diagram: the host stack - applications and operating system - with PCI as the local bus connection)
The protocols for the local (I/O) bus and for connections to an internal disk system include:
y PCI
y IDE/ATA
y SCSI
The next few slides examine each of these.
The Peripheral Component Interconnect (PCI) is a specification defining the local bus system within a
computer. The specification standardizes how PCI expansion cards, such as network cards or modems,
install themselves and exchange information with the central processing unit (CPU).
In more detail, a Peripheral Component Interconnect (PCI) includes:
y an interconnection system between a microprocessor and attached devices, in which expansion
slots are spaced closely for high-speed operation
y plug and play functionality that makes it easy for a host to recognize a new card
y 32 or 64 bit data
y a throughput of 133 MB/sec
PCI Express is an enhanced PCI bus with increased bandwidth.
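The 133 MB/sec figure follows directly from the bus width and clock rate. A back-of-the-envelope calculation in Python, assuming the classic 32-bit PCI bus clocked at 33 MHz (the 64-bit / 66 MHz variant is shown for comparison):

# Peak bus throughput = (bus width in bytes) x (clock rate in MHz), giving MB/s.
def peak_throughput_mb_s(width_bits, clock_mhz):
    return (width_bits / 8) * clock_mhz

print(round(peak_throughput_mb_s(32, 33.33)))   # ~133 MB/s, standard PCI
print(round(peak_throughput_mb_s(64, 66.66)))   # ~533 MB/s, 64-bit / 66 MHz PCI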
IDE/ATA
y Integrated Device Electronics (IDE) / Advanced
Technology Attachment (ATA)
y Most popular interface used with modern hard disks
y Good performance at low cost
y Desktop and laptop systems
y Inexpensive storage interconnect
The most popular interface protocol used in modern hard disks is the one most commonly known as
IDE. This interface is also known as ATA.
IDE/ATA hard disks are used in most modern PCs, and offer excellent performance at relatively low
cost.
Small Computer Systems Interface, SCSI, has several advantages over IDE that make it preferable for
use in higher-end machines. It is far less commonly used than IDE/ATA in PCs due to its higher cost
and the fact that its advantages are not useful for the typical home or business desktop user.
SCSI began as a parallel interface, allowing the connection of devices to a PC, or other servers, with
data being transmitted across multiple data lines. SCSI itself, however, has been broadened greatly in
terms of its scope, and now includes a wide variety of related technologies and standards.
SCSI Model
(Diagram: an initiator issuing commands to a target)
As you can see from the diagram, a SCSI device that ‘starts’ a communication is an “initiator”, and a
SCSI device that services a request is a “target”.
You should not necessarily think of initiators as hosts, and targets as storage devices. Storage devices
may initiate a command to other storage devices or switches, and hosts may be targets and receive
commands from the storage devices.
After initiating a request to the target, the host can process other events without having to wait for a
response from the target. After it finishes processing, the target signals a command complete or a
status message back to the host.
SCSI Model
(Diagram: SCSI addressing - an initiator ID, a target ID, and the LUNs behind the target)
SCSI Addressing
The Initiator ID identifies the initiator; the storage device uses it to send responses back to the initiator. A SCSI host bus adapter (referred to as a controller) can be implemented in two ways:
y an onboard interface
y an ‘add in’ card plugged into the system I/O bus
Target ID is the value for a specific storage device. It is an address that is set on the interface of the
device such as a disk, tape or CDROM.
The LUN is the Logical Unit Number of the device. It reflects the actual address of the device, as seen by the target.
(Diagram: device naming example - controller c0 (the initiator/HBA), target t0 (the peripheral controller), and devices d0, d1, d2 as LUNs)
For example, a logical device name (used by a host) for a disk drive may be: cn|tn|dn, where
y cn is the controller
y tn is the target ID of the devices such as t0, t1, t2 and so on
y dn is the device number, which reflects the actual address of the device unit. This is usually d0 for
most SCSI disks because there is only one disk attached to the target controller.
In intelligent storage systems, discussed later, each target may address many LUNs.
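As a small illustration, the sketch below splits such a logical device name into its parts; the parsing code is purely illustrative and is not an operating system utility.

import re

def parse_device_name(name):
    # Split a cN tN dN style name such as 'c0t1d0' into controller, target, device.
    match = re.fullmatch(r"c(\d+)t(\d+)d(\d+)", name)
    if not match:
        raise ValueError("not a cNtNdN device name: " + name)
    controller, target, device = (int(part) for part in match.groups())
    return {"controller": controller, "target": target, "device": device}

print(parse_device_name("c0t1d0"))
# -> {'controller': 0, 'target': 1, 'device': 0}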
Expandability and number of devices - SCSI is superior to IDE/ATA. This advantage of SCSI only
matters if you actually need this much expansion capability as SCSI is more involved and expensive to
set up.
Device Type Support – SCSI holds a significant advantage over IDE/ATA in terms of the types of
devices each interface supports.
Cost – the IDE/ATA interface is superior to the SCSI interface.
Performance – These factors influence system performance for both interfaces:
y Maximum Interface Data Transfer Rate: Both interfaces presently offer very high maximum
interface rates, so this is not an issue for most PC users. However, if you are using many hard disks
at once, for example in a RAID array, SCSI offers better overall performance.
y Device-Mixing Issues: IDE/ATA channels that mix hard disks and CD-ROMs suffer significant performance hits because these devices operate at different speeds (hard disks read and write relatively quickly compared to CD-ROM drives). Also, because an IDE channel can only service a single device at a time, it must wait for the slower optical drive to complete a task. SCSI does not have this problem.
y Device Performance: When looking at particular devices, SCSI can support multiple devices
simultaneously while IDE/ATA can only support a single device at a time.
Configuration and set-up – IDE/ATA is easier to set up, especially if you are using a reasonably new
machine and only a few devices. SCSI has a significant advantage over IDE/ATA in terms of hard disk
addressing issues.
(Diagram: a host's HBA connected by a cable to a port on an external disk)
A host with external storage is usually a large enterprise server. Components are identical to those of a
host with internal storage. The key difference is in the external storage interfaces used.
Fibre Channel
(Diagram: the host software stack - applications, DBMS, management utilities, file system, LVM, multipathing software, device drivers - connecting through HBAs over Fibre Channel to storage arrays)
Fibre Channel is a high–speed interconnect used in networked storage to connect servers to shared
storage devices. Fibre Channel components include HBAs, hubs, switches, cabling, and disks.
The term Fibre Channel refers to both the hardware components and the protocol used for
communication between nodes.
y Fibre Channel
– Greater distance
– High device count in SANs
– Multiple initiators
– Dual-ported drives
The two most popular interfaces for external storage devices are SCSI and Fibre Channel (FC). SCSI is
also commonly used for internal storage in hosts; FC is almost never used internally.
(Diagram: hosts connected through switches to storage)
When computing environments require high speed connectivity, they use sophisticated equipment to
connect hosts to storage devices.
Physical connectivity components in networked storage environments include:
y HBA (Host-side interface) – Host Bus Adapters connect the host to the storage devices
y Optical cables – fiber optic cables to increase distance, and reduce cable bulk
y Switches – used to control access to multiple attached devices
y Directors – sophisticated switches with high availability components
y Bridges – connections to different parts of a network
Module Summary
Key points covered in this module:
y The physical components of a networked storage
environment
y The logical components (communication protocols) of a
networked storage environment
These are the key points covered in this module. Please take a moment to review them.
Check your knowledge of this module by taking some time to answer the questions shown on the slide.
Physical Disks
After completing this module, you will be able to:
y Describe the major physical components of a disk drive
and their function
y Define the logical constructs of a physical disk
y Describe the access characteristics for disk drives and
their performance implications
y Describe the logical partitioning of physical drives
There are several methods for storing data; however, in this module the focus is on disk drives. Disk drives use many types of technology to perform their job: mechanical, chemical, magnetic, and electrical.
Our intent is not to make you an expert on every detail about the drive - rather you should have a high
level understanding of how both the physical and logical parts of a drive work. This enables you to see
how these parts impact system capacity, reliability, and performance.
The objectives for this module are shown here. Please take a moment to read them.
The focus of this lesson is on the components of a disk drive and how they work. Additionally, it is
important to understand how the data is organized on the disk based on its disk geometry.
A hard drive contains a series of rotating platters within a sealed case. The sealed case is known as
Head Disk Assembly, or HDA.
A platter has the following attributes:
y It is a rigid, round disk which is coated with magnetically sensitive material.
y Data is stored in binary code (0s and 1s). It is encoded by polarizing magnetic areas, or domains,
on the disk surface.
y Data can be written to and read from both surfaces of a platter.
y A platter’s storage capacity varies across drives. There is an industry trend toward higher capacity
as technology improves.
− Note: The drive’s capacity is determined by the number of platters, the amount of data which
can be stored on each platter, and how efficiently data is written to the platter.
Note: These concepts apply to disk drives used in systems of all sizes.
(Diagram: platters mounted on a spindle)
Data is read and written by read/write heads, or R/W heads. Most drives have two R/W heads per
platter, one for each surface of the platter.
y When reading data, they detect magnetic polarization on the platter surface.
y When writing data, they change the magnetic polarization on the platter surface.
Since reading and writing data is a magnetic process, the R/W heads never actually touch the surface
of the platter. There is a microscopic air gap between the read/write heads and the platter. This is
known as the head flying height.
When the spindle rotation has stopped, the air gap is removed and the R/W heads rest on the surface of
the platter in a special area near the spindle called a landing zone. The landing zone is coated with a
lubricant to reduce head/platter friction. Logic on the disk drive ensures that the heads are moved to
the landing zone before they touch the surface.
If the drive malfunctions and a read/write head accidentally touches the surface of the platter outside of
the landing zone, it is called a head crash. When a head crash occurs, the magnetic coating on the
platter gets scratched and damage may also occur to the R/W head. A head crash generally results in
data loss.
(Diagram: the actuator arm assembly and the spindle)
Read/write heads are mounted on the actuator arm assembly, which positions the read/write head at
the location on the platter where data needs to be written or read.
(Diagram: read/write heads, two per platter, mounted on the actuator)
The read/write heads for all of the platters in a drive are attached to one actuator arm assembly and
move across the platter simultaneously. Notice there are two read/write heads per platter, one for each
surface.
Controller
(Diagram: the drive's controller board, its interface and power connector, and the HDA)
The controller is a printed circuit board, mounted at the bottom of the disk drive. It contains a
microprocessor (as well as some internal memory, circuitry, and firmware) that controls:
y power to the spindle motor and control of motor speed
y how the drive communicates with the host CPU
y reads/writes by moving the actuator arm, and switching between R/W heads
y optimization of data access
(Diagram: tracks and sectors on a platter)
Data is recorded in tracks. A track is a concentric ring around the spindle which contains data.
y A track can hold a large amount of data. Track density describes how tightly packed the tracks are
on a platter.
y Tracks are numbered from the outer edge of the platter, starting at track zero.
y A track is divided into sectors. A sector is the smallest individually-addressable unit of storage.
y The number of sectors per track is based upon the specific drive.
y Sectors typically hold 512 bytes of user data. Some disks can be formatted with larger sectors.
y A formatting operation performed by the manufacturer writes the track and sector structure on the
platter.
Each sector stores user data as well as other information, including its sector number, head number (or
platter number) and track number. This information aids the controller in locating data on the drive,
but it also takes up space on the disk. Thus there is a difference between the capacity of an unformatted
disk and a formatted one. Drive manufacturers generally advertise the formatted capacity.
The first PC hard disks typically held 17 sectors per track. Today's hard disks can have a much larger
number of sectors in a single track. There can be thousands of tracks on a platter, depending on the size
of the drive.
(Diagram: sectors and tracks on a platter, with and without zoned-bit recording)
Since a platter is made up of concentric tracks, the outer tracks can hold more data than the inner ones
because they are physically longer than the inner tracks. However, in older disk drives, the outer tracks
had the same number of sectors as the inner tracks, which means that the data density was very low on
the outer tracks. This was an inefficient use of the available space.
Zoned-bit recording uses the disk more efficiently. It groups tracks into zones that are based upon
their distance from the center of the disk. Each zone is assigned an appropriate number of sectors per
track. This means that a zone near the center of the platter has fewer sectors per track than a zone on
the outer edge.
In zoned-bit recording:
y outside tracks have more sectors than inside tracks
y zones are numbered, with the outermost zone being Zone 0
y tracks within a given zone have the same number of sectors
Note: The media transfer rate drops as the zones move closer to the center of the platter, meaning that
performance is better on the zones created on the outside of the drive. Media transfer rate is covered
later in the module.
Cylinder
Tracks and sectors organize data on a single platter. Cylinders help organize data across platters on a
drive.
A cylinder is the set of tracks with the same track number on both surfaces of each of the drive's platters. Often the drive head location is referred to by cylinder number rather than by track number.
Because all of the read-write heads move together, each head is always physically located at the same
track number. In other words, one head cannot be on track zero while another is on track 10.
(Diagram: cylinders, heads, and sectors on the left, with the corresponding logical block numbering - block 0, 8, 16, 32, 48, and so on - on the right)
At one time, drives used physical addresses made up of the Cylinder, Head, and Sector number (CHS)
to refer to specific locations on the disk. This meant that the host had to be aware of the geometry of
each disk that was used.
Logical Block Addressing (LBA) simplifies addressing by using a linear address for accessing
physical blocks of data. The disk controller performs the translation process from LBA to CHS
address. The host only needs to know the size of the disk drive (how many blocks).
y Logical blocks are mapped to physical sectors on a 1:1 basis
y Block numbers start at 0 and increment by one until the last block is reached (E.g., 0, 1, 2, 3 … (N-
1))
y Block numbering starts at the beginning of a cylinder and continues until the end of that cylinder
y This is the traditional method for accessing peripherals on SCSI, Fibre Channel, and newer ATA
disks
y As an example, we’ll look at a new 500 GB drive. The true capacity of the drive is 465.7 GB,
which is in excess of 976,000,000 blocks. Each block will have its own unique address
In the slide, the drive shows 8 sectors per track, 8 heads, and 4 cylinders. We have a total of 8 x 8 x 4 =
256 blocks. The illustration on the right shows the block numbering, which ranges from 0 to 255.
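A small sketch of the translation the disk controller performs, using the hypothetical geometry from the slide (4 cylinders, 8 heads, 8 sectors per track); real drives hide their geometry, so the numbers here are purely illustrative:

CYLINDERS, HEADS, SECTORS_PER_TRACK = 4, 8, 8

def lba_to_chs(lba):
    # Translate a logical block address into (cylinder, head, sector).
    cylinder, remainder = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(remainder, SECTORS_PER_TRACK)
    return cylinder, head, sector      # sector numbered from 0 in this sketch

def chs_to_lba(cylinder, head, sector):
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + sector

print(CYLINDERS * HEADS * SECTORS_PER_TRACK)   # 256 blocks in total
print(lba_to_chs(255))                         # -> (3, 7, 7), the last block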
(Diagram: partitioning divides one physical drive into multiple logical volumes; concatenation combines several physical drives into one logical volume)
Partitioning divides the disk into logical containers (known as volumes), each of which can be used
for a particular purpose.
y Partitions are created from groups of contiguous cylinders
y A large physical drive could be partitioned into multiple Logical Volumes (LV) of smaller capacity
y Because partitions define the disk layout, they are generally created when the hard disk is initially
set up on the host
y Partition size impacts disk space utilization
y The host filesystem accesses partitions, with no knowledge of the physical structure.
Concatenation groups several smaller physical drives and presents them collectively as one large
logical drive to the host. This is typically done using the Logical Volume Manager on the host.
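A minimal sketch of the mapping a volume manager performs for a concatenated volume; the drive names and sizes below are made up for illustration.

# Concatenation: the logical volume is drive A, then drive B, then drive C.
drives = [("A", 1000), ("B", 2000), ("C", 1500)]   # (name, size in blocks)

def locate(logical_block):
    # Map a block of the concatenated volume to (drive, block within drive).
    offset = logical_block
    for name, size in drives:
        if offset < size:
            return name, offset
        offset -= size
    raise ValueError("block beyond the end of the logical volume")

print(locate(500))    # -> ('A', 500)
print(locate(2500))   # -> ('B', 1500)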
Lesson Summary
Key points covered in this lesson:
y Physical drives are made up of:
– HDA
¾ Platters connected via a spindle
¾ Read/write heads which are positioned by an actuator
– Controller
¾ Controls power, communication, positioning, and optimization
These are the key points covered in this lesson. Please take a moment to review them.
The focus of this lesson is on the factors that impact how well a drive works, in particular, the
performance and reliability of the drive.
Since a disk drive is a mechanical device, accessing it takes far longer than accessing electronic memory. The length of time to read or write data on the disk depends primarily upon three factors: seek time, rotational delay (also known as latency), and transfer rate.
The objectives for this lesson are shown here. Please take a moment to read them.
Seek times describe the time it takes to position the read/write heads radially across the platter. The
following specifications are often published:
y Full Stroke - the time it takes to move across the entire width of the disk, from the innermost track
to the outermost
y Average – the average time it takes to move from one random track to another (normally listed as
the time for one-third of a full stroke)
y Track-to-Track – the time it takes to move between adjacent tracks
Each of these specifications is measured in milliseconds (ms).
Notes:
Average seek times on modern disks typically are in the range of 3 to 15 ms.
Seek time has more impact on reads of random tracks on the disk rather than on adjacent tracks.
To improve seek time, data is often written only to a subset of the available cylinders (either on the
inner or outer tracks), and the drive is treated as though it has a lower capacity than it really has, e.g. a
500 GB drive is set up to use only the first 40 % of the cylinders, and is treated as a 200 GB drive.
This is known as short-stroking the drive.
The actuator moves the read/write head over the platter to a particular track, while the platter spins to position a particular sector under the read/write head.
Rotational latency is the time it takes the platter to rotate and position the data under the read/write
head.
y Rotational latency is dependent upon the rotation speed of the spindle and is measured in
milliseconds (ms)
y The average rotational latency is one-half of the time taken for a full rotation
y Like seek times, rotational latency has more of an impact on reads or writes of random sectors on
the disk than on the same operations on adjacent sectors
Since spindle speed contributes to latency, the faster the disk spins, the quicker the correct sector will
rotate under the heads—thus leading to a lower latency.
Rotational latency is around 5.5 ms for a 5,400 rpm drive, and around 2.0 ms for a 15,000 rpm drive.
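Those figures follow directly from the spindle speed: one full rotation takes 60/rpm seconds, and the average latency is half of that. A quick check in Python:

def average_rotational_latency_ms(rpm):
    # Average rotational latency = half of one full rotation, in milliseconds.
    return (60_000 / rpm) / 2

print(round(average_rotational_latency_ms(5_400), 1))    # ~5.6 ms
print(round(average_rotational_latency_ms(15_000), 1))   # 2.0 ms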
(Diagram: command queuing - four I/O requests serviced in arrival order versus reordered to match their locations on the platter)
If commands are processed as they are received, time is wasted if the read/write head passes over data
that is needed one or two requests later. To improve drive performance, some drive manufacturers
include logic that analyzes where data is stored on the platter relative to the data access requests.
Requests are then reordered to make best use of the data’s layout on the disk.
This technique is known as Command Queuing (also known as Multiple Command Reordering,
Multiple Command Optimization, Command Queuing and Reordering, Native Command Queuing or
Tagged Command Queuing).
In addition to being performed at the physical disk level, command queuing can also be performed by
the storage system that uses the disk.
Disk Drive
The following steps take place when data is read from/written to the drive:
y Read
1. Data moves from the disk platters to the heads
2. Data moves from the heads to the drive's internal buffer
3. Data moves from the buffer through the interface to the host HBA
y Write
1. Data moves from the HBA to the internal buffer through the drive’s interface
2. Data moves from the buffer to the read/write heads
3. Data moves from the disk heads to the platters
The Data Transfer Rate describes the rate, in MB per second, at which the drive can deliver data to the HBA. Because internal and external factors can impact performance, transfer rates are refined into:
y Internal transfer rate - the speed of moving data from the disk surface to the R/W heads on a
single track of one surface of the disk. This is also known as the burst transfer rate
− Sustained internal transfer rate takes other factors into account, such as seek times
y External transfer rate - the rate at which data can be moved through the interface to the HBA.
The burst transfer rate is generally the advertised speed of the interface (e.g., 133 MB/s for
ATA/133)
− Sustained external transfer rates are lower than the interface speed
Note: Internal transfer rates are almost always lower, sometimes appreciably lower, than the external
transfer rate.
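Pulling the pieces of this lesson together, the sketch below gives a rough estimate of the time to service one random I/O as seek time plus rotational latency plus transfer time; the drive figures plugged in are illustrative, not taken from any specific product.

def io_service_time_ms(seek_ms, rpm, io_size_kb, transfer_mb_s):
    # Rough service time for one random I/O.
    rotational_latency_ms = (60_000 / rpm) / 2
    transfer_ms = io_size_kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotational_latency_ms + transfer_ms

# Example: 5 ms average seek, 15,000 rpm spindle, 8 KB I/O, 60 MB/s sustained rate.
print(round(io_service_time_ms(5, 15_000, 8, 60), 2))   # ~7.13 ms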
Mean Time Between Failure (MTBF) is the average length of time a device can be expected to operate before an incapacitating malfunction occurs. It is based on averages and therefore is used merely to
provide estimates. MTBF is measured in hours (e.g., 750,000 hours).
MTBF is based on an aggregate analysis of a huge number of drives, so it does not help to determine
how long a given drive will actually last. MTBF is often used along with the service life of the drive,
which describes how long you can expect the drive’s components to work before they wear out (e.g., 2
years).
Note: MTBF is a statistical method developed by the U.S. military as a way of estimating maintenance
levels required by various devices. It is generally not practical to test a drive before it becomes
available for sale (750,000 hours is over 85 years!). Instead, MTBF is tested by artificially aging the
drives. This is accomplished by subjecting them to stressful environments such as high temperatures,
high humidity, fluctuating voltages, etc.
Lesson Summary
Key points covered in this lesson:
y Drive performance is impacted by a number of factors
including:
– Seek time
– Rotational latency
– Command queuing
– Data transfer rate
These are the key points covered in this lesson. Please take a moment to review them.
Module Summary
Key points covered in this module:
y Physical drives are made up of a number of components
– HDA – houses the platters, spindles, actuator assemblies (which
include the actuator and the read/write heads)
– Controller - Controls power, communication, positioning, and
optimization
These are the key points covered in this module. Please take a moment to review them.
Check your knowledge of this module by taking some time to answer the questions shown on the slide.
RAID Arrays
After completing this module, you will be able to:
y Describe what RAID is and the needs it addresses
y Describe the concepts upon which RAID is built
y Compare and contrast common RAID levels
y Recommend the use of the common RAID levels based
on performance and availability considerations
In the previous module, we looked at how a disk drive works. Disk drives can be combined into disk
arrays to increase capacity.
An individual drive has a certain life expectancy before it fails, as measured by MTBF. Since there are
many drives in a disk array, potentially hundreds or even thousands, the probability of a drive
failure increases significantly. As an example, if the MTBF of a drive is 750,000 hours, and there are
100 drives in the array, then the MTBF of the array becomes 750,000 / 100, or 7,500 hours. RAID
(Redundant Array of Independent Disks) was introduced to mitigate this problem.
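The approximation used in this example treats drive failures as independent and simply divides the single-drive MTBF by the drive count; a one-line sketch:

def array_mtbf_hours(drive_mtbf_hours, number_of_drives):
    # Rough MTBF of a set of drives, assuming independent failures.
    return drive_mtbf_hours / number_of_drives

print(array_mtbf_hours(750_000, 100))   # 7500.0 hours, as in the example above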
RAID arrays enable you to increase capacity, provide higher availability (in case of a drive failure),
and increase performance (through parallel access). In this module, we will look at the concepts that
provide a foundation for understanding disk arrays with built-in controllers for performing RAID
calculations. Such arrays are commonly referred to as RAID Arrays. We will also learn about a few
commonly implemented RAID levels and the type of protection they offer.
(Diagram: a host connected to a RAID array through the RAID controller)
RAID (Redundant Arrays of Independent Disks) combines two or more disk drives in an array into
a RAID set or a RAID group. The RAID set appears to the host as a single disk drive. Properly
implemented RAID sets provide:
y Higher data availability
y Improved I/O performance
y Streamlined management of storage devices
Historical Note: In 1987, Patterson, Gibson and Katz at the University of California, Berkeley,
published a paper entitled, "A Case for Redundant Arrays of Inexpensive Disks (RAID)." This paper
described various types of disk arrays, referred to by the acronym RAID. At the time, data was stored
largely on large, expensive disk drives (called SLED, or Single Large Expensive Disk). The term
inexpensive was used in contrast to the SLED implementation. The term RAID has been redefined to
refer to independent disks, to reflect the advances in the storage technology.
RAID storage has now grown from an academic concept to an industry standard.
RAID Components
(Diagram: a RAID array containing physical arrays of disks grouped into logical arrays, connected to the host through the RAID controller)
Physical disks inside a RAID array are usually contained in smaller sub-enclosures. These sub-
enclosures, or physical arrays, hold a fixed number of physical disks, and may also include other
supporting hardware, such as power supplies.
A subset of disks within a RAID array can be grouped to form logical associations called logical
arrays, also known as a RAID set or a RAID group. The operating system may see these disk groups
as if they were regular disk volumes. Logical arrays facilitate the management of a potentially huge
number of disks. Several physical disks can be combined to make large logical volumes.
Generally, the array management software implemented in RAID systems handles:
y Management and control of disk aggregations (e.g. volume management)
y Translation of I/O requests between the logical disks and the physical disks
y Data regeneration if disk failures occur
RAID Levels
y 0 Striped array with no fault tolerance
y 1 Disk mirroring
y 3 Parallel access array with dedicated parity disk
y 4 Striped array with independent disks and a dedicated
parity disk
y 5 Striped array with independent disks and distributed
parity
y 6 Striped array with independent disks and dual
distributed parity
y Combinations of levels (e.g., 1+0, 0+1)
There are some standard RAID configuration levels, each of which has benefits in terms of
performance, capacity, data protection, etc.
The discussion centers around the commonly used levels and commonly used combinations of levels.
(Diagram: strips on each disk in the RAID set aligned into stripes - Stripe 1, Stripe 2, Stripe 3)
RAID sets are made up of disks. Within each disk, there are groups of contiguously addressed blocks,
called strips. The set of aligned strips that spans across all the disks within the RAID set is called a
stripe.
y Strip size (also called stripe depth) describes the number of blocks in a strip, and is the maximum
amount of data that is written to or read from a single disk in the set before the next disk is
accessed (assuming that the accessed data starts at the beginning of the strip).
− All strips in a stripe have the same number of blocks.
− Decreasing strip size means that data is broken into smaller pieces when spread across the
disks.
y Stripe size describes the number of data blocks in a stripe.
− To calculate the stripe size, multiply the strip size by the number of data disks.
y Stripe width refers to the number of data strips in a stripe (or, put differently, the number of data
disks in a stripe).
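A small sketch that applies these definitions, assuming simple RAID 0 style striping; the strip size and disk count are illustrative.

STRIP_SIZE_BLOCKS = 128    # blocks per strip
DATA_DISKS = 4             # stripe width

def stripe_size_blocks():
    # Stripe size = strip size x number of data disks.
    return STRIP_SIZE_BLOCKS * DATA_DISKS

def locate_block(logical_block):
    # Map a logical block to (disk, strip number on that disk, offset in strip).
    strip_index, offset = divmod(logical_block, STRIP_SIZE_BLOCKS)
    disk = strip_index % DATA_DISKS
    strip_on_disk = strip_index // DATA_DISKS
    return disk, strip_on_disk, offset

print(stripe_size_blocks())   # 512 blocks per stripe
print(locate_block(300))      # -> (2, 0, 44)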
(Diagram: RAID 0 - the RAID controller stripes the host's blocks across the disks in the set, with no redundancy)
RAID 0 stripes the data across the drives in the array without generating redundant data.
y Performance - better than JBOD because it uses striping. The I/O rate, called throughput, can be very high
when I/O sizes are small. Large I/Os produce high bandwidth (data moved per second) with this RAID type.
Performance is further improved when data is striped across multiple controllers with only one drive per
controller.
y Data Protection – no parity or mirroring means that there is no fault tolerance. The failure of a single drive results in the loss of the data in the set.
y Applications – those that need high bandwidth or high throughput, but where the data is not critical, or can
easily be recreated.
Striping improves performance by distributing data across the disks in the array. This use of multiple
independent disks allows multiple reads and writes to take place concurrently.
y When a large amount of data is written, the first piece is sent to the first drive, the second piece to the second
drive, and so on.
y The pieces are put back together again when the data is read.
y Striping can occur at the block (or block multiple) level or the byte level. Stripe size can be specified at the Logical Volume Manager level on the host (software RAID) or, depending on the vendor, at the array level (hardware RAID).
Notes on striping:
y Increasing the number of drives in the array increases performance because more data can be read or written
simultaneously.
y A higher stripe width indicates a higher number of drives and therefore better performance.
y Striping is generally handled by the controller and is transparent to the host operating system.
(Diagram: RAID 1 - the RAID controller writes each block from the host to both disks in the mirrored pair)
RAID 1 uses mirroring to improve fault tolerance. A RAID 1 group consists of 2 (typically) or more
disk modules. Every write to a data disk is also a write to the mirror disk(s). This is transparent to the
host. If a disk fails, the disk array controller uses the mirror drive for data recovery and continuous
operation. Data on the replaced drive is rebuilt from the mirror drive.
y Benefits - high data availability and high I/O rate (small block size)
y Drawbacks - the total number of disks in the array equals twice the number of data (usable) disks. This means that the overhead cost equals 100%, while usable storage capacity is 50%
y Performance – improves read performance, but degrades write performance
y Data Protection - improved fault tolerance over RAID 0
y Disks – at least two disks
y Cost – expensive due to the extra capacity required to duplicate data
y Maintenance - low complexity
y Applications - applications requiring high availability and non-degraded performance in the event
of a drive failure
(Diagram: RAID 0+1 - the RAID controller mirrors two RAID 0 stripe sets)
RAID 0+1 is one way of combining the speed of RAID 0 with the redundancy of RAID 1. RAID 0+1
is implemented as a mirrored array whose basic elements are RAID 0 stripes.
y Benefits - medium data availability, high I/O rate (small block size), and the ability to withstand multiple drive failures as long as they all occur within the same striped set
y Drawbacks - the total number of disks equals twice the number of data disks, with overhead cost equaling 100%
y Performance - high I/O rates; writes are slower than reads because of mirroring
y Data Protection - medium reliability
y Disks - even number of disks (4 disk minimum to allow striping)
y Cost - very expensive because of the high overhead
y Applications – imaging and general file server
(Diagram: RAID 0+1 with a single failed drive - the entire stripe set containing the failed drive is faulted and I/O continues on the surviving mirror)
In the event of a single drive failure, the entire stripe set is faulted. Normal processing can continue with the surviving mirror. However, rebuilding involves copying the entire stripe set from the mirror, not just the failed drive. This results in longer rebuild times than in a RAID 1+0 solution and makes RAID 0+1 implementations less common than RAID 1+0.
(Diagram: RAID 1+0 - the RAID controller stripes data across mirrored pairs of disks)
RAID 1+0 (or RAID 10, RAID 1/0, or RAID A) also combines the speed of RAID 0 with the
redundancy of RAID 1, but it is implemented in a different manner than RAID 0+1. RAID 1+0 is a
striped array whose individual elements are RAID 1 arrays - mirrors.
y Benefits - high data availability, high I/O rate (small block size), and the ability to withstand
multiple drive failures as long as they occur on different mirrors
y Drawbacks - total number of disks equal two times the data disks, with overhead cost equaling
100%
y Data Protection - high reliability
y Disks - even number of disks (4 disk minimum, to allow striping)
y Cost - very expensive, because of the high overhead
y Performance: High I/O rates achieved using multiple stripe segments. Writes are slower than reads,
because they are mirrored
y Applications – databases requiring high I/O rates with random data, and applications requiring
maximum data availability
(Diagram: RAID 1+0 with a single failed drive - only the failed drive's mirror partner is needed for the rebuild)
In the event of a drive failure, normal processing can continue with the surviving mirror. Only the data
on the failed drive has to be copied over from the mirror for the rebuild, as opposed to rebuilding the
entire stripe set in RAID 0+1. This results in faster rebuild times for RAID 1+0 and makes it a more
common solution than RAID 0+1.
Note that under normal operating conditions both RAID 0+1 and RAID 1+0 provide the same benefits.
These solutions are still aimed at protecting against a single drive failure and not against multiple drive
failures.
(Diagram: parity - data blocks 0 through 11 striped across four data disks, with the parity for each stripe stored on a dedicated parity disk)
Parity is a redundancy check that ensures that the data is protected without using a full set of duplicate
drives.
y If a single disk in the array fails, the other disks have enough redundant data so that the data from
the failed disk can be recovered.
y Like striping, parity is generally a function of the RAID controller and is transparent to the host.
y Parity information can either be:
− Stored on a separate, dedicated drive (RAID-3)
− Distributed with the data across all the drives in the array (RAID-5)
Parity Calculation
(Diagram: four data disks holding the values 5, 3, 4, and 2, and a parity disk holding their sum, 14; if the disk holding 4 is lost, its value is recovered as 14 - 5 - 3 - 2 = 4)
This example uses arithmetic operations to demonstrate how parity works. It illustrates the concept,
but not the actual mechanism.
y Think of parity as the sum of the data on the other disks in the RAID set. Each time data is
updated, the parity is updated as well, so that it always reflects the current sum of the data on the
other disks.
Note: While parity is calculated on a per stripe basis, the diagram omits this detail for the sake of
simplification.
y If a disk fails, the value of its data is calculated by using the parity information and the data on the
surviving disks.
y If the parity disk fails, the value of its data is calculated by using the data disks. Parity will only
need to be recalculated, and saved, when the failed disk is replaced with a new disk.
In the event of a disk failure, each request for data from the failed disk requires that the data be
recalculated before it can be sent to the host. This recalculation is time-consuming, and decreases the
performance of the RAID set. Hot spare drives, introduced later, provide a way to minimize the
disruption caused by a disk failure.
The actual parity algorithm uses the Boolean exclusive-OR (XOR) operation.
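To make the XOR point concrete, here is a small sketch of parity generation and rebuild over byte strings; it is illustrative only, since real controllers do this per stripe in hardware or firmware.

def xor_parity(strips):
    # Byte-wise XOR of equal-length strips yields the parity strip.
    parity = bytearray(len(strips[0]))
    for strip in strips:
        for i, byte in enumerate(strip):
            parity[i] ^= byte
    return bytes(parity)

data = [b"\x05\x0A", b"\x03\x0C", b"\x04\x01", b"\x02\x07"]
parity = xor_parity(data)

# Lose the third strip, then rebuild it from the survivors plus parity.
survivors = [data[0], data[1], data[3], parity]
rebuilt = xor_parity(survivors)
print(rebuilt == data[2])   # True: the missing strip is recovered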
(Diagram: RAID 3 - blocks 0 through 3 striped across the data disks, with the parity P0123 generated by the controller and written to a dedicated parity disk)
RAID Level 3 stripes data for high performance and uses parity for improved fault tolerance. Data is striped across all but one of the disks in the array. Parity information is stored on a dedicated drive, so that data can be reconstructed if a drive fails.
RAID 3 always reads and writes complete stripes of data across all the disks. There are no partial
writes that update one out of many strips in a stripe.
y Benefits - the total number of disks is less than in a mirrored solution (e.g. 1.25 times the data disks
for group of 5), good bandwidth on large data transfers
y Drawbacks - poor efficiency in handling small data blocks. This makes it not well suited to
transaction processing applications. Data is lost if multiple drives fail within the same RAID 3
Group.
y Performance - high data read/write transfer rate. Disk failure has a significant impact on
throughput. Rebuilds are slow.
y Data Protection - uses parity for improved fault tolerance
y Striping – byte level to multiple block level, depending on vendor implementation
y Applications - applications where large sequential data accesses are used such as medical and
geographic imaging
(Diagram: RAID 4 - blocks striped across independently accessible data disks, with parity blocks P0123 and P4567 written to a dedicated parity disk)
RAID Level 4 stripes data for high performance and uses parity for improved fault tolerance. Data is striped across all but one of the disks in the array. Parity information is stored on a dedicated disk so that data can be reconstructed if a drive fails.
The data disks are independently accessible, and multiple reads and writes can occur simultaneously.
y Benefits - the total number of disks is less than in a mirrored solution (e.g., 1.25 times the data
disks for a group of 5), good read throughput, and reasonable write throughput.
y Drawbacks – the dedicated parity drive can be a bottleneck when handling small data writes. This
RAID level is not well suited to transaction processing applications. Data is lost if multiple drives
fail within the same RAID 4 group.
y Performance - high data read transfer rate. Poor to medium write transfer rate. Disk failure has a
significant impact on throughput
y Data Protection - uses parity for improved fault tolerance.
y Striping – usually at the block (or block multiple) level
y Applications – general purpose file storage
RAID 4 is much less commonly used than RAID 5, discussed next. The dedicated parity drive is a
bottleneck, especially when a disk failure has occurred.
RAID Arrays - 15
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
(Diagram: the host sends Blocks 0–7 to the RAID controller; the controller distributes the data blocks
and the generated parity blocks, P0123 and P4567, across all drives in the group – there is no dedicated
parity drive.)
RAID 5 does not read and write data to all disks in parallel like RAID 3. Instead, it performs independent read
and write operations. There is no dedicated parity drive; data and parity information is distributed across all
drives in the group.
y Benefits - the most versatile RAID level. A transfer rate greater than that of a single drive but with a high
overall I/O rate. Good for parallel processing (multi-tasking) applications/environments. Cost savings due to
the use of parity over mirroring.
y Drawbacks - slower transfer rate than RAID 3. Small writes are slow because they require a read-modify-
write (RMW) operation: a write to a single block involves two reads (old block and old parity) and two writes
(new block and new parity). Performance degrades in recovery and reconstruction modes, and data is lost if
multiple drives within the same group fail.
y Performance - high read data transaction rate, medium write data transaction rate. Low ratio of parity disks to
data disks. Good aggregate transfer rate
y Data Protection - single disk failure puts volume in degraded mode. Difficult to rebuild (as compared to
RAID level 1).
y Disks - 5-disk and 9-disk groups are popular. Most implementations allow other RAID set sizes.
y Striping – block level, or multiple block level
y Applications - file and application servers, database servers, WWW, email, and News servers
Read operations do not involve parity calculations. In the case of a 5-disk RAID 5 group, a maximum of 5
independent reads can be performed. Because a write operation involves two disks (the parity disk and the data
disk), a maximum of two independent writes can be performed in this configuration. So a maximum of 5
independent reads or two independent writes can be performed on a 5-disk RAID 5 group.
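To make the read-modify-write penalty concrete, here is a hedged sketch (illustrative Python, per strip;
real controllers perform this in hardware and per stripe) of the small-write parity update, where the new
parity is the old parity XOR the old data XOR the new data:

    def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
        # Two reads (old data, old parity) have already happened; compute the new parity,
        # then two writes (new data, new parity) complete the RMW cycle.
        new_parity = bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))
        return new_data, new_parity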
RAID Arrays - 16
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
The details of diagonal parity generation and rebuilds are beyond the scope of this foundations course.
RAID Arrays - 17
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
RAID Implementations
y Hardware (usually a specialized disk controller card)
– Controls all drives attached to it
– Performs all RAID-related functions, including volume management
– Array(s) appear to the host operating system as a regular disk drive
– Dedicated cache to improve performance
– Generally provides some type of administrative software
y Software
– Generally runs as part of the operating system
– Volume management performed by the server
– Provides more flexibility for hardware, which can reduce the cost
– Performance is dependent on CPU load
– Has limited functionality
As a broad distinction, hardware RAID is implemented by intelligent storage systems external to the
host, or, at minimum, intelligent controllers in the host that offload the RAID management functions
from the host.
Software RAID usually describes RAID that is managed by the host. Typically it is implemented via a
Logical Volume Manager on the host. The disadvantage of software RAID is that it uses host CPU
cycles that would be better utilized to run applications. Software RAID often looks attractive initially
because it does not require the purchase of additional hardware. However, the initial cost savings are
soon offset by the expense of using a costly server to perform I/O processing that it handles
inefficiently at best.
RAID Arrays - 18
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
Hot Spares
(Diagram: a RAID controller with its drives and an idle hot spare.)
A hot spare is an idle component (often a drive) in a RAID array that becomes a temporary replacement for a
failed component. For example:
The hot spare drive takes the failed drive’s identity in the array.
Data recovery takes place. How this happens is based on the RAID implementation:
y If parity was used, data is rebuilt onto the hot spare from the parity and data on the surviving drives.
y If mirroring was used, data is rebuilt using the data from the surviving mirror drive.
The failed drive is replaced with a new drive at some time later.
One of the following occurs:
y The hot spare permanently replaces the failed drive—meaning that it is no longer a hot spare, and a new hot
spare must be configured on the system.
y When the new drive is added to the system, data from the hot spare is copied to the new drive. The hot spare
returns to its idle state, ready to replace the next failed drive.
Note: The hot spare drive needs to be large enough to accommodate the data from the failed drive.
Hot spare replacement can be:
y Automatic - when a disk’s recoverable error rates exceed a predetermined threshold, the disk subsystem tries
to copy data from the failing disk to a spare one. If this task completes before the damaged disk fails, the
subsystem switches to the spare and marks the failing disk unusable. (If not, it uses parity or the mirrored
disk to recover the data, as appropriate).
y User initiated - the administrator tells the system when to do the rebuild. This gives the administrator control
(e.g., rebuild overnight so as not to degrade system performance); however, the system is vulnerable to
another failure because the hot spare is now unavailable. Some systems implement multiple hot spares to
improve availability.
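As a hedged sketch of the parity-based rebuild path only (the mirrored case simply copies from the
surviving mirror drive), each stripe's lost strip is the XOR of the surviving data strips and the parity
strip; the function and structures below are illustrative, not an actual controller interface:

    def rebuild_stripe(surviving_strips, parity_strip):
        # XOR of everything that survived equals the strip that was lost.
        lost = bytes(len(parity_strip))
        for strip in list(surviving_strips) + [parity_strip]:
            lost = bytes(a ^ b for a, b in zip(lost, strip))
        return lost

    def rebuild_onto_hot_spare(stripes, hot_spare):
        # stripes: iterable of (surviving_strips, parity_strip) pairs; hot_spare: list standing in for the spare drive.
        for surviving_strips, parity_strip in stripes:
            hot_spare.append(rebuild_stripe(surviving_strips, parity_strip))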
RAID Arrays - 19
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
Hot Swap
(Diagram: redundant RAID controllers; a failed controller is replaced without shutting down the system.)
Like hot spares, hot swaps enable a system to recover quickly in the event of a failure. With a hot
swap, the user can replace the failed hardware (such as a controller) without having to shut down the
system.
RAID Arrays - 20
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
Module Summary
Key points covered in this module:
y What RAID is and the needs it addresses
y The concepts upon which RAID is built
y Some commonly implemented RAID levels
These are the key points covered in this module. Please take a moment to review them.
RAID Arrays - 21
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
Check your knowledge of this module by taking some time to answer the questions shown on the slide.
RAID Arrays - 22
Copyright © 2007 EMC Corporation. Do not Copy - All Rights Reserved.
At this point, you have learned how disks work and how they can be combined to form RAID arrays.
Now we are going to build on those concepts and add intelligence to those arrays, making them even
more powerful. Throughout this module we refer to this as an intelligent storage system.
The objectives for this module are shown here. Please take a moment to read them.
This module contains two lessons. In this lesson, we take a high level look at the components of a disk
storage system as well as two approaches to implementing them: integrated and modular.
The objectives for this lesson are shown here. Please take a moment to read them.
Let’s start by asking the question, “What is an intelligent storage system?” It is a disk storage system
which distributes data over several devices, and manages access to that data.
Intelligent storage systems have an operating environment. The operating environment can be viewed
as an “operating system” for the array. They also have large amounts of cache. Sophisticated
algorithms manage cache to optimize the read/write requests from the hosts. Large capacity drives can
be partitioned or “sliced” into smaller units. These smaller units, in turn, can be presented to hosts as
individual disk drives. Array management software can also enable multiple hosts to access the array
via the same I/O channel. The operating environment ensures that each host can only access the disk
resources allocated to it.
A simple collection of disks in an array, a RAID array, and an intelligent storage system all provide
increased data storage capacity. However, intelligent storage systems provide additional benefits, as
listed in the slide.
Monolithic
(Diagram: a monolithic storage system – FC ports, port processors, cache, and RAID controllers.)
Intelligent storage systems generally fall into one of two categories, monolithic and modular.
Monolithic storage systems are generally aimed at the enterprise level, centralizing data in a powerful
system with hundreds of drives. They have the following characteristics:
y Large storage capacity
y Large amounts of cache to service host I/Os efficiently and optimally
y Redundant components for improved data protection and availability
y Many built-in features to make them more robust and fault tolerant
y Usually connect to mainframe computers or very powerful open systems hosts
y Multiple front end ports to provide connectivity to multiple servers
y Multiple back end Fibre Channel or SCSI RAID controllers to manage disk processing.
This system is contained within a single frame or interconnected frames (for expansion) and can scale
to support increases in connectivity, performance, and capacity as required. Monolithic storage
systems can handle large amounts of concurrent I/Os from numerous servers and applications. They
are quite expensive compared to modular storage systems (discussed in the next slide). Many of their
features and functionality might be required only for mission critical applications in large enterprises.
Note: Monolithic arrays are sometimes called integrated arrays, enterprise arrays, or cache centric
arrays.
Modular
(Diagram: a modular storage system in a rack – servers connect to host interfaces on Controller A and
Controller B, each with its own cache; a control module with disks and additional disk modules provide
the capacity.)
Modular storage systems provide storage to a smaller number of (typically) Windows or Unix servers
than larger integrated storage systems. Modular storage systems are typically designed with two
controllers, each of which contains host interfaces, cache, RAID processors, and disk drive interfaces.
They have the following characteristics:
y Smaller total storage capacity and less global cache than monolithic arrays
y Fewer front end ports for connection to servers
y Performance can degrade as the number of connected servers increases
y Limited redundancy
y Fewer options for array based local and remote replication
Note: Modular storage systems are sometimes called midrange or departmental storage systems.
It should also be noted that the distinction between monolithic and modular arrays is becoming
increasingly blurred. Traditionally, monolithic arrays have been associated with large enterprises and
modular arrays with small/medium businesses. With proper classification of application requirements
(such as performance, availability, scalability), modular arrays can now be found in several enterprises,
providing optimal storage solutions at a lower cost (than monolithic arrays).
(Diagram: the front end – ports and controllers between the hosts and cache.)
The front end controller receives and processes I/O requests from the host. Hosts connect to the
storage system via ports on the front end controller.
y Ports are the external interfaces for connectivity to the host. Each storage port has processing logic
responsible for executing the appropriate transport protocol for storage connections. For example,
it could use SCSI, Fibre Channel, or iSCSI.
y Behind the storage ports are controllers which communicate with the cache and back end to
provide data access.
The number of front-end ports on a modular storage system generally ranges from 1-8; 4 is typical. On
a large monolithic array, port counts as high as 64 or 128 are common.
(Diagram: four host requests arrive at the front end; in the first panel they are serviced in arrival order,
and in the second panel command queuing reorders them for more efficient execution.)
As seen earlier, command queuing processes multiple concurrent commands based on the organization
of the data on disk, regardless of the order in which the commands were received.
The command queuing software reorders commands so as to make the execution more efficient, and
assigns each command a tag. This tag identifies when the command will be executed, just as the
number you take at the deli determines when you will be served.
Some disk drives, particularly SCSI and Fibre Channel disks, are intelligent enough to manage their
own command queuing. Intelligent storage systems can make use of this native disk intelligence, and
may supplement it with queuing performed by the controller.
There are several command queuing algorithms that can be used. Here are some of the common ones.
y First In, First Out – commands are executed in the order in which they arrive. This is identical to
having no queuing, and is therefore inefficient in terms of performance.
y Seek Time Optimization - faster than First In, First Out. However, two requests could be on
cylinders that are very close to each other, but in very different places within the track. Meanwhile,
there might be a third sector that is a few cylinders further away but much closer overall to the
location of the first request. Optimizing seek times only, without regard for rotational latency, will
not normally produce the best results.
y Access Time Optimization - combines seek time optimization with an analysis of rotational
latency for optimal performance.
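As an illustration only (real drives and controllers use far more sophisticated heuristics, and the cylinder
numbers below are made up), the difference between first-in-first-out ordering and a simple
seek-optimized ordering can be sketched in Python like this:

    def fifo_order(requests):
        # Execute commands in arrival order: no queuing benefit.
        return list(requests)

    def seek_optimized_order(requests, current_cylinder=0):
        # Greedy shortest-seek-first ordering; ignores rotational latency.
        pending, ordered, pos = list(requests), [], current_cylinder
        while pending:
            nxt = min(pending, key=lambda cyl: abs(cyl - pos))
            pending.remove(nxt)
            ordered.append(nxt)
            pos = nxt
        return ordered

    queue = [120, 3, 95, 7]                # requested cylinder numbers
    print(fifo_order(queue))               # [120, 3, 95, 7]
    print(seek_optimized_order(queue))     # [3, 7, 95, 120]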
Cache improves system performance by isolating the hosts from the mechanical delays associated with
physical disks. You have already seen that accessing data from a physical disk usually takes several
milliseconds, because of seek times and rotational latency; accessing data from high speed memory
typically takes less than a millisecond. The performance of reads as well as writes may be improved by
the use of cache. Cache is discussed in more detail in the next lesson.
(Diagram: the back end – controllers and ports between cache and the physical disks.)
The back end controls the data transfers between cache and the physical disks. Physical disks are
connected to ports on the back end.
The back end provides the communication with the disks for read and write operations. The controllers
on the back end:
y Manage the transfer of data between the I/O bus and the disks in the storage system
y Handle addressing for the device - translating logical blocks into physical locations on the disk
y Provide additional, but limited, temporary storage for data
y Provide error detection and correction – often in conjunction with similar features on the disks
To provide maximum data protection and availability, dual controllers provide an alternative path to
physical disks, in case of a controller or a port failure. This reliability is enhanced if the disks used are
dual-ported; each disk port can connect to a separate controller. Having multiple controllers also
facilitates load balancing. Having more than one port on each controller provides additional protection
in the event of port failure. Typically, disks can be accessed via ports on controllers of two different
back ends.
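A minimal sketch of the idea (the structures below are hypothetical; real arrays track path health and
load far more carefully): with dual controllers and dual-ported disks, the array can select any healthy
path to a disk, which also enables load balancing.

    # Hypothetical path table for one disk: each entry names a controller, a port, and its health.
    paths = [
        {"controller": "A", "port": 0, "healthy": False},   # failed controller or port
        {"controller": "B", "port": 1, "healthy": True},
    ]

    def choose_path(paths):
        # Return a healthy path to the disk; raise if every path has failed.
        healthy = [p for p in paths if p["healthy"]]
        if not healthy:
            raise RuntimeError("no path to device")
        # Choosing the least-busy healthy path would add load balancing; here we simply take the first.
        return healthy[0]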
(Diagram: a host accessing LUNs 0, 1, and 2 presented by the storage system.)
Since intelligent storage systems have multiple disk drives, they use the disks in various ways to
provide optimal performance and capacity. For example:
y A large physical drive could be subdivided into multiple virtual disks of smaller capacity. This is
similar to drive partitioning discussed in Section 2.
y Several physical drives can be combined together and presented as one large virtual drive. This is
similar to drive concatenation discussed in Section 2.
y Typically physical drives are grouped into RAID sets or RAID groups. LUNs with the desired
level of RAID protection are then created from these RAID sets and presented to the hosts.
The mapping of the LUNs to their physical location on the drives is managed by the controller.
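A hedged sketch of that mapping (the table layout and names are purely illustrative, not any vendor's
metadata format): the controller records, for each LUN, the RAID set it was carved from and the extent
it occupies, and translates host block addresses accordingly.

    # Hypothetical LUN table: LUN id -> RAID set, starting block within the set, and size in blocks.
    lun_map = {
        0: {"raid_set": "RG1", "start_block": 0,          "blocks": 20_000_000},
        1: {"raid_set": "RG1", "start_block": 20_000_000, "blocks": 20_000_000},
    }

    def resolve(lun_id, logical_block):
        # Translate a host-visible (LUN, block) address into a block within the RAID set.
        entry = lun_map[lun_id]
        if logical_block >= entry["blocks"]:
            raise ValueError("block address is outside the LUN")
        return entry["raid_set"], entry["start_block"] + logical_block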
(Diagram: a 5-disk RAID set sliced into LUNs; LUNs 0 and 1 are presented to the host.)
In this example, a RAID set consisting of 5 disks has been sliced, or partitioned, into several LUNs.
LUNs 0 and 1 are shown. Note how a portion of each LUN resides on each disk in the RAID set.
(Diagram: a single physical drive divided into LUNs 0, 1, and 2 and presented to a host running a
volume manager; the Windows device name \\.\PhysicalDrive0 is shown as an example.)
This example shows a single physical disk divided into 3 LUNs: LUN 0, 1 and 2. The LUNs are
presented separately to the host or hosts. A host will see a LUN as if it were a single disk device. The
host is not aware that this LUN is only a part of a larger physical drive.
The host assigns logical device names to the LUNs; the naming conventions vary by platform.
Examples are shown for both Unix and Windows addressing.
Lesson Summary
Key points covered in this lesson:
y An intelligent disk storage system:
– Is highly optimized for I/O processing
– Has an operating environment which, among other things, manages
cache, controls resource allocation, and provides advanced local and
remote replication capabilities
– Has a front end, cache, a back end, and physical disks
– The physical disks can be partitioned into LUNs or can be grouped
into RAID sets, and presented to the hosts
Please take a few moments to review the key points covered in this lesson.
We already mentioned that cache plays a key role in an intelligent storage system. At this point, let’s
take a closer look at what cache is and how it works.
(Diagram: read and write requests from the host are serviced through cache, and acknowledgements are
returned to the host.)
Physical disks are the slowest components of an intelligent storage system. If the disk has to be
accessed for every I/O operation from the host, response times are very high. Cache helps in reducing
the I/O response times. Cache can improve I/O response times in the following two ways:
y Read cache holds data that is staged into it from the physical disks. As discussed later, data can be
staged into cache ahead of time when read access patterns from hosts are detected.
y Write cache holds data written by a host to the array until it can be committed to disk. Holding
writes in cache and acknowledging them immediately to host, prior to committing to disk, isolates
the host from inherent mechanical delays of the disk (such as rotational and seek latencies). Other
benefits of write caching are discussed later in this lesson.
Cache is volatile – loss of power leads to loss of any data resident in cache that has not yet been
committed to disk. Storage system vendors solve this problem in various ways. The memory may be
powered by a battery until AC power is restored, or battery power may be used to write the content of
cache to disk; in the event of an extended power failure, the latter is the better option. Intelligent storage
systems can have upwards of 256 GB of cache and hundreds of physical disks. Potentially, there could
be a large amount of data to be committed to numerous disks. In this case, the batteries may not
provide power for a sufficient amount of time to write each piece of data to the appropriate disk. Some
vendors use a dedicated set of physical disks to “dump” the content of cache during a power failure.
This is usually referred to as vaulting, and the dedicated disks are called vault drives. When power is
restored, data from these disks are read and then written to the correct disks.
(Diagram: cache divided into two areas – the Data Store and the Tag RAM.)
The amount of user data that the cache can hold is based on the cache size and design. Cache normally
consists of two areas:
y Data store - the part of the cache that holds the data
y Tag RAM – the part of the cache that tracks the location of the data in the data store. Entries in
this area indicate where the data is found in memory, and also where the data belongs on disk.
Additional information found here will include a ‘dirty bit’ – a flag that indicates that data in cache
has not yet been committed to disk. There may also be time-based information such as the time of
last access. This information will be used to determine which cached information has not been
accessed for a long period of time, and may be discarded.
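A minimal Python sketch of the tag lookup idea follows (the field names are assumptions for
illustration, not an actual cache layout): the Tag RAM maps a disk address to a data-store slot, plus a
dirty flag and a last-access time.

    import time

    tag_ram = {}   # disk address -> tag entry describing where the data sits in the data store

    def record_write(disk_address, data_store_slot):
        # A host write has been placed in the data store but not yet committed to disk (dirty).
        tag_ram[disk_address] = {"slot": data_store_slot, "dirty": True, "last_access": time.time()}

    def lookup(disk_address):
        # Return the data-store slot on a hit, or None on a miss; refresh the access time on a hit.
        entry = tag_ram.get(disk_address)
        if entry is None:
            return None
        entry["last_access"] = time.time()
        return entry["slot"]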
Configuration and implementation of cache varies between vendors. In general, these are the options:
y A reserved set of memory addresses for reads and another reserved set of memory addresses for
writes. This implementation is known as dedicated cache. Cache management, such as tracking
the addresses currently in use, those that are available, and the addresses whose content has to be
committed to disk, can become quite complex in this implementation.
y In a global cache implementation, both reads and writes can use any of the available memory
addresses. Cache management is more efficient in this implementation, as only one global set of
addresses has to be managed.
− Some global cache implementations allow the users to specify the percentage of cache that has
to be available for reads and the percentage of cache that has to be available for writes. This
implementation is common in modular storage arrays.
− In other global cache implementations, the ratio of cache available for reads vs. writes might be
fixed, or the array operating environment can dynamically adjust this ratio based on the current
workload. These implementations are typically found in integrated storage arrays.
In integrated arrays, all the front end and back end directors have access to all regions of the cache. In
modular arrays, each controller (typically two) has access to its own cache on-board. A fault in
memory, for example failure of a memory chip, would lead to loss of any uncommitted data held in it.
Vendors use different approaches to mitigate this risk:
y Pro-actively “scrub” all regions of memory. Faults can be detected ahead of time, and the faulty
region can be isolated or fenced, and taken out of use. This is similar to bad block relocation on
physical disks.
y Mirror all writes within cache. Similar to RAID 1 mirroring of disks, each write can be held in two
different memory addresses, well separated from each other. Each write would be placed on two
independent memory boards, for example. In the event of a fault, the write data will still be safe in
the mirrored location and can be committed (de-staged) to disk. Since reads are staged from the
disk to cache, if there is a fault, an I/O error could be returned to the host, and the data can be
staged back into a different location in cache to complete the read request. The read service time
would be longer; however, there is no risk of data loss. As only writes are mirrored, this method leads
to better utilization of the available cache for the data store.
y A third approach would be to mirror all reads and all writes in cache. In this implementation, when
data is read from the disk to be staged into cache, it is written to two different locations. Likewise
writes from hosts will be held in two different locations. This effectively reduces the amount of
usable cache by half. As reads and writes are treated on equal footing, the management overhead
would be less than that of mirroring writes alone.
Either mirroring approach introduces the problem of cache coherency. Cache coherency means that the
data in the two different cache addresses is identical at all times. It is the responsibility of the array
operating environment to ensure coherency.
When a host issues a read request, the front end controller accesses the Tag RAM to determine whether
the required data is already available in cache.
If the requested data is found in the cache, it is known as a cache hit.
y The data is sent directly to the host, with no disk operation required.
y This provides fast response times.
If the data is not found in cache, the operation is known as a cache miss.
y When there is a cache miss, the data must be read from disk. The back end controller accesses the
appropriate disk and retrieves the requested data.
y Data is typically placed in cache and then sent to the host.
The read cache hit ratio (or hit rate), usually expressed as a percentage, describes how well the read
cache is performing. To determine the hit ratio, divide the number of read cache hits by the total
number of read requests.
Cache misses lengthen I/O response times. The response time depends on factors such as rotational
latency and seek time, as discussed earlier.
A read cache hit can take about a millisecond, while a read cache miss can take many times longer.
Remember that average disk access times for reads are often in the 10 ms range.
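As a hedged back-of-the-envelope example (reusing the approximate 1 ms and 10 ms figures above; the
counters are made up for illustration):

    cache_hit_ms, cache_miss_ms = 1.0, 10.0     # approximate service times from the text
    read_hits, total_reads = 8_500, 10_000      # hypothetical counters
    hit_ratio = read_hits / total_reads         # 0.85 -> an 85% read cache hit ratio
    avg_read_ms = hit_ratio * cache_hit_ms + (1 - hit_ratio) * cache_miss_ms
    print(f"hit ratio {hit_ratio:.0%}, average read response {avg_read_ms:.2f} ms")   # 85%, 2.35 ms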
Cache is a finite resource. Even though the intelligent storage systems can have hundreds of GB of cache, when
all cache addresses are used up for data, some addresses have to be freed up to accommodate new data. Waiting
until a cache full condition occurs to free up addresses is inefficient and leads to performance degradation. The
array operating environment should proactively maintain a set of free addresses and/or a list of addresses that can
be potentially freed up when required. Algorithms used for cache management are:
y Least Recently Used (LRU) – access to data in cache is monitored continuously, and the addresses that have
not been accessed in a “long time” can be freed up immediately, or can be marked as being candidates for re-
use. This algorithm assumes that data not accessed in a while will not be requested by the host. The length
of time that an address should be inactive prior to being freed up is dependent on the implementation. Clearly,
if an address contains write data not yet committed to disk, that data will be written to disk before the address
is re-used.
y Most Recently Used (MRU) – is the converse of LRU. Addresses that have been accessed most recently
will be freed up or marked as potential candidates for re-use. This algorithm assumes that data that has been
accessed in the immediate past may not be required for a while.
y Read Ahead – if the read requests are sequential, i.e. a contiguous set of disk blocks, several more blocks not
yet requested by the host can be read from disk and placed in cache. When the host subsequently requests
these blocks, these read operations will be read hits. In general, there is an upper limit to the amount of data
that is pre-fetched. The percentage of pre-fetched data that is actually used is also monitored. A high
percentage would imply that the algorithm is correctly predicting the sequential access pattern. A low
percentage would indicate that effort is being wasted in performing pre-fetch, and that the access pattern
from the host is not truly sequential.
Some implementations allow for data to be “pinned” in cache permanently. The pinned addresses will not
participate in the LRU or the MRU considerations. Note that the slide shows a depiction of the LRU.
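A minimal sketch of the LRU idea, using Python's OrderedDict (illustrative only; a real array cache
manager also tracks dirty data, pinned addresses, and pre-fetch statistics):

    from collections import OrderedDict

    class LRUCache:
        # Tiny least-recently-used cache: the oldest untouched address is freed first.
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()            # disk address -> cached data

        def get(self, address):
            if address not in self.entries:
                return None                         # cache miss
            self.entries.move_to_end(address)       # mark as most recently used
            return self.entries[address]

        def put(self, address, data):
            if address in self.entries:
                self.entries.move_to_end(address)
            self.entries[address] = data
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)    # free the least recently used address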
Write Algorithms
(Diagram: write-through cache – the write request is acknowledged to the host only after the data has
been written to disk; write-back cache – the write request is acknowledged as soon as the data is placed
in cache, and committed to disk later.)
Write-through cache – data is placed in cache, immediately written to disk, and an acknowledgement is
sent to the host. Because data is committed to disk as it arrives, the risk of data loss is low. Write
response times will be longer because of the mechanical delays of the disk.
Write-back cache – data is placed in cache and immediately acknowledged to the host. At a later time,
data from several writes are committed (de-staged) to disk. Uncommitted data is exposed to risk of
loss in the event of failures. Write response times are much faster as the write operations are isolated
from the mechanical delays of the disk.
Cache could also be bypassed under certain conditions, such as very large write I/O sizes. In this
implementation, writes are sent directly to disk.
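The difference between the two policies can be sketched as follows (the cache and disk objects are
hypothetical stand-ins for illustration only; "disk.write" represents the back-end commit):

    def write_through(cache, disk, block, data):
        # Acknowledge only after the data is safely on disk: slower, but low risk of loss.
        cache[block] = data
        disk.write(block, data)
        return "ack"

    def write_back(cache, dirty_blocks, block, data):
        # Acknowledge immediately; the data sits in cache until it is de-staged later.
        cache[block] = data
        dirty_blocks.add(block)
        return "ack"

    def destage(cache, dirty_blocks, disk):
        # Background task: commit dirty blocks to disk and clear their dirty state.
        for block in list(dirty_blocks):
            disk.write(block, cache[block])
            dirty_blocks.discard(block)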
Lesson Summary
Key points covered in this lesson:
y Cache is a memory space used by an intelligent storage
system to reduce the time required to service I/O
requests from the host
y Cache can speed up both read and write operations
y Algorithms to manage cache include:
– Least Recently Used (LRU)
– Most Recently Used (MRU)
– Read Ahead (pre-fetch)
These are the key points covered in this lesson. Please take a moment to review them.
Module Summary
Key points covered in this module:
y Intelligent Storage Systems are RAID Arrays that are
highly optimized for I/O processing
y Monolithic storage systems are generally aimed at the
enterprise level, centralizing data in a powerful system
with hundreds of drives
y Modular storage systems provide storage to a smaller
number of (typically) Windows or Unix servers than larger
integrated storage systems
y Cache in intelligent storage systems accelerates
response times for host I/O requests
© 2007 EMC Corporation. All rights reserved. Disk Storage Systems - 26
These are the key points covered in this module. Please take a moment to review them.
Check your knowledge of this module by taking some time to answer the questions shown on the slide.
At this point, we will apply what you learned in this module to some real world examples. In this case,
we look at the architecture of the EMC CLARiiON and EMC Symmetrix storage arrays.
(Diagram: CLARiiON architecture with redundant 2/4 Gb/s Fibre Channel back ends and 4 Gb/s Link
Control Cards.)
The CLARiiON architecture includes fully redundant, hot swappable components—meaning the
system can survive the loss of a fan or a power supply, and the failed component can be replaced
without powering down the system.
y The Standby Power Supplies (SPSs) maintain power to the cache for long enough to allow its
content to be copied to a dedicated disk area (called the vault) if a power failure should occur.
y Storage Processors communicate with each other over the CLARiiON Messaging Interface (CMI)
channels. They transport commands, status information, and data for write cache mirroring
between the Storage Processors. CMI is used for peer-to-peer communications in the SAN space
and may be used for I/O expansion in the NAS space.
y The CX3-80 uses PCI Express as the high-speed CMI path. The PCI Express architecture delivers
advanced I/O technology with high bandwidth per pin, superior routing characteristics, and
improved reliability.
y When more capacity is required, additional disk array enclosures containing disk modules can be
easily added. Link Control Cards (LCC) connect shelves of disks.
The Symmetrix DMX series arrays deliver the highest levels of performance and throughput for high-end storage. They incorporate the
following features:
y Direct Matrix Interconnect
− Up to 128 direct paths from directors and memory
− Up to 128 GB/s data bandwidth; up to 6.4 GB/s message bandwidth
y Dynamic Global Memory
− Up to 512 GB Global Memory
− Intelligent Adaptive Pre-fetch
− Tag-based cache algorithms
y Enginuity Operating Environment
− Foundation for powerful storage-based functionality
− Continuous availability and advanced data protection
− Performance optimization and self-tuning
− Advanced management
− Integrated SMI-S compliance
y Advanced processing power
− Up to 130 PowerPC Processors
− Four or eight processors per director
y High-performance back end
− Up to 64 2 Gb/s Fibre Channel paths (12.8 GB/s maximum bandwidth)
− RAID 0, 1, 1 + 0, 5
− 73, 146, and 300 GB 10,000 rpm disks; 73 and 146 GB 15,000 rpm disks; 500 GB 7,200 rpm disks
y A fully fault-tolerant design
− Nondisruptive upgrades and operations
− Full component-level redundancy with hot-swappable replacements
− Support: Dual-ported disks and global-disk hot spares
− Redundant power supplies and integrated battery backups
− Remote support and proactive call-home capabilities
This shows the logical representation of the Symmetrix DMX architecture. The Front-end (host connectivity directors and
ports), Cache (Memory) and the Back-end (directors/ports which connect to the physical disks) are shown.
Front-end:
y Hosts connect to the DMX via front-end ports (shown as "Host Attach") on front-end directors. The DMX supports
ESCON, FICON, Fibre Channel and iSCSI front-end connectivity.
Back-end:
y The disk director ports (back-end) are connected to Disk Array Enclosures. The DMX back-end employs an arbitrated
loop design and dual-ported disk drives. I/Os to the physical disks are handled by the back-end.
Cache:
y All front-end I/Os (reads and writes) to the Symmetrix have to pass through the cache; this is unlike some arrays,
which allow I/Os to bypass cache altogether. Let us take a look at how the Symmetrix handles front-end read and write
operations:
y Read: A read is issued by a server. The Symmetrix looks for the data in the cache; if the data is in cache, it is
read from cache and sent to the server via the front-end port – this is a read hit. If the data is not in cache, the
Symmetrix goes to the physical disks on the back-end, fetches the data into cache, and then sends the data from the
cache to the requesting server – this is a read miss.
y Write: A write is issued by a server. The write is received in cache and a write complete is immediately
issued to the server. Data is de-staged from the cache to the back-end at a later time.
y Enhanced global memory technology supports multiple regions and sixteen connections on each global memory
director, one to each director. Each director slot port is hard-wired point-to-point to one port on each global memory
director board. If a director is removed from a system, the usable bandwidth is not reduced. If a memory board is
removed, the usable bandwidth is reduced.
(Diagram: back-end Fibre Channel loops showing each drive's primary (P) and secondary (S)
connections. P = primary connection to the drive; S = secondary connection for redundancy.)
© 2007 EMC Corporation. All rights reserved. Disk Storage Systems - 33
Symmetrix DMX back-end employs an arbitrated loop design and dual-ported disk drives. Each drive
connects to two paired Disk Directors through separate Fibre Channel loops. Port Bypass Cards
prevent a Director failure or replacement from affecting the other drives on the loop. Directors have
four primary loops for normal drive communication and four secondary loops to provide an alternate
path if the other director fails.
All Symmetrix arrays have a Service Processor running the SymmWin application. Initial
configuration of Symmetrix arrays has to be performed by EMC personnel via the Symmetrix Service
Processor.
Physical disks (in the disk array enclosures) are sliced into hypers, or disk slices, and protection
schemes (RAID1, RAID5, etc.) are then incorporated, creating the Symmetrix logical volumes
(discussed in the next slide). A Symmetrix logical volume is the entity that is presented to a host via a
Symmetrix front-end port. The host views the Symmetrix logical volume as a physical drive. Do not
confuse Symmetrix logical volumes with host-based logical volumes. Symmetrix logical volumes are
defined by the Symmetrix configuration, while host-based logical volumes are configured by Logical
Volume Manager software.
EMC ControlCenter and Solutions Enabler are software packages which are used to monitor and
manage the Symmetrix. Solutions Enabler has a command line interface, while ControlCenter provides
a Graphical User Interface (GUI). ControlCenter is a very powerful storage management tool;
managing the Symmetrix is one of the many things it can do.
(Diagram: two hyper volumes, LV 04B M1 and LV 04B M2, on separate physical drives form the
mirrored Symmetrix logical volume 04B, presented to the host at host address Target = 1, LUN = 0.)
© 2007 EMC Corporation. All rights reserved. Disk Storage Systems - 35
Mirroring provides the highest level of performance and availability for all applications. Mirroring
maintains a duplicate copy of a logical volume on two physical drives. The Symmetrix maintains
these copies internally by writing all modified data to both physical locations. The mirroring function
is transparent to attached hosts, as the hosts view the mirrored pair of hypers as a single Symmetrix
logical volume.
A RAID 1 SLV: Two hyper volumes from two different disks on two different disk directors are
logically presented as a RAID 1 SLV. The hyper volumes are chosen from different disks on different
disk directors to provide maximum redundancy. The SLV is given a hexadecimal address. In the
example, SLV 04B is a RAID 1 SLV whose hyper volumes exist on the physical disks in the back-end
of the array.
The SLV is then mapped to one or more Symmetrix front-end ports (a target and LUN ID is assigned
at this time). The SLV can now be assigned to a server. The server views the SLV as a physical drive.
On a fully configured Symmetrix DMX3 array, one can have up to 64,000 Symmetrix logical volumes.
The maximum number of SLVs on a DMX is a function of the number of disks, disk directors, and the
protection scheme used.
Data Protection
y Mirroring (RAID 1)
– Highest performance, availability and functionality
– Two hyper mirrors form one Symmetrix Logical Volume located on separate
physical drives
Data protection options are configured at the volume level and the same Symmetrix can employ a
variety of protection schemes.
Dynamic Sparing: Disks in the back-end of the Array which are reserved for use when a physical disk
fails. When a physical disk fails, the dynamic spare is used as a replacement.
SRDF is a remote replication solution and is discussed later on in the Business Continuity section of
this course.
Q: Architecture Exercise
Identify the components of a data storage environment:
(Diagrams: a data storage environment with components labeled A–F, and a write data flow with
operations labeled A–F.)
1. Fill in the letter in the diagram that corresponds to the appropriate operation. Hint: Not all of the
operations are used.
___ Host sends data to storage system
___ Data is written to physical disk some time later
___ Data is written to cache
___ Data is written to physical disk immediately
___ An acknowledgement is sent to the host
___ Data is returned to the host
___ Data is sent to back end
___ Back end receives status of write operation
2. List the operations in the correct order.
(Diagram: a read data flow – cache hit – with operations labeled A–D.)
Fill in the letter in the diagram that corresponds to the appropriate operation. Hint: Not all of the
operations are used.
___ Host sends read request to storage system
___ Data is read from physical disk when requested by the LRU algorithm
___ Data is written to cache
___ Data is read from physical disk immediately
___ Read data is sent to the host
___ Status is returned to the host
___ Data is sent to back end
___ Back end receives status of read operation
___ Cache is searched, and data is found there
___ Cache is searched, and data is not found there
___ Data placed in cache by a previous read or write operation
(Diagram: a read data flow – cache miss – with operations labeled A–F.)
1. Fill in the letter in the diagram that corresponds to the appropriate operation. Hint: Not all of the
operations are used.
___ Host sends read request to storage system
___ Data is read from physical disk when requested by the LRU algorithm
___ Data is written to cache
___ Data is read from physical disk immediately
___ Read data is sent to the host
___ Status is returned to the host
___ Data is sent to back end
___ Back end receives status of read operation
___ Cache is searched, and data is found there
___ Cache is searched, and data is not found there
2. List the operations in the correct order.
Business Profile:
Acme Telecom is involved in mobile wireless services across the United States and has about 5000
employees worldwide. This company is Chicago based and has 7 regional offices across the country.
Although Acme is doing well financially, they continue to feel competitive pressure. As a result, the
company needs to ensure that the IT infrastructure takes advantage of fault tolerant features.
Current Situation/Issues:
• The company uses a number of different applications for communication, accounting, and
management. All the applications are hosted on individual servers with disks configured as RAID 0.
• All financial activity is managed and tracked by a single accounting application. It is very important
for the accounting data to be highly available.
• The application performs around 15% write operations, and the remaining 85% are reads.
• The accounting data is currently stored on a 5-disk RAID 0 set. Each disk has an advertised formatted
capacity of 200 GB, and the total size of their files is 730 GB.
• The company performs nightly backups and removes old information—so the amount of data is
unlikely to change much over the next 6 months.
The company is approaching the end of the financial year and the IT budget is depleted. Buying even one
new disk drive will not be possible.
How would you suggest that the company restructure their environment?
You will need to justify your choice based on cost, performance, and availability of the new solution.
Business Profile:
Acme Telecom is involved in mobile wireless services across the United States and has about 5000
employees worldwide. This company is Chicago based and has 7 regional offices across the country.
Although Acme is doing well financially, they continue to feel competitive pressure. As a result, the
company needs to ensure that the IT infrastructure takes advantage of fault tolerant features.
Current Situation/Issues:
• The company uses a number of different applications for communication, accounting, and
management. All the applications were hosted on individual servers with disks configured as RAID 0.
• The company changed the RAID level of their accounting application based on your
recommendations 6 months ago.
• It is now the beginning of a new financial year and the IT department has an increased budget. You
are called in to recommend changes to their database environment.
• You investigate their database environment closely, and observe that the data is stored on a 6-disk
RAID 0 set. Each disk has an advertised formatted capacity of 200 GB and the total size of their files
is 900 GB. The amount of data is likely to change by 30% over the next 6 months, and your solution
must accommodate this growth.
• The application performs around 40% write operations, and the remaining 60% are reads. The
average size of a read or write is small, at around 2 KB.
Section Summary
Key Points covered in this Section:
y Physical and logical components of a host
y Common connectivity components and protocols
y Features of intelligent disk storage systems
y Data flow between the host and the storage array
y Apply Your Knowledge
y Data Flow Exercise
y Case Studies
These are the key points covered in this section. Please take a moment to review them.
This concludes the training. Please proceed to the Course Completion slide to take the Assessment.