
IT Infrastructure Architecture
Infrastructure Building Blocks and Concepts

Storage – Part 1
(chapter 10)
Introduction

• Every day, approximately 200,000 petabytes of new information is generated worldwide
• The total amount of digital data doubles approximately every two years
History

• Early computers used a very basic persistent storage system, based on punched cards or paper tape
• Drum memory was one of the first magnetic read/write storage systems
 It was widely used in the 1950s and into the 1960s
 Consisted of a large rotating metal cylinder, coated on the outside with magnetic recording material
 Multiple rows of fixed read/write heads were placed along the drum, each head reading or writing to one track
 The drum could store 62 kB of data
History – Hard disks

• The first commercial digital disk storage device was the IBM 350, part of the IBM 305 RAMAC system, shipped in 1956
 Approximately 5 MB of data
 Fifty 61 cm diameter disks
 Weighed over a ton

• Over the years:
 The physical size of hard disks shrunk
 Magnetic density increased
 Rotation speed increased from 3,600 rpm to 15,000 rpm
 Seek times lowered as a result of using servo-controlled read/write heads instead of stepper motors
History – Tapes

• The IBM 726, introduced in 1952, was one of the first magnetic tape systems
 2 MB per 20-centimeter-diameter reel of tape

• Reel tapes were used until the late 1980s, mostly in mainframes
• In 1984, DEC introduced the Digital Linear Tape (DLT)
 Super DLT (SDLT) tape cartridges can store up to 300 GB of data

• Linear Tape Open (LTO) was originally developed in the late 1990s
 LTO version 7 was released in 2015 and can hold up to 6 TB of data
Storage building blocks
Storage model

• Most servers use external storage, sometimes combined with internal storage
• A model of storage building blocks is shown on the right
Disks – command sets

• Disks are connected to disk controllers using a command set, based on either ATA or SCSI
 Advanced Technology Attachment (ATA), also known as IDE, uses a relatively simple hardware and communication protocol to connect disks to computers (mostly PCs)
 Small Computer System Interface (SCSI) is a set of standards for physically connecting and transferring data between computers (mostly servers) and peripheral devices, like disks and tapes
 The SCSI command set is complex - there are about 60 different SCSI commands in total

• Serial interfaces replaced the parallel interfaces, but the disk commands are still the same
Mechanical hard disks

• Mechanical disks consist of:
 A sealed, dust-free case (not a vacuum - the heads fly on a cushion of air or helium)
 One or more spinning magnetic disks on one spindle
 A number of read/write heads that can move to reach each part of the spinning disks
Mechanical hard disks

• Serial ATA (SATA) disks
 Low-end, high-capacity disks
 Ideal for bulk storage applications (like archiving or backup)
 Have a low cost per gigabyte
 Often used in PCs and laptops
 Use the ATA command set, with SMART commands for monitoring the disk's health
Mechanical hard disks

• Serial Attached SCSI (SAS) disks
 Relatively expensive, high-end disks
 Spinning disk platters with a rotational speed of 10,000 or 15,000 rpm
 Typically have 25% of the capacity of SATA or NL-SAS disks
 Use the SCSI command set, which includes error recovery and error reporting and offers more functionality than the SMART commands used by SATA disks
Mechanical hard disks

• Near-Line SAS (NL-SAS) disks
 Have a SAS interface, but the mechanics of SATA disks
 Can be combined with faster SAS disks in one storage array
Solid State Drives (SSDs)

• SSD disks don’t have moving parts
• Based on flash technology
 Flash technology is semiconductor-based memory that preserves its information when powered off

• Connected using a standard SAS disk interface
• Data can be accessed much faster than with mechanical disks
 Microseconds vs. milliseconds

• Most storage vendors offer all-flash arrays – storage systems using only SSD disks
Solid State Drives (SSDs)

• SSDs consume less power, and therefore generate less heat, than mechanical disks
• They have no moving parts
• They generate no vibrations that could influence or harm other components, or shorten their lifetime
• Since 2020, SSD prices have been comparable to those of mechanical drives
• In the coming years, mechanical drives are expected to be used only as cheap, low-end storage for applications such as archiving
Solid State Drives (SSDs)

• Flash memory can only be rewritten a limited number of times
 SSD disks “wear out” more rapidly than mechanical disks
 SSDs keep track of the number of times a sector is rewritten, and remap heavily used sectors to spare sectors if they are about to wear out (sketched below)
 It is important to monitor the wear level of heavily used SSDs
 Replace them before they break

• NVMe drives can deliver hundreds of thousands of read/write operations and gigabytes of throughput per second
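
To make this wear-leveling bookkeeping concrete, here is a minimal, hypothetical Python sketch (the names and the rewrite threshold are illustrative assumptions, not any vendor's actual firmware): rewrites per sector are counted, and a sector nearing its limit is remapped to a spare.

    # Hypothetical wear-leveling bookkeeping, greatly simplified.
    # Real SSD firmware works on erase blocks and is far more sophisticated.
    WEAR_LIMIT = 3000                          # assumed rewrite limit (illustrative)

    class WearTracker:
        def __init__(self, sectors, spares):
            self.rewrites = [0] * sectors      # rewrite count per logical sector
            self.remap = {}                    # logical sector -> spare sector id
            self.free_spares = list(range(spares))

        def write(self, sector):
            self.rewrites[sector] += 1
            worn = self.rewrites[sector] >= WEAR_LIMIT
            if worn and sector not in self.remap and self.free_spares:
                self.remap[sector] = self.free_spares.pop()   # retire the sector
            return self.remap.get(sector, sector)             # where data lands

    ssd = WearTracker(sectors=100, spares=8)
    for _ in range(WEAR_LIMIT):
        ssd.write(5)                           # one heavily rewritten sector
    print(ssd.write(5))                        # now redirected to a spare

Watching the wear level in practice boils down to monitoring counters like these and the size of the remaining spare pool.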
Disk capacity - Kryder's law

• The density of information on hard drives doubles every 13 months
• In recent years we see a slight slowing of the curve

Please note that the vertical scale is logarithmic instead of linear
Disk capacity - Kryder's law

• The picture on the right shows 8 bytes of core memory and 8 GB of SD flash memory
 An increase of 1,000,000,000 times in 50 years

• To get the full benefit of Kryder's law, the storage infrastructure should be designed to handle just-in-time expansion
 Buy disks as late as possible! (see the sketch below)
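
A back-of-the-envelope sketch of why buying late pays off, assuming the 13-month doubling rate quoted above holds (it is a trend, not a guarantee):

    # Relative cost of the same capacity if purchased later, assuming
    # Kryder's law: density doubles every 13 months, so cost/GB roughly halves.
    def relative_cost_per_gb(months_from_now, doubling_months=13):
        """Cost per GB after a delay, relative to today's cost (today = 1.0)."""
        return 0.5 ** (months_from_now / doubling_months)

    for delay in (0, 13, 26, 39):
        print(f"after {delay:2d} months: {relative_cost_per_gb(delay):.2f}x today's cost/GB")
    # after  0 months: 1.00x   after 13 months: 0.50x
    # after 26 months: 0.25x   after 39 months: 0.12x (approx.)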
Tapes

• When storing large amounts of data, tape is the most inexpensive option
• Tapes are suitable for archiving
 Tape manufacturers guarantee a long life expectancy
 DLT, SDLT, and LTO Ultrium cartridges are guaranteed to be readable after 30 years on the shelf
Tapes - disadvantages

• Tapes are fragile
 Manual handling can lead to mechanical defects

• Tape cartridges contain mechanical parts
 Manually changed tapes get damaged easily

• Frequent rewinding causes stress to the tape substrate
 Leads to lower reliability of data reads

• Tapes are extremely slow
 They only write and read data sequentially
 When a particular piece of data is required, the tape must be read from the beginning until the required data is found
 Together with rewinding the tape (needed for ejecting it), tape handling is expressed in minutes instead of milliseconds or microseconds
Tapes

• (S)DLT and LTO are the most popular tape cartridge formats in use today
 LTO has a market share of more than 80%
 LTO-9 tape cartridges can store 18 TB of uncompressed data

• Tape throughput is in the 200 MB/s range
 The tape drive interface is capable of even higher speeds
 Most tape drives use 8 Gbit/s Fibre Channel interfaces, good for a sustained throughput of between 700 and 900 MB/s
Tape library

• Tape libraries can be used to automate tape handling
• A tape library is a storage device that contains:
 One or more tape drives
 A number of slots to hold tape cartridges
 A barcode or RFID tag reader to identify tape cartridges
 An automated method for loading tapes
Virtual tape library

• A Virtual Tape Library (VTL) uses disks for storing backups
• A VTL consists of:
 An appliance or server
 Software that emulates traditional tape devices and formats

• VTLs combine high-performance disk-based backup and restore with well-known backup applications, standards, processes, and policies
• Most current VTL solutions use NL-SAS or SATA disk arrays because of their relatively low cost
• They provide multiple virtual tape drives for handling multiple tapes in parallel
Controllers

• Controllers connect disks and/or tapes to a server, in one of two ways:
 Implemented as PCI expansion boards in the server
 Part of a NAS or SAN deployment, where they connect all available disks and tapes to redundant Fibre Channel, iSCSI, or FCoE connections
Controllers

• A controller can implement:
 High performance
 High availability
 Virtualized storage
 Cloning
 Data deduplication
 Thin provisioning
Controllers

• The controller splits up all disks in small pieces called physical extents
• From these physical extents, new virtual disks (Logical Unit Numbers - LUNs) are composed and presented to the operating system (sketched below)
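
A minimal, hypothetical Python sketch of this idea (the names and the extent size are illustrative assumptions, not any controller's actual implementation): physical disks are cut into fixed-size extents, and a LUN is simply an ordered list of extents drawn from a shared pool.

    # Illustrative only: compose virtual disks (LUNs) from physical extents.
    EXTENT_MB = 4  # assumed extent size

    def carve_extents(disks_mb):
        """Cut each physical disk into (disk_id, extent_no) pieces."""
        pool = []
        for disk_id, size_mb in enumerate(disks_mb):
            pool += [(disk_id, n) for n in range(size_mb // EXTENT_MB)]
        return pool

    def create_lun(pool, size_mb):
        """A LUN is an ordered list of extents taken from the free pool."""
        needed = size_mb // EXTENT_MB
        if needed > len(pool):
            raise ValueError("not enough free extents")
        return [pool.pop(0) for _ in range(needed)]

    pool = carve_extents([100, 100, 200])   # three physical disks, sizes in MB
    lun0 = create_lun(pool, 40)             # a 40 MB LUN = 10 extents
    print(len(lun0), "extents; first extent:", lun0[0])

Because a LUN is just a list of extents, it can span disks of different sizes and can grow later by appending more extents from the pool.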
RAID

• Redundant Array of Independent Disks (RAID) solutions provide:
 High availability of data
 Improved performance

• RAID uses multiple redundant disks
• RAID can be implemented:
 In the disk controller’s hardware
 As software running in a server’s operating system
RAID

• RAID can be implemented in several configurations, called RAID levels
• In practice, five RAID levels are implemented most often (usable capacities are compared in the sketch below):
 RAID 0 - Striping
 RAID 1 - Mirroring
 RAID 10 - Striping and Mirroring
 RAID 5 - Striping with distributed parity
 RAID 6 - Striping with distributed double parity
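
As a quick comparison of the capacity trade-offs, a small sketch using the standard usable-capacity formulas (disk count and size are assumed example inputs):

    # Usable capacity formulas for the five common RAID levels.
    def usable_tb(level, n, disk_tb):
        """n = number of disks in the set, disk_tb = capacity per disk."""
        if level == "RAID 0":   return n * disk_tb          # no redundancy
        if level == "RAID 1":   return disk_tb              # two-disk mirror (n == 2)
        if level == "RAID 10":  return n * disk_tb // 2     # striped mirrors
        if level == "RAID 5":   return (n - 1) * disk_tb    # one disk's worth of parity
        if level == "RAID 6":   return (n - 2) * disk_tb    # two disks' worth of parity
        raise ValueError(level)

    print(usable_tb("RAID 5", n=8, disk_tb=4))   # 28 TB usable from 32 TB raw
    print(usable_tb("RAID 6", n=8, disk_tb=4))   # 24 TB usable from 32 TB raw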
RAID 0 - Striping

• RAID 0 is also known as striping
• Provides an easy and cheap way to increase performance
• Uses multiple disks, each with a part of the data on it
• RAID 0 actually lowers availability
 If one of the disks in a RAID 0 set fails, all data is lost

• Only acceptable if losing all data on the RAID set is no problem (for instance for temporary data)
RAID 1 - Mirroring

• RAID 1 is also known as mirroring
• A high availability solution that uses two disks containing the same data
• If one disk fails, data is not lost, as it is still available on the mirror disk
• The most reliable RAID level
• High price
 50% of the disks are used for redundancy only

• A spare physical disk can be configured to automatically take over the task of a failed disk
RAID 10 - Striping and mirroring

• RAID 10 uses a combination of striping and mirroring
• Provides high performance and availability
• Only 50% of the available disk space is usable
RAID 5 - Striping with distributed parity
• Data is written in disk blocks on all disks
• A parity block of the written disk blocks is stored as well
• This parity block is used to automatically reconstruct data in a RAID 5 set (using a spare disk) in case of a disk failure
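
The parity is plain bitwise XOR: the parity block is the XOR of all data blocks in a stripe, so any single lost block equals the XOR of the surviving blocks and the parity. A minimal Python sketch (block contents are illustrative):

    # RAID 5 parity via XOR, one stripe of three data blocks (illustrative).
    def xor_blocks(blocks):
        """Bytewise XOR of equal-length blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    stripe = [b"AAAA", b"BBBB", b"CCCC"]      # data blocks on disks 0..2
    parity = xor_blocks(stripe)               # parity block on disk 3

    # Disk 1 fails: rebuild its block from the survivors plus parity.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    assert rebuilt == stripe[1]               # the lost block is recovered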
RAID 6 - Striping with distributed double parity
• RAID 6 protects against double disk failures by using two distributed parity blocks instead of one
• Important in case a second disk fails during reconstruction of the first failing disk
Data compression

• Data on disk and tape is typically stored in a compressed format
• Allows 2 to 2.5 times the amount of data to be stored on the same media
• The degree of compression is never guaranteed - if the data is very diverse (or already compressed), the compression ratio will be correspondingly lower
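
This effect is easy to demonstrate with Python's standard zlib module: repetitive data compresses far beyond 2x, while random (high-entropy) data hardly compresses at all.

    import os, zlib

    repetitive = b"the same log line over and over\n" * 1000
    random_ish = os.urandom(len(repetitive))          # incompressible data

    for name, data in (("repetitive", repetitive), ("random", random_ish)):
        ratio = len(data) / len(zlib.compress(data))
        print(f"{name}: {ratio:.2f}x compression")
    # repetitive compresses far above 2x; random stays near (or below) 1.0x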
Data deduplication

• Data deduplication searches the storage system for duplicate data segments (disk blocks or files) and removes these duplicates
• Data deduplication is used for archived as well as production data
Data deduplication

• The deduplication system keeps a table of hash tags to quickly identify duplicate disk blocks (sketched below)
 The incoming data stream is segmented
 Hash tags are calculated for those segments
 The hashes are compared to hash tags of segments already on disk
 If an incoming data segment is identified as a duplicate, the segment is not stored again; instead, a pointer to the matching segment is created for it
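
A minimal sketch of this hash-table mechanism in Python (the segment size and hash algorithm are assumptions for illustration, not a specific product's design):

    import hashlib

    SEGMENT = 4096                  # assumed segment size in bytes

    store = {}                      # hash tag -> stored segment (the "disk")
    layout = []                     # data layout: list of hash tags (pointers)

    def write_stream(data):
        """Segment the stream; store only segments not seen before."""
        for i in range(0, len(data), SEGMENT):
            seg = data[i:i + SEGMENT]
            tag = hashlib.sha256(seg).hexdigest()
            if tag not in store:        # new segment: store it once
                store[tag] = seg
            layout.append(tag)          # duplicates become mere pointers

    write_stream(b"\x00" * SEGMENT * 100)   # 100 identical segments...
    print(len(layout), "segments written,", len(store), "stored")  # 100, 1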
Data deduplication

• Deduplication can be done inline or periodically
 Inline deduplication checks for duplicate data segments before data is written to disk
 Avoids duplicate data on disks at any time
 Introduces a relatively large performance penalty
Data deduplication
 Periodic deduplication writes data to disk first, and periodically checks if duplicate data exists
 Duplicate data is deduplicated by changing the duplicate into a pointer to existing data on disk, and freeing the disk space of the original block
 This process can be done at times when performance needs are low
 Duplicate data will be stored on the disks for some time
Cloning and snapshots

• With cloning and snapshotting, a copy of data is made at a specific point in time that can be used independently from the source data
• Usage:
 Creating a backup at a specific point in time, when the data is in a stable, consistent state
 Creating test sets of data, and an easy way to revert to older data without restoring from a backup

• Cloning: the storage system creates a full copy of a disk, much like a RAID 1 mirror disk
Cloning and snapshots

• Snapshot: represents a point in time of the data on the disks (sketched below)
 No writing to those disks is permitted anymore, as long as the snapshot is active
 All writing is done on a separate disk volume in the storage system
 The original disks still provide read access
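
A minimal sketch of this redirect-on-write idea in Python (illustrative data structures, not any array's actual implementation): after the snapshot, writes land in a separate delta volume; reads of the live volume check the delta first, while the snapshot keeps reading the frozen original.

    # Redirect-on-write snapshot, greatly simplified (illustrative).
    original = {0: b"alpha", 1: b"beta"}     # block number -> data, frozen at snapshot
    delta = {}                               # writes after the snapshot land here

    def write(block, data):
        delta[block] = data                  # original blocks are never overwritten

    def read_current(block):
        return delta.get(block, original[block])   # newest data wins

    def read_snapshot(block):
        return original[block]               # the point-in-time view

    write(1, b"gamma")
    print(read_current(1))    # b'gamma'  (live volume sees the new write)
    print(read_snapshot(1))   # b'beta'   (snapshot still shows the old data)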
Thin provisioning

• Thin provisioning enables the allocation of more storage capacity to users than is physically installed
 About 50% of allocated storage is never used

• Thin provisioning still provides the applications with the required storage (sketched below)
 Storage is not really available on physical disks
 Uses automated capacity management
 The application's real storage need is monitored closely
 Physical disk space is added when needed

• Typical use: providing users with large home directories or email storage
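
A minimal sketch, assuming a simple block map (illustrative names and sizes): each user is promised a large virtual size, but physical blocks are only claimed on first write.

    # Thin provisioning, greatly simplified: allocate physical blocks on first write.
    PHYSICAL_BLOCKS = 1000                   # what is actually installed

    class ThinVolume:
        def __init__(self, pool, virtual_blocks):
            self.pool = pool                 # shared free list of physical blocks
            self.virtual_blocks = virtual_blocks   # what the user was promised
            self.map = {}                    # virtual block -> physical block

        def write(self, vblock, data):
            if vblock not in self.map:       # first write: claim a physical block
                if not self.pool:
                    raise RuntimeError("pool exhausted - add physical disks!")
                self.map[vblock] = self.pool.pop()
            # ... data would be written to physical block self.map[vblock] ...

    pool = list(range(PHYSICAL_BLOCKS))
    # Promise each of 10 users 500 blocks (5,000 total) against 1,000 physical ones
    volumes = [ThinVolume(pool, virtual_blocks=500) for _ in range(10)]
    volumes[0].write(0, b"hello")
    print(len(pool), "physical blocks still free")   # 999

Automated capacity management then amounts to watching the free pool and adding physical disks before it runs out.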
Direct Attached Storage (DAS)

• DAS – also known as local disks – is a storage system where one or more dedicated disks connect via the SAS or SATA protocol to a built-in controller, connected to the rest of the computer using the PCI bus
• The controller provides a set of disk blocks to the computer, organized in LUNs (or partitions)
• The computer’s operating system uses these disk blocks to create a file system to store files
Storage Area Network (SAN)

• A Storage Area Network (SAN) is a specialized storage network that consists of SAN switches, controllers, and storage devices
• It connects a large pool of central storage to multiple servers
• A SAN physically connects servers to disk controllers using specialized networking technologies like Fibre Channel or iSCSI
• Via the SAN, disk controllers offer virtual disks, also known as LUNs (Logical Unit Numbers), to servers
• A LUN is only available to the server that has that specific LUN mounted
Storage Area Network (SAN)

• The core of the SAN is a set of SAN switches, called the Fabric
 Comparable to a LAN’s switched network segment

• Host bus adapters (HBAs) are interface cards implemented in servers
 Comparable to NICs used in networking
 Connected to SAN switches, usually in a redundant way
Storage Area Network (SAN)

• In SANs, a large number of disks are installed in one or more disk arrays
• The number of disks varies between dozens and hundreds of disks
• A disk array can easily contain several petabytes (PB) of data
• One petabyte is 1,000 terabytes, or one million gigabytes
SAN connectivity protocols

• The most used SAN connectivity protocols:
 Fibre Channel
 FCoE
 iSCSI

• All Fibre Channel devices are connected to Fibre Channel switches, a similar concept as in Ethernet implementations
Fibre Channel

• Fibre Channel (FC) is a dedicated layer 2 network protocol, specially developed for the transport of storage disk blocks
• Speeds: 1, 2, 4, 8, 16, 32, 64, and 128 Gbit/s
• Runs on:
 Twisted pair copper wire (i.e. UTP and STP)
 Fiber optic cables

• The protocol is very reliable, with guaranteed zero data loss
Fibre Channel

• Three network topologies:
 Point-to-Point
 Two devices are connected directly to each other
 Arbitrated loop
 Also known as FC-AL
 All devices are in a loop
 Switched fabric
 All devices are connected to Fibre Channel switches
 A similar concept as in Ethernet implementations

• Most implementations today use a switched fabric


FCoE

• Fibre Channel over Ethernet (FCoE) encapsulates Fibre Channel data in Ethernet packets
• Allows Fibre Channel traffic to be transported over 10 Gbit or faster Ethernet networks
• FCoE eliminates the need for separate Ethernet and Fibre Channel cabling and switching technology
• FCoE needs at least 10 Gbit Ethernet with special extensions, known as Data Center Bridging (DCB) or Converged Enhanced Ethernet (CEE)
FCoE

• Ethernet extensions:
 Lossless Ethernet connections
 An FCoE implementation must guarantee that no Ethernet packets are lost
 Quality of Service (QoS)
 Allows FCoE packets to have priority over other Ethernet packets, to avoid storage performance issues
 Large Maximum Transfer Unit (MTU) support
 Allows Ethernet packets of 2500 bytes in size, instead of the standard 1500 bytes
 Also known as Jumbo frames
FCoE

• FCoE needs specialized Converged Network Adapters (CNAs)
• CNAs support the Ethernet extensions
• They present themselves to the operating system as two adapters:
 An Ethernet Network Interface Controller (NIC)
 A Fibre Channel Host Bus Adapter (HBA)
iSCSI

• iSCSI allows the SCSI protocol to run over Ethernet LANs using TCP/IP
• Uses the familiar TCP/IP protocols and well-known SCSI commands
• Performance is typically lower than that of Fibre Channel, due to the TCP/IP overhead
• With 10 or 40 Gbit/s Ethernet and jumbo frames, iSCSI is now rapidly conquering a big part of the SAN market
