Module 1
1) Define data. Explain the factors that have contributed to the growth
of digital data.
Data is a collection of raw facts from which conclusions may be drawn.
Eg: a printed book, a family photograph, a movie on videotape, an e-mail message, an
e-book, a bitmapped image, or a digital movie are all examples of data.
Digital data
The following factors have contributed to the growth of digital data:
1. Increase in data processing capabilities: Modern-day computers provide a significant
increase in processing and storage capabilities. This enables the conversion of various
types of content and media from conventional forms to digital formats.
2. Lower cost of digital storage: Technological advances and the decreasing cost of
storage devices have provided low-cost solutions and encouraged the development of less
expensive data storage devices. This cost benefit has increased the rate at which data is
being generated and stored.
3. Affordable and faster communication technology: The rate of sharing digital data is
now much faster than traditional approaches. A handwritten letter may take a week to
reach its destination, whereas it only takes a few seconds for an e-mail message to reach its
recipient.
4. Proliferation of applications and smart devices: Smartphones, tablets, and newer
digital devices, along with smart applications, have significantly contributed to the
generation of digital content.
2) With a neat diagram, explain the evolution of storage architecture.
Evolution of Storage Architecture
Historically, organizations had centralized computers (mainframes) and information storage
devices (tape reels and disk packs) in their data center.
The evolution of open systems and the affordability and ease of deployment that they offer
made it possible for business units/departments to have their own servers and storage.
In earlier implementations of open systems, the storage was typically internal to the server.
This approach is referred to as server-centric storage architecture (see Fig 1.4 [a]).
In this server-centric storage architecture, each server has a limited number of storage
devices, and any administrative tasks, such as maintenance of the server or increasing storage
capacity, might result in unavailability of information.
The rapid increase in the number of departmental servers in an enterprise resulted in
unprotected, unmanaged, fragmented islands of information and increased capital and
operating expenses.
To overcome these challenges, storage evolved from server-centric to information-centric
architecture. In information-centric architecture, storage devices are managed centrally and
independently of servers.
These centrally-managed storage devices are shared with multiple servers.
When a new server is deployed in the environment, storage is assigned from the same
shared storage devices to that server.
The capacity of shared storage can be increased dynamically by adding more storage
devices without impacting information availability.
In this architecture, information management is easier and more cost-effective.
Storage technology and architecture continue to evolve, enabling organizations to
consolidate, protect, optimize, and leverage their data to achieve the highest return on
information assets.
3) List and explain key characteristics of data center elements.
Key characteristics of data center elements are:
1) Availability: All data center elements should be designed to ensure accessibility. The
inability of users to access data can have a significant negative impact on a business.
2) Security: Policies, procedures, and proper integration of the data center core elements
that prevent unauthorized access to information must be established. Specific
mechanisms must ensure that servers can access only their allocated resources on storage
arrays.
3) Scalability: Data center operations should be able to allocate additional processing
capabilities (eg: servers, new applications, and additional databases) or storage on
demand, without interrupting business operations. The storage solution should be able to grow with
the business.
4) Performance: All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed. The infrastructure should
be able to support performance requirements.
5) Data integrity: Data integrity refers to mechanisms such as error correction codes or
parity bits which ensure that data is written to disk exactly as it was received. Any
variation in data during its retrieval implies corruption, which may affect the operations
of the organization.
6) Capacity: Data center operations require adequate resources to store and process large
amounts of data efficiently. When capacity requirements increase, the data center must
be able to provide additional capacity without interrupting availability, or, at the very
least, with minimal disruption. Capacity may be managed by reallocation of existing
resources, rather than by adding new resources.
7) Manageability: A data center should perform all operations and activities in the most
efficient manner. Manageability can be achieved through automation and the reduction
of human (manual) intervention in common tasks.
Module 2
1) Define RAID. Explain the techniques that are used to implement
RAID levels.
RAID (originally Redundant Array of Inexpensive Disks) is the use of small-capacity,
inexpensive disk drives as an alternative to the large-capacity drives common on mainframe
computers.
Later, the "I" in RAID was redefined to stand for independent disks, reflecting advances in
storage technology.
RAID Techniques
There are three RAID techniques:
1. striping
2. mirroring
3. parity
Striping
Striping is a technique to spread data across multiple drives (more than one) to use the drives
in parallel.
All the read-write heads work simultaneously, allowing more data to be processed in a shorter
time and increasing performance, compared to reading and writing from a single disk.
Within each disk in a RAID set, a predefined number of contiguously addressable disk
blocks are defined as a strip.
The set of aligned strips that spans across all the disks within the RAID set is called a stripe.
The figure below shows physical and logical representations of a striped RAID set.
Strip size (also called stripe depth) describes the number of blocks in a strip and is the
maximum amount of data that can be written to or read from a single disk in the set.
All strips in a stripe have the same number of blocks.
Having a smaller strip size means that data is broken into smaller pieces while spread
across the disks.
Stripe size is the strip size multiplied by the number of data disks in the RAID set.
Eg: In a 5-disk striped RAID set with a strip size of 64 KB, the stripe size is 320 KB
(64 KB x 5).
Stripe width refers to the number of data strips in a stripe.
Striped RAID does not provide any data protection unless parity or mirroring is used.
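The strip and stripe arithmetic above can be sketched in Python. This is a minimal sketch, assuming blocks are numbered from 0 and strips are laid out round-robin starting at the first disk; the helper names `stripe_size_kb` and `locate_block` are hypothetical, and real controllers may use a different layout.

```python
# Sketch of RAID 0 striping arithmetic. Assumptions (not from the text):
# blocks are numbered from 0, and strips are laid out round-robin starting
# at disk 0. Function names are hypothetical.


def stripe_size_kb(strip_size_kb: int, data_disks: int) -> int:
    """Stripe size = strip size x number of data disks."""
    return strip_size_kb * data_disks


def locate_block(lba: int, blocks_per_strip: int, data_disks: int) -> tuple[int, int, int]:
    """Map a logical block address to (disk, stripe, offset within the strip)."""
    strip_index = lba // blocks_per_strip  # which strip, counting across all disks
    offset = lba % blocks_per_strip        # position of the block inside that strip
    disk = strip_index % data_disks        # consecutive strips go to consecutive disks
    stripe = strip_index // data_disks     # one full pass over the disks is one stripe
    return disk, stripe, offset


if __name__ == "__main__":
    print(stripe_size_kb(64, 5))   # 320, the 5-disk / 64 KB example
    print(locate_block(22, 4, 5))  # (0, 1, 2): strip 5 wraps back to disk 0
```

A larger `blocks_per_strip` keeps more consecutive blocks on the same disk before moving to the next one; a smaller value breaks the data into smaller pieces spread across the disks.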
Mirroring
Mirroring is a technique whereby the same data is stored on two different disk drives,
yielding two copies of the data.
If one disk drive fails, the data remains intact on the surviving disk drive (see Fig
below) and the controller continues to service the host's data requests from the surviving disk
of the mirrored pair.
When the failed disk is replaced with a new disk, the controller copies the data from the
surviving disk of the mirrored pair.
This activity is transparent to the host.
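The mirroring behaviour just described (write to both disks, read from the survivor, copy back on replacement) can be modelled with a small Python sketch. It is an illustration only, assuming each disk is a dictionary of block number to data; the class `MirroredPair` and its methods are hypothetical, not a real controller interface.

```python
from typing import Dict, List, Optional

# Sketch of a RAID 1 mirrored pair. Each disk is modelled as a dict of
# block number -> data, and None marks a failed disk. The class and method
# names are hypothetical, not a real controller API.


class MirroredPair:
    def __init__(self) -> None:
        self.disks: List[Optional[Dict[int, bytes]]] = [{}, {}]

    def write(self, block: int, data: bytes) -> None:
        # Each write is applied to every working disk, keeping two copies.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Reads are served by the first working disk, so a single failure
        # is transparent to the host.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise OSError("both disks of the mirrored pair have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        # Rebuild: copy all data from the surviving disk onto the new disk.
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)


if __name__ == "__main__":
    pair = MirroredPair()
    pair.write(0, b"A")
    pair.fail(0)
    print(pair.read(0))  # b'A' is still served by disk 1
    pair.replace(0)
    print(pair.disks[0] == pair.disks[1])  # True after the rebuild
```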
Advantages:
complete data redundancy
mirroring enables fast recovery from disk failure.
data protection
Mirroring is not a substitute for data backup. Mirroring constantly captures changes in the data,
whereas a backup captures point-in-time images of the data.
Disadvantages:
Mirroring involves duplication of data — the amount of storage capacity needed is
twice the amount of data being stored.
Expensive
Parity
Parity is a method to protect striped data from disk drive failure without the cost of
mirroring.
An additional disk drive is added to hold parity, a mathematical construct that allows re-
creation of the missing data.
Parity is a redundancy technique that ensures protection of data without maintaining a full
set of duplicate data.
Calculation of parity is a function of the RAID controller.
Parity information can be stored on separate, dedicated disk drives or distributed across all the
drives in a RAID set.
Fig shows a parity RAID set.
The first four disks, labeled “Data Disks,” contain the data. The fifth disk, labeled “Parity
Disk,” stores the parity information, which, in this case, is the sum of the elements in each
row.
Now, if one of the data disks fails, the missing value can be calculated by subtracting the sum
of the rest of the elements from the parity value.
Here, computation of parity is represented as an arithmetic sum of the data. In practice,
however, parity is calculated using a bitwise XOR operation.
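XOR-based parity for the four-data-disk, one-parity-disk set in the figure can be sketched as follows. This is a minimal illustration, assuming each strip is a `bytes` object of equal length; `xor_parity` and `rebuild` are hypothetical names. XOR plays the role that subtraction plays in the sum-based illustration: since x XOR x = 0, XOR-ing the parity with the surviving strips cancels them and leaves the missing strip.

```python
from functools import reduce

# Sketch of XOR parity for a 4 data disk + 1 parity disk set, as in the
# figure. Each strip is a bytes object of equal length; names are hypothetical.


def xor_parity(strips: list[bytes]) -> bytes:
    """Compute the parity strip as the byte-wise XOR of all given strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one missing strip: XOR the surviving strips with the parity."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x04\x08", b"\x10\x20", b"\x40\x80"]
    parity = xor_parity(data)
    print(parity.hex())  # 55aa
    # Simulate losing the third data disk and rebuilding it.
    print(rebuild([data[0], data[1], data[3]], parity) == data[2])  # True
```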
2) Explain nested RAID with a neat diagram.
Nested RAID
Most data centers require data redundancy and performance from their RAID arrays.
RAID 1+0 and RAID 0+1 combine the performance benefits of RAID 0 with the redundancy
benefits of RAID 1.
They use striping and mirroring techniques and combine their benefits.
These types of RAID require an even number of disks, the minimum being four (see
Fig).
RAID 1+0:
RAID 1+0 is also known as RAID 10 (Ten) or RAID 1/0.
RAID 1+0 performs well for workloads with small, random, write-intensive I/Os.
Some applications that benefit from RAID 1+0 include the following:
High transaction rate Online Transaction Processing (OLTP)
Large messaging installations
Database applications with write-intensive random access workloads
RAID 1+0 is also called striped mirror.
The basic element of RAID 1+0 is a mirrored pair, which means that data is first mirrored and
then both copies of the data are striped across multiple disk drive pairs in a RAID set.
When replacing a failed drive, only the mirror is rebuilt. The disk array controller uses the
surviving drive in the mirrored pair for data recovery and continuous operation.
Working of RAID 1+0:
Eg: consider an example of six disks forming a RAID 1+0 (RAID 1 first and then RAID 0)
set.
These six disks are paired into three sets of two disks, where each set acts as a RAID 1 set
(mirrored pair of disks). Data is then striped across all the three mirrored sets to form RAID 0.
Following are the steps performed in RAID 1+0 (see Fig 1.16 [a]):
Drives 1+2 = RAID 1 (Mirror Set A)
Drives 3+4 = RAID 1 (Mirror Set B)
Drives 5+6 = RAID 1 (Mirror Set C)
Now, RAID 0 striping is performed across sets A through C.
In this configuration, if drive 5 fails, then the mirror set C alone is affected. It still has drive 6
and continues to function and the entire RAID 1+0 array also keeps functioning.
Now, suppose drive 3 fails while drive 5 is being replaced. In this case the array still
continues to function because drive 3 is in a different mirror set.
So, in this configuration, up to three drives can fail without affecting the array, as long as they
are all in different mirror sets.
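The survival rule for the six-disk RAID 1+0 example can be expressed as a short check. This sketch assumes the drive numbering and mirror sets given above; `MIRROR_SETS` and `raid10_survives` are hypothetical names used only for illustration.

```python
# Sketch: does the six-disk RAID 1+0 example survive a set of failed drives?
# Drive numbering and mirror sets follow the text; names are hypothetical.

MIRROR_SETS = {"A": {1, 2}, "B": {3, 4}, "C": {5, 6}}


def raid10_survives(failed: set[int]) -> bool:
    # The RAID 0 layer needs every mirror set, and a mirror set works
    # as long as at least one of its two drives is still healthy.
    return all(not members <= failed for members in MIRROR_SETS.values())


if __name__ == "__main__":
    print(raid10_survives({5}))        # True: set C still has drive 6
    print(raid10_survives({5, 3}))     # True: different mirror sets
    print(raid10_survives({1, 3, 5}))  # True: one failure per set
    print(raid10_survives({5, 6}))     # False: set C has no healthy drive
```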
RAID 0+1:
RAID 0+1 is also called a mirrored stripe.
The basic element of RAID 0+1 is a stripe. This means that the process of striping data across
disk drives is performed initially, and then the entire stripe is mirrored.
In this configuration, if one drive fails, the entire stripe is faulted.
Working of RAID 0+1:
Eg: Consider the same example of six disks forming a RAID 0+1 (that is, RAID 0 first and
then RAID 1).
Here, six disks are paired into two sets of three disks each.
Each of these sets, in turn, acts as a RAID 0 set that contains three disks, and then these two
sets are mirrored to form RAID 1.
Following are the steps performed in RAID 0+1 (see Fig 1.16 [b]):
Drives 1 + 2 + 3 = RAID 0 (Stripe Set A)
Drives 4 + 5 + 6 = RAID 0 (Stripe Set B)
These two stripe sets are mirrored.
If one of the drives, say drive 3, fails, the entire stripe set A fails.
A rebuild operation copies the entire stripe, copying the data from each disk in the healthy
stripe to an equivalent disk in the failed stripe.
This causes increased and unnecessary I/O load on the surviving disks and makes the RAID
set more vulnerable to a second disk failure.
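The same kind of check for the six-disk RAID 0+1 example shows why it is more fragile: failing drives 3 and 5 (which RAID 1+0 tolerates) faults both stripe sets here. This is a sketch assuming the drive numbering and stripe sets given above; `STRIPE_SETS`, `raid01_survives`, and `rebuild_sources` are hypothetical names.

```python
# Sketch of the six-disk RAID 0+1 example. Drive numbering and stripe sets
# follow the text; function names are hypothetical.

STRIPE_SETS = {"A": {1, 2, 3}, "B": {4, 5, 6}}


def raid01_survives(failed: set[int]) -> bool:
    # A single failed drive faults its whole stripe set, so the array
    # survives only while some stripe set has no failed drive at all.
    return any(not (members & failed) for members in STRIPE_SETS.values())


def rebuild_sources(failed_drive: int) -> set[int]:
    # Rebuilding the faulted stripe copies every disk of the healthy stripe,
    # so all of its drives are read, not just one mirror partner.
    healthy = [members for members in STRIPE_SETS.values() if failed_drive not in members]
    if len(healthy) != 1:
        raise ValueError(f"drive {failed_drive} is not part of this array")
    return set(healthy[0])


if __name__ == "__main__":
    print(raid01_survives({3}))     # True: stripe set B is intact
    print(raid01_survives({3, 5}))  # False: both stripe sets are faulted
    print(rebuild_sources(3))       # {4, 5, 6}
```

The `rebuild_sources` result reflects the extra I/O load described above: all three drives of the healthy stripe are read during the rebuild.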