
MAHENDRA INSTITUTE OF TECHNOLOGY (AUTONOMOUS)

Mahendhirapuri, Mallasamudram, Namakkal - 637 503

DEPARTMENT OF B.E CSE (ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING)

QUESTION BANK

Academic Year: 2024-2025 (Odd Semester)

Year/Sem.: III/V
Course Code & Title: AI2219304 & STORAGE TECHNOLOGIES
Regulation: R2022

Prepared By Approved By

(M.R.NITHYAA, AP/AI&ML) Dr.N.SATHISH, HoD/IT

UNIT-I

STORAGE SYSTEMS

Introduction to Information Storage: Digital data and its types, Information storage, Key characteristics of
data centre and Evolution of computing platforms. Information Lifecycle Management. Data Centre
Environment: Building blocks of a data center, Compute systems and compute virtualization and
Software-defined data center.

Part-A ( Five Questions)


S.No Questions BTL CO
1 Define data with examples. K1 CO1
Data is a collection of raw facts from real-world entities. Data can come in
the form of text, observations, figures, images, numbers, graphs, or
symbols.
Examples
Handwritten letters, a printed book, a family photograph, a
movie on video tape, printed and duly signed copies of mortgage
papers, a bank’s ledgers, and an account holder’s passbooks are all
examples of data.
2 List the core functional elements of the data center. K1 CO1
Five core elements are essential for the basic functionality of a data center
 Application
 Database
 Server and operating system
 Network
 Storage array
3 What is structured data and unstructured data? K1 CO1
 Structured data is organized in rows and columns in a rigidly
defined format so that applications can retrieve and process it
efficiently. Structured data is typically stored using a database
management system.
 Data is unstructured if its elements cannot be stored in rows and
columns, and is therefore difficult to query and retrieve by business
applications.
4 What is data center virtualization? K1 CO1

 Data center virtualization is the transfer of physical data centers
into digital (i.e., virtual) data centers using a cloud software
platform, enabling companies to remotely access information
and applications.
 Data center virtualization is the process of creating a virtual
server, sometimes called a software-defined data center
(SDDC), from traditional, physical servers.

5 Identify the Implementation process of ILM. K2 CO1
 Classifying data and applications based on business rules and
policies to enable differentiated treatment of information
 Implementing policies by using information
management tools, starting from the creation of data
and ending with its disposal
 Managing the environment by using integrated tools
to reduce operational complexity
 Organizing storage resources in tiers to align the resources with
data classes, and storing information in the right type of
infrastructure.

Part-B(Three Questions) ( 13 Marks)

S.No Questions BTL CO

1 Explain data center core elements in detail. K1 CO1

 The data center infrastructure includes computers, storage systems, network


devices, dedicated power backups, and environmental controls
 Large organizations often maintain more than one data center to distribute data
processing workloads and provide backups in the event of a disaster.

CORE ELEMENTS:
Five core elements are essential for the basic functionality of a data center:
Application:
An application is a computer program that provides the logic for computing
operations. Applications, such as an order processing system, can be layered on the
database, which in turn uses operating system services to perform read/write
operations to storage devices.
Database:
More commonly, a database management system (DBMS) provides a structured way
to store data in logically organized tables that are interrelated. A DBMS optimizes the
storage and retrieval of data.
Server and operating system:
A computing platform that runs applications and databases.
Network:
A data path that facilitates communication between clients and servers or
between servers and storage.
Storage array:

A device that stores data persistently for subsequent use.

KEY REQUIREMENTS FOR DATA CENTER ELEMENTS

Availability:
All data center elements should be designed to ensure accessibility. The inability
of users to access data can have a significant negative impact on a business.

Security:
 Policies, procedures, and proper integration of the data center core elements
that will prevent unauthorized access to information must be established.
 In addition to the security measures for client access, specific
mechanisms must enable servers to access only their allocated resources on
storage arrays.
Scalability:
 Data center operations should be able to allocate additional processing
capabilities or storage on demand, without interrupting business operations.
 Business growth often requires deploying more servers, new applications, and
additional databases. The storage solution should be able to grow with the
business.
Performance:
 All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed.
 The infrastructure should be able to support performance requirements
Data integrity:
 Data integrity refers to mechanisms such as error correction codes or parity
bits which ensure that data is written to disk exactly as it was received.
 Any variation in data during its retrieval implies corruption, which may
affect the operations of the organization.
Capacity:

Data center operations require adequate resources to store and process large
amounts of data efficiently.
 When capacity requirements increase, the data center must be able to
provide additional capacity without interrupting availability, or, at the
very least, with minimal disruption.
Manageability:
 A data center should perform all operations and activities most
efficiently.
 Manageability can be achieved through automation and the reduction of
human (manual) intervention in common tasks.

MANAGING STORAGE INFRASTRUCTURE:
Managing a modern, complex data center involves many tasks.
Monitoring is the continuous collection of information and the review of the
entire data center infrastructure.
Reporting is done periodically on resource performance, capacity, and utilization.
Reporting tasks help to establish business justifications and chargeback of
costs associated with data center operations.
Provisioning is the process of providing the hardware, software, and other resources
needed to run a data center.
 Capacity planning ensures that the user’s and the application’s future
needs will be addressed in the most cost-effective and controlled
manner.
 Resource planning is the process of evaluating and identifying
required resources, such as personnel, the facility (site), and the
technology. Resource planning ensures that adequate resources are
available to meet user and application requirements.
2 Elaborate on the Evolution of storage technology and architecture in detail. K1 CO1
Organizations originally had centralized computers (mainframes) and information
storage devices (tape reels and disk packs) in their data centers.

• The evolution of open systems and the affordability and ease of


deployment that they offer made it possible for business
units/departments to have their own servers and storage.
• In earlier implementations of open systems, the storage was typically
internal to the server.
• The proliferation of departmental servers in an enterprise resulted in
unprotected, unmanaged, fragmented islands of information and
increased operating costs.
• Originally, there were very limited policies and processes for
managing these servers and the data created.
To overcome these challenges, storage technology evolved from non-
intelligent internal storage to intelligent networked storage.

Redundant Array of Independent Disks (RAID):


This technology was developed to address the cost, performance, and
availability requirements of data. It continues to evolve today and is used in all

storage architectures such as DAS, SAN, and so on.
Direct-attached storage (DAS):
This type of storage connects directly to a server (host) or a group of servers in a
cluster. Storage can be either internal or external to the server. External DAS
alleviated the challenges of limited internal storage capacity.

Storage area network (SAN):


This is a dedicated, high-performance Fibre Channel (FC) network to facilitate
block-level communication between servers and storage.
Storage is partitioned and assigned to a server for accessing its data.
SAN offers scalability, availability, performance, and cost benefits compared
to DAS.
Network-attached storage (NAS): This is dedicated storage for file-serving
applications. Unlike a SAN, it connects to an existing communication network
(LAN) and provides file access to heterogeneous clients. Because it is
purposely built to provide storage to file server applications, it offers higher
scalability, availability, performance, and cost benefits compared to general-
purpose file servers.
Internet Protocol SAN (IP-SAN): One of the latest evolutions in storage
architecture, IP-SAN is a convergence of technologies used in SAN and NAS.
IP-SAN provides block-level communication across a local or wide area
network (LAN or WAN), resulting in greater consolidation and availability of
data.

3 Describe in detail about the Information Lifecycle Management system. K2 CO1

INFORMATION LIFECYCLE MANAGEMENT:


Information lifecycle management (ILM) is a proactive strategy that enables an
IT organization to effectively manage the data throughout its lifecycle, based
on predefined business policies.

An ILM strategy should include the following characteristics:

Business-centric: It should be integrated with key processes, applications, and


initiatives of the business to meet both current and future growth in
information.
Centrally managed: All the information assets of a business should be under
the purview of the ILM strategy.
Policy-based: The implementation of ILM should not be restricted to a few
departments. ILM should be implemented as a policy and encompass all
business applications, processes, and resources.
Heterogeneous: An ILM strategy should take into account all types of storage
platforms and operating systems.
Optimized: Because the value of information varies, an ILM strategy should
consider the different storage requirements and allocate storage resources based on
the information’s value to the business.
ILM IMPLEMENTATION:
The process of developing an ILM strategy includes four activities— classifying,

implementing, managing, and organizing:

Classifying data and applications on the basis of business rules and policies to
enable differentiated treatment of information
■ Implementing policies by using information management tools, starting from

the creation of data and ending with its disposal


■ Managing the environment by using integrated tools to reduce operational

complexity
■ Organizing storage resources in tiers to align the resources with data classes,

and storing information in the right type of infrastructure based on the


information’s current value.

Step 1 :
The goal is to implement a storage networking environment. Storage architectures
offer varying levels of protection and performance and this acts as a foundation for
future policy-based information management in Steps 2 and 3.
Step 2:
Takes ILM to the next level, with detailed application or data classification and linkage
of the storage infrastructure to business policies.
This classification and the resultant policies can be automatically executed using tools
for one or more applications, resulting in better management and optimal allocation
of storage resources.
Step 3 : The implementation is to automate more of the applications or
data classification and policy management activities in order to scale
to a wider set of enterprise applications.
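An illustrative sketch (Python), not part of the syllabus text: it shows the classification-and-tiering idea behind the steps above, assuming hypothetical tier names, retention thresholds, and a business-critical flag; a real ILM tool would apply far richer, centrally managed policies.

from datetime import date, timedelta

# Hypothetical policy table: newer or more valuable data sits on faster tiers,
# aged data migrates down and is eventually disposed of.
POLICIES = [
    {"max_age_days": 90,   "tier": "Tier 1 (high performance)"},
    {"max_age_days": 365,  "tier": "Tier 2 (capacity)"},
    {"max_age_days": 2555, "tier": "Tier 3 (archive)"},
]

def classify(created: date, business_critical: bool) -> str:
    """Return the storage tier for a piece of information based on the policy table."""
    if business_critical:
        return "Tier 1 (high performance)"   # policy override for critical data
    age_days = (date.today() - created).days
    for policy in POLICIES:
        if age_days <= policy["max_age_days"]:
            return policy["tier"]
    return "Dispose"                          # past retention: eligible for disposal

print(classify(date.today() - timedelta(days=30), business_critical=False))    # Tier 1
print(classify(date.today() - timedelta(days=200), business_critical=False))   # Tier 2
print(classify(date.today() - timedelta(days=4000), business_critical=False))  # Dispose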

Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO

1 i) Summarize the idea of a “Data Centre Environment”. (8) K2 CO1
ii) Discuss the benefits and the key components of the software-defined data center. (7) K2 CO1
A data center is a facility that provides shared access to applications and data using
a complex network, compute, and storage infrastructure.
EVOLUTION OF THE DATA CENTER TO THE CLOUD
 The fact that virtual cloud DC can be provisioned or scaled down with only
a few clicks is a major reason for shifting to the cloud. In modern data
centers, software-defined networking (SDN) manages the traffic flows via
software.
 Infrastructure as a Service (IaaS) offerings, hosted on private and public
clouds, spin up whole systems on-demand.

TYPES OF DATA CENTERS:


Enterprise data centers are typically constructed and used by a single organization for
their own internal purposes. These are common among tech giants.

Colocation data centers function as a kind of rental property where the space and
resources of a data center are made available to the people willing to rent it.

Managed service data centers offer aspects such as data storage, computing, and other
services as a third party, serving customers directly.

Cloud data centers are distributed and are sometimes offered to customers with the help
of a third-party managed service provider.
BUILDING BLOCKS OF A DATA CENTER :
Data centers are made up of three primary types of components:

Compute, storage, and network.

Apart from these primary components, support infrastructure is essential to meeting the
service-level agreements of an enterprise data center.

Data Center Computing


• Servers are the engines of the data center. On servers, the processing and
memory used to run applications may be physical, virtualized, distributed
across containers, or distributed among remote nodes in an edge computing

model.
• Data centers must use processors that are best suited for the task, e.g. general-
purpose CPUs may not be the best choice to solve artificial intelligence (AI) and
machine learning (ML) problems.
Data Center Storage
 Data centers host large quantities of sensitive information, both for their
own purposes and the needs of their customers. Decreasing costs of storage
media increase the amount of storage available for backing up the data
either locally, remotely, or both.
 Advancements in non-volatile storage media lower data access times.
 As with other software-defined technologies, software-defined
storage increases staff efficiency in managing a storage
system.
Data Center Networks
 Datacenter network equipment includes cabling, switches, routers, and
firewalls that connect servers to each other and to the outside world. Properly configured
and structured, they can manage high volumes of traffic without
compromising performance.
 A typical three-tier network topology is made up of core switches
connecting the data center to the Internet and a middle aggregation layer that
connects the core layer to the access layer where the servers reside.
 Advancements, such as hyper-scale network security and software-defined
networking, bring cloud-level agility and scalability to on-premises networks.

ii)SOFTWARE - DEFINED DATA CENTER (SDDC)


 A traditional data center is a facility where organizational data, applications,
networks, and infrastructure are centrally housed and accessed.
 It is the hub for IT operations and physical infrastructure equipment, including
servers, storage devices, network equipment, and security devices.
Traditional data centers can be hosted:
• On-premise
• With a managed service provider (MSP)

• In the cloud

A software-defined data center (SDDC) is an IT-as-a-Service (ITaaS) platform that services
an organization’s software, infrastructure, or platform needs.
An SDDC can be housed on-premise, at an MSP, and in private, public, or hosted
clouds.
Like traditional data centers, SDDCs also host servers, storage devices, network
equipment, and security devices. You can manage SDDCs from any location, using
remote APIs and Web browser interfaces. SDDCs also make extensive use of
automation capabilities to:
• Reduce IT resource usage
• Provide automated deployment and management for many core
functions

KEY COMPONENTS OF SDDC

• Compute virtualization, where virtual machines (VMs)—including


their operating systems, CPUs, memory, and software—reside on cloud
servers. Compute virtualization allows users to create software
implementations of computers that can be spun up or spun down as
needed, decreasing provisioning time.
• Network virtualization, where the network infrastructure servicing
your VMs can be provisioned without worrying about the underlying
hardware. Network infrastructure needs—telecommunications,
firewalls, subnets, routing, administration, DNS, etc.—are configured
inside your cloud SDDC on the vendor’s abstracted hardware. No network
hardware assembly is required.
• Storage virtualization, where disk storage is provisioned from the
SDDC vendor’s storage pool. You get to choose your storage types, based
on your needs and costs. You can quickly add storage to a VM when
needed.
• Management and automation software. SDDCs use management and
automation software to keep business-critical functions working around

the clock, reducing the need for IT manpower. Remote management and
automation is delivered via a software platform accessible from any
suitable location, via APIs or Web browser access.

Benefits of SDDCs
Business agility
An SDDC offers several benefits that improve business agility with a focus on three key
areas:
• Balance
• Flexibility
• Adaptability

Reduced cost
• In general, it costs less to operate an SDDC than housing data in brick-and-mortar
data centers.
• Cloud SDDCs operate similarly to SaaS platforms that charge a recurring monthly
cost.
• This is usually an affordable rate, making an SDDC accessible to all types of
businesses, even those who may not have a big budget for technology
spending.

Increased scalability
By design, cloud SDDCs can easily expand along with your business. Increasing
your storage space or adding functions is usually as easy as contacting the data
facility to get a revised monthly service quote.

UNIT-II

Components of an intelligent storage system; components, addressing, and performance of hard disk drives
and solid-state drives; RAID; types of intelligent storage systems; scale-up and scale-out storage
architecture.

Part A ( Five Questions)


S.No Questions BTL CO
1 What is an intelligent storage system? K1 CO2

 Intelligent storage is a storage system or service that uses AI to


continuously learn and adapt to its hybrid cloud environment to
better manage and serve data.
 It can be deployed as hardware on-premises, as a virtual appliance,
or as a cloud service. It also features RAID arrays that provide
highly optimized I/O processing capabilities.

2 Define command Queueing. K1 CO2


Command queuing is a technique implemented on front-end controllers. It
determines the execution order of received commands and can reduce
unnecessary drive head movements and improve disk performance. When a
command is received for execution, the command queuing algorithms
assign a tag that defines a sequence in which the commands can be
executed.

3 List out the RAID levels. K1 CO2

 RAID 0- Striped array with no fault tolerance


 RAID 1- Disk mirroring
 RAID 3- Parallel access array with dedicated parity disk
 RAID 4- Striped array with independent disks and a dedicated
parity disk
 RAID 5- Striped array with independent disks and distributed parity
 RAID 6- Striped array with independent disks and dual distributed
parity
 Nested- Combinations of RAID levels. Example: RAID 1 + RAID 0

4 Write about cache mirroring and cache vaulting. K1 CO2


Cache mirroring:
Each write to cache is held in two different memory locations on
two independent memory cards. In the event of a cache failure, the
write data will still be safe in the mirrored location and can be
committed to the disk.
Cache vaulting:
The cache is exposed to the risk of uncommitted data loss due to
power failure. This problem can be addressed in various ways:
powering the memory with a battery until AC power is restored or
using battery power to write the cache content to the disk.
5 Define scale-up and scale-out storage architecture. K1 CO2
In a scale-up storage architecture, storage drives are added to increase
storage capacity and performance.
A scale-out storage architecture uses software-defined storage (SDS) to
separate the storage hardware from the storage software, letting the
software act as the controller.
Part-B(Three Questions) ( 13 Marks)

S.No Questions BTL CO

1 Explain key components of an intelligent storage system. K2 CO2

An intelligent storage system consists of four key components: front end,


cache, back end, and physical disks.

An I/O request from the host at the front-end port is processed through a cache
and the back end, to enable storage and retrieval of data from the physical
disk. A read request can be serviced directly from the cache if the requested
data is found in the cache.

[Figure: Components of an intelligent storage system - the host connects through the storage network to the front-end ports; I/Os pass through the cache and back-end ports to the physical disks]

FRONT END
• The front end provides the interface between the storage system and the
host. It consists of two components: front-end ports and front-end
controllers.
• The front-end ports enable hosts to connect to the intelligent storage system.
Each
front-end port has processing logic that executes the appropriate transport
protocol, such as SCSI, Fibre Channel, or iSCSI, for storage connections.
 Front-end controllers route data to and from the cache via the internal data bus.
 When the cache receives write data, the controller sends an acknowledgment
message back to the host. Controllers optimize I/O processing by using command
queuing algorithms.

Front-End Command Queuing

• Command queuing is a technique implemented on front-end controllers. It


determines the execution order of received commands and can reduce
unnecessary drive head movements and improve disk performance.
• When a command is received for execution, the command queuing
algorithms assign a tag that defines a sequence in which commands
should be executed.
• With command queuing, multiple commands can be executed
concurrently based on the organization of data on the disk, regardless of
the order in which the commands were received.
The most commonly used command queuing algorithms are as follows:
First In First Out (FIFO): This is the default algorithm where commands are
executed in the order in which they are received. There is no reordering of
requests for optimization; therefore, it is inefficient in terms of performance.
Seek Time Optimization: Commands are executed based on optimizing
read/write head movements, which may result in a reordering of commands.
Access Time Optimization: Commands are executed based on the
combination of seek time optimization and an analysis of rotational latency
for optimal performance.
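An illustrative sketch (Python), assuming a simplified disk model in which the only cost is head travel between cylinder numbers; it contrasts FIFO with a greedy seek-time optimization. Rotational latency is ignored, so access-time optimization is not modelled.

def head_travel(start, order):
    """Total number of cylinders the head crosses when servicing requests in this order."""
    travel, pos = 0, start
    for cyl in order:
        travel += abs(cyl - pos)
        pos = cyl
    return travel

def seek_optimized(start, pending):
    """Greedy seek-time optimization: always service the nearest queued request next."""
    order, pos, queue = [], start, list(pending)
    while queue:
        nearest = min(queue, key=lambda c: abs(c - pos))
        queue.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order

requests = [98, 183, 37, 122, 14, 124, 65, 67]   # queued cylinder numbers (hypothetical)
start = 53
print("FIFO order:", requests, "travel =", head_travel(start, requests))      # 640 cylinders
optimized = seek_optimized(start, requests)
print("Optimized :", optimized, "travel =", head_travel(start, optimized))    # 236 cylinders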

CACHE
• The cache is semiconductor memory where data is placed temporarily to reduce the
time required to service I/O requests from the host.
• Accessing data from the cache takes less than a millisecond. Write data is placed
in the cache and then written to disk. After the data is securely placed in the cache,
the host is acknowledged immediately.

Structure of Cache:
✓ The cache is organized into pages or slots, which are the smallest unit of cache
allocation.

 The size of a cache page is configured according to the application I/O size. The
cache consists of the data store and tag RAM.

 The data store holds the data while tag RAM tracks the location of the data in
the data store and disk.

 Entries in tag RAM indicate where data is found in cache and where the data
belongs on the disk. Tag RAM includes a dirty bit flag, which indicates whether
the data in cache has been committed to the disk or not.

 It also contains time-based information, such as the time of last access, which
is used to identify cached information that has not been accessed for a long
period and may be freed up.

Cache Implementation

The cache can be implemented as either a dedicated cache or a global cache. With a
dedicated cache, separate sets of memory locations are reserved for reads and writes.
In the global cache, both reads and writes can use any of the available memory
addresses. Cache management is more efficient in a global cache implementation, as
only one global set of addresses has to be managed.

BACK END:
• The back end provides an interface between cache and the physical
disks. It consists of two components: back-end ports and back-end
controllers.
• The back end controls data transfers between cache and the physical disks.
From cache,
data is sent to the back end and then routed to the destination disk. Physical
disks are connected to ports on the back end.
• The back-end controller communicates with the disks when performing
reads and writes
and also provides additional, but limited, temporary data storage.

PHYSICAL DISK:

 A physical disk stores data persistently.


 Disks are connected to the back end with either SCSI or a Fibre

Channel interface.
 An intelligent storage system enables the use of a mixture of SCSI or Fibre
Channel drives and IDE/ATA drives.
Logical Unit Number

 Physical drives or groups of RAID-protected drives can be logically split into


volumes known as logical volumes, commonly referred to as Logical Unit
Numbers (LUNs).

 The use of LUNs improves disk utilization.

 For example, without the use of LUNs, a host requiring only 200 GB could be
allocated an entire 1TB physical disk. Using LUNs, only the required 200 GB
would be allocated to the host, allowing the remaining 800 GB to be allocated to
other hosts.

 LUNs 0 and 1 are presented to hosts 1 and 2, respectively, as physical volumes


for storing and retrieving data. The usable capacity of the physical volumes is
determined by the RAID type of the RAID set.

 The capacity of a LUN can be expanded by aggregating other LUNs with it.
The result of this aggregation is a larger capacity LUN, known as a meta-
LUN. The mapping of LUNs to their physical location on the drives is
managed by the operating environment of an intelligent storage system.
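A small sketch (Python) of the 200 GB example above, assuming a hypothetical RaidSet class that carves LUNs out of 1 TB of usable capacity; in practice this mapping is handled by the operating environment of the storage system, not by application code.

class RaidSet:
    """Toy model of a RAID set that can be carved into LUNs (capacities in GB)."""
    def __init__(self, usable_gb):
        self.usable_gb = usable_gb
        self.luns = {}                      # lun_id -> size in GB

    def free_gb(self):
        return self.usable_gb - sum(self.luns.values())

    def create_lun(self, lun_id, size_gb):
        if size_gb > self.free_gb():
            raise ValueError("not enough free capacity in the RAID set")
        self.luns[lun_id] = size_gb

rs = RaidSet(usable_gb=1000)    # roughly 1 TB of usable capacity
rs.create_lun(0, 200)           # LUN 0: 200 GB presented to host 1
rs.create_lun(1, 300)           # LUN 1: 300 GB presented to host 2
print("Capacity left for other hosts:", rs.free_gb(), "GB")   # 500 GB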

2 Discuss in detail about the Disk Drive Components. K2 CO2

 A disk drive uses a rapidly moving arm to read and write data across a flat platter
coated with magnetic particles. Data is transferred from the magnetic platter
through the R/W head to the computer.

 Several platters are assembled together with the R/W head and controller, most
commonly referred to as a hard disk drive (HDD).

Key components of a disk drive are platter, spindle, read/write head, actuator arm
assembly, and controller

PLATTER:

 A typical HDD consists of one or more flat circular disks called platters (Figure
2-3). The data is recorded on these platters in binary codes (0s and 1s).

 The set of rotating platters is sealed in a case, called a Head Disk Assembly
(HDA). A platter is a rigid, round disk coated with magnetic material on both

surfaces (top and bottom).

 The data is encoded by polarizing the magnetic area, or domains, of the disk
surface. Data can be written to or read from both surfaces of the platter.

 The number of platters and the storage capacity of each platter determine the
total capacity of the drive.

SPINDLE
✓ A spindle connects all the platters, as shown in Figure 2-3, and is
connected to a motor. The motor of the spindle rotates with a
constant speed.
✓ The disk platter spins at a speed of several thousands of
revolutions per minute (rpm). Disk drives have spindle speeds of
7,200 rpm, 10,000 rpm, or 15,000 rpm. Disks used on current
storage systems have a platter diameter of 3.5” (90 mm).
✓ When the platter spins at 15,000 rpm, the outer edge is moving at
around 25
percent of the speed of sound.

READ/WRITE HEAD

✓ Read/Write (R/W) heads, shown in Figure 2-4, read and write


data from or to a platter.
✓ Drives have two R/W heads per platter, one for each surface of the platter.
✓ The R/W head changes the magnetic polarization on the surface of the
platter when writing data. While reading data, this head detects
magnetic polarization on the surface of the platter.
✓ During reads and writes, the R/W head senses the magnetic
polarization and never touches the surface of the platter. When the
spindle is rotating, there is a microscopic air gap between the R/W
heads and the platters, known as the head flying height.
✓ This air gap is removed when the spindle stops rotating and the R/W
head rests on a special area on the platter near the spindle. This area is
called the landing zone. The landing zone is coated with a lubricant
to reduce friction between the head and the platter.
✓ The logic on the disk drive ensures that heads are moved to the landing
zone before they touch the surface. If the drive malfunctions and the
R/W head accidentally touches the surface of the platter outside the
landing zone, a head crash occurs.

ACTUATOR ARM ASSEMBLY:


The R/W heads are mounted on the actuator arm assembly, which positions the R/W
head at the location on the platter where the data needs to be written or read. The
R/W heads for all platters on a drive are attached to one actuator arm assembly and
move across the platters simultaneously.
CONTROLLER:
The controller is a printed circuit board, mounted at the bottom of a disk
drive. It consists of a microprocessor, internal memory, circuitry, and
firmware.
 The firmware controls power to the spindle motor and the speed of
the motor. It also manages communication between the drive and the
host.
 In addition, it controls the R/W operations by moving the actuator
arm and switching between different R/W heads and performing the
optimization of data access.

3 Describe the two types of RAID implementation and Array Components in detail. K2 CO2

RAID is a way of storing the same data in different places on multiple


hard disks or solid-state drives (SSDs) to protect data in the case of a
drive failure.

There are two types of RAID implementation, hardware and software.

Software RAID

✓ Software RAID uses host-based software to provide RAID functions.

✓ It is implemented at the operating-system level and does not use a


dedicated hardware controller to manage the RAID array.

✓ Software RAID implementations offer cost and simplicity benefits when


compared with hardware RAID. However, they have the following
limitations:

✓ Performance: Software RAID affects overall system performance.


This is due to the additional CPU cycles required to perform RAID
calculations.

✓ Supported features: Software RAID does not support all RAID levels.

✓ Operating system compatibility: Software RAID is tied to the host


operating system; hence, upgrades to software RAID or to the operating
system should be validated for compatibility. This leads to inflexibility in
the data processing environment.

Hardware RAID

✓ In hardware RAID implementations, a specialized hardware controller is


implemented either on the host or on the array. These implementations
vary in the way the storage array interacts with the host.

✓ Controller card RAID is host-based hardware RAID implementation in


which a specialized RAID controller is installed in the host and HDDs are
connected to it.

✓ The RAID Controller interacts with the hard disks using a PCI bus.
Manufacturers also integrate RAID controllers on motherboards. This
integration reduces the overall cost of the system, but does not provide
the flexibility required for high-end storage systems.

✓ The external RAID controller is an array-based hardware RAID. It acts as


an interface between the host and disks. It presents storage volumes to the
host, which manages the drives using the supported protocol. Key functions
of RAID controllers are:

Management and control of disk aggregations

■ Translation of I/O requests between logical disks and physical disks


■ Data regeneration in the event of disk failures

RAID Array Components


✓ RAID array is an enclosure that contains a number of HDDs and the
supporting hardware and software to implement RAID. HDDs inside a
RAID array are usually contained in smaller sub-enclosures.

✓ These sub-enclosures, or physical arrays, hold a fixed number of HDDs,


and may also include other supporting hardware, such as power supplies.
A subset of disks within a RAID array can be grouped to form logical
associations called logical arrays, also known as a RAID set or a RAID
group (see Figure 3-1).

✓ Logical arrays are comprised of logical volumes (LV). The operating


system recognizes the LVs as if they are physical HDDs managed by the
RAID controller.

✓ The number of HDDs in a logical array depends on the RAID level used.
Configurations could have a logical array with multiple physical arrays or
a physical array with multiple logical arrays.

Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO

1 i) Discuss the steps involved in the various RAID levels. (10) K2 CO2
ii) Explain the read and write operations performed in cache memory. (5) K2 CO2
i)RAID levels are defined based on striping, mirroring, and parity techniques. These
techniques determine the data availability and performance characteristics of an array.

RAID 0: Striping
• RAID 0, also known as a striped set or a striped volume, requires a minimum of
two disks. The disks are merged into a single large volume where data is stored
evenly across the number of disks in the array.
• This process is called disk striping and involves splitting data into blocks and
writing it simultaneously/sequentially on multiple disks. Therefore, RAID 0 is
generally implemented to improve speed and efficiency.

Advantages of RAID 0
• Cost-efficient and straightforward to implement.
• Increased read and write performance.
• No overhead (total capacity use).
Disadvantages of RAID 0
• Doesn't provide fault tolerance or redundancy.
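A minimal sketch (Python) of the striping idea, assuming a hypothetical 4-byte stripe unit and round-robin placement across the member disks; since there is no parity or mirroring, losing any one disk loses part of every file.

def stripe(data, n_disks, stripe_unit=4):
    """Split data into stripe units and distribute them round-robin across the disks."""
    disks = [[] for _ in range(n_disks)]
    chunks = [data[i:i + stripe_unit] for i in range(0, len(data), stripe_unit)]
    for i, chunk in enumerate(chunks):
        disks[i % n_disks].append(chunk)
    return disks

disks = stripe(b"ABCDEFGHIJKLMNOP", n_disks=2)
print(disks[0])   # [b'ABCD', b'IJKL']
print(disks[1])   # [b'EFGH', b'MNOP']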

RAID 1: Mirroring
✓ RAID 1 is an array consisting of at least two disks where the same data is stored on
each to ensure redundancy. The most common use of RAID 1 is setting up a
mirrored pair consisting of two disks in which the contents of the first disk is
mirrored in the second. This is why such a configuration is also called mirroring.

Advantages of RAID 1
• Increased read performance.
• Provides redundancy and fault tolerance.
• Simple to configure and easy to use.

Disadvantages of RAID 1
• Uses only half of the storage capacity.
• More expensive (needs twice as many drives).
• Requires powering down your computer to replace the failed drive.

Raid 2: Bit-Level Striping with Dedicated Hamming-Code Parity

It combines bit-level striping with error checking and information correction. This RAID
implementation requires two groups of disks – one for writing the data and another for
writing error correction codes. RAID 2 also requires a special controller for the
synchronized spinning of all disks.

Advantages of RAID 2
• Reliability.
• The ability to correct stored information.
Disadvantages of RAID 2
• Expensive.
• Difficult to implement.
• Requires entire disks for ECC.

Raid 3: Bit-Level Striping with Dedicated Parity


✓ This RAID implementation utilizes bit-level striping and a dedicated parity
disk. Because of this, it requires at least three drives, where two are used for
storing data strips, and one is used for parity.
✓ To allow synchronized spinning, RAID 3 also needs a special controller. Due
to its configuration and synchronized disk spinning, it achieves better
performance rates with sequential operations than random read/write
operations.

Advantages of RAID 3
• Good throughput when transferring large amounts of data.
• High efficiency with sequential operations.
• Disk failure resiliency.
Disadvantages of RAID 3
• Not suitable for transferring small files.
• Complex to implement.
• Difficult to set up as software RAID.

Raid 4: Block-Level Striping with Dedicated Parity


RAID 4 is another unpopular standard RAID level. It consists of block-level data striping
across two or more independent disks and a dedicated parity disk.

Advantages of RAID 4
• Fast read operations.

• Low storage overhead.

• Simultaneous I/O requests.

Disadvantages of RAID 4

• Bottlenecks that have a big effect on overall performance.


• Slow write operations.
• Redundancy is lost if the parity disk fails.

Raid 5: Striping with Parity


RAID 5 is considered the most secure and most common RAID implementation. It
combines striping and parity to provide a fast and reliable setup. Such a configuration gives
the user storage usability as with RAID 1 and the performance efficiency of RAID 0.

Parity bits are distributed evenly on all disks after each sequence of data has been saved.
Advantages of RAID 5
• High performance and capacity.
• Fast and reliable read speed.
• Tolerates single drive failure.
Disadvantages of RAID 5
• Longer rebuild time.
• Loses one drive's worth of capacity to parity.
• If more than one disk fails, data is lost.
• More complex to implement.
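A short sketch (Python) of the parity idea behind RAID 5 (shared in spirit with RAID 3 and 4), assuming one stripe of three data blocks: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt by XOR-ing the survivors. How parity is distributed across the disks is not modelled here.

def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

d0, d1, d2 = b"\x11\x22", b"\x33\x44", b"\x55\x66"   # data blocks of one stripe
parity = xor_blocks(d0, d1, d2)                      # parity block for the stripe

# Simulate losing d1: rebuild it from the surviving data blocks and the parity.
rebuilt = xor_blocks(d0, d2, parity)
assert rebuilt == d1
print("d1 rebuilt from parity:", rebuilt.hex())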

Raid 6: Striping with Double Parity


• RAID 6 is an array similar to RAID 5 with an addition of its double parity feature.
For this reason, it is also referred to as the double-parity RAID.
• Block-level striping with two parity blocks allows two disk failures before any
data is lost. This means that in an event where two disks fail, RAID can still
reconstruct the required data.

Advantages of RAID 6
• High fault and drive-failure tolerance.

• Storage efficiency (when more than four drives are used).

• Fast read operations.

Disadvantages of RAID 6
• Rebuild time can take up to 24 hours.

• Slow write performance.

• Complex to implement.

• More expensive.

Raid 10: Mirroring with Striping


• RAID 10 is part of a group called nested or hybrid RAID, which means it is a
combination of two different RAID levels. In the case of RAID 10, the array
combines level 1 mirroring and level 0 striping. This RAID array is also known
as RAID 1+0.
RAID 10 uses logical mirroring to write the same data on two or more drives to provide
redundancy. If one disk fails, there is a mirrored image of the data stored on another disk.

Advantages of RAID 10
• High performance.
• High fault-tolerance.
• Fast read and write operations.
• Fast rebuild time.
Disadvantages of RAID 10
• Limited scalability.
• Costly (compared to other RAID levels).
• Uses half of the disk space capacity.
• More complicated to set up.
ii) Read Operation with Cache

✓ When a host issues a read request, the front-end controller accesses the
tag RAM to determine whether the required data is available in the
cache.

✓ If the requested data is found in the cache, it is called a read cache hit
or read hit and data is sent directly to the host, without any disk
operation. This provides a fast response time to the host (about a
millisecond).

✓ If the requested data is not found in the cache, it is called a cache miss
and the data must be read from the disk.

✓ The back-end controller accesses the appropriate disk and retrieves the
requested data. Data is then placed in the cache and is finally sent to the
host through the front-end controller. Cache misses increase I/O
response time.

✓ A pre-fetch, or read-ahead, algorithm is used when read requests are
sequential. In a sequential read request, a contiguous set of associated
blocks is retrieved. Several other blocks that have not yet been
requested by the host can be read from the disk and placed into the
cache in advance.
✓ The intelligent storage system offers fixed and variable pre-fetch sizes.
✓ In fixed pre-fetch, the intelligent storage system pre-fetches a fixed
amount of data. It is most suitable when I/O sizes are uniform.
In variable pre-fetch, the storage system pre-fetches an amount of data in
multiples of the size of the host request.
✓ Read performance is measured in terms of the read hit ratio, or the
hit rate, usually expressed as a percentage.
This ratio is the number of read hits with respect to the total number of read requests. A
higher read-hit ratio improves the read performance.
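A minimal sketch (Python) of the read path and the read-hit ratio described above, using a hypothetical dictionary-backed cache; pre-fetch and page eviction are deliberately left out.

class ReadCache:
    """Toy read cache that counts hits and misses and reports the hit ratio."""
    def __init__(self):
        self.pages = {}                   # block address -> data
        self.hits = 0
        self.misses = 0

    def read(self, address, disk):
        if address in self.pages:         # read hit: serve directly from cache
            self.hits += 1
            return self.pages[address]
        self.misses += 1                  # read miss: fetch from disk, then cache it
        data = disk[address]
        self.pages[address] = data
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

disk = {addr: bytes([addr]) for addr in range(8)}
cache = ReadCache()
for addr in [0, 1, 0, 2, 1, 0]:           # repeated addresses become read hits
    cache.read(addr, disk)
print(f"Read hit ratio = {cache.hit_ratio():.0%}")   # 3 hits out of 6 reads = 50%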

Write Operation with Cache:

Write operations with cache provide performance advantages over writing directly
to disks. When an I/O is written to the cache and acknowledged, it is completed in
far less time (from the host’s perspective) than it would take to write directly to
disk
• Write-back cache: Data is placed in the cache and an acknowledgment is sent to
the host immediately. Later, data from several writes are committed (de-staged) to
the disk. Write response times are much faster, as the write operations are isolated
from the mechanical delays of the disk. However, uncommitted data is at risk of
loss in the event of cache failures.

• Write-through cache: Data is placed in the cache and immediately written to the

disk, and an acknowledgment is sent to the host. Because data is committed to disk
as it arrives, the risks of data loss are low but write response time is longer because
of the disk operations.

The cache can be bypassed under certain conditions, such as very large size write I/O.
In this implementation, if the size of an I/O request exceeds the pre-defined size, called
write aside size, writes are sent to the disk directly to reduce the impact of large writes
consuming a large cache area.
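A short sketch (Python) of the three write paths just described, assuming a hypothetical 64 KB write-aside threshold and simple in-memory lists standing in for the cache and the disk; de-staging and cache-failure handling are not modelled.

WRITE_ASIDE_SIZE = 64 * 1024              # hypothetical threshold in bytes

def handle_write(data, cache, disk, policy="write-back"):
    """Route a write through the cache or directly to disk, per the chosen policy."""
    if len(data) > WRITE_ASIDE_SIZE:      # very large I/O bypasses the cache (write-aside)
        disk.append(data)
        return "written directly to disk (write-aside)"
    if policy == "write-through":         # cache it and commit to disk before acknowledging
        cache.append(data)
        disk.append(data)
        return "acknowledged after disk write (write-through)"
    cache.append(data)                    # write-back: acknowledge now, de-stage later
    return "acknowledged from cache (write-back)"

cache, disk = [], []
print(handle_write(b"x" * 512, cache, disk))                    # write-back
print(handle_write(b"x" * 512, cache, disk, "write-through"))   # write-through
print(handle_write(b"x" * (128 * 1024), cache, disk))           # write-aside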

UNIT-III

STORAGE NETWORKING TECHNOLOGIES AND VIRTUALIZATION

Block-Based Storage System, File-Based Storage System, Object-Based and Unified Storage. Fibre Channel
SAN: Software-defined networking, FC SAN components and architecture, FC SAN topologies, link
aggregation, and zoning, Virtualization in FC SAN environment. Internet Protocol SAN: iSCSI protocol,
network components, and connectivity, Link aggregation, switch aggregation, and VLAN, FCIP protocol,
connectivity, and configuration.

Part-A ( Five Questions)

S.No Questions BTL CO

1 List the types of storage systems. K1 CO3


Different types of storage systems are as follows,
 Block-Based Storage System – Examples – SAN (Storage Area
Network), iSCSI, and local disks.
 File-Based Storage System – Examples – NTFS (New Technology
File System), FAT (File Allocation Table), EXT (Extended File
System).
 Object-Based Storage System – Examples – Google cloud storage,
Amazon Simple Storage Options.
 Unified Storage System – Examples – Dell EMC Unity XT All-Flash
Unified Storage and Dell EMC Unity XT Hybrid Unified Storage
2 State the connectivity of iSCSI protocol. K1 CO3
Native iSCSI connectivity - Native topologies do not have any FC components;
they perform all communication over IP. The initiators may be either directly
attached to targets or connected using standard IP routers and switches.

Bridged iSCSI connectivity - Bridged topologies enable the co-existence of FC


with IP by providing iSCSI-to-FC bridging functionality. For example, the initiators can
exist in an IP environment while the storage remains in an FC SAN.

3 What is meant by a file-based storage system? K1 CO3

 File storage, also called file-level or file-based storage, stores data in


a hierarchical structure. The data is saved in files and folders, and
presented to both the system storing it and the system retrieving it in
the same format.
 Data can be accessed using the Network File System (NFS) protocol
for Unix or Linux, or the Server Message Block (SMB) protocol for
Microsoft Windows.

4 Difference between Multimode fiber (MMF) cable and Single-mode fiber K1 CO3
(SMF).
Multimode fiber (MMF): An MMF cable carries multiple beams of light projected at
different angles simultaneously onto the core of the cable. In an MMF transmission,
the multiple light beams traveling inside the cable tend to disperse and collide. This
collision weakens the signal strength after it travels a certain distance, a process
known as modal dispersion.

Single-mode fiber (SMF): An SMF cable carries a single ray of light projected at the
center of the core. The small core and the single light wave help to limit modal
dispersion. Single mode provides minimum signal attenuation over a maximum
distance (up to 10 km).
5 Define Link aggregation. K1 CO3
 Link aggregation allows combining multiple Ethernet links into a single
logical link between two networked devices. Link aggregation is sometimes
called by other names, such as Ethernet bonding or Ethernet teaming.
 Link aggregation provides greater bandwidth between the devices
at each end of the aggregated link.
 Link Aggregation refers to the process of combining multiple physical
links into a bundle known as a Link Aggregation Group (LAG).
 This allows for increased bandwidth between nodes and enhances resilience
by enabling data transfer between links in the group.

Part-B(Three Questions) ( 13 Marks)

S.No Questions BTL CO

1 Explain in detail about Block-based Storage system K1 CO3


• Block storage is for flexible, fast access

• Block storage is a form of cloud storage that is used to store data,


often on storage area networks (SANs).

• Data is stored in blocks, with each block stored separately wherever it is
most efficient.

• Each block is assigned a unique address, which is then used by a


management application controlled by the server's operating system to
retrieve and compile data into files upon request.

• Block storage offers efficiency due to the way blocks can be


distributed across multiple systems and even configured to work with
different operating systems.

• This makes using block storage quite similar to storing data on a hard
drive within a server, except the data is stored in a remote location rather
than on local hardware.

Working of Block Storage:

• A block is a fixed-size amount of memory within storage media that’s


capable of storing a piece of data. The size of each block is determined
by the management system.

• The block size is generally too small to fit an entire piece of data, and
so the data for any particular file is broken up into numerous blocks for
storage.

• Each block is given a unique identifier without any higher-level


metadata; details such as data format, type, and ownership are not noted.

• The operating system allocates and distributes blocks across the


storage network to balance efficiency and functionality.

• When a file is requested, the management application uses addresses to


identify the necessary blocks and then compiles them into the complete
file for use.

• By enabling storage across multiple environments, block storage

separates data from the limitations of individual user environments. As a
result, data can be retrieved through any number of paths to maximize
efficiency, with high input/output operations per second (IOPS).
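A minimal sketch (Python) of the block-addressing idea described above, assuming a hypothetical fixed block size and a per-file list of block addresses; in a real deployment the file system or application on the server keeps this mapping, not the block device itself.

BLOCK_SIZE = 4   # bytes, deliberately tiny for the example

class BlockStore:
    """Toy block device: data lives in fixed-size blocks with unique addresses."""
    def __init__(self):
        self.blocks = {}            # block address -> block data
        self.next_address = 0

    def write_file(self, data):
        addresses = []
        for i in range(0, len(data), BLOCK_SIZE):
            self.blocks[self.next_address] = data[i:i + BLOCK_SIZE]
            addresses.append(self.next_address)
            self.next_address += 1
        return addresses            # the "file" is just this list of block addresses

    def read_file(self, addresses):
        return b"".join(self.blocks[a] for a in addresses)

store = BlockStore()
file_map = store.write_file(b"hello block storage")
print(file_map)                     # unique addresses of the blocks holding the file
print(store.read_file(file_map))    # blocks compiled back into the complete file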

Benefits of block storage

• High efficiency: Block storage’s high IOPS and low latency make it
ideal for applications that demand high performance.

• Compatibility: Block storage works across different operating systems


and file systems, making it compatible for enterprises whatever their
configuration and environment.

• Flexibility: With block storage, horizontal scaling is extremely flexible.


Cluster nodes can be added as needed, allowing for greater overall
storage capability.

• Large file efficiency: For large files, such as archives and video files, data
must be completely overwritten when using file or object storage; block storage
can update only the blocks that change.

Limitations of block storage

• Greater cost: While block storage is easily scalable, it can also be


expensive due to the cost of SANs. In addition, managing block storage
requires more-specialized training for management and maintenance,
increasing the overall expense.

• Performance limitations: With block storage, metadata is built in and


hierarchical, and it is defined by the file system. Because data is broken
up into blocks, searching for a complete file requires the proper
identification of all its pieces. This can create performance issues for
operations accessing the metadata, particularly with folders featuring a
large number of files.

Block storage use cases:

• Containers: Block storage supports the use of container platforms


such as Kubernetes, creating a block volume that enables persistent
storage for the entire container. This allows for the clean management
and migration of containers as needed.

• Email servers: Email servers can take advantage of block storage’s


flexibility and scalability. In fact, in the case of Microsoft Exchange,
block storage is required due to the lack of support for network-attached
storage.

• Databases: Block storage is fast, efficient, flexible, and scalable, with
support for redundant volumes. This allows it to support databases,
particularly those that handle a heavy volume of queries and where
latency must be minimized.

• Disaster recovery: Block storage can be a redundant backup solution


for nearline storage and quick restoration, with data swiftly moved from
backup to production through easy access.

Need for block storage :

• Block storage continues to be an efficient and flexible cloud storage


option for enterprises that require high-performance workloads or need to
manage large files.

2 Discuss the various FC topologies in detail. K1 CO3


• Fabric design follows standard topologies to connect devices.
• Core-edge fabric is one of the popular topology designs.

Core-Edge Fabric
The core-edge fabric topology has two types of switch tiers.

• The edge tier usually comprises switches and offers an inexpensive


approach to adding more hosts in a fabric.

• The tier at the edge fans out from the tier at the core. The nodes on
the edge can communicate with each other.

• The core tier usually comprises enterprise directors that ensure high
fabric availability.

• All traffic has to either traverse through or terminate at this tier. In a


two-tier configuration, all storage devices are connected to the core tier,
facilitating fan-out.

The host-to-storage traffic has to traverse one and two ISLs in a two-tier and
three-tier configuration, respectively.

Hosts used for mission-critical applications can be connected directly to the


core tier and consequently avoid traveling through the ISLs to process I/O
requests from these hosts.

The core-edge fabric topology increases connectivity within the SAN

while conserving overall port utilization. If expansion is required, an additional
edge switch can be connected to the core.

This topology can have different variations. In a single-core topology, all

hosts are connected to the edge tier and all storage is connected to the core tier.

A dual-core topology can be expanded to include more core switches.

However, to maintain the topology, it is essential that new ISLs are created to
connect each edge switch to the new core switch that is added.

Benefits and Limitations of Core-Edge Fabric


 The core-edge fabric provides one-hop storage access to all storage in
the system. Because traffic travels in a deterministic pattern, a core-
edge provides easier calculation of ISL loading and traffic patterns.

 Because each tier’s switch is used for either storage or hosts, one can easily
identify which resources are approaching their capacity, making it easier
to develop a set of rules for scaling and apportioning.

 Core-edge fabrics can be scaled to larger environments by linking core


switches, adding more core switches, or adding more edge switches.

 This method can be used to extend the existing simple core-edge


model or to expand the fabric into a compound or complex core-edge

model.

 The core-edge fabric may lead to some performance-related problems


because scaling a core-edge topology involves increasing the number of
ISLs in the fabric.

 The domain count in the fabric increases. A common best practice is to


keep the number of host-to-storage hops unchanged, at one hop, in a
core-edge.

 Hop count represents the total number of devices a given piece of data
(packet) passes through.

 A large hop count means a greater transmission delay as data traverses
from its source to its destination.

As the number of cores increases, it may be prohibitive to continue to


maintain ISLs from each core to each edge switch. When this happens, the
fabric design can be changed to a compound or complex core-edge design.

Mesh Topology
 In a mesh topology, each switch is directly connected to other switches
by using ISLs. This topology promotes enhanced connectivity within the
SAN.

 When the number of ports on a network increases, the number of


nodes that can participate and communicate also increases.

 A mesh topology may be one of the two types: full mesh or partial
mesh. In a full mesh, every switch is connected to every other switch in
the topology. Full mesh topology may be appropriate when the number
of switches involved is small.

 A typical deployment would involve up to four switches or directors,


with each of them servicing highly localized host-to-storage traffic.

 In a full mesh topology, a maximum of one ISL or hop is required for


host-to-storage traffic. In a partial mesh topology, several hops or ISLs
may be required for the traffic to reach its destination.

 Hosts and storage can be located anywhere in the fabric, and storage
can be localized to a director or a switch in both mesh topologies.

 A full mesh topology with a symmetric design results in an even number


of switches, whereas a partial mesh has an asymmetric design and may

result in an odd number of switches.
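
To make the scaling trade-off concrete, the short Python sketch below (purely
illustrative; the switch counts are hypothetical) computes the number of ISLs a
full mesh needs, which grows as n(n-1)/2 even though any two switches stay at
most one hop apart.

# Illustrative sketch: ISL count in a full mesh fabric of n switches.
# In a full mesh every switch connects to every other switch, so the
# number of ISLs grows quadratically: n * (n - 1) / 2.
# The switch counts below are hypothetical examples, not from the text.

def full_mesh_isl_count(num_switches: int) -> int:
    """Number of inter-switch links needed to fully mesh the fabric."""
    return num_switches * (num_switches - 1) // 2

for n in (4, 8, 16):
    print(f"{n:>2} switches -> {full_mesh_isl_count(n):>3} ISLs, "
          f"max 1 hop between any two switches")
# Output: 4 switches -> 6 ISLs, 8 switches -> 28 ISLs, 16 switches -> 120 ISLs.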

3 Explain in detail about the components and architecture of FCoE. K1 CO3


FCOE SAN COMPONENTS
The key FCoE SAN components are:

 Network adapters such as Converged Network Adapter (CNA) and


software FCoE adapter

 Cables such as copper cables and fiber optical cables

 FCoE switch

Converged Network Adapter (CNA)


 The CNA is a physical adapter that provides the functionality of both a
standard NIC and an FC HBA in a single device.

 It consolidates both FC traffic and regular Ethernet traffic on a


common Ethernet infrastructure.

 It encapsulates FC traffic into Ethernet frames and forwards them to FCoE
switches over CEE links.

 They eliminate the need to deploy separate adapters and cables for FC
and Ethernet communications, thereby reducing the required number
of network adapters and switch ports.

 A CNA offloads the FCoE protocol processing task from the compute

system, thereby freeing the CPU resources of the compute system for
application processing.

 It contains separate modules for 10 Gigabit Ethernet (GE), FC, and


FCoE Application Specific Integrated Circuits (ASICs).

Software FCoE Adapter


 Instead of a CNA, a software FCoE adapter may also be used. A
software FCoE adapter is OS or hypervisor kernel-resident software that
performs FCoE processing.

 The FCoE processing consumes host CPU cycles.

 With software FCoE adapters, the OS or hypervisor implements FC


protocol in software that handles SCSI to FC processing.

 The software FCoE adapter performs FC to Ethernet encapsulation. Both
FCoE traffic (Ethernet traffic that carries FC data) and regular Ethernet traffic are
transferred through supported NICs on the hosts.

FCOE Switch
 An FCoE switch has both Ethernet switch and FC switch functionalities.
It has a Fibre Channel Forwarder (FCF), an Ethernet Bridge, and a set of
ports that can be used for FC and Ethernet connectivity.

 FCF handles FCoE login requests, applies zoning, and provides the
fabric services typically associated with an FC switch.

 It also encapsulates the FC frames received from the FC port into
Ethernet frames and decapsulates the Ethernet frames received from the
Ethernet Bridge into FC frames.

 Upon receiving the incoming Ethernet traffic, the FCoE switch inspects
the Ethertype of the incoming frames and uses that to determine their
destination.

 If the Ethertype of the frame is FCoE, the switch recognizes that the
frame contains an FC payload and then forwards it to the FCF.

 From there, the FC frame is extracted from the Ethernet frame and
transmitted to the FC SAN over the FC ports.

 If the Ethertype is not FCoE, the switch handles the traffic as usual
Ethernet traffic and forwards it over the Ethernet ports.
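
The Ethertype-based dispatch described above can be summarized in a small
conceptual Python sketch. This is not a real switch implementation: 0x8906 is
the registered FCoE Ethertype, but the frame class and handler functions here
are simplified assumptions.

# Conceptual sketch of an FCoE switch's dispatch logic (not a real switch API).
# 0x8906 is the registered Ethertype for FCoE; the Frame class and handlers
# are simplified assumptions for illustration.

from dataclasses import dataclass

FCOE_ETHERTYPE = 0x8906

@dataclass
class EthernetFrame:
    ethertype: int
    payload: bytes   # for FCoE frames this carries an encapsulated FC frame

def forward_to_fcf(fc_frame: bytes) -> None:
    print(f"FCF: de-encapsulated FC frame ({len(fc_frame)} bytes) -> FC ports")

def forward_as_ethernet(frame: EthernetFrame) -> None:
    print(f"Ethernet bridge: forwarding frame with Ethertype 0x{frame.ethertype:04x}")

def handle_incoming(frame: EthernetFrame) -> None:
    # Inspect the Ethertype to decide where the frame goes.
    if frame.ethertype == FCOE_ETHERTYPE:
        # FC payload: hand it to the Fibre Channel Forwarder (FCF).
        forward_to_fcf(frame.payload)
    else:
        # Regular Ethernet traffic: forward over the Ethernet ports.
        forward_as_ethernet(frame)

handle_incoming(EthernetFrame(FCOE_ETHERTYPE, b"\x00" * 64))
handle_incoming(EthernetFrame(0x0800, b"ip packet"))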

FCoE ARCHITECTURE
 Fibre Channel over Ethernet (FCoE) is a method of supporting
converged Fibre Channel (FC) and Ethernet traffic on a data center
bridging (DCB) network.

 FCoE encapsulates unmodified FC frames in Ethernet to transport the


FC frames over a physical Ethernet network.

 An FCoE frame is the same as any other Ethernet frame because the
Ethernet encapsulation provides the header information needed to
forward the frames. However, to achieve the lossless behavior that FC
transport requires, the Ethernet network must conform to DCB
standards.

 DCB standards create an environment over which FCoE can


transport native FC traffic encapsulated in Ethernet while preserving
the mandatory class of service (CoS) and other characteristics that FC
traffic requires.

 Supporting FCoE in a DCB network requires that the FCoE devices in


the Ethernet network and the FC switches at the edge of the SAN
network handle both Ethernet and native FC traffic. To handle Ethernet
traffic, an FC switch does one of two things:

 Incorporates FCoE interfaces.

Uses an FCoE-FC gateway such as a QFX3500 switch to de-encapsulate FCoE


traffic from FCoE devices into native FC and to encapsulate native FC traffic
from the FC switch into FCoE and forward it to FCoE devices through the
Ethernet network
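
As a teaching aid for the encapsulation idea (an unmodified FC frame carried as
the payload of an Ethernet frame), the following simplified Python sketch builds
and unwraps an FCoE-style frame. Real FCoE frames also carry an FCoE header,
SOF/EOF delimiters, padding, and an FCS, which are deliberately omitted here.

# Simplified illustration of FCoE encapsulation: an unmodified FC frame is
# wrapped in an Ethernet frame whose Ethertype marks it as FCoE (0x8906).
# Real FCoE adds an FCoE header, SOF/EOF delimiters, padding, and an FCS.

import struct

FCOE_ETHERTYPE = 0x8906

def encapsulate(fc_frame: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    # 6-byte destination MAC + 6-byte source MAC + 2-byte Ethertype,
    # followed by the untouched FC frame as the payload.
    eth_header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    return eth_header + fc_frame

def decapsulate(ethernet_frame: bytes) -> bytes:
    ethertype = struct.unpack("!H", ethernet_frame[12:14])[0]
    assert ethertype == FCOE_ETHERTYPE, "not an FCoE frame"
    return ethernet_frame[14:]          # the original FC frame, unmodified

fc_frame = b"\x22" * 36                 # placeholder FC frame contents
wire = encapsulate(fc_frame, b"\xaa" * 6, b"\xbb" * 6)
assert decapsulate(wire) == fc_frame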

Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO

1 Illustrate the components and architecture of FC SAN. K2 CO3

A SAN consists of three basic components: servers, network infrastructure,


and storage.

These components can be further broken down into the following key
elements: node ports, cabling, interconnecting devices (such as FC switches
or hubs), storage arrays, and SAN management software

Node Ports

In fibre channel, devices such as hosts, storage and tape libraries are all
referred to as nodes. Each node is a source or destination of information for
one or more nodes.

Each node requires one or more ports to provide a physical


interface for communicating with other nodes. These ports are integral
components of an HBA and the storage front-end adapters.

A port operates in full-duplex data transmission mode with a transmit (Tx)


link and a receive (Rx) link.

Cabling:

 SAN implementations use optical fiber cabling. Copper can be used


for shorter distances for back-end connectivity, as it provides a better
signal-to-noise ratio for distances up to 30 meters.

 Optical fiber cables carry data in the form of light.

 There are two types of optical cables, multi-mode and single-mode.

 Multi-mode fiber (MMF) cable carries multiple beams of light


projected at different angles simultaneously onto the core of the

cable.

Based on the bandwidth, multi-mode fibers are classified as

 OM1 (62.5µm),
 OM2 (50µm)
 laser optimized OM3 (50µm).
In an MMF transmission, multiple light beams traveling inside the cable tend
to disperse and collide.
 This collision weakens the signal strength after it travels a
certain distance — a process known as modal dispersion.

 An MMF cable is usually used for distances of up to 500 meters


because of signal degradation (attenuation) due to modal
dispersion.

 Single-mode fiber (SMF) carries a single ray of light projected at the


center of the core.

 These cables are available in diameters of 7–11 microns; the most


common size is 9 microns.

 In an SMF transmission, a single light beam travels in a straight line


through the core of the fiber.

 The small core and the single light wave limits modal dispersion.
Among all types of fibre cables, single-mode provides minimum
signal attenuation over a maximum distance (up to 10 km).

 A single-mode cable is used for long-distance cable runs, limited


only by the power of the laser at the transmitter and sensitivity of the
receiver

MMFs are generally used within data centers for shorter distance runs, while
SMFs are used for longer distances. MMF transceivers are less expensive as
compared to SMF transceivers.

 A Standard connector (SC) and a Lucent connector (LC) are two


commonly used connectors for fiber optic cables.

 An SC is used for data transmission speeds up to 1 Gb/s, whereas an
LC is used for speeds up to 4 Gb/s.

 A Straight Tip (ST) is a fiber optic connector with a plug and a


socket that is locked with a half-twisted bayonet lock.

 In the early days of FC deployment, fiber optic cabling


predominantly used ST connectors. This connector is often used with
Fibre Channel patch panels

The Small Form-factor Pluggable (SFP) is an optical transceiver used in


optical communication. The standard SFP+ transceivers support data rates
up to 10 Gb/s.

Interconnect Devices
 Hubs, switches, and directors are the interconnect devices commonly
used in SAN.

 Hubs are used as communication devices in FC-AL implementations.

 Hubs physically connect nodes in a logical loop or a physical star


topology.

 All the nodes must share the bandwidth because data travels through
all the connection points. Because of the availability of low-cost and
high-performance switches, hubs are no longer used in SANs.

Storage Arrays
 The fundamental purpose of a SAN is to provide host access to
storage resources.

 The large storage capacities offered by modern storage arrays have


been exploited in SAN environments for storage consolidation and
centralization.

 SAN implementations complement the standard


features of storage arrays by providing high availability and

redundancy, improved performance, business continuity, and
multiple host connectivity.
 SAN management software manages the interfaces between hosts,
interconnect devices, and storage arrays.

 The software provides a view of the SAN environment and
enables the management of various resources from one central
console.

 It provides key management functions, including mapping of storage


devices, switches, and servers, monitoring and generating alerts for
discovered devices, and logical partitioning of the SAN, called
zoning.

FC ARCHITECTURE

 The FC architecture represents true channel/network integration with


standard interconnecting devices. Connections in a SAN are
accomplished using FC.

 Transmissions from host to storage devices are carried out over


channel connections such as a parallel bus. Channel technologies
provide high levels of performance with low protocol overheads.

 Such performance is due to the static nature of channels and the high
level of hardware and software integration provided by the channel
technologies.

The key advantages of FCP are as follows:

Sustained transmission bandwidth over long distances.

 Support for a larger number of addressable devices over a
network. FC uses 24-bit addressing and can therefore support over 15 million
(2^24, about 16.7 million) device addresses on a network.

 Exhibits the characteristics of channel transport and provides


speeds up to 8.5 Gb/s (8 GFC).

The FC standard enables mapping several existing Upper Layer Protocols


(ULPs) to FC frames for transmission, including SCSI, IP, High
Performance Parallel Interface (HIPPI), Enterprise System Connection
(ESCON), and Asynchronous Transfer Mode (ATM).

Unit-IV

BACKUP, ARCHIVE AND REPLICATION

Introduction to Business Continuity, Backup architecture, Backup targets and methods, Data deduplication,
Cloud-based and mobile device backup, Data archive, Uses of replication and its characteristics, Compute
based, storage-based, and network-based replication, Data migration, Disaster Recovery as a Service (DRaaS).

Part-A ( Five Questions)

S.No Questions BTL CO

1 Define Business continuity (BC) . K1 CO4


Business continuity (BC) is an integrated and enterprise wide process that
includes all activities (internal and external to IT) that a business must perform to
mitigate the impact of planned and unplanned downtime.
2 Define data Replication. K1 CO4
 Replication is the process of creating an exact copy of data. Creating one or
more replicas of the production data is one of the ways to provide Business
Continuity (BC).
 Data replication, where the same data is stored on multiple storage devices.
3 State Data Migration. K1 CO4
Data migration is the process of selecting, preparing, extracting, and
transforming data and permanently transferring it from one computer
storage system to another.
4 Difference between Data migration and data conversion. K2 CO4
 Data migration is the process of transferring data between data storage
systems or formats.
 Data conversion is the process of changing data from one format to
another. If a legacy system and a new system have identical fields, an
organization could just do a data migration; however, the data from the
legacy system is generally different and needs to be modified before
migrating. Data conversion is often a step in the data migration process.
5 List out the types of data migrations. K2 CO4
 Storage migration
 Database migration
 Application migration
 Cloud migration
 Business process migration.

Part-B( Three Questions) ( 13 Marks)

S.No Questions BTL CO


1 Discuss in detail about backup architecture and their methods. K1 CO4

Backup Methods:

Backup method is of two types :

1. Hot backup

2. Cold backup

Hot backup

 Hot backup is also called dynamic backup. In the hot backup
method, the backup is taken while users are actively accessing the data.
 Hot backups can provide a convenient solution in multi-user systems,
because they do not require downtime, as does a conventional cold backup.
 Hot backups involve certain risks.
 If the data is altered while the backup is in progress, the resulting copy may
not match the final state of the data.
 If recovery of the data becomes necessary, the inconsistency must be
resolved.
 A hot backup, or one taken while the database is active, can only give a read-
consistent copy and does not capture active transactions.
 When databases must remain operational 24 hours a day, 7 days a week or
have become so large that a cold backup would take too long.

To perform a hot backup, the database must be in ARCHIVELOG mode. Unlike a
cold backup, in which the whole database is usually backed up at the same time,
tablespaces in a hot backup scenario can be backed up on different schedules.
Cold backup
 An offline cold backup is a physical backup of the database after it has been
shutdown using the SHUTDOWN NORMAL command.

 If the database is shutdown with the IMMEDIATE or ABORT option, it


should be restarted in RESTRICT mode and then shutdown with the
NORMAL option. An operating system utility is used to perform the backup.
 A cold backup of the database is an image copy of the database at a point in
time. The database is consistent and restorable.
 This image copy can be used to move the database to another computer
provided the same operating system is being used.
 If the database is in ARCHIVELOG mode, the cold backup would be the

starting point for a point-in-time recovery.
 All archive log files necessary would be applied to the database once it is
restored from the cold backup. Cold backups are useful if your business
requirements allow for a shut-down window to back up the database.
 If your database is very large or you have 24*7 processing, cold backups are
not an option.
 The other major difference between hot and cold backups is that before a
tablespace can be backed up hot, the database must be informed when the backup
is starting and when it is complete.
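
The essential difference between the two methods can be sketched conceptually in
Python. The file paths and the archived-log handling below are assumptions for
illustration, not a specific database's commands: a cold backup copies files only
after writes are quiesced, while a hot backup copies files while writes continue
and relies on archived logs to make the restored copy consistent.

# Conceptual sketch of cold vs. hot backup (illustrative only; the file
# paths and change-log mechanism are assumptions, not a database API).

import shutil
from pathlib import Path

def cold_backup(db_files: list[Path], target: Path, stop_db, start_db) -> None:
    stop_db()                              # no transactions during the copy
    for f in db_files:
        shutil.copy2(f, target / f.name)   # image copy is consistent as-is
    start_db()

def hot_backup(db_files: list[Path], target: Path, archived_logs: list[Path]) -> None:
    # Database stays online; users keep writing while files are copied,
    # so the copy alone may be inconsistent.
    for f in db_files:
        shutil.copy2(f, target / f.name)
    # The archived logs written during the copy are backed up too, so that
    # recovery can replay them and bring the restored copy to a consistent state.
    for log in archived_logs:
        shutil.copy2(log, target / log.name)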

BACKUP ARCHITECTURE:
 A client-server architecture is used in the backup process. Multiple clients
connect to a server, and the server machine is used as the backup machine.
 Server manages the backup process and also maintains the log and backup
catalog. Backup catalog contains information about the backup process and
backup metadata.
 The backup server depends on backup clients to gather the data to be backed
up. The backup server receives backup metadata from the clients to
perform its activity.
 Storage node is responsible for writing data to the backup device.
 A storage node is a machine that is connected to a backup server and one or
more devices used in backup process.
 Devices attached to storage nodes are called remote devices because they are
not physically attached to the controlling Backup server.
 The storage node runs special backup software that controls devices. The
data stored on media in remote devices is tracked in the media database and in
online client file indexes on the controlling backup server.

A Backup process
A backup server is a type of server that enables the backup of data, files,
applications and/or databases on a specialized in-house or remote server. It
combines hardware and software technologies that provide backup storage and
services to connected computers, servers or related devices.
The organization decides the backup policy, and the backup server takes backups based on
that policy. The backup server sends a request to the backup client, and the backup client
sends metadata and data to the backup server. The backup server writes the received
metadata to the catalog. After the backup is taken, the storage node disconnects the
connection from the backup device.
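
A minimal sketch of this flow is given below; all class and method names are
hypothetical. The point is the sequence: the server initiates the backup
according to policy, the client returns data plus metadata, the server records
the metadata in its catalog, and the storage node writes the data to the backup
device.

# Minimal sketch of the backup architecture's data flow
# (all names here are hypothetical, for illustration only).

class BackupClient:
    def __init__(self, name, files):
        self.name, self.files = name, files

    def gather(self):
        data = {f: f"<contents of {f}>" for f in self.files}
        metadata = {"client": self.name, "files": list(self.files)}
        return data, metadata

class StorageNode:
    def write(self, data):
        print(f"storage node: wrote {len(data)} files to backup device")

class BackupServer:
    def __init__(self, storage_node):
        self.catalog = []                 # backup catalog: metadata about backups
        self.storage_node = storage_node

    def run_backup(self, client):
        data, metadata = client.gather()  # server requests, client responds
        self.catalog.append(metadata)     # metadata goes into the catalog
        self.storage_node.write(data)     # storage node writes to the device

server = BackupServer(StorageNode())
server.run_backup(BackupClient("host-01", ["/etc/hosts", "/var/log/app.log"]))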
2 Describe the various data replication processes. K1 CO4
Data replication is the process of making multiple copies of data and storing them at
different locations for backup purposes, fault tolerance and to improve their overall
accessibility across a network.
Similar to data mirroring, data replication can be applied to both individual
computers and servers. The data replicates can be stored within the same system,
on-site and off-site hosts, and cloud-based hosts.
Common database technologies today either have built-in capabilities, or use third-
party tools to accomplish data replication. While Oracle Database and Microsoft
SQL actively support data replication, some traditional technologies may not
include this feature out of the box.

Data replication can either be synchronous, meaning that any changes made to the
original data will be replicated, or asynchronous, meaning replication is initiated
only when the Commit statement is passed to the database.

Benefits of data replication:


Although data replication can be demanding in terms of cost, computational, and

storage requirements, businesses widely use this database management technique to
achieve one or more of the following goals:
 Improve the availability of data
 Increase the speed of data access
 Enhance server performance
 Accomplish disaster recovery
Improve the availability of data
When a particular system experiences a technical glitch due to malware or a faulty
hardware component, the data can still be accessed from a different site or node.
Data replication enhances the resilience and reliability of systems by storing data at
multiple nodes across the network.
Increase data access speed:
In organizations where there are multiple branch offices spread across the globe,
users may experience some latency while accessing data from one country to
another. Placing replicas on local servers provides users with faster data access and
query execution times.
Enhance server performance:
Database replication effectively reduces the load on the primary server by dispersing
it among other nodes in the distributed system, thereby improving network
performance. By routing all read-operations to a replica database, IT administrators
can save the primary server for write-operations that demand more processing
power.
Accomplish Disaster recovery:
Businesses are often susceptible to data loss due to a data breach or hardware
malfunction. During such a catastrophe, the employees' valuable data, along with
client information can be compromised. Data replication facilitates the recovery of
data which is lost or corrupted by maintaining accurate backups at well-monitored
locations, thereby contributing to enhanced data protection.
Working of data replication:
 Modern day applications use a distributed database in the back end, where
data is stored and processed using a cluster of systems, instead of relying on
one particular system for the same.
 Let us assume that a user of an application wishes to write a piece of data to
the database. This data gets split into multiple fragments, with each fragment
getting stored on a different node across the distributed system. The database
technology is also responsible for gathering and consolidating the different
fragments when a user wants to retrieve or read the data.
 In such an arrangement, a single system failure can inhibit the retrieval of the
entire data. This is where data replication saves the day. Data replication
technology can store multiple fragments at each node to streamline read and
write operations across the network.
 Data replication tools ensure that complete data can still be consolidated
from other nodes across the distributed system during the event of a system
failure.
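
The idea of keeping multiple copies of each fragment on different nodes can be
illustrated with a small Python sketch; the node names and replication factor
below are made-up values.

# Illustration of fragment replication across nodes (hypothetical names/values).

NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICATION_FACTOR = 2                    # each fragment is stored on 2 nodes

def place_replicas(fragment_id: int) -> list[str]:
    """Pick REPLICATION_FACTOR distinct nodes for a fragment (round-robin)."""
    start = fragment_id % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

def read_fragment(fragment_id: int, failed: set[str]) -> str:
    # Any surviving replica can serve the read.
    for node in place_replicas(fragment_id):
        if node not in failed:
            return f"fragment {fragment_id} read from {node}"
    raise RuntimeError("all replicas unavailable")

print(place_replicas(0))                     # ['node-a', 'node-b']
print(read_fragment(0, failed={"node-a"}))   # served by node-b despite the failure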

Types of data replication
Depending on data replication tools employed, there are multiple types of
replication practiced by businesses today. Some of the popular replication modes are
as follows
 Full table replication
 Transactional replication
 Snapshot replication
 Merge replication
 Key-based incremental replication
Full table replication
Full table replication means that the entire data is replicated. This includes new,
updated as well as existing data that is copied from source to the destination. This
method of replication is generally associated with higher costs since the processing
power and network bandwidth requirements are high.

However, full table replication can be beneficial when it comes to the recovery of
hard-deleted data, as well as data that does not possess replication keys, discussed
further below.
Transactional replication
In this method, the data replication software makes full initial copies of data from
origin to destination, following which the subscriber database receives updates
whenever data is modified. This is a more efficient mode of replication since fewer
rows are copied each time data is changed. Transactional replication is usually
found in server-to-server environments.

Snapshot replication
In Snapshot replication, data is replicated exactly as it appears at any given time.
Unlike other methods, Snapshot replication does not pay attention to the changes
made to data. This mode of replication is used when changes made to data tend to
be infrequent; for example, performing initial synchronization between publishers
and subscribers.

Merge replication
This type of replication is commonly found in server-to-client environments and
allows both the publisher and subscriber to make changes to data dynamically. In
merge replication, data from two or more databases are combined to form a single
database thereby contributing to the complexity of using this technique.

Key-based incremental replication
Also called key-based incremental data capture, this technique only copies data
changed since the last update. Keys can be looked at as elements that exist within
databases that trigger data replication. Since only a few rows are copied during each
update, the costs are significantly low.

However, the drawback lies in the fact that this replication mode cannot be used to
recover hard deleted data, since the key value is also deleted along with the record.
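
A key-based incremental pass effectively copies every row whose key column is
newer than the last synchronized value. The schematic Python sketch below assumes
a hypothetical "updated_at" replication key and in-memory rows.

# Schematic key-based incremental replication (illustrative; the rows and
# the 'updated_at' replication key are assumed for the example).

source_rows = [
    {"id": 1, "name": "alice", "updated_at": 100},
    {"id": 2, "name": "bob",   "updated_at": 250},
    {"id": 3, "name": "carol", "updated_at": 300},
]

def incremental_replicate(rows, last_synced_key):
    """Copy only rows whose key is newer than the last replicated value."""
    changed = [r for r in rows if r["updated_at"] > last_synced_key]
    new_high_water_mark = max((r["updated_at"] for r in changed),
                              default=last_synced_key)
    return changed, new_high_water_mark

changed, mark = incremental_replicate(source_rows, last_synced_key=200)
print(changed)   # only bob and carol are copied
print(mark)      # 300 becomes the high-water mark for the next run
# Note: a row that is hard-deleted at the source never appears in 'changed',
# which is why this mode cannot recover hard-deleted data.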
3 Explain in detail about Disaster Recovery As A Service (DRAAS) K1 CO4
DISASTER RECOVERY AS A SERVICE (DRaaS)
Disaster recovery as a service (DRaaS) is a cloud computing service model that
allows an organization to back up its data and IT infrastructure in a third party cloud
computing environment and provide all the DR orchestration, all through a SaaS
solution, to regain access and functionality to IT infrastructure after a disaster.
Disaster recovery planning is critical to business continuity. Many disasters that
have the potential to wreak havoc on an IT organization have become more frequent
in recent years:
 Natural disasters such as hurricanes, floods, wildfires and earthquakes
 Equipment failures and power outages
 Cyberattacks.
Models of DRaas:
Organizations may choose to hand over all or part of their disaster recovery
planning to a DRaaS provider. There are many different disaster recovery as a
service providers to choose from, with three main models:

Managed DRaaS:
In a managed DRaaS model, a third party takes over all responsibility for disaster
recovery. Choosing this option requires an organization to stay in close contact with
their DRaaS provider to ensure that it stays up to date on all infrastructure,
application and services changes. If you lack the expertise or time to manage your
own disaster recovery, this may be the best option for you.

Assisted DRaaS:
If you prefer to maintain responsibility for some aspects of your disaster
recovery plan, or if you have unique or customized applications that might be
challenging for a third party to take over, assisted DRaaS might be a better option.
In this model, the service provider offers its expertise for optimizing disaster
recovery procedures, but the customer is responsible for implementing some or all
of the disaster recovery plan.

Self-service DRaaS:

The least expensive option is self-service DRaaS, where the customer is responsible
for the planning, testing and management of disaster recovery, and the customer
hosts its own infrastructure backup on virtual machines in a remote location. Careful
planning and testing are required to make sure that processing can fail over to the
virtual servers instantly in the event of a disaster. This option is best for those who
have experienced disaster recovery experts on staff.

Part-C ( One Question) ( 15 Marks)


S.No Questions BTL CO
Discuss the various strategies used in data migrations and phases of data
1 K1 CO4
migration process.
DATA MIGRATION:
Data migration is the process of transferring data from one data storage system to
another; it also involves data transfers between different data formats and applications.
Types of data migration :
 Storage Migration
 Database Migration
 Application Migration
 Cloud Migration
 Business process Migration
 Data Center Migration
Storage Migration:
Storage migrations are the most basic types of data migration, fitting the
literal definition of data migration. These migrations consist of moving data from
one storage device to a new or different storage device. That device can be in the
same building or in a different datacenter that's far away. The device may also be of
a different kind, such as moving from a hard disk drive to a solid-state drive.
Migrating data to the cloud or from one cloud provider to another is also a kind of
storage migration.

Database Migration:
Databases are data storage media where data is structured in an organized way.
Databases are managed through database management systems (DBMS). Hence,
database migration involves moving from one DBMS to another or upgrading from
the current version of a DBMS to the latest version of the same DBMS. The former
is more challenging especially if the source system and the target system use
different data structures.

Application Migration:
Application migration occurs when an organization goes through a change in
application software or changes an application vendor. This migration requires
moving data from one computing environment to another. A new application
platform may require radical transformation due to new application interactions after

the migration. The database that the application uses will need to be relocated, and
sometimes even modified in format via data conversion to fit a new data model,
along with the files and directory structure the application requires to install and
run.

Cloud migration:
Much like two other types of data migration storage migration and application
migration this type of data migration involves moving data or applications. The key
aspect is that cloud data migration refers specifically to transferring data or
applications from a private, on-premises datacenter to the cloud or from one cloud
environment to another. The extent of the migration will vary.

Business Process Migration:


This data migration type refers to moving data and applications in order to better
manage or operate the business itself. In a business process migration, the
organization may transfer any kind of data including databases and applications that
serves products, customer experiences, operations, practices.

Data center Migration:


Data center migration relates to the migration of data center infrastructure to a new
physical location or the movement of data from the old data center infrastructure to
new infrastructure equipment at the same physical location. A data center houses the
data storage infrastructure, which maintains the organization’s critical applications.
It consists of servers, network routers, switches, computers, storage devices, and
related data equipment.

Data migration process:


The data migration process should be well planned, seamless, and efficient to ensure
it does not go over budget or result in a protracted process. It involves the following
steps in the planning, migration, and post-migration phases:

The data migration process can also follow the ETL process:
 Extraction of data
 Transformation of data
 Loading data
ETL tools can manage the complexities of the data migration process from
processing huge datasets, profiling, and integration of multiple application
platforms.
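
The ETL flow can be expressed as a tiny pipeline. The sketch below is schematic:
the source records, the transformation rule, and the in-memory target are
assumptions made for illustration.

# Schematic extract-transform-load pipeline for a data migration
# (source records, transformation rule, and target are illustrative).

def extract():
    # Pull records from the legacy system (here, a hard-coded sample).
    return [{"CUST_NAME": " Alice ", "BAL": "120.50"},
            {"CUST_NAME": "Bob",     "BAL": "75"}]

def transform(records):
    # Convert legacy fields into the new system's format.
    return [{"customer_name": r["CUST_NAME"].strip(),
             "balance_cents": int(float(r["BAL"]) * 100)} for r in records]

def load(records, target):
    # Write transformed records into the target store.
    target.extend(records)

target_table: list[dict] = []
load(transform(extract()), target_table)
print(target_table)
# [{'customer_name': 'Alice', 'balance_cents': 12050},
#  {'customer_name': 'Bob', 'balance_cents': 7500}]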

The data migration process remains the same whether a big bang approach or a
trickle approach is adopted.

1.Big bang data migration approach:


The big bang data migration approach moves all data in one single operation from
the current environment to the target environment. It is fast and less complex, but the
system is unavailable to users during the migration. Hence, it should be conducted during
public holidays or periods when users are not expected to use the system. The
advantages of this approach are offset by the risk of an expensive failure, because the
large volume of data can overwhelm the network during transmission. Because of such
risk, the big bang approach is more suitable for small companies or for operations and
projects where the migration involves a small amount of data. Furthermore, it should
not be used on systems that cannot sustain any downtime.

2.Trickle Data Migration Approach:


The trickle data migration approach is a phased approach to data migration. Trickle
data migration breaks down the migration process into sub-processes where data is
transferred in small increments. The old system remains operational and runs
parallel with the migration. The advantage is that there is no downtime in the live
system, and it is less susceptible to errors and unexpected failures.

Unit-V

SECURING STORAGE INFRASTRUCTURE

Information security goals, Storage security domains, Threats to a storage infrastructure, Security controls to
protect a storage infrastructure, Governance, risk, and compliance, Storage infrastructure management
functions, Storage infrastructure management processes.

Part-A ( Five Questions)

S.No Questions BTL CO


1 What are the information security goals? K1 CO5
Information security is a collection of practices intended to keep
information secure from unauthorized access and modification while it is being
stored or transmitted from one place to another.
Goals are
 Confidentiality
 Integrity
 Availability
2 What are the security controls to protect a storage infrastructure? K1 CO5
The security controls are
 Physical security controls
 Digital security controls
 Cybersecurity controls
 Cloud security controls
3 Define Governance. K1 CO5
Governance is the set of policies, rules, or frameworks that a company uses to
achieve its business goals. It defines the responsibilities of key stakeholders, such as
the board of directors and senior management.
4 Give some examples of GRC. K2 CO5
 Diligent HighBond.
 IBM OpenPages.
 LogicManager.
 LogicGate Risk Cloud.
 MetricStream Enterprise GRC.
 Navex Global Lockpath.
 ServiceNow Governance, Risk, and Compliance
5 What are the different processes in Storage management? K1 CO5
The most common processes found in storage management are
 Provisioning
 Data compression
 Data migration
 Data replication
 Automation
 Disaster recovery

Part-B( Three Questions) ( 13 Marks)

S.No Questions BTL CO


1 Explain in detail about storage security domains. K1 CO5
Storage security domains refer to the distinct areas or categories within an
organization’s storage infrastructure that define how data is stored, accessed, and
protected. These domains help ensure data confidentiality, integrity, and availability,
aligning with security policies and compliance requirements. Here are the key
components and considerations related to storage security domains:

1. Data Classification
Categories: Data is classified based on sensitivity and importance, such as public,
internal, confidential, and restricted. This classification guides security measures
and access controls.
Labeling: Proper labeling of data helps in enforcing security policies and ensuring
that sensitive information is handled appropriately.
2. Access Control
User Authentication: Ensuring only authorized users can access storage systems.
This may involve multi-factor authentication (MFA) or biometric methods.
Role-Based Access Control (RBAC): Users are granted access based on their roles,
limiting their ability to access sensitive data unnecessarily.
Access Logs: Monitoring and logging access to storage systems to detect
unauthorized access or anomalies.

3. Data Encryption
At-Rest Encryption: Encrypting data stored on physical storage devices to protect it
from unauthorized access.
In-Transit Encryption: Securing data during transfer using protocols like SSL/TLS
to prevent interception.
Key Management: Proper management of encryption keys to ensure they are secure
and accessible only to authorized users.
4. Data Integrity
Checksums and Hashing: Using techniques to verify that data has not been altered
or corrupted over time.
Audit Trails: Maintaining logs of all changes to data to ensure accountability and
traceability.
5. Physical Security
Location Security: Ensuring that storage devices are housed in secure environments,
such as data centers with controlled access.
Environmental Controls: Implementing measures to protect against physical threats,
such as fire, water damage, or power failure.
6. Backup and Recovery
Regular Backups: Establishing protocols for regularly backing up data to prevent
loss in case of a breach or hardware failure.
Disaster Recovery Plans: Creating plans for restoring data and operations in the
event of a security incident or catastrophic failure.

7. Compliance and Governance
Regulatory Requirements: Adhering to relevant laws and regulations (e.g., GDPR,
HIPAA) that dictate how data must be stored and protected.
Policy Development: Establishing and enforcing policies regarding data storage and
security, including incident response procedures.
8. Network Security
Firewalls and Intrusion Detection Systems: Implementing network security
measures to protect data storage from unauthorized access over the network.
Segmentation: Isolating storage networks from other parts of the organization’s
infrastructure to reduce risk.
9. Virtualization and Cloud Security
Virtual Storage Management: Understanding the security implications of using
virtualized storage solutions.
Cloud Security Controls: Applying security measures specific to cloud storage
environments, including vendor assessments and shared responsibility models.
2 Discuss the various methods to secure backup, recovery, and archival K1 CO5
information.

Database backup is the same as any other data backup: taking a copy of the data and
then storing it on a different medium in case of failure or damage to the original.
The simplest case of a backup involves shutting down the database to ensure that no
further transactions occur, and then simply backing it up. You can then recreate the
database if it becomes damaged or corrupted in some way.

The recreation of the database is called recovery. Version recovery is the restoration
of a previous version of the database, using an image that was created during a
backup operation. Rollforward recovery is the reapplication of transactions recorded
in the database log files after a database or a table space backup image has been
restored.

Crash recovery is the automatic recovery of the database if a failure occurs before
all of the changes that are part of one or more units of work (transactions) are
completed and committed. This is done by rolling back incomplete transactions and
completing committed transactions that were still in memory when the crash
occurred.

Recovery log files and the recovery history file are created automatically when a
database is created. These log files are important if you need to recover data that is
lost or damaged.

Each database includes recovery logs, which are used to recover from application or
system errors. In combination with the database backups, they are used to recover
the consistency of the database right up to the point in time when the error occurred.

The recovery history file contains a summary of the backup information that can be
used to determine recovery options, if all or part of the database must be recovered

to a given point in time. It is used to track recovery-related events such as backup
and restore operations, among others. This file is located in the database directory.

The table space change history file, which is also located in the database directory,
contains information that can be used to determine which log files are required for
the recovery of a particular table space.

You cannot directly modify the recovery history file or the table space change
history file; however, you can delete entries from the files using the PRUNE
HISTORY command. You can also use the rec_his_retentn database configuration
parameter to specify the number of days that these history files will be retained.

Database recovery files

Securing backup, recovery, and archival information is crucial for data integrity and
business continuity. Here are various methods to achieve this:
Backup Security
Encryption: Encrypt data before it is backed up. This ensures that even if backups
are compromised, the data remains unreadable without the decryption key.
Access Controls: Implement strict access controls to limit who can perform
backups and access backup data. Use role-based access controls (RBAC) to enforce
permissions.
Regular Testing: Regularly test backup processes to ensure data can be restored
accurately and efficiently. This includes performing scheduled restore tests.
Physical Security: For on-premises backups, ensure physical security measures are
in place, such as locked storage, surveillance, and restricted access areas.
Use of Offsite Storage: Store backups in a secure offsite location or use cloud
storage to protect against local disasters.
Recovery Security
Redundancy: Maintain multiple backup copies in different locations or formats.
This reduces the risk of total data loss.
Automated Recovery Solutions: Implement automated recovery solutions that can
quickly restore systems to minimize downtime during a disaster.
Documentation: Keep detailed documentation of recovery procedures, including
contact information for recovery teams and steps for restoring various systems.
Monitoring and Alerts: Use monitoring tools to track backup and recovery
processes. Set up alerts for any failures or anomalies.

Archival Security
Long-term Storage Solutions: Use durable and reliable storage media (e.g., magnetic
tape, optical discs, or cloud storage) designed for long-term data retention.

Data Integrity Checks: Regularly perform checksums or hash functions on
archived data to ensure integrity and detect any corruption (see the sketch after
this list).
Retention Policies: Establish clear data retention policies that specify how long
data will be archived and when it can be deleted.

Legal and Compliance Considerations: Ensure archival processes comply with


relevant regulations (e.g., GDPR, HIPAA) regarding data protection and retention.
Access Logging: Implement logging and monitoring of access to archived data to
detect unauthorized access or anomalies
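
The data integrity check mentioned above usually amounts to recording a
cryptographic hash when data is archived and recomputing it later. A minimal
sketch using Python's standard hashlib module:

# Minimal integrity check for archived data using SHA-256 (standard hashlib).

import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

archived = b"quarterly financial report, 2024"
recorded_digest = sha256_of(archived)      # stored alongside the archive

# Later: recompute and compare to detect silent corruption or tampering.
def verify(data: bytes, expected: str) -> bool:
    return sha256_of(data) == expected

print(verify(archived, recorded_digest))          # True  -> intact
print(verify(archived + b"x", recorded_digest))   # False -> corrupted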
3 Give the various security controls to protect a storage infrastructure. K2 CO5
Securing a storage infrastructure is critical to protect sensitive data and maintain
operational integrity. Here are various security controls that can be implemented:
1. Access Controls
Role-Based Access Control (RBAC): Grant access based on user roles and
responsibilities to minimize unnecessary access to sensitive data.
Least Privilege Principle: Ensure users have only the access necessary to perform
their jobs, reducing the risk of data breaches.
Authentication: Implement strong authentication mechanisms, such as passwords,
biometric, or multi-factor authentication (MFA), to verify user identities.
2. Data Encryption
At-Rest Encryption: Encrypt data stored on storage devices to protect it from
unauthorized access.
In-Transit Encryption: Use protocols like TLS/SSL to encrypt data during
transmission between devices and networks.
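As an illustration of at-rest encryption, the sketch below uses the Fernet recipe
from the third-party cryptography package (an assumption; any vetted library would
serve). Managing the key securely, ideally in a KMS or HSM, remains the critical part.

# At-rest encryption sketch using Fernet from the third-party 'cryptography'
# package (pip install cryptography). Key management is simplified here:
# in production the key would live in a KMS/HSM, never beside the data.

from cryptography.fernet import Fernet

key = Fernet.generate_key()        # symmetric key; protect it separately
fernet = Fernet(key)

plaintext = b"customer records to be written to disk"
ciphertext = fernet.encrypt(plaintext)   # what actually lands on storage

# Without the key the stored ciphertext is unreadable; with it, recovery works:
assert fernet.decrypt(ciphertext) == plaintext
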
3. Network Security
Firewalls: Deploy firewalls to control incoming and outgoing network traffic,
providing a barrier against unauthorized access.
Intrusion Detection and Prevention Systems (IDPS): Monitor network traffic for
suspicious activities and respond to potential threats in real-time.
4. Physical Security
Secure Data Centers: Use physical security measures such as surveillance cameras,
security personnel, and access controls to protect data center facilities.
Environmental Controls: Implement fire suppression, temperature control, and
power management systems to protect against environmental threats.
5. Data Integrity Controls
Checksums and Hashing: Use checksums and hashing algorithms to ensure data
integrity, allowing for detection of data corruption or unauthorized changes.
Data Versioning: Maintain multiple versions of data to allow rollback in case of
corruption or loss.
Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO
Summarize the functions and process involved in storage infrastructure
1 K2 CO5
management.
The three major areas of management are capacity, performance, and
availability. These three areas can be easily summarized as good storage
management, which is about making sure that the storage is always available,
always has enough space, and is fast in terms of performance. Good storage
management requires solid processes, policies, and tools.

Storage Infrastructure Components

The key storage infrastructure components are Servers, storage systems, and storage
area networks (SANs). These components could be physical or virtual and are used
to provide services to the users. The storage infrastructure management includes all
the storage infrastructure-related functions that are necessary for the management of
the infrastructure components and services, and for the maintenance of data
throughout its lifecycle.

Traditional Storage Management

Traditionally, storage infrastructure management is component specific. The

management tools only enable monitoring and management of specific components.

Storage Infrastructure Management key Functions

Storage infrastructure management performs two key functions


 Infrastructure discovery
 Operations management

Infrastructure discovery:
Creates an inventory of infrastructure components and provides information about
the components including their configuration, connectivity, functions, performance,
capacity, availability, utilization, and physical-to-virtual dependencies.
It provides the visibility needed to monitor and manage the infrastructure
components.
Discovery is performed using a specialized tool that commonly interacts with
infrastructure components through the native APIs of these components.

Operations management involves on-going management activities to maintain the


storage infrastructure and the deployed services. It ensures that the services and
service levels are delivered as committed. Operations management involves several
management processes. Ideally, operations management should be automated to
ensure the operational agility. Management tools are usually capable of automating
many management operations. These automated operations are described along with
the management processes. Further, the automated operations of management tools
can also be logically integrated and sequenced through orchestration.

The key functions of Storage Operations Management are

Configuration Management Configuration management is responsible for


maintaining information about configuration items (CI). CIs are components such as
services, process documents, infrastructure components including hardware and
software, people, and SLAs that need to be managed in order to deliver services.

Capacity Management Capacity management ensures adequate availability of


storage infrastructure resources to provide services and meet SLA requirements. It
determines the optimal amount of storage required to meet the needs of a service
regardless of dynamic resource consumption and seasonal spikes in storage demand.
It also maximizes the utilization of available capacity and minimizes spare and
stranded capacity without compromising the service levels.

Performance Management
Performance management ensures the optimal operational efficiency of all
infrastructure components so that storage services can meet or exceed the required

performance level. Performance-related data such as response time and throughput
of components are collected, analyzed, and reported by specialized management
tools. The performance analysis provides information on whether a component
meets the expected performance levels. These tools also proactively alert
administrators about potential performance issues and may prescribe a course of
action to improve a situation.
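
A simplified view of what such a tool does is sketched below: collect
response-time samples, compare them with the agreed threshold, and alert the
administrator proactively. The threshold and sample values are hypothetical.

# Simplified performance-monitoring sketch (threshold and samples are
# hypothetical); real tools collect metrics continuously via component APIs.

RESPONSE_TIME_THRESHOLD_MS = 20.0          # expected performance level (SLA)

def check_performance(component: str, samples_ms: list[float]) -> None:
    avg = sum(samples_ms) / len(samples_ms)
    if avg > RESPONSE_TIME_THRESHOLD_MS:
        print(f"ALERT: {component} avg response {avg:.1f} ms exceeds "
              f"{RESPONSE_TIME_THRESHOLD_MS} ms threshold")
    else:
        print(f"OK: {component} avg response {avg:.1f} ms")

check_performance("storage-array-01", [12.0, 15.5, 18.2])
check_performance("storage-array-02", [22.4, 30.1, 27.8])   # triggers an alert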

Availability Management
Availability management is responsible for establishing a proper guideline based on
the defined availability levels of services. The guideline includes the procedures and
technical features required to meet or exceed both current and future service
availability needs at a justifiable cost. Availability management also identifies all
availability-related issues in a storage infrastructure and areas where availability
must be improved.

Incident Management
An incident is an unplanned event such as an HBA failure or an application error
that may cause an interruption to services or degrade the service quality. Incident
management is responsible for detecting and recording all incidents in a storage
infrastructure. The incident management support groups investigate the incidents
escalated by the incident management tools or service desk. They provide solutions
to bring back the services within an agreed timeframe specified in the SLA. If the
support groups are unable to determine and correct the root cause of an incident,
error-correction activity is transferred to problem management. In this case, the
incident management team provides a temporary solution (workaround) to the
incident

Problem Management
A problem is recognized when multiple incidents exhibit one or more common
symptoms. Problem management reviews all incidents and their history to detect
problems in a storage infrastructure. It identifies the underlying root cause that
creates a problem and provides the most appropriate solution and/or preventive
remediation for the problem. Incident and problem management, although separate
management processes, require automated interaction between them and use
integrated incident and problem management tools. These tools may help an
administrator to track and mark specific incident(s) as a problem and transfer the
matter to problem management for further investigation.
Security Management
Security management is responsible for developing information security policies
that govern the organization’s approach towards information security management.
It establishes the security architecture, processes, mechanisms, tools, user
responsibilities, and standards needed to meet the information security policies in a

cost-effective manner. It also ensures that the required security processes and
mechanisms are properly implemented. Security management ensures the
confidentiality, integrity, and availability of information in a storage infrastructure.
It prevents the occurrence of security-related incidents or activities that adversely
affect the infrastructure components, management processes, information, and
services. It also meets regulatory or compliance requirements (both internal and
external) for protecting information at reasonable/acceptable costs.

