Storage Technologies Question Bank
QUESTION BANK
Year/Sem.: III/V
Course Code & Title: AI2219304 & STORAGE TECHNOLOGIES
Regulation: R2022
UNIT-I
STORAGE SYSTEMS
Introduction to Information Storage: Digital data and its types, Information storage, Key characteristics of
data centre and Evolution of computing platforms. Information Lifecycle Management. Data Centre
Environment: Building blocks of a data center, Compute systems and compute virtualization and
Software-defined data center.
5 Identify the implementation process of ILM. K2 CO1
■ Classifying data and applications based on business rules and policies to enable differentiated treatment of information
■ Implementing policies by using information management tools, starting from the creation of data and ending with its disposal
■ Managing the environment by using integrated tools to reduce operational complexity
■ Organizing storage resources in tiers to align the resources with data classes, and storing information in the right type of infrastructure.
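The classification step lends itself to a short illustration. The sketch below is hypothetical and not part of the syllabus; the tier names, age thresholds, and rules are invented for demonstration:

```python
# Hypothetical ILM classification sketch: assign data to a storage tier
# based on simple business rules (age and business criticality).
from datetime import datetime, timedelta

def classify(last_accessed, business_critical):
    """Return a storage tier for one piece of data (illustrative rules only)."""
    age = datetime.now() - last_accessed
    if business_critical and age < timedelta(days=30):
        return "tier-1"   # high-performance storage for active, critical data
    if age < timedelta(days=365):
        return "tier-2"   # mid-range storage for less active data
    return "archive"      # low-cost storage before eventual disposal

print(classify(datetime.now() - timedelta(days=10), business_critical=True))   # tier-1
print(classify(datetime.now() - timedelta(days=400), business_critical=False)) # archive
```

In practice, rules of this kind come from the organization's own business policies rather than fixed thresholds.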
CORE ELEMENTS:
Five core elements are essential for the basic functionality of a data center:
Application:
An application is a computer program that provides the logic for computing
operations. Applications, such as an order processing system, can be layered on the
database, which in turn uses operating system services to perform read/write
operations to storage devices.
Database:
A database management system (DBMS) provides a structured way to store data in logically organized tables that are interrelated. A DBMS optimizes the storage and retrieval of data.
Server and operating system:
A computing platform that runs applications and databases.
Network:
A data path that facilitates communication between clients and servers or
between servers and storage.
Storage array:
A device that stores data persistently for subsequent use.
Availability:
All data center elements should be designed to ensure accessibility. The inability
of users to access data can have a significant negative impact on a business.
Security:
Policies, procedures, and proper integration of the data center core elements
that will prevent unauthorized access to information must be established.
In addition to the security measures for client access, specific
mechanisms must enable servers to access only their allocated resources on
storage arrays.
Scalability:
Data center operations should be able to allocate additional processing
capabilities or storage on demand, without interrupting business operations.
Business growth often requires deploying more servers, new applications, and
additional databases. The storage solution should be able to grow with the
business.
Performance:
All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed.
The infrastructure should be able to support performance requirements.
Data integrity:
Data integrity refers to mechanisms such as error correction codes or parity
bits which ensure that data is written to disk exactly as it was received.
Any variation in data during its retrieval implies corruption, which may
affect the operations of the organization.
Capacity:
Data center operations require adequate resources to store and process large
amounts of data efficiently.
When capacity requirements increase, the data center must be able to provide additional capacity without interrupting availability or, at the very least, with minimal disruption.
Manageability:
A data center should perform all operations and activities in the most efficient manner.
Manageability can be achieved through automation and the reduction of
human (manual) intervention in common tasks.
MANAGING STORAGE INFRASTRUCTURE:
Managing a modern, complex data center involves many tasks.
Monitoring is the continuous collection of information and the review of the
entire data center infrastructure.
Reporting is done periodically on resource performance, capacity, and utilization.
Reporting tasks help to establish business justifications and chargeback of
costs associated with data center operations.
Provisioning is the process of providing the hardware, software, and other resources
needed to run a data center.
Capacity planning ensures that the user’s and the application’s future
needs will be addressed in the most cost-effective and controlled
manner.
Resource planning is the process of evaluating and identifying
required resources, such as personnel, the facility (site), and the
technology. Resource planning ensures that adequate resources are
available to meet user and application requirements.
2 Elaborate on the Evolution of storage technology and architecture in detail. K1 CO1
Organizations originally had centralized computers (mainframes) and information storage devices (tape reels and disk packs) in their data centers.
Over time, storage evolved through architectures such as DAS, SAN, and so on.
Direct-attached storage (DAS):
This type of storage connects directly to a server (host) or a group of servers in a
cluster. Storage can be either internal or external to the server. External DAS
alleviated the challenges of limited internal storage capacity.
3 Describe in detail about the Information Lifecycle Management system. K2 CO1
An ILM strategy includes four activities: classifying, implementing, managing, and organizing:
■ Classifying data and applications on the basis of business rules and policies to enable differentiated treatment of information
■ Implementing policies by using information management tools, starting from the creation of data and ending with its disposal
■ Managing the environment by using integrated tools to reduce operational complexity
■ Organizing storage resources in tiers to align the resources with data classes, and storing information in the right type of infrastructure
Step 1:
The goal is to implement a storage networking environment. Storage architectures
offer varying levels of protection and performance and this acts as a foundation for
future policy-based information management in Steps 2 and 3.
Step 2:
Takes ILM to the next level, with detailed application or data classification and linkage
of the storage infrastructure to business policies.
This classification and the resultant policies can be automatically executed using tools
for one or more applications, resulting in better management and optimal allocation
of storage resources.
Step 3: The implementation automates more of the application or data classification and policy management activities in order to scale to a wider set of enterprise applications.
Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO
Colocation data centers function as a kind of rental property where the space and
resources of a data center are made available to the people willing to rent it.
Managed service data centers offer aspects such as data storage, computing, and other
services as a third party, serving customers directly.
Cloud data centers are distributed and are sometimes offered to customers with the help
of a third-party managed service provider.
BUILDING BLOCKS OF A DATA CENTER :
Data centers are made up of three primary types of components: compute, storage, and network. Apart from these components, support infrastructure is essential to meeting the service-level agreements of an enterprise data center.
• Data centers must use processors that are best suited for the task, e.g. general-
purpose CPUs may not be the best choice to solve artificial intelligence (AI) and
machine learning (ML) problems.
Data Center Storage
Data centers host large quantities of sensitive information, both for their own purposes and for the needs of their customers. Decreasing costs of storage media increase the amount of storage available for backing up data either locally, remotely, or both.
Advancements in non-volatile storage media lower data access times.
In addition, software-defined storage technologies increase staff efficiency in managing a storage system.
Data Center Networks
Data center network equipment includes the cabling, switches, routers, and firewalls that connect servers to each other and to the outside world. Properly configured and structured, they can manage high volumes of traffic without compromising performance.
A typical three-tier network topology is made up of core switches at the edge
connecting the data center to the Internet and a middle aggregate layer that
connects the core layer to the access layer where the servers reside.
Advancements, such as hyper-scale network security and software-defined
networking, bring cloud-level agility and scalability to on-premises networks.
A software-defined data center (SDDC) is an IT-as-a-Service (ITaaS) platform that serves an organization's software, infrastructure, or platform needs.
An SDDC can be housed on-premises, at an MSP, or in private, public, or hosted clouds.
Like traditional data centers, SDDCs also host servers, storage devices, network
equipment, and security devices. You can manage SDDCs from any location, using
remote APIs and Web browser interfaces. SDDCs also make extensive use of
automation capabilities to:
• Reduce IT resource usage
• Provide automated deployment and management for many core
functions
Automation operates around the clock, reducing the need for IT manpower. Remote management and automation are delivered via a software platform accessible from any suitable location, via APIs or Web browser access.
Benefits of SDDCs
Business agility
An SDDC offers several benefits that improve business agility with a focus on three key
areas:
• Balance
• Flexibility
• Adaptability
Reduced cost
• In general, it costs less to operate an SDDC than to house data in brick-and-mortar data centers.
• Cloud SDDCs operate similarly to SaaS platforms that charge a recurring monthly
cost.
• This is usually an affordable rate, making an SDDC accessible to all types of
businesses, even those who may not have a big budget for technology
spending.
Increased scalability
By design, cloud SDDCs can easily expand along with your business. Increasing
your storage space or adding functions is usually as easy as contacting the data
facility to get a revised monthly service quote.
UNIT-II
Components of an intelligent storage system, components, addressing and performance of hard disk drives and solid-state drives, RAID, types of intelligent storage systems, scale-up and scale-out storage architecture.
An I/O request from the host at the front-end port is processed through a cache
and the back end, to enable storage and retrieval of data from the physical
disk. A read request can be serviced directly from the cache if the requested
data is found in the cache.
FRONT END
• The front end provides the interface between the storage system and the
host. It consists of two components: front-end ports and front-end
controllers.
• The front-end ports enable hosts to connect to the intelligent storage system.
Each
front-end port has processing logic that executes the appropriate transport
protocol, such as SCSI, Fibre Channel, or iSCSI, for storage connections.
Front-end controllers route data to and from the cache via the internal data bus.
When the cache receives write data, the controller sends an acknowledgment
message back to the host. Controllers optimize I/O processing by using command
queuing algorithms.
CACHE
• The cache is semiconductor memory where data is placed temporarily to reduce the
time required to service I/O requests from the host.
• Accessing data from the cache takes less than a millisecond. Write data is placed
in the cache and then written to disk. After the data is securely placed in the cache,
the host is acknowledged immediately.
Structure of Cache:
✓ The cache is organized into pages or slots; a page is the smallest unit of cache allocation.
The size of a cache page is configured according to the application I/O size. The
cache consists of the data store and tag RAM.
The data store holds the data while tag RAM tracks the location of the data in
the data store and disk.
Entries in tag RAM indicate where data is found in cache and where the data
belongs on the disk. Tag RAM includes a dirty bit flag, which indicates whether
the data in cache has been committed to the disk or not.
It also contains time-based information, such as the time of last access, which
is used to identify cached information that has not been accessed for a long
period and may be freed up.
Cache Implementation
The cache can be implemented as either a dedicated cache or a global cache. With a
dedicated cache, separate sets of memory locations are reserved for reads and writes.
In the global cache, both reads and writes can use any of the available memory
addresses. Cache management is more efficient in a global cache implementation, as
only one global set of addresses has to be managed.
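To make the tag RAM bookkeeping concrete, here is a minimal sketch; the dictionary layout and every name in it are simplifying assumptions, not a real array's implementation:

```python
# Hypothetical tag RAM sketch: each cache page records where the data
# lives on disk, a dirty bit, and the time of last access.
import time

tag_ram = {}     # cache page id -> {"disk_addr": ..., "dirty": ..., "last_access": ...}
data_store = {}  # cache page id -> the cached data itself

def cache_write(page, disk_addr, data):
    data_store[page] = data
    tag_ram[page] = {
        "disk_addr": disk_addr,   # where the data belongs on disk
        "dirty": True,            # not yet committed (de-staged) to disk
        "last_access": time.time(),
    }

def destage(page, disk):
    entry = tag_ram[page]
    disk[entry["disk_addr"]] = data_store[page]   # commit the data to disk
    entry["dirty"] = False                        # clear the dirty bit

disk = {}
cache_write(0, disk_addr=4096, data=b"payload")
destage(0, disk)
print(tag_ram[0]["dirty"], disk[4096])  # False b'payload'
```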
BACK END:
• The back end provides an interface between cache and the physical
disks. It consists of two components: back-end ports and back-end controllers.
• The back end controls data transfers between cache and the physical disks.
From cache,
data is sent to the back end and then routed to the destination disk. Physical
disks are connected to ports on the back end.
• The back-end controller communicates with the disks when performing
reads and writes
and also provides additional, but limited, temporary data storage.
PHYSICAL DISK:
A physical disk stores data persistently. Disks are connected to the back end with either a SCSI or a Fibre Channel interface.
An intelligent storage system enables the use of a mixture of SCSI or Fibre Channel drives and IDE/ATA drives.
Logical Unit Number
For example, without the use of LUNs, a host requiring only 200 GB could be
allocated an entire 1TB physical disk. Using LUNs, only the required 200 GB
would be allocated to the host, allowing the remaining 800 GB to be allocated to
other hosts.
The capacity of a LUN can be expanded by aggregating other LUNs with it. The result of this aggregation is a larger capacity LUN, known as a meta-LUN. The mapping of LUNs to their physical location on the drives is managed by the operating environment of an intelligent storage system.
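As a rough illustration of the 200 GB example above, here is a hypothetical sketch of carving LUNs out of a single disk; the structures and names are invented for demonstration:

```python
# Hypothetical sketch: carving LUNs out of a 1 TB physical disk so that
# hosts receive only the capacity they need (sizes in GB).
DISK_CAPACITY_GB = 1000

luns = []          # each LUN: (lun_id, offset_gb, size_gb)
next_offset = 0

def allocate_lun(lun_id, size_gb):
    """Allocate a LUN of size_gb from the remaining disk space."""
    global next_offset
    if next_offset + size_gb > DISK_CAPACITY_GB:
        raise ValueError("not enough free capacity on this disk")
    luns.append((lun_id, next_offset, size_gb))
    next_offset += size_gb

allocate_lun("LUN0", 200)   # host A gets only the 200 GB it requires
allocate_lun("LUN1", 800)   # the remaining 800 GB can serve other hosts
print(luns)  # [('LUN0', 0, 200), ('LUN1', 200, 800)]
```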
A disk drive uses a rapidly moving arm to read and write data across a flat platter
coated with magnetic particles. Data is transferred from the magnetic platter
through the R/W head to the computer.
Several platters are assembled together with the R/W head and controller, most commonly referred to as a hard disk drive (HDD).
Key components of a disk drive are the platter, spindle, read/write head, actuator arm assembly, and controller.
PLATTER:
A typical HDD consists of one or more flat circular disks called platters (Figure
2-3). The data is recorded on these platters in binary codes (0s and 1s).
The set of rotating platters is sealed in a case, called a Head Disk Assembly
(HDA). A platter is a rigid, round disk coated with magnetic material on both
surfaces (top and bottom).
The data is encoded by polarizing the magnetic area, or domains, of the disk
surface. Data can be written to or read from both surfaces of the platter.
The number of platters and the storage capacity of each platter determine the
total capacity of the drive.
SPINDLE
✓ A spindle connects all the platters, as shown in Figure 2-3, and is connected to a motor. The spindle motor rotates at a constant speed.
✓ The disk platter spins at a speed of several thousands of
revolutions per minute (rpm). Disk drives have spindle speeds of
7,200 rpm, 10,000 rpm, or 15,000 rpm. Disks used on current
storage systems have a platter diameter of 3.5” (90 mm).
✓ When the platter spins at 15,000 rpm, the outer edge is moving at
around 25
percent of the speed of sound.
READ/WRITE HEAD
3 Describe the two types of RAID implementation and Array Components in detail. K2 CO2
Software RAID
✓ Supported features: Software RAID does not support all RAID levels.
Hardware RAID
✓ The RAID Controller interacts with the hard disks using a PCI bus.
Manufacturers also integrate RAID controllers on motherboards. This integration reduces the overall cost of the system, but does not provide the flexibility required for high-end storage systems.
✓ The number of HDDs in a logical array depends on the RAID level used.
Configurations could have a logical array with multiple physical arrays or
a physical array with multiple logical arrays.
Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO
1 i) Discuss the steps involved in various RAID level models. (10) K2 CO2
ii) Explain the read and write operations performed in cache memory. (5) K2 CO2
i)RAID levels are defined based on striping, mirroring, and parity techniques. These
techniques determine the data availability and performance characteristics of an array.
RAID 0: Striping
• RAID 0, also known as a striped set or a striped volume, requires a minimum of
two disks. The disks are merged into a single large volume where data is stored
evenly across the number of disks in the array.
• This process is called disk striping and involves splitting data into blocks and writing them simultaneously or sequentially across multiple disks (a block-mapping sketch follows the lists below). Therefore, RAID 0 is generally implemented to improve speed and efficiency.
Advantages of RAID 0
• Cost-efficient and straightforward to implement.
• Increased read and write performance.
• No overhead (total capacity use).
Disadvantages of RAID 0
• Doesn't provide fault tolerance or redundancy.
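The sketch below makes the block-to-disk mapping concrete; the disk count is an arbitrary assumption:

```python
# Hypothetical RAID 0 sketch: map a logical block number to a
# (disk index, stripe row) pair using round-robin striping.
NUM_DISKS = 4

def locate(logical_block):
    disk = logical_block % NUM_DISKS   # which disk holds the block
    row = logical_block // NUM_DISKS   # position of the block on that disk
    return disk, row

# Blocks 0..7 land on disks 0,1,2,3,0,1,2,3 - reads and writes spread evenly.
print([locate(b) for b in range(8)])
```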
RAID 1: Mirroring
✓ RAID 1 is an array consisting of at least two disks where the same data is stored on
each to ensure redundancy. The most common use of RAID 1 is setting up a
mirrored pair consisting of two disks in which the contents of the first disk is
mirrored in the second. This is why such a configuration is also called mirroring.
Advantages of RAID 1
• Increased read performance.
• Provides redundancy and fault tolerance.
• Simple to configure and easy to use.
Disadvantages of RAID 1
• Uses only half of the storage capacity.
• More expensive (needs twice as many drives).
• Requires powering down your computer to replace the failed drive.
RAID 2
It combines bit-level striping with error checking and information correction. This RAID implementation requires two groups of disks: one for writing the data and another for writing error correction codes. RAID 2 also requires a special controller for the synchronized spinning of all disks.
Advantages of RAID 2
• Reliability.
• The ability to correct stored information.
Disadvantages of RAID 2
• Expensive.
• Difficult to implement.
• Require entire disks for ECC.
Advantages of RAID 3
• Good throughput when transferring large amounts of data.
• High efficiency with sequential operations.
• Disk failure resiliency.
Disadvantages of RAID 3
• Not suitable for transferring small files.
• Complex to implement.
• Difficult to set up as software RAID.
Disadvantages of RAID 4
RAID 5: Striping with distributed parity
Parity bits are distributed evenly on all disks after each sequence of data has been saved.
Advantages of RAID 5
• High performance and capacity.
• Fast and reliable read speed.
• Tolerates single drive failure.
Disadvantages of RAID 5
• Longer rebuild time.
• Loses one disk's worth of capacity to parity.
• If more than one disk fails, data is lost.
• More complex to implement.
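The distributed parity that RAID 5 relies on is an XOR across the data blocks of a stripe. The sketch below is a generic illustration of that idea, not any particular controller's logic:

```python
# Parity in RAID 5 is computed as the XOR of the data blocks in a stripe.
# If one disk fails, its block is rebuilt by XOR-ing the survivors.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

d0, d1, d2 = b"\x01\x02", b"\x0f\x10", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# Simulate losing d1: XOR of the remaining blocks and parity recovers it.
recovered = xor_blocks([d0, d2, parity])
assert recovered == d1
print(recovered)  # b'\x0f\x10'
```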
Advantages of RAID 6
• High fault and drive-failure tolerance.
Disadvantages of RAID 6
• Rebuild time can take up to 24 hours.
• Complex to implement.
• More expensive.
Advantages of RAID 10
• High performance.
• High fault-tolerance.
• Fast read and write operations.
• Fast rebuild time.
Disadvantages of RAID 10
• Limited scalability.
• Costly (compared to other RAID levels).
• Uses half of the disk space capacity.
• More complicated to set up.
ii) Read Operation with Cache
✓ When a host issues a read request, the front-end controller accesses the tag RAM to determine whether the required data is available in the cache.
✓ If the requested data is found in the cache, it is called a read cache hit
or read hit and data is sent directly to the host, without any disk
operation. This provides a fast response time to the host (about a
millisecond).
✓ If the requested data is not found in the cache, it is called a cache miss
and the data must be read from the disk.
✓ The back-end controller accesses the appropriate disk and retrieves the
requested data. Data is then placed in the cache and is finally sent to the
host through the front-end controller. Cache misses increase I/O
response time.
✓ A pre-fetch, or read-ahead, algorithm is used when read requests are
sequential. In a sequential read request, a contiguous set of associated
blocks is retrieved. Several other blocks that have not yet been
requested by the host can be read from the disk and placed into the
cache in advance.
✓ The intelligent storage system offers fixed and variable pre-fetch sizes.
✓ In fixed pre-fetch, the intelligent storage system pre-fetches a fixed
amount of data. It is most suitable when I/O sizes are uniform.
In variable pre-fetch, the storage system pre-fetches an amount of data in
multiples of the size of the host request.
✓ Read performance is measured in terms of the read hit ratio, or the
hit rate, usually expressed as a percentage.
This ratio is the number of read hits with respect to the total number of read requests. A
higher read-hit ratio improves the read performance.
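A small simulation of the read path and the hit-ratio bookkeeping might look like the following sketch; the dictionary cache and all names are hypothetical simplifications:

```python
# Hypothetical read-cache sketch: count read hits and misses and
# report the read hit ratio described above.
cache = {}          # block number -> data (stands in for cache pages)
hits = misses = 0

def read_block(block, disk):
    global hits, misses
    if block in cache:            # read cache hit: served without disk I/O
        hits += 1
    else:                         # read cache miss: fetch from disk, then cache
        misses += 1
        cache[block] = disk[block]
    return cache[block]

disk = {b: f"data-{b}" for b in range(100)}
for b in [1, 2, 1, 3, 1, 2]:
    read_block(b, disk)

print(f"read hit ratio: {hits / (hits + misses):.0%}")  # 3 hits / 6 reads = 50%
```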
Write operations with cache provide performance advantages over writing directly
to disks. When an I/O is written to the cache and acknowledged, it is completed in
far less time (from the host’s perspective) than it would take to write directly to
disk.
• Write-back cache: Data is placed in the cache and an acknowledgment is sent to
the host immediately. Later, data from several writes are committed (de-staged) to
the disk. Write response times are much faster, as the write operations are isolated
from the mechanical delays of the disk. However, uncommitted data is at risk of
loss in the event of cache failures.
• Write-through cache: Data is placed in the cache and immediately written to the
disk, and an acknowledgment is sent to the host. Because data is committed to disk
as it arrives, the risks of data loss are low but write response time is longer because
of the disk operations.
The cache can be bypassed under certain conditions, such as very large size write I/O.
In this implementation, if the size of an I/O request exceeds the pre-defined size, called
write aside size, writes are sent to the disk directly to reduce the impact of large writes
consuming a large cache area.
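The three write behaviors can be combined in one illustrative sketch; the threshold value and function names are assumptions for demonstration:

```python
# Hypothetical write-path sketch combining write-back, write-through,
# and the write-aside bypass for large I/Os.
WRITE_ASIDE_SIZE = 64 * 1024   # illustrative threshold in bytes

def handle_write(data, mode="write-back"):
    if len(data) > WRITE_ASIDE_SIZE:
        write_to_disk(data)            # write-aside: bypass cache for large I/Os
    elif mode == "write-through":
        write_to_cache(data)
        write_to_disk(data)            # committed to disk before acknowledging
    else:                              # write-back
        write_to_cache(data)           # acknowledge now; de-stage to disk later
    acknowledge_host()

def write_to_cache(data): print(f"cached {len(data)} bytes")
def write_to_disk(data):  print(f"wrote {len(data)} bytes to disk")
def acknowledge_host():   print("ack sent to host")

handle_write(b"x" * 100)               # small write: cached, acknowledged fast
handle_write(b"x" * (128 * 1024))      # large write: sent directly to disk
```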
UNIT-III
Block-Based Storage System, File-Based Storage System, Object-Based and Unified Storage. Fibre Channel
SAN: Software-defined networking, FC SAN components and architecture, FC SAN topologies, link
aggregation, and zoning, Virtualization in FC SAN environment. Internet Protocol SAN: iSCSI protocol,
network components, and connectivity, Link aggregation, switch aggregation, and VLAN, FCIP protocol,
connectivity, and configuration.
3 What is meant by a file-based storage system? K1 CO3
4 Difference between Multimode fiber (MMF) cable and Single-mode fiber K1 CO3
(SMF).
Multimode fiber cable Single-mode fiber
Part-B(Three Questions) ( 13 Marks)
• This makes using block storage quite similar to storing data on a hard
drive within a server, except the data is stored in a remote location rather
than on local hardware.
• The block size is generally too small to fit an entire piece of data, and
so the data for any particular file is broken up into numerous blocks for
storage.
• Block storage separates data from the limitations of individual user environments. As a result, data can be retrieved through any number of paths to maximize efficiency, with high input/output operations per second (IOPS).
• High efficiency: Block storage’s high IOPS and low latency make it
ideal for applications that demand high performance.
• Large file efficiency: For large files, such as archives and video files, data must be completely rewritten when using file or object storage; block storage can update only the blocks that changed.
• Databases: Block storage is fast, efficient, flexible, and scalable, with
support for redundant volumes. This allows it to support databases,
particularly those that handle a heavy volume of queries and where
latency must be minimized.
Core-Edge Fabric
In the core-edge fabric topology, there are two types of switch tiers in this
fabric.
• The tier at the edge fans out from the tier at the core. The nodes on
the edge can communicate with each other.
• The core tier usually comprises enterprise directors that ensure high
fabric availability.
The host-to-storage traffic has to traverse one and two ISLs in a two-tier and
three-tier configuration, respectively.
This topology increases connectivity within the fabric while conserving overall port utilization. If expansion is required, an additional edge switch can be connected to the core. All hosts are connected to the edge tier, and all storage is connected to the core tier. However, to maintain the topology, it is essential that new ISLs are created to connect each edge switch to any new core switch that is added.
Because each tier's switches are used for either storage or hosts, one can easily identify which resources are approaching their capacity, making it easier to develop a set of rules for scaling and apportioning.
Hop count represents the total number of devices a given piece of data (packet) passes through.
A larger hop count means a greater transmission delay as data traverses from its source to its destination.
Mesh Topology
In a mesh topology, each switch is directly connected to other switches
by using ISLs. This topology promotes enhanced connectivity within the
SAN.
A mesh topology may be one of two types: full mesh or partial mesh. In a full mesh, every switch is connected to every other switch in the topology. A full mesh topology may be appropriate when the number of switches involved is small.
Hosts and storage can be located anywhere in the fabric, and storage
can be localized to a director or a switch in both mesh topologies.
Converged Network Adapters (CNAs)
CNAs eliminate the need to deploy separate adapters and cables for FC and Ethernet communications, thereby reducing the required number of network adapters and switch ports.
A CNA offloads the FCoE protocol processing task from the compute
system, thereby freeing the CPU resources of the compute system for
application processing.
Both FCoE traffic (Ethernet traffic that carries FC data) and regular Ethernet traffic are transferred through supported NICs on the hosts.
FCoE Switch
An FCoE switch has both Ethernet switch and FC switch functionalities.
It has a Fibre Channel Forwarder (FCF), an Ethernet Bridge, and a set of
ports that can be used for FC and Ethernet connectivity.
FCF handles FCoE login requests, applies zoning, and provides the
fabric services typically associated with an FC switch.
It also encapsulates FC frames received from the FC ports into Ethernet frames, and decapsulates Ethernet frames received from the Ethernet Bridge back into FC frames.
Upon receiving the incoming Ethernet traffic, the FCoE switch inspects
the Ethertype of the incoming frames and uses that to determine their
destination.
If the Ethertype of the frame is FCoE, the switch recognizes that the
frame contains an FC payload and then forwards it to the FCF.
From there, the FC frame is extracted from the Ethernet frame and
transmitted to the FC SAN over the FC ports.
If the Ethertype is not FCoE, the switch handles the traffic as regular Ethernet traffic and forwards it over the Ethernet ports.
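The Ethertype inspection described above can be sketched in a few lines. FCoE's registered Ethertype is 0x8906; the frame construction and function names below are illustrative assumptions:

```python
# Sketch of FCoE switch Ethertype inspection: FCoE frames (Ethertype
# 0x8906) go to the Fibre Channel Forwarder (FCF); everything else is
# forwarded as ordinary Ethernet traffic.
import struct

FCOE_ETHERTYPE = 0x8906

def dispatch(frame: bytes) -> str:
    # Ethernet header: 6-byte destination MAC, 6-byte source MAC, 2-byte Ethertype
    # (untagged frame assumed; real FCoE frames usually carry a VLAN tag).
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype == FCOE_ETHERTYPE:
        return "to FCF: extract FC frame, send over FC ports"
    return "to Ethernet bridge: forward over Ethernet ports"

frame = b"\xaa" * 6 + b"\xbb" * 6 + struct.pack("!H", FCOE_ETHERTYPE) + b"payload"
print(dispatch(frame))  # to FCF: extract FC frame, send over FC ports
```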
FCoE ARCHITECTURE
Fibre Channel over Ethernet (FCoE) is a method of supporting
converged Fibre Channel (FC) and Ethernet traffic on a data center
bridging (DCB) network.
An FCoE frame is the same as any other Ethernet frame because the
Ethernet encapsulation provides the header information needed to
forward the frames. However, to achieve the lossless behavior that FC
transport requires, the Ethernet network must conform to DCB
standards.
Part-C ( One Question) ( 15 Marks)
S.No Questions BTL CO
These components can be further broken down into the following key
elements: node ports, cabling, interconnecting devices (such as FC switches
or hubs), storage arrays, and SAN management software
Node Ports
In Fibre Channel, devices such as hosts, storage, and tape libraries are all referred to as nodes. Each node is a source or destination of information for one or more nodes.
Cabling:
SAN implementations use optical fiber cabling. Multimode fiber (MMF) cables are classified based on core diameter: OM1 (62.5 µm), OM2 (50 µm), and laser-optimized OM3 (50 µm).
In an MMF transmission, multiple light beams traveling inside the cable tend
to disperse and collide.
This collision weakens the signal strength after it travels a
certain distance — a process known as modal dispersion.
A single-mode fiber (SMF) carries a single ray of light; the small core and the single light wave limit modal dispersion. Among all types of fibre cables, single-mode provides minimum signal attenuation over a maximum distance (up to 10 km).
MMFs are generally used within data centers for shorter distance runs, while
SMFs are used for longer distances. MMF transceivers are less expensive as
compared to SMF transceivers.
A standard connector (SC) is used for data transmission speeds up to 1 Gb/s, whereas a Lucent connector (LC) is used for speeds up to 4 Gb/s.
Interconnect Devices
Hubs, switches, and directors are the interconnect devices commonly
used in SAN.
Hubs connect nodes in a logical loop; all the nodes must share the bandwidth because data travels through all the connection points. Because of the availability of low-cost and high-performance switches, hubs are no longer used in SANs.
Storage Arrays
The fundamental purpose of a SAN is to provide host access to
storage resources.
Storage arrays provide high availability and redundancy, improved performance, business continuity, and multiple host connectivity.
SAN management software manages the interfaces between hosts,
interconnect devices, and storage arrays.
FC ARCHITECTURE
Channels provide high levels of performance with low protocol overhead. Such performance is due to the static nature of channels and the high level of hardware and software integration provided by the channel technologies.
Unit-IV
Introduction to Business Continuity, Backup architecture, Backup targets and methods, Data deduplication,
Cloud-based and mobile device backup, Data archive, Uses of replication and its characteristics, Compute
based, storage-based, and network-based replication, Data migration, Disaster Recovery as a Service (DRaaS).
Part-B( Three Questions) ( 13 Marks)
Backup Methods:
1. Hot backup
2. Cold backup
Hot backup
A hot backup is taken while the database is up, running, and available to users.
Cold backup
A cold backup is taken while the database is shut down, and it serves as the starting point for a point-in-time recovery. All archive log files necessary would be applied to the database once it is restored from the cold backup. Cold backups are useful if your business requirements allow for a shut-down window to back up the database. If your database is very large or you have 24*7 processing, cold backups are not an option.
The other major difference between hot and cold backups is that before a table space can be backed up, the database must be informed when a backup is starting and when it is complete.
BACKUP ARCHITECTURE:
A client-server architecture is used in the backup process. Multiple clients connect to the backup server, and the server machine is used as the backup machine.
The server manages the backup process and also maintains the log and the backup catalog. The backup catalog contains information about the backup process and backup metadata.
The backup server depends on backup clients to gather the data to be backed up. The backup server receives backup metadata from the clients to perform its activity.
A storage node is responsible for writing data to the backup device. A storage node is a machine that is connected to a backup server and to one or more devices used in the backup process.
Devices attached to storage nodes are called remote devices because they are not physically attached to the controlling backup server.
The storage node runs special backup software that controls the devices. The data stored on media in remote devices is tracked in the media database and in online client file indexes on the controlling backup server.
A Backup process
A backup server is a type of server that enables the backup of data, files,
applications and/or databases on a specialized in-house or remote server. It
combines hardware and software technologies that provide backup storage and
services to connected computers, servers or related devices.
The organization decides the backup policy, and the backup server takes backups based on that policy. The backup server sends a request to the backup client, and the backup client then sends metadata and data to the backup server. The backup server writes the received metadata to the catalog. After the backup is taken, the storage node disconnects from the backup device.
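The catalog bookkeeping in this flow might look like the sketch below; all structures and names are hypothetical:

```python
# Hypothetical sketch of the backup flow: the client sends metadata with
# its data, and the server records the metadata in the backup catalog.
from datetime import datetime, timezone

catalog = []   # the backup catalog: one metadata record per backed-up file

def backup_file(client, path, data):
    metadata = {
        "client": client,
        "path": path,
        "size": len(data),
        "time": datetime.now(timezone.utc).isoformat(),
    }
    catalog.append(metadata)        # server writes metadata to the catalog
    storage_node_write(data)        # storage node writes data to the device

def storage_node_write(data):
    print(f"storage node wrote {len(data)} bytes to the backup device")

backup_file("client-01", "/var/db/orders.db", b"\x00" * 2048)
print(catalog[0]["path"], catalog[0]["size"])  # /var/db/orders.db 2048
```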
2 Describe about various Data replication process. K1 CO4
Data replication is the process of making multiple copies of data and storing them at
different locations for backup purposes, fault tolerance and to improve their overall
accessibility across a network.
Similar to data mirroring, data replication can be applied to both individual
computers and servers. The data replicates can be stored within the same system,
on-site and off-site hosts, and cloud-based hosts.
Common database technologies today either have built-in capabilities, or use third-
party tools to accomplish data replication. While Oracle Database and Microsoft
SQL actively support data replication, some traditional technologies may not
include this feature out of the box.
Data replication can either be synchronous, meaning that any changes made to the original data are copied to the replicas in real time, or asynchronous, meaning replication is initiated only after the commit statement is passed to the database, so replicas may briefly lag the source.
Despite the additional storage requirements, businesses widely use this database management technique to achieve one or more of the following goals:
Improve the availability of data
Increase the speed of data access
Enhance server performance
Accomplish disaster recovery
Improve the availability of data
When a particular system experiences a technical glitch due to malware or a faulty
hardware component, the data can still be accessed from a different site or node.
Data replication enhances the resilience and reliability of systems by storing data at
multiple nodes across the network.
Increase data access speed:
In organizations where there are multiple branch offices spread across the globe,
users may experience some latency while accessing data from one country to
another. Placing replicas on local servers provides users with faster data access and
query execution times.
Enhance server performance:
Database replication effectively reduces the load on the primary server by dispersing
it among other nodes in the distributed system, thereby improving network
performance. By routing all read-operations to a replica database, IT administrators
can save the primary server for write-operations that demand more processing
power.
Accomplish Disaster recovery:
Businesses are often susceptible to data loss due to a data breach or hardware
malfunction. During such a catastrophe, the employees' valuable data, along with
client information can be compromised. Data replication facilitates the recovery of
data which is lost or corrupted by maintaining accurate backups at well-monitored
locations, thereby contributing to enhanced data protection.
Working of data replication:
Modern day applications use a distributed database in the back end, where
data is stored and processed using a cluster of systems, instead of relying on
one particular system for the same.
Let us assume that a user of an application wishes to write a piece of data to
the database. This data gets split into multiple fragments, with each fragment
getting stored on a different node across the distributed system. The database
technology is also responsible for gathering and consolidating the different
fragments when a user wants to retrieve or read the data.
In such an arrangement, a single system failure can inhibit the retrieval of the
entire data. This is where data replication saves the day. Data replication
technology can store multiple fragments at each node to streamline read and
write operations across the network.
Data replication tools ensure that complete data can still be consolidated
from other nodes across the distributed system during the event of a system
failure.
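A toy version of this fragment placement might look like the following; the node list, replication factor, and helper names are assumptions for illustration:

```python
# Hypothetical sketch: store each data fragment on several nodes so a
# read can still consolidate the full data after a single node failure.
NODES = ["node-a", "node-b", "node-c"]
REPLICATION_FACTOR = 2

store = {node: {} for node in NODES}   # node -> {fragment index: fragment}

def write(fragments):
    for i, fragment in enumerate(fragments):
        # place each fragment on REPLICATION_FACTOR distinct nodes
        for r in range(REPLICATION_FACTOR):
            node = NODES[(i + r) % len(NODES)]
            store[node][i] = fragment

def read(num_fragments, failed=()):
    pieces = {}
    for node in NODES:
        if node not in failed:
            pieces.update(store[node])          # gather surviving fragments
    return b"".join(pieces[i] for i in range(num_fragments))

write([b"alpha-", b"beta-", b"gamma"])
print(read(3, failed=("node-b",)))  # b'alpha-beta-gamma' despite the failure
```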
Types of data replication
Depending on data replication tools employed, there are multiple types of
replication practiced by businesses today. Some of the popular replication modes are
as follows:
Full table replication
Transactional replication
Snapshot replication
Merge replication
Key-based incremental replication
Full table replication
Full table replication means that the entire data is replicated. This includes new,
updated as well as existing data that is copied from source to the destination. This
method of replication is generally associated with higher costs since the processing
power and network bandwidth requirements are high.
However, full table replication can be beneficial when it comes to the recovery of hard-deleted data, as well as data that does not possess replication keys, discussed further below.
Transactional replication
In this method, the data replication software makes full initial copies of data from origin to destination, following which the subscriber database receives updates whenever data is modified. This is a more efficient mode of replication since fewer rows are copied each time data is changed. Transactional replication is usually found in server-to-server environments.
Snapshot replication
In Snapshot replication, data is replicated exactly as it appears at any given time.
Unlike other methods, Snapshot replication does not pay attention to the changes
made to data. This mode of replication is used when changes made to data tend to be infrequent; for example, performing initial synchronizations between publishers and subscribers.
Merge replication
This type of replication is commonly found in server-to-client environments and
allows both the publisher and subscriber to make changes to data dynamically. In
merge replication, data from two or more databases are combined to form a single
database thereby contributing to the complexity of using this technique.
Key-based incremental replication
Also called key-based incremental data capture, this technique only copies data
changed since the last update. Keys can be looked at as elements that exist within
databases that trigger data replication. Since only a few rows are copied during each
update, the costs are significantly low.
However, the drawback lies in the fact that this replication mode cannot be used to
recover hard deleted data, since the key value is also deleted along with the record.
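A minimal key-based incremental pass might look like this sketch, assuming an increasing "updated" column serves as the replication key:

```python
# Hypothetical key-based incremental replication: copy only rows whose
# replication key (an increasing 'updated' value) exceeds the watermark
# recorded by the previous run.
source = [
    {"id": 1, "updated": 100, "name": "alice"},
    {"id": 2, "updated": 105, "name": "bob"},
    {"id": 3, "updated": 112, "name": "carol"},
]
replica, last_watermark = [], 100   # rows up to 100 were copied already

changed = [row for row in source if row["updated"] > last_watermark]
replica.extend(changed)
last_watermark = max(row["updated"] for row in changed)

print([r["name"] for r in replica], last_watermark)  # ['bob', 'carol'] 112
```

A hard-deleted row simply disappears from the source and never rises above the watermark, which is why this mode cannot recover such data.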
3 Explain in detail about Disaster Recovery As A Service (DRAAS) K1 CO4
DISASTER RECOVERY AS A SERVICE (DRaaS)
Disaster recovery as a service (DRaaS) is a cloud computing service model that allows an organization to back up its data and IT infrastructure in a third-party cloud computing environment and provides all the DR orchestration, all through a SaaS solution, to regain access and functionality to IT infrastructure after a disaster.
Disaster recovery planning is critical to business continuity. Many disasters that
have the potential to wreak havoc on an IT organization have become more frequent
in recent years:
Natural disasters such as hurricanes, floods, wildfires and earthquakes
Equipment failures and power outages
Cyberattacks.
Models of DRaaS:
Organizations may choose to hand over all or part of their disaster recovery
planning to a DRaaS provider. There are many different disaster recovery as a
service providers to choose from, with three main models:
Managed DRaaS:
In a managed DRaaS model, a third party takes over all responsibility for disaster
recovery. Choosing this option requires an organization to stay in close contact with
their DRaaS provider to ensure that it stays up to date on all infrastructure,
application and services changes. If you lack the expertise or time to manage your
own disaster recovery, this may be the best option for you.
Assisted DRaaS:
If you prefer to maintain responsibility for some aspects of your disaster
recovery plan, or if you have unique or customized applications that might be
challenging for a third party to take over, assisted DRaaS might be a better option.
In this model, the service provider offers its expertise for optimizing disaster
recovery procedures, but the customer is responsible for implementing some or all
of the disaster recovery plan.
Self-service DRaaS:
The least expensive option is self-service DRaaS, where the customer is responsible
for the planning, testing and management of disaster recovery, and the customer
hosts its own infrastructure backup on virtual machines in a remote location. Careful
planning and testing are required to make sure that processing can fail over to the
virtual servers instantly in the event of a disaster. This option is best for those who
have experienced disaster recovery experts on staff.
Database Migration:
Databases are data storage media where data is structured in an organized way.
Databases are managed through database management systems (DBMS). Hence,
database migration involves moving from one DBMS to another or upgrading from
the current version of a DBMS to the latest version of the same DBMS. The former
is more challenging especially if the source system and the target system use
different data structures.
Application Migration:
Application migration occurs when an organization goes through a change in
application software or changes an application vendor. This migration requires
moving data from one computing environment to another. A new application
platform may require radical transformation due to new application interactions after
the migration. The database that the application uses will need to be relocated
sometimes even modified in format to fit a new data model via data conversion
along with the files and directory structure the application requires installing and
running.
Cloud migration:
Much like two other types of data migration, storage migration and application migration, this type of data migration involves moving data or applications. The key aspect is that cloud data migration refers specifically to transferring data or applications from a private, on-premises datacenter to the cloud or from one cloud environment to another. The extent of the migration will vary.
The data migration process can also follow the ETL process:
Extraction of data
Transformation of data
Loading data
ETL tools can manage the complexities of the data migration process from
processing huge datasets, profiling, and integration of multiple application
platforms.
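The ETL flow can be summarized in a short sketch; the transformation rule and all names are illustrative assumptions:

```python
# Hypothetical ETL sketch for a data migration: extract rows from the
# source, transform them to the target's data model, then load them.
def extract(source):
    return list(source)                       # read rows from the source system

def transform(rows):
    # example rule: rename a column and normalize case for the target model
    return [{"full_name": r["name"].title(), "id": r["id"]} for r in rows]

def load(rows, target):
    target.extend(rows)                       # write rows into the target system

legacy = [{"id": 1, "name": "ada lovelace"}, {"id": 2, "name": "alan turing"}]
migrated = []
load(transform(extract(legacy)), migrated)
print(migrated[0])  # {'full_name': 'Ada Lovelace', 'id': 1}
```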
The data migration process remains the same whether a big bang approach or a
trickle approach is adopted.
Unit-V
Information security goals, Storage security domains, Threats to a storage infrastructure, Security controls to
protect a storage infrastructure, Governance, risk, and compliance, Storage infrastructure management
functions, Storage infrastructure management processes.
Part-B( Three Questions) ( 13 Marks)
1. Data Classification
Categories: Data is classified based on sensitivity and importance, such as public,
internal, confidential, and restricted. This classification guides security measures
and access controls.
Labeling: Proper labeling of data helps in enforcing security policies and ensuring
that sensitive information is handled appropriately.
2. Access Control
User Authentication: Ensuring only authorized users can access storage systems.
This may involve multi-factor authentication (MFA) or biometric methods.
Role-Based Access Control (RBAC): Users are granted access based on their roles,
limiting their ability to access sensitive data unnecessarily.
Access Logs: Monitoring and logging access to storage systems to detect
unauthorized access or anomalies.
3. Data Encryption
At-Rest Encryption: Encrypting data stored on physical storage devices to protect it
from unauthorized access.
In-Transit Encryption: Securing data during transfer using protocols like SSL/TLS
to prevent interception.
Key Management: Proper management of encryption keys to ensure they are secure
and accessible only to authorized users.
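As a concrete illustration of at-rest encryption and key handling, here is a minimal sketch using the third-party cryptography package; the choice of library and the workflow are assumptions, and in production the key would come from a key management system:

```python
# Minimal at-rest encryption sketch using the 'cryptography' package
# (pip install cryptography). The key must itself be stored securely,
# e.g. in a key management system, never beside the encrypted data.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, fetched from a KMS
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"confidential customer record")
# Without the key, the stored ciphertext is unreadable.
print(cipher.decrypt(ciphertext))  # b'confidential customer record'
```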
4. Data Integrity
Checksums and Hashing: Using techniques to verify that data has not been altered
or corrupted over time.
Audit Trails: Maintaining logs of all changes to data to ensure accountability and
traceability.
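The checksum idea can be illustrated with SHA-256 from the Python standard library; the data and workflow here are hypothetical:

```python
# Data integrity check via hashing: recompute the digest on read and
# compare it with the digest recorded when the data was written.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

stored_data = b"quarterly financial report"
recorded = digest(stored_data)          # saved alongside the data at write time

# Later, any alteration or corruption changes the digest.
assert digest(stored_data) == recorded              # intact
assert digest(b"tampered report") != recorded       # change detected
```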
5. Physical Security
Location Security: Ensuring that storage devices are housed in secure environments,
such as data centers with controlled access.
Environmental Controls: Implementing measures to protect against physical threats,
such as fire, water damage, or power failure.
6. Backup and Recovery
Regular Backups: Establishing protocols for regularly backing up data to prevent
loss in case of a breach or hardware failure.
Disaster Recovery Plans: Creating plans for restoring data and operations in the
event of a security incident or catastrophic failure.
7. Compliance and Governance
Regulatory Requirements: Adhering to relevant laws and regulations (e.g., GDPR,
HIPAA) that dictate how data must be stored and protected.
Policy Development: Establishing and enforcing policies regarding data storage and
security, including incident response procedures.
8. Network Security
Firewalls and Intrusion Detection Systems: Implementing network security
measures to protect data storage from unauthorized access over the network.
Segmentation: Isolating storage networks from other parts of the organization’s
infrastructure to reduce risk.
9. Virtualization and Cloud Security
Virtual Storage Management: Understanding the security implications of using virtualized storage solutions.
Cloud Security Controls: Applying security measures specific to cloud storage
environments, including vendor assessments and shared responsibility models.
2 Discuss the various methods to secure the backup, recovery, and archive the K1 CO5
information.
Database backup is the same as any other data backup: taking a copy of the data and
then storing it on a different medium in case of failure or damage to the original.
The simplest case of a backup involves shutting down the database to ensure that no
further transactions occur, and then simply backing it up. You can then recreate the
database if it becomes damaged or corrupted in some way.
The recreation of the database is called recovery. Version recovery is the restoration
of a previous version of the database, using an image that was created during a
backup operation. Rollforward recovery is the reapplication of transactions recorded
in the database log files after a database or a table space backup image has been
restored.
Crash recovery is the automatic recovery of the database if a failure occurs before
all of the changes that are part of one or more units of work (transactions) are
completed and committed. This is done by rolling back incomplete transactions and
completing committed transactions that were still in memory when the crash
occurred.
Recovery log files and the recovery history file are created automatically when a database is created. These log files are important if you need to recover data that is lost or damaged.
Each database includes recovery logs, which are used to recover from application or
system errors. In combination with the database backups, they are used to recover
the consistency of the database right up to the point in time when the error occurred.
The recovery history file contains a summary of the backup information that can be
used to determine recovery options, if all or part of the database must be recovered
to a given point in time. It is used to track recovery-related events such as backup
and restore operations, among others. This file is located in the database directory.
The table space change history file, which is also located in the database directory,
contains information that can be used to determine which log files are required for
the recovery of a particular table space.
You cannot directly modify the recovery history file or the table space change
history file; however, you can delete entries from the files using the PRUNE
HISTORY command. You can also use the rec_his_retentn database configuration
parameter to specify the number of days that these history files will be retained.
Securing backup, recovery, and archival information is crucial for data integrity and
business continuity. Here are various methods to achieve this:
Backup Security
Encryption: Encrypt data before it is backed up. This ensures that even if backups
are compromised, the data remains unreadable without the decryption key.
Access Controls: Implement strict access controls to limit who can perform
backups and access backup data. Use role-based access controls (RBAC) to enforce
permissions.
Regular Testing: Regularly test backup processes to ensure data can be restored
accurately and efficiently. This includes performing scheduled restore tests.
Physical Security: For on-premises backups, ensure physical security measures are
in place, such as locked storage, surveillance, and restricted access areas.
Use of Offsite Storage: Store backups in a secure offsite location or use cloud
storage to protect against local disasters.
Recovery Security
Redundancy: Maintain multiple backup copies in different locations or formats.
This reduces the risk of total data loss.
Automated Recovery Solutions: Implement automated recovery solutions that can
quickly restore systems to minimize downtime during a disaster.
Documentation: Keep detailed documentation of recovery procedures, including
contact information for recovery teams and steps for restoring various systems.
Monitoring and Alerts: Use monitoring tools to track backup and recovery
processes. Set up alerts for any failures or anomalies.
Archival Security
Long-term Storage Solutions: Use durable and reliable storage media (e.g., magnetic
tape, optical discs, or cloud storage) designed for long-term data retention.
The key storage infrastructure components are servers, storage systems, and storage
area networks (SANs). These components could be physical or virtual and are used
to provide services to the users. The storage infrastructure management includes all
the storage infrastructure-related functions that are necessary for the management of
the infrastructure components and services, and for the maintenance of data
throughout its lifecycle.
Component-specific management tools only enable monitoring and management of specific components.
Infrastructure discovery:
Creates an inventory of infrastructure components and provides information about
the components including their configuration, connectivity, functions, performance,
capacity, availability, utilization, and physical-to-virtual dependencies.
It provides the visibility needed to monitor and manage the infrastructure
components.
Discovery is performed using a specialized tool that commonly interacts with infrastructure components through their native APIs.
Performance Management
Performance management ensures the optimal operational efficiency of all
infrastructure components so that storage services can meet or exceed the required
performance level. Performance-related data such as response time and throughput
of components are collected, analyzed, and reported by specialized management
tools. The performance analysis provides information on whether a component
meets the expected performance levels. These tools also proactively alert
administrators about potential performance issues and may prescribe a course of
action to improve a situation.
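A proactive alert of the kind these tools raise can be sketched simply; the SLA threshold, metric, and component names are hypothetical:

```python
# Hypothetical performance-management sketch: compare collected response
# times against an SLA-derived threshold and alert proactively.
SLA_RESPONSE_MS = 10.0

samples = {"array-01": [4.2, 5.1, 4.8], "array-02": [9.7, 12.4, 14.1]}

for component, times in samples.items():
    avg = sum(times) / len(times)
    if avg > SLA_RESPONSE_MS:
        print(f"ALERT: {component} averages {avg:.1f} ms (SLA {SLA_RESPONSE_MS} ms)")
    else:
        print(f"OK: {component} averages {avg:.1f} ms")
```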
Availability Management
Availability management is responsible for establishing a proper guideline based on
the defined availability levels of services. The guideline includes the procedures and
technical features required to meet or exceed both current and future service
availability needs at a justifiable cost. Availability management also identifies all
availability-related issues in a storage infrastructure and areas where availability
must be improved.
Incident Management
An incident is an unplanned event such as an HBA failure or an application error
that may cause an interruption to services or degrade the service quality. Incident
management is responsible for detecting and recording all incidents in a storage
infrastructure. The incident management support groups investigate the incidents
escalated by the incident management tools or service desk. They provide solutions
to bring back the services within an agreed timeframe specified in the SLA. If the
support groups are unable to determine and correct the root cause of an incident,
error-correction activity is transferred to problem management. In this case, the
incident management team provides a temporary solution (workaround) to the
incident.
Problem Management
A problem is recognized when multiple incidents exhibit one or more common
symptoms. Problem management reviews all incidents and their history to detect
problems in a storage infrastructure. It identifies the underlying root cause that
creates a problem and provides the most appropriate solution and/or preventive
remediation for the problem. Incident and problem management, although separate
management processes, require automated interaction between them and use
integrated incident and problem management tools. These tools may help an
administrator to track and mark specific incident(s) as a problem and transfer the
matter to problem management for further investigation.
Security Management
Security management is responsible for developing information security policies
that govern the organization’s approach towards information security management.
It establishes the security architecture, processes, mechanisms, tools, user
responsibilities, and standards needed to meet the information security policies in a
cost-effective manner. It also ensures that the required security processes and
mechanisms are properly implemented. Security management ensures the
confidentiality, integrity, and availability of information in a storage infrastructure.
It prevents the occurrence of security-related incidents or activities that adversely
affect the infrastructure components, management processes, information, and
services. It also meets regulatory or compliance requirements (both internal and
external) for protecting information at reasonable/acceptable costs.