CCS367 – STORAGE TECHNOLOGIES
Introduction to Information Storage: Digital data and its types, Information storage, Key
characteristics of data center and Evolution of computing platforms. Information Lifecycle
Management.Third Platform Technologies: Cloud computing and its essential characteristics,
Cloud services and cloud deployment models, Big data analytics, Social networking and mobile
computing, Characteristics of third platform infrastructure and Imperatives for third platform
transformation. Data Center Environment: Building blocks of a data center, Compute systems
and compute virtualization and Software-defined data center.
INFORMATION STORAGE
Businesses use data to derive information that is critical to their day-to-day operations.
Storage is a repository that enables users to store and retrieve this digital data.
Before the advent of computers, the procedures and methods adopted for data creation
and sharing were limited to fewer forms, such as paper and film.
Today, the same data can be converted into more convenient forms such as an e-mail
message, an e-book, a bitmapped image, or a digital movie.
This data can be generated using a computer and stored in strings of 0s and 1s, as
shown in Figure 1-2. Data in this form is called digital data and is accessible by the
user only after it is processed by a computer.
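To make the idea of data stored as strings of 0s and 1s concrete, the following short Python sketch (illustrative only; a real storage device also adds formatting, error-correction codes, and metadata) shows how a piece of text maps to binary digits.

```python
# Illustrative sketch: how ordinary text becomes the strings of 0s and 1s
# described above. Real storage adds formatting, error-correction codes,
# and metadata on top of this raw binary representation.

def to_bits(text: str) -> str:
    """Return the binary (0/1) representation of a string, byte by byte."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

print(to_bits("Hi"))   # 01001000 01101001 -> the digital form of "Hi"
```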
With the advancement of computer and communication technologies, the rate of data
generation and sharing has increased exponentially.
The following is a list of some of the factors that have contributed to the
growth of digital data:
■ Increase in data processing capabilities: Modern-day computers provide a
significant increase in processing and storage capabilities. This enables the conversion
of various types of content and media from conventional forms to digital formats.
■ Lower cost of digital storage: Technological advances and decrease in the cost
of storage devices have provided low-cost solutions and encouraged the development of
less expensive data storage devices. This cost benefit has increased the rate at which
data is being generated and stored.
Inexpensive and easier ways to create, collect, and store all types of data, coupled with
increasing individual and business needs, have led to accelerated data growth,
popularly termed the data explosion.
Data has different purposes and criticality, so both individuals and businesses
have contributed in varied proportions to this data explosion.
The importance and the criticality of data vary with time. Most of the data
created holds significance in the short term but becomes less valuable over time.
This governs the type of data storage solutions used. Individuals store data on a
variety of storage devices, such as hard disks, CDs, DVDs, or Universal Serial Bus
(USB) flash drives.
Examples of research and business data:
■ Medical data: Data related to the health care industry, such as patient history,
radiological images, details of medication and other treatment, and insurance
information
TYPES OF DATA:
Data can be classified as structured or unstructured based on how it is stored and managed.
Structured data is organized in rows and columns in a rigidly defined format, typically stored
using a database management system (DBMS), so that applications can retrieve and process it efficiently.
Unstructured data cannot be organized in rows and columns. For example, customer contacts may be
stored in various forms such as sticky notes, e-mail messages, business cards, or even digital format
files such as .doc, .txt, and .pdf. Due to its unstructured nature, it is difficult to retrieve this data
using a customer relationship management application.
Unstructured data may not have the required components to identify itself uniquely for
any type of processing or interpretation. Businesses are primarily concerned with
managing unstructured data because over 80 percent of enterprise data is unstructured
and requires significant storage space and effort to manage.
INFORMATION STORAGE :
INFORMATION:
Data, whether structured or unstructured, does not fulfill any purpose for individuals
or businesses unless it is presented in a meaningful form. Businesses need to analyze
data for it to be of value.
Information is the intelligence and knowledge derived from data.
Businesses analyze raw data in order to identify meaningful trends. On the basis of
these trends, a company can plan or modify its strategy. For example, a retailer
identifies customers’ preferred products and brand names by analyzing their purchase
patterns and maintaining an inventory of those products.
Effective data analysis not only extends its benefits to existing businesses, but also
creates the potential for new business opportunities by using the information in
creative ways. A job portal is an example.
In order to reach a wider set of prospective employers, job seekers post their résumés
on various websites offering job search facilities.
These websites collect the résumés and post them on centrally accessible locations for
prospective employers.
In addition, companies post available positions on job search sites. Job-matching
software matches keywords from résumés to keywords in job postings. In this manner,
the job search engine uses data and turns it into information for employers and job
seekers.
STORAGE:
Data created by individuals or businesses must be stored so that it is easily accessible for
further processing.
In a computing environment, devices designed for storing data are termed storage devices
or simply storage.
The type of storage used varies based on the type of data and the rate at which it is created
and used.
Devices such as memory in a cell phone or digital camera, DVDs, CD-ROMs, and hard
disks in personal computers are examples of storage devices.
Businesses have several options available for storing data, including internal hard disks,
external disk arrays, and tape.
CORE ELEMENTS:
Five core elements are essential for the basic functionality of a data center:
Application: An application is a computer program that provides the logic for
computing operations. Applications, such as an order processing system, can be
layered on a database, which in turn uses operating system services to perform
read/write operations to storage devices.
Database: More commonly, a database management system (DBMS) provides
a structured way to store data in logically organized tables that are interrelated.
A DBMS optimizes the storage and retrieval of data.
Server and operating system: A computing platform that runs applications
and databases.
Network: A data path that facilitates communication between clients and servers
or between servers and storage.
Storage array: A device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the
elements must work together to address data processing requirements.
Figure 1-5 shows an example of an order processing system that involves the five
core elements of a data center and illustrates their functionality in a business process.
KEY CHARACTERISTICS OF A DATA CENTER:
The key characteristics of data center elements are availability, security, scalability,
performance, data integrity, capacity, and manageability.
Availability:
All data center elements should be designed to ensure accessibility. The
inability of users to access data can have a significant negative impact on a business.
Security:
Policies, procedures, and proper integration of the data center core elements
that will prevent unauthorized access to information must be established.
In addition to the security measures for client access, specific mechanisms
must enable servers to access only their allocated resources on storage arrays.
Scalability:
Data center operations should be able to allocate additional processing
capabilities or storage on demand, without interrupting business operations.
Business growth often requires deploying more servers, new applications, and
additional databases. The storage solution should be able to grow with the business.
Performance:
All the core elements of the data center should be able to provide optimal
performance and service all processing requests at high speed.
The infrastructure should be able to support performance requirements.
Data integrity:
Data integrity refers to mechanisms such as error correction codes or parity
bits which ensure that data is written to disk exactly as it was received.
Any variation in data during its retrieval implies corruption, which may affect
the operations of the organization.
Capacity:
Data center operations require adequate resources to store and process large
amounts of data efficiently.
When capacity requirements increase, the data center must be able to provide
additional capacity without interrupting availability, or, at the very least, with
minimal disruption. Capacity may be managed by reallocation of existing resources,
rather than by adding new resources.
Manageability:
A data center should perform all operations and activities in the most efficient
manner.
Manageability can be achieved through automation and the reduction of human
(manual) intervention in common tasks.
Monitoring is the continuous collection of information and the review of the entire
data center infrastructure. The aspects of a data center that are monitored include
security, performance, accessibility, and capacity.
Reporting is done periodically on resource performance, capacity, and utilization.
INFORMATION LIFECYCLE MANAGEMENT (ILM):
o The value of the information is highest when a company receives a new sales
order and processes it to deliver the product. After order fulfillment, the customer
or order data need not be available for real-time access.
The company can transfer this data to less expensive secondary storage with lower
accessibility and availability requirements unless or until a warranty claim or
another event triggers its need.
After the warranty becomes void, the company can archive or dispose of the data to
create space for other high-value information.
An ILM strategy has the following key characteristics:
■ Business centric: It should be integrated with the key processes, applications, and
initiatives of the business to meet both current and future growth in information.
■ Centrally managed: All the information assets of a business should be under
the purview of the ILM strategy.
■ Optimized: Because the value of information varies, an ILM strategy should
consider the different storage requirements and allocate storage resources based
on the information's value to the business.
ILM IMPLEMENTATION:
The process of developing an ILM strategy includes four activities:
classifying, implementing, managing, and organizing.
■ Classifying data and applications on the basis of business rules and policies to
enable differentiated treatment of information.
ILM BENEFITS:
Implementing an ILM strategy has the following key benefits that directly address the
challenges of information management:
■ Improved utilization by using tiered storage platforms and increased visibility of all
enterprise information.
■ Simplified management by integrating process steps and interfaces with individual tools and
by increasing automation.
■ A wider range of options for backup and recovery to balance the need for business continuity.
■ Maintaining compliance by knowing what data needs to be protected for what length
of time.
■ Lower Total Cost of Ownership (TCO) by aligning the infrastructure and management
costs with information value. As a result, resources are not wasted, and complexity is not
introduced by managing low-value data at the expense of high-value data.
CLOUD COMPUTING:
Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and
services) that can be rapidly provisioned and released with minimal management effort or service
provider interaction.
A cloud is a type of parallel and distributed system consisting of a collection of interconnected and
virtualized computers that are dynamically provisioned and presented as one or more unified
computing resources based on service-level agreements established through negotiation
between the service provider and consumers.
There are many characteristics of cloud computing; here are a few of them:
1. On-demand self-service:
Cloud computing services do not require any human administrators; users themselves are
able to provision, monitor, and manage computing resources as needed.
2. Broad network access:
Computing services are available over the network and can be accessed through standard
mechanisms by a wide variety of client platforms, such as mobile phones, tablets, laptops, and workstations.
3. Rapid elasticity:
Computing services should have IT resources that are able to scale out and in quickly on an
as-needed basis. Whenever the user requires a service it is provided on demand, and it is scaled
back in as soon as the requirement is over.
4. Resource pooling:
The IT resources (e.g., networks, servers, storage, applications, and services) are shared
across multiple applications and tenants in an uncommitted manner. Multiple clients are
provided service from the same physical resources.
5. Measured service:
Resource utilization is tracked for each application and tenant; this provides both the
user and the resource provider with an account of what has been used. This is done for various
reasons such as monitoring, billing, and effective use of resources.
6. Multi-tenancy:
Cloud computing providers can support multiple tenants (users or organizations) on a
single set of shared resources.
7.Virtualization:
Cloud computing providers use virtualization technology to abstract underlying hardware
resources and present them as logical resources to users.
8.Resilient computing:
Cloud computing services are typically designed with redundancy and fault tolerance in
mind, which ensures high availability and reliability.
10.Security:
Cloud providers invest heavily in security measures to protect their users’ data and ensure the
privacy of sensitive information.
11.Automation:
Cloud computing services are often highly automated, allowing users to deploy and manage
resources with minimal manual intervention.
12.Sustainability:
Cloud providers are increasingly focused on sustainable practices, such as energy-
efficient data centers and the use of renewable energy sources, to reduce their environmental impact.
Advantages:
1. Easy implementation
2. Accessibility
3. No hardware required
4. Cost per head
5. Flexibility for growth
6. Efficient recovery
Disadvantages:
1. No longer in control
2. May not get all the features
3. Doesn't mean you should do away with servers
4. No Redundancy
5. Bandwidth issues
CLOUD MODELS :
The Cloud Models are as follows :
Cloud Service models
Cloud Deployment models.
CLOUD SERVICES :
The term "cloud services" refers to a wide range of services delivered on demand to companies and
customers over the internet.
These services are designed to provide easy, affordable access to applications and resources, without
the need for internal infrastructure or hardware.
From checking email to collaborating on documents, most employees use cloud services throughout
the workday, whether they’re aware of it or not.
Cloud services are fully managed by cloud computing vendors and service providers.
They’re made available to customers from the providers' servers, so there's no need for a company to
host applications on its own on-premises servers.
There are the following three types of cloud service models:
1. Infrastructure as a Service (IaaS)
2. Platform as a Service (PaaS)
3. Software as a Service (SaaS)
Comparison of IaaS, PaaS, and SaaS:

Model:
IaaS: It is a service model that provides virtualized computing resources over the internet.
PaaS: It is a cloud computing model that delivers tools that are used for the development of applications.
SaaS: It is a service model in cloud computing that hosts software to make it available to clients.

Technical understanding:
IaaS: It requires technical knowledge.
PaaS: Some knowledge is required for the basic setup.
SaaS: There is no requirement about technicalities; the company handles everything.

Percentage rise:
IaaS: It has around a 12% increment.
PaaS: It has around a 32% increment.
SaaS: It has about a 27% rise in the cloud computing model.

Usage:
IaaS: Used by the skilled developer to develop unique applications.
PaaS: Used by mid-level developers to build applications.
SaaS: Used among the users of entertainment.

Outsourced cloud services:
IaaS: Salesforce.
PaaS: Force.com, Gigaspaces.
SaaS: AWS, Terremark.
ADVANTAGES OF IAAS:
The resources can be deployed by the provider to a customer's environment at any given time.
It offers users the ability to scale the business based on their requirements. The provider has
various options when deploying resources, including virtual machines, applications, storage, and
networks.
DISADVANTAGES OF IAAS:
There are security issues.
Service and network delays can be quite an issue in IaaS.
ADVANTAGES OF PAAS :
Programmers need not worry about what specific database or language the application has been
programmed in.
It offers developers the ability to build applications without the overhead of the underlying operating
system or infrastructure.
Provides the freedom to developers to focus on the application’s design while the platform takes
care of the language and the database.
It is flexible and portable.
It is quite affordable.
It manages application development phases in the cloud very efficiently.
DISADVANTAGES OF PAAS
Data is not secure and is at big risk.
As data is stored both in local storage and cloud, there are high chances of data mismatch while
integrating the data.
ADVANTAGES OF SAAS:
It is a cloud computing service category providing a wide range of hosted capabilities and
services. These can be used to build and deploy web-based software applications.
It provides a lower cost of ownership than on-premises software. The reason is it does not require
the purchase or installation of hardware or licenses.
It can be easily accessed through a browser or a thin client.
No cost is required for initial setup.
Low maintenance costs.
Installation time is less, so time is managed properly.
DISADVANTAGES OF SAAS:
Low performance.
It has limited customization options.
It has security and data concerns.
CLOUD DEPLOYMENT MODELS:
PRIVATE CLOUD: The cloud infrastructure is operated solely for a single organization.
Advantages of the private cloud model:
Data Security and Privacy: It's suitable for storing corporate information to which only
authorized staff have access. By segmenting resources within the same infrastructure,
improved access and security can be achieved.
Supports Legacy Systems: This approach is designed to work with legacy systems that are
unable to access the public cloud.
Customization: Unlike a public cloud deployment, a private cloud allows a company to tailor
its solution to meet its specific needs.
Disadvantages of the private cloud model:
Less scalable: Private clouds can be scaled only within a certain range, as there is a smaller
number of clients.
Costly: Private clouds are more costly as they provide personalized facilities.
PUBLIC CLOUD : The cloud infrastructure is owned by an organization that sells cloud services
to the general public or to a large industry group.
HYBRID CLOUD: The cloud infrastructure is a composition of two or more clouds (internal,
community, or public) that remain unique entities. However, these entities are bound together by
standardized or proprietary technology that enables data and application portability, for example, cloud
bursting. Figure 6-3 shows cloud computing deployment models
Disadvantages of the hybrid cloud model:
Slow data transmission: Data transmission in the hybrid cloud takes place through the
public cloud, so latency occurs.
BIG DATA ANALYTICS:
Big data analytics is important because it helps companies leverage their data to identify opportunities
for improvement and optimisation.
Across different business segments, increasing efficiency leads to overall more intelligent operations,
higher profits, and satisfied customers.
Big data analytics helps companies reduce costs and develop better, customer-centric products and
services.
Data analytics helps provide insights that improve the way our society functions. In health care, big data
analytics not only keeps track of and analyses individual records but it plays a critical role in measuring
outcomes on a global scale.
During the COVID-19 pandemic, big data informed health ministries within each nation's government
on how to proceed with vaccinations and helped devise solutions for mitigating pandemic outbreaks in
the future.
Four main types of big data analytics support and inform different business decisions.
1. Descriptive analytics:
Descriptive analytics refers to data that can be easily read and interpreted. This data helps create
reports and visualise information that can detail company profits and sales.
Example: During the pandemic, a leading pharmaceutical company conducted data analysis on its
offices and research labs. Descriptive analytics helped them identify consolidated unutilised spaces
and departments, saving the company millions of pounds.
2. Diagnostics analytics:
Diagnostics analytics helps companies understand why a problem occurred. Big data technologies
and tools allow users to mine and recover data that helps dissect an issue and prevent it from
happening in the future.
Example: An online retailer’s sales have decreased even though customers continue to add items to
their shopping carts. Diagnostics analytics helped to understand that the payment page was not
working correctly for a few weeks.
3. Predictive analytics:
Predictive analytics looks at past and present data to make predictions. With artificial intelligence
(AI), machine learning, and data mining, users can analyse the data to predict market trends.
Example: In the manufacturing sector, companies can use algorithms based on historical data to predict
if or when a piece of equipment will malfunction or break down.
4. Prescriptive analytics:
Prescriptive analytics solves a problem, relying on AI and machine learning to gather and use
data for risk management.
Example: Within the energy sector, utility companies, gas producers, and pipeline owners
identify factors that affect the price of oil and gas to hedge risks.
BIG DATA ANALYTICS TOOLS:
Harnessing all of that data requires tools. Thankfully, technology has advanced so that many intuitive
software systems are available for data analysts to use.
Hadoop: An open-source framework that stores and processes big data sets. Hadoop can handle and
analyse structured and unstructured data.
Spark: An open-source cluster computing framework for real-time processing and data analysis.
Data integration software: Programs that allow big data to be streamlined across different platforms,
such as MongoDB, Apache Hadoop, and Amazon EMR.
Stream analytics tools: Systems that filter, aggregate, and analyse data that might be stored in
different platforms and formats, such as Kafka.
Distributed storage: Databases that can split data across multiple servers and can identify lost or
corrupt data, such as Cassandra.
Predictive analytics hardware and software: Systems that process large amounts of complex data,
using machine learning and algorithms to predict future outcomes, such as fraud detection, marketing, and risk
assessments.
Data mining tools: Programs that allow users to search within structured and unstructured big data.
NoSQL databases: Non-relational data management systems that are ideal for dealing with raw and
unstructured data.
Data warehouses: Storage for large amounts of data collected from many different sources, typically
using predefined schemas.
SOCIAL NETWORKING:
Social networking refers to the grouping of individuals and organizations together via some
medium, in order to share thoughts, interests, and activities.
There are several web-based social networking services available, such as Facebook, Twitter,
LinkedIn, Google+, etc., which offer easy-to-use and interactive interfaces to connect with people
within the country and overseas as well. There are also several mobile-based social networking
services in the form of apps, such as WhatsApp, Hike, Line, etc.
The following table describes some of the famous social networking services provided over
web and mobile:
1. Facebook
It allows users to share text, photos, videos, etc. It also offers interesting online games.
2. Google+
It is pronounced as Google Plus. It is owned and operated by Google.
3. Twitter
Twitter allows the user to send and reply to messages in the form of tweets. These tweets are small
messages, generally limited to 140 characters.
4. Faceparty
Faceparty is a UK-based social networking site. It allows the users to create profiles and interact with
each other using messages.
5. Linkedin
Linkedin is a business and professional networking site.
6. Flickr
Flickr offers image hosting and video hosting.
7. Ibibio
Ibibio is a talent-based social networking site. It allows the users to promote one's self and also
discover new talent.
8. WhatsApp
It is a mobile-based messaging app. It allows users to send text, video, and audio messages.
9. Line
It is similar to WhatsApp. It allows users to make free calls and send messages.
10. Hike
It is also a mobile-based messenger that allows users to send messages and exciting emoticons.
Following are the areas where social networking has become most popular:
Online Marketing:
Websites like Facebook allow us to create a page for a specific product, community, or firm and
promote it over the web.
Online Jobs:
Websites like LinkedIn allow us to create connections with professionals and help to find a
suitable job based on one's specific skill set.
Online News:
On social networking sites, people also post daily news, which helps to keep us updated.
Chatting:
Social networking allows us to keep in contact with friends and family. We can communicate
with them via messages.
Sharing:
One can share pictures, audio, and video using social networking sites.
MOBILE COMPUTING:
Mobile Computing is a technology that provides an environment that enables users to
transmit data from one device to another device without the use of any physical link or
cables.
In other words, you can say that mobile computing allows transmission of data, voice and
video via a computer or any other wireless-enabled device without being connected to a fixed
physical link. In this technology, data transmission is done wirelessly with the help of
wireless devices such as mobiles, laptops etc.
This is only because of Mobile Computing technology that you can access and transmit data
from any remote locations without being present there physically. Mobile computing
technology provides a vast coverage diameter for communication. It is one of the fastest and
most reliable sectors of the computing technology field.
MOBILE COMMUNICATION:
Mobile Communication specifies a framework that is responsible for the working of mobile
computing technology. In this case, mobile communication refers to an infrastructure that ensures
seamless and reliable communication among wireless devices.
This framework ensures the consistency and reliability of communication between wireless devices.
The mobile communication framework consists of communication devices such as protocols,
services, bandwidth, and portals necessary to facilitate and support the stated services. These devices
are responsible for delivering a smooth communication process.
MOBILE SOFTWARE:
Mobile software is a program that runs on mobile hardware. This is designed to deal capably with the
characteristics and requirements of mobile applications.
This is the operating system for the appliance of mobile devices. In other words, you can say it is the
heart of the mobile system.
This is an essential component that operates the mobile device.
Challenges of Mobile Computing:
Connectivity:
While the mobile infrastructure continues to improve, there are areas where signal strength is
poor or nonexistent.
Data security:
Mobile computing raises significant data security vulnerabilities because business users,
especially, may have sensitive data on their devices while traveling or working remotely.
Companies must implement security measures and policies to keep corporate data secure.
Dependence:
The flip side to the convenience of mobile devices is that consumers may become overly
reliant on them, which can lead to compulsive or unhealthy behaviors such as smartphone
addiction.
Distraction:
Mobile devices can be distracting and potentially dangerous in a hazardous work environment
that requires the employee's attention, such as a construction site.
They pose dangers if used inappropriately while driving.
First Platform:
First Platform (Mainframe) - late 1950s to present
The first platform is the mainframe computer system, which began in the late 1950s and continues today.
Second Platform:
Second Platform (Client/Server) - mid 1980s to present
The second platform is the client/server system, which began in the mid-1980s with PCs tapping into
mainframe databases and applications.
Third Platform:
Third Platform (Social, Mobile, Cloud & Analytics, possibly IoT) - early 2010s to present
The third platform is a term coined by marketing firm International Data Corporation (IDC) to describe a
model of computing platform. It was promoted as an interdependence between mobile computing, social
media, cloud computing, information/analytics (big data), and possibly the Internet of Things.
No single "third platform" product has emerged, but there are a number of proprietary and free
software products that enterprises can use to create, deploy and operate solutions that use third platform
technologies.
Within an enterprise, a combination of these products that meet enterprise needs is a "third platform"
for that enterprise. Its design can be considered part of Enterprise Architecture.
Gartner defined a social technology as, “Any technology that facilitates social interactions and is
enabled by a communications capability, such as the Internet or a mobile device.” This extends not
only to social media but also to all social technologies that make social interaction possible. A VoIP
service, for example, would be considered a social technology.
In a trend that has been described as 'social everything', companies, both big and small, will continue
to inject a social element into every product and service.
The cloud provides the infrastructure that makes the information accessible, the social technology
helps to organise the data and facilitate access, and the mobile devices will provide the means by
which most people receive the data.
Mobile devices
The third platform is designed to give everybody access to big data via mobile devices; it is this
mobility that really defines the third platform. A company representative on the road or working
from home will have instant access to data through his or her mobile device with this third platform
whenever and wherever they need it.
An example of the use of mobile devices in the third platform would be a school that gives every
student a tablet. The tablet would take the place of textbooks and paper used in assignments, but
more importantly, the student will have access to a virtual classroom at additional times.[11]
Analytics (big data)
The concept behind big data is to maximize the utility of all data. An executive at a company that
streamlines its business functions with the third platform would have easy access to all of the data,
including sales figures, personnel information, accounting data, financials and so on. This data can
then be used to inform more areas of the business.
Big data can be further differentiated once we analyze its three distinguishing features: variety,
volume, and velocity.
o Variety means that many forms of data are collected, with formats ranging from audio and
video to client log files and Tweets.
o Volume represents the fact that big data must come in massive quantities, often over
a petabyte.
o Velocity signifies that big data must be constantly collected for maximum effectiveness; even
data that is a few days old is not ideal.
Cloud services
Cloud services are at the heart of the third platform. Having big data and mobile devices is one
thing, but without the cloud, there will be no way to access this data from outside of the office.
This differs greatly from the first platform, where computer networks consisted of large mainframes.
All of a company's employees had access to the data in the mainframe but they could only access it
through their desktop computers.
In the second platform, a company's employees could access the data in the mainframe as well as
outside data, via an Internet connection.
The third platform will allow all of a company's IT solutions to be available through the cloud,
accessible via a variety of mobile devices. Data storage, servers and many IT solutions, which are
on-site, can now be cloud-based.
Internet of Things (IOT)
The Internet of Things is the network of connected devices that enable computer systems to monitor
and control aspects of the physical environment. It has applications in personal and home
environments, smart cities, factory automation, transport, and many other areas.
The incorporation of the Internet of Things in the third platform gives enterprises the ability to
interact with these systems and use these applications.
Sensors and actuators have been used in computer systems for many years. It is the ability to connect
to such devices anywhere in the world through the Internet that characterizes the Internet of things.
Third platform services are structured in a way to boost connectivity. With hyper-competition and
relatively similar service offerings, customer-centricity put front and centre will be one way that suppliers
could gain genuine competitive advantage.
With the 4 pillars of Cloud, Mobile, Social and Analytics getting established, both Gartner and IDC
are predicting a new digital era that Gartner calls the Digital Industrial Revolution. IDC refers to 6
innovation accelerators sitting on top of the 4 pillars and accelerating the transformation to Digital.
1. Internet of Things (IoT)
As evidenced by the wave of new IoT devices and solutions on display at several tradeshows this
year, the physical world is soon entering the digital era with future connected cars, houses, wallets, etc.
2. Cognitive systems
Analyzing the vast amount of IoT data created by the connected devices with the next wave of
analytics tools (diagnostic, predictive and prescriptive), cognitive systems will observe, learn and offer
suggestions, thus reshaping the services industry.
3. Pervasive robotics
A new era of automation driven by knowledge gained from the digital world and set in action in the
physical world with self-driving cars, drones, robots, etc.
4. 3D printing
These printers will be creating physical things of all kinds from a digital drawing: not only plastic
parts (quite common now; my local library even has a 3D printer for plastic parts) and fine-resolution metal
parts but also food and clothing items, and, eventually, living tissues and organs!
5. Natural interfaces
Beyond mouse and keyboards, using touch and motion, speech (starting to be common on
smartphones and cars nowadays) and also vision to connect people to their devices and 3rd Platform
solutions.
6. Next-generation security
With the predicted massive amount of new connected IoT devices, a better way to secure access to
systems and devices is required to avoid the security breaches we have experienced in 2014.
You won’t need to interact with the service provider. Cloud customers can access their cloud
accounts through a web self-service portal to view their cloud services, monitor their usage, and provision
and de-provision services.
3. Resource Pooling
With resource pooling, multiple customers can share physical resources using a multi-tenant model.
This model assigns and reassigns physical and virtual resources based on demand. Multi-tenancy allows
customers to share the same applications or infrastructure while maintaining privacy and security.
Though customers won't know the exact location of their resources, they may be able to specify the
location at a higher level of abstraction, such as a country, state, or data center. Memory, processing, and
bandwidth are among the resources that customers can pool.
4. Rapid Elasticity
Cloud services should be capable of handling massive expansion, sometimes automatically, as the
number of connected resources increases, so customers can scale quickly based on demand. The
capabilities available for provisioning are practically unlimited.
Customers can engage with these capabilities at any time in any quantity.
Digital Transformation
Digital transformation is the incorporation of computer-based technologies into an organization's products,
processes and strategies. Organizations undertake digital transformation to better engage and serve their
workforce and customers and thus improve their ability to compete.
A digital transformation initiative can require an examination and reinvention of all facets of an
organization, from supply chains and workflows, to employee skill sets and org charts, to customer
interactions and value proposition to stakeholders.
One particularly important new ability that Microsoft has launched is encrypting data while it’s in
use without changing applications. This quickly safeguards digital assets and will help with
compliance for customers in regulated industries.
Types of Data Centers:
Enterprise data centers are typically constructed and used by a single organization for their own internal
purposes. These are common among tech giants.
Colocation data centers function as a kind of rental property where the space and resources of a data center
are made available to the people willing to rent it.
Managed service data centers offer aspects such as data storage, computing, and other services as a third
party, serving customers directly.
Cloud data centers are distributed and are sometimes offered to customers with the help of a third-party
managed service provider.
Apart from the Data Centers, support infrastructure is essential to meeting the service level
agreements of an enterprise data center.
Data Center Computing
Servers are the engines of the data center. On servers, the processing and memory used to run
applications may be physical, virtualized, distributed across containers, or distributed among remote
nodes in an edge computing model.
Data centers must use processors that are best suited for the task, e.g. general purpose CPUs may not
be the best choice to solve artificial intelligence (AI) and machine learning (ML) problems.
Data Center Storage
Data centers host large quantities of sensitive information, both for their own purposes and the needs
of their customers. Decreasing costs of storage media increase the amount of storage available for
backing up the data either locally, remotely, or both.
Advancements in non-volatile storage media lowers data access times.
In addition, as with other software-defined technologies, software-defined storage
increases staff efficiency in managing a storage system.
Data Center Networks
Datacenter network equipment includes cabling, switches, routers, and firewalls that connect servers
together and to the outside world. Properly configured and structured, they can manage high
volumes of traffic without compromising performance.
A typical three-tier network topology is made up of core switches at the edge connecting the data
center to the Internet and a middle aggregate layer that connects the core layer to the access layer
where the servers reside.
Advancements, such as hyperscale network security and software-defined networking, bring cloud-
level agility and scalability to on-premises networks.
In the cloud, a software-defined data center (SDDC) typically includes:
Compute virtualization, where virtual machines (VMs)—including their operating systems, CPUs,
memory, and software—reside on cloud servers. Compute virtualization allows users to create software
implementations of computers that can be spun up or spun down as needed, decreasing provisioning
time.
Network virtualization, where the network infrastructure servicing your VMs can be provisioned
without worrying about the underlying hardware. Network infrastructure needs—telecommunications,
firewalls, subnets, routing, administration, DNS, etc.—are configured inside your cloud SDDC on the
vendor’s abstracted hardware. No network hardware assembly is required.
Storage virtualization, where disk storage is provisioned from the SDDC vendor’s storage pool.
You get to choose your storage types, based on your needs and costs. You can quickly add storage to a
VM when needed.
Management and automation software. SDDCs use management and automation software to keep
business critical functions working around the clock, reducing the need for IT manpower. Remote
management and automation is delivered via a software platform accessible from any suitable location,
via APIs or Web browser access.
You can also add critical software to connect with and customize your SDDC platform. But,
for companies just moving to an SDDC, your first goal is to get your basic operations software
infrastructure ready for the transition. Customizing can come later.
Benefits of SDDCs
1. Business agility
An SDDC offers several benefits that improve business agility with a focus on three key areas:
Balance
Flexibility
Adaptability
2. Reduced cost
In general, it costs less to operate an SDDC than housing data in brick-and-mortar data centers.
Cloud SDDCs operate similarly to SaaS platforms that charge a recurring monthly cost.
This is usually an affordable rate, making an SDDC accessible to all types of businesses, even those
who may not have a big budget for technology spending.
3. Increased scalability
By design, cloud SDDCs can easily expand along with your business. Increasing your storage space
or adding functions is usually as easy as contacting the data facility to get a revised monthly service quote.
INTELLIGENT STORAGE SYSTEM:
Business-critical applications require high levels of performance, availability,
security, and scalability. A hard disk drive is a core element of storage that governs the
performance of any storage system.
Some of the older disk array technologies could not overcome performance
constraints due to the limitations of a hard disk and its mechanical components. RAID
technology made an important contribution to enhancing storage performance and
reliability, but hard disk drives even with a RAID implementation could not meet
the performance requirements of today's applications.
With advancements in technology, a new breed of storage solutions known as an
intelligent storage system has evolved. These storage systems are configured with large
amounts of memory called cache and use sophisticated algorithms to meet the I/O
requirements of performance-sensitive applications.
An I/O request received from the host at the front-end port is processed through
cache and the back end, to enable storage and retrieval of data from the physical disk.
A read request can be serviced directly from cache if the requested data is found in
cache.
[Figure: Front end of an intelligent storage system, showing ports and controllers connected to the storage network]
FRONT END
The front end provides the interface between the storage system and the host. It
consists of two components: front-end ports and front-end controllers.
The front-end ports enable hosts to connect to the intelligent storage system. Each
front-end port has processing logic that executes the appropriate transport protocol,
such as SCSI, Fibre Channel, or iSCSI, for storage connections.
Redundant ports are provided on the front end for high availability.
Front-end controllers route data to and from cache via the internal data bus.
When cache receives write data, the controller sends an acknowledgment message
back to the host. Controllers optimize I/O processing by using command queuing
algorithms.
When a command is received for execution, the command queuing algorithm assigns a
tag that defines the sequence in which the commands should be executed.
With command queuing, multiple commands can be executed concurrently
based on the organization of data on the disk, regardless of the order in which
the commands were received.
The most commonly used command queuing algorithms are as follows:
■ First In First Out (FIFO): This is the default algorithm where commands are
executed in the order in which they are received (Figure 4-2 [a]). There is no
reordering of requests for optimization; therefore, it is inefficient in terms of
performance.
■ Seek Time Optimization: Commands are executed based on optimizing
read/write head movements, which may result in reordering of commands.
Without seek time optimization, the commands are executed in the order they
are received.
For example, as shown in Figure 4-2(a), the commands are executed in the order
A, B, C and D. The radial movement required by the head to execute C
immediately after A is less than what would be required to execute B.
With seek time optimization, the command execution sequence would be A, C, B
and D, as shown in Figure 4-2(b).
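The following Python sketch illustrates the idea behind seek time optimization. It is not a vendor algorithm; the track numbers and the simple nearest-track ordering are assumptions used only to show how reordering reduces total head movement compared with FIFO.

```python
# Illustrative sketch (not a vendor algorithm): comparing FIFO execution
# with a simple seek-time-optimized ordering. Track numbers are hypothetical.

def head_travel(start: int, order: list[int]) -> int:
    """Total radial movement (in tracks) for a given execution order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel

def seek_optimized(start: int, pending: list[int]) -> list[int]:
    """Greedy ordering: always service the nearest pending track next."""
    order, pos, pending = [], start, pending.copy()
    while pending:
        nearest = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return order

# Commands A, B, C, D arrive in this order, targeting these (hypothetical) tracks:
fifo_order = [10, 95, 12, 60]        # A=10, B=95, C=12, D=60
opt_order = seek_optimized(0, fifo_order)

print("FIFO order:", fifo_order, "travel =", head_travel(0, fifo_order))
print("Optimized :", opt_order, "travel =", head_travel(0, opt_order))
```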
CACHE
Cache is an important component that enhances the I/O performance in an
intelligent storage system.
Cache is semiconductor memory where data is placed temporarily to reduce the
time required to service I/O requests from the host.
Cache improves storage system performance by isolating hosts from the
mechanical delays associated with physical disks, which are the slowest components
of an intelligent storage system.
Accessing data from a physical disk usually takes a few milliseconds because of
seek times and rotational latency. If a disk has to be accessed by the host for every
I/O operation, requests are queued, which results in a delayed response.
Accessing data from cache takes less than a millisecond. Write data is placed in
cache and then written to disk. After the data is securely placed in cache, the host
is acknowledged immediately.
Structure of Cache
Cache is organized into pages or slots, which is the smallest unit of cache
allocation.
The size of a cache page is configured according to the application I/O size.
Cache consists of the data store and tag RAM.
The data store holds the data while tag RAM tracks the location of the data in
the data store (see Figure 4-3) and on the disk.
Entries in tag RAM indicate where data is found in cache and where the data
belongs on the disk. Tag RAM includes a dirty bit flag, which indicates
whether the data in cache has been committed to the disk or not.
When a host issues a read request, the front-end controller accesses the tag RAM
to determine whether the required data is available in cache.
If the requested data is found in the cache, it is called a read cache hit or read hit
and data is sent directly to the host, without any disk operation (see Figure 4-4[a]).
This provides a fast response time to the host (about a millisecond).
If the requested data is not found in cache, it is called a cache miss and the data
must be read from the disk (see Figure 4-4[b]).
The back-end controller accesses the appropriate disk and retrieves the requested
data. Data is then placed in cache and is finally sent to the host through the front-
end controller. Cache misses increase I/O response time.
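A minimal Python sketch of the read hit / read miss path described above, assuming a dictionary-based model of tag RAM and the data store (the structures and names are illustrative, not those of a real array):

```python
# Minimal sketch of the read path: the tag RAM is modelled as a dictionary
# that records, for each cached disk block, its cache page and a dirty bit.

class SimpleCache:
    def __init__(self):
        self.tag_ram = {}     # disk_block -> {"page": cache_page, "dirty": bool}
        self.data_store = {}  # cache_page -> data

    def read(self, disk_block, read_from_disk):
        entry = self.tag_ram.get(disk_block)
        if entry is not None:                       # read hit: serve from cache
            return self.data_store[entry["page"]]
        data = read_from_disk(disk_block)           # read miss: go to the back end
        page = len(self.data_store)                 # naive page allocation
        self.tag_ram[disk_block] = {"page": page, "dirty": False}
        self.data_store[page] = data
        return data

cache = SimpleCache()
fake_disk = lambda block: f"data-of-block-{block}"
print(cache.read(42, fake_disk))   # miss: fetched from "disk", then cached
print(cache.read(42, fake_disk))   # hit: served directly from cache
```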
The intelligent storage system offers fixed and variable pre-fetch sizes.
In fixed pre-fetch, the intelligent storage system pre-fetches a fixed amount of data.
It is most suitable when I/O sizes are uniform.
In variable pre-fetch, the storage system pre-fetches an amount of data in multiples
of the size of the host request.
Read performance is measured in terms of the read hit ratio, or the hit rate,
usually expressed as a percentage.
This ratio is the number of read hits with respect to the total number of read requests.
A higher read hit ratio improves the read performance.
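A small worked example of the read hit ratio, using made-up numbers:

```python
# Read hit ratio as defined above: read hits divided by total read requests,
# expressed as a percentage. The numbers below are illustrative.

def read_hit_ratio(read_hits: int, total_reads: int) -> float:
    return 100.0 * read_hits / total_reads

print(read_hit_ratio(read_hits=900, total_reads=1000))   # 90.0 (%)
```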
Cache Implementation
Cache can be implemented as either dedicated cache or global cache. With dedicated
cache, separate sets of memory locations are reserved for reads and writes.
In global cache, both reads and writes can use any of the available memory
addresses. Cache management is more efficient in a global cache implementation,
as only one global set of addresses has to be managed.
Cache Management
Even though intelligent storage systems can be configured with large amounts of
cache, when all cache pages are filled, some pages have to be freed up to accommodate
new data and avoid performance degradation. The most commonly used page
replacement algorithms are:
■ Least Recently Used (LRU): An algorithm that continuously monitors data access
in cache and identifies the cache pages that have not been accessed for a
long time. LRU either frees up these pages or marks them for reuse.
■ Most Recently Used (MRU): An algorithm that is the converse of LRU. In
MRU, the pages that have been accessed most recently are freed up or
marked for reuse.
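A minimal sketch of the LRU policy, using Python's OrderedDict; an actual intelligent storage system implements far more elaborate cache management, so this only illustrates the eviction idea:

```python
# Sketch of LRU page selection. Real cache management in a storage array
# is far more involved; this only shows which page gets freed first.

from collections import OrderedDict

class LRUPages:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> data, least recently used first

    def access(self, page_id, data):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)              # mark as most recently used
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            victim, _ = self.pages.popitem(last=False)   # free the least recently used page
            print("LRU freed page", victim)

lru = LRUPages(capacity=2)
lru.access("A", "...")
lru.access("B", "...")
lru.access("A", "...")   # A becomes most recently used
lru.access("C", "...")   # cache full: B (least recently used) is freed
```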
As cache fills, the storage system must take action to flush dirty pages
(data written into the cache but not yet written to the disk) in order to
manage its availability.
Flushing is the process of committing data from cache to the disk. On the
basis of the I/O access rate and pattern, high and low levels called
watermarks are set in cache to manage the flushing process.
High watermark (HWM) is the cache utilization level at which the storage
system starts high- speed flushing of cache data.
Low watermark (LWM) is the point at which the storage system stops the
high-speed or forced flushing and returns to idle flush behavior.
The cache utilization level, as shown in Figure 4-5, drives the mode of
flushing to be used:
■ Idle flushing: Occurs continuously, at a modest rate, when the cache
utilization level is between the high and low watermark.
■ High watermark flushing: Activated when cache utilization hits the high
watermark. The storage system dedicates some additional resources to flushing.
■ Forced flushing: Occurs in the event of a large I/O burst when cache reaches
100 percent of its capacity, which significantly affects the I/O response time.
In forced flushing, dirty pages are forcibly flushed to disk.
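The watermark-driven behaviour above can be summarized in a short sketch. The watermark values are illustrative assumptions; real systems set them based on the I/O access rate and pattern:

```python
# Hedged sketch of watermark-driven flushing: the flush mode is chosen from
# the current cache utilization. Threshold values are illustrative only.

HWM = 0.80   # high watermark (assumed 80% cache utilization)
LWM = 0.40   # low watermark  (assumed 40% cache utilization)

def flush_mode(utilization: float) -> str:
    """Return the flushing behaviour for a given cache utilization (0.0-1.0)."""
    if utilization >= 1.0:
        return "forced flushing"          # cache completely full: dirty pages forcibly flushed
    if utilization >= HWM:
        return "high watermark flushing"  # dedicate extra resources to flushing
    if utilization >= LWM:
        return "idle flushing"            # continuous flushing at a modest rate
    return "idle"                         # below the low watermark: little flushing pressure

for u in (0.30, 0.55, 0.85, 1.0):
    print(f"{u:.0%} utilized -> {flush_mode(u)}")
```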
BACK END
The back end provides an interface between cache and the physical disks. It consists
of two components: back-end ports and back-end controllers.
The back end controls data transfers between cache and the physical disks. From cache,
data is sent to the back end and then routed to the destination disk. Physical disks are
connected to ports on the back end.
The back-end controller communicates with the disks when performing reads and writes
and also provides additional, but limited, temporary data storage.
PHYSICAL DISK
A physical disk stores data persistently.
Disks are connected to the back-end with either SCSI or a Fibre Channel
interface.
An intelligent storage system enables the use of a mixture of SCSI or Fibre Channel
drives and IDE/ATA drives.
Logical Unit Number (LUN)
Physical drives or groups of RAID-protected drives can be logically split into volumes
known as logical volumes or Logical Unit Numbers (LUNs).
For example, without the use of LUNs, a host requiring only 200 GB
could be allocated an entire 1TB physical disk. Using LUNs, only the
required 200 GB would be allocated to the host, allowing the remaining
800 GB to be allocated to other hosts.
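A small illustrative sketch of carving LUNs out of a single physical disk, using the 1 TB / 200 GB example above (the names and the simple capacity check are hypothetical):

```python
# Illustrative sketch of carving LUNs out of one physical disk's capacity
# (capacities in GB; LUN names are hypothetical).

disk_capacity_gb = 1000            # one 1 TB physical disk
luns = {}                          # lun_id -> size allocated to a host

def create_lun(lun_id: str, size_gb: int):
    used = sum(luns.values())
    if used + size_gb > disk_capacity_gb:
        raise ValueError("not enough free capacity on the disk")
    luns[lun_id] = size_gb

create_lun("LUN 0", 200)           # host 1 gets only the 200 GB it needs
create_lun("LUN 1", 800)           # the remaining 800 GB can serve other hosts
print(luns, "free:", disk_capacity_gb - sum(luns.values()), "GB")
```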
For example, Figure 4-6 shows a RAID set consisting of five disks that have been
sliced, or partitioned, into several LUNs. LUNs 0 and 1 are shown in the figure.
Note how a portion of each LUN resides on each physical disk in the RAID set.
LUNs 0 and 1 are presented to hosts 1 and 2, respectively, as physical volumes for
storing and retrieving data. Usable capacity of the physical volumes is determined by
the RAID type of the RAID set.
The capacity of a LUN can be expanded by aggregating other LUNs with it.
The result of this aggregation is a larger capacity LUN, known as a meta-LUN. The
mapping of LUNs to their physical location on the drives is managed by the operating
environment of an intelligent storage system.
LUN Masking
LUN masking is a process that provides data access control by defining which
LUNs a host can access.
The LUN masking function is typically implemented on the front-end controller. This
ensures that volume access by servers is controlled appropriately, preventing
unauthorized or accidental use in a distributed environment.
For example, consider a storage array with two LUNs that store data of the sales
and finance departments. Without LUN masking, both departments can easily see
and modify each other’s data, posing a high risk to data integrity and security.
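A minimal sketch of the LUN masking idea, assuming a simple table that maps each host to the LUNs it is allowed to see (host and LUN names are hypothetical):

```python
# Minimal sketch of LUN masking: the front-end controller consults a masking
# table before serving a host's request. Host and LUN names are hypothetical.

masking_table = {
    "sales_server":   {"LUN 0"},          # sales data only
    "finance_server": {"LUN 1"},          # finance data only
}

def can_access(host: str, lun: str) -> bool:
    """Allow access only if the LUN is masked to (made visible for) this host."""
    return lun in masking_table.get(host, set())

print(can_access("sales_server", "LUN 0"))   # True  - permitted
print(can_access("sales_server", "LUN 1"))   # False - the finance LUN is masked off
```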
DISK DRIVE COMPONENTS:
Data can be recorded and erased on a magnetic disk any number of times.
Key components of a disk drive are platter, spindle, read/write head, actuator arm
assembly, and controller (Figure 2-2):
PLATTER
A typical HDD consists of one or more flat circular disks called platters (Figure
2-3). The data is recorded on these platters in binary codes (0s and 1s).
The set of rotating platters is sealed in a case, called a Head Disk Assembly
(HDA). A platter is a rigid, round disk coated with magnetic material on both
surfaces (top and bottom).
The data is encoded by polarizing the magnetic area, or domains, of the disk
surface. Data can be written to or read from both surfaces of the platter.
The number of platters and the storage capacity of each platter determine the
total capacity of the drive.
SPINDLE
A spindle connects all the platters, as shown in Figure 2-3, and is connected to a
motor. The motor of the spindle rotates with a constant speed.
The disk platter spins at a speed of several thousands of revolutions per minute
(rpm). Disk drives have spindle speeds of 7,200 rpm, 10,000 rpm, or 15,000
rpm. Disks used on current storage systems have a platter diameter of 3.5” (90
mm).
When the platter spins at 15,000 rpm, the outer edge is moving at around 25
percent of the speed of sound.
READ/WRITE HEAD
Read/Write (R/W) heads, shown in Figure 2-4, read and write data from or to
a platter.
Drives have two R/W heads per platter, one for each surface of the platter.
The R/W head changes the magnetic polarization on the surface of the platter when
writing data. While reading data, this head detects the magnetic polarization on the surface
of the platter.
During reads and writes, the R/W head senses the magnetic polarization and never
touches the surface of the platter. When the spindle is rotating, there is a microscopic
air gap between the R/W heads and the platters, known as the head flying height.
This air gap is removed when the spindle stops rotating and the R/W head rests on a
special area on the platter near the spindle. This area is called the landing zone. The
landing zone is coated with a lubricant to reduce friction between the head and the
platter.
The logic on the disk drive ensures that heads are moved to the landing zone before
they touch the surface. If the drive malfunctions and the R/W head accidentally touches
the surface of the platter outside the landing zone, a head crash occurs.
CONTROLLER
The controller (see Figure 2-2 [b]) is a printed circuit board, mounted at the
bottom of a disk drive. It consists of a microprocessor, internal memory,
circuitry, and firmware.
The firmware controls power to the spindle motor and the speed of the
motor. It also manages communication between the drive and the host.
In addition, it controls the R/W operations by moving the actuator arm and
switching between different R/W heads, and performs the optimization of data
access.
Typically, a sector holds 512 bytes of user data, although some disks can be
formatted with larger sector sizes. In addition to user data, a sector also stores other
information, such as sector number, head number or platter number, and track number.
Consequently, there is a difference between the capacity of an unformatted disk
and a formatted one. Drive manufacturers generally advertise the unformatted
capacity; for example, a disk advertised as being 500GB will only hold 465.7GB of
user data, and the remaining 34.3GB is used for metadata.
A cylinder is the set of identical tracks on both surfaces of each drive platter.
The location of drive heads is referred to by cylinder number, not by track number.
In Figure 2-7 (b), the drive shows eight sectors per track, eight heads,
and four cylinders. This means a total of 8 × 8 × 4 = 256 blocks, so the block
number ranges from 0 to 255. Each block has its own unique address.
Assuming that the sector holds 512 bytes, a 500 GB drive with a formatted
capacity of 465.7 GB will have in excess of 976,000,000 blocks.
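The block arithmetic above can be reproduced directly; the second calculation interprets the 465.7 GB formatted capacity in binary (GiB) units, which is the assumption needed to arrive at roughly 976 million 512-byte blocks:

```python
# Reproducing the arithmetic above: total blocks from the drive geometry,
# and the block count of a 465.7 GB formatted drive with 512-byte sectors.

sectors_per_track, heads, cylinders = 8, 8, 4
total_blocks = sectors_per_track * heads * cylinders
print(total_blocks)                          # 256 -> block numbers 0..255

formatted_capacity_bytes = 465.7 * 1024**3   # 465.7 GB, treated as GiB (assumption)
print(int(formatted_capacity_bytes // 512))  # roughly 976 million 512-byte blocks
```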
Seek Time
The seek time (also called access time) describes the time taken to position the R/W
heads across the platter with a radial movement (moving along the radius of the platter).
In other words, it is the time taken to reposition and settle the arm and the head over
the correct track. The lower the seek time, the faster the I/O operation.
Rotational Latency
To access data, the actuator arm moves the R/W head over the platter to a particular
track while the platter spins to position the requested sector under the R/W head.
The time taken by the platter to rotate and position the data under the R/W head is
called rotational latency.
This latency depends on the rotation speed of the spindle and is measured in
milliseconds. The average rotational latency is one-half of the time taken for a full
rotation.
Average rotational latency is around 5.5 ms for a 5,400-rpm drive, and around
2.0 ms for a 15,000-rpm drive.
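The rotational latency figures above follow directly from the spindle speed; a short calculation:

```python
# Average rotational latency = half of one full rotation, as stated above.

def avg_rotational_latency_ms(rpm: int) -> float:
    full_rotation_ms = 60_000 / rpm        # milliseconds per revolution
    return full_rotation_ms / 2

print(round(avg_rotational_latency_ms(5_400), 1))    # ~5.6 ms (the text rounds to about 5.5 ms)
print(round(avg_rotational_latency_ms(15_000), 1))   # 2.0 ms
```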
Internal transfer rate is the speed at which data moves from a single track of a
platter’s surface to internal buffer (cache) of the disk. Internal transfer rate takes into
account factors such as the seek time.
External transfer rate is the rate at which data can be moved through the interface to
the HBA. External transfer rate is generally the advertised speed of the interface, such as
133 MB/s for ATA. The sustained external transfer rate is lower than the interface speed.
Hardware RAID
In hardware RAID implementations, a specialized hardware controller is implemented
either on the host or on the array. These implementations vary in the way the storage array
interacts with the host.
Controller card RAID is host-based hardware RAID implementation in which a specialized
RAID controller is installed in the host and HDDs are connected to it.
The RAID controller interacts with the hard disks using a PCI bus. Manufacturers also
integrate RAID controllers on motherboards. This integration reduces the overall cost of the
system, but does not provide the flexibility required for high-end storage systems.
The external RAID controller is an array-based hardware RAID. It acts as an interface
between the host and disks. It presents storage volumes to the host, which manages them
using the supported protocol. Key functions of RAID controllers are:
■ Management and control of disk aggregations
■ Translation of I/O requests between logical disks and physical disks
■ Data regeneration in the event of disk failures
A RAID array is an enclosure that contains a number of HDDs and the supporting hardware and
software to implement RAID. HDDs inside a RAID array are usually contained in smaller
sub-enclosures.
These sub-enclosures, or physical arrays, hold a fixed number of HDDs, and may also include
other supporting hardware, such as power supplies. A subset of disks within a RAID array can be
grouped to form logical associations called logical arrays, also known as a RAID set or a RAID
group (see Figure 3-1).
Logical arrays are comprised of logical volumes (LV). The operating system recognizes the LVs
as if they are physical HDDs managed by the RAID controller.
The number of HDDs in a logical array depends on the RAID level used. Configurations could
have a logical array with multiple physical arrays or a physical array with multiple logical arrays.
RAID LEVELS
RAID levels (see Table 3-1) are defined on the basis of striping, mirroring, and parity
techniques. These techniques determine the data availability and performance characteristics of
an array. Some RAID arrays use one technique, whereas others use a combination of techniques.
Application performance and data availability requirements determine the RAID level selection.
Striping
A RAID set is a group of disks. Within each disk, a predefined number of
contiguously addressable disk blocks are defined as strips. The set of aligned strips that
spans across all the disks within the RAID set is called a stripe. Figure 3-2 shows physical
and logical representations of a striped RAID set.
Strip size (also called stripe depth) describes the number of blocks in a strip, and is the
maximum amount of data that can be written to or read from a single HDD in the set before the
next HDD is accessed, assuming that the accessed data starts at the beginning of the strip.
Note that all strips in a stripe have the same number of blocks, and decreasing strip size
means that data is broken into smaller pieces when spread across the disks.
Stripe size is the strip size multiplied by the number of HDDs in the RAID set.
Stripe width refers to the number of data strips in a stripe.
Striped RAID does not protect data unless parity or mirroring is used. However, striping
may significantly improve I/O performance. Depending on the type of RAID implementation,
the RAID controller can be configured to access data across multiple HDDs simultaneously.
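As an illustration of how striping spreads data, the sketch below maps logical blocks to the HDDs of a RAID set in round-robin fashion; the strip size and disk count are arbitrary example values, not a controller implementation.

# Minimal striping sketch: distribute logical blocks across the disks of a RAID set.
STRIP_SIZE_BLOCKS = 4      # strip size (stripe depth): blocks written to one HDD before moving on
NUM_DISKS = 3              # HDDs in the RAID set (example value)

stripe_size_blocks = STRIP_SIZE_BLOCKS * NUM_DISKS   # one full stripe spans all disks

def disk_for_block(logical_block: int) -> int:
    """Return the index of the HDD that holds a given logical block."""
    strip_index = logical_block // STRIP_SIZE_BLOCKS
    return strip_index % NUM_DISKS

# Blocks 0-3 land on disk 0, blocks 4-7 on disk 1, blocks 8-11 on disk 2, then wrap around.
print([disk_for_block(b) for b in range(16)])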
Mirroring
Mirroring is a technique whereby data is stored on two different HDDs, yielding two
copies of data. In the event of one HDD failure, the data is intact on the surviving HDD (see
Figure 3-3) and the controller continues to service the host’s data requests from the surviving
disk of a mirrored pair.
When the failed disk is replaced with a new disk, the controller copies the data from the
surviving disk of the mirrored pair. This activity is transparent to the host.
In addition to providing complete data redundancy, mirroring enables faster recovery
from disk failure. However, disk mirroring provides only data protection and is not a substitute for
data backup. Mirroring constantly captures changes in the data, whereas a backup captures point-
in-time images of data.
Mirroring involves duplication of data — the amount of storage capacity needed is
twice the amount of data being stored. Therefore, mirroring is considered expensive and
is preferred for mission-critical applications that cannot afford data loss.
Mirroring improves read performance because read requests can be serviced by both
disks. However, write performance deteriorates, as each write request manifests as two
writes on the HDDs. In other words, mirroring does not deliver the same levels of write
performance as a striped RAID.
Parity
Parity is a method of protecting striped data from HDD failure without the cost of
mirroring. An additional HDD is added to the stripe width to hold parity, a mathematical
construct that allows re-creation of the missing data.
Parity is a redundancy check that ensures full protection of data without
maintaining a full set of duplicate data.
Parity information can be stored on separate, dedicated HDDs or distributed across
all the drives in a RAID set. Figure 3-4 shows a parity RAID. The first four disks, labeled
D, contain the data.
The fifth disk, labeled P, stores the parity information, which in this case is the sum
of the elements in each row. Now, if one of the Ds fails, the missing value can be
calculated by subtracting the sum of the rest of the elements from the parity value.
Parity is recalculated whenever there is a change in the data. This recalculation is
time-consuming and affects the performance of the RAID controller.
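The row-sum idea in Figure 3-4 can be expressed directly in code. The sketch below rebuilds a missing value from the parity exactly as described above; it also shows the bitwise XOR form that actual RAID controllers use for the same purpose.

# Parity illustration following Figure 3-4: parity = sum of the data elements in a row.
data = [3, 1, 2, 3]                 # values on the four data disks (D)
parity = sum(data)                  # value stored on the parity disk (P)

# Disk 2 fails; its value is recovered by subtracting the surviving values from the parity.
surviving = data[:2] + data[3:]
recovered = parity - sum(surviving)
assert recovered == data[2]

# Actual RAID controllers use bitwise XOR, which works the same way per bit position.
d = [0b1010, 0b0110, 0b1100]
p = d[0] ^ d[1] ^ d[2]              # parity strip
assert (p ^ d[0] ^ d[2]) == d[1]    # rebuild the lost strip by XOR-ing the survivors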
RAID 0: Striping
RAID 0, also known as a striped set or a striped volume, requires a minimum of two disks. The
disks are merged into a single large volume where data is stored evenly across the number of disks in
the array.
This process is called disk striping and involves splitting data into blocks and writing it
simultaneously/sequentially on multiple disks. Configuring the striped disks as a single partition
increases performance since multiple disks do reading and writing operations simultaneously.
Therefore, RAID 0 is generally implemented to improve speed and efficiency.
It is important to note that if an array consists of disks of different sizes, each will be limited to
the smallest disk size in the setup. This means that an array composed of two disks, where one is 320
GB, and the other is 120 GB, actually has the capacity of 2 x 120 GB (or 240 GB in total).
Certain implementations allow you to utilize the remaining 200 GB for other uses.
Additionally, developers can implement multiple controllers (or even one per disk) to improve
performance.
RAID 0 is the most affordable RAID configuration and is relatively easy to set
up. Still, it does not include any redundancy, fault tolerance, or parity in its composition.
Hence, problems on any of the disks in the array can result in complete data loss. This is why it
should only be used for non-critical storage, such as temporary files backed up somewhere else.
Advantages of RAID 0
Cost-efficient and straightforward to implement.
Increased read and write performance.
No overhead (total capacity use).
Disadvantages of RAID 0
Doesn't provide fault tolerance or redundancy.
RAID 1: Mirroring
RAID 1 is an array consisting of at least two disks where the same data is stored on each to
ensure redundancy. The most common use of RAID 1 is setting up a mirrored pair consisting of two
disks in which the contents of the first disk are mirrored in the second. This is why such a configuration is
also called mirroring.
Unlike with RAID 0, where the focus is solely on speed and performance, the primary goal of
RAID 1 is to provide redundancy. It eliminates the possibility of data loss and downtime by replacing a
failed drive with its replica.
In such a setup, the array volume is as big as the smallest disk and operates as long as one drive
is operational. Apart from reliability, mirroring enhances read performance as a request can be handled
by any of the drives in the array. On the other hand, the write performance remains the same as with one
disk and is equal to the slowest disk in the configuration.
Advantages of RAID 1
Increased read performance.
Provides redundancy and fault tolerance.
Disadvantages of RAID 1
Uses only half of the storage capacity.
More expensive (needs twice as many drives).
Requires powering down your computer to replace a failed drive.
When Raid 1 Should Be Used
RAID 1 is used for mission-critical storage that requires a minimal risk of data loss. Accounting
systems often opt for RAID 1 as they deal with critical data and require high reliability.
It is also suitable for smaller servers with only two disks, as well as if you are searching for a simple
configuration you can easily set up (even at home).
RAID 2
The array calculates the error code correction on the fly. While writing the data, it stripes it to the
data disk and writes the code to the redundancy disk. On the other hand, while reading data from the
disk, it also reads from the redundancy disk to verify the data and make corrections if needed.
Advantages of RAID 2
Reliability.
The ability to correct stored information.
Disadvantages of RAID 2
Expensive.
Difficult to implement.
Requires entire disks for ECC.
RAID 3
Like RAID 2, RAID 3 is rarely used in practice. This RAID implementation utilizes bit-level
striping and a dedicated parity disk. Because of this, it requires at least three drives, where two are used
for storing data strips, and one is used for parity.
To allow synchronized spinning, RAID 3 also needs a special controller. Due to its configuration
and synchronized disk spinning, it achieves better performance rates with sequential operations than
random read/write operations.
Advantages of RAID 3
Good throughput when transferring large amounts of data.
High efficiency with sequential operations.
Disk failure resiliency.
Disadvantages of RAID 3
Not suitable for transferring small files.
Complex to implement.
Difficult to set up as software RAID.
RAID 4
RAID 4 is another unpopular standard RAID level. It consists of block-level data striping across
two or more independent disks and a dedicated parity disk.
The implementation requires at least three disks – two for storing data strips and one dedicated
for storing parity and providing redundancy. As each disk is independent and there is no synchronized
spinning, there is no need for a controller.
RAID 4 configuration is prone to bottlenecks when storing parity bits for each data block on a
single drive. Such system bottlenecks have a large impact on system performance.
Advantages of RAID 4
RAID 5
Parity bits are distributed evenly on all disks after each sequence of data has been saved. This
feature ensures that you still have access to the data from parity bits in case of a failed drive. Therefore,
RAID 5 provides redundancy through parity bits instead of mirroring.
Advantages of RAID 5
High performance and capacity.
Fast and reliable read speed.
Tolerates single drive failure.
Disadvantages of RAID 5
Longer rebuild time.
Loses one disk's worth of capacity to parity.
If more than one disk fails, data is lost.
More complex to implement.
When Raid 5 Should Be Used
RAID 5 is often used for file and application servers because of its high efficiency and optimized
storage. Additionally, it is the best, cost-effective solution if continuous data access is a priority and/or
you require installing an operating system on the array.
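Before moving on, it is useful to compare the capacity overheads of the common levels. The sketch below applies the standard usable-capacity formulas for equal-sized disks; the disk counts and sizes are example values only.

# Usable capacity for N equal-sized disks (standard formulas for each RAID level).
def usable_capacity_gb(level: str, n_disks: int, disk_gb: float) -> float:
    if level == "RAID0":   return n_disks * disk_gb            # no redundancy
    if level == "RAID1":   return disk_gb                      # mirrored pair (n_disks = 2)
    if level == "RAID5":   return (n_disks - 1) * disk_gb      # one disk's worth of parity
    if level == "RAID6":   return (n_disks - 2) * disk_gb      # two disks' worth of parity
    if level == "RAID10":  return (n_disks // 2) * disk_gb     # mirrored pairs, then striped
    raise ValueError(level)

for level, n in [("RAID0", 4), ("RAID1", 2), ("RAID5", 4), ("RAID6", 5), ("RAID10", 4)]:
    print(level, usable_capacity_gb(level, n, 1000), "GB usable")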
RAID 6
Block-level striping with two parity blocks allows two disk failures before any data is lost. This means
that in an event where two disks fail, RAID can still reconstruct the required data.
Its performance depends on how the array is implemented, as well as the total number of drives.
Write operations are slower compared to other configurations due to its double parity feature.
Advantages of RAID 6
High fault and drive-failure tolerance.
Storage efficiency (when more than four drives are used).
Fast read operations.
Disadvantages of RAID 6
Rebuild time can take up to 24 hours.
Slow write performance.
Complex to implement.
More expensive.
RAID 10
RAID 10 combines mirroring and striping (a stripe of mirrored pairs). To implement such a
configuration, the array requires at least four drives, as well as a disk
controller.
Advantages of RAID 10
High performance.
High fault-tolerance.
Fast read and write operations.
Fast rebuild time.
Disadvantages of RAID 10
Limited scalability.
Costly (compared to other RAID levels).
Uses half of the disk space capacity.
More complicated to set up.
TYPES OF INTELLIGENT STORAGE SYSTEMS
Intelligent storage systems generally fall into one of the following two categories:
■ High-end storage systems
■ Midrange storage systems
Traditionally, high-end storage systems have been implemented with active-active
arrays, whereas midrange storage systems used typically in small- and medium- sized
enterprises have been implemented with active-passive arrays.
Active-passive arrays provide optimal storage solutions at lower costs. Enterprises
make use of this cost advantage and implement active-passive arrays to meet specific
application requirements such as performance, availability, and scalability. The distinctions
between these two implementations are becoming increasingly insignificant.
High-end storage systems, referred to as active-active arrays, are generally aimed at
large enterprises for centralizing corporate data. These arrays are designed with a large
number of controllers and cache memory.
An active-active array implies that the host can perform I/Os to its LUNs across any of the
available paths (see Figure 4-7).
To address the enterprise storage needs, these arrays provide the following capabilities:
■ Large storage capacity
■ Large amounts of cache to service host I/Os optimally
■ Fault tolerance architecture to improve data availability
■ Connectivity to mainframe computers and open systems hosts
■ Availability of multiple front-end ports and interface protocols to serve a large
number of hosts
■ Availability of multiple back-end Fibre Channel or SCSI RAID controllers to manage disk
processing
■ Scalability to support increased connectivity, performance, and storage capacity
requirements
■ Ability to handle large amounts of concurrent I/Os from a number of servers and
applications
■ Support for array-based local and remote replication
In addition to these features, high-end arrays possess some unique features and functionality that
are required for mission-critical applications in large enterprises.
Midrange storage systems are also referred to as active-passive arrays. They are typically designed
with two controllers, each of which contains host interfaces, cache, RAID controllers, and disk drive interfaces.
Midrange arrays are designed to meet the requirements of small and medium
enterprises; therefore, they host less storage capacity and global cache than active-active
arrays.
There are also fewer front-end ports for connection to servers. However, they
ensure high redundancy and high performance for applications with predictable workloads.
They also support array-based local and remote replication.
Scale-up Architecture
In a scale-up data storage architecture, storage drives are added to increase storage capacity and
performance. The drives are managed by two controllers. When you run out of storage capacity, you
add another shelf of drives to the architecture.
In a scale-up approach, organizations add to existing infrastructure, such as with more disks or
drives. If it is important to retain the same device rather than splitting up critical applications and data
across multiple storage devices, use a scale-up approach to scale storage. This is also known as vertical
scaling.
IT management may determine that an existing storage device will need to increase its capacity
due to expansion of key applications that use the storage component. Organizations can then configure
additional servers and link them to the main system.
A drawback of scaling up is that you may need to take the existing server offline while replacing
it with a new, more powerful one. During this time, your apps will be unavailable.
Scale-out Architecture
A scale-out architecture uses software-defined storage (SDS) to separate the storage hardware
from the storage software, letting the software act as the controllers. This is why scale-out storage is
considered to be network attached storage (NAS).
Scale-out NAS systems involve clusters of software nodes that work together. Nodes can be
added or removed, allowing things like bandwidth, compute, and throughput to increase or decrease as
needed. To upgrade a scale-out system, new clusters must be created.
Distributed file systems can be an important part of a scale-out arrangement, as they use multiple
devices in a cohesive storage environment.
Less downtime and easier upgrades: Scaling out means less downtime because you don’t have to
switch anything off to scale or make upgrades. Scaling out essentially allows you to upgrade or
downgrade your hardware whenever you want as you can move all users, workloads, and data
without any downtime. Scale-out systems can also auto-tune and self-heal, allowing clusters to
easily accommodate all data demands.
In practice, many organizations use a hybrid approach, maximizing each server’s power through
scaling up, then expanding capacity through scaling out.
Ultimately, the choice between the two strategies should take into account your application’s
requirements, growth projections and budget. Remember, the goal is to align your scaling strategy with
your business objectives for optimal performance.
One great option for scaling your storage is network-attached storage (NAS).
By enabling storage across multiple environments, block storage separates data from
the limitations of individual user environments. As a result, data can be retrieved
through any number of paths to maximize efficiency, with high input/output
operations per second (IOPS).
The result is an approach that offers a higher level of efficiency than other cloud
storage methods, making it ideal for high-performance applications or applications
that require constant writing and retrieval.
Databases: Block storage is fast, efficient, flexible, and scalable, with support for
redundant volumes. This allows it to support databases, particularly those that handle a
heavy volume of queries and where latency must be minimized.
Disaster recovery: Block storage can be a redundant backup solution for nearline storage
and quick restoration, with data swiftly moved from backup to production through easy
access.
File sharing: File storage is ideal for centralizing and sharing files on a Local Area Network
(LAN). Files stored on a NAS device are easily accessible by any computer on
the network that has the appropriate permission rights.
Common protocols: File storage uses common file-level protocols such as Server Message
Block (SMB), Common Internet File System (CIFS), or Network File System (NFS). If you
utilize a Windows or Linux operating system (or both), standard protocols like SMB/CIFS
and NFS will allow you to read and write files to a Windows-based or Linux-based server
over your Local Area Network (LAN).
Data protection: Storing files on a separate, LAN-connected storage device offers you a
level of data protection should your network computer experience a failure. Cloud-based file
storage services provide additional data protection and disaster recovery by replicating data
files across multiple, geographically-dispersed data centers.
Affordability: File storage using a NAS device allows you to move files off of expensive
computing hardware and onto a more affordable LAN-connected storage device. Moreover, if
you choose to subscribe to a cloud file-storage service, you eliminate the expense of on-site
hardware upgrades and the associated ongoing maintenance and operation costs.
Much of today’s data is unstructured: email, media and audio files, web pages, sensor
data, and other types of digital content that do not fit easily into traditional databases.
As a result, finding efficient and affordable ways to store and manage it has become
problematic.
Increasingly, object storage has become the preferred method for storing static
content, data archives, and backups.
Searchability
Metadata is part of objects, making it easy to search through and navigate without
the need of a separate application. It’s also far more flexible and customizable. You can tag
objects with attributes and information, such as consumption, cost, and policies for
automated deletion, retention, and tiering.
Resiliency
Object storage can automatically replicate data and store it across multiple devices
and geographical locations. This can help protect against outages, safeguard against data
loss, and help support disaster recovery strategies.
Cost efficiency
Object storage was created with cost in mind, providing storage for large amounts of
data at a lower price than file- and block-based systems. With object storage, you only pay
for the capacity you need, allowing you to control costs even for large amounts of data.
UNIFIED STORAGE :
Also known as multiprotocol storage, unified storage allows multiple types of data to
be stored in the same device.
It combines block and file storage protocols, such as iSCSI, NFS, and SMB, into a
single platform, making it easier for IT administrators to manage and maintain their
storage infrastructure because they have it all in one place.
With unified storage, users can access their data from different applications and
platforms via a single interface, which helps streamline workflows and reduce storage
complexity.
The storage controllers manage the storage access protocols and data services, while
the storage arrays contain the physical storage devices such as hard drives and solid-
state drives.
Network interfaces connect the storage system to the network, and management
software provides a GUI or command-line interface for administrators to manage and
monitor the storage environment.
Block storage
Block storage improves on the performance of file storage, breaking files into
separate blocks and storing them separately. A block-storage system will assign a unique
identifier to each chunk of raw data, which can then be used to reassemble them into the
complete file when you need to access it. Block storage doesn’t require a single path to data,
so you can store it wherever is most convenient and still retrieve it quickly when needed.
Block storage works well for organizations that work with large amounts of
transactional data or mission-critical applications that need minimal delay and consistent
performance. However, it can be expensive, offers no metadata capabilities, and requires an
operating system to access blocks.
Object storage
Object storage, as discussed earlier, saves files in a flat data environment, or storage
pool, as a self-contained object that contains all the data, a unique identifier, and detailed
metadata that contains information about the data, permissions, policies, and other
contingencies. Object storage works best for static storage, especially for unstructured data,
where you write data once but may need to read it many times.
While object storage eliminates the need for directories, folders, and other complex
hierarchical organization, it’s not a good solution for dynamic data that is changing
constantly as you’ll need to rewrite the entire object to modify it. In some cases, file storage
and block storage may still suit your needs depending on your speed and performance
requirements.
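Conceptually, each object bundles the data itself, a unique identifier, and rich metadata into one self-contained unit. A minimal sketch (the field names are illustrative, not taken from any particular product):

import uuid

# Illustrative object-storage record: data + unique identifier + descriptive metadata.
def make_object(data: bytes, **metadata) -> dict:
    return {
        "id": str(uuid.uuid4()),        # unique identifier used to locate the object
        "data": data,                   # the content itself
        "metadata": metadata,           # searchable attributes: owner, retention policy, tier, ...
    }

obj = make_object(b"<archived report>", owner="finance", retention_days=365, tier="cold")
print(obj["id"], obj["metadata"])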
Types of SDN
There are four primary types of software-defined networking (SDN):
· Open SDN – Open protocols are used to control the virtual and physical devices
responsible for routing the data packets.
· API SDN – Through programming interfaces, often called southbound APIs,
organizations control the flow of data to and from each device.
· Overlay Model SDN – It creates a virtual network above existing hardware, providing
tunnels containing channels to data centers. This model then allocates bandwidth in each
channel and assigns devices to each channel.
· Hybrid Model SDN – By combining SDN and traditional networking, the hybrid
model assigns the optimal protocol for each type of traffic. Hybrid SDN is often used as an
incremental approach to SDN.
SDN Architecture
The architecture of software-defined networking (SDN) consists of three main layers: the
application layer, the control layer, and the infrastructure layer. Each layer has a specific role
and interacts with the other layers to manage and control the network.
Infrastructure Layer: The infrastructure layer is the bottom layer of the SDN architecture,
also known as the data plane. It consists of physical and virtual network devices such as
switches, routers, and firewalls that are responsible for forwarding network traffic based on
the instructions received from the control plane.
Control Layer: The control layer is the middle layer of the SDN architecture, also known as
the control plane. It consists of a centralized controller that communicates with the
infrastructure layer devices and is responsible for managing and configuring the network.
The controller interacts with the devices in the infrastructure layer using protocols such as
OpenFlow to program the forwarding behaviour of the switches and routers. The controller
uses network policies and rules to make decisions about how traffic should be forwarded
based on factors such as network topology, traffic patterns, and quality of service
requirements.
Application Layer: The application layer is the top layer of the SDN architecture and is
responsible for providing network services and applications to end-users. This layer consists
of various network applications that interact with the control layer to manage the network.
The main benefit of the SDN architecture is its flexibility and ability to centralize
control of the network. The separation of the control plane from the data plane enables
network administrators to configure and manage the network more easily and in a more
granular way, allowing for greater network agility and faster response times to changes in
network traffic.
Advantages of SDN:
Software-defined networking (SDN) offers several advantages over traditional networking
architectures, including:
o Centralized Network Control: One of the key benefits of SDN is that it centralizes the
control of the network in a single controller, making it easier to manage and configure
the network. This allows network administrators to define and enforce network
policies consistently across the entire network.
Disadvantages of SDN
While software-defined networking (SDN) has several advantages over traditional
networking, there are also some potential disadvantages that organizations should be aware
of. Here are some of the main disadvantages of SDN:
o Complexity: SDN can be more complex than traditional networking because it
involves a more sophisticated set of technologies and requires specialized skills to
manage. For example, the use of a centralized controller to manage the network
requires a deep understanding of the SDN architecture and protocols.
o Dependency on the Controller: The centralized controller is a critical component of
SDN, and if it fails, the entire network could go down. This means that organizations
need to ensure that the controller is highly available and that they have a robust
backup and disaster recovery plan in place.
o Compatibility: Some legacy network devices may not be compatible with SDN,
which means that organizations may need to replace or upgrade these devices to take
full advantage of the benefits of SDN.
o Security: While SDN can enhance network security, it can also introduce new
security risks. For example, a single point of control could be an attractive target for
attackers, and the programmability of the network could make it easier for attackers to
manipulate traffic.
o Vendor Lock-In: SDN solutions from different vendors may not be interoperable,
which could lead to vendor lock-in. This means that organizations may be limited in
their ability to switch to another vendor or integrate new solutions into their existing
network.
o Performance: The centralized control of the network in SDN can introduce latency,
which could impact network performance in certain situations. Additionally, the
overhead of the SDN controller could impact the performance of the network as the
network scales.
COMPONENTS OF FC SAN
A SAN consists of three basic components: servers, network infrastructure, and
storage.
These components can be further broken down into the following key elements: node
ports, cabling, interconnecting devices (such as FC switches or hubs), storage
arrays, and SAN management software.
Node Ports
In fibre channel, devices such as hosts, storage and tape libraries are all referred to as
nodes. Each node is a source or destination of information for one or more nodes.
Each node requires one or more ports to provide a physical interface for
communicating with other nodes. These ports are integral components of an HBA and
the storage front-end adapters.
A port operates in full-duplex data transmission mode with a transmit (Tx) link and a
receive (Rx) link (see Figure 6-3)
Cabling
SAN implementations use optical fiber cabling. Copper can be used for shorter
distances for back-end connectivity, as it provides a better signal-to-noise ratio for
distances up to 30 meters.
Optical fiber cables carry data in the form of light.
There are two types of optical cables, multi-mode and single-mode. Multi-mode fiber
(MMF) cable carries multiple beams of light projected at different angles
simultaneously onto the core of the cable (see Figure 6-4 (a)).
Based on the bandwidth, multi-mode fibers are classified as OM1 (62.5µm), OM2
(50µm) and laser optimized OM3 (50µm). In an MMF transmission, multiple light
beams traveling inside the cable tend to disperse and collide.
This collision weakens the signal strength after it travels a certain distance — a
process known as modal dispersion. An MMF cable is usually used for distances of up
to 500 meters because of signal degradation (attenuation) due to modal dispersion.
Single-mode fiber (SMF) carries a single ray of light projected at the center of the core
(see Figure 6-4 (b)).
These cables are available in diameters of 7–11 microns; the most common size is 9
microns. In an SMF transmission, a single light beam travels in a straight line through
the core of the fiber.
The small core and the single light wave limits modal dispersion. Among all types of
fibre cables, single-mode provides minimum signal attenuation over maximum
distance (up to 10 km).
A single-mode cable is used for long-distance cable runs, limited only by the power of
the laser at the transmitter and sensitivity of the receiver
MMFs are generally used within data centers for shorter distance runs, while SMFs
are used for longer distances. MMF transceivers are less expensive as compared to
SMF transceivers.
A Standard connector (SC) (see Figure 6-5 (a)) and a Lucent connector (LC) (see
Figure 6-5 (b)) are two commonly used connectors for fiber optic cables.
An SC is used for data transmission speeds up to 1 Gb/s, whereas an LC is used for
speeds up to 4 Gb/s. Figure 6-6 depicts a Lucent connector and a Standard connector.
A Straight Tip (ST) is a fiber optic connector with a plug and a socket that is locked
with a half-twisted bayonet lock (see Figure 6-5 (c)).
In the early days of FC deployment, fiber optic cabling predominantly used ST
connectors. This connector is often used with Fibre Channel patch panels
Interconnect Devices
Hubs, switches, and directors are the interconnect devices commonly used in SAN.
Hubs are used as communication devices in FC-AL implementations. Hubs physically
connect nodes in a logical loop or a physical star topology.
All the nodes must share the bandwidth because data travels through all the connection
points. Because of the availability of low-cost and high-performance switches, hubs are no
longer used in SANs.
Switches are more intelligent than hubs and directly route data from one physical port
to another.
Therefore, nodes do not share the bandwidth. Instead, each node has a dedicated
communication path, resulting in bandwidth aggregation
Directors are larger than switches and are deployed for data center implementations.
The function of directors is similar to that of FC switches, but directors have higher
port count and fault tolerance capabilities.
Storage Arrays
The fundamental purpose of a SAN is to provide host access to storage resources.
The large storage capacities offered by modern storage arrays have been exploited in
SAN environments for storage consolidation and centralization.
SAN implementations complement the standard features of storage arrays by
providing high availability and redundancy, improved performance, business
continuity, and multiple host connectivity.
FC ARCHITECTURE
The FC architecture represents true channel/network integration with standard
interconnecting devices. Connections in a SAN are accomplished using FC.
Traditionally, transmissions from host to storage devices are carried out over channel
connections such as a parallel bus. Channel technologies provide high levels of
performance with low protocol overheads.
Such performance is due to the static nature of channels and the high level of hardware
and software integration provided by the channel technologies.
However, these technologies suffer from inherent limitations in terms of the number of
devices that can be connected and the distance between these devices. Fibre Channel
Protocol (FCP) is the implementation of serial SCSI-3 over an FC network. In the FCP
architecture, all external and remote storage devices attached to the SAN appear as
local devices to the host operating system.
The FC standard enables mapping several existing Upper Layer Protocols (ULPs) to
FC frames for transmission, including SCSI, IP, High Performance Parallel Interface
(HIPPI), Enterprise System Connection (ESCON), and Asynchronous Transfer Mode
(ATM).
FC Address of an NL_port
The FC addressing scheme for an NL_port differs from other ports.
The two upper bytes in the FC addresses of the NL_ports in a private loop are
assigned zero values. However, when an arbitrated loop is connected to a fabric
through an FL_port, it becomes a public loop.
In this case, an NL_port supports a fabric login. The two upper bytes of this NL_port
are then assigned a positive value, called a loop identifier, by the switch. The loop
identifier is the same for all NL_ports on a given loop.
Figure 6-15 illustrates the FC address of an NL_port in both a public loop and a
private loop. The last field in the FC addresses of the NL_ports, in both public and
private loops, identifies the AL-PA. There are 127 allowable AL-PA addresses; one
address is reserved for the FL_port on the switch
FC Frame
An FC frame (Figure 6-17) consists of five parts: start of frame (SOF), frame header,
data field, cyclic redundancy check (CRC), and end of frame (EOF).
The SOF and EOF act as delimiters.
In addition to this role, the SOF is a flag that indicates whether the frame is the first
frame in a sequence of frames. The frame header is 24 bytes long and contains
addressing information for the frame.
It includes the following information: Source ID (S_ID), Destination ID (D_ID),
Sequence ID (SEQ_ID), Sequence Count (SEQ_CNT), Originating Exchange ID
(OX_ID), and Responder Exchange ID (RX_ID), in addition to some control fields
The S_ID and D_ID are standard FC addresses for the source port and the destination
port, respectively. The SEQ_ID and OX_ID identify the frame as a component of a
specific sequence and exchange, respectively.
A sequence refers to a contiguous set of frames that are sent from one port to
another. A sequence corresponds to an information unit, as defined by the ULP.
■ Frame:
A frame is the fundamental unit of data transfer at Layer 2. Each frame can
contain up to 2,112 bytes of payload.
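The frame layout described above can be summarised as a simple data structure. The sketch below models only the header fields named in the text and checks the 2,112-byte payload limit; it is a conceptual aid, not a wire-format encoder (the SOF, EOF delimiters and the CRC are therefore omitted).

from dataclasses import dataclass

MAX_FC_PAYLOAD = 2112   # maximum payload bytes per FC frame at Layer 2

@dataclass
class FCFrameHeader:
    """24-byte frame header fields named in the text (control fields omitted)."""
    s_id: int        # Source ID
    d_id: int        # Destination ID
    seq_id: int      # Sequence ID
    seq_cnt: int     # Sequence Count
    ox_id: int       # Originating Exchange ID
    rx_id: int       # Responder Exchange ID

@dataclass
class FCFrame:
    header: FCFrameHeader
    payload: bytes   # data field, up to 2,112 bytes

    def __post_init__(self):
        if len(self.payload) > MAX_FC_PAYLOAD:
            raise ValueError("payload exceeds 2,112 bytes; split into more frames")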
Flow Control
Flow control defines the pace of the flow of data frames during data transmission. FC
technology uses two flow-control mechanisms: buffer-to-buffer credit (BB_Credit)
and end-to-end credit (EE_Credit).
BB_Credit
FC uses the BB_Credit mechanism for hardware-based flow control.
BB_Credit controls the maximum number of frames that can be present over the link
at any given point in time. In a switched fabric, BB_Credit management may take
place between any two FC ports.
The transmitting port maintains a count of free receiver buffers and continues to send
frames if the count is greater than 0. The BB_Credit mechanism provides frame
acknowledgment through the Receiver Ready (R_RDY) primitive.
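The credit-counting behaviour can be modelled in a few lines: the transmitter sends only while its count of free receiver buffers is greater than zero and replenishes the count on each R_RDY. This is a simplified model, not an implementation of the FC standard.

class BBCreditLink:
    """Simplified buffer-to-buffer credit model for one FC link."""
    def __init__(self, receiver_buffers: int):
        self.credits = receiver_buffers      # BB_Credit value granted at login

    def can_send(self) -> bool:
        return self.credits > 0              # transmit only while free buffers remain

    def send_frame(self):
        if not self.can_send():
            raise RuntimeError("no BB_Credit available; wait for R_RDY")
        self.credits -= 1                    # one receiver buffer is now in use

    def receive_r_rdy(self):
        self.credits += 1                    # receiver freed a buffer and acknowledged it

link = BBCreditLink(receiver_buffers=4)
for _ in range(4):
    link.send_frame()
print(link.can_send())     # False: credits exhausted until an R_RDY arrives
link.receive_r_rdy()
print(link.can_send())     # True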
EE_Credit
The function of end-to-end credit, known as EE_Credit, is similar to that of
BB_Credit.
When an initiator and a target establish themselves as nodes communicating with each
other, they exchange the EE_Credit parameters (part of Port Login).
The EE_Credit mechanism affects the flow control for class 1 and class 2 traffic only
Classes of Service
The FC standards define different classes of service to meet the requirements of a
wide range of applications.
The table below shows three classes of services and their features (Table 6-1)
Another class of services is class F, which is intended for use by the switches
communicating through ISLs. Class F is similar to Class 2, and it provides notification
of nondelivery of frames.
FC TOPOLOGIES :
Fabric design follows standard topologies to connect devices.
Core-edge fabric is one of the popular topology designs.
Variations of core-edge fabric and mesh topologies are most commonly deployed in
SAN implementations.
Core-Edge Fabric
In the core-edge fabric topology, there are two types of switch tiers in this fabric. The
edge tier usually comprises switches and offers an inexpensive approach to adding
more hosts in a fabric.
The tier at the edge fans out from the tier at the core. The nodes on the edge can
communicate with each other.
The core tier usually comprises enterprise directors that ensure high fabric availability.
Additionally all traffic has to either traverse through or terminate at this tier. In a
two-tier configuration, all storage devices are connected to the core tier, facilitating
fan-out.
The host-to-storage traffic has to traverse one and two ISLs in a two-tier and three-tier
configuration, respectively.
Hosts used for mission-critical applications can be connected directly to the core tier
and consequently avoid traveling through the ISLs to process I/O requests from these
hosts.
The core-edge fabric topology increases connectivity within the SAN while
conserving overall port utilization. If expansion is required, an additional edge switch
can be connected to the core.
This topology can have different variations. In a single-core topology, all hosts are
connected to the edge tier and all storage is connected to the core tier.
Figure 6-21 depicts the core and edge switches in a single-core topology.
Mesh Topology
In a mesh topology, each switch is directly connected to other switches by using ISLs.
This topology promotes enhanced connectivity within the SAN.
When the number of ports on a network increases, the number of nodes that can
participate and communicate also increases.
A mesh topology may be one of the two types: full mesh or partial mesh. In a full
mesh, every switch is connected to every other switch in the topology. Full mesh
topology may be appropriate when the number of switches involved is small.
A typical deployment would involve up to four switches or directors, with each of
them servicing highly localized host-to-storage traffic.
In a full mesh topology, a maximum of one ISL or hop is required for host-to-storage
traffic. In a partial mesh topology, several hops or ISLs may be required for the traffic
to reach its destination.
Hosts and storage can be located anywhere in the fabric, and storage can be localized
to a director or a switch in both mesh topologies.
A full mesh topology with a symmetric design results in an even number of switches,
whereas a partial mesh has an asymmetric design and may result in an odd number of
switches.
Figure 6-23 depicts both a full mesh and a partial mesh topology.
ZONING
Zoning is an FC switch function that enables nodes within the fabric to be logically
segmented into groups that can communicate with each other (see Figure 6-18).
When a device (host or storage array) logs onto a fabric, it is registered with the name
server. When a port logs onto the fabric, it goes through a device discovery process
with other devices registered in the name server.
The zoning function controls this process by allowing only the members in the same
zone to establish these link-level services.
Multiple zone sets may be defined in a fabric, but only one zone set can be active at a
time. A zone set is a set of zones and a zone is a set of members. A member may be in
multiple zones. Members, zones, and zone sets form the hierarchy defined in the
zoning process (see Figure 6-19).
Members are nodes within the SAN that can be included in a zone. Zones comprise a
set of members that have access to one another. A port or a node can be a member of
multiple zones.
Zone sets comprise a group of zones that can be activated or deactivated as a single
entity in a fabric. Only one zone set per fabric can be active at a time.
Zone sets are also referred to as zone configurations
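The member, zone, and zone set hierarchy can be pictured with simple data structures. In the sketch below (the names are invented for illustration), two nodes may establish link-level services only if they appear together in some zone of the currently active zone set.

# Zoning hierarchy sketch: a zone set is a set of zones; a zone is a set of members.
zone_sets = {
    "production": {                      # zone set (zone configuration)
        "zone_app":    {"host_a", "array_port_1"},
        "zone_backup": {"host_b", "array_port_1", "tape_1"},
    },
    "maintenance": {
        "zone_all": {"host_a", "host_b", "array_port_1", "tape_1"},
    },
}
active_zone_set = "production"           # only one zone set can be active at a time

def can_communicate(member1: str, member2: str) -> bool:
    """True if both members appear together in some zone of the active zone set."""
    return any(member1 in zone and member2 in zone
               for zone in zone_sets[active_zone_set].values())

print(can_communicate("host_a", "array_port_1"))   # True  (same zone)
print(can_communicate("host_a", "tape_1"))         # False (no shared zone)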
Types of Zoning
Zoning can be categorized into three types:
■ Port zoning: It uses the FC addresses of the physical ports to define zones.
In port zoning, access to data is determined by the physical switch port to which a
node is connected.
The FC address is dynamically assigned when the port logs on to the fabric.
Therefore, any change in the fabric configuration affects zoning. Port zoning is also
called hard zoning.
Although this method is secure, it requires updating of zoning configuration
information in the event of fabric reconfiguration.
■ WWN zoning: It uses World Wide Names to define zones. WWN zoning is also
referred to as soft zoning. A major advantage of WWN zoning is its flexibility. It
allows the SAN to be recabled without reconfiguring the zone information. This is
possible because the WWN is static to the node port.
■ Mixed zoning: It combines the qualities of both WWN zoning and port zoning.
Using mixed zoning enables a specific port to be tied to the WWN of a node.
Zoning is used in conjunction with LUN masking for controlling server access to
storage. However, these are two different activities.
Zoning takes place at the fabric level and LUN masking is done at the array level
FC SAN VIRTUALIZATION
For SAN virtualization, the available virtualization features in the IBM Storage
portfolio are described.
These features enable the SAN infrastructure to support the requirements of scalability
and consolidation, combining them with a lower TCO and a higher return on
investment (ROI):
IBM b-type Virtual Fabrics
CISCO Virtual SAN (VSAN)
N_Port ID Virtualization (NPIV) support for virtual nodes
Logical fabric: When the fabric is formed with at least one logical switch, the fabric is
called a logical fabric.
Two methods of fabric connectivity are available for logical fabrics:
A logical fabric is connected with a dedicated inter-switch link (ISL) to
another switch or a logical switch.
Figure 6-7 shows a logical fabric that is formed between logical switches
through a dedicated ISL for logical switches.
Figure 6-8 shows a logical fabric that is formed through the XISL in the base
switch
Cisco virtual storage area network: Cisco virtual storage area network (VSAN) is a
feature that enables the logical partitioning of SAN switches. A VSAN provides the
flexibility to partition the switch, for example, into a dedicated VSAN for disk and tape.
Or, a VSAN can provide the flexibility to maintain production and test devices in
separate VSANs on the same chassis.
Also, the VSAN can scale across the chassis, which allows it to overcome the fixed
port numbers on the chassis.
Virtual storage area network in a single storage area network switch: With VSAN, you
can consolidate small fabrics into the same chassis.
This consolidation can also enable more security by the logical separation of the
chassis into two individual VSANs.
Figure 6-9 shows a single chassis that is divided into two logical VSANs.
Virtual storage area network across multiple chassis: In multiple chassis, the virtual
storage area network (VSAN) can be formed with devices in one chassis to devices in
another switch chassis through the extended inter-switch link (XISL).
Figure 6-10 shows the VSAN across chassis with an enhanced inter-switch link (EISL)
for VSAN communication.
NPIV mode of blade server switch modules: On blade servers, when they are enabled
with the NPIV mode, the FC switch modules that connect to an external SAN switch
for access to storage act as an HBA N_port (instead of a switch E_port).
The back-end ports are F_ports that connect to server blade modules.
With the NPIV mode, we can overcome the interoperability issues of merging external
switches that might come from separate vendors to the blade server switch module.
Also, management is easier because the blade switch module becomes a node in the
fabric.
And, we can overcome the scalability limitations of many switch domains for a switch
module in blade servers.
iSCSI is the host-based encapsulation of SCSI I/O over IP using an Ethernet NIC card
or an iSCSI HBA in the host.
As illustrated in Figure 8-2 (a), IP traffic is routed over a network either to a gateway
device that extracts the SCSI I/O from the IP packets or to an iSCSI storage array.
The gateway can then send the SCSI I/O to an FC-based external storage array,
whereas an iSCSI storage array can handle the extraction and I/O natively. FCIP uses
a pair of bridges (FCIP gateways) communicating over TCP/IP as the transport
protocol.
FCIP is used to extend FC networks over distances and/or an existing IP-based
infrastructure, as illustrated in Figure 8-2 (b).
Today, iSCSI is widely adopted for connecting servers to storage because it is
relatively inexpensive and easy to implement, especially in environments where an FC
SAN does not exist.
FCIP is extensively used in disaster-recovery implementations, where data is
duplicated on disk or tape to an alternate site.
iSCSI
iSCSI is an IP-based protocol that establishes and manages connections between
storage, hosts, and bridging devices over IP. iSCSI carries block-level data over IP-
based networks, including Ethernet networks and the Internet.
iSCSI is built on the SCSI protocol by encapsulating SCSI commands and data in
order to allow these encapsulated commands and data blocks to be transported using
TCP/IP packets.
Components of iSCSI
Host (initiators), targets, and an IP-based network are the principal iSCSI components.
The simplest iSCSI implementation does not require any FC components. If an iSCSI-
capable storage array is deployed, a host itself can act as an iSCSI initiator, and
directly communicate with the storage over an IP network.
However, in complex implementations that use an existing FC array for iSCSI
connectivity, iSCSI gateways or routers are used to connect the existing FC SAN.
These devices perform protocol translation from IP packets to FC packets and vice-
versa, thereby bridging connectivity between the IP and FC environments.
Use of an iSCSI HBA is also the simplest way for implementing a boot from SAN
environment via iSCSI. If there is no iSCSI HBA, modifications have to be made to
the basic operating system to boot a host from the storage devices because the NIC
needs to obtain an IP address before the operating system loads.
The functionality of an iSCSI HBA is very similar to the functionality of an FC HBA,
but it is the most expensive option.
A fault-tolerant host connectivity solution can be implemented using host-based
multipathing software (e.g., EMC PowerPath) regardless of the type of physical
connectivity. Multiple NICs can also be combined via link aggregation technologies to
provide failover or load balancing.
Complex solutions may also include the use of vendor-specific storage-array software
that enables the iSCSI host to connect to multiple ports on the array with multiple
NICs or HBAs.
Figure 8-3 (b) illustrates an existing FC storage array used to service hosts connected
through iSCSI.
The array does not have any native iSCSI capabilities—that is, it does not have any
Ethernet ports.
Therefore, an external device, called a bridge, router, gateway, or a multi-protocol
router, must be used to bridge the communication from the IP network to the FC SAN.
These devices can be a stand-alone unit, or in many cases are integrated with an
existing FC switch. In this configuration, the bridge device has Ethernet ports
connected to the IP network, and FC ports connected to the storage.
These ports are assigned IP addresses, similar to the ports on an iSCSI-enabled array.
The iSCSI initiator/host is configured with the bridge’s IP address as its target
destination.
The bridge is also configured with an FC initiator or multiple initiators. These are
called virtual initiators because there is no physical device, such as an HBA, to
generate the initiator record
SCSI is the command protocol that works at the application layer of the OSI model.
The initiators and targets use SCSI commands and responses to talk to each other. The
SCSI command descriptor blocks, data, and status messages are encapsulated into
TCP/IP and transmitted across the network between initiators and targets.
iSCSI is the session-layer protocol that initiates a reliable session between a device
that recognizes SCSI commands and TCP/IP.
The iSCSI session-layer interface is responsible for handling login, authentication,
target discovery, and session management. TCP is used with iSCSI at the transport
layer to provide reliable service.
TCP is used to control message flow, windowing, error recovery, and retransmission.
It relies upon the network layer of the OSI model to provide global addressing and
connectivity.
The layer-2 protocols at the data link layer of this model enable node-to-node
communication for each hop through a separate physical network.
LINK AGGREGATION
Link aggregation combines multiple physical links to operate as a single larger logical
link.
The member links no longer function as independent physical connections, but as
members of the larger logical link (Figure 4-9).
Link aggregation provides greater bandwidth between the devices at each end of the
aggregated link.
Another advantage of link aggregation is increased availability because the aggregated
link is composed of multiple member links.
If one member link fails, the aggregated link continues to carry traffic over the
remaining member links. Each of the devices that is interconnected by the aggregated
link uses a hashing algorithm to determine on which of the member links the frames
will be transmitted.
The hashing algorithm might use various information in the frame to make the
decision.
This algorithm might include a source MAC, destination MAC, source IP, destination
IP, and more. It might also include a combination of these values.
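A simplified sketch of such a decision is shown below: the frame's addressing fields are hashed and the result is reduced modulo the number of member links, so frames of the same flow always use the same member link. The actual hashing algorithm is vendor specific.

import hashlib

def select_member_link(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
                       num_member_links: int) -> int:
    """Pick a member link for a frame by hashing its addressing fields (vendor algorithms vary)."""
    key = f"{src_mac}{dst_mac}{src_ip}{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_member_links

# All frames of this flow map to the same member link of a 4-link aggregate.
print(select_member_link("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02",
                         "10.0.0.1", "10.0.0.2", num_member_links=4))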
SWITCH AGGREGATION:
An aggregation switch is a networking device that allows multiple network
connections to be bundled together into a single link. This enables increased
bandwidth and better network performance.
Typically, aggregation switches use link aggregation protocols, such as Link
Aggregation Control Protocol (LACP) and Ethernet Aggregation to combine multiple
links into a single, logical connection.
Therefore, they can offer great flexibility and scalability, allowing for quick and easy
network expansion or reconfiguration.
In most cases, aggregation switches are used in networks with high-traffic levels or
large numbers of users, as they can efficiently distribute data across multiple links.
It needs to be responsible for managing the data from the lower layer (the access layer
switch), and at the same time, it also reports data to the upper layer (the core layer
switch).
Usually, when the aggregation switch receives data from the access switch, it will
perform local routing, filtering, traffic balancing, and QoS priority management. Then
it will process the security mechanism, IP address translation, and multicast
management of the data.
Finally, it will forward the data to the core layer switch or perform local routing
processing according to the processing result to ensure the normal operation of the
core layer.
It can be seen from the above that the aggregation switch has functions such as source
address, destination address filtering, real-time policy, security, network isolation, and
segmentation.
Compared with access switches, aggregation switches have better performance and
higher switching speeds.
Aggregation switches can also provide user management functions such as address
authentication, user authentication, and user information collection.
Two methods are used to provide connectivity for multiple VLANs between switches.
The first method uses a single link for each VLAN. This method does not scale well
because it uses many ports in networks that have multiple VLANs and multiple
switches.
Also, this method does not use link capacity efficiently when traffic in the VLANs is
not uniform.
The second method is VLAN tagging over a single link, in which each frame is tagged
with its VLAN ID. This method is highly scalable because only a single link is
required to provide connectivity to many VLANs.
This configuration provides for better utilization of the link capacity when VLAN
traffic is not uniform.
The protocol for VLAN tagging of frames in a LAN environment is defined by the
IEEE 802.1p/q standard (priority tagging and VLAN identifier tagging).
Inter-switch link (ISL): ISL is another protocol for providing the VLAN tagging
function in a network. This protocol is not compatible with the IEEE 802.1p/q
standard.
Tagged frames
The IEEE 802.1p/q standard provides a methodology for information, such as VLAN
membership and priority, that is added to the frame (Figure 4-7).
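The 802.1Q tag itself is four bytes: a Tag Protocol Identifier of 0x8100 followed by a 3-bit priority field (802.1p), a 1-bit drop-eligible indicator, and a 12-bit VLAN identifier. A minimal sketch of building that tag (illustrative only, not a full frame builder):

import struct

def dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte IEEE 802.1Q tag: TPID 0x8100 + priority (3 bits) + DEI (1 bit) + VLAN ID (12 bits)."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", 0x8100, tci)

# Tag for VLAN 100 with priority 5; this 4-byte field is inserted into the Ethernet frame header.
print(dot1q_tag(vlan_id=100, priority=5).hex())   # '8100a064'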
FCIP PROTOCOL
Organizations are now looking for new ways to transport data throughout the
enterprise, locally over the SAN as well as over longer distances, to ensure that data
reaches all the users who need it.
One of the best ways to achieve this goal is to interconnect geographically dispersed
SANs through reliable, high-speed links.
This approach involves transporting FC block data over the existing IP infrastructure
used throughout the enterprise.
The FCIP standard has rapidly gained acceptance as a manageable, cost-effective way
to blend the best of two worlds: FC block-data storage and the proven, widely
deployed IP infrastructure.
FCIP is a tunneling protocol that enables distributed FC SAN islands to be
transparently interconnected over existing IP-based local, metropolitan, and wide-area
networks.
As a result, organizations now have a better way to protect, store, and move their data
while leveraging investments in existing technology. FCIP uses TCP/IP as its
underlying protocol.
In FCIP, the FC frames are encapsulated into the IP payload, as shown in Figure 8-9.
FCIP does not manipulate the FC frames (for example, it does not translate FC IDs for transmission).
When SAN islands are connected using FCIP, each interconnection is called an FCIP
link.
A successful FCIP link between two SAN islands results in a fully merged FC fabric.
FCoE makes it possible to move Fibre Channel traffic across existing high-speed
Ethernet infrastructure and converges storage and IP protocols onto a single cable
transport and interface.
The goal of FCoE is to consolidate I/O (input/output) and reduce switch complexity,
as well as to cut back on cable and interface card counts.
Adoption of FCoE has been slow, however, due to a scarcity of end-to-end FCoE
devices and a reluctance on the part of many organizations to change the way they
implement and manage their networks.
Traditionally, organizations have used Ethernet for Transmission Control
Protocol/Internet Protocol (TCP/IP) networks and FC for storage networks.
Fibre Channel supports high-speed data connections between computing devices that
interconnect servers with shared storage devices and between storage controllers and
drives.
FCoE shares Fibre Channel and Ethernet traffic on the same physical cable or lets
organizations separate Fibre Channel and Ethernet traffic on the same hardware.
FCoE uses a lossless Ethernet fabric and its own frame format.
It retains Fibre Channel's device communications but substitutes high-speed Ethernet
links for Fibre Channel links between devices.
FCoE Switch
An FCoE switch has both Ethernet switch and FC switch functionalities. It has a Fibre
Channel Forwarder (FCF), an Ethernet Bridge, and a set of ports that can be used for
FC and Ethernet connectivity.
FCF handles FCoE login requests, applies zoning, and provides the fabric services
typically associated with an FC switch.
It also encapsulates the FC frames received from the FC port into the Ethernet frames
and decapsulates the Ethernet frames received from the Ethernet Bridge to the FC
frames.
Upon receiving the incoming Ethernet traffic, the FCoE switch inspects the Ethertype
of the incoming frames and uses that to determine their destination.
If the Ethertype of the frame is FCoE, the switch recognizes that the frame contains an
FC payload and then forwards it to the FCF.
From there, the FC frame is extracted from the Ethernet frame and transmitted to the
FC SAN over the FC ports.
If the Ethertype is not FCoE, the switch handles the traffic as usual Ethernet traffic
and forwards it over the Ethernet ports.
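The dispatch logic described above can be summarized in a short, illustrative Python sketch; the EtherType values come from the FCoE/FIP discussion in this chapter, while the returned strings simply describe the forwarding decision.

FCOE_ETHERTYPE = 0x8906   # FCoE data frames (see the FCoE frame section below)
FIP_ETHERTYPE = 0x8914    # FIP control frames

def classify_frame(ethertype):
    """Return where an FCoE switch would direct a received Ethernet frame."""
    if ethertype == FCOE_ETHERTYPE:
        return "FCF: decapsulate the FC frame and forward it over the FC ports"
    if ethertype == FIP_ETHERTYPE:
        return "FCF: process the FIP control frame (discovery, login)"
    return "Ethernet bridge: forward as usual Ethernet traffic"

print(classify_frame(0x8906))
print(classify_frame(0x0800))   # ordinary IPv4 traffic stays on the Ethernet side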
An FCoE SAN uses the FCoE protocol, which encapsulates FC frames into Ethernet frames. The FCoE protocol is defined by the T11 standards committee.
FCoE is based on an enhanced Ethernet standard that supports Data Center Bridging
(DCB) functionalities (also called CEE functionalities). DCB ensures lossless
transmission of FC traffic over Ethernet.
FCoE SAN provides the flexibility to deploy the same network components for
transferring both server-to-server traffic and FC storage traffic. This helps to mitigate
the complexity of managing multiple discrete network infrastructures. FCoE SAN
uses multi-functional network adapters and switches.
Therefore, FCoE reduces the number of network adapters, cables, and switches, along
with power and space consumption required in a data center.
FCoE ARCHITECTURE
Fibre Channel over Ethernet (FCoE) is a method of supporting converged Fibre
Channel (FC) and Ethernet traffic on a data center bridging (DCB) network.
FCoE encapsulates unmodified FC frames in Ethernet to transport the FC frames over
a physical Ethernet network.
An FCoE frame is the same as any other Ethernet frame because the Ethernet
encapsulation provides the header information needed to forward the frames.
However, to achieve the lossless behavior that FC transport requires, the Ethernet
network must conform to DCB standards.
DCB standards create an environment over which FCoE can transport native FC
traffic encapsulated in Ethernet while preserving the mandatory class of service (CoS)
and other characteristics that FC traffic requires.
Supporting FCoE in a DCB network requires that the FCoE devices in the Ethernet
network and the FC switches at the edge of the SAN network handle both Ethernet
and native FC traffic. To handle Ethernet traffic, an FC switch must support FCoE, for example by incorporating FCoE interfaces.
FCoE Devices
Each FCoE device has a converged network adapter (CNA) that combines the
functions of an FC host bus adapter (HBA) and a lossless Ethernet network interface
card (NIC) with 10-Gbps Ethernet ports.
The portion of the CNA that handles FCoE traffic is called an FCoE Node (ENode).
An ENode combines FCoE termination functions and the client part of the FC stack
on the CNA.
ENodes present virtual FC interfaces to FC switches in the form of virtual N_Ports
(VN_Ports). A VN_Port is an endpoint in a virtual point-to-point connection called a
virtual link.
The other endpoint of the virtual link is an FC switch (or FCF) port. A VN_Port
emulates a native FC N_Port and performs similar functions: handling the creation,
detection, and flow of messages to and from the FC switch.
A single ENode can host multiple VN_Ports. Each VN_Port has a separate, unique
virtual link with a FC switch.
ENodes contain at least one lossless Ethernet media access controller (MAC). Each
Ethernet MAC is paired with an FCoE controller. The lossless Ethernet MAC is a
full-duplex Ethernet MAC that implements Ethernet extensions to avoid frame loss
due to congestion and supports frames of at least 2500 bytes.
The FCoE controller instantiates and terminates VN_Port instances dynamically as
they are needed for FCoE sessions. Each VN_Port instance has a unique virtual link
to an FC switch.
ENodes also contain one FCoE link end point (LEP) for each VN_Port connection. An
FCoE LEP is a virtual FC interface mapped onto the physical Ethernet interface.
An FCoE LEP:
Transmits and receives FCoE frames on the virtual link.
Handles FC frame encapsulation for traffic going from the server to the FC
switch.
Performs frame de-encapsulation of traffic received from the FC switch.
Figure 1 shows a block diagram of the major ENode components.
FCoE Frames
The FCoE protocol specification replaces the FC0 and FC1 layers of the FC stack
with Ethernet, but retains the FC frame header. Retaining the FC frame header
enables the FC frame to pass directly to a native FC SAN after de-encapsulation.
The FCoE header carries the FC start of file (SOF) bits and end of file (EOF) bits in
an encoded format. FCoE supports two frame types, control frames and data frames.
FCoE Initialization Protocol (FIP) carries all of the discovery and fabric login frames.
FIP control frames handle FCoE device discovery, initializing communication, and
maintaining communication.
They do not carry a data payload. FIP has its own EtherType (0x8914) to distinguish
FIP traffic from FCoE traffic and other Ethernet traffic.
To establish communication, the ENode uses the globally unique MAC address
assigned to it by the CNA manufacturer.
After FIP establishes a connection between FCoE devices, the FCoE data frames
handle the transport of the FC frames encapsulated in Ethernet.
FCoE also has its own EtherType (0x8906) to distinguish FCoE frames from other
Ethernet traffic and ensure the in-order frame handling that FC requires. FCoE frames
include:
2112 bytes FC payload
24 bytes FC header
14 bytes standard Ethernet header
14 bytes FCoE header
8 bytes cyclic redundancy check (CRC) plus EOF
4 bytes VLAN header
4 bytes frame check sequence (FCS)
The payload, headers, and checks add up to 2180 bytes. Therefore, interfaces that
carry FCoE traffic should have a configured maximum transmission unit (MTU) of
2180 or larger. An MTU size of 2180 bytes is the minimum size; some network
administrators prefer an MTU of 2240 or 2500 bytes.
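A quick arithmetic check of the byte counts listed above (a simple Python sketch, assuming exactly the components named in the list):

parts = {
    "FC payload": 2112,
    "FC header": 24,
    "standard Ethernet header": 14,
    "FCoE header": 14,
    "CRC plus EOF": 8,
    "VLAN header": 4,
    "frame check sequence": 4,
}
total = sum(parts.values())
print(total)   # 2180 bytes -> the minimum MTU for interfaces carrying FCoE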
Virtual Links
Native FC uses point-to-point physical links between FC devices. In FCoE, virtual
links replace the physical links.
A virtual link emulates a point-to-point link between two FCoE device endpoints,
such as a server VN_Port and an FC switch (or FCF) VF_Port.
Each FCoE interface can support multiple virtual links.
The MAC addresses of the FCoE endpoints (the VN_Port and the VF_Port) uniquely
identify each virtual link and allow traffic for multiple virtual links to share the same
physical link while maintaining data separation and security.
A virtual link exists in one FCoE VLAN and cannot belong to more than one VLAN.
Although the FC switch and the FCoE device detect a virtual link as a point-to-point
connection, virtual links do not need to be direct connections between a VF_Port and
a VN_Port.
A virtual link can traverse one or more transit switches, also known as passthrough
switches.
A transit switch can transparently aggregate virtual links while still appearing and
functioning as a point-to-point connection to the FCoE devices. However, a virtual
link must remain within a single Layer 2 domain.
FCoE VLANs
All FCoE traffic must travel in a VLAN dedicated to transporting only FCoE traffic.
Only FCoE interfaces should be members of an FCoE VLAN. Ethernet traffic that is
not FCoE or FIP traffic must travel in a different VLAN.
FCoE traffic cannot use a standard LAG because traffic might be hashed to different
physical LAG links on different transmissions. This breaks the (virtual) point-to-point
link that Fibre Channel traffic requires.
If you configure a standard LAG interface for FCoE traffic, FCoE traffic might be
rejected by the FC SAN.
BUSINESS CONTINUITY (BC)
Business continuity (BC) entails preparing for, responding to, and recovering from a system outage that adversely affects business operations. It involves proactive measures, such as business impact analysis, risk assessment, data protection, and security, and reactive countermeasures, such as disaster recovery and restart, to be invoked in the event of a failure. The goal of a business continuity solution is to ensure the "information availability" required to conduct vital business operations.
INFORMATION AVAILABILITY
Information availability (IA) refers to the ability of the infrastructure to function
according to business expectations during its specified time of operation. Information
availability ensures that people (employees, customers, suppliers, and partners) can
access information whenever they need it.
Information availability can be defined with the help of reliability, accessibility
and timeliness.
Reliability: This reflects a component’s ability to function without failure, under stated
conditions, for a specified amount of time.
Accessibility: This is the state within which the required information is accessible at the
right place, to the right user. The period of time during which the system is in an
accessible state is termed system uptime; when it is not accessible it is termed system
downtime.
Timeliness: Defines the exact moment or the time window (a particular time of the day,
week, month, and/or year as specified) during which information must be accessible.
For example, if online access to an application is required between 8:00 AM and 10:00 PM each day, any disruption to data availability outside of this time slot is not considered to affect timeliness.
Table 11-1 lists the approximate amount of downtime allowed for a service to achieve
certain levels of 9s availability.
For example, a service that is said to be "five 9s available" is available for 99.999 percent of the scheduled time in a year (24 × 7 operation, 365 days a year).
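The relationship between an availability percentage and the allowed downtime can be verified with a short calculation; this Python sketch assumes continuous 24 × 7 operation for 365 days, consistent with the example above.

HOURS_PER_YEAR = 24 * 365   # continuous (24 x 7) operation, 365 days a year

def downtime_minutes_per_year(availability_pct):
    # the unavailable fraction of the year, expressed in minutes
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% available -> {downtime_minutes_per_year(nines):.1f} minutes of downtime per year")
# "five 9s" (99.999%) allows roughly 5.3 minutes of downtime per year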
Consequences of Downtime
Data unavailability, or downtime, results in loss of productivity, loss of revenue, poor financial performance, and damage to reputation.
Loss of productivity reduces the output per unit of labor, equipment, and capital.
Loss of revenue includes direct loss, compensatory payments, future revenue losses,
billing losses, and investment losses.
Poor financial performance affects revenue recognition, cash flow, discounts, payment
guarantees, credit rating, and stock price.
An important metric, average cost of downtime per hour, provides a key estimate
in determining the appropriate BC solutions.
It is calculated as follows:
Average cost of downtime per hour = average productivity loss per hour + average revenue
loss per hour
Where:
Productivity loss per hour = (total salaries and benefits of all employees per week) /
(average number of working hours per week)
Average revenue loss per hour = (total revenue of an organization per week) / (average
number of hours per week that an organization is open for business)
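A worked example of the formula, using made-up weekly figures purely for illustration:

# Made-up weekly figures, used only to show how the formula is applied
salaries_and_benefits_per_week = 250_000
working_hours_per_week = 40
revenue_per_week = 1_000_000
business_hours_per_week = 40

productivity_loss_per_hour = salaries_and_benefits_per_week / working_hours_per_week
revenue_loss_per_hour = revenue_per_week / business_hours_per_week
average_cost_of_downtime_per_hour = productivity_loss_per_hour + revenue_loss_per_hour
print(average_cost_of_downtime_per_hour)   # 6250 + 25000 = 31250 per hour of downtime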
Common terms of BC
■ Disaster recovery: This is the coordinated process of restoring systems, data, and the
infrastructure required to support key ongoing business operations in the event of a
disaster.
■ Disaster restart: This is the process of restarting business operations with mirrored
consistent copies of data and applications.
■ Recovery-Point Objective (RPO): This is the point in time to which systems and data must be recovered after an outage. It defines the amount of data loss that a business can endure.
For example, if the RPO is six hours, backups or replicas must be made at least once every six hours.
Figure 11-2 shows various RPOs and their corresponding ideal recovery
strategies. For example:
RPO of 24 hours: This ensures that backups are created on an offsite tape drive
every midnight.
RPO of 1 hour: This ships database logs to the remote site every hour.
RPO of zero: This mirrors mission-critical data synchronously to a remote site.
■ Recovery-Time Objective (RTO): The time within which systems, applications, or
functions must be recovered after an outage. It defines the amount of downtime that a
business can endure and survive.
For example, if the RTO is two hours, then use a disk backup because it enables
a faster restore than a tape backup.
Some examples of RTOs and the recovery strategies to ensure data availability are listed below:
■ RTO of 72 hours: Restore from backup tapes at a cold site.
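The idea of matching an RTO to a recovery strategy can be sketched as a simple lookup; the 72-hour and 2-hour cases come from the examples above, while the other tier and the exact thresholds are assumptions added for illustration.

def recovery_strategy(rto_hours):
    # thresholds other than the 72-hour and 2-hour examples are illustrative assumptions
    if rto_hours >= 72:
        return "Restore from backup tapes at a cold site"
    if rto_hours >= 2:
        return "Restore from a disk backup (faster than tape)"
    return "Fail over to a mirrored or replicated copy at a remote site"

print(recovery_strategy(72))   # cold-site tape restore
print(recovery_strategy(2))    # disk-based restore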
BC PLANNING LIFECYCLE
BC planning must follow a disciplined approach like any other planning process.
Organizations today dedicate specialized resources to develop and maintain BC plans.
The BC planning life cycle includes five stages (see Figure 11-3):
1. Establishing objectives
2. Analyzing
3. Designing and developing
4. Implementing
5. Training, testing, assessing, and maintaining
Several activities are performed at each stage of the BC planning lifecycle, including
the following key activities:
1. Establishing objectives
■ Determine BC requirements.
■ Estimate the scope and budget to achieve requirements.
■ Select a BC team by considering subject matter experts from all areas of the
business, whether internal or external.
■ Create BC policies.
2. Analyzing
4. Implementing
■ Implement risk management and mitigation procedures that include backup,
replication, and management of resources.
■ Prepare the disaster recovery sites that can be utilized if a disaster affects the
primary data center.
■ Implement redundancy for every resource in a data center to avoid single
points of failure.
Backup software also provides extensive reporting capabilities based on the backup catalog and
the log files. These reports can include information such as the amount of data backed up, the
number of completed backups, the number of incomplete backups, and the types of errors that
may have occurred. Reports can be customized depending on the specific backup software used.
BACKUP METHODS
Hot backup and cold backup are the two methods deployed for backup. They are based on
the state of the application when the backup is performed.
In a hot backup, the application is up and running, with users accessing their data during the
backup process. In a cold backup, the application is not active during the backup process.
The backup of online production data becomes more challenging because data is actively being
used and changed. An open file is locked by the operating system and is not copied during the
backup process until the user closes it.
The backup application can back up open files by retrying the operation on files that were opened
earlier in the backup process. During the backup process, it may be possible that files opened
earlier will be closed and a retry will be successful.
The maximum number of retries can be configured depending on the backup application.
However, this method is not considered robust because in some environments certain files are
always open.
In such situations, the backup application provides open file agents. These agents interact directly
with the operating system and enable the creation of consistent copies of open files. In some
environments, the use of open file agents is not enough.
For example, a database is composed of many files of varying sizes, occupying several file
systems. To ensure a consistent database backup, all files need to be backed up in the same state.
That does not necessarily mean that all files need to be backed up at the same time, but they all
must be synchronized so that the database can be restored with consistency.
Consistent backups of databases can also be done by using a cold backup. This requires the
database to remain inactive during the backup. Of course, the disadvantage of a cold backup is
that the database is inaccessible to users during the backup process.
Hot backup is used in situations where it is not possible to shut down the database. This is
facilitated by database backup agents that can perform a backup while the database is active. The
disadvantage associated with a hot backup is that the agents usually affect overall application
performance.
A point-in-time (PIT) copy method is deployed in environments where the impact of downtime from a cold backup or the performance degradation caused by a hot backup is unacceptable.
A pointer-based PIT copy consumes only a fraction of the storage space and can be created very
quickly. A pointer-based PIT copy is implemented in a disk-based solution whereby a virtual
LUN is created and holds pointers to the data stored on the production LUN or save location.
In this method of backup, the database is stopped or frozen momentarily while the PIT copy is created. The PIT copy is then mounted on a secondary server, and the backup occurs on the secondary server, offloading the production (primary) server.
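The pointer-based PIT idea can be illustrated with a toy copy-on-first-write sketch in Python; the block map and save location here are simplifications, not a description of any vendor's implementation.

production = {0: b"blockA", 1: b"blockB"}   # the production LUN
save_location = {}                          # original data of blocks changed after the PIT

def write_block(block_id, data):
    # copy on first write: preserve the point-in-time version before overwriting
    save_location.setdefault(block_id, production[block_id])
    production[block_id] = data

def read_pit(block_id):
    # the virtual LUN holds pointers: to the save location if the block changed,
    # otherwise to the unchanged block on the production LUN
    return save_location.get(block_id, production[block_id])

write_block(0, b"blockA-new")
print(read_pit(0))   # b'blockA'  -- the frozen point-in-time view
print(read_pit(1))   # b'blockB'  -- unchanged, read from production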
To ensure consistency, it is not enough to back up only production data for recovery. Certain
attributes and properties attached to a file, such as permissions, owner, and other metadata, also
need to be backed up.
These attributes are as important as the data itself and must be backed up for consistency. Backup
of boot sector and partition layout information is also critical for successful recovery.
In a disaster recovery environment, bare-metal recovery (BMR) refers to a backup in which all
metadata, system information, and application configurations are appropriately backed up for a
full system recovery.
BMR builds the base system, which includes partitioning, the file system layout, the operating
system, the applications, and all the relevant configurations.
BMR recovers the base system first, before starting the recovery of data files. Some BMR
technologies can recover a server onto dissimilar hardware.
DATA DEDUPLICATION :
Data deduplication emerged as a key technology to dramatically reduce the amount of space and the cost associated with storing large amounts of data. Data deduplication is the art of intelligently reducing storage needs by an order of magnitude.
This method is better than common data compression techniques.
Data deduplication works through the elimination of redundant data so that only one instance of a
data set is stored. IBM has the broadest portfolio of data deduplication solutions in the industry,
which gives IBM the freedom to solve client issues with the most effective technology.
Whether it is source or target, inline or post, hardware or software, disk or tape, IBM has a
solution with the technology that best solves the problem:
IBM ProtecTIER® Gateway and Appliance
IBM System Storage N series Deduplication
IBM Tivoli Storage Manager
Data deduplication is a technology that reduces the amount of space that is required to store data
on disk. It achieves this space reduction by storing a single copy of data that is backed up
repetitively.
Data deduplication products read data while they look for duplicate data. Data deduplication
products break up data into elements and create a signature or identifier for each data element.
Then, they compare the data element signature to identify duplicate data. After they identify
duplicate data, they retain one copy of each element. They create pointers for the duplicate items,
and discard the duplicate items.
The effectiveness of data deduplication depends on many variables, including the rate of data
change, the number of backups, and the data retention period.
For example, if you back up the same incompressible data one time a week for six months, you
save the first copy and you do not save the next 24. This method provides a 25:1 data
deduplication ratio. If you back up an incompressible file on week one, back up the exact same
file again on week two, and never back it up again, this method provides a 2:1 data deduplication
ratio.
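The element-and-signature approach described above can be illustrated with a toy Python sketch; the 4 KB chunk size and SHA-256 fingerprint are assumptions for the example, not a statement about any particular product.

import hashlib

def deduplicate(data, chunk_size=4096):
    store = {}        # signature -> one stored copy of each unique element
    pointers = []     # per-chunk pointers (signatures) in original order
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        signature = hashlib.sha256(chunk).hexdigest()
        store.setdefault(signature, chunk)   # keep only the first copy
        pointers.append(signature)
    return store, pointers

data = b"A" * 4096 * 10 + b"B" * 4096 * 2      # highly redundant sample data
store, pointers = deduplicate(data)
ratio = len(data) / sum(len(chunk) for chunk in store.values())
print(f"deduplication ratio {ratio:.0f}:1")     # 6:1 for this sample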
A more likely scenario is that a portion of your data changes from backup to backup so that your
data deduplication ratio changes over time.
With data deduplication, you can minimize your storage requirements. Data deduplication can
provide greater data reduction and storage space savings than other existing technologies.
Figure 6-13 shows the concept of data deduplication.
Data deduplication can reduce your storage requirements but the benefit you derive is determined
by your data and your backup policies. Workloads with a high database content have the highest
data deduplication ratios.
However, product functions, such as IBM Tivoli Storage Manager Progressive Incremental or
Oracle Recovery Manager (RMAN), can reduce the data deduplication ratio.
Compressed, encrypted, or otherwise scrambled workloads typically do not benefit from data
deduplication.
Good candidates for data deduplication are text files, log files, uncompressed and non-encrypted
database files, email files (PST, DBX, and IBM Domino®), and Snapshots (Filer Snaps, BCVs,
and VMware images).
Inline deduplication is often preferred over post-process deduplication because it offers larger target storage space without any need for a temporary disk cache pool for post-processed deduplication data.
Bit comparison techniques, such as the technique that is used by ProtecTIER, were designed to
provide 100% data integrity by avoiding the risk of hash collisions.
There are a variety of approaches to cloud backup, with available services that can easily fit into an organization's existing data protection process. In a typical cloud backup workflow, data is encrypted and delivered via the internet, ensuring that only authorized users may access the backup data.
Encryption
Data is encrypted before it is delivered over the internet to guarantee that it is safe from illegal
access. The encryption method employs a one-of-a-kind key produced by the cloud backup
program, and only the user has access to it.
Storage
After the data has been backed up, it is stored on remote servers operated by the cloud storage
provider. The data is kept in a safe, off-site location, which adds an extra degree of security against
data loss due to hardware failure, theft, or other sorts of calamities.
Recovery
To restore your data, just log into the cloud backup service and choose the files you want to recover. The data will subsequently be sent from the remote servers to your device. This technique is often quick and simple, and it does not require physical storage media or particular technical knowledge.
Back up content
1. Back up photos and videos.
2. Back up files and folders.
DATA ARCHIVE :
An electronic data archive is a repository for data that has fewer access requirements.
Types of Archives :
It can be implemented as online, nearline, or offline based on the means of access:
■ Online archive: The storage device is directly connected to the host to make the data
immediately available. This is best suited for active archives.
■ Nearline archive: The storage device is connected to the host and information is local, but the device must be mounted or loaded to access the information.
■ Offline archive: The storage device is not directly connected, mounted, or loaded. Manual
intervention is required to provide this service before information can be accessed.
An archive is often stored on a write once read many (WORM) device, such as a CD-ROM. These
devices protect the original file from being overwritten. Some tape devices also provide this
functionality by implementing file locking capabilities in the hardware or software.
Although these devices are inexpensive, they involve operational, management, and maintenance
overhead.
Requirements to retain archives have caused corporate archives to grow at a rate of 50 percent or
more per year. At the same time, organizations must reduce costs while maintaining required
service-level agreements (SLAs). Therefore, it is essential to find a solution that minimizes the
fixed costs of the archive’s operations and management.
Archives implemented using tape devices and optical disks involve many hidden costs.
The traditional archival process using optical disks and tapes is not optimized to recognize the
content, so the same content could be archived several times.
Additional costs are involved in offsite storage of media and media management. Tapes and
optical media are also susceptible to wear and tear. Frequent changes in these device
technologies lead to the overhead of converting the media into new formats to enable access and
retrieval.
Government agencies and industry regulators are establishing new laws and regulations to enforce
the protection of archives from unauthorized destruction and modification.
These regulations and standards affect all businesses and have established new requirements for
preserving the integrity of information in the archives.
These requirements have exposed the hidden costs and shortcomings of the traditional tape and
optical media archive solutions
REPLICATION :
Replication is the process of creating an exact copy of data. Creating one or more replicas
of the production data is one of the ways to provide Business Continuity (BC).
Data replication means that the same data is stored on multiple storage devices.
Storage-based data replication makes use of software installed on the storage device to
handle the replication.
Storage system-based replication supports both local and remote replication.
In storage system-based local replication, the data replication is carried out within the
storage system.
Local replication enables you to perform recovery operations in the event of data loss and
also provides support for backup.
Whereas in storage system-based remote replication, the replication is carried out between
storage systems. In simple words, one of the storage systems is on the source site and the
other storage system is on a remote site for data replication.
Data can be transmitted between the two storage systems over a shared or dedicated
network.
By replicating data to remote sites or backup locations, organizations can ensure continuous operations and minimize downtime in the event of hardware failures, natural disasters, or other disruptions.
DATA MIGRATION
In general, data migration means moving digital information.
Transferring that information to a different location, file format, environment, storage system,
database, datacenter, or application all fit within the definition of data migration.
Data migration is the process of selecting, preparing, extracting, and transforming data and
permanently transferring it from one computer storage system to another.
Data migration is a common IT activity. However, data assets may exist in many different
states and locations, which makes some migration projects more complex and technically
challenging than others.
Data migration projects require planning, implementation, and validation to ensure their
success.
During data migrations, teams must pay careful attention to the following challenges:
Source data. Not preparing the source data being moved might lead to data duplicates,
gaps or errors when it's brought into the new system or application.
Wrong data formats. Data must be opened in a format that works with the system. Files
might not have access controls on a new system if they aren't properly formatted before
migration.
Mapping data. When stored in a new database, data should be mapped in a sensible way
to minimize confusion.
Sustainable governance. Having a data governance plan in place can help organizations
track and report on data quality, which helps them understand the integrity of their data.
Security. Maintaining who can access, edit or remove data is a must for security.
Examples of data migration tools include Microsoft SQL, AWS Data Migration Service, Varonis DatAdvantage, and Varonis Data Transport Engine.
There are three broad categories of data movers: host-based, array-based and network appliances.
Host-based software is best for application-specific migrations, such as platform upgrades,
database replication and file copying.
Array-based software is primarily used to migrate data between similar systems.
Network appliances migrate volumes, files or blocks of data depending on their
configuration.
With disaster recovery as a service (DRaaS), there may be some loss of performance when applications run from the cloud instead of from an on-site server, but the total business cost of downtime can be very high, so it is imperative that the business can get back up and running.
Self-service DRaaS: The least expensive option is self-service DRaaS, where the customer
is responsible for the planning, testing and management of disaster recovery, and the
customer hosts its own infrastructure backup on virtual machines in a remote location.
Careful planning and testing are required to make sure that processing can fail over to the
virtual servers instantly in the event of a disaster. This option is best for those who have
experienced disaster recovery experts on staff.
INFORMATION SECURITY
The major goals of information security are as follows:
Confidentiality − The goal of confidentiality is that only the sender and the intended recipient should be able to access the contents of a message. Confidentiality is compromised if an unauthorized person is able to access the message.
For example, consider a confidential email message sent by user A to user B, which is accessed by user C without the authorization or knowledge of A and B. This kind of attack is known as interception.
Integrity − When the contents of a message are changed after the sender sends it but before it reaches the intended recipient, the integrity of the message is said to be lost.
For example, suppose user A sends a message to user B, and user C tampers with the message originally sent by user A, which is actually intended for user B.
User C somehow manages to access it, modify its contents, and send the changed message to user B. User B has no way of knowing that the contents of the message were changed after user A sent it. User A also does not know about this change. This kind of attack is known as modification.
Availability − Another main goal of information security is availability: resources must be available to authorized parties at all times.
For instance, because of the intentional actions of an unauthorized user C, an authorized user A may not be able to contact a server B. This defeats the principle of availability. Such an attack is known as interruption.
Risk Triad
Risk triad defines the risk in terms of threats, assets, and vulnerabilities. Risk arises
when a threat agent (an attacker) seeks to access assets by exploiting an existing
vulnerability.
To manage risks, organizations primarily focus on vulnerabilities, because they cannot eliminate threat agents, which may target their assets in many forms and from many sources.
Organizations can deploy countermeasures to reduce the impact of an attack by a threat agent, thereby reducing vulnerability.
Risk assessment is the first step in determining the extent of potential threats and risks
in an IT infrastructure. To determine the probability of an adverse event occurring,
threats to an IT system must be analyzed in conjunction with the potential
vulnerabilities and the existing security controls.
The severity of an adverse event is estimated by the impact that it may have on critical
business activities. Based on this analysis, a relative value of criticality and sensitivity
can be assigned to IT assets and resources.
Assets, threats, and vulnerability are considered from the perspective of risk
identification and control analysis.
Assets
Information is one of the most important assets for any organization. Other assets
include hardware, software, and the network infrastructure required to access this
information.
To protect these assets, organizations must develop a set of parameters to ensure the
availability of the resources to authorized users and trusted networks. These
parameters apply to storage resources, the network infrastructure, and organizational
policies.
Several factors need to be considered when planning for asset security. Security
methods have two objectives.
First objective is to ensure that the network is easily accessible to authorized users. It
should also be reliable and stable under disparate environmental conditions and
volumes of usage.
Second objective is to make it very difficult for potential attackers to access and
compromise the system. These methods should provide adequate protection against
unauthorized access to resources, viruses, worms, Trojans and other malicious
software programs.
Threats
Threats are the potential attacks that can be carried out on an IT infrastructure. These
attacks can be classified as active or passive. Passive attacks are attempts to gain
unauthorized access into the system.
They pose threats to confidentiality of information. Active attacks include data
modification, Denial of Service (DoS), and repudiation attacks. They pose threats to
data integrity and availability. In a modification attack, the unauthorized user attempts
to modify information for malicious purposes.
A modification attack can target data at rest or data in transit. These attacks pose a threat to data integrity. Denial of Service (DoS) attacks deny the use of resources to legitimate users.
These attacks generally do not involve access to or modification of information on the
computer system. Instead, they pose a threat to data availability.
The intentional flooding of a network or website to prevent legitimate access to
authorized users is one example of a DoS attack. Repudiation is an attack against the
accountability of the information.
It attempts to provide false information by either impersonating someone or denying
that an event or a transaction has taken place.
Table 15-1 describes different forms of attacks and the security services used to
manage them.
Vulnerability
The paths that provide access to information are the most vulnerable to potential
attacks. Each of these paths may contain various access points, each of which
provides different levels of access to the storage resources.
It is very important to implement adequate security controls at all the access points on
an access path. Implementing security controls at each access point of every access
path is termed as defense in depth.
Attack surface, attack vector, and work factor are the three factors to consider when
assessing the extent to which an environment is vulnerable to security threats. Attack
surface refers to the various entry points that an attacker can use to launch an attack.
Each component of a storage network is a source of potential vulnerability
An attack vector is a step or a series of steps necessary to complete an attack. For
example, an attacker might exploit a bug in the management interface to execute a
snoop attack whereby the attacker can modify the configuration of the storage device
to allow the traffic to be accessed from one more host.
Work factor refers to the amount of time and effort required to exploit an attack
vector.
For example, if attackers attempt to retrieve sensitive information, they consider the
time and effort that would be required for executing an attack on a database.
The preventive control attempts to prevent an attack; the detective control detects
whether an attack is in progress; and after an attack is discovered, the corrective
controls are implemented.
Preventive controls avert the vulnerabilities from being exploited and prevent an
attack or reduce its impact. Corrective controls reduce the effect of an attack, while
detective controls discover attacks and trigger preventive or corrective controls.
Risk management
Businesses face different types of risks, including financial, legal, strategic, and
security risks. Proper risk management helps businesses identify these risks and find ways to
remediate any that are found.
Companies use an enterprise risk management program to predict potential problems
and minimize losses.
For example, you can use risk assessment to find security loopholes in your computer
system and apply a fix.
Compliance
Compliance is the act of following rules, laws, and regulations. It applies to legal and regulatory requirements set by industry bodies and also to internal corporate policies.
In GRC, compliance involves implementing procedures to ensure that business
activities comply with the respective regulations.
For example, healthcare organizations must comply with laws like HIPAA that
protect patients' privacy.
BENEFITS OF GRC :
By implementing GRC programs, businesses can make better decisions in a risk-
aware environment.
An effective GRC program helps key stakeholders set policies from a shared
perspective and comply with regulatory requirements.
With GRC, the entire company comes together in its policies, decisions, and actions.
The following are some benefits of implementing a GRC strategy at your organization.
Data-driven decision-making
You can make data-driven decisions within a shorter time frame by monitoring your
resources, setting up rules or frameworks, and using GRC software and tools.
Responsible operations
GRC streamlines operations around a common culture that promotes ethical values
and creates a healthy environment for growth. It guides strong organizational culture
development and ethical decision-making in the organization.
Improved cybersecurity
With an integrated GRC approach, businesses can employ data security measures to
protect customer data and private information. Implementing a GRC strategy is essential for
your organization due to increasing cyber risk that threatens users' data and privacy. It helps
organizations comply with data privacy regulations like the General Data Protection
Regulation (GDPR). With a GRC IT strategy, you build customer trust and protect your
business from penalties.
IMPLEMENTATION OF GRC:
Companies of all sizes face challenges that can endanger revenue, reputation, and
customer and stakeholder interest.
Some of these challenges include the following:
Internet connectivity introducing cyber risks that might compromise data storage
security
Businesses needing to comply with new or updated regulatory requirements
Companies needing data privacy and protection
Companies facing more uncertainties in the modern business landscape
Risk management costs increasing at an unprecedented rate
Complex third-party business relationships increasing risk
WORKING OF GRC :
GRC in any organization works on the following principles:
Key stakeholders
GRC requires cross-functional collaboration across different departments that
practices governance, risk management, and regulatory compliance.
Some examples include the following:
Senior executives who assess risks when making strategic decisions
Legal teams who help businesses mitigate legal exposures
Finance managers who support compliance with regulatory requirements
HR executives who deal with confidential recruitment information
IT departments that protect data from cyber threats
GRC framework
A GRC framework is a model for managing governance and compliance risk in a
company.
It involves identifying the key policies that can drive the company toward its goals.
By adopting a GRC framework, you can take a proactive approach to mitigating risks,
making well-informed decisions, and ensuring business continuity.
Companies implement GRC by adopting GRC frameworks that contain key policies
that align with the organization's strategic objectives.
Key stakeholders base their work on a shared understanding from the GRC
framework as they devise policies, structure workflows, and govern the company.
Companies might use software and tools to coordinate and monitor the success of the
GRC framework.
GRC maturity
GRC maturity is the level of integration of governance, risk assessment, and
compliance within an organization.
You achieve a high level of GRC maturity when a well-planned GRC strategy results
in cost efficiency, productivity, and effectiveness in risk mitigation.
Meanwhile, a low level of GRC maturity is unproductive and keeps business units
working in silos.
GRC TOOLS:
GRC tools are software applications that businesses can use to manage policies,
assess risk, control user access, and streamline compliance.
The following GRC tools help integrate business processes, reduce costs, and improve efficiency.
GRC software
GRC software helps automate GRC frameworks by using computer systems. Businesses
use GRC software to perform these tasks:
Oversee policies, manage risk, and ensure compliance
Stay updated about various regulatory changes that affect the business
Empower multiple business units to work together on a single platform
Simplify and increase the accuracy of internal auditing
User management
You can give various stakeholders the right to access company resources with user
management software.
This software supports granular authorization, so you can precisely control who has
access to what information.
User management ensures that everyone can securely access the resources they need
to get their work done.
Auditing
You can use auditing tools like AWS Audit Manager to evaluate the results of
integrated GRC activities in your company.
By running internal audits, you can compare actual performance with GRC goals.
You can then decide if the GRC framework is effective and make necessary improvements.
It takes great effort to get every employee to share an ethically compliant culture.
Senior executives must set the tone of transformation and ensure that information is passed
through all layers of the organization.
Clarity in communication
The success of GRC implementation depends on seamless communication.
Information sharing must be transparent between GRC compliance teams, stakeholders, and
employees. This makes activities like creating policies, planning, and decision-making
easier.
Popular GRC software tools include the following:
LogicManager
LogicGate Risk Cloud
MetricStream Enterprise GRC
Navex Global Lockpath
ServiceNow Governance, Risk, and Compliance
Availability management
The critical task in availability management is establishing a proper guideline for all
configurations to ensure availability based on service levels.
For example, when a server is deployed to support a critical business function, the
highest availability standard is usually required.
This is generally accomplished by deploying two or more HBAs, multipathing
software with path failover capability, and server clustering. The server must be
connected to the storage array using at least two independent fabrics and switches that
have built-in redundancy. Storage devices with RAID protection are made available to
the server using at least two front-end ports. In addition, these storage arrays should
have built-in redundancy for various components, support backup, and local and
remote replication. Virtualization technologies have significantly improved the availability management task. With virtualization in place, resources can be dynamically added or removed to maintain availability.
Capacity management
The goal of capacity management is to ensure adequate availability of resources for
all services based on their service level requirements.
Capacity management provides capacity analysis, comparing allocated storage to
forecasted storage on a regular basis.
It also provides trend analysis of actual utilization of allocated storage and rate of
consumption, which must be rationalized against storage acquisition and deployment
timetables.
Storage provisioning is an example of capacity management.
It involves activities such as device configuration and LUN masking on the storage
array and zoning configuration on the SAN and HBA components. Capacity
management also takes into account the future needs of resources, and setting up
monitors and analytics to gather such information.
Performance management
Performance management ensures the optimal operational performance of every storage infrastructure component in line with agreed service levels.
Security Management
Security management prevents unauthorized access and configuration of storage
infrastructure components.
For example, while deploying an application or a server, the security management
tasks include managing user accounts and access policies, that authorizes users to
perform role-based activities.
The security management tasks in the SAN environment include configuration of
zoning to restrict an HBA’s unauthorized access to the specific storage array ports.
LUN masking prevents data corruption on the storage array by restricting host access
to a defined set of logical devices.
Reporting
It is difficult for businesses to keep track of the resources they have in their data
centers, for example, the number of storage arrays, the array vendors, how the storage
arrays are being used, and by which applications.
Reporting on a storage infrastructure involves keeping track and gathering
information from various components/processes.
This information is compiled to generate reports for trend analysis, capacity planning,
chargeback, performance, and to illustrate the basic configuration of storage
infrastructure components.
Capacity planning reports also contain current and historic information about
utilization of storage, file system, database tablespace, and ports.
Configuration or asset management reports include details about device allocation,
local or remote replicas, and fabric configuration; and list all equipment, with details
such as their value, purchase date, lease status, and maintenance records.
Chargeback reports contain information about the allocation or utilization of storage
infrastructure components by various departments or user groups. Performance reports
provide details about the performance of various storage infrastructure components.
Other common challenges in managing stored data include the following:
Security issues: Storing sensitive data can also present security risks, and the operating system must have robust security features in place to prevent unauthorized access to this data.
Backup and Recovery: Backup and recovery of data can also be challenging,
especially if the data is stored on multiple systems or devices.
PROVISIONING
This method entails assigning storage capacity by analyzing current capabilities, such
as storage on physical drives or the cloud, and deciding the proper information to
store in each location.
It's important to consider factors such as ease of access and security when
determining where to store your data.
Planning where to store data allows organizations to discover whether they have
ample storage space available or whether they should reconfigure their system for
better efficiency.
DATA COMPRESSION
This is the act of reducing the size of data sets without compromising them.
Compressing data allows users to save storage space, improve file transfer speeds and
decrease the amount of money they spend on storage hardware and network
bandwidth.
Data compression works by either removing unnecessary bits of information or
redundancies within data.
For example, to compress an audio file, a data compression tool may remove parts of
the file that contain no audible noise.
This would reduce the size of the file while still preserving essential parts of the data.
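A quick illustration of lossless compression removing redundancy, using Python's zlib module purely as an example codec:

import zlib

original = b"storage technologies " * 500      # highly redundant sample text
compressed = zlib.compress(original)
print(len(original), len(compressed))           # e.g. 10500 bytes vs. far fewer
assert zlib.decompress(compressed) == original  # lossless: nothing is compromised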
DATA MIGRATION
This method entails moving data from one location to another. This can include the
physical location, such as from one hard drive to another, or the application that uses
the data.
Data migration is often necessary when introducing new hardware or software
components into an organization.
For example, if a business purchases new computers for its office, it's important to
transfer all data from the old systems to the new ones.
Important factors to consider while implementing data migration include ensuring
network bandwidth, effective transfer speeds, data integrity and ample storage space
for the new location throughout the transfer.
DATA REPLICATION
This process includes making one or more copies of a particular data set, as there are
several reasons why a company may want to replicate its data.
For example, you may wish to create a backup if there's a problem with an original
data set. You may also want to replicate data so you can store it across different
locations, improving the overall accessibility across your network.
There are two types of data replication: synchronous and asynchronous.
Synchronous data replication copies every change to the original data set to the replicated data set as part of the same operation. This type of replication ensures that the replica is always up to date but may also require more resources than asynchronous replication.
Asynchronous replication copies changes to the replica later, for example on a schedule or when a professional enters a command into the database, so it is not an automatic, real-time process. With this type, your company has more control over the resources used to replicate data but may not possess real-time data backups.
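The difference between the two modes, as described above, can be sketched with in-memory dictionaries standing in for the source and replica storage systems (illustrative only):

primary, replica, pending = {}, {}, []

def write_sync(key, value):
    primary[key] = value
    replica[key] = value            # copied to the replica as part of the same write

def write_async(key, value):
    primary[key] = value
    pending.append((key, value))    # copied later, so the replica may lag behind

def replicate_pending():            # e.g. run on a schedule or on demand
    while pending:
        key, value = pending.pop(0)
        replica[key] = value

write_sync("a", 1)
write_async("b", 2)                 # replica does not yet contain "b"
replicate_pending()                 # now it does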
AUTOMATION
Automation is the process of having tools automatically manage your data. Rather
than updating your data manually, you can use software tools to accomplish this task
for you.
For example, you could use a tool to automatically update a shared database whenever
you make a change on your local computer, rather than requiring manual updates.
This would ensure that the database contains updated information for all users and
prevents users from viewing outdated information if a user forgets to submit changes.
DISASTER RECOVERY
Disaster recovery is a plan companies create for potential scenarios regarding data
issues.
For example, if the hard drive that stores your data breaks, it's important to have an
effective plan that allows your business to return to normal operations. This plan
might include switching to a backup hard drive, making a new copy of that backup
and purchasing a new primary hard drive.
Important elements in a disaster recovery plan include speed, data integrity and costs.
Effective organizations often have plans that decrease technological downtime as
much as possible.
In addition, it's important to prevent loss of essential data.
Finally, organizations typically aim to reduce costs wherever possible, such as
compressing data to save money on storage requirements.