Module 12-Storage Infrastructure Management - Participant Guide
Module Objectives
Overview
Service-focused approach
1: An SLA is a formalized contract document that describes service level targets,
service support guarantees, service location, and the responsibilities of the service
provider and the user. These parameters of a service determine how the
components of the data protection environment will be managed.
End-to-end visibility
Orchestrated operations
Infrastructure Discovery
• Monitoring provides visibility into the storage infrastructure and forms the basis
for performing management operations.
• Alerting provides information about events or impending threats or issues.
• Reporting involves gathering information from various components and
operations management processes.
Operations Management
Monitoring
Monitoring provides visibility into the health of the storage infrastructure and
involves the following activities:
Monitoring Parameters
Configuration
Figure: zone esx161_vnx_152_1 on an FC switch.
Availability
Figure: compute systems (H2, H3), switches SW1 and SW2, and the storage system, with one element marked "Unavailable".
Capacity
Figure: NAS file systems, each showing free capacity and used capacity.
• Involves examining the amount of infrastructure resources used and what is still
available. Examples include the free space available on a file system or a
storage pool, or the number of ports available on a switch.
• Helps an administrator to ensure uninterrupted data availability by averting
outages before they occur.
Performance
Figure: compute systems H1, H2, H3, and a new compute system connected through SW1 and SW2 to the storage system, with a graph of port utilization (%, up to 100%) for H1 + H2 + H3.
Security
Figure: Workgroup 1 (WG1) and Workgroup 2 (WG2) connected through SW1 and SW2 to the storage system; a replication command targets data marked "Inaccessible".
• Detects all operations and data movement that deviate from predefined security
policies.
• Detects unavailability of information and services to authorized users due to
a security breach.
Alerts
Reporting
1: Capacity planning reports contain current and historic information about the
utilization of storage, file systems, ports, etc.
2: Configuration and asset management reports include details about the allocation
of storage, local or remote replicas, network topology, and unprotected systems.
This report also lists all the equipment, with details, such as their purchase date,
license, lease status, and maintenance records.
3: Chargeback is the ability to measure storage resource consumption per business
unit or user group and charge them back accordingly.
To perform chargeback, the storage usage data is collected by a billing system that
generates a chargeback report for each business unit or user group. The billing
system is responsible for accurately measuring the number of units of storage
used and reporting the cost/charge for the consumed units.
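The chargeback calculation described above can be sketched as follows. This is a minimal illustration, not how any particular billing system works: the usage records, the business unit names, and the flat rate per GB are all hypothetical.

```python
from collections import defaultdict

# Hypothetical usage records: (business unit, GB consumed). Illustrative values only.
USAGE_RECORDS = [
    ("finance", 500),
    ("engineering", 1200),
    ("finance", 250),
]

# Assumed flat rate per GB per billing period; real rates vary by service tier.
RATE_PER_GB = 0.10


def chargeback_report(records, rate_per_gb):
    """Total the storage units used per business unit and price them."""
    used_gb = defaultdict(int)
    for unit, gb in records:
        used_gb[unit] += gb
    return {
        unit: {"used_gb": gb, "charge": round(gb * rate_per_gb, 2)}
        for unit, gb in used_gb.items()
    }


report = chargeback_report(USAGE_RECORDS, RATE_PER_GB)
for unit, row in sorted(report.items()):
    print(f"{unit}: {row['used_gb']} GB -> {row['charge']:.2f}")
```

The two steps mirror the text: measure the units consumed per business unit, then report the charge for those units.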
Configuration Management
Figure: examples of configuration items (CIs): process, services, hardware, software, people, SLAs, and document.
The information about CIs includes their attributes, used and available capacity,
history of issues, and inter-relationships.
Change Management
Capacity Management
Performance Management
Examples of performance management activities include:
Availability Management
Incident Management
The following table illustrates an example of an incident that was detected by the
Incident Management tool:
Severity | Event Summary | Type | Device | Priority | Status | Last Updated | Owner | Escalation
Problem Management
Security Management
• Managing user accounts and access policies that authorize users to use a
backup/replication service.
• Implementing controls at multiple levels (defense in depth) to access data and
services.
• Scanning applications and databases to identify vulnerabilities.
• Configuring zoning, LUN masking, and data encryption services.
Knowledge Check
1. What information does infrastructure discovery identify? Select all that apply.
a. Configuration and connectivity
b. Capacity
c. Physical-to-virtual dependencies
d. Virtual-to-virtual dependencies
Concepts in Practice
Dell SRM
• Combines storage capacity planning and chargeback reporting for Dell EMC
and multivendor storage environments.
• Supports end-to-end data path visualization for performance analysis and
workload balancing.
• Provides custom, multitenant, multi-site dashboards and reports.
• Helps in configuration change planning and compliance monitoring to validate
design best practices and the Dell EMC Support Matrix.
• Helps organizations optimize capacity and improve productivity to get the most
out of their investments in block, file, and object storage.
Figure: Dell EMC Storage Resource Manager (SRM) frontend.
Dell CloudIQ
CloudIQ provides organizations with the insight to manage their IT infrastructure
more efficiently and proactively to meet business demand.
The CloudIQ portal displays your Dell EMC infrastructure systems in one view to
simplify monitoring across your data center, edge and co-location sites as well as
data protection in public clouds. With CloudIQ, you can easily assure that critical
business workloads get the capacity and performance they need, spend less time
monitoring and troubleshooting infrastructure, and spend more time innovating and
focusing on projects that add new value to organizations.
Scenario
Challenges
Requirements
Deliverables
Solutions
• A storage infrastructure includes three compute systems (H1, H2, and H3) that
are running hypervisors.
• All the compute systems are configured with two FC HBAs, each connected to
the production storage system through two FC switches, SW1 and SW2. All the
compute systems share two storage ports on the storage system.
• Multipathing software has also been installed on each compute system's
hypervisor. If one of the switches, for example SW1, fails, the multipathing
software initiates a path failover, and all the compute systems continue to
access data through the other switch, SW2.
• After the failover, no redundant switch remains, so a second switch failure
could make the storage system unavailable. Monitoring for availability detects
the switch failure and helps the administrator take corrective action before
another failure occurs. In most cases, the administrator receives symptom alerts
for a failing component and can initiate actions before the component fails.
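The availability check in this example can be sketched as a small function that classifies the state of the two-switch fabric. The switch-state dictionary and the status strings are hypothetical; a real monitoring tool would obtain switch health from the devices themselves.

```python
# Hypothetical switch states as a monitoring tool might report them
# (True = up, False = failed). Here SW1 has failed, as in the example above.
SWITCH_UP = {"SW1": False, "SW2": True}


def availability_status(switch_up):
    """Classify storage access given which FC switches are up."""
    healthy = sorted(name for name, up in switch_up.items() if up)
    if not healthy:
        # No switch left: compute systems cannot reach the storage system.
        return "critical: storage system unavailable"
    if len(healthy) < len(switch_up):
        # Failover succeeded, but another failure would cause an outage.
        return f"warning: no redundancy, all I/O is on {', '.join(healthy)}"
    return "ok: redundant paths available"


print(availability_status(SWITCH_UP))
```

The warning state is the one that matters for the scenario: data is still accessible, but the administrator should act before a second failure occurs.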
The image illustrates the importance of monitoring the capacity of a storage pool in
a NAS system:
• If the file system is full and no space is available for applications to perform
write I/O, it may result in an application/service outage.
• Monitoring tools can be configured to issue a notification when thresholds are
reached on the file system capacity; for example:
− When the file system reaches 66 percent of its capacity, a warning message
is issued.
− A critical message is issued when the file system reaches 80 percent of its
capacity.
− This enables the administrator to take action by provisioning additional LUNs
and extending the NAS file system before it runs out of capacity.
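The threshold rules above can be expressed as a short check. This is a sketch only: the function name and the GB inputs are assumptions, and the 66 and 80 percent values are simply the example thresholds from the text.

```python
# Example thresholds from the text; real deployments tune these per file system.
WARNING_PCT = 66
CRITICAL_PCT = 80


def capacity_alert(used_gb, total_gb):
    """Return 'critical', 'warning', or None for a file system's usage."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    used_pct = 100 * used_gb / total_gb
    # Check the higher threshold first so a full file system is never
    # reported as a mere warning.
    if used_pct >= CRITICAL_PCT:
        return "critical"
    if used_pct >= WARNING_PCT:
        return "warning"
    return None


for used in (500, 700, 850):
    print(used, capacity_alert(used, 1000))
```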
• Compute systems H1, H2, and H3 (with two iSCSI HBAs each) are connected
to the storage system through Ethernet switches SW1 and SW2.
• The three compute systems share the same storage ports on the storage
system to access LUNs.
• A new compute system running an application with a high workload must be
deployed to share the same storage port as H1, H2, and H3.
• Monitoring storage port utilization ensures that the new compute system does
not adversely affect the performance of the other compute systems.
Here, utilization of the shared backup storage system port is shown by the solid
and dotted lines in the graph. If the port utilization prior to deploying the new
compute system is close to 100 percent, then deploying the new compute system is
not recommended because it might impact the performance of the backup clients
running on other compute systems. However, if the utilization of the port prior to
deploying the new compute system is closer to the dotted line, then there is room to
add a new compute system.
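The decision described above can be sketched as a headroom check. The simple additive model, the function name, and the estimated load of the new compute system are all assumptions made for illustration; actual port behavior under contention is more complex.

```python
def can_add_compute_system(current_pct, expected_new_pct, limit_pct=100.0):
    """Return True if the shared port has room for the new workload.

    current_pct: measured utilization of the port from H1 + H2 + H3.
    expected_new_pct: estimated utilization the new system would add
        (a hypothetical planning figure, not something the port reports).
    limit_pct: utilization ceiling; 100 percent unless a margin is wanted.
    """
    return current_pct + expected_new_pct <= limit_pct


# Port already close to 100 percent: deployment is not recommended.
print(can_add_compute_system(95.0, 20.0))
# Port at a lower utilization level: there is room for the new system.
print(can_add_compute_system(40.0, 20.0))
```

In practice an administrator might set `limit_pct` below 100 to leave a safety margin for workload peaks.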
• The storage system is shared between two workgroups, WG1 and WG2.
• The data of WG1 should not be accessible by WG2 and vice versa.
• A user from WG1 might try to make a local replica of the data that belongs to
WG2.
• If this action is not monitored or recorded, it is difficult to track such a violation of
security protocols.
• Conversely, if this action is monitored, a warning message can be sent to
prompt a corrective action or at least enable discovery as part of regular
auditing operations.
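The security monitoring in this scenario can be sketched as an audit check on replication commands. The LUN names, the ownership map, and the logger name are hypothetical; the sketch only shows the idea of recording a violation so that it can trigger a warning or surface in an audit.

```python
import logging

logger = logging.getLogger("storage.security")

# Hypothetical map of which workgroup owns which LUN.
LUN_OWNER = {"lun_wg1": "WG1", "lun_wg2": "WG2"}


def audit_replication(user_group, source_lun):
    """Return True if the workgroup may replicate the LUN; log a warning if not."""
    owner = LUN_OWNER.get(source_lun)
    allowed = owner == user_group
    if not allowed:
        # Recording the attempt is what makes later auditing possible.
        logger.warning(
            "Policy violation: %s attempted to replicate %s (owner: %s)",
            user_group, source_lun, owner,
        )
    return allowed


print(audit_replication("WG1", "lun_wg2"))  # WG1 user replicating WG2 data
print(audit_replication("WG2", "lun_wg2"))  # WG2 user replicating own data
```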
4: The change management team assesses the potential risks of the requested
changes, prioritizes them, and decides whether to approve them.
The monitoring tools also help administrators to identify the gap between the
required availability and the achieved availability.
• The administrators can quickly identify errors or faults in the components that
may cause data unavailability in the future.
• Based on the data availability requirements and areas found for improvement,
the availability management team may propose and architect new data
protection and availability solutions or changes in the existing solutions.
For example, the availability management team may propose an NDMP backup
solution to support a data protection service or any critical business function that
requires high availability. The team may propose both component-level and
site-level redundancy. This is generally accomplished by deploying two or more
network adapters per backup component, multipathing software, and compute
clustering. The backup components must be connected to each other using
redundant switches and/or networks. The switches must have built-in redundancy
and hot-swappable components. The VMs hosting backup applications must be
protected from hardware failure/unavailability through VM live shadow copy
mechanisms. The backup storage system should also have built-in redundancy for
various components and should support local and remote backup.
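The gap between required and achieved availability mentioned earlier in this section can be worked out with simple arithmetic. The sketch below assumes the common definition of availability as uptime divided by total time; the function names and the sample hours are illustrative, not taken from the guide.

```python
def achieved_availability(uptime_hours, downtime_hours):
    """Availability as a percentage: uptime / (uptime + downtime)."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return 100 * uptime_hours / total


def availability_gap(required_pct, uptime_hours, downtime_hours):
    """Positive result means the service fell short of its target."""
    return required_pct - achieved_availability(uptime_hours, downtime_hours)


# Illustrative figures: 99 hours up, 1 hour down, against a 99.9 percent target.
print(achieved_availability(99, 1))
print(availability_gap(99.9, 99, 1))
```

A positive gap is the signal that the availability management team may need to propose additional redundancy.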