Solution overview

Cisco public

Cisco SAN Analytics and SAN Telemetry Streaming

A deeper look at enterprise storage infrastructure


The enterprise storage industry is going through a historic transformation. Deep adoption of all-flash arrays on one end, and technologies such as NVMe (Non-Volatile Memory Express) and NVMe over Fabrics on the other, are changing the landscape for good. High performance is the key motivation for these storage trends. Millions of Input/Output Operations Per Second (IOPS) and response times in microseconds are the new norms. However, lab-certified results do not always represent what you see in your production environments. It's a question of known versus unknown, controlled versus uncontrolled, low-risk versus high-risk.

In production, every single operational change must be backed by data and thoroughly approved. There is no room for trial and error. A production environment is like a multidimensional equation. While every single variable in that equation may work on its own, bringing together multiple variables requires deep visibility and understanding of the way these components interact. Without that visibility, a production environment is a best-effort solution.

© 2019 Cisco and/or its affiliates. All rights reserved.


Figure 1. Challenges in getting visibility into enterprise storage infrastructure
[Figure: compute and applications (virtualized or bare metal; rack-mount or blade servers; multiple operating systems; different block storage I/O requirements) exchange reads and writes over the Storage Area Network (SAN) with storage (shared between multiple workloads; all-flash, hybrid, or spinning-disk arrays; different architectures; multiple vendors). The layers are under the ownership of different teams.]

Chief Information Officers (CIOs) and Chief Technology Officers (CTOs) understand the importance of deep visibility and analytics. However, achieving it often becomes challenging in production environments due to:

• Lack of unified visibility: Getting a unified view of compute, storage, and Storage Area Network (SAN) under a single umbrella is often complex. Visibility at a few endpoints is possible today, but complete visibility of storage traffic becomes complex with the currently available industry offerings.
• Hybrid infrastructure: Enterprises have to deal with multiple architectures at the same time. For example, the compute layer itself may be running different types of hypervisors and virtual machines. These hypervisors and the guest virtual machines may be developed by different vendors and based on different architectures. Similarly, the storage infrastructure may include different types of storage arrays. These arrays might be all-flash, hybrid, or nonflash arrays. The storage arrays may be based on different architectures, even if they are sourced from the same vendor.
• Organizational silos: Many organizations have assigned the ownership of different components to different teams. Often these teams work in silos. Even if the coordination is very well maintained, the process and compliance guidelines may slow down the cross-team interaction. Organizational silos are also one of the key reasons for delayed detection of issues and troubleshooting.
• Lack of simplicity: Professionals who own the infrastructure already deal with multiple tools and architectures today. It is not practically feasible for them to become experts in many different technologies at the same time. As a result, they require information in a simple and intuitive format that can be directly converted into actionable insight.

The following sections provide an overview of Cisco SAN Analytics and SAN Telemetry Streaming: a first-of-its-kind industry solution to resolve the above-mentioned challenges.

Introducing Cisco SAN Analytics
The Cisco SAN Analytics solution offers end-to-end visibility into Fibre Channel block storage traffic. The solution is natively available on the storage area network due to its integrated-by-design architecture with the Cisco MDS 9000 switch family. Cisco SAN Analytics delivers deep visibility into I/O traffic between the compute and the storage infrastructure. This information is in addition to the already-available visibility obtained from individual ports, switches, servers, virtual machines, and storage arrays.

Figure 2. Cisco SAN Telemetry Streaming overview
[Figure: SAN Analytics sits in the Storage Area Network (SAN), on the path of reads and writes between compute and applications and storage. Callouts: deep visibility in I/O traffic between compute and storage; integrated-by-design with SAN; real-time, vendor-neutral monitoring; scalable with an open architecture.]

Cisco SAN Analytics and SAN Telemetry Streaming work on the design principles shown in Table 1.
Table 1. Cisco SAN Analytics and SAN Telemetry design principles

Principle: Integrated by design
How it works: Compute and storage layers in a data center interact with each other using a SAN. Cisco SAN Analytics is fully integrated by design into the SAN layer.
What it delivers to you:
• You can continue to maintain your operations under the existing well-known layers of compute, storage and SAN.
• You do not need to introduce any new traffic inspecting components to your data center floor.

Principle: Simple
How it works: The deployment of Cisco SAN Analytics is integrated into your existing Cisco MDS 9000 switches. The capability can be enabled by a single command within seconds.
What it delivers to you:
• Problems in the environment can be resolved faster.
• You can make your operations more proactive leading to improved user experience.

Principle: Affordable
How it works: Due to the integrated-by-design architecture, no new traffic inspecting components are introduced into your data center. Simple and flexible licensing can enable the functionality to provide end-to-end visibility.
What it delivers to you:
• Eliminate CapEx and OpEx associated with managing dedicated appliances.
• Flexible licensing helps you to enable this feature where you want and when you want.

Principle: Scalable
How it works: Cisco SAN Analytics natively scales up with the size of your SAN. Whether it is a small SAN of a single switch or a large SAN with thousands of ports, you get visibility as end devices are connected to your SAN.
What it delivers to you:
• You can deploy it everywhere to get full coverage of your storage traffic.
• Analytics scale grows with the size of your fabric.

Principle: Always on
How it works: Cisco SAN Telemetry Streaming has been designed to be always on.
What it delivers to you:
• You can profile and baseline the storage traffic of your applications under normal conditions.
• You do not have to find the tipping points of your environment manually because the always-on monitoring is doing that for you proactively.

Principle: Open and programmable
How it works: Cisco SAN Telemetry Streaming has been designed to be open and programmable. The intelligent metrics are accessible to third-party tools using industry-leading formats. It is based on a programmable architecture.
What it delivers to you:
• Support of additional metrics and protocols with firmware upgrade.
• Build your own analytics apps to solve specific use cases or integrate with existing third-party apps for organization-wide uniform visibility.

Cisco SAN Analytics and SAN Telemetry Streaming resolve existing and new challenges. Table 2 lists existing limitations and how Cisco is addressing them.
Table 2. How Cisco SAN Analytics and SAN Telemetry Streaming address challenges

Existing challenge: Lack of unified visibility
How Cisco SAN Analytics and SAN Telemetry Streaming help: Cisco SAN Analytics is integrated by design into the SAN. It inspects I/O flows to bring out a unified view of the infrastructure irrespective of the architecture or vendor of storage arrays, servers or operating systems.

Existing challenge: Hybrid infrastructure
How Cisco SAN Analytics and SAN Telemetry Streaming help: Cisco SAN Analytics is agnostic to the compute or storage infrastructure's vendor or architecture. The visibility is obtained from the traffic flow on the SAN. Cisco SAN Telemetry Streaming has little or no dependency on hardware or software versions of the end devices connected to the SAN.

Existing challenge: Organizational silos
How Cisco SAN Analytics and SAN Telemetry Streaming help: Cisco SAN Telemetry Streaming exports data in industry-leading formats. This information can be remotely accessed by different teams at the same time with complete independence. Different organizations can continue to use the tools they prefer by integrating the exported information from Cisco SAN Telemetry Streaming.

Existing challenge: Lack of simplicity
How Cisco SAN Analytics and SAN Telemetry Streaming help: Cisco SAN Analytics is extremely simple to deploy due to its integrated-by-design principle. The functionality can be enabled within seconds on the Cisco MDS 9000 series switches.

Cisco SAN Analytics and SAN Telemetry Streaming use cases


Table 3 describes several Cisco SAN Analytics and SAN Telemetry Streaming use cases.
Table 3. Cisco SAN Telemetry Streaming use cases

Use case: Storage performance insight
How it is delivered: The Cisco SAN Analytics solution quantifies the performance of the storage infrastructure using a holistic approach. Performance metrics are calculated for the flows between the host port (initiator), the storage port (target), and the Logical Unit Number (LUN). This unique combination is known as an ITL (Initiator-Target-LUN) flow. For NVMe traffic, the same concept is represented by ITN, with N representing a Namespace ID. The performance metrics are calculated in real time and can be exported to an external receiver.
Why it matters: Complete visibility into your infrastructure reduces risks and helps maintain optimum performance.

Use case: Faster troubleshooting
How it is delivered: The information generated by the Cisco SAN Analytics solution can be used to maintain a performance baseline. A deviation from the historic trend can be used to generate automated alarms, resulting in proactive troubleshooting. This monitoring also provides insight into why the performance degraded and where the root cause of the problem may lie.
Why it matters: Proactive and predictive troubleshooting helps to meet strict Service-Level Agreements (SLAs) and reduce downtime.

Use case: Infrastructure optimization
How it is delivered: Cisco SAN Analytics can help to make scale-up versus scale-out decisions by monitoring storage traffic between ITL/ITN pairs. For example, a host may be running multiple virtual machines, or a particular storage port may be used to access multiple LUNs, either of which can lead to heavy storage traffic. Cisco SAN Analytics can help you to find such conditions. To resolve them, you can optimize the distribution of the components, such as moving some of the virtual machines to a less-utilized host or moving some of the LUNs to a less-utilized storage port.
Why it matters: Optimum utilization of your infrastructure can lead to CapEx and OpEx savings.

Use case: Application deployment recommendation
How it is delivered: Cisco SAN Analytics can be used to monitor storage traffic patterns for extended durations. This information can be used to profile the applications for their storage needs. Future expansion of the same application to other virtual machines or hosts can be recommended based on storage traffic requirements. For example, consider an existing application A. If another instance of application A needs to be deployed, knowing the storage traffic throughput can help when choosing a new host that has the resources available to sustain the throughput requirement.
Why it matters: Data-driven decisions help to deploy applications faster. Eliminate trial and error to ensure optimum performance.

Use case: Storage provisioning recommendation
How it is delivered: Cisco SAN Analytics provides performance metrics of the available LUNs/Namespaces. This information can be used to enhance the storage provisioning. For example, LUNs can be distributed across different storage ports to meet their throughput requirements. If a LUN is showing no activity for an extended duration, it can be inspected further for possible errors or can be unprovisioned.
Why it matters: Optimized utilization of storage arrays and the overall storage infrastructure improves efficiency.

Use case: Change management
How it is delivered: Cisco SAN Analytics collects performance metrics at ITL/ITN granularity, which can be used to generate trends and baselines. If a particular component needs to be changed, the insight generated by Cisco SAN Analytics before and after the change can be monitored and compared to ensure that the change was successful.
Why it matters: Proactive SLA assurance during change of components lends peace of mind.

Use case: Auditability
How it is delivered: Cisco SAN Analytics generates multiple performance and error metrics for the complete storage fabric using a holistic approach. This information can be used to perform audits of the infrastructure.
Why it matters: Get the information you need to ensure compliance.

Use case: Health report
How it is delivered: Cisco SAN Analytics monitors ITL/ITN flows between the compute and storage layers, including the read and the write transactions between a host and the backend storage. This information is available in addition to already-existing device- and port-level metrics. All these metrics can be used to generate health reports. Such a report goes beyond the health of the SAN by revealing the health of the overall storage infrastructure and applications.
Why it matters: Simplify operations and proactively resolve problems.
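
The baseline-and-deviation idea behind the faster troubleshooting and change management use cases can be sketched roughly as follows. This is only an illustration, not how Cisco SAN Analytics or any receiver implements it: it assumes exchange completion time samples for one ITL flow have already been exported to a receiver, and the function names, the sample values, and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Return (mean, standard deviation) of historic samples for one flow."""
    return mean(samples), stdev(samples)


def is_deviation(value, baseline, k=3.0):
    """Flag a sample more than k standard deviations above the baseline mean."""
    mu, sigma = baseline
    return value > mu + k * sigma


# Hypothetical exchange completion times (microseconds) for one ITL flow,
# collected under normal conditions.
history_us = [200, 210, 190, 205, 195, 200, 198, 202]
baseline = build_baseline(history_us)

# New samples arriving after the baseline was taken.
alarms = [v for v in (215, 900) if is_deviation(v, baseline)]
print(alarms)
```

With these sample values only the 900-microsecond reading is flagged. The same comparison can be run on samples taken before and after a planned change to check that the flow still sits within its baseline.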

Cisco SAN Analytics solution architecture
Cisco SAN Analytics is integrated by design into the SAN between the compute and the storage layers. This is made possible by recent innovations in the Cisco MDS 9000 switch family. The overall architecture can be logically divided into three components.
• Traffic inspection by port ASICs.
• Traffic processing and flow metric calculation by an on-board Network Processing Unit (NPU).
• Streaming of flow metrics to an external analytics and visualization engine for end-to-end visibility.

Traffic inspection
Traffic inspection is integrated with the latest generation Fibre Channel port ASICs available on Cisco MDS 9000 switches. Frames in ingress or egress direction can be inspected without any performance or feature penalty. In other words, Traffic Access Points (TAPs) are inbuilt on the port ASICs.
Block I/O transactions between initiators and targets are facilitated by the SCSI or NVMe protocol utilizing an underlying Fibre Channel connection. The application data (reads or writes) is encapsulated within Fibre Channel and SCSI/NVMe headers. The port ASICs, after inspection of the frames, capture Fibre Channel and SCSI/NVMe headers of the relevant frames. Cisco MDS 9000 switches make no attempt to inspect or capture underlying application data. Only headers are inspected, which is enough to gather the information.

I/O metric calculation
Flow metric calculation is performed on the switch itself with the help of an on-board Network Processing Unit (NPU). An NPU is a programmable processor optimized for packet processing. On Cisco MDS 9000 switches, the NPU receives frames from the port ASICs, truncated up to the Fibre Channel and SCSI/NVMe headers. The frame headers are then subjected to specialized low-level microcode, which generates the flow metrics by correlating multiple frames with common attributes (for example, the same I/O transaction, or Exchange, and the same ITL/ITN flow). The metrics are stored in a hierarchical and relational database maintained in the memory associated with the NPU.
The availability of a programmable NPU on the switches enables tremendous possibilities. New capabilities can be added by a non-disruptive software upgrade. For example, support for FC-NVMe and other additional metrics was added later. In the future, more metrics can be added without any hardware changes.

As the size of the fabric grows, the resources for traffic inspection and metric calculation also grow because of the integrated-by-design architecture.
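
The correlation step described above, grouping frame headers that share an exchange and an ITL flow, can be illustrated with a small sketch. It is not the NPU microcode: the `Header` record, its field names, and the three simplified frame kinds are assumptions made for illustration, and the two values computed correspond to the exchange completion time and data access latency metrics this document lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    """One captured frame header (hypothetical, simplified fields)."""
    initiator: str  # FCID of the host port
    target: str     # FCID of the storage port
    lun: int        # LUN, or namespace ID for NVMe
    exchange: int   # identifier shared by all frames of one I/O
    kind: str       # "command", "first_response", or "status"
    time_us: int    # capture time in microseconds


def flow_metrics(headers):
    """Correlate headers per exchange, then collect latencies per ITL flow."""
    events = defaultdict(dict)
    for h in headers:
        flow = (h.initiator, h.target, h.lun)
        events[(flow, h.exchange)][h.kind] = h.time_us

    metrics = defaultdict(lambda: {"ect_us": [], "dal_us": []})
    for (flow, _exchange), times in events.items():
        if "command" not in times:
            continue  # cannot time an exchange whose start was not seen
        if "status" in times:
            # Exchange completion time: command to final status.
            metrics[flow]["ect_us"].append(times["status"] - times["command"])
        if "first_response" in times:
            # Data access latency: command to first response from the array.
            metrics[flow]["dal_us"].append(times["first_response"] - times["command"])
    return dict(metrics)


flow = ("0x010001", "0x020001", 5)
headers = [
    Header(*flow, exchange=0x10, kind="command", time_us=1000),
    Header(*flow, exchange=0x10, kind="first_response", time_us=1120),
    Header(*flow, exchange=0x10, kind="status", time_us=1300),
    Header(*flow, exchange=0x11, kind="command", time_us=1400),  # still outstanding
]
result = flow_metrics(headers)
print(result[flow])  # {'ect_us': [300], 'dal_us': [120]}
```

Because only header fields and timestamps are needed, the sketch never touches application data, which mirrors the header-only inspection described in this section.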

Streaming of I/O metrics to an external receiver (SAN Telemetry Streaming)


Cisco MDS 9000 switches stream the flow metrics to an external receiver in industry-leading open formats. An external receiver can bring fabric-wide, end-to-end visibility into a single pane of glass. The external receiver can also provide long-term metric storage, trending, correlation, and predictions. The implementation of the external receiver has been decoupled from the on-switch SAN Analytics architecture for development flexibility. The receiver can aim to solve very specific use cases based on the metrics received from Cisco MDS 9000 switches. For example, an external receiver can receive metrics from multiple switches at the same time and can also correlate them with the information generated from initiators and targets.
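
As a rough sketch of what a receiver might do with metrics arriving from several switches, the example below merges records into one fabric-wide view and totals throughput per storage port. The source does not name the wire format, so JSON lines are assumed here purely for illustration, and the record fields (`switch`, `initiator`, `target`, `lun`, `bytes_per_s`) and switch names are hypothetical.

```python
import json
from collections import defaultdict


def fabric_view(json_lines):
    """Build {flow: {switch: record}} from records sent by several switches."""
    view = defaultdict(dict)
    for line in json_lines:
        rec = json.loads(line)
        flow = (rec["initiator"], rec["target"], rec["lun"])
        view[flow][rec["switch"]] = rec
    return view


def target_throughput(view):
    """Sum per-flow throughput for each target port.

    A flow reported by more than one switch is counted once, using the
    highest value reported, so that it is not double counted.
    """
    totals = defaultdict(int)
    for (_initiator, target, _lun), per_switch in view.items():
        totals[target] += max(r["bytes_per_s"] for r in per_switch.values())
    return dict(totals)


# Hypothetical records from two switches; the first flow crosses both.
lines = [
    '{"switch": "mds-a", "initiator": "0x010001", "target": "0x020001", "lun": 1, "bytes_per_s": 4000}',
    '{"switch": "mds-b", "initiator": "0x010001", "target": "0x020001", "lun": 1, "bytes_per_s": 4100}',
    '{"switch": "mds-b", "initiator": "0x010002", "target": "0x020001", "lun": 2, "bytes_per_s": 1000}',
    '{"switch": "mds-a", "initiator": "0x010003", "target": "0x020002", "lun": 1, "bytes_per_s": 500}',
]
view = fabric_view(lines)
totals = target_throughput(view)
print(totals)  # {'0x020001': 5100, '0x020002': 500}
```

Keeping the per-switch records side by side is a design choice for this sketch: it lets the receiver see every switch's view of a flow while still reporting one figure per storage port.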

I/O metrics calculated on Cisco MDS 9000 in NX-OS 8.4(1)
Following is a non-exhaustive list of I/O metrics calculated by the Cisco MDS 9000 32-Gbps switches in NX-OS 8.4(1). These metrics are calculated for up to 60,000 ITL/ITN flows per Cisco MDS 9700 director (for SCSI and NVMe traffic), and are in addition to the already existing port-level metrics.
• Initiator ID: Fibre Channel ID (FCID) of the initiator.
• Target ID: Fibre Channel ID (FCID) of the target.
• LUN/NSID: Logical Unit Number (LUN) or Namespace ID associated with the target.
• I/O Per Second (IOPS): Number of read or write commands per second.
• Throughput: Read or write command bandwidth in bytes per second.
• Exchange Completion Time: Time taken to complete a read or write command (or Exchange) in microseconds.
• Data Access Latency: Time between the read or write command and the response from the storage array in microseconds.
• Outstanding IO: Number of read or write commands yet to be completed.
• Error counters such as aborts, failures, and timeouts.
The Cisco MDS 9700 32-Gbps Fibre Channel module and the MDS 9000 32-Gbps Fibre Channel fabric switches support SAN Telemetry Streaming powered by the advanced port ASIC and on-board NPU.

Figure 3. Cisco SAN Telemetry Streaming architecture
[Figure: a Cisco MDS 9700 with an MDS 9700 32-Gbps FC module sits in the Storage Area Network (SAN) between compute and applications and storage. Traffic inspection on the 32G FC port ASICs: integrated TAPs in the port ASIC; real-time inspection; no impact on data traffic; inspects only headers, not data. The port ASICs pass FC and SCSI/NVMe headers to the NPU. I/O metric calculation on the on-board NPU: receives headers from the port ASICs; calculates metrics by correlating multiple frames. SAN Telemetry Streaming: I/O metrics are continuously streamed to external receiver(s) in open format.]

Conclusion
Cisco SAN Analytics is the industry's first solution to provide visibility into Fibre Channel block storage traffic by inspecting frames natively on Fibre Channel switches without any external taps, probes, or appliances. It seamlessly scales to every end device of your fabric using a simple and affordable approach. The open and programmable architecture helps you to work across organizational silos. Overall, the real-time visibility and analytics offered by Cisco SAN Analytics help you to maintain peak performance and troubleshoot problems proactively.

For more information
Cisco MDS 9000 Series NX-OS SAN Analytics and SAN Telemetry Streaming Configuration Guide
Cisco MDS 9132T data sheet
Cisco MDS 9700 32-Gbps Fibre Channel module data sheet

© 2019 Cisco and/or its affiliates. All rights reserved. Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: https://www.cisco.com/go/trademarks. Third-party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R) C22-740197-01 08/19