Compuware APM - Introduction
Market Trends, Business Challenges & APM
The World is Changing & the Rate of Change is Accelerating
Application visibility and optimization of the customer experience are more important than ever
Complexity Explosion Business Demands More Change, Faster
Business
I want change! I want competitive advantage! I want stability!
Development
Operations
User Expectations Continue to Rise
Data Smog and Blind Spots
Web Analytics
Virtualization
Third Parties
Java/.NET
Database
Network
Storage
Server
Market Trends, Business Challenges & APM
A Case Study for a Changed World
Verizon CEO Daniel Mead: more than 60% of iPhone sales occurred online.
That's 24,000 sales per day
That's $5-10m per day
4 internal content providers
23 external content providers
Akamai x 4 DoubleClick x 3 HitBox YieldManager Google Ad Services Atlas Advertising Amgdgt.com Interlick Tribal Fusion Turn.com
APM in 2010
1. End User Experience Monitoring: Captures the End User Experience of an application or service
2. Application Component Deep Dive: Captures rich statistics regarding components and component domains
3. Discovers/models the application-determined logical topology
4. Business Transaction Process/Flows: Traces transaction flow across the IT environment
5. PMDB (Performance Management Database): Consolidated, normalised, correlated & analysed
APM in 2015
1. APM 2010: EUE, deep dive, application model, trans flows, PMDB
2. Policy and Orchestration Engine: Policy setting and workflow orchestration
3. Application Behaviour Learning: Understand, analyse application patterns and spot deviations
4. Crowdsourcing and Collaboration: Distributed knowledge capture, knowledge sharing and improvements
5. Cloud Enablement: Support cloud model and end to end management off premises and on
6. Cost Allocation and Chargeback: Monitor resource usage
Introduction to APM
Application Performance Management
End-user: What is the end-user experience?
Enterprise / Business: Why Manage End-user Experience?
Operations: Typical Enterprise Requirements
Operations: The Application Performance Challenge: Problems Everywhere Along the Delivery Chain
Operations: Traditional Operational monitoring
Development: Traditional development flow
Gartner's five APM dimensions
Compuware APM product range
End-User: What is the end-user experience?
Application availability and performance for the end-user
APM for Retail Banking: http://www.youtube.com/watch?v=M7qEuLxQgOM
Button press or request
Page Load or response
The Answer: Adopt an Application Point of View That Starts with the User
Application Point of View that Starts with the End User
Data Center Cloud: Private and Public Users
ISPs Mobile carriers Browsers Devices AJAX JavaScript Mobile apps
Web Mobile App logic Database Network Mainframe Virtualization SOA CDNs Third party services
Customers
Application
Application
Employees Infrastructure
Enterprise / Business: Why manage end-user experience?
73% of performance issues are user-reported
Yet less than 5% actually complain
End-user Experience impacts business success
Slow apps reduce revenue by 9% and productivity by 64% *
Most monitoring is at component level
Not immediately actionable
Efficient enterprises accelerate fault domain isolation
80+ percent of problem resolution time is spent finding the fault, not fixing it
Why? Increasing Data Center Complexity
Cost REDUCED
Revenue IMPROVED
* Aberdeen, APM: Getting on the C-Level's agenda
Enterprise / Business: Reduce time spent on Awareness and Isolation
Revenue Impact / Cost
Business Impact
Isolate Remediation
Root Cause Resolve
Response Times
Operations: Typical Operational Requirements
Generate alerts and notifications based on configurable transaction thresholds.
Real Time and Historical performance data specific to app transactions.
Single view of usage, performance and availability for transactions across multiple tiers.
Real-time, detailed diagnostic data specific to users and their transactions.
Report on business impact and relational diagnosis of faults.
Enable multiple users at varying levels to consume and use data simultaneously.
Flexible, conditional alerting and reporting.
Service Level Management and Operational views and workflows minus any extraneous information.
Usage, Performance and Availability monitoring for specific applications and transactions.
Reduce human hours spent isolating and analyzing performance problems.
Efficient communication between IT groups for reactive and proactive initiatives.
Integrate current monitoring investments into a strategic solution for Enterprise development.
Operations: The Monitoring Challenge: Problems Everywhere Along the Delivery Chain
The Application Delivery Chain
Data Center Cloud: Private and Public
Web Mobile App logic Database Network Mainframe Virtualization SOA CDNs Third party services Inconsistent geo performance Bad performance under load Blocking content delivery Poorly performing Java or .NET methods Application Slow SQL or Web services transactions Server performance
Users Resource contention Mobile carriers Browsers ISPs Capacity issues Devices AJAX JavaScript Mobile apps Slow bursting
Customers
Poorly performing JavaScript Browser/ device incompatibility Pages too big Low cache hit rate
Network problems Bandwidth contention Improper load balancing
Network peering problems Outages
Network peering problems Bandwidth throttling Inconsistent connectivity
Infrastructure
Configuration issues Oversubscribed POP Poor routing optimization Low cache hit rate
Employees
Network resource shortage Faulty content transcoding SMS routing / latency issues
Operations: Why Traditional Monitoring Fails
[Diagram: customers report "This application is slow!" and the application team answers "I'm on it!".]
Teams: Application Team, Network Team, Server Team, Mainframe Team
Data Center: Web Servers, App Servers, DB Servers, Storage, Middleware, Mainframe Servers, Network, Load Balancers
Internet: Local ISP, Major ISP, Content Delivery Networks, Mobile Carriers, Third-party/Cloud Services
Operations: Why Traditional Monitoring Fails
[Diagram: customers report "This application is slow!" and the Application, Network, Server and Mainframe teams each answer "Not my Problem!".]
Data Center: Web Servers, App Servers, DB Servers, Storage, Middleware, Mainframe Servers, Network, Load Balancers
Internet: Local ISP, Major ISP, Content Delivery Networks, Mobile Carriers, Third-party/Cloud Services
Operations: Why traditional monitoring fails: Datacenter Complexity
Component Level Monitoring Tools (each reports its own Response Time)
Authentication Monitoring: Imprivata, Zimbra, ActiveIdentity, EMI Security, Juniper J-Web, Juniper
Server Monitoring: Perfmon, Netcool, Sitescope, Solar Winds, Nimsoft, Nagios, MOM
Network Monitoring: Netscout, Niksun, NetCool, Opnet, Fluke, Cisco Works, EMC Smarts
Virtual Env. Monitoring: VMWare, Quest vFoglight, Opnet vMon, ZenOS, NetIQ App Manager
Application Monitoring: Wily Introscope, Mercury Topaz, OV Transaction Analyzer, ITCAMs, dynaTrace, Optier, IBM ITCAMs
Message Queue Monitoring: Candle, BMC Middleware Mgmt, Hyperic, Omegamon
Database Monitoring: Quest Software, IBM Tivoli, Quest Fog Light, Precise, Oracle
[Diagram components: Load Balancer, Authentication, Firewall, Virtualized Web Servers, Virtualized Application Servers, Web Services, Message Queue, Database Instance, RSA Log File SAN, App SAN 1000 GB, RSA SAN 250 GB]
Impossible to Correlate & Troubleshoot
Operations: Why Traditional Monitoring Fails
War Room
[Diagram: customers report "This application is slow!". The Application, Network, Server and Mainframe teams each answer "All my lights are green!", leaving the CTO and the Service Manager without an answer.]
Data Center: Web Servers, App Servers, DB Servers, Storage, Middleware, Mainframe Servers, Network, Load Balancers
Internet: Local ISP, Major ISP, Content Delivery Networks, Mobile Carriers, Third-party/Cloud Services
Development: Application lifecycle
Business
(local, remote, outsourced)
Development
(local, remote, outsourced) Load testing
Test/QA
(local, remote, outsourced) Cloud load testing Monitoring
Production
Development: Problems with Application Lifecycle
Business
Business impact? Priority? Competitive info? What? Who? When? How? Code? Recreate?
Not enough business context! $$$$$$
(local, remote, outsourced)
Development
(local, remote, outsourced) Load testing
Test/QA
(local, remote, outsourced) Cloud load testing Monitoring
Production
Too much time reproducing problems!
Not engineered for performance! Too many iterations!
Too many business impacting issues!
Development: Lifecycle-Oriented APM
Which users $$ amount Conversions Abandonment Etc.
Business
Business impact $
All transactions Click-to-code All details
(local, remote, outsourced)
Development
(local, remote, outsourced) Load testing
Test/QA
(local, remote, outsourced) Cloud load testing Monitoring
Production
No need to reproduce issues
Performance from the start Fewer iterations
24x7, all transactions Fewer issues
Gartner's five APM dimensions
Real User Monitoring Synthetic Monitoring
Browser, Data Center, Mobile
Backbone, LMile, Private, Streaming, Mobile
Java/.NET Network Database Server Transaction Trace
Business Service Manager 3rd Party Adapters
dynaTrace PurePath
4 5
Portal and the CAS, ADS
Dashboards Reports
The Compuware APM Solution
Portal Reporting and Dashboards Business Service Management
On-Premises
dynaTrace Enterprise Analysis
DATA CENTER INTERNAL USERS INTERNET
SaaS
Gomez SaaS multi-tenant data store
CUSTOMERS
Storage
DB Servers
App Servers
Third-party/ Cloud Services
Network Load Balancers
Local ISP
Major ISP
Mainframe Middleware Web Servers Servers
Content Delivery Networks
Mobile Carriers
RUM Browser Mobile
Data Center RUM
EUE and NPM
dynaTrace
Java .NET
Streaming
Mobile
Backbone
Last Mile
Enterprise
Internet
The Compuware APM Solution
Optimize performance across the entire Application Delivery Chain
Agentless real user monitoring Multi-tier analysis Application component analysis Network and server monitoring
First Mile
Application monitoring
Enterprise
Monitoring Cross-browser testing Load testing
Backbone
Virtual Test Bed
Monitoring Load testing
Last Mile
Real user monitoring
Real Users
[Diagram: the Application Delivery Chain from the data center to real users]
Data Center: virtual/physical environment; DB, app, web and mainframe servers; storage; load balancers; web services; multi-tier transactions; Java/.NET analysis with PurePath agents; all network segments, servers and infrastructure; all users, all apps, all transactions
Cloud: private and public; 3rd party/cloud services; data centers & cloud providers; content delivery networks; WAN optimization
Backbone: 150+ enterprise-grade nodes on major ISPs
Last Mile and Private Last Mile: 150,000+ consumer-grade desktops on local ISPs; 168+ countries; 2,500+ ISPs
Browsers: 500+ combos of browsers and O/S
Mobile: major mobile carriers around the globe; 5,000+ supported mobile devices; mobile components; mobile apps
Users: customers and employees
New Product names for version 12
APM Product page: http://www.compuware.com/application-performancemanagement/
For more information please refer to the support documentation available on http://go.compuware.com
Current Name | New Name
Gomez Real User Monitoring Data Center (a.k.a. Vantage Real User Monitoring) | Data Center Real User Monitoring
Gomez Synthetic Monitoring Private Enterprise (a.k.a. Vantage Active Monitoring) | Synthetic Monitoring
Gomez Business Service Manager (a.k.a. Vantage Service Management) | Business Service Management
Gomez Java and .NET Monitoring (a.k.a. Vantage Java & .NET Monitoring) | Java & .NET Monitoring
Gomez Transaction Trace Analysis (a.k.a. Application Vantage) | Transaction Trace Analysis
Gomez Server Monitoring (a.k.a. ServerVantage) | Server Monitoring
Gomez Network Performance Monitoring (a.k.a. Vantage Network Monitoring) | Network Monitoring
Gomez Mobile Carrier Data Monitoring (a.k.a. Mobile Carrier Vantage Service Check) |
VantageView | VantageView (no change)
DCRUM: Driven by End-User Experience
Optimize performance across the entire Application Delivery Chain
Test/monitor your app the way users access it:
What they do: key transactions
Where they do it: geographic locations
How they do it: fat clients, browsers and native devices
All tiers, all transactions, all users
Prioritize & Resolve Issues:
Measure the business impact on users
Isolate root causes
Deep application and transaction analysis
Browsers
Deep analysis
Application
PurePath
Mobile apps
DCRUM Capabilities
Agentless real user monitoring
Unifies network and application reporting
Monitors all data center tiers in one dashboard
Optimize EUE for web and non-web
Diagnose root-cause application problems through dynaTrace integration
DCRUM Differentiators
EUE: all users, all transactions
Web and non-web applications; ERP: SAP, Oracle EBS; Business core: IBM MQ, XML middleware, mainframe front-end
End-to-end: whole ADC
Whole Application Delivery Chain; Multi-vendor integration and multi-tier view; Network influenced monitoring captures all transactions
Actionable data
Business impact; Application-specific decodes (28+); All users, all transactions, granular
Simplicity of deployment
No software agents to deploy or maintain; Out of Box and bespoke reporting; Industry's leading scale for monitoring
DCRUM Monitors All Tiers, Apps and Components
WAN
Internet
Load Balancer Authentication Virtualized Web Servers
Agentless Monitoring Device (AMD)
Firewall
Virtualized App Server Load Balancer Virtualized App Servers Web Services Message Queue
Centralized Analysis Server
Database Instance
DCRUM is Optimized for Cisco UCS
Compuware has optimized its Gomez APM On-Premises solutions for exclusive delivery on Cisco Unified Computing System (UCS) servers. UCS is the gold standard for delivery of Compuware APM solutions, with specialized leasing terms available through Cisco Capital Leasing.
This combination delivers systems excellence and solution differentiation, providing our customers with choice and flexibility to respond to the ever-changing demands of the business.
Customers can:
- improve application performance
- increase scalability
- simplify operations
Cisco UCS Servers
DCRUM Works With Your Environment
Applications
Custom & packaged applications across multiple tiers KEY EXAMPLES
Application Infrastructure
Virtual and physical environments KEY EXAMPLES
Process Automation
Existing solutions e.g., Service Desk and Event Management KEY EXAMPLES
Cloud Services
CDN, Cloud provider, and third parties KEY EXAMPLES
Browsers and Devices
Every commercial browser and mobile device KEY EXAMPLES
+ over 5,000 mobile devices
Complexity Demands Analytics
Multi-tier, multi-vendor data centers increase MTTR
Simple monitoring does little in complex environments
Advanced root cause analysis finds these hidden problems
Data must be collected from all applications and devices across all tiers
Root cause analysis must work to method and code level of apps
DCRUM: Industry-leading Application Analysis
Continued investment in application intelligence
Leading end-to-end application performance analysis across entire application delivery chain
Applications
All tiers of the mission-critical applications
360° View of Application Performance
Application Health Status for IT Operational Monitoring
Enterprise Operational Dashboard
Isolate the Poorly Performing Data Center Tier Current vs. Historical Analysis Baseline performance and availability with synthetic Web and Non-Web Applications (e.g. SAP)
Isolated Network Impact on Performance
The new DCRUM troubleshooting workflow
Existing workflow: 3 levels, multiple choices on each level
New workflow: 3 screens, 3 clicks to the clue
Applications transactions health
One report for applications / transactions
Infrastructure and network drill down
One report for all tiers and all operations
Troubleshooting: operations, errors, locations, user activity
One report for locations and users activity
DCRUM Reporting Dashboards
Out of box reporting provides:
Enterprise Application Performance view provides up-to-date status on performance, availability, and business impact on your end users, as well as an end-to-end view of your datacenter infrastructure, with 1-click access to trend information.
Data Center Analysis View provides instant visual indication of problem areas with 1-click access to detailed troubleshooting information.
DCRUM and dynaTrace integration
Reporting and events in Central Analysis server are linked directly to dynaTrace portal for deep dive diagnostics
dynaTrace: Root Cause in Seconds
Goal: get to the root cause as quickly as possible
Approach: isolate the problem domain and diagnose the root cause with an integrated solution of breadth and depth
From Problem Isolation to Root Cause
dynaTrace PurePath Provides Deep Dive Diagnostics
Production Test/QA Development
Browser / Rich-Client
Web Server
Java
.NET
Other
Database
Synthetics
End-to-End Transaction Execution Path
Across tiers: browser - servers - database
Remoting, Web Services, external services
Code-level depth
Heterogeneous: .NET & Java
Contextual Transaction Information
Method arguments, SQL bind variables, Synchronization, Exceptions, Logs
Environmental Data
Memory dumps, Thread dumps
Monitoring data: PMI, JMX, CLR, Win, Unix, DB, VMware, etc.
dynaTrace Session
dynaTrace Platform Enables Unified Lifecycle Approach to Proactive Performance Management
Development Developers, CI
Test Test Centers
Production Production, Staging
Staging Tests, Tuning, Diagnostics 24x7 End-to-end Transaction Tracing, Monitoring, Diagnostics
Performance Engineering (Arch Validation, Profiling)
Automated Testing & Continuous Integration
Automated Testing, Tuning, Diagnostics
Integrate to Automate and Collaborate
IDE, CI, Build Integration System Management
Application Performance Management
Test Tool Integration
Development Team Edition
Test Center Edition
Production Edition
dynaTrace 4 One Platform Single Product
Need to increase test frequency and accuracy?
Automate Performance Analysis In Test & CI.
Integrate dynaTrace into your build, CI and test automation environment. Automate testing Unit, Load & Functional.
How often does the same issue resurface in production release to release? How often does the same bug reappear?
Automatically detect & Analyze Regressions
Detect performance and reliability regressions early. Compare performance and behavior of a current build to previous versions and baselines. Automate analysis to enable you to focus on features instead of debugging.
Application not scaling in production after passing QA?
Assure A Scalable, Performing Architecture
PurePath Technology provides true end-to-end tracing -- Browser to Web Server to App. Server to Database. Visualize app. behavior under load for even large, complex applications to prevent scalability issues from reaching production.
What does "fast" or "slow" really mean? What does "performs well" and "it scales" really mean?
Meet Performance Goals With KPIs
Measure, track and alert against KPIs -- Service level, Throughput & Response time. Compare performance relative to your competition with SpeedoftheWeb.
Debugging applications in the test environment? Firefighting in production?
Automate Collaboration & Resolution
Capture the root cause of issues when they occur, so engineers simply replay, at code level, precisely what happened.
Alerts publish captured PurePath Sessions to issue tracking systems for engineers to access immediately.
Gomez SaaS Network: The World's Most Comprehensive Performance and Testing Network
Backbone Web Performance Management 150+ locations
Last Mile Web Performance Management and Load Testing 150,000+ locations
Cloud High Volume Load Generation 6 locations
Virtual Test Bed: Cross-Browser Testing, 500+ browser/OS combos, 5,000+ supported devices
Your Actual Users: Real-user Monitoring, worldwide, wherever your users are
Gomez SaaS Network: Monitoring the Cloud
Community of cloud-based companies and experts providing:
Hands-on tools
Cloud education
Best practices
Cloud services evaluation
Cloud Performance Analyzer
Global Provider View
Outside-in perspective of cloud service provider performance
Real-time data
Historic comparisons
Performance & availability bottleneck identification
Independent validation of providers' SLA claims
Future APM
Compuware Delivers
TODAY: Proactive Monitoring
Provide visibility into the performance of heterogeneous applications from the enterprise to the cloud
NEAR TERM: Predictive Management
Predict application performance issues before they occur
NEXT GENERATION: Active Management
Dynamically adjust the infrastructure to prevent application performance problems
Compuware Concepts
Information Gathering
Protocol Analyzers
Software Services
Operations, Applications and Transactions
Reporting Hierarchy
Tiers
Locations
Metrics
Information Gathering
Application monitoring can only be as good as its definition. Therefore, as much information as possible should be gathered about the to-be-monitored applications. Minimally:
Logical application topology information
IP address (range) supporting the services for this application
Port number (range) supporting the services for this application
Information Gathering
End-user Detection coverage
Synthetic Auto-check
Real User Monitoring
Synthetic Transaction
Protocol Analyzers
A protocol analyzer (a.k.a. decode) monitors, parses, and analyzes a network protocol in the monitored traffic.
Some analyzers perform transaction monitoring: they can recognize exchanges of information where there is a recognizable question-and-answer dialog.
Licensed features.
Examples: TCP, HTTP, HTTPS, XML, MSSQL and Oracle
Software Services
Services that support an application at different levels, for example on a Web, Application or Database level.
Are minimally defined by a server IP (range) and a server port (range) together with a protocol, for example:
HTTP service on server IPs 10.10.10.1-10.10.10.3 on port 80
SOAP service on server IP 10.10.10.4 on port 8080
Oracle service on server IP 10.10.10.5 on ports 1521-1523
Configurable at different levels depending on the underlying protocol: action identification, grouping, masking, user identification.
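As an illustration only, the three example definitions above can be modelled as a protocol plus an inclusive server IP range and port range. This is a minimal sketch, not DC RUM's configuration format: the `SoftwareService` class and the `service` and `classify` helpers are hypothetical names introduced here.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class SoftwareService:
    """Hypothetical model: a protocol plus inclusive server IP and port ranges."""
    name: str
    protocol: str
    first_ip: IPv4Address
    last_ip: IPv4Address
    first_port: int
    last_port: int

    def matches(self, server_ip: str, server_port: int) -> bool:
        """True if the server endpoint falls inside both ranges."""
        ip = IPv4Address(server_ip)
        return (self.first_ip <= ip <= self.last_ip
                and self.first_port <= server_port <= self.last_port)


def service(name, protocol, ips, ports):
    """Build a service from 'first-last' ranges; a single value is also accepted."""
    first_ip, _, last_ip = ips.partition("-")
    first_port, _, last_port = ports.partition("-")
    return SoftwareService(
        name, protocol,
        IPv4Address(first_ip), IPv4Address(last_ip or first_ip),
        int(first_port), int(last_port or first_port),
    )


# The three examples from the slide.
SERVICES = [
    service("HTTP service", "HTTP", "10.10.10.1-10.10.10.3", "80"),
    service("SOAP service", "SOAP", "10.10.10.4", "8080"),
    service("Oracle service", "Oracle", "10.10.10.5", "1521-1523"),
]


def classify(server_ip, server_port):
    """Return the name of the first matching service, or None if nothing matches."""
    for svc in SERVICES:
        if svc.matches(server_ip, server_port):
            return svc.name
    return None
```

For example, `classify("10.10.10.5", 1522)` selects the Oracle service, while traffic to port 80 on the same server matches none of the three definitions.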
Operations, Applications and Transactions
Logical names / groupings for TCP level actions at different levels.
Operation: Refers to an operation in the context of a particular protocol, and can mean an HTTP/HTTPS page load, database query, JOLT request on a Tuxedo server, DNS look-up etc.
Transaction (grouping mechanism for operations):
Simple transaction consisting of a single operation, such as a Web page load.
Complex transaction consisting of a sequence of operations that are HTTP(S), XML, SAP GUI or Cerner based.
Unstructured transaction that is a collection of unsequenced operations.
Application (grouping mechanism for transactions): A universal container that can accommodate one or more transactions, which consist of one or more Software Services.
Applications and Transactions
Transaction A
Application 1
Transaction B
Transaction C
Applications and Transactions
Medical Records
Physician Login
URL (http://10.21.79.243/physician/login.do)
Admin Login
URL (http://10.21.79.243/admin/login.do)
Patient Login
URL (http://10.21.79.243/patient/login.do)
Application Performance
Transaction Performance
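The Medical Records example can be sketched as a small data structure: one application that groups three simple transactions, each identified by a single URL path. This is an illustrative model only. The `APPLICATION` dictionary and the `transaction_for` and `rollup` helpers are hypothetical, and averaging is just one possible way to roll operation times up to transaction and application level.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory model: application -> transactions -> URL path.
APPLICATION = {
    "name": "Medical Records",
    "transactions": {
        "Physician Login": "/physician/login.do",
        "Admin Login": "/admin/login.do",
        "Patient Login": "/patient/login.do",
    },
}


def transaction_for(url):
    """Map an observed URL to its transaction name, or None if it is not monitored."""
    path = urlsplit(url).path
    for name, operation_path in APPLICATION["transactions"].items():
        if path == operation_path:
            return name
    return None


def rollup(observations):
    """Average operation time per transaction and for the whole application.

    observations: iterable of (url, seconds) pairs.
    """
    per_transaction = {}
    for url, seconds in observations:
        name = transaction_for(url)
        if name is not None:
            per_transaction.setdefault(name, []).append(seconds)
    transaction_avg = {n: sum(v) / len(v) for n, v in per_transaction.items()}
    all_times = [s for times in per_transaction.values() for s in times]
    application_avg = sum(all_times) / len(all_times) if all_times else None
    return transaction_avg, application_avg
```

Observations whose URL matches none of the three login pages are ignored, which mirrors the idea that only defined transactions contribute to application performance.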
Reporting Hierarchy
Hierarchy levels depend on the analyzer type. The CAS can report on up to four levels for the following traffic types:
HTTP
SAP GUI
Cerner
SOAP
Any database
Each level can be reported independently or combined with the remaining ones. If you use DMI you are able to create reports with entries from arbitrarily chosen hierarchy levels.
Reporting Hierarchy
In the current DC RUM release (12), the following hierarchy levels are supported:
Operation - The first level in the hierarchy, for example: URL, Query, SOAP Operation type
Task - The second level in the hierarchy, for example: Page name, Operation name, SOAP Method
Module - The third level in the hierarchy, for example: Database name, SOAP Service
Service - The highest level in the hierarchy, for example: SAP GUI business process
Real User Monitoring - Tiers
End-Users -Internal? -External? -Internet? Load Balancers / Content Switches
Web Servers Application Servers Database Servers
Mainframe / Other Tiers
Synthetic End-User Transactions (At Key Locations)
AMD Users
Start by monitoring the initial entry point of the End-User's transaction
Add additional tiers for greater Fault Domain Isolation and Visibility
Wide variety of transaction support: HTTP/S, Oracle/SQL/DB2 queries, SAPGUI, Oracle Forms, XML, MQ
CIO CTO IT Mgt Data Center Ops Monitoring Team Application Owners
CAS and ADS Report Server
Tiers
A tier is a specific layer where DC RUM collects performance data. Tiers are either pre-defined, or defined by the user in the Central Analysis Server (CAS).
Immediately after the CAS is deployed, data is reported based on the default tier configuration. If the default tier configuration does not fit your network architecture, you should configure tiers to match your topology.
Tiers are configured globally. You should not create separate tiers for individual applications.
Front-end Tiers
Best practice: mark as front-end the tier that is closest to the user or to a device that acts on behalf of the user. In short, the first layer the user connects with.
1st tier for example load balancer or Web Server
1st tier after Citrix or Terminal Service
Network Tiers
Client Network:
Wide Area Network (WAN) from remote sites. Manually and automatically defined sites (AS and CIDR blocks), except the "All other" site.
Network: Datacenter Local Area Network (LAN). The "All other" site.
Data Center Tiers
DC RUM defined Tiers that represent measurements originating from RUM DC and based on different analyzer types are listed in the Data center tiers section:
Website, Oracle Forms, SAP GUI, Exchange, Middleware, Message Queue, Database, Datacenter Infrastructure, FIX
User Defined Tiers that are based on software service definitions are listed in the Data center tiers section with no rules assigned to them:
VIP, Load balancer, Web servers, Application servers, Business logic, Database servers
Locations
DC RUM refers to locations as Sites and defines them as IP address ranges. Location definitions can be made in a three-level architecture in DC RUM:
Site: lowest level of granularity
Area: Consists of one or more sites
Region: Consists of one or more areas
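A minimal sketch of the three-level location idea, assuming sites are expressed as CIDR blocks: a client IP is resolved to its site, then rolled up to area and region. The site names, address blocks and the `locate` helper are hypothetical examples, not DC RUM configuration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical sites: each is an IP range (here a CIDR block) that belongs to one area.
SITES = {
    "London Office": {"network": ip_network("192.168.10.0/24"), "area": "UK"},
    "Paris Office": {"network": ip_network("192.168.20.0/24"), "area": "France"},
}

# Hypothetical areas: each belongs to one region.
AREAS = {"UK": "EMEA", "France": "EMEA"}


def locate(client_ip):
    """Return (site, area, region) for a client IP, or None if no site matches."""
    ip = ip_address(client_ip)
    for site, info in SITES.items():
        if ip in info["network"]:
            area = info["area"]
            return site, area, AREAS[area]
    return None
```

Because areas are built from sites and regions from areas, a metric recorded per site can be aggregated upwards without any further address matching.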
Metrics: TCP Availability
Availability - The percentage of successful attempts, that is, the total number of attempts minus the number of failures, divided by the total number of attempts and multiplied by 100%.
Connection Establishment Timeouts - Number of TCP errors of category 'Connection establishment timeout errors'. This category of errors applies when there was no response from the server to the SYN packet(s) transmitted by the client.
Connection Refused Errors - Number of TCP errors of category 'Connection refused errors'. This category of errors applies when the server rejects a request from the client to open a TCP session. Such a situation usually happens when the server runs out of resources, either due to operating system kernel configuration or lack of memory.
Server Session Terminations - The number of Server Session Termination errors. This category of errors applies when the server detects an error on the application level and closes the TCP session with a RESET packet.
Server not Responding - The number of Server Not Responding errors. This category of errors applies when the client closes the TCP session with a RESET packet after the server has failed to respond for too long.
Idle Sessions - The number of idle TCP sessions that have not been active for a period of time longer than a predefined time-out, 5 minutes by default.
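The availability definition above can be written as a one-line calculation. The sketch below is only a worked example of that formula. The `availability` function name is hypothetical, and which error categories count as failures is left to the caller.

```python
def availability(attempts, failures):
    """Percentage of successful attempts: (attempts - failures) / attempts * 100.

    Returns None when there were no attempts, since the ratio is undefined.
    """
    if attempts == 0:
        return None
    return 100.0 * (attempts - failures) / attempts
```

With 200 attempts and 5 failures, 195 attempts succeeded, so availability is 97.5%.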
Metrics: HTTP Availability
HTTP Availability - The percentage of successful HTTP hits, calculated based on the following formula:
100 * (Hits - HTTP errors) / Hits
All HTTP errors are taken into account.
HTTP Client Errors - The number of observed HTTP client errors (4xx)
HTTP Not Found Errors - The number of observed HTTP 404 Not found errors
HTTP Other Client Errors - The number of observed HTTP client errors other than 401, 404 and 407
HTTP Unauthorized Errors - The number of observed HTTP 401 Unauthorized errors
HTTP Server Errors - The number of observed HTTP server errors (5xx)
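As a worked example of the formula and the error counters above, the sketch below classifies a list of HTTP status codes and computes HTTP availability. It assumes that "all HTTP errors" means every 4xx and 5xx response; the `summarize` function and its result keys are hypothetical names.

```python
from collections import Counter


def summarize(status_codes):
    """Count hits and error categories, then compute 100 * (hits - errors) / hits."""
    counts = Counter()
    for code in status_codes:
        counts["hits"] += 1
        if 400 <= code <= 499:
            counts["client_errors"] += 1
            if code == 401:
                counts["unauthorized"] += 1
            elif code == 404:
                counts["not_found"] += 1
            elif code != 407:
                # Client errors other than 401, 404 and 407.
                counts["other_client_errors"] += 1
        elif 500 <= code <= 599:
            counts["server_errors"] += 1

    hits = counts["hits"]
    # Assumption: every 4xx and 5xx response counts as an HTTP error.
    errors = counts["client_errors"] + counts["server_errors"]
    result = dict(counts)
    result["availability"] = 100.0 * (hits - errors) / hits if hits else None
    return result
```

For ten hits of which five return an error status (404, 401, 407, 403 and 500), the formula gives 100 * (10 - 5) / 10 = 50%.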
Metrics: Network Performance
Client ACK RTT - The time it takes for an ACK packet to travel from the user to the AMD and back again.
Client RTT - The time it takes for a SYN packet to travel from the user to the AMD and back again.
Client loss rate (to server) - The percentage of total packets sent by a client that were lost between the server and the AMD and needed to be retransmitted.
Server loss rate (to client) - The percentage of total packets sent by a server that were lost between the AMD and the client and needed to be retransmitted.
Server realized bandwidth - The actual transfer rate of server data when the transfer attempt occurred, taking into account factors such as loss rate (retransmissions). Thus, it is the size of an actual transfer divided by the transfer time.
Request time - The time it took the client to send the HTTP request to the server (for example, by means of an HTTP GET or HTTP POST). Note: This time includes TCP connection setup time and SSL session setup time (if any). It starts when the client starts the TCP session on the server and ends when the server receives the whole request.
Delay - Data transfer delay on a Data Center device, such as load balancer or firewall.
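Two of the definitions above reduce to simple ratios, sketched here as a worked example. The function names and the choice of bytes and seconds as units are assumptions made for illustration.

```python
def loss_rate(total_packets, retransmitted_packets):
    """Percentage of sent packets that were lost and had to be retransmitted."""
    if total_packets == 0:
        return None
    return 100.0 * retransmitted_packets / total_packets


def realized_bandwidth(bytes_transferred, transfer_seconds):
    """Size of an actual transfer divided by the transfer time (bytes per second)."""
    if transfer_seconds <= 0:
        return None
    return bytes_transferred / transfer_seconds
```

For instance, 25 retransmissions out of 1,000 packets is a 2.5% loss rate, and a 1,000,000-byte transfer that takes 4 seconds yields a realized bandwidth of 250,000 bytes per second.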
Metrics: Round Trip Time RTT
Metrics: Application Performance
Application Performance - For transactional protocols, this is the percentage of application transactions completed in a time shorter than the performance threshold. For generic TCP protocols, this is the percentage of monitoring intervals in which user wait per kB of data was shorter than the threshold value.
Operation Time - The time it took to complete an operation. The term "operation" refers to an operation in the context of a particular protocol, and can mean HTTP/HTTPS page loads, database queries, XML (transactional services) operations, Jolt transactions on a Tuxedo server, e-mails, DNS requests, Oracle Forms submissions, MQ operations, VoIP calls, MS Exchange operations, or SAP operations. Note that an operation can be split over several packets. For HTTP and HTTPS, operation time is the page load time, which is equal to the redirect time plus the network time plus server HTTP time plus server think time.
Person-hours lost (Performance, Errors, Availability) - In Central Analysis Server, the total monitoring time clients waited for pages to load due to bad service availability and bad application performance. In Advanced Diagnostics Server, the total time clients waited for pages to load due to bad software service performance, that is, the total monitoring time during which page load time exceeded the predefined threshold. Note that this is not a sum of whole monitoring intervals, but only those intervals' portions during which problems occurred. This metric is not calculated in PVU mode.
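The two calculations described above for transactional protocols can be sketched as follows. This is a worked example under stated assumptions, not the product's implementation: the function names are hypothetical and all times are taken to be in seconds.

```python
def application_performance(operation_times, threshold):
    """Percentage of operations completed in a time shorter than the threshold."""
    if not operation_times:
        return None
    fast = sum(1 for t in operation_times if t < threshold)
    return 100.0 * fast / len(operation_times)


def http_operation_time(redirect, network, server_http, server_think):
    """HTTP(S) page load time as the sum of its four components."""
    return redirect + network + server_http + server_think
```

With operation times of 0.5 s, 1.0 s, 2.5 s and 4.0 s against a 3.0 s threshold, three of the four operations are fast enough, so application performance is 75%.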
Metrics: Operation Time
Metrics: Application Performance
Zero window size events - A client sets a zero window size in the TCP header when it wants the other side to slow down data transmission because it cannot keep up with the transmission speed. This indicates that the receiving machine is busy with other tasks.
Network time - The time the network (between the user and the server) takes to deliver requests to the server and to deliver page information back to the user. In other words, network time is the portion of the overall time that is due to the delivery time on the network.
Redirect time - The average amount of time that was spent between the time when a user went to a particular URL and the time this user was redirected to another URL and issued a request to that new URL. The difference between Redirect Time and HTTP Redirect Time is that the former counts all operations, while the latter refers only to those operations for which redirection actually took place.
Server time - The time it took the server to produce a response to a given request.
Server operation size - The size of a server operation. In HTTP and HTTPS (decrypted and non-decrypted), server operation size equals the page size.
Components and Relations
DCRUM Components
Enterprise Portal
Dashboards
Operational reports
Central Security Server
LDAP, users DB
Business Service Manager
Service Management
Central Analysis Server (CAS)
Data Mining Interface (DMI) Performance Management Database
3rd-party Integration
Service Model
RUM configuration Console
Configuration database
Synthetic Monitoring
dynaTrace DTM
Agentless Monitoring Device (AMD)
DCRUM Components - Enterprise portal
Role of the Enterprise portal
Adds new report workflow: AHS (Application Health Status), DCA (Data Center Analysis)
Optional component
CAS reports remain as before
Portal workflow drills down to CAS reports for details
Seamless from the user perspective
CAS AMD ADS Enterprise Portal
DCRUM Components
Central Analysis Server (CAS)
o The main reporting component for dynaTrace Data Center Real-User Monitoring
o Combines measurements from the Agentless Monitoring Device (AMD) using different contexts
o CAS pulls its data from the AMDs in the form of zdata sample files
o Stores its results in an MS SQL Server database
o Results can be viewed in real time or historically
Agentless Monitoring Device (AMD)
o Network probes that analyze network traffic
Console Client
o Used for configuring devices and application monitoring
Console Server
o Stores the configuration in a flat file database
DCRUM Components
Compuware Security Server (CSS)
o New in the 12.0 release
o Provides central authentication and user management for Central Analysis Server, Console, Advanced Diagnostics Server, Enterprise Portal and BSM
o Allows users to be defined locally in a CSS database, or lets the customer use their own corporate user management system, such as the LDAP-based systems Active Directory or Apache DS
Advanced Diagnostics Server (ADS)
o A separate report server that is integrated with CAS at the reporting and configuration level
o Provides a more detailed, troubleshooting-oriented analysis (for example, element level for HTTP instead of page level on CAS)
o Supports applications based on HTTP(S), XML over HTTP(S)/MQ, SAPGUI, DB2, MSSQL, Sybase, Informix, Oracle and Oracle Forms
DCRUM Components
Advanced Diagnostics Server (ADS) (contd)
o ADS pulls its data from the AMDs in the form of vdata sample files
o Stores its results in an MS SQL Server database
o Results can be viewed in real time or historically
Enterprise Portal (EP)
o Helps speed the isolation of the fault domain and reduces the cost of troubleshooting issues, while restoring service as quickly as possible
o Contains robust data mining and report building tools for creating new and customized reports quickly and easily
o Contains dashboards which display graphs, geographic views, and tabular data regarding service and application quality, fault domain isolation, business impact, and infrastructure health
o Consolidates reporting, security, and configuration functionality into a single component
Analysis Modules
Transaction decodes (analysis modules) include:
HTTP/HTTPS
SAP
SOAP/XML
Databases: MS SQL, Oracle, DB2, Sybase, Informix
Oracle Forms
IBM MQ
MS Exchange
Thin Client (Citrix/Terminal Services)
Analysers
Multi-purpose and Expandable Product Family
[Diagram: DC RUM product family]
Collection and Measurement (passive traffic analysis): AMD; Network Vantage Probe (passive traffic analysis, since v 10.1); Flow Collector (Netflow data analysis, since v 10.2)
Analysis Modules: Oracle EBS, HTTP(S), Siebel, Fault Isolation, Detailed HTTP, MS Exchange, Oracle Forms, Tuxedo/JOLT, Bus Trans, SAP GUI, SQL/DB, TCP/IP, SOAP, SMTP, Citrix, XML, DNS, MQ
Analysis and Reporting: Central Analysis Server (CAS (Web), CAS (Ent)), Advanced Diagnostics Server (ADS), Information Database
Enterprise Portal Dashboards
Industry-leading breadth of analysis
1. Real-time and historical trending views of application, user, network and overall data center performance.
2. Supports web and non-web applications such as SAP.
3. Quickly identify poorly performing data center tiers.
4. Isolate network performance impact on applications and users.
5. Monitor baseline performance and availability with synthetic monitoring.
Optimize end-user experience
View overall status of applications and end-user performance through a single dashboard that includes quick drill down views into performance, availability, operation time and usage for individual applications and users.
Multi-tier Data Center Monitoring
Caption: Drill down from Application Health Status for a focused analysis of performance by data center tier.
Isolating application performance problems in the multi-tier environments of today's application and data center architectures is a daunting task for IT, yet the business demands rapid problem isolation to reduce business impact. The new Data Center Analysis View provides instant visual indication of problem areas with 1-click access to detailed troubleshooting information. Isolate tier, server, time period, slow web pages, middleware messages, and database queries in a single interactive view that accelerates fault domain isolation.
Multi-tier Data Center Monitoring (contd)
1. Data Center Analysis provides real-time views of application performance, operations, availability and usage, along with requests broken down by the supporting tier of infrastructure.
2. Historic detail of tier performance is displayed, with mouse-over detail of how user and application performance is affected by the corresponding infrastructure tier.
3. Individual application operations are displayed in the context of overall application performance, network health and end-user experience.
4. End-user performance is displayed for any infrastructure tier and can be sorted by user group, individual users or client types.
One click to deep-dive application analysis
1. DCRUM provides a broad view across infrastructure to triage performance of services, servers, operations and websites.
2. Reports on affected users, transaction times and availability quickly surface hot spots in application performance.
3. From DCRUM dashboards, a direct drill down into dynaTrace reporting provides method call and code-level analysis of application performance issues.
Optimize end-user experience (contd)
1. Drill down from the affected users heat map to view individual user performance.
2. Identify the application(s) responsible for poor end-user performance.
3. For specific users, identify the offending application operation with a breakdown of slow, fast and aborted requests.
Central Security Server
CSS Consolidated User Management
The Compuware Security Server (CSS) is a new consolidated authentication and user management system in 12.0.
Components served: CAS, RUM Console, ADS, Enterprise Portal, BSM
User sources: locally defined users, corporate LDAP (Active Directory, Apache DS)
CSS
CSS Features / Value
Users have one account/password to access DC RUM and BSM
Seamless pass-through from Enterprise Portal to CAS / ADS
Enterprise Portal connects to 12.0 CAS / BSM without login
Administrator usernames / passwords: one vs. three
Manage users in one location
Common roles across components
Audit logging online and exportable
Consistent LDAP and LDAPS access
Consistent password policies
Central Analysis Server
Central Analysis Server (CAS) Report Server
CAS is the main report server and repository for real user monitoring
Metrics are aggregated at interval level for each unique client + operation + server
o An operation is a web page load, database query, web service call, etc.
Other features of the CAS
o Custom reporting (DMI)
o Alerts
o Baselines
CAS has two personalities:
Transactional Monitoring (web analysis)
o Focused on specific applications: web, SQL, SOAP, etc.
o NOTE: Not just Web analysis
Enterprise Monitoring
o General network traffic monitoring
CAS can also store/report on metrics from synthetic transactions and J2EE & .NET agents
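The interval-level aggregation per unique client + operation + server can be pictured as a group-by over raw measurements. This is a minimal sketch under stated assumptions: the 5-minute interval length, the tuple layout and the function name are hypothetical, not the CAS implementation or schema.

```python
from collections import defaultdict

INTERVAL_SECONDS = 300  # assumed interval length, for illustration only


def aggregate(measurements):
    """Aggregate (timestamp, client, operation, server, op_time) tuples.

    Returns one record per (interval start, client, operation, server).
    """
    buckets = defaultdict(lambda: {"count": 0, "total_time": 0.0})
    for ts, client, operation, server, op_time in measurements:
        interval_start = int(ts) // INTERVAL_SECONDS * INTERVAL_SECONDS
        key = (interval_start, client, operation, server)
        buckets[key]["count"] += 1
        buckets[key]["total_time"] += op_time
    # Derive the average once all measurements for a key have been summed.
    return {
        key: {**agg, "avg_time": agg["total_time"] / agg["count"]}
        for key, agg in buckets.items()
    }
```

Every distinct client, operation or server value creates new keys, which is why unbounded client or URL cardinality drives up the amount of data a report server has to keep.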
CAS - Data Mining Interface (DMI)
The DMI is the custom reporting tool for DC RUM
No need to write custom SQL queries
100% web based
Create reports (tabular, charts) from any DC RUM data source: real user monitoring metrics, Java & .NET agent metrics, etc.
Reports can be scheduled (for example, send daily summary reports by email every night)
Reports can be linked together to create a customized drilldown workflow
Data can be exported
Report definitions can be imported/exported (for reuse at another client)
Metric names can be aliased to match customer terminology
Intimidating to use at first glance, but it's easy to master
CAS - Alarm System Overview
The alarm mechanism enables you to be proactive rather than reactive
Fixed thresholds vs. baselines
Alarms can be sent to a specified e-mail address, or can be sent via an SNMP trap.
There are also alarms that are generated even if they have no subscribers assigned. Such alarm notifications are recorded in the alarm logs, which store records of all alarms generated.
Modify the existing alarm or define new alarms.
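The difference between a fixed threshold and a baseline can be shown in a few lines. This is a sketch only: how DC RUM actually computes baselines is not described here, so the use of a simple historical mean and the `tolerance` parameter are assumptions, and the function names are hypothetical.

```python
from statistics import mean


def fixed_threshold_alarm(value: float, threshold: float) -> bool:
    """Raise an alarm whenever the value exceeds a static limit."""
    return value > threshold


def baseline_alarm(value: float, history, tolerance: float = 0.5) -> bool:
    """Raise an alarm when the value exceeds the historical mean by more
    than `tolerance` (a fraction, e.g. 0.5 = 50% above baseline)."""
    if not history:
        return False  # assumption: no baseline yet, so no alarm
    baseline = mean(history)
    return value > baseline * (1.0 + tolerance)
```

A fixed threshold suits a known service-level target, while a baseline adapts to what is normal for each application and time period.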
CAS - Types of Alarms
Alarms based on SQL detectors
Using SQL queries, these alarms perform queries on the traffic monitoring database. The benefit of using these alarms is that there are no constraints on the complexity of the queries, and any event that can be expressed as an SQL query can be detected.
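To show the idea of "any event that can be expressed as an SQL query", here is a hedged sketch of a detector query. The `operations` table, its columns and the thresholds are entirely hypothetical, and an in-memory SQLite database stands in for the real MS SQL Server traffic monitoring database, whose schema is not covered in this material.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a traffic monitoring table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE operations (server TEXT, op_time REAL, failed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO operations VALUES (?, ?, ?)",
        [
            ("web1", 1.0, 0), ("web1", 1.5, 0),
            ("web2", 3.0, 1), ("web2", 4.0, 1),
            ("web3", 5.0, 0),
        ],
    )
    return conn


# Detector: servers that are both slow on average AND failing repeatedly.
DETECTOR_SQL = """
SELECT server
FROM operations
GROUP BY server
HAVING AVG(op_time) > 2.0 AND SUM(failed) >= 2
ORDER BY server
"""


def run_detector(conn: sqlite3.Connection):
    """Return the servers for which the alarm condition holds."""
    return [row[0] for row in conn.execute(DETECTOR_SQL)]
```

In this demo data only `web2` meets both conditions; a combined condition like this is the kind of event that is hard to express with a single pre-defined metric.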
Alarms based on Java/.NET Monitoring measurements
VAMETRIC_ALM - for alarms performing queries on measurements related to entry points
VAMETHODMETRIC_ALM - for alarms performing queries on measurements related to object methods or SQL queries
PAT_VIO_4_AS_RES - for alarms performing queries on measurements related to JMX/WMI metrics
Metric alarms
These alarms provide a simple and fast mechanism for performing complex queries on a set of pre-defined metrics. The advantage of using these alarms is ease of use and modification, as well as performance. To define metric alarms, you do not need to know the structure of the database or how to program in SQL. However, not all conditions can be expressed as metric alarms.
Network alarms
These alarms are similar in design and function to the metric alarms above, but they view the monitored traffic as it is presented on the Network View report.
Link alarms
These are fast-executing alarms designed to monitor link utilization as presented on the Link View report.
Other alarms
A few other alarms are available that were designed for very specific purposes. They can be modified only in limited ways and do not allow user access to the detector code.
RUM Console
Components
RUM Console consists of two components:
RUM Console Server - A back-end server application that maintains configuration images and device information, runs tasks related to configuration management, and provides a Web services API for RUM Console to manage configurations. The server is a Windows-based service that can be installed on a machine running Windows 2003 Server or Windows 2008 Server R2 with a network connection to all of the managed devices within the Compuware APM infrastructure.
RUM Console - A GUI application for configuring report servers and data collectors. With the console, you can create and edit configurations for Compuware APM devices and propagate such configurations to other Compuware APM devices.
RUM Console
Guided configuration: first time users, easy configuration first steps
Wizard configuration
Tracing ability
Entire configuration: experienced user
All same options
Health reports
Sequence transactions
Guided Configuration
Device information
Agentless Monitoring Device
AMD
The Agentless Monitoring Device (AMD) is a completely passive device, placing no additional load on the network. The AMD can be connected to the network in two ways:
Spanning the switch - In today's switched environments, most switches can mirror multiple ports and/or multiple VLANs to a single monitoring port. This lets the AMD passively monitor traffic from a number of different perspectives, so it can see traffic in front of and behind load balancers, as well as all the tiers in between. Where the switch cannot accommodate more spans, regeneration taps can be a good alternative. Cisco switches may also use VLAN Access Lists (VACLs) to bridge routed traffic to an outgoing port, much in the same way as port mirroring.
Passive taps - In certain cases, the use of span ports may not be viable. Passive taps may then be used to capture the application traffic to be monitored. This method requires multiple tap points to fully see all tiers within the application.
AMD
The AMD's job is to sniff traffic for the purpose of performance monitoring
o The AMD performs initial processing of the data
o Data is organized into files to be retrieved by report servers at configured time intervals
Runs on Red Hat Enterprise Linux 5+ and 6+
Hardware slots are filled with additional network interfaces for monitoring
o Monitoring NICs are passive
o Can be copper or fiber or mixed
SSL decryption is performed on the AMD
o RSA private key needed
o SSL decryption card (Nitrox, Cryptoswift): decryption processing is offloaded from the main CPU, and RSA keys are guarded (they are not stored on disk or in main memory)
o Software only: OpenSSL
The AMD does not store/keep packet traces. It inspects packets to see the URL, the user ID, etc.
o The exception is HTTP header request/response and POST data when using the ADS report server (optional)
o Sensitive data can be masked
Advanced Diagnostic Server
Advanced Diagnostics Server (ADS) Report Server
ADS is the deep-dive report server and repository for real user monitoring on web and SQL applications
Operations are not aggregated (unlike in CAS). Every monitored transaction can be reviewed in detail
Breaks down the page load time by individual web page element (images, css, javascript, etc.)
Can be used to drill into the transaction to see the input submitted by the user (POSTed data).
Supports monitoring of business transactions
Stores data for only 3-4 business days
ADS Report Server Example
Component Scaling
What causes sizing problems
Incorrect product positioning
o Deep-dive, bottom-up troubleshooting approach instead of top-down Application Performance and EUE Monitoring
Using short-term POV parameters in a longer-term post-sales implementation
All-traffic monitoring without any filters limiting IP addresses
Too many individual clients
o No user aggregation
o User ID recognition generates too many identifiers
"Monitor specific page" defines a URL parameter with too many values (such as phone number, etc.)
Too many regular expressions
HTTP application error tracking in a high-end environment
Storage period is too long without justification
ADS in a high-traffic environment trying to handle the same page volume as VAS
Setting unrealistic expectations with the customer
RECOMMENDED ARCHITECTURES
Report Server Integration and Aggregation
When to integrate multiple report servers?
When one Central Analysis Server is not enough to store all monitoring data from all Agentless Monitoring Devices.
When AMDs are geographically dispersed (for example, in different data centers).
When you need to use Advanced Diagnostics Servers to broaden your monitoring perspective and add in-depth vision alongside CAS reports.
When you need failover and backup operations to provide high availability of reports.
When you want all of the reporting in one place.
Scalability: HTTP decode multi-threading
[Chart: Heavy HTTP analysis, traffic analyzed by the AMD (Mbps)*. Vertical axis: 0 to 1600 Mbps. Bars: 11.1 32-bit; 11.7 64-bit; 11.7 64-bit with multi-threading, 32 GB RAM; 11.7 64-bit with multi-threading, 64 GB RAM.]
* All HTTP analysis features are enabled; user recognition and operation recognition use process-intensive regular expressions.
Scalability: each version brings more efficient traffic decoding; the version 12 numbers are again slightly better than those of version 11.7.
Monitoring Component Capacity Guidelines
The CAS database should not contain more than 2 million sessions.
ADS offers two modes:
o Small website: per-hit mode can handle 3M page loads (approximately 10M hits) per day.
o Large website: per-page mode can handle 13M page loads per day.
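The CAS and ADS guideline figures above lend themselves to a quick sizing check during planning. The sketch below only encodes those three numbers; the function names and the returned labels are hypothetical, and a real sizing exercise would also have to account for the AMD traffic profile.

```python
CAS_MAX_SESSIONS = 2_000_000            # CAS database guideline
ADS_PER_HIT_MAX_PAGE_LOADS = 3_000_000  # per day, about 10M hits
ADS_PER_PAGE_MAX_PAGE_LOADS = 13_000_000  # per day


def cas_within_guideline(sessions: int) -> bool:
    """True if the expected session count fits one CAS database."""
    return sessions <= CAS_MAX_SESSIONS


def choose_ads_mode(page_loads_per_day: int) -> str:
    """Pick the most detailed ADS mode that fits the daily page volume."""
    if page_loads_per_day <= ADS_PER_HIT_MAX_PAGE_LOADS:
        return "per-hit"
    if page_loads_per_day <= ADS_PER_PAGE_MAX_PAGE_LOADS:
        return "per-page"
    return "exceeds single ADS guideline"
```

Per-hit mode is preferred when the volume allows it because it keeps element-level detail; per-page mode trades that detail for capacity.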
For the AMD, capacity depends on the traffic profile. A few examples are shown below:
Distributed data storage benefit
Reduce the number of users maintained in the SQL database
o This reduces the number of CAS sessions
o Note: the CAS client location structure must be well-defined
Practical data reduction levels will vary
o Theoretical benefit: 3x-7x reduction in number of sessions
Central Analysis Server - Scalability
AMD scaling
Passive in-line tap or splitter
AMD in load-balancing mode
Intelligent switch (e.g. Gigamon, Anue)
Each AMD analyzes one or part of one application
SPAN
Tap
AMD
AMD
AMD
CAS scaling
Add more CAS servers and distribute data per monitored server IP
Designate one CAS as master
AMD
All DMI reports will use all servers as data source
CAS
CAS
CAS
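Distributing data "per monitored server IP" means each monitored server's measurements land on exactly one CAS. The sketch below illustrates such a partitioning with a static subnet-to-CAS table; the CAS names, subnets and lookup function are hypothetical, and in the product the assignment is made through configuration rather than code.

```python
import ipaddress

# Hypothetical assignment of monitored server subnets to CAS instances.
CAS_ASSIGNMENT = {
    "cas-1": [ipaddress.ip_network("10.1.0.0/16")],
    "cas-2": [ipaddress.ip_network("10.2.0.0/16")],
    "cas-3": [ipaddress.ip_network("10.3.0.0/16")],
}
MASTER = "cas-1"  # one of the ordinary CASes is designated as master


def cas_for_server(server_ip: str) -> str:
    """Return the CAS responsible for a monitored server IP."""
    addr = ipaddress.ip_address(server_ip)
    for cas_name, networks in CAS_ASSIGNMENT.items():
        if any(addr in network for network in networks):
            return cas_name
    raise LookupError(f"no CAS assigned for server {server_ip}")
```

Keeping all data for one server on one CAS avoids splitting a server's sessions across databases, while DMI reports can still use all servers as data sources.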
DCRUM Components - CAS master/slave
The CAS/ADS network of master and slaves is seen as ONE by the portal
Enterprise Portal
CAS AMD ADS
CAS master-slave network
One of the CASes is designated as the master
Monitoring functionality of this CAS is similar to that of all other CASes
Metadata for consolidated reports is served from the slave servers to the master
The master builds a consolidated report for the user
Enterprise Portal
DMI front-end AMD Probe DMI back-end
ADS always acts as a slave server
There are no performance reasons to set up a separate master CAS
o Just designate one of the CASes in the cluster as the master
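How a master might build one consolidated report from the partial results of several servers can be sketched as a merge step. The data format here (per-operation counts and total times) and the function name are assumptions made for illustration; the actual exchange format between CAS servers is not described in this material.

```python
def consolidate(partial_reports):
    """Merge per-server partial results into one report.

    Each partial report maps operation -> (count, total_time_seconds).
    """
    merged = {}
    for report in partial_reports:
        for operation, (count, total_time) in report.items():
            prev_count, prev_total = merged.get(operation, (0, 0.0))
            merged[operation] = (prev_count + count, prev_total + total_time)
    # Compute averages only after summing, so busy servers weigh more.
    return {
        operation: {"count": count, "avg_time": total / count}
        for operation, (count, total) in merged.items()
        if count > 0
    }
```

Carrying counts and totals rather than pre-computed averages matters: averaging the averages of two servers would give a lightly loaded server the same weight as a heavily loaded one.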
Central Analysis Server
Central Analysis Server
AMD Probe
DMI back-end
Central Analysis Server
AMD Probe
DMI back-end
CAS scaling SQL offload
SQL database on separate hardware
Makes sense only if the I/O of the SQL server is faster than the I/O of the CAS hardware
AMD
Shared SQL servers not recommended for high loads
CAS
CAS
ADS
SQL
Additional Analysis tools
Transaction Trace Analysis
Complex application and network interaction can demand more than real-time monitoring. DCRUM includes a Transaction Trace feature that provides the deep root cause analysis needed to quickly remedy complex network problems.
Transaction Trace Analysis
1. Dig deeper into server processing delays with Thread Analysis visibility into popular protocols such as HTTP/S, SQL, SAP, WebSphere MQ, RMI/IIOP, CIFS, and more.
2. Pinpoint the source of application performance problems by identifying the impact of the network on transaction response time.
3. Roll out applications that perform well from the start by predicting and tuning response time before deployment.
Other decodes including Citrix, WAN Optimisation
Monitoring Citrix
VTCAM software is installed on the presentation server (Citrix or MS Terminal Server)
Runs as a Windows service
Collects CPU & memory utilization stats of the Citrix host
Maps back-end application traffic to the responsible end-user (session mapping data)
CAS reports
CAS
Gomez User
Monitoring Citrix
Citrix Remote Users Citrix Server Farm
Database Servers Corporate Network
Appropriate Analysis Modules Web TCP level analysis AMD Applications
TCAM
CAS + Enterprise Analysis
Other Applications
Thin Client Analysis Module (TCAM)
TCAM Vantage Thin Client Analysis Module
Target Environments
Citrix and WTS enabled applications
Deployment Considerations
A lightweight component is placed on the server to correlate user logins and back end Citrix conversations.
The agent uses the Citrix API and the Microsoft Windows API to obtain information on which user is opening which TCP sessions from the Citrix/WTS server. The agent communicates in real time with the AMD and provides mappings from TCP session IDs to Citrix user login names. This information is used by the AMD to tag measurements taken on the Citrix<->application server path with actual user login names.
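The session mapping described above amounts to a lookup from a TCP session to a login name, applied to each measurement. In this minimal sketch the session is identified by its address/port 4-tuple, which is an assumption (the material only says "TCP session IDs"); the type, function and user names are likewise hypothetical.

```python
from typing import NamedTuple


class TcpSession(NamedTuple):
    # Assumed session identifier: Citrix server side -> application server.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def tag_measurements(measurements, session_to_user):
    """Attach the Citrix login name to each (session, op_time) measurement.

    Measurements without a known mapping keep user=None.
    """
    tagged = []
    for session, op_time in measurements:
        user = session_to_user.get(session)
        tagged.append({"session": session, "user": user, "op_time": op_time})
    return tagged
```

Without such a mapping, every back-end request would appear to come from the Citrix server's own IP address, hiding which end user was actually affected.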
Thin Client Analysis Module (TCAM)
Target Environments
Citrix and WTS enabled applications
Deployment Considerations
Additional information on resource utilization of the Citrix server is also available (CPU, HDD, RAM, TCP, number of Terminal Services sessions and number of active Terminal Services sessions).
One AMD can monitor multiple Citrix/WTS machines (different servers, different protocols)
One CAS can gather data from multiple AMDs and provide a single view of service delivery
Monitoring WAN Optimization
WAN Optimization Controllers (WOCs) are installed at branch office and data center locations
The AMD adds a SPAN or TAP on the optimized side of the data center WOC
Monitoring WAN Optimization