Ultimate Guide To IT Monitoring Management and Modernization
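In this e-guide
• Words to go: Aspects of an IT monitoring strategy
• Monitoring thresholds determine IT performance alerts
• IT's application support model depends on everyone outside IT
• IT incident management best practices to minimize disruptions
• Top-tier ITSM skills go beyond a good tool set
• Modernize ops practices to manage hybrid IT infrastructure
Access this guide to find out how to minimize disruptions to your business through effective IT monitoring, management and modernization.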
Before you embark on an IT monitoring strategy, review these key terms to know which metrics and methods to prioritize.
Monitoring thresholds: Monitoring thresholds notify IT operations teams when a system's resource usage approaches certain limits. A threshold is intended to send alerts about potential issues early on so IT teams can address them before end users experience any performance lags. Admins can generally choose between static or dynamic thresholds. Static thresholds place set values on system limits -- such as CPU utilization reaching a certain percentage -- while dynamic thresholds learn the standard performance range of a system over time and alert teams when anything falls outside that range.
Dashboard: A dashboard is a console that provides a unified and visual representation of monitoring data. It can display data across a range of monitoring categories, and IT teams can customize a dashboard to show the specific metrics they choose.
Some IT monitoring tools use static thresholds that are manually adjusted, while others use a learning system to set thresholds specific to the given environment. Both methods have a common objective: inform the IT operations team when there is an issue and point to a cause -- ideally, before users notice any effects. Both static and dynamic monitoring thresholds have advantages and disadvantages.
Static monitoring thresholds
Static thresholds are fixed values that represent the limits of acceptable performance. For example, a server with over 90% CPU utilization is generally a bad thing, no matter when it happens or on what server. For other performance counters, it is less obvious what is acceptable and what is dangerous. Monitoring products come with default thresholds for each performance counter that the IT team can adjust. Not all IT workloads benefit from the same monitoring thresholds. A bank's IT team needs to know about CPU utilization that goes above 60% for a few minutes, for example, while a manufacturer might not.
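As a minimal sketch of the idea above, the following Python shows a product-wide default limit with a per-workload override and a check that only fires when the breach is sustained. The workload names, the dictionaries and the `min_samples` parameter are assumptions made for illustration; they do not come from any particular monitoring product.

```python
# Hypothetical default limits (percent) of the kind a monitoring product ships with.
DEFAULT_LIMITS = {"cpu_percent": 90.0}

# Hypothetical per-workload overrides; 60% mirrors the bank example above.
OVERRIDES = {"bank-core": {"cpu_percent": 60.0}}


def limit_for(workload: str, counter: str) -> float:
    """Return the override for this workload if one exists, else the default."""
    return OVERRIDES.get(workload, {}).get(counter, DEFAULT_LIMITS[counter])


def sustained_breach(workload: str, counter: str, samples: list[float],
                     min_samples: int = 3) -> bool:
    """True when the most recent `min_samples` readings all exceed the limit."""
    if len(samples) < min_samples:
        return False  # not enough data to call the breach sustained
    limit = limit_for(workload, counter)
    return all(value > limit for value in samples[-min_samples:])
```

With one-minute samples, `sustained_breach("bank-core", "cpu_percent", readings)` would flag three consecutive minutes above 60%, while a workload without an override is still judged against the 90% default.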
Static monitoring threshold tuning is a major challenge for IT teams. The effort involved effectively limits the number of thresholds a team can maintain and usually means that the same thresholds are used across every VM, despite these VMs serving markedly different business applications. For example, a reporting server is healthy at 90% CPU utilization, while a web server at the same utilization rate requires IT support. It takes more manual tuning to override the standard threshold for applications that have these different requirements. Until manual tuning is perfected, the monitoring tool will miss real issues, over- or underreport the severity of an issue, or report issues where none exist.
Static thresholds do not allow for cyclic variation. It is common in IT environments for CPU utilization to hit 95% for two hours overnight as the backup runs, but only during that brief window. Some tools enable users to set in-hours and after-hours thresholds separately. However, IT infrastructure also can experience normal weekly and monthly variations in load. Static thresholds do not respond to these cyclic workloads and require a lot of work to avoid false positives and missed issues.
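The in-hours and after-hours split described above can be sketched in a few lines of Python. The business-hours window and both limits are assumed values chosen for illustration, not defaults from any tool.

```python
from datetime import time

# Assumed business-hours window and limits (percent); adjust per environment.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)
IN_HOURS_CPU_LIMIT = 80.0
AFTER_HOURS_CPU_LIMIT = 97.0


def cpu_limit_at(moment: time) -> float:
    """Pick the static limit that applies at this time of day."""
    in_hours = BUSINESS_START <= moment < BUSINESS_END
    return IN_HOURS_CPU_LIMIT if in_hours else AFTER_HOURS_CPU_LIMIT


def should_alert(moment: time, cpu_percent: float) -> bool:
    """True when the reading exceeds the limit for its time window."""
    return cpu_percent > cpu_limit_at(moment)
```

This keeps the overnight backup at 95% quiet while still flagging the same reading at midmorning. It does nothing for weekly or monthly cycles, which is the limitation the paragraph above describes.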
Dynamic, learning monitoring thresholds
Intelligent IT monitoring tools learn what is normal in the environment and only send an alert when things are outside of the understood normal cycles and parameters. Dynamic thresholds usually learn the normal range for a performance counter -- both a high and low threshold -- at each point in the day, week and month. They, therefore, identify daily, weekly, monthly and even annual cycles in IT systems. A dynamic system knows the high CPU load during backup is normal, but that 80% CPU utilization on a Tuesday morning is abnormal. Because tuning is automatic, the IT monitoring strategy can include thousands of thresholds, even ones that change over time to follow business cycles.
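One simple way to picture a learned threshold is a per-slot band built from history. The sketch below groups past readings by weekday and hour, then treats anything outside the mean plus or minus `k` standard deviations as abnormal. This is an assumption made for illustration; commercial tools use their own, usually more sophisticated, models, and the function names and the `k` parameter here are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def learn_bands(history, k=3.0):
    """Build {(weekday, hour): (low, high)} from (datetime, value) pairs."""
    buckets = defaultdict(list)
    for timestamp, value in history:
        buckets[(timestamp.weekday(), timestamp.hour)].append(value)
    bands = {}
    for slot, values in buckets.items():
        center, spread = mean(values), pstdev(values)
        # A spread of zero collapses the band to a single value.
        bands[slot] = (center - k * spread, center + k * spread)
    return bands


def is_anomalous(bands, timestamp, value):
    """True when the value falls outside the band learned for this slot."""
    slot = (timestamp.weekday(), timestamp.hour)
    if slot not in bands:
        return False  # nothing learned for this slot yet, so stay quiet
    low, high = bands[slot]
    return value < low or value > high
```

Because the band has a low edge as well as a high one, the same check that flags 80% CPU on a Tuesday morning also flags unusually low load.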
Dynamic thresholds are not as intelligent as people. A dynamic monitoring setup can become confused when cyclic activity doesn't happen according to usual patterns. For example, the support staff will get an alert that system load is low on a public holiday, because the users are at the beach instead of at their desks creating load.
Dynamic monitoring tools deployed in a broken or poorly performing IT environment can learn that state as normal and even start to send alerts when the environment improves. For example, an application has a memory leak, so memory utilization increases over time. But the server is rebooted on a monthly basis for patches. The dynamic system will accept this monthly cycle of increasing memory utilization as normal. Dynamic systems are also inclined to view things that get broken for a while as the new normal. If a storage array slowly gets overloaded and unresponsive, the dynamic threshold monitoring system will register the overloaded state as the new normal.
An IT monitoring strategy for the real world
In the real world, most monitoring tools do more than just watch thresholds, and even dynamic threshold systems incorporate some static parameters, too. Overall, IT monitoring tools that build thresholds automatically are more useful than those that require a lot of manual tuning. Tedious tuning never gets completed in a busy IT organization, which leads to a habit of ignoring noisy false alerts.
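A rough sketch of how a static parameter might sit alongside a learned range: the learned band drives ordinary warnings, and a fixed ceiling still fires even if the learning system has come to accept an unhealthy state as normal. The function name, the severity labels and the values in the usage note are assumptions for illustration only.

```python
def evaluate(value: float, learned_band: tuple[float, float],
             hard_ceiling: float) -> str:
    """Classify a reading using a learned band plus a fixed safety ceiling."""
    low, high = learned_band
    if value > hard_ceiling:
        # The static guard wins even when the learned band says "normal".
        return "critical"
    if value < low or value > high:
        return "warning"
    return "ok"
```

For instance, `evaluate(96.0, (90.0, 99.0), 95.0)` returns `"critical"` although 96 sits inside the learned band. A ceiling like this only makes sense for counters where no legitimate cycle exceeds it, so it would need care on a counter such as CPU utilization during a backup window.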
A smart monitoring strategy uses more than just performance counters. Tools incorporate system logs to help identify issues and pair infrastructure monitoring with application monitoring. This setup tracks app availability and response time to correlate them with infrastructure performance. A monitoring system with all its dials showing green is not the complete story; look for multiple ways to identify issues in the environment.
DevOps has succeeded as an application support model in part because it caters to user experience. The feedback loop between IT operations, developers and users must be as short as possible, but processes must fall below the business's established risk threshold as well.
These team members take some application support responsibilities off of IT. They can answer the easier questions from other users, as well as lend their understanding and expertise to troubleshooting and change management. They contribute directly to business performance improvement and outage reduction through involvement in update commits and setting changes. App owners and superusers should test out changes first and have direct access to the IT staff members who manage, change and develop the application, rather than go through the help desk.
IT administrators, app owners and superusers should be in an easy-to-find list. That way, users know whom to contact about application issues or questions, and the list also aids the help desk staff who have front-line application support roles.
Benefits of feedback
Feedback should be constructive, simple to follow and encouraged in company culture -- easier said than done. To reach that feedback goal, start with key applications or processes, and grow at a rate that users and IT staff can accommodate.
Organizations have numerous methods to collect feedback, such as email, forms and help desk incident requests. Choose a method for your application support model where the user can give feedback easily and recipients can respond just as easily. If users submit feedback and never get a response, they won't be motivated to comment in the future.
Some feedback is impossible to act upon, but most users would rather hear "no" with an explanation -- even if they don't like the answer -- than receive no reply at all. For actionable feedback, improvements demonstrate to the business that IT can adapt and change. Conversely, feedback can lend business justification to IT's change requests that get held up by resource limits.
Don't just make changes indiscriminately when responding to feedback. Just because a few users want Comic Sans as their default font doesn't mean that it should apply to the rest of the business. Where standardization makes sense, enforce it. Where it doesn't, leave flexibility in the app support model to let users customize their experience.
There are other ways to improve the application support model and IT service management overall. Look for opportunities that fit the business's and application users' needs, and implement ones that have the biggest effect.
A comprehensive IT incident response plan includes more than just playbooks, runbooks and guidance on patching -- it maps out detailed post-mortem steps to ensure IT teams learn from the event. Use the following tips to optimize IT incident response planning and management.
Modernize IT incident response plans
As companies move to the cloud and build out mobile computing and DevOps strategies, they need to update their IT incident response plan to suit these environments. Johna Till Johnson, CEO and founder of Nemertes Research, discusses five measures, including automation and collaboration with cloud providers, to create an incident response plan that aligns with modern IT deployments.
incident from turning into a full-blown disaster. IT architect and college instructor Brian Kirsch breaks down crisis management protocols that enterprises can adopt to strengthen their IT incident response plans, starting with a playbook.
For ops admins, it pays to be proactive -- ready, willing and able to communicate with business leaders and put IT service management (ITSM) capabilities in the best alignment with organizational goals. But how, and is it something beyond technology that can help?
"IT professionals [who] develop the skill of 'seeing the whole' can make themselves indispensable to their organization," said Alan Zucker, founding principal at Project Management Essentials, a training and advisory company in Washington, D.C. The concept of seeing the whole comes from the lean movement, he said. Essentially, everyone should understand what they are doing and how it fits into the product they create or the value stream they deliver. This integration with business should be counted among ITSM skills, alongside things like asset tracking, change management and reporting.
Many IT professionals are deeply knowledgeable about technologies, pipelines and architectures, but focus on their specific role or function in the organization. They don't consider the downstream effects on the rest of the business; instead, they try to optimize their part of the process in a vacuum. For example, Zucker said, too often a small component of an application or project receives disproportionate attention, because it is using a new technology or tool that interests IT, but that slice of the project may not matter in the larger scheme.
"IT professionals [who] understand how to optimize the end-to-end project ... stand out in the eyes of their customers; they become the future leaders because they understand how everything fits together," Zucker said.
This focus on the whole is one of the drivers for ITSM, which blends IT capabilities with business requirements to deliver technologies that address workers' needs. But even shops with ITSM experience face an intrinsic deficit in soft skills.
"Having worked with many IT teams across Canada and the United States, I can say that, without a doubt, the average IT person's mindset holds IT back in their organization," said Mazdak Mohammadi, owner of BlueberryCloud, a custom website developer based in Vancouver, B.C.
Mohammadi said the problem is intrinsic to the nature of IT, which is based on logic and influences how IT people think. Logic is not enough when dealing with business. "In order to influence anyone to do anything, such as adopting a new technology, you must create trust," he said. Trust doesn't come from logic, but instead requires an emotional connection. In addition to building hard ITSM skills so they can implement the business's vision, IT people should also become "better salespeople," so the business will put that vision into IT's capable hands.
The deck can be stacked against IT people who try to keep up with every trend, buzzword and best practice for their profession. That doesn't leave much time or energy for the issues that most affect customers and the company. "It's easy to fall prey to tech-industry peer pressure and spend all your time learning things that never help you, nor your company, get ahead," said Wes Higbee, a consultant and author based in New York. Instead, look for real blocks. "Maybe your customers spend a lot of time manually importing data from your system, and automating this could save them millions of dollars," he said.
It can be difficult to think of presence as an ITSM skill, but the new approach takes shape in small differences. When IT people meet businesspeople halfway, the results are often positive, according to Kelly Finn, principal for information technology at WinterWyman, a recruitment firm with headquarters in Waltham, Mass. For example, pay attention to your attitude and tone when you explain something technical to a nontechnical audience. Were you in a rush or patient? Did it sound condescending or helpful? Listening is the most important soft skill you can develop, Finn said. Then, use your technical knowledge to come up with practical and winning strategies. "Too often, a businessperson will tell you they need X, Y and Z, which you recognize is beyond the resources you have," Finn said. Be patient, and push back appropriately until you get to a service delivery plan that is achievable.
Some of the onus for a better ITSM experience is on the business side, according to James Goepel, vice president and CTO at ClearArmor Corp., a cybersecurity company in Riegelsville, Penn. "The misalignment between IT and business is a huge problem that creates inefficiencies and is at the root of many of today's cybersecurity issues," he said. Businesses need to stop looking at the IT department as a cost center, and, instead, see it as a business enabler. To do this, the IT department needs to approach its technological systems the same way the organization's leadership approaches the rest of the company, he said. "Tie the IT systems back to the organization's core business functions, and ... discuss the IT initiatives that they are advocating by how they will benefit the business," he said. These benefits might range from improved efficiency to reduced cost or lower risk, or some combination of these factors.
Use tools to support soft ITSM skills
Don't neglect the technology dimension when you shift focus to the overall experience. ITSM tool sets are rapidly evolving from systems that simply record tickets to ones that actually help resolve incidents as they occur, said Shannon Kalvar, research manager for ITSM and virtual client computing at IDC. The major vendors in this space, including BMC Software, Cherwell Software and ServiceNow, have moved in this new direction. IT teams are more effective at the front lines, "helping people to quickly resolve complex questions and problems using automated systems," he said. These tools boast machine learning to underpin capabilities, and more integration in one user interface. With views into, for example, help desk tickets and monitoring stats on the same interface, "you don't have to do constant flip screening," Kalvar said.
In keeping with the shift in mindset and skills that brings IT closer to the rest of the business, ITSM tools are becoming broader in their focus, jumping to new areas such as HR. "The impact of this technology is that IT skills have to move from the back office to more directly helping people," he said.
But Forrester analyst Charles Betz offered a note of caution. In his view, the entire premise of ITSM is under siege. The IT Infrastructure Library (ITIL) framework, which is the basis for much of ITSM thinking, is due for an update in 2018, after many years without one. ITSM has been under attack from DevOps devotees who believe DevOps is a more efficient way to create and manage IT.
Both philosophies, ITSM and DevOps, have active adherents, but Betz said he sees more momentum on the DevOps side. "[Research] seems to actually suggest a negative correlation between ITIL-related practices and business outcomes," he added, citing information in Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Jez Humble and Gene Kim.
Increasingly, SDN vendors base their controller technologies on open source standards, which gives IT teams some flexibility to duplicate their private networks on a cloud platform. However, SDN is sometimes out of reach for the average enterprise because of the specific skill sets and upfront investments the technology demands.
For those who don't plan for hybrid IT upfront, it might be necessary to manually rebuild a network. This is true whether extending networks from on premises to the cloud or between cloud providers. For example, if migrating from AWS to a smaller regional cloud provider, the smaller provider might not offer networking features that are as mature or extensive as those from AWS. Consequently, IT teams will have to do some upfront work to map features between providers and identify possible tradeoffs.
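The feature-mapping exercise can start as a simple set comparison. In this sketch the feature names are hypothetical labels, not entries from any provider's catalog; a real inventory would come from the team's own network design documents.

```python
def map_features(required: set[str], offered: set[str]) -> dict[str, list[str]]:
    """Split required network features into supported ones and gaps."""
    return {
        "supported": sorted(required & offered),
        "gaps": sorted(required - offered),  # each gap needs a workaround or tradeoff
    }


# Hypothetical inventories, for illustration only.
in_use_today = {"private_subnets", "managed_nat", "peering", "flow_logs"}
target_provider = {"private_subnets", "peering", "load_balancer"}

report = map_features(in_use_today, target_provider)
```

Here `report["gaps"]` lists the features the team would have to rebuild, replace or give up on the target platform.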
Use multi-cloud management tools
If multi-cloud is part of your current or future plans, a third-party cloud management platform (CMP) is a necessity, as the cloud-native management tools from the major cloud providers force admins to hopscotch between management dashboards, which hurts productivity and alerting. Choose a CMP that provides a single view and interface across multiple cloud platforms. Administrators should be able to monitor resources and receive reports in a centralized location.
Beyond the CMDB
Hybrid IT infrastructure requires ops teams to transcend the traditional configuration management database (CMDB). The nature of hybrid IT environments requires an active and passive management system that inventories at all endpoints, not just from a central location.
Even if an organization has shied away from formal IT service management, there remains the question of asset management, configuration management and service management. Hybrid IT can complicate these practices, so it's crucial to implement management tools that provide an inventory of resources across cloud platforms. ServiceNow and Cherwell Software offer options here, while CMPs such as CloudBolt Software and CloudTamer are good choices for enterprises embracing multi-cloud that don't already have mature service management practices.
Implement infrastructure as code
Another option to manage hybrid IT infrastructure is to adopt infrastructure as code (IaC), a model in which enterprises manage infrastructure provisioning and configuration the same way they do application code. IT teams store provisioning logic under source control in a centralized repository and use CI/CD pipelines, so the logic is visible across the organization. With IaC, IT teams can define an application stack via a configuration file or script and then automatically run that stack in a range of environments -- a capability that's especially beneficial in hybrid IT setups.
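To make the "define once, run in many environments" idea concrete, here is a toy sketch in plain Python. It is not Terraform or Ansible syntax; the stack layout, environment names and override keys are all invented for illustration.

```python
# Hypothetical stack definition that would live in a source-controlled repository.
STACK = {
    "name": "web-app",
    "services": {"web": {"instances": 2}, "db": {"instances": 1}},
}

# Hypothetical per-environment overrides.
ENV_OVERRIDES = {
    "on-prem-dev": {"web": {"instances": 1}},
    "cloud-prod": {"web": {"instances": 6}},
}


def render(stack: dict, environment: str) -> dict:
    """Produce the concrete definition for one environment without mutating the base."""
    services = {name: dict(spec) for name, spec in stack["services"].items()}
    for name, override in ENV_OVERRIDES.get(environment, {}).items():
        services[name].update(override)
    return {"name": stack["name"], "environment": environment, "services": services}
```

Calling `render(STACK, "cloud-prod")` and `render(STACK, "on-prem-dev")` yields two concrete definitions from the same base. A real IaC tool would then reconcile each definition against what is actually deployed.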
Red Hat Ansible and HashiCorp Terraform are two IaC tools that target hybrid IT infrastructure. The major cloud service providers, including Amazon Web Services, Google and Microsoft, also support IaC on their platforms. All three providers integrate with Terraform, so users can manage cloud resources via the tool.
Like SDNs, IaC can also pose an adoption challenge for IT departments, as it's still an emerging practice and skills gaps exist.
Consider configuration as code
A forward-looking option for hybrid IT management is configuration as code (CaC) -- a concept similar to, but different from, IaC. On a basic level, IaC uses actual code to build and configure infrastructure, whereas CaC is best suited to managing software. With CaC, every detail about how a piece of software is written, provisioned and managed is part of its source code, which is stored in a centralized repository. CaC enables admins to build and manage software across an enterprise via automation, and hybrid IT management tools that are already in place audit those builds.
Jenkins and CloudBees Rollout are two examples of CaC tools.