Data Centre Monitoring and Management: e Book
www.ECDsolutions.com.au
Whether it's the server under the stairs or the
world's largest - the 2,200,000 square foot
SwitchNAP facility in the Nevada desert -
the data centre is the hub of any business. In
many cases, it is the business itself.
Increasingly, data centre security, power consumption and
environmental monitoring are considerations on the radar for
administrators. Data centres are susceptible to hits from all angles:
viruses, spyware, network threats and cyberattack, heat, humidity,
airflow and smoke. In fact, it's pretty frightening how quickly and
easily a company's IT operations can be taken down, which makes
it hard to understand why some companies do not implement a
solid strategy for dealing with IT threats.
A combination of environmental and security monitoring,
remote console management and remote power control gives
administrators the power to control data centre environmental
conditions and security, server and network equipment,
and power distribution and usage. Automatic notification of
environmental conditions or security breaches before the event
disrupts network infrastructure can mean the difference between
a minor inconvenience and a major crisis.
Couple rising power prices with the need to closely control the
environment and data centres can easily be seen as the bad guy
when it comes to the overall energy consumption of a business,
but this needn't be the case. A bit of forethought in terms of design,
a few behavioural changes and a system of monitoring and
controlling power distribution can help keep energy use in check.
Today's data centre has an expected lifespan of just under 20 years,
so making a few informed decisions and considered investment
now can make all the difference a decade down the track.
Dannielle Furness
Editor ECD Solutions
Contents
Remote console management and power switching
Fortunately, a number of technologies are available to overcome or at least minimise the negative effects of physical separation of systems administrators from the hardware they manage. Among them are remote console management and remote power switching.
Remote console management
A basic part of the picture is remote console management.
Many network or networked devices - servers, printers, routers, switches and others - include a web-based console allowing remote management and configuration. In addition, or alternatively, they may make provision for remote management via other network protocols such as SNMP, Telnet or SSH.
Apart from manual control using such protocols, there is a range of systems management tools from various vendors that allows device management from a centralised console, though not all such systems are able to manage all types of devices. This approach is known as in-band management, because the management traffic flows across the same network as the rest of the data.
This is all very well as long as the network is running properly, but what happens when a device such as a switch or firewall is at least part of the problem and therefore network communications are interrupted? That's where out-of-band management comes into play.
Many devices are still fitted with a conventional serial port to allow the connection of a terminal (these days it's more likely to be a notebook running terminal emulation software rather than a dedicated terminal) for configuration or management. While it is possible to attach a modem to the serial port to allow remote management, that is rarely practical where multiple devices are concerned.
Enter the serial console server.
A serial console server connects to the serial ports of multiple devices and is in turn connected to a modem (for dial-in access) or via a completely separate IP network. This allows staff at other locations to dial in (literally or metaphorically) to the console server and connect to the troublesome device. This means it is important that the console server has appropriate security features such as strong authentication (eg, via a RADIUS or Kerberos server) and logging - and, where dial-in modems are used, dial-back capability to ensure the call is coming from an authorised location.
If the console server is not accessible via the network (which is the main reason it would be accessed via the serial port), then the usual network security measures would not be available and so local authentication would be required. Other security measures include keystroke logging and automatic screen captures - such features do not reduce the risk of intrusion, but they do add to the audit trail.
Another issue is that the console server should be able to check that an attached modem is functioning correctly. If a simple re-initialisation fails to restore normal operation, rebooting via a remote power switch (see below) should do the trick. The server should also periodically check for a dial tone. Given that the modem is the alternative way into the console server in the event of a network failure, it is important to attend to any communications issues promptly and as far as possible automatically.
Console servers are not limited to out-of-band communications via serial interfaces. An alternative arrangement is to configure a completely separate IP network for communication between the console server and the central control point. Such a fallback network is much simpler than providing a secondary network path for every device, but if any components are shared with the main network it may not be possible to reach the console server in the event of an outage.
Remote reboot power switch
Sometimes a device is completely unresponsive, even on its serial port. In that situation all that is left is to cycle the power. How do you do that without gaining physical access to the device? Use a remote reboot power switch, also known as a network power switch. Such a device can be thought of as a remote-controlled power strip. Typically mounted in a rack along with the equipment it controls, the switch provides multiple independently controlled mains sockets that can be turned on or off via its network or serial interfaces. For maximum robustness, that serial interface can be connected to a serial console server along with the other devices.
Remote switches may provide additional functionality including automatic power-off if temperatures exceed a threshold or automatic power cycling if a network component becomes unresponsive (the latter being a particularly useful way of maintaining service levels, though the underlying cause of recurring failures should be investigated). These and other conditions such as power supply irregularities may also be reported via email or to a syslog or SNMP server.
More advanced models may include power metering at the aggregate or individual circuit level, as well as metering the actual input voltage.
Remote power switching can also be used to turn devices on or off when they are needed (possibly according to a schedule) to save power. In more complex environments they can also be used to balance power loads across the machine room.
Remote switching is not only for devices that run on mains power. Models are also available for remotely switching DC power, including 12, 24 and 48 V supplies. Such DC systems are said to reduce energy losses and therefore achieve greater efficiency, as well as being simpler and cheaper than AC power distribution systems.
Not all remote power switches are rack mounted: some models resemble an oversized power strip and are designed for vertical mounting within a cabinet and others are packaged in standalone cases to support a small number of devices that are not rack mounted (eg, a small branch office server or a kiosk in a shopping centre).
Both mains and DC remote switches may also provide automatic switching to a secondary power supply should the primary supply fail or become unstable. Combining automatic switching with remote switching in this way saves rack space.
Another way to save space - especially in smaller installations such as branch offices - is to select a device that combines the remote console server and remote power switching in one box.
As when choosing other pieces of network equipment, it is worth checking that a remote power switch supports IPv6 as well as IPv4, as the former is likely to come into common use within the lifespan of the device.
Conclusion
Remote console servers and remote power switches make it easier for IT operations staff to deliver improved service levels, especially where the equipment they manage is geographically dispersed or located at a third-party data centre with restricted access. Remote console servers make it easier to manage multiple devices with serial console interfaces and may provide a backup control path in the event of a network outage. Remote power switches allow power cycling without the need for physical access to the hardware.
These functions can be combined by connecting the remote power switches to a remote console server or by selecting a product that provides both capabilities from a single unit.
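As an illustration of the automatic power cycling described in this article, the following is a minimal sketch of the decision logic only, not any vendor's implementation. The names watchdog_step, is_responsive and power_cycle, and the default of three consecutive failures, are assumptions made for the example; a real switch would supply its own probe (for example a ping) and its own outlet control.

```python
from typing import Callable


def watchdog_step(is_responsive: Callable[[], bool],
                  power_cycle: Callable[[], None],
                  failures: int,
                  max_failures: int = 3) -> int:
    """Run one probe and return the updated count of consecutive failures.

    is_responsive and power_cycle are hypothetical callables supplied by
    the caller; max_failures is an assumed threshold, not a standard value.
    """
    if is_responsive():
        return 0  # device answered, so the failure streak is reset
    failures += 1
    if failures >= max_failures:
        power_cycle()  # cycle the outlet feeding the unresponsive device
        return 0       # start counting again after the reboot
    return failures


if __name__ == "__main__":
    events = []
    count = 0
    for _ in range(3):  # three failed probes in a row
        count = watchdog_step(lambda: False,
                              lambda: events.append("cycled"),
                              count)
    print(events, count)
```

Requiring several consecutive failures before cycling avoids rebooting a device over a single dropped probe. Logging each cycle would help with investigating the underlying cause of recurring failures.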
Energy use and data centres
Dannielle Furness, Editor, ECD Solutions
iStockphoto.com/DeepDarkness
Just as the industrial revolution did before it, the information age has transformed our world and the way we operate within it. It's changed the way we access data and do business; the way we communicate, educate and consume information. It has given rise to entire industries and countless changes to employment and careers; jobs have been created and others rendered futile, often virtually overnight.
Communication methods and the equipment we relied on 20 years ago, such as faxes and dial-up modems, now seem as quaint as the quill and inkwell. The advent of mobile computing, the prevalence of smartphones and a move to cloud computing in more recent times have multiplied the effect and we now take for granted that digital files are permanent and can be retrieved at any time, from any location, in a matter of seconds.
For data centre owners and operators, the challenge is to cope with the constantly changing face of the industry. Not only must they factor in the impact of continual shifts in customer expectation and practices to adequately manage the mounting power consumption from this demand, but also crystal ball into the future to ensure the projected life expectancy of the data centre (just under 20 years) is delivered and it meets commercial targets.
There are three principal considerations for the development and ongoing operations of a data centre, the nebulous nature of which makes future-looking decisions all the more difficult.
Space
While the evolution of technology continues to shrink the physical size of hardware and we live in the era of virtual servers, there's no doubt the landscape has changed considerably since the 1980s, when a 1 GB hard drive was the size of a juke box. However, the sheer volume of data requiring storage ensures that space still dictates the direction for design and operations.
It's hard to imagine total global capacity, but in February 2011, the University of Southern California released research which calculated current worldwide data storage at 295 exabytes, or 295 billion gigabytes. And it keeps growing; in a study conducted by IT research company IDC in June of the same year, it was predicted that the world will generate 50 times current data production levels by 2020. It's all got to be stored somewhere. It's growing at such a rapid rate that in the not-too-distant future we'll hit a level that we haven't even derived a term for yet ... but that's another story.
Power consumption
There's no denying data centres are power-hungry beasts. Power to run the IT equipment itself, then power to run cooling and other environmental controls and ancillaries like lighting. Power consumption in a data centre is often measured using PUE, or power usage effectiveness. PUE is the ratio of total power for the facility, including cooling, lighting etc, divided by power utilised by the IT gear alone.
Power usage effectiveness = overall facility power / IT equipment power
Guidelines indicate an optimal PUE target of 1.0, at which all power is consumed by the IT hardware itself. Given the requirement for cooling and environmental controls to ensure that ambient conditions are the most favourable for the IT equipment, it's not uncommon to find a PUE closer to 2.0. Not uncommon, but not ideal either.
Cooling and environmental controls
Continual reliable operation is paramount in a data centre as any downtime can spell disaster. Hardware is susceptible to overheating if adequate cooling and ventilation aren't in place and even a few degrees can make the difference between business as usual and catastrophic failure. If the installation is fortunate enough to escape immediate failure, it can still suffer delayed malfunction as fragile electronic componentry can break down weeks after an overheating incident. Factor in loss of business, hardware replacement and employee underutilisation during downtime and it's easy to see that the costs soon add up and why operators are so keen to avoid it.
How the big guys do it
Some of the world's bigger data centre operators, including Google and Facebook, have been busy publishing information on their own centre energy-efficiency initiatives, temperature control and other cooling methods. Cynics might suggest that this transparency is a PR exercise, but if there are lessons to be learned, why not take heed?
Google suggests that most data centres are probably running cooler than they actually need to. They cite the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and IT equipment manufacturers as expert opinion and suggest a slight temperature increase will not only have no detrimental effect on equipment, but will deliver an immediate measurable energy saving.
The company also implements a design ethos comprising thermal modelling and airflow controls. The modelling identifies potential data centre hotspots, so that equipment can be physically laid out in a fashion that delivers even temperatures across the installation. Methods of airflow control that require no energy, such as plastic curtains and blanking panels, are utilised to ensure adequate segregation between hot and cool areas. Where possible, Google uses water for cooling, rather than chillers.
The Open Compute Project, which has made public Facebook's so-called secret data centre recipes, is a bid to encourage data centre development that is more efficient from both a cost and power perspective. After 12 months redesigning their server specs, Facebook worked with manufacturers to achieve a 38% increase in efficiency and a product they maintain costs 24% less than the industry standard. They assert a PUE of 1.07 at the Prineville, Oregon data centre. Google claims between 1.06 and 1.12 across its centres, dependent on the interpretation of total facility power usage (it claims it uses a more stringent approach than others).
Design
So, while not every data centre is on a par with Google, Facebook or Amazon, lessons can be learned from the way the big guys address problems. The same basic design principles apply and the problems they are facing today are the problems of the future for smaller scale projects, particularly if you consider the projected lifespan of a data centre.
To assist with the design process, professional organisations such as ASHRAE make a wealth of information available to members including a comprehensive selection of publications specifically for the datacoms sector. These incorporate guides on best practice design for energy efficiency in data centres, power trends and cooling applications and real-time energy consumption measurements. See www.ashrae.org/bookstore for more.
Monitor, monitor, monitor
The importance of monitoring really can't be overemphasised and, as the size and scope of data centres increase, visibility from a remote location is imperative. With a simple monitoring system in place, changes to environmental conditions that pose a threat to system operation are identified before the crisis unfolds, via a web browser from any location.
Power usage monitoring is also useful and can provide valuable design input for data centre upgrade projects in particular. Many solutions offer everything from continual data logging and report generation, which give a snapshot of the situation as it stands, through to enabling corrective measures.
Systems incorporating redundant power switching provide a reliable method of automatically switching equipment to a backup power source, ensuring critical network devices are always up and running.
There was once a time when the power draw of a data centre would be the least of a contractor's concerns. As long as the install went according to plan, then job well done. These days, everyone on the project has an interest in keeping power costs down as the crossover between roles creates some blurring of lines of responsibility. At the very least, it makes sense to have an understanding of the factors that influence overall project power costs.
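The PUE ratio defined under "Power consumption" can be worked through with a few lines of code. This is a minimal sketch: the function names pue and overhead_kw and the sample kilowatt figures are assumptions chosen for illustration, not measurements from any facility mentioned in the article.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be less than IT power")
    return total_facility_kw / it_equipment_kw


def overhead_kw(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power spent on cooling, lighting and other non-IT loads."""
    return total_facility_kw - it_equipment_kw


if __name__ == "__main__":
    # Hypothetical site: 100 kW of IT load inside a facility drawing 200 kW.
    print(pue(200.0, 100.0))          # PUE of 2.0
    print(overhead_kw(200.0, 100.0))  # 100 kW of overhead
```

At a PUE of 2.0 every kilowatt delivered to IT equipment is matched by another kilowatt of overhead, whereas at the 1.07 quoted for Prineville the overhead would be 7 kW per 100 kW of IT load.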
How to remotely reboot after a system lock-up
Image courtesy of Creative Commons
When a piece of LAN/WAN, telecom or other control equipment has locked up and is no longer responding to normal methods of communication, it is often necessary to perform a cold boot. After the power has been cycled on and off, normal communications via the network can resume.
This can be difficult at unmanned sites or when the problem occurs outside business hours. Even if a reboot is needed while someone is around, you still have to hope that the employee is savvy enough to reboot the right device.
For systems administrators, the ability to perform a power cycle or remote reboot is also a means of avoiding communication disasters. One solution is a remote power reboot switch, which can be controlled by the systems administrator to ensure correct booting sequences in the event of system failures.
Such a remote reboot power switch is controlled via ASCII commands. This means you can reboot with a standard external async modem or over the TCP/IP network using a terminal server, comm. server or local server with terminal software.
ASCII commands sent to the reboot switch can either query the current status, turn on/off or cycle (reboot) the AC power of any AC equipment attached to the switch. Since the reboot switch is controlled using these commands and standard modems, only terminal emulation software is required to dial the site and switch the power. Also, real-time communication with the reboot switch provides a response after each command has been accomplished.
Applications for the remote reboot switch can range from common scenarios to more complex ones. Centrally controlled WAN environments have a range of equipment, such as server routers and dial-up equipment that frequently lock up and require a reboot. More specialised scenarios involve satellite-controlled equipment at communication towers, cellular towers or radio equipment.
The units can switch any AC powered device. Heaters and air conditioners have been remotely turned on and off at unmanned sites to protect computer equipment.
Remote power reboot hardware is suitable for cluster management, where services are distributed across a number of computer systems.
For applications that require high amperage, heavy-duty/high-amperage reboot units have also been developed. The convenience of remote AC power control can be a welcome addition to your current network management strategy, and can also save time and expense of off-hours service calls.
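To show what controlling a reboot switch with ASCII commands over TCP/IP might look like, here is a minimal sketch. The command words (STATUS, ON, OFF, CYCLE), the carriage-return/line-feed terminator and the host and port are all hypothetical; each real switch documents its own command syntax, which should be used instead.

```python
import socket
from typing import Optional

# Hypothetical command words; a real switch defines its own syntax.
ACTIONS = ("status", "on", "off", "cycle")


def build_command(action: str, outlet: Optional[int] = None) -> bytes:
    """Build one ASCII command line in the assumed format."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action != "status" and outlet is None:
        raise ValueError("on, off and cycle need an outlet number")
    text = action.upper() if outlet is None else f"{action.upper()} {outlet}"
    return (text + "\r\n").encode("ascii")


def send_command(host: str, port: int, command: bytes,
                 timeout: float = 5.0) -> str:
    """Send a command over TCP and return up to 1024 bytes of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command)
        return conn.recv(1024).decode("ascii", errors="replace")


if __name__ == "__main__":
    print(build_command("cycle", 3))
    # send_command("192.0.2.10", 23, build_command("status"))  # example address
```

Reading the reply after each command mirrors the real-time response the article describes, so a script can confirm that an outlet actually changed state before moving on.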
The data centre of the next decade
Anthony Caruana, Editor, Technology Decisions
iStockphoto.com/Baris Simsek
We are in the middle of the third great revolution of technology delivery. The
first was the mainframe era - where computing power was centralised and end-
user devices were unintelligent terminals. Then came the PC era. Marked by
massive increases in computing power, the pendulum swung completely with
client computers having more power and servers being relegated in importance.
We are now in the third wave. Many services are centralised as data centres have increased in computing power and capacity while end users enjoy a massive variety in the types of equipment they can use and where they can use it. What does all this mean if we are planning a data centre strategy that will see us through the next decade?
The nature of business and what CIOs need to deliver to the business is changing at a pace that is impossible to react to. CIOs and decision makers are working at a time where business cycles are contracting and change is accelerating.
Robert Le Busque, Area VP Strategy and Development in Asia Pacific for Verizon, explains: "When the curtain was falling on the Beijing Olympics the iPad didn't exist, the iPhone was only an infant and the digital universe was five times smaller than it is today."
The way businesses look at delivering applications and other services must adapt. During the mainframe and client-server eras, the IT department had control of the technology supply chain from infrastructure to applications. According to Trevor A Bunker, a Vice President with CA Technologies, CIOs will be designing and managing data centres that bear little resemblance to those of today.
"The data centre of the future, from the CIO's view, will not include the infrastructure. The infrastructure will be completely decoupled. I don't think that when CIOs think about the data centre that they'll even concern themselves with the infrastructure."
This begs the question - what is a data centre?
In our view, the data centre is where applications, business communications and business logic reside. Typically, the data centre has also included
had a significant impact on prices but there's little doubt that large users of power will continue to see the bottom line being impacted. Although there's no universally agreed statement on what will happen to power prices, you can expect your bills to increase by between 5% and 15% per year over the next decade.
In addition, natural gas is a far more environmentally friendly, and therefore cheaper, fuel than coal or many other alternatives. As carbon emissions become a greater impost on the bottom line, being able to produce energy with lower carbon emissions can make a financial difference.
A recent trigeneration implementation by the National Australia Bank cost $6.5m but was expected to deliver $1m per annum in savings.
Power management is a key
Power management cannot be an afterthought in the next decade's data centre. It requires as much, if not more, planning and consideration than almost any other aspect of the data centre.
Power management is more than just worrying about the quality and reliability of supply and using low energy devices. In order to execute an effective power management strategy in the data centre, managers need to ensure that the supply to every device is as reliable and cost effective as possible.
For example, automatic transfer switches can automatically switch a device from the primary to a backup power source without interrupting operations. This is done by constantly monitoring the power quality. Furthermore, the right power management equipment will also support remote management of devices by supporting remote power on, off and system restarts in the event of a system becoming unreliable.
Finally, smart power management strategies can make it easy to detect which devices are busy and adjust their power use depending on the workload. For example, research from Carnegie Mellon University suggests that some significant savings in power use can be achieved by powering off servers that are not in use and then bringing them back up to
Issues of data sovereignty, confidentiality, reliability, connectivity and commercial arrangements dominate any discussion of cloud services. It's interesting that service providers are starting to take a more active role in our region with Rackspace opening a new data centre in Australia and making specific mention of how it won't be subject to the Patriot Act although there's considerable debate about the veracity of that statement.
Both IDC and Gartner have recently published research suggesting that a hybrid approach will be a viable option. So, it's likely that your data centre in 2025 will have some local services and some either externally hosted or delivered as cloud applications. The physical footprint of your premises will no longer bound your data centre.
What will your data centre look like in the next decade and beyond?
It will be denser with more computing power per square metre than today. But it will also require more power and generate more heat. You'll be a lot smarter about where you build the data centre - if you build one at all - and you'll probably start by looking at the energy and carbon footprint as closely as the physical specifications of the equipment.
You'll consider making it either energy self-sufficient or less dependent on power from the grid.
Where there's no competitive advantage or a clear cost benefit, you'll probably use cloud services where providers can deliver on your operational needs and energy management goals.
What is clear is that the days of companies building large rooms with raised floors, expensive temperature management and large capital investments are fading because the criteria for making the investment decisions are changing.
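The automatic transfer switch behaviour described under power management, in which supply quality is monitored and the load moved to a backup source when the primary degrades, can be sketched as a simple selection rule. This illustrates the idea only: the function name select_source, the 230 V nominal voltage and the 10% tolerance are assumptions for the example, and real transfer switches also consider frequency, waveform and transfer timing.

```python
def select_source(primary_volts: float, backup_volts: float,
                  nominal: float = 230.0, tolerance: float = 0.10) -> str:
    """Pick a supply based on voltage alone (assumed nominal and tolerance)."""

    def healthy(volts: float) -> bool:
        # A supply counts as healthy if it sits within the tolerance band.
        return abs(volts - nominal) <= nominal * tolerance

    if healthy(primary_volts):
        return "primary"  # prefer the primary whenever it is in range
    if healthy(backup_volts):
        return "backup"   # primary out of range, backup usable
    return "none"         # neither supply is within limits


if __name__ == "__main__":
    print(select_source(231.0, 229.0))
    print(select_source(180.0, 229.0))
```

Preferring the primary whenever it is healthy means the load returns to it automatically once the supply recovers.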
iStockphoto.com/Thomas Maier
Increasing uptime with improved environmental monitoring
Workers and customers, empowered by smartphones and widely available Wi-Fi
services, want and are demanding 24x7 access to email, company network
resources and websites. And thanks to today's global marketplace, even small
companies must support round-the-clock activities.
Unfortunately, IT system downtime remains a problem for companies of all sizes. A 2010 eWEEK article reporting on an industry study noted that North American businesses suffer an average of 10 hours of IT downtime annually. The article went on to note that this downtime costs small companies about $55,000 in revenue each year, while large companies lose about $1 million per year.
To avoid the problems that can cause downtime, companies need to closely observe server room environmental conditions and be alerted when problems arise. This is an area where ITWatchDogs environmental monitoring solutions can help.
Examining the causes of downtime
Several data centre environmental factors can contribute to or increase downtime and service disruptions.
Heat can be a killer. Extreme heat build-up can fry a server, knocking it offline and perhaps damaging it permanently. Even moderate heat build-up can have an impact. Equipment failure rate doubles for every increase of 10°C, according to studies done by the high-performance computing researchers at Los Alamos National Laboratory. Increased failure rate due to prolonged heating has also been noted by the Uptime Institute and others.
When it comes to monitoring temperature, it is not good enough simply to nail a thermostat to the wall. Since the temperature can vary drastically around different pieces of equipment, you should consider placing separate temperature probes within individual racks or critical devices. That way, problems with a broken fan or an air-conditioning failure will show up quickly. Similarly, you might be able to identify a server that is overheating due to it running an excessive workload.
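The rule of thumb quoted above, that failure rate doubles for every 10°C increase, can be turned into a rough multiplier for comparing rack temperatures. This sketch simply applies the stated doubling rule; the function name relative_failure_rate is an assumption, and the result is an indicative ratio rather than a prediction for any particular hardware.

```python
def relative_failure_rate(delta_c: float) -> float:
    """Failure-rate multiplier for a temperature rise of delta_c degrees C,
    applying the doubling-per-10-degrees rule of thumb quoted in the text."""
    return 2.0 ** (delta_c / 10.0)


if __name__ == "__main__":
    # Compare a rack running 5 and 10 degrees hotter than a reference rack.
    for rise in (5.0, 10.0):
        print(rise, relative_failure_rate(rise))
```

Under this rule a rack that runs 5°C hotter than its neighbour would see roughly 1.4 times the failure rate, which is one reason per-rack probes are more informative than a single room thermostat.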
The information is presented in a manner that allows quick inspection of current temperatures, as well as historical data to help spot heating pattern trends. Finally, all ITWatchDogs environmental monitors are capable of sending alerts via SNMP traps, email and SMS messages. Some devices can also trigger an external phone dialler to provide voice call alerts for up to nine phone numbers, when predefined thresholds are exceeded.
Other server-room environmental conditions can cause downtime problems and need comparable monitoring and alerting capabilities.
Humidity is another major threat. The reason: Humidity is the amount of water vapour in the air, and too much water vapour can form condensation on electronic components, leading to electrical shorts. If the humidity is too low, there is an increased chance of damage from electrostatic discharge. In either case, uncontrolled humidity can severely damage critical server components, causing the server to crash and shutting down access to applications and data.
Unfortunately, humidity is one of the trickiest environmental characteristics of a server room to measure and, as such, requires very close attention.
To measure humidity, most companies have focused on relative humidity. In fact, for years the guidelines followed were based on recommendations of the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). The group suggested that the relative humidity for computer rooms be within the 40 to 55% range. However, because relative humidity varies with temperature, ASHRAE now recommends that data centres measure absolute humidity, expressed as the dewpoint (it should fall within 5.5 to 15°C).
As was the case with temperature measurements, humidity can vary significantly within a data centre. So sensors must be placed throughout the room and server racks.
Water in a server room is never good news. Whether the source is a leaking or burst pipe, or a flood, water can easily shut down an entire organisation. Examples include:
A water main break in Texas took down the computer systems in the Dallas County Records Building. According to The Dallas Morning News, this [crippled] operations for almost the entire county government.
Rains flooded a T-Mobile data centre in the Pacific Northwest, taking down servers supporting the company's service activation portals and websites.
Water is usually detected using a cable that is run under an equipment room's raised floor. When water comes in contact with the cable, an alarm is triggered.
Proactive water monitoring should make use of sensors capable of detecting the presence of water over a large area so remedial action can be taken before it shorts out equipment.
A less frequent cause of downtime is fire and smoke. In 2008, a fire destroyed 75 servers, routers and switches in a Green Bay data centre, according to Data Center Knowledge. Smaller fires and smoke from equipment or frayed wires can trigger fire-suppression systems which, while much better today at safeguarding equipment, can still cause damage to IT equipment.
To detect fire and smoke requires more than traditional building smoke alarms. The problem is that when they sense smoke there may be no one around to hear them. What's needed is an alarm that connects to web-enabled environmental monitors. In this way, the smoke alarm works as it normally does, but its alert can now be sent via an SNMP trap, email, SMS and/or voice call to multiple IT staff members.
ITWatchDogs environmental monitors come equipped with various onboard sensors along with digital and analog inputs for external sensors, includ-
remote power monitoring and switching capabilities
to any ITWatchDogs environment monitors supporting
a digital sensor port. The add-on accessory presents
real-time logging and graphing of voltage, amper-
age, real power, apparent power, power factor and
kilowatt-hour to provide trend analysis and power
metrics for future planning. The device enables us-
ers to set alarm thresholds for these measurements
and it can remotely reboot locked systems or control
system power via the secure user interface.
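The relationship between the metrics listed above, together with the idea of alarm thresholds, can be illustrated with a short sketch. It relies on the standard definitions that apparent power is voltage multiplied by current and power factor is real power divided by apparent power. The class name PowerReading, the function breached and the sample limits are assumptions for the example and are not part of any ITWatchDogs interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PowerReading:
    volts: float         # RMS voltage
    amps: float          # RMS current
    real_power_w: float  # real power in watts

    @property
    def apparent_power_va(self) -> float:
        return self.volts * self.amps

    @property
    def power_factor(self) -> float:
        apparent = self.apparent_power_va
        return self.real_power_w / apparent if apparent else 0.0


def breached(reading: PowerReading,
             limits: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of metrics that fall outside their (low, high) limits."""
    alarms = []
    for name, (low, high) in limits.items():
        value = getattr(reading, name)
        if value < low or value > high:
            alarms.append(name)
    return alarms


if __name__ == "__main__":
    reading = PowerReading(volts=240.0, amps=5.0, real_power_w=1080.0)
    limits = {"volts": (220.0, 250.0), "amps": (0.0, 4.0)}  # hypothetical limits
    print(reading.power_factor, breached(reading, limits))
```

In this example the 5 A draw exceeds the assumed 4 A limit, so only the current alarm is raised; logging such readings over time would give the trend data the article mentions.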
Founded in 1989, Interworld Electronics offers a full range of highly reliable industrial rack mount
and embedded computer solutions, data acquisition and communications hardware, audio visual
distribution systems, call centre recording, data centre power and environmental monitoring equipment.
Our products and solutions can control manufacturing processes, collect data, control transportation
systems, monitor power and increase data centre efficiencies.
Interworld Electronics is not simply a component re-seller; we focus on helping our customers meet
their demands. We work together with you as our partner in developing and delivering a total solution
for you or your clients. Our commitment to quality has attracted some of the best known public and
private sector companies in industries as diverse as telecommunications, mining, petrochemicals,
pharmaceuticals, defence, medical, transportation, call centres and data warehousing.