
Datacenter Design Reference Guide

Part 4 of 4
DC Cabling, Physical
Security, and
Management Design

Produced by:
Hashim Ahmed Almansoor
Senior IT / Datacenter Consultant
Email: [email protected]
Mobile: +967 771 600 555
Sana’a - Yemen

Issued by:
KUN Center for Business Development Solutions & Services
Email: [email protected]
Office Phone: +967 1 422 999
Sana’a - Yemen
Table of Contents
Table of Contents 2
Datacenter Cabling Infrastructure 5
INTRODUCTION 5
TOP OF MIND ISSUES 5
THE THREE PRINCIPLES OF DATA CENTER DESIGN 6
PRINCIPLE 1: SPACE SAVINGS 6
PRINCIPLE 2: RELIABILITY 6
PRINCIPLE 3: MANAGEABILITY 7
CHOOSING THE RIGHT MIX OF EQUIPMENT 8
COPPER CABLING 8
FIBER OPTIC CABLING 8
DEPLOYMENT PRODUCTS 10
ADDITIONAL CONSIDERATIONS 10
AVOID COSTLY DOWNTIME AND PREPARE FOR THE FUTURE 11
BEST PRACTICES FOR DATA CENTER CABLING DESIGN 12
GROUNDING FOR SCREENED AND SHIELDED NETWORK CABLING 18
WHY BOND AND GROUND? 18
S/FTP AND F/UTP VS. UTP - HOW DOES THE NEED TO GROUND AFFECT INSTALLATION PRACTICES? 19
WHERE FROM HERE? 20

DC – Fire & Physical Security Infrastructure 22


DC FIRE PROTECTION & SUPPRESSION SYSTEMS 22
INTRODUCTION 22
DATA CENTER DESIGN STANDARDS – NFPA CODES 22
CLASSIFICATION OF FIRES 23
CHOOSING A FIRE PROTECTION SOLUTION 24
FIRE DETECTION SYSTEM TYPES 24
FIRE SUPPRESSION SYSTEM TYPES 27
PULL STATIONS AND SIGNALING DEVICES 31
CONTROL SYSTEMS 32
MISSION CRITICAL FACILITIES 33
INDUSTRY BEST PRACTICES 34
COMMON MISTAKES 35
CONCLUSION 36
PHYSICAL SECURITY IN MISSION CRITICAL FACILITIES 37
INTRODUCTION 37
DEFINING THE PROBLEM 38

APPLYING THE TECHNOLOGY 40
ACCESS CONTROL DEVICES 42
OTHER SECURITY SYSTEM ELEMENTS 45
THE HUMAN ELEMENT 47
CHOOSING THE RIGHT SOLUTION: RISK TOLERANCE VS. COST 47
CONCLUSION 49

Datacenter Monitoring & Management 50


MONITORING PHYSICAL THREATS IN THE DATA CENTER 50
INTRODUCTION 50
WHAT ARE DISTRIBUTED PHYSICAL THREATS? 50
SENSOR PLACEMENT 53
AGGREGATING SENSOR DATA 56
“INTELLIGENT” ACTION 56
DESIGN METHOD 59
SAMPLE SENSOR LAYOUT 60
CONCLUSION 60
DATA CENTER COMMISSIONING 61
INTRODUCTION 61
DEFINITION OF COMMISSIONING 61
OUTPUTS OF COMMISSIONING 62
INPUTS TO COMMISSIONING 65
COMMISSIONING PROCESS 67
TOOLS 71
CONCLUSION 75
PREVENTIVE MAINTENANCE STRATEGY FOR DATA CENTERS 76
INTRODUCTION 76
PM OUTCOMES 78
EVOLUTION OF PM 78
EVIDENCE OF PM PROGRESS 79
WHY PHYSICAL INFRASTRUCTURE COMPONENTS FAIL 81
RECOMMENDED PRACTICES 82
PM OPTIONS 86
CONCLUSION 89

Datacenter Cabling Infrastructure
Introduction:
Data centers are at the core of business activity, and the growing transmission speed and
density of active data center equipment is placing ever-increasing demands on the physical
layer. Enterprises are experiencing enormous growth rates in the volume of data being moved
and stored across the network. The deployment of high-density blade servers and storage
devices in the data center to handle these workloads has resulted in spiraling rates of power
consumption and heat generation.

The implementation of a robust, integrated infrastructure to handle these demands and support future data center growth is now more critical than ever. This white paper shows how
business priorities can be balanced with power, cooling, and structured cabling practicalities
to develop an integrated comprehensive data center support system. This “capacity
planning” process optimizes network investment by ensuring reliable performance now
and the flexibility to scale up for future business and technology requirements.

Top of Mind Issues


Based on PANDUIT Labs’ research on data centers, the following issues emerge repeatedly as critical to the strategic planning process for both new builds and upgrades. Therefore, facilities and IT managers should keep them in mind from start to finish on any data center project:

• Capacity Planning: Decisions regarding data center design and future growth increasingly center on
power, cooling, and space management. The collective awareness of these issues is defined as
“capacity planning”. The effective deployment and management of these core resources allows the
data center to operate efficiently and scale up as required.

• Reliability: A reliable infrastructure is comprised of adequate power and cooling capacity; effective
bonding and grounding of system elements; and pathways that protect, route and manage the
structured cabling. By using robust systems comprised of quality components and materials, you can
minimize network interruptions and maximize uptime and business continuity.

• Budget: The high cost of operating a data center is a reality in today’s competitive business world.
Facilities managers have responsibility for a substantial portion of the annual data center operating
costs. Effective deployment of facilities infrastructure resources is directly connected to annual cost
savings and lowest total cost of ownership (TCO).

• Aesthetics: Traditionally the focus of the facilities manager has been, “Is it in place and functional?”
However, the data center represents a very high financial investment, with value residing in both
functionality and aesthetics. Today’s data centers have become showcase areas to demonstrate to
customers a visually appealing reflection of the company image. In this sense, facilities managers are
expected to maintain an infrastructure that is highly professional in appearance.

The Three Principles of Data Center Design

When you understand the three principles of data center design, you are able to:
• Lower your total cost of ownership
• Support your future growth plans
• Reduce your risk of downtime
• Maximize performance
• Improve your ability to reconfigure

Cabling and connectivity components of the data center infrastructure can have a direct impact on the amount of real estate required in your data center. High-density solutions require less rack, floor, and pathway space, which leaves more room for flexible reconfiguration and growth.

Principle 1: Space Savings

Environmentally controlled real estate is expensive. The cost of building a data center is more than $1,000 per square foot in some cases (see Table 1). Data center racks and equipment can take up an enormous amount of real estate, and the future demand for more network connections, bandwidth, and storage may require even more space. With insufficient floor space as the topmost concern among IT managers today, maximizing space resources is the most critical aspect of data center design.

Reliability Tier               Tier I     Tier II    Tier III   Tier IV
Construction Cost/Square Ft    $450       $600       $900       $1,100
Annual Downtime (Hours)        28.8       22         1.6        0.4
Site Availability              99.671%    99.749%    99.982%    99.995%
Source: Uptime Institute
Table 1. Construction Cost per Square Foot

Business environments are constantly evolving, and as a result, data center requirements continuously change. Providing plenty of empty floor space when designing your data center enables the flexibility of reallocating space to a particular function, and adding new racks and equipment as needed.

As connections, bandwidth, and storage requirements grow, so does the amount of data center cabling connecting key functional areas and equipment. Ample overhead and underfloor cable pathways, as well as abundant trough space, are also necessary for future growth and manageability.

Reducing existing data center space is likely the most expensive and disruptive problem your organization can face. Expanding the physical space of a data center can cost more than the original data center build itself, requiring construction, movement of people and equipment, recabling, and downtime. Given these consequences, properly designing the data center for space savings at the start is essential.

Figure 1. Data Center with Flexible White Space

Principle 2: Reliability

Uninterrupted service and continuous access are critical to the daily operation and productivity of your business. With downtime translating directly to loss of income, data centers must be designed for redundant, fail-safe reliability and availability. Depending on the business, downtime can cost anywhere from $50K to over $6 million per hour (see Figure 2).

Figure 2. Financial Impact of Network Downtime per Business (ATM fees, airline reservations, online sales, credit authorization, and brokerage operations, in $M per hour). Source: Strategic Solutions

Data center reliability is also defined by the performance of the infrastructure. As information is sent back and forth within your facility and with the outside world, huge streams of data are transferred to and from equipment areas at extremely high data rates. The infrastructure must consistently support the flow of data without errors that cause retransmission and delays. Cabling and connectivity backed by a reputable vendor with guaranteed error-free performance help avoid poor transmission within the data center. A substandard performing data center can be just as costly and disruptive to your business as downtime.
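The annual downtime figures in Table 1 follow directly from the tier availability percentages. A minimal sketch of that arithmetic, assuming an 8,760-hour year:

```python
# Convert tier availability percentages (Table 1) into expected annual downtime hours.
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumes a non-leap year

tier_availability = {
    "Tier I": 99.671,
    "Tier II": 99.749,
    "Tier III": 99.982,
    "Tier IV": 99.995,
}

for tier, pct in tier_availability.items():
    downtime_hours = (1 - pct / 100) * HOURS_PER_YEAR
    print(f"{tier}: {downtime_hours:.1f} hours of downtime per year")

# Output is close to the Table 1 figures:
# Tier I: 28.8, Tier II: 22.0, Tier III: 1.6, Tier IV: 0.4
```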

As networks expand and bandwidth demands increase, the data center infrastructure must be able to maintain constant reliability and performance. The cabling itself should support current bandwidth needs while enabling anticipated migration to higher network speeds without sacrificing performance. In fact, the data center infrastructure should be designed and implemented to outlast the applications and equipment it supports by at least 10 to 15 years. (Note that most active equipment is replaced every three to five years.)

The protection of cabling and connections is a key factor in ensuring data center reliability. Components that maintain proper bend radius throughout cable routing paths are critical to that protection. When cabling is bent beyond its specified minimum bend radius, it can cause transmission failures, and as more cables are added to a routing path, the possibility of bend radius violation increases (see Figure 3). Pathways must maintain proper bend radius at all points where the cable makes a bend, both at initial installation and when cables are accessed or added. The separation of cable types in horizontal pathways and physical protection of both cable and connections should also be implemented to prevent possible damage. (A rough bend-radius check appears at the end of this section.)

Figure 3. Care must be taken to avoid violating minimum bend radius rules when adding fibers: a fiber patch cord that maintains proper radius at initial installation can end up violating the minimum bend radius after future installations.

Principle 3: Manageability

Manageability is key to optimizing your data center. The infrastructure should be designed as a highly reliable and flexible utility to accommodate disaster recovery, upgrades, and modifications. Manageability starts with strategic, unified cable management that keeps cabling and connections properly stored and organized, easy to locate and access, and simple to reconfigure.

Cable routing paths must be clearly defined and intuitive to follow while enabling easy deployment, separation, access, reduced congestion, and room for growth. This is especially important in data centers with large volumes of cables. Cables managed in this way improve network reliability by reducing the possibility of cable damage, bend radius violations, and the time required for identifying, routing, and rerouting cables.

The use of a central patching location in a cross-connect scenario provides a logical and easy-to-manage infrastructure whereby all network elements have permanent equipment cable connections that, once terminated, are never handled again. In this scenario, all modifications, rerouting, upgrades, and maintenance activities are accomplished using semi-permanent patch cord connections on the front of the cross-connect systems (see Figure 4).

Figure 4. Interconnection vs. Cross-Connection

The advantages of deploying centralized patching in your data center include:
• Lower operating costs by greatly reducing the time it takes for modifications, upgrades, and maintenance.
• Enhanced reliability by making changes on the patching field rather than moving sensitive equipment connections.
• Reduced risk of downtime with the ability to isolate network segments for troubleshooting and quickly reroute circuits in a disaster recovery situation.

Deploying common rack frames with ample vertical and horizontal cable management simplifies rack assembly, organizes cable, facilitates cable routing, and keeps equipment cool by removing obstacles to air movement. Cable management at the rack also protects the bend radius and manages cable slack efficiently. Connectors must also be easily identified and accessed for maintenance or reconfiguration with minimal disruption to adjacent connections.
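As a rough illustration of the bend-radius checks described above, the sketch below flags cables routed tighter than a chosen minimum-bend-radius multiple of their outside diameter. The 4x and 10x multipliers are common rules of thumb used here as assumptions, not values from this guide; always use the radius specified by the cable manufacturer.

```python
# Hedged sketch: flag bend-radius violations along a cable routing path.
# The multipliers below are illustrative rules of thumb, not vendor specifications.
MIN_RADIUS_MULTIPLIER = {"copper_utp": 4.0, "fiber_mm": 10.0}

def check_bend(cable_type: str, outside_diameter_mm: float, bend_radius_mm: float) -> bool:
    """Return True if the routed bend radius meets or exceeds the assumed minimum."""
    required = MIN_RADIUS_MULTIPLIER[cable_type] * outside_diameter_mm
    return bend_radius_mm >= required

# Example: a 6.0 mm copper cable routed around a 20 mm radius corner
print(check_bend("copper_utp", 6.0, 20.0))   # False: needs at least 24 mm
print(check_bend("fiber_mm", 2.0, 30.0))     # True: 20 mm required, 30 mm provided
```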

Choosing the Right Mix of Equipment

Since the total spend for network infrastructure equipment is but a fraction of the entire data center cost, decisions for the so-called physical layer are often taken lightly. But the fact remains that 70% of all network downtime is attributed to the physical layer, specifically cabling faults.

When selecting fiber and copper cable, connectivity, and cable management solutions for the data center, it's important to choose products and services that satisfy the three principles outlined here. A data center infrastructure without components that ensure space savings, reliability, and manageability discounts the goal of optimizing the data center.

ADC's copper and fiber cable, connectivity, and cable management solutions come together to provide a comprehensive data center infrastructure solution that lowers total cost of ownership, enables future growth, and reduces risk of downtime:
• High-density copper and fiber solutions that take up less rack, floor, and pathway space.
• Guaranteed performance for reliable transmission and availability.
• Advanced cabling solutions ideal for today and for migrating to 10 Gigabit Ethernet tomorrow.
• Cable management solutions that protect cable and connections while offering easy identification, accessibility, and reconfiguration.

Copper Cabling

For years, copper UTP solutions have been the preferred network medium for horizontal cabling due to their cost-effective electronics, familiar plug-and-play connections, and easy installation. Data center horizontal cabling is no exception. As businesses evolve and data center needs grow, transmission speeds have migrated to accommodate huge streams of data being transferred back and forth between network equipment and servers.

ADC's TrueNet® CopperTen™ Augmented Category 6 cabling provides a copper system with the necessary characteristics to enable 10 Gbps Ethernet transmission over a full 100 meters. Additionally, CopperTen is backwards compatible to allow seamless migration from existing Gigabit Ethernet devices to 10 Gbps Ethernet in the future. For larger data centers where distances reach beyond 55 meters and transmission speeds are anticipated to reach beyond Gigabit Ethernet, CopperTen provides peace of mind that the cabling will support equipment and applications for many years. With CopperTen in the data center, you won't have to worry about recabling down the road.

High-performing Category 6 cabling solutions also have their place in the data center, with the ability to support advanced applications, including 10 Gigabit Ethernet to a limited distance of 55 meters. ADC's TrueNet® Category 6 cable offers best-in-class performance with extra bandwidth and guaranteed zero bit errors. In the data center, the smaller diameter of TrueNet Category 6 cable saves as much as 2% of available cable pathway space. TrueNet Category 6 cable offers valuable data center space savings, reliable performance, and easier cable management.

Fiber Optic Cabling

With the same percentage of terminations as copper, fiber optic cabling and connectivity is a significant part of the data center. Fiber links are also the most critical links because they carry data to and from a large number of sources, including outlying telecommunication rooms and the outside world. Some data center designers tend to underestimate fiber optic cabling requirements, believing that a few strands will suffice for current and future needs. But emerging technologies continue to be layered onto the network, and because fiber optic cabling is backward, not forward, compatible, designers should choose the fiber type capable of supporting all the current and future applications in the data center.

Laser-optimized 50µm multimode fiber was created as the industry moved to 10 Gbps Ethernet. Traditional LED signaling technology could not support the higher speeds, and a shift was made to cost-effective short-wavelength (850nm) Vertical Cavity Surface Emitting Lasers (VCSELs). However, standard multimode fiber is not optimized for VCSELs, and as a result cannot support the necessary distances. By removing the impurities and carefully grading the index of refraction of the core of multimode fibers, laser-optimized 50µm multimode fiber can achieve 10 Gbps Ethernet to 550 meters (see Figure 5).

Figure 5. Standard Multimode Fiber vs. Laser Optimized Multimode Fiber: laser-optimized MM fibers control DMD to support 10 Gb/s up to 300 or 550 meters with low-cost 850 nm serial applications.

10 Gbps speeds are a reality today; maximizing your investment in your data center infrastructure cabling requires laser-optimized 50µm multimode fiber. This fiber type will support bandwidth requirements for the future and ensure reliability of your data center for many years to come. The cost difference between fiber types is minor when you consider total investment, and laser-optimized 50µm multimode fiber won't confine your infrastructure to length limitations. Because mixing and matching fiber types is not recommended, it makes sense to choose one fiber type that can reach all areas of your LAN and provide the most flexibility and future growth for your data center.

Copper Cable Management Systems

A good cable management system should save space by allowing you to maximize the number of cables and connections in a given footprint. It should also ensure reliability by protecting cable and connections, and offer manageability through easy identification, access, and reconfiguration. Because the use of a central patching location in a cross-connect scenario provides a logical and easy-to-manage infrastructure that helps manage growth without disrupting service, data center cable management systems must also easily and efficiently accommodate this scenario.

ADC's Ethernet Distribution Frame (EDF) forms a central patching location between active Ethernet network elements. By creating a centralized interface for Ethernet equipment, the EDF enhances data center manageability by enabling quick modifications and reconfigurations without service disruptions. With the EDF, permanent connections protect equipment cables from daily activity that can damage the cables. The EDF cross-connect also scales easily for adding new technologies, and its high-density interface maximizes active ports and conserves valuable floor space.

At the heart of the EDF is ADC's Glide Cable Management system, which consists of troughs that bolt to the side of the frames and provide integrated front, rear, horizontal, and vertical management, eliminating the need for horizontal cable managers that take up valuable rack space. The Glide Cable Management system effectively doubles rack density for data center space savings, maintains reliability through built-in bend radius protection, and offers better manageability by organizing cables for easy reconfiguration and effectively storing cable slack.

Fiber Cable Management Systems

Fiber is a critical component of the data center, but it is also a delicate medium. Fiber cable management must protect fiber at all times for reliability while providing space savings and manageability. ADC's Next Generation Frame (NGF) product line meets these objectives while promoting unlimited growth.

Ideally used in a cross-connect scenario, ADC's Next Generation Frame is a fiber distribution frame that allows you to implement the maximum number of fibers in a given space. This high-density solution optimizes reliability and manageability through a balance of density, protection, and functionality. Ample trough space reduces fiber congestion and potential damage while enabling growth. Complete bend radius protection reduces attenuation of the signal and maintains consistent, long-term fiber performance. Easy front and rear connector access, built-in jumper slack storage, and intelligent routing paths provide easy identification, tracing, and maintenance.

Within the data center, it's critical that fiber jumpers not only be protected at the fiber distribution frame, but also within the pathways going to and from the fiber frame. ADC's FiberGuide® Management System physically separates, protects, and routes fiber while ensuring that a two-inch minimum bend radius is maintained throughout, even as more cables are added in the future. The system is extremely flexible, making data center fiber routing simple and reducing installation time without sacrificing durability. The FiberGuide's covers protect fiber from accidental damage while enabling easy access for rerouting and reconfiguration. Whether it's for jumpers going from frame to frame in a high-density situation or jumpers to server cabinets, in any data center cross-connect scenario, the FiberGuide can provide ample pathway space, fiber protection for reliability, and simplified manageability.

Deployment Products

A sample data center built with ADC's comprehensive line of data center-grade infrastructure products would include:

Fiber Distribution Boxes – Wall Mount
• WMG
• FL2000
• Building Entrance Terminals
• RMG

Fiber Panels, Term, Splice, Storage – Rack Mount
• FL2000
• Fiber Management Tray (FMT)
• FPL

Copper Patch Panels and Blocks
• CopperTen Augmented Category 6
• Category 6

Copper/Fiber Rack Systems
• Glide Cable Management System

Fiber Raceways
• FiberGuide
• FiberGuide, plenum
• RiserGuide

Copper Cable – Riser and Plenum
• CopperTen Augmented Category 6
• Category 6

Fiber Cable – Riser and Plenum
• Singlemode
• Multimode 50 and 62.5µm
• Laser-Optimized
• Armored

Fiber Distribution Frames
• Next Generation Frame (NGF)

Additional Considerations

Standards

The TIA-942 Telecommunications Infrastructure Standard for Data Centers was published in 2005 and specifies requirements and guidelines for data center infrastructures. The standard covers cabling distances, pathways, site selection, space, and layout, and is a valuable tool in designing your data center infrastructure. TIA-942 specifies the following key functional areas in the data center:

• One or more entrance rooms – houses carrier equipment and the demarcation point
• A main distribution area (MDA) – houses the data center's main cross-connect
• One or more horizontal distribution areas (HDA) – houses horizontal cross-connects and is the distribution point for cabling to the equipment distribution areas
• A Zone Distribution Area (ZDA) – a structured cabling area for floor-standing equipment that cannot accept patch panels
• An Equipment Distribution Area (EDA) – houses equipment racks and cabinets in a hot aisle/cold aisle configuration to dissipate heat from electronics

To help you evaluate the required reliability of your data center, TIA-942 also provides a tier classification with specified availability and guidelines for equipment, power, cooling, and redundancy.

Cooling

Servers and equipment are getting smaller and more powerful to accommodate the need for high-density data center installations. However, this concentrates an enormous amount of heat into a smaller area. Adequate cooling equipment is a must, as is the use of a hot aisle/cold aisle configuration, where equipment racks are arranged in alternating rows of hot and cold aisles. This practice, which has met wide industry acceptance,

allows cold air from the cold aisle to wash over the equipment, where it is then expelled out the back into the hot aisle.

Power Requirements

Electricity is the lifeblood of a data center. A power interruption for even a fraction of a second is enough to cause a server failure. The measures you employ to prevent disruptions should be based on the level of reliability required. Common practices include:

• Two or more power feeds from the utility company
• Uninterruptible power supplies
• Multiple circuits to systems and equipment
• On-site generators

It's important to properly estimate your power requirements based on the devices currently in use in the data center, as well as the number of devices you anticipate needing in the future to accommodate growth. Power requirements for support equipment should also be included (see the sizing sketch at the end of this section).

Avoid Costly Downtime and Prepare for the Future

Businesses can optimize their data centers by selecting data center infrastructure solutions that work together. By recognizing the value of the data center infrastructure and its components, you can ensure that employees and customers have access to the servers, storage systems, and networking devices they need to carry out daily business transactions and remain productive.

Avoiding costly downtime, preparing for the future, and lowering total cost of ownership with space savings, reliable performance, and effective manageability is the ultimate means to a thriving data center and an overall successful business.
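As a rough illustration of the power-estimation step described under Power Requirements above, the sketch below totals present-day device loads, applies a growth allowance, and adds an overhead factor for support equipment. The device list, growth rate, and overhead factor are illustrative assumptions, not figures from this guide.

```python
# Hedged power-sizing sketch: sum current IT loads, allow for growth,
# and add an assumed overhead for support equipment (cooling, UPS losses, lighting).

current_loads_watts = {        # illustrative inventory, not from this guide
    "blade_chassis": 4 * 4500,
    "rack_servers": 60 * 350,
    "san_storage": 2 * 3000,
    "network_switches": 8 * 800,
}

GROWTH_FACTOR = 1.30           # assume 30% more IT load over the planning horizon
SUPPORT_OVERHEAD = 1.8         # assume support equipment nearly doubles the IT load

it_load_w = sum(current_loads_watts.values())
planned_it_load_w = it_load_w * GROWTH_FACTOR
total_facility_w = planned_it_load_w * SUPPORT_OVERHEAD

print(f"Current IT load:   {it_load_w / 1000:.1f} kW")
print(f"Planned IT load:   {planned_it_load_w / 1000:.1f} kW")
print(f"Facility estimate: {total_facility_w / 1000:.1f} kW")
```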

Best Practices for Data Center Cabling Design
When designing and laying out a data center, understanding best practices as well as the pros and cons for each type of data center is critical. The TIA-942 data center guidelines are very specific that horizontal and vertical cabling should be run to accommodate growth, so that these areas do not have to be revisited. They are also specific about equipment not being directly connected unless it is specifically required by the manufacturer. This is in line with other standards documents, such as ANSI/TIA/EIA-568-B, that design for an open systems architecture. So the question is raised: what is the best way to do this for a 10 Gb/s environment?

There are considerations outside of the cable plant and number of connectors alone: usability, scalability, costs
and the ability to perform Moves, Adds and Changes (MAC’s). Additionally, some limitations exist based on the
category of the cabling system. Copper and fiber distances may vary with the type of cabling system
selected. We will discuss some of those parameters and their potential impact on data center designs.

All copper channels are based on a worst case, 100 meter, 4 connector model. ISO/IEC 24764 (draft), TIA-942,
ISO/IEC 11801 Ed2.0 and recommendations from electronics manufacturers suggest that the fixed horizontal
portion of the channel be a minimum of 15m (50 ft.). While some shorter lengths may be supported in other
portions of the channels, there is a requirement in zone distribution and consolidation points for this minimum
distance. When moving to 10Gb/s electronics, the 15m minimum will likely exist for all horizontal cables due to
recommendations from electronics manufacturers and the fact that all models within IEEE are based on a
minimum 15m distance.

The 15m length is also dictated by signal strength issues, as your signal is strongest in those first 15m, which can create issues with two connectors in close proximity. By providing at least 15m to the first connection point in the channel, you allow attenuation to reduce the signal strength at the receiver or between components. To achieve the 15m distance, two options exist: either provide space in the pathway to take up the distance, or create service loops under the floor. Service loops should not be an actual loop, but rather a loosely configured figure 8 for UTP systems; this configuration is not a requirement for F/UTP or S/FTP systems. Bear in mind that the additional cable will consume more pathway space.
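A minimal sketch of the channel-length checks implied above, assuming the 100 m worst-case channel and the 15 m minimum fixed horizontal segment cited in this section; the function name is illustrative.

```python
# Hedged sketch: validate a copper channel against the 100 m worst-case model
# and the 15 m minimum fixed-horizontal recommendation discussed above.

MAX_CHANNEL_M = 100.0
MIN_FIXED_HORIZONTAL_M = 15.0

def validate_channel(fixed_horizontal_m: float, patch_cords_m: float) -> list[str]:
    """Return a list of rule violations for a proposed copper channel."""
    problems = []
    if fixed_horizontal_m < MIN_FIXED_HORIZONTAL_M:
        problems.append(f"fixed horizontal {fixed_horizontal_m} m is below the 15 m minimum")
    if fixed_horizontal_m + patch_cords_m > MAX_CHANNEL_M:
        problems.append("total channel length exceeds the 100 m worst-case model")
    return problems

print(validate_channel(fixed_horizontal_m=12.0, patch_cords_m=8.0))
# ['fixed horizontal 12.0 m is below the 15 m minimum']
print(validate_channel(fixed_horizontal_m=85.0, patch_cords_m=10.0))
# []
```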

Copper distances for category 6A twisted-pair cabling are limited to 100m for all channels, with the exception of 10GBASE-T running on category 6/class E cabling. The distance for these channels will be limited to less than 37m, depending upon the scope of potential mitigation practices to control alien crosstalk. It should be noted that the purpose of TSB-155 is to provide parameters for the qualification of existing Cat 6/Class E cabling for use with 10GBASE-T; TSB-155 should not be used for designing new installations.

Fiber channel lengths vary based on the grade and type of fiber and type of interface. Understanding these
limitations will assist in the design and layout of the data center space. If you are utilizing 10GBASE-CX4 or
Infiniband, you are distance limited to a maximum of 15m. The following chart summarizes the distances for all
10G applications and their associated cabling systems.

Application     Media                 Classification               Max. Distance   Wavelength
10GBASE-T       Twisted Pair Copper   Category 6/Class E UTP       up to 55m*      -
10GBASE-T       Twisted Pair Copper   Category 6A/Class EA UTP     100m            -
10GBASE-T       Twisted Pair Copper   Category 6A/Class EA F/UTP   100m            -
10GBASE-T       Twisted Pair Copper   Class F/Class FA             100m            -
10GBASE-CX4     Manufactured          N/A                          10-15m          -
10GBASE-SR      62.5µm MMF            160/500                      28m             850nm
10GBASE-SR      62.5µm MMF            200/500                      28m             850nm
10GBASE-SR      50µm MMF              500/500                      86m             850nm
10GBASE-SR      50µm MMF              2000/500                     300m            850nm
10GBASE-LR      SMF                   -                            10km            1310nm
10GBASE-ER      SMF                   -                            40km            1550nm
10GBASE-LRM     All MMF               -                            220m            1300nm
10GBASE-LX4     All MMF               -                            300m            1310nm
10GBASE-LX4     SMF                   -                            10km            1310nm
* As defined in IEEE 802.3an
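The table above can be captured as a small lookup, which is handy when sanity-checking planned run lengths during layout. A minimal sketch; media/classification keys are shortened for readability, and the distances are the ones listed above.

```python
# Maximum 10G reach per application and cabling type, taken from the table above.
MAX_DISTANCE_M = {
    ("10GBASE-T", "Cat 6/Class E UTP"): 55,       # as defined in IEEE 802.3an
    ("10GBASE-T", "Cat 6A/Class EA UTP"): 100,
    ("10GBASE-T", "Cat 6A/Class EA F/UTP"): 100,
    ("10GBASE-T", "Class F/Class FA"): 100,
    ("10GBASE-CX4", "manufactured assembly"): 15,
    ("10GBASE-SR", "62.5um MMF 160/500"): 28,
    ("10GBASE-SR", "62.5um MMF 200/500"): 28,
    ("10GBASE-SR", "50um MMF 500/500"): 86,
    ("10GBASE-SR", "50um MMF 2000/500"): 300,
    ("10GBASE-LR", "SMF"): 10_000,
    ("10GBASE-ER", "SMF"): 40_000,
    ("10GBASE-LRM", "MMF"): 220,
    ("10GBASE-LX4", "MMF"): 300,
    ("10GBASE-LX4", "SMF"): 10_000,
}

def link_supported(application: str, media: str, run_length_m: float) -> bool:
    """True if the planned run length fits within the listed maximum distance."""
    return run_length_m <= MAX_DISTANCE_M[(application, media)]

print(link_supported("10GBASE-SR", "50um MMF 2000/500", 275))  # True
print(link_supported("10GBASE-T", "Cat 6/Class E UTP", 70))    # False
```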

THE LAYOUT... WHERE AND HOW TO CONNECT

When designing a cabling infrastructure, cost is often the deciding characteristic of the channel selected. However, once all elements are considered, a design with a higher initial cost may have a lower overall cost of ownership for a company that has a lot of MAC activity. The most important concern is that designers are familiar with all aspects of the different configurations available, so they can make the best selection possible. A comparison of cost, flexibility, and performance is shown below.

Model Cost Flexibility Performance


2-Connector Lowest Lowest Highest
3-Connector with CP Medium Medium Medium
3-Connector with CC Medium Medium Medium
4-Connector Highest Highest Lowest

SPACE PLANNING OPTIONS

The MDA (Main Distribution Area) is considered the core of the data center; connectivity will be needed to support the HDA (Horizontal Distribution Area). Following TIA-942 recommendations and utilizing EDAs (Equipment Distribution Areas) and ZDAs (Zone Distribution Areas), we would like to present four design options for consideration.

OPTION ONE

Option One is to run all fibers and copper from the core horizontal distribution areas and equipment
distribution areas to a central patching area. This provides one central area for patching all channels.

There are several benefits to this design. First, all cabinets can remain locked. As patching is done in a central area, there is no need to enter a cabinet at any time unless there is an actual hardware change. For industries that are governed by compliance and security related issues, this may provide a greater benefit by reducing physical access to connections. Intelligent patching can be added to the patching field to increase security by automatically monitoring and tracking moves, adds, and changes in that environment.

Option 1 provides any-to-all connectivity. (Figure: the MDA core and SAN switches and the HDA LAN switches are cabled to central fiber and copper patching areas, which in turn serve the EDA server cabinets. Patch cord changes in the patching area can connect any device to any device.)

Another advantage is that all ports purchased for active gear can be utilized. With the ability to use VLANs, networks can be segmented as needed. In other scenarios, entire switch blades are likely dedicated to a cabinet of servers. However, if there are insufficient server NICs to utilize all ports, the idle ports become a costly inefficiency. For instance, if a 48-port blade were dedicated to a cabinet at location XY12, but there were only 6 servers with two connections each, then 36 ports were paid for, and maintenance is being paid on those ports while they remain idle. By utilizing a central patching field, the additional 36 ports can be used as needed elsewhere in the network, thereby lowering equipment and maintenance costs, which are far more expensive than the cable channels.
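The stranded-port arithmetic in the example above generalizes easily. A minimal sketch; the per-port cost figures are illustrative placeholders, not values from this guide.

```python
# Stranded ports in a dedicated-blade design vs. the Option One central patching field.
blade_ports = 48
servers_in_cabinet = 6
connections_per_server = 2

used_ports = servers_in_cabinet * connections_per_server        # 12
stranded_ports = blade_ports - used_ports                       # 36, idle but paid for

# Illustrative cost assumptions (not from this guide):
port_capex = 250          # purchase cost per switch port
port_annual_maint = 40    # annual maintenance per port

print(f"Stranded ports: {stranded_ports}")
print(f"Stranded capital: ${stranded_ports * port_capex:,}")
print(f"Stranded maintenance per year: ${stranded_ports * port_annual_maint:,}")
# With a central patching field, those 36 ports could instead be patched
# to servers elsewhere in the data center as demand appears.
```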

OPTION TWO
Option Two is to place patch panels in server cabinets that correspond directly to their counterparts in
the switch cabinets. In this scenario, switch blades/ports will be dedicated to server cabinets. This may be
easier from a networking perspective, but may not provide the best usage of all ports in the active
electronics. Extra ports can be used as spares or simply for future growth. However, if an enterprise is
planning to implement blade technology where server density may decrease per cabinet, this may not be a
cost effective option.

For the switch cabinets, the type of copper cabling chosen will be a significant factor due to the increased UTP cable diameters required to support 10GBASE-T. In reality, cabinets and cabling (both copper and fiber) are changed far less frequently than the active electronics. But with the new category 6A UTP cable's maximum diameter of 9.1mm (0.354 in.), pathways within the cabinets may not provide enough room to route cable and still provide the necessary structural stability. It is always recommended that percent-fill calculations be addressed with the cabinet manufacturer (a rough fill sketch follows this paragraph). Moving the patch panels to adjacent locations or implementing a lower switch density may be required. While moving switches into open racks with adjacent patch panels provides a solution, this is only recommended if proper access security processes exist and some form of intelligent patching or other monitoring system is used, so that network administrators can be notified immediately of any attempt to access switch ports.
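As a rough illustration of the percent-fill check recommended above, the sketch below compares the cross-sectional area of a bundle of 9.1 mm category 6A UTP cables against a pathway cross-section. The 40% fill ceiling is a common rule of thumb used here as an assumption; confirm the actual limit with the cabinet or pathway manufacturer.

```python
import math

# Hedged percent-fill sketch for a cable pathway carrying 9.1 mm Cat 6A UTP.
CABLE_OD_MM = 9.1           # maximum category 6A UTP diameter cited above
MAX_FILL_RATIO = 0.40       # assumed rule-of-thumb ceiling, not a standard citation here

def pathway_fill(cable_count: int, pathway_width_mm: float, pathway_depth_mm: float) -> float:
    """Return the fraction of the pathway cross-section occupied by the cable bundle."""
    cable_area = cable_count * math.pi * (CABLE_OD_MM / 2) ** 2
    pathway_area = pathway_width_mm * pathway_depth_mm
    return cable_area / pathway_area

fill = pathway_fill(cable_count=48, pathway_width_mm=100, pathway_depth_mm=100)
print(f"Fill: {fill:.0%}  (over limit: {fill > MAX_FILL_RATIO})")
# 48 cables in a 100 mm x 100 mm manager: roughly 31% fill, within the assumed 40% ceiling.
```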

Option 2: one-to-one patching for each port, offering the least flexibility. (Figure: the MDA core and SAN and the HDA LAN switches are cabled directly to patch panels in each EDA server cabinet, so each switch port is dedicated to a server position. Black lines are fiber, blue lines are copper.)

OPTION THREE

Option Three consists of providing consolidation points for connections. These can be either connecting blocks or patch panels. This allows for a zoned cabling approach, but may lead to higher moves, adds, and changes costs. It is also difficult to design within the parameters of a 4-connector channel when using zone distribution.

The other disadvantage of the consolidation point model is that changes take more time than swapping a patch cord if the pair count changes. Depending on the location of the consolidation point, there may be additional risks: loss of static pressure under the floor when removing floor tiles, ending up with more than 4 connectors in a channel, or harming existing channels during changes.

Option 3: consolidation points. (Figure: the MDA core and SAN and the HDA LAN switches feed consolidation points (CPs), which must be a minimum of 15m from the horizontal patch panels; any CP can be patched to any EDA server cabinet. Black lines are fiber, blue lines are copper.)

OPTION FOUR

A final option is to have all server cabinets and switch cabinets in a row, terminating to a single patching field for the row, rather than to a central location. Core connections from the MDA are brought into this patching field. This option can work well in ISP or other environments where cross-department/customer functionality is not desirable or tolerated. This option provides a bit of the best of both worlds, in that there will be some spare ports, but the floor tiles will not have to be lifted to perform MAC work. While this is very similar to the first option, the segmentation can make it easier for network administrators and physical plant technicians to coordinate efforts. Additionally, this style of design provides flexibility in the ever-changing environment of shrinking and expanding storage/networking requirements over time.

Option 4: row-based patching. (Figure: the MDA core and SAN feed a combined copper/fiber patching area for each row, which serves that row's HDA LAN switches and EDA server cabinets.)
Grounding for Screened and Shielded Network Cabling

Shielded cabling, of one type or another, has been the preferred cabling infrastructure in many global markets for many years. Cables described as foil screened unshielded twisted-pair (F/UTP) and fully shielded cables with an overall braid screen plus individual foil shielded twisted pairs (S/FTP) are now gaining popularity in markets where unshielded twisted-pair (UTP) cabling has traditionally been the most common solution.

(Figure: F/UTP and S/FTP cable constructions)

This rise in adoption is tied to the publication of the IEEE standard known as 802.3an 10GBASE-T and this emerging application's sensitivity to noise from adjacent cabling. This noise from adjacent cabling is known as alien crosstalk. Screened and fully shielded 10 Gb/s cabling systems, such as category 6A F/UTP and category 7 S/FTP, are all but immune to the alien crosstalk that presents problems for category 6A UTP cabling. These cabling systems can also help reduce the size and cost of pathway spaces due to their smaller diameters.

Even as cabling installers and their clients increasingly enjoy these benefits, confusion surrounding the bonding and grounding of screened and shielded systems has caused some to avoid them. This concern is unfounded, as advances in screened and shielded cabling systems have simplified bonding and grounding methods tremendously. Today, the installation and bonding and grounding/earthing of F/UTP and S/FTP cabling systems requires little additional effort and expertise over UTP installations.

Why Bond and Ground?


While electrical services, telecommunications equipment, and all other low-voltage systems are required to be bonded to ground per national and local electrical codes and industry standards for safety reasons, the specific need to ground screened and shielded network cabling systems is purely a matter of performance. A properly bonded and grounded cabling system carries noise currents induced by electromagnetic interference (EMI) in the environment to ground along the screen or foil shield, thereby protecting the data-carrying conductors from external noise. The screen or foil shield also minimizes cabling emissions. It is these functions that afford screened and shielded systems their superior immunity to alien crosstalk and other sources of conducted or radiated electromagnetic interference.

S/FTP and F/UTP vs. UTP - How Does the Need to Ground Affect Installation Practices?

A standards-based UTP network cabling system requires no path to ground. However, according to ANSI-J-STD-607-A, "Commercial Building Grounding (Earthing) and Bonding Requirements for Telecommunications", screened and shielded cabling channels are required to be bonded through a conducting path to the Telecommunications Grounding Busbar (TGB) in the telecommunications room (TR). Like UTP systems, F/UTP and S/FTP horizontal cable is terminated to outlets at the work area and in the TR. Screened and shielded connector designs, such as Siemon's 10G 6A™ F/UTP MAX and TERA® outlets, automatically ground to the patch panel in the TR during installation, without the need to individually provide a ground termination for each outlet. The only additional step required to ground these F/UTP and S/FTP cabling systems is to connect a 6 AWG wire from the ground lug provided on the patch panel to the TGB.

The recommended grounding sequence is as follows: the cable's screen or shield is terminated by the outlet, the outlet self-grounds to the patch panel, and then the panel is grounded to the equipment rack or adjacent metallic pathways. The basic sequence is reflected in the diagram below.

1. The F/UTP cable's screen or the S/FTP shield is terminated by the outlet.
2. The outlet makes contact with the patch panel's grounding strip as outlets are snapped into place.
3. The panel is grounded to the equipment rack or adjacent metal pathways via a 6 AWG wire attached to the panel ground lug.
4. A 6 AWG ground wire connects the rack to the TGB.

Where From Here?

The continuation of the ground path from the equipment rack or adjacent metallic raceway to the TGB now falls under the broader requirements of the telecommunications network grounding system. It is critical to note that the grounding steps dictated by the applicable codes and standards are the same for UTP, F/UTP, and S/FTP cabling systems. Although standards and codes differ from region to region and country to country, the methodology for properly grounding the telecommunications network is largely equivalent. To understand the process, a few definitions are required. The following are taken from ANSI-J-STD-607-A and illustrated in the diagram below:

Bonding: The permanent joining of metallic parts to form an electrically conductive path that will assure electrical continuity and the capacity to conduct safely any current likely to be imposed. To expand on the ANSI definition, electrical bonding is a process in which components or modules of an assembly, equipment, or subsystems are electrically connected by means of a low-impedance conductor. The purpose of bonding is to make the shield structure homogeneous with regard to the flow of RF currents. Bonding can be achieved by different methods, as follows:
a) by metallic interfaces through fasteners or by direct metal-to-metal contact
b) by joining two metallic parts or surfaces through the process of welding or brazing
c) by bridging two metallic surfaces with a metallic bond strap

1. Bonding conductor for telecommunications: A conductor that interconnects the telecommunications bonding infrastructure to the building's service equipment (power) ground.
2. Telecommunications bonding backbone: A conductor that interconnects the telecommunications main grounding busbar (TMGB) to the telecommunications grounding busbar (TGB).
3. Telecommunications grounding busbar: The interface to the building telecommunications grounding system, generally located in the telecommunications room. A common point of connection for telecommunications system and equipment bonding to ground, located in the telecommunications room or equipment room.
4. Telecommunications main grounding busbar: A busbar placed in a convenient and accessible location and bonded, by means of the bonding conductor for telecommunications, to the building service equipment (power) ground.

The procedures for bonding and grounding a telecommunications network are straightforward. The cabling system and equipment are grounded to equipment racks or adjacent metallic pathways. These are in turn connected to the TGB. The TGB is bonded to the telecommunications main grounding busbar (TMGB) via the telecommunications bonding backbone. Finally, the TMGB is connected to the main service ground by the bonding conductor for telecommunications. Although the actual methods, materials, and appropriate specifications for each of the components in the telecommunications bonding and grounding system vary according to system and network size, capacity, and local codes, the basic structure remains as illustrated above. From the rack to earth, the process is the same for a UTP, F/UTP, or S/FTP cabling infrastructure.

Final Thought
If your facility’s bonding and grounding system complies with safety codes, then it more than satisfies the bonding
and grounding requirements for the proper performance of any twisted-pair cabling system. All that is required to
realize the performance benefits of F/UTP and S/FTP cabling is the addition of a low impedance connection from
the patch panel in the telecommunications room (TR) to the rack, which should already be connected to the TGB.

DC – Fire & Physical Security Infrastructure
DC Fire Protection & Suppression Systems

Introduction

Today's data centers and network rooms, more than ever, are under enormous pressure to maintain operations. Some companies risk losing millions of dollars from one single data center glitch. Therefore, it is not hard to believe that in the event of a catastrophic data center fire, a company may not only lose millions, but may also go out of business.

According to the National Fire Protection Association (NFPA), there were 125,000 non-residential fires in 2001, with a total of 3.231 billion dollars in losses. Industry studies tell us that 43% of businesses that are closed by a fire never reopen, and 29% of those that do reopen fail within 3 years. It is no wonder that, when designing data centers, fire prevention, detection, and suppression are always top concerns. Fires in data centers are typically caused by power problems in raceways, raised floors, and other concealed areas. This is one of the reasons why raised floor plenums must be properly specified with fire protection equipment. Data center fires are also caused by arson, corporate sabotage, and natural occurrences such as lightning and power surges. Like any other critical system in a data center, fire protection must be redundant and fault tolerant in order to increase the overall data center availability.
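To see how those two survival statistics compound, a quick back-of-the-envelope calculation, assuming the 29% figure applies to the businesses that do manage to reopen:

```python
# Combine the two statistics quoted above into an overall failure rate
# for businesses closed by a fire.
never_reopen = 0.43                      # 43% never reopen
fail_within_3_years = 0.29               # 29% of those that reopen fail within 3 years

overall_failure = never_reopen + (1 - never_reopen) * fail_within_3_years
print(f"Roughly {overall_failure:.0%} are gone within three years of the fire.")
# Roughly 60% are gone within three years of the fire.
```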

Fire prevention provides more protection against fire than any type of fire detection or suppression equipment available. Simply put, if an environment is incapable of breeding a fire, then there is no threat of fire damage to the facility. If a fire does occur, the next step is to detect it. Before fire alarms were invented, watchmen were responsible for spotting fires and alerting others. Now there are a number of advanced detectors that can detect fire in its incipient stages and then notify a central station that alerts personnel and activates suppression systems. Some of the first fire detection devices were nothing more than a water valve tied to
a rope with a weight attached. In the event of a fire, the rope would burn through thereby
opening the water valve. Fortunately, fire protection systems have come a long way with the
advent of technology. Today there are many ways of detecting and suppressing fires, but only
a few are recommended for data center applications. In a data center, the main goal of the
fire protection system is to get the fire under control without disrupting the flow of business
and without threatening the personnel inside.

Data center design standards – NFPA codes


NFPA, National Fire Protection Association, was established in 1896 to protect the public
against the dangers of electricity and fire. Its mission is “to reduce the worldwide burden of
fire and other hazards on the quality of life by developing and advocating scientifically based
consensus codes and standards, research, training, and education.” NFPA today is a worldwide organization that has created many standards, one of them being NFPA 75. NFPA 75 is the standard for the protection of electronic computer / data processing equipment. Several of the items listed in the Industry Best Practices section of this white paper are a result
of the NFPA 75 standard. One important exception provided by the 1999 edition of NFPA 75
(6-4.2.1) allows data centers to continue to power the electronic equipment upon activation of
a gaseous agent total flooding system. This exception is valid for a data center with the
following risk considerations (NFPA 75, 2-1):

1. Economic loss from loss of function or loss of records


2. Economic loss from value of equipment
3. Life safety aspects of the function
4. Fire threat of the installation to occupants or exposed property

Gaseous agents will be discussed later in depth. NFPA continuously updates its standards; therefore, it is recommended that the latest standards be reviewed prior to designing or retrofitting a fire protection system into a data center. One must be aware that in most cases the Authority Having Jurisdiction (AHJ) has final say in what can or cannot be done in regard to fire protection systems.

Classification of fires
Fires are categorized by five classes: Class A, B, C, D, and K. The five classes are described in Figure 1 and are accompanied by the standard picture symbol used to identify which fires an extinguisher may be used on. Fire hazards in data centers are usually categorized as Class A and C hazards due to their content. Class B chemicals should not be stored in data centers.

Class A: Fires involving ordinary combustible materials such as paper, wood, cloth and some plastics.
Class B: Fires involving flammable liquids and gases such as oil, paint lacquer, petroleum and gasoline.
Class C: Fires involving live electrical equipment. Class C fires are usually Class A or Class B fires that have electricity present.
Class D: Fires involving combustible metals or combustible metal alloys such as magnesium, sodium and potassium.
Class K: Fires involving cooking appliances that use cooking agents such as vegetable or animal oils and fats.
Figure 1. Classes of fire, with the standard picture symbol for each class.

In order for fire to exist, three elements are required. This is often taught as the “Fire
Triangle”. Oxygen, heat, and fuel must all interact for the reaction known as fire to take place. If one or more of these three elements is taken away, fire cannot exist.
Therefore, fire extinguishing methods can vary depending on which element(s) are removed.
For instance CO2 systems reduce oxygen by displacing it with a heavier CO2 gas. And
because the CO2 gas is much colder than the fire, it hampers its progression by taking away
heat.

Once a fire is started it is often categorized in stages of combustion. There are four stages of
combustion; the incipient stage or pre-combustion, visible smoke stage, flaming fire stage,

and intense heat stage. As a fire progresses through these stages, many factors increase exponentially, including smoke, heat, and property damage, not to mention risk to life, which becomes critical as smoke density increases. Fire research has shown that the incipient
stage allows for the largest window of time to detect and control the progression of a fire. It is
in this window of time that fire detection systems can mean the difference between availability
and unavailability. The longer the fire burns the more products of combustion, which then
leads to a higher chance of equipment failure even if the fire is successfully extinguished.
These products of combustion may be electrically conductive and can also corrode the
circuits on IT equipment. In these next few sections, we will discuss available solutions for
detecting and suppressing fires in a data center.

Choosing a fire protection solution


For the purposes of designing a fire protection solution for a data center, three conditions should be met: identify the presence of a fire, communicate the existence of that fire to the occupants and proper authorities, and finally contain the fire and extinguish it if possible. Being familiar with all technologies associated with fire detection, alarming, and suppression will ensure a sound fire protection solution. Of course, prior to selecting a detection and
suppression methodology, the design engineer must assess potential hazards and issues.
Will the data center have raised floors? Will it have high ceilings? Will personnel occupy the
area? Will detectors be obstructed in any way? These questions, and many more like them,
should be answered before the proper fire protection solution is chosen. Although a good
deal of footwork is still necessary, technology is making it easier and safer to design fire
protection solutions. The following section describes each of the components utilized in a
complete fire protection solution for data centers.

Fire detection system types

Three main types of detectors are available: smoke detectors, heat detectors, and flame detectors. For the purposes of protecting a data center, smoke detectors are the most effective. Heat and flame detectors should not be used in data centers as they do not provide detection in the incipient stages of a fire and therefore do not provide early warning for the protection of high-value assets.

Spot type smoke detection


Spot-type or conventional smoke detectors can cover an area of about 900 square feet (84 square meters). Spacing in data centers and computer rooms is usually reduced to compensate for the high air flow required in these environments. The greater the number of air changes in the room, the more detectors should be placed per square foot, as shown in Table 1. A default standard for high air movement areas in the industry is usually one detector per every 250 ft² (23 m²). Spot-type detectors are effective in small data centers and computer rooms. Although more expensive intelligent detectors are available, they would add little value in these smaller spaces. There are two types of spot-type detectors: photoelectric and ionization.

Air changes per hour   Area per detector (ft²)   Area per detector (m²)
60                     125                       11.6
30                     250                       23.2
20                     375                       34.8
15                     500                       46.5
12                     625                       58.1
10                     750                       69.7
8.6                    875                       81.3
7.5                    900                       83.6
6.7                    900                       83.6
6                      900                       83.6
Table 1. Quantity of smoke detectors as a function of air changes

Source: NFPA 72 - National Fire Alarm Code (2007)
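A minimal sketch of how Table 1 translates into a detector count for a given room; the 5,000 ft² room size and 30 air changes per hour are illustrative inputs, not values from this guide.

```python
import math

# Area of coverage per spot-type detector (ft^2) as a function of air changes per hour,
# taken from Table 1 above (NFPA 72).
AREA_PER_DETECTOR_FT2 = {
    60: 125, 30: 250, 20: 375, 15: 500, 12: 625,
    10: 750, 8.6: 875, 7.5: 900, 6.7: 900, 6: 900,
}

def detectors_required(room_area_ft2: float, air_changes_per_hour: float) -> int:
    """Use the nearest listed air-change rate at or above the actual rate, then round up."""
    eligible = [ach for ach in AREA_PER_DETECTOR_FT2 if ach >= air_changes_per_hour]
    ach_key = min(eligible) if eligible else min(AREA_PER_DETECTOR_FT2)
    return math.ceil(room_area_ft2 / AREA_PER_DETECTOR_FT2[ach_key])

# Illustrative example: a 5,000 ft^2 computer room with 30 air changes per hour
print(detectors_required(5000, 30))   # 20 detectors at 250 ft^2 each
```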

Photoelectric detectors use a light source or beam and a light sensor set perpendicular to it.
When nothing is in the chamber, the light sensor does not react. When smoke enters the
chamber, some of the light is scattered and reflected onto the light sensor, causing the
detector to enter an alarm condition.

Ionization detectors use an ionization chamber and a small amount of radiation to detect
smoke. Normally the air in the chamber is being ionized by the radiation causing a constant
flow of current, which is monitored by the detector. When smoke enters the chamber it
neutralizes or disrupts the ionized air thereby causing the current to drop. This triggers the
detector into an alarmed condition.

4.1.1.1 Intelligent spot-type very early smoke detection


Intelligent spot-type detectors, shown in Figure 2, are very similar to conventional spot-type
detectors except that they work with an addressable intelligent fire alarm control panel that
can report the location of a fire more precisely. Intelligent spot type detectors are available as
photoelectric or ionization detectors. What makes these detectors intelligent is that they are
individually addressable so that they are able to send information to the central control station
thereby pinpointing the exact location of the smoke. Some have the ability to automatically
compensate for changing environments such as humidity and dirt accumulation. They can
also be programmed to be more sensitive during certain times of the day, for instance when
workers leave the area, sensitivity will increase. Intelligent spot type detectors are commonly
placed below raised floors, on ceilings and above drop down ceilings. However modified spot
detectors are also used in air handling ducts to detect possible fires within the HVAC (heating
ventilation air conditioning) system as seen in Figure 3. By placing detectors near the
exhaust and the intake of CRAC units (computer room air conditioners), detection can be
accelerated.

Figure 2 (left)
Intelligent smoke detector

Figure 3 (right)
Duct smoke detector

5.1.1.1 Air sampling smoke detection


Air sampling smoke detection, sometimes referred to as a “Very Early Smoke Detection”
(VESD) system, is usually described as a high powered photoelectric detector. Air sampling
systems use an advanced detection method using a very sensitive laser, much more powerful
than the one contained in a common photoelectric detector. As the particles pass through the
detector, the laser beam is able to distinguish them as dust or byproducts of combustion. An
air sampling system is comprised of a network of pipes attached to a single detector, which
continually draws air in and samples it. The pipes are typically made of PVC but can also be
CPVC, EMT or copper. Depending on the space being protected and the configuration of
multiple sensors, these systems can cover an area of 2,500 to 80,000 square feet (232 to
7,432 square meters). Despite the wide area of coverage, the sensors can be centrally
located for ease of maintenance and repair. Smoke detection depends on three variables: the
sensitivity of the detector, the clarity of the smoke path leading to the detector, and
the density of the smoke once it reaches the detector. In an area such as a data center where
the airflows are rapid, it becomes difficult to detect smoke with a spot-type detector especially
in the incipient stage of a fire. This is what makes VESD an ideal smoke detection solution for
high availability data centers. The air sampling system is designed to detect the particles of
combustion such as those released from PVC wire during the initial stages of heat build up.
When the smoke particles drift through the pipes and into the detector, a photo detector or a
laser beam differentiates the particle as dust or as a byproduct of combustion. This detection
process can be up to 1000 times more sensitive than a photoelectric or ionization smoke
detector. These systems are capable of detecting byproducts of combustion at concentrations
as low as 0.003% obscuration per foot. A typical air-sampling detector is shown in Figure 4.

Figure 4
Air sampling smoke
detection system
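
As a quick sanity check of the sensitivity figures above, the short sketch below compares the 0.003% obscuration-per-foot figure against an assumed typical spot-detector alarm threshold of about 3% obscuration per foot (the 3% value is an assumption for illustration, not a figure from this text).

# Minimal arithmetic check of the "up to 1000 times more sensitive" claim.
vesd_sensitivity = 0.003   # % obscuration per foot (from the text)
spot_sensitivity = 3.0     # % obscuration per foot (assumed typical spot detector)

ratio = spot_sensitivity / vesd_sensitivity
print(f"Air sampling detection is roughly {ratio:.0f}x more sensitive")   # ~1000x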

6.1.1.1 Linear thermal detection
Linear thermal detection is a method of detecting “hot spots” in cable trays or cable runs. As
a general rule it is not used in enclosed and air-conditioned computer rooms or data centers.
Linear detection is more common in industrial applications that have long cable tray rooms,
such as refineries, chemical plants, and power generation facilities. Linear thermal detection
is composed of at least two heat dependent conductors. When a set temperature is reached,
the two conductors cause an alarm condition that is detected at the main control panel. The
control panel can then notify personnel and pinpoint the location of the hot spot. Linear
thermal detection is capable of detecting heat anywhere along its length up to about 5,000
feet (1,524 meters) per zone.

Fire suppression system types

Once a fire is detected in a data center, it is critical to quickly extinguish the fire with no effect
on the data center operation. Various methods are used to do this, some better than others.
Regardless of the method employed, it should provide a means to abort the suppression
system in the event of a false alarm.

7.1.1.1 Foam
Foam, formally called Aqueous Film-Forming Foam (AFFF), is generally used on liquid fires
because when applied it floats on the surface of the flammable liquid. This prevents oxygen
from reaching the flames, thereby extinguishing the fire. Foam is electrically conductive and
therefore should not be used anywhere electricity is present; needless to say, it should not be
used in data centers.

8.1.1.1 Dry chemical


Dry chemical or dry powder systems can be used on a wide variety of fires and pose little
threat to the environment. Different types of powders can be used depending on the type of
fire. They are electrically nonconductive but require clean-up. They are used in many
industrial applications but are not recommended for data centers because of the residue left
after discharge.

9.1.1.1 Water sprinkler system


Water sprinkler systems are designed specifically for protecting the structure of the building
(Figure 5). Water sprinklers discharge when the fusible element in the sprinkler head opens.
These elements are usually solder links or glass bulbs that open when they reach a
temperature of 165-175°F. It is important to note that by the time the element opens, the
temperature around the sprinkler head may be as high as 1000°F. This has given rise to
fast-acting or quick-response sprinkler systems, which are basically the same but open at a
lower temperature. Water sprinkler systems are installed in three different configurations:
wet-pipe, dry-pipe, and pre-action. Wet-pipe is by far the most common installation and is
usually found in insulated (heated) buildings where freezing of the water-filled pipes is not a
concern. Dry-pipe systems are charged with compressed air or nitrogen to prevent freezing.
Pre-action systems prevent accidental water discharge by requiring a combination of sensors
to activate before allowing water to fill the sprinkler pipes. Normally water sprinklers are not
recommended for data centers; however, depending on local fire codes they may be
required. In this case a pre-action system is recommended. Installing a sprinkler system
during construction can cost roughly $1 - $2 / ft² (about $11 - $22 / m²), while retrofitting an
existing building increases the cost to roughly $2 - $3 / ft² (about $22 - $32 / m²).
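
The cost figures above lend themselves to a simple budgetary estimate. The sketch below is illustrative only; it uses the per-square-foot ranges quoted above and an example room size that is not from the text.

def sprinkler_cost_range(area_ft2, retrofit=False):
    """Rough installed-cost range using the per-ft^2 figures quoted above."""
    low, high = (2.0, 3.0) if retrofit else (1.0, 2.0)   # $ per ft^2
    return area_ft2 * low, area_ft2 * high

# Example: a 10,000 ft^2 data hall
print(sprinkler_cost_range(10_000))                 # (10000.0, 20000.0) new construction
print(sprinkler_cost_range(10_000, retrofit=True))  # (20000.0, 30000.0) retrofit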

Figure 5
Water sprinkler

10.1.1.1 Water mist system


Water mist systems discharge very fine droplets of water onto a fire. One drop ranges in size
from 100 to 120 microns, which dramatically decreases water consumption. Because mist
systems use less water than conventional sprinkler systems they require less storage space.
Water mist systems are extremely safe and pose no threat to the environment. This fine mist
of water extinguishes the fire by first absorbing heat from the fire. By absorbing the heat,
vapor is produced causing a barrier between the flame and the oxygen needed to sustain it. It
is this change of state (liquid to gas) that makes this water mist system so effective. (This is
the same phenomenon that is used in evaporative cooling.) Typical applications include gas
turbines, steam turbine generator bearings, generator sets, transformers and switchgear
rooms. Water mist systems are gaining popularity due to their effectiveness. However, there
is evidence to suggest that equipment failure can result from a discharge because of the high
level of humidity introduced into the data center.

11.1.1.1 Fire extinguishers


Sometimes the oldest method of fire suppression is the best. Fire extinguishers today are
essentially the same as they have always been: easy to use and operable by just about
anyone. What makes fire extinguishers so valuable to data centers is the ability to extinguish
a fire before the main suppression system discharges. The human nose is often the earliest
fire detector, and a nearby extinguisher makes it possible to put out a fire in its earliest
stages.
Various types of fire extinguishers have been approved for use in data centers as
replacements for Halon 1211. One such agent is HFC-236fa, more commonly known by its
trade name FE-36, which can be used in occupied areas (Figure 6). A few others include
Halotron I, Halotron II, and Novec 1230. They are environmentally safe and leave no residue
upon discharge since they are discharged as a gas. These clean agents extinguish fires by
removing heat and chemically preventing combustion.

Figure 6
Clean agent fire extinguisher

12.1.1.1 Total flooding fire extinguishing systems
Total flooding fire extinguishing systems, sometimes referred to as Clean Agent Fire
Suppression Systems, can be used on Class A, B, and C fires. A gaseous agent flooding fire
suppression system is highly effective in a well-sealed, confined area, which makes a data
center an ideal environment. It typically takes less than 10 seconds for an agent to discharge
and fill the room. The agent is contained in pressurized tanks as shown in Figure 7. The
number of tanks used depends on the total volume of the room being protected as well as the
type of agent used. The hidden areas in a data center present the biggest threat of fire. If
wires are damaged, loose or otherwise poorly maintained in an open area, a routine visual
inspection should uncover the problem and repairs can be made. Discovering a problem in a
closed area is far more difficult. Unlike water suppression systems, gaseous agents infiltrate
even the hardest to reach areas such as inside equipment cabinets. Later the gas and its
byproducts can be vented out of the data center with very little environmental impact and no
residue.

These agents are non-conductive, non-corrosive, and some can safely be discharged in
occupied areas. The name “clean agents” is commonly used because they leave no residue
and cause no collateral damage. For years Halon has been used as the agent of choice,
however, it was phased out in commercial applications due to its ozone depletion properties.
We will discuss Halon alternatives in the next section. The standard that governs total flooding
suppression systems is NFPA 2001 - Standard on Clean Agent Fire Extinguishing Systems.

Figure 7
Gaseous agent cylinders

Gaseous agents
A gaseous fire-extinguishing agent is a chemical compound that extinguishes a fire, as a gas,
by means of "suffocation" and/or heat removal. Given a closed, well-sealed room, gaseous
agents are very effective at extinguishing fires and leave no residue. When Halon 1301 was
introduced in the 1960s, it was widely used throughout various industries given its
effectiveness in fighting fires. However, on January 1, 1994, under the Clean Air Act (CAA),
the U.S. banned the production and import of Halons 1211, 1301, and 2402 in compliance
with the Montreal Protocol on Substances That Deplete the Ozone Layer. Recycled Halon
and inventories produced before January 1, 1994, are the only sources available today.
Furthermore, the EPA (Environmental Protection Agency) published a final rule (63 FR
11084) on March 5, 1998, that bans the production of any blend of Halon, with an exception
made for aviation fire protection. In the midst of this ban came two U.S. standards: NFPA
2001, Standard on Clean Agent Fire Extinguishing Systems, and the Significant New
Alternatives Policy (SNAP) from the EPA. Under these standards, alternative agents are
evaluated based on their safety, effect on the environment, and effectiveness.

Gaseous agents are divided into two categories: inert gases and fluorine-based compounds.
Note that the names of the agents are designated under the NFPA 2001 standard; the names
in parentheses are the trade names by which they are normally known.

Inert gases
Although there are other inert gas agents approved by NFPA 2001, IG-55 (Pro-Inert) and
IG-541 (Inergen) are the most widely accepted and commercially available today. Some other
inert gas agents listed in NFPA 2001 include carbon dioxide (CO2), IG-55 (Argonite), IG-100
(Nitrogen), and IG-01 (Argon).

Carbon Dioxide
Carbon dioxide, or CO2, is an inert gas that reduces the concentration of oxygen
needed to sustain a fire by means of physical displacement. Because CO2 is heavier
than air, it settles to the base of the fire and quickly suffocates it. The same oxygen
displacement makes this type of agent unsafe for discharge in occupied areas. Per the
governing standard (NFPA 12), these systems shall not be used in occupied areas and
are therefore not recommended for data centers. If a CO2 system is used in an occupied
area because no suitable alternative is available, a proper evacuation plan should be in
place and safety mechanisms should be used to notify personnel and evacuate the area
prior to a discharge. A typical safety mechanism provides audible and visual cues to
data center occupants 30-60 seconds prior to discharge. CO2 is non-conductive and
non-damaging. Another disadvantage of carbon dioxide is the large number of storage
containers required for effective discharge. CO2 is stored in tanks as a gas and occupies
about 4 times the storage volume of Halon 1301, which makes it a poor choice for any
data center where floor space is highly valued. Typical applications are transformer
rooms, switch rooms, cable vaults, generators, and industrial processes.
IG-55 (Pro-Inert) and IG-541 (Inergen)
Pro-Inert is an inert gas composed of 50% argon and 50% nitrogen. Inergen is an inert
gas composed of 52% nitrogen, 40% argon, and 8% carbon dioxide, all of which are
found naturally in the atmosphere. For this reason these agents have zero ozone
depletion potential (ODP), an acceptably low global warming potential, and produce no
harmful products of decomposition. Inert agents are non-conductive, leave no residue,
and are safe to discharge in occupied areas. They are stored as a gas in high-pressure
tanks that can be located up to 300 feet (91 meters) away from the protected space.
This is convenient considering inert agents require a storage volume about 10 times
that of the other alternatives available today, which would otherwise take up precious
data center space. Furthermore, to reduce the storage volume needed for systems
protecting multiple rooms, selector valves can be used to direct the agent to the alarmed
zone. Due to the quantity of agent introduced into the protected space, the discharge
takes about 60 seconds. Inert agents are used in data centers, telecommunications
offices, and various other critical applications.

Fluorine-based compounds


Although there are other alternative agents approved by NFPA 2001, FK-5-1-12 (3M Novec
1230 Fire Protection Fluid), HFC-125 (ECARO-25 / FE-25), and HFC-227ea (FM-200 / FE-
227) are the most widely accepted and commercially available agents for the protection of
high value assets.

FK-5-1-12 (Novec 1230)


FK-5-1-12 is known as 3M Novec 1230 Fire Protection Fluid. It has a zero ozone
depletion potential (ODP) and an extremely low global warming potential. FK-5-1-12 is
stored as a liquid and is colorless and nearly odorless. Although it is a liquid at room
temperature, it is discharged as an electrically non-conductive gas that leaves no
residue and will not harm occupants; however, as in any other fire situation, all
occupants should evacuate the area as soon as an alarm sounds. FK-5-1-12 systems
have about the same storage space requirement as conventional halocarbon agents.
The agent extinguishes a fire by removing heat faster than it is generated and is
discharged in 10 seconds or less.
HFC-125 (ECARO-25 / FE-25)

HFC-125 is known under two commercial brands: ECARO-25 and FE-25. HFC-125 has
a zero ozone depletion potential (ODP) and an acceptable global warming potential. It is
odorless, colorless, and stored as a liquefied compressed gas. It is discharged as an
electrically non-conductive gas that leaves no residue and will not harm occupants. It
can be used in occupied areas; however, as in any other fire situation, all occupants
should evacuate the area as soon as an alarm sounds. HFC-125 can be used with
ceiling heights up to 16 feet (4.9 meters). The flow characteristics of HFC-125 are
similar to those of Halon, which makes it a candidate for a drop-in Halon replacement
since it can use the same pipe distribution network as an original Halon system. Flow
calculation software must be used to verify that the agent can be distributed through the
pipe network in compliance with the NFPA 2001 standard and the manufacturer's UL
and FM listings and approvals. Due to its hydraulic properties it also requires less agent
per unit weight than other chemical agents available today, and its floor space
requirements are about the same as those of a Halon system. This agent chemically
inhibits the combustion reaction by removing heat and is discharged in 10 seconds or
less.
HFC-227ea (FM-200 / FE-227)
HFC-227ea is known under two commercial brands: FM-200 and FE-227. HFC-227ea
has a zero ozone depletion potential (ODP) and an acceptable global warming potential.
It is odorless, colorless, and stored as a liquefied compressed gas. HFC-227ea is
discharged as an electrically non-conductive gas that leaves no residue and will not
harm occupants; however, as in any other fire situation, all occupants should evacuate
the area as soon as an alarm sounds. It can be used with ceiling heights up to 16 feet
(4.9 meters) and has a storage space requirement about 1.7 times that of a Halon 1301
system. HFC-227ea chemically inhibits the combustion reaction by removing heat and is
discharged in 10 seconds or less. This agent can be retrofitted into an existing Halon
1301 system, but the pipe network must be replaced or an additional cylinder of nitrogen
must be used to push the agent through the original Halon pipe network.

Pull stations and signaling devices

Pull stations allow a building occupant to notify everyone in the building of a fire. They should
be placed at every exit to the protected space and, once pulled, can notify the fire department
of the alarm (Figure 8). Pull stations are sometimes the best way to catch a fire in its incipient
stage. No matter how sensitive a smoke detector may be, it is still no substitute for the human
nose: a person can pick up the scent of smoke much earlier than any smoke detector can.

Figure 8
Pull station

Signaling devices are activated after a pull station or a detector enters an alarm condition.
They provide audible and/or visual cues to building occupants as a signal to evacuate the
building (Figure 9). Audible signals may include horns, bells, and sirens, and may be heard in
various patterns. Sound levels range from 75 dBA to 100 dBA.

Visual signaling devices are crucial for notifying occupants who are hearing impaired. Strobes
usually incorporate a xenon flashtube protected by a clear plastic cover. They are designed
with different light intensities, measured in candela. The minimum flash frequency for these
strobes should be once per second.

Figure 9
Fire alarm strobe

Control systems

Regardless of the number of fire suppression and detection products in a building, they are
useless without a control system, commonly known as the fire alarm control panel (FACP).
Control systems are the "brains" behind the building's fire protection network; every system
discussed thus far is accounted for by the fire alarm control system. Fire alarm panels are
either conventional panels or intelligent addressable panels, working with detectors of the
same type (conventional or intelligent / addressable) and with the same communication
protocol. An example is shown in Figure 10 below. Depending on the panel, it can control the
sensitivity levels of various components such as smoke detectors and can be programmed to
alarm only after a certain sequence of events has taken place. The software used by these
systems allows a user to set time delays, thresholds, passwords, and other features. Reports
can be generated from most intelligent panels, which can lead to improved performance of
the fire protection system, for example by identifying faulty sensors. Once a detector, pull
station, or sensor is activated, the control system automatically sets in motion the list of
actions it has been programmed to take. It can also provide valuable information to the
authorities.

All fire alarm control panels used in a suppression environment should be listed by UL for
“releasing”. This approval guarantees that the control panel incorporates the necessary
protocol and logic to activate and control a fire suppression system.

Figure 10
Fire control panel

Mission critical facilities

Now that all the fire protection components have been described, the last step is to bring
them together to design a robust and highly available data center solution. It is important to
note that while various types of detection, suppression, and gaseous agents were described,
not all of them are recommended for a highly available data center. The following list of
components complements a data center goal of 7x24x365 uptime.

Conventional spot-type detection
Intelligent spot-type detection
Air sampling smoke detection (VESD)
Fire extinguishers
Total flooding fire-extinguishing system
Halon alternative clean agent
Pull stations
Signaling devices
Control system / fire alarm control panel (FACP)

Spot-type detection with photoelectric detectors, controlled by either a conventional or
intelligent "releasing" panel, should be used. Detectors should be placed under raised floors
as well as in the main environment. A sequential detection configuration should be used: an
initial alarm should not trigger the suppression system, it should prompt the control system to
sound an alarm. When a second detector goes into alarm, it provides confirmation of a fire to
the control panel, and the panel will initiate the sequence to activate the suppression system
in place. The activation of a pull station triggers an immediate release of the suppression
system through the fire alarm control panel.

Redundant air sampling smoke detector systems should be placed beneath as well as above
the raised floor. This is to prevent any accidental discharge of the clean agent. Both detection
systems must enter an alarm state before the total flooding suppression system discharges. It
is also recommended that intelligent spot-type detectors be positioned at every CRAC unit
intake and exhaust. If any other ductwork enters the data center, duct smoke detectors
should also be installed. Again, to prevent accidental discharge of the clean agent, no
individual alarm should be able to trigger a discharge. The most commonly specified Halon
alternative clean agent systems available today are HFC-125 (ECARO-25 / FE-25) and
HFC-227ea (FM-200 / FE-227) because of their small storage footprint and effectiveness. In addition to the
total flooding fire-extinguishing system, fire code may require a sprinkler system to be
installed. If this is the case, it must be a pre-action system to prevent accidental water
damage to the data center. Clean agent fire extinguishers should be placed throughout the
data center and in accordance with local fire codes. There should be pull stations as well as
written procedures posted at every exit and signaling devices throughout the building capable
of notifying all personnel inside of a fire.

The fire alarm control panel should be fault tolerant, programmable, and capable of monitor-
ing all devices. It should also be capable of automatic system overrides. It should be a panel
approved for “releasing” by UL. If the panel is protecting more than one suppression zone, all
the detectors should be addressable therefore allowing the control panel to identify the
precise location of any alarm. The control system is vital to the effectiveness of the suppres-
sion system. It must coordinate the sequence of events that take place immediately following
the initial alarm. These include sounding a separate evacuation alarm prior to discharge,
closing ventilation dampers to prevent air from escaping, discharging the agent,
and notifying the local authorities. Of course, all of this is incomplete without well-written
and effective emergency procedures, reinforced with regular training of all data center
employees.
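
The release sequence described above lends itself to a simple state sketch. The following is a minimal illustration of that logic (first alarm alerts, a confirming second detection system releases, a pull station releases immediately); the class and system names are illustrative, not vendor firmware or a certified releasing-panel implementation.

from dataclasses import dataclass, field

@dataclass
class ReleasingLogic:
    """Sketch of the cross-zone release sequence: one detection system in
    alarm sounds an alert; a second, independent system confirms the fire
    and starts the release sequence; a pull station releases immediately."""
    alarmed_systems: set = field(default_factory=set)

    def detector_alarm(self, system_id):
        self.alarmed_systems.add(system_id)
        if len(self.alarmed_systems) == 1:
            return "ALERT: sound evacuation alarm, notify operators"
        return self.release()

    def pull_station(self):
        return self.release()

    def release(self):
        # Sequence coordinated by the fire alarm control panel.
        return ("RELEASE: pre-discharge alarm -> close dampers -> "
                "discharge clean agent -> notify fire department")

panel = ReleasingLogic()
print(panel.detector_alarm("vesd_below_floor"))   # first alarm: alert only
print(panel.detector_alarm("vesd_above_floor"))   # confirmation: release sequence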

13.1.1.1 Raised floors
Raised floors bring up some important issues with regard to fire protection in mission critical
facilities and are worth mentioning here. Raised floor tiles conceal power and data cables as
well as any other combustible material such as paper and debris. Therefore it is
recommended that all cabling be placed overhead where it is visible and can be easily
inspected in case of a detected hot spot. Given the opportunity, raised floors also become a
breeding ground for human error that poses significant fire risks: in some cases boxes of
paper have been stored under the floor. It may seem natural to store material under a raised
floor without considering the fire hazard it creates.
Lastly, raised floors increase the cost of properly protecting a data center. Because a raised
floor creates a completely separate plenum, that plenum must be protected with the same
level of fire protection as the space above it. When systems like intelligent smoke detection
and gaseous agent flooding are used, the cost can approach 1.25 times that of a
non-raised-floor environment.

Industry best practices

The following is a list of recommended practices for increasing the availability of a data
center with respect to fire protection.

Ensure that the data center is built far from any other buildings that may pose a fire
threat to the data center.
Emergency procedures should be posted on all annunciator panels and fire alarm con-
trol panels.
A fire alarm system (releasing panel) should incorporate multiple stages of alarm.
A smoke purging system must be installed in the data center.
All electrical panels must be free of any obstructions.
All EPO buttons and fire alarm pull stations should be consistently labeled to avoid any
confusion.
All fire extinguisher locations should be clearly identified, and each extinguisher should
indicate the classes of fire on which it may be used.
Any openings in the data center walls should be sealed with an approved fireproof
sealant.
Each data center exit should have a list of emergency phone numbers clearly posted.
Enforce a strict no smoking policy in IT and control rooms.
EPO systems should not be activated by fire alarms.
Equip the switchgear room with clean agent fire extinguishers.
Fire dampers should be installed in all air ducts within the data center.
Fire protection systems should be designed with maintainability in mind. Replacement
parts and supplies should be stored on site. Systems should be easily accessible.
Get approval from the fire marshal to continue operating the CRAC units when the fire
system is in the alarmed state.
If a facility is still using dry chemical extinguishers ensure that the computer room
extinguishers are replaced with a Halon alternative.
Pre-action sprinklers should be placed in the data center (if required by AHJ) as well as
in the hallways.
Provide a secondary water source for fire sprinklers.
Sprinkler heads should be recessed into the ceiling to prevent accidental discharge.

The annunciator panels should have emergency or operating procedures posted near
them. Most annunciator panels are located in the security office and may also be
located in the engineer’s office.
The fire suppression system should have a secondary suppression agent supply.
The data center should be void of any trash receptacles.
All office furniture in the data center must be constructed of metal. (Chairs may have
seat cushions.)
Tape libraries and record storage within the data center should be protected by an
extinguishing system. It is recommended that they be stored in a fire safe vault with a
fire rating of more than 1 hour.
Any essential supplies such as paper, disks, wire ties, etc., should be kept in completely
enclosed metal cabinets.
UL approved extension cords used to connect computer equipment to branch circuits
should not exceed 15 feet in length.
The use of acoustical materials such as foam, fabric, etc. used to absorb sound is not
recommended in a data center.
The sprinkler system should be controlled from a different valve than the one used by
the rest of the building.
All data center personnel should be thoroughly trained on all fire detection and
extinguishing systems throughout the data center. This training should be given on a
regular basis.
Air ducts from other parts of the building should never pass through the data center. If
this is not possible then fire dampers must be used to prevent fire from spreading to the
data center.
Water pipes from other parts of the building should never pass through the data center.
Duct coverings and insulation should have flame spread ratings less than 25 and a
smoke developed rating less than 50.
Air filters in the CRAC units should have a class 1 rating.
Transformers located in the data center should be a dry type or should be filled with
non-combustible dielectric.
No extension cords or power cords should be run under equipment, mats, or other
covering.
All cables passing through the raised floor should be protected against chafing by
installing edge trim around all openings.
Computer areas should be separated from other rooms in the building by fire-resistant-
rated construction extending from structural floor slab to structural floor above (or roof).
Avoid locating computer rooms adjacent to areas where hazardous processes take
place.

Common mistakes

The following are common mistakes made with regard to fire protection systems in a data
center environment:

Having the fire system automatically shut down the CRAC unit. This will cause the
computer equipment to overheat resulting in downtime.
Using dry chemical suppression agents to extinguish computer room fires will damage
computer equipment. Dry chemical agents are very effective against fires but should not
be used in a data center.
Storing combustible materials underneath a data center raised floor.

Conclusion

Most fires in mission critical facilities can be prevented if common mistakes are avoided and
fire detection is properly specified and monitored. Human error plays a large role in fire
hazards and must be minimized through training and through procedures that are enforced.

References:
1. www.dupont.com/fire
2. www.greatlakes.com
3. www.fike.com
4. www.fireline.com
5. www.hygood.co.uk
6. www.nfpa.org
7. www.vesda.com

Physical Security in Mission Critical Facilities

Introduction

14.1.1.1 People: a risk to be managed


When data center security is mentioned, the first thing likely to come to mind is protection
from sabotage, espionage, or data theft. While the need is obvious for protection against
intruders and the intentional harm they could cause, the hazards from ordinary activity of
personnel working in the data center present a greater day-to-day risk in most facilities.

People are essential to the operation of a data center, yet studies consistently show that
people are directly responsible for 60% of data center downtime through accidents and
mistakes — improper procedures, mislabeled equipment, things dropped or spilled, mistyped
commands, and other unforeseen mishaps large and small. With human error an unavoidable
consequence of human presence, minimizing and controlling personnel access to facilities is
a critical element of risk management even when concern about malicious activity is slight.

> Data center physical infrastructure
Physical security is part of Data Center Physical Infrastructure (DCPI) because it plays a
direct role in maximizing system availability ("uptime"). It does this by reducing downtime
from accidents or sabotage due to the presence of unnecessary or malicious people. Other
DCPI elements are power, cooling, racks, cabling, and fire suppression.

Identification technology is changing as fast as the facilities, information, and communication
it protects.
With the constant appearance of new equipment and
techniques, it's easy to forget that the age-old problem this technology is trying to solve is
neither technical nor complicated: keeping unauthorized or ill-intentioned people out of places
where they don't belong. And while the first step, mapping out the secure areas of the facility
and defining access rules, may produce a layered and complex blueprint, it isn’t intuitively
difficult — IT managers generally know who should be allowed where. The challenge lies in
the second step: deciding how best to apply less-than-perfect technologies to implement the
plan.
15.1.1.1 Who are you, and why are you here?
While emerging security technologies may appear exotic and inscrutable — fingerprint and
hand scans, eye scans, smart cards, facial geometry — the underlying security objective,
unchanged since people first started having things to protect, is uncomplicated and familiar to
all of us: getting a reliable answer to the question "Who are you, and why are you here?"

The first question — "Who are you?" — causes most of the trouble in designing automated
security systems. Current technologies all attempt to assess identity one way or another,
with varying levels of certainty — at correspondingly varying cost. For example, a swipe card
is inexpensive and provides uncertain identity (you can't be sure who's using the card); an iris
scanner is very expensive and provides very certain identity. Finding an acceptable
compromise between certainty and expense lies at the heart of security system design.

The answer to the second question, "Why are you here?" — in other words, what is your
business at this access point — might be implicit once identity has been established (“It’s
Alice Wilson, our cabling specialist, she works on the cables — let her in”), or it can be
implemented in a variety of ways: A person's "who" and "why" can be combined — in the
information on a swipe-card’s magnetic strip, for example; a person's identity could call up
information in a computer file listing allowable access; or there could be different access
methods for various parts of the facility, designed to allow access for different purposes.
Sometimes "Why are you here?" is the only question, and "Who are you?" doesn't really
matter — as for repair or cleaning personnel.

16.1.1.1 Combining expertise to find the solution
IT managers know the "who and why" of security for their installation, but they may not be
conversant in the details of current methodologies or the techniques for applying them — nor
should they need to be. They know their budget constraints, and they know the risks inherent
in various types of security breach at their facility.

The security system consultant, on the other hand, doesn't know the particulars of the facility,
but knows the capabilities, drawbacks, and cost of current methodologies. He or she also
has experience in the design of other security systems, and so can help clarify, refine, or
simplify the "who and why" requirements by asking the right questions.

With their combined expertise, a system can be designed that balances access requirements,
acceptable risk, available methods, and budget constraints.

Defining the problem

17.1.1.1 Secure areas: what needs protecting?

The first step in mapping out a security plan is just that — drawing a map of the physical
facility and identifying the areas and entry points that need different rules of access, or levels
of security.

These areas might have concentric boundaries:

Site perimeter
Building perimeter
Computer area
Computer rooms
Equipment racks

Or side-by-side boundaries:

Visitor areas
Offices
Utility rooms

> "Physical security" can also mean…
Physical security can also refer to protection from catastrophic damage (fire, flood,
earthquake, bombing) or utility malfunction (power loss, HVAC failure). Here it refers only to
protection from on-site human intrusion.

Concentric areas can have different or increasingly stringent access methods, providing
added protection called depth of security. With depth of security, an inner area is protected
both by its own access methods and by those of the areas that enclose it. In addition, any
breach of an outer area can be met with another access challenge at a perimeter further in.

Figure 1
Security map showing "depth of security" — concentric secure areas (grounds, building, data
center, rack) around the data center, with an access point at each perimeter. Visitor and
employee entrances, parking, deliveries, offices, common areas, mechanical room, and data
vault are marked; darker shading indicates a more secure area.

Rack-level security - At the innermost “depth of security” layer — further in than the data
room itself — is the rack. Rack locks are not in common use (yet), but if used they serve as
the last defense against unauthorized access to critical equipment. It would be unusual for
everyone in a room full of racks to have the need to access every rack; rack locks can ensure
that only server people have access to servers, only telecommunications people have access
to telecommunications gear, and so on. “Manageable” rack locks that can be remotely
configured to allow access only when needed — to specific people at specific times — reduce
the risk of an accident, sabotage, or unauthorized installation of additional gear that could
cause a potentially damaging rise in power consumption and rack temperature.

Infrastructure security - It is important to include in the security map not only areas
containing the functional IT equipment of the facility, but also areas containing elements of
the physical infrastructure which, if compromised, could result in downtime. For example,
HVAC equipment could be accidentally or deliberately shut down, generator starting batteries

could be stolen, or a system management console could be fooled into thinking the fire
sprinklers should be activated.

18.1.1.1 Access criteria: who is allowed where?


A person’s authority for access to a secure area can be based on different things. Besides
the usual ones — identity and purpose, the first two listed below — there may be additional
categories requiring special treatment, such as “need to know.”

Personal identity - Certain individuals who are known to the facility need access to the
areas relevant to their position. For example, the security director will have access to most of
the facility but not to client data stored at the installation. The head of computer operations
might have access to computer rooms and operating systems, but not the mechanical rooms
that house power and HVAC facilities. The CEO of the company might have access to the
offices of the security director and IT staff and the public areas, but not the computer rooms
or mechanical rooms.

> Separate the issues
Don't let the details of identification technologies intrude upon the initial mapping out of
security requirements. First define the areas and the access criteria for your facility, then
attack the cost/effectiveness/risk analysis, consider compromises, and figure out the best
implementation of technology.

Reason to be there - A utility repair person, regardless of whether it's Joe Smith or Mary
Jones, might have access only to mechanical rooms and public areas. The cleaning crew,
whose roster could change from day to day, might have access to common areas but
nowhere else. A network switch expert might have access only to racks with switching
equipment, and not racks with servers or storage devices. At a web server facility, a client's
system maintenance personnel might have access only to a "client access room" where
there are connections to their personal server for administrative purposes.

Need to know - Access to extremely sensitive areas can be granted to specific people for a
specific purpose — that is, if they “need to know,” and only for as long as they have that
need.
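
One simple way to capture the "who is allowed where" criteria above is as a role-to-zone map that an access control system can consult. The sketch below is illustrative only; the roles and zone names are placeholders based on the examples in this section, not a prescribed scheme.

# Illustrative access map; zone and role names are placeholders.
ACCESS_MAP = {
    "security_director":   {"grounds", "building", "offices", "mechanical_room"},
    "computer_operations": {"grounds", "building", "computer_room"},
    "ceo":                 {"grounds", "building", "offices", "common_areas"},
    "utility_repair":      {"grounds", "building", "mechanical_room", "common_areas"},
    "cleaning_crew":       {"grounds", "building", "common_areas"},
}

def is_allowed(role, zone):
    """'Why are you here?' — check the role's allowed zones."""
    return zone in ACCESS_MAP.get(role, set())

print(is_allowed("cleaning_crew", "computer_room"))       # False
print(is_allowed("computer_operations", "computer_room")) # True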

Applying the technology

19.1.1.1 Methods of identification: reliability vs. cost

Methods of identifying people fall into three general categories of increasing reliability — and
increasing equipment cost:

What you have
What you know
Who you are

What you have - Least reliable (can be shared or stolen)


What you have is something you wear or carry — a key, a card, or a small object (a token)
that can be worn or attached to a key ring. It can be as “dumb” as an old fashioned metal key
or as “smart” as a card having an onboard processor that exchanges information with a
reader (a smart card). It can be a card with a magnetic strip of information about you (such
as the familiar ATM card); it can be a card or token having a transmitter and/or receiver that
communicates with the reader from a short distance (a proximity card or proximity token —
Mobil Speedpass® is an example).

What you have is the least reliable form of identification, since there is no guarantee it is
being used by the correct person — it can be shared, stolen, or lost and found.

What you know - More reliable (can’t be stolen, but can be shared or written down)
What you know is a password, code, or procedure for something such as opening a coded
lock, verification at a card reader, or keyboard access to a computer. A password/code
presents a security dilemma: if it’s easy to remember, it will likely be easy to guess; if it’s hard
to remember, it will likely be hard to guess — but it will also likely be written down, reducing
its security.

What you know is more reliable than What you have, but passwords and codes can still be
shared, and if written down they carry the risk of discovery.

Who you are - Most reliable (based on something physically unique to you)
Who you are refers to identification by recognition of unique physical characteristics — this is
the natural way people identify one another with nearly total certainty. When accomplished
(or attempted) by technological means, it’s called biometrics. Biometric scanning techniques
have been developed for a number of human features that lend themselves to quantitative
scrutiny and analysis:

Fingerprint
Hand (shape of fingers and thickness of hand)
Iris (pattern of colors)
Face (relative position of eyes, nose, and mouth)
Retina (pattern of blood vessels)
Handwriting (dynamics of the pen as it moves)
Voice

Figure 2
What you have, what you know, who you are — reliability ("Is it you?") increases from what
you have (keys, cards, tokens), to what you know (passwords, PINs, ID numbers), to who you
are (physical traits).

Biometric devices are generally very reliable, if recognition is achieved — that is, if the device
thinks it recognizes you, then it almost certainly is you. The main source of unreliability for
biometrics is not incorrect recognition or spoofing by an imposter, but the possibility that a
legitimate user may fail to be recognized (“false rejection”).

20.1.1.1 Combining methods to increase reliability
A typical security scheme uses methods of increasing reliability — and expense — in
progressing from the outermost (least sensitive) areas to the innermost (most sensitive)
areas. For example, entry into the building might require a combination of swipe card plus
PIN; entry to the computer room might require a keypad code plus a biometric. Combining
methods at an entry point increases reliability at that point; using different methods for each
level significantly increases security at inner levels, since each is secured by its own methods
plus those of outer levels that must be entered first.

21.1.1.1 Security system management

Some access control devices — card readers and biometric scanners, for example — can
capture the data from access events, such as the identity of people who pass through and
their time of entry. If network-enabled, these devices can provide this information to a remote
management system for monitoring and logging (who's coming and going), device control
(configuring a lock to allow access to certain people at certain times), and alarm (notification
of repeated unsuccessful attempts or device failure).
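
A minimal sketch of the monitoring and alarm behavior just described — logging access events and alerting on repeated unsuccessful attempts. The threshold, field names, and printed alert are illustrative assumptions, not the interface of any particular management system.

from collections import defaultdict
from datetime import datetime

FAILED_ATTEMPT_THRESHOLD = 3   # illustrative value, not from the text

failed_attempts = defaultdict(int)
event_log = []

def record_access_event(door, user, granted):
    """Log the event and raise an alert after repeated failures at a door."""
    event_log.append((datetime.now(), door, user, granted))
    if granted:
        failed_attempts[(door, user)] = 0
        return
    failed_attempts[(door, user)] += 1
    if failed_attempts[(door, user)] >= FAILED_ATTEMPT_THRESHOLD:
        print(f"ALERT: repeated failed attempts by {user} at {door}")

record_access_event("computer_room", "badge_1042", granted=False)
record_access_event("computer_room", "badge_1042", granted=False)
record_access_event("computer_room", "badge_1042", granted=False)  # triggers alert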

Access control devices

22.1.1.1 Cards and tokens: "what you have"

Several types of cards and tokens are currently being used for access control, from simple to
sophisticated, offering a range of performance on various dimensions:

Ability to be reprogrammed
Resistance to counterfeiting
Type of interaction with card reader: swipe, insert, flat contact, no contact ("proximity")
Convenience: physical form and how carried/worn
Amount of data carried
Computational ability
Cost of cards
Cost of readers
Regardless of how secure and reliable they may be due to their technology, the security
provided by these physical “things” is limited by the fact that there is no guarantee the correct
person is using them. It is therefore common to combine them with one or more additional
methods of confirming identity, such as a password or even a biometric.

The magnetic stripe card is the most common type of card, with a simple magnetic strip of
identifying data. When the card is swiped in a reader the information is read and looked up in
a database. This system is inexpensive and convenient; its drawback is that it is relatively
easy to duplicate the cards or to read the information stored on them.
The barium ferrite card (also called a “magnetic spot card”) is similar to the magnetic stripe
card but offers more security without adding significant cost. It contains a thin sheet of
magnetic material with round spots arranged in a pattern. Rather than scanning or swiping,
the card is simply touched to the reader.

The Weigand card is a variation of the magnetic stripe card. A series of specially treated
wires with a unique magnetic signature is embedded in the card. When the card is swiped
through the reader, a sensing coil detects the signature and converts it to a string of bits.
The advantage of this complex card design is that the cards cannot be duplicated; the
disadvantage is they cannot be reprogrammed either. With this technology the card need not
be in direct contact with the reader; the head of the reader can therefore be encapsulated,
making it suitable for outdoor installation. Unlike readers for proximity cards and magnetic-
stripe cards, Weigand readers are not affected by radio frequency interference (RFI) or
electromagnetic fields (EMF). The robustness of the reader combined with the difficulty in
duplicating the card makes the Weigand system extremely secure (within the limits of a “what
you have” method), but also more expensive.

The bar-code card carries a bar code, which is read when the card is swiped in the reader.
This system is very low-cost, but easy to fool — an ordinary copy machine can duplicate a
bar code well enough to fool a bar-code reader. Bar-code cards are good for minimum-
security requirements, especially those requiring a large number of readers throughout the
facility or a large volume of traffic traversing a given access point. This is not so much a
security system as it is an inexpensive access monitoring method. (It has been said that bar-
code access only serves to “keep out the honest people.”)

The infrared shadow card improves upon the poor security of the bar-code card by placing
the bar code between layers of PVC plastic. The reader passes infrared light through the
card, and the shadow of the bar code is read by sensors on the other side.

The proximity card (sometimes called a “prox card”) is a step up in convenience from cards
that must be swiped or touched to the reader. As the name implies, the card only needs to
be in "proximity" with the reader. This is accomplished using RFID (radio frequency
identification) technology, with power supplied to the card by the card reader’s
electromagnetic field. The most popular design works within a distance of about 10 cm. (four
inches) from the reader; another design — called a vicinity card —works up to about a
meter (three feet) away.

The smart card, the most recent development in access control cards, is rapidly becoming
the method of choice for new installations. It is a card with a built-in silicon chip for onboard
data storage and/or computation. Data is exchanged with the reader either by touching the
chip to the reader (contact smart card) or by interacting with the reader from a distance, using
the same technology as proximity and vicinity cards (contactless or proximity smart card).
The chip, which is about a half inch in diameter, doesn’t necessarily have to be on a card — it
can be attached to a photo ID, mounted on a key chain, or worn as a button or jewelry (such
as the iButton® token). The general term for objects that carry such a chip is smart media.

Smart cards offer a wide range of flexibility in access control. For example, the chip can be
attached to older types of cards to upgrade and integrate with pre-existing systems, or the
cardholder’s fingerprint or iris scan can be stored on the chip for biometric verification at the
card reader — thereby elevating the level of identification from “what you have” to “who you
are.” Contactless smart cards having the “vicinity” range offer nearly ultimate user
convenience: half-second transaction time with the card never leaving the wallet.

23.1.1.1 Keypads and coded locks: “what you know”


Keypads and coded locks are in wide use as a method of access control. They are reliable
and very user-friendly, but their security is limited by the sharable and guessable nature of
passwords. They have familiar phone-like buttons where users punch in a code — if the
code is unique to each user it’s called a personal access code (PAC) or personal
identification number (PIN). Keypad generally implies the ability to accept multiple codes,
one for each user; coded lock usually refers to a device having only one code that
everyone uses.

The security level of keypads and coded locks can be increased by periodically changing
codes, which requires a system for informing users and disseminating new codes. Coded
locks that don’t have their code changed will need to have their keypad changed periodically
if a detectable pattern of wear develops on the keys. As with access cards, keypad security
can be increased by adding a biometric to confirm user identity.

24.1.1.1 Biometrics: “who you are”


Biometric technology is developing fast, getting better and cheaper. High-confidence,
affordable biometric verification — especially fingerprint recognition — is entering the
mainstream of security solutions. Many vendors now supply a wide range of biometric
devices, and when combined with traditional "what you have" and "what you know" methods,
biometrics can complement existing security measures to become best practice for access
control.

> Why not use just the biometric?
Q: If an entry point uses card, PIN, plus biometric, why not use just the biometric alone if
biometrics are so reliable?
A: Because (1) biometric processing time can be unacceptable if a large database of user
scans must be searched instead of comparing to the scan of a single user, and (2) the risk of
biometric false rejection or acceptance can be reduced if the scan is compared to only one
user in the database. While biometric traits are nearly impossible to forge, there is still the
risk of incorrect matches by the technology.

Biometric identification is typically used not to recognize identity by searching a database of
users for a match, but rather to verify identity that is first established by a "what you have" or
"what you know" method — for example, a card/PIN is first used, then a fingerprint scan
verifies the result. As performance and confidence in biometric technology increase, it may
eventually become a stand-alone method of recognizing identity, eliminating the need to
carry a card or remember a password.

There are two types of failures in biometric identification:

False rejection — Failure to recognize a legitimate user. While it could be argued that
this has the effect of keeping the protected area extra secure, it is an intolerable
frustration to legitimate users who are refused access because the scanner doesn’t
recognize them.
False acceptance — Erroneous recognition, either by confusing one user with another,
or by accepting an imposter as a legitimate user.

Failure rates can be adjusted by changing the threshold (“how close is close enough”) for
declaring a match, but decreasing one failure rate will increase the other.
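
The threshold trade-off just described can be shown with a small sketch using made-up match scores (purely illustrative, not real biometric data): raising the match threshold lowers false acceptance but raises false rejection, and vice versa.

def failure_rates(genuine_scores, impostor_scores, threshold):
    """False rejection rate (genuine users scoring below the threshold) and
    false acceptance rate (impostors scoring at or above it)."""
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far

# Made-up match scores (0..1); higher = closer match.
genuine  = [0.91, 0.85, 0.78, 0.95, 0.88, 0.70]
impostor = [0.30, 0.55, 0.62, 0.40, 0.75, 0.20]

for t in (0.6, 0.8):
    frr, far = failure_rates(genuine, impostor, t)
    print(f"threshold={t}: FRR={frr:.2f}, FAR={far:.2f}")
# Raising the threshold lowers FAR but raises FRR, and vice versa.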

Considerations in choosing a biometric capability are equipment cost, failure rates (both false
rejection and false acceptance), and user acceptance, which means how intrusive,
inconvenient, or even dangerous the procedure is perceived to be. For example, retinal
scanners are generally considered to have low user acceptance because the eye has to be 1-
2 inches from the scanner with an LED directed into the eye.

Figure 3
Hand scanner

Other security system elements

Security system design focuses on devices to identify and screen individuals at entry points
— "access control" — which is all you would need if there were 100% reliability of
identification, total trustworthiness of the intentions of people admitted, and the physical
perfection of unbreakable walls, doors, windows, locks, and ceilings. To cover for inevitable
failings due to flaws or sabotage, security systems ordinarily incorporate additional methods
of protection, monitoring, and recovery.

25.1.1.1 Building design


When building a new facility or renovating an old one, physical security can be addressed
from the ground up by incorporating architectural and construction features that discourage or
thwart intrusion. Security considerations in the structure and layout of a building generally
relate to potential entry and escape routes, access to critical infrastructure elements such as
HVAC and wiring, and potential sources of concealment for intruders. See the appendix for a
list of some of these design considerations.

26.1.1.1 Piggybacking and tailgating: mantraps


A common and frustrating loophole in otherwise secure access control systems can be the
ability of an unauthorized person to follow through a checkpoint behind an authorized person
(called piggybacking when the authorized person is complicit — i.e., holds the door — or
tailgating if the unauthorized person slips through undetected). The traditional solution is an
airlock-style arrangement called a mantrap having doors at entry and exit, with room for only
one person in the space between the doors. Mantraps can be designed with access control
for both entry and exit, or for exit only — in which case a failed attempt to exit the enclosure
causes the entry door to lock and an alert to be issued indicating that an intruder has been
caught. A footstep-detecting floor can be added to confirm there is only one person passing
through.
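As a rough illustration of that exit-controlled behavior, the sketch below models the interlock with a few simple rules; the class, its method names, and the lockdown/alert behavior are simplified assumptions, not any vendor's controller logic.

# Simplified sketch of an exit-controlled mantrap interlock (hypothetical logic, not a vendor controller).

class Mantrap:
    def __init__(self):
        self.entry_locked = False
        self.alerts = []

    def request_entry(self, credential_ok: bool) -> bool:
        # The entry door opens only for a valid credential and only if the trap is not in lockdown.
        return credential_ok and not self.entry_locked

    def request_exit(self, credential_ok: bool, occupants: int) -> bool:
        # Exit requires a valid credential; a footstep-detecting floor can supply the occupant count.
        if credential_ok and occupants == 1:
            return True
        # Failed exit: lock the entry door and alert security that someone is caught in the trap.
        self.entry_locked = True
        self.alerts.append("intruder caught in mantrap" if not credential_ok
                           else "more than one person detected in mantrap")
        return False

trap = Mantrap()
print(trap.request_entry(credential_ok=True))               # True: authorized person enters
print(trap.request_exit(credential_ok=False, occupants=1))  # False: a tailgater cannot exit
print(trap.request_entry(credential_ok=True))               # False: the trap is now in lockdown
print(trap.alerts)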

A new technology for solving this problem uses an overhead camera for optical tracking and
tagging of individuals as they pass, issuing an alert if it detects more than one person per
authorized entry.

27.1.1.1 Camera surveillance


Still cameras can be used for such things as recording license plates at vehicle entry points,
or in conjunction with footstep sensors to record people at critical locations.

Closed circuit TV (CCTV) cameras — hidden or visible — can provide interior or exterior
monitoring, deterrence, and post-incident review. Several types of camera views can be

used — fixed, rotating, or remotely controlled. Some things to consider when placing
cameras:

Is it important that a person in camera view be easily identifiable?


Is it only necessary to determine if the room is occupied?
Are you watching to see if assets are being removed?
Is the camera simply to serve as a deterrent?

If CCTV signals are recorded, there must be procedures in place to address the following
issues:

How will tapes be indexed and cataloged for easy retrieval?


Will the tapes be stored on site or off site?
Who will have access to the tapes?
What is the procedure for accessing tapes?
How long will the tapes be kept before being destroyed?

New technology is in development to automate a job traditionally done by security guards — watching TV monitors — by software detection of changes (movement) in the image on the screen.

28.1.1.1 Security guards


Despite all the technological advancements in the field of physical security, experts agree
that a quality staff of protection officers tops the list of methods for backing up and supporting
access control. Guards provide the surveillance capability of all the human senses, plus the
ability to respond with mobility and intelligence to suspicious, unusual, or disastrous events.

The International Foundation for Protection Officers (IFPO) is a non-profit organization founded for the purpose of facilitating standardized training and certification of protection officers. Their Security Supervisor Training Manual is a reference guide for protection officers and their employers.

29.1.1.1 Sensors and alarms


Everyone is familiar with traditional house and building alarm systems and their sensors —
motion sensors, heat sensors, contact (door-closed) sensors, and the like. Data center alarm
systems might use additional kinds of sensors as well — laser beam barriers, footstep
sensors, touch sensors, vibration sensors. Data centers might also have some areas where
a silent alarm is preferred over an audible one in order to catch perpetrators “in the act.”

If the sensors are network-enabled, they can be monitored and controlled remotely by a
management system, which could also include personnel movement data from access-control
devices (see earlier section, Security System Management.)

30.1.1.1 Visitors
Handling of visitors must be considered in any security system design. Typical solutions are
to issue temporary badges or cards for low-security areas, and to require escorting for high
security areas. The presence of mantraps (to prevent two people from passing an entry point
with one authorization) would require a provision for a temporary override or for issuance of
visitor credentials to allow passage.

The human element

Technology can’t do the job all by itself, particularly since we are calling upon it to perform what is essentially a very human task: assessing the identity and intent of people. While people are a significant part of the security problem, they are also part of the solution — the abilities and fallibilities of people uniquely qualify them to be not only the weakest link, but also the strongest backup.

31.1.1.1 People: the weakest link


In addition to mistakes and accidents, there is inherent risk in the natural human tendency
toward friendliness and trust. A known person entering the facility could be a disgruntled
employee or a turncoat; the temptation to bend rules or skip procedures for a familiar face
could have disastrous consequences; a significant category of security breach is the “inside
job.” Even strangers can have surprising success overcoming security — the ability of a
clever stranger to use ordinary guile and deceit to gain access is so well documented that it
has a name: social engineering. Anyone in an area where harm could be done must be well
trained not only in operational and security protocols, but also in resistance to creative social
engineering techniques.

32.1.1.1 People: the strongest backup


Protection from a security breach often comes down to the recognition and interpretation of
unexpected factors — a skill in which technology is no match for alert people. Add an
unwavering resistance to manipulation and shortcuts, and human presence can be a
priceless adjunct to technology.

Beyond an alert staff, the incomparable value of human eyes, ears, brains, and mobility also
qualifies people for consideration as a dedicated element in a security plan — the old-
fashioned security guard. The presence of guards at entry points and roving guards on the
grounds and inside the building, while expensive, can save the day when there is failure or
hacking of technological security. The quick response of an alert guard when something
“isn’t right” may be the last defense against a potentially disastrous security breach.

In protecting against both accidental and deliberate harm, the human contribution is the
same: constant vigilance and strict adherence to protocols. Having kept out all but those
essential to the operation of the facility, the remaining staff — well trained, following well-
designed practices and procedures — is the final firewall of an effective physical security
system.

Choosing the right solution: risk tolerance vs. cost

The right security system is a best-guess compromise that balances the risk and potential damage from people being in the wrong place against the expense and nuisance of security measures to keep them out.

33.1.1.1 Potential cost of a security breach


While each data center has its own unique characteristics and potential for loss, most will
have something to consider in these general categories:

Physical loss — Damage to rooms and equipment from accidents, sabotage, or outright
theft.
IT productivity loss — Diversion of staff from primary duties while equipment is repaired
or replaced, data is reconstructed, or systems are cleared of problems.

Corporate productivity loss — Interruption of business due to downtime.
Information loss — Loss, corruption, or theft of data.
Loss of reputation and customer goodwill — Consequences from serious or repeated
security breaches: loss of business, drop in stock value, lawsuits.

34.1.1.1 Considerations in security system design


Security system design can be a complicated equation with many variables. While specific
strategies for security system design are beyond the scope of this paper, any design will
likely consider these issues:

Cost of equipment — Budget constraints ordinarily limit the extensive use of high-confidence identification equipment. The usual approach is to deploy a range of techniques appropriate to various security levels.
Combining of technologies — The reliability of identification at any level can be increased by combining lower-cost technologies, with the innermost level enjoying the combined protection of all the outer concentric perimeters that contain it.
User acceptance — (The “nuisance” factor). Ease of use and reliability of identification are important in preventing the system from becoming a source of frustration and a temptation for subversion.
Scalability — Can the design be implemented incrementally as necessity, funding, and confidence in the technology increase?
Backwards compatibility — Is the new design compatible with elements of an older system already in place? Keeping all or part of an existing system can significantly reduce deployment cost.

> You can’t buy your way out
Even if expense were of no concern, blanketing the facility with the highest security would, in most cases, be unacceptably intrusive and inconvenient. Each area to be protected must be realistically assessed for security needs based on what is in it and who needs access.

Figure 4 – Balancing potential loss against known cost of security

POTENTIAL LOSS from people being in the wrong place: loss from damage or theft; data loss or corruption; damage to reputation or customer goodwill; productivity loss during downtime.
KNOWN COST of keeping people out: initial cost of security equipment; maintenance of security equipment; day-to-day inconvenience of security protocols.
Conclusion

As data centers and web hosting sites proliferate, the need for physical security at the facility is every bit as great as the need for cybersecurity of networks. Intruders who falsify their identity or intentions can cause enormous damage, from physically disabling critical equipment to launching a software attack at an unsecured keyboard. Even the ordinary mistakes of well-intentioned staff pose a significant daily threat to operations, and can be minimized by restricting access to only the most essential personnel.

Technologies are in place, and getting less expensive, to implement broad range solutions
based on the identification principles of What you have, What you know, and Who you
are. By combining an assessment of risk tolerance with an analysis of access requirements
and available technologies, an effective security system can be designed to provide a
realistic balance of protection and cost.

When building a new facility or renovating an old one, physical security can be addressed
from the ground up by incorporating architectural and construction features that discourage or
thwart intrusion. Security considerations in the structure and layout of a building generally
relate to potential entry and escape routes, access to critical infrastructure elements such as
HVAC and wiring, and potential sources of concealment for intruders.

Position the data center door in such a way that only traffic intended for the data center
is near the door.
Use steel doors and frames, with solid doors instead of hollow-core. Make sure that
hinges cannot be removed from the outside.
Data center walls should use materials sturdier than the typical sheetrock used for interior walls. Sensors can be embedded in the walls to detect tampering.
The room used for the data center should not abut any outside walls.
Allow long and clear lines of sight for any security stations or cameras within the data
center.
Make use of barriers to obstruct views of the entrances and other areas of concern from
the outside world. This prevents visual inspection by people who wish to study the
building layout or its security measures.
Be aware of the placement of ventilation ducts, service hatches, vents, service elevators
and other possible openings that could be used to gain access. Tamper-proof grills
should be installed on all such openings that exceed 12 inches in width, to prevent hu-
man entry.
Avoid creating spaces that can be used to hide people or things. For example, the
space beneath raised floors could be a hiding place. Make sure that potential hiding
places are secured and not easily noticed by someone walking through the facility.
Install locks and door alarms to all roof access points so that security is notified
immediately upon attempted access. Avoid points of entry on the roof whenever
possible.
Take note of all external plumbing, wiring, HVAC, etc., and provide appropriate protection. If left in plain sight or unprotected, these infrastructure components can be used to sabotage the facility without having to disable security measures.
Eliminate access to internal runs of wire, plumbing and ventilation ducts inside the
facility. You may have a data center thoroughly secured, but if a person walking down a
corridor can gain access to a run of power cabling or data cabling, the data center is
compromised.
Consider the placement of the data center within the building when retrofitting an
existing facility or constructing a new data center within an existing structure. Avoid
vulnerable locations or man-made risks. For example, avoid placing a data center
underneath or adjacent to kitchen facilities, manufacturing areas with large machinery,
parking lots, or any area with frequent traffic or vehicular access. Anything from kitchen
fires to car bombs to traffic accidents can pose a threat.

Datacenter Monitoring & Management
Monitoring Physical Threats in the Data Center
Introduction

Today’s common techniques for monitoring the data center environment date from the days of centralized mainframes, and include such practices as walking around with thermometers and relying on IT personnel to “feel” the environment of the room. But as data centers continue to evolve with distributed processing and server technologies that are driving up power and cooling demands, the environment must be looked at more closely.

Rising power density and dynamic power variations are the two main drivers forcing changes
in the monitoring methodology of IT environments. Blade servers have tremendously
increased power densities and dramatically changed the power and cooling dynamics of the
surrounding environments. Power management technologies have pushed the ability of
servers and communication equipment to vary power draw (and therefore heat dissipation)
based on computational load.

Although it is common to have sophisticated monitoring and alerting capabilities in physical equipment such as the uninterruptible power supply (UPS), computer room air conditioner (CRAC), and fire suppression systems, other aspects of the physical environment are often ignored. Monitoring of equipment is not enough – the surrounding environment must be viewed holistically and watched proactively for threats and intrusions. Such threats include excessive server intake temperatures, water leaks, and unauthorized human access to the data center or inappropriate actions by personnel in the data center.

Remote network locations such as branch offices, data rooms, and local point-of-sale
locations further highlight the need for automated monitoring, where it is impractical and
unreliable to have people physically present to check conditions such as temperature and
humidity. With the introduction of unmanned network outposts, IT administrators must have
reliable systems in place to know what is going on.

With today’s technologies, monitoring systems can be configured to a level of detail that
meets the data center’s particular environmental and security demands – each rack can be
considered a mini “data center” with its own requirements, with a monitoring strategy that
may include multiple data collection points.

This paper discusses physical threats that can be mitigated by distributed monitoring
strategies, and offers guidelines and best practices for implementing sensors in the data
center. It also discusses the use of data center design tools to simplify the specification and
design process of these distributed monitoring systems.

What are distributed physical threats?

This paper addresses a subset of threats – distributed physical threats – that are of particular interest because they require deliberate and expert design to defend against them. To identify that subset, it will be helpful to briefly characterize the range of threats to the data center.

Data center threats can be classified into two broad categories, depending on whether they
are in the realm of IT software and networking (digital threats) or in the realm of the data
center’s physical support infrastructure (physical threats).

35.1.1.1 Digital threats
Digital threats are such things as hackers, viruses, network bottlenecks, and other accidental
or malicious assaults on the security or flow of data. Digital threats have a high profile in the
industry and the press, and most data centers have robust and actively maintained systems,
such as firewalls and virus checkers, to defend against them.

36.1.1.1 Physical threats


Physical threats to IT equipment include such things as power and cooling problems, human
error or malice, fire, leaks, and air quality. Some of these, including threats related to power
and some related to cooling and fire are routinely monitored by built-in capabilities of power,
cooling, and fire suppression devices. For example, UPS systems monitor power quality,
load, and battery health; PDUs monitor circuit loads; cooling units monitor input and output
temperatures and filter status; fire suppression systems – the ones that are required by
building codes – monitor the presence of smoke or heat. Such monitoring typically follows
well understood protocols automated by software systems that aggregate, log, interpret, and
display the information. Threats monitored in this way, by pre-engineered functionality
designed into the equipment, do not require any special user expertise or planning in order to
be effectively managed, as long as the monitoring and interpretation systems are well
engineered. These automatically-monitored physical threats are a critical part of a
comprehensive management system, but are not the subject of this paper.

However, certain kinds of physical threats in the data center – and they are serious ones – do
not present the user with pre-designed, built-in monitoring solutions. For example, the threat
of poor humidity levels can be anywhere in the data center, so the number and placement of
humidity sensors is an important consideration in managing that threat. Such threats can
potentially be distributed anywhere throughout the data center, at variable locations
that are particular to room layout and equipment positioning. The distributed physical
threats covered by this paper fall into these general categories:

Air quality threats to IT equipment (temperature, humidity)
Liquid leaks
Human presence or unusual activity
Air quality threats to personnel (foreign airborne substances)
Smoke and fire from data center hazards ²

Figure 1 illustrates the distinction between digital and physical threats, and the further
distinction in physical threats between those with pre-engineered equipment-based
power/cooling monitoring and – the subject of this paper – distributed physical threats that
require assessment, decisions, and planning to determine the type, location, and number of
monitoring sensors. It is this latter type of physical threat that may risk neglect because of
lack of knowledge and expertise in designing an effective monitoring strategy.

2. Basic room smoke/fire detection required by building codes is governed by specific legal and safety regulations, and is not the subject of this paper. This paper covers supplemental smoke detection particular to hazards in the data center, beyond what is required by building codes.

Table 1 summarizes distributed physical threats, their impact on the data center, and the
types of sensors used to monitor them.

Table 1 – Distributed physical threats

Air temperature
  Definition: Room, rack, and equipment air temperature above specification and/or drastic temperature changes
  Impact on data center: Equipment failure and reduced equipment life span from temperature
  Types of sensors: Temperature sensors

Humidity
  Definition: Room and rack relative humidity at specific temperature
  Impact on data center: Equipment failure from static electricity buildup at low humidity points; condensation formation at high humidity points
  Types of sensors: Humidity sensors

Liquid leaks
  Definition: Water or coolant leaks
  Impact on data center: Liquid damage to floors, cabling and equipment; indication of CRAC problems
  Types of sensors: Rope leak sensors; spot leak sensors

Human error and personnel access
  Definition: Unintentional wrongdoing by personnel; unauthorized and/or forced entry into the data center with malicious intent
  Impact on data center: Equipment damage and data loss; equipment downtime; theft and sabotage of equipment
  Types of sensors: Digital video cameras; motion sensors; rack switches; room switches; glass-break sensors; vibration sensors

Smoke / Fire
  Definition: Electrical or material fire
  Impact on data center: Equipment failure; loss of assets and data
  Types of sensors: Supplemental smoke sensors

Hazardous airborne contaminants
  Definition: Airborne chemicals such as hydrogen from batteries and particles such as dust
  Impact on data center: Dangerous situation for personnel and/or UPS unreliability and failure from release of hydrogen; equipment failure from increased static electricity and clogging of filters/fans from dust buildup
  Types of sensors: Chemical / hydrogen sensors; dust sensors

Sensor placement

Various types of sensors can be used to provide early warning of trouble from the threats described above. While the specific type and number of sensors may vary depending upon budget, threat risk, and the business cost of a breach, there is a minimum essential set of sensors that makes sense for most data centers. Table 2 shows guidelines for this basic recommended set of sensors.

Table 2 – Guidelines for basic sensors

Temperature sensors
  Location: Rack
  General best practice: At top, middle, and bottom of the front door of each IT rack, to monitor inlet temperature of devices in rack
  Comments: In wiring closets or other open rack environments, temperature monitoring should be as close as possible to equipment inlets
  Applicable industry guidelines: ASHRAE Guidelines ³

Humidity sensors
  Location: Row
  General best practice: One per cold aisle, at the front of a rack in the middle of the row
  Comments: Since CRAC units provide humidity readings, location of row-based humidity sensors may need to be adjusted if too close to CRAC output
  Applicable industry guidelines: ASHRAE Guidelines

Rope leak sensors / spot leak sensors
  Location: Room
  General best practice: Leak rope placement around each CRAC system, around cooling distribution units, under raised floors, and at any other leak source (such as pipes)
  Comments: Spot leak sensors for monitoring fluid overflows in drip pans, monitoring in smaller rooms / closets and at any low spots
  Applicable industry guidelines: No industry standard

Digital video cameras
  Location: Room and row
  General best practice: Strategically placed according to data center layout, covering entry / exit points and a good view of all hot and cold aisles; ensure complete required field of view is covered
  Comments: Monitoring and recording of normal access as well as unauthorized or after-hours access with video surveillance software
  Applicable industry guidelines: No industry standards

Room switches
  Location: Room
  General best practice: Electronic switch at every entry door to provide audit trail of room access, and to limit access to specific people at specific times
  Comments: Integrating room switches into the facility system may be desirable and can be achieved through a communications interface
  Applicable industry guidelines: HIPAA and Sarbanes-Oxley ⁴

3. ASHRAE TC9.9 Mission Critical Facilities, Thermal Guidelines for Data Processing Environments, 2004.
4. CSO Fiona Williams, Deloitte & Touche security services, says “Physical security does fall under the Sarbanes-Oxley requirements. It is a critical component of the infosec program as well as general computer controls. It falls within sections 302 and 404, which require that management evaluate and assert that the internal controls are operating effectively.” http://www.csoonline.com/read/100103/counsel.html (accessed on March 5, 2010)

In addition to the essential sensors shown in Table 2, there are others that can be considered
optional, based on the particular room configuration, threat level, and availability
requirements. Table 3 lists these additional sensors along with best practice guidelines.
Table 3 – Guidelines for additional, situation-dependent sensors

Supplemental smoke sensors
  Location: Rack
  General best practice: Rack-level “very early smoke detection” (VESD) to provide advanced warning of problems in highly critical areas or areas without dedicated smoke sensors ⁵
  Comments: When rack-level supplemental smoke detection exceeds budget, placing VESD on the input of each CRAC provides some degree of early warning
  Applicable industry guidelines: No industry standards

Chemical / hydrogen sensors
  Location: Room
  General best practice: When VRLA batteries are located in the data center, it is not necessary to place hydrogen sensors in the room because they do not release hydrogen in normal operation (as wet cell batteries do)
  Comments: Wet cell batteries in a separate battery room are subject to special code requirements
  Applicable industry guidelines: Draft IEEE / ASHRAE guide ⁶

Motion sensors
  Location: Room and row
  General best practice: Used when budget constraints don’t allow for digital camera installation, which is best practice (see Table 2)
  Comments: Motion sensors are a lower cost alternative to digital video cameras for monitoring human activity
  Applicable industry guidelines: No industry standards

Rack switches
  Location: Rack
  General best practice: In high traffic data centers, electronic switches on the front and rear door of every rack to provide audit trail of access and to limit critical equipment access to specific people at specific times
  Comments: Integrating rack switches into the facility system may be desirable and can be achieved through a communications interface
  Applicable industry guidelines: HIPAA and Sarbanes-Oxley

Vibration sensors
  Location: Rack
  General best practice: In high traffic data centers, vibration sensor in each rack to detect unauthorized installation or removal of critical equipment
  Comments: Vibration sensors in each rack can also be used to sense when people move racks
  Applicable industry guidelines: No industry standards

Glass-break sensors
  Location: Room
  General best practice: Glass-break sensor on every data center window (either external, or internal to hallway or room)
  Comments: Best if used in conjunction with video surveillance cameras
  Applicable industry guidelines: No industry standards

5. Assumes the existence of a separate fire detection system to meet building codes.
6. IEEE/ASHRAE, Guide for the Ventilation and Thermal Management of Stationary Battery Installations, draft out for ballot later in 2006.

Aggregating sensor data

With the sensors selected and placed, the next step is the collection and analysis of the data received by the sensors. Rather than send all sensor data directly to a central collection point, it is usually better to have aggregation points distributed throughout the data center, with alert and notification capabilities at each aggregation point. This not only eliminates the single-point-of-failure risk of a single central aggregation point, but also supports point-of-use monitoring of remote server rooms and telecom closets.⁷ The aggregators communicate, through the IP network, with a central monitoring system (Figure 2).

7. This architecture of multiple aggregators, each with alert and notification capability for the sensors it supports, is sometimes called “distributed intelligence at the edge.”

Figure 2 – Aggregating the sensor data. Temperature sensors, humidity sensors, fluid sensors, door-open switches, digital video cameras, and glass-break sensors feed local aggregators placed throughout the room; the aggregators communicate over the IP network with the central monitoring system.

Individual sensors do not typically connect individually to the IP network. Instead, the
aggregators interpret the sensor data and send alerts to the central system and/or directly to
the notification list (see next section). This distributed monitoring architecture dramatically
reduces the number of network drops required and reduces the overall system cost and
management burden. Aggregators are typically assigned to physical areas within the data
center and aggregate sensors from a limited area in order to limit sensor wiring complexity.
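To make this architecture concrete, the sketch below shows one way an aggregator assigned to an area might poll its local sensors, evaluate simple thresholds at the edge, and forward only alerts to the central monitoring system over IP. The sensor names, thresholds, and endpoint URL are assumptions for illustration, not part of any particular monitoring product.

# Sketch of an aggregator with intelligence at the edge: it evaluates local sensor readings
# and forwards only out-of-bounds alerts to the central monitoring system over the IP network.
# All names, thresholds, and the endpoint URL below are hypothetical.
import json
import urllib.request

CENTRAL_MONITOR_URL = "http://monitor.example.internal/alerts"   # assumed endpoint, not a real system

def read_local_sensors():
    # In a real deployment these values come from the wired sensors in this aggregator's area.
    return {"rack12-inlet-top-temp-F": 82.5, "coldaisle3-humidity-pct": 43.0}

THRESHOLDS = {"rack12-inlet-top-temp-F": ("max", 77.0), "coldaisle3-humidity-pct": ("min", 40.0)}

def evaluate(readings):
    # Edge processing: only readings that violate a threshold become alerts.
    alerts = []
    for sensor, value in readings.items():
        kind, limit = THRESHOLDS[sensor]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            alerts.append({"sensor": sensor, "value": value, "limit": limit, "area": "row 3"})
    return alerts

def forward(alerts):
    # Only alerts leave the aggregator, which keeps network drops and traffic low.
    for alert in alerts:
        req = urllib.request.Request(CENTRAL_MONITOR_URL, data=json.dumps(alert).encode(),
                                     headers={"Content-Type": "application/json"})
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            print("central monitor unreachable; queue alert locally:", alert)

forward(evaluate(read_local_sensors()))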

“Intelligent” action

Sensors supply the raw data, but equally important is the interpretation of this data to perform alerting, notification, and correction. As monitoring strategies become more sophisticated and sensors proliferate throughout the well-monitored data center, “intelligent” processing of this potentially large amount of data is critical. The most effective and efficient way to collect and analyze sensor data and trigger appropriate action is through the use of “aggregators” as described in the previous section.

It is essential to be able to filter, correlate, and evaluate the data to determine the best course
of action when out-of-bounds events occur. Effective action means alerting the right people,
via the right method, with the right information. Action is taken in one of three ways:

Alerting on out-of-bounds conditions that could threaten specific devices, racks, or the data center as a whole
Automatic action based on specified alerts and thresholds
Analysis and reporting to facilitate improvements, optimization, and fault / failure measurements

37.1.1.1 Alerting
There are three things to establish when setting alerts: alarm thresholds – at what value(s)
should the alarms trigger; alerting methods – how the alert should be sent and to whom;
and escalation – do certain types of alarms require a different level of escalation to resolve?

Alarm thresholds – For each sensor, acceptable operating conditions should be determined
and thresholds configured to produce alarms when readings exceed those operating
conditions. Ideally, the monitoring system should have the flexibility to configure multiple
thresholds per sensor in order to alert at informational, warning, critical, and failure levels. In addition to single-value thresholds, there should be triggering conditions such as over-threshold for a specified amount of time, rate of increase, and rate of decrease. In the case
of temperature, alerting on rate of change provides a quicker indication of failure than a
snapshot temperature value.

Thresholds must be set carefully to ensure maximum usefulness. There may be different
thresholds that cause different alerts based on the severity of the incident. For example, a
humidity threshold event might result in an email to the IT administrator, whereas a smoke
sensor might trigger an automatic call to the fire department. Likewise, different threshold
levels will warrant different escalation paths. For example, an unauthorized rack access event
might escalate to the IT administrator whereas a forced entry event might escalate to the IT
director.

Thresholds should be globally set to default values, and then individually adjusted based on IT equipment specifications and the sensor mounting location relative to equipment location (for example, a sensor located close to a server power supply should alarm at a higher value than a sensor located close to the air inlet of a server). Table 4 lists suggested default thresholds for temperature and humidity, based on ASHRAE TC9.9.⁸ In addition to these thresholds, it is important to monitor the rate of change of temperature. A temperature change of 10 °F (5.6 °C) in a 5-minute period is a likely indication of a CRAC failure.

8. ASHRAE TC9.9 recommendation for class 1 environments, which are the most tightly controlled and would be most appropriate for data centers with mission critical operations.

Table 4 – Suggested temperature and humidity sensor thresholds

Air temperature: high threshold 77 °F (25 °C); low threshold 68 °F (20 °C)
Humidity: high threshold 55% relative humidity; low threshold 40% relative humidity
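A minimal sketch of how software might apply these defaults together with the rate-of-change rule is shown below; the thresholds follow Table 4 and the 10 °F / 5-minute rule above, while the function and sample readings are hypothetical.

# Sketch of threshold and rate-of-change checks for a temperature sensor,
# using the Table 4 defaults and the 10 °F / 5-minute rule described above.
HIGH_TEMP_F, LOW_TEMP_F = 77.0, 68.0          # Table 4 defaults for air temperature
RATE_LIMIT_F, RATE_WINDOW_MIN = 10.0, 5.0     # rapid change suggesting a CRAC failure

def check_temperature(samples):
    """samples: list of (minutes_elapsed, temp_F) readings, oldest first."""
    alarms = []
    latest_time, latest_temp = samples[-1]
    if latest_temp > HIGH_TEMP_F:
        alarms.append(f"high temperature: {latest_temp} F")
    elif latest_temp < LOW_TEMP_F:
        alarms.append(f"low temperature: {latest_temp} F")
    # Rate-of-change check over the trailing window
    for t, temp in samples:
        if latest_time - t <= RATE_WINDOW_MIN and abs(latest_temp - temp) >= RATE_LIMIT_F:
            alarms.append("temperature changed 10 F or more within 5 minutes - possible CRAC failure")
            break
    return alarms

# Example: a 12 F rise in 4 minutes trips both the high-temperature and rate-of-change alarms.
print(check_temperature([(0, 70.0), (2, 74.0), (4, 82.0)]))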

Alerting methods – Alert information can be dispatched in a variety of different ways such as email, SMS text messages, SNMP traps, and posts to HTTP servers. It is important that the alerting systems be flexible and customizable so that the right amount of information is successfully delivered to the intended recipient. Alert notifications should include information such as the user-defined name of the sensor, sensor location, and date/time of alarm.

Alert escalation – Some alarms may require immediate attention. An intelligent monitoring
system should be able to escalate specific alarms to higher levels of authority if the issue is
not resolved within a specified amount of time. Alert escalation helps to ensure that problems
are addressed on a timely basis, before small issues cascade into larger issues.
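The sketch below ties these ideas together, assuming hypothetical contacts, dispatch methods, and timing: an alert carries the sensor name, location, and timestamp, is dispatched according to its severity, and is escalated to the next contact if it remains unacknowledged.

# Sketch of severity-based alert dispatch with time-based escalation (all names and values hypothetical).
from datetime import datetime

ESCALATION_CHAIN = {                       # who gets the alert, in order, per severity
    "warning":  [("email", "it-admin@example.com")],
    "critical": [("sms", "+1-555-0100"), ("sms", "+1-555-0199")],   # administrator, then IT director
}
ESCALATION_MINUTES = 15                    # promote to the next contact if not acknowledged in time

def build_alert(sensor_name, location, severity, value):
    # Include the user-defined sensor name, its location, and the date/time of the alarm.
    return {"sensor": sensor_name, "location": location, "severity": severity,
            "value": value, "time": datetime.now().isoformat(timespec="seconds")}

def dispatch(alert, minutes_unacknowledged=0):
    chain = ESCALATION_CHAIN[alert["severity"]]
    level = min(minutes_unacknowledged // ESCALATION_MINUTES, len(chain) - 1)
    method, target = chain[level]
    print(f"[{method} -> {target}] {alert['severity'].upper()}: "
          f"{alert['sensor']} at {alert['location']} read {alert['value']} at {alert['time']}")

alert = build_alert("rack 12 inlet temp (top)", "row 3, rack 12", "critical", "84 F")
dispatch(alert)                              # first notification goes to the administrator
dispatch(alert, minutes_unacknowledged=20)   # unresolved after 15 minutes: escalate to the director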

The following are examples of both useful and not-so-useful alerts:

Temperature sensor #48 is over threshold – Not very useful since it doesn’t indicate where
sensor #48 is located

Web server X is in danger of overheating – More useful since the specific server is identified

Door sensor has been activated – Not very useful since the specific door was not identified

Door X at location Y has been opened, and a picture of the person opening the door was
captured – Very useful since it includes the door identification, door location, and a photograph of the incident

38.1.1.1 Acting on data


Collecting sensor data is only the first step, and if the data center manager relies on manual
response alone, the data will not be leveraged to maximum advantage. There are systems
available that act automatically based on user-specified alerts and thresholds. In order to
implement such “smart” automation, the following must be assessed:

Alert actions – Based on the severity level of an alert, what automated actions should take
place? These automated actions could be personnel notifications, or they could be corrective
actions such as triggering dry contact points to turn on or off devices such as fans or pumps.

Ongoing real-time visibility of sensor data – The ability to view individual sensor “snapshot” readings is a basic requirement. However, the ability to view individual sensor trends in
real time provides a much better “picture” of the situation. Interpretation of these trends
allows administrators to detect broader issues and correlate data from multiple sensors.

Alerting systems should provide more than just basic threshold violation notifications. For
example, some monitoring systems allow administrators to include additional data with the
alerts. This additional data might be captured video, recorded audio, graphs, and maps. A
rich alerting system of this type allows administrators to make more informed decisions
because of the contextual data included with the alert. In some cases, too much information
may need to be distilled to what is useful. For example, in a high-traffic data center, it would
be a nuisance to have an alert every time there was motion in the data center. There may be
instances where certain information is blocked out or “masked” in the interest of security. For
example, a video including the view of a keyboard could block out individuals typing passwords.
The following are examples of “intelligent” interpretation and action:

On a temperature threshold breach, automatically turn on a fan or CRAC


Remotely provide access to specific racks with electronic door locks, based on whose
face is on real-time video surveillance
When water is detected in a remote data center, automatically turn on a sump pump
When motion is detected in the data center after normal hours of operation,
automatically capture video and alert the security guards

When a glass break is detected after hours, notify security guards and sound audible
alarm
When a door switch indicates that a rack door has been open for more than 30 minutes
(indicating the door was not closed properly) send alarm to administrator to check the
door
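A rule engine of the kind implied by the examples above can be sketched very simply: each rule pairs a condition on an incoming event with an automated action such as closing a dry contact or notifying the guards. The event fields, rules, and action functions below are hypothetical placeholders.

# Minimal sketch of rule-based automated action on sensor events (hypothetical rules and actions).

def turn_on_dry_contact(device):           # e.g., a fan, pump, or spare CRAC
    print(f"dry contact closed: {device} switched on")

def notify(who, message):
    print(f"notify {who}: {message}")

RULES = [
    # (condition on the event, action to take)
    (lambda e: e["type"] == "temperature" and e["value"] > 77,
     lambda e: turn_on_dry_contact("supplemental fan")),
    (lambda e: e["type"] == "leak" and e["detected"],
     lambda e: turn_on_dry_contact("sump pump")),
    (lambda e: e["type"] == "motion" and e["after_hours"],
     lambda e: notify("security guards", f"after-hours motion near {e['location']}")),
]

def handle_event(event):
    for condition, action in RULES:
        if condition(event):
            action(event)

handle_event({"type": "leak", "detected": True, "location": "under CRAC 2"})
handle_event({"type": "motion", "after_hours": True, "location": "row 3"})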

39.1.1.1 Analysis and reporting


Intelligent monitoring systems should include not only short term trending of sensor data, but
also long term historical data as well. Best-of-breed monitoring systems should have access
to sensor readings from weeks, months, or even years past and provide the ability to produce
graphs and reports of this data. The graphs should be able to present multiple types of
sensors on the same report for comparison and analysis. The reports should be able to
provide low, high, and average sensor readings in the selected time frame across various
groups of sensors.

Long term historical sensor information can be used in a variety of ways – for example, to
illustrate that the data center is at capacity not because of physical space, but due to
inadequate cooling. Such information could be used to extrapolate future trends as more and
more equipment is added to a data center, and could help predict when the data center will
reach capacity. Long term trending analysis could be used at the rack level to compare how
equipment from different manufacturers in different racks produce more heat or run cooler,
which may influence future purchases.

Sensor readings captured by the monitoring system should be exportable to industry-standard formats, enabling the data to be used in off-the-shelf as well as custom reporting and analysis programs.
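As an illustration of that export requirement, the sketch below writes a sensor's history to CSV (an industry-standard format) and computes the low, high, and average readings for the period; the data and file name are made up.

# Sketch of exporting sensor history to CSV and summarizing low/high/average readings.
# The readings and file name are illustrative only.
import csv

history = [                                   # (timestamp, sensor name, value in F)
    ("2010-03-01T09:00", "rack12-inlet-top", 73.5),
    ("2010-03-01T10:00", "rack12-inlet-top", 75.1),
    ("2010-03-01T11:00", "rack12-inlet-top", 78.4),
]

with open("sensor_history.csv", "w", newline="") as f:   # standard format usable by other tools
    writer = csv.writer(f)
    writer.writerow(["timestamp", "sensor", "temperature_F"])
    writer.writerows(history)

values = [v for _, _, v in history]
print(f"low={min(values)} F  high={max(values)} F  average={sum(values)/len(values):.1f} F")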

Design method

While the specification and design of a threat monitoring system may appear complex, the process can be automated with data center design tools such as APC’s InfraStruXure Designer. Design tools such as this allow the user to input a simple list of preferences, and can automatically locate the appropriate number of sensors and aggregation devices. Summary reports provide parts lists and installation instructions for the recommended sensors. These data center design tools use algorithms and established rules based on best practices and industry standards to recommend specific configurations based on density, room layout, room access policies, and user-specific monitoring requirements.

For example, the following user-specified preferences might influence the design of the threat
monitoring system, based on the level of data center traffic and access:

High traffic / access – If the data center is accessed by many individuals, each with
different applications and functions in the data center, the design tool would suggest
rack switches on every rack to allow access only to individuals needing access to the
respective racks.
Low traffic / access – If the data center is accessed by a select few individuals, each
with responsibility for all data center functions, the design tool would not suggest rack
switches to control access to separate racks; rather, a room door switch would be
sufficient to limit access to the room by other individuals.
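The traffic-based preference above can be expressed as a simple mapping from inputs to a recommended sensor set. The sketch below is a toy illustration of such a rule, not the logic of any particular design tool.

# Toy sketch of a design-tool rule: map room preferences to a recommended access-sensor set.
# The rule mirrors the high/low traffic guidance above; it is not any product's actual algorithm.

def recommend_access_sensors(traffic: str, rack_count: int) -> list:
    recommendations = ["room door switch"]            # baseline access control for the room
    if traffic == "high":
        # Many individuals with different functions: control access rack by rack.
        recommendations += [f"rack switch on rack {n}" for n in range(1, rack_count + 1)]
    return recommendations

print(recommend_access_sensors("low", rack_count=10))   # room switch only
print(recommend_access_sensors("high", rack_count=3))   # room switch plus per-rack switches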

Sample sensor layout
A sample data center layout is shown in Figure 3, illustrating where monitoring devices would be located based on the best
practices described in this paper.

Conclusion
Safeguarding against distributed physical threats is crucial to a comprehensive security strategy. While the placement and
methodology of sensing equipment requires assessment, decision, and design, best practices and design tools are available to
assist in effective sensor deployment.

In addition to proper type, location, and number of sensors, software systems must also be in place to manage the collected data
and provide logging, trend analysis, intelligent alert notifications, and automated corrective action where possible.

Understanding the techniques for monitoring distributed physical threats enables the IT administrator to fill critical gaps in overall
data center security, and to keep physical security aligned with changing data center infrastructure and availability goals.

Data Center Commissioning
Introduction

When building a new data center, the owner of the data center has no guarantee that the various physical infrastructure subsystems – power, cooling, fire suppression, security, and management – will work together. Commissioning is the process that reviews and tests the data center’s physical infrastructure design as a holistic system in order to assure the highest level of reliability.

Traditional commissioning is a daunting task. Since formal system operation doesn’t begin
until the system is commissioned, the commissioning team experiences intense pressure to
complete the commissioning process quickly. Commissioning can involve high expense and
requires staffs from different departmental disciplines to work together. For these reasons
data center commissioning has almost exclusively been associated with large data centers (over 20,000 ft² or 1,858 m²). In the recent past, many data center managers chose to roll the
dice and perform little or no commissioning, relying only on start-up data to press ahead with
launching the new data center. Given the reality of 24x7 operations, however, the alternative
of exposure to major system failures and accompanying downtime is no longer an
economically viable option. Commissioning has now become a business necessity.

Data center project phases: Prepare, Design, Acquire, Implement

Figure 1 – Data center design / build project process, with the commissioning step falling within the Implement phase

Placed in the context of an entire data center design / build project, the commissioning step is
part of the implementation phase (see Figure 1). Within the implementation phase,
commissioning comes after the physical infrastructure systems have been delivered,
assembled, installed, and individually started up. Once commissioning is complete, formal
orientation and training of data center staff can begin.

Definition of commissioning

Commissioning is defined as a reliability science that documents and validates the result of a data center’s design / build process. The roots of commissioning can be traced to the many independent equipment vendors who, over the last 10 years, provided “start-up” services after having installed their particular data center system component. Each start-up process was driven principally by contractual requirements that operated in a vacuum, independent of other components. The general contractor hired various equipment vendors to supply and install their products. These vendors were guided by a construction installation schedule. When each vendor completed their particular product installation, they requested a certificate of completion from the construction manager. The certificate served as proof that the contracted systems were installed and made operational, and only then was the vendor’s request for payment authorized. However, no contractual requirement existed for the disparate products to perform in a fully integrated manner.
The practice of commissioning an entire data center developed when standard equipment
start-up procedures consistently failed to identify system-wide weaknesses (see Figure 2). A
data center manager who avoids the time and expense of commissioning has no ability to
effectively judge the data center’s ability to handle the intended critical loads.

Figure 2 – Product-focused start-up ignores the proper integration of key subsystems. Integrated commissioning spans security, management, cooling, fire suppression, power, and the physical building & grounds; component start-up covers individual elements such as the UPS, generator, by-pass, circuit breakers, and other systems.

Detailed commissioning is most often performed for medium and large “green field” (new)
data centers. Smaller data centers with mission critical applications can also improve overall
data center performance from proper commissioning, although cost may be a factor.

A supplemental resource for companies considering data center commissioning is ASHRAE Guideline 0 – The Commissioning Process. This document provides an overview of commissioning, description of each commissioning phase, requirements for acceptance of each phase, requirements for documentation of each phase, and requirements for training of operation and maintenance personnel.

Outputs of commissioning

The knowledge gained from the commissioning exercise should be documented. The following three documents need to be produced if the commissioning process is to yield some tangible benefits:

1. “As built” script report
2. Component error log report
3. Trending report

40.1.1.1 “As built” script report
The “as built” script report highlights the specific system components tested, describes what
kinds of tests were performed, and provides a line by line account of how each component
either passed or failed the test. The “as built” script serves as an important reference
document when, in the future, failure analysis is performed. The example below outlines the
“as built” script report content:

Figure 3 – Sample “as built” script report outline

1. Data center description
   A. size in sq ft / sq meters
   B. key physical infrastructure components
   C. component redundancy levels
   D. overall data center criticality level
2. Data center design criteria
   A. Physical floor plan demonstrating physical infrastructure equipment locations (includes racks)
   B. Floor plan denoting power distribution
   C. Floor plan denoting coolant, chiller and fire suppression piping
   D. Floor plan with existing air flow patterns
3. Component verification
   A. model specified (manufacturer, model name, model number, asset ID number)
   B. model delivered (manufacturer, model name, model number, asset ID number)
   C. model installed (manufacturer, model name, model number, asset ID number)
   D. model capacity (kW, volts, amps)
   E. general equipment condition
4. Performance data
   A. test procedures
   B. expected response
   C. actual response
   D. designation as pass or fail

Component error log report


The component error log, also known as Failure Mode Effects Analysis (FMEA), focuses on
the specific system components that failed the tests and documents how the failed test
impacted other components either upstream to or downstream of the component in question.
This report details the performance data results, highlighting errors that have occurred and
recommending solutions. Below is an example of the categories of information presented in a
component error log report.

Table 1 – Example of component error log report

Test area: power / Procedure: 21 / Sequence: 12
  Failure: UPS failed to support load after switching from by-pass mode to full function.
  Reason: Battery leads at the head of battery string were disconnected.
  Impacted system: Generator, load banks, and battery banks.
  Corrective action: Have chief electrician verify that all battery leads are properly connected and rerun test.

Test area: cooling / Procedure: 38 / Sequence: 3
  Failure: Chilled water failed to circulate to CRACs.
  Reason: Pump located between condenser and CRAC failed to start.
  Impacted system: Chiller, CRAC, condenser.
  Corrective action: Have facilities engineer replace pump with spare unit until new unit can be installed.

Test area: fire system / Procedure: 42389 / Sequence: 8
  Failure: Smoke detector A6 failed to raise alarm when tested.
  Reason: Faulty sensor near intake.
  Impacted system: Air distribution system, sensor aggregation point, smoke detection unit.
  Corrective action: Contact vendor to replace smoke detection unit.

Commissioning is an ongoing process. Once all operational parameters have been verified
and all settings have been checked, the commissioning documentation serves as a bench-
mark for monitoring changes and trends in the data center.

41.1.1.1 Executive summary / trending report


Once actual commissioning is completed, a trending report is issued. This report includes a
management summary of identifiable system performance trends. The summary also
contains a high-level system description, highlights issues that were encountered and
resolved, and identifies issues that remain open for future action. The summary also includes
an action plan and a validation statement from the commissioning agent verifying that the
data center has fulfilled the company’s design expectations. This report synthesizes the data
gathered from both the “as built” script report and the component error log report. Below is an
example that outlines the content of a commissioning trending report:

Executive summary
1. Data center overview
2. Summary of pre-commissioning data (i.e. component start up data)
3. Summary of commissioning scope
Commissioning methodology overview
1. Procedures tested
2. Sequence of testing
Data center commissioning system performance trends
1. Includes data center physical infrastructure power input and heat output
2. Projected energy consumption report with both energy-use index (EUI) and energy-cost index (ECI). The EUI is kW per air-conditioned square foot of the data center; the ECI is dollars per conditioned square foot per year (a worked example follows this outline).
3. Analysis of error logs, with emphasis on root causes.
Conclusion
1. Possible impacts of future expansion
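As a worked example of the two indices (with made-up figures), the sketch below computes EUI as average kW per air-conditioned square foot and ECI as annual energy dollars per conditioned square foot.

# Worked example of the energy-use index (EUI) and energy-cost index (ECI), using made-up figures.
avg_power_kw       = 400.0      # average physical-infrastructure power input
conditioned_sq_ft  = 5000.0     # air-conditioned floor area of the data center
annual_energy_cost = 400.0 * 8760 * 0.10   # kW x hours/year x $ per kWh (assumed $0.10/kWh)

eui = avg_power_kw / conditioned_sq_ft            # kW per conditioned square foot
eci = annual_energy_cost / conditioned_sq_ft      # dollars per conditioned square foot per year

print(f"EUI = {eui:.2f} kW/sq ft,  ECI = ${eci:,.0f} per sq ft per year")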

The commissioning documents should be placed into the data center’s library of procedures
and practices (Figure 5). It is important that the acquired knowledge be documented in a
formal company system and NOT merely in the possession of one or two individuals who
might leave the company.

If the commissioning knowledge base is automated, then it can serve as a valuable training
tool for vendors and new staff members who are installing new pieces of equipment. IT help
desk and on-site facilities departments can also use the commissioning data for training.
More advanced training can include a requirement that staff be knowledgeable in
commissioning test results. In fact, the ability to run commissioning tests could be used as
criteria for
attaining internal technological performance certification levels.

Figure 5 – Commissioning outputs should be fully leveraged. The commissioning step produces three outputs – the trending report, the “as built” script, and the component error log – which feed the knowledge base.
Typical utilization of commissioning data includes the following:

Comparison of future performance against known day-one performance (trending)


Training of site staff (i.e. video tape recordings of critical procedures that will need to be
performed in the future)
Clarification of root causes of future failures (forensic analysis)
Verification of warranty claims, performance assurance claims, and for other insurance
purposes
Support data for risk assessment analysis
Benchmark data to evaluate overall system performance
Identification of system components that need to be either redesigned or retuned
Prediction of expected results from system events

The commissioning knowledge base should also be used by senior management to estimate
the future usability and life expectancy of the data center.

Inputs to commissioning

Commissioning is initiated as a result of several related processes that are executed in advance. These key inputs include the following:

1. Data center site preparation and installation work
2. Component start up data
3. Data center design parameters

42.1.1.1 Data center site preparation and installation work
Site coordination assures that installation prerequisites have been identified, verifies that all
system requirements have been met, reviews electrical and cooling installation requirements
with appropriate subcontractors, and verifies the floor layout design. This is followed by
actual installation of the physical infrastructure equipment components.

43.1.1.1 Component start up data


Both data center staff and equipment vendors are responsible for the task of starting up
individual system components. Once a piece of equipment, like a UPS for example, is
delivered and installed, the next logical step is to perform the start up. Start up generally
consists of powering up the system to make sure that the new equipment component is
working properly. The results of these various start up tests need to be made available to the
commissioning team prior to the initiation of the commissioning process. The team must then
decide how much commissioning will be required to provide a sufficient integrated test (see
Table 2).

Table 2 – Sample commissioning scope checklist

Power tests: System grounding; generator; UPS; ATS; integrated power system; EPO
Cooling tests: Chillers; chilled water pumps; cooling tower; condenser water pumps; piping; heat exchanger; CRAC; ducting / air flow; integrated cooling system
Fire suppression and security tests: Pipes; sprinkler system; gauges; pumps; automatic alarms; smoke detection; electronics; man trap; door lock system; security camera
Infrastructure monitoring systems and controls tests: Power monitoring system; CRAC monitoring system; humidity sensors; motion detection sensors; temperature sensors; building management system

44.1.1.1 Data center design parameters


In a traditional data center design, the data center designer takes the operational assumptions (i.e., 5,000 ft², tier II with 10% annual growth), and then custom designs the
data center physical infrastructure using custom components. The designer consults
colleagues to verify accuracy and to make redesign corrections, and then issues final
designs. The design process includes estimations, custom parts, and redesigns – all of
which invite multiple errors by increasing complexity. This traditional approach, with all the
high risk and high costs it introduces, discourages many data center managers from
investing additional dollars to properly commission the data center.

Modern data center design takes a different approach. A detailed analysis involving power
density, criticality levels (comparable in part to data center “tier” levels), power and cooling
capacity planning, and data center growth plans sets the stage for establishing the design.

These design parameters are ultimately expressed in the format of a floor plan. The floor plan
allows for the commissioning team to formulate a strategy for scripting and testing the
integrated system components (see Figure 10).

Fortunately, recent innovations in physical infrastructure technology – such as scalable, modular power and cooling components – have helped to introduce standardized components into the design process. Standardization of both products and processes creates wide-ranging benefits in physical infrastructure that streamline and simplify every process from initial planning to daily operations. With standard system components in place, commissioning becomes a less daunting, more affordable, and higher-value task that can be employed in both small and large data centers.

Figure 6 – Both internal and external resources provide inputs to commissioning. Project process inputs – start-up data, design parameters, and site preparation – feed the commissioning step.

45.1.1.1 How commissioning works

Commissioning helps to compare actual system performance to the performance assumed by designers as they architected the data center. The essence of commissioning is “reliability insurance.” The main purpose of traditional insurance is to lower the liability should an incident occur in a home or business. Commissioning lowers the risk of failures in the data center by making sure, ahead of time, that the system works as an integrated whole. It also can demonstrate how the equipment and systems perform during failure scenarios.

To determine the value of commissioning, data center managers need to take into account whether the cost of downtime is greater than the cost of the commissioning process. According to Einhorn Yaffee Prescott (EYP), a global consulting engineering firm, a good rule of thumb is to invest 2% of the overall data center project cost on commissioning. In most cases, data center owners will see a 5-10% ROI benefit in terms of overall data center performance as a result of commissioning. [1]

[1] Einhorn Yaffee Prescott, “Everything You Need to Know About Commissioning,” Data Center World, March 2006
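As a quick illustration of the 2% rule of thumb above, the minimal sketch below estimates a commissioning budget and compares it to the cost of a single avoided incident. The project cost and downtime figures are assumed values for illustration only, not figures from this guide.

```python
# Illustrative application of the ~2% commissioning rule of thumb.
# All input figures are hypothetical assumptions, not values from this guide.

project_cost = 4_000_000          # total data center project cost in USD (assumed)
commissioning_rate = 0.02         # ~2% of project cost, per the EYP rule of thumb

commissioning_budget = project_cost * commissioning_rate

# Assume commissioning prevents one outage that would otherwise have cost this much:
avoided_downtime_cost = 250_000   # assumed cost of a single avoided incident (USD)

net_benefit = avoided_downtime_cost - commissioning_budget

print(f"Commissioning budget: ${commissioning_budget:,.0f}")
print(f"Net benefit if one ${avoided_downtime_cost:,.0f} incident is avoided: ${net_benefit:,.0f}")
```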

Key commissioning processes include the following:

1. Planning
2. Investment
3. Selection of a commissioning agent
4. Scripting
5. Setting up of a command center
6. Testing
7. Documenting

46.1.1.1 Planning
The commissioning process begins months ahead of the actual delivery of the physical
infrastructure equipment. Regular commissioning meetings should be held several weeks
ahead of the actual commissioning date. Vendors of the various component subsystems
should provide start-up documentation as part of the planning process. At these planning
meetings, primary and secondary stakeholders are kept informed of how the commissioning
schedule will be organized. Plans can be formulated at these meetings to set up event
sequencing and to coordinate schedules. The responsibilities of the team members who are
engaged in the process should be clearly defined in the planning stages.

Commissioning strives to identify and eliminate as many single points of failure (SPOF) as
possible. The new facility, or “green field” facility, makes it easier to control all the moving
parts of the total data center environment. In a green field data center all engineering and
operational assumption data is fresh and obtainable. In addition, needs and constraints are
understood and key personnel are accessible. For instance, a need would be for the facility
to have 5 minutes of battery back-up time while a constraint would be that generators should
not run for more than 30 minutes.

An existing or “brown field” facility presents more limitations than a green field facility. In a
brown field data center, original commissioning documentation may only consist of
component start-up information. The original engineer of record may not be available.
Landlords or lease agreements may have changed. The general contractor’s records may be
partial or unavailable. Subcontractor and vendor documentation may be outdated and
possibly faulty or unavailable. Internal management and / or original stakeholders may have
changed. The company may have been involved in a merger or acquisition scenario. Simply
stated, it is unrealistic to have the same expectations for an existing data center
commissioning project as for a green field project. These complicated commissioning
scenarios should serve to reinforce the importance of automating, up front, the documentation
development, storage, and retrieval processes.

Four years is the average refresh time for a green field data center to experience a major
upgrade project. Therefore, it is important to implement commissioning best practices at the
outset. If the existing data center history has been properly documented, it can serve as
base-line data for the new data center. In addition, all tracking records can serve as input to
the new design. Former project cost documentation of the existing data center can also be
revised for accurate budgeting of the new data center, and existing and new equipment
reliability can be accurately predicted. The entire past commissioning investment can be
leveraged for the new design / build project.

47.1.1.1 Investment
Determining how much commissioning should be performed depends on the business
expectation of cost and need. The more thorough the past commissioning process, the faster
and less costly future commissioning projects will be. Commissioning again plays the role of an insurance policy for data center reliability. With life insurance, for example, the older the individual, the more he or she will pay for a given level of coverage. Likewise, the “right” amount to invest is directly proportional to the age of the data center. It is possible to fully commission a ten-year-old data center; however, it may be more cost effective to consider a complete replacement of the existing data center.

Selection of a commissioning agent


Many different viewpoints and influences impact the ultimate selection of the commissioning
agent. When engaging a commissioning agent in medium to large organizations, a
recommended best practice is to assure that the commissioning agent is independent. This
practice is driven by an organization’s desire to enhance its corporate image by leveraging
independent validations.

Finance departments embrace a similar approach regarding the independence of outside auditors. Most companies subscribe to generally accepted accounting principles (GAAP).
GAAP requires the engagement of an independent audit agency to validate all public financial
data. The audit agent is not permitted to maintain any secondary relationships that could
compromise the independent review. Most companies’ internal audit requirements mandate
that the commissioning agent conform to the same rigid practices that are imposed on the
finance department. The reasoning behind this practice is that validation statements derived
from the data center commissioning process are used in risk assessment by investors and
that these commissioning documents may become public record.

If a company or owner chooses not to engage an independent commissioning agent, the design engineer or the construction company can usually perform the commissioning
process. Regardless of whether an external or associated commissioning agent is selected,
validation of the agent’s past experience in delivering a fully integrated commissioning
process is recommended.

Once the contractor team has been selected by the owner, the commissioning agent should
get involved early in the project process. Early engagement provides the cleanest, least
filtered information and enhances the ability of the team to identify potential single points of
failure (SPOF). Involving a commissioning agent early on also reduces the possibility of
having the commissioning process fall victim to budget cuts, should the project experience
cost overruns.

48.1.1.1 Scripting
Prior to the integrated testing of equipment, a comprehensive test script must be created.
Scripting is important because it provides a time-sequenced and order-based roadmap for
testing all key data center elements. The script also captures a record of all the test results.
By following the script, the commissioning team can observe and validate how each physical
infrastructure component influences the operation of linked components.

The scripting is usually performed by the independent commissioning organization. If a company or owner chooses not to engage an independent commissioning agent, then the
design engineer or the construction company can perform the scripting process. The master
script is developed over the entire length of the construction process and refined for each
physical infrastructure element.

Scripting must first validate that all subsystems are tested using the manufacturer’s start-up
process. Vendors of the various component subsystems should provide start-up
documentation and have it added to the script well in advance of the commissioning dates.
Regular scripting meetings should be held prior to the actual commissioning date. At these
meetings, the general scripting progress is reviewed and revised for each physical
infrastructure subsystem. When all the independent subsystems have been scripted, they
are incorporated into a cohesive system script.

Once the various start-ups are validated and the assorted scripting documents are in order,
the integrated testing process can begin.

49.1.1.1 Setting up of a command center


Depending upon the complexity and size of the integrated commissioning test, a command
center may be required. Smaller data centers may simply designate an individual who can act
as the command center – a communication “hub” – during the testing process. The purpose
of the command center is to coordinate various testing activities, to give next step testing
permission, to handle all internal and external communication, and to have all contact and
emergency phone numbers available.

It is vitally important that the individuals actually performing the commissioning task not be
overburdened with external communication and documentation details; this is the command
center’s responsibility. The testing group needs to focus on safety and testing.

Figure 7 is an example of a typical communication between command center personnel and the commissioning agent. This example emphasizes the importance of the time sequencing
of events and the level of precision required during the commissioning process.

Figure 7 – Typical command center communication example

“Commissioning Agent (CA) to Command Center (CC): do I have permission to open CB #102, Script Test line EE15, Time 01:05?”

“CC to CA: Please hold until I verify with IT Help Desk and Engineering, Time 01:15”

“CC to CA: I have verified, permission is granted; Time 01:34”

“CA to CC: CB #102 is OPEN and Lock Out / Tagged Out engaged, Time 01:40”

“CA to CC: do I have permission to proceed to Script Test line EE16, Time 01:45?”

“CC to CA: Yes, proceed to Script Test line EE16, Time 01:47”

Note that the time stamp on each command center communication can be used to help refine
the execution of the task in the future. The command center process ensures that the script is
followed and that shortcuts are not taken which could lead to latent defects and subsequent
downtime.
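If the command center keeps its log electronically, even a very simple structure makes the later review of time stamps straightforward. The sketch below is one hypothetical way to record exchanges like those in Figure 7; the field names and the date are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CommsEntry:
    """One timestamped exchange between commissioning agent (CA) and command center (CC)."""
    timestamp: datetime
    sender: str        # "CA" or "CC"
    message: str
    script_line: str   # script line being executed, e.g. "EE15"

# Hypothetical log of the Figure 7 exchange (the date itself is arbitrary)
log = [
    CommsEntry(datetime(2024, 1, 1, 1, 5),  "CA", "Permission to open CB #102?", "EE15"),
    CommsEntry(datetime(2024, 1, 1, 1, 15), "CC", "Hold - verifying with IT Help Desk", "EE15"),
    CommsEntry(datetime(2024, 1, 1, 1, 34), "CC", "Verified - permission granted", "EE15"),
    CommsEntry(datetime(2024, 1, 1, 1, 40), "CA", "CB #102 open, lock out / tag out engaged", "EE15"),
]

# The time stamps can later be used to measure how long each step actually took
elapsed = log[-1].timestamp - log[0].timestamp
print(f"Script Test line EE15 took {elapsed} from request to completion.")
```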

The element of human fatigue must also be considered. In a perfect world, everyone in the
commissioning process would be well rested and alert, but this is not always the case. The
command center must ensure that only well rested individuals are included on the
commissioning team. If not, the possibility for human error grows dramatically. Several
approaches can help limit the fatigue factor of the employees:

• Consider scheduling the commissioning test phases during the day as opposed to late at night.
• Monitor the number of hours that staff members are involved in testing so that work shifts can be appropriately rotated.
• Avoid having staff members work on weekends, particularly if they have been involved in testing for several intense days in a row.

50.1.1.1 Testing
Every piece of equipment should be tested by executing a sequenced failure followed by a
restart and return-to-stable operation. A sequenced failure implies that a failure in one
component (such as a generator) is communicated to a second related component (such as
the air conditioning system) so that the second component can act in an appropriate manner
to minimize downtime or to be ready for action when power is restored. This testing cycle
should be performed on each component and also on the entire integrated system. This will
involve a complete power down and an automatic restart.
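To make the idea of a sequenced failure concrete, the minimal sketch below models one component's failure being communicated to a dependent component so it can react, followed by a restart and return to stable operation. The component names and reactions are simplified assumptions, not a real control sequence.

```python
# Minimal sketch of a sequenced-failure test, assuming a simplified model in which
# a dependent component registers a reaction to another component's failure event.

class Component:
    def __init__(self, name):
        self.name = name
        self.reactions = {}     # failed component name -> action to take
        self.state = "stable"

    def on_failure_of(self, other_name, action):
        self.reactions[other_name] = action

    def notify_failure(self, failed_name):
        action = self.reactions.get(failed_name)
        if action:
            print(f"{self.name}: {failed_name} failed -> {action}")
            self.state = action

generator = Component("Generator")
crac = Component("CRAC system")

# Hypothetical reaction: when the generator fails, the CRAC sheds load and waits for power
crac.on_failure_of("Generator", "shed load and wait for power restoration")

# Sequenced failure test: fail, propagate, then restart and return to stable operation
print("Generator: simulated failure")
crac.notify_failure("Generator")
print("Generator: restarted")
crac.state = "stable"
print(f"CRAC system: returned to {crac.state} operation")
```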

Power: This aspect of the commissioning process tests the high voltage electrical service
entrance. It then progresses forward to the medium voltage main power distribution system,
including parallel switchgear, transfer switches, emergency generator, UPS system, the data
center monitoring system, and the distribution down to the racks. All lighting and life safety
systems including emergency power off systems (EPO) are also tested. Finally, electrical
system commissioning should include a short-circuit and breaker coordination study using
electrical scripting to verify that all circuit breaker and ground fault trip settings are correct.

Cooling: The cooling components include the cooling towers (including incoming water
sources), chillers, piping, pumps, variable speed drives, chemical or other water treatment
systems, and filtration systems. It also includes building humidification, ventilation, heating
systems, and computer room air conditioners (CRACs).

Fire suppression: This begins with an analysis of the incoming water and post indicator valves (PIVs), works through the alarm systems and automated reporting systems, and ends with the sprinkler and / or clean agent (gas) fire suppression systems.

Monitoring and management systems: Commissioning of the building management and energy management monitoring and control systems is incorporated with each primary system test. Each alarm should be verified.

Physical security systems: The central security station, site video monitoring, physical security devices such as mantraps and card readers, and the central sound system are also tested during commissioning. All wall duct penetrations should be double-checked to
determine whether security bars have been installed. These security bars can prevent an
intruder who has gained access to the roof, for example, from entering the data center by
climbing down a large air duct.

Spare parts: If deploying some of the newer, modular / scalable UPSs or similar equipment,
spare parts, such as backup power modules, should also be included as part of the
commissioning process. For example, the original power module should be included in the
first test. Then that module should be pulled out and replaced with the spare module. The
test should be run again to verify that both the original and spare modules work correctly.
The spare module should then be properly stored (i.e. wrapped in a dust resistant plastic
package) in a secure environment until it is ready to be deployed as a replacement part.

Commissioning tests the “sequence of operation” of all systems working together, and tests and documents the limits of performance. During commissioning, automatic failure and recovery modes are also tested and documented to assure that redundancies work.

Tools

Although physical infrastructure equipment is installed prior to commissioning, data centers are not often fully loaded with IT equipment during commissioning (see Figure 8). Therefore,
a sufficient heat load may not exist for system testing. In this case, load banks can be used to
introduce heat loads and to allow for simultaneous testing of both electrical and cooling
systems.

Figure 8 – Large load banks simulate computer load

51.1.1.1 Commissioning for high density


The traditional approach of utilizing load banks to simulate the data center’s electrical load is
both costly and insufficient for commissioning a data center in an integrated fashion.
Traditional methods emphasize power conditioning systems. Mechanical systems, such as
CRACs, are not tested to the same extent. The challenge with traditional load banks has been the difficulty of producing a heat load sufficient to simulate and test the operating limits
of the CRAC systems.

Now that blade server technology is being introduced to many data centers, managing heat
has become even more important. Blade servers can generate a load of 24 kW or more per
rack.

Until now, no commissioning methodology has permitted the testing of power and cooling
systems simultaneously. No methodology has allowed for the creation of an environment that
could accurately test the level of power and cooling needed to support a true high density
data center. American Power Conversion (APC) by Schneider Electric has developed an
approach that allows end-to-end reliability testing to be performed easily and safely. Using a “server simulator” that installs in a standard IT cabinet or rack, the methodology duplicates IT loading in terms of electrical load, heat, and airflow (see Figure 9).

Figure 9 – Rack-mounted server simulator has adjustable heat and airflow settings

The APC temporary independent resistive heaters can be installed in the data center racks as
artificial server loads. These heaters have selectable load and airflow ranges that can be set
to match the electrical server load and airflow designed for each rack. These heaters can be
directly plugged into the same electrical distribution system installed to supply the servers;
hence all distribution is also commissioned. The cooling, electrical and monitoring systems
must be ready to run when the load banks arrive and when the functional tests are set to be
run.
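When setting a rack heater so that it mimics the intended server load, a common engineering approximation relates airflow to heat load and the air temperature rise across the rack (CFM ≈ 3.16 × watts / ΔT in °F). The sketch below applies that approximation to an assumed 24 kW rack; the rack load and temperature rise are illustrative assumptions, not recommendations from this guide.

```python
# Rough airflow estimate for a rack heater simulating server load, using the
# common approximation CFM ≈ 3.16 × watts / ΔT(°F). Inputs are assumed values.

rack_load_watts = 24_000   # e.g., a heavily loaded blade-server rack (assumed)
delta_t_f = 25             # assumed air temperature rise across the rack, in °F

airflow_cfm = 3.16 * rack_load_watts / delta_t_f

print(f"A {rack_load_watts / 1000:.0f} kW rack with a {delta_t_f} F air temperature rise "
      f"needs roughly {airflow_cfm:,.0f} CFM of airflow.")
```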

Temporary independent rack heaters test the following:

• Power distribution installation
• Hot / cold aisle air flow
• Rack hot air flow patterns
• Rack mount outlets in the racks
• PDUs serving the racks
• Management for the entire physical infrastructure system (including racks)

They are also useful in verifying the following:

• Actual rack cooling requirement
• Automatic shutdown parameters by verifying UPS and run to failure modes
• Computer room air conditioner (CRAC) system operations
• CRAC cooling fluid system

52.1.1.1 Scripting checklists


A second valuable tool utilized in the commissioning process is the scripting outline. In most
cases the commissioning agent will use a standard master script outline that is modified
based upon the system components in the particular installation. During actual testing, the
script should be a hand-held paper or electronic document containing a test procedure
articulating the projected outcome of each event. It should also contain check off boxes for
each test with space for comments and test results (see Figure 10). Each person associated
with the test should have an identical copy of the test script. The scripting documentation, if properly designed and assembled, is a powerful tool for the IT staff to utilize in order to proactively prevent future system failures.

Figure 10 – Abbreviated example of a closed-transition transfer switch test script (each line also carries a check-off box and an initials field)

Line 132: Basic operational tests – manual transfers; result: n/a; proceed: Yes
Line 133: (carry out the following functional tests); result: n/a; proceed: Yes
Line 134a: ATS racked in “CONNECTED” position; result: pass; proceed: Yes
Line 134b: ATS not bypassed; result: pass; proceed: Yes
Line 134c: Closed-transition transfer capability disabled; result: pass; proceed: Yes
Line 135: Test steps; result: n/a; proceed: Yes
Line 136a: Verify that above conditions are satisfied; result: pass; proceed: Yes
Line 137b: Move ATS to “TEST” position; result: fail; proceed: No
Line 137c: Bypass ATS to Normal source
Line 137d: Move ATS to “TEST” position
Line 137e: Move ATS to “DISCONNECTED” position
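An electronic version of such a script can be kept as structured data so that results, comments, and initials are captured line by line as the test proceeds. The sketch below is one hypothetical representation of a few Figure 10 rows; the field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ScriptLine:
    """One line of a commissioning test script (fields mirror the Figure 10 columns)."""
    line_number: str
    description: str
    result: str = ""       # "pass", "fail", or "n/a"
    proceed: bool = True
    initials: str = ""
    comments: str = ""

script = [
    ScriptLine("134a", 'ATS racked in "CONNECTED" position', result="pass"),
    ScriptLine("134b", "ATS not bypassed", result="pass"),
    ScriptLine("137b", 'Move ATS to "TEST" position', result="fail", proceed=False,
               comments="Transfer did not complete - remediate and retest"),
]

# A failed line with proceed=False is the signal to stop and remediate before continuing
for line in script:
    if line.result == "fail" and not line.proceed:
        print(f"Stop at line {line.line_number}: {line.comments}")
```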

53.1.1.1 Organization

In addition to testing and command center teams, it is important that key stakeholders are present when the commissioning takes place. If key team members can witness failures, they can provide more constructive feedback during remediation and retesting. The commissioning teams should consist of the following:

• Owner team (which can include representatives from the IT department, from facilities, from operations, and from key business units)
• Design team (which may include an architect / engineer from the outside, an interior designer, and any specialized consultants)
• Contractor team (which will include the contractor, the outside project manager, the inside program manager, and any significant subcontractors)
• Supplier / vendor team (independent product representatives)
• Independent commissioning agent

These stakeholders need to work in a coordinated fashion in order for the commissioning
exercise to be successful. The commissioning agent leads the process and the owner and
vendor teams typically perform the testing. Documentation is the responsibility of both the
commissioning agent and the owner teams. The design and contractor teams are involved
much earlier in the process, by providing inputs to the commissioning script and scheduling
dates.

Conclusion

The data center physical infrastructure commissioning process can be compared to an insurance program. Like insurance, the owner must weigh the cost of commissioning against the risk of a potential loss. It is the principal stakeholder's responsibility to ensure that the initial benefits of integrated commissioning do not degrade over time. Similar to insurance, the commissioning agent should be contacted periodically or at major business events to provide a review of the integrated system's current integrity. This review is required because risk and reliability will change over time as business needs change.

Integrated commissioning produces volumes of well-documented test results, procedures, and processes. The output of commissioning is the physical infrastructure knowledge base of
your company. If kept current, commissioning documentation is invaluable in providing
physical infrastructure refresher education and new hire training. If the information is
electronic and automated, it can be used as valuable design input to future data center
projects. Companies like APC can provide commissioning support services if required.

Preventive Maintenance Strategy for Data Centers
Introduction
This paper highlights data center power and cooling systems preventive maintenance (PM) best practices.
Hands-on PM methods (e.g., component replacement, recalibration) and non-invasive PM techniques (e.g., thermal scanning, software monitoring) are reviewed. The industry trend towards more holistic and less
component-based PM is also discussed. The term preventive maintenance (also known as preventative
maintenance) implies the systematic inspection and detection of potential failures before they occur. PM is
a broad term and involves varying approaches to problem avoidance and prevention depending upon the
criticality of the data center. Condition-based maintenance, for example, is a type of PM that estimates and
projects equipment condition over time, utilizing probability formulas to assess downtime risks.

PM should not be confused with unplanned maintenance, which is a response to an unanticipated problem
or emergency. Most of the time, PM includes the replacement of parts, the thermal scanning of breaker
panels, component / system adjustments, cleaning of air or water filters, lubrication, or the updating of
physical infrastructure firmware.

At the basic level, PM can be deployed as a strategy to improve the availability performance of a particular
data center component. At a more advanced level, PM can be leveraged as the primary approach to
ensuring the availability of the entire data center power train (generators, transfer switches, transformers,
breakers and switches, PDUs, UPSs) and cooling train (CRACs, CRAHs, humidifiers, condensers, chillers).

A data center power and cooling systems preventive maintenance (PM) strategy ensures that procedures for
calendar-based scheduled maintenance inspections are established and, if appropriate, that condition-based
maintenance practices are considered. The PM strategy should provide protection against downtime risk
and should avoid the problem of postponed or forgotten inspection and maintenance. The maintenance
plan must also assure that fully trained and qualified maintenance experts observe the physical
infrastructure equipment (i.e., look for changes in equipment appearance and performance and also listen
for changes in the sounds produced by the equipment) and perform the necessary work.

PM outcomes

One of four results can be expected during a PM visit:

• A potential issue is identified and immediate actions are taken to prevent a future failure. This is the most prevalent outcome of a PM visit.
• A new, active issue is identified and an appropriate repair is scheduled. Such a visit should be precisely documented so that both service provider and data center owner can compare the most current incident with past PMs and perform trend analysis.
• No issue is identified during the visit and no downtime occurs through to the next PM visit. The equipment is manufacturer approved and certified to function within operating guidelines.
• A defect is identified and an attempted repair of this defect results in unanticipated downtime during the PM window or shortly thereafter (i.e., a new problem is introduced).

The risk of a negative outcome increases dramatically when an under-qualified person is performing the maintenance. Methods for mitigation of PM-related downtime risks will be discussed later in this paper.

Evolution of PM

In the data centers of the 1960s, data center equipment components were recognized as common building support systems and maintained as such. At that time, the data center was ancillary to the core business and most critical business processing tasks were performed manually by people. On the data center owner side, the attitude was “Why spend money on maintenance?” Manufacturers were interested in the installation of equipment but the “fix it” business was not something they cared about.

Over time, computers began performing numerous important business tasks. As more and
more corporate data assets began to migrate to the data center, equipment breakage and
associated downtime became a serious threat to business growth and profitability.
Manufacturers of data center IT equipment began to recognize that an active maintenance
program would maintain the operational quality of their products.

Annual maintenance contracts were introduced and many data center owners recognized the
benefits of elevated service levels. As corporate data evolved into a critical asset for most
companies, proper maintenance of the IT equipment became a necessity for supporting the
availability of key business applications. The PM concept today represents an evolution from
a reactive maintenance mentality (“fix it, it’s broken”) to a proactive approach (“check it and
look for warning signs and fix it before it breaks”) in order to maximize 24x7x365 availability.

54.1.1.1 Impact of changes in physical infrastructure architecture


As with computer maintenance, data center physical infrastructure (i.e. power and cooling)
equipment maintenance has also evolved over time. In the 1980s the internal architecture of
a UPS, for example, consisted of 100% separate components that were not, from a
maintenance repair perspective, physically integrated with other key components within the
device. These UPSs required routine maintenance such as adjustment, torquing and
cleaning in order to deliver the desired availability. A maintenance person would be required
to spend 6-8 hours per visit, per UPS, inspecting and adjusting the individual internal
components.

In the 1990s the architecture of the UPS evolved (see Figure 2). Physical infrastructure equipment began featuring both individually maintainable components and integrated, computerized (digital) components. During this time period, a typical UPS consisted of only 50% manually maintainable parts, with the remainder of the “guts” comprised of computerized components that did not require ongoing maintenance.

Figure 2 – Evolution of UPS design and associated PM. In the 1980s a traditional UPS consisted of 100% separate components and required monthly PM visits. In the 1990s roughly 50% of the components were merged / computerized and 50% remained separate (quarterly visits). At present (2007), about 75% are merged / computerized with 25% separate components and internal redundancy (annual visits). From 2010 and beyond, roughly 90% merged / computerized components and 10% separate components support a transition to whole power and cooling train PM.

By the mid-1990s the computerized components within the UPS began to communicate
internal health status to operators in the form of output messages. Although PM visits were
still required on a quarterly basis, the repairperson spent an average of 5 hours per visit per
UPS. At present, the ratio of maintainable parts to computerized components has shifted
further to 25% manually maintainable parts and 75% computerized parts (see Figure 2).

Today, most data center sites require one or two PM visits per year. However, more PM visits
may be required if the physical infrastructure equipment resides in a hostile environment (i.e.,
high heat, dust, contaminants, vibration). The frequency of visits depends upon the physical
environment and the business requirements of the data center owner. The system design of
the component may also impact the frequency of PM visits. Often the number of visits is
based upon the manufacturer’s recommendation.

Evidence of PM progress

Today's physical infrastructure is much more reliable and maintenance-friendly than in the past. Manufacturers compete to design components that are as mistake-proof as possible. Examples of improved hardware design include the following:

• Computer room air conditioners (CRACs) with side and front access to internal components (in addition to traditional rear access)
• Variable frequency drives (VFDs) in cooling devices to control the speed of internal cooling fans. VFDs eliminate the need to service moving belts (which are traditionally high-maintenance items)
• Wrap-around bypass functionality in the UPS that can eliminate IT downtime during PM

In addition to hardware improvements, infrastructure design and architecture have evolved in ways that support the PM goals of easier planning, fewer visits, and greater safety. For example:

• Redundant cooling or power designs that allow for concurrent maintenance – the critical IT load is protected even while maintenance is being performed
• Proper design of crimp connections (which provide an electrical and mechanical connection) can reduce or eliminate the need for “re-torquing”, which, if performed in excess, can increase exposure to potential arc flash
• Recent attention to the dangers of arc flash is now influencing system design, in order to protect PM personnel from the risk of electrical injury during maintenance

55.1.1.1 Software design as a critical success factor


The design of the physical infrastructure hardware is one way to reduce PM cost and complexity. Efficient physical infrastructure management software design is being vaulted to the forefront as the critical success factor for maintaining high availability. Best-in-class data centers leverage physical infrastructure management software.

Through self-diagnosis, infrastructure components can communicate usage hours, broadcast warnings when individual components are straying from normal operating temperatures, and can indicate when sensors are picking up abnormal readings. Although PM support personnel will still be required to process the communications output of the maintenance management system, the future direction is moving towards complete self-healing physical infrastructure systems.

Figure 3 – Traditional approach: component-by-component PM management. Multiple management systems, each addressing its own type of component (UPS, PDU, breakers, CRAC, humidifier), with little or no communication among the management systems (“loosely coupled” PM management).

Forward-thinking data center owners contemplate a holistic PM strategy for the entire data center power train. While traditional PM support for existing equipment continues to play an important role, a strategy for maintaining future equipment should look to embrace a PM approach that views the data center as an integrated whole as opposed to an assembly of individual components (see Figure 3 and Figure 4).

A further analysis will help to clarify the evolution from component-based PM to whole-power-
train or whole-refrigeration-cycle cooling PM. Consider the UPS (uninterruptible power
supply) physical infrastructure component as an example. When a power problem manifests
itself, the problem is not always with the UPS. The problem instead may be with a breaker,
switch, or faulty circuit. A monitoring system that ties together all of these critical components and communicates data back to an individual who understands the integrated power train and who can properly interpret the system messages represents a great value.

56.1.1.1 Organizing for “holistic” PM


To optimize efficient PM, the data center owner’s internal organizational structure should also
be aligned to support a robust implementation of holistic, integrated PM practices. Traditionally, IT and facilities groups have not been harmonized to work closely together. IT has
relegated itself to supporting IT systems in the data center while the facilities department has
been relied upon to oversee the installation and maintenance of the physical infrastructure
components. Since these systems are now closely coupled in the data center, an alternative
organizational approach that tightly integrates key members of both teams needs to be
considered.

Figure 4 – Strategic approach: integrated, holistic PM management. One management system addresses all components (UPS, PDU, breakers, CRAC, humidifier) as a single system (“tightly coupled” PM management).

Why physical infrastructure components fail

Older UPSs (those installed in the ’80s and ’90s) need to be manually adjusted on a regular basis to prevent voltage drift and “out-of-tolerance” conditions. For example, UPS control cards required that the calibration of potentiometers be adjusted manually by a technician utilizing an oscilloscope on a quarterly basis. Today this same function is executed by an onboard microprocessor. Periodic recalibration helps to minimize the possibility that the UPS will fail.

More modern UPSs are controlled with digital signal processor controls. These do not "drift"
and do not require recalibration unless major components are replaced. In addition to out-of-
tolerance conditions, harmonics and power surges also have a negative impact on physical
infrastructure power components.

Temperature fluctuation is another common cause of electronic component failure. Electronics are designed to support specific temperature ranges. If temperatures remain within the design range of the equipment, failures rarely occur. If, however, temperatures stray beyond the supported range, failure rates increase significantly. In fact, according to studies conducted by high-performance computing researchers at Los Alamos National Laboratory, the failure rate doubles for every rise of 10°C (18°F) (see Figure 5). [1]

The recommended operating temperature range for IT equipment, according to the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) TC 9.9, is 68-77°F (20-25°C). Proper airflow can help maintain consistent and safe temperatures and can
help to sustain environmental conditions that translate to longer component life and increased
time between failures. Excessive current is another source of damage to internal components.
Mechanical systems also require the inspection of normal and abnormal bearing wear and
the periodic replacement of oils and lubricants.

Figure 5 – Los Alamos National Laboratory heat-to-failure study. The failure rate doubles for every 10°C (18°F) temperature rise: the normal failure rate at 20°C (68°F), twice the normal rate at 30°C (86°F), and four times the normal rate at 40°C (104°F).
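The doubling relationship shown in Figure 5 can be written as a simple formula, relative failure rate ≈ 2^((T - 20°C) / 10°C). The short sketch below evaluates it for a few example temperatures; the temperatures themselves are arbitrary illustrations.

```python
# Relative failure rate based on the "doubles every 10 C" relationship in Figure 5.

def relative_failure_rate(temp_c: float, baseline_c: float = 20.0) -> float:
    """Failure rate relative to the baseline temperature, doubling every 10 C."""
    return 2 ** ((temp_c - baseline_c) / 10.0)

for temp in (20, 30, 35, 40):   # example temperatures in C (arbitrary illustrations)
    print(f"{temp} C -> {relative_failure_rate(temp):.1f}x the normal failure rate")
```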

Recommended practices

Visits by qualified maintenance personnel serve as a validation that the physical infrastructure equipment is on track to support the data center owner's system uptime goals. Physical infrastructure professionals with data center expertise can identify the aging of various internal components and identify how much the component influences the overall reliability of the system.

The PM professional should observe the data center environment (circuit breakers,
installation practices, cabling techniques, mechanical connections, load types) and alert the
owner to the possible premature wear and tear of components and to factors that may have a
negative impact on system availability (i.e., possible human error handling equipment, higher
than normal temperatures, high acidity levels, corrosion, and fluctuations in power being
supplied to servers).

A PM visit should also include an evaluation of outside environmental factors that can impact
performance (see Table 1). The depth and breadth of the PM visit will depend upon the
criticality level of the data center and should result in the formulation of an action plan.

[1] Los Alamos National Laboratory: “The Importance of Being Low Power in High Performance Computing,” Feng, W., August 2005

Table 1 – Sample PM environment checklist

Hands-on checks, internal environment: appearance of circuit boards, appearance of sub-assemblies, appearance of cable harnesses, connectors, filters, windings, batteries, capacitors, insulation, ventilation

Hands-on checks, external environment: overall cleanliness, temperature levels, acidity levels, presence of corrosion, frequency of disruptions, presence of dripping water, dust content of area, hot spots, ventilation obstruction, access hindrance, open windows and doors, nearby construction, radio usage, roof penetrations

Non-invasive checks, internal environment: general appearance, thermal scanning readouts, predictive failure reports, internal temperature readings

Non-invasive checks, external environment: noise quality of equipment, connections of equipment to earthing cables

57.1.1.1 Thermal scanning and predictive failure


Thermal scanning of racks and breaker panels is recommended during a PM visit. Abnormal
temperature readings can prompt a required intervention. Infrared readings can be compared
over time to identify trends and potential problems. In this way, an electrical connection, for
example, can be retightened based on scientific data instead of a guess.
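As a simple illustration of comparing infrared readings over time, the minimal sketch below flags a connection whose temperature rose more than an assumed threshold between PM visits; the readings, visit labels, and threshold are hypothetical.

```python
# Illustrative trend check on thermal-scan readings for one electrical connection.
# Readings, visit labels, and the alert threshold are assumed values, not recommendations.

readings_c = {               # PM visit -> measured temperature in C
    "visit 1": 41.0,
    "visit 2": 43.5,
    "visit 3": 49.0,
}

rise_threshold_c = 5.0       # assumed: flag if temperature rose more than 5 C between visits

visits = list(readings_c.items())
for (prev_visit, prev_t), (curr_visit, curr_t) in zip(visits, visits[1:]):
    rise = curr_t - prev_t
    if rise > rise_threshold_c:
        print(f"Investigate: temperature rose {rise:.1f} C between {prev_visit} and {curr_visit}")
```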

The thermal scanning approach can also be applied to switchgear, transformers, disconnects, UPSs, distribution panel boards, power distribution units, and air conditioner unit disconnect switches.

Computational Fluid Dynamics (CFD) can also be utilized to analyze the temperature and
airflow patterns within the data center and to determine the effect of cooling equipment
failure.

By utilizing a predictive failure approach, capacitors, for example, are replaced only when continuous onboard diagnostics make a recommendation for replacement. This is in stark contrast to the traditional “it's been 6 months and it's time to replace them” approach. Adhering to predictive failure practices avoids the unnecessary execution of invasive procedures, which injects the risk of human error leading to downtime.

Table 2 presents a sample list of physical infrastructure devices that require PM. These
systems interact with each other and need to be maintained as a whole system.

Table 2 – Devices requiring data center PM (partial listing), with internal elements requiring PM and the overall maintenance level required

• Transformer (low): tightness / torque of connections
• PDU (low): tightness / torque of connections
• Data center air and water distribution systems (low): piping internal densities, valves, seats and seals
• In-row CRAC (medium): filter, coil, firmware, piping connections, fan motors
• New generation UPS (medium): fans, capacitors, batteries
• Raised floor (high): physical tiles, tile position, removal of zinc whiskers
• Traditional UPS (high): fans, capacitors, electronic boards, batteries
• Traditional CRAC (high): belts, air filters, piping connections, compressor, fan motors, pumps, coils
• Humidifier (high): drain, filter, plugs, water processor
• Transfer switch (high): switch components, firmware, torque
• External batteries, wet cell and VRLA (high): torque, connections, electrolyte / acid levels, temperature levels
• Fire alarm system (high): valves, flow switches
• Chillers (high): oil pressure levels, gas levels, temperature settings
• Generator (high): fuel filter, oil filter, hoses, belts, coolant, crankcase breather element, fan hub, water pump, connections torque, alternator bearings, main breaker
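Kept as structured data, a device list like Table 2 can drive simple planning tasks, such as listing the high-maintenance devices before the next PM window. The sketch below is a hypothetical representation of a few Table 2 rows; it is a bookkeeping aid, not a maintenance tool.

```python
# Hypothetical structured version of a few Table 2 rows: device -> (PM level, elements).

pm_catalog = {
    "Transformer":     ("low",    ["tightness / torque of connections"]),
    "In-row CRAC":     ("medium", ["filter", "coil", "firmware", "piping connections", "fan motors"]),
    "Traditional UPS": ("high",   ["fans", "capacitors", "electronic boards", "batteries"]),
    "Generator":       ("high",   ["fuel filter", "oil filter", "hoses", "belts", "coolant"]),
}

# List the devices needing the most attention during the next PM window
for device, (level, elements) in pm_catalog.items():
    if level == "high":
        print(f"{device}: {', '.join(elements)}")
```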

58.1.1.1 Scheduling practices


Traditional maintenance scheduling practices were established in the days before system
availability became a significant concern for data center owners. Nights, weekends and
three-day holiday weekends were, and are still, considered common scheduling times.
However, the rise of the global economy and the requirement for 24x7x365 availability has
shifted the maintenance scheduling paradigm.

In many cases, the justification for scheduling PM only on nights and weekends no longer
exists. In fact, a traditional scheduling approach can add significant cost and additional risk to
the PM process. From a simple hourly wage perspective, after-hours maintenance is more
expensive. More importantly, services and support personnel are likely to be physically tired
and less alert when working overtime or when performing work at odd hours. This increases
the possibility of errors or, in some cases, can increase the risk of personal injury.

A PM provider / partner can add value by helping the data center owner to properly plan for
scheduling PM windows. In situations where new data centers are being built, the PM
provider / partner can advise the owner on how to organize the data center floor plan in order
to enable easier, less intrusive PM. In addition, information gathered by governmental bodies such as the National Oceanic and Atmospheric Administration (NOAA) provides climate trend data that can guide data center owners on optimum maintenance windows (see Figure 6).

Figure 6 – Research data (heating and cooling degree-days) as a guide to scheduling PM visits. The chart plots monthly heating and cooling degree-days from October through September for the 2002-03 through 2006-07 seasons against the normals. Source: National Oceanic and Atmospheric Administration, National Weather Service, http://www.cpc.ncep.noaa.gov/products/analysis_minitoring/cdus/degree_days/; Short-Term Energy Outlook, June 2007.

Note: A degree-day compares the outdoor temperature to a standard of 65° F (18.3°C); the
more extreme the temperature, the higher the degree-day number. Hot days are measured in
cooling degree-days. On a day with a mean temperature of 80° F, for example, 15 cooling
degree-days would be recorded (80 – 65 base = 15 CDD). Cold days are measured in
heating degree-days. For a day with a mean temperature of 40° F, 25 heating degree-days
would be recorded (65 base – 40 = 25 HDD). By studying degree-day patterns in your area,
increases or decreases in outdoor temperatures from year to year can be evaluated and
trends can be established.
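The degree-day arithmetic in the note above is easy to reproduce. The short sketch below computes heating and cooling degree-days for a day's mean temperature using the 65°F base, matching the two worked examples.

```python
# Heating and cooling degree-days for one day, using the 65 F base described above.

BASE_F = 65.0

def degree_days(mean_temp_f: float):
    """Return (heating degree-days, cooling degree-days) for one day's mean temperature."""
    hdd = max(0.0, BASE_F - mean_temp_f)
    cdd = max(0.0, mean_temp_f - BASE_F)
    return hdd, cdd

print(degree_days(80))   # (0.0, 15.0) -> 15 cooling degree-days
print(degree_days(40))   # (25.0, 0.0) -> 25 heating degree-days
```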

59.1.1.1 Coordination of PM
Extreme hot and cold outside temperatures and stormy “seasons” can pose significant risks. If
climate data points to April and September as the optimum months for PM to take place, then
both pros and cons still need to be considered. For example, is any nearby construction
project planned during any of the proposed PM “windows”? If so, a higher likelihood of outages due to construction accidents (e.g., power and water lines accidentally cut by construction equipment) could be an important factor to consider. Would cooler weather help
provide free cooling to the data center, if data center cooling system downtime occurs? If
September is deemed an optimal month to perform PM based on outside temperature data, is
it wise to schedule during an end of quarter month, when financial systems are operating at
full capacity?

One approach is to schedule PM at different times. Mobilizing all key staff members simultaneously could pose a risk by compromising the coverage / support expected by both business users and customers. If a lack of personnel is an issue, a phased PM schedule will spread PM responsibilities more evenly and allow the data center to maintain its target service levels.

If access to human resources is not an issue, another approach would be to perform the PM
all at once on the same day or group of days and not at different time periods. Rather than
scheduling multiple PM visits with multiple organizations, one partner is called in to provide,
schedule, and perform key infrastructure PM. This “solution-oriented PM” (as opposed to
traditional component-oriented PM) with a qualified partner can save time and money and will
improve overall data center performance. The overriding priority is to schedule PM with a
qualified service provider when disruption to the data center is at a minimum and when
recovery options are maximized.

60.1.1.1 PM statements of work


The PM process should be well defined to both the PM provider and the data center owner. A
detailed PM statement of work (SOW) should be issued by the PM provider to the owner
which clearly describes the scope of the PM. Listed below are some of the elements that
should be included in the SOW:

Dispatch provisions – Most manufacturers recommend a PM visit one year after the
installation and commissioning of equipment although certain high usage components
(i.e., humidifiers) may require earlier analysis and constant monitoring. Proper protocols
should be followed in order to assure easy access to the equipment at the data center
site. The owner’s operational constraints should also be accounted for. A plan should
be formulated so that the equipment can be tuned for optimal performance.
Parts replacement provisions – The SOW should include recommendations regarding
which parts need to be “preventatively” replaced or upgraded. Issues such as
availability of stock, supply of tested and certified parts, contingency planning in the
event of defective parts, and the removal and disposal of old parts should all be
addressed in the SOW.
Documentation – The SOW should specify a PM output report that documents the
actions taken during the PM visit. The output report should also be automatically reviewed by the vendor for technical follow-up.

PM options

PM services can be purchased either directly from the manufacturer or from third-party maintenance providers. The selection of a maintenance organization capable of supporting the PM vision for the data center is an important decision. Such organizations can be global in scope or they can offer regional or local support. Table 3 compares two categories of mainstream PM providers.

Table 3 – Meeting service challenges: manufacturer / authorized 3rd party vs. unauthorized 3rd party

Spare parts
Manufacturer / authorized 3rd party: stock of spares available to the data center owner locally; parts built and tested in an ISO certified factory; parts are the most recent revision / compatible with the product being serviced; original factory parts are used for replacement.
Unauthorized 3rd party: replacement parts may be procured from the “salvage market” or from a used equipment provider; replacement parts may be repaired locally by unqualified technicians; replacement parts may be purchased from the manufacturer with the third party as intermediary, adding delays.

Product knowledge
Manufacturer / authorized 3rd party: service specialized on specific products; experience linked to the high number of installations worked on.
Unauthorized 3rd party: service personnel are more “generalists” and are expected to service a wide variety of products from multiple manufacturers; may not have access to or knowledge of critical upgrades.

Local support
Manufacturer / authorized 3rd party: can offer standard 4-hour response.
Unauthorized 3rd party: local firms may be able to provide 2-hour response; may cover localities that the manufacturer cannot.

Knowledge of data center environment
Manufacturer / authorized 3rd party: beyond individual components, the manufacturer is often knowledgeable of power and cooling issues impacting overall data center operations.
Unauthorized 3rd party: data center knowledge beyond the repair of individual components may be limited.

Training
Manufacturer / authorized 3rd party: personnel are factory trained and certified to meet national safety standards; personnel receive regular evaluation and training updates.
Unauthorized 3rd party: personnel may not be factory trained; if factory trained, they may no longer receive training updates.

Cost
Manufacturer / authorized 3rd party: typically more expensive, but less time is needed to diagnose and solve problems.
Unauthorized 3rd party: typically less expensive than the manufacturer.

Product updates
Manufacturer / authorized 3rd party: service has access to all product hardware and firmware revisions.
Unauthorized 3rd party: access to product updates and firmware revisions may be limited.

Documentation
Manufacturer / authorized 3rd party: service documentation is the most recent revision and includes service update information; technical reports and documentation are issued to the data center owner after PM is completed.
Unauthorized 3rd party: service personnel may not have access to updated service documentation.

Tools
Manufacturer / authorized 3rd party: service has all required tools, test equipment and software, and conforms to ISO calibration regulations.
Unauthorized 3rd party: may not have as quick access to the latest tools.

61.1.1.1 PM by manufacturer
Manufacturers package maintenance contracts that offer hotlines, support, and guaranteed
response times. Manufacturers also maintain thousands of pieces of equipment across all
geographies and are able to leverage tens of thousands of hours of field education to further
improve their maintenance practices and enhance the expertise of their staffs. Data gathered
by the factory-trained field personnel is channeled to the R&D organizations so they can
analyze the root cause of breakdowns.

The manufacturer’s R&D groups analyze the data and build needed hardware and software
improvements into product upgrades that then form the basis for the next PM. This global
exposure also allows for manufacturer-based service personnel to maintain a deeper
understanding of integrated power and cooling issues, a knowledge that they can apply to
both troubleshooting and predictive analysis.

62.1.1.1 PM by unauthorized third party
Most third party maintenance companies are local or regional in scope; they tend to work on
fewer equipment installations. As a result, their learning curve may be longer regarding
technology changes. Since they have few direct links to the manufacturer and manufacturing
sites, most unauthorized third-party maintenance providers cannot provide an escalated level
of support. Many problems they encounter are “new” because they don’t have the benefit of
leveraging the global continuous improvement PM data gathered from manufacturer
installations all over the world.

63.1.1.1 User maintenance


Whether or not data center owners decide to maintain their own physical infrastructure
equipment depends on a number of factors:

• Architecture / complexity of equipment
• Criticality level of related applications
• Data center owner's business model

Some manufacturers facilitate the user-maintenance approach by designing physical infrastructure components that require far less maintenance (e.g., a UPS with modular, user-
replaceable battery cartridges). Factors in favor of user-maintenance include the ability to pay
for maintenance service through an internal budget as opposed to an external budget and the
ability of data center staff, if they are properly trained, to quickly diagnose potential errors.

Factors that discourage user-maintenance include limited internal staff experience (not a
business core competency of the data center owner) and diminishing knowledge base of staff
over time as a result of turnover. Securing parts from an outside source and resolving a problem quickly may also be difficult if no maintenance contract is in place. Without
properly structuring an organization for user maintenance, expected efficiency gains and
financial gains may not be realized.

64.1.1.1 Condition-based maintenance


Estimating and projecting equipment condition over time will help to identify particular units
that are most likely to have defects requiring repairs. Such an exercise will also identify units
whose unique stresses (i.e., a UPS that often switches to battery power because of poor
utility power quality) have an increased probability of future failure. A condition-based
maintenance method also identifies, through statistics and data, which equipment
components most likely will remain in acceptable condition without the need for
maintenance. Maintenance can therefore be targeted where it will do the most good and the
least harm.

Condition-based maintenance data that is useful and available to help estimate the condition of the equipment includes the following:

• Age
• History of operating experience
• Environmental history (temperature, voltage, run-time, abnormal events)
• Operating characteristics (vibration, noise, temperature)
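One hypothetical way to combine the data points listed above into a simple condition score is sketched below. The weights, thresholds, and sample unit are illustrative assumptions only, not a validated condition-based maintenance model.

```python
# Very simplified condition-score sketch combining the data points listed above.
# Weights, thresholds, and the sample unit are assumptions for illustration only.

def condition_score(age_years, abnormal_events, avg_temp_c, on_battery_hours):
    """Return a 0-100 score; lower scores suggest a stronger candidate for PM attention."""
    score = 100.0
    score -= min(age_years * 3, 30)           # age gradually wears the unit down
    score -= min(abnormal_events * 5, 25)     # power events and other abnormal stresses
    score -= max(avg_temp_c - 25, 0) * 2      # operation above ~25 C accelerates wear
    score -= min(on_battery_hours, 20)        # frequent battery operation suggests poor utility power
    return max(score, 0.0)

# Example: an older UPS on poor-quality utility power
print(condition_score(age_years=8, abnormal_events=4, avg_temp_c=29, on_battery_hours=12))
```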

Conclusion
PM is a key lifeline for a fully functioning data center. Maintenance contracts should include a clause for PM
coverage so that the data center owner can rest assured that comprehensive support is available when
required. The current PM process must expand to incorporate a “holistic” approach. The value add that PM
services provide to common components today (such as a UPS) should be expanded to the entire data center
power train (generators, transfer switches, transformers, breakers and switches, PDUs, UPSs) and cooling train
(CRACs, CRAHs, humidifiers, condensers, chillers).

As of today, the PM provider in the strongest position to provide such a level of support is the global
manufacturer of data center physical infrastructure. An integrated approach to PM allows the data center owner
to hold one partner accountable for scheduling, execution, documentation, risk management, and follow up.
This simplifies the process, cuts costs, and enhances overall systems availability levels.
