0% found this document useful (0 votes)

32 views14 pages

Heat Behind The Meter: A Hidden Threat of Thermal Attacks in Edge Colocation Data Centers

Uploaded by

a18257157319

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

32 views14 pages

Heat Behind The Meter: A Hidden Threat of Thermal Attacks in Edge Colocation Data Centers

Uploaded by

a18257157319

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Heat Behind the Meter: A Hidden Threat of

Thermal Attacks in Edge Colocation Data Centers
Zhihui Shao Mohammad A. Islam Shaolei Ren
University of California, Riverside University of Texas at Arlington University of California, Riverside
[email protected] [email protected] [email protected]

Abstract—The widespread adoption of Internet of Things and Metered Load Generated Heat
latency-critical applications has fueled the burgeoning develop- (e.g., 200W) (e.g., 300W, 100W from Battery)
ment of edge colocation data centers (a.k.a., edge colocation) Power Æ Heat
— small-scale data centers in distributed locations. In an edge Heat
colocation, multiple entities/tenants house their own physical Power
Load
ACÆDC
servers together, sharing the power and cooling infrastructures Outlet
(e.g., PDU) Internal Components
for cost efficiency and scalability. In this paper, we discover that Integrated
the sharing of cooling systems also exposes edge colocations’ Battery
potential vulnerabilities to cooling load injection attacks (called Power Supply Unit Server
thermal attacks) by an attacker which, if left at large, may
create thermal emergencies and even trigger system outages.
Importantly, thermal attacks can be launched by leveraging Fig. 1. An attacker uses its built-in batteries to stealthily inject additional
the emerging architecture of built-in batteries integrated with heat to overload the cooling system.
servers that can conceal the attacker’s actual server power (or
cooling load). We consider both one-shot attacks (which aim at
creating system outages) and repeated attacks (which aim at
causing frequent thermal emergencies). For repeated attacks, we recent Uptime Institute survey [4] shows that more than 75%
present a foresighted attack strategy which, using reinforcement of the respondents will use edge colocations to house their
learning, learns on the fly a good timing for attacks based on physical servers and deploy edge applications.
the battery state and benign tenants’ load. We also combine The criticality of hosted applications, such as assisted driv-
prototype experiments with simulations to validate our attacks ing [3], clearly mandates a high level of security for edge
and show that, for a small 8kW edge colocation, an attacker can
potentially cause significant losses. Finally, we suggest effective colocations. While securing servers and networks from cyber
countermeasures to the potential threat of thermal attacks. attacks remains a key issue, recent research has also identified
critical vulnerabilities in data center physical infrastructures.
I. I NTRODUCTION More concretely, the practice of infrastructure oversubscription
In the wake of the Internet of Things and ubiquitous exposes data centers to well-timed power load attacks that
computing demand, edge computing has recently emerged as aim at overloading the power capacity and compromising the
a game-changing paradigm that brings computation to the data center availability [5]–[8]. Likewise, data center cooling
Internet edge, thereby enabling ultra-low latencies for many system removes server heat to avoid overheating and hence
critical applications such as augmented reality and assisted is also crucial for service uptime. If not properly managed,
driving [1]. Consequently, the rise of edge computing spurs malicious workloads can create more hot spots that expose
the burgeoning development of multi-tenant edge colocation servers to an adverse thermal environment and thus more
data centers (a.k.a., edge colocation). An edge colocation is thermal emergencies [9]. Importantly, cooling system has
a small-scale shared colocation data center built at numerous emerged as a leading root cause for downtime incidents in
distributed locations for hosting latency-ultrasensitive work- state-of-the-art data centers (e.g., Microsoft’s) [10], [11].
loads such as assisted driving [2]. In such a colocation, the To meet the power capacity constraints and avoid outages
operator provides power and cooling resources to multiple [12]–[14], the colocation operator has power meters to con-
entities (i.e., tenants) for housing their own physical servers. tinuously monitor tenants’ server power usage. Meanwhile,
Thus, this fundamentally differs from a multi-tenant cloud power meters are also used as a proxy to measure servers’
platform where users/tenants share the cloud resources without cooling loads,1 ensuring that the designed cooling capacity is
owning the physical servers. not violated. The reason is that nearly 100% server power is
Edge colocations have become the preferred choice for edge eventually converted into heat or cooling load [6], [15], [16].
service providers. For example, Vapor IO, an edge colocation Therefore, with proper heat dissipation, meeting the power
operator, is rolling out thousands of edge colocations in capacity constraints also implicitly means meeting the cooling
partnership with wireless tower companies [3]. Moreover, a capacity constraints [16].

This work was supported in part by the NSF CNS-1551661 (CAREER). 1 Heat generated by servers is “cooling load” for the cooling system.

2378-203X/21/$31.00 ©2021 IEEE 330

DOI 10.1109/HPCA51647.2021.00035
Contributions. In this paper, we study an under-explored attacker can install a large built-in battery and inject sufficient
threat of thermal attacks — injecting additional cooling loads cooling loads continuously, resulting in overheating and trig-
to overload the cooling system — in an edge colocation. gering automatic system shutdown. For repeated attacks that
While edge colocations have been generally considered as aim at benign tenants’ performance degradation, the attacker
secure due to tenants’ full control of their own servers, we needs to repeatedly trigger thermal emergencies by charging
discover that the way tenants’ cooling loads are measured and discharging its battery at appropriate times. We propose
(i.e., using power meters as proxies) is potentially vulnerable a foresighted policy based on batch Q-learning that learns on
to thermal attacks. More concretely, as illustrated in Fig. 1, the fly a good timing for repeated attacks based on the battery
an attacker can tap into the emerging architecture of built- state and benign tenants’ load: thermal attacks are launched
in battery units and generate additional cooling loads (i.e., only when both the benign tenants’ loads are sufficiently high
heat), yet without violating the power capacity enforced by and the remaining battery energy is more than a threshold.
the colocation operator. If left neglected, successful thermal We run prototype experiments to validate the potential
attacks may create significant damages: (1) service outage for threat of thermal attacks. To evaluate the effectiveness of
benign tenants due to overheating (which we call one-shot our proposed repeated attack strategies, we run year-long
attack); or (2) more frequent thermal emergencies that result simulations based on computational fluid dynamics (CFD)
in tenants’ performance degradation (which we call repeated analysis. Our results demonstrate that for an 8kW edge
attacks). While various defenses (e.g., measuring servers’ colocation, an attacker subscribing 10% of the capacity can
outlet temperatures and air flows) are readily available, they cause thermal emergencies for more than 3% of the year,
have yet to be included in standard practices for many data degrading benign tenants’ performance. Finally, while the
centers. As such, despite non-trivial efforts needed by thermal existing practices may render edge colocations vulnerable,
attacks, our study serves as a precaution for strengthening battery-assisted thermal attacks can be fairly easily detected
cooling system management in edge colocations. and nullified using a reasonable amount of efforts. We discuss
A common practice in today’s colocations is to tightly such defense strategies in Section VII.
monitor tenants’ power usage as well as their server inlet/outlet In conclusion, while batteries have been exploited (such as
temperature. Nonetheless, if other effective defense mecha- for smoothing power demand [21]), our study makes a novel
nisms (in Section VII) are not properly implemented, built-in contribution by leveraging servers’ built-in batteries for an
batteries integrated with servers’ power supply units can assist under-explored malicious purpose — thermal attacks that can
an attacker with launching thermal attacks that are difficult potentially result in service outage or performance degradation
to trace. To provide better energy efficiency and reliability in edge colocations — and serves as a precaution despite its
[17]–[19], vendors have begun to integrate built-in batteries futility when proper defensive measures are enforced.
with servers’ power supply units (e.g., Supermicro BBP [17]).
Such built-in batteries can conceal the attacker’s actual cooling II. P RELIMINARIES ON E DGE C OLOCATIONS
load from the operator’s power meters — by discharging Colocations represent a critical segment and account for
built-in batteries to supply additional power, the attacker’s nearly 40% of the total energy consumption by data centers
servers can consume more actual power and hence generate [12], serving almost all industry sectors. To complement their
more heat than the operator measures using power meters. own megascale data centers that are typically built in rural
Moreover, this additional cooling load may not be promptly areas, even top-brand companies like Google and Microsoft
pinpointed by only monitoring the servers’ inlet and outlet rely on third-party colocations for better performances due to
temperatures. Consequently, indiscernible additional cooling close proximity to end users [22]. Importantly, in the context of
loads can be injected by an attacker to exceed the shared edge computing, colocations play an even more crucial role, as
cooling capacity, thus triggering thermal emergencies. While it is not economical for individual companies to fully manage
we focus on edge colocation data center, such thermal attacks small-scale data centers in numerous locations [23].
may be mounted against larger colocation data centers as well, The data center capacity includes both power and cooling
albeit the attacker needs to commit more resources. capacities. Power capacity is quantified by the amount of UPS-
Meanwhile, before automatic system shutdown [12], [20], protected power (a.k.a. critical power) that is delivered to
handling thermal emergencies require tenants’ power/cooling the servers, excluding other power consumption such as UPS
load reduction through clock rate throttling and/or workload power losses and cooling system power. As nearly 100% server
re-routing to other unaffected data centers, which can ad- power consumption (except for fan power) is converted into
versely affect tenants’ performance in terms of application heat or cooling load, the cooling system capacity is often sized
response time. based on the colocation’s power capacity and usually also
While successful thermal attacks can create an adverse measured in kilowatt [15], [16], [24]. The data center design
environment for hosting servers in edge colocations, they need may also leave some “headroom” in the cooling capacity to
non-trivial efforts. As a prerequisite for a good timing, the handle, if any, irregular heat generation and/or hot spots due
attacker needs to estimate benign tenants’ power/cooling loads to certain servers generating more heat than expected. In such
based on a voltage side channel [5]. For a one-shot attack cases, the cooling system utilization may sometimes still be
with the goal of shutting down an entire edge data center, the high because of the increasingly common practice of power

331
Managed by Operator
1 1 supply

Benign 2

…
inlet
UPS Tenants
PDU 3 internal

4 outlet
Attacker Cold 2 3 4 Hot
Cooling System Aisle Aisle Air Circulation

Fig. 2. An edge colocation data center with an attacker. Fig. 3. Overview of a cooling system in an edge colocation.

oversubscription for capital cost saving in modern data centers the inlet temperature while server internal temperature is the
(e.g., Facebook aggressively oversubscribes its power capacity highest and regulated by servers’ internal fans. Hence, with
by 47% on average) [13], [25]–[27]. heat containment, we have the following [30]–[32]:
The colocation operator provides non-IT infrastructure sup-
port (i.e., power and cooling systems), while each tenant Tinlet ≈ Tsup < Toutlet < Tinternal . (1)
brings and controls its own physical servers.2 The non-IT In a data center, server inlet temperature is the most impor-
infrastructure is expensive and/or time-consuming to construct, tant thermal metric [31], [33], because servers’ internal tem-
taking nearly 60% of the total cost of ownership over a 10- perature control uses the inlet temperature as a reference [34].
year lifespan for a colocation operator [12], [15], [21]. Thus, For example, in modern data centers, server inlet temperature
like network bandwidth, the operator’s power and cooling is conditioned at 27◦ C for cooling efficiency, as recommended
infrastructure capacity is a limited resource carefully sized by ASHRAE [33], [35]. Also note that, while server heat is
based on the tenants’ demand. responsible for increase in internal and outlet temperatures,
neither Tinternal nor Toutlet is a reliable indicator for a server’s
A. Power Infrastructure
cooling load since they depend on the server’s internal heat
As illustrated in Fig. 2, typically, an edge colocation data management (e.g., fan speed) and air flow rate.
center uses a tree-type power hierarchy with total capacity
in the range of a few kilowatts to a few tens of kilowatts III. T HERMAL ATTACK
shared by multiple tenants. Utility power first enters the data The main focus of this section is to present the potential
center through an uninterruptible power supply (UPS). Then, threat of battery-assisted thermal attacks (when concealed
the UPS-protected power goes into a power distribution unit cooling loads are behind the meter and not promptly detected)
(PDU), which distributes the power to its downstream servers. and help strengthen edge colocations. As a precursor, we first
introduce our threat model that outlines the scenario consid-
B. Cooling Infrastructure ered for thermal attacks. We then present the potential impacts
While various cooling methods (e.g., computer room air on edge colocations. Finally, we introduce two possible attack
conditioner, chiller, and “free” outside air cooling) are avail- strategies followed by discussions on their feasibility.
able [28], an edge colocation usually uses a computer room
A. Threat Model
air conditioner to remove servers’ heat due to its small size
and often rugged deployment (e.g., outdoor with a wireless We consider an edge colocation data center with a total
tower). Fig. 3 illustrates a typical cooling system in an edge power/cooling capacity of C, housing a few racks of servers
colocation. For the best cooling efficiency, today’s edge colo- owned by multiple tenants. There exists a malicious tenant
cations also implement hot/cold aisle containment to prevent (i.e., attacker) that runs artificial workloads without real values
the hot air from mixing with the cold air [23], [29]. and has bad intentions.
There are four different notions of temperature in a data What the attacker can do. The attacker houses its own
center: supply air temperature Tsup , server inlet temperature physical servers in the edge colocation, sharing the power and
Tinlet (i.e., temperature of cold air entering a server), server cooling infrastructures with benign tenants. As illustrated in
internal temperature Tinternal (e.g., CPU temperature), and Figs. 1 and 4(a), the attacker’s server power supply units has
server outlet temperature Toutlet (i.e., temperature of hot air built-in battery units, which can conceal the attacker’s actual
exiting a server). With heat containment installed, all the server power/cooling load from the operator’s power meters.
servers’ inlet temperature is nearly identical to the supply air Fig. 4(a) shows an overview of the attacker’s server.
temperature. Thus, supply air temperature and server inlet tem- The attacker subscribes a data center capacity of ca from
perature are the lowest and baseline, whose increase will lead the colocation operator and keeps its power drawn from the
to increases in server internal and outlet temperatures. Server operator’s PDU below ca at all times (even during an attack),
outlet temperature is typically elevated by 10+◦ C compared to in order to meet the operator’s requirement.
When launching a thermal attack, the attacker runs power-
2A tenant can share fraction of a rack space with other tenants. hungry applications (e.g., intensive computation) to increase

332
below a certain level (a.k.a. power capping). The actual amount
ADC
12V and duration of power capping can be either pre-determined
PSU
based on SLA terms [35] or decided at runtime through a
Voltage
Battery Regulator dynamic coordination mechanism [12], [44].
Nonetheless, handling a thermal emergency by capping

…
tenants’ server power (through, e.g., CPU throttling) inevitably
results in performance degradation, which can in turn cause
(a) (b) user dissatisfaction, revenue loss, and/or SLA violation [12],
[13], [18], [45]. Some workloads may be re-routed to other
Fig. 4. (a) Attacker’s server with built-in batteries. (b) Supermicro’s power unaffected data centers for service continuity, but this comes
supply [17]. The built-in battery module is highlighted in a red circle.
at a higher latency since otherwise those workloads would
have been processed in the preferred site to achieve the best
its actual server power consumption to pa > ca where ca performance without being re-routed.
amount comes from the operator’s PDU and the rest from its 2) System outage: In order to prevent permanent hardware
built-in battery. In practice, when running at the peak load, a damage, if the server inlet temperature continues rising despite
single server equipped with multiple CPUs and/or GPUs can cooling load capping, automatic system shutdown may occur,
easily consume several hundred watts, even more than 1kW leading to a system outage (e.g., the shared PDU can power
[36]. Thus, the attacker can inject an additional cooling load off when the inlet temperature reaches 45◦ C) and service
of pb = pa − ca beyond its subscribed capacity by discharging interruptions [33]. Such system outages can cause loss of
built-in battery units (which can be achieved by a dual-source working data sets, and also suffer from long restart waiting
power supply that can simultaneously draw power from the time. Financially, a system outage can cost thousands of
PDU and the battery units [21], [37]–[40].) dollars every minute [10]. For latency-critical applications, an
The attacker uses a voltage side channel, as proposed in [5], outage event may cause even more catastrophic consequences
to estimate benign tenants’ real-time total server load with a such as decreased safety in edge-assisted driving [46].
high accuracy (Fig. 5(b)). We also run a prototype experiment to demonstrate the
What the attacker cannot do. We do not consider naive potential impact of thermal attacks on benign tenants, and the
attacks, such as self-explosion and tampering with the physical results are in Appendix A.
infrastructures, which are beyond the scope of our work.
C. Attack Strategies
Moreover, other attacks, such as network DDoS attacks, are
also orthogonal to our focus. We introduce two possible strategies for battery-assisted
thermal attacks with different goals.
B. Impact of Thermal Attacks One-shot attack. It aims at creating a system outage by
Although non-trivial efforts are needed in the threat model, increasing the server inlet temperature beyond the safety limit
a successful thermal attack can overload a data center’s cooling (e.g., 45◦ C [33]). It can also be coordinated across multiple
system and possibly increase the server inlet temperature to a edge colocations for a wide-area service interruption. Even
dangerous level, triggering frequent performance degradation successfully launched only once, the caused damage may
and even system outages [34], [41]. be signiﬁcant, especially for safety-critical applications (e.g.,
1) Performance degradation: Before system shutdown, a edge-assisted driving) [46].
preventative mechanism is to temporarily cap the data center- Repeated attacks. Instead of aggressively overheating and
wide cooling load (i.e., server power) below the cooling capac- shutting down the entire edge colocation, repeated attacks
ity [20], [42]. Speciﬁcally, when the server inlet temperature aim at frequently degrading performance of benign tenants’
exceeds a threshold (e.g., 32◦ C) for a certain amount of time latency-sensitive applications over a long period (e.g., one
[20], it is considered that a data center exception, called year) by triggering thermal emergencies and cooling load
thermal emergency, has occurred and servers are forcibly put capping. Thus, repeated attacks compromise the long-term
in a low power state. The wait-time between inlet temperature cooling system availability in edge colocations.
violation and thermal emergency declaration depends on oper- In general, one-shot attack requires a higher battery capacity
ator’s risk management policy. The temperature threshold for to support more intense attack loads (which may still be
a thermal emergency is set lower than the server’s automatic feasible as shown in Section VI). On the other hand, repeated
shutdown temperature to proactively handle an emergency. For attacks require relatively less (still a considerable amount of)
example, in a Google-type data center, disk speeds and/or resource, but they require more sophisticated timing of the
CPUs are throttled to lower the server power load (i.e., cooling attacks and can be easy to detect.
load) in the event of a thermal emergency [20], [43]. Similar
mechanisms also exist in multi-tenant colocations to handle a D. Feasibility of Thermal Attacks
thermal emergency. Concretely, without controlling tenants’ Motivation for thermal attacks. One-shot attack is as
servers, the operator sends signals to tenants’ own server motivating as traditional DDoS attacks, as it can potentially
management systems such that tenants can cap power loads create service outages. Likewise, repeated attacks can result

333
in frequent performance degradation for latency-sensitive ap- 0.15

Probability
plications, which in turn causes user dissatisfaction, revenue
0.10

loss, and/or SLA violation. Thus, although the cost barrier
0.05
is non-trivial, battery-assisted thermal attack might still be
inviting for potential attackers, such as the target colocation’s 0.00
−6 −4 −2 0 2 4 6
Load Estimation Err. (%)
ill-intentioned competitor or state-sponsored attackers.
Attacker’s malicious cooling load. In recent years, vendors (a) (b)
have integrated built-in batteries into servers’ power supply
units as an emerging backup power solution (e.g., Supermicro Fig. 5. (a) Server voltage carries the servers’ load information. (b) Load
estimation error of the voltage side channel.
BBP [17] shown in Fig. 4(b)). Thus, an attacker can discharge
built-in batteries to supply additional power to its servers,
generating malicious cooling loads without being monitored
and estimate the total load at runtime.
by the colocation operator’s power meters. Moreover, without
We run a 24-hour real-world workload trace in our prototype
air flow meters, temperature sensors that only monitor server
and collect the voltage signal using a NI digital data acquisi-
inlet/outlet temperature cannot reliably locate the malicious
tion (DAQ) as an ADC proxy to extract the servers’ total power
cooling load. Consequently, if left neglected, thermal attacks
load. We plot in Fig. 5(b) the probability distribution of load
can be launched behind the meter. This is also illustrated
estimation errors, confirming that the voltage side channel can
in Fig. 1: an attacker generates 300W cooling load, but the
be leveraged for precisely timing thermal attacks.
colocation operator only measures 200W from the power meter
Possibility of being detected. Detection of battery-assisted
and the additional 100W load is supported by the attacker’s
thermal attacks is not difficult, but contingent upon the edge
internal batteries.
colocation operator’s practice of environment monitoring.
Availability of off-the-shelf hardware. Servers with built-
Specifically, if the operator solely relies on power meters for
in batteries are commercially available (e.g., Supermicro [17]).
monitoring tenants’ loads and temperature sensors for condi-
The current battery energy density is enough to fit into servers
tioning the thermal environment, thermal attacks may possibly
and supply sufficient additional power to mask the attacker’s
remain undetected until they cause damages. A service outage
malicious cooling loads [47], even for an one-shot attack that
(due to one-shot attack) or more frequent thermal emergencies
requires more attack loads than repeated attacks. Moreover,
(due to repeated attacks) can trigger a thorough inspection,
servers with large peak-to-average ratios are also available
thus exposing the attacker. In order to proactively prevent
for generating a large amount of heat during an attack. For
such damages in advance, as discussed in Section VII, the
example, Dell manufactured PowerEdge R740/R740xd servers
operator can install additional monitoring apparatus such as
can be equipped with up to three Nvidia Tesla GPUs each with
server outlet air flow meters, which are not widely used in
225W peak and 20W idle power [48], [49].
many data centers. Thus, although thermal attacks do not have
Voltage side channel to time thermal attacks. Due to
a high degree of stealthiness, there is a need of attention to
time-varying loads, the attacker needs to find a good timing
potential thermal attacks.
for successful attacks (especially for repeated attacks) when
Relationship to power attacks. Power attacks exploit
benign tenants’ aggregate power load (or cooling load) is high.
oversubscribed power capacity and can be launched without
The attacker can utilize a side channel — voltage side channel
the need of battery [5]–[8], [51]. On the other hand, our
in our study — to estimate benign tenants’ power draw from
proposed thermal attacks are launched with the help of built-in
the shared PDU. The voltage side channel is robust against
battery for concealment of malicious cooling loads. Moreover,
changes in the environment and provides high accuracy due to
for repeated attacks, thermal attacks are stateful due to battery
its wired signal [5]. Utilizing the voltage side channel requires
charging/discharging that results in temporal correlation of
one analog-to-digital converter (ADC) that can fit on a server’s
battery states, whereas power attacks are stateless and can
power supply unit (as demonstrated in an orthogonal study for
be launched at any time without being constrained by the
USB-powered IoT devices [50]). As shown in Fig. 4(a), the
available battery energy. Thus, our thermal attacks are com-
ADC taps into the server’s input voltage to sample the PDU-
plementary to power attacks and present a potential threat by
level voltage.
leveraging servers’ built-in battery for a malicious purpose.
For the readers’ understanding, we show in Fig. 5(a) the fun-
damental principle behind the voltage side channel as recently
IV. L EARNING AN ATTACK P OLICY
proposed in [5]. The key idea is that because of the voltage
drop along the shared power cable, the total load information An one-shot attack is a special case of repeated attacks if the
(proportional to current) is contained in the voltage signal, e.g., attacker sets a sufficiently high threshold on benign tenant’s
V1 , entering any servers connected to the PDU. Meanwhile, all load (above which an attack is launched) and greedily use
today’s servers have power factor correction (PFC) circuits that up its large built-in battery energy. Thus, we now study a
generate high-frequency voltage ripples, whose amplitude is general repeated attack policy, Foresighted, by formulating
strongly correlated with the server load. Thus, the attacker can it as a discrete-time Markov decision process (MDP) and
sense the incoming voltage signal, extract the voltage ripples, using reinforcement learning. The repeated attack policy has

334
a structural property: attack when both the benign tenants’ system
∞ state) which maximizes the total discounted reward
k
server load and the battery energy level are sufficiently high. k=0 γ R(sk , ak , sk+1 ). The discount factor γ ∈ (0, 1) is
imposed to ensure the convergence of summation and implies
A. MDP formulation
in practice that future rewards are relatively less important than
We divide the entire time horizon into time slots (e.g., 1 immediate rewards [52]. Nonetheless, the resulting server inlet
minute each) indexed by k = 0, 1, 2, · · · , ∞, and present our temperature T (s, a) is an involved function that also depends
MDP formulation below. on external factors such as the edge colocation layout, and
• System state: s = (b, u) ∈ S the dynamics of benign tenants’ power usage is unknown to
• Action: a(s) ∈ A(s) the attacker. Thus, we need an online learning approach to

• State transition probabilities: P (s, a, s ) identify the optimal policy π ∗ on the fly.

• Reward function: R(s, a, s )
• Discount factor: γ ∈ (0, 1)
B. Batch Q-learning
The tuple (s, a, s ) means that, given an action a, the
system state evolves from s to s . In our problem, the system Reinforcement learning can effectively assist an agent with
state includes two sub-states: battery state (the amount of finding optimal actions in an unknown environment. The
remaining energy b in the batter units) and the attacker’s cooling load state is essentially uncontrollable and exogenous
estimated benign tenants’ load state u (using a voltage side to the attacker. On the other hand, the battery state is fully
channel in Section III-D [5]). Note that we consider the controllable and, with simplification, can be approximated as
estimated load as part of the system state, because the true bk+1 = min(bk + ek , B̄), where ek is the charged energy
value of servers’ total load is not available to the attacker. during one time slot (a negative value means battery discharg-
We consider three actions: (1) charging the battery units; ing for attacks) and B̄ is the total battery capacity. Thus, we
(2) launching a thermal attack by running the servers at peak adopt batch Q-learning [53], by extending the widely-used
power and discharging batteries; and (3) standby, i.e., running standard Q-learning [52], [53]. Concretely, by introducing an
dummy workloads without charging or discharging batteries. intermediate state (also called post state s̃k ), we have two state
The battery’s charging rate is fixed at the vendor recommended transition processes: from sk to s̃k , we only update the battery
value, while the effective discharging rate (i.e., power actually state whose transition, according to the attacker’s action, is
delivered to servers, excluding battery losses) is set to pb fully determined; then, from s̃k to sk+1 , we will update the
which, if combined with the attacker’s subscribed capacity cooling demand state based on observations. More specifically,
ca , can support the attacker’s total server power consumption for each time slot k, our proposed batch Q-learning works as
pa for thermal attacks. The state transitions are governed by follows:
benign tenants’ load that is exogenous to the attacker and the
battery energy evolution which is controlled by the attacker’s ak ← arg max [Q(sk , a) + θV (s̃k (sk , a))] (3)
charging/discharging decision. a∈A(sk )

We define the attacker’s reward function as follows: s̃k (sk , ak ) ← f (sk , ak ) (4)
+
R(s, a, s ) = w · [T (s, a) − T0 ] − β(a), (2) Q(sk , ak ) ← (1 − δ)Q(sk , ak ) + δR(sk , ak , sk+1 ) (5)
C(sk ) = max [Q(sk , a) + γV (s̃k )] (6)
where T (s, a) is the resulting server inlet temperature, T0 is a
the server inlet temperature conditioned by the operator with- V (s̃k ) = (1 − δ) V (s̃k ) + δC(sk+1 ) (7)
out attacks, β(a) is a cost term, and the operator [·]+ means
max(·, 0). Note that the attacker can easily sense the resulting where δ ∈ (0, 1) is the learning rate, and only the battery state
inlet temperature T (s, a), because today’s servers have built- is updated based on the attacker’s charging/discharging action
in temperature sensors to monitor the server inlet temperature when setting the post state s̃k (sk , a) in Eqn. 4.
for safety reasons (i.e., if the server inlet temperature is too Unlike standard Q-learning, three different value matrixes
high, the server may shut down by itself [34]). Clearly, after are used for batch learning: state-action value Q(sk , ak ),
discharging batteries, the attacker needs to recharge them, post-state value V (s̃k ), and normal state value C(Sk ). First,
which hence draws more energy from the operator’s PDU after observing the system state sk , the attacker makes an
than otherwise. To account for this, we add a normalized cost action a based on Q(sk , a) and post-state value V (s̃k (sk , a))
term: β(a) = 1 during an attack and β(a) = 0 otherwise. according to Eqn. 3. Then, post state sk can be obtained
The cost is normalized to 1, because the attacker discharges based on attacker’s action. Next, the reward Rk is obtained
a fixed amount of energy for each attack. The weight w ≥ 0 based on attacker’s observed server inlet temperature and its
governs the tradeoff between server inlet temperature increase reward function in Eqn. (2). Meanwhile, the next state sk+1 is
and total battery usage (or attack time): the larger w, the more obtained by estimating the cooling state through a voltage side
importance of server inlet temperature increase and hence channel as discussed in Section III-D. Thus, the three value
more attacks. matrixes can be updated recursively according to Eqns. (5),
In a standard MDP, the goal is to find an optimal policy (6) and (7), respectively, making the learning process converge
π ∗ : S → A (i.e., deciding an optimal action given each more quickly.

335
TABLE I
L IST OF PARAMETERS WITH THE DEFAULT VALUES . 3 12

Power (kW)
4 Aggregate Power
10 Cooling Capacity
Parameter Value 2 8
Data Center Capacity 8 kW 1
Number of Tenants 4 6
Number of Servers 40 4
Number of Server Racks 2 0 6 12 18 24
Attacker’s Capacity (ca ) 0.8 kW Time (h)
Attacker’s Total Battery Capacity (B̄) 0.2 kWh (a) (b)
Attack Thermal Load from Battery 1 kW
Charging Rate of the Battery 0.2 kW Fig. 6. (a) Data center layout.
1 Server racks.
2 Heat containment. 3 Air
Temperature Threshold for Emergency (Tth ) 32◦ C conditioner.
4 Supply air duct. (b) 24-hour snapshot of the power trace.
Q-learning Discount Factor (γ) 0.99
Q-learning Learning Rate (δ(t)) 1/t0.85

CFD analysis. Speciﬁcally, to extract the heat distribution

V. E VALUATION M ETHODOLOGY matrix, we test the data center with a heat spike from each
In this section, we first present the default simulation set- server and measure the resulting temperature impact for 10
tings and evaluation metrics, and then validate our simulation minutes. We repeat the process for all servers to completely
model. build the matrix. We use the 10-minute window to allow
the heat convection through the air and capture the gradual
A. Settings temperature build-up from sustained server heat generation.
It is practically challenging, if possible at all, to evaluate We limit the CFD analysis of each heat spike to 10 minutes
different thermal attack strategies over a timescale of years. since we find no measurable impact beyond this time horizon.
Thus, we resort to a simulation-based approach based on the The accuracy of CFD analysis and the heat distribution model
well-established computational fluid dynamics (CFD) analysis has been extensively verified with real systems [31], [56], and
[6], [31], [32], [54] to simulate thermal dynamics. This is also will also be validated against our prototype in Section V-B.
the state-of-the-art methodology in data center-scale research Because of the well-insulated environment, we do not incorpo-
[6], [13], [14], [31]. Prior to simulations, we will also validate rate the impact of outside temperature. Even with low outside
our simulation model with real experiments on our scaled- temperature, if overloaded, data center’s cooling system cannot
down prototype of 14 servers. We list the default simulation remove all server heat.
parameters in Table I. Attacker. The attacker has built-in batteries integrated with
Edge colocation infrastructure. We consider a container- the servers’ power supply units. While the capacity of each
ized modular data center design, which is particularly suitable server subscribed from the operator is 200W, each of the
for edge colocations due to its self-contained design. We attacker’s servers can run at a peak power of 450W by
follow the specification of the Vertiv SmartMod container data discharging built-in batteries to supply the additional 250W.
center with two server racks, each holding 20 servers [55]. We Thus, the attacker can inject up to 1kW cooling load for
consider there are four tenants (including the attacker) with repeated attacks. When recharging, the built-in batteries have
a total subscribed power (i.e., the power capacity) of 8kW, a total charging rate of 0.2 kW. We use battery specification
where each server’s maximum power consumption is 200W. of [47] with a suitable size for placing inside a server and
The attacker has 4 servers with a total subscribed capacity of set the attacker’s default total battery capacity to 0.2kWh with
0.8 kW while the other three benign tenants each subscribe 0.05kWh (i.e., 200W for 15 minutes) per server. If the attacker
to 2.4kW. The attacker’s servers are shown in red shades in aims at an one-shot attack, each of its four servers has a peak
Fig. 6(a). Note that, while we place the attacker’s servers at power of 950W, resulting in a total attack load of 3KW. This
the bottom of the rack, their location within the rack does not can be achieved by using multiple power hungry GPUs (e.g.,
play any significant role in the attack since the cooling load Nvidia RTX 3080, each with a full power of 320W [57]) in
is determined by server power. The data center employs heat each server. The current battery energy density [47] is enough
containment for the hot exhaust air returning to the AC. The to support the additional load for an one-shot attack, as each
AC supplies cold air at 27◦ C, with a cooling capacity of 8kW. attack only lasts a few minutes.
Fig. 6(a) shows the layout used in our experiment. Thermal emergency and system outage. A thermal emer-
Thermal environment. The CFD analysis provides the gency is considered to arise when the server inlet temperature
most detailed thermal dynamics of a data center (e.g., even exceeds 32◦ C for at least 2 minutes. We consider 32◦ C as
Google uses CFD analysis to predict thermal distributions the threshold temperature, because it is the maximum allowed
[54]). However, it is computationally exhaustive to run tran- temperature based on the ASHRAE guideline for data centers
sient CFD analysis for long experiments (e.g., a year) [31]. with enterprise-grade servers and storage [33]. To handle a
Therefore, following the literature [6], [31], we model the data thermal emergency, each server (including attacker’s servers)
center’s heat flow using a heat distribution matrix, for which is required to cap its power below 120W (60% of capacity) to
we only need to obtain the matrix parameters using shorter prevent more serious impacts. As a precaution, load capping

336
Temperature (∘C)
100 250 1.5kW thermal overload beyond the limit that can be handled
Overload (kW)

1.5 40

Battery Level (%)

Cooling Load

Power (W)
80 200
1.0
Experiment
Simulation 35 60 150
by the top air vents. We obtain the heat distribution model
0.5 30 40 Battery Level
Discharge 100 based on CFD analysis. In Fig. 7(a), we show the monitored
20 Re-Charge 50
0.0 25
server inlet temperature change along with our temperature
0 0
0 10 20 30 0 10 20 30 40 50 60 change simulated using our model. We see that both the heat
Time (min) Time (min)
distribution model and temperature sensor readings exhibit
(a) (b)
very similar dynamics. This is expected since we adopt well-
established CFD-based simulation [6], [31].
Fig. 7. Experimental validation of our simulation model.
Battery energy dynamics. In our Q-learning and the
simulation, we need to validate that the linear battery model
lasts for 5 minutes for each thermal emergency. If the inlet bk+1 = min(bk +ek , B̄), where bk is the battery level at time k,
temperature continues rising to reach 45◦ C, automatic shut- is accurate to model the battery energy changes with respect
down occurs (e.g., the shared PDU can power off), creating a to the charging/discharging decisions. For this, we connect
system outage and service interruptions [33]. two Dell desktops with a total load of ∼175W to our UPS
Power trace. For the three benign tenants, we use workload battery. We connect a power meter between the UPS and the
traces from Facebook and Baidu [13], [14], and generate a AC power outlet to measure the total power consumption of
year-long synthetic power trace from request-level log using the battery and the desktops. We connect another power meter
server power models validated in real systems [58]–[60]. The between the UPS and the desktops to record the total power
total power usage is scaled to have a 75% average utilization of the two desktops. Subtracting the later from the former
in our 8kW data center. We show a 24-hour snapshot of the gives the total power consumption of the UPS. To demonstrate
power trace in Fig 6(b). To demonstrate its robustness across the battery dynamics, we first run the UPS on the battery
different load patterns, we also run an alternate power trace discharging mode by unplugging it from the AC outlet. After
and show in Fig. 13 in Section VI-F. 10 minutes, we reconnect the UPS to the AC outlet, which puts
Application performance. For delay-sensitive workloads, it in the battery charging mode. We show the battery energy
high-percentile latency is the most critical metric [61]. Here, levels in Fig. 7(b). In our experiment, the charging rate is
we consider 95-percentile response time as the performance lower than the discharging rate, because of the additional UPS
metric and model the tenants’ performance based on experi- loss to power the running desktops. This experiment conforms
ments on our small cluster (Fig. 15 in Appendix A). to our choice of a linear battery energy model. While even
Q-learning parameters. Following the literature [62], we more complicated and detailed battery models (e.g., impact of
set the default discount factor γ = 0.99 and a dynamic learning ambient temperature) may be adopted [63], it does not offer
rate that is updated everyday using δ(t) = 1/t0.85 , where t much additional insight for our purpose and our observations
is the number of days elapsed. We use one minute as each still hold.
time slot, and show the other parameters when presenting the To sum up, our simulation methodology (i.e., using CFD-
results in Section VI. To initialize the table of Q values, we use based analysis for modeling temperature dynamics and using a
random power traces offline based on an initial attack policy. linear charging/discharing model for battery energy dynamics)
Our results show that during the online learning stage, the matches well with the real-world observations and hence can
action policy can converge quickly (often within 1-4 weeks). be used to evaluate thermal attacks with a good confidence.
Evaluation metrics. For the adverse thermal environment,
VI. E VALUATION R ESULTS
we consider the average server inlet temperature increase,
the probability distribution of the temperature, and the total We first show an example of one-shot attack. Then, for
emergency hours due to repeated thermal attacks. For benign repeated attacks, we compare Foresighted with another attack
tenants, we examine their performance degradation. We also policy, Myopic, that launches thermal attacks in a greedy
study the average response time during the emergency periods manner whenever there is enough energy in the battery and
normalized to that of without any emergencies. the benign tenants’ aggregate load is sufficiently high. Besides
Myopic and Foresighted, we also consider Random as a bench-
B. Experimental Validation of Our Simulation Model mark, where the attacker randomly launches thermal attacks
While simulation-based evaluation is widely used in data whenever it has enough battery energy without considering
center research [6], [9], [21], we validate our simulation model benign tenants’ power loads.
using real experiments on our prototype consisting of 14
servers and a 600VA CyberPower UPS battery. We look into A. Thermal Attack Demonstration
the two important aspects of our simulation model — thermal 1) One-shot attack: We consider a 30-minute snapshot and
dynamics and battery charging/discharging model. demonstrate an one-shot attack in Fig. 8 where the attacker
Temperature dynamics. We place our server rack in a injects 3kW of intense attack load at around the 18th minute,
sealed environment with a comparable dimension to an edge causing the server inlet temperature to rise quickly. At around
data center. The rack is cooled by the building’s central cooling the 21st minute, a thermal emergency is triggered and power
system and has air vents on the top. We create an additional capping is applied, limiting the total metered load below 5KW.

337
Temperature (∘C)
Thermal Emergency Metered Load Attack Load Temperature Metered Power Attack Load Battery Recharge
12 60 Cooling Capacity Thermal Emergency Battery Energy
Power (kW)

10 50 10 Random
System Outage

Power (kW)

Energy (%)
8 40 100

Battery
6 30 8 75
4 20 6 50
0 5 10 15 20 25 30 25
Time (min)
4 0
10 Myopic
Fig. 8. Demonstration of a one-shot attack.

Power (kW)

Energy (%)
100

Battery
8 75
6 50
25
Nonetheless, the attack load remains to keep the server inlet 4 0
temperature high enough beyond the safety threshold of 45◦ C 10 Foresighted

Power (kW)

Energy (%)
100
[33], successfully resulting in a system outage. This is also

Battery
8 75
consistent with other orthogonal studies that demonstrate a 50
6
very quick rise of inlet temperature in case of a cooling system 25
4 0
malfunction [41]. If the one-shot attack is coordinated across 0 1 2 3 4
Time (h)
multiple colocations, a service interruption may occur and
create signiﬁcant damages. Fig. 9. 4-hour snapshot of thermal attacks.
2) Repeated attacks: We illustrate how repeated attacks
create emergencies under different attack policies in Fig. 9 by

considering a four-hour snapshot when the total power/cooling

load is relatively higher. In our illustration, Random launches

attacks for 8% of the times, Myopic sets the attack threshold
at 7.4kW, while Foresighted uses a weight w = 14. These

settings are chosen to yield similar attack times (i.e., 8%
of the time) across different attack policies. The total power
drawn from the operator’s PDU is shown as “Metered Power”, (a) Weight w = 9 (b) Weight w = 14
while the actual server power consumption also includes the
contribution from the attacker’s batteries (“Attack Load”) and Fig. 10. Attack policy learnt by Foresighted.
hence is larger than the metered power during the attacks.
On the other hand, the actual server power is smaller than
the metered power during battery charging. The discrepancy with fully charged batteries, unlike Foresighted, these attacks
between the metered power and actual server power highlights will more likely occur at the wrong times due to the lack of
the attacker’s “behind-the-meter” cooling loads that are not learning and accounting for battery level dynamics.
monitored by the operator.
We see in Fig. 9 that thermal attacks using Random, B. Attack Policy Learnt by Foresighted
which remains oblivious of the high cooling load, fail to We show in Fig. 10 the structural property of our repeated
create any thermal emergencies. Note that, Random’s attacks attack policy learnt by Foresighted: attack when both the
look sparser in Fig. 9 since they are more spread over time benign tenants’ server load and the battery energy level are
while Myopic and Foresighted’s attacks are concentrated in sufficiently high. For illustration, we consider two different
the high power/cooling load periods. Myopic exploits the values of w (the larger w, the more weight on creating
voltage side channel [5] to detect benign tenants’ high power temperature increases and hence more attacks). For w = 9 in
loads and launches thermal attacks between hours 0 and 1. Fig. 10(a), attacks are launched only when the estimated power
Since the power/cooling load remains at a high level, attacks load (including the attacker’s subscribed power capacity) is
continue until the operator announces a thermal emergency. above 7.5kW and more than 60% of battery energy is left.
At that point, attacks are stopped and the power consumption For w = 14, we see that attacks are launched even for 40%
is capped to oblige to the operator’s emergency handling remaining battery energy when the power is above 7.5kW.
protocol. The power returns to a normal level after being Meanwhile, Foresighted launches attacks at a lower power of
capped for 5 minutes to handle the thermal emergency. 7kW when it has more than 80% battery energy.
While it also launches thermal attacks between hours 0 and
1, Foresighted does not launch a series of unsuccessful short- C. Cost Estimate
duration attacks like Myopic. Instead, it waits to regain the bat- Benign tenants’ cost. With an one-shot attack, benign
tery energy and launches a sustained thermal attack to trigger a tenants can suffer from service outages, which may be costly
second thermal emergency near hour 2. This shows the benefits or even indirectly cause fatal damages (e.g., decreased safety
of reinforcement learning which considers the impact of its for assisted driving [46]); with repeated attacks, tenants can
actions on the future for maximizing the long-term benefits. potentially experience more frequent performance degradation.
Note that, even if Myopic only launches long-duration attacks The monetary impact of thermal attacks is generally difficult

338
16 3 3

Norm Avg Delay

Emergency (%)
Random Myopic Foresighted Myopic Foresighted Myopic Foresighted
Ts=25oC
Time (min)

12 0.2

ΔT (∘C)
Ts=27oC 2 2
8 Ts=29oC
0.1 1 1
4
0 0.0 0 0
0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Tenant Tenant Tenant
Overload Power (kW) Attack Time (h/day) Attack Time (h/day) #1 #2 #3

(a) (b) (c) (d)

Fig. 11. (a) Overload time required to exceed the temperature limit of 32◦ C. (b) Average temperature increase vs. attack time. (c) Total attack-induced
emergency vs. attack time. (d) Tenants’ performance during emergencies.

to estimate. To offer an approximate point of reference, we However, as more attacks are launched, the temperature in-
provide a ballpark estimate for repeated attacks following crease for Myopic peaks at around attack time of 1.1 hours
prior studies [10], [12], [64] that calculate the cost impact per day and then starts to decrease. This is because Myopic
resulting from the increased 95-percentile latency. Under our launches premature attacks which deplete the battery energy
setting, Foresighted causes a total performance cost of roughly and hence miss future attack opportunities. We see a similar
$60+K/year to benign tenants in our 8kW edge colocation impact on the annual thermal emergency time in Fig. 11(c)
(roughly 80% of benign tenants’ total rental costs plus amor- where Myopic’s performance starts to deteriorate around attack
tized server costs), noting that the actual cost highly depends time of 1.5 hours per day.
on the affected tenants’ applications and can include additional Foresighted takes the future into account and hence retains
indirect cost such as business reputation. both the average temperature increase and annual emergency
Attacker’s cost. The attacker’s cost involves the power time increase with more thermal attacks. However, beyond
capacity subscription cost, electricity cost, and server purchase an attack time of 1.5 hours per day, Foresighted cannot
cost: 150$/kW/month power subscription cost, 0.1$/kWh en- create further higher temperature increases nor more thermal
ergy cost, and $4500 for each server [12]. It is on a par with emergencies. This is mainly because the total available attack
the cost for other related attacks [5]–[9], and can be affordable opportunities are limited (i.e., benign tenants do not always
for institutional or state-sponsored attackers. have high power loads) and recharging batteries takes time.
Nonetheless, given any amount of thermal attacks, Foresighted
D. Impact of Thermal Attacks can create higher server inlet temperature increases and more
thermal emergencies than Myopic.
For repeated attacks, we ﬁrst show in Fig. 11(a) how long
Attack-induced thermal emergencies. In Fig. 11(c), we
it takes for the server inlet temperature to exceed the 32◦ C
see that the attack-induced thermal emergencies for both
threshold. Naturally, the temperature exceeds the threshold
Myopic and Foresighted are close to zero at low attack time.
sooner with increased cooling overload. Similarly, when the
This is because the operator declares a thermal emergency
data center is already running hotter (i.e., higher supply tem-
when the data center temperature exceeds 32◦ C and stays
perature Ts ), its temperature reaches the limit faster. We see
there for at least two minutes. Hence, at low attack time
that it takes less than four minutes to increase the data center
which also corresponds to low average temperature increases
temperature from 27◦ C to 32◦ C with one kW of additional
in Fig. 11(b), there are almost no thermal emergencies due to
cooling load, demonstrating the potential danger of thermal
attacks.
attacks.
Performance impacts. We normalize the tenants’ 95-
We then vary the total attack energy injected into the percentile response time to that of without any emergencies.
edge colocation (i.e., total attack time), while keeping the We take the average of the normalized response time during
attack load from the battery ﬁxed at 1kW. We vary the the emergency periods and show the result in Fig. 11(d).
attack probability for Random from 0% to 15%, the load We see that Myopic has a slightly higher average perfor-
threshold (including the attacker’s own power subscription) mance impact than Foresighted. This is because Myopic
for launching an attack under Myopic from 6.5kW to 8.0kW, mainly captures the most prominent attack opportunities while
and the weight parameter for Foresighted from w = 0 to Foresighted intelligently picks up even the subtle opportunities
w = 30. Figs. 11(b) and 11(c) show the average server inlet with relatively lower impact, resulting in a lower average
temperature increase (ΔT ) beyond 27◦ C and the amount of performance impact. Nonetheless, since Foresighted seizes
attack-induced emergencies (measured in % of the total time) both the prominent and subtle attack opportunities, it results in
given different average daily attack times, respectively. In more frequent thermal emergencies, thus resulting in a greater
Fig. 11(c), we exclude Random because it fails to create any cost impact.
thermal emergency.
Temperature increase. We see in Fig. 11(b) that with more E. Sensitivity Study
attacks, the temperature increase caused by Random also rises. We now study how the battery capacity, side channel
For Myopic and Foresighted, the temperature increase rises accuracy, attack load, and data center average utilization
very fast initially, when attacks are conservatively launched. affect the resulting thermal attacks. We also study the impact

339
Myopic Foresighted Myopic Foresighted Myopic Foresighted Myopic Foresighted

Battery Capacity
4 3 4 3 0.6
Emergency (%)

Emergency (%)

Emergency (%)
3 3

(kWh)
2 2 0.4
2 2
1 1 1 1 0.2
0 0 0 0 0.0
0.1 0.2 0.3 0.4 0 1 2 5 0.5 1 1.5 2 55 65 75 85 0 2 4 6 8 10
Battery Capacity (kWh) Random Noise (%) Attack Load (kW) Capacity Utilization (%) $PPMJOH$apacity Increase (%)

(a) (b) (c) (d) (e)

Fig. 12. Sensitivity of Foresighted. (a) Battery capacity. (b) Load estimation due to random noise in side channel. (c) Attack load. (d) Average utilization
of data center capacity. (e) Required battery capacity for extra cooling capacity.

12 3
of additional cooling capacity on attacker’s battery capacity

Norm Avg Delay

Aggregate Power Myopic Foresighted

Power (kW)
10 Cooling Capacity
requirement. We exclude Random from our study here since 8
2
it fails to create any thermal emergency. 6 1
Battery capacity. Considering repeated attacks, we vary 4
0
0 6 12 18 24 #1 #2 #3
the battery capacity from 0.1 kWh to 0.4 kWh, and show Time (h) Tenant
the annual duration of thermal emergencies due to the attacks (a) (b)
in Fig. 12(a). Naturally, a larger battery provides greater
flexibility in launching thermal attacks. Hence, we see the Fig. 13. Results with an alternate power trace. (a) A 24-hour snapshot of the
annual thermal emergency time increases with battery capacity. alternate power trace. (b) Tenants’ performance during emergencies.
We also see the difference between Myopic and Foresighted
decreases with a larger battery as the battery is more likely to
be available whenever Myopic needs it, like in Foresighted. Thus, as discussed in Section VII, other defenses are more
Load estimation accuracy. To test robustness against volt- effective and cost-efficient, especially for an existing data
age side channel errors, we add varying degrees of random center that has limited cooling capacity.
errors to the estimated loads of benign tenants and show our F. Results with an Alternate Power Trace
results in Fig. 12(b). As expected, the thermal emergency time
decreases for both Myopic and Foresighted when there is more We conduct our year long evaluation with an alternate power
noise in the side channel. Nonetheless, Foresighted can still trace to demonstrate that Foresighted is effective regardless of
create a significant amount of thermal emergency, even using the benign tenants’ load patterns. We use the Google cluster
a noisy voltage side channel. trace from [40] as the alternate total power trace. We show
a 24-hour snapshot of the alternate power trace in Fig. 13(a).
Attack load. The attack load determines how much addi-
Like in the default setting, we scale the power trace to have
tional cooling load is injected during each attack. We show the
a 75% average utilization in our 8kW edge colocation. We
results in Fig. 12(c) where we keep the attacker’s subscribed
keep the same default settings as in Section V for Myopic
capacity at 0.8kW and scale the thermal attack load from
and Foresighted. Fig. 13(b) shows that, with the alternate
0.5kW to 2kW. We see that the annual emergency time
power trace, benign tenants suffer from similar performance
greatly increase with a higher attack load and that Foresighted
degradation as in our earlier results. While we omit detailed
consistently outperforms Myopic by a great margin.
discussion for space limitation, these findings are consistent
Capacity utilization. We study the impact of average data with our earlier results.
center utilization on the thermal attack by scaling the power
trace of all the servers while maintaining the peak power at VII. D EFENSE M ECHANISM
8kW. Fig. 12(d) shows that the total thermal emergency time Tenants generally expect reliable power and cooling sup-
increases with increased capacity utilization. This is intuitive plies (subject to contractual terms) from the colocation oper-
since an increased utilization means the data center more ator which manages non-IT systems. Thus, we offer possible
frequently operates close to its capacity, thus leading to more defenses from the operator’s perspective. We first discuss
thermal attack opportunities. defenses that aim at preventing potential thermal attacks,
Extra cooling capacity. We study the impact of the oper- followed by defenses that detect thermal attacks.
ator’s extra cooling capacity on Foresighted’s battery require-
ment to maintain similar impact (i.e., 2.3% emergency). In A. Prevention
Fig. 12(e), we see that the extra cooling capacity mandates The following defense strategies are proactive measures to
higher battery capacity. Specifically, the increase in battery inhibit potential thermal attacks.
capacity for 10% extra cooling capacity is about ∼0.3kWh, Infrastructure resilience. A straightforward defense
which can still be feasible given today’s battery energy density. against thermal attacks is to reinforce an edge colocation’s
Note, however, that upgrading an existing data center cooling physical infrastructure for handling thermal overloads. For
system to add extra cooling capacity is non-trivial due to this, the operator can deploy a cooling system with additional
constraints such as space limitation, data center uptime, etc. redundancies. This approach, however, can increase the capital

340
cost [24], [65] and be particularly challenging for existing attacker’s servers — the source of the injected cooling load
systems. Alternatively, the operator can lower its server inlet — is still needed to hold the attacker accountable. Thus, to
temperature set point (to 20◦ C instead of the recommended monitor the servers’ actual cooling loads, the operator can
27◦ C) to have more margins for triggering thermal emergen- measure each server’s outlet temperature as well as the hot
cies. The drawback is the increased cooling energy cost [15], air flows. Alternatively, thermal cameras may be employed
[31]. Thus, while oversubscribing data center cooling capacity to identify the servers that are running extra hot. Likewise,
[15], [16], [24] and increasing temperature set point [31] have microphone arrays can be used along with the thermal camera
been suggested for cost efficiency, they should be carefully to pinpoint servers with fans spinning at a high speed (needed
exercised, balancing the benefit versus risk to potential thermal by servers that have higher cooling loads) [7]. While these
attacks. monitoring apparatuses are not used in all data centers, they
Rigorous move-in inspection. The colocation operator are readily available and can be easily installed by data centers
can employ a more rigorous background check and move-in to identify malicious cooling loads.
inspection process for all tenants’ servers to detect and remove To sum up, there exist readily-available defenses, such as
integrated batteries. Note that without built-in batteries, the move-in inspection to disallow built-in batteries, advanced
attacker cannot have additional power sources to support anomaly detection, and installation of monitoring apparatuses
thermal attacks behind the meter or overload the shared to locate the attacker. Given the potential threat of thermal
cooling capacity, unless the data center cooling capacity is attacks that are currently neglected, the edge colocation oper-
oversubscribed as suggested by recent studies [15], [16], [24]. ator can implement one or more of the suggested defenses to
Besides, the operator can also enforce on-site power load safeguard its thermal environment for tenants.
tests to ensure that the server power is consistent with the
tenant’s data center capacity subscription. The operator should VIII. R ELATED W ORKS
be particularly careful about the servers’ peak power. Power and thermal management. The common practice
Degrading physical side channels. The colocation operator of aggressive capacity oversubscription can create occasional
may increase the attacker’s uncertainties about timing attacks capacity overloads when the demand peaks [12]–[14], [18],
by degrading/eliminating the physical side channel. For exam- [19], [21]. To safely ride through power emergencies, numer-
ple, it can add jamming noise signals into the colocation power ous graceful power capping techniques have been proposed,
networks and/or use power line noise filters. Additionally, the such as throttling CPU frequencies [13], migrating/deferring
operator may also prohibit unusual sensors (e.g., microphones) workloads [14], [45], and discharging batteries to boost power
on tenants’ servers in order to prevent an attacker from supply [18], [19], [21]. Likewise, managing server loads to
exploiting other possible but unknown side channels. handle thermal emergencies are equally crucial [16], [20],
[43]. These studies, however, are not applicable for colocations
B. Detection whose operators have no control over tenants’ servers. More-
Detection strategies can be implemented to catch an attacker over, they do not consider an adversarial setting. More recent
that may circumvent prevention approaches. works [12], [68] propose market approaches to coordinate
Detecting behind-the-meter cooling loads. The same tenants’ power demand in colocations, but they assume that
power reading can result in different cooling loads and server tenants are all benign without any malicious intentions.
inlet/outlet temperature, depending on whether malicious ther- Data center security and thermal fault attacks. Securing
mal attacks are launched or not. Thus, by using anomaly detec- data centers against cyber attacks, such as network DDoS [69]
tion algorithms (e.g., cross-checking readings by temperature and data/privacy breach [70], has been extensively investi-
sensors and power meters), the operator can detect an irregular gated. Prior studies have also considered malicious thermal
thermal environment possibly due to thermal attacks. load attacks on a single device [71]. More recently, data center
Identifying attacks from impacts. One-shot attacks can power and cooling system security has been emerging as a
be easily identified through a thorough inspection if a system crucial concern [5]–[9], [51], [72]. However, these works focus
outage occurs. By contrast, repeated-attacks that inject milder on overloading the power infrastructure (i.e., power attack) of
loads to trigger more frequent thermal emergencies can re- large data centers with multi-level redundancy or creating hot-
quire more efforts. Since precise temperature management is spots (i.e., thermal attack) in Amazon-type cloud with frequent
difficult with open airflow cooling, there can be occasional VM shuffling. In contrast, we focus on novel battery-assisted
thermal emergencies in colocations even without thermal at- thermal attacks in a shared edge colocation. Moreover, our
tacks; colocation operators often offer a long-term temperature repeated battery-assisted thermal attacks are stateful whereas
SLA (e.g., the inlet temperature is conditioned below 27◦ C prior attacks are stateless as the current attack does not depend
for 99% or more of the time) [66], [67]. This may potentially on any past/future attacks.
allow an attacker to hide behind the statistics for a longer time. Battery management and others. The prior studies have
Thus, advanced algorithms can be implemented to monitor exploited batteries for various purposes, such as better energy
SLA metrics to early detect the presence of thermal attacks. capacity [63], concealing a household’s electricity usage infor-
Improved data center monitoring. While the aforemen- mation from the utility for better privacy [73], smoothing data
tioned approaches can detect thermal attacks, pin-pointing the center power demand [18], [19], [21], among many others.

341
To our knowledge, however, our study is the first to leverage [21] D. Wang, C. Ren, A. Sivasubramaniam, B. Urgaonkar, and H. Fathy,
batteries for a malicious purpose — one-shot or repeated “Energy storage in datacenters: what, where, and how much?,” in
SIGMETRICS, 2012.
thermal attacks in edge colocations — which highlights the [22] Y. Sverdlik, “Google to build and lease data centers in big cloud
need of attention to the potential threat. expansion,” in DataCenterKnowledge, April 2016.
[23] DatacenterKnowledge, “Vapor IO to sell data center colocation services
IX. C ONCLUSION at cell towers,” http://www.datacenterknowledge.com/archives/2017/06/
21/vapor-io-to-sell-data-center-colocation-services-at-cell-towers.
In this paper, we discovered that the sharing of cooling [24] M. Skach, M. Arora, D. Tullsen, L. Tang, and J. Mars, “Virtual melting
systems may expose edge colocations’ potential vulnerabil- temperature: Managing server load to minimize cooling overhead with
ities to both one-shot and repeated thermal attacks assisted phase change materials,” in ISCA, 2018.
[25] S. Malla, Q. Deng, Z. Ebrahimzadeh, J. Gasperetti, S. Jain, P. Kon-
with built-in batteries. For repeated attacks, we presented a dety, T. Ortiz, and D. Vieira, “Coordinated priority-aware charging of
foresighted attack policy which, using reinforcement learning, distributed batteries in oversubscribed data centers,”
learns on the fly a good timing for thermal attacks. We also [26] V. Sakalkar, V. Kontorinis, D. Landhuis, S. Li, D. De Ronde, T. Bloom-
ing, A. Ramesh, J. Kennedy, C. Malone, J. Clidaras, and P. Ranganathan,
ran simulations to validate our attacks and showed that, for “Data center power oversubscription with a medium voltage power plane
an 8kW edge colocation, an attacker can cause performance and priority-aware capping,” in ASPLOS, 2020.
degradation for affected tenants. Finally, we suggested effec- [27] A. Kumbhare, R. Azimi, I. Manousakis, A. Bonde, F. Frujeri, N. Ma-
halingam, P. Misra, S. A. Javadi, B. Schroeder, M. Fontoura, and
tive countermeasures against potential thermal attacks that are R. Bianchini, “Prediction-based power oversubscription in cloud plat-
currently neglected in many data centers. forms,” 2020.
[28] T. Evans, “The different technologies for cooling data
R EFERENCES centers,” http://www.apcmedia.com/salestools/VAVR-5UDTU5/
VAVR-5UDTU5 R2 EN.pdf.
[1] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision [29] Google, “Heat containment,” http://www.google.com/about/datacenters/
and challenges,” IEEE Internet of Things Journal, vol. 3, pp. 637–646, efficiency/external/.
Oct 2016. [30] D. L. Moss, “Dynamic control optimizes facility airflow delivery,” Dell
[2] DatacenterKnowledge, “NTT plans global data center network for con- White Paper, March 2012.
nected cars,” http://www.datacenterknowledge.com/archives/2017/03/27/ [31] Q. Tang, S. K. S. Gupta, and G. Varsamopoulos, “Thermal-aware task
ntt-plans-global-data-center-network-for-connected-cars. scheduling for data centers through minimizing heat recirculation,” in
[3] Vapor IO, “The edge data center,” https://www.vapor.io/. CLUSTER, 2007.
[4] Uptime Institute, “Data center industry survey,” 2018, https:// [32] S. V. Patankar, “Airflow and cooling in a data center,” Journal of Heat
uptimeinstitute.com/2018-data-center-industry-survey-results. Transfer, vol. 132, p. 073001, July 2010.
[5] M. A. Islam and S. Ren, “Ohm’s law in data centers: A voltage side [33] R. A. Steinbrecher and R. Schmidt, “Data center environments: Ashrae’s
channel for timing power attacks,” in CCS, 2018. evolving thermal guidelines,” ASHRAE Technical Feature, pp. 42–49,
[6] M. A. Islam, S. Ren, and A. Wierman, “Exploiting a thermal side December 2011.
channel for power attacks in multi-tenant data centers,” in CCS, 2017.
[34] Dell, “Integrated dell remote access controller 9 (iDRAC9) version
[7] M. A. Islam, L. Yang, K. Ranganath, and S. Ren, “Why some like it
3.00.00.00.”
loud: Timing power attacks in multi-tenant data centers using an acoustic
[35] 365DataCenters, “Master services agreement,” http://www.
side channel,” in SIGMETRICS, 2018.
365datacenters.com/master-services-agreement/.
[8] Z. Xu, H. Wang, Z. Xu, and X. Wang, “Power attack: An increasing
[36] R. A. Bridges, N. Imam, and T. M. Mintz, “Understanding gpu power: A
threat to data centers,” in NDSS, 2014.
survey of profiling, modeling, and simulation methods,” ACM Comput.
[9] X. Gao, Z. Xu, H. Wang, L. Li, and X. Wang, “Reduced cooling
Surv., vol. 49, pp. 41:1–41:27, Sept. 2016.
redundancy: A new security vulnerability in a hot data center,” in NDSS,
2018. [37] Keysight Technology, “Learn to connect power supplies in parallel for
[10] Ponemon Institute, “2016 cost of data center outages,” 2016, https:// higher current output,” https://www.keysight.com/main/editorial.jspx?
www.ponemon.org/blog/2016-cost-of-data-center-outages. cc=US&lc=eng&ckey=520808&nid=-11143.0.00&id=520808.
[11] P. Jones, “Overheating brings down microsoft data center,” Dat- [38] S. Govindan, D. Wang, A. Sivasubramaniam, and B. Urgaonkar, “Ag-
acenter Dynamics, 2013, https://www.datacenterdynamics.com/news/ gressive datacenter power provisioning with batteries,” ACM Trans.
overheating-brings-down-microsoft-data-center/. Comput. Syst., vol. 31, pp. 2:1–2:31, Feb. 2013.
[12] M. A. Islam, X. Ren, S. Ren, A. Wierman, and X. Wang, “A market [39] D. Wang, C. Ren, and A. Sivasubramaniam, “Virtualizing power distri-
approach for handling power emergencies in multi-tenant data center,” bution in datacenters,” in ISCA, 2013.
in HPCA, 2016. [40] L. Liu, C. Li, H. Sun, Y. Hu, J. Gu, T. Li, J. Xin, and N. Zheng, “Heb:
[13] Q. Wu, Q. Deng, L. Ganesh, C.-H. R. Hsu, Y. Jin, S. Kumar, B. Li, Deploying and managing hybrid energy buffers for improving datacenter
J. Meza, and Y. J. Song, “Dynamo: Facebook’s data center-wide power efficiency and economy,” in ISCA, 2015.
management system,” in ISCA, 2016. [41] P. Lin, S. Zhang, and J. VanGilder, “Data center temperature rise during
[14] G. Wang, S. Wang, B. Luo, W. Shi, Y. Zhu, W. Yang, D. Hu, L. Huang, a cooling system outage,” APC White Paper 179, 2014.
X. Jin, and W. Xu, “Increasing large-scale data center capacity by [42] Intel, “Intel cloud builders guide to power management in cloud design
statistical power control,” in EuroSys, 2016. and deployment using Supermicro platforms and NMView management
[15] M. Skach, M. Arora, C.-H. Hsu, Q. Li, D. Tullsen, L. Tang, and J. Mars, software,” 2013.
“Thermal time shifting: Leveraging phase change materials to reduce [43] L. Ramos and R. Bianchini, “C-Oracle: Predictive thermal management
cooling costs in warehouse-scale computers,” in ISCA, 2015. for data centers,” in HPCA, 2008.
[16] I. Manousakis, I. n. Goiri, S. Sankar, T. D. Nguyen, and R. Bianchini, [44] L. Zhang, S. Ren, C. Wu, and Z. Li, “A truthful incentive mechanism for
“Coolprovision: Underprovisioning datacenter cooling,” in SoCC, 2015. emergency demand response in colocation data centers,” in INFOCOM,
[17] Supermicro, “Battery backup power - evolutionary design to replace 2015.
UPS,” http://www.supermicro.com/products/nfo/files/bbp/f bbp.pdf. [45] D. Wang, S. Govindan, A. Sivasubramaniam, A. Kansal, J. Liu, and
[18] B. Aksanli, T. Rosing, and E. Pettis, “Distributed battery control for B. Khessib, “Underprovisioning backup power infrastructure for data-
peak power shaving in datacenters,” in IGCC, 2013. centers,” in ASPLOS, 2014.
[19] V. Kontorinis, L. E. Zhang, B. Aksanli, J. Sampson, H. Homayoun, [46] S. Baidya, Y.-J. Ku, H. Zhao, J. Zhao, and S. Dey, “Vehicular and
E. Pettis, D. M. Tullsen, and T. S. Rosing, “Managing distributed UPS edge computing for emerging connected and autonomous vehicle ap-
energy for effective power capping in data centers,” in ISCA, 2012. plications,” in DAC, 2020.
[20] Y. Kim, J. Choi, S. Gurumurthi, and A. Sivasubramaniam, “Managing [47] “Calb 100 ah se series lithium iron phosphate battery,” https://www.
thermal emergencies in disk-based storage systems,” Dec 2008. evwest.com/catalog/product info.php?products id=51.

342
[48] “Poweredge r740xd rack server,” https://www.dell.com/en-us/work/ an additional 1.5kW load to overload the cooling system and
shop/povw/poweredge-r740xd. measure the server inlet temperature. As shown in Fig. 14(a),
[49] L. Brochard, V. Kamath, J. Corbalán, S. Holland, W. Mittelbach, and
M. Ott, Energy-Efﬁcient Computing and Data Centers. John Wiley & the inlet temperature rises to nearly 40◦ C within minutes.
Sons, 2019. Our experiment, albeit on a small scale, demonstrates the
[50] K. Lee, N. Klingensmith, S. Banerjee, and Y. Kim, “Voltkey: Continuous rapid increase of server inlet temperature due to a overloaded
secret key generation based on power line noise for zero-involvement
pairing and authentication,” Proc. ACM Interact. Mob. Wearable Ubiq- cooling system. This is also corroborated by other studies
uitous Technol., vol. 3, Sept. 2019. that demonstrate rapid temperature rises in data centers due
[51] X. Gao, Z. Gu, M. Kayaalp, D. Pendarakis, and H. Wang, “Container- to cooling malfunction [41]. We follow the ASHRAE safety
Leaks: Emerging security threats of information leakages in container
clouds,” in DSN, 2017. limit and do not further overload our system [33].
[52] J. N. Tsitsiklis, “Asynchronous stochastic approximation and q-
learning,” Machine learning, vol. 16, no. 3, pp. 185–202, 1994.

Temperature (∘C)

p95 Delay (ms)

1.5 40

Overload (kW)
400
[53] J. Xu, L. Chen, and S. Ren, “Online learning for ofﬂoading and autoscal- Cooling Load
Tinlet
Without Attack
With Attack
ing in energy harvesting mobile edge computing,” IEEE Transactions on 1.0 35 300 Emergency
Cognitive Communications and Networking, vol. 3, no. 3, pp. 361–373, 200
0.5 30
2017. 100
0.0 25 0
[54] Google, “Google’s Data Center Efﬁciency,” http://www.google.com/ 0 10 20 30 0 10 20 30
about/datacenters/. Time (min) Time (min)
[55] Vertiv, “Smartmod modula data center infrastructure.” (a) Temperature (b) Performance
[56] X. Wang, X. Wang, G. Xing, and C. xian Lin, “Leveraging thermal dy-
namics in sensor placement for overheating server component detection,” Fig. 14. Experiment in our server rack. (a) Server inlet temperature increases
in IGCC, 2012. due to a cooling capacity overload by 1.5kW. (b) Latency performance is
[57] NVidia, “https://www.nvidia.com/en-us/geforce/graphics-cards/30- compromised due to server power capping for handling an emergency.
series/rtx-3080/.”
[58] D. G. Feitelson, D. Tsafrir, and D. Krakov, “Experience with using
the parallel workloads archive,” Journal of Parallel and Distributed
Computing, vol. 74, no. 10, pp. 2967–2982, 2014. 2.5 2.5

Response Time

Response Time
800 users 300 req/s
Normalized

Normalized
[59] Parallel Workloads Archive, http://www.cs.huji.ac.il/labs/parallel/ 2.0 2.0
1000 users 400 req/s
workload/. 1.5
1.5
1.0
[60] X. Fan, W.-D. Weber, and L. A. Barroso, “Power provisioning for a 1.0 0.5
warehouse-sized computer,” in ISCA, 2007. 0.5 0.0
[61] M. E. Haque, Y. h. Eom, Y. He, S. Elnikety, R. Bianchini, and 0.6 0.7 0.8 0.9 1.0 0.6 0.7 0.8 0.9 1.0
Normalized Power Normalized Power
K. S. McKinley, “Few-to-many: Incremental parallelism for reducing
tail latency in interactive services,” in ASPLOS, 2015. (a) Web Service (b) Web Search
[62] E. Even-Dar and Y. Mansour, “Learning rates for q-learning,” Journal
of Machine Learning Research, vol. 5, no. Dec, pp. 1–25, 2003. Fig. 15. Performance degradation due to power capping.
[63] L. He, E. Kim, and K. G. Shin, “*aware charging of lithium-ion battery
cells,” in ICCPS, 2016. We implement the ClouSuite Web Service benchmark [74]
[64] P. X. Gao, A. R. Curtis, B. Wong, and S. Keshav, “It’s not easy being
green,” SIGCOMM Comput. Commun. Rev., 2012. in a set of 4 servers with a workload of 600 requests/s
[65] I. Goiri, R. Bianchini, S. Nagarakatte, and T. D. Nguyen, “Approx- and show the impact of power capping on the 95-percentile
hadoop: Bringing approximations to mapreduce frameworks,” in ASP- response time, which is the key performance metric [61]. An
LOS, 2015.
[66] Internap, “Colocation services and SLA,” http:// x-percentile response time means that x% of the requests
www.internap.com/internap/wp-content/uploads/2014/06/ have a latency less than this response time. For illustration,
Attachment-3-Colocation-Services-SLA.pdf. we throttle the CPU speed to cap the total server power to
[67] Equinix, “Colocation services and SLA,” https://enterprise.verizon.com/
service guide/reg/cp colocation equinix data centers sla.pdf. 60% of the peak power. We see from Fig. 14(b) that during
[68] M. A. Islam, H. Mahmud, S. Ren, and X. Wang, “Paying to save: the emergency, the response time jumps nearly four times to
Reducing cost of colocation data center via rewards,” in HPCA, 2015. 400ms.
[69] S. Yu, Y. Tian, S. Guo, and D. O. Wu, “Can we beat ddos attacks
in clouds?,” IEEE Transactions on Parallel and Distributed Systems,
We also extend our experiments to Web Search implemen-
vol. 25, pp. 2245–2254, September 2014. tation from CloudSuite [74]. We show the 95-th percentile
[70] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Cross-vm side response time normalized to the service level agreement
channels and their use to extract private keys,” in CCS, 2012.
[71] S. Skorobogatov, “Local heating attacks on ﬂash memory devices,” in
(100ms) for two different numbers of users for Web Service
Workshop on Hardware-Oriented Security and Trust, 2009. in Fig. 15(a) and two different request rates for Web Search
[72] C. Li, Z. Wang, X. Hou, H. Chen, X. Liang, and M. Guo, “Power attack in Fig. 15(b), respectively. The server power consumption is
defense: Securing battery-backed data centers,” in ISCA, 2016.
[73] L. Yang, X. Chen, J. Zhang, and H. V. Poor, “Optimal privacy-preserving
normalized to the peak. We see that when the server power
energy management for smart meters,” in INFOCOM, 2014. consumption decreases, the response times for both applica-
[74] “CloudSuite - The Search Benchmark,” http://cloudsuite.ch/. tions increase for any given workload level. This reveals the
degree of performance degradation faced by tenants when they
A PPENDIX A reduce their power consumption while the workload remains
P ROTOTYPE D EMONSTRATION OF T HERMAL ATTACKS unchanged.
To see the impact of thermal attacks, we run experiments on
a rack of 14 Dell PowerEdge servers in a scaled environment
with hot-cold aisles to mimic an edge colocation. The cooling
system can support up to a cooling load of 3kW. We inject

343

Turnover Package Checklist
100% (4)
Turnover Package Checklist
3 pages
Shao 2019
No ratings yet
Shao 2019
2 pages
Gao 2020
No ratings yet
Gao 2020
18 pages
Gao 2017
No ratings yet
Gao 2017
2 pages
Survey On Edge Computing Security
No ratings yet
Survey On Edge Computing Security
10 pages
A Cybersecurity Framework To Identify
No ratings yet
A Cybersecurity Framework To Identify
17 pages
Smart
No ratings yet
Smart
24 pages
ELEC Liu Survey On Secure Data Analytics in Edge Computing IEEEIoTJ
No ratings yet
ELEC Liu Survey On Secure Data Analytics in Edge Computing IEEEIoTJ
23 pages
Trusted Edge Computing System Based On Intelligent Risk Detection For Smart IoT
No ratings yet
Trusted Edge Computing System Based On Intelligent Risk Detection For Smart IoT
10 pages
An Efficient Security Model For Industrial Internet of Things
No ratings yet
An Efficient Security Model For Industrial Internet of Things
12 pages
Xiao 2019
No ratings yet
Xiao 2019
24 pages
Offline Device Hacking 11.00.17 PM
No ratings yet
Offline Device Hacking 11.00.17 PM
6 pages
Power Attack: An Increasing Threat To Data Centers: Zhang Xu Haining Wang Zichen Xu Xiaorui Wang
No ratings yet
Power Attack: An Increasing Threat To Data Centers: Zhang Xu Haining Wang Zichen Xu Xiaorui Wang
15 pages
Sensors 21 08226
No ratings yet
Sensors 21 08226
20 pages
Final - Douiba - CPS Reliability
No ratings yet
Final - Douiba - CPS Reliability
10 pages
IoTedgecomputingspawnnewsecurityissues-201201-145608
No ratings yet
IoTedgecomputingspawnnewsecurityissues-201201-145608
7 pages
A Survey on Post-quantum Based Approaches for Edge Computing Security
No ratings yet
A Survey on Post-quantum Based Approaches for Edge Computing Security
36 pages
Security of Distributed Intelligence in Edge Computing
No ratings yet
Security of Distributed Intelligence in Edge Computing
28 pages
Ensemble Technique of Intrusion Detection For Iot Edge Platform
No ratings yet
Ensemble Technique of Intrusion Detection For Iot Edge Platform
16 pages
Fog/Edge Computing For Security, Privacy, And Applications Wei Chang pdf download
No ratings yet
Fog/Edge Computing For Security, Privacy, And Applications Wei Chang pdf download
135 pages
R16 Seminar Report Template1
No ratings yet
R16 Seminar Report Template1
21 pages
Edge Computing
No ratings yet
Edge Computing
7 pages
Security Implications of Smart Environmental Monitoring in An HVAC Control System
No ratings yet
Security Implications of Smart Environmental Monitoring in An HVAC Control System
13 pages
Project9 Report Ver2 PDF
No ratings yet
Project9 Report Ver2 PDF
11 pages
Secure Edge Computing in IoT Systems - Review and Case Studies
No ratings yet
Secure Edge Computing in IoT Systems - Review and Case Studies
5 pages
(IJCST-V12I2P6) :berthille Igiraneza
No ratings yet
(IJCST-V12I2P6) :berthille Igiraneza
6 pages
Recommendations For Responding To System Security
No ratings yet
Recommendations For Responding To System Security
19 pages
Sensors: Microservice Security Agent Based On API Gateway in Edge Computing
No ratings yet
Sensors: Microservice Security Agent Based On API Gateway in Edge Computing
17 pages
Springer Paper Anit
No ratings yet
Springer Paper Anit
12 pages
Cooling Control Strategies in Data Centers
100% (1)
Cooling Control Strategies in Data Centers
76 pages
Jsan 11 00047 v2
No ratings yet
Jsan 11 00047 v2
50 pages
Sensors 23 09869
No ratings yet
Sensors 23 09869
20 pages
Smart
No ratings yet
Smart
18 pages
Msbte Submitted Final Paper
No ratings yet
Msbte Submitted Final Paper
6 pages
sensors-25-00213-v2
No ratings yet
sensors-25-00213-v2
42 pages
Anomaly Behavior Analysis For IoT Sensors
No ratings yet
Anomaly Behavior Analysis For IoT Sensors
15 pages
A Survey On Industrial Internet of Things Security
No ratings yet
A Survey On Industrial Internet of Things Security
49 pages
1 s2.0 S1568494624002084 Main
No ratings yet
1 s2.0 S1568494624002084 Main
21 pages
Edge AI A Survey - 2023 - Internet of Things and Cyber Physical Systems
No ratings yet
Edge AI A Survey - 2023 - Internet of Things and Cyber Physical Systems
22 pages
Governmentpolytechniccollege MATTANNUR-670702: (Department of Technical Education, Kerala)
No ratings yet
Governmentpolytechniccollege MATTANNUR-670702: (Department of Technical Education, Kerala)
29 pages
Edge Computing Security and Challenges
No ratings yet
Edge Computing Security and Challenges
2 pages
Edge Computing Driven IoT
No ratings yet
Edge Computing Driven IoT
41 pages
Sensors 20 06441
No ratings yet
Sensors 20 06441
53 pages
Published Journals
No ratings yet
Published Journals
9 pages
(IJCST-V11I3P9) :raghu Ram Chowdary Velevela
No ratings yet
(IJCST-V11I3P9) :raghu Ram Chowdary Velevela
6 pages
Sigcomm-Iot-18 D
No ratings yet
Sigcomm-Iot-18 D
7 pages
1908 00080 PDF
No ratings yet
1908 00080 PDF
33 pages
A Review of Edge Computing Technology and Its Applications in Power Systems
No ratings yet
A Review of Edge Computing Technology and Its Applications in Power Systems
28 pages
Sravani FOG Computing New
No ratings yet
Sravani FOG Computing New
20 pages
Edge Computing56
No ratings yet
Edge Computing56
17 pages
STAAR_REPORTaditya
No ratings yet
STAAR_REPORTaditya
10 pages
An Overview On Edge Computing Research
No ratings yet
An Overview On Edge Computing Research
18 pages
(IJCST-V12I6P2) :arushi Shrivastava, Khushboo Panjwani, Goldi Soni
No ratings yet
(IJCST-V12I6P2) :arushi Shrivastava, Khushboo Panjwani, Goldi Soni
6 pages
Edge Computing Seminar Report
67% (3)
Edge Computing Seminar Report
18 pages
Thermal Management in Large Data Centers: Security Threats and Mitigation
No ratings yet
Thermal Management in Large Data Centers: Security Threats and Mitigation
15 pages
1MV20CS001 TechSeminar
No ratings yet
1MV20CS001 TechSeminar
25 pages
Edge Computing in IoT
No ratings yet
Edge Computing in IoT
11 pages
Edge Computing and Cloud Computing For Internet of Things
No ratings yet
Edge Computing and Cloud Computing For Internet of Things
22 pages
2024内容创作者生态报告新榜 2024.11 76页
No ratings yet
2024内容创作者生态报告新榜 2024.11 76页
78 pages
TheWallStreetJournal - 10.29.2024
No ratings yet
TheWallStreetJournal - 10.29.2024
28 pages
Depgraph: A Dependency-Driven Accelerator For Efficient Iterative Graph Processing
No ratings yet
Depgraph: A Dependency-Driven Accelerator For Efficient Iterative Graph Processing
14 pages
Tensor Casting: Co-Designing Algorithm-Architecture For Personalized Recommendation Training
No ratings yet
Tensor Casting: Co-Designing Algorithm-Architecture For Personalized Recommendation Training
14 pages
Widir: A Wireless-Enabled Directory Cache Coherence Protocol
No ratings yet
Widir: A Wireless-Enabled Directory Cache Coherence Protocol
14 pages
Syncron:: Efficient Synchronization Support For Near-Data-Processing Architectures
No ratings yet
Syncron:: Efficient Synchronization Support For Near-Data-Processing Architectures
14 pages
Q: Query Acceleration Can Be Generic and Efficient in The Cloud
No ratings yet
Q: Query Acceleration Can Be Generic and Efficient in The Cloud
14 pages
Gradpim: A Practical Processing-In-Dram Architecture For Gradient Descent
No ratings yet
Gradpim: A Practical Processing-In-Dram Architecture For Gradient Descent
14 pages
Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack For AES Key Recovery
No ratings yet
Trident: A Hybrid Correlation-Collision GPU Cache Timing Attack For AES Key Recovery
13 pages
Revisiting Hyperdimensional Learning For Fpga and Low-Power Architectures
No ratings yet
Revisiting Hyperdimensional Learning For Fpga and Low-Power Architectures
14 pages
Faster SCHR Odinger-Style Simulation of Quantum Circuits: Aneeqa Fatima Igor L. Markov
No ratings yet
Faster SCHR Odinger-Style Simulation of Quantum Circuits: Aneeqa Fatima Igor L. Markov
14 pages
TILT: Achieving Higher Fidelity On A Trapped-Ion Linear-Tape Quantum Computing Architecture
No ratings yet
TILT: Achieving Higher Fidelity On A Trapped-Ion Linear-Tape Quantum Computing Architecture
14 pages
Systematic Approaches For Precise and Approximate Quantum State Runtime Assertion
No ratings yet
Systematic Approaches For Precise and Approximate Quantum State Runtime Assertion
15 pages
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
No ratings yet
Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
13 pages
TSOPER: Efficient Coherence-Based Strict Persistency
No ratings yet
TSOPER: Efficient Coherence-Based Strict Persistency
14 pages
Stealth-Persist: Architectural Support For Persistent Applications in Hybrid Memory Systems
No ratings yet
Stealth-Persist: Architectural Support For Persistent Applications in Hybrid Memory Systems
14 pages
Cheetah: Optimizing and Accelerating Homomorphic Encryption For Private Inference
No ratings yet
Cheetah: Optimizing and Accelerating Homomorphic Encryption For Private Inference
14 pages
New Models For Understanding and Reasoning About Speculative Execution Attacks
No ratings yet
New Models For Understanding and Reasoning About Speculative Execution Attacks
14 pages
Modul 4. Availability Concepts
No ratings yet
Modul 4. Availability Concepts
46 pages
Data Center DSI 2018
No ratings yet
Data Center DSI 2018
11 pages
ST-125™ Control System For Stanadyne DB-4 Series Pumps: Product Manual 04169 (Revision D)
No ratings yet
ST-125™ Control System For Stanadyne DB-4 Series Pumps: Product Manual 04169 (Revision D)
24 pages
Business Impact Analysis BIA Information Security - (13p)
100% (1)
Business Impact Analysis BIA Information Security - (13p)
13 pages
BBB M, M, KLKLKL JJK
No ratings yet
BBB M, M, KLKLKL JJK
16 pages
C729387 Compactador cb54
100% (1)
C729387 Compactador cb54
12 pages
Offshore Platforms & Fpsos: Bently Nevada Asset Condition Monitoring
No ratings yet
Offshore Platforms & Fpsos: Bently Nevada Asset Condition Monitoring
7 pages
Unit 3 Notes Part 2
No ratings yet
Unit 3 Notes Part 2
24 pages
Golden Gate
No ratings yet
Golden Gate
53 pages
Nuodb Neobank WP PDF
No ratings yet
Nuodb Neobank WP PDF
6 pages
Vsphere ICM 8 Lab 25
No ratings yet
Vsphere ICM 8 Lab 25
55 pages
BRD - Tool Life Monitoring
No ratings yet
BRD - Tool Life Monitoring
6 pages
KRA List & Responsibility Matrix
No ratings yet
KRA List & Responsibility Matrix
657 pages
JCI 8th Management of Information MOI and Health Care Technology HCT
No ratings yet
JCI 8th Management of Information MOI and Health Care Technology HCT
31 pages
Product Manual 26098 (Revision NEW) : LCS Series Integrated Speed Control
No ratings yet
Product Manual 26098 (Revision NEW) : LCS Series Integrated Speed Control
20 pages
Dynamic-Production Machining Center ROI:: How To Determine A Machine's True Value
No ratings yet
Dynamic-Production Machining Center ROI:: How To Determine A Machine's True Value
11 pages
What Is The Purpose of The Tco Calculator?: Homepage - Procurement - Govt.Nz
No ratings yet
What Is The Purpose of The Tco Calculator?: Homepage - Procurement - Govt.Nz
4 pages
Ayantu Melkamu
No ratings yet
Ayantu Melkamu
93 pages
Technical Report
No ratings yet
Technical Report
16 pages
HP SVSP Quickspecs v7
No ratings yet
HP SVSP Quickspecs v7
17 pages
1 VMware VSphere Interview Questions
100% (1)
1 VMware VSphere Interview Questions
10 pages
Poa Nexus
No ratings yet
Poa Nexus
1 page
NEW Telecommunications License Application Requirements Guide
No ratings yet
NEW Telecommunications License Application Requirements Guide
31 pages
Rise With Sap S4hana Private Cloud Edition Supplement English v1 2021
No ratings yet
Rise With Sap S4hana Private Cloud Edition Supplement English v1 2021
7 pages
Change Management Process Flow Guide
No ratings yet
Change Management Process Flow Guide
17 pages
ZSW03331
No ratings yet
ZSW03331
69 pages
OR Case Studies
No ratings yet
OR Case Studies
3 pages
Level: Personnel Environment Assets Reputation Financial
No ratings yet
Level: Personnel Environment Assets Reputation Financial
1 page
Ericsson WCDMAhghdgd
0% (1)
Ericsson WCDMAhghdgd
195 pages

Heat Behind The Meter: A Hidden Threat of Thermal Attacks in Edge Colocation Data Centers

Uploaded by

Heat Behind The Meter: A Hidden Threat of Thermal Attacks in Edge Colocation Data Centers

Uploaded by

2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Heat Behind the Meter: A Hidden Threat of

2378-203X/21/$31.00 ©2021 IEEE 330

CFD analysis. Speciﬁcally, to extract the heat distribution

Battery Level (%)

Norm Avg Delay

(a) (b) (c) (d)

(a) (b) (c) (d) (e)

Norm Avg Delay

p95 Delay (ms)

You might also like