
6Design-Space Exploration and Optimization 07556373

This document discusses a machine learning approach to optimize the design of a 3D small-world network-on-chip (SWNoC) architecture. The key contributions are: 1) Proposing a 3D SWNoC design that uses vertical links as long-range shortcuts, improving energy efficiency and reliability. 2) Using machine learning to intelligently explore the large 3D SWNoC design space and optimize planar and vertical link placement. 3) Developing a computationally efficient spare vertical link allocation algorithm based on state-space search to improve 3D SWNoC reliability. Simulations guide the search process to find high-quality solutions without knowing the cost function.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 36, NO. 5, MAY 2017 719

Design-Space Exploration and Optimization of an Energy-Efficient and Reliable 3-D Small-World Network-on-Chip
Sourav Das, Student Member, IEEE, Janardhan Rao Doppa, Member, IEEE,
Partha Pratim Pande, Senior Member, IEEE, and
Krishnendu Chakrabarty, Fellow, IEEE

Abstract: A 3-D network-on-chip (NoC) enables the design of high performance and low power many-core chips. Existing 3-D NoCs are inadequate for meeting the ever-increasing performance requirements of many-core processors since they are simple extensions of regular 2-D architectures and they do not fully exploit the advantages provided by 3-D integration. Moreover, the anticipated performance gain of a 3-D NoC-enabled many-core chip may be compromised due to the potential failures of through-silicon-vias that are predominantly used as vertical interconnects in a 3-D IC. To address these problems, we propose a machine-learning-inspired predictive design methodology for energy-efficient and reliable many-core architectures enabled by 3-D integration. We demonstrate that a small-world network-based 3-D NoC (3-D SWNoC) performs significantly better than its 3-D MESH-based counterparts. On average, the 3-D SWNoC shows 35% energy-delay-product improvement over 3-D MESH for the PARSEC and SPLASH2 benchmarks considered in this paper. To improve the reliability of 3-D NoC, we propose a computationally efficient spare-vertical link (sVL) allocation algorithm based on a state-space search formulation. Our results show that the proposed sVL allocation algorithm can significantly improve the reliability as well as the lifetime of 3-D SWNoC.

Index Terms: 3-D network-on-chip (NoC), discrete optimization, machine-learning, small-world (SW).

Manuscript received February 25, 2016; revised May 17, 2016 and July 8, 2016; accepted August 13, 2016. Date of publication August 30, 2016; date of current version April 19, 2017. This work was supported in part by the U.S. National Science Foundation under Grant CNS 1564014, Grant CCF-0845504, Grant CNS-1059289, and Grant CCF-1162202, and in part by the Army Research Office under Grant W911NF-12-1-0373. This paper was recommended by Associate Editor S. Pasricha.
S. Das, J. R. Doppa, and P. P. Pande are with the School of Electrical Engineering and Computer Engineering, Washington State University, Pullman, WA 99163 USA (e-mail: [email protected]; [email protected]; [email protected]).
K. Chakrabarty is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2016.2604288
0278-0070 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

THREE-DIMENSIONAL ICs are capable of achieving better performance, functionality, and packaging density compared to their traditional planar counterparts [1], [2]. On the other hand, network-on-chip (NoC) enables integration of large numbers of embedded cores in a single die. Three-dimensional NoC architectures combine the benefits of these two new paradigms to offer an unprecedented performance gain [2], [3]. With freedom in the third (vertical) dimension, NoC architectures that were previously impossible or prohibitive due to wiring constraints in planar ICs are now realizable in 3-D NoC, and many 3-D implementations can outperform their 2-D counterparts. However, existing 3-D NoC architectures predominantly follow straightforward extensions of regular 2-D NoC designs, which do not fully exploit the advantages provided by the 3-D integration technology [3]. Another challenge is that the anticipated performance gain of 3-D NoC-enabled many-core chips may be compromised due to potential failures of the through-silicon-vias (TSVs) used as vertical interconnects. TSVs in a 3-D IC fail due to voids, cracks, and different kinds of fabrication challenges [4]. Additionally, the workload-induced stress increases the resistance of the TSVs, which leads to different mean-time-to-failure (MTTF) for different TSVs [5], [6].

The main focus of this paper is to explore and consequently establish performance-energy-reliability tradeoffs for 3-D small-world NoC (SWNoC) [7], [8]. To this end, we make the following contributions.
1) We consider the design space of 3-D SWNoC architectures, where the vertical connections predominantly work as long-range shortcuts for SW networks. 3-D SWNoC architectures (as shown in Fig. 1) help with both energy-efficiency (small average path length) and reliability (average path length grows insignificantly due to link failures). This is the first work to exploit the advantages of 3-D integration to design a power-law-based SW network-enabled 3-D NoC architecture.
2) The design space of a 3-D SWNoC is combinatorial in nature. Hence, we leverage machine-learning techniques to intelligently explore the design space to optimize the placement of both planar and vertical communication links for high performance and energy efficiency.
3) We consider spare-vertical link (sVL) allocation to improve the reliability of the 3-D NoC. This is another combinatorial optimization problem, where we do not know the cost function. We can experimentally compute the quality (or cost) of a solution by running a simulation. We solve this problem using a state-space search formulation, where the simulations guide the search process. We leverage the structure of the problem and domain knowledge of the 3-D SWNoC to efficiently produce an sVL allocation that can significantly improve the reliability of the 3-D NoC.

4) We perform a comprehensive experimental study by using several PARSEC and SPLASH2 benchmarks to evaluate the proposed optimized 3-D NoC architecture and sVL allocation schemes. We show that the proposed 3-D SWNoC outperforms the state-of-the-art NoC architectures for all benchmarks considered in this paper. We show the effectiveness of our greedy sVL allocation method by comparing its computation time and solution quality with those obtained via exhaustive search. Finally, we also demonstrate the soundness of domain knowledge used for pruning the search space for sVL allocation for 3-D SWNoC.

Fig. 1. Conceptual view of 3-D SWNoC with TSV-enabled VL. For simplicity, only one logical XY-plane is shown.

II. RELATED PRIOR WORK

We categorize the prior work on 3-D NoC design as follows.

A. 3-D NoC Architectural Space
Most existing 3-D NoC architectures are based on a conventional mesh topology [9]–[11]. However, it is well-known that mesh-based NoCs suffer from high network latency and energy consumption due to multihop communication links. To exploit the reduced distance along the vertical dimension of 3-D IC, an NoC-bus hybrid architecture was proposed in [12]; it uses dynamic time division multiple access to reduce the network latency. To reduce energy consumption of the system, the 3-D dimensionally decomposed NoC router architecture [13] was developed. In an NoC, the largest percentage of energy is consumed by the routers, and energy consumption increases nonlinearly with the number of input ports. To reduce the energy consumption and the number of input ports, an improved 3-D NoC router architecture was developed [14]. All these architectures have buses in the Z-dimension; hence, with increasing network size, they are subject to traffic congestion and high latency under high traffic injection loads.

The Sunfloor 3-D was developed for synthesizing application-specific 3-D NoCs [15]. The design and synthesis of application-specific 3-D NoC architectures was also investigated [16], [17]. Later, a more general-purpose 3-D NoC was proposed in [18] using an integer linear programming (ILP)-based algorithm to insert long-range links to develop low diameter and low radix architecture. However, the reduction in energy consumption was found to be limited. Photonic interconnects offer high bandwidth and low power for future many-core chip design. A number of hybrid 3-D/photonic NoC architectures have been designed [2], [19]. However, on-chip photonics still suffers from performance variation due to thermal issues [20]. In addition, the challenges of integrating two emerging paradigms, namely, 3-D IC and silicon nanophotonics, are yet to be adequately addressed.

B. Reliability Analysis
The performance of TSV-based 3-D ICs degrades due to TSV failure. To overcome such performance penalties arising from TSV failure, researchers have investigated spare TSV (sTSV) allocation and sharing schemes for 3-D ICs [4], [21]. The initial sharing algorithm was developed based on the idea of utilizing one extra TSV for each of the functional TSVs [21]. However, the reliability improvement comes at the expense of double TSV count and significant area overhead.

To avoid the 100% TSV area overhead, researchers have proposed several TSV sharing schemes (e.g., 3:2, 4:1, and so on) [22], [23]. The main idea is to share sTSV/s among a group of functional TSVs to compensate for performance penalty due to possible TSV failure within that particular group. However, depending on the sharing scheme, a significant amount of encoder and decoder logic circuits are necessary to shift the signals and select the correct sTSV, which introduces additional delay and power consumption. In addition, the delay for each TSV can be different depending on the location of the failed TSV, which may result in timing violation. To address the varying delay for each TSV, a group-based 6-TSV placement scheme for four functional TSVs was proposed to improve the reliability of a 3-D DRAM [24]. With the help of a switchbox-based design for each group, correct signals were selected and transferred for functional TSVs. The main advantage was that the same amount of delay was incurred by every TSV in the box. However, this advantage comes at the expense of 50% area overhead and significant power consumption from the switch-boxes. Similarly, researchers have developed a block-based redundancy architecture and used signal-shifting techniques for fault tolerance [25], [26]. In this case, if any TSV fails, then the signal shifts toward the redundant ones. The signal shifting technique can tolerate one TSV failure. To improve fault tolerance for more than one TSV failure, a crossbar-based redundant TSV architecture was developed in [27]. However, the number of redundant TSVs increases significantly in this case. TSV resource sharing algorithms, which can be selectively applied depending on the granularity and design complexity, were also developed. Word-level and bit-level TSV sharing was formulated as a constrained clique-partitioning problem and efficient algorithms were designed to solve it. However, these algorithms do not scale for large-scale design problems.

All the above-mentioned TSV sharing schemes improve the performance of 3-D ICs and hence, the overall reliability as well. However, the allocation of sTSVs for 3-D NoCs needs to consider additional constraints arising from the physical NoC design perspective. In a 3-D NoC, TSVs are placed in a bundle to enable a single vertical link (VL). Depending on the physical placements of switches and cores of 3-D NoCs, these VLs maintain considerable physical distance between them. Hence, sharing TSVs among these VLs is not feasible due to the physical design and timing constraints. In addition, if one TSV fails in a VL, then the achievable performance of the whole link is affected, which in turn degrades the overall NoC performance. Hence, we focus on sVL allocation instead of individual sTSVs.

In this paper, to analyze the reliability issues of 3-D NoC, we evaluate the performance of a 3-D NoC with workload-induced VL failure. We formulate sVL allocation as an

optimization problem to minimize the performance penalty due to TSV-based VL failure. We demonstrate two different algorithms, viz., greedy and exhaustive search to allocate the sVLs in a 3-D SWNoC to compensate for the performance penalty due to VL failure. We also compare the performance of both algorithms in terms of quality of the solution and computation time. We show that based on the domain knowledge of a 3-D NoC, we can develop computationally efficient algorithms whose performances are similar to exhaustive search, a naïve approach.

III. 3-D NOC ARCHITECTURE DESIGN


In this section, we first describe the design of an SW network-based 3-D NoC. Next, we discuss the main challenges for developing an energy-efficient 3-D NoC and the motivation for a machine-learning-based optimization algorithm.

A. Problem Description
The goal of an on-chip communication system design is to transmit data with low latencies and high throughput using the least possible power and resources. In this context, design of SW network-based NoC architectures [7] is a notable example. It has been shown that either by inserting long-range shortcuts in a regular mesh architecture to induce an SW effect or by adopting a power-law-based SW connectivity, it is possible to achieve significant performance gain and lower energy consumption compared to traditional multihop mesh networks [7], [8]. In this paper, we advocate that the concept of small-worldness should be adopted in 3-D NoCs too. Specifically, the VLs in 3-D NoC should enable the design of long-range shortcuts necessary for an SW network. However, the appropriate placement of the planar and the long-range links along the vertical dimension is crucial for maximizing the performance benefits. Hence, our goal is to optimize the placement of the planar links and the VLs in a 3-D NoC, where the overall interconnection architecture follows the SW connectivity, and improves the network latency and power consumption per message.

B. Small-World Network
An SW network lies in-between a regular, locally interconnected mesh network and a completely random Erdös–Rényi topology. SW graphs have a very short average path length, defined as the number of hops between any pair of nodes. The average shortest path length of SW graphs is bounded by a polynomial in log(N), where N refers to the number of nodes; this property makes SW graphs particularly interesting for efficient communication with minimal resources [28]. To develop an SW network, we follow the power-law-based connectivity [28]. The probability (p) of having a direct link between nodes in an SW network decays as a power of the link length ($\ell$), i.e., $p(\ell) \propto \ell^{-\alpha}$, where the parameter α governs the nature of connectivity, e.g., a larger α means a locally connected network with a few, or even no long-range links. By the same token, a zero value of α generates an ideal SW network following the Watts–Strogatz model [28], i.e., one with long-range shortcuts that are virtually independent of the distance between the cores. Here, we consider an SW network with connectivity parameter α equal to 2.4, which was shown to produce energy efficient and high performance 3-D SWNoC [29]. This analysis is elaborated in the longer version [30].

Fig. 2. High-level overview of the optimization algorithm.

C. Development of 3-D SWNoC
Starting from a power-law-based connectivity, we attempt to optimize the location of the planar links and the VLs to achieve lower latency and energy consumption. We define an objective function O called communication cost, which combines the NoC performance metrics, namely, the network latency and energy consumption per message. Optimizing the communication cost ensures lower average hop count and improvement in the network performance in terms of both latency and energy consumption. However, the space of physically feasible SW-based 3-D NoC designs D is combinatorial in nature and our goal is to find the design d ∈ D that minimizes O. One could employ search algorithms such as hill-climbing and simulated annealing (SA), which are very popular in the design community for this task. However, we leverage machine-learning techniques that have been shown to improve the performance of these search algorithms by intelligently exploring the design space [31], [32]. This optimization process is undertaken before the actual NoC implementation.

IV. NOC OPTIMIZATION BASED ON MACHINE-LEARNING

We employ an online learning algorithm called STAGE [31], which was originally developed to improve the performance of local search algorithms (e.g., hill climbing) with random-restarts for combinatorial optimization problems. The high-level conceptual idea of the algorithm is shown in Fig. 2. The key insight behind STAGE is to leverage some extra features $\phi(d) \in \mathbb{R}^m$ (m is the number of features) of the optimization problem to learn an improved evaluation function E that can estimate the promise of a design d as a starting point for the local search procedure A. It employs E to intelligently select promising starting states that will guide A toward significantly better solutions. Past work in the search community concluded that many practical optimization problems exhibit a “globally convex” or “big valley” structure, where the set of local optima appears to be convex with one global optimum in the center [32]. The main advantage of STAGE over popular algorithms such as SA and ILP is that it tries to learn the solution space structure, and uses this information to improve both convergence time and the quality of the solution. This aspect of STAGE is very advantageous for large system sizes to improve the design-validate cycle before mass manufacturing and for dynamically adapting the designs for new application workloads. To the best of our knowledge, this is the first work that applies STAGE to an NoC design optimization problem. Algorithm 1 provides the pseudocode for our NoC optimization technique.
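As a rough illustration of the STAGE loop described above, the following Python sketch alternates a base search guided by the true cost with a meta search guided by a learned evaluation function. It is a minimal sketch under stated assumptions, not the authors' implementation: the bit-string design space, `toy_cost`, `toy_features`, `toy_initial`, `toy_successors`, and the k-nearest-neighbor learner `fit_knn` are all hypothetical stand-ins chosen so that the example is self-contained.

```python
import random


def hill_climb(start, score, successors, rng, max_steps=200):
    """Stochastic hill climbing: jump to a random strictly better neighbor
    until none exists. Returns the visited trajectory."""
    trajectory, current = [start], start
    for _ in range(max_steps):
        current_score = score(current)
        better = [s for s in successors(current) if score(s) < current_score]
        if not better:
            break
        current = rng.choice(better)
        trajectory.append(current)
    return trajectory


def fit_knn(samples, k=3):
    """Hypothetical stand-in for the regression learner R: averages the
    targets of the k nearest (features, target) training pairs."""
    samples = list(samples)

    def predict(x):
        ranked = sorted(samples, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
        nearest = ranked[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def stage(cost, initial, successors, features, max_iters=20, seed=0):
    rng = random.Random(seed)
    training = []                          # training set Z of (phi(d), y) pairs
    d0 = initial(rng)
    best_design, best_cost = d0, cost(d0)
    for _ in range(max_iters):
        # Base search: local search guided by the true cost O.
        base = hill_climb(d0, cost, successors, rng)
        y = cost(base[-1])                 # moves only improve, so the end point is the best value
        training.extend((features(d), y) for d in base)
        evaluate = fit_knn(training)       # re-train the evaluation function E
        # Meta search: local search guided by E, starting where the base search ended.
        meta = hill_climb(base[-1], lambda d: evaluate(features(d)), successors, rng)
        for d in base + meta:
            c = cost(d)
            if c < best_cost:
                best_design, best_cost = d, c
        # Restart from the initial-state function if the meta search made no progress.
        d0 = initial(rng) if meta[-1] == base[-1] else meta[-1]
    return best_design, best_cost


# Toy problem (purely illustrative): designs are 12-bit tuples.
N_BITS = 12


def toy_initial(rng):
    return tuple(rng.randint(0, 1) for _ in range(N_BITS))


def toy_successors(d):
    return [d[:i] + (1 - d[i],) + d[i + 1:] for i in range(len(d))]


def toy_features(d):
    transitions = sum(1 for a, b in zip(d, d[1:]) if a != b)
    return (sum(d), transitions)


def toy_cost(d):
    ones, transitions = toy_features(d)
    return (ones - 8) ** 2 + 3 * transitions


if __name__ == "__main__":
    print(stage(toy_cost, toy_initial, toy_successors, toy_features, max_iters=10, seed=1))
```

The learner is deliberately simple here; any regressor with fast training and prediction could be swapped in through `fit_knn` without touching the loop.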

Algorithm 1 NoC Design Optimization via STAGE
1: Input: D = design space, O = cost function, (I, S) = initial state and successor generation functions, C = network constraints, φ = feature function for NoC design, A = local search procedure, R = regression learner, MAX = maximum iterations
2: Output: $d_{best}$, the best NoC design
3: Initialization: initialize evaluation function E, training set Z, initial design $d_0$, $O_{best} = O(d_0)$, and $d_{best} = d_0$
4: Repeat:
5: Base search: From $d_0$, run the search procedure A guided by O until a local optimum is reached, leading to a search trajectory $(d_0, d_1, \ldots, d_T)$.
6: Generate training data: For each design $d_i$ on the search trajectory, add $(\phi(d_i), y_i)$ to Z, where $y_i$ is the best value along the search trajectory.
7: Re-train E: E = R(Z).
8: Meta search: From $d_T$, run the search procedure A guided by E until a local optimum is reached to produce the best predicted starting state $\hat{d}$.
9: Next starting state: If $\hat{d} = d_T$ (no search progress), set $d_0$ using I. Otherwise, set $d_0 = \hat{d}$.
10: Update $O_{best}$ and $d_{best}$ if $y^* < O_{best}$, where $y^*$ is the best value encountered during base search and meta search.
11: Until MAX iterations or convergence.
12: Return best design $d_{best}$.

TABLE I. FEATURE DESCRIPTION

A. Challenges
The main challenges in applying STAGE to 3-D NoC design are as follows.
1) We need to define additional features of the optimization problem that can be exploited to learn improved evaluation functions for efficient design space exploration. We provide these features for 3-D NoC designs (Table I), but they can be adapted to other types of NoC designs as well.
2) Defining appropriate search spaces by leveraging the domain knowledge can potentially improve the effectiveness of the STAGE algorithm. We need to identify good starting state distribution (subset of initial 3-D NoC design solutions) and search operators (actions to get successor states from a given state) to navigate the design space. We have explored γ-greedy for starting state distribution with the hope of improving over random starting state distribution (see “starting states and successor function” below).
3) We need to find a good knowledge representation for the evaluation function E that is expressive, can be trained quickly, and makes fast predictions. We picked regression trees (RTs) as they satisfy all the requirements.

B. Instantiation for 3-D NoC Optimization
In this section, we provide all the details needed to apply the STAGE algorithm to our 3-D NoC optimization problem.

1) Design Space: Our design space depends on a set of network resources, which are given as input to the optimization algorithm. These resources are defined as follows.
   1) Cores (C): A set of all cores $C = \{C_1, C_2, \ldots, C_N\}$, where N is total number of cores. We assume that every core is connected to at least one switch.
   2) Planar Dies (P): A set of all dies P. For N = 64, we consider four dies with each die containing 16 cores. For core placement, we follow a greedy algorithm to minimize $\sum (f_{ij} \ast d_{ij})$, where $f_{ij}$ and $d_{ij}$ are the communication frequency and Cartesian distance between the cores, respectively. In this step, we form clusters with 16 cores in each die.
   3) Link Distribution (L): The link length distribution $L = \{l_1, l_2, \ldots, l_k\}$, where k depends on the size and topology of the network; the $l_i$'s are determined based on the SW connectivity parameter α. For higher values of α, $l_k$ decreases.
   4) Communication Frequency (F): The communication frequency among different cores $F = \{f_{ij} \mid 1 \le i, j \le N, i \ne j\}$. We assume that F for each application is given as an input to perform application-specific network optimization.

2) Objective Function O: We define O as the communication cost of the given 3-D NoC, which is the product of hop count, frequency of communication, and link length summed over every source and destination pair, that is

$$O = \sum_{i=1}^{N} \sum_{j=1,\, i \ne j}^{N} \left( r \ast h_{ij} + d_{ij} \right) \ast f_{ij} \qquad (1)$$

where $f_{ij}$ and $d_{ij}$ are defined as above; $h_{ij}$ is hop count between ith and jth node, and r denotes the number of switch stages. From a practical point-of-view, r is the number of cycles a message spends inside a switch to move from input to output port. An NoC design with low O will have low latency and energy consumption, and hence, low energy-delay-product (EDP).

3) Network Constraints: To explore only physically feasible 3-D NoC designs, we enforce some constraints on the placement of VLs and switch configurations. If TSVs are considered as the VLs, we only allow placing them point-to-point (regularly) between the switches. Such constraints may put additional limits on the performance of NoC designs. However, efficient optimization can overcome such limitations. The SW network has an irregular connectivity. Hence, the number of links connected to each switch is not constant. For fair comparison between our SW network and 3-D MESH, we assume that both of them use the same average number of connections, $\langle k_{avg} \rangle$, per switch. This also ensures that the 3-D SWNoC does not introduce additional links compared to a 3-D MESH. For a 64-core system, $\langle k_{avg} \rangle$ is 4.5 considering all the switches, including the peripheral ones. In addition, the maximum connectivity per node, $\langle k_{max} \rangle$, is set to be 7 for the SW network as found in [33].

4) Starting States and Successor Function: For starting states, we randomly generate an SW network that satisfies the network constraints. The successor function S takes a network as input and returns a set of next states, and allows the search procedure to navigate the NoC design space. S generates one candidate state for each link connecting two nodes in the input network. It simply removes that link and places a link with the same length between two nodes in the NoC that are not directly connected.

The STAGE algorithm can benefit if we can specify the starting state distribution using some domain knowledge.

Therefore, we also consider a starting-state distribution, We employed the WEKA machine-learning toolkit [35] to
named, γ -greedy. We formulate the starting state (design) train RTs over training set Z, and tune the hyper-parameters
construction as a sequential decision-making task, where we using validation data.
select the next link to be placed at each step. In γ -greedy dis-
tribution, we select a link greedily with probability γ based V. S PARE -V ERTICAL L INK A LLOCATION
on communication frequency and a random link with proba-
The anticipated performance gain of 3-D NoC-enabled
bility (1 − γ ). We start with γ = 1 (completely greedy) and
many-core chips can be compromised due to potential failures
gradually reduce γ to increase the randomness.
5) Local Search Procedure A: We employed a stochastic of the TSVs that are mainly used as vertical interconnects in
hill-climbing procedure, where the next states are sampled a 3-D IC. Workload induced stress is one of the main reasons
stochastically. for the failure of TSV-based VLs in 3-D IC. Stress increases
6) Feature Function φ: The main challenge in adapting the resistance of the TSVs, which leads to different MTTF
STAGE to our NoC domain is to define a set of features φ for different TSV-based VLs [4], [5]. The TSV failure model
for each network that can drive the learner. We divide the is described in detail in the longer version of this paper [30].
whole network into several overlapping subgraphs or regions, The performance of the 3-D NoC degrades over time, lead-
and define a set of features that can be categorized into three ing to eventual failure of the chip. Therefore, we consider the
types. allocation of sVLs as a way to improve the reliability of the
1) Average Hop Count (h): which calculates the average 3-D NoC.
hop count for each region or subnetwork.
2) Weighted Communication: which is defined as the sum A. Spare VL Allocation Problem
of the products of hop count and communication fre- Given a set of m functional VLs F and budget size of
quency over  all source-destination
N pairs for a particular sVLs n (n > 0, n << m), we want to select the subset
hop count ( N i=1 f
j=1,j=i ij ∗ hk ). The highest value of n functional VLs out of m those when provided with one
of k depends on the network size and topology. If sVL each will maximize the reliability (lifetime) of the 3-D
the value of this feature is small, it indicates that NoC. We can experimentally compute the quality of a given
highly communicating cores are placed in the same sVL allocation solution by running a simulation. This is an
neighborhood. instance of a combinatorial optimization problem with an
3) Clustering Coefficient (Cc ): which captures the connec- unknown cost function, where the quality of a given solution
tivity of one core with its neighbors [34]. While the hop can be computed only by making a simulator call. Here, the
count takes into account mainly long-range communi- term “solution” refers to a particular 3-D NoC configuration
cation, the clustering coefficient focuses more on local incorporated with sVLs for n functional VLs.
connectivity among the immediate neighbors. We found
these features to sufficiently capture the network charac- B. Computational Challenges
teristics, efficient to compute, and allow learning highly The main challenge here, is that we have ahuge
accurate evaluation function, E.  number
m
In this paper, for N = 64 cores, we divide the whole network into nine regions. For each region, we consider average hop counts as the features. In addition, the initial network has the highest hop count of eight, and hence, we require eight features for weighted communication cost. Finally, for each die in the network, we consider the average clustering coefficient and it gives rise to four more features. Table I lists all these features.

7) Regression Learner: The quality of our optimization methodology depends on the accuracy of the evaluation function E. We can employ any regression learning algorithm, e.g., k nearest-neighbor, linear regression, support vector regression, and RT. However, a regression learner that is nonlinear and fast in terms of both training time and prediction time will improve the effectiveness of the STAGE algorithm. Therefore, analytically, the RT learner suits our needs the best.

Our training data consists of a set of input–output pairs $\{(x_i, y_i)\}_{i=1}^{n}$, where each $x_i \in \mathbb{R}^m$ is a feature vector and $y_i \in \mathbb{R}$ is the corresponding output. The RT learning algorithm tries to learn a function E in the form of a tree (a set of if-then rules) to minimize the deviation of the predicted output $E(x_i)$ from the correct output $y_i$. The key idea in RT learning is to recursively partition the input space (as in hierarchical clustering) until we find regions that have very similar output values. The recursive partitioning is represented as a tree, where leaves correspond to the cells of the partition. Each leaf is assigned the sample mean of all the output variables in that cell as its prediction. During testing, we find the cell of the partition that input x belongs to through a series of comparison questions on the features, and return the prediction associated with that cell. RTs also allow us to identify the features that are important in making predictions.

of possible solutions or NoC configurations $\binom{m}{n}$ to allocate sVLs among the functional links. A naïve approach is to enumerate all possible solutions; compute the quality of each solution via simulator call; and pick the best solution. However, the simulator call is expensive in terms of both time and memory requirements. Hence, this exhaustive search approach to quantify the performance and lifetime of each of the candidate configurations is infeasible for practical purposes.

C. State Space Search Formulation

We solve the sVL allocation problem using a state-space search formulation, where the simulations guide the search process. Each state in our search space is a particular NoC configuration allocated with sVLs and consists of a set S ⊆ F, where S is a partial or complete solution. Our search space is a 3-tuple <I, A, T>, where I is the initial state function that returns the initial search state S = ∅, meaning the solution set is empty; A is a finite set of actions (or search operators) corresponding to growing the partial solution S by one element from F\S; and T is the terminal state predicate that maps search nodes to {1, 0}, indicating whether the node is a terminal or not. Each terminal state in the search space corresponds to a complete solution (|S| = n, where |S| denotes the total number of candidates of S), while nonterminal states correspond to a partial solution (|S| < n). Thus, the decision process for constructing a complete solution corresponds to selecting a sequence of actions leading from the initial state (none of the sVLs are allocated) to a terminal state (all the n sVLs are allocated). In principle, we can employ any heuristic search procedure (e.g., greedy and beam search) guided by simulations.
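The 3-tuple <I, A, T> described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names and `toy_simulator` are hypothetical stand-ins (the real quality measure comes from the cycle-accurate simulator, which is not reproduced here), and the made-up workload table exists only so that the example runs. The values m = 48 and n = 8 are the ones the paper uses later for the 64-core system.

```python
from math import comb
from typing import Callable, FrozenSet, Iterable, List, Tuple

State = FrozenSet[int]  # a partial or complete solution S, a subset of F


def initial_state() -> State:
    """I: the initial search state, i.e., the empty solution set."""
    return frozenset()


def actions(state: State, functional_vls: Iterable[int]) -> List[int]:
    """A: each action grows S by one element taken from F \\ S."""
    return [vl for vl in functional_vls if vl not in state]


def is_terminal(state: State, budget: int) -> bool:
    """T: a state is terminal when |S| = n (all spares are allocated)."""
    return len(state) == budget


def greedy_search(functional_vls: Iterable[int], budget: int,
                  simulator_call: Callable[[State], float]) -> Tuple[State, int]:
    """One possible heuristic search: pick the best-scoring action at each step."""
    state, calls = initial_state(), 0
    while not is_terminal(state, budget):
        best_vl, best_value = None, float("-inf")
        for vl in actions(state, functional_vls):
            value = simulator_call(state | {vl})
            calls += 1
            if value > best_value:
                best_vl, best_value = vl, value
        state = state | {best_vl}
    return state, calls


# Hypothetical stand-in for the simulator: made-up, distinct workloads per VL.
TOY_WORKLOAD = {vl: (vl * 37) % 101 for vl in range(1, 49)}


def toy_simulator(state: State) -> float:
    """Assumed toy score: total workload protected by the spares in S."""
    return float(sum(TOY_WORKLOAD[vl] for vl in state))


solution, calls = greedy_search(range(1, 49), 8, toy_simulator)
print(sorted(solution), calls, comb(48, 8))
```

With m = 48 and n = 8, the greedy loop issues 48 + 47 + ... + 41 = 356 simulator calls, whereas enumerating every complete solution would require comb(48, 8) = 377,348,994 calls, which illustrates why exhaustive enumeration is impractical when each call is expensive.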

Fig. 3. Nonhomogeneous VL utilization pattern of the 3-D SWNoC for the CANNEAL benchmark. The region between second and third dies is denoted
by VLs numbering 17 ∼ 32, and carries 45% of the total VL traffic of the four die 3-D system.

Algorithm 2 Greedy sVL Allocation
1: Input: F = set of m functional VLs, n = budget for spare-VLs
2: Output: S, the best set of n fVLs that get spares
3: Initialization: initialize solution set S = ∅
4: for each greedy step = 1 to n
5:     for each choice x ∈ F
6:         value(x) = simulator_call(S ∪ {x})
7:     end for
8:     x* = arg max_{x ∈ F} value(x)
9:     S = S ∪ {x*} // Functional VL x* gets spare
10:    F = F \ {x*} // x* is removed from F
11: end for
12: return S
*simulator_call is a procedure that calculates and returns the network performance and lifetime for a given NoC, benchmark suite, and routing algorithm through extensive experiments.

D. Greedy Search for Spare-VL Allocation

This is the simplest search procedure (Algorithm 2). We start with an empty solution set S. In each greedy step, we add to the solution set S the functional VL from F \ S that, when provided with a spare link, improves the reliability by the maximum amount. We repeat this greedy selection step until S is a complete solution (|S| = n). The time complexity of greedy search is $O(mn - n^2)$ simulator calls.

The greedy search is able to produce highly effective sVL allocation that can significantly improve the reliability of the 3-D NoC. This effect of greedy sVL allocation was observed through experimental studies as the cost function is unknown and we need to find solutions via simulator calls. The allocation policy to allocate a spare (if sVL budget allows) to the first functional VL that fails with a given functional and sVL-based 3-D NoC configuration is highly effective. Intuitively, if we do not allocate a spare to the functional VL that is expected to fail first, it will result in a cascade of VL failures, reducing the lifetime of the chip drastically.

E. Domain Knowledge for Sound Pruning

In 3-D NoC enabled many-core chips, some VLs experience heavy traffic and high utilization as the underlying routing algorithm tries to find shortest paths between source and destination cores via these links. As a result, those VLs with high utilization undergo heavier stress, introduce additional delay in the path, and fail more quickly when compared to others. Moreover, this is not an independent phenomenon: one VL failure can decrease the time to failure of a neighboring VL, leading to a clustering effect as the workload of the neighboring links increases. For example, in Fig. 3, we show the traffic densities and the MTTF values of all the VLs for a 64-core and four-layer 3-D SWNoC for the CANNEAL benchmark (one of the PARSEC benchmarks with the highest traffic injection load and skewed traffic). We can see that the traffic densities of VLs 17–32 (we call this region the critical region) are significantly higher than those of the others and expectedly, their MTTF values are significantly lower.

Our key insight is that for a small budget size n (say less than the number of critical VLs), the spares should be allocated to some of the critical VLs only and there is no benefit in allocating spares to noncritical VLs (the chip will fail due to the failure of all critical VLs). We can use this domain knowledge to prune the search space of possible solutions for the spare-VL allocation problem. Let H ⊆ F correspond to the critical VLs and the total number of critical VLs be h, where h = |H|. If we consider complete solutions from H only (i.e., subsets of size n from H), we can still retain the optimal solution. In other words, we get huge computational savings without losing any accuracy due to sound pruning. For exhaustive search, we can consider $\binom{h}{n}$ instead of $\binom{m}{n}$ candidate solutions, where h < m. For greedy search, we only consider the VLs from H for spare allocation.

For the rest of the experiments and analysis, we denote the baseline greedy and exhaustive search by greedy-full and exhaustive-full, respectively. In addition, these techniques enabled with domain knowledge-based pruning are named as greedy-restricted and exhaustive-restricted.

VI. EXPERIMENTAL RESULTS AND ANALYSIS

In this section, we first present the achievable performance and energy consumption profiles of our optimized 3-D SW NoC architecture. Then, we present a detailed reliability analysis in the presence of sVL insertion.

A. Experimental Setup

To evaluate the performance of different NoCs, we use a cycle-accurate NoC simulator that can simulate any regular or irregular 3-D architecture [36]. We consider a chip multiprocessor consisting of 64 cores and 64 network switches equally partitioned in four layers. In each die, 16 cores are placed at regular intervals in a grid pattern [3]. The length of each packet is 64 flits and each flit consists of 32 bits. The switches are synthesized from an RTL level design using TSMC 65-nm CMOS process in Synopsys Design Vision. All switch ports have a buffer depth of two flits and each switch port has four virtual channels in case of irregular NoC. The NoC simulator uses wormhole routing, where the data flits follow the header flits once the router establishes a path. For regular 3-D mesh-based NoC, XYZ-dimension order-based routing is used. For irregular architectures such as the SW network, the topology-agnostic adaptive layered shortest path routing algorithm is adopted [37]. The energy consumption of the network switches was obtained from the synthesized netlist by running Synopsys Prime Power,

Fig. 4. Performance comparison among the machine-learning-based optimization algorithm (STAGE), the SA, and the GA.
Fig. 5. Effect of optimization algorithm on weighted communication features.

while the energy dissipated by wireline links was obtained through HSPICE simulations. We consider four SPLASH-2 benchmarks, namely, FFT, RADIX, LU, and WATER [38], and five PARSEC benchmarks, namely, DEDUP, VIPS, FLUIDANIMATE, CANNEAL, and BODYTRACK (BT) [39] in this performance evaluation. These benchmarks vary in characteristics from computation intensive to communication intensive in nature and thus are of particular interest in this paper.

B. Performance of the Optimization Algorithm

In Section IV, we described the details of the STAGE optimization algorithm for designing the 3-D SWNoC architecture. Here, we first characterize the performance of the optimization algorithm by quantifying various performance metrics of the optimized 3-D SWNoC. To evaluate the performance of the STAGE algorithm, we compare it with the well-known combinatorial optimization algorithms, viz., SA [40] and genetic algorithm (GA) [41]. We evaluate the performance in terms of both the quality of solution and the convergence time.

1) STAGE vs. SA and GA: We create the initial network following the power law distribution shown in Section IV, where long-range links are placed randomly. Our goal is to find an optimized network starting from this random SW network. We call this initial NoC architecture 3-D SW_rand. Fig. 4 shows the communication cost of the optimized network from the STAGE, SA, and GA algorithms as a function of time.

To compare the performance of these two optimization algorithms, we consider two parameters, viz., the quality of the solution and the convergence time. To make the comparison fair, we consider the same NoC configuration and apply both STAGE and SA algorithm to optimize it. We used a machine configured with Intel Core i7-4700MQ processor and 8 GB RAM running at a clock frequency of 2.4 GHz. Fig. 4 shows the cost of the best solution obtained at any particular time for SA, GA, and STAGE. We consider the best explored cost, $O_{best}$, as the quality of the optimization algorithm. It is evident that STAGE reaches $O_{best}$ very fast (within 5 min). During the optimization process, the learned function E predicts an initial network configuration to start the local search procedure that can lead to lower communication cost (O). During the initial exploration phase, the error rate is nonmonotonic and high. After a few iterations the prediction error reduces to less than 1%, and after 20 iterations, the error is almost zero (0.05%). The prediction error remained more or less the same for all the subsequent iterations. These results indicate the effectiveness of our network features φ and the regression-learning algorithm. Note that the best O-value decreases monotonically as the set of explored designs increases over the iterations. We also ran the same experiment with the γ-greedy starting state distribution as mentioned above. However, the communication cost O and the prediction error have similar characteristics as the random distribution for the benchmarks and the system size considered in this paper. Therefore, we present and discuss our results with a random starting-state distribution.

It is also seen that both the SA and GA show similar trends in the cost function optimization. Both of them reach $O_{best}$ more gradually compared to STAGE, and even after 50 min their respective $O_{best}$ does not reach the same solution as STAGE. It should be noted that we have to optimize the link locations for various applications. Hence, this additional time needed by SA and GA will be a significant overhead when we have to optimize and reconfigure the SWNoC in the field. It should be noted that the final link distribution of the optimized 3-D SWNoC is the same for SA, GA, and STAGE. However, as shown in Fig. 4, the benefit of STAGE over SA and GA mainly comes from the much faster convergence time. We can conclude that the STAGE algorithm is more efficient in designing an optimized SWNoC with better performance. We denote the final optimized NoC as 3-D SW_opt.

2) Characteristics of the Design (Random vs. Optimized): Now we investigate why the STAGE-based optimization algorithm is suitable for developing energy-efficient NoC architectures. In Section IV, we described the details of the feature definition (φ), to represent each network. So, we will explore how the design features change before and after the optimization process. Here, we specifically consider the role of the weighted communication feature mentioned in Section IV. Fig. 5 shows the weighted communication feature, which reveals the percentage of total communication that is constrained between two nodes separated by k hops (k ≥ 1). Careful observation of Fig. 5 shows that for 3-D SW_opt, the traffic constrained within one, two, and three hops increases compared to 3-D SW_rand. Moreover, the amount of traffic that has to traverse beyond three hops decreases.

Hence, the internode communication that takes place in less than three hops becomes more frequent. Since the average hop count of the optimized network is calculated to be 2.94, any communication below this average hop count can be considered to be efficient. Essentially, the optimized network becomes more efficient for the same objective function.

The inset in Fig. 5 shows the percentage of communication versus the number of hops, where the area under the curve denotes the weighted communication feature mentioned in Section IV. We can see that the 3-D SW_opt curve shifts toward the left, which means that on an average,

Fig. 6. (a) Normalized network latency of 3-D SWNoC compared with other 3-D NoCs. (b) Normalized energy consumption per message of 3-D SWNoC
compared with other 3-D NoCs. (c) Normalized EDP of 3-D SWNoC compared with other 3-D NoCs.

TABLE II
COMPARISON OF AVERAGE HOP COUNT AND COMMUNICATION COST OF 3-D NoC ARCHITECTURES

any message in the optimized network traverses fewer hops compared to the initial network. Hence, it spends less time inside the network and occupies fewer network resources. Therefore, the STAGE-based optimization algorithm converges to an efficient architecture.
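To make the hop-count argument above concrete, the following Python sketch computes, for a given topology and traffic pattern, the percentage of total communication exchanged between nodes separated by k hops. It is a hedged illustration only: the function names, the four-node topologies, and the traffic volumes are hypothetical, hop counts are assumed to follow shortest paths, and the exact feature definition of Section IV is not reproduced here.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


def hop_counts(adjacency: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search: minimum hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def traffic_share_by_hops(adjacency: Dict[int, List[int]],
                          traffic: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Percentage of total traffic carried between node pairs that are k hops apart."""
    total = sum(traffic.values())
    share: Dict[int, float] = defaultdict(float)
    for (src, dst), volume in traffic.items():
        k = hop_counts(adjacency, src)[dst]
        share[k] += 100.0 * volume / total
    return dict(share)


def average_hops(share: Dict[int, float]) -> float:
    """Traffic-weighted average hop count derived from the percentage shares."""
    return sum(k * pct for k, pct in share.items()) / 100.0


# Hypothetical 4-node ring, and the same ring with one added shortcut link 0-2.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ring_with_shortcut = {0: [1, 3, 2], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}
traffic = {(0, 2): 60.0, (0, 1): 30.0, (1, 3): 10.0}  # made-up volumes

before = traffic_share_by_hops(ring, traffic)
after = traffic_share_by_hops(ring_with_shortcut, traffic)
print(before, average_hops(before))
print(after, average_hops(after))
```

In this toy case, connecting the heavily communicating pair directly moves most of the traffic from the two-hop bin to the one-hop bin, which is the same leftward shift of the distribution that the text describes for 3-D SW_opt relative to 3-D SW_rand.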

C. Performance of 3-D SWNoC Compared to Other 3-D NoCs

In this section, we compare the performance of 3-D SWNoC with several existing 3-D NoC architectures. For the comparative performance evaluation, we consider 3-D MESH and two recently proposed irregular 3-D NoCs, namely, mrrm and rrrr [42]. Both the mrrm and rrrr NoCs have point-to-point vertical connections as in 3-D MESH and 3-D SW. However, their die-level planar connection pattern varies. For rrrr, all the four dies have randomly connected interconnection patterns. On the other hand, mrrm has random connection patterns in the middle two dies whereas the first and the fourth dies follow mesh-based regular connectivity. To build mrrm and rrrr, we follow the method suggested in [42] and keep the number of links equal to that of 3-D MESH and 3-D SW. All the performance metric values are normalized with respect to the 3-D MESH.

In addition, to show the effect of the optimization algorithm, we evaluate and compare the performances of the optimized NoC architecture with its un-optimized counterpart, marked as 3-D SW_opt and 3-D SW_rand, respectively.

1) Network Latency: Fig. 6(a) demonstrates the normalized network latency of both the 3-D SW_rand and 3-D SW_opt NoC compared with other existing 3-D NoCs. The optimization improves the network latency on an average by 3% over the un-optimized version, and 5.5% over the conventional 3-D MESH. The optimization process redistributes the links among the cores such that cores that have to frequently communicate with each other are either directly connected or need to traverse a small number of hops. This results in reduced average hop count and weighted communication for 3-D SW_opt NoC.

It can be seen that among all the NoCs, 3-D MESH and 3-D SW_opt exhibit the highest and the lowest latency, respectively. The mrrm and rrrr architectures perform somewhere in the middle. As in the case of 3-D SW NoC, both mrrm and rrrr have irregularities in the horizontal planes. However, the number and the length of the links are not optimized for these architectures. For rrrr, the link distribution has a large number of long-range links that help communication among long-distant cores at the expense of local communication. In the case of 3-D SW_opt NoC, the link distribution follows the power law and the connection pattern is optimized to facilitate both the local and long-range communications.

The mrrm architecture maintains the link distribution in between rrrr and 3-D SW_opt NoC. Hence, its network latency lies in between rrrr and 3-D SW_opt. Finally, 3-D MESH NoC suffers from higher average hop count compared to other 3-D architectures due to its multihop communication pattern; hence, it suffers from the highest network latency. Table II lists the communication costs and average hop counts for all these NoCs. As expected, 3-D SW_opt and 3-D MESH exhibit the lowest and highest communication cost and hop count, respectively, whereas mrrm and rrrr reside in between these two. The effect of these costs is reflected in the latency characteristics.

2) Energy Consumption: Energy consumption per message depends on the amount of energy consumed by the

switch as well as the planar links and VLs. The STAGE-based optimization algorithm reduces the average hop count and communication cost by optimizing the objective function O specified in (1). As a result, both the switch and link energy consumption are minimized. Fig. 6(b) plots the energy consumption profile of the 3-D SWNoC before and after optimization along with the profile for other 3-D NoCs. All the energy values are normalized with respect to the corresponding values for the 3-D MESH. On an average, the 3-D SW_opt NoC shows 33% and 17% energy consumption improvement over the 3-D MESH and 3-D SW_rand, respectively. Fig. 5 helps us in understanding the reasons behind the improvement in energy consumption. The area under the 3-D SW_opt curve is less than that of the un-optimized counterpart. Hence, 3-D SW_opt reduces the utilization of NoC resources for any message. As a result, both the switch and link energy decrease and the overall energy profile improves.

Fig. 7. Normalized EDP profile of 3-D NoCs with workload induced VL failure scenario for CANNEAL benchmark. The EDP is normalized with respect to fault free 3-D MESH at t = 0.
Fig. 6(b) also demonstrates that among all other NoCs, 3-D MESH has the highest energy consumption followed by mrrm, rrrr, and 3-D SW_opt NoC. Higher network latency of any NoC increases the utilization of network resources and hence results in higher energy consumption per message. For 3-D MESH, the switch energy consumption is significantly higher due to multihop communication, so it performs the worst among all of them. The mrrm and rrrr NoCs are capable of reducing the switch energy consumption compared to mesh and perform better than 3-D MESH. However, due to their random link distribution, they suffer from higher communication cost and average hop count compared to the optimized SW NoC. Hence, they consume more link energy and switch energy. With the least communication cost, 3-D SW_opt consumes the lowest energy among all these architectures. The detailed breakdown of various components of the energy consumption of 3-D SWNoC is provided in the longer version [30].

3) Energy-Delay-Product: The EDP is directly affected by the network latency and energy consumption. The architecture that performs best in terms of latency and energy consumption is expected to have lower EDP compared to the EDP of other 3-D NoCs. Fig. 6(c) presents the EDP profile of both the un-optimized and optimized 3-D SWNoCs along with other 3-D NoCs. From the EDP profile, we observe that the average EDP of 3-D SW_opt NoC is reduced by approximately 35% and 19% compared to 3-D MESH and 3-D SW_rand, respectively. In addition, among all the other 3-D NoCs, 3-D SW_opt NoC has expectedly the lowest EDP profile followed by mrrm, rrrr, and 3-D MESH. Across all the benchmarks, 3-D SW_opt shows the best EDP improvement, 43%, for RADIX. From the above results and analysis, we can conclude that 3-D SW_opt performs better than all other considered NoCs in terms of network latency, energy consumption, and EDP. Hence, for the rest of the experiments, we consider this optimized 3-D SW architecture and denote it by 3-D SWNoC for simplicity.

D. Performance of 3-D NoCs in Presence of Link Failures

In this section, we analyze the robustness of the 3-D SWNoC architecture under VL failure. The reason behind studying the scenario of VL failures is that despite the recent advancements in TSV technology, TSVs are still subject to failure due to voids, cracks, and misalignment [4], [43]. In addition, TSVs also face the wear-out problem due to stress arising from potentially high workload. The imbalance in workload among different TSV-based VLs in the NoC also creates wide variation in TSV MTTF, where some VLs fail early compared to others. Due to all these reasons, if the TSV-based VLs fail, then the EDP and network latency of the 3-D NoC increase and in the worst case, the corresponding NoC may contain disconnected source-destination pairs in the network. As a result, the performance of the 3-D NoC degrades over time.

Fig. 7 demonstrates the EDP profile of 3-D MESH, mrrm, rrrr, and 3-D SWNoC with workload induced VL failure scenario with time for the CANNEAL benchmark (for up to 15 faults). All the EDP values are normalized with respect to fault free 3-D MESH at t = 0. From the figure, we can see that the EDP values of all the 3-D NoCs increase with time as the VLs fail progressively. Among all the NoCs, 3-D SWNoC shows the lowest EDP value for any particular period of time and the rate of increase in EDP is also lower than the other NoCs. As a result, 3-D SWNoC is expected to have longer lifetimes relative to other NoCs. Note that 3-D SWNoC is inherently robust against link failure and its average hop count increases only marginally in the presence of link failures due to the SW nature of the overall connectivity [28]. As a result, it shows better robustness and EDP profile in comparison to other NoCs.

To address this time-dependent failure of VLs and the EDP performance degradation, we propose to incorporate sVL allocation. As 3-D SWNoC is inherently more robust than other NoCs, we focus on allocating sVLs to 3-D SWNoC and analyze its performance with such an allocation.

VII. PERFORMANCE OF 3-D NoCs WITH sVL ALLOCATION

In this section, we analyze the sVL allocation methodology and its effects on lifetime and overall reliability of the 3-D NoC. To quantify the effects of sVL allocation, we first define the lifetime of the 3-D SWNoC. Whenever any functional VL fails in a 3-D SWNoC, the average hop count increases and hence, the network latency and EDP increase as well. Eventually, the EDP of the 3-D SWNoC may be higher than that of the fault free 3-D MESH, where it can no longer be considered as an efficient NoC. At this point, the 3-D SWNoC loses its architectural advantages over a 3-D MESH. We consider the time required to reach this configuration as the lifetime of the 3-D SWNoC.

A. Greedy vs. Exhaustive Search for sVL Allocation

For the sVL allocation, we explored two different allocation algorithms as explained in Section V: 1) greedy search and 2) exhaustive search. The allocation algorithms are named as greedy-full and exhaustive-full (The suffix “full” is added to differentiate the algorithms with domain

Fig. 8. Lifetime of the 3-D SWNoC as a function of the number of sVL for the CANNEAL, DEDUP, and VIPS (from the left) benchmark for greedy-full
and exhaustive-full sVL allocation.

knowledge-based pruning, which we will introduce later). For brevity, we show results for three representative benchmarks with varying traffic patterns, viz., CANNEAL, DEDUP, and VIPS. These benchmarks are chosen because they have a wide variation in message injection rates, e.g., high (CANNEAL), medium (DEDUP), and low (VIPS). Fig. 8(a)–(c) plots the lifetime of the 3-D SWNoC with sVL allocation using greedy-full and exhaustive-full algorithms for different number of sVLs for the CANNEAL, DEDUP, and VIPS benchmarks, respectively. From these figures, we can see that both greedy-full and exhaustive-full sVL allocation algorithms achieve the same lifetime for the 3-D SWNoC. Note that greedy search (as expected) takes significantly less computation time to produce the solution when compared to exhaustive search. To explain this, we first need to understand the details of the sVL allocation procedure. To be more specific, the VL failure sequence and its effects on NoC performance need to be explored.

If any functional VL fails, then the workload of this particular VL negatively affects the other neighboring VLs and as a result, the EDP increases rapidly. Consequently, allocation of sVLs to the functional VL that fails first is expected to minimize the NoC performance penalty. If an sVL is allocated without following the VL failure sequence, then the allocation effect may not be visible on either the EDP profile or the lifetime at all.

To explain this behavior in more detail, we consider the case of the CANNEAL benchmark for the 3-D SWNoC and number the 48 VLs serially starting from 1 to 48 for a 64 core system (as shown in Fig. 3). For 8 sVLs, the sVL allocation solution from exhaustive search corresponds to assigning spares to functional VLs numbered 26, 22, 27, 10, 42, 43, 7, and 6. Somewhat surprisingly, greedy search also produced the same sVL allocation solution. Our experimental analysis showed that the greedy search produces sVL allocation solutions that can significantly improve the reliability of the 3-D NoC. The allocation policy to allocate a spare (if the sVL budget allows) to the first functional VL that fails with a given functional and sVL-based 3-D NoC configuration is highly effective. Intuitively, if we do not allocate a spare to the functional VL that is expected to fail first, we will be faced with a cascade of VL failures, which will reduce the lifetime of the chip drastically. For example, the VL failure sequence without any spares allocated is 26, 22, 27, 10, 32, 30, 25, 18, and so on. Greedy search allocates the first spare to functional VL 26. The VL failure sequence after assigning a spare to VL 26 is 22, 26, 27, 23, 32, 30, 31, 25, 18, and so on. Greedy search allocates the second spare to functional VL 22. Continuing this policy, greedy search assigns spares to the same set of functional VLs as done by the exhaustive search. We found this behavior to be consistent across all the benchmarks.

B. Domain Knowledge for Pruning the Search Space

The time to compute the sVL allocation solution grows as the number of functional VLs (m) and the number of sVLs (n) increases for both exhaustive search and greedy search. The time complexities of exhaustive search and greedy search (in terms of the number of simulator calls) are $O\binom{m}{n}$ and $O(mn - n^2)$, respectively. For example, for a 64 core 3-D NoC with m = 48 and n = 8, the total solution exploration times for exhaustive and greedy search are 377,348,994q and 356q, respectively (here q corresponds to the computation time of a single simulator call, which is ∼7 min in the current experimental setup using a machine configured with Intel Core i7-4700MQ processor and 8 GB RAM running at a clock frequency of 2.4 GHz). Therefore, our sVL allocation algorithms may not scale for large-scale 3-D NoC. We consider using domain knowledge of the workload of different functional VLs to prune the solution space as described in Section V. In 3-D NoC, the workload of some fVLs (say critical VLs) is much higher than the others and hence, their failure probabilities are higher too. Intuitively, when the sVL budget (n) is small, it is beneficial to allocate spares to some of the critical VLs only because the chip will fail due to a cascade of critical VL failures.

We select a subset of critical VLs (say H) out of the m functional VLs that we will consider for allocating spares and prune the remaining ones. Pruning can improve the computational efficiency of solving the sVL allocation problem, but may potentially compromise the accuracy of solutions depending on the amount of pruning. We can consider varying amounts of pruning from |H| = n (only one candidate solution) to |H| = m (no pruning) to trade off speed and accuracy of producing sVL allocation solutions. A simple pruning strategy to achieve this goal is as follows: rank all the functional VLs according to their workload; select the top-|H| VLs to be considered for spare allocation; prune the remaining m − |H| VLs. We can use both exhaustive search and greedy search to find the solution from this restricted set of candidate solutions. We refer to the exhaustive and greedy sVL allocation algorithms as exhaustive-restricted and greedy-restricted, respectively.

Fig. 3 shows the traffic densities of all the VLs for a 64-core 3-D SWNoC consisting of four planar layers for the CANNEAL benchmark. It can be noted that the traffic densities of some VLs (critical VLs) are significantly higher than those of the others. To identify the critical VLs, we rank VLs according to workload and select the highest workload ones. In this particular work, we consider 16 critical VLs. This number is chosen considering the worst-case VL failure scenario where all 16 critical VLs are placed in between two adjacent planar dies and if all of them fail together, then the NoC becomes completely unrouteable. Therefore, we prune all the noncritical VLs, a total of 32 out of 48 (other than 16 critical VLs). In other words, |H| = 16 corresponding to 16 high workload carrying VLs, which is significantly smaller compared to m = 48 (total number of VLs). We found that with this setting, both the search algorithms with pruning produce the same sVL allocation solutions as their counterparts

Fig. 9. (a) Estimated runtime for different number of sVL: greedy-full versus greedy-restricted. (b) Estimated runtime for different number of sVL:
exhaustive-full versus exhaustive-restricted.

Fig. 10. Lifetime determination, illustrated with the normalized EDP profile of the 3-D SWNoC with and without sVL allocation. As an example, the lifetime calculation for the 3-D SWNoC with the DEDUP benchmark is plotted. The EDP of 3-D MESH-0 sVL (dotted line) corresponds to time t = 0 and is extended only for reference.

without any pruning (exhaustive-full and greedy-full) for different numbers of sVLs (n = 1 up to any upper limit). In other words, we do not lose accuracy due to pruning. We do not show these results for the sake of brevity. The main benefit of pruning is that it improves the computational efficiency of producing sVL allocation solutions. As an example, Fig. 9(a) and (b) shows the estimated runtime comparison of greedy-full and greedy-restricted, and exhaustive-full and exhaustive-restricted, respectively. We can see that the computational gains due to pruning are significant, with no loss of accuracy.

C. Computing the Lifetime of 3-D SWNoC With sVL Allocation

In this section, we describe the procedure to compute the lifetime of any 3-D NoC configuration. For better understanding, we plot the EDP profile of the 3-D SWNoC with and without sVLs incorporated into it, and graphically illustrate how to calculate the lifetime of any 3-D NoC.

As defined in the earlier section, the lifetime of any 3-D NoC is the time at which the EDP value of that particular NoC equals a certain threshold value. Since the performance requirement for the NoC is application and/or user dependent, the threshold value used to compute the lifetime of the 3-D NoC will vary.

Fig. 10 illustrates the lifetime computation procedure for a 3-D SWNoC incorporated with 8 sVLs for the DEDUP benchmark. This particular configuration is chosen as an example; however, the procedure is applicable to any other 3-D NoC and benchmark. For reference, the EDP profile of the original 3-D SWNoC (without any sVL) is also plotted. The EDP of the 3-D SWNoC is normalized with respect to the EDP of the fault-free 3-D MESH. To help illustrate the lifetime computation procedure more clearly, a dotted horizontal line is drawn in Fig. 10, which we call the lifetime line. This line corresponds to the 100% EDP value of the fault-free 3-D MESH (at t = 0).

We have calculated the lifetimes of the 3-D SWNoC and the 3-D SWNoC with 8 sVLs, as marked with the respective vertical lines. The lifetime of each NoC is the projection onto the time axis of the intersection point of the lifetime line and the corresponding EDP profile. These are marked as L3-DSW and L3-DSWwith8-sVL in the figure. The lifetime of other 3-D NoCs can be calculated in a similar way.

D. Effects of Spare-VL Allocation on 3-D NoC

Whenever an sVL is allocated to a functional VL, the sVL carries the traffic when the corresponding functional VL fails. This minimizes the effect of VL failure on 3-D NoC performance degradation and essentially helps in maintaining a lower EDP value over a longer period of time. However, there exists an upper limit on the number of sVLs, beyond which sVL allocation yields no further pronounced advantage. We call this number the optimum number of sVLs.

Depending on the benchmark and NoC configuration, the optimum number of sVLs varies. In this paper, we consider the 3-D SWNoC as the testbed for evaluating the performance of sVL allocation. However, the subsequent experiments and analysis are equally applicable to other 3-D NoC architectures.

1) Optimum Number of Spare VLs: In this section, we evaluate the effects of different numbers of sVLs on the 3-D NoC performance. Fig. 11(a)–(c) shows the normalized EDP of the 3-D SWNoC over time for the CANNEAL, DEDUP, and VIPS benchmarks, respectively. As in the previous experiments, we consider these three benchmarks as representatives of high, medium, and low injection benchmarks from the PARSEC and SPLASH-2 suites. All EDP values are normalized with respect to the EDP of the fault-free 3-D MESH with no sVLs allocated to it at t = 0.

From these figures, we can see that the EDP remains unchanged up to a certain point and then increases when the functional VLs start failing. This happens because initially no functional VL fails, so the EDP remains constant up to a certain time. Subsequently, VLs from the critical region (as defined in Section V-E) having high traffic density start failing. In such a link failure scenario, the traffic of the failed VLs is carried by the neighboring VLs along with their own traffic. This has two kinds of negative effects. First, the EDP and the network latency of the NoC increase due to a critical link failure. Second, the neighboring functional VLs
730 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 36, NO. 5, MAY 2017

Fig. 11. Normalized EDP profile of the 3-D SWNoC with different numbers of allocated sVLs for the (a) CANNEAL, (b) DEDUP, and (c) VIPS benchmarks.

also fail quickly, which further degrades the NoC performance. As a result, the EDP increases at a faster rate.

Another interesting result is that as the number of allocated sVLs increases, the EDP profile shifts toward the right on the time scale. This implies that the 3-D SWNoC with sVL allocation can maintain a particular EDP level for a longer period of time. As expected, the lifetime of the 3-D SWNoC also increases with sVL allocation. In addition, we can see that the difference between the EDP profiles on the time axis decreases gradually as the number of sVLs increases. For the CANNEAL benchmark, the right-most EDP profile is found for 8 sVLs. Even if we increase the number of sVLs beyond 8 for CANNEAL, the EDP profile does not shift further to the right. This implies that no further improvement of the EDP profile is possible, and we call this scenario the saturation effect of sVL allocation. Similarly, for the 3-D SWNoC with the DEDUP and VIPS benchmarks, the EDP profile saturates at 14 sVLs.

2) Performance of 3-D SWNoC With Partial sVL Allocation: In this section, we evaluate the performance of the 3-D SWNoC with partial sVL allocation. With partial sVL allocation, instead of allocating an sVL (a total bundle of TSVs replacing the whole VL), we allocate only some sTSVs to an fVL and compare its performance with the full sVL allocation explored earlier. For partial sVL allocation, we need to consider the cross-coupling capacitance of the individual TSVs. If we consider a grid-based layout of the TSVs in a bundle, then the centrally located TSVs will have the highest cross coupling. We replace the TSVs that are affected most by the cross coupling in this partial allocation. As a case study, we consider this partial TSV allocation for the critical fVLs only and allocate 50% and 75% of the total TSVs in an fVL. We characterize the performance of this partial TSV allocation in comparison with the full sVL allocation.

Fig. 12(a)–(c) shows the EDP profile over time of the 3-D SWNoC with partial sTSV allocation. In these figures, 3-D SW-8 sVLs denotes the performance of the 3-D SWNoC with 8-sVL allocation (complete bundle allocation), whereas 3-D SW-8 sVLs_x% denotes the performance of the 3-D SWNoC with individual sTSV allocation within the bundle (VL). For example, 3-D SW-8 sVLs_50% indicates that 50% of the TSVs within the bundle (for 8 VLs) have sTSVs. From the figures, it is clear that complete sVL allocation performs better than partial sTSV allocation. As the percentage of sTSV allocation increases, the EDP profile shifts right on the time scale and the lifetime improves accordingly. It should be noted that allocating 100% sTSVs is equivalent to full sVL allocation (3-D SW-8 sVLs in the figure) and achieves the best EDP profile and maximum lifetime for the 3-D NoC.

3) Saturation of Lifetime Improvement: In this paper, we have considered a one-to-one correspondence between sVLs and functional VLs, where any sVL replaces one functional VL regardless of the workload intensity. Allocation of such sVLs increases the traffic-carrying capability of the critical VLs and improves the lifetime of the 3-D NoC. As an example, Fig. 13 plots the percentage of lifetime improvement of the 3-D SWNoC for the CANNEAL benchmark with different numbers of allocated sVLs. Note that similar lifetime improvements are observed for the other benchmarks as well.

From the figure, we can see that as the number of allocated sVLs increases, the lifetime of the 3-D SWNoC also increases. Initially, the lifetime gain is almost linear in the number of allocated sVLs; later, the incremental gain decreases and the improvement saturates after some point. Allocation of an sVL increases the combined lifetime of the particular VL (consisting of the sVL and the functional VL in this case), which helps to minimize the network latency and EDP degradation due to VL failure.
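The lifetime definition of Section VII-C (the time at which the normalized EDP profile meets the lifetime line) and the percentage lifetime improvement discussed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the linear interpolation between simulation samples, the tolerance parameter, and all numeric values in the usage example are assumptions made for illustration and are not measured data.

```python
from typing import Dict, Optional, Sequence


def lifetime_from_profile(times: Sequence[float],
                          edp: Sequence[float],
                          threshold: float = 1.0) -> Optional[float]:
    """Return the first time at which the normalized EDP reaches `threshold`.

    `threshold` = 1.0 plays the role of the lifetime line (100% EDP of the
    fault-free 3-D MESH). Linear interpolation between samples is an
    assumption of this sketch. Returns None if the threshold is never reached.
    """
    for i, value in enumerate(edp):
        if value >= threshold:
            if i == 0:
                return times[0]
            t0, t1 = times[i - 1], times[i]
            e0, e1 = edp[i - 1], value  # e0 < threshold <= e1, so e1 > e0
            return t0 + (threshold - e0) * (t1 - t0) / (e1 - e0)
    return None


def lifetime_improvement_percent(base: float, improved: float) -> float:
    """Percentage lifetime gain relative to the NoC without any sVL."""
    return 100.0 * (improved - base) / base


def saturation_point(lifetime_by_svl: Dict[int, float],
                     tolerance: float = 1e-9) -> int:
    """Smallest sVL count beyond which more sVLs give no further lifetime gain."""
    counts = sorted(lifetime_by_svl)
    best = counts[0]
    for n in counts[1:]:
        if lifetime_by_svl[n] > lifetime_by_svl[best] + tolerance:
            best = n
    return best


if __name__ == "__main__":
    # Hypothetical EDP profile samples (arbitrary time units), not paper data.
    times = [0, 1, 2, 3, 4]
    edp = [0.65, 0.65, 0.8, 1.1, 1.4]
    print(lifetime_from_profile(times, edp))

    # Hypothetical lifetimes indexed by the number of allocated sVLs.
    lifetimes = {0: 10.0, 2: 12.0, 4: 14.0, 8: 16.0, 10: 16.0}
    print(saturation_point(lifetimes))
    print(lifetime_improvement_percent(lifetimes[0], lifetimes[8]))
```

In this sketch the saturation point is simply the smallest sVL count that already attains the maximum lifetime, which mirrors the observation that the EDP profile stops shifting right once the optimum number of sVLs is reached.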

Fig. 12. Normalized EDP of the 3-D SWNoC with 8-sVL allocation for the (a) CANNEAL, (b) DEDUP, and (c) VIPS benchmarks. Here, 3-D SW-8 sVLs_x% denotes the partial sVL allocation, where x is the percentage of total TSVs needed to enable one full VL.

Fig. 13. Effect of sVL allocation on the 3-D SWNoC for the CANNEAL benchmark. The lifetime improvement of the 3-D SWNoC initially increases linearly and saturates beyond 8-sVL allocation. The gain is normalized with respect to the initial lifetime of the 3-D SWNoC at t = 0.

In general, the most critical VLs fail early compared with the other VLs. If the sVLs are allocated to the critical VLs, they help in significantly increasing the lifetime of the NoC. However, the lifetime gain saturates once the number of allocated sVLs crosses a certain number. This happens because the combined lifetime of some critical VLs, even with sVL allocation, is shorter than that of other noncritical VLs. Consequently, even if we allocate sVLs to these noncritical VLs, they do not improve the EDP beyond what is already achieved. It is important to note that similar effects are also observed for the DEDUP and VIPS benchmarks (in these cases, the saturation effect was observed for 14 sVLs). However, we have omitted plotting such repetitive results and analysis.

VIII. CONCLUSION

We proposed a robust design optimization methodology to improve the energy efficiency of 3-D NoC architectures by combining the benefits of SW networks and machine-learning techniques to intelligently explore the design space. We showed that the optimized 3-D SWNoC architecture outperforms the existing 3-D NoCs. The optimized 3-D SWNoC achieves, on average, a 35% EDP reduction over the conventional 3-D MESH. We also demonstrated the efficacy and robustness of the 3-D SWNoC in the presence of nonhomogeneous workload-induced VL failure. The proposed 3-D SWNoC shows better resilience and a better EDP profile against VL failure at any instant of time compared to state-of-the-art 3-D NoCs. We also proposed an sVL allocation mechanism to address the performance degradation and lifetime shortening due to VL failure. We showed that with a small number of sVLs, we could exploit NoC domain knowledge to develop efficient and computationally inexpensive algorithms to explore the optimal solution. The proposed sVL allocation significantly improves the reliability and lifetime of the 3-D NoC.

Sourav Das (S’14) is currently pursuing the Ph.D. degree with the Electrical Engineering and Computer Engineering Department, Washington State University, Pullman, WA, USA.
His current research interest includes low-power network-on-chip design.

Janardhan Rao Doppa (M’14) received the Ph.D. degree from Oregon State University, Corvallis, OR, USA.
He is an Assistant Professor with Washington State University, Pullman, WA, USA.
Dr. Doppa was a recipient of the Outstanding Paper Award for the research on structured prediction at the AAAI 2013 conference.

Partha Pratim Pande (SM’11) received the M.S. degree in computer science from the National University of Singapore, Singapore, in 2002, and the Ph.D. degree in ECE from the University of British Columbia, Vancouver, BC, Canada, in 2005.
He is a Professor and holds the Boeing Centennial Chair in computer engineering with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA.

Krishnendu Chakrabarty (F’08) received the M.S.E. and Ph.D. degrees from the University of Michigan, Ann Arbor, MI, USA, in 1992 and 1995.
He is the William H. Younger Distinguished Professor of Engineering with the Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA.
Prof. Chakrabarty served as an Editor-in-Chief for the IEEE Design & Test of Computers from 2010 to 2012 and the ACM Journal on Emerging Technologies in Computing Systems from 2010 to 2015. He currently serves as an Editor-in-Chief for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. He is a fellow of ACM.
