No Care A Power
Paolo Meloni, Salvatore Carta, Roberto Argiolas, Luigi Raffo and Federico Angiolini
DIEE, University of Cagliari, 09123 Cagliari, Italy
DMI, University of Cagliari, 09123 Cagliari, Italy
DEIS, University of Bologna, 40136 Bologna, Italy
Abstract: Networks-on-Chip (NoCs) are emerging as scalable interconnection architectures, designed to support the increasing number of cores that are integrated onto a silicon die. Compared to traditional interconnects, however, NoCs still lack well-established CAD deployment tools to tackle the large number of available degrees of freedom, starting from the choice of a network topology. Silicon-aware optimization tools are now emerging in literature; they select a NoC topology taking into account the tradeoff between performance and hardware cost, i.e. area and power consumption. A key requirement for the effectiveness of these tools, however, is the availability of accurate analytical models for power and area. Such models are unfortunately not as available and well understood as those for traditional communication fabrics. In this work, given a NoC reference architecture, we present a flow to devise analytical models of area occupation and power consumption of NoC switches, and propose two strategies for coefficient characterization which have different tradeoffs in terms of accuracy and of modeling activity effort. The models are parameterized on several architectural, synthesis-related and traffic variables, resulting in maximum flexibility. We finally assess the accuracy of the models.
I. INTRODUCTION

Current and future Systems-on-Chip (SoCs) achieve increasingly better functionality and performance by integrating larger numbers of processing elements. This growth in computing resources must be matched by a corresponding evolution of the interconnection infrastructure. Traditional communication fabrics exhibit scalability issues in terms of performance and physical circuit design. The Network-on-Chip (NoC) paradigm, which brings networking concepts to the on-chip domain, answers such concerns with a scalable design, at both the architectural and physical levels.

NoC design is a discipline with a large number of degrees of freedom. While this is an advantage, in that the interconnect can be optimally tailored to match the application at hand, it creates a critical issue when exploring the design space from the hardware overhead point of view. Given the flexibility of NoCs, an exhaustive exploration would require an impractical number of synthesis runs, and a thorough characterization of switching activity in every candidate topology to properly assess power consumption.

An answer to the NoC customization complexity lies in the deployment of robust CAD tools, introducing the ability for
automated design space exploration according to some fast optimization algorithm. In turn, however, such an approach requires the availability of accurate analytical models of NoC area and power consumption to drive the optimization algorithms. Models allow the designer to quickly pre-estimate the area requirements and power consumption overhead introduced by the candidate interconnects. However, it must be noted that, as with any hardware component, the hardware cost of a NoC switch depends on several kinds of parameters, including (i) architectural (e.g. amount of buffering), (ii) synthesis tool-related (e.g. target operating frequency), (iii) operating (e.g. traffic flows).

As a major contribution of this work, we propose a NoC modeling style which takes advantage of the designer's knowledge of the target architecture and synthesis library. For our analysis, we choose the ×pipes NoC switch as a case study due to its parameterizability (Section III). Key properties of our approach include accuracy and explicit modeling on several parameters of the design, like switch arity, flit width, buffering, traffic and synthesis parameters. These properties make the approach flexible, applicable in real life, and suitable for fast exploration of large parts of the fabric design space, for example by accounting for the behaviour of the synthesis tools when the target operating frequency approaches the limits of the design. The characterization is dependent on the target technology library, but can be easily scripted and automated.

Our approach starts from an existing RTL description of the NoC components, which are then synthesized and characterized under multiple architectural configurations and traffic conditions. A mathematical formulation of the area and power models is derived from empirical evidence and from the designer's knowledge of the NoC. Finally, the coefficients of the model are fitted to the experimental results, guaranteeing accurate results for the given architecture.
We present two different ways of characterizing the coefficients, with varying accuracy/effort tradeoffs.

II. RELATED WORK

NoCs have recently been proposed as a way to overcome the scalability issues of traditional interconnects [1]-[3]. Research has focused on multiple design levels. From the architectural point of view, a complete scheme is presented for example
in [4], while specific topics are tackled in several works, such as Quality of Service (QoS) provisions [5] and asynchronous implementations [6]. The complexity in tackling the configurability of NoCs has been made clear by [7], where synthesis results for switches show widely different hardware requirements. Hence the need for the development of algorithms and CAD toolchains for NoC instantiation and optimization, as found for example in [8], [9]. A requirement of such works is the availability of reliable power and area models.

Power models and simulators for processors and memories have been proposed in an extremely large body of research [10], [11]. Interconnects have also become the focus of research [12], due to their increasing role in the hardware budget of recent and future systems; for example, the on-chip network of the MIT RAW chip multiprocessor takes 36% of the chip power budget on average [13].

Some models of NoC hardware cost have already been proposed in previous literature. Results in [14] are derived from a mix of results on template circuits and from technology trends, and are specifically aimed at wide applicability. Therefore, even though they have been used for design space exploration [15] and in association with high-level traffic injection models [16], they do not guarantee maximum accuracy within an architecture-specific CAD flow. The main advantages of these techniques are flexibility and fast deployment. We see them as complementary to our approach, especially for initial exploration when the NoC component library is not available yet.

The approach in [17], on the other hand, attempts to build a cycle-accurate power model of a target router instance. However, several major points differentiate our approach. First, we build a model which is parametric not only on traffic-related events, but also on the architectural knobs of the design. Second, we include an area model in the exploration.
Third, our model can be more readily adopted within a CAD mapping flow; this is both because we express the model as a function of architectural parameters, and because we provide a high-level dependence on traffic variables, instead of a cycle-by-cycle one. Fourth, we strive to make our approach as applicable as possible in real-world conditions, including the hard-to-model peculiarities of the behaviour of synthesis tools when aiming for maximum frequency operation. Fifth, we propose a fast characterization mechanism, by means of which model coefficients can be quickly derived with a minimal number of synthesis runs.

In [18], a framework for NoC exploration is presented; the framework includes a power modeling flow. The power model features very limited dependence on architectural parameters and does not seem to account for the configuration knobs of synthesis tools. No area model is provided.

In [19], a bit energy modeling flow is proposed to compare different switch fabrics in IP network routers. The approach is focused on the cost of transmitting bits from input to output ports, and while bit pattern-accurate, it is only focused on comparing router topologies against each other. The authors of [20] propose a model based on transistor count, while in [21], which is focused on FPGAs, switch arity is the main parameter. None of these models is meant for simultaneously accurate, parametric and fast representation of power consumption, i.e. suitable for design space exploration within a CAD environment.

III. THE ×PIPES SWITCH ARCHITECTURE

We choose the switch architecture defined in the ×pipes NoC component library [7], [22] as a case study, due to its customizability. The ×pipes switch (Figure 1) is output buffered; FIFOs of configurable depth are instantiated at each output, while inputs feature a single register. The flit width can be arbitrarily set. The number of input and output ports is also a parameter; full connectivity is provided in the central crossbar. An arbiter is attached to each output port to handle contention issues. We test the switch with its default ACK/NACK flow control mechanism, which leverages the output buffer resources. Since ×pipes performs source routing, the switch does not include routing LUTs.

Fig. 1.

IV. PROPOSED MODELING METHODOLOGY

Our modeling activity is composed of five main phases. First, we devise a set of parameters that are relevant to the accuracy of any model which aims at practical applicability. Second, we define a general model formula for area and for power, relying on the knowledge of the target architecture as explained in Section III. Third, we synthesize several configurations (training set) of the target switch architecture in a 0.13 µm technology library with Design Compiler [23], and measure the corresponding area and power consumption. The configurations are chosen so as to uniformly but sparsely cover the design space of interest, therefore allowing for an accurate yet quick construction of the model. Fourth, we use experimental results to numerically quantify the coefficients of the model. As outlined later, we propose two different ways of performing this step. Fifth, we assess the quality of our models
against configurations (test set) outside the training set. The first four steps will be covered in Subsections IV-A, IV-B, IV-C, IV-D, while the fifth will be discussed in Section V.

A. Parameters of Interest

A key phase of the approach is devising a model that matches the architecture under consideration and its properties. However, considering the architecture alone does not guarantee that the model will be applicable and accurate enough in practice. For example, synthesis tools play a primary role in defining the area and power efficiency of a component. Therefore, we first summarize the parameters of interest when assembling our model.

Architectural parameters:
- Switch arity (number of ports). To account for rectangular switches, we separately consider the number of input ports ($np_i$) and output ports ($np_o$).
- Amount of buffering devoted to flow control handling and performance optimization, also called buffer depth ($bd$), expressed in terms of single-flit buffering elements.
- Number of bits of the incoming and outgoing elementary data blocks, also called flit width ($fw$).

Implementation flow parameters:
- Target frequency of operation.
- Target area.
- Target power consumption.

Tuning these parameters differently in the synthesis tools results, as expected, in widely different quality of results. For example, extreme performance demands force synthesis tools to create netlists containing large numbers of buffers and fast gates, which decrease area and power efficiency. To mimic a typical industrial flow, where an application performance constraint must be satisfied, we impose as the primary objective a certain target operating frequency (which is a parameter of our model), while area and power minimization are given to the tool as secondary optimization objectives. As a result, area and power requirements, expressed as a function of the target operating frequency, exhibit a characteristic flat behaviour followed by a steeply rising trend after an inflection point.
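The flat-then-rising trend can be approximated by a piecewise-linear function of the target frequency. The following is only a minimal sketch under stated assumptions: the function name, the argument names and all numeric values are hypothetical, with `f_n` standing for the corner frequency below which cost stays constant and `f_max` for the highest frequency the synthesis tool can reach.

```python
def cost_at_frequency(f_target, f_n, f_max, cost_n, cost_max):
    """Piecewise-linear estimate of area or power versus target frequency.

    Below the corner frequency f_n the cost stays at cost_n; between f_n
    and f_max it rises linearly up to cost_max.
    """
    if not f_n < f_max:
        raise ValueError("f_n must be lower than f_max")
    if f_target > f_max:
        raise ValueError("target frequency is not achievable")
    if f_target <= f_n:
        return cost_n  # flat region: secondary objectives fully pursued
    slope = (cost_max - cost_n) / (f_max - f_n)
    return cost_n + slope * (f_target - f_n)


# Made-up numbers for illustration only: frequencies in MHz, area in mm^2.
example = [
    cost_at_frequency(f, 500.0, 1000.0, 0.10, 0.16)
    for f in (300.0, 500.0, 750.0, 1000.0)
]
```

With this form, only two characterization points per switch configuration are needed to estimate cost at any intermediate target frequency.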
This trend is well known, and can be explained by the fact that, above some target operating frequency which can be achieved with minimal circuitry, synthesis backends are forced to insert extra gates to comply with increasing performance demands. Figure 2 is a linearized approximation of this trend, and at the same time summarizes the way we modeled this effect.

For each device configuration (e.g. 4x4 32-bit switches with 6-deep FIFO buffers), a native frequency $f_n$ can be identified. This frequency is that achieved by the synthesizer with relaxed timing constraints. Under this condition, the tool is free to fully pursue its secondary objectives, hence creating minimum area ($A(f_n)$) and power ($P(f_n)$) netlists. Configuring the tools for target frequencies lower than $f_n$ does not result in further decreases of area or power dissipation. For each switch instance, it is also possible to find a frequency $f_{max}$ that corresponds to the fastest achievable synthesis result. Under this timing constraint, the module has $A(f_{max})$ area and $P(f_{max})$ power consumption.

Fig. 2.

We approximate the dependency of area and power overheads as linear in the range ($f_n$; $f_{max}$). This assumption allows us to characterize devices just at $f_n$ and $f_{max}$ under various combinations of the architectural parameters, while being able to estimate results over the whole range of frequencies achievable by the module. Since this analysis is not correlated to other model parameters, in the following, for simplicity of notation, we will not explicitly mention the dependency of coefficients on the synthesis target frequency; the characterization of this parameter will be implicitly assumed. The linearized approximation is a way of abstracting away from low-level details of the logic synthesis process, which are impossible to capture in a high-level model. The experimental results that will be shown in Section V will be based on a test set which is also spread in terms of target operating frequency, therefore providing a metric of the accuracy of such a model.

Traffic condition parameters: these parameters are only relevant to power models, since area models are clearly static. They include downstream congestion and internal congestion (i.e. arbitration conflicts). They will be explained in more detail in IV-B.2.

B. Area and Power Models

1) Area Model: In general, the area equation must be of the form of Equation 1:

$$A = f(bd, fw, np_o, np_i) \qquad (1)$$
We identify as suitable the area model expressed in Equation 2:

$$A(fw, bd, np) = A_1 \cdot np_o \cdot fw \cdot bd + A_2 \cdot np_i \cdot fw + A_3 \cdot np_o \cdot np_i + A_4 \cdot fw \cdot np_o \cdot np_i \qquad (2)$$
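Equation 2 can be evaluated directly once the four coefficients are known. The sketch below is illustrative only: the class and function names are hypothetical, and the coefficient values are made up rather than taken from any characterization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaCoefficients:
    """Coefficients A1..A4 of Equation 2 (values are technology dependent)."""
    a1: float  # weight of the np_o * fw * bd term
    a2: float  # weight of the np_i * fw term
    a3: float  # weight of the np_o * np_i term
    a4: float  # weight of the fw * np_o * np_i term


def switch_area(c, fw, bd, npi, npo):
    """Evaluate Equation 2 for a switch with the given parameters."""
    term1 = c.a1 * npo * fw * bd
    term2 = c.a2 * npi * fw
    term3 = c.a3 * npo * npi
    term4 = c.a4 * fw * npo * npi
    return term1 + term2 + term3 + term4


# Made-up coefficients, in arbitrary area units, for a 4x4 32-bit switch
# with 6-deep output FIFOs.
coeffs = AreaCoefficients(a1=1.0, a2=2.0, a3=3.0, a4=0.5)
example_area = switch_area(coeffs, fw=32, bd=6, npi=4, npo=4)
```

Keeping the four terms separate makes it easy to inspect which contribution dominates for a given configuration.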
The rationale of this formula is that the area of the target switch can be rendered as the sum of four contributions (Section III): (i) output buffers, (ii) input buffers, (iii) arbitration and flow control logic, (iv) crossbar. Each contribution
strongly depends on a known combination of architectural parameters:

- Output buffers, which are dominated by flip-flop area, can be supposed to depend linearly on flit width $fw$ and buffer depth $bd$ (×pipes switches are output-buffered), which respectively represent the width and depth of the buffer. There are $np_o$ such buffers.
- Input buffers are similar to the case above, but since they have a constant depth, they do not depend on $bd$. Obviously $np_i$ is used in place of $np_o$.
- Since a distributed arbitration technique is used in the target switch, one arbiter is instantiated at each output port. Each arbiter has a complexity proportional to the number of candidate input ports $np_i$, therefore the overall contribution is proportional to the product of the input and output arities. Arbiter logic is clearly independent of datapath parameters such as flit width and buffer depth.
- The area overhead due to the crossbar must have a linear dependency on flit width, must be independent of the buffering resources and must have a linear dependency on the product of input and output arities.

2) Power Model: The power consumption of a module depends on the switching activity of the cells, so, to express the power consumption of a NoC switch, a term that accounts for traffic conditions must be present. However, since sequential components exhibit a power consumption even if they are not performing computation, due to the clock switching, a static term must also appear. We propose Equation 3 as a general power model:

$$P(T, fw, bd, np) = A(\ldots) + \sum_{j=1}^{np_o} \left[ B(\ldots) \cdot TO_j \right] + \sum_{j=1}^{np_o} \left[ C(\ldots) \cdot TOC_j \right] + \sum_{j=1}^{np_i} \left[ D(\ldots) \cdot TIC_j \right] \qquad (3)$$

The first term is independent of the traffic, and models the power dissipated by inactive, but still clocked, registers. The remaining terms depend on traffic conditions. An accurate representation of the traffic conditions requires a separate analysis of the state of each input and output port. Therefore, we define $np_o$ traffic variables $TO_j$ and $TOC_j$, to model the lack or presence of external congestion, and $np_i$ traffic variables $TIC_j$, to model internal contention for resources. More specifically, we define:

- $TO_j$: Percentage of time during which the output port $j$ is successfully transmitting flits. This coefficient models traffic in absence of congestion.
- $TOC_j$: Percentage of time during which the output port $j$ is trying to transmit, but flits are rejected. This coefficient models external congestion due to traffic spikes.
- $TIC_j$: Percentage of time during which the input port $j$ of the switch is trying to transmit flits through one of the output ports, but arbitration is denied by the switch logic. This coefficient models the contention for the same output port inside of the switch.

The coefficients A, B, C, D depend on architectural parameters, as for the area model. They account for the power consumption in the traffic states described above, with $A(fw, bd, np)$ representing the consumption in absence of traffic altogether (i.e. due to sequential cells still receiving clock edges). Due to space constraints, we omit a full discussion of the dependencies of the coefficients, which are summarized in Table I. We would like to stress that some coefficients, which could be intuitively expected to quadratically depend on parameters, are instead linearly dependent, because they characterize a single input or output port. The quadratic behaviour is indirectly restored by the summation symbols in Equation 3.

TABLE I

Model Coefficient    A        B        C        D
Dep. on fw           linear   linear   linear   linear
Dep. on bd           linear   linear   linear   none
Dep. on np_i         linear   linear   linear   linear
Dep. on np_o         linear   none     none     linear

C. Choice of a Relevant Training Set

To characterize the coefficients of our area and power models, we define a training set, composed of switch configurations chosen in such a way as to uniformly cover the relevant design space. Since the ×pipes building blocks are highly parameterizable and can be deployed in many configurations, we select instances spanning a large variety of realistic block sizes for a NoC. Namely, we consider square switches with arity ($np_i$ x $np_o$) of 4x4, 10x10, 16x16 and 20x20; buffer depths $bd$ of 4, 5 and 7 FIFO locations; flit widths $fw$ of 21, 28 and 38 bits. Buffer depths and flit widths are chosen in a relatively conservative design space from the point of view of hardware resource usage, given the current performance/overhead tradeoff point of NoCs [22].

D. Fitting Model Coefficients

1) Fitting Area Model Coefficients: To estimate $A_1$, $A_2$, $A_3$, $A_4$, we propose two different methods.

Methodology 1: Coefficients can be derived directly from synthesis reports, which hierarchically list every switch sub-block. For example, once the area cost of an output buffer which is $bd$ flits deep and $fw$ bits wide is gathered from one report, it can be called $A_{obuf}|_{bd,fw}$. Since the output buffer area is expected to increase linearly with both $bd$ and $fw$, $A_1$ can be approximately derived as in Equation 4:

$$A_1 = \frac{A_{obuf}|_{bd,fw}}{bd \cdot fw} \qquad (4)$$
Other coefficients can be similarly computed.

Advantages: With this methodology, each contribution in the formula keeps a strict physical meaning. Only one synthesis run is needed to extrapolate coefficients for any
switch instance; we arbitrarily choose a 10x10, 28-bit switch as a reference. This instance is close to the center of the design space of interest (see the previous Subsection); its choice will be further discussed in Section V.

Disadvantages: This simplified approximation discards any constant offset that may be present in the coefficients. Further, the nature of synthesis tools introduces unpredictable fluctuations in the netlist area and power trends under different architectural configurations. This noise does not have any easily characterizable property. Thus, the model suffers from a non-negligible margin of error when compared against actual switch instances. Moreover, the choice of the specific switch instance for characterization might skew the computed coefficient values.

Methodology 2: Coefficients can be derived based on the same formula as above, but after an interpolation step over a set of characterization syntheses (the training set described in the previous Subsection).

Advantages: The model fits actual synthesis results better.

Disadvantages: Longer characterization time; with a thorough characterization set like that chosen in Subsection IV-C, experiments must be performed on 36 device instances, against just one. The actual improvement in accuracy depends on the smoothness of the native behaviour of the synthesis tools. Some coefficients may lose their physical meaning (e.g., they may become negative).

Both methodologies can be readily adapted to any parameterizable NoC architecture.

2) Fitting Power Model Coefficients: To characterize the A, B, C, D coefficients, we first inject traffic into the switch netlists under test. This is achieved by ModelSim [24] simulation of the Verilog netlists, to which traffic generators are attached. The traffic generators are configured to inject one of the four patterns described above (idle, free flow, downstream congestion, internal contention), allowing for separate analysis of the four coefficients.
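To see why single-pattern traffic isolates one coefficient at a time, Equation 3 can be evaluated for each pattern in turn. The sketch below is an illustration under explicit assumptions, not the authors' tooling: the function names are hypothetical, traffic variables are expressed as fractions in [0, 1] rather than percentages, each non-idle run is assumed to drive exactly one port at 100% with a single pattern, and the power values are made up.

```python
def switch_power(a, b, c, d, to, toc, tic):
    """Evaluate Equation 3 for one switch configuration.

    a, b, c, d: coefficients already evaluated for this configuration.
    to, toc:    per-output-port fractions of time (length np_o).
    tic:        per-input-port fractions of time (length np_i).
    """
    if len(to) != len(toc):
        raise ValueError("to and toc must both have np_o entries")
    free_flow = sum(b * t for t in to)
    congestion = sum(c * t for t in toc)
    contention = sum(d * t for t in tic)
    return a + free_flow + congestion + contention


def isolate_coefficients(p_idle, p_free_flow, p_congested, p_contended):
    """Recover (a, b, c, d) from four single-pattern power measurements.

    Assumption: each non-idle measurement has exactly one port fully
    occupied by one pattern, so subtracting the idle power leaves one term.
    """
    a = p_idle
    return a, p_free_flow - a, p_congested - a, p_contended - a


# Made-up coefficients (e.g. in mW) for a hypothetical 2x2 switch.
a, b, c, d = 4.0, 1.5, 0.5, 0.25
zero = [0.0, 0.0]
one_port = [1.0, 0.0]

measured = (
    switch_power(a, b, c, d, zero, zero, zero),      # idle
    switch_power(a, b, c, d, one_port, zero, zero),  # free flow
    switch_power(a, b, c, d, zero, one_port, zero),  # downstream congestion
    switch_power(a, b, c, d, zero, zero, one_port),  # internal contention
)
recovered = isolate_coefficients(*measured)
```

In practice the measured powers would come from the simulation and power analysis flow rather than from the model itself; the round trip here only checks that the four patterns are sufficient to separate the four terms.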
The corresponding switching activity is logged and fed as an input to Synopsys PrimePower [25], therefore building an accurate power estimation under each traffic contribution separately. Finally, the coefficients are determined by using either of the techniques just outlined for area models.

V. EXPERIMENTAL RESULTS

To evaluate the accuracy of the proposed techniques, we first randomly choose a test set of 70 switch configurations spread across the design space of interest (both in terms of architectural parameters and target synthesis frequencies), and not overlapping with the training set previously used for characterization. Each switch is synthesized with Design Compiler to extract its area requirements, then stimulated with traffic streams within ModelSim and studied in PrimePower to evaluate its power consumption. A reference set of experimental results is therefore collected. The area and power consumption of the same set of switches is then estimated
according to our methodology, and the statistical distribution of the resulting error is plotted in Figure 3. The plots report the behaviour of both coefficient fitting strategies. As can be seen, in around 80% of the cases, our models result in an error margin smaller than 10% of the actual value. Sporadically, relatively high error rates of up to 20% are detected; however, as can be seen for example in Figure 4, the distribution of the errors is quite randomly spread over the design space, and comprises both under- and overestimations. The figure reports modeling inaccuracy for a subspace having as axes the flit width and the switch arity, but similar plots can be derived for varying buffer depths and target synthesis frequencies. Therefore, we can attribute inaccuracies to the unpredictability which is intrinsic in the behaviour of synthesis tools, and not to a problem of our modeling approach.

Comparing the results of the two techniques for coefficient fitting presented in Subsection IV-D, we see that the tails of the inaccuracy distribution drop more sharply for Methodology 2, indicating a lower chance of large modeling errors. However, Methodology 1 exhibits just marginally worse average inaccuracy rates: 6.26% against 5.30% for power models and 5.97% against 5.45% for area models. In terms of characterization effort, in our experience, we can roughly assume that one hour may be needed on average for the analysis of an instance of the training set; therefore, Methodology 1 requires one hour of runtime, while Methodology 2 needs 36 hours to provide numerical values of coefficients (the actual time depends on how thoroughly the design space is covered).

Due to the drastically lower effort, Methodology 1 becomes a natural candidate for fast yet accurate modeling. However, this approach relies on a single switch instance to characterize all the coefficients. The choice of the reference switch configuration is therefore key, and may impact the robustness of the flow.
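Methodology 2 avoids the dependence on a single reference by fitting all training instances at once. The paper only speaks of an "interpolation step"; the sketch below assumes, as one plausible reading, an ordinary least-squares fit of Equation 2 solved through the normal equations. All function names are hypothetical, and the "synthesis" areas are generated from made-up coefficients instead of real reports.

```python
from itertools import product


def features(fw, bd, npi, npo):
    """Regressors of Equation 2, one per coefficient A1..A4."""
    return [npo * fw * bd, npi * fw, npo * npi, fw * npo * npi]


def solve(matrix, rhs):
    """Solve a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_area_coefficients(configs, areas):
    """Least-squares fit of A1..A4 via the normal equations (X^T X) a = X^T y."""
    rows = [features(*cfg) for cfg in configs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, areas)) for i in range(k)]
    return solve(xtx, xty)


# Training set of Subsection IV-C: 4 arities x 3 depths x 3 widths = 36 switches.
training = [
    (fw, bd, n, n)
    for n, bd, fw in product((4, 10, 16, 20), (4, 5, 7), (21, 28, 38))
]

# Stand-in for synthesis results: areas generated from made-up coefficients.
true_coeffs = [12.0, 9.0, 40.0, 1.5]
areas = [
    sum(c * x for c, x in zip(true_coeffs, features(*cfg))) for cfg in training
]

fitted = fit_area_coefficients(training, areas)
```

With noise-free synthetic data the fit simply recovers the generating coefficients; with real synthesis reports the residuals would reflect the tool-induced fluctuations discussed above, and nothing constrains the fitted values to stay positive.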
Internal testing, which we omit due to space constraints, shows that coefficients are quite accurately rendered under a wide range of possible choices of the reference switch. However, when manually picking an outlier instance as the reference, errors over the whole design space turn out to be large. As a possible workaround, Methodology 1 could be applied to multiple switch instances to minimize the chance of choosing bad references; outliers could be effectively discarded. This hybrid approach provides better reliability, but requires a modeling effort which is progressively closer to that of Methodology 2 as its robustness is increased. Methodology 2 remains the most accurate and reliable, and its characterization time can still be assumed to be fully acceptable for both academic and industrial environments.

To further validate the most complex part of our methodology, i.e. the power model, we study a whole NoC topology, namely a 5x3 mesh. The mesh includes switches with three different arities of 4x4, 5x5 and 6x6. We then inject functional traffic on the topology, and compare the resulting power consumption against that predicted by our model (characterized with Methodology 2). Traffic patterns on the mesh are irregular, due to application needs. The results are plotted in Figure 5. The average inaccuracy is 5%, with only two switches out of fifteen (about 13%) exhibiting inaccuracies greater than 10%. Since the power consumption of some switches is overestimated while that of others is underestimated, the margin of error on the consumption of the whole mesh is as little as 1.3%. This result confirms the usefulness of our modeling strategy for integration within a CAD mapping and design space exploration flow.

Fig. 3. Area and power coefficient modeling inaccuracy under different characterization policies: (a) area coefficients, (b) power coefficients

Fig. 4. Distribution of the modeling inaccuracy over a subset of the design space for Methodology 2

Fig. 5. Distribution of the power modeling inaccuracy for the switches of a 5x3 NoC mesh

We also perform some initial experiments with switch placement & routing steps, therefore studying the behaviour of switches after the wiring issues have been taken into account. Preliminary analyses show that the models and modeling strategies outlined in this paper still apply. Clearly, numerical coefficients vary, due to extra capacitive loads and the resulting need for extra electrical buffering. Since the placement & routing stage is especially critical for interconnect systems, this would be a very important validation milestone. Further experiments are however required.

VI. CONCLUSIONS AND FUTURE WORK

We presented a methodology for characterization of NoC switch area and power requirements. The approach we propose is thoroughly parameterized on several architectural, deployment and runtime parameters. This guarantees excellent applicability within a NoC CAD flow for topology mapping
and/or design space exploration. The area and power models turn out to be very accurate within the limits allowed by the non-idealities of synthesis tools, even when applied to a whole NoC topology with irregular traffic flows. We show that varying tradeoffs between coefficient accuracy and modeling effort can be achieved.

As future work, we first of all plan on fully validating our models and flows on post-placement & routing netlists. Subsequently, we plan on increasing the level of detail of our models, on consistently minimizing the characterization effort, and on creating similar models for other NoC components, such as Network Interfaces (NIs). Besides the absolute accuracy of the models, we also plan on quantifying the accuracy of our model when used from within a CAD flow to establish relative cost assessments of alternative NoC topologies; early results in this activity are encouraging, with good agreement between CAD expectations and actual measurements.

VII. ACKNOWLEDGMENTS

This work has been supported by STMicroelectronics and by the Semiconductor Research Corporation (SRC) under contract 1188.
REFERENCES

[1] W. J. Dally and B. Towles, "Route packets, not wires: On-chip interconnection networks," in Proceedings of the 38th Design Automation Conference, June 2001, pp. 684-689.
[2] L. Benini and G. D. Micheli, "Networks on chips: A new SoC paradigm," IEEE Computer, vol. 35, no. 1, pp. 70-78, January 2002.
[3] P. Guerrier and A. Greiner, "A generic architecture for on-chip packet-switched interconnections," in Design Automation and Test in Europe, DATE'00, March 2000, pp. 250-256.
[4] F. Karim, A. Nguyen, S. Dey, and R. Rao, "On-chip communication architecture for OC-768 network processors," in Proceedings of the Design Automation Conference (DAC), 2001, pp. 678-683.
[5] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, "QNoC: QoS architecture and design process for network on chip," in Journal of Systems Architecture. Elsevier, 2004.
[6] T. Bjerregaard and J. Sparsø, "Scheduling discipline for latency and bandwidth guarantees in asynchronous network-on-chip," in Proceedings of the 11th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC), 2005, pp. 34-43.
[7] F. Angiolini, P. Meloni, D. Bertozzi, L. Benini, S. Carta, and L. Raffo, "Networks on chips: A synthesis perspective," in Proceedings of the 2005 ParCo Conference, 2005.
[8] J. Hu and R. Marculescu, "Energy-aware mapping for tile-based NoC architectures under performance constraints," in Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC), 2003, pp. 233-239.
[9] S. Murali and G. D. Micheli, "SUNMAP: a tool for automatic topology selection and generation for NoCs," in Proceedings of the 41st Design Automation Conference (DAC), 2004, pp. 914-919.
[10] D. Brooks, V. Tiwari, and M. Martonosi, "Wattch: A framework for architectural-level power analysis and optimizations," in Proceedings of the 27th International Symposium on Computer Architecture (ISCA), 2000, pp. 83-94.
[11] W. Ye, N. Vijaykrishnan, M. Kandemir, and M. J. Irwin, "The design and use of simplepower: A cycle-accurate energy estimation tool," in Proceedings of the 34th Design Automation Conference (DAC'00), 2000, pp. 340-345.
[12] A. Bona, V. Zaccaria, and R. Zafalon, "System level power modeling and simulation of high-end industrial network-on-chip," in Proceedings of Design, Automation and Testing in Europe Conference 2004 (DATE'04). IEEE, February 2004, pp. 318-323.
[13] J. S. Kim, M. B. Taylor, J. Miller, and D. Wentzlaff, "Energy characterization of a tiled architecture processor with on-chip networks," in Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED), 2003, pp. 424-427.
[14] H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, "Orion: a power-performance simulator for interconnection networks," in Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture. IEEE Computer Society Press, 2002, pp. 294-305.
[15] H.-S. Wang, L.-S. Peh, and S. Malik, "A technology-aware and energy-oriented topology exploration for on-chip networks," in Proceedings of Design, Automation and Testing in Europe Conference 2005 (DATE'05). IEEE, March 2005, pp. 1238-1243.
[16] N. Eisley and L.-S. Peh, "High-level power analysis of on-chip networks," in Proceedings of the 7th International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES), 2004, pp. 104-115.
[17] J. Chan and S. Parameswaran, "NoCEE: Energy macro-model extraction methodology for network on chip routers," in Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD), 2005, pp. 254-259.
[18] G. Palermo and C. Silvano, "PIRATE: A framework for power/performance exploration of network-on-chip architectures," in Proceedings of the 14th Intl. Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), 2004, pp. 521-531.
[19] T. T. Ye, L. Benini, and G. D. Micheli, "Analysis of power consumption on switch fabrics in network routers," in Proceedings of the 36th Design Automation Conference (DAC'02), 2002, pp. 524-529.
[20] C. S. Patel, S. M. Chai, S. Yalamanchili, and D. E. Schimmel, "Power constrained design of multiprocessor interconnection networks," in Proceedings of the IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD), 1997, pp. 408-416.
[21] H. Zhang, M. Wan, V. George, and J. Rabaey, "Interconnect architecture exploration for low-energy reconfigurable single-chip DSPs," in Proceedings of the IEEE Workshop On VLSI '99, 1999, pp. 2-8.
[22] F. Angiolini, P. Meloni, S. Carta, L. Benini, and L. Raffo, "Contrasting a NoC and a traditional interconnect fabric with layout awareness," in Proceedings of the Design, Automation and Test in Europe (DATE) Conference and Exhibition, 2006, pp. 124-129.
[23] Synopsys Inc., "Design Compiler," www.synopsys.org, Synopsys Inc.
[24] Mentor Graphics, "ModelSim," www.model.com, Mentor Graphics.
[25] Synopsys Inc., "PrimePower," www.synopsys.org, Synopsys Inc.