Interleaved Edge Routing in Buffered 3D Mesh & CMesh NoC

This document summarizes a research paper that proposes an interleaved edge routing technique for 3D mesh and concentrated mesh (CMesh) network-on-chip (NoC) architectures. The technique stacks multiple 2D NoC layers without modifying the underlying 2D router architecture. Interleaved connections are made between edge routers in the 3D buffered NoC using through-silicon vias (TSVs). Simulation results show the proposed "MML" (Modified Multi Layer) network design achieves significant improvements in throughput and latency compared to conventional 2D and 3D buffered networks using the same number of routers.

Interleaved Edge Routing in Buffered 3D Mesh & CMesh NoC
Rose George Kunthara∗, Neethu K∗, Rekha K James∗ and Simi Zerine Sleeba†
∗ Division of Electronics, School of Engineering, CUSAT, Cochin, India
† Dept. of Electronics & Communication Engineering, Rajagiri School of Engineering and Technology, Cochin, India


Abstract—Many-core processors are widely used in areas such as cloud computing, big data processing, high performance computing and datacentre applications. Network on Chip (NoC) is the preferred interconnect solution, as it can overcome the scalability issues and communication bottlenecks associated with them. Low latency, small area and high throughput are the key performance parameters of on-chip network design. Performance can be greatly enhanced by replacing the 2D NoC communication infrastructure with a 3D NoC in which multiple NoC layers are integrated using high-speed Through Silicon Via (TSV) based vertical links. However, 3D router designs incur extra power and area, in addition to integration issues such as the reliability and fabrication problems related to TSV based interconnection. In this paper, we utilize an asymmetrical routing technique in Mesh & CMesh topologies, where we make interleaved connections between edge routers in a 3D buffered on-chip network to improve NoC performance. Simulation results indicate that our design approach, the MML (Modified Multi Layer) network, significantly improves throughput and reduces latency compared with conventional buffered networks that employ the same number of routers.

Index Terms—3D NoC, TSV, buffered router, throughput, latency

I. INTRODUCTION

With the advent of advanced miniaturization techniques in transistor integration, a large number of processing cores can be integrated within a single chip, known as a System-on-Chip (SoC). This has led to the emergence of Tiled Chip Multi Processors (TCMP), which contain tens or hundreds of cores integrated on a chip. The standard point-to-point and bus based communication fabrics pose a performance bottleneck to the growing communication needs of many-core processor systems. NoC has emerged as the preferred interconnection infrastructure in TCMP design due to its improved scalability, reliability, higher bandwidth, improved throughput, better parallelism, lower power consumption, etc. [1], [2].

In a regular tile based NoC architecture, Mesh topology is generally used to connect several tiles or processing elements (PE). A high-speed router is attached to every PE and communication between them happens through bidirectional links. Network traffic through the NoC occurs because of cache misses as well as coherence transactions. A packet, which is the fundamental unit of data transfer amongst tiles, is split into flits (flow control units). Each router contains five input and five output ports: four connected to north, east, south and west, and one connected to the local processing core. Standard input buffered NoCs employing the store-and-forward wormhole routing mechanism exhibit better load handling ability and network performance [1], [2], [3].

Mesh topology is commonly used to implement NoC systems. Concentrated Mesh (CMesh) topology can minimize network complexity in terms of wiring and router area overhead and thereby improve network performance [4]. We employ a design approach that combines a conventional virtual channel router (VCR) based NoC with 3D integration that utilizes TSV links. Multiple 2D planar network layers are arranged one above the other, without modifying the underlying 2D router architecture. We use this design approach for Mesh based and CMesh based architectures to evaluate NoC performance. In comparison with equivalent 2D and 3D counterparts which employ an equal number of routers, our design approach exhibits improved network performance with regard to average latency and throughput, with minimal area footprint.

The rest of this paper is arranged as follows: Section II summarizes existing solutions and Section III describes the motivation behind our design approach. Section IV provides the particulars of our design and Section V describes the experimental methodology followed. Section VI discusses results and analysis, before Section VII concludes the paper.

II. BACKGROUND

As 2D NoC has restricted floor planning choices, the performance improvements due to NoC designs become limited as the number of processing cores is scaled up. The adoption of 3D NoC systems, made by stacking various layers of NoC devices, can enhance functionality and improve performance. Shorter interconnects, better noise immunity and improved packaging density are some key features of 3D NoC structures [5], [6]. Pavlidis et al. evaluate speed, power dissipation and zero-load latency to show the superior performance of their 3D NoC design in comparison to 2D Mesh NoC [7]. Ciliate 3D Mesh is a 3D Mesh based design where each switch has connections to multiple IP blocks [8]. In spite of using a 7-port router structure, the proposed architecture has decreased overall bandwidth as multiple IP blocks are connected to a router, resulting in minimal connectivity.
Xu et al. evaluate the dependency of performance and functionality of 3D NoC structures on the number of TSVs by decreasing the TSV count to half and to a quarter [9]. Though TSV is the most viable solution that allows high-speed vertical intercommunication across 3D NoC layers, it is vulnerable to manufacturing defects and faults [10]. Vertically-Partially-Connected NoCs, or irregular 3D NoC topologies, and their routing techniques are active research areas that focus on minimizing the number of vertical links for better system performance.

Bartzas et al. propose a software-supported exploration based approach to define pattern-based interconnection topologies for 3D NoC architectures [11]. Their aim is to reduce through silicon via based interconnections in 3D NoC structures, and they consider homogeneous 3D Mesh and torus designs for their experimental evaluations. The authors propose several vertically partially joined structures which can adjust vertical link locations based on application flows. They come up with a routing technique based on temporary destination utilization across intermediate layers.

Dubois et al. propose a distributed routing technique to route packets in irregular 3D NoC topologies formed by partially connecting together several planar topologies using a small number of vertical links [12]. Their “Elevator-First” routing technique, which uses two virtual channels, is deterministic, livelock-free and deadlock-free. Packets that have different source and destination layers first follow the X-First algorithm, by which they reach the nearest vertically connected router, known as an elevator. The packets are then routed to the destination layer via the elevator and finally reach the destination router by again following the X-First routing technique.

Redelf, which focuses on 3D Mesh topology, is a modified elevator-first routing algorithm that achieves deadlock-freedom without using any virtual channel [13]. Improper elevator assignments can cause network performance degradation as there are fewer elevators than routers. Foroutan et al. suggest an elevator assignment technique for best-effort 3D NoCs [14]. An irregular 3D NoC topology with reduced vertical link density can improve system performance by using the dynamic quadrant partitioning (DQP) adaptive routing technique [15].

First-Last is a lightweight, resilient and highly efficient adaptive routing algorithm that is specially designed for partially vertically connected 3D NoC systems [16]. It employs a very small number of virtual channels and guarantees deadlock-freedom. To evenly distribute network traffic across incomplete 3D NoC systems, Vahdatpanah et al. introduce an efficient adaptive routing technique. Their algorithm utilizes turn model analysis to classify layers, rows and columns into various groups. The congestion-aware dynamic elevator assignment (CDA) technique can be used for 3D NoC systems that do not have full vertical connectivity [17]. The CDA method, which is based on network congestion information and distance factors, is able to achieve better network performance.

6-port or 7-port routers are generally used in most of the work related to vertical link reduction techniques that can improve the performance of 3D NoC. 2L-2D is based on a two-layer NoC structure which employs the conventional VCR based 5-port architecture [19]. It shows improved performance in comparison to planar 2D Mesh and 2D torus based designs.

III. MOTIVATION

As the number of processing cores in a planar NoC network is scaled up, the number of hops traversed increases, which can affect communication quality and the power dissipated in a network [19]. Large network size also leads to increased chip area and floor plan. 3D NoC systems, which consist of several layers of active dies, can improve network performance manifold. This is due to reduced network diameter and packet transmission delay. For efficient routing of packets along all the dimensions in 3D NoC structures, 3D router structures have to be employed. But this leads to more interconnections, more ports and a more complex arbitration mechanism, which result in larger area overhead and power dissipation [18].

TSV links are commonly used to form vertical interconnections as they are fast and power efficient. But TSVs incur area overhead, misalignment problems and low fabrication yield, and they require additional manufacturing processes. Minimizing the number of TSVs can improve reliability and reduce area overhead and manufacturing cost. The performance of 3D NoC systems is thus dependent on the number of TSVs employed in a design.

In this paper, we capitalize on the area and performance benefits pertaining to planar 2D NoC & 3D NoC structures to improve the performance of NoC. This is achieved by stacking multiple layers of 2D NoC networks that utilize the 2D router architecture. TSV based vertical interconnections are formed only at unutilized ports of routers located at the edges, in an interleaved manner. The asymmetrical routing method improves routing efficiency with minimal router hardware and TSV links.

IV. PROPOSED DESIGN

NoC network performance is dependent on efficient router architecture, topology and routing algorithm. In a traditional VCR based NoC network, flits remain in the input buffers associated with input ports until they acquire the desired output port. The required output port for each packet is calculated by the routing unit. For each packet, the VC allocator unit allocates a VC in the downstream router. The switch allocation unit performs arbitration to select the winning packet when several packets need the same downstream router. The crossbar serves as the switching fabric.

We consider Mesh and CMesh topologies to evaluate our design due to their scalability, symmetry and router interconnections that are made using short wires.

A. Mesh:

A 5-port planar 2D Mesh configuration can be extended to a 3D Mesh structure by employing additional vertical links at each router for interlayer communication. The 3D Mesh structure uses a seven-port router architecture: east, west, north, south, up, down and the local processing core. Figure 1(a) shows our modified multi layer network structure, which is modelled
as a 4 × 4 × 4 Mesh NoC where interleaved edge routers are utilized to interconnect adjoining layers. In a conventional 2D Mesh structure, except for routers located at the edges, all other routers make use of all 5 ports for creating the interconnect structure. The vacant ports of routers located at the edges are exploited in this design approach to build asymmetrical interconnections using TSV links across adjoining layers.

Fig. 1. (a) MML Mesh NoC architecture (b) MML CMesh NoC architecture

B. Concentrated Mesh (CMesh):

CMesh topology is a popular cost-effective extension to the widely used Mesh topology. In a CMesh topology, one large router is shared by a number of processing cores. Compared to Mesh topology, for the same number of processing cores, CMesh thus employs fewer routers, thereby reducing hop count and the total number of horizontal wires. Concentration refers to the number of cores that are connected to a router. For C=2, a planar 2D CMesh structure employs 6-port routers whereas a 3D CMesh structure requires 8-port routers. Our CMesh topology based modified multi layer network is shown in Figure 1(b). It employs a 6-port router structure where vacant ports of routers located at the edges are interconnected using TSV based vertical links in an interleaved fashion to facilitate interlayer communication.

Both Mesh and CMesh based modified multi layer network structures employ an asymmetric routing algorithm [20] to determine the required output port for each packet. For intra-layer routing, packets follow the XY routing algorithm. When the current and destination routers are located in separate layers, packets need to undergo inter-layer routing. So, the packet is sent to a nearby edge router such that it traverses the least Manhattan distance to reach the destination router in another layer.

V. EXPERIMENTAL METHODOLOGY

Booksim 2.0, an open source cycle accurate NoC simulator, is used to model the traditional VC based NoC router [21]. Mesh and CMesh topologies are used for our experimental evaluations. We consider CMesh topology with concentration C=2, such that one router is shared by two processing cores. Thus only 64 routers are required for a 128-core set up employing CMesh topology, whereas for Mesh topology, 64 routers are employed for a 64-core set up. We then make the necessary modifications to model the modified multi-layer network.

A. Synthetic Workload

Synthetic traffic patterns represent abstract prototypes of message flow occurring in a NoC. Standard synthetic traffic workloads like uniform, tornado, bit-complement, bit-reverse, etc. are used to compare the performance of the MML network against 2D and 3D networks for both Mesh and CMesh topologies employing 64 routers. After adequate warm up time, average latency and throughput readings are collected by changing the injection rate between zero and the network saturation point for every traffic pattern.

B. Real Workloads

The performance of the MML network is evaluated using real application workloads such as SPEC CPU2006 [22] against both 2D and 3D networks for Mesh and CMesh topologies. The Gem5 simulator is used to prototype a 64-core and a 128-core multiprocessor system for Mesh and CMesh topologies
respectively [23]. Every core contains an out-of-order x86 processing unit with 2 levels of cache hierarchy: a 64KB, 4-way set associative, private L1 cache and a 512KB, 16-way set associative, shared distributed L2 cache. Every core runs an application program from the SPEC CPU2006 suite. We categorize application programs as shown in Table I, depending on their misses per kilo instructions (MPKI) values. As given in Table II, 7 multiprogrammed workload mixes are created based on the network injection intensity combination of component benchmarks. To simulate network operations, the network traffic produced by running real applications is given to the Booksim simulator.

TABLE I
CACHE MPKI VALUES BASED CLASSIFICATION OF BENCHMARK APPLICATION PROGRAMS

MPKI class                         Benchmark applications
Low MPKI (less than 5)             calculix, gobmk, gromacs, h264ref
Medium MPKI (between 5 and 25)     bwaves, bzip2, gamess, gcc
High MPKI (greater than 25)        hmmer, lbm, leslie3d, mcf

TABLE II
PERCENTAGE DISTRIBUTION OF VARIOUS NETWORK INJECTION INTENSITY APPLICATIONS IN DIFFERENT BENCHMARK COMBINATIONS

Benchmark Mix    M1    M2    M3    M4    M5    M6    M7
% of Low        100     0     0    50     0    50    31
% of Medium       0   100     0     0    50    50    31
% of High         0     0   100    50    50     0    38

VI. EXPERIMENTAL ANALYSIS

The performance of the Mesh and CMesh topology based MML network design is compared against conventional VCR based 2D and 3D counterparts. We perform experiments on: (a) 8 × 8, 2D VCR Mesh and CMesh with XY routing, (b) 4 × 4 × 4, 3D VCR Mesh and CMesh with XYZ routing, and (c) 4 × 4 × 4 Mesh and CMesh with MML network structure using the asymmetric routing algorithm.

A. Average Flit Latency

Average latency graphs for various synthetic workloads corresponding to Mesh and CMesh networks are depicted in Figure 2 & Figure 3 respectively. Better router performance is indicated by a broader and lower latency curve. VCR based 3D NoC architectures are expected to have reduced average latency as they have a higher number of links and fewer hops to traverse. As the packet injection rate approaches the saturation value, average latency rises exponentially. 3D Mesh and CMesh based NoC networks exhibit a higher saturation injection rate as they have more ports to handle network traffic. MML network based Mesh and CMesh topologies exhibit a lower network saturation point than their 3D counterparts but a better one than their 2D counterparts.

Figure 4(a) & 4(b) depict the average latency comparison under real workloads for Mesh and CMesh networks respectively. We can observe that across all benchmark mixes, the MML based network has lower average latency than the 2D network structures, due to the reduced hop count in the MML based design, but higher latency than the equivalent 3D counterparts.

B. Throughput

Throughput is defined as the number of packets ejected from the entire network per node per cycle. In any multilayer NoC, the average hop count and the total count of horizontal and vertical links decide the throughput improvement. Figure 5(a) & 5(b) show that Mesh and CMesh based MML networks have better throughput than their conventional 2D counterparts but lower throughput than their 3D counterparts for different synthetic traffic patterns.

C. Wiring Overhead

In any NoC, the overall area overhead comprises router area and wiring overhead. The wiring overhead in VCR based 2D Mesh and CMesh topologies is on account of horizontal connections only (both 2D Mesh and CMesh NoC employ 112 horizontal links). The total wiring overhead in 3D Mesh and CMesh NoC includes both horizontal (96 links) and vertical interconnections (48 links). MML network based Mesh and CMesh topologies use the same number of horizontal connections (96 links) as their 3D counterparts, but fewer vertical interconnections (24 links), because the interleaved TSV based interconnects are formed at unused ports of edge routers only. As TSVs occupy considerable silicon area and metal area, this 50% decrease in the number of TSV links gives the modified multi-layer structure a smaller area.

D. Hardware Overhead

We employ DSENT, a NoC modelling tool, to calculate and compare the router area and power used in our MML NoC structures [24]. A 22nm processor technology operating at 1GHz frequency is assumed in our experimental evaluations. Compared to the 7-port 3D Mesh NoC router, the 5-port router structure used in the MML network based Mesh topology has area and power reductions of 48% and 46% respectively. When compared with the 8-port 3D CMesh NoC router, the 6-port router structure employed in the MML CMesh topology has 39% and 38% reductions in area and power respectively.

VII. CONCLUSION

With technology scaling, low power and cost effective solutions are needed to meet the performance and area requirements associated with on-chip networks. TSV based 3D integration permits stacking of active devices with different topologies and functionalities, with 3D NoC as the interconnect backbone. In this paper, our design approach is based on a 3D NoC structure formed by piling several layers of planar 2D NoC networks, where interlayer communication is made possible by connecting edge routers in an interleaved manner. Thus improved performance is achieved in both Mesh and CMesh based multi layer topologies by using a 2D NoC router architecture that follows an asymmetric routing technique and a reduced number of vertical interconnection links. Simulation results show the merits of our design approach compared to the other designs considered.
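
The router port counts of Section IV and the link counts of Section VI-C can be cross-checked with a short calculation. The following Python sketch is illustrative only and is not taken from the paper: the function names are hypothetical, links are counted as bidirectional router-to-router connections, and the MML vertical link total (24) is copied from the text rather than derived. The final check assumes the 24 links are split evenly over the three pairs of adjoining layers, which is only one possible reading of the interleaved scheme.

```python
def horizontal_links(cols: int, rows: int, layers: int = 1) -> int:
    """Bidirectional in-layer links of a cols x rows mesh, summed over layers."""
    per_layer = rows * (cols - 1) + cols * (rows - 1)
    return per_layer * layers


def full_vertical_links(cols: int, rows: int, layers: int) -> int:
    """Vertical links when every router is connected to the router above it."""
    return cols * rows * (layers - 1)


def free_edge_ports(cols: int, rows: int) -> int:
    """Unused planar ports on the boundary routers of one mesh layer."""
    # Each boundary side leaves one outward-facing port unused per router;
    # corner routers sit on two sides and therefore contribute two ports.
    return 2 * (cols + rows)


def router_ports(concentration: int, vertical: bool) -> int:
    """Ports per router: 4 planar directions, optional up/down, local cores."""
    return 4 + (2 if vertical else 0) + concentration


# Configurations from Section VI: 8 x 8 (2D) and 4 x 4 x 4 (3D and MML).
links_2d = horizontal_links(8, 8)               # 2D Mesh / CMesh
links_3d_h = horizontal_links(4, 4, layers=4)   # 3D and MML, in-layer
links_3d_v = full_vertical_links(4, 4, 4)       # fully connected 3D

MML_VERTICAL_LINKS = 24  # value quoted in Section VI-C, not derived here
tsv_saving = 1 - MML_VERTICAL_LINKS / links_3d_v

# Assumption: links are spread evenly over the 3 adjoining layer pairs.
links_per_layer_pair = MML_VERTICAL_LINKS // (4 - 1)
# A middle layer connects both upwards and downwards.
ports_needed_middle_layer = 2 * links_per_layer_pair

print("2D horizontal links:", links_2d)
print("3D horizontal / vertical links:", links_3d_h, links_3d_v)
print("MML TSV link saving vs. 3D:", tsv_saving)
print("Free edge ports per 4 x 4 layer:", free_edge_ports(4, 4))
print("Ports needed on a middle layer:", ports_needed_middle_layer)
```

Under these assumptions, the free ports available on the edge routers of a 4 × 4 layer are exactly enough for a middle layer to connect to both of its neighbours, so the 24-link figure is consistent with using only vacant edge ports and leaving the 2D router port count (`router_ports(1, False)` for Mesh, `router_ports(2, False)` for CMesh) unchanged.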
Fig. 2. Average latency comparison for different synthetic traffic patterns in various Mesh networks

Fig. 3. Average latency comparison for different synthetic traffic patterns in various CMesh networks

Fig. 4. Average latency comparison for real workloads. (a) Mesh networks (b) CMesh networks

Fig. 5. Throughput comparison for different synthetic traffic patterns. (a) Mesh networks (b) CMesh networks

REFERENCES

[1] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, USA, 2004.
[2] W. Dally, “Route packets, not wires: On-Chip interconnection networks,” in Design Automation Conference, pp. 684-689, New York, ACM Press, June 2001.
[3] W. Dally, “Virtual-channel flow control,” IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 2, pp. 194-205, 1992.
[4] S. Loucif, “Concentration and Its Impact on Mesh and Torus-Based NoC Performance,” in 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, 2015.
[5] A. W. Topol et al., “Three-Dimensional Integrated Circuits,” in IBM J. Research and Development, vol. 50, no. 4/5, 2006.
[6] W. R. Davis et al., “Demystifying 3D ICs: The Pros and Cons of Going Vertical,” in IEEE Design and Test of Computers, vol. 22, no. 6, pp. 498-510, 2005.
[7] V. F. Pavlidis and E. G. Friedman, “3-D Topologies for Networks-on-Chip,” in IEEE-TVLSI, pp. 1081-1090, Oct. 2007.
[8] B. S. Feero and P. P. Pande, “Networks-on-Chip in a Three-Dimensional Environment: A Performance Evaluation,” in IEEE Transactions on Computers, pp. 32-45, 2009.
[9] T. Xu, P. Liljeberg, and H. Tenhunen, “A study of through silicon via impact to 3D network-on-chip design,” in Proc. Conf. Electron. Inf. Eng., pp. 333-337, 2010.
[10] A. Eghbal, P. M. Yaghini, N. Bagherzadeh and M. Khayambashi, “Analytical Fault Tolerance Assessment and Metrics for TSV-Based 3D Network-on-Chip,” in IEEE Transactions on Computers, vol. 64, no. 12, pp. 3591-3604, 2015.
[11] A. Bartzas et al., “Exploration of alternative topologies for application-specific 3d networks-on-chip,” in Proc. Workshop Application Specific Processors (WASP), 2007.
[12] F. Dubois, A. Sheibanyrad, F. Pétrot and M. Bahmani, “Elevator-First: A Deadlock-Free Distributed Routing Algorithm for Vertically Partially Connected 3D-NoCs,” in IEEE Transactions on Computers, vol. 62, no. 3, pp. 609-615, March 2013.
[13] J. Lee and K. Choi, “A deadlock-free routing algorithm requiring no virtual channel on 3D-NoCs with partial vertical connections,” in IEEE/ACM International Symposium on Networks-on-Chip (NoCS), 2013.
[14] S. Foroutan, A. Sheibanyrad and F. Pétrot, “Assignment of Vertical-Links to Routers in Vertically-Partially-Connected 3-D-NoCs,” in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 33, no. 8, pp. 1208-1218, 2014.
[15] H. Ying, K. Hofmann and T. Hollstein, “Dynamic quadrant partitioning adaptive routing algorithm for irregular reduced vertical link density topology 3-Dimensional Network-on-Chips,” in International Conference on High Performance Computing & Simulation (HPCS), 2014.
[16] A. Charif et al., “First-Last: A Cost-Effective Adaptive Routing Solution for TSV-Based Three-Dimensional Networks-on-Chip,” in IEEE Transactions on Computers, vol. 67, no. 10, pp. 1430-1444, 2018.
[17] Y. Fu et al., “Congestion-Aware Dynamic Elevator Assignment for Partially Connected 3D-NoCs,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
[18] M. O. Agyeman et al., “Performance and Energy Aware Inhomogeneous 3D Networks-on-Chip Architecture Generation,” in IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 6, pp. 1756-1769, 2016.
[19] R. G. Kunthara et al., “2L-2D Routing for Buffered Mesh Network-on-Chip,” in VDAT, 2019.
[20] R. G. Kunthara et al., “Asymmetric Routing in 3D NoC using Interleaved Edge Routers,” in NoCArc, 2019.
[21] N. Jiang et al., “A Detailed and Flexible Cycle-Accurate Network-on-Chip Simulator,” in ISPASS, 2013.
[22] “SPEC2006 CPU benchmark suite,” http://www.spec.org.
[23] N. Binkert et al., “The gem5 simulator,” SIGARCH Computer Architecture News, vol. 39, no. 2, pp. 1-7, 2011.
[24] C. Sun et al., “DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic NoC Modeling,” in NOCS, 2012.
