Analysis of Ring Topology For NoC Architecture
Abstract—In recent years, Networks on Chip (NoCs) have provided an efficient solution for interconnecting various heterogeneous intellectual properties (IPs) on a System on Chip (SoC) in an efficient, flexible and scalable manner. Virtual channels in the buffers associated with the cores help in introducing parallelism between packets as well as in improving the performance of the network. However, allocating a uniform buffer size to these channels is not always suitable. The network efficiency can be improved by allocating the buffer variably based on the traffic patterns and the node requirements. In this paper, we use ring topology as the underlying architecture for the NoC. The percentage of packet drops has been used as a parameter for comparing the performance of different architectures. Through the results of simulations carried out in SystemC, we illustrate the impact of including virtual channels and variable buffers on the network performance. As per our results, we observed that varied buffer allocation led to better performance and fairness in the network as compared to uniform allocation.

Keywords—NoC; buffers; virtual channels; ring topology

I. INTRODUCTION

The growing demand for low power and high performance in computation-intensive applications has led to an increase in the number of computing resources on a single chip. Therefore, ICs integrating several heterogeneous resources on a single chip, commonly known as systems on chip (SoC), are becoming increasingly popular. However, such integration on a single chip can often become complex and can hence make the interconnection between different resources and IPs a challenging task [1]. Several approaches have been followed to overcome the complexity of interconnecting different heterogeneous resources. The NoC is one such technological approach, which aims at improving the scalability of and providing high performance to SoC networks. NoCs are preferred over other interconnection methods like dedicated wires and buses due to their better reusability, flexibility and scalability of bandwidth.

Dedicated wires are helpful for systems having a small number of cores. However, as the system complexity increases, the number of wires around each core increases. The use of dedicated wires can hence lead to poor flexibility and make the physical system cumbersome. The use of buses can overcome the inflexibility of dedicated wires, but buses can result in lower throughput since they allow only one communication transaction at a time. This in turn results in increased packet latency. The use of multiple interconnected buses can overcome some of these problems, but the scalability provided by buses is still limited [2].

Buffers and channels are the two major assets of interconnection networks. Conventionally, each channel is associated with a single buffer. This can lead to packet congestion in some channels and in turn bring down the overall throughput of the system. However, virtual channels can overcome this issue by providing a way

978-1-4673-7309-8/15/$31.00 ©2015 IEEE
Authorized licensed use limited to: National University of Singapore. Downloaded on November 07, 2022 at 10:32:42 UTC from IEEE Xplore. Restrictions apply.
2015 Intl. Conference on Computing and Network Communications (CoCoNet'15), Dec. 16-19, 2015, Trivandrum, India
for multiplexing a single physical channel into multiple buffers. By using VCs, multiple packets can share a particular physical channel at a given point of time. The use of VCs enhances the resource allocation of packets and the overall throughput of the network, and reduces the network latency [3].

Although VCs help in reducing the network latency and improving the bandwidth, there is a need for a routing algorithm that helps in reducing the packet loss when some channels are heavily used compared to others. In real-world scenarios, the number of packets in the virtual channels of a physical channel may differ significantly. A particular channel may have considerably fewer packets to deliver than the other channels. Hence, having the same buffer size for each virtual channel, or following the normal packet allocation, may not be desirable [4]. Allocating an appropriate buffer size for each channel can prove helpful in such scenarios; uniform buffer allocation may lead to packet loss when the traffic in different channels is varied.

In this paper, we use ring topology for the interconnection of different IPs in an on-chip network. We demonstrate the importance of using virtual channels over single channels by comparing the performance of both implementations. We modify the design of traditional ring topologies, including bidirectional links and variable buffers, in order to improve the performance of the network. Further, we illustrate the importance of having a fair way of allocating buffers based on the traffic pattern. The performance of the different designs is compared by taking their respective packet losses into account.

Crossbar topologies are more widely used than any other topology as an underlying architecture for NoCs. Crossbars provide low packet latency and a fair way of delivering packets. However, the amount of physical resources used in a crossbar is considerably higher than that used in ring topologies [5]. Our ongoing work aims at improving the performance of ring topologies to make it comparable to that of crossbar topologies.

The rest of the paper is organized as follows: section II discusses the related work done in this area, section III introduces the design and methodology followed in the paper, experiments and results are discussed in section IV, and lastly the paper is concluded in section V.

II. LITERATURE SURVEY

In [6], the authors compared the performance of NoC with traditional point-to-point and bus communication architectures. The area covered by the different architectures and their respective energy consumption were also taken into account. Real-world workloads like video applications were considered for assessing the performance of these architectures. These experiments revealed that the NoC architecture scaled better than the other traditional architectures in terms of energy, area as well as performance.

The authors of [7] discuss the necessity of having a programmable interconnection network for computation-intensive and complex applications. The NoC architecture fulfills the demand for high interconnect bandwidth and for exploiting the parallel processing capacity of multiple computational resources.

In [8], the authors compare the performance of different networks with and without VCs. By conducting experiments on 2D meshes of different sizes, they reveal that the improvement in latency after including VCs is higher for large grids than for small-dimension networks. Also, the packet injection rate increases significantly when VCs are included in an NoC architecture. Overall, the performance of NoC improves with the insertion of virtual channels. Similar results were obtained in [9], wherein the authors carried out simulations to bring to light the enhancement in performance after including VCs. From the results of the simulations it was concluded that virtual channels are ideal for NoCs, especially those with high packet injection rates. The decrease in packet latencies after including VCs came with an equivalent increase in power consumption. It was concluded that an NoC with high packet injection rates should contain more VCs than one with lower injection rates. In the latter case, VCs should be optimized for both leakage and dynamic power consumption.
Figure 1. NoC without virtual channels
Figure 2. NoC with virtual channels
Figure 3. NoC with varied buffer in virtual channels
Figure 4. Bidirectional NoC with virtual channels
Figure 5. Bidirectional NoC with varied buffer in virtual channels
In [9] the authors propose a VC allocation algorithm for efficiently assigning the VCs based on the traffic requirements in a 2D mesh. The NoC architecture following this algorithm performs better than the uniform allocation method in terms of buffer utilization.

A centralized buffer structure has been introduced in [10], which dynamically allocates the number of virtual channels and buffers based on the traffic conditions. The simulations were carried out on a conventional NoC architecture. Various traffic patterns like uniform random, tornado and normal random were used to evaluate the performance of the architecture proposed by the authors. Network latency and buffer utilization were used to compare the performance of the designs with and without the centralized buffer. However, the throughput and the percentage of packet losses were not taken into account in that paper for assessing the proposed modification.

In [12] the authors discuss the need for fair allocation of resources in a network. Different techniques are discussed to compare their efficiency. The min-max approach fits our algorithm because it does not allocate equal quantities of resources to all the switches but instead allocates them based on the traffic requirements. Jain's fairness index is discussed in [15] as a measure of fairness in a system. We have used this measure to evaluate fairness in different routing methods.

III. DESIGN AND METHODOLOGY

The design chosen for the simulations consists of a register ring of size 5. This means register 1 is connected to register 2 through a link, register 2 is connected to register 3 and so on till register 5. Register 5 is connected to register 1, forming a ring of registers. Each register has its own input and output buffers. Each buffer can accommodate a certain number of message packets at a given time. Each message packet includes information like the source core, the destination core and the message to be delivered. These buffers are connected to the cores. Thus each core is associated with a set of input and output buffers and also a register present in the register ring. So if a core wants to send a message to another core, it has to add it to the input buffer connected to it. The message gets transferred only if the buffer has a vacancy; if the input buffer is full the packet gets dropped. Later, the buffer passes the message to the ring register that it is connected to, but only when the register is empty.
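The basic buffering rule above (queue if there is a vacancy, otherwise drop and count the loss) can be sketched as follows. This is a minimal illustration, not the authors' SystemC model; the type and member names are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical sketch of the basic design: a message packet carrying
// source core, destination core and payload, and a fixed-capacity
// input buffer that drops packets when full.
struct Packet {
    int src;          // source core id
    int dst;          // destination core id
    std::string msg;  // message payload
};

class InputBuffer {
public:
    explicit InputBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Returns true if the packet was queued; false means it was dropped.
    bool push(const Packet& p) {
        if (q_.size() >= capacity_) {
            ++drops_;  // buffer full: the packet is lost
            return false;
        }
        q_.push_back(p);
        return true;
    }

    std::size_t drops() const { return drops_; }
    std::size_t size() const { return q_.size(); }

private:
    std::deque<Packet> q_;
    std::size_t capacity_;
    std::size_t drops_ = 0;
};
```

With the paper's buffer size of 3, pushing a fourth packet into an idle buffer would be counted as a drop; the drop counts of the different models are what the later comparison is based on.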
The register will pass the message to the next register connected to it. The message keeps travelling from one register to another till the register assigned to the destination core gets the message. Once the message reaches this register, it transfers it to the output buffer, and the output buffer then transfers the message to the core connected to it. We have considered the size of each buffer to be 3, i.e. each buffer can accommodate 3 message packets at a given time. The method mentioned involves the simplest design and routing. Both can be modified in order to achieve better performance.

One variation on the usual way of having input buffers uses the idea of virtual channels. This means each input buffer has several sections of the same size. Each section stores the packets having a certain core as destination. Earlier, if most of the packets had a specific core, say d, as destination, they would occupy the input buffer of a given core most of the time, leading to the dropping of packets having other cores as destination. Now, the packets meant for other cores are stored in separate sections. So the packets having d as destination have their own section in the input buffer and have to compete among themselves. In this design, the situation where packets with destination d bully the packets with other cores as destination is avoided. But if packets having destination d are large in number, then the number of dropped packets will still be high. So, this idea can only avoid the bullying; it cannot reduce the number of packet drops.

Another proposed idea involves the use of virtual channels, but in a different way. Here the sizes of the assigned sections are not the same. This means the size of the section allotted for packets with destination a need not be the same as the size of the section meant for packets with destination b. This idea is used because if we assign the same size to the sections meant for different destinations, there will be wastage of space. For example, say a very small fraction of packets have a as destination and a large fraction have d as destination. With uniform sections, the size of the section meant for packets with destination a will be the same as that of the section meant to store packets with destination d. As a result, the section for packets with destination a will never be full, the section meant for d will always be full, and packets will be dropped even when the input buffer is not actually full. The problem is simply that the empty space belongs to some other section. According to the new variation, the size assigned to the section meant for a destination a is based on the percentage of packets that have this core as destination. So if the packets with destination d outnumber all other packets, they will be assigned a bigger section. As there is still a separate section in the input buffer for packets having each core as destination, the bullying mentioned above is avoided even now. But along with this, the number of packets dropped can also be reduced, as the size of the section meant for the packets that form the majority will be greater. In this manner, the input buffers can be used optimally.

We have considered the different ways in which we can achieve better performance by modifying the design of the components. Now we can consider the method of routing. The basic method explained above has unidirectional routing. This means the packets will be transferred from one ring register to the other in only one direction. Let us assume the order of passing is from register 1 to register 2 and so on till register 5, and from register 5 back to register 1. Thus if a packet at register 1 has core 5 as destination, it will have to travel to registers 2, 3 and 4, and only then can it reach register 5. If we consider bidirectional routing, the same packet will be just one hop away from its destination. Thus the time required to reach the destination will be less and, as a result, the speed of packet flow also increases. This can lead to a reduction in the packet drop count.

We can only assume that the modifications will lead to better performance. The actual performance also depends on the traffic. We have considered a few traffic patterns and tried different combinations of the previously mentioned modifications to the routing and the design of the system. We have simulated the different traffic patterns with these combinations and observed the packet drop count. The different traffic patterns considered are based on the assumption that there are 5 cores and each core creates packets that consist of a message, a source identifier and a destination identifier. The packets are assumed to be unicast, so a packet will have only one core as destination.
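The hop-count advantage of bidirectional routing described above can be made concrete with a small sketch. For a ring of n registers (here 0-indexed), unidirectional routing always travels forward, while bidirectional routing takes the shorter of the two directions; the function names are ours, not the paper's.

```cpp
#include <algorithm>
#include <cassert>

// Hops needed under unidirectional routing: always travel "forward"
// around the ring from src to dst.
int uni_hops(int src, int dst, int n) {
    return ((dst - src) % n + n) % n;
}

// Hops needed under bidirectional routing: take whichever of the
// forward and backward paths is shorter.
int bi_hops(int src, int dst, int n) {
    int fwd = uni_hops(src, dst, n);
    return std::min(fwd, n - fwd);
}
```

For the example in the text (a 5-register ring, packet at register 1 destined for core 5, i.e. ids 0 and 4), unidirectional routing needs 4 hops while bidirectional routing needs only 1.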
The first traffic pattern considered is as follows. The destination of a packet generated at a core can be any of the other cores with equal probability; the destination is chosen randomly. Here the assumption is that there is no predefined information regarding most of the packets being directed to a specific core [13].

The second pattern is such that the probability of a newly generated packet having a core c as destination is greater than that of any other core. This means most of the packets generated at a core will have a particular node as destination. The destination is again chosen randomly, but a higher weightage is given to one of the cores. This pattern is called the 1-Hotspot traffic pattern.

rate and Jain's Fairness Index. The values of these parameters have been shown below. The cost for the construction of the designs differs. So, even if a design performs better than all the others in a given scenario, the cost incurred in its construction might be higher, as it has to accommodate additional components like buffers, or extra registers in the case of bidirectional routing.

Here unidirectional routing without buffers is represented as model-1, unidirectional routing with uniform buffers as model-2, unidirectional routing with varied buffers as model-3, bidirectional routing with uniform buffers as model-4, and bidirectional routing with varied buffers as model-5.
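The two traffic patterns described above can be sketched as destination generators for the 5-core setup. This is an illustrative sketch: the hotspot probability (0.6 here) is our arbitrary choice, not a value stated in the paper.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Uniform random pattern: the destination is any of the other cores
// with equal probability (the source itself is excluded).
int uniform_dest(int src, int n, std::mt19937& rng) {
    std::uniform_int_distribution<int> pick(0, n - 2);
    int d = pick(rng);
    return d >= src ? d + 1 : d;  // skip over the source core
}

// 1-Hotspot pattern: one core ("hot") is chosen with extra probability
// hot_prob; otherwise we fall back to a uniform pick among the others.
int hotspot_dest(int src, int n, int hot, double hot_prob,
                 std::mt19937& rng) {
    std::bernoulli_distribution is_hot(hot_prob);
    if (hot != src && is_hot(rng)) return hot;
    return uniform_dest(src, n, rng);
}
```

Over many generated packets, the hotspot core receives far more traffic than any other core, which is exactly the skew that makes uniform buffer sections wasteful.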
TABLE I. Random distribution of traffic
Model-5 703 133 18 0.771
Fig 9: Bar graph showing comparison of Jain's fairness index for 1-hotspot distribution
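Jain's fairness index used in the comparison above is J = (Σx)² / (n·Σx²) over the per-flow measurements x₁..xₙ; it equals 1 when all flows are treated equally and falls toward 1/n as one flow dominates. A minimal sketch, applied here to hypothetical per-core delivery counts:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Jain's fairness index: J = (sum x_i)^2 / (n * sum x_i^2).
// J = 1 means perfectly fair; J -> 1/n as one flow dominates.
double jain_index(const std::vector<double>& x) {
    double sum = 0.0, sq = 0.0;
    for (double v : x) {
        sum += v;
        sq += v * v;
    }
    if (sq == 0.0) return 1.0;  // no traffic at all: treat as fair
    return (sum * sum) / (x.size() * sq);
}
```

For example, four cores each delivering 5 packets give J = 1.0, while one core delivering everything gives J = 1/4.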
[6] Lee, H. G., Chang, N., Ogras, U. Y., & Marculescu, R. (2007). On-chip communication architecture exploration. ACM Transactions on Design Automation of Electronic Systems, 12(3).
[7] Kumar, S., Jantsch, A., Soininen, J. P., Forsell, M., Millberg, M., Öberg, J., ... & Hemani, A. (2002). A network on chip architecture and design methodology. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI (pp. 105-112). IEEE.
[8] Mello, A., Tedesco, L., Calazans, N., & Moraes, F. (2005, September). Virtual channels in networks on chip: implementation and evaluation on Hermes NoC. In Proceedings of the 18th Annual Symposium on Integrated Circuits and System Design (pp. 178-183). ACM.
[11] Nicopoulos, C., Park, D., Kim, J., Vijaykrishnan, N., Yousif, M. S., & Das, C. R. (2006, December). ViChaR: A dynamic virtual channel regulator for network-on-chip routers.