
Miniproject

Bidirectional network on chip. A router is a device that connects two or more networks together. A system on chip is an integrated circuit in which all components are combined together.

Uploaded by

yarakalasreeja

A Mini Project Report on

Advanced FIFO Structure For Router In Bi-NoC
Submitted in partial fulfillment for the award of the degree of Bachelor of Technology in
Electronics and Communication Engineering
by

Yarakala Sreeja 21321A04G0


Boya Srija 21321A04G4
Gona Teja Sree 21321A04H4

Under the esteemed Guidance of

Internal Guide

Ms. B ELEENA

Assistant Professor, ECE Department

Bhoj Reddy Engineering College for Women


Department of Electronics and Communication Engineering
(Sponsored by Sangam Laxmibai Vidyapeet, approved by AICTE & affiliated to
JNTUH) Vinaynagar, Santoshnagar X roads, Saidabad, Hyderabad – 500 059
Ph: +91-40-2459 2400 Fax: +91-40-2453 7281, www.brecw.ac.in, [email protected]

2024-25
Acknowledgement

The satisfaction that accompanies the successful completion of the task would be
incomplete without the mention of the people who made it possible, whose constant
guidance and encouragement crown all the efforts with success.

We would like to express our sincere gratitude to Ms. B. Eleena, Assistant
Professor and Project Guide, for her eminent guidance and supervision at every stage.
We are thankful to Ms. S. Manjula, Head of the Department, for her valuable
guidance and encouragement during our project.

We also thank Dr. J. Madhavan, Principal, BRECW, for providing a wonderful
educational environment in our college.

We are equally thankful to Ms. B. Eleena, Project Coordinator, for her continuous
support, and to all the staff of the Electronics and Communication Engineering Department of
BRECW for their timely help and suggestions during the project.

Yarakala Sreeja(21321A04G0) [email protected]


Boya Srija(21321A04G4) [email protected]
Gona Teja Sree(21321A04H4) [email protected]
Table of contents

Particulars Page Number

List of Figures i

Abstract ii

Chapter 1 Introduction 1

1.1 Introduction 1

1.2 Application of router 2

1.3 Historical and technical information 3

1.4 Why would I need a router? 4

Chapter 2 Literature survey 7

2.1 Overview of Network-on-Chip 8

2.2 Background 9

Chapter 3 Router design specification 10

3.1 Overview 10

3.2 Packet Format 10

3.2.1 Packet Header 13

3.2.2 Packet-Payload 14

3.3 Router Input Protocol 14

3.4 Router Output Protocol 14

3.5 Features 17

3.6 Specification 20

Chapter 4 Four Port Router Architecture 20

4.1 Router Architecture 21

4.2 Register Block 22


4.3 Router Controller(FSM) 22

4.4 Router output Block 23

4.5 Design aspects and approach 23


Chapter 5 Results and Discussion 25

5.1 Introduction 25

5.2 Working 25

5.3 Results 26

5.4 Conclusion 28

Chapter 6 Conclusion and Future scope 29

6.1 Conclusion 29

6.2 Future Scope 29


References 30
List of figures

Figure Title Page Number

1.1 OFDM Transmission Process 4


2.1 MATLAB Desktop 11
2.2 MATLAB Editor 13

3.1 MIMO-OFDM 16
3.2 Block Diagram of MIMO-OFDM 20
5.1 Triangular wave 26
5.2 Probability Spectral Density with 26
respect to SNR
5.3 RMSE with respect to SNR 27
5.4 BPSK with AWGN 27
5.5 Probability Spectral Density with 28
respect to SNR

Abstract
Network on chip (NoC) has become a promising solution for the intercommunication infrastructure in
System on Chip (SoC) designs, as traditional methods exhibit severe bottlenecks in communication
among processing elements. However, designing an NoC is complex because of the many issues that
arise in terms of performance metrics such as system scalability, latency, power consumption and
signal integrity. This paper discusses the issues of the memory unit in the router and then proposes
an advanced memory structure. To obtain efficient data transfer, FIFO buffers are implemented in
distributed RAM and virtual channels for an FPGA-based NoC. Advanced FIFO-based memory
units are proposed for the NoC router, and their performance is evaluated in a bidirectional NoC (Bi-NoC).
The major motivation of this paper is to reduce the burden on the router by improving the FIFO internal
structure. To enhance the speed of data transfer, a Bi-NoC with a self-configurable intercommunication
channel is proposed. The simulation and synthesis results show that guaranteed throughput,
predictable latency, and fair network access are provided when compared to recent works.
Keywords: Bi-NoC; FIFO; Virtual Channel; Switch Allocator; Router; SoC.


Chapter 1
Introduction

1.1 Introduction:
A system on chip (SoC) is a complex interconnection of various functional elements. Its bus-based
architecture creates a communication bottleneck in gigabit communication. Thus there was a need
for a system with explicit modularity and parallelism. A network on chip (NoC) possesses many
such attractive properties and solves the problem of the communication bottleneck. It basically works
on the idea of interconnecting cores using an on-chip network.

Communication on a network on chip is carried out by means of routers, so to implement a
better NoC, the router should be efficiently designed. This router supports four parallel connections
at the same time. It uses store-and-forward flow control and FSM-controlled deterministic
routing, which improves the performance of the router. The switching mechanism used here is
packet switching, which is generally used in networks on chip.

In packet switching, data is transferred in the form of packets between cooperating routers,
and an independent routing decision is taken for each packet. The store-and-forward flow mechanism
is preferred here because it does not reserve channels and thus does not lead to idle physical channels.

The arbiter uses a rotating priority scheme so that every channel gets a chance to transfer its
data. In this router both input and output buffering are used so that congestion can be avoided on
both sides.
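
The rotating priority scheme can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog used in the project; the class name, the number of channels and the request pattern are assumptions made for the example.

```python
class RotatingPriorityArbiter:
    """Round-robin arbiter: the channel after the last winner gets highest priority."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        self.next_first = 0  # channel that currently has the highest priority

    def grant(self, requests):
        """Return the index of the granted channel, or None if nobody requests."""
        for offset in range(self.num_channels):
            channel = (self.next_first + offset) % self.num_channels
            if requests[channel]:
                # Rotate: the winner becomes the lowest-priority channel next time.
                self.next_first = (channel + 1) % self.num_channels
                return channel
        return None


# Hypothetical stimulus: channels 0, 1 and 3 request on every cycle.
arbiter = RotatingPriorityArbiter(num_channels=4)
grants = [arbiter.grant([True, True, False, True]) for _ in range(6)]
```

With this request pattern the grants cycle through channels 0, 1 and 3 in turn, so no requesting channel is starved.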

A router is a device that forwards data packets across computer networks. Routers perform the
data "traffic direction" functions on the Internet. A router is a microprocessor-controlled device
that is connected to two or more data lines from different networks.

When a data packet comes in on one of the lines, the router reads the address information in the
packet to determine its ultimate destination. Then, using information in its routing table, it directs
the packet to the next network on its journey.

The router is a "Four Port Network Router": it has one input port from which the packet enters and
three output ports where the packet is driven out. A packet contains three parts: header,
data and frame check sequence. The packet width is 8 bits and the length of the packet can be between
1 byte and 63 bytes. The packet header contains two fields, DA and length.

The destination address (DA) of the packet is 8 bits. The switch drives the packet to the respective
port based on this destination address. Each output port has a unique 8-bit port
address. If the destination address of the packet matches the port address, then the switch drives the
packet to that output port. The length of the data is 8 bits and ranges from 0 to 63.
Length is measured in bytes. The data is organized in bytes and can take any value.
The frame check sequence contains the security check of the packet. It is calculated over the header
and data.
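
The address matching step can be sketched as a simple lookup. The Python below is an illustration only; the port address values and the function name are hypothetical, since the report does not list the actual addresses assigned to the three output ports.

```python
# Hypothetical 8-bit port addresses for the three output ports.
PORT_ADDRESSES = {0: 0x00, 1: 0x01, 2: 0x02}


def select_output_port(destination_address, port_addresses=PORT_ADDRESSES):
    """Return the output port whose address equals the packet's DA, else None."""
    da = destination_address & 0xFF  # DA is an 8-bit field
    for port, address in port_addresses.items():
        if address == da:
            return port
    return None  # no port matches, so the packet is not driven anywhere
```

For example, under these assumed addresses `select_output_port(0x01)` selects port 1, while an address that matches no port returns `None`.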

A data packet is typically passed from router to router through the networks of the Internet until it
gets to its destination computer. Routers also perform other tasks such as translating the data
transmission protocol of the packet to the appropriate protocol of the next network.

1.2 Applications of Router:

When multiple routers are used in interconnected networks, the routers exchange information
about destination addresses, using a dynamic routing protocol. Each router builds up a table
listing the preferred routes between any two systems on the interconnected networks. A router has
interfaces for different physical types of network connections, (such as copper cables, fiber optic,
or wireless transmission).

It also contains firmware for different networking protocol standards. Each network interface
uses this specialized computer software to enable data packets to be forwarded from one protocol
transmission system to another.

Routers may also be used to connect two or more logical groups of computer devices known as
subnets, each with a different sub-network address. The subnet addresses recorded in the router
do not necessarily map directly to the physical interface connections.

1.3 Historical and technical information:

The very first device that had fundamentally the same functionality as a router does today was the
Interface Message Processor (IMP); IMPs were the devices that made up the ARPANET, the first
packet network. The idea for a router (called "gateways" at the time) initially came about through
an international group of computer networking researchers called the International Network
Working Group (INWG).

Set up in 1972 as an informal group to consider the technical issues involved in connecting
different networks, later that year it became a subcommittee of the International Federation for
Information Processing.

These devices were different from most previous packet networks in two ways. First, they
connected dissimilar kinds of networks, such as serial lines and local area networks. Second, they
were connectionless devices, which had no role in assuring that traffic was delivered reliably,
leaving that entirely to the hosts (this particular idea had been previously pioneered in the
CYCLADES network).

The idea was explored in more detail, with the intention to produce a prototype system, as part of
two contemporaneous programs. One was the initial DARPA-initiated program, which created
the TCP/IP architecture in use today.

Sometime after early 1974 the first Xerox routers became operational. The first true IP router was
developed by Virginia Strazisar at BBN, as part of that DARPA-initiated effort, during 1975-
1976. By the end of 1976, three PDP-11-based routers were in service in the experimental
prototype Internet.

The first multiprotocol routers were independently created by staff researchers at MIT and
Stanford in 1981; the Stanford router was done by William Yeager, and the MIT one by Noel
Chiappa; both were also based on PDP-11s.

Virtually all networking now uses TCP/IP, but multiprotocol routers are still manufactured. They
were important in the early stages of the growth of computer networking, when protocols other
than TCP/IP were in use. Modern Internet routers that handle both IPv4 and IPv6 are multiprotocol, but are
simpler devices than routers processing AppleTalk, DECnet, IP, and Xerox protocols.

1.4 Why would I need a router?

Most home users may want to set up a LAN (local area network) or WLAN (wireless LAN) and
connect all computers to the Internet without having to pay for a full broadband subscription
from their ISP for each computer on the network.

In many instances, an ISP will allow you to use a router to connect multiple computers to a
single Internet connection and pay a nominal fee for each additional computer
sharing the connection. This is when home users will want to look at smaller routers, often called
broadband routers, that enable two or more computers to share an Internet connection. Within a
business or organization, you may need to connect multiple computers to the Internet, but also
want to connect multiple private networks. Not all routers are created equal, since their job will
differ slightly from network to network. Additionally, you may look at a piece of hardware and
not even realize it is a router. Broadband or ICS routers will look a bit different depending on the
manufacturer or brand, but wired routers are generally a small box-shaped hardware device with
ports on the front or back into which you plug each computer, along with a port to plug in your
broadband modem. These connection ports allow the router to do its job of routing the data
packets between each of the computers and the data going to and from the Internet.

Depending on the type of modem and Internet connection you have, you could also choose a
router with phone or fax machine ports. A wired Ethernet broadband router will typically have a
built-in Ethernet switch to allow for expansion. These routers also support NAT (network address
translation), which allows all of your computers to share a single IP address on the Internet.
Internet connection sharing routers will also provide users with much needed features such as an
SPI firewall or a DHCP server.

The challenge of verifying a large design is growing exponentially. There is a need to define
new methods that make functional verification easy. Several strategies have been proposed in
recent years to achieve good functional verification with less effort. A recent advancement
towards this goal is the use of verification methodologies. A methodology defines a skeleton over
which one can add flesh and skin, according to the requirements, to achieve functional verification.
This project is aimed at building a reusable test bench for verifying the router protocol in Verilog.
The second part is the verification plan, which specifies the verification requirements and the
approaches to attack the problem; the architecture of the test bench gives a complete description of
the components and subcomponents used to achieve the verification goal.

Chapter 2
Literature Survey

A router is a device that forwards data packets between computer networks, creating an overlay
internetwork. A router is connected to two or more data lines from different networks. When a
data packet comes in on one of the lines, the router reads the address information in the packet to
determine its ultimate destination.

Then, using information in its routing table or routing policy, it directs the packet to the next
network on its journey. Routers perform the "traffic directing" functions on the internet. A data
packet is typically forwarded from one router to another through the networks that constitute the
internetwork until it reaches its destination node.

Routers may also be used to connect two or more logical groups of computer devices known as
subnets, each with a different sub-network address. The subnet addresses recorded in the
router do not necessarily map directly to the physical interface connections. Forwarding an IP
datagram generally requires the router to choose the address and relevant interface of the next-hop
router or (for the final hop) the destination host.

In Transmission Control Protocol/Internet Protocol (TCP/IP) networking, routers are used to
interconnect the hardware and software used on different physical network segments called
subnets. Routers are also used to forward IP packets between each of the subnets. Determine the
physical layout of your network, including the number of routers and subnets you need, before
proceeding with the instructions in this guide.

Routers may provide connectivity within enterprises, between enterprises and the Internet, and
between Internet service providers' (ISPs) networks. The largest routers (such as the Cisco CRS-1 or
Juniper T1600) interconnect the various ISPs, or may be used in large enterprise networks.

Smaller routers usually provide connectivity for typical home and office networks. Other
networking solutions may be provided by a backbone Wireless Distribution System (WDS),
which avoids the costs of introducing networking cables into buildings.

2.1 Overview of Network-on-Chip:

With the growth of computation-intensive applications and the need for low-power, high-performance
systems, the number of computing resources on a single chip has increased enormously,
because current VLSI technology can support such extensive integration of transistors. When
many computing resources such as CPUs, DSPs and specific IPs are added to build a
System-on-Chip, the interconnection between them becomes another challenging issue. In
most System-on-Chip applications, a shared bus interconnection, which needs arbitration logic to
serialize several bus access requests, is adopted to communicate with each integrated processing
unit because of its low cost and simple control characteristics. However, such a shared bus
interconnection has limited scalability because only one master at a time can use
the bus, which means all bus accesses must be serialized by the arbiter. Therefore, in
an environment where the number of bus requesters is large and their required bandwidth
exceeds that of the current bus, other interconnection methods should
be considered.

Such a scalable bandwidth requirement can be satisfied by using an on-chip packet-switched
micro-network of interconnects, generally known as a Network-on-Chip (NoC) architecture. The basic
idea came from traditional large-scale multi-processors and distributed computing networks. The
scalable and modular nature of NoCs and their support for efficient on-chip communication lead
to NoC-based system implementations.
Even though current network technologies are well developed and their supporting features
are excellent, their complicated configurations and implementation complexity make them hard to
adopt as an on-chip interconnection methodology. In order to suit typical SoCs or multi-core
processing environments, the basic modules of network interconnection, such as the switching logic,
the routing algorithm and the packet definition, should be lightweight so that the resulting solutions
are easy to implement.

2.2 Background:

The router used here is designed to avoid congestion and communication bottlenecks. A
number of router implementations have already been reported, and some of the related works are included
here. Marescaux presented the implementation of a router for an NoC-based system with a 2D
torus network topology. The packet size was 8 bits plus 2 control bits. The main drawback was that
the 2D torus was formed using 1D routers, which creates a serious bottleneck in traffic. Zeferino
presented a soft-core router for NoC; the problem with this router implementation was that it uses a
4-flit buffer with an 8-bit implementation, which is quite high.

Its input and output channels have four distinct blocks and use a large decoding logic. Moraes also
presented a router, but its drawback was that its packet has two headers, which is quite
expensive. The buffer there is present only at the input channel. The absence of an output buffer
creates a serious problem in the implementation of the router, as it increases the problem of
congestion.

Our paper removes most of the problems cited above and improves the performance of the router.
The most familiar type of routers are home and small office routers that simply pass data, such as
web pages and email, between the home computers and the owner's cable or DSL modem, which
connects to the Internet through an ISP. More sophisticated routers, however, connect large business
or ISP networks up to the powerful core routers that forward data at high speed along the optical
fiber lines of the Internet backbone.
Mini Project Report Advanced FIFO Structure For Router In Bi-Noc

CHAPTER 3
Router Design Specification
3.1 Overview:

The router implements a packet-based protocol. It drives each incoming packet from the input
port to one of the output ports based on the address contained in the packet. The router has one input port
from which the packet enters and three output ports where the packet is driven out. The router
has an active-low synchronous input, resetn, which resets the router.
[Block diagram: Router_1X3 with inputs clock, resetn, packet_valid, data (8 bits) and read_enb_0 to
read_enb_2, and outputs suspend_data, err, data_out_0 to data_out_2 (8 bits each) and vld_out_0 to
vld_out_2]

Figure 3.1- Block Diagram of Four Port Router

A data packet moves into the input channel of one port of the router, from which it is forwarded to the
output channel of another port. Each input channel and output channel has its own decoding logic,
which increases the performance of the router. Buffers are present at all ports to store the data
temporarily.

The buffering method used here is store and forward. Control logic is present to make arbitration
decisions. Thus communication is established between input and output ports. According to the
destination path of the data packet, the control bit lines of the FSM are set.
The movement of data from source to destination is called the switching mechanism.


3.2.2 Packet – Payload:


Data: The data is organized in bytes and can take any value.
Parity: This field contains the security check of the packet. It should be a byte of even,
bitwise parity, calculated over the header and data bytes of the packet.
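
Even bitwise parity over bytes is equivalent to XOR-ing the header and all data bytes together. The following Python sketch illustrates the calculation; the function names and the example header and payload values are hypothetical and are not taken from the project's test bench.

```python
from functools import reduce


def parity_byte(header, payload):
    """Even bitwise parity: XOR of the header byte and every payload byte."""
    return reduce(lambda acc, byte: acc ^ byte, payload, header) & 0xFF


def build_packet(header, payload):
    """Assemble header, payload and parity into one list of bytes."""
    return [header, *payload, parity_byte(header, payload)]


def has_good_parity(packet):
    """XOR over all bytes, parity included, is zero for an intact packet."""
    return reduce(lambda acc, byte: acc ^ byte, packet, 0) == 0


# Hypothetical example values.
packet = build_packet(0x0C, [0x11, 0x22, 0x33])
corrupted = packet.copy()
corrupted[2] ^= 0x01  # flip one bit in a payload byte
```

Here `has_good_parity(packet)` holds, while the single flipped bit in `corrupted` makes the check fail, which is the condition under which the router would be expected to raise err.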
3.3 Router Input Protocol:
The characteristics of the DUV input protocol are as follows:

[Timing diagram: clock, reset, packet_valid, data, suspend_data and err. After a delay following
reset, each packet is sent on data as the byte sequence H D D D P, where H = Header, D = Data,
P = Parity. The example shows Packet 1 (addr = 0).]

Figure 3.3- Router Input Protocol

• All input signals are active high and are synchronized to the falling edge of the clock. This
is because the DUV router is sensitive to the rising edge of the clock. Therefore, driving input
signals on the falling edge ensures adequate setup and hold time, but the signals can also
be driven on the rising edge of the clock.
• The packet_valid signal has to be asserted on the same clock as when the first byte of a
packet (the header byte) is driven onto the data bus.
• Since the header byte contains the address, this tells the router to which output channel the
packet should be routed (data_out_0, data_out_1, or data_out_2).
• Each subsequent byte of data should be driven on the data bus with each new
rising/falling clock.


• After the last payload byte has been driven, on the next rising/falling clock, the
packet_valid signal must be deasserted, and the packet parity byte should be driven. This signals
packet completion.
• The input data bus value cannot change while the suspend_data signal is active (indicating
a FIFO overflow). The packet driver should not send any more bytes and should hold the
value on the data bus. The width of the suspend_data signal assertion should not exceed 100
cycles.
• The err signal asserts when a packet with bad parity is detected in the router, within 1 to
10 cycles of packet completion.
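
The driver side of this protocol can be sketched as a cycle-by-cycle model. The Python below is a simplified illustration, not the project's test bench: the function name and the `suspend_cycles` stimulus are hypothetical, and the model assumes that a suspend_data assertion seen in one cycle causes the driver to hold the same byte in the following cycle.

```python
def drive_packet(packet, suspend_cycles=()):
    """Return the (packet_valid, data) pair driven on each clock cycle.

    packet         -- [header, payload bytes..., parity]
    suspend_cycles -- cycle numbers on which suspend_data is active (assumed stimulus)
    """
    driven = []
    index = 0   # which byte of the packet is on the bus
    cycle = 0
    while index < len(packet):
        is_parity = index == len(packet) - 1
        # packet_valid is high for header and payload, low for the parity byte.
        driven.append((0 if is_parity else 1, packet[index]))
        if cycle not in suspend_cycles:
            index += 1  # advance only when the router is not suspending
        cycle += 1
    return driven


example = [0x0C, 0x11, 0x22, 0x33, 0x0C]  # hypothetical packet
no_stall = drive_packet(example)
one_stall = drive_packet(example, suspend_cycles={1})
```

With suspend_data active in cycle 1, the byte 0x11 stays on the bus for one extra cycle and the packet takes six cycles instead of five.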
3.4 Router Output Protocol:
The characteristics of the output protocol are as follows:
• All output signals are active high and can be synchronized to the rising/falling edge of the
clock. Thus, the packet receiver will drive and sample data at the rising/falling edge of the
clock. The router will drive and sample data at the rising edge of the clock.
• Each output port data_out_X (data_out_0, data_out_1, data_out_2) is internally buffered
by a FIFO of 1 byte width and 16 location depth.
• The router asserts the vld_out_X (vld_out_0, vld_out_1 or vld_out_2) signal when valid
data appears on the data_out_X (data_out_0, data_out_1 or data_out_2) output bus. This is
a signal to the packet receiver that valid data is available on a particular router output.
• The packet receiver will then wait until it has enough space to hold the bytes of the packet
and then respond with the assertion of the read_enb_X (read_enb_0, read_enb_1 or
read_enb_2) signal, which is an input to the router.
• The read_enb_X (read_enb_0, read_enb_1 or read_enb_2) input signal can be asserted on
the rising/falling clock edge in which data are read from the data_out_X (data_out_0,
data_out_1 or data_out_2) bus.
• As long as the read_enb_X (read_enb_0, read_enb_1 or read_enb_2) signal remains
active, the data_out_X (data_out_0, data_out_1 or data_out_2) bus drives a valid packet
byte on each rising clock edge.


• The packet receiver cannot request the router to suspend data transmission in the middle of
the packet. Therefore, the packet receiver must assert the read_enb_X (read_enb_0,
read_enb_1 or read_enb_2) signal only after it ensures that there is adequate space to hold
the entire packet.
• The read_enb_X (read_enb_0, read_enb_1 or read_enb_2) signal must be asserted within 30
clock cycles of the vld_out_X (vld_out_0, vld_out_1 or vld_out_2) signal being asserted.
Otherwise, there is too much congestion in the packet receiver.
• The DUV data_out_X (data_out_0, data_out_1 or data_out_2) bus must not be tri-stated
(high Z) when the DUV signal vld_out_X (vld_out_0, vld_out_1 or vld_out_2) is asserted
(high) and the input signal read_enb_X (read_enb_0, read_enb_1 or read_enb_2) is also
asserted high.

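[Timing diagram: clock, reset, packet_valid, data, vld_out_0, read_enb_0 and data_out_0. Packets are
sent on data as H D D D P; after vld_out_0 is asserted and a response delay, read_enb_0 is asserted
and the received Packet 1 (addr = 0) appears on data_out_0 as H D D D P.]

Figure 3.4- Router output Protocol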
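
The output-side buffering can be modeled with a small FIFO class. This Python sketch is an illustration under assumptions: the class and method names are hypothetical, and vld_out is modeled simply as "the FIFO is not empty", which the report does not state explicitly.

```python
from collections import deque


class OutputFifo:
    """Behavioral model of one output port buffer: 1 byte wide, 16 locations deep."""

    DEPTH = 16

    def __init__(self):
        self.buffer = deque()

    @property
    def full(self):
        return len(self.buffer) == self.DEPTH

    @property
    def vld_out(self):
        # Assumption: valid data is signalled whenever the FIFO holds a byte.
        return len(self.buffer) > 0

    def write(self, byte):
        """Store one byte; return False (and drop nothing) if the FIFO is full."""
        if self.full:
            return False
        self.buffer.append(byte & 0xFF)
        return True

    def read(self, read_enb):
        """Return the oldest byte when read_enb is asserted, else None."""
        if read_enb and self.buffer:
            return self.buffer.popleft()
        return None


fifo = OutputFifo()
accepted = [fifo.write(value) for value in range(17)]  # one write more than the depth
first_two = [fifo.read(read_enb=1), fifo.read(read_enb=1)]
```

The seventeenth write is refused because all 16 locations are occupied, and bytes are read back in the order they were written.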


3.5 Features:
• Fully synthesizable.
• Reusable design.
• Variable length of transfer word up to 64 bytes.
• HEADER is the first data transfer.
• Rx and Tx on either the rising or the falling clock edge.


• Fully static synchronous design with one clock domain.
• Technology-independent Verilog.

3.6 Specifications:

Input Specifications:

Parameter                      Value
Input ports                    1
Port size                      8-bit
Input                          data packet
Total number of input pins     15

TABLE 3.1

Output Specifications:

Parameter                      Value
Output ports                   4
Each port size                 8-bit
Output                         data packet
Total number of output pins    38

TABLE 3.2

Device:

Parameter      Value
Family         Spartan-3E
Part           XC3S100E
Package        VQ100
Process        Maximum
Speed grade    -5
Frequency      200 MHz

TABLE 3.3


CHAPTER 4
Four Port Router Architecture

4.1 Router Architecture:


The four port router design uses three blocks: an 8-bit register, a router controller and an
output block. The router controller is designed as an FSM, and the output block consists of three
FIFOs. The FIFOs store the packets of data, and the data is read from a FIFO when the receiver
requests it. The router has three 8-bit outputs and one 8-bit data input port, which is used to
drive data into the router. Global clock and reset signals are used, and the err and suspend_data
signals are outputs of the router. The FSM controller provides the err and suspend_data signals;
these functions are discussed in the FSM description below.

Figure- 4.1 Four Port Router Architecture

The router_reg module contains the status, data and parity registers for the Network router_1x3.
These registers are latched to new status or input data through the control signals provided by the
fsm_router.


There are three FIFOs, one for each output port, which store the data coming from the input port based on the
control signals provided by the fsm_router module.
The fsm_router block provides the control signals to the FIFOs and the router_reg module. The router
block diagram is shown in fig…
The router blocks are:
• Register
• Router controller (FSM)
• FIFO output block
4.2 Register Block:

This module contains the status, data and parity registers required by the router. All the registers in this
module are latched on the rising edge of the clock.
The data registers latch the data from the data input based on the state and status control signals, and this
latched data is sent to the FIFO for storage. In addition, data is also latched into the parity
registers for parity calculation, and the result is compared with the parity byte of the packet. An error
signal is generated if the packet parity is not equal to the calculated parity.

Figure 4.2- Register Block


If resetn is low, then the outputs (dout, err, parity_done and low_packet_valid) are low.
The output parity_done is high


• when the input ld_state is high and fifo_full and packet_valid are both low, or when the input
laf_state and the output low_packet_valid are both high and the previous value of parity_done is
low. It is reset to low by the reset_int_reg signal.
• The output low_packet_valid is high when the input ld_state is high and packet_valid is low.
It is reset to low by the reset_int_reg signal.

The first data byte, i.e., the header, is latched inside the internal register first_byte when the detect_add and
packet_valid signals are high, so that it can be latched to the output dout when the lfd_state signal goes
high.
Then the input data, i.e., the payload, is latched to the output dout if the ld_state signal is high and fifo_full is
low.
Then the input data, i.e., the parity, is latched to the output dout if the ld_state signal is high and fifo_full is
low.
The input data is latched to the internal register full_state_byte when ld_state and fifo_full are high;
this full_state_byte data is latched to the output dout when laf_state goes high.
The internal parity register stores the parity calculated for the packet data. When the packet has been transmitted
fully, the internally calculated parity is compared with the parity byte of the packet. An error signal is
generated if the packet parity is not equal to the calculated parity.
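
The latching rules for dout can be sketched as a small behavioral model. This Python class is a simplified illustration rather than the project's router_reg code: the class name and the `clock` method are hypothetical, parity handling is left out, and the priority implied by the if/elif order is an assumption, since the text does not state which condition wins when several control signals are high together.

```python
class RegisterBlock:
    """Simplified model of the dout path of the register block."""

    def __init__(self):
        self.first_byte = 0       # holds the header until lfd_state
        self.full_state_byte = 0  # holds a byte that arrived while the FIFO was full
        self.dout = 0

    def clock(self, data_in, *, packet_valid=0, detect_add=0, lfd_state=0,
              ld_state=0, laf_state=0, fifo_full=0):
        """Apply one rising clock edge and return the new value of dout."""
        if detect_add and packet_valid:
            self.first_byte = data_in         # capture the header
        elif lfd_state:
            self.dout = self.first_byte       # forward the header
        elif ld_state and not fifo_full:
            self.dout = data_in               # forward payload or parity
        elif ld_state and fifo_full:
            self.full_state_byte = data_in    # park the byte while the FIFO is full
        elif laf_state:
            self.dout = self.full_state_byte  # forward the parked byte
        return self.dout


# Hypothetical stimulus walking through each latching rule once.
reg = RegisterBlock()
trace = [
    reg.clock(0x0C, packet_valid=1, detect_add=1),
    reg.clock(0x11, lfd_state=1),
    reg.clock(0x11, ld_state=1),
    reg.clock(0x22, ld_state=1, fifo_full=1),
    reg.clock(0x33, laf_state=1),
]
```

In this trace the header 0x0C reaches dout one step after it is captured, and the byte 0x22 that arrived while the FIFO was full is forwarded only when laf_state goes high.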

Figure 4.3-Register block synchronization


In the above figure, the register block is synchronized with the FSM to latch input data. Here, the clk and
resetn signals are synchronous with the entire module.
E.g.: We give packet data as input and make the read signals (re1, re2, re3) high with respect to
the first input data byte of the packet. The received data is driven to the router controller to
reach its destination port. The block has 11 input pins (data_in[7:0], packet_valid, clk, reset).
E.g.: data_in = 8'b10101010; clk, reset and packet_valid are HIGH.

4.3 Router Controller (FSM):

This module generates all the control signals when a new packet is sent to the router. These control
signals are used by the other modules to send data to the output and to write data into the FIFO.

Figure 4.4- Router Controller Block


Figure -4.5 Router Controller State diagram


The ‘fsm_router’ module is the controller circuit for the router. Its states are described below.
STATE - DECODE_ADDRESS
This is the default state. It waits for the packet_valid assertion. After the packet_valid signal goes
high, if the address is valid and the FIFO for that address is empty (the fifo_empty signal is high),
data can be loaded, so the FSM goes to the next state, LOAD_FIRST_DATA.
If the FIFO is not empty, it goes to WAIT_TILL_EMPTY so that new data is not accepted until the
FIFO is ready.
The output signal detect_add is made high so that the ff_sync module can detect the address of the FIFO
to be used. The detect_add signal is also used by the router_reg module to latch the first byte in an internal
register.
STATE - LOAD_FIRST_DATA
In this state the lfd_state signal is generated, which indicates to the router_reg module that the first data
byte can be latched.
At the same time the suspend_data signal is made high so that the first data byte can be faithfully latched
inside the output data register in the router_reg module. On the next clock edge the state
changes unconditionally to LOAD_DATA.


STATE – WAIT_TILL_EMPTY
In this state no new data is accepted and no data is latched by the router_reg module, so
suspend_data is made high and write_enb_reg is made low. The FSM waits for the fifo_empty
signal; when it goes high, the FSM goes to the LOAD_FIRST_DATA state.
STATE - LOAD_DATA0
In this state data is latched inside the data registers of the router_reg module; the ld_state signal
is generated for router_reg for this purpose. suspend_data is made low so that the router can
accept new data from the input. At the same time the latched data is sent to the fifo, and
write_enb_reg is generated for writing into the present fifo.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE0.
Data is latched as long as packet_valid is asserted. When packet_valid is de-asserted in the
LOAD_DATA0 state, the FSM goes to the LOAD_PARITY0 state, where the last byte (the parity byte) is latched.
STATE – LOAD_PARITY0
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE0; if fifo_full is low, it goes to
CHECK_PARITY_ERROR0.
The lp_state signal is generated for the router_reg module so that the last byte can be latched and
the parity bytes can be compared. suspend_data is made high so that the router does not accept any
new data, and write_enb_reg is made high for latching the last byte.
STATE – FIFO_FULL_STATE0
In this state no new data is accepted and no data is latched, so suspend_data is made high and
write_enb_reg is made low.
When the fifo is no longer full, the laf_state signal is generated for router_reg so that it can
latch the data held back during FIFO_FULL_STATE0. No new data is accepted, so suspend_data is kept
high; the last data is latched in the router_reg module, for which write_enb_reg is made high.
The FSM then checks the parity_done register, which, if high, shows that the LOAD_PARITY0 state has
already passed; in that case it goes to the last state, CHECK_PARITY_ERROR0.


Otherwise it checks the low_packet_valid register, which, if high, shows that packet_valid for the
present packet has been de-asserted. If low_packet_valid is high the FSM goes to the LOAD_PARITY0
state; otherwise it goes back to the LOAD_DATA0 state.
STATE – CHECK_PARITY_ERROR0
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state.
This state changes to the default state DECODE_ADDRESS on the next clock edge.
STATE - LOAD_DATA1
In this state data is latched inside the data registers of the router_reg module; the ld_state signal
is generated for router_reg for this purpose. suspend_data is made low so that the router can
accept new data from the input. At the same time the latched data is sent to the fifo, and
write_enb_reg is generated for writing into the present fifo.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE1.
Data is latched as long as packet_valid is asserted. When packet_valid is de-asserted in the
LOAD_DATA1 state, the FSM goes to the LOAD_PARITY1 state, where the last byte (the parity byte) is latched.
STATE – LOAD_PARITY1
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE1; if fifo_full is low, it goes to
CHECK_PARITY_ERROR1.
The lp_state signal is generated for the router_reg module so that the last byte can be latched and
the parity bytes can be compared. suspend_data is made high so that the router does not accept any
new data, and write_enb_reg is made high for latching the last byte.
STATE – FIFO_FULL_STATE1
In this state no new data is accepted and no data is latched, so suspend_data is made high and
write_enb_reg is made low.
The full_state signal is generated for the router_reg module. This state changes to the
LOAD_AFTER_FULL1 state when fifo_full becomes low.


In the LOAD_AFTER_FULL1 state the laf_state signal is generated for router_reg so that it can latch
the data held back during FIFO_FULL_STATE1. No new data is accepted, so suspend_data is kept high;
the last data is latched in the router_reg module, for which write_enb_reg is made high.
The FSM checks the parity_done register, which, if high, shows that the LOAD_PARITY1 state has
already passed; in that case it goes to the last state, CHECK_PARITY_ERROR1.
Otherwise it checks the low_packet_valid register, which, if high, shows that packet_valid for the
present packet has been de-asserted. If low_packet_valid is high the FSM goes to the LOAD_PARITY1
state; otherwise it goes back to the LOAD_DATA1 state.
STATE – CHECK_PARITY_ERROR1
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state.
This state changes to the default state DECODE_ADDRESS on the next clock edge.
STATE - LOAD_DATA2
In this state data is latched inside the data registers of the router_reg module; the ld_state signal
is generated for router_reg for this purpose. suspend_data is made low so that the router can
accept new data from the input. At the same time the latched data is sent to the fifo, and
write_enb_reg is generated for writing into the present fifo.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE2.
Data is latched as long as packet_valid is asserted. When packet_valid is de-asserted in the
LOAD_DATA2 state, the FSM goes to the LOAD_PARITY2 state, where the last byte (the parity byte) is latched.
STATE – LOAD_PARITY2
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE2; if fifo_full is low, it goes to
CHECK_PARITY_ERROR2.
The lp_state signal is generated for the router_reg module so that the last byte can be latched and
the parity bytes can be compared. suspend_data is made high so that the router does not accept any
new data, and write_enb_reg is made high for latching the last byte.


STATE – FIFO_FULL_STATE2
In this state no new data is accepted and no data is latched, so suspend_data is made high and
write_enb_reg is made low.
The full_state signal is generated for the router_reg module. This state changes to the
LOAD_AFTER_FULL2 state when fifo_full becomes low. In the LOAD_AFTER_FULL2 state the laf_state
signal is generated for router_reg so that it can latch the data held back during FIFO_FULL_STATE2.
No new data is accepted, so suspend_data is kept high; the last data is latched in the router_reg
module, for which write_enb_reg is made high.
The FSM checks the parity_done register, which, if high, shows that the LOAD_PARITY2 state has
already passed; in that case it goes to the last state, CHECK_PARITY_ERROR2.

STATE – CHECK_PARITY_ERROR2
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state. This state
changes to the default state DECODE_ADDRESS on the next clock edge.
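
The transitions described in the state list above can be summarised for a single channel by the following Python sketch. It is a behavioural illustration only, not the Verilog of the project: the per-channel suffixes (0, 1, 2) are dropped, the address is assumed to be valid, only the next-state logic is modelled (no output signals), and the function names next_state and run are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    DECODE_ADDRESS = auto()
    LOAD_FIRST_DATA = auto()
    WAIT_TILL_EMPTY = auto()
    LOAD_DATA = auto()
    LOAD_PARITY = auto()
    FIFO_FULL_STATE = auto()
    LOAD_AFTER_FULL = auto()
    CHECK_PARITY_ERROR = auto()


def next_state(state, packet_valid=False, fifo_empty=False, fifo_full=False,
               parity_done=False, low_packet_valid=False):
    """Next state of one channel of the controller (address assumed valid)."""
    S = State
    if state is S.DECODE_ADDRESS:
        if not packet_valid:
            return S.DECODE_ADDRESS            # keep waiting for a packet
        return S.LOAD_FIRST_DATA if fifo_empty else S.WAIT_TILL_EMPTY
    if state is S.WAIT_TILL_EMPTY:
        return S.LOAD_FIRST_DATA if fifo_empty else S.WAIT_TILL_EMPTY
    if state is S.LOAD_FIRST_DATA:
        return S.LOAD_DATA                     # unconditional on the next edge
    if state is S.LOAD_DATA:
        if fifo_full:
            return S.FIFO_FULL_STATE
        return S.LOAD_DATA if packet_valid else S.LOAD_PARITY
    if state is S.LOAD_PARITY:
        return S.FIFO_FULL_STATE if fifo_full else S.CHECK_PARITY_ERROR
    if state is S.FIFO_FULL_STATE:
        return S.FIFO_FULL_STATE if fifo_full else S.LOAD_AFTER_FULL
    if state is S.LOAD_AFTER_FULL:
        if parity_done:
            return S.CHECK_PARITY_ERROR
        return S.LOAD_PARITY if low_packet_valid else S.LOAD_DATA
    return S.DECODE_ADDRESS                    # from CHECK_PARITY_ERROR


def run(inputs, state=State.DECODE_ADDRESS):
    """Apply one dict of input signals per clock edge and record the states visited."""
    trace = [state]
    for signals in inputs:
        state = next_state(state, **signals)
        trace.append(state)
    return trace
```

For example, a packet with two payload bytes sent to an empty fifo that never fills walks through DECODE_ADDRESS, LOAD_FIRST_DATA, LOAD_DATA, LOAD_PARITY, CHECK_PARITY_ERROR and back to DECODE_ADDRESS.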

Figure-4.6 FSM synchronization block

The FSM block synchronizes the register and fifo modules. Its function is to take data from the
data register and route the latched input data to the respective output based on the header address.
It thereby controls the operation of the whole design, which is why it is called the Router Controller.


It has 17 inputs and 32 outputs. It is designed with 14 states (decode_address, wait_till_empty,
load_data0, load_data1, load_data2, fifo_full0, fifo_full1, fifo_full2, load_parity0, load_parity1,
load_parity2, check_parity0, check_parity1, check_parity2). With these states it independently
decides how the data reaches its destination port.
E.g.: INPUT: data_in from the register block = 8’b10101010; fifo_full0, fifo_full1, fifo_full2 are
LOW and fifo_emp0, fifo_emp1, fifo_emp2 are HIGH.

OUTPUT: data_out1, data_out2, wr_enb1, wr_enb2, suspend_data, err, valid_ch1, valid_ch2 are LOW
and data_out3, wr_enb3, valid_ch3 are HIGH.

4.4 Router Output Block:

There are 3 fifos used in the router design. Each fifo is 8 bits wide and 16 locations deep.
The fifo works on the system clock and has a synchronous reset input, resetn.
If resetn is low then full = 0, empty = 1 and data_out = 0.
The FIFO performs 3 different operations:
• Write operation
• Read operation
• Read and write operation

Figure 4.7 FIFO Block

The functionality of the FIFO is explained below:


Write operation:


In a FIFO write operation, the data on input data_in is sampled at the rising edge of the clock
when input write_enb is high and the fifo is not full. A write takes place only under this
condition.

Read Operation:
In a FIFO read operation, the data is read from output data_out at the rising edge of the clock
when read_enb is high and the fifo is not empty.
Read and write operations can be done simultaneously.
Full – indicates that all the locations inside the fifo have been written.
Empty – indicates that all the locations of the fifo are empty.
The output block of the network router consists of three FIFOs. Each FIFO is 8 bits wide and
16 locations deep. The structure of the output block is shown in the figure below.
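
The write, read and simultaneous read/write behaviour described above can be modelled in Python as follows. This is a behavioural sketch, not the Verilog RTL of the project: the class name Fifo and the method clock are hypothetical, one call to clock stands for one rising clock edge, and the full and empty flags are assumed to be evaluated before the edge.

```python
from collections import deque


class Fifo:
    """Behavioural model of one output fifo: 8 bits wide, 16 locations deep."""

    def __init__(self, depth=16, width=8):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.mem = deque()
        self.data_out = 0

    @property
    def full(self):
        return len(self.mem) == self.depth

    @property
    def empty(self):
        return len(self.mem) == 0

    def clock(self, data_in=0, write_enb=False, read_enb=False, resetn=True):
        """One rising clock edge; returns the value present on data_out afterwards."""
        if not resetn:                       # synchronous reset: full=0, empty=1, data_out=0
            self.mem.clear()
            self.data_out = 0
            return self.data_out
        do_write = write_enb and not self.full    # flags sampled before the edge
        do_read = read_enb and not self.empty
        if do_read:
            self.data_out = self.mem.popleft()
        if do_write:
            self.mem.append(data_in & self.mask)
        return self.data_out


fifo = Fifo()
for byte in range(16):
    fifo.clock(data_in=byte, write_enb=True)
print(fifo.full, fifo.empty)                 # all 16 locations written
```

Writing to a full fifo or reading from an empty one is simply ignored in this model, which mirrors the two conditions stated for the write and read operations.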

Figure-4.8 Synchronization of Output Block with FSM


This module provides synchronization between the fsm and fifo modules. It provides faithful
communication between the single input port and the three output ports.
It detects the address of the channel and latches it as long as packet_valid is asserted; the
address and write_enb_sel are used for latching the incoming data into the fifo of that particular
channel. A fifo_full output signal is generated when the present fifo is full, and a fifo_empty
output signal is generated by the present fifo when it is empty. The output vld_out signal is
generated when empty of the present fifo goes low, which means that the present fifo is ready to be
read (vld_out_0 = ~empty_0, vld_out_1 = ~empty_1, vld_out_2 = ~empty_2).
The write_enb_reg signal, which comes from the fsm, is used to generate the write_enb signal for
the present fifo, which is selected by the present address.
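
The signal relations of this synchronization module can be written down compactly. The Python sketch below is an assumption-laden illustration: it treats the latched address as a number 0 to 2 selecting one of the three fifos, models the logic as purely combinational, and uses the hypothetical function name sync_outputs.

```python
def sync_outputs(address, write_enb_reg, full, empty):
    """Derive the per-fifo signals from the latched address and the fifo flags.

    address       -- index (0..2) of the fifo selected by the packet header
    write_enb_reg -- write request coming from the fsm
    full, empty   -- three-element lists with the flags of fifo 0, 1 and 2
    """
    if address not in (0, 1, 2):
        raise ValueError("address must select one of the three fifos")
    # Only the fifo selected by the address receives the write enable.
    write_enb = [bool(write_enb_reg) and ch == address for ch in range(3)]
    # vld_out_x = ~empty_x: a fifo is ready to be read as soon as it holds data.
    vld_out = [not e for e in empty]
    return {
        "write_enb": write_enb,
        "vld_out": vld_out,
        "fifo_full": full[address],      # flags of the present fifo go back to the fsm
        "fifo_empty": empty[address],
    }


print(sync_outputs(2, True, full=[False, False, False], empty=[True, True, False]))
```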


E.g.: INPUT: clk, reset and read_enb are HIGH, write_enb is LOW and data_in = 8’b10101010.
OUTPUT: full is LOW, empty is HIGH and data_out = 10101010.

4.5 Design aspects and approach:

4.5.1 Register:
It holds 8-bit values. In the Verilog code it should be declared 8 bits wide.

4.5.2 Router Controller:


To design it we need a finite state machine that controls all the signals, and we need states for
that control. There must be states for initializing, for waiting until the fifo is empty, for
loading data to the respective port, for waiting when the fifo of a port is full, for loading the
parity byte, and for the parity calculation. For initializing we use decode_state; for waiting
until empty, wait_till_empty; for loading data, load_data0, load_data1 and load_data2; for checking
whether the fifo is full, fifo_full0, fifo_full1 and fifo_full2; for loading parity, load_parity0,
load_parity1 and load_parity2; and for the parity calculation, check_parity0, check_parity1 and
check_parity2. In total we need 14 states, encoded in 4 bits. A 4-bit encoding gives 16 states,
i.e. the full case; we use only 14 of them, i.e. the parallel case. The states are declared with the
parameter keyword so that a state value cannot change anywhere in the design. This design block is
written in Verilog in the behavioral model.
First we verified whether it resets correctly. When the packet_valid signal is high, the data
(10101010) is driven to the respective output port (fifo2); when packet_valid is made low, the data
is loaded into load parity for the parity calculation. We also verified whether it switches to
another port with another data byte (10101000) after resetting, and also drove some bytes of data
to the port. The testbench is written in Verilog.

4.5.3 FIFO:
It is 8 bits wide and 16 locations deep. The fifo_full and fifo_empty signals report whether the
fifo is full or empty. To track the full or empty status we need an internal counter that counts up
to 16 locations, which means it is 4 bits wide. The input signals are data_in (8-bit), we, re, clk
and resetn, and the output signals are data_out (8-bit), fifo_empty and fifo_full. Data


is written when we is high and the fifo is not full, and is read when re is high and the fifo is not
empty. The RTL code is written in Verilog in the behavioral model.
It is verified by driving 16 bytes of data on data_in with we high, after which fifo_full becomes
high. While it is high, data cannot be written into the fifo. With re high we get on data_out all
16 bytes of data that we had driven; after that fifo_empty goes high and no more data can be read.
We also verified the case in which both the we and re signals are high. The verification code is
written in Verilog.

4.5.4 Top module:


In this module we connect the register, router controller and fifo blocks. For that we instantiate
all the modules using named (.name) port connections and build the design from these blocks:
data_in is driven to the register block, the decision to route data to the respective port is taken
by the router controller, and the three fifos are the output ports. The same clock is given to all
blocks, i.e. the design is synchronous. The code is written in Verilog.

Fig 4.9: Block diagram of topmodule

CDMA Transmitter

The eight-bit input data corresponding to a particular user is converted into serial form by an
eight-bit PISO. The PISO is clocked by Fmaster divided by 15, where Fmaster is 0.5 GHz. The serial
data is then spread by the 15-bit PN code. The PN code generator is clocked by Fmaster. The spread
data of all four users are summed to generate the signal to be transmitted.
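
The transmitter data path described above (PISO, spreading, summation over four users) can be illustrated with the following Python sketch. Several points are assumptions made only for the illustration: the PISO shifts out the most significant bit first, a data bit of 1 sends the PN code and a data bit of 0 sends zeros, each user gets the base code shifted by one more chip, and the 15-chip code PN_CODE as well as all function names are hypothetical.

```python
PN_CODE = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]   # hypothetical 15-chip PN code


def piso(byte):
    """Parallel-in serial-out: 8 bits, most significant bit first (assumed order)."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


def shifted(code, k):
    """Code cyclically shifted by k chips, one distinct shift per user."""
    return code[k:] + code[:k]


def spread(byte, code):
    """Each data bit lasts 15 chips: a 1 sends the code, a 0 sends zeros (assumed)."""
    return [bit & chip for bit in piso(byte) for chip in code]


def transmit(user_bytes, base_code=PN_CODE):
    """Chip-wise sum of the spread data of all users."""
    streams = [spread(b, shifted(base_code, k)) for k, b in enumerate(user_bytes)]
    return [sum(chips) for chips in zip(*streams)]


signal = transmit([0xAA, 0x55, 0x0F, 0xF0])
print(len(signal), max(signal))      # 8 bits x 15 chips per byte
```

Because one data bit spans 15 chips, the chip rate is 15 times the bit rate, which matches clocking the PISO at Fmaster divided by 15 and the PN generator at Fmaster.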


Fig: CDMA Transmitter

CDMA Receiver:

After despreading the received signal with the corresponding code, it is compared with the same
PN code, which is converted into parallel form, using an 8-bit comparator. The comparator uses a
0.33 GHz clock frequency. If the actual transmitted data was high, then the despread output will
be the same as the PN sequence. So the comparison is performed by comparing the despread output
with the PN sequence. If they are the same, it can be concluded that the data sent is high; if
not, the data is low. So the comparator output corresponds to the actual transmitted data of a
particular user. Thus the original data can be reconstructed from the spread output.


Fig: CDMA Receiver

4.5.5 PN sequence generator

Linear feedback shift registers are used for generating PN sequences. D flip-flop components are
used for this, since structural modeling is used. To generate the sequence, it is first necessary
to initialize the flip-flops to a particular value. Since a 15-bit-long PN sequence is being used,
four flip-flops are required, and these four flip-flops must be initialized. For that purpose, init
signals are used. After the initialization, the XOR feedback logic generates the PN sequence.
Orthogonal sequences are required in this system. Time-shifted versions of a PN sequence are nearly
orthogonal. So to shift the sequences, shift registers are used, with the sequence given as input
to the registers. The outputs taken from the intermediate flip-flops are time shifted. So at the
output of the PN generator four PN sequences are obtained.
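
A four-flip-flop LFSR of this kind can be sketched in Python as below. The feedback taps (XOR of the two lowest flip-flops, corresponding to the primitive polynomial x^4 + x + 1) and the initial value 1000 are assumptions chosen for the illustration; the report does not state which taps or init values were used. The function names are hypothetical.

```python
def lfsr_sequence(init=0b1000, length=15):
    """Output of a 4-bit Fibonacci LFSR; any non-zero init gives a period of 15 chips."""
    state = init
    out = []
    for _ in range(length):
        out.append(state & 1)                 # output taken from the last flip-flop
        fb = (state ^ (state >> 1)) & 1       # XOR feedback of the two lowest flip-flops
        state = (state >> 1) | (fb << 3)      # shift right, feedback enters the first flip-flop
    return out


def shifted_codes(seq, count=4):
    """Time-shifted copies of the sequence, as tapped from successive shift-register stages."""
    return [seq[k:] + seq[:k] for k in range(count)]


pn = lfsr_sequence()
for code in shifted_codes(pn):
    print("".join(map(str, code)))
```

With four flip-flops the register passes through all 15 non-zero states before repeating, which is why the sequence length is 15; the all-zero state must be avoided, hence the init signals.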


Fig.2 Typical structure of Bi-directional NoC


The CDMA router has M transmit/receive ports. The main difference between the overloaded
and classical CDMA routers is that M > N −1 for the former due to channel overloading. Each
PE is connected to two network interfaces (NIs), transmit and receive NI modules.
During packet transmission from a PE, the packet is divided into flits to be stored in the
transmit NI first-in first-out (FIFO) buffer. The router arbiter then selects at most M winning
flits from the top of the NI FIFOs to be transmitted during the current transaction. The selected
flits must all have exclusive destination addresses to prevent conflicts, and a winner from two
conflicting flits is selected according to the router’s priority scheme. The employed priority
scheme is the fixed winner-takes-all scheme; only one of the transmitters is given a
spreading code and is acknowledged to start encoding. Once done, the router assigns CDMA
codes to each transmit and receive NI. NIs with empty FIFOs or conflicting destinations are
assigned all-zero CDMA codes such that they do not contribute MAI to the CDMA channel
sum. Afterward, flits from each NI are spread by the CDMA codes in the encoder module.
The data are spread into N chips, where N is the CDMA code length that equals the number of
clock cycles in a single crossbar transaction. Spread data chips from all encoders are summed by
the CDMA crossbar adder and the sum is sent out serially to all decoders. The encoding/decoding
process lasts for N clock cycles synchronized via a counter. At each decoder, the assigned code
is cross correlated with the received sum to decode the data from the summed chips. The
decoded flits are stored in the receive NI FIFOs until they are read by the PEs.
In this paper, we focus on the high-level architecture and implementation details of the
overloaded CDMA crossbar represented by the gray block in Fig. 1(a). A store and forward flow
control and a deterministic routing


algorithm are employed in the OCI router. The routing algorithm lies at the network layer,
which is a higher layer than the physical layer containing the crossbar switch. According to the
OSI model design principles, each layer of the model exists as an independent layer.
Theoretically, one can substitute one protocol for another at any given layer without affecting
the operation of layers above or below. Thus, using the same flow control protocol and routing
algorithm enables comparing the OCI-based router with SDMA- and TDMA-based routers.

A. OCI Crossbar High-Level Architecture
The main objective of this paper is increasing the number of ports sharing the ordinary CDMA
crossbar presented, while keeping the system complexity unchanged using simple encoding circuitry
and relying on the accumulator decoder with minimal changes. To achieve this goal, some
modifications to the classical CDMA crossbar are advanced. Fig. 2 depicts the high-level
architecture of the OCI crossbar for a single-bit interconnection. The same architecture is
replicated for a multibit CDMA router. M TX-RX ports share the CDMA router, where spread data from
the transmit ports are added using an arithmetic binary adder having M binary inputs and an m-bit
output, where m = ⌈log2 M⌉. The adder is implemented in both the reference and pipelined
architectures.
A controller block is used for code assignment and arbitration tasks. Each PE is interfaced to an
encoder/decoder wrapper enabling data spreading/despreading. Unlike orthogonal spreading codes,
which are XORed with the binary data bit, an AND gate is utilized to spread data using
nonorthogonal spreading codes.
The AND gate encoder works as follows: if the transmitted data bit is “0,” it sends a stream of
zeros during the whole spreading cycle, which does not cause MAI on the channel; if the
transmitted data bit is “1,” the encoder sends a nonorthogonal spreading code. Therefore, the
additional MAI spreading code will either contribute an MAI value of one or zero each clock
cycle because the encoder is an AND gate.
The XOR encoder of the ordinary CDMA crossbar cannot be used to encode the OCI codes
because it only complements the spreading code chips, so an XOR gate will cause MAI to the
crossbar whether the data bit is “0” or “1.”
A hybrid encoder is developed for both orthogonal and nonorthogonal spreading with an XOR
gate, an AND gate, and a multiplexer unit, as shown in Fig. 2. Two decoder types



are implemented for orthogonal and nonorthogonal data. More details about each component of
the OCI crossbar will be presented.
B. OCI Code Design
The Walsh–Hadamard spreading code family has a featured property that enables CDMA
interconnect overloading. The difference between any consecutive channel sums of data spread
by the orthogonal spreading codes for an odd number of TX-RX pairs M is always even,
regardless of the spread data. This property means that for the N − 1 TX-RX pairs using the
Walsh orthogonal codes, one can encode additional N − 1 data bits in consecutive differences
between the N chips composing the orthogonal code. Thus, exploiting this property enables
adding 100% nonorthogonal spreading codes, which can double the capacity of the ordinary
CDMA crossbar.

In this section, the code design methodology, mathematical foundations, and the decoding
details of the OCI codes are provided. The notations used throughout this paper are listed. An
AND gate encoder is used to encode data with nonorthogonal spreading codes as shown in Fig.
2(a). Therefore, for a nonorthogonal encoder, if data to transmit are one, a single spreading chip
at a specific time slot in the spreading cycle is added to the channel sum, which causes the
consecutive sum difference to deviate. The nonorthogonal codes imitate the TDMA signaling
scheme as each code is composed of a single chip of “1” sent in a specific time slot.

The encoding/decoding scheme presented in this paper provides a novel approach that enables
coexistence between CDMA and TDMA signals in the same shared medium. Therefore, the
developed encoder is called TDMA overloaded on CDMA interconnect (T-OCI). Fig. 3 shows
an encoding/decoding example of two T-OCI codes for a spreading code of length N = 8. An
odd number of orthogonal codes must be used simultaneously to preserve the even difference
property of Walsh codes.
where S is the N-cycle waveform of the channel sum, dC(j) is the orthogonal CDMA data bit
sent by the jth user, dT(j) is the nonorthogonal TDMA data bit sent by the jth user, Co(j) is the
orthogonal code assigned to the jth user, and T(j−N+1) is the TDMA code assigned to the jth
user. The TDMA code T(i) is a single chip of “1” assigned at the ith time slot.
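
The symbols defined above describe a channel sum with an orthogonal term and a TDMA term. A form consistent with these definitions is given below as a reconstruction, under the assumptions that orthogonal data are XOR-spread, that TDMA data are AND-spread (a product of bit and chip), and that users 1 to N − 1 are orthogonal while users N to 2N − 2 are overloaded:

```latex
S \;=\; \sum_{j=1}^{N-1} \bigl( d_C(j) \oplus C_o(j) \bigr)
   \;+\; \sum_{j=N}^{2N-2} d_T(j)\, T(j-N+1)
```

The first sum is the orthogonal spread data and the second is the sum of products of TDMA chips and their data bits.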



The TDMA term of the equation is the sum of products of TDMA chips and their corresponding
data bits. This term can be viewed as another N-chip spreading code added to the orthogonal
spread data represented by the first term of the equation. It should be indicated that the first chip
of the TDMA MAI code is always set to zero (T (1) = 0), and the remaining N − 1 chips are
assigned according to the encoded data bits; this note is the key to properly decode both
orthogonal and nonorthogonal spread data.
C. OCI Crossbar Building Blocks
Two variants are realized for each OCI crossbar, reference and pipelined architectures. The
pipelined architecture is implemented to increase the crossbar operating frequency, and
consequently, bandwidth by adding nonfunctional pipelining registers to reduce the crossbar
critical path. The OCI crossbar shown in Fig. 2 is basically composed of three main building
blocks: 1) the encoder wrappers; 2) the decoder wrappers; and 3) the crossbar adder blocks,
which are described in the following.

1) Crossbar Controller: At the beginning of each crossbar transaction, the controller assigns
spreading codes to different encoders. The assignment of orthogonal despreading codes to
receive ports is fixed, i.e., does not change between the crossbar transactions. Therefore, for a
router port to initiate the communication with the receive port it addresses, its encoder must be
assigned a spreading code that matches the destined decoder. If two different ports request to
address the same decoder, the controller allows one access and suspends the other according to a
predefined arbitration scheme. This code assignment scheme is called receiver-based protocol.
In this paper, a static allocation scheme that allocates fixed spreading codes to all encoders is
used. To interconnect a large number of PEs, a torus, star, or hybrid NoC topology can be
realized where the assignment of spreading codes is local to each router. Consequently, each
new packet arriving at a router is assigned a spreading code corresponding to its exit port
decoder. The crossbar controller issues handshake signals to the transmit and receive ports with
matching spreading codes to enable the transmitter encoders and receiver decoders.

2) Hybrid Encoder:
The encoder is hybrid: it can encode both orthogonal and nonorthogonal data. A transmitted data
bit is XORed/ANDed with the spreading code to produce the orthogonal/nonorthogonal spread
data, respectively. A multiplexer chooses


between the orthogonal and nonorthogonal inputs according to the code type assigned to the
encoder as depicted by Fig. 2(a). The encoder is replicated N times for the P-OCI crossbar.
3) Crossbar Adder: For a spreading code set of length N, the number of crossbar TX-RX ports is
equal to M =2(N − 1). In the T-OCI crossbar, sending a “1” chip to the adder is mutually
exclusive between nonorthogonal transmit ports according to the T-OCI encoding scheme. This
indicates that among the 2(N−1) inputs to the adder, there are guaranteed (N − 2) zeros, while
the maximum number of “1” chips is N. Therefore, a multiplexer is instantiated to select only a
single input of the nonorthogonal TDMA encoded data bits and discard the remaining bits that
are guaranteed to be “0.” Thus, the adder has only N-bit inputs, N−1 from orthogonal encoders,
and 1 from the multiplexer, as shown in Fig. 2(d). The sum produced by the adder circuit needs
(log2 N) wires. The number of needed stages of registers to pipeline the adder is (log2 N), as
depicted in Fig. 2(d). N replicas of the crossbar adder are instantiated for the parallel encoding
adopted in the P-OCI crossbar.
4) Custom Decoder: There are four decoder types for different CDMA decoding techniques: the
orthogonal T-OCI and P-OCI decoders and the overloaded T-OCI and P-OCI decoders. The
orthogonal T-OCI decoder is an accumulator implementation of the correlator receiver. N − 1
accumulator decoders are instantiated in all CDMA crossbar types for orthogonal data
despreading. Instead of implementing two different accumulators (the zero and one
accumulator), an up–down accumulator is implemented and the accumulated result is the
difference between the two accumulators of the conventional CDMA decoder as shown in Fig.
2(f). The accumulator adds or subtracts the crossbar sum values according to the despreading
code chip and resets every N cycles. The sign bit of the accumulated value directly indicates the
decoded data bit, where the positive sign is decoded as “1,” while the negative sign is decoded
as “0.” The P-OCI orthogonal decoder shown in Fig. 2(e) differs from the T-OCI orthogonal
decoder in receiving the adder sum values concurrently not sequentially; therefore, the
accumulator loop is unrolled into a parallel adder.
The T-OCI overloaded decoder depicted in Fig. 2(b) is composed of a 2-bit register to store the
LSBs of two sum values, first of which is S(0) and the second is S( j − N + 1), where j is the
number of the T-OCI decoders (N ≤ j ≤ 2N − 2). The two bits are fed to the XOR gate, which
decodes nonorthogonal spread data. The T-OCI decoder is replicated N times to implement the
P-OCI decoder of Fig. 2(c). The 2-bit register is not needed anymore because the S(0) and
S( j−N+1) values exist in the same cycle. The T-OCI and P-OCI crossbar architectures contain
(N − 1) orthogonal decoders and (N − 1) overloaded decoders.
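
The building blocks above can be put together in a small end-to-end model of the serial (T-OCI) crossbar. This Python sketch is illustrative only and rests on several assumptions: Walsh codes are the non-zero rows of the Walsh-Hadamard matrix in 0/1 form, time slots are numbered from 0 with slot 0 carrying no TDMA chip, the overloaded user k sends its chip in slot k, and an accumulated value of zero is treated as non-negative (sign bit clear) and decoded as 1. The function names are hypothetical.

```python
def walsh(n):
    """Rows of the n x n Walsh-Hadamard matrix in binary (0/1) form; n is a power of two."""
    return [[bin(r & i).count("1") & 1 for i in range(n)] for r in range(n)]


def encode(d_c, d_t, n):
    """Channel sums for N-1 orthogonal bits d_c (XOR-spread) and N-1 TDMA bits d_t."""
    codes = walsh(n)[1:]
    s = [sum(d ^ code[i] for d, code in zip(d_c, codes)) for i in range(n)]
    for slot, bit in enumerate(d_t, start=1):     # AND encoder: a 1 adds one chip in its slot
        s[slot] += bit
    return s


def decode_orthogonal(s, code):
    """Up-down accumulator: add where the code chip is 0, subtract where it is 1."""
    acc = 0
    for value, chip in zip(s, code):
        acc += value if chip == 0 else -value
    return 1 if acc >= 0 else 0                   # zero is taken as non-negative (assumption)


def decode_overloaded(s, slot):
    """XOR of the LSBs of S(0) and S(slot) recovers the TDMA bit of that slot."""
    return (s[0] ^ s[slot]) & 1


def decode(s, n):
    codes = walsh(n)[1:]
    d_c = [decode_orthogonal(s, code) for code in codes]
    d_t = [decode_overloaded(s, slot) for slot in range(1, n)]
    return d_c, d_t


N = 8
sent_c = [1, 0, 1, 1, 0, 0, 1]
sent_t = [0, 1, 1, 0, 1, 0, 0]
print(decode(encode(sent_c, sent_t, N), N) == (sent_c, sent_t))
```

With N = 8 the model carries 2(N − 1) = 14 bits per spreading cycle, seven through the orthogonal codes and seven through the overloaded slots.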


Fig.3 Structure of Bi-NoC with virtual channel allocation

If a NoC’s router has a larger FIFO buffer, the throughput will be larger and the latency in the
network smaller, since it will have fewer flits stagnant on the network [20]. Nevertheless, there
is a limit on the increase of the FIFO depth. Since each communication will have its
peculiarities, sizing the FIFO for the worst case communication scenario will compromise not
only the routing area, but power as well [6]. However, if the router has a small FIFO depth, the
latency will be larger, and quality of service (QoS) can be compromised. The proposed solution
is to have a heterogeneous router, in which each channel can have a different buffer size. In this
situation, if a channel has a communication rate smaller than its neighbour, it may lend some of
its buffer slots that are not being used. In a different communication pattern, the roles may be
reversed or changed at run time, without a redesign step. The proposed architecture is able to
sustain performance due to the fact that, statistically, not all buffers are used all the time. In our
architecture it is possible to dynamically reconfigure different buffer depths for each channel. A
channel can lend part or the whole of its buffer slots in accordance with the requirements of the
neighbouring buffers. To reduce connection costs, each channel may only use the available
buffer slots of its right and left neighbour channels. This way, each channel may have up to
three times more buffer slots than its original buffer with the size defined at design time. Fig. 4
shows the original and proposed input FIFO. Comparing the two architectures, the new proposal
uses more multiplexers to allow the reconfiguration process. Fig. 4(b) presents the South
Channel as an example
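
The buffer-lending rule described above can be expressed as a small calculation. The Python sketch below assumes that all channels have the same design-time depth and that a neighbour lends exactly the slots it is not currently using; the function names are hypothetical.

```python
def lendable(depth, occupancy):
    """Slots a neighbour channel can lend: whatever it is not using itself."""
    return max(depth - occupancy, 0)


def effective_depth(depth, left_occupancy, right_occupancy):
    """Own buffer plus the free slots of the left and right neighbour channels."""
    return depth + lendable(depth, left_occupancy) + lendable(depth, right_occupancy)


print(effective_depth(4, 0, 0))    # both neighbours idle
print(effective_depth(4, 4, 4))    # both neighbours busy
```

With both neighbours idle a channel reaches three times its design-time depth, and with both neighbours fully occupied it falls back to its own buffer.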


thereby increasing throughput and avoiding deadlock. The flow through the virtual channels of a
router, from input port to output port, is shown in Fig. 3. An incoming high-priority flit
arriving from the neighbour router is first assigned an appropriate virtual channel; the rest of
the data packet is processed afterwards. The first flit of a data packet is the head flit, which
reaches the top of the virtual channel queue in the buffer and then enters the RC stage. The RC
stage decodes the head flit and generates a direction request towards the destination router.
This direction request is passed to the VA stage, which selects a virtual channel towards the
destination router. Contention may occur among data packets whose direction requests towards
the destination router target the same virtual channel. Data packets that are not granted a
virtual channel wait in the VA stage and start transferring once the current flit has reached the
next router, which avoids contention failures. Because all virtual channels are multiplexed onto
one buffer queue, a blocked flit cannot block other data packets that are able to route through
the physical channel. In a typical NoC structure, routers communicate through unidirectional
channels, whereas in Bi-NoC data can be transferred in either direction on any channel, which
improves bandwidth utilization. To configure the channels dynamically, a channel control
module is added to each directional channel. Because the proposed design can use each channel
as either an input or an output, the width of the channel request from the RC stage is doubled.
Two bi-directional channels can be requested for data transfer in each output direction, which
decreases contention by allowing data packets to be sent in the same direction simultaneously.
Hence, the channel control module has two functions: dynamic configuration of the channel and
maintenance of the channel requests. Because a bi-directional channel is shared by a pair of
neighbouring routers, each output transition is authorized by the channel control protocol of the
two routers. The channel control protocol is implemented as an FSM module for higher
efficiency. The other responsibility is tracking whether the channel is blocked or unblocked,
which depends on the status of the channel. When the channel is available, the arbiter sends the
request to the SA module to process the channel allocation [16]. The highlight of this structure
is the replacement of unidirectional channels with bi-directional ones, which enhances channel
utilization and flexibility without requiring additional bandwidth.
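
As an illustration of how a channel control module might arbitrate the direction of one shared link, the Python sketch below models a handshake between two neighbouring routers. The function name next_owner, the fixed-priority tie-break, and the "A"/"B" labels are assumptions made for illustration only; the states of the actual channel control FSM are not specified in this section.

```python
def next_owner(current_owner, request_a, request_b, high_priority="A"):
    """Return which router ('A' or 'B') drives the shared channel next cycle."""
    if request_a and request_b:
        # Both routers want to send: the high-priority side wins.
        return high_priority
    if request_a:
        return "A"
    if request_b:
        return "B"
    # Nobody is requesting: keep the current direction.
    return current_owner


owner = "A"
for req_a, req_b in [(True, False), (False, True), (True, True), (False, False)]:
    owner = next_owner(owner, req_a, req_b)
    print(req_a, req_b, "->", owner)
```

A fixed priority is the simplest way to guarantee that the two routers never drive the link at the same time; a real design would also need to drain flits in flight before reversing the direction.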


CHAPTER 5

SOFTWARE AND HARDWARE COMPONENTS


5.1 Software used for the Project:

5.1.1 VERILOG:
In the semiconductor and electronic design industry, Verilog is a hardware description language
(HDL) used to model electronic systems. Verilog HDL is most commonly used in the design,
verification, and implementation of digital logic chips at the register-transfer level of
abstraction. It is also used in the verification of analog and mixed-signal circuits.

Overview :

Hardware description languages such as Verilog differ from software programming languages
because they include ways of describing the propagation of time and signal dependencies
(sensitivity). There are two assignment operators, a blocking assignment (=), and a non-blocking
(<=) assignment. The non-blocking assignment allows designers to describe a state-machine
update without needing to declare and use temporary storage variables. Since these concepts are
part of Verilog's language semantics, designers could quickly write descriptions of large circuits
in a relatively compact and concise form. At the time of Verilog's introduction (1984), Verilog
represented a tremendous productivity improvement for circuit designers who were already
using graphical schematic capture software and specially written software programs to
document and simulate electronic circuits.
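
The difference between the two assignment operators can be mimicked in Python. The sketch below is only an analogy, assuming two registers a and b updated in a single clocked block; the function names are hypothetical and the code is not Verilog.

```python
def blocking_swap(regs):
    """Model of `a = b; b = a;` inside one clocked block."""
    regs = dict(regs)
    regs["a"] = regs["b"]  # takes effect immediately
    regs["b"] = regs["a"]  # reads the already-updated a
    return regs


def nonblocking_swap(regs):
    """Model of `a <= b; b <= a;` inside one clocked block."""
    # All right-hand sides are sampled first ...
    next_a, next_b = regs["b"], regs["a"]
    # ... and the registers are updated together afterwards.
    return {**regs, "a": next_a, "b": next_b}


start = {"a": 1, "b": 2}
print(blocking_swap(start))
print(nonblocking_swap(start))
```

With blocking semantics both registers end up holding the old value of b, whereas the non-blocking version swaps them without a temporary variable, which is the property the paragraph above refers to.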

The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Like C, Verilog
is case-sensitive and has a basic preprocessor (though less sophisticated than that of ANSI C/C++).
Its flow keywords (if/else, for, while, case, etc.) are equivalent, and its operator precedence is
compatible. Syntactic differences include variable declaration (Verilog requires bit-widths on
net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and
many other minor differences.


A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and
communicate with other modules through a set of declared input, output, and bidirectional ports.
Internally, a module can contain any combination of the following: net/variable declarations
(wire, reg, integer, etc.), concurrent and sequential statement blocks, and instances of other
modules (sub-hierarchies). Sequential statements are placed inside a begin/end block and
executed in sequential order within the block. However, the blocks themselves are executed
concurrently, making Verilog a dataflow language.

Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and
strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where
multiple sources drive a common net. When a wire has multiple drivers, the wire's (readable)
value is resolved by a function of the source drivers and their strengths.
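
How a multiply-driven wire is resolved can be sketched in Python for the simplified case where all drivers have the same strength. The function name resolve and the single-character encoding ("0", "1", "x" for undefined, "z" for floating) are assumptions of this sketch; drive strengths are ignored.

```python
def resolve(drivers):
    """Resolve the value of a wire from its drivers (equal strengths assumed)."""
    result = "z"  # an undriven wire floats
    for value in drivers:
        if value not in ("0", "1", "x", "z"):
            raise ValueError(f"not a 4-state value: {value!r}")
        if value == "z":
            continue  # a floating driver does not influence the wire
        if result == "z":
            result = value  # first active driver sets the value
        elif result != value:
            result = "x"  # conflicting or unknown drivers give undefined
    return result


print(resolve(["z", "1"]))
print(resolve(["0", "1"]))
```

A tri-state bus behaves like the first call: every driver except one is floating, so the wire takes the value of the single active driver.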

A subset of statements in the Verilog language is synthesizable. Verilog modules that conform
to a synthesizable coding style, known as RTL (register-transfer level), can be physically
realized by synthesis software. Synthesis software algorithmically transforms the (abstract)
Verilog source into a netlist, a logically equivalent description consisting only of elementary
logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI
technology. Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint
(such as a photo mask set for an ASIC or a bit stream file for an FPGA).

File extension for VERILOG program / code: .v

5.1.2 Introduction to Xilinx-ISE:


Xilinx ISE (Integrated Software Environment) is a software tool produced by Xilinx for
synthesis and analysis of HDL designs, enabling the developer to synthesize ("compile") their
designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to
different stimuli, and configure the target device with the programmer. The ISE® Design Suite
is the Xilinx® design environment, which allows you to take your design from design entry to
Xilinx device programming. With specific editions for logic, embedded processor, or Digital
Signal Processing (DSP) system designers, the ISE Design Suite provides an environment
tailored to meet your specific design needs.

Achieve Greater Designer Productivity
The idea of system design is to integrate different domains to better leverage resources within
the FPGA. With applications that rely more and more on Digital Signal Processing (DSP)
functions, significant performance benefits are possible through the ISE Design Suite
System Edition, which enables the use of accelerators that can be leveraged by the processor.
In fact, one of the big benefits of FPGA-specific system design is the ability to perform system
partitioning, managing the trade-offs between what is implemented in hardware and what is
implemented in software. For many users, algorithms are not necessarily optimized for low-level
HDL languages.

Attain Breakthrough Performance, Power and Cost Benefits

Create your own custom processing platform while reducing your system cost by consolidating
external functions into an FPGA. Select the perfect balance of features and size for your system,
and optimize hardware/software design trade-offs for the best price-performance results that
meet your exacting requirements.

Focus on Design Differentiation

The ISE Design Suite System Edition provides a comprehensive suite of integrated development
environments, software tools, configuration wizards, and IP that facilitates your design and
uses all of the flexibility offered by a programmable platform. The Xilinx CORE Generator™
System, included in all editions of the ISE Design Suite, accelerates design time by providing
access to highly parameterized Intellectual Property (IP) for Xilinx FPGAs. The available
user-customizable IP functions range in complexity from
commonly used functions, such as memories and FIFOs, to system-level building blocks, such
as filters and transforms. Using these IP blocks can save days to months of design time. The
highly optimized IP allows FPGA designers to focus their efforts on building designs more
quickly while helping bring products to market faster.

A. ISE Design Suite: Logic Edition


The ISE Design Suite Logic Edition allows you to go from design entry, through
implementation and verification, to device programming from within the unified environment of
the ISE Project Navigator or from the command line. This edition includes exclusive tools and
technologies to help achieve optimal design results, including the following:

• PlanAhead™ software - allows you to do advanced FPGA floor planning. The
PlanAhead software includes PinAhead, an environment designed to help you to import
or create the initial I/O Port list, group the related ports into separate folders called
“Interfaces” and assign them to package pins. PinAhead supports fully automatic pin
placement or semi-automated interactive modes to allow controlled I/O Port assignment.
With early, intelligent decisions in FPGA I/O assignments, you can more easily optimize
the connectivity between the PCB and FPGA.
• CORE Generator™ software - provides an extensive library of Xilinx LogiCORE™
IP from basic elements to complex system-level IP cores.
• SmartGuide™ technology - allows you to use results from a previous implementation
to guide the next implementation for faster incremental implementation.
• ChipScope™ Pro tool - assists with in-circuit verification.

B. ISE Design Suite: Embedded Edition

The ISE Design Suite Embedded Edition includes all the tools and capabilities of the Logic
Edition with the added capabilities of the Embedded Development Kit (EDK). This pre-configured
kit is an integrated software solution for designing embedded processing systems,
which includes the Platform Studio tool suite as well as all the documentation and IP required
for designing Xilinx Platform FPGAs with embedded PowerPC® hard processor cores and
MicroBlaze™ soft processor cores. This edition provides an integrated development
environment of embedded processing tools, processor cores, IP, software libraries, and design
generators, including the following:

• Xilinx Platform Studio (XPS) - provides an integrated environment for creating
software and hardware specification flows for embedded processor systems based on
MicroBlaze and PowerPC processors. It also provides an editor and a project
management interface to create and edit source code. XPS allows you to customize tool
flow configuration options and provides a graphical system editor for connection of
processors, peripherals, and buses.

• Hardware Platform Generation Tool (PlatGen) - customizes and generates the
embedded processor system through the use of hardware netlist Hardware Description
Language (HDL) files. By default, PlatGen synthesizes each processor IP core instance
found in your embedded hardware design using Xilinx Synthesis Technology (XST).
PlatGen also generates the system-level HDL file that interconnects all the IP cores,
which can then be synthesized as part of the overall design flow.
• Base System Builder Wizard (BSB) - allows you to quickly create a working
embedded design, using any features of a supported development board or using basic
functionality common to most embedded systems. After you create a basic system, you
can then customize it using the XPS and ISE software tools.
• Simulation Model Generation Tool (SimGen) - generates simulation models of your
embedded hardware system, based either on your original, behavioral embedded
hardware design or your finished, timing-accurate device implementation. SimGen can
also incorporate your embedded software to run on the model.
• Create and Import Peripheral Wizard - helps you create your own peripherals and
import them into EDK-compliant repositories or XPS projects. The wizard can create an
HDL template for your custom logic and provides an interface to one of the supported
IBM CoreConnect or Xilinx FSL buses.
• Software Development Kit (SDK) - provides a C/C++ development environment for
software application projects. SDK is based on the Eclipse open source standard. SDK
provides software project management and access to the GNU tool chain for code
compilation and debug. It is also available for purchase as a standalone product.
• GNU Software Development Tools - assist with compiling and debugging.
Embedded software applications written in C, C++, or assembly are compiled using the
GNU compiler tool chain. The GNU tool chain is part of the SDK and customized to
target the PowerPC and MicroBlaze processors. For detailed information about the GNU
tools, including compilers and debuggers, see the "GNU Compiler Tools" and "GNU
Debugger (GDB)" chapters in the Embedded System Tools Reference Manual.

• Xilinx Microprocessor Debugger (XMD) and GNU Software Debugging
Tools - allow you to debug your embedded application, either on the host development
system, using an instruction set simulator, or on a board that has a Xilinx device loaded
with your hardware bitstream. For more information on XMD, see the "Xilinx
Microprocessor Debugger (XMD)" chapter in the Embedded System Tools Reference
Manual.

• Library Generation Tool (LibGen) - configures libraries, device drivers, file systems,
and interrupt handlers for the embedded processor system to create a software platform.
• Bitstream Initializer (BitInit) - updates a device configuration bitstream to initialize
the on-chip instruction memory with the software executable. For more information, see
the "Bitstream Initializer (BitInit)" chapter of the Embedded System Tools Reference
Manual and the “Initializing Software Overview” topic in the XPS Help.

C. ISE Design Suite: DSP Edition


The ISE Design Suite DSP Edition includes all the tools and capabilities of the Logic Edition
with the added capabilities of the System Generator for DSP and the AccelDSP™ Synthesis
Tool. This edition provides an integrated environment with tools to help you achieve optimal
design results for your DSP design in less time, including the following:

• System Generator for DSP - allows you to define and verify complete DSP systems
using industry-standard tools from The MathWorks. When using System Generator,
previous experience with Xilinx devices or RTL design methodologies is not required.
Designs are captured in the DSP-friendly Simulink® modeling environment using a
Xilinx-specific block set. All of the downstream synthesis and implementation steps are
automatically performed to generate a device programming file.
• AccelDSP Synthesis Tool - allows you to transform a MATLAB floating-point design
into a hardware module that can be implemented in a Xilinx device. The AccelDSP
Synthesis Tool features an easy-to-use graphical interface that controls an integrated
environment with other design tools such as MATLAB tools, ISE software, and other
industry-standard HDL simulators and logic synthesizers. AccelDSP Synthesis provides
the following capabilities:
    • Reads and analyzes a MATLAB floating-point design.
    • Automatically creates an equivalent MATLAB fixed-point design.
    • Invokes a MATLAB simulation to verify the fixed-point design.
    • Provides you with the power to quickly explore design trade-offs of
      algorithms that are optimized for the target device architectures.
    • Creates a synthesizable RTL HDL model and a test bench to ensure bit-true,
      cycle-accurate design verification.
    • Provides scripts that invoke and control downstream tools such as HDL
      simulators, RTL logic synthesizers, and ISE implementation tools.


D. ISE Design Suite: System Edition


The ISE Design Suite System Edition includes all of the tools and capabilities of the Logic
Edition, Embedded Edition, and DSP Edition.

5.1.2.1. Xilinx Design Flow Overview:

The following steps are involved in the realization of a digital system using Xilinx FPGAs, as
illustrated by the following figure.

Figure 5.1: Overview of the various steps involved in the design flow of a digital system

Design Entry
The first step is to enter your design. This can be done by creating “Source” files. Source files
can be created in different formats, such as a schematic or a Hardware Description Language
(HDL) such as VHDL or Verilog. A project design consists of a top-level source file and
various lower-level source files. Any of these files can be either a schematic or an HDL file.


Design Synthesis
The synthesis step creates netlist files from the various source files. The netlist files can serve as
input to the implementation module.

Design Verification (simulation)


This is an important step that should be done at various stages of the design. The simulator is
used to verify the functionality of a design (functional simulation), the behavior and the timing
(timing simulation) of your circuit. Timing simulation is run after implementing your circuit in
the FPGA since it needs to know the actual placement and routing to find out the exact speed
and timing of the circuit.

Design Implementation
After generating the netlist file (synthesis step), the implementation step converts the logic design
into a physical file that can be downloaded to the target device (e.g. a Virtex FPGA). This step
involves three sub-steps: translating the netlist, mapping, and place and route.

Device Configuration
This refers to the actual programming of the target FPGA by downloading the programming file
to the Xilinx FPGA.

5.1.3. Modelsim:

ModelSim is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and
mixed-language designs. It is a powerful simulator that can be used to simulate the behavior
and performance of logic circuits. ModelSim is an easy-to-use yet versatile
VHDL/(System)Verilog/SystemC simulator by Mentor Graphics. It supports behavioral,
register-transfer-level, and gate-level modeling. ModelSim runs on Linux, Solaris, and Windows,
among other platforms.

5.2. Hardware used for the Project

5.2.1. FPGA:
FPGA implementations have the potential to be parallel using a mixture of two forms of
parallelism, spatial and temporal. For example, the FPGA could be configured to partition the
image and distribute the resulting sections to multiple pipelines, all of which could process data
concurrently. Such parallelization is subject to the processing mode and hardware constraints of
the system.


5.2.1.1 The Advantage of Using FPGAs:

Image processing is difficult to achieve on a serial processor because of the large data set
required to represent the image and the complex operations that need to be performed on it.
At a video rate of 25 frames per second, a single operation performed on every
pixel of a 768×576 color image (a standard PAL frame) equates to about 33 million operations
per second. An FPGA consists of a matrix of logic blocks that are connected by an interconnect
network. Both the logic blocks and the interconnect network are reprogrammable, allowing
application-specific hardware to be constructed while maintaining the ability to
change the functionality of the system with ease. As such, an FPGA offers a compromise
between the flexibility of general-purpose processors and the hardware-based speed of ASICs.
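
The operation count quoted above can be checked with a short calculation. The Python sketch below assumes that the figure counts one operation per color channel of every pixel (three channels per pixel); that reading is an assumption, chosen because it reproduces the quoted order of magnitude, and the function name is hypothetical.

```python
def operations_per_second(width, height, frames_per_second, channels=3):
    """Operations needed to touch every color sample of every frame once."""
    return width * height * channels * frames_per_second


pal_rate = operations_per_second(768, 576, 25)
print(f"{pal_rate:,} operations per second")
```

Counting pixels rather than color samples (channels=1) gives roughly 11 million operations per second instead.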


5.2.1.2 Hardware Constraints:

There are three modes of processing: stream, offline and hybrid processing. In stream
processing, data is received from the input device in a raster nature at video rates. Memory
bandwidth constraints dictate that as much processing as possible can be performed as the data
arrives. In offline processing there is no timing constraint. This allows random access to
memory containing the image data. The speed of execution in most cases is limited by the
memory access speed. The hybrid case is a mixture of stream and offline processing. In this
case, the timing constraint is relaxed so the image is captured at a slower rate. While the image
is streamed into a frame buffer it can be processed to extract the region of interest. This region
can be processed by an offline stage which would allow random access to the region’s elements.

5.2.1.3 Timing Constraints:

If there is no requirement on processing time then the constraint on timing is relaxed and the
system can revert to offline processing. This is often the result of a direct mapping from a
software algorithm. The constraint on bandwidth is also eliminated because random access to
memory is possible and desired values in memory can be obtained over a number of clock
cycles with buffering between cycles. Offline processing in hardware therefore closely
resembles the software programming paradigm; the designer need not worry about constraints to
any great extent. This is the approach taken by languages that map software algorithms to
hardware. The goal is to produce hardware that processes the input data as fast as possible given
various automatic and manual optimization techniques.

5.2.1.4 Bandwidth Constraints:



Frame buffering requires large amounts of memory. The size of the frame buffer depends
on the transform itself. In the worst case (rotation by 90°, for example) the whole image must be
buffered. A single 24-bit (8 bits per color channel) color image with 768×576 pixels requires
about 1.3 MB (1,327,104 bytes) of memory. FPGAs have very limited amounts of on-chip RAM.
The logic blocks themselves can be configured to act like RAM, but this is usually an inefficient
use of the logic blocks. Typically some sort of off-chip memory is used, but this only allows a
single access to the frame buffer per clock cycle, which can be a problem for the many
operations that require simultaneous access to more than one pixel from the input image. For
example, bilinear interpolation requires simultaneous access to four pixels from the input image.
This will be on a per-clock-cycle basis if real-time processing constraints are imposed.
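
The frame-buffer estimate and the four-pixel access pattern of bilinear interpolation can both be made concrete with a short Python sketch. The function names are hypothetical, the image is assumed to be a list of rows of grey values, and the sample coordinates are assumed to lie inside the image.

```python
import math


def frame_buffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one complete frame."""
    return width * height * bits_per_pixel // 8


def bilinear_sample(image, x, y):
    """Interpolate image[y][x] at a fractional position (x, y)."""
    height, width = len(image), len(image[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the second sample so that the border pixels stay in range.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Four input pixels are read for every output pixel.
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


print(frame_buffer_bytes(768, 576, 24))
print(bilinear_sample([[0, 10], [20, 30]], 0.5, 0.5))
```

One 24-bit PAL frame comes to 1,327,104 bytes, roughly 1.27 MiB. The four reads inside bilinear_sample are the accesses that a single-port off-chip memory cannot deliver in one clock cycle.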


5.2.1.5 Hardware Implementation:

In the implementation of image enhancement algorithms the Spartan®-3E FPGA is used to take
advantage of the different input and output interfaces to implement and verify the system.

Figure 5.2: Spartan®-3E FPGA starter kit board

The Spartan®-3E FPGA Starter Kit board supports a variety of FPGA configuration options:

• Download FPGA designs directly to the Spartan-3E FPGA via JTAG, using the on-board
USB interface. The on-board USB-JTAG logic also provides in-system programming
for the on-board Platform Flash PROM and the Xilinx XC2C64A CPLD. SPI serial
Flash and StrataFlash programming are performed separately.
• Program the on-board 4 Mbit Xilinx XCF04S serial Platform Flash PROM, then
configure the FPGA from the image stored in the Platform Flash PROM using Master
Serial mode.
• Program the on-board 16 Mbit ST Microelectronics SPI serial Flash PROM, then
configure the FPGA from the image stored in the SPI serial Flash PROM using SPI
mode.
• Program the on-board 128 Mbit Intel StrataFlash parallel NOR Flash PROM, then
configure the FPGA from the image stored in the Flash PROM using BPI Up or BPI
Down configuration modes. Further, an FPGA application can dynamically load two
different FPGA configurations using the Spartan-3E FPGA’s MultiBoot mode.

5.2.1.6 Key Components and Features:

The key features of the Spartan-3E Starter Kit board are:

o Xilinx XC3S500E Spartan-3E FPGA
    o Up to 232 user-I/O pins
    o 320-pin FBGA package
    o Over 10,000 logic cells
o Xilinx 4 Mbit Platform Flash configuration PROM
o Xilinx 64-macrocell XC2C64A CoolRunner™ CPLD
o 64 MByte (512 Mbit) of DDR SDRAM, x16 data interface, 100+ MHz
o 16 MByte (128 Mbit) of parallel NOR Flash (Intel StrataFlash)
    o FPGA configuration storage
    o MicroBlaze code storage/shadowing
o 16 Mbits of SPI serial Flash (ST Micro)
    o FPGA configuration storage
    o MicroBlaze code shadowing
o 2-line, 16-character LCD screen
o PS/2 mouse or keyboard port
o VGA display port
o 10/100 Ethernet PHY (requires Ethernet MAC in FPGA)
o Two 9-pin RS-232 ports (DTE- and DCE-style)
o On-board USB-based FPGA/CPLD download/debug interface
o 50 MHz clock oscillator
o SHA-1 1-wire serial EEPROM for bitstream copy protection
o Hirose FX2 expansion connector
o Three Digilent 6-pin expansion connectors
o Four-output, SPI-based Digital-to-Analog Converter (DAC)
o Two-input, SPI-based Analog-to-Digital Converter (ADC) with programmable-gain
  pre-amplifier
o ChipScope™ SoftTouch debugging port
o Rotary encoder with push-button shaft
o Eight discrete LEDs
o Four slide switches
o Four push-button switches
o SMA clock input
o 8-pin DIP socket for auxiliary clock oscillator

The proposed system is implemented on the Spartan-3E development board; a brief overview of
the board is given in Section 5.2.


CHAPTER 6
Results And Discussions
6.1 Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows
desktop. This should open up the Project Navigator window on your screen. This window
shows the last accessed project.


Fig 6.1: Vivado Project Navigator window (snapshot from Vivado software)

Opening a project: select File -> New Project to create a new project. This will bring up a new
project window on the desktop. Fill in the necessary entries as follows:

6.2 : New Project Initiation


Fig 6.2 : New Project Initiation window (snapshot from Vivado software)

Project Name: Write the name of your new project.

Project Location: The directory where you want to store the new project. (Note: DO NOT
specify the project location as a folder on the Desktop or a folder in the Xilinx\bin directory.
Your D: drive is the best place to put it. The project location path must NOT contain any
spaces; e.g. D:\ABCD\TA\new lab\sample exercises\o_gate is NOT to be used.) Leave the
top-level module type as HDL.

Example: If the project name were “RoBA”, enter “RoBA” as the project name and then
click “Next”.

Clicking on NEXT should bring up the following window:

6.3 : Adding source code into the project

Fig 6.3: Adding source code into the project (snapshot from Vivado software)

In this step, add the source files for the blocks of the proposed system's block diagram to the
Vivado project.

6.4: Selecting the Board Required

Fig 6.4: Selecting the Board Required (snapshot from Vivado software)


In this step, select the ZedBoard Zynq Evaluation and Development Kit as the board onto which
the design will be downloaded, and then click Next.

Other boards can also be used; the ZedBoard was selected for its good power efficiency, delay
and area.

6.5 : New Project Summary

Fig 6.5: New Project Summary (snapshot from Vivado software)

Make sure that all the files are listed with green check marks, and then click OK to continue.
A window then opens; click Finish to create the project.


6.6 : Editor Window

Fig 6.6: Editor Window (snapshot from Vivado software)

Select the uut _ POSIT multiplier and set it as Top in the editor window. Then open the
elaborated design to obtain the RTL diagram, and run synthesis and implementation.

6.7: Code

Fig 6.7: Code for FIFO (snapshot from Vivado software)


Fig 6.7.1

Fig 6.7.2


Fig 6.7.3

Fig 6.7.4


CHAPTER 7
Simulation And Synthesis Report.

7.1 : RTL And Technology Schematics

Fig 7.1: RTL (snapshot from Vivado software)

This is the RTL diagram obtained from the opened elaborated design.


7.2 : Synthesis And Implementation

Fig 7.2: Synthesis And Implementation

This is the output of synthesis and implementation.

7.3 : Area

Fig 7.3.1: Synthesis result of Area

7.4 : Power

Fig 7.4 : Synthesis result of power

7.5 : Delay

Fig 7.5 : Synthesis result of delay


7.6 : Result

Fig 7.6: Simulation of the output


CHAPTER 8
CONCLUSION
An advanced FIFO structure based NoC is simulated and synthesized in Xilinx ISE 14.7 and
implemented on a Virtex-6 FPGA device to analyze its performance in terms of occupied area,
latency, power consumption and throughput. A single router is designed first, and a mesh-based
NoC is then designed to evaluate the memory utilization of the FPGA. Fig. 4 shows the
Register Transfer Level (RTL) schematic of a single NoC router, which is composed of input
and output ports, an arbiter, a crossbar and channel control modules. The figure also describes
the memory utilization of each component individually. Each module of the NoC is designed
separately in Verilog Hardware Description Language (HDL) and then integrated into one
module. An advanced queued buffer is designed for both the typical NoC and the Bi-directional
NoC so that the two designs can be compared easily.

The simulation results are analyzed for area utilization (number of occupied slices, LUT-FF
pairs and slice registers), latency in terms of delay and maximum operating frequency, power
consumption in terms of dynamic power dissipation, memory utilization in terms of the number
of RAMs, and throughput in terms of flits per second per node. The results describe the
performance of the NoC router in terms of area, delay and power consumption, obtained by
implementing the proposed design on the FPGA. From Fig. 5, it is clear that the proposed
design has less area overhead because the queued buffer is shared between neighbouring
routers and data flits are used to transfer each data packet between source and destination.
The number of RAMs is also lower because only active components use the buffer, whereas idle
modules do not use RAMs. The delay of the proposed design is lower, and correspondingly the
operating frequency is higher, because more channels (both physical and virtual) are available
between source and destination. The total power consumption is slightly higher than in
existing work because the virtual channels increase dynamic power consumption during data
packet transfer.

NoC is a solution for intercommunication within an SoC: it provides parallel communication
wires and removes the barriers of bus-based communication. In this work, an advanced memory
unit is proposed and implemented in a Bi-NoC to reduce the buffer memory requirement and to
achieve high performance in terms of maximum operating bandwidth. Compared with previous
work, the proposed design improves delay by approximately 28% and resource utilization by
approximately 17%. Because RingNet [15] uses a round-robin arbiter, its resource utilization
is higher than that of the proposed work. Each data packet is divided into a number of flits
and the queued buffer is shared between neighbouring routers, so a smaller buffer is required
when data is transferred as flits from source to destination. This advanced router design is
integrated in a Bi-NoC configuration to achieve a higher data transfer speed than a typical
NoC. Virtual channels are created between routers when a data flit is blocked because no
physical channel is available, so data packet latency is reduced and deadlock is avoided. The
implementation results show improved resource utilization compared with existing work.

In future, NoC-based processors will be used in Artificial Intelligence applications. The
performance of the NoC needs to be improved further by advancing the router components,
because the virtual channels in the advanced FIFO structure increase power consumption.
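
The flit-based transfer through a shared queued buffer summarized above can be illustrated with a small behavioural model. This is a minimal Python sketch, not the Verilog design of this project: the names `Flit`, `packet_to_flits`, `SharedFifo` and `transfer`, the flit size, the FIFO depth and the drain rate are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Flit:
    kind: str       # "head", "body", "tail", or "single" for a one-flit packet
    dest: int       # destination node id (hypothetical field)
    payload: bytes


def packet_to_flits(data: bytes, dest: int, flit_bytes: int = 4) -> list:
    """Split a packet into flits of at most flit_bytes bytes each."""
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)] or [b""]
    last = len(chunks) - 1
    flits = []
    for i, chunk in enumerate(chunks):
        if last == 0:
            kind = "single"
        elif i == 0:
            kind = "head"
        elif i == last:
            kind = "tail"
        else:
            kind = "body"
        flits.append(Flit(kind, dest, chunk))
    return flits


class SharedFifo:
    """Bounded FIFO standing in for a buffer shared by two neighbouring routers."""

    def __init__(self, depth: int):
        self.depth = depth
        self._queue = deque()

    def empty(self) -> bool:
        return not self._queue

    def push(self, flit: Flit) -> bool:
        """Accept a flit, or return False (back-pressure) when the buffer is full."""
        if len(self._queue) >= self.depth:
            return False
        self._queue.append(flit)
        return True

    def pop(self):
        """Return the oldest flit, or None when the buffer is empty."""
        return self._queue.popleft() if self._queue else None


def transfer(flits, fifo: SharedFifo, drain_every: int = 1):
    """Each cycle the sender offers one flit; the receiver pops every drain_every cycles."""
    pending = deque(flits)
    received = []
    cycle = 0
    while pending or not fifo.empty():
        if pending and fifo.push(pending[0]):
            pending.popleft()          # flit accepted; otherwise retry next cycle
        if cycle % drain_every == 0:
            flit = fifo.pop()
            if flit is not None:
                received.append(flit)
        cycle += 1
    return received, cycle


if __name__ == "__main__":
    flits = packet_to_flits(b"ABCDEFGHIJ", dest=3)
    received, cycles = transfer(flits, SharedFifo(depth=2), drain_every=3)
    print([f.kind for f in received], cycles)
```

In this sketch a small FIFO depth is enough to carry a packet of any length, because the sender simply retries while the buffer is full. This mirrors the argument that dividing packets into flits reduces the required buffer size.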
Many directions for future work are inspired by this work, including exploiting the
mathematical properties of the code space to find additional nonorthogonal codes and boost the
CDMA interconnect capacity, and exploring more architectural optimizations of the OCI
crossbar. Studying the robustness of CDMA interconnects and techniques for enhancing it will
be one of the priority points for future research. Moreover, we plan to investigate using the
OCI-based routers in different network topologies, evaluate their performance using standard
benchmarks, and study their suitability for various applications.
