Miniproject
Advanced FIFO Structure For Router In Bi-Noc
Submitted in partial fulfillment for the award of the degree of Bachelor of Technology in
Electronics and Communication Engineering
by
Internal Guide
Ms. B ELEENA
2024-25
Acknowledgement
The satisfaction that accompanies the successful completion of the task would be
incomplete without the mention of the people who made it possible, whose constant
guidance and encouragement crown all the efforts with success.
We are equally thankful to Ms. B Eleena, Project Coordinator for her continuous
support and all the staff of Electronics and Communication Engineering Department of
BRECW for their timely help and suggestions in the Project.
List of Figures
Abstract
Chapter 1 Introduction
1.1 Introduction
2.2 Background
3.1 Overview
3.2.2 Packet-Payload
3.5 Features
3.6 Specification
5.1 Introduction
5.2 Working
5.3 Results
5.4 Conclusion
6.1 Conclusion
3.1 MIMO-OFDM
3.2 Block Diagram of MIMO-OFDM
5.1 Triangular wave
5.2 Probability Spectral Density with respect to SNR
5.3 RMSE with respect to SNR
5.4 BPSK with AWGN
5.5 Probability Spectral Density with respect to SNR
Abstract
Network on Chip (NoC) has become a promising intercommunication infrastructure for System on
Chip (SoC) designs, as traditional methods exhibit severe bottlenecks in communication among
processing elements. However, designing a NoC is complex because many issues arise in
performance metrics such as system scalability, latency, power consumption and signal integrity.
This paper discusses the issues of the memory unit in the router and then proposes an advanced
memory structure. To obtain efficient data transfer, FIFO buffers are implemented in distributed
RAM and virtual channels for an FPGA-based NoC. Advanced FIFO-based memory units are
proposed for the NoC router, and their performance is evaluated in a Bi-directional NoC (Bi-NoC).
The major motivation of this paper is to reduce the burden on the router while improving the FIFO
internal structure. To enhance the speed of data transfer, a Bi-NoC with a self-configurable
intercommunication channel is proposed. The simulation and synthesis results show guaranteed
throughput, predictable latency and fair network access when compared to recent works.
Keywords: Bi-NoC; FIFO; Virtual Channel; Switch Allocator; Router; SoC.
Chapter 1
Introduction
1.1 Introduction:
A System on Chip (SoC) is a complex interconnection of various functional elements. Its bus-based
architecture creates a communication bottleneck in gigabit communication. Thus there was a need
for a system with explicit modularity and parallelism. Network on Chip (NoC) possesses many
such attractive properties and solves the problem of the communication bottleneck. It works on
the idea of interconnecting cores using an on-chip network.
Communication on a network on chip is carried out by means of routers, so to implement a better
NoC the router should be designed efficiently. This router supports four parallel connections at
the same time. It uses store-and-forward flow control and FSM-controlled deterministic routing,
which improves the performance of the router. The switching mechanism used here is packet
switching, which is generally used on network on chip.
In packet switching, data is transferred in the form of packets between cooperating routers, and
independent routing decisions are taken. The store-and-forward flow mechanism is preferred here
because it does not reserve channels and thus does not lead to idle physical channels.
The arbiter uses a rotating priority scheme so that every channel gets a chance to transfer its
data. In this router both input and output buffering are used so that congestion can be avoided at
both sides.
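The rotating priority scheme mentioned above can be illustrated with a small behavioral model. This is only a sketch in Python, not the project's RTL: the class name, the method name and the choice of giving the most recently granted channel the lowest priority are assumptions made for illustration.

```python
class RotatingPriorityArbiter:
    """Behavioral sketch of a rotating-priority (round-robin) arbiter.

    Assumption: after a grant, the granted channel drops to the lowest
    priority, so every requesting channel is served in turn.
    """

    def __init__(self, num_channels):
        self.num_channels = num_channels
        # Start so that channel 0 is examined first.
        self.last_grant = num_channels - 1

    def grant(self, requests):
        """Return the granted channel index, or None if no channel requests."""
        for offset in range(1, self.num_channels + 1):
            channel = (self.last_grant + offset) % self.num_channels
            if requests[channel]:
                self.last_grant = channel
                return channel
        return None
```

With this scheme no channel can be starved: a channel that keeps requesting waits at most for one grant to each of the other channels.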
A router is a device that forwards data packets across computer networks. Routers perform the
data "traffic direction" functions on the Internet. A router is a microprocessor-controlled device
that is connected to two or more data lines from different networks.
When a data packet comes in on one of the lines, the router reads the address information in the
packet to determine its ultimate destination. Then, using information in its routing table, it directs
the packet to the next network on its journey.
The router is a "Four Port Network Router". It has one input port, through which packets enter,
and three output ports, through which packets are driven out. A packet contains three parts:
header, data and frame check sequence. The packet width is 8 bits and the length of the packet
can be between 1 byte and 63 bytes. The packet header contains two fields: DA and length.
The destination address (DA) of the packet is 8 bits wide. The switch drives the packet to the
respective port based on this destination address. Each output port has a unique 8-bit port
address. If the destination address of the packet matches the port address, the switch drives the
packet to that output port. The length field is 8 bits wide and takes values from 0 to 63.
Length is measured in bytes. The data is organised in bytes and can take any value.
The frame check sequence contains the security check of the packet. It is calculated over the
header and data.
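The packet layout and the address match described above can be sketched as follows. This is a hedged Python illustration, not the project's implementation: the function names are hypothetical, and the frame check sequence is assumed to be a byte-wise XOR over header and data, which the text does not specify.

```python
from functools import reduce


def frame_check(header_and_data):
    """Frame check sequence over header and data.

    Assumption: byte-wise XOR; the report only says it is "calculated
    over the header and data".
    """
    return reduce(lambda acc, byte: acc ^ byte, header_and_data, 0)


def build_packet(dest_addr, payload):
    """Return [DA, length, payload bytes..., FCS] as a list of bytes."""
    if not 0 <= dest_addr <= 0xFF:
        raise ValueError("destination address must fit in 8 bits")
    if len(payload) > 63:
        raise ValueError("data length must be between 0 and 63 bytes")
    body = [dest_addr, len(payload)] + [b & 0xFF for b in payload]
    return body + [frame_check(body)]


def select_port(packet, port_addresses):
    """Return the index of the output port whose address matches DA, else None."""
    dest_addr = packet[0]
    for port, address in enumerate(port_addresses):
        if address == dest_addr:
            return port
    return None
```

For example, a packet built with `build_packet(0x02, [0x10, 0x20])` would be driven to the port whose entry in `port_addresses` equals `0x02`; if no port address matches, the sketch returns `None` and leaves the handling of that case open.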
A data packet is typically passed from router to router through the networks of the Internet until it
gets to its destination computer. Routers also perform other tasks such as translating the data
transmission protocol of the packet to the appropriate protocol of the next network.
When multiple routers are used in interconnected networks, the routers exchange information
about destination addresses using a dynamic routing protocol. Each router builds up a table
listing the preferred routes between any two systems on the interconnected networks. A router has
interfaces for different physical types of network connections (such as copper cables, fiber optic,
or wireless transmission).
It also contains firmware for different networking protocol standards. Each network interface
uses this specialized computer software to enable data packets to be forwarded from one protocol
transmission system to another.
Routers may also be used to connect two or more logical groups of computer devices known as
subnets, each with a different sub-network address. The subnet addresses recorded in the router
do not necessarily map directly to the physical interface connections.
The very first device that had fundamentally the same functionality as a router does today was the
Interface Message Processor (IMP); IMPs were the devices that made up the ARPANET, the first
packet network. The idea for a router (called "gateways" at the time) initially came about through
an international group of computer networking researchers called the International Network
Working Group (INWG).
Set up in 1972 as an informal group to consider the technical issues involved in connecting
different networks, later that year it became a subcommittee of the International Federation for
Information Processing.
These devices were different from most previous packet networks in two ways. First, they
connected dissimilar kinds of networks, such as serial lines and local area networks. Second, they
were connectionless devices, which had no role in assuring that traffic was delivered reliably,
leaving that entirely to the hosts (this particular idea had been previously pioneered in the
CYCLADES network).
The idea was explored in more detail, with the intention to produce a prototype system, as part of
two contemporaneous programs. One was the initial DARPA-initiated program, which created
the TCP/IP architecture in use today.
Sometime after early 1974 the first Xerox routers became operational. The first true IP router was
developed by Virginia Strazisar at BBN, as part of that DARPA-initiated effort, during 1975-
1976. By the end of 1976, three PDP-11-based routers were in service in the experimental
prototype Internet.
The first multiprotocol routers were independently created by staff researchers at MIT and
Stanford in 1981; the Stanford router was done by William Yeager, and the MIT one by Noel
Chiappa; both were also based on PDP-11s.
Virtually all networking now uses TCP/IP, but multiprotocol routers are still manufactured. They
were important in the early stages of the growth of computer networking, when protocols other
than TCP/IP were in use. Modern Internet routers that handle both IPv4 and IPv6 are multiprotocol,
but are simpler devices than routers processing AppleTalk, DECnet, IP, and Xerox protocols.
Most home users may want to set up a LAN (Local Area Network) or WLAN (wireless LAN) and
connect all computers to the Internet without having to pay for a full broadband subscription
service from their ISP for each computer on the network.
In many instances, an ISP will allow you to use a router and connect multiple computers to a
single Internet connection and pay a nominal fee for each additional computer sharing the
connection. This is when home users will want to look at smaller routers, often called broadband
routers, that enable two or more computers to share an Internet connection. Within a business or
organization, you may need to connect multiple computers to the Internet, but also want to
connect multiple private networks. Not all routers are created equal, since their job will differ
slightly from network to network. Additionally, you may look at a piece of hardware and not even
realize it is a router. Broadband or ICS routers will look a bit different depending on the
manufacturer or brand, but wired routers are generally a small box-shaped hardware device with
ports on the front or back into which you plug each computer, along with a port to plug in your
broadband modem. These connection ports allow the router to do its job of routing the data
packets between each of the computers and the data going to and from the Internet.
Depending on the type of modem and Internet connection you have, you could also choose a
router with phone or fax machine ports. A wired Ethernet broadband router will typically have a
built-in Ethernet switch to allow for expansion. These routers also support NAT (network address
translation), which allows all of your computers to share a single IP address on the Internet.
Internet connection sharing routers will also provide users with much needed features such as an
SPI firewall or a DHCP server.
The challenge of verifying a large design is growing exponentially. There is a need to define
new methods that make functional verification easier. Several strategies have been proposed in
recent years to achieve good functional verification with less effort. A recent advancement
towards this goal is the use of verification methodologies. A methodology defines a skeleton over
which one can add flesh and skin according to one's requirements to achieve functional
verification. This project is aimed at building a reusable test bench for verifying the router
protocol in Verilog. The second part is the verification plan, which specifies the verification
requirements and the approaches used to attack the problem. The architecture of the test bench
gives a complete description of the components and subcomponents used to achieve the
verification goal.
Chapter 2
Literature Survey
A router is a device that forwards data packets between computer networks, creating an overlay
internetwork. A router is connected to two or more data lines from different networks. When a
data packet comes in on one of the lines, the router reads the address information in the packet
to determine its ultimate destination.
Then, using information in its routing table or routing policy, it directs the packet to the next
network on its journey. Routers perform the "traffic directing" functions on the Internet. A data
packet is typically forwarded from one router to another through the networks that constitute the
internetwork until it reaches its destination node.
Routers may also be used to connect two or more logical groups of computer devices known as
subnets, each with a different sub-network address. The subnet addresses recorded in the
router do not necessarily map directly to the physical interface connections. Forwarding an IP
datagram generally requires the router to choose the address and relevant interface of the
next-hop router or (for the final hop) the destination host.
Routers may provide connectivity within enterprises, between enterprises and the Internet, and
between Internet service provider (ISP) networks. The largest routers (such as the Cisco CRS-1 or
Juniper T1600) interconnect the various ISPs, or may be used in large enterprise networks.
Smaller routers usually provide connectivity for typical home and office networks. Other
networking solutions may be provided by a backbone Wireless Distribution System (WDS),
which avoids the costs of introducing networking cables into buildings.
Such scalable bandwidth requirements can be satisfied by using an on-chip packet-switched
micro-network of interconnects, generally known as the Network-on-Chip (NoC) architecture. The
basic idea came from traditional large-scale multi-processors and distributed computing networks.
The scalable and modular nature of NoCs and their support for efficient on-chip communication
lead to NoC-based system implementations.
Even though current network technologies are well developed and their supporting features are
excellent, their complicated configurations and implementation complexity make them hard to
adopt as an on-chip interconnection methodology. To suit typical SoCs or multi-core processing
environments, the basic modules of network interconnection, such as the switching logic, the
routing algorithm and the packet definition, should be lightweight so that they result in easily
implementable solutions.
2.2 Background:
The router used here avoids congestion and the communication bottleneck. A number of router
implementations have already been reported, and some of the related work is summarized here.
Marescaux presented the implementation of a router for a NoC-based system with a 2D torus
network topology. The packet size was 8 bits plus 2 control bits. The main drawback was that the
2D torus was formed using 1D routers, which creates a serious bottleneck in traffic. Zeferino
presented a soft-core router for NoC; the problem with this router implementation was that it uses
a 4-flit buffer with an 8-bit implementation, which is quite high.
Its input and output channels have four distinct blocks and use a large decoding logic. Moraes
also presented a router, but its drawback was that its packet has two headers, which is quite
expensive. The buffer is present only at the input channel. The absence of an output buffer
creates a serious problem in the implementation of the router, as it increases congestion.
Our paper removes most of the problems cited above and improves the performance of the router.
The most familiar type of routers are home and small office routers that simply pass data, such as
web pages and email, between the home computers and the owner's cable or DSL modem, which
connects to the Internet (ISP). More sophisticated routers, however, connect large business or
ISP networks up to the powerful core routers that forward data at high speed along the optical
fiber lines of the Internet backbone.
CHAPTER 3
Router Design Specification
3.1 Overview:
The router follows a packet-based protocol. It drives each incoming packet from the input port to
one of the output ports based on the address contained in the packet. The router has one input
port, through which packets enter, and three output ports, through which packets are driven out.
The router has an active-low synchronous input, resetn, which resets the router.
[Figure: Router_1X3 block diagram. Signals shown: clock, resetn, data (8-bit), packet_valid, suspend_data, err; for each output port X = 0, 1, 2: data_out_X (8-bit), vld_out_X, read_enb_X.]
A data packet moves into the input channel of one port of the router, from which it is forwarded
to the output channel of another port. Each input channel and output channel has its own decoding
logic, which increases the performance of the router. Buffers are present at all ports to store the
data temporarily.
The buffering method used here is store and forward. Control logic is present to make arbitration
decisions. Thus communication is established between the input and output ports. The control bit
lines of the FSM are set according to the destination path of the data packet.
The movement of data from source to destination is called the switching mechanism.
[Figure: Router input protocol timing diagram. Signals: clock, reset, packet_valid, data (H D D D P for each packet), suspend_data, err. Two packets are sent, each labelled Packet 1 (addr = 0). H = Header, D = Data, P = Parity.]
All input signals are active high and are synchronized to the falling edge of the clock. This
is because the DUV router is sensitive to the rising edge of the clock. Driving input signals
on the falling edge therefore ensures adequate setup and hold time, but the signals can also
be driven on the rising edge of the clock.
The packet_valid signal has to be asserted on the same clock as the first byte of a packet
(the header byte) is driven onto the data bus.
Since the header byte contains the address, this tells the router to which output channel the
packet should be routed (data_out_0, data_out_1, or data_out_2).
Each subsequent byte of data should be driven on the data bus with each new
rising/falling clock edge.
After the last payload byte has been driven, on the next rising/falling clock edge, the
packet_valid signal must be deasserted and the packet parity byte should be driven. This
signals packet completion.
The input data bus value cannot change while the suspend_data signal is active (indicating
a FIFO overflow). The packet driver should not send any more bytes and should hold the
value on the data bus. The suspend_data signal should not stay asserted for more than 100
cycles.
The err signal asserts when a packet with bad parity is detected in the router, within 1 to
10 cycles of packet completion.
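The input protocol rules above can be sketched as a per-cycle stimulus generator. This is a hedged Python illustration, not the project's test bench: the function name is hypothetical, the parity byte is assumed to be the XOR of the header and payload bytes, and suspend_data stalls are not modeled.

```python
def drive_packet(header, payload):
    """Return one (packet_valid, data) pair per clock cycle for one packet.

    Assumptions: parity is the byte-wise XOR of header and payload, and
    the router never asserts suspend_data during the transfer.
    """
    parity = header & 0xFF
    # Header byte goes out with packet_valid asserted on the same clock.
    cycles = [(1, header & 0xFF)]
    for byte in payload:
        parity ^= byte & 0xFF
        # Each subsequent byte is driven on its own clock.
        cycles.append((1, byte & 0xFF))
    # packet_valid is deasserted while the parity byte is driven,
    # which signals packet completion.
    cycles.append((0, parity))
    return cycles
```

A driver that also honors suspend_data would repeat the current pair, holding the data bus value, for every cycle in which the signal is active.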
3.4 Router Output Protocol:
The characteristics of the output protocol are as follows:
All output signals are active high and can be synchronized to the rising/falling edge of the
clock. Thus, the packet receiver will drive and sample data at the rising/falling edge of the
clock. The router will drive and sample data at the rising edge of the clock.
Each output port data_out_X (data_out_0, data_out_1, data_out_2) is internally buffered
by a FIFO of 1 byte width and 16 location depth.
The router asserts the vld_out_X (vld_out_0, vld_out_1 or vld_out_2) signal when valid
data appears on the data_out_X (data_out_0, data_out_1 or data_out_2) output bus. This is
a signal to the packet receiver that valid data is available on a particular router output.
The packet receiver will then wait until it has enough space to hold the bytes of the packet
and then respond by asserting the read_enb_X (read_enb_0, read_enb_1 or
read_enb_2) signal, which is an input to the router.
The read_enb_X (read_enb_0, read_enb_1 or read_enb_2) input signal can be asserted on
the rising/falling clock edge on which data are read from the data_out_X (data_out_0,
data_out_1 or data_out_2) bus.
As long as the read_enb_X (read_enb_0, read_enb_1 or read_enb_2) signal remains
active, the data_out_X (data_out_0, data_out_1 or data_out_2) bus drives a valid packet
byte on each rising clock edge.
The packet receiver cannot request the router to suspend data transmission in the middle of
the packet. Therefore, the packet receiver must assert the read_enb_X (read_enb_0,
read_enb_1 or read_enb_2) signal only after it ensures that there is adequate space to hold
the entire packet.
The read_enb_X (read_enb_0, read_enb_1 or read_enb_2) signal must be asserted within 30
clock cycles of vld_out_X (vld_out_0, vld_out_1 or vld_out_2) being asserted.
Otherwise, there is too much congestion in the packet receiver.
The DUV data_out_X (data_out_0, data_out_1 or data_out_2) bus must not be tri-stated
(high Z) when the DUV signal vld_out_X (vld_out_0, vld_out_1 or vld_out_2) is asserted
(high) and the input signal read_enb_X (read_enb_0, read_enb_1 or read_enb_2) is also
asserted high.
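The 30-cycle response rule of the output protocol lends itself to a simple trace check. The following Python sketch is illustrative only: the function name and the per-cycle list representation of the signals are assumptions, and "within 30 clock cycles" is read here as "no later than 30 cycles after the cycle in which vld_out_X rises".

```python
def read_response_ok(vld_out, read_enb, max_delay=30):
    """Check that read_enb follows each rising edge of vld_out in time.

    vld_out and read_enb are per-cycle samples (0/1) of one output port.
    Assumption: the response may arrive at most max_delay cycles after
    the cycle in which vld_out rises.
    """
    prev_vld = 0
    pending = None  # cycle of an unanswered vld_out rising edge
    for cycle, (vld, enb) in enumerate(zip(vld_out, read_enb)):
        if vld and not prev_vld:
            pending = cycle
        if enb:
            pending = None
        if pending is not None and cycle - pending >= max_delay:
            return False
        prev_vld = vld
    return True
```

A monitor in the test bench could run such a check on the sampled vld_out_X and read_enb_X traces of each port and flag receiver congestion when it fails.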
[Figure: Router output protocol timing diagram. Signals: clock, reset, packet_valid, data (H D D D P for each packet), vld_out_0, read_enb_0 (asserted after a response delay), data_out_0 (H D D D P). Received packet: Packet 1 (addr = 0).]
3.6 Specifications:
Input Specifications:
Parameter                      Value
Input ports                    1
Port size                      8-bit
Input data                     Packet
Total number of input pins     15
TABLE 3.1
Output Specifications:
Parameter                      Value
Output ports                   4
Each port size                 8-bit
Output data                    Packet
Total number of output pins    38
TABLE 3.2
Device:
Parameter                      Value
Family                         Spartan 3E
Part                           XC3S100E
Package                        VQ100
Process                        Maximum
Speed grade                    -5
Frequency                      200 MHz
TABLE 3.3
CHAPTER 4
Four Port Router Architecture
The router_reg module contains the status, data and parity registers for the network router_1x3.
These registers are latched to new status or input data through the control signals provided by
the fsm_router.
There are 3 FIFOs, one for each output port, which store the data coming from the input port
based on the control signals provided by the fsm_router module.
The fsm_router block provides the control signals to the fifo and router_reg modules. The router
block diagram is shown in the figure below.
The router blocks are:
Register
Router controller (FSM)
FIFO Output Block
4.2 Register Block:
This module contains the status, data and parity registers required by the router. All the
registers in this module are latched on the rising edge of the clock.
The data register latches the data from the data input based on the state and status control
signals, and this latched data is sent to the FIFO for storage. In addition, data is latched into
the parity registers for parity calculation, and the result is compared with the parity byte of the
packet. An error signal is generated if the packet parity is not equal to the calculated parity.
The output parity_done is high when the input ld_state is high and (fifo_full and packet_valid)
is low, or when the input laf_state and the output low_packet_valid are both high and the
previous value of parity_done is low. It is reset to low by the reset_int_reg signal.
The output low_packet_valid is high when the input ld_state is high and packet_valid is low.
It is reset to low by the reset_int_reg signal.
The first data byte, i.e. the header, is latched in the internal register first_byte when the
detect_add and packet_valid signals are high, so that it can be latched to the output dout when
the lfd_state signal goes high.
Then the input data, i.e. the payload, is latched to the output dout if the ld_state signal is high
and fifo_full is low.
Then the input data, i.e. the parity, is latched to the output dout if the ld_state signal is high
and fifo_full is low.
The input data is latched to the internal register full_state_byte when ld_state and fifo_full are
high; this full_state_byte data is latched to the output dout when laf_state goes high.
The internal parity register stores the parity calculated for the packet data. When the packet has
been transmitted fully, the internally calculated parity is compared with the parity byte of the
packet. An error signal is generated if the packet parity is not equal to the calculated parity.
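The parity comparison performed by the register block can be pictured with a small behavioral model. This is a hedged Python sketch rather than the router_reg RTL: the class and method names are hypothetical, and the parity is assumed to be a running XOR over the header and payload bytes.

```python
class ParityRegister:
    """Behavioral sketch of the internal parity register of router_reg.

    Assumption: the calculated parity is a running byte-wise XOR.
    """

    def __init__(self):
        self.calculated = 0

    def accumulate(self, byte):
        """Fold one header or payload byte into the calculated parity."""
        self.calculated ^= byte & 0xFF

    def check(self, packet_parity):
        """Compare with the packet's parity byte and return the err flag."""
        err = self.calculated != (packet_parity & 0xFF)
        # Cleared afterwards, in the spirit of reset_int_reg.
        self.calculated = 0
        return err
```

In use, every header and payload byte is passed to `accumulate`, and `check` is called once with the received parity byte; a `True` result corresponds to the error signal being generated.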
In the above figure, the register block is synchronized with the FSM to latch input data. Here
the clk and resetn signals are synchronous with the entire module.
E.g.: Packet data is given as input and the read signals (re1, re2, re3) are made high with
respect to the first data byte of the packet. The received data is driven to the router controller
to reach its destination port. The block has 11 input pins (data_in[7:0], packet_valid, clk, reset).
E.g.: data_in = 8'b10101010; clk, reset and packet_valid are HIGH.
This module generates all the control signals when a new packet is sent to the router. These
control signals are used by the other modules to send data to the output and to write data into
the FIFO.
STATE – WAIT_TILL_EMPTY
In this state neither new data is accepted nor is data latched by the router_reg module, so the
suspend_data signal is made high and the write_enb_reg signal is made low. The FSM waits for the
fifo_empty signal; when it goes high, the FSM goes to the LOAD_FIRST_DATA state.
STATE – LOAD_DATA0
In this state data is latched in the data registers of the router_reg module; for this, the
ld_state signal is generated for the router_reg module. The suspend_data signal is made low so
that the router can accept new data from the input. At the same time, the latched data is sent to
the FIFO and write_enb_reg is generated for writing into the present FIFO.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE0.
Data is latched as long as the packet_valid signal is asserted; when it is de-asserted in the
LOAD_DATA0 state, the FSM goes to the LOAD_PARITY0 state, where the last byte (the parity
byte) is latched.
STATE – LOAD_PARITY0
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE0; otherwise it goes to
CHECK_PARITY_ERROR0.
The lp_state signal is generated for the router_reg module so that the last byte can be latched
and the parity bytes can be compared. The suspend_data signal is made high so that the router
does not accept any new data, and write_enb_reg is made high for latching the last byte.
STATE – FIFO_FULL_STATE0
In this state neither new data is accepted nor is any data latched, so the suspend_data signal is
made high and the write_enb_reg signal is made low.
The signal full_state is generated for the router_reg module, and the FSM changes to the
LOAD_AFTER_FULL0 state when fifo_full becomes low.
In the LOAD_AFTER_FULL0 state the laf_state signal is generated for router_reg so that it can
latch the data held during FIFO_FULL_STATE0. No new data is accepted, so suspend_data is kept
high; write_enb_reg is made high so that the last data latched in the router_reg module can be
written.
The FSM first checks the parity_done register, which if high shows that the LOAD_PARITY0 state
has passed; if parity_done is high, the FSM goes to the last state, CHECK_PARITY_ERROR0.
It then checks the low_packet_valid register, which if high shows that packet_valid for the
present packet has been deasserted; if low_packet_valid is high, the FSM goes to the
LOAD_PARITY0 state, otherwise it goes back to the LOAD_DATA0 state.
STATE – CHECK_PARITY_ERROR0
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state.
The FSM returns to the default state DECODE_ADDRESS on the next clock edge.
STATE – LOAD_DATA1
In this state data is latched in the data registers of the router_reg module; for this, the
ld_state signal is generated for the router_reg module. The suspend_data signal is made low so
that the router can accept new data from the input. At the same time, the latched data is sent to
the FIFO and write_enb_reg is generated for writing into the present FIFO.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE1.
Data is latched as long as the packet_valid signal is asserted; when it is de-asserted in the
LOAD_DATA1 state, the FSM goes to the LOAD_PARITY1 state, where the last byte (the parity
byte) is latched.
STATE – LOAD_PARITY1
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE1; otherwise it goes to
CHECK_PARITY_ERROR1.
The lp_state signal is generated for the router_reg module so that the last byte can be latched
and the parity bytes can be compared. The suspend_data signal is made high so that the router
does not accept any new data, and write_enb_reg is made high for latching the last byte.
STATE – FIFO_FULL_STATE1
In this state neither new data is accepted nor is any data latched, so the suspend_data signal is
made high and the write_enb_reg signal is made low.
The signal full_state is generated for the router_reg module. This state changes to the
LOAD_AFTER_FULL1 state when fifo_full becomes low.
In the LOAD_AFTER_FULL1 state the laf_state signal is generated for router_reg so that it can
latch the data held during FIFO_FULL_STATE1. No new data is accepted, so suspend_data is kept
high; write_enb_reg is made high so that the last data latched in the router_reg module can be
written.
The FSM first checks the parity_done register, which if high shows that the LOAD_PARITY1 state
has passed; if parity_done is high, the FSM goes to the last state, CHECK_PARITY_ERROR1.
It then checks the low_packet_valid register, which if high shows that packet_valid for the
present packet has been deasserted; if low_packet_valid is high, the FSM goes to the
LOAD_PARITY1 state, otherwise it goes back to the LOAD_DATA1 state.
STATE – CHECK_PARITY_ERROR1
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state.
The FSM returns to the default state DECODE_ADDRESS on the next clock edge.
STATE – LOAD_DATA2
In this state data is latched in the data registers of the router_reg module; for this, the
ld_state signal is generated for the router_reg module. The suspend_data signal is made low so
that the router can accept new data from the input. At the same time, the latched data is sent to
the FIFO and write_enb_reg is generated for writing into the present FIFO.
If the fifo_full input goes high, no more data can be accepted by the router, so the FSM goes to
FIFO_FULL_STATE2.
Data is latched as long as the packet_valid signal is asserted; when it is de-asserted in the
LOAD_DATA2 state, the FSM goes to the LOAD_PARITY2 state, where the last byte (the parity
byte) is latched.
STATE – LOAD_PARITY2
In this state the last byte, which is the parity byte, is latched. If fifo_full is high, data cannot
be latched, so the FSM goes to FIFO_FULL_STATE2; otherwise it goes to
CHECK_PARITY_ERROR2.
The lp_state signal is generated for the router_reg module so that the last byte can be latched
and the parity bytes can be compared. The suspend_data signal is made high so that the router
does not accept any new data, and write_enb_reg is made high for latching the last byte.
STATE – FIFO_FULL_STATE2
In this state neither new data is accepted nor is any data latched, so the suspend_data signal is
made high and the write_enb_reg signal is made low.
The signal full_state is generated for the router_reg module. This state changes to the
LOAD_AFTER_FULL2 state when fifo_full becomes low.
In the LOAD_AFTER_FULL2 state the laf_state signal is generated for router_reg so that it can
latch the data held during FIFO_FULL_STATE2. No new data is accepted, so suspend_data is kept
high; write_enb_reg is made high so that the last data latched in the router_reg module can be
written.
The FSM checks the parity_done register, which if high shows that the LOAD_PARITY2 state has
passed; if parity_done is high, the FSM goes to the last state, CHECK_PARITY_ERROR2.
STATE – CHECK_PARITY_ERROR2
In this state the reset_int_reg signal is generated, which resets the status and parity registers
inside the router_reg module. No data is latched and no input data is accepted. router_reg
compares the parity byte from the packet with the calculated parity during this state.
The FSM returns to the default state DECODE_ADDRESS on the next clock edge.
The FSM block synchronizes the register and FIFO modules. Its function is to take data from the
data register and steer the input data to the respective output based on the header address,
which controls the operation of the design. It is therefore called the router controller.
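The state transitions described above for one output channel can be summarized as a next-state function. This is a behavioral sketch in Python under stated assumptions, not the fsm_router RTL: the channel suffix (0, 1, 2) is dropped, and the transitions out of DECODE_ADDRESS and LOAD_FIRST_DATA are not given in the text, so the ones used here are guesses marked in the comments.

```python
def next_state(state, packet_valid=0, fifo_full=0, fifo_empty=0,
               parity_done=0, low_packet_valid=0):
    """Return the next controller state for one output channel."""
    if state == "DECODE_ADDRESS":
        # Assumption: not described in the report.
        if packet_valid:
            return "LOAD_FIRST_DATA" if fifo_empty else "WAIT_TILL_EMPTY"
        return "DECODE_ADDRESS"
    if state == "WAIT_TILL_EMPTY":
        return "LOAD_FIRST_DATA" if fifo_empty else "WAIT_TILL_EMPTY"
    if state == "LOAD_FIRST_DATA":
        # Assumption: not described in the report.
        return "LOAD_DATA"
    if state == "LOAD_DATA":
        if fifo_full:
            return "FIFO_FULL_STATE"
        if not packet_valid:
            return "LOAD_PARITY"
        return "LOAD_DATA"
    if state == "LOAD_PARITY":
        return "FIFO_FULL_STATE" if fifo_full else "CHECK_PARITY_ERROR"
    if state == "FIFO_FULL_STATE":
        return "FIFO_FULL_STATE" if fifo_full else "LOAD_AFTER_FULL"
    if state == "LOAD_AFTER_FULL":
        if parity_done:
            return "CHECK_PARITY_ERROR"
        if low_packet_valid:
            return "LOAD_PARITY"
        return "LOAD_DATA"
    if state == "CHECK_PARITY_ERROR":
        return "DECODE_ADDRESS"
    raise ValueError("unknown state: " + state)
```

Writing the controller as a pure function of the current state and the status inputs keeps each transition easy to compare against the state descriptions, one rule at a time.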
There are 3 FIFOs used in the router design. Each FIFO is 8 bits wide and 16 locations deep.
The FIFO works on the system clock. It has a synchronous reset input, resetn.
If resetn is low, then full = 0, empty = 1 and data_out = 0.
The FIFO performs 3 different operations:
Write Operation
Read Operation
Read and Write Operation
Write Operation:
The data on the input data_in is sampled at the rising edge of the clock when the input
write_enb is high and the FIFO is not full. Only under this condition is a FIFO write performed.
Read Operation:
The data is read from the output data_out at the rising edge of the clock when read_enb is high
and the FIFO is not empty.
Read and Write Operation:
Read and write operations can be done simultaneously.
Full – indicates that all the locations inside the FIFO have been written.
Empty – indicates that all the locations of the FIFO are empty.
The Output Block of Network Router conisistes of three FIFO.Each FIFO is a 8-Bit data Width
and 16 bit data depth .the strcture of OUTPUT Block is shown in below fig..
Eg: INPUT: clk,reset,read_enb are HIGH ,write_enb are LOW and data_in=8’b10101010.
OUTPUT:full is LOW,empty is HIGH and data_out=10101010.
4.5.1 Register:
It holds 8-bit values. In the Verilog code it should be declared 8 bits wide.
4.5.3 FIFO:
It is 8 bits wide and 16 locations deep. The fifo_full and fifo_empty signals indicate whether
the FIFO is full or empty. To track this status an internal counter steps through the 16
locations, so it is 4 bits wide. The input signals are data_in (8-bit), we, re, clk and resetn,
and the output signals are data_out (8-bit), fifo_empty and fifo_full. Data is written when we is
high and the FIFO is not full, and it is read when re is high and the FIFO is not empty. The RTL
code is written in Verilog as a behavioral model.
It is verified by driving 16 bytes of data on data_in with we high, after which fifo_full becomes
high. While fifo_full is high, data cannot be written into the FIFO. With re high, data_out
returns all 16 bytes that were driven. After that fifo_empty goes high and no more data can be
read. The case where both we and re are high was also verified.
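The FIFO behaviour and the verification sequence described above can be sketched in a few lines. This is a behavioural Python model under stated assumptions, not the Verilog used in the project: the class name Fifo and the clock() method are hypothetical, and the model assumes that a simultaneous read and write both use the flag values seen before the clock edge.

```python
class Fifo:
    """Behavioural model of an 8-bit wide, 16-location synchronous FIFO."""

    def __init__(self, depth=16, width=8):
        self.depth = depth
        self.mask = (1 << width) - 1
        self.reset()

    def reset(self):
        # Reset state as given in the text: full = 0, empty = 1, data_out = 0.
        self.mem = []
        self.data_out = 0

    @property
    def fifo_full(self):
        return len(self.mem) == self.depth

    @property
    def fifo_empty(self):
        return len(self.mem) == 0

    def clock(self, data_in=0, we=0, re=0):
        """Model one rising clock edge and return data_out."""
        # Sample the flags before the edge so that a simultaneous read and
        # write both see the same pre-edge state (an assumption of this sketch).
        do_write = we and not self.fifo_full
        do_read = re and not self.fifo_empty
        if do_read:
            self.data_out = self.mem.pop(0)
        if do_write:
            self.mem.append(data_in & self.mask)
        return self.data_out


# Verification sequence from the text: fill, check full, drain, check empty.
fifo = Fifo()
for value in range(1, 17):
    fifo.clock(data_in=value, we=1)
print(fifo.fifo_full)                      # True after 16 writes
read_back = [fifo.clock(re=1) for _ in range(16)]
print(read_back == list(range(1, 17)))     # True: data comes out in order
print(fifo.fifo_empty)                     # True after 16 reads
```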
CDMA Transmitter:
The eight-bit input data corresponding to a particular user is converted into serial form by an
eight-bit PISO. The PISO is clocked by Fmaster divided by 15, where Fmaster is 0.5 GHz. The
serial data is then spread by the 15-bit PN code. The PN code generator is clocked by Fmaster.
The spread data of all four users are summed to generate the signal to be transmitted.
CDMA Receiver:
After despreading the received signal with the corresponding code, it is compared with the same
PN code, which is converted into parallel form, using an 8-bit comparator. The comparator uses a
0.33 GHz clock frequency. If the actual transmitted data was a high, then the despread output
will be the same as the PN sequence. So the comparison function is performed in such a way that
it compares the despread output with the PN sequence. If they are the same, it can be concluded
that the data sent is a high, and if not, the data is a low. So the comparator output
corresponds to the actual transmitted data of a particular user. Thus the receiver is able to
reconstruct the original data from the spread output.
Linear feedback shift registers are used for generating PN sequences. D flip-flop components
are used for this since structural modeling is used. To generate the sequence, it is first
necessary to initialize the flip-flops to a particular value. Since a 15-bit long PN sequence is
being used, four flip-flops are required, and these four flip-flops must be initialized. For
that purpose, init signals are used. After the initialization, the XOR feedback logic generates
the PN sequence. Orthogonal sequences are required in this system. Time-shifted versions of a PN
sequence are nearly orthogonal. So, to shift the sequences, shift registers are used in which
the sequence is given as input to the registers. The outputs from the intermediate flip-flops
are taken, which are time shifted. So at the output of the PN generator four PN sequences are
obtained.
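A four-stage LFSR of the kind described above can be modelled as follows. This is a minimal sketch: the report does not say which feedback taps were used, so the XOR of the last two stages (polynomial x^4 + x^3 + 1) is an assumption, and the function names pn_sequence and shifted_copies are hypothetical.

```python
def pn_sequence(init=(1, 0, 0, 0), length=15):
    """Generate a PN sequence from a 4-stage LFSR.

    Assumption: feedback is the XOR of the last two stages (polynomial
    x^4 + x^3 + 1); the report does not state which taps were used.
    """
    if not any(init):
        raise ValueError("an all-zero initial state never leaves zero")
    q = list(init)
    out = []
    for _ in range(length):
        out.append(q[3])                  # output taken from the last flip-flop
        feedback = q[2] ^ q[3]            # XOR feedback logic
        q = [feedback, q[0], q[1], q[2]]  # shift by one stage
    return out


def shifted_copies(seq, count=4):
    """Time-shifted (cyclically rotated) versions of one PN sequence."""
    return [seq[k:] + seq[:k] for k in range(count)]


def correlation(a, b):
    """Correlation of two binary sequences after mapping 0 -> +1, 1 -> -1."""
    return sum((1 - 2 * x) * (1 - 2 * y) for x, y in zip(a, b))


seq = pn_sequence()
codes = shifted_copies(seq)
print(seq)                              # one period of 15 chips
print(correlation(codes[0], codes[0]))  # 15 for a sequence with itself
print(correlation(codes[0], codes[1]))  # -1 between two shifted versions
```

The cross-correlation of -1 against a peak of 15 is what "nearly orthogonal" means for the shifted versions.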
Kendyala | [School]
Mini Project Report Advanced FIFO Structure For Router Bi-Noc
algorithm are employed in the OCI router. The routing algorithm lies at the network layer,
which is a higher layer than the physical layer containing the crossbar switch. According to the
OSI model design principles, each layer of the model exists as an independent layer.
Theoretically, one can substitute one protocol for another at any given layer without affecting
the operation of layers above or below. Thus, using the same flow control protocol and routing
algorithm enables comparing the OCI-based router with SDMA- and TDMA-based routers.
A. OCI Crossbar High-Level Architecture
The main objective of this paper is increasing the number of ports sharing the ordinary CDMA
crossbar presented, while keeping the system complexity unchanged using simple encoding
circuitry and relying on the accumulator decoder with minimal changes. To achieve this goal,
some modifications to the classical CDMA crossbar are made. Fig. 2 depicts the high-level
architecture of the OCI crossbar for a single-bit
interconnection. The same architecture is replicated for a multibit CDMA router. M TX-RX
ports share the CDMA router, where spread data from the transmit ports are added using an
arithmetic binary adder having M binary inputs and an m-bit output, where m = _log2 M_. The
adder is implemented in both the reference and pipelined architectures.
A controller block is used for code assignment and arbitration tasks. Each PE is interfaced to
an encoder/decoder wrapper enabling data spreading/despreading. Unlike orthogonal spreading
codes, which are XORed with the binary data bit, an AND gate is utilized to spread data using
nonorthogonal spreading codes.
The AND gate encoder works as follows: if the transmitted data bit is “0,” it sends a stream of
zeros during the whole spreading cycle, which does not cause MAI on the channel; if the
transmitted data bit is “1,” the encoder sends a nonorthogonal spreading code. Therefore, the
additional MAI spreading code will either contribute an MAI value of one or zero each clock
cycle because the encoder is an AND gate.
The XOR encoder of the ordinary CDMA crossbar cannot be used to encode the OCI codes
because it only complements the spreading code chips, so an XOR gate will cause MAI to the
crossbar whether the data bit is “0” or “1.”
A hybrid encoder is developed for both orthogonal and nonorthogonal spreading with an XOR
gate, an AND gate, and a multiplexer unit, as shown in Fig. 2. Two decoder types
In this section, the code design methodology, mathematical foundations, and the decoding
details of the OCI codes are provided. The notations used throughout this paper are listed. An
AND gate encoder is used to encode data with nonorthogonal spreading codes as shown in Fig.
2(a). Therefore, for a nonorthogonal encoder, if data to transmit are one, a single spreading chip
at a specific time slot in the spreading cycle is added to the channel sum, which causes the
consecutive sum difference to deviate. The nonorthogonal codes imitate the TDMA signaling
scheme as each code is composed of a single chip of “1” sent in a specific time slot.
The encoding/decoding scheme presented in this paper provides a novel approach that enables
coexistence between CDMA and TDMA signals in the same shared medium. Therefore, the
developed encoder is called TDMA overloaded on CDMA interconnect (T-OCI). Fig. 3 shows
an encoding/decoding example of two T-OCI codes for a spreading code of length N = 8. An
odd number of orthogonal codes must be used simultaneously to preserve the even difference
property of Walsh codes.
where S is the N-cycle waveform of the channel sum, dC(j) is the orthogonal CDMA data bit sent
by the jth user, dT(j) is the nonorthogonal TDMA data bit sent by the jth user, Co(j) is the
orthogonal code assigned to the jth user, and T(j−N+1) is the TDMA code assigned to the jth
user. The TDMA code T(i) is a single chip of “1” assigned at the ith time slot.
1) Crossbar Controller: At the beginning of each crossbar transaction, the controller assigns
spreading codes to different encoders. The assignment of orthogonal despreading codes to
receive ports is fixed, i.e., does not change between the crossbar transactions. Therefore, for a
router port to initiate the communication with the receive port it addresses, its encoder must be
assigned a spreading code that matches the destined decoder. If two different ports request to
address the same decoder, the controller allows one access and suspends the other according to a
predefined arbitration scheme. This code assignment scheme is called receiver-based protocol.
In this paper, a static allocation scheme that allocates fixed spreading codes to all encoders is
used. To interconnect a large number of PEs, a torus, star, or hybrid NoC topology can be
realized where the assignment of spreading codes is local to each router. Consequently, each
new packet arriving at a router is assigned a spreading code corresponding to its exit port
decoder. The crossbar controller issues handshake signals to the transmit and receive ports with
matching spreading codes to enable the transmitter encoders and receiver decoders.
2) Hybrid Encoder:
The encoder is hybrid: it can encode both orthogonal and nonorthogonal data. A transmitted data
bit is XORed/ANDed with the spreading code to produce the orthogonal/nonorthogonal spread
data, respectively. A multiplexer chooses between the orthogonal and nonorthogonal inputs
according to the code type assigned to the encoder, as depicted in Fig. 2(a). The encoder is
replicated N times for the P-OCI crossbar.
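The hybrid encoder can be illustrated with a short behavioural sketch. The function name hybrid_encode and the example codes are hypothetical; only the XOR path, the AND path and the multiplexer selection come from the description above.

```python
def hybrid_encode(data_bit, code, orthogonal):
    """Spread one data bit over one spreading cycle.

    orthogonal=True  selects the XOR path (orthogonal code).
    orthogonal=False selects the AND path (nonorthogonal code).
    """
    xor_path = [data_bit ^ chip for chip in code]
    and_path = [data_bit & chip for chip in code]
    # The multiplexer picks one path according to the assigned code type.
    return xor_path if orthogonal else and_path


walsh_code = [0, 1, 0, 1, 0, 1, 0, 1]   # example orthogonal code, N = 8
tdma_code = [0, 0, 1, 0, 0, 0, 0, 0]    # single "1" chip in time slot 2

print(hybrid_encode(1, walsh_code, True))    # complemented Walsh code
print(hybrid_encode(0, tdma_code, False))    # all zeros: no MAI on the channel
print(hybrid_encode(1, tdma_code, False))    # the single TDMA chip
```

With the AND path a data bit of "0" puts nothing on the channel, which is why the nonorthogonal codes add at most one chip per spreading cycle.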
3) Crossbar Adder: For a spreading code set of length N, the number of crossbar TX-RX ports is
equal to M =2(N − 1). In the T-OCI crossbar, sending a “1” chip to the adder is mutually
exclusive between nonorthogonal transmit ports according to the T-OCI encoding scheme. This
indicates that among the 2(N−1) inputs to the adder, there are guaranteed (N − 2) zeros, while
the maximum number of “1” chips is N. Therefore, a multiplexer is instantiated to select only a
single input of the nonorthogonal TDMA encoded data bits and discard the remaining bits that
are guaranteed to be “0.” Thus, the adder has only N-bit inputs, N−1 from orthogonal encoders,
and 1 from the multiplexer, as shown in Fig. 2(d). The sum produced by the adder circuit needs
(log2 N) wires. The number of needed stages of registers to pipeline the adder is (log2 N), as
depicted in Fig. 2(d). N replicas of the crossbar adder are instantiated for the parallel encoding
adopted in the P-OCI crossbar.
4) Custom Decoder: There are four decoder types for different CDMA decoding techniques: the
orthogonal T-OCI and P-OCI decoders and the overloaded T-OCI and P-OCI decoders. The
orthogonal T-OCI decoder is an accumulator implementation of the correlator receiver. N − 1
accumulator decoders are instantiated in all CDMA crossbar types for orthogonal data
despreading. Instead of implementing two different accumulators (the zero and one
accumulator), an up–down accumulator is implemented and the accumulated result is the
difference between the two accumulators of the conventional CDMA decoder as shown in Fig.
2(f). The accumulator adds or subtracts the crossbar sum values according to the despreading
code chip and resets every N cycles. The sign bit of the accumulated value directly indicates the
decoded data bit, where the positive sign is decoded as “1,” while the negative sign is decoded
as “0.” The P-OCI orthogonal decoder shown in Fig. 2(e) differs from the T-OCI orthogonal
decoder in receiving the adder sum values concurrently not sequentially; therefore, the
accumulator loop is unrolled into a parallel adder.
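The up-down accumulator decoding of orthogonal data can be checked with a small model. This is a sketch under stated assumptions: Walsh codes are generated with the Sylvester construction, the all-zero code is not assigned to any port, and the accumulator adds the sum when the despreading chip is 0 and subtracts it when the chip is 1, so that a positive result decodes as "1". The function names are hypothetical.

```python
def walsh_codes(n):
    """Binary Walsh codes of length n (n a power of two), one row per code."""
    return [[bin(r & k).count("1") & 1 for k in range(n)] for r in range(n)]


def channel_sum(data_bits, codes):
    """Crossbar adder: per-chip sum of all XOR-encoded transmit streams."""
    n = len(codes[0])
    return [sum(d ^ c[k] for d, c in zip(data_bits, codes)) for k in range(n)]


def accumulator_decode(sums, code):
    """Up-down accumulator; the sign of the result gives the data bit.

    Assumption: the sum is added when the despreading chip is 0 and
    subtracted when it is 1, so that a positive result decodes as 1.
    """
    acc = 0
    for s, chip in zip(sums, code):
        acc += s if chip == 0 else -s
    return 1 if acc > 0 else 0


N = 8
codes = walsh_codes(N)[1:]            # N - 1 codes; the all-zero code is unused
sent = [1, 0, 1, 1, 0, 0, 1]          # one data bit per orthogonal port
sums = channel_sum(sent, codes)
received = [accumulator_decode(sums, c) for c in codes]
print(received == sent)               # True
```

Each accumulator ends the cycle at +N/2 or -N/2, because the contributions of the other ports cancel over a full spreading cycle.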
The T-OCI overloaded decoder depicted in Fig. 2(b) is composed of a 2-bit register to store the
LSBs of two sum values, first of which is S(0) and the second is S( j − N + 1), where j is the
number of the T-OCI decoders (N ≤ j ≤ 2N − 2). The two bits are fed to the XOR gate, which
decodes nonorthogonal spread data. The T-OCI decoder is replicated N times to implement the
P-OCI decoder of Fig. 2(c). The 2-bit register is not needed anymore because the S(0) and
S( j−N+1) values exist in the same cycle. The T-OCI and P-OCI crossbar architectures contain
(N − 1) orthogonal decoders and (N − 1) overloaded decoders.
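The overloaded T-OCI decoding step, an XOR of the LSBs of S(0) and S(j − N + 1), can be reproduced in a short model. This is a sketch under assumptions not stated in the text: Sylvester-ordered Walsh codes, all N − 1 orthogonal ports transmitting in every cycle, and time slot 0 left free as the reference. The function names are hypothetical.

```python
def walsh_codes(n):
    """Binary Walsh codes of length n (n a power of two), one row per code."""
    return [[bin(r & k).count("1") & 1 for k in range(n)] for r in range(n)]


def toci_channel_sum(cdma_bits, tdma_bits, n):
    """Channel sum for n-1 orthogonal ports plus n-1 TDMA-overloaded ports."""
    codes = walsh_codes(n)[1:]        # the all-zero Walsh code is not assigned
    sums = [sum(d ^ c[k] for d, c in zip(cdma_bits, codes)) for k in range(n)]
    # TDMA port i sends a single "1" chip in time slot i (slots 1..n-1);
    # slot 0 carries no TDMA chip and serves as the reference.
    for slot, bit in enumerate(tdma_bits, start=1):
        sums[slot] += bit
    return sums


def toci_overloaded_decode(sums, slot):
    """XOR of the LSBs of S(0) and S(slot) recovers the TDMA data bit."""
    return (sums[0] & 1) ^ (sums[slot] & 1)


N = 8
cdma = [1, 0, 1, 1, 0, 0, 1]          # orthogonal data bits
tdma = [0, 1, 1, 0, 1, 0, 0]          # overloaded data bits, slots 1..7
sums = toci_channel_sum(cdma, tdma, N)
decoded = [toci_overloaded_decode(sums, slot) for slot in range(1, N)]
print(decoded == tdma)                # True
```

The decoding works because, without TDMA chips, every sum in the cycle has the same parity; a TDMA chip of "1" flips the parity of exactly one slot relative to S(0).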
If a NoC’s router has a larger FIFO buffer, the throughput will be larger and the latency in the
network smaller, since it will have fewer flits stagnant on the network [20]. Nevertheless, there
is a limit on the increase of the FIFO depth. Since each communication will have its
peculiarities, sizing the FIFO for the worst case communication scenario will compromise not
only the routing area, but power as well [6]. However, if the router has a small FIFO depth, the
latency will be larger, and quality of service (QoS) can be compromised. The proposed solution
is to have a heterogeneous router, in which each channel can have a different buffer size. In this
situation, if a channel has a communication rate smaller than its neighbour, it may lend some of
its buffer slots that are not being used. In a different communication pattern, the roles may be
reversed or changed at run time, without a redesign step. The proposed architecture is able to
sustain performance due to the fact that, statistically, not all buffers are used all the time. In our
architecture it is possible to dynamically reconfigure different buffer depths for each channel. A
channel can lend part or the whole of its buffer slots in accordance with the requirements of the
neighbouring buffers. To reduce connection costs, each channel may only use the available
buffer slots of its right and left neighbour channels. This way, each channel may have up to
three times more buffer slots than its original buffer with the size defined at design time. Fig. 4
shows the original and proposed input FIFO. Comparing the two architectures, the new proposal
uses more multiplexers to allow the reconfiguration process. Fig. 4(b) presents the South
Channel as an example
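The buffer-lending idea can be illustrated with a simple allocation model. This is a sketch and not the router's actual reconfiguration logic: the function name allocate_slots, the list ordering of the channels, the absence of wrap-around between the first and last channel, and the left-then-right borrowing order are all assumptions made for illustration.

```python
def allocate_slots(base_depth, demand):
    """Distribute unused FIFO slots between neighbouring channels.

    base_depth: buffer slots per channel fixed at design time.
    demand:     slots each channel currently needs.
    Returns the number of slots each channel ends up using.
    """
    n = len(demand)
    spare = [max(0, base_depth - d) for d in demand]   # slots a channel can lend
    granted = [min(d, base_depth) for d in demand]     # own slots used first
    for i, need in enumerate(demand):
        shortfall = need - granted[i]
        for nb in (i - 1, i + 1):                      # left neighbour, then right
            if shortfall <= 0:
                break
            if 0 <= nb < n:
                lend = min(spare[nb], shortfall)
                spare[nb] -= lend
                granted[i] += lend
                shortfall -= lend
    return granted


print(allocate_slots(4, [2, 10, 1]))   # [2, 9, 1]
print(allocate_slots(4, [0, 20, 0]))   # [0, 12, 0]: at most three times the base
```

The second call shows the upper bound mentioned above: a channel with two idle neighbours reaches three times its design-time depth and no more.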
thereby increasing throughput and avoiding deadlock errors. The flow through the virtual
channels of a router, from input port to output port, is shown in Fig. 3. An incoming flit with
high priority arriving from the neighbour router first accesses the appropriate virtual channel;
thereafter the entire data packet is processed. The first incoming flit of a data packet is the
head flit, which arrives at the top of the virtual channel queue of the buffer and then enters
the RC stage. It is decoded in the RC stage, which creates a direction request towards the
destination router. The direction request of the flit is passed to the VA stage to obtain a
virtual channel towards the destination router. Contention may occur among data packets whose
direction requests target the same virtual channel. Data packets that have not been granted a
virtual channel wait in the VA stage and start transferring once the current flit has reached
the next router, thereby avoiding contention failures. Because all virtual channels are
multiplexed onto one buffer queue, no flit can block other data packets that are ready to be
routed through the physical channel. In a typical NoC structure, routers communicate through
unidirectional channels, whereas in Bi-NoC data can be transferred in either direction on any
channel, thereby improving bandwidth utilization. To configure the channels dynamically, a
channel control module is added to each directional channel. The proposed design can use each
channel as either an input or an output, so the width of the channel request from the RC stage
is doubled. Two bi-directional channels can be requested for data transfer in each output
direction, which decreases contention by sending data packets in the same direction
simultaneously. Hence, the channel control module has two functions: dynamic configuration and
maintaining the channel request. As a bi-directional channel is shared by a pair of neighbour
routers, each output transition is authorized by the channel control protocol of the two
routers. The channel control protocol is implemented as an FSM module to obtain higher
efficiency. The other responsibility is maintaining the channel status as blocked or unblocked,
depending on the state of the channel. When a channel is available for use, the arbiter sends
the request to the SA module to process the channel allocation [16]. The highlight of this
structure is the replacement of unidirectional channels with bi-directional ones, which enhances
channel utilization and flexibility without requiring additional bandwidth.
CHAPTER 5
5.1.1 VERILOG:
In the semiconductor and electronic design industry, Verilog is a hardware description language
(HDL) used to model electronic systems. Verilog HDL is most commonly used in the design,
verification, and implementation of digital logic chips at the register-transfer level of
abstraction. It is also used in the verification of analog and mixed-signal circuits.
Overview :
Hardware description languages such as Verilog differ from software programming languages
because they include ways of describing the propagation of time and signal dependencies
(sensitivity). There are two assignment operators, a blocking assignment (=), and a non-blocking
(<=) assignment. The non-blocking assignment allows designers to describe a state-machine
update without needing to declare and use temporary storage variables. Since these concepts are
part of Verilog's language semantics, designers could quickly write descriptions of large circuits
in a relatively compact and concise form. At the time of Verilog's introduction (1984), Verilog
represented a tremendous productivity improvement for circuit designers who were already
using graphical schematic capture software and specially written software programs to
document and simulate electronic circuits.
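The difference between the two assignment operators can be illustrated outside Verilog. The sketch below mimics, in Python, what a simulator does for the statement pairs "a <= b; b <= a;" and "a = b; b = a;" inside one clocked block. The function names are made up for this illustration, and the model ignores everything else about Verilog scheduling.

```python
def clock_edge_nonblocking(a, b):
    """Models: a <= b; b <= a;  (both right-hand sides are read first)."""
    next_a, next_b = b, a        # evaluate all right-hand sides
    return next_a, next_b        # then update both registers together


def clock_edge_blocking(a, b):
    """Models: a = b; b = a;  (each statement completes before the next)."""
    a = b                        # a is overwritten immediately
    b = a                        # so b just receives its own old value back
    return a, b


print(clock_edge_nonblocking(1, 0))   # (0, 1): the registers swap
print(clock_edge_blocking(1, 0))      # (0, 0): the old value of a is lost
```

The non-blocking form swaps the two registers without a temporary variable, which is the property referred to above.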
The designers of Verilog wanted a language with syntax similar to the C programming
language, which was already widely used in engineering software development. Like C, Verilog
is case-sensitive and has a basic preprocessor (though less sophisticated than that of ANSI
C/C++). Its flow keywords (if/else, for, while, case, etc.) are equivalent, and its operator precedence is
compatible. Syntactic differences include variable declaration (Verilog requires bit-widths on
net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and
many other minor differences.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and
communicate with other modules through a set of declared input, output, and bidirectional ports.
Internally, a module can contain any combination of the following: net/variable declarations
(wire, reg, integer, etc.), concurrent and sequential statement blocks, and instances of other
modules (sub-hierarchies). Sequential statements are placed inside a begin/end block and
executed in sequential order within the block. However, the blocks themselves are executed
concurrently, making Verilog a dataflow language.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and
strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where
multiple sources drive a common net. When a wire has multiple drivers, the wire's (readable)
value is resolved by a function of the source drivers and their strengths.
A subset of statements in the Verilog language is synthesizable. Verilog modules that conform
to a synthesizable coding style, known as RTL (register-transfer level), can be physically
realized by synthesis software. Synthesis software algorithmically transforms the (abstract)
Verilog source into a netlist, a logically equivalent description consisting only of elementary
logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI
technology. Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint
(such as a photo mask set for an ASIC or a bit stream file for an FPGA).
Create your own custom processing platform while reducing your system cost by consolidating
external functions into an FPGA. Select the perfect balance of feature and size for your system,
and optimize hardware/software design trade-offs for the best price-performance results that
meet your exacting requirements.
The ISE Design Suite System Edition provides a comprehensive suite of integrated development
environment, software tools, configuration wizards, and IP that facilitates your design and
utilizes all of the flexibility offered by a programmable platform. The Xilinx CORE Generator™
System, included in all editions of the ISE Design Suite, accelerates design time by providing
access to highly parameterized Intellectual Property (IP) cores for Xilinx FPGAs. The
available user-customizable IP functions range in complexity from
commonly used functions, such as memories and FIFOs, to system-level building blocks, such
as filters and transforms. Using these IP blocks can save days to months of design time. The
highly optimized IP allows FPGA designers to focus efforts on building designs quicker while
helping bring products to market faster.
The ISE Design Suite Embedded Edition includes all the tools and capabilities of the Logic
Edition with the added capabilities of the Embedded Development Kit (EDK). This pre-
configured kit is an integrated software solution for designing embedded processing systems,
which includes the Platform Studio tool suite as well as all the documentation and IP required
for designing Xilinx Platform FPGAs with embedded PowerPC® hard processor cores and
MicroBlaze™ soft processor cores. This edition provides an integrated development
environment of embedded processing tools, processor cores, IP, software libraries, and design
generators, including the following:
Library Generation Tool (LibGen) - configures libraries, device drivers, file systems,
and interrupt handlers for the embedded processor system to create a software platform.
Bitstream Initializer (BitInit) - updates a device configuration bit stream to initialize
the on-chip instruction memory with the software executable. For more information, see
the "Bitstream Initializer (BitInit)" chapter of the Embedded System Tools Reference
Manual and the “Initializing Software Overview” topic in the XPS Help.
System Generator for DSP - allows you to define and verify complete DSP systems
using industry-standard tools from The MathWorks. When using System Generator,
previous experience with Xilinx devices or RTL design methodologies is not required.
Designs are captured in the DSP-friendly Simulink® modeling environment using a
Xilinx-specific block set. All of the downstream synthesis and implementation steps are
automatically performed to generate a device programming file.
AccelDSP Synthesis Tool - allows you to transform a MATLAB floating-point design
into a hardware module that can be implemented in a Xilinx device. The AccelDSP
Synthesis Tool features an easy-to-use graphical interface that controls an integrated
environment with other design tools such as MATLAB tools, ISE software, and other
industry-standard HDL simulators and logic synthesizers. AccelDSP Synthesis provides
the following capabilities:
• Reads and analyzes a MATLAB floating-point design.
The following steps are involved in the realization of a digital system using Xilinx FPGAs, as
illustrated by the following figure.
Figure 5.1: Overview of the various steps involved in the design flow of a digital system
Design Entry
The first step is to enter your design. This can be done by creating “Source” files. Source files
can be created in different formats such as a schematic or a Hardware Description Language
(HDL) such as VHDL or Verilog. A project design will consist of a top-level source file and
various lower-level source files. Any of these files can be either a schematic or an HDL file.
Design Synthesis
The synthesis step creates netlist files from the various source files. The netlist files can serve as
input to the implementation module.
Device Configuration
This refers to the actual programming of the target FPGA by downloading the programming file
to the Xilinx FPGA.
5.1.3. Modelsim:
Modelsim is a verification and simulation tool for VHDL, Verilog, System Verilog, and mixed
language designs. Modelsim is a powerful simulator that can be used to simulate the behavior
and performance of logic circuits. Modelsim is an easy-to-use yet versatile VHDL/ (System)
Verilog/ SystemC simulator by Mentor Graphics. It supports behavioral, register transfer level,
and gate-level modeling. Modelsim supports all platforms used here at the institute of Digital
and Computer Systems (i.e. Linux, Solaris and Windows) and many others too.
5.2.1. FPGA:
FPGA implementations have the potential to be parallel using a mixture of these two forms. For
example, the FPGA could be configured to partition the image and distribute the resulting
sections to multiple pipelines all of which could process data concurrently. Such parallelization
is subject to the processing mode and hardware constraints of the system.
Image processing is difficult to achieve on a serial processor. This is due to the large data set
required to represent the image and the complex operations that need to be performed on the
image. Consider video rates of 25 frames per second, a single operation performed on every
pixel of a 768×576 color image (standard PAL frame) equates to about 33 million operations per
second. FPGA consists of a matrix of logic blocks that are connected by an interconnect
network. Both the logic blocks and the interconnect network are reprogrammable allowing
application specific hardware to be constructed, while at the same time maintaining the ability to
change the functionality of the system with ease. As such, an FPGA offers a compromise
between the flexibility of general purpose processors and the hardware-based speed of ASICs.
There are three modes of processing: stream, offline and hybrid processing. In stream
processing, data is received from the input device in a raster nature at video rates. Memory
bandwidth constraints dictate that as much processing as possible can be performed as the data
arrives. In offline processing there is no timing constraint. This allows random access to
memory containing the image data. The speed of execution in most cases is limited by the
memory access speed. The hybrid case is a mixture of stream and offline processing. In this
case, the timing constraint is relaxed so the image is captured at a slower rate. While the image
is streamed into a frame buffer it can be processed to extract the region of interest. This region
can be processed by an offline stage which would allow random access to the region’s elements.
If there is no requirement on processing time then the constraint on timing is relaxed and the
system can revert to offline processing. This is often the result of a direct mapping from a
software algorithm. The constraint on bandwidth is also eliminated because random access to
memory is possible and desired values in memory can be obtained over a number of clock
cycles with buffering between cycles. Offline processing in hardware therefore closely
resembles the software programming paradigm; the designer need not worry about constraints to
any great extent. This is the approach taken by languages that map software algorithms to
hardware. The goal is to produce hardware that processes the input data as fast as possible given
various automatic and manual optimization techniques.
Frame buffering requires large amounts of memory. The size of the frame buffer depends
on the transform itself. In the worst case (rotation by 90º, for example) the whole image must
be buffered. A single 24-bit (8 bits per color channel) color image with 768×576 pixels requires
about 1.3 MB of memory. FPGAs have very limited amounts of on-chip RAM. The logic blocks
themselves can be configured to act like RAM, but this is usually an inefficient use of the logic
blocks. Typically some sort of off-chip memory is used, but this only allows a single access to
the frame buffer per clock cycle, which can be a problem for the many operations that require
simultaneous access to more than one pixel from the input image. For example, bilinear
interpolation requires simultaneous access to four pixels from the input image. This will be on a
per clock cycle basis if real-time processing constraints are imposed.
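The PAL figures quoted in this section can be checked with a few lines of arithmetic. The calculation assumes one byte per color channel and, for the operation count, one operation per color sample (three per pixel); the variable names are chosen only for this illustration.

```python
WIDTH, HEIGHT = 768, 576        # standard PAL frame
CHANNELS = 3                    # 8 bits per color channel, 24 bits per pixel
FRAME_RATE = 25                 # frames per second

pixels = WIDTH * HEIGHT
frame_bytes = pixels * CHANNELS                    # one byte per color sample
ops_per_second = pixels * CHANNELS * FRAME_RATE    # one operation per sample

print(pixels)            # 442368 pixels per frame
print(frame_bytes)       # 1327104 bytes, roughly 1.3 MB
print(ops_per_second)    # 33177600, roughly 33 million per second
```

Counting one operation per pixel rather than per color sample would give about 11 million operations per second instead.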
In the implementation of image enhancement algorithms the Spartan®-3E FPGA is used to take
advantage of the different input and output interfaces to implement and verify the system.
The Spartan®-3E FPGA Starter Kit board supports a variety of FPGA configuration options:
Download FPGA designs directly to the Spartan-3E FPGA via JTAG, using the onboard
USB interface. The on-board USB-JTAG logic also provides in-system programming
for the on-board Platform Flash PROM and the Xilinx XC2C64A CPLD. SPI serial
Flash and Strata Flash programming are performed separately.
Program the on-board 4 Mbit Xilinx XCF04S serial Platform Flash PROM, then
configure the FPGA from the image stored in the Platform Flash PROM using Master
Serial mode.
Program the on-board 16 Mbit ST Microelectronics SPI serial Flash PROM, then
configure the FPGA from the image stored in the SPI serial Flash PROM using SPI
mode.
Program the on-board 128 Mbit Intel Strata Flash parallel NOR Flash PROM, then
configure the FPGA from the image stored in the Flash PROM using BPI Up or BPI
Down configuration modes. Further, an FPGA application can dynamically load two
different FPGA configurations using the Spartan-3E FPGA’s Multi Boot mode.
The proposed system is implemented on the Spartan 3E development board. A brief overview of the
Spartan 3E board is given in section 5.2.
CHAPTER 6
Results And Discussions
6.1 Creating a New Project
Xilinx Tools can be started by clicking on the Project Navigator Icon on the Windows
desktop. This should open up the Project Navigator window on your screen. This window
shows the last accessed project.
Fig 6.1: Vivado Project Navigator window (snapshot from Vivado software)
Opening a project: select File->New Project to create a new project. This will bring up a new
project window on the desktop. Fill in the necessary entries as follows:
Fig 6.2 : New Project Initiation window (snapshot from Vivado software)
Project Location: The directory where you want to store the new project. (Note: DO NOT
specify the project location as a folder on the Desktop or a folder in the Xilinx\bin directory.
Your D: drive is the best place to put it. The project location path must NOT have any spaces
in it, e.g. D:\ABCD\TA\new lab\sample exercises\o_gate is NOT to be used.) Leave the top
level module type as HDL.
Example: If the project name were “RoBA”, enter “RoBA” as the project name and then
click “Next”.
Fig 6.3: Adding source code into the project (snapshot from Vivado software)
In this step the source code files corresponding to the block diagram of the proposed system
are added to the Vivado project.
Fig 6.4: Selecting the Board Required (snapshot from Vivado software)
In this step we select the ZedBoard Zynq Evaluation and Development Kit as the target board onto
which the code will be downloaded, and then click Next.
Other boards can also be used. The ZedBoard was selected here with power efficiency, delay and
area in mind.
Make sure that all the files are shown with green marks and then click OK to continue.
A window then opens indicating that the project will be created on clicking Finish.
Select the uut _ POSIT multiplier, set it as the Top in the editor window, and open the
elaborated design to get the RTL diagram. Then run synthesis and implementation.
6.7: Code
Fig 6.7.1
Fig 6.7.2
Fig 6.7.3
Fig 6.7.4
CHAPTER 7
Simulation And Synthesis Report.
These are the RTL diagrams from the opened elaborated design.
7.3 : Area
7.5 : Delay
CHAPTER 8
CONCLUSION
An advanced FIFO structure based NoC is simulated and synthesized in Xilinx ISE 14.7 and
implemented on a Virtex-6 FPGA device to analyze the performance in terms of occupied area,
latency, power consumption and throughput. A single router is designed initially, and then a
mesh-based NoC is designed to assess the memory utilization of the FPGA. Fig.4 shows the
Register Transfer Level (RTL) schematic of a single NoC router, which is composed of input
and output ports, an arbiter, a crossbar and channel control modules. The figure also describes
the memory utilization of each component individually. Each module of the NoC is designed
separately in Verilog Hardware Description Language (HDL) and then integrated into one
module. An advanced queued buffer is designed for both the typical NoC and the Bi-directional
NoC so that the two designs can be compared easily.
The simulation results are analyzed for area utilization (number of occupied slices, LUT-FF
pairs and slice registers), latency (delay and maximum operating frequency), power
consumption (dynamic power dissipation), memory utilization (number of RAMs) and, finally,
throughput (flits per second per node). The results describe the performance of the NoC router
in terms of area, delay and power consumption, obtained by implementing the proposed design
on the FPGA. From Fig.5, it is clear that the proposed design shows less area overhead because
the queued buffer is shared between neighbouring routers and data flits are used to transfer a
data packet between source and destination. The number of memory units (RAMs) is also lower
because only active components use the buffer, whereas idle modules do not use RAMs. The
delay of the proposed design is lower, and correspondingly the operating frequency is higher,
because more channels (both physical and virtual) are available between source and destination.
The total power consumption is slightly higher than in existing work because the virtual
channels increase dynamic power consumption during data packet transfer.
NoC is the solution for intercommunication of SoC, such as parallel communication wires, and
also removes the barriers of bus-based
communication. In this paper, an advanced memory unit is proposed and implemented in Bi-NoC
to achieve a lower buffer memory requirement and also high performance in terms of maximum
operating bandwidth. Compared to previous work, the proposed work improves delay by
approximately 28% and resource utilization by approximately 17%. As RingNet[15] uses a
round-robin arbiter, its resource utilization is higher than that of the proposed work. A data
packet is divided into a number of flits and the queued buffer is shared between neighbouring
routers, so the required buffer size is smaller when data is transferred as flits from source to
destination. This advanced router design is integrated in the Bi-NoC configuration to achieve
higher data transfer speed compared to a typical NoC. Virtual channels are created between
routers when a data flit is blocked because a physical channel is not available; therefore data
packet latency is reduced and deadlock is avoided. The implementation results are improved
in terms of resource utilization when compared with existing work. In future, NoC-based
processors will be used in Artificial Intelligence applications. The performance of the NoC
needs to be improved further by advancing the router components, because the virtual channels
in the advanced FIFO structure increase power consumption.
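The flit-based transfer through a queued buffer described above can be illustrated with a small behavioural model. This is a minimal Python sketch under stated assumptions, not the Verilog design of this project: the names `FlitFifo`, `packet_to_flits` and `transfer`, the flit size and the buffer depth are all hypothetical, and buffer sharing between routers and virtual channels are not modelled.

```python
from collections import deque


class FlitFifo:
    """Bounded first-in, first-out buffer holding flits (behavioural model only)."""

    def __init__(self, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self._slots = deque()

    @property
    def empty(self):
        return not self._slots

    @property
    def full(self):
        return len(self._slots) == self.depth

    def write(self, flit):
        """Store one flit; return False (back-pressure) when the buffer is full."""
        if self.full:
            return False
        self._slots.append(flit)
        return True

    def read(self):
        """Return the oldest flit, or None when the buffer is empty."""
        return self._slots.popleft() if self._slots else None


def packet_to_flits(payload, flit_bytes):
    """Split a packet payload into fixed-size flits; the last flit may be shorter."""
    if flit_bytes < 1:
        raise ValueError("flit_bytes must be at least 1")
    return [payload[i:i + flit_bytes] for i in range(0, len(payload), flit_bytes)]


def transfer(payload, flit_bytes, depth):
    """Move a packet through one FIFO, reading a flit out whenever it is full."""
    fifo = FlitFifo(depth)
    received = []
    for flit in packet_to_flits(payload, flit_bytes):
        while not fifo.write(flit):      # buffer full: destination reads one flit
            received.append(fifo.read())
    while not fifo.empty:                # drain whatever is still queued
        received.append(fifo.read())
    return b"".join(received)


# A 10-byte packet sent as 4-byte flits through a 2-entry buffer arrives intact.
assert transfer(b"ABCDEFGHIJ", 4, 2) == b"ABCDEFGHIJ"
```

The sketch shows why dividing a packet into flits lets the buffer be smaller than the packet: the 2-entry buffer never holds more than two flits, yet the whole packet is delivered in order.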
Many future work directions are inspired by this paper including exploiting the
mathematical properties of the code space to find additional nonorthogonal codes and boost the
CDMA interconnect capacity and exploring more architectural optimizations of the OCI
crossbar. Studying the robustness of CDMA interconnects and their enhancement techniques will
be one of the priority points of future research. Moreover, we plan to investigate using the OCI-based
routers in different network topologies, evaluate their performance using standard benchmarks,
and study their suitability for various applications.