Nikos Paper2
Abstract
This paper proposes a system architecture and a set of procedures for a combined H/W S/W
implementation of the ETHERNET protocol in a DSP based environment. The architecture comprises a
protocol dedicated H/W block, a programmable DSP core and a uP. The ETHERNET MAC layer protocol
functions are partitioned between the DSP S/W and the dedicated H/W block. This approach allows the
minimization of dedicated H/W functions, thus ensuring maximum flexibility and economizing valuable die
resources for possible IC implementation while offering the capability for functional modifications per
application by amending DSP S/W in the context of a given system architecture. The paper starts with a
description of the system architecture. The generic functional blocks and the implementation of protocol
functions in each one are described and the relative information flows are elaborated. Next, the
implementation of the ETHERNET level 2 protocol functions is discussed and the criteria for function
partitioning are presented. The paper concludes with a discussion of the implementation parameters and
performance.
System structure
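[Figure: System structure. Blocks shown: DSP and Micro Controller, Code RAM, Data Memory Bank, Ethernet-dedicated H/W block (DHB), Transceiver. Signals between the DHB and the Transceiver: TXD, TEN, COL, TCLK, CD, RXD, RCLK. Layer labels: Upper Layers, MAC Layer, Layer 1.]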
In the context described above, the DSP together with the DHB implement the functions of the MAC layer
of the IEEE 802.3 protocol. Part of the MAC functions are implemented by S/W in the DSP and another
part by the DHB.
2.1
The DHB contains two functional blocks: the receiver and the transmitter.
2.1.1 Receiver H/W block
The Receiver H/W block contains one 64-byte receive memory bank which is DSP memory mapped. It also
contains a Deserializer unit. When a frame is received from the medium, the transceiver activates the CD
signal and drives the bits of the received frame on the RXD pin one by one and synchronously with the
RCLK pin. The Deserializer unit accepts these bits, assembles them in bytes and stores them in successive
byte positions of the receive memory bank. The 64-byte memory bank is managed by the Deserializer as
two 32-byte switchable FIFOs. The Deserializer writes 32 successive bytes in one FIFO, then 32
successive bytes to the other and so on. This allows DSP read access to one half of the receive memory
while the other half is written by the Deserializer. When one of the two FIFOs has been filled by the
Deserializer, an interrupt indicating this event is issued to the DSP so that it can transfer the 32 received
frame bytes to the external data memory (DMB). CRC calculation is carried out on the fly over the bits of
the received frame. The calculated CRC value is compared to the CRC value carried in the frame at the end
of reception and a DSP accessible pass/fail flag is updated for each new frame.
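The switchable-FIFO scheme described above can be modelled in a few lines of C. This is only a behavioural sketch of the idea (one half being filled while the other is read), not the actual H/W design; the type and function names (rx_bank_t, rx_bank_push, rx_bank_drain) are hypothetical and chosen for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define HALF_SIZE 32u /* bytes per FIFO half, as stated in the text */

/* Behavioural model of the 64-byte receive memory bank (hypothetical names). */
typedef struct {
    uint8_t  bank[2 * HALF_SIZE]; /* the two 32-byte halves, back to back */
    unsigned write_half;          /* half currently written by the Deserializer */
    unsigned fill;                /* bytes already written into that half */
    bool     half_full[2];        /* models the FIFO1_Full / FIFO2_Full events */
} rx_bank_t;

static void rx_bank_init(rx_bank_t *b)
{
    b->write_half = 0;
    b->fill = 0;
    b->half_full[0] = false;
    b->half_full[1] = false;
}

/* Deserializer side: store one assembled byte. Returns true when a half has
 * just been filled, i.e. the moment the H/W would raise the interrupt. */
static bool rx_bank_push(rx_bank_t *b, uint8_t byte)
{
    b->bank[b->write_half * HALF_SIZE + b->fill] = byte;
    if (++b->fill < HALF_SIZE)
        return false;
    b->half_full[b->write_half] = true;
    b->write_half ^= 1u; /* continue writing in the other half */
    b->fill = 0;
    return true;
}

/* DSP side: copy a full half out and release it. Returns the bytes copied,
 * or 0 if that half is not full. */
static unsigned rx_bank_drain(rx_bank_t *b, unsigned half, uint8_t *dst)
{
    if (!b->half_full[half])
        return 0;
    for (unsigned i = 0; i < HALF_SIZE; i++)
        dst[i] = b->bank[half * HALF_SIZE + i];
    b->half_full[half] = false;
    return HALF_SIZE;
}
```

In this model the DSP has one full half-period (32 byte times) to drain a half before the Deserializer wraps around and needs it again.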
The Receiver H/W block reports protocol and system events to the DSP by activating interrupt signals
towards the Interrupt Control module which in turn activates an interrupt signal towards the DSP.
All interfacing functions between the DSP and the Receiver H/W block are carried out via the DSP internal
data and address buses. To match the block to the DSP data and address buses, an Address Decoder and a
Multiplexer are included in the block. The Address Decoder decodes the Address bus in order to generate
the FIFO read signals and the control and status register signals. The Multiplexer is controlled by the
Control Module and switches the data flow between the two FIFOs.
Interfacing with the external Ethernet transceiver is carried out via the following external pins:
RCLK Serial clock (10 MHz)
RXD Serial data
CD Carrier Detect
The hardware modules of the Receiver H/W block and the basic interconnection signals are shown in
Figure 2:
[Figure 2: Receiver H/W block. DSP side: Address_Bus, Data_Bus, DSP_clk, Interrupt. Ethernet side: RCLK, RXD, CD. Blocks and signals shown: Address Decoder, Control, Status_reg, Frame_length, MUX, FIFO1, FIFO2, FIFO1_Full, FIFO2_Full, DeMUX, Deserializer, CRC, CRC_en, CRC Comparator, CRC_flag, Interrupt Controller.]
The hardware modules of the Transmitter H/W block and the basic interconnection signals are shown in
Figure 3:
[Figure 3: Transmitter H/W block. DSP side: Address_Bus, Data_Bus, DSP_clk, Interrupt. Ethernet side: COL, TEN, TCLK, TXD. Blocks and signals shown: Address Decoder, Control, Status_reg, End_of_trans, JAM, JAM_oe, FIFO1, FIFO2, FIFO1_Empty, FIFO2_Empty, MUX, Serializer, CRC, CRC_oe, CRC_en, Interrupt Controller.]
State name
Startup_Wait_For_Carrier_Off
Ready_To_Receive_Not_Transmit
Ready_To_Receive_And_Transmit
Receiving
Pending_6usec_After_Reception
Pending_3.6usec_After_Reception
Pending_9.6usec_After_Reception
Transmitting
exchange necessary information through semaphores or dedicated memory positions. Basic state machine
information, such as current state, is stored in common variables accessible to all ISRs.
The implementation of the MAC functionality is achieved by constructing each ISR as a jump tree that runs
specific C functions which implement specific functions of the MAC protocol. The C functions that will be
executed are determined by the signal which has been received and the current state.
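A minimal C sketch of such a jump tree is given here for two of the signals (CarrierSense and NoCarrierSense). The two transitions shown are the ones described in the Normal Frame Reception scenario of this paper; the ISR names, the enum spelling of the states and the stub command functions are assumptions made for illustration and are not the original DSP code.

```c
/* MAC states used in this sketch (names adapted from the state table). */
typedef enum {
    STARTUP_WAIT_FOR_CARRIER_OFF,
    READY_TO_RECEIVE_NOT_TRANSMIT,
    READY_TO_RECEIVE_AND_TRANSMIT,
    RECEIVING,
    PENDING_6USEC_AFTER_RECEPTION,
    TRANSMITTING
} mac_state_t;

/* Common variable holding the current state, visible to all ISRs. */
static volatile mac_state_t State = STARTUP_WAIT_FOR_CARRIER_OFF;

/* Hypothetical stand-ins for the H/W commands and the general timer;
 * here they only count calls so the sketch can be exercised on a host. */
static unsigned start_rcv_count, stop_rcv_count, timer_6us_count;
static void command_start_rcv(void)       { start_rcv_count++; }
static void command_stop_rcv(void)        { stop_rcv_count++; }
static void start_timer_general_6us(void) { timer_6us_count++; }

/* ISR for the CarrierSense signal: the branch taken depends on State. */
static void isr_carrier_sense(void)
{
    switch (State) {
    case READY_TO_RECEIVE_NOT_TRANSMIT:
    case READY_TO_RECEIVE_AND_TRANSMIT:
        command_start_rcv();   /* a new frame is arriving */
        State = RECEIVING;
        break;
    default:
        break;                 /* other states are not covered by this sketch */
    }
}

/* ISR for the NoCarrierSense signal. */
static void isr_no_carrier_sense(void)
{
    switch (State) {
    case RECEIVING:
        command_stop_rcv();    /* carrier dropped: end of the Rx frame */
        start_timer_general_6us();
        State = PENDING_6USEC_AFTER_RECEPTION;
        break;
    default:
        break;
    }
}
```

Each further signal would get its own ISR with the same shape: one switch on the shared State variable, one C function call per branch.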
2.2.2 Interfacing with other system components
The interfaces between the DSP S/W and the other system components together with the global status
variables are shown in Figure 4:
[Figure 4: Interfaces between the DSP S/W and the other system components, and the global status variables.
Control Interface to Host, Interrupt Register to uP: EORF(number, bool), EOTF, LCol, ExCol, Collision, Setup_DMAC(oper).
Interrupt Register from uP: TransmitRequest, Startup/Reset.
Timers (General, Backoff), Commands to Timers: Start_TimerGeneral( ), Start_TimerBackoff( ), Stop/Reset_TimerGeneral.
DSP Flags to H/W: StartRCV, StopRCV, StartTRM, ResetRx_Machine, ResetTx_Machine.
S/W Variables: State, NewRCVFrame, FirstRFIFOfull, SecondRFIFOfull, Last_RCVPage_Read, Pending_Last_Transfer_To_Buffer, Backoff_Clear, Pending_Transmission, SecondTFIFOempty, PartOfTxFrame, Preparing_Transmission, Pending_Transmit_Request.
Interface to H/W, H/W Interrupt Register: CarrierSense, NoCarrierSense, RFIFOfull, TFIFOempty, Collision, EOCP.
H/W Flags: EOTF, RxCRC, Carrier.
DMAC Interface, Interrupt Register from DMAC: Transfer_From_Buffer_Completed, Transfer_To_Buffer_Completed.
Interrupt Register to DMAC: Transfer_From_Buffer(num), Transfer_To_Buffer(num).]
4. The Timers
The DSP may load and start two independent, programmable down counters. At the end of the count, a
dedicated interrupt for each one of the counters is issued towards the DSP. These interrupts are used for the
counting of time intervals.
2.2.3 Functionality and procedures
In the following paragraphs, the functionality of the DSP S/W is presented in detail. As described in the
Ethernet protocol specifications, Transmission and Reception are two independent functions, and a station
cannot transmit and receive at the same time. For this reason, both hardware and software are designed so
that transmission and reception constitute two independent functions. Following this principle, the
Reception functionality is described separately from the Transmission functionality.
2.2.3.1 Reception
The code implementing the Receive Direction will be presented through six different scenarios, which
demonstrate all the possible states and situations that may occur.
Normal Frame Reception
The reception of a new frame begins when the Carrier changes from 0 to 1, while in
Ready_To_Receive state. As a result of this Carrier transition, the DSP commands the H/W to start
receiving (Command_StartRCV), initializes the global variables that refer to the Receive Direction and
changes the state to Receiving.
When the first Rx FIFO full interrupt arrives, the DSP reads the words that correspond to the Destination
Address Field of the incoming frame and checks whether that address matches the current station's
address. If it does, the DSP reads the word that corresponds to the Length (Data) Field of the Rx frame,
saves the length in a variable and checks whether the length is acceptable (there should be no more than
1500 data bytes). If there is no problem, the DSP transfers the words (except the 7 preamble bytes and
the SFD byte) from the Rx FIFO to the DMA Rx Mailbox, stores the number of words remaining until the
end of the current Rx frame in the RxWordsRemaining variable and sends a DREQR interrupt to the
DMA Controller, so that the words are transferred from the DMA Rx Mailbox to the external Rx Buffer.
When the requested transfer is completed, the DMA Controller informs the DSP
(Transfer_To_Buffer_Completed Interrupt), but the DSP ignores this interrupt.
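The two checks made on the first Rx FIFO can be sketched in C as follows. The sketch assumes that the first 32-byte FIFO holds the 7 preamble bytes and the SFD byte followed by the frame header, which is what the text implies, and it only tests for an exact match with the station address. The function names and byte offsets are illustrative assumptions, and the real DSP code works on 16-bit words rather than bytes.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

enum {
    PREAMBLE_SFD_BYTES = 8,    /* 7 preamble bytes + 1 SFD byte */
    ADDR_BYTES         = 6,    /* length of an Ethernet MAC address */
    MAX_DATA_BYTES     = 1500, /* limit quoted in the text */
    DEST_OFFSET        = PREAMBLE_SFD_BYTES,
    LEN_OFFSET         = PREAMBLE_SFD_BYTES + 2 * ADDR_BYTES /* after dest + source */
};

/* True if the Destination Address Field equals the station address. */
static bool address_matches(const uint8_t *fifo, const uint8_t *station)
{
    return memcmp(fifo + DEST_OFFSET, station, ADDR_BYTES) == 0;
}

/* The Length field is sent most significant byte first. */
static uint16_t frame_length_field(const uint8_t *fifo)
{
    return (uint16_t)((fifo[LEN_OFFSET] << 8) | fifo[LEN_OFFSET + 1]);
}

/* Length check described in the text: at most 1500 data bytes. */
static bool length_acceptable(uint16_t len)
{
    return len <= MAX_DATA_BYTES;
}
```

A frame that fails either check would not be forwarded to the DMA Rx Mailbox; how the remaining FIFO interrupts of such a frame are handled is outside this sketch.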
After a while, the H/W sends another Rx FIFO full interrupt. This interrupt arrives later than the
Transfer_To_Buffer_Completed interrupt (referring to the first part of the Rx frame), because the DMA
transfer completes much faster than the Rx FIFOs are filled. The DSP reads the 16
words of the full Rx FIFO, transfers them to the DMA Rx Mailbox, updates the RxWordsRemaining
variable and sends a DREQR interrupt to the DMA Controller. When the requested transfer is completed,
the DMA Controller informs the DSP about this (Transfer_To_Buffer_Completed Interrupt), but the DSP
ignores this interrupt again.
This cycle between the interrupts Rx FIFO full, DREQR and Transfer_To_Buffer_Completed
continues until the Carrier drops (changes from 1 to 0). This Carrier transition is translated as the end of the
Rx frame. The DSP commands the H/W to stop receiving (Command_StopRCV), transfers the remaining
(according to the RxWordsRemaining variable) words of the current Rx frame to the DMA Rx
Mailbox, starts the timer of 6.0 usec and changes the state to Pending_6usec_After_Reception.
When the last Transfer_To_Buffer_Completed interrupt arrives, the whole Rx frame is now stored in the
external Rx Buffer. The DSP then reads the RxCRC_Flag to check whether the CRC of the Rx frame was
correct or wrong. Then, it resets the H/W Rx Machine (Command_ResetRx_Machine) and informs the
uP that a CRC correct / wrong Rx frame of Total Rx frames Length bytes is stored in the
external buffer.
first bytes of the Tx frame transferred from the external Tx Buffer to the DMA Tx Mailbox. Soon, a
Transfer_From_Buffer_Completed interrupt comes, acknowledging that the requested transfer has
completed. Then, the DSP sends the preamble and the SFD bytes to the Tx FIFO, reads the Length Field of
the Tx Frame and stores it in a variable (TRM_Length), transfers the words from the DMA TX Mailbox to
the Tx FIFO and requests the next transfer from the external Tx Buffer.
When the next Transfer_From_Buffer_Completed interrupt comes, the DSP moves the data from the
DMA Tx Mailbox to the other Tx FIFO, stores the number of remaining data + pad words in the
TRM_DataPadWords_Remaining variable and sends a DREQT interrupt to the DMA Controller. If
we are in a Ready_To_Transmit state, the transmission starts by sending a StartTRM Command to
the H/W. In any case, the next Transfer_From_Buffer_Completed interrupt will be ignored.
When a Tx FIFO empty interrupt arrives, the DSP moves the words from the DMA Tx Mailbox to the
emptied Tx FIFO, updates the TRM_DataPadWords_Remaining variable and requests the next
transfer of data from the external Tx Buffer to the Tx Mailbox, by sending a DREQT interrupt. The next
Transfer_From_Buffer_Completed interrupt will be ignored again.
This cycle between the interrupts Tx FIFO empty, DREQT and Transfer_From_Buffer_Completed will
continue until the Carrier changes from 1 to 0. This means that the transmission of the frame has finished.
At this moment, the DSP checks the EOTF_Flag. If the EOTF_Flag shows that the transmission of the
frame ended normally, the DSP sends an EOTF interrupt to the uP, in order to inform it that the current
Tx frame has been normally transmitted. The DSP also resets the H/W Tx Machine
(Command_ResetTx_Machine), starts the 9.6 usec timer and changes the state to
Pending_9_6usec_After_Transmission. If the EOTF_Flag shows that the
transmission did not end properly, a Collision interrupt is expected to arrive soon. What the DSP does if
a collision occurs is described below.
Collision
When a collision occurs, the H/W sends a Collision Interrupt to the DSP. The Collision Interrupt handler
checks if 64 bytes have already been transmitted. If yes, then we have a Late Collision, which is an
abnormal situation. The uP is informed about this situation via a LateCollision Interrupt. If fewer than 64
bytes have been transmitted, then we have a normal collision and the DSP informs the uP by causing a Collision
interrupt. In both cases, the DSP checks the Collision_Counter variable, which carries the number
of collisions that occurred for the current Tx frame. If its value is smaller than 16, then the DSP blocks the
beginning of transmission of the current frame via the Backoff_Clear variable, calculates the Backoff
time, according to the Binary Backoff Algorithm, starts the Backoff_Timer and changes the state to
Ready_To_Receive_Not_Transmit. If more than 15 collisions occur for the current Tx frame,
then we have an abnormal situation of excessive collision. The DSP informs the uP via an
ExcessiveCollisions Interrupt.
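The collision classification and the backoff calculation can be sketched in C as follows. The constants are the standard IEEE 802.3 values for 10 Mbit/s operation (slot time of 512 bit times, i.e. 51.2 usec, backoff exponent capped at 10, at most 16 attempts). The function names are hypothetical, and rand() merely stands in for whatever random source the DSP S/W actually uses, which the paper does not specify.

```c
#include <stdint.h>
#include <stdlib.h>

enum {
    ATTEMPT_LIMIT     = 16,  /* the frame is given up after 16 collisions */
    BACKOFF_LIMIT     = 10,  /* the exponent stops growing after 10 collisions */
    SLOT_BIT_TIMES    = 512, /* one slot time; 51.2 usec at 10 Mbit/s */
    LATE_COLL_BYTES   = 64   /* a collision after this many bytes is "late" */
};

typedef enum { COLLISION_NORMAL, COLLISION_LATE } collision_kind_t;

/* Late collision if 64 bytes have already been transmitted. */
static collision_kind_t classify_collision(unsigned bytes_sent)
{
    return bytes_sent >= LATE_COLL_BYTES ? COLLISION_LATE : COLLISION_NORMAL;
}

/* True while another retransmission attempt is still allowed. */
static int retry_allowed(unsigned collision_counter)
{
    return collision_counter < ATTEMPT_LIMIT;
}

/* Largest number of slots that may be drawn after the n-th collision:
 * 2^min(n, 10) - 1. */
static uint32_t backoff_max_slots(unsigned n)
{
    unsigned k = n < BACKOFF_LIMIT ? n : BACKOFF_LIMIT;
    return (1u << k) - 1u;
}

/* Backoff delay in bit times: a random slot count in [0, max] times the slot. */
static uint32_t backoff_bit_times(unsigned n)
{
    uint32_t r = (uint32_t)rand() % (backoff_max_slots(n) + 1u);
    return r * SLOT_BIT_TIMES;
}
```

The value returned by backoff_bit_times would be converted to timer ticks and loaded into the Backoff timer; the conversion depends on the timer clock, which is not given in the text.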
2.3 Partitioning criteria
The implementation of any function of the Ethernet IEEE 802.3 protocol can be carried out either in
hardware by using an FPGA chip or in software by programming the DSP core. Hardware implementation
is fast, but it is not reconfigurable and consumes area on the die. Software implementation is flexible and
easily reconfigurable but consumes on chip RAM and ROM capacity. Generally speaking, simple functions
that are repeated many times are preferable for hardware implementation to avoid extensive depletion of
DSP processing capacity; complicated functions, which are not repeated often, are suitable for DSP
software implementation. Consequently, the implementation of any function can be optimized by careful
partitioning between hardware and software. A schematic description of how partitioning optimization can
be achieved is depicted in the following figure, where the hardware and software resources required for
implementation are plotted as a function of partitioning percentage in the same graph. The sum of the
hardware and software resources gives an overall cost function, which has to be minimized to obtain the
optimum implementation.
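[Figure: Hardware Resources, Software Resources and Total Resources plotted against the partitioning percentage, from 100% Hardware to 100% Software. The optimum design is marked on the Total Resources curve.]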
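The optimization described in this section can be written compactly. The symbols are introduced here for illustration only and do not appear in the original figure: p is the fraction of the functionality implemented in software, and R_HW and R_SW are the hardware and software resources required at that partitioning.

```latex
R_{\mathrm{total}}(p) = R_{\mathrm{HW}}(p) + R_{\mathrm{SW}}(p),
\qquad
p^{*} = \arg\min_{0 \le p \le 1} R_{\mathrm{total}}(p)
```

Since R_HW falls and R_SW rises as p goes from 0 (100% hardware) to 1 (100% software), the minimum of the sum marks the optimum design. This assumes both resource types are expressed in a common unit, such as die area.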
For the same reasons, the CRC Calculation Function in the transmit direction is implemented in H/W. It
has been found that approximately 4200 DSP cycles (i.e. 164 MIPS) are needed for the transfer of 32 bytes
of a frame from the mailbox to the Tx FIFO, including CRC calculation in S/W, while no more than 200
cycles (i.e. 7.8 MIPS) are required for the same function if the CRC is calculated in H/W. In addition, the
H/W module also sends the JAM signal if a collision occurs, as it is time critical for the protocol
to transmit the JAM pattern as soon as the collision is detected.
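To illustrate why a software CRC is expensive, the following C function computes the Ethernet frame check sequence (CRC-32) one bit at a time. It is a generic textbook formulation, not the code measured by the authors: every byte costs eight shift-and-XOR steps, which is the kind of simple, highly repetitive work that the partitioning criteria assign to H/W.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 as used for the Ethernet FCS: reflected polynomial
 * 0xEDB88320, initial value 0xFFFFFFFF, final bit inversion. */
static uint32_t crc32_ethernet(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, zero otherwise */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

Table-driven variants reduce the cycle count per byte at the cost of lookup-table memory, which is itself a scarce resource on the DSP.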
2.3.2 S/W functions
In the receive direction, the DSP S/W implements the rest of the MAC functions of the Ethernet protocol.
First of all, it has been found that only about 35 cycles (i.e. 1.3 MIPS) are required for the Address
Matching Module, which identifies whether the incoming frame is targeted for the specific station or not.
Apart from this, the DSP S/W detects and discards short and runt packets.
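The MIPS figures quoted in this section are consistent with dividing the cycle count by the time one 32-byte block occupies the 10 Mbit/s medium (32 x 8 bits / 10 Mbit/s = 25.6 usec). The conversion below is a sketch under two assumptions that the text does not state explicitly: the cycle counts are per 32-byte block, and the DSP executes one instruction per cycle. The function names are hypothetical.

```c
/* Time in usec that `bytes` bytes occupy a link of `mbit_per_s` Mbit/s. */
static double block_time_usec(unsigned bytes, double mbit_per_s)
{
    return bytes * 8.0 / mbit_per_s;
}

/* Cycles per block divided by usec per block gives million cycles per
 * second, taken here as MIPS (one instruction per cycle assumed). */
static double cycles_to_mips(unsigned cycles, unsigned bytes, double mbit_per_s)
{
    return cycles / block_time_usec(bytes, mbit_per_s);
}
```

With these assumptions, cycles_to_mips(4200, 32, 10.0) is about 164, cycles_to_mips(200, 32, 10.0) is about 7.8, and cycles_to_mips(35, 32, 10.0) is about 1.4, in line with the rounded figures given in the text.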
In the transmit direction, the DSP S/W controls the flow of data from the uP to the Tx FIFO. It requests
data bytes from the uP and writes them to the Tx FIFO. It also starts the H/W Transmitter.
Implementation parameters
The implementation is realised by designing two functionally independent blocks, namely the transmitter
and the receiver, and combining them into an FPGA chip. According to the LCA architecture of FPGAs,
the area consumed in such a design is measured in Configurable Logic Blocks (CLBs). A CLB contains
some memory cells, some flip-flops and some combinational logic, depending on the FPGA Vendor. The
architecture described in former paragraphs has been implemented in a XILINX XC4044XLC-3 FPGA,
consuming approximately 1300 CLBs.
3.1 H/W parameters
The number of CLBs necessary for the implementation of the basic internal elements of the
transmitter and the receiver block is shown in the following table:
Packed CLBs
Element Name            Receiver         Transmitter
Address Decoder         92               92
Control                 22               16
FIFO                    308 (2 x 154)    308 (2 x 154)
CRC                     14               14
CRC Comparator          19               -
Deserializer            195              -
Serializer              -                127
Bus Multiplexer         -                16
JAM                     -                22
SUM                     650              595
Interrupt Controller    82
TOTAL                   1327 (including Interrupt Controller)
Decoding element are common to both modes. Thus, they are redesigned adding some CLBs and
interconnection nets to the chip.
The post-optimization area consumption is shown in the following table:
Packed CLBs
Element Name            Receiver    Transmitter    Common
Address Decoder         -           -              106
Control                 22          16             -
FIFO                    -           -              354
CRC                     -           -              15
CRC Comparator          19          -              -
Deserializer            195         -              -
Serializer              -           127            -
Bus Multiplexer         -           16             -
JAM                     -           22             -
Interrupt Controller    -           -              82
TOTAL                   974
3.2
As mentioned before, a considerable portion of the MAC functionality is implemented by the DSP S/W.
The program code written for the support of the Ethernet functionality regarding the architecture described
above occupies 1.6 KB of the DSP internal memory. In addition to this, it has been found that
approximately 10 MIPS are required in the transmit direction of the DSP S/W while approximately 12
MIPS are required in the receive direction. The difference in DSP performance between the two directions
is due to the extra processing performed by the DSP during reception (address matching, frame-length error
recognition).
3.3
The architecture described in the preceding paragraphs provides full flexibility in the partitioning of the
Ethernet functionality. A complete software implementation in the DSP would require
more memory for the program code. Thus, more die area would be necessary, given that memory occupies
the greatest percentage of the chip. The hardware functions can be realized in a low-cost common FPGA
part without affecting the performance of the system. On the other hand, a complete implementation of the
Ethernet functionality in hardware requires a larger and faster FPGA part, thus increasing
the cost.