Reed Solomon

Abstract
Chapter 1
1.1 OVERVIEW
1.2 LITERATURE SURVEY
1.3 OBJECTIVE OF THESIS
6.2 Behavioral Level
6.3 Register-Transfer Level
6.4 Gate Level
6.5 History of Verilog
6.6 Features of Verilog HDL
6.7 Simulation
6.8 Synthesis
Concurrent Error Detection in Reed–Solomon
Encoders and Decoders
ABSTRACT
Reed–Solomon (RS) codes are widely used to identify and correct errors
in transmission and storage systems. When RS codes are used in highly reliable
systems, the designer should also take into account the occurrence of faults in
the encoder and decoder subsystems. In this project, self-checking RS encoder
and decoder architectures are presented. The RS encoder architecture exploits
some properties of the arithmetic operations in GF(2^m).
CHAPTER 1
INTRODUCTION
A digital communication system is used to transport an information-bearing signal from the
source to a user destination via a communication channel. The information signal is
processed in a digital communication system to form discrete messages, which makes the
information more reliable for transmission. Channel coding is an important signal
processing operation for the efficient transmission of digital information over the
channel. It was introduced by Claude E. Shannon in 1948, using the channel capacity
as an important parameter for error-free transmission. In channel coding, the number of
symbols in the source-encoded message is increased in a controlled manner in order to
facilitate two basic objectives at the receiver: error detection and error correction. Error
detection and error correction to achieve good communication are also employed in
electronic devices, where they are used to reduce the effect of noise and interference. The
amount of error detection and correction required and its effectiveness
depend on the signal-to-noise ratio (SNR).
1.1 OVERVIEW
In source coding, the encoder maps the digital signal generated at the source output into another
signal in digital form. The objective is to eliminate or reduce redundancy so as to provide
an efficient representation of the source output. Since the source encoder mapping is one-
to-one, the source decoder at the other end simply performs the inverse mapping,
thereby delivering to the user a reproduction of the original digital source output. The
primary benefit thus gained from the application of source coding is a reduced bandwidth
requirement.
In channel coding, the objective for the encoder is to map the incoming digital signal into
a channel input, and for the decoder to map the channel output into an output signal in
such a way that the effect of channel noise is minimized. That is, the combined role of the
channel encoder and decoder is to provide for reliable communication over a noisy
channel. This is achieved by introducing redundancy in a prescribed fashion in
the channel encoder and exploiting it in the decoder to reconstruct the original encoder
input as accurately as possible. Thus in source coding redundant bits are removed,
whereas in channel coding redundancy is introduced in a controlled manner.
Modulation is then performed for the efficient transmission of the signal over the
channel. Various digital modulation techniques can be applied, such as
Amplitude-Shift Keying (ASK), Frequency-Shift Keying (FSK) or Phase-Shift Keying
(PSK). The addition of redundancy in the coded messages implies the need for increased
transmission bandwidth. Moreover, the use of coding adds complexity to the system,
especially for the implementation of decoding operations in the receiver. Thus bandwidth
and system complexity have to be considered in the design trade-offs when error-control
coding is used to achieve acceptable error performance.
Different error-correcting codes can be used depending on the properties of the system
and the application in which the error correction is to be introduced. Generally, error-
correcting codes are classified into block codes and convolutional codes. The
distinguishing feature for this classification is the presence or absence of memory in the
encoders of the two codes. To generate a block code, the incoming information stream is
divided into blocks and each block is processed individually by adding redundancy in
accordance with a prescribed algorithm. The decoder processes each block individually
and corrects errors by exploiting the redundancy.
Many of the important block codes used for error – detection are cyclic codes. These are
also called cyclic redundancy check codes.
In a convolutional code, the encoding operation may be viewed as the discrete-time
convolution of the input sequence with the impulse response of the encoder. The duration
of the impulse response equals the memory of the encoder. Accordingly, the encoder for
a convolutional code operates on the incoming message sequence using a “sliding
window” equal in duration to its own memory. Hence in a convolutional code, unlike a
block code where codewords are produced on a block-by-block basis, the channel
encoder accepts message bits as a continuous sequence and thereby generates a continuous
sequence of encoded bits at a higher rate.
1.2 LITERATURE SURVEY
Channel coding is a widely used technique for the reliable transmission and reception of
data. Generally, systematic linear cyclic codes are used for channel coding. In 1948,
Shannon introduced the linear block codes for complete correction of errors [1]. Cyclic
codes were first discussed in a series of technical notes and reports written between 1957
and 1959 by Prange [2], [3], [4]. This led directly to the work on BCH codes published in
March and September of 1960 by Bose and Ray-Chaudhuri [5], [6], [7]. In 1959,
Irving Reed and Gus Solomon described a new class of error-correcting codes called
Reed-Solomon codes [8]. Originally, Reed-Solomon codes were constructed and decoded
through the use of finite field arithmetic [9], [10], using nonsingular Vandermonde
matrices [10]. In 1964, Singleton showed that their error correction capability is the best
possible for any code of the same length and dimension [11]. Codes that achieve this
"optimal" error correction capability are called Maximum Distance Separable (MDS).
Reed-Solomon codes are by far the dominant members, both in number and utility, of the
class of MDS codes. MDS codes have a number of interesting properties that lead to
many practical consequences.
The generator polynomial construction for Reed-Solomon codes is the approach most
commonly used today in the error control literature. This approach initially evolved
independently from Reed-Solomon codes as a means for describing cyclic codes.
Gorenstein and Zierler then generalized Bose and Ray-Chaudhuri's work to arbitrary
Galois fields of size p^m, thus developing a new means for describing Reed and
Solomon's "polynomial codes" [12]. It was shown that a vector c is a code word in the
code defined by g(x) if and only if its corresponding code polynomial c(x) is a multiple of
g(x), so the information symbols can easily be mapped onto code words. All valid code
polynomials are multiples of the generator polynomial. It follows that any valid code
polynomial must have as roots the same 2t consecutive powers of α that form the roots of
g(x). This approach leads to a powerful and efficient set of decoding algorithms.
After the discovery of Reed-Solomon codes, a search began for an efficient decoding
algorithm. In 1960, Reed and Solomon proposed a decoding algorithm based on the
solution of sets of simultaneous equations [8]. Though much more efficient than a look-up
table, Reed and Solomon's algorithm is still useful only for the smallest Reed-Solomon
codes. In 1960, Peterson provided the first explicit description of a decoding algorithm for
binary BCH codes [13]. His "direct solution" algorithm is quite useful for correcting
small numbers of errors but becomes computationally intractable as the number of errors
increases. Peterson's algorithm was improved and extended to non-binary codes by
Gorenstein and Zierler (1961) [12], Chien (1964) [14], and Forney (1965) [15]. These
efforts were productive, but Reed-Solomon codes capable of correcting more than six or
seven errors still could not be used in an efficient manner.
In 1967, Berlekamp demonstrated his efficient decoding algorithm for both non-binary
BCH and Reed-Solomon codes [16], [17]. Berlekamp's algorithm allows for the efficient
decoding of dozens of errors at a time using very powerful Reed-Solomon codes. In 1968
Massey showed that the BCH decoding problem is equivalent to the problem of
synthesizing the shortest Linear Feedback Shift Register capable of generating a given
sequence [18]. Massey then demonstrated a fast shift register-based decoding algorithm
for BCH and Reed-Solomon codes that is equivalent to Berlekamp's algorithm. This shift
register-based approach is now referred to as the Berlekamp-Massey algorithm.
In 1975 Sugiyama, Kasahara, Hirasawa, and Namekawa showed that Euclid's algorithm
can also be used to efficiently decode BCH and Reed- Solomon codes [19]. Euclid's
algorithm is a means for finding the greatest common divisor of a pair of integers. It can
also be extended to more complex collections of objects, including certain sets of
polynomials with coefficients from finite fields.
As mentioned above, Reed-Solomon codes are based on finite fields, so they can be
extended or shortened. In this thesis, the Reed-Solomon codes used in
compact discs are encoded and decoded. The generator polynomial approach has been
used for encoding and decoding the data.
CHAPTER 2
The designer of an efficient digital communication system faces the task of providing a
system which is cost-effective and gives the user an acceptable level of reliability. The information
transmitted through the channel to the receiver is prone to errors. These errors can be
controlled by using Error-Control Coding, which provides reliable transmission of data
through the channel. In this chapter, a few error control coding techniques are discussed
that rely on the systematic addition of redundant symbols to the transmitted information.
Using these techniques, two basic objectives at the receiver are facilitated: Error
Detection and Error Correction.
To make error correction possible, the symbol errors must first be detected. When an error has
been detected, the correction can be obtained in the following ways:
(1) Asking for a repeated transmission of the incorrect codeword from the receiver (Automatic
Repeat Request (ARQ)).
(2) Using the structure of the error correcting code to correct the error (Forward Error
Correction (FEC)).
It is easier to detect an error than it is to correct it. FEC therefore requires a higher
number of check bits and a higher transmission rate, given that a certain amount of
information has to be transmitted within a certain time and with a certain minimum error
probability. The reverse is also true; if the channel offers a certain possible transmission
rate, ARQ permits a higher information rate than FEC, especially if the channel has a low
error rate. FEC however has the advantage of not requiring a reply channel. The choice in
each particular case therefore depends on the properties of the system or on the
application in which the error correction is to be introduced. In many applications, such
as radio broadcasting or the Compact Disc (CD), there is no reply channel. Another
advantage of FEC is that the transmission is never completely blocked, even if the
channel quality falls to such low levels that an ARQ system would be requesting
retransmission continuously. In a system using FEC, the receiver has no real-time contact
with the transmitter and cannot verify whether the data was received correctly. It must
make a decision about the received data and do whatever it can to either fix it or declare
an alarm.
There are two main methods to introduce error-correcting coding. In one of them the
symbol stream is divided into blocks and each block is coded; this is consequently called Block
Coding. In the other, a convolution operation is applied to the symbol stream; this is called
Convolutional Coding.
FEC techniques repair the signal to enhance the quality and accuracy of the received
information, improving system performance. Various techniques used for FEC are
described in the following sections.
2.3 ERROR DETECTION AND CORRECTION CODES
The telecom industry has used FEC codes for more than 20 years to transmit digital data
through different transmission media. Claude Shannon first introduced techniques for
FEC in 1948 [1]. These error-correcting codes compensated for noise and other physical
elements to allow for full data recovery.
EDAC schemes employ an algorithm which encodes the information message
such that any introduced error can easily be detected and corrected (within certain
limitations), based on the redundancy introduced into it. Such a code is said to be e-error
detecting if it can detect any error affecting at most e bits, e.g. parity codes, two-rail codes,
m-out-of-n codes, Berger codes, etc.
Similarly, it is called e-error correcting if it can correct e-bit errors, e.g. Hamming codes,
Single Error Correction Double Error Detection (SECDED) codes, Bose–Chaudhuri–
Hocquenghem (BCH) codes, Residue codes, Reed Solomon codes, etc. As mentioned
earlier, both error detection and correction schemes require additional check bits for
achieving CED. An implementation of this CED scheme is shown in the figure.
Here the message generated by the source is passed to the encoder which adds redundant
check bits, and turns the message into a codeword. This encoded message is then sent
through the channel, where it may be subjected to noise and hence altered. When this
message arrives at the decoder of the receiver, it gets decoded to the most likely message.
If any error had occurred during its transmission, the error may either get detected and
necessary action taken (Error Detection scheme) or the error gets corrected and the
operations continue (Error Correction Scheme).
Detection of an error in an error detection scheme usually leads to the stalling of the
operation in progress and results in a possible retry of some or all of the past
computations. On the other hand the error correction scheme permits the process to
continue uninterrupted, but usually requires higher amount of redundancy than the error
detection. Each of these schemes has varied applications, depending on the reliability
requirements of the system.
The Error Syndrome is a defined function which is used to exactly identify the error
location by addition of the bad parity bit locations. It is the vector sum of the received
parity digits and the parity check digits recomputed from the received information
digits. A Codeword is a block of n symbols that carries the k information symbols and
the r redundant symbols (n = k + r).
For an (n, k) block code, the rate of the code is defined as the ratio of the number of
information symbols to the length of the code, k/n.
2.3.2 CONCURRENT ERROR DETECTION SCHEMES
Schemes for Concurrent Error Detection (CED) find a wide range of applications, since
only after the detection of an error can any preventive measure be initiated. The principle of
an error detecting scheme is very simple: an encoded codeword needs to preserve some
characteristic of that particular scheme, and a violation is an indication of the occurrence
of an error. Some of the CED techniques are discussed below.
Parity codes are the simplest form of error detecting codes, with a Hamming distance of two
(d = 2) and a single check bit (irrespective of the size of the input data). They are of two basic
types: odd and even. For an even-parity code the check bit is defined so that the total
number of 1s in the code word is always even; for an odd-parity code, this total is odd. So,
whenever a fault affects a single bit, the total count gets altered and hence the fault is
easily detected. A major drawback of these codes is that their multiple fault detection
capabilities are very limited.
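To make the scheme concrete, the following is a minimal Verilog sketch of an even-parity generator and checker for an 8-bit data word; the module and signal names are illustrative assumptions, not part of the design described in this thesis.

// Minimal sketch (assumed names): even-parity generator and checker for an
// 8-bit data word, as described above.
module parity8 (
    input  wire [7:0] data_in,    // information bits
    input  wire       parity_in,  // received check bit
    output wire       parity_out, // generated even-parity check bit
    output wire       error       // 1 when the received word violates even parity
);
    assign parity_out = ^data_in;              // XOR of all data bits
    assign error      = ^{data_in, parity_in}; // total number of 1s must be even
endmodule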
In checksum codes, the summation of all the information bytes is appended to the information
as a b-bit checksum. Any error in the transmission will be indicated by a resulting error in
the checksum, which leads to detection of the error. When b = 1, these codes reduce to
parity check codes. The codes are systematic in nature and require simple hardware units.
In m-out-of-n codes, the codeword is of a standard weight m and standard length n bits.
Whenever an error occurs during transmission, the weight of the code word changes and
the error gets detected. If the error is a 0-to-1 transition, an increase in weight is detected;
similarly, a 1-to-0 transition leads to a reduction in the weight of the code, leading to easy
detection of the error. This scheme can be used for detection of unidirectional errors, which are the most
common form of error in digital systems.
Berger codes are systematic unidirectional error detecting codes. They can be considered
as an extension of the parity codes. Parity codes have one check bit, which can be
considered as the number of information bits of value 1 counted modulo 2. On the
other hand, Berger codes have enough check bits to represent the count of the information
bits having value 0. The number of check bits (r) required for k information bits is given
by
r = ⌈log2 (k + 1)⌉
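As an illustration of this check-bit rule, the following is a minimal Verilog sketch of a Berger check-bit generator for k = 8 information bits, so that r = ⌈log2(k + 1)⌉ = 4; the module name and interface are assumptions made for the example.

// Minimal sketch (hypothetical module name): Berger check-bit generator for
// k = 8 information bits; the 4-bit check word holds the count of 0s in info.
module berger_gen8 (
    input  wire [7:0] info,
    output reg  [3:0] check   // number of zero bits in info
);
    integer i;
    always @* begin
        check = 4'd0;
        for (i = 0; i < 8; i = i + 1)
            check = check + {3'b000, ~info[i]};
    end
endmodule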
Of all the unidirectional error detecting codes that exist, [21] suggests m-out-of-n codes
to be the most optimal. These codes, however, are not of much practical use because of their
nonseparable nature. Amongst the separable codes in use, the Berger codes have been
proven to be the most optimal, requiring the smallest number of check bits [22].
The Berger codes, however, are not optimal when only t unidirectional errors need to be
detected instead of all unidirectional errors. For this reason a number of different
modified Berger codes exist: Hao Dong introduced a code [23] that accepts slightly
reduced error detection capabilities, but does so using fewer check bits and smaller
checker sizes. In this code the number of check bits is independent of the number of
information bits. Bose and Lin [24] have introduced their own variation on Berger codes,
and Bose [25] has further introduced a code that improves on the burst error detection
capabilities of his previous code, where erroneous bits are expected to appear in groups.
Blaum [26] further improves on the Bose-Lin code. Favalli proposes an approach where
the code cost is reduced by means of a graph-theoretic optimization [27].
Error-correcting codes (ECC) were first developed in the 1940s following a theorem of
Claude Shannon that showed that almost error-free communication could be obtained
over a noisy channel [1]. The quality of the recovered signal will however depend on the
error correcting capability of the codes.
Error correction coding requires lower rate codes than error detection, but it is a basic
necessity in safety-critical systems, where it is absolutely critical to get it right the first
time. In these special circumstances, the additional bandwidth required for the redundant
check bits is an acceptable price.
Over the years, the correcting capability of error correction schemes has gradually
increased with a constrained number of computation steps. Concurrently, the time and
hardware cost to perform a given number of computational steps have also greatly
decreased. These trends have led to greater application of these error-correcting
techniques.
2.3.3.1 Bose–Chaudhuri–Hocquenghem (BCH) Codes
BCH codes are among the most important and powerful classes of linear block codes; they are
cyclic codes with a wide variety of parameters. The most common BCH codes are
characterized as follows. Specifically, for any positive integer m (equal to or greater than
3) and t (less than (2^m − 1)/2) there exists a binary BCH code with the following
parameters:
Block length: n = 2^m − 1
Number of message bits: k ≥ n − mt
Minimum distance: d_min ≥ 2t + 1
where t is the number of errors that can be corrected per codeword and the number of parity
bits is at most mt.
Each BCH code is a t-error correcting code in that it can detect and correct up to t
random errors per codeword. The Hamming single-error correcting codes can be
described as BCH codes. The BCH codes offer flexibility in the choice of code
parameters, block length and code rate.
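As a worked example of these parameter bounds, taking m = 4 and t = 2 gives

n = 2^4 − 1 = 15,   k ≥ n − mt = 15 − 8 = 7,   d_min ≥ 2t + 1 = 5,

which are the parameters of the well-known double-error-correcting (15, 7) BCH code.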
Hamming codes can also be defined over non-binary fields. The parity check matrix
is designed by setting its columns as the vectors of GF(p)^m whose first non-zero
element equals one. There are n = (p^m − 1)/(p − 1) such vectors, and any pair of them is
linearly independent.
Hamming codes have the advantage of requiring the fewest possible check bits for their code
lengths, but suffer from the disadvantage that, whenever more than a single error occurs, it
is wrongly interpreted as a single error, because each non-zero syndrome is matched with
one of the single-error events. Thus they are inefficient in handling burst errors.
The transmission channel may be memoryless or it may have some memory. If the
channel is memoryless, the errors are independent and identically distributed.
Sometimes it is possible that the channel errors exhibit some kind of memory. The most
common example of this is burst errors. If a particular symbol is in error, then the
chances are good that its immediate neighbors are also wrong. Burst errors occur for
instance in mobile communications due to fading and in magnetic recording due to media
defects. Burst errors can be converted to independent errors by the use of an interleaver.
A burst error can also be viewed as another type of random error pattern and be handled
accordingly. But some schemes are particularly well suited to dealing with burst errors.
Cyclic codes represent one such class of codes. Most of the linear block codes are either
cyclic or are closely related to the cyclic codes. An advantage of cyclic codes over most
other codes is that they are easy to encode. Furthermore, cyclic codes possess a well-
defined mathematical structure, based on the Galois field, which has led to the development
of very efficient decoding schemes for them.
Reed Solomon codes represent the most important sub-class of the cyclic codes [5], [14].
CHAPTER 3
3.1 INTRODUCTION
In chapter 2, various types of error correcting codes were discussed. Burst errors are
efficiently corrected by using cyclic codes. Galois fields, or finite fields, are
extensively used in Error-Correcting Codes (ECC) based on linear block codes. In
this chapter these finite fields are discussed thoroughly. A Galois field is a finite set of
elements with defined rules for arithmetic. These rules are not algebraically
different from those used in arithmetic with ordinary numbers; the only difference
is that there is only a finite set of elements involved. Finite fields have been extensively used in
Digital Signal Processing (DSP), pseudo-random number generation, and encryption and
decryption protocols in cryptography.
The design of efficient multiplier, inverter and exponentiation circuits for Galois Field
arithmetic is needed for these applications.
A finite field is a field with a finite order (i.e., number of elements), also called a
Galois field. The order of a finite field is always a prime or a power of a prime. For each
prime power, there exists exactly one finite field GF(p^m). A field is said to be infinite if
it consists of an infinite number of elements, e.g. the set of real numbers, complex numbers,
etc. A finite field, on the other hand, consists of a finite number of elements.
GF(p^m) is an extension field of the ground field GF(p), where m is a positive integer.
For p = 2, GF(2^m) is an extension field of the ground field GF(2) of two elements (0, 1).
GF(2^m) is a vector space of dimension m over GF(2) and hence is represented using a
basis of m linearly independent vectors. The finite field GF(2^m) contains 2^m − 1 non-
zero elements. All finite fields contain a zero element and an element, called a generator
or primitive element α, such that every non-zero element in the field can be expressed as
a power of this element. The existence of this primitive element (of order 2^m − 1) is
asserted by the fact that the nonzero elements of GF(2^m) form a cyclic group.
Encoders and decoders for linear block codes over GF(2^m), such as Reed-Solomon
codes, require arithmetic operations in GF(2^m). In addition, decoders for some codes
over GF(2), such as BCH codes, require computations in extension fields GF(2^m). In
GF(2^m), addition and subtraction are simply bitwise exclusive-or. Multiplication can be
performed by several approaches, including bit serial, bit parallel (combinational), and
software. Division requires the reciprocal of the divisor, which can be computed in
hardware using several methods, including Euclid's algorithm, lookup tables,
exponentiation, and subfield representations. With the exception of division,
combinational circuits for Galois field arithmetic are straightforward. Fortunately, most
decoding algorithms can be modified so that only a few divisions are needed, so fast
methods for division are not essential.
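To illustrate the combinational approach mentioned above, the following is a minimal Verilog sketch of a GF(2^8) multiplier; the module name gf256_mul and the choice of the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 are assumptions made for the example, and the same module is reused in later sketches. Addition in GF(2^8) needs no module of its own, being a plain bitwise XOR.

// Minimal sketch of combinational GF(2^8) multiplication. The reduction
// constant 8'h1D encodes the low eight coefficients of the assumed primitive
// polynomial x^8 + x^4 + x^3 + x^2 + 1 (i.e. x^8 mod p(x)).
module gf256_mul (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] p      // p = a * b in GF(2^8)
);
    reg [7:0] aa;
    integer i;
    always @* begin
        aa = a;
        p  = 8'h00;
        for (i = 0; i < 8; i = i + 1) begin
            if (b[i]) p = p ^ aa;                       // conditional add (XOR)
            aa = aa[7] ? ((aa << 1) ^ 8'h1D) : (aa << 1); // multiply by x, reduce mod p(x)
        end
    end
endmodule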
1. In GF(2^m) fields, there is always a primitive element α, such that every element of
GF(2^m) except zero can be expressed as a power of α [28]. Every field GF(2^m) can be
generated using a primitive polynomial over GF(2), and the arithmetic performed in the
GF(2^m) field is modulo this primitive polynomial.
2. If α is a primitive element of GF(2^m), its conjugates α^(2^i) are also primitive
elements of GF(2^m).
3. If α is an element of order n in GF(2^m), all its conjugates have the same order n.
4. If α, an element in GF(2^m), is a root of a polynomial f(x) over GF(2), then all
the distinct conjugates of α, also elements in GF(2^m), are roots of f(x).
5. The 2^m − 1 nonzero elements of GF(2^m) form all the roots of x^(2^m − 1) − 1 = 0.
The elements of GF(2^m) form all the roots of x^(2^m) − x = 0.
Hence all the elements of this field can be generated as powers of α. This is the
polynomial representation of the field elements, and it assumes the leading coefficient
of the primitive polynomial φ(x) to be equal to 1. Figure 3.1 shows the finite field generated
by the primitive polynomial 1 + α + α^2 + α^3 + α^4 + α^8, represented as GF(2^8) or GF(256).
Figure 3.1: Representation of some elements in GF( 2^8)
Note that here the primitive polynomial is of degree 8, and the numeric value of α is
considered purely arbitrary. Using the irreducibility property of the polynomial φ(x), it
can be proven that this construction indeed produces a field.
The primitive polynomial is used in a simple iterative algorithm to generate all the
elements of the field. Hence different polynomials will generate different fields. In the
work [29], the author claims that even though there are numerous choices for the
irreducible polynomials, the fields constructed are all isomorphic.
Galois Field Arithmetic (GFA) is attractive in several ways. All the operations that
generally result in overflow or underflow in traditional arithmetic get mapped onto a
value inside the field because of the modulo arithmetic followed in GFA; hence rounding
issues also get automatically eliminated.
GF arithmetic also facilitates the representation of all elements with a finite-length binary
word. For example, in GF(2^m) all the operands are m bits wide, where m is always a number
smaller than the conventional bus width of 32 or 64. This in turn introduces a large
amount of parallelism into the GFA operations. Further, we assume m to be a multiple
of 8, or a power of 2, because of its inherent convenience for sub-word parallelism [30].
To study GFA, an introduction to the mathematical concepts of the trace and dual
basis is necessary [31], [32].
Definition 1: The trace of an element β belonging to GF(2^m) is defined as
Tr(β) = β + β^2 + β^(2^2) + … + β^(2^(m−1)).
CHAPTER 4
REED SOLOMON CODES
4.1 INTRODUCTION
Highly reliable data transmission and storage systems frequently use error
correction codes (ECC) to protect data. By adding a certain degree of redundancy,
these codes are able to detect and correct errors in the coded information. In the
design of highly reliable electronic systems, both the Reed-Solomon (RS) encoder
and decoder should be self-checking in order to avoid faults in these blocks
that compromise the reliability of the whole system. In fact, a fault in the
encoder can produce an incorrect codeword, while a fault in the decoder can
give a wrong data word even if no errors occur in the codeword transmission.
Therefore, great attention must be paid to detecting and recovering from faults in the
encoding and decoding circuitry. Nowadays, the most widely used error correcting
codes are the RS codes, based on the properties of finite field arithmetic. In
particular, finite fields with 2^m elements are suitable for digital implementations
due to the isomorphism between the addition, performed modulo 2, and the XOR
operation between the bits representing the elements of the field.
The use of the XOR operation in addition and multiplication makes it possible to use
parity check-based strategies to check for the presence of faults in the RS encoder,
while the implicit redundancy in the codeword is used both to correct erroneous
data and to detect faults inside the decoder block.
bits of a symbol to be coded. An element a(x) ∈ GF(2^m) is a polynomial
a(x) = a_(m−1)x^(m−1) + … + a_1x + a_0,
where the coefficients a_i belong to GF(2).
The code words of a separable RS(n, k) code correspond to the polynomials c(x)
of degree n − 1 that can be generated by using the following formulas:
c(x) = d(x)·x^(n−k) + p(x),   p(x) = d(x)·x^(n−k) mod g(x),
where d(x) is the data polynomial, g(x) is the generator polynomial, and p(x) is a polynomial
of degree less than n − k representing the parity
symbols. In practice, the encoder takes k data symbols and adds 2t parity
symbols, obtaining an n-symbol codeword. The 2t parity symbols allow the
correction of up to t symbols containing errors in a codeword. Defining the
Hamming distance of two polynomials a(x) and b(x) of degree n as the number of
coefficients of the same degree that are different, D(a(x), b(x)) = W(a(x) + b(x)), and
the Hamming weight W(a(x)) as the number of non-zero coefficients of a(x), the decoder
can correct up to t errors. In other words, the decoder is able to identify the e(x) polynomial if its
Hamming weight W(e(x)) is not greater than t. The decoding algorithm provides
as output the only codeword having a Hamming distance
not greater than t from the received polynomial.
In this section, the motivations of the design methodology used for the
proposed implementations are described starting from an overview of the
presented literature.
A self-checking implementation can be obtained by using CED implementations of
these basic arithmetic operations.
I. INTRODUCTION
Before data transmission, the encoder attaches parity symbols to the data
using a predetermined algorithm. At the receiving side, the
decoder detects and corrects a limited, predetermined number of errors that occurred
during transmission. Transmitting the extra parity symbols requires extra
bandwidth compared to transmitting the pure data. However, transmitting the
additional symbols introduced by FEC is better than retransmitting the whole
package when at least one error has been detected by the receiver. Many
implementations of the RS codec are targeted to ASIC design and only a few
papers discuss synthesizing the RS codec toward reconfigurable devices.
Implementing a Reed-Solomon codec on reconfigurable devices is attractive for
two main reasons. FPGAs provide flexibility, as the algorithm parameters can
be altered to provide different error correction capabilities. They also provide a
rapid development cycle, resulting in a short time to market, which is a major
factor in industry.
The relationship between the symbol size, s, and the size of the
codeword, n, is given by (1). This means that if there are s bits in one symbol,
there can exist 2^s − 1 distinct symbols in one codeword, excluding the all-zero
symbol.
n = 2^s − 1 (1)
A. Galois Field
The field GF(8) will be the basis for the Reed-Solomon code RS(7,3). Because
each symbol has log2(8) = 3 bits, the parameters of the RS code are
n = 2^3 − 1 = 7, k = 3
(chosen arbitrarily to balance the number of information and
parity symbols in one codeword), and
t = (n − k)/2 = 2.
Given a stream of data,
B. Encoder
The transmitted codeword is systematically encoded and defined in (3) as
a function of the transmitted message m(x), the generator polynomial g(x) and
the number of parity symbols 2t:
c(x) = m(x)·x^(2t) + (m(x)·x^(2t) mod g(x))   (3)
The underlying field is generated by the primitive polynomial
x^8 + x^4 + x^3 + x^2 + 1.
C. Decoder
After going through a noisy transmission channel, the encoded data can
be represented as
r(x) = c(x) + e(x),
where e(x) represents the error polynomial with the same degree as c(x)
and r(x). Once the decoder evaluates e(x), the transmitted codeword, c(x), is then
recovered by adding the received message, r(x), to the error polynomial, e(x), as
shown in equation (6):
c(x) = r(x) + e(x) = c(x) + e(x) + e(x) = c(x)   (6)
The five functional blocks that form the decoder are:
1) Syndrome Calculator: In this block, errors are detected by calculating
the syndrome polynomial, S(x), as shown in (7), where each coefficient S_i is the
received polynomial evaluated at α^i, i.e., S_i = r(α^i). The syndromes are used by the
Key Equation Solver functional block.
When S(x) = 0, the received codeword is error free. Otherwise, the Key Equation
Solver will use S(x) to generate the error locator polynomial, Λ(x), and the error
evaluator polynomial, Ω(x).
2) Key Equation Solver: The Key Equation relates the syndrome polynomial to the
error locator and error evaluator polynomials, Λ(x)·S(x) ≡ Ω(x) mod x^(2t).
Solving (14) gives the error locator polynomial Λ(x) and the error evaluator
polynomial Ω(x), which can be represented in the general forms shown in (10) and
(11), respectively.
The error locator polynomial, Λ(x), has degree e ≤ t. The error evaluator
polynomial, Ω(x), has degree at most e − 1 and determines the magnitudes of the e errors.
Different algorithms have been used to solve the key equation, and
two common ones are the Euclidean algorithm and the Berlekamp-Massey algorithm.
3) Error Locator: The locations of the errors are determined from the error
locator polynomial, Λ(x) (10). Each field element is substituted into Λ(x); the elements at
which Λ(x) evaluates to zero identify the erroneous symbol positions. The error magnitudes
are then obtained from the error evaluator polynomial, Ω(x).
A. Encoder
The encoder is architected using the Linear Feedback Shift Register (LFSR) structure, as sketched below.
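A minimal Verilog sketch of such an LFSR encoder is given below, assuming an RS code over GF(2^8) with four parity symbols; the generator coefficients G0..G3 are placeholders (the real g(x) coefficients depend on the chosen code), and gf256_mul is the GF(2^8) multiplier sketched in Chapter 3.

// Minimal sketch of the LFSR-based systematic encoder structure described
// above, for an assumed RS(n, k) code with 2t = 4 parity symbols over GF(2^8).
module rs_lfsr_encoder #(
    parameter [7:0] G0 = 8'h40, G1 = 8'h78, G2 = 8'h36, G3 = 8'h0F // placeholder g(x) coefficients
)(
    input  wire       clk,
    input  wire       rst,        // clear the register before each message
    input  wire       msg_valid,  // high while the k message symbols are fed in
    input  wire [7:0] msg_in,     // one message symbol per clock
    output wire [7:0] parity_out  // highest-order parity symbol (registers are shifted out after k cycles)
);
    reg  [7:0] r0, r1, r2, r3;            // LFSR stages
    wire [7:0] fb = msg_in ^ r3;          // feedback symbol
    wire [7:0] m0, m1, m2, m3;

    gf256_mul u0 (.a(fb), .b(G0), .p(m0));
    gf256_mul u1 (.a(fb), .b(G1), .p(m1));
    gf256_mul u2 (.a(fb), .b(G2), .p(m2));
    gf256_mul u3 (.a(fb), .b(G3), .p(m3));

    always @(posedge clk) begin
        if (rst) begin
            r0 <= 8'h00; r1 <= 8'h00; r2 <= 8'h00; r3 <= 8'h00;
        end else if (msg_valid) begin
            r0 <= m0;                      // GF(2^8) addition is XOR
            r1 <= r0 ^ m1;
            r2 <= r1 ^ m2;
            r3 <= r2 ^ m3;
        end
    end
    assign parity_out = r3;
endmodule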
B. Decoder
The high-level architecture of the decoding data path is shown in Fig. 6.
The decoder first calculates the syndrome of the received codeword to detect
any potential errors that occurred during transmission. If the syndrome polynomial,
S(x), is not zero, the received codeword is erroneous and will be
corrected if the number of erroneous symbols is less than eight.
1) Syndrome Calculator: The syndrome block takes in codeword after codeword at a
rate of 1 symbol per clock cycle. The i_start signal indicates the beginning of each
codeword. The syndrome architecture is shown in Fig. 7.
The coefficients are obtained by solving (13). After 255 clock cycles,
S(x) is ready to be processed by the Key Equation Solver.
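A minimal Verilog sketch of one such syndrome cell is shown below: it evaluates the received polynomial at α^i by Horner's rule, one received symbol per clock, assuming the symbols arrive highest degree first. The constant ALPHA_I is a placeholder for α^i, gf256_mul is the multiplier sketched in Chapter 3, and the i_start behavior is a simplifying assumption, not the exact interface of the implemented design.

// Minimal sketch of one syndrome cell: after n clocks, s holds S_i = r(alpha^i).
module syndrome_cell #(
    parameter [7:0] ALPHA_I = 8'h02   // placeholder value of alpha^i
)(
    input  wire       clk,
    input  wire       i_start,   // marks the first symbol of a codeword
    input  wire [7:0] r_in,      // received symbol, one per clock
    output reg  [7:0] s          // accumulated evaluation of r(x) at alpha^i
);
    wire [7:0] s_times_alpha;
    gf256_mul mul (.a(s), .b(ALPHA_I), .p(s_times_alpha));

    always @(posedge clk) begin
        if (i_start) s <= r_in;                  // restart accumulation
        else         s <= s_times_alpha ^ r_in;  // Horner step: s = s*alpha^i + r_j
    end
endmodule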
2) Key Equation Solver: The Key Equation Solver waits for the i_start signal
before capturing the syndrome polynomial, S(x). The Berlekamp-Massey algorithm is
implemented using a state machine design as shown in Fig. 8.
The state machine is designed from the BM flowchart, Fig. 4. The state
machine is initialized each time a syndrome, S(x), is ready to be processed by the
Key Equation Solver to generate Λ(x). Once Λ(x) is found, Ω(x) is calculated using (14).
The Chien algorithm calculates the locations of the erroneous symbols in
each codeword. In the algorithm, cc_i and r_i represent the i-th
symbol in the corrected codeword and in the received polynomial, respectively.
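The following is a minimal Verilog sketch of a Chien search cell for a locator polynomial of degree at most 2 (t = 2); the constants ALPHA1 and ALPHA2 are placeholders for α and α^2, and gf256_mul is the multiplier sketched in Chapter 3. It flags the codeword positions at which Λ(x) evaluates to zero, which is the essence of the algorithm described above.

// Minimal sketch of a Chien search cell: each coefficient lambda_j is
// repeatedly multiplied by alpha^j; when the XOR sum of the terms is zero,
// the current position is flagged as an error location.
module chien_cell #(
    parameter [7:0] ALPHA1 = 8'h02,  // placeholder alpha^1
    parameter [7:0] ALPHA2 = 8'h04   // placeholder alpha^2
)(
    input  wire       clk,
    input  wire       load,        // load the locator coefficients
    input  wire [7:0] lambda0, lambda1, lambda2,
    output wire       err_here     // 1 when Lambda evaluates to zero at this position
);
    reg  [7:0] c0, c1, c2;
    wire [7:0] n1, n2;
    gf256_mul m1 (.a(c1), .b(ALPHA1), .p(n1));
    gf256_mul m2 (.a(c2), .b(ALPHA2), .p(n2));

    always @(posedge clk) begin
        if (load) begin
            c0 <= lambda0; c1 <= lambda1; c2 <= lambda2;
        end else begin
            c1 <= n1;                       // c_j tracks lambda_j * alpha^(j*i)
            c2 <= n2;
        end
    end
    assign err_here = ((c0 ^ c1 ^ c2) == 8'h00);  // Lambda evaluated at this step is zero
endmodule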
The input and output signals to the slice are as follows (a sketch of one slice follows the list).
• Ain is the registered output of the previous slice.
• Pin is the registered parity of the previous slice.
• Fin is the feed-back of the LFSR.
• PFin is the parity of the feed-back input.
• Aout is the result of the multiplication and addition operation.
• Pout is the predicted parity of the result.
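A minimal Verilog sketch of one slice with the interface listed above is given below. The coefficient GI and the parity-prediction mask PMASK are placeholders: for a fixed gi, the parity of gi·Fin is a fixed XOR of a subset of the bits of Fin, and PMASK selects that subset, so Pout is predicted without looking at the computed result Aout. PFin is kept in the port list for completeness but is not used in this simplified sketch; gf256_mul is the multiplier sketched in Chapter 3.

// Minimal sketch of one encoder slice with parity prediction.
module rs_slice #(
    parameter [7:0] GI    = 8'h1D,   // placeholder coefficient g_i
    parameter [7:0] PMASK = 8'hA5    // placeholder parity-prediction mask for GI
)(
    input  wire       clk,
    input  wire [7:0] Ain,    // registered output of the previous slice
    input  wire       Pin,    // registered parity of the previous slice
    input  wire [7:0] Fin,    // feedback of the LFSR
    input  wire       PFin,   // parity of the feedback input (unused in this sketch)
    output reg  [7:0] Aout,   // Ain + GI*Fin in GF(2^8)
    output reg        Pout    // predicted parity of Aout
);
    wire [7:0] prod;
    gf256_mul mul (.a(Fin), .b(GI), .p(prod));

    always @(posedge clk) begin
        Aout <= Ain ^ prod;              // GF(2^8) addition is XOR
        Pout <= Pin ^ (^(Fin & PMASK));  // parity of GI*Fin predicted from Fin only
    end
endmodule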
The parity checker block checks whether the parity of the inputs is even or odd.
The self-checking implementation of the parity checker is realized with a two-rail
circuit. The two outputs are each equal to the parity of one of two disjoint
subsets of the inputs, as proposed. The fault-free behavior of the checker, when
a correct set of inputs is provided (i.e., no faults occur in the slices), is the
following: the output codes 01 or 10 are generated for an odd parity checker, or
the output codes 00 or 11 for an even parity checker. If the checker receives as
input an erroneous codeword (i.e., a fault occurs in a slice), the checker provides
the output codes 11 or 00 for an odd parity checker, or the output codes 01 or 10
for an even parity checker.
Also, if a fault occurs in the checker, the outputs provided are 11 or 00 for
an odd parity checker, or 01 or 10 for an even parity checker.
These considerations guarantee the self-checking property of the checker. It can
be noticed that, due to the LFSR-based structure of the RS encoder, there are no
control state machines to be protected against faults.
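A minimal Verilog sketch of such a two-rail parity checker is shown below; the input width and the way the inputs are split into two disjoint subsets are assumptions made for the example.

// Minimal sketch of a two-rail parity checker: each output carries the parity
// of one of two disjoint subsets of the inputs. For a fault-free odd-parity
// input the pair (z1, z0) is 01 or 10; 00 or 11 signals an error.
module two_rail_parity_checker #(
    parameter WIDTH = 16
)(
    input  wire [WIDTH-1:0] in_bits,
    output wire             z1,     // parity of the upper half
    output wire             z0      // parity of the lower half
);
    assign z1 = ^in_bits[WIDTH-1:WIDTH/2];
    assign z0 = ^in_bits[WIDTH/2-1:0];
    // for an odd-parity code the word is accepted when z1 != z0
    wire code_valid = z1 ^ z0;
endmodule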
Therefore, the use of the described self-checking arithmetic structures
allows checking the entire RS encoder. The evaluation in terms of area and
delay of this structure has been carried out by using a Xilinx Virtex II FPGA as
the target device, and the design flow has been performed by using the Xilinx
ISE Foundation framework.
Table I reports the area of each of the blocks described in this section.
The adder is implemented by using one LUT for each output, while the area of
the constant multipliers and of the parity prediction block depends on the
coefficients gi. In Table I, the row named "additional logic" represents the logic
added to the slice in order to predict the parity bit. The number of LUTs required
to implement the parity checker depends on the number of slices of the encoder,
i.e., the number n − k of check symbols of the RS code.
In order to compute the critical path of the overall self-checking encoder
architecture, the following additional signal paths must be considered:
• The path crossing the parity prediction block, which is comparable with the path of the
worst-case constant multiplier;
• The path crossing the parity checkers. This path depends on the number of bits
provided as input to the checker. In fact, the number of required LUT levels equals the
depth of the four-input XOR network, that is, ⌈log4((n − k)(m + 1))⌉.
The number of levels of the two-rail parity checker increases very slowly
with the growth of the number of check symbols and, therefore, does not represent
a problem for the maximum frequency of the self-checking encoder.
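As a worked example of this depth estimate, for an assumed RS(255, 239) code over GF(2^8), so that n − k = 16 and m = 8:

⌈log4((n − k)(m + 1))⌉ = ⌈log4(16 · 9)⌉ = ⌈log4(144)⌉ = 4

levels of four-input LUTs.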
An error detection block takes as inputs the outputs of the Hamming weight
counter and of the codeword checker and provides an error detection signal if
a fault in the RS decoder has been detected.
Implementation 1:
We only need to reuse the same RS encoder used to create the codeword for the
computation of the remainder of c(x) obtained from the decoder. The drawback of
this implementation is the additional latency introduced by the RS encoder, which
is n − k clock cycles. This latency must be taken into account by the error detection block,
which must wait n − k clock cycles to check the two properties defined in Section III.
The area occupation of the RS encoder is smaller than the area occupation of
the decoder; therefore, the overhead introduced by this block is about 15% of the
decoder area.
Implementation 2:
the Hamming weight counter and the outputs of the codeword checker. Its
implementation depends on the chosen implementation of the codeword
checker. If we use implementation 1, the error detection block must delay the
output of the Hamming weight counter for n − k clock cycles and check whether all the
coefficients of the remainder polynomial are zero.
On the other hand, if we use the syndrome calculation block, the inputs
are the computed syndromes and the error detection block checks whether all the
computed syndromes are zero. The additional blocks used to detect faults inside the
decoder are themselves susceptible to faults and, therefore, their implementation must
assure the self-checking property, in order to face the age-old question of "who
checks the checker." For the codeword checker and the error polynomial
generator blocks, only registers and GF(2^m) addition and constant multiplication
are used and, therefore, the same considerations of Section IV can be used to
obtain the self-checking property of these blocks. For the counters and the
comparator used in the Hamming weight counter and error detection blocks,
many efficient techniques can be found in the literature.
CHAPTER 5
Field Programmable Gate Array (FPGA)
5.2 Basic concepts of FPGA
The performance of an application is not affected when additional processes are
added to the FPGA, since the FPGA is parallel in nature and the processes do not have to
compete for the same resources. An FPGA can enforce critical interlock logic and can be
designed to prevent I/O forcing by an operator. Unlike hardwired printed circuit
board (PCB) designs, which have fixed and limited hardware resources, an FPGA-
based system can literally rewire its internal circuitry to allow reconfiguration after
the control system is deployed in the field.
FPGAs are well suited to large designs and are also reconfigurable. When
creating designs, we can use simple VHDL or Verilog constructs to build a
complex FPGA design. Moreover, an FPGA is able to deliver a
technical edge, such as optimizing FPGA timing closure with Precision Synthesis,
advanced timing analysis, and optimizing timing closure with I/O optimization and
PCB integration. An FPGA also helps optimize the design process by roughly halving
the design time through a rapid development process.
5.4 Language Used in FPGA
Even though HDLs were popular for logic verification, designers had to
manually translate the HDL-based design into a schematic circuit with
interconnections between gates. The advent of logic synthesis in the late 1980s
changed the design methodology radically. Digital circuits could be described at
a register transfer level (RTL) by use of an HDL. Thus, the designer had to
specify how the data flows between registers and how the design processes the
data. The details of gates and their interconnections to implement the circuit were
automatically extracted by logic synthesis tools from the RTL description.
Thus, logic synthesis pushed the HDLs into the forefront of digital design.
Designers no longer had to manually place gates to build digital circuits. They
could describe complex circuits at an abstract level in terms of functionality and
data flow by designing those circuits in HDLs. Logic synthesis tools would
implement the specified functionality in terms of gates and gate interconnections.
HDLs also began to be used for system-level design. HDLs were used for
simulation of system boards, interconnect buses, FPGAs (Field Programmable
Gate Arrays), and PALs (Programmable Array Logic). A common approach is to
design each IC chip, using an HDL, and then verify system functionality via
simulation.
CHAPTER 6
INTRODUCTION TO VERILOG
6.1 Introduction
Verilog is a HARDWARE DESCRIPTION LANGUAGE (HDL). A hardware
description language is a language used to describe a digital system, for
example a microprocessor, a memory, or a simple flip-flop. This means
that, by using an HDL, one can describe any digital hardware at any level. Verilog
is one of the HDLs available in the industry for designing hardware.
Verilog allows us to describe a digital design at the behavioral level, register transfer
level (RTL), gate level and switch level. Verilog allows hardware designers to
express their designs with behavioral constructs, deferring the details of
implementation to a later stage of the design.
Designs using the Register-Transfer Level specify the characteristics of a
circuit in terms of operations and the transfer of data between registers. An explicit
clock is used. An RTL design contains exact timing information: operations are
scheduled to occur at certain times.
6.4 Gate Level
The Verilog simulator was first used in 1985 and was extended
substantially through 1987. The implementation was the Verilog simulator sold by
Gateway. The first major extension was Verilog-XL, which added a few features
and implemented the infamous "XL algorithm," a very efficient method
for gate-level simulation.
The time was late 1990. Cadence Design Systems, whose primary products
at that time included a thin film process simulator, decided to acquire Gateway
Design Automation. Along with the other Gateway products, Cadence now became
the owner of the Verilog language, and continued to market Verilog as both a
language and a simulator.
At the same time, Synopsys was marketing a top-down design
methodology using Verilog. This was a powerful combination. In 1990, Cadence
recognized that if Verilog remained a closed language, the pressures of
standardization would eventually cause the industry to shift to VHDL.
Consequently, Cadence organized Open Verilog International (OVI), and in 1991
gave it the documentation for the Verilog Hardware Description Language.
This was the event which "opened" the language. OVI did a considerable
amount of work to improve the Language Reference Manual (LRM), clarifying
things and making the language specification as vendor-independent as
possible. Soon it was realized that if there were too many companies in
the market for Verilog, potentially everybody would want to do what Gateway had done
so far - changing the language for their own benefit. This would defeat the main
purpose of releasing the language to the public domain.
As a result in 1994, the IEEE 1364 working group was formed to turn the
OVI LRM into an IEEE standard. This effort was concluded with a successful
ballot in 1995, and Verilog became an IEEE standard in December, 1995. When
Cadence gave OVI the LRM, several companies began working on Verilog
simulators. In 1992, the first of these were announced, and by 1993 there were
several Verilog simulators available from companies other than Cadence. The
most successful of these was VCS, the Verilog Compiled Simulator, from
Chronologic Simulation. This was a true compiler as opposed to an interpreter,
which is what Verilog-XL was. As a result, compile time was substantial, but
simulation execution speed was much faster. In the meantime, the popularity of
Verilog and PLI was rising exponentially.
Verilog as an HDL found more admirers than the well-formed and federally
funded VHDL. It was only a matter of time before people in OVI realized the need
for a more universally accepted standard. Accordingly, the board of directors of
OVI requested IEEE to form a working committee for establishing Verilog as an
IEEE standard.
The working committee 1364 was formed in mid 1993 and had its first meeting on
October 14, 1993. The standard, which combined both the Verilog
language syntax and the PLI in a single volume, was passed in May 1995 and is
now known as IEEE Std. 1364-1995. In the following years, new features were
added to Verilog, and the new version is called Verilog 2001. This version fixed
a lot of the problems that Verilog 1995 had, and is known as IEEE Std. 1364-2001.
What remains now is for all the tool vendors to implement it.
Most popular logic synthesis tools support Verilog HDL. This makes it the
language of choice for designers.
All fabrication vendors provide Verilog HDL libraries for post-synthesis
simulation. Thus, designing a chip in Verilog HDL allows the widest choice
of vendors.
6.7 Simulation
Simulation is the process of verifying the functional characteristics of
models at any level of abstraction. We use simulators to simulate
hardware models. To test whether the RTL code meets the functional requirements of
the specification, we check that all the RTL blocks are functionally correct. To achieve this
we need to write a testbench, which generates the clock, reset and the required test vectors.
We use the waveform output from the simulator to see whether the DUT (Device
Under Test) is functionally correct. Most simulators come with a waveform
viewer. As the design becomes complex, we write a self-checking testbench, where the
testbench applies the test vectors and compares the output of the DUT with the expected
values.
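The following is a minimal Verilog sketch of such a self-checking testbench; purely as an illustration it uses the GF(2^8) multiplier sketched in Chapter 3 as the DUT, and the test vectors and expected values are chosen for that example only.

// Minimal sketch of a self-checking testbench: it generates a clock, applies
// test vectors, and compares the DUT output against expected values.
`timescale 1ns / 1ps
module tb_gf256_mul;
    reg        clk = 0;
    reg  [7:0] a, b;
    wire [7:0] p;

    gf256_mul dut (.a(a), .b(b), .p(p));   // device under test

    always #5 clk = ~clk;                  // free-running clock (unused by this combinational DUT)

    task check(input [7:0] ta, input [7:0] tb, input [7:0] exp_p);
        begin
            a = ta; b = tb; #1;
            if (p !== exp_p)
                $display("ERROR: %h * %h = %h, expected %h", ta, tb, p, exp_p);
            else
                $display("PASS : %h * %h = %h", ta, tb, p);
        end
    endtask

    initial begin
        check(8'h02, 8'h03, 8'h06);  // x * (x + 1) = x^2 + x
        check(8'h80, 8'h02, 8'h1D);  // x^7 * x = x^8, reduced mod p(x)
        $finish;
    end
endmodule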
6.8 SYNTHESIS:
After mapping the RTL to gates, the synthesis tool also does a minimal
amount of timing analysis to see whether the mapped design meets the timing
requirements.
SOFTWARE DETAILS:
Architectural Description:
Spartan-III Array:
Values stored in static memory cells control all the configurable logic
elements and interconnect resources. These values load into the memory cells
on power-up, and can reload if necessary to change the function of the device.
Each of these elements will be discussed in detail.
Input/Output Block:
The Spartan-III IOB, as seen in the figure, features inputs and outputs that
support a wide variety of I/O signaling standards. These high-speed inputs and
outputs are capable of supporting various state-of-the-art memory and bus
interfaces.
Input Path:
A buffer in the Spartan-II IOB input path routes the input signal either
directly to internal logic or through an optional input flip-flop. An optional delay
element at the D-input of this flip-flop eliminates pad-to-pad hold time. The delay
is matched to the internal clock-distribution delay of the FPGA, and when used,
assures that the pad-to-pad hold time is zero.
Output Path:
The output path includes a 3-state output buffer that drives the output
signal onto the pad. The output signal can be routed to the buffer directly from
the internal logic or through an optional IOB output flip-flop.
The 3-state control of the output can also be routed directly from the
internal logic or through a flip-flop that provides synchronous enable and disable.
Each output driver can be individually programmed for a wide range of low-
voltage signaling standards. Each output buffer can source up to 24 mA and sink
up to 48 mA. Drive strength and slew rate controls minimize bus transients.
Storage Elements:
Block RAM:
A block RAM is four CLBs high; consequently, a Spartan-II device eight CLBs high
will contain two memory blocks per column, and a total of four blocks.
Design Implementation:
The placer then determines the best locations for these blocks based on
their interconnections and the desired performance. Finally, the router
interconnects the blocks. The PAR algorithms support fully automatic
implementation of most designs. For demanding applications, however, the user
can exercise various degrees of control over the process. User partitioning,
placement, and routing information are optionally specified during the design-
entry process.
user-generated specifications. Specific timing information for individual nets is
unnecessary.
Configuration:
Modes
The Configuration mode pins (M2, M1, and M0) select among these
configuration modes with the option in each case of having the IOB pins either
pulled up or left floating prior to configuration.
Serial Modes:
There are two serial configuration modes: In Master Serial mode, the FPGA
controls the configuration process by driving CCLK as an output. In Slave Serial
mode, the FPGA passively receives CCLK as an input from an external agent
(e.g., a microprocessor, CPLD, or second FPGA in master mode) that is
controlling the configuration process. In both modes, the FPGA is configured by
loading one bit per CCLK cycle. The MSB of each configuration data byte is
always written to the DIN pin first.
The Slave Parallel mode is the fastest configuration option. Byte-wide data is
written into the FPGA. A BUSY flag is provided for controlling the flow of data at
a clock frequency FCCNH above 50 MHz.
In Slave Serial mode, the FPGA's CCLK pin is driven by an external source,
allowing FPGAs to be configured from other logic devices such as
microprocessors, or in a daisy-chain configuration.
In Master Serial mode, the CCLK output of the FPGA drives a Xilinx PROM
which feeds a serial stream of configuration data to the FPGA’s DIN input.
Operating Modes:
Figure: Configuration Flow Diagram
Read Through (One Clock Edge):
The read address is registered on the read port clock edge and data
appears on the output after the RAM access time. Some memories may place
the latch/register at the outputs, depending on the desire to have a faster clock-
to-out versus setup time. This is generally considered to be an inferior solution,
since it changes the read operation into an asynchronous function with the
possibility of missing an address/control line transition during the generation of
the read pulse clock.
The write address is registered on the write port clock edge and the data input is
written to the memory and mirrored on the write port output.
Features
User programmable ground pin capability
Extended pattern security features for design protection
High-drive 24 mA outputs
3.3 V or 5 V I/O capability
Advanced CMOS 5V FastFLASH technology
Supports parallel programming of more than one XC9500 concurrently
Available in 44-pin PLCC, 84-pin PLCC, 100-pin PQFP and 100-pin TQFP
packages
Xilinx ISE
The Xilinx ISE tools allow you to use schematics, hardware description
languages (HDLs), and specially designed modules in a number of ways.
Schematics are drawn by using symbols for components and lines for
wires. Xilinx Tools is a suite of software tools used for the design of digital circuits
implemented using Xilinx Field Programmable Gate Array (FPGA) or Complex
Programmable Logic Device (CPLD) parts.
The design procedure consists of (a) design entry, (b) synthesis and
implementation of the design, (c) functional simulation and (d) testing and
verification. Digital designs can be entered in various ways using the above CAD
tools: using a schematic entry tool, using a hardware description language (HDL)
– Verilog or VHDL or a combination of both. In this lab we will only use the
design flow that involves the use of Verilog HDL. The CAD tools enable you to
design combinational and sequential circuits starting with Verilog HDL design
specifications.
3. Create the test-vectors and simulate the design (functional simulation) without
using a PLD (FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download bitstream to an FPGA or CPLD device.
6. Test design on FPGA/CPLD device
A Verilog input file in the Xilinx software environment consists of the following
segments:
• End: endmodule
ModelSim 6.2C
FEATURES:
Unified Coverage Database (UCDB) which is a central point for managing,
merging, viewing, analyzing and reporting all coverage information.
Source Annotation. The source window can be enabled to display the
values of objects during simulation or when reviewing simulation results
logged to WLF.
Finite State Machine Coverage for both VHDL and Verilog is now
supported.
Code Coverage results can now be reviewed post-simulation using the
graphical user environment.
Simulation messages are now logged in the WLF file and new capabilities
for managing message viewing are provided in the message viewer.
SystemC is now supported for x86 Linux 64-bit platforms.
Transaction recording and viewing is supported for SystemC using the
SCV transaction recording facilities.
The GUI debug and analysis environment continues to evolve to provide
greater user-customization and better performance.
SystemVerilog for design support continues to expand with many new
constructs added for this release.
Message logging and viewing. Simulation messages are now logged in
the WLF and new capabilities for managing message viewing are
provided. Messages are organized by their severity and type.
BENEFITS:
The best mixed-language environment and performance in the industry.
The intuitive GUI makes it easy to view and access the many powerful
capabilities of ModelSim. There is no learning curve as the debug
environment is common across all languages.
All ModelSim products are 100% standards based. This means your
investment is protected, risk is lowered, reuse is enabled, and productivity
is enhanced.
Award-winning technical support.
Figure 8.a Model Sim
ModelSim combines high performance and high capacity with the most
advanced code coverage and debugging capabilities in the industry. ModelSim
offers unmatched flexibility by supporting 32 and 64 bit UNIX and Linux and 32
bit Windows®-based platforms. Model Technology™ was the first to put the
award-winning single kernel simulator (SKS) technology in the hands of
engineers, enabling transparent mixing of VHDL, Verilog, and SystemC in one
design, using a common, intuitive graphical interface for development and debug
at any level, regardless of the language.
Verilog for Design:
CONCLUSION
Simulation results
The simulation output waveform of the top module of Reed
Solomon
Synthesis results
RTL schematic of sub module of self checking reed Solomon
RTL schematic of inner sub block module of self checking reed Solomon
REFERENCES
8. M. Gossel, S. Fenn, and D. Taylor, “On-line error detection for finite field
multipliers,” in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., 1997,
pp. 307–311.
9. Y.-C. Chuang and C.-W. Wu, “On-line error detection schemes for a systolic
finite-field inverter,” in Proc. 7th Asian Test Symp., 1998, pp. 301–305.
10. M. Boyarinov, “Self-checking algorithm of solving the key equation,” in Proc.
IEEE Int. Symp. Inf. Theory, 1998, p. 292.
11. C. Bolchini, F. Salice, and D. Sciuto, “A novel methodology for designing TSC
networks based on the parity bit code,” in Proc. Eur. Design Test Conf., 1997, pp.
440–444.
12. Altera Corp., San Jose, CA, “Altera Reed-Solomon compiler user guide
3.3.3,” 2006.
13. Xilinx, San Jose, CA, “Xilinx logicore Reed-Solomon decoder v5.1,” 2006.
14. D. Nikolos, “Design techniques for testable embedded error checkers,”
Computer, vol. 23, no. 7, pp. 84–88, Jul. 1990.
15. P. K. Lala, Fault Tolerant and Fault Testable Hardware Design. Englewood
Cliffs, NJ: Prentice-Hall, 1985.