Carry Select Adder
Introduction
The arithmetic logic unit (ALU) is the heart of every microprocessor and determines its
throughput. The core of the ALU is the adder, so a high-performance adder is essential to
maximize the microprocessor's speed. However, the high data activity associated with this unit
results in high power and thermal density, leading to increased cooling costs. Thus, there is a
critical need for VLSI design methodologies that reduce adder power consumption while
maintaining the high performance target. There are many ways to design an adder.
The Ripple Carry Adder has the most compact design but is the slowest. For an N-bit Ripple
Carry Adder the delay is linearly proportional to N, so for large values of N it gives the
highest delay of all adders. The Carry Look-ahead Adder is the fastest but consumes more area.
It is fast for N <= 4, but for large values of N its delay increases more than that of other
adders because of the large number of logic gates involved. The Carry Select Adder acts as a
compromise between the small-area, long-delay Ripple Carry Adder and the large-area,
short-delay Carry Look-ahead Adder.
Adders are among the most widely used components in electronic applications. They are used
in multipliers and in DSP to execute algorithms such as FFT, FIR and IIR. Microprocessors
perform millions of instructions per second, so speed of operation is the most important
constraint to be considered while designing multipliers.
For portable devices, the degree of miniaturization should be high and power consumption
should be low. In the rapidly growing mobile industry, faster units are not the only concern;
smaller area and lower power have also become major concerns in the design of digital
circuits. In mobile electronics, reducing area and power consumption is key to increasing
portability and battery life. Even in servers and desktop computers, power dissipation is an
important design constraint. The design of area- and power-efficient high-speed data-path logic
systems is one of the most substantial areas of research in VLSI system design.
In digital adders, the speed of addition is limited by the time required to propagate a carry
through the adder. The sum for each bit position in an elementary adder is generated sequentially
only after the previous bit position has been summed and a carry propagated into the next
position. Among various adders, the Carry Select Adder is intermediate regarding speed and area.
The CSLA is used in many computational systems to alleviate the problem of carry
propagation delay by independently generating multiple carries and then selecting a carry to
generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of
Ripple Carry Adders (RCA) to generate partial sums and carries by considering carry input
Cin = 0 and Cin = 1; the final sum and carry are then selected by the multiplexers.
The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA
with Cin = 1 in the regular CSLA to achieve lower area and power consumption. The main
advantage of this BEC logic is that it needs fewer logic gates than the n-bit Full Adder (FA)
structure.
The carry-select adder generally consists of two ripple carry adders and a multiplexer.
Adding two n-bit numbers with a carry-select adder is done with two adders (therefore two ripple
carry adders) in order to perform the calculation twice, one time with the assumption of the carry
being zero and the other assuming one. After the two results are calculated, the correct sum, as
well as the correct carry, is then selected with the multiplexer once the correct carry is known.
The number of bits in each carry select block can be uniform or variable. In the uniform
case, the optimal delay occurs for a block size of about √n, where n is the number of bits being
added. When variable, the block size should have a delay, from addition inputs A and B to the
carry out, equal to that of the multiplexer chain leading into it, so that the carry out is
calculated just in time. The O(√n) delay is derived from uniform sizing, where the ideal number
of full-adder elements per block is equal to the square root of the number of bits being added,
since that will yield an equal number of MUX delays.
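The square-root rule can be motivated with a rough delay model. The sketch below is not taken from the report: it assumes a uniform block size k, a per-bit full-adder delay t_FA and a per-block multiplexer delay t_MUX (all three symbols are introduced here for illustration), and it ignores constant terms.

```latex
\begin{align*}
T(k) &\approx k\,t_{FA} + \frac{n}{k}\,t_{MUX} \\
\frac{dT}{dk} &= t_{FA} - \frac{n}{k^{2}}\,t_{MUX} = 0
  \quad\Longrightarrow\quad
  k_{opt} = \sqrt{\frac{n\,t_{MUX}}{t_{FA}}} \\
T(k_{opt}) &\approx 2\sqrt{n\,t_{FA}\,t_{MUX}}
\end{align*}
```

Under the further assumption t_FA = t_MUX this gives k_opt = √n, for example 4-bit blocks for a 16-bit adder, and a total delay that grows as √n.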
1.1 VLSI technology
Very large scale integration (VLSI) is the process of creating integrated circuits by
combining thousands of transistors into a single chip. VLSI began in the 1970s when
complex semiconductor and communication technologies were being developed.
The first semiconductor chips held two transistors. Subsequent advances added more and more
transistors and, as a consequence, more individual functions or systems were integrated over time.
The first integrated circuits held only a few devices, perhaps as many as
ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or
more logic gates on a single device.
The earlier integrated circuit design methods are SSI, MSI and LSI. SSI means small scale
integration, in which circuits held only a few devices, perhaps as many as ten diodes,
transistors, resistors and capacitors, making it possible to fabricate one or more logic gates.
MSI means medium scale integration; this technique led to devices with hundreds of logic gates.
LSI means large scale integration, i.e. systems with at least a thousand logic gates.
Digital VLSI circuits are predominantly CMOS based. The way normal blocks like latches
and gates are implemented is different from what we have seen so far but the behavior remains
the same. The miniaturization involves new things to consider. A lot of thought has to go into
implementations as well as design.
CMOS (Complementary metal oxide semiconductor) technology is used for constructing
integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM
and the other digital logic circuits. CMOS technology is also used for several analog circuits
such as image sensors, data converters, and highly integrated transceivers for many types of
communication.
i. AHDL
ii. VHDL
iii. Verilog
1.2.1. AHDL
AHDL means Analog HDL. This method is not widely used because its behavioral results
are not very accurate. The languages in current use are VHDL and Verilog HDL.
1.2.2. VHDL
VHDL means VHSIC hardware description language. VHDL is commonly used to write
text models that describe a logic circuit. Such a model is processed by a synthesis program, only
if it is part of the logic design. A simulation program is used to test the logic design using
simulation models to represent the logic circuits that interface to the design. This collection of
simulation models is commonly called a test bench.
1.2.3. Verilog HDL
In the semiconductor and electronic design industry, Verilog is a general purpose hardware
description language (HDL) used to model electronic systems. It is easy to learn and easy to use.
It is similar in syntax to the C programming language. It is most commonly used in the design,
verification, and implementation of digital logic chips at the register transfer level (RTL) of
abstraction. It is also used in the verification of analog and mixed signal circuits.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design
hierarchy, and communicate with other modules through a set of declared input, output, and
bidirectional ports.
1.3 Introduction to adders
two bits (S) and the carry (C). The logic level diagram for Half Adder is shown in Fig 1.1. The
Boolean expressions for the S and C bits are as shown below.
S=A⊕B (1.1)
C=A×B (1.2)
SUM bit is the XOR function of two inputs and CARRY bit is the AND function of the
two inputs. The truth table of a half adder is shown in Table 1.1.
INPUTS      OUTPUTS
A   B       S   C
0   0       0   0
0   1       1   0
1   0       1   0
1   1       0   1
Note how the same two inputs are directed to two different gates. The inputs to the XOR
gate are also the inputs to the AND gate. The input "wires" to the XOR gate are tied to the input
wires of the AND gate; thus, when voltage is applied to the A input of the XOR gate, the A input
to the AND gate receives the same voltage.
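As an illustration of equations (1.1) and (1.2), a half adder can be written with two Verilog gate primitives that share the same inputs. This is a minimal sketch; the module and port names are chosen here and are not taken from the report.

```verilog
// Half adder: one XOR gate and one AND gate driven by the same two inputs.
// Module and port names are illustrative assumptions.
module half_adder (
    input  wire a,
    input  wire b,
    output wire s,   // sum,   S = A xor B  (1.1)
    output wire c    // carry, C = A and B  (1.2)
);
    xor (s, a, b);   // the XOR gate sees inputs a and b ...
    and (c, a, b);   // ... and the AND gate is tied to the same two wires
endmodule
```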
A full adder is a combinational circuit that forms the arithmetic sum of three input bits.
It has three inputs and two outputs. In our design, the three inputs are designated A, B and C,
where the third input C represents the carry input to the stage. The outputs are the sum S and
the carry out C1. Fig 1.2 shows the logic level diagram of a full adder. The Boolean
expressions for the S and C1 bits are as shown below.
S = A ⊕ B ⊕ C (1.3)
C1 = (A × B) + (B × C) + (A × C) (1.4)
The SUM bit is the XOR function of all three inputs, and the CARRY bit is the OR of the
pairwise AND terms of the inputs, i.e. it is '1' when at least two of the three inputs are '1'.
The truth table of a full adder is shown in Table 1.2. The truth table also indicates the
status of the CARRY bit, that is, whether the carry has been generated, deleted or propagated,
depending on the input bits A and B. If exactly one of the A and B inputs is '1', the previous
carry is just propagated, as the sum of A and B is '1'. If both A and B are '1', a carry is
generated, because summing A and B makes C1 '1' regardless of the previous carry. If both A and
B are '0', summing A and B gives '0', and any previous carry is absorbed into S, making the C1
bit '0'. This is in effect deleting the CARRY.
INPUT           OUTPUT
A   B   C       S   C1
0   0   0       0   0
0   0   1       1   0
0   1   0       1   0
0   1   1       0   1
1   0   0       1   0
1   0   1       0   1
1   1   0       0   1
1   1   1       1   1
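The full adder truth table follows directly from equations (1.3) and (1.4). The sketch below expresses them as continuous assignments; the names cin and cout stand for the input C and the output C1 and are chosen here for readability.

```verilog
// Full adder from equations (1.3) and (1.4).
// cin corresponds to input C, cout to output C1 (names are assumptions).
module full_adder (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire s,
    output wire cout
);
    assign s    = a ^ b ^ cin;                       // (1.3)
    assign cout = (a & b) | (b & cin) | (a & cin);   // (1.4)
endmodule
```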
[Figure: 4-bit ripple carry adder. Four cascaded full adders take the inputs A3..A0, B3..B0 and
Cin and produce the sum bits and Cout.]
Ripple carry adder calculates sum and carry according to the following equation.
Si = Ai ⊕ Bi ⊕ Ci (1.5)
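A ripple carry adder can be sketched as a chain of full-adder stages in which each stage waits for the carry of the stage before it. In the sketch below the sum follows equation (1.5), and the carry of each stage is assumed to follow the full-adder carry expression (1.4). The module name, the parameter N and the internal carry vector c are illustrative assumptions.

```verilog
// N-bit ripple carry adder built from N full-adder stages.
// Names and the parameter N are illustrative assumptions.
module ripple_carry_adder #(parameter N = 4) (
    input  wire [N-1:0] a,
    input  wire [N-1:0] b,
    input  wire         cin,
    output wire [N-1:0] s,
    output wire         cout
);
    wire [N:0] c;            // c[i] is the carry into bit position i
    assign c[0] = cin;

    genvar i;
    generate
        for (i = 0; i < N; i = i + 1) begin : stage
            assign s[i]   = a[i] ^ b[i] ^ c[i];                             // (1.5)
            assign c[i+1] = (a[i] & b[i]) | (b[i] & c[i]) | (a[i] & c[i]);  // as in (1.4)
        end
    endgenerate

    assign cout = c[N];      // the carry ripples through all N stages
endmodule
```

Because c[i+1] depends on c[i], the worst-case delay grows linearly with N, which is the behaviour described for the Ripple Carry Adder in the introduction.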
The carry out calculated from the least significant stage is used to select the actual
calculated values of output carry and sum. The selection is done by using a multiplexer.
Dividing the adder into stages in this way increases the area but speeds up the addition.
The basic block diagram of the carry select adder is shown in Fig 1.4.
The Carry Select Adder (CSLA) is one of the fastest adders and is used in many
data-processing processors to perform fast arithmetic functions. The carry select adder
partitions the adder into several groups, each of which performs two additions in parallel.
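[Figure: block diagram. The inputs A's and B's feed a 4-bit setup stage that produces the P's
and G's; these yield the carries (C's), and a sum generation stage produces the S's.]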
A BEC is a circuit that adds 1 to its input number. The circuit of a 3-bit BEC and its
function table are shown in Fig 1.5 and Table 1.3 respectively. The main objective of this
project is to reduce the gate count by using the Binary to Excess-1 Converter. To reduce area
and power, an (n+1)-bit Binary to Excess-1 Converter is used in place of an n-bit RCA.
Fig 1.5 3-Bit Binary to Excess-1 Converter
BINARY [2:0]      EXCESS-1 [2:0]
B2  B1  B0        X2  X1  X0
0   0   0         0   0   1
0   0   1         0   1   0
0   1   0         0   1   1
0   1   1         1   0   0
1   0   0         1   0   1
1   0   1         1   1   0
1   1   0         1   1   1
1   1   1         0   0   0
Fig 1.5 shows the basic function of the CSLA. One input of the 6:3 mux is the BEC output
(X2, X1 and X0) and the other input is the output of the RCA with Cin = 0 (B2, B1 and B0). The
two possible partial results are thus produced in parallel, and the mux selects either the BEC
output or the direct inputs according to the control signal Cin. The importance of the BEC
logic lies in the large reduction in silicon area.
The Boolean expressions of the 3-bit BEC are shown below:
X0 = ~B0 (1.7)
X1 = B0 ⊕ B1 (1.8)
X2 = B2 ⊕ (B1 × B0) (1.9)
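Equations (1.7) to (1.9) translate line for line into Verilog. The sketch below adds a small self-checking test bench that compares the BEC output with b + 1 for all eight inputs. The module names bec3 and bec3_tb are chosen here for illustration.

```verilog
// 3-bit Binary to Excess-1 Converter from equations (1.7)-(1.9).
// Module names are illustrative assumptions.
module bec3 (
    input  wire [2:0] b,
    output wire [2:0] x
);
    assign x[0] = ~b[0];                    // (1.7)
    assign x[1] = b[0] ^ b[1];              // (1.8)
    assign x[2] = b[2] ^ (b[1] & b[0]);     // (1.9)
endmodule

// Test bench (not synthesizable): checks x == b + 1 modulo 8 for every input.
module bec3_tb;
    reg  [2:0] b;
    wire [2:0] x;
    integer    i;

    bec3 dut (.b(b), .x(x));

    initial begin
        for (i = 0; i < 8; i = i + 1) begin
            b = i;                          // integer is truncated to 3 bits
            #1;
            if (x !== b + 3'd1)
                $display("mismatch: b=%b x=%b", b, x);
        end
        $finish;
    end
endmodule
```

The comparison wraps at 3 bits, so the last row of Table 1.3 (111 to 000) is covered as well.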
The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig 1.6.
The gates between the dotted lines are performing the operations in parallel and the numeric
representation of each gate indicates the delay contributed by that gate.
The delay and area evaluation methodology considers all gates to be made up of AND,
OR, and Inverter, each having delay equal to 1 unit and area equal to 1 unit. Then add up the
number of gates in the longest path of a logic block that contributes to the maximum delay.
Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and Full
Adder (FA) are evaluated and listed in Table 1.4.
Adder block    Delay    Area
XOR            3        5
2:1 Mux        3        4
Half Adder     3        6
Full Adder     6        13
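The XOR entry (delay 3, area 5) can be reproduced with a gate-level model that uses only AND, OR and inverter primitives, each given one unit of delay. The particular decomposition below (two inverters, two AND gates, one OR gate) is a common AOI form and is assumed here; the report's Fig 1.6 may draw it differently.

```verilog
// AOI implementation of XOR: y = (a & ~b) | (~a & b).
// Five gates (area 5), three gate levels on the longest path (delay 3).
// The decomposition and all names are illustrative assumptions.
module xor_aoi (
    input  wire a,
    input  wire b,
    output wire y
);
    wire na, nb, t0, t1;

    not #1 (na, a);       // level 1: the two inverters operate in parallel
    not #1 (nb, b);
    and #1 (t0, a, nb);   // level 2: the two AND gates operate in parallel
    and #1 (t1, na, b);
    or  #1 (y, t0, t1);   // level 3
endmodule
```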
The Xilinx 9.1i software is used to synthesize the Modified CSLA and the ModelSim 6.4a
software is used to simulate it.
ModelSim is a powerful simulator that can be used to simulate the behavior and
performance of logic circuits. The simulator allows the user to apply inputs to the designed
circuit, usually referred to as test vectors, and to observe the outputs generated in response. The
user can use the Waveform Editor to represent the input signals as waveforms.
1.6.1.1 Basic simulation flow
[Figure: basic simulation flow; the surviving steps are Run simulation and Debug results.]
In ModelSim, all designs, be they VHDL, Verilog, or some combination thereof, are
compiled into a library. You typically start a new simulation in ModelSim by creating a working
library called "work". "Work" is the library name used by the compiler as the default destination
for compiled design units.
Compiling design
After creating the working library, you compile your design units into it. The ModelSim
library format is compatible across all supported platforms. You can simulate your design on any
platform without having to recompile your design.
With the design compiled, you invoke the simulator on a top-level module (Verilog) or a
configuration or entity/architecture pair (VHDL). Assuming the design loads successfully, the
simulation time is set to zero, and you enter a run command to begin simulation.
Debugging results
If you don’t get the results you expect, you can use ModelSim’s robust debugging
environment to track down the cause of the problem.
A project is a collection mechanism for an HDL design under specification or test. Even
though you don’t have to use projects in ModelSim, they may ease interaction with the tool and
are useful for organizing files and specifying simulation settings. The following Fig 1.8 shows
the basic steps for simulating a design within a ModelSim project.
[Fig 1.8: project flow; the surviving steps are Create a project, Run simulation and Debug
results.]
The flow is similar to the basic simulation flow. However, there are two important differences:
i. You do not have to create a working library in the project flow; it is done for you
automatically.
ii. Projects are persistent. They will open every time you invoke ModelSim unless you
specifically close them.
Chapter 2
Literature review
Ram Kumar et al. (2012) proposed that the design of area- and power-efficient high-speed
data path logic systems is one of the most substantial areas of research in VLSI system design.
In digital adders, the speed of addition is limited by the time required to propagate a carry
through the adder. The sum for each bit position in an elementary adder is generated sequentially
only after the previous bit position has been summed and a carry propagated into the next
position.
[2].Area Efficient Carry Select Adder
Anitha Kumari R D, Nayana N D. “Low power and Area Efficient Carry Select Adder”,
National Conference on Electronics, Communication and Signal Processing, NCECS-2011.
Anitha Kumari et al. (2011) proposed that most of the VLSI applications, such as digital
signal processing, image and video processing, and microprocessors, extensively use arithmetic
operations. Addition, subtraction, and multiplication are examples of the most commonly used
operations. The 1-bit full adder cell is the building block of all these modules. Thus, enhancing
its performance is critical for enhancing the overall module performance.
[3].Improved Carry Select Adder with Reduced Area and Low Power
Consumption
Padma Devi, Ashima Girdher and Balwinder Singh, “Improved Carry Select Adder
with Reduced Area and Low Power Consumption”, International Journal of Computer
Applications (0975 – 8887), Volume 3, No. 4, June 2010.
Padma Devi et al. (2010) proposed that power dissipation is one of the most important
design objectives in integrated circuits, after speed. As adders are the most widely used
components in such circuits, design of efficient adder is of much concern. The Carry Select
Adder (CSA) provides a good compromise between cost and performance in carry propagation
adder design.
Ceiang et al. (1998) proposed that instead of using dual carry-ripple adders a carry select
adder scheme using an add-one circuit to replace one carry-ripple adder requires fewer
transistors. If speed is crucial for this 64-bit adder, then two of the original carry select adder
blocks can be substituted by the proposed scheme with an area saving and the same speed.
Jeong et al. (2003) proposed that a new circuit based on combining XOR gates and
double pass-transistor logic has been developed for implementing a full adder. The main design
objectives for these new circuits are low power consumption.
[6]. An area efficient 64-bit square root carry select adder for low power
applications
He Y, Chang C H, and Gu J, “An area efficient 64-bit square root carry select adder
for low power applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–
4085.
He et al. (2005) proposed that the carry select method has been deemed to be a good
compromise between cost and performance in carry propagation adder design. However, the
conventional carry select adder (CSLA) is still area-consuming due to the dual ripple carry
adder structure.
[7].Two New Low Power High Performance Full Adders with Minimum Gates
Hosseinghadiry M, Mohammadi H and Nadisenejani M, “Two New Low Power
High Performance Full Adders with Minimum Gates”, World Academy of Science, Engineering
and Technology 52 2009.
Hosseinghadiry et al. (2009) proposed that with increasing circuit complexity and demand
to use portable devices, power consumption is one of the most important parameters these days.
Full adders are the basic block of many circuits. Therefore reducing power consumption in full
adders is very important in low power circuits. One of the most power consuming modules in full
adders is XOR circuit.
Kim et al. (2001) proposed that a carry select adder can be implemented by using a single
ripple carry adder and an add-one circuit instead of using dual ripple carry adders. A multiplexer-
based add-one circuit is proposed to reduce the area with negligible speed penalty.
[9].Single bit full adder design using 8 transistors with novel 3 transistors
XNOR gate
Manoj Kumar, Sandeep K. Arya and Sujata Pandey, “Single bit full adder design
using 8 transistors with novel 3 transistors XNOR gate”, International Journal of VLSI design &
Communication Systems (VLSICS) Vol.2, No.4, December 2011.
Manoj Kumar et al. (2011) proposed that with the exponential growth of portable electronic
devices like laptops, multimedia and cellular devices, research efforts in the field of low
power VLSI (Very Large Scale Integration) systems have increased manyfold. With the rise in
chip density, the power consumption of VLSI systems is also increasing, which further adds to
reliability and packaging problems. The packaging and cooling costs of VLSI systems also go up
with high power dissipation.
Chapter 3
Existing system
Schematic
The schematic structure of the 16-bit regular carry select adder is shown in Fig 3.1.
It has five groups of RCAs of different sizes. The steps leading to the evaluation are as follows.
Fig 3.2 Group 1 Architecture
1. Group 1 has one ripple carry adder containing two full adders. The carry input Cin and the
data bits a and b are given to group 1. The full adders perform the addition and give the sum
and carry. That carry is given to group 2 as the control (select) line of its multiplexer.
2. Group 2 contains two sets of ripple carry adders. Carry input Cin = 1 is fed to one RCA and
Cin = 0 to the other, and the same data bits are given to both. The RCA operating with Cin = 1
contains two full adders; the RCA operating with Cin = 0 contains one half adder and one full
adder. The sums and carries of both RCAs are given to the multiplexer, whose control line is
the carry out of group 1. If the control line is 1, the sum and carry of the Cin = 1 RCA are
selected; otherwise those of the Cin = 0 RCA are selected. The selected carry out becomes the
control line of the group 3 multiplexer.
3. Group 3 contains two sets of ripple carry adders. Carry input Cin = 1 is fed to one RCA and
Cin = 0 to the other, and the same data bits are given to both. The RCA operating with Cin = 1
contains three full adders; the RCA operating with Cin = 0 contains one half adder and two full
adders. The sums and carries of both RCAs are given to the multiplexer, whose control line is
the carry out of group 2. If the control line is 1, the sum and carry of the Cin = 1 RCA are
selected; otherwise those of the Cin = 0 RCA are selected. The selected carry out becomes the
control line of the group 4 multiplexer.
Fig 3.5 Group 4 Architecture
4. Group 4 contains two sets of ripple carry adders. Carry input Cin = 1 is fed to one RCA and
Cin = 0 to the other, and the same data bits are given to both. The RCA operating with Cin = 1
contains four full adders; the RCA operating with Cin = 0 contains one half adder and three
full adders. The sums and carries of both RCAs are given to the multiplexer, whose control line
is the carry out of group 3. If the control line is 1, the sum and carry of the Cin = 1 RCA are
selected; otherwise those of the Cin = 0 RCA are selected. The selected carry out becomes the
control line of the group 5 multiplexer.
5. Group 5 contains two sets of ripple carry adders. Carry input Cin = 1 is fed to one RCA and
Cin = 0 to the other, and the same data bits are given to both. The RCA operating with Cin = 1
contains five full adders; the RCA operating with Cin = 0 contains one half adder and four full
adders. The sums and carries of both RCAs are given to the multiplexer, whose control line is
the carry out of group 4. If the control line is 1, the sum and carry of the Cin = 1 RCA are
selected; otherwise those of the Cin = 0 RCA are selected.
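The five steps above can be sketched as one reusable group module plus a 16-bit top level with group widths 2, 2, 3, 4 and 5. This is a behavioural sketch under stated assumptions: the two RCAs of each group are written with the + operator rather than as explicit half and full adders, and the names rcsla_group and rcsla16, the port names and the bit ranges assigned to each group are chosen here, not taken from the report's figures.

```verilog
// One regular CSLA group: two adders (Cin = 0 and Cin = 1) and a multiplexer.
// All names and the parameter W are illustrative assumptions.
module rcsla_group #(parameter W = 2) (
    input  wire [W-1:0] a,
    input  wire [W-1:0] b,
    input  wire         sel,    // carry out of the previous group
    output wire [W-1:0] sum,
    output wire         cout
);
    wire [W:0] r0 = {1'b0, a} + {1'b0, b};          // RCA with Cin = 0
    wire [W:0] r1 = {1'b0, a} + {1'b0, b} + 1'b1;   // RCA with Cin = 1

    assign {cout, sum} = sel ? r1 : r0;             // control line 1 selects the Cin = 1 result
endmodule

// 16-bit regular CSLA: group 1 is a plain 2-bit adder, groups 2-5 are carry select groups.
module rcsla16 (
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        cin,
    output wire [15:0] sum,
    output wire        cout
);
    wire c1, c2, c3, c4;

    // Group 1: 2-bit adder driven by the real carry input
    assign {c1, sum[1:0]} = {1'b0, a[1:0]} + {1'b0, b[1:0]} + cin;

    rcsla_group #(2) g2 (.a(a[3:2]),   .b(b[3:2]),   .sel(c1), .sum(sum[3:2]),   .cout(c2));
    rcsla_group #(3) g3 (.a(a[6:4]),   .b(b[6:4]),   .sel(c2), .sum(sum[6:4]),   .cout(c3));
    rcsla_group #(4) g4 (.a(a[10:7]),  .b(b[10:7]),  .sel(c3), .sum(sum[10:7]),  .cout(c4));
    rcsla_group #(5) g5 (.a(a[15:11]), .b(b[15:11]), .sel(c4), .sum(sum[15:11]), .cout(cout));
endmodule
```

A synthesis tool is free to restructure the + operators, so a gate-accurate area comparison would need the explicit half-adder and full-adder netlist described in the steps.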
Group 2 has two sets of 2-bit RCAs. Considering the delay values, the arrival time of
selection input C1 of the 6:3 multiplexer is earlier than that of S3 and later than that of S2.
Thus, the delay of sum3 is the arrival time of S3 plus the multiplexer delay, and the delay of
sum2 is the arrival time of C1 plus the multiplexer delay.
Except for group2, the arrival time of multiplexer selection input is always greater than the
arrival time of data outputs from the RCA’s. Thus, the delay of group3 to group5 is determined,
respectively as follows:
The Group 2 architecture calculation is,
3.1.2 Area calculation for RCSLA
The area calculation of regular CSLA is derived from the following steps. From the
structure of RCSLA, 8-bit, 16-bit, 32-bit and 64-bit area is calculated.
8-bit RCSLA
The total area of the 8-bit regular CSLA is 179. The total area for the different word sizes
is tabulated in Table 1.6.
Table 1.6 Regular CSLA Area
Word Size    Adder    Area (no. of gates)
8-bit        RCSLA    179
16-bit       RCSLA    399
32-bit       RCSLA    839
64-bit       RCSLA    1719
Chapter 4
Proposed system
4.1.1 Schematic
The block schematic of the 16-bit Modified Carry Select Adder (MCSLA) is shown in Fig
4.1.
The structure uses a binary to excess-1 converter in place of the RCA with Cin = 1 to
optimize area and power. In our proposed method the RCA with carry input 1 is replaced by
the BEC.
The n-bit RCA is replaced by an (n+1)-bit BEC. The number of gates used in the BEC is
less than in the RCA. The structure of the 16-bit modified carry select adder is shown in
Fig 4.1. It has five groups with binary to excess-1 converters of different sizes. The steps
leading to the evaluation are given here.
2. Group 2 contains one ripple carry adder and one BEC. Carry input Cin = 0 is fed to the RCA
together with the data bits; the RCA contains one half adder and one full adder. The binary to
excess-1 converter (BEC) is used instead of the RCA with Cin = 1 of the regular CSLA to achieve
low area and low power consumption. The sums and carries of the RCA and the BEC are given to
the multiplexer, whose control line is the carry out of group 1. If the control line is 0, the
RCA's sum and carry are selected; otherwise the BEC's sum and carry are selected. The selected
carry out becomes the control line of the group 3 multiplexer.
3. Group 3 contains one ripple carry adder and one BEC. Carry input Cin = 0 is fed to the RCA
together with the data bits; as in the regular CSLA, this RCA contains one half adder and two
full adders. The binary to excess-1 converter (BEC) is used instead of the RCA with Cin = 1 of
the regular CSLA to achieve low area and low power consumption. The sums and carries of the RCA
and the BEC are given to the multiplexer, whose control line is the carry out of group 2. If
the control line is 0, the RCA's sum and carry are selected; otherwise the BEC's sum and carry
are selected. The selected carry out becomes the control line of the group 4 multiplexer.
4. Group 4 contains one ripple carry adder and one BEC. Carry input Cin = 0 is fed to the RCA
together with the data bits; as in the regular CSLA, this RCA contains one half adder and three
full adders. The binary to excess-1 converter (BEC) is used instead of the RCA with Cin = 1 of
the regular CSLA to achieve low area and low power consumption. The sums and carries of the RCA
and the BEC are given to the multiplexer, whose control line is the carry out of group 3. If
the control line is 0, the RCA's sum and carry are selected; otherwise the BEC's sum and carry
are selected. The selected carry out becomes the control line of the group 5 multiplexer.
Fig 4.6 Group 5 Architecture
5. Group 5 contains one ripple carry adder and one BEC. Carry input Cin = 0 is fed to the RCA
together with the data bits; as in the regular CSLA, this RCA contains one half adder and four
full adders. The binary to excess-1 converter (BEC) is used instead of the RCA with Cin = 1 of
the regular CSLA to achieve low area and low power consumption. The sums and carries of the RCA
and the BEC are given to the multiplexer, whose control line is the carry out of group 4. If
the control line is 0, the RCA's sum and carry are selected; otherwise the BEC's sum and carry
are selected.
Group 2 has one 2-bit RCA, consisting of 1 FA and 1 HA, for Cin = 0. Instead of another
2-bit RCA with Cin = 1, a 3-bit BEC is used, which adds one to the output of the 2-bit RCA.
Considering the delay values, the arrival time of selection input C1 [time (t) = 7] of the 6:3
multiplexer is earlier than that of S3 [t = 9] and C3 [t = 10] and later than that of S2
[t = 4]. Thus sum3 and the final C3 (output from the multiplexer) depend on S3 plus the
multiplexer delay and on the partial C3 (input to the multiplexer) plus the multiplexer delay,
respectively. Sum2 depends on C1 and the multiplexer delay.
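Group 2 of the modified CSLA can be sketched as follows: a 2-bit addition with Cin = 0, a 3-bit BEC that adds one to the 3-bit result (carry and two sum bits), and a 6:3 multiplexer. The RCA is written behaviourally with the + operator, and all module and signal names are illustrative assumptions. A small test bench checks the group against a + b + sel for all 32 input combinations.

```verilog
// Modified CSLA group 2: 2-bit RCA (Cin = 0), 3-bit BEC, 6:3 multiplexer.
// All names are illustrative assumptions.
module mcsla_group2 (
    input  wire [1:0] a,
    input  wire [1:0] b,
    input  wire       sel,    // C1, the carry out of group 1
    output wire [1:0] sum,
    output wire       cout
);
    // 2-bit RCA with Cin = 0; r0[2] is its carry out
    wire [2:0] r0 = {1'b0, a} + {1'b0, b};

    // 3-bit BEC: x = r0 + 1, using the BEC equations (1.7)-(1.9)
    wire [2:0] x;
    assign x[0] = ~r0[0];
    assign x[1] = r0[0] ^ r0[1];
    assign x[2] = r0[2] ^ (r0[1] & r0[0]);

    // 6:3 mux: sel = 0 keeps the RCA result, sel = 1 takes the BEC result
    assign {cout, sum} = sel ? x : r0;
endmodule

// Test bench (not synthesizable): exhaustive check against a + b + sel.
module mcsla_group2_tb;
    reg  [1:0] a, b;
    reg        sel;
    wire [1:0] sum;
    wire       cout;
    integer    i;

    mcsla_group2 dut (.a(a), .b(b), .sel(sel), .sum(sum), .cout(cout));

    initial begin
        for (i = 0; i < 32; i = i + 1) begin
            {sel, a, b} = i;
            #1;
            if ({cout, sum} !== a + b + sel)
                $display("mismatch: a=%d b=%d sel=%b", a, b, sel);
        end
        $finish;
    end
endmodule
```

The RCA result is at most 3 + 3 = 6, so adding one never overflows the 3-bit BEC output.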
For the remaining groups, the arrival time of the multiplexer selection input is always
greater than the arrival time of the data inputs from the BECs. Thus, the delay of the
remaining groups depends on the arrival time of the multiplexer selection input and the
multiplexer delay.
The area calculation of modified CSLA is derived from the following steps. From the
structure of MCSLA, 8-bit, 16-bit, 32-bit and 64-bit area is calculated.
The Group 1 architecture calculation is,
BEC:
AND = 1
NOT = 1
XOR = 10 (2×5)
BEC:
AND = 2
NOT = 1
XOR = 15 (3×5)
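The two BEC gate counts above can be totalled and compared with the RCA they replace. The arithmetic below uses the full-adder area of 13 from Table 1.4 and assumes that the first count belongs to a 3-bit BEC replacing a 2-bit RCA (two full adders) and the second to a 4-bit BEC replacing a 3-bit RCA (three full adders). That mapping is inferred from the n-bit RCA to (n+1)-bit BEC rule and is not stated next to the counts.

```latex
\begin{align*}
A_{\text{BEC},3} &= 1 + 1 + 2 \times 5 = 12, & A_{\text{RCA},2} &= 2 \times 13 = 26 \\
A_{\text{BEC},4} &= 2 + 1 + 3 \times 5 = 18, & A_{\text{RCA},3} &= 3 \times 13 = 39
\end{align*}
```

Under these assumptions the BEC needs fewer than half the gates of the RCA with Cin = 1 in both cases; multiplexer area is the same in either design and is left out of the comparison.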
The total area of the modified CSLA for the different word sizes is tabulated in
Table 1.7.
Table 1.7 Modified CSLA Area
Chapter 5
System requirement
Verilog HDL is one of the two most common Hardware Description Languages (HDLs) used by
integrated circuit (IC) designers. The other one is VHDL. HDLs allow the design to be simulated
earlier in the design cycle in order to correct errors or experiment with different architectures.
Designs described in HDL are technology-independent, easy to design and debug, and are usually
more readable than schematics, particularly for large circuits.
Verilog can be used to describe designs at four levels of abstraction:
(i) Algorithmic level (much like c code with if, case and loop statements).
(ii) Register transfer level (RTL uses registers connected by Boolean equations).
(iii) Gate level (interconnected AND, NOR etc.).
(iv) Switch level (the switches are MOS transistors inside gates).
The language also defines constructs that can be used to control the input and output of
simulation. More recently Verilog is used as an input for synthesis programs which will generate
a gate-level description (a net list) for the circuit. Some Verilog constructs are not synthesizable.
Also the way the code is written will greatly affect the size and speed of the synthesized circuit.
Most readers will want to synthesize their circuits, so non synthesizable constructs should be
used only for test benches. These are program modules used to generate I/O needed to simulate
the rest of the design. The words “not synthesizable” will be used for examples and constructs as
needed that do not synthesize.
There are two types of code in most HDLs:
Structural, which is a verbal wiring diagram without storage.
    assign a = b & c | d; /* "|" is an OR */
    assign d = e & (~c);
Here the order of the statements does not matter. Changing e will change a.
Procedural, which is used for circuits with storage, or as a convenient way to write
conditional logic.
    always @(posedge clk) // Execute the next statement on every rising clock edge.
        count <= count + 1;
Procedural code is written like C code and assumes every assignment is stored in memory until
overwritten. For synthesis, with flip-flop storage, this type of thinking generates too much
storage. However, people prefer procedural code because it is usually much easier to write; for
example, if and case statements are only allowed in procedural code. As a result, synthesizers
have been constructed that can recognize certain styles of procedural code as actually
combinational. They generate a flip-flop only for left-hand variables which truly need to be
stored. However, if you stray from this style, beware: your synthesis will start to fill with
superfluous latches. This manual introduces the basic and most common Verilog behavioral and
gate-level modeling constructs, as well as Verilog compiler directives and system functions. A full
description of the language can be found in the Cadence Verilog-XL Reference Manual and the Synopsys
HDL Compiler for Verilog Reference Manual. The latter emphasizes
only those Verilog constructs that are supported for synthesis by the Synopsys Design Compiler
synthesis tool.
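The warning above about superfluous latches can be illustrated with a minimal sketch of procedural code written in a combinational style. The module and port names (mux_comb, sel, a, b, y) are hypothetical and chosen only for illustration. Because y is assigned on every path through the block, no storage is needed; leaving out the default assignment would typically cause a synthesizer to infer a latch for y.

```verilog
// Hypothetical example: procedural code that describes pure combinational logic.
module mux_comb (
  input  wire       sel,
  input  wire [3:0] a,
  input  wire [3:0] b,
  output reg  [3:0] y     // declared reg because it is assigned in an always block
);
  always @(*) begin
    y = a;                // default assignment: y gets a value on every path
    if (sel)
      y = b;              // override the default when sel is 1
  end
endmodule
```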
In all examples, Verilog keywords are shown in boldface. Comments are shown in italics.
Numbers
Number storage is defined as a number of bits, but values can be specified in binary, octal,
decimal or hexadecimal (See Sect. 6.1. for details on number notation).
Examples are 3'b001, a 3-bit number; 5'd30 (= 5'b11110); and 16'h5ED4 (= 16'd24276).
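As a quick check of the notation, the three literals above can be printed in another radix from a small simulation-only module. This is a sketch; the module name number_demo is an assumption, and $display is not synthesizable.

```verilog
// Simulation-only sketch: print the example literals in a different radix.
module number_demo;
  initial begin
    $display("3'b001   = %0d", 3'b001);    // prints 1
    $display("5'd30    = %b",  5'd30);     // prints 11110
    $display("16'h5ED4 = %0d", 16'h5ED4);  // prints 24276
    $finish;
  end
endmodule
```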
Identifiers
Identifiers are user-defined words for variables, function names, module names, block names and
instance names.
Identifiers begin with a letter or underscore (not with a number or $) and can include any number
of letters, digits and underscores. Identifiers in Verilog are case-sensitive.
Syntax
Allowed symbols: ABCDE . . . abcdef . . . 1234567890 _ $
Not allowed: anything else, especially - & # @
5.1.2. Operators
Operators are one, two and sometimes three characters used to perform operations on variables.
Examples include >, +, ~, &, !=. Operators are described in detail in “Operators”.
Primitive logic gates are part of the Verilog language. Two properties can be specified: drive
strength and delay. Drive strength specifies the strength at the gate outputs. The strongest output
is a direct connection to a source, next comes a connection through a conducting transistor, then a
resistive pull-up/down. The drive strength is usually not specified, in which case the strengths
default to strong1 and strong0. Refer to the Cadence Verilog-XL Reference Manual for more
details on strengths.
Delays: If no delay is specified, then the gate has no propagation delay; if two delays are
specified, the first represents the rise delay and the second the fall delay; if only one delay is
specified, then the rise and fall delays are equal. Delays are ignored in synthesis. This method of specifying
delay is a special case of “Parameterized Modules”. The parameters for the primitive gates have
been predefined as delays.
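The three delay forms described above can be sketched as follows. The instance and signal names are hypothetical, and the delay values are arbitrary; the module is intended for simulation only, since delays are ignored in synthesis.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch of the zero-, one- and two-delay forms on an AND gate.
module gate_delay_demo;
  reg  a, b;
  wire y_none, y_one, y_two;

  and         g0 (y_none, a, b);  // no delay specified: zero propagation delay
  and #3      g1 (y_one,  a, b);  // one delay: rise and fall are both 3 time units
  and #(2, 5) g2 (y_two,  a, b);  // two delays: rise = 2, fall = 5

  initial begin
    $monitor("t=%0t a=%b b=%b y_none=%b y_one=%b y_two=%b",
             $time, a, b, y_none, y_one, y_two);
    a = 0; b = 0;
    #10 a = 1; b = 1;   // outputs rise after their respective rise delays
    #10 b = 0;          // outputs fall after their respective fall delays
    #10 $finish;
  end
endmodule
```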
buf, not Gates
These implement buffers and inverters, respectively. They have one input and one or more
outputs. In the gate instantiation syntax shown below, GATE stands for either the keyword buf or
not
Three-State Gates; bufif1, bufif0, notif1, notif0
These implement 3-state buffers and inverters. They propagate z (3-state or high-impedance) if
their control signal is deasserted. These can have three delay specifications: a rise time, a fall time,
and a time to go into 3-state.
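A minimal sketch of buf, not and bufif1 instantiations follows. Outputs are listed first, then the data input, and for three-state gates the control input comes last. All signal names and delay values here are assumptions made for illustration.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch: buffer, inverter and three-state buffer primitives.
module buf_tristate_demo;
  reg  in, en;
  wire out1, out2, inv, tri_out;

  buf               b1 (out1, out2, in);     // one input driving two outputs
  not               n1 (inv, in);            // inverter
  bufif1 #(2, 3, 4) t1 (tri_out, in, en);    // rise, fall and turn-off (to z) delays

  initial begin
    in = 1; en = 1;
    #10 $display("out1=%b out2=%b inv=%b tri_out=%b", out1, out2, inv, tri_out);
    en = 0;                                  // deassert control: output goes to z
    #10 $display("tri_out=%b", tri_out);
    $finish;
  end
endmodule
```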
Value Set
Verilog has only four basic values. Almost all Verilog data types store all these values:
0 (logic zero, or false condition)
1 (logic one, or true condition)
x (unknown logic value)
z (high impedance state)
x and z have limited use for synthesis.
Wire
A wire represents a physical wire in a circuit and is used to connect gates or modules. The value
of a wire can be read, but not assigned to, in a function or block. See “Functions” and
“Procedures: Always and Initial Blocks”. A wire does not store its value but must be
driven by a continuous assignment statement or by connecting
it to the output of a gate or module. Other specific types of wires include:
wand (wired-AND): the value of a wand depends on the logical AND of all the drivers connected to it.
wor (wired-OR): the value of a wor depends on the logical OR of all the drivers connected to it.
tri (three-state): all drivers connected to a tri must be z, except one (which determines the value of
the tri).
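The difference between a plain wire and the wired-logic net types can be sketched with two drivers on the same net. The names below are hypothetical, and the module is a simulation-only illustration.

```verilog
// Simulation-only sketch: wire with one driver, wand and wor with two drivers each.
module wire_types_demo;
  reg  a, b;
  wire y;        // plain wire driven by a single continuous assignment
  wand w_and;    // wired-AND: the two drivers below are ANDed together
  wor  w_or;     // wired-OR: the two drivers below are ORed together

  assign y     = a & b;
  assign w_and = a;
  assign w_and = b;
  assign w_or  = a;
  assign w_or  = b;

  initial begin
    a = 1; b = 0;
    #1 $display("y=%b w_and=%b w_or=%b", y, w_and, w_or);  // prints y=0 w_and=0 w_or=1
    $finish;
  end
endmodule
```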
Reg
Declare type reg for all data objects on the left-hand side of expressions in initial and always
procedures, or functions. See “Procedural Assignments”. A reg is the data type that
must be used for latches, flip-flops and memories. However, it often synthesizes into leads (wires) rather
than storage. In multi-bit registers, data is stored as unsigned numbers and no sign extension is
done for what the user might have thought were two’s complement numbers.
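The unsigned storage of reg values can be seen in a short simulation-only sketch. The variable names are assumptions; the integer is included only for contrast, since integers store signed values.

```verilog
// Simulation-only sketch: a reg holds an unsigned value and is not sign-extended.
module reg_unsigned_demo;
  reg [3:0] r;      // 4-bit reg, stored as an unsigned number
  reg [7:0] wide;   // wider reg used to show the lack of sign extension
  integer   i;      // integer, stored as a signed number

  initial begin
    r    = -1;      // truncated to 4'b1111
    wide = r;       // zero-extended to 8'b00001111, not 8'b11111111
    i    = -1;
    $display("r=%0d wide=%0d i=%0d", r, wide, i);  // prints r=15 wide=15 i=-1
    $finish;
  end
endmodule
```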
The keywords input, output and inout declare input, output and bidirectional ports of a module or task. Input and
inout ports are of type wire. An output port can be configured to be of type wire, reg, wand, wor
or tri. The default is wire.
Integer
Integers are general-purpose variables. For synthesis they are used mainly for loop indices,
parameters, and constants. See “Parameter”. They are implicitly of type reg. However,
they store data as signed numbers, whereas explicitly declared reg types store them as unsigned.
If they hold numbers which are not defined at compile time, their size will default to 32 bits. If
they hold constants, the synthesizer adjusts them to the minimum width needed at compilation.
Supply0, Supply1
Supply0 and supply1 define wires tied to logic 0 (ground) and logic 1 (power), respectively
Time
Time is a 64-bit quantity that can be used in conjunction with the $time system task to hold
simulation time. Time is not supported for synthesis and hence is used only for simulation
purposes.
Parameter
Parameters allow constants like word length to be defined symbolically in one place. This makes
it easy to change the word length later, by changing only the parameter. See also “Parameterized
Modules”. An alternative way to do the same thing is to use macro substitution; see “Macro
Definitions”.
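As a sketch of how a parameter keeps the word length in one place, the hypothetical adder below takes its width from a WIDTH parameter, which the instantiating module overrides. The module names, port names and the test values are assumptions for illustration only.

```verilog
// Hypothetical parameterized adder: the word length is defined once, as WIDTH.
module adder_n #(parameter WIDTH = 8) (
  input  wire [WIDTH-1:0] a,
  input  wire [WIDTH-1:0] b,
  output wire [WIDTH:0]   sum    // one extra bit holds the carry out
);
  assign sum = a + b;
endmodule

// Simulation-only test bench that overrides WIDTH to 16.
module adder_n_tb;
  reg  [15:0] a, b;
  wire [16:0] sum;

  adder_n #(.WIDTH(16)) dut (.a(a), .b(b), .sum(sum));

  initial begin
    a = 16'h1234; b = 16'h0F0F;
    #1 $display("a=%h b=%h sum=%h", a, b, sum);
    $finish;
  end
endmodule
```

Changing the word length then requires editing only the WIDTH override, not the body of the adder.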
5.1.6. Operators
Arithmetic Operators
These perform arithmetic operations. The + and - can be used as either unary (-z) or binary (x-y)
operators.
Relational Operators
Relational operators compare two operands and return a single bit, 1 or 0. These operators
synthesize into comparators. Wire and reg variables are treated as positive (unsigned) numbers. Thus (-3'b001) == 3'b111 and
(-3'b001) > 3'b110. However, for integers, -1 < 6.
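The unsigned versus signed comparisons above can be checked with a small simulation-only sketch; the variable names are hypothetical.

```verilog
// Simulation-only sketch: relational operators on an unsigned reg and a signed integer.
module relational_demo;
  reg [2:0] r;
  integer   i;

  initial begin
    r = -3'b001;   // stored as 3'b111, i.e. 7 when read as an unsigned value
    i = -1;        // integers keep their sign
    $display("r == 3'b111 : %b", r == 3'b111);  // prints 1
    $display("r >  3'b110 : %b", r >  3'b110);  // prints 1 (7 > 6)
    $display("i <  6      : %b", i <  6);       // prints 1 (signed comparison)
    $finish;
  end
endmodule
```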
Bit-wise Operators
Bit-wise operators do a bit-by-bit comparison between two operands. However see “Reduction
Operators”.
Logical Operators
Logical operators return a single bit 1 or 0. They are the same as bit-wise operators only for
single bit operands. They can work on expressions, integers or groups of bits, and treat all values
that are nonzero as “1”. Logical operators are typically used in conditional (if ... else) statements
since they work with expressions.
Reduction Operators
Reduction operators operate on all the bits of an operand vector and return a single-bit value.
These are the unary (one argument) form of the bit-wise operators above.
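The distinction between bit-wise, logical and reduction operators is easiest to see on the same pair of operands. The following is a simulation-only sketch with hypothetical names.

```verilog
// Simulation-only sketch: the same symbols used bit-wise, logically and as reductions.
module operator_demo;
  reg [3:0] a, b;

  initial begin
    a = 4'b1010; b = 4'b0101;
    $display("a & b  = %b", a & b);   // bit-wise AND, bit by bit: 0000
    $display("a && b = %b", a && b);  // logical AND: both operands are nonzero, so 1
    $display("&a     = %b", &a);      // reduction AND over all bits of a: 0
    $display("|a     = %b", |a);      // reduction OR over all bits of a: 1
    $display("^a     = %b", ^a);      // reduction XOR (parity) of a: 0
    $finish;
  end
endmodule
```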
Shift Operators
Shift operators shift the first operand by the number of bits specified by the second operand.
Vacated positions are filled with zeros for both left and right shifts.
Concatenation Operator
The concatenation operator combines two or more operands to form a larger vector.
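A short simulation-only sketch of the shift and concatenation operators described above follows; the operand values and names are arbitrary.

```verilog
// Simulation-only sketch: logical shifts fill with zeros; concatenation joins vectors.
module shift_concat_demo;
  reg [7:0] a;
  reg [3:0] hi, lo;

  initial begin
    a  = 8'b1001_0110;
    hi = 4'b1100;
    lo = 4'b0011;
    $display("a << 2   = %b", a << 2);    // prints 01011000
    $display("a >> 3   = %b", a >> 3);    // prints 00010010
    $display("{hi, lo} = %b", {hi, lo});  // prints 11000011
    $finish;
  end
endmodule
```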
5.1.7. Operands
Literals
Literals are constant-valued operands that can be used in Verilog expressions. The two common
Verilog literals are:
(a) String: A string literal is a one-dimensional array of characters enclosed in double quotes (" ").
(b) Numeric: constant numbers specified in binary, octal, decimal or hexadecimal.
Function Calls
The return value of a function can be used directly in an expression without first assigning it to a
register or wire variable. Simply place the function call as one of the operands. Make sure you
know the bit width of the return value of the function call. Construction of functions is described
in “Functions”.
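As a sketch of a function call used directly as an operand, the hypothetical function add4 below returns a 5-bit value, and its result feeds straight into a larger expression. The declared return width [4:0] is what determines the bit width mentioned above.

```verilog
// Simulation-only sketch: a function call used as an operand in an expression.
module function_demo;
  // Hypothetical helper: returns the 5-bit sum (carry included) of two 4-bit values.
  function [4:0] add4;
    input [3:0] x, y;
    begin
      add4 = x + y;
    end
  endfunction

  reg  [3:0] a, b;
  wire [5:0] total;

  // The return value is used directly, without an intermediate reg or wire.
  assign total = add4(a, b) + 1'b1;

  initial begin
    a = 4'd9; b = 4'd8;
    #1 $display("total=%0d", total);  // prints total=18
    $finish;
  end
endmodule
```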
5.1.8. Modules
Module Declaration
A module is the principal design entity in Verilog. The first line of a module declaration specifies
the name and port list (arguments). The next few lines specify the I/O type (input, output or
inout) and width of each port. The default port width is 1 bit.
Then the port variables must be declared wire, wand, . . ., reg. The default is wire. Typically
inputs are wire since their data is latched outside the module. Outputs are type reg if their signals
are stored inside an always or initial block.
Continuous Assignment
The continuous assignment is used to assign a value onto a wire in a module. It is the normal
assignment outside of always or initial blocks. Continuous assignment is done with an explicit
assign statement or by assigning a value to a wire during its declaration. Note that continuous
assignment statements are concurrent and are continuously executed during simulation. The order
of assign statements does not matter. Any change in any of the right-hand-side inputs will
immediately change a left-hand-side output.
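The module declaration and both forms of continuous assignment can be sketched together. The module and port names below are hypothetical; the non-ANSI port style is used so that the name and port list, the I/O types and the wire/reg declarations appear as separate lines, as described above.

```verilog
// Hypothetical module showing the declaration order and continuous assignments.
module and_or (a, b, c, y_wire, y_reg);  // name and port list
  input  a, b, c;                        // I/O type; width defaults to 1 bit
  output y_wire;                         // type defaults to wire
  output y_reg;
  reg    y_reg;                          // assigned inside an always block, so declared reg

  wire   ab = a & b;                     // continuous assignment in a wire declaration
  assign y_wire = ab | c;                // explicit continuous assignment

  always @(a or b or c)                  // procedural equivalent of the same logic
    y_reg = (a & b) | c;
endmodule
```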
Module Instantiations
Module declarations are templates from which one creates actual objects (instantiations).
Modules are instantiated inside other modules, and each instantiation creates a unique object
from the template. The exception is the top-level module which is its own instantiation.
The instantiated module’s ports must be matched to those defined in the template. This is
specified:
(i) By name, using a dot (.): “.template_port_name (name_of_wire_connected_to_port)”, or
(ii) By position, placing the ports in exactly the same positions in the port lists of both the template and
the instance.
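The two connection styles can be sketched with a hypothetical two-input AND template instantiated twice, once by name and once by position. All module, instance and signal names are assumptions.

```verilog
// Hypothetical template module.
module and2 (input a, input b, output y);
  assign y = a & b;
endmodule

// Two instances of the template: each creates a unique object.
module top (input p, input q, input r, output out);
  wire pq;

  and2 u1 (.a(p), .b(q), .y(pq));  // (i) by name: order does not matter
  and2 u2 (pq, r, out);            // (ii) by position: order must match the template
endmodule
```

Connection by name is usually the safer choice for modules with many ports, since reordering the template's port list does not silently change the wiring.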
Here a simple AND example is used to illustrate the step-by-step procedure for running a
program on hardware.
Step 1: Open Xilinx ISE and create a new project.
Click on Next.
Step 2: Select the Family, Device, Package and Speed of the Xilinx board, and also select your
programming language (Verilog/VHDL). Here the Verilog language is used.
Click on Next.
The Project Summary window appears.
Click on Finish.
Step 3: Click on Project > New Source
Step 4: Select Verilog Module as the source type and enter the file name (ANDing_code).
Click on Next.
Step 5: Define Module window.
Here we can define the inputs and outputs and their bit/bus sizes. In the ANDing example there are two inputs
and one output, each of a single bit. Also define a clock signal for clocked operation.
Click on Next.
Click on Finish.
Step 7: The Project Navigator window looks like the window below.
All the inputs and outputs are already defined in the Define Module window, so these inputs and
outputs are seen in the Project Navigator.
Step 8: Write a program for the ANDing operation in the module present in the Project Navigator.
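A minimal sketch of what the ANDing module from Step 8 might look like is given below. Only the module name ANDing_code comes from the steps above; the port names (clk, a, b, y) and the choice to register the output on the clock are assumptions, made because Step 5 mentions two single-bit inputs, one single-bit output and a clock signal.

```verilog
// Hypothetical ANDing module: port names and the clocked output are assumptions.
module ANDing_code (
  input  wire clk,   // clock signal defined in the Define Module window
  input  wire a,     // first switch input
  input  wire b,     // second switch input
  output reg  y      // LED output
);
  // Register the AND of the two inputs on every rising clock edge.
  always @(posedge clk)
    y <= a & b;
endmodule
```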
Step 9: Click on Project > New Source. Select the Implementation Constraints File type and enter the
file name (e.g. pinout).
Click on Next.
Click on Finish.
Step 10: Write the inputs, outputs and their pin locations in the proper format of the .ucf file (use the datasheet
of the Xilinx board for pin locations). Here two switches, SW0 and SW1, are used for input and one LED,
LD0, is used for output.
Step 11: Open the main ANDing program and double-click on Synthesize - XST. After successful
completion of synthesis, double-click on Implement Design. Implement Design consists of three
parts:
Translate
Map
Place and Route
Step 12: Double-click on Configure Target Device; a new ISE iMPACT window opens.
Step 13: Connect the Xilinx board to your PC/laptop using a USB cable.
Step 14: Double-click on Boundary Scan. Check the cable connection with Output > Cable Auto
Connect; if the cable is connected, the bottom part of the window looks like step 3 shown below.
Step 15: Click on File > Initialize Chain. The tool then asks “Do you want to continue and
assign configuration file(s)?”
Step 16: Click on Open. The tool then asks “Do you want to attach an SPI or BPI PROM to
this device?” Click on No. Click on Operations > Program. If programming is successful,
the tool shows Program Succeeded.
Step 17: Check the output on the hardware (board). Here the input is given through switches and
the output is shown on the LED. Output:
Chapter 6
Results and discussion
6.1.2 RTL schematic results in Xilinx 9.1i software
Fig 6.2 above is the RTL schematic of the Modified Carry Select Adder (MCSLA),
which is an internal block of the 16-bit modified carry select adder. The first stage consists
of a 2-bit RCA with its carry input set to 0. The next level of the adder block consists of a 2-bit RCA
and a 3-bit BEC. The 2-bit RCA computes the result for Cin = 0, and the 3-bit BEC adds one to that
result to give the result for Cin = 1. The carry of the first stage drives the select (control) line of the
next level's multiplexer. The multiplexer then selects the final sum value.
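The RCA, BEC and multiplexer arrangement described above can be sketched for a single 2-bit group as follows. This is an illustrative sketch, not the project's actual source code: the module names (rca2, bec3, mcsla_group2) and port names are assumptions, and the BEC uses the standard binary to excess-1 equations. The test bench checks all 32 input combinations against a + b + cin.

```verilog
// 3-bit Binary to Excess-1 Converter: x = b + 1 (modulo 8).
module bec3 (input [2:0] b, output [2:0] x);
  assign x[0] = ~b[0];
  assign x[1] = b[0] ^ b[1];
  assign x[2] = b[2] ^ (b[0] & b[1]);
endmodule

// 2-bit ripple carry adder built from two full-adder stages.
module rca2 (input [1:0] a, input [1:0] b, input cin,
             output [1:0] sum, output cout);
  wire c1;
  assign {c1,   sum[0]} = a[0] + b[0] + cin;
  assign {cout, sum[1]} = a[1] + b[1] + c1;
endmodule

// One 2-bit group: RCA with Cin = 0, BEC for the Cin = 1 case, then a mux.
module mcsla_group2 (input [1:0] a, input [1:0] b, input cin,
                     output [1:0] sum, output cout);
  wire [1:0] s0;   // sum assuming carry-in of 0
  wire       c0;   // carry out assuming carry-in of 0
  wire [2:0] x;    // {carry, sum} assuming carry-in of 1

  rca2 u_rca (.a(a), .b(b), .cin(1'b0), .sum(s0), .cout(c0));
  bec3 u_bec (.b({c0, s0}), .x(x));

  // The carry from the previous stage selects between the two results.
  assign {cout, sum} = cin ? x : {c0, s0};
endmodule

// Simulation-only exhaustive check of the group.
module mcsla_group2_tb;
  reg  [1:0] a, b;
  reg        cin;
  wire [1:0] sum;
  wire       cout;
  integer    i, errors;

  mcsla_group2 dut (.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));

  initial begin
    errors = 0;
    for (i = 0; i < 32; i = i + 1) begin
      {a, b, cin} = i;
      #1;
      if ({cout, sum} !== a + b + cin)
        errors = errors + 1;
    end
    $display("errors=%0d", errors);  // prints errors=0
    $finish;
  end
endmodule
```

The BEC never overflows here because the largest value the Cin = 0 adder can produce is 3 + 3 = 6, so adding one still fits in three bits.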
6.1.3 Technology schematic of MCSLA
Fig 6.3 above is the technology schematic of the 64-bit modified carry select adder architecture,
which is the schematic representation of the MCSLA design architecture.
6.1.4 Simulation results in Xilinx 9.1i software
Fig 6.4 above shows the simulation results of the modified carry select adder in Xilinx 14.1i.
The output matches the ModelSim simulation. Here the input data is given in binary form:
‘a=0011010011100011 and b=0010011110101001’ (34E3 and 27A9 in hexadecimal). The output result is
‘Sum =0101110010001100 and Cout = 0’ (5C8C in hexadecimal).
6.2 Comparison of the regular and modified CSLA
Word Size    Adder    Area (in Gates)    Delay (in ns)    Power (in mW)
The table above compares the regular and modified CSLA. The 64-bit modified CSLA has a
slightly larger delay (2.166 ns). The area and power of the
modified CSLA are significantly reduced, by 17% and 15% respectively. The modified CSLA
architecture is low power, low area, simple and efficient for VLSI hardware implementation.
Chapter 7
Conclusion
A simple approach is proposed in this project to reduce the area and power of the CSLA.
The reduced number of gates in this work offers a great advantage in the reduction of area and
also of total power. The compared results show that the modified CSLA has a slightly larger
delay (only 2.166 ns), but the area and power of the 64-bit modified CSLA are significantly
reduced, by 17% and 15% respectively. The modified CSLA architecture is therefore low area,
low power, simple and efficient for VLSI hardware implementation. This design
can be used in various applications such as multipliers and DSP units that execute algorithms like FFT,
FIR and IIR. Using a model-based approach, it could be implemented and tested using Xilinx tools.
Chapter 8
Future work
It would be interesting to test the design of a modified 128-bit SQRT CSLA. It
would also be interesting to use a Carry Look-ahead Adder (CLA) instead of the RCA with Cin=0 in the
modified carry select adder (MCSLA) to achieve higher speed and smaller delay.
Bibliography
Online helps:
1. Google
70
References
1. Anitha Kumari R D, Nayana N D, “Low power and Area Efficient Carry Select Adder”,
National Conference on Electronics, Communication and Signal Processing, NCECS-2011.
2. Bedrij O.J, “Carry-select adder,” IRE Trans. Electron. Comput., pp.340–344, 1962.
3. Ceiang T Y and Hsiao M J, “Carry-select adder using single ripple carry adder,” Electron.
Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
4. Jeong .W and Roy .K, “Robust high-performance low power adder”, Proc. of the Asia and
South Pacific Design Automation Conference, pp. 503-506, 2003.
5. He Y, Chang C H, and Gu J, “An area efficient 64-bit square root carry select adder for low
power applications,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.
6. Hosseinghadiry M, Mohammadi H and Nadisenejani M, '' Two New Low Power High
Performance Full Adders with Minimum Gates", World Academy of Science, Engineering and
Technology 52 2009.
8. Kim Y and Kim L S, “64-bit carry-select adder with reduced area”, Electron.Lett., vol. 37, no.
10, pp. 614–615, May 2001.
9. Manoj Kumar, Sandeep K. Arya and SujataPandey, “Single bit full adder design using 8
transistors with novel 3 transistors XNOR gate”, International Journal of VLSI design &
Communication Systems (VLSICS) Vol.2, No.4, December 2011.
71
10. Massimo Alioto and Gaetano Palumbo, "Optimized Design of Carry-Bypass Adders",
ECCTD’01 - European Conference on Circuit Theory and Design, August 28-31, 2001, Espoo,
Finland.
11. Padma Devi, Ashima Girdher and Balwinder Singh, ”Improved Carry Select Adder with
Reduced Area and Low Power Consumption”, International Journal of Computer Applications
(0975 – 8887), Volume 3 -No.4, June 2010.
12. Ram Kumar .B and Kittur H.M, “Low-Power and Area-Efficient Carry Select Adder”, IEEE
transactions on very large scale integration (VLSI) systems, vol. 20, no. 2, February 2012.
13. Saiful Islam Md, Muhammad MahbuburRahman, Zerina begum and Mohd.Zulfiquar Hafiz,
"Fault Tolerant Reversible Logic Synthesis: Carry Look-Ahead and Carry-Skip Adders", ACTEA
2009July 15-17, 2009 ZoukMosbeh, Lebanon.
72