This article has been accepted for publication in IEEE Embedded Systems Letters.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LES.2024.3395900

A Multiplier-Free Discrete Cosine Transform Architecture Using Approximate Full Adder and Subtractor
Elham Esmaeili (a), Nabiollah Shiri (*, a), Mahmood Rafiee (a), Ayoub Sadeghi (a)
(a) Department of Electrical Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
(*) Corresponding Author: Nabiollah Shiri ([email protected])
Abstract— A new approximate full adder (FA) and a new approximate subtractor are presented. Each has 8 transistors, and their areas are 0.1944 μm² and 0.1689 μm², respectively. The FA exhibits three errors, while the subtractor exhibits two. In both circuits, to improve speed, output swing, and drivability, the gate diffusion input (GDI) and dynamic threshold (DT) techniques are implemented in carbon nanotube field-effect transistor (CNTFET) technology. The FA and subtractor are embedded in an 8-bit ripple carry adder (RCA) and an 8-bit subtractor, respectively, which then form a new approximate multiplier-free discrete cosine transform (DCT). The 8-point approximate DCT requires only additions and no multiplications, so the computational complexity is reduced. The DCT shows a power-delay product (PDP), peak signal-to-noise ratio (PSNR), and figure of merit (FoM) of 63.61 fJ, 34.96 dB, and 2.39, respectively. The features of the presented approximate DCT confirm its suitability for image compression and noise removal in medical images.

Index Terms—Approximate full adder, Approximate subtractor, Multiplier-free DCT, Bioimages.

I. INTRODUCTION
In the present century, low-power, fast, and small-area devices are preferable [1,2]. Arithmetic circuits are the main parts of digital systems [3]. Increasing the circuit density in complementary metal-oxide semiconductor (CMOS) technology increases power consumption, and approximate computing can be a suitable alternative for error-resilient applications. Approximate arithmetic cells use fewer logic gates, which reduces power consumption at the expense of accuracy. Recently, researchers have provided arithmetic circuits [4], such as full adders (FAs), subtractors, dividers, compressors, and multipliers. These arithmetic circuits are embedded in traditional discrete cosine transform (DCT) structures for audio and image compression [5].
The implementation of an 8-point 1-D DCT requires 64 floating-point multiplications and 56 additions. To avoid this cost, it is necessary to reduce the DCT complexity. There has been great interest in finding fixed-point multiplication-free DCTs [6,7] as low-power and area-efficient digital circuits. To this end, approximations of the DCT have been considered [8-10]. The efficient algorithm requires 5 multiplications and 29 additions. By introducing some zeros in the DCT [8], its performance has improved. The multiplications of the DCTs are power-consuming, and the approximation algorithms save power and increase the speed. The gate diffusion input (GDI) technique [11] is a low-power alternative to CMOS logic, in which one of the inputs is applied directly to the gates of the N-type and P-type transistors. The voltage swing drop, a drawback of the GDI cells, is solved by the dynamic threshold (DT) technique and carbon nanotube field-effect transistor (CNTFET) technology [12]. The GDI-based full adders (FAs), subtractors, dividers, compressors, and multipliers reduced area with full-swing outputs [13-15].
Transform coding and image compression are accomplished by the DCT. For joint photographic experts group (JPEG) compression, a multiplier is utilized in the DCT, which occupies a large area and consumes high power [16]. FA, subtractor, D flip-flop, and 1-bit left-shift circuits are also used in the DCT [10], which cause high complexity and power.
The suggested approximate FA is implemented in the multiplier-less DCT structure of [10] in a pipelined fashion to have acceptable complexity and power consumption. This letter proposes an approximate FA and subtractor, which are evaluated in an 8-bit ripple carry adder (RCA) and an 8-bit subtractor. The FA and subtractor are implemented in an approximate DCT. The 32 nm CNTFET technology is used to demonstrate the feasibility of the FA, subtractor, and DCT. The proposed designs are fast, low-power, and small-area, and the DT technique causes full-swing outputs. The input image with 255×255 pixels is partitioned into 8×8 blocks and applied to the presented approximate DCT. The results of compression, noise removal, and reconstruction of medical images confirm the superiority of the DCT.
The rest of this letter is organized as follows. Section II describes the proposed circuits. Section III provides the simulation results, and the application can be found in Section IV. Section V concludes the letter.

II. PROPOSED CIRCUITS
A. Presented approximate FA
Fig. 1 shows the proposed approximate FA. As shown in the truth table, the Sum and Cout of the FA have 3 and 1 errors, respectively. The error distance (ED), error rate (ER), and normalized mean error distance (NMED) of the FA are |-1|, 37.5%, and 0.125, respectively.
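The error figures quoted for the approximate cells can be reproduced by enumerating all eight input combinations. The following is a minimal sketch under stated assumptions, not the authors' evaluation script: it uses the Boolean functions (1)-(2) of this subsection and (3)-(4) of Section II-B, reads F2 as NOT A OR B, and assumes NMED is the mean error distance divided by 3 (the output range of a 1-bit cell). The function names `approx_fa`, `approx_sub`, and `metrics` are hypothetical.

```python
from itertools import product


def approx_fa(a, b, c):
    """Approximate FA value 2*Cout + Sum (assumed reading: F2 = not A or B)."""
    f2 = (1 - a) | b
    s = (f2 & c) | ((1 - f2) & a)      # Sum  = F2*C + not(F2)*A   (1)
    cout = a & (b | c)                 # Cout = A*(B + C)          (2)
    return 2 * cout + s


def approx_sub(x, y, bin_):
    """Approximate subtractor value Diff - 2*Bout."""
    xnor = 1 - (x ^ y)
    bout = (xnor & bin_) | ((1 - x) & y)    # Bout = XNOR*Bin + not(X)*Y   (3)
    diff = (xnor & bin_) | (x & (1 - y))    # Diff = XNOR*Bin + X*not(Y)   (4)
    return diff - 2 * bout


def metrics(approx, exact, norm=3):
    """Return (erroneous inputs, error rate, NMED) over all 3-bit inputs."""
    eds = {bits: abs(approx(*bits) - exact(*bits))
           for bits in product((0, 1), repeat=3)}
    wrong = [bits for bits, ed in eds.items() if ed]
    med = sum(eds.values()) / len(eds)      # mean error distance
    return wrong, len(wrong) / len(eds), med / norm  # norm=3 is an assumption


fa = metrics(approx_fa, lambda a, b, c: a + b + c)
sub = metrics(approx_sub, lambda x, y, b: x - y - b)
print("FA :", fa)
print("Sub:", sub)
```

Under these assumptions the FA is wrong for three input combinations (ER = 37.5%, NMED = 0.125) and the subtractor for XYBin = 010 and 101 (ER = 25%, NMED ≈ 0.083), each with an error distance of magnitude 1, which matches the values stated in the letter.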

Authorized licensed use limited to: K. Ramakrishnan Health and Educational Trust. Downloaded on August 02,2024 at 03:40:47 UTC from IEEE Xplore. Restrictions apply.
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Truth table, gate level, transistor level, and layout of the proposed approximate FA.
The FA has the GDI-based function F2, $F_2 = \overline{A} + B$. As an advantage, the Sum is generated by F2 with an inherent inverter. Fewer internal nodes and a fixed input capacitance along the critical paths, without inverters, reduce static power. F2 enables a multiplexer (MUX), allowing the MUX to perform faster. Besides, the Cout is generated with OR and AND gates. Cout and Sum are each generated by 4 transistors, so the FA has 8 transistors. The FA's functions are (1)-(2).
$Sum = F_2 C + \overline{F_2} A$ (1)
$C_{out} = AB + AC = (B + C)A$ (2)

B. Proposed approximate subtractor
The block diagram of the new approximate subtractor is illustrated in Fig. 2. The GDI technique is used because of its small area, low complexity, and reduced transistor count, which yield low-power and fast circuits. The subtractor shows errors in the states XYBin=010 and XYBin=101; therefore ED = |±1|, and the ER and NMED are 25% and 0.083, respectively.
Fig. 2. Truth table, gate level, transistor level, and layout of the proposed subtractor.
The two errors are confined to the Diff output, and Bout produces correct outputs. The subtractor has only two MUXes and one XNOR to give the Boolean functions of (3)-(4).
$B_{out} = \overline{(X \oplus Y)} B_{in} + \overline{X} Y$ (3)
$Diff = \overline{(X \oplus Y)} B_{in} + X \overline{Y}$ (4)
The subtractor has 8 transistors and produces the outputs without an inverter, while the Boolean functions of (3) and (4) display the inverted X and Y. So, the subtractor depends on the MUX and has low power consumption.

C. 8-bit RCA and 8-bit subtractor
To evaluate the proposed circuits, an 8-bit RCA and an 8-bit subtractor are considered, as shown in Fig. 3 and Fig. 4, respectively. The outputs of the 8-bit RCA and subtractor provide the inputs of the DCT, and the results confirm the superiority of the circuits in larger structures.
Fig. 3. 8-bit RCA.
Fig. 4. 8-bit subtractor.

III. SIMULATION RESULTS
All circuits are simulated in HSPICE with the 32 nm CNTFET technology. The frequency is 500 MHz, the load is 5 fF, and the pitch and number of tubes are 5 nm and 10, respectively. The chirality vector is (38,0), and the VDD is 0.9 V. Through post-layout extraction, all parasitic capacitances and resistances are considered [11]. The circuit performance is evaluated by the power-delay product (PDP), and the accuracy is checked by the NMED. The normalized PDP and NMED of the introduced FA and subtractor, compared with the references for a number of approximate bits (NAB) of 1 (NAB1) in the RCA, are shown in Fig. 5(a) and Fig. 5(b), respectively.
Fig. 5. Results of the PDP and NMED for (a) FA, (b) subtractor.
The proposed FA and subtractor have the minimum PDP and the best performance in terms of power and NMED.

IV. APPLICATION
DCT compression manages the storage space or transmission bandwidth. A multiplier-free DCT is shown in Fig. 6 [10], where the proposed FA and subtractor are implemented. Here, only 24 adders/subtractors are used and the 8-point propeller approximate DCT significantly improves the PDP and area by lowering the number of adders and removing additional multipliers. The pixels of the input image are given to the first adder stage and
the data is processed by the remaining adders and flip-flops. At the end of the third stage of the flip-flops, the DCT coefficients are ready.
Fig. 6. The approximate multiplier-free DCT architecture.
The number of additions, multiplications, and bit-shift operations required for the proposed architecture and the references is presented in Table 1. The proposed DCT has the same operation counts as T [6], but requires 57.14% and 17.24% fewer additions than the Conventional DCT and Scaled DCT [5], respectively.

Table 1. Computation complexity assessment.
Transform          Addition  Multiplication  Shift
Conventional DCT   56        64              0
Scaled DCT [5]     29        5               0
BAS-[8]            24        0               0
T [9]              24        0               2
T [6]              24        0               6
Proposed           24        0               6

As shown in Fig. 7, in the JPEG flow the input image is partitioned into 8×8 blocks, and each block is entered into the approximate DCT. Then, the resulting matrix of DCT coefficients is reduced by removing the high-frequency DCT coefficients. The steps of image compression and reconstruction are shown in Fig. 7. The reconstruction is performed by the inverse DCT (IDCT). The presented approximate DCT works only until the output coefficients are generated, and the rest of the process, such as quantization, is done in MATLAB. The quality factor (QF) is set to 50.
Fig. 7. Block diagram of DCT/IDCT.
In this letter, a computed tomography (CT) scan image is compressed by the JPEG method and a standard quantization matrix. The original, compressed, and reconstructed images under the DCT are shown in Fig. 8. The quality of the reconstructed images is evaluated using the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). The DCT circuit based on the proposed FA has a PSNR of 34.96 dB and SSIM of 0.76 for QF=50.
Fig. 8. The results of the JPEG method with QF=50: (a) input image, (b) exact DCT compression, (c) exact IDCT reconstruction, (d) presented approximate DCT compression, and (e) approximate IDCT reconstruction.
To consider both the function of the circuit and the image processing accuracy, a figure of merit (FoM) is defined as FoM = PDP/(PSNR × SSIM). A lower FoM indicates better circuit performance and more accurate image processing. Table 2 provides power, delay, transistor count, PDP, power-delay-area product (PDAP), PSNR, SSIM, and FoM results of various FAs based on the DCT architecture. Here, the proposed approximate DCT has better power, PDP, FoM, PSNR, and SSIM compared to the other approximate designs. The Exact and PPA2 have a high number of transistors and occupy a large area. The comparison of PSNR in the context of image compression yields only ≈0.5 dB degradation when compared to an exact DCT.

Table 2. The results of DCT architecture based on different FAs.
Adder Type   Power (mW)  Delay (ns)  PDP (fJ)  PSNR (dB)  SSIM  FoM
Exact [1]    3.14        29.40       92.31     35.36      0.85  3.07
AFA1 [2]     2.50        29.15       72.87     32.71      0.74  3.01
AFA2 [4]     2.32        29.88       69.32     33.75      0.73  2.81
PPA1 [3a]    2.25        29.30       65.92     32.80      0.75  2.67
PPA2 [3b]    2.44        29.40       71.73     33.85      0.73  2.90
Proposed     2.18        29.18       63.61     34.96      0.76  2.39

Table 2 confirms that the exact FA in the DCT can be replaced by an approximate FA to save power and area at the cost of a small decrease in image quality. Table 3 compares the hardware consumption of various DCTs, and the proposed DCT has the best PDAP.

Table 3. Comparison of hardware consumption of DCTs.
Adder Type       Power (mW)  Delay (ns)  PDP (fJ)  Area (µm²)  PDAP
DCT              6.84        40.32       275.7     40.4        11138
Scaled DCT [5]   4.42        35.22       155.6     38.9        6052.8
BAS-[8]          4.33        29.60       128.1     37.07       4748.6
T [9]            2.75        29.34       80.68     38.6        3114.2
T [6]            2.90        29.27       84.88     37.05       3144.8
Proposed         2.18        29.18       63.61     35.3        2245.4

The original and reconstructed images are considered, and the PSNR comparisons are presented in Table 4 and Fig. 9. The standard value of PSNR is between 30 and 40 dB, and a higher value means better results. The PSNRs of the Scaled DCT [5] and Conventional DCT are significantly higher than those of the other recent algorithms, but these require a greater number of arithmetic operations. This research concentrates on low-computational-complexity algorithms, and Table 4 and Fig. 9 show that the proposed DCT has a better PSNR than BAS-[8], T [9], and T [6].
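The quality and cost metrics used in this section can be sketched together. The letter does not give its PSNR implementation, so the example below assumes the conventional definition for 8-bit images (peak value 255, PSNR = 10·log10(peak²/MSE)) and then recomputes FoM = PDP/(PSNR × SSIM) from the rows of Table 2. The function names `psnr` and `fom` and the toy pixel data are hypothetical; only the Table 2 numbers come from the letter.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized images given as lists of rows."""
    sq_errors = [
        (o - r) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for o, r in zip(row_o, row_r)
    ]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf                     # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def fom(pdp, psnr_db, ssim):
    """Figure of merit as defined in the text: lower is better."""
    return pdp / (psnr_db * ssim)


# Toy 2x2 images (not data from the letter): every pixel is off by 5 levels.
toy_psnr = psnr([[100, 100], [100, 100]], [[105, 105], [105, 105]])

# Rows copied from Table 2: (PDP, PSNR in dB, SSIM, reported FoM).
TABLE2 = {
    "Exact [1]": (92.31, 35.36, 0.85, 3.07),
    "AFA1 [2]": (72.87, 32.71, 0.74, 3.01),
    "AFA2 [4]": (69.32, 33.75, 0.73, 2.81),
    "PPA1 [3a]": (65.92, 32.80, 0.75, 2.67),
    "PPA2 [3b]": (71.73, 33.85, 0.73, 2.90),
    "Proposed": (63.61, 34.96, 0.76, 2.39),
}

print(f"toy PSNR = {toy_psnr:.2f} dB")
for name, (pdp, psnr_db, ssim, reported) in TABLE2.items():
    print(f"{name:10s} FoM = {fom(pdp, psnr_db, ssim):.3f} (reported {reported})")
```

Recomputed this way, every FoM agrees with the reported column to within about 0.01, and the proposed design has the lowest value.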

Table 4. PSNR obtained by different 8×8 transform matrices.
Transform           Boat  Bioimage1  Bioimage2
DCT (conventional)  34.2  34.9       34.7
Scaled DCT [5]      32.2  34.8       34.7
BAS-[8]             29.1  28.6       27.9
T [9]               29.6  29.3       28.7
T [6]               29.1  28.7       28.3
Proposed            33.7  34.7       34.6

Fig. 9. Images reconstructed with the proposed DCT, Scaled DCT [5], and BAS [8] for (a) Boat, (b) Bioimage1, and (c) Bioimage2.
The layout of the proposed DCT has been provided, and the post-layout files are extracted from Electric VLSI for re-simulation of the circuits. As shown in Fig. 10, the proposed DCT occupies a total area of 35.5 µm².
Fig. 10. Layout of the proposed multiplier-free DCT.

V. CONCLUSION
Two new 8-transistor approximate circuits, a full adder (FA) and a subtractor, are proposed. The gate diffusion input (GDI) and dynamic threshold (DT) techniques are used. The new FA and subtractor are embedded in an 8-bit ripple carry adder (RCA) and an 8-bit subtractor, respectively. The suggested FA and subtractor are then embedded in a new approximate multiplier-free discrete cosine transform (DCT) architecture for compressing and removing noise in medical images. The results show that the exact FA in a DCT structure can be replaced by an approximate FA to save power and area at the cost of a small decrease in image quality.

REFERENCES
[1] M. Rafiee, N. Shiri, A. Sadeghi, “High-performance 1-bit full adder with excellent driving capability for multistage structures”, IEEE Embedded Syst Lett. 14(1):47-50, doi:10.1109/LES..3108474, 2021.
[2] A. Sadeghi, R. Ghasemi, H. Ghasemian, N. Shiri, “High efficient GDI-CNTFET-based approximate full adder for next-generation of computer architectures”, in: IEEE Embedded Systems Letters, https://doi.org/10.1109/LES.2022.3192530, 2022.
[3] M. C. Parameshwara and N. Maroof, “An Area-Efficient Majority Logic-Based Approximate Adders with Low Delay for Error-Resilient Applications”, Circuits, Systems, and Signal Processing 41:4977–4997, 2022.
[4] E. Esmaeili, F. Pesaran, N. Shiri, “A high-efficient imprecise discrete cosine transform block based on a novel full adder and Wallace multiplier for bioimages compression”, Int J Circ Theor Appl. 1-24, doi:10.1002/cta.3551, 2023.
[5] K.A. Wahid, V.S. Dimitrov, G.A. Jullien, “On the error-free realization of a scaled DCT algorithm and its VLSI implementation”, IEEE Trans. Circuits Syst II Express Briefs. 54, 700–704, 2007.
[6] R.S. Oliveira, R.J. Cintra, F.M. Bayer, T.L.T. da Silveira, A. Madanayake, A. Leite, “Low-complexity 8-point DCT approximation based on angle similarity for image and video coding”, Multidimens. Syst. Signal Process. 30, 1363–1394, 2019.
[7] Z. Zhou and Z. Pan, “Effective Hardware Accelerator for 2D DCT/IDCT Using Improved Loeffler Architecture”, in: IEEE Access, https://doi.org/10.1109/ACCESS.2022.3146162, 2022.
[8] S. Bouguezel, M.O. Ahmad, and M.N.S. Swamy, “Binary discrete cosine and Hartley transforms”, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 4, pp. 989–1002, 2013.
[9] R. Ezhilarasi, K. Venkatalakshmi, B.P. Khanth, “Enhanced approximate discrete cosine transforms for image compression and multimedia applications”, Multimed. Tools Appl. 79, 8539–8552, 2018.
[10] U.S. Potluri, A. Madanayake, R. Cintra, F.M. Bayer, and N. Rajapaksha, “Multiplier-free DCT approximations for RF multi-beam digital aperture-array space imaging and directional sensing”, IOP Publishing, Meas. Sci. Technol. 23, 15 pp., 2012.
[11] N. Shiri, A. Sadeghi, M. Rafiee, M. Bigonah, “SR-GDI CNTFET-based magnitude comparator for new generation of programmable integrated circuits”, Int J Circ Theor Appl. 1-26, doi:10.1002/cta.3251, 2022.
[12] F. Pooladi, F. Pesaran, and N. Shiri, “Efficient GDI-based approximate subtractors for change detection in bio-image processing applications”, Microelectronics Journal, 135, 105757, https://doi.org/10.1016/j.mejo.2023.105757, 2023.
[13] A. Sadeghi, et al., “Tolerant and low power subtractor with 4:2 compressor and a new TG-PTL-float full adder cell”, IET Circuits Devices Syst. 1–24, https://doi.org/10.1049/cds2.12117, 2022.
[14] M. Mirzaei, S. Mohammadi, “Process variation-aware approximate full adders for imprecision-tolerant applications”, Computers & Electrical Engineering, Volume 87, 2020.
[15] K.V. Krishnan, A. Satish, P.R. Krishnan, “Design of energy efficient approximate subtractors and restoring dividers for error tolerant applications”, Microelectron. J. 131, 105668, https://doi.org/10.1016/j.mejo.2022.105668, 2023.
[16] S. Ansari, H. Jiang, B. Cockburn, J. Han, “Low-power approximate multipliers using encoded partial products and approximate compressors”, IEEE J Emerg Sel Top Circ Syst. 8(3):404-416, doi:10.1109/JETCAS.2832204, 2018.
