A High-Performance On-Chip Memory Module For Image Processing Applications
A High-Performance On-Chip Memory Module For Image Processing Applications
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
Abstract: To actualize numerous image processing applications, today's technology revolution in portable electronic devices
necessitates significant amounts of data processing power, area, and time. Approximate data computing has been demonstrated
to be a viable solution for picture processing. In real-time digital data computing applications, achieving high precision or
fulfilling a variable accuracy requirement of up to 100 percent has become a crucial design goal. As a result, very large-scale
integration (VLSI) architectures with excellent performance are critical in the design of digital signal processors. The use of
high-performance adders and on-chip memory structures has also been a focus of recent breakthroughs in processor design
Stability and dependability are crucial design factors that contribute to a reduction in worst-case error and the use of a hybrid
application in real time. High precision, high speed, low power, and area efficiency are achieved by balancing the restrictions of
high performance. The processor core's performance is determined by the configuration, design parameters, and efficient use of
the data channel and on-chip memory structures. As a result, high-performance adders and on-chip memory architecture were
used in this study. To provide superior quantitative computation for human perception, the suggested design is included into
image processing applications such as image approximation, canny edge detection, and noise reduction approaches. Berkeley,
medical, and satellite image samples were used..
I. INTRODUCTION
A. VLSI Design
We offer an overview of high-performance Very Large-Scale Integration (VLSI) architectures for image enhancement, as well as
the core concepts, principles, and aspects of high-performance design approach, as well as the research study's rationale. To achieve
high precision, fast speed, low power, and space efficiency, the restrictions associated in the high-performance design are
counterbalanced. This also includes an overview of the design space exploration of high-performance VLSI domain with special
reference to adder and on-chip modules, as well as the discovery of issue definition and research effort scope..
2
i = 0→1 ƒ (1.3)
r= ƒ ( + ƒ) (1.4)
0→1
i = i (1.5)
Where, in Equation (1.2), Powercapacitor denotes the dynamic power generated by the capacitance charging and discharging of a
load capacitance. CL, a01 is the switching activity factor for a one-clock-period transition probability from logic "0" to logic "1,"
clk is the operating frequency, and VDD is the supply voltage.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 146
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
PowerSCC is the power consumption due to short circuit peak current (IPeak) and is proportional to the CMOS transistor size. The
limitations of time
The rise and fall times of short circuit current are represented by tr and t, accordingly.
The static power in Equation (1.5) is determined by the leakage current, which can be either reverse-biased PN junction
current or sub-threshold current. From the lowest technology CMOS transistor level to the highest technology
architectural levels, power optimization approaches have been investigated, influencing all parameters such as a01, CL,
clk, and VDD while maintaining the needed function..
Because of the rising degrees of integration and the requirement for mobility, minimising the power consumption of circuits is vital
for a broad variety of high-volume digital data computing applications. Because the speed of arithmetic components is frequently a
limiting factor in performance, it is equally critical to increase that speed. At every design level, such as gate and transistor-level
technologies, where the majority of the power may be avoided at the high conceptualization, power reduction must be addressed. An
optimised compact design is sought at the gate level of high-performance architectures to provide energy efficiency, high speed, and
reliability for large-quantity digital data computing applications. It's also crucial to have good driving abilities under various load
levels and a balanced output to avoid errors. Layout regularity and connection complexity are critical since the modules are repeated
in huge numbers. Optimizing the gate count for a certain performance is the greatest technique to minimise junction capacitance as
well as overall gate capacitance. A number of optimization strategies have been developed to reduce the design area while
preserving performance. Path optimization and global optimization are two types of design optimization. In path-based
optimization, the critical routes' gates are increased in size to achieve the required performance, while the off-critical paths' gates
count is lowered to achieve low power consumption..
All gates in an architecture are globally optimised for a specified delay in the global optimization. To meet the total system cycle
time requirements, the designs are primarily built for maximum performance. They are made up of huge, highly parallel structures
that follow a logic pattern. As a result, the leakage power consumption of such systems is significant. However, not every
application need a fast circuit to run at peak speed all of the time. Using the timing slack, many circuit approaches have been
developed to decrease leakage energy without compromising performance. These methods may be classified according on when and
how they employ the available time slack. In non-critical logic gates, dual threshold voltage statically assigns high threshold voltage
to certain transistors. The strategies that take advantage of the slack in runtime may be classified into two categories based on
whether they minimise standby leakage or active leakage. When the calculation is not required, standby leakage reduction
techniques can place the entire system into a low leakage state. When maximum performance is not required, active leakage
reduction approaches slow down the system by dynamically altering the threshold voltage to decrease leakage. The operational
temperature rises in active mode due to the switching actions of transistors. This has an exponential impact on sub-threshold
leakage, multiplying the leakage power and making it the major leakage factor during active mode..
The space, speed, power dissipation, and wiring complexity of a chip level VLSI system are all influenced by the logic gates
optimization and switching activity along the critical route.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 147
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
As a result, approximations have been performed in adders at various levels, which have been classified as segmented (Irina & Lau
2017; Kim & Kim 2016), reconfigurable (Andrew & Seokhyeong 2012; Garg et al. 2016), speculative (Ing et al. 2015; Junjun &
Weikang 2015), floating addition (Rui & Weikang 2015; Weiqiang et all
(Haider et al. 2016), gate level (Ramkumar & Harish 2012), evolutionary type (Adnan et al. 2015; Radek et al. 2016), application
specific adders (Darjn et al. 2016), carry truncation (Vincent et al. 2016), reversible logics (Pawan et al. 2015), and architectural
level (Ning et al. 2010). Recent studies suggest that applications like probabilistic analytics, image, video, and machine learning
applications, among others, have a high degree of error resilience (Sunil et al. 2016). For human vision, this allows for
approximation computing rather than precise calculation (Melvin & Haiyang 2006). As a result, the key challenge for designers of
approximation full adder designs is to achieve high accuracy, fast speed, low energy consumption, and area efficiency.
IV. METHODOLOGY
A. Proposed Approximate Full Adder Logic
Approximate full adder logic outperforms the existing Conventional Full Adder (CFA) in terms of speed, area, and energy
efficiency (Mahdiani et al. 2010). A low-performance design could accept or tolerate probable mistakes to an acceptable degree. A
high-performance architecture, on the other hand, can speculatively execute the approximate result rather than waiting for the exact
result to become available. The speedup in machine cycles of an approximation unit relative to an exact unit is more sensitive to
accuracy than throughput performance. As a result, four approximation full adder logics are provided for increasing the approximate
arithmetic units' performance.
Figure 1 shows that AFAL1 has 6 basic logic gates (AND, OR, and NOT gates) and 4 logic delays instead of 13 basic logic gates
(2*5) + 3 and 6 logic delays utilised by CFA to build the primary gate level structure (2 Exor gates (2*5) + 3). The principal inputs
are "A" and "B," while "C" is the propagating input from the previous stage. The complement of the carry propagating signal output
is the sum output.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 148
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
′
X= + + , = = . . = .X+ .
The logic implementation of AFAL2, which contains two OR gates, one AND gate, and a multiplexer, is shown in Figure 3.
Multiplexer's underlying logic implementation consists of four fundamental logic gates. As a result, AFAL2 has seven fundamental
logic gates and two faults out of eight cases. In a simulation procedure, the suggested AFAL2's carry propagation has one logical
delay. If there is an error in the carry output, the sum output selects the appropriate approximation pre-computation logic as shown
in Table 2..
Table 2Truth Table of proposed AFAL2
INPUTS PRE- AFAL2 ED
COMPUTATIONS OUTPUTS
A B C AHASG HACG CARRY SUM
0 0 0 0 0 0 0 0
0 0 1 1 0 0 1 0
0 1 0 1 0 0 1 0
0 1 1 1 0 0 1 -1
1 0 0 1 0 0 1 0
1 0 1 1 0 0 1 -1
1 1 0 1 1 1 0 0
1 1 1 1 1 1 1 0
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 149
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
Table 3 gives the truth table of proposed AFAL3 and the block diagram is shown in Figure .4.
The overall logic diagram of AFAL3, which contains six fundamental logic gates, is shown in Figure 5. In addition, the most
important bit of the input A is picked directly in the carry propagation to reduce the output capacitance at the carry output and to
increase speed. As a result, if there is an issue in AFACG, the MSB of A selects the suitable pre compute logic instead of the exact
sum as shown in Table 3. Out of eight examples, AFAL3 has two mistakes. As a result, if there is an error in the carry output for any
combination of input values, there is also an error in the total output owing to the pre computation selection, which minimises the
error distance by generating a compensatory deviation in the combined output value. The AHASG output is picked if the input „A =
0,” else the HACG output is selected in the AFASG output. The carry propagation of the proposed AFAL3 contains one buffer logic
gate, as indicated in Table 3.3.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 150
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
a) Equation CARRY = A is used to generate an approximation carry signal utilising a structural hierarchy of two AND-OR gate
blocks. (B and C)
b) One of the multiplexer inputs M 0=A+B+C is used to generate the total of all inputs.
c) The product of two inputs is generated and applied to another input of the multiplexer M 1 = B.C.
d) If CARRY = „0,” the Total output selects the sum of all inputs; otherwise, the SUM output selects the product of two inputs,
which has one mistake out of eight situations, as shown in Table.4.
Figure 7 displays the AND-OR gate block logic diagram, which has been simplified with a single "AND" and "OR" gate function as
given in Equations for increasing the regularity in the proposed AFAL4 structural hierarchical design..
= +
= .
Table 4 Truth Table of proposed AFAL4
INPUTS MULTIPLEXER AFAL4 ED
INPUTS OUTPUTS
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 151
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
The suggested m-bit SSOC SP SRAM memory area is increased over the n-bit traditional memory architecture, and the expected
memory size is shown in Equation. For image processing applications, user-defined multiples data depth and data breadth memory
architecture modules are offered. In place of the existing conventional SPSRAM, the suggested area efficient approximation SSOC
memory architecture is employed to provide certain data-access regulations. The address decoder is used to write or read data from
the storage element's successive places. Instead of a precise n-bit data storage array, the proposed SSOC SPSRAM uses an
approximation m-bit data storage array. The current and next input data values are placed in the memory's next neighbouring
position in a raster scan order approach. The most important bit location of the input data is used to pick the m-bit static segment, as
illustrated in Figure 9 (a), where m-bit is smaller than n-bit. The minimum allowable accuracy, area, power, and acceptance
probability will all drop as the segment size m-bit decreases. When the segment size is set to n bits, the proposed SSOC SP SRAM
behaves like a standard SP SRAM..
Figure 9 (b) shows how to convert m-bit to n-bit by padding "1" at an n-m bit position for better worst-case accuracy and padding
"0" from n-m-1 to the output register's least significant bit position. Figure 9 shows how the anticipated area of the SSOC SP SRAM
is connected to the m-bit of data for the same array size (c). The SSOC architecture presented is aimed at 45nm, 90nm, and 180nm
technologies.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 152
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
Figure 10 depicts the number of logical gates and the time it takes to generate sum and carry in proposed and current complete adder
logics. The logical gate count in AFAL1 and AFAL3 is lower than in any other logic design. Furthermore, as compared to the other
AFALs, AFAL3 has a shorter carry propagation time. AFAL3 is the best in terms of logical gate count and latency.
(d) (e)
(f) (g
Table 5 Design metrics comparison of existing and proposed full adders using Cadence compiler
AdderType Cell Leakage Dynamic Total Delay(ps) ADP PDP(aJ)
Area Power (nW) Power (nW) Power(nW)
(µm2) (µm2
.ps)
EX.CFA 17 51.390 314.024 365.414 276 4692 100.85
EAFA5 14 36.921 198.688 235.609 227 3178 53.48
EAFA6 11 41.108 244.259 285.367 183 2013 52.22
AFAL1 10 20.907 113.247 134.157 150 1500 20.12
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 153
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
According to Figure 3.10, AFAL1 and AFAL3 have lower ADPs of 1500 and 1520 m2.ps, respectively. AFAL1 has a lower ADP
than EXCFA, EAFA5, EAFA6, AFAL2, AFAL3, and AFAL4 (68.03 percent, 52.80 percent, 25.48 percent, 19.87 percent, 1.32
percent, and 42.82 percent, respectively).
D. Simulation Environment And Result Analysis of Mbit SSoc
This section compares the overall statistics of the proposed m-bit SSOC SP SRAM to the existing n-bit conventional SP SRAM
using simulation and synthesis data. The design metrics of all modules are evaluated using Cadence Encounter (R) simulation,
which is based on a gate level Application Specific Integrated Circuits (ASIC) TSMC 90nm technology. Xilinx14.2 and MATLAB
compilers are used to perform exhaustive quality and error metrics on the proposed memory.
1) Design Metrics Analysis: The design parameters for the 16-bit data and 8-bit data conventional and m-bit suggested memory
are shown in Tables 6 and 7. In 1K memories with a 16-bit data width, the estimated data storage of proposed 14-bit, 12-bit,
and 10-bit SSOC SP SRAM takes up 12.17 percent, 24.50 percent, and 37.11 percent less space than 16-bit conventional SP
SRAM. A similar pattern may also be seen in terms of electricity use. When it comes to total power consumption, the proposed
10-bit SSOC SP SRAM uses roughly 40% less than the present 16-bit conventional SP SRAM. Furthermore, the suggested 10-
bit SSOC SP SRAM has a 12.32 percent lower latency than standard 16-bit memory.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 154
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
The Power Delay Product (PDP) and Area Delay Product (ADP) of 16-bit data width memory are shown in Figures 13 and 14. The
PDP and ADP of the 10-bit SSOC SP SRAM are lowered by 47.81 percent and 44.87 percent, respectively, when the rate of
memory data width in data storage array is dropped from 16-bit to 10-bit, as shown in Figures 13 and 14.
When compared to the existing 8-bit conventional SP SRAM, the proposed 4-bit SSOC SP SRAM occupies 49 percent less space,
uses 50 percent less power, and has a 16.92 percent lower latency, as shown in Table. Figures 15 and 16 indicate that the proposed
4-bit SSOC SP SRAM has 58.98 percent less PDP and 57.69 percent less ADP than an 8-bit conventional memory. As a result of
the foregoing design characteristics, the proposed SSOC SP SRAM core is more effective at computing large amounts of data for
error-tolerant applications..
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 155
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue II Feb 2022- Available at www.ijraset.com
VI. CONCLUSION
To achieve 99.8437 percent overall computational accuracy, the 16-bit High Performance Error Tolerant Adder 1 (HPETA1) is
constructed employing single bit Approximation Full Adder Logic 1 (AFAL1). Furthermore, multiplexer-based hardware is
proposed for AFAL2, AFAL3, and AFAL4 for simplicity of testability and reliability. The suggested AFAL2, AFAL3, and AFAL4
approximation full adders outperform other approximate full adders on the evaluated design criteria. The m-bit SSOC memory
architectures are suggested in the next phase, which have lower area, access time, and power consumption in design parameters
while losing some average accuracy by lowering the size of the storage array element compared to traditional on-chip memory. .
The proposed 4-bit SSOC SP SRAM design provides the same quality output images with 49 percent area savings, 50 percent power
reduction, and 16.92 percent speed improvement over the existing 8-bit conventional SP SRAM in image processing applications,
according to a comparative study of overall balanced performance in the design and quality metrics.
Based on a comparison of overall balanced performance in design and quality parameters, the suggested high-performance adders
and memory VLSI architectures are better appropriate for specialised and actual image processing applications.
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 156