Low-Power VLSI Design For Image Analysis in Embedded Vision Systems
Abstract-- An innovative low-power Very Large Scale Integration (VLSI) architecture for real-time image identification in embedded vision systems is presented in this research study. In response to the rising demand for energy-efficient solutions across applications including robotics, surveillance, and autonomous cars, the proposed design combines cutting-edge hardware optimizations, algorithmic improvements, and power management practices. It starts with an effective sensor interface that uses power-saving analog-to-digital converters and smart clock gating techniques to reduce power usage while data is being acquired. The processing units, specially created for high-performance operation with low power consumption, include hardware accelerators for convolutional neural networks, specialized feature extraction units, and dedicated image identification accelerators. An adequately designed memory hierarchy, low-power interconnects, clock gating, and voltage scaling further improve energy efficiency. The proposed architecture is completed with features for data parallelism, power management, and algorithmic improvements. The system's power consumption, real-time performance, and image identification precision have all undergone extensive testing using benchmark datasets and real-world scenarios. Because it strikes a strong balance between real-time image identification and power economy, the proposed architecture is well suited to embedded vision systems.

Keywords— Low-Power Very Large Scale Integration, Real-time Image Detection, Power Management, Convolutional Neural Network, Embedded Vision System.

I. INTRODUCTION

Embedded vision systems, which have seen tremendous expansion and integration in applications ranging from security cameras to drones and driverless vehicles, make real-time image identification possible for improved functionality and safety. Because these systems are frequently subject to strict power limits, it is crucial to create low-power VLSI architectures that preserve excellent performance while reducing energy consumption. To address the difficulties of low-power, real-time image identification in embedded vision systems, this research study offers a novel VLSI architecture. Identifying pedestrians on city streets and spotting flaws in manufacturing processes are just a few examples of how image detection serves as a primary task in computer vision. Power limitations, however, frequently prevent the use of complex image identification algorithms in embedded vision systems and force the use of specially tuned hardware.

Conventional central processing units (CPUs) may not make real-time image identification possible because of their comparatively high power consumption. Specialized hardware is urgently needed to reconcile the computing requirements of image detection with energy economy. The driving force behind the study is the rising need for embedded vision systems in mobile and IoT (Internet of Things) devices, where power efficiency is crucial. Real-time image detection, for instance, is essential in autonomous cars to guarantee the safety of passengers and pedestrians, yet it is challenging to accomplish while maintaining strict power budgets. The goal of the study is to propose a low-power VLSI architecture that can identify images in real time without sacrificing precision and responsiveness. To balance performance and power consumption, the design uses hardware improvements, algorithm parallelization, and carefully chosen power-efficient processing units. In this study, we first give an overview of relevant research in the area, highlighting the many methods for image identification in embedded vision systems and the trade-offs between performance and power consumption. We then describe the VLSI architecture in detail, emphasizing the hardware elements and algorithmic improvements that contribute to its effectiveness. Finally, we demonstrate the viability and efficacy of the low-power VLSI design through a thorough series of tests on benchmark datasets. The findings should serve as a starting point for future research and development in low-power embedded vision systems, providing a solid option for applications where real-time image identification is essential.
II. LITERATURE REVIEW

To support the tremendous growth of intelligent systems over the past years, image detection has become one of the most competitive and challenging fields. Image detection requires the extraction of image features. Given the wide range of applications and the significant computing requirements of this fundamental problem, one design enables real-time processing by applying edge filter methods to extract gradients and histogram of oriented gradients (HOG) features [1]. A fresh research approach for low-power computer architectures uses new resistive memory, CMOS, and 3D technologies; fusing new sensing systems with computing technology inspired by the human brain is expected to produce new embedded system applications [2]. Another work uses VLSI to construct and implement a real-time video analysis Computer Vision Engine (CVE). The CVE allows the CPU and GPU to concentrate on tasks requiring less energy, such as motion tracking, facial recognition, and image identification, and it consumes 20 mW with an energy efficiency of 0.32 nJ per pixel [3]. For battery-powered devices, image analysis typically consumes too much power. To provide wake-up and high-quality imaging with a power consumption of around 10 µW, one work suggests on-chip image recognition using conventional 4T pixels in 90-nm CMOS technology, and its face detection reaches 94% accuracy [4].

CNN-based deep learning techniques require large data flows and significant computing capacity, so hardware platforms for real-world industrial applications must cope with high memory bandwidth and power usage. The CNN accelerator in [5] classifies and detects images quickly and reliably. Image recognition is essential for intelligent monitoring, where the main vision-based strategies are segmentation and tracking. Without using bounding boxes or color cues, the system in [6] follows moving people across frames. It is a single-chip design implemented in a TSMC 90-nm library with 18.71K logic gates and 92.288K of on-chip memory, and it consumes only 11.4037 mW. A low-power, mixed-signal, in-sensor computing, multi-mode CMOS sensor has been proposed for next-generation wireless sensing; using reconfigurable and completely spontaneous mixed-signal processing at the pixel and column levels, it achieves low power and a compact area throughout the entire CMOS vision sensor [7]. Another study examines new approaches, concepts, and contributions to address problems in creating computationally intensive smart devices for the Internet of Things, and proposes an energy-efficient network-on-chip (NoC) design for high-performance embedded computing [8]. A sparse compressed spiking neural network (SNN) accelerator exploits the high sparsity of activation maps and weights to achieve minimal power consumption and highly parallel model computation through the proposed gated one-to-all product. Implemented in TSMC 28-nm CMOS and operating at 500 MHz, the accelerator handles 1024×576 frames, consuming 1.05 mJ per frame and processing 29 frames per second with an energy efficiency of 35.88 TOPS/W [9]. A modular, low-power platform has also been provided as a hardware accelerator for CNNs. This architecture can execute several CNN models on reconfigurable hardware thanks to a software library that completely configures it, and it analyzes 82 frames per second, much faster than earlier versions [10]. An unusually compact backbone has been proposed for generating highly efficient CNN-based models for diverse visual applications; based on this compact model, the NCP, and the instruction set, the resulting TinyML system achieves exceptional precision and a record-low power of 160 mW while performing image recognition and detection at 30 FPS [11].

Maintaining a high level of image detection accuracy in various circumstances calls for online training. The object detection (OD) processor in [12] achieves good detection accuracy on the YT-Objects dataset; with a power consumption of 49.5 mW and a real-time online learning OD rate of 34.4 frames per second on mobile devices, it outperforms previous OD processors. CNNs are deep learning (DL) models frequently used for complex image classification and computer vision (CV) problems, and within the framework of low-power VLSI circuit synthesis, a groundbreaking CNN architecture based entirely on reversible logic has been developed that is also more scalable than the conventional design [13]. Because computational complexity and data size increase power consumption, combining approximation and in-memory computing has been suggested to reduce power usage without compromising output; that study uses a nonvolatile memory component, the magnetic tunnel junction (MTJ), to create a low-power approximate adder for various uses [14]. Artificial intelligence and deep neural networks (DNNs) are revolutionizing electronics and society. DNN-focused intelligent hardware for computing statistics or learning, multiple cores for exact computations, and software AI have been integrated into the same SoC, and such brain-like SoCs offer augmented reality and low-power, high-performance object recognition for smart glasses, self-driving cars, and intelligent robots [15].

Reversible logic is a viable alternative to conventional digital logic circuits: running a reversible system backward prevents information loss and enables the recovery of inputs from outputs. The reversible architecture in [16] is suggested for surveillance, traffic monitoring, high-speed low-power digital signal processing, computer vision, face and gesture recognition, and image detection systems. Despite the high detection accuracy of convolutional neural networks (CNNs), real-time image identification is challenging, particularly on mobile devices with limited resources. Algorithm-hardware co-optimization has therefore been suggested for real-time image detection; compared to previous FPGA-based object recognition systems, it achieves greater efficiency, superior accuracy, and lower hardware setup costs [17].
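The reversibility property attributed to reversible logic in [13] and [16] can be illustrated with the Toffoli (controlled-controlled-NOT) gate, a standard universal reversible gate. This is a minimal illustrative sketch, not the circuit used in either cited work: it checks that the gate maps the eight 3-bit inputs one-to-one and that applying it twice restores the inputs, whereas a conventional AND gate discards information.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Toffoli (CCNOT) gate: flips target bit c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)


def and_gate(a: int, b: int, c: int) -> tuple:
    """Conventional AND for comparison: keeps only a AND b and discards the rest."""
    return (a & b,)


def is_reversible(gate) -> bool:
    """A 3-input gate is reversible if distinct inputs always give distinct outputs."""
    inputs = list(product((0, 1), repeat=3))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)


# The Toffoli gate is its own inverse: applying it a second time ("running it
# backward") recovers the original inputs from the outputs.
for bits in product((0, 1), repeat=3):
    assert toffoli(*toffoli(*bits)) == bits

# With c = 0 the third output equals a AND b, yet a and b are still available,
# so no input information is lost, unlike the conventional AND gate.
print(toffoli(1, 1, 0))         # (1, 1, 1)
print(is_reversible(toffoli))   # True
print(is_reversible(and_gate))  # False
```

The set-size check works because a mapping over a finite input set is invertible exactly when no two inputs collide on the same output.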
Another study examines the substantial challenges that must be overcome when designing low-power analog devices using cutting-edge CMOS VLSI technology. The supply voltage is also decreasing as technology scales, which forces designers of CMOS-based VLSI analog systems to develop novel, complex solutions to maintain crucial operating parameters [18].
Neural network weights and activations are quantized to reduce memory and processing needs, resulting in lower power consumption. To reduce energy usage during inference, the neural networks use power-efficient activation functions. When handling less demanding tasks, the system uses adaptive processing algorithms that adjust the computing effort according to the complexity of the incoming data. These hardware and algorithmic improvements are combined in the proposed VLSI design to produce a real-time, low-power image identification system appropriate for embedded vision applications.

IV. DESIGN AND IMPLEMENTATION

The proposed VLSI architecture carefully balances high-performance real-time image detection and low power consumption. Several essential elements and characteristics of the architecture work together to let it operate in a power-efficient manner. Figure 2 shows the architecture of the proposed design.

A sensor interface provides efficient image data gathering at the system's first stage. To reduce energy consumption during data gathering, this component uses clock gating methods and power-efficient analog-to-digital converters (ADCs). The processing units, the key components of the architecture, are custom-designed to maximize power efficiency. They comprise dedicated image identification accelerators, feature extraction units, and hardware convolutional neural network (CNN) accelerators, all designed to provide excellent performance while using as little power as possible. Power-efficient data access and storage depend on a well-organized memory hierarchy. On-chip caches are included in the system to decrease power consumption and lower data access latency. Low-power memory technologies are preferred, and off-chip memory is chosen to match the data access patterns of image detection algorithms. Interconnects are designed to convey data between components efficiently while using as little power as possible, and custom connectivity topologies and low-power bus protocols provide energy-efficient data transmission. Clock gating lowers dynamic power usage by blocking clock signals from reaching idle components. Dynamic voltage and frequency scaling (DVFS) lets the system adjust its supply voltage and clock frequency to the processing load, keeping operation power-efficient, especially under lighter workloads.
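The savings from clock gating and DVFS described above can be reasoned about with the standard first-order CMOS dynamic-power model, P = α·C·V²·f (switching activity α, switched capacitance C, supply voltage V, clock frequency f). The sketch below is a hypothetical illustration only: the block names, capacitances, activity factors, operating points, and the simple governor policy are assumptions chosen for demonstration, not parameters of the proposed design. An idle block is modeled as clock gated (no dynamic power), and the governor picks the lowest voltage/frequency pair that still meets the per-frame cycle budget.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    capacitance_f: float  # effective switched capacitance in farads (hypothetical)
    activity: float       # switching activity factor alpha, 0..1 (hypothetical)
    busy: bool            # False means the block is idle and can be clock gated


# Hypothetical DVFS operating points (supply voltage in V, clock frequency in Hz),
# ordered from slowest/lowest-power to fastest.
OPERATING_POINTS = [(0.6, 100e6), (0.8, 250e6), (1.0, 500e6)]


def dynamic_power_w(blocks, voltage, freq_hz, clock_gating=True):
    """First-order dynamic power: sum of alpha * C * V^2 * f over clocked blocks."""
    total = 0.0
    for blk in blocks:
        if clock_gating and not blk.busy:
            continue  # clock blocked from the idle block: no switching, no dynamic power
        total += blk.activity * blk.capacitance_f * voltage ** 2 * freq_hz
    return total


def select_operating_point(cycles_per_frame, target_fps):
    """Toy DVFS governor: lowest operating point whose clock meets the frame deadline."""
    required_hz = cycles_per_frame * target_fps
    for voltage, freq_hz in OPERATING_POINTS:
        if freq_hz >= required_hz:
            return voltage, freq_hz
    return OPERATING_POINTS[-1]  # load exceeds every point: run at the fastest one


blocks = [
    Block("sensor_interface", 20e-12, 0.10, busy=True),
    Block("cnn_accelerator", 200e-12, 0.20, busy=False),  # idle this frame
    Block("memory", 80e-12, 0.15, busy=True),
]

v, f = select_operating_point(cycles_per_frame=3e6, target_fps=50)
gated_mw = dynamic_power_w(blocks, v, f) * 1e3
ungated_mw = dynamic_power_w(blocks, v, f, clock_gating=False) * 1e3
print(f"operating point: {v} V, {f / 1e6:.0f} MHz")
print(f"with clock gating: {gated_mw:.2f} mW, without: {ungated_mw:.2f} mW")
```

Because V enters the model quadratically, stepping down from 1.0 V to 0.8 V by itself cuts the switching energy per cycle by 36%, which is why lowering the operating point under light load pays off.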
The design also benefits from algorithmic improvements that lower computational complexity and power consumption. Algorithm trimming removes unnecessary and costly operations. Power-efficient activation functions limit energy usage during inference, whereas weight and activation quantization reduce memory and processing needs. Data parallelism is a key concept: it enables several image regions or frames to be processed simultaneously, maximizing hardware resource efficiency while minimizing power consumption. Through voltage control and power domain management, the power management capabilities let the architecture operate in various power states, including active, idle, and sleep. The architecture is incorporated into a complete embedded vision system with elements such as image pre- and post-processing. Power use, real-time performance, and image identification precision are all thoroughly tested using benchmark datasets and realistic scenarios; these tests are essential for determining the system's suitability for actual use. Based on the test findings, any remaining power bottlenecks or energy-intensive components are optimized through iterative fine-tuning to increase overall efficiency.
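The weight and activation quantization mentioned above can be sketched as follows. The source does not specify the bit width or quantization scheme, so this minimal example assumes symmetric per-tensor 8-bit quantization, one common choice, with illustrative values. Each tensor is stored as integers plus one floating-point scale, and an integer dot product, rescaled once by the product of the two scales, closely approximates the floating-point result.

```python
def quantize_symmetric(values, num_bits=8):
    """Symmetric per-tensor quantization: x is approximated by scale * q,
    with q an integer in [-(2**(num_bits-1) - 1), 2**(num_bits-1) - 1]."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate real values."""
    return [scale * x for x in q]


def int_dot(qa, qb):
    """Integer multiply-accumulate, the operation an int8 MAC array would perform."""
    return sum(x * y for x, y in zip(qa, qb))


weights = [0.42, -1.30, 0.07, 0.95]       # illustrative values, not from the paper
activations = [1.10, 0.25, -0.60, 2.00]

qw, sw = quantize_symmetric(weights)
qa, sa = quantize_symmetric(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int_dot(qw, qa) * sw * sa        # single rescale after integer accumulation
max_weight_err = max(abs(w - r) for w, r in zip(weights, dequantize(qw, sw)))

print("int8 weights:", qw, "int8 activations:", qa)
print(f"float dot = {exact:.4f}, quantized dot = {approx:.4f}")
print(f"max weight reconstruction error = {max_weight_err:.5f}")
```

Keeping the accumulation in integers and applying a single rescale at the end is what lets hardware replace floating-point multipliers with narrower integer MAC units; the rounding error per value is bounded by half the scale.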
The proposed architecture is well suited to embedded vision systems since it is built to maximize power efficiency while providing competitive real-time performance and good image detection accuracy. Its versatility allows it to be tailored to particular applications, and continuous research into power optimization, algorithmic advancements, and dynamic power management will further hone its effectiveness and usefulness.

V. RESULTS AND ANALYSIS

This section provides a thorough analysis of the findings from the research into the low-power VLSI architecture created for real-time image identification in embedded vision systems. The tables below give a comprehensive assessment of critical performance characteristics, power consumption, and comparisons with existing VLSI architectures. These results are crucial for evaluating the performance of the proposed architecture and for demonstrating the gains in power economy achieved without sacrificing real-time image identification capabilities. The component power values are tabulated in Table I.

TABLE I: POWER CONSUMPTION ANALYSIS
Component | Power Consumption (mW)
Sensor Interface | 0.25
Processing Units | 3.12
Memory Hierarchy | 1.45
Interconnects | 0.75
Total Power Consumption | 5.57

Table I breaks down the power consumption of the proposed low-power VLSI architecture by component. The processing units account for the largest share of the power demand (3.12 mW), while the sensor interface consumes the least (0.25 mW). The breakdown makes it easier to spot power-hungry components and optimize them for increased power efficiency.

TABLE III: IMAGE DETECTION ACCURACY
Algorithm | Average Precision (AP)
Viola-Jones [10] | 0.92
Haar Cascade | 0.91
Custom CNN | 0.95

The accuracy of the image detection algorithms is the main measure of the system's operation and is shown in Table III, which lists the average precision (AP) that the various algorithms attain. The custom CNN surpasses the competing algorithms and the 94% peer-reviewed result [10], illustrating the better accuracy of the proposed design in identifying images in the input data.

TABLE IV: COMPARISON WITH EXISTING VLSI DESIGNS
Design Comparison | Power Consumption (mW) | FPS (Frames per Second)
Proposed Design | 5.57 | 50
TSMC-90 [6] | 11.4 | 45
TinyML [11] | 160 | 30

Table IV compares the proposed architecture with existing VLSI solutions, highlighting both real-time performance and power use. The proposed design uses less power than TSMC-90 [6] and TinyML [11] (5.57 mW versus 11.4 mW and 160 mW) while achieving a comparable or higher frame rate (50 FPS versus 45 and 30 FPS). Figure 3 describes the power consumption and FPS analysis.

Fig. 3: Power consumption (mW) and FPS (frames per second) analysis
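Both the total in Table I and the FPS/mW figure of merit used for the energy-efficiency comparison follow from simple arithmetic on the reported numbers. The short sketch below recomputes them using only the values in Tables I and IV; no new measurements are involved. For the proposed design it reproduces the 5.57 mW total and about 9.0 FPS/mW (50/5.57 ≈ 8.98). Applied to the baseline rows of Table IV, the same ratio gives about 3.95 FPS/mW for [6] and about 0.19 FPS/mW for [11], which are lower than the baseline entries listed in Table II, so those entries presumably rest on a basis the text does not state.

```python
# Component power breakdown reported in Table I (mW).
component_power_mw = {
    "Sensor Interface": 0.25,
    "Processing Units": 3.12,
    "Memory Hierarchy": 1.45,
    "Interconnects": 0.75,
}

# (power in mW, frames per second) as reported in Table IV.
designs = {
    "Proposed Design": (5.57, 50),
    "Design of [6]": (11.4, 45),
    "TinyML [11]": (160.0, 30),
}


def fps_per_mw(power_mw: float, fps: float) -> float:
    """Energy-efficiency figure of merit: frames per second per milliwatt."""
    return fps / power_mw


total_mw = sum(component_power_mw.values())
print(f"Total power: {total_mw:.2f} mW")

# Each component's share of the total shows where optimization effort pays off most.
for name, p in component_power_mw.items():
    print(f"  {name:<17} {p:5.2f} mW  ({100 * p / total_mw:4.1f} %)")

efficiency = {name: fps_per_mw(p, fps) for name, (p, fps) in designs.items()}
for name, eff in efficiency.items():
    print(f"{name:<16} {eff:5.2f} FPS/mW")
```

The per-component shares (about 56% for the processing units and under 5% for the sensor interface) make the Table I observation about where power is spent quantitative.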
TABLE II: REAL-TIME PERFORMANCE
Design Comparison | Energy Efficiency (FPS/mW)
Proposed Design | 9.0
TSMC-90 [6] | 6.24
TinyML [11] | 6.98

By calculating the frames per second (FPS) attained per milliwatt (FPS/mW), Table II focuses on energy efficiency, a crucial statistic in embedded systems. The proposed design gives the best value and is more energy-efficient than TSMC-90 and TinyML. The statistic shows that the proposed architecture can carry out image detection tasks efficiently while using less energy, making it a desirable option for battery-powered embedded systems with limited resources. Table II and Figure 4 show the energy efficiency of the proposed design.

Fig. 4: Graphical representation of energy efficiency (FPS/mW) comparison between proposed and existing systems (Proposed Design, TSMC-90, TinyML)

According to the findings, the architecture is not only power-efficient but also capable of retaining competitive real-time performance and accuracy, making it a good contender for embedded vision systems with low power budgets.

VI. CONCLUSION

In conclusion, the study has developed a low-power VLSI architecture designed for real-time image identification in embedded vision systems. The design has shown exceptional promise for reducing power consumption while retaining competitive real-time performance and good image detection accuracy. Careful design and integration of hardware optimizations, algorithmic improvements, and power management measures have produced an energy-efficient solution suited for embedded devices with limited resources and battery capacity. Some constraints must be recognized, though. The performance of the architecture may change depending on the individual dataset and the difficulty of the image detection tasks, and particular applications may need further adjustment and customization.

Additionally, even though power usage has been dramatically decreased, there is still an opportunity for further optimization to improve energy efficiency. Future research in this field should concentrate on several areas. First, the architecture may be expanded and customized for particular applications, including robotics, car safety systems, and medical imaging. Second, integrating sophisticated power management techniques such as dynamic power gating and dynamic voltage scaling could further reduce power usage. Third, ongoing research into algorithmic improvements and neural network pruning techniques could further reduce computing complexity and power needs. Overall, the demonstrated low-power VLSI architecture for real-time image identification paves the way for more energy-efficient embedded vision systems. By resolving the identified limitations and pursuing the recommended future work, the full promise of low-power, high-performance embedded vision systems can be realized for various applications while complying with the strict power constraints of contemporary technology.

REFERENCES

[1] Sasongko and B. Sahbani, “VLSI Architecture for Fine Grained Pipelined Feature Extraction using Histogram of Oriented Gradient,” IEEE 7th Conference on Systems, Process and Control (ICSPC), Dec. 2019.
[2] T. Ernst, “Future of Computing and Sensing Systems for Embedded Applications,” International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Apr. 2019.
[3] K. Xu, Y. Liu, B. Han, X. Zhang, X. Liu, and J. Ai, “A Low-power Computer Vision Engine for Video Surveillance,” IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA), Nov. 2018.
[4] Verdant et al., “A 3.0μW@5fps QQVGA Self-Controlled Wake-Up Imager with On-Chip Motion Detection, Auto-Exposure and Object Recognition,” IEEE Symposium on VLSI Circuits, Jun. 2020.
[5] S. Li, Y. Luo, K. Sun, N. Yadav, and K.-M. Choi, “A novel FPGA accelerator design for Real-Time and Ultra-Low power deep convolutional neural networks compared with Titan X GPU,” IEEE Access, vol. 8, pp. 105455–105471, Jan. 2020.
[6] T. Tsai and S.-W. Chen, “Single-Chip design for intelligent surveillance system,” IEEE Transactions on Very Large Scale Integration Systems, Sep. 2018.
[7] S. Navaneethan and N. Nandhagopal, “RE-PUPIL: resource efficient pupil detection system using the technique of average black pixel density,” Sādhanā, vol. 46, no. 3, Art. no. 114, 2021.
[8] R. Kumar and A. Ibrahim, “VLSI design of energy efficient computational centric smart objects for IoT,” 15th Learning and Technology Conference (L&T), Feb. 2018.
[9] H.-H. Lien and T.-S. Chang, “Sparse compressed spiking neural network accelerator for object detection,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 69, no. 5, pp. 2060–2069, May 2022.
[10] Gilan, M. Emad, and B. Alizadeh, “FPGA-Based implementation of a Real-Time object recognition system using convolutional neural network,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 67, no. 4, pp. 755–759, Apr. 2020.
[11] K. Xu, H. Zhang, Y. Li, Y. Zhang, R. Lai, and Y. Liu, “An Ultra-Low power TinyML system for Real-Time visual processing at Edge,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 7, pp. 2640–2644, Jul. 2023.
[12] S. Song, S. Kim, G. Park, D. Han, and H. Yoo, “A 49.5 mW multi-scale linear quantized online learning processor for Real-Time adaptive object detection,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 5, pp. 2443–2447, May 2022.
[13] K. Khalil, B. Dey, A. Kumar, and M. Bayoumi, “A Reversible-Logic based Architecture for Convolutional Neural Network (CNN),” IEEE International Midwest Symposium on Circuits and Systems (MWSCAS), Aug. 2021.
[14] K. Monga, N. D. Chaturvedi, and S. Gurunarayanan, “Design of a Low Power Approximate Adder based on Magnetic Tunnel Junction for Image Processing Applications,” International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA), Apr. 2021.
[15] H. Yoo, “Mobile/embedded DNN and AI SoCs,” International Symposium on VLSI Technology, Systems and Application (VLSI-TSA), Apr. 2018.
[16] U. Swathi and U. Smitha, “Design and Implementation of Efficient RGB to Grayscale Converter Architectures Using Reversible Logic,” IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), Oct. 2020.
[17] W. Lee, K. Kim, W. Ahn, J. Kim, and D. Jeon, “A Real-Time object detection processor with XNOR-Based Variable-Precision computing unit,” IEEE Transactions on Very Large Scale Integration Systems, vol. 31, no. 6, pp. 749–761, Jun. 2023.
[18] Lokesh, T. K. Chethan, N. Pinto, V. P. B. R, S. Shankar, and K. R. Rekha, “Advanced CMOS VLSI Technology for Low Power Analog System Design with High Gain,” 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), Apr. 2022.