
Real-Time Object Detection With IOT Using A Smart Cart

This paper presents a cost-effective solution for real-time object detection in smart shopping using IoT and AI technologies, specifically through a smart cart equipped with two cameras and an NVIDIA Jetson Nano. The system utilizes YOLOv7 for object detection, enabling seamless addition and removal of items from a shopping list, thereby reducing customer waiting times in payment queues. The proposed approach demonstrates promising results in enhancing customer experience and offers potential integration with e-commerce and payment systems.

Real-Time Object Detection with IOT Using a Smart Cart
Muhammad Omer1, Sardar Jaffar Ali2, Syed Muhammad Raza3, Duc-Tai Le4, Hyunseung Choo1,2,3,*
1 Dept. of Computer Science and Engineering, Sungkyunkwan University, Suwon, Korea
2 Dept. of AI System Engineering, Sungkyunkwan University, Suwon, Korea
3 Dept. of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, Korea
4 College of Computing and Informatics, Sungkyunkwan University, Suwon, Korea
*Corresponding author ([email protected])

2024 18th International Conference on Ubiquitous Information Management and Communication (IMCOM) | 979-8-3503-3101-1/24/$31.00 ©2024 IEEE | DOI: 10.1109/IMCOM60618.2024.10418380

Abstract— With Industry 4.0's technological advancements, the convenience and affordability of Artificial Intelligence (AI) and IoT have increased. However, customers still face long waiting times in payment queues during the smart shopping experience. Current solutions rely on expensive sensor-based RFIDs or simple barcodes, making them impractical and limited in scalability. This paper proposes an economical smart solution with two cameras on an NVIDIA Jetson Nano for real-time object monitoring and automated shopping. Communication between modules is facilitated by TCP/IP and HTTP protocols. By utilizing YOLOv7, real-time monitoring is achieved. This approach improves the customer experience by providing a seamless interface and time-saving features. Additionally, it offers potential integration with e-commerce businesses and payment gateways, further enhancing the overall customer and vendor experience.

Keywords— Internet of Things, Graphical User Interface, You Only Look Once, and Radio Frequency Identification

I. INTRODUCTION
Industry 4.0 has had a profound impact on technical development, ushering in a new era of automation, connectivity, and digitization. With the increased convenience and affordability of Artificial Intelligence (AI) and the Internet of Things (IoT), these technologies have become more accessible across various sectors. However, a persistent challenge remains, i.e., customers' waiting time in payment queues. Lengthy queues not only lead to customer dissatisfaction but also hinder business efficiency and revenue. Traditional queue management methods are inadequate in meeting the demands of today's fast-paced world. Therefore, there is an urgent need for innovative solutions that leverage the advancements in AI and IoT to reduce customer waiting time and enhance the overall shopping experience.

Automated stores leveraging advanced deep learning techniques and modern technology have emerged as a solution to the labor-intensive and time-consuming payment systems in shopping and retail environments. Amazon Go [1], a prominent example of such an automated store, offers customers a seamless shopping experience with no waiting time for bill generation. However, it can accommodate only a limited number of customers (100 individuals) because of the difficulty of handling a large customer base with AI while maintaining accuracy. This limitation has fueled interest in smart unmanned stores worldwide. Other approaches, such as RFID [2] labels for automatic item identification in smart shopping trolley systems, have been explored. Nonetheless, the cost and effort associated with attaching RFID labels to all items have motivated the development of alternative solutions.

In this paper, we present an innovative and cost-effective solution aimed at reducing customer waiting time in payment queues by harnessing the advancements in IoT and AI technologies. Our proposed solution revolves around creating an automated environment for consumers that seamlessly combines a shopping basket widget, real-time monitoring through a camera, and the powerful NVIDIA Jetson Nano device. During insertion, the item is scanned first by camera 1, YOLOv7 detects the object, and the item specifications are fetched from the database. Afterwards, if the same object is detected by camera 2 within the timer, the item is added to the shopping list and its price is added to the total cost. For removal of an object from the basket, the object is detected first by camera 2 and its record is fetched from the database; if the same item is then scanned by camera 1, the item is removed from the shopping list and its price is deducted from the total cost. Through comprehensive experimentation, our solution has demonstrated promising results, enhancing the customer experience with its seamless interface and time-saving features.

The rest of the paper is structured as follows. The next section discusses some of the most recent work that is similar to this project. Section 3 presents our proposed approach in detail along with the experimental setup. Section 4 discusses the results. Lastly, the paper is concluded in Section 5.

II. RELATED WORK
One of the main challenges faced by customers when shopping in a store is the failure to find merchandise and even to transport goods to the billing counter. In article [1], the authors describe a new cost-effective approach to solve these problems by building a smart trolley using a web camera and video editing to complete the tasks.

Shelf Scanner [3] enables visually impaired individuals to independently navigate a grocery store. Using a video stream input, it quickly identifies multiple items simultaneously by leveraging the planarity of the store shelf. The system employs an optical flow algorithm to create a real-time mosaic, allowing seamless use of any object detection algorithm without data loss. For efficiency, a multiclass Naive-Bayes NIMBLE classifier, inspired by Speeded-Up Robust Features (SURF) [4] descriptors from the GroZi-120 dataset, is utilized. The classifier measures probability distributions per class on video key points for final classification. Research suggests Shelf Scanner's effectiveness in scenarios with high-quality training data.



Authorized licensed use limited to: East West Institute of Technology. Downloaded on March 15,2025 at 06:09:28 UTC from IEEE Xplore. Restrictions apply.
TABLE 1. PRODUCT CATEGORIES IN THE DATASET
ID   Product Name     ID   Product Name
0    Corn flour       9    Detergent
1    Lasagna          10   Battery
2    Soap             11   Sweet
3    Macaroni         12   Fabric softener
4    Cigarette        13   Spice
5    Jam              14   Spaghetti
6    Spice            15   Green tea
7    Vinegar          16   Biscuit
8    Mineral water

Fig. 1. Trolley Structure

In a similar study [5], a novel detector and descriptor, invariant to scale and rotation and coined SURF (Speeded-Up Robust Features), is introduced. SURF surpasses previously proposed schemes in terms of repeatability, distinctiveness, and robustness, while maintaining faster calculation and comparison capabilities. This is achieved through integral images for image convolutions, a Hessian matrix-based measure for the detector, and a simplified descriptor. The study outlines the novel detection, description, and matching steps, along with a detailed exploration of key parameters. Applying SURF addresses two contrasting objectives: camera calibration as a specific case of image alignment, and object recognition. The experiments conducted underscore SURF's versatility across various topics in computer vision.

Large-scale instance retrieval aims to retrieve specific objects or scene instances, which is challenging when visually similar items must be told apart. In an early study [6], a solution for multilabel image recognition is proposed, using discriminative random forests, deformable dense pixel matching, and genetic optimization for runtime efficiency. Cross-dataset recognition is demonstrated with one training picture per product label, evaluated in diverse real-life scenarios using a mobile phone. The study introduces new datasets and tools for multi-label retail product image classification, achieving strong precision and performance on 680 annotated images and 885 test images from GroZi-120, comprising 8350 different product pictures and 680 retail test pictures.

Walmart adopts a unique approach with smart shopping baskets equipped with sensors, measuring vital signs like heart rate and temperature to identify customers needing assistance. This innovative trolley [7], costing approximately $50, promptly alerts Walmart staff in case of detected pathologies. Additional features include self-driving mechanisms, video surveillance, route planning, user interface, voice input, and image capturing.

Amazon Go is regarded as the most creative smart retail option now. The distinctive characteristics of such a system have been discussed in several reputable publications. More than 10 super stores in the USA are now using the technology. This intelligent shopping system is described as particularly user-friendly by Pocket Lint [8]. Anyone may sign in with their Amazon account using a single application, which is all that is needed. The method calls for a big array of cameras and sensors to be positioned at specific angles to watch how customers and products interact, including their movements, gestures, and engagement patterns. The goods that are taken off the shelf and those that are placed back are both identified. The program immediately uses the card to make the necessary payment after the customer leaves the store.

The existing techniques are quite expensive and not very scalable. Our proposed solution costs around $250, making the overall system much more cost-friendly and scalable.

III. PROPOSED APPROACH
The overall approach includes both software and hardware implementation. The positioning of the cameras and the Nvidia Jetson Nano is still under consideration, but we have mostly focused on the software implementation part. The block diagram of the methodology for the Smart Shopping Trolley is shown in Fig. 2. The model comes into action when the customer puts something in the cart. The camera attached to the NVIDIA Jetson Nano recognizes the product after the product is added to or removed from the cart. YOLOv7, the latest object detection model deployed on the Jetson Nano, is used to detect objects. The predicted label triggers an SQL query that fetches all the necessary information (price, quantity) related to the product. A UI (User Interface) has been built with Tkinter in Python to display the bill and information about the shopped items.

A. Object Detection
The most significant part of this project is the accurate identification of the objects being added to the cart. To manage this task, we used pre-trained YOLOv7. The YOLO library [7] is a robust real-time object detection system that outperforms prior neural networks for object detection and supports detection on both live camera streams and video files. Since YOLO is based on a CNN, it uses a convolutional architecture whose convolution and sub-sampling layers progressively downsample the input image.

YOLO predicts several bounding boxes around the picture using a single CNN and utilizes an integrated model to compute the class probability in each box simultaneously. YOLO is almost 1,000 times quicker than conventional R-CNN, 100 times faster than Fast R-CNN [8], and 10 times faster than the last Faster R-CNN. Since YOLO also handles the problem using supervised learning, high-quality and well-labeled data must be secured. For an object detection problem, the ground-truth label pairs a bounding box with the class name of each item. The overall architecture of YOLOv7 is shown in Fig. 4.
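To make the label format concrete, the following is a minimal sketch that assumes annotations are stored in the common YOLO text format: one line per object, holding the class index followed by the box centre and size, all normalised to the image dimensions. The paper does not state which export format was used, so the helper names (`to_yolo_label`, `format_label`), the pixel box, and the image size are hypothetical; class index 5 corresponds to "Jam" in Table 1.

```python
from typing import NamedTuple, Tuple


class YoloLabel(NamedTuple):
    """One ground-truth object: class index plus a normalised box."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def to_yolo_label(class_id: int,
                  box: Tuple[float, float, float, float],
                  image_size: Tuple[int, int]) -> YoloLabel:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a normalised label."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    return YoloLabel(
        class_id,
        (x_min + x_max) / 2 / img_w,   # box centre, as a fraction of width
        (y_min + y_max) / 2 / img_h,   # box centre, as a fraction of height
        (x_max - x_min) / img_w,       # box width, as a fraction of width
        (y_max - y_min) / img_h,       # box height, as a fraction of height
    )


def format_label(label: YoloLabel) -> str:
    """Render the label as one line of a YOLO-style annotation file."""
    coords = " ".join(f"{value:.6f}" for value in label[1:])
    return f"{label.class_id} {coords}"


# Hypothetical example: a jar of jam (class 5) in a 640x480 frame.
example = to_yolo_label(5, (160, 120, 480, 360), (640, 480))
print(format_label(example))  # 5 0.500000 0.500000 0.500000 0.500000
```

Normalised coordinates keep the annotation valid when images are resized before training, which is one reason this layout is widely used with YOLO-family models.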

Fig. 2. Insertion of item inside a trolley
Fig. 3. Removal of item inside a trolley

In general, YOLOv7 significantly increases accuracy without increasing the inference cost. This enhanced architecture offers notable advantages in terms of speed, power, and accuracy for object detection tasks. It enables efficient feature integration, ensures model robustness, and significantly enhances accuracy in detecting objects.

B. Data Collection and Annotation
We manually built a custom dataset of 17 classes, which are listed in Table 1. We took these 17 products and manually annotated them since we could not find any public dataset related to the products in our supermarkets. We chose the most common products of daily use. Makesense.ai, a free online photo labeling tool, is used for this purpose. Data augmentation techniques like rotation and brightening have been applied to increase the heterogeneity of the input images. We used 2,900 images for the training set (75%), 357 images for testing (9%), and finally 595 images for validation (16%).

C. Item Insertion and Deletion
For insertion, the item is detected first by camera 1 and then recognized by YOLOv7. After the product is identified, its specification is fetched from the database and displayed on the GUI. If the customer decides to buy the item after reading its characteristics, the same item will be detected by camera 2 within the timer. The timer is set to 1 minute; if the same item is not detected by camera 2 within the timer, the item characteristics that were fetched from the database are removed from the GUI. A check is then placed on the item detected by camera 2. If the items seen by cameras 1 and 2 are the same, the item is added to the shopping list and its price is added to the total cost; otherwise the GUI shows an error to deal with such cases.

For removal, the item is first scanned by camera 2 and recognized by YOLOv7. The item specification is then fetched from the database and displayed on the GUI. If the customer wants to remove the item, it will be scanned by camera 1 within the timer; if not, the item is removed from the GUI. For removal, another check is made that the item seen by camera 1 is the same as the item seen by camera 2. If so, the item is removed from the shopping list and its price is deducted from the total cost; if not, an error is shown to deal with such cases.
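The two-camera confirmation logic described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the authors' implementation: the `SmartCart` class, the `on_detection` method, the `products` table schema, and the sample prices are all hypothetical, an in-memory SQLite database stands in for the product database, and detections are passed in as plain labels rather than coming from YOLOv7.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CONFIRM_WINDOW_S = 60.0  # the text sets the confirmation timer to 1 minute


def make_catalog() -> sqlite3.Connection:
    """Build an in-memory stand-in for the product database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (label TEXT PRIMARY KEY, price REAL)")
    db.executemany("INSERT INTO products VALUES (?, ?)",
                   [("jam", 3.5), ("soap", 1.25)])
    return db


@dataclass
class SmartCart:
    db: sqlite3.Connection
    items: List[str] = field(default_factory=list)
    total: float = 0.0
    # (label, price, camera that saw it first, timestamp of that sighting)
    pending: Optional[Tuple[str, float, int, float]] = None

    def on_detection(self, camera: int, label: str, now: float) -> str:
        """Process one detection and return a short status for the GUI."""
        # Timer expired: the item shown on the GUI is cleared.
        if self.pending and now - self.pending[3] > CONFIRM_WINDOW_S:
            self.pending = None
        # First sighting (or a repeat on the same camera): look up and wait.
        if self.pending is None or self.pending[2] == camera:
            row = self.db.execute(
                "SELECT price FROM products WHERE label = ?", (label,)
            ).fetchone()
            if row is None:
                return "unknown"
            self.pending = (label, row[0], camera, now)
            return "pending"
        # Second sighting on the other camera: confirm or report an error.
        first_label, price, first_camera, _ = self.pending
        self.pending = None
        if label != first_label:
            return "error"
        if first_camera == 1:          # camera 1 then camera 2: insertion
            self.items.append(label)
            self.total += price
            return "added"
        if label not in self.items:    # camera 2 then camera 1: removal
            return "error"
        self.items.remove(label)
        self.total -= price
        return "removed"


cart = SmartCart(make_catalog())
print(cart.on_detection(1, "jam", now=0.0))    # pending
print(cart.on_detection(2, "jam", now=10.0))   # added
print(cart.items, cart.total)                  # ['jam'] 3.5
```

Passing the timestamp in explicitly, rather than reading a clock inside the method, keeps the timer behaviour easy to test; a deployment would presumably supply the frame capture time.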

Fig. 4. Yolov7 architecture

D. Deploying on NVIDIA Jetson Nano:
To integrate our software with hardware, we need to deploy our object detection model on the Nvidia Jetson Nano. The Jetson Nano is a compact, low-power computer intended for edge computing. One of the primary reasons for pairing it with YOLOv7 is YOLOv7's capability to carry out real-time object recognition, which is essential for many applications that call for quick and precise object detection in photos or videos. A range of edge computing applications, including video surveillance, autonomous cars, traffic monitoring, and smart IoT devices, are ideally suited to the Jetson Nano's low power consumption. Users may create strong and effective edge computing apps by utilizing YOLOv7 on the Jetson Nano to take advantage of its quick and precise object identification capabilities. Following is the procedure for deploying YOLOv7 on the Nvidia Jetson Nano.

Jetson Nano Setup: First, we need to install the latest Nvidia Developer Kit SDK. Next, we create a folder for this project and clone the YOLOv7 repository. Afterward, we need to create a virtual Python environment to avoid inconsistency between the libraries. We must construct a symbolic link from the global installation to our virtual environment since OpenCV must be installed system-wide (it comes preloaded with the Nvidia developer kit Ubuntu 18.04). We will not be able to use it from our virtual environment if it is not accessible.

1) Installing PyTorch and TorchVision: After setting up the environment and installing the necessary libraries, we need to install PyTorch, which is a machine-learning framework based on the Torch library. It is widely used in applications such as computer vision and natural language processing. Installing PyTorch version 1.8 is a recommended choice as it is an official release provided by Nvidia. To install TorchVision for the Jetson Nano, we clone the repository from GitHub since no pre-built wheel is available. After building the appropriate version, we install it in our virtual environment. It is essential to choose a TorchVision version that is compatible with our PyTorch version, such as 0.9.0 for PyTorch 1.8.0.

2) Running Yolov7 on Jetson Nano: Before running YOLOv7 on the Jetson Nano for the first time, it is necessary to download the trained weights. You can choose between the regular and tiny versions, with the tiny version recommended for devices like the Nano. Once the weights are downloaded into the YOLOv7 folder, YOLO should be able to identify objects within approximately 30 seconds without errors. A picture displaying bounding boxes around the detected items will then appear.

E. Performance evaluation and model specifications:
Table II shows the model performance along with the parameters set for training the model.

TABLE II. PARAMETERS ALONG WITH THEIR VALUES
Parameter            Value
Learning rate        0.001
Weight decay         0.0005
Epochs               2000
Final loss value     0.0646
mAP@0.50             96.80%
Precision            86%
Recall               83%
Momentum             0.949
F1-score             85%

IV. RESULTS
The initial 19 classes of products were reduced to 16. Three classes were removed from the data based on the results. The overall mean average precision (mAP) with an IoU threshold of 50% was 0.985 or 98.60%. The confidence scores along with the bounding box of one of the products are shown in Fig. 4. The average precision of different product IDs is shown in Fig. 5. The GUI that we made on Tkinter along with the final receipt is shown in Fig. 6.

Table III presents the average values for accuracy, recall, and precision. For more detailed results, the accuracy, precision, and recall of each product are given in Fig. 5. The GUI in Fig. 6 shows the price, quantity, and name of each item, along with the total cost at that particular time. When the customer checks out, the overall cost is displayed on the screen.

Fig. 4. Confidence score of a product
Fig. 5. Average precision of different product IDs

TABLE III. PERFORMANCE METRICS
Metric       Average Value (%)
Accuracy     95.53
Recall       95.00
Precision    96.47
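Table III reports average precision and recall but no F1-score. As a worked example, the sketch below applies the standard definition of F1 as the harmonic mean of precision and recall to the table's averages. The function name `f1_score` is our own, and the result is a derived illustration rather than a figure reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Averages from Table III, in percent.
print(round(f1_score(96.47, 95.00), 2))  # 95.73
```

Applied to the training-time precision (86%) and recall (83%) in Table II, the same formula gives roughly 84.5%, in line with the rounded F1-score listed there.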

Fig. 6. GUI developed on Tkinter

V. CONCLUSION & FUTURE WORKS

Amazon Go, the best-known representative of unmanned stores, has proven accurate. Its ideas and capabilities have been acknowledged by the public, but implementing them on a wide scale is expensive. There is also a restriction on the number of clients who can enter at the same time. The smart cart system suggested in this study outperforms existing unmanned store options in terms of cost-performance ratio. Even if the capacity increases, the proposed system is unaffected. It also consumes a small amount of CPU power. Furthermore, unlike the traditional approach, RFID tags do not have to be attached to all items. Speed and accuracy are trade-offs in product detection; because this system requires real-time processing, YOLO was used. Several factors still need to be taken into consideration for the complete deployment of this project. Currently, we have taken care of the software part; we still need to integrate it with the complete hardware prototype mentioned in the proposed approach section. Another consideration is adding a weight sensor to prevent fraud when products are removed from the cart by customers. A further aspect is the billing system. The GUI shows the overall bill to be paid, but it needs to be integrated with a billing system offering options such as card payment, cash, etc.

ACKNOWLEDGMENT
This work was supported in part by IITP grant funded by the Korean government (MSIT) under IITP-2024-2020-0-01821(50%), IITP-2021-0-02068(25%), and IITP-2019-0-00421(25%).

REFERENCES
[1] Pangriya, Ruchita & Chandra, Jaiswal. (2023). AMAZON GO!!!! JUST WALKOUT. 10.13140/RG.2.2.21443.99365.
[2] Ajami S, Rajabzadeh A. Radio Frequency Identification (RFID) technology and patient safety. J Res Med Sci. 2013 Sep;18(9):809-13. PMID: 24381626; PMCID: PMC3872592.
[3] Bochkovskiy, A., Wang, C. Y., and Liao, H. Y. M. (2020). “Yolov4: Optimal speed and accuracy of object detection”. arXiv preprint arXiv:2004.10934.
[4] Herbert B, Andreas E, Tinne T, Luc VG (2008). “Speeded-up Robust Features (SURF)”
[5] Chien-Yao Wang, Hong-Yuan Mark Liao, Yueh-Hua Wu, Ping-Yang Chen, Jun-Wei Hsieh, and I-Hau Yeh. CSPNet: “A new backbone that can enhance learning capability of cnn”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPR Workshop), 2020.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Spatial pyramid pooling in deep convolutional networks for visual recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 37(9):1904–1916, 2015.
[7] Shu Liu, Lu Qi, Haifang Qin, Jianping Shi, and Jiaya Jia. “Path aggregation network for instance segmentation”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8759–8768, 2018.
[8] Trevor Mogg, “This high-tech shopping cart from Walmart could save your life”, Digital Trends, November 2018. [Online] Available: https://www.digitaltrends.com/cool-tech/walmart-has-an-idea-totrack-your-heart-rate-via-its-shopping-carts/ [Accessed: 4 of June 2019].
[9] Maggie Tillman, “What is Amazon Go, where is it, and how does it work?”, Pocket-lint, February 2019. [Online] Available: https://www.pocket-lint.com/phones/news/amazon/139650-what-isamazon-go-where-is-it-and-how-does-it-work [Accessed: 12 of May 2019].
[10] Redmon, Joseph, Ali Farhadi. YOLO. "An incremental improvement." arXiv preprint arXiv:1804.02767 8 (2018).
[11] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
[12] Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
[13] David A van Dyk & Xiao-Li Meng (2001) The Art of Data Augmentation, Journal of Computational and Graphical Statistics, 10:1, 1-50, DOI: 10.1198/10618600152418584.
