LostNet: A Smart Way for Lost and Found
Abstract:
Due to the rapid population growth of cities in recent years, objects are frequently lost
and left unclaimed on public transportation, in restaurants, and in other public areas. While
services such as Find My iPhone can easily locate lost electronic devices, most other valuable
objects cannot be tracked in an intelligent manner, making it difficult for administrators to
return a large number of lost and found items in a timely manner. We present a method that
significantly reduces the complexity of searching by comparing photographs of lost items
provided by their owners with photographs taken when registered lost and found items are
received. In this research, we design a photo matching network by combining the fine-tuning
of MobileNetV2 with CBAM attention and use a web framework to develop an online lost and
found image identification system. Our implementation achieves a testing accuracy of 96.8%
using only 665.12M FLOPs and 3.5M training parameters. It can recognize real-world images
and can be run on an ordinary laptop.
Introduction:
Around the world, both population density and the number of lost objects are rising in areas
served by urban rail transit, yet the traditional manual search service is inefficient. In this
situation, there is an urgent need to accelerate the development of intelligent lost and found
systems in order to lessen the burden that lost and found management places on transportation
operators. Deep learning can be used to build recognition and classification models for lost
and found items. This is a new approach that can reduce reliance on manual labour, identify
categories quickly and accurately, significantly reduce the cost of manual service for
transportation operators, and better support green development strategies.
The application of convolutional neural networks is widespread throughout many scientific
subfields pertaining to image recognition and classification (Sun et al., 2021). It is not a
passing fad that academics in a wide variety of fields have shifted their attention to the study
and practical use of image recognition. In the realm of waste sorting, a garbage image
classification method was devised based on an enhanced MobileNet v2 combined with transfer
learning to improve the real-time performance and accuracy of garbage image classification
models (Huang et al., 2021).
Research Background:
Many academics are also challenging the status quo in the many subfields that make up the
area of applied image recognition. Yang et al. (2015) attempt to identify plant leaves using a
hierarchical model based on CNNs. A study on establishing the optimal size of the training
data set required to achieve high classification accuracy with low variance in medical image
classification systems is presented by Cho et al. (2015). Purnama et al. (2019) offer a method
for the classification and diagnosis of skin diseases that is suitable for use in teledermatology.
It has also been demonstrated that transfer learning is beneficial in a variety of contexts.
Lee et al. (2016) propose a fine-grained classification method for large-scale plankton
databases based on convolutional neural networks, with the implementation of transfer
learning in CNNs as one potential solution. Liu et al. (2020) apply unsupervised transfer
learning to CNN training for remote sensing image retrieval. Specifically, they transform
similarity learning into deep ordinal classification with the assistance of several CNN experts
pretrained on large-scale labelled everyday image sets; these CNN experts jointly determine
image similarities and provide pseudo labels for classification. Purwar et al. (2020) use
models related to convolutional neural networks (CNNs) to identify mesangial
hypercellularity in MEST-C scoring. Herzog et al. (2021) concentrate on the classification of
MRI scans for the diagnosis of early and progressive dementia, using transfer learning
architectures that employ convolutional neural networks (CNNs) as a base model and fully
connected layers with softmax functions or support vector machines (SVMs). Phankokkruad
(2021) proposes three CNN models for detecting lung cancer, based on transfer learning with
the VGG16, ResNet50V2, and DenseNet201 architectures.
Methodology:
While CNNs are being applied to more and more kinds of problems, the lost and found
problem has not yet been solved in any intelligent way. We therefore propose a methodical
approach that combines MobileNet v2 with an intuitive graphical user interface (GUI). In this
study, we build on earlier research to further investigate the detection and categorization of
lost and found items, and we present an approach that combines perceptual hashing with
MobileNet v2 transfer learning.
To handle the relatively complex image dataset of lost and found items, we carried out
extensive research, classified the most commonly lost and found items into ten categories
using questionnaires and market research, and produced a private image dataset using web
crawlers, real-world photography, and research examples. The generated dataset is then used
to train the network and establish an intelligent recognition and classification model for lost
and found images. This model addresses the labor and time costs associated with
conventional methods and provides an accurate and complete solution supported by scientific
experimental data.
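As a concrete illustration, the following is a minimal PyTorch/torchvision sketch of how such a ten-category private dataset could be loaded for training. The directory name, image size, batch size, and augmentations are assumptions for illustration only, not the paper's exact configuration.

```python
import torch
from torchvision import datasets, transforms

# Hypothetical directory layout: lost_found/train/<category>/... for the ten
# lost-and-found categories; paths and augmentations are illustrative only.
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),  # ImageNet statistics
])
train_set = datasets.ImageFolder("lost_found/train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32,
                                           shuffle=True, num_workers=4)
```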
I. MobileNet v2
MobileNetV2 (Sandler et al., 2018) is an outstanding example of a lightweight convolutional
neural network. The network introduces inverted residuals and linear bottlenecks, both of
which are helpful for feature extraction; the linear activation used in the final layer of the
inverted residual structure prevents the loss of low-dimensional information, and traditional
convolution is replaced by depthwise separable convolution, which significantly reduces both
the amount of computation and the number of parameters of the model. It was developed
specifically for images and has applications in image classification as well as the extraction
of generic features.
Depthwise separable convolution consists of two components: depthwise convolution and
pointwise convolution, and it differs from standard convolution. During depthwise
convolution, each channel of the feature map is covered by exactly one convolution kernel,
and the total number of convolution kernels is equal to the total number of channels.
Depthwise convolution can be expressed as follows:
$O_{x,y,c} = \sum_{w,h}^{W,H} K_{w,h,c} \cdot I_{x+w,\,y+h,\,c}$ (formula 1)
In the equation presented above, O is the output feature map, c is the channel index, x and y
are the coordinates of the output feature map on channel c, K is the convolution kernel of
width W and height H, I is the input feature map, and w and h are the coordinates of the
convolution kernel weight elements on channel c.
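To make formula 1 concrete, the following sketch builds a depthwise convolution in PyTorch by setting `groups` equal to the channel count; the channel count and feature map size are arbitrary example values.

```python
import torch
import torch.nn as nn

# Depthwise convolution: with groups == channels, each of the C input channels
# is filtered by exactly one 3x3 kernel, as in formula 1.
channels = 32
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                      groups=channels, bias=False)

x = torch.randn(1, channels, 56, 56)
print(depthwise(x).shape)                               # torch.Size([1, 32, 56, 56])
print(sum(p.numel() for p in depthwise.parameters()))   # 32 * 3 * 3 = 288 weights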
The primary difference between pointwise and standard convolution is the size of the
convolution kernel, which in pointwise convolution is fixed at 1x1. Depthwise separable
convolution first employs depthwise convolution to extract the features of each channel and
then uses pointwise convolution to combine the extracted channel features. Depthwise
separable convolution is intended to reduce the number of parameters and computations
required by conventional convolution. Compared with the total number of calculations
involved in standard convolution, the ratio is as follows:
$\dfrac{R_1}{R_2} = \dfrac{D_f^2 D_k^2 I + D_f^2 I O}{D_f^2 D_k^2 I O} = \dfrac{1}{O} + \dfrac{1}{D_k^2}$ (formula 2)
In formula 2 shown above, R1 and R2 represent the computational costs of depthwise
separable convolution and standard convolution, respectively; Df represents the height and
width of the input feature map; Dk represents the height and width of the convolution kernel;
I represents the depth of the input feature map; and O represents the depth of the output
feature map.
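The ratio in formula 2 can be checked numerically with a short sketch; the feature map and channel sizes below are arbitrary example values.

```python
# Multiply-add counts for a Df x Df feature map, a Dk x Dk kernel,
# I input channels and O output channels (formula 2).
def cost_ratio(Df, Dk, I, O):
    r1 = Df**2 * Dk**2 * I + Df**2 * I * O   # depthwise + pointwise convolution
    r2 = Df**2 * Dk**2 * I * O               # standard convolution
    return r1 / r2

Df, Dk, I, O = 56, 3, 32, 64
print(cost_ratio(Df, Dk, I, O))   # ~0.1267
print(1 / O + 1 / Dk**2)          # identical: 1/O + 1/Dk^2
```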
In the process of feature extraction, MobileNetV2 uses 3x3 depthwise separable convolutions,
so the computational cost is roughly 1/9 of that of a standard convolution, while the reduction
in accuracy is minimal; this is one of the most notable qualities of the model.
Figure 1 presents the structure of the MobileNetV2 network. It consists primarily of three
parts: the front end is a convolutional neural network (CNN) constructed from several
convolutional layers; average pooling is then applied to the 1,280 feature maps of size 7x7 to
produce a 1,280-dimensional feature vector; and this vector is finally fed into a fully
connected layer with 1,000 neurons.
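As a rough sketch, the backbone described above can be fine-tuned by replacing the 1,000-way ImageNet classifier with a head for the ten lost-and-found categories. This assumes torchvision >= 0.13 and does not reproduce the paper's CBAM insertion or exact training configuration.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # the ten lost-and-found categories

# Load an ImageNet-pretrained MobileNetV2 for transfer learning.
model = models.mobilenet_v2(weights="IMAGENET1K_V1")

# The backbone ends in global average pooling over 7x7 feature maps with
# 1,280 channels; swap the 1,000-neuron classifier for a new output layer.
model.classifier[1] = nn.Linear(model.last_channel, num_classes)
```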
The spatial attention module highlights informative regions. The feature map is first
average-pooled and max-pooled along the channel dimension to generate two efficient 2D
feature descriptors, which are then concatenated and convolved by a standard convolutional
layer to produce a 2D spatial attention map. The output of the spatial attention mapping is
formulated as follows:
$M_s(F) = \sigma\left(f\left(\left[F_{\mathrm{avg}};\, F_{\mathrm{max}}\right]\right)\right)$ (formula 4)
In formula 4, F is the input feature map, σ is the sigmoid nonlinear activation function, f
denotes the convolution operation of the standard convolutional layer, and Favg and Fmax are
the feature descriptors obtained by average pooling and max pooling along the channel
dimension, respectively. (In the channel attention module, the pooled descriptors are instead
passed through a shared multi-layer perceptron without bias, whose two linear layers have
weights W0 and W1.)
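A minimal PyTorch sketch of formula 4 is shown below. The 7x7 convolution kernel is the usual CBAM default and is an assumption here, since the text does not state the kernel size.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention (formula 4)."""

    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Average and max pooling along the channel dimension -> two 2D maps.
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        # Concatenate, convolve, and apply the sigmoid to get the attention map.
        attention = self.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))
        return x * attention
```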
From the preceding graphs and table, it is abundantly evident that the SGD optimizer
performed far better than the other optimizer functions on the lost-object image classification
problem.
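For illustration, a minimal SGD training loop is sketched below, reusing `model` and `train_loader` from the earlier sketches; the learning rate, momentum, and weight decay are assumed values, not the paper's reported hyperparameters.

```python
import torch
import torch.nn as nn

# Assumed hyperparameters for the SGD optimizer (illustrative only).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)

model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```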
To test the efficiency and superiority of the research approach presented in this work, eight
transfer learning models of the same kind, EfficientNet, Inception v4, ViT-B/32,
DenseNet201, VGGNet19, ShuffleNet v2, ResNet152 and MobileNet v3, were chosen for a
comparison study. The optimal parameter values at model convergence were selected as the
respective parameters. This study's private data set is used to train and verify a total of nine
models under these settings. The outcomes of the training are presented in Table 3.
Table 3. Results of the selected model training

Model                                     Accuracy/%   AP/%   Total parameters/M
EfficientNet (Koonce et al., 2021)          93.9        89.0        66.348
Inception v4 (Szegedy et al., 2017)         89.5        80.3        41.158
ViT-B/32 (Dosovitskiy et al., 2020)         94.9        94.3       104.766
DenseNet201 (Huang et al., 2017)            95.9        95.4        20.014
VGGNet19 (Simonyan et al., 2015)            95.6        94.7       139.611
ShuffleNet v2 (Ma et al., 2018)             88.6        82.3         2.279
ResNet152 (He et al., 2016)                 87.8        81.9        60.193
MobileNet v3 (Koonce et al., 2021)          95.7        94.9         5.483
Ours                                        96.8        96.2         3.505
Figure 8. Comparison of the performance, accuracy, and parameter counts of the different models
As shown in Table 3, the algorithm proposed in this article is transfer learning based on the
improved MobileNetV2. The experimental results demonstrate that the studied method
achieves an average precision (AP) of 96.2% on the self-built data set, which is 7.2%, 15.9%,
1.9%, 0.8%, 1.5%, 13.9%, 14.3% and 1.3% higher than that of EfficientNet, Inception v4,
ViT-B/32, DenseNet201, VGGNet19, ShuffleNet v2, ResNet152 and MobileNet v3,
respectively. The proposed technique also obtains the highest accuracy on the private data set,
96.8%, along with strong generalization ability and resilience, which sets it apart from the
comparable transfer learning models. This corresponds to the information shown in Figure 8,
where the ordinate shows the test accuracy, the abscissa represents GFLOPs, the circle colors
reflect the different transfer learning models, and the circle size denotes the total number of
parameters. Considering Figure 8 and Table 3 together allows a more complete evaluation of
the performance of each model. In terms of lightweight design, this model is second only to
MobileNet v3 and ShuffleNet v2, but it is worth noting that the proposed model has far fewer
total parameters than MobileNet v3 and its accuracy is 8.2% higher than that of ShuffleNet
v2. In a comprehensive comparison, the proposed model is an excellent lightweight network,
which makes it possible to port the model to mobile devices.
In this paper, the model is trained using the PyTorch and torchvision machine learning
frameworks, the Spring Boot framework is used as the back-end service of the web page to
realize online recognition of pictures, and the front-end web page uses the Layui framework,
so that the trained model is ported to the Web end to realize online recognition of images
uploaded by users. The online identification procedure is as follows: the user opens the
browser, navigates to the system's home page, and clicks the button to upload a local image;
the front end then sends a POST request and transmits the image to the back end. After the
back end receives the picture uploaded by the user, it first sends it to the trained and improved
MobileNet v2 transfer learning model for prediction and recognition, which returns a
category. Once the category has been determined, the back end queries the database for an
array of image addresses associated with that category. Each image is downloaded from its
address, and the perceptual hash algorithm is used to compare the downloaded image with the
image uploaded by the user. The comparison returns a similarity error value, and the few
photos in the array with the smallest similarity error are sent to the front end. Figure 10b
depicts the engineering interface after the model has been applied.
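The paper does not specify which perceptual hash variant is used, and the real back end runs on Spring Boot; the following Python sketch therefore only illustrates the comparison step, using a simple average hash and a Hamming-distance similarity error. The function names are hypothetical.

```python
from PIL import Image
import numpy as np

def average_hash(path, hash_size=8):
    """Shrink to hash_size x hash_size grayscale and threshold at the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size), Image.LANCZOS)
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def similarity_error(path_a, path_b):
    """Hamming distance between the two hashes; smaller means more similar."""
    return int(np.count_nonzero(average_hash(path_a) != average_hash(path_b)))

def closest_matches(query_path, candidate_paths, top_k=3):
    """Rank stored photos of the predicted category against the uploaded image."""
    ranked = sorted(candidate_paths, key=lambda p: similarity_error(query_path, p))
    return ranked[:top_k]
```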
Acknowledgement:
This work was supported by the Provincial College Students' Innovation and
Entrepreneurship Project [Grant number S202110368112]; the University Humanities and
Social Science Research Program of Anhui Province [Grant number SK2020A0380]; the
School-level Project of the Key Humanities and Social Sciences Research Base of Anhui
Province, Center for Mental Health Education of College Students [Grant number
SJD202001]; and the School-level Project of the Young and Middle-Aged Natural Science
Foundation of Wannan Medical College [Grant number WK202115].
References:
F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive
survey on Transfer Learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76,
2021.
G.-W. Yang and H.-F. Jing, “Multiple convolutional neural network for feature extraction,”
Intelligent Computing Theories and Methodologies, pp. 104–114, 2015.
H. Lee, M. Park, and J. Kim, “Plankton classification on imbalanced large scale database via
convolutional neural networks with transfer learning,” 2016 IEEE International
Conference on Image Processing (ICIP), 2016.
J. Cho, K. Lee, E. Shin, G. Choy, and S. Do, “How much data is needed to train a medical
image deep learning system to achieve necessary high accuracy?,” arXiv.org, 07-Jan-
2016. [Online]. Available: https://arxiv.org/abs/1511.06348. [Accessed: 20-Aug-2022].
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
arXiv.org, 10-Dec-2015. [Online]. Available: https://arxiv.org/abs/1512.03385.
[Accessed: 20-Aug-2022].
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image
recognition,” arXiv.org, 10-Apr-2015. [Online]. Available:
https://arxiv.org/abs/1409.1556. [Accessed: 20-Aug-2022].
M. Phankokkruad, “Ensemble transfer learning for lung cancer detection,” 2021 4th
International Conference on Data Science and Information Technology, 2021.
N. J. Herzog and G. D. Magoulas, “Deep learning of brain asymmetry images and transfer
learning for early diagnosis of dementia,” Proceedings of the International Neural
Networks Society, pp. 57–70, 2021.
P. Samanta and S. Jain, “Analysis of perceptual hashing algorithms in image manipulation
detection,” Procedia Computer Science, vol. 185, pp. 203–212, 2021.
W. Song, Y. Wang, and Wang, Image and graphics technologies and applications.
Singapore: Springer Singapore, 2021.
Y. Liu, L. Ding, C. Chen, and Y. Liu, “Similarity-based unsupervised deep transfer learning
for remote sensing image retrieval,” IEEE Transactions on Geoscience and Remote
Sensing, vol. 58, no. 11, pp. 7872–7889, 2020.
Z. Sun, X. Yuan, X. Fu, F. Zhou, and C. Zhang, “Multi-scale capsule attention network and
joint distributed optimal transport for bearing fault diagnosis under different working
loads,” Sensors, vol. 21, no. 19, p. 6696, 2021.
G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected
convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2017, pp. 4700–4708.
N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient
CNN architecture design,” in Proceedings of the European Conference on Computer
Vision (ECCV), 2018, pp. 116–131.