There is also a description of multi-GPU training, and overfitting has been reduced in the proposal. The complete model is described together with the details of its learning procedure, which is very useful for this research. Finally, the network achieves top-1 and top-5 test-set error rates of 37.5% and 17.0%, respectively.

The main approach used in another work is a publicly available CNN trained for image classification, called OverFeat [5]. For testing, two datasets are used: "Pascal VOC 2007" for object image classification and "MIT-67 indoor scenes" for scene recognition. The final recognition stage is run by a Support Vector Machine (SVM). For CNN-SVM, the recognition accuracy is 91.7% within category, 82.2% across categories, and 89.0% in moving average unit cost. For CNN-SVM with augmentation, the recognition accuracy is 93.7% within category, 84.9% across categories, and 91.5% in moving average unit cost.
In another research work, an experiment was conducted on a database [6]. The dataset or database is the most important asset for training; without datasets it is impossible to do anything in computer vision. The proposal analyzes deep features for scene recognition. Suppose there is a scene of the Sun or the ocean: the proposed method will state with 66.2% accuracy that the scene contains a sun, and otherwise with 50.0% accuracy that the scene contains another object (e.g., the ocean).
A method has also been proposed mainly for classifying objects by delving deeper into the network [7]. It compares various systems for object detection and reports a top-5 test error of 5.71% for single models and 4.94% for multiple models.
There is a proposal about rich features, i.e., "which feature is better for object detection?" [8]. The method applied for the task is R-CNN (Regions with CNN features). Its main task is to detect the objects of an image via semantic segmentation, using the "Pascal VOC 2012" dataset. This work is helpful for understanding the process of object detection.
Another research effort has also been carried out on the "Pascal VOC 2012" dataset [9]. Its main proposals are to show how the middle levels of a CNN work and how segmented data can be transferred. From this method, ideas about the mid-level behavior of a CNN can be obtained.
There is also a research work that focuses mainly on Deep Neural Networks (DNN) [10]. It was carried out through image segmentation and gray-scaling, and the results were compared with the MNIST dataset. It is closely related to the present work, which is done with the TensorFlow library.
The comparison of computer vision with human eyesight is one of the most vital issues. "How does a picture look to a human?" has been investigated; this visual representation was produced by a Recurrent Neural Network (RNN), a class of artificial neural network in which connections between units form a directed cycle [11]. The datasets used for testing are ordinary visual datasets, and the proposed method supports the purposes of this research.

There is also a method for generating a caption for any image [12]. It describes what is happening in an image by running several CNNs at a time, and it also reports the prediction percentages of the several objects in the image.

In another method, a Recurrent Neural Network (RNN) has been proposed [13]. The main purpose of this research is to go deeper than before with the CNN: the same object image is used from different views to train the network more deeply so that the accuracy can be increased.

The latest proposed method for image recognition concerns a residual learning framework that is much deeper than before but easy to train [14]. Through deep residual learning, the system can state the number of objects in an image and what kind of objects they are, so that the image can be predicted. The coordinate frame of each object has also been shown.

III. METHODOLOGY FOR PROPOSED SYSTEM

To fulfill the proposal, several techniques have been applied that help detect and recognize objects from images for the betterment of an image search engine. The required tools and working processes, which are crucial for this purpose, are described below:

TABLE I
HARDWARE & SOFTWARE COMPONENTS FOR THE SYSTEM

Components    Requirements
RAM           4 GB (min)
Processor     Intel Core i3, i5 or i7
Memory        240/256/512 GB
OS            Ubuntu 16.04 LTS or above
Python        Version 3.5 or higher
Packages      TensorFlow, Laravel 5.4 or higher, etc.
Others        Gallerify (Chrome image-downloader extension)

A. Collection & Classification of Datasets

First of all, the datasets have been collected for training from Inception-v3 [15] so that matching with the input can be done. The datasets have been prepared for training by means of Python; the main step is to import TensorFlow in Python. At implementation time, the training session is started on the dataset and a protocol buffer (.pb) file is created, which serializes the structured data through classes. Thus, the datasets are classified.
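As an illustration of this step, the following minimal sketch (TensorFlow 1.x API) loads the serialized graph back from the .pb file and prepares a classification helper. The file names output_graph.pb and output_labels.txt and the tensor names DecodeJpeg/contents:0 and final_result:0 follow the conventions of TensorFlow's Inception-v3 retraining example; they are assumptions made for illustration, not values given in this paper.

import numpy as np
import tensorflow as tf

GRAPH_PB = 'output_graph.pb'      # assumed output of the training step
LABELS_TXT = 'output_labels.txt'  # assumed list of class labels, one per line

def load_graph(pb_path=GRAPH_PB):
    # Deserialize the GraphDef stored in the protocol buffer (.pb) file.
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    return graph

def classify_image(image_path, graph, labels):
    # Feed raw JPEG bytes to the graph; return the top-5 (label, score) pairs.
    with open(image_path, 'rb') as f:
        image_bytes = f.read()
    with tf.Session(graph=graph) as sess:
        scores = sess.run('final_result:0',
                          feed_dict={'DecodeJpeg/contents:0': image_bytes})
    scores = np.squeeze(scores)
    top5 = scores.argsort()[-5:][::-1]
    return [(labels[i], float(scores[i])) for i in top5]

labels = [line.strip() for line in open(LABELS_TXT)]
graph = load_graph()
print(classify_image('example.jpg', graph, labels))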
B. Development of a Demo Image Search Engine

After classification, the development of a demo image search engine is required to show the results and best guesses. The demo image search engine has been developed with the web language PHP on the Laravel 5.4 [16] framework. There are two parts to the design of the search engine: the backend and the frontend. The frontend contains a field for uploading an image along with a box that shows the best guesses, i.e., the results for the image. The backend contains an API that is connected to the search engine so that the produced result can be viewed.
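The paper does not include the glue code between the PHP backend and the classifier. One hedged possibility, sketched below, is to expose the classifier as a small command-line script that prints JSON, which the Laravel API could invoke for each uploaded file; the module name classifier and its helpers refer to the previous sketch and are hypothetical.

import argparse
import json

# Hypothetical module holding the helpers from the previous sketch.
from classifier import GRAPH_PB, LABELS_TXT, load_graph, classify_image

def main():
    parser = argparse.ArgumentParser(description='Classify one uploaded image.')
    parser.add_argument('image', help='path to the uploaded image file')
    args = parser.parse_args()

    labels = [line.strip() for line in open(LABELS_TXT)]
    graph = load_graph(GRAPH_PB)
    guesses = classify_image(args.image, graph, labels)
    # The PHP backend can parse this JSON and render the best guesses.
    print(json.dumps([{'label': l, 'score': s} for l, s in guesses]))

if __name__ == '__main__':
    main()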
C. Testing Data

Testing of data started after the development of the demo search engine. Before that, the test data had been collected from various websites by crawling through an extension of the Google Chrome browser named "Gallerify - the most powerful image downloader".

Almost 1K images of different classes have been collected for testing. By uploading the images one by one to the search engine, the best guesses matching the trained dataset, i.e., the protocol buffer file, can be seen. The testing does not take too much time; rather, it takes about as much time as the Google image search engine does. A small batch harness, sketched below, can also push the whole collection through the classifier.
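The harness below is an illustrative way to run the collected images in bulk and time the run; the directory name test_images and the reuse of the hypothetical classifier helpers from the earlier sketches are assumptions.

import glob
import time

# Reuses the hypothetical helpers from the earlier sketch.
from classifier import GRAPH_PB, LABELS_TXT, load_graph, classify_image

labels = [line.strip() for line in open(LABELS_TXT)]
graph = load_graph(GRAPH_PB)

start = time.time()
for path in sorted(glob.glob('test_images/*.jpg')):  # assumed folder of ~1K images
    label, score = classify_image(path, graph, labels)[0]  # best guess
    print('%s -> %s (%.2f%%)' % (path, label, score * 100))
print('Total time: %.1f s' % (time.time() - start))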
D. Process Flow Diagram for Proposed System

The process flow of the proposed system is: Start → classify ImageNet datasets → train ImageNet datasets → create the .pb file and graph → input image → decode and parse the input image → match the input image with the trained datasets → if there is no match, return; otherwise, report the best guesses with scores.

Following this flow, the input image is decoded, parsed, and matched against the trained datasets according to the class; finally, the system returns the result with the best guesses. Inside the system, ILSVRC-2017 (ImageNet Large Scale Visual Recognition Challenge 2017) [17] has been followed. The equations and other facts of the work are supported by ILSVRC-2017 and are mentioned below:
1) Object Localization: The first challenge of ILSVRC-2017 is to localize the objects of an image for detection. For localizing an object, given an image, an algorithm produces 5 class labels c_i, i = 1, ..., 5, in decreasing order of confidence, and 5 bounding boxes b_i, i = 1, ..., 5, one for each class label. The quality of a localization labeling is evaluated based on the label that best matches the ground-truth label for the image and also on the bounding box that overlaps with the ground truth. The idea is to allow an algorithm to identify multiple objects in an image and not to be penalized if one of the identified objects was in fact present but not included in the ground truth.

The ground-truth labels for the image are C_k, k = 1, ..., n, with n class labels. For each ground-truth class label C_k, the ground-truth bounding boxes are B_km, m = 1, ..., M_k, where M_k is the number of instances of the k-th object in the current image.
2) Object Detection: The training and validation data for the object detection task remain unaltered from ILSVRC 2014 [18]. The test data is partially refreshed with new images based on the previous year's competition (ILSVRC 2016 [19]). There are 200 basic-level categories for this task, which are fully annotated in the test data, i.e., bounding boxes for all categories in each image have been labeled. The categories were carefully chosen considering different factors such as object scale, level of image clutter, the average number of object instances, and several others. A portion of the test images will contain none of the 200 categories.

For each image, algorithms produce a set of annotations (c_i, s_i, b_i) of class labels c_i, confidence scores s_i, and bounding boxes b_i. This set is expected to contain every instance of each of the 200 object categories. Objects that are not annotated are penalized, as are duplicate detections (two annotations for the same object instance).

3) Equations: For object localization, following ILSVRC 2017, let d(c_i, C_k) = 0 if c_i = C_k and 1 otherwise, and let f(b_i, B_km) = 0 if b_i and B_km have over 50% overlap and 1 otherwise. The error of the algorithm on an individual image is computed as:
e = \frac{1}{n} \sum_{k} \min_{i} \min_{m} \max\left\{ d(c_i, C_k),\; f(b_i, B_{km}) \right\} \qquad (1)
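A direct transcription of (1) is sketched below. Interpreting "over 50% overlap" as intersection-over-union greater than 0.5, and the (x1, y1, x2, y2) box format, are assumptions made for illustration.

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def d(c_i, C_k):
    # 0 if the predicted label matches the ground-truth label, else 1.
    return 0 if c_i == C_k else 1

def f(b_i, B_km):
    # 0 if the boxes overlap by more than 50% (IoU > 0.5 assumed), else 1.
    return 0 if iou(b_i, B_km) > 0.5 else 1

def localization_error(preds, truths):
    # preds:  list of 5 (c_i, b_i) pairs, in decreasing order of confidence.
    # truths: list of (C_k, [B_k1, ..., B_kMk]) pairs.
    n = len(truths)
    total = 0.0
    for C_k, boxes_k in truths:
        total += min(min(max(d(c_i, C_k), f(b_i, B_km)) for B_km in boxes_k)
                     for c_i, b_i in preds)
    return total / n

# A correct label with a well-overlapping box yields zero error.
preds = [('cat', (10, 10, 50, 50))] + [('other', (0, 0, 1, 1))] * 4
truths = [('cat', [(12, 12, 48, 52)])]
print(localization_error(preds, truths))  # 0.0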
Fig. 2. Proposed System Result

Fig. 3. Google Search Result

On the other hand, the Google Image Search Engine predicts the result only as a picture of a bird (Fig. 3).

The comparison of the accuracy rates of the proposed method and the related methods is provided in Table II.
TABLE II
COMPARISON WITH PREVIOUS WORKS

                 Comparison Percentage
Accuracy     Over 1,000 Classes   Over 21,841 Classes
Top-1 [4]    68.3%                41.9%
Top-5 [6]    89.0%                69.6%
Top-20 [5]   96.0%                83.6%
Proposed     99.98%               89.28%
V. CONCLUSION

Therefore, it can be seen that the proposed method attains better accuracy than previous works: the final accuracy reaches 99.98% at peak for a real-world image, and a minimum of 89.28% accuracy is obtained for a real-world image. Thus, the system can be implemented in any kind of image search engine to enhance the accuracy of image searching.

REFERENCES

[1] Computer Vision, British Machine Vision Conference, The British Machine Vision Association, 1990. Available: http://www.bmva.org/visionoverview
[2] TensorFlow, Nov. 2015. Available: https://www.tensorflow.org
[3] ImageNet, 2017. Available: http://www.image-net.org/about-overview
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097-1105. [Online]. Available: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
[5] A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2014.
[6] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, "Learning deep features for scene recognition using places database," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 487-495. [Online]. Available: http://papers.nips.cc/paper/5349-learning-deep-features-for-scene-recognition-using-places-database.pdf
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in The IEEE International Conference on Computer Vision (ICCV), December 2015.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[9] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2014.
[10] A. Nguyen, J. Yosinski, and J. Clune, "Deep neural networks are easily fooled: High confidence predictions for unrecognizable images," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[11] X. Chen and C. Lawrence Zitnick, "Mind's eye: A recurrent visual representation for image caption generation," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[12] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[13] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[15] Inception-v3, 2015. Available: https://arxiv.org/abs/1512.00567
[16] Laravel, 2017. Available: https://laravel.com/docs/5.4
[17] ILSVRC 2017. Available: http://image-net.org/challenges/LSVRC/2017/index
[18] ILSVRC 2014. Available: http://image-net.org/challenges/LSVRC/2014/index
[19] ILSVRC 2016. Available: http://image-net.org/challenges/LSVRC/2016/index