Computer Science > Computer Vision and Pattern Recognition
[Submitted on 13 May 2021 (v1), last revised 26 May 2021 (this version, v2)]
Title:SizeNet: Object Recognition via Object Real Size-based Convolutional Networks
View PDFAbstract:Inspired by the conclusion that humans choose the visual cortex regions corresponding to the real size of an object to analyze its features when identifying objects in the real world, this paper presents a framework, SizeNet, which is based on both the real sizes and features of objects to solve object recognition problems. SizeNet was used for object recognition experiments on the homemade Rsize dataset, and was compared with the state-of-the-art methods AlexNet, VGG-16, Inception V3, Resnet-18, and DenseNet-121. The results showed that SizeNet provides much higher accuracy rates for object recognition than the other algorithms. SizeNet can solve the two problems of correctly recognizing objects with highly similar features but real sizes that are obviously different from each other, and correctly distinguishing a target object from interference objects whose real sizes are obviously different from the target object. This is because SizeNet recognizes objects based not only on their features, but also on their real size. The real size of an object can help exclude the interference object's categories whose real size ranges do not match the real size of the object, which greatly reduces the object's categories' number in the label set used for the downstream object recognition based on object features. SizeNet is of great significance for studying the interpretable computer vision. Our code and dataset will thus be made public.
Submission history
From: Xiaofei Li [view email][v1] Thu, 13 May 2021 11:03:24 UTC (30,246 KB)
[v2] Wed, 26 May 2021 02:32:13 UTC (30,572 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.