Lightweight Deep Convolutional Network for Tiny Object Recognition
Keywords: Object Recognition, Lightweight Deep Convolutional Neural Network, Tiny Images, Global Average Pooling.
Abstract: Object recognition is an important problem in Computer Vision with many applications such as image search, autonomous cars, and image understanding. In recent years, Convolutional Neural Network (CNN) based models have achieved great success on object recognition, especially VGG, ResNet, and Wide ResNet. However, these models involve a large number of parameters that must be trained with large-scale datasets on powerful computing systems. Training such a heavy CNN on a small-scale dataset with only thousands of samples is therefore inappropriate, as it easily over-fits. Furthermore, it is not efficient to use an existing heavy CNN to recognize small images, such as those in CIFAR-10 or CIFAR-100. In this paper, we propose a Lightweight Deep Convolutional Neural Network architecture for tiny images, codenamed "DCTI", which significantly reduces the number of parameters for such datasets. Additionally, we use batch normalization to deal with the change in the distribution of each layer's inputs. To demonstrate the efficiency of the proposed method, we conduct experiments on two popular datasets: CIFAR-10 and CIFAR-100. The results show that the proposed network not only significantly reduces the number of parameters but also improves performance. Our method uses only 21.33% as many parameters as Wide ResNet, yet achieves up to 94.34% accuracy on CIFAR-10, compared to 96.11% for Wide ResNet. Our method also achieves an accuracy of 73.65% on CIFAR-100.
Truong, T-D., Nguyen, V-T. and Tran, M-T.
Lightweight Deep Convolutional Network for Tiny Object Recognition.
DOI: 10.5220/0006752006750682
In Proceedings of the 7th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2018), pages 675-682
ISBN: 978-989-758-276-9
Copyright © 2018 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
INDEED 2018 - Special Session on INsights DiscovEry from LifElog Data
…the computational cost of their method is very large, so it cannot be employed in real time. Although high-performance GPUs can deal with this problem, GPUs are expensive and not suitable for small devices. Furthermore, resizing a tiny image into a large one does not add any information to the image. Additionally, training a large network takes a lot of time and requires very powerful hardware.

A good solution to this problem is to keep the size of the image and build a network with fewer parameters that still recognizes objects with high accuracy. On this basis, we propose a new method employing a very deep CNN, called Lightweight Deep Convolutional Network for Tiny Object Recognition (DCTI). Our proposed network has not only fewer parameters but also high performance on tiny images, achieving both good accuracy and minimal computational cost. Through experiments, we achieved good results that are effective for multiple purposes. This motivates us to continue developing our method and to build systems that make use of object recognition, such as image understanding systems and image search engines.

Contributions. In our work, we consider tiny images of size 32 × 32. We focus on exploiting local features with small convolutional filters, so we use 3 × 3 convolutional filters. This size fits tiny images and helps extract local features. Besides that, it reduces parameters and allows the network to go deeper.

In traditional approaches, the last layers are fully connected layers that map feature maps to feature vectors. However, this adds parameters and leads to over-fitting. Our network instead uses global average pooling (Lin et al., 2013) in place of fully connected layers, so that the network directly projects significant feature maps into the feature vectors. Additionally, global average pooling layers have no parameters, so the network has fewer parameters and over-fitting is avoided.

In deep networks, small changes can be amplified layer by layer, changing the distribution of each layer's inputs. This problem is called Internal Covariate Shift. To tackle it, we use Batch Normalization, proposed by Ioffe and Szegedy (Ioffe and Szegedy, 2015). Through experiments, we show that batch normalization is effective and also speeds up learning.

Additionally, to prevent over-fitting, we use dropout. Commonly, dropout is placed after fully connected layers, but in our network we place it after convolutional layers. Our experiments show that this helps improve accuracy and avoid over-fitting. We also use data augmentation and data whitening to improve accuracy. Our method uses only 21.33% as many parameters as the state-of-the-art method (Zagoruyko and Komodakis, 2016), yet achieves accuracies of up to 94.34% and 73.65% on CIFAR-10 and CIFAR-100, respectively. These results show that our method not only attains high accuracy but also reduces parameters significantly.

The rest of this paper is organized as follows. Section 2 presents related works. The proposed architecture of our network is presented in Section 3. Section 4 presents our experimental configuration on CIFAR-10 and CIFAR-100. We compare our results to other methods in Section 5. Finally, Section 6 concludes the paper.

2 RELATED WORKS

An early method for object recognition, named Convolutional Neural Networks, was proposed by Yann LeCun et al. (LeCun et al., 1989) and demonstrated high performance on the MNIST dataset. Many current architectures used for object recognition are based on Convolutional Neural Networks (Graham, 2014), (Krizhevsky et al., 2017a), (Zeiler and Fergus, 2013).

Very Deep Convolutional Neural Networks: a method proposed by Simonyan and Zisserman (Simonyan and Zisserman, 2014) with good performance on the ImageNet dataset. Very deep convolutional neural networks have two main architectures, VGG-16 and VGG-19, which have 16 and 19 parameterized layers, respectively. The main contribution of that paper is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers.

Network In Network: noticing the limitations of the fully connected layer, a novel network structure called Network In Network (NIN) was proposed to enhance the model's discriminability for local receptive fields (Lin et al., 2013). Global average pooling is used in this network instead of fully connected layers, in order to reduce parameters and enforce correspondences between feature maps and categories. It was further improved by the Batch-Normalized Maxout Network In Network, which has good performance on the CIFAR-10 dataset (Chang and Chen, 2015). In our work, we also use the global average pooling approach.

Deep Residual Learning for Image Recognition: one of the limitations when the network has…
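The two head components described in the contributions, batch normalization against internal covariate shift and a parameter-free global average pooling layer, can be sketched in a few lines of numpy. This is only an illustrative sketch, not the authors' implementation; the batch size, channel count (512), spatial size (8 × 8), and class count (10) are assumed for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel over the batch and spatial axes, then
    rescale with learnable gamma/beta (training-mode statistics)."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def global_average_pooling(x):
    """Collapse each HxW feature map to one value. No weights at all,
    unlike a fully connected layer."""
    return x.mean(axis=(2, 3))

# Toy forward pass: a batch of 4 feature maps, 512 channels of 8x8
# (sizes are illustrative, not the paper's exact configuration).
x = np.random.randn(4, 512, 8, 8)
gamma = np.ones((1, 512, 1, 1))
beta = np.zeros((1, 512, 1, 1))

features = global_average_pooling(batch_norm(x, gamma, beta))  # shape (4, 512)

# Weight-count comparison for mapping 512x8x8 features to 10 classes:
fc_params = 512 * 8 * 8 * 10 + 10   # flatten + fully connected: 327,690
gap_params = 512 * 10 + 10          # GAP + 10-way linear layer: 5,130
```

The comparison at the end is the core argument for global average pooling: the flattened fully connected mapping needs hundreds of thousands of weights, while the pooling itself needs none.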
Figure 2: Objective (top) and Accuracy (bottom) training plot (CIFAR-10).
Figure 3: Objective (top) and Accuracy (bottom) training plot (CIFAR-100).
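As a concrete illustration of the paper's recurring point about very small filters: two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution while using fewer weights. The channel count below is hypothetical, chosen only to make the arithmetic concrete.

```python
# Two stacked 3x3 convolutions see a 5x5 receptive field but need
# fewer weights than one 5x5 convolution (bias terms omitted).
channels = 64  # hypothetical in/out channel count, constant through the stack

one_5x5 = 5 * 5 * channels * channels        # 102,400 weights
two_3x3 = 2 * 3 * 3 * channels * channels    # 73,728 weights

savings = 1 - two_3x3 / one_5x5
print(f"{savings:.0%} fewer parameters")     # prints "28% fewer parameters"
```

The stacked version also inserts an extra non-linearity between the two layers, which is part of why depth with small filters tends to help.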
…convolutional filter sizes to deal with local features and to push the network deeper, so that it can learn high-level features. Furthermore, using global average pooling helps reduce parameters significantly and is more native to the convolution structure, by enforcing correspondences between feature maps and feature vectors. The new approach of putting dropout after convolutional layers helps improve accuracy, as our results demonstrate.

6 CONCLUSION

In our research, we proposed a new method for object recognition with tiny images. By using very small convolutional filters, we pushed our network deeper and exploited local features; this also helped our network learn high-level features. By using global average pooling instead of fully connected layers, we reduced parameters significantly; moreover, it helped the network directly project significant feature maps into the feature vectors. Besides that, using batch normalization and dropout helped accelerate the learning process and prevent over-fitting. Furthermore, it also improved performance and reduced computational cost. Although we cannot reach the state of the art, the results we achieved show that our method is promising. They also demonstrate that representation depth is beneficial for recognition. Additionally, we showed that very deep models can fit small datasets, as long as the input image is big enough that it does not vanish as the model goes deeper. Our results once again confirm the performance of very deep CNNs for pattern recognition tasks. In the future, we will continue improving our method to achieve higher performance with fewer parameters.

REFERENCES

Chang, J. and Chen, Y. (2015). Batch-normalized maxout network in network. CoRR, abs/1511.02583.
Ciresan, D. C., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In CVPR, pages 3642–3649. IEEE Computer Society.
Graham, B. (2014). Fractional max-pooling. CoRR, abs/1412.6071.
Grus, J. (2015). Data Science from Scratch, pp. 99–100. O'Reilly. ISBN 978-1-491-90142-7.
He, K., Zhang, X., Ren, S., and Sun, J. (2016a). Deep resid-
ual learning for image recognition. In CVPR, pages
770–778. IEEE Computer Society.
He, K., Zhang, X., Ren, S., and Sun, J. (2016b). Identity
mappings in deep residual networks. In ECCV (4),
volume 9908 of Lecture Notes in Computer Science,
pages 630–645. Springer.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Ac-
celerating deep network training by reducing internal
covariate shift. In ICML, volume 37 of JMLR Work-
shop and Conference Proceedings, pages 448–456.
JMLR.org.
Kessy, A., Lewin, A., and Strimmer, K. (2015). Optimal
whitening and decorrelation. arXiv.
Krizhevsky, A. (2009). Learning multiple layers of features from tiny images (Appendix A). Technical report.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017a).
Imagenet classification with deep convolutional neu-
ral networks. Commun. ACM, 60(6):84–90.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D.,
Howard, R. E., Hubbard, W. E., and Jackel, L. D.
(1989). Backpropagation applied to handwritten zip
code recognition. Neural Computation, 1(4):541–551.
Liang, M. and Hu, X. (2015). Recurrent convolutional neu-
ral network for object recognition. In CVPR, pages
3367–3375. IEEE Computer Society.
Lin, M., Chen, Q., and Yan, S. (2013). Network in network.
CoRR, abs/1312.4400.
Liu, S. and Deng, W. (2015). Very deep convolutional
neural network based image classification using small
training sample size. In ACPR, pages 730–734. IEEE.
Mohamad, I. B. and Usman, D. (2013). Standardization and its effects on k-means clustering algorithm.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
CoRR, abs/1409.1556.
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I.,
and Salakhutdinov, R. (2014). Dropout: a simple way
to prevent neural networks from overfitting. Journal
of Machine Learning Research, 15(1):1929–1958.
Stollenga, M. F., Masci, J., Gomez, F. J., and Schmidhuber,
J. (2014). Deep networks with internal selective at-
tention through feedback connections. In NIPS, pages
3545–3553.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going deeper with convolutions.
In Computer Vision and Pattern Recognition (CVPR).
Zagoruyko, S. and Komodakis, N. (2016). Wide residual
networks. In BMVC.
Zeiler, M. D. and Fergus, R. (2013). Stochastic pooling for
regularization of deep convolutional neural networks.
CoRR, abs/1301.3557.