Image Recognition Based on Deep Learning
Abstract—Deep learning is a multilayer neural network learning algorithm that has emerged in recent years. It has brought a new wave to machine learning and has made artificial intelligence and human-computer interaction advance with big strides. We applied deep learning to handwritten character recognition and explored two mainstream deep learning algorithms: the Convolutional Neural Network (CNN) and the Deep Belief Network (DBN). We evaluate the performance of CNN and DBN on the MNIST database and on a real-world handwritten character database. The classification accuracy rates of CNN and DBN on the MNIST database are 99.28% and 98.12% respectively, and on the real-world handwritten character database 92.91% and 91.66% respectively. The experiment results show that deep learning has an excellent feature learning ability: it does not need manual feature extraction and learns more natural features of the data.

Keywords—deep learning; artificial intelligence; Convolutional Neural Network; Deep Belief Network; handwritten character recognition

I. INTRODUCTION

Nowadays, more and more people use images to represent and transmit information, and it is convenient for us to obtain a great deal of information from images. Image recognition is thus an important research area with wide applications. For an image recognition problem such as handwritten character classification, we must know how to use the data to represent images. The proper representation is not the raw pixels but image features with a high-level representation, and the quality of feature extraction is vital to the result. For handwritten character recognition, Huang et al. [1] extracted the structural features of characters from the strokes and then used them to recognize the handwritten characters. Rui et al. [2] adopted a morphology method to enhance the local features of the characters and then used PCA to extract features. These methods all need features extracted manually from the images, so the model's prediction ability depends strongly on the modeler's prior knowledge. In the field of computer vision, manual feature extraction is cumbersome and often impractical because of the high dimension of the feature vector [3].

In recent years, with the great improvement of data collection ability and technology, the amount of data we can obtain has increased rapidly, and a revolution of big data has come. High-performance processing of large-scale data has become the core technology in the era of big data. Most current classification and regression machine learning methods are shallow learning algorithms; it is difficult for them to represent complex functions effectively, and their generalization ability is limited for complex classification problems [4], [5].

In order to overcome the problems of shallow representation and manual feature extraction, Hinton et al. put forward deep learning in 2006 [6], giving rise to a new wave of artificial neural network research. Deep learning has become a hotspot of Internet big data and artificial intelligence. The nature of deep learning is self-learning: building a multilayer model and training it with vast amounts of data, which improves the accuracy of classification or prediction [7]. Deep learning methods are representation-learning methods with multiple levels of representation, obtained by composing simple but non-linear modules that each transform the representation at one level into a representation at a higher, slightly more abstract level. With the composition of enough such transformations, very complex functions can be learned [8]. This paper studies two deep learning methods, the Convolutional Neural Network (CNN) and the Deep Belief Network (DBN), for handwritten character recognition. We compare and analyze the differences between the two methods. The experiment results show that deep learning has a strong ability to learn features and great potential to solve complex classification problems.

II. CONVOLUTIONAL NEURAL NETWORK ALGORITHM

A. CNN Model

A simple CNN model can be seen in figure 1. The first layer is the input layer; the size of the input image is 28×28. The second layer is the convolution layer C2, which obtains four different feature maps by convolution with the input image. The third layer is the pooling layer P3, which computes the local average or maximum of the input feature maps. The following convolution and pooling layers operate in the same way, except for the number and size of the convolution kernels. The output layer produces the classification result.
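As a concrete illustration of this layered structure, the following is a minimal sketch in Python using PyTorch. This is an assumption for illustration only: the paper does not state its implementation, and the kernel sizes, activation, pooling windows, and the name SimpleCNN are illustrative choices. Only the 28×28 input, the alternating convolution/pooling layout, and the 4- and 12-map stages (matching the 784-4-12-c structure in table 1) come from the text.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Sketch of the CNN in figure 1: conv -> pool -> conv -> pool -> output."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=5),    # C2: 4 feature maps, 28x28 -> 24x24
            nn.Sigmoid(),
            nn.AvgPool2d(2),                   # P3: local average, 24x24 -> 12x12
            nn.Conv2d(4, 12, kernel_size=5),   # next convolution stage, 12x12 -> 8x8
            nn.Sigmoid(),
            nn.AvgPool2d(2),                   # 8x8 -> 4x4
        )
        self.classifier = nn.Linear(12 * 4 * 4, num_classes)  # output layer

    def forward(self, x):
        x = self.features(x)                   # x: (batch, 1, 28, 28)
        return self.classifier(x.flatten(1))

model = SimpleCNN()
out = model(torch.zeros(1, 1, 28, 28))         # -> shape (1, 10)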
Each output feature map of a convolution layer is computed as

x_j^l = f\left( \sum_{i=1}^{n_{l-1}} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right)

where f is the activation function, n_{l-1} is the total number of input feature maps, k_{ij}^{l} is the convolution kernel coefficient, b_j^{l} is the bias, and * is the convolution operation.

The gradients of the cost function E with respect to the convolution kernel coefficients and the biases are computed from an error term back-propagated through the network. For the bias,

\frac{\partial E}{\partial b_j^{l}} = \sum \delta_j^{l}    (3)

where \delta_j^{l} = \left( \sum_i k_{ij}^{l+1} \delta_i^{l+1} \right) \circ f'(u_j^{l}) is the error term and u_j^{l} is the input of the activation function; the gradient with respect to a kernel is obtained analogously by convolving the corresponding input feature map with the error term. We can then update the network weights by gradient descent:

\theta^{(t+1)} = \theta^{(t)} - \eta \frac{\partial E}{\partial \theta^{(t)}}

where \eta is the learning rate.
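To make the feature-map equation concrete, here is a small sketch in Python with NumPy; the function names and the use of a valid-mode sliding window are illustrative assumptions, not taken from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_valid(x, k):
    """Valid-mode 2-D convolution of map x with kernel k
    (kernel not flipped, as is conventional in CNN implementations)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k)
    return out

def feature_map(prev_maps, kernels, bias, f=sigmoid):
    """x_j^l = f(sum_i x_i^{l-1} * k_ij^l + b_j^l) for one output map j."""
    z = sum(conv2d_valid(x_i, k_i) for x_i, k_i in zip(prev_maps, kernels))
    return f(z + bias)

# One 28x28 input map and one 5x5 kernel give a 24x24 feature map.
x = np.random.rand(28, 28)
y = feature_map([x], [np.random.rand(5, 5)], bias=0.0)   # y.shape == (24, 24)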
III. DEEP BELIEF NETWORK ALGORITHM

A DBN is built by stacking Restricted Boltzmann Machines (RBMs). An RBM consists of a layer of visible units v and a layer of hidden units h, and the energy of a joint configuration (v, h) is:

E(v, h \,|\, \theta) = - \sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i W_{ij} h_j    (7)

where \theta = \{W_{ij}, a_i, b_j\} is the set of parameters of the RBM, and W_{ij} is the connection weight between visible unit v_i and hidden unit h_j. The joint probability distribution of (v, h) is:

P(v, h \,|\, \theta) = \frac{e^{-E(v, h | \theta)}}{Z(\theta)}    (8)

where Z(\theta) = \sum_{v,h} e^{-E(v, h | \theta)} is a normalization factor. The aim of training the RBM is to obtain the parameter \theta, which can be learned from the log-likelihood function on the training set.
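A small numerical sketch of equations (7) and (8) in Python with NumPy follows; the model size and values are illustrative assumptions, and Z(\theta) is computed by brute-force enumeration, which is feasible only for a toy RBM like this one.

import numpy as np
from itertools import product

def energy(v, h, W, a, b):
    """E(v, h | theta) from equation (7)."""
    return -(a @ v) - (b @ h) - v @ W @ h

# A toy RBM: n = 3 visible units, m = 2 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 2))
a = np.zeros(3)   # visible biases
b = np.zeros(2)   # hidden biases

# Z(theta): sum of e^{-E} over all 2^n * 2^m binary configurations.
Z = sum(np.exp(-energy(np.array(v), np.array(h), W, a, b))
        for v in product([0, 1], repeat=3)
        for h in product([0, 1], repeat=2))

v = np.array([1, 0, 1])
h = np.array([1, 0])
p = np.exp(-energy(v, h, W, a, b)) / Z   # P(v, h | theta), equation (8)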
Sampling the hidden units with the known training samples v gives expectations under P(h | v, \theta); \langle \cdot \rangle_{data} and \langle \cdot \rangle_{model} are the shorthand of expectations under P(h | v, \theta) and P(v, h | \theta) respectively. The gradients of the log-likelihood function are:

\frac{\partial \log P(v | \theta)}{\partial W_{ij}} = \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model}    (11)

\frac{\partial \log P(v | \theta)}{\partial a_i} = \langle v_i \rangle_{data} - \langle v_i \rangle_{model}    (12)

\frac{\partial \log P(v | \theta)}{\partial b_j} = \langle h_j \rangle_{data} - \langle h_j \rangle_{model}    (13)

Due to the normalization factor, P(v, h | \theta) is difficult to obtain, so we adopt Contrastive Divergence (CD) [15] to get an approximate value.
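As an illustration, a single CD-1 update under the usual binary-unit assumptions might look like the following sketch in Python with NumPy; the function name, learning rate, and the use of probabilities rather than samples in the outer products are illustrative choices, not the authors' stated implementation. It replaces the \langle \cdot \rangle_{model} terms of (11)-(13) with statistics from one Gibbs step:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM; v0 is one training vector."""
    ph0 = sigmoid(v0 @ W + b)                         # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden state
    pv1 = sigmoid(h0 @ W.T + a)                       # P(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)  # one-step reconstruction
    ph1 = sigmoid(v1 @ W + b)
    # CD-1 approximations of the gradients (11)-(13)
    dW = np.outer(v0, ph0) - np.outer(v1, ph1)
    da = v0 - v1
    db = ph0 - ph1
    return W + lr * dW, a + lr * da, b + lr * db

In a DBN, RBMs trained this way are stacked, with each RBM's hidden activations serving as the visible data of the next, and trained greedily layer by layer [14].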
IV. EXPERIMENTS

A. Experimental Data

In this section, we choose the MNIST handwritten digits database [16] and a real-world handwritten character database to compare the performance of the two deep learning methods. MNIST contains 60000 training samples and 10000 testing samples; the size of each image is 28×28. The real-world handwritten character database has 18760 training samples and 3240 testing samples. It includes the 10 digits, 26 uppercase English letters, 26 lowercase English letters and 5 Chinese characters, a total of 67 different characters, written by 500 people in different handwriting. The image size is also 28×28. Some sample images from the real-world handwritten character database can be seen in figure 3.

Fig. 3. Sample images in the real-world handwritten character database

B. The Experiment Results of CNN

Table 1 shows that CNN needs fewer epochs to converge on the MNIST database than on the real-world handwritten character database, because the former database is larger than the latter. This embodies the nature of deep learning: the deep model needs huge amounts of data. A huge advantage of deep learning is that it is easy to achieve a higher accuracy rate by increasing the depth of the model, if we have huge amounts of data [7].

TABLE I. THE ACCURACY RATE OF CNN ON THE MNIST DATABASE AND THE REAL-WORLD HANDWRITTEN CHARACTER DATABASE

Structure      Learning rate   MNIST database              Real-world database
                               Accuracy rate (%)  Epochs   Accuracy rate (%)  Epochs
784-8-24-c     1               99.25              21       92.91              33
784-4-12-c     1               99.05              25       91.98              32
784-16-48-c    1               99.28              17       88.72              -

Note: "-" indicates failure to converge.

When the number of kernels increased from 4, 12 to 8, 24, the accuracy rate on both databases increased. However, when it increased to 16, 48, the accuracy rate on MNIST kept increasing while the accuracy rate on the real-world handwritten character database decreased. This shows that if the volume of training samples fully meets the requirements of the learning method, more features are extracted by CNN and the recognition performance of CNN gets better as the number of kernels increases. On the other hand, if the volume of training samples is relatively small, too many kernels will cause overfitting and lead to worse performance of CNN.
The classification error rate curves for different learning rates can be seen in figure 4. The classification error rates of the training set and the testing set are relatively stable in figure 4 (a), and the CNN converged after 33 epochs. It shows that CNN has better performance when the learning rate is 1. In figure 4 (b), the classification error rate is high before the 17th epoch and declines significantly afterwards; the network does not reach convergence until the 48th epoch. This is because a small learning rate brings a long, slow learning process to the network: the weight updates are small, so the weights change little for a long time after training starts, and the network needs more time, or more epochs, to converge. In figure 4 (c), the classification error rate stays at a high value throughout the training process. This is because the learning rate is too large, which makes the network saturate quickly, so the recognition performance is bad.
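Such a comparison can be run as a simple sweep over learning rates. The sketch below, in Python with PyTorch, reuses the SimpleCNN sketch from section II; the optimizer, loss, the specific "too small" and "too large" rates, and the placeholder data are all illustrative assumptions (the paper only reports that a learning rate of 1 worked best for its network):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the 28x28 training images.
dataset = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32)

def train(model, loader, lr, epochs):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

for lr in (1.0, 0.01, 10.0):   # adequate / too small / too large (assumed values)
    train(SimpleCNN(), loader, lr=lr, epochs=50)   # SimpleCNN from the earlier sketch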
C. The Experiment Results of DBN

To observe how the hidden layer structure affects the performance of DBN, we adopted four different DBNs: 784-100-100-c, 784-100-100-500-c, 784-500-500-c and 784-500-500-1000-c, where 784 is the dimension of the input image, c is the number of classes, and the middle numbers are the numbers of neurons in the hidden layers.

Table 2 compares the accuracy rates of the four DBNs on the MNIST database and the real-world handwritten character database. We can see from table 2 that 784-500-500-1000-c achieves the highest accuracy rate on the MNIST database, but on the real-world handwritten character database it is lower than 784-100-100-500-c. This shows that the hidden layer structure is very important to DBN. If the number of neurons in the hidden layers is too small, the network's ability to extract features from the training data is limited, which decreases the performance of DBN. On the other hand, an excessive number of neurons in the hidden layers brings overfitting and bad performance of DBN. There is a trick for determining the number of hidden layer neurons [17]: first, estimate the number of bits needed to describe a data vector; then multiply it by the number of training samples; finally, divide by 10 to get the best number of hidden layer neurons.

TABLE II. THE ACCURACY RATE OF DBNS ON THE MNIST DATABASE AND THE REAL-WORLD HANDWRITTEN CHARACTER DATABASE (%)

Structure             MNIST    Real-world handwritten character database
784-100-100-c         97.72    90.72
784-500-500-c         97.76    89.09
784-100-100-500-c     97.91    91.66
784-500-500-1000-c    98.12    90.41
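A DBN with one of the structures in table 2 can be pretrained greedily, one RBM at a time, each layer's hidden activations serving as training data for the next [14]. The sketch below, in Python with NumPy, reuses the cd1_update function from the earlier sketch; the epoch count, per-sample updates, and placeholder data are illustrative assumptions rather than the authors' training protocol:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# RBM stack of the 784-500-500-1000-c structure (the classifier on top is omitted).
sizes = [784, 500, 500, 1000]
rng = np.random.default_rng(0)

# Placeholder binary training data standing in for the 28x28 images.
x = (rng.random((256, 784)) < 0.5).astype(float)

weights = []
for n_vis, n_hid in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(scale=0.01, size=(n_vis, n_hid))
    a, b = np.zeros(n_vis), np.zeros(n_hid)
    for epoch in range(5):            # illustrative epoch count
        for v0 in x:                  # per-sample CD-1 updates; batching omitted
            W, a, b = cd1_update(v0, W, a, b)   # from the earlier CD-1 sketch
    weights.append(W)
    x = sigmoid(x @ W + b)            # hidden activations feed the next RBM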
D. Comparative Analysis of the Two Deep Learning Algorithms

The experiment results in sections IV.B and IV.C show that CNN and DBN both achieve high accuracy rates on the MNIST database and the real-world handwritten character database. This indicates that deep learning not only can recognize simple handwritten digit images, but also performs well on character and object recognition in complex images. In addition, deep learning learns the inherent features of the data actively, instead of relying on manually extracted features, which is a huge advantage and potential of deep learning. However, the success of deep learning in practical applications depends on labeled data; supervised learning is still the leading direction [18].

Comparing the experiment results in table 2 and table 1, we can see the primary differences between DBN and CNN:

1) DBN is an unsupervised learning method and a generative deep model, while CNN is a supervised learning method and a discriminative deep model.

2) DBN is usually suitable for modeling one-dimensional data, such as speech, while CNN is more suitable for modeling two-dimensional data, such as images.

3) CNN is essentially a mapping from input to output. It can learn a large number of mapping relations without any precise mathematical expression [19], while DBN needs to build the joint probability distribution of the visible and hidden units, as well as their respective marginal probability distributions.

All in all, CNN and DBN have different advantages, and we can choose the suitable method according to the practical situation.

V. CONCLUSIONS

In this paper, we applied deep learning to real-world handwritten character recognition and obtained good image recognition performance. We analyzed the differences between CNN and DBN by comparing the experiment results. Deep learning can approximate complex functions through a deep nonlinear network model. It not only avoids the large workload of manual feature extraction, but also describes the potential information of the data better.

In future work, we will further study the optimization of deep learning and apply it to more complex image recognition problems.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (No. 61375017) and the Outstanding Middle-young Scientific and Technological Innovation Team Plan of Colleges and Universities in Hubei Province (T201202).

REFERENCES

[1] H. M. Huang, X. J. Wang, Z. J. Yi, X. X. Ma, "A character recognition based on feature extraction," Journal of Chongqing University (Natural Science Edition), vol. 23, pp. 66-69, Jan. 2000.
[2] T. Rui, C. L. Shen, J. Ding, J. L. Zhang, "Handwritten character recognition using principal component analysis," Mini-Micro Systems, vol. 26, pp. 289-292, Feb. 2005.
[3] R. Walid, A. Lasfar, "Handwritten digit recognition using sparse deep architectures," International Conference on Intelligent Systems: Theories & Applications, IEEE, 2014, pp. 1-6.
[4] Y. Bengio, "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, pp. 1-127, 2009.
[5] Z. J. Sun, L. Xue, Y. M. Xu, Z. Wang, "Overview of deep learning," Application Research of Computers, vol. 29, pp. 2806-2810, Aug. 2012.
[6] G. E. Hinton, R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.
[7] K. Yu, L. Jia, Y. Q. Chen, W. Xu, "Deep learning: yesterday, today, and tomorrow," Journal of Computer Research and Development, vol. 50, pp. 1799-1804, 2013.
[8] Y. LeCun, Y. Bengio, G. E. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[9] G. E. Hinton, "A practical guide to training restricted Boltzmann machines," Lecture Notes in Computer Science, pp. 599-619, 2012.
[10] B. David, "Character recognition using convolutional neural networks," Seminar Statistical Learning Theory, University of Ulm, Institute for Neural Information Processing, pp. 2-5, 2006.
[11] G. E. Hinton, S. Osindero, Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
[12] C. X. Zhang, N. N. Ji, G. W. Wang, "Introduction of restricted Boltzmann machines," Sciencepaper Online, Beijing, http://www.paper.edu.cn/releasepaper/content/201301-528.
[13] H. F. Li, C. G. Li, "Note on deep learning and deep learning algorithms," Journal of Hebei University (Natural Science Edition), vol. 32, pp. 538-544, 2012.
[14] Y. Bengio, P. Lamblin, D. Popovici, "Greedy layer-wise training of deep networks," Advances in Neural Information Processing Systems, vol. 19, pp. 153-160, 2007.
[15] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, pp. 1771-1800, 2002.
[16] Y. LeCun, C. Cortes, The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/.
[17] L. Wang, B. C. Zhang, "Review on deep learning," Highlights of Sciencepaper Online, vol. 8, pp. 510-517, 2015.
[18] A. Ng, "Deep learning: overview and trends," Beijing: Institute of Automation, 2014.
[19] J. W. Liu, Y. Liu, X. L. Luo, "Research and development on deep learning," Application Research of Computers, vol. 31, pp. 1921-1930, 2014.