Construction of CNN Model Based On Hard-Assigned Coding of Image Features
Abstract—This paper proposes a CNN model based on hard-assigned coding of image features. The HC hard-assigned image feature encoding transformation is improved by clearly describing the relationship between the input image sample features that participate in training the learning dictionary and the atomic description features in that dictionary: two sets of parameters are designed to calculate the minimum distance between the input image sample features and the atomic description features. The improved HC hard-assigned coding transformation algorithm is then introduced into the last convolutional layer and pooling layer of the CNN model, and a complete, improved CNN deep neural network is designed. The advantages of the model constructed in this paper are verified by experimental comparison with a variety of image feature encoding transformation algorithms.

Keywords—Image Feature Vector, Hard-assignment Coding (HC), Supervised Learning Dictionary, Convolutional Neural Networks (CNN)

I. INTRODUCTION

In the process of image recognition, there are two main approaches to image feature extraction. The first is manual feature extraction, whose most typical algorithms are the SIFT scale-invariant feature extraction algorithm [1] and the HOG gradient histogram feature extraction algorithm [2]. These algorithms depend on an accurate description of image features by the computer and on accumulated experience in image feature recognition and classification. In terms of recognition effect, their processing and recognition performance is low for some special images. The second is learning-based feature extraction, whose most typical representative is deep learning [3, 4]. Deep learning first lists some known image feature information, builds a learning dictionary (or learning model), and puts the known image feature information into the learning dictionary. Through learning and training in the learning dictionary, the input image finds its own matching features among the known image features. At present, deep learning technology has been widely used, and the image recognition effect is good.

Image feature processing mainly uses linear or nonlinear mathematical transformations to describe the acquired image features accurately, so as to facilitate their subsequent classification and recognition. Image feature processing is realized by encoding image features, that is, by expressing image features in the form of encoding coefficients. The encoding coefficients can be obtained in two ways: manual feature encoding [5, 6] and deep learning feature encoding [7, 8].

Image feature compression, also known as image feature pooling, is the spatial dimension compression (or dimension reduction) of the extracted image features or of the image features after replacement processing. The results of feature compression are used in the subsequent classification of image features. The main methods of image feature dimensionality reduction are average pooling, maximum pooling and random allocation pooling.

The image feature classifier first defines the label of the dimension-reduced image feature, and then conducts supervised training on the labeled input image features to verify the accuracy of the acquired image features. An image feature label is a symbol that defines the specific features of an image. The types of image feature classifiers include the LR logistic regression classifier, the KNN nearest neighbor classifier, the SVM support vector classifier, the image feature coding classifier, the Softmax classifier, etc. [9-11]. Among these, the Softmax classifier is mainly used in the CNN deep convolutional neural network model, and its core principle is to find the minimum of the image feature value function by the gradient descent method.

The basis of this paper is: (1) image feature coding, that is, how to describe image features; and (2) the CNN deep convolutional neural network model, that is, how to extract image features from image samples. The basic ideas of the research are: (1) analyze the commonly used image feature coding methods and propose an improved image feature coding method; (2) after analyzing the commonly used CNN deep ...
$$j = \| f_i - c_j \|_2^2 \qquad (1)$$

In Formula 1, $f_i$ is the $i$-th feature in the input image, and $c_j$ is the feature described by the $j$-th atom in the learning dictionary. Both features are described in vector (matrix) form. The term $\| f_i - c_j \|_2^2$ is the reconstruction residual term, which is the key to completing image reconstruction through image features. The reconstruction residual term is computed with the $\ell_2$ norm, whose parameters are described in the previous section. Reconstructing the residual means using the gradient descent algorithm to control the gradient (amplitude) by which the difference $f_i - c_j$ decreases in the learning dictionary, and repeatedly reducing that difference by a certain gradient value (a fixed value) until it reaches its minimum (which is never exactly 0). The gradient value can be illustrated by an example: the process from 1 to 0.2 can be expressed as 1 → 0.8 (1 − 0.2) → 0.6 (0.8 − 0.2) → 0.4 (0.6 − 0.2) → 0.2 (0.4 − 0.2), where 0.2 is the gradient descent value.

In Formula 1, the main basis for the reconstruction residual is the difference between the $i$-th feature in the input image and the feature described by the $j$-th atom in the learning dictionary, which cannot represent the fineness and accuracy of the image feature coding in detail. The main reason is that, although a connection is established between the $i$-th feature in the input image and the feature described by the $j$-th atom in the learning dictionary, the connection between the $j$-th feature in the input image and the feature described by the $i$-th atom in the learning dictionary is not clearly stated. Therefore, on the premise of Euclidean ...

$$j = \lambda_1 j_{1\min} + \lambda_2 j_{2\min}, \qquad \lambda_1 + \lambda_2 = 1 \qquad (2)$$

When the condition of Formula 2 is satisfied, the improved HC hard assignment sets the code $z(i, j) = 1$; otherwise, $z(i, j) = 0$.

The mathematical modeling principle of this dictionary learning classification algorithm is as follows. $X$ denotes the image samples participating in model training, $n$ is the total number of samples participating in model training, $C$ is the number of classes of image samples, and $k$ is a certain class of image samples, that is, the $k$-th class. $X = [X_1, X_2, X_3, \dots, X_k, \dots, X_C]$, with $X \in \mathbb{R}^{p \times n}$ and $X_k \in \mathbb{R}^{p \times n_k}$, where $\mathbb{R}^{p \times n}$ is the space of the training sample matrix, $p$ is the dimension of each sample, and $n_k$ is the number of training samples in class $k$. $\sum_{k=1}^{C} n_k = n$ expresses that the total number of training samples is the sum of the per-class sample counts, so the number of samples in each class is clearly defined.

The main task of the dictionary learning classification algorithm is to add one or more groups of classification decision strategies to the dictionary and to introduce the compression mechanism. Its mathematical model is as follows:

$$\min \| X - DA \|_F^2 + \lambda_1 \| A \|_p + \lambda_2 L(D, A, Y) \qquad (3)$$

In Formula 3, $\| X - DA \|_F^2$ is the reconstruction residual, which refers to the difference between the actual detected image classification results and the ideal image classification results, and can be used to analyze and determine the reliability of the image classification results. $D$ is the learning dictionary, $D \in \mathbb{R}^{p \times K}$; $A$ is the coefficient matrix
Authorized licensed use limited to: PES University Bengaluru. Downloaded on February 01,2024 at 15:12:04 UTC from IEEE Xplore. Restrictions apply.
corresponding to $X$ and $D$, $A = [A_1, A_2, A_3, \dots, A_k, \dots, A_C]$, $A \in \mathbb{R}^{K \times n}$; $Y$ is the label of the image samples after classification, represented as a matrix. Assuming $x_i$ is an image sample of the $k$-th class after classification, $y_i = k$ ($k = 1, 2, 3, \dots, C$; $i = 1, 2, 3, \dots, n$). $\lambda_1$ and $\lambda_2$ are Lagrange multipliers whose function is to regularize the parameters used, with $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$; $\| A \|_p$ is the regularization term, the $p$-norm of the coefficient matrix $A$; $L$ is the term that determines the accuracy of the image classification results, and $L(D, A, Y)$ can determine the accuracy of $D$ and $A$. In the dictionary learning classification model, the learning dictionary $D$ comes from the training results of the image samples participating in the training, and the function of $D$ is to determine the correctness of the labels of the classified image samples. In other words, the learning dictionary $D$ is a tool added to other, non-learning dictionaries. Although the training time of the input image samples is lengthened, the introduction of the decision analysis strategy improves the accuracy of the training results.

B. The CNN deep convolutional neural network model improved in this paper

The basic structure of the CNN deep convolutional neural network model is as follows: one input layer (one neuron) → the first convolutional layer (four neurons) → the first pooling layer (four neurons) → the second convolutional layer (eight neurons) → the second pooling layer (eight neurons) → ... → the n-th convolutional layer (the number of neurons usually doubles relative to the previous one) → the n-th pooling layer (the number of neurons usually doubles relative to the previous one) → one fully connected layer (one neuron) → one or more output layers (one or more neurons). In the whole model, the convolutional layer and the pooling layer appear successively in pairs, and the basic unit of each layer is the neuron. The function of the stacking layer has been introduced earlier.

The improved HC hard-assigned image feature transformation coding model proposed in this paper can achieve a more refined classification of image features and the compression of image feature dimensions (pooling dimension reduction). Therefore, this coding model can be used to replace the last convolutional layer and pooling layer in the CNN deep convolutional neural network model above, giving a new CNN deep convolutional neural network model. The structure of the improved CNN model proposed in this paper is as follows: one input layer (one neuron) → the first convolutional layer (four neurons) → the first pooling layer (four neurons) → the second convolutional layer (eight neurons) → the second pooling layer (eight neurons) → ... → ... → the improved HC hard-assignment encoding transform model → a fully connected layer (using multiple independent Softmax classifiers for generating multiple kinds of image features) → multiple output layers (multiple neurons).

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Experimental conditions

To verify the image sample classification effect of the improved HC hard-assignment coding algorithm proposed in this paper, the experiment statistically analyzes the image sample classification results of the HC hard-assignment coding algorithm, the LSC soft-assignment coding algorithm, the SC sparse coding algorithm, the KSVD dictionary learning classification algorithm and the improved HC hard-assignment coding algorithm proposed in this paper. Meanwhile, the image recognition effect of the improved CNN deep convolutional neural network model proposed in this paper is verified. In the experiments, the AE deep autoencoder model, the RBM restricted Boltzmann machine model, the DBN deep belief network model, the ImageNet deep neural network model and the improved CNN deep convolutional neural network model are compared and analyzed. The image sample sets used in all experiments are the DTD texture image sample set, the MIT indoor scene image sample set, the CUB200 bird image sample set and the Caltech256 multi-class target image sample set.

B. Experimental test results and discussion

To verify the classification effect on the image samples, the five compared algorithms are each used as the operating platform of the learning dictionary model for the image feature classification and coding described above. The experimental statistical data are shown in Table I.
TABLE I. STATISTICAL TABLE OF CLASSIFICATION EFFECT OF FIVE FEATURE CODING ALGORITHMS IN FOUR IMAGE SAMPLE SETS
Classification accuracy rate (%) by image sample set

Algorithm                                            DTD      MIT      CUB200   Caltech256
HC hard assignment coding algorithm                  72.58    74.32    71.93    71.47
LSC soft assignment coding algorithm                 74.25    76.28    73.48    72.52
SC sparse coding algorithm                           78.61    80.13    79.12    74.35
KSVD dictionary learning classification algorithm    78.79    79.86    78.84    74.67
Image feature coding algorithm in this paper         80.34    82.19    80.03    78.37
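For orientation, the baseline in the first row of Table I, HC hard-assignment coding, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `squared_l2` and `hard_assign`, the toy features and the two-atom dictionary are hypothetical, and the nearest atom is chosen by the squared Euclidean distance $\| f_i - c_j \|_2^2$ that appears in Formula 1.

```python
def squared_l2(f, c):
    """Squared Euclidean distance ||f - c||_2^2 between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(f, c))


def hard_assign(features, atoms):
    """Return the 0/1 code matrix z, with z[i][j] = 1 only for the atom nearest to feature i."""
    codes = []
    for f in features:
        distances = [squared_l2(f, c) for c in atoms]
        nearest = distances.index(min(distances))  # ties resolve to the lowest index
        codes.append([1 if j == nearest else 0 for j in range(len(atoms))])
    return codes


# Hypothetical toy data: three 2-D features and a dictionary of two atoms.
features = [[0.0, 0.1], [0.9, 1.0], [0.4, 0.45]]
atoms = [[0.0, 0.0], [1.0, 1.0]]
codes = hard_assign(features, atoms)
print(codes)
```

In this toy case the third feature is coded entirely by the first atom, and its coefficient for the second atom is 0 even though it lies between the two atoms. Each row of the code matrix contains exactly one 1, which is what makes the assignment "hard".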
Among the algorithms listed in Table I, the HC hard-assignment coding algorithm defines only a small range of parameters: it targets only the input image samples and a certain feature in the learning dictionary entries, so its classification results are limited to a certain range of features. In particular, the features with a coding coefficient of 0 may contain some representative features. Therefore, when classifying the features of the four image sample sets, its classification accuracy is not high. The LSC soft-assignment coding algorithm uses an exponential function to establish the relationship between the input image sample features and the dictionary entry description features. The exponential function makes it easier to construct linear and nonlinear functions, and can even be used to calculate the minimum value through partial derivatives. Therefore, this coding algorithm should theoretically have a higher image feature classification performance. Its statistical data in the experiment are not ideal, mainly because the initial algorithm uses a relatively simple linear transformation. The SC sparse coding algorithm first carries out a sparse description of the input image features and of the entry description features in the learning dictionary. Since there is a good basis for image
feature transformation, it ensures the accuracy of the image feature classification results. The KSVD dictionary learning classification algorithm is a supervised learning dictionary algorithm obtained by improving the initial algorithm. It has the ability to analyze and judge image features, and is also a relatively excellent image feature classification algorithm. The algorithm proposed in this paper is an improvement on the HC hard-assignment coding algorithm. Theoretically, it does not have a particular advantage over the other three algorithms. However, the experimental data show that clearly defining the input image samples and the image features in the learning dictionary entries, and moreover determining the relation between the input image sample and the image features in the dictionary entry from multiple angles, improves the classification effect of image features. The statistical data in Table I show that its classification accuracy of image features is the best.

In the verification of the recognition effect on image samples, four existing deep learning models (or deep neural network models) are compared directly with the deep neural network model constructed in this paper. The four image recognition models involved in the experiment are all the originally proposed models and have not been improved. The experimental statistical data are shown in Table II.
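As a reminder of what is being compared, the layer sequence of the improved model from Section II-B can be written down as a simple list. The sketch below is only a structural outline under assumptions: the function name `improved_cnn_layers`, the default of three stages and the dictionary-based layer records are hypothetical. The paper itself gives the 1 → 4 → 4 → 8 → 8 pattern, states that neuron counts usually double from stage to stage, and replaces the last convolution/pooling pair with the improved HC encoding.

```python
def improved_cnn_layers(num_stages=3, first_width=4):
    """List the layers of the improved model; the last conv/pool stage is replaced by HC coding."""
    if num_stages < 2:
        raise ValueError("need at least one conv/pool stage before the HC encoding stage")
    layers = [{"type": "input", "neurons": 1}]
    width = first_width
    # All stages except the last keep their convolution + pooling pair.
    for stage in range(1, num_stages):
        layers.append({"type": f"conv{stage}", "neurons": width})
        layers.append({"type": f"pool{stage}", "neurons": width})
        width *= 2  # neuron count usually doubles from one stage to the next
    # The final conv/pool pair is replaced by the improved HC hard-assignment encoding.
    # Neuron counts for the remaining layers are not specified in the paper, hence None.
    layers.append({"type": "improved_hc_encoding", "neurons": None})
    layers.append({"type": "fully_connected_softmax", "neurons": None})
    layers.append({"type": "output", "neurons": None})
    return layers


layers = improved_cnn_layers(num_stages=3)
print(" -> ".join(layer["type"] for layer in layers))
```

The outline carries no weights or computation; it only makes explicit where the HC encoding stage sits relative to the retained convolution/pooling pairs.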
TABLE II. STATISTICAL TABLE OF RECOGNITION EFFECT OF FIVE IMAGE FEATURE TRAINING MODELS IN FOUR IMAGE SAMPLE SETS
Recognition rate (%) by image sample set

Model                                        DTD      MIT      CUB200   Caltech256
AE deep autoencoder model                    85.47    84.13    84.62    84.03
RBM restricted Boltzmann machine model       85.72    85.67    85.14    85.15
DBN deep belief network model                86.39    85.92    85.37    85.81
ImageNet deep neural network model           87.65    85.46    85.39    85.72
Improved CNN model in this paper             89.51    85.93    86.17    85.98
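The margins implied by Table II can be checked with a short calculation. The sketch below copies the recognition rates from the table into a hypothetical dictionary `table2` and reports, per sample set, the gap between the improved CNN model and the best of the four reference models; the variable and function names are illustrative only.

```python
# Recognition rates (%) copied from Table II; column order: DTD, MIT, CUB200, Caltech256.
DATASETS = ["DTD", "MIT", "CUB200", "Caltech256"]
table2 = {
    "AE": [85.47, 84.13, 84.62, 84.03],
    "RBM": [85.72, 85.67, 85.14, 85.15],
    "DBN": [86.39, 85.92, 85.37, 85.81],
    "ImageNet": [87.65, 85.46, 85.39, 85.72],
    "Improved CNN": [89.51, 85.93, 86.17, 85.98],
}


def margins_over_best_baseline(table, proposed):
    """Per-dataset gap (percentage points) between `proposed` and the best other model."""
    result = {}
    for col, name in enumerate(DATASETS):
        best_other = max(rates[col] for model, rates in table.items() if model != proposed)
        result[name] = round(table[proposed][col] - best_other, 2)
    return result


margins = margins_over_best_baseline(table2, "Improved CNN")
for name, gap in margins.items():
    print(f"{name}: +{gap:.2f} percentage points")
```

On these figures the margin is 1.86 points on DTD, 0.01 on MIT, 0.78 on CUB200 and 0.17 on Caltech256, so the lead of the improved model is largest on the texture set and under one point on the other three.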
... deep convolutional neural networks[A]. In: International Conference on Neural Information Processing Systems (NIPS)[C], 2012: 1097–1105.
[4] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[A]. In: International Conference on Learning Representations (ICLR)[C], 2015.
[5] Lazebnik S, Schmid C, Ponce J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2006: 2169–2178.
[6] Zhang T, Ghanem B, Liu S, et al. Low-Rank Sparse Coding for Image Classification[A]. In: IEEE International Conference on Computer Vision (ICCV)[C], 2013.
[7] Arandjelovic R, Gronat P, Torii A, et al. NetVLAD: CNN architecture for weakly supervised place recognition[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2016.
[8] Cimpoi M, Maji S, Vedaldi A. Deep filter banks for texture recognition and segmentation[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2015: 3828–3836.
[9] Wright J, Yang A, Sastry S, et al. Robust Face Recognition via Sparse Representation[J]. IEEE Trans on Pattern Analysis and Machine Intelligence (PAMI), 2009, 31(2): 210–227.
[10] Jiang Z, Lin Z, Davis L S. Label Consistent K-SVD: Learning A Discriminative Dictionary for Recognition[J]. IEEE Trans on Pattern Analysis and Machine Intelligence (PAMI), 2013, 35(11): 2651–2664.
[11] Gu S, Zhang L, Zuo W, et al. Projective dictionary pair learning for pattern classification[A]. In: Advances in Neural Information Processing Systems[C], 2014: 793–801.
[12] Lazebnik S, Schmid C, Ponce J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2006: 2169–2178.