2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA)

Construction of CNN model based on hard-assigned coding of image features
2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA) | 978-1-6654-7278-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICPECA56706.2023.10075740

Bei Zhang
Jingmen Daily, Jingmen Mobile Media Co. Ltd
Jingmen, Hubei, China
[email protected]

Huazhe Tian
School of Electronic Information Engineering
Jingchu University of Technology
Jingmen, Hubei, China
[email protected]

Zihao Huang
School of Electronic Information Engineering
Jingchu University of Technology
Jingmen, Hubei, China
[email protected]

Xinghua Wan
School of Electronic Information Engineering
Jingchu University of Technology
Jingmen, Hubei, China
[email protected]

Zhong Shu*
School of Electronic Information Engineering
Jingchu University of Technology
Jingmen, Hubei, China
[email protected]

*Corresponding author

Abstract—This paper proposes a CNN model based on hard-assigned coding of image features. By clearly describing the relationship between the input image sample features that participate in learning-dictionary training and the atomic description features in the learning dictionary, the HC hard-assigned image feature coding transformation is improved: two sets of parameters are designed to calculate the minimum distance between the input image sample features and the atomic description features in the learning dictionary. The improved HC hard-assigned coding transformation algorithm is introduced into the last convolutional layer and pooling layer of the CNN model, and a complete, improved CNN deep neural network model is designed. The advantages of the model constructed in this paper are verified by experimental comparison with a variety of image feature coding transformation algorithms.

Keywords—Image Feature Vector, Hard-assignment Coding (HC), Supervised Learning Dictionary, Convolutional Neural Networks (CNN)

I. INTRODUCTION

In image recognition, there are two main approaches to image feature extraction. The first is manual feature extraction. Its most typical algorithms are the SIFT scale-invariant feature extraction algorithm [1] and the HOG gradient histogram feature extraction algorithm [2]. These algorithms depend on an accurate description of image features by the computer and on accumulated experience in image feature recognition and classification. For some special images, their processing and recognition performance is low. The second is learning-based feature extraction, whose most typical representative is deep learning [3, 4]. Deep learning first lists some known image feature information, builds a learning dictionary (or learning model), and puts the known image feature information into the learning dictionary. Through learning and training in the learning dictionary, the input image finds its own matching features among the known image features. At present, deep learning technology is widely used, and its image recognition effect is good.

Image feature processing mainly uses linear or nonlinear mathematical transformations to describe the acquired image features accurately, so as to facilitate their subsequent classification and recognition. Image feature processing is realized by encoding image features, that is, by expressing image features in the form of encoding coefficients. The encoding coefficients can be obtained in two ways: manual feature encoding [5, 6] and deep learning feature encoding [7, 8].

Image feature compression, also known as image feature pooling, is the spatial dimension compression (or dimension reduction) of the extracted image features or of the image features after replacement processing. The results of feature compression are used in the subsequent classification of image features. The main methods of image feature dimensionality reduction are average pooling, maximum pooling and random allocation pooling.

The image feature classifier first defines labels for the dimension-reduced image features and then conducts supervised training on the labeled input image features to verify the accuracy of the acquired image features. An image feature label is a symbol that identifies the specific features of an image. The types of image feature classifiers include the LR logistic regression classifier, the KNN nearest-neighbor classifier, the SVM support vector classifier, the image feature coding classifier, the Softmax classifier, etc. [9-11]. Among these, the Softmax classifier is mainly used in the CNN deep convolutional neural network model; its core principle is to find the minimum of the image feature value function by gradient descent.

This paper is based on: (1) image feature coding, that is, how to describe image features; and (2) the CNN deep convolutional neural network model, that is, how to extract image features from image samples. The basic ideas of the research are: (1) analyze the commonly used image feature coding methods and propose an improved image feature coding method; (2) after analyzing the commonly used CNN deep

978-1-6654-7278-4/23/$31.00 ©2023 IEEE 575 January 29-31, 2023 Shenyang, China


Authorized licensed use limited to: PES University Bengaluru. Downloaded on February 01,2024 at 15:12:04 UTC from IEEE Xplore. Restrictions apply.
convolutional neural network models, an improved deep convolutional neural network model is proposed, to which the improved image feature coding method is applied.

II. THIS PAPER BUILDS A DEEP NEURAL NETWORK MODEL BASED ON IMAGE FEATURES

A. The image feature transformation coding model proposed in this paper

In HC hard-assigned coding, the value of $z(i, j)$ is either 1 or 0, and the entries whose value is 1 are collectively defined as the coding coefficients, which are important parameters for the subsequent image sample classification. The key in HC hard-assigned coding is to define the condition under which the value is 1 and to determine the parameter of that condition, denoted $j$. Here $j$ mainly refers to the atomic feature in the learning dictionary at the nearest distance from the coded feature, and the nearest distance is determined with the Euclidean distance formula. For the Euclidean distance, the relevant feature matrices need to be represented as vectors, and the value of $j$ is defined as [12]:

$$ j = \| f_i - c_j \|_2^2 \qquad (1) $$

In Formula 1, $f_i$ is the $i$-th feature in the input image, and $c_j$ is the feature described by the $j$-th atom in the learning dictionary. Both features are described by vectors. The term $\| f_i - c_j \|_2^2$ is the reconstruction residual term, which is the key to completing image reconstruction from image features. The reconstruction residual term uses the $l_2$ norm, and its parameters are described in the previous section. Reconstructing the residual means controlling, through the gradient descent algorithm, the gradient (amplitude) by which the difference $f_i - c_j$ decreases in the learning dictionary, and repeatedly reducing the difference by a certain fixed gradient value until it reaches its minimum value (which is never exactly 0). The gradient value can be illustrated by an example, the process from 1 to 0.2: 1 → 0.8 (1 − 0.2) → 0.6 (0.8 − 0.2) → 0.4 (0.6 − 0.2) → 0.2 (0.4 − 0.2), where 0.2 is the gradient descent value.

In Formula 1, the reconstruction residual is based only on the difference between the $i$-th feature in the input image and the feature described by the $j$-th atom in the learning dictionary, which cannot represent the fineness and accuracy of the image feature coding in detail. The main reason is that although a connection is established between the $i$-th feature in the input image and the feature described by the $j$-th atom in the learning dictionary, the connection between the $j$-th feature in the input image and the feature described by the $i$-th atom in the learning dictionary is not clearly stated. Therefore, on the basis of the Euclidean distance formula, two distances are calculated: the minimum distance $j_1$ between the $i$-th feature in the input image and the feature described by the $j$-th atom in the learning dictionary, and the minimum distance $j_2$ between the $j$-th feature in the input image and the feature described by the $i$-th atom in the learning dictionary. Weights are then assigned to $j_1$ and $j_2$. The weights are denoted $\lambda_1$ and $\lambda_2$ respectively, and with them the value of $j$ is finally determined. Either $\lambda_1$ or $\lambda_2$ can be determined by the Lagrange multiplier, and $\lambda_1 + \lambda_2 = 1$. The value of $j$ is defined as follows:

$$
\begin{aligned}
j_{1\min} &= \| f_i - c_j \|_2^2 \\
j_{2\min} &= \| f_j - c_i \|_2^2 \\
j &= \lambda_1 j_{1\min} + \lambda_2 j_{2\min}, \quad \lambda_1 + \lambda_2 = 1
\end{aligned}
\qquad (2)
$$

When the condition of Formula 2 is satisfied, the improved HC hard-assigned code is $z(i, j) = 1$; otherwise, $z(i, j) = 0$.

The mathematical modeling principle of the dictionary learning classification algorithm is as follows. $X$ denotes the image samples participating in model training, $n$ is the total number of samples participating in model training, $C$ is the number of classes of image samples, and $k$ denotes a particular class, that is, the $k$-th class of image samples. $R$ is the set of image samples participating in training, $p$ is the set of image sample classification, $X = [X_1, X_2, X_3, \dots, X_k, \dots, X_C]$, $X \in R^{p \times n}$, $X_k \in R^{p \times n_k}$, and $n_k$ is the number of training image samples in class $k$. The relation $\sum_{k=1}^{C} n_k = n$ expresses the total number of training image samples as the sum of the numbers of samples in each class, so the number of samples in each class is clearly defined.

The main task of the dictionary learning classification algorithm is to add one or more groups of classification decision strategies to the dictionary and to introduce a compression mechanism. Its mathematical model is as follows:

$$ \min \| X - DA \|_F^2 + \lambda_1 \| A \|_p + \lambda_2 L(D, A, Y) \qquad (3) $$

In Formula 3, $\| X - DA \|_F^2$ is the reconstruction residual, which refers to the difference between the actually detected image classification results and the ideal image classification results, and can be used to analyze and determine the reliability of the image classification results. $D$ is the learning dictionary, $D \in R^{p \times K}$; $A$ is the coefficient matrix

corresponding to $X$ and $D$, $A = [A_1, A_2, A_3, \dots, A_k, \dots, A_C]$, $A \in R^{K \times n}$; $Y$ is the label of the image samples after classification, represented as a matrix. Assuming $x_i$ is an image sample of the $k$-th class, $y_i = k$ ($k = 1, 2, 3, \dots, C$; $i = 1, 2, 3, \dots, n$). $\lambda_1$ and $\lambda_2$ are Lagrange multipliers whose function is to regularize the parameters used, with $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$. $\| A \|_p$ is the regularization term given by the $p$-norm of the coefficient matrix $A$. $L$ is the term that determines the accuracy of the image classification results, and $L(D, A, Y)$ can determine the accuracy of $D$ and $A$. In the dictionary learning classification model, the learning dictionary $D$ comes from the training results of the image sample A participating in the training, and the function of $D$ is to determine the correctness of the labels of the classified image samples. The learning dictionary $D$ can also be regarded as a tool added to other, non-learning dictionaries. Although the training time of the input image samples is lengthened, the introduction of the decision analysis strategy improves the accuracy of the training results.

B. The CNN deep convolutional neural network model improved in this paper

The basic structure of the CNN deep convolutional neural network model is as follows: one input layer (one neuron) → the first convolution layer (four neurons) → the first pooling layer (four neurons) → the second convolution layer (eight neurons) → the second pooling layer (eight neurons) → … → the n-th convolution layer (the number of neurons is usually twice that of the previous one) → the n-th pooling layer (the number of neurons is usually twice that of the previous one) → one fully connected layer (one neuron) → one or more output layers (one or more neurons). Throughout the model, convolution layers and pooling layers appear successively in pairs, and the basic unit of each layer is the neuron. The function of the pooling layer has been introduced above.

The improved HC hard-assigned image feature transformation coding model proposed in this paper can achieve both a more refined classification of image features and the compression of image feature dimensions (pooling dimension reduction). Therefore, this coding model can be used to replace the last convolution layer and pooling layer in the CNN deep convolutional neural network model above, which yields a new CNN deep convolutional neural network model. The structure of the improved CNN model proposed in this paper is as follows: one input layer (one neuron) → the first convolutional layer (four neurons) → the first pooling layer (four neurons) → the second convolutional layer (eight neurons) → the second pooling layer (eight neurons) → … → the improved HC hard-assignment encoding transform model → a fully connected layer (using multiple independent Softmax classifiers for generating multiple kinds of image features) → multiple output layers (multiple neurons).

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Experimental conditions

To verify the image sample classification effect of the improved HC hard-assignment coding algorithm proposed in this paper, the experiment statistically compares the image sample classification results of the HC hard-assignment coding algorithm, the LSC soft-assignment coding algorithm, the SC sparse coding algorithm, the KSVD dictionary learning classification algorithm and the improved HC hard-assignment coding algorithm proposed in this paper. The image recognition effect of the improved CNN deep convolutional neural network model proposed in this paper is also verified. For this purpose, the AE deep autoencoder model, the RBM restricted Boltzmann machine model, the DBN deep belief network model, the ImageNet deep neural network model and the improved CNN deep convolutional neural network model are compared and analyzed. The image sample sets used in all experiments are the DTD texture image sample set, the MIT indoor scene image sample set, the CUB200 bird image sample set and the Caltech256 multi-class object image sample set.

B. Experimental test results and discussion

In the verification of the classification effect of the image samples, the five comparison algorithms are used as the operating platform of the learning dictionary model for the classification and coding of the image features described above. The experimental statistical data are shown in Table I.

TABLE I STATISTICAL TABLE OF CLASSIFICATION EFFECT OF FIVE FEATURE CODING ALGORITHMS IN FOUR IMAGE SAMPLE SETS
Classification accuracy rate (%) by algorithm (rows) and image sample set (columns)

Algorithm                                            DTD      MIT      CUB200   Caltech256
HC hard assignment coding algorithm                  72.58    74.32    71.93    71.47
LSC soft assignment coding algorithm                 74.25    76.28    73.48    72.52
SC sparse coding algorithm                           78.61    80.13    79.12    74.35
KSVD dictionary learning classification algorithm    78.79    79.86    78.84    74.67
Image feature coding algorithm in this paper         80.34    82.19    80.03    78.37
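
The first and last rows of Table I compare HC hard-assignment coding with the improved coding of Section II.A. As a minimal sketch (not the authors' implementation), the two rules might look as follows in plain Python. `hard_assign` sets z(i, j) = 1 only for the atom nearest to feature i under the squared Euclidean distance of Formula 1. `improved_hard_assign` is only one possible reading of Formula 2 and rests on assumptions the paper does not state: the number of features equals the number of atoms, so that the swapped distance between feature j and atom i exists, and z(i, j) = 1 for the j that minimizes the weighted sum. All function names, the default weight and the sample data are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean (l2) distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hard_assign(features, atoms):
    """Formula 1 reading: z[i][j] = 1 only for the atom nearest to feature i."""
    codes = []
    for f in features:
        dists = [sq_dist(f, c) for c in atoms]
        nearest = dists.index(min(dists))
        codes.append([1 if j == nearest else 0 for j in range(len(atoms))])
    return codes


def improved_hard_assign(features, atoms, lam1=0.5):
    """Assumed reading of Formula 2: weight d(f_i, c_j) and d(f_j, c_i).

    Assumption: len(features) == len(atoms), so the swapped index is valid.
    lam2 is fixed by the constraint lam1 + lam2 = 1.
    """
    n = len(features)
    if n != len(atoms):
        raise ValueError("this sketch assumes as many features as atoms")
    lam2 = 1.0 - lam1
    # d[i][j] is the squared distance between feature i and atom j.
    d = [[sq_dist(features[i], atoms[j]) for j in range(n)] for i in range(n)]
    codes = []
    for i in range(n):
        scores = [lam1 * d[i][j] + lam2 * d[j][i] for j in range(n)]
        best = scores.index(min(scores))
        codes.append([1 if j == best else 0 for j in range(n)])
    return codes


# Hypothetical toy data: three 2-D features and three dictionary atoms.
features = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
atoms = [[0.9, 1.1], [0.1, -0.1], [3.5, 4.2]]
print(hard_assign(features, atoms))
print(improved_hard_assign(features, atoms, lam1=0.5))
```

With `lam1=1.0` the second function reduces to the first, which makes the role of the second distance term easy to isolate when experimenting with the weights.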
Among the algorithms listed in Table I, the HC hard-assignment coding algorithm defines only a small range of parameters: it considers only the input image sample and a single feature among the learning dictionary entries, so its classification results are limited to a certain range of features. In particular, the features whose coding coefficient is 0 may still contain some representative features. Therefore, its classification accuracy on the four image sample sets is not high. The LSC soft-assignment coding algorithm uses an exponential function to establish the relationship between the input image sample features and the dictionary entry description features. The exponential function makes it easier to construct linear and nonlinear functions and even to calculate the minimum value from partial derivatives, so this coding algorithm should theoretically have higher image feature classification performance. Its experimental results are not ideal, mainly because the experiment uses the relatively simple linear transformation of the initial algorithm. The SC sparse coding algorithm first builds a sparse description of the input image features and of the entry description features in the learning dictionary. Since this provides a good basis for image feature transformation, it ensures the accuracy of the image feature classification results. The KSVD dictionary learning classification algorithm is a supervised learning dictionary algorithm obtained by improving the initial algorithm. It has the ability to analyze and judge image features and is also a relatively excellent image feature classification algorithm. The algorithm proposed in this paper is an improvement of the HC hard-assignment coding algorithm. Theoretically, it does not have a particular advantage over the other three algorithms. However, the experimental data show that clearly defining the relation between the input image samples and the image features in the learning dictionary entries, and determining that relation from multiple angles, improves the classification effect of image features. The statistical data in Table I show that its classification accuracy is the highest on all four image sample sets.

In the verification of the recognition effect of image samples, four existing deep learning models (deep neural network models) are compared directly with the deep neural network model constructed in this paper. The four existing image recognition models are used as originally proposed, without improvement. The experimental statistical data are shown in Table II.

TABLE II STATISTICAL TABLE OF RECOGNITION EFFECT OF FIVE IMAGE FEATURE TRAINING MODELS IN FOUR IMAGE SAMPLE SETS
Recognition rate (%) by model (rows) and image sample set (columns)

Model                                        DTD      MIT      CUB200   Caltech256
AE depth autoencoder model                   85.47    84.13    84.62    84.03
RBM constrained Boltzmann machine model      85.72    85.67    85.14    85.15
DBN deep belief network model                86.39    85.92    85.37    85.81
Image Net deep neural network model          87.65    85.46    85.39    85.72
Improved CNN model in this paper             89.51    85.93    86.17    85.98
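
The last row of Table II refers to the improved CNN model, whose layer order is given in Section II.B. The following sketch only writes that order out as data; it is not a trainable network. It assumes that the neuron count starts at four and doubles with each convolution/pooling pair, as in the paper's example, and that the last pair of the basic CNN is the one replaced by the improved HC coding stage. The function name, the layer names and the parameter `num_pairs` are hypothetical, and neuron counts the paper does not specify are left as `None`.

```python
def improved_cnn_layers(num_pairs):
    """Return (layer name, neurons) tuples in the order of Section II.B.

    num_pairs counts the conv/pool pairs of the basic CNN (assumption);
    the last pair is replaced by the improved HC coding stage.
    """
    if num_pairs < 2:
        raise ValueError("need at least one conv/pool pair before the coding stage")
    layers = [("input", 1)]
    neurons = 4  # first convolution layer has four neurons in the paper
    for k in range(1, num_pairs):  # keep pairs 1 .. num_pairs - 1
        layers.append((f"conv{k}", neurons))
        layers.append((f"pool{k}", neurons))
        neurons *= 2  # each following pair doubles the neuron count
    # Neuron counts for the remaining stages are not given in the paper.
    layers.append(("improved_hc_coding", None))
    layers.append(("fully_connected_softmax", None))
    layers.append(("outputs", None))
    return layers


for name, neurons in improved_cnn_layers(3):
    print(name, neurons)
```

Calling `improved_cnn_layers(3)` reproduces the explicit part of the paper's sequence (1, 4, 4, 8, 8 neurons) followed by the coding stage, the fully connected Softmax layer and the outputs.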

The AE deep autoencoder model is an unsupervised machine learning model, which mainly completes the training of the input image samples. The image features described by the entries in the learning dictionary do not need to be updated through learning, which limits the performance of this model. Compared with the other models, its image recognition effect is the least ideal. The RBM restricted Boltzmann machine model is also an unsupervised machine learning model. It is composed of multiple layers, and the basic unit of each layer is the neuron. However, this model strengthens the training of input image samples and weakens the error accumulation caused by iterative training of image samples. In theory it is an excellent machine learning model, and it occupies a relatively important position among deep machine learning models. The DBN deep belief network model is developed on the basis of the RBM restricted Boltzmann machine model; each layer in its hierarchy is composed of an RBM model. Although it is also an unsupervised machine learning model, it becomes an excellent machine learning model by greatly increasing the number of training iterations on the input image samples. On the MIT and Caltech256 sample sets in Table II, its recognition rate is even higher than that of the ImageNet deep convolutional neural network model. The ImageNet deep neural network model is the most typical initial model of the deep convolutional neural network, and its development stems from the proposal of the back-propagation algorithm. The acquisition of image sample features requires two stages: training (for feature extraction) and testing (for feature verification). It is a supervised machine learning model. This model can not only train on a large number of images simultaneously, but also greatly improves the image recognition effect. The improved CNN model in this paper improves the structure of the ImageNet model and introduces a fine image feature analysis and judgment strategy in its key feature extraction and pooling layer. The experimental statistics show that the improved CNN model in this paper has the best image recognition effect on all four sample sets.

IV. CONCLUSION

Taking the HC hard-assignment image feature coding transform and the hierarchical structure of the CNN deep convolutional neural network as the starting point, this paper designs an image feature transformation coding algorithm and applies it to the CNN deep convolutional neural network to build the CNN deep network model. The following conclusions are drawn:

(1) On the premise of clarifying the relationship between the image sample features participating in learning-dictionary training and the atomic description features in the learning dictionary, the difference between them is calculated by the vector norm according to the Euclidean distance formula and serves as the condition that determines the value of the coding coefficient. At the same time, two methods are used to describe the correspondence between the two, each with the minimum distance calculation as its goal, and weights are assigned to the two methods.

(2) In the CNN model structure, the earlier convolution and pooling layers are retained as far as possible to preserve their main advantages. The image feature transformation coding algorithm designed in this paper is used in the last group of convolution and pooling layers.

The experimental results of image feature classification and image recognition show that the CNN deep convolutional neural network model constructed in this paper achieves the highest image feature classification accuracy and the highest image sample recognition rate on all four sample sets.

REFERENCES

[1] Lowe D G. Distinctive Image Features from Scale-Invariant Keypoints[J]. International Journal of Computer Vision, 2004, 60(2): 91–110.
[2] Dalal N, Triggs B. Histograms of oriented gradients for human detection[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2005: 886–893.
[3] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[A]. In: International Conference on Neural Information Processing Systems (NIPS)[C], 2012: 1097–1105.
[4] Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[A]. In: International Conference on Learning Representations (ICLR)[C], 2015.
[5] Lazebnik S, Schmid C, Ponce J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2006: 2169–2178.
[6] Zhang T, Ghanem B, Liu S, et al. Low-Rank Sparse Coding for Image Classification[A]. In: IEEE International Conference on Computer Vision (ICCV)[C], 2013.
[7] Arandjelovic R, Gronat P, Torii A, et al. NetVLAD: CNN architecture for weakly supervised place recognition[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2016.
[8] Cimpoi M, Maji S, Vedaldi A. Deep filter banks for texture recognition and segmentation[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2015: 3828–3836.
[9] Wright J, Yang A, Sastry S, et al. Robust Face Recognition via Sparse Representation[J]. IEEE Trans on Pattern Analysis and Machine Intelligence (PAMI), 2009, 31(2): 210–227.
[10] Jiang Z, Lin Z, Davis L S. Label Consistent K-SVD: Learning A Discriminative Dictionary for Recognition[J]. IEEE Trans on Pattern Analysis and Machine Intelligence (PAMI), 2013, 35(11): 2651–2664.
[11] Gu S, Zhang L, Zuo W, et al. Projective dictionary pair learning for pattern classification[A]. In: Advances in Neural Information Processing Systems[C], 2014: 793–801.
[12] Lazebnik S, Schmid C, Ponce J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories[A]. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)[C], 2006: 2169–2178.
