Research Article: Deep Learning Model of Image Classification Using Machine Learning
Advances in Multimedia
Volume 2022, Article ID 3351256, 12 pages
https://doi.org/10.1155/2022/3351256
Received 26 May 2022; Revised 19 June 2022; Accepted 29 June 2022; Published 19 July 2022
Copyright © 2022 Qing Lv et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Traditional artificial neural networks and machine learning methods struggle to meet the processing needs of massive image collections in feature extraction and model training, and they show low efficiency and low classification accuracy when applied to image classification. Therefore, this paper proposes a deep learning model of image classification, which aims to provide a foundation and support for image classification and recognition on large datasets. Firstly, based on an analysis of the basic theory of neural networks, the paper describes the different types of convolution neural network and the basic process of their application in image classification. Secondly, based on an existing convolution neural network model, noise reduction and parameter adjustment are carried out in the feature extraction process, and a deep learning model of image classification is proposed based on the improved convolution neural network structure. Finally, the structure of the deep learning model is optimized to improve its classification efficiency and accuracy. To verify the effectiveness of the proposed model in image classification, the relationship between classification accuracy and the number of iterations was compared experimentally for several common network models. The results showed that the proposed model was better than the other models in classification accuracy. The classification accuracy of the deep learning model before and after optimization was also compared on the training set and the test set. The results showed that the accuracy of image classification was greatly improved after the proposed model was optimized.
classification based on machine learning has made some progress. From the adoption of manual recognition to the continuous application of computer vision technology in image classification, scholars have made some achievements in the research of image classification and recognition. Machine learning is an important branch of artificial intelligence. Although machine learning has experienced half a century of development, there are still some unsolved problems, for example, complex image understanding and recognition, natural language translation, and recommendation systems [4]. Deep learning is an important branch developed on the basis of machine learning. It makes full use of the hierarchical characteristics of artificial neural networks and biological neural systems to process information, and it obtains high-level features by learning low-level features and combining them, so as to realize image classification or regression. Unlike traditional machine learning, deep learning uses multilayer neural networks to learn from images automatically and extract their deep features. Different deep learning models can be formed according to how features are learned and combined. However, the accuracy of image classification is not high, and the operating efficiency of existing deep learning models is low. Therefore, based on the basic theory of convolution neural networks, this paper establishes a deep learning model of image classification by improving the structure of the convolution neural network, which provides a basis for image classification under complex environmental conditions.

2. Related Works

Image classification provides an important basis for further image processing and for the application of computer vision technology in related fields. Traditional image classification mainly goes through several stages, such as image preprocessing, feature extraction, classifier construction, and learning training [5]. Traditional image classification methods mainly use the extracted basic image features to realize image classification, which can provide a basis for the computer to further obtain the semantic information of images. Traditional image classification generally uses image color, texture, and other information to calculate image features and uses support vector machines and logistic regression to realize image classification [6]. The results of image classification not only depend to a great extent on the extracted features but also are affected by the knowledge and experience of the relevant fields.

Manually acquired features are difficult to apply to image classification, and a lot of time is spent analyzing feature data. At the same time, traditional machine learning cannot be applied to the processing of large datasets, and it is difficult to optimize feature design, feature selection, and model training, which makes the classification effect of the model poor. Therefore, image classification methods using traditional machine learning are limited in many application fields [7]. Research shows that because texture, shape, and color features can be used for image classification and recognition, low-level basic features can be used as the basis of image classification. Traditional image classification methods generally use single feature extraction or feature combination and take the extracted features as the input of a support vector machine. In recent years, some progress has been made in image classification using artificial neural network classifiers. In order to improve the accuracy of image classification, we can focus on the standardized design of low-level features such as texture, shape, and color.

Deep learning trains on large-scale datasets through a multilevel network model and adopts layer-by-layer feature extraction to obtain the high-level features of the image. The deep learning network model not only extracts the basic features of the image but also obtains its deep features through multiple hidden layers. Compared with traditional machine learning methods, the features obtained by deep learning are not only accurate but also conducive to image classification. In the process of image recognition and classification, the way features are learned and combined is mainly determined by the deep learning model [8]. At present, the commonly used deep learning models are the sparse model, the restricted Boltzmann machine model, and the convolution neural network model. Although these models differ in feature extraction, they are similar in image classification and recognition. They all go through the steps of image information input, data preprocessing, feature extraction, model training, and classification output.

In image classification, some scholars have carried out a lot of research on image feature representation and classifier selection. For example, deep learning models based on feature representation can be effectively applied to the recognition and classification of various images. Some scholars use deep convolution neural networks (DCN) to extract deep image features and apply them to the large-scale dataset ImageNet [9]. Experiments show that the model can effectively classify large image datasets. In addition, the deep learning model can effectively learn and describe image features. For example, the deep learning model can better describe hierarchical features through unsupervised learning, and the features extracted by the model not only have strong expressive ability but also improve the efficiency of image classification. A large number of research results show that, with the deepening of research on image classification methods in related fields, deep learning models have gradually replaced traditional manual feature extraction and machine learning methods and have attracted wide attention from scholars in the field of image recognition and classification [10].

3. Fundamentals of Image Classification Algorithm

3.1. Basic Theory of Neural Network. The traditional neural network, referred to as the artificial neural network (ANN), was a hot spot in the early field of artificial intelligence [5]. An artificial neural network mainly uses the neurons of the network model to abstract the characteristics of external things, so that they can be used by a computer to complete information processing.
6048, 2022, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2022/3351256, Wiley Online Library on [21/11/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Figure 2: Comparison between deep learning model and traditional machine learning algorithm. (a) Deep learning algorithm. (b) Traditional machine learning algorithm. Block labels in the diagram: basic feature extraction, artificial feature extraction, multi-layer feature extraction, weight training.
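[Figure: convolution operation on a 3 × 3 input image. A kernel with weights 1, 3, 5, 7 applied to the image rows (b1, b2, b3) and (c1, c2, c3) yields the outputs b1 + 3b2 + 5c1 + 7c2 and b2 + 3b3 + 5c2 + 7c3.]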
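The weighted sums in the convolution figure (b1 + 3b2 + 5c1 + 7c2 and b2 + 3b3 + 5c2 + 7c3) can be reproduced with a small sliding-window routine. The following is a minimal sketch rather than the authors' implementation: the function name `conv2d_valid` and the numeric pixel values are assumptions chosen for illustration, and the routine uses a stride of 1, no padding, and no kernel flip, as is usual in convolution neural networks.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of rows. Each output value is the sum of
    elementwise products between the kernel and the window under it.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        output.append(row)
    return output


# Kernel weights read off the figure: 1, 3 in the top row and 5, 7 below.
kernel = [[1, 3],
          [5, 7]]

# Hypothetical pixel values standing in for (b1, b2, b3) and (c1, c2, c3).
image_rows = [[1, 2, 3],
              [4, 5, 6]]

# b1 + 3*b2 + 5*c1 + 7*c2 = 62 and b2 + 3*b3 + 5*c2 + 7*c3 = 78
print(conv2d_valid(image_rows, kernel))  # [[62, 78]]
```

Because the same four weights are reused at every window position, the layer needs only four parameters however large the image is, which is the weight-sharing property that keeps the parameter count low.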
features from the local area of the image and sends the extracted local features to the higher layers for integration. Because low-level image features are independent of their position, the same convolution kernel can be used to extract them across the image, and the weight sharing of the convolution kernel reduces the number of parameters of the neural network, which improves the training efficiency of the network model.

For complex images, in order to reduce the number of parameters to be trained, the pooling layer in a convolution neural network can be used to reduce the size of the feature map. During pooling, the depth of the feature map remains unchanged while its width and height are reduced. Pooling generally takes the form of max pooling or average pooling [15], as shown in Figure 6.

Figure 6: Schematic diagram of pool layer operation type. A 4 × 4 input with rows (3, 2, 6, 5), (2, 5, 3, 2), (1, 4, 4, 6), and (2, 5, 2, 8) is reduced to rows (3, 4) and (3, 5) by average pooling and to rows (5, 6) and (5, 8) by max pooling.

3.3. Convolution Neural Network Model. Existing image detection and recognition models, such as ResNet, Mask-RCNN, and Faster R-CNN, are usually based on common network models [16]. LeNet is the most basic convolution neural network model [9]. After transforming the LeNet model, the LeNet-5 model is established to classify ordinary images. Because the LeNet-5 model is not deep enough, it cannot extract enough image features during model training. Therefore, it cannot be applied to the classification of complex images.

Therefore, the AlexNet model was put forward based on the LeNet structure; it applied the convolution neural network to the processing of complex images and provided a theoretical basis for the application of deep learning models in the field of computer vision [17]. The network structure of AlexNet is shown in Figure 7.

[Figure 7: network structure of AlexNet; input size 227 × 227 × 3.]

AlexNet is a network structure with 8 layers: 5 convolution layers and 3 fully connected layers. The model uses the ReLU function as the activation function to avoid the vanishing gradient problem caused by the large number of layers of the network model. In order to reduce training time, AlexNet uses multiple GPUs for training. In order to suppress neurons with small responses, AlexNet uses an LRN (local response normalization) layer to establish a competition mechanism among neurons, so that relatively large responses are emphasized, which improves the generalization ability of the model. In addition, the dropout mechanism is adopted in the fully connected layers of AlexNet: during training, randomly selected hidden-layer neurons have their outputs set to 0, so that they take part in neither forward propagation nor back propagation, which avoids complex interactions between neurons.

At present, VggNet is a widely used deep convolution neural network model. Compared with other models, VggNet not only has better generalization ability but also can be effectively used for the recognition of different types of images [11]. For example, convolution neural networks such as FCN, UNet, and SegNet are based on the VggNet model. In recent years, the Vgg-16 and Vgg-19 networks have been the most commonly used VggNet models [18]. The network structure of Vgg-16 is shown in Table 1.

Table 1: The network structure of Vgg-16.
Layer type              Convolution kernel size   Feature map size   Number of convolution kernels
Input layer             -                         448 × 448          -
Convolution layer       3 × 3                     448 × 448          128
Pool layer              3 × 3                     448 × 448          -
Convolution layer       3 × 3                     224 × 224          256
Pool layer              2 × 2                     112 × 112          -
Convolution layer       3 × 3                     112 × 112          768
Pool layer              2 × 2                     56 × 56            -
Full connection layer   4096                      -                  -
Full connection layer   4096                      -                  -
Full connection layer   1000                      -                  -
Softmax classifier      -                         -                  -

The Vgg-16 network structure has 16 layers in total, excluding the pooling layers and the Softmax layer. The convolution kernel size is 3 × 3, the pooling window size is 2 × 2, and the pooling layers adopt max pooling with a step size of 2.
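Pooling with a 2 × 2 window and a step size of 2 can be illustrated on the 4 × 4 example of Figure 6. The following is a minimal sketch under stated assumptions: the function name `pool2d` and its parameters are hypothetical, the input is a plain list of rows, and no padding is applied.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Pool a 2D list of numbers with a square window and no padding."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - size + 1, stride):
        pooled_row = []
        for left in range(0, cols - size + 1, stride):
            # Collect the values covered by the current window.
            window = [feature_map[top + i][left + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                pooled_row.append(max(window))
            elif mode == "average":
                pooled_row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'average'")
        pooled.append(pooled_row)
    return pooled


# The 4 x 4 input shown in Figure 6.
figure6_input = [
    [3, 2, 6, 5],
    [2, 5, 3, 2],
    [1, 4, 4, 6],
    [2, 5, 2, 8],
]

print(pool2d(figure6_input, mode="max"))      # [[5, 6], [5, 8]]
print(pool2d(figure6_input, mode="average"))  # [[3.0, 4.0], [3.0, 5.0]]
```

With a step size of 2, each pooling layer halves the width and height of the feature map and leaves its depth unchanged.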
The Vgg-16 network uses convolution blocks instead of single convolution layers; each convolution block contains 2 to 3 convolution layers, which helps reduce the number of model parameters. At the same time, the Vgg-16 network adopts the ReLU activation function to enhance the training ability of the model. Although the Vgg model has more layers than the AlexNet model, its convolution kernels are smaller. Therefore, the number of training iterations of the Vgg model is less than that of the AlexNet model.

4. Image Classification Model Based on Improved CNN

4.1. Improved Image Classification Model Framework. Because image classification algorithms are usually used in systems with high real-time requirements, they need to take real-time performance into account. For complex neural network models, image classification consumes a lot of time. Therefore, this paper simplifies the VggNet model and takes it as the basis of the image classification model.

Considering the distribution characteristics of the datasets used for model training, a typical dataset can be selected to pretrain the model and initialize its weights. When the pretrained model reaches a certain accuracy, the number of nodes in the Softmax layer is reduced by a factor of ten, and then the target dataset is used for weight training. Considering that the data processed by the model may be affected by various kinds of noise, a noise reduction automatic encoder is added to the model to eliminate noise interference, and the existing dataset is extended through data enhancement to improve the generalization ability of the model.

Considering that the image classification algorithm needs to meet certain real-time requirements, the corresponding image classification model is established and optimized based on the VggNet model. An algorithm combining a convolution neural network with a noise reduction automatic encoder can be used. Because overfitting may occur in image classification, the model can be optimized by data enhancement. Compared with other algorithms, this classification algorithm generalizes reasonably well when the amount of data is small. In addition, the noise reduction automatic encoder effectively reduces the impact of data noise on model performance, which helps ensure good generalization ability. Since the improved algorithm is based on the VggNet network, the training time of the model may increase [19]. The improved image classification algorithm is shown in Figure 8.

In order to solve the problem of automatic noise reduction for images with complex structure, a normalized encoder network is used for classification in this paper. Based on the existing convolution neural network model, the noise reduction automatic encoder and the sparse automatic encoder are combined, and the original input image information is normalized by the sparse automatic encoder. Then, the improved convolution neural network model is used to extract the image feature information, and the Softmax classifier is used to classify the features [14].

When the improved convolution neural network model is used to classify images, the images first need to be preprocessed, for example by noise reduction and grayscale conversion; a certain number of training and test samples are selected from the dataset, and the training set is taken as the input of the model after unsupervised learning. Secondly, the hidden layer of the noise reduction automatic encoder is used to encode and decode the input, and the results are passed to the sparse automatic encoder of the next layer for normalization. The data is trained layer by layer through the hidden layers of the sparse automatic encoder, and finally its training results are output to the Softmax classifier. In order to improve the classification accuracy, the gradient descent method can be used to further train the parameters of the classifier, which improves the performance of the deep learning model for image classification. Finally, the network model is verified on the image test set, and the effectiveness of the image classification method is evaluated according to the classification results output by the model.

The improved convolution neural network model can overcome the problem that traditional neural networks are limited to only some features in image classification. Through the normalization of the sparse automatic encoder, overfitting during data processing can be mitigated, and more abstract, representative features can be obtained by using the hidden layers of the sparse automatic encoder to train the data layer by layer. The improved model adopts the Softmax classifier, which brings the classification result closer to the true value. The improved deep learning network model is divided into two stages: training and testing. The training stage is mainly used to build an effective image classification model, and the testing stage is mainly used to evaluate and analyze the model according to the experimental classification results. Figure 9 shows the workflow of the improved deep learning network model.

[Figure 9: workflow of the improved deep learning network model. Block labels in the diagram: training set, data preprocessing, depth feature extraction, back propagation fine tuning, network model, get weight, forward propagation, output classification results, testing and analysis.]

4.2. Image Classification Model Optimization. Existing research shows that a convolution neural network model can be optimized through data enhancement and adjustment of the training method, and that the appropriate optimization depends on the type of model used. For example, for a deep learning model with more layers, the training parameters can be optimized.

In a convolution neural network, the convolution layers are used to extract features, and the size of the convolution kernel determines the quality of the extracted image features. From the perspective of image composition, adjacent pixels generally form the edge lines of the image. Several edge lines form the image texture, and textures combine into local patterns. The local pattern is the basic element of the image. Through the convolution layers of the network model, different types of features can be extracted and the local patterns of the image can be formed. When the convolution kernel is smaller, the convolution layer extracts more features, but the model may overfit. Conversely, the larger the convolution kernel is, the fewer features the convolution layer extracts, and the worse the image classification effect may be. Therefore, reasonable optimization of the convolution kernel size can improve the accuracy of image classification.

Because the convolution neural network model extracts image features layer by layer through successive convolution layers, the number of convolution layers affects the feature extraction quality of the model to a certain extent. As with the number of convolution kernels in a convolution layer, the more convolution layers there are, the finer the features obtained by the classifier, which may lead to overfitting, while the fewer convolution layers there are, the coarser the features, which may reduce image classification accuracy. Therefore, optimizing the number of convolution layers can improve the classification accuracy of the model.

In order to improve the accuracy of image classification and recognition, the deep learning model proposed in this paper needs to be optimized. Firstly, a smaller convolution kernel is selected in the first convolution layer in order to extract more image feature details. Secondly, max pooling is adopted in the model to avoid overfitting. As shown in Figure 10, the optimized image classification model consists of three convolution layers, in which the convolution kernel size decreases from layer to layer. After each convolution layer, the features are processed by the ReLU activation function, and the resulting features are used as the input of a max pooling layer. The model adopts three fully connected layers, takes the output of the last fully connected layer as the input of the Softmax classifier, and then generates the classification result of the image.
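[Figure 10: structure of the optimized image classification model. Image input, Conv1 (9 × 9), Max pool (3 × 3), Conv2 (5 × 5), Max pool (3 × 3), Conv3 (3 × 3), Max pool (3 × 3), Softmax function.]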
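To see how the decreasing kernel sizes of the optimized model (9 × 9, 5 × 5, 3 × 3, each followed by a 3 × 3 max pool) shrink the feature map, the sizes can be traced layer by layer. This is only a sketch under assumptions that the text does not state: a 224 × 224 input, a convolution stride of 1, a pooling stride of 3, and no padding. The names `output_size`, `LAYERS`, and `trace_sizes` are hypothetical.

```python
def output_size(size, window, stride=1):
    """Width (or height) after a window of `window` slides with `stride`, no padding."""
    return (size - window) // stride + 1


# (layer name, window size, stride); strides are assumptions for illustration.
LAYERS = [
    ("Conv1", 9, 1),
    ("Max pool", 3, 3),
    ("Conv2", 5, 1),
    ("Max pool", 3, 3),
    ("Conv3", 3, 1),
    ("Max pool", 3, 3),
]


def trace_sizes(input_size, layers=LAYERS):
    """Return the feature-map width after each layer in turn."""
    sizes = []
    size = input_size
    for name, window, stride in layers:
        size = output_size(size, window, stride)
        sizes.append((name, size))
    return sizes


# Assumed 224 x 224 input image.
for name, size in trace_sizes(224):
    print(f"{name}: {size} x {size}")
```

Under these assumptions the widths run 216, 72, 68, 22, 20, 6, so the three fully connected layers would receive a 6 × 6 map per channel. Different strides or padding would change these numbers.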
The convolution layer parameters of the model include the size and number of convolution kernels. The first convolution layer is close to the image input layer and is mainly used to extract the basic features of the image, so its parameters have a great influence on the features. In order to facilitate further processing of features in the subsequent convolution layers, a smaller convolution kernel needs to be used to extract attribute information such as the shadows, boundaries, and lighting of the image.

The convolution layer maps the obtained features through the activation function. The optimized convolution neural network model adopts the ReLU activation function [20], which can be expressed as follows:

$$y(x) = \max(0, x). \tag{2}$$

the classification process, category y of the image target can have M different values. If the image training set is $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i$ represents the image training sample, $y_i$ is the image classification category, and $y_i \in \{1, 2, \ldots, M\}$, the cost function of the Softmax regression algorithm can be expressed as

$$r(\alpha) = -\frac{1}{N}\left[\sum_{i=1}^{N}\sum_{k=1}^{M} 1\{y_i = k\}\,\log\left(\frac{\exp(\alpha_k^{T} x_i)}{\sum_{j=1}^{M}\exp(\alpha_j^{T} x_i)}\right)\right], \tag{4}$$

where $1\{\cdot\}$ is the indicator function and $\alpha_k$ is the parameter vector of category k. Because the cost function accumulates over the M class labels, the probability that training sample $x_i$ belongs to category k can be obtained, which is expressed as

$$\lambda(y_i = k \mid x_i; \alpha) = \frac{\exp(\alpha_k^{T} x_i)}{\sum_{j=1}^{M}\exp(\alpha_j^{T} x_i)}. \tag{5}$$
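Equations (2), (4), and (5) can be evaluated directly for a small example. The following is a minimal sketch, not the authors' code: the names `relu`, `softmax_probs`, and `softmax_cost` are hypothetical, `alpha` is assumed to be a list of M weight vectors (one per category), labels are 1-based as in the text, and the cost is taken with the usual negative sign of Softmax regression so that it can be minimized by gradient descent.

```python
import math


def relu(x):
    """Equation (2): y(x) = max(0, x)."""
    return max(0.0, x)


def softmax_probs(alpha, x):
    """Equation (5): probability of each category k for one sample x."""
    scores = [sum(a * v for a, v in zip(alpha_k, x)) for alpha_k in alpha]
    # Subtracting the largest score leaves the ratios unchanged
    # and avoids overflow in exp.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cost(alpha, samples, labels):
    """Equation (4): negative mean log-probability of the true categories."""
    log_likelihood = 0.0
    for x, y in zip(samples, labels):
        probs = softmax_probs(alpha, x)
        # The indicator 1{y_i = k} keeps only the term with k = y_i.
        log_likelihood += math.log(probs[y - 1])
    return -log_likelihood / len(samples)


# Hypothetical data: M = 3 categories, two-dimensional samples.
alpha = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
samples = [[1.0, 2.0], [0.5, -1.0]]
labels = [1, 3]

print(softmax_probs(alpha, samples[0]))
print(softmax_cost(alpha, samples, labels))
```

With all-zero weights every category receives probability 1/M, so the cost equals log M, a convenient sanity check before training begins.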
Figure 12: The relationship between the classification accuracy of the model and the number of iterations (x-axis: iterations, 50 to 500; y-axis: recognition accuracy, 0 to 0.6; curves: LeNet, AlexNet, VggNet).
Figure 14: Iterative comparison of model loss value before and after optimization (x-axis: iterations, 0 to 70; y-axis: loss value, 0 to 4; curves: Train_optimization, Test_optimization, Train_no_optimization, Test_no_optimization).

The experimental comparison of classification accuracy against the number of iterations for common network models shows that the model proposed in this paper is superior to the other models in classification accuracy. Comparing the classification accuracy of the deep learning model on the training set and the test set before and after optimization shows that the accuracy of image classification can be significantly improved after a certain degree of optimization.

6. Conclusion

Aiming at the problems of large time overhead and low classification accuracy in traditional image classification methods, a deep learning model of image classification based on machine learning was proposed in this paper. By analyzing the basic theory of neural networks, this paper described the types of convolution neural network and their application in image classification. Using an existing convolution neural network model, through noise reduction and parameter adjustment in the feature extraction process, a deep learning model of image classification based on an improved convolution neural network was proposed. In order to improve the classification efficiency and accuracy of

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This work was supported by the Shijiazhuang Posts and Telecommunications Technical College.

References

[1] S. H. Kim and H. L. Choi, “Convolutional neural network-based multi-target detection and recognition method for unmanned airborne surveillance systems,” International Journal of Aeronautical and Space Sciences, vol. 20, no. 4, pp. 1038–1046, 2019.
[2] P. W. Song, H. Y. Si, H. Zhou, R. Yuan, E. Q. Chen, and Z. D. Zhang, “Feature extraction and target recognition of moving image sequences,” IEEE Access, vol. 8, pp. 147148–147161, 2020.
[3] W. Y. Zhang, X. H. Fu, and W. Li, “The intelligent vehicle target recognition algorithm based on target infrared features combined with lidar,” Computer Communications, vol. 155, pp. 158–165, 2020.
[4] M. Li, H. P. Bi, Z. C. Liu et al., “Research on target recognition system for camouflage target based on dual modulation,” Spectroscopy and Spectral Analysis, vol. 37, no. 4, pp. 1174–1178, 2017.
[5] S. J. Wang, F. Jiang, B. Zhang, R. Ma, and Q. Hao, “Development of UAV-based target tracking and recognition systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 8, pp. 3409–3422, 2020.
[6] O. Kechagias-Stamatis and N. Aouf, “Evaluating 3D local descriptors for future LIDAR missiles with automatic target recognition capabilities,” The Imaging Science Journal, vol. 65, no. 7, pp. 428–437, 2017.
[7] M. Ding, Z. J. Sun, L. Wei, Y. F. Cao, and Y. H. Yao, “Infrared target detection and recognition method in airborne photoelectric system,” Journal of Aerospace Information Systems, vol. 16, no. 3, pp. 94–106, 2019.
[8] W. L. Xue and T. Jiang, “An adaptive algorithm for target recognition using Gaussian mixture models,” Measurement, vol. 124, pp. 233–240, 2018.
[9] F. Liu, T. S. Shen, S. J. Guo, and J. Zhang, “Multi-spectral ship target recognition based on feature level fusion,” Spectroscopy and Spectral Analysis, vol. 37, no. 6, pp. 1934–1940, 2017.