
SPECIAL SECTION ON GIGAPIXEL PANORAMIC VIDEO WITH VIRTUAL REALITY

Received June 7, 2020, accepted June 21, 2020, date of publication June 30, 2020, date of current version July 21, 2020.
Digital Object Identifier 10.1109/ACCESS.2020.3006097

Artificial Intelligence Image Recognition Method Based on Convolutional Neural Network Algorithm
YOUHUI TIAN
Jiangsu Vocational Institute of Commerce, Nanjing 211168, China
e-mail: [email protected]

ABSTRACT The convolutional neural network performs well in image processing and is widely used there,
owing to its local receptive fields, weight sharing, pooling, and sparse connections. To improve the
convergence speed and recognition accuracy of the convolutional neural network, this paper proposes a new
convolutional neural network algorithm. First, a recurrent neural network is introduced into the
convolutional neural network, and the deep features of the image are learned in parallel by the two
networks. Second, following ResNet's idea of skipping convolutional layers, a new residual module,
ShortCut3-ResNet, is constructed. Third, a dual optimization model is established to optimize the
convolution and full connection processes jointly. Finally, the effects of the network parameters on
network performance are analyzed through simulation experiments, and the optimal parameters are set
accordingly. Experimental results show that the proposed algorithm can learn diverse image features and
improves both the feature-extraction accuracy and the image recognition ability of the convolutional
neural network.

INDEX TERMS Convolutional neural network, artificial intelligence, image recognition.

I. INTRODUCTION
With the rapid development of the mobile Internet, the widespread use of smartphones, and the popularization of social media and self-media, a large amount of picture information has been generated [1], [2]. As pictures become an important carrier of network information, problems also arise. Traditional information is recorded in words, so the required content can be retrieved and processed by searching for keywords. When information is expressed in pictures, however, it cannot be retrieved or processed in the same way. Pictures give us a convenient way to record and share information, but the information they express is difficult to use. How to use a computer to classify and recognize image data intelligently is therefore particularly important [3], [4].

In traditional pattern recognition methods, the key step is to describe the image with a mathematical statistical model after extracting a certain number of hand-crafted feature points [5], [6]. The image is then identified by image matching. The basic principle is that similar samples lie close together in the pattern space and form a "cluster", which is combined with a classifier for classification and recognition. For example, object recognition uses scale-invariant feature transform (SIFT) features, face recognition uses local binary pattern (LBP) features, and pedestrian detection uses histogram of oriented gradients (HOG) features, but such shallow machine learning methods have low recognition rates. With the development of artificial intelligence, continuous breakthroughs in deep learning have brought great success in speech recognition, natural language processing, computer vision, video analysis, multimedia, and other fields [7]–[9]. More and more companies and researchers use deep learning to study image classification, which has promoted the development of artificial intelligence.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhihan Lv.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 8, 2020 125731
Y. Tian: Artificial Intelligence Image Recognition Method Based on Convolutional Neural Network Algorithm

In previous years, most methods used shallow structural models to process data, and these models had at most one or two layers of nonlinear features. The most representative shallow structures are the Gaussian Mixture Model [10], K-means clustering [11], the Support Vector Machine [12], and Logistic Regression [13]. A convolutional neural network can extract the connections and spatial information between its layers from the image and can express the relevant characteristics inside the image [14], [15]. Image recognition based on deep learning mainly consists of feeding the image into the neural network and using forward propagation and error back propagation to minimize the loss function. After the weights are updated, a better recognition model is obtained. This model is then used to identify new images.

In practical applications, CNN has been used in many visual pattern recognition systems. Morvan et al. [16] proposed the CNN structure LeNet for handwritten digit recognition. Convolutional neural networks are also used for facial recognition and facial localization [17]. Parmar et al. [18] used convolutional neural networks to detect faces and facial expressions. Grossberg [19] used shunt suppression convolutional neural networks to detect eye and face images. Nguyen et al. [20] used convolutional neural networks to detect text images. Wang et al. [21] won the large-scale image recognition competition with the classic AlexNet model and reduced the false recognition rate to 17%. After the success of AlexNet, researchers proposed other network models, such as VGGNet [22], GoogleNet [23], and ResNet [24]. AlexNet uses the ReLU function to replace the activation functions commonly used in traditional neural networks [25]. Compared with the sigmoid and Tanh functions, ReLU involves no exponential calculation, so its computational cost is small, and it does not saturate; because of its linear, non-saturating form, it speeds up network convergence. To solve the problem that the ReLU function is not differentiable at the origin, Montanelli and Du [26] proposed a sparse ReLU function. Wang et al. [27] proposed the parametric ReLU (PReLU) and showed experimentally that it achieves good results in big data classification tasks. In recent years, the study of convolutional neural networks has been inseparable from transfer learning. Transfer learning is a method that uses knowledge that has already been learned to solve problems in new fields [28]. Feng et al. [29] pre-trained convolutional neural networks on large image datasets and then trained and tested the pre-trained networks on the image datasets to be classified. Compared with the traditional method of training the network directly on the target dataset, the image recognition rate of this method is greatly improved. Zhang et al. [30] proposed an algorithm based on hierarchical sparse coding (HSC), which extracts features through hierarchical pooling and sparse processing and obtains good results in handwriting recognition and multi-class object recognition. Carroll et al. [31] proposed a scattering convolutional neural network (Scatter-Net) based on the wavelet transform, which uses the wavelet transform to extract high-frequency image information hierarchically instead of a parameter learning process, and which shows good performance in image recognition and classification tasks. Hu et al. [32] proposed the PCAnet model, which initializes the CNN convolution layer parameters by extracting the principal-component features of the image and has achieved good results in image recognition tasks. Sadr et al. [34] pointed out that the structure of the convolutional neural network itself is the main factor that allows the network to extract multi-level and multi-scale features. Guo et al. [35] combined a convolutional neural network and a recurrent neural network into a new deep learning structure. The convolutional neural network learns the shallow features of the original image and passes them as input to the recurrent neural network, which learns the high-level features; the structure achieves a good recognition rate in color-depth image recognition. Building on [35], Bernal et al. [36] added inputs to the convolutional neural network and proposed a multi-scale convolutional recurrent neural network. After local contrast normalization and sampling, these inputs were fed to the recurrent neural network to extract more abstract high-level features.

Although there are many image recognition algorithms based on convolutional neural networks and their recognition performance is very good, many of them design the depth and number of layers of the network for a specific database. The best parameters and optimization algorithms are found through continuous exploration. The human factor is relatively large, and there is no systematic theory of what affects the recognition performance of the convolutional neural network. Especially when classifying and recognizing natural images, the choice of the initial parameters of the convolutional neural network and of the optimization algorithm has a great impact on network training. If the choice is poor, the network may not work at all, or it may suffer from local minima, under-fitting, over-fitting, and many other problems [37], [38].

To improve the ability of the convolutional neural network to classify and recognize two-dimensional images, speed up the convergence of the algorithm, reduce the number of iterations, shorten the training period, and achieve good classification results, this paper proposes a new convolutional neural network algorithm. First, a recurrent neural network is introduced into the convolutional neural network, and the deep features of the image are learned in parallel by the two networks. Second, following ResNet's idea of skipping convolutional layers, a new residual module, ShortCut3-ResNet, is constructed. Finally, a dual optimization model is established to optimize the convolution and full connection processes jointly.

Specifically, the technical contributions of our paper can be summarized as follows:


This paper proposes a new convolutional neural network algorithm. The network can learn the multi-scale and diverse features of the image. Moreover, without adding parameters or computation, it improves the accuracy and recognition ability of convolutional neural network feature extraction.

The rest of the paper is organized as follows. Related work is introduced in Section II. Section III describes the structure of the convolutional neural network algorithm proposed in this paper. Experimental results and analysis are discussed in detail in Section IV. Finally, Section V concludes the paper.

II. RELATED WORKS
A. BASIC INTRODUCTION TO CONVOLUTIONAL NEURAL NETWORKS
In recent years, researchers have applied CNN to other fields, such as speech recognition [39], face recognition, object recognition, natural language processing [40], and brain wave analysis [41]. Work in these fields continues in many directions, and some breakthroughs have been made. Compared with ordinary neural networks, a CNN contains a feature extractor, which consists of a convolution layer and a down-sampling layer. A neuron is connected only to a subset of the neurons in the previous layer, and this subset is called its local receptive field. A convolutional layer generally contains multiple feature maps; each feature map is composed of a specific number of neurons, and the neurons within the same feature map share their weights. The main benefit of weight sharing is that it reduces the connections between the layers of the network, reduces the number of network parameters, and at the same time helps prevent overfitting. In general, the initial values of the convolution kernel are randomly generated. During network training, the weights are learned and updated continuously until reasonable weights are obtained. Down-sampling, also called pooling, is a special convolution process. CNN therefore has three main features: local receptive fields, weight sharing, and pooling [42].

1) LOCAL RECEPTIVE FIELD
In CNN, each neuron of the hidden layer is connected to a small area of the input layer, and each connection has a weight and an offset that can be learned. This area is called the local receptive field of the hidden layer neuron. Each neuron corresponds to one local receptive field.

2) WEIGHT SHARING
For a local receptive field of 25 pixels, each neuron in the hidden layer has a 5 × 5 set of weights. Weight sharing means that these neurons in the hidden layer all use the same weights. Because of weight sharing, the number of network parameters and the training time are greatly reduced.

3) POOLING
The pooling layer generally follows the convolution layer. Its purpose is to compress the image after convolution and so reduce the number of parameters.

B. THE STRUCTURE OF CONVOLUTIONAL NEURAL NETWORK
The convolutional neural network combines deep learning algorithms with artificial neural networks and is widely used in image processing. It is generally composed of three parts: an input layer, hidden layers, and an output layer. The input layer is the original, unprocessed image; the output layer is the result of classifying the features; and the hidden layers are neuron layers with a complex multi-layer nonlinear structure, including convolution layers and sub-sampling layers. Convolutional neural networks extract and classify features in the hidden layers. Therefore, optimizing the convolutional layer and the single-layer perceptron can improve the accuracy of feature extraction and the classification result. The structure of the convolutional neural network is shown in Figure 1.

FIGURE 1. Structure of a convolutional neural network.

Figure 1 shows the structure of a convolutional neural network whose hidden part contains only 2 convolutional layers (C1, C3) and 2 sub-sampling layers (S2, S4). The input data in the figure is the original image, and the output layer assigns it to one of the categories A to G. The repeated structure composed of a layer C and a layer S serves as the basic unit of feature extraction. After multiple feature extractions, the final feature map is rasterized into a one-dimensional vector, which forms the fully connected layer. The fully connected layer and the output layer are fully connected to each other to produce the final output.

1) CONVOLUTIONAL LAYER
Convolutional layers are the most common hidden layers in CNN. At present, CNN structures tend to stack several convolutional layers in succession, followed by a pooling layer.

The function of convolution is to convolve the input image with a filter so that the information of the original image is enhanced and noise interference is suppressed. The convolution process reflects the characteristics of local receptive fields and weight sharing. In convolution,
the network automatically learns the features without manual feature selection, which saves time and effort. Suppose an image of size 4 × 4 is convolved with a 2 × 2 convolution kernel and the sliding step (stride) of the kernel is one; the convolution operation is then as shown in Figure 2.

FIGURE 2. Process of convolution operation.

2) POOLING LAYER
The pooling layer is also a very common type of hidden layer in CNN. Because the local features in an image are correlated, pooling the image can greatly reduce the amount of computation without losing the main features of the image. Assuming that the image size is 4 × 4, a kernel of size 2 × 2 is used, and the sliding step of the kernel is set to two, the common pooling methods (maximum pooling, average pooling, and random pooling) proceed as shown in Figure 3.

FIGURE 3. The process of pooling operation.

3) ACTIVATION LAYER
Another important hidden layer of CNN is the activation layer. When solving more complex problems, the activation function adds nonlinear factors to the neural network so that the neural network can adapt to more complex problems. Commonly used activation functions include the sigmoid function, the Tanh function, the ReLU function, and the Leaky ReLU function. The sigmoid function is given by formula (1), the Tanh function by formula (2), and the ReLU function by formula (3). Addressing the gradient problem caused by negative ReLU inputs leads to the Leaky ReLU function, given by formula (4).

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
$$\mathrm{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{2}$$
$$f(x) = \max(0, x) \tag{3}$$
$$f(x) = \max(\alpha x, x) \tag{4}$$

The corresponding graphs of these four functions are shown in Figure 4 below.

FIGURE 4. Curves of four activation functions.

4) FULLY CONNECTED LAYER
The fully connected layers are the last few layers in the CNN and act as the classifier of the entire network [43]. If the layer before a fully connected layer is also a fully connected layer, a 1 × 1 convolution is used. If the layer before it is a convolutional layer, a global convolution of size h × w is used, where h and w are the height and width of the previous layer's convolution result, respectively.

C. ResNet INTRODUCTION
Since AlexNet, cutting-edge CNNs have become steadily deeper [44], [45]. For example, AlexNet has five convolutional layers; VGGNet and GoogleNet have 19 and 22 convolutional layers, respectively. However, it is not feasible to increase the network depth by simply stacking layers. Assuming there is a shallow network, multiple maps are stacked on
this network to form a deep network. In theory, the training error of this deep network should not be higher than that of the shallow network. However, experimental results show that such a deep network cannot be found. Many experiments also show that the deep network has a higher training error than the shallow network on the same dataset, as shown in Figure 5.

FIGURE 5. The error of the 20-layer and 56-layer networks on the CIFAR-10 dataset.

In Figure 5, the experimental results on CIFAR-10 show that the error of the 56-layer network is higher than that of the 20-layer network. This phenomenon is called the degradation problem.

Because of vanishing gradients, training a deep neural network becomes very difficult. The vanishing gradient problem means that when the gradient is propagated back to earlier layers, repeated multiplication makes the gradient extremely small. Therefore, as the network continues to deepen, its performance gradually saturates.

Drawing on the idea of cross-layer connections in highway networks, ResNet was proposed [46]. The core of the highway network is the addition of two nonlinear transformation layers to the ordinary neural network: one is T (the transform gate) and the other is C (the carry gate), as shown in equation (5).

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) \tag{5}$$

Suppose that the goal of a sub-module of the neural network is to learn $H(x)$. If this target mapping is complicated, it is difficult for the neural network to learn it. In that case, the module need not learn the target mapping directly; instead, it learns the difference $H(x) - x$. This difference is called the residual, that is, $F(x) = H(x) - x$. The original target mapping is therefore $H(x) = F(x) + x$, which constitutes ResNet. This cross-layer connection structure breaks the convention of traditional neural networks that the output of layer n−1 can only be passed to layer n: the output of a layer can skip several layers and serve directly as the input of a later layer, which alleviates the problem of vanishing or exploding gradients. The residual module can be expressed as formula (6) and formula (7).

$$y_i = F(x_i, W_i) + h(x_i) \tag{6}$$
$$x_{i+1} = f(y_i) \tag{7}$$

Here, $x_i$ is the input vector and $x_{i+1}$ is the output vector. $F$ represents the residual mapping that the residual structure needs to learn and can be expressed as $F = W_2 \sigma(W_1 x)$. $f$ represents the activation function. $h(x_i) = x_i$ is the cross-layer connection.

In CNN, however, the number of convolution kernels also increases as the network deepens. When the two dimensions do not match, a special convolution kernel $W_s$ can be applied in a convolution operation to make them match.

$$y_i = F(x_i, W_i) + W_s h(x_i) \tag{8}$$

The combination of the residual block and batch normalization (BN) in ResNet solves the gradient dispersion problem well. BN plays a normalizing role in CNN, as shown in Figure 6. Neural networks need to learn how the input data is distributed. If the training data and the test data come from different distributions, the trained network generalizes poorly. Moreover, if the distribution of each training batch is different, the network has to learn a different distribution each time, which directly increases the training time. The distribution of the data also matters for the activation function: when the range of the data is too large, the nonlinear characteristic of the activation function cannot be exploited well.

FIGURE 6. Schematic diagram of normalization.

Figure 6 shows that, before normalization, many parts of the data may lie in the saturation region of the activation function, producing very similar outputs, which is not conducive to network learning. Adding a BN layer solves this problem: the data is normalized before it is sent to the next layer.

The forward propagation process of BN is as follows:


Assuming that the input data is $x_i$, first calculate the mean of the data, as shown in equation (9).

$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i \tag{9}$$

Here, $m$ is the size of the mini-batch, so equation (9) gives the mini-batch mean. Then calculate the variance of the mini-batch, as shown in equation (10).

$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{10}$$

Then normalize the input data as shown in equation (11), where $\varepsilon$ is a small constant that keeps the denominator away from zero.

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \tag{11}$$

The output of the BN layer is then obtained as shown in equation (12).

$$y_i = \gamma \hat{x}_i + \beta = \mathrm{BN}_{\gamma,\beta}(x_i) \tag{12}$$

In this formula, $\gamma$ is the scaling factor and $\beta$ is the offset coefficient. The essence of the BN layer is to learn these two parameters so that the network can learn the feature distribution of the output.

III. IMAGE RECOGNITION ALGORITHM BASED ON CNN
This paper first introduces a recurrent neural network into the convolutional neural network and uses the two networks to learn the deep features of the image in parallel. Second, following ResNet's idea of skipping convolutional layers, a new residual module, ShortCut3-ResNet, is constructed. Finally, a dual optimization model is established to optimize the convolution and full connection processes jointly. These three parts are explained in turn below.

A. RECURRENT NEURAL NETWORK
The recursive neural network is similar to a combination of the convolution operation and the sampling operation. It reduces the feature dimension layer by layer by repeatedly applying the same set of weights to selected receptive fields; its structure is shown in Figure 7.

FIGURE 7. Schematic diagram of the structure of a recurrent neural network.

Here, $K_1$ is the number of feature maps output by the first-level network. The bottom feature map has size 4 × 4, and its entries are the units of the bottom layer. Let the receptive field be 2 × 2 and the connection weight be $W$. Each unit of the second-layer feature map is connected to a 2 × 2 receptive field of the bottom-layer feature map, which yields a feature map of size 2 × 2. In the same way, the second-layer feature map is reduced to a feature map of size 1 × 1 after passing through one more layer of the recurrent neural network.

B. CONSTRUCT A NEW RESIDUAL MODULE
The ShortCut in ResNet skips two convolutional layers and connects to the corresponding output layer, as shown in Figure 8(a).

FIGURE 8. ShortCut in ResNet.

Following ResNet's idea of skipping convolutional layers, this paper builds a new residual module that skips three convolutional layers, as shown in Figure 8(b). The resulting ResNet is called ShortCut3-ResNet. The most important part of ResNet is the shortcut. Although its presence makes the network look more complicated, it adds no parameters or computation, yet it improves recognition accuracy.

The design of the ShortCut3-ResNet network draws on the design of VGGNet. All convolution kernels in the network are 3 × 3, and the network can be divided into 3 segments according to the number of convolution kernels. Each segment has 2n layers (n >= 3). Each layer in the first segment has 16 convolution kernels, each layer in the second segment has 32, and each layer in the third segment has 64. In summary, apart from the first layer and the last layer, the entire network has 6n hidden layers.
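The segment layout described above can be tallied with a short sketch. It is only an illustration under stated assumptions: the function names `shortcut3_resnet_plan` and `shortcut_spans` and the tuple layout are hypothetical, the kernel count of the first layer is left as `None` because the text does not state it, and the shortcuts are assumed to tile the 6n hidden layers in consecutive groups of three. The first 3 × 3 convolutional layer and the final global average-pooling layer are the two layers the paper adds to the 6n hidden layers to reach 6n + 2.

```python
def shortcut3_resnet_plan(n=3):
    """List the layers of a (6n + 2)-layer ShortCut3-ResNet as (name, kind, kernels).

    Hypothetical helper: it only enumerates layers, it does not build a network.
    """
    if n < 3:
        raise ValueError("the text requires n >= 3")
    # First layer: one 3x3 convolution (kernel count not stated in the text).
    plan = [("conv_first", "conv3x3", None)]
    # Three segments of 2n hidden 3x3 conv layers with 16, 32 and 64 kernels.
    for segment, kernels in enumerate((16, 32, 64), start=1):
        for layer in range(1, 2 * n + 1):
            plan.append((f"seg{segment}_conv{layer}", "conv3x3", kernels))
    # Last layer: global average pooling instead of a fully connected layer.
    plan.append(("global_avg_pool", "avgpool", None))
    return plan


def shortcut_spans(plan, skip=3):
    """Group the hidden conv layers into consecutive spans of `skip` layers.

    Assumption: one shortcut bypasses each span; the text does not spell out
    this grouping, so treat it as an illustration only.
    """
    hidden = [name for name, _kind, _kernels in plan[1:-1]]
    return [hidden[i:i + skip] for i in range(0, len(hidden), skip)]


plan = shortcut3_resnet_plan(n=3)
spans = shortcut_spans(plan)
print(len(plan))   # total layers, 6n + 2 (20 for n = 3)
print(len(spans))  # three-layer shortcut spans, 2n (6 for n = 3)
```

Because 6n is always a multiple of three, the hidden layers split evenly into 2n spans; when 2n is not itself a multiple of three, some spans cross a segment boundary, which is where the channel counts of the two branches differ.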


In the network structure described above, the first layer is a 3 × 3 convolutional layer. The last layer, however, no longer uses the fully connected layer of VGGNet but borrows a global average-pooling layer. This effectively avoids the problems of the fully connected layer: an excessive number of parameters, slow training, and a tendency to overfit.

When n is three, there are 18 hidden layers; adding the first convolutional layer and the final global average-pooling layer gives a 20-layer ShortCut3-ResNet. Table 1 describes the original ResNet and ShortCut3-ResNet network configurations for the 20-layer network. Based on this structure, a network with 6n + 2 (n >= 3) layers can be obtained, as shown in Figure 9.

TABLE 1. Two different ResNet structures.

FIGURE 9. Schematic diagram of network structure.

This network has three convolution segments. Depending on the depth of the network, each segment contains multiple convolutional layers. The number of convolution kernels is the same within a segment and increases in later segments. The dashed and solid ShortCut connections in Figure 9 differ. Because residual learning is $H(x) = F(x) + x$, it adds the two channels $F(x)$ and $x$. At a solid-line connection, the two channels have the same number of feature maps and can be added directly. A dashed line indicates that the two channels have different numbers, so the dimension of $x$ must be adjusted by a convolution operation, that is, by the convolution kernel $W_s$ in formula (8).

C. DOUBLE OPTIMIZATION
The design principle of the convolution optimization model is to optimize the weights of the convolution kernel. The weights and bias parameters are learned from small data blocks of the dataset to obtain a sparse feature matrix. The convolution kernel is initialized under the control of a convolution coefficient. Let matrix $X$ be the sample dataset, let $A$ be the base matrix used to transform $X$ from the sample space to the feature space, and let matrix $S$ be the feature table of the dataset. Given the objective function $J(A, S)$ and an initial value of $S$, reducing the objective function through iteration is the process of optimizing $S$. A good initial value of $S$ avoids poor convergence during the iteration and yields faster convergence and better results. $S$ is initialized and its features updated as follows:

$$S = G(W^{T} X) \tag{13}$$
$$S'_c = S_c / \lVert A_c \rVert \tag{14}$$

Here, $W^{T}$ is a random orthogonal matrix. Applying $W^{T}$ to the samples $X$ as a weighted transformation generates the initial value of matrix $S$. $S_c$ represents the c-th feature matrix of the matrix $S$, and $A_c$ represents the base matrix in $A$ corresponding to $S_c$. Let $M$ be an $m \times n$ matrix; then:

$$\lVert M \rVert_k = \Big( \sum_{i=1}^{m} \sum_{j=1}^{n} |m_{ij}|^{k} \Big)^{1/k} \tag{15}$$

This normalization maintains sparsity without affecting the performance of the algorithm. Through the above process, $S$ therefore obtains a good initial value. The objective function is:

$$J(A, S) = \lVert AS - X \rVert_2^2 + \gamma \lVert A \rVert_2^2 \tag{16}$$

The term $\lVert AS - X \rVert_2^2$ is the error between the sample set reconstructed from the base matrix and the feature set and the actual sample set. The term $\gamma \lVert A \rVert_2^2$ is the sparse control term, and $\gamma$ is the sparse coefficient. The value of $J(A, S)$ is the sum of the error term and the sparse control term.

Step 1 Randomly initialize the matrix $A$.
Step 2 With the given $A$ and $S$, use the gradient descent method to find a local minimum of the objective function $J(A, S)$. This yields the updated value $S'$ of $S$, where $\alpha$ is the step size that controls how far each update moves along the gradient direction. The calculation is:

$$S' = S - \alpha \frac{\partial J(A, S)}{\partial S} \tag{17}$$

Step 3 With the value $S'$ fixed, use the gradient descent method again to move toward a local minimum of the objective function and obtain the updated value $A'$ of A. Here $\alpha$ is the step size, which controls how far each update moves along the gradient direction. The calculation process is:

$$A' = A - \alpha \frac{\partial J(A, S')}{\partial A} \tag{18}$$

Step 4 Replace A and S with $A'$ and $S'$ respectively, and repeat Step 2 to Step 4.

As the number of iterations increases, the objective function gradually decreases along the negative gradient direction until the gradient vector approaches zero and the objective function no longer decreases, or its change can be ignored. The obtained feature matrix S is then randomly sampled to construct the initial weights of the convolution kernel. At the same time, the dynamically determined value $\mu$ replaces the constant $\mu_0$ as the convolution coefficient in the convolutional neural network, to realize the optimization of the convolution kernel.

Suppose the convolutional neural network contains a total of k convolutional layers and the size of each convolution kernel is $l_{ker} \times l_{ker}$. The input image of the convolutional layer is an $l_{img} \times l_{img}$ matrix. The numbers of input and output feature maps (or images) are $n_{in}$ and $n_{out}$, respectively.

Let the matrix $\mathrm{Opt}_1$ be the feature matrix S at which the objective function reaches its minimum value after multiple iterations. Use the convolution coefficient to optimize the convolution kernel: analyze the original convolution result through the dichotomy (bisection) method, and construct the function expression according to the interpolation principle. The dynamic convolution coefficient $\mu$ is expressed as follows:

$$\mu = \frac{n_{in} n_{out} l_{img}^2}{2k} + \theta_1 \tag{19}$$

Here, the variable $\theta_1$ is the correction error term. The numbers of parameters required for the input data and output data corresponding to the convolution kernel are as follows:

$$f_{in} = n_{in} l_{ker}^2 \tag{20}$$

$$f_{out} = n_{out} l_{ker}^2 \tag{21}$$

The initialization expression of the optimized convolution kernel is as follows:

$$\mathrm{Opt}_2 = 2 \sqrt{\frac{\mu}{f_{in} + f_{out}}} \times \mathrm{Opt}_1 \tag{22}$$

The variable $\mathrm{Opt}_2$ is the final optimized parameter matrix of the convolution kernel. Suppose the convolutional neural network contains a total of k convolutional layers, and all input images are divided into $n_{cag}$ categories. The number of iterations required for the convolutional neural network is $\varepsilon$. The number of feature maps generated by the last sub-sampling layer and received by the single-layer perceptron is $n_f$.

The optimization process of the fully connected parameters is similar to the convolution optimization, and the process is as follows.

Let $\mathrm{Opt}_3$ be a fully connected parameter matrix randomly initialized using the parameters $n_{cag}$ and $n_f l_{img}^2$. According to the operation process and classification results of the convolutional neural network, the parameter settings of the fully connected layer are affected by factors such as the number of iterations of the convolutional neural network. According to the interpolation principle, a function is constructed to optimize the parameters of the full connection, and $\rho$ is the optimization coefficient:

$$\rho = n_{cag} \left( \omega - \varepsilon k^{\varepsilon - 1} \right) / 2 \tag{23}$$

Here, the variable $\omega$ is a factor that affects the optimization coefficient. It is determined by factors such as the amount of data processed by the single-layer perceptron and the number of classifications. Let variable $\theta_2$ be the correction error term:

$$\omega = \lambda \left( \sqrt[\varepsilon]{n_f l_{img}^2 - n_{cag}} \right) - k + \theta_2 \tag{24}$$

$$\lambda = k + \sum_{i=0}^{\varepsilon - 1} i \tag{25}$$

Let $\mathrm{Opt}_4$ be the parameter matrix of the last fully connected layer. The optimized fully connected layer parameter expression is:

$$\mathrm{Opt}_4 = 2 \sqrt{\frac{\rho}{n_{cag} + n_f l_{img}^2}} \times \mathrm{Opt}_3 \tag{26}$$

D. CONVOLUTIONAL NEURAL NETWORK TRAINING PROCESS
A convolutional neural network is essentially a mapping from input to output. It can learn many input-output relationships without requiring any precise mathematical expression between input and output. Because the network performs supervised learning, its sample set consists of vector pairs, each made up of an input vector and an ideal output vector. The network training process is shown in Figure 10.

FIGURE 10. Network training flowchart.

Use small random numbers of different sizes to initialize the convolutional layer thresholds, the kernels of the two convolutional layers, and the connection weights between the network input layer and hidden layer and between the hidden layer and output layer. At the


same time, set the learning rate and the corresponding accuracy control parameters.

Because each convolutional layer has a threshold that can be trained and the weights of each convolution kernel are learnable parameters, the focus of the CNN weight update is the update of the convolution kernel weights and the convolutional layer thresholds.

1) UPDATE OF WEIGHT AND THRESHOLD OF NETWORK
The backward adjustment of the neural network follows the idea of gradient descent. When the network weights are updated, the parameters are always adjusted in the direction of error reduction.

2) WEIGHT UPDATE OF CONVOLUTION KERNEL
To calculate the weight update of the convolution kernel, the derivatives derived from the error sensitivities of the convolutional layer, the output layer, and the down-sampling layer are used. For the convolution kernel, we mainly adjust its weights. If the dimension of each convolution kernel is n, then it has n × n learnable parameters. The derivative with respect to the weights between the i-th feature map of layer l and the j-th feature map of layer l + 1 is required. The calculation method is shown in equation (27).

$$\frac{\partial loss}{\partial k_{ij}} = x_i^l \cdot \delta_j^{l+1} \tag{27}$$

Here, the symbol · refers to the convolution operation of the matrix. Suppose the size of the i-th feature map $x_i^l$ of layer l is 4 × 4, and the error sensitivity $\delta_j^{l+1}$ of the j-th feature map of layer l + 1 has size 3 × 3, as shown in Figure 11.

FIGURE 11. Calculation of feature map and weight derivative.

Then the derivative of the weight $K_{ij}$ of the convolution kernel has size 2 × 2, and the calculation method is shown in Figure 11. The variable $\delta_j^{l+1}$ is placed on the variable $x_i^l$. Slide $\delta_j^{l+1}$ from left to right and from top to bottom. At each position, multiply it by the corresponding values of $x_i^l$ and accumulate the products to obtain one entry of the derivative of the convolution kernel. After the derivative of the weights of the convolution kernel is obtained, it is used to update the corresponding $K_{ij}$ positions of the original convolution kernel.

To update the threshold (bias) of the convolutional layer, simply sum the error sensitivity of the j-th feature map of layer l + 1. The threshold update derivative is calculated as shown in equation (28).

$$\frac{\partial loss}{\partial k_{ij}} = \sum_{u,v} \left( \delta_j^{l+1} \right)_{u,v} \tag{28}$$

IV. EXPERIMENTS AND RESULTS
A. IMAGE DATA SET AND EXPERIMENTAL ENVIRONMENT
The experiments in this paper use the CIFAR-10 image dataset. CIFAR-10 is an image data set containing 60,000 color pictures. The size of each picture is 32 × 32. It is divided into 10 categories, and each category contains 6000 pictures.

CIFAR-10 is divided into five training files and one test file. Each file contains 10,000 pictures. The test file is composed of 1000 pictures randomly selected from each category. The training files contain the remaining pictures in random order. Therefore, although the training files together contain 5000 pictures from each category, an individual file may contain more pictures from one category than from another.

The TensorFlow framework is used in this experiment. It is an open source software library that uses data flow graphs for numerical calculations. A directed graph, consisting of a set of nodes, is used to represent the calculation of the data flow. A data flow diagram describing the convolution operation is shown in Figure 12.

FIGURE 12. Data flow graph of TensorFlow.

The hardware environment used in this experiment is as follows: Intel Socket 2011-v3 i7 processor, 128GB of


memory, four NVIDIA GeForce TITAN X 12GB GPUs, and the Ubuntu 14.04 operating system.

The evaluation criteria of the experiment in this paper are the test accuracy and the size of the model finally generated by training.

B. OPTIMAL SELECTION OF PARAMETERS

1) ACTIVATION FUNCTION
We use the sigmoid function, the Tanh function, and the ReLU function in turn as the activation function of the network. The experimental results are shown in Table 2.

TABLE 2. Experimental results of different activation functions.

It can be seen from Table 2 that the ReLU function performs better than the other functions. The main reason is that the ReLU function forces certain function values to zero. Therefore, the network still performs well without using a regularization method: this not only helps prevent the network from overfitting and accelerates the calculation, but also effectively prevents the gradient from vanishing. The sigmoid function and Tanh function cause the vanishing gradient problem when there are many network layers.

2) SAMPLING METHOD
After the feature map passes through the convolutional layer, its dimension is generally very large, which can easily cause the curse of dimensionality. Therefore, each convolutional layer in the convolutional neural network is connected to a sampling layer that down-samples the feature map to reduce its dimension and the computational complexity. The sampling layer is thus an essential part of the convolutional neural network structure. By sampling the feature map, the convolutional neural network can also tolerate small deformations. Common sampling methods are maximum sampling, mean sampling, and random sampling. Choosing the most suitable sampling method can improve the recognition efficiency and accuracy of the network. In this paper, three different network models were constructed using the three sampling methods. Except for the sampling method, all remaining parameters are the same. The experimental results are shown in Table 3 below.

TABLE 3. Experimental results of different sampling methods.

Three experiments were carried out for each model, and the average value of the three experiments was taken as the final recognition result. It can be seen from the above table that the recognition effect of maximum sampling and random sampling is better than that of average sampling, and the recognition rate of random sampling is slightly higher than that of maximum sampling. However, random sampling has a higher computational complexity than maximum sampling, converges more slowly, and takes longer per iteration. On balance, the maximum sampling method is adopted in this paper.

3) POOLING METHOD AND SIZE SELECTION
According to the structure of the network, three pooling methods are tested in its pooling layer: mean pooling, max pooling, and stochastic pooling. The pooling size is set to 2 × 2, 3 × 3, 4 × 4, and 5 × 5 in turn. The best results of the tests on the CIFAR-10 database are shown in Table 4 below. The test results of different pooling sizes are shown in Figure 13.

TABLE 4. Classification results of several pooling methods on the CIFAR-10 database.

FIGURE 13. Classification results of different pooling sizes on the CIFAR-10 database.

For max pooling and mean pooling, the smaller the pooling size, the better the effect. For the stochastic pooling method, the optimal size is 3 × 3. A smaller pooling size will


cause overfitting, and a larger pooling size will increase the error due to too much noise in down sampling.

C. MODEL CONVERGENCE TRAINING
The training set samples are used for training. The initial weights of the network follow a Gaussian distribution with zero mean and a standard deviation of 0.01. The number of sample iterations is 3000, the initial learning rate of the weight parameters is 0.001, and the momentum factor is 0.9. The training results are shown in Figure 14.

FIGURE 14. Preparation rate of network training and loss curve.

From the simulation results in Figure 14, it can be seen that the training accuracy of the designed algorithm increases rapidly with the number of iterations and then stabilizes. The classification results of the training set and the verification set are very close, with an accuracy rate of 0.985. The loss value of the objective function decreases rapidly and converges to about 0.05 at around 2500 iterations.

D. DOUBLE-OPTIMIZED PERFORMANCE ANALYSIS
According to different iteration times, three sets of iterative experiments were performed on the convolution-optimized, fully-connected-optimized, and double-optimized convolutional neural networks.

The mean square error curve varies with the training batch at different iteration times. When the training times are once, twice, and three times, the convergence curve of each algorithm is shown in Figure 15.

FIGURE 15. Convergence curve of each algorithm when training times are different.

It can be seen from Figure 15 that in the iterative experiment with three trainings, the fully connected optimization algorithm has a higher mean square error at the initial stage. Nevertheless, its rate of decline is faster. The convolution optimization algorithm declines faster than the original algorithm and the fully connected optimization algorithm, and it does not increase the mean square error at the initial stage as the fully connected optimization algorithm does. The double optimization algorithm converges only slightly faster than the convolution optimization algorithm, but it is clearly faster than the other two algorithms and is the fastest overall.

E. PERFORMANCE ANALYSIS OF IMAGE RECOGNITION
Since we have given a variety of ResNet variants with different topologies, we analyzed them at different depths and with different ShortCut lengths. The network has 6n + 2 layers. The experimental results for different values of n are shown in Figure 16.

Besides recognition accuracy, we also care about the training time of ResNet with different topologies. The experimental results show that at the same network depth, all structures of the network have the same training time. Figure 17 shows the training time of all networks at 20 layers and 110 layers.

In ResNet, the recognition rate of images can be improved as the network deepens. For 56-layer and 110-layer networks, the length of ShortCut that can be obtained is basically the same. Therefore, we can see that with a smaller ShortCut, the original ResNet and the ShortCut3-ResNet constructed in this paper achieve higher accuracy in deeper networks. Moreover, ShortCut3-ResNet improves more obviously. When ShortCut9 and ShortCut18 are selected, the accuracy of


the 110-layer network is even lower than that of the 56-layer network. For 32-layer and 152-layer networks, the values are the same. However, when ShortCut10 is taken, the recognition rate of the 32-layer network is better than that of the 152-layer network, as shown in Figure 18.

FIGURE 16. Accuracy of ResNet on CIFAR-10 under different deep networks and different ShortCut.

FIGURE 17. Comparison chart of training time.

FIGURE 18. Experimental analysis of the same network under different ShortCut.

From the analysis in Figure 19, it can be seen that the original ResNet and ShortCut3-ResNet keep improving as the network gets deeper. Nevertheless, for residual networks of other topologies, as the network deepens, the recognition effect becomes worse as the ShortCut length increases, indicating that the size of ShortCut has a great influence on the recognition results. When the length of ShortCut is less than six, the recognition accuracy generally improves as the network deepens. When it is equal to six, the accuracy remains unchanged, and when it is greater than six, the accuracy decreases. The recognition accuracy of the 110-layer ShortCut3-ResNet constructed in this paper is equivalent to that of the original 152-layer ResNet. Nevertheless, the 110-layer network has far fewer parameters than the 152-layer network. Therefore, the 110-layer ShortCut3-ResNet designed in this paper is a good network.

FIGURE 19. Experimental analysis of the same network under different deep networks.

Table 5 gives the test results on the data set for the algorithm designed in this paper and for other methods. The main evaluation criteria are test accuracy and the size of the model generated by the training.

In Table 5, the test methods are AlexNet, VGGNet, ResNet, and Random Forest, together with the HSC algorithm proposed in [30], Scatt-Net proposed in [31], and PCANet proposed in [32]. It can be obtained from Table 5 that the


random forest is inferior to the traditional CNN in network performance, and the algorithm proposed in this paper achieves higher test accuracy than the single-structure algorithms. The AlexNet network uses three fully connected layers, so it has a large number of training parameters, and the final trained model occupies more storage resources. Overall, the algorithm proposed in this paper extracts more diverse features and improves the test accuracy of the network. After combining the ultra-lightweight network structure, the number of parameters is appropriately reduced.

TABLE 5. Performance comparison of different algorithms on the CIFAR-10 dataset.

V. CONCLUSION
In order to improve the ability of the convolutional neural network to classify and recognize two-dimensional images and to speed up the convergence of the algorithm, this paper proposes a new convolutional network algorithm. First, a recurrent neural network is introduced into the convolutional neural network, and the deep features of the image are learned in parallel using the convolutional neural network and the recurrent neural network. In this way, the convolutional neural network learns high-level features, while the recursive neural network learns combinations of low-level features. Secondly, according to ResNet's idea of skipping convolutional layers, we construct a new residual module, ShortCut3-ResNet. Finally, the convolutional layer and the full connection process are optimized. Experiments show that the proposed convolutional neural network algorithm can improve the feature extraction accuracy and image recognition ability of the convolutional neural network.

REFERENCES
[1] J. Pan, “How Chinese officials use the Internet to construct their public image,” Political Sci. Res. Methods, vol. 7, no. 2, pp. 197–213, Apr. 2019.
[2] S. Liansheng, Z. Xiao, H. Chongtian, T. Ailing, and A. Krishna Asundi, “Silhouette-free interference-based multiple-image encryption using cascaded fractional Fourier transforms,” Opt. Lasers Eng., vol. 113, pp. 29–37, Feb. 2019.
[3] X. Zhu, Z. Li, X.-Y. Zhang, P. Li, Z. Xue, and L. Wang, “Deep convolutional representations and kernel extreme learning machines for image classification,” Multimedia Tools Appl., vol. 78, no. 20, pp. 29271–29290, Oct. 2019.
[4] F. Wang, D. Jiang, H. Wen, and H. Song, “AdaBoost-based security level classification of mobile intelligent terminals,” J. Supercomput., vol. 75, no. 11, pp. 7460–7478, Nov. 2019.
[5] L. Wen, K. Zhou, and S. Yang, “A shape-based clustering method for pattern recognition of residential electricity consumption,” J. Cleaner Prod., vol. 212, pp. 475–488, Mar. 2019.
[6] T. Zan, Z. Liu, H. Wang, M. Wang, and X. Gao, “Control chart pattern recognition using the convolutional neural network,” J. Intell. Manuf., vol. 31, no. 3, pp. 703–716, Mar. 2020.
[7] J. Yu, X. Zheng, and S. Wang, “A deep autoencoder feature learning method for process pattern recognition,” J. Process Control, vol. 79, pp. 1–15, Jul. 2019.
[8] D. Freire-Obregón, F. Narducci, S. Barra, and M. Castrillón-Santana, “Deep learning for source camera identification on mobile devices,” Pattern Recognit. Lett., vol. 126, pp. 86–91, Sep. 2019.
[9] X. Zhe, S. Chen, and H. Yan, “Directional statistics-based deep metric learning for image classification and retrieval,” Pattern Recognit., vol. 93, pp. 113–123, Sep. 2019.
[10] A. O’Hagan, T. B. Murphy, L. Scrucca, and I. C. Gormley, “Investigation of parameter uncertainty in clustering using a Gaussian mixture model via jackknife, bootstrap and weighted likelihood bootstrap,” Comput. Statist., vol. 34, no. 4, pp. 1779–1813, Dec. 2019.
[11] S. Wang, A. Gittens, and M. W. Mahoney, “Scalable kernel K-means clustering with Nyström approximation: Relative-error bounds,” J. Mach. Learn. Res., vol. 20, no. 1, pp. 431–479, 2019.
[12] F. Karimi, S. Sultana, A. Shirzadi Babakan, and S. Suthaharan, “An enhanced support vector machine model for urban expansion prediction,” Comput., Environ. Urban Syst., vol. 75, pp. 61–75, May 2019.
[13] P. Sur and E. J. Candès, “A modern maximum-likelihood theory for high-dimensional logistic regression,” Proc. Nat. Acad. Sci. USA, vol. 116, no. 29, pp. 14516–14525, Jul. 2019.
[14] D. Zhu, F. Zhang, S. Wang, Y. Wang, X. Cheng, Z. Huang, and Y. Liu, “Understanding place characteristics in geographic contexts through graph convolutional neural networks,” Ann. Amer. Assoc. Geographers, vol. 110, no. 2, pp. 408–420, Mar. 2020.
[15] A. Jati and P. Georgiou, “Neural predictive coding using convolutional neural networks toward unsupervised learning of speaker characteristics,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 10, pp. 1577–1589, Oct. 2019.
[16] M. Morvan, D. Arangalage, G. Franck, F. Perez, L. Cattan-Levy, I. Codogno, M.-P. Jacob-Lenet, C. Deschildre, C. Choqueux, G. Even, J.-B. Michel, M. Bäck, D. Messika-Zeitoun, A. Nicoletti, G. Caligiuri, and J. Laschet, “Relationship of iron deposition to calcium deposition in human aortic valve leaflets,” J. Amer. College Cardiol., vol. 73, no. 9, pp. 1043–1054, Mar. 2019.
[17] M. F. Hansen, M. L. Smith, L. N. Smith, M. G. Salter, E. M. Baxter, M. Farish, and B. Grieve, “Towards on-farm pig face recognition using convolutional neural networks,” Comput. Ind., vol. 98, pp. 145–152, Jun. 2018.
[18] K. Parmar, H. Kher, and M. Gandhi, “Facial expression recognition using convolutional neural network,” J. Open Source Develop., vol. 6, no. 1, pp. 18–27, 2019.
[19] S. Grossberg, “The resonant brain: How attentive conscious seeing regulates action sequences that interact with attentive cognitive learning, recognition, and prediction,” Attention, Perception, Psychophys., vol. 81, no. 7, pp. 2237–2264, Oct. 2019.
[20] H. T. Nguyen, C. T. Nguyen, T. Ino, B. Indurkhya, and M. Nakagawa, “Text-independent writer identification using convolutional neural network,” Pattern Recognit. Lett., vol. 121, pp. 104–112, Apr. 2019.
[21] R. Wang, J. Xu, and T. X. Han, “Object instance detection with pruned AlexNet and extended training data,” Signal Process., Image Commun., vol. 70, pp. 145–156, Feb. 2019.
[22] P. Matlani and M. Shrivastava, “Hybrid deep VGG-NET convolutional classifier for video smoke detection,” Comput. Model. Eng. Sci., vol. 119, no. 3, pp. 427–458, 2019.
[23] R. U. Khan, X. Zhang, and R. Kumar, “Analysis of ResNet and GoogleNet models for malware detection,” J. Comput. Virol. Hacking Techn., vol. 15, no. 1, pp. 29–37, Mar. 2019.
[24] D. McNeely-White, J. R. Beveridge, and B. A. Draper, “Inception and ResNet features are (almost) equivalent,” Cognit. Syst. Res., vol. 59, pp. 312–318, Jan. 2020.
[25] S. Scardapane, S. Van Vaerenbergh, S. Totaro, and A. Uncini, “Kafnets: Kernel-based non-parametric activation functions for neural networks,” Neural Netw., vol. 110, pp. 19–32, Feb. 2019.
[26] H. Montanelli and Q. Du, “New error bounds for deep ReLU networks using sparse grids,” SIAM J. Math. Data Sci., vol. 1, no. 1, pp. 78–92, Jan. 2019.


[27] S.-H. Wang, K. Muhammad, J. Hong, A. K. Sangaiah, and Y.-D. Zhang, “Alcoholism identification via convolutional neural network based on parametric ReLU, dropout, and batch normalization,” Neural Comput. Appl., vol. 32, no. 3, pp. 665–680, Feb. 2020.
[28] U. Cote-Allard, C. L. Fall, A. Drouin, A. Campeau-Lecours, C. Gosselin, K. Glette, F. Laviolette, and B. Gosselin, “Deep learning for electromyographic hand gesture signal classification using transfer learning,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 27, no. 4, pp. 760–771, Apr. 2019.
[29] S. Feng, H. Zhou, and H. Dong, “Using deep neural network with small dataset to predict material defects,” Mater. Des., vol. 162, pp. 300–310, Jan. 2019.
[30] Y. Zhang, Y. Qu, C. Li, Y. Lei, and J. Fan, “Ontology-driven hierarchical sparse coding for large-scale image classification,” Neurocomputing, vol. 360, pp. 209–219, Sep. 2019.
[31] E. L. Carroll, R. Gallego, M. A. Sewell, J. Zeldis, L. Ranjard, H. A. Ross, L. K. Tooman, R. O’Rorke, R. D. Newcomb, and R. Constantine, “Multi-locus DNA metabarcoding of zooplankton communities and scat reveal trophic interactions of a generalist predator,” Sci. Rep., vol. 9, no. 1, pp. 1–14, Dec. 2019.
[32] F. Hu, M. Zhou, P. Yan, K. Bian, and R. Dai, “PCANet: A common solution for laser-induced fluorescence spectral classification,” IEEE Access, vol. 7, pp. 107129–107141, 2019.
[33] Y. Wang, G. Wang, C. Chen, and Z. Pan, “Multi-scale dilated convolution of convolutional neural network for image denoising,” Multimedia Tools Appl., vol. 78, no. 14, pp. 19945–19960, Jul. 2019.
[34] H. Sadr, M. M. Pedram, and M. Teshnehlab, “A robust sentiment analysis method based on sequential combination of convolutional and recursive neural networks,” Neural Process. Lett., vol. 50, no. 3, pp. 2745–2761, Dec. 2019.
[35] Z. Guo, X. Lv, L. Yu, Z. Zhang, and S. Tian, “Identification of hepatitis B using Raman spectroscopy combined with gated recurrent unit and multiscale fusion convolutional neural network,” Spectrosc. Lett., vol. 53, no. 4, pp. 277–288, Apr. 2020.
[36] J. Bernal, K. Kushibar, D. S. Asfaw, S. Valverde, A. Oliver, R. Martí, and X. Lladó, “Deep convolutional neural networks for brain image analysis on magnetic resonance imaging: A review,” Artif. Intell. Med., vol. 95, pp. 64–81, Apr. 2019.
[37] A. Kamilaris and F. X. Prenafeta-Boldú, “A review of the use of convolutional neural networks in agriculture,” J. Agricult. Sci., vol. 156, no. 3, pp. 312–322, Apr. 2018.
[38] F. Samadi, G. Akbarizadeh, and H. Kaabi, “Change detection in SAR images using deep belief network: A new training approach based on morphological images,” IET Image Process., vol. 13, no. 12, pp. 2255–2264, Oct. 2019.
[39] W. Jing, T. Jiang, X. Zhang, and L. Zhu, “The optimisation of speech recognition based on convolutional neural network,” Int. J. High Perform. Comput. Netw., vol. 13, no. 2, pp. 222–231, 2019.
[40] S. Bacchi, L. Oakden-Rayner, T. Zerner, T. Kleinig, S. Patel, and J. Jannes, “Deep learning natural language processing successfully predicts the cerebrovascular cause of transient ischemic attack-like presentations,” Stroke, vol. 50, no. 3, pp. 758–760, Mar. 2019.
[41] Y. Zhang, X. Zhang, H. Sun, Z. Fan, and X. Zhong, “Portable brain-computer interface based on novel convolutional neural network,” Comput. Biol. Med., vol. 107, pp. 248–256, Apr. 2019.
[42] C. Xu, J. Yang, H. Lai, J. Gao, L. Shen, and S. Yan, “UP-CNN: Un-pooling augmented convolutional neural network,” Pattern Recognit. Lett., vol. 119, pp. 34–40, Mar. 2019.
[43] Y.-D. Zhang, Z. Dong, X. Chen, W. Jia, S. Du, K. Muhammad, and S.-H. Wang, “Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation,” Multimedia Tools Appl., vol. 78, no. 3, pp. 3613–3632, Feb. 2019.
[44] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph CNN for learning on point clouds,” ACM Trans. Graph., vol. 38, no. 5, pp. 1–12, Nov. 2019.
[45] S. H. S. Basha, S. R. Dubey, V. Pulabaigari, and S. Mukherjee, “Impact of fully connected layers on performance of convolutional neural networks for image classification,” Neurocomputing, vol. 378, pp. 112–119, Feb. 2020.
[46] L. Su, L. Ma, N. Qin, D. Huang, and A. H. Kemp, “Fault diagnosis of high-speed train bogie by residual-squeeze net,” IEEE Trans. Ind. Informat., vol. 15, no. 7, pp. 3856–3863, Jul. 2019.

YOUHUI TIAN received the master’s degree from the Heilongjiang University of Science and Technology, in 2013. He is currently a Senior Engineer with the Jiangsu Vocational Institute of Commerce. His research interests include network technology and information systems.

Visual Sensor Network: Exploring the Power of Visual Sensor Networks in Computer Vision
Fouad Sabry
No ratings yet
Fruit Old
No ratings yet
Fruit Old
37 pages
Research Paper
No ratings yet
Research Paper
12 pages
Application of Deep Learning in Image Recognition
No ratings yet
Application of Deep Learning in Image Recognition
8 pages
SSRN Id3354412
No ratings yet
SSRN Id3354412
8 pages
Image Recognition
No ratings yet
Image Recognition
18 pages
A Survey On Computer Vision Algorithms
No ratings yet
A Survey On Computer Vision Algorithms
16 pages
Visual Image Understanding
No ratings yet
Visual Image Understanding
7 pages
4 100593163merged
No ratings yet
4 100593163merged
11 pages
Paper 12
No ratings yet
Paper 12
3 pages
Comparative Analysis of Different Convolutional Neural Network Algorithm For Image Classification
No ratings yet
Comparative Analysis of Different Convolutional Neural Network Algorithm For Image Classification
13 pages
Master's Thesis Deep Learning For Visual Recognition: Remi Cadene Supervised by Nicolas Thome and Matthieu Cord
No ratings yet
Master's Thesis Deep Learning For Visual Recognition: Remi Cadene Supervised by Nicolas Thome and Matthieu Cord
58 pages
Sagar Paper
No ratings yet
Sagar Paper
4 pages
Feature Extraction Using Convolution Neural Networks (CNN) and Deep Learning
No ratings yet
Feature Extraction Using Convolution Neural Networks (CNN) and Deep Learning
5 pages
Facial Recognition Using Deep Learning
No ratings yet
Facial Recognition Using Deep Learning
6 pages
Unit-5 DL
No ratings yet
Unit-5 DL
35 pages
Multi-Layered Deep Convolutional Neural Network For Object Detection
No ratings yet
Multi-Layered Deep Convolutional Neural Network For Object Detection
6 pages
Ijet 10892
No ratings yet
Ijet 10892
5 pages
Deep Learning Artificial Intelligence
No ratings yet
Deep Learning Artificial Intelligence
9 pages
PHD Visual Object Category Recognition
No ratings yet
PHD Visual Object Category Recognition
193 pages
50 Breakthrough AI Concepts in 500 Words Each: In 500 words, #17
From Everand
50 Breakthrough AI Concepts in 500 Words Each: In 500 words, #17
Nietsnie Trebla
No ratings yet
CNN 5
No ratings yet
CNN 5
8 pages
Pietrow 2017
No ratings yet
Pietrow 2017
5 pages
Classify Webcam Images Using Deep Learning
No ratings yet
Classify Webcam Images Using Deep Learning
17 pages
Multi Distance Metric Network For Few Shot Learning: Farong Gao Lijie Cai Zhangyi Yang Shiji Song Cheng Wu
No ratings yet
Multi Distance Metric Network For Few Shot Learning: Farong Gao Lijie Cai Zhangyi Yang Shiji Song Cheng Wu
12 pages
I JP Am Image Classification
No ratings yet
I JP Am Image Classification
15 pages
Transfer Learning For Object Detection Using State-of-the-Art Deep Neural Networks
No ratings yet
Transfer Learning For Object Detection Using State-of-the-Art Deep Neural Networks
7 pages
Real Time Object Recognition and Classification
No ratings yet
Real Time Object Recognition and Classification
6 pages
A Study On Effects of Data Augmentation in Detection
No ratings yet
A Study On Effects of Data Augmentation in Detection
13 pages
Glossary: Harnessing The Disruption of Generative AI: Term Explanation
No ratings yet
Glossary: Harnessing The Disruption of Generative AI: Term Explanation
15 pages
Iso Iec 42001 Lead Auditor Elearning
No ratings yet
Iso Iec 42001 Lead Auditor Elearning
10 pages
An Overview of AI Applications in Wildlife Conservation: Binod Kumar
No ratings yet
An Overview of AI Applications in Wildlife Conservation: Binod Kumar
30 pages
GenAI 360 Degrees
No ratings yet
GenAI 360 Degrees
39 pages
Presentation 1 - Introdution To Computer
No ratings yet
Presentation 1 - Introdution To Computer
72 pages
B 5 - HubSpot and Motion AI Chatbot Enabled CRM
No ratings yet
B 5 - HubSpot and Motion AI Chatbot Enabled CRM
2 pages
Lecture 1 (Compatibility Mode)
No ratings yet
Lecture 1 (Compatibility Mode)
55 pages
Module 6 EET
No ratings yet
Module 6 EET
27 pages
Marketing Notes
No ratings yet
Marketing Notes
43 pages
IF4071 - Deep Learning Laboratory
No ratings yet
IF4071 - Deep Learning Laboratory
1 page
CV Li-Chen Fu 20231121 1
No ratings yet
CV Li-Chen Fu 20231121 1
10 pages
Dav 1 Unit
No ratings yet
Dav 1 Unit
30 pages
10 Detection of Cotton Plant Diseases Using Deep Transfer Learning
No ratings yet
10 Detection of Cotton Plant Diseases Using Deep Transfer Learning
19 pages
3-2-2. Soft Computing
No ratings yet
3-2-2. Soft Computing
2 pages
Statement of Purpose Galway
100% (2)
Statement of Purpose Galway
2 pages
Prompt Engineering Guide
No ratings yet
Prompt Engineering Guide
4 pages
A Study On Emerging Trends in Indian Startup Ecosystem - Big Data, Crowd Funding, Shared Economy
No ratings yet
A Study On Emerging Trends in Indian Startup Ecosystem - Big Data, Crowd Funding, Shared Economy
16 pages
Fractal Analytics
No ratings yet
Fractal Analytics
3 pages
Case AI
No ratings yet
Case AI
11 pages
Lex Eloquentia - 2
No ratings yet
Lex Eloquentia - 2
24 pages
Artificial Intelligence in ANESTHESIA
No ratings yet
Artificial Intelligence in ANESTHESIA
84 pages
Alisha Final Report
No ratings yet
Alisha Final Report
36 pages
Encyclopedia of Computer Graphics and Games-Springer (2024)
No ratings yet
Encyclopedia of Computer Graphics and Games-Springer (2024)
2,153 pages
AICTE QIP Broucher
No ratings yet
AICTE QIP Broucher
3 pages
Artificial Intelligence (Ai) : Prima Nur Pratama Fadhil Arif Fathoni Anas Rachmadi
No ratings yet
Artificial Intelligence (Ai) : Prima Nur Pratama Fadhil Arif Fathoni Anas Rachmadi
13 pages
Batch 10 Signature Verification
No ratings yet
Batch 10 Signature Verification
12 pages
An Intelligent Industrial Visual Monitoring and Maintenance Framework Empowered by Large-Scale Visual and Language Models
No ratings yet
An Intelligent Industrial Visual Monitoring and Maintenance Framework Empowered by Large-Scale Visual and Language Models
10 pages
Iemb2023 166 171
No ratings yet
Iemb2023 166 171
6 pages
Conversational AI Playbook Chatlayer
No ratings yet
Conversational AI Playbook Chatlayer
11 pages

INDEX TERMS Convolutional neural network, artificial intelligence, image recognition.

I. INTRODUCTION In traditional pattern recognition methods, the most impor-

125732 VOLUME 8, 2020

This paper proposes a new convolutional neural network

3) POOLING

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
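Equation (1) can be evaluated directly, but a naive implementation overflows for large negative inputs. The following is a minimal Python sketch of a numerically stable evaluation; the function name `sigmoid` and the branch on the sign of x are illustrative choices, not part of the original method.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-x}), as in equation (1)."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the algebraically equal form e^x / (1 + e^x)
    # so that exp never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)
```

Both branches compute the same function; only the branch that keeps the exponent non-positive is used for each input.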

The corresponding graphs of these four functions are

FIGURE 2. Process of convolution operation.

FIGURE 4. Curves of four activation functions.

4) FULLY CONNECTED LAYER

$$y_i = F(x_i, W_i) + W_s h(x_i) \tag{8}$$
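As an illustration of equation (8), the sketch below adds the output of a residual branch F to a linearly projected shortcut W_s h(x). It assumes that F is an arbitrary callable standing in for the stacked convolution layers, that h defaults to the identity mapping, and that inputs are plain Python lists; the names `residual_unit`, `matvec`, `residual_fn`, and `w_s` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_unit(
    x: Vector,
    residual_fn: Callable[[Vector], Vector],
    w_s: Matrix,
    h: Callable[[Vector], Vector] = lambda v: v,
) -> Vector:
    """Compute y = F(x) + W_s h(x), following equation (8).

    residual_fn plays the role of F(x, W) (assumed to hold its own weights);
    w_s is the shortcut projection; h is assumed to be the identity by default.
    """
    fx = residual_fn(x)
    shortcut = matvec(w_s, h(x))
    if len(fx) != len(shortcut):
        raise ValueError("residual branch and shortcut must have equal size")
    return [a + b for a, b in zip(fx, shortcut)]
```

With an identity matrix for `w_s`, the unit reduces to the plain shortcut y = F(x) + x; a non-square `w_s` is only needed when the two branches differ in dimension.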

The combination of residual block and BN in ResNet can

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) \tag{5}$$
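Equation (5) mixes a transformed signal H with the raw input x through two gates, T and C. The sketch below assumes the products are element-wise and that the three terms H, T, and C have already been evaluated as lists of equal length; the function name `gated_combine` and its argument names are hypothetical.

```python
from typing import List


def gated_combine(
    h_out: List[float],
    t_gate: List[float],
    c_gate: List[float],
    x: List[float],
) -> List[float]:
    """Element-wise y = H * T + x * C, following equation (5).

    h_out, t_gate and c_gate are assumed to be the precomputed values of
    H(x, W_H), T(x, W_T) and C(x, W_C) respectively.
    """
    if not (len(h_out) == len(t_gate) == len(c_gate) == len(x)):
        raise ValueError("all inputs must have the same length")
    return [h * t + xi * c for h, t, c, xi in zip(h_out, t_gate, c_gate, x)]
```

Setting T to 1 and C to 0 passes only the transformed signal, while T at 0 and C at 1 passes the input through unchanged.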

Suppose that the goal of the neural network complex sub-

Among them, the variable m represents the size of mini-

Then normalize the input data as shown in equation (11).

$$y_i = \gamma \hat{x}_i + \beta = \mathrm{BN}_{\gamma,\beta}(x_i) \tag{12}$$

In the formula, the variable γ represents the scaling factor.
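The batch normalization steps can be sketched for a single feature over one mini-batch of size m. This is a minimal illustration that assumes the standard formulation (mini-batch mean, biased mini-batch variance, and a small constant `eps` added for numerical stability); the normalization expression of equation (11) is not reproduced here, so `eps` and the function name `batch_norm` are assumptions.

```python
import math
from typing import List


def batch_norm(
    xs: List[float], gamma: float = 1.0, beta: float = 0.0, eps: float = 1e-5
) -> List[float]:
    """Normalize a mini-batch, then scale and shift as in equation (12)."""
    m = len(xs)  # mini-batch size
    mean = sum(xs) / m
    var = sum((x - mean) ** 2 for x in xs) / m  # biased variance (assumed)
    # x_hat is the normalized input; eps guards against division by zero.
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in xs]
    # y_i = gamma * x_hat_i + beta
    return [gamma * v + beta for v in x_hat]
```

With the default `gamma = 1` and `beta = 0`, the output has approximately zero mean and unit variance; other values of the two parameters rescale and shift the normalized distribution.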

III. IMAGE RECOGNITION ALGORITHM BASED ON CNN

This normalization process can maintain sparsity with-

The initialization expression of optimized convolution ker-

2) WEIGHT UPDATE OF CONVOLUTION KERNEL

FIGURE 11. Calculation of feature map and weight derivative.

B. OPTIMAL SELECTION OF PARAMETERS

3) POOLING METHOD AND SIZE SELECTION

It can be seen from Figure 15 that in the iterative exper-

E. PERFORMANCE ANALYSIS OF IMAGE RECOGNITION

FIGURE 16. Accuracy of ResNet on CIFAR-10 under different deep

FIGURE 18. Experimental analysis of the same network under different

FIGURE 17. Comparison chart of training time.

the 110-layer network is even lower than that of the 56-layer
