AR Dynamic Image Recognition Technology Based on a Deep Learning Algorithm
This work was supported by the Natural Science Foundation of China under Grant U1904119 (Research on the perception of foreign object invasion in airport clearance areas based on one-class learning) and Grant 51705472 (Balancing Dynamics and Optimization of Mixed-model Assembly Line Networks for Complex Products based on Digital Twin).
ABSTRACT Augmented reality is a research hotspot that developed out of virtual reality, and its friendly human-computer interaction interface gives the technology broad application prospects. Convolutional neural networks, a core technique of deep learning, are widely used in computer vision and have become an important tool for dynamic image recognition tasks. Combining deep learning with traditional machine learning techniques, this paper uses a convolutional neural network to extract features from image data; a plain convolutional neural network takes only the last layer of features and recognizes them with a softmax recognizer. This paper instead combines a convolutional neural network, which learns good feature information, with ensemble learning, which has a good recognition effect. In recognition tasks on the MNIST database and the CIFAR-10 database, comparison experiments were performed by adjusting the hierarchical structure, activation function, descent algorithm, data enhancement, pooling choices, and number of feature maps of the improved convolutional neural network. The convolutional neural network uses a pooling size of 3×3, more kernels (64 and above), small receptive fields (2×2), and a deeper hierarchical structure, together with the ReLU activation function, gradient descent with momentum, and an enhanced data set. The results show that under these experimental conditions, the dynamic image recognition error rate drops very low on the MNIST database, and the error rate on the CIFAR-10 database is also satisfactory.
INDEX TERMS Dynamic image recognition, deep learning, CNN-XGBoost model, augmented reality, ensemble learning.
This improves the recognition speed and accuracy of the overall model [10,11]. Related scholars have proposed the Deep Belief Network (DBN), which stacks a series of Restricted Boltzmann Machines (RBMs), uses unsupervised layer-by-layer greedy training to extract features, obtains a multi-layer deep network structure, and then applies supervised fine-tuning [12-14]. This research solved the problem of vanishing gradients in deep network training and started the upsurge of deep learning research [15,16]. GoogLeNet uses a modular network structure and divides the entire Inception network into 9 modules [17]. By increasing the depth and width of the model, it not only realizes a sparse network structure but also exploits the high computing performance of dense matrices to achieve precise identification and detection [18-20]. The GoogLeNet network uses average pooling instead of a fully connected layer, and two auxiliary softmax branches are added to the network to propagate gradients forward, avoiding gradient disappearance during model training [21,22]. AlexNet uses the ReLU activation function, which not only fundamentally alleviates the vanishing-gradient problem of deep networks but also greatly speeds up convergence [23]. Since the ReLU function suppresses the vanishing-gradient problem well, AlexNet does not use the "pre-training + fine-tuning" method but adopts fully supervised training [24]. AlexNet also extends the LeNet-5 structure, adding a Dropout layer and an LRN layer to reduce overfitting and enhance generalization [25,26].

Image recognition technology recognizes targets in an image; that is, it uses computer technology to simulate the human senses and complete the process of image recognition and understanding [27]. Recognition of target objects in images is a key research direction in the field of image recognition and has been widely applied in security, transportation, and the Internet [28]. Target recognition tasks can be divided into object recognition and object detection [29]. Object recognition only needs to describe the characteristics of the target object in the image, while object detection must obtain not only the feature description of the object but also its specific location [30,31]. Therefore, in addition to characterizing the target, object detection also requires analysis of the object structure [32]. Object recognition mainly focuses on feature learning. Relevant scholars applied the Bag of Words (BoW) model from text recognition to image object recognition and proposed a visual bag-of-words model for image recognition [33-35]. Through low-level feature extraction and feature coding, the representation becomes more discriminative and robust; the feature expression of the whole image is then obtained through a feature aggregation operation [36], and finally a support vector machine performs the recognition. The image recognition algorithm based on the bag-of-words model uses the Harris-Laplace operator and the Laplacian-of-Gaussian operator to perform corner detection and edge detection on the edge and texture features of the image, respectively, and extracts the low-level features of the image [37,38]. In addition to the Scale Invariant Feature Transform (SIFT) algorithm for feature description, the local feature descriptor Spin Image is also used to summarize the two-dimensional coordinate distribution histogram around the feature points [39,40]. The algorithm model also adopts a dense feature extraction method based on a fixed grid to extract features at multiple scales [41].

This paper conducts experiments on the MNIST database and the CIFAR-10 database covering convolution kernel size and number, pooling size and method, parameter update algorithm, activation function, and data enhancement, and analyzes the results. Specifically, the technical contributions of this article can be summarized as follows:

First: We combine the multi-layer features of the convolutional neural network model in deep learning with the traditional machine learning technique, the XGBoost algorithm. The features extracted by the convolutional neural network are serially fused, and Principal Components Analysis (PCA) is used to reduce their dimension.

Second: In the dynamic image recognition experiments on these two databases, the number of selected kernels is 64, the receptive field is 3×3, stochastic pooling of size 3×3 is used, Stochastic Gradient Descent (SGD) with momentum is the optimization algorithm, ReLU is the activation function, the amount of data is increased, and a 5-layer deep convolutional neural network is used, achieving a good recognition effect.

The rest of this article is organized as follows. Section II analyzes the related theories and technologies of augmented reality dynamic image recognition and deep learning. Section III constructs the CNN-XGBoost augmented reality dynamic image recognition model. Section IV conducts simulation experiments and analyzes the results. Section V summarizes the paper.

II. RELEVANT THEORIES AND TECHNOLOGIES OF AUGMENTED REALITY DYNAMIC IMAGE RECOGNITION AND DEEP LEARNING

A. Key technologies of augmented reality
The structure of the augmented reality system is shown in Figure 1. It is implemented by a group of closely linked software and hardware components working in real time [42]. On the one hand, the image collected by the camera is displayed directly on the display device, presenting the real scene to the user; the image of the virtual object generated by the computer is also transferred to the display device [43,44]. In the integration of virtual and real, the precise alignment of virtual and real scenes depends on the support of the registration system [45]. Finally, the user is presented with a fused virtual-real scene.
FIGURE 1. Structure of the augmented reality system: an HD camera and image preprocessing feed the image recognition and multi-eye vision tracking/registration subsystems, a virtual scene generation system produces the virtual object model, coordinate calibration relates the real and virtual camera coordinates, and the virtual-real synthesis system drives the augmented reality display and human-computer interaction systems.
1) Tracking technology
In an augmented reality system, the picture the user sees changes as the viewing angle changes, so the system must accurately track the user's location, line of sight, and other information in real time. The performance of the tracking technology determines the performance of the augmented reality system.

Four sensor-based tracking technologies are commonly used in augmented reality systems: magnetic field tracking, optical tracking, acoustic tracking, and inertial tracking.

A magnetic field tracking system is usually composed of a control component, a signal transmitter, and a signal receiver. The transmitter and receiver are built from mutually orthogonal electromagnetic induction coils. The signal generator produces a magnetic field through its coil, and the receiver senses the field and generates a corresponding induced current. From the receiver's current signal, the algorithm in the control unit calculates the position and direction of the tracking target relative to the receiver.

Magnetic field tracking is not limited by line of sight or obstacles and, apart from conductive and magnetic objects, is free from interference by surrounding objects; its refresh rate is high and its real-time performance good. In addition, the sensing device in a magnetic field tracking system is small and light, which is convenient for users. Magnetic field tracking is therefore mainly applied in small-area augmented reality applications without conductive or magnetic objects.

In optical tracking technology, the light source and photosensitive device are diverse: the photosensitive device can be an ordinary camera or a photodiode. Because the transmission medium of an optical tracking system is an optical signal, signal reception is fast and the refresh rate high, which suits occasions with strict real-time requirements. However, optical tracking requires an unobstructed path between the sensor and the optical element, and optical tracking systems are relatively expensive.
Acoustic tracking technology uses ultrasound to track the target position. Compared with magnetic field tracking, acoustic tracking is immune to magnetic interference, and the system costs much less than other tracking systems. However, as with optical tracking, the reflection and diffraction of sound waves by obstacles can affect the accuracy of the system. In addition, because the propagation speed of ultrasonic waves is low, the data refresh rate of the system is low and its real-time performance poor.

Inertial tracking technology tracks with inertial sensors. A gyroscope measures the three-degree-of-freedom rotational motion of the tracking target to determine the orientation of the head, while an accelerometer measures the motion acceleration of the head to determine its position. The inertial tracking device is light and portable, suitable for dynamic and outdoor tracking; combined with GPS, it can achieve a good outdoor tracking effect. But inertial tracking errors accumulate, so the accuracy is not high, and the equipment is relatively expensive.

2) Display technology
A basic problem in the design of augmented reality systems is the fusion of virtual information with real scenes. The final effect of the augmented reality system is displayed to the user through various means, and the display effect determines the user's intuitive experience; display technology is therefore very important in augmented reality systems.

In video see-through display, the helmet blocks the user's direct line of sight, and one or two cameras shoot the real scene. The camera video and computer graphics are combined by a scene synthesizer, and the fused result is transmitted to the display in front of the user. In optical see-through display, the user sees through the synthesizer not only the real scene in front but also the virtual image reflected by the synthesizer.

Video see-through display offers many positioning methods, a wider field of view, and more flexible scene synthesis, allowing delay matching and brightness matching. Optical see-through display offers high resolution, good safety, and a simple structure, and requires no compensation for visual deviation.

The simplest and most widely used display method in augmented reality systems is the ordinary monitor; its schematic diagram is shown in Figure 2. On an ordinary display, the real scene captured by the camera and the virtual information generated by the computer are combined and shown together.
FIGURE 2. Schematic diagram of display-based augmented reality: the camera image passes through image recognition, image object tracking and management, and image event management modules; gaze, gesture, and image interaction events trigger registered response functions; the graphic system synthesizes virtual objects with the captured scene; and the enhanced scene images are shown on the monitor.
In this way, the error is continuously minimized until the set expectations are reached.

2) Deep learning
With the deepening of shallow learning, the SVM model developed fastest. Because shallow models have only one hidden layer or none at all, they are easy to train and are used for simple or constrained problems, but their ability to handle complex problems such as sound, natural language, and images is limited.

Deep learning is a newer branch of machine learning. Its essence is that, through a hierarchical structure with staged information processing, unsupervised feature extraction, pattern analysis, and classification can be explored; it uses a multi-layer neural network structure to realize the machine learning algorithm. When training a deep network, since a superposition of linear functions is still a linear function, the hidden-layer neurons must use nonlinear activation functions to increase the representational ability of the deep network. In the field of dynamic image recognition, deep network learning can be understood as follows: the first layer learns edge features from image pixels; the second layer learns contour, edge, and corner features of the target; and higher layers abstract more essential and complex features from these. Deep learning trains on data, and its final purpose is feature learning and classification recognition.

Compared with traditional machine learning methods, deep learning is very practical in the face of massive data. Deep learning methods can reduce model bias through more complex models and thereby improve the accuracy of statistical estimation. In addition, deep learning is an end-to-end paradigm that discards intermediate steps built on hand-crafted rules and can transfer learned prior knowledge to other models. These advantages make deep learning methods well suited to dynamic image recognition.

D. Deep learning model
1) Multi-layer perceptron
The Multi-Layer Perceptron (MLP) is also called an artificial neural network. Besides the input and output layers, it can have many hidden layers in the middle, and adjacent MLP layers are fully connected. The bottom layer is the input layer, the middle layers are hidden layers, and the last is the output layer. The input-layer neurons receive information; for example, if an n-dimensional vector is input, there are n input neurons. The hidden-layer neurons process the input information and are fully connected to the input layer.

2) Capsule Network
A capsule is a carrier containing multiple neurons, each representing an attribute of a specific entity appearing in the image. These attributes can include many types of instantiation parameters, such as pose (position, size, orientation), deformation, velocity, hue, and texture. One special attribute of a capsule is the existence of an instance of a certain category in the image: the magnitude of its output is the probability that the entity exists.

A vector in mathematics has a direction and a length, and a capsule likewise has a "length" and a "direction". Suppose a capsule represents the human eye, a "human-eye capsule": its length then represents the probability that an eye exists at a certain position in the image, and its direction represents parameters of the eye, such as position, rotation angle, and sharpness.

CapsNets are composed of n sub-networks (capsules). Each capsule is dedicated to an individual task, and a capsule itself is realized by a multi-layer network. The output vector includes the state information of the object and the probability of its type. The output parameters of a lower-layer capsule are converted into the higher-layer capsule's prediction of the entity state; if the predictions are consistent, the parameters of that layer are output.

The CapsNet model also uses a convolutional structure for feature extraction, but the PrimaryCaps (main capsule) layer divides the data into multiple units under multiple channels, generating a vector that retains spatial information for each unit. This structure replaces the pooling layer of a traditional convolutional network and effectively reduces information loss. The last layer is similar to a fully connected layer, but each neuron is transformed into a capsule structure for classification output; it is called the DigitCaps layer.
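The paper does not state the capsule nonlinearity explicitly; as a hedged illustration of how a capsule vector's length can be kept in [0, 1) so it reads as an existence probability, the following is a minimal sketch of the standard squash function from the CapsNet literature (the example vector is purely illustrative):

```python
import numpy as np

def squash(s, eps=1e-9):
    """Standard CapsNet squash: shrink a capsule vector so its length lies
    in [0, 1) while preserving its direction."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

# A hypothetical "human-eye capsule": the output length is the probability
# that an eye is present; the direction encodes pose parameters.
s = np.array([2.0, -1.0, 0.5])
v = squash(s)
print(np.linalg.norm(v))  # < 1, interpretable as an existence probability
```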
3) Convolutional neural network
The Convolutional Neural Network (CNN) was originally inspired by the biological vision system and was designed as a multi-layer perceptron model for recognizing two-dimensional data. A CNN is essentially the combination of a feature extractor and a classifier: through continuous feature learning on the input image, it obtains a set of feature vectors closest to the meaning of the image, which are then fed to the tail classifier to classify and identify the data.

Figure 3 shows the overall structure of the convolutional neural network. The input layer is usually a matrix, such as an image. From the perspective of a feedforward network, the convolutional and pooling layers can be regarded as hidden layers with special functions, and the other layers besides the input layer are ordinary hidden layers. These hidden layers are computed according to different rules, and a learning (training) process is usually needed to tune most of the weight parameters.
FIGURE 3. CNN overall structure diagram: a 400×400 input passes through a convolutional layer (300×300 feature maps) and a pooling layer (80×80 feature maps) for feature learning before the output layer.
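As a concrete illustration of this convolution-pooling-classifier composition, the following is a minimal PyTorch sketch; the channel counts and layer sizes are illustrative assumptions, not the exact architecture evaluated later in the paper:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Feature extractor (conv + pool blocks) followed by a softmax classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 32x32 -> 16x16
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, num_classes),     # logits; softmax at inference
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: one CIFAR-10-sized batch.
logits = SmallCNN()(torch.randn(4, 3, 32, 32))
probs = logits.softmax(dim=1)
```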
PCA maps the data to lower dimensions and identifies the most important features. Suppose the dimension of the feature vector is 2, so the original data has two features. The PCA method computes the covariance matrix of the data set and then the eigenvalues and eigenvectors of that matrix. The eigenvector corresponding to the small eigenvalue is the secondary linear component, and the feature dimension can be reduced to 1.

The main idea of PCA is to transform the data from the original coordinate system to a new coordinate system determined by the data itself; that is, the n-dimensional features are mapped to k-dimensional features. These k dimensions are brand-new orthogonal features, known as principal components. The first principal component is chosen along the direction with the largest data variation (i.e., the largest variance); the second principal component is the direction with the second-largest variation that is orthogonal to the first, and so on. Most of the variance is contained in the first k principal components, and the remaining components are almost zero. By selecting the matrix of eigenvectors corresponding to the k largest eigenvalues (that is, the largest variances), the original data can be transformed into the new space. The flow chart of principal component analysis for a multivariable series is shown in Figure 4.

FIGURE 4. Flow chart of principal component analysis for a multivariable series: standardize the selected image data series, compute the correlation matrix with its eigenvalues and eigenvectors, keep the eigenvectors whose cumulative contribution rate exceeds 0.9, check factor loads and eigenvalues against the thresholds (0.8 and 0.03), discard the remaining factor series, and output the principal components.
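The covariance-eigendecomposition procedure just described can be sketched directly in NumPy (a minimal sketch assuming rows are samples; in practice a library routine such as sklearn's PCA would be used):

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # k largest-variance directions
    return Xc @ top                         # coordinates in the new space

# 2-D toy data reduced to 1 dimension, as in the example above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[3.0, 1.0]]) + 0.1 * rng.normal(size=(100, 2))
Z = pca_reduce(X, k=1)
```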
C. XGBoost algorithm
The base learner of the XGBoost algorithm can be a linear recognizer. Whereas the GBDT algorithm uses only the first derivative during optimization, the XGBoost algorithm performs a second-order Taylor expansion of the loss function, introducing both the first and second derivatives. At the same time, the cost function introduces regularization to control the model complexity. Suppose data set $D$ has $n$ samples and $m$ features, expressed as

$$D = \{(X_i, y_i)\} \quad (|D| = n,\ X_i \in \mathbb{R}^m,\ y_i \in \mathbb{R}) \tag{2}$$

We use $K$ functions to predict the final output:

$$\hat{y}_i = \sum_{k=0}^{K-1} f_k(X_i), \quad f_k \in F \tag{3}$$

where $F$ represents the set of regression decision trees:

$$F = \{f(X) = w_{q(X)}\} \quad (q: \mathbb{R}^m \to T,\ w \in \mathbb{R}^T) \tag{4}$$

Here $q(X)$ represents the structure of the tree and maps a sample to its leaf node, $w$ is the vector of leaf weights, and $w_{q(X)}$ is the prediction of the regression decision tree for the sample. $T$ is the number of leaf nodes of the tree, and each $f_k$ corresponds to an independent tree structure with its leaf weights. Unlike a recognition tree, each leaf node of a regression tree holds a continuous value. For a given example, the tree structure (that is, a given $q$) routes it to the corresponding leaf node, and the final prediction is computed by summing the weights $w$ of the corresponding leaf nodes.

When building the model, we want the loss function to be as small as possible. It is impossible to enumerate all tree structures, so a greedy method is used: each time a leaf node is split, the reduction in the loss function before and after the split is computed, and the split that reduces it the most is chosen. Let $I_L$ and $I_R$ be the left and right sample sets after splitting a leaf node, with $I = I_L \cup I_R$, and let $g_i$ and $h_i$ be the first and second derivatives of the loss for sample $i$. The loss reduction after splitting is

$$L_{split} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma \tag{5}$$

where $\lambda$ and $\gamma$ are regularization coefficients.
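As a hedged illustration of Eq. (5), the following is a minimal sketch of the split-gain computation (not XGBoost's actual implementation, which additionally uses the sorted block structure and approximate split finding described below):

```python
import numpy as np

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Loss reduction of Eq. (5) for splitting one leaf.
    g, h: first/second derivatives of the loss for the samples in the leaf.
    left_mask: boolean array marking samples routed to the left child."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[left_mask], h[left_mask])
                  + score(g[~left_mask], h[~left_mask])
                  - score(g, h)) - gamma

# Toy example: squared-error loss gives g = prediction - target, h = 1.
y, pred = np.array([1.0, 1.2, 3.9, 4.1]), np.zeros(4)
g, h = pred - y, np.ones(4)
print(split_gain(g, h, left_mask=np.array([True, True, False, False])))
```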
D. Combination of CNN and XGBoost
The convolutional neural network extracts feature information from image data very well: by feeding the image data directly into the convolutional neural network, the features carrying the important information in the image can be extracted. In terms of recognition, however, the convolutional neural network is not optimal. It uses only the softmax recognizer, which assigns a high value to one neuron and low values to all the others, polarizing the result. In practical applications this reduces the ability to correct errors, especially for easily confused images.
The recognition performance of ensemble learning is better: its recognition accuracy is high, it does not overfit easily, and it generalizes well. However, it is difficult for ensemble learning alone to learn the complex, deformable features of image data. Therefore, this article combines the convolutional neural network, which learns good features, with ensemble learning, which recognizes them well, so that the two complement each other and improve the accuracy of the recognition task.

In this paper, the XGBoost algorithm is selected from ensemble learning. Its high operating efficiency and high precision make XGBoost an important tool in target recognition tasks; it is a very sophisticated algorithm.

1) Reasons for the high operating efficiency of the XGBoost algorithm
The XGBoost implementation uses parallel processing and is dramatically faster than the standard gradient boosting algorithm. Gradient boosting is inherently sequential: the next base learner depends on the previous one, so the training of the base learners themselves cannot be parallelized. The most time-consuming step of building a decision tree is selecting the features and feature values at which to split the nodes. Before iterating, XGBoost sorts the data by feature and stores it in a block structure, with each block stored in compressed column format and each column sorted by its feature value. Each iteration of model building reuses this block structure, which reduces the amount of computation and allows the gain of each feature to be computed in parallel.

A greedy algorithm that considers every possible split point of every feature value is too inefficient, so XGBoost uses an approximate algorithm to speed up splitting. The algorithm first proposes candidate split points based on percentiles of the feature distribution, then maps the continuous features into the regions delimited by these candidate points, aggregates the statistics, and selects the best split scheme from the aggregated statistics. An important step in the approximate algorithm is proposing the candidate split points; to distribute them evenly over the data, percentiles of the feature values are usually chosen as candidates.

2) Reasons for the high accuracy of the XGBoost algorithm
The XGBoost algorithm adds regularization to the loss function to control the complexity of the model and uses pruning to improve the model's generalization ability, while the second-order Taylor expansion of the loss function improves the efficiency and accuracy of the model solution. The XGBoost algorithm also has built-in cross-validation, which makes it easier to choose good hyperparameters so that the model achieves better results.

Using the trained AlexNet model, we feed 45,000 images randomly drawn from the CIFAR-10 training set into the model and save the features of the last three layers of the network. These last three layers of features are serially fused, and the PCA algorithm is used for feature dimensionality reduction.

The XGBoost algorithm is then trained on the reduced features, and the parameters of the XGBoost recognizer are obtained through cross-validation. As with the AlexNet model, 45,000 images of the CIFAR-10 data set are used for the training set, 5,000 for the validation set, and 10,000 for the test set.
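A hedged sketch of this fuse-reduce-recognize pipeline is given below; the feature dimensions, sample count, and XGBoost parameters are illustrative assumptions (the paper tunes the recognizer by cross-validation), and the random arrays stand in for features actually exported from a trained AlexNet:

```python
import numpy as np
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

# Hypothetical stand-ins for the last three layers of CNN features
# (the paper uses 45,000 CIFAR-10 training images; reduced here for brevity).
rng = np.random.default_rng(0)
n = 1000
f1 = rng.normal(size=(n, 1024))   # e.g., flattened last conv layer
f2 = rng.normal(size=(n, 512))    # e.g., first fully connected layer
f3 = rng.normal(size=(n, 256))    # e.g., second fully connected layer
labels = rng.integers(0, 10, size=n)

# Serial fusion: concatenate the three feature matrices sample-wise.
fused = np.concatenate([f1, f2, f3], axis=1)

# PCA dimensionality reduction of the fused features.
reduced = PCA(n_components=128).fit_transform(fused)

# XGBoost recognizer on the reduced features (parameters would be tuned
# by cross-validation in practice).
clf = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1)
clf.fit(reduced, labels)
print(clf.score(reduced, labels))
```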
IV. EXPERIMENT AND ANALYSIS

A. Selection of the number of cores and the size of the receptive field
In a convolutional layer, the number of kernels (filters) determines the number of feature maps: the more kernels, the more feature maps are extracted, the larger the feature space the network can represent, the stronger its representational ability, and the more accurate the final identification.
FIGURE 5. Train and test error rates of different core number designs (three-layer convolution structures from 8-8-32 to 64-64-128): (a) on the MNIST database; (b) on the CIFAR-10 database.
To study the influence of the number of kernels on the performance of the convolutional neural network, we designed several structural models based on CNN-XGBoost. Keeping the hierarchical structure and other factors unchanged, we set the three-layer convolution structure to 8-8-32, 16-16-32, 32-32-32, 32-32-64, 64-64-64, and 64-64-128 and ran experiments on the two databases. The experimental results on the MNIST database are shown in Figure 5(a), and those on the CIFAR-10 data set in Figure 5(b).

To study the effect of the size of the receptive field (convolution kernel) on the performance of the convolutional neural network, we adjusted the size of the three-layer receptive field of the CNN-XGBoost structural model to 10×10, 9×9, 8×8, 7×7, 5×5, 3×3, and 2×2, kept the other factors unchanged, and experimented on the two databases. The experimental results on the MNIST database are shown in Figure 6(a), and those on the CIFAR-10 data set in Figure 6(b).
FIGURE 6. Train and test error rates of different receptive field sizes (10×10 down to 2×2): (a) on the MNIST database; (b) on the CIFAR-10 database.
The best test results on the CIFAR-10 database are shown in Figure 8(a), and the test results for different pooling sizes are shown in Figure 8(b). For max pooling and mean pooling, the larger the pooling size used, the better the effect. For the stochastic pooling method, the optimal size is 5×5: a smaller pooling size causes overfitting, while an excessively large pooling size increases the error because downsampling admits too much noise.
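Stochastic pooling samples one activation per window with probability proportional to its value; the paper does not spell out the procedure, so the following is a minimal sketch of the standard formulation (the window size and input map are illustrative):

```python
import numpy as np

def stochastic_pool(x, size, rng):
    """Stochastic pooling of a 2-D map: in each size x size window, sample
    one activation with probability proportional to its (non-negative) value."""
    h, w = x.shape[0] // size, x.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = x[i*size:(i+1)*size, j*size:(j+1)*size].ravel()
            p = win / win.sum() if win.sum() > 0 else np.full(win.size, 1 / win.size)
            out[i, j] = rng.choice(win, p=p)
    return out

rng = np.random.default_rng(0)
fmap = np.abs(rng.normal(size=(10, 10)))         # post-ReLU activations are non-negative
pooled = stochastic_pool(fmap, size=5, rng=rng)  # 10x10 -> 2x2
```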
FIGURE 8. CIFAR-10 database recognition results: (a) error rates of the mean, max, and stochastic pooling methods; (b) error rates of different pooling sizes (2×2, 3×3, 4×4, 5×5).
FIGURE 9. Error rate versus training epoch for the SGD, SGD with momentum, ADAGRAD, and NAG optimization algorithms: (a) on the MNIST database; (b) on the CIFAR-10 database.
The experiments show that SGD without momentum gradually reduces the error, but its descent speed and final effect are mediocre. SGD with momentum and NAG are both better than plain SGD, although their gradient descent is slower in the early stage, and SGD with momentum outperforms NAG. The descent of ADAGRAD is very stable, so its curve is smoother and its effect better.
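The momentum update preferred here is simple to state; the sketch below uses the momentum 0.9 and initial learning rate 0.01 mentioned in the conclusion, with a toy quadratic loss standing in for the network loss:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update: accumulate a velocity, then move along it."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy quadratic loss L(w) = 0.5 * ||w||^2, so grad = w.
w, v = np.array([5.0, -3.0]), np.zeros(2)
for _ in range(200):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(w)  # approaches the minimum at the origin
```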
D. Optimization of layer selection
To study the impact of each layer on the performance of the convolutional neural network, we removed some layers. CNN-XGBoost(a) deletes the fully connected layer of CNN-XGBoost; CNN-XGBoost(b) deletes the third convolution-pooling layer; CNN-XGBoost(c) deletes the third and fourth convolution-pooling layers; and CNN-XGBoost(d) deletes the third and fourth convolution-pooling layers together with the fully connected layer.

Using the above five architectures and keeping the other parameters the same as CNN-XGBoost, we performed recognition experiments on the MNIST database; the results are shown in Figure 10(a). With the same five architectures we performed recognition experiments on the CIFAR-10 data set; the results are shown in Figure 10(b).
FIGURE 10. Train and test error rates of the different hierarchical structures CNN-XGBoost and CNN-XGBoost(a)-(d): (a) on the MNIST database; (b) on the CIFAR-10 database.
From the experimental results, deleting the top fully connected layer does not greatly affect the results, while deleting one convolution-pooling layer causes a marked drop, showing that the convolution-pooling layers do influence the results. When we delete two convolution-pooling layers, the error grows especially fast. The experiments thus verify that the depth of the network strongly influences the results: the deeper the network, the higher the accuracy, and the learned features also differ. However, the deeper the network, the more parameters it has and the greater its complexity, so the choice of additional layers must also be optimized for the data set.

E. Change of activation function
Activating neurons requires an activation function; commonly used ones are sigmoid, tanh, ReLU, LReLU, and PReLU. By changing the activation function of CNN-XGBoost, experiments were conducted on the two data sets; the results are shown in Figure 11(a) and Figure 11(b).
FIGURE 11. Recognition results of the sigmoid, tanh, ReLU, LReLU, and PReLU activation functions: (a) on the MNIST database; (b) on the CIFAR-10 database.
The sigmoid activation function was a popular choice in the past, especially in shallow neural networks, because it models the activation of neurons very well; however, sigmoid causes two problems: its saturation regions make gradients vanish, and its output is not zero-centered.
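For reference, the activation functions compared in this experiment are simple elementwise maps; a minimal sketch follows (the PReLU slope a is learned per channel during training and is fixed here only for illustration):

```python
import numpy as np

def sigmoid(x):          # saturates for large |x|; output in (0, 1), not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):             # zero-centered but still saturating
    return np.tanh(x)

def relu(x):             # no saturation for x > 0; cheap and fast to converge
    return np.maximum(0.0, x)

def lrelu(x, a=0.01):    # leaky ReLU: small fixed negative slope keeps units alive
    return np.where(x > 0, x, a * x)

def prelu(x, a):         # parametric ReLU: the slope a is learned (fixed here)
    return np.where(x > 0, x, a * x)

x = np.linspace(-3, 3, 7)
print(relu(x), lrelu(x), prelu(x, a=0.25))
```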
FIGURE 12. Variation of error with data enhancement as the number of training cases increases: (a) on the MNIST database; (b) on the CIFAR-10 database.
During the training process, it can be seen that as the amount of training on the MNIST database and the CIFAR-10 database increases, the test error changes accordingly; increasing the data can therefore effectively prevent overfitting.
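The paper does not list its specific enhancement operations, so the following is a hedged sketch of typical label-preserving augmentations used to enlarge image training sets (random shifts and horizontal flips; the shift range is an illustrative assumption):

```python
import numpy as np

def augment(img, rng, max_shift=2):
    """Return a randomly shifted (and possibly horizontally flipped) copy of
    img - a label-preserving way to enlarge the training set."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    if rng.random() < 0.5:          # flipping is unsuitable for digits like MNIST's
        out = out[:, ::-1]
    return out

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32))                   # toy image batch
enlarged = np.stack([augment(im, rng) for im in batch])
```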
V. CONCLUSION
Augmented reality technology fuses the virtual and the real: it aims to accurately register computer-generated virtual information onto scene images collected in real time, forming an enhanced image for display to users and thereby enriching the user's sensory experience. This paper combines the convolutional neural network with the XGBoost algorithm from ensemble learning to make up for the loss of feature information caused by the traditional approach of recognizing with only the last layer of features: it fuses multi-layer features to retain more feature information and then applies the well-performing XGBoost recognizer. When choosing a pooling method for databases like MNIST and CIFAR-10, stochastic pooling is preferable, with an optimal size of 5×5: a smaller pooling size causes overfitting, while an excessively large one increases the error because downsampling admits too much noise. For max pooling and mean pooling, however, the smaller the pooling size used, the better the effect. Among the optimizers, ADAGRAD and SGD with momentum converge faster and perform better; the parameter optimization algorithm for a CNN is generally batch stochastic gradient descent with momentum, because it consistently converges faster to a better final value, with the momentum generally set to 0.9 and the initial learning rate to 0.01. As for the choice of layers, more layers are better, but the depth should be selected according to the size of the image data: for the MNIST and CIFAR-10 databases, considering space and time complexity, three convolution-pooling layers, a fully connected layer, and a softmax layer are optimal. To prevent overfitting and improve the generalization ability of the structure, the data sets of MNIST and CIFAR-10 were enlarged, and a good recognition effect was achieved.

REFERENCES
[1] F. Cheng, H. Zhang, W. Fan, et al., "Image recognition technology based on deep learning," Wireless Personal Communications, vol.102, no.2, pp.1917-1933, Jan. 2018.
[2] X. Wang, W. Zhang, X. Wu, et al., "Real-time vehicle type classification with deep convolutional neural networks," Journal of Real-Time Image Processing, vol.16, no.1, pp.5-14, Aug. 2019.
[3] P. R. Jeyaraj, E. R. S. Nadar, "Computer-assisted medical image classification for early diagnosis of oral cancer employing deep learning algorithm," Journal of Cancer Research and Clinical Oncology, vol.145, no.4, pp.829-837, Jan. 2019.
[4] S. Law, C. I. Seresinhe, Y. Shen, et al., "Street-Frontage-Net: urban image classification using deep convolutional neural networks," International Journal of Geographical Information Science, vol.34, no.4, pp.681-707, Dec. 2020.
[5] R. Hang, Q. Liu, D. Hong, et al., "Cascaded recurrent neural networks for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.8, pp.5384-5394, Mar. 2019.
[6] Y. Qin, L. Bruzzone, B. Li, et al., "Cross-domain collaborative learning via cluster canonical correlation analysis and random walker for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.6, pp.3952-3966, Jan. 2019.
[7] Y. Ariji, Y. Yanashita, S. Kutsuna, et al., "Automatic detection and classification of radiolucent lesions in the mandible on panoramic radiographs using a deep learning object detection technique," Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology, vol.128, no.4, pp.424-430, Oct. 2019.
[8] J. R. Hagerty, R. J. Stanley, H. A. Almubarak, et al., "Deep learning and handcrafted method fusion: higher diagnostic accuracy for melanoma dermoscopy images," IEEE Journal of Biomedical and Health Informatics, vol.23, no.4, pp.1385-1391, Jan. 2019.
[9] X. Zhu, Z. Li, X. Y. Zhang, et al., "Deep convolutional representations and kernel extreme learning machines for image classification," Multimedia Tools and Applications, vol.78, no.20, pp.29271-29290, Nov. 2019.
[10] X. He, Y. Chen, "Optimized input for CNN-based hyperspectral image classification using spatial transformer network," IEEE Geoscience and Remote Sensing Letters, vol.16, no.12, pp.1884-1888, May 2019.
[11] J. W. Kim, P. K. Rhee, "Image recognition based on adaptive deep learning," The Journal of The Institute of Internet, Broadcasting and Communication, vol.18, no.1, pp.113-117, Feb. 2018.
[12] S. Mahdizadehaghdam, A. Panahi, H. Krim, et al., "Deep dictionary learning: A parametric network approach," IEEE Transactions on Image Processing, vol.28, no.10, pp.4790-4802, May 2019.
[13] S. Zhou, Z. Xue, P. Du, "Semisupervised stacked autoencoder with cotraining for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.6, pp.3813-3826, Jan. 2019.
[14] S. Li, W. Song, L. Fang, et al., "Deep learning for hyperspectral image classification: An overview," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.9, pp.6690-6709, Apr. 2019.
[15] J. M. Haut, M. E. Paoletti, J. Plaza, et al., "Visual attention-driven hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.10, pp.8065-8080, Jun. 2019.
[16] S. Antholzer, M. Haltmeier, J. Schwab, "Deep learning for photoacoustic tomography from sparse data," Inverse Problems in Science and Engineering, vol.27, no.7, pp.987-1005, Sep. 2019.
[17] Y. Niu, Z. Lu, J. R. Wen, et al., "Multi-modal multi-scale deep learning for large-scale image annotation," IEEE Transactions on Image Processing, vol.28, no.4, pp.1720-1731, Apr. 2018.
[18] K. Z. Haider, K. R. Malik, S. Khalid, et al., "Deepgender: real-time gender classification using deep learning for smartphones," Journal of Real-Time Image Processing, vol.16, no.1, pp.15-29, Sep. 2019.
[19] Y. J. Heo, S. J. Kim, D. Kim, et al., "Super-high-purity seed sorter using low-latency image-recognition based on deep learning," IEEE Robotics and Automation Letters, vol.3, no.4, pp.3035-3042, Oct. 2018.
[20] J. Zhou, L. Y. Luo, Q. Dou, et al., "Weakly supervised 3D deep learning for breast cancer classification and localization of the lesions in MR images," Journal of Magnetic Resonance Imaging, vol.50, no.4, pp.1144-1151, Mar. 2019.
[21] Z. Gong, P. Zhong, Y. Yu, et al., "A CNN with multiscale convolution and diversified metric for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.6, pp.3599-3618, Jan. 2019.
[22] W. Wu, H. Li, X. Li, et al., "PolSAR image semantic segmentation based on deep transfer learning—Realizing smooth classification with small training sets," IEEE Geoscience and Remote Sensing Letters, vol.16, no.6, pp.977-981, Jan. 2019.
[23] Y. Ariji, M. Fukuda, Y. Kise, et al., "Contrast-enhanced computed tomography image assessment of cervical lymph node metastasis in patients with oral cancer by using a deep learning system of artificial intelligence," Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology, vol.127, no.5, pp.458-463, May 2019.
[24] C. F. Higham, D. J. Higham, "Deep learning: An introduction for applied mathematicians," SIAM Review, vol.61, no.4, pp.860-891, Nov. 2019.
[25] T. S. Borkar, L. J. Karam, "DeepCorrect: Correcting DNN models against image distortions," IEEE Transactions on Image Processing, vol.28, no.12, pp.6022-6034, Jun. 2019.
[26] D. Ribli, A. Horváth, Z. Unger, et al., "Detecting and classifying lesions in mammograms with deep learning," Scientific Reports, vol.8, no.1, pp.1-7, Mar. 2018.
[27] R. R. Saritha, V. Paul, P. G. Kumar, "Content based image retrieval using deep learning process," Cluster Computing, vol.22, no.2, pp.4187-4200, Feb. 2019.
[28] P. Helber, B. Bischke, A. Dengel, et al., "EuroSAT: A novel dataset and deep learning benchmark for land use and land cover classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol.12, no.7, pp.2217-2226, Jun. 2019.
[29] N. Shibata, M. Tanito, K. Mitsuhashi, et al., "Development of a deep residual learning algorithm to screen for glaucoma from fundus photography," Scientific Reports, vol.8, no.1, pp.1-9, Oct. 2018.
[30] C. Ju, A. Bibaut, M. van der Laan, "The relative performance of ensemble methods with deep convolutional neural networks for image classification," Journal of Applied Statistics, vol.45, no.15, pp.2800-2818, Feb. 2018.
[31] X. Lv, D. Ming, Y. Y. Chen, et al., "Very high resolution remote sensing image classification with SEEDS-CNN and scale effect analysis for superpixel CNN classification," International Journal of Remote Sensing, vol.40, no.2, pp.506-531, Sep. 2019.
[32] X. Yuan, P. He, Q. Zhu, et al., "Adversarial examples: Attacks and defenses for deep learning," IEEE Transactions on Neural Networks and Learning Systems, vol.30, no.9, pp.2805-2824, Jan. 2019.
[33] I. M. Baltruschat, H. Nickisch, M. Grass, et al., "Comparison of deep learning approaches for multi-label chest X-ray classification," Scientific Reports, vol.9, no.1, pp.1-10, Apr. 2019.
[34] F. Özyurt, T. Tuncer, E. Avci, et al., "A novel liver image classification method using perceptual hash-based convolutional neural network," Arabian Journal for Science and Engineering, vol.44, no.4, pp.3173-3182, Jul. 2019.
[35] K. Yasaka, H. Akai, A. Kunimatsu, et al., "Deep learning with convolutional neural network in radiology," Japanese Journal of Radiology, vol.36, no.4, pp.257-272, Mar. 2018.
[36] S. Bianco, L. Celona, P. Napoletano, et al., "On the use of deep learning for blind image quality assessment," Signal, Image and Video Processing, vol.12, no.2, pp.355-362, Aug. 2018.
[37] D. Bychkov, N. Linder, R. Turkki, et al., "Deep learning based tissue analysis predicts outcome in colorectal cancer," Scientific Reports, vol.8, no.1, pp.1-11, Feb. 2018.
[38] J. Dean, D. Patterson, C. Young, "A new golden age in computer architecture: Empowering the machine-learning revolution," IEEE Micro, vol.38, no.2, pp.21-29, Jan. 2018.
[39] Q. Mao, F. Hu, Q. Hao, "Deep learning for intelligent wireless networks: A comprehensive survey," IEEE Communications Surveys & Tutorials, vol.20, no.4, pp.2595-2621, Jun. 2018.
[40] P. Wang, H. Liu, L. Wang, et al., "Deep learning-based human motion recognition for predictive context-aware human-robot collaboration," CIRP Annals, vol.67, no.1, pp.17-20, May 2018.
[41] A. K. Singh, B. Ganapathysubramanian, S. Sarkar, et al., "Deep learning for plant stress phenotyping: trends and future perspectives," Trends in Plant Science, vol.23, no.10, pp.883-898, Aug. 2018.
[42] S. Pang, A. Du, M. A. Orgun, et al., "A novel fused convolutional neural network for biomedical image classification," Medical & Biological Engineering & Computing, vol.57, no.1, pp.107-121, Jul. 2019.
[43] Y. Chen, K. Zhu, L. Zhu, et al., "Automatic design of convolutional neural network for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol.57, no.9, pp.7048-7066, Apr. 2019.
[44] X. Liu, R. Zhang, Z. Meng, et al., "On fusing the latent deep CNN feature for image classification," World Wide Web, vol.22, no.2, pp.423-436, Jun. 2019.
[45] M. Andrews, M. Paulini, S. Gleyzer, et al., "End-to-End Physics Event Classification with CMS Open Data: Applying Image-Based Deep Learning to Detector Data for the Direct Classification of Collision Events at the LHC," Computing and Software for Big Science, vol.4, no.1, pp.1-14, Mar. 2020.

Shukui Bo received his Ph.D. in Cartography and Geographic Information System from the Chinese Academy of Sciences in 2007. He is currently a Professor in the School of Intelligent Engineering, Zhengzhou University of Aeronautics, Zhengzhou, China. His current research interests cover information extraction from remote sensing imagery, pattern recognition, and image processing.