Research and Prospect of Image Recognition Based o
1 Introduction
Image processing converts an image into a digital matrix, stores it in a computer, and processes it
with an algorithm. Its foundation is mathematics, and its main task is the design and implementation
of algorithms. Image processing technology is now widely used in biomedicine, communications,
remote sensing, cultural and creative industries, industrial design and production, and many other
fields. Image recognition and classification technology extracts hidden, useful, and even previously
unknown knowledge from large numbers of images. It lies at the intersection of data mining and
analysis, machine and deep learning, image retrieval and image processing, machine and computer
vision, artificial intelligence, databases and data warehouses, and other disciplines [1].
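To make the idea of an image as a digital matrix concrete, the following minimal Python sketch stores a tiny grayscale image as a nested list and processes it with a simple thresholding rule. The pixel values, the function name threshold, and the cutoff of 128 are illustrative assumptions, not taken from the paper.

```python
# A tiny 3x4 grayscale "image": each entry is a pixel intensity in 0..255.
# The values are made up for illustration.
image = [
    [12, 40, 200, 220],
    [15, 35, 210, 230],
    [10, 30, 190, 240],
]

height, width = len(image), len(image[0])


def threshold(img, cutoff=128):
    """Return a binary matrix: 1 where the pixel is >= cutoff, else 0."""
    return [[1 if pixel >= cutoff else 0 for pixel in row] for row in img]


binary = threshold(image)
print(height, width)  # 3 4
print(binary)         # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
```

Any algorithm that operates on the image, from thresholding to convolution, ultimately reads and writes entries of such a matrix.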
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution
of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Published under licence by IOP Publishing Ltd
ICCASIT 2020 IOP Publishing
Journal of Physics: Conference Series 1574 (2020) 012161 doi:10.1088/1742-6596/1574/1/012161
for classification. The brain's information processing as a whole shows that features are extracted
from the bottom up, and that the higher the level, the more abstract the features become. The whole
process is as follows:
3 Algorithm
Pooling is also called down-sampling, as opposed to up-sampling. A pooling layer is generally
needed to reduce the amount of data in the feature map produced by convolution. The pooling
operation is shown below:
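As a minimal sketch of the idea, the following Python code applies max pooling with a 2×2 window and a stride of 2 to a small feature map. The function name max_pool2d, the window size, and the sample values are illustrative assumptions and do not reproduce the original figure; max pooling is only one common pooling variant.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Down-sample a 2D feature map by taking the maximum of each window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the values inside the current size x size window.
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map produced by a convolutional layer.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 4], [7, 9]]
```

Here the 4×4 map shrinks to 2×2, so the next layer receives a quarter of the data while the strongest response in each window is kept.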
characteristics of a local region learned by a neural network from natural images are also suitable
for other adjacent local regions of the image.
Compared with locally connected and fully connected networks, a weight-sharing network has
great advantages. It reduces the number of training parameters, which simplifies the network
structure and widens its range of application. Take a 1000×1000 input image and a convolutional
layer with 10^6 nodes as an example: full connection requires 10^12 weights. If the network is
instead locally connected with a 10×10 local receptive field, the number of weights falls to 10^8.
If the system has 100 filters, weight sharing on top of the local receptive field further reduces
the number of weights to 10^4. Thus, weight sharing and local connection greatly reduce the number
of network parameters, simplify the network structure, and improve the efficiency of image
recognition.
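The arithmetic behind these three figures can be checked with a short Python sketch. The variable names are assumptions introduced for illustration, and bias terms are ignored, as in the text.

```python
# Sizes taken from the example in the text.
image_pixels = 1000 * 1000      # 10^6 input pixels
layer_nodes = 10 ** 6           # nodes in the convolutional layer
receptive_field = 10 * 10       # weights seen by one node under local connection
num_filters = 100               # filters in the weight-sharing network

# Fully connected: every node is connected to every pixel.
fully_connected = image_pixels * layer_nodes

# Locally connected: every node has its own 10x10 set of weights.
locally_connected = layer_nodes * receptive_field

# Weight sharing: all nodes of one filter reuse the same 10x10 weights.
weight_shared = num_filters * receptive_field

print(fully_connected)    # 1000000000000
print(locally_connected)  # 100000000
print(weight_shared)      # 10000
```

Each step removes four orders of magnitude: local connection replaces 10^6 inputs per node with 100, and weight sharing replaces 10^6 independent weight sets with 100 shared filters.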
4 Application Scenarios
CNNs are widely used in the field of images, but they currently offer little interpretability and
behave like a "black box". Although they are constantly evolving, it is hard to say why a model
performs so well.
Convolutional neural networks have achieved great success in image processing and recognition.
On the international standard ImageNet dataset, many successful models are based on
convolutional neural networks. One advantage of the CNN approach is that it can analyze the
original image directly and avoid complicated pre-processing [6]. Recent major breakthroughs in
image recognition have involved CNNs or models derived from them. Such networks take image
data directly as input, with no need for manual preprocessing or additional complex operations
such as feature extraction, and their fine-grained feature extraction lets image processing approach
human-level performance.
With further technological development, CNNs have been widely applied in face recognition,
educational image processing, intelligent driving, intelligent security, text recognition,
human-computer interaction, image search, and smart homes [7][8].
The strength of a CNN is that it can map low-dimensional, low-level features to high-dimensional,
high-level features. Therefore, in theory, any data that exhibit local correlation, such as speech
and text, can be processed by a CNN. In natural language processing, CNNs can be used for basic
tasks such as part-of-speech tagging, entity recognition, and text classification, as well as for
cutting-edge tasks such as machine translation and chatbots.
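The same sliding-filter operation carries over from images to one-dimensional sequences. The sketch below is a minimal illustration, assuming a hypothetical toy signal and a two-element filter; conv1d is an illustrative helper, and like most CNN libraries it computes a cross-correlation without flipping the kernel.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over a 1D sequence (valid positions only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Hypothetical sequence, e.g. a short run of audio samples or token features.
signal = [0, 0, 1, 1, 1, 0, 0]

# A filter that responds to local changes between neighbouring values.
edge_kernel = [-1, 1]

response = conv1d(signal, edge_kernel)
print(response)  # [0, 1, 0, 0, -1, 0]
```

The filter only ever looks at neighbouring values, which is why local correlation in the data is the property that makes this kind of processing meaningful.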
(2) When CNNs are used for image recognition, different problems often call for different network
depths, and the depth has to be determined by manual pre-selection and experimentation, which
limits the generality of the network structure. Therefore, in practical applications, an
approximately general network depth has to be selected for each specific image recognition
problem.
(3) A CNN achieves its image recognition performance mainly through training on a data set,
which limits how well it transfers to different data sets. A CNN therefore has to be trained on
data suited to each problem in order to achieve comparable results. If the distribution of the
training data set differs from that of the test data set, it is difficult for the convolutional
neural network to obtain a good recognition result.
A complete theoretical framework for CNN-based image processing has not yet been established. At
present, many recognition systems design the depth and layers of the network around a specific
database, and find the best parameters and optimization algorithm through continual trial and
error. Human factors play a prominent role, and there is no systematic theoretical explanation of
the factors that affect the recognition performance of a CNN. In particular, when natural images
are classified and recognized, the choice of the CNN's initial parameters and of the optimization
algorithm has a large effect on network training [9]. If the model is more accurate on the
validation set than on the training set, it is likely under-fitted. Over-fitting occurs when the
model fits the training set too closely and generalizes poorly to new data. A poor choice can leave
the network unable to work at all, or cause over-fitting, under-fitting, and other problems.
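The rule of thumb above can be written as a small check on training and validation accuracy. This is only a sketch of the heuristic stated in the text: the function name diagnose_fit and the gap_tolerance value of 0.05 are hypothetical choices, and in practice the full training curves should be inspected rather than a single pair of numbers.

```python
def diagnose_fit(train_acc, val_acc, gap_tolerance=0.05):
    """Classify a training run from its final accuracies (values in 0..1)."""
    if val_acc > train_acc:
        # Heuristic from the text: validation beats training.
        return "possible under-fitting"
    if train_acc - val_acc > gap_tolerance:
        # Training accuracy is far ahead of validation accuracy.
        return "possible over-fitting"
    return "no clear sign of either"


# Hypothetical accuracy pairs for three training runs.
print(diagnose_fit(0.70, 0.75))  # possible under-fitting
print(diagnose_fit(0.99, 0.80))  # possible over-fitting
print(diagnose_fit(0.90, 0.88))  # no clear sign of either
```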
We need to understand deeply the meaning and role of each part of a CNN, adjust its parameters to
optimize and deepen the network so that it captures more information, and even introduce our own
structural innovations to solve various problems.
Acknowledgments
This research was supported by the National Key R&D Program of China (Grant No. 2017YFB1400400).
References
[1] Meng-xue Xu: An overview of image recognition technology based on deep learning. Computer
Products and Circulation 1, 213-213 (2019).
[2] Jing L, Cheng J H, Shi J Y, et al.: Improvement. 2012.
[3] Liang Chang et al.: Convolutional neural networks in image comprehension. Automation 9,
1300-1312 (2016).
[4] Hinton G E et al.: A fast learning algorithm for deep belief nets. Neural Computation 7,
1527-1554 (2006).
[5] Xian-chang Chen: Deep learning algorithm and application research based on convolutional
neural network. Zhejiang Gongshang University, 2014.
[6] Lin Zhang et al.: A review of research on convolutional neural networks. Chinese Journal of
Computers 40, 1229-1251 (2017).
[7] Quan-Sen Sun et al.: A new method of feature fusion and its application in image recognition.
Pattern Recognition 12, 2437-2448 (2005).
[8] Misgana Negassi et al.: Application of artificial neural networks for automated analysis of
cystoscopic images. World Journal of Urology 1, 2020.
[9] Xu K, Feng D, Mi H, et al.: Mixup-based acoustic scene classification using multi-channel
convolutional neural network. 2018.