
Received September 11, 2020, accepted September 23, 2020, date of publication October 5, 2020, date of current version October 15, 2020.


Digital Object Identifier 10.1109/ACCESS.2020.3028740

Object Detection Recognition and Robot Grasping Based on Machine Learning: A Survey

QIANG BAI 1, SHAOBO LI 1,2,3, JING YANG 1,3, (Member, IEEE), QISONG SONG 1, ZHIANG LI 1, AND XINGXING ZHANG 1
1 School of Mechanical Engineering, Guizhou University, Guiyang 550025, China
2 Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550025, China
3 Guizhou Province Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China

Corresponding author: Shaobo Li ([email protected])


This work was supported in part by the National Key Technologies Research and Development Program of China under
Grant 2018AAA0101800, in part by the National Natural Science Foundation of China under Grant 51475097 and Grant 91746116,
in part by the Ministry of Industry and Information Technology of the People’s Republic of China Talents under Grant [2016]213, and in
part by the Science and Technology Project of Guizhou Province Talents under Grant [2015]4011 and Grant [2016]5013.

ABSTRACT With the rapid development of machine learning, its power in the field of machine vision has become increasingly evident. Combining machine vision and robotics to achieve grasping as precise and fast as that of humans requires high-precision target detection and recognition, accurate localization and reasonable grasp strategy generation, which is the ultimate goal of researchers worldwide and one of the prerequisites for the large-scale application of robots. Traditional machine learning has a long history and good achievements in the fields of image processing and robot control. The CNN (convolutional neural network) algorithm enables training on large-scale image datasets, overcomes the disadvantages of traditional machine learning on large datasets, and greatly improves accuracy, which has made CNNs a global research hotspot. However, the increasing difficulty of acquiring labeled data limits their development. Therefore, unsupervised learning, self-supervised learning and reinforcement learning, which are less dependent on labeled data, have also undergone rapid development and achieved good performance in the fields of image processing and robot grasping. Considering the inherent limitations of vision, this paper summarizes the research achievements of tactile feedback in the fields of target recognition and robot grasping and finds that the combination of vision and tactile feedback can improve the success rate and robustness of robot grasping. This paper provides a systematic summary and analysis of the research status of machine vision and tactile feedback in the field of robot grasping and establishes a reasonable reference for future research.

INDEX TERMS Machine learning, recognition, grasping, robot, tactile feedback, vision.

I. INTRODUCTION
Vision is the main way in which humans receive all types of information, followed by tactile feedback. One goal of researchers is to equip robots with vision systems that have high accuracy and robustness, similar to human beings, to help people complete all types of work. Thus, machine vision has always been an important research topic in the fields of artificial intelligence and robotics. With the rapid development of machine learning, machine vision has been widely and successfully applied in various image processing tasks, such as defect detection, target detection, medical image judgment [1]–[14], etc. To this end, researchers hope to achieve great breakthroughs in machine vision to allow for precise recognition, positioning and grasp strategy generation and the realization of stable grasping by robots, which could lead to wide application.

Although the above papers provide a wide range of research and surveys of machine learning and machine vision in plain image processing, there are very few surveys of machine learning used for object detection recognition and robot grasping. Accurate and fast object recognition and grasping based on vision are the basis of robot applications in both industry and real-life scenarios.

The associate editor coordinating the review of this manuscript and approving it for publication was Tao Zhou.


This paper mainly summarizes the research achievements of six mainstream methods in object detection recognition, positioning, grasp strategy generation and grabbing, including traditional machine learning, deep learning, unsupervised learning, self-supervised learning, reinforcement learning and visual-tactile fusion. Machine learning is the inevitable product of artificial intelligence development reaching a certain stage and has been put forward and developed for decades. The most substantial advantage of traditional machine learning (support vector machine (SVM), random forest, decision tree, clustering, and Bayesian algorithms) is that it requires only a small amount of data and has strong interpretability and fast running speed [15]–[17]. However, with the increase in the amount of data, the performance of these algorithms becomes limited and stagnates instead of continuing to improve [18], [19]. For a long time after the birth of the neural network algorithm in the 1980s, SVMs and other machine learning algorithms had an advantage. However, the gradient vanishing problem of neural networks led to difficulties in deep network training [20], [21] and revealed limitations in the number of samples and computing power. In 2012, the success of the AlexNet network led to the comeback of the deep neural network [22]. It is widely used in various fields of machine vision, and its performance continues to increase as datasets grow, avoiding the disadvantages of traditional machine learning on large datasets. Deep learning needs numerous labeled data, but it is not easy to label all of the data, which has led to the emergence of unsupervised and self-supervised learning algorithms. Unsupervised learning mainly addresses situations in which the input data are not labeled and the output is not determined [23], [24]. This approach classifies the samples according to their similarity. However, unsupervised learning has no label data at all, which may lead to slow speed and low precision [25]. Self-supervised learning uses the input data to generate supervisory information and benefits almost all types of downstream tasks [26], [27]. With Google's successful application of reinforcement learning in the game of Go, reinforcement learning has attracted the worldwide attention of researchers. Reinforcement learning considers sequence problems and takes a long-term perspective on returns [28], while supervised learning generally considers one-off problems and focuses on only short-term and immediate returns. This long-term perspective of reinforcement learning is very important for determining the optimal solution to many problems. The key point of the above algorithms is to process the image collected by the camera, realize object detection, recognition, positioning and grasp strategy generation, and then guide the robot to complete the grasp. However, noncontact object perception always has inherent defects, especially in unstructured environments and real-life scenes, and it is difficult to accurately predict the weight, shape and grasping strategy of the object [29]. Based on the above situation, adding pressure sensors to the dexterous hand to provide it with tactile feedback and combining it with vision has become a new direction in robot grasping research [30], [31].

This paper is organized as follows. The first part introduces the advantages and disadvantages of the six methods and the main content of this paper. The second part discusses the research achievements of several mainstream traditional machine learning methods in image processing, object recognition and guided robot grasping. The third part summarizes the performance of the convolutional neural network (CNN) algorithm in object detection, recognition, positioning and grasp strategy generation. In the fourth part, aiming to address the difficulty of acquiring label data, the paper describes the performance of unsupervised learning, self-supervised learning and reinforcement learning in the fields of vision and grasping. The fifth part discusses the inherent defects of vision and summarizes the research achievements of robot tactile feedback and the combination of vision and tactile feedback. In the sixth part, the future development prospects of machine vision in robot object recognition and grasping are proposed based on the above analysis. Finally, conclusions are drawn in the seventh part.

II. CLASSICAL MACHINE LEARNING
It has been nearly 70 years since Arthur Samuel put forward the concept of ''machine learning'' in 1952. In the 1980s, machine learning became an independent discipline and developed rapidly. Since 2006, due to the demand of big data analysis, neural networks based on machine learning have attracted more attention and become the basis of deep learning theory. Currently, the research of machine learning is mainly divided into two directions: the first is traditional machine learning, which mainly studies the learning principle and pays attention to exploring humanoid learning mechanisms [32]–[36]; the second is the research of machine learning in big data environments, which mainly focuses on how to use information effectively and how to acquire hidden, effective and understandable knowledge from massive amounts of data [37]–[41]. From the perspective of methodology, machine learning can be divided into linear models and nonlinear models. Linear models are relatively simple, but they are the basis of nonlinear models, and many nonlinear models are transformed from linear models [42]–[46]. Nonlinear models can be divided into traditional machine learning models (SVM, KNN, decision tree, etc.) and deep learning models. Fig. 1 lists the currently mature traditional machine learning algorithms and briefly describes their principles and characteristics [47]–[51]. It is found that the functions of different algorithms are varied, indicating that each algorithm has different application scenarios. Although deep learning plays a dominant role in the field of machine vision, deep learning is data-driven and has poor performance on small datasets [52]–[54]. However, traditional machine learning can adapt to a variety of datasets; especially in scenarios with small amounts of data (such as the medical field), machine learning has better performance [55], [56]. In this case, the advantages of traditional machine learning algorithms are highlighted. In addition, the traditional machine learning model is small, and the requirement for computer hardware is not high, which yields a strong speed advantage in the field of manipulator grasping based on vision [57]–[59].


FIGURE 1. Introduction of traditional machine learning.

According to the characteristics of different machine learning algorithms, they can be applied in all aspects of manipulator grasping to improve accuracy and robustness.

A. SUPPORT VECTOR MACHINE (SVM)
The SVM has strong generalization performance and can address machine learning problems with high-dimensional data and small samples, so it is widely used in the field of image processing. Based on RGB images and point cloud images, Yuan et al. [12] used the SVM-rank algorithm to recognize object features and generate the grabbing strategy and then realize the accurate grabbing of objects by a five-fingered dexterous hand.
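To make the SVM-based pipelines above concrete, the sketch below shows how candidate grasp regions, each reduced to a fixed-length feature vector, might be classified as graspable or not with scikit-learn; the synthetic features and labels, the 32-dimensional feature size and the RBF-kernel choice are illustrative assumptions rather than the setup of any paper cited here.

```python
# Minimal sketch: scoring candidate grasp regions with an SVM classifier.
# Assumes each candidate region has already been reduced to a fixed-length
# feature vector (e.g., SURF/HOG statistics); features and labels are synthetic here.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))                  # 200 candidate regions, 32-D features (placeholder)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # 1 = graspable, 0 = not (toy labels)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True))
clf.fit(X, y)

candidate = rng.normal(size=(1, 32))            # feature vector of a new candidate region
print("graspable probability:", clf.predict_proba(candidate)[0, 1])
```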


Ergene and Durdu [60] used the bag of words (BoW) method and an SVM to achieve feature extraction and object classification based on a grid and then guide the manipulator to achieve the classification and grabbing of a pen, water cup and stapler. The accuracy was 83%. Hu et al. [61] developed an operation and grasp control system based on sensor-motor fusion for a robot hand-eye system, proposed a motion recognition method for a multifinger manipulator based on an AdaBoost-SVM, and proved the high responsiveness and flexibility of this method. Valente et al. [62] used the competitive Hopfield neural network to collect several points on the edge of the object to build an approximate polygon, used the radial basis function-global ridge regression (RBF) network to process the polygon, and selected appropriate grasping points to guide the grasping of the manipulator.

The SVM is a type of supervised learning method that has the advantages of good classification performance and simple structure but is difficult to train on large datasets and has poor performance on multiclassification problems. According to related research [12], [60], [61], SVMs have the disadvantages of complex feature engineering and poor generalization performance in target recognition, location and grasping. However, improved SVMs can be used in robot grasping control algorithms and achieve good results.

B. CLUSTERING ALGORITHM
The clustering algorithm has the advantages of simplicity and easy implementation and can utilize large datasets, so it is widely used. Hannat et al. [63] presented a real-time method for visual categorization to achieve robot grasping. This method uses speeded up robust feature (SURF) points to describe the feature data of objects and uses the K-means algorithm to extract the vocabulary. The results of their object recognition experiments show an average accuracy between 95% and 100%. Harada et al. [64] first clustered the polygon model of the object and the surrounding environment and then separated the environment and the object through different clustering algorithms to achieve successful grasping and stable placement. Verma et al. [65] proposed that an algorithm combining density clustering and homography transformation can obtain the maximally stable extremal regions of the object and then realize the accurate positioning of the object, which provides powerful assistance for the successful grasping of the manipulator. Zhang and Shen [66] extracted effective local features from photos of the object. After clustering, the key points of each image are mapped into a uniform-dimension histogram vector, and the histogram is used as the input vector of a multiclass SVM algorithm to establish the training classifier model and realize the real-time recognition of moving objects. Kouskouridas et al. [67] combined shape retrieval technology with a classification and clustering algorithm for attitude estimation of objects. Wiesmann et al. [68] proposed an event-driven embedded system for feature extraction and object recognition during robot grasping. Skotheim et al. [69] proposed a flexible 3D object positioning system that can make the manipulator assemble, grasp and place in a 3D environment. The system is improved based on a robust clustering algorithm and an attitude verification algorithm, which significantly improves the accuracy and robustness of the system.

The clustering algorithm is a type of unsupervised algorithm with a long history, and it is widely used because it does not need training datasets and has a simple structure and fast speed. In tasks related to target detection and recognition, the clustering algorithm is mainly used for feature extraction and clustering, and it achieves the segmentation of the background and the localization of the target. However, existing research results [63], [66], [67], [69] have indicated that the clustering algorithm usually needs to be used together with other algorithms to achieve the classification and grasping of different targets.

C. BAYESIAN ALGORITHM
The Bayesian algorithm plays an important role in manipulator grasp planning. The naive Bayesian model originated from classical mathematical theory; it has a solid mathematical foundation and stable classification efficiency, performs well on small-scale datasets, and can handle multiclassification tasks. Budiharto [70] proposed a fast object detection algorithm based on stereo vision and used the Bayesian algorithm to reduce camera noise and achieve robust tracking. Wang et al. [71] proposed an online estimation method for a robot visual servo system based on an unscented particle filter and the Jacobian matrix. First, the definition of the total Jacobian matrix is given, and the estimation of the total Jacobian matrix is transformed into a Bayesian filtering framework. Then, the paper proposes to estimate the Jacobian matrix by an unscented particle filter and use the unscented Kalman filter equations to propagate and update each particle. Bekiroglu et al. [72] proposed a probabilistic framework for grasp modeling and stability assessment, which integrates supervised learning and unsupervised learning, and Bayesian networks are used to model the conditional relationship between tasks and multiple sensory streams (vision, proprioception and touch). The obtained model can not only predict the success rate of grasping but also provide insight into the dependency between the related variables and features of object grabbing.

The Bayesian algorithm is widely used in noise reduction, servo control and grasping probability prediction in the research of target detection and recognition and robot grasping, which is mainly due to its solid mathematical foundation and its ability to address multiclassification tasks.

D. PRINCIPAL COMPONENT ANALYSIS (PCA)
In addition to the above algorithms, PCA also has applications in the fields of vision and robotics. PCA finds the principal axis direction, which is used to effectively represent the common characteristics of the same type of samples.
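As a concrete illustration of the idea that PCA recovers a principal axis that can orient a grasp, the sketch below fits PCA to a 2D point set standing in for a segmented object and derives a candidate grasp angle perpendicular to the main axis; the toy point set and the perpendicular-grasp heuristic are illustrative assumptions, not a method from the cited works.

```python
# Minimal sketch: using PCA to estimate an object's principal axis from 2-D points
# (e.g., pixel coordinates of a segmented object) and a grasp angle across that axis.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Toy "object": an elongated blob of points standing in for a segmentation mask.
points = rng.normal(size=(500, 2)) * np.array([30.0, 8.0]) + np.array([160.0, 120.0])

pca = PCA(n_components=2).fit(points)
major_axis = pca.components_[0]                      # direction of largest variance
axis_angle = np.arctan2(major_axis[1], major_axis[0])
grasp_angle = axis_angle + np.pi / 2                 # heuristic: close the gripper across the major axis

print("object centroid:", pca.mean_)
print("grasp angle (deg):", np.degrees(grasp_angle) % 180)
```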


Song et al. [73] developed a general framework to estimate the graspability of an object from its 2D data, which includes the identification of the similarity of the local features of the object and the generation of the object grabbing strategy based on the experience obtained from pre-learning. Zhang et al. [74] proposed a shared-control wheelchair manipulator, which can automatically detect a water cup based on vision and help the disabled achieve the task of drinking water. In this scheme, a CNN and PCA are used separately to identify the object and to estimate its attitude and direction. Mattar [75] proposed a learning mechanism for stable grasping and control of a manipulator. Based on a PCA neural network and the Widrow-Hoff method to learn a large number of patterns of prosthetic behavior, good grasp control of the prosthetic is realized.

PCA is an unsupervised learning method without parameter limitations, but it is seldom used in the image processing field. To achieve ideal robot grasping operation, PCA is commonly used together with a CNN.

Machine learning algorithms have a long history of development and have made outstanding achievements in their respective fields. According to the algorithm principles and related research (Table 1), it is found that target detection, recognition and image processing are not the strong points of machine learning. First, machine learning algorithms require an arduous amount of feature engineering, which greatly increases the difficulty and cost of image processing. Second, machine learning requires a variety of algorithms to work together, or with CNNs, to achieve complete recognition, positioning and grasping, which increases the difficulty of model building and training. Finally, with the explosive growth of data in the era of big data, the disadvantages of traditional machine learning have become increasingly prominent.

TABLE 1. Comparison of machine learning application scenarios.

III. CONVOLUTIONAL NEURAL NETWORK (CNN)
The CNN is one of the most representative neural networks in the field of deep learning and has made many breakthroughs in the field of image analysis and processing. Based on the standard image annotation set, ImageNet, the CNN has many achievements, including image feature extraction and classification, scene and target recognition, and so on. Compared with traditional image processing algorithms, the CNN has the advantages of no preprocessing requirements and high precision [76]–[80], [82], [83]. In 1998, Yann LeCun et al. proposed a gradient-based back-propagation algorithm (LeNet-5) for supervised training of networks [84]. Yann LeCun is known as the father of the CNN for his outstanding contributions to machine learning and computer vision. Due to the lack of large-scale training datasets and hardware, LeNet-5 was not ideal for complex problems. In 2012, the AlexNet proposed by Alex Krizhevsky et al. won the image classification championship on the ImageNet training set, making the CNN a key research direction in computer vision. AlexNet uses the rectified linear unit (ReLU) instead of the sigmoid as the activation function, and it achieves good results and alleviates the problem of gradient disappearance when the network is deep [22]. At the same time, the use of the GPU-based Compute Unified Device Architecture (CUDA) greatly accelerates the training speed of neural networks. Based on the above advantages, AlexNet has been applied in defect detection, location and visual tracking of dynamic objects [85], [86]. In 2014, the GoogLeNet network proposed by Google [87] won the ILSVRC competition, and its error rate was lower than that of VGGNet proposed in the same year. Generally, the position and size of the same object in different images vary greatly, and an accurate convolution operation is needed to recognize this type of object. To solve the problem that large convolution kernels usually tend to perceive global information while small convolution kernels mainly capture local information, the idea of GoogLeNet is to use multiple convolution kernels of different sizes in the same layer to capture information, and this structure is called Inception [88]–[90]. Due to the good performance of GoogLeNet in image recognition, it has also achieved good accuracy in robot target detection [91]. VGGNet achieved second place in the classification task of the ILSVRC competition in 2014 (first place was GoogLeNet) and first place in the positioning task. At the same time, the model has good generalization ability on other datasets, and VGGNet proved that a deeper network can improve the recognition effect of the network to a certain extent [92]. Because of its simple structure and strong feature extraction ability, VGGNet has a wide range of application scenarios. It is often used as the backbone of target detection models (Fast-RCNN, single-shot multibox detector (SSD), etc.) to extract features [93], [94] and for target detection in robot grasping [95], [96]. The ResNet deep residual network proposed in 2015 won first place in the classification task of the ImageNet competition [97]. Because of its simple and practical structure, many target detection, segmentation and recognition algorithms are built on the basis of ResNet50 or ResNet101 [98], [99]. The residual design mainly solves the performance degradation problem of deep networks and reduces the computation through long skip connections. Even if the number of model layers is very deep, it can ensure normal training.
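To make the residual design concrete: the identity skip connection lets each block learn only a residual correction, which is what keeps very deep networks trainable. The sketch below shows a basic ResNet-style block in PyTorch; it is a simplified illustration of the idea, not the exact block definition used in the cited ResNet paper.

```python
# Minimal sketch of a ResNet-style basic block: two 3x3 convolutions plus an
# identity "skip" connection, so the block only has to learn a residual F(x).
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        return self.relu(x + residual)   # skip connection: output = x + F(x)

block = BasicBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```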


The SSD algorithm proposed in 2016 is improved on the basis of VGG-16 and uses multiscale feature maps with a priori (default) boxes for target detection [100]. The entire process of SSD requires only one stage, so its most substantial advantage is that it runs faster. First, dense sampling is carried out at different positions of the image according to different scales and aspect ratios, then features are extracted by a CNN and directly classified and regressed [101], [102]. However, uniform dense sampling leads to an imbalance of positive and negative samples, which makes training more difficult and reduces model accuracy. The You Only Look Once (YOLO) algorithm proposed in 2016 is a typical one-stage method for target detection; the core idea is to transform the object detection problem into a regression problem. The model can directly predict the bounding box and category probability from the input image by using a CNN structure [103]. The execution speed is fast, and very high detection accuracy can be achieved by using a regression method. From YOLOv1 in 2016 to YOLOv3 in 2018, the YOLO algorithm has continuously absorbed the advantages of similar algorithms (such as the feature pyramid network (FPN) and the Fast Region-based CNN (Fast-RCNN)) and achieved higher detection speed and accuracy through its own continuous improvement, which is more in line with the real-time requirements of industry for target detection algorithms compared with other algorithms [104]. As two algorithms proposed in the same year, SSD and YOLO have made outstanding achievements in the fields of image processing and vision, and they show good performance in target recognition, location and grasp strategy generation [104]–[110]. The greatest contribution of the RetinaNet algorithm put forward by Tsung-Yi Lin et al. in 2018 is the proposal of the focal loss to solve the problem of class imbalance [111], thus enabling its accuracy to exceed that of classic two-stage target detection models. Both one-stage and two-stage detection algorithms are built on an anchor mechanism (e.g., Fast-RCNN, RetinaNet, YOLO, or SSD), and these anchors are mainly used to find the location of the box; however, all of these algorithms incur excessive costs because of the anchor mechanism. This mechanism has two disadvantages. First, many anchors are generated in the network, and most of these anchors cannot box the target; therefore, most of them are negative samples, with few positive samples. This leads to the problem of unbalanced positive and negative samples and consumes an extensive amount of computation. Second, the anchor mechanism introduces a vast number of hyperparameters into the network, which often makes the adjustment of these hyperparameters very complicated and increases the complexity of the network. Based on the above problems, Hei Law et al. proposed an anchor-free mechanism in 2019, which uses the upper left corner and the lower right corner to predict the bounding box instead of implementing anchors [112]. Fig. 2 lists the major milestones of the CNN algorithm from 1998 to 2019 and illustrates the core structure of the various improved algorithms. The recognition accuracy and operation speed of these algorithms have been greatly improved by these developments. To date, various improved algorithms based on CNNs continue to emerge and constitute one of the main research directions in the field of vision.

A. ROBOT GRASP POINT AND GRASP STRATEGY
To solve the problem of robot grasping angle prediction, Cheng and Meng [113] proposed a two-stage cascaded training solution. First, the neural network performs 20,000 iterations to obtain the ability to locate the object, and some parameters in the network are frozen. Second, a scale factor of 1.14 (a hyperparameter) is multiplied by the sin(θ) and cos(θ) of the ground truth value. Through these two cascaded training processes and 500 additional iterations, the network obtains strong direction prediction ability. Zunjani et al. [114] found that robots need to predict the ideal grasp rectangle according to the intended use of the object to achieve an optimal grabbing strategy. They input the object image and intention-type metadata into the fully connected layer of the CNN, which then achieves the ideal rectangle prediction. Corona et al. [115] designed a hierarchical model composed of three CNNs for the problem of grasping deformable objects such as textiles, which can be trained by using synthetic images and real images. Through the three steps of object recognition, first grabbing point and second grabbing point, accurate grabbing of the object can be achieved. Gaona and Lin [116] proposed an estimator-based particle swarm (PS) optimization algorithm with a CNN for fast and robust reasoning about robot grasping points. The cost function of the PS is mainly considered from two aspects: first, the CNN divides the grabbing features into good features and poor features; and second, a magnet mechanism is designed to make particles converge to the object center. The algorithm also includes a confidence factor to reduce misjudgment between grabbing points and nongrabbing points. Yamazaki [117] proposed a method to detect the grabbing point on irregularly shaped knitted fabrics. Combining grabbing point detection with a shape classifier, a CNN is used to classify the shape and extract the feature vector of the detected object shape. Using this feature, the grabbing points are calculated as image coordinates, and the effectiveness of this method is proven.

A reasonable grasping strategy and grasping points are the basic requirements for the robot to grasp the target based on vision, and they correspond to nondeformable objects and deformable objects, respectively. An end-to-end deep learning model is constructed based on the CNN algorithm, and the images collected by the camera are input into the model to produce a reasonable output of the grabbing strategy and grabbing points. However, at present, there are two main problems. First, the image processing effect is poor if the noise is large, so image preprocessing and noise reduction are necessary to realize the grabbing strategy. Second, it is necessary to manually design reasonable label features to make the model achieve better results on the test set and in practical applications.
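The angle-regression trick mentioned above, predicting the sin and cos of the grasp angle (optionally rescaled) and recovering the angle afterwards, can be sketched as follows; the scale factor and noise level are illustrative, and the snippet is not the training code of the cited work.

```python
# Minimal sketch: encoding a grasp angle as (scaled) sin/cos regression targets
# and decoding a network prediction back to an angle with atan2.
import numpy as np

SCALE = 1.14  # illustrative hyperparameter applied to the ground-truth sin/cos

def encode_angle(theta_rad: float) -> np.ndarray:
    """Ground-truth targets for the two regression outputs."""
    return SCALE * np.array([np.sin(theta_rad), np.cos(theta_rad)])

def decode_angle(pred: np.ndarray) -> float:
    """Recover the angle; atan2 is insensitive to the common scale factor."""
    return float(np.arctan2(pred[0], pred[1]))

theta = np.deg2rad(35.0)
targets = encode_angle(theta)
noisy_pred = targets + np.random.default_rng(0).normal(scale=0.02, size=2)
print("recovered angle (deg):", np.rad2deg(decode_angle(noisy_pred)))
```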


FIGURE 2. Development track of machine vision based on CNNs. (a) LeNet-5 [84]. (b) AlexNet [22]. (c) GoogLeNet [87]. (d) VGG-16 [92]. (e) Faster-RCNN [94]. (f) ResNet [97]. (g) YOLO [103]. (h) SSD [100]. (i) YOLOv3 [119]. (j) RetinaNet [111]. (k) CornerNet [112].
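The focal loss that RetinaNet (Fig. 2(j)) introduces to counter the positive/negative imbalance discussed above down-weights easy, well-classified examples so that training focuses on hard ones. Below is a minimal binary-classification sketch of the idea; the α and γ values are the commonly quoted defaults, and the code is an illustration rather than the reference implementation.

```python
# Minimal sketch of the binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits/targets: same shape; targets are 0 (background) or 1 (object)."""
    prob = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1.0 - prob) * (1.0 - targets)          # probability of the true class
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()

logits = torch.tensor([3.0, -2.5, 0.1])   # mostly easy samples plus one uncertain sample
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```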


B. MULTITASK COOPERATIVE OPERATION
Haochen et al. [118] established a neural network for object recognition, location and attitude detection using the CNN algorithm. Pose detection is treated as a classification problem in this model, and multiple tasks, such as recognition and location, are combined at the same level to achieve good performance on printed circuit board (PCB) datasets. Chen et al. [120] introduced a CNN-based grasp path to predict multigrasp tasks, mapped the grasp candidate options to the grasp path, generated the mapped grasps, and took the deviation between them as the estimation error for back-propagation. Experiments on datasets and real scenes show that this method can improve detection accuracy and extends well to occluded objects.

Complex system engineering is required to realize target grabbing based on vision, which involves a series of steps, such as recognition, positioning and pose detection, that are all in the field of image processing. Therefore, building a model based on CNNs to realize the real-time processing of multiple tasks and the probability ranking of output results is an important research direction.

C. OBJECT 3D SHAPE CONSTRUCTION
Roy et al. [121] used a CNN (VGG16) to classify the objects grasped by the manipulator into four categories, cylindrical, spherical, cubic and conical, and then generated four different grasping strategies. This method achieves 93% accuracy in real-time object recognition and grasping. Yan et al. [122] introduced a deep geometry-aware grasping network (DGGN), which divides learning into two steps. First, the 3D shape model and scene are generated and reconstructed from RGB-D data, and the geometry representation is thereby acquired. Second, the results are predicted by learning the geometry-aware representation within the model. Satish et al. [123] learned a deep policy from comprehensive point cloud training datasets and used an analytic algorithm with a random noise model to randomly sample, grasp and reward the domain to explore how the distribution of synthetic training examples affects the speed and reliability of the robot learning strategy. A comprehensive data sampling distribution is proposed in this paper, which combines grasping samples from the policy action set and guide samples from the supervisor with high robustness. This method is used to train the robot grasping strategy based on a fully convolutional network architecture, which evaluates millions of grasping options in four degrees of freedom (three-dimensional position and planar direction). The experimental results show that the fully convolutional grasp quality CNN (FC-GQ-CNN) has better speed and reliability. Liang et al. [124] proposed an end-to-end grasp evaluation model (PointNetGPD) to solve the problem of estimating grasp configurations directly from point cloud maps. The model is lightweight and takes the original point cloud as input, which allows it to directly process and evaluate the 3D point cloud inside the gripper. Even if the point cloud is very sparse, it can capture the complex geometry of the contact area between the gripper and the object.

Dividing objects into several categories according to their general shapes and then generating different grabbing strategies based on the category is a good approach, but its generality is poor. It is very important to realize 3D reconstruction of the object based on vision. The RGB-D image collected by the depth camera is input into the deep learning model to realize 3D reconstruction, which can improve the success rate and speed of grasping.

D. MOTION PATH
To solve the problem of dexterous hand grasp force when performing tasks, Sun et al. [125] proposed a motion reproduction system based on several motion and depth data streams. At the same time, a CNN is used to estimate the motion instructions from the depth image, and the force data are saved to generate the labeled training datasets. Deng et al. [126] proposed a learning framework combining semantic reach-to-grasp (RTG) with trajectory generation, aiming for the successful realization of semantic reach-to-grasp in unstructured environments. First, an object detection model based on deep learning is used to detect the objects of interest, and the trained network based on the Bayesian search algorithm is used to find the most successful grasping configuration from the object segmentation image. Second, a model-based trajectory generation method is designed for the robot's reaching motion, which is inspired by the theory of the human internal model to generate trajectories satisfying the constraints; the effectiveness of this method has been proven.

Different grasping forces are the key to successfully grasping different objects. Associating scene images with force data and using a CNN model to complete training can improve the adaptability of the robot grasping force. The combination of a CNN and traditional machine learning algorithms can realize the ranking of several options and output the optimal value.
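Associating depth images with recorded force data, as described above, amounts to training a regression CNN. The sketch below shows one plausible minimal setup in PyTorch, a small convolutional encoder regressing a scalar grasp force from a single-channel depth image; the architecture, the force unit and the synthetic data are assumptions for illustration, not the system of the cited papers.

```python
# Minimal sketch: regressing a grasp force (scalar) from a 1-channel depth image.
import torch
import torch.nn as nn

class ForceRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # predicted force (assumed unit: newtons)

    def forward(self, depth):
        return self.head(self.features(depth).flatten(1))

model = ForceRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
depth_batch = torch.rand(8, 1, 96, 96)       # placeholder depth images
force_batch = torch.rand(8, 1) * 10.0        # placeholder recorded forces
loss = nn.functional.mse_loss(model(depth_batch), force_batch)
loss.backward()
optimizer.step()
print("training loss:", loss.item())
```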


E. REAL-TIME MOTION
González-Díaz et al. [127] proposed a real-time solution to the problem of grasping actions in egocentric video. First, aiming to address the problem of deciding which object will be grabbed and when to trigger the grabbing operation for a given classification, the paper determines the grabbing area based on a gaze-guided CNN focusing on an object. Second, the obtained fixation sequence is noisy because of distraction and visual fatigue, and gaze is not always reliable for the object of interest. To solve this problem, video-level annotation is used to represent the object to be grabbed, and a corresponding loss function is used in a deep CNN. To detect when a person removes an object, the predictive ability of long short-term memory networks is used to analyze gaze and visual dynamics. The results show that this method has better performance than other methods on real datasets. Farag et al. [128] proposed a real-time object detection algorithm based on a selective compliance assembly robot arm (SCARA) for robot grasping and positioning on industrial assembly lines. The motion of the SCARA robot is composed of two parts: target detection based on deep learning and position measurement based on edge detection.

Real-time performance is very important for robot grasping, and good real-time performance can guide the robot to realize the recognition, positioning and grasping of dynamic objects. Based on the CNN AlexNet, the researchers used the transfer learning method to establish a target detection model, the knowledge and statistics superimposing network (KSSNet), which achieved a 100% success rate in target detection, location and grasping.

The target detection, recognition, location and grasp strategy generation involved in robot visual grasping all lie in the field of image processing, and the CNN has strong performance in this field. Therefore, the CNN is widely used in the field of visual grasping and works well. As shown in Table 2, from the proposal of the first full-fledged CNN in 1998 to the RetinaNet network in 2018, deep learning has been developing rapidly, and accuracy and speed have greatly improved.

TABLE 2. Comprehensive performance comparison of mainstream CNN models.

At present, CNN research is generally based on supervised learning, which needs a large number of labeled datasets for model training. However, with the continuous development of computer vision, it is increasingly difficult to obtain valuable labeled datasets, and most of the labeled data are calibrated by humans, which greatly increases the difficulty, cost and inconsistency of labeled data acquisition. For these reasons, neural networks that do not need or rely on labeled data have become a worldwide priority research direction. These algorithms need little or no labeled data, or do not need manually labeled data, which greatly reduces the need for human intervention in the model training process.

IV. DIFFERENT MACHINE VISION ALGORITHMS WITHOUT LABELED DATA
Supervised learning (especially the CNN) has made remarkable achievements in the field of vision after nearly ten years of rapid development, but it has also attracted some criticism. Labeled data are very important for the training of supervised learning, and the labeled data of traditional supervised learning need to be labeled manually, which not only leads to high cost but also appears less intelligent. With the rapid increase in artificial intelligence applications, especially machine vision, researchers hope to achieve model training without a large number of manually annotated datasets. Unsupervised learning can complete training based on unlabeled data, so it can realize object recognition and grasping very intelligently [129]–[132]. Self-supervised learning is a special case of supervised learning that does not need a large number of manually labeled datasets to realize model training [133]–[137].


FIGURE 3. Comparison between supervised and unsupervised learning.

Reinforcement learning learns an optimal policy, which enables an agent to perform an action according to the current state in a specific environment so as to obtain the maximum return. Reinforcement learning was not a focus in the early stage, but with Google's successful application in Atari games and Go, this branch of machine learning has attracted much attention. With the development of deep reinforcement learning, researchers have combined it with machine vision [138]–[142] in the hope of removing the need for labeled data and manual intervention to achieve intelligence.

A. UNSUPERVISED LEARNING
Unsupervised learning is one of the most difficult and important problems in machine vision and machine learning. Many researchers believe that learning from a large amount of unlabeled data can help solve problems concerning intelligence and the nature of learning. In addition, unsupervised learning has practical application value in many fields of computer vision and robot grasping because of the low cost and ease of collecting unlabeled image datasets. Fig. 3 makes it easy to see why researchers consider unsupervised learning more intelligent.

Unsupervised learning can be regarded as a branch of traditional machine learning. Dimension reduction and clustering are well-known unsupervised learning methods, but traditional unsupervised learning is mainly significant for data analysis. With the rapid development of deep learning and the difficulty of label data acquisition, the combination of deep learning and unsupervised learning has gradually become a reasonable research direction. Lenz et al. [143] designed a system to achieve robot grasping from RGB-D images by using deep learning. This method can label data without manual work. To quickly select the grabbing options, the paper proposes a two-step cascaded deep learning network. The first network quickly selects several grabbing strategies with high probability, and the second network takes the output of the first network as its input and calculates the optimal grabbing strategy. Ardon et al. [144] proposed a method to detect and extract multiple grabbing signals from visual input. This method does not need manually defined label data but collects distribution, location and executable grasp label data from 1269 objects to obtain their relationship with the input. Based on these datasets, the model not only learns to grasp the object but also has better generalization ability in different environments. Detry et al. [129] designed a new method of object recognition and grabbing based on dimension reduction and a clustering algorithm and let the model learn from a group of grasping examples to improve its generalization ability. Unsupervised learning has the advantage of object classification based on multimodal information because it does not require label data [130]–[132]. However, due to the inherent defects of vision and the development of sensor technology, integrating vision, tactile feedback and hearing to help the robot achieve accurate recognition and grasping of objects has become a popular direction.

Because unsupervised learning does not need labeled data, it has good generalization and can extend some features of known objects to similar objects to achieve the grasping of unknown objects. In addition, as a pretraining method, unsupervised learning has played an important role in the success of deep neural networks.

B. SELF-SUPERVISED LEARNING
Self-supervised learning mainly uses pretext tasks to mine its own supervision information from large-scale unlabeled datasets, and the neural network is trained on this constructed supervision information to learn representations that are valuable for downstream tasks. As shown in Fig. 4, the assessment of self-supervised learning ability is mainly completed through a pretraining-fine-tuning mode.


FIGURE 4. Process of self-supervised learning.

First, the network is trained on a pretext task using a large number of unlabeled datasets (with supervision information constructed automatically from the data), and a pretrained model is obtained. Then, for new downstream tasks, the algorithm adopts a method similar to supervised learning: parameters are obtained through transfer learning and then fine-tuned. Thus, the ability of self-supervised learning is mainly reflected by the performance on downstream tasks.

Nguyen et al. [133] adopted a self-supervised learning method in which the training datasets are automatically labeled by the model. In this paper, a continuous-level neural network is proposed to reduce the runtime of the grabbing task by eliminating nongraspable samples from the reasoning process, and the network can estimate 18 grasping postures and classify 4 objects at the same time. The experimental results show that the accuracy of the network is 94.8% for grasping posture estimation and 100% for object classification within 0.65 seconds. Murali et al. [134] proposed a new method to accelerate the self-supervised learning process and mapped visual information to a high-level, high-dimensional movement space to realize the training strategy of the model. Florence et al. [135] used self-supervised correspondence to improve the generalization ability and sample efficiency of visually driven policy learning. Yang et al. [137] proposed a critic-policy form to design a deep learning method for a new problem named ''grasping the invisible,'' where a robot is tasked with grasping an initially invisible object via a sequence of nonprehensile (e.g., pushing) and prehensile (e.g., grasping) actions. In this paper, the Bayesian algorithm and a classifier model are combined, the self-supervised method is used to train the motion critic and the classifier through the interaction between the robot and the environment, and a good success rate is achieved in the experiments.

Self-supervised learning is a type of unsupervised learning that realizes supervised training through the automatic generation of labels. Self-supervised learning not only achieves high accuracy and speed in object recognition, classification and grasping attitude estimation but also has good generalization performance.
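A common way to construct the pretext supervision described above is to have the network predict a transformation applied to an unlabeled image, for example its rotation, and then reuse the learned encoder for a downstream grasping or recognition task. The sketch below illustrates that rotation-prediction idea in PyTorch; it is a generic example of a pretext task, not the specific scheme of the works cited in this subsection.

```python
# Minimal sketch of a self-supervised pretext task: predict which rotation
# (0/90/180/270 degrees) was applied to an unlabeled image; the label is free.
import torch
import torch.nn as nn

encoder = nn.Sequential(              # small backbone whose weights are reused downstream
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
rotation_head = nn.Linear(32, 4)      # pretext head: 4 rotation classes
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(rotation_head.parameters()), lr=1e-3)

images = torch.rand(16, 3, 64, 64)    # unlabeled images (placeholder)
k = torch.randint(0, 4, (16,))        # self-generated labels: number of 90-degree turns
rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2)) for img, r in zip(images, k)])

loss = nn.functional.cross_entropy(rotation_head(encoder(rotated)), k)
loss.backward()
optimizer.step()
# After pretraining, `encoder` can be fine-tuned on the labeled downstream task.
print("pretext loss:", loss.item())
```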


FIGURE 5. Mainstream reinforcement learning algorithms.

C. REINFORCEMENT LEARNING
Reinforcement learning has achieved good results in many decision-making fields, especially in games, where it has reached or even surpassed the human level. However, it is not widely used in the field of machine vision, which may be because vision does not seem to correspond directly to a decision-making environment or to interpretable action steps similar to those seen in games. Even so, because reinforcement learning does not need label data and works in a way similar to human beings, it has aroused researchers' enthusiasm to apply it to the visual field. Fig. 5 lists several mainstream reinforcement learning algorithms and their core structures. From the initial Q-learning to the recently popular deep reinforcement learning, it shows that reinforcement learning is developing rapidly. The training process of reinforcement learning, with little or no human intervention, has fascinated many researchers. As early as 2014, the Google DeepMind team applied deep reinforcement learning to the attention mechanism [145]. In 2018, Yu et al. [146] applied deep reinforcement learning to image inpainting and achieved good results. James et al. [147] proposed a new benchmark and learning environment for challenging robotic learning, RLBench, which is designed to accelerate progress in the field of visually guided manipulation. The above research lays a foundation for the application of deep reinforcement learning in machine vision to guide robots in recognizing and grasping objects.

Using model-free deep reinforcement learning, Zeng et al. [148] found that it is feasible for robots to learn cooperative grasping strategies by training two fully convolutional networks, the first mapping from vision to action and the other used for robot grasping. These two networks are jointly trained in a Q-learning framework, and self-supervised training is carried out entirely by a trial-and-error method. In the trial-and-error method, the successful completion of the action is rewarded, and the learning strategy promotes such actions in this way. Wang et al. [149] proposed a method combining Q-learning and visual servoing to solve the grasping problem of wheeled mobile robots and realized robust robot grasping. Gu et al. [150] proposed a new deep reinforcement learning algorithm based on off-policy training of deep Q-functions that can adapt to complex 3D manipulation tasks.

Breyer et al. [151] proposed an object grabbing algorithm based on reinforcement learning. In this paper, the image collected by a depth camera is mapped to a closed-loop control strategy of motion commands, and several different methods are compared to ensure the rationality of the algorithm. Katyal et al. [152] used deep reinforcement learning to make a robot immune to changes of the manipulator or environment and achieve robustness to environmental changes without clear prior knowledge or fine kinematic knowledge of the human arm structure and without careful hand-eye calibration. Ghadirzadeh et al. [153], to address the inherent delay in motion perception processes, proposed a data-based deep predictive policy training (DPPT) framework, which maps observed images to a sequence of motor activations. The system consists of three subnetworks, namely, the perception, policy and behavior superlayers, and each task is trained by policy search reinforcement learning.


TABLE 3. Analysis of advantages and disadvantages of unlabeled data algorithms.

former greatly improved the performance of the agent compared with the latter. Beltran-Hernandez et al. [155] proposed a reinforcement learning model based on a policy-search algorithm, which shows good robustness when generalizing from simple object shapes to complex ones. Li et al. [156] put forward a reinforcement learning strategy for the operation and grasping of a mobile manipulator to address the problem of a human-like mobile robot learning complex grasping actions in a human environment. This strategy reduces the complexity of visual feedback and can deal with changing operation dynamics and uncertain external interference. Miljković et al. [157] proposed an intelligent visual servo controller for robots based on reinforcement learning, developed two temporal-difference algorithms (Q-learning and SARSA) combined with a neural network, and tested them in different visual control scenes. Compared with a traditional image-based visual servo system, the proposed algorithm performs better for low-cost visual system manipulators.

Bousmalis et al. [158] studied how to extend randomized simulation environments and domain adaptation methods to train a grasping system to grasp new objects from monocular RGB images. Using only unlabeled real-world data and the grasp generative adversarial network (GraspGAN) algorithm proposed in that paper, the grasping performance is similar to that obtained with 939,777 labeled real-world samples. James et al. [159] proposed randomized-to-canonical adaptation networks (RCANs) to address the difficulty of acquiring labeled real-world data in robotics and achieved real-world performance using only nonreal-world data. The paper trained a vision-based closed-loop grasping reinforcement learning agent in simulation and then transferred it to the real world, achieving very good performance and proving the effectiveness of this sim-to-real approach. Hellman et al. [160] proposed a contextual multiarmed bandit (C-MAB) reinforcement learning algorithm that integrates vision and tactile feedback to realize the closure of a transparent and easily deformable zipper bag. Platt [161] took tactile feedback as the main information source, combined with partial visual information, and achieved better performance in experiments on grasping planar objects. Merzic et al. [162] used model-free deep reinforcement learning to combine vision and tactile feedback into a control strategy; the results show that tactile feedback can significantly improve grasping robustness for objects with pose uncertainty and complex features.

Traditional reinforcement learning is limited to small action and sample spaces and is usually applied in discrete settings. However, tasks that are more complex and closer to real conditions often involve large state spaces and continuous action spaces, and when the input data are images or sound, their dimensionality is high, which traditional reinforcement learning has difficulty handling. Deep reinforcement learning combines deep learning with reinforcement learning so that the two complement each other and achieve better performance.
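To make this pattern concrete, the minimal sketch below is an illustration only, not code from any of the cited works; the image size, discrete action set, reward and network dimensions are placeholder assumptions. A convolutional network maps a raw camera image to Q-values over a small set of candidate grasp actions, and the network is updated with the standard temporal-difference target.

import torch
import torch.nn as nn

class GraspQNet(nn.Module):
    """Maps a 64 x 64 grayscale image to Q-values over a small set of grasp actions."""
    def __init__(self, num_actions=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 13 * 13, num_actions)

    def forward(self, image):
        return self.head(self.features(image))

def td_update(net, optimizer, batch, gamma=0.99):
    """One Q-learning step on (image, action, reward, next_image, done) tensors."""
    img, act, rew, next_img, done = batch
    q = net(img).gather(1, act.unsqueeze(1)).squeeze(1)          # Q(s, a) of the action taken
    with torch.no_grad():                                        # temporal-difference target
        target = rew + gamma * (1.0 - done) * net(next_img).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

net = GraspQNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
batch = (torch.rand(4, 1, 64, 64), torch.randint(0, 8, (4,)),    # a fake experience batch
         torch.rand(4), torch.rand(4, 1, 64, 64), torch.zeros(4))
print(td_update(net, opt, batch))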


As shown in Table 3, these three types of algorithms do not need manually labeled data, which is a great advantage over traditional CNN algorithms. The three algorithms have not only achieved outstanding results in their own fields but also performed well in vision and robotics. Clustering algorithms from unsupervised learning are widely used in the vision field; by fusing clustering with deep learning, accurate recognition and classification of objects and recognition of the robot's running posture and trajectory can be realized, although the efficiency is low. Data are easy to obtain, but labeling them is costly, so researchers hope that supervised learning can train models with good generalization performance from only a few labeled samples. If a good feature representation can be obtained, it will be conducive to fine-tuning on downstream tasks and to multitask training, which is also the core idea of self-supervised learning. Self-supervised learning takes unlabeled datasets as input, automatically constructs labels from the structure or characteristics of the data itself, and then carries out training similar to supervised learning. Based on these advantages, self-supervised learning has achieved good training effects and high-precision target recognition and positioning in the vision field, but it still has the problem of label rationality.
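As a concrete illustration of constructing labels from the data itself, the sketch below sets up a rotation-prediction pretext task on unlabeled images; the backbone, image size and optimizer settings are assumptions for illustration and are not taken from the cited works. Each image is rotated by 0, 90, 180 or 270 degrees, the rotation index serves as a free label, and the trained backbone can later be fine-tuned on a small labeled grasping or recognition dataset.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_rotation_batch(images):
    """images: [B, C, H, W] with no labels. Returns rotated copies and free labels 0-3."""
    rotated, labels = [], []
    for k in range(4):                                   # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

backbone = nn.Sequential(                                # feature extractor reused downstream
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
rotation_head = nn.Linear(16, 4)                         # pretext-task classifier
optimizer = torch.optim.Adam(list(backbone.parameters()) + list(rotation_head.parameters()))

x, y = make_rotation_batch(torch.rand(8, 3, 32, 32))     # unlabeled images
loss = F.cross_entropy(rotation_head(backbone(x)), y)    # supervised-style loss on free labels
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))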
The principles of reinforcement learning make it less dominant in the fields of vision and target detection and recognition. The integration of reinforcement learning and deep learning is the mainstream research direction and has achieved good performance in many decision-making fields. A visual perception model based on deep reinforcement learning can predict all possible actions in the current state from only the original image as input; therefore, deep reinforcement learning has produced research achievements in action-conditional video prediction tasks. In addition, deep reinforcement learning based on the policy gradient (e.g., trust region policy optimization (TRPO), generalized advantage estimation (GAE), stochastic value gradient (SVG), and asynchronous advantage actor-critic (A3C)) realizes behavior control of robots and has been verified in actual application scenarios. However, the low sampling efficiency of reinforcement learning makes training difficult, and a reasonable reward function and network structure need to be designed to achieve good results.

V. FUSION OF VISUAL AND TACTILE FEEDBACK
After years of development, object recognition and location based on machine vision have achieved great success, which lays a solid foundation for research on robot grasping. At present, representative object detection algorithms (e.g., Faster-RCNN [94], SSD [100], and YOLOv3 [119]) can quickly identify and locate objects, but precise location alone cannot make a manipulator grasp stably in complex environments. Judging from people's own experience in grasping objects, a series of attributes, such as the hardness and quality of objects, are needed to ensure a successful grasp. In addition, the accuracy of machine vision is greatly affected by the surrounding environment: when the robot is applied in a variable light source environment, such as everyday life scenes, the robustness of machine vision is low [193]–[197], and it is difficult to achieve stable grasping by machine vision alone when the object deforms easily [198]. To solve these problems, researchers in the fields of robotics and vision consider adding tactile sensors to the robot to achieve more stable grasping. The research directions are mainly divided into purely tactile object perception [199]–[203] and visual-tactile fusion [204]–[207] for object recognition and grasping.

A. TACTILE FEEDBACK
For human beings, tactile feedback is the second most important signal receptor after vision and plays an important role in daily life. With the development of tactile sensor technology [208]–[212], researchers hope that robots can also have the same tactile perception ability as humans and thus move further toward intelligence. Applying tactile technology to the robot alone is also attractive, since it avoids the fusion of different signals and improves the processing speed of the system. As shown in Fig. 6, Sundaram et al. [213] proposed a low-cost and highly robust tactile glove, which weaves an array pressure sensor onto the surface of a flexible glove that is then worn on the hands of the experimenter to collect tactile data for different objects. By touching different objects, different pressure point cloud images are obtained and fed into a neural network for training to realize object recognition and weight estimation without vision.
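A hypothetical sketch of this kind of pipeline is given below; it is not the network of Sundaram et al. [213], and the 32 x 32 taxel resolution, class count and architecture are assumptions made only for illustration. A small convolutional network takes one frame of the glove's pressure array and jointly outputs an object class and a scalar weight estimate.

import torch
import torch.nn as nn

class TactileNet(nn.Module):
    """Pressure frame (1 x 32 x 32) -> object class logits and a scalar weight estimate."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 16 x 16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 8 x 8
            nn.Flatten(),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # which object is being touched
        self.weight_head = nn.Linear(32 * 8 * 8, 1)           # estimated weight (regression)

    def forward(self, pressure):
        z = self.encoder(pressure)
        return self.classifier(z), self.weight_head(z)

net = TactileNet()
frame = torch.rand(4, 1, 32, 32)             # a batch of simulated pressure maps
logits, weight = net(frame)
print(logits.argmax(dim=1), weight.squeeze(1))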


FIGURE 6. Tactile sensor schematic.

FIGURE 7. Framework of visual-tactile fusion for object recognition.

Rasouli et al. [199] developed a neuromorphic system for tactile pattern recognition, aiming to address the low efficiency and limited capability of artificial tactile sensors. The system achieved 92% classification accuracy in a texture recognition task and showed that there is a tradeoff between response time and classification accuracy. Ward-Cherrier et al. [200] developed the gripping platform GR2, which demonstrated reorientation of a grasped object through active tactile manipulation using a new tactile sensor. The active tactile manipulation proposed in this study is model-free and can be used to study the operating principles of a dexterous hand. Bimbo et al. [201] proposed a method to localize the grasped object in the robot's hand, which includes computing the covariance of the tactile sensor's pressure data and the eigenbasis vectors of its main axes. Liu et al. [202] regarded tactile data as a time series, used dynamic time warping to evaluate the differences between sequences, and proposed a joint kernel sparse coding model for the representation and classification of tactile data. Bhattacharjee et al. [203] used the first two seconds of force, heat, and motion sensing data collected by a robot in a real environment to address the impact of the surroundings on tactile perception when the robot works in a human environment (such as a home), and characterized data-driven approaches to various tactile perception tasks (nearest neighbor, SVM, hidden Markov model, and long short-term memory). The results show the value of multimodal tactile perception and data-driven methods for short-term contact tactile perception.
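As a self-contained illustration of the time-series view, the following sketch computes a dynamic time warping distance between two tactile sequences of different lengths; the sensor dimensionality and the Euclidean frame-to-frame cost are assumptions for illustration and do not reproduce the joint kernel sparse coding model of Liu et al. [202]. The resulting elastic distance can feed a nearest-neighbor or kernel-based classifier.

import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two tactile sequences.
    a: [Ta, D] and b: [Tb, D], each row one frame of D sensor readings."""
    ta, tb = len(a), len(b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])          # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],              # insertion
                                 cost[i, j - 1],              # deletion
                                 cost[i - 1, j - 1])          # match
    return float(cost[ta, tb])

# Two recordings of different lengths from an assumed 16-taxel sensor
seq_a = np.random.rand(40, 16)
seq_b = np.random.rand(55, 16)
print(dtw_distance(seq_a, seq_b))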
The research history of machine tactile feedback is relatively short and clearly lags behind machine vision. This lag is mainly due to the limited hardware performance of tactile sensors and the confusion of sensor types; the shortage of research content and methods for tactile technology has also held tactile research back. With the rapid development of intelligent robots, tactile feedback has gradually attracted the attention of researchers, and many fruitful research achievements have appeared. At present, research on tactile technology mainly focuses on three areas:
1. Hardware improvement of tactile sensors. Through better hardware, the sensitivity of the sensor can be improved, and multiple types of data can be collected at the same time (e.g., temperature, pressure, friction, etc.).
2. Precise extraction of object features based on tactile feedback, so that model-free, stable operation of objects (e.g., grasping, classification, recognition, attitude estimation, etc.) can be achieved and the generalization of tactile feedback improved.
3. Combination of tactile feedback and deep learning to realize the acquisition of tactile datasets and training on them, after which a deep neural network (DNN) realizes weight perception, grasping and classification of objects.
Tactile feedback is second only to vision in information perception, but its research and application lag far behind those of vision, mainly due to the poor universality and reliability of tactile feedback. Applying tactile technology in multisensor sensing systems to realize complementary information perception is a reasonable future research direction.

B. FUSION OF VISION AND TACTILE FEEDBACK
The integration of vision and tactile feedback helps the robot achieve better grasping, which is also more in line with human expectations for robots. However, the research history of robot tactile feedback is relatively short, and many types of sensors and multisensor data fusion approaches are involved, which makes tactile research difficult, scattered and unsystematic. As shown in Fig. 7, an object recognition and grasping system based on visual-tactile fusion is generally divided into four steps. First, 2D vision processing is used to determine the object's position and boundary area, and 3D vision is then used to determine the object's center of mass as the starting point of tactile detection. Second, tactile exploration is carried out for features and positions (e.g., pits, holes or occluded areas) that are hard to determine by vision, to further determine the object's surface features. Third, the information collected by vision and tactile feedback is fused to generate an accurate 3D point cloud. Fourth, an appropriate grasping strategy is generated to guide the robotic arm to complete the grasp based on the visual centroid and tactile features.
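A toy, self-contained skeleton of these four steps is sketched below; the thresholding "detector", the simulated tactile probes and the nearest-point "planner" are placeholder assumptions standing in for the real detection, exploration and grasp-planning components of the surveyed systems.

import numpy as np

def locate_object(depth):
    """Step 1: use vision to get the object's region and 3D centroid.
    Here the 'object' is simply every pixel closer than the mean depth."""
    mask = depth < depth.mean()
    ys, xs = np.nonzero(mask)
    centroid = np.array([xs.mean(), ys.mean(), depth[mask].mean()])
    return mask, centroid

def tactile_explore(centroid, num_probes=5):
    """Step 2: probe around the centroid for features vision cannot resolve.
    A real system would command the arm; here contacts are simulated."""
    offsets = np.random.uniform(-1.0, 1.0, size=(num_probes, 3))
    return centroid + offsets

def fuse_point_cloud(depth, mask, tactile_points):
    """Step 3: merge visual surface points and tactile contact points."""
    ys, xs = np.nonzero(mask)
    visual_points = np.stack([xs, ys, depth[ys, xs]], axis=1)
    return np.vstack([visual_points, tactile_points])

def plan_grasp(cloud, centroid):
    """Step 4: pick the fused point nearest the centroid as a toy grasp target."""
    idx = np.argmin(np.linalg.norm(cloud - centroid, axis=1))
    return cloud[idx]

depth = np.random.rand(64, 64) + 1.0
depth[20:40, 25:45] -= 0.8                     # a nearer 'object' region
mask, centroid = locate_object(depth)
cloud = fuse_point_cloud(depth, mask, tactile_explore(centroid))
print("grasp target:", plan_grasp(cloud, centroid))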


Calandra et al. [204] studied how robots can learn to use tactile information iteratively to effectively adjust their grasping strategy, proposing an end-to-end action-conditional model that learns the grasping strategy from raw visual-tactile data. Guo et al. [205] proposed a vision-tactile combination method based on deep learning for robot grasp detection, and experiments show that tactile data help deep learning to learn better object characteristics for grasp detection tasks. Li et al. [206] designed a slip detection algorithm using the GelSight tactile sensor and a camera installed on the side of the gripper, without knowing the physical parameters of the object in advance. Using the image sequences collected by the two sensors, a DNN is trained to classify the grasped objects and to evaluate the stability of the grasping process.
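The sketch below shows a generic two-stream pattern assumed here for illustration; it is not the network of Guo et al. [205] or Li et al. [206]. One branch encodes the camera image, another encodes the tactile image, the features are concatenated, and a small classifier predicts whether the current grasp is stable or slipping.

import torch
import torch.nn as nn

def small_encoder(in_channels):
    """Shared conv block: image (in_channels x 64 x 64) -> 128-d feature vector."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, 5, stride=2), nn.ReLU(),
        nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 128), nn.ReLU(),
    )

class VisuoTactileNet(nn.Module):
    """Fuses a camera image and a tactile image (e.g., a GelSight frame) and
    predicts whether the current grasp is stable (1) or slipping (0)."""
    def __init__(self):
        super().__init__()
        self.vision = small_encoder(3)    # RGB stream
        self.touch = small_encoder(1)     # tactile stream
        self.classifier = nn.Sequential(
            nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2),
        )

    def forward(self, rgb, tactile):
        fused = torch.cat([self.vision(rgb), self.touch(tactile)], dim=1)
        return self.classifier(fused)

net = VisuoTactileNet()
logits = net(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64))
print(logits.argmax(dim=1))               # predicted stability label per sample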
Garg et al. [207] proposed an adaptive grasping method based on tactile and visual feedback. This method combines model-based partially observable Markov decision process (POMDP) planning with learning in simulation, and it shows strong robustness under uncertainty, strong generalization ability and fast execution for multiple objects. Wang et al. [214] proposed a new method to solve the problems of imprecise visual modeling and low tactile efficiency; by combining vision and tactile feedback and learning prior knowledge of common object shapes from a large shape database, the method can effectively perceive accurate 3D information about the object. Hogan et al. [215] proposed a regrasp control strategy that uses a tactile sensor to adjust the local grasping action. In that paper, local transformations of the measured tactile signal are used to determine the regrasp action and improve the quality of the grasp, and the success rate of visual-tactile fusion is 70% higher than that of vision alone. Sun et al. [216] put forward two different tactile sequence models according to the respective advantages of vision and tactile feedback, proposed an object shape modeling method based on orientation description histogram features, and then considered the accuracy of the grasping point and rapid planning of hand kinematics to achieve the grasping operation.

The research results of the above papers show that the fusion of vision and tactile feedback improves the robustness and success rate of robot grasping, indicating that the introduction of tactile feedback provides a new direction for robot grasping research. The grasping of deformable objects has always been a difficult problem, since the operation requires accurately estimating the real-time state of the objects. At present, the main research direction is machine vision, but vision is very sensitive to occlusion, which is inevitable when the robot moves. Compared with vision, tactile feedback is highly robust, so adding tactile feedback can address this problem well. Sanchez et al. [198] proposed a modular pipeline that can track the shape of deformable objects online by coupling a tactile sensor with a deformation model, achieving robust grasping through the combination of vision and tactile feedback. Jain et al. [217] proposed a simulation-based learning method that uses a simulated five-fingered dexterous hand to train deep visuomotor policies for various manipulation tasks and found that using tactile sensing information gives tasks with highly occluded objects faster learning and better asymptotic performance. Yu et al. [194] proposed a framework that fuses vision and tactile feedback to estimate, in real time, the attitude and contact state of objects relative to the environment, aiming at the application of inserting objects picked up by a suction cup into a small space. A fusion algorithm based on iSAM (an online estimation technique) is adopted in the framework to fuse robot motion measurements, geometric contact between the object and the container, and visual tracking, and a data-driven method is then proposed to infer the contact information and achieve better grasping and placement. Santina et al. [218] proposed a data-driven autonomous grasping mechanism for a humanoid soft hand to improve grasping performance. The nail of the humanoid soft hand is equipped with an inertial measurement unit to detect contact with objects, and a classifier obtained by a deep neural network takes the visual information of the grasped object as input and predicts the grasping action. Hang et al. [219] proposed a unified framework for grasp planning and in-hand grasp adaptation based on visual, tactile and proprioceptive feedback. The main purpose of the framework is to solve the problems of object deformation, sliding and external interference to achieve stable grasping.

TABLE 4. Comparative analysis of vision and tactile feedback.

As shown in Table 4, visual and tactile feedback are the basic ways for a human or robot to perceive the environment or a target, and they are key research fields for scholars across the globe. Because of their different principles and data structures, each has advantages and disadvantages in perception and recognition, so combining them is a reasonable choice. The combination of vision and tactile feedback realizes complementary advantages and can achieve more accurate object recognition, real-time state estimation, grasp force adjustment, 3D object modeling, grasp pose detection and other functions, but the process of analyzing such multivariate data is difficult. At present, the mainstream research direction of visual-tactile fusion is to realize the direct input of visual-tactile data and the output of results via end-to-end deep learning. However, some problems remain, such as the lack of a general research framework, confusion over methods, and challenges related to unified evaluation.

VI. DISCUSSION AND FUTURE DIRECTIONS
The ultimate goal of researchers is to create machine vision and robots that have the same visual recognition and grasping ability as human beings; this is an important step that must be achieved before robots can be widely applied, from industry to daily life.


Although there has been great progress in object recognition, location, grasping speed and accuracy, there is still a vast gap between robots and human beings when facing unstructured life scenes, which is an important reason why robots cannot yet be applied in daily life. Based on the development status of machine vision and an analogy with human vision, the following thoughts are put forward regarding the future development of robot grasping.
1. Vision is still the mainstream technology. Because of its noncontact and high-efficiency characteristics, vision has great advantages. With the development of camera technology, the collection of environmental and object information will become more accurate and robust, which will greatly advance machine vision.
2. Tactile feedback will become an important part of robot grasping systems. Because of the inherent defects of vision, it is difficult to generate an appropriate grasping strategy in complex environments from the object characteristics collected by vision alone. Hence, the combination of vision and tactile feedback will be an important future development direction, so that accurate recognition, positioning and stable grasping of objects can be achieved.
3. CNNs will still develop rapidly over a short period of time but may be replaced in the future. The CNN model has evolved step by step from giant to lightweight networks while achieving continuously higher accuracy. However, it needs a massive amount of labeled data for training, which is time consuming; real artificial intelligence (AI) needs the ability to perform few-shot learning.
4. Reinforcement learning and unsupervised learning will develop rapidly. Because of their low dependence on labeled data, their training process is relatively intelligent, which meets people's expectations of AI.

VII. CONCLUSION
Machine vision and robotics are two research directions that inspire researchers all over the world. People hope to combine these two streams of research to create robots with the same target recognition and grasping ability as humans, which could lead to the partial realization of futuristic scenes from movies or science fiction. In this paper, the mainstream machine vision technology applied to robots is reviewed in detail, including traditional machine learning; CNNs, which have achieved good results in recent years; and reinforcement learning, unsupervised learning and self-supervised learning, which avoid the limitations of labeled data. In view of the limitations of vision, this paper also summarizes the development of tactile feedback in detail. This survey provides a detailed reference for evaluating current research on robot grasping based on machine vision and tactile feedback. Future research directions of machine vision and robot grasping are also considered.

REFERENCES
[1] L. Bozhkov and P. Georgieva, ''Overview of deep learning architectures for EEG-based brain imaging,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN). Rio de Janeiro, Brazil: IEEE, Jul. 2018, pp. 1–7.
[2] X. Shen, H.-S. Kim, S. Komatsu, A. Markman, and B. Javidi, ''Spatial-temporal human gesture recognition under degraded conditions using three-dimensional integral imaging: An overview,'' in Proc. 17th Workshop Inf. Opt. (WIO). Québec, QC, Canada: IEEE, Jul. 2018, pp. 13938–13951.
[3] B. Gite, K. Nikhal, and F. Palnak, ''Evaluating facial expressions in real time,'' in Proc. Intell. Syst. Conf. (IntelliSys). London, U.K.: IEEE, Sep. 2017, pp. 849–855.
[4] P. Panchal, V. C. Raman, and S. Mantri, ''Plant diseases detection and classification using machine learning models,'' in Proc. 4th Int. Conf. Comput. Syst. Inf. Technol. Sustain. Solution (CSITSS). Bengaluru, India: IEEE, Dec. 2019, pp. 1–6.
[5] M. Gao, J. Jiang, G. Zou, V. John, and Z. Liu, ''RGB-D-based object recognition using multimodal convolutional neural networks: A survey,'' IEEE Access, vol. 7, pp. 43110–43136, 2019.
[6] H. Wang, H. Du, Y. Zhao, and J. Yan, ''A comprehensive overview of person re-identification approaches,'' IEEE Access, vol. 8, pp. 45556–45583, 2020.
[7] M. E. Celebi, N. Codella, and A. Halpern, ''Dermoscopy image analysis: Overview and future directions,'' IEEE J. Biomed. Health Inform., vol. 23, no. 2, pp. 474–478, Mar. 2019.
[8] H. Greenspan, B. van Ginneken, and R. M. Summers, ''Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique,'' IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1153–1159, May 2016.
[9] D. Zhao, Y. Chen, and L. Lv, ''Deep reinforcement learning with visual attention for vehicle classification,'' IEEE Trans. Cognit. Develop. Syst., vol. 9, no. 4, pp. 356–367, Dec. 2017.
[10] W. Zhang, K. Song, X. Rong, and Y. Li, ''Coarse-to-fine UAV target tracking with deep reinforcement learning,'' IEEE Trans. Autom. Sci. Eng., vol. 16, no. 4, pp. 1522–1530, Oct. 2019.
[11] N. Hajj and M. Awad, ''On biologically inspired stochastic reinforcement deep learning: A case study on visual surveillance,'' IEEE Access, vol. 7, pp. 108431–108437, 2019.
[12] H. Yuan, D. Li, and J. Wu, ''Efficient learning of grasp selection for five-finger dexterous hand,'' in Proc. IEEE 7th Annu. Int. Conf. CYBER Technol. Autom., Control, Intell. Syst. (CYBER). Honolulu, HI, USA: IEEE, Jul. 2017, pp. 1101–1106.
[13] J. Yang, S. Li, Z. Gao, Z. Wang, and W. Liu, ''Real-time recognition method for 0.8 cm darning needles and KR22 bearings based on convolution neural networks and data increase,'' Appl. Sci., vol. 8, no. 1857, pp. 1–18, 2018.
[14] J. Yang, S. Li, Z. Wang, and G. Yang, ''Real-time tiny part defect detection system in manufacturing using deep learning,'' IEEE Access, vol. 7, pp. 89278–89291, 2019.
[15] A. Wang, M. Chu, M. Sha, and L. Liu, ''A new process industry fault diagnosis algorithm based on ensemble improved binary-tree SVM,'' Chin. J. Electron., vol. 24, no. 2, pp. 258–262, Apr. 2015.
[16] J. Li, N. Allinson, D. Tao, and X. Li, ''Multitraining support vector machine for image retrieval,'' IEEE Trans. Image Process., vol. 15, no. 11, pp. 3597–3601, Nov. 2006.
[17] E. Pasolli, F. Melgani, and Y. Bazi, ''Support vector machine active learning through significance space construction,'' IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 431–435, May 2011.
[18] D. Singh, D. Roy, and C. K. Mohan, ''DiP-SVM: Distribution preserving kernel support vector machine for big data,'' IEEE Trans. Big Data, vol. 3, no. 1, pp. 79–90, Mar. 2017.
[19] J. Ruan, H. Jiang, X. Li, Y. Shi, F. T. S. Chan, and W. Rao, ''A granular GA-SVM predictor for big data in agricultural cyber-physical systems,'' IEEE Trans. Ind. Informat., vol. 15, no. 12, pp. 6510–6521, Dec. 2019.
[20] X. Hu, P. Niu, J. Wang, and X. Zhang, ''A dynamic rectified linear activation units,'' IEEE Access, vol. 7, pp. 180409–180416, 2019.
[21] B. Zhang, M. Zhu, M. Yu, D. Pu, and G. Feng, ''Extreme residual connected convolution-based collaborative filtering for document context-aware rating prediction,'' IEEE Access, vol. 8, pp. 53604–53613, 2020.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.

[23] L. Xiang, G. Zhao, Q. Li, W. Hao, and F. Li, ‘‘TUMK-ELM: A fast unsu- [45] M. Butcher and A. Karimi, ‘‘Linear parameter-varying iterative learning
pervised heterogeneous data learning approach,’’ IEEE Access, vol. 6, control with application to a linear motor system,’’ IEEE/ASME Trans.
pp. 35305–35315, 2018. Mechatronics, vol. 15, no. 3, pp. 412–420, Jun. 2010.
[24] M. Usama, J. Qadir, A. Raza, H. Arif, K.-L.-A. Yau, Y. Elkhatib, [46] J.-G. Hsieh, Y.-L. Lin, and J.-H. Jeng, ‘‘Preliminary study on Wilcoxon
A. Hussain, and A. Al-Fuqaha, ‘‘Unsupervised machine learning for learning machines,’’ IEEE Trans. Neural Netw., vol. 19, no. 2,
networking: Techniques, applications and research challenges,’’ IEEE pp. 201–211, Feb. 2008.
Access, vol. 7, pp. 65579–65615, 2019. [47] J. Song, F. Dong, J. Zhao, H. Wang, Z. He, and L. Wang, ‘‘An efficient
[25] J.-Y. Zhu, J. Wu, Y. Xu, E. Chang, and Z. Tu, ‘‘Unsupervised object multiobjective design optimization method for a PMSLM based on an
class discovery via saliency-guided multiple class learning,’’ IEEE Trans. extreme learning machine,’’ IEEE Trans. Ind. Electron., vol. 66, no. 2,
Pattern Anal. Mach. Intell., vol. 37, no. 4, pp. 862–875, Apr. 2015. pp. 1001–1011, Feb. 2019.
[26] C. Liu, L. Song, J. Zhang, K. Chen, and J. Xu, ‘‘Self-supervised learning [48] N. D. Vanli, M. O. Sayin, I. Delibalta, and S. S. Kozat, ‘‘Sequential
for specified latent representation,’’ IEEE Trans. Fuzzy Syst., vol. 28, nonlinear learning for distributed multiagent systems via extreme learn-
no. 1, pp. 47–59, Jan. 2020. ing machines,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3,
[27] A. Zhao, J. Dong, and H. Zhou, ‘‘Self-supervised learning from multi- pp. 546–558, Mar. 2017.
sensor data for sleep recognition,’’ IEEE Access, vol. 8, pp. 93907–93921, [49] M. H. C. Law and A. K. Jain, ‘‘Incremental nonlinear dimensionality
2020. reduction by manifold learning,’’ IEEE Trans. Pattern Anal. Mach. Intell.,
[28] W. Abdullah Al and I. D. Yun, ‘‘Partial policy-based reinforcement vol. 28, no. 3, pp. 377–391, Mar. 2006.
learning for anatomical landmark localization in 3D medical images,’’ [50] H. Liu, Z. Liu, S. Liu, Y. Liu, J. Bin, F. Shi, and H. Dong, ‘‘A nonlinear
IEEE Trans. Med. Imag., vol. 39, no. 4, pp. 1245–1255, Apr. 2020. regression application via machine learning techniques for geomagnetic
[29] H. Liu, Y. Yu, F. Sun, and J. Gu, ‘‘Visual–tactile fusion for object data reconstruction processing,’’ IEEE Trans. Geosci. Remote Sens.,
recognition,’’ IEEE Trans. Autom. Sci. Eng., vol. 14, no. 2, pp. 996–1008, vol. 57, no. 1, pp. 128–140, Jan. 2019.
Apr. 2017. [51] G. Chen, J. Du, L. Sun, W. Zhang, K. Xu, X. Chen, G. T. Reed, and
[30] X. Li, H. Liu, J. Zhou, and F. Sun, ‘‘Learning cross-modal visual-tactile Z. He, ‘‘Nonlinear distortion mitigation by machine learning of SVM
representation using ensembled generative adversarial networks,’’ Cog- classification for PAM-4 and PAM-8 modulated optical interconnection,’’
nit. Comput. Syst., vol. 1, no. 2, pp. 40–44, Jul. 2019. J. Lightw. Technol., vol. 36, no. 3, pp. 650–657, Feb. 1, 2018.
[31] P. Falco, S. Lu, C. Natale, S. Pirozzi, and D. Lee, ‘‘A transfer learn- [52] K. Gao, W. Guo, X. Yu, B. Liu, A. Yu, and X. Wei, ‘‘Deep induction
ing approach to cross-modal object recognition: From visual observa- network for small samples classification of hyperspectral images,’’ IEEE
tion to robotic haptic exploration,’’ IEEE Trans. Robot., vol. 35, no. 4, J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 13, pp. 3462–3477,
pp. 987–998, Aug. 2019. 2020.
[32] F. D. Ledezma and S. Haddadin, ‘‘FOP networks for learning humanoid [53] D. Zhang, W. Ding, C. Liu, H. Wang, and B. Zhang, ‘‘Modulated auto-
body schema and dynamics,’’ in Proc. IEEE-RAS 18th Int. Conf. correlation convolution networks for automatic modulation classification
Humanoid Robots (Humanoids). Beijing, China: IEEE, Nov. 2018, based on small sample set,’’ IEEE Access, vol. 8, pp. 27097–27105, 2020.
pp. 1–9.
[54] Q. Zhou and X. He, ‘‘Broad learning model based on enhanced features
[33] M. C. Capolei, N. A. Andersen, H. H. Lund, E. Falotico, and S. Tolu, learning,’’ IEEE Access, vol. 7, pp. 42536–42550, 2019.
‘‘A cerebellar internal models control architecture for online sensorimotor
[55] J. Xu, Y. Y. Tang, B. Zou, Z. Xu, L. Li, Y. Lu, and B. Zhang,
adaptation of a humanoid robot acting in a dynamic environment,’’ IEEE
‘‘The generalization ability of SVM classification based on Markov sam-
Robot. Autom. Lett., vol. 5, no. 1, pp. 80–87, Jan. 2020.
pling,’’ IEEE Trans. Cybern., vol. 45, no. 6, pp. 1169–1179, Jun. 2015.
[34] F. Keyrouz, ‘‘A novel robotic sound localization and separation using non-
causal filtering and Bayesian fusion,’’ in Proc. IEEE 26th Int. Workshop [56] C. Lu, A. Devos, J. A. K. Suykens, C. Arus, and S. Van Huffel, ‘‘Bag-
Mach. Learn. Signal Process. (MLSP). Vietri sul Mare, Italy: IEEE, ging linear sparse Bayesian learning models for variable selection in
Sep. 2016, pp. 1–6. cancer diagnosis,’’ IEEE Trans. Inf. Technol. Biomed., vol. 11, no. 3,
pp. 338–347, May 2007.
[35] E. Sauser and A. Billard, ‘‘Biologically inspired multimodal inte-
gration: Interferences in a human-robot interaction game,’’ in Proc. [57] A. Luo, F. An, X. Zhang, and H. J. Mattausch, ‘‘A hardware-efficient
IEEE/RSJ Int. Conf. Intell. Robots Syst. Beijing, China: IEEE, Oct. 2006, recognition accelerator using Haar-like feature and SVM classifier,’’
pp. 5619–5624. IEEE Access, vol. 7, pp. 14472–14487, 2019.
[36] M. Toussaint and C. Goerick, ‘‘Probabilistic inference for structured [58] R. Trinchero, P. Manfredi, I. S. Stievano, and F. G. Canavero, ‘‘Machine
planning in robotics,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. learning for the performance assessment of high-speed links,’’ IEEE
San Diego, CA, USA: IEEE, Oct. 2007, pp. 3068–3073. Trans. Electromagn. Compat., vol. 60, no. 6, pp. 1627–1634, Dec. 2018.
[37] Q. Zhang, L. T. Yang, and Z. Chen, ‘‘Deep computation model for unsu- [59] A. J. Siddiqui, A. Mammeri, and A. Boukerche, ‘‘Real-time vehicle make
pervised feature learning on big data,’’ IEEE Trans. Services Comput., and model recognition based on a bag of SURF features,’’ IEEE Trans.
vol. 9, no. 1, pp. 161–171, Feb. 2016. Intell. Transp. Syst., vol. 17, no. 11, pp. 3205–3219, Nov. 2016.
[38] W. Wang and M. Zhang, ‘‘Tensor deep learning model for heterogeneous [60] M. C. Ergene and A. Durdu, ‘‘Robotic hand grasping of objects classified
data fusion in Internet of Things,’’ IEEE Trans. Emerg. Topics Comput. by using support vector machine and bag of visual words,’’ in Proc.
Intell., vol. 4, no. 1, pp. 32–41, Feb. 2020. Int. Artif. Intell. Data Process. Symp. (IDAP). Malatya, Turkey: IEEE,
[39] Y. Lei, F. Jia, J. Lin, S. Xing, and S. X. Ding, ‘‘An intelligent fault diag- Sep. 2017, pp. 1–5.
nosis method using unsupervised feature learning towards mechanical [61] Y. Hu, Z. Li, G. Li, P. Yuan, C. Yang, and R. Song, ‘‘Development
big data,’’ IEEE Trans. Ind. Electron., vol. 63, no. 5, pp. 3137–3147, of sensory-motor fusion-based manipulation and grasping control for a
May 2016. robotic hand-eye system,’’ IEEE Trans. Syst., Man, Cybern. Syst., vol. 47,
[40] G. A. Susto, A. Schirru, S. Pampuri, and S. McLoone, ‘‘Supervised no. 7, pp. 1169–1180, Jul. 2017.
aggregative feature extraction for big data time series regression,’’ IEEE [62] C. M. o. Valente, A. Schammass, A. F. R. Araujo, and G. A. P. Caurin,
Trans. Ind. Informat., vol. 12, no. 3, pp. 1243–1252, Jun. 2016. ‘‘Intelligent Grasping Using Neural Modules,’’ in Proc. IEEE Int. Conf.
[41] N. Yu, Z. Li, and Z. Yu, ‘‘Survey on encoding schemes for genomic data Syst., Man, Cybern. Tokyo, Japan: IEEE, Oct. 1999, pp. 780–785.
representation and feature learning—From signal processing to machine [63] M. Hannat, N. Zrira, Y. Raoui, and E. H. Bouyakhf, ‘‘A fast object
learning,’’ Big Data Mining Anal., vol. 1, no. 3, pp. 191–210, 2018. recognition and categorization technique for robot grasping using the
[42] F. Ye, Z. Zhang, K. Chakrabarty, and X. Gu, ‘‘Board-level functional visual bag of words,’’ in Proc. 5th Int. Conf. Multimedia Comput. Syst.
fault diagnosis using multikernel support vector machines and incremen- (ICMCS). Marrakech, Morocco: IEEE, Sep. 2016, pp. 173–178.
tal learning,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., [64] K. Harada, T. Tsuji, K. Nagata, N. Yamanobe, H. Onda, T. Yoshimi, and
vol. 33, no. 2, pp. 279–290, Feb. 2014. Y. Kawai, ‘‘Object placement planner for robotic pick and place tasks,’’ in
[43] D. Elizondo, ‘‘The linear separability problem: Some testing methods,’’ Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. Vilamoura, Portugal: IEEE,
IEEE Trans. Neural Netw., vol. 17, no. 2, pp. 330–344, Mar. 2006. Oct. 2012, pp. 980–985.
[44] A. J. Stimpson and M. L. Cummings, ‘‘Assessing intervention timing [65] N. K. Verma, A. Mustafa, and A. Salour, ‘‘Stereo-vision based object
in computer-based education using machine learning algorithms,’’ IEEE grasping using robotic manipulator,’’ in Proc. 11th Int. Conf. Ind. Inf. Syst.
Access, vol. 2, pp. 78–87, 2014. (ICIIS). Roorkee, India: IEEE, Dec. 2016, pp. 95–100.


[66] J. Zhang and L. Shen, ‘‘Clustering and recognition for automated tracking [86] L. Xu, L. Wang, Y. Zhang, and S. Cheng, ‘‘Visual tracking based
and grasping of moving objects,’’ in Proc. IEEE Workshop Electron., on siamese network of fused score map,’’ IEEE Access, vol. 7,
Comput. Appl. Ottawa, ON, Canada: IEEE, May 2014, pp. 222–229. pp. 151389–151398, 2019.
[67] R. Kouskouridas, A. Amanatiadis, and A. Gasteratos, ‘‘Guiding a robotic [87] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan,
gripper by visual feedback for object manipulation tasks,’’ in Proc. IEEE V. Vanhoucke, and A. Rabinovich, ‘‘Going deeper with convolutions,’’ in
Int. Conf. Mechatronics. Istanbul, Turkey: IEEE, Apr. 2011, pp. 433–438. Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). Boston, MA,
[68] G. Wiesmann, S. Schraml, M. Litzenberger, A. N. Belbachir, USA: IEEE, Jun. 2015, pp. 1–9.
M. Hofstatter, and C. Bartolozzi, ‘‘Event-driven embodied system [88] Q. Gao, J. Liu, Z. Ju, and X. Zhang, ‘‘Dual-hand detection for human–
for feature extraction and object recognition in robotic applications,’’ robot interaction by a parallel network based on hand detection and
in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. body pose estimation,’’ IEEE Trans. Ind. Electron., vol. 66, no. 12,
Workshops. Providence, RI, USA: IEEE, Jun. 2012, pp. 76–82. pp. 9663–9672, Dec. 2019.
[69] O. Skotheim, M. Lind, P. Ystgaard, and S. A. Fjerdingen, ‘‘A flexible [89] S. Yang, G. Lin, Q. Jiang, and W. Lin, ‘‘A dilated inception network
3D object localization system for industrial part handling,’’ in Proc. for visual saliency prediction,’’ IEEE Trans. Multimedia, vol. 22, no. 8,
IEEE/RSJ Int. Conf. Intell. Robots Syst. Vilamoura, Portugal: IEEE, pp. 2163–2176, Aug. 2020.
Oct. 2012, pp. 3326–3333. [90] X. Jin, L. Wu, X. Li, X. Zhang, J. Chi, S. Peng, S. Ge, G. Zhao, and S. Li,
[70] W. Budiharto, ‘‘Robust vision-based detection and grasping object for ‘‘ILGNet: Inception modules with connected local and global features for
manipulator using SIFT keypoint detector,’’ in Proc. Int. Conf. Adv. Mech. efficient image aesthetic quality classification using domain adaptation,’’
Syst. Kumamoto, Japan: IEEE, Aug. 2014, pp. 448–452. IET Comput. Vis., vol. 13, no. 2, pp. 206–212, Mar. 2019.
[71] F. Wang, F. Sun, J. Zhang, B. Lin, and X. Li, ‘‘Unscented particle filter for [91] W. Dongyu, H. Fuwen, T. Mikolajczyk, and H. Yunhua, ‘‘Object detec-
online total image Jacobian matrix estimation in robot visual servoing,’’ tion for soft robotic manipulation based on RGB-D sensors,’’ in Proc.
IEEE Access, vol. 7, pp. 92020–92029, 2019. WRC Symp. Adv. Robot. Autom. (WRC SARA). Beijing, China: IEEE,
[72] Y. Bekiroglu, D. Song, L. Wang, and D. Kragic, ‘‘A probabilis- Aug. 2018, pp. 52–58.
tic framework for task-oriented grasp stability assessment,’’ in Proc. [92] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for
IEEE Int. Conf. Robot. Autom. Karlsruhe, Germany: IEEE, May 2013, large-scale image recognition,’’ in Proc. Int. Conf. Learn. Represent.
pp. 3040–3047. (ICLR), San Diego, CA, USA, 2015, pp. 1–14.
[73] H. O. Song, M. Fritz, D. Goehring, and T. Darrell, ‘‘Learning to detect [93] W. Guan, T. Wang, J. Qi, L. Zhang, and H. Lu, ‘‘Edge-aware convolution
visual grasp affordance,’’ IEEE Trans. Autom. Sci. Eng., vol. 13, no. 2, neural network based salient object detection,’’ IEEE Signal Process.
pp. 798–809, Apr. 2016. Lett., vol. 26, no. 1, pp. 114–118, Jan. 2019.
[74] Z. Zhang, S. Mao, K. Chen, L. Xiao, B. Liao, C. Li, and P. Zhang, [94] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time
‘‘CNN and PCA based visual system of a wheelchair manipulator robot object detection with region proposal networks,’’ IEEE Trans. Pattern
for automatic drinking,’’ in Proc. IEEE Int. Conf. Robot. Biomimetics Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
(ROBIO). Kuala Lumpur, Malaysia: IEEE, Dec. 2018, pp. 1280–1286. [95] Z. Zhao, T. Cai, F. Chang, and X. Cheng, ‘‘Real-time surgical instru-
[75] E. Mattar, ‘‘PCA Learning for Non-brain Waves-Controlled Robotic ment detection in robot-assisted surgery using a convolutional neural
Hand (Prosthesis): Grasp Stabilization and Control,’’ in Proc. UKSim- network cascade,’’ Healthcare Technol. Lett., vol. 6, no. 6, pp. 275–279,
AMSS 16th Int. Conf. Comput. Modeling Simulation. Cambridge, U.K.: Dec. 2019.
IEEE, Mar. 2014, pp. 211–216. [96] L. Liu, X. Tang, J. Xie, X. Gao, W. Zhao, F. Mo, and G. Zhang, ‘‘Deep-
[76] T. Ishii, R. Nakamura, H. Nakada, Y. Mochizuki, and H. Ishikawa, learning and depth-map based approach for detection and 3D localization
‘‘Surface object recognition with CNN and SVM in Landsat 8 images,’’ of small traffic signs,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote
in Proc. 14th IAPR Int. Conf. Mach. Vis. Appl. (MVA). Miraikan, Japan: Sens., vol. 13, pp. 2096–2111, 2020.
IEEE Press, May 2015, pp. 341–344. [97] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for
[77] Y. Shin and I. Balasingham, ‘‘Comparison of hand-craft feature based image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
SVM and CNN based deep learning framework for automatic polyp (CVPR). Las Vegas, NV, USA: IEEE, Jun. 2016, pp. 1–12.
classification,’’ in Proc. 39th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. [98] Y. Liao, P. Xiong, W. Min, W. Min, and J. Lu, ‘‘Dynamic sign lan-
(EMBC). Seogwipo, South Korea: IEEE, Jul. 2017, pp. 3277–3280. guage recognition based on video sequence with BLSTM-3D residual
[78] A. Wibisono, M. S. Saputri, P. Mursanto, J. Rachmad, Alberto, networks,’’ IEEE Access, vol. 7, pp. 38044–38054, 2019.
A. T. W. Yudasubrata, F. Rizki, and E. Anderson, ‘‘Deep learning and [99] X. Ou, P. Yan, Y. Zhang, B. Tu, G. Zhang, J. Wu, and W. Li, ‘‘Moving
classic machine learning approach for automatic bone age assessment,’’ in object detection method via ResNet-18 with Encoder–Decoder structure
Proc. 4th Asia–Pacific Conf. Intell. Robot Syst. (ACIRS). Nagoya, Japan: in complex scenes,’’ IEEE Access, vol. 7, pp. 108152–108160, 2019.
IEEE, Jul. 2019, pp. 235–240. [100] W. Liu, ‘‘SSD: Single Shot MultiBox Detector,’’ in Proc. Eur. Conf.
[79] P. Wang, L. Li, Y. Jin, and G. Wang, ‘‘Detection of unwanted traffic Comput. Vis. (ECCV), Amsterdam, The Netherlands, 2016, pp. 1–17.
congestion based on existing surveillance system using in freeway via [101] X. Li, C. Liu, S. Dai, H. Lian, and G. Ding, ‘‘Scale specified single
a CNN-architecture trafficnet,’’ in Proc. 13th IEEE Conf. Ind. Electron. shot multibox detector,’’ IET Comput. Vis., vol. 14, no. 2, pp. 59–64,
Appl. (ICIEA). Wuhan, China: IEEE, May 2018, pp. 1134–1139. Mar. 2020.
[80] Y. Wang, C. Wang, L. Luo, and Z. Zhou, ‘‘Image classification based on [102] L. Chen, Z. Zhang, and L. Peng, ‘‘Fast single shot multibox detector
transfer learning of convolutional neural network,’’ in Proc. Chin. Control and its application on vehicle counting system,’’ IET Intell. Transp. Syst.,
Conf. (CCC). Guangzhou, China: IEEE, Jul. 2019, pp. 7506–7510. vol. 12, no. 10, pp. 1406–1413, Dec. 2018.
[81] S. Sudha, K. B. Jayanthi, C. Rajasekaran, and T. Sunder, ‘‘Segmen- [103] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look
tation of RoI in medical images using CNN-a comparative study,’’ in once: Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput.
Proc. TENCON-IEEE Region 10th Conf. (TENCON). Kochi, India: IEEE, Vis. Pattern Recognit. (CVPR). Las Vegas, NV, USA: IEEE, Jun. 2016,
Oct. 2019, pp. 767–771. pp. 779–788.
[82] B. Jiang, J. He, S. Yang, H. Fu, T. Li, H. Song, and D. He, ‘‘Fusion of [104] Y. Yu, K. Zhang, H. Liu, L. Yang, and D. Zhang, ‘‘Real-time visual local-
machine vision technology and AlexNet-CNNs deep learning network ization of the picking points for a ridge-planting strawberry harvesting
for the detection of postharvest apple pesticide residues,’’ Artif. Intell. robot,’’ IEEE Access, vol. 8, pp. 116556–116568, 2020.
Agricult., vol. 1, pp. 1–8, Mar. 2019. [105] L. Yang, M. Li, X. Song, Z. Xiong, C. Hou, and B. Qu, ‘‘Vehicle speed
[83] A. Ibrahim, A. Dalbah, A. Abualsaud, U. Tariq, and A. El-Hag, ‘‘Appli- measurement based on binocular stereovision system,’’ IEEE Access,
cation of machine learning to evaluate insulator surface erosion,’’ IEEE vol. 7, pp. 106628–106641, 2019.
Trans. Instrum. Meas., vol. 69, no. 2, pp. 314–316, Feb. 2020. [106] T. Kitayama, H. Lu, Y. Li, and H. Kim ‘‘Detection of grasping posi-
[84] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learn- tion from video images based on SSD,’’ in Proc. 18th Int. Conf. Con-
ing applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, trol, Autom. Syst. (ICCAS). Daegwallyeong, South Korea, Oct. 2018,
pp. 2278–2324, 1998. pp. 1472–1475.
[85] M. Zhou, Z. Pan, Y. Liu, Q. Zhang, Y. Cai, and H. Pan, ‘‘Leak detection [107] Y. Chao, X. Chen, and N. Xiao, ‘‘Deep learning-based grasp-detection
and location based on ISLMD and CNN in a pipeline,’’ IEEE Access, method for a five-fingered industrial robot hand,’’ IET Comput. Vis.,
vol. 7, pp. 30457–30464, 2019. vol. 13, no. 1, pp. 61–70, Feb. 2019.


[108] G. Wu, W. Chen, H. Cheng, W. Zuo, D. Zhang, and J. You, ‘‘Multi-object [130] T. Nakamura, T. Nagai, and N. Iwahashi, ‘‘Multimodal categorization
grasping detection with hierarchical feature fusion,’’ IEEE Access, vol. 7, by hierarchical Dirichlet process,’’ in Proc. IEEE/RSJ Int. Conf. Intell.
pp. 43884–43894, 2019. Robots Syst. San Francisco, CA, USA: IEEE, Sep. 2011, pp. 1520–1525.
[109] K. Choi, J. K. Suhr, and H. G. Jung, ‘‘Map-matching-based cascade [131] T. Nakamura, T. Nagai, and N. Iwahashi, ‘‘Multimodal object catego-
landmark detection and vehicle localization,’’ IEEE Access, vol. 7, no. 1, rization by a robot,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst.
pp. 127874–127894, 2019. San Diego, CA, USA: IEEE, Oct. 2007, pp. 2415–2420.
[110] Y. Xu, L. Wang, A. Yang, and L. Chen, ‘‘GraspCNN: Real-time grasp [132] T. Nagai and N. Iwahashi, ‘‘Object categorization using multimodal
detection using a new oriented diameter circle representation,’’ IEEE information,’’ in Proc. TENCON-IEEE Region 10th Conf. Hong Kong:
Access, vol. 7, pp. 159322–159331, 2019. IEEE, 2006, pp. 1–4.
[111] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, ‘‘Focal loss for [133] V.-T. Nguyen, C. Lin, C.-H.-G. Li, S.-M. Guo, and J.-J.-J. Lien, ‘‘Visual-
dense object detection,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, guided robot arm using self-supervised deep convolutional neural net-
no. 2, pp. 318–327, Feb. 2020. works,’’ in Proc. IEEE 15th Int. Conf. Autom. Sci. Eng. (CASE).
[112] H. Law and J. Deng, ‘‘CornerNet: Detecting objects as paired keypoints,’’ Vancouver, BC, Canada: IEEE, Aug. 2019, pp. 1415–1420.
Int. J. Comput. Vis., vol. 128, no. 3, pp. 642–656, Mar. 2020. [134] A. Murali, L. Pinto, D. Gandhi, and A. Gupta, ‘‘CASSL: Curricu-
[113] H. Cheng and M. Q.-H. Meng, ‘‘A grasp pose detection scheme with lum accelerated self-supervised learning,’’ in Proc. IEEE Int. Conf.
an end-to-end CNN regression approach,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA). Brisbane, QLD, Australia: IEEE, May 2018,
Robot. Biomimetics (ROBIO). Kuala Lumpur, Malaysia: IEEE: Malaysia, pp. 6453–6460.
Dec. 2018, pp. 544–549. [135] P. Florence, L. Manuelli, and R. Tedrake, ‘‘Self-supervised correspon-
[114] F. H. Zunjani, S. Sen, H. Shekhar, A. Powale, D. Godnaik, and dence in visuomotor policy learning,’’ IEEE Robot. Autom. Lett., vol. 5,
G. C. Nandi, ‘‘Intent-based object grasping by a robot using deep no. 2, pp. 492–499, Apr. 2020.
learning,’’ in Proc. IEEE 8th Int. Advance Comput. Conf. (IACC). [136] M. Yan, Y. Zhu, N. Jin, and J. Bohg, ‘‘Self-supervised learning of state
Greater Noida, India: IEEE, Dec. 2018, pp. 246–251. estimation for manipulating deformable linear objects,’’ IEEE Robot.
[115] E. Corona, G. Alenya, A. Gabas, and C. Torras, ‘‘Active garment recog- Autom. Lett., vol. 5, no. 2, pp. 2372–2379, Apr. 2020.
nition and target grasping point detection using deep learning,’’ Pattern [137] Y. Yang, H. Liang, and C. Choi, ‘‘A deep learning approach to grasping
Recognit., vol. 74, pp. 629–641, Feb. 2018. the invisible,’’ IEEE Robot. Autom. Lett., vol. 5, no. 2, pp. 2232–2239,
[116] A. Gaona and H.-I. Lin, ‘‘Robotic grasping estimation by evolutionary Apr. 2020.
deep networks,’’ in Proc. Int. Autom. Control Conf. (CACS). Taoyuan, [138] G. Zhang, H. Li, and Odbal, ‘‘Research on fuzzy enhanced learning model
Taiwan: IEEE, Nov. 2018, pp. 1–7. of multienhanced signal learning automata,’’ IEEE Trans. Ind. Informat.,
[117] K. Yamazaki, ‘‘Selection of grasp points of cloth product on a table based vol. 15, no. 11, pp. 5980–5987, Nov. 2019.
on shape classification feature,’’ in Proc. IEEE Int. Conf. Inf. Autom.
[139] S. Jeong, M. Lee, H. Arie, and J. Tani, ‘‘Developmental learning of
(ICIA). Macau, China: IEEE, Jul. 2017, pp. 136–141.
integrating visual attention shifts and bimanual object grasping and
[118] L. Haochen, Z. Bin, S. Xiaoyong, and Z. Yongting, ‘‘CNN-based model manipulation tasks,’’ in Proc. IEEE 9th Int. Conf. Develop. Learn.
for pose detection of industrial PCB,’’ in Proc. 10th Int. Conf. Intell. Ann Arbor, MI, USA: IEEE, Aug. 2010, pp. 165–170.
Comput. Technol. Autom. (ICICTA). Changsha, China: IEEE, Oct. 2017,
[140] W. Yuan, K. Hang, D. Kragic, M. Y. Wang, and J. A. Stork, ‘‘End-to-
pp. 390–393.
end nonprehensile rearrangement with deep reinforcement learning and
[119] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improve-
simulation-to-reality transfer,’’ Robot. Auto. Syst., vol. 119, pp. 119–134,
ment,’’ 2018, arXiv:1804.02767. [Online]. Available: http://arxiv.org/abs/
Sep. 2019.
1804.02767
[141] K. Terada, H. Takeda, and T. Nishida, ‘‘An acquisition of the relation
[120] L. Chen, P. Huang, and Z. Meng, ‘‘Convolutional multi-grasp detec-
between vision and action using self-organizing map and reinforcement
tion using grasp path for RGBD images,’’ Robot. Auto. Syst., vol. 113,
learning,’’ in Proc. 2nd Int. Conf. Knowl.-Based Intell. Electron. Syst.
pp. 94–103, Mar. 2019.
Adelaide, SA, Australia: IEEE, Apr. 1998, pp. 429–434.
[121] R. Roy, A. Kumar, M. Mahadevappa, and C. S. Kumar, ‘‘Deep learning
[142] T. Lampe and M. Riedmiller, ‘‘Acquiring visual servoing reaching and
based object shape identification from EOG controlled vision system,’’ in
grasping skills using neural reinforcement learning,’’ in Proc. Int. Joint
Proc. IEEE Sensors. New Delhi, India: IEEE, Oct. 2018, pp. 1–4.
Conf. Neural Netw. (IJCNN). Dallas, TX, USA: IEEE, Aug. 2013.
[122] X. Yan, J. Hsu, M. Khansari, Y. Bai, A. Pathak, A. Gupta, J. Davidson, and
H. Lee, ‘‘Learning 6-DOF grasping interaction via deep geometry-aware [143] I. Lenz, H. Lee, and A. Saxena, ‘‘Deep learning for detecting robotic
3D representations,’’ in Proc. IEEE Int. Conf. Robot. Autom. (ICRA). grasps,’’ Int. J. Robot. Res., vols. 4–5, no. 34, pp. 705–724, 2015.
Brisbane, QLD, Australia: IEEE, May 2018, pp. 3766–3773. [144] P. Ardon, E. Pairet, R. P. A. Petrick, S. Ramamoorthy, and K. S. Lohan,
[123] V. Satish, J. Mahler, and K. Goldberg, ‘‘On-policy dataset synthe- ‘‘Learning grasp affordance reasoning through semantic relations,’’ IEEE
sis for learning robot grasping policies using fully convolutional deep Robot. Autom. Lett., vol. 4, no. 4, pp. 4571–4578, Oct. 2019.
networks,’’ IEEE Robot. Autom. Lett., vol. 4, no. 2, pp. 1357–1364, [145] V. Mnih, N. Heess, and A. Graves, ‘‘Recurrent models of visual atten-
Apr. 2019. tion,’’ in Proc. Adv. Neural Inf. Process. Syst., 2014. vol. 3, no. 6, pp. 1–9.
[124] H. Liang, X. Ma, S. Li, M. Gorner, S. Tang, B. Fang, F. Sun, and [146] K. Yu, C. Dong, L. Lin, and C. C. Loy, ‘‘Crafting a toolchain for image
J. Zhang, ‘‘PointNetGPD: Detecting grasp configurations from point restoration by deep reinforcement learning,’’ in Proc. IEEE/CVF Conf.
sets,’’ in Proc. Int. Conf. Robot. Autom. (ICRA). Montreal, QC, Canada: Comput. Vis. Pattern Recognit. Salt Lake City, UT, USA: IEEE, Jun. 2018,
IEEE, May 2019, pp. 3629–3635. pp. 2443–2452.
[125] X. Sun, T. Nozaki, T. Murakami, and K. Ohnishi, ‘‘Grasping point esti- [147] S. James, Z. Ma, D. Rovick Arrojo, and A. J. Davison, ‘‘RLBench:
mation based on stored motion and depth data in motion reproduction sys- The robot learning benchmark & learning environment,’’ IEEE Robot.
tem,’’ in Proc. IEEE Int. Conf. Mechatronics (ICM). Ilmenau, Germany: Autom. Lett., vol. 5, no. 2, pp. 3019–3026, Apr. 2020.
IEEE, Mar. 2019, pp. 471–476. [148] A. Zeng, S. Song, S. Welker, J. Lee, A. Rodriguez, and T. Funkhouser,
[126] Z. Deng, X. Zheng, L. Zhang, and J. Zhanga, ‘‘A learning framework for ‘‘Learning synergies between pushing and grasping with self-supervised
semantic reach-to-grasp tasks integrating machine learning and optimiza- deep reinforcement learning,’’ in Proc. IEEE/RSJ Int. Conf. Intell. Robots
tion,’’ Robot. Auton. Syst., vol. 108, pp. 140–152, Oct. 2018. Syst. (IROS). Madrid, Spain: IEEE, Oct. 2018, pp. 4238–4245.
[127] I. González-Díaz, J. Benois-Pineau, J.-P. Domenger, D. Cattaert, and [149] Y. Wang, H. Lang, and C. W. de Silva, ‘‘A hybrid visual servo con-
A. de Rugy, ‘‘Perceptually-guided deep neural networks for ego-action troller for robust grasping by wheeled mobile robots,’’ IEEE/ASME Trans.
prediction: Object grasping,’’ Pattern Recognit., vol. 88, pp. 223–235, Mechatronics, vol. 15, no. 5, pp. 757–769, Oct. 2010.
Apr. 2019. [150] S. Gu, E. Holly, T. Lillicrap, and S. Levine, ‘‘Deep reinforcement learning
[128] M. Farag, A. N. A. Ghafar, and M. H. Alsibai, ‘‘Real-time robotic for robotic manipulation with asynchronous off-policy updates,’’ in Proc.
grasping and localization using deep learning-based object detection tech- IEEE Int. Conf. Robot. Autom. (ICRA). Singapore: IEEE, May 2017,
nique,’’ in Proc. IEEE Int. Conf. Autom. Control Intell. Syst. (I2CACIS). pp. 3389–3396.
Selangor, Malaysia: IEEE, Jun. 2019, pp. 139–144. [151] M. Breyer, F. Furrer, T. Novkovic, R. Siegwart, and J. Nieto, ‘‘Com-
[129] R. Detry, C. H. Ek, M. Madry, J. Piater, and D. Kragic, ‘‘Generalizing paring task simplifications to learn closed-loop object picking using
grasps across partly similar objects,’’ in Proc. IEEE Int. Conf. Robot. deep reinforcement learning,’’ IEEE Robot. Autom. Lett., vol. 4, no. 2,
Autom. Saint Paul, MN, USA: IEEE, May 2012, pp. 3791–3797. pp. 1549–1556, Apr. 2019.


[152] K. Katyal, I.-J. Wang, and P. Burlina, ‘‘Leveraging deep reinforcement [171] S. Duffner and C. Garcia, ‘‘Visual focus of attention estimation with
learning for reaching robotic tasks,’’ in Proc. IEEE Conf. Comput. Vis. unsupervised incremental learning,’’ IEEE Trans. Circuits Syst. Video
Pattern Recognit. Workshops (CVPRW). Honolulu, HI, USA: IEEE, Technol., vol. 26, no. 12, pp. 2264–2272, Dec. 2016.
Jul. 2017, pp. 490–491. [172] B. C. Kwon, B. Eysenbach, J. Verma, K. Ng, C. De Filippi, W. F. Stewart,
[153] A. Ghadirzadeh, A. Maki, D. Kragic, and M. Bjorkman, ‘‘Deep predic- and A. Perer, ‘‘Clustervision: Visual supervision of unsupervised cluster-
tive policy training using reinforcement learning,’’ in Proc. IEEE/RSJ ing,’’ IEEE Trans. Vis. Comput. Graphics, vol. 24, no. 1, pp. 142–151,
Int. Conf. Intell. Robots Syst. (IROS). Vancouver, BC, Canada: IEEE, Jan. 2018.
Sep. 2017, pp. 2351–2358. [173] X. Li, H. Zhang, R. Zhang, and F. Nie, ‘‘Discriminative and uncorrelated
[154] K. N. Nguyen, J. Yoo, and Y. Choe, ‘‘Speeding up affordance learning feature selection with constrained spectral analysis in unsupervised learn-
for tool use, using proprioceptive and kinesthetic inputs,’’ in Proc. Int. ing,’’ IEEE Trans. Image Process., vol. 29, pp. 2139–2149, 2020.
Joint Conf. Neural Netw. (IJCNN). Budapest, Hungary: IEEE, Jul. 2019, [174] M. A. Lee, Y. Zhu, P. Zachares, M. Tan, K. Srinivasan, S. Savarese,
pp. 1–8. L. Fei-Fei, A. Garg, and J. Bohg, ‘‘Making sense of vision and touch:
QIANG BAI received the B.Sc. degree from Zaozhuang University, in 2015, and the double master's (M.Sc.) degree from Guizhou University and Yuan Ze University, in 2018. He is currently pursuing the Ph.D. degree with the School of Mechanical Engineering, Guizhou University, Guiyang, China. From September 2016 to August 2017, he was jointly educated at Yuan Ze University. His research interests include machine learning, robotics, grasping, vision, and localization.

SHAOBO LI is a Professor with the School of Mechanical Engineering, Guizhou University (GZU), China. From 2007 to 2015, he was the Vice Director of the Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, GZU. Since 2015, he has been the Dean of the School of Mechanical Engineering, GZU. His research has been supported by the National Natural Science Foundation of China (NSFC) and the National High-Tech Research and Development Program (863 Program). His main research interests include intelligent manufacturing and big data.

JING YANG (Member, IEEE) received the B.Sc. degree from Anyang Normal University, in 2015, and the Ph.D. degree from the School of Mechanical Engineering, Guizhou University. He is currently a Lecturer with Guizhou University. From August 2018 to September 2019, he was awarded a scholarship by the China Scholarship Council (CSC) under the State Scholarship Fund to study at Oklahoma State University as a joint Ph.D. student with the Institute for Mechatronic Engineering, where he joined Prof. Guoliang Fan's group. He has published over ten papers in reputed journals and conferences. His main research interests include machine vision, deep learning, and smart manufacturing applications. He has also served as a reviewer for several journals, such as IEEE ACCESS and the IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING.

QISONG SONG received the B.S. degree in mechanical engineering from the Harbin University of Science and Technology, Harbin, China, in 2018. He is currently pursuing the M.S. degree in mechanical engineering with Guizhou University, Guiyang, China. His research interests include mobile robot path planning.

ZHIANG LI received the B.S. degree in mechanical engineering from Guizhou University, in 2018, where he is currently pursuing the master's degree. His research interests include trajectory planning and machine vision for manipulators.

XINGXING ZHANG received the B.S. degree in mechanical engineering from Nanjing Normal University, in 2018. She is currently pursuing the master's degree with the School of Mechanical Engineering, Guizhou University. Her research interests include robotics and tactile sensing.
