Smart Robot Arm Motion Using Computer Vision
Manuscript received April 15, 2015; accepted September 8, 2015.

Abstract—In this study, computer vision and a robot arm are used together to design a smart robot arm system that can identify objects from images automatically and perform given tasks. A serving robot application, in which specific tableware can be identified and lifted from a table, is presented. A new database was created using images of objects used in serving a meal. The study consists of two phases: the first phase covers the recognition of the objects through computer vision algorithms and the determination of the specified objects' coordinates; the second phase is the movement of the robot arm to the given coordinates. An artificial neural network is used for object recognition, and an overall recognition accuracy of 98.30 % is achieved. The robot arm's joint angles were calculated using a coordinate dictionary for moving the arm to the desired coordinates, and the robot arm's movement was performed.

Index Terms—Classification, computer vision, robot arm, robot programming.

I. INTRODUCTION

Extracting meaningful information from images is one of the interests of the computer vision field. The primary objective is to duplicate human vision abilities in an electronic environment by applying methods to images for processing, analysing and extracting information. Image understanding can be described as extracting symbolic or numeric information from images using methods constructed with geometry, physics and statistics [1]–[3]. Computer vision provides the basis for applications that use automated image analysis. Computers are preprogrammed to perform a specific task in most applications that make use of computer vision. Recently, learning-based methods have also become common in this kind of application [4]–[6]. Controlling processes, navigation, detecting events and modelling objects or environments are examples of computer vision based applications.

One of the applications of computer vision is to determine whether a given object or activity exists in an image. The problem becomes more complicated as the number and type of objects with random location, scale and position increase. Some of the most successfully performed computer vision tasks under well-defined illumination, background and camera angle are recognizing simple geometric objects, analysing printed or hand-written characters, and identifying human faces or fingerprints. In this study, a smart robot arm system is designed to detect and identify cutlery and plates that are randomly placed, in location and orientation, on a table.

There are many studies in the literature that integrate computer vision with a robot arm. One of these works presents a learning algorithm that attempts to identify, from two or more images of an object, the points at which a robot arm can grasp it [6]. The algorithm achieved an overall accuracy of 87.8 % in grasping novel objects. In another study, computer vision was used to control a robot arm [7]. Coloured bottle stoppers were placed on the joints of the robot arm, so the joints could be recognized via these stoppers using image recognition algorithms. The robot arm was simulated in a computer from the detected joints, and 3D arm control was performed using stereo cameras. In two other studies, robot models were designed to play the game "rock, paper, scissors" against an opponent [8], [9]. In both studies, a fixed camera was used to capture images of the opponent's hand and determine the played move via computer vision algorithms. In one of the studies, the robot played a random move [8]; in the other, the robot recognizes the opponent's hand shape rapidly using a computer vision algorithm and shapes its fingers such that it can beat the opponent's move [9]. In another work, the movements of a robot arm are controlled according to a human arm's movements using a wireless connection and a vision system [10]. Two cameras, with their image planes perpendicular to each other, capture the arm's movements by tracking the red coloured wrist. The arm's coordinates are transmitted in binary format through a wireless RF transmitter, and the robot arm's movements are synchronized with the human arm's position and orientation using the received coordinates.

There are also studies that include autonomous object detection and grasping tasks. One of these presents an autonomous robotic framework that includes a vision system [11]. In that work, the robot arm can perform autonomous object sorting according to the shape, size and colour of the object. In [12], randomly placed coloured objects on a target surface and the coloured gripper of a vision-controlled educational robotic arm are detected, and the objects are moved to a predefined destination using two on-board cameras; centre-of-mass based computation and a filtering and colour segmentation algorithm are used to locate the target and the position of the robotic arm. In [13], an educational robotic arm is controlled with computer vision as a project-based learning example.
2) Test Database
For test purposes, we constructed a database of 153 images containing randomly selected utensils placed on a dark background, each in a random position. Sample images from the test database are given in Fig. 4. The total numbers of utensils in the test images are shown in Table II.
Fig. 2. Steps of the second phase: movement of the robot arm.
Fig. 4. Sample images from the test database.
TABLE II. NUMBER OF UTENSILS IN TEST DATABASE.
  Object                     Count of objects
  Knife                      101
  Fork                       208
  Spoon                      199
  Fruit knife                161
  Oval plate                 93
  Total number of objects    762
B. Object Detection and Feature Extraction
Image processing methods are applied to the acquired images to detect the objects. The following steps are performed for this task (a code sketch is given after the list):
– The captured image was resized.
– The coloured input image was converted to a grayscale image.
– A Sobel filter was used for edge detection.
– The image was filtered by a row-matrix-shaped morphological structuring element in order to fix edge disorders and make the edges apparent.
– Overflowing or missing pixel issues were fixed by erosion and dilation.
– The inner sides of the edges were filled in order to detect the whole apparent area of each object.
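The paper implements these steps in MATLAB. Purely as an illustrative sketch, the same pipeline can be expressed in Python with OpenCV and SciPy; the image size, edge threshold and kernel sizes below are assumptions, not values reported by the authors.

```python
# Illustrative sketch of the detection pipeline (Python/OpenCV + SciPy);
# the original work used MATLAB, and all parameter values are assumptions.
import cv2
import numpy as np
from scipy import ndimage

def detect_objects(path, size=(640, 480)):
    img = cv2.imread(path)
    img = cv2.resize(img, size)                      # 1) resize
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # 2) grayscale
    # 3) Sobel edge detection (gradient magnitude)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)
    edges = (mag > 2 * mag.mean()).astype(np.uint8)  # assumed threshold
    # 4) close edge gaps with a row-shaped (1 x 9) structuring element
    row_se = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1))
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, row_se)
    # 5) fix overflowing/missing pixels with erosion and dilation
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    edges = cv2.dilate(cv2.erode(edges, se), se)
    # 6) fill the interior so the whole object area is apparent
    mask = ndimage.binary_fill_holes(edges).astype(np.uint8)
    return mask
```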
Eleven features were extracted for each object using MATLAB. The extracted features are the area, major axis length, minor axis length, eccentricity, orientation, convex area, filled area, Euler number, equivalent diameter, extent and solidity of the detected region. All features were divided by the perimeter of the object for normalization purposes. An illustrative extraction sketch follows.
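For illustration only, the same features can be computed with scikit-image's regionprops, whose property names mirror MATLAB's regionprops almost one-to-one; this sketch assumes the binary mask produced by the previous step.

```python
# Illustrative feature extraction with scikit-image; the paper used MATLAB's
# regionprops, so this is a re-expression, not the authors' code.
import numpy as np
from skimage import measure

def extract_features(mask):
    """Return the 11 perimeter-normalized features for each object in a binary mask."""
    features = []
    for r in measure.regionprops(measure.label(mask)):
        raw = np.array([
            r.area, r.major_axis_length, r.minor_axis_length,
            r.eccentricity, r.orientation, r.convex_area,
            r.filled_area, r.euler_number, r.equivalent_diameter,
            r.extent, r.solidity,
        ], dtype=float)
        features.append(raw / r.perimeter)   # normalize by object perimeter
    return np.array(features)
```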
C. Image Classification
Artificial Neural Networks (ANNs) are used for classification [14]. An ANN includes units that correspond to the neurons of a biological neural network. An ANN has input and output layers with adjustable weights, and each neuron unit of these layers produces an output value that is calculated via a function of the sum of its inputs [14], [15]. The output value of each neuron is calculated as

$y_i = f\left(\sum_i x_i w_i\right)$, (1)

where $y_i$ represents the output, $f$ refers to the activation function, $w_i$ refers to the weight and $x_i$ refers to the input of the $i$-th unit.

Fig. 5. Architecture of the MLP network.

The Multi-Layer Perceptron (MLP) is one of the most commonly used ANN structures. An MLP consists of a number of hidden layers, each with its own number of units, in addition to the input and output layers. The first layer receives the inputs from outside and transmits them to the hidden layers; the hidden layers process the data in turn and transmit the result to the output layer. Figure 5 shows the basic architecture of an MLP network [14]. A classification sketch based on such a network is given below.
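This excerpt does not state the MLP topology or training settings used by the authors. As a minimal sketch only, assuming scikit-learn, a single hidden layer of 20 logistic units and standard-scaled inputs, classifying the 11-dimensional feature vectors could look as follows; the names X and y, the layer size and the training options are assumptions.

```python
# Illustrative MLP classification of the 11-dimensional feature vectors;
# the hidden layer size and training settings are assumptions, not the
# configuration reported by the authors.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_objects, 11) perimeter-normalized features; y: labels such as
# "knife", "fork", "spoon", "fruit knife", "oval plate".
def train_classifier(X, y):
    model = make_pipeline(
        StandardScaler(),                          # scale features before the MLP
        MLPClassifier(hidden_layer_sizes=(20,),    # assumed single hidden layer
                      activation="logistic",       # sigmoid units as in (1)
                      max_iter=2000,
                      random_state=0),
    )
    return model.fit(X, y)
```

After training, calling `model.predict` on the perimeter-normalized features of a new image would yield one label per detected object.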
D. Joint's Angle Calculation and Robot Arm's Movement
After the classification process, the centres of gravity of the forks, knives, spoons and plates were determined as the targets of the robot arm. The angles of the joints were calculated on two 2-dimensional planes: x-y and x-z.

In this study, a coordinate dictionary was created by generating x and y coordinates with respect to the joint angles using (2) and (3):

$x_k = \sum_i u_i \cos\left(\sum_{j=1}^{i} \alpha_j\right)$, (2)
$y_k = \sum_i u_i \sin\left(\sum_{j=1}^{i} \alpha_j\right)$. (3)

Fig. 6. Bone lengths (u) and joint angles (α) on the x-y plane.

The $x_k$ and $y_k$ values were sampled for all possible triple combinations of the three α angles, which take values in $[0, \pi]$, $[-\pi, 0]$ and $[-\pi/2, \pi/2]$, respectively, with a step size of 0.05 rad (63 samples per angle). As a result, 250047 (= 63³) $(x_k, y_k)$ pairs were obtained. Then a coordinate dictionary was created that keeps the $(x_k, y_k)$ pairs as keys and the angles as the corresponding values.

When a coordinate pair is searched in the dictionary, the stored pair with the lowest Euclidean distance to the searched pair is considered the best match, and the corresponding angles are used to set the joint angles.

The algorithm explained above determines only the angles on the x-y plane. The last angle θ was calculated on the x-z plane (Fig. 7) using (4):
$\theta = \tan^{-1}\left(z_h / x_h\right)$. (4)

Fig. 7. The last needed angle (θ) on the x-z plane.
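As an illustration of the dictionary construction and lookup, the following sketch reproduces (2)–(4) in Python/NumPy. The bone lengths `U` are placeholder values (the arm's dimensions are not given in this excerpt); the 0.05 rad step reproduces the 63³ = 250047 combinations stated above.

```python
# Illustrative sketch of the coordinate dictionary method; bone lengths U
# are assumed placeholders, the angle ranges and step follow the text.
import numpy as np

U = np.array([120.0, 110.0, 90.0])   # assumed bone lengths u_i in mm

# Sample each angle in its range with a 0.05 rad step: 63**3 = 250047 combos.
a1 = np.arange(0.0, np.pi, 0.05)             # alpha_1 in [0, pi]
a2 = np.arange(-np.pi, 0.0, 0.05)            # alpha_2 in [-pi, 0]
a3 = np.arange(-np.pi / 2, np.pi / 2, 0.05)  # alpha_3 in [-pi/2, pi/2]
A1, A2, A3 = np.meshgrid(a1, a2, a3, indexing="ij")
angles = np.stack([A1.ravel(), A2.ravel(), A3.ravel()], axis=1)

# Forward kinematics (2)-(3): cumulative angle sums along the chain.
cum = np.cumsum(angles, axis=1)
xk = (U * np.cos(cum)).sum(axis=1)
yk = (U * np.sin(cum)).sum(axis=1)
keys = np.stack([xk, yk], axis=1)            # dictionary keys (x_k, y_k)

def lookup(x, y):
    """Return the angle triple whose (x_k, y_k) key is nearest in Euclidean distance."""
    d2 = (keys[:, 0] - x) ** 2 + (keys[:, 1] - y) ** 2
    return angles[np.argmin(d2)]

def last_angle(xh, zh):
    """Equation (4): base rotation on the x-z plane."""
    return np.arctan(zh / xh)
```

A k-d tree over the keys would speed up the nearest-key search, but the linear scan keeps the sketch close to the best-match rule described in the text.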
In the gradient descent approach [16], the iterations are repeated until the difference between the current value and the previous one reaches a predefined sensitivity value.

In this study, the robot arm's joint angles were determined using the coordinate dictionary method. The performances of the gradient descent and coordinate dictionary algorithms are compared in Table V in terms of the Euclidean distance error and the time consumed while finding the best solution for the objects. The comparison was performed using 1000 randomly generated points (Fig. 10) in a region bounded by the lines $x = 20$ and $y = 10$ and the circle $x^2 + y^2 = 400^2$. Values are given in millimetres.
TABLE V. TEST RESULTS OF THE GRADIENT DESCENT ALGORITHM (GDA) AND THE COORDINATE DICTIONARY METHOD (CDM) FOR 1000 RANDOMLY GENERATED POINTS.
               Time (seconds)          Euclidean distance error (millimetres)
               GDA        CDM          GDA          CDM
  Min          0          0.005411     0.001113     0.009403
  Max          1.45       0.006539     15.199337    1.407663
  Mean         0.030202   0.005579     0.038011     0.523210
  Std. dev.    0.054243   0.000163     0.510101     0.258507
The results of the gradient descent based [16] and coordinate dictionary based joint angle calculations are given in Table V. The results show that with the coordinate dictionary the joint angles are calculated in 5.579 milliseconds on average, with a 0.523 millimetre mean Euclidean distance error, which is negligible for the movement of the robot arm. The results also show that the coordinate dictionary method is much faster than the gradient descent method, in which the joint angles are calculated in 30.202 milliseconds on average. The standard deviation of the distance error is 0.259 millimetres, almost half of the standard deviation obtained when the gradient descent algorithm is used, which means that the coordinate dictionary method produces more stable results than the method used in [16]. A sketch of the gradient descent baseline is given below.
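The details of the gradient descent baseline are given in [16] (in Turkish) and are not reproduced in this excerpt. Purely as an illustration, a numerical gradient descent on the end-effector distance for the same three-angle arm could look as follows; the learning rate, sensitivity value, starting point and difference scheme are all assumptions.

```python
# Illustrative gradient descent baseline for the same 3-angle reach problem;
# hyperparameters (lr, eps, h) and bone lengths are assumptions, not values
# taken from [16].
import numpy as np

U = np.array([120.0, 110.0, 90.0])   # assumed bone lengths in mm

def distance(alpha, target):
    """Euclidean distance from the end effector, via (2)-(3), to the target."""
    cum = np.cumsum(alpha)
    pos = np.array([(U * np.cos(cum)).sum(), (U * np.sin(cum)).sum()])
    return np.linalg.norm(pos - target)

def gradient_descent(target, alpha0=(1.5, -1.5, 0.0),
                     lr=1e-4, eps=1e-6, h=1e-6, max_iter=20000):
    alpha = np.array(alpha0, dtype=float)
    prev = distance(alpha, target)
    for _ in range(max_iter):
        # central-difference numerical gradient of the distance
        grad = np.array([
            (distance(alpha + h * e, target) -
             distance(alpha - h * e, target)) / (2 * h)
            for e in np.eye(3)])
        alpha -= lr * grad
        cur = distance(alpha, target)
        if abs(prev - cur) < eps:   # stop at the predefined sensitivity value
            break
        prev = cur
    return alpha
```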
IV. CONCLUSIONS

In this study, a smart robot arm system is designed. The system can detect and identify cutlery and plates and lift them from a table. Average recall values of 98.62 % and 93.83 % are obtained for the training and test sets in the classification of the objects. In the previous study [16], the smart robot arm system achieved an average accuracy of 90 % using a kNN classifier with the same features. The performance of the system is increased by the use of the MLP for classification. This result shows that the MLP is a better model for classifying the objects with the extracted features.

The robot arm's joint angles were calculated with an average Euclidean distance error of 0.523 millimetres in an average time of 5.579 milliseconds. This is a very fast response with an acceptably small distance error for the robot arm.

Methods for better object recognition and classification, and for better coordinate estimation with a shorter response time, might be investigated in future work. This study can also be repeated using a robot arm that has more fingers (three or five). Additionally, instead of detecting all the objects in the image automatically and lifting all of them, the algorithm might be changed so that only a predefined desired object is searched for and lifted, for a more effective usage of the robot arm.

REFERENCES

[1] A. D. Kulkarni, Computer Vision and Fuzzy-Neural Systems. Prentice Hall PTR, 2001, ch. 2 and ch. 6.
[2] R. Jain, R. Kasturi, B. G. Schunck, Machine Vision. McGraw-Hill, 1995, ch. 14.
[3] D. A. Forsyth, J. Ponce, Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference, 2002, ch. 15.
[4] G. Bradski, A. Kaehler, V. Pisarevsky, "Learning-based computer vision with Intel's open source computer vision library", Intel Technology Journal, vol. 9, 2005.
[5] C. H. Lampert, H. Nickisch, S. Harmeling, "Learning to detect unseen object classes by between-class attribute transfer", IEEE Computer Vision and Pattern Recognition, pp. 951–958, 2009. [Online]. Available: http://dx.doi.org/10.1109/cvpr.2009.5206594
[6] A. Saxena, J. Driemeyer, A. Y. Ng, "Robotic grasping of novel objects using vision", The International Journal of Robotics Research, vol. 27, no. 2, pp. 157–173, 2008. [Online]. Available: http://dx.doi.org/10.1177/0278364907087172
[7] R. Szabo, A. Gontean, "Full 3D robotic arm control with stereo cameras made in LabVIEW", Federated Conf. Computer Science and Information Systems (FedCSIS), 2013, pp. 37–42.
[8] Y. Hasuda, S. Tshibashi, H. Kozuka, H. Okano, J. Ishikawa, "A robot designed to play the game Rock, Paper, Scissors", IEEE Industrial Electronics, pp. 2065–2070, 2007. [Online]. Available: http://dx.doi.org/10.1109/isie.2007.4374926
[9] Ishikawa Watanabe Lab., University of Tokyo. [Online]. Available: www.k2.t.u-tokyo.ac.jp/fusion/Janken/index-e.html
[10] A. Shaikh, G. Khaladkar, R. Jage, T. Pathak, J. Taili, "Robotic arm movements wirelessly synchronized with human arm movements using real time image processing", IEEE India Educators' Conf. (TIIEC), Texas Instruments, 2013, pp. 277–284.
[11] S. Manzoor, R. U. Islam, A. Khalid, A. Samad, J. Iqbal, "An open-source multi-DOF articulated robotic educational platform for autonomous object manipulation", Robotics and Computer-Integrated Manufacturing, vol. 30, no. 3, pp. 351–362, 2014. [Online]. Available: http://dx.doi.org/10.1016/j.rcim.2013.11.003
[12] N. Rai, B. Rai, P. Rai, "Computer vision approach for controlling educational robotic arm based on object properties", 2nd Int. Conf. Emerging Technology Trends in Electronics, Communication and Networking (ET2ECN), pp. 1–9, 2014. [Online]. Available: http://dx.doi.org/10.1109/et2ecn.2014.7044931
[13] T. P. Cabre, M. T. Cairol, D. F. Calafell, M. T. Ribes, J. P. Roca, "Project-based learning example: controlling an educational robotic arm with computer vision", IEEE Revista Iberoamericana de Tecnologias del Aprendizaje, vol. 8, no. 3, pp. 135–142, 2013. [Online]. Available: http://dx.doi.org/10.1109/RITA.2013.2273114
[14] Y. Kutlu, M. Kuntalp, D. Kuntalp, "Optimizing the performance of an MLP classifier for the automatic detection of epileptic spikes", Expert Systems with Applications, vol. 36, no. 4, 2009. [Online]. Available: http://dx.doi.org/10.1016/j.eswa.2008.09.052
[15] C. M. Bishop, Neural Networks for Pattern Recognition. Clarendon Press, 1995, ch. 4.
[16] B. Iscimen, H. Atasoy, Y. Kutlu, S. Yildirim, E. Yildirim, "Bilgisayar gormesi ve gradyan inis algoritmasi kullanilarak robot kol uygulamasi" ("Robot arm application using computer vision and the gradient descent algorithm"), Akilli Sistemlerde Yenilikler ve Uygulamalari (ASYU) Sempozyumu, 2014, pp. 136–140. (in Turkish).