Implementation of Object Detection and Recognition Algorithms on a Robotic Arm Platform Using Raspberry Pi
Abstract— In this paper, it is aimed to implement object detection and recognition algorithms for a robotic arm platform. With these algorithms, the objects that are desired to be grasped by the gripper of the robotic arm are recognized and located. In the established experimental setup, the OWI–535 robotic arm with 4 DOF and a gripper, which is similar to the robotic arms used in industry, is preferred. Local feature-based algorithms such as SIFT, SURF, FAST, and ORB are used on the images captured via the camera to detect and recognize the target object to be grasped by the gripper of the robotic arm. These algorithms are implemented in the software for object recognition and localization, which is written in the C++ programming language using the OpenCV library, and the software runs on the Raspberry Pi embedded Linux platform. In the experimental studies, the performances of the features extracted with the SIFT, SURF, FAST, and ORB algorithms are compared. This study, which is first implemented with the OWI–535 robotic arm, shows that local feature-based algorithms are suitable for educational and industrial applications.

Index Terms— Object detection and recognition, local feature-based algorithms, OWI–535 robotic arm, Raspberry Pi, OpenCV.

I. INTRODUCTION

Computer vision is an important sensing technology in the robotics area because it has potential applications in many industrial processes. Object recognition, which is one of the popular applications of computer vision, is the process of finding and classifying objects that have a common feature or relationships with each other through various processes.

Object detection and recognition applications are generally built using appearance-based or local feature-based approaches, depending on the purpose of use. The local feature-based approach is used when appearance-based approaches are not sufficient. In cases where illumination changes or objects are partially occluded by other objects, local feature-based approaches are usually preferred. Local features can be expressed as specific regions containing information about objects. The feature vectors obtained from this approach are descriptors such as distances to the center and curvatures of curves and corners [1]. Through these features, objects can be defined independently from the whole.

Due to the difficulties encountered in object recognition applications, object recognition algorithms have an extensive literature. In the local feature-based approach, image matching studies started with the corner detector of Moravec [2]. This detector was improved by Harris and Stephens [3] to give better results on fine image details and on edges near each other. In [4], Harris used the improved detector for motion tracking and reconstruction of 3D structures. Schmid and Mohr [5] created local features by applying the Harris detector to their chosen points. In addition, they achieved successful results by defining vectors that are independent of orientation. With the development of local feature matching, they also successfully performed feature search within a wide range of image databases. There are also many studies that use appearance-based object recognition approaches [6-9].

The common problem encountered with object recognition algorithms that use corner points to construct the local feature vectors is that they are successful only when working on a single scale. When working at different scales, it is not possible to achieve independence from the scale, because the points determined at each scale are in different positions. With the Scale-Invariant Feature Transform (SIFT) algorithm developed by Lowe, the local features of the object were extracted independently of the scale using corner points [10]. Ledwich and Williams [11] reduced the complexity of SIFT features and the number of features that define the environment by working indoors. The matching time was shortened by reducing the size and complexity of the features. Guan et al. [12] used the Speeded-Up Robust Features (SURF) algorithm to extract local features, and so they performed object recognition quickly and efficiently. Their studies showed that this algorithm is invariant to rotation and scale of the image as well as robust to illumination changes. Besides, they concluded that this algorithm is more robust than the cases where only the SIFT and Principal Component Analysis (PCA)-SIFT algorithms are used together. Heo et al. [13] extracted features using the algorithms named Features from Accelerated Segment Test (FAST) and Binary Robust Independent Elementary Features (BRIEF) together. They concluded that these algorithms are at least fifty percent better in terms of speed and memory capacity than either the SIFT or the SURF algorithm. Rublee et al. [14] used an algorithm called Oriented FAST and Rotated BRIEF (ORB), which aims to combine the good properties of the BRIEF and FAST algorithms as an alternative to them, because SIFT and SURF use a large number of features for object detection and matching.

Authorized licensed use limited to: Army Institute of Technology. Downloaded on September 29,2024 at 12:33:52 UTC from IEEE Xplore. Restrictions apply.

In this study, using the 4 degrees of freedom (DOF) OWI–535 robotic arm, it is intended that the recognized object is grasped in the workspace of the robotic arm and dropped at the desired target. For this purpose, an appropriate experimental setup is established. Object recognition and localization operations are performed with software written in the C++ language using the Open Source Computer Vision (OpenCV) library on the Raspberry Pi, which is a Single-Board Computer (SBC) based on the Linux operating system. With this software, firstly, images of the desired objects are taken by the camera and a database is created. Secondly, by applying the local feature-based object detection and recognition algorithms named SIFT, SURF, FAST, and ORB to the taken test images, it is determined which object is registered in the database. Thirdly, the center point of the recognized object is determined in terms of pixels and converted into the position information that the gripper must reach. Finally, this position information is sent via serial communication from the Raspberry Pi to the microcontroller-based Arduino Mega board, which carries out the motion of the robotic arm. Such a study is performed for the first time on the OWI–535 robotic arm. This study shows that local feature-based algorithms are useful for educational and industrial applications.

The rest of this paper is organized as follows. In Section 2, first, the necessary steps for object recognition and localization are given. Then, the object detection and recognition algorithms used in feature extraction are described in detail. In Section 3, the parts of the experimental setup are presented. In Section 4, the comparative results of the applied local feature-based algorithms on various test images are given. In Section 5, the paper is concluded.

II. OBJECT RECOGNITION AND LOCALIZATION

In recent years, in areas where error is not acceptable, local feature-based algorithms have been preferred for extracting features in order to implement applications such as object tracking, image matching, object detection and object recognition, because these algorithms are more robust to external factors such as illumination changes, occlusions, scale changes, etc. The features extracted using local feature-based algorithms are expressed as keypoint descriptors in the image. These keypoints (interest points) in the image can be a patch, edge, corner or blob, and they simplify object recognition by eliminating unnecessary information from the image.

In this study, local feature-based algorithms are used for feature extraction. The features are extracted by detecting the keypoints in the object images and calculating the descriptors of these keypoints. The general stages of the implemented local feature-based object recognition and localization system are given in Fig. 1.

Fig. 1. The overall workflow diagram of local feature-based object recognition and localization: Test Image → Detection of Keypoints → Calculation of Descriptors → Matching of Descriptors (against the Database) → Estimation of Homography → Object Recognition and Localization

A. Feature Extraction

An object image to be classified in the object recognition process usually contains a lot of unnecessary information. This reduces the precision of the classification and increases the processing time. To avoid this drawback, the object information is transformed into another representation of lower size. Feature extraction is a transformation process in which the excess and unnecessary data of the object are eliminated.

In this study, the main objective is to recognize and localize the object. Suitable features positively affect the success of applications such as object detection and recognition. For this reason, the SIFT, SURF, FAST, and ORB algorithms, which have proven successful in the scientific literature, are used to extract features. The processing steps of the local feature-based algorithms used in this study are performed after converting the images from RGB to grayscale.

In 2004, the algorithm shortly called SIFT was proposed by Lowe to extract features in the image. The SIFT algorithm can extract features without being affected by illumination conditions, small angular changes in the image and the scale of the image [15]. On the other hand, although the algorithm is suitable for detecting objects in high-resolution images, it is slow in terms of execution time [16].

The corner is the most efficient keypoint, and for this reason the corners are detected and the other candidates are eliminated. The Hessian matrix is used to determine whether a point is a corner. After the keypoints are detected and the gradient orientations are determined, features that are robust to scale, rotation and position are obtained. After these operations, the keypoint descriptors representing the feature vectors are obtained as shown in Fig. 2.
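The RGB-to-grayscale conversion that precedes the feature extraction steps can be sketched without any library support. The weights below are the standard ITU-R BT.601 luma coefficients (the same ones OpenCV uses for RGB-to-gray conversion); the flat pixel layout is a simplifying assumption, not the paper's actual data structure.

```cpp
#include <cstdint>
#include <vector>

// One 8-bit RGB pixel.
struct Rgb { std::uint8_t r, g, b; };

// Convert an RGB image (flat vector of pixels) to 8-bit grayscale using
// the ITU-R BT.601 weights: Y = 0.299 R + 0.587 G + 0.114 B.
std::vector<std::uint8_t> toGrayscale(const std::vector<Rgb>& image) {
    std::vector<std::uint8_t> gray;
    gray.reserve(image.size());
    for (const Rgb& p : image) {
        double y = 0.299 * p.r + 0.587 * p.g + 0.114 * p.b;
        gray.push_back(static_cast<std::uint8_t>(y + 0.5)); // round to nearest
    }
    return gray;
}
```

In the actual implementation this step would be a single OpenCV `cvtColor` call; the sketch only makes the underlying arithmetic explicit.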
Fig. 2. Obtaining image gradients and keypoint descriptor [15]

In 2008, in order to extract features in the image, Bay proposed an algorithm shortly called SURF, which is based on the calculation of the Hessian matrix. Since integral images are used in the Hessian calculation, the calculation time is reduced, which allows the algorithm to run faster than the SIFT algorithm [17].

The box filters are obtained using an integral image. The keypoints are detected by means of the scales obtained from the box filters. After the keypoints are obtained, a circular region is selected around each keypoint, and Haar wavelet filters are applied to this region to extract the descriptor for the keypoint [18]. As shown in Fig. 3, the components obtained from the Haar wavelet filter are found for each cell of the 4×4 region [17].

Fig. 3. Creation of keypoint descriptors [17]

In 2006, Rosten and Drummond proposed an algorithm shortly called FAST in order to detect keypoints in the image [19]. FAST performs the detection of keypoints by detecting corner points, and it has high speed and reliability [16].

In order to be a fast and effective alternative to the SIFT or SURF algorithms in feature extraction, an algorithm shortly called ORB was proposed by Rublee et al. in 2011 [14]. The ORB algorithm uses FAST for the detection of keypoints and the recently developed BRIEF for the calculation of descriptors. The features extracted by this algorithm are both insensitive to rotation and noise and robust to illumination changes.

Fig. 4 shows the keypoints detected by the SIFT, SURF, FAST, and ORB algorithms on an object image used in the study.

B. Feature Matching and Classification

In order to classify an unknown object in a given test image after the database is created by extracting the features of the training images, the features initially included in the object database are individually matched with each of the features extracted from the test image. The matches give information about the object class.

The similarity measure between descriptors depends on the data type. The Hamming distance is preferred for binary data types, while the Euclidean distance is used for real-valued data; the Euclidean metric is usually used to calculate the distances between such descriptors.

The Fast Library for Approximate Nearest Neighbors (FLANN) [20] algorithm, proposed by Muja and Lowe and written in C++, is based on the nearest neighbor search and consists of a collection of algorithms such as the hierarchical k-means tree [21] and multiple randomized kd-trees [22]. After the descriptors representing the features are matched and the distances between them are calculated, the closest matches are determined.

Another matching algorithm, Brute-Force (BF), is used with either the Euclidean or the Hamming metric. In this algorithm, the closest descriptors are found by evaluating all combinations of matches between the database and the descriptors obtained from the test image.

C. Homography Estimation

Homography is a two-dimensional projective transformation that maps the points on one plane to another [23]. Some mismatches can occur when the keypoint descriptors extracted from images are matched to obtain similar points. In this study, the algorithm called Random Sample Consensus (RANSAC) [24] was used because it can detect the most accurate matches by eliminating mismatches within this set of similar points.

The RANSAC algorithm proposed by Fischler and Bolles is a general parameter estimation approach designed to cope with a large number of outliers in the input data [25]. The homography estimation is performed using the RANSAC algorithm. By obtaining the most accurately matched descriptors, the position of the object is correctly determined. Fig. 5 shows the matches between a sample object image and a test image. As can be seen, the RANSAC algorithm ensures that the appropriate point pairs are enclosed with a bounding box by eliminating the outliers.
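As a stand-alone illustration of the brute-force matching with the Hamming metric described above, the sketch below treats descriptors as plain byte vectors (as the 32-byte binary descriptors produced by ORB can be stored); the function names are illustrative and do not correspond to the OpenCV API.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = std::vector<std::uint8_t>; // e.g. 32 bytes for an ORB descriptor

// Hamming distance: the number of differing bits between two
// equal-length binary descriptors.
std::size_t hammingDistance(const Descriptor& a, const Descriptor& b) {
    std::size_t dist = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        dist += std::bitset<8>(a[i] ^ b[i]).count();
    return dist;
}

// Brute-force matcher: for each query descriptor, compare against every
// database descriptor and keep the index of the closest one.
std::vector<std::size_t> bruteForceMatch(const std::vector<Descriptor>& query,
                                         const std::vector<Descriptor>& database) {
    std::vector<std::size_t> matches;
    for (const Descriptor& q : query) {
        std::size_t best = 0;
        std::size_t bestDist = hammingDistance(q, database[0]);
        for (std::size_t j = 1; j < database.size(); ++j) {
            std::size_t d = hammingDistance(q, database[j]);
            if (d < bestDist) { bestDist = d; best = j; }
        }
        matches.push_back(best);
    }
    return matches;
}
```

For real-valued SIFT or SURF descriptors the same loop would use the Euclidean distance instead, which is exactly the BF/FLANN metric distinction made above.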
III. EXPERIMENTAL SETUP

It is aimed, respectively, to orient a recognized object or objects in the workspace of the 4 DOF OWI–535 robotic arm, grasp it and drop it at a destination using the local feature-based object detection and recognition algorithms. The experimental setup established for this purpose is shown in Fig. 6.

From the potentiometers on the joints, it can also be determined whether the gripper is open or closed. By correlating the information read from the potentiometers with the joint angles, a feedback system is obtained and position control of the joints is carried out. Besides, a Force Sensitive Resistor (FSR) is placed on one side of the gripper to determine how much the gripper should be closed during the grasping of the object.

Fig. 8. Process steps for experimental studies

For the experimental studies, the database contains the feature vectors of small object images, because the objects used in the training phase of the system are objects that can be grasped by the gripper. Furthermore, since the camera used is two-dimensional, the heights of the objects were selected very close to each other. Thus, the gripper can perform the grasping operation at a fixed z-position coordinate (pz). After images of objects randomly placed in the workspace of the robotic arm were taken by the camera as RGB at 1024×768 resolution, only the image regions belonging to the objects were cropped. As shown in Fig. 9, the database was created with 10 objects. In the training phase of the object recognition and localization studies, 5 different object images were used for each class to include difficult conditions such as illumination changes, rotations or scale differences due to the camera viewpoint.

Fig. 9. RGB images of the objects used for the database

After the test images were identified, the descriptors obtained by SIFT and SURF were matched using the FLANN-based matcher, and the descriptors obtained by ORB were matched using the BF matcher. The Hamming metric was used to match the binary descriptors calculated by ORB.

The experimental results obtained by using the SIFT+SIFT and SURF+SURF algorithms are given in Table I.

TABLE I. THE EXPERIMENTAL RESULTS (SIFT+SIFT AND SURF+SURF)

              SIFT+SIFT                          SURF+SURF
Test image    Number of matched   Number of      Number of matched   Number of
number        descriptor pairs    outliers       descriptor pairs    outliers
1             95                  0              129                 6
2             104                 3              45                  3
3             52                  0              50                  9
4             69                  0              32                  1
6             82                  0              88                  12
7             25                  0              31                  0
8             39                  0              25                  0
9             19                  0              12                  0
20            11                  0              9                   0
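The matching accuracies discussed later (e.g. the roughly 97–100% range reported for SIFT+SIFT) follow directly from the two columns of Table I: the fraction of matched descriptor pairs that survive outlier elimination. A small helper, written here as a sketch rather than taken from the paper's code, makes the calculation explicit.

```cpp
// Fraction of matched descriptor pairs that are inliers, given the total
// number of matched pairs and the number of outliers eliminated by RANSAC.
double inlierRatio(int matchedPairs, int outliers) {
    if (matchedPairs <= 0) return 0.0;
    return static_cast<double>(matchedPairs - outliers) / matchedPairs;
}
```

For test image 2 in Table I, for example, SIFT+SIFT gives 104 matched pairs with 3 outliers, i.e. an inlier ratio of about 0.971.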
Fig. 12. Test operations for test images 10 and 12, respectively (SURF+SURF)

The experimental results obtained by using the FAST+SURF and ORB+ORB algorithms are given in Table II.

TABLE II. THE EXPERIMENTAL RESULTS (FAST+SURF AND ORB+ORB)

              FAST+SURF                          ORB+ORB
Test image    Number of matched   Number of      Number of matched   Number of
number        descriptor pairs    outliers       descriptor pairs    outliers
2             145                 12             35                  2
3             140                 14             134                 42
4             90                  11             5                   0
6             152                 7              46                  4
7             53                  12             6                   0
8             26                  2              7                   0
9             16                  0              7                   0
10            31                  1              19                  0
11            152                 152            313                 313

Fig. 14. Comparison of feature extraction times
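The outlier counts in these tables are the mismatches eliminated during homography estimation. A full homography fit is too long for a sketch, but the RANSAC idea of hypothesizing a model from a minimal sample, counting the matches that agree, and keeping the best hypothesis can be shown with a simplified translation-only model (an illustrative assumption; the paper estimates a full 2D homography).

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::pair<double, double>; // (x, y)

// RANSAC-style consensus count for a 2D translation between matched point
// pairs: each match proposes a translation hypothesis, every match votes on
// it, and the size of the largest consensus set is returned. Matches outside
// the winning consensus set are the outliers.
std::size_t countInliers(const std::vector<Point>& src,
                         const std::vector<Point>& dst,
                         double tol = 1.0) {
    std::size_t best = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        // Hypothesis: the translation implied by match i.
        double tx = dst[i].first - src[i].first;
        double ty = dst[i].second - src[i].second;
        std::size_t inliers = 0;
        for (std::size_t j = 0; j < src.size(); ++j) {
            double ex = src[j].first + tx - dst[j].first;
            double ey = src[j].second + ty - dst[j].second;
            if (std::hypot(ex, ey) <= tol) ++inliers; // match j agrees
        }
        if (inliers > best) best = inliers;
    }
    return best;
}
```

This exhaustive variant tries every match as a hypothesis instead of random sampling, which keeps the sketch deterministic; classical RANSAC samples hypotheses randomly for efficiency.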
Fig. 15. Necessary information for the pixel-mm conversion (view field of the camera: 1024 × 768 pixels; workspace dimensions of 413.3 mm, 226 mm, 187.3 mm and 70 mm along the +x, -y and +z axes)

pz was determined as a constant value because the heights of the objects were selected very close to each other, as mentioned before. In our study, this value was set to 30 mm and found suitable. In addition, since the robotic arm has no 5th axis on its gripper, the object was placed in such a way that the gripper can grasp it.

The processes carried out to recognize and move the object were visualized experimentally. The coordinates of the center point of the recognized object were determined in terms of pixels, and the position coordinates that the gripper needs to reach were obtained with the necessary calculations. After the gripper reaches the specified position, the motions are performed as shown in Fig. 16, respectively.

V. CONCLUSIONS

In this study, an experimental setup was established to recognize an object, to determine its position, and to carry out the process of grasping and moving it with the aid of the gripper of the robotic arm. The object recognition and localization processes were accomplished with software written in C++ using the OpenCV library and embedded in the Raspberry Pi SBC.

Different images were taken for both the training phase and the test phase in order to perform the object recognition and localization processes experimentally. The performances of the local feature-based algorithms used on these images were compared.

The SIFT+SIFT, SURF+SURF, FAST+SURF, and ORB+ORB keypoint detector and keypoint descriptor combinations were used for the experimental studies. The object recognition results obtained with these algorithms proved to be quite successful. In the feature matching operation performed for the recognition process, the most accurate matches, between about 97% and 100%, were achieved with the SIFT+SIFT algorithm. The SIFT+SIFT algorithm was followed by SURF+SURF in terms of matching performance. It has once again been shown that the FLANN-based matcher is suitable for SIFT and SURF descriptors, and that the BF matcher with the Hamming metric is suitable for ORB descriptors, as described in the literature. In order to compare the execution times of the algorithms, the feature extraction time per keypoint was taken as the reference. FAST+SURF and ORB+ORB were clearly the fastest algorithms. The SIFT+SIFT algorithm, on the other hand, ran much slower than the other algorithms, in contrast to its best matching performance.
Fig. 16. Motion order of the robotic arm: (a) reaching the object, (b) grasping the object, (c) moving the object, (d) dropping the object, and (e) moving to the home position

REFERENCES

[1] Y. Mingqiang, K. Kidiyo, and R. Joseph, “A survey of shape feature extraction techniques”, Pattern Recognition, IN-TECH, pp. 43–90, 2008.
[2] H. Moravec, “Rover visual obstacle avoidance”, Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI), vol. 2, pp. 785–790, 1981.
[3] C. Harris and M. Stephens, “A combined corner and edge detector”, 4th Alvey Vision Conference (AVC), vol. 15, pp. 147–151, 1988.
[4] C. Harris, “Geometry from visual motion”, Active Vision, MIT Press, pp. 263–284, 1993.
[5] C. Schmid and R. Mohr, “Local grayvalue invariants for image retrieval”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 19, no. 5, pp. 530–534, 1997.
[6] K. N. Shree, “Visual learning and recognition of 3-D objects from appearance”, International Journal of Computer Vision (IJCV), vol. 14, no. 1, pp. 5–24, 1995.
[7] M. Swain and D. Ballard, “Color indexing”, International Journal of Computer Vision (IJCV), vol. 7, no. 1, pp. 11–32, 1991.
[8] B. Schiele and J. L. Crowley, “Object recognition using multidimensional receptive field histograms”, 4th European Conference on Computer Vision (ECCV), vol. 1, pp. 610–619, 1996.
[9] C. Shanthi and N. Pappa, “An artificial intelligence based improved classification of two-phase flow patterns with feature extracted from acquired images”, ISA Transactions, vol. 68, pp. 425–432, 2017.
[10] D. G. Lowe, “Object recognition from local scale-invariant features”, IEEE 7th International Conference on Computer Vision (ICCV), vol. 2, pp. 1150–1157, 1999.
[11] L. Ledwich and S. Williams, “Reduced SIFT features for image retrieval and indoor localisation”, Australian Conference on Robotics and Automation (ACRA), vol. 322, p. 3, 2004.
[12] F. Guan, X. Liu, W. Feng, and H. Mo, “Multi target recognition based on SURF algorithm”, 6th International Congress on Image and Signal Processing (CISP), vol. 1, pp. 444–453, 2013.
[13] H. Heo, J. Y. Lee, K. Y. Lee, and C. H. Lee, “FPGA based implementation of FAST and BRIEF algorithm for object recognition”, TENCON 2013 – IEEE Region 10 Conference, pp. 1–4, 2013.
[14] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF”, International Conference on Computer Vision (ICCV), pp. 2564–2571, 2011.
[15] D. G. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004.
[16] H. Joshi and M. K. Sinha, “A survey on image mosaicing techniques”, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 2, no. 2, pp. 365–369, 2013.
[17] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF)”, Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[18] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 27, no. 10, pp. 1615–1630, 2005.
[19] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection”, Proceedings of the 9th European Conference on Computer Vision (ECCV), vol. 1, pp. 430–443, 2006.
[20] M. Muja and D. G. Lowe, “Fast approximate nearest neighbors with automatic algorithm configuration”, International Conference on Computer Vision Theory and Applications (VISAPP), vol. 1, pp. 331–340, 2009.
[21] L. Y. Pratt, “Comparing biases for minimal network construction with back-propagation”, Advances in Neural Information Processing Systems, Morgan Kaufmann Publishers Inc., pp. 177–185, 1989.
[22] C. Silpa-Anan and R. Hartley, “Optimised KD-trees for fast image descriptor matching”, IEEE Computer Vision and Pattern Recognition (CVPR), pp. 1–8, 2008.
[23] D. Vaghela and P. Naina, “A review of image mosaicing techniques”, International Journal of Advance Research in Computer Science and Management Studies (IJARCSMS), vol. 2, no. 3, pp. 431–437, 2014.
[24] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography”, Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[25] K. G. Derpanis, “Overview of the RANSAC algorithm”, Image Rochester NY, vol. 4, no. 1, pp. 2–3, 2010.
[26] C. Kaymak and A. Ucar, “Kinematic Analysis of OWI–535 Robotic Arm and Simulation of Its Motion using SimMechanics”, 8th International Advanced Technologies Symposium (IATS), pp. 2468–2476, 2017.