Journal of Intelligent Fuzzy Systems
IOS Press
We applied a systematic analysis of the proposed algorithm (Flow Filter) using various objects from a public dataset (BigBird). The BigBird dataset includes large-scale image data for various objects together with the corresponding pose information, and can be downloaded from the internet [7]. The objects are placed on a plate and take poses from 0 to 180 degrees with respect to the center of the plate. BigBird also supplies ground-truth poses together with scanned color and depth images, meshes, and calibration parameters. Additionally, it provides high-quality color images acquired with a Canon camera. We tested the Flow Filter algorithm on scenarios that are subject to low-resolution imagery.

Rigid object pose estimation can be posed as the search for fine matches between the model and the target object. The transformation between 3D point sets can be quantified in terms of a rotation R and a translation T along the three axes. Our technique needs neither a training process with labeled images nor predefined CAD templates to refine the pose of rigid objects. Flow Filter uses a point cloud extracted from low-resolution depth together with flow information from color, and does not extract image features. We integrated our algorithm into the FilterReg pose estimation algorithm [8], and our technique upgrades FilterReg to a higher-accuracy pose estimation method. We present two key contributions to the literature. First, Flow Filter works on partial object imagery, which makes our algorithm applicable to real-world problems without requiring a virtual, noiseless object model. Second, Flow Filter outperforms state-of-the-art matching methods on the BigBird dataset.

In the rest of the paper, we review related research in Section 2 and explain the proposed algorithm in Section 3. We present quantitative results and error analysis in Section 4 and summarize our findings, future development, and contributions in Section 5.

2. Related Study

Pose estimation algorithms fall into three main categories: template-based methods, feature-based methods, and machine-learning-based methods. Different approaches suit different applications and data. Some algorithms use the depth image and compute pose by matching point data locally, as ICP algorithms do; other algorithms use stereo RGB imagery and solve for pose with global optimization methods such as correspondence matching.

A template object is produced by scanning the 3D shape model of an object. Template-based pose estimation algorithms such as ICP and CPD are widely studied in the literature [9,10]. The ICP algorithm converges to a local minimum by computing distances iteratively, and produces high-accuracy estimates when initialized with good parameters and small transformations. When ICP reaches a local minimum of the distance, it computes the R and T that give the transformation parameters between the model and target point clouds. Chen et al. [11] developed a variant, the point-to-plane ICP algorithm, which iteratively searches for point matches that decrease a defined error metric. Even though the approach can produce accurate estimates, the accuracy of ICP is poorer in cases where objects consist of planar surfaces, include non-Gaussian noise, or are cluttered. In the literature, researchers have presented modifications and improvements of the ICP algorithm: Iversen et al. [12] introduced shape descriptors into ICP that lower the computation needed for matching; Presnov et al. [13] combined ICP pose estimates with wearable sensor measurements by linearizing the problem with an Extended Kalman Filter (EKF); Aghili et al. [14] similarly presented a fusion method that combines sensor data with ICP through an Adaptive Kalman Filter (AKF) for pose problems related to space shuttles. Myronenko developed a probabilistic method termed CPD [10]; the algorithm registers point data by modeling one point set as a Gaussian Mixture Model (GMM) and the other point cloud as data points, and CPD seeks the maximum GMM posterior probability. Delavari et al. [15] started from mesh construction of objects and added new model parameters to CPD; their method was applied to a biomedical matching problem and gave increased pose accuracy. Liu et al. [16] developed a likelihood field model for CPD that enhances its ability to refine registration with sparse point data for objects that moved by large amounts. Biber et al. [17] presented the Normal Distributions Transform (NDT), which converts point data into probabilistic distributions. The NDT defines point cloud data as a set of 2D normal distributions, and matching other measurements against the NDT is defined as maximizing a sum that scores the density at those measurements. Hong et al. [18] enhanced the NDT by truncating and combining the Gaussian components of the point data, which increased pose-matching accuracy. Liu et al. [19] also studied the NDT; they proposed
an upgrade that clusters the Gaussian distribution transform, adding point clustering and k-means clustering to find the best matches. Even though researchers have proposed methods that decrease the error of the NDT, issues with bad convergence remain. Opromolla et al. [20] used LIDAR point data; their method finds the centroid of the LIDAR points, computes pose from a predefined similarity term, and requires a template. They tested it on space robot pose estimation applications. Picos et al. [21] integrated correlation filters that estimate the location and orientation of the target by searching for the highest correlation between object point data. Philips et al. [22] developed an algorithm on a LIDAR sensor for an excavator; their algorithm uses a maximum evidence method to infer the 6 DoF pose that is most consistent with the LIDAR point cloud. Predefined CAD models can be used to obtain 3D point information for shape matching. A CAD model provides a noiseless, ideal representation of the object, which can enhance pose estimation accuracy. He et al. [23] developed a template-based pose estimation algorithm that extracts the key points of an object; their algorithm uses a CAD model and refines the 6 DoF pose by minimizing an error measure. Tsai et al. [24] developed a method that combines template matching and Perspective-n-Point (PnP) pose estimation; their algorithm extracts image key points, seeks the best match, and can be used in Augmented Reality (AR) applications. Song et al. [25] studied a CAD-model-based pose estimation algorithm for random bin picking that filters depth to remove outlier points and computes pose from color imagery.

Feature-based pose estimation methods are also an extensively studied research topic. The main idea is to estimate feature matches and descriptors between model and target frames that are expected to be robust to image deformations of the object. Methods then refine the object pose by minimizing an error measure or by a pooling method. Feature-based pose estimation can be divided into local and global methods. To refine an accurate pose, the images need sufficient texture on the model and target objects of interest. Chen et al. [26] used optical flow measurements, which help to estimate large motions; their algorithm refines pose by combining template warping with feature correspondences such as SIFT features. Liu et al. [27] presented a novel feature called P2P-TL; their algorithm models the target appearance and decreases both computation time and pose estimation error. Teng et al. [28] developed an algorithm for refining the pose of aircraft. Their algorithm extracts line features, and the pose is refined by matching line correspondences. Quan et al. [29] proposed a novel method relying on voxel binary descriptors that build a 3D binary characterization of the object geometry; their algorithm finds pose parameters and performs fine registration by matching features. Liu et al. [30] proposed an algorithm that estimates pose by matching image edge features; its accuracy depends strongly on the edge features and the geometric shape of the object. Contour-based methods are also extensively studied for pose estimation, since contours capture edge information on the model and target shapes. Leng et al. [31] proposed a pose refinement method that extracts model and target contours from a gray image and iteratively seeks matches until convergence. Schlobohm et al. [32] used contour information and proposed projected features that increase the accuracy of pose recovery; their algorithm computes pose by global optimization. Zhang et al. [33] proposed an algorithm that uses the object shape and image contour; it detects inliers, rejects outlier points, and computes the pose of the target object model. Similarly, Wang et al. [34] also studied image contours and edge features; their method applies particle filtering to search for the best matches and achieves robust pose recovery in cluttered object images.

Machine learning algorithms are developing rapidly and offer novel approaches, particularly for robotics and computer vision problems. They are composed of convolutional filters and learned functions that can be trained with labeled images, can produce robust pose estimates in dynamic environments, and can overcome the limitations of manually designed features. Machine-learning-based methods generally segment the target and estimate the pose of the foreground object. Zeng et al. [35] proposed a Convolutional Neural Network (CNN) for robot manipulators that is able to pick and release objects. Le et al. [36] presented a CNN that first segments the object and then recovers the pose of the target for robots. Brachmann et al. [37] proposed an algorithm that uses a random forest for classification from images and estimates the object pose. Kendall et al. [38] fine-tuned GoogleNet and proposed a CNN that refines pose from RGB images. Hua et al. [39] developed hourglass neural networks that find pose by adding residual modules. Giefer et al. [40] proposed cascade-connected CNNs for object localization and pose recovery. Machine learning based methods are dependent to
3. Methods

$$T_{12} = T_2 - T_1 \tag{2}$$

$$R_{12} = \begin{pmatrix}
\cos\beta\cos\gamma & \sin\alpha\sin\beta\cos\gamma - \cos\alpha\sin\gamma & \cos\alpha\sin\beta\cos\gamma + \sin\alpha\sin\gamma \\
\cos\beta\sin\gamma & \sin\alpha\sin\beta\sin\gamma + \cos\alpha\cos\gamma & \cos\alpha\sin\beta\sin\gamma - \sin\alpha\cos\gamma \\
-\sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta
\end{pmatrix} \tag{4}$$
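Equation (4) is the standard Z-Y-X Euler-angle rotation matrix. As a sanity check, it can be built and verified numerically; the following sketch (function name and test angles are illustrative, not from the paper) also forms the relative translation of Eq. (2):

```python
import numpy as np

def euler_to_rotation(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Build the rotation matrix of Eq. (4): R = Rz(gamma) @ Ry(beta) @ Rx(alpha)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    return np.array([
        [cb * cg, sa * sb * cg - ca * sg, ca * sb * cg + sa * sg],
        [cb * sg, sa * sb * sg + ca * cg, ca * sb * sg - sa * cg],
        [-sb,     sa * cb,               ca * cb],
    ])

# Relative translation between two frames, as in Eq. (2):
T1, T2 = np.array([0.1, 0.0, 0.5]), np.array([0.3, -0.2, 0.6])
T12 = T2 - T1
R12 = euler_to_rotation(0.1, 0.2, 0.3)

# Any proper rotation satisfies R^T R = I and det(R) = 1.
assert np.allclose(R12.T @ R12, np.eye(3))
assert np.isclose(np.linalg.det(R12), 1.0)
```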
Fig. 1. Objects from the BigBird dataset on which the Flow Filter algorithm was tested.
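As described in the prose of this section, Flow Filter's first step back-projects depth pixels to 3D with the depth camera intrinsics and maps them into the color camera frame with the extrinsics. A minimal pinhole-model sketch of that projection (matrix names and conventions here are assumptions, not the paper's notation):

```python
import numpy as np

def depth_to_color_pixels(depth: np.ndarray, K_d: np.ndarray, K_c: np.ndarray,
                          R: np.ndarray, t: np.ndarray):
    """Project every valid depth pixel into the color image.

    depth: (H, W) depth map in meters; K_d, K_c: 3x3 intrinsics of the
    depth and color cameras; R, t: extrinsics mapping depth-camera
    coordinates into the color-camera frame.
    Returns (u, v) color-pixel coordinates and the 3D points in the color frame.
    """
    h, w = depth.shape
    vs, us = np.mgrid[0:h, 0:w]
    valid = depth > 0
    z = depth[valid]
    # Back-project depth pixels to 3D points in the depth-camera frame.
    x = (us[valid] - K_d[0, 2]) * z / K_d[0, 0]
    y = (vs[valid] - K_d[1, 2]) * z / K_d[1, 1]
    pts_d = np.stack([x, y, z], axis=1)
    # Move the points into the color-camera frame and project them.
    pts_c = pts_d @ R.T + t
    uv = pts_c @ K_c.T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, pts_c
```

With identical intrinsics and identity extrinsics, each depth pixel maps back to its own coordinates, which is a convenient self-check before using real calibration parameters.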
Detect outliers:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} Z_{f_i} \tag{9}$$

$$f(Z_f) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Z_f-\mu}{\sigma}\right)^2} \tag{11}$$
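Equations (9) and (11) fit a Gaussian to the depth values Z_f inside a 2D filter window; depths far from the fitted mean (for example, boundary pixels mixing foreground and background) can then be rejected. A minimal sketch of this step, with the window size and the 3-sigma cutoff as illustrative assumptions not fixed by the paper:

```python
import numpy as np

def reject_depth_outliers(depth: np.ndarray, win: int = 5, k: float = 3.0) -> np.ndarray:
    """Mask depth pixels that deviate from the local Gaussian fit.

    For each pixel, fit a mean/std over a win x win neighborhood (Eqs. (9)-(11))
    and invalidate pixels farther than k standard deviations from the mean,
    which suppresses the large fore/background jumps at object boundaries.
    """
    h, w = depth.shape
    out = depth.astype(float)
    r = win // 2
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = depth[y - r:y + r + 1, x - r:x + r + 1].astype(float)
            mu = window.mean()          # Eq. (9)
            sigma = window.std()
            if sigma > 0 and abs(depth[y, x] - mu) > k * sigma:
                out[y, x] = np.nan      # rejected as outlier
    return out

# A flat 1 m surface with one spurious background reading at 5 m:
d = np.ones((9, 9))
d[4, 4] = 5.0
cleaned = reject_depth_outliers(d)
```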
M step:
(16)
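The M-step label above comes from the EM loop that GMM-based registration methods such as CPD [10] and FilterReg [8] share: the E-step computes soft correspondence probabilities between the two point sets, and the M-step solves for the rigid transform that maximizes the expected likelihood. A minimal rigid EM iteration in that spirit (a generic sketch, not the paper's exact update (16)):

```python
import numpy as np

def em_rigid_step(model: np.ndarray, target: np.ndarray, sigma2: float):
    """One EM iteration of rigid GMM registration (generic CPD/FilterReg-style).

    model: (N, 3) and target: (M, 3) point clouds.
    Returns rotation R and translation t moving the model toward the target.
    """
    # E-step: soft correspondences P[m, n] ~ exp(-||target_m - model_n||^2 / (2 sigma^2))
    d2 = ((target[:, None, :] - model[None, :, :]) ** 2).sum(-1)
    P = np.exp(-d2 / (2.0 * sigma2))
    P /= P.sum(axis=1, keepdims=True) + 1e-12

    # M-step: weighted Procrustes (Kabsch) solve for the rigid transform.
    w = P.sum(axis=0)                      # total weight of each model point
    W = w.sum()
    mu_m = (w @ model) / W                 # weighted model centroid
    mu_t = (P.sum(axis=1) @ target) / W    # weighted target centroid
    H = (model - mu_m).T @ P.T @ (target - mu_t)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ S @ U.T
    t = mu_t - R @ mu_m
    return R, t
```

In a full registration loop this step is repeated while shrinking sigma2, so the soft correspondences gradually sharpen toward one-to-one matches.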
on the public pose dataset (BigBird), and the results show that the Flow Filter algorithm produces improved pose estimates compared with alternative algorithms. Flow Filter offers the potential to enhance the understanding of spatial and 3D relationships in various robot application scenarios.

Flow Filter needs to know which depth points correspond to which RGB image coordinates. Since we have camera intrinsic and extrinsic parameters for the depth and color cameras, Flow Filter first projects the depth points into a temporary reference coordinate system. The projected depth points in the reference coordinate system are then projected into the color camera coordinate system. Points defined on depth and projected onto the color imagery are required to lie inside the model object boundary. In this way, Flow Filter transforms and projects depth image pixels onto the color image, which gives depth pixels that match the color image pixels. Point projection can create large noise, especially at object boundaries, which may lead to incorrect pose matches due to outliers. Therefore, Flow Filter rejects noisy and far-away points from the model and target objects. The target depth image can contain only sparse points projected onto the color image. To find all depth values inside the object boundary, Flow Filter applies linear interpolation on the model and target. This is straightforward inside the target object, since depth changes linearly between neighboring points of the object. However, depth interpolation produces significant errors at object boundary points because of the large depth difference between foreground and background locations. This depth error can be eliminated by applying a 2D filter on depth that removes large depth differences, which can be detected by fitting the depth with a Gaussian distribution inside the 2D filter grid. The proposed Flow Filter then finds the 3D point cloud on the color imagery that represents the rigid object for the model and target points.

The projection of model and target object depth points onto the color imagery is completed in the prior step, without any prior pose estimate. Flow Filter also requires optical flow measurements. Optical flow is computed from the color image frames with a method based on image warping and CNN networks [41]; this CNN produces promising estimates for low-textured objects and small motions. Flow Filter uses the optical flow from the model to the target frame, which gives image correspondences in the color imagery. Flow Filter masks the flow points inside the object boundary and fuses the optical flow with the corresponding 3D point cloud information. Finally, our proposed algorithm is integrated into the FilterReg algorithm [8], and refines the relative pose by modeling and registering the point clouds with a probabilistic point-set method.

4. Experiments

The Flow Filter algorithm has been tested on the public BigBird pose dataset. After systematic tests on various objects and multiple pose variations, Flow Filter delivered better pose refinement on partial objects, which are common in real-world problems. The systematic tests used 12 different objects with different size, shape, and texture properties, see Figure 1. Test objects are rotated through small to large pose changes (3-30 degrees). Since the objects are rotated about a single axis, we report results quantified in terms of axis-angle errors.

5. Results

We tested our Flow Filter pose estimation algorithm by comparing its estimates with the ground-truth poses from the BigBird dataset. Mean absolute error was quantified for the Flow Filter, FilterReg, and CPD algorithms. The results show that Flow Filter computes the pose of the rigid object with better accuracy than FilterReg and CPD. Instead of using plain depth or 2D color measurements of model and target, Flow Filter enables higher-accuracy pose estimation by combining filtered depth measurements with the corresponding optical flow from the CNN algorithm. Axis-angle values were used for the error analysis; the mean absolute error values are reported below.

The Advil box is small with some texture on the object, which allows only a limited number of depth points on the target. The mean absolute error is 2.95, 8.69, and 5.8 degrees for Flow Filter, FilterReg, and CPD respectively, see Figure 1. Flow Filter provides more accurate pose matches than FilterReg and CPD. Object shape can also impact estimation accuracy. The fish can is a small, cylindrical object that is hard to track due to its shape and the limited number of points on the target, see Figure 1. Consequently, all pose estimation algorithms produced poor relative pose estimates. The angle error is quantified as 11.12, 20.62, and
16.43 for Flow Filter, FilterReg, and CPD, respectively. Optical flow can suffer as the rotation angle increases, which adds noise to the object data points. The Flow Filter algorithm produces a 2.19-degree angle error for the Pepto bottle, see Figure 1. FilterReg and CPD produce less than 1 degree of angle error for a few samples, but the error grows as the rotation angle grows, because the partial shape differs from the target frame. FilterReg produced a 7.57-degree angle error and CPD a 10.69-degree angle error. Flow Filter, FilterReg, and CPD all produce promising results for the Quaker granola box, because the object is 3D-shaped and rectangular with some texture on the surface; its pose is easier to track since the point cloud sets converge more easily than on planar surfaces, see Figures 2-3. The angle errors are 2.12, 8.69, and 5.81 for Flow Filter, FilterReg, and CPD, respectively. The noodle box has some texture on the package, and we tested all three algorithms on it, see Figure 1. Flow Filter produced a 1.51-degree angle error for the chicken noodle box. FilterReg produces a much higher error, quantified as an 8.0-degree mean absolute angle error. Similarly, the CPD algorithm produces false alignments for the chicken noodle box, with a 9.33-degree angle error. The overall mean absolute error values are 3.03, 8.0, and 9.33 degrees for Flow Filter, FilterReg, and CPD, respectively. As can be seen from the results on a variety of test objects, the magnitude of the mean absolute error is significantly affected by the size, shape, and texture properties of the tracked objects. Object matching results are given in Figure 4.

6. Conclusion

Depth cameras provide limited resolution and are more expensive than color cameras, so pose estimation from depth-only measurements can suffer in accuracy and robustness, which is a problem for error-sensitive applications. Color cameras generally have higher resolution and are cheaper than depth cameras, but do not provide depth. In this context, color cameras can be fused with depth, which can enhance detection and pose recovery. Low-resolution cameras such as depth cameras give a limited number of points and projection errors on tracked objects, which can cause problems in sensor fusion. Similarly, object shape and texture impact pose estimation accuracy, especially for small pose changes. We present here a cheap and accurate estimation method. Flow Filter uses pre-calibrated cameras with extrinsic camera parameters. Flow Filter then provided improved accuracy on non-cylindrical 3D shapes with some texture on the tracked object. Flow Filter fuses color and depth information and rejects outliers effectively, which produced robust and enhanced pose accuracy and can be a solution for error-critical applications. Flow Filter can be implemented on RGBD sensors such as the Kinect, enabling cheap and efficient pose estimation suitable for indoor applications. We tested Flow Filter on low-resolution depth and RGB images, and it provided lower pose estimation error than the CPD and FilterReg algorithms. Previous algorithms such as CPD can produce false matches because they can be trapped in local minima and can suffer from outliers. Flow Filter can be applied in real time and in dynamic environments, and can be a useful technique for robot arm or mobile robot applications that need to know a relative pose. As future research, we will extend our work to real-time relative pose estimation and applications on mobile robots.

References

[1] E. Murphy-Chutorian and M.M. Trivedi, Head Pose Estimation and Augmented Reality Tracking: An Integrated System and Evaluation for Monitoring Driver Awareness, IEEE Transactions on Intelligent Transportation Systems 11 (2010). doi:10.1109/tits.2010.2044241.
[2] T. Yang, Q. Zhao, X. Wang and Q. Zhou, Sub-Pixel Chessboard Corner Localization for Camera Calibration and Pose Estimation, Applied Sciences 8 (2018), 2118. doi:10.3390/app8112118.
[3] R. Zhao, H. Ali and P. van der Smagt, Two-stream RNN/CNN for action recognition in 3D videos, in: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017. doi:10.1109/iros.2017.8206288.
[4] M. Andriluka, S. Roth and B. Schiele, Monocular 3D pose estimation and tracking by detection, in: Computer Vision and Pattern Recognition, 2010. doi:10.1109/cvpr.2010.5540156.
[5] M.A. Alper, J. Goudreau and M. Daniel, Pose and Optical Flow Fusion (POFF) for accurate tremor detection and quantification, Biocybernetics and Biomedical Engineering 40 (2020), 468-481. doi:10.1016/j.bbe.2020.01.009.
[6] D. Ding, R.A. Cooper, P.F. Pasquina and L. Fici-Pasquina, Sensor technology for smart homes, Maturitas 69 (2011), 131-136. doi:10.1016/j.maturitas.2011.03.016.
[7] A. Singh, J. Sha, K.S. Narayan, T. Achim and P. Abbeel, BigBIRD: A large-scale 3D database of object instances, in: International Conference on Robotics and Automation, 2014. doi:10.1109/icra.2014.6906903.
[8] W. Gao and R. Tedrake, FilterReg: Robust and Efficient Probabilistic Point-Set Registration Using Gaussian Filter and Twist Parameterization,
[34] B. Wang, F. Zhong and X. Qin, Robust edge-based 3D object tracking with direction-based pose validation, Multimedia Tools and Applications 78 (2018), 12307-12331. doi:10.1007/s11042-018-6727-5.
[35] A. Zeng, K.-T. Yu, S. Song, D. Suo, E. Walker, A. Rodriguez and J. Xiao, Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge, in: International Conference on Robotics and Automation, 2017. doi:10.1109/ICRA.2017.7989165.
[36] Benchmarking Convolutional Neural Networks for Object Segmentation and Pose Estimation, IEEE Xplore. https://ieeexplore.ieee.org/document/8457942.
[37] E. Brachmann, F. Michel, A. Krull, M.Y. Yang, S. Gumhold and C. Rother, Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image, in: Computer Vision and Pattern Recognition, 2016, 3364-3372. doi:10.1109/CVPR.2016.366.
[38] A. Kendall, M. Grimes and R. Cipolla, PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015. doi:10.1109/iccv.2015.336.
[39] G. Hua, L. Li and S. Liu, Multipath affinage stacked-hourglass networks for human pose estimation, Frontiers of Computer Science 14 (2020). doi:10.1007/s11704-019-8266-2.
[40] L.A. Giefer, J.D. Arango Castellanos, M.M. Babr and M. Freitag, Deep Learning-Based Pose Estimation of Apples for Inspection in Logistic Centers Using Single-Perspective Imaging, Processes 7 (2019), 424. doi:10.3390/pr7070424.
[41] P. Liu, M.R. Lyu, I. King and J. Xu, SelFlow: Self-Supervised Learning of Optical Flow, in: Computer Vision and Pattern Recognition, 2019. doi:10.1109/cvpr.2019.00470.