Abstract— The navigation system of some camera devices, such as wheeled robots, can be built by measuring the rotation of the device's wheels and the device's velocity. In several cases, however, such as devices for rescue and surveillance, the device has to deal with uneven surfaces that make its wheels slip, which corrupts the navigation measurement. Fortunately, almost all devices nowadays are equipped with at least one camera to monitor the environment. This research therefore aims to utilize the images captured by the camera to build a navigation system. We show that a simple monocular camera can provide an alternative navigation system for several uses, while the system also tracks the camera's path to draw a path map. First, we detect the camera motion and its direction using an optical flow model. To improve this model, we use a feature transform, namely SURF. Second, once the direction is known from the feature transform, we derive the formula for the real direction and use it to draw the path that the camera took. We tested our application directly on a video stream from our camera and compared the system coordinates with the real-world coordinates in centimeters.

Keywords—feature transformation, monocular camera, SURF, optical flow, navigation system.

I. INTRODUCTION

In some cases, a camera device such as a robot with wheels can build a navigation system by measuring the rotation of its wheels and the velocity of the robot. In other cases, a robot with a stereo camera can build a navigation system by measuring the exact trajectory around the device through visual odometry. Singh, Avi also stated that a stereo camera is robust because it has more data available than a monocular camera [5]. However, not every device can be fitted with a stereo camera. It is impractical to add a stereo camera to a small device such as a drone, which tends to become smaller and smaller. Building a basic navigation system on a device equipped only with a monocular camera is therefore a different case.

However, small devices like drones can be used in many ways. A small device with a camera can be used to monitor the environment. In the future, a basic navigation system that can give information about distance and direction can make a small camera device like a drone useful as a searching tool in an enclosed place.

From the monocular camera, the system can capture a video stream while the robot or its camera is moving to monitor the area around it. A video stream contains frames; if the frames are extracted, we get a sequence of images that keeps growing until the video stream ends. The image from each extracted frame is processed one after another in a continuous loop.

The process has two main steps. The first step tracks the robot motion by detecting the motion of image features between frames; it resembles an optical flow model, but for the feature transformation we use the SURF approach. The second step is visual odometry, the process of determining the odometry from a sequence of camera images. This step turns the image coordinates in pixels into real-world coordinates in centimeters.

This research also uses the OpenCV library with its extra modules, since this library already contains the functions needed to apply the feature transform. We use this library because we need a fast process, as the frames are processed directly from the video stream.

II. METHODS

As mentioned before, we have two steps to build the complete application: the optical flow model and the visual odometry technique. In this section, we discuss these steps in more detail.

Optical flow is the pattern formed by object motion as seen from an observer's point of view, where the observer can be an eye or a camera. It is like a human eye watching an animal: the eyes always follow the animal's motion, so if the animal moves to the right, the eyes follow it to the right, and vice versa [1]. That is the case when the human stands in place and does not move. From that thought, we assume the opposite case, where the animal stays in its position and the human moves in a significant direction. The logic is that if the human moves to the right, the animal will appear to move to the left, and vice versa.
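To make the frame-by-frame processing concrete, the loop below is a minimal sketch of how the video stream could be consumed with OpenCV. The paper does not state its implementation language or camera source, so Python and the default camera device (index 0) are assumptions here; only the overall structure of the loop reflects the description above.

    import cv2

    # Open the video stream (0 = default camera); a video file path would work the same way.
    cap = cv2.VideoCapture(0)

    ok, frame = cap.read()
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if ok else None

    while ok:
        ok, frame = cap.read()
        if not ok:                      # the video stream has ended
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Step 1: track feature motion between prev_gray and gray (SURF + matching).
        # Step 2: visual odometry, turning pixel coordinates into centimeters.
        prev_gray = gray

    cap.release()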
Fig. 3. Blob matching

B. Determining Feature

SURF needs a feature that has been determined at time t to be compared with the entire frame at time t+Δt. In this case, we determined 8 features in the very first frame. The locations of all eight features can be seen in Fig. 4. We spread the eight features over different locations to avoid finding no match at all in the trial process.

Fig. 4. Features location

Feature-0 and feature-1 are located in the center of the frame. This is because we start searching for keypoints from the center, which belongs to feature-0. If no keypoint is found in feature-0, the search continues in feature-1, feature-2, feature-3, feature-4, feature-5, feature-6 and finally feature-7. This method prevents the system from finding no match in the matching process.

All these features are determined in the 0th frame of the frame sequence and are compared against the entire area of the next frame. After that, we determine the mid-point again to create the feature in the second frame, which is compared with the third frame, and so on. Every frame uses the same feature size, which is 100 x 100 pixels.

C. Feature Matching

The feature transform needs a matching process to compare the image at time t with the image at time t+Δt. Therefore, this research uses the Brute Force matching algorithm, a matching method that is straightforward with respect to the problem definition. This algorithm has advantages over others: it is easy to use, easy to implement and widely applicable to several problems [6], such as searching, sorting, matching, string matching, pattern matching, matrix multiplication and so on.

This matching algorithm was chosen because it does not need an iterative or recursive process. In the case of pattern matching, text and images follow the same procedure. A pattern can be a feature or a sample that is unique or has characteristics different from the other parts. In an image, however, the matching is done on the integral image. Because we already apply the SURF algorithm first, we only need to match the integral images of blobs that have the same contrast.

After the feature of an image is obtained, the information about its integral image is extracted. Then, in the other image that will be matched against the feature, the same part of the integral image is searched from the top-left of the image down to the bottom-right. When the algorithm finds a match, it reports the location where the match was found. This information is given as a pixel coordinate.

D. Outlier Elimination

The matching result does not show only one single perfect match, but as many matches as the system can find. The number of matches depends on the minHessian variable, a parameter related to the Hessian matrix. The lower the minHessian value, the more keypoints are found, and the numbers of matches and of noisy matches increase accordingly. Conversely, the higher the minHessian value, the fewer keypoints are found, and the numbers of matches and noisy matches decrease. Even though the noise decreases, it still interferes with the process of getting the pixel coordinate. The noise here is a match that the algorithm reads as a perfect match while in reality it is not even close to the other matches. Such noise matches can be located far away from the others, whereas we need to calculate the mean of the match coordinates; if even a single noise match is included in the calculation, it ruins all of the data. The outliers, that is, the noise matches mentioned before, are illustrated in Fig. 5.
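As an illustration of Sections B and C, the snippet below detects SURF keypoints in one 100 x 100 feature patch and in the next frame, then matches their descriptors with a brute-force matcher. It is only a sketch, not the authors' code: the variables patch and next_frame, the minHessian value of 400, and the use of cross-checking are assumptions, Python is assumed as the language, and SURF is only available in OpenCV builds that include the contrib (non-free) xfeatures2d module.

    import cv2

    min_hessian = 400                       # example threshold; the paper tunes it per environment
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=min_hessian)

    # 'patch' is one 100 x 100 pixel feature cut from the grayscale frame at time t,
    # 'next_frame' is the whole grayscale frame at time t + dt (both assumed to exist).
    kp1, des1 = surf.detectAndCompute(patch, None)
    kp2, des2 = surf.detectAndCompute(next_frame, None)

    # Brute-force matching of SURF descriptors (L2 norm); cross-checking keeps only
    # pairs that agree in both directions.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(des1, des2) if des1 is not None and des2 is not None else []

    # Each match reports where in the next frame the patch feature was found, in pixels.
    points_in_next = [kp2[m.trainIdx].pt for m in matches]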
Fig. 5. Outlier location (panels 5a and 5b)

In Fig. 5a, the rectangular image on the left side is one of the features in the first frame that is compared with the next frame. The first keypoints are found in the feature and the second keypoints are found in the second frame. A line connects each first keypoint with its second keypoint to show the matches.

Even though we already chose the best minHessian value for our environment over several trials to reduce the noise, some noisy matches still appear, as shown in Fig. 5b. These noisy matches are located far away from the other matches that have been found. Such an imperfect match is called an outlier and is eliminated from the system.

We can find the mid-point of the features in the second frame by calculating the mean of the keypoints' coordinates in the second frame. Accordingly, if the outliers are not eliminated, the mid-point will be miscalculated. Therefore, the outliers located far from the other keypoints need to be eliminated.

E. Getting The Coordinate for Navigation Direction and Distance

The mid-point is the pixel coordinate of the feature at time t. To get the direction, we analyse the pixel movement obtained by comparing the mid-point at time t with the mid-point at time t+Δt, using the feature transformation with SURF. By measuring the pixel movement, the characteristic of the direction that the camera took can be formulated.

When the characteristic is detected, the system draws the path to map the camera route. When the camera starts, the system draws a point in the center of the frame if the camera is idle. When it moves forward, the system draws a line to the top. When it moves backward, the system draws a line to the bottom. When it moves to the right, it draws a line to the right, and when it moves to the left, it draws a line to the left. The drawing also depends on the camera orientation. For instance, after the camera moves to the right and then turns right, the system draws the line to the bottom, since the last camera orientation was already to the right.

For every direction characteristic found, the system draws a line of 10 pixels; when the camera is idle, no line is drawn and the path stays in its last state. The next step is measuring how many centimeters every 10 pixels represents, by scaling between the coordinate in pixels and the coordinate in the real world.

The coordinate used in this research is a relative coordinate. It means that the coordinate the system gives at time t+Δt is relative to the origin point at time t, or to the very first frame when the camera device starts the program. The X coordinate represents the horizontal field and the Y coordinate represents the vertical field.

After that, we save the direction and the distance at regular intervals while the camera is moving. This stored data can be used to obtain the navigation that the camera should take to get back to its origin point.

III. RESULT

Once all the methods have been applied to build the navigation system, it is time to check the results, such as the pixel movement when the camera is moved without any motion in the environment around it. This research is therefore specialized to an idle environment while the camera is moving. Besides that, the drawn pattern of the camera movement, which can be used as a path map, is shown as well. The last result concerns the camera after calibration, analyzing the difference between the pixel coordinates and the real-world coordinates.

A. Pixel's Movement

The pixel movement can describe the camera movement in order to obtain its characteristic. In this case, the pixel movement represents the direction that the camera takes. The movement was recorded while the camera was idle, moving to the right, moving to the left, moving forward and moving backward.

The final mid-point found in the outlier elimination process is used to detect the pixel movement. The movement characteristic is obtained by comparing both the X and Y coordinates in the first frame with those in the second frame. This means that at regular intervals the system records the pixel coordinate.
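The mid-point referred to here could be computed along the lines of the helper below, which discards matches that lie far from the rest and averages the remaining keypoint coordinates. This is a sketch of the outlier elimination and mid-point step described in Sections D and E, not the authors' implementation: the distance threshold of 50 pixels and the use of the median as the reference point are assumptions made for illustration.

    import numpy as np

    def mid_point(points, max_dist=50.0):
        """Average of the matched keypoint coordinates in the second frame,
        after discarding matches located far from the rest (outliers).

        points   : list of (x, y) pixel coordinates of the matched keypoints
        max_dist : assumed distance (pixels) from the median beyond which a
                   match is treated as an outlier
        """
        pts = np.asarray(points, dtype=float)
        if len(pts) == 0:
            return None                          # no match found for this feature
        center = np.median(pts, axis=0)          # robust reference point
        dist = np.linalg.norm(pts - center, axis=1)
        inliers = pts[dist < max_dist]           # drop matches far from the others
        if len(inliers) == 0:
            return None
        return inliers.mean(axis=0)              # (x, y) mid-point of the feature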
The first data, shown in Table I, records the pixel movement when the camera moved to the right. In this trial, the camera moved linearly from its position to the right.

TABLE I. PIXEL'S MOVEMENT WHEN CAMERA MOVED TO THE RIGHT

Feature   X0        Y0        X1        Y1
0         -         -         -         -
1         209.368   325.865   127.303   329.605
2         203.904   335.285   139.09    327.733
3         -         -         -         -
4         192.473   334.826   164.487   330.818
5         -         -         -         -
6         -         -         -         -
7         209.368   325.865   127.303   329.605

For Table I – Table IV:
X0 = X pixel coordinate of the feature in the first frame.
Y0 = Y pixel coordinate of the feature in the first frame.
X1 = X pixel coordinate of the feature in the second frame.
Y1 = Y pixel coordinate of the feature in the second frame.

Several features did not find any keypoint, which is why their rows are left blank. The eight-feature method is really helpful for finding keypoints even though some features do not find any. So, if the 0th feature does not find a keypoint to determine a mid-point, the system searches in the 1st feature. If both the 0th and 1st features do not find any keypoint, it checks the 2nd feature and takes its mid-point as the point, and so on.

Coordinate (X0, Y0) is the coordinate in the first frame and coordinate (X1, Y1) is the coordinate in the second frame. The deviation of the X coordinate and of the Y coordinate is used to estimate the pixel movement. The X coordinate represents the horizontal field and the Y coordinate represents the vertical field.

If the absolute deviation of both X0 – X1 and Y0 – Y1 is less than 20, it means the coordinate is idle. If the difference is higher than 20, it is counted as movement. The number 20 is the error tolerance. We declared this error tolerance because, even when the camera is idle, some pixel movement is still detected, but its value is small and not higher than 20.

The features that found their keypoints in Table I have a similarity: the value of X0 is higher than X1, which shows that the pixels move from the right to the left. Meanwhile, the Y coordinate has a deviation of less than 20 and is ignored, because it means there is no pixel movement in the vertical field.

TABLE II. PIXEL'S MOVEMENT WHEN CAMERA MOVED TO THE LEFT

Feature   X0        Y0        X1        Y1
0         120.556   325.627   173.724   333.576
1         120.556   325.627   173.724   333.576
2         128.277   316.111   162.837   322.018
3         120.556   325.627   173.724   333.576
4         164.336   328.268   184.516   315.959
5         -         -         -         -
6         -         -         -         -
7         -         -         -         -

In Table II, the Y coordinate again has a deviation of less than 20, while the X coordinate increases its value with a deviation of more than 20. It illustrates that the pixels move from the left to the right.

The pixel movement when the camera moved to the right and to the left confirms the optical flow theory for the case where the camera is moving while the object is idle: the apparent motion has the opposite direction, the same as with human eyes. If the camera or the eye sees something and the camera moves to the right, the object seems to move to the left. On the other hand, if the camera moves to the left, the object seems to move to the right.

It is a different case when the camera moves forward or backward. The next tables, Table III and Table IV, show the pixel movement when the camera moved forward and backward.

TABLE III. PIXEL MOVEMENT WHEN CAMERA MOVES FORWARD

Feature   X0        Y0        X1        Y1
0         406.082   152.072   207.139   101.091
1         406.082   152.072   207.139   101.091
2         409.818   147.146   207.139   101.091
3         -         -         -         -
4         -         -         -         -
5         -         -         -         -
6         403.608   152.233   207.139   101.091
7         409.818   147.146   207.139   101.091

After several frames, the pixels from all features can be located at the same point, as in Table III, where the coordinates of the features show the same value. This is because the coordinate (X0, Y0) is not taken in the 0th frame only; the comparison is made every five frames.

Both the X coordinate and the Y coordinate in Table III decrease in value, and the deviation is more than 20. It means that the feature moves from its state to the top-left. In the horizontal field, it moves from the right to the left, since the system starts the X coordinate from the left to the right. In the vertical field, it moves from the bottom to the top, since the system starts the Y coordinate from the top to the bottom. In other words, if the camera moves forward, the pixel moves obliquely from its state to the top-left.

TABLE IV. PIXEL MOVEMENT WHEN CAMERA MOVES BACKWARD

Feature   X0        Y0        X1        Y1
0         227.021   94.1245   431.159   139.057
1         227.021   94.1245   431.159   139.057
2         220.796   104.143   342.831   127.675
3         -         -         -         -
4         -         -         -         -
5         224.352   92.9317   342.831   127.675
6         224.352   92.9317   342.831   127.675
7         236.772   98.8568   433.764   136.106

Table IV shows that both the X and the Y coordinate have increased. It means that the pixel moves from its state to the right in the horizontal field and to the bottom in the vertical field; the pixel moves obliquely from its state to the bottom-right.
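The deviation test described above, idle inside the 20-pixel tolerance and otherwise a direction given by the signs of X0 – X1 and Y0 – Y1, is formalized as equations (1)-(5) in the next subsection. A small sketch of how that decision rule might be coded (Python is assumed; this is an illustration, not the authors' code) is:

    TOLERANCE = 20  # pixel error tolerance used in the paper

    def classify_direction(x0, y0, x1, y1):
        """Map the mid-point deviation between frame t and frame t+dt to a
        camera movement label, using the 20-pixel tolerance described above."""
        dx = x0 - x1
        dy = y0 - y1
        if abs(dx) < TOLERANCE and abs(dy) < TOLERANCE:
            return "idle"
        if dx > TOLERANCE and abs(dy) < TOLERANCE:
            return "right"        # pixels drift left  -> camera moved right
        if dx < -TOLERANCE and abs(dy) < TOLERANCE:
            return "left"         # pixels drift right -> camera moved left
        if dx > TOLERANCE and dy > TOLERANCE:
            return "forward"      # pixels drift obliquely to the top-left
        if dx < -TOLERANCE and dy < -TOLERANCE:
            return "backward"     # pixels drift obliquely to the bottom-right
        return "unknown"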
Fig. 6 shows an illustration of the pixel movement when the camera moves from its state in any direction. The X coordinate runs from the left to the right, while the Y coordinate runs from the top to the bottom. The hollow bullet is the pixel location representing coordinate (X0, Y0). The solid bullet is the pixel location at time t+Δt, representing coordinate (X1, Y1). Fig. 6a is the pixel movement when the camera moves to the right, 6b when the camera moves to the left, 6c when the camera moves forward and 6d when the camera moves backward.

Fig. 6. Illustration of pixel movement: (a) camera moves right, (b) left, (c) forward, (d) backward

B. Drawing Path

The direction characteristics can be formulated as the equations below. Equation (1) is the characteristic when the camera is idle. Equation (2) is the characteristic when the camera moves to the right. Equation (3) is the characteristic when the camera moves to the left. Equation (4) is the characteristic when the camera moves forward. And the last, equation (5), is the characteristic when the camera moves backward.

Idle     = [abs(X0 – X1) < 20, abs(Y0 – Y1) < 20]    (1)
Right    = [(X0 – X1) > 20, abs(Y0 – Y1) < 20]       (2)
Left     = [(X0 – X1) < -20, abs(Y0 – Y1) < 20]      (3)
Forward  = [(X0 – X1) > 20, (Y0 – Y1) > 20]          (4)
Backward = [(X0 – X1) < -20, (Y0 – Y1) < -20]        (5)

From the pixel movement, we can draw the path map of the camera's tracking route. If the absolute deviation of both X0 – X1 and Y0 – Y1 is less than 20, that is, the camera stays in its state, the system only draws a point, since no movement is detected. If the drawing starts in the first frame and the system detects that the pixel moves forward, it draws a line from the center of the frame to the top. If it detects that the pixel moves to the right, it draws a line from the center to the right. If it detects that the pixel moves to the left, it draws a line from the center to the left. And if it detects that the pixel moves backward, it draws a line from the center to the bottom.

But the drawing depends on the last pixel state. For example, if the system detects that the pixel just moved to the right, then the next movement depends on that rightward position, since a line has already been drawn to the right. In other words, after the pixel has moved to the right for the first time and then moves to the right again, the system draws a line to the bottom. Or, in the other case, if the next move is to the left, it draws a line to the top. The result can be seen in Fig. 7.

Fig. 7. The result of path drawing: (a), (b)

We show two of our path drawing results to illustrate the explanation above. Fig. 7 shows the results of several movements. The small black bullet is an imaginary marker where the camera starts the program. In Fig. 7(a), the camera first moved forward, secondly it moved to the right, and thirdly it moved to the right again. In the third movement, when the camera moved to the right, the system did not draw the line to the right but drew it to the bottom, because the second movement had already gone to the right before.

A different camera route mapping result is shown in Fig. 7(b). First, the camera moved forward; secondly, it moved to the left; thirdly, it moved to the right. In the first movement, the system drew the path as a line from the center of the frame to the top. In the second movement, it drew a line to the left. Since the current state was then in the left position, when it came to the third movement the system drew the line to the top to describe the camera's movement to the right.
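To make the drawing rule concrete, the sketch below keeps a current heading and advances the drawn path by 10 pixels per detected move, turning the heading when a left or right movement is detected, as described above. This is an illustrative sketch only: the treatment of a backward move (reversing the heading) and the data representation are assumptions rather than the authors' implementation. Each 10-pixel step is later scaled to centimeters (the next subsection calibrates 10 pixels to 50 cm).

    STEP_PX = 10           # the system draws 10 pixels for every detected move

    # Unit vectors in drawing coordinates (x grows to the right, y grows downward).
    UP, RIGHT, DOWN, LEFT = (0, -1), (1, 0), (0, 1), (-1, 0)
    CW  = {UP: RIGHT, RIGHT: DOWN, DOWN: LEFT, LEFT: UP}   # 90-degree clockwise turn
    CCW = {v: k for k, v in CW.items()}                    # 90-degree counter-clockwise turn

    def update_path(pos, heading, direction):
        """Advance the drawn path by one detected move.
        pos      : current (x, y) point of the path in pixels
        heading  : unit vector of the last drawing direction (starts as UP)
        direction: label from the classifier ("idle", "forward", "backward", "left", "right")
        Returns the new (pos, heading)."""
        if direction == "idle":
            return pos, heading                  # stay in the last state, draw nothing
        if direction == "right":
            heading = CW[heading]                # turn relative to the last orientation
        elif direction == "left":
            heading = CCW[heading]
        elif direction == "backward":
            heading = CW[CW[heading]]            # assumed: reverse the current heading
        # "forward" keeps the current heading
        new_pos = (pos[0] + STEP_PX * heading[0], pos[1] + STEP_PX * heading[1])
        return new_pos, heading

Recording the sequence of (direction, distance) pairs produced this way is what later allows the moves to be replayed in reverse for the navigation back to the origin point.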
C. The Result of Camera Calibrating

For the result of the camera calibration, as part of the visual odometry, we keep the coordinate in two dimensions (X, Y): X for the horizontal field and Y for the vertical field. In this result, we compare the coordinate detected by the system with the real coordinate in the real world.

The coordinate that the system detects depends on the length in pixels used to draw the line. In this part, we noted that every 10 pixels equals 50 centimeters in the real world. As the next step, we obtained the comparison between the coordinates detected by the system and the coordinates in the real world. The result is shown in Table V.

TABLE V. COMPARISON OF COORDINATE SYSTEM WITH THE REAL COORDINATE

Number   Xsys   Ysys   Xreal   Yreal
1.       0      100    0       150
2.       150    10     150     0
3.       -50    0      -100    0
4.       0      -150   0       -150
5.       100    100    100     150
6.       150    150    -150    150
7.       150    300    125     250

We know that the coordinate in our system is relative to the origin point where the application starts to run. The first position is (0, 0). Moving forward increases the Y coordinate and moving backward decreases it. Meanwhile, moving to the right increases the X coordinate and moving to the left decreases it.

In Table V, we first tried to move the camera forward for 150 centimeters, but the system reported its coordinate (Xsys, Ysys) as (0, 100). In the second trial, the real coordinate (Xreal, Yreal) should be (150, 0) relative to the coordinate (0, 0) of the first camera move, but the system showed the result as (150, 10), and so on. From those trials, we conclude that the error range of the system is between 0 and 100 cm.

The distance recorded in the system is saved for building the next component of our system. We build a navigation, including direction and distance, for the camera to come back to its origin point.

D. Navigation Back to the Origin Point

After the camera device moves in any directions and distances, the system can give a navigation suggestion for the camera to go back to the origin point (0, 0). Assuming that the camera stays in its last orientation, the camera device needs to move backward to get back to its origin point. For example, if the camera moves 100 cm forward from its origin point, it needs to move backward 100 cm.

We ran several trials to check the navigation back to the origin point. Table VI shows three of the results. In that table, we compare the navigation that the system suggested with the navigation that the system is supposed to give, that is, the real navigation needed to move the camera back to the origin point.

TABLE VI. COMPARISON OF NAVIGATION BY SYSTEM AND NAVIGATION IN REAL

Number   Navigation by System                          Navigation in Real
1.       back = 50, back = 50                          back = 150
2.       back = 50, left = 50, back = 50, back = 50    back = 100, left = 0, back = 150
3.       back = 50, back = 50, right = 50, back = 50   back = 100, right = 0, back = 150

The first trial in Table VI is the navigation back to the origin point after the camera moves forward for 150 cm. To get back to the origin point, the camera needs to move backward for 150 cm, as shown in the navigation in real, and the system suggests going back for 50 cm and then back again for 50 cm. It shows that our system gives a notification every 50 cm of movement.

The second trial shows the navigation after the camera moves forward 100 cm and then turns to the left and goes straight for 150 cm. That is why the camera should move backward for 150 cm, turn left, and then move backward again for 150 cm to reach the origin point. However, the navigation system says the camera needs to move backward for 50 cm, then move to the left for 50 cm, and then backward again for 50 cm and 50 cm. From this trial, we can see that our system cannot distinguish whether the camera moves to the left or rotates 90° to the left. The same condition applies when the camera moves or rotates to the right.

IV. CONCLUSION

From this research, we conclude that our optical flow result using the SURF approach can be applied to a monocular camera and can be used for detecting the direction of the camera's movement by determining the pixel movement. It can distinguish whether the camera moves forward, moves backward, moves to the right or moves to the left. However, to draw the path, it cannot apply the direction characteristic directly; the direction needs to be converted in order to draw the path. Besides the direction, our system can also show the coordinate, even though the error is still in the range of 0-100 centimeters. However, it is still ambiguous whether the camera moves to the right linearly or rotates 90° to the right.

REFERENCES

[1] D. J. Fleet and Y. Weiss, "Optical flow estimation," in N. Paragios et al. (Eds.), Springer, ISBN 0-387-26371-3.
[2] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in European Conference on Computer Vision (ECCV), Graz, Austria, May 7-13, 2006, pp. 404-417.
[3] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1330-1334, November 2000.
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, November 2004.
[5] A. Singh, "Visual odometry from scratch – a tutorial for beginners," Computer Science, Berkeley AI Research, University of California, 2015.