
Improved Method for Object Distance Estimation using Stereo Camera


Hieu Dang Trung
Faculty of Electronics and Telecommunications
University of Engineering and Technology, Vietnam National University, Hanoi
Hanoi, Vietnam
[email protected]
An Nguyen Ngoc
Faculty of Electronics and Telecommunications
University of Engineering and Technology, Vietnam National University, Hanoi
Hanoi, Vietnam
[email protected]

Abstract—This paper presents an improved method for estimating the distance to objects in stereo images. The novelty of the method is the use of a real-time object detection model to extract a list of objects; distances are then estimated from the information about the extracted objects. Instead of estimating a distance for every pixel of the image, the method estimates distances only to the objects detected in the image. This reduces processing time and alleviates surface-related correspondence problems in stereo vision.
Keywords—computer vision, stereo vision, camera, distance, object detection.

I. INTRODUCTION

Humans see with their eyes, and the visual information from the eyes is transferred to the brain for processing. Similarly, in the machine world, two or more cameras are used and the information extracted from their images is processed by algorithms. Among those algorithms, object detection and distance estimation are among the most actively studied today.

Distance estimation does not necessarily require computer vision; sensors such as ultrasonic sensors, radar, and LiDAR are commonly used instead. Extracting data from images is more difficult than reading such sensors, but a stereo camera is cost-effective, data from images is more intuitive, and the approach has more potential for growth than sensor-based methods.

II. TRADITIONAL METHOD

The processing pipeline of the traditional distance estimation algorithm is shown in Fig. 1.

Fig. 1. Process diagram of the traditional algorithm

A. Calibrate cameras

Camera calibration is the process of estimating camera parameters. These parameters are necessary to define the relationship between a real-world 3D point and its 2D projection in the image captured by the camera being calibrated.

Fig. 2. Real-world point to pixel point

Camera parameters include extrinsic and intrinsic parameters. The extrinsic parameters consist of a rotation and a translation; they are used to convert the coordinates of a point in the real world into the camera frame (the real world with its origin at the camera).

The intrinsic parameters include the focal length, the optical center, and the skew coefficient; they are used to convert the coordinates of a point from the camera frame to the image plane (3D to 2D).

Besides the intrinsic and extrinsic parameters, real cameras have lenses, so lens distortion occurs. To represent a real camera accurately, the camera model therefore also includes radial and tangential lens distortion.
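For concreteness, the sketch below shows how this calibration step could be carried out with OpenCV's standard routines. It is an illustration, not the authors' code; the checkerboard geometry and the file pattern are assumptions.

    import glob
    import cv2
    import numpy as np

    # Assumed checkerboard: 9x6 inner corners, 25 mm squares.
    pattern = (9, 6)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * 0.025

    obj_points, img_points = [], []
    for path in glob.glob("calib/left_*.png"):  # hypothetical image files
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Returns the intrinsic matrix K and distortion coefficients (k1, k2, p1, p2, k3),
    # plus per-view extrinsics (rotation and translation vectors).
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print("RMS reprojection error:", rms)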
B. Rectify images

Image rectification is the process of converting a distorted image into an undistorted one using the distortion coefficients obtained from calibration, and of reprojecting the images onto a common plane parallel to the line between the optical centers when the image planes of the two cameras are not parallel to each other and to the baseline.

Radial distortion occurs because light rays bend more near the edges of a lens than at its optical center; the smaller the lens, the greater the distortion. The radially distorted point (x_distorted, y_distorted) is given by

x_distorted = x · (1 + k1·r² + k2·r⁴ + k3·r⁶)
y_distorted = y · (1 + k1·r² + k2·r⁴ + k3·r⁶)

Tangential distortion occurs when the lens and the image plane are not parallel. The tangentially distorted point (x_distorted, y_distorted) is given by

x_distorted = x + [2·p1·x·y + p2·(r² + 2·x²)]
y_distorted = y + [p1·(r² + 2·y²) + 2·p2·x·y]



• x, y – undistorted pixel locations
• k1, k2, k3 – radial distortion coefficients of the lens
• p1, p2 – tangential distortion coefficients of the lens
• r² = x² + y²
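In code, the two distortion models are usually applied together to normalized image coordinates; this combined form is the standard one used by OpenCV and MATLAB. The following is a minimal sketch, not taken from the paper:

    import numpy as np

    def distort(x, y, k1, k2, k3, p1, p2):
        # Apply radial + tangential lens distortion to normalized coordinates.
        r2 = x**2 + y**2
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        # The radial term scales the point; the tangential term shifts it.
        x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
        y_d = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
        return x_d, y_d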

Fig. 3. Undistorted image

Fig. 4. Stereo camera model

Fig. 5. Reprojection of the images onto a common plane parallel to the line between the optical centers

C. Stereo matching and computing the disparity map

Stereo matching is the process of finding pairs of similar pixels, which always lie on the same epipolar line. There are many similarity measures for identifying similar pairs of pixels, such as SAD, SSD, and NCC.

Fig. 6. Finding the best matching pixel on the scanline

After rectification, the epipolar lines become horizontal, so the vertical coordinates of a pair of similar pixels are the same. To compute disparity, we only need the coordinate difference along the horizontal axis. Therefore, the disparity of (x1, y1) in the left image and (x2, y2) in the right image is

Disparity = x1 − x2
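As an illustration of this step, the sketch below performs brute-force scanline matching with the SAD measure mentioned above. The window size and disparity search range are assumed values; real implementations are heavily optimized.

    import numpy as np

    def disparity_sad(left, right, max_disp=64, win=5):
        # For each left-image pixel, scan max_disp candidates on the same row
        # of the right image and keep the offset with the smallest SAD cost.
        h, w = left.shape
        half = win // 2
        disp = np.zeros((h, w), np.float32)
        for y in range(half, h - half):
            for x in range(half + max_disp, w - half):
                ref = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
                costs = [
                    np.abs(ref - right[y - half:y + half + 1,
                                       x - d - half:x - d + half + 1].astype(np.int32)).sum()
                    for d in range(max_disp)
                ]
                disp[y, x] = np.argmin(costs)  # disparity d = x_left - x_right
        return disp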

Fig. 7. Compute disparity map

D. Compute depth map

Computing the depth map is the process of converting the disparity map into a depth map using the camera parameters. The depth of a pixel in camera space is

Depth = (b · f) / Disparity

• b – baseline (m)
• f – focal length (pixels)
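A minimal sketch of this conversion (the default baseline and focal length are the values obtained after calibration in Section V):

    import numpy as np

    def depth_from_disparity(disp, baseline_m=0.12, focal_px=420.0):
        # Depth (m) = b * f / disparity; marked invalid where disparity is zero.
        depth = np.full_like(disp, np.inf, dtype=np.float32)
        valid = disp > 0
        depth[valid] = baseline_m * focal_px / disp[valid]
        return depth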
III. IMPROVEMENT

The limitations of the traditional method, such as its processing time and its stereo correspondence problems, all come from the stereo matching step, so we changed the distance estimation method starting from this step. The modified process is shown in Fig. 8.
Fig. 8. Process diagram of the improved method

Computing depth for every pixel of an image is the most time-consuming step of the whole process. Moreover, finding similar pixels in a stereo pair is an unreliable operation: because the two cameras view the scene from different positions, many pixels do not appear in both images.

Fig. 9. Perspective problem

To solve this problem, instead of looking for similar pixels, we find similar objects and compute depth on them. This change allows us not to waste time on unnecessary pixels but to focus on the objects we are really interested in.


A. Real-time Object Detection Model

We can use any real-time object detection model to predict the objects in the stereo images, but the processing speed and accuracy of the modified method depend mainly on the processing speed and accuracy of this model.
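As one possible realization (an assumption, not an implementation stated in the paper), YOLOv4 can be run through OpenCV's DNN module; the configuration and weight file names below are placeholders:

    import cv2

    # Hypothetical paths to YOLOv4 files trained on COCO.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    def detect(image, conf_threshold=0.5):
        # Returns (class_id, confidence, box) triples; box = (x, y, w, h).
        class_ids, confidences, boxes = model.detect(image, confThreshold=conf_threshold)
        return list(zip(class_ids, confidences, boxes))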
B. Object Matching

The object information predicted by the object detection model usually includes the coordinates of the object in the image, the smallest region containing the object, and the object class.

To identify two similar objects, we can use the following requirements (a sketch implementing them follows this list):

• the same object class;
• approximately the same vertical coordinates;
• if more than two similar objects share the same class and approximately the same vertical coordinate, we crop the two image regions and compare them pixel by pixel, or use a deep learning model to measure image similarity.
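The following sketch implements the first two requirements; the vertical tolerance y_tol is an assumed parameter, and the ambiguous third case would additionally require the patch comparison described above:

    def match_objects(left_objs, right_objs, y_tol=10):
        # Each detection is (class_id, (x, y, w, h)). Pair detections that share
        # a class and whose box centers lie at approximately the same height.
        pairs, used = [], set()
        for cls_l, (xl, yl, wl, hl) in left_objs:
            best_j, best_dy = None, y_tol
            for j, (cls_r, (xr, yr, wr, hr)) in enumerate(right_objs):
                if j in used or cls_r != cls_l:
                    continue
                dy = abs((yl + hl / 2) - (yr + hr / 2))
                if dy <= best_dy:  # closest vertical center within tolerance
                    best_j, best_dy = j, dy
            if best_j is not None:
                used.add(best_j)
                pairs.append(((cls_l, (xl, yl, wl, hl)), right_objs[best_j]))
        return pairs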
C. Compute object disparity and depth

Similar to pixels, the object disparity is the coordinate difference of the object along the horizontal axis. However, in some cases an object appears only partially in the image, so we divide the situation into three cases and choose as the object's horizontal coordinate the coordinate of the red line shown in Fig. 10. After computing the object disparity, we convert it to depth in the same way as for pixels.

Fig. 10. Three cases of object visibility
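A minimal sketch of this computation, assuming the fully visible case, where the bounding-box center can serve as the object's horizontal coordinate (the partial-visibility cases would instead use the visible edge):

    def object_depth(box_left, box_right, baseline_m=0.12, focal_px=420.0):
        # Boxes are (x, y, w, h); the center x is used (full-visibility case).
        xl = box_left[0] + box_left[2] / 2.0
        xr = box_right[0] + box_right[2] / 2.0
        disparity = xl - xr
        if disparity <= 0:
            return float("inf")  # invalid match or infinitely distant object
        return baseline_m * focal_px / disparity  # same formula as for pixels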
IV. COMPARISON

After the improvement, the distance estimation algorithm is much faster than before. With the traditional algorithm, estimating distances for a stereo image of resolution N×N has complexity O(N³) in the stereo matching step. For the same stereo image, the improved method's object matching step has complexity O(N²), where N is now the number of objects in the image.

Although the improved method adds an object detection step, the processing time of this step is very small: current object detection models such as YOLO run at real-time rates.

Processing time for a stereo image (CPU: AMD Ryzen 5 4800H):
• Traditional method: 145 s
• Improved method with YOLOv4: 1.4 s
• Improved method with YOLOv4-tiny: 1.3 s

Average FPS when processing video (GPU: GTX 1660 Ti):
• Traditional method: too low
• Improved method with YOLOv4: 15-20
• Improved method with YOLOv4-tiny: >60

V. PREPARATION FOR THE EXPERIMENT
A. Build a system for testing

We use the integrated stereo camera HBV-1714-2 to capture stereo images; it is connected to the computer through a USB cable for the experiments.

Fig. 11. Stereo camera HBV-1714-2

After calibration, we obtain a focal length f = 420 pixels and a baseline b = 120 mm for the integrated stereo camera HBV-1714-2, so

Depth = (b · f) / Disparity = 50.4 / Disparity (m)    (1)

Based on (1), the maximum distance the system can estimate is 50.4 meters. The input image resolution is 640x480, so the maximum disparity is 639, and the minimum distance the system can estimate is therefore approximately 0.08 meters.
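These bounds follow directly from (1), as a quick check shows:

    baseline_m, focal_px = 0.12, 420.0   # calibrated values from above
    bf = baseline_m * focal_px           # numerator of (1): 50.4 m·pixel
    print(bf / 1)    # maximum distance, at the minimum disparity of 1 pixel: 50.4 m
    print(bf / 639)  # minimum distance, at the maximum disparity of 639 pixels: ~0.079 m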
B. Build a user interface for easy configuration

Fig. 12. User interface, including the YOLO configuration, camera source switch, and output configuration

VI. EXPERIMENTAL RESULTS

We use YOLOv4 trained on the COCO dataset as the object detection model. The deviations of all results are less than 0.08 meters (the minimum distance the system can estimate).
A. Only one object

Fig. 13. Only one object

Objects                 | Actual distance (m) | Measured distance (m) | Deviation (m)
Plastic cup             | 0.4                 | 0.4                   | 0

B. Two objects of the same object class

Fig. 14. Two objects of the same object class

Objects                 | Actual distance (m) | Measured distance (m) | Deviation (m)
Empty Coca bottle 1.5 L | 1.6                 | 1.57                  | -0.03
Small Coca bottle       | 0.81                | 0.76                  | -0.05

C. More than two objects of the same object class

Fig. 15. More than two objects of the same object class

Objects                 | Actual distance (m) | Measured distance (m) | Deviation (m)
Empty 7UP bottle 1.5 L  | 1.6                 | 1.53                  | -0.07
Empty Coca bottle 1.5 L | 1.21                | 1.21                  | 0
Small Coca bottle       | 0.82                | 0.83                  | +0.01

VII. CONCLUSION

We have built a system that estimates object distance using the improved method. Based on the experimental results with YOLOv4, a number of goals have been achieved. The results show that the model can estimate the distance to an object under lighting conditions of at least 150 lux and for objects within 2 meters. Detectable objects must first be collected and trained into the model before it can be used for prediction. The greatest success of this research comes not only from estimating object distances in a short time, but also from applying a new distance estimation approach built on top of an object detection model. Detecting objects and estimating their distance within 2 meters at real-time rates is also a result that can serve further development and research, especially now that object detection models are growing rapidly. From the achieved results, it can be seen that the research has met its goals and shows promising initial results.
