Foresight Redefining Stereo Ebook 2022
Rethinking Stereo
Introduction
The vehicles of the future are chock full of sensors, cameras, and technology that enable
the car to “see” what is on the road ahead and the surrounding area in order to alert the
driver or automatically brake or swerve to avoid an obstacle. Even the smallest inaccuracy
can result in disaster, so it is crucial to ensure that the obstacle detection systems can
accurately pinpoint any potential obstacle - from an object on the road to another vehicle
to an animal or human being - regardless of the lighting or weather conditions. Many
existing solutions rely on a combination of radar, LiDAR, cameras, and sensors, but a
magic bullet has yet to be found. Foresight’s patented use of stereoscopic technology
combines the best of all worlds, allowing for the use of either visible-light or thermal long-
wave infrared cameras to provide accurate detection of both classified and unclassified
objects. Traditional stereo vision solutions are limited by the placement of the cameras
and the need for them to be in perfect parallel alignment. With Foresight’s technology, this
requirement is eliminated, opening up a huge array of new possibilities.
This cutting-edge technology offers a cost-effective solution that provides highly accurate
object detection even at long range, regardless of lighting and weather conditions.
Existing solutions add a separate LiDAR or radar component to estimate the distance when objects
are detected by a camera. This not only adds cost but, if disparate components are used to capture
an image and measure distance, the results need to be “fused” together. This is a complex technical
process that brings with it a host of challenges and complications.
A standard two-dimensional static image simply does not provide enough information for an
autonomous driving system to be able to successfully and accurately avoid hitting an obstacle every
time. For example, an accident in an autonomous vehicle was attributed to the vehicle being unable
to distinguish between a white car on the road and a cloud. The autonomous vehicle hit the other car,
causing a fatal accident that could have been avoided if the obstacle detection system had been able
to provide a higher level of accuracy.
Because the systems in autonomous vehicles will need to make more decisions, faster, and without
human intervention, they must transition from relying on static images to using a 3D point cloud.
A 3D point cloud is a set of points representing the x, y, and z coordinates of the surfaces in a
scene. Because it captures the depth of each object, it makes it possible to detect objects more
accurately as well as to conduct terrain analysis and sensor fusion.
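To make the point-cloud idea concrete, here is a minimal Python sketch (not Foresight’s implementation) that back-projects a dense depth map into 3D points using the standard pinhole camera model; the focal lengths and principal point are assumed example values.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a dense depth map (meters) into an Nx3 point cloud.

    depth: HxW array of z-distances along the optical axis.
    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx          # pinhole model: u = fx * x / z + cx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no valid depth

# Example: a synthetic 4x4 depth map with assumed intrinsics.
depth = np.full((4, 4), 10.0)                    # everything 10 m away
cloud = depth_to_point_cloud(depth, fx=800, fy=800, cx=2.0, cy=2.0)
print(cloud.shape)  # (16, 3) -> one (x, y, z) point per pixel
```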
Creating a 3D point cloud requires cameras. While some solutions use single-view cameras, the
level of accuracy they enable is not high enough. Foresight’s technology uses stereo vision (two
cameras) to overcome many of the accuracy challenges and to detect even smaller obstacles at
larger distances.
While structure from motion (SFM), the technique single-view systems use to recover depth
from a moving camera, works well for creating static maps, it is not a reliable source for
estimating the distance of moving objects such as cars and pedestrians. This is because a
moving object will be in a different position in each of the two frames, which were captured
at different times, making it difficult to estimate its actual position or depth. The result is an
inaccurate distance measurement that can put the safety of an autonomous vehicle’s occupants
at risk.
To get the most accurate object detection and depth map, stereo vision is required.
The setup of the general structure-from-motion problem: a single camera takes snapshots of the same
object from different angles, and the world positions of the points are then reconstructed. The major
difficulty is avoiding points from moving objects. Source: the course notes for Stanford’s CS231A course
on computer vision, https://github.com/kenjihata/cs231a-notes
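For readers who want to see the mechanics, here is a sketch of the two-view SFM pipeline the figure describes, using OpenCV’s standard routines; the matched point arrays and the intrinsic matrix are assumed inputs.

```python
import cv2
import numpy as np

def two_view_sfm(pts1, pts2, K):
    """Reconstruct 3D points from two snapshots taken by one moving camera.

    pts1, pts2: Nx2 float32 arrays of matched keypoints (static scene
    points only); K: the 3x3 camera intrinsic matrix. All assumed inputs.
    """
    # RANSAC rejects mismatches - and, with luck, points on moving objects.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # Recover the relative camera rotation R and translation t from E.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    # Projection matrices for the two camera poses, then triangulate.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T  # Nx3 world points, up to scale
```

Note that the translation t is recovered only up to scale, so a lone camera cannot report metric distances without extra information, and any point that moved between the two exposures violates the reconstruction’s core assumption.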
When using stereo vision, two cameras are set up facing the same direction at an accurately measured
distance from each other known as the baseline. The cameras are located in a way that maximizes
their overlapping field of view in order to minimize the computational power needed to match up the
points located in each image and accurately estimate the depth of each object. In traditional stereo
vision solutions, the cameras must be positioned on the same horizontal axis, parallel to each other.
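In this classical rectified setup, matching reduces to a one-dimensional search along each image row, and depth follows from the disparity via Z = f x B / d. The sketch below illustrates this with OpenCV’s semi-global block matcher; the file names, focal length, and baseline are placeholder assumptions.

```python
import cv2
import numpy as np

# Rectified 8-bit grayscale images from two parallel cameras
# ("left.png"/"right.png" are placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Rectification confines each match to one image row, so semi-global
# block matching only searches horizontally for corresponding pixels.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                blockSize=5)
# SGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity for a rectified pair: Z = f * B / d.
f = 800.0   # assumed focal length in pixels
B = 0.30    # assumed baseline in meters
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]
```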
Choosing the baseline - the distance between the cameras - is an important decision and involves
a tradeoff. A larger baseline, meaning a longer distance between the two cameras, will improve the
accuracy of the distance estimation in the long range. This larger baseline, however, will also impact
the ability to accurately estimate the depth of closer range objects because of occlusions and different
view perspectives. To illustrate this tradeoff with extreme examples: if one were to send a satellite
to observe Earth from outer space, it would make sense to have as large a baseline as possible. If, on
the other hand, the objective is to construct a robot that needs to see 5 meters (16 feet) ahead, the
baseline only has to be a few centimeters. The purpose of the images must be taken into consideration
when determining the baseline.
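A first-order error model makes the tradeoff tangible: depth error grows with the square of the distance and shrinks linearly with the baseline. The numbers below are illustrative assumptions, not measured values.

```python
# First-order stereo depth error: dZ ~ Z**2 * dd / (f * B), where dd is
# the disparity noise. f = 800 px and dd = 0.25 px are assumed values.
f, dd = 800.0, 0.25

for B in (0.05, 0.50):        # 5 cm vs. 50 cm baseline
    for Z in (5.0, 50.0):     # a near target and a far target, in meters
        dZ = Z ** 2 * dd / (f * B)
        print(f"B={B:4.2f} m  Z={Z:4.0f} m  depth error ~ {dZ:7.3f} m")

# The 50 cm baseline cuts the error at 50 m tenfold, but the occlusion
# and field-of-view penalties at close range are not captured here.
```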
This risk of inaccuracy poses a challenge for using stereo vision to ensure the safety of
drivers (and people and animals in the vicinity) of autonomous vehicles. Foresight has
developed a revolutionary approach to stereo vision, solving the positioning challenge and
creating a highly accurate 3D image of obstacles, visibility, and terrain.
Stereo vision setup: two pinhole cameras facing approximately forward, situated on a mutual baseline.
O and O’ represent the focal points of the cameras; e and e’ denote the epipoles, the projections of the
other camera’s focal point. Rectification (a homographic transformation) is performed to bring the epipolar
lines (marked in red dashes) to be parallel (marked by red lines). Source: the course notes for Stanford’s
CS231A course on computer vision, https://github.com/kenjihata/cs231a-notes
If, however, the cameras were to be moved and were no longer in parallel alignment, then a process
called optical flow comes into play. In this process, the entire image has to be searched in order
to match up the pixels and identify the same objects in each image. This presents a challenge and
requires complex and costly computational processes.
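As an illustration of why this is expensive, the sketch below computes a classical dense optical flow field with OpenCV’s Farneback method, which must estimate a full 2D displacement for every pixel instead of a 1D disparity; the image files are placeholders.

```python
import cv2

# Two views of the same scene from cameras that are not rectified, so a
# match may shift in any direction (file names are placeholders).
img_a = cv2.imread("cam_a.png", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("cam_b.png", cv2.IMREAD_GRAYSCALE)

# Farneback optical flow estimates a dense 2D displacement per pixel;
# the search is no longer confined to a single horizontal line.
flow = cv2.calcOpticalFlowFarneback(img_a, img_b, None,
                                    pyr_scale=0.5, levels=4, winsize=21,
                                    iterations=3, poly_n=7, poly_sigma=1.5,
                                    flags=0)
print(flow.shape)  # (H, W, 2): a (dx, dy) vector for every pixel
```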
In the past, attempts were made to overcome the challenge. One way was to calculate optical flow
based on variations of the brightness constancy assumption. In this method, the system would look for small
patches in the source image that looked the same as in the target image, only shifted. The problem,
however, was the inability to handle things like scaling, rotation, morphing, and illumination changes,
resulting in inaccurate renderings. To compensate, another method was developed involving keypoint
algorithms. Using this process, keypoints were chosen from the two images, and then feature vectors
were extracted using one of a variety of feature-detecting methods. Then, the feature vectors from the
points of one image were matched to the set of points from the other image. Any points with a single
close neighbor were paired and then triangulation was performed on the matched pairs in order to
calculate the distance, producing a sparse depth map. Unfortunately, because the resulting depth map
is sparse, it is still not accurate enough to fulfill the safety requirements of autonomous vehicles. And
this is among the reasons why OEMs and Tier 1s have been hesitant to adopt stereo vision technology.
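The sketch below walks through that classical keypoint pipeline with OpenCV (ORB features, a ratio test, and triangulation); the images and the calibration used to build the projection matrices are assumed for illustration, and the output makes the sparsity problem visible.

```python
import cv2
import numpy as np

# Placeholder images and an assumed calibration: K is an example intrinsic
# matrix and the right camera sits 0.30 m to the right of the left one.
img_l = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img_r = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
P_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_r = K @ np.hstack([np.eye(3), np.array([[-0.30], [0.0], [0.0]])])

# 1. Detect keypoints and extract feature vectors (ORB descriptors).
orb = cv2.ORB_create(nfeatures=2000)
kp_l, des_l = orb.detectAndCompute(img_l, None)
kp_r, des_r = orb.detectAndCompute(img_r, None)

# 2. Keep only points whose best match is clearly closer than the
#    second-best one - the "single close neighbor" rule (ratio test).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = []
for pair in matcher.knnMatch(des_l, des_r, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# 3. Triangulate the matched pairs to get 3D points.
pts_l = np.float32([kp_l[m.queryIdx].pt for m in good]).T  # 2xN
pts_r = np.float32([kp_r[m.trainIdx].pt for m in good]).T
pts4d = cv2.triangulatePoints(P_l, P_r, pts_l, pts_r)
points3d = (pts4d[:3] / pts4d[3]).T

# At most a few thousand depth samples for millions of pixels:
# the resulting depth map is sparse.
print(f"{len(points3d)} 3D points from {img_l.size} pixels")
```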
The good news is that a new age is dawning with Foresight at the forefront.
Leveraging recent advances based on neural networks, Foresight has been able to revolutionize the
way the optical flow is calculated, making it possible to produce a dense depth map even when the two
cameras are not on a parallel axis and regardless of whether the cameras use visible light or thermal
long-wave infrared.
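To give a flavor of the learned approach, the sketch below runs RAFT, a published neural optical-flow network available in torchvision, on a pair of frames; it is shown purely as a generic example of dense, learned optical flow and is not Foresight’s model.

```python
import torch
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Load RAFT with its published pretrained weights (an illustrative,
# publicly available network - not Foresight's model).
weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()
preprocess = weights.transforms()

# Stand-in frames: batched 3xHxW images with H and W divisible by 8.
frame_a = torch.randint(0, 256, (1, 3, 360, 640), dtype=torch.uint8)
frame_b = torch.randint(0, 256, (1, 3, 360, 640), dtype=torch.uint8)
frame_a, frame_b = preprocess(frame_a, frame_b)

with torch.no_grad():
    # RAFT refines its estimate iteratively; the last entry of the
    # returned list is the final dense flow field.
    flow = model(frame_a, frame_b)[-1]

print(flow.shape)  # torch.Size([1, 2, 360, 640]): dense (dx, dy) per pixel
```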
Foresight harnesses these state-of-the-art optical flow techniques to create a dense pixel-wise
optical flow map even under challenging circumstances. A vehicle’s existing cameras can be used,
whether they are visible-light or thermal long-wave and regardless of where on the automobile they
are positioned. Foresight’s patented methodology means the ability to capture depth perception and
obtain a clear 3D view at any distance, no matter how the cameras are positioned.
The solution offers an array of placement options that can be fully customized and dynamically
adapted to suit the user’s needs and ensure that they will always get the most accurate depth map
regardless of the conditions. The system automatically calibrates itself, so if one camera is moved,
the system is automatically recalibrated without the need for manual intervention.
Foresight has leveraged the inherent benefits of stereo vision and has applied patented
technological solutions to upgrade this method, creating a highly accurate depth map that can be
used to detect any object - known or unknown - and indicate its size, location, and distance.
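As background on what automatic recalibration involves in general, here is a minimal generic sketch: re-estimating the two cameras’ epipolar geometry from fresh point matches. This is a textbook building block, not Foresight’s proprietary procedure, and the inputs are assumed.

```python
import cv2

# pts_a, pts_b: Nx2 arrays of fresh correspondences gathered while driving
# (assumed inputs). Re-estimating the epipolar geometry from them is the
# standard building block of self-calibration; Foresight's own procedure
# is proprietary and not shown here.
def reestimate_epipolar_geometry(pts_a, pts_b):
    # RANSAC tolerates outliers, so the calibration can be refreshed from
    # ordinary road scenes without manual targets.
    F, inlier_mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC,
                                            ransacReprojThreshold=1.0,
                                            confidence=0.999)
    return F, inlier_mask
```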
And this is good news not only for the passenger car
industry. Demand for autonomous driving extends to
commercial transportation, agriculture, drones, and more.
Manufacturers across verticals can incorporate Foresight’s
stereo vision capabilities - with full design flexibility to
determine where to place the cameras - ensuring that their
vehicles will perform the way consumers expect without
compromising on safety.