From Depth Map To Point Cloud. How To Convert An RGBD Image To Points
This tutorial introduces the intrinsic matrix and walks you through how you
can use it to convert an RGBD (red, green, blue, depth) image to 3D space.
RGBD images can be obtained in many ways, e.g. from a system like the Kinect
that uses infrared-based time-of-flight detection, and the iPhone 12 is
rumored to integrate a LiDAR into its camera system. Most importantly for
self-driving cars, LiDAR data from a mobile unit on a car can be combined
with a standard RGB camera to obtain RGBD data. We do not go into
details about how to acquire the data in this article.
Fig. 1: (left) Image plane in u, v coordinates. Each pixel has a colour and a depth assigned. (right) 3D view in
Cartesian coordinates x, y, z.
Fig. 2: Projection (top view) showing the x-z plane. On the left, a pinhole camera with an object (the same blue
ball from above) in front of the camera and represented on a screen. The world coordinate system is aligned
with the camera so that the z-axis extends in the direction the camera is looking. On the right, the two partly
overlapping triangles from the left are separated for more clarity.
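The relation that follows from these two similar triangles (Eq. 1 in the article; the rendered equation is not reproduced in this extract) is the standard pinhole projection. Assuming for now that the sensor centre sits at the origin, a point at distance z that appears at screen coordinate u satisfies

\frac{u}{f_x} = \frac{x}{z} \quad\Longrightarrow\quad x = \frac{u\,z}{f_x},

and analogously y = v\,z / f_y for the vertical direction.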
Usually fₓ and fᵧ are identical. They can differ, though, e.g. for non-square
pixels of the image sensor, lens distortions, or post-processing of
the image.
To sum up, we can write a very short piece of Python code using only
geometric arguments to convert from the screen coordinate system to the
Cartesian coordinate system.
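The original snippet is embedded in the article; a minimal sketch along the same lines, assuming a depth map in metres and camera constants fx, fy, cx, cy (the function name and the way pxToMetre is applied as a uniform scale are assumptions for illustration):

import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, px_to_metre=1.0):
    """Convert a depth map (H x W, z along the viewing direction) into an (N, 3) point cloud."""
    height, width = depth.shape
    # Screen coordinates (u, v) for every pixel
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    z = depth
    # Similar triangles: shift by the sensor centre, scale by depth over focal length
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # px_to_metre is treated as a global scale; without it the cloud is correct up to a scale factor
    return np.stack([x, y, z], axis=-1).reshape(-1, 3) * px_to_metre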
In the code, (cₓ, cᵧ) is the centre of the camera sensor. Note the constant
pxToMetre, a camera property, which you can determine if the focal length
is known both in metres and in pixels. Even without it, the picture is
accurately represented in 3D up to a scale factor.
Of course there is a more general way to do all this. Enter the intrinsic matrix:
a single matrix that incorporates the previously discussed camera
properties (focal length and centre of the camera sensor, as well as the skew).
Read this excellent article on it for more information. Here, we want to
discuss how to use it to do the above conversion for us. In the following we
will use capital boldface for matrices, lower-case boldface for vectors, and
normal script for scalars.
Think of it this way: in Fig. 2 we could move the image plane to any other
distance, e.g. from fₓ → 2fₓ, and keep note of the factor h = 2 by which we
shifted it. The shifting introduces a simple scaling, and we can always go back
to the original by dividing u and v by h.
Eq. 3
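In the standard notation, this scaled projection collects the focal lengths, the skew S, and the sensor centre into the intrinsic matrix K; a sketch of Eq. 3 consistent with the definitions above (the rendered equation is not reproduced in this extract):

h \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = \begin{pmatrix} f_x & S & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
    \begin{pmatrix} x \\ y \\ z \end{pmatrix}
  = \mathbf{K} \begin{pmatrix} x \\ y \\ z \end{pmatrix}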
The rotation matrix R, translation vector t, and the intrinsic matrix K make
up the camera projection matrix. It is defined to convert from world
coordinates to screen coordinates:
Eq. 4: Conversion of world coordinates to image plane coordinates written with homogeneous coordinates.
Note that [R|t] refers to the block notation, meaning we concatenate R and
the column vector t = (t₀, t₁, t₂)ᵀ, or, in other words, add it to the right-
hand side of R.
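Written out in homogeneous coordinates, Eq. 4 takes the familiar form (a reconstruction consistent with the notation above, with h again absorbing the overall scale):

h \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = \mathbf{K}\,[\,\mathbf{R}\,|\,\mathbf{t}\,]
    \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}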
Let’s verify what we said above with the simplest case: the camera origin and
the world origin are aligned, i.e. R and t can be neglected, the skew S is 0,
and the image sensor is centered. Now the inverse of the camera matrix is
simply:
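Under these assumptions K is diagonal, so its inverse reduces to the reciprocals of the focal lengths, and the back-projection (Eq. 6) is just a rescaling of the pixel coordinates; a sketch of the two relations the text refers to:

\mathbf{K}^{-1} = \begin{pmatrix} 1/f_x & 0 & 0 \\ 0 & 1/f_y & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = h\,\mathbf{K}^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}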
Just looking at the first row leads to exactly the same conclusion as we found
in the beginning (Eq. 1). The same applies to y and z using row two and row three
of Eq. 6, respectively. For more complicated intrinsic matrices, you will need
to calculate the inverse before making this conversion. Since it is an upper
triangular matrix, there is an easy analytical solution:
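For the general upper-triangular K the analytical inverse is a standard result (easy to verify by multiplying it with K):

\mathbf{K}^{-1} = \begin{pmatrix}
  1/f_x & -S/(f_x f_y) & (S\,c_y - c_x f_y)/(f_x f_y) \\
  0 & 1/f_y & -c_y/f_y \\
  0 & 0 & 1
\end{pmatrix}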
Now you have all the tools at hand to convert a depth map or RGBD image
into a 3D scene where each pixel represents one point (Fig. 3). There are
some assumptions that we made along the way. One of them is the simplified
camera model: a pinhole camera. Cameras you use in the real world,
however, use lenses and can often only be approximated by the pinhole
model. In the next article of this series we will explore the differences and
the impact of the lens on this conversion.
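To tie the pieces together, here is a minimal sketch (not the article's own code; the function name and the NumPy-based layout are assumptions) that back-projects every pixel at once with the inverse intrinsic matrix:

import numpy as np

def rgbd_to_point_cloud(rgb, depth, K):
    """Back-project an RGBD image into an (N, 6) array of x, y, z, r, g, b."""
    height, width = depth.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    # Homogeneous pixel coordinates (u, v, 1), one column per pixel
    pixels = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # Eq. 6: (x, y, z)ᵀ = h · K⁻¹ · (u, v, 1)ᵀ, with h equal to the measured depth
    rays = np.linalg.inv(K) @ pixels
    points = rays * depth.reshape(1, -1)
    colours = rgb.reshape(-1, 3).T
    return np.vstack([points, colours]).T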
This article was brought to you by yodayoda Inc., your expert in automotive and
robot mapping systems.
If you want to join our virtual bar time on Wednesdays at 9pm PST/PDT, please
send an email to talk_at_yodayoda.co and don’t forget to subscribe.