
From depth map to point cloud


yodayoda · Follow
Published in Map for Robots · 5 min read · Sep 23, 2020


How to convert an RGBD image to points in 3D space

This tutorial introduces the intrinsic matrix and walks you through how you
can use it to convert an RGBD (red, green, blue, depth) image to 3D space.
RGBD images can be obtained in many ways, e.g. from a system like the Kinect,
which uses infrared-based time-of-flight detection. The iPhone 12 is also
rumored to integrate a LiDAR into its camera system. Most importantly for
self-driving cars, LiDAR data from a unit mounted on the car can be combined
with a standard RGB camera to obtain RGBD data. We do not go into the details
of how to acquire the data in this article.
Fig. 1: (left) Image plane in u, v coordinates. Each pixel has a colour and a depth assigned. (right) 3D view in
Cartesian coordinates x, y, z.

It is important to know your camera’s properties if you want to understand
what each pixel corresponds to in a 3D environment. The most important
parameter is the focal length. It tells us how to translate a pixel coordinate
into lengths. You probably have seen focal lengths like “28 mm”. This is the
actual distance between the lens and the film/sensor.

From a simple geometric argument (“similar triangles”) we can easily derive
the position x from u and d of each pixel. The picture below only looks at
x and u, but we can do exactly the same for y and v. For a pinhole camera
model the focal length is the same in the x and y directions. This is not
always the case for a camera with a lens, and we will discuss this in a
future article.

Fig. 2: Projection (top view) showing the x-z plane. On the left, a pinhole camera with an object (the same blue
ball from above) in front of the camera and represented on a screen. The world coordinate system is aligned
with the camera so that the z-axis extends in the direction the camera is looking. On the right, the two partly
overlapping triangles from the left are separated for clarity.

From the similar-triangles approach we immediately obtain:


Eq. 1
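Based on Fig. 2 (sensor centred on the optical axis, world frame aligned with the camera), this relation presumably takes the form

\[
\frac{x}{z} = \frac{u}{f_x}, \qquad \frac{y}{z} = \frac{v}{f_y},
\]

where u and v are measured from the centre of the image plane and z is the depth along the optical axis.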

Usually fₓ and fᵧ are identical. They can differ, though, e.g. due to
non-square pixels on the image sensor, lens distortion, or post-processing of
the image.

To sum up, we can write a very short piece of Python code using only
geometric arguments to convert from the screen coordinate system to the
Cartesian coordinate system.

# assumes numpy has been imported as np
def convert_from_uvd(self, u, v, d):
    # Scale the raw depth value into metres (pxToMetre is a camera property)
    d *= self.pxToMetre
    # Direction of the viewing ray through pixel (u, v), relative to the optical axis
    x_over_z = (self.cx - u) / self.focalx
    y_over_z = (self.cy - v) / self.focaly
    # Here d is the Euclidean distance from the camera centre to the point,
    # so first recover the depth z along the optical axis ...
    z = d / np.sqrt(1. + x_over_z**2 + y_over_z**2)
    # ... and then the remaining Cartesian coordinates
    x = x_over_z * z
    y = y_over_z * z
    return x, y, z


In the code, (cₓ, cᵧ) is the centre of the camera sensor. Note the constant
pxToMetre, a camera property, which you can determine if the focal length
is known both in units of metres and in pixels. Even without it, the picture is
accurately represented in 3D up to a scale factor.
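A minimal usage sketch, assuming a small camera class of our own making (the class name and all numeric values below are made-up examples, e.g. a 640x480 sensor with depth stored in millimetres):

import numpy as np

class PinholeCamera:
    """Toy container for the camera properties used by convert_from_uvd."""
    def __init__(self, focalx, focaly, cx, cy, pxToMetre):
        self.focalx, self.focaly = focalx, focaly   # focal lengths in pixels
        self.cx, self.cy = cx, cy                   # sensor centre in pixels
        self.pxToMetre = pxToMetre                  # raw depth units to metres

    def convert_from_uvd(self, u, v, d):
        d *= self.pxToMetre
        x_over_z = (self.cx - u) / self.focalx
        y_over_z = (self.cy - v) / self.focaly
        z = d / np.sqrt(1. + x_over_z**2 + y_over_z**2)
        return x_over_z * z, y_over_z * z, z

cam = PinholeCamera(focalx=525.0, focaly=525.0, cx=320.0, cy=240.0, pxToMetre=0.001)
depth = np.full((480, 640), 1500)            # fake depth map: 1.5 m everywhere
cloud = [cam.convert_from_uvd(u, v, depth[v, u])
         for v in range(depth.shape[0]) for u in range(depth.shape[1])]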

Of course, there is a more general way to do all this. Enter the intrinsic matrix! A
single matrix incorporates the previously discussed camera properties
(focal length and centre of the camera sensor, as well as the skew).
Read this excellent article on it for more information. Here, we want to
discuss how to use it to do the above conversion for us. In the following we
will use capital boldface for matrices, lower-case boldface for vectors, and
normal script for scalars.

Eq. 2: Intrinsic matrix
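For reference, in its standard form, with skew s and sensor centre (cₓ, cᵧ), the intrinsic matrix reads

\[
\mathbf{K} = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}.
\]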


Next, we introduce homogeneous coordinates. Homogeneous coordinates
will help us to write transformations (translations, rotations, and skews) as
matrices with the same dimensionality.

Think of it this way: in Fig. 2 we could move the image plane to any other
distance, e.g. from fₓ to 2fₓ, and keep note of the factor h = 2 by which we
shifted it. The shifting introduces a simple scaling, and we can always go
back to the original by dividing u and v by h.

Eq. 3
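Eq. 3 presumably states this scaling relation: a pixel (u, v) is represented by the homogeneous vector

\[
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim \begin{pmatrix} h\,u \\ h\,v \\ h \end{pmatrix}
\]

for any non-zero h, and dividing by the last component recovers the original coordinates.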

Now we can do any operation on the homogeneous coordinates while leaving the
last dimension unchanged; all operations are defined such that the last
component stays the same. Good examples can be found in Chapter 2.5.1
of this book.

The rotation matrix R, translation vector t, and the intrinsic matrix K make
up the camera projection matrix. It is defined to convert from world
coordinates to screen coordinates:

Eq. 4: Conversion of world coordinates to image plane coordinates written with homogeneous coordinates.
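With the homogeneous coordinates from above, this presumably takes the standard form (h being the projective scale factor)

\[
h \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \mathbf{K}\,[\mathbf{R}\,|\,\mathbf{t}] \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\]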

Note that [R|t] is block notation: we concatenate R and the column vector
t = (t₀, t₁, t₂)ᵀ, in other words, append it as an extra column on the
right-hand side of R.

If we want to do the conversion the other way around, we have a problem:
we cannot invert the 3x4 matrix. In the literature you will find an extension
to a square matrix which allows us to invert it. To do this we have to add 1/z
(the disparity) to the left-hand side to fulfill the equation. The 4x4 matrices
are called full-rank intrinsic/extrinsic matrices.
Eq. 5: Same as Eq. 4 but written with full-rank matrices
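A plausible way to write this full-rank version, consistent with the simplest case discussed below, is

\[
z \begin{pmatrix} u \\ v \\ 1 \\ 1/z \end{pmatrix}
= \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\]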

Let’s verify what we said above with the simplest case: the camera origin and
the world origin are aligned, i.e. R and t can be neglected, the skew s is 0,
and the image sensor is centered (cₓ = cᵧ = 0). Now the inverse of the camera
matrix is simply:

Eq. 6: The simplest case for the projection matrix
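Under these assumptions a plausible explicit form, consistent with the full-rank notation of Eq. 5, is

\[
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= \begin{pmatrix} 1/f_x & 0 & 0 & 0 \\ 0 & 1/f_y & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
z \begin{pmatrix} u \\ v \\ 1 \\ 1/z \end{pmatrix}.
\]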

Just looking at the first row leads to exactly the same conclusion as we found
in the beginning (Eq. 1). The same applies to y and z using rows two and three
of Eq. 6, respectively. For more complicated intrinsic matrices, you will need
to calculate the inverse before making this conversion. Since K is an upper
triangular matrix, there is an easy analytical solution:

Eq. 7: Inverse of K with all components.
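For the standard 3x3 form of K given above (the full-rank 4x4 version only adds a trivial extra row and column), the analytical inverse works out to

\[
\mathbf{K}^{-1} = \begin{pmatrix}
\tfrac{1}{f_x} & -\tfrac{s}{f_x f_y} & \tfrac{s\,c_y - c_x f_y}{f_x f_y} \\
0 & \tfrac{1}{f_y} & -\tfrac{c_y}{f_y} \\
0 & 0 & 1
\end{pmatrix}.
\]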

Now you have all the tools at hand to convert a depth map or RGBD image
into a 3D scene where each pixel represents one point (Fig. 3). There are
some assumptions that we made along the way. One of them is the simplified
camera model: a pinhole camera. Cameras you use in the real world,
however, use lenses and oftentimes can only be approximated by the pinhole
model. In our next article of this series we will explore the differences and
the impact of the lens on this conversion.

Fig. 3: Point cloud (green) calculated from depth map (grayscale)
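As a closing sketch, the whole conversion can also be written in vectorized form using K⁻¹ directly. Two assumptions are baked in here: the depth map stores z (the depth along the optical axis) in metres, and the sign convention follows the standard intrinsic matrix (u = fₓ·x/z + cₓ), which mirrors x and y compared to the (cₓ - u) convention of the earlier snippet; the intrinsic values are again made up.

import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project a depth map (H x W, metres, z along the optical axis)
    into an N x 3 array of camera-frame points using the inverse of K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates scaled by depth: z * (u, v, 1)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    pixels = pixels * depth.reshape(-1, 1)
    # Apply K^-1 to every pixel at once
    points = pixels @ np.linalg.inv(K).T
    return points[depth.reshape(-1) > 0]     # drop pixels without a depth reading

# Example intrinsics for a 640 x 480 sensor (made-up values)
K = np.array([[525.0,   0.0, 320.0],
              [  0.0, 525.0, 240.0],
              [  0.0,   0.0,   1.0]])
depth = np.ones((480, 640))                  # fake depth map: 1 m everywhere
cloud = depth_to_point_cloud(depth, K)       # (307200, 3) array of x, y, z points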

This article was brought to you by yodayoda Inc., your expert in automotive and
robot mapping systems.
If you want to join our virtual bar time on Wednesdays at 9pm PST/PDT, please
send an email to talk_at_yodayoda.co and don’t forget to subscribe.