
Mouse Ray Picking Explained

Brian Hook
http://www.bookofhook.com

April 5, 2005

1 Introduction
There comes a time in every 3D game where the user needs to click on
something in the scene. Maybe he needs to select a unit in an RTS, or open
a door in an RPG, or delete some geometry in a level editing tool. This
conceptually simple task is easy to screw up since there are so many little
steps that can go wrong.
The problem, simply stated, is this: given the mouse’s position in window
coordinates, how can I determine what object in the scene the user has selected
with a mouse click?
One method is to generate a ray using the mouse’s location and then in-
tersect it with the world geometry, finding the object nearest to the viewer.
Alternatively we can determine the actual 3-D location that the user has
clicked on by sampling the depth buffer (giving us (x, y, z) in viewport
space) and performing an inverse transformation. Technically there is a
third approach, using a selection or object ID buffer, but this has numerous
limitations that make it impractical for widespread use.
This article describes using the inverse transformation to derive world
space coordinates from the mouse’s position on screen.
Before we worry about the inverse transformation, we need to estab-
lish how the standard forward transformation works in a typical graphics
pipeline.

2 The View Transformation
The standard view transformation pipeline takes a point in model space
and transforms it all the way to viewport space [1]. It does this by trans-
forming the original point through a series of coordinate systems:

    Model → World → View → Clip → Normalized Device → Viewport
Not every step is discrete: OpenGL combines the model-to-world and
world-to-view transformations into the single GL_MODELVIEW matrix (M),
which transforms a point p from model space directly to a point v in view
space. Both model and view space use a right-handed coordinate system
with +Y up, +X to the right, and -Z into the screen.

    Mp = v    (1)
Another matrix, GL_PROJECTION (P), then transforms the view space
point to the homogeneous clip space point c. Clip space is a left-handed
coordinate system (+Z into the screen) contained within a canonical clip-
ping volume extending from (−1, −1, −1) to (+1, +1, +1).

    Pv = c    (2)

[1] Viewport coordinates are also sometimes known as window coordinates or,
for systems without a window system, screen coordinates.

After clipping is performed, the perspective divide transforms the ho-
mogeneous coordinate c back into the Cartesian point n in normalized de-
vice space. Normalized device coordinates are left-handed, where n_w = 1,
and are contained within the canonical view volume from (−1, −1, −1) to
(+1, +1, +1).

    n = c / c_w    (3)
Finally there is the viewport scale and translation, V, which transforms
n into the final viewport window coordinate w. Another axis inversion
occurs here; this time +Y points down instead of up [2]. Viewport depth
values are calculated by rescaling NDC Z coordinates from the range (−1, 1)
to (0, 1), with 0 at the near clip plane and 1 at the far clip plane. It's impor-
tant to note that any user-specified depth bias may impact our calculations
later.

    Vn = w    (4)

[2] Some less common window systems may place the origin at another location,
such as the bottom left of the window, so this isn't always true.

Using this pipeline we can take a model space point, apply a series of
transformations, and get a viewport coordinate.
Our goal is to transform the mouse position in viewport coordinates
all the way back to world space. Since we’re not rendering a model, model
space and world space are the same.
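
To make this concrete, here is a minimal C sketch of the forward pipeline
(Equations 1 through 4 plus the perspective divide). The vec4/mat4 types
and the helper names are made up for illustration; matrices are assumed to
be column-major, the way OpenGL stores them.

    typedef struct { float x, y, z, w; } vec4;
    typedef struct { float m[16]; } mat4;  /* column-major: m[col*4 + row] */

    static vec4 mat4_mul_vec4(const mat4 *a, vec4 v)
    {
        vec4 r;
        r.x = a->m[0]*v.x + a->m[4]*v.y + a->m[8]*v.z  + a->m[12]*v.w;
        r.y = a->m[1]*v.x + a->m[5]*v.y + a->m[9]*v.z  + a->m[13]*v.w;
        r.z = a->m[2]*v.x + a->m[6]*v.y + a->m[10]*v.z + a->m[14]*v.w;
        r.w = a->m[3]*v.x + a->m[7]*v.y + a->m[11]*v.z + a->m[15]*v.w;
        return r;
    }

    /* model space point -> viewport coordinates */
    static vec4 project_point(vec4 p, const mat4 *modelview,
                              const mat4 *projection, float vpw, float vph)
    {
        vec4 v = mat4_mul_vec4(modelview, p);   /* model -> view    (1) */
        vec4 c = mat4_mul_vec4(projection, v);  /* view -> clip     (2) */
        vec4 n = { c.x/c.w, c.y/c.w, c.z/c.w, 1.0f }; /* divide     (3) */
        vec4 w;
        w.x = (n.x + 1.0f) * 0.5f * vpw;  /* viewport transform     (4) */
        w.y = (1.0f - n.y) * 0.5f * vph;  /* +Y flips to point down     */
        w.z = (n.z + 1.0f) * 0.5f;        /* depth: (-1,1) -> (0,1)     */
        w.w = 1.0f;
        return w;
    }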

3 The Inverse View Transformation


To go from mouse coordinates to world coordinates we have to do the
exact opposite of the view transformation:

    Viewport → NDC → Clip → View → World/Model

That's a lot of steps, it's easy to screw up, and screwing up even a
little is enough to blow everything apart.

3.1 Viewport → NDC → Clip
The first step is to transform the viewport coordinates into clip coordi-
nates. The viewport transformation V takes a normalized device coor-
dinate n and transforms it into a viewport coordinate. Given viewport
width w and height h, our viewport coordinate v is:

$$Vn = v = \begin{pmatrix} \frac{n_x + 1}{2}\,w \\ \frac{1 - n_y}{2}\,h \\ \frac{n_z + 1}{2} \end{pmatrix} \qquad (5)$$

So we need to do the inverse of this process by rearranging Equation 5
to solve for n:

$$n = \begin{pmatrix} \frac{2 v_x}{w} - 1 \\ 1 - \frac{2 v_y}{h} \\ 2 v_z - 1 \\ 1 \end{pmatrix} \qquad (6)$$
Okay, not so bad. The only real question is "what is v_z?" We can either
calculate v_z by reading it back from the framebuffer, or ignore it altogether
and substitute 0, in which case we'll be computing a ray passing through
v that we'll then have to intersect with world geometry to find the corre-
sponding point in 3-space.
From here we need to go to clip coordinates, which, if you recall, are
the homogeneous versions of the NDC coordinates (i.e. w ≠ 1.0). Since w
is already 1.0 and the transformation back to clip coordinates is a scale by
w, this step can be skipped and we can assume that our NDC coordinates
are the same as our clip coordinates.
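
As a sketch of Equation 6 (using the vec4 type from the earlier sketch,
and assuming mouse coordinates with the origin at the top left of the
window):

    static vec4 viewport_to_ndc(float mx, float my, float mz,
                                float vpw, float vph)
    {
        vec4 n;
        n.x = 2.0f * mx / vpw - 1.0f;
        n.y = 1.0f - 2.0f * my / vph;  /* undo the Y inversion */
        n.z = 2.0f * mz - 1.0f;  /* mz: depth under the mouse, or 0 for a ray */
        n.w = 1.0f;              /* n_w = 1, so this doubles as a clip coord */
        return n;
    }

If you do want the real depth, glReadPixels with GL_DEPTH_COMPONENT
will read it back, but note that it expects a Y coordinate measured from
the bottom of the window, so flip it first.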

3.2 Clip → View
A vector v in view space is transformed to clip coordinates c by multiply-
ing it against the GL_PROJECTION matrix P:

    Pv = c    (7)

Given this, we can do the opposite by multiplying the clip coordinate
by the inverse of the GL_PROJECTION matrix. This isn't as scary as it
sounds: we can avoid computing a true 4x4 matrix inverse if we just
construct the inverse projection matrix at the same time we build the pro-
jection matrix.
A typical OpenGL perspective projection matrix P takes the form:

$$P = \begin{pmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & d \\ 0 & 0 & e & 0 \end{pmatrix} \qquad (8)$$

The specific coefficient values depend on the nature of the perspective
projection matrix (for more information I recommend you look at the man
pages for gluPerspective). These coefficients should scale and bias
v_x, v_y, and v_z into clip space while assigning −v_z to c_w.
To transform from view coordinate v to clip coordinates c:

$$Pv = c = \begin{pmatrix} a v_x \\ b v_y \\ c v_z + d v_w \\ e v_z \end{pmatrix} \qquad (9)$$
So solving for v we get:

$$v = \begin{pmatrix} \frac{c_x}{a} \\ \frac{c_y}{b} \\ \frac{c_w}{e} \\ \frac{c_z}{d} - \frac{c\,c_w}{d e} \end{pmatrix} \qquad (10)$$

Encoding Equation 10 in matrix form gives us the inverse projection
matrix:

$$P^{-1} = \begin{pmatrix} \frac{1}{a} & 0 & 0 & 0 \\ 0 & \frac{1}{b} & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{e} \\ 0 & 0 & \frac{1}{d} & -\frac{c}{d e} \end{pmatrix} \qquad (11)$$
Computing the view coordinate from a clip coordinate is now simply:

    P⁻¹c = v    (12)

There's no guarantee that v_w will be 1, so we'll want to rescale v ap-
propriately:

    v′ = v / v_w    (13)
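
Here is a sketch of Equations 11 through 13, building P⁻¹ directly from
the coefficients of Equation 8 instead of performing a general 4x4 inver-
sion. The function names are made up; for a gluPerspective-style
projection, e is −1.

    static mat4 inverse_projection(float a, float b, float c, float d, float e)
    {
        mat4 inv = {{0.0f}};
        inv.m[0]  = 1.0f / a;      /* row 0: c_x / a             */
        inv.m[5]  = 1.0f / b;      /* row 1: c_y / b             */
        inv.m[14] = 1.0f / e;      /* row 2: c_w / e             */
        inv.m[11] = 1.0f / d;      /* row 3: c_z / d ...         */
        inv.m[15] = -c / (d * e);  /*        ... - c*c_w / (d*e) */
        return inv;
    }

    /* clip -> view, Equations 12 and 13 */
    static vec4 clip_to_view(const mat4 *invproj, vec4 c)
    {
        vec4 v = mat4_mul_vec4(invproj, c);
        v.x /= v.w;  v.y /= v.w;  v.z /= v.w;  /* rescale so v_w = 1 */
        v.w = 1.0f;
        return v;
    }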

3.3 View → Model


Finally we just need to go from view coordinates to world coordinates
by multiplying the view coordinates against the inverse of the modelview
matrix. Again we can avoid doing a true inverse if we just logically break
down what the modelview transform accomplishes when working with
the camera: it is a translation (centering the universe around the camera)
followed by a rotation (to reflect the camera's orientation). The inverse is
the reversed rotation (accomplished with a transpose) followed by a trans-
lation by the negation of the modelview matrix's translation component,
after that component has been rotated by the inverse rotation.
If given our initial modelview matrix M consisting of a 3x3 rotation
submatrix R and a 3-element translation vector t:

$$M = \begin{pmatrix} R_{11} & R_{12} & R_{13} & t_x \\ R_{21} & R_{22} & R_{23} & t_y \\ R_{31} & R_{32} & R_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (14)$$

then we can construct the inverse modelview M⁻¹ using the transpose
of the rotation submatrix Rᵀ and the camera's translation vector t:

    Rᵀt = t′    (15)

$$M^{-1} = \begin{pmatrix} R^T_{11} & R^T_{12} & R^T_{13} & -t'_x \\ R^T_{21} & R^T_{22} & R^T_{23} & -t'_y \\ R^T_{31} & R^T_{32} & R^T_{33} & -t'_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (16)$$
If you're specifying the modelview matrix directly, for example by us-
ing glLoadMatrix, then you already have it lying around and you can
build the inverse as described in Equations 15 and 16. If, on the other hand,
the modelview matrix is built dynamically using something like gluLookAt
or a sequence of glTranslate, glRotate, and glScale calls, you can
use glGetFloatv to retrieve the current modelview matrix.
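
In code, Equations 15 and 16 might look like the sketch below (again
using the column-major mat4 from earlier). Note this shortcut is only
valid when the modelview really is a rotation plus a translation; if there
is a glScale in there, you need a proper inverse.

    static mat4 inverse_modelview(const mat4 *m)
    {
        mat4 inv = {{0.0f}};
        int r, c;
        /* R^T: transpose the upper-left 3x3 rotation submatrix */
        for (c = 0; c < 3; ++c)
            for (r = 0; r < 3; ++r)
                inv.m[c*4 + r] = m->m[r*4 + c];
        /* translation column: -t' = -(R^T t), Equations 15 and 16 */
        inv.m[12] = -(inv.m[0]*m->m[12] + inv.m[4]*m->m[13] + inv.m[8]*m->m[14]);
        inv.m[13] = -(inv.m[1]*m->m[12] + inv.m[5]*m->m[13] + inv.m[9]*m->m[14]);
        inv.m[14] = -(inv.m[2]*m->m[12] + inv.m[6]*m->m[13] + inv.m[10]*m->m[14]);
        inv.m[15] = 1.0f;
        return inv;
    }

Retrieving the current matrix is just:

    mat4 mv;
    glGetFloatv(GL_MODELVIEW_MATRIX, mv.m);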

Now that we have the inverse modelview matrix we can use it to trans-
form our view coordinate v into world space, giving us w.

    M⁻¹v = w    (17)
If the depth value under the mouse was used to construct the original
viewport coordinate, then w should correspond to the point in 3-space
where the user clicked. If the depth value was not read, then we have
an arbitrary point in space with which we can construct a ray from the
viewer's position a:

    r = a + t(w − a)    (18)

However, there's a trick we can use to skip Equation 18 altogether. Set-
ting v_w to 0 in Equation 17, right before the inverse modelview transfor-
mation, removes any translation components. This means we'll be taking
a ray in view coordinates and getting a ray back in world coordinates. Of
course this is only relevant if we're trying to compute a pick ray instead of
back-projecting an actual point in space.
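
Putting all the pieces together, here is a hypothetical sketch of the full
back projection for the pick-ray case, reusing the helpers from the earlier
sketches. The ray origin is the camera's world space position, which is
simply the translation column of the inverse modelview; the direction
comes out unnormalized.

    static void mouse_pick_ray(float mx, float my, float vpw, float vph,
                               const mat4 *invproj, const mat4 *invmv,
                               vec4 *origin, vec4 *dir)
    {
        /* viewport -> NDC (== clip since n_w = 1), substituting 0 for depth */
        vec4 n = viewport_to_ndc(mx, my, 0.0f, vpw, vph);
        vec4 v = clip_to_view(invproj, n);   /* clip -> view (12, 13)       */
        v.w = 0.0f;          /* the trick: drop translation in Equation 17  */
        *dir = mat4_mul_vec4(invmv, v);      /* view -> world direction     */
        origin->x = invmv->m[12];            /* camera position in world    */
        origin->y = invmv->m[13];
        origin->z = invmv->m[14];
        origin->w = 1.0f;
    }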

4 Picking
We should now have one of two things: either an actual point in world
space corresponding to the location of the mouse click, or a world space
ray representing the direction of the mouse click.
If we have an actual point we can search against all geometry in the
world and see which piece it’s closest to. If we have the ray, then we’ll need
to perform an intersection test between the ray and the world geometry
and find the geometry closest to the near Z clip plane. Either method
should be reasonably simple to implement.
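
The intersection test itself is outside the scope of this article, but as
one illustrative option, here is a standard quadratic-formula ray/sphere
test you could run against each object's bounding sphere, keeping the
object with the smallest positive hit distance. This is a common technique,
not something prescribed by the text above.

    #include <math.h>

    /* Nearest hit distance t >= 0 along r = o + t*d, or -1.0f on a miss.
       (A viewer inside the sphere also reports a miss, which is usually
       acceptable for picking.) */
    static float ray_vs_sphere(vec4 o, vec4 d, vec4 center, float radius)
    {
        float ox = o.x - center.x, oy = o.y - center.y, oz = o.z - center.z;
        float A = d.x*d.x + d.y*d.y + d.z*d.z;
        float B = 2.0f * (ox*d.x + oy*d.y + oz*d.z);
        float C = ox*ox + oy*oy + oz*oz - radius*radius;
        float disc = B*B - 4.0f*A*C;
        if (disc < 0.0f)
            return -1.0f;
        float t = (-B - sqrtf(disc)) / (2.0f * A);  /* nearer root */
        return (t >= 0.0f) ? t : -1.0f;
    }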

5 Conclusion
Picking objects in a 3D scene using a mouse is a common task, but there
are very few papers that describe pragmatic approaches to accomplishing
this. Hopefully this paper helps someone trying to muddle through this
on their own; God knows I could have used it a few weeks ago.

6 Greets
A shout out and greets to my boys Casey Muratori and Nichola Vining
for helping me sort through this shit and hopefully not sounding like a
dumbass. Yo.
