Mouse Ray Picking Explained
Brian Hook
http://www.bookofhook.com
April 5, 2005
1 Introduction
There comes a time in every 3D game where the user needs to click on
something in the scene. Maybe he needs to select a unit in an RTS, or open
a door in an RPG, or delete some geometry in a level editing tool. This
conceptually simple task is easy to screw up since there are so many little
steps that can go wrong.
The problem, simply stated, is this: given the mouse’s position in window
coordinates, how can I determine what object in the scene the user has selected
with a mouse click?
One method is to generate a ray using the mouse’s location and then in-
tersect it with the world geometry, finding the object nearest to the viewer.
Alternatively we can determine the actual 3-D location that the user has
clicked on by sampling the depth buffer (giving us (x, y, z) in viewport
space) and performing an inverse transformation. Technically there is a
third approach, using a selection or object ID buffer, but this has numerous
limitations that make it impractical for widespread use.
This article describes using the inverse transformation to derive world
space coordinates from the mouse’s position on screen.
Before we worry about the inverse transformation, we need to estab-
lish how the standard forward transformation works in a typical graphics
pipeline.
2 The View Transformation
The standard view transformation pipeline takes a point in model space
and transforms it all the way to viewport space [1]. It does this by trans-
forming the original point through a series of coordinate systems:
Model
↓
World
↓
View
↓
Clip
↓
Normalized Device
↓
Viewport

The modelview matrix M takes a model-space point p to view coordinates v,
and the projection matrix P takes v to homogeneous clip coordinates c:

M p = v    (1)

P v = c    (2)
[1] Viewport coordinates are also sometimes known as window coordinates or,
for systems without a window system, screen coordinates.
After clipping is performed the perspective divide transforms the
homogeneous coordinate c back into the Cartesian point n in normalized
device space. Normalized device coordinates are left-handed, where n_w = 1,
and are contained within the canonical view frustum from (−1, −1, −1) to
(+1, +1, +1).
n = c / c_w    (3)
Finally there is the viewport scale and translation, V , which transforms
n into the final viewport window coordinates w. Another axis inversion
occurs here; this time +Y goes down instead of up [2]. Viewport depth val-
ues are calculated by rescaling NDC Z coordinates from the range (−1, 1)
to (0, 1), with 0 at the near clip plane and 1 at the far clip plane. It’s impor-
tant to note that any user specified depth bias may impact our calculations
later.
V n = w    (4)
Using this pipeline we can take a model space point, apply a series of
transformations, and get a viewport coordinate.
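As a concrete sketch, here is the forward chain of Equations 1 through 4 in C. The row-major float[16] matrix layout and the helper names are my own assumptions, not anything prescribed by this article:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

/* Multiply a row-major 4x4 matrix by a column vector. */
static Vec4 mat_mul_vec(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

/* Model -> view -> clip -> NDC -> viewport, per Equations 1-4. */
static Vec4 project_point(const float modelview[16], const float proj[16],
                          float vp_w, float vp_h, Vec4 p) {
    Vec4 v = mat_mul_vec(modelview, p);           /* Eq. 1: M p = v   */
    Vec4 c = mat_mul_vec(proj, v);                /* Eq. 2: P v = c   */
    Vec4 n = { c.x/c.w, c.y/c.w, c.z/c.w, 1.0f }; /* Eq. 3: n = c/c_w */
    Vec4 w;                                       /* Eq. 4: V n = w   */
    w.x = (n.x + 1.0f) * 0.5f * vp_w;
    w.y = (1.0f - n.y) * 0.5f * vp_h;  /* +Y flips to point down  */
    w.z = (n.z + 1.0f) * 0.5f;         /* depth remapped to [0,1] */
    w.w = 1.0f;
    return w;
}
```

A point two units down the -Z axis, seen through an identity modelview and a simple perspective projection, lands in the middle of an 800x600 viewport, which is a handy sanity check for the whole forward pipeline.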
Our goal is to transform the mouse position in viewport coordinates
all the way back to world space. Since we’re not rendering a model, model
space and world space are the same.
That’s a lot of steps, and it’s easy to screw up, and if you screw up just
a little that’s enough to blow everything apart.
[2] Some less common window systems may place the origin at another
location, such as the bottom left of the window, so this isn't always true.
3.1 Viewport → NDC → Clip
The first step is to transform the viewport coordinates into clip coordi-
nates. The viewport transformation V takes a normalized device coor-
dinate n and transforms it into a viewport coordinate. Given viewport
width w and height h, our viewport coordinate v is:

V n = v = ( (n_x + 1)/2 · w,  (1 − n_y)/2 · h,  (n_z + 1)/2 )    (5)

Inverting this gives n back from a viewport coordinate:

n = V^-1 v = ( 2 v_x / w − 1,  1 − 2 v_y / h,  2 v_z − 1 )    (6)
Recall from Equation 2 that the projection matrix P takes view coordinates
to clip coordinates:

P v = c    (7)

To get back to view coordinates we need to invert P.
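Undoing the viewport mapping of Equation 5 is simple arithmetic. A minimal sketch, assuming the viewport dimensions are passed in as vp_w/vp_h and that depth is a [0, 1] value sampled from the depth buffer:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z; } Vec3;

/* Map a window coordinate (mouse x, y plus a sampled depth) back to
 * normalized device coordinates by undoing the viewport scale and bias. */
static Vec3 window_to_ndc(float win_x, float win_y, float depth,
                          float vp_w, float vp_h) {
    Vec3 n;
    n.x = 2.0f * win_x / vp_w - 1.0f; /* undo ((n_x + 1)/2) * w          */
    n.y = 1.0f - 2.0f * win_y / vp_h; /* undo ((1 - n_y)/2) * h (Y flip) */
    n.z = 2.0f * depth - 1.0f;        /* undo (n_z + 1)/2                */
    return n;
}
```

Note the Y flip happens here too: a mouse y of 0 (top of the window) maps to n_y = +1.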
This is trickier than it sounds – we can avoid computing a true 4x4 matrix inverse if we just
construct the inverse projection matrix at the same time we build the pro-
jection matrix.
A typical OpenGL perspective projection matrix P takes the form:
        | a 0 0 0 |
P =     | 0 b 0 0 |    (8)
        | 0 0 c d |
        | 0 0 e 0 |
The specific coefficient values depend on the nature of the perspective
projection matrix (for more information I recommend you look at the man
pages for gluPerspective). These coefficients should scale and bias
v_x, v_y, and v_z into clip space while assigning −v_z to c_w.
To transform from view coordinates v to clip coordinates c:

P v = c = ( a v_x,  b v_y,  c v_z + d v_w,  e v_z )    (9)
So solving for v we get:

v = ( c_x / a,  c_y / b,  c_w / e,  c_z / d − (c · c_w)/(d · e) )    (10)

which is exactly what the inverse projection matrix must produce:

          | 1/a   0    0     0       |
P^-1 =    |  0   1/b   0     0       |    (11)
          |  0    0    0    1/e      |
          |  0    0   1/d  −c/(d·e)  |

P^-1 c = v    (12)
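Since we know where each coefficient of Equation 8 lands, both the projection and its inverse can be filled in directly, no general 4x4 inversion required. A sketch, again assuming row-major float[16] storage:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

static Vec4 mat_mul_vec(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

/* Equation 8: the perspective projection from its five coefficients. */
static void make_proj(float a, float b, float c, float d, float e,
                      float P[16]) {
    const float m[16] = { a,0,0,0,  0,b,0,0,  0,0,c,d,  0,0,e,0 };
    for (int i = 0; i < 16; ++i) P[i] = m[i];
}

/* The matching analytic inverse, built at the same time we build P. */
static void make_proj_inverse(float a, float b, float c, float d, float e,
                              float Pinv[16]) {
    const float m[16] = { 1.0f/a,0,0,0,  0,1.0f/b,0,0,
                          0,0,0,1.0f/e,  0,0,1.0f/d,-c/(d*e) };
    for (int i = 0; i < 16; ++i) Pinv[i] = m[i];
}
```

Applying P and then Pinv to any view-space point should reproduce it exactly, which makes this easy to unit test.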
There’s no guarantee that v_w will be 1, so we’ll want to rescale v
appropriately:

v' = v / v_w    (13)
Assuming the modelview matrix M is a rigid transformation (a rotation R
plus a translation t), its inverse can be built directly from the
transposed rotation and the rotated translation t':

R^T t = t'    (15)

          | R^T_11  R^T_12  R^T_13  −t'_x |
M^-1 =    | R^T_21  R^T_22  R^T_23  −t'_y |    (16)
          | R^T_31  R^T_32  R^T_33  −t'_z |
          |   0       0       0       1   |
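Equations 15 and 16 translate to a few lines of C. This sketch assumes the modelview really is rigid, a pure rotation plus translation (a glScale in the stack would break the transpose trick), and row-major float[16] storage:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

static Vec4 mat_mul_vec(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

/* Invert a rigid modelview: transpose the 3x3 rotation, then negate the
 * rotated translation t' = R^T t, per Equations 15 and 16. */
static void rigid_inverse(const float M[16], float Minv[16]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            Minv[r*4 + c] = M[c*4 + r];        /* R^T            */
    for (int r = 0; r < 3; ++r)                /* -t' = -(R^T t) */
        Minv[r*4 + 3] = -(M[0*4 + r] * M[3] +
                          M[1*4 + r] * M[7] +
                          M[2*4 + r] * M[11]);
    Minv[12] = Minv[13] = Minv[14] = 0.0f;
    Minv[15] = 1.0f;
}
```

Round-tripping a point through M and then Minv should give the point back, which is the property worth asserting in a debug build.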
If you’re specifying the modelview matrix directly, for example by us-
ing glLoadMatrix, then you already have it lying around and you can
build the inverse as described in Equations 15 and 16. If, on the other hand, the
modelview matrix is built dynamically using something like gluLookAt
or a sequence of glTranslate, glRotate, and glScale calls, you can
use glGetFloatv to retrieve the current modelview matrix.
Now that we have the inverse modelview matrix we can use it to trans-
form our view coordinate v into world space, giving us w.
M^-1 v = w    (17)
If the depth value under the mouse was used to construct the original
viewport coordinate, then w should correspond to the point in 3-space
where the user clicked. If the depth value was not read then we have
an arbitrary point in space with which we can construct a ray from the
viewer’s position, a:
r(t) = a + t(w − a)    (18)
However, there’s a trick we can use to skip Equation 18 altogether. Setting
v_w to 0 in Equation 17, right before the inverse modelview transformation,
removes any translation components. This means we’ll be taking
a ray in view coordinates and getting a ray back in world coordinates. Of
course this is only relevant if we’re trying to compute a pick ray instead of
back projecting an actual point in space.
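The whole v_w = 0 trick can be sketched as follows, assuming the inverse projection and inverse modelview matrices are already available (row-major float[16]). Because we only want a direction, the unknown c_w scale factor between NDC and clip space can be ignored:

```c
#include <assert.h>
#include <math.h>

typedef struct { float x, y, z, w; } Vec4;

static Vec4 mat_mul_vec(const float m[16], Vec4 v) {
    Vec4 r;
    r.x = m[0]*v.x  + m[1]*v.y  + m[2]*v.z  + m[3]*v.w;
    r.y = m[4]*v.x  + m[5]*v.y  + m[6]*v.z  + m[7]*v.w;
    r.z = m[8]*v.x  + m[9]*v.y  + m[10]*v.z + m[11]*v.w;
    r.w = m[12]*v.x + m[13]*v.y + m[14]*v.z + m[15]*v.w;
    return r;
}

/* Build a world-space pick-ray direction from a mouse position. The NDC
 * point is placed on the near plane (n_z = -1); after the inverse
 * projection we zero v_w so the inverse modelview applies only its
 * rotation, returning a direction rather than a point. */
static Vec4 pick_ray_dir(const float Pinv[16], const float Minv[16],
                         float mx, float my, float vp_w, float vp_h) {
    Vec4 n = { 2.0f*mx/vp_w - 1.0f, 1.0f - 2.0f*my/vp_h, -1.0f, 1.0f };
    Vec4 v = mat_mul_vec(Pinv, n); /* clip -> view; c_w scale is irrelevant */
    v.w = 0.0f;                    /* kill translation in the next step     */
    return mat_mul_vec(Minv, v);   /* view -> world direction               */
}
```

The returned direction is not unit length; normalize it before handing it to intersection code that expects a normalized ray.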
4 Picking
We should now have one of two things: either an actual point in world
space corresponding to the location of the mouse click, or a world space
ray representing the direction of the mouse click.
If we have an actual point we can search against all geometry in the
world and see which piece it’s closest to. If we have the ray, then we’ll need
to perform an intersection test between the ray and the world geometry
and find the geometry closest to the near Z clip plane. Either method
should be reasonably simple to implement.
5 Conclusion
Picking objects in a 3D scene using a mouse is a common task, but there
are very few papers that describe pragmatic approaches to accomplishing
this. Hopefully this paper helps someone trying to muddle through this
on their own, God knows I could have used it a few weeks ago.
6 Greets
A shout out and greets to my boys Casey Muratori and Nichola Vining
for helping me sort through this shit and hopefully not sounding like a
dumbass. Yo.