The State of The Art in Mobile Graphics Research


Mobile Graphics Survey

Tolga Capin ■ Bilkent University

Kari Pulli ■ Nokia Research Center Palo Alto

Tomas Akenine-Möller ■ Lund University

High-quality computer graphics let mobile-device users access more compelling content. Still, the devices’ limitations and requirements differ substantially from those of a PC. This survey of mobile graphics research describes current solutions in terms of specialized hardware (including 3D displays), rendering and transmission, visualization, and user interfaces.

Mobile phones are virtually omnipresent. In 2008, 3.3 billion people, half the world population, use mobile phones, according to the International Telecommunications Union. By 2010, Nokia expects that there will be as many mobile phone users as toothbrush users (4 billion). Over the past 10 years, the phone has expanded from being just a phone to being a full multimedia unit, on which you can play games (see Figure 1), shoot photos, listen to music, watch television or video, send messages, and do videoconferencing.

One factor leading to the widespread adoption of mobile phones has been the dramatic improvement in display technologies. Displays used to be monochromatic and small (48 × 84 pixels). Today, we have 24-bit (16.8 million colors) displays with VGA resolution (640 × 480 pixels). Consequently, mobile phones have the potential to deliver graphics to the masses.

The mobile context differs vastly from the PC context. A mobile phone

■ is always with you,
■ is always connected to the network and can find its location and provide access to location-based services and navigation, and
■ supports applications that require a graphics-intensive user interface.

In addition, most mobile phones include a camera, which allows many possibilities for better user interaction with the device, as well as augmented reality (AR) applications that combine digital images (rendered graphics models) with real-world images (such as those on a camera viewfinder).

Standard mobile graphics APIs have laid the foundation for much mobile graphics research and applications. For 3D graphics, there’s OpenGL ES, which is a low-level API based on the popular OpenGL, and M3G (JSR 184), which is designed on top of OpenGL ES and supports scene graphs, animation, and file formats for mobile Java. Kari Pulli and his colleagues cover various uses of OpenGL ES and M3G.1 For 2D vector graphics, there’s OpenVG, a low-level API similar to OpenGL, and Scalable Vector Graphics API for mobile Java (JSR 226). A description of these and other related APIs appears elsewhere.2

By mobile, we mostly mean handheld devices. So, although aviation or car displays are certainly mobile, they fall outside this article’s scope. Here, we aim to survey the state of mobile graphics research. We don’t address issues related to particular applications and development tools. We also don’t discuss mobile gaming in depth; Mark Callow and his colleagues provide a good overview of mobile-game development and distribution.3 Jörg Baus and his colleagues survey 2D and 3D maps for navigation, which is also mostly beyond this article’s scope.4 Furthermore, we concentrate on interactive graphics because noninteractive graphics can be simply rendered on other devices and rendered as simple bitmaps. For this reason, we also address user interfaces and handheld interaction techniques.

74 July/August 2008 Published by the IEEE Computer Society 0272-1716/08/$25.00 © 2008 IEEE
Figure 1. High-quality graphics games have reached mobile devices. © 2008 Nokia.

Handhelds’ limitations
Compared to the desktop, handheld devices are limited by

■ power supply,
■ computational power,
■ physical display size, and
■ input modalities.

Mobile devices’ fundamental problem is that they’re battery operated. Whereas many other aspects of computing follow Moore’s law, battery technology develops much more slowly. The display is one of the largest consumers of power, and graphics applications keep the display, often with a backlight, constantly on. Innovation is required at the hardware level for lower power consumption, while diligence is required at the software level for power-aware mobile applications. Finally, the devices are small; even if more power were available, that power would turn into heat, which can damage circuits unless the design process considered the thermal aspects early on.

Mobile device CPUs also have limited computing power. A related limitation is internal bandwidth for memory accesses, which increases more slowly than raw computing power and consumes much power. Another limitation is cost: mass-market consumer devices should be cheap, which limits the silicon budget. For example, only the most recent high-end phones support floating-point units. Having dedicated graphics hardware helps the devices get by with lower-clock-rate CPUs.

Although the pixel pitch ratio is increasing at a stable rate, the requirement to keep the devices handheld and pocketable means that the devices’ physical size has an upper bound. Whereas the largest displays might have a diagonal of up to 5 inches, many devices have much smaller displays.

Furthermore, mobile devices currently support key-based interfaces through joypad and direction keys and a numerical keyboard. On larger devices, additional keys provide a better user experience for complex tasks because keys can be dedicated to specific tasks. Smart phones can’t easily use such keys owing to limited physical space. Interaction with touch-sensitive screens has emerged as an alternative, but most solutions require two-handed interaction, which causes additional attentional overhead in users.

Finally, there’s an order of magnitude difference between high- and low-end devices in graphics processing and computational capacity. A particular technique might run efficiently on one device but be inefficient on another. This requires solutions that can scale down to low-end mobile phones and up to larger devices, even PCs.

Researchers in industry and academia have developed several solutions to these problems. The following sections describe the key approaches.

Graphics hardware
A given task, such as 3D rendering, can always be done more efficiently on special-purpose hardware than on a general-purpose CPU. It’s possible to write a rendering engine fully in software executing on a CPU, providing maximum flexibility. In fact, most mobile 3D engines are still software implementations. However, dedicated graphics hardware can provide both faster execution and lower power consumption. Dedicated graphics processing units (GPUs) are already available on high-end smart phones. Some GPUs are available on a separate chip, but often the GPU and CPU are on the same chip, which decreases manufacturing costs.

Although modern graphics engines, such as OpenGL ES 2.0, provide programmable components (so-called shaders), a lot of functionality still isn’t programmable but consists of blocks of fixed functionality that can be parameterized and turned on or off. Fixing the functionality allows more efficient implementations and latency hiding. Triangle setup, texture fetch and filtering, and blending operations can be more efficient when implemented in dedicated logic.

The key to good graphics performance and low power consumption is to reduce the internal traffic between the processing elements and the memory. So, mobile graphics solutions focus on how to compress and even completely avoid that traffic. Reducing the traffic is even more important because computation power increases more quickly than memory bandwidth. For example, John Owens reports that the yearly processing capability growth is about 71 percent, while dynamic RAM bandwidth grows only by 25 percent.5 This difference suggests that one should take great care when designing a GPU architecture.
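To see how quickly this gap compounds, the following minimal Python sketch treats the two growth figures quoted above as constant yearly rates. That constancy is an assumption made purely for illustration, and the constant and function names are hypothetical.

```python
# Yearly growth rates quoted in the text, assumed constant for this sketch.
COMPUTE_GROWTH = 0.71    # processing capability: +71 percent per year
BANDWIDTH_GROWTH = 0.25  # dynamic RAM bandwidth: +25 percent per year


def compute_to_bandwidth_gap(years: int) -> float:
    """Return how much further compute has grown than bandwidth after `years`."""
    yearly_ratio = (1.0 + COMPUTE_GROWTH) / (1.0 + BANDWIDTH_GROWTH)
    return yearly_ratio ** years


if __name__ == "__main__":
    for years in (1, 5, 10):
        gap = compute_to_bandwidth_gap(years)
        print(f"after {years:2d} years, compute outgrows bandwidth by {gap:.1f}x")
```

Under these assumptions the ratio grows by a factor of about 1.37 each year, which is why techniques that trade extra computation for reduced memory traffic tend to become more attractive over time.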

IEEE Computer Graphics and Applications 75



[Figure 2 diagram labels: Scene data, Primitives, Geometry, Tiling, Transformed scene data, Tile lists, GPU, Memory, Primitives per tile, Rasterizer and pixel shader, Texture read, On-chip buffers, Write, Frame buffer (RGBA/Z).]

Figure 2. A tiling architecture. The primitives are being transformed and stored in external memory. There they are sorted into tile lists, where each list contains the triangles overlapping that tile. This makes it possible to store the frame buffer for a tile (for example, 16 × 16 pixels) in on-chip memory, which makes accesses to the tile’s frame buffer extremely inexpensive.

Compression
Compression not only saves storage space, but it also reduces the amount of data sent over a network or a memory bus. For GPUs, compression and decompression (codec) have two major targets: textures and buffers.

Textures are read-only images glued onto geometrical primitives such as triangles. A texture codec algorithm’s core requirements include fast random access to the texture data, fast decompression, and inexpensive hardware implementation. The random-access requirement usually implies that a block of pixels is compressed to a fixed size. For example, a group of 4 × 4 pixels can be compressed from 3 × 8 = 24 bits per pixel down to 4 bits per pixel, requiring only 64 bits to represent the whole group. As a consequence of this fixed-rate compression, most texture compression algorithms are lossy (for example, JPEG) and usually don’t reproduce the original image exactly. Because textures are read-only data and usually compressed offline, the time spent compressing the image isn’t as important as the decompression time, which must be fast. Such algorithms are sometimes called asymmetric.

As a result of these requirements, developers have adopted Ericsson Texture Compression (ETC) as a new codec for OpenGL ES.6 ETC stores one base color for each 4 × 4 block of texels and modifies the luminance using only a 2-bit lookup index per pixel. This technique keeps the hardware decompressor small. Currently, no desktop graphics APIs use this algorithm.

Buffers are more symmetric than textures in terms of compression and decompression because both processes must occur in hardware in real time. For example, the color buffer can be compressed, so when a triangle is rendered to a block of pixels (say, 4 × 4) in the color buffer, the hardware attempts to compress this block. If this succeeds, the data is marked as compressed and sent back to the main memory in compressed form over the bus and stored in that form. Most buffer compression algorithms are exact to avoid error accumulation. However, if the algorithm is lossy, the color data can be lossily compressed and later recompressed, and so on, until the accumulated error exceeds the threshold for what’s visible. This is called tandem compression, meaning that if compression fails, you must have a fallback that guarantees an exact color buffer, namely, sending the data uncompressed.7

Depth and stencil buffers might also be compressed. The depth buffer deserves special mention because its contents are proportional to 1/z, and when viewed in perspective, the depth values over a triangle remain linear. Depth-buffer compression algorithms heavily exploit this property, which accounts for higher compression rates. A survey of existing algorithms appears elsewhere.8

Interestingly, all buffer codec algorithms are transparent to the user. All action takes place in the GPU and is never exposed to the user or programmer, so there’s no need for standardization. There’s no major difference for buffer codec on mobile devices versus desktops, but mobile graphics has caused renewed interest in such techniques.

Tiling architectures
Tiling architectures aim to reduce the memory traffic related to frame-buffer accesses using a completely different approach. Tiling the frame buffer so that a small tile (such as a rectangular block of pixels) is stored on the graphics chip provides many optimization and culling possibilities. Commercially, Imagination Technologies and ARM offer mobile 3D accelerators using tiling architectures. Their core insight is that a large chunk of the memory accesses are to buffers such as color, depth, and stencil. Ideally, we’d like the memory for the entire frame buffer on-chip, which would make such memory accesses extremely inexpensive. However, this isn’t practical for the whole frame buffer, but storing a small tile of, say, 16 × 16 pixels of the frame buffer on-chip is feasible. When all rendering has been finished to a particular tile, its contents can be written to the external frame buffer in an efficient block transfer. Figure 2 illustrates this concept.

However, tiling has the overhead that all the triangles must be buffered and sorted into correct tiles after they’re transformed to screen space. A tiling

unit creates, for each tile, a list of triangles overlapping with that tile. Each tile can then be processed in turn or in parallel with others. This architecture’s main advantage is that frame-buffer accesses become inexpensive. This must be balanced with the cost of buffering and sorting the triangles.9 It’s still unknown whether a traditional architecture or tiling is best. The optimal approach also depends on the content being rendered. For example, if the overdraw factor is high, the tiled approach can be a winner, but if there are many long but thin triangles, the traditional nontiled approach might work better.

Culling
Even better than compressing data is to avoid processing it. To cull means “to select from a group,” and this often amounts to avoiding processing data that doesn’t contribute to the final image. One particular technique stores (in a cache) the maximum, Zmax, of the depth values in a block of pixels, and when rendering to this block, the GPU estimates conservatively whether the triangle is behind Zmax. If so, all per-pixel processing can be avoided in that block because the triangle will be hidden.10

A similar technique stores the minimum, Zmin, of the depth values and determines whether a triangle is definitely in front of all rendered geometry in a block. If so, depth buffer reads can be avoided in the block.11 You can also use Zmin to handle other depth tests. These two techniques are often called Z-culling.

Another technique uses occlusion queries. The programmer renders, for example, a bounding box around a complex object, and the occlusion query counts how many fragments on the box are visible. If no fragments are visible, then the bounding box is hidden and rendering the complex object can be avoided. Another approach, called delay streams, can also be used for occlusion culling.12 The idea is to process the triangles as usual, write to the depth buffer, and delay other per-pixel processing. Instead, the triangles are put in a first-in, first-out queue (that is, the delay stream). When the delay stream is full, the “occluding power” (that is, the Zmax values) builds up substantially. As the triangles leave the delay stream, they are tested against the Zmax values, and many fragments can be skipped because they’re now occluded by other surfaces.

With the advancement of programmable shaders, more work is being put into pure computation. At some point, it’s likely that the GPU will become compute-bound, that is, limited in performance because of too much computation. One solution is to spend more time on shader compiler optimization, but that only takes you so far. Another solution is to avoid executing the pixel shader when you can determine that the computation results won’t contribute to the final image anyway. For example, consider a block of pixels that are all in shadow (completely black). If a high-level mechanism could determine conservatively that these pixels are all in shadow, then per-pixel shadow computations could be avoided.

This is another type of culling, and the basic idea is implemented in the programmable culling unit (PCU).13 The PCU executes the pixel shader once over an entire block of pixels. For conservative output, the computations are carried out using interval arithmetic, so the input is the intervals of the block’s in-parameters. The total number of instructions decreased by 48 to 71 percent, which indicates that a performance increase of about 2 times is possible. In addition, the memory bandwidth usage decreased by 14 to 28 percent. Interestingly, the PCU can also operate in a lossy mode. The programmer can activate this by instructing the pixels to be killed if the contribution is less than, say, 1 percent of the maximum intensity. In such a case, the threshold of when per-pixel processing should commence provides a knob that the user can set to trade off image quality for performance.

Adaptive voltage scaling
The techniques we just discussed are high-level solutions. Other methods reduce power usage at the hardware level. Several researchers have proposed low-power GPUs with conventional power management strategies. Bren Mochocki and his colleagues analyze how such factors as resolution, frame rate, level of detail, lighting, and texture maps affect power consumption of mobile 3D graphics pipeline stages.14 On the basis of this analysis, they use dynamic voltage and frequency scaling (DVFS) schemes for different pipeline stages. Using a prediction strategy for workloads for the different stages, DVFS could decrease power consumption by 40 percent.

3D displays and rendering
Many solutions for mobile 3D displays don’t require additional peripherals, such as glasses or head gear. Such displays are often called autostereoscopic


displays. Rendering to such displays can be more expensive than rendering to a regular display. So, specialized algorithms and hardware can help reduce the workload.

To give the sensation of 3D to a stationary observer, a device must exploit a key source of 3D perception: the binocular parallax. All autostereoscopic displays exploit the binocular parallax through direction-dependent displaying. This means that the device must provide a different view for each eye. Existing solutions employ either a volumetric, multiview, or holographic display. The display most applicable to mobile devices is the multiview display, which uses lens arrays or parallax barriers to direct or select simultaneously displayed images depending on the viewpoint. All these solutions provide one or more observer locations from which a stereo pair of images is visible, while other positions yield unfocused or incorrect views.

Stereo rendering generally costs twice as much in computation and bandwidth. However, for a larger angle of usage (that is, larger than the angle between the two eyes of an observer), some displays use even more views, which requires more processing. Specialized hardware can potentially render to autostereoscopic displays more efficiently because the images for the left and right eyes are similar. In contrast, with a brute-force implementation, the scene is rendered first to the left eye and then to the right eye.

However, it makes sense to render a single triangle to both views before proceeding with the next triangle.15 Aravind Kalaiah and Tolga Capin use this rendering order to reduce the number of vertex shader computations.16 Splitting the vertex shader into view-independent (computed once) and view-dependent parts can greatly reduce vertex shader computations. In the following per-pixel processing stage, a simple sorting procedure in a generalized texture space greatly improves the texture cache hit ratio, keeping the texture bandwidth close to that of monoscopic rendering.15

In addition, Jon Hasselgren and Tomas Akenine-Möller introduce approximate rendering in the multiview pipeline, so that fragment colors in all neighboring views can be approximated from a central view when possible.15 When approximate rendering is acceptable, you can avoid many per-pixel shader instruction executions. For stereo rendering, about 95 percent of the computations and bandwidth are avoided for the left view (the right view must be rendered as usual).

Rendering and transmission
In parallel with advances in graphics hardware and displays, we’re witnessing a dramatic increase in the complexity of graphics models on mobile devices. Here, we highlight recent advances in rendering and transmitting such models on mobile devices.

To overcome the complexity of representing the mesh connectivity, numerous solutions convert input mesh models to internal, more efficient representations. Florent Duguet and George Drettakis’s solution uses point-based graphics.17 They create point samples from an input mesh as a preprocess or procedurally on the fly and create a hierarchical representation of the object samples’ bounding volumes. During rendering, the processing of the hierarchy stops at a specified depth, achieving flexible rendering that’s scalable to the mobile device’s speed requirements and screen size. This approach is also memory efficient because it doesn’t need to keep the whole model in main memory.

An alternative approach eliminates rendering nonimportant parts of the graphical content. Vidya Setlur and her colleagues’ method considers the human perception system’s limitations for retargeting 2D vector animations for small displays.18 They aim to preserve key objects’ recognizability in a vector graphics animation by exaggerating the important objects’ features and eliminating insignificant parts during rendering. Instead of uniformly scaling down the input to small displays, this perceptually based solution uses nonuniform scaling of objects, based on the objects’ importance in the scene.

Jingshu Huang and her colleagues try a different approach to rendering complex models on small screens.19 Their MobilVis system adapts well-known illustrative rendering techniques, such as interactive cutaway views, ghosted views, silhouettes, and selective rendering, to mobile devices to more clearly convey an object’s shapes, forms, and interior structures.

Although these solutions provide more efficient results than basic graphics rendering, they’re still limited by the devices’ processing power. Because mobile devices are always connected to the network, remote rendering becomes a viable alternative. Typically, this technique uses a client–server approach. The rendering occurs on a high-performance server or a PC; the mobile client receives intermediate results

over a network connection and renders the final results. Chun-Fa Chang and Shyh-Haur Ger present an image-based remote-rendering solution, where the client receives depth images from the server and applies a 3D warping method, achieving near-real-time rates.20 Daoud Hekmatzada and his colleagues present a nonphotorealistic rendering solution, based on drawing silhouettes and contour lines as primitives.21

A related problem is the transmission of complex models to mobile devices. Downloading such models through the air requires much bandwidth. In Xiaonan Luo and Guifeng Zheng’s solution for transmitting meshes, the mobile device communicates with a wired IP server via an IP network and a wireless channel.22 This solution is based on a flexible progressive mesh coding technique that adapts to different bit-rate and error-resilience requirements, while minimizing the computational complexity usually associated with a transcoder. Azzedine Boukerche and Richard W.N. Pazzi present a streaming protocol for 3D virtual-environment exploration on mobile devices; they address network transmission problems such as rate and congestion control.23 Siddhartha Chattopadhyay and his colleagues describe power-aware compression and transmission of motion capture data for mobile devices.24

Several issues must be solved for remote rendering, such as connectivity problems and the latency of transmitting user input and rendered images. Hybrid solutions that balance processing between on-device and remote rendering present interesting research possibilities.

Visualization and user interfaces
The key challenges in mobile visualization and user interfaces relate to small displays and the limited amount of interaction hardware compared to the desktop (for example, there’s no mouse or full-size keyboard). Interaction is an important component of most graphics applications.

Visualization
Presenting large amounts of graphical data and complex user interface components more effectively on small displays is a key research topic. When the data complexity exceeds what mobile displays can show, users must manually browse through the large data. This can easily happen when rendering and visualizing 2D data (such as maps or documents) or 3D data (such as medical data or city models). Scalable and zoomable user interfaces also require such visualization techniques. Luca Chittaro surveys problems and solutions for visualizing five types of data for mobile applications: text, pictures, maps, physical objects, and abstract data.25

Patrick Baudisch and Ruth Rosenholtz propose classifying approaches to visualization on mobile devices into the following two categories.26

Overview + Detail. These approaches are based on displaying two different views of the data simultaneously: one for context and one for detail. While the user navigates around the large data in the context view, the detailed view displays the area in focus.

Focus + Context. These approaches use a single view into data, with nonuniform scaling of data elements. The most prominent solution is the fish-eye view, which magnifies the data at the user’s focus of attention and renders distant objects in progressively smaller sizes. Fish-eye views are mostly used in maps and menus.27

One example of this approach is speed-dependent adaptive zooming. Tolga Capin and Antonio Haro capture the device’s physical movement from camera input, which they analyze to determine scroll direction and magnitude.28 The zoom level increases or decreases depending on the scroll’s magnitude. For example, when a user moves a phone, the view zooms out and the display shows an overall view. When the user stops moving the phone, the zooming level gradually increases and the display shows a detailed view.

Benjamin Bederson and his colleagues developed DateLens, a fish-eye interface for a calendar on mobile devices.29 The user first sees an overview of a large time period using a graphical representation of each day’s activities. Choosing a particular day expands the area representing that day and reveals the appointment list in context.

Recently, Amy Karlson and her colleagues proposed AppLens and LaunchTile design solutions that adapt the UI to multiple devices with different resolutions and aspect ratios.30 AppLens uses a tabular fish-eye approach for integrated access and notification for nine applications. LaunchTile uses pure zooming within a landscape of applications to accomplish the same goals. A further development of LaunchTile is the zoomable fish-eye visualization of Zumobi, for Web browsing on mobile devices (see Figure 3).

Figure 3. Zumobi’s user interface. The interface platform supports a zoomable Web-browsing experience on mobile devices. Courtesy of Zumobi (www.zumobi.com).


Another problem is visualizing the location of off-screen objects because small mobile displays can’t display all data at once. Solutions to this problem augment the detailed view with visual references to off-screen objects. For example, Baudisch and Rosenholtz use the “street lamp” metaphor, with an associated halo that includes a red arc at the detailed view’s borders.26 Figure 4 illustrates the Halo approach.

Figure 4. The Halo approach displays arcs at the detailed view’s borders. The ring’s radius is proportional to the distance. Image courtesy of Patrick Baudisch and Ruth Rosenholtz.

3D user interfaces
Three-dimensional user interfaces are a key application of visualization on mobile devices, especially those with autostereoscopic displays. Creating 3D interfaces that approach the richness of 3D reality has long been a research target of several other research groups, particularly for desktop environments. Ben Shneiderman and Catherine Plaisant analyzed the features of effective 3D interfaces, primarily for desktop and near-to-eye display domains, and proposed numerous guidelines.31 These include making better use of occlusion, shadows, and perspective; minimizing the number of steps in navigation; and improving text readability with better rendering and contrast with the background.

Graphics hardware support for OpenGL ES 2.0 in a mobile phone opens up new possibilities for user interfaces owing to the programmable nature of that API. Because 3D UI rendering solutions developed for desktop computers don’t scale down well to mobile devices, a different set of widgets must be developed. In Figure 5, photos, videos, and applications drop down at the far end and move toward the front. The user can “catch” a photo, video, or application and make it active. This includes showing the video or photo in higher resolution or activating the application. Programmable vertex and pixel shaders render depth-of-field effects and motion blur. These shaders also animate “wobbly” windows using vertex skinning.

Figure 5. A sequence of images from the SocialRiver user interface. Using OpenGL ES 2.0, SocialRiver implements motion blur, depth of field, and vertex skinning. Video input can also be composited. Courtesy of The Astonishing Tribe AB (www.tat.se).

Directly manipulating content
Mobile devices are currently limited in the mode of interaction they provide to users. However, direct-manipulation interfaces provide a more intuitive interaction than current key-modal and menu-based systems.31 Users can manipulate individual objects, each with a direct display representation. They apply actions directly to objects by selecting them and then choosing a command. Graphical representation is key for direct manipulation: users manipulate, through selection events and moving a pointing device, a graphical or iconic representation of the underlying data. Dragging an object by the pointer is an example of this interaction mode.

Recently, stylus- and thumb-based interaction with touch-sensitive screens has emerged as a solution for mobile direct manipulation.32 Stylus-based interaction, although accurate for selecting objects on a small screen, requires both hands and causes additional attentional overhead.33 To overcome this problem, researchers have developed one-handed thumb-based interaction. Apple’s iPhone is the most prominent example; with a multitouch capacitive touch screen, it lets users interact with applications and type using their thumbs. Karlson and her colleagues have further developed several high-level gestures for more intuitive interaction with their zoomable user interface solution.30

Researchers have also incorporated physical sensors such as accelerometers in mobile devices for richer user interaction.34 However, such sensors produce error buildup over time. One way to overcome this is by merging relative continuous data from physical sensors with absolute but potentially intermittent data. This approach has provided good results and could lead to reliable tracking solutions.
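One simple way to merge relative sensor data with intermittent absolute data, as described above, is a complementary-style update: integrate the relative reading every step, and pull the estimate toward an absolute measurement whenever one arrives. The Python sketch below is an illustration under stated assumptions, not the method of the cited work: it is one-dimensional, the sensor has a constant bias, the absolute fix is exact, and all names and parameter values are hypothetical.

```python
from typing import Optional, Tuple


def fuse(estimate: float, relative_delta: float,
         absolute: Optional[float], gain: float = 0.5) -> float:
    """Advance the estimate by the relative reading, then correct it
    toward the absolute measurement if one is available this step."""
    predicted = estimate + relative_delta
    if absolute is None:
        return predicted  # no fix this step: dead reckoning only
    return predicted + gain * (absolute - predicted)


def simulate(steps: int = 50, true_step: float = 1.0, sensor_bias: float = 0.1,
             fix_every: int = 5, gain: float = 0.5) -> Tuple[float, float]:
    """Return (dead-reckoning error, fused error) after `steps` updates."""
    true_pos = dead = fused = 0.0
    for i in range(1, steps + 1):
        true_pos += true_step
        delta = true_step + sensor_bias  # biased relative reading
        dead += delta                    # pure integration drifts steadily
        absolute = true_pos if i % fix_every == 0 else None
        fused = fuse(fused, delta, absolute, gain)
    return abs(dead - true_pos), abs(fused - true_pos)


if __name__ == "__main__":
    dead_error, fused_error = simulate()
    print(f"dead-reckoning error: {dead_error:.2f}")
    print(f"fused error:          {fused_error:.2f}")
```

In this toy setup the integrated-only estimate drifts without bound, while the fused estimate stays within a fixed error band. The `gain` parameter trades responsiveness to absolute fixes against sensitivity to noise in them.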

Alternatively, researchers have proposed several solutions in which incoming camera video is used to estimate phone motion and to interact with the user’s physical environment.28 With camera-based interaction, users move the pointer or change the view by moving the phone instead of interacting with the screen or keypad. Correctly interpreting the objects’ observed motion or the camera’s global motion from video requires accurate tracking. Among the recent solutions, Jari Hannuksela and his colleagues propose region-based matching that uses a sparse set of features for motion analysis and a Kalman filter-based tracker for estimation.35 Capin and Haro’s solution tracks individual corner-like features observed in the entire incoming camera frames.28 This lets the tracker recognize sudden camera movements of arbitrary size, as long as at least some features from the previous frame are still visible. The tradeoff is that the tracker can’t detect rotations.

Augmented reality

AR, which augments video with graphics, can be contrasted with virtual reality, which renders everything with computer graphics, and telepresence, which conveys reality somewhere else by transmitting video and audio. Whereas many mobile-graphics applications resemble desktop-graphics applications (only with more constraints and less performance), AR provides a user experience on a mobile system that’s different from, and better than, the desktop user experience. Here, we discuss some early AR systems.

One early example of mobile AR is the Touring Machine.36 The main system consisted of a backpack loaded with a computer and batteries. The user wore a head-mounted display and camera and held a tablet and stylus for input. The system worked as a campus tour guide, displaying the building names and related information over the buildings on its optical-see-through head-mounted display. Two surveys cover the basic components and problems of AR37 and developments in the late 1990s.38

Jun Rekimoto and Katashi Nagao’s NaviCam was an early handheld AR system.39 It consisted of a handheld display that showed real-time camera imagery. The images were passed to a workstation for analysis. If the system recognized color-coded ID tags, it would superimpose situation-sensitive information over the camera image and display it on the device. This kind of video-see-through system has many advantages over optical-see-through systems. Optical systems are open-loop control systems that require a good world model and accurate tracking of the user’s eye position. A video system provides a much easier closed-loop control system because it analyzes the image, localizes the annotated object only with respect to the camera, and overlays the annotations with the target.

Whereas NaviCam was tethered to a workstation, Daniel Wagner and Dieter Schmalstieg created the first autonomous handheld AR system.40 They ported ARToolkit (www.hitl.washington.edu/artoolkit), a popular library for many AR demos that tracks camera position with respect to square markers, to a PDA. The system offloaded the tracking to a server for faster frame rates, and the graphics rendering used a proprietary subset of OpenGL. Soon after, other researchers implemented similar systems on mobile phones, such as Mathias Möhring and his colleagues, who implemented their own tracker,41 and Anders Henrysson and his colleagues, who adapted Wagner’s ARToolkit port to Symbian.42 Both these systems used OpenGL ES for graphics rendering.1

AR is also useful in gaming, and several games feature an AR phone. In Mosquito Hunt by Siemens, virtual mosquitoes are drawn over live video from a camera. By moving the phone and tracking the motion flow in the camera, users try to zap the mosquitoes. In Marble Revolution2 (www.bit-side.com/311.html), the motion flow guides a marble through a maze. Kick Real (www.kickreal.de) shows a soccer ball on the ground that users can kick. AR Tennis tracks markers on a table to anchor a tennis field and tracks additional markers on players’ phones for a collaborative or competitive tennis game (see Figure 6).42

Figure 6. In AR Tennis, the camera tracks markers on the table and the other player’s camera. The players attempt to bounce the ball back and forth in a virtual tennis court. Courtesy of Anders Henrysson.

Most mobile AR systems use markers to track the camera’s relative position with respect to objects or use optical flow to track the phone motion. More recently, some systems have done away with markers. The PhoneGuide is a museum guide based on camera phones (see the Projects in VR article in this issue for more on this).43 As Figure 7 illustrates, the PhoneGuide determines the user’s approximate location using Bluetooth beacons, so the vision system only needs to distinguish between a smaller set of objects. The system splits the input image into bins, each bin produces a global feature vector consisting of various histograms and ratios (colors, intensities, edges, and so on), and a neural network uses the inputs for recognition. Herbert Bay and his colleagues also created a museum guide.44 Their system runs on a tablet PC and uses local scale-invariant Speeded Up Robust Features (SURF) to recognize objects. Such local feature matchers work better even if the objects have different backgrounds or are partially occluded. SURF has also been ported to camera phones.45

Figure 7. The PhoneGuide. The user points a camera phone to an object in a museum (left). A Bluetooth beacon gives an approximate location (top right). The system recognizes the image and provides additional information (bottom right). © 2007 IEEE.

The primary remaining challenges in mobile AR are object recognition and real-time tracking for unprepared markerless environments. Overcoming these challenges would allow annotating views with labels or arrows pointing to objects of interest. A secondary problem is seamlessly blending the graphics objects with the real scene with correct occlusions and shading. This requires modeling the environment and the current illumination levels in real time on the device.

Clearly, we need specialized graphics hardware for power-efficient graphics, but much research remains to be done. We believe that the best way around the battery capacity problem is to continue work on all fronts, which includes more efficient high-level graphics hardware algorithms, intelligent low-level power management, and clever software techniques for rendering and transmission. This also includes handling large, complex models and data sets. For both software and hardware techniques and algorithms, it would be convenient to have a knob that the user can turn to trade off image quality and operation time. Approximate rendering for graphics hardware is a field that hasn’t been investigated thoroughly, and we expect that many new innovations will emerge.

Autostereoscopic displays could achieve a major breakthrough on mobile devices before they do so on desktops. Interestingly, several such displays can already switch between displaying a standard 2D image and conveying a 3D autostereoscopic experience. Graphics APIs could easily add support for these displays. For 3D TV and video, open issues in standardization organizations still exist. The main practical obstacle for autostereoscopic displays is creating content that fully benefits from such displays.

User interfaces are an area where much innovation will happen at every level. The low-level APIs, such as OpenVG and OpenGL ES, are there, but using 3D so that it truly enhances the user experience is still an active research issue. Multimodal interfaces that integrate voice, gesture, stylus or finger input, and keyboard input with interactive graphics and sound rendering, and take human perceptual and cognitive capabilities into account, will create interaction that’s easier and more fun. Games are traditionally good at creating interfaces that are naturally easy to use; hopefully, these UI aspects will become more widespread in mobile UIs.

Because most mobile devices have a camera, how we can integrate AR functionality into such cameras is worth exploring. However, the killer AR application has yet to be discovered. The future of mobile graphics is exciting, and our community will continue to invent new algorithms, techniques, and applications that exploit the context of mobility.

Acknowledgments

The Swedish Foundation for Strategic Research supported Tomas Akenine-Möller through a grant on mobile graphics, and additional support came from a Knowledge Foundation visualization grant. The European Commission FP7 3DPHONE project (grant FP7-213349) and FP6 3DTV project (grant FP6-511568) supported Tolga Capin.

References

1. K. Pulli et al., Mobile 3D Graphics with OpenGL ES and M3G, Morgan Kaufmann, 2007.
2. K. Pulli, “New APIs for Mobile Graphics,” Proc. SPIE

Electronic Imaging: Multimedia on Mobile Devices II, SPIE, 2006, pp. 1–13.
3. M. Callow, P. Beardow, and D. Brittain, “Big Games, Small Screens,” ACM Queue, Nov./Dec. 2007, pp. 2–12.
4. J. Baus, K. Cheverst, and C. Kray, “Map-Based Mobile Services,” Map-Based Mobile Services Theories, Methods and Implementations, Springer, 2005, pp. 193–209.
5. J.D. Owens, “Streaming Architectures and Technology Trends,” GPU Gems 2, Addison-Wesley, 2005, pp. 457–470.
6. J. Ström and T. Akenine-Möller, “iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones,” Proc. ACM Siggraph/Eurographics Conf. Graphics Hardware, ACM Press, 2005, pp. 63–70.
7. J. Rasmusson, J. Hasselgren, and T. Akenine-Möller, “Exact and Error-Bounded Approximate Color Buffer Compression and Decompression,” Proc. ACM Siggraph/Eurographics Symp. Graphics Hardware, Eurographics Assoc., 2007, pp. 41–48.
8. J. Hasselgren and T. Akenine-Möller, “Efficient Depth Buffer Compression,” Graphics Hardware 2006: Eurographics Symp. Proc., A K Peters, 2006, pp. 103–110.
9. I. Antochi et al., “Scene Management Models and Overlap Tests for Tile-Based Rendering,” Proc. EUROMICRO Symp. Digital System Design, IEEE CS Press, 2004, pp. 424–431.
10. S. Morein, “ATI Radeon HyperZ Technology,” Proc. Workshop Graphics Hardware (Hot3D), ACM Press, 2000; www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf.
11. T. Akenine-Möller and J. Ström, “Graphics for the Masses: A Hardware Rasterization Architecture for Mobile Phones,” ACM Trans. Graphics (Proc. Siggraph), vol. 22, no. 3, 2003, pp. 801–808.
12. T. Aila, V. Miettinen, and P. Nordlund, “Delay Streams for Graphics Hardware,” ACM Trans. Graphics (Proc. Siggraph), vol. 22, no. 3, 2003, pp. 792–800.
13. J. Hasselgren and T. Akenine-Möller, “PCU: The Programmable Culling Unit,” ACM Trans. Graphics (Proc. Siggraph), vol. 26, no. 3, 2007, article 92.
14. B.C. Mochocki et al., “Signature-Based Workload Estimation for Mobile 3D Graphics,” Proc. 43rd Ann. Conf. Design Automation (DAC 06), ACM Press, 2006, pp. 592–597.
15. J. Hasselgren and T. Akenine-Möller, “An Efficient Multi-View Rasterization Architecture,” Proc. Eurographics Symp. Rendering, Eurographics Assoc., 2006, pp. 61–72.
16. A. Kalaiah and T. Capin, “Unified Rendering Pipeline for Autostereoscopic Displays,” Proc. 3DTV Conf., IEEE Press, 2007, pp. 1–4.
17. F. Duguet and G. Drettakis, “Flexible Point-Based Rendering on Mobile Devices,” IEEE Computer Graphics and Applications, vol. 24, no. 4, 2004, pp. 57–63.
18. V. Setlur et al., “Retargeting Vector Animation for Small Displays,” Proc. 4th Int’l Conf. Mobile and Ubiquitous Multimedia (MUM 05), ACM Press, 2005, pp. 69–77.
19. J. Huang et al., “Interactive Illustrative Rendering on Mobile Devices,” IEEE Computer Graphics and Applications, vol. 27, no. 3, 2007, pp. 48–56.
20. C.-F. Chang and S.-H. Ger, “Enhancing 3D Graphics on Mobile Devices by Image-Based Rendering,” Proc. 3rd IEEE Pacific Rim Conf. Multimedia (PCM 02), LNCS 2532, Springer, 2002, pp. 1105–1111.
21. D. Hekmatzada, J. Meseth, and R. Klein, “Non-Photorealistic Rendering of Complex 3D Models on Mobile Devices,” Proc. 8th Ann. Conf. Int’l Assoc. Mathematical Geology, vol. 2, Alfred-Wegener-Stiftung, 2002, pp. 93–98.
22. X. Luo and G. Zheng, “Progressive Meshes Transmission over a Wired-to-Wireless Network,” Wireless Networks, vol. 14, no. 1, 2006, pp. 47–53.
23. A. Boukerche and R.W.N. Pazzi, “Performance Evaluation of a Streaming-Based Protocol for 3D Virtual Environment Exploration on Mobile Devices,” Proc. Int’l Symp. Modeling Analysis and Simulation of Wireless and Mobile Systems (MSWiM 06), ACM Press, 2006, pp. 20–27.
24. S. Chattopadhyay, S.M. Bhandarkar, and K. Li, “Human Motion Capture Data Compression by Model-Based Indexing: A Power Aware Approach,” IEEE Trans. Visualization and Computer Graphics, vol. 13, no. 1, 2007, pp. 5–14.
25. L. Chittaro, “Visualizing Information on Mobile Devices,” Computer, vol. 39, no. 3, 2007, pp. 40–45.
26. P. Baudisch and R. Rosenholtz, “Halo: A Technique for Visualizing Off-Screen Objects,” Proc. SIGCHI Conf. Human Factors in Computing Systems (CHI 03), ACM Press, 2003, pp. 481–488.
27. K. Hornbaek and M. Hertzum, “Untangling the Usability of Fisheye Menus,” ACM Trans. Computer–Human Interaction, vol. 14, no. 2, 2007, article 6.
28. T. Capin and A. Haro, “Mobile Camera Based User Interaction,” Handbook of Research on User Interface Design and Evaluation for Mobile Technology, Information Science Reference, 2008, pp. 541–555.
29. B. Bederson et al., “Datelens: A Fisheye Calendar Interface for PDAs,” ACM Trans. Computer–Human Interaction, vol. 11, no. 1, 2004, pp. 90–119.
30. A.K. Karlson, B.B. Bederson, and J. Sangiovanni, “AppLens and launchTile: Two Designs for One-Handed Thumb Use on Small Devices,” Proc. SIGCHI Conf. Human Factors in Computing Systems (CHI 05), ACM Press, 2005, pp. 201–210.
31. B. Shneiderman and C. Plaisant, Designing the User Interface, 4th ed., Addison-Wesley, 2004.
32. S.J.V. Nichols, “New Interfaces at the Touch of a Fingertip,” Computer, vol. 40, no. 8, 2007, pp. 12–15.
33. J. Pascoe, N. Ryan, and D. Morse, “Using While Moving: HCI Issues in Fieldwork Environments,” ACM Trans. Computer–Human Interaction, vol. 7, no. 3, 2000, pp. 417–437.
34. K. Hinckley et al., “Sensing Techniques for Mobile Interaction,” Proc. 13th Ann. ACM Symp. User Interface Software and Technology (UIST 00), ACM Press, 2000, pp. 91–100.
35. J. Hannuksela, P. Sangi, and J. Heikkilä, “Vision-Based Motion Estimation for Interaction with Mobile Devices,” Computer Vision and Image Understanding, vol. 108, nos. 1–2, 2007, pp. 188–195.
36. S. Feiner et al., “A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment,” Proc. 1st Int’l Symp. Wearable Computers, IEEE CS Press, 1997, pp. 74–81.
37. R. Azuma, “A Survey of Augmented Reality,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, 1997, pp. 355–385.
38. R. Azuma et al., “Recent Advances in Augmented Reality,” IEEE Computer Graphics and Applications, vol. 21, no. 6, 2001, pp. 34–47.
39. J. Rekimoto and K. Nagao, “The World through the Computer: Computer Augmented Interaction with Real World Environments,” Proc. 8th Ann. ACM Symp. User Interface and Software Technology (UIST), ACM Press, 1995, pp. 29–36.
40. D. Wagner and D. Schmalstieg, “First Steps towards Handheld Augmented Reality,” Proc. 7th IEEE Int’l Symp. Wearable Computers (ISWC 03), IEEE CS Press, 2003, pp. 127–136.
41. M. Möhring, C. Lessig, and O. Bimber, “Video See-Through AR on Consumer Cell-Phones,” Proc. 3rd IEEE and ACM Int’l Symp. Mixed and Augmented Reality (ISMAR 04), IEEE Press, 2004, pp. 252–253.
42. A. Henrysson, M. Billinghurst, and M. Ollila, “Face to Face Collaborative AR on Mobile Phones,” Proc. 4th IEEE and ACM Int’l Symp. Mixed and Augmented Reality (ISMAR 05), IEEE Press, 2005, pp. 80–89.
43. E. Bruns et al., “Enabling Mobile Phones to Support Large Scale Museum Guidance,” IEEE MultiMedia, vol. 14, no. 2, 2007, pp. 16–25.
44. H. Bay, B. Fasel, and L. Van Gool, “Interactive Museum Guide: Fast and Robust Recognition of Museum Objects,” Proc. 1st Int’l Workshop Mobile Vision, Springer Verlag, 2006.
45. W.-C. Chen et al., “Efficient Extraction of Robust Image Features on Mobile Devices,” Proc. Int’l Symp. Mixed and Augmented Reality (ISMAR 07), IEEE Press, 2007, pp. 281–282.

Tolga Capin is an assistant professor in Bilkent University’s Department of Computer Engineering. He has contributed to various mobile graphics standards. His research interests include mobile graphics platforms, human–computer interaction, and computer animation. Capin received his PhD in computer science from the Ecole Polytechnique Federale de Lausanne. Contact him at [email protected].

Kari Pulli is a research fellow at Nokia Research Center. He has been an active contributor to several mobile graphics standards and recently wrote a book about mobile 3D graphics. Pulli received a PhD in computer science from the University of Washington and an MBA from the University of Oulu. Contact him at [email protected].

Tomas Akenine-Möller is a professor in Lund University’s Department of Computer Science. His research interests are graphics hardware for mobile devices and desktops, new computing architectures, collision detection, and high-quality rapid rendering. Akenine-Möller received his MSc in computer science and engineering from Lund University and his PhD in graphics at the Chalmers University of Technology. He received the best paper award at Graphics Hardware 2005 with Jacob Ström for the ETC texture compression scheme, which is now part of the OpenGL ES API. Contact him at [email protected].
