Chapter 2
Graphics Hardware
Any computer-generated image must be displayed in some form. The most common
graphics display device is the video monitor, and the most common technology for
video monitors is the Cathode Ray Tube (CRT). In this chapter we examine the
technology behind CRT displays, and how they can be used to display images.
CRT display devices are very common in everyday life as well as in computer
graphics: most television sets use CRT technology. Figure 1 illustrates the basic
operation of a CRT display. Beams of electrons are generated by electron guns and
fired at a screen consisting of thousands of tiny phosphor dots. When a beam hits a
phosphor dot it emits light with brightness proportional to the strength of the beam.
Therefore pictures can be drawn on the display by directing the electron beam to
particular parts of the screen. The beam is directed to different parts of the screen by
passing it through a magnetic field. The strength and direction of this field, generated
by the deflection yoke, determines the degree of deflection of the beam on its way to
the screen.
To achieve a colour display, CRT devices have three electron beams, and on the
screen the phosphor dots are in groups of three, which give off red, green and blue
light respectively. Because the three dots are very close together the light given off by
the phosphor dots is combined, and the relative brightnesses of the red, green and blue
components determine the colour of light perceived at that point in the display.
CRT displays can work in one of two ways: as random scan devices or as raster scan
devices. In a random scan device the CRT beam is directed only to areas of the screen
where parts of the picture are to be drawn. If a part of the screen is blank the electron
beam will never be directed at it. In this case, we draw a picture as a set of primitives,
for example lines or curves (see Figure 2). For this reason, random scan devices are
also known as vector graphics displays.
Random scan displays are not so common for CRT devices, although some early
video games did use random scan technology. These days random scan is only really
used by some hard-copy plotters.
In a raster scan device, the primitives to be drawn are first converted into a grid of
dots (see Figure 2). The brightnesses of these dots are stored in a data structure known
as a frame buffer. The CRT electron beam then sweeps across the screen line-by-line,
visiting every location on the screen, but it is only switched on when the frame buffer
indicates that a dot is to be displayed at that location.
Raster scan CRT devices are the most common type of graphics display device in use
today, although recently LCD (Liquid Crystal Display) devices have been growing in
popularity. In this course, though, we will concentrate on the details of raster scan
devices.
We mentioned in Section 2.1 that for colour CRT displays there are three electron
beams: one for red light, one for green light and one for blue light. To ensure that
these three beams precisely hit the appropriate phosphor dots on the display, after the
beams have been deflected by the deflection yoke they pass through a mask
containing many tiny holes – one hole for each pixel in the raster (see Figure 1 and
Figure 4). This mask is known as the shadow mask. Because the three electron beams
originate from slightly different locations (from three separate electron guns), if they
all pass through the same hole in the shadow mask they will impact the screen at
slightly different locations: the locations of the red, green and blue phosphor dots.
The number of times that the entire raster is refreshed (i.e. drawn) each second is
known as the refresh rate of the device. For the image to appear persistent and not to
flicker, the display must be updated often enough that we cannot perceive a gap
between frames. In other words, we must refresh the raster as the persistence of the
phosphor dots begins to wear off. In practice, if the refresh rate is more than 24
frames per second (f/s) the display will appear reasonably smooth and persistent.
Modern graphics displays have high refresh rates, typically in the region of 60 f/s.
However, early graphics systems tended to have refresh rates of about 30 f/s.
Consequently, some flicker was noticeable. To overcome this, a technique known as
interlaced scanning was employed. With interlaced scanning, alternate scan-lines are
updated in each raster refresh. For example, in the first refresh only odd-numbered
scan-lines may be refreshed, then on the next refresh only even-numbered scan-lines,
and so on (see Figure 5). Because this technique effectively doubles the perceived
refresh rate, it has the effect of reducing flicker for displays with low refresh rates. Interlaced
scanning was common in the early days of computer graphics, but these days displays
have better refresh rates so it is not so common.
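As a minimal sketch of the idea (refresh_scanline() is a hypothetical placeholder standing in for redrawing one row of the raster), an interlaced refresh alternates between odd and even fields:

    #include <stdbool.h>

    #define NUM_SCANLINES 480

    /* Hypothetical placeholder: redraw a single row of the raster. */
    static void refresh_scanline(int row) { (void)row; }

    /* One interlaced refresh: odd-numbered rows in one field,
       even-numbered rows in the next. */
    void interlaced_refresh(bool odd_field)
    {
        for (int row = odd_field ? 1 : 0; row < NUM_SCANLINES; row += 2)
            refresh_scanline(row);
    }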
The following are the specifications of some common video formats that have been
(and still are) used in computer graphics:
In Section 2.1.2 we introduced the concept of a frame buffer. Frame buffers are used
by raster scan display devices to store the pixel values of the image that will be
displayed on the raster. A frame buffer is a 2-D array of data values, with each value
corresponding to a pixel in the image. The number of bits used to store the value of
each pixel is known as the number of bit-planes (or the depth) of the frame buffer. For
example, a 640x480x8 frame buffer has a resolution of 640x480 and a depth of 8 bits,
while a 1280x1024x24 frame buffer has a resolution of 1280x1024 and a depth of 24
bits. For colour displays we need to store a value for each component of the colour
(red, green and blue), so the number of bit-planes will typically be a multiple of 3
(e.g. 8 bit-planes each for red, green and blue makes a total of 24 bit-planes).
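The following minimal sketch (in C) works through this arithmetic for the two example frame buffers above:

    #include <stdio.h>

    /* Memory required by a frame buffer, in bytes: width x height
       pixels, each storing 'depth_bits' bits. */
    unsigned long frame_buffer_bytes(unsigned long width,
                                     unsigned long height,
                                     unsigned long depth_bits)
    {
        return width * height * depth_bits / 8;
    }

    int main(void)
    {
        /* 640x480x8: 307,200 pixels at 1 byte each = 307,200 bytes. */
        printf("%lu bytes\n", frame_buffer_bytes(640, 480, 8));
        /* 1280x1024x24: 1,310,720 pixels at 3 bytes each = 3,932,160 bytes. */
        printf("%lu bytes\n", frame_buffer_bytes(1280, 1024, 24));
        return 0;
    }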
Now we are in a position to examine the basic architecture of raster graphics systems,
i.e. what components are required, and how do they interact? Figure 6 shows a block-
diagram of a typical raster graphics system. Most (non-graphics) processing will
occur in the CPU of the computer, which uses the system bus to communicate with
the system memory and peripheral devices. When graphics routines are to be
executed, instead of being executed by the CPU they are passed straight to the display
processor, which contains dedicated hardware for drawing graphics primitives. The
display processor writes the image data into the frame buffer, which is a reserved
portion of the display processor memory. Finally the video controller reads the data in
the frame buffer and uses it to control the electron beams in the CRT display.
The display processor is also known by a variety of other names: graphics controller,
display coprocessor, graphics accelerator and video card are all terms used to refer to
the display processor. Since the display processor contains dedicated hardware for
executing graphics routines, it can only accelerate a particular set of routines. In
other words, a display processor will only be able to handle the graphics processing if
the routines used are from a particular graphics package. This is known as hardware
rendering. Most commercial video cards will support hardware rendering for the
OpenGL graphics package, and many PC video cards will also support hardware
rendering for DirectX. Hardware rendering is much faster than the alternative,
software rendering, in which graphics routines are compiled and executed by the CPU
just like any other code. For the raster graphics architecture to support software
rendering the block-diagram shown in Figure 6 would need to be modified so that the
frame buffer was connected directly to the system bus in order that it could be
updated by the CPU.
Figure 6 - The Architecture of a Raster Graphics System with a Display Processor
In recent years the popularity of 3-D graphics display devices has been growing,
although currently they are still quite expensive compared with traditional 2-D
displays. The aim of 3-D display devices is to provide a stereo pair of images, one to
each eye of the viewer, so that the viewer can perceive the depth of objects in the
scene as well as their position. The process of generating such 3-D displays is known
as stereoscopy.
3-D displays can be divided into two types: head-mounted displays (HMDs) and
head-tracked displays (HTDs). HMDs are displays that are mounted on the head of
the viewer. For example, Figure 7 shows a HMD – the device fits on the head of the
user and displays separate images to the left and right eyes, producing a sense of
stereo immersion. Such devices are common in virtual reality applications. Normally
a tracking device will be used to track the location of the display device so that the
images presented to the user can be updated accordingly – giving the impression that
the viewer is able to ‘move’ through the virtual world.
Whereas with a HMD the display moves with the viewer's head, with a HTD the
display remains stationary, but the head of the viewer is tracked so that the images
presented in the display can be updated. The difficulty with HTDs is how to ensure
that the left and right eyes of the viewer receive separate stereo images to give a 3-D
depth effect. A number of technologies exist to achieve this aim. For example, the
display can use polarised light filters to give off alternate vertically and horizontally
polarised images; the viewer then wears special glasses with polarised filters to ensure
that, for example, the left eye receives the vertically polarised images and the right
eye receives the horizontally polarised images. An alternative technique is to use
colour filters: the display draws a left-eye image in, for example, blue, and a right-eye
image in green; the viewer then wears glasses with colour filters to achieve the stereo
effect. Other technologies also exist.
The graphics pipeline model is typically used in real-time rendering. Often, most of
the pipeline steps are implemented in hardware, which allows for special
optimizations. The term "pipeline" is used in a similar sense to the pipeline in
processors: the individual steps of the pipeline run in parallel as long as any given
step has what it needs.
A graphics pipeline can be divided into three main parts: Application, Geometry and
Rasterization.
Application
The application step is executed by the software on the main processor (CPU). During
the application step, changes are made to the scene as required, for example, by user
interaction by means of input devices or during an animation. The new scene with all
its primitives, usually triangles, lines and points, is then passed on to the next step in
the pipeline.
Examples of tasks that are typically done in the application step are collision detection,
animation, morphing, and acceleration techniques using spatial subdivision schemes
such as Quadtrees or Octrees. These are also used to reduce the amount of main
memory required at a given time. The "world" of a modern computer game is far
larger than what could fit into memory at once.
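As a minimal sketch (every routine here is a hypothetical placeholder for application-specific code), the application step amounts to a CPU-side update pass over the scene before the primitives are handed on:

    typedef struct Scene Scene;  /* opaque application scene data */

    /* Hypothetical placeholders for the tasks named above. */
    void process_input(Scene *scene);               /* user interaction     */
    void update_animation(Scene *scene, double dt); /* animation, morphing  */
    void detect_collisions(Scene *scene);           /* collision detection  */
    void cull_with_octree(Scene *scene);            /* spatial subdivision  */
    void submit_primitives(Scene *scene);           /* pass primitives on   */

    /* One iteration of the application step, run on the CPU each frame. */
    void application_step(Scene *scene, double dt)
    {
        process_input(scene);
        update_animation(scene, dt);
        detect_collisions(scene);
        cull_with_octree(scene);
        submit_primitives(scene);  /* triangles, lines and points go to
                                      the geometry step */
    }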
Geometry
The geometry step (the geometry pipeline), which is responsible for the majority of
the operations on polygons and their vertices (the vertex pipeline), can be divided
into the following five tasks. How these tasks are organized as actual parallel
pipeline steps depends on the particular implementation.
Camera Transformation
In addition to the objects, the scene also defines a virtual camera or viewer that
indicates the position and direction of view relative to which the scene is rendered.
The scene is transformed so that the camera is at the origin looking along the Z axis.
The resulting coordinate system is called the camera coordinate system and the
transformation is called camera transformation or View Transformation.
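As a minimal sketch, assuming column vectors and the convention that the camera ends up at the origin looking along +Z (other implementations look along -Z), the camera transformation combines a rotation into the camera's basis with a translation by the camera position:

    #include <math.h>

    typedef struct { float m[4][4]; } Mat4;
    typedef struct { float x, y, z; } Vec3;

    /* Standard vector helpers (not part of the original text). */
    static Vec3 sub(Vec3 a, Vec3 b) { return (Vec3){a.x-b.x, a.y-b.y, a.z-b.z}; }
    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3 cross(Vec3 a, Vec3 b) {
        return (Vec3){a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
    }
    static Vec3 normalize(Vec3 v) {
        float len = sqrtf(dot(v, v));
        return (Vec3){v.x/len, v.y/len, v.z/len};
    }

    /* Build the view matrix that moves the camera at 'eye', looking at
       'target', to the origin looking along +Z. */
    Mat4 camera_transform(Vec3 eye, Vec3 target, Vec3 up)
    {
        Vec3 f = normalize(sub(target, eye)); /* forward: new Z axis */
        Vec3 r = normalize(cross(up, f));     /* right:   new X axis */
        Vec3 u = cross(f, r);                 /* true up: new Y axis */
        Mat4 v = {{
            { r.x, r.y, r.z, -dot(r, eye) },
            { u.x, u.y, u.z, -dot(u, eye) },
            { f.x, f.y, f.z, -dot(f, eye) },
            { 0,   0,   0,    1           },
        }};
        return v;
    }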
Projection
The 3D projection step transforms the view volume into a cube with corner
coordinates (-1, -1, 0) and (1, 1, 1); occasionally other target volumes are used. This
step is called projection, even though it transforms a volume into another volume: the
resulting Z coordinates are not stored in the image, but are only used for Z-buffering
in the later rasterization step. For a perspective image a central projection is used. To
limit the number of displayed objects, two additional clipping planes are used; the
view volume is therefore a truncated pyramid (frustum). Parallel or orthogonal
projection is used, for example, for technical representations, because it has the
advantage that all parallels in object space are also parallel in image space, and
surfaces and volumes have the same size regardless of the distance from the viewer.
Maps, for example, use an orthogonal projection (a so-called orthophoto), but oblique
images of a landscape cannot be produced this way: although they can technically be
rendered, they appear so distorted as to be of little use.
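A minimal sketch of a perspective (central) projection matrix that maps the view frustum to the cube with corners (-1, -1, 0) and (1, 1, 1) described above; the parameters fov_y, aspect, zn and zf (vertical field of view, aspect ratio, near and far planes) are assumptions, and the camera is assumed to look along +Z as in the previous sketch:

    #include <math.h>

    typedef struct { float m[4][4]; } Mat4;

    /* Perspective projection onto the cube with corners (-1,-1,0) and
       (1,1,1).  fov_y is the vertical field of view in radians, aspect
       is width/height, zn and zf are the near and far clipping planes. */
    Mat4 perspective(float fov_y, float aspect, float zn, float zf)
    {
        float f = 1.0f / tanf(fov_y * 0.5f);
        Mat4 p = {{
            { f / aspect, 0, 0,               0                    },
            { 0,          f, 0,               0                    },
            { 0,          0, zf / (zf - zn), -zn * zf / (zf - zn)  },
            { 0,          0, 1,               0                    },
        }};
        return p; /* after the divide by w (= camera-space z), points at
                     the near plane map to z = 0 and at the far plane to
                     z = 1, i.e. into the target cube above */
    }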
Lighting
Often a scene contains light sources placed at different positions to make the lighting
of the objects appear more realistic. In this case, a gain factor for the texture is
calculated for each vertex based on the light sources and the material properties
associated with the corresponding triangle. In the later rasterization step, the vertex
values of a triangle are interpolated over its surface. A general lighting (ambient light)
is applied to all surfaces. It is the diffuse and thus direction-independent brightness of
the scene. The sun is a directed light source, which can be assumed to be infinitely far
away. The illumination produced by the sun on a surface is determined by forming
the scalar product of the direction vector of the sunlight and the normal vector of the
surface. If this product is negative, the surface is facing the sun.
Clipping
Only the primitives which are within the visual volume need actually be
rasterized (drawn). This visual volume is defined as the inside of a frustum, a shape
in the form of a pyramid with a cut-off top. Primitives which are completely outside
the visual volume are discarded; this is called frustum culling. Further culling
methods, such as backface culling, which reduce the number of primitives to be
considered, can in principle be executed at any step of the graphics pipeline.
Primitives which are only partially inside the cube must be clipped against the cube.
The advantage of the previous projection step is that the clipping always takes place
against the same cube. Only the (possibly clipped) primitives that lie within the
visual volume are forwarded to the final step.
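As a minimal sketch, a trivial frustum-culling test in clip space: after the projection step a point is inside the view volume when -w <= x <= w, -w <= y <= w and 0 <= z <= w (tested before the divide by w), so a triangle whose three vertices all fail the same plane can be discarded outright:

    typedef struct { float x, y, z, w; } Vec4;  /* clip-space vertex */

    /* A triangle whose three vertices all lie outside the same plane
       of the target cube cannot intersect the view volume. */
    int trivially_outside(Vec4 a, Vec4 b, Vec4 c)
    {
        if (a.x < -a.w && b.x < -b.w && c.x < -c.w) return 1;  /* left   */
        if (a.x >  a.w && b.x >  b.w && c.x >  c.w) return 1;  /* right  */
        if (a.y < -a.w && b.y < -b.w && c.y < -c.w) return 1;  /* bottom */
        if (a.y >  a.w && b.y >  b.w && c.y >  c.w) return 1;  /* top    */
        if (a.z < 0.0f && b.z < 0.0f && c.z < 0.0f) return 1;  /* near   */
        if (a.z >  a.w && b.z >  b.w && c.z >  c.w) return 1;  /* far    */
        return 0;
    }

Primitives that survive this test but cross a plane would still have to be clipped against the cube, for example with the Sutherland-Hodgman algorithm.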
Window-Viewport transformation
In order to output the image to any target area (viewport) of the screen, another
transformation, the Window-Viewport transformation, must be applied. This is a shift,
followed by scaling. The resulting coordinates are the device coordinates of the output
device.
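A minimal sketch of the window-viewport transformation, i.e. the shift and scaling described above; the viewport origin (vp_x, vp_y) and size vp_w x vp_h are assumed parameters, and screen y is assumed to grow downwards as is usual on raster devices:

    /* Map normalized device coordinates (x and y in [-1, 1] after the
       divide by w) to pixel coordinates inside the given viewport. */
    void window_to_viewport(float ndc_x, float ndc_y,
                            int vp_x, int vp_y, int vp_w, int vp_h,
                            float *px, float *py)
    {
        *px = vp_x + (ndc_x + 1.0f) * 0.5f * vp_w;
        *py = vp_y + (1.0f - ndc_y) * 0.5f * vp_h; /* flip the y axis */
    }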
Rasterization
The rasterization step is the final step of the pipeline: in it, all primitives are
rasterized, i.e. discrete fragments are created from continuous primitives.
In this stage of the graphics pipeline, the grid points are called fragments to
distinguish them from the pixels of the screen. Each fragment corresponds to one
pixel in the frame buffer, which in turn corresponds to one pixel of the screen.
Fragments can be coloured (and possibly illuminated). Furthermore, in the case of
overlapping polygons it is necessary to determine the fragment that is visible, i.e. the
one closer to the observer. A Z-buffer is usually used for this so-called hidden surface
determination. The colour of a fragment depends on the illumination, texture, and
other material properties of the visible primitive and is often interpolated from the
properties of the triangle's vertices. Where available, a fragment shader (also called a
pixel shader) is run in the rasterization step for each fragment of the object. If a
fragment is visible, it can be blended with the colour values already in the image if
transparency or multi-sampling is used. In this step, one or more fragments become a
pixel.
To prevent the user from seeing the gradual rasterization of the primitives, double
buffering is used: the rasterization is carried out in a separate memory area, and once
the image has been completely rasterized it is copied to the visible area of the image
memory.
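A minimal sketch of double buffering as just described: the scene is rasterized into an off-screen buffer and then copied to the visible buffer in one step (rasterize_scene() is a hypothetical placeholder):

    #include <string.h>

    #define WIDTH  640
    #define HEIGHT 480

    static unsigned int front_buf[HEIGHT][WIDTH]; /* shown on screen  */
    static unsigned int back_buf[HEIGHT][WIDTH];  /* drawn off-screen */

    /* Hypothetical placeholder: rasterize the scene into a buffer. */
    void rasterize_scene(unsigned int buf[HEIGHT][WIDTH]);

    void render_frame(void)
    {
        rasterize_scene(back_buf);                 /* draw off-screen */
        memcpy(front_buf, back_buf, sizeof front_buf); /* then show it */
    }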
Suppose we are given a set of 3-D objects and a viewing position. To generate a
realistic graphics display, we wish to determine which lines or surfaces of the objects
are visible, either from the Center of Projection (COP) (for perspective projections) or
along the direction of projection (for parallel projections), so that we can display only
the visible lines or surfaces. For this we conduct visibility tests, which determine the
surfaces that are visible from a given viewpoint. This process is known as visible-line
or visible-surface determination, or hidden-line or hidden-surface elimination.
For each pixel position (x,y) on the view plane, the surface with the smallest z-
coordinate at that position is visible. For example, Figure 3 shows three surfaces S1,
S2 and S3, of which S1 has the smallest z-value at position (x,y). Surface S1 is
therefore visible at that position, and its intensity value at (x,y) is saved in the
refresh buffer.
Here the projection is orthographic and the projection plane is taken to be the xy-plane.
So each (x,y,z) position on the polygon surfaces corresponds to the orthographic
projection point (x,y) on the projection plane. Therefore, for each pixel position (x,y)
on the view plane, object depths can be compared by comparing z-values, as shown in
Figure 3. To implement the z-buffer algorithm, two buffer areas (two 2-D arrays) are
required: a depth buffer that stores the smallest z-value found so far at each pixel, and
a refresh buffer that stores the corresponding intensity value.
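A minimal sketch of the z-buffer algorithm using the two arrays just described, with smaller z meaning closer to the viewer as above; surface_depth() and surface_intensity() are hypothetical routines that evaluate a surface at a pixel, and a real implementation would visit only the pixels each surface actually covers:

    #include <float.h>

    #define WIDTH  640
    #define HEIGHT 480

    static float        depth_buf[HEIGHT][WIDTH];   /* smallest z so far */
    static unsigned int refresh_buf[HEIGHT][WIDTH]; /* its intensity     */

    /* Hypothetical routines evaluating a surface at pixel (x, y). */
    float        surface_depth(int surface, int x, int y);
    unsigned int surface_intensity(int surface, int x, int y);

    void z_buffer(int num_surfaces)
    {
        /* Initialise: every pixel starts infinitely far away. */
        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++) {
                depth_buf[y][x]   = FLT_MAX;
                refresh_buf[y][x] = 0; /* background intensity */
            }

        /* For each surface and pixel, keep the surface with the
           smallest z, i.e. the one closest to the viewer. */
        for (int s = 0; s < num_surfaces; s++)
            for (int y = 0; y < HEIGHT; y++)
                for (int x = 0; x < WIDTH; x++) {
                    float z = surface_depth(s, x, y);
                    if (z < depth_buf[y][x]) {
                        depth_buf[y][x]   = z;
                        refresh_buf[y][x] = surface_intensity(s, x, y);
                    }
                }
    }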