Video Display Unit (VDU) : Analogue TV & Monitors Computer Monitors
A CRT scanned the display incessantly, so it needed a real-time stream of pixel data.
Although in principle LCDs could be ‘abused’ here, the standard interface has been retained.
CEC
Frame store
Based on a 2D array of memory (frame store) with a ‘numeric’ representation of a pixel’s colour.
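[Figure: the frame store drawn as a 2D grid of addresses (0, 1, 2, 3, … along the first row; 1280, 1281, … along the next), with one example pixel holding R = FF, G = FF, B = 00.]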
Each location has an address; the data at a location may be a byte, several bytes, or even less than a byte.
(The first address does not have to be 0000_0000.)
Each pixel’s data represents a colour: e.g. one byte/pixel gives 256 possible colours.
Colours are often separated into Red, Green and Blue intensities.
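The addressing and colour representation described above can be sketched as a small software model. This is only an illustration: the width of 1280 pixels per line, the base address of zero, the 'one address per pixel' layout and the 8-bits-per-channel packing order are all assumptions, and the function names are hypothetical.

```python
WIDTH = 1280          # assumed number of pixels per line
BASE_ADDRESS = 0      # assumed; the first address does not have to be zero


def pixel_address(x, y, base=BASE_ADDRESS, width=WIDTH):
    """Address of pixel (x, y), assuming one address per pixel, row by row."""
    return base + y * width + x


def pack_rgb(red, green, blue):
    """Pack three 8-bit intensities into one number (assumed order: R, G, B)."""
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value):
    """Split a packed pixel value back into its (R, G, B) intensities."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


# R = FF, G = FF, B = 00 (full red plus full green, no blue) is yellow.
yellow = pack_rgb(0xFF, 0xFF, 0x00)
```

With these assumptions the first pixel of the second line is at address 1280, and a different base address simply offsets every result.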
Frame store
❏ Pixels need to be read many times per second to keep the display stable. This impacts:
❍ The output rate to the DVI (or other display interface).
❍ The need for RAM access to the frame store.
Frame store bandwidth is critical.
The frame store needs to be read to supply this demand. If a single pixel (32-bit
word) were read at this rate the memory would need to cycle in <10 ns; this is not
really feasible for the ‘big’ memory devices needed (multi-megabyte even assuming
a single frame store, and there could be more than one). Thus there needs to be a
means of increasing the memory bandwidth. Fortunately the read-out patterns are
entirely predictable; it’s easy enough to read the frame store many words wide
and then serialise this data.
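The bandwidth argument can be checked with some rough arithmetic. The sketch below assumes a pixel clock of 108 MHz (typical of a 1280 x 1024, 60 Hz mode including blanking) and 4 bytes per pixel; neither figure is taken from these notes, and the function names are hypothetical.

```python
PIXEL_CLOCK_HZ = 108e6    # assumed pixel clock, not a figure from the notes
BYTES_PER_PIXEL = 4       # one 32-bit word per pixel


def pixel_period_ns(pixel_clock_hz):
    """Time available per pixel, in nanoseconds."""
    return 1e9 / pixel_clock_hz


def memory_cycle_ns(pixel_clock_hz, pixels_per_read):
    """Time available per memory read when each read fetches several pixels."""
    return pixels_per_read * pixel_period_ns(pixel_clock_hz)


def read_bandwidth_bytes_per_s(pixel_clock_hz, bytes_per_pixel=BYTES_PER_PIXEL):
    """Sustained read bandwidth needed while pixels are being output."""
    return pixel_clock_hz * bytes_per_pixel


one_pixel = pixel_period_ns(PIXEL_CLOCK_HZ)        # a little over 9 ns
eight_wide = memory_cycle_ns(PIXEL_CLOCK_HZ, 8)    # about 74 ns per read
```

Reading eight pixels at a time and serialising them relaxes the memory cycle time by a factor of eight, while the total bandwidth required is unchanged.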
Also note that, at least if implementing animation, there is another bandwidth
requirement to allow concurrent writing of the pixels, and a real-time limit too.
In the absence of dual-port memory the accesses must either interleave in time
(a typical solution) or two (smaller) separate and switchable frame store
memories are needed (expensive).
y = m.x + c
Line is aliased onto pixel array.
[Figure: a line starting at (X0,Y0) aliased onto the pixel array, alongside an anti-aliased version.]
Problems:
❏ Division needed once
❏ Multiplication needed constantly
❏ Rounding errors
Anti-aliasing requires considerably more calculation and more memory operations
(including reading the pre-existing background).
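The three problems listed above are visible in a direct software rendering of y = m.x + c. This is a minimal sketch, assuming a line in the first octant (X0 < X1 and a slope between 0 and 1); the function name is hypothetical and no anti-aliasing is attempted.

```python
import math


def draw_line_naive(x0, y0, x1, y1):
    """Return the pixels of a first-octant line using y = m.x + c directly."""
    m = (y1 - y0) / (x1 - x0)      # the division, needed once
    c = y0 - m * x0
    pixels = []
    for x in range(x0, x1 + 1):
        y = m * x + c              # a multiplication on every iteration
        # Rounding to the nearest pixel row is where the line is aliased
        # and where fractional errors can creep in.
        pixels.append((x, math.floor(y + 0.5)))
    return pixels


pixels = draw_line_naive(0, 0, 6, 2)
```

The integer-only formulation later in these notes avoids both the per-pixel multiplication and the fractional arithmetic.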
University of Manchester School of Computer Science
Octants
The foregoing assumes that the line is in the shaded octant, shown here. If it is
not, the same approach can be followed with some slight variations.
In this example, x is incremented and y is incremented conditionally. For the
octant immediately below the x axis, x is incremented and y is conditionally
decremented. As long as the coordinates are modified in the correct way the signs
of the internal variables are irrelevant.
Similarly, if the slope of the line is >1 (i.e. ‘steeper than 45°’) then x and y are
exchanged. A similar transformation can be applied if the line is going ‘right’ or
‘down’.
Similar triangles
The gradient (‘m’) of a step from one pixel to the next is derived from the
vertical/horizontal distances between end points (m = dy/dx). Although ‘m’ is
typically fractional (0 ≤ m ≤ 1) the distances between endpoints are integers.
[Figure: similar triangles on the line, one with sides 1 and m, the other with sides dx and dy.]
Optimisation
There is another optimisation which reduces the length of the loop by simplifying
the ‘plot’ operation. Instead of translating coordinates on each iteration,
simply work out the address of the starting point and retain that. Using the
assumptions of ‘one address per pixel’ and ‘640 pixels per line’, the following
translations take place:
x = x + 1 ⇒ address = address + 1
y = y + 1 ⇒ address = address + 640
The plot no longer needs to do any translation, just the store.
A disadvantage of this method is that running off the edge of the frame store is
not apparent, as it may be when clipping the x and y coordinates.
If you have more than one pixel/word in the frame store (as in the lab.) then one
can speed up drawing by writing several pixels at once. These pixels must be in
the same word and so will form a horizontal group. This is not very useful when
drawing single lines because there will often not be several adjacent pixels
within the same word.
It is very useful when filling areas (e.g. clear screen) and similar (e.g. character
drawing) where it can reduce drawing times by a factor of, e.g., 4.
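The address-stepping optimisation (x + 1 becomes address + 1, y + 1 becomes address + 640) can be sketched in software as follows. This is an illustration only: it assumes one address per pixel, 640 pixels per line, a first-octant line that lies wholly inside the frame store, and a frame store modelled as a Python list. The function name is hypothetical.

```python
LINE_PITCH = 640      # addresses per line: 'one address per pixel', 640 pixels


def draw_line_by_address(frame_store, x0, y0, x1, y1, colour, base=0):
    """Draw a first-octant line, translating coordinates to an address once."""
    dx = x1 - x0
    dy = y1 - y0
    e = -dx
    address = base + y0 * LINE_PITCH + x0    # the only coordinate translation
    for _ in range(dx):
        frame_store[address] = colour        # 'plot' is now just the store
        address += 1                         # x = x + 1
        e += 2 * dy
        if e >= 0:
            address += LINE_PITCH            # y = y + 1
            e -= 2 * dx
    frame_store[address] = colour            # final pixel
    # No clipping is done here: nothing checks that the address stays
    # inside the visible frame.


frame_store = [0] * (640 * 480)
draw_line_by_address(frame_store, 0, 0, 4, 2, colour=1)
written = [a for a, value in enumerate(frame_store) if value]
```

The loop never touches x or y after the starting address has been computed, which is the point of the optimisation and also the reason that running off the edge goes unnoticed.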
Parallelism
Identifying parallelism is a good plan: e.g. Bresenham’s line algorithm
2 clocks/iteration:
    x <= X0;
    y <= Y0;
    dx <= X1 - X0;
    dy <= Y1 - Y0;
    e <= -dx;
    for (dx)
      plot(x,y);
      x <= x + 1;
      e <= e + 2*dy;
      if (e >= 0)
        y <= y + 1;
        e <= e - 2*dx;
    plot(x,y);
1 clock/iteration:
    x <= X0;
    y <= Y0;
    dx <= X1 - X0;
    dy <= Y1 - Y0;
    e <= -dx;
    for (dx)
      plot(x,y);
      x <= x + 1;
      if (e + 2*dy >= 0)
        y <= y + 1;
        e <= e + 2*(dy - dx);
      else
        e <= e + 2*dy;
    plot(x,y);
Also note the pipelining here: plot overlaps with the next pixel calculation.
In the second example the critical path is likely to be longer (‘if’ calculation followed by multiplexer)
but not much worse (multiplexers are quick).
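A software model can confirm that the two formulations of the loop produce the same pixels. The sketch below models the algorithm only, not the clocking: Python statements run in order, so the ‘2 clocks’ version updates e and then tests it, while the ‘1 clock’ version computes every new value from the old e. The function names are hypothetical and a first-octant line is assumed.

```python
def line_two_step(x0, y0, x1, y1):
    """Error update followed by a test of the updated error (two steps)."""
    x, y = x0, y0
    dx, dy = x1 - x0, y1 - y0
    e = -dx
    pixels = []
    for _ in range(dx):
        pixels.append((x, y))
        x += 1
        e += 2 * dy             # step 1: update the error
        if e >= 0:              # step 2: test it and correct
            y += 1
            e -= 2 * dx
    pixels.append((x, y))
    return pixels


def line_one_step(x0, y0, x1, y1):
    """Test and update folded together, using only the old value of e."""
    x, y = x0, y0
    dx, dy = x1 - x0, y1 - y0
    e = -dx
    pixels = []
    for _ in range(dx):
        pixels.append((x, y))
        x += 1
        if e + 2 * dy >= 0:
            y += 1
            e += 2 * (dy - dx)
        else:
            e += 2 * dy
    pixels.append((x, y))
    return pixels


same = all(
    line_two_step(0, 0, dx, dy) == line_one_step(0, 0, dx, dy)
    for dx in range(1, 20)
    for dy in range(0, dx + 1)
)
```

Folding the addition into the comparison removes the dependency between the two steps, which is what allows the hardware version to finish an iteration in a single clock.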
Parallelism
Probably the biggest ‘mistake’ made by people starting to develop HDL code is
to think serially, as in a conventional (imperative) programming language. In C,
Java, assembly language etc. statements can be viewed as executing one after
the other … because they need to (at least in principle).
In hardware the only needs are due to dependencies and resources, and
resources shouldn’t be too much of an issue within this lab. Thus statements
need to be mapped into time slots but as many statements as possible can go in
the same time slot. This leads to a much faster implementation than a simple
one-statement-per-clock machine.
The number of serial processing steps which take place in a single cycle (i.e. the
critical path length) also concerns the designer; however the clock cycle is
generous in the lab. so it is not likely to be a major concern when describing logic.
When developing your own code, design it before you implement. Plan what
should happen (e.g. on a piece of paper) in each clock cycle.
Pay attention to which values are latched. A common problem is that a value is
only available after a clock edge when you want it in the current cycle. The
choice is then whether to derive the signal combinatorially so that it is available
a bit earlier or whether to start work a cycle earlier. See the problem below.
Problem
Fill in the timing diagram for this module.
    reg [3:0] counter;
    reg carry;
    always @ (posedge clk)
      if (en && carry_in)        // Hint on fn. of ‘carry’
        begin
          if (counter == 9)
            begin
              counter <= 0;
              carry <= 1;
            end
          else
            begin
              counter <= counter + 1;
              carry <= 1;
            end
        end
    clk
    counter   7   8   9   0
    carry
The circuit is unlikely to be useful. Rewrite the Verilog in at least one way to do
what the designer presumably intended.
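The general point about latched values can be illustrated with a small software model. This is a sketch of a hypothetical 0 to 9 counter with an ‘at 9’ flag, not a transcription of the listing above: it assumes the counter is always enabled and compares a flag held in a register with the same condition derived combinatorially. All names are illustrative.

```python
def simulate(start, cycles):
    """Return (counter, registered_flag, combinational_flag) for each cycle."""
    counter = start
    registered_flag = 0
    rows = []
    for _ in range(cycles):
        # Combinational: derived from the current counter value, so it is
        # valid during the same cycle in which the counter holds 9.
        combinational_flag = int(counter == 9)
        rows.append((counter, registered_flag, combinational_flag))
        # Clock edge: both registers load values computed from the state
        # before the edge, like non-blocking assignments.
        next_counter = 0 if counter == 9 else counter + 1
        next_flag = int(counter == 9)
        counter, registered_flag = next_counter, next_flag
    return rows


rows = simulate(start=7, cycles=4)
```

In this model the registered flag is high while the counter shows 0, one cycle after the combinational flag, which is the ‘only available after a clock edge’ effect described in the notes.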