CHAPTER8IMAGES
Digital images and image formats
An important type of digital media is images, and in this chapter we are going to
review how images are represented and how they can be manipulated with sim-
ple mathematics. This is useful general knowledge for anyone who has a digital
camera and a computer, but for many scientists it is an essential tool. In
astrophysics, data from both satellites and distant stars and galaxies are collected
in the form of images, and information is extracted from the images with advanced
image processing techniques. Medical imaging makes it possible to gather
different kinds of information in the form of images, even from the inside of the
body. By analysing these images it is possible to discover tumours and other
disorders.
8.1.1 Light
126 CHAPTER 8. DIGITAL IMAGES AND IMAGE FORMATS
Our focus will be on objects that emit light, for example a computer display. A
computer monitor consists of a rectangular array of small dots which emit light.
In most technologies, each dot is really three smaller dots, which emit red,
green and blue light, respectively. If the amounts of red, green and blue are
varied, our brain merges the light from the three small light sources and
perceives light of different colours. In this way the colour at each set of three
dots can be controlled, and a colour image can be built from the full array
of dots.
It is important to realise that it is possible to generate most, but not all,
colours by mixing red, green and blue. In addition, different computer monitors
use slightly different red, green and blue colours, and unless this is taken into
consideration, the same image will look different on different monitors. This also means
that some colours that can be displayed on one monitor may not be displayable
on a different monitor.
Printers use the same principle of building an image from small dots. On
most printers however, the small dots do not consist of smaller dots of different
colours. Instead as many as 7–8 different inks (or similar substances) are mixed
to produce the right colour. This makes it possible to produce a wide range of
colours, but not all, and the problem of matching a colour from another device like
a monitor is at least as difficult as matching colours across different monitors.
Video projectors build an image that is projected onto a wall. The final image
is therefore a reflected image, and it is important that the surface is white so
that it reflects all colours equally.
The quality of a device is closely linked to the density of the dots.
8.1. WHAT IS AN IMAGE? 127
Fact 8.2 (Resolution). The resolution of a medium is the number of dots per
inch (dpi). The number of dots per inch for monitors is usually in the range
70–120, while for printers it is in the range 150–4800 dpi. The horizontal and
vertical densities may be different. On a monitor the dots are usually referred
to as pixels (picture elements).
Fact 8.3. The resolution of a scanner usually varies in the range 75 dpi to 9600
dpi, and the colour is represented with up to 48 bits per dot.
For digital cameras it does not make sense to measure the resolution in dots
per inch, as this depends on how the image is printed (its size). Instead the
resolution is measured in the number of dots recorded.
Fact 8.4. The number of pixels recorded by a digital camera usually varies
in the range 320 × 240 to 6000 × 4000 with 24 bits of colour information per
pixel. The total number of pixels varies in the range 76 800 to 24 000 000 (0.077
megapixels to 24 megapixels).
For scanners and cameras it is easy to think that the more dots (pixels), the
better the quality. Although there is some truth to this, there are many other
factors that influence the quality. The main problem is that the measured colour
information is very easily polluted by noise. And of course high resolution also
means that the resulting files become very big; an uncompressed 6000 × 4000
image produces a 72 MB file. The advantage of high resolution is that you can
magnify the image considerably and still maintain reasonable quality.
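The file size quoted above follows directly from the pixel count and the number of bits per pixel. The following is a minimal sketch of that arithmetic; the function name `uncompressed_size_bytes` is made up for this illustration, and 1 MB is taken to mean 10^6 bytes.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel=24):
    """Bytes needed to store every pixel without any compression."""
    return width * height * bits_per_pixel // 8

pixels = 6000 * 4000
size = uncompressed_size_bytes(6000, 4000)
print(pixels / 1e6, "megapixels")  # 24.0 megapixels
print(size / 1e6, "MB")            # 72.0 MB, with 1 MB = 10**6 bytes
```

The same function applied to the smallest format in fact 8.4 gives 320 × 240 × 3 = 230 400 bytes.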
Figure 8.1. Different versions of the same image: black and white (a), grey-level (b), and colour (c).
$$p_{i,j} = (r_{i,j}, g_{i,j}, b_{i,j}),$$
that denote the amount of red, green and blue at the point (i , j ).
Fact 8.6. In these notes the intensity values $p_{i,j}$ are assumed to be real
numbers in the interval [0, 1]. For colour images, each of the red, green, and blue
Figure 8.2. Two excerpts of the colour image in figure 8.1. The dots indicate the position of the points $(i, j)$.
If we magnify a small part of the colour image in figure 8.1, we obtain the
image in figure 8.2 (the black lines and dots have been added). As we can see,
the pixels have been magnified to big squares. This is a standard representation
used by many programs — the actual shape of the pixels will depend on the
output medium. Nevertheless, we will consider the pixels to be square, with
integer coordinates at their centres, as indicated by the grids in figure 8.2.
Fact 8.7 (Shape of pixel). The pixels of an image are assumed to be square
with sides of length one, with the pixel with value $p_{i,j}$ centred at the point
$(i, j)$.
we can consider a grey-level image as a function $P : \mathbb{Z}_{m,n} \mapsto [0, 1]$. In other words,
we may consider an image to be a sampled version of a surface with the intensity
value denoting the height above the (x, y)-plane, see figure 8.3.
Figure 8.3. The grey-level image in figure 8.1 plotted as a surface. The height above the (x, y)-plane is given
by the intensity value.
for $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
$$P : \mathbb{Z}_{m,n} \mapsto \mathbb{R}^3.$$
We have assumed that the intensities all lie in the interval [0, 1], but as we noted,
many formats in fact use integer values in the range 0–255. And as we perform
computations with the intensities, we quickly end up with intensities outside
[0, 1] even if we start out with intensities within this interval. We therefore need
to be able to normalise the intensities. This we can do with the simple linear
function in observation 5.23,
$$g(x) = \frac{x - a}{b - a}, \quad a < b,$$
which maps the interval [a, b] to [0, 1]. A simple case is mapping [0, 255] to [0, 1]
which we accomplish with the scaling g (x) = x/255. More generally, we typically
perform computations that result in intensities outside the interval [0, 1]. We can
then compute the minimum and maximum intensities $p_{\min}$ and $p_{\max}$ and map
the interval $[p_{\min}, p_{\max}]$ back to [0, 1]. Several examples of this will be shown
below.
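The normalisation described above can be sketched in a few lines. The function name `normalise` and the choice of a flat list of intensities are assumptions made for this illustration; for an image stored as a list of rows, the same map would be applied to every row using the global minimum and maximum.

```python
def normalise(values, a=None, b=None):
    """Map [a, b] linearly onto [0, 1] with g(x) = (x - a) / (b - a).

    If a or b is omitted, the minimum or maximum of the data is used.
    """
    if a is None:
        a = min(values)
    if b is None:
        b = max(values)
    if not a < b:
        raise ValueError("normalisation requires a < b")
    return [(x - a) / (b - a) for x in values]

# Integer intensities in 0-255 mapped to [0, 1], i.e. g(x) = x / 255.
print(normalise([0, 51, 255], 0, 255))   # [0.0, 0.2, 1.0]

# Computed intensities outside [0, 1] mapped back using p_min and p_max.
print(normalise([-0.5, 0.0, 1.5]))       # [0.0, 0.25, 1.0]
```

Note that the map is undefined when all intensities are equal, which is why the sketch raises an error in that case.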
Figure 8.5. The red (a), green (b), and blue (c) components of the colour image in figure 8.1.
$$P_r = (r_{i,j})_{i,j=1}^{m,n}, \quad P_g = (g_{i,j})_{i,j=1}^{m,n}, \quad P_b = (b_{i,j})_{i,j=1}^{m,n}.$$
Figure 8.6. Alternative ways to convert the colour image in figure 8.1 to a grey-level image. In (a) each colour
triple has been replaced by its maximum, in (b) each colour triple has been replaced by its sum and the result
mapped to [0, 1], while in (c) each triple has been replaced by its length and the result mapped to [0, 1].
Figure 8.7. The negative versions of the corresponding images in figure 8.6.
In practice one of the last two methods is usually preferred, perhaps with
a preference for the last method, but the actual choice depends on the
application.
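The three conversions described in the caption of figure 8.6 can be sketched as follows. This is only an illustration under stated assumptions: a colour image is taken to be a list of rows of (r, g, b) triples with values in [0, 1], the function names are invented here, and "mapped to [0, 1]" is interpreted as the linear min-max normalisation discussed earlier (other scalings are possible).

```python
import math

def normalise_image(img):
    """Linearly map the smallest value to 0 and the largest to 1."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if lo == hi:                      # constant image: nothing to spread out
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]

def grey_max(img):
    """Replace each colour triple by its maximum."""
    return [[max(p) for p in row] for row in img]

def grey_sum(img):
    """Replace each colour triple by its sum, then map to [0, 1]."""
    return normalise_image([[sum(p) for p in row] for row in img])

def grey_length(img):
    """Replace each colour triple by its Euclidean length, then map to [0, 1]."""
    return normalise_image(
        [[math.sqrt(sum(c * c for c in p)) for p in row] for row in img]
    )

colour = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
          [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
print(grey_max(colour))
print(grey_sum(colour))
print(grey_length(colour))
```

For the pure red pixel the three methods give clearly different grey values (1, 1/3 and about 0.58), which illustrates why the choice depends on the application.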
Figure 8.8. The plots in (a) and (b) show some functions that can be used to improve the contrast of an image.
In (c) the middle function in (a) has been applied to the intensity values of the image in figure 8.6c, while in
(d) the middle function in (b) has been applied to the same image.
somehow spread out the values. This can be accomplished by applying a
function $f$ to the intensity values, i.e., new intensity values are computed by the
formula
$$\hat{p}_{i,j} = f(p_{i,j})$$
for all i and j . If we choose f so that its derivative is large in the area where many
intensity values are concentrated, we obtain the desired effect.
Figure 8.8 shows some examples. The functions in the left plot have quite
large derivatives near x = 0.5 and will therefore increase the contrast in images
with a concentration of intensities with value around 0.5. The functions are all
of the form
$$f_n(x) = \frac{\arctan\bigl(n(x - 1/2)\bigr)}{2\arctan(n/2)} + \frac{1}{2}. \tag{8.1}$$
For any $n \neq 0$ these functions satisfy the conditions $f_n(0) = 0$ and $f_n(1) = 1$. The
three functions in figure 8.8a correspond to $n = 4$, 10, and 100.
Functions of the kind shown in figure 8.8b have a large derivative near x = 0
and will therefore increase the contrast in an image with a large proportion of
small intensity values, i.e., very dark images. The functions are given by
$$g_\epsilon(x) = \frac{\ln(x + \epsilon) - \ln \epsilon}{\ln(1 + \epsilon) - \ln \epsilon}, \tag{8.2}$$
and the ones shown in the plot correspond to $\epsilon = 0.1$, 0.01, and 0.001.
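The two families of contrast functions in (8.1) and (8.2) are easy to evaluate directly. The sketch below assumes a grey-level image stored as a list of rows of numbers in [0, 1]; the names `f`, `g` and `adjust_contrast` are chosen for this illustration only.

```python
import math

def f(n, x):
    """Equation (8.1): steep near x = 0.5, with f(n, 0) = 0 and f(n, 1) = 1."""
    return math.atan(n * (x - 0.5)) / (2 * math.atan(n / 2)) + 0.5

def g(eps, x):
    """Equation (8.2): steep near x = 0, with g(eps, 0) = 0 and g(eps, 1) = 1."""
    return (math.log(x + eps) - math.log(eps)) / (math.log(1 + eps) - math.log(eps))

def adjust_contrast(img, func):
    """Apply an intensity map to every pixel of a grey-level image."""
    return [[func(p) for p in row] for row in img]

image = [[0.0, 0.05, 0.1], [0.45, 0.5, 0.55], [0.9, 0.95, 1.0]]
mid_contrast = adjust_contrast(image, lambda p: f(10, p))
dark_contrast = adjust_contrast(image, lambda p: g(0.01, p))
```

With n = 10 the mid-grey values 0.45 and 0.55 are pushed further apart, while with ε = 0.01 it is the dark values 0.05 and 0.1 that are separated and lifted towards brighter intensities.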
In figure 8.8c the middle function in (a) has been applied to the image in
figure 8.6c. Since the image was quite well balanced, this has made the dark
areas too dark and the bright areas too bright. In figure 8.8d the function in (b)
has been applied to the same image. This has made the image as a whole too
bright, but has brought out the details of the road which was very dark in the
original.
We will see more examples of how the contrast in an image can be enhanced
when we try to detect edges below.
8.2. OPERATIONS ON IMAGES 137
$$\frac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}. \tag{8.3}$$
We can smooth an image with this array by placing the centre of the array on
a pixel, multiplying the pixel and its neighbours by the corresponding weights,
summing up and dividing by the total sum of the weights. More precisely, we
would compute the new pixels by
$$\hat{p}_{i,j} = \frac{1}{16}\Bigl(4p_{i,j} + 2(p_{i,j-1} + p_{i-1,j} + p_{i+1,j} + p_{i,j+1}) + p_{i-1,j-1} + p_{i+1,j-1} + p_{i-1,j+1} + p_{i+1,j+1}\Bigr).$$
Since the weights sum to one, the new intensity value $\hat{p}_{i,j}$ is a weighted average
of the intensity values on the right. The array of numbers in (8.3) is in fact an
example of a computational molecule, see figure 7.3. For simplicity we have
omitted the details in the drawing of the computational molecule. We could have
used equal weights for all nine pixels, but it seems reasonable that the weight of
a pixel should be larger the closer it is to the centre pixel.
As for audio, the values used are taken from Pascal’s triangle, since these
weights are known to give a very good smoothing effect. A larger filter is given
by the array
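$$\frac{1}{4096}\begin{pmatrix}
1 & 6 & 15 & 20 & 15 & 6 & 1 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
20 & 120 & 300 & 400 & 300 & 120 & 20 \\
15 & 90 & 225 & 300 & 225 & 90 & 15 \\
6 & 36 & 90 & 120 & 90 & 36 & 6 \\
1 & 6 & 15 & 20 & 15 & 6 & 1
\end{pmatrix}. \tag{8.4}$$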
These numbers are taken from row six of Pascal's triangle. More precisely, the
value in row $k$ and column $l$ (both counted from 0) is given by the product
$\binom{6}{k}\binom{6}{l}$. The scaling 1/4096 comes from the fact that the sum
of all the numbers in the table is $2^{6+6} = 4096$.
The result of applying the two filters in (8.3) and (8.4) to an image is shown
in figure 8.9 (b) and (c) respectively. The smoothing effect is clearly visible.
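The smoothing filters built from Pascal's triangle can be sketched as below. The function names are invented for this illustration, and the treatment of the image border is an assumption: the text does not say what happens when the filter reaches outside the image, so here pixels outside are given the value of the nearest edge pixel.

```python
from math import comb

def pascal_filter(n):
    """Weights binom(n, k) * binom(n, l) scaled by 1 / 2**(2n) so they sum to one.

    n = 2 gives the 3 x 3 array (8.3), n = 6 the 7 x 7 array (8.4).
    """
    row = [comb(n, k) for k in range(n + 1)]
    total = 2 ** (2 * n)
    return [[a * b / total for b in row] for a in row]

def smooth(img, weights):
    """Replace each pixel by a weighted average of itself and its neighbours.

    Assumption: outside the image, the nearest edge pixel is used.
    """
    m, n = len(img), len(img[0])
    r = len(weights) // 2                      # distance from centre to edge of filter
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for k, wrow in enumerate(weights):
                for l, w in enumerate(wrow):
                    ii = min(max(i + k - r, 0), m - 1)
                    jj = min(max(j + l - r, 0), n - 1)
                    s += w * img[ii][jj]
            out[i][j] = s
    return out

# A single bright pixel is spread out over its neighbours.
spot = [[0.0] * 5 for _ in range(5)]
spot[2][2] = 1.0
blurred = smooth(spot, pascal_filter(2))
print(blurred[2][2], blurred[2][3], blurred[3][3])   # 0.25 0.125 0.0625
```

Because both filters are symmetric, it makes no difference whether the array is read as a correlation or a convolution.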
Figure 8.9. The images in (b) and (c) show the effect of smoothing the image in (a).
Figure 8.10. The image in (a) shows the partial derivative in the x-direction for the image in figure 8.6. In (b) the
intensities in (a) have been normalised to [0, 1] and in (c) the contrast has been enhanced with the function $f_{50}$,
equation 8.1.
This image is not very helpful since it is almost completely black. The reason
for this is that many of the intensities are in fact negative, and these are just
displayed as black. More specifically, the intensities turn out to vary in the
interval [−0.424, 0.418]. We therefore normalise and map all intensities to [0, 1]. The
result of this is shown in (b). The predominant colour of this image is an average
grey, i.e., an intensity of about 0.5. To get more detail in the image we therefore
try to increase the contrast by applying the function $f_{50}$ in equation 8.1 to each
intensity value. The result is shown in figure 8.10c which does indeed show more
detail.
It is important to understand the colours in these images. We have
computed the derivative in the x-direction, and we recall that the computed
values varied in the interval [−0.424, 0.418]. The negative value corresponds to the
largest average decrease in intensity from a pixel $p_{i-1,j}$ to a pixel $p_{i+1,j}$. The
positive value on the other hand corresponds to the largest average increase in
intensity. A value of 0 in figure 8.10a corresponds to no change in intensity
between the two pixels.
When the values are mapped to the interval [0, 1] in figure 8.10b, the small
values are mapped to something close to 0 (almost black), the maximal values
are mapped to something close to 1 (almost white), and the values near 0 are
mapped to something close to 0.5 (grey). In figure 8.10c these values have just
been emphasised even more.
Figure 8.10c tells us that in large parts of the image there is very little
variation in the intensity. However, there are some small areas where the intensity
changes quite abruptly, and if you look carefully you will notice that in these
areas there are typically both black and white pixels close together, like down the
vertical front corner of the bus. This will happen when there is a stripe of bright
or dark pixels that cuts through an area of otherwise quite uniform intensity.
Since we display the derivative as a new image, the denominator is actually
not so important as it just corresponds to a constant scaling of all the pixels;
when we normalise the intensities to the interval [0, 1] this factor cancels out.
We sum up the computation of the partial derivative by giving its computa-
tional molecule.
$$\frac{1}{2}\begin{pmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$
As we remarked above, the factor 1/2 can usually be ignored. We have in-
cluded the two rows of 0s just to make it clear how the computational molecule
is to be interpreted; it is obviously not necessary to multiply by 0.
$$\frac{1}{2}\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}.$$
The result is shown in figure 8.12b. The intensities have been normalised
and the contrast enhanced by the function $f_{50}$ in (8.1).
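The two computational molecules for the first-order partial derivatives translate into simple difference formulas. The sketch below makes several assumptions that are not fixed by the text: the image is stored as `p[i][j]` with `i` the index in the x-direction and `j` the index in the y-direction, the function names are invented, and the derivative is simply set to 0 along the border where a neighbour is missing.

```python
import math

def partial_x(p):
    """Central difference (p[i+1][j] - p[i-1][j]) / 2; border values are left at 0."""
    m, n = len(p), len(p[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(1, m - 1):
        for j in range(n):
            d[i][j] = (p[i + 1][j] - p[i - 1][j]) / 2
    return d

def partial_y(p):
    """Central difference (p[i][j+1] - p[i][j-1]) / 2; border values are left at 0."""
    m, n = len(p), len(p[0])
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(1, n - 1):
            d[i][j] = (p[i][j + 1] - p[i][j - 1]) / 2
    return d

def gradient_length(p):
    """Length of the gradient (dP/dx, dP/dy) at every pixel."""
    dx, dy = partial_x(p), partial_y(p)
    return [[math.hypot(a, b) for a, b in zip(rx, ry)] for rx, ry in zip(dx, dy)]

# A plane P(i, j) = 2i + 3j has constant partial derivatives 2 and 3.
plane = [[2 * i + 3 * j for j in range(5)] for i in range(4)]
print(partial_x(plane)[1][2], partial_y(plane)[1][2])   # 2.0 3.0
```

The results can be negative, so before they are displayed as an image they need to be normalised to [0, 1] as described earlier.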
Figure 8.11. Computing the gradient. The image obtained from the computed gradient is shown in (a) and in
(b) the numbers have been normalised. In (c) the contrast has been enhanced with a logarithmic function.
Figure 8.12. The first-order partial derivatives in the x-direction (a) and y-direction (b), and the length of the
gradient (c). In all images, the computed numbers have been normalised and the contrast enhanced.
Figure 8.13. The second-order partial derivatives in the x-direction (a), the xy-direction (b), and the
y-direction (c). In all images, the computed numbers have been normalised and the contrast enhanced.
molecules
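$$\frac{\partial^2 P}{\partial x^2}: \quad \begin{pmatrix} 0 & 0 & 0 \\ 1 & -2 & 1 \\ 0 & 0 & 0 \end{pmatrix},$$
$$\frac{\partial^2 P}{\partial y \partial x}: \quad \frac{1}{4}\begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix},$$
$$\frac{\partial^2 P}{\partial y^2}: \quad \begin{pmatrix} 0 & 1 & 0 \\ 0 & -2 & 0 \\ 0 & 1 & 0 \end{pmatrix}.$$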
Figure 8.14. The difference between vector graphics ((a) and (c)) and raster graphics ((b) and (d)).
Figure 8.15. The character ’S’ in the font Times Roman. The dots are parameters that control the shape of the
curves.
In (d), the plot in (b) has been magnified, and here the dots have become clearly
visible. The difference is that while the plots in (b) and (d) are represented as an
image with a certain number of dots, the plots in (a) and (c) are represented in terms
of mathematical primitives like lines and curves; this is usually referred to as a
vector representation or vector graphics. The advantage of vector graphics is that
the actual dots to be used are not determined until the figure is to be drawn. This
means that in figure (c) the dots which are drawn were not determined until the
magnification was known. On the other hand, the plot in (b) was saved as an
image with a fixed number of dots, just like the pictures of the bus earlier in the
chapter. So when this image is magnified, the only possibility is to magnify the
dots themselves, which inevitably produces a grainy picture like the one in (d).
In vector graphics formats all elements of a drawing are represented in terms
of mathematical primitives. This includes all lines and curves as well as text. A
line is typically represented by its two endpoints and its width. Curved shapes
are either represented in terms of short connected line segments or smoothly
connected polynomial curve segments. Whenever a drawing on a monitor or
printer is requested, the actual dots to be printed are determined from the
mathematical representation. In particular this applies to fonts (the graphical shapes
of characters) which are usually represented in terms of quadratic or cubic
polynomial curves (so-called Bézier curves), see figure 8.15 for an example.
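To illustrate how dots can be generated from a mathematical representation only when they are needed, the sketch below evaluates a Bézier curve from its control points with de Casteljau's algorithm (repeated linear interpolation). The function names and the control points are made up for this example; real font and graphics software is considerably more elaborate.

```python
def bezier_point(control, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] by de Casteljau's algorithm."""
    pts = [tuple(p) for p in control]
    while len(pts) > 1:
        # Interpolate linearly between each pair of neighbouring points.
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def sample_curve(control, count):
    """Return `count` points (count >= 2) evenly spaced in t along the curve."""
    return [bezier_point(control, k / (count - 1)) for k in range(count)]

# Four control points define one cubic segment.
control = [(0, 0), (0, 1), (1, 1), (1, 0)]
coarse = sample_curve(control, 5)     # few dots, e.g. for a small drawing
fine = sample_curve(control, 500)     # many dots, e.g. for a magnified drawing
print(bezier_point(control, 0.5))     # (0.5, 0.75)
```

The same four control points serve for both the coarse and the fine drawing, which is exactly why a vector representation can be magnified without becoming grainy.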
TIFF. Tagged Image File Format is a flexible image format that may contain
multiple images of different types in the same file via so-called ’tags’. TIFF sup-
JPEG. Joint Photographic Experts Group is an image format that was approved
as an international standard in 1994. JPEG is usually lossy, but may also be loss-
less and has become a popular format for image representation on the Internet.
The standard defines both the algorithms for encoding and decoding and the
storage format. JPEG divides the image into 8 × 8 blocks and transforms each
block with a Discrete Cosine Transform. The values corresponding to higher
frequencies (rapid variations in colour) are then set to 0 unless they are quite
large, as this is not noticed much by human perception. The perturbed DCT
values are then coded by a variation of Huffman coding. JPEG may also use
arithmetic coding, but this increases both the encoding and decoding times, with
only about 5 % improvement in the compression ratio. The compression level
in JPEG images is selected by the user and may result in conspicuous artefacts
if set too high. JPEG is especially prone to artefacts in areas where the inten-
sity changes quickly from pixel to pixel. The extension of a JPEG-file is .jpg or
.jpeg.
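The block transform step can be illustrated with a small sketch. It is a simplification under stated assumptions and not the JPEG standard itself: it uses a direct (slow) orthonormal DCT on one 8 × 8 block, and it mimics the description above by zeroing high-frequency values below a threshold, whereas actual JPEG encoders divide the coefficients by a quantisation table and round. The names `dct2`, `idct2`, `drop_high`, `cutoff` and `threshold` are invented for this illustration.

```python
import math

N = 8  # JPEG works on 8 x 8 blocks

def _scale(k):
    """Scaling factors that make the transform orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _cos(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct2(block):
    """Two-dimensional DCT of an N x N block; entry [u][v] belongs to frequency (u, v)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeff):
    """Inverse of dct2."""
    return [[sum(_scale(u) * _scale(v) * coeff[u][v] * _cos(x, u) * _cos(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def drop_high(coeff, cutoff, threshold):
    """Zero the values with frequency index u + v >= cutoff unless they are large."""
    return [[c if (u + v < cutoff or abs(c) >= threshold) else 0.0
             for v, c in enumerate(row)] for u, row in enumerate(coeff)]

# A smooth block: intensity increases evenly from one corner to the opposite one.
block = [[(x + y) / 14 for y in range(N)] for x in range(N)]
coeff = drop_high(dct2(block), cutoff=3, threshold=0.05)
approx = idct2(coeff)
zeros = sum(c == 0.0 for row in coeff for c in row)
print(zeros, "of 64 values set to 0")
```

For smooth blocks most values end up as 0, and long runs of zeros are what makes the subsequent coding step effective. Blocks with abrupt intensity changes have many large high-frequency values, which is consistent with the artefacts mentioned above.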
PNG. Portable Network Graphics is a lossless image format that was published
in 1996. PNG was not designed for professional use, but rather for transferring
images on the Internet, and only supports grey-level images and RGB images
(also palette-based colour images). PNG was created to avoid a patent on the
LZW-algorithm used in GIF, and also GIF’s limitation to eight bit colour infor-
mation. For efficient coding PNG may (this is an option) predict the value of a
pixel from the value of previous pixels, and subtract the predicted value from the
actual value. It can then code these error values using a lossless coding method
called DEFLATE which uses a combination of the LZ77 algorithm and Huffman
coding. This is similar to the algorithm used in lossless audio formats like Apple
Lossless and FLAC. The extension of PNG-files is .png.
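The idea of predicting a pixel from previous pixels and coding only the errors can be sketched with Python's standard `zlib` module, which implements DEFLATE. This is an illustration under simplifying assumptions, not a PNG encoder: it handles a single row of 8-bit grey-level values, predicts each value by the value immediately to its left (in the spirit of one of PNG's prediction options), and the function names are invented here.

```python
import zlib

def predict_sub(row):
    """Replace each byte by its difference from the previous byte, modulo 256."""
    prev = 0
    out = bytearray()
    for v in row:
        out.append((v - prev) % 256)
        prev = v
    return bytes(out)

def undo_sub(errors):
    """Recover the original bytes from the prediction errors."""
    prev = 0
    out = bytearray()
    for e in errors:
        prev = (prev + e) % 256
        out.append(prev)
    return bytes(out)

# A smooth ramp of intensities: neighbouring pixels differ by a constant amount.
row = bytes((3 * k) % 256 for k in range(1000))
errors = predict_sub(row)

plain = zlib.compress(row)        # DEFLATE applied to the raw intensities
filtered = zlib.compress(errors)  # DEFLATE applied to the prediction errors
print(len(row), len(plain), len(filtered))
assert undo_sub(errors) == row    # the prediction step loses no information
```

On smooth data the prediction errors are nearly constant, so they compress much better than the raw values; on noisy data the gain can be small, which may be one reason the prediction is optional.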
JPEG 2000. This lossy (can also be used as lossless) image format was devel-
oped by the Joint Photographic Experts Group and published in 2000. JPEG 2000
transforms the image data with a wavelet transform rather than a DCT. After sig-
nificant processing of the wavelet coefficients, the final coding uses a version
of arithmetic coding. At the cost of increased encoding and decoding times,
8.3. IMAGE FORMATS 149