
COMPUTER VISION

A PRACTICAL INTRODUCTION TO COMPUTER VISION WITH OPENCV
Digital Image Processing

Digital images are represented on a computer using zeros and ones. Each image consists of rows and columns containing a set of pixels, where a pixel is the smallest unit of the image. The clarity of the image increases with the number of pixels.

Digital image processing means processing a digital image by means of a digital computer. Equivalently, it is the use of computer algorithms to obtain an enhanced image or to extract some useful information from it. Digital image processing is the use of algorithms and mathematical models to process and analyze digital images.

The basic steps involved in digital image processing are:

1. Image acquisition: This involves capturing an image using a digital camera or scanner, or importing an existing image into a computer.
2. Image enhancement: This involves improving the visual quality of an image, such as increasing contrast, reducing noise, and removing artifacts.
3. Image restoration: This involves removing degradation from an image, such as blurring, noise, and distortion.
4. Image segmentation: This involves dividing an image into regions or segments, each of which corresponds to a specific object or feature in the image.
5. Image representation and description: This involves representing an image in a way that can be analyzed and manipulated by a computer, and describing the features of an image in a compact and meaningful way.
6. Image analysis: This involves using algorithms and mathematical models to extract information from an image, such as recognizing objects, detecting patterns, and quantifying features.
7. Image synthesis and compression: This involves generating new images or compressing existing images to reduce storage and transmission requirements.

Digital image processing is widely used in a variety of applications, including medical imaging, remote sensing, computer vision, and multimedia.

Image processing mainly includes the following steps:

1. Importing the image via image acquisition tools;
2. Analyzing and manipulating the image;
3. Producing output, which can be an altered image or a report based on the analysis of that image.

What is an image?

An image is defined as a two-dimensional function F(x, y), where x and y are spatial coordinates, and the amplitude of F at any pair of coordinates (x, y) is called the intensity of the image at that point. When x, y, and the amplitude values of F are all finite, we call it a digital image.
In other words, an image can be defined by a two-dimensional array arranged in rows and columns.
A digital image is composed of a finite number of elements, each of which has a particular value at a particular location. These elements are referred to as picture elements, image elements, or pixels. "Pixel" is the term most widely used to denote the elements of a digital image.
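As a small illustration (assuming OpenCV and NumPy are installed and that an example file such as "face.png" exists), the following Python sketch shows that a digital image is simply a two-dimensional array of pixel values:

    import cv2

    # Read an image from disk as a gray-scale array; the file name is only an example.
    img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

    # The image is an M x N NumPy array of 8-bit intensity values.
    print(img.shape)   # e.g. (480, 640): number of rows and columns
    print(img.dtype)   # uint8: each pixel holds a value from 0 to 255
    print(img[0, 0])   # intensity of the pixel in row 0, column 0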
Types of an Image

1. BINARY IMAGE – The binary image, as its name suggests, contains only two pixel values, 0 and 1, where 0 refers to black and 1 refers to white. This image is also known as a monochrome image.

2. BLACK AND WHITE IMAGE – The image which consists of only


black and white colors is called a BLACK AND WHITE IMAGE.

3. 8-bit COLOR FORMAT – It is the most famous image format. It has 256 different shades of colors and is commonly known as the Grayscale image format. In this format, 0 stands for black, 255 stands for white, and 127 stands for gray.

4. 16-bit COLOR FORMAT – It is a color image format. It has 65,536 different colors and is also known as the High Color format. In this format, the distribution of color is not the same as in a grayscale image.

A 16-bit format is actually divided into three further channels: Red, Green, and Blue – the famous RGB format.

PHASES OF IMAGE PROCESSING

1. ACQUISITION – It could be as simple as being given an image that is already in digital form. The main work involves: a) scaling, b) color conversion (RGB to gray or vice versa).

2. IMAGE ENHANCEMENT – It is among the simplest and most


appealing areas of Image Processing. It is also used to extract some
hidden details from an image and is subjective.

3. IMAGE RESTORATION – It also deals with the appeal of an image, but it is objective (restoration is based on a mathematical or probabilistic model of image degradation).

4. COLOR IMAGE PROCESSING – It deals with pseudo-color and


full-color image processing; color models are applicable to digital
image processing.

5. WAVELETS AND MULTI-RESOLUTION PROCESSING – It is the foundation for representing images at various degrees of resolution.

6. IMAGE COMPRESSION – It involves developing some functions to


perform this operation. It mainly deals with image size or resolution.

7. MORPHOLOGICAL PROCESSING – It deals with tools for


extracting image components that are useful in the representation &
description of shape.

8. SEGMENTATION PROCEDURE – It includes partitioning an


image into its constituent parts or objects. Autonomous segmentation is
the most difficult task in Image Processing.
9. REPRESENTATION & DESCRIPTION – It follows the output of
the segmentation stage; choosing a representation is only part of the
solution for transforming raw data into processed data.

Advantages of Digital Image Processing:

1. Improved image quality: Digital image processing algorithms can


improve the visual quality of images, making them clearer, sharper,
and more informative.

2. Automated image-based tasks: Digital image processing can


automate many image-based tasks, such as object recognition, pattern
detection, and measurement.

3. Increased efficiency: Digital image processing algorithms can


process images much faster than humans, making it possible to analyze
large amounts of data in a short amount of time.

4. Increased accuracy: Digital image processing algorithms can provide


more accurate results than humans, especially for tasks that require
precise measurements or quantitative analysis.

Disadvantages of Digital Image Processing:

1. High computational cost: Some digital image processing algorithms are


computationally intensive and require significant computational resources.

2. Limited interpretability: Some digital image processing algorithms


may produce results that are difficult for humans to interpret, especially for
complex or sophisticated algorithms.

3. Dependence on quality of input: The quality of the output of digital


image processing algorithms is highly dependent on the quality of the input
images. Poor quality input images can result in poor quality output.

4. Limitations of algorithms: Digital image processing algorithms have


limitations, such as the difficulty of recognizing objects in cluttered or
poorly lit scenes, or the inability to recognize objects with significant
deformations or occlusions.

5. Dependence on good training data: The performance of many digital image processing algorithms depends on the quality of the training data used to develop them. Poor quality training data can result in poor performance of the algorithm.

Difference between Image Processing and Computer Vision

Image processing and Computer Vision both are very exciting fields
of Computer Science.

Computer Vision:

 In computer vision, computers or machines are made to gain a high-level understanding from input digital images or videos, with the purpose of automating tasks that the human visual system can do. It uses many techniques, and image processing is just one of them.

Image Processing:

 Image processing is the field of enhancing images by tuning many parameters and features of the images. So, image processing is a subset of computer vision. Here, transformations are applied to an input image and the resultant output image is returned. Some of these transformations are sharpening, smoothing, stretching, etc.

Since both fields deal with visuals, images, and videos, there is often confusion about the difference between them; the points above summarise that difference.
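To make the distinction concrete, here is a minimal Python sketch (assuming an OpenCV installation that ships the Haar cascade data files and an example image "photo.jpg"): smoothing is a typical image-processing transformation (image in, image out), while face detection is a typical computer-vision task (image in, high-level description out).

    import cv2

    img = cv2.imread("photo.jpg")                    # example file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Image processing: transform the image and return another image.
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)

    # Computer vision: extract a high-level description (locations of faces).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    print("Number of faces found:", len(faces))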

The stages of image analysis processing can be outlined as


follows:

1. Pre-processing: This stage is used to identify and remove noise (such as dots, speckles, and scratches) and irrelevant visual information that does not affect the results of the areas to be processed later.
2. Data Reduction: This stage is used to reduce the data in the spatial domain, or to transform it into another domain called the frequency domain, and to record the properties (from the frequency domain and/or the spatial domain) that the subsequent analysis and processing will use.

Primary processing is divided into sections:

1. Image engineering for a specific internal region: the derived features of a specific region, called the ROI (region of interest), are used. Certain operations defined on spatial coordinates are applied in these image-engineering processes, including selection of the region and zooming (enlargement or reduction), as well as translation and rotation. Subsequently, a partial image (sub-image) is obtained for further processing.

1- Zoom Process Method:

1. There are several methods of zooming. The first is the Zero-Order-Hold method, in which pixel values are repeated: a value can be repeated along its row, along its column, or along both rows and columns simultaneously, thereby enlarging the matrix.

Example// You have the following 3×3 part of an image:

40 20 10
70 50 30
90 80 10

1. Zoom it using the Zero-Order-Hold row-by-row method.
2. Zoom it using the Zero-Order-Hold column-by-column method.
3. Zoom it using the Zero-Order-Hold row-and-column method.

Solution:

1. The output will be a matrix of size 3×6.


40 40 20 20 10 10
70 70 50 50 30 30
90 90 80 80 10 10

2. The result will be a matrix of size 6×3.

40 20 10
40 20 10
70 50 30
70 50 30
90 80 10
90 80 10

3. The output is a 6×6 matrix

40 40 20 20 10 10
40 40 20 20 10 10
70 70 50 50 30 30
70 70 50 50 30 30
90 90 80 80 10 10
90 90 80 80 10 10
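A NumPy sketch of the zero-order-hold zoom applied to the example matrix above; np.repeat duplicates values along the rows, along the columns, or along both:

    import numpy as np

    a = np.array([[40, 20, 10],
                  [70, 50, 30],
                  [90, 80, 10]])

    rows_zoom = np.repeat(a, 2, axis=1)           # duplicate values within each row -> 3x6
    cols_zoom = np.repeat(a, 2, axis=0)           # duplicate each row               -> 6x3
    both_zoom = np.repeat(rows_zoom, 2, axis=0)   # rows and columns together        -> 6x6
    print(rows_zoom, cols_zoom, both_zoom, sep="\n\n")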

How to find the average (first-order hold):

We find the average of two adjacent pixel values and insert it between them. For example, for the pair 4 and 8: 4 + 8 = 12, and dividing by 2 gives an average of 6, so the result is written as 4 6 8.
If we average adjacent pixels along the rows, the number of columns increases; if we average along the columns, the number of rows increases.

We can also work on the pixels of every row and every column, expanding the columns and rows together.
This method enlarges an N×N matrix into an image matrix of size (2N−1)×(2N−1).

Example
If we have a 3×3 matrix that represents part of the values of the digital
image, we need to expand the columns and rows together.
Solution:
The size of the matrix becomes 5 x 5
Example for clarification: We have the following matrix, on which the rows and columns will be expanded together.

Note that when the rows and columns are expanded together, the second expansion is applied to the matrix resulting from the first expansion, not to the original matrix.
4. Zoom using a factor (K):

This means that the image (matrix) is enlarged by a given factor, for example 3 times its size; the factor K = 3 determines by how much the matrix is enlarged.

If what is required is to enlarge a matrix (part of an image) three or four times, or by some other factor, we use what is called the K factor and do the following:

1. Subtract the smaller of each pair of adjacent values from the larger.
2. Divide the result by the magnification factor (K).
3. Add the result to the smaller value, and keep adding it, inserting (K − 1) new values between the two original values.
4. Apply these steps to the rows and the columns.

Example: You have a portion of an image containing the adjacent values 125, 140, and 155, and you want to enlarge it to 3 times its original size.

Solution: We take each pair of adjacent values, subtract the smaller from the larger, divide the result by 3, and then add the result of the division to the smaller value. For 125 and 140: (140 − 125) / 3 = 5, so we add 5 twice and obtain two new numbers between 125 and 140:

[ 125 130 135 140 ]

Then we take the next two adjacent numbers, which are 140 and 155: (155 − 140) / 3 = 5, giving [ 140 145 150 155 ].

The row therefore becomes:

[ 125 130 135 140 145 150 155 ]
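A sketch of this factor-K enlargement for a single row of values (np.interp performs the same "divide the difference by K and add" interpolation between each pair of adjacent values):

    import numpy as np

    def zoom_row(values, k):
        """Insert k-1 linearly interpolated values between each adjacent pair."""
        n = len(values)
        old_x = np.arange(n)                             # original sample positions
        new_x = np.linspace(0, n - 1, (n - 1) * k + 1)   # enlarged grid of positions
        return np.interp(new_x, old_x, values)

    print(zoom_row([125, 140, 155], 3))
    # [125. 130. 135. 140. 145. 150. 155.]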

Computer vision modeling

Image Algebra
Algebraic operations are divided into mathematical operations (arithmetic
operations) and logical operations.

Mathematical (Arithmetic) Operations:
Addition:
The addition operation combines information from two images by adding the elements of the first image to the corresponding elements of the second, starting with the first element of the first image and the first element of the second image, and so on for the rest of the elements. The addition method is used in image restoration and to add noise to an image (as a simple form of encryption).

Example: You have parts of the following two images; the first image is I1 and the second image is I2. Add these two parts.
Solution

Example: If addition was used to add noise, how can the two images be recovered?

Answer: We rely on the result of the addition: we subtract one of the two matrices from the result, and the matrix produced by the subtraction is the other one. For example, subtracting the noise matrix from the result recovers the original image.
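A sketch of image addition and of recovering an image after noise has been added (the arrays are illustrative; cv2.add and cv2.subtract saturate at 0 and 255 rather than wrapping around):

    import cv2
    import numpy as np

    i1 = np.array([[100, 150], [200, 250]], dtype=np.uint8)   # example image part
    noise = np.array([[10, 20], [30, 5]], dtype=np.uint8)     # example noise

    noisy = cv2.add(i1, noise)              # element-wise addition (saturating)
    recovered = cv2.subtract(noisy, noise)  # subtract the noise back out
    print(recovered)                        # matches i1 wherever no saturation occurred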

Subtraction
The subtraction operation removes information by subtracting one image from another: each element of the second image is subtracted from the corresponding element of the first image.

Example: You have the following two images. Subtract the second image from the first.

Solution:
Multiplication
Multiplication is performed by multiplying each element of the image matrix by a factor K, and it is used to increase or decrease the intensity of the image (brighten or darken it). For example, the factor K must be greater than one when you want to brighten the image.

Example: You have the following image; brighten it and then darken it using one of the arithmetic operations on digital images.

Answer: Using the multiplication operation: 1- increase (brighten) it, 2- decrease (darken) it.

We use multiplication as the algebraic arithmetic operation; for example, we multiply the matrix by a factor (here we choose the factor ourselves, since it was not specified in the question).

To brighten, the coefficient K = 3, which is greater than one, is used.

To darken the image, we multiply by a factor less than one (for example K = 1/3).

Example: We have a matrix (part of an image) that is the result of the brightening operation with K = 3; find the original matrix.

Answer: There are two ways to solve this:

1- Divide the matrix by the factor K: dividing each value in the resulting matrix by the coefficient (3) produces the original matrix.
2- Use the darkening approach with a factor K less than one, for example K = 1/3, and multiply by it.

Note:

K > 1 (increase): the image tends toward white (becomes brighter).

K < 1 (decrease): the image tends toward black (darkness).

• Division:
- The elements of the given image are divided by a factor greater than one. The division operation makes the image darker.
- Example// You have the following matrix, which is part of an image. Divide the image by a factor of K = 4.
Solution
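A general sketch of multiplication and division by a factor K on an illustrative array (not the matrix from the example above); results are clipped to the valid 0–255 range because scaling 8-bit data can overflow:

    import numpy as np

    img = np.array([[10, 60], [120, 200]], dtype=np.float32)   # example values

    brighter = np.clip(img * 3.0, 0, 255).astype(np.uint8)       # K > 1: tends toward white
    darker = np.clip(img * (1 / 3.0), 0, 255).astype(np.uint8)   # K < 1: tends toward black
    divided = np.clip(img / 4.0, 0, 255).astype(np.uint8)        # division also darkens
    print(brighter, darker, divided, sep="\n")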
Logical operations:
• Logical AND operation:
Logical operations are applied to the elements of the image after converting each element to its binary form, so that the logical operations can be used, for example through the ROI method. AND is similar to the multiplication operation, in the sense that the selected part of the image tends toward white: the image elements are ANDed with a mask containing a white square, so that the output is the part of the image corresponding to the white square.
(AND makes the background of the part we want white, while OR makes the background of the part we want black.)

• Logical OR operation
It is performed by taking a mask with a black square on a white background for the required image data from the original image; the OR operation is similar to the addition operation.
• Logical NOT operation
It is used to produce the negative of the original image, meaning that it inverts the image (like a photographic negative film).

That is, the image data is inverted: black becomes white and white becomes black.
Example: If you have the following image part, apply NOT to it.

The image resulting from the NOT operation here is close to black, and the data of this image must first be converted to binary (0, 1) format.

Example: Apply the AND operation to two elements of an image, where the first element is 88 and the second element is 111.

Solution: We convert each number to its binary form, so that:

First number: 88 = 01011000
Second number: 111 = 01101111
88 AND 111 = 01001000 = 72

In the case of NOT, the operation is applied to a single number, so that every zero becomes a one and every one becomes a zero; for example, NOT 88 = 10100111 = 167.

Note: The same procedure (convert the values to binary and apply the gate bit by bit) is used for the other logical gates (NAND, NOR, XOR).
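A sketch of these logical operations on the pixel values from the example above (NumPy applies the operation to every bit of each 8-bit value; cv2.bitwise_and works the same way on whole images, e.g. with a white-square mask for ROI selection):

    import numpy as np

    a = np.uint8(88)    # 0101 1000
    b = np.uint8(111)   # 0110 1111

    print(np.bitwise_and(a, b))   # 72  (0100 1000)
    print(np.bitwise_or(a, b))    # 127 (0111 1111)
    print(np.bitwise_not(a))      # 167 (1010 0111), the "negative" of 88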

Image enhancement (spatial filters)
A filter is a process that cleans the image of any remaining impurities; that is, it highlights the features of the part of the image that we want by removing noise and impurities.

We use spatial filters to remove noise or to improve the image. These filters are applied directly in the image (spatial) domain, i.e. directly on the image elements, and not in the frequency domain, where the image elements would first be transformed using one of the transforms such as the Fourier transform or the cosine transform.

Filters are divided into three types:

1- Mean Filter
2- Median Filter
3- Enhancement Filter

The first and second types are used to remove noise, and in some applications they also produce a smoothing of the image:
1- Removing noise
2- Smoothing

The third type is used to sharpen the edges and details in the image. Spatial filters are applied either by using the image elements directly, without a mask, or by convolving a mask with each element and its neighbours.
The effect of a mask can be judged from its coefficients as follows:

1. If the sum of the mask's coefficients equals 1, the illumination (brightness) of the image is maintained.

2. If the sum of the coefficients equals 0, the image loses illumination, that is, it tends to become black.

3. If the coefficients alternate between negative and positive values (are wave-like), the mask carries information about the edges.

4. If the coefficients are only wave-like, some kind of distortion appears in the image.

1- Mean Filter
- It is a linear filter whose mask elements are:

all positive and equal (for a 3×3 mean mask, each coefficient is 1/9). Because they are all positive the image is somewhat blurred, and since the sum of the mask's coefficients equals 1, the illumination of the image is maintained.

Example: Apply the Mean mask to the following image:

Solution:

The output at each position is a weighted combination of the neighbouring points, which is why the mean filter is described as linear.
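A sketch of a 3x3 mean filter with OpenCV (cv2.blur uses a mask whose nine coefficients are each 1/9, so their sum is 1; the array values are illustrative):

    import cv2
    import numpy as np

    img = np.array([[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]], dtype=np.uint8)

    smoothed = cv2.blur(img, (3, 3))   # 3x3 mean (averaging) filter
    print(smoothed[1, 1])              # the centre pixel becomes the mean of its window: 50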
2- Median Filter
It is a nonlinear filter that acts on the image elements through a selected mask (window): the element at the centre of the window is replaced by the median value of the elements inside the window.

Example: Apply the Median Filter to the following image part.

There is no fixed mask of coefficients for the median filter; instead we build it from the elements of the matrix themselves. We take the image elements inside the window and sort them in ascending order, so the procedure becomes:

1- First step: arrange the elements in ascending order.

2- Second step: divide the number of elements by 2 to find the middle position (for 9 elements the middle is the fifth position).

3- Third step: read the value at the fifth position. Here the value at the fifth position is 5. (The median value is not necessarily equal to the value currently at the centre of the window.)

4- Fourth step: replace the centre element of the window in the matrix, which is the element whose value is 4, with the median value 5 (that is, we put 5 instead of 4), and the matrix becomes:
Example: You have the following image fragment.

Solution:
1- We take the 3×3 part and arrange its elements in ascending order.
2- We locate the element in the middle position.
3- We read the value in the middle; here the median value is 5.
4- We replace the centre element with the median and write out the matrix.

We then take the next 3×3 part of the matrix and likewise arrange it in ascending order.
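A sketch of the same median procedure in code (the window values are illustrative; np.median shows the sorting-and-middle-value arithmetic for one window, and cv2.medianBlur applies it across a whole image):

    import cv2
    import numpy as np

    window = np.array([[3, 5, 6],
                       [2, 4, 9],
                       [7, 1, 8]], dtype=np.uint8)

    print(np.median(window))            # 5: the middle value of the sorted elements
    print(cv2.medianBlur(window, 3))    # median filter with a 3x3 window over the array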
Enhancement Filter

Image Quantization:
- The difference between compression and reduction: image reduction is the process of reducing the image data by removing some of the image information, mapping a group of image elements onto a single value; this reduction process is called quantization.
- Compression, on the other hand, treats the image itself as a file, while reduction may delete part of the image and works on the values of the image.

- There are two ways to reduce the image:

1- Gray Level Reduction
- That is, we reduce the colour (gray) levels of the image; here the reduction is applied to the values I(r,c).

2- Spatial Reduction
- Here the work is done on the coordinates of the image elements (r, c), i.e. on their locations, for example (1,1).

1- Gray Level Reduction

- Its methods can be explained as follows. Reducing the gray level is done in three ways:
A- The first method: Threshold

A specific value of the colour levels is chosen; this value is called the threshold. Any image value that is higher than the threshold becomes one, and any value that is lower becomes zero. In this way an image with 256 colour levels is converted into a binary image.

Example: If the threshold value is 127, apply it to the following values:

Solution:
- The highest value is 251 and the lowest value is 11; applying the threshold of 127 according to the binary model gives the result.

Example: If the threshold value is 127, apply it to the following values:

Solution: All of these values are less than the threshold of 127, so they would all become zero. In this case we determine the largest and the smallest values and take the midpoint between them as the threshold: the smallest value is 2 and the largest value is 25, so the midpoint between them is 12 or 13.
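A sketch of thresholding with NumPy (the values are illustrative; cv2.threshold offers the same operation, typically mapping to 0 and 255 rather than 0 and 1):

    import numpy as np

    values = np.array([11, 80, 127, 200, 251], dtype=np.uint8)   # example values
    binary = (values > 127).astype(np.uint8)                     # 1 above the threshold, 0 otherwise
    print(binary)   # [0 0 0 1 1]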
B- The second method: AND / OR without using a mask

Here the number of bits per pixel is reduced.

Example:
We want to reduce the information from the standard 256 gray levels to 32 levels, using the AND method. With AND, each group of values is mapped to the smallest number in the group.

Solution:

The 256 colour levels must be reduced to 32, meaning that every 8 consecutive values are placed in one cell, after which we take the lowest value in each cell. So the first cell takes the lowest value 0, the second 8, the third 16, and so on up to 248, so that the number of these extracted values is 32.

But the OR method takes the largest number in each cell. Why?
Answer: Because OR sets the low-order bits rather than clearing them, so it cannot produce 0; the extracted values are one cell higher than those of AND, becoming 7, 15, 23, ... up to 255, i.e. OR takes the largest number from each cell.

Example: If you have the standard number of levels (256), how do you reduce it to 16 levels?

Answer: This means that every 16 values are placed in one cell, so the division is by 16; the first cell covers the values from 0 to 15, and so on.

C- The third method: AND and OR using a mask

This method is used to reduce (quantize) the image using a specific bit mask.

Example //
You have the following matrix; use the AND-mask method to reduce this part of the image, given that the number of bits for each element is 8.

Solution:
The rule for the gray levels: the 8 target levels come from the range (0, 7). In binary, 7 is 111, so the 8-bit mask equals 00000111. We then take each number from the matrix, convert it to binary form, and AND it with the mask.

The number 0 in binary is 00000000; after ANDing it with the mask it remains 00000000 = 0.
The number 10, converted to binary, is 00001010; after ANDing it with the mask it becomes 00000010 = 2.
As for the number 255 (11111111), after ANDing it with the mask it becomes 00000111 = 7.

And so on for the rest of the numbers in the matrix.
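A sketch of the bit-mask method in NumPy (the 00000111 mask follows the example above; a mask such as 11100000, which keeps the highest bits instead, is another common choice):

    import numpy as np

    values = np.array([0, 10, 255], dtype=np.uint8)   # example values

    low_bits = values & 0b00000111     # AND with the example mask: [0, 2, 7]
    high_bits = values & 0b11100000    # AND keeping the top 3 bits: [0, 0, 224]
    print(low_bits, high_bits)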

2- Spatial Reduction
Spatial reduction is done in three ways:
1- Average
2- Median
3- Reduction (decimation)

 The first method: the average

This method takes a group of adjacent elements and replaces them by their average.

Example: Use the averaging method on the following image part.

Solution:

The overall average of the image part is:

Total Average of Image = (33 + 17) / 4 = 12.5 ≈ 13

If the average is taken per row, it is the sum of the row divided by the number of values in that row.

 The second method: the median

In this method, the image elements are arranged in ascending order and the value in the middle is taken.
Example: You have the following array. Required: 1- take the median of all the elements; 2- take the median using a specific mask.
- We arrange the numbers in ascending order, so they become:

- The median is then the sixth element.

- If a mask is used, and the mask is assumed to be 3×3, we arrange the elements of each 3×3 part in ascending order. The order of the elements of the first 3×3 matrix is:

So the median is the fifth element and its value is 5.

The order of the elements of the second 3×3 matrix is:

So the median is the fifth element and its value is also 5.

• The third method: reduction (decimation)

- Some of the image data is deleted. For example, to reduce the image size by 2, every other row or column of the image is kept and the next row or column is deleted.

Example: You have the following image part, which needs to be reduced by 2 along the columns.

So the second and fourth columns are deleted, meaning the matrix becomes as follows:

If the reduction is by 3, we delete two columns out of every three, for example the second and the third, since 3 − 1 = 2. So the matrix becomes as follows:

In this case, if what is required is to reduce by 3 along the rows, the answer is that it is not possible, because the matrix consists of only 3 rows.
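A sketch of spatial reduction on an illustrative 4x4 array: decimation by slicing (keeping every other column) and averaging of 2x2 blocks:

    import numpy as np

    img = np.array([[ 1,  2,  3,  4],
                    [ 5,  6,  7,  8],
                    [ 9, 10, 11, 12],
                    [13, 14, 15, 16]])

    decimated = img[:, ::2]                                  # keep the 1st and 3rd columns, delete the rest
    averaged = img.reshape(2, 2, 2, 2).mean(axis=(1, 3))     # average each 2x2 block
    print(decimated)
    print(averaged)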
Histogram modification
- The histogram is a chart of the gray levels of an image, showing how these levels are distributed over the image: the part of the chart that contains the image information is filled and the rest of the range is empty, depending on the values of the image points.
The characteristic histogram shapes can be listed as follows:
1. A histogram with a small spread of gray levels: a low-contrast image.
2. A histogram with a large spread of gray levels: a high-contrast image.
3. A histogram clustered at the low end: a dark image.
4. A histogram clustered at the top end: a light (bright) image.

The process of changing the histogram is done in three ways:

• Histogram Stretching
• Histogram Shrinking (compression)
• Histogram Slide

• The first method: Histogram Stretching

The histogram can be stretched according to the following law:

Stretch(I(r,c)) = [ (I(r,c) − I(r,c)min) / (I(r,c)max − I(r,c)min) ] × (MAX − MIN) + MIN

where:
1. I(r,c)max is the largest gray level value in the image.
2. I(r,c)min is the smallest gray level value in the image.
3. MAX and MIN are the largest and smallest possible gray level values (255 and 0).

Example: You have the following image part; stretch this part of the image using the histogram stretching method.
image using the histogram expansion method?
• The second method: Histogram Shrinking

The histogram can be shrunk according to the following law:

Shrink(I(r,c)) = [ (Shrinkmax − Shrinkmin) / (I(r,c)max − I(r,c)min) ] × (I(r,c) − I(r,c)min) + Shrinkmin

where:
1. I(r,c)max is the largest gray level value in the image.
2. I(r,c)min is the smallest gray level value in the image.
3. Shrinkmax and Shrinkmin are the maximum and minimum values of the desired (shrunk) range of gray levels, chosen from the possible range (0, 255).

Example: You have the following image part; shrink this part of the image using the histogram shrinking method.
• The third method: Histogram Slide

The histogram can be shifted by a certain distance according to the following law:

Slide(I(r,c)) = I(r,c) − OFFSET     (8)

where:
OFFSET is the amount (distance) by which the histogram is shifted.

Example: You have the following part of the image; shift it by a distance of 10 units using the histogram slide method.
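A sketch of the three histogram operations under the laws above, on an illustrative array (the shrink range 20–100 and the offset of 10 are example choices):

    import numpy as np

    img = np.array([[50, 60], [70, 80]], dtype=np.float32)   # example gray levels
    lo, hi = img.min(), img.max()

    stretched = (img - lo) / (hi - lo) * (255 - 0) + 0        # stretch to the full range 0..255
    shrunk = (100 - 20) / (hi - lo) * (img - lo) + 20         # shrink into the range 20..100
    slid = np.clip(img - 10, 0, 255)                          # slide by an offset of 10
    print(stretched, shrunk, slid, sep="\n")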

Introduction

Computer vision is the automatic analysis of images and videos by


computers in order to gain some understanding of the world.
Computer vision is inspired by the capabilities of the human
vision system and, when initially addressed in the 1960s and
1970s, it was thought to be a relatively straightforward problem
to solve.

However, the reason we think/thought that vision is easy is that


we have our own visual system which makes the task seem
intuitive to our conscious minds. In fact, the human visual system
is very complex and even the estimates of how much of the brain
is involved with visual processing vary from 25% up to more than
50%.

1.1 A Difficult Problem

The first challenge facing anyone studying this subject is to


convince them that the problem is difficult. To try to illustrate the
difficulty, we first show three different versions of the same image
in Figure 1.1. For a computer, an image is just an array of values,
such as the array shown in the left-hand image in Figure 1.1. For
us, using our complex vision system, we can perceive this as a face
image but only if we are shown it as a grey scale image (top right).
Computer vision is quite like understanding the array of values
shown in Figure 1.1, but is more complicated as the array is really
much bigger (e.g. to be equivalent to the human eye a camera
would need around 127 million elements), and more complex (i.e.
with each point represented by three values in order to encode
colour information). To make the task even more convoluted, the
images are constantly changing, providing a stream of 50–60
images per second and, of course, there are two streams of data as
we have two eyes/cameras.

67 67 66 68 66 67 64 65 65 63 63 69 61 64 63 66 61 60

69 68 63 68 65 62 65 61 50 26 32 65 61 67 64 65 66 63

72 71 70 87 67 60 28 21 17 18 13 15 20 59 61 65 66 64

75 73 76 78 67 26 20 19 16 18 16 13 18 21 50 61 69 70

74 75 78 74 39 31 31 30 46 37 69 66 64 43 18 63 69 60

73 75 77 64 41 20 18 22 63 92 99 88 78 73 39 40 59 65

74 75 71 42 19 12 14 28 79 102 107 96 87 79 57 29 68 66

75 75 66 43 12 11 16 62 87 84 84 108 83 84 59 39 70 66

76 74 49 42 37 10 34 78 90 99 68 94 97 51 40 69 72 65

76 63 40 57 123 88 60 83 95 88 80 71 67 69 32 67 73 73

78 50 32 33 90 121 66 86 100 116 87 85 80 74 71 56 58 48

80 40 33 16 63 107 57 86 103 113 113 104 94 86 77 48 47 45

88 41 35 10 15 94 67 96 98 91 86 105 81 77 71 35 45 47

87 51 35 15 15 17 51 92 104 101 72 74 87 100 27 31 44 46

86 42 47 11 13 16 71 76 89 95 116 91 67 87 12 25 43 51

96 67 20 12 17 17 86 89 90 101 96 89 62 13 11 19 40 51

99 88 19 15 15 18 32 107 99 86 95 92 26 13 13 16 49 52

99 77 16 14 14 16 35 115 111 109 91 79 17 16 13 46 48 51

Figure 1.1 Different versions of an image. An array of numbers (left) which are the values of the grey scales in the low-resolution image of a face (top right). The task of computer vision is most like understanding the array of numbers.

1.2 The Human Vision System

If we could duplicate the human visual system then the problem


of developing a computer vision system would be solved. So why
can’t we? The main difficulty is that we do not understand what
the human vision system is doing most of the time.

If you consider your eyes, it is probably not clear to you that your
colour vision (provided by the 6–7 million cones in the eye) is
concentrated in the centre of the visual field of the eye (known as
the macula). The rest of your retina is made up of around 120
million rods (cells that are sensitive to visible light of any
wavelength/colour). In addition, each eye has a rather large blind
spot where the optic nerve attaches to the retina. Somehow, we
think we see a continuous image (i.e. no blind spot) with colour
everywhere, but even at this lowest level of processing it is unclear
as to how this impression occurs within the brain.
The visual cortex (at the back of the brain) has been studied and
found to contain cells that perform a type of edge detection (see
Chapter 6), but mostly we know what sections of the brain do
based on localised brain damage to individuals. For example, a
number of people with damage to a particular section of the brain
can no longer recognise faces (a condition known as
prosopagnosia). Other people have lost the ability to sense moving
objects (a condition known as akinetopsia). These conditions
inspire us to develop separate modules to recognise faces (e.g. see
Section 8.4) and to detect object motion (e.g. see Chapter 9).

We can also look at the brain using functional MRI, which allows us to see the concentration of electrical activity in different parts of the brain as subjects perform various activities. Again, this may tell us what large parts of the brain are doing, but it cannot provide us with algorithms to solve the problem of interpreting the massive arrays of numbers that video cameras provide.

1.3 Practical Applications of Computer Vision

Computer vision has many applications in industry, particularly


allowing the automatic inspection of manufactured goods at any
stage in the production line. For example, it has been used to:
Inspect printed circuit boards to ensure that tracks and
components are placed correctly. See Figure 1.2.
Inspect print quality of labels. See Figure 1.3.
Inspect bottles to ensure they are properly filled. See Figure 1.3.

Figure 1.2 PCB inspection of pads (left) and images of some


detected flaws in the surface mounting of components (right).
Reproduced by permission of James Mahon

Figure 1.3 Checking print quality of best-before dates (left), and monitoring the level to which bottles are filled (right). Reproduced by permission of Omron Electronics LLC
Guide robots when manufacturing complex products such as
cars.
On the factory floor, the problem is a little simpler than in the
real world as the lighting can be constrained and the possible
variations of what we can see are quite limited. Computer vision
is now solving problems outside the factory. Computer vision
applications outside the factory include:

The automatic reading of license plates as they pass through


tollgates on major roads.
Augmenting sports broadcasts by determining distances for
penalties, along with a range of other statistics (such as how far
each player has travelled during the game).
Biometric security checks in airports using images of faces and
images of fingerprints. See Figure 1.4.
Augmenting movies by the insertion of virtual objects into video
sequences, so that they appear as though they belong (e.g. the
candles in the Great Hall in the Harry Potter movies).
 Assisting drivers by warning them when they are drifting out of lane.
 Creating 3D models of a destroyed building from multiple old photographs.
 Advanced interfaces for computer games allowing the real-time detection of players or their hand-held controllers.
 Classification of plant types and anticipated yields based on multispectral satellite images.
 Detecting buried landmines in infrared images. See Figure 1.4.

Some examples of existing computer vision systems in the outside world are shown in Figure 1.4.

Figure 1.4 Buried landmines in an infrared image (left). Reproduced by permission of Zouheir Fawaz. Handprint recognition system (right). Reproduced by permission of Siemens AG

1.4 The Future of Computer Vision


The community of vision developers is constantly pushing the
boundaries of what we can achieve. While we can produce
autonomous vehicles, which drive themselves on a highway, we
would have difficulties producing a reliable vehicle to work on
minor roads, particularly if the road markings were poor. Even in
the highway environment, though, we have a legal issue, as who is
to blame if the vehicle crashes? Clearly, those developing the
technology do not think it should be them and would rather that
the driver should still be responsible should anything go wrong.

This issue of liability is a difficult one and arises with many vision
applications in the real world. Taking another example, if we
develop a medical imaging system to diagnose cancer, what will
happen when it mistakenly does not diagnose a condition? Even
though the system might be more reliable than any individual
radiologist, we enter a legal minefield. Therefore, for now, the
simplest solution is either to address only non-critical problems or
to develop systems, which are assistants to, rather than
replacements for, the current human experts.

Another problem exists with the deployment of computer vision systems. In some countries the installation and use of video cameras is considered an infringement of our basic right to privacy. This varies hugely from country to country, from company to company, and even from individual to individual. While most people involved with technology see the potential benefits of camera systems, many people are inherently distrustful of video cameras and what the videos could be used for. Among other things, they fear (perhaps justifiably) a Big Brother scenario, where our movements and actions are constantly monitored. Despite this, the number of cameras is growing very rapidly, as there are cameras on virtually every new computer, every new phone, every new games console, and so on.

Moving forwards, we expect to see computer vision addressing


progressively harder problems; that is problems in more complex
environments with fewer constraints. We expect computer vision
to start to be able to recognise more objects of different types and
to begin to extract more reliable and robust descriptions of the
world in which they operate. For example, we expect computer
vision to

 become an integral part of general computer interfaces;


 provide increased levels of security through biometric
analysis;
 provide reliable diagnoses of medical conditions from
medical images and medical records;
 allow vehicles to be driven autonomously;
 automatically determine the identity of criminals through
the forensic analysis of video.

Figure 1.5 The ASIMO humanoid robot which has two cameras in
its ‘head’ which allow ASIMO to determine how far away things
are, recognise familiar faces, etc. Reproduced by permission of
Honda Motor Co. Inc

Ultimately, computer vision is aiming to emulate the capabilities


of human vision, and to provide these abilities to humanoid (and
other) robotic devices, such as ASIMO (see Figure 1.5). This is
part of what makes this field exciting, and surprising, as we all
have our own (human) vision systems which work remarkably
well, yet when we try to automate any computer vision task it
proves very difficult to do reliably.
2
Images
Images play a crucial role in computer vision, serving as the visual data
captured by devices like cameras. They represent the appearance of
scenes, which can be processed to highlight key features before
extracting information. Images often contain noise, which can be
reduced using basic image processing methods.

2.1 Cameras

A camera includes a photosensitive image plane that detects light, a


housing that blocks unwanted light, and a lens that directs light onto the
image plane in a controlled manner, focusing the light rays.

2.1.1 The Simple Pinhole Camera Model

The pinhole camera model is a basic yet realistic representation of a


camera, where the lens is considered a simple pinhole through which all
light rays pass to reach the image plane. This model simplifies real
imaging systems, which often have distortions caused by lenses.
Adjustments to address these distortions are discussed in Section 5.6.


Figure 2.1 illustrates the pinhole camera model, demonstrating how the
3D real world (right side) relates to images on the image plane (left side).
The pinhole serves as the origin in the XYZ coordinate system. In
practice, the image plane needs to be enclosed in a housing to block
stray light.

In homogeneous coordinates, w acts as a scaling factor for image points. f_i and f_j represent a combination of the camera's focal length and the pixel sizes in the I and J directions. (c_i, c_j) are the coordinates where the optical axis, a line perpendicular to the image plane passing through the pinhole, intersects the image plane.
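A minimal sketch of the projection these quantities define (the function name, the numeric values, and the pairing of the X axis with the I direction and Y with J are illustrative assumptions, not from the text): a 3D point (X, Y, Z) in camera coordinates maps to image coordinates using f_i, f_j and the optical centre (c_i, c_j).

    def project(point_xyz, f_i, f_j, c_i, c_j):
        """Project a 3D point (in camera coordinates) onto the image plane."""
        X, Y, Z = point_xyz
        # In homogeneous form [w*i, w*j, w], the scaling factor w equals Z.
        i = f_i * X / Z + c_i
        j = f_j * Y / Z + c_j
        return i, j

    print(project((0.5, 0.2, 2.0), f_i=800, f_j=800, c_i=320, c_j=240))   # (520.0, 320.0)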

2.2 Images

An image is a 2D projection of a 3D scene captured by a sensor, represented as a continuous function of two coordinates (i, j), (column, row), or (x, y). For digital processing, the image needs to be converted into a suitable format.

To process an image digitally, it is sampled into a matrix with M rows and N columns and then quantized, assigning each matrix element an integer value. The continuous range of values is divided into k intervals, commonly k = 256.
2.2.1 Sampling

Digital images are formed by sampling a continuous image into discrete


elements using a 2D array of photosensitive elements (pixels). Each pixel
has a fixed photosensitive area, with non-photosensitive borders between
them. There is a small chance that objects could be missed if their light
falls only in these border areas. A bigger challenge with sampling is that
each pixel represents the average luminance or chrominance over an
area, which might include light from multiple objects, especially at
object boundaries.

The number of samples in an image determines the ability to distinguish


objects within it. A sufficient resolution (number of pixels) is crucial for
accurately recognizing objects. However, if the resolution is too high, it
may include unnecessary details, making processing more difficult and
slower.

2.2.2 Quantization

Each pixel in a digital image f(i, j) represents scene brightness, which is in principle a continuous quantity. However, these brightness values must be represented discretely using digital values. Typically, the number of brightness levels per channel is k = 2^b, where b is the number of bits, commonly set to 8.

Figure 2.2 Four different samplings of the same image; top left 256x192,
top right 128x96, bottom left 64x48 and bottom right 32x24

The essential question is how many bits are truly needed to represent
pixels. Using more bits increases memory requirements, while using
fewer bits results in information loss. Although 8-bit and 6-bit images
appear similar, the latter uses 25% fewer bits. However, 4-bit and 2-bit
images show significant issues, even if many objects can still be
recognized. The required bit depth depends on the intended use of the
image. For automatic machine interpretation, more quantization levels
are necessary to avoid false contours and incorrect segmentation, as seen
in lower-bit images.
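A sketch of re-quantizing an 8-bit image to b bits (an assumed NumPy helper; keeping only the b most significant bits reproduces the false contours described above when b is small):

    import numpy as np

    def quantize(img, b):
        """Keep only the b most significant bits of an 8-bit image."""
        shift = 8 - b
        return ((img >> shift) << shift).astype(np.uint8)

    img = np.array([0, 37, 74, 111, 148, 185, 222], dtype=np.uint8)   # example values
    print(quantize(img, 2))   # [  0   0  64  64 128 128 192]: only 2^2 = 4 levels remain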

Figure 2.3 Four different quantizations of the same grey-scale image; top left 8 bits, top right 6 bits, bottom left 4 bits and bottom right 2 bits
