Computer Vision 17-11

The document provides a comprehensive introduction to computer vision, detailing types of digital images, the differences between image processing and computer vision, and the stages of image analysis processing. It covers various methods for image enhancement, including filters and histogram modification techniques, as well as algebraic operations used in image processing. The document serves as a practical guide for understanding and applying computer vision concepts using OpenCV.


COMPUTER VISION
A PRACTICAL INTRODUCTION TO COMPUTER VISION WITH OPENCV

What are the types of digital images?
Types of an image

1. BINARY IMAGE – As its name suggests, a binary image contains only
two pixel values, 0 and 1, where 0 refers to black and 1 refers to
white. This image is also known as a monochrome image.
2. BLACK AND WHITE IMAGE – An image that consists of only black and
white pixels is called a black and white image.
3. 8-BIT COLOR FORMAT – This is the most common image format. It has
256 different shades and is commonly known as a grayscale image. In
this format, 0 stands for black, 255 stands for white, and 127 stands
for mid-gray.
4. 16-BIT COLOR FORMAT – This is a colour image format with 65,536
different values, also known as High Color format. In this format the
distribution of values is not the same as in a grayscale image.
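
A minimal Python/OpenCV sketch of producing these image types (the file name "input.jpg" is only a placeholder):

import cv2

# Load a colour image (file name is a placeholder).
colour = cv2.imread("input.jpg")            # 3-channel BGR image
if colour is None:
    raise FileNotFoundError("input.jpg not found")

# 8-bit grayscale image: one channel, values 0 (black) .. 255 (white).
grey = cv2.cvtColor(colour, cv2.COLOR_BGR2GRAY)

# Binary (monochrome) image: thresholding leaves only two values, 0 and 255.
_, binary = cv2.threshold(grey, 127, 255, cv2.THRESH_BINARY)

cv2.imwrite("grey.png", grey)
cv2.imwrite("binary.png", binary)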

What are the differences between image
processing and computer vision?

Difference between Image Processing and Computer Vision


Image processing and computer vision are both very exciting fields
of computer science.
Computer Vision:
- In computer vision, computers or machines are made to gain a
high-level understanding of input digital images or videos,
with the purpose of automating tasks that the human visual
system can do. It uses many techniques, and image processing is
just one of them.
Image Processing:
- Image processing is the field of enhancing images by tuning
many parameters and features of the images; in this sense, image
processing is a subset of computer vision. Here,
transformations are applied to an input image and the resulting
output image is returned. Some of these transformations are
sharpening, smoothing, stretching, etc.

Talk about the stages of image analysis processing.
The stages of image analysis processing can be outlined as
follows:

1. Pre-processing: This stage is used to identify and remove
noise (such as dots, speckles, and scratches) and
irrelevant visual information that does not affect the
results of the regions to be processed later.
2. Data Reduction: This stage is used to reduce the amount of data,
either in the spatial domain or by transforming it into the
frequency domain, and to record the properties (spatial-domain or
frequency-domain features) that the subsequent analysis will use.

Primary processing is divided into sections:

1. Image geometry operations work on a specific region of the image,
called the region of interest (ROI). Certain operations are defined
by the spatial coordinates used in these geometric processes,
including cropping and zooming for enlargement, reduction,
translation, or rotation. Subsequently, a partial image is obtained
for further processing.

1- Zoom Process Method:

1. There are several methods for zooming. The first is the
Zero-Order-Hold method, which enlarges the image by repeating
existing pixel values: each value is repeated along the row (to
widen the image), along the column (to heighten it), or along
both at once to enlarge the whole matrix.
Example// You have the following 3×3 part of the required image:

40 20 10
70 50 30
90 80 10

2. Zoom it using the (Zero-Order-Hold) row by row
method.
3. Zoom it using the (Zero-Order-Hold) column by
column method.
4. Zoom it using the (Zero-Order-Hold) row and
column method.

Solution:

1. The output will be a matrix of size 3×6.

40 40 20 20 10 10
70 70 50 50 30 30
90 90 80 80 10 10

2. The result will be a matrix with a size of 6×3.

40 20 10
40 20 10
70 50 30
70 50 30
90 80 10
90 80 10

3. The output is a 6×6 matrix

40 40 20 20 10 10
40 40 20 20 10 10
70 70 50 50 30 30
70 70 50 50 30 30
90 90 80 80 10 10
90 90 80 80 10 10
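
A minimal NumPy sketch of Zero-Order-Hold zooming (repeating values along rows, columns, or both) that reproduces the three results above:

import numpy as np

img = np.array([[40, 20, 10],
                [70, 50, 30],
                [90, 80, 10]])

# Zero-Order-Hold: repeat each pixel value.
row_zoom  = np.repeat(img, 2, axis=1)                          # along rows    -> 3x6
col_zoom  = np.repeat(img, 2, axis=0)                          # along columns -> 6x3
both_zoom = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)    # both          -> 6x6

print(row_zoom)
print(col_zoom)
print(both_zoom)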

How to find the average:
Find the average of two adjacent pixel values and insert it between
them. For example, for the pair 4 and 8: 4 + 8 = 12, and 12 / 2 = 6,
so the result is written as 4-6-8.
If we apply this method by averaging along the rows, the number of
columns increases, and if we apply it along the columns, the number of
rows increases.
We can also work on the rows and the columns together to expand both:
this method enlarges an N×N matrix into an image matrix of size
(2N-1)×(2N-1).

Example
If we have a 3×3 matrix that represents part of the values of a
digital image, and we need to expand the columns and rows together:
Solution:
The size of the matrix becomes 5×5.
Example for clarification: We have the following matrix, on
which the rows and columns will be expanded together.

Note that the column expansion is applied to the matrix resulting from
the row expansion, not to the original matrix.

4. Zoom using a factor (k):
This means that the image (matrix) is enlarged by some factor,
for example K = 3, i.e. to three times its size.

If what is required is to enlarge a matrix (part of an image)
three or four times, or by some other amount, we use what is
called the k factor and do the following:

1. Subtract each pair of adjacent values (the smaller from the larger).
2. Divide the result by the magnification factor (K).
3. Add the result to the smaller value, and keep adding it,
inserting (k-1) new elements between the pair.
4. Apply these steps to the rows and the columns.

Example: You have a portion of an image that you want to
enlarge to 3 times its original size.

Solution: We take each pair of adjacent values, subtract the smaller
one from the larger one, divide the result by 3, and then add the
result of the division to the smaller value.
For the pair 125 and 140: (140 - 125) / 3 = 5, so we add 5 twice, and
the result is two new numbers between 125 and 140:
[ 125 130 135 140 ]
Then we take the next two adjacent numbers, which are 140 and 155.

The matrix becomes as follows:
Computer vision modeling

Image Algebra
Algebraic operations are divided into mathematical operations
(arithmetic operations) and logical operations.

Mathematical Calculations:

Addition:
The addition process is used to combine information from two images
by adding the elements of the first image to the corresponding
elements of the second, starting with the first element of each image
and continuing element by element. We use the addition method for
image restoration and to add noise to an image (as a simple form of
encryption).
Example: You have parts of the following two images, I1 and I2.
Add these two parts.

Solution

Example: You have the following two images. Subtract one image from
the other.

Solution:
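
The pixel values of I1 and I2 are given in the accompanying figures; as a stand-in, a minimal sketch of element-wise addition and subtraction with made-up 2×2 values:

import numpy as np

I1 = np.array([[100, 50],
               [200, 30]], dtype=np.int16)   # made-up example values
I2 = np.array([[ 10, 40],
               [ 60, 90]], dtype=np.int16)

added      = I1 + I2     # element-by-element addition
subtracted = I1 - I2     # element-by-element subtraction

# For 8-bit images the results are usually clipped to the 0..255 range.
print(np.clip(added, 0, 255))
print(np.clip(subtracted, 0, 255))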

Multiplication
The process is done by multiplying the matrix elements of the image
by a factor, and it is used to brighten or darken the image.

For example, the factor K must be greater than one when you want to
brighten the image.

Example: You have the following image; brighten and darken it using
one of the digital image algebra operations.

Answer: Using the multiplication process: 1- brighten it, 2- darken it.

We use multiplication as an algebraic operation; for example, we
multiply this matrix by a factor (we choose the factor here, since it
was not specified in the question).

The coefficient K = 3 is greater than one, so it brightens the image.

If the image is to be darkened, multiply by a factor less than one,
for example 1/3.

Note:

K > 1 (increase): the image tends to become whiter (brighter).

K < 1 (decrease/shrinking): the image tends to become black (darker).

• Division:
- The elements of the given image are divided by a factor
greater than one. The division process makes the image
dark.
- Example // You have the following matrix, which is part of
an image. Divide the image by a factor of K = 4.
Solution
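
A minimal sketch of brightening, darkening, and dividing an image with made-up values (the factors K = 3 and K = 4 follow the examples above):

import numpy as np

img = np.array([[40, 20],
                [70, 50]], dtype=np.float32)     # made-up example values

brighter = np.clip(img * 3, 0, 255)        # K = 3 > 1: image tends to white
darker   = np.clip(img * (1 / 3), 0, 255)  # K < 1: image tends to black
divided  = np.clip(img / 4, 0, 255)        # division by K = 4 darkens the image

print(brighter)
print(darker)
print(divided)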

Logical operations:
• Logical AND operation:
Logical operations are applied to the elements of the image
after converting each element to binary form, so that the
logical operations can be used, for example to extract a
region of interest (ROI) with a mask.

• Logical OR operation:
It is done by taking a black square on a white background as a
mask for the required image data from the original image; the OR
process is similar to the addition process.
• Logical NOT operation:
It is used to produce the negative of the original image,
i.e. it gives the opposite of the image (like the negative of
camera film).

That is, the image data is reversed: black becomes white and
white becomes black.
Example: If you have the following image part, apply NOT to it.

The image resulting from the NOT process is close to black, and
the data of this image must be converted to binary (0,1)
format.

Example: Apply an AND operation to two elements of the image,
where the first element is 88 and the second element is 111.
Solution: We convert the two values to binary form, so that:

First number:  88  = 01011000
Second number: 111 = 01101111
AND result:          01001000 = 72

In the case of NOT, it is applied to a single number, so that every
zero becomes a one and every one becomes a zero.

Note: The same approach of converting the values to binary also
applies to the other logical gates (NAND, NOR, XOR).
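
A short Python sketch of the AND example above, plus OR and NOT, using NumPy's bitwise operations on 8-bit values:

import numpy as np

a = np.uint8(88)    # 01011000
b = np.uint8(111)   # 01101111

print(np.bitwise_and(a, b))   # 72  (01001000)
print(np.bitwise_or(a, b))    # 127 (01111111)
print(np.bitwise_not(a))      # 167 (NOT flips every bit of the 8-bit value)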

Image enhancement (spatial filters)
A filter is a process that cleans the image of any remaining
impurities; that is, it highlights the features of the part of the
image that we want by removing noise and impurities.

Filters are divided into three types:

1- Mean Filter
2- Median Filter
3- Enhancement Filter

The first and second types are used for:
1- Removing noise
2- Smoothing the image
The third type is used to sharpen the edges and details in the
image. Spatial filters are applied either by using the pixel
elements directly, without a mask, or by convolving a mask with
each element and its neighbours.

The effect of a mask can be predicted as follows:
1. If the sum of the mask's coefficients equals 1, the overall
illumination of the image is preserved (stays high).
2. If the sum of the coefficients equals 0, the image loses
illumination, that is, it tends to become black.
3. If the coefficients alternate between negative and positive
values, the mask extracts information about the edges.
4. If the coefficients are purely alternating waves, some kind of
distortion appears in the image.

1- Mean Filter
- It is a linear filter whose mask elements are all equal (for a 3×3
mask, each element is 1/9).

All of its elements are positive, and because they are all positive the
filter smooths (blurs) the image; and since the sum of the mask's
elements equals 1, the overall illumination of the image stays high.

Example: Apply the Mean mask to the following image:

Solution:

Because the output is a weighted sum of the input pixel values, the
filter is linear.
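
A minimal OpenCV/NumPy sketch of applying a 3×3 mean mask (the pixel values are placeholders):

import numpy as np
import cv2

img = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]], dtype=np.float32)   # made-up 3x3 image part

mean_mask = np.ones((3, 3), dtype=np.float32) / 9.0   # coefficients sum to 1

# Convolve the mask with the image (border pixels are replicated here).
smoothed = cv2.filter2D(img, -1, mean_mask, borderType=cv2.BORDER_REPLICATE)
print(smoothed)   # the centre pixel becomes the 3x3 average: 50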

2- Median Filter
It is a nonlinear filter that acts on the image elements within a
selected neighbourhood (mask), where the centre pixel of the
neighbourhood is replaced by the median value.
Example: Apply the Median Filter to the following image part.

There is no fixed mask of coefficients in the median filter; instead we
form the neighbourhood from the elements of the matrix themselves and
sort those image elements in ascending order, so the steps become:
1- First step: Arrange the elements in ascending order.

2- Second step: Divide the number of elements by 2 to find the
middle position (for nine elements, this is the fifth position).

3- Third step: Read the value at the fifth position.
The value at the fifth position is 5.
The value need not equal the position index.
4- Fourth step: Replace the middle element of the neighbourhood in
the matrix, which was 4, with the median value 5 (that is, we put 5
instead of 4), and the result becomes:

Example: You have the following image fragment.

Solution:
1- We take the (3×3) part and arrange its elements in ascending order.

2- We select the element in the middle position.

3- We find the value in the middle (here it is 5).

4- We replace the element in the middle of the neighbourhood and write
the resulting matrix.

We then take the next (3×3) part of the matrix and again
arrange it in ascending order.
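
A short sketch of a 3×3 median filter in Python, both manually (sort nine values and take the middle one) and with OpenCV's built-in function; the pixel values are placeholders:

import numpy as np
import cv2

neighbourhood = np.array([3, 1, 5, 7, 4, 9, 2, 8, 6])   # made-up 3x3 values

# Manual median: sort the nine values and take the fifth (middle) one.
middle = np.sort(neighbourhood)[len(neighbourhood) // 2]
print(middle)   # 5

# The same operation applied over a whole image with OpenCV:
img = np.random.randint(0, 256, (6, 6), dtype=np.uint8)   # made-up image
filtered = cv2.medianBlur(img, 3)                          # 3x3 median filter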

Histogram modification
The histogram is a chart of the gray levels of an image, showing how
these levels are distributed, so that the part of the range that
contains the image information fills the chart while the rest of the
range is empty, depending on the values of the image points.

Some typical histogram shapes can be listed as follows:
1. Histogram with a small spread of gray levels: Low Contrast Image.
2. Histogram with a large spread of gray levels: High Contrast Image.
3. Histogram clustered at the low end: Dark Image.
4. Histogram clustered at the high end: Bright (White) Image.

The process of changing the histogram is done in three ways:

• Histogram Stretching
• Histogram Shrink (compression)
• Histogram Slide

• The first method: Histogram Stretching

The histogram can be expanded according to the following law:

Stretch(I(r,c)) = [ (I(r,c) - I(r,c)min) / (I(r,c)max - I(r,c)min) ] * (MAX - MIN) + MIN

where:
1. I(r,c)max is the largest gray level value in the image.
2. I(r,c)min is the minimum gray level value in the image.
3. MAX and MIN are the maximum and minimum possible gray level values (255 and 0).
Example: You have the following image part; expand this part of the image
using the histogram stretching method.

• The second method: Histogram Shrinking

The histogram can be compressed according to the following law:

Shrink(I(r,c)) = [ (Shrinkmax - Shrinkmin) / (I(r,c)max - I(r,c)min) ] * (I(r,c) - I(r,c)min) + Shrinkmin

where:
1. I(r,c)max is the largest gray level value in the image.
2. I(r,c)min is the minimum gray level value in the image.
3. Shrinkmax and Shrinkmin are the desired maximum and minimum gray level
values of the compressed histogram, within the possible range (0, 255).
Example: You have the following image part; shrink this part of
the image using the histogram shrink method.

• The third method: Histogram Slide
The histogram can be shifted by a certain distance according to the
following law:
Slide(I(r,c)) = I(r,c) - OFFSET .......... (8)
where:
OFFSET: the amount (distance) by which the histogram is shifted.

Example: You have the following part of the image that needs to be
shifted by a distance of 10 units using the histogram slide method.
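
A minimal NumPy sketch of the three histogram operations, using placeholder pixel values and the 0..255 gray level range:

import numpy as np

img = np.array([[100, 110], [120, 140]], dtype=np.float32)   # made-up values

lo, hi = img.min(), img.max()        # I(r,c)min and I(r,c)max
MAX, MIN = 255.0, 0.0

# Histogram stretching to the full 0..255 range.
stretched = (img - lo) / (hi - lo) * (MAX - MIN) + MIN

# Histogram shrinking into a narrower range, e.g. 50..100.
shrink_max, shrink_min = 100.0, 50.0
shrunk = (shrink_max - shrink_min) / (hi - lo) * (img - lo) + shrink_min

# Histogram slide by 10 gray levels (clipped to the valid range).
slid = np.clip(img - 10, 0, 255)

print(stretched, shrunk, slid, sep="\n")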

Introduction

Computer vision is the automatic analysis of images and videos by
computers in order to gain some understanding of the world.
Computer vision is inspired by the capabilities of the human
vision system and, when initially addressed in the 1960s and
1970s, it was thought to be a relatively straightforward problem
to solve.

However, the reason we think/thought that vision is easy is that
we have our own visual system which makes the task seem
intuitive to our conscious minds. In fact, the human visual system
is very complex and even the estimates of how much of the brain
is involved with visual processing vary from 25% up to more than
50%.

1.1 A Difficult Problem

The first challenge facing anyone studying this subject is to
convince them that the problem is difficult. To try to illustrate the
difficulty, we first show three different versions of the same image
in Figure 1.1. For a computer, an image is just an array of values,
such as the array shown in the left-hand image in Figure 1.1. For
us, using our complex vision system, we can perceive this as a face
image but only if we are shown it as a grey scale image (top right).

Computer vision is quite like understanding the array of values
shown in Figure 1.1, but is more complicated as the array is really
much bigger (e.g. to be equivalent to the human eye a camera
would need around 127 million elements), and more complex (i.e.
with each point represented by three values in order to encode
colour information). To make the task even more convoluted, the
images are constantly changing, providing a stream of 50–60
images per second and, of course, there are two streams of data as
we have two eyes/cameras.

67 67 66 68 66 67 64 65 65 63 63 69 61 64 63 66 61 60

69 68 63 68 65 62 65 61 50 26 32 65 61 67 64 65 66 63

72 71 70 87 67 60 28 21 17 18 13 15 20 59 61 65 66 64

75 73 76 78 67 26 20 19 16 18 16 13 18 21 50 61 69 70

74 75 78 74 39 31 31 30 46 37 69 66 64 43 18 63 69 60

73 75 77 64 41 20 18 22 63 92 99 88 78 73 39 40 59 65

74 75 71 42 19 12 14 28 79 102 107 96 87 79 57 29 68 66

75 75 66 43 12 11 16 62 87 84 84 108 83 84 59 39 70 66

76 74 49 42 37 10 34 78 90 99 68 94 97 51 40 69 72 65

76 63 40 57 123 88 60 83 95 88 80 71 67 69 32 67 73 73

78 50 32 33 90 121 66 86 100 116 87 85 80 74 71 56 58 48

80 40 33 16 63 107 57 86 103 113 113 104 94 86 77 48 47 45

88 41 35 10 15 94 67 96 98 91 86 105 81 77 71 35 45 47

87 51 35 15 15 17 51 92 104 101 72 74 87 100 27 31 44 46

86 42 47 11 13 16 71 76 89 95 116 91 67 87 12 25 43 51

96 67 20 12 17 17 86 89 90 101 96 89 62 13 11 19 40 51

99 88 19 15 15 18 32 107 99 86 95 92 26 13 13 16 49 52

99 77 16 14 14 16 35 115 111 109 91 79 17 16 13 46 48 51

Figure 1.1 Different versions of an image. An array of numbers (left)
which are the values of the grey scales in the low-resolution image of
a face (top right). The task of computer vision is most like
understanding the array of numbers.

1.2 The Human Vision System

If we could duplicate the human visual system then the problem
of developing a computer vision system would be solved. So why
can’t we? The main difficulty is that we do not understand what
the human vision system is doing most of the time.

If you consider your eyes, it is probably not clear to you that your
colour vision (provided by the 6–7 million cones in the eye) is
concentrated in the centre of the visual field of the eye (known as
the macula). The rest of your retina is made up of around 120
million rods (cells that are sensitive to visible light of any
wavelength/colour). In addition, each eye has a rather large blind
spot where the optic nerve attaches to the retina. Somehow, we
think we see a continuous image (i.e. no blind spot) with colour
everywhere, but even at this lowest level of processing it is unclear
as to how this impression occurs within the brain.
The visual cortex (at the back of the brain) has been studied and
found to contain cells that perform a type of edge detection (see
Chapter 6), but mostly we know what sections of the brain do
based on localised brain damage to individuals. For example, a
number of people with damage to a particular section of the brain
can no longer recognise faces (a condition known as
prosopagnosia). Other people have lost the ability to sense moving
objects (a condition known as akinetopsia). These conditions
inspire us to develop separate modules to recognise faces (e.g. see
Section 8.4) and to detect object motion (e.g. see Chapter 9).
We can also look at the brain using functional MRI, which allows
us to see the concentration of electrical activity in different parts
of the brain as subjects perform various activities. Again, this
may tell us what large parts of the brain are doing, but it cannot
provide us with algorithms to solve the problem of interpreting
the massive arrays of numbers that video cameras provide.

1.3 Practical Applications of Computer Vision

Computer vision has many applications in industry, particularly
allowing the automatic inspection of manufactured goods at any
stage in the production line. For example, it has been used to:
Inspect printed circuit boards to ensure that tracks and
components are placed correctly. See Figure 1.2.
Inspect print quality of labels. See Figure 1.3.
Inspect bottles to ensure they are properly filled. See Figure 1.3.

Figure 1.2 PCB inspection of pads (left) and images of some
detected flaws in the surface mounting of components (right).
Reproduced by permission of James Mahon

Figure 1.3 Checking print quality of best-before dates (left), and
monitoring the level to which bottles are filled (right). Reproduced by
permission of Omron Electronics LLC

Guide robots when manufacturing complex products such as cars.

On the factory floor, the problem is a little simpler than in the
real world as the lighting can be constrained and the possible
variations of what we can see are quite limited. Computer vision
is now solving problems outside the factory. Computer vision
applications outside the factory include:

The automatic reading of license plates as they pass through
tollgates on major roads.
Augmenting sports broadcasts by determining distances for
penalties, along with a range of other statistics (such as how far
each player has travelled during the game).
Biometric security checks in airports using images of faces and
images of fingerprints. See Figure 1.4.
Augmenting movies by the insertion of virtual objects into video
sequences, so that they appear as though they belong (e.g. the
candles in the Great Hall in the Harry Potter movies).
Figure 1.4 Buried landmines in an infrared image (left). Reproduced by
permission of Zouheir Fawaz. Handprint recognition system (right).
Reproduced by permission of Siemens AG

▪ Assisting drivers by warning them when they are drifting
out of lane.
▪ Creating 3D models of a destroyed building from multiple
old photographs.
▪ Advanced interfaces for computer games allowing the real-
time detection of players or their hand-held controllers.
▪ Classification of plant types and anticipated yields based
on multispectral satellite images.
▪ Detecting buried landmines in infrared images. See Figure 1.4.

Some examples of existing computer vision systems in the outside
world are shown in Figure 1.4.

1.4 The Future of Computer Vision
The community of vision developers is constantly pushing the
boundaries of what we can achieve. While we can produce
autonomous vehicles, which drive themselves on a highway, we
would have difficulties producing a reliable vehicle to work on
minor roads, particularly if the road markings were poor. Even in
the highway environment, though, we have a legal issue, as who is
to blame if the vehicle crashes? Clearly, those developing the
technology do not think it should be them and would rather that
the driver should still be responsible should anything go wrong.
This issue of liability is a difficult one and arises with many vision
applications in the real world. Taking another example, if we
develop a medical imaging system to diagnose cancer, what will
happen when it mistakenly does not diagnose a condition? Even
though the system might be more reliable than any individual
radiologist, we enter a legal minefield. Therefore, for now, the
simplest solution is either to address only non-critical problems or
to develop systems, which are assistants to, rather than
replacements for, the current human experts.
Another problem exists with the deployment of computer vision
systems. In some countries the installation and use of video
cameras is considered an infringement of our basic right to
privacy. This varies hugely from country to country, from
company to company,
and even from individual to individual. While most people
involved with technology see the potential benefits of camera
systems, many people are inherently distrustful of video cameras
and what the videos could be used for. Among other things, they
fear (perhaps justifiably) a Big Brother scenario, where our
movements and actions are constantly monitored. Despite this,
the number of cameras is growing very rapidly, as there are
cameras on virtually every new computer, every new phone, every
new games console, and so on.
Moving forwards, we expect to see computer vision addressing
progressively harder problems; that is problems in more complex
environments with fewer constraints. We expect computer vision
to start to be able to recognise more objects of different types and
to begin to extract more reliable and robust descriptions of the
world in which they operate. For example, we expect computer
vision to

▪ become an integral part of general computer interfaces;


▪ provide increased levels of security through biometric
analysis;
▪ provide reliable diagnoses of medical conditions from
medical images and medical records;
▪ allow vehicles to be driven autonomously;
▪ automatically determine the identity of criminals through
the forensic analysis of video.

Figure 1.5 The ASIMO humanoid robot which has two cameras in
its ‘head’ which allow ASIMO to determine how far away things
are, recognise familiar faces, etc. Reproduced by permission of
Honda Motor Co. Inc
Ultimately, computer vision is aiming to emulate the capabilities
of human vision, and to provide these abilities to humanoid (and
other) robotic devices, such as ASIMO (see Figure 1.5). This is
part of what makes this field exciting, and surprising, as we all
have our own (human) vision systems which work remarkably
well, yet when we try to automate any computer vision task it
proves very difficult to do reliably.

2
Images
Images play a crucial role in computer vision, serving as the visual data
captured by devices like cameras. They represent the appearance of
scenes, which can be processed to highlight key features before
extracting information. Images often contain noise, which can be
reduced using basic image processing methods.

2.1 Cameras

A camera includes a photosensitive image plane that detects light, a
housing that blocks unwanted light, and a lens that directs light onto the
image plane in a controlled manner, focusing the light rays.

2.1.1 The Simple Pinhole Camera Model

The pinhole camera model is a basic yet realistic representation of a
camera, where the lens is considered a simple pinhole through which all
light rays pass to reach the image plane. This model simplifies real
imaging systems, which often have distortions caused by lenses.
Adjustments to address these distortions are discussed in Section 5.6.


Figure 2.1 illustrates the pinhole camera model, demonstrating how the
3D real world (right side) relates to images on the image plane (left side).
The pinhole serves as the origin in the XYZ coordinate system. In
practice, the image plane needs to be enclosed in a housing to block
stray light.

In homogeneous coordinates, w acts as a scaling factor for image
points. f_i and f_j represent a combination of the camera's focal
length and pixel sizes in the I and J directions. (c_i, c_j) are
the coordinates where the optical axis, a line perpendicular to the image
plane passing through the pinhole, intersects the image plane.
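
A small Python sketch of the pinhole projection implied by these parameters (the focal lengths f_i, f_j and principal point (c_i, c_j) below are made-up values):

import numpy as np

# Made-up intrinsic parameters: focal lengths (in pixels) and principal point.
f_i, f_j = 800.0, 800.0
c_i, c_j = 320.0, 240.0

K = np.array([[f_i, 0.0, c_i],
              [0.0, f_j, c_j],
              [0.0, 0.0, 1.0]])

# A 3D point in camera coordinates (X, Y, Z), with Z pointing away from the pinhole.
point = np.array([0.5, 0.2, 2.0])

# Project into homogeneous image coordinates; w is the scaling factor.
i_h, j_h, w = K @ point
i, j = i_h / w, j_h / w
print(i, j)   # pixel coordinates of the projected point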

2.2 Images

An image is a 2D projection of a 3D scene captured by a sensor,
represented as a continuous function of two coordinates (i, j), (column,
row), or (x, y). For digital processing, the image needs to be converted
into a suitable form

To process an image digitally, it is sampled into a matrix with M
rows and N columns and then quantized, assigning each matrix
element an integer value. The continuous range is divided into intervals,
commonly k = 256.

2.2.1 Sampling

Digital images are formed by sampling a continuous image into discrete
elements using a 2D array of photosensitive elements (pixels). Each pixel
has a fixed photosensitive area, with non-photosensitive borders between
them. There is a small chance that objects could be missed if their light
falls only in these border areas. A bigger challenge with sampling is that
each pixel represents the average luminance or chrominance over an
area, which might include light from multiple objects, especially at
object boundaries.

The number of samples in an image determines the ability to distinguish
objects within it. A sufficient resolution (number of pixels) is crucial for
accurately recognizing objects. However, if the resolution is too high, it
may include unnecessary details, making processing more difficult and
slower.

2.2.2 Quantization

Each pixel in a digital image f(i, j) represents scene brightness
as a continuous function. However, these brightness values must be
discretely represented using digital values. Typically, the number of
brightness levels per channel is k = 2^b, where b is the
number of bits, commonly set to 8.

Figure 2.2 Four different samplings of the same image; top left 256x192,
top right 128x96, bottom left 64x48 and bottom right 32x24

The essential question is how many bits are truly needed to represent
pixels. Using more bits increases memory requirements, while using
fewer bits results in information loss. Although 8-bit and 6-bit images
appear similar, the latter uses 25% fewer bits. However, 4-bit and 2-bit
images show significant issues, even if many objects can still be
recognized. The required bit depth depends on the intended use of the
image. For automatic machine interpretation, more quantization levels
are necessary to avoid false contours and incorrect segmentation, as seen
in lower-bit images.

Figure 2.3 Four different quantization of the same grey-scale image; top
left 8 bits, top right 6 bits, bottom left 4 bits and bottom right 2 bits
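
A minimal sketch of re-quantizing an 8-bit grayscale image to fewer bits (b is the target bit depth; the image is a made-up random array):

import numpy as np

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)   # made-up 8-bit image

def quantize(image, b):
    # Keep only 2**b brightness levels of an 8-bit image.
    step = 256 // (2 ** b)            # size of each quantization interval
    return (image // step) * step     # map every value to its interval start

print(quantize(img, 6))   # 64 levels
print(quantize(img, 2))   # 4 levels: false contours become visible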

2.3 Colour Images


Colour images have multiple channels, while grey-scale images have only one
channel representing luminance (Y) at each scene point. Colour images include
both luminance and chrominance (colour information), requiring multiple data
channels for representation. Consequently, colour images are larger and more
complex than grey-scale images due to the need to process each channel. Much
of image processing was originally developed for grey-scale images, leading to
less clear applications for colour images.

For many years, computer vision primarily focused on grey-level images due to
two main reasons: humans can easily interpret grey-level images, and they are
smaller and less complex (one value per point). However, colour images offer
additional useful information that aids in tasks like image segmentation into
distinct objects or surfaces. For instance, in Figure 2.4, it is significantly easier
to separate and segment different trees in a colour image compared to a grey-
scale image.

Figure 2.4 shows an RGB colour image on the left and its grey-scale equivalent
on the right. Humans are sensitive to light wavelengths between 400 nm and
700 nm, which is why most camera sensors are designed to detect these
wavelengths. Colour images are more complex than grey-scale images and are
usually represented in a three-channel colour space.

2.3.1 Red–Green–Blue (RGB) Images

The most common representation for colour images uses three channels
corresponding to the red (700 nm), green (546.1 nm), and blue (435.8 nm)
wavelengths, as illustrated in Figure 2.5. This indicates that the photosensitive
elements in the camera are designed to be spectrally sensitive to wavelengths
centered around these colours, as shown in Figure 2.6.

Figure 2.5 RGB Image (top left) shown with red channel (top right), green
channel (bottom left) and blue channel (bottom right)

When displaying the image, a combination of these three channels is presented
to the user.

Although there are around 16.8 million possible colors (256×256×256) with this
model, some colors cannot be accurately represented.

RGB color information can be converted to grayscale using the formula:
Y = 0.299R + 0.587G + 0.114B.
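
A short sketch of this conversion with OpenCV (note that OpenCV loads images in BGR channel order; "photo.jpg" is a placeholder file name):

import cv2
import numpy as np

bgr = cv2.imread("photo.jpg")                    # placeholder file name
if bgr is None:
    raise FileNotFoundError("photo.jpg not found")

# Built-in conversion (uses the same luminance weights internally).
grey = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

# Equivalent manual computation: Y = 0.299R + 0.587G + 0.114B.
b, g, r = cv2.split(bgr)
grey_manual = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)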

In most cameras, the photosensitive elements sensitive to different wavelengths
are not co-located; instead, they are arranged in a regular pattern, as
illustrated in Figure 2.7. The RGB values are then interpolated from these
sensed values.

Figure 2.7 illustrates a sample arrangement of photosensitive cells in an RGB
camera, where the red, green, and blue boxes represent individual cells
sensitive to wavelengths around their respective colours. This arrangement is
known as the Bayer pattern and is commonly used in modern CCD and older
CMOS cameras.


2.3.5 Other Colour Spaces

Many other colour spaces are available for use, which are essentially alternative
representations of an image. In theory, these spaces do not contain more or less
information than RGB, CMY, or HSV images; however, they do hold more
information than grey-scale images, which discard information compared to the
original colour images. OpenCV supports six additional colour spaces through
conversion functions.

2.3.6 Some Colour Applications

Color plays a significant role in various applications of computer vision and
image processing. These applications leverage color information to achieve
specific tasks, such as object recognition, segmentation, and enhancement. In
many cases, color is used as a feature to differentiate between objects in an
image or video. Below are some common color-based applications:

Object Recognition: Color can be a useful feature in distinguishing objects
from the background or from other objects. By identifying specific color ranges
that correspond to a certain object, algorithms can recognize and track objects
more effectively.

Segmentation: In segmentation, color is often used to divide an image into
meaningful regions or segments. For example, in medical imaging, color might
be used to segment tissues or organs. In natural images, color segmentation can
help isolate different elements in the scene.

1. Skin Detection: As discussed below, skin detection uses color
information to identify regions of an image that correspond to human
skin. This is useful in applications such as facial recognition, gesture
recognition, or human activity analysis.
2. Color-based Tracking: Color is often used in tracking moving objects in
video. By defining the color of an object, algorithms can track the object
as it moves through a sequence of frames, even in cluttered
environments.
3. Image Enhancement: Color can be adjusted to improve the quality of
an image, making it more visually appealing or easier to interpret. For
instance, adjusting the saturation, contrast, or brightness of an image
can make certain features stand out more.

These applications illustrate how color can be leveraged in various domains to
improve the effectiveness and accuracy of image processing tasks.

2.3.6.1 Skin Detection

Skin detection is a technique in image processing and computer vision that
focuses on identifying and locating regions in an image that correspond to
human skin. Skin detection typically relies on analyzing the color
characteristics of the pixels in an image. Since human skin has a characteristic
color, it tends to fall within specific ranges in certain color spaces, such as the
RGB (Red, Green, Blue) or HSV (Hue, Saturation, Value) color models.

Skin detection algorithms typically work by defining a set of skin color values
or thresholds within a chosen color space. These values are then used to filter
out non-skin pixels and highlight the regions of the image that are likely to be
skin. Some methods involve the use of machine learning models, such as neural
networks, to classify pixels as skin or non-skin based on trained data. Skin
detection is especially useful in applications where human presence or
interaction is important, such as in surveillance, gaming, and medical imaging.
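
A rough Python/OpenCV sketch of threshold-based skin detection (the HSV threshold values below are common heuristics, not values given in the text, and "person.jpg" is a placeholder file name):

import cv2
import numpy as np

bgr = cv2.imread("person.jpg")                   # placeholder file name
if bgr is None:
    raise FileNotFoundError("person.jpg not found")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

# Heuristic skin-colour range in HSV (assumption; tune for your data).
lower = np.array([0, 40, 60], dtype=np.uint8)
upper = np.array([25, 180, 255], dtype=np.uint8)

mask = cv2.inRange(hsv, lower, upper)             # 255 where the pixel looks like skin
skin_only = cv2.bitwise_and(bgr, bgr, mask=mask)  # keep only the candidate skin pixels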

2.4 Noise

In the context of image processing, "noise" refers to unwanted random
variations or disturbances that degrade the quality of an image. These
disturbances can be caused by various factors, such as imperfections in the
imaging device (e.g., camera sensors), environmental conditions (e.g., lighting
or temperature), or transmission errors (e.g., during data transfer). Noise
typically manifests as random pixel variations, which can distort the image and
affect its visual appearance or the performance of computer vision tasks, such
as object detection or image segmentation.

There are different types of noise, each with unique characteristics and impact
on an image. Noise can occur in both grayscale and color images, and its effect
depends on the type and level of noise introduced. Removing or reducing noise
is essential for improving the quality of images and ensuring the accuracy of
subsequent image processing tasks. Various techniques, such as filtering
methods and noise models, are used to address noise and improve the clarity of
images.


2.4.1 Types of Noise

In image processing and computer vision, noise refers to random variations or
disturbances in the image that can degrade its quality. Noise can arise due to
various factors such as sensor imperfections, environmental conditions, or
transmission errors. There are different types of noise, each with its
characteristics and effects on an image:

Gaussian Noise:

This type of noise follows a Gaussian distribution, where the intensity values of
the noisy pixels are distributed around the original pixel value with a certain
mean and variance. Gaussian noise appears as random variations in pixel
values and is commonly caused by sensor noise or thermal fluctuations in
cameras.

Salt-and-Pepper Noise:

This noise is characterized by random occurrences of black (0 intensity) and white
(maximum intensity) pixels in the image, resembling "salt" and "pepper" scattered
across the image. Salt-and-pepper noise is often caused by transmission errors or pixel
malfunctions in digital imaging systems.

Poisson Noise:

Poisson noise occurs when the pixel values in an image are affected by a Poisson
distribution. It is often observed in low-light imaging or photon-counting
sensors, where the number of detected photons follows a Poisson distribution.
Poisson noise is more pronounced in images with low signal levels.

Speckle Noise:

Speckle noise appears as grainy patterns in the image and is often caused by
interference from the imaging system, such as uneven illumination or sensor
imperfections. It can be especially problematic in medical imaging or remote
sensing applications, where it can obscure fine details in the image.

Quantization Noise:

Quantization noise arises during the process of converting continuous signal
values into discrete values. This happens when an image is captured and
digitized, leading to small errors between the actual and quantized pixel values.
While typically subtle, quantization noise can accumulate and become more
noticeable in images with low resolution or high contrast.

Each type of noise requires different methods for detection and removal, such
as filtering techniques like Gaussian blur, median filtering, or wavelet
denoising, depending on the nature of the noise and the specific image
processing task.

2.4.2 Noise Models

Noise models refer to mathematical representations used to describe the
behavior of noise in a system, particularly in the context of image processing
and computer vision. These models are essential for understanding how noise
affects images and for developing methods to reduce or remove noise. By using
noise models, one can simulate different types of noise, predict their impact on
image quality, and apply appropriate filtering techniques to improve the image.

2.4.2.1 Additive Noise

Additive noise is a type of noise that is added to the original signal or image,
meaning it is the result of unwanted random variations that accumulate on top
of the actual image data. This type of noise is typically modeled as an
independent, identically distributed random variable that is added to each pixel
of the image. The key feature of additive noise is that it does not depend on the
signal itself, but rather it is a separate entity that disturbs the image data.

2.4.2.2 Multiplicative Noise

Multiplicative noise is a type of noise where the noise is applied as a
multiplicative factor to the original signal or image. This means that the noise
affects the signal in proportion to its intensity, altering the pixel values based on
a certain factor. Unlike additive noise, which simply adds random variations,
multiplicative noise causes the image data to change in a way that depends on
the original signal. This type of noise is often encountered in scenarios such as
radar imaging, medical imaging, or remote sensing.

2.4.3 Noise Generation


Noise generation refers to the process of creating noise within a system,
especially in the context of image processing and signal processing. It involves
using mathematical models or algorithms to simulate the introduction of noise
into an image or signal. The purpose of noise generation is to study the
behavior of noise under different conditions, evaluate the effectiveness of noise
removal techniques, or test the robustness of various image processing
algorithms. By generating noise, one can simulate realistic scenarios where
noise might be present in real-world images or signals.

2.4.4 Noise Evaluation

Noise evaluation refers to the process of assessing the impact of noise on an
image or signal. It involves measuring how noise affects the quality, clarity, and
accuracy of the data, and determining the level of degradation caused by the
noise. This evaluation is essential for understanding the severity of the noise
and for developing strategies to mitigate its effects. Various metrics and
techniques, such as Signal-to-Noise Ratio (SNR) and Mean Squared Error
(MSE), are used to quantify and evaluate the impact of noise in image
processing and signal processing tasks.
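
A minimal NumPy sketch of generating additive Gaussian noise and salt-and-pepper noise, then evaluating the damage with MSE and SNR (the clean image is a made-up random array):

import numpy as np

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, (64, 64)).astype(np.float64)   # made-up clean image

# Additive Gaussian noise with zero mean and a chosen standard deviation.
gaussian_noisy = np.clip(clean + rng.normal(0, 15, clean.shape), 0, 255)

# Salt-and-pepper noise: a small fraction of pixels forced to 0 or 255.
sp_noisy = clean.copy()
mask = rng.random(clean.shape)
sp_noisy[mask < 0.02] = 0          # pepper
sp_noisy[mask > 0.98] = 255        # salt

# Evaluation: Mean Squared Error and Signal-to-Noise Ratio (in dB).
mse = np.mean((clean - gaussian_noisy) ** 2)
snr = 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - gaussian_noisy) ** 2))
print(mse, snr)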

2.5 Smoothing

Smoothing is a technique used in image processing to reduce noise and enhance
the quality of an image by averaging or blending pixel values in a
neighborhood. The goal of smoothing is to remove high-frequency noise, which
often manifests as abrupt changes in pixel intensity, while preserving important
features of the image, such as edges and contours. This is especially important
in tasks like object recognition or feature extraction, where preserving the
integrity of the original structure is crucial. Smoothing can be achieved using
various methods, including linear filters like averaging filters and nonlinear
methods like median filtering.

2.5.1 Image Averaging

Image averaging is a smoothing technique where the pixel values of an image
are replaced by the average of the pixel values in a neighborhood around each
pixel. This method reduces noise by averaging out random variations in pixel
intensity. In a typical image averaging process, a sliding window or kernel
moves across the image, and for each pixel, the average of the surrounding
pixels is calculated and assigned to the center pixel. This helps to smooth out
sharp transitions and minor details, making the image appear less noisy and
more uniform. However, it can also blur edges and fine details, which can be
problematic in certain applications such as edge detection or image
segmentation.

2.5.3 Rotating Mask

A rotating mask is a type of image processing technique that involves applying
a filter or mask to an image, where the mask can rotate at different angles
during the process. The mask typically consists of a set of weights or values
arranged in a specific pattern, and as it rotates, it performs calculations (such
as averaging, convolution, or other transformations) on the pixel values of the
image in its local neighborhood. This technique is particularly useful in tasks
where directional information is important, such as edge detection, texture
analysis, or feature extraction, as it allows the filter to adapt to different
orientations in the image. The rotating mask can provide more flexibility and
accuracy in analyzing patterns and structures that are not aligned along the
standard horizontal or vertical directions.

2.5.4 Median Filter

The median filter is a nonlinear smoothing technique used in image processing
to remove noise, particularly "salt-and-pepper" noise. Unlike linear filters,
which compute an average of pixel values, the median filter works by replacing
each pixel in the image with the median value of the pixels in its neighborhood.
The median is the middle value when the pixel values in the neighborhood are
sorted in ascending or descending order. This method is particularly effective in
removing outlier noise (such as extreme black or white pixels) without blurring
the edges of the image as much as linear filters. The median filter is commonly
used in applications such as denoising and image enhancement, where
preserving sharp details while reducing noise is important.
