Computer Vision Part 1
A PRACTICAL INTRODUCTION TO
Digital Image Processing
What is an image?
Grayscale Image. In this format, 0 stands for black, 255 stands for white, and 127 stands for mid-gray.
A 16-bit format is actually divided into three further channels: Red, Green, and Blue, the familiar RGB format.
used to develop the algorithms. Poor quality training data can result in poor
performance of the algorithm.
Image processing and Computer Vision are both very exciting fields of Computer Science.
Computer Vision:
Primary processing is divided into the following sections:
Solution:
Expanding the rows first (each row of the 3×3 block is repeated):

40 20 10
40 20 10
70 50 30
70 50 30
90 80 10
90 80 10

Then expanding the columns of that result (each column repeated), giving the 6×6 image:

40 40 20 20 10 10
40 40 20 20 10 10
70 70 50 50 30 30
70 70 50 50 30 30
90 90 80 80 10 10
90 90 80 80 10 10
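As a rough illustration (not part of the lecture), this row-and-column replication can be sketched in Python with NumPy; the 3×3 block values are taken from the example above.

import numpy as np

# Zero-order-hold style zoom: every pixel is repeated once along the rows
# and once along the columns, so the 3x3 block becomes the 6x6 result above.
block = np.array([[40, 20, 10],
                  [70, 50, 30],
                  [90, 80, 10]])

rows_expanded = np.repeat(block, 2, axis=0)          # 6 x 3: each row repeated
full_expanded = np.repeat(rows_expanded, 2, axis=1)  # 6 x 6: each column repeated

print(full_expanded)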
We can work with two adjacent pixels at a time in each row and each column, and we can expand the columns and rows together:
This method enlarges an N×N matrix into an image matrix of size (2N-1) × (2N-1).
Example
If we have a 3×3 matrix that represents part of the values of the digital
image, we need to expand the columns and rows together.
Solution:
The size of the matrix becomes 5×5.
Example for clarification: we have the following matrix, whose rows and columns will be expanded together.
Note that the column expansion is applied to the matrix that results from the row expansion, not to the original matrix.
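The lecture's 5×5 result is not reproduced here, so the following Python sketch only illustrates one common way to perform the (2N-1)×(2N-1) expansion: a new value is inserted between every pair of adjacent pixels as their average. The interpolation rule and the helper name expand_2n_minus_1 are assumptions, not necessarily the lecture's exact method.

import numpy as np

def expand_2n_minus_1(img):
    """Expand an N x N block to (2N-1) x (2N-1) by inserting a new value
    between every pair of adjacent pixels (here: their average)."""
    img = np.asarray(img, dtype=float)
    n = img.shape[0]
    out = np.zeros((2 * n - 1, 2 * n - 1))
    out[::2, ::2] = img                                          # original pixels
    out[1::2, ::2] = (img[:-1, :] + img[1:, :]) / 2              # between rows
    out[::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2              # between columns
    out[1::2, 1::2] = (out[1::2, :-2:2] + out[1::2, 2::2]) / 2   # remaining gaps
    return out

block = [[40, 20, 10],
         [70, 50, 30],
         [90, 80, 10]]
print(expand_2n_minus_1(block))   # 5 x 5 result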
4. Zoom using a factor (K):
This means that the image (matrix) is enlarged K times its size; for example, with K = 3 the capacity of the matrix is multiplied by 3.
Example: You have the following portion of an image that you want to enlarge to 3 times its original size.
That is, the step is (140 - 125) / 3 = 5, so we add 5 twice and obtain two new numbers between 125 and 140:
[ 125 130 135 140 ]
Then we take the next two adjacent numbers, which are 140 and 155, and repeat the same process.
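As a small illustration (not from the lecture), zooming a row by a factor K with linear interpolation can be sketched in Python; the helper name zoom_row is an assumption, and the values 125, 140, 155 come from the example above.

import numpy as np

def zoom_row(row, k):
    """Zoom a 1-D row of pixels by factor k using linear interpolation."""
    row = np.asarray(row, dtype=float)
    n = len(row)
    # positions of the new samples, k per original interval
    new_positions = np.arange((n - 1) * k + 1) / k
    return np.interp(new_positions, np.arange(n), row)

print(zoom_row([125, 140, 155], 3))
# -> 125 130 135 140 145 150 155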
Image Algebra
Algebraic operations are divided into mathematical operations (arithmetic
operations) and logical operations.
Mathematical Calculations:
Addition:
The addition process is used to combine information from two images by adding the elements of the first image to the corresponding elements of the second, starting with the first element of each image and continuing element by element. Addition is used for image restoration and for adding noise to an image (as a simple form of encryption).
Example: You have parts of the following two images. The first image is
I1 and the second image is I2. What is required to add these two parts?
Solution
Example: If addition is used to add noise to an image, how can the two original images be separated again?
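As a rough illustration (not part of the lecture), image addition can be sketched in Python; the sample matrices are assumptions rather than the lecture's I1 and I2, and results are clipped to the 8-bit range. Subtracting the noise image again recovers the original, which answers the question above.

import numpy as np

# Element-by-element addition of two images, clipped to the 0..255 range.
def add_images(i1, i2):
    s = np.asarray(i1, dtype=int) + np.asarray(i2, dtype=int)
    return np.clip(s, 0, 255)

i1 = np.array([[100, 120], [200, 50]])   # made-up sample values
i2 = np.array([[ 30, 140], [ 90, 10]])
print(add_images(i1, i2))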
Subtraction
The subtraction process is used to remove information from an image by subtracting every element of the second image from the corresponding element of the first image.
Example: You have the following two images. Subtract the second image from the first.
Solution:
Multiplication
Multiplication is done by multiplying the matrix elements of the image by a factor, and it is used to increase or decrease the image values.
For example, the factor K must be greater than one when you want to enlarge (brighten) the image.
Example: You have the following image; scale it up and down using one of the digital image algebra operations.
Example: We have the following matrix (part of an image), which is the result of the multiplication process with K = 3. Find the original matrix.
Note:
• Division:
- The elements of the given image are divided by a factor greater
than one. The division process makes the image dark.
- For example: You have the following matrix, which is part of an image. Divide the image by a factor of K = 4.
Solution
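As a rough illustration (not from the lecture), multiplication and division by a factor K can be sketched in Python; the clipping to the 0..255 range and the sample values are assumptions.

import numpy as np

# Multiplying by K > 1 brightens (enlarges the values of) the image;
# dividing by K > 1 darkens it. Results are clipped to the 8-bit range.
def scale_image(img, k, divide=False):
    img = np.asarray(img, dtype=float)
    out = img / k if divide else img * k
    return np.clip(out, 0, 255).astype(int)

part = np.array([[20, 40], [60, 80]])      # made-up sample values
print(scale_image(part, 3))                # multiplication, K = 3
print(scale_image(part, 4, divide=True))   # division, K = 4 (darker image)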
Logical operations:
• Logical AND operation:
Logical operations are applied to the elements of the image after converting each element to its binary form, and they are usually applied to a region of interest (ROI).
AND is similar to the multiplication process: a mask containing a white square is ANDed with the image elements, so that the output is the part of the image corresponding to the white square.
(For AND, the part we want is marked white in the mask, while for OR the part we want is marked black in the mask.)
• Logical OR operation
It is done by using a mask with a black square over the required image data and a white background, and the OR process is similar to the addition process.
• The logical operation NOT
It is used to produce the negative of the original image, i.e. it inverts the image values (like a photographic negative of camera film).
That is, the image data is reversed, i.e., black becomes white and
white becomes black
Example: Apply NOT to the following image part.
The image resulting from the NOT process is close to black, and the data of this image must first be converted to binary (0,1) format.
Note: the values here are distorted, so a second method is used: the values are converted to binary before the logical operations are applied, and the same applies to the other gates (NAND, NOR, XOR).
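As a rough illustration (not part of the lecture), the logical operations can be sketched in Python on a tiny ROI mask; the sample values are assumptions, and the masks follow the description above (white square for AND, black square on a white background for OR).

import numpy as np

img = np.array([[ 10, 200],
                [ 90, 150]], dtype=np.uint8)   # made-up sample values

roi = np.array([[255,   0],
                [255,   0]], dtype=np.uint8)   # white square marks the ROI

print(np.bitwise_and(img, roi))        # keeps only the ROI pixels, background black
print(np.bitwise_or(img, 255 - roi))   # ROI kept, background forced to white
print(np.bitwise_not(img))             # negative image: each value becomes 255 - value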
Image enhancement (spatial filters)
A filter is a process that cleans the image of remaining impurities; that is, it highlights the features of the part of the image that we want by removing noise and impurities.
1- Mean Filter
- It is a linear filter whose mask elements are all positive and equal. Because all of the elements are positive, the filter blurs (smooths) the image, and since the sum of the mask's elements equals 1, the overall brightness is preserved while fine detail is lost.
Solution:
We know that the result is two points, so we connect them and the
shape becomes linear.
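As a minimal illustration (not from the lecture), a 3×3 mean filter can be sketched in Python; the sample matrix is an assumption, and the border pixels are left unchanged for simplicity.

import numpy as np

# 3x3 mean filter: each inner pixel is replaced by the average of its
# 3x3 neighbourhood (every mask element is 1/9, so the mask sums to 1).
def mean_filter(img):
    img = np.asarray(img, dtype=float)
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            out[r, c] = img[r - 1:r + 2, c - 1:c + 2].mean()
    return out

part = np.array([[1, 2, 3, 4],
                 [4, 5, 6, 7],
                 [7, 8, 9, 1],
                 [2, 3, 4, 5]])
print(mean_filter(part))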
2- Median Filter
It is a nonlinear filter that operates on the image elements under a selected mask (window): the element at the centre of the window is replaced by the median (middle) value of the window.
We place the median value 5 instead of the middle element in the original matrix, which is 4 (that is, we put 5 in place of 4), and it becomes:
Solution:
1- We take the first (3×3) part and arrange its elements in ascending order.
2- We take the next (3×3) part of the matrix and arrange it in ascending order as well.
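As a small illustration (not from the lecture), the 3×3 median filter can be sketched in Python; the sample matrix is an assumption, and border pixels are left unchanged for simplicity.

import numpy as np

# 3x3 median filter: the nine values under the mask are sorted in
# ascending order and the centre pixel is replaced by the fifth (middle) value.
def median_filter(img):
    img = np.asarray(img)
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            window = np.sort(img[r - 1:r + 2, c - 1:c + 2], axis=None)
            out[r, c] = window[4]          # 5th element of the sorted 9
    return out

part = np.array([[1, 2, 3, 4],
                 [4, 9, 4, 7],
                 [7, 8, 5, 1],
                 [2, 3, 4, 5]])
print(median_filter(part))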
Enhancement Filter
Image Quantization:
- The difference between compression and reduction: image reduction (quantization) is the process of transforming the image data by removing some image information, mapping a group of image values onto a single value.
- Compression, on the other hand, treats the image as a file, whereas reduction may delete part of the image information and works directly on the image values.
2- Spatial Reduction
- Here, work is done on the coordinates (r, c) of the image elements, i.e. their locations, for example (1,1).
A- The first method: Threshold
A specific value of the gray levels is chosen; this value is called the threshold. Any image value higher than the threshold becomes one, and any value lower than it becomes zero. This means that an image with 256 gray levels is converted into a binary image.
For example:
If the threshold value is 127, apply it to the following values:
Solution:
- The highest value is 251 and the lowest value is 11. With the threshold at 127, every value above 127 becomes 1 and every value at or below 127 becomes 0.
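As a minimal illustration (not from the lecture), thresholding at 127 can be sketched in Python; apart from 251 and 11, which appear in the solution above, the sample values are assumptions.

import numpy as np

# Values above the threshold become 1, values at or below it become 0,
# turning a 256-level image into a binary image.
def threshold(img, t=127):
    return (np.asarray(img) > t).astype(int)

values = np.array([[251, 11, 130],
                   [ 90, 200, 127]])
print(threshold(values, 127))
# -> [[1 0 1]
#     [0 1 0]]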
Example:
We want to reduce the number of gray levels from 256 to 32. Since 256 / 32 = 8, every 8 consecutive levels form one group, and with the AND method each group is mapped to its smallest value.
Solution:
These 256 levels must be reduced to 32 levels, meaning that every 8 consecutive levels are placed in one cell, after which we take the lowest value in each cell. The first cell gives 0, the second 8, the third 16, and so on up to 248, so that the number of extracted values is 32.
The OR method, on the other hand, takes the largest number in each cell. Why? Because OR sets the low bits to one rather than zero, so the extracted values become 7, 15, 23, ..., 255; each OR value is one less than the AND value of the next cell.
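As a short illustration (not from the lecture), the AND and OR reduction from 256 to 32 gray levels can be sketched in Python using bit masks; the sample values are assumptions.

import numpy as np

# ANDing with 11111000 maps each group of 8 levels to its smallest value
# (0, 8, 16, ..., 248); ORing with 00000111 maps it to its largest value
# (7, 15, 23, ..., 255).
values = np.array([0, 5, 12, 100, 130, 255], dtype=np.uint8)

reduced_and = values & 0b11111000   # keep the 5 high bits
reduced_or  = values | 0b00000111   # set the 3 low bits

print(reduced_and)   # -> 0 0 8 96 128 248
print(reduced_or)    # -> 7 7 15 103 135 255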
Example: If you have 256 standard gray levels and you want to reduce them to 16 levels?
Answer: 256 / 16 = 16, so 16 levels go into each cell; the division is by 16, and the first cell covers the levels from 0 to 15, the next from 16 to 31, and so on.
This method is used to shrink the image (reduce it) using a specific mask.
Example:
Given the following image values, use the AND mask method to reduce this part of the image, given that the number of bits per element is 8.
Solution:
The law of gray levels
2- Spatial Reduction
Spatial reduction is done in three ways:
1- Average
2- Median
3- Deletion of rows/columns
This is a method that takes a group of adjacent elements and replaces them by their average.
Example: Use the average method on the following image part.
Solution:
If the average is taken over rows, it is the sum of the row divided by the number of elements in that row.
- If a mask is used and the mask is assumed to be 3×3, we take the first 3×3 part of the matrix and arrange its elements in ascending order.
So the median is the fifth element, and its value here is also 5.
- Some image data is deleted; for example, the image size is reduced by a factor of 2. Here every second row or column of the image is deleted.
Example: You have the following image part that needs to be reduced by 2 along the columns.
So the second and fourth columns are deleted, leaving only the first and third columns of the matrix.
If the reduction is by a factor of 3, we delete two columns out of every three (3 - 1 = 2), for example the second and third, so only every third column of the matrix is kept.
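As a rough illustration (not from the lecture), the three spatial-reduction methods can be sketched in Python for a reduction factor of 2; the 4×4 sample matrix is an assumption.

import numpy as np

img = np.arange(1, 17).reshape(4, 4).astype(float)   # made-up 4x4 image part

# Group the image into 2x2 blocks.
blocks = img.reshape(2, 2, 2, 2).swapaxes(1, 2)

reduced_avg = blocks.mean(axis=(2, 3))         # 1- average of each 2x2 block
reduced_med = np.median(blocks, axis=(2, 3))   # 2- median of each 2x2 block
reduced_del = img[::2, ::2]                    # 3- delete every second row and column
reduced_cols = img[:, ::2]                     #    or columns only, as in the example

print(reduced_avg)
print(reduced_med)
print(reduced_del)
print(reduced_cols)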
The histogram can be expanded (stretched) according to the following law:
Stretch(I(r,c)) = [ (I(r,c) - I(r,c)min) / (I(r,c)max - I(r,c)min) ] * (MAX - MIN) + MIN
where:
1. I(r,c)max is the largest gray-level value in the image.
2. I(r,c)min is the smallest gray-level value in the image.
3. MAX and MIN are the largest and smallest possible gray-level values (255 and 0).
Example: You have the following image part. Expand this part of the image using the histogram stretch method.
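As a minimal illustration (not from the lecture), the stretch formula above can be sketched in Python; the sample values are assumptions.

import numpy as np

# Histogram stretch to the full range [MIN, MAX] = [0, 255].
def stretch(img, new_min=0, new_max=255):
    img = np.asarray(img, dtype=float)
    i_min, i_max = img.min(), img.max()
    out = (img - i_min) / (i_max - i_min) * (new_max - new_min) + new_min
    return np.round(out).astype(int)

part = np.array([[100, 120], [140, 160]])   # made-up sample values
print(stretch(part))
# -> [[  0  85]
#     [170 255]]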
• The second method is histogram shrinking
The histogram can be shrunk according to the following law:
Shrink(I(r,c)) = [ (Shrinkmax - Shrinkmin) / (I(r,c)max - I(r,c)min) ] * (I(r,c) - I(r,c)min) + Shrinkmin
where:
1. I(r,c)max is the largest gray-level value in the image.
2. I(r,c)min is the smallest gray-level value in the image.
3. Shrinkmax and Shrinkmin are the desired maximum and minimum gray-level values; the possible range is (0, 255).
Example: You have the following image part. Shrink this part of the image using the histogram shrink method.
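As a small illustration (not from the lecture), the shrink formula above can be sketched in Python; the target range 50..100 and the sample values are assumptions.

import numpy as np

# Histogram shrink into a narrower range [shrink_min, shrink_max].
def shrink(img, shrink_min=50, shrink_max=100):
    img = np.asarray(img, dtype=float)
    i_min, i_max = img.min(), img.max()
    out = (shrink_max - shrink_min) / (i_max - i_min) * (img - i_min) + shrink_min
    return np.round(out).astype(int)

part = np.array([[0, 80], [160, 240]])   # made-up sample values
print(shrink(part))
# -> [[ 50  67]
#     [ 83 100]]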
• The third method: histogram slide (shift)
The histogram can be shifted by a certain distance according to the following law:
Slide(I(r,c)) = I(r,c) - OFFSET .................(8)
where:
OFFSET: the distance by which the histogram is shifted.
Example: You have the following part of the image that needs to be
shifted by a distance of 10 units using the Histogram slide method.
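As a minimal illustration (not from the lecture), the histogram slide can be sketched in Python with OFFSET = 10 as in the example; the sample values are assumptions, and results are clipped to the valid 0..255 range.

import numpy as np

# Histogram slide: every pixel is shifted by OFFSET
# (the lecture's law subtracts OFFSET), then clipped to 0..255.
def slide(img, offset):
    out = np.asarray(img, dtype=int) - offset
    return np.clip(out, 0, 255)

part = np.array([[12, 50], [130, 255]])   # made-up sample values
print(slide(part, 10))
# -> [[  2  40]
#     [120 245]]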
Introduction
67 67 66 68 66 67 64 65 65 63 63 69 61 64 63 66 61 60
69 68 63 68 65 62 65 61 50 26 32 65 61 67 64 65 66 63
72 71 70 87 67 60 28 21 17 18 13 15 20 59 61 65 66 64
75 73 76 78 67 26 20 19 16 18 16 13 18 21 50 61 69 70
74 75 78 74 39 31 31 30 46 37 69 66 64 43 18 63 69 60
73 75 77 64 41 20 18 22 63 92 99 88 78 73 39 40 59 65
74 75 71 42 19 12 14 28 79 102 107 96 87 79 57 29 68 66
75 75 66 43 12 11 16 62 87 84 84 108 83 84 59 39 70 66
76 74 49 42 37 10 34 78 90 99 68 94 97 51 40 69 72 65
76 63 40 57 123 88 60 83 95 88 80 71 67 69 32 67 73 73
88 41 35 10 15 94 67 96 98 91 86 105 81 77 71 35 45 47
86 42 47 11 13 16 71 76 89 95 116 91 67 87 12 25 43 51
96 67 20 12 17 17 86 89 90 101 96 89 62 13 11 19 40 51
99 88 19 15 15 18 32 107 99 86 95 92 26 13 13 16 49 52
If you consider your eyes, it is probably not clear to you that your
colour vision (provided by the 6–7 million cones in the eye) is
concentrated in the centre of the visual field of the eye (known as
the macula). The rest of your retina is made up of around 120
million rods (cells that are sensitive to visible light of any
wavelength/colour). In addition, each eye has a rather large blind
spot where the optic nerve attaches to the retina. Somehow, we
think we see a continuous image (i.e. no blind spot) with colour
everywhere, but even at this lowest level of processing it is unclear
as to how this impression occurs within the brain.
The visual cortex (at the back of the brain) has been studied and
found to contain cells that perform a type of edge detection (see
Chapter 6), but mostly we know what sections of the brain do
based on localised brain damage to individuals. For example, a
number of people with damage to a particular section of the brain
can no longer recognise faces (a condition known as
prosopagnosia). Other people have lost the ability to sense moving
objects (a condition known as akinetopsia). These conditions
inspire us to develop separate modules to recognise faces (e.g. see
Section 8.4) and to detect object motion (e.g. see Chapter 9).
We can also look at the brain using functional MRI, which allows
us to see the concentration of electrical activity in different parts
of the brain as subjects perform various activities. Again, this
may tell us what large parts of the brain are doing, but it cannot
provide us with algorithms to solve the problem of interpreting
the massive arrays of numbers that video cameras provide.
1.3 Practical Applications of Computer Vision
• Advanced interfaces for computer games allowing the real-time detection of players or their hand-held controllers.
• Classification of plant types and anticipated yields based on multispectral satellite images.
• Detecting buried landmines in infrared images. See Figure 1.4.
Some examples of existing computer vision systems in the outside
world are shown in Figure 1.4.
This issue of liability is a difficult one and arises with many vision
applications in the real world. Taking another example, if we
develop a medical imaging system to diagnose cancer, what will
happen when it mistakenly does not diagnose a condition? Even
though the system might be more reliable than any individual
radiologist, we enter a legal minefield. Therefore, for now, the
simplest solution is either to address only non-critical problems or
to develop systems which are assistants to, rather than
replacements for, the current human experts.
Figure 1.5 The ASIMO humanoid robot which has two cameras in
its ‘head’ which allow ASIMO to determine how far away things
are, recognise familiar faces, etc. Reproduced by permission of
Honda Motor Co. Inc
2.1 Cameras
2.1.1 The Simple Pinhole Camera Model
Figure 2.1 The simple pinhole camera model, showing the image plane (axes I and J), the focal point at the pinhole, and the real-world axes X, Y and Z.
Figure 2.1 illustrates the pinhole camera model, demonstrating how the
3D real world (right side) relates to images on the image plane (left side).
The pinhole serves as the origin in the XYZ coordinate system. In
practice, the image plane needs to be enclosed in a housing to block
stray light.
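As a rough sketch (not from the book), the pinhole geometry can be written as a simple perspective projection; the focal length f, the point coordinates and the function name below are assumptions, and the inversion of the image through the pinhole is ignored.

# Perspective projection of a scene point (X, Y, Z), expressed in the
# camera coordinate system, onto an image plane at distance f from the pinhole.
def project(x, y, z, f=1.0):
    i = f * x / z
    j = f * y / z
    return i, j

print(project(2.0, 1.0, 4.0))   # -> (0.5, 0.25)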
2.2 Images
2.2.2 Quantization
Figure 2.2 Four different samplings of the same image; top left 256x192,
top right 128x96, bottom left 64x48 and bottom right 32x24
The essential question is how many bits are truly needed to represent
pixels. Using more bits increases memory requirements, while using
fewer bits results in information loss. Although 8-bit and 6-bit images
appear similar, the latter uses 25% fewer bits. However, 4-bit and 2-bit
images show significant issues, even if many objects can still be
recognized. The required bit depth depends on the intended use of the
image. For automatic machine interpretation, more quantization levels
are necessary to avoid false contours and incorrect segmentation, as seen
in lower-bit images.
Figure 2.3 Four different quantizations of the same grey-scale image; top left 8 bits, top right 6 bits, bottom left 4 bits and bottom right 2 bits