Computer Vision 17-11
What are the types of digital images?
Types of images
What are the differences between image
processing and computer vision?
Describe the stages of image analysis processing.
The stages of image analysis processing can be outlined as
follows:
Solution:
Duplicating the columns gives a 3 × 6 matrix:
40 40 20 20 10 10
70 70 50 50 30 30
90 90 80 80 10 10
Duplicating the rows gives a 6 × 3 matrix:
40 20 10
40 20 10
70 50 30
70 50 30
90 80 10
90 80 10
Duplicating both the rows and the columns gives a 6 × 6 matrix:
40 40 20 20 10 10
40 40 20 20 10 10
70 70 50 50 30 30
70 70 50 50 30 30
90 90 80 80 10 10
90 90 80 80 10 10
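A minimal NumPy sketch of this pixel-duplication (zero-order hold) zoom, built from the 3 × 3 values used above:

```python
import numpy as np

# The 3x3 image fragment from the example above.
img = np.array([[40, 20, 10],
                [70, 50, 30],
                [90, 80, 10]])

cols_doubled = np.repeat(img, 2, axis=1)   # duplicate each column -> 3x6
rows_doubled = np.repeat(img, 2, axis=0)   # duplicate each row    -> 6x3
both_doubled = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)     # -> 6x6

print(both_doubled)
```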
How to find the average:
We find the average of two adjacent pixel values and insert it between them. For example, for the pair 4 and 8: we add them to get 12, divide by 2, and the average is 6, so the result is written as 4 6 8.
If we apply this method by averaging along the rows, the number of columns increases, and if we apply it along the columns, the number of rows increases.
We can also work on the pixel pairs in each row and in each column, i.e. expand the columns and the rows together.
This method enlarges an N×N matrix into an image matrix of size (2N-1)×(2N-1).
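A sketch of this averaging (first-order hold) expansion, assuming the column averages are inserted first and the row averages are then computed on the widened matrix (matching the note below that the second expansion operates on the result of the first):

```python
import numpy as np

def expand_by_averaging(img):
    """Insert the average of each pair of adjacent values, first between
    columns and then between rows, turning an N x N matrix into (2N-1) x (2N-1)."""
    img = np.asarray(img, dtype=float)
    # Averages between horizontally adjacent pixels.
    col_avg = (img[:, :-1] + img[:, 1:]) / 2
    wide = np.empty((img.shape[0], 2 * img.shape[1] - 1))
    wide[:, 0::2] = img
    wide[:, 1::2] = col_avg
    # Averages between vertically adjacent rows of the widened matrix.
    row_avg = (wide[:-1, :] + wide[1:, :]) / 2
    out = np.empty((2 * img.shape[0] - 1, wide.shape[1]))
    out[0::2, :] = wide
    out[1::2, :] = row_avg
    return out

# The pair 4 and 8 from the text becomes 4, 6, 8 in the first row.
print(expand_by_averaging([[4, 8], [6, 2]]))
```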
Example
If we have a 3×3 matrix that represents part of the values of the
digital image, we need to expand the columns and rows together.
Solution:
The size of the matrix becomes 5 × 5 (since 2 × 3 - 1 = 5).
Example for clarification: we have the following matrix, whose rows and columns will be expanded together.
The expansion is performed in two steps: the second expansion (rows or columns) operates on the matrix resulting from the first expansion, not on the original matrix.
4. Zoom using a factor (K):
This means that the image (matrix) is enlarged by a chosen factor, for example K = 3, i.e. three times its size: between each pair of adjacent pixel values, K - 1 new interpolated values are inserted, which multiplies the capacity of the matrix accordingly.
With K = 3 the step between 125 and 140 is (140 - 125) / 3 = 5, so we add 5 twice and obtain the two new values between 125 and 140:
[ 125 130 135 140 ]
Then we take the next two adjacent numbers, which are 140 and 155, and repeat the process.
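A sketch of this factor-K zoom along one row of pixels, using the values 125, 140 and 155 from the example:

```python
import numpy as np

def zoom_row(values, k):
    """Insert k-1 linearly interpolated values between each pair of
    adjacent pixel values in a row (zoom factor k)."""
    values = np.asarray(values, dtype=float)
    out = []
    for a, b in zip(values[:-1], values[1:]):
        step = (b - a) / k
        out.extend(a + step * np.arange(k))   # a, a+step, ... up to (but not including) b
    out.append(values[-1])
    return np.array(out)

print(zoom_row([125, 140, 155], 3))
# [125. 130. 135. 140. 145. 150. 155.]
```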
Computer vision modeling
Image Algebra
Algebraic operations are divided into mathematical operations
(arithmetic operations) and logical operations.
Mathematical Calculations:
Addition:
The addition process is used to combine the information from two images by adding their elements together, starting with the first element of the first image and the first element of the second image, and so on for the rest of the elements. Addition is used for image restoration and for adding noise to an image (as a simple form of encryption).
Example: You have parts of the following two images, the first image I1 and the second image I2. Add these two parts.
Solution
Example: You have the following two images; subtract one from the other.
Solution:
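The matrices for these two examples are not reproduced in these notes, so the sketch below uses hypothetical 3 × 3 fragments to show how the element-wise addition and subtraction would be computed, with the results clipped to the 0 to 255 range:

```python
import numpy as np

# Hypothetical 3x3 image fragments (placeholders for the matrices in the examples).
I1 = np.array([[100, 120, 90], [80, 200, 150], [60, 70, 110]], dtype=np.int16)
I2 = np.array([[ 50,  30, 200], [40,  90, 120], [10, 250,  60]], dtype=np.int16)

added      = np.clip(I1 + I2, 0, 255)   # element-wise addition, kept in [0, 255]
subtracted = np.clip(I1 - I2, 0, 255)   # element-wise subtraction, negatives clipped to 0

print(added)
print(subtracted)
```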
Multiplication
Multiplication is performed by multiplying the matrix elements of the image by a factor, and it is used to increase or decrease the image values (brighten or darken the image).
For example, the factor K must be greater than one when you want to increase (brighten) the image values.
Example: You have the following image; scale its values up or down using one of the digital image algebra operations.
We use multiplication as the algebraic (arithmetic) operation; for example, we multiply this matrix by a factor (here we choose the factor ourselves, since it was not specified in the question).
Note:
K < 1 in the case of decreasing the values, and in that case the image tends towards black (darkness).
• Division:
- The elements of the given image are divided by a factor greater than one. The division process makes the image darker.
- Example: You have the following matrix, which is part of an image. Divide the image by a factor of K = 4.
Solution
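The matrix for this example is likewise not shown here; as a sketch (with hypothetical values) of multiplying by a factor K > 1 and dividing by K = 4:

```python
import numpy as np

# Hypothetical image fragment (placeholder for the matrix in the question).
img = np.array([[40, 80, 120], [60, 100, 200], [20, 140, 180]], dtype=float)

brighter = np.clip(img * 2, 0, 255)   # K > 1: values grow, the image becomes brighter
darker   = img / 4                    # division by K = 4: values shrink, the image becomes darker

print(brighter)
print(darker)
```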
Logical operations:
• Logical AND operation:
Logical operations are applied to the elements of the image after converting each element of the image to its binary form, so that the logical operations can be used on it, typically through the region-of-interest (ROI) method.
• Logical OR operation:
It is done by taking a black square on a white background as a mask for the required image data from the original image; the OR process is similar to the addition process.
• Logical NOT operation:
It is used to produce the negative of the original image, i.e. it inverts the image (like negative camera film). That is, the image data is reversed: black becomes white and white becomes black.
Example: Apply NOT to the following image part.
The image resulting from the NOT process is close to black, and the data of this image must first be converted to binary (0, 1) format.
NOT operates on a single number only, so that every 0 becomes 1 and every 1 becomes 0, whereas AND and OR combine a first number and a second number.
Note: when the values here become distorted, a second method is used to handle these values for the logical operations, converting them to binary; the same applies to the other gates (NAND, NOR, XOR).
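As a small sketch of the three logical operations on an 8-bit image fragment, using a hypothetical ROI mask (a white square on a black background for AND, and its inverse for OR):

```python
import numpy as np

# Hypothetical 8-bit image fragment and a region-of-interest (ROI) mask.
img = np.array([[ 10, 200,  50],
                [120, 255,  30],
                [ 90,  60, 180]], dtype=np.uint8)

roi = np.zeros_like(img)
roi[0:2, 0:2] = 255        # white square on a black background

and_result = img & roi             # AND keeps the ROI and blacks out the rest
or_result  = img | (255 - roi)     # OR with a black square on a white background keeps the ROI
not_result = ~img                  # NOT inverts every bit: the negative image (255 - img)

print(and_result)
print(or_result)
print(not_result)
```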
Image enhancement (spatial filters)
A filter is a process that cleans the image of any remaining impurities; that is, it highlights the features of the part of the image that we want by removing noise and impurities.
The result of a mask can be predicted from its coefficients as follows:
1. If the sum of the mask's coefficients equals 1, the illumination (brightness) of the image remains high.
2. If the sum of the coefficients equals 0, the image loses its illumination, that is, it tends towards black.
3. If the coefficients alternate between negative and positive values (wave-like), the mask gives information about the edges.
4. If the coefficients are purely wave-like, there is some kind of distortion in the image.
1- Mean Filter
- It is a linear filter whose elements are all positive; because they are all positive, the filter blurs (distorts) the image, and since the sum of the mask's elements equals 1, the illumination of the image remains high.
Solution:
We know that the result consists of two points, so we connect them and the resulting shape is linear (a straight line).
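A minimal sketch of a 3 × 3 mean filter (every coefficient 1/9, so the coefficients sum to 1), applied with plain NumPy; leaving the border pixels unchanged is an assumption, and the sample values are hypothetical:

```python
import numpy as np

def mean_filter(img, size=3):
    """Apply a size x size mean (averaging) filter; border pixels are left unchanged."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    r = size // 2
    for i in range(r, img.shape[0] - r):
        for j in range(r, img.shape[1] - r):
            out[i, j] = img[i - r:i + r + 1, j - r:j + r + 1].mean()
    return out

print(mean_filter([[10, 20, 30, 40],
                   [50, 60, 70, 80],
                   [90, 100, 110, 120],
                   [130, 140, 150, 160]]))
```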
2- Median Filter
It is a nonlinear filter that operates on the image elements by selecting a mask (window) of neighbouring elements and replacing the centre pixel with the middle (median) value of those elements.
Example: Apply the median filter to the following image part.
The median filter has no mask of coefficients; we form its mask from the elements of the matrix itself: we take the image elements covered by the window and sort them in ascending order, so it becomes:
1- The first step: arrange the elements in ascending order.
Example: You have the following image fragment.
Solution:
1- We take the (3 × 3) part and arrange its elements in ascending order.
2- We take the next (3 × 3) part of the matrix and arrange it in ascending order as well.
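A corresponding sketch of the median filter, again leaving border pixels unchanged; the example values are hypothetical and include an outlier to show the effect:

```python
import numpy as np

def median_filter(img, size=3):
    """Replace each interior pixel by the median of its size x size neighbourhood,
    i.e. the middle value after sorting the window elements in ascending order."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    r = size // 2
    for i in range(r, img.shape[0] - r):
        for j in range(r, img.shape[1] - r):
            window = img[i - r:i + r + 1, j - r:j + r + 1]
            out[i, j] = np.median(window)
    return out

# Hypothetical image fragment containing an outlier (the 200), which the median removes.
print(median_filter([[10, 12, 11, 13],
                     [14, 200, 12, 10],
                     [11, 13, 12, 14],
                     [12, 10, 11, 13]]))
```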
Histogram modification
The histogram is a chart of the grey levels of an image: it shows how these levels are distributed over the image, so that the part of the range that contains the image information fills the chart while the rest of the range remains empty, depending on the values of the image's pixels.
• The first method: Histogram Stretching
where:
1. I(r,c)max is the largest grey level value in the image.
2. I(r,c)min is the smallest grey level value in the image.
3. MAX and MIN are the largest and smallest possible grey level values (255 and 0).
Example: You have the following image part; expand this part of the image using the histogram stretching method.
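The numbers for this example are not reproduced here. As a sketch, assuming the usual stretching law Stretch(I(r,c)) = (I(r,c) - I(r,c)min) / (I(r,c)max - I(r,c)min) × (MAX - MIN) + MIN:

```python
import numpy as np

def histogram_stretch(img, new_min=0, new_max=255):
    """Stretch the grey levels of img so that they span [new_min, new_max]."""
    img = np.asarray(img, dtype=float)
    i_min, i_max = img.min(), img.max()
    stretched = (img - i_min) / (i_max - i_min) * (new_max - new_min) + new_min
    return np.round(stretched).astype(np.uint8)

# Hypothetical image fragment whose values only occupy the range 100..180.
part = np.array([[100, 120, 140], [110, 160, 180], [130, 150, 170]])
print(histogram_stretch(part))
```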
• The second method: Histogram Shrinking
where:
1. I(r,c)max is the largest grey level value in the image.
2. I(r,c)min is the smallest grey level value in the image.
3. Shrink_max and Shrink_min are the desired largest and smallest grey level values after shrinking, chosen within the possible range (0, 255).
Example: You have the following image part; shrink this part of the image using the histogram shrinking method.
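A matching sketch of histogram shrinking, assuming the usual law Shrink(I(r,c)) = (Shrink_max - Shrink_min) / (I(r,c)max - I(r,c)min) × (I(r,c) - I(r,c)min) + Shrink_min:

```python
import numpy as np

def histogram_shrink(img, shrink_min, shrink_max):
    """Compress the grey levels of img into the range [shrink_min, shrink_max]."""
    img = np.asarray(img, dtype=float)
    i_min, i_max = img.min(), img.max()
    shrunk = (shrink_max - shrink_min) / (i_max - i_min) * (img - i_min) + shrink_min
    return np.round(shrunk).astype(np.uint8)

# Hypothetical image fragment; shrink its full range into 50..100.
part = np.array([[0, 60, 120], [30, 200, 255], [90, 150, 180]])
print(histogram_shrink(part, 50, 100))
```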
• The third method: Histogram Slide
The histogram can be shifted by a certain distance according to the following law:
Slide(I(r,c)) = I(r,c) - OFFSET        (8)
where:
OFFSET: the amount (distance) by which the histogram is shifted.
Example: You have the following part of the image; shift it by a distance of 10 units using the histogram slide method.
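A sketch of the histogram slide; following equation (8) the offset is subtracted, and the result is clipped to the valid grey-level range (the example values are hypothetical, since the matrix for this example is not shown):

```python
import numpy as np

def histogram_slide(img, offset):
    """Shift every grey level according to equation (8): Slide(I(r,c)) = I(r,c) - OFFSET.
    Use a negative offset to slide the histogram upwards; results are clipped to [0, 255]."""
    img = np.asarray(img, dtype=int)
    return np.clip(img - offset, 0, 255).astype(np.uint8)

# Hypothetical image fragment slid by a distance of 10 units.
part = np.array([[100, 120, 140], [110, 160, 250], [5, 150, 170]])
print(histogram_slide(part, 10))
```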
Introduction
A small section of a grey-scale image, shown as the grid of grey-level numbers that the computer actually receives:
67 67 66 68 66 67 64 65 65 63 63 69 61 64 63 66 61 60
69 68 63 68 65 62 65 61 50 26 32 65 61 67 64 65 66 63
72 71 70 87 67 60 28 21 17 18 13 15 20 59 61 65 66 64
75 73 76 78 67 26 20 19 16 18 16 13 18 21 50 61 69 70
74 75 78 74 39 31 31 30 46 37 69 66 64 43 18 63 69 60
73 75 77 64 41 20 18 22 63 92 99 88 78 73 39 40 59 65
74 75 71 42 19 12 14 28 79 102 107 96 87 79 57 29 68 66
75 75 66 43 12 11 16 62 87 84 84 108 83 84 59 39 70 66
76 74 49 42 37 10 34 78 90 99 68 94 97 51 40 69 72 65
76 63 40 57 123 88 60 83 95 88 80 71 67 69 32 67 73 73
88 41 35 10 15 94 67 96 98 91 86 105 81 77 71 35 45 47
86 42 47 11 13 16 71 76 89 95 116 91 67 87 12 25 43 51
96 67 20 12 17 17 86 89 90 101 96 89 62 13 11 19 40 51
99 88 19 15 15 18 32 107 99 86 95 92 26 13 13 16 49 52
If you consider your eyes, it is probably not clear to you that your
colour vision (provided by the 6–7 million cones in the eye) is
concentrated in the centre of the visual field of the eye (known as
the macula). The rest of your retina is made up of around 120
million rods (cells that are sensitive to visible light of any
wavelength/colour). In addition, each eye has a rather large blind
spot where the optic nerve attaches to the retina. Somehow, we
think we see a continuous image (i.e. no blind spot) with colour
everywhere, but even at this lowest level of processing it is unclear
as to how this impression occurs within the brain.
The visual cortex (at the back of the brain) has been studied and
found to contain cells that perform a type of edge detection (see
Chapter 6), but mostly we know what sections of the brain do
based on localised brain damage to individuals. For example, a
number of people with damage to a particular section of the brain
can no longer recognise faces (a condition known as
prosopagnosia). Other people have lost the ability to sense moving
objects (a condition known as akinetopsia). These conditions
inspire us to develop separate modules to recognise faces (e.g. see
Section 8.4) and to detect object motion (e.g. see Chapter 9).
We can also look at the brain using functional MRI, which allows
us to see the concentration of electrical activity in different parts
of the brain as subjects perform various activities. Again, this
may tell us what large parts of the brain are doing, but it cannot
provide us with algorithms to solve the problem of interpreting
the massive arrays of numbers that video cameras provide.
1.3 Practical Applications of Computer Vision
On the factory floor, the problem is a little simpler than in the
real world as the lighting can be constrained and the possible
variations of what we can see are quite limited. Computer vision
is now solving problems outside the factory. Computer vision
applications outside the factory include:
Some examples of existing computer vision systems in the outside
world are shown in Figure 1.4.
1.4 The Future of Computer Vision
The community of vision developers is constantly pushing the
boundaries of what we can achieve. While we can produce
autonomous vehicles, which drive themselves on a highway, we
would have difficulties producing a reliable vehicle to work on
minor roads, particularly if the road markings were poor. Even in
the highway environment, though, we have a legal issue, as who is
to blame if the vehicle crashes? Clearly, those developing the
technology do not think it should be them and would rather that
the driver should still be responsible should anything go wrong.
This issue of liability is a difficult one and arises with many vision
applications in the real world. Taking another example, if we
develop a medical imaging system to diagnose cancer, what will
happen when it mistakenly does not diagnose a condition? Even
though the system might be more reliable than any individual
radiologist, we enter a legal minefield. Therefore, for now, the
simplest solution is either to address only non-critical problems or
to develop systems, which are assistants to, rather than
replacements for, the current human experts.
Another problem exists with the deployment of computer vision
systems. In some countries the installation and use of video
cameras is considered an infringement of our basic right to
privacy. This varies hugely from country to country, from
company to company,
and even from individual to individual. While most people
involved with technology see the potential benefits of camera
systems, many people are inherently distrustful of video cameras
and what the videos could be used for. Among other things, they
fear (perhaps justifiably) a Big Brother scenario, where our
movements and actions are constantly monitored. Despite this,
the number of cameras is growing very rapidly, as there are
cameras on virtually every new computer, every new phone, every
new games console, and so on.
Moving forwards, we expect to see computer vision addressing
progressively harder problems, that is, problems in more complex
environments with fewer constraints. We expect computer vision
to start to be able to recognise more objects of different types and
to begin to extract more reliable and robust descriptions of the
world in which they operate. For example, we expect computer
vision to
Figure 1.5 The ASIMO humanoid robot which has two cameras in
its ‘head’ which allow ASIMO to determine how far away things
are, recognise familiar faces, etc. Reproduced by permission of
Honda Motor Co. Inc
Ultimately, computer vision is aiming to emulate the capabilities
of human vision, and to provide these abilities to humanoid (and
other) robotic devices, such as ASIMO (see Figure 1.5). This is
part of what makes this field exciting, and surprising, as we all
have our own (human) vision systems which work remarkably
well, yet when we try to automate any computer vision task it
proves very difficult to do reliably.
2
Images
Images play a crucial role in computer vision, serving as the visual data
captured by devices like cameras. They represent the appearance of
scenes, which can be processed to highlight key features before
extracting information. Images often contain noise, which can be
reduced using basic image processing methods.
2.1 Cameras
(Figure 2.1: the pinhole camera model, showing the world axes X, Y, Z, the image axes I, J, the image plane and the focal length.)
Figure 2.1 illustrates the pinhole camera model, demonstrating how the
3D real world (right side) relates to images on the image plane (left side).
The pinhole serves as the origin in the XYZ coordinate system. In
practice, the image plane needs to be enclosed in a housing to block
stray light.
In homogeneous coordinates, w acts as a scaling factor for image points. f_i and f_j represent a combination of the camera's focal length and the pixel sizes in the I and J directions. (c_i, c_j) are the coordinates where the optical axis, a line perpendicular to the image plane passing through the pinhole, intersects the image plane.
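The projection equation itself is not reproduced in these notes; the sketch below assumes the standard pinhole model, in which a world point (X, Y, Z) in the camera coordinate system maps to homogeneous image coordinates through the camera matrix built from f_i, f_j, c_i and c_j (the numerical parameters are only illustrative):

```python
import numpy as np

def project_point(X, Y, Z, fi, fj, ci, cj):
    """Project a 3D point (X, Y, Z), given in camera coordinates,
    onto the image plane using the pinhole camera model."""
    K = np.array([[fi, 0.0, ci],
                  [0.0, fj, cj],
                  [0.0, 0.0, 1.0]])          # camera (intrinsic) matrix
    wi, wj, w = K @ np.array([X, Y, Z])      # homogeneous image coordinates (w*i, w*j, w)
    return wi / w, wj / w                    # divide by the scaling factor w

# Hypothetical parameters: focal lengths of 800 pixels, optical axis at (320, 240).
print(project_point(0.5, 0.25, 2.0, 800, 800, 320, 240))
```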
2.2 Images
2.2.1 Sampling
2.2.2 Quantization
Each pixel in a digital image f(i, j) represents scene brightness, which in the real scene is a continuous quantity. However, these brightness values must be represented discretely using digital values. Typically, the number of brightness levels per channel is k = 2^b, where b is the number of bits, commonly set to 8.
Figure 2.2 Four different samplings of the same image; top left 256x192,
top right 128x96, bottom left 64x48 and bottom right 32x24
The essential question is how many bits are truly needed to represent
pixels. Using more bits increases memory requirements, while using
fewer bits results in information loss. Although 8-bit and 6-bit images
appear similar, the latter uses 25% fewer bits. However, 4-bit and 2-bit
images show significant issues, even if many objects can still be
recognized. The required bit depth depends on the intended use of the
image. For automatic machine interpretation, more quantization levels
are necessary to avoid false contours and incorrect segmentation, as seen
in lower-bit images.
Figure 2.3 Four different quantizations of the same grey-scale image: top left 8 bits, top right 6 bits, bottom left 4 bits and bottom right 2 bits
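A small sketch of this quantization, reducing an 8-bit pixel to b bits by keeping only its b most significant bits (one common way to simulate the effect shown in Figure 2.3; the sample values are hypothetical):

```python
import numpy as np

def quantize(img, bits):
    """Reduce an 8-bit grey-scale image to 2**bits brightness levels by
    keeping only the top `bits` bits of each pixel (values stay in 0..255)."""
    img = np.asarray(img, dtype=np.uint8)
    shift = 8 - bits
    return (img >> shift) << shift

# Hypothetical pixel values quantized to 6, 4 and 2 bits.
row = np.array([13, 67, 128, 200, 255], dtype=np.uint8)
for b in (6, 4, 2):
    print(b, "bits:", quantize(row, b))
```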
Colour images have multiple channels, while grey-scale images have only one
channel representing luminance (Y) at each scene point. Colour images include
both luminance and chrominance (colour information), requiring multiple data
channels for representation. Consequently, colour images are larger and more
complex than grey-scale images due to the need to process each channel. Much of image processing was originally developed for grey-scale images, so how it should be applied to colour images is often less clear.
For many years, computer vision primarily focused on grey-level images due to
two main reasons: humans can easily interpret grey-level images, and they are
smaller and less complex (one value per point). However, colour images offer
additional useful information that aids in tasks like image segmentation into
distinct objects or surfaces. For instance, in Figure 2.4, it is significantly easier
to separate and segment different trees in a colour image compared to a grey-
scale image.
Figure 2.4 shows an RGB colour image on the left and its grey-scale equivalent
on the right. Humans are sensitive to light wavelengths between 400 nm and
700 nm, which is why most camera sensors are designed to detect these
wavelengths. Colour images are more complex than grey-scale images and are
usually represented in a three-channel colour space.
The most common representation for colour images uses three channels
corresponding to the red (700 nm), green (546.1 nm), and blue (435.8 nm)
wavelengths, as illustrated in Figure 2.5. This indicates that the photosensitive
elements in the camera are designed to be spectrally sensitive to wavelengths
centered around these colours, as shown in Figure 2.6.
Figure 2.5 RGB Image (top left) shown with red channel (top right), green
channel (bottom left) and blue channel (bottom right)
Although there are around 16.8 million possible colors (256×256×256) with this
model, some colors cannot be accurately represented.
Many other colour spaces are available for use, which are essentially alternative
representations of an image. In theory, these spaces do not contain more or less
information than RGB, CMY, or HSV images; however, they do hold more
information than grey-scale images, which discard information compared to the
original colour images. OpenCV supports six additional colour spaces through
conversion functions.
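As a sketch of such conversions with OpenCV (the filename is a placeholder, and the code assumes the file can be read):

```python
import cv2

# 'image.png' is a placeholder filename; OpenCV loads colour images in BGR order.
bgr = cv2.imread("image.png")

grey = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)   # single luminance channel
hsv  = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)    # hue, saturation, value
rgb  = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)    # reorder the channels to RGB

print(bgr.shape, grey.shape, hsv.shape)
```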
2.3.6.1 Skin Detection
Skin detection algorithms typically work by defining a set of skin color values
or thresholds within a chosen color space. These values are then used to filter
out non-skin pixels and highlight the regions of the image that are likely to be
skin. Some methods involve the use of machine learning models, such as neural
networks, to classify pixels as skin or non-skin based on trained data. Skin
detection is especially useful in applications where human presence or
interaction is important, such as in surveillance, gaming, and medical imaging.
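A minimal sketch of threshold-based skin detection in the HSV colour space; the filename and the threshold values are only illustrative assumptions, not values taken from these notes:

```python
import cv2
import numpy as np

# 'face.png' is a placeholder filename. The HSV thresholds below are only
# illustrative; real systems tune them (or learn them) for their own data.
bgr = cv2.imread("face.png")
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)

lower = np.array([0, 40, 60], dtype=np.uint8)     # assumed lower bound on (H, S, V)
upper = np.array([25, 180, 255], dtype=np.uint8)  # assumed upper bound on (H, S, V)

mask = cv2.inRange(hsv, lower, upper)             # 255 where the pixel is classified as skin
skin = cv2.bitwise_and(bgr, bgr, mask=mask)       # keep only the skin-coloured regions
```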
2.4 Noise
There are different types of noise, each with unique characteristics and impact
on an image. Noise can occur in both grayscale and color images, and its effect
depends on the type and level of noise introduced. Removing or reducing noise
is essential for improving the quality of images and ensuring the accuracy of
subsequent image processing tasks. Various techniques, such as filtering
methods and noise models, are used to address noise and improve the clarity of
images.
Gaussian Noise:
This type of noise follows a Gaussian distribution, where the intensity values of
the noisy pixels are distributed around the original pixel value with a certain
mean and variance. Gaussian noise appears as random variations in pixel
values and is commonly caused by sensor noise or thermal fluctuations in
cameras.
Salt-and-Pepper Noise:
Salt-and-pepper noise appears as isolated black and white pixels scattered over the image: affected pixels are set to the minimum or maximum value, typically as a result of transmission errors or faulty sensor elements.
Poisson Noise:
Poisson noise occurs when the pixel values in an image are affected by a Poisson
distribution. It is often observed in low-light imaging or photon-counting
sensors, where the number of detected photons follows a Poisson distribution.
Poisson noise is more pronounced in images with low signal levels.
Speckle Noise:
Speckle noise appears as grainy patterns in the image and is often caused by
interference from the imaging system, such as uneven illumination or sensor
imperfections. It can be especially problematic in medical imaging or remote
sensing applications, where it can obscure fine details in the image.
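As a sketch of how some of these noise types can be simulated for testing (the parameter values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, mean=0.0, sigma=10.0):
    """Additive Gaussian noise with the given mean and standard deviation."""
    noisy = img.astype(float) + rng.normal(mean, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.05):
    """Set a fraction `amount` of the pixels randomly to 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    coords = rng.random(img.shape) < amount
    noisy[coords] = rng.choice([0, 255], size=coords.sum())
    return noisy

def add_speckle_noise(img, sigma=0.1):
    """Multiplicative (speckle) noise: each pixel is scaled by 1 + n, with n ~ N(0, sigma)."""
    noisy = img.astype(float) * (1 + rng.normal(0.0, sigma, img.shape))
    return np.clip(noisy, 0, 255).astype(np.uint8)

# Hypothetical flat grey test image.
img = np.full((4, 4), 128, dtype=np.uint8)
print(add_gaussian_noise(img))
print(add_salt_and_pepper(img))
print(add_speckle_noise(img))
```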
Quantization Noise:
Quantization noise arises when continuous brightness values are digitized, leading to small errors between the actual and quantized pixel values. While typically subtle, quantization noise can accumulate and become more noticeable in images with low resolution or high contrast.
Each type of noise requires different methods for detection and removal, such
as filtering techniques like Gaussian blur, median filtering, or wavelet
denoising, depending on the nature of the noise and the specific image
processing task.
2.4.2.1 Additive Noise
Additive noise is a type of noise that is added to the original signal or image,
meaning it is the result of unwanted random variations that accumulate on top
of the actual image data. This type of noise is typically modeled as an
independent, identically distributed random variable that is added to each pixel
of the image. The key feature of additive noise is that it does not depend on the
signal itself, but rather it is a separate entity that disturbs the image data.
2.4.2.2 Multiplicative Noise
2.5 Smoothing
2.5.3 Rotating Mask