Unit 1 CV
INTRODUCTION TO
COMPUTER VISION
Dr Resmi K R
Introduction
● Computer Vision is a field of Artificial Intelligence that enables machines to capture
and interpret visual information from the world just like humans do.
● Computer vision aims to automate the human vision system to recognize objects,
understand scenes, and make judgments after analyzing the visual data.
Steps in Computer Vision
Steps in Digital Image Processing
Tasks in Computer Vision
● Object Detection—Object detection identifies which objects are present in an image and locates them. Instead of simply saying that an image contains a dog and a cat, object detection shows where each one is located in the image. The algorithm can also detect multiple objects at the same time (see the code sketch after this list of tasks).
● Semantic Segmentation—Semantic segmentation labels each pixel in an image with a
specific class. Unlike object detection, which focuses on bounding boxes around
objects, semantic segmentation provides a detailed understanding of the scene.
● Instance Segmentation—Instance segmentation is an extension of semantic
segmentation that differentiates between multiple instances of the same object class.
For example, in an image with several cars, instance segmentation would not only
label all the vehicles but also distinguish between individual cars, assigning a unique
label to each one.
● Keypoint Detection—Keypoint detection identifies specific points of
interest within an object, such as the corners of a box or the joints in a
human body.
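As a rough illustration of the object detection task described above, the sketch below runs a pretrained Faster R-CNN from torchvision on a local image. The library version (torchvision ≥ 0.13), the file name street.jpg, and the 0.8 score threshold are assumptions made for this example, not part of the slides.

```python
# Hypothetical sketch: object detection with a pretrained torchvision model.
# Assumes torchvision >= 0.13 and a local test image "street.jpg".
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights)
model.eval()

img = read_image("street.jpg")          # uint8 tensor of shape (3, H, W)
batch = [weights.transforms()(img)]     # preprocessing preset matching the weights

with torch.no_grad():
    predictions = model(batch)[0]       # dict with 'boxes', 'labels', 'scores'

for box, label, score in zip(predictions["boxes"], predictions["labels"], predictions["scores"]):
    if score > 0.8:                     # keep only confident detections
        name = weights.meta["categories"][label.item()]
        print(name, [round(v.item(), 1) for v in box], round(score.item(), 2))
```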
Real-time example
Consider how the number ‘8’ is stored in the form of an image, i.e. as a grid of pixel intensity values.
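Since the figure itself is not reproduced here, the following is a minimal sketch of the idea: the digit ‘8’ stored as a small 2D array of pixel intensities (illustrative values, with 1 for bright pixels and 0 for dark ones).

```python
import numpy as np

# Illustrative 5x5 binary image of the digit '8' (1 = bright pixel, 0 = dark pixel).
eight = np.array([
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
], dtype=np.uint8)

print(eight.shape)    # (5, 5) -> rows x columns of pixels
print(eight * 255)    # scaled to the usual 0-255 gray-level range
```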
Applications of computer vision
● Quality management
● Tumor detection
● Animal monitoring (smart farming)
● Plant disease detection
● Parking occupancy detection
● Vehicle counting
Human Vision vs. Machine Vision
● Speed: the human visual system can process 10 to 12 images per second; machine vision runs at high speed, hundreds to thousands of parts per minute (PPM).
● Consistency, reliability & safety: human vision is impaired by boredom, distraction and fatigue; machine vision delivers continuous, repeatable performance 24/7 with 100% accuracy.
Geometric Camera Models
When we take a picture, a real 3D scene is captured by a real camera as a 2D image, so a 3D-to-2D projection takes place. Computer vision then works in the opposite direction, trying to infer 3D information from that 2D image:
Real Scene (3D) → Real Camera (2D) → CV Output (3D)
Pinhole Camera Model
For a pinhole camera, a hole the size of a pin is made on one side of a box, with a thin paper screen on the opposite side. Light entering this hole projects an image of the world onto the paper. The captured image is upside down, i.e. an inverted image.
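The geometry of the pinhole model can be sketched with the standard perspective-projection equations: a 3D point (X, Y, Z) in camera coordinates maps to image coordinates u = -f·X/Z and v = -f·Y/Z, where f is the focal length (the distance from the pinhole to the paper) and the minus sign captures the inversion. The focal length and sample points below are illustrative assumptions.

```python
import numpy as np

def pinhole_project(points_3d, f=1.0):
    """Project Nx3 camera-frame points (X, Y, Z) onto the image plane.

    Ideal pinhole model: u = -f * X / Z, v = -f * Y / Z.
    The minus signs model the inversion of the image behind the pinhole.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = -f * X / Z
    v = -f * Y / Z
    return np.stack([u, v], axis=1)

# Example: the same point moved twice as far away projects to half the image offset.
print(pinhole_project([[1.0, 2.0, 4.0], [1.0, 2.0, 8.0]], f=0.05))
```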
Image Formation in the Eye
Brightness adaptation
• The dynamic range of the human visual system is about 10^-6 to 10^4.
• The eye cannot cover this entire range simultaneously.
• The current sensitivity level of the visual system is called the brightness adaptation level.
(Source: Gonzalez & Woods, Digital Image Processing, 2nd ed.)
The output of most image sensors is an analog signal, and we cannot apply digital processing to it directly because we cannot store it: a signal that can take infinitely many values would require infinite memory. So we have to convert the analog signal into a digital signal.
To create a digital image, we need to convert the continuous data into digital form. This is done in two steps:
Sampling (digitizing the coordinate values)
Quantization (digitizing the amplitude values)
• For digital images the minimum gray level is usually 0, but the maximum depends on the number of quantization levels used to digitize the image. The most common choice is 256 levels, so the maximum level is 255.
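A minimal sketch of the two steps, assuming a 1D “analog” signal simulated by a densely evaluated sine wave and 8-bit quantization to 256 levels:

```python
import numpy as np

# "Analog" signal: a continuous-looking sine wave evaluated very densely, values in [0, 1].
t_dense = np.linspace(0.0, 1.0, 10000)
analog = 0.5 * (1.0 + np.sin(2.0 * np.pi * 5.0 * t_dense))

# Sampling: keep only a finite number of points along the coordinate axis.
num_samples = 64
t_sampled = np.linspace(0.0, 1.0, num_samples)
sampled = 0.5 * (1.0 + np.sin(2.0 * np.pi * 5.0 * t_sampled))

# Quantization: map each sampled amplitude to one of 256 discrete gray levels (0..255).
levels = 256
quantized = np.round(sampled * (levels - 1)).astype(np.uint8)

print(quantized.min(), quantized.max())   # minimum level 0, maximum level up to 255
```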
Basic relationships between pixels
● Neighbors of a Pixel
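The standard neighborhoods, the 4-neighbors N4(p), the diagonal neighbors ND(p), and the 8-neighbors N8(p) = N4(p) ∪ ND(p), can be written as a short sketch; coordinates are (row, column) and image-boundary checks are omitted for brevity.

```python
def neighbors_4(r, c):
    """4-neighbors N4(p): the pixels directly above, below, left and right of (r, c)."""
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

def neighbors_diagonal(r, c):
    """Diagonal neighbors ND(p): the four diagonal pixels of (r, c)."""
    return [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]

def neighbors_8(r, c):
    """8-neighbors N8(p): the union of N4(p) and ND(p)."""
    return neighbors_4(r, c) + neighbors_diagonal(r, c)

print(neighbors_8(2, 2))   # the eight pixels surrounding (2, 2)
```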
Distance Measures
● L1 Distance (or Cityblock Distance or Manhattan Distance): the path between two pixels does not go in a straight line but along horizontal and vertical blocks.
● Chebyshev Distance (or Chessboard Distance)
The most intuitive way to understand the Chebyshev distance is the movement of the King on a chessboard: it can move one step in any direction (up, down, left, right and diagonally).
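A short sketch of the common pixel distance measures between p = (x1, y1) and q = (x2, y2): the Euclidean distance, the city-block (L1) distance, and the chessboard (Chebyshev) distance.

```python
import math

def euclidean(p, q):
    """Euclidean distance: straight-line distance between the two pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def cityblock(p, q):
    """L1 / Manhattan / city-block distance: horizontal plus vertical steps."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def chessboard(p, q):
    """Chebyshev / chessboard distance: number of king moves on a chessboard."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

p, q = (2, 3), (5, 7)
print(euclidean(p, q), cityblock(p, q), chessboard(p, q))   # 5.0, 7, 4
```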
IMAGE FILE FORMATS
Color Models
RGB (Red, Green, Blue) Model
The RGB model is the most widely used color model in digital imaging. Colors are created by combining
red, green, and blue light in varying intensities.
Applications:
Primarily used in monitors, televisions, and cameras.
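A minimal sketch of additive RGB mixing as a tiny NumPy image (illustrative values only): pure red, green and blue pixels, plus their full combination, which appears white.

```python
import numpy as np

# A 1x4 RGB image: pure red, pure green, pure blue, and their full mix (white).
img = np.array([[[255, 0, 0],
                 [0, 255, 0],
                 [0, 0, 255],
                 [255, 255, 255]]], dtype=np.uint8)

print(img.shape)   # (1, 4, 3): height x width x 3 color channels
```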
CMY and CMYK (Cyan, Magenta, Yellow, Black) Model
● The CMY model is a subtractive color model used in color printing. It works by subtracting varying amounts of cyan, magenta, and yellow from white light. The CMYK variant adds a black (K) component, since mixing cyan, magenta, and yellow inks alone produces an imperfect black.
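With RGB values normalized to [0, 1], the CMY components are commonly computed as C = 1 - R, M = 1 - G, Y = 1 - B; a small sketch under that assumption:

```python
import numpy as np

def rgb_to_cmy(rgb):
    """Convert normalized RGB values in [0, 1] to CMY (subtractive model)."""
    rgb = np.asarray(rgb, dtype=float)
    return 1.0 - rgb

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # pure red -> [0., 1., 1.] (magenta + yellow ink)
```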
HSV (Hue, Saturation, Value) Model
● Description: The HSV model represents colors in a way that aligns with human
perception. It separates the color itself (hue) from brightness (value) and color purity
(saturation).
● Applications: Useful for image enhancement, object recognition, and color-based
segmentation.
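A sketch of RGB-to-HSV conversion using Python's standard-library colorsys module; the input color is an arbitrary illustrative value, and colorsys works with components in [0, 1], returning hue as a fraction of a full turn.

```python
import colorsys

r, g, b = 0.2, 0.6, 0.4                  # a normalized RGB color
h, s, v = colorsys.rgb_to_hsv(r, g, b)   # hue, saturation, value, each in [0, 1]
print(round(h * 360), round(s, 2), round(v, 2))   # hue in degrees, then saturation and value
```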
● YUV and YCbCr Models
Description: These models are used in video compression and broadcasting. Y represents luminance, while
U and V (or Cb and Cr in YCbCr) represent chrominance.
Applications: Widely employed in image and video compression standards such as JPEG and MPEG.
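A sketch of a full-range (JPEG/JFIF-style) RGB-to-YCbCr conversion using the BT.601 luminance weights; the exact coefficients and ranges vary between standards, so these values are one common convention, not the only one.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 / JFIF-style conversion; r, g, b are 8-bit values (0-255)."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

print(rgb_to_ycbcr(255, 0, 0))   # pure red: Cr well above 128, Cb below 128
```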
● CIELAB Model
Description: The CIELAB model is a perceptually uniform color model developed by the International
Commission on Illumination (CIE). It represents colors in a way that closely matches human vision.
Applications: Useful in color difference measurements, image analysis, and ensuring consistent color
reproduction across different devices.
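Assuming scikit-image is available (a library choice made only for this example), RGB values can be converted to CIELAB with skimage.color.rgb2lab:

```python
import numpy as np
from skimage import color   # assumption: scikit-image is installed

# A single-pixel RGB image with float values in [0, 1].
rgb = np.array([[[0.2, 0.6, 0.4]]], dtype=float)
lab = color.rgb2lab(rgb)    # L* in [0, 100], a* and b* roughly in [-128, 127]

print(lab[0, 0])            # the (L*, a*, b*) coordinates of that color
```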
Additional Video Links