Unit 1 Computer Vision Notes
Image Processing
Image processing is a methodology for performing operations on an image, or for extracting useful insights from it. It is similar to signal processing: the input is an image, and the output is either an image or a set of features linked to that image. Image processing is one of the fastest-growing areas of technology.
The following stages are involved in image processing:
• Acquisition - The medium through which we capture the image; it can be a video camera, a scanner, or any other camera device.
• Storage - After we have the image, it needs to be stored somewhere, e.g. on CDs, hard drives, tapes, etc.
• Processing - Once we have the stored image, we need a system to process it and perform our data manipulations; this can be done using computers or workstations.
• Communication - For an image to be passed from one unit to another we need a communication channel, which can be anything like a pen drive, email, the internet, or a Bluetooth connection.
• Display - We need a device to view the image or its output, such as a computer monitor, a projector, or hard copies printed on a printer.
Computer Vision
Computer Vision is a technical field where we try to gain a high-level understanding of images and videos, in order to understand how an image would be interpreted by humans.
Computer vision is concerned with the automatic extraction, understanding, and analysis of the data present in images. It involves theoretical and practical algorithms to achieve visual understanding.
Computer vision is a subdomain of artificial intelligence, and many of its methods use deep learning algorithms.
Understanding the context means transforming visual images into descriptions of the real world that make sense to human thought processes.
Computer Graphics
Computer Graphics:
• A method of creating images through computers
• Uses various techniques and algorithms
• Transforms and then presents the image in a visual form
• Helps to create 2-D or 3-D images
The size of images can vary from a few hundred to many thousands of pixels, which can make them difficult to fit on computer screens; computer graphics simplifies this using various techniques and algorithms. Computer graphics is the creation of images using computers, and the final end product can be an image, a drawing, or a business graph. The process is a two-way interaction between the computer and the user: the computer receives signals from an input device and the picture is modified accordingly.
Object Detection - A technique where we try to find the exact place where a particular object is present in a given image. The object is localized by drawing a rectangular box around it.
Object Classification - Given a particular image, we try to classify what object it contains, irrespective of the object's position in the image.
Object Tracking - Tracking an object while it is in motion; basically used in videos.
Image Segmentation - Classifying each pixel of the image into one of the output classification categories, rather than drawing a rectangle around the object.
Imaging Geometry
Imaging Geometry is a part of Computer Graphics, where we try to perform some geometric
operations on the image in order to get a new output image.
1. Scaling - Used to change the image content size.
2. Rotate - Change in the image orientation.
3. Reflect - Flipping over the image contents.
4. Translate - Changing the position of the image contents.
5. Affine - General image content linear transformations.
6. Image Interpolation - Image interpolation occurs when we zoom into an image, which means increasing the number of pixels. Interpolation works by estimating the unknown values from the known data.
7. Twirling - The twirling effect refers to distortion of an image by rotating a layer around the centre.
8. Image Warping - This process refers to manipulating a particular image such that the
shapes portrayed in the image are significantly distorted. It can be used for correcting the image
distortion.
9. Morphing - A process used for metamorphosis from one image into another. We perform warping before morphing so that both images have the same shape.
Using the above techniques we can also create multiple copies of the same image that are slightly different from each other.
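As an illustration, these geometric operations can be expressed as 3x3 matrices acting on homogeneous pixel coordinates. Below is a minimal sketch in NumPy; the helper names `scale`, `rotate`, and `translate` are chosen here for illustration, not taken from any library:

```python
import numpy as np

def scale(sx, sy):
    # Scaling: change the size of the image contents.
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1.0]])

def rotate(theta):
    # Rotation: change the orientation by angle theta (radians).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def translate(tx, ty):
    # Translation: shift the contents by (tx, ty).
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])

# An affine transform is any composition of such matrices.
# Here: translate by (1, 0), then rotate 90 degrees, then scale by 2.
M = scale(2, 2) @ rotate(np.pi / 2) @ translate(1, 0)

p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) in homogeneous form
q = M @ p                       # (1,0) -> (2,0) -> (0,2) -> (0,4)
```

Composing the matrices once and applying the single product to every pixel coordinate is why the homogeneous form is convenient.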
Image Sampling
In the digital image processing world, signals captured from the physical world need to be translated into digital form by a digitization process.
An image function f(x, y) must be digitized both spatially and in amplitude.
Sampling:
The analog image is continuous in its coordinates and in its amplitude. Digitization of the coordinates is known as sampling.
Averaging many samples of the signal helps reduce noise: the more samples we average, the more the noise is suppressed.
A plot of such a sampled signal shows random variations, which are caused by noise in the signal.
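A minimal sketch of both ideas, assuming a toy signal f(t) = sin(2πt): spatial sampling takes the signal's value at discrete points, and averaging many noisy measurements suppresses the noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t):
    # A continuous "analog" signal (one period of a sine wave).
    return np.sin(2 * np.pi * t)

# Sampling: digitize the coordinate by taking 8 samples per period.
t = np.arange(8) / 8.0
samples = f(t)

# Noise suppression by averaging: 1000 noisy measurements of f(0.25).
noisy = f(0.25) + rng.normal(0.0, 0.5, size=1000)
estimate = noisy.mean()   # close to the true value f(0.25) = 1.0
```

The averaged estimate is far closer to the true value than any single noisy measurement, matching the note above.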
Mathematical Tools
The main mathematical operations used in digital image processing are described below.
Each image is a matrix of n x n pixels, with values ranging between 0 and 255. A mathematical operation performed on this n x n matrix produces a resultant m x m output matrix, where m can be less than, greater than, or equal to n.
1. Identity - The identity matrix has 1s along the main diagonal and 0s at all other positions; it leaves the image unchanged.
2. Scaling - Multiplying every element of the matrix by a particular number in order to change the magnitude of the pixels.
3. Rotation - Rotating the image contents about a point by a particular angle.
4. Translation - Adding a particular value to the elements of the matrix in order to shift it from its original position along the x and y axes.
5. Shear (Vertical) - Slanting the image contents along the vertical axis.
6. Shear (Horizontal) - Slanting the image contents along the horizontal axis.
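The operations above can be sketched as matrices acting on a homogeneous pixel coordinate. The numeric values below (scale factor 3, shear factor 0.5) are arbitrary choices for illustration:

```python
import numpy as np

I = np.eye(3)                         # identity: leaves points unchanged
S = np.diag([3.0, 3.0, 1.0])          # scaling by 3 in x and y

H_v = np.array([[1.0, 0.0, 0],        # vertical shear: y picks up a
                [0.5, 1.0, 0],        # term proportional to x
                [0.0, 0.0, 1]])
H_h = np.array([[1.0, 0.5, 0],        # horizontal shear: x picks up a
                [0.0, 1.0, 0],        # term proportional to y
                [0.0, 0.0, 1]])

p = np.array([2.0, 4.0, 1.0])         # pixel at (2, 4)

identity_out = I @ p                  # unchanged: (2, 4)
scaled = S @ p                        # (6, 12)
sheared_v = H_v @ p                   # (2, 5): slanted vertically
sheared_h = H_h @ p                   # (4, 4): slanted horizontally
```

Note that shear slants coordinates rather than rotating them: one coordinate stays fixed while the other shifts proportionally.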
Image Enhancement
Image enhancement: A process of improving the quality and the information content of the
image before we start processing it.
Common techniques are:
• Contrast Enhancement
• Spatial Filtering
• Density Slicing
• FCC
Contrast Enhancement
Contrast enhancement helps in improving the pixel intensity of the images. It can be done using
linear or non-linear techniques.
Suppose we have a low-contrast image and a corresponding high-contrast version of it. Let x and y denote the pixel intensities of the input and output images respectively; each image has N pixels.
Each pixel transformation can be mapped through a function f, which produces the output pixel intensity and thus constructs the high-contrast image from the low-contrast input. The inverse of this function f can be used to convert the resultant high-contrast image back into a low-contrast image.
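A minimal sketch of one such function f: a linear contrast stretch that rescales the input intensity range onto the full 0-255 range (the `stretch` helper is a name chosen here for illustration, not a standard API):

```python
import numpy as np

def stretch(img, lo=None, hi=None):
    # Linear mapping f: [lo, hi] -> [0, 255]; by default the input
    # image's own min and max are used as the range.
    lo = img.min() if lo is None else lo
    hi = img.max() if hi is None else hi
    out = (img.astype(float) - lo) * 255.0 / (hi - lo)
    return np.clip(out, 0, 255).astype(np.uint8)

# A low-contrast image whose values are squeezed into [100, 150].
low = np.array([[100, 110], [140, 150]], dtype=np.uint8)
high = stretch(low)   # intensities now span the full 0..255 range
```

The same mapping applied in reverse (rescaling 0-255 back into [100, 150]) plays the role of the inverse function f⁻¹ mentioned above.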
Spatial Filtering
A technique applied directly to the pixels of the image.
• Uses a mask of odd size, so that it has a specific centre pixel.
• The mask is traversed over the image so that the mask centre hovers over every pixel.
• Smoothing spatial filtering is applied to noisy images in order to make them smooth.
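A minimal sketch of smoothing spatial filtering with a 3x3 averaging mask, leaving border pixels untouched for simplicity:

```python
import numpy as np

def mean_filter(img):
    # Traverse a 3x3 averaging mask over the image: each interior
    # centre pixel is replaced by the mean of its neighbourhood.
    out = img.astype(float).copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = img[i-1:i+2, j-1:j+2].mean()
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0                # a single noisy spike
smooth = mean_filter(img)      # the spike is spread over its neighbourhood
```

The isolated spike of 9 becomes a patch of 1s: the noise energy is averaged away, which is exactly why smoothing filters suppress noise (at the cost of some blur).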
Density Slicing
Density slicing is a technique usually used on a single monochrome image for highlighting areas that appear to be uniform but are not. Grayscale values (0-255) are converted into a series of intervals or groups, with each group assigned a different colour.
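A minimal sketch of density slicing, cutting the 0-255 range into four equal intervals and assigning each a colour (the palette values are a hypothetical choice):

```python
import numpy as np

# One RGB colour per interval of grayscale values.
palette = np.array([
    [0, 0, 255],      # 0-63    -> blue
    [0, 255, 0],      # 64-127  -> green
    [255, 255, 0],    # 128-191 -> yellow
    [255, 0, 0],      # 192-255 -> red
], dtype=np.uint8)

def density_slice(gray):
    # Map each pixel to its interval index, then look up the colour.
    idx = np.clip(gray // 64, 0, 3)
    return palette[idx]           # colour image of shape (..., 3)

gray = np.array([[10, 100], [150, 250]], dtype=np.uint8)
colour = density_slice(gray)
```

Pixels whose grayscale values look nearly identical can fall into different slices, making subtle density differences visible.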
FCC
A False Colour Composite (FCC) is an artificially generated colour image in which red, green and blue are assigned to wavelength regions to which they do not belong in nature.
FCCs are representations of multispectral images produced by using bands other than visible red, green and blue as the red, green and blue components of the display. False colour composites allow us to understand and visualize wavelengths that human eyes cannot see.
Image Segmentation
Image segmentation is the process of partitioning the pixels of an image into sets or groups of pixels that belong to the same object. The end goal of image segmentation is to convert the image into something more meaningful and insightful. Image segmentation is usually used to identify objects or object boundaries in an image. It is similar to object detection, but in object detection the machine draws a rectangular box around the object, whereas in image segmentation each pixel is classified into one of the output categories, and each category has a different colour of representation.
Some important points about image segmentation:
• The result of image segmentation is a set of segments that collectively cover the complete image, or a set of contours extracted from the image.
• Every pixel in a region is similar with respect to some characteristic or computed property.
• Adjacent regions differ significantly with respect to the same properties.
Image segmentation approaches fall into two groups:
• Semantic Segmentation - This approach labels every pixel with the class of the object it belongs to. For example, in a group photo of people, the background is one class and "person" is another.
• Instance Segmentation - This approach identifies, for each pixel, the individual object instance it belongs to, so each distinct object of interest in the image is segmented separately. For example, every person in the image is segmented as an individual object.
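As a minimal sketch of the per-pixel labelling idea, the simplest possible segmentation thresholds each pixel into one of two categories; real semantic or instance segmentation is far more sophisticated, but the output format (a class label per pixel, not a box) is the same:

```python
import numpy as np

def segment(img, threshold=128):
    # Assign every pixel a class label: 1 = object, 0 = background.
    return (img >= threshold).astype(np.uint8)

img = np.array([[10, 200], [250, 30]], dtype=np.uint8)
labels = segment(img)   # a label map the same shape as the image
```

Each category in the label map can then be rendered in its own colour, as described above.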
Applications
• Medical Imaging
• Object Detection
• Recognition Tasks
Augmented Reality
Augmented Reality is a technology used to create an object virtually in front of you using some hardware device like a mobile camera, video camera, etc.
This can be used to generate real world objects which are enhanced using Computer Generated
images (AR Objects).
AR involves overlaying visual, auditory or other sensory information into the world in order to
improve or enhance the experience of an individual.
The most common example you may have come across is the Pokémon Go mobile application, which shows Pokémon using the power of AR.
AR can be defined as a system that can fulfil three basic characteristics:
• Real- Time Interaction
• Combination of Virtual and Real World
• Accurate 3D registration of real and virtual objects. Sensors can also be involved in order to
enhance the experience in terms of audio and feel
Virtual Reality
A simulated experience created using the power of computers. It places the user inside the experience, engaging as many senses as possible: vision, hearing, touch, and sometimes even smell.
Virtual Reality (VR) is a simulated experience that can be different or similar to the real world
environment. A person using VR can ideally move through the virtual environment and look around
the artificial world and interact with the virtual features or items in it.
This type of effect is commonly created by VR headsets. One can categorize the VRs into two
categories:
• Immersive VR - This type of VR changes your views when you change the position of your
head.
• Text-based networked VR - This type of VR is mainly used in distance learning, where one person is the viewer and the other is present in the environment experiencing it.
Object Recognition
A collection of related computer vision tasks, which involves identification of objects. Task
involved in object recognition:
• Image Classification
• Object Localization
• Object Detection
Image Classification - It predicts the class of the object in the image
• Input - A simple photograph with a single object in it.
• Output - A class label or an integer mapping to that class
Object Localization - Finds out the location of the object in that image and draws a bounding box
around it.
• Input - Image with one or more objects
• Output - One or more bounding boxes drawn around the objects.
Object Detection - Finds the presence of objects, draws bounding boxes around them, and then identifies the objects within those bounding boxes.
• Input - Image with one or more objects
• Output - One or more bounding boxes along with their corresponding class labels
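Predicted boxes are usually compared against ground-truth boxes using Intersection over Union (IoU). A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(a, b):
    # Intersection rectangle of the two boxes (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the overlap counted twice.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = (0, 0, 10, 10)
truth = (5, 0, 15, 10)
score = iou(pred, truth)   # 50 overlap / 150 union = 1/3
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.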
Object Tracking
Object tracking is the process of locating moving objects over the course of a video.
To perform object tracking, an algorithm analyses sequential video frames and then outputs the movement of the targets between the frames.
There are multiple algorithms, each with its own strengths and weaknesses; depending on the use case, it is important to select the appropriate algorithm.
Target localization and representation are mostly bottom-up processes. Using these methods, a variety of tools can be built to identify moving objects.
Target Localization and Representation:
A bottom up approach responsible for locating and tracking the objects.
Target localization and representation has two main types of algorithms:
• Kernel-based tracking - An iterative localization procedure based on maximizing a similarity measure.
• Contour tracking - Works by detecting the object boundary. The algorithm iteratively evolves an initial contour from the previous frame to its new position in the present frame, directly evolving the contour by minimizing the contour energy using gradient descent.
Filtering and Data Association Process:
A top-down approach, which involves incorporating prior information about that particular object or
scene which helps in dealing with objects dynamically.
This type of approach is useful in tracking objects even behind some obstructions. This increases the
complexity of the algorithms.
There are two common filtering algorithms:
• Kalman filter - An optimal recursive Bayesian filter for linear systems subject to Gaussian noise. The algorithm uses a series of measurements observed over time, containing noise and other inaccuracies, and produces estimates of the unknown variables that tend to be more precise than those based on a single measurement.
• Particle filter - Useful for sampling the underlying state-space distribution of non-Gaussian and non-linear processes.
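A minimal sketch of the Kalman idea in one dimension for a static state, assuming Gaussian measurement noise: each noisy measurement refines the estimate, so the uncertainty p shrinks below that of any single measurement:

```python
def kalman_update(x, p, z, r):
    """One update: current estimate x with variance p, new measurement z
    with measurement-noise variance r."""
    k = p / (p + r)         # Kalman gain: how much to trust z vs. x
    x = x + k * (z - x)     # corrected estimate
    p = (1 - k) * p         # uncertainty always decreases
    return x, p

x, p = 0.0, 1000.0          # vague initial estimate, huge uncertainty
for z in [5.1, 4.8, 5.2, 4.9, 5.0]:   # noisy measurements of a true value 5
    x, p = kalman_update(x, p, z, r=1.0)
```

After five updates the estimate sits near the true value with variance about r/5, more precise than any single measurement (variance r), which is the property described above. A full tracking filter would add a prediction step with a motion model between updates; that part is omitted here.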